VISUALIZE THIS
Nathan Yau
Visualize This: The FlowingData Guide to Design, Visualization, and Statistics
Published by
Wiley Publishing, Inc.
10475 Crosspoint Boulevard
Indianapolis, IN 46256
www.wiley.com
Copyright © 2011 by Nathan Yau
Published by Wiley Publishing, Inc., Indianapolis, Indiana
Published simultaneously in Canada
ISBN: 978-0-470-94488-2
ISBN: 978-1-118-14024-6 (ebk)
ISBN: 978-1-118-14026-0 (ebk)
ISBN: 978-1-118-14025-3 (ebk)
Manufactured in the United States of America
10 9 8 7 6 5 4 3 2 1
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8600. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.
Limit of Liability/Disclaimer of Warranty: The publisher and the author make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation warranties of fitness for a particular purpose. No warranty may be created or extended by sales or promotional materials. The advice and strategies contained herein may not be suitable for every situation. This work is sold with the understanding that the publisher is not engaged in rendering legal, accounting, or other professional services. If professional assistance is required, the services of a competent professional person should be sought. Neither the publisher nor the author shall be liable for damages arising herefrom. The fact that an organization or Web site is referred to in this work as a citation and/or a potential source of further information does not mean that the author or the publisher endorses the information the organization or website may provide or recommendations it may make. Further, readers should be aware that Internet websites listed in this work may have changed or disappeared between when this work was written and when it is read.
For general information on our other products and services please contact our Customer Care Department
within the United States at (877) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.
Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Not all content that
is available in standard print versions of this book may appear or be packaged in all book formats. If you have
purchased a version of this book that did not include media that is referenced by or accompanies a standard print
version, you may request this media by visiting http://booksupport.wiley.com. For more information about
Wiley products, visit us at www.wiley.com.
Library of Congress Control Number: 2011928441
Trademarks: Wiley and the Wiley logo are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates, in the United States and other countries, and may not be used without written permission. All other trademarks are the property of their respective owners. Wiley Publishing, Inc. is not associated with any product or vendor mentioned in this book.
To my loving wife, Bea
About the Author
Since 2007, Nathan Yau has written and created graphics for FlowingData, a site on
visualization, statistics, and design. Yau has worked with groups such as The New York Times,
CNN, Mozilla, and SyFy, and believes that data and information graphics, while great for
analysis, are also perfect for telling stories with data.
Yau has a master’s degree in statistics from the University of California, Los Angeles,
and is currently a Ph.D. candidate with a focus on visualization and personal data.
Acknowledgments
This book would not be possible without the work by the data scientists before me who
developed and continue to create useful and open tools for everyone to use. The software
from these generous developers makes my life much easier, and I am sure they
will keep innovating.
My many thanks to FlowingData readers who helped me reach more people than I ever
imagined. They are one of the main reasons why this book was written.
Thank you to Wiley Publishing, who let me write the book that I wanted to, and to Kim
Rees for helping me produce something worth reading.
Finally, thank you to my wife for supporting me and to my parents who always encouraged
me to find what makes me happy.
Contents
Introduction
2 Handling Data
Gather Data
Formatting Data
Wrapping Up
Index
Introduction
Data is nothing new. People have been quantifying and tabulating things for centuries.
However, while writing for FlowingData, my website on design, visualization, and
statistics, I've seen a huge boom in data in just these past few years, and it keeps getting
better. Improvements in technology have made it extremely easy to collect and store data,
and the web lets you access it whenever you want. This wealth of data can, in the right
hands, provide a wealth of information to help improve decision making, communicate
ideas more clearly, and provide a more objective window into how you see the world
and yourself.
A significant shift in the release of government data came in mid-2009, with the United
States' launch of Data.gov. It's a comprehensive catalog of data provided by federal
agencies and represents a push toward transparency and accountability of groups and
officials. The thought here is that you should know how the government spends tax dollars;
before, the government felt more like a black box. A lot of the data on Data.gov was
already available on agency sites scattered across the web, but now much of it is in
one place and better formatted for analysis and visualization. The United Nations has
something similar with UNdata; the United Kingdom launched Data.gov.uk soon after;
and cities around the world, such as New York, San Francisco, and London, have also
taken part in big releases of data.
The collective web has also grown more open, with thousands of Application Programming
Interfaces (APIs) that encourage and entice developers to do something with all the
available data. Applications such as Twitter and Flickr provide comprehensive APIs that
enable user interfaces completely different from the actual sites. The API-cataloging
site ProgrammableWeb reports more than 2,000 APIs. New applications, such as
Infochimps and Factual, also launched fairly recently and were developed specifically
to provide structured data.
At the individual level, you can update friends on Facebook, share your location on
Foursquare, or tweet what you're doing on Twitter, all with a few clicks of a mouse or taps
on a keyboard. More specialized applications enable you to log what you eat, how much you
weigh, your mood, and plenty of other things. If you want to track something about
yourself, there is probably an application to help you do it.
With all this data sitting around in stores, warehouses, and databases,
the field is ripe for people to make sense of it. The data itself isn't all
that interesting (to most people). It's the information that comes out of
the data that matters. People want to know what their data says, and if you can help
them, you're going to be in high demand. There's a reason that Hal Varian,
Google's chief economist, says that statistician is the sexy job of the
next 10 years, and it's not just because statisticians are beautiful people.
(Although we are quite nice to look at in that geek chic sort of way.)
Visualization
One of the best ways to explore and try to understand a large dataset is with
visualization. Place the numbers into a visual space and let your brain or
your readers' brains find the patterns. We're good at that. You can often find
stories that you might never have found with just formal statistical methods.
John Tukey, my favorite statistician and the father of exploratory data analysis,
was well versed in statistical methods and properties but believed that
graphical techniques also had a place. He was a strong believer in discovering
the unexpected through pictures. You can find out a lot about data just by
visualizing it, and a lot of the time this is all you need to make an informed
decision or to tell a story.
For example, in 2009 the United States experienced a significant increase
in its unemployment rate. In 2007, the national average was 4.6 percent.
In 2008, it had risen to 5.8 percent. By September 2009, it was
9.8 percent. These national averages tell only part of the story, though,
because they generalize over an entire country. Were there any regions that had
higher unemployment rates than others? Were there any regions that
seemed to be unaffected?
The maps in Figure I-1 tell a more complete story, and you can answer the
preceding questions at a glance. Darker-colored counties are areas
that had relatively higher unemployment rates, whereas lighter-colored
counties had relatively lower rates. In 2009, you see a lot of
regions with rates greater than 10 percent in the West and in most areas of
the East. Areas in the Midwest were not hit as hard (Figure I-2).
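The coloring rule behind maps like these (darker fill for a higher rate) amounts to sorting each county's rate into a handful of ordered classes and giving each class a color. A minimal Python sketch follows; the thresholds, the six-color purple-red sequential palette (in the style of ColorBrewer), and the county names and rates are all illustrative assumptions, not the data or settings behind Figure I-1.

```python
# Minimal sketch: map an unemployment rate (percent) to a fill color,
# darker for higher rates. Palette, thresholds, and counties are
# hypothetical values chosen for illustration.

# Six colors, lightest to darkest.
COLORS = ["#F1EEF6", "#D4B9DA", "#C994C7", "#DF65B0", "#DD1C77", "#980043"]
# Upper bound (inclusive) of each class except the last, in percent.
THRESHOLDS = [2, 4, 6, 8, 10]


def rate_to_color(rate: float) -> str:
    """Return the fill color for an unemployment rate given in percent."""
    for bound, color in zip(THRESHOLDS, COLORS):
        if rate <= bound:
            return color
    # Rates above the last threshold (over 10 percent) get the darkest color.
    return COLORS[-1]


# Hypothetical counties and rates, not real BLS figures.
county_rates = {"county_a": 9.7, "county_b": 12.3, "county_c": 5.1}
fills = {name: rate_to_color(rate) for name, rate in county_rates.items()}
print(fills)
```

Fixed, evenly spaced breaks like these keep a legend easy to read; quantile-based breaks are a common alternative when most rates bunch together in a narrow range.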
You couldn’t find these geographic and temporal patterns so quickly with
just a spreadsheet, and definitely not with just the national averages. Also,
although the county-level data is more complex, most people can still
interpret the maps. These maps could in turn help policy makers decide
where to allocate relief funds or other types of support.
The great thing about this is that the data used to produce these maps is
all free and publicly available from the Bureau of Labor Statistics. Admittedly,
the data was not easy to find through an outdated data browser, but
the numbers are there at your disposal, and there is a lot sitting around
waiting for some visual treatment.
The Statistical Abstract of the United States, for instance, exists as
hundreds of tables of data (Figure I-3) but no graphs. That's an opportunity
to provide a comprehensive picture of a country. Really interesting stuff. I
graphed some of the tables a while back as a proof of concept, as shown in
Figure I-4, covering marriage and divorce rates, postal rates, electricity
usage, and a few others. The tables are hard to read, and you don't get
anything out of them other than individual values. In the graphical view, you can
find trends and patterns easily and make comparisons at a glance.
News outlets such as The New York Times and The Washington Post do a
great job of making data more accessible and visual. They have probably
made the best use of this available data as related stories have come and
gone. Sometimes data graphics are used to enhance a story with a
different point of view, whereas other times the graphics tell the entire story.
Graphics have become even more prevalent with the shift to online media.
There are now departments within news organizations that deal only with
interactives, only graphics, or only maps. The New York Times, for
example, even has a news desk dedicated specifically to what it calls
computer-assisted reporting. These are reporters who focus on telling the news with
numbers. The New York Times graphics desk is also comfortable dealing
with large amounts of data.
Visualization has also found its way into pop culture. Stamen Design, a
visualization firm well known for its online interactives, has provided a
Twitter tracker for the MTV Video Music Awards for the past few years. Each
year Stamen designs something different, but at its core, it shows what
people are talking about on Twitter in real time. When Kanye West had his
little outburst during Taylor Swift's acceptance speech in 2009, it was
obvious from the tracker what people thought of him.
Figure I-4: A graphical view of data from the Statistical Abstract of the United States
At this point, you enter a realm of visualization that is less analytical and more
about feeling. The definition of visualization starts to get kind of fuzzy. For
a long time, visualization was about quantitative facts: you should recognize
patterns with your tools, and they should aid your analysis in some
way. But visualization isn't just about getting the cold, hard facts. As in the
case of Stamen's tracker, it can be almost more about the entertainment factor.
It's a way for viewers to watch the awards show and interact with others
in the process. Jonathan Harris' work is another great example. Harris
designs his work, such as We Feel Fine and Whale Hunt, around stories
rather than analytical insight, and those stories put human
emotion ahead of the numbers and analytics.
Charts and graphs have also evolved into not just tools but also vehicles
to communicate ideas, and even to tell jokes. Sites such as GraphJam
and Indexed use Venn diagrams, pie charts, and the like to represent pop
songs or to show that a combination of red, black, and white equals a
Communist newspaper or a panda murder. Data Underload, a data comic of
sorts that I post on FlowingData, is my own take on the genre. I take
everyday observations and put them in chart form. The chart in Figure I-5 shows
famous movie quotes listed by the American Film Institute. It's totally
ridiculous but amusing (to me, at least).
So what is visualization? Well, it depends on who you talk to. Some people
say it's strictly traditional graphs and charts. Others have a more liberal
view where anything that displays data is visualization, whether it is data
art or a spreadsheet in Microsoft Excel. I tend to sway more toward the
latter, but sometimes find myself in the former group, too. In the end, it
doesn't actually matter all that much. Just make something that works for
your purpose.
Find more Data Underload on FlowingData at http://datafl.ws/underload.
Whatever you decide visualization is, whether you're making charts for
your presentation, analyzing a large dataset, or reporting the news with
data, you're ultimately looking for truth. At some point in time, lies and
statistics became almost synonymous, but it's not that the numbers lie.
It's the people who use the numbers who lie. Sometimes it's on purpose
to serve an agenda, but most of the time it's inadvertent. When you don't
know how to create a graph properly or communicate with data in an
unbiased way, false junk is likely to sprout. However, if you learn proper
visualization techniques and how to work with data, you can state your points
confidently and feel good about your findings.
states were at length persuaded to consent to its reunion to the
crown from which it had been separated, though to some extent
dependent, since the death of Lothar I (son of Lewis the Pious). On
Rudolf's death in 1032, Eudes, count of Champagne, endeavoured to
seize it, and entered the north-western districts, from which he was
dislodged by Conrad with some difficulty. Unlike Italy, it became an
integral member of the Germanic realm: its prelates and nobles sat
in imperial diets, and retained till recently the style and title of
Princes of the Holy Empire. The central government was, however,
seldom effective in these outlying territories, exposed always to the
intrigues, finally to the aggressions, of Capetian France.
Henry III.
Under Conrad's son Henry the Third the Empire attained the meridian of its power. At home Otto
the Great's prerogative had not stood so high. The duchies, always the chief source of fear, were
allowed to remain vacant or filled by the relatives of the monarch, who himself retained, contrary
to usual practice, those of Franconia and (for some years) Swabia. Abbeys and sees lay entirely
in his gift. Intestine feuds were repressed by the proclamation of a public peace. Abroad, the feudal
superiority over Hungary, which Henry II had gained by conferring the title of King with the hand
of his sister Gisela, was enforced by war, the country made almost a province, and compelled to pay
tribute.
His reform of the Popedom.
In Rome no German sovereign had ever been so absolute. A
disgraceful contest between three claimants of the papal chair had
shocked even the reckless apathy of Italy. Henry deposed them all,
and appointed their successor: he became hereditary patrician, and
wore constantly the green mantle and circlet of gold which were the
badges of that office, seeming, one might think, to find in it some
further authority than that which the imperial name conferred. The
synod passed a decree granting to Henry the right of nominating the
supreme pontiff; and the Roman priesthood, who had forfeited the
respect of the world even more by habitual simony than by the
flagrant corruption of their manners, were forced to receive German
after German as their bishop, at the bidding of a ruler so powerful,
so severe, and so pious. But Henry's encroachments alarmed his
own nobles no less than the Italians, and the reaction, which might
have been dangerous to himself, was fatal to his successor. A mere
chance, as some might call it, determined the course of history.
Henry IV, A.D. 1056-1106.
The great Emperor died suddenly in A.D. 1056, and a child was left at the helm, while storms were
gathering that might have demanded the wisest hand.
CHAPTER X.