Data Visualization
Exploring and Explaining with Data
Data Visualization: Exploring and Explaining with Data, First Edition
Jeffrey D. Camm, James J. Cochran, Michael J. Fry, Jeffrey W. Ohlmann
SVP, Higher Education & Skills Product: Erin Joyner
VP, Higher Education & Skills Product: Michael Schenk
Product Director: Joe Sabatino
Senior Product Manager: Aaron Arnsparger
Senior Learning Designer: Brandon Foltz
Senior Content Manager: Conor Allen
Digital Delivery Lead: Mark Hopkinson
Cover Image Source: iStockPhoto.com/mpilecky
© 2022 Cengage Learning, Inc.
WCN: 02-300
Unless otherwise noted, all content is © Cengage.
ALL RIGHTS RESERVED. No part of this work covered by the copyright
herein may be reproduced or distributed in any form or by any means,
except as permitted by U.S. copyright law, without the prior written
permission of the copyright owner.
For product information and technology assistance, contact us at
Cengage Customer & Sales Support, 1-800-354-9706 or support.cengage.com.
For permission to use material from this text or product,
submit all requests online at www.cengage.com/permissions.
Cengage is a leading provider of customized learning solutions with
employees residing in nearly 40 different countries and sales in more
than 125 countries around the world. Find your local representative at
www.cengage.com.
Chapter 1 Introduction
Chapter 2 Selecting a Chart Type
Chapter 3 Data Visualization and Design
Chapter 4 Purposeful Use of Color
Chapter 5 Visualizing Variability
Chapter 6 Exploring Data Visually
Chapter 7 Explaining Visually to Influence with Data
Chapter 8 Data Dashboards
Chapter 9 Telling the Truth with Data Visualization
References
Index
Contents
ABOUT THE AUTHORS
PREFACE
Chapter 1 Introduction
1.1 Analytics
1.2 Why Visualize Data?
    Data Visualization for Exploration
    Data Visualization for Explanation
1.3 Types of Data
    Quantitative and Categorical Data
    Cross-Sectional and Time Series Data
    Big Data
1.4 Data Visualization in Practice
    Accounting
    Finance
    Human Resource Management
    Marketing
    Operations
    Engineering
    Sciences
    Sports
Summary
Glossary
Problems
References
Index
About the Authors
Jeffrey D. Camm is Inmar Presidential Chair and Senior Associate Dean of Business
Analytics in the School of Business at Wake Forest University. Born in Cincinnati, Ohio,
he holds a B.S. from Xavier University (Ohio) and a Ph.D. from Clemson University. Prior
to joining the faculty at Wake Forest, he was on the faculty of the University of Cincinnati.
He has also been a visiting scholar at Stanford University and a visiting professor of business
administration at the Tuck School of Business at Dartmouth College.
Dr. Camm has published more than 45 papers in the general area of optimization applied
to problems in operations management and marketing. He has published his research in
Science, Management Science, Operations Research, INFORMS Journal on Applied
Analytics, and other professional journals. Dr. Camm was named the Dornoff Fellow of
Teaching Excellence at the University of Cincinnati, and he was the 2006 recipient of the
INFORMS Prize for the Teaching of Operations Research Practice. A firm believer in
practicing what he preaches, he has served as an operations research consultant to numerous
companies and government agencies. From 2005 to 2010 he served as editor-in-chief of the
INFORMS Journal on Applied Analytics (formerly Interfaces). In 2016, Professor Camm
received the George E. Kimball Medal for service to the operations research profession, and
in 2017 he was named an INFORMS Fellow.
James J. Cochran is Associate Dean for Research, Professor of Applied Statistics, and
the Rogers-Spivey Faculty Fellow at The University of Alabama. Born in Dayton, Ohio, he
earned his B.S., M.S., and M.B.A. from Wright State University and his Ph.D. from the
University of Cincinnati. He has been at The University of Alabama since 2014 and has been a
visiting scholar at Stanford University, Universidad de Talca, the University of South Africa,
and Pôle Universitaire Léonard de Vinci.
Dr. Cochran has published more than 50 papers in the development and application of
operations research and statistical methods. He has published in several journals, including
Management Science, The American Statistician, Communications in Statistics—Theory and
Methods, Annals of Operations Research, European Journal of Operational Research,
Journal of Combinatorial Optimization, INFORMS Journal on Applied Analytics, and Statistics
and Probability Letters. He received the 2008 INFORMS Prize for the Teaching of
Operations Research Practice, 2010 Mu Sigma Rho Statistical Education Award, and 2016 Waller
Distinguished Teaching Career Award from the American Statistical Association. Dr. Cochran
was elected to the International Statistics Institute in 2005, named a Fellow of the American
Statistical Association in 2011, and named a Fellow of INFORMS in 2017. He also received
the Founders Award in 2014 and the Karl E. Peace Award in 2015 from the American
Statistical Association, and he received the INFORMS President’s Award in 2019.
A strong advocate for effective operations research and statistics education as a means
of improving the quality of applications to real problems, Dr. Cochran has chaired teaching
effectiveness workshops around the globe. He has served as an operations research
consultant to numerous companies and not-for-profit organizations. He served as editor-in-chief of
INFORMS Transactions on Education and is on the editorial board of INFORMS Journal on
Applied Analytics, International Transactions in Operational Research, and Significance.
Professor Fry has published more than 25 research papers in journals such as
Operations Research, Manufacturing and Service Operations Management, Transportation
Science, Naval Research Logistics, IIE Transactions, Critical Care Medicine, and Interfaces.
He serves on editorial boards for journals such as Production and Operations Management,
INFORMS Journal on Applied Analytics (formerly Interfaces), and Journal of Quantitative
Analysis in Sports. His research interests are in applying analytics to the areas of supply chain
management, sports, and public-policy operations. He has worked with many different
organizations for his research, including Dell, Inc., Starbucks Coffee Company, Great American
Insurance Group, the Cincinnati Fire Department, the State of Ohio Election Commission, the
Cincinnati Bengals, and the Cincinnati Zoo and Botanical Gardens. In 2008, he was named a
finalist for the Daniel H. Wagner Prize for Excellence in Operations Research Practice, and
he has been recognized for both his research and teaching excellence at the University of
Cincinnati. In 2019, he led the team that was awarded the INFORMS UPS George D. Smith
Prize on behalf of the OBAIS Department at the University of Cincinnati.
MindTap
MindTap is a customizable digital course solution that includes an interactive eBook,
auto-graded exercises and problems from the textbook with solutions feedback, interactive
visualization applets with quizzes, chapter overview and problem walk-through videos, and
more! MindTap also includes step-by-step instructions for creating charts and tables from
the textbook in Tableau and Power BI. Contact your Cengage account executive for more
information about MindTap.
ACKNOWLEDGMENTS
We would like to acknowledge the work of reviewers who have provided comments and
suggestions for improvement of this first edition of this text. Thanks to:
Xiaohui Chang
Oregon State University
Wei Chen
York College of Pennsylvania
Anjee Gorkhali
Susquehanna University
Rita Kumar
Cal Poly Pomona
Barin Nag
Towson University
Andy Olstad
Oregon State University
Vivek Patil
Gonzaga University
Nolan Taylor
Indiana University
We are also indebted to the entire team at Cengage who worked on this title: Senior
Product Manager, Aaron Arnsparger; Senior Content Manager, Conor Allen; Senior Learning
Designer, Brandon Foltz; Digital Delivery Lead, Mark Hopkinson; Associate Subject-Matter
Expert, Nancy Marchant; Content Program Manager, Jessica Galloway; Content Quality
Assurance Engineer, Douglas Marks; and our Senior Project Manager at MPS Limited,
Anubhav Kaushal, for their editorial counsel and support during the preparation of this text.
The following Technical Content Developers worked on the MindTap content for this
text: Anthony Bacon, Philip Bozarth, Sam Gallagher, Anna Geyer, Matthew Holmes, and
Christopher Kurt. Our thanks to them as well.
Jeffrey D. Camm
James J. Cochran
Michael J. Fry
Jeffrey W. Ohlmann
Chapter 1
Introduction
LEARNING OBJECTIVES
After completing this chapter, you will be able to
LO 1 Define analytics and describe the different types of analytics used in practice
LO 2 Describe the different types of data and give an example of each
LO 3 Describe various examples of data visualization
LO 4 Identify the various charts defined in this chapter
You need a ride to a concert, so you select the Uber app on your phone. You enter the
location of the concert. Your phone automatically knows your location and the app presents
several options with prices. You select an option and confirm with your driver. You receive
the driver’s name, license plate number, make and model of vehicle, and a photograph of
the driver and the car. A map showing the location of the driver and the time remaining
until arrival is updated in real time.
Without even thinking about it, we continually use data to make decisions in our lives.
How the data are displayed to us has a direct impact on how much effort we must expend
to utilize the data. In the case of Uber, we enter data (our destination) and we are presented
with data (prices) that allow us to make an informed decision. We see the result of our
decision with an indication of the driver’s name, make and model of vehicle, and license
plate number that makes us feel more secure. Rather than simply displaying the time until
arrival, seeing the progress of the car on a map gives us some indication of the driver’s
route. Watching the driver’s progress on the app removes some uncertainty and to some
extent can divert our attention from how long we have been waiting. What data are
presented and how they are presented have an impact on our ability to understand the situation
and make more-informed decisions.
A weather map, an airplane seating chart, the dashboard of your car, a chart of the
performance of the Dow Jones Industrial Average, your fitness tracker: all of these involve
the visual display of data. Data visualization is the graphical representation of data and
information using displays such as charts, graphs, and maps. Our ability to process
information visually is strong. For example, numerical data that have been displayed in a chart,
graph, or map allow us to more easily see relationships between variables in our data set.
Trends, patterns, and the distributions of data are more easily comprehended when data are
displayed visually.
This book is about how to effectively display data to both discover and describe the
information the data contain. We provide best practices in the design of visual displays of
data, the effective use of color, and chart type selection. The goal of this book is to instruct
you how to create effective data visualizations. Through the use of examples (using real
data when possible), this book presents visualization principles and guidelines for gaining
insight from data and conveying an impactful message to the audience.
With the increased use of analytics in business, industry, science, engineering, and
government, data visualization has increased dramatically in importance. We begin with a
discussion of analytics and data visualization’s role in this rapidly growing field.
1-1 Analytics
Analytics is the scientific process of transforming data into insights for making better
decisions.1 Three developments have spurred the explosive growth in the use of analytics
for improving decision making in all facets of our lives, including business, sports, science,
medicine, and government:
● Incredible amounts of data are produced by technological advances such as
point-of-sale scanner technology; e-commerce and social networks; sensors on all kinds
of mechanical devices such as aircraft engines, automobiles, thermometers, and
farm machinery enabled by the so-called Internet of Things; and personal electronic
devices such as cell phones. Businesses naturally want to use these data to improve
the efficiency and profitability of their operations, better understand their customers,
and price their products more effectively and competitively. Scientists and engineers
use these data to invent new products, improve existing products, and make new
basic discoveries about nature and human behavior.
1 We adopt the definition of analytics developed by the Institute for Operations Research and the Management
Sciences (INFORMS).
hardware, parallel computing, and cloud computing (the remote use of hardware and
software over the internet) enable us to solve larger decision problems more quickly
and more accurately than ever before.
In summary, the availability of massive amounts of data, improvements in analytical
methods, and substantial increases in computing power and storage have enabled the explosive
growth in analytics, data science, and artificial intelligence.
Analytics can involve techniques as simple as reports or as complex as large-scale
optimizations and simulations. Analytics is generally grouped into three broad categories of
methods: descriptive, predictive, and prescriptive analytics.
Descriptive analytics is the set of analytical tools that describe what has happened.
This includes techniques such as data queries (requests for information with certain
characteristics from a database), reports, descriptive or summary statistics, and data visualization.
Descriptive data mining techniques such as cluster analysis (grouping data points with
similar characteristics) also fall into this category. In general, these techniques summarize
existing data or the output from predictive or prescriptive analyses.
Predictive analytics consists of techniques that use mathematical models constructed
from past data to predict future events or better understand the relationships between
variables. Techniques in this category include regression analysis, time series forecasting,
computer simulation, and predictive data mining. As an example of a predictive model, past
weather data are used to build mathematical models that forecast future weather. Likewise,
past sales data can be used to predict future sales for seasonal products such as
snowblowers, winter coats, and bathing suits.
Prescriptive analytics are mathematical or logical models that suggest a decision
or course of action. This category includes mathematical optimization models, decision
analysis, and heuristic or rule-based systems. For example, solutions to supply network
optimization models provide insights into the quantities of a company’s various products
that should be manufactured at each plant, how much should be shipped to each of the
company’s distribution centers, and which distribution center should serve each customer
to minimize cost and meet service constraints.
Data visualization is mission-critical to the success of all three types of analytics. We
discuss this in more detail with examples in the next section.
[Figure 1.1: column chart; vertical axis: Attendance (0 to 25,000); horizontal axis: Month (Jan to Dec)]
Our intuition and experience tell us that we would expect zoo attendance to be highest
in the summer months when many school-aged children are out of school for summer
break. Figure 1.1 confirms this, as the attendance at the zoo is highest in the summer
months of June, July, and August. Furthermore, we see that attendance increases gradually
each month from February through May as the average temperature increases, and
attendance gradually decreases each month from September through November as the average
temperature decreases. But why does the zoo attendance in December and January not
follow these patterns? It turns out that the zoo has an event known as the “Festival of Lights”
that runs from the end of November through early January. Children are out of school
during the last half of December and early January for the holiday season, and this leads to
increased attendance in the evenings at the zoo despite the colder winter temperatures.
Visual data exploration is an important part of descriptive analytics. Data visualization
can also be used directly to monitor key performance metrics, that is, measure how an
organization is performing relative to its goals. A data dashboard is a data visualization
tool that gives multiple outputs and may update in real time. Just as the dashboard in your
car measures the speed, engine temperature, and other important performance data as you
drive, corporate data dashboards measure performance metrics such as sales, inventory
levels, and service levels relative to the goals set by the company. These data dashboards
alert management when performances deviate from goals so that corrective actions can
be taken.
Data dashboards are discussed in more detail in Chapter 8.
Visual data exploration is also critical for ensuring that model assumptions hold in predictive
and prescriptive analytics. Understanding the data before using that data in modeling builds
trust and can be important in determining and explaining which type of model is appropriate.
2 Anscombe, F. J., “The Validity of Comparative Experiments,” Journal of the Royal Statistical Society, Vol. 11,
No. 3, 1948, pp. 181–211.
[Figure: Anscombe Data Set 1 (a) and Data Set 2 (b); each panel plots Y against X with the fitted line y = 0.5x + 3.00 and R² = 0.67]
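The fitted line and R² reported for both panels can be checked with a short least-squares computation. The sketch below is illustrative only: the x and y values are the commonly published values for Anscombe's first two data sets, which are an assumption here because this excerpt does not list them, and `fit_line` is a hypothetical helper name.

```python
# Commonly published values for Anscombe's Data Sets 1 and 2.
# Assumption: these values are not listed in this excerpt.
X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
Y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
Y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]


def fit_line(x, y):
    """Return (slope, intercept, r_squared) of the least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # squared correlation coefficient
    return slope, intercept, r_squared


for name, y in (("Data Set 1", Y1), ("Data Set 2", Y2)):
    slope, intercept, r2 = fit_line(X, y)
    print(f"{name}: y = {slope:.2f}x + {intercept:.2f}, R^2 = {r2:.2f}")
```

Both data sets produce nearly the same summary numbers even though their scatter charts look quite different, which is why plotting the data matters before trusting a fitted model.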
3 Lublin, J. S. “Check Out the Culture Before a New Job,” The Wall Street Journal, January 16, 2020.
article). We immediately see that only “Salary and bonus” is more frequently cited
than “Company culture.” When you first glance at the chart, the message that is
communicated is that corporate culture is the second most important factor cited by job
seekers. And as a reader, based on that message, you then decide whether the article is
worth reading.
The effective use of color is discussed in more detail in Chapter 4.
[Figure: chart titled “What matters most to you when deciding which job to take next?”; values shown include Location 13%, Industry 8%, Job Title 6%]
TABLE 1.3 Data for the Dow Jones Industrial Index Companies (April 3, 2020)
Company             Symbol  Industry            Share Price ($)      Volume
Apple Inc.          AAPL    Technology                   241.41  32,470,017
American Express    AXP     Financial Services            73.60   9,902,194
Boeing              BA      Manufacturing                124.52  36,489,379
Caterpillar Inc.    CAT     Manufacturing                114.67   4,803,174
Cisco Systems       CSCO    Technology                    39.06  21,235,157
Chevron             CVX     Petroleum                     75.11  14,317,998
Disney              DIS     Entertainment                 93.88  14,592,062
Goldman Sachs       GS      Financial Services           146.93   2,773,298
Home Depot, Inc.    HD      Retailing                    178.70   6,762,357
IBM                 IBM     Technology                   106.34   3,909,196
Intel Corporation   INTC    Technology                    54.13  23,906,062
Johnson & Johnson   JNJ     Pharmaceutical               134.17   9,409,033
JPMorgan Chase      JPM     Financial Services            84.05  20,363,095
Coca-Cola           KO      Food                          43.83  13,294,556
McDonald’s          MCD     Food                         160.33   4,361,094
3M Company          MMM     Conglomerate                 133.79   3,461,642
Merck & Co.         MRK     Pharmaceutical                76.25   9,181,539
Microsoft           MSFT    Technology                   153.83  41,243,284
Nike                NKE     Apparel                       78.86   8,297,443
Pfizer              PFE     Pharmaceutical                33.64  30,306,371
Procter & Gamble    PG      Consumer Goods               115.08   7,520,086
Travelers           TRV     Financial Services            93.89   1,595,000
UnitedHealth Group  UNH     Healthcare                   229.49   4,356,992
Raytheon            UTX     Conglomerate                  86.01  13,203,254
Visa                V       Financial Services           151.85  11,649,519
Verizon             VZ      Telecommunication             54.70  16,304,703
Walgreens           WBA     Retailing                     40.72   6,489,129
Walmart             WMT     Retailing                    119.48   9,390,287
Exxon Mobil         XOM     Petroleum                     39.21  48,094,821
For example, the graph of the time series in Figure 1.4 shows the DJI value from January
2010 to April 2020. The graph shows the upward trend of the DJI value from 2010
to 2020, when there was a steep decline in value due to the economic impact of the
COVID-19 pandemic.
Big Data
There is no universally accepted definition of big data. However, probably the most general
definition of big data is any set of data that is too large or too complex to be handled by
standard data-processing techniques using a typical desktop computer. People refer to the
four Vs of big data:
● volume: the amount of data generated
● velocity: the speed at which the data are generated
● variety: the diversity in types and structures of data generated
● veracity: the reliability of the data generated
Volume and velocity can pose a challenge for processing analytics, including data
visualization. Special data management software such as Hadoop and higher capacity hardware
(increased server or cloud computing) may be required. The variety of the data is handled
by converting video, voice, and text data to numerical data, to which we can then apply
standard data visualization techniques.
In summary, the type of data you have will influence the type of graph you should use to
convey your message. The zoo attendance data in Figure 1.1 are time series data. We used
a column chart in Figure 1.1 because the numbers are the total attendance for each month,
and we wanted to compare the attendance by month. The height of the columns allows us
to easily compare attendance by month. Contrast Figure 1.1 with Figure 1.4, which is also
time series data. Here we have the value of the Dow Jones Index. These data are a snapshot
of the current value of the DJI on the first trading day of each month. They provide what is
essentially a time path of the value, and so we use a line graph to emphasize the continuity
of time.
How to select an effective chart type is discussed in more detail in Chapter 2.
FIGURE 1.4 Dow Jones Index Values from January 2010 to April 2020
[Line chart; vertical axis: DJI Value (0 to 30,000); horizontal axis: dates from 2010 to 2020]
1-4 Data Visualization in Practice
Accounting
Accounting is a data-driven profession. Accountants prepare financial statements and
examine financial statements for accuracy and conformance to legal regulations and best
practices, including reporting required for tax purposes. Data visualization is a part of
every accountant’s tool kit. Data visualization is used to detect outliers that could be an
indication of a data error or fraud. As an example of data visualization in accounting, let us
consider Benford’s Law.
Benford’s Law, also known as the First-Digit Law, gives the expected probability that
the first digit of a reported number takes on the values one through nine, based on many
real-life numerical data sets such as company expense accounts. A column chart displaying
Benford’s Law is shown in Figure 1.5. We have rounded the probabilities to four digits. We
see, for example, that the probability of the first digit being a 1 is 0.3010. The probability
of the first digit being a 2 is 0.1761, and so forth.
[Figure 1.5: column chart of Benford’s Law probability by first digit: 1, 0.3010; 2, 0.1761; 3, 0.1249; 4, 0.0969; 5, 0.0792; 6, 0.0669; 7, 0.0580; 8, 0.0512; 9, 0.0458]
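The probabilities quoted for Figure 1.5 agree with the standard closed form for Benford's Law, P(d) = log10(1 + 1/d), which the text itself does not state. As a minimal sketch, the function below (the name `benford_probability` is hypothetical) reproduces the rounded values.

```python
import math


def benford_probability(digit):
    """Expected probability that the first digit equals `digit` (1 through 9)."""
    return math.log10(1 + 1 / digit)


# Print the nine probabilities rounded to four digits, as in the text.
for d in range(1, 10):
    print(d, f"{benford_probability(d):.4f}")
```

The nine probabilities sum to 1 because the terms telescope to log10(10).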
Benford’s Law can be used to detect fraud. If the first digits of numbers in a data set
do not conform to Benford’s Law, then further investigation of fraud may be warranted.
Consider the accounts payable (money owed by the company) for Tucker Software. Figure 1.6
is a clustered column chart (also known as a side-by-side column chart). A clustered
column chart is a column chart that shows multiple variables of interest on the same
chart, with the different variables usually denoted by different colors or shades of a color.
In Figure 1.6, the two variables are Benford’s Law probability and the first digit data for a
random sample of 500 of Tucker’s accounts payable entries. The frequency of occurrence
in the data is used to estimate the probability of the first digit for all of Tucker’s accounts
payable entries. It appears that there are an inordinate number of first digits of 5 and 9 and
a lower than expected number of first digits of 1. These might warrant further investigation
by Tucker’s auditors.
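The comparison described above, observed first-digit frequencies against Benford's Law probabilities, can be sketched as follows. The amounts are hypothetical placeholders rather than Tucker Software's entries, the helper names are invented for illustration, and the 0.10 gap used to flag a digit is an arbitrary threshold, not an auditing standard.

```python
import math
from collections import Counter


def first_digit(amount):
    """Return the leading nonzero digit of a nonzero number."""
    # Scientific notation puts the leading digit first, e.g. '5.1200000000e+03'.
    return int(f"{abs(amount):.10e}"[0])


def compare_to_benford(amounts):
    """Return {digit: (observed_share, benford_share)} for digits 1 to 9."""
    counts = Counter(first_digit(a) for a in amounts if a != 0)
    total = sum(counts.values())
    return {d: (counts[d] / total, math.log10(1 + 1 / d)) for d in range(1, 10)}


# Hypothetical entries for illustration only; not data from the text.
sample = [5120.00, 940.50, 1875.25, 530.10, 9900.00,
          12.40, 58.75, 9100.00, 2300.00, 5075.99]

for digit, (observed, expected) in compare_to_benford(sample).items():
    flag = "  <- review" if abs(observed - expected) > 0.10 else ""
    print(f"{digit}: observed {observed:.2f}, Benford {expected:.2f}{flag}")
```

With a real sample of several hundred entries, the two shares per digit are what a clustered column chart would plot side by side.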
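[Figure 1.6: clustered column chart comparing Benford’s Law probabilities with first-digit frequencies in a sample of 500 Tucker Software accounts payable entries; horizontal axis: First Digit (1 to 9); vertical axis from 0.00 to 0.25]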
Finance
Like accounting, the area of business known as finance is numerical and data-driven.
Finance is the area of business concerned with investing. Financial analysts, also known
as “quants,” use massive amounts of financial data to decide when to buy and sell certain
stocks, bonds, and other financial instruments. Data visualization is useful in finance for
recognizing trends, assessing risk, and tracking actual versus forecasted values of metrics
of concern.
Yahoo! Finance and other websites allow you to download daily stock price data. As an
example, the file Verizon has five days of stock prices for telecommunications company
Verizon Wireless. Each of the five observations includes the date, the high share price for
that day, the low share price for that day, and the closing share price for that day. Excel has
several charts designed for tracking stock performance with such data. Figure 1.7 displays
these data in a high-low-close stock chart, a chart that shows the high value, low value,
and closing value of the price of a share of stock over time. For each date shown, the bar
indicates the range of the stock price per share on that day, and the labelled point on the
bar indicates the closing price per share for that day. The chart shows how the closing price is
changing over time and the volatility of the price on each day.
We discuss High-Low-Close Stock charts in more detail in Chapter 2.
[Figure 1.7: high-low-close stock chart of Verizon share price, 20-Apr to 24-Apr; vertical axis from 55.50 to 59.00; labelled closing prices: 58.13, 57.99, 57.93, 57.59, 56.82]
Visualizations like Figure 1.8 can be helpful in better understanding and managing work-
force fluctuations.
[Figure 1.8: column chart of Number of Employees by Month (Jan to Dec), with separate Gains and Losses series; vertical axis from –30 to 60]
Marketing
Marketing is one of the most popular application areas of analytics. Analytics is used
for optimal pricing, markdown pricing for seasonal goods, and optimal allocation of
marketing budget. Sentiment analysis using text data such as tweets, analysis of social
networks to determine influence, and website analytics for understanding website traffic
and sales are just a few examples of how data visualization can be used to support more
effective marketing.
Let us consider a software company’s website effectiveness. Figure 1.9 shows a funnel
chart of the conversion of website visitors to subscribers and then to renewal customers.
A funnel chart is a chart that shows the progression of a numerical variable for various
categories from larger to smaller values. In Figure 1.9, at the top of the funnel, we track
100% of the first-time visitors to the website over some period of time, for example, a
six-month period. The funnel chart shows that of those original visitors, 74% return to
the website one or more times after their initial visit. Sixty-one percent of the first-time
visitors downloaded a 30-day trial version of the software, 47% eventually contacted
support services, 28% purchased a one-year subscription to the software, and 17%
eventually renewed their subscription. This type of funnel chart can be used to compare the
conversion effectiveness of different website configurations, the use of bots, or changes in
support services.
Funnel charts are discussed in more detail in Chapter 2.
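To make the shape of a funnel chart concrete, the percentages quoted above can be drawn as centered text bars. This is only a rough sketch: the stage labels are paraphrased from the paragraph, `render_funnel` is a hypothetical helper, and a charting tool would normally be used instead.

```python
# Percent of first-time visitors reaching each stage, as quoted in the text.
# Stage labels are paraphrased for illustration.
funnel = [
    ("First-time visitors", 100),
    ("Returned to website", 74),
    ("Downloaded trial", 61),
    ("Contacted support", 47),
    ("Subscribed", 28),
    ("Renewed", 17),
]


def render_funnel(stages, width=50):
    """Return one centered text bar per stage, scaled to the largest value."""
    largest = max(value for _, value in stages)
    lines = []
    for label, value in stages:
        bar = "#" * round(width * value / largest)
        lines.append(f"{label:<22}{bar.center(width)} {value}%")
    return lines


print("\n".join(render_funnel(funnel)))
```

Centering each bar gives the narrowing silhouette that the chart type is named for; comparing two website configurations would mean rendering one funnel per configuration.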
[Figure 1.9: funnel chart of website conversion; stages shown include Subscribed 28% and Renewed 17%]
Operations
As in marketing, analytics is used heavily in managing the operations function of
business. Operations management is concerned with the management of the production and
distribution of goods and services. It includes responsibility for planning and scheduling,
inventory planning, demand forecasting, and supply chain optimization. Figure 1.10
shows time series data for monthly unit sales for a product (measured in thousands of
units sold). Each period corresponds to one month. So that a cost-effective production
schedule can be developed, an operations manager might have responsibility for
forecasting the monthly unit sales for the next twelve months (periods 37–48). In looking at
the time series data in Figure 1.10, it appears that there is a repeating pattern and units
sold might also be increasing slightly over time. The operations manager can use these
observations to help choose which forecasting techniques to test in order to arrive at
reasonable forecasts for periods 37–48.
[Figure 1.10: time series chart of monthly unit sales; vertical axis from 0 to 2,500; horizontal axis: Month (0 to 40)]
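One simple way to act on the two observations above, a repeating pattern plus a slight upward drift, is a seasonal naive forecast with a trend adjustment. The sketch below is an illustration and not a technique prescribed by the text; the sales figures are hypothetical, and `seasonal_forecast` is an invented name.

```python
def seasonal_forecast(history, season_length=12, horizon=12):
    """Repeat the last season, shifted by the average change between seasons."""
    if len(history) < 2 * season_length:
        raise ValueError("need at least two full seasons of history")
    last = history[-season_length:]
    previous = history[-2 * season_length:-season_length]
    # Average season-over-season change, used as a simple trend estimate.
    change = sum(a - b for a, b in zip(last, previous)) / season_length
    return [last[i % season_length] + change * (i // season_length + 1)
            for i in range(horizon)]


# Hypothetical 36 months of unit sales: a fixed yearly pattern plus 20 units per year.
pattern = [900, 950, 1100, 1300, 1600, 1900, 2000, 1800, 1500, 1200, 1000, 950]
history = [value + 20 * year for year in range(3) for value in pattern]

forecast = seasonal_forecast(history)  # periods 37 through 48
print(forecast)
```

A forecast like this is a baseline; plotting it next to the history is a quick check on whether the assumed pattern and trend look plausible.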
Engineering
Engineering relies heavily on mathematics and data. Hence, data visualization is an
important technique in every engineer’s toolkit. For example, industrial engineers monitor the
production process to ensure that it is “in control” or operating as expected. A control
chart is a graphical display that is used to help determine if a production process is in
control or out of control. A variable of interest is plotted over time relative to lower and
upper control limits. Consider the control chart for the production of 10-pound bags of dog
food shown in Figure 1.11. Every minute, a bag is diverted from the line and automatically
weighed. The result is plotted along with lower and upper control limits obtained
statistically from historical data. When the points are between the lower and upper control limits,
the process is considered to be in control. When points begin to appear outside the control
limits with some regularity and/or when large swings start to appear as in Figure 1.11, this
is a signal to inspect the process and make any necessary corrections.
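The text says only that control limits are obtained statistically from historical data. A common convention, assumed here, is to set them at the historical mean plus or minus three standard deviations. The weights below are hypothetical, and both function names are invented for this sketch.

```python
from statistics import mean, stdev


def control_limits(historical, sigmas=3):
    """Return (lower, upper) limits at mean +/- sigmas * sample std deviation."""
    center = mean(historical)
    spread = stdev(historical)
    return center - sigmas * spread, center + sigmas * spread


def out_of_control(observations, lower, upper):
    """Return (minute, weight) pairs that fall outside the control limits."""
    return [(minute, weight)
            for minute, weight in enumerate(observations, start=1)
            if not lower <= weight <= upper]


# Hypothetical bag weights in pounds; not data from the text.
historical = [10.00, 10.01, 9.99, 10.02, 9.98, 10.00, 10.01, 9.99, 10.00, 10.00]
lower, upper = control_limits(historical)

new_weights = [10.01, 9.99, 10.00, 10.06, 9.93]
print(f"limits: {lower:.3f} to {upper:.3f}")
print("outside limits:", out_of_control(new_weights, lower, upper))
```

Plotting each new weight against the two limit lines over time gives the control chart itself; the list of flagged points is what would prompt an inspection of the process.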
[Figure 1.11: control chart of Weight (pounds), 9.90 to 10.10, by Minute, 1 to 15, with the Lower Control Limit at 9.96 and the Upper Control Limit at 10.06]
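The control-chart logic described above can be sketched in a few lines of Python. The text
says only that the limits are obtained statistically from historical data; the mean plus or
minus three sample standard deviations used here is a common convention and an assumption,
as are the function names and the sample bag weights.

```python
from statistics import mean, stdev


def control_limits(historical_weights, k=3.0):
    """Return (lower, upper) limits as mean -/+ k sample standard deviations.

    The k = 3 default is a common convention and an assumption here.
    """
    center = mean(historical_weights)
    spread = stdev(historical_weights)
    return center - k * spread, center + k * spread


def points_outside(weights, lower, upper):
    """Return (minute, weight) pairs that fall outside the control limits."""
    return [
        (minute, weight)
        for minute, weight in enumerate(weights, start=1)
        if weight < lower or weight > upper
    ]


# Limits read from the Figure 1.11 axis labels; the bag weights are made up.
flagged = points_outside([10.00, 10.02, 10.08, 9.94, 10.01], lower=9.96, upper=10.06)
```

Here `flagged` holds the minutes whose weights fall outside the limits, which is the signal
to inspect the process.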
Sciences
The natural and social sciences rely heavily on the analysis of data and data visualization
for exploring data and explaining the results of analysis. In the natural sciences, data are
often geographic, so maps are used frequently. For example, the weather, pandemic hot
spots, and species distributions can be represented on a geographic map. Geographic maps
are used not only to display data but also to display the results of predictive models. An
example of this is shown in Figure 1.12. Predicting the path a hurricane will follow is a
complicated problem. Numerous models, each with its own set of influencing variables
(also known as model features), yield different predictions. Displaying the results of each
model on a map gives a sense of the uncertainty in predicted paths across all models and
expands the alert to a broader range of the population than relying on a single model.
Because the multiple paths resemble pieces of spaghetti, this type of map is sometimes
referred to as a “spaghetti chart.” More generally, a spaghetti chart is a chart depicting
possible flows through a system using a line for each possible path.
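To make the idea of uncertainty across models concrete, the following Python sketch computes,
for each forecast step, how far apart the predicted paths are in latitude. It is a minimal
illustration under stated assumptions: the function name, the model names, and the
coordinates are all hypothetical and are not taken from Figure 1.12.

```python
def latitude_spread(paths):
    """For each forecast step, return (min, max) latitude across all models.

    `paths` maps a model name to a list of (latitude, longitude) points,
    one per forecast step; all lists must have the same length.
    """
    lengths = {len(points) for points in paths.values()}
    if len(lengths) != 1:
        raise ValueError("all model paths must cover the same forecast steps")
    # zip(*...) groups the points of all models step by step.
    return [
        (min(lat for lat, _ in step), max(lat for lat, _ in step))
        for step in zip(*paths.values())
    ]


# Hypothetical predicted paths from three models, three forecast steps each.
paths = {
    "model_a": [(25.0, -75.0), (26.5, -77.0), (28.0, -79.0)],
    "model_b": [(25.0, -75.0), (27.0, -76.5), (30.0, -78.0)],
    "model_c": [(25.0, -75.0), (26.0, -77.5), (27.0, -80.5)],
}
spread = latitude_spread(paths)
```

A widening gap between the minimum and maximum at later steps corresponds to the lines of a
spaghetti chart fanning out on the map.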
Sports
The use of analytics in sports has gained considerable notoriety since 2003, when
renowned author Michael Lewis published his book Moneyball. Lewis’s book tells how
the Oakland Athletics used an analytical approach for player evaluation to assemble a
competitive team using a limited budget. The use of analytics for player evaluation and on-field
strategy is now common throughout professional sports. Data visualization is a key
component of how analytics is applied in sports. It is common for coaches to have tablet
computers on the sideline that they use to make real-time decisions such as calling plays
and making player substitutions.
Figure 1.13 shows an example of how data visualization is used in basketball. A shot
chart is a chart that displays the location of the shots attempted by a player during a
basketball game with different symbols or colors indicating successful and unsuccessful
shots. Figure 1.13 shows shot attempts by NBA player Chris Paul, with a blue dot
indicating a successful shot and an orange x indicating a missed shot (source:
Basketball-Reference.com). Other NBA teams can utilize this chart to help devise strategies for
defending Chris Paul.
Notes + Comments
Chart is considered a more general term than graph. For example, charts encompass maps,
bar charts, etc., but graphs generally refer to a chart of the type shown in Figure 1.4
(a line chart). In this text, we use the terms chart and graph interchangeably.
SUMMARY
This introductory chapter began with a discussion of analytics, the scientific process of
transforming data into insights for making better decisions. We discussed the three types of
analytics: descriptive, predictive, and prescriptive. Descriptive analytics describes what has
happened and includes tools such as reports, data visualization, data dashboards, descriptive
statistics, and some data-mining techniques. Predictive analytics consists of techniques
that use past data to predict future events or understand the relationships between variables.
These techniques include regression, data mining, forecasting, and simulation. Prescriptive
analytics uses input data to suggest a decision or course of action. This class of analytical
techniques includes rule-based models, simulation, decision analysis, and optimization.
Descriptive and predictive analytics can help us better understand the uncertainty and risk
associated with our decision alternatives.
This text focuses on descriptive analytics, and in particular on data visualization. Data
visualization can be used for exploring data and for explaining data and the output of analyses.
We explore data to more easily identify patterns, recognize anomalies or irregularities
in the data, and better understand relationships between variables. Visually displaying data
enhances our ability to identify these characteristics of data. Often we put various charts
and tables of several related variables into a single display called a data dashboard. Data
dashboards are collections of tables, charts, maps, and summary statistics that are updated
as new data become available. Many organizations and businesses use data dashboards to
explore and monitor performance data such as inventory levels, sales, and the quality of
production.
We also use data visualization for explaining data and the results of data analyses. As
business becomes more data-driven, it is increasingly important to be able to influence
decision making by telling a compelling data-driven story with data visualization. Much
of the rest of this text is devoted to how to visualize data to clearly convey a compelling
message.
The type of chart, graph, or table to use depends on the type of data you have and
your intended message. Therefore, we discussed the different types of data. Quantitative
data are numerical values used to indicate magnitude, such as how many or how much.
Arithmetic operations, such as addition and subtraction, can be performed on quantitative
data. Categorical data are data for which categories of like items are identified by labels
or names. Arithmetic operations cannot be performed on categorical data. Cross-sectional
data are collected from several entities at the same or approximately the same point in
time, whereas time series data are collected on a single variable at several points in time.
Big data is any set of data that is too large or complex to be handled by typical
data-processing techniques using a typical desktop computer. Big data includes text, audio, and
video data.
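The distinction between quantitative and categorical data determines which summaries make
sense. The Python sketch below is a rough illustration, assuming a hypothetical `summarize`
helper and made-up values: it averages numeric values and counts labels.

```python
from collections import Counter
from statistics import mean


def _is_number(value):
    """True for ints and floats, but not for booleans."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def summarize(values):
    """Summarize a column according to whether it looks quantitative or categorical."""
    if values and all(_is_number(v) for v in values):
        # Arithmetic is meaningful for quantitative data.
        return {"type": "quantitative", "mean": mean(values)}
    # For categorical data we can only count how often each label occurs.
    return {"type": "categorical", "counts": dict(Counter(values))}


# Made-up examples: prices in thousands of dollars, and chart-type labels.
price_summary = summarize([200, 300, 250])
label_summary = summarize(["Bar", "Column", "Bar"])
```

The type check is only a heuristic. Numeric codes such as ZIP codes are categorical even
though they are stored as numbers, so a real analysis needs to know what each variable
represents.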
We concluded the chapter with a discussion of applications of data visualization in
accounting, finance, human resource management, marketing, operations, engineering,
science, and sports, and we provided an example for each area. Each of the remaining
chapters of this text will begin with a real-world application of a data visualization. Each
Data Visualization Makeover is a real visualization we discuss and then improve by applying
the principles of the chapter.
GLOSSARY
Analytics The scientific process of transforming data into insights for making better
decisions.
Bar chart A chart that shows a summary of categorical data using the length of horizontal
bars to display the magnitude of a quantitative variable.
Big data Any set of data that is too large or complex to be handled by standard
data-processing techniques using a typical desktop computer. Big data includes text, audio, and
video data.
Categorical data Data for which categories of like items are identified by labels or names.
Arithmetic operations cannot be performed on categorical data.
Clustered column chart A column chart showing multiple variables of interest on the
same chart, the different variables usually denoted by different colors or shades of a color
with the columns side by side.
Column chart A chart that shows numerical data by the height of a column for a variety of
categories or time periods.
Control chart A graphical display in which a variable of interest is plotted over time
relative to lower and upper control limits.
Cross-sectional data Data collected from several entities at the same or approximately the
same point in time.
Data dashboard A data visualization tool that gives multiple outputs and may update in
real time.
Data visualization The graphical representation of data and information using displays
such as charts, graphs, and maps.
Descriptive analytics The set of analytical tools that describe what has happened.
Funnel chart A chart that shows the progression of a numerical variable to typically
smaller values through a process, for example, the percentage of website visitors who
ultimately result in a sale.
High-low-close stock chart A chart that shows three numerical values: high value, low
value, and closing value for the price of a share of stock over time.
Predictive analytics Techniques that use models constructed from past data to predict
future events or better understand the relationships between variables.
Prescriptive analytics Mathematical or logical models that suggest a decision or course of
action.
Quantitative data Data for which numerical values are used to indicate magnitude,
such as how many or how much. Arithmetic operations, such as addition, subtraction,
multiplication, and division, can be performed on quantitative data.
Scatter chart A graphical presentation of the relationship between two quantitative
variables. One variable is shown on the horizontal axis and the other is shown on the
vertical axis.
Shot chart A chart that displays the location of shots attempted by a basketball player
during a basketball game with different symbols or colors indicating successful and
unsuccessful shots.
Spaghetti chart A chart depicting possible flows through a system using a line for each
possible path.
Time series data Data collected over several points in time (minutes, hours, days, months,
years, etc.).
PROBLEMS
For each of the four pieces of data, indicate whether the data are quantitative or
categorical and whether the data are cross-sectional or time series. LO 2
5. House Price and Square Footage. Suppose we want to better understand the relationship
between house price and square footage of the house, and we have collected house
price and square footage for 75 houses in a particular neighborhood of Cincinnati,
Ohio, from the Zillow website on January 3, 2021. LO 2, 3
a. Are these data quantitative or categorical?
b. Are these data cross-sectional or time series?
c. Which of the following types of charts would provide the best display of these data?
Explain your answer.
i. Bar chart
ii. Column chart
iii. Scatter chart
6. Netflix Subscribers. The following chart displays the total number of Netflix
subscribers from 2010 to 2019. LO 1, 2, 3
a. Are these data quantitative or categorical?
b. Are these data cross-sectional or time series?
c. What type of chart is this?
[Chart: total number of Netflix subscribers by Year, 2010 to 2019; data labels, in ascending order: 20.0, 26.3, 33.3, 44.4, 57.4, 74.8, 89.1, 110.6, 139.3]
7. U.S. Netflix Subscribers. Refer to the previous problem. Suppose that in addition
to the total number of Netflix subscribers, we have the number of those subscribers
by year for the years 2010–2019 who live in the United States. Our message is to
emphasize how much of the growth is coming from the United States. Which of the
following types of charts would best display the data? Explain your answer. LO 2, 3
i. Bar chart
ii. Clustered column chart
iii. Stacked column chart
iv. Stock chart
8. How Data Scientists Spend Their Day. The Wall Street Journal reported the results
of a survey of data scientists. The survey asked the data scientists how they spend their
time. The following chart shows the percentage of respondents who answered less than
five hours per week or at least five hours per week for the amount of time they spend
on exploring data and on presenting analyses. LO 2, 3, 4
[Chart legend: Less than five hours per week; At least five hours per week. Data labels for Presenting Analysis: 74%, 26%. Data labels for Exploring Data: 42%, 58%.]
10. Job Factors. The following chart is based on the same data used to construct
Figure 1.3. The data are percentages of respondents to a survey who listed various
factors as most important when making a job decision. LO 3, 4
a. What type of chart is this?
b. What is the fifth most-cited factor?
[Chart: What matters most to you when deciding which job to take next? Category labels, in axis order: Salary and Bonus, Company Culture, Location, Day-to-day Work, Flexible Schedule, Industry, Job Title, Health Care Benefits. Data labels, in descending order: 24%, 22%, 13%, 11%, 11%, 8%, 6%, 5%]
11. Retirement Financial Concerns. The results of the American Institute of Certified
Public Accountants’ Personal Financial Planning Trends Survey indicated 48% of
clients had concerns about outliving their money. The top reasons for these concerns
and the percentage of respondents who cited the reason were as follows. LO 3, 4
Concerns for Retirement
online information session is sent. At the information session, faculty discuss the program
and answer questions. Students apply through a web portal. An admissions committee
makes an offer of admission (or not) along with any financial aid. If the person
is admitted, the person either accepts or rejects the offer. Consider the following chart.
LO 3, 4
[Chart: Email 100%; Admitted 25%; Enrolled 21%]
[Chart: values from 95.60 to 97.00 plotted by Hour, 1 to 20, with an Upper Control Limit line]