Data Visualization
Exploring and Explaining with Data

Jeffrey D. Camm, Wake Forest University
James J. Cochran, University of Alabama
Michael J. Fry, University of Cincinnati
Jeffrey W. Ohlmann, University of Iowa

Australia ● Brazil ● Canada ● Mexico ● Singapore ● United Kingdom ● United States


This is an electronic version of the print textbook. Due to electronic rights restrictions,
some third party content may be suppressed. Editorial review has deemed that any suppressed
content does not materially affect the overall learning experience. The publisher reserves the right
to remove content from this title at any time if subsequent rights restrictions require it. For
valuable information on pricing, previous editions, changes to current editions, and alternate
formats, please visit www.cengage.com/highered to search by ISBN#, author, title, or keyword for
materials in your areas of interest.

Important Notice: Media content referenced within the product description or the product
text may not be available in the eBook version.
Data Visualization: Exploring and Explaining with Data, First Edition
Jeffrey D. Camm, James J. Cochran, Michael J. Fry, Jeffrey W. Ohlmann

SVP, Higher Education & Skills Product: Erin Joyner
VP, Higher Education & Skills Product: Michael Schenk
Product Director: Joe Sabatino
Senior Product Manager: Aaron Arnsparger
Senior Learning Designer: Brandon Foltz
Senior Content Manager: Conor Allen
Digital Delivery Lead: Mark Hopkinson
Marketing Director: Danae April
Executive Marketing Manager: Nate Anderson
IP Analyst: Ashley Maynard
IP Project Manager: Kelli Besse
Production Service: MPS Limited
Designer: Chris Doughman
Cover Image Source: iStockPhoto.com/mpilecky

© 2022 Cengage Learning, Inc.
WCN: 02-300
Unless otherwise noted, all content is © Cengage.

ALL RIGHTS RESERVED. No part of this work covered by the copyright
herein may be reproduced or distributed in any form or by any means,
except as permitted by U.S. copyright law, without the prior written
permission of the copyright owner.

For product information and technology assistance, contact us at
Cengage Customer & Sales Support, 1-800-354-9706 or
support.cengage.com.

For permission to use material from this text or product,
submit all requests online at www.cengage.com/permissions.

Library of Congress Control Number: 2021930729
ISBN: 978-0-357-63134-8

Cengage
200 Pier 4 Boulevard
Boston, MA 02210
USA

Cengage is a leading provider of customized learning solutions with
employees residing in nearly 40 different countries and sales in more
than 125 countries around the world. Find your local representative at
www.cengage.com.

To learn more about Cengage platforms and services, register or access
your online learning solution, or purchase materials for your course, visit
www.cengage.com.

Printed in the United States of America
Print Number: 01 Print Year: 2021
Brief Contents
ABOUT THE AUTHORS
PREFACE

Chapter 1 Introduction
Chapter 2 Selecting a Chart Type
Chapter 3 Data Visualization and Design
Chapter 4 Purposeful Use of Color
Chapter 5 Visualizing Variability
Chapter 6 Exploring Data Visually
Chapter 7 Explaining Visually to Influence with Data
Chapter 8 Data Dashboards
Chapter 9 Telling the Truth with Data Visualization

References
Index
Contents
ABOUT THE AUTHORS
PREFACE

Chapter 1 Introduction
1.1 Analytics
1.2 Why Visualize Data?
Data Visualization for Exploration
Data Visualization for Explanation
1.3 Types of Data
Quantitative and Categorical Data
Cross-Sectional and Time Series Data
Big Data
1.4 Data Visualization in Practice
Accounting
Finance
Human Resource Management
Marketing
Operations
Engineering
Sciences
Sports
Summary
Glossary
Problems

Chapter 2 Selecting a Chart Type
2.1 Defining the Goal of Your Data Visualization
Selecting an Appropriate Chart
2.2 Creating and Editing Charts in Excel
Creating a Chart in Excel
Editing a Chart in Excel
2.3 Scatter Charts and Bubble Charts
Scatter Charts
Bubble Charts
2.4 Line Charts, Column Charts, and Bar Charts
Line Charts
Column Charts
Bar Charts
2.5 Maps
Geographic Maps
Heat Maps
Treemaps
2.6 When to Use Tables
Tables versus Charts
2.7 Other Specialized Charts
Waterfall Charts
Stock Charts
Funnel Charts
2.8 A Summary Guide to Chart Selection
Guidelines for Selecting a Chart
Some Charts to Avoid
Excel’s Recommended Charts Tool
Summary
Glossary
Problems

Chapter 3 Data Visualization and Design
3.1 Preattentive Attributes
Color
Form
Length and Width
Spatial Positioning
Movement
3.2 Gestalt Principles
Similarity
Proximity
Enclosure
Connection
3.3 Data-Ink Ratio
3.4 Other Data Visualization Design Issues
Minimizing Eye Travel
Choosing a Font for Text
3.5 Common Mistakes in Data Visualization Design
Wrong Type of Visualization
Trying to Display Too Much Information
Using Excel Default Settings for Charts
Too Many Attributes
Unnecessary Use of 3D
Summary
Glossary
Problems

Chapter 4 Purposeful Use of Color
4.1 Color and Perception
Attributes of Color: Hue, Saturation, and Luminance
Color Psychology and Color Symbolism
Perceived Color
4.2 Color Schemes and Types of Data
Categorical Color Schemes
Sequential Color Schemes
Diverging Color Schemes
4.3 Custom Color Using the HSL Color System
4.4 Common Mistakes in the Use of Color in Data Visualization
Unnecessary Color
Excessive Color
Insufficient Contrast
Inconsistency Across Related Charts
Neglecting Colorblindness
Not Considering the Mode of Delivery
Summary
Glossary
Problems

Chapter 5 Visualizing Variability
5.1 Creating Distributions from Data
Frequency Distributions for Categorical Data
Relative Frequency and Percent Frequency
Visualizing Distributions of Quantitative Data
5.2 Statistical Analysis of Distributions of Quantitative Variables
Measures of Location
Measures of Variability
Box and Whisker Charts
5.3 Uncertainty in Sample Statistics
Displaying a Confidence Interval on a Mean
Displaying a Confidence Interval on a Proportion
5.4 Uncertainty in Predictive Models
Illustrating Prediction Intervals for a Simple Linear Regression Model
Illustrating Prediction Intervals for a Time Series Model
Summary
Glossary
Problems

Chapter 6 Exploring Data Visually
6.1 Introduction to Exploratory Data Analysis
Espléndido Jugo y Batido, Inc. Example
Organizing Data to Facilitate Exploration
6.2 Analyzing Variables One at a Time
Exploring a Categorical Variable
Exploring a Quantitative Variable
6.3 Relationships between Variables
Crosstabulation
Association between Two Quantitative Variables
6.4 Analysis of Missing Data
Types of Missing Data
Exploring Patterns Associated with Missing Data
6.5 Visualizing Time-Series Data
Viewing Data at Different Temporal Frequencies
Highlighting Patterns in Time Series Data
Rearranging Data for Visualization
6.6 Visualizing Geospatial Data
Choropleth Maps
Cartograms
Summary
Glossary
Problems

Chapter 7 Explaining Visually to Influence with Data
7.1 Know Your Audience
Audience Member Needs
Audience Member Analytical Comfort Levels
7.2 Know Your Message
What Helps the Decision Maker?
Empathizing with Data
7.3 Storytelling with Charts
Choosing the Correct Chart to Tell Your Story
Using Preattentive Attributes to Tell Your Story
7.4 Bringing It All Together: Storytelling and Presentation Design
Aristotle’s Rhetorical Triangle
Freytag’s Pyramid
Storyboarding
Summary
Glossary
Problems

Chapter 8 Data Dashboards
8.1 What Is a Data Dashboard?
Principles of Effective Data Dashboards
Applications of Data Dashboards
8.2 Data Dashboards Taxonomies
Data Updates
User Interaction
Organizational Function
8.3 Data Dashboard Design
Understanding the Purpose of the Data Dashboard
Considering the Needs of the Data Dashboard’s Users
Data Dashboard Engineering
8.4 Using Excel Tools to Build a Data Dashboard
Espléndido Jugo y Batido, Inc.
Using PivotTables, PivotCharts, and Slicers to Build a Data Dashboard
Linking Slicers to Multiple PivotTables
Protecting a Data Dashboard
Final Review of a Data Dashboard
8.5 Common Mistakes in Data Dashboard Design
Summary
Glossary
Problems

Chapter 9 Telling the Truth with Data Visualization
9.1 Missing Data and Data Errors
Identifying Missing Data
Identifying Data Errors
9.2 Biased Data
Selection Bias
Survivor Bias
9.3 Adjusting for Inflation
9.4 Deceptive Design
Design of Chart Axes
Dual-Axis Charts
Data Selection and Temporal Frequency
Issues Related to Geographic Maps
Summary
Glossary
Problems

References

Index
About the Authors
Jeffrey D. Camm is Inmar Presidential Chair and Senior Associate Dean of Business
Analytics in the School of Business at Wake Forest University. Born in Cincinnati, Ohio,
he holds a B.S. from Xavier University (Ohio) and a Ph.D. from Clemson University. Prior
to joining the faculty at Wake Forest, he was on the faculty of the University of Cincinnati.
He has also been a visiting scholar at Stanford University and a visiting professor of business
administration at the Tuck School of Business at Dartmouth College.
Dr. Camm has published more than 45 papers in the general area of optimization applied
to problems in operations management and marketing. He has published his research in
Science, Management Science, Operations Research, INFORMS Journal on Applied
Analytics, and other professional journals. Dr. Camm was named the Dornoff Fellow of
Teaching Excellence at the University of Cincinnati, and he was the 2006 recipient of the
INFORMS Prize for the Teaching of Operations Research Practice. A firm believer in
practicing what he preaches, he has served as an operations research consultant to numerous
companies and government agencies. From 2005 to 2010 he served as editor-in-chief of the
INFORMS Journal on Applied Analytics (formerly Interfaces). In 2016, Professor Camm
received the George E. Kimball Medal for service to the operations research profession, and
in 2017 he was named an INFORMS Fellow.

James J. Cochran is Associate Dean for Research, Professor of Applied Statistics, and
the Rogers-Spivey Faculty Fellow at The University of Alabama. Born in Dayton, Ohio, he
earned his B.S., M.S., and M.B.A. from Wright State University and his Ph.D. from the
University of Cincinnati. He has been at The University of Alabama since 2014 and has been a
visiting scholar at Stanford University, Universidad de Talca, the University of South Africa,
and Pole Universitaire Leonard de Vinci.
Dr. Cochran has published more than 50 papers in the development and application of
operations research and statistical methods. He has published in several journals, including
Management Science, The American Statistician, Communications in Statistics—Theory and
Methods, Annals of Operations Research, European Journal of Operational Research,
Journal of Combinatorial Optimization, INFORMS Journal on Applied Analytics, and Statistics
and Probability Letters. He received the 2008 INFORMS Prize for the Teaching of
Operations Research Practice, 2010 Mu Sigma Rho Statistical Education Award, and 2016 Waller
Distinguished Teaching Career Award from the American Statistical Association. Dr. Cochran
was elected to the International Statistics Institute in 2005, named a Fellow of the American
Statistical Association in 2011, and named a Fellow of INFORMS in 2017. He also received
the Founders Award in 2014 and the Karl E. Peace Award in 2015 from the American
Statistical Association, and he received the INFORMS President’s Award in 2019.
A strong advocate for effective operations research and statistics education as a means
of improving the quality of applications to real problems, Dr. Cochran has chaired teaching
effectiveness workshops around the globe. He has served as an operations research
consultant to numerous companies and not-for-profit organizations. He served as editor-in-chief of
INFORMS Transactions on Education and is on the editorial board of INFORMS Journal on
Applied Analytics, International Transactions in Operational Research, and Significance.

Michael J. Fry is Professor of Operations, Business Analytics, and Information Systems
(OBAIS) and Academic Director of the Center for Business Analytics in the Carl H. Lindner
College of Business at the University of Cincinnati. Born in Killeen, Texas, he earned a B.S.
from Texas A&M University and M.S.E. and Ph.D. degrees from the University of Michigan.
He has been at the University of Cincinnati since 2002, where he served as Department Head
from 2014 to 2018 and has been named a Lindner Research Fellow. He has also been a
visiting professor at Cornell University and at the University of British Columbia.
Professor Fry has published more than 25 research papers in journals such as
Operations Research, Manufacturing and Service Operations Management, Transportation
Science, Naval Research Logistics, IIE Transactions, Critical Care Medicine, and Interfaces.
He serves on editorial boards for journals such as Production and Operations Management,
INFORMS Journal on Applied Analytics (formerly Interfaces), and Journal of Quantitative
Analysis in Sports. His research interests are in applying analytics to the areas of supply chain
management, sports, and public-policy operations. He has worked with many different
organizations for his research, including Dell, Inc., Starbucks Coffee Company, Great American
Insurance Group, the Cincinnati Fire Department, the State of Ohio Election Commission, the
Cincinnati Bengals, and the Cincinnati Zoo and Botanical Gardens. In 2008, he was named a
finalist for the Daniel H. Wagner Prize for Excellence in Operations Research Practice, and
he has been recognized for both his research and teaching excellence at the University of
Cincinnati. In 2019, he led the team that was awarded the INFORMS UPS George D. Smith
Prize on behalf of the OBAIS Department at the University of Cincinnati.

Jeffrey W. Ohlmann is Associate Professor of Business Analytics and Huneke Research
Fellow in the Tippie College of Business at the University of Iowa. Born in Valentine,
Nebraska, he earned a B.S. from the University of Nebraska and M.S. and Ph.D. degrees
from the University of Michigan. He has been at the University of Iowa since 2003.
Professor Ohlmann’s research on the modeling and solution of decision-making
problems has produced more than two dozen research papers in journals such as Operations
Research, Mathematics of Operations Research, INFORMS Journal on Computing,
Transportation Science, and European Journal of Operational Research. He has collaborated with
organizations such as Transfreight, LeanCor, Cargill, the Hamilton County Board of
Elections, and three National Football League franchises. Because of the relevance of his work
to industry, he was bestowed the George B. Dantzig Dissertation Award and was recognized
as a finalist for the Daniel H. Wagner Prize for Excellence in Operations Research Practice.
Preface
Data Visualization: Exploring and Explaining with Data is designed to introduce best
practices in data visualization to undergraduate and graduate students. This is one
of the first books on data visualization designed for college courses. The book contains
material on effective design, choice of chart type, effective use of color, how to explore
data visually, how to build data dashboards, and how to explain concepts and results
visually in a compelling way with data. In an increasingly data-driven economy, these
concepts are becoming more important for analysts, natural scientists, social scientists,
engineers, medical professionals, business professionals, and virtually everyone who
needs to interact with data. Indeed, the skills developed in this book will be helpful to
all who want to influence with data or be accurately informed by data.
The book is designed for a semester-long course at either the undergraduate or graduate
level. The examples used in this book are drawn from a variety of functional areas in the
business world including accounting, finance, operations, and human resources as well as
from sports, politics, science, medicine, and economics. The intention is that this book will
be relevant to students at either the undergraduate or graduate level in a business school as
well as to students studying in other academic areas.
Data Visualization: Exploring and Explaining with Data is written in a style that does
not require advanced knowledge of mathematics or statistics. The first five chapters cover
foundational issues important to constructing good charts. Chapter 1 introduces data
visualization and how it fits into the broader area of analytics. A brief history of data visualization
is provided as well as a discussion of the different types of data and examples of a variety of
charts. Chapter 2 provides guidance on selecting an appropriate type of chart based on the
goals of the visualization and the type of data to be visualized. Best practices in chart design,
including discussions of preattentive attributes, Gestalt principles, and the data-ink ratio, are
covered in Chapter 3. Chapter 4 discusses the attributes of color, how to use color effectively,
and some common mistakes in the use of color in data visualization. Chapter 5 covers the
important topic of visualizing and describing variability that occurs in observed values. It
introduces the visualization of frequency distributions for categorical and quantitative
variables, measures of location and variability, and confidence intervals and prediction intervals.
Chapters 6 and 7 cover how to explore and explain with data visualization in detail with
examples. Chapter 6 discusses the use of visualization in exploratory data analysis. The
exploration of individual variables as well as the relationship between pairs of variables is
considered. The organization of data to facilitate exploration is discussed as well as the effect of
missing data. The special considerations of visualizing time series data and geospatial data
are also presented. Chapter 7 provides important coverage of how to explain and influence
with data visualization, including knowing your message, understanding the needs of your
audience, and using preattentive attributes to better convey your message. Chapter 8 is a
discussion of how to design and construct data dashboards, collections of data visualizations
used for decision making. Finally, Chapter 9 covers the responsible use of data visualization
to avoid confusing or misleading your audience. Chapter 9 addresses the importance of
understanding your data in order to best convey insights accurately and also discusses how
design choices in a data visualization affect the insights conveyed to the audience.
This textbook can be used by students who have previously taken a basic statistics course
as well as by students who have not had a prior course in statistics. The two most
technical chapters, Chapters 5 (Visualizing Variability) and 6 (Exploring Data Visually), do not
assume a previous course in statistics. All technical concepts are gently introduced. For
students who have had a previous statistics class, the statistical coverage in these chapters
provides a good review within a treatment where the focus is on visualization. The book
offers complete coverage for a full course in data visualization, but it can also support a basic
statistics or analytics course. The following table gives our recommendations for chapters to
use to support a variety of courses.

Chapters: 1 Intro, 2 Chart Type, 3 Design, 4 Color, 5 Variability, 6 Exploring, 7 Explaining, 8 Dashboards, 9 Truth

Full Data Visualization Course: • • • • • • • • •
Data Visualization Course Focused on Presentation: • • • • • •
Part of a Basic Statistics Course: • • • • • •
Part of an Analytics Course: • • • • • •

Features and Pedagogy


The style and format of this textbook are similar to our other textbooks. Some of the specific
features that we use in this textbook are listed here.
● Data Visualization Makeover: With the exception of Chapter 1, each chapter contains a
Data Visualization Makeover. Each of these vignettes presents a real visualization that
can be improved using the principles discussed in the chapter. We present the original
data visualization and then discuss how it can be improved. The examples are drawn
from many different organizations in a variety of areas including government, retail,
sports, science, politics, and entertainment.
● Learning Objectives: Each chapter has a list of learning objectives of that chapter. The
list provides details of what students should be able to do and understand once they
have completed the chapter.
● Software: Because of its widespread use and ease of availability, we have chosen
Microsoft Excel as the software to illustrate the best practices and principles contained
herein. Excel has been thoroughly integrated throughout this textbook. Whenever we
introduce a new type of chart or table, we provide detailed step-by-step instructions
for how to create the chart or table in Excel. Step-by-step instructions for creating
many of the charts and tables from the textbook using Tableau and Power BI are also
available in MindTap.
● Notes and Comments: At the end of many sections, we provide Notes and Comments
to give the student additional insights about the material presented in that section.
Additionally, margin notes are used throughout the textbook to provide insights and
tips related to the specific material being discussed.
● End-of-Chapter Problems: Each chapter contains at least 15 problems to help the
student master the material presented in that chapter. The problems are separated into
Conceptual and Applications problems. Conceptual problems test the student’s
understanding of concepts presented in the chapter. Applications problems are hands-on and
require the student to construct or edit charts or tables.
● DATAfiles and CHARTfiles: All data sets used as examples and in end-of-chapter
problems are Excel files designated as DATAfiles and are available for download by
the student. The names of the DATAfiles are called out in margin notes throughout the
textbook. Similarly, some Excel files with completed charts are available for download
and are designated as CHARTfiles.

MindTap
MindTap is a customizable digital course solution that includes an interactive eBook,
auto-graded exercises and problems from the textbook with solutions feedback, interactive
visualization applets with quizzes, chapter overview and problem walk-through videos, and
more! MindTap also includes step-by-step instructions for creating charts and tables from
the textbook in Tableau and Power BI. Contact your Cengage account executive for more
information about MindTap.

Instructor and Student Resources


Additional instructor and student resources for this product are available online. Instructor
assets include an Instructor’s Manual, Educator’s Guide, PowerPoint® slides, a Solutions
and Answers Guide, and a test bank powered by Cognero®. Student assets include data sets.
Sign up or sign in at www.cengage.com to search for and access this product and its online
resources.

ACKNOWLEDGMENTS
We would like to acknowledge the work of the reviewers who provided comments and
suggestions for improving this first edition of the text. Thanks to:
Xiaohui Chang, Oregon State University
Wei Chen, York College of Pennsylvania
Anjee Gorkhali, Susquehanna University
Rita Kumar, Cal Poly Pomona
Barin Nag, Towson University
Andy Olstad, Oregon State University
Vivek Patil, Gonzaga University
Nolan Taylor, Indiana University

We are also indebted to the entire team at Cengage who worked on this title: Senior
Product Manager, Aaron Arnsparger; Senior Content Manager, Conor Allen; Senior Learning
Designer, Brandon Foltz; Digital Delivery Lead, Mark Hopkinson; Associate Subject-Matter
Expert, Nancy Marchant; Content Program Manager, Jessica Galloway; Content Quality
Assurance Engineer, Douglas Marks; and our Senior Project Manager at MPS Limited,
Anubhav Kaushal, for their editorial counsel and support during the preparation of this text.
The following Technical Content Developers worked on the MindTap content for this
text: Anthony Bacon, Philip Bozarth, Sam Gallagher, Anna Geyer, Matthew Holmes, and
Christopher Kurt. Our thanks to them as well.

Jeffrey D. Camm
James J. Cochran
Michael J. Fry
Jeffrey W. Ohlmann
Chapter 1
Introduction
Contents

1-1 ANALYTICS
1-2 WHY VISUALIZE DATA?
Data Visualization for Exploration
Data Visualization for Explanation
1-3 TYPES OF DATA
Quantitative and Categorical Data
Cross-Sectional and Time Series Data
Big Data
1-4 DATA VISUALIZATION IN PRACTICE
Accounting
Finance
Human Resource Management
Marketing
Operations
Engineering
Sciences
Sports
SUMMARY
GLOSSARY
PROBLEMS

LEARNING OBJECTIVES
After completing this chapter, you will be able to

LO 1 Define analytics and describe the different types of analytics used in practice
LO 2 Describe the different types of data and give an example of each
LO 3 Describe various examples of data visualization
LO 4 Identify the various charts defined in this chapter

You need a ride to a concert, so you select the Uber app on your phone. You enter the
location of the concert. Your phone automatically knows your location and the app presents
several options with prices. You select an option and confirm with your driver. You receive
the driver’s name, license plate number, make and model of vehicle, and a photograph of
the driver and the car. A map showing the location of the driver and the time remaining
until arrival is updated in real time.
Without even thinking about it, we continually use data to make decisions in our lives.
How the data are displayed to us has a direct impact on how much effort we must expend
to utilize the data. In the case of Uber, we enter data (our destination) and we are presented
with data (prices) that allow us to make an informed decision. We see the result of our
decision with an indication of the driver’s name, make and model of vehicle, and license
plate number that makes us feel more secure. Rather than simply displaying the time until
arrival, seeing the progress of the car on a map gives us some indication of the driver’s
route. Watching the driver’s progress on the app removes some uncertainty and to some
extent can divert our attention from how long we have been waiting. What data are
presented and how they are presented has an impact on our ability to understand the situation
and make more-informed decisions.
A weather map, an airplane seating chart, the dashboard of your car, a chart of the
performance of the Dow Jones Industrial Average, and your fitness tracker all involve
the visual display of data. Data visualization is the graphical representation of data and
information using displays such as charts, graphs, and maps. Our ability to process
information visually is strong. For example, numerical data that have been displayed in a chart,
graph, or map allow us to more easily see relationships between variables in our data set.
Trends, patterns, and the distributions of data are more easily comprehended when data are
displayed visually.
This book is about how to effectively display data to both discover and describe the
information the data contain. We provide best practices in the design of visual displays of
data, the effective use of color, and chart type selection. The goal of this book is to instruct
you how to create effective data visualizations. Through the use of examples (using real
data when possible), this book presents visualization principles and guidelines for gaining
insight from data and conveying an impactful message to the audience.
With the increased use of analytics in business, industry, science, engineering, and
government, data visualization has increased dramatically in importance. We begin with a
discussion of analytics and data visualization’s role in this rapidly growing field.

1-1 Analytics
Analytics is the scientific process of transforming data into insights for making better
decisions.1 Three developments have spurred the explosive growth in the use of analytics
for improving decision making in all facets of our lives, including business, sports, science,
medicine, and government:
●● Incredible amounts of data are produced by technological advances such as
point-of-sale scanner technology; e-commerce and social networks; sensors on all kinds
of mechanical devices such as aircraft engines, automobiles, thermometers, and
farm machinery enabled by the so-called Internet of Things; and personal electronic
devices such as cell phones. Businesses naturally want to use these data to improve
the efficiency and profitability of their operations, better understand their customers,
and price their products more effectively and competitively. Scientists and engineers
use these data to invent new products, improve existing products, and make new
basic discoveries about nature and human behavior.

1 We adopt the definition of analytics developed by the Institute for Operations Research and the Management
Sciences (INFORMS).
4 Chapter 1 Introduction

●● Ongoing research has resulted in numerous methodological developments, including
advances in computational approaches to effectively handle and explore massive
amounts of data as well as faster algorithms for data visualization, machine learning,
optimization, and simulation.
●● The explosion in computing power and storage capability through better computing
hardware, parallel computing, and cloud computing (the remote use of hardware and
software over the internet) enables us to solve larger decision problems more quickly
and more accurately than ever before.
In summary, the availability of massive amounts of data, improvements in analytical meth-
ods, and substantial increases in computing power and storage have enabled the explosive
growth in analytics, data science, and artificial intelligence.
Analytics can involve techniques as simple as reports or as complex as large-scale opti-
mizations and simulations. Analytics is generally grouped into three broad categories of
methods: descriptive, predictive, and prescriptive analytics.
Descriptive analytics is the set of analytical tools that describe what has happened.
This includes techniques such as data queries (requests for information with certain charac-
teristics from a database), reports, descriptive or summary statistics, and data visualization.
Descriptive data mining techniques such as cluster analysis (grouping data points with
similar characteristics) also fall into this category. In general, these techniques summarize
existing data or the output from predictive or prescriptive analyses.
Predictive analytics consists of techniques that use mathematical models constructed
from past data to predict future events or better understand the relationships between vari-
ables. Techniques in this category include regression analysis, time series forecasting,
computer simulation, and predictive data mining. As an example of a predictive model, past
weather data are used to build mathematical models that forecast future weather. Likewise,
past sales data can be used to predict future sales for seasonal products such as snowblow-
ers, winter coats, and bathing suits.
Prescriptive analytics are mathematical or logical models that suggest a decision
or course of action. This category includes mathematical optimization models, decision
analysis, and heuristic or rule-based systems. For example, solutions to supply network
optimization models provide insights into the quantities of a company’s various products
that should be manufactured at each plant, how much should be shipped to each of the
company’s distribution centers, and which distribution center should serve each customer
to minimize cost and meet service constraints.
Data visualization is mission-critical to the success of all three types of analytics. We
discuss this in more detail with examples in the next section.

1-2 Why Visualize Data?


We create data visualizations for two reasons: exploring data and communicating/explaining a
message. Let us discuss these uses of data visualization in more detail, examine the differences
in the two uses, and consider how they relate to the types of analytics previously described.

Data Visualization for Exploration


Data visualization is a powerful tool for exploring data to more easily identify patterns,
recognize anomalies or irregularities in the data, and better understand the relationships
between variables. Our ability to spot these types of characteristics of data is much stronger
and quicker when we look at a visual display of the data rather than a simple listing.
As an example of data visualization for exploration, let us consider the zoo attendance
data shown in Table 1.1 and Figure 1.1. These data on monthly attendance to a zoo can be
found in the file Zoo. Comparing Table 1.1 and Figure 1.1, observe that the pattern in the data
is more detectable in the column chart of Figure 1.1 than in a table of numbers. A column
chart shows numerical data by the height of the column for a variety of categories or time
periods. In the case of Figure 1.1, the time periods are the different months of the year.

In Chapter 2, we introduce a variety of different chart types and how to construct charts
in Excel.

TABLE 1.1 Zoo Attendance Data


Month        Jan    Feb    Mar    Apr    May    Jun
Attendance   5422   4878   6586   6943   7876   17843

Month        July   Aug    Sept   Oct    Nov    Dec
Attendance   21967  14542  8751   6454   5677   11422

FIGURE 1.1 A Column Chart of Zoo Attendance by Month

[Column chart: vertical axis Attendance (0 to 25,000); horizontal axis Month (Jan through Dec)]

Our intuition and experience tell us that we would expect zoo attendance to be highest
in the summer months when many school-aged children are out of school for summer
break. Figure 1.1 confirms this, as the attendance at the zoo is highest in the summer
months of June, July, and August. Furthermore, we see that attendance increases gradually
each month from February through May as the average temperature increases, and atten-
dance gradually decreases each month from September through November as the average
temperature decreases. But why does the zoo attendance in December and January not fol-
low these patterns? It turns out that the zoo has an event known as the “Festival of Lights”
that runs from the end of November through early January. Children are out of school
during the last half of December and early January for the holiday season, and this leads to
increased attendance in the evenings at the zoo despite the colder winter temperatures.
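
To make the comparison between a table and a chart concrete, the following is a minimal sketch that draws a rough text version of the column chart from the Table 1.1 values. The function name `text_column_chart` and the bar width of 40 characters are our own illustrative choices, not part of the Zoo file or of Excel.

```python
# Monthly zoo attendance, copied from Table 1.1.
attendance = {
    "Jan": 5422, "Feb": 4878, "Mar": 6586, "Apr": 6943,
    "May": 7876, "Jun": 17843, "July": 21967, "Aug": 14542,
    "Sept": 8751, "Oct": 6454, "Nov": 5677, "Dec": 11422,
}


def text_column_chart(data, width=40):
    """Return one text bar per category, scaled so the largest value fills `width` characters."""
    largest = max(data.values())
    lines = []
    for label, value in data.items():
        bar = "#" * round(value / largest * width)
        lines.append(f"{label:<5}{bar} {value}")
    return lines


chart_lines = text_column_chart(attendance)
print("\n".join(chart_lines))

# The month with the highest attendance is easy to pick out from the bars.
peak_month = max(attendance, key=attendance.get)
print("Peak month:", peak_month)
```

Even in this crude form, the summer peak and the December bump are easier to see in the bars than in the raw numbers.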
Visual data exploration is an important part of descriptive analytics. Data visualization
can also be used directly to monitor key performance metrics, that is, to measure how an
organization is performing relative to its goals. A data dashboard is a data visualization
tool that gives multiple outputs and may update in real time. Just as the dashboard in your
car measures the speed, engine temperature, and other important performance data as you
drive, corporate data dashboards measure performance metrics such as sales, inventory
levels, and service levels relative to the goals set by the company. These data dashboards
alert management when performances deviate from goals so that corrective actions can
be taken.

Data dashboards are discussed in more detail in Chapter 8.

Visual data exploration is also critical for ensuring that model assumptions hold in predictive
and prescriptive analytics. Understanding the data before using that data in modeling builds
trust and can be important in determining and explaining which type of model is appropriate.

As an example of the importance of exploring data visually before modeling, we consider
two data sets provided by statistician Francis Anscombe.2 Table 1.2 contains these
two data sets, each of which contains 11 X-Y pairs of data. Notice in Table 1.2 that both
data sets have the same average values for X and Y, and both sets of X and Y also have the
same standard deviations. Based on these commonly used summary statistics, these two
data sets are indistinguishable.
Figure 1.2 shows the two data sets visually as scatter charts. A scatter chart is a
graphical presentation of the relationship between two quantitative variables. One variable
is shown on the horizontal axis and the other is shown on the vertical axis. Scatter charts
are used to better understand the relationship between the two variables under consider-
ation. Even though the two different data sets have the same average values and standard
deviations of X and Y, the respective relationships between X and Y are different.
A scatter chart is often referred to as a scatter plot.

One of the most commonly used predictive models is linear regression, which involves
finding the best-fitting line to the data. In the graphs in Figure 1.2, we show the
best-fitting lines for each data set. Notice that the lines are the same for each data set. In
fact, the measure of how well the line fits the data (expressed by a statistic labeled R²)
is the same (67% of the variation in the data is explained by the line). Yet, as we can see
because we have graphed the data, in Figure 1.2a, fitting a straight line looks appropriate
for the data set. However, as shown in Figure 1.2b, a line is not appropriate for data set 2.
We will need to find a different, more appropriate mathematical equation for data set 2.
The line shown in Figure 1.2 for data set 2 would likely dramatically overestimate values
of Y for values of X less than 5 or greater than 14.
Hence, before applying predictive and prescriptive analytics, it is always best to visually
explore the data to be used. This helps the analyst avoid misapplying more complex tech-
niques and reduces the risk of poor results.
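
The claim that the two data sets share the same summary statistics and the same fitted line can be checked numerically. The sketch below, which uses the values in Table 1.2, computes the averages, sample standard deviations, least-squares line, and R² for each data set. The function name `summarize` is our own, and the line is fit with the ordinary least-squares formulas rather than with any particular software package.

```python
from statistics import mean, stdev

# X values are shared by both data sets in Table 1.2.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]


def summarize(x, y):
    """Return summary statistics and the least-squares line for paired data."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return {
        "mean_x": x_bar,
        "mean_y": y_bar,
        "sd_x": stdev(x),
        "sd_y": stdev(y),
        "slope": slope,
        "intercept": intercept,
        "r_squared": sxy ** 2 / (sxx * syy),
    }


for name, y in (("Data Set 1", y1), ("Data Set 2", y2)):
    stats = summarize(x, y)
    print(name, {key: round(value, 3) for key, value in stats.items()})
```

Both data sets produce essentially the same numbers, which is exactly why the scatter charts are needed to tell them apart.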

TABLE 1.2 Two Data Sets from Anscombe


                       Data Set 1      Data Set 2
                           X       Y       X       Y
                          10    8.04      10    9.14
                           8    6.95       8    8.14
                          13    7.58      13    8.74
                           9    8.81       9    8.77
                          11    8.33      11    9.26
                          14    9.96      14    8.10
                           6    7.24       6    6.13
                           4    4.26       4    3.10
                          12   10.84      12    9.13
                           7    4.82       7    7.26
                           5    5.68       5    4.74
Average                    9   7.501       9   7.501
Standard Deviation     3.317   2.032   3.317   2.032

2 Anscombe, F. J., “The Validity of Comparative Experiments,” Journal of the Royal Statistical Society, Vol. 11,
No. 3, 1948, pp. 181–211.

FIGURE 1.2 Anscombe’s Data Displayed Graphically

[Two scatter charts with fitted lines, (a) Data Set 1 and (b) Data Set 2: vertical axis Y (0 to 12); horizontal axis X (0 to 16); each panel shows the fitted line y = 0.5x + 3.00 with R² = 0.67]

Data Visualization for Explanation


Data visualization is also important for explaining relationships found in data and for
explaining the results of predictive and prescriptive models. More generally, data visual-
ization is helpful in communicating with your audience and ensuring that your audience
understands and focuses on your intended message.
Let us consider the article, “Check Out the Culture Before a New Job,” which appeared
in The Wall Street Journal.3 The article discusses the importance of finding a good cultural
fit when seeking a new job. Difficulty in understanding a corporate culture or misalignment
with that culture can lead to job dissatisfaction. Figure 1.3 is a re-creation of a bar chart
that appeared in this article. A bar chart shows a summary of categorical data using the
length of horizontal bars to display the magnitude of a quantitative variable.
The chart shown in Figure 1.3 shows the percentage of the 10,002 survey respondents
who listed a factor as the most important in seeking a job. Notice that our
attention is drawn to the dark blue bar, which is “Company culture” (the focus of the
article). We immediately see that only “Salary and bonus” is more frequently cited
than “Company culture.” When you first glance at the chart, the message that is
communicated is that corporate culture is the second most important factor cited by job
seekers. And as a reader, based on that message, you then decide whether the article is
worth reading.

The effective use of color is discussed in more detail in Chapter 4.

3 Lublin, J. S. “Check Out the Culture Before a New Job,” The Wall Street Journal, January 16, 2020.

FIGURE 1.3 A Bar Chart of Survey Results of Job Seekers

What matters most to you when deciding which job to take next?

Salary and Bonus 24%

Company Culture 22%

Location 13%

Flexible Schedule 11%

Day-to-day Work 11%

Industry 8%

Job Title 6%

Health Care Benefits 5%

1-3 Types of Data


Some types of charts are more effective than others for certain types of data. For that
reason, let us discuss the different types of data you might encounter.
Table 1.3 contains information on the 30 companies that make up the Dow Jones
Industrial Index (DJI). The table contains the company name, the stock symbol, the industry
type, the share price, and the volume (number of shares traded). We will use the data
contained in Table 1.3 to facilitate our discussion.

The Dow Jones Industrial Average is a stock market index. It was created in 1896 by
Charles Dow. The 30 companies that are included in the Dow change periodically to
reflect changes in major corporations in the United States.

Quantitative and Categorical Data
Quantitative data are data for which numerical values are used to indicate magnitude,
such as how many or how much. Arithmetic operations, such as addition, subtraction,
multiplication, and division, can be performed on quantitative data. For instance,
we can sum the values for Volume in Table 1.3 to calculate a total volume of all
shares traded by companies included in the Dow, because Volume is a quantitative
variable.
Categorical data are data for which categories of like items are identified by labels or
names. Arithmetic operations cannot be performed on categorical data. We can summarize
categorical data by counting the number of observations or computing the proportions of
observations in each category. For instance, the data in the Industry column in Table 1.3
are categorical. We can count the number of companies in the Dow that are, for example,
in the food industry. Table 1.3 shows two companies in the food industry: Coca-Cola and
McDonald’s. However, we cannot perform arithmetic operations directly on the data in the
Industry column.
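
The difference between the two types of data shows up in what we can compute. The following is a minimal sketch using five rows copied from Table 1.3 (a subset chosen only for brevity): the quantitative Volume values are summed, while the categorical Industry values are counted. The variable names are our own.

```python
from collections import Counter

# (company, industry, volume) for five of the rows in Table 1.3.
rows = [
    ("Apple Inc.", "Technology", 32_470_017),
    ("Coca-Cola", "Food", 13_294_556),
    ("McDonald's", "Food", 4_361_094),
    ("Microsoft", "Technology", 41_243_284),
    ("Walmart", "Retailing", 9_390_287),
]

# Quantitative data: arithmetic such as addition is meaningful.
total_volume = sum(volume for _, _, volume in rows)

# Categorical data: we summarize by counting observations in each category.
industry_counts = Counter(industry for _, industry, _ in rows)

print("Total volume for these rows:", total_volume)
for industry, count in industry_counts.items():
    print(industry, count, f"{count / len(rows):.0%}")
```

Summing the Industry column would be meaningless, but counts and proportions by industry are exactly the kind of summary a bar or column chart can display.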

TABLE 1.3 Data for the Dow Jones Industrial Index Companies (April 3, 2020)

Company Symbol Industry Share Price ($) Volume
Apple Inc. AAPL Technology 241.41 32,470,017
American Express AXP Financial Services 73.6 9,902,194
Boeing BA Manufacturing 124.52 36,489,379
Caterpillar Inc. CAT Manufacturing 114.67 4,803,174
Cisco Systems CSCO Technology 39.06 21,235,157
Chevron CVX Petroleum 75.11 14,317,998
Disney DIS Entertainment 93.88 14,592,062
Goldman Sachs GS Financial Services 146.93 2,773,298
Home Depot, Inc. HD Retailing 178.7 6,762,357
IBM IBM Technology 106.34 3,909,196
Intel Corporation INTC Technology 54.13 23,906,062
Johnson & Johnson JNJ Pharmaceutical 134.17 9,409,033
JPMorgan Chase JPM Financial Services 84.05 20,363,095
Coca-Cola KO Food 43.83 13,294,556
McDonald’s MCD Food 160.33 4,361,094
3M Company MMM Conglomerate 133.79 3,461,642
Merck & Co. MRK Pharmaceutical 76.25 9,181,539
Microsoft MSFT Technology 153.83 41,243,284
Nike NKE Apparel 78.86 8,297,443
Pfizer PFE Pharmaceutical 33.64 30,306,371
Procter & Gamble PG Consumer Goods 115.08 7,520,086
Travelers TRV Financial Services 93.89 1,595,000
UnitedHealth Group UNH Healthcare 229.49 4,356,992
Raytheon UTX Conglomerate 86.01 13,203,254
Visa V Financial Services 151.85 11,649,519
Verizon VZ Telecommunication 54.7 16,304,703
Walgreens WBA Retailing 40.72 6,489,129
Walmart WMT Retailing 119.48 9,390,287
Exxon Mobil XOM Petroleum 39.21 48,094,821

Cross-Sectional and Time Series Data


We distinguish between cross-sectional data and time series data. Cross-sectional data
are collected from several entities at the same or approximately the same point in time. The
data in Table 1.3 are cross-sectional because they describe the 30 companies that comprise
the Dow at the same point in time (April 2020).
Time series data are data collected over several points in time (minutes, hours,
days, months, years, etc.). Graphs of time series data are frequently found in business,
economic, and science publications. Such graphs help analysts understand what hap-
pened in the past, identify trends over time, and project future levels for the time series.
For example, the graph of the time series in Figure 1.4 shows the DJI value from January
2010 to April 2020. The graph shows the upward trend of the DJI value from 2010
to 2020, when there was a steep decline in value due to the economic impact of the
COVID-19 pandemic.

Big Data
There is no universally accepted definition of big data. However, probably the most general
definition of big data is any set of data that is too large or too complex to be handled by
standard data-processing techniques using a typical desktop computer. People refer to the
four Vs of big data:
●● volume—the amount of data generated
●● velocity—the speed at which the data are generated
●● variety—the diversity in types and structures of data generated

●● veracity—the reliability of the data generated

Volume and velocity can pose a challenge for processing analytics, including data visual-
ization. Special data management software such as Hadoop and higher capacity hardware
(increased server or cloud computing) may be required. The variety of the data is handled
by converting video, voice, and text data to numerical data, to which we can then apply
standard data visualization techniques.
In summary, the type of data you have will influence the type of graph you should use to
convey your message. The zoo attendance data in Figure 1.1 are time series data. We used
a column chart in Figure 1.1 because the numbers are the total attendance for each month,
and we wanted to compare the attendance by month. The height of the columns allows us
to easily compare attendance by month. Contrast Figure 1.1 with Figure 1.4, which is also
time series data. Here we have the value of the Dow Jones Index. These data are a snapshot
of the current value of the DJI on the first trading day of each month. They provide what is
essentially a time path of the value, and so we use a line graph to emphasize the continuity
of time.

How to select an effective chart type is discussed in more detail in Chapter 2.

FIGURE 1.4 Dow Jones Index Values from January 2010 to April 2020

[Line chart: vertical axis DJI Value (0 to 30,000); horizontal axis date, with yearly tick labels from 2010 to 2020]


1-4 Data Visualization in Practice


Data visualization is used to explore and explain data and to guide decision making in
all areas of business and science. Even the most analytically advanced companies such
as Google, Uber, and Amazon rely heavily on data visualization. Consumer goods giant
Procter & Gamble (P&G), the maker of household brands such as Tide, Pampers, Crest,
and Swiffer, has invested heavily in analytics, including data visualization. P&G has
built what it calls the Business Sphere™ in more than 50 of its sites around the world.
The Business Sphere is a conference room with technology for displaying data visual-
izations on its walls. The Business Sphere displays data and information P&G executives
and managers can use to make better-informed decisions. Let us briefly discuss some
ways in which the functional areas of business, engineering, science, and sports use data
visualization.

Accounting
Accounting is a data-driven profession. Accountants prepare financial statements and
examine financial statements for accuracy and conformance to legal regulations and best
practices, including reporting required for tax purposes. Data visualization is a part of
every accountant’s tool kit. Data visualization is used to detect outliers that could be an
indication of a data error or fraud. As an example of data visualization in accounting, let us
consider Benford’s Law.
Benford’s Law, also known as the First-Digit Law, gives the expected probability that
the first digit of a reported number takes on the values one through nine, based on many
real-life numerical data sets such as company expense accounts. A column chart displaying
Benford’s Law is shown in Figure 1.5. We have rounded the probabilities to four digits. We
see, for example, that the probability of the first digit being a 1 is 0.3010. The probability
of the first digit being a 2 is 0.1761, and so forth.
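
The probabilities quoted above can be reproduced from the usual statement of Benford’s Law, under which the probability that the first digit equals d is log10(1 + 1/d). The formula is not given in the text here, so treat the following as a sketch under that standard assumption; the function name `benford_probability` is our own.

```python
import math


def benford_probability(digit):
    """Probability that the first digit equals `digit` (1 through 9) under Benford's Law."""
    return math.log10(1 + 1 / digit)


# Round to four digits, as in Figure 1.5.
benford = {d: round(benford_probability(d), 4) for d in range(1, 10)}

for digit, probability in benford.items():
    print(digit, f"{probability:.4f}")
```

The nine unrounded probabilities sum to one, since the terms telescope to log10(10).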

FIGURE 1.5 A Column Chart Showing Benford’s Law

Benford’s Law: The Probability of the First Digit

First Digit   1       2       3       4       5       6       7       8       9
Probability   0.3010  0.1761  0.1249  0.0969  0.0792  0.0669  0.0580  0.0512  0.0458

Benford’s Law can be used to detect fraud. If the first digits of numbers in a data set
do not conform to Benford’s Law, then further investigation of fraud may be warranted.
Consider the accounts payable (money owed by the company) for Tucker Software. Figure 1.6
is a clustered column chart (also known as a side-by-side column chart). A clustered
column chart is a column chart that shows multiple variables of interest on the same
chart, with the different variables usually denoted by different colors or shades of a color.
In Figure 1.6, the two variables are Benford’s Law probability and the first digit data for a
random sample of 500 of Tucker’s accounts payable entries. The frequency of occurrence
in the data is used to estimate the probability of the first digit for all of Tucker’s accounts
payable entries. It appears that there are an inordinate number of first digits of 5 and 9 and
a lower than expected number of first digits of 1. These might warrant further investigation
by Tucker’s auditors.
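
The comparison behind a chart like Figure 1.6 can be sketched in a few lines: extract the first digit of each entry, compute the observed relative frequencies, and set them beside the Benford probabilities. The amounts below are hypothetical values invented for illustration (they are not Tucker Software’s data), the function names are our own, and the Benford probabilities use the standard formula log10(1 + 1/d).

```python
import math
from collections import Counter


def first_digit(amount):
    """Return the first nonzero digit of a nonzero number."""
    for character in str(abs(amount)):
        if character in "123456789":
            return int(character)
    raise ValueError("amount has no nonzero digit")


def compare_to_benford(amounts):
    """Return (digit, expected, observed, difference) for digits 1 through 9."""
    counts = Counter(first_digit(a) for a in amounts)
    n = len(amounts)
    rows = []
    for d in range(1, 10):
        expected = math.log10(1 + 1 / d)
        observed = counts[d] / n
        rows.append((d, expected, observed, observed - expected))
    return rows


# Hypothetical accounts payable amounts, for illustration only.
sample_amounts = [512.40, 1290.00, 95.10, 530.25, 9870.00,
                  145.75, 5020.00, 912.30, 233.18, 58.99]

comparison = compare_to_benford(sample_amounts)
for digit, expected, observed, difference in comparison:
    print(digit, f"{expected:.4f}", f"{observed:.4f}", f"{difference:+.4f}")
```

Large positive differences for particular digits are the kind of signal that might prompt a closer look, although a sample as small as this one would not support any conclusion by itself.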

FIGURE 1.6 A Clustered Column Chart Showing Benford’s Law versus Tucker Software’s Accounts Payable Entries

[Clustered column chart titled “Benford’s Law versus Tucker Software Accounts Payable”: vertical axis Probability (0.00 to 0.35); horizontal axis First Digit (1 through 9); series Benford and Tucker]

Finance
Like accounting, the area of business known as finance is numerical and data-driven.
Finance is the area of business concerned with investing. Financial analysts, also known
as “quants,” use massive amounts of financial data to decide when to buy and sell certain
stocks, bonds, and other financial instruments. Data visualization is useful in finance for
recognizing trends, assessing risk, and tracking actual versus forecasted values of metrics
of concern.
Yahoo! Finance and other websites allow you to download daily stock price data. As an
example, the file Verizon has five days of stock prices for telecommunications company
Verizon Wireless. Each of the five observations includes the date, the high share price for
that day, the low share price for that day, and the closing share price for that day. Excel has
several charts designed for tracking stock performance with such data. Figure 1.7 displays
these data in a high-low-close stock chart, a chart that shows the high value, low value,
and closing value of the price of a share of stock over time. For each date shown, the bar
indicates the range of the stock price per share on that day, and the labelled point on the
bar indicates closing price per share for that day. The chart shows how the closing price is
changing over time and the volatility of the price on each day.

We discuss high-low-close stock charts in more detail in Chapter 2.

FIGURE 1.7 A High-Low-Close Stock Chart for Verizon Wireless

[High-low-close stock chart titled “Verizon Wireless Stock Price per Share Performance”: vertical axis Price per Share ($) (55.50 to 59.50); horizontal axis dates 20-Apr through 24-Apr; closing-price labels shown: 58.13, 57.99, 57.93, 57.59, 56.82]

Human Resource Management


Human resource management (HRM) is the part of an organization that focuses on an orga-
nization’s recruitment, training, and retention of employees. With the increased use of ana-
lytics in business, HRM has become much more data-driven. Indeed, HRM is sometimes
now referred to as “people analytics.” HRM professionals use data and analytical models to
form high-performing teams, monitor productivity and employee performance, and ensure
diversity of the workforce. Data visualization is an important component of HRM, as HRM
professionals use data dashboards to monitor relevant data supporting their goal of having
a high-performing workforce.
A key interest of HRM professionals is employee churn, or turnover in an organiza-
tion’s workforce. When employees leave and others are hired, there is often a loss of pro-
ductivity as positions go unfilled. Also, new employees typically have a training period
and then must gain experience, which means employees will not be fully productive at
the beginning of their tenure with the company. Figure 1.8, a stacked column chart, is an
example of a visual display of employee turnover. It shows gains and losses of employees
by month. A stacked column chart is a column chart that shows part-to-whole compari-
sons, either over time or across categories. Different colors or shades of color are used to
denote the different parts of the whole within a column. In Figure 1.8, gains in employees
(new hires) are represented by positive numbers in darker blue and losses (people leaving
the company) are represented by negative numbers in lighter blue. We see that January
and July–October are the months during which the greatest numbers of employees left the
company, and the months with the highest numbers of new hires are April through June.
Visualizations like Figure 1.8 can be helpful in better understanding and managing work-
force fluctuations.

FIGURE 1.8 A Stacked Column Chart of Employee Turnover by Month

[Stacked column chart: vertical axis Number of Employees (–30 to 60); horizontal axis Month (Jan through Dec); series Gains and Losses]

Marketing
Marketing is one of the most popular application areas of analytics. Analytics is used
for optimal pricing, markdown pricing for seasonal goods, and optimal allocation of
marketing budget. Sentiment analysis using text data such as tweets, social network
analysis to determine influence, and website analytics for understanding website traffic
and sales are just a few examples of how data visualization can be used to support more
effective marketing.
Let us consider a software company’s website effectiveness. Figure 1.9 shows a funnel
chart of the conversion of website visitors to subscribers and then to renewal customers.
A funnel chart is a chart that shows the progression of a numerical variable for various
categories from larger to smaller values. In Figure 1.9, at the top of the funnel, we track
100% of the first-time visitors to the website over some period of time, for example, a
six-month period. The funnel chart shows that of those original visitors, 74% return to
the website one or more times after their initial visit. Sixty-one percent of the first-time
visitors downloaded a 30-day trial version of the software, 47% eventually contacted
support services, 28% purchased a one-year subscription to the software, and 17%
eventually renewed their subscription. This type of funnel chart can be used to compare the
conversion effectiveness of different website configurations, the use of bots, or changes in
support services.

Funnel charts are discussed in more detail in Chapter 2.
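
The percentages reported for Figure 1.9 are all measured against the original first-time visitors. A complementary view is the stage-to-stage conversion rate, sketched below. This assumes that each stage is a subset of the stage above it, which the funnel layout suggests but the text does not state; the function name `step_conversion` is our own.

```python
# Percent of first-time visitors reaching each stage, from Figure 1.9.
stages = [
    ("Visited the Website", 100),
    ("Returned to the Website", 74),
    ("Downloaded a Trial Version", 61),
    ("Contacted Support", 47),
    ("Subscribed", 28),
    ("Renewed", 17),
]


def step_conversion(stages):
    """Return (stage name, share of the previous stage that reached this stage)."""
    rates = []
    for (_, previous_pct), (name, pct) in zip(stages, stages[1:]):
        rates.append((name, pct / previous_pct))
    return rates


rates = step_conversion(stages)
for name, rate in rates:
    print(f"{name}: {rate:.1%} of the previous stage")

# The stage with the lowest step conversion is a natural place to investigate.
weakest_stage = min(rates, key=lambda item: item[1])[0]
```

Viewed this way, the step with the lowest rate indicates where the largest relative drop-off occurs, which is useful when comparing website configurations.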

FIGURE 1.9 A Funnel Chart of Website Conversions for a Software Company

Visited the Website 100%
Returned to the Website 74%
Downloaded a Trial Version 61%
Contacted Support 47%
Subscribed 28%
Renewed 17%

Operations
As in marketing, analytics is used heavily in managing the operations function of
business. Operations management is concerned with the management of the production and
distribution of goods and services. It includes responsibility for planning and scheduling,
inventory planning, demand forecasting, and supply chain optimization. Figure 1.10
shows time series data for monthly unit sales for a product (measured in thousands of
units sold). Each period corresponds to one month. So that a cost-effective production
schedule can be developed, an operations manager might have responsibility for
forecasting the monthly unit sales for the next twelve months (periods 37–48). In looking at
the time series data in Figure 1.10, it appears that there is a repeating pattern and units
sold might also be increasing slightly over time. The operations manager can use these
observations to help decide which forecasting techniques to test to arrive at reasonable
forecasts for periods 37–48.

FIGURE 1.10 Time Series Data for Unit Sales of a Product

[Time series chart: vertical axis Sales (1000s units) (0 to 3000); horizontal axis Month (0 to 40)]

Engineering
Engineering relies heavily on mathematics and data. Hence, data visualization is an impor-
tant technique in every engineer’s toolkit. For example, industrial engineers monitor the
production process to ensure that it is “in control” or operating as expected. A control
chart is a graphical display that is used to help determine if a production process is in
control or out of control. A variable of interest is plotted over time relative to lower and
upper control limits. Consider the control chart for the production of 10-pound bags of dog
food shown in Figure 1.11. Every minute, a bag is diverted from the line and automatically
weighed. The result is plotted along with lower and upper control limits obtained statisti-
cally from historical data. When the points are between the lower and upper control limits,
the process is considered to be in control. When points begin to appear outside the control
limits with some regularity and/or when large swings start to appear as in Figure 1.11, this
is a signal to inspect the process and make any necessary corrections.
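
The logic of a control chart can be sketched as follows. The text says only that the limits are obtained statistically from historical data; the sketch assumes the common convention of placing them three sample standard deviations on either side of the historical mean. All weights below are hypothetical values invented for illustration, and the function names are our own.

```python
from statistics import mean, stdev


def control_limits(historical, sigmas=3):
    """Return (lower, upper) limits as mean -/+ `sigmas` sample standard deviations."""
    center = mean(historical)
    spread = stdev(historical)
    return center - sigmas * spread, center + sigmas * spread


def out_of_control(weights, lower, upper):
    """Return (minute, weight) pairs that fall outside the control limits."""
    return [(minute, weight)
            for minute, weight in enumerate(weights, start=1)
            if not lower <= weight <= upper]


# Hypothetical historical bag weights (pounds) from a period when the process ran as expected.
historical = [10.00, 10.01, 9.99, 10.02, 9.98, 10.00, 10.01, 9.99, 10.00, 10.00]
lower, upper = control_limits(historical)

# Hypothetical new measurements, one bag per minute.
new_weights = [10.00, 10.02, 9.97, 10.05, 9.93, 10.01]
flagged = out_of_control(new_weights, lower, upper)

print(f"Control limits: {lower:.3f} to {upper:.3f}")
print("Out-of-control points:", flagged)
```

Plotting the new weights against these two limits gives the kind of display shown in Figure 1.11; the flagged points are the ones that would fall outside the limit lines.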

FIGURE 1.11 A Quality Control Chart for Dog Food Production

[Control chart: vertical axis Weight (pounds) (9.90 to 10.10); horizontal axis Minute (1 through 15); horizontal lines mark the Upper Control Limit and the Lower Control Limit]

Sciences
The natural and social sciences rely heavily on the analysis of data and data visualization
for exploring data and explaining the results of analysis. In the natural sciences, data are
often geographic, so maps are used frequently. For example, the weather, pandemic hot
spots, and species distributions can be represented on a geographic map. Geographic maps
are not only used to display data, but also to display the results of predictive models. An
example of this is shown in Figure 1.12. Predicting the path a hurricane will follow is a
complicated problem. Numerous models, each with its own set of influencing variables
(also known as model features), yield different predictions. Displaying the results of each
model on a map gives a sense of the uncertainty in predicted paths across all models and
expands the alert to a broader range of the population than relying on a single model.
Because the multiple paths resemble pieces of spaghetti, this type of map is sometimes
referred to as a “spaghetti chart.” More generally, a spaghetti chart is a chart depicting
possible flows through a system using a line for each possible path.

FIGURE 1.12 A Spaghetti Chart of Hurricane Paths from Multiple Predictive Models

Sports
The use of analytics in sports has gained considerable attention since 2003, when
renowned author Michael Lewis published his book Moneyball. Lewis’s book tells how
the Oakland Athletics used an analytical approach for player evaluation to assemble a
competitive team using a limited budget. The use of analytics for player evaluation and on-
field strategy is now common throughout professional sports. Data visualization is a key
component of how analytics is applied in sports. It is common for coaches to have tablet
computers on the sideline that they use to make real-time decisions such as calling plays
and making player substitutions.
Figure 1.13 shows an example of how data visualization is used in basketball. A shot
chart is a chart that displays the location of the shots attempted by a player during a
basketball game with different symbols or colors indicating successful and unsuccessful
shots. Figure 1.13 shows shot attempts by NBA player Chris Paul, with a blue dot
indicating a successful shot and an orange x indicating a missed shot (source:
Basketball-Reference.com). Other NBA teams can utilize this chart to help devise
strategies for defending Chris Paul.

FIGURE 1.13 A Shot Chart for NBA Player Chris Paul
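
The data behind a shot chart can be thought of as a list of shot locations, each marked as made or missed. The Python sketch below separates such a list into the two groups that would be drawn with different symbols and computes the overall shooting percentage. The coordinates and the function split_shots are hypothetical; they are not the data plotted in Figure 1.13.

```python
# Hypothetical shot log: (x, y, made), with court coordinates in feet.
# These shots are invented for illustration only.
shots = [
    (2.0, 5.0, True),
    (-8.0, 14.0, False),
    (15.0, 20.0, True),
    (0.0, 24.0, False),
    (-3.0, 3.0, True),
]


def split_shots(shot_log):
    """Separate shot locations into made and missed lists for plotting."""
    made = [(x, y) for x, y, ok in shot_log if ok]
    missed = [(x, y) for x, y, ok in shot_log if not ok]
    return made, missed


made, missed = split_shots(shots)
shooting_pct = 100 * len(made) / len(shots)
print(f"Made: {len(made)}, missed: {len(missed)}, shooting {shooting_pct:.0f}%")
```

Each list could then be passed to a plotting tool, one drawn as dots and the other as x marks.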

NOTES + COMMENTS

Chart is considered a more general term than graph. For example, charts encompass
maps, bar charts, etc., but graphs generally refer to a chart of the type shown in
Figure 1.4 (a line chart). In this text, we use the terms chart and graph interchangeably.

SUMMARY

This introductory chapter began with a discussion of analytics, the scientific process of
transforming data into insights for making better decisions. We discussed the three types of
analytics: descriptive, predictive, and prescriptive. Descriptive analytics describes what has
happened and includes tools such as reports, data visualization, data dashboards, descriptive
statistics, and some data-mining techniques. Predictive analytics consists of techniques
that use past data to predict future events or understand the relationships between variables.
These techniques include regression, data mining, forecasting, and simulation. Prescriptive
analytics uses input data to suggest a decision or course of action. This class of analytical
techniques includes rule-based models, simulation, decision analysis, and optimization.
Descriptive and predictive analytics can help us better understand the uncertainty and risk
associated with our decision alternatives.
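
To make the distinction among the three types of analytics concrete, the following Python sketch applies each one to the same small, invented demand history. Everything here is an assumption made for illustration: the demand figures, the shipping options, and the helper linear_forecast, which fits a simple least-squares trend line and is only one of many possible forecasting techniques.

```python
from statistics import mean

# Hypothetical quarterly demand (units); invented for illustration.
demand = [100, 110, 120, 130]

# Descriptive analytics: summarize what has happened.
average_demand = mean(demand)


def linear_forecast(history):
    """Predict the next period with a least-squares trend line."""
    n = len(history)
    periods = list(range(1, n + 1))
    x_bar, y_bar = mean(periods), mean(history)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(periods, history)) / sum(
        (x - x_bar) ** 2 for x in periods
    )
    intercept = y_bar - slope * x_bar
    return intercept + slope * (n + 1)


# Predictive analytics: use past data to predict a future value.
forecast = linear_forecast(demand)

# Prescriptive analytics: suggest a decision. Here we pick the cheapest
# hypothetical shipping option with enough capacity for the forecast.
options = {
    "truck": {"capacity": 150, "cost": 900},
    "rail": {"capacity": 200, "cost": 700},
    "van": {"capacity": 120, "cost": 400},
}
feasible = {name: o for name, o in options.items() if o["capacity"] >= forecast}
best_option = min(feasible, key=lambda name: feasible[name]["cost"])

print(average_demand, forecast, best_option)
```

The van is the cheapest option overall but cannot carry the forecasted demand, so the prescriptive step selects among the options that can.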
This text focuses on descriptive analytics, and in particular on data visualization. Data
visualization can be used for exploring data and for explaining data and the output of
analyses. We explore data to more easily identify patterns, recognize anomalies or irregularities
in the data, and better understand relationships between variables. Visually displaying data
enhances our ability to identify these characteristics of data. Often we put various charts
and tables of several related variables into a single display called a data dashboard. Data
dashboards are collections of tables, charts, maps, and summary statistics that are updated
as new data become available. Many organizations and businesses use data dashboards to
explore and monitor performance data such as inventory levels, sales, and the quality of
production.
We also use data visualization for explaining data and the results of data analyses. As
business becomes more data-driven, it is increasingly important to be able to influence
decision making by telling a compelling data-driven story with data visualization. Much
of the rest of this text is devoted to how to visualize data to clearly convey a compelling
message.
The type of chart, graph, or table to use depends on the type of data you have and
your intended message. Therefore, we discussed the different types of data. Quantitative
data are numerical values used to indicate magnitude, such as how many or how much.
Arithmetic operations, such as addition and subtraction, can be performed on quantitative
data. Categorical data are data for which categories of like items are identified by labels
or names. Arithmetic operations cannot be performed on categorical data. Cross-sectional
data are collected from several entities at the same or approximately the same point in
time, whereas time series data are collected on a single variable at several points in time.
Big data is any set of data that is too large or complex to be handled by typical
data-processing techniques using a typical desktop computer. Big data includes text, audio, and
video data.
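
The data types summarized above differ in the operations that make sense for them, and a short Python sketch can show this. The house records, the monthly sales figures, and all variable names below are hypothetical and serve only as an illustration.

```python
from collections import Counter
from statistics import mean

# Hypothetical cross-sectional data: several houses observed at one point
# in time. All values are invented for illustration.
houses = [
    {"price": 250000, "sqft": 1800, "style": "ranch"},
    {"price": 310000, "sqft": 2200, "style": "colonial"},
    {"price": 190000, "sqft": 1400, "style": "ranch"},
]

# Quantitative data: arithmetic operations such as averaging are meaningful.
average_price = mean(house["price"] for house in houses)

# Categorical data: we count how often each label occurs instead.
style_counts = Counter(house["style"] for house in houses)

# Hypothetical time series data: one variable at several points in time.
monthly_sales = {"2021-01": 120, "2021-02": 135, "2021-03": 150}
change = monthly_sales["2021-03"] - monthly_sales["2021-01"]

print(average_price, dict(style_counts), change)
```

Averaging the style labels would be meaningless, which is why the sketch counts them, while subtraction across months is natural for the quantitative time series.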
We concluded the chapter with a discussion of applications of data visualization in
accounting, finance, human resource management, marketing, operations, engineering,
science, and sports, and we provided an example for each area. Each of the remaining
chapters of this text will begin with a real-world application of a data visualization. Each
Data Visualization Makeover is a real visualization we discuss and then improve by
applying the principles of the chapter.

GLOSSARY

Analytics The scientific process of transforming data into insights for making better
decisions.
Bar chart A chart that shows a summary of categorical data using the length of horizontal
bars to display the magnitude of a quantitative variable.
Big data Any set of data that is too large or complex to be handled by standard data-
processing techniques using a typical desktop computer. Big data includes text, audio, and
video data.
Categorical data Data for which categories of like items are identified by labels or names.
Arithmetic operations cannot be performed on categorical data.
Clustered column chart A column chart showing multiple variables of interest on the
same chart, the different variables usually denoted by different colors or shades of a color
with the columns side by side.
Column chart A chart that shows numerical data by the height of a column for a variety of
categories or time periods.
Control chart A graphical display in which a variable of interest is plotted over time
relative to lower and upper control limits.
Cross-sectional data Data collected from several entities at the same or approximately the
same point in time.
Data dashboard A data visualization tool that gives multiple outputs and may update in
real time.
Data visualization The graphical representation of data and information using displays
such as charts, graphs, and maps.
Descriptive analytics The set of analytical tools that describe what has happened.
Funnel chart A chart that shows the progression of a numerical variable to typically
smaller values through a process, for example, the percentage of website visitors who
ultimately result in a sale.
High-low-close stock chart A chart that shows three numerical values: high value, low
value, and closing value for the price of a share of stock over time.
Predictive analytics Techniques that use models constructed from past data to predict
future events or better understand the relationships between variables.
Prescriptive analytics Mathematical or logical models that suggest a decision or course of
action.
Quantitative data Data for which numerical values are used to indicate magnitude,
such as how many or how much. Arithmetic operations, such as addition, subtraction,
multiplication, and division, can be performed on quantitative data.
Scatter chart A graphical presentation of the relationship between two quantitative
variables. One variable is shown on the horizontal axis and the other is shown on the
vertical axis.
Shot chart A chart that displays the location of shots attempted by a basketball player
during a basketball game with different symbols or colors indicating successful and
unsuccessful shots.
Spaghetti chart A chart depicting possible flows through a system using a line for each
possible path.
Time series data Data collected over several points in time (minutes, hours, days, months,
years, etc.).

PROBLEMS

1. Types of Analytics. Indicate which type of analytics (descriptive, predictive, or
prescriptive analytics) each of the following represents. LO 1
a. a data dashboard
b. a model that finds the production schedule that minimizes overtime
c. a model that forecasts sales for the next quarter
d. a bar chart
e. a model that allocates your financial investments to achieve your financial goal
2. Transportation Planning. An analytics professional is asked to plan the shipment of a
product for the next quarter. She employs the following process:
Step 1. For each of the 12 distribution centers, she plots the quarterly demand for the
product over the last three years.
Step 2. Based on the plot for each distribution center, she develops a forecasting
model to forecast demand for next quarter for each distribution center.
Step 3. She takes the forecast for next quarter for each distribution center and inputs
those forecasts, along with the capacities of the company’s four factories and
transportation rates from each factory to each distribution center, into an
optimization model. The optimization model suggests a shipping plan that
minimizes the cost of satisfying the forecasted demand at the distribution
centers from the company’s four factories.
Describe the type of analytics being utilized in each of the three steps outlined above.
LO 1
3. Wall Street Journal Subscriber Characteristics. A Wall Street Journal subscriber
survey asked a series of questions about subscriber characteristics and interests. State
whether each of the following questions provides categorical or quantitative data. LO 2
a. What is your age?
b. Are you male or female?
c. When did you first start reading the WSJ? High school, college, early career,
midcareer, late career, or retirement?
d. How long have you been in your present job or position?
e. What type of vehicle are you considering for your next purchase? Nine response
categories for this question include sedan, sports car, SUV, minivan, and so on.
4. Comparing Smartwatches. Consumer Reports provides product evaluations for its
subscribers. The following table shows data from Consumer Reports for five
smartwatches on the following characteristics:
Overall Score—a score awarded for a variety of performance factors
Price—the retail price
Recommended—does Consumer Reports recommend purchasing the smartwatch based
on performance and strengths?
Best Buy—if Consumer Reports recommends purchasing the smartwatch, does it also
consider it a “best buy” based on a blend of performance and value?

Make                    Overall Score   Recommended   Best Buy   Price
Apple Watch Series 5    84              Yes           No         $395
Fitbit Versa 2          78              Yes           Yes        $200
Garmin Venu             77              Yes           No         $350
Fitbit Versa Lite       65              No            No         $100

For each of the four pieces of data, indicate whether the data are quantitative or
categorical and whether the data are cross-sectional or time series. LO 2
5. House Price and Square Footage. Suppose we want to better understand the
relationship between house price and square footage of the house, and we have collected house
price and square footage for 75 houses in a particular neighborhood of Cincinnati,
Ohio, from the Zillow website on January 3, 2021. LO 2, 3
a. Are these data quantitative or categorical?
b. Are these data cross-sectional or time series?
c. Which of the following types of charts would provide the best display of these data?
Explain your answer.
i. Bar chart
ii. Column chart
iii. Scatter chart
6. Netflix Subscribers. The following chart displays the total number of Netflix
subscribers from 2010 to 2019. LO 1, 2, 3
a. Are these data quantitative or categorical?
b. Are these data cross-sectional or time series?
c. What type of chart is this?

[Chart: Netflix Subscribers (millions) by Year, with the following data labels]

Year    Subscribers (millions)
2010    20.0
2011    26.3
2012    33.3
2013    44.4
2014    57.4
2015    74.8
2016    89.1
2017    110.6
2018    139.3
2019    167.1

7. U.S. Netflix Subscribers. Refer to the previous problem. Suppose that in addition
to the total number of Netflix subscribers, we have the number of those subscribers
by year for the years 2010–2019 who live in the United States. Our message is to
emphasize how much of the growth is coming from the United States. Which of the
following types of charts would best display the data? Explain your answer. LO 2, 3
i. Bar chart
ii. Clustered column chart
iii. Stacked column chart
iv. Stock chart
8. How Data Scientists Spend Their Day. The Wall Street Journal reported the results
of a survey of data scientists. The survey asked the data scientists how they spend their
time. The following chart shows the percentage of respondents who answered less than
five hours per week or at least five hours per week for the amount of time they spend
on exploring data and on presenting analyses. LO 2, 3, 4

[Chart: What Data Scientists Do: Exploring versus Presenting]

Activity               Less than five hours per week   At least five hours per week
Presenting Analysis    74%                             26%
Exploring Data         42%                             58%

a. Are these data quantitative or categorical?
b. Are these data cross-sectional or time series?
c. What type of chart is this?
d. What conclusions can you make based on this chart?
9. Industries in the Dow Jones Industrial Index. Refer to the data on the Dow Jones
Industrial Index given in Table 1.3. The following chart displays the number of
companies in each industry that make up this index. LO 3
a. What type of chart is this?
b. Which industry has the highest number of companies in the Dow Jones Industrial
Index?

[Chart: Number of Companies by Industry]

Industry              Number of Companies
Technology            5
Financial Services    5
Retailing             3
Pharmaceutical        3
Petroleum             2
Manufacturing         2
Food                  2
Conglomerate          2
Telecommunication     1
Healthcare            1
Entertainment         1
Consumer Goods        1
Apparel               1
10. Job Factors. The following chart is based on the same data used to construct
Figure 1.3. The data are percentages of respondents to a survey who listed various
factors as most important when making a job decision. LO 3, 4
a. What type of chart is this?
b. What is the fifth most-cited factor?

[Chart: What matters most to you when deciding which job to take next?
Categories shown: Salary and Bonus, Company Culture, Location, Day-to-day Work,
Flexible Schedule, Industry, Job Title, Health Care Benefits.
Data labels shown: 24%, 22%, 13%, 11%, 11%, 8%, 6%, 5%.]

11. Retirement Financial Concerns. The results of the American Institute of Certified
Public Accountants’ Personal Financial Planning Trends Survey indicated 48% of
clients had concerns about outliving their money. The top reasons for these concerns
and the percentage of respondents who cited the reason were as follows. LO 3, 4
[Chart: Concerns for Retirement]

Concern                                   Percentage of Respondents
Health-care Costs                         77%
Stock Market Fluctuations                 53%
Unexpected Costs                          50%
Lifestyle Changes                         42%
Possibility of Being a Financial Burden   22%
Desire to Leave an Inheritance            21%

a. What type of chart is this?
b. Only 48% of the survey respondents had financial concerns about retirement
(outliving their money). What percentage of the total people surveyed had
retirement health-care cost concerns?
12. Master’s Degree Program Recruiting. The recruiting process for a full-time master’s
program in data science consists of the following steps. The program director obtains
email addresses of undergraduate seniors who have taken the Graduate Record Exam
(GRE) and expressed an interest in data science. An email inviting the students to an
online information session is sent. At the information session, faculty discuss the
program and answer questions. Students apply through a web portal. An admissions
committee makes an offer of admission (or not) along with any financial aid. If the person
is admitted, the person either accepts or rejects the offer. Consider the following chart.
LO 3, 4

[Chart: Master’s Degree in Data Science Recruiting]

Stage                    Percentage
Email                    100%
Information Session      64%
Applied for Admission    47%
Admitted                 25%
Enrolled                 21%

a. What type of chart is this?
b. Which of the following is the correct interpretation of the 21% for Enrolled?
i. Of those who were sent an email, 21% enrolled.
ii. Of those who were admitted, 21% enrolled.
iii. Of those who applied for admission, 21% enrolled.
iv. None of the above
13. Chemical Process Control. The following chart is a quality control chart of the
temperature of a chemical manufacturing process. What observations can you make about
the process? LO 3

[Chart: Temperature (degrees Fahrenheit), vertical axis 95.60 to 97.00, plotted by Hour (1 to 20), with Upper Control Limit and Lower Control Limit lines]