Learning Data Science
Data Wrangling, Exploration, Visualization, and Modeling with Python
Sam Lau, Joseph Gonzalez & Deborah Nolan
Learning Data Science
As an aspiring data scientist, you appreciate why organizations rely on data for important decisions—whether it’s for companies designing websites, cities deciding how to improve services, or scientists working to stop the spread of disease. And you want the skills required to distill a messy pile of data into actionable insights. We call this the data science lifecycle: the process of collecting, wrangling, analyzing, and drawing conclusions from data.
Learning Data Science is the first book to cover foundational skills in both programming and statistics that encompass this entire lifecycle. It’s aimed at those who wish to become data scientists or who work with data scientists, and at data analysts who wish to cross the “technical/nontechnical”
• Glean valuable insights through data cleaning, exploration, and visualization
• Learn how to use modeling to describe the data
• Generalize findings beyond the data
“This is the book I wish we had when we first came up with the term data scientist to describe what we do. If you’re looking to be in data science/engineering, AI, or machine learning, this is where you need to start.”
—DJ Patil, PhD, first US Chief Data Scientist
Joey Gonzalez is an associate professor in the EECS Department at UC Berkeley, a member of the Berkeley AI Research group, and a founding member of the Berkeley RISE Lab. He also cofounded Turi Inc. and Aqueduct, which develop tools for data scientists.
Deborah Nolan is professor emerita of
statistics and associate dean for students
in the College of Computing, Data
Science, and Society at UC Berkeley.
I helped develop and teach the UC Berkeley data science course based on Learning Data
Science. This book provides the foundational skills and concepts needed to solve
real-world data science problems.
—Fernando Pérez, UC Berkeley Professor and Cofounder of Project Jupyter
Learning Data Science is a great introduction to the field of data science for beginners and
working professionals alike. Read it for the exciting case studies.
—Siddharth Yadav, Freelance Data Scientist
There’s not a lot of data science books that focus on exploratory data analysis and how
that segues into the real modeling process. This book does just that and should serve
anyone wanting a deep-dive in how to explore data.
—Thomas Nield, Consultant/Instructor,
Nield Consulting Group/Yawman Flight
The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Learning Data Science, the cover image,
and related trade dress are trademarks of O’Reilly Media, Inc.
The views expressed in this work are those of the authors, and do not represent the publisher’s views.
While the publisher and the authors have used good faith efforts to ensure that the information and
instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility
for errors or omissions, including without limitation responsibility for damages resulting from the use of
or reliance on this work. Use of the information and instructions contained in this work is at your own
risk. If any code samples or other technology this work contains or describes is subject to open source
licenses or the intellectual property rights of others, it is your responsibility to ensure that your use
thereof complies with such licenses and/or rights.
978-1-098-11300-1
[LSI]
Table of Contents
Preface
3. Simulation and Data Design
The Urn Model
Sampling Designs
Sampling Distribution of a Statistic
Simulating the Sampling Distribution
Simulation with the Hypergeometric Distribution
Example: Simulating Election Poll Bias and Variance
The Pennsylvania Urn Model
An Urn Model with Bias
Conducting Larger Polls
Example: Simulating a Randomized Trial for a Vaccine
Scope
The Urn Model for Random Assignment
Example: Measuring Air Quality
Summary
Aggregating
Basic Group-Aggregate
Grouping on Multiple Columns
Custom Aggregation Functions
Pivoting
Joining
Inner Joins
Left, Right, and Outer Joins
Example: Popularity of NYT Name Categories
Transforming
Apply
Example: Popularity of “L” Names
The Price of Apply
How Are Dataframes Different from Other Data Representations?
Dataframes and Spreadsheets
Dataframes and Matrices
Dataframes and Relations
Summary
Data Collected Over Time
Observational Studies
Unequal Sampling
Geographic Data
Adding Context
Example: 100m Sprint Times
Creating Plots Using plotly
Figure and Trace Objects
Modifying Layout
Plotting Functions
Annotations
Other Tools for Visualization
matplotlib
Grammar of Graphics
Summary
12. Case Study: How Accurate Are Air Quality Measurements?
Question, Design, and Scope
Finding Collocated Sensors
Wrangling the List of AQS Sites
Wrangling the List of PurpleAir Sites
Matching AQS and PurpleAir Sensors
Wrangling and Cleaning AQS Sensor Data
Checking Granularity
Removing Unneeded Columns
Checking the Validity of Dates
Checking the Quality of PM2.5 Measurements
Wrangling PurpleAir Sensor Data
Checking the Granularity
Handling Missing Values
Exploring PurpleAir and AQS Measurements
Creating a Model to Correct PurpleAir Measurements
Summary
Transform Text into Features
Text Analysis
String Manipulation
Converting Text to a Standard Format with Python String Methods
String Methods in pandas
Splitting Strings to Extract Pieces of Text
Regular Expressions
Concatenation of Literals
Quantifiers
Alternation and Grouping to Create Features
Reference Tables
Text Analysis
Summary
Summary