Scaling Machine
Learning with
Spark
Distributed ML with MLlib,
TensorFlow, and PyTorch
Adi Polak
Scaling Machine Learning with Spark
Learn how to build end-to-end scalable machine learning
solutions with Apache Spark. With this practical guide, author
Adi Polak introduces data and ML practitioners to creative
solutions that supersede today’s traditional methods. You’ll
learn a more holistic approach that takes you beyond specific
requirements and organizational goals, allowing data and ML
practitioners to collaborate and understand each other better.
Scaling Machine Learning with Spark examines several
technologies for building end-to-end distributed ML
workflows based on the Apache Spark ecosystem with
Spark MLlib, MLflow, TensorFlow, and PyTorch. If you’re
a data scientist who works with machine learning, this
book shows you when and why to use each technology.
You will:
• Explore machine learning, including distributed
computing concepts and terminology
• Manage the ML lifecycle with MLflow
• Ingest data and perform basic preprocessing with Spark
• Explore feature engineering, and use Spark
to extract features
• Train a model with MLlib and build a pipeline to reproduce it
• Build a data system to combine the power of Spark
with deep learning
• Get a step-by-step example of working with
distributed TensorFlow
• Use PyTorch to scale machine learning and
its internal architecture
“If there is one book the Spark community has
been craving for the last decade, it’s this.”
Andy Petrella, founder at Kensu and author of
Fundamentals of Data Observability
Adi Polak is an open source technologist who believes in
communities and education, and their ability to positively impact the
world around us. She is passionate about building a better world through
open collaboration and technological innovation. As a seasoned engineer
and vice president of developer experience at Treeverse, Adi shapes
the future of data and ML technologies for hands-on builders. She serves on
multiple program committees and acts as an advisor for conferences like Data
& AI Summit by Databricks, Current by Confluent, and Scale by the Bay, among
others. Adi previously served as a senior manager for Azure at Microsoft,
where she helped build advanced analytics systems and modern data
architectures. Adi gained experience in machine learning by conducting
research for IBM, Deutsche Telekom, and other Fortune 500 companies.
Praise for Scaling Machine Learning with Spark
If there is one book the Spark community has been craving for the last decade, it’s this.
Writing about the combination of Spark and AI requires broad knowledge, a deep
technical skillset, and the ability to break down complex concepts so they’re easy to
understand. Adi delivers all of this and more while covering big data, AI, and
everything in between.
—Andy Petrella, founder at Kensu and author of
Fundamentals of Data Observability (O’Reilly)
Scaling Machine Learning with Spark is a wealth of knowledge for data and ML
practitioners, providing a holistic and creative approach to building end-to-end scalable
machine learning solutions. The author’s expertise and knowledge, combined with a focus
on collaboration and understanding, makes this book a must-read for anyone
in the industry.
—Noah Gift, Duke executive in residence
Adi’s book is without any doubt a good reference and resource to have beside you when
working with Spark and distributed ML. You will learn best practices she has to share
along with her experience working in the industry for many years. Worth the investment
and time reading it.
—Laura Uzcategui, machine learning engineer at TalentBait
This book is an amazing synthesis of knowledge and experience. I consider it essential
reading for both novice and veteran machine learning engineers. Readers will deepen
their understanding of general principles for machine learning in distributed systems
while simultaneously engaging with the technical details required to integrate and scale
the most widely used tools of the trade including Spark, PyTorch, Tensorflow.
—Matthew Housley, CTO and coauthor of Fundamentals of
Data Engineering (O’Reilly)
Adi’s done a wonderful job at creating a very readable, practical, and insanely detailed
deep dive into machine learning with Spark.
—Joe Reis, coauthor of Fundamentals of Data Engineering
(O’Reilly) and “recovering data scientist”
The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Scaling Machine Learning with Spark,
the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.
The views expressed in this work are those of the author and do not represent the publisher’s views. While
the publisher and the author have used good faith efforts to ensure that the information and instructions
contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or
omissions, including without limitation responsibility for damages resulting from the use of or reliance
on this work. Use of the information and instructions contained in this work is at your own risk. If any
code samples or other technology this work contains or describes is subject to open source licenses or the
intellectual property rights of others, it is your responsibility to ensure that your use thereof complies
with such licenses and/or rights.
978-1-098-10682-9
Table of Contents
Preface
2. Introduction to Spark and PySpark
  Apache Spark Architecture
  Intro to PySpark
  Apache Spark Basics
  Software Architecture
  PySpark and Functional Programming
  Executing PySpark Code
  pandas DataFrames Versus Spark DataFrames
  Scikit-Learn Versus MLlib
  Summary
5. Feature Engineering
  Features and Their Impact on Models
  MLlib Featurization Tools
  Extractors
  Selectors
  Example: Word2Vec
  The Image Featurization Process
  Understanding Image Manipulation
  Extracting Features with Spark APIs
  The Text Featurization Process
  Bag-of-Words
  TF-IDF
  N-Gram
  Additional Techniques
  Enriching the Dataset
  Summary
Index
Preface
Welcome to Scaling Machine Learning with Spark: Distributed ML with MLlib,
TensorFlow, and PyTorch. This book aims to guide you in your journey as you
learn more about machine learning (ML) systems. Apache Spark is currently the
most popular framework for large-scale data processing. It has numerous APIs
implemented in Python, Java, and Scala and is used by many powerhouse
companies, including Netflix, Microsoft, and Apple. PyTorch and TensorFlow are
among the most popular frameworks for machine learning. Combining these tools,
which are already in use in many organizations today, allows you to take full
advantage of their strengths.
Before we get started, though, perhaps you are wondering why I decided to write
this book. Good question. There are two reasons. The first is to support the
machine learning ecosystem and community by sharing the knowledge, experience,
and expertise I have accumulated over the last decade working as a machine
learning algorithm researcher, designing and implementing algorithms to run on
large-scale data. I have spent most of my career working as a data
infrastructure engineer, building infrastructure for large-scale analytics with
all sorts of formatting, types, schemas, etc., and integrating knowledge
collected from customers, community members, and colleagues who have shared
their experience while brainstorming and developing solutions. Our industry can
use such knowledge to propel itself forward at a faster rate, by leveraging the
expertise of others. While not all of this book’s content will be applicable
to everyone, much of it will open up new approaches for a wide array of
practitioners.
This brings me to my second reason for writing this book: I want to provide a
holistic approach to building end-to-end scalable machine learning solutions
that extends beyond the traditional approach. Today, many solutions are
customized to the specific requirements of the organization and specific
business goals. This will most likely continue to be the industry norm for many
years to come. In this book, I aim to challenge the status quo and inspire more
creative solutions while explaining the pros and cons of multiple approaches
and tools, enabling you to leverage whichever tools are used in your
organization and get the best of all worlds. My overall goal is to make it
simpler for data and machine learning practitioners to collaborate and
understand each other better.
Chapter 1, “Distributed Machine Learning Terminology and Concepts”
This chapter provides a high-level introduction to machine learning and covers
terminology and concepts related to distributed computing and network
topologies. I will walk you through various concepts and terms, so you have a
strong foundation for the next chapters.
Chapter 2, “Introduction to Spark and PySpark”
The goal of this chapter is to bring you up to speed on Spark and its Python
library, PySpark. We’ll discuss terminology, software abstractions, and more.
Chapter 3, “Managing the Machine Learning Experiment Lifecycle with MLflow”
This chapter introduces MLflow, a platform that facilitates managing the
machine learning lifecycle. We’ll discuss what a machine learning experiment is
and why managing its lifecycle is important, and we’ll examine the various
components of MLflow that make this possible.
Chapter 4, “Data Ingestion, Preprocessing, and Descriptive Statistics”
Next, we will dive into working with data. In this chapter, I will discuss how to
use Spark to ingest your data, perform basic preprocessing (using image files as
an example), and get a feel for the data. I’ll also cover how to avoid the so-called
small file problem with image files by leveraging the PySpark API.
Chapter 5, “Feature Engineering”
Once you’ve performed the steps in the previous chapter, you’re ready to
engineer the features you will use to train your machine learning model. This
chapter explains in detail what feature engineering is, covering various types,
and showcases how to leverage Spark’s functionality for extracting features.
We’ll also look at how and when to use applyInPandas and pandas_udf to optimize
performance.
Chapter 6, “Training Models with Spark MLlib”
This chapter walks you through working with MLlib to train a model, evaluate
and build a pipeline to reproduce the model, and finally persist it to disk.
Chapter 7, “Bridging Spark and Deep Learning Frameworks”
This chapter breaks down how to build a data system to combine the power of
Spark with deep learning frameworks. It discusses bridging Spark and deep
learning clusters and provides an introduction to Petastorm, Horovod, and the
Spark initiative Project Hydrogen.
Chapter 8, “TensorFlow Distributed Machine Learning Approach”
Here, I’ll lead you through a step-by-step example of working with distributed
TensorFlow—specifically tf.keras—while leveraging the preprocessing you’ve
done with Spark. You will also learn about the various TensorFlow patterns for
scaling machine learning and the component architectures that support it.