Deep Learning with Swift for TensorFlow
Differentiable Programming with Swift

Rahul Bhalley
Ludhiana, India
Table of Contents
5.6 Optimization
5.6.1 Gradient Descent
5.6.2 Momentum
5.7 Regularization
5.7.1 Dataset
5.7.2 Architecture
5.7.3 Loss Function
5.7.4 Optimization
5.8 Summary
References
Index
About the Author
Rahul Bhalley is an independent machine intelligence researcher. He
was the co-founder of a short-lived deep learning startup in 2018. He
has published research papers in areas such as speech processing and
generative modeling. He actively contributes to open source projects
related to deep learning on GitHub. He has also worked with Apple’s Swift
and shares Google’s vision of making it easy for others to understand deep
learning with Swift.
About the Technical Reviewer
Vishwesh Ravi Shrimali graduated from BITS Pilani in 2018, where he
studied mechanical engineering. Since then, he has worked with Big
Vision LLC on deep learning and computer vision and was involved in
creating official OpenCV AI courses. Currently, he is working at Mercedes-
Benz Research and Development India Pvt. Ltd. He has a keen interest
in programming and AI and has applied that interest in mechanical
engineering projects. He has also written multiple blogs on OpenCV and
deep learning on Learn OpenCV, a leading blog on computer vision. He
has also coauthored Machine Learning for OpenCV (second edition) by
Packt. When he is not writing blogs or working on projects, he likes to go
on long walks or play his acoustic guitar.
Preface
As a programmer and undergraduate student, I had a lot of trouble understanding deep learning by myself when I started this journey back in 2015. So I decided to write a deep learning programming book that might help people in a similar situation to mine understand deep learning easily. I have tried to keep the explanations of difficult deep learning concepts simple throughout the book.
But why Swift for deep learning? Swift is a powerful general-purpose differentiable programming language. It is a well-researched language, and code written in it reads almost like English sentences, making it easy for newcomers to learn programming and even deep learning. Furthermore, Swift is optimized for performance, so researchers can write all their deep learning algorithms in a single language with simple syntax.
And what about Python’s wide ecosystem of various libraries? With the
Python interoperability feature of Swift for TensorFlow (S4TF), you can use
Python libraries within Swift!
The intended audience of this book is as follows:
CHAPTER 1
Machine Learning Basics
We're unquestionably in the business of forging the gods.
—Pamela McCorduck (McCorduck, 2004)

© Rahul Bhalley 2021
R. Bhalley, Deep Learning with Swift for TensorFlow,
https://doi.org/10.1007/978-1-4842-6330-3_1
iPhone and iPad use LIDAR information from camera sensors to instantly create a depth map of the surroundings. This information is then combined with machine intelligence to deliver computational photography features such as the bokeh effect with adjustable strength; immersive augmented reality (AR) features such as the reflection and lighting of the surroundings on AR objects, and object occlusion when humans enter the scene; and much more. A personal voice assistant like Siri understands your speech, allowing you to do various tasks such as controlling your home accessories, playing music on HomePod, calling and texting people, and more. This machine intelligence technology is made possible by fast graphics processing units (GPUs). Nowadays, GPUs on portable devices are fast enough to process user data without having to send it to cloud servers. This approach helps keep user data private and hence secure from undesirable exposure and usage (Sharma and Bhalley, 2016). In fact, all the features mentioned above are made available with on-device machine intelligence.
It might surprise you that AI is not a new technology. It actually dates back to the 1940s, when it was not considered useful or cool at all, and it has had many ups and downs. AI rose to popularity mainly three times, under a different name in each era, and now we popularly know it as deep learning. From the 1940s to the 1960s, AI was known as "cybernetics"; around the 1980s to 1990s, it was known as "connectionism"; and since 2006, we know AI as "deep learning."
At some point in the past, there was also a misconception, believed
by many researchers, that if all the rules of the way everything in the
universe works were programmed in a computing machine, then it would
automatically become intelligent. But this idea is strongly challenged by
the current state of AI because we now know there are simpler ways to
make machines mimic human-like intelligence.
In the earlier days of AI research, data was sparsely available and computational machines were slow. These were among the main factors that dampened the popularity of AI systems. But now we have the Internet, and a very large fraction of the population on Earth interacts with one another, generating humongous amounts of data quickly.
classifying a text using decision tree methods, and so on. ML uses data to learn and generally performs worse than deep learning. Finally, the current state of the art in AI is deep learning. Deep learning (DL) also uses data for learning, but in a hierarchical fashion (LeCun et al., 2015), taking inspiration from the brain. DL algorithms can learn the mapping of very complicated datasets easily without compromising accuracy, and they perform better than machine learning algorithms. As the Venn diagram in Figure 1-1 shows, deep learning is a subset of machine learning, whereas artificial intelligence is a superset of both fields.
Figure 1-1. Venn diagram: deep learning is a subset of machine learning, which is itself a subset of artificial intelligence. Machine learning examples: k-nearest neighbors, k-means clustering, Gaussian processes, principal component analysis, t-SNE, etc.
Now we can begin our journey with deep learning starting with simple
machine learning concepts.
1.1.1 Experience
The experience is multiple observations made by a model to learn to
perform a task. These observations are samples from an available dataset.
During learning, a model is always required to observe the data.
The data can come in various forms such as image, video, audio, text, tactile, and others. Each sample from the data, also known as an example, can be expressed in terms of its features. For example, the features of an image sample are its pixels, where each pixel consists of red, green, and blue color values. The brightness values of these colors together represent a single color in the visible range of the electromagnetic spectrum (which our eyes can perceive).
In addition to the features, each sample sometimes might also contain
a corresponding label vector, also known as target vector, which represents
the class to which the sample belongs. For instance, a fish image sample
might have a corresponding label vector that represents a fish. The label
is usually expressed in one-hot encoding (also called 1-of-k coding where
k is the number of classes), a representation where only a single index
in a whole vector has a value of one and all others are set to zero. Each
index is assumed to represent a certain class, and the index whose value
is one is assumed to represent the class to which the sample belongs. For
instance, assume the [1 0 0] vector represents a dog, whereas [0 1 0] and
[0 0 1] vectors represent a fish and a bird, respectively. This means that all
the image samples of birds have a corresponding label vector [0 0 1] and
likewise dog and fish image samples will have their own labels.
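As a quick illustration, a one-hot vector can be produced with a few lines of Swift. This is a minimal sketch (the function name and the dog/fish/bird class ordering are just the running example above, not an API from the book):

```swift
/// Returns a one-hot (1-of-k) encoded vector for a class index.
func oneHot(index: Int, classCount: Int) -> [Float] {
    var vector = [Float](repeating: 0, count: classCount)
    vector[index] = 1  // only the class's own index is set to one
    return vector
}

// Classes: 0 = dog, 1 = fish, 2 = bird (as in the example above).
let fishLabel = oneHot(index: 1, classCount: 3)
print(fishLabel)  // [0.0, 1.0, 0.0]
```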
The features of the samples listed previously are raw features, that is, they are not handpicked by humans. Sometimes, in machine learning, feature selection plays an important role in the performance of the model. For instance, a high-resolution image will be slower to process than its low-resolution counterpart for a task like face recognition. Because deep learning can work directly on raw data with great performance, we won't discuss feature selection in particular, but we will go through some preprocessing techniques, as the need arises in code listings, to get the data into the correct format. We refer interested readers to the textbook by Theodoridis and Koutroumbas (2009) for more on feature selection.
In deep learning, we may need to preprocess the data. Preprocessing is a sequence of functions applied to raw samples to transform them into a desired form, which is usually decided based on the design of the model and the task at hand. For instance, a raw audio waveform sampled at 16 kHz has 16,384 samples per second, expressed as a vector. For even a short audio recording, say 5 seconds, this vector's dimension becomes very large: an 81,920-element vector! This will take longer for our model to process. This is where preprocessing becomes helpful. We can preprocess each raw audio waveform with the fast Fourier transform (Heideman et al., 1985) to transform it into a spectrogram representation. This image can then be processed much faster than the lengthy raw audio waveform. There are different ways to preprocess the data, and the choice depends on the model design and the task at hand. We will cover some preprocessing steps in the book for different kinds of data, non-exhaustively, wherever the need occurs.
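Spectrogram extraction is beyond a short listing, but a simpler and equally common preprocessing function, standardization to zero mean and unit variance, can be sketched in Swift as follows (an illustrative example, not a function from the book):

```swift
/// Standardizes a feature vector to zero mean and unit variance.
func standardize(_ features: [Float]) -> [Float] {
    let mean = features.reduce(0, +) / Float(features.count)
    let variance = features
        .map { ($0 - mean) * ($0 - mean) }
        .reduce(0, +) / Float(features.count)
    let std = variance.squareRoot()
    // Guard against division by zero for constant features.
    return features.map { ($0 - mean) / max(std, 1e-8) }
}

let scaled = standardize([2, 4, 6, 8])
print(scaled)  // zero-mean, unit-variance version of the input
```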
1.1.2 Task
A task is the act of processing a sample's features with the model to return the correct label for that sample. There are fundamentally two tasks for which machine learning models are designed, namely, regression and classification. Later chapters introduce and program more interesting tasks, which are simply extensions of these two basic tasks.
For instance, for a fish image, the model should return the [0 1 0]
vector. Because here the image is being mapped to its label, this task is
commonly known as image classification. This serves as a simple example
for a classification task.
A good example of a regression task is object detection. We might want to detect the location of an object, say a ball, in an image. Here, the features are the image pixels, and the labels are the coordinates of the object in the image. These coordinates represent a bounding box for the object, that is, the location where the object is present in a given image. Here, our goal is to
train a model that takes image features as input and predicts the correct bounding box coordinates for an object. Because the prediction output is real-valued, object detection is considered a regression task.
Classifier    Accuracy
C1            92%
C2            99%
Now let us consider precision and recall for these two classifiers, which together form a two-number evaluation metric. Precision is the fraction of images the classifier labeled as cars that are actually cars, and recall is the fraction of car images in the test or validation set that the classifier correctly labeled as cars. For our arbitrary classifiers, these metric values are shown in Table 1-2.
Classifier    Precision    Recall
C1            98%          95%
C2            95%          90%
$$F_1 = \frac{2}{\frac{1}{\text{Precision}} + \frac{1}{\text{Recall}}} \tag{1.1}$$
Table 1-3 shows the F1 score for each classifier by putting their
precision and recall values in Equation 1.1.
From Table 1-3, by simply looking at the F1 scores (about 96.5% for C1 and 92.4% for C2, from the precision and recall values in Table 1-2), we can conclude that classifier C1 performs better than C2. In practice, having a single-number evaluation metric can be extremely helpful in determining the superiority of trained models and can accelerate your research or deployment process.
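Equation 1.1 is easy to verify in code. The sketch below recomputes the F1 scores from the precision and recall values in Table 1-2 (the function name is illustrative):

```swift
/// F1 score: the harmonic mean of precision and recall (Equation 1.1).
func f1Score(precision: Double, recall: Double) -> Double {
    2 / (1 / precision + 1 / recall)
}

print(f1Score(precision: 0.98, recall: 0.95))  // C1: ≈ 0.965
print(f1Score(precision: 0.95, recall: 0.90))  // C2: ≈ 0.924
```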
(Goodfellow et al., 2014), a datapoint generated by the generator is given a fake label (or 0), whereas a datapoint sampled from the dataset is given a real label (or 1). Another example is the auto-encoder (Vincent et al., 2008), where the labels are the corresponding sample images themselves.
The reward is either positive or negative and can be regarded, respectively, as a good or bad response to the agent from the world, in accordance with the behavioral science viewpoint. We are more interested in the return than in the current step's reward, because the goal of the agent is to maximize the return over the course of each episode. Here, an episode is a sequence of interactions between an agent and its environment from a start to an end. Examples of an episode include a gameplay session that ends when a certain condition is met, or an agent trying to stay alive in harsh environmental conditions until it dies in some accident. See Figure 1-2 for a diagrammatic view of the interaction between the agent and its environment.
Figure 1-2. The agent-environment interaction: the environment sends its state St and a reward Rt to the agent, and the agent responds with an action At on the environment.
The agent perceives the previous state of the environment St-1 and takes an action At on the environment, whose state St changes and is returned to the agent. The agent also receives a scalar reward Rt from the environment describing how good the current state is for the agent. Although the state of the environment changes when the agent acts, it may also change by itself. In multi-agent reinforcement learning, there might also be other agents maximizing their own returns.
Reinforcement learning is a very interesting machine learning field and is being studied aggressively at the time of this writing. It is also considered to be closer to the way humans (or other mammals) learn by making behavioral modifications, that is, reinforcing an action based on the reward. A recent work (Mnih et al., 2015) showed that a combination of deep learning and reinforcement learning, called deep reinforcement learning, can even surpass human-level gameplay capabilities.
Unfortunately, we don't discuss reinforcement learning in this book. Interested readers may refer to the textbook by Sutton and Barto (2018) for the fundamentals of the field. For deep reinforcement learning advances, we suggest the works of Mnih et al. (2015) and Schulman et al. (2017).
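Although reinforcement learning is not covered further in this book, the notion of a return can be sketched in a few lines of Swift. The discount factor below is a standard detail of the field that the text above does not introduce; it weighs immediate rewards more heavily than distant ones:

```swift
import Foundation

/// Discounted return of an episode's rewards: G = sum over t of discount^t * r_t.
func episodeReturn(rewards: [Double], discount: Double = 1.0) -> Double {
    var g = 0.0
    for (t, r) in rewards.enumerated() {
        g += pow(discount, Double(t)) * r
    }
    return g
}

let rewards = [1.0, -0.5, 2.0]  // rewards collected over a short episode
print(episodeReturn(rewards: rewards))                 // 2.5 (undiscounted)
print(episodeReturn(rewards: rewards, discount: 0.9))  // ≈ 2.17
```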
Now let us look at the basic idea known as maximum likelihood
estimation that helps in constructing machine learning algorithms.
containing a set of N datapoints (or samples), that is, $\mathbb{D} = \{(x^{(i)}, t^{(i)})\}_{i=1}^{N}$. Assume that each datapoint is identically distributed and sampled independently from the data PDF.
$$P_m(\mathbb{D} \mid \theta) = \prod_{i=1}^{N} P_m(x^{(i)}, t^{(i)} \mid \theta) = \mathcal{L}(\theta \mid \mathbb{D}) \tag{1.2}$$

$$= \prod_{i=1}^{N} P_m(x^{(i)}, t^{(i)}) = \prod_{i=1}^{N} P_m(t^{(i)} \mid x^{(i)})\, P_m(x^{(i)}) \tag{1.3}$$

$$L = -\ln \mathcal{L} = -\sum_{i=1}^{N} \ln P_m(t^{(i)} \mid x^{(i)}) - \sum_{i=1}^{N} \ln P_m(x^{(i)}) \tag{1.4}$$
much smaller values which may get rounded off due to limited-precision
representational capacity of computational devices. This is why machine
learning frequently uses the logarithm function.
The negative log-likelihood function can be regarded as a loss or error function, denoted by L(.). Our goal is to minimize the loss function by updating its parameter values such that our parameterized model PDF approximates the data PDF, using the available fixed and finite data samples from the data PDF. Note that the second term in this equation doesn't contribute to the estimation of the model PDF's parameters, because it doesn't depend on the model's parameters; it is simply a negative additive term that can be ignored when maximizing the likelihood function. The loss function can now be described by the following equation:
$$L = -\sum_{i=1}^{N} \ln P_m(t^{(i)} \mid x^{(i)}) \tag{1.5}$$
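Equation 1.5 can be computed directly once the model's probabilities for the correct labels are available. Here is a minimal Swift sketch (the probabilities below are made-up values for three samples, not from a trained model):

```swift
import Foundation

/// Negative log-likelihood (Equation 1.5) over the per-sample
/// probabilities P_m(t | x) assigned to the correct labels.
func negativeLogLikelihood(_ probabilities: [Double]) -> Double {
    -probabilities.reduce(0) { $0 + log($1) }
}

// The closer each probability is to 1, the smaller the loss.
print(negativeLogLikelihood([0.9, 0.8, 0.95]))    // ≈ 0.38
print(negativeLogLikelihood([0.99, 0.99, 0.99]))  // ≈ 0.03 (better model)
```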
1.4.1 Data
The data present in the dataset serves as experience for the machine learning model. When the model is first initialized with random parameter values, it has no knowledge of how to perform well on a given task. This knowledge is gained by iteratively exposing the model to data (usually in small sample counts, also known as mini-batches). As the model experiences more samples, it gradually learns to perform the task more accurately.
A dataset is simply a structured, usually tabular (or matrix), arrangement of datapoints. Table 1-4 shows an arbitrary example of a dataset. It contains some characteristics (or features) of people, where each row holds the features for a single person: the height (in centimeters), age (in years), and weight (in kilograms) of that person. Note that features and their corresponding targets can be tensors of any dimension (0-dimensional, i.e., scalar variables, here).
Height (cm)    Age (years)    Weight (kg)
151.7          25             47.8
139.7          20             36.4
136.5          18             31.8
156.8          28             53.0
Given the dataset, we must decide the features and targets for a task.
In other words, selection of the correct features for a task is dependent
solely upon our decision. For instance, we can use this dataset to perform
a regression task of predicting the weight given the height and age of a
person. In this setting, our feature vector x contains height x1 and age x2,
that is, x = [x1 x2], for each person (called sample), whereas the target t
is the weight of a person. For example, x = [136.5 18] and t = 31.8 are the
feature vector and target scalar for the third person in the given dataset.
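The feature/target split just described can be made concrete with a small Swift structure. This is purely illustrative; the book's later chapters use tensors, but plain arrays suffice here:

```swift
// Features are x = [height (cm), age (years)]; the target t is weight (kg).
struct Person {
    let features: [Float]
    let target: Float
}

// The four people from Table 1-4.
let dataset = [
    Person(features: [151.7, 25], target: 47.8),
    Person(features: [139.7, 20], target: 36.4),
    Person(features: [136.5, 18], target: 31.8),
    Person(features: [156.8, 28], target: 53.0),
]

// Feature vector and target scalar for the third person.
print(dataset[2].features, dataset[2].target)  // [136.5, 18.0] 31.8
```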
In machine learning literature, a datapoint input to the model is also known as an example or sample, and its features are also known as attributes. For instance, the features of a car can be its color, tire size, top speed, and so on. Another example is a drug, whose features may include its chemical formula, the proportion of each chemical element, and so on. The target, also known as the label or hard target (when differentiating it from a soft target), is the desired output corresponding to a given sample. For instance, given the