MEAP Edition
Manning Early Access Program
Math and Architectures of Deep Learning
Version 10

Copyright 2022 Manning Publications

For more information on this and other Manning titles go to


manning.com

welcome
Dear Reader,

Welcome to the Manning Early Access Program (MEAP) for Math and Architectures of Deep Learning.
This membership will give you access to the developing manuscript along with resources,
which include fully functional Python/PyTorch code, downloadable and executable via Jupyter
notebook.

Deep learning is a complex subject. On one hand, it is deeply theoretical with extensive
mathematical backing. Indeed, without a good intuitive understanding of the mathematical
underpinnings, one is doomed to merely running off-the-shelf, pre-packaged models without
understanding them fully. These models often do not lend themselves well to the exact problem
one needs to solve, and one is helpless if any change or re-architecting is necessary. On the
other hand, deep learning is also intensely practical, requiring significant Python programming
skills on platforms like TensorFlow and PyTorch. Failure to master those leaves one unable
to solve any real problem.

This author feels that there is a dearth of books that address both of these aspects of the
subject in a connected fashion. That is what led to the genesis of this book.

The author will feel justified in his efforts if these pages help the reader become a successful
exponent in the art and science of deep learning.

Please post all comments, questions and suggestions in the liveBook's Discussion Forum.

Sincerely,
Krishnendu Chaudhury

brief contents
1 An overview of machine learning and deep learning
2 Introduction to Vectors, Matrices and Tensors from Machine Learning and Data Science point of view
3 Introduction to Vector Calculus from Machine Learning point of view
4 Linear Algebraic Tools in Machine Learning and Data Science
5 Probability Distributions for Machine Learning and Data Science
6 Bayesian Tools for Machine Learning and Data Science
7 Function Approximation: How Neural Networks model the world
8 Training Neural Networks: Forward and Backpropagation
9 Loss, Optimization and Regularization
10 One, Two and Three Dimensional Convolution and Transposed Convolution in Neural Networks
11 Deep Convolutional Neural Network Architectures for Image Classification and Object Detection
12 Manifolds, homeomorphism and Neural Networks
13 Bayesian Inferencing
14 Latent Space and Generative Modeling, AutoEncoders and Variational AutoEncoders

Appendix


1 An overview of machine learning and deep learning
Deep learning has transformed computer vision, natural language and speech processing
in particular and artificial intelligence in general. From a bag of semi-discordant
tricks, none of which worked satisfactorily on a real-life problem, artificial intelligence
has become a formidable tool to solve real problems faced by industry, at scale. This
is nothing short of a revolution going on under our very noses. If one wants to lead
the curve of this revolution, it is imperative to understand the underlying principles
and abstractions, rather than simply memorizing the "how to" steps of some hands-on
guide. This is where the mathematics comes in.

In this first chapter we will give an overview of deep learning. This will require us
to use some concepts that are explained in subsequent chapters. The reader
should not worry if there are some open questions at the end of this chapter. This
chapter is aimed at orienting one's mind towards this difficult subject. As individual
concepts get clearer in subsequent chapters, the reader should consider coming back
and giving this chapter a re-read.

1.1 A first look at machine/deep learning - a paradigm shift in computation
Making decisions and/or predictions is a central requirement of life. This essentially
involves taking in a set of sensory or knowledge inputs and generating decisions or
estimates by processing them.
For instance, a cat’s brain is often trying to choose between the following options: run
away from the object in front vs ignore the object in front vs approach the object in front
and purr. It makes that decision by processing sensory inputs, like perceived hardness


of the object in front, perceived sharpness of the object in front, etc. This is an instance
of a classification problem, where the output is one out of a set of possible classes.
Some other examples of classification problems in life:

- buy vs hold vs sell a certain stock, from inputs like the price history of this stock and the change in price of this stock in recent times
- object recognition (from an image), e.g.:
  – is this a car or a giraffe
  – is this a human or a non-human
  – is this an inanimate object or a living object
  – face recognition - is this Tom or Dick or Mary or Einstein or Messi
- action recognition from video, e.g.:
  – is this person running or not running
  – is this person picking something up or not
  – is this person doing something violent or not
- Natural Language Processing, aka NLP, on digital documents, e.g.:
  – does this news article belong to the realm of politics or sports
  – does this query phrase match a particular article in the archive
- etc.
Sometimes life requires a quantitative estimation as opposed to a classification. A lion
brain needs to estimate what the length of a jump should be so as to land on top of
its prey, by processing inputs like the speed of the prey, distance to the prey, etc. Another instance
of quantitative estimation is estimating a house price, based on inputs like current income,
crime statistics for the neighborhood, etc.
Some other examples of quantitative estimations required by life:

- object localization from an image: identifying the rectangle bounding the location of an object
- stock price prediction from historical stock prices and other world events
- similarity score between a pair of documents

Sometimes, a classification output can be generated from a quantitative estimate. For
instance, the cat brain described above can combine the inputs (hardness, sharpness,
etc.) to generate a quantitative threat score. If that threat score is high, the cat runs
away. If the threat score is near zero, the cat ignores the object in front. If the threat score
is negative, the cat approaches the object in front and purrs.
Many of these examples are pictorially depicted in Fig 1.1.

In each of these instances, there is a machine - viz., a brain - that transforms sensory
or knowledge inputs to decisions or quantitative estimates. The goal of machine learning
is to emulate that machine.

One must note that machine learning has a long way to go before it can catch up with
the human brain. The human brain can single-handedly deal with thousands, if not
millions, of such problems. On the other hand, at its present state of development,
machine learning can hardly create a single general-purpose machine that makes a wide
variety of decisions and estimates. We are mostly trying to make separate machines to
solve individual tasks (stock picker, car recognizer, etc.).

Figure 1.1: Examples of Decision Making and Quantitative Estimations in Life

At this point, one might ask: wait, converting inputs to outputs - isn't that exactly what
computers have been doing for the last thirty or more years? What is this paradigm shift I
am hearing about? The answer: it is a paradigm shift because we do not provide a step-by-step
instruction set - viz., a program - to the machine to convert the input to output.
Instead, we develop a mathematical model for the problem.
Let us illustrate the idea with an example. For the sake of simplicity and concreteness,
we will consider a hypothetical cat brain which needs to make only one decision in life -
whether to run away from the object in front, ignore it, or approach and purr. This decision,
then, is the output of the model we will discuss. And, in this toy example, the decision
is made based on only two quantitative inputs (aka features), the perceived hardness of the
object in front and its perceived sharpness (as depicted in Fig 1.1). We do not provide
any step-by-step instruction such as "if sharpness is greater than some threshold then run away".
Instead, we try to identify a parameterized function that takes the input and converts
it to the desired decision or estimate. The simplest such function is a weighted sum of the
inputs:

$$y(\text{hardness}, \text{sharpness}) = w_0 \times \text{hardness} + w_1 \times \text{sharpness} + b$$

The weights $w_0$, $w_1$ and the bias $b$ are the parameters of the function. The output $y$ can
be interpreted as a threat score. If the threat score exceeds a threshold, the cat runs
away. If it is close to 0, the cat ignores the object. If the threat score is negative, the cat approaches
and purrs. For more complex tasks, we will use more sophisticated functions.
Note that the weights are not known at first; we need to estimate them. This is done
through a process called model training.

Overall, solving a problem via machine learning has the following stages:

- We first design a parameterized model function (e.g., a weighted sum) with unknown parameters (weights). This constitutes the model architecture. Choosing the right model architecture is where the expertise of the machine learning engineer comes into play.
- Then we estimate the weights via model training.
- Once the weights are estimated, we have a complete model. This model can take arbitrary inputs, not necessarily seen before, and generate outputs. The process where a trained model processes an arbitrary real life input and emits an output is called inferencing.

In the most popular variety of machine learning, called supervised learning, we prepare
the training data before we commence training. Training data comprises example input
items, each with its corresponding desired output.¹ Training data is often created manually,
i.e., a human goes over every single input item and produces the desired output (aka
target output). It is usually the most arduous part of doing machine learning.
For instance, in our hypothetical cat brain example, some possible training data items
are

input: (hardness = 0.01, sharpness = 0.02) → threat = −0.90 → decision: "approach and purr"
input: (hardness = 0.50, sharpness = 0.60) → threat = 0.01 → decision: "ignore"
input: (hardness = 0.99, sharpness = 0.97) → threat = 0.90 → decision: "run away"

where the input values of hardness and sharpness are assumed to lie between 0 and 1.

¹ If you have some experience with machine learning, you will realize that I am talking about "supervised" learning here. There are also machines that do not need known outputs to learn - the so-called "unsupervised" machines - we will talk about them later.

What exactly happens during training? Answer: we iteratively process the input training
data items. For each input item, we know the desired (aka target) output. On
each iteration, we adjust the model weight values in a way that the output of the model
function on that specific input item gets at least a little bit closer to the corresponding
target output. For instance, suppose at a given iteration the weight values are w0 = 20
and w1 = 10 and b = 50. On the input (hardness = 0.01, sharpness = 0.02), we get
an output threat score y = 50.4, which is quite different from the desired y = −0.9.
We will adjust the weights, for instance reduce the bias - so w0 = 20 and w1 = 10
and b = 40. The corresponding threat score y = 40.4 is still nowhere near the desired
value, but it has moved closer. After doing this on many training data items, the weights
would start approaching their ideal values. Note that how to identify the adjustments
to the weight values is not discussed here. It needs somewhat deeper math and will be
discussed later.
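The arithmetic of that single adjustment step is easy to reproduce with a few lines of Python. The snippet below is a minimal illustrative sketch (not code from the book); the weight and bias values are just the ones quoted in the paragraph above.

import numpy as np

# Threat score of the toy cat-brain model: y = w0*hardness + w1*sharpness + b
def threat_score(hardness, sharpness, w0, w1, b):
    return w0 * hardness + w1 * sharpness + b

target = -0.90                                              # desired output for this training item
y_before = threat_score(0.01, 0.02, w0=20, w1=10, b=50)     # 50.4
y_after = threat_score(0.01, 0.02, w0=20, w1=10, b=40)      # 40.4

# The error shrinks from 51.3 to 41.3 after reducing the bias.
print(abs(y_before - target), abs(y_after - target))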

As stated above, this process of iteratively tuning weights is called training or learning. At
the beginning of learning, the weights have random values, so the machine's outputs
often do not match the desired outputs. But with time, more training iterations happen
and the machine "learns" to generate the correct output. That is when the model is
ready for deployment in the real world. Given arbitrary input, the model will (hopefully)
emit something close to the desired output during inferencing.
Come to think of it, that is probably how living brains work. They contain equivalents
of mathematical models for various tasks. Here, the weights are the strengths of the
connections (aka synapses) between the different neurons in the brain. In the beginning,
the parameters are untuned and the brain repeatedly makes mistakes. E.g., a baby's
brain often makes mistakes in identifying edible objects - anybody who has had a child
will know what we are talking about. But each example tunes the parameters (eating
green and white rectangular things with a $ sign invites much scolding - should not eat
them in future, etc.). Eventually this machine tunes its parameters to yield better results.
One subtle point should be noted here. During training, the machine is tuning
its parameters so that it produces the desired outcome - on the training data inputs only.
Of course, it sees only a small fraction of all possible inputs during training - we are
not building a lookup table from known inputs to known outputs here. Hence, when
this machine gets released in the world, it mostly runs on input data it has never seen
before. What guarantee do we have that it will generate the right outcome on never
before seen data? Frankly, there is no guarantee. Only, in most real life problems, the
inputs are not really random. They have a pattern. Hopefully, the machine will see
enough during training to capture that pattern. Then, its output on unseen input will
be close to the desired value. The closer the distribution of the training data is to real
life, the likelier that becomes.

1.2 A Function Approximation View of Machine Learning: Models and their Training
As stated in section 1.1, to create a brain-like machine that makes classifications or
estimations, we have to find a mathematical function (model) that transforms inputs
into corresponding desired outputs. Sadly however, in typical real life situations, we do
not know that transformation function. For instance, we do not know the function that
takes in past prices, world events, etc. and estimates the future price of a stock - something
that stops us from building a stock price estimator and getting rich. All we have
is the training data - a set of inputs on which the output is known. How do we proceed
then? Answer: we will try to model the unknown function. This means we will create
a function that will be a proxy or surrogate to the unknown function. Viewed in this
way, machine learning is nothing but function approximation - we are simply trying to
approximate the unknown classification or estimation function.

Let us briefly recapitulate the main ideas from the previous section. In machine learning,
we try to solve problems that can be abstractly viewed as transforming a set of inputs
to an output. The output is either a class or an estimated value. Since we do not know
the true transformation function, we try to come up with a model function. We start by
designing - using our physical understanding of the problem - a model function with
tunable parameter values that could serve as a proxy for the true function. This is the
model architecture, and the tunable parameters are also known as weights. The simplest
model architecture is one where the output is a weighted sum of the input values.
Determining the model architecture does not fully determine the model - we still need
to determine the actual parameter values (weights). That is where training comes in.
During training, we find an optimal set of weights that would transform the training
inputs to outputs that match the corresponding training outputs as closely as possible.
Then we deploy this machine in the world - now its weights are estimated and the function
is fully determined - on any input, it simply applies the function and generates an
output. This is called inferencing. Of course, training inputs are only a fraction of all
possible inputs, so there is no guarantee that inferencing will yield a desired result on
all real inputs. The success of the model depends on the appropriateness of the chosen
model architecture and the quality and quantity of training data.
In this context, the author would like to note that after mastering machine learning,
the biggest struggle faced by a practitioner turns out to be the procurement of training
data. It is common practice, when one can afford it, to use humans to hand-generate
the outputs corresponding to the training data inputs (these target outputs are sometimes
referred to as ground truth). This process, known as human labeling or human
curation, involves an army of human beings looking at a substantial number of training
data inputs and producing the corresponding ground truth output. For some well-researched
problems, one may be lucky enough to get training data on the internet;
otherwise it becomes a daunting challenge. More on this later.
Now, let us study the process of model building with a concrete example, the cat
brain machine shown in Fig 1.1.

1.3 A simple machine learning model - the cat brain²
For the sake of simplicity and concreteness, we will deal with a hypothetical cat which
needs to make only one decision in life - whether to run away from the object in front,
ignore it, or approach and purr. And it makes this decision based on only two quantitative
inputs pertaining to the object in front of the cat (shown in Fig 1.1).
INPUT FEATURES
i) $x_0$ signifying Hardness ii) $x_1$ signifying Sharpness.
Without loss of generality, we can normalize the inputs. This is a pretty popular trick,
whereby the input values ranging between a minimum possible value $v_{min}$ and a maximum
possible value $v_{max}$ are transformed to values between 0 and 1. To transform an
arbitrary input value $v$ to a normalized value $v_{norm}$ we use the formula

$$v_{norm} = \frac{v - v_{min}}{v_{max} - v_{min}} \tag{1.1}$$

In mathematical parlance, transformation via equation 1.1, $v \in [v_{min}, v_{max}] \rightarrow v_{norm} \in [0, 1]$,
maps the values $v$ from the input domain $[v_{min}, v_{max}]$ to the output values $v_{norm}$
in the range $[0, 1]$.
A 2-element vector $\vec{x} = \begin{bmatrix} x_0 \\ x_1 \end{bmatrix} \in [0, 1]^2$ represents a single input instance succinctly.
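Equation 1.1 takes only a line of NumPy. The following is a minimal illustrative sketch (not part of the book's code repository); the raw values and the minimum/maximum are made-up numbers.

import numpy as np

def normalize(v, v_min, v_max):
    # Equation 1.1: map values from [v_min, v_max] to [0, 1].
    return (v - v_min) / (v_max - v_min)

raw = np.array([2.0, 5.0, 9.5])                    # hypothetical raw feature values
print(normalize(raw, v_min=2.0, v_max=9.5))        # [0.  0.4  1.]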
OUTPUT DECISIONS
The final output is multi-class; it can take one of three possible values: i) 0: implying
run away from the object in front, ii) 1: implying ignore the object in front, iii) 2:
implying approach and purr. It is possible in machine learning to compute the class
directly. However, in this example we will have our model estimate a threat score. It
is interpreted as follows: i) threat high positive - run away, ii) threat near zero - ignore,
iii) threat high negative - approach and purr (negative threat is attractive).
We can make a final multi-class run/ignore/approach decision based on the threat
score by comparing the threat score $y$ against a threshold $\delta$ as follows

$$y \begin{cases} > \delta & \rightarrow \text{high threat, run away} \\ \geq -\delta \text{ and } \leq \delta & \rightarrow \text{threat close to zero, ignore} \\ < -\delta & \rightarrow \text{negative threat, approach and purr} \end{cases} \tag{1.2}$$

2 This chapter is a lightweight overview of machine/deep learning. As such, it mildly relies upon mathematical concepts that we will introduce later. The reader is encouraged to read this chapter now, nonetheless, and perhaps re-read it after the chapters on vectors and matrices have been digested.
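A direct implementation of the thresholding rule in equation 1.2 might look like the sketch below; it is illustrative only, and the value of the threshold delta is a hypothetical choice, not one prescribed by the text.

def decide(threat_score, delta=0.1):
    # Map a threat score to a run/ignore/approach decision as per equation 1.2.
    if threat_score > delta:
        return "run away"            # high positive threat
    elif threat_score < -delta:
        return "approach and purr"   # negative threat is attractive
    else:
        return "ignore"              # threat close to zero

print(decide(0.9), decide(0.01), decide(-0.9))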


MODEL ESTIMATION
Now for the all-important step. We need to estimate the function which transforms the
input vector to the output. With a slight abuse of terms, we will denote this function as
well as the output by $y$. In mathematical notation, we want to estimate $y(\vec{x})$.

Of course, we do not know the ideal function. We will try to estimate this unknown
function from the training data. This is accomplished in two steps:
1 Model Architecture Selection: Designing a parameterized function that we expect is a good proxy or surrogate for the unknown ideal function.
2 Training: Estimating the parameters of that chosen function such that the outputs on training inputs match the corresponding outputs as closely as possible.

MODEL ARCHITECTURE SELECTION

This is the step where various machine learning approaches differ from one another.
In this toy cat brain example, we will use the simplest possible model. Our model has
3 parameters, $w_0$, $w_1$, $b$ - they can be represented compactly with a single 2-element
vector $\vec{w} = \begin{bmatrix} w_0 \\ w_1 \end{bmatrix} \in \mathbb{R}^2$ and a constant bias $b \in \mathbb{R}$ (here $\mathbb{R}$ denotes the set of all
real numbers, $\mathbb{R}^2$ denotes the set of 2D vectors with both elements real, etc.). It emits
the threat score, $y$, which is computed as

$$y(x_0, x_1) = w_0 x_0 + w_1 x_1 + b = \begin{bmatrix} w_0 & w_1 \end{bmatrix} \begin{bmatrix} x_0 \\ x_1 \end{bmatrix} + b = \vec{w}^T \vec{x} + b \tag{1.3}$$

Note that $b$ is a slightly special parameter. It is a constant that does not get multiplied
with any of the inputs. It is common practice in machine learning to refer to it as the bias,
while the other parameters that get multiplied with inputs are referred to as weights.
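In NumPy, the model of equation 1.3 is a one-line dot product. The sketch below is illustrative; the weight values plugged in are the ones section 1.4 later says training should converge to ($w_0 = w_1 = 1/\sqrt{2}$, $b = -1/\sqrt{2}$), used here only to produce plausible numbers.

import numpy as np

def cat_brain_model(x, w, b):
    # Threat score y = w^T x + b (equation 1.3).
    return np.dot(w, x) + b

w = np.array([1.0, 1.0]) / np.sqrt(2)   # illustrative weights
b = -1.0 / np.sqrt(2)                   # illustrative bias

x = np.array([0.99, 0.97])              # a hard, sharp object
print(cat_brain_model(x, w, b))         # high positive threat -> run away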
MODEL TRAINING
Once the model architecture is chosen, we know the exact parametric function we are
going to use to model the unknown function $y(\vec{x})$ that transforms inputs to outputs.
We still need to estimate the function's parameters. Thus, we have a function with unknown
parameters, and the parameters are to be estimated from a set of inputs with
known outputs (training data). We will choose the parameters so that the outputs on
the training data inputs match the corresponding outputs as closely as possible.
It should be noted that this problem has been studied by mathematicians and is known
as a function fitting problem in mathematics. What changed with the advent of machine
learning, however, is the sheer scale. In machine learning, we deal with training
data comprising millions and millions of items. This changed the philosophy of the
solution. Mathematicians used a "closed-form solution", where the parameters are estimated
by directly solving equations involving all the training data items together. In
machine learning, one goes for iterative solutions, where one deals with a few, perhaps
a single, training data item at a time. In the iterative solution, there is no need to hold
the entire training data in the computer's memory. One simply loads small portions of
it at a time and deals with only that portion. We will exemplify this with our cat brain
example.

Concretely, the goal of the training process is to estimate the parameters $w_0$, $w_1$, $b$,
or equivalently the vector $\vec{w}$ along with the constant $b$ from equation 1.3, in such a way that
the output $y(x_0, x_1)$ on training data input $(x_0, x_1)$ matches the corresponding known
training data outputs (aka ground truth or GT) as much as possible.

Let the training data comprise $N + 1$ inputs $\vec{x}^{(0)}, \vec{x}^{(1)}, \cdots, \vec{x}^{(N)}$. Here each $\vec{x}^{(i)}$ is a
$2 \times 1$ vector denoting a single training data input instance. The corresponding desired
threat values (outputs) are $y_{gt}^{(0)}, y_{gt}^{(1)}, \cdots, y_{gt}^{(N)}$, say (here the subscript $gt$ denotes ground
truth). Equivalently, we can say the training data comprises $N + 1$ (input, output) pairs:

$$\left(\vec{x}^{(0)}, y_{gt}^{(0)}\right), \left(\vec{x}^{(1)}, y_{gt}^{(1)}\right), \cdots, \left(\vec{x}^{(N)}, y_{gt}^{(N)}\right)$$

Suppose $\vec{w}$ denotes the (as yet unknown) optimal parameters for the model. Then,
given an arbitrary input $\vec{x}$, the machine will estimate a threat value of $y_{predicted} = \vec{w}^T \vec{x} + b$.
On the $i^{th}$ training data pair, $\left(\vec{x}^{(i)}, y_{gt}^{(i)}\right)$, the machine will estimate

$$y_{predicted}^{(i)} = \vec{w}^T \vec{x}^{(i)} + b$$

while the desired output is $y_{gt}^{(i)}$. Thus the squared error (aka loss) made by the machine
on the $i^{th}$ training data instance is³

$$e_i^2 = \left(y_{predicted}^{(i)} - y_{gt}^{(i)}\right)^2$$

The overall loss on the entire training data set is obtained by adding the loss from each
individual training data instance:

$$E^2 = \sum_{i=0}^{N} e_i^2 = \sum_{i=0}^{N} \left(y_{predicted}^{(i)} - y_{gt}^{(i)}\right)^2 = \sum_{i=0}^{N} \left(\vec{w}^T \vec{x}^{(i)} + b - y_{gt}^{(i)}\right)^2$$

The goal of training is to find the set of model parameters (aka weights), $\vec{w}$, that minimizes
the total error $E$. Exactly how we do this will be described later.
In most cases, it is not possible to come up with a closed-form solution for the optimal
$\vec{w}$, $b$. Instead, we take an iterative approach, depicted in Algorithm 1. In Algorithm 1,
we start with random parameter values and keep tuning the parameters so that the total
error goes down at least a little bit. We keep doing this until the error becomes sufficiently
small.
In a purely mathematical sense, one continues the iterations until the error is minimal.
3 In this context, it should be noted that it is common practice to square the error/loss to make it sign independent. If we desired an output of, say, 10, we are equally happy/unhappy if the output is 9.5 or 10.5. Thus, an error of +0.5 or −0.5 is effectively the same; hence we make the error sign independent.


Algorithm 1 Training a supervised model

Initialize parameters $\vec{w}$, $b$ with random values
▷ iterate while error not small enough
while $E^2 = \sum_{i=0}^{N} \left(\vec{w}^T \vec{x}^{(i)} + b - y_{gt}^{(i)}\right)^2 > \text{threshold}$ do
    ▷ iterate over all training data instances
    for all $i \in [0, N]$ do
        ▷ details provided in section 3.3 after gradients are introduced
        Adjust $\vec{w}$, $b$ so that $E^2$ is reduced
    end for
end while
▷ remember the final parameter values as optimal
$\vec{w}^* \leftarrow \vec{w}$, $b^* \leftarrow b$

But in practice, one often stops when the results are accurate enough for the problem
being solved.
It is worth re-emphasizing that error here refers only to error on training data.
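As a preview, here is a minimal sketch of what Algorithm 1 can look like in code. It adjusts the parameters with simple gradient descent, which is the kind of adjustment rule developed later in the book (after gradients are introduced); the training set is the toy data from this section, and the learning rate and iteration budget are purely illustrative assumptions, not the book's implementation.

import numpy as np

# Toy training set: inputs (hardness, sharpness) and target threat scores.
X = np.array([[0.01, 0.02], [0.50, 0.60], [0.99, 0.97]])
y_gt = np.array([-0.90, 0.01, 0.90])

w = np.random.randn(2)    # random initial parameters
b = np.random.randn()
lr = 0.1                  # illustrative learning rate

for _ in range(2000):                     # iterate while error not small enough
    y_pred = X @ w + b                    # model outputs on all training inputs
    err = y_pred - y_gt
    E2 = np.sum(err ** 2)                 # total squared error
    if E2 < 1e-4:
        break
    w -= lr * 2 * (X.T @ err) / len(X)    # nudge w so that E^2 is reduced
    b -= lr * 2 * np.mean(err)            # nudge b so that E^2 is reduced

print(w, b, E2)                           # final (near-optimal) parameters and error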
INFERENCING
Finally, a trained machine (with optimal parameters $\vec{w}^*$, $b^*$) is deployed in the world.
It will receive new inputs $\vec{x}$ and will infer $y_{predicted}(\vec{x}) = \vec{w}^{*T} \vec{x} + b^*$. Classification will
happen by thresholding $y_{predicted}$ as shown in equation 1.2.

1.4 Geometrical View of Machine Learning


Each input to the cat's brain model is an array of 2 numbers: $x_0$ (signifying hardness
of the object) and $x_1$ (signifying sharpness of the object), or equivalently a $2 \times 1$ vector $\vec{x}$. A
good mental picture here is to think of the input as a point in a high dimensional space.
The input space is often called the feature space - a space where all the characteristic
features to be examined by the model are represented. The feature space dimension
is two here, but in real life problems it will be in the hundreds or thousands or more. The
exact dimensionality of the input changes from problem to problem, but the intuition
that it is a point remains.
The output $y$ should also be viewed as a point in another high dimensional space. In
this toy problem the dimensionality of the output space is 1, but in real problems it will
be higher. Typically, however, the number of output dimensions is much smaller than the
number of input dimensions.
Geometrically speaking, a machine learning model essentially maps a point in the feature
space to a point in the output space. It is expected that the classification or estimation
job to be performed by the model is easier in the output space than in the feature
space. In particular, for a classification job, input points belonging to separate classes are expected
to map to separate clusters in the output space.


Let us continue with our example cat's brain model to illustrate the idea. As stated
earlier, our feature space is 2D, with two coordinate axes $X_0$ signifying hardness and
$X_1$ signifying sharpness.⁴ Individual points in this 2D space will be denoted by coordinate
values $(x_0, x_1)$, in lower case. This is depicted in Fig 1.2. As shown in the diagram,
a good way to model the threat score is to measure the distance from the line $x_0 + x_1 = 1$.
From coordinate geometry, in a 2D space with coordinate axes $X_0$ and $X_1$, the signed
distance of a point $(a, b)$ from the line $x_0 + x_1 = 1$ is $y = \frac{a + b - 1}{\sqrt{2}}$. Examining the sign
of $y$ we can determine which side of the separator line the input point belongs to.

Figure 1.2: Geometrical View of Machine Learning: 2D input point space for the cat brain
model. The bottom left corner shows low hardness and low sharpness objects ('-' signs)
while the top right corner shows high hardness and high sharpness objects ('+' signs). The
intermediate values are near the diagonal ('$' signs). In this simple situation, mere
observation tells us that the threat score can be proxied by the signed distance, $y$, from
the diagonal line $x_0 + x_1 - 1 = 0$. One can make the run/ignore/approach decision by
thresholding $y$. Values close to zero imply ignore, positive values imply run away and
negative values imply approach and purr. From high school geometry, the distance of
an arbitrary input point $(x_0 = a, x_1 = b)$ from the line $x_0 + x_1 - 1 = 0$ is $\frac{a + b - 1}{\sqrt{2}}$. Thus,
the function $y(x_0, x_1) = \frac{x_0 + x_1 - 1}{\sqrt{2}}$ is a possible model for the cat brain threat estimator
function. Training should converge to $w_0 = \frac{1}{\sqrt{2}}$, $w_1 = \frac{1}{\sqrt{2}}$ and $b = -\frac{1}{\sqrt{2}}$.

4 We use $X_0$, $X_1$ as coordinate symbols instead of the more familiar $X$, $Y$ so as not to run out of symbols when going to higher dimensional spaces.


Thus, our simplified cat brain threat score model is

$$y(x_0, x_1) = \frac{1}{\sqrt{2}} x_0 + \frac{1}{\sqrt{2}} x_1 - \frac{1}{\sqrt{2}} \tag{1.4}$$

It maps the 2D input points, signifying hardness and sharpness of the object in front, to
a 1D value corresponding to the signed distance from a separator line. This distance,
physically interpretable as a threat score, makes it possible to separate the classes (negative
threat, neutral, positive threat) via thresholding as shown in equation 1.2. The
separate classes form distinct clusters in the output space, depicted by +, − and $ signs
in the output space. Low values of inputs produce negative threats (the cat will approach
and purr), e.g., $y(0, 0) = -\frac{1}{\sqrt{2}}$. High values of inputs produce high threat (the cat
will run away), e.g., $y(1, 1) = \frac{1}{\sqrt{2}}$. Medium values of input produce near zero threat
(the cat will ignore), e.g., $y(0.5, 0.5) = 0$. Of course, because the problem is so simple,
here we could come up with the model parameters via simple observation. In real life
situations, this will need training.
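The three spot-checks quoted above are easy to reproduce; the following few lines of NumPy are just an illustrative verification of equation 1.4, not code from the book.

import numpy as np

def threat(x0, x1):
    # Signed distance from the line x0 + x1 - 1 = 0 (equation 1.4).
    return (x0 + x1 - 1.0) / np.sqrt(2.0)

print(threat(0.0, 0.0))    # -0.7071... -> approach and purr
print(threat(1.0, 1.0))    #  0.7071... -> run away
print(threat(0.5, 0.5))    #  0.0       -> ignore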

The geometric view holds in higher dimensions too. In general, an n-dimensional input
vector $\vec{x}$ is mapped to an m-dimensional output vector (usually $m < n$) in such a way that
the problem becomes much simpler in the output space. An example with a 3D feature
space is shown in Figure 1.3.

1.5 Regression vs Classification in Machine Learning


As briefly outlined in section 1.1, there are two types of machine learning models: regressors
and classifiers.

In a regressor, the model tries to emit a desired value given a specific input. For instance,
the first stage (threat score estimator) of the cat brain model in section 1.3 is a
regressor model.
Classifiers, on the other hand, have a set of pre-specified classes. Given a specific input,
they try to emit the class to which the input belongs. For instance, the full cat brain
model has 3 classes: (i) run away, (ii) ignore, (iii) approach and purr. Thus, it takes an
input (hardness and sharpness values) and emits an output decision (aka class).
In this example, we convert a regressor into a classifier by thresholding the output of
the regressor (see equation 1.2). It is also possible to create models that directly output
the class without having an intervening regressor.

1.6 Linear vs Nonlinear Models


In Fig. 1.2 we faced a rather simple situation where the classes could be separated by a
line (a hyper-plane in higher dimensional spaces). This often does not happen in real
life. What if the points belonging to different classes are as shown in Fig. 1.4? In such
cases, our model architecture should no longer be a simple weighted combination. It
will be a non-linear function. For instance, check the curved separator in Fig. 1.4. Another
example is shown in Figure 1.5 - classifying the points in the 2D plane into the
two classes indicated in blue and red requires non-linear models.
Non-linear models make sense from the function approximation point of view as well.
Ultimately, our goal is to approximate very complex and highly non-linear functions
that model the classification or estimation processes demanded by life. Intuitively, it
seems better to use non-linear functions to model them.

A very popular non-linear function in machine learning is the sigmoid function, so
named because it looks like the letter 'S' in the alphabet. The sigmoid function is
typically symbolized by the Greek letter $\sigma$. It is defined as

$$\sigma(x) = \frac{1}{1 + e^{-x}} \tag{1.5}$$

The graph of the sigmoid function is shown in Fig. 1.6. Thus we can use the following
model architecture

$$y = \sigma\left(\vec{w}^T \vec{x} + b\right) \tag{1.6}$$

Figure 1.4: The two classes (indicated by '+' and '-') cannot be separated by a line;
a curved separator is needed. In 3D, this is equivalent to saying no plane can separate
the classes, a curved surface is necessary. In still higher dimensional spaces, this is
equivalent to saying no hyper-plane can separate the classes. A curved hyper-surface is
needed.

Figure 1.5: The two classes (indicated by blue and red colors respectively) cannot be
separated by a line. A non-linear (curved) separator is needed.

Figure 1.6: The sigmoid graph

Thus, a popular model architecture (still kind of simple) is that we take the sigmoid (without
parameters) of the weighted sum of the inputs. The sigmoid imparts the non-linearity.
This architecture will be able to handle relatively more complex classification
tasks than the weighted sum alone. In fact, equation 1.6 depicts the basic building
block of a neural network.
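A single building block of the form of equation 1.6 takes only a couple of lines of NumPy. The sketch below is illustrative; the weight and bias values are arbitrary assumed numbers.

import numpy as np

def sigmoid(x):
    # Equation 1.5: squashes any real number into (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def unit(x, w, b):
    # Equation 1.6: sigmoid of a weighted sum - the basic neural network building block.
    return sigmoid(np.dot(w, x) + b)

w = np.array([0.7, 0.7])    # assumed weights
b = -0.7                    # assumed bias
print(unit(np.array([0.9, 0.8]), w, b))    # a value between 0 and 1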

1.7 Higher Expressive Power through multiple non-linear layers: Deep Neural Networks
In section 1.6 we stated that adding non-linearity to the basic weighted sum yielded a
model architecture that is able to handle more complex tasks. In machine learning
parlance, the non-linear model has more "expressive power".
Now consider a real life problem, say building a dog recognizer. The input space
comprises pixel locations and pixel colors (x, y, r, g, b, where r, g, b denote the red, green,
blue components of a pixel color). The input dimensionality is large (proportional
to the number of pixels in the image). Table 1.1 gives a small glimpse into the possible
variations in background and foreground that a typical deep learning system, say a
dog image recognizer, has to deal with.
We need a machine with really high expressive power here. How do we create such
a machine in a principled way?
Instead of generating the output from the input in a single step, how about taking a
cascaded approach? We will generate a set of intermediate or hidden outputs from the
inputs, where each hidden output is essentially a single logistic regression unit. Then
we add another layer which takes the output of the previous layer as input. And so on.
Finally, we will combine the outermost hidden layer outputs into the grand output.
We describe the system in the following equations. It should be noted that we have
added a superscript to the weights to identify the layer (layer 0 is closest to the input, layer
L is the last layer, furthest from the input). We have also made the subscripts two dimensional
(so that the weights for a given layer become a matrix). The first subscript
identifies the destination node and the second subscript identifies the source node
(see Fig 1.7).

Table 1.1: A glimpse into background and foreground variations a typical deep learning system (here a dog image recognizer) has to deal with

The astute reader might notice that the following equations do not have an explicit
bias term. That is because, for simplicity of notation, we have rolled it into the set of
weights and assumed that one of the inputs, say $x_0$, equals 1 and the corresponding weight,
e.g., $w_0$, is the bias.

Layer 0: generates $n_0$ hidden outputs from $n + 1$ inputs

$$h_0^{(0)} = \sigma\left(w_{00}^{(0)} x_0 + w_{01}^{(0)} x_1 + \cdots + w_{0n}^{(0)} x_n\right)$$
$$h_1^{(0)} = \sigma\left(w_{10}^{(0)} x_0 + w_{11}^{(0)} x_1 + \cdots + w_{1n}^{(0)} x_n\right)$$
$$\vdots$$
$$h_{n_0}^{(0)} = \sigma\left(w_{n_0 0}^{(0)} x_0 + w_{n_0 1}^{(0)} x_1 + \cdots + w_{n_0 n}^{(0)} x_n\right) \tag{1.7}$$

Layer 1: generates $n_1$ hidden outputs from the $n_0$ hidden outputs of layer 0

$$h_0^{(1)} = \sigma\left(w_{00}^{(1)} h_0^{(0)} + w_{01}^{(1)} h_1^{(0)} + \cdots + w_{0 n_0}^{(1)} h_{n_0}^{(0)}\right)$$
$$h_1^{(1)} = \sigma\left(w_{10}^{(1)} h_0^{(0)} + w_{11}^{(1)} h_1^{(0)} + \cdots + w_{1 n_0}^{(1)} h_{n_0}^{(0)}\right)$$
$$\vdots$$
$$h_{n_1}^{(1)} = \sigma\left(w_{n_1 0}^{(1)} h_0^{(0)} + w_{n_1 1}^{(1)} h_1^{(0)} + \cdots + w_{n_1 n_0}^{(1)} h_{n_0}^{(0)}\right) \tag{1.8}$$

$$\cdots$$

Final Layer (L): generates $m + 1$ visible outputs from the $n_{L-1}$ hidden outputs of the previous layer

$$h_0^{(L)} = \sigma\left(w_{00}^{(L)} h_0^{(L-1)} + w_{01}^{(L)} h_1^{(L-1)} + \cdots + w_{0 n_{L-1}}^{(L)} h_{n_{L-1}}^{(L-1)}\right)$$
$$h_1^{(L)} = \sigma\left(w_{10}^{(L)} h_0^{(L-1)} + w_{11}^{(L)} h_1^{(L-1)} + \cdots + w_{1 n_{L-1}}^{(L)} h_{n_{L-1}}^{(L-1)}\right)$$
$$\vdots$$
$$h_m^{(L)} = \sigma\left(w_{m0}^{(L)} h_0^{(L-1)} + w_{m1}^{(L)} h_1^{(L-1)} + \cdots + w_{m n_{L-1}}^{(L)} h_{n_{L-1}}^{(L-1)}\right) \tag{1.9}$$

The above equations can be pictorially depicted in Fig 1.7.

Figure 1.7: Multi Layered Neural Network

This machine, depicted in Fig 1.7, can be incredibly powerful, with huge expressive
power. We can adjust its expressive power systematically to fit the problem at hand.
This then is a neural network. We will devote the rest of the book to studying this.
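Equations 1.7-1.9 amount to a repeated "matrix multiply, apply sigmoid" pattern. Here is a minimal NumPy sketch of that forward pass for a hypothetical 3-input network with two hidden layers; the layer sizes and random weights are purely illustrative, and the bias is rolled into the weights by appending a constant 1 to each layer's input, as in the equations above.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, weight_matrices):
    # Pass input x through successive layers; each W has shape (n_out, n_in + 1),
    # with the extra column playing the role of the bias.
    h = x
    for W in weight_matrices:
        h = np.append(h, 1.0)    # roll the bias into the weights
        h = sigmoid(W @ h)       # one layer: weighted sums followed by sigmoid
    return h

rng = np.random.default_rng(0)
layers = [rng.normal(size=(4, 3 + 1)),    # layer 0: 3 inputs -> 4 hidden outputs
          rng.normal(size=(4, 4 + 1)),    # layer 1: 4 -> 4
          rng.normal(size=(2, 4 + 1))]    # final layer: 4 -> 2 visible outputs

print(forward(np.array([0.2, 0.7, 0.1]), layers))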

Chapter Summary
In this chapter we gave an overview of machine learning leading all the way up to deep
learning. The ideas were illustrated with a toy cat brain example. Some mathematical
notions (e.g., vectors) were used in this chapter without proper introduction. The
reader is encouraged to revisit this chapter after vectors and matrices have been introduced.
The author would like to leave the reader with the following mental pictures from this
chapter:

- Machine learning as a fundamentally different paradigm of computing. In traditional computing, one provides a step-by-step instruction sequence to the computer, telling it what to do. In machine learning, one builds a mathematical model that tries to approximate the unknown function that generates a classification or estimation from inputs.
- The mathematical nature of the model function is stipulated by the physical nature and complexity of the classification or estimation task. Models have parameters. Parameter values are estimated from training data - inputs with known outputs. The parameter values are optimized so that the model output is as close as possible to the training outputs on training inputs.
- An alternative geometric view of a machine is a transformation that maps points in the multi-dimensional input space to a point in the output space.
- The more complex the classification/estimation task, the more complex the approximating function. In machine learning parlance, complex tasks need machines with higher expressive power. Higher expressive power comes from non-linearity (e.g., the sigmoid function, see 8.1) and layered combination of simpler machines. This takes us to deep learning, which is nothing but a multi-layered non-linear machine.
- Complex model functions are often built by combining simpler basis functions.

Tighten your seat belts, the fun is about to get more intense.


2 Introduction to Vectors, Matrices and Tensors from Machine Learning and Data Science point of view
At its core, machine learning, indeed all computer software, is about number crunching.
One inputs a set of numbers to the machine and gets back a different set of numbers
as output. However, this cannot be done randomly. It is important to organize
these numbers appropriately, group them into meaningful objects that go in and come
out of the machine. This is where vectors and matrices come in. These are concepts
that mathematicians have been using for centuries - we are simply reusing them in
machine learning. In this chapter, we will study vectors and matrices, primarily from
a machine learning point of view. Starting from the basics, we will quickly graduate
to advanced concepts, restricting ourselves to topics that have relevance to machine
learning.

We provide Jupyter notebook based Python implementations for most of the concepts
discussed in this and other chapters. Complete, fully functional code that can be downloaded
and executed (after installing Python and Jupyter notebook) can be found at
http://mng.bz/KMQ4. The code relevant to this chapter can be found at
http://mng.bz/d4nz.

2.1 Vectors and their role in Machine Learning and Data Science
Let us revisit the machine learning model for the cat brain that was introduced in section 1.3. It
takes two numbers as input, representing the hardness and sharpness of the object in
front of the cat. The cat brain processes the input and generates an output threat score
which leads to a run away or ignore or approach and purr decision. Now, the two input
numbers usually appear together and it will be handy to group them together into a
single object. This object will be an ordered sequence of two numbers, the first one
representing hardness and the second one representing sharpness. Such an object is
a perfect example of a vector.
Thus, a vector can be thought of as an ordered sequence of two or more numbers,
also known as an array of numbers.¹ Vectors constitute a compact way of denoting a
set of numbers that together represent some entity. In this book, vectors will be represented
by lower case letters with an overhead arrow, and arrays by square brackets.
For instance, the input to the cat brain model in section 1.3 was a vector $\vec{x} = \begin{bmatrix} x_0 \\ x_1 \end{bmatrix}$ where $x_0$
represented hardness and $x_1$ represented sharpness.
Outputs of machine learning models too are often represented as vectors. For instance,
consider an object recognition model that takes an image as input and emits a set of
numbers indicating the probabilities that the image contains a dog, human or cat respectively.
The output of such a model will be a 3-element vector $\vec{y} = \begin{bmatrix} y_0 \\ y_1 \\ y_2 \end{bmatrix}$ where
the number $y_0$ denotes the probability that the image contains a dog, $y_1$ denotes the
probability that the image contains a human and $y_2$ denotes the probability that the
image contains a cat. Table 2.1 shows some possible input images and corresponding
output vectors.

In multi-layered machines like neural networks, the input and output of each layer will
be vectors. We also typically represent the parameters of the model function (see section 1.3)
as vectors. This is illustrated below in section 2.3.

One particularly significant notion in machine learning and data science is the idea of
a feature vector. This is essentially a vector that describes various properties of the object
being dealt with in a particular machine learning problem. We will illustrate the idea
with an example from the world of Natural Language Processing (NLP). Suppose we
have a set of documents. We want to create a document retrieval system where, given a
new document, we have to retrieve "similar" documents in the system. This essentially
boils down to estimating the similarity between documents in a quantitative fashion. We
will study this problem in detail later, but for now we want to note that the most natural
way to approach this is to create feature vectors for each document that quantitatively
describe the document. Later, in section 2.5.6, we will see how to measure the similarity
between these vectors; for now let us focus on simply creating descriptor vectors for the
documents.
1 In mathematics, vectors can have an infinite number of elements. Such vectors cannot be expressed as
arrays - but we will mostly ignore them in this book.


Table 2.1: Input images and corresponding output vectors denoting probabilities that
the image contains a dog and/or human and/or cat respectively. Possible output vector
for the top left image: [0.9 0.01 0.1]; for the top right image: [0.9 0.01 0.9]; for the
bottom left image: [0.01 0.99 0.01]; for the bottom right image: [0.88 0.9 0.001].

A popular way to do this is to choose a set of interesting words - we typically
exclude words like "and", "if", "to", which are present in all documents, from this list -
count the number of occurrences of the interesting words in each document and make a
vector of these counts. Table 2.2 shows a toy example with 7 documents and corresponding
feature vectors. For simplicity, we have considered only two of the possible set of
words ("gun" and "violence", in plural or singular, upper or lower case).
The sequence of pixels in the image can also be viewed as a feature vector. Neural
networks in computer vision tasks usually expect this feature vector.
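The word-count feature vectors of Table 2.2 can be computed with a few lines of Python. This is an illustrative sketch (not the book's code); it counts occurrences of the chosen words, ignoring case and singular/plural differences, for two of the toy documents.

import re

def feature_vector(document, words=("gun", "violence")):
    # Count occurrences of each chosen word (singular or plural, any case).
    tokens = re.findall(r"[a-z]+", document.lower())
    return [sum(1 for t in tokens if t == w or t == w + "s") for w in words]

d1 = "Gun violence has reached an epidemic proportion in America."
d3 = "Guns are for violence prone people. Violence begets guns. Guns beget violence."
print(feature_vector(d1))   # [1, 1]
print(feature_vector(d3))   # [3, 3]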

2.1.1 Geometric View of Vectors and its significance in Machine Learning and Data
Science
Vectors can also be viewed geometrically. The simplest example is a 2-element vector
$\vec{x} = \begin{bmatrix} x_0 \\ x_1 \end{bmatrix}$. Its 2 elements can be taken to be $x$ and $y$, Cartesian coordinates in a
2-dimensional space. Then the vector will correspond to a point in that space. Vectors
with $n$ elements will represent points in an $n$-dimensional space. The ability to see inputs
and outputs of machine learning models as points allows us to view the model itself
as a geometric transformation that maps input points to output points in some high
dimensional space. We have already seen this once in section 1.4. It is an enormously
powerful concept that we will keep utilizing throughout the book.

We will briefly touch upon a subtle issue here. A vector represents the position of a
point with respect to another. Furthermore, an array of coordinate values, like $\begin{bmatrix} x \\ y \end{bmatrix}$,
describes the position of one point in a given coordinate system.

d0: "Roses are lovely. Nobody hates roses." → [0 0]
d1: "Gun violence has reached an epidemic proportion in America." → [1 1]
d2: "The issue of gun violence is really over-hyped. One can find many instances of violence where no guns were involved." → [2 2]
d3: "Guns are for violence prone people. Violence begets guns. Guns beget violence." → [3 3]
d4: "I like guns but I hate violence. I have never been involved in violence. But I own many guns. Gun violence is incomprehensible to me. I do believe gun owners are the most anti violence people on the planet. He who never uses a gun will be prone to senseless violence." → [5 5]
d5: "Guns were used in an armed robbery in San Francisco last night." → [1 0]
d6: "Acts of violence usually involve a weapon." → [0 1]

Table 2.2: Example toy documents and their corresponding feature vectors. The first element of
the feature vector indicates the number of occurrences of the word "gun", the second of
"violence".

See Figure 2.1 to get an intuitive understanding of this. For instance, consider the plane of a page of this book.
Suppose we want to reach the top right corner point of the page from the bottom left
corner. Let us call the bottom left corner $O$ and the top right corner $P$. We can travel
the width (8.5 inches) rightwards to reach the bottom right corner and then travel the
height (11 inches) upwards to reach the top right corner. Thus, if we choose a coordinate
system with the bottom left corner as origin, the X axis along the width and
the Y axis along the height, point $P$ corresponds to the array representation $\begin{bmatrix} 8.5 \\ 11 \end{bmatrix}$.
But we could also have traveled along the diagonal from the bottom left to the top right corner
to reach $P$ from $O$. Either way, we end up at the same point $P$. Thus we have a conundrum.
The vector $\vec{OP}$ represents the abstract geometric notion, the position of $P$ with
respect to $O$, independent of our choice of coordinate axes. On the other hand, the
array representation depends on the choice of coordinate system. E.g., the array $\begin{bmatrix} 8.5 \\ 11 \end{bmatrix}$
represents the top right corner point $P$ only under a specific choice of coordinate
axes (parallel to the sides of the page) and a reference point (the bottom left corner). Ideally,
we should specify the coordinate system along with the array representation to be
unambiguous. How come then we never do so in machine learning? The answer: in
machine learning, it does not matter what exactly the coordinate system is, as long as
we stick to any fixed coordinate system.
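The coordinate dependence of the array representation is easy to see numerically. In the sketch below (illustrative, not from the book), the same physical point P is expressed once in the page-aligned axes and once in axes rotated by 45 degrees; the two arrays differ even though the point has not moved.

import numpy as np

p_page_axes = np.array([8.5, 11.0])     # P in axes aligned with the page sides

theta = np.pi / 4                        # rotate the coordinate axes by 45 degrees
R = np.array([[np.cos(theta), np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])
p_rotated_axes = R @ p_page_axes

print(p_page_axes)       # [ 8.5 11. ]
print(p_rotated_axes)    # [13.788...  1.767...] - same point, different array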

There are explicit rules (which we will study below) that state how the vector transforms
when the coordinate system changes. We will invoke them when necessary. All
vectors used in a machine learning computation must consistently use the same coordinate
system or must be transformed appropriately.

One other point. Planar spaces, e.g., the plane of the paper on which this book is
written, are 2-dimensional (abbreviated 2D). The mechanical world we live in is
3-dimensional (3D). Human imagination usually fails to see higher dimensions. In
machine learning and data science, we will often talk of spaces with thousands of dimensions.
You may not see those spaces in your mind. But that is not a crippling limitation.
You will use 3-dimensional analogues in your head. They work in a surprisingly large
variety of cases. However, it is important to bear in mind that this is not always true.
Some examples where the lower dimensional intuitions fail at higher dimensions will
be shown later.

Figure 2.1: A vector describing the position of point P with respect to point O. The
basic mental picture to have is an arrowed line. This agrees with the definition of a
vector we learnt in high school: a vector has a magnitude (the length of the arrowed line)
and a direction (indicated by the arrow). On a plane, this is equivalent to the ordered
pair of numbers x, y, where the geometric interpretations of x and y are as shown in
the figure. In this context, it is worthwhile to note that only the relative positions of the
points O and P matter. If both points are moved, keeping their relationship intact,
the vector does not change.

2.2 Python code to create and access vectors and sub-vectors, slice and
dice vectors, via Numpy and PyTorch parallel code
In this book, we will try to familiarize the reader with numpy, PyTorch and similar programming paradigms alongside the relevant mathematics. Knowledge of Python basics is assumed. The reader is strongly encouraged to try out all code snippets in this book, after installing the appropriate packages such as numpy and PyTorch.
All the Python code in this book was produced via Jupyter notebooks. A summarized recapitulation of the relevant theoretical material is provided right above each code snippet.

The fully functional code demonstrating how to create vectors and access their elements, in Python numpy as well as PyTorch, can be found at http://mng.bz/xm8q.

2.2.1 Python Numpy code for introduction to Vectors


Numpy stands for Numerical Python. It is an indispensable part of practical machine learning.

Listing 2.1 Introduction to vectors via numpy

import numpy as np

# A vector in numpy is a 1-dimensional array of numbers.
# Here: the input hardness vector of our cat-brain model
# (the hardness values of the 15 training examples).
v = np.array([0.11, 0.01, 0.98, 0.12, 0.98,
              0.85, 0.03, 0.55, 0.49, 0.99,
              0.02, 0.31, 0.55, 0.87, 0.63])

# The square bracket operator lets us access individual vector elements.
first_element = v[0]
third_element = v[2]

# Negative indices count from the end of the array:
# -1 denotes the last element, -2 the second to last element.
last_element = v[-1]
second_last_element = v[-2]

# The colon operator slices off a range of elements from the vector.
second_to_fifth_elements = v[1:5]

# Nothing before the colon denotes the beginning of the array;
# nothing after the colon denotes the end of the array.
first_to_third_elements = v[:3]
last_two_elements = v[-2:]

num_elements_in_v = len(v)

2.2.2 PyTorch code for introduction to Vectors


PyTorch is an open-source machine learning library developed by Facebook's artificial intelligence group. It is one of the most elegant practical tools for developing deep learning applications at present.


Listing 2.2 Introduction to vectors via PyTorch

import torch

# A torch Tensor represents a multi-dimensional array; a vector is a 1D tensor
# and can be initialized by directly specifying its values.
# Tensor elements are float (32-bit) by default; we can force tensors to be of
# other types, e.g., float64 (double), via the dtype argument.
u = torch.tensor([0.11, 0.01, 0.98, 0.12, 0.98,
                  0.85, 0.03, 0.55, 0.49, 0.99,
                  0.02, 0.31, 0.55, 0.87, 0.63],
                 dtype=torch.float64)

# Torch tensors can be initialized from numpy arrays (v is from Listing 2.1).
v1 = torch.from_numpy(v)

# The difference between the torch tensor and its numpy version is zero.
diff = v1.sub(u)

# Torch tensors can be converted back to numpy arrays.
u1 = u.numpy()

2.3 Matrices and their role in Machine Learning and Data Science
Sometimes it is not sufficient to group a set of numbers into a vector; we have to collect several vectors into another group. For instance, consider the input to training a machine learning model. Here we have several input instances, each comprising a sequence of numbers. As seen in section 2.1, the sequence of numbers belonging to a single input instance can be grouped into a vector. How do we represent the entire collection of input instances? This is where the concept of matrices, from the world of mathematics, comes in handy. A matrix can be viewed as a rectangular array of numbers, arranged in a fixed number of rows and columns. Each row of a matrix is a vector, and so is each column. Thus a matrix can be thought of as a collection of row vectors. It can also be viewed as a collection of column vectors. We can represent the entire set of numbers that constitute the training input to a machine learning model as a matrix, with each row vector corresponding to a single training instance.

Consider our familiar cat-brainproblem


 again. As stated earlier, single input instance
x0
to the machine is a vector ~x =   where x0 describes hardness of the object in front
x1
of the cat . Now consider a training dataset with many such input instances, each with a
known output threat score. You might recall from section 1.1 that the goal in machine
learning is to create a function that maps these inputs to their respective outputs with
as little overall error as possible. Our training data may look as shown in Table 2.3
below (it should be noted that in real life problems the training dataset is usually large
in size, often runs into millions of input-output pairs, however in this toy problem we
will have 15 training data instances). From Table 2.3, we can collect the columns cor-
responding to hardness and sharpness into a matrix as shown in equation 2.1 - this is a


        input value: hardness    input value: sharpness    output: threat score
  0            0.11                    0.09                     -0.8
  1            0.01                    0.02                     -0.97
  2            0.98                    0.91                      0.89
  3            0.12                    0.21                     -0.68
  4            0.98                    0.99                      0.95
  5            0.85                    0.87                      0.74
  6            0.03                    0.14                     -0.88
  7            0.55                    0.45                      0.00
  8            0.49                    0.51                      0.01
  9            0.99                    0.01                      0.009
 10            0.02                    0.89                     -0.07
 11            0.31                    0.47                     -0.23
 12            0.55                    0.29                     -0.14
 13            0.87                    0.76                      0.65
 14            0.63                    0.74                      0.36

Table 2.3: Example Training Dataset for our Toy Machine Learning Based Cat Brain
This matrix is a compact representation of the training dataset for this problem (we will usually use uppercase letters to symbolize matrices).
 
Example cat-brain dataset matrix:
\[
X = \begin{bmatrix}
0.11 & 0.09 \\
0.01 & 0.02 \\
0.98 & 0.91 \\
0.12 & 0.21 \\
0.98 & 0.99 \\
0.85 & 0.87 \\
0.03 & 0.14 \\
0.55 & 0.45 \\
0.49 & 0.51 \\
0.99 & 0.01 \\
0.02 & 0.89 \\
0.31 & 0.47 \\
0.55 & 0.29 \\
0.87 & 0.76 \\
0.63 & 0.74
\end{bmatrix}
\tag{2.1}
\]

Each row of matrix X is a particular input instance. Different rows represent different input instances. Thus, moving along a row, one encounters successive elements of a single input vector. Moving along a column, one encounters elements of different input instances. Notice that an individual element is now indexed by 2 numbers, as opposed to 1 in a vector. Thus the 0th row is the vector $\begin{bmatrix} x_{00} & x_{01} \end{bmatrix}$ representing the 0th input instance.
MATRIX REPRESENTATION OF DIGITAL IMAGES
Digital images too are often represented as matrices. Here, each element represents the brightness at a specific pixel position (x, y coordinate) of the image. Typically, the brightness value is normalized to an integer in the range 0 to 255: 0 is black, 255 is white, 128 is mid-gray, and so on (in digital computers, numbers in the range 0..255 can be represented with a single byte of storage, hence this choice). Following is an example of a tiny image, 9 pixels in width and 4 pixels in height.
 
\[
I_{4,9} = \begin{bmatrix}
0 & 8 & 16 & 24 & 32 & 40 & 48 & 56 & 64 \\
64 & 72 & 80 & 88 & 96 & 104 & 112 & 120 & 128 \\
128 & 136 & 144 & 152 & 160 & 168 & 176 & 184 & 192 \\
192 & 200 & 208 & 216 & 224 & 232 & 240 & 248 & 255
\end{bmatrix}
\tag{2.2}
\]

The brightness increases gradually from left to right and also from top to bottom. $I_{0,0}$ represents the top left pixel, which is black. $I_{3,8}$ represents the bottom right pixel, which is white. The intermediate pixels are various shades of gray between black and white. The actual image looks as shown in Figure 2.2.

Figure 2.2: Image corresponding to matrix I4,9 in equation 2.2

2.4 Python Code: Introduction to Matrices, Tensors and Images via Numpy
and PyTorch parallel code
For programming purposes, one can think of tensors as multi-dimensional arrays. Scalars are 0-dimensional tensors. Vectors are 1-dimensional tensors. Matrices are 2-dimensional tensors. RGB images are 3-dimensional tensors (color channels × height × width). A batch of 64 images is a 4-dimensional tensor (64 × color channels × height × width).

2.4.1 Python Numpy code for introduction to Tensors, Matrices and Images



Listing 2.3 Introduction to matrices via numpy

# A matrix is a 2D array of numbers, i.e., a 2D tensor.
# The entire training data input set for a machine-learning model can be viewed
# as a matrix: each input instance is one row.
# Row count = number of training examples; column count = training instance size.

# Cat-brain training data input: 15 examples, each with 2 values
# (hardness, sharpness), stored as a 15 x 2 numpy array created by
# directly specifying the values.
X = np.array(
    [
        [0.11, 0.09], [0.01, 0.02], [0.98, 0.91],
        [0.12, 0.21], [0.98, 0.99], [0.85, 0.87],
        [0.03, 0.14], [0.55, 0.45], [0.49, 0.51],
        [0.99, 0.01], [0.02, 0.89], [0.31, 0.47],
        [0.55, 0.29], [0.87, 0.76], [0.63, 0.74]
    ]
)

# The shape of a tensor is a tuple: for a matrix, the first element is the
# number of rows, the second the number of columns.
print("Shape of the matrix is: {}".format(X.shape))

# Square brackets extract individual matrix elements.
first_element = X[0, 0]

# A standalone colon denotes all possible indices along that axis.
row_0 = X[0, :]

# The colon operator can also denote a range of indices.
row_1 = X[1, 0:2]

column_0 = X[:, 0]   # 0th column
column_1 = X[:, 1]   # 1st column

Listing 2.4 Slicing and dicing matrices

# Ranges of rows and columns can be specified via the colon operator
# to slice off (extract) sub-matrices.

# Extract the first 3 training examples (rows).
first_3_training_examples = X[:3, :]

# Extract the sharpness feature for the 5th to 7th training examples.
print("Sharpness of 5-7 training examples is: {}".format(X[5:8, 1]))

Listing 2.5 Tensors and Images in numpy

import cv2

# Numpy n-dimensional arrays represent tensors:
# a vector is a 1-tensor, a matrix is a 2-tensor, a scalar is a 0-tensor.

# Create a random tensor of specified dimensions.
tensor = np.random.random((5, 5, 3))

# All images are tensors. An RGB image of height H and width W is a 3-tensor;
# its shape is [3, H, W] or [H, W, 3] depending on the convention used.
# Below: the 4 x 9 single-channel image shown in Fig 2.2.
I49 = np.array([[0, 8, 16, 24, 32, 40, 48, 56, 64],
                [64, 72, 80, 88, 96, 104, 112, 120, 128],
                [128, 136, 144, 152, 160, 168, 176, 184, 192],
                [192, 200, 208, 216, 224, 232, 240, 248, 255]],
               dtype=np.uint8)

# Read a 199 x 256 x 3 image from disk. OpenCV stores it as
# height x width x channels, with the channels in B, G, R order.
img = cv2.imread('../../Figures/dog3.jpg')

# The usual slicing and dicing operators work on images too:
# extract the blue, green and red channels, shown in Fig. 2.3.
img_b = img[:, :, 0]
img_g = img[:, :, 1]
img_r = img[:, :, 2]

# Crop out a 100 x 100 sub-image, shown in Fig. 2.4.
img_cropped = img[0:100, 0:100, :]

Figure 2.3: Tensors and images in numpy: (a) original image, (b) red channel, (c) green channel, (d) blue channel.

Figure 2.4: Cropped image of dog

2.4.2 PyTorch code for introduction to Tensors and Matrices


Listing 2.6 Tensors in PyTorch

# Cat-brain training data input: 15 examples, each with 2 values
# (hardness, sharpness), stored as a 15 x 2 torch tensor with 64-bit float
# elements, created by directly specifying the values.
Y = torch.tensor([[0.11, 0.09], [0.01, 0.02], [0.98, 0.91],
                  [0.12, 0.21], [0.98, 0.99], [0.85, 0.87],
                  [0.03, 0.14], [0.55, 0.45], [0.49, 0.51],
                  [0.99, 0.01], [0.02, 0.89], [0.31, 0.47],
                  [0.55, 0.29], [0.87, 0.76], [0.63, 0.74]],
                 dtype=torch.float64)

# Torch tensors can be converted to numpy arrays; here Y is equivalent to the
# numpy matrix X from Listing 2.3.
np.allclose(X, Y.numpy(), rtol=1e-7)
np.allclose(torch.from_numpy(X), Y, 1e-7)

# The slicing operations of numpy arrays work on torch tensors too.
print(Y[3, :])
print(Y[3:5, 1:2])

# Torch tensors can be added and subtracted like numpy arrays.
print(torch.from_numpy(X) - Y)

2.5 Basic Vector and Matrix operations in Machine Learning and Data
Science
In this section we will introduce several basic vector and matrix operations along with
examples to demonstrate their significance in image processing, computer vision and
machine learning. It is meant to be an application-centric introduction to linear alge-
bra. But it is not meant to be a comprehensive review of matrix and vector operations,
for which the reader is referred to a textbook on linear algebra.

2.5.1 Matrix and Vector Transpose


In equation 2.2 we encountered the matrix $I_{4,9}$ depicting a tiny image. Suppose we want to rotate the image by 90°, so that it looks like Figure 2.5.

Figure 2.5: Image corresponding to the transpose of matrix $I_{4,9}$, shown in equation 2.3. This is equivalent to rotating the image by a 90° angle.
The original matrix $I_{4,9}$ and its transpose $I_{4,9}^T = I_{9,4}$ are shown below:
\[
I_{4,9} = \begin{bmatrix}
0 & 8 & 16 & 24 & 32 & 40 & 48 & 56 & 64 \\
64 & 72 & 80 & 88 & 96 & 104 & 112 & 120 & 128 \\
128 & 136 & 144 & 152 & 160 & 168 & 176 & 184 & 192 \\
192 & 200 & 208 & 216 & 224 & 232 & 240 & 248 & 255
\end{bmatrix}
\]
\[
I_{4,9}^T = I_{9,4} = \begin{bmatrix}
0 & 64 & 128 & 192 \\
8 & 72 & 136 & 200 \\
16 & 80 & 144 & 208 \\
24 & 88 & 152 & 216 \\
32 & 96 & 160 & 224 \\
40 & 104 & 168 & 232 \\
48 & 112 & 176 & 240 \\
56 & 120 & 184 & 248 \\
64 & 128 & 192 & 255
\end{bmatrix}
\tag{2.3}
\]

By comparing equation 2.2 and equation 2.3, one can easily see that one can be obtained from the other by interchanging the row and column indices. This operation is generally known as matrix transposition.

Formally, the transpose of a matrix $A_{m,n}$ with m rows and n columns is another matrix with n rows and m columns. This transposed matrix, denoted $A^T_{n,m}$, is such that $A^T_{ij} = A_{ji}$. For example, the value at row 0, column 6 in matrix $I_{4,9}$ is 48. In the transposed matrix the same value appears at row 6, column 0. In matrix parlance, $I_{4,9}[0, 6] = I_{4,9}^T[6, 0] = 48$.

Vector transposition is really a special case of matrix transposition (since all vectors are matrices: a column vector with n elements is an n × 1 matrix). For instance, an arbitrary vector and its transpose are shown in equations 2.4 and 2.5:
\[
\vec{v} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \tag{2.4}
\]
\[
\vec{v}^T = \begin{bmatrix} 1 & 2 & 3 \end{bmatrix} \tag{2.5}
\]
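As a quick sanity check of these definitions, here is a minimal numpy sketch (our own illustration, not one of the book's numbered listings) that transposes the image matrix of equation 2.2 and verifies that the element at row 0, column 6 moves to row 6, column 0:

import numpy as np

# The 4 x 9 image matrix I_{4,9} from equation 2.2.
I49 = np.array([[0, 8, 16, 24, 32, 40, 48, 56, 64],
                [64, 72, 80, 88, 96, 104, 112, 120, 128],
                [128, 136, 144, 152, 160, 168, 176, 184, 192],
                [192, 200, 208, 216, 224, 232, 240, 248, 255]])

I94 = I49.T                            # transpose: shape goes from (4, 9) to (9, 4)
print(I94.shape)                       # (9, 4)
print(I49[0, 6], I94[6, 0])            # 48 48, i.e., A[i, j] == A^T[j, i]

v = np.array([[1], [2], [3]])          # a column vector is a 3 x 1 matrix
print(v.T)                             # its transpose is the row vector [[1 2 3]]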

2.5.2 Dot Product of two vectors and its role in Machine Learning and Data Science
In section 1.3 we saw the simplest of machine learning models, where the output is generated by taking a weighted sum of the inputs (and then adding a constant bias value). This model/machine is characterized by the weights $w_0$, $w_1$ and the bias $b$. Take the rows of Table 2.3. E.g., for row 0, the input values are: hardness of the approaching object = 0.11 and sharpness = 0.09. The corresponding model output will be $y = w_0 \times 0.11 + w_1 \times 0.09 + b$. In fact, the goal of training is to choose $w_0$, $w_1$ and $b$ such that the model outputs are as close as possible to the known outputs: i.e., $y = w_0 \times 0.11 + w_1 \times 0.09 + b$ should be as close to −0.8 as possible, $y = w_0 \times 0.01 + w_1 \times 0.02 + b$ should be as close to −0.97 as possible, and so on. In general, given an input instance $\vec{x} = \begin{bmatrix} x_0 \\ x_1 \end{bmatrix}$, the model output is $y = x_0 w_0 + x_1 w_1 + b$.

We will keep returning to the above model throughout the chapter. But in this subsection, let us consider a different question. In this toy example we have only 2 values per input instance. That implies we have only 3 model parameters: 2 weights, $w_0$, $w_1$, and 1 bias $b$. Hence it is not very messy to write the model output flat out as $y = x_0 w_0 + x_1 w_1 + b$. Is there a compact way to represent the model output on a specific input instance, irrespective of the size of the input?

It turns out the answer is yes - we can use an operation called the dot product from the world of mathematics. We have already seen in section 2.1 that an individual instance of model input can be compactly represented by a vector, say $\vec{x}$ (it can have any number of input values). We can also represent the set of weights as a vector $\vec{w}$ - it will have

the same number of items as the input vector. The model output is obtained via the dot product operation of vectors. The dot product multiplies the two vectors $\vec{x}$ and $\vec{w}$ element by element and sums the results, as shown below.

Formally, given two vectors $\vec{x} = \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{bmatrix}$ and $\vec{w} = \begin{bmatrix} w_0 \\ w_1 \\ \vdots \\ w_n \end{bmatrix}$, the dot product of the two vectors is defined as
\[
\vec{x} \cdot \vec{w} = x_0 w_0 + x_1 w_1 + \cdots + x_n w_n \tag{2.6}
\]
In other words, the sum of the products of corresponding elements of two vectors is called the dot product of the two vectors, denoted $\vec{a} \cdot \vec{b}$.
Note that the dot product notation can compactly represent the model output as $y = \vec{w} \cdot \vec{x} + b$. The representation does not increase in size even when the number of inputs and weights is large.

Consider our (by now familiar) cat brain example again. Suppose the weight vector
3
~ =   and the bias value b = 5. Then the model output for the 0th input instance
is w
2
   
0.11 3
from Table 2.3 will be   ·   = 0.11 × 3 + 0.09 × 2 + 5 = 5.51.
0.09 2
It is another matter that these are bad choices for weight and bias parameters, since
the model output 5.51 is a far cry from the desired output −0.89. We will soon see how
to obtain better parameter values. For now, we just need to note that the dot product
offers a neat way to represent the simple weighted sum model output.

The dot product is defined only if the two vectors have the same dimension.
Sometimes the dot product is also referred to as the inner product, denoted $\langle \vec{a}, \vec{b} \rangle$. Strictly speaking, the phrase inner product is a bit more general; it applies to infinite-dimensional vectors as well. In this book, we will often use the terms interchangeably, sacrificing mathematical rigor for enhanced understanding.
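The following minimal numpy sketch (our own illustration, not one of the book's numbered listings) computes this weighted-sum output via np.dot:

import numpy as np

x = np.array([0.11, 0.09])   # 0th input instance: hardness, sharpness
w = np.array([3.0, 2.0])     # weight vector
b = 5.0                      # bias

y = np.dot(w, x) + b         # w . x + b = 0.11*3 + 0.09*2 + 5
print(y)                     # 5.51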

2.5.3 Matrix Multiplication and Machine Learning, Data Science


Matrix-Vector Multiplication: In section 2.5.2 we saw that given a weight vector, say $\vec{w} = \begin{bmatrix} 3 \\ 2 \end{bmatrix}$, and the bias value b = 5, the weighted-sum model output upon a single input instance, say $\vec{x} = \begin{bmatrix} 0.11 \\ 0.09 \end{bmatrix}$, can be represented using a vector-vector dot product: $\vec{w} \cdot \vec{x} + b = \begin{bmatrix} 0.11 \\ 0.09 \end{bmatrix} \cdot \begin{bmatrix} 3 \\ 2 \end{bmatrix} + 5$. Now, as depicted in equation 2.1, during training we deal with many training data instances at the same time. In fact, in real life, we typically deal with hundreds of thousands of input instances, each having hundreds of values. Is there a way to represent this compactly, such that the representation is independent of the count of input instances and their sizes?

Again, it turns out the answer is yes. We can use the idea of matrix-vector multiplication from the world of mathematics. The product of a matrix X and a column vector $\vec{w}$ is another vector, denoted $X\vec{w}$. Its elements are the dot products between the row vectors of X and the column vector $\vec{w}$. E.g., given the model weight vector $\vec{w} = \begin{bmatrix} 3 \\ 2 \end{bmatrix}$ and the bias value b = 5, the outputs on the toy training dataset of our familiar cat-brain model (equation 2.1) can be obtained via the following steps.

   
\[
\begin{bmatrix}
0.11 & 0.09 \\
0.01 & 0.02 \\
0.98 & 0.91 \\
0.12 & 0.21 \\
0.98 & 0.99 \\
0.85 & 0.87 \\
0.03 & 0.14 \\
0.55 & 0.45 \\
0.49 & 0.51 \\
0.99 & 0.01 \\
0.02 & 0.89 \\
0.31 & 0.47 \\
0.55 & 0.29 \\
0.87 & 0.76 \\
0.63 & 0.74
\end{bmatrix}
\begin{bmatrix} 3 \\ 2 \end{bmatrix}
=
\begin{bmatrix}
0.11 \times 3 + 0.09 \times 2 = 0.51 \\
0.01 \times 3 + 0.02 \times 2 = 0.07 \\
0.98 \times 3 + 0.91 \times 2 = 4.76 \\
0.12 \times 3 + 0.21 \times 2 = 0.78 \\
0.98 \times 3 + 0.99 \times 2 = 4.92 \\
0.85 \times 3 + 0.87 \times 2 = 4.29 \\
0.03 \times 3 + 0.14 \times 2 = 0.37 \\
0.55 \times 3 + 0.45 \times 2 = 2.55 \\
0.49 \times 3 + 0.51 \times 2 = 2.49 \\
0.99 \times 3 + 0.01 \times 2 = 2.99 \\
0.02 \times 3 + 0.89 \times 2 = 1.84 \\
0.31 \times 3 + 0.47 \times 2 = 1.87 \\
0.55 \times 3 + 0.29 \times 2 = 2.23 \\
0.87 \times 3 + 0.76 \times 2 = 4.13 \\
0.63 \times 3 + 0.74 \times 2 = 3.37
\end{bmatrix}
\tag{2.7}
\]


Adding the bias value of 5, the model output on the toy training dataset is
\[
\begin{bmatrix}
5 + 0.51 = 5.51 \\
5 + 0.07 = 5.07 \\
5 + 4.76 = 9.76 \\
5 + 0.78 = 5.78 \\
5 + 4.92 = 9.92 \\
5 + 4.29 = 9.29 \\
5 + 0.37 = 5.37 \\
5 + 2.55 = 7.55 \\
5 + 2.49 = 7.49 \\
5 + 2.99 = 7.99 \\
5 + 1.84 = 6.84 \\
5 + 1.87 = 6.87 \\
5 + 2.23 = 7.23 \\
5 + 4.13 = 9.13 \\
5 + 3.37 = 8.37
\end{bmatrix}
\tag{2.8}
\]

In general, matrix column-vector multiplication works as follows:
\[
A\vec{b} = \vec{c} \quad \text{or} \quad
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}
=
\begin{bmatrix}
c_1 = a_{11} b_1 + a_{12} b_2 + \cdots + a_{1n} b_n \\
c_2 = a_{21} b_1 + a_{22} b_2 + \cdots + a_{2n} b_n \\
\vdots \\
c_m = a_{m1} b_1 + a_{m2} b_2 + \cdots + a_{mn} b_n
\end{bmatrix}
\tag{2.9}
\]

Matrix-Matrix Multiplication: Generalizing the notion of matrix times vector, we can define matrix times matrix. A matrix with m rows and p columns, say $A_{m,p}$, can be multiplied with another matrix with p rows and n columns, say $B_{p,n}$, to generate a matrix with m rows and n columns, say $C_{m,n}$: $C_{m,n} = A_{m,p} B_{p,n}$. Note that the number of columns of the left matrix must match the number of rows of the right matrix. Element i, j of the result matrix, $C_{i,j}$, is obtained as the dot product of the ith row vector of A and the jth column vector of B. The following example illustrates the idea.

 
\[
A_{3,2} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix}
\qquad
B_{2,2} = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}
\]
\[
C_{3,2} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix}
\begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}
=
\begin{bmatrix}
c_{11} = a_{11} b_{11} + a_{12} b_{21} & c_{12} = a_{11} b_{12} + a_{12} b_{22} \\
c_{21} = a_{21} b_{11} + a_{22} b_{21} & c_{22} = a_{21} b_{12} + a_{22} b_{22} \\
c_{31} = a_{31} b_{11} + a_{32} b_{21} & c_{32} = a_{31} b_{12} + a_{32} b_{22}
\end{bmatrix}
\]
The computation of $c_{21}$ is shown via highlights by way of example.


It is worthwhile to note that matrix multiplication is not commutative, in general,
AB 6= BA.
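A quick numpy sketch (our own illustration, with arbitrary small matrices) shows matrix-matrix multiplication and its non-commutativity:

import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])

print(A @ B)                       # [[2. 1.] [4. 3.]]
print(B @ A)                       # [[3. 4.] [1. 2.]]
print(np.allclose(A @ B, B @ A))   # False: matrix multiplication is not commutative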

At this point, the astute reader may already have noted that the dot product is a special case of matrix multiplication. For instance, the dot product between two vectors $\vec{w} = \begin{bmatrix} w_0 \\ w_1 \end{bmatrix}$ and $\vec{x} = \begin{bmatrix} x_0 \\ x_1 \end{bmatrix}$ is equivalent to transposing either of the two vectors and then doing a matrix multiplication with the other. In other words,
\[
\vec{w} \cdot \vec{x}
= \vec{w}^T \vec{x} = \begin{bmatrix} w_0 & w_1 \end{bmatrix} \begin{bmatrix} x_0 \\ x_1 \end{bmatrix}
= \vec{x}^T \vec{w} = \begin{bmatrix} x_0 & x_1 \end{bmatrix} \begin{bmatrix} w_0 \\ w_1 \end{bmatrix}
= w_0 x_0 + w_1 x_1
\]


 
The idea works in higher dimensions too. In general, given two vectors $\vec{x} = \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{bmatrix}$ and $\vec{w} = \begin{bmatrix} w_0 \\ w_1 \\ \vdots \\ w_n \end{bmatrix}$, the dot product of the two vectors can be written as
\[
\vec{x} \cdot \vec{w}
= \vec{w}^T \vec{x} = \begin{bmatrix} w_0 & w_1 & \cdots & w_n \end{bmatrix} \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{bmatrix}
= \vec{x}^T \vec{w} = \begin{bmatrix} x_0 & x_1 & \cdots & x_n \end{bmatrix} \begin{bmatrix} w_0 \\ w_1 \\ \vdots \\ w_n \end{bmatrix}
= x_0 w_0 + x_1 w_1 + \cdots + x_n w_n \tag{2.10}
\]

Another special case of matrix-matrix multiplication is row-vector times matrix multiplication, e.g.,
\[
\vec{b}^T A = \vec{c} \quad \text{or} \quad
\begin{bmatrix} b_1 & b_2 & b_3 \end{bmatrix}
\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix}
=
\begin{bmatrix} c_1 = a_{11} b_1 + a_{21} b_2 + a_{31} b_3 & c_2 = a_{12} b_1 + a_{22} b_2 + a_{32} b_3 \end{bmatrix}
\]

Transpose of Matrix Products: Given two matrices A and B where the number of columns of A matches the number of rows of B (i.e., it is possible to multiply them), the transpose of the product is the product of the individual transposes, in reversed order. The rule also applies to matrix-vector multiplication. The following equations capture this rule:
\[
(AB)^T = B^T A^T
\]
\[
(A\vec{x})^T = \vec{x}^T A^T
\]
\[
(\vec{x}^T A)^T = A^T \vec{x} \tag{2.11}
\]
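A small numpy check (our own illustration, with random matrices) of the reversed-order rule in equation 2.11:

import numpy as np

rng = np.random.default_rng(0)
A = rng.random((3, 2))
B = rng.random((2, 4))
x = rng.random(2)

print(np.allclose((A @ B).T, B.T @ A.T))   # True: (AB)^T = B^T A^T
print(np.allclose((A @ x).T, x.T @ A.T))   # True: (Ax)^T = x^T A^T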

2.5.4 Length of a Vector aka L2 norm and its role in Machine Learning
Suppose a machine learning model was supposed to output a target value $\bar{y}$ but it output $y$ instead. We are interested in the error made by the model. The error is the difference between the target and the actual outputs.

We would like to make one important note here. When computing errors, we are only interested in how far away from the ideal the computed value is. We do not care whether the computed value is bigger or smaller than the ideal. For instance, if the target (ideal) value is 2, the computed values 1.5 and 2.5 are equally in error - we are equally happy or unhappy with either of them. Hence, it is common practice to square error values. Thus, for instance, if the target value is 2 and the computed value is 1.5, the error is $(1.5 - 2)^2 = 0.25$. If the computed value is 2.5, the error is $(2.5 - 2)^2 = 0.25$. The squaring operation essentially eliminates the sign of the error value. We can then follow it up with a square root, but it is OK not to. One might ask: but wait, squaring alters the value of the quantity - don't we care about the exact values of the error? The answer is, we usually don't; we only care about the relative values of errors. If the target is 2, all we want is that the error for an output value of, say, 2.1 is less than the error for an output value of 2.5; the exact values of the errors do not matter.

Let us now continue with our discussion of machine learning model error. As seen earlier in section 2.5.3, given a model weight vector, say $\vec{w} = \begin{bmatrix} 3 \\ 2 \end{bmatrix}$, and the bias value b = 5, the weighted-sum model output upon a single input instance, say $\begin{bmatrix} 0.11 \\ 0.09 \end{bmatrix}$, is $\begin{bmatrix} 0.11 \\ 0.09 \end{bmatrix} \cdot \begin{bmatrix} 3 \\ 2 \end{bmatrix} + 5 = 5.51$. The corresponding target (ideal) output, from Table 2.3, is −0.8. The squared error $e^2 = (-0.8 - 5.51)^2 = 39.82$ gives us an idea of how good or bad the model parameters 3, 2, 5 are. If, instead, we use the weight vector $\vec{w} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and bias value −1, we get model output $\vec{w} \cdot \vec{x} + b = \begin{bmatrix} 0.11 \\ 0.09 \end{bmatrix} \cdot \begin{bmatrix} 1 \\ 1 \end{bmatrix} - 1 = -0.8$. The output is exactly the same as the target. The corresponding squared error is $e^2 = (-0.8 - (-0.8))^2 = 0$. This (zero error) immediately tells us that 1, 1, −1 is a much better choice of model parameters than 3, 2, 5.


What happens when we have multiple inputs, as when training a model? In equation 2.8 we saw that given the toy training dataset from Table 2.3, a simple weighted-sum model with weights 3, 2 and bias 5 will generate the output vector
\[
\vec{y} = \begin{bmatrix} 5.51 \\ 5.07 \\ 9.76 \\ 5.78 \\ 9.92 \\ 9.29 \\ 5.37 \\ 7.55 \\ 7.49 \\ 7.99 \\ 6.84 \\ 6.87 \\ 7.23 \\ 9.13 \\ 8.37 \end{bmatrix}
\]


From Table 2.3 we also see that the target output vector is
\[
\bar{\vec{y}} = \begin{bmatrix} -0.8 \\ -0.97 \\ 0.89 \\ -0.67 \\ 0.97 \\ 0.72 \\ -0.83 \\ 0.00 \\ 0.00 \\ 0.00 \\ -0.09 \\ -0.22 \\ -0.16 \\ 0.63 \\ 0.37 \end{bmatrix}
\]


The differences between the model output and the target output over the entire training set can be expressed as a vector
\[
\vec{y} - \bar{\vec{y}} =
\begin{bmatrix} 5.51 \\ 5.07 \\ 9.76 \\ 5.78 \\ 9.92 \\ 9.29 \\ 5.37 \\ 7.55 \\ 7.49 \\ 7.99 \\ 6.84 \\ 6.87 \\ 7.23 \\ 9.13 \\ 8.37 \end{bmatrix}
-
\begin{bmatrix} -0.8 \\ -0.97 \\ 0.89 \\ -0.67 \\ 0.97 \\ 0.72 \\ -0.83 \\ 0.00 \\ 0.00 \\ 0.00 \\ -0.09 \\ -0.22 \\ -0.16 \\ 0.63 \\ 0.37 \end{bmatrix}
=
\begin{bmatrix} 6.31 \\ 6.04 \\ 8.87 \\ 6.45 \\ 8.95 \\ 8.57 \\ 6.20 \\ 7.55 \\ 7.49 \\ 7.99 \\ 6.93 \\ 7.09 \\ 7.39 \\ 8.50 \\ 8.00 \end{bmatrix}
\]

We can square the individual elements of the difference vector to obtain a squared error vector. However, to get a proper feel for the overall error during training, we would like to obtain a single number. What we would really like to do is to square each term of the difference vector and then add those squared elements to yield a single number. Recalling equation 2.10, this is exactly what happens if we take the dot product of the difference vector with itself. That happens to be the definition of the squared magnitude (i.e., squared length, or squared L2-norm) of a vector: the dot product of the vector with itself. In the above example, the overall squared error is thus the dot product of the difference vector $\vec{y} - \bar{\vec{y}}$ with itself.
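A minimal numpy sketch (our own illustration, using the model outputs and targets transcribed above) computes this overall squared error as the dot product of the difference vector with itself:

import numpy as np

# Model outputs (equation 2.8) and target outputs for the 15 training examples.
y = np.array([5.51, 5.07, 9.76, 5.78, 9.92, 9.29, 5.37, 7.55,
              7.49, 7.99, 6.84, 6.87, 7.23, 9.13, 8.37])
y_bar = np.array([-0.8, -0.97, 0.89, -0.67, 0.97, 0.72, -0.83, 0.00,
                  0.00, 0.00, -0.09, -0.22, -0.16, 0.63, 0.37])

diff = y - y_bar
overall_squared_error = np.dot(diff, diff)   # same as (diff ** 2).sum()
print(overall_squared_error)                 # a single number summarizing the training error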
©Manning Publications Co. To comment go to liveBook
https://livebook.manning.com/book/math-and-architectures-of-deep-learning/discussion
Random documents with unrelated
content Scribd suggests to you:
CHAPTER XXIII.
"MOTHER'S EYES ARE VERY TIRED."

"YOU have come to see me," said Christina rising, as


Margaret Fenton was shown into her sitting room in Gower
Street.

"Yes, ma'am," answered Margaret, looking up in the


face of her future mistress.

"Sit down," she said, "and I will explain what I should


want you to do. I hear you are very fond of children."

"Very indeed, ma'am; I was a nurse for many years till I


married."

"So I heard. Now, you know, the place I want you to fill
is not exactly a nurse's situation, it is more that of a
matron. I am very lonely, and I am going to take care of a
few little children, and try to bring them up to be useful and
happy; but above all things I wish to teach them to love our
Saviour."

Margaret's eyes looked very sympathising, but she did


not speak.

"I have almost settled on a house at Hampstead, and I


shall want, I believe, three servants; that is, a nurse-
matron, a cook, and a housemaid. My own maid Ellen has
consented to be the housemaid, at any rate for a time, and
if you are willing to be nurse, there only remains the cook
to find. But first I must tell you that I shall not be rich, so
your wages will have to be moderate."

"Oh, I am quite willing!" exclaimed Margaret. "To have


my child with me, and to be engaged in such work, I should
only want just enough to keep me respectable."

"You shall have that, you may be sure. But I mean this:
I shall not be able to pay you according to the amount of
trust I put in you, but rather according to what I can
afford."

"I quite agree," said Margaret earnestly. Then,


hesitating, she said shyly: "Have you made up your mind,
ma'am, what sort of person you wish to have for a cook?"

"Not at all. Do you know of anyone?"

"It has been my only trouble in accepting your kind


offer, ma'am—my mother-in-law; she will, I fear, be so very
strange without me and Maggie; but I did not know if she
would be too old. She is very strong and able, ma'am, and
an earnest Christian woman too."

"I should be most thankful to find such an one. But


when could I see her?" asked Christina, while an inward
thanksgiving rose to the Father who was helping her
forward step by step.

"Well, ma'am, as it happens, she has come up to town


with me to-day. I had earned an extra shilling or two, and I
gave her and Maggie the treat, as I thought it might be a
long time before I could again."
"I am so glad," said Christina; "where is she now?"

Margaret went to the window and looked out, and


Christina also glanced down the street, and in a minute the
grandmother and child appeared pacing slowly on the
opposite side of the way.

"There they are!" exclaimed Margaret. And hastily


asking permission, she ran down, and soon touched her
mother-in-law on the shoulder.

"Come in, mother," she said breathlessly, "she wants to


see you."

Christina was struck with the calm face of the elder


woman as she turned to cross the road; her white plaited
cap-border setting off the peaceful face and neat hair; and
again she thanked God and took courage.

"So here is Maggie," said the ringing voice, while the


beautiful face bent down and kissed the little one. "Maggie
is to be my first little niece; eh, Maggie?"

The child drew back a little shyly, and her mother


spoke. "Maggie dear, this is the lady that is going to give us
all such a nice home! You will like to speak to her."

"Is it?" said Maggie, looking up.

"Yes, Maggie," said Christina; "and I shall love you so


much; and you, and mother, and perhaps grandmother, will
be so happy, I hope."

Maggie came forward under the influence of those kind


eyes, and laid her hand in Christina's. "Thank you, ma'am,"
she said, "'cause mother's eyes are very tired with that
work."
Christina kissed her again, and thought of their talk
about the clean children being the nicest, and then she
turned to the grandmother. It had all to be explained again;
but Mrs. Fenton did not accept it as quickly and readily as
her daughter-in-law and Christina expected.

"It is a great change in my life, ma'am," she said at


last; "and I think I must have time to consider it well. I
should like very, very much to do it; but I would not wish to
break up my little home, and lose what work I have now,
and then repent it!"

"My mother is a laundress," explained Margaret.

Christina looked abstractedly out of the window; a new


thought had struck her.

"I wonder," at length she said, "whether we could


manage it in rather a different way. There is a gardener's
cottage, a very small one, adjoining the house I think of
having, and I was going to let it off; but supposing you lived
there and did our washing for us?"

Margaret looked anxiously at her mother, as if this must


be the very thing for her.

The woman paused again. "I am extremely obliged,


ma'am," she said, with great feeling in her voice, "and I will
ask my Father about it, and let you know. I cannot go a
step without Him, ma'am."

Christina held out her hand kindly and gravely. "You are
quite right; and remember we shall all be one family in Him,
whatever our different callings may be."

She rang the bell, and told Ellen to give her visitors a
comfortable lunch in the dining room, and to ask Miss
Arbuthnot to step upstairs.

"Oh, aunt, I wish you had been here, only I was so


nervous in anticipation! But she is the dearest old creature
you ever saw."

"I met her on the stairs, a sweet face."

Christina then told her aunt all about the interview, and
they both hoped the decision would be in favour of
accepting her proposal.

Miss Arbuthnot had been extremely surprised when


Christina had first propounded her plans to her; but she had
quick and ready sympathy; she knew the desolation of the
young heart; and she had read enough of the life of
workhouse children to know that to rescue even a few of
these from the deadness, apathy, and sin which prevailed in
such places, would be no mean work. So she had consented
cheerfully, and Christina had given her a warm, grateful
kiss, and had said, "I will try to make your life too as happy
as I can, dear aunt."
CHAPTER XXIV.
A BASKET OF FLOWERS.

A WEEK or two passed away. The gentleman did not


take the house at Hampstead, and Christina did.

Old Mrs. Fenton consulted "her Father," as she said, and


decided to come and make her home in the gardener's
cottage, bringing with her her little stock of furniture, her
plants, and her washing paraphernalia.

She was soon settled into her tiny home, and after a
few days, felt as if she had lived there for years. Her own
fender, table, old-fashioned chest of drawers, cuckoo-clock,
etc., made her feel homelike at once; and she trusted she
had come to a right decision.

One day before Christina left Gower Street, she


privately asked Nellie if she could be spared to help her
arrange her house; but Nellie answered that it would be
impossible, and begged her not to put the question to her
mamma. "She would do anything to give me pleasure, and I
would not have her asked on any account."

"If you really feel so, Nellie, I shall ask Ada; for I
believe it would be an interest for her; only, you know, dear,
you are my friend."

Nellie smiled gratefully. "No one could be more


pleased," she said, and then blushing deeply, she added
softly, "Some day we may perhaps be more than friends."
"Hush!" said Christina, putting her hand in front of
Nellie's mouth. "I can't have that spoken of."

"We never have mentioned it," answered Nellie, looking


up to see if Christina were displeased; "but I should like to
tell you once how happy it would make me."

"Dear Nellie, I know; but it would not make you happy


unless I could with all my heart?"

"Oh, no!" said Nellie, looking down.

"Then do not talk of it at present."

And Christina gave her a loving kiss, and left the room.

Ada was enraptured at the invitation which Nellie


brought for her that evening; and the only difficulty was
how her attendance at her school could be arranged for.
After some consideration, and with many promises to take
care, Ada was to be trusted to go and return daily by
omnibus for the two weeks after her school began; and
Arthur willingly undertook to meet her in the morning,
before his own school hours, and see her safely into the
omnibus again after one o'clock.

Ada thought it was very good of her mamma to allow


this, and a few days afterwards packed her box, and went
with Christina to "Sunnyside," as the house was called.

The next day there came, by parcel delivery, about two


o'clock in the afternoon, a basket of lovely flowers from
Christina's garden. It was directed to Tom; and just inside,
on a slip of paper, was written, in her clear hand:

"For my dear little Tom; one of the gifts that his


heavenly Father sends him."

Tom's face, when the basket was opened, was eloquent;


but he turned away and burst into tears. Never before in his
life had he possessed such flowers, and to think they had
been sent to him quite overcame him.

When Nellie, with tasteful fingers, was arranging them


in all the vases she could muster, he said to her:

"I should like that little boy, Black Tom's son, to have
some of them; do you think you could take him a bunch,
Nellie?"

"Oh, willingly, dear! And what message shall I take with


them?"

Tom was silent; and after a few moments he said:


"Could you say they come from a little boy who loves
Jesus?"

"Yes, dear, I will; anything else?"

"And say I am like him, but that the Lord Jesus has
comforted me, and I don't mind so much now."

"Which bunch shall it be, Tom?" she asked. She held


each in turn where he could see them comfortably, and he
decided on what he considered the best.

"Then I think I shall take them vase and all," said Nellie,
"and tell him I shall fetch the vase when I think they are
faded."

"Yes, do; they will look prettier so. I suppose there will
be plenty of flowers in heaven," said Tom musingly.
"I should not wonder, for trees are spoken of; but I
believe, Tom, above everything else will be the joy of seeing
Jesus. It says, 'The Lamb . . . shall lead them unto living
fountains of waters: and God shalt wipe away all tears from
their eyes.'"

"Yes," answered Tom, looking thoughtfully out of the


window towards the clear sky, for he was lying in the
nursery; then turning round again, "I would like you to go
now Nellie, so that he may see them by daylight."

So Nellie fetched a sheet of silver paper, and standing


her vase in it, lightly pinned the corners at the top, and
telling her mamma where she was going, set forth.

A short walk brought her to one of the "gone down" and


miserable-looking streets which abound in London the
moment you turn out of the large thoroughfares. She went
down this, and presently came to the house she sought. It
was not by any means one of the "dens" of the vast city,
but miserable and squalid enough notwithstanding. She
rang the top bell of the four, and in a minute, a woman
looked out from the bottom room and asked what she
wanted.

"Do you think I can see Tom Taylor's boy?" she asked.

"He's at home, safe enough!" said the woman, with not


unkind humour. "Go up and find him, miss; you know the
room."

"Yes; I'm the doctor's daughter."

The woman eyed the tissue-paper parcel inquisitively,


and Nellie said—
"Perhaps you would like to see the flowers I have
brought for him."

"Step in, miss."

Nellie entered the dirty little room, and unpinned her


parcel on the table. The flowers, with their elegant
arrangement, standing on the snowy paper, looked
strangely incongruous in the untidy apartment; but Nellie
had not brought them in there for nothing.

She looked up in the woman's face, "Who made these,


do you think?" she asked her.

The woman shook her head, then smelt at them, and


said suddenly, "Why, I suppose it's God Almighty?"

"Yes, God Almighty. He gave them to us, and we should


all have had lovely gardens, and every happiness, but for
sin; that has spoiled everything. But, do you know, He has
made a way by which we may have it all again, and that is
by believing in Jesus Christ His Son, and having all these
sins forgiven."

The woman looked at the flowers again. "It's a hard


world," she said; "I wish I could think there ever could be
anything different."

"There will be for those who will look to Jesus," said


Nellie. "Think over that, will you? And you will find
everything will look altered."

The woman glanced round the dingy room and sighed.

"I'll look in again as I come down, if I can," said Nellie.


So she ascended to the very top, and knocked at the
door of the front room.

Tom Taylor's boy might have lived in a very different


room from this; but his father's good earnings were spent
at the public-house at the corner of the street, and the poor
little boy often went short even of the necessaries of life.

Nellie knocked at the door, and a thin, querulous voice


bade her "come in."

She entered. In the room were two small beds, and on


one of these, at some distance from the window, lay a boy
of about ten or eleven years. He was somewhat propped up
by two pillows, but still he seemed obliged to lie very flat.
Over his shoulders an old worn jacket was drawn and
buttoned in the front, which did not however hide the soiled
and tattered shirt beneath.

Nellie had been there once before, and she knew the
smell of the close little room; but she came forward to the
bedside.

"Tom," she said tenderly, "I have been sent to you with
a present."

The boy looked astonished. "For me? Who is it from?"

"It is from a little boy who sent you word that 'he loves
Jesus.'"

"Oh!" said Tom; "And what is it?"

Nellie set the parcel on a little table which was pushed


against the side of his bed, and opened it the second time.
The child looked and looked at it, clasping his thin and
wasted hands. "I never saw such beauties, never," he said,
and slowly, as he looked, tears trickled down his cheeks,
and ran on to the collar of his old jacket. And while he
gazed at them, Nellie softly and clearly sang words which
melted that hard young heart as much as the flowers had,
and completed the work they had begun.

"The great Physician now is near,


The sympathising Jesus;
He speaks the drooping heart to cheer:
Oh, hear the voice of Jesus!

"Sweetest note in seraph song,


Sweetest name on mortal tongue,
Sweetest carol ever sung,
Jesus, blessed Jesus.

"Your many sins are all forgiven;


Oh, hear the voice of Jesus!
Go on your way in peace to heaven,
And wear a crown with Jesus.

"His name dispels my guilt and fear,


No other name but Jesus;
Oh, how my soul delights to hear
The precious name of Jesus!"

The boy listened to every word of the long hymn,


looking at the flowers till he was too blinded with tears;
then he turned away his head, and hid his face in the sheet.
When Nellie had done, there was silence; at last the boy
stretched out his hand to draw the flowers nearer to him.
He bent his head down, and drew in their lovely fragrance,
and then touched one of them tenderly and reverently with
his finger.

The child looked and looked at it,


clasping his thin and wasted hands.

"Did you say Jesus sent them?" he asked.


"A little boy who lies helpless like you sent them,
because he loves Jesus, and wishes you to love Him too."

"I never did before," said the boy; "I always thought it
was so dreadfully hard. But these flowers—" he covered his
face again, and sobbed.

Nellie touched his hand. "The little boy sent you another
message, Tom."

"Did he? What?" he answered, wiping his eyes again on


the sheet, and looking up.

"He said, 'Tell him I am like him; but the Lord Jesus has
comforted me, and I don't mind so much now.'"

"Tell him then," said the boy, "that his Jesus has
comforted me too; for though I cry, miss, it's only because I
can't thank Him enough for wanting to save me."

Nellie passed her hand over his forehead, and pushed


back the tangled hair. "I will tell him," she answered very
tenderly; "and he will be so very glad, Tom. But now I must
go, and I will try and come to-morrow, and see if I can
make you a little more comfortable."

She made her way down the dingy staircase again, and
stopped at the door of the front room, as she had promised.

It stood wide-open, and the woman came forward. She


had been busy while Nellie was upstairs, and had whisked
away many untidinesses, and had brushed up her hearth,
and now stood with a smile of welcome.

Nellie was quick to perceive the change, and said, "You


have made it tidy for me; thank you very much." Then
stopping short, in her gentle, modest way, she said,
"There's another visitor would willingly come in."

"Who, miss? A friend of yours?" then guessing from


Nellie's face whom she meant, she sat down in a chair, and
exclaimed, "This ain't no fit place for Him!"

"No; but He says He loves to come and dwell in the


lowly and contrite heart; and if you are sorry for all the
past, and willing to take Him for your visitor, and Saviour,
and King, no one will be more glad than He to come."

"Bless you, miss!" said the woman, wringing her hand


hard. "I never thought of it; but I will do as you say, and
the first thing as ever I do shall be to clean my place up a
bit for Him."

Nellie smiled with a glad look. "Ah!" she said. "It's the
heart, remember, He wants."

"I know, I know; but He shall have a clean room too!"


CHAPTER XXV.
FATHERLESS BAIRNS.

DR. ARUNDEL'S carriage rolled swiftly towards


Hampstead. In it were the Doctor and Mrs. Arundel and
Tom, while Arthur found a seat on the box by the
coachman. Nellie had already gone by omnibus with Netta
and Isabel. They were all going to pay Christina the much-
talked-of visit.

Arthur informed them he was "Prince Arthur" going to


open and inspect "The Orphanage," and pretended to be
very grand. Christina and he had kept up this little joke
whenever she had made her flying visits to No. 8. He had
told her that, as he was such an august personage, he must
not go till everything was ready; but he had kept away with
great difficulty, as the accounts from Nellie and Walter made
him long to be able to talk it over and enjoy it with them.
They had been backwards and forwards a good deal—Nellie
to help in suggesting and arranging, and Walter to hang
pictures, move furniture, and assist generally in a most
wonderful way, Christina thought; for she had never before
met a gentleman who could "use his hands," as she called
it.

Walter was invaluable, and Ada, who was chief "aide-


de-camp," used to suggest sending for him whenever the
least difficulty arose. He looked in, however, on them nearly
every day, and Miss Arbuthnot, who was ignorant of the
episode of that walk along the shore, heartily wished that
the two whom she considered so suitable for each other
should find it out. She, too, remembered the past; but she
had wisdom enough neither to refer to it nor to make any
remark as to the present. She welcomed Walter gladly, and
thought the day seemed rather blank which had not brought
his pleasant face. Did Christina think so? If so she kept it to
herself; for nobody could guess.

As Dr. and Mrs. Arundel drove along Seymour Street on


this bright afternoon of their visit to Christina, the carriage
was brought to a stand by a crowd collected round some
object at the side of the road.

"I wonder what it is?" said Mrs. Arundel, leaning


forward anxiously.

"I do not suppose it is much," said Dr. Arundel; "but I


will go and see."

He got out, and pressed into the crowd. "What is it?" he


asked.

"A woman fainted," was the reply.

"Let me in then," he answered; "I am a doctor."

The by-standers made way for him, and he found


himself in a moment by the side of a woman who was lying
on the curbstone, her head supported by the friendly knee
of an elderly woman who had been passing when she fell.
Even in her fainting condition she was clutching an infant,
who was crying painfully, in her wasted arms.

Dr. Arundel begged the people to stand further away to


give her air, while he dipped his handkerchief in a jug of
water which someone had brought, and bathed her face and
hands, and then sent a message to his carriage for his
wife's smelling-bottle. Gradually the poor creature began to
revive; and as she did so, she held her baby tighter to her
breast.
"Stay, you will hurt it," said Dr. Arundel tenderly; "no
one will take it away; do not press it so."

She instantly desisted, but opened her eyes and gazed


at his face in a wild kind of way.

"You are better now," he said soothingly.

"Oh, yes!" she answered, trying to struggle to her feet.


"Let me go on."

"Where is your home? You are not fit to be out, my


dear," said the kind doctor in his fatherly manner, gently
preventing her rising.

"No; I am dying," she said, "dying of starvation!"

The by-standers who could hear this looked appalled,


and several hands were put in pockets to draw out some
money.

"You are ill, my dear," he said. "Where is your home?"

"I have no home," she answered in a low tone. "I was


taking her to the river—when—I can't remember," she said,
looking bewildered.

"Were you going to the workhouse?" he asked, not


hearing.

"No, no, not there! I was driven to it," she said huskily,
"she was so hungry; I had nothing for her—she cried so
dreadfully. I had sold everything; I had had nothing to eat
since yesterday morning, and I could not see her starve, so
I started. But, oh, my baby, it is so hungry!"
She looked down on its wee pinched face, and wrapped
her thin tattered shawl closely round it again protectingly,
and, oh, so tenderly.

"My poor girl, you are very ill," said the doctor; "but if
you will come to a good woman I know of, she will take care
both of you and baby; should you like that?"

"Not be separated?" she exclaimed, looking up at him.


"Oh, say that again! How can I part with my baby?"

"Yes," he said, "you shall be together. It is not far from


here."

The crowd had begun to disperse. Several small coins


had been placed in the doctor's hands for the woman's
relief, and he looked round now at the shops near. A
chemist's was close, and next door to it a second-rate
coffee-house. He told the poor creature he would be back in
a moment, and hastened in and asked for a cup of coffee.
The woman who was serving had seen the commotion, and
quickly poured out some, asking, as was natural, "What is
the matter, sir?"

"Dying of consumption and starvation," he answered.

"Oh, sir!" said the woman.

"Too true; there is many a respectable woman who goes


down, and down, and down, and dies at last, sooner than
ask for help."

He took the cup, and returned to the dying woman. He


poured a little in the saucer, and put it to her lips.

"My baby first," she said faintly, drawing back.


"Not coffee for her; get some milk," he said, looking up
at the coffee-house keeper, who had followed him out.

She hastened back, and soon came with some in a


teacup.

Meanwhile the sick mother had with difficulty raised


herself from the still-supporting knee, and had settled her
babe in her lap, so that when the milk came she might be
ready for it. Then she stretched out her wasted hand for the
coffee, and drank it eagerly.

When the milk arrived, the young mother took the


spoon and poured a little into the poor little mouth. The
child stopped crying, and swallowed it; but before she could
get ready the next spoonful it began again.

"When was it last fed?" asked Dr. Arundel.

"I had nothing for it," she answered, "so I spent my last
halfpenny last night for a hap'orth of milk, and it had the
rest of that early this morning."

"Poor little baby," he said pityingly. "Now while you give


it a few more spoonfuls, I will go and get a cab, and will
take you where I promised."

He went to his wife, who had been anxiously looking out


of the carriage window, but could not leave little Tom.

"Starvation!" he said sadly. "Poor things. I cannot go


with you, love; I must take her to Cromer Street. What a
mercy our little hospital room has a bed vacant!"

"It is indeed; but cannot you come?"


"No; I am so very sorry; but it will be a wonder if she
pulls through the next few hours. She is revived now, and
we must get her to bed as fast as possible; she is in the last
stage of consumption."

"Oh, poor, poor creature!"

"Yes; and I want to speak to her of Christ, so good-bye.


Ask Christina if she can attempt a baby three or four
months old; for there will be very shortly another little
orphan cast on the world."

"I will tell her. Poor mother! Poor baby!"

"Drive on now, love; you can do no good to her; and I


shall call a cab at once."

He gave the signal to the coachman, and the carriage


once more proceeded on its way.

Tom was very silent. He had heard enough to


understand, and he held his mother's hand tightly, but did
not like to ask her any questions; for she seemed sad, and
Tom kissed her hand softly over and over again, without
getting more than a loving pressure in return.

"Poor creature!" she said at last.

"Will she die?" asked Tom, speaking for the first time.

"I fear so. Oh, Tom, what must it be to leave a baby


behind on the cold world."

Tom kissed her hand again, and then said softly, "You
often say, mamma, we must trust everything to Jesus; I
suppose, if she loved Him—"
"Yes, my dear," she answered, rousing herself; "that is
the only way. 'Cast thy burden upon the Lord, and He shall
sustain thee.'"

She bent down and kissed his pale little face.

"So Tom has turned comforter," she said, smiling softly,


and looking at him.

"When shall we get there?" asked Tom presently.

"Very soon; we are just up the second hill, and soon we


shall have a third. That shows how high it is, Tom."

"Here is the Heath," exclaimed his mother; "and here


are the donkeys Christina so dislikes! And now we turn
down to the left, and shall be there in a moment."

As she spoke they drew up at Sunnyside. There at the


gate stood Christina and Ada, while just inside the garden
they could see Nellie, Isabel, and Netta, who had already
arrived. Walter came forward when he heard the carriage,
for he had been specially invited for the grand occasion.

"Where is Dr. Arundel?" said Christina, looking


astonished when he did not appear.

"He was prevented at the last moment; we must tell


you about it," said Mrs. Arundel.

"What a pity," exclaimed Ada.

"He was so sorry, and so was I. He sent you a message,


which I must give you presently."

Mrs. Arundel turned to superintend the lifting out of her


invalid; but Arthur and Walter were accustomed to moving
him, and now did it very cleverly, so that without a shake,
he was laid on the drawing room sofa.

"We got here ever so long before you," said Isabel,


bounding in through the French window; "what made you
so long?"

Mrs. Arundel explained all about it to a very interested


audience, and then gave Dr. Arundel's message to Christina,
who looked very grave for a moment when she heard the
age of the baby.

"Do you think I could?" she asked Mrs. Arundel.

"I do not see any insuperable objection as you have


Margaret Fenton, but if you had not, it would be another
thing. It will, however, fill her hands and yours in a
wonderful way; you will begin work in earnest then."

"Just what I should delight in," said Nellie; "but, oh, we


are forgetting the poor young mother!"

The sad story rather sobered the happy party, and it
was some little time before they could turn their thoughts
away from it.

"Is not this a nice room, mamma?" said Ada presently.

"Oh," said Christina, "this is not the part you will care
about! When Tom is able, we must go into the other room."

Tom soon said he should like to go, so Arthur and
Walter carried him between them, while Christina led the
way.

The "other room" was "the play-room." The name


"nursery" had been discarded, because Ada said, "There
might be children of all ages."

Like the drawing room, it had windows to the ground,
with a south-west aspect, looking over the garden, the
Heath, and the Surrey hills.

The floor was covered with a bright-coloured
kamptulicon, while a very ample hearthrug was laid at the
fireplace, which had a large wide-barred guard.

The walls were decorated with tasteful pictures, several
really good engravings, and half a dozen plainly illuminated
texts, which had been Walter's gift. The pictures were all
Scripture subjects; for, as Christina said, earliest
impressions last the longest.

On one side of the large room three small tables, hardly
two feet high, were standing, and near them were six or
eight tempting little wooden chairs, of various shapes,
which would just suit the tables and the little occupants of
the play-room.

Ada sprang forward when her mamma was looking at
these. "Now these are my especial pets," she exclaimed;
"they are the dearest little tables and chairs. Christina, let
me fetch Alfy!"

"Very well," said Christina, smiling.

Ada opened a door at the end of the room, and called,
"Alfy! Maggie!"

The little ones, who were with Margaret Fenton in the
dining room, came rushing in. And though Maggie was very
shy, Alfy, feeling he was with old friends, calmly walked to
the little table nearest him, and took his seat in a small
arm-chair. He proceeded to open a box of toys which stood
conveniently there, and took no further notice of the guests.
Maggie, however, kept close to Ada; for her mother had
wisely closed the door, and disappeared.

"It is delightful," said Mrs. Arundel; "a most lovely


room!"

Near the window were two rocking-chairs and a
medium-sized table; the rest of the floor was left
unoccupied, except by a few chairs against the wall.

Two large cupboards had been fixed on either side of
the fireplace. On the door of one of these was painted in
neat letters, "No fresh toy to be taken out till the last one is
put away." This had been Nellie's suggestion; for though the
children could not read perhaps, the nurse could read it to
them.

On the other cupboard was painted likewise, "Each toy
to be put neatly into its own box when done with."

Arthur laughed heartily at this, and said, pinching her
soft pink cheek, "That's exactly like our Nell—as practical
and as tidy as can be."

"Now for the dining room," said Ada.

This was somewhat like other dining rooms, but was
also covered with kamptulicon, and a good many high
chairs stood round the wall; while Christina's long dining
table, sideboard, and handsome chairs gave an air of
comfort to the room.

"Shall you have dinner with the children or not?" asked


Arthur.
"Sometimes, perhaps; but I have my aunt to think of
too; and I fancy we shall perhaps make our lunch when the
children have dinner, and then dine alone at six o'clock."

"I am sure that would be wise," said Mrs. Arundel.

The kitchen was next inspected, and there they found
the new cook, and Ellen, who looked delighted to see them
all again.

"You must see Mrs. Fenton's cottage presently," said


Christina.

Next came the bedrooms. There were six—Miss
Arbuthnot's, Christina's, a spare room, and a servants'
room, while two of the largest had been reserved for the
children.

The walls of these were painted a pale green, and
"could be washed," as Ada explained. The little beds and
cribs were covered by snowy counterpanes, and were so
arranged that a single strip of green carpet could be put
down the middle of each room. The blinds were green, and
the window-hangings, which were devised to take down and
put up "with no trouble," were white. The china was also
green and white, and everything looked fresh, and
countrified, and peaceful.

Tom was deposited in some convenient place in each
room in turn, and took a keen interest in it all.

"It is beautiful," said Mrs. Arundel, pleased.

"And here is a bath-room," said Ada, opening a door


close by, "with hot and cold water laid on."
Christina opened a drawer in one of the chests, and
asked them to look. They all gathered round, and as they
peeped in they saw neatly arranged a complete suit of
clothes for a little child of about Maggie's size.

"These are all Ada's work," said Christina proudly;


"every stitch! and I can assure you she has been
industrious to get it done, besides all the other things she
has been doing for me from morning till night, and her
school too."

Mrs. Arundel was delighted, and could not forbear
giving her daughter a loving kiss.

Ada blushed deeply at the praise, but said softly, "It was
very little to do after all the goodness and love—"

"Ah!" said her mother, understanding the unfinished


sentence. "But He accepts the least thing done for His sake,
dear."

"Here is another contribution," said Nellie, for Netta had


been squeezing her hand during the last few minutes, and
now brought forward a little parcel which she and Isabel
had conveyed to Hampstead with the greatest care and
pride.

"Why what is it?" asked Christina, bending down and


taking it from them.

On being undone, the parcel was found to contain two
nicely-made little petticoats, and two list bodies, lined with
unbleached calico, which looked as if they would wear for
ever.

"Who are these from?" said Christina, looking kindly in


the two little faces.
"From us," answered Isabel, "for the little orphans."

They were delighted with the loving thanks which they
received, and with seeing their work placed cosily by the
side of Ada's.

Walter, who was standing close behind holding Tom's
frame safely on one of the little beds, now said to Christina,
"Did you hear the sound of a tea-bell?"

She smiled, and said, "I think I did; but they must go
round the garden first, or it will be dark. What a beautiful
October day it is!"

For it was the first of October, the month that was the
last of Walter's holiday.

They then went round the pretty garden and visited
Mrs. Fenton's cottage, where Mrs. Arundel would have liked
to stay to have a chat with the dear old woman; but
Christina stood beckoning to them, and they had to cut
their wanderings short.

"Buttered toast is not nice cold," she said, "so let us


begin, dear friends."

They had a very happy tea-time; and there was plenty
to talk of, and many questions to ask Christina about what
she would do, and how she would arrange, while Miss
Arbuthnot sat next to her niece, and looked very happy and
contented.

"Aunt Mary likes my orphanage better than she


expected," she said, laying her hand on her aunt's.

"Yes, my dear, I do; and I feel pleased to try and help in


any way. I have no doubt when Christina gets more
children, our hands will be very full."

"No fear of there being plenty of children when once


you are ready," said Walter.

"But what is this 'hospital room' you were mentioning,


where that poor creature is gone?" asked Christina. "I have
never heard of it."

"That is one of papa's little quiet bits of 'work for the


King,'" answered Mrs. Arundel. "He rents two rooms in
Cromer Street, near us, where he has put a sort of Bible-
woman nurse, who lives in one of them, and undertakes to
nurse and care for any special sick one whom papa may
send to her. She is able also to visit a few very poor
invalids, who are without the means to pay for even a little
attention, and to these her periodical visits are the greatest
boon. She settles her own patients comfortably, and then
goes out for an hour about eleven o'clock, and again later in
the day, to make a bed for someone here, or a little gruel
for someone there, and then home again almost before she
is missed. We have had several very interesting cases, and
the gratitude of the sick for a little kindly nursing is most
touching."

"It is a beautiful plan," said Christina warmly, "and so


very simple and natural."

"There is the carriage come for us," said Arthur, "and


'the Prince' has been so very interested in everything that
he has forgotten to be as grand as he intended! What a
pity; but he has enjoyed himself extremely
notwithstanding."
CHAPTER XXVI.
SAVED FROM THE RIVER.

WHEN Dr. Arundel turned from seeing his wife on her
way, he called a cab, and placing the poor woman and her
baby in it, drove quickly to Cromer Street, directing the
man to stop at a house near the middle.

His "nurse" was at home, and came directly to the cab


door.

"I have brought you another patient," said Dr. Arundel


cheerfully; "and a baby this time, too."

The nurse held out her arms for it, and the poor weak
mother, after a glance at her kind face, yielded it to her,
tottering, however, after her as quickly as she could.

"Give her two or three spoonfuls of beef-tea at once,


and get her to bed, and in about twenty minutes I will call
in and see how she is."

With an unspoken explanation, which the nurse seemed
to comprehend, he turned away to visit another patient
near.