Nikhil Ketkar and Jojo Moolayil
Deep Learning with Python
Learn Best Practices of Deep Learning Models with
PyTorch
2nd ed.
Nikhil Ketkar
Bangalore, Karnataka, India
Jojo Moolayil
Vancouver, BC, Canada
Apress Standard
Trademarked names, logos, and images may appear in this book. Rather
than use a trademark symbol with every occurrence of a trademarked
name, logo, or image we use the names, logos, and images only in an
editorial fashion and to the benefit of the trademark owner, with no
intention of infringement of the trademark. The use in this publication
of trade names, trademarks, service marks, and similar terms, even if
they are not identified as such, is not to be taken as an expression of
opinion as to whether or not they are subject to proprietary rights.
The publisher, the authors and the editors are safe to assume that the
advice and information in this book are believed to be true and accurate
at the date of publication. Neither the publisher nor the authors or the
editors give a warranty, express or implied, with respect to the material
contained herein or for any errors or omissions that may have been
made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
—Nikhil Ketkar
—Jojo Moolayil
Table of Contents
Chapter 1: Introduction to Machine Learning and Deep Learning
Defining Deep Learning
A Brief History
Advances in Related Fields
Prerequisites
The Approach Ahead
Installing the Required Libraries
The Concept of Machine Learning
Binary Classification
Regression
Generalization
Regularization
Summary
Chapter 2: Introduction to PyTorch
Why Do We Need a Deep Learning Framework?
What Is PyTorch?
Why PyTorch?
It All Starts with a Tensor
Creating Tensors
Tensor Munging Operations
Mathematical Operations
Element-Wise Mathematical Operations
Trigonometric Operations in Tensors
Comparison Operations for Tensors
Linear Algebraic Operations
Summary
Chapter 3: Feed-Forward Neural Networks
What Is a Neural Network?
Unit
The Overall Structure of a Neural Network
Expressing a Neural Network in Vector Form
Evaluating the Output of a Neural Network
Training a Neural Network
Deriving Cost Functions Using Maximum Likelihood
Binary Cross-Entropy
Cross-Entropy
Squared Error
Summary of Loss Functions
Types of Activation Functions
Linear Unit
Sigmoid Activation
Softmax Activation
Rectified Linear Unit
Hyperbolic Tangent
Backpropagation
Gradient Descent Variants
Gradient-Based Optimization Techniques
Practical Implementation with PyTorch
Summary
Chapter 4: Automatic Differentiation in Deep Learning
Numerical Differentiation
Symbolic Differentiation
Automatic Differentiation Fundamentals
Implementing Automatic Differentiation
Summary
Chapter 5: Training Deep Learning Models
Performance Metrics
Classification Metrics
Regression Metrics
Data Procurement
Splitting Data for Training, Validation, and Testing
Establishing the Achievable Limit on the Error Rate
Establishing the Baseline with Standard Choices
Building an Automated, End-to-End Pipeline
Orchestration for Visibility
Analysis of Overfitting and Underfitting
Hyperparameter Tuning
Model Capacity
Regularizing the Model
Early Stopping
Norm Penalties
Dropout
A Practical Implementation in PyTorch
Interpreting the Business Outcomes for Deep Learning
Summary
Chapter 6: Convolutional Neural Networks
Convolution Operation
Pooling Operation
Convolution-Detector-Pooling Building Block
Stride
Padding
Batch Normalization
Filter
Filter Depth
Number of Filters
Summarizing Key Learnings from CNNs
Implementing a Basic CNN Using PyTorch
Implementing a Larger CNN in PyTorch
CNN Thumb Rules
Summary
Chapter 7: Recurrent Neural Networks
Introduction to RNNs
Training RNNs
Bidirectional RNNs
Vanishing and Exploding Gradients
Gradient Clipping
Long Short-Term Memory
Practical Implementation
Summary
Chapter 8: Recent Advances in Deep Learning
Going Beyond Classification in Computer Vision
Object Detection
Image Segmentation
Pose Estimation
Generative Computer Vision
Natural Language Processing with Deep Learning
Transformer Models
Bidirectional Encoder Representations from Transformers
GrokNet
Additional Noteworthy Research
Concluding Thoughts
Index
About the Authors
Nikhil Ketkar currently leads the Machine Learning Platform team at Flipkart, India’s largest ecommerce company. He received his PhD from Washington State University. Following that, he conducted postdoctoral research at the University of North Carolina at Charlotte, which was followed by a brief stint in high-frequency trading at TransMarket in Chicago. More recently, he led the data mining team at Guavus, a startup doing big data analytics in the telecom domain, and Indix, a startup doing data science in the ecommerce domain. His research interests include machine learning and graph theory.
Jojo Moolayil is an artificial intelligence professional and published author of three books on machine learning, deep learning, and IoT. He is currently working with Amazon Web Services as a Research Scientist – A.I. in their Vancouver, BC office.
In his current role with AWS, Jojo works on researching and developing large-scale A.I. solutions for combating fraud and enriching the customer’s payment experience in the cloud. He is also actively involved as a technical reviewer and AI consultant with leading publishers and has reviewed over a dozen books on machine learning, deep learning, and business analytics.
You can reach Jojo at:
https://www.jojomoolayil.com/
https://www.linkedin.com/in/jojo62000
https://twitter.com/jojo62000
About the Technical Reviewers
Judy T. Raj is a Google Certified Professional Cloud Architect. She has great experience with the three leading cloud platforms—Amazon Web Services, Azure, and Google Cloud Platform—and has co-authored a book on Google Cloud Platform with Packt Publications. She has also worked with a wide range of technologies in machine learning, data science, blockchains, IoT, robotics, and mobile and web app development. She is currently a technical content engineer at Loonycorn. Judy holds a degree in computer science and engineering from Cochin University of Science and Technology. A driven engineer fascinated with technology, she is a passionate coder, a machine learning enthusiast, and a blockchain aficionado.
Manohar Swamynathan is a data science practitioner and an avid programmer, with more than 14 years of experience in various data science-related areas, including data warehousing, business intelligence (BI), analytical tool development, ad-hoc analysis, predictive modeling, data science product development, consulting, formulating strategy, and executing analytics programs. His career has covered the life cycle of data across multiple domains, such as US mortgage banking, retail/ecommerce, insurance, and industrial IoT. Manohar has a bachelor’s degree with a specialization in physics, mathematics, and computers, and a master’s degree in project management. He is currently living in Bengaluru, the Silicon Valley of India.
© Nikhil Ketkar, Jojo Moolayil 2021
N. Ketkar, J. Moolayil, Deep Learning with Python
https://doi.org/10.1007/978-1-4842-5364-9_1
1. Introduction to Machine Learning and Deep Learning
Nikhil Ketkar1 and Jojo Moolayil2
(1) Bangalore, Karnataka, India
(2) Vancouver, BC, Canada
The subject of deep learning has gained immense popularity recently and, in the process, has given rise to several related terms that can be hard to tell apart. Given the sheer volume of overlap between the topics, one might find the task of neatly separating each field overwhelming.
This chapter introduces the subject of deep learning by discussing its historical context and how the field evolved into its present-day form. Later, we will introduce machine learning by covering its foundational topics in brief. To prepare for deep learning, we will implement these machine learning constructs using basic Python. Chapter 2 begins the practical implementation using PyTorch.
Defining Deep Learning
Deep learning is a subfield of machine learning that deals with algorithms loosely modeled on a greatly simplified version of the human brain, and it solves a vast category of modern-day machine intelligence problems. Many common examples can be found within the smartphone app ecosystem (iOS and Android): face detection on the camera, auto-correct and predictive text on keyboards, AI-enhanced beautification apps, smart assistants like Siri/Alexa/Google Assistant, Face ID (face unlock on iPhones), video suggestions on YouTube, friend suggestions on Facebook, and cat filters on Snapchat are all products whose state-of-the-art quality was made possible only by deep learning. Essentially, deep learning is ubiquitous in today's digital life.
Truth be told, it can be complicated to define deep learning without navigating some historical context.
A Brief History
The journey of artificial intelligence (AI) to its present day can be broadly divided into four parts: rule-based systems, knowledge-based systems, machine learning, and deep learning. Although the granular transitions in the journey can be mapped onto several important milestones, we will cover a more simplified overview. The entire evolution is encompassed by the larger idea of “artificial intelligence.” Let’s take a step-by-step approach to tackle this broad term.
The journey of deep learning starts with the field of artificial intelligence, the rightful parent of the field, which has a rich history going back to the 1950s. The field of artificial intelligence can be defined in simple terms as the ability of machines to think and learn. In plainer words, we would define it as the process of aiding machines with intelligence in some form so that they can execute a task better than before. Figure 1-1 shows a simplified landscape of AI, with each of the aforementioned fields shown as a subset. We will explore each of these subsets in more detail in the sections below.
Rule-Based Systems
The intelligence we induce into a machine need not be a sophisticated process or ability; something as simple as a set of rules can count as intelligence. The first-generation AI products were simply rule-based systems, wherein a comprehensive set of rules was supplied to the machine to cover the exhaustive set of possibilities. A machine that executes a task based on defined rules produces a more appealing outcome than a rigid machine (one without intelligence).
A simple modern-day example would be an ATM that dispenses cash. Once authenticated, users enter the amount they want, and the machine, based on the existing combination of notes in store, dispenses the correct amount with the least number of bills. The logic (intelligence) for the machine to solve the problem is explicitly coded (designed). The designer of the machine carefully thought through the comprehensive list of possibilities and designed a system that can solve the task programmatically within finite time and resources.
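To make the idea of explicitly coded rules concrete, here is a minimal sketch of such a dispenser in Python (an illustration, not code from the book). It assumes a greedy strategy over a hypothetical set of denominations, which yields the fewest bills for canonical note systems like the one shown.
def dispense(amount, denominations=(2000, 500, 200, 100)):
    """Rule-based logic: greedily pay out `amount` using the fewest bills."""
    payout = {}
    for note in sorted(denominations, reverse=True):
        count, amount = divmod(amount, note)
        if count:
            payout[note] = count
    if amount != 0:
        raise ValueError("Amount cannot be dispensed with the available denominations")
    return payout

print(dispense(3800))  # {2000: 1, 500: 3, 200: 1, 100: 1}
Every behavior of this system follows from rules the designer wrote down; nothing is learned from data, which is precisely what distinguishes rule-based systems from the learning-based approaches discussed later.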
Most of the early successes in artificial intelligence involved fairly simple tasks that can be easily described formally, like the game of checkers or chess. This notion of being able to easily describe the task
formally is at the heart of what can or cannot be done easily by a computer program. For instance, consider
the game of chess. The formal description of the game of chess would be the representation of the board, a
description of how each of the pieces moves, the starting configuration, and a description of the
configuration wherein the game terminates. With these notions formalized, it is relatively easy to model a
chess-playing AI program as a search, and, given sufficient computational resources, it’s possible to produce
relatively good chess-playing AI.
The first era of AI focused on such tasks with a fair amount of success. At the heart of the methodology
were a symbolic representation of the domain and the manipulation of the symbols based on given rules
(with increasingly sophisticated algorithms for searching the solution space to arrive at a solution).
It must be noted that the formal definitions of such rules were done manually. However, such early AI
systems were fairly general-purpose task/problem solvers in the sense that any problem that could be
described formally could be solved with the generic approach.
The key limitation of such systems is that they only handle problems that, like the game of chess, are relatively simple and can be easily formalized. This is not the case with many of the problems human beings solve on a day-to-day basis (natural intelligence). For instance, consider diagnosing a disease or transcribing human speech to text. These tasks, which human beings can do but which are hard to describe formally, presented a challenge in the early days of AI.
Knowledge-Based Systems
The challenge of addressing natural intelligence to solve day-to-day problems pushed the landscape of AI toward an approach more akin to that of human beings, i.e., leveraging a large amount of knowledge about the task/problem domain. Given this observation, subsequent AI systems relied on large knowledge bases that captured the knowledge about the problem/task domain. Note that the term used here is knowledge, not information or data. By knowledge, we simply mean data/information that a program/algorithm can reason about. An example could be a graph representation of a map with edges labeled with distances and amount of traffic (which is constantly updated), allowing a program to reason about the shortest path between points.
Such knowledge-based systems, wherein the knowledge was compiled by experts and represented in a
way that allowed algorithms/programs to reason about it, represented the second generation of AI. At the
heart of such approaches were increasingly sophisticated approaches for representing and reasoning about
knowledge to solve tasks/problems that required such knowledge. Examples of such sophistication include
the use of first-order logic to encode knowledge and probabilistic representations to capture and reason about domains where uncertainty is inherent.
One of the key challenges that such systems faced, and addressed to some extent, was the uncertainty
inherent in many domains. Human beings are relatively good at reasoning in environments with unknowns
and uncertainty. One key observation here is that even the knowledge we hold about a domain is not black
or white but grey. A lot of progress was made in this era on representing and reasoning about unknowns and
uncertainty. There were some limited successes in tasks like diagnosing a disease that relied on leveraging
and reasoning using a knowledge base in the presence of unknowns and uncertainty.
The key limitation of such systems was the need to hand-compile the knowledge about the domain from
experts. Collecting, compiling, and maintaining such knowledge bases rendered such systems impractical. In
certain domains, it was extremely hard to even collect and compile such knowledge—for example,
transcribing speech to text or translating documents from one language to another. While human beings can
easily learn to do such tasks, it’s extremely challenging to hand-compile and encode the knowledge related
to the tasks—for instance, the knowledge of the English language and grammar, accents, and subject matter.
Machine learning emerged as the way forward to address these challenges.
Machine Learning
In formal terms, we define machine learning as the field within AI where intelligence is added without
explicit programming. Human beings acquire knowledge for any task through learning. Given this
observation, the focus of subsequent work in AI shifted over a decade or two to algorithms that improved
their performance based on data provided to them. The focus of this subfield was to develop algorithms that
acquired relevant knowledge for a task/problem domain given data. It is important to note that this
knowledge acquisition relied on labeled data and a suitable representation of labeled data as defined by a
human being.
Consider, for example, the problem of diagnosing a disease. For such a task, a human expert would collect
a lot of cases where a patient had and did not have the disease in question. Then, the human expert would
identify a number of features that would aid in making the prediction—for example, the age and gender of
the patient, and the results from a number of diagnostic tests, such as blood pressure, blood sugar, etc. The
human expert would compile all this data and represent it in a suitable form—for example, by
scaling/normalizing the data, etc. Once this data were prepared, a machine learning algorithm could learn
how to infer whether the patient has the disease or not by generalizing from the labeled data. Note that the
labeled data consisted of patients that both have and do not have the disease. So, in essence, the underlying
machine learning algorithm is doing the job of finding a mathematical function that can produce
the right outcome (disease or no disease) given the inputs (features like age, gender, data from diagnostic
tests, and so forth). Finding the simplest mathematical function that predicts the outputs with the required
level of accuracy is at the heart of the field of machine learning. For example, questions related to the
number of examples required to learn a task or the time complexity of an algorithm are specific areas for
which the field of ML has provided answers with theoretical justification. The field has matured to a point
where, given enough data, compute resources, and human resources to engineer features, a large class of
problems are solvable.
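As a concrete illustration of this workflow, the following sketch (an illustration, not code from the book) trains a simple classifier with scikit-learn, one of the libraries installed for this book, on a hypothetical, synthetically generated diagnostic dataset. The feature names and data are invented purely for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical features per patient: [age, blood pressure, blood sugar]
X = rng.normal(loc=[50, 120, 100], scale=[12, 15, 20], size=(500, 3))
# Hypothetical label: disease is more likely when blood sugar is high
y = (X[:, 2] + rng.normal(0, 10, size=500) > 110).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scale/normalize the features, then learn a function f: x -> y from the labeled data
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)
print("Accuracy on unseen data:", model.score(X_test, y_test))
The feature engineering here (choosing age, blood pressure, and blood sugar, and scaling them) is done by hand, which leads directly to the limitation discussed next.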
The key limitation of mainstream machine learning algorithms is that applying them to a new problem domain requires a massive amount of feature engineering. For instance, consider the problem of recognizing objects in images. Using traditional machine learning techniques, such a problem would require a massive feature-engineering effort wherein experts identify and generate features that would be used by the machine learning algorithm. In a sense, the true intelligence is in the identification of features; the machine learning algorithm is simply learning how to combine these features to arrive at the correct answer. This identification of features, or the representation of data that domain experts provide before machine learning algorithms are applied, is both a conceptual and a practical bottleneck in AI.
It's a conceptual bottleneck because if features are being identified by domain experts and the machine learning algorithm is simply learning to combine them and draw conclusions, is this really AI? It's a practical bottleneck because the process of building models via traditional machine learning is limited by the amount of feature engineering required. There are limits to how much human effort can
be thrown at the problem.
Deep Learning
The major bottleneck in machine learning systems was solved by deep learning. Here, we essentially take the intelligence one step further: the machine develops relevant features for the task in an automated way instead of relying on hand-crafted ones. Human beings learn concepts starting from raw data. For instance, a child shown a few examples of a particular animal (say, cats) will soon learn to identify the animal. The learning process does not involve a parent identifying a cat's features, such as its whiskers, fur, or tail.
Human learning goes from raw data to a conclusion without the explicit step where features are identified
and provided to the learner. In a sense, human beings learn the appropriate representation of data from the
data itself. Furthermore, they organize concepts as a hierarchy where complicated concepts are expressed
using primitive concepts.
The primary focus of the field of deep learning is to learn appropriate representations of data such that these can be used to draw conclusions. The word "deep" in "deep learning" refers to the idea of learning a hierarchy of concepts directly from raw data. A more technically appropriate term for deep learning would be representation learning, and a more practical term for the same would be automated feature engineering.
Advances in Related Fields
It is important to note the advances in other fields like compute power, storage cost, etc. that have played a
key role in the recent interest and success of deep learning. Consider the following, for example:
The ability to collect, store and process large amounts of data has greatly advanced over the last decade
(for instance, the Apache Hadoop ecosystem).
The ability to generate supervised training data (data with labels—for example, pictures annotated with
the objects in the picture) has improved a lot with the availability of crowd-sourcing services (like
Amazon Mechanical Turk).
The massive improvements in computational horsepower brought about by graphics processing units
(GPUs) took parallel computing to new heights.
The advances in both the theory and software implementation of automatic differentiation (such as
PyTorch or Theano) accelerated the speed of development and research for deep learning.
Although these advancements are peripheral to deep learning, they have played a big role in enabling
advances in deep learning.
Prerequisites
The key prerequisites for reading this book include a working knowledge of Python and some coursework in
linear algebra, calculus, and probability. Readers should refer to the following in case they need to cover
these prerequisites.
Dive Into Python, by Mark Pilgrim - Apress Publications (2004)
Introduction to Linear Algebra (Fifth Edition), by Gilbert Strang - Wellesley-Cambridge Press
Calculus, by Gilbert Strang - Wellesley-Cambridge Press
All of Statistics (Section 1, chapters 1-5), by Larry Wasserman - Springer (2010)
The Approach Ahead
This book focuses on the key concepts of deep learning and its practical implementation using PyTorch. In
order to use PyTorch, you should possess a basic understanding of Python programming. Chapter 2
introduces PyTorch, and the subsequent chapters discuss additional important constructs within PyTorch.
Before delving into deep learning, we need to discuss the basic constructs of machine learning. In the
remainder of this chapter, we will explore the first steps of machine learning with a toy example. We will implement these constructs first using plain Python and then implement the same using PyTorch.
Installing the Required Libraries
You need to install a number of libraries in order to run the source code for the examples in this book. We
recommend installing the Anaconda Python distribution
(https://www.anaconda.com/products/individual), which simplifies the process of installing
the required packages (using either conda or pip). The list of packages you need includes NumPy, matplotlib, scikit-learn, and PyTorch.
PyTorch is not installed as a part of the Anaconda distribution. You should install PyTorch, torchtext, and torchvision separately, within the Anaconda environment.
Note that Python 3.6 (and above) is recommended for the exercises in this book. We highly recommend
creating a new Python environment after installing the Anaconda distribution.
Create a new environment with Python 3.6 (use Terminal in Linux/Mac or the Command Prompt in
Windows), and then install the additional necessary packages, as follows:
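For example, assuming a hypothetical environment name dl_pytorch and the official pytorch conda channel, the sequence could look like this:
conda create -n dl_pytorch python=3.6
conda activate dl_pytorch
conda install numpy matplotlib scikit-learn
conda install pytorch torchvision -c pytorch
pip install torchtext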
For additional help with PyTorch, please refer to the Get Started guide at
https://pytorch.org/get-started/locally/.
The Concept of Machine Learning
As human beings, we are intuitively aware of the concept of learning. It simply means to get better at a task
over time. The task could be physical, such as learning to drive a car, or intellectual, such as learning a new
language. The subject of machine learning focuses on the development of algorithms that can learn as
humans learn; that is, they get better at a task over a period of time and with experience, thus inducing
intelligence without explicit programming.
The first question to ask is why we would be interested in the development of algorithms that improve
their performance over time, with experience. After all, many algorithms are developed and implemented to
solve real-world problems that don’t improve over time; they simply are developed by humans,
implemented in software, and get the job done. From banking to ecommerce and from navigation systems in
our cars to landing a spacecraft on the moon, algorithms are everywhere, and a majority of them do not
improve over time. These algorithms simply perform the task they are intended to perform, with some
maintenance required from time to time. Why do we need machine learning?
The answer to this question is that for certain tasks it is easier to develop an algorithm that
learns/improves its performance with experience than to develop an algorithm manually. Although this
might seem unintuitive to the reader at this point, we will build intuition for this during this chapter.
Machine learning can be broadly classified into supervised learning, where training data with labels is provided for the model to learn from, and unsupervised learning, where the training data lacks labels. We also have semi-supervised learning and reinforcement learning, but for now we will limit our scope to supervised machine learning. Supervised learning can again be classified into two areas: classification, for discrete outcomes, and regression, for continuous outcomes.
Binary Classification
In order to further discuss the matter at hand, we need to be precise about some of the terms we have been
intuitively using, such as task, learning, experience, and improvement. We will start with the task of binary
classification.
Consider an abstract problem domain where we have data of the form D = {(x1, y1), (x2, y2), …(xn, yn)}, where x ∈ ℝn and y = ± 1.
We do not have access to all such data but only a subset S ⊂ D. Using S, our task is to generate a computational procedure that implements the function f : x → y such that we can use f to make predictions over unseen data (xi, yi) ∉ S that are correct, f(xi) = yi. Let’s denote by U ⊂ D the set of unseen data; that is, (xi, yi) ∉ S and (xi, yi) ∈ U.
We measure performance over this task as the error over unseen data.
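Concretely, one way to write this error, consistent with how E(f, D, U) is used in the rest of this section, is as the fraction of unseen examples that f misclassifies (in LaTeX notation):
E(f, D, U) = \frac{1}{|U|} \sum_{(x_i, y_i) \in U} \mathbf{1}\left[ f(x_i) \neq y_i \right]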
We now have a precise definition of the task, which is to categorize data into one of two categories
(y = ± 1) based on some seen data S by generating f. We measure performance (and improvement in
performance) using the error E(f, D, U) over unseen data U. The size of the seen data |S| is the conceptual
equivalent of experience. In this context, we want to develop algorithms that generate such functions f
(which are commonly referred to as a model). In general, the field of machine learning studies the
development of such algorithms that produce models that make predictions over unseen data for such and other formal tasks. (We introduce multiple such tasks later in the chapter.) Note that x is commonly referred to as the input/input variable and y is referred to as the output/output variable.
As with any other discipline in computer science, the computational characteristics of such algorithms
are an important facet; however, in addition to that, we also would like to have a model f that achieves a
lower error E(f, D, U) with as small a ∣S∣ as possible.
Let’s now relate this abstract but precise definition to a real-world problem so that our abstractions are
grounded. Suppose that an ecommerce website wants to customize its landing page for registered users to
show the products they might be interested in buying. The website has historical data on users and would
like to implement this as a feature to increase sales. Let’s now see how this real-world problem maps on to
the abstract problem of binary classification we described earlier.
The first thing that one might notice is that given a particular user and a particular product, one would
want to predict whether the user will buy the product. Since this is the value to be predicted, it maps on to
y = ± 1, where we will let the value of y = + 1 denote the prediction that the user will buy the product and
the value of y = − 1 denote the prediction that the user will not buy the product. Note that there is no
particular reason for picking these values; we could have swapped this (let y = + 1 denote the does not buy
case and y = − 1 denote the buy case), and there would be no difference. We just use y = ± 1 to denote the
two classes of interest to categorize data. Next, let’s assume that we can represent the attributes of the
product and the users buying and browsing history as x ∈ ℝn. This step is referred to as feature engineering
in machine learning and we will cover it later in the chapter. For now, it suffices to say that we are able to
generate such a mapping. Thus, we have historical data of what the users browsed and bought, attributes of
a product, and whether the user bought the product or not mapped on to {(x1, y1), (x2, y2), …(xn, yn)}. Now,
based on this data, we would like to generate a function or a model f : x → y, which we can use to determine
which products a particular user will buy, and use this to populate the landing page for users. We can
measure how well the model is doing on unseen data by populating the landing page for users, seeing
whether they buy the products or not, and evaluating the error E(f, D, U).
Regression
This section introduces another task: regression. Here, we have data of the form D = {(x1, y1), (x2, y2), …(xn,
yn)}, where x ∈ ℝn and y ∈ ℝ, and our task is to generate a computational procedure that implements the
function f : x → y. Note that instead of the prediction being a binary class label y = ± 1, like in binary
classification, we have a real-valued prediction. We measure performance over this task as the root-mean-square error (RMSE) over unseen data.
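Written out in LaTeX notation, and consistent with the verbal description that follows, the RMSE over the unseen data U is:
\mathrm{RMSE}(f, D, U) = \sqrt{ \frac{1}{|U|} \sum_{(x_i, y_i) \in U} \left( f(x_i) - y_i \right)^2 }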
Note that the RMSE is simply taking the difference between the predicted and actual value, squaring it
so as to account for both positive and negative differences, taking the mean so as to aggregate over all the
unseen data, and, finally, taking the square root so as to counterbalance the square operation.
A real-world problem that corresponds to the abstract task of regression is to predict the credit score for an
individual based on their financial history, which can be used by a credit card company to extend the line of
credit.
Generalization
Let’s now cover the single most important intuition in machine learning, which is that we want to develop/generate models that have good performance over unseen data. In order to do that, we will first introduce a toy dataset for a regression task. Later, we will develop three different models using the same
dataset with varying levels of complexity and study how the results differ to understand intuitively the
concept of generalization.
In Listing 1-1, we generate the toy dataset by generating 100 values equidistantly between -1 and 1 as
the input variable (x). We generate the output variable (y) based on y = 2 + x + 2x² + ϵ, where ϵ is noise (random variation) drawn from a normal distribution, with 0 being the mean and 0.1 being the standard
deviation. The code for this is presented in Listing 1-1, and the data is plotted in Figure 1-2. In order to
simulate seen and unseen data, we use the first 80 data points as seen data and treat the rest as unseen data.
That is, we build the model using only the first 80 data points and use the rest for evaluating the model.
#import packages
import matplotlib.pyplot as plt
import numpy as np
Output[]
Shape of x_train: (80,)
Shape of y_train: (80,)
Listing 1-1 Generalization vs. Rote Learning
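A minimal sketch of the data-generation and splitting steps described above is shown here (it reproduces the printed shapes; the exact code in Listing 1-1 may differ in details such as the variable names used for the unseen split):
import numpy as np
import matplotlib.pyplot as plt

# 100 equidistant input values between -1 and 1
x = np.linspace(-1, 1, 100)
# y = 2 + x + 2x^2 + noise drawn from a normal distribution (mean 0, std 0.1)
y = 2 + x + 2 * x**2 + np.random.normal(0, 0.1, 100)

# first 80 points are treated as seen (training) data, the rest as unseen data
x_train, y_train = x[:80], y[:80]
x_test, y_test = x[80:], y[80:]

print("Shape of x_train:", x_train.shape)
print("Shape of y_train:", y_train.shape)

# visualize the full dataset (Figure 1-2)
plt.scatter(x, y)
plt.show()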