Introduction to Algorithms for Data Mining and Machine Learning
Xin-She Yang
Middlesex University
School of Science and Technology
London, United Kingdom
Academic Press is an imprint of Elsevier
125 London Wall, London EC2Y 5AS, United Kingdom
525 B Street, Suite 1650, San Diego, CA 92101, United States
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom
Copyright © 2019 Elsevier Inc. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or
mechanical, including photocopying, recording, or any information storage and retrieval system, without
permission in writing from the publisher. Details on how to seek permission, further information about the
Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center
and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by the Publisher (other
than as may be noted herein).
Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our
understanding, changes in research methods, professional practices, or medical treatment may become necessary.
Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using
any information, methods, compounds, or experiments described herein. In using such information or methods
they should be mindful of their own safety and the safety of others, including parties for whom they have a
professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability
for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or
from any use or operation of any methods, products, instructions, or ideas contained in the material herein.
ISBN: 978-0-12-817216-2
Xin-She Yang obtained his PhD in Applied Mathematics from the University of Ox-
ford. He then worked at Cambridge University and National Physical Laboratory (UK)
as a Senior Research Scientist. He is now a Reader at Middlesex University London and
an elected Bye-Fellow at Cambridge University.
He is also the IEEE Computational Intelligence Society (CIS) Chair for the Task Force
on Business Intelligence and Knowledge Management, Director of the International
Consortium for Optimization and Modelling in Science and Industry (iCOMSI), and
an Editor of Springer’s Book Series Springer Tracts in Nature-Inspired Computing
(STNIC).
With more than 20 years of research and teaching experience, he has authored
10 books and edited more than 15 books. He has published more than 200 research pa-
pers in international peer-reviewed journals and conference proceedings with more
than 36,800 citations. He has been on the prestigious lists of Clarivate Analytics and
Web of Science highly cited researchers in 2016, 2017, and 2018. He serves on the
Editorial Boards of many international journals including International Journal of
Bio-Inspired Computation, Elsevier’s Journal of Computational Science (JoCS), In-
ternational Journal of Parallel, Emergent and Distributed Systems, and International
Journal of Computer Mathematics. He is also the Editor-in-Chief of the International
Journal of Mathematical Modelling and Numerical Optimisation.
Preface
Both data mining and machine learning are becoming popular subjects for university
courses and industrial applications. This popularity is partly driven by the Internet and
social media because they generate a huge amount of data every day, and the under-
standing of such big data requires sophisticated data mining techniques. In addition,
many applications such as facial recognition and robotics have extensively used ma-
chine learning algorithms, leading to the increasing popularity of artificial intelligence.
From a more general perspective, both data mining and machine learning are closely
related to optimization. After all, in many applications, we have to minimize costs,
errors, energy consumption, and environmental impact and to maximize sustainabil-
ity, productivity, and efficiency. Many problems in data mining and machine learning
are usually formulated as optimization problems so that they can be solved by opti-
mization algorithms. Therefore, optimization techniques are closely related to many
techniques in data mining and machine learning.
Courses on data mining, machine learning, and optimization are often compulsory
for students studying computer science, management science, engineering design, op-
erations research, data science, finance, and economics. All students have to develop
a certain level of data modeling skills so that they can process and interpret data for
classification, clustering, curve-fitting, and predictions. They should also be familiar
with machine learning techniques that are closely related to data mining so as to carry
out problem solving in many real-world applications. This book provides an introduc-
tion to all the major topics for such courses, covering the essential ideas of all key
algorithms and techniques for data mining, machine learning, and optimization.
Though there are over a dozen good books on such topics, most of these books are
either too specialized, aimed at a specific readership, or too lengthy (often over 500 pages).
This book fills the gap with a compact and concise approach by focusing on the key
concepts, algorithms, and techniques at an introductory level. The main approach of
this book is informal, theorem-free, and practical. By using an informal approach, all
fundamental topics required for data mining and machine learning are covered, and
readers can gain a basic knowledge of all important algorithms with a focus
on their key ideas, without worrying about any tedious, rigorous mathematical proofs.
In addition, the practical approach provides about 30 worked examples
so that readers can see how each step of the algorithms and techniques works.
Thus, the readers can build their understanding and confidence gradually and in a
step-by-step manner. Furthermore, with the minimal requirements of basic high school
mathematics and some basic calculus, such an informal and practical style can also
enable the readers to learn the contents by self-study and at their own pace.
This book is suitable for undergraduates and graduates to rapidly develop all the
fundamental knowledge of data mining, machine learning, and optimization. It can
also be used by students and researchers as a reference to review and refresh their
knowledge in data mining, machine learning, optimization, computer science, and data
science.
Xin-She Yang
January 2019 in London
Acknowledgments
I would like to thank all my students and colleagues who have given valuable feedback
and comments on some of the contents and examples of this book. I also would like to
thank my editors, J. Scott Bentley and Michael Lutz, and the staff at Elsevier for their
professionalism. Last but not least, I thank my family for all the help and support.
Xin-She Yang
January 2019
Introduction to optimization
Contents
1.1 Algorithms
1.1.1 Essence of an algorithm
1.1.2 Issues with algorithms
1.1.3 Types of algorithms
1.2 Optimization
1.2.1 A simple example
1.2.2 General formulation of optimization
1.2.3 Feasible solution
1.2.4 Optimality criteria
1.3 Unconstrained optimization
1.3.1 Univariate functions
1.3.2 Multivariate functions
1.4 Nonlinear constrained optimization
1.4.1 Penalty method
1.4.2 Lagrange multipliers
1.4.3 Karush–Kuhn–Tucker conditions
1.5 Notes on software
This book introduces the fundamental concepts and algorithms related to optimization,
data mining, and machine learning. The main requirement is some understanding of
high-school mathematics and basic calculus; however, we will review and introduce
some of the mathematical foundations in the first two chapters.
1.1 Algorithms
An algorithm is an iterative, step-by-step procedure for computation. The detailed
procedure can be a simple description, an equation, or a series of descriptions in
combination with equations. Finding the roots of a polynomial, checking whether a natural
number is prime, and generating random numbers can all be done by algorithms.
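As a concrete illustration, here is a minimal sketch (our own Python example, not taken from the book; the function name is_prime is ours) of one such procedure, primality checking by trial division:

```python
def is_prime(n: int) -> bool:
    """Decide whether a natural number n is prime by trial division."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:      # any factor must have a cofactor <= sqrt(n)
        if n % d == 0:
            return False   # found a proper divisor, so n is composite
        d += 1
    return True

print([k for k in range(2, 20) if is_prime(k)])  # [2, 3, 5, 7, 11, 13, 17, 19]
```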
Example 1
A classic algorithm is the iteration
x_{k+1} = \frac{1}{2}\left(x_k + \frac{a}{x_k}\right), \quad k = 0, 1, 2, \ldots,  (1.1)
which computes the square root of a > 0 from an initial guess x_0. As an example, if x_0 = 1 and a = 4, then we have
x_1 = \frac{1}{2}\left(1 + \frac{4}{1}\right) = 2.5.  (1.2)
Similarly, we have
x_2 = \frac{1}{2}\left(2.5 + \frac{4}{2.5}\right) = 2.05, \quad x_3 = \frac{1}{2}\left(2.05 + \frac{4}{2.05}\right) \approx 2.0061,  (1.3)
x_4 \approx 2.00000927,  (1.4)
which is very close to the true value of \sqrt{4} = 2. The accuracy of this iterative formula or algorithm
is high because it achieves five decimal places of accuracy after only four iterations.
The convergence is very quick even if we start from a different initial value such as
x_0 = 10 or even x_0 = 100. However, for an obvious reason, we cannot start with
x_0 = 0 due to division by zero.
Finding the square root of a is equivalent to solving the equation
f(x) = x^2 - a = 0,  (1.5)
which is again equivalent to finding the roots of the polynomial f(x). We know that
Newton's root-finding algorithm can be written as
x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)},  (1.6)
where f'(x) is the first derivative or gradient of f(x). In this case, we have
f'(x) = 2x. Thus, Newton's formula becomes
x_{k+1} = x_k - \frac{x_k^2 - a}{2x_k},  (1.7)
which can be rearranged to give exactly the iteration in (1.1).
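As a minimal sketch (our own Python, not the book's; the starting value and tolerance are illustrative choices), the iteration of Eqs. (1.1) and (1.7) can be coded directly:

```python
def newton_sqrt(a: float, x0: float = 1.0, tol: float = 1e-10) -> float:
    """Approximate sqrt(a) using x_{k+1} = (x_k + a/x_k)/2, Eq. (1.7)."""
    x = x0                      # note: x0 = 0 would cause division by zero
    while abs(x * x - a) > tol:
        x = 0.5 * (x + a / x)   # one Newton step
    return x

print(newton_sqrt(4.0))         # ~2.0 after a handful of iterations
print(newton_sqrt(4.0, 100.0))  # converges to the same value from x0 = 100
```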
1.2 Optimization
Consider the design of a cylindrical container with radius r and height h. The cost of its
material is essentially determined by the total surface area S = 2\pi r^2 + 2\pi rh, while its capacity is the volume
V = \pi r^2 h.  (1.12)
There are only two design variables r and h and one objective function S to be min-
imized. Obviously, if there is no capacity constraint, then we can choose not to build
the container, and then the cost of materials is zero for r = 0 and h = 0. However,
the constraint requirement means that we have to build a container with a fixed volume
V_0 = \pi r^2 h = 10 \text{ m}^3. Therefore, this optimization problem can be written as
\text{minimize } S = 2\pi r^2 + 2\pi rh,  (1.13)
subject to
\pi r^2 h = V_0 = 10.  (1.14)
To solve this problem, we can first try to use the equality constraint to reduce the
number of design variables by solving for h. So we have
h = \frac{V_0}{\pi r^2}.  (1.15)
Substituting it into (1.13), we get
S = 2\pi r^2 + 2\pi rh = 2\pi r^2 + 2\pi r \frac{V_0}{\pi r^2} = 2\pi r^2 + \frac{2V_0}{r}.  (1.16)
This is a univariate function. From basic calculus we know that the minimum or max-
imum can occur at a stationary point, where the first derivative is zero, that is,
\frac{dS}{dr} = 4\pi r - \frac{2V_0}{r^2} = 0,  (1.17)
which gives
r^3 = \frac{V_0}{2\pi}, \quad \text{or} \quad r = \left(\frac{V_0}{2\pi}\right)^{1/3}.  (1.18)
Thus, the height is
\frac{h}{r} = \frac{V_0/(\pi r^2)}{r} = \frac{V_0}{\pi r^3} = 2.  (1.19)
This means that the height is twice the radius: h = 2r. Thus, the minimum surface area is
S_* = 2\pi r^2 + \frac{2V_0}{r} = 6\pi r^2 = \frac{6\pi}{\sqrt[3]{4\pi^2}}\, V_0^{2/3}.  (1.20)
It is worth pointing out that this optimal solution is based on the assumption or re-
quirement to design a cylindrical container. If we decide to use a sphere with radius R,
we know that its volume and surface area are
V_0 = \frac{4\pi}{3} R^3, \quad S = 4\pi R^2.  (1.21)
We can solve for R directly:
R^3 = \frac{3V_0}{4\pi}, \quad \text{or} \quad R = \left(\frac{3V_0}{4\pi}\right)^{1/3},  (1.22)
which gives the surface area
S = 4\pi\left(\frac{3V_0}{4\pi}\right)^{2/3} = \frac{4\pi\sqrt[3]{9}}{\sqrt[3]{16\pi^2}}\, V_0^{2/3}.  (1.23)
Since 6\pi/\sqrt[3]{4\pi^2} \approx 5.5358 and 4\pi\sqrt[3]{9}/\sqrt[3]{16\pi^2} \approx 4.83598, we have S < S_*; that is, the
surface area of a sphere is smaller than the minimum surface area of a cylinder with
the same volume. In fact, for the same V_0 = 10, we have
S(\text{sphere}) = \frac{4\pi\sqrt[3]{9}}{\sqrt[3]{16\pi^2}}\, V_0^{2/3} \approx 22.45,  (1.24)
which is smaller than S_* \approx 25.69 for a cylinder.
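These closed-form results are easy to check numerically. The sketch below (our own code; it assumes SciPy is available and uses its bounded scalar minimizer) reproduces both areas for V_0 = 10:

```python
import numpy as np
from scipy.optimize import minimize_scalar

V0 = 10.0

# Cylinder: surface area with h eliminated via the volume constraint, Eq. (1.16)
S = lambda r: 2 * np.pi * r**2 + 2 * V0 / r
res = minimize_scalar(S, bounds=(0.1, 5.0), method="bounded")
print(res.x, res.fun)    # r ~ 1.1675, S* ~ 25.69, and h = V0/(pi r^2) ~ 2r

# Sphere of the same volume, Eqs. (1.22)-(1.24)
R = (3 * V0 / (4 * np.pi)) ** (1 / 3)
print(4 * np.pi * R**2)  # ~ 22.45, smaller than the cylinder's S*
```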
This highlights the importance of the choice of design type (here in terms of shape)
before we can do any truly useful optimization. Obviously, there are many other fac-
tors that can influence the choice of design, including the manufacturability of the
design, stability of the structure, ease of installation, space availability, and so on. For
a container, in most applications, a cylinder may be much easier to produce than a
sphere, and thus the overall cost may be lower in practice. Though there are so many
factors to be considered in engineering design, for the purpose of optimization, here
we will only focus on the improvement and optimization of a design with well-posed
mathematical formulations.
In general, an optimization problem can be formulated as
\text{minimize } f(x), \quad x = (x_1, \ldots, x_D)^T \in \mathbb{R}^D,
\text{subject to } \phi_j(x) = 0 \ (j = 1, \ldots, M), \quad \psi_k(x) \le 0 \ (k = 1, \ldots, N),  (1.25)
where f(x), \phi_j(x), and \psi_k(x) are scalar functions of the design vector x. Here the
components x_i of x = (x_1, \ldots, x_D)^T are called design or decision variables, and they
can be continuous, discrete, or a mixture of the two. The vector x is often
called the decision vector, which varies in a D-dimensional space \mathbb{R}^D.
It is worth pointing out that we use a column vector here for x (thus with trans-
pose T ). We can also use a row vector x = (x1 , . . . , xD ) and the results will be the
same. Different textbooks may use slightly different formulations. Once we are aware
of such minor variations, it should cause no difficulty or confusion.
In addition, the function f (x) is called the objective function or cost function,
φj (x) are constraints in terms of M equalities, and ψk (x) are constraints written as
N inequalities. So there are M + N constraints in total. The optimization problem
formulated here is a nonlinear constrained problem. Here the inequalities ψk (x) ≤ 0
are written as “less than”, and they can also be written as “greater than” via a simple
transformation by multiplying both sides by −1.
The space spanned by the decision variables is called the search space RD , whereas
the space formed by the values of the objective function is called the objective or
response space, and sometimes the landscape. The optimization problem essentially
maps the domain RD or the space of decision variables into the solution space R (or
the real axis in general).
The objective function f (x) can be either linear or nonlinear. If the constraints φj
and ψk are all linear, it becomes a linearly constrained problem. Furthermore, when
φj , ψk , and the objective function f (x) are all linear, then it becomes a linear pro-
gramming problem [35]. If the objective is at most quadratic with linear constraints,
then it is called a quadratic programming problem. If all the values of the decision
variables can be only integers, then this type of linear programming is called integer
programming or integer linear programming.
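As an illustration of the linear programming case (our own toy example, not from the book; it assumes SciPy is installed), a small LP can be posed and solved with scipy.optimize.linprog:

```python
from scipy.optimize import linprog

# Toy LP: minimize -x1 - 2*x2 subject to x1 + x2 <= 4, x1 <= 3, x1, x2 >= 0
c = [-1.0, -2.0]                  # objective coefficients (minimization form)
A_ub = [[1.0, 1.0], [1.0, 0.0]]   # "less than" inequality constraints
b_ub = [4.0, 3.0]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, res.fun)             # optimum at (0, 4) with objective value -8
```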
On the other hand, if no constraints are specified, so that x_i can take any values
on the real axis (or any integers), then the optimization problem is referred to as an
unconstrained optimization problem.
As a very simple example of an optimization problem without any constraints, we
discuss the search for the maxima or minima of a univariate function.
Figure 1.2 A simple multimodal function f(x) = x^2 e^{-x^2}.
Example 2
For example, finding the maximum of the univariate function
f(x) = x^2 e^{-x^2}, \quad -\infty < x < \infty,  (1.26)
is a simple unconstrained problem, whereas minimizing some objective f(x_1, x_2)
subject to
x_1 \ge 1, \quad x_2 - 2 = 0,  (1.28)
is a simple constrained minimization problem.
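To show how such a constrained problem is passed to a solver, here is a hedged sketch (our own; since the objective of Eq. (1.27) is elided in this copy, an illustrative quadratic stands in for it, while the constraints are exactly those of Eq. (1.28)):

```python
from scipy.optimize import minimize

# Illustrative objective only; the book's Eq. (1.27) is not shown in this copy.
f = lambda x: x[0]**2 + x[1]**2
cons = [
    {"type": "ineq", "fun": lambda x: x[0] - 1.0},  # x1 >= 1 (SciPy: fun >= 0)
    {"type": "eq",   "fun": lambda x: x[1] - 2.0},  # x2 - 2 = 0
]
res = minimize(f, x0=[2.0, 0.0], constraints=cons, method="SLSQP")
print(res.x)  # -> approximately [1. 2.]
```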
It is worth pointing out that the objectives are explicitly known in all the optimiza-
tion problems to be discussed in this book. However, in reality, it is often difficult to
quantify what we want to achieve, but we still try to optimize certain things such as the
degree of enjoyment or service quality on holiday. In other cases, it may be impossible
to write the objective function in any explicit form mathematically.
From basic calculus we know that, for a given curve described by f(x), its gradient
f'(x) describes the rate of change. When f'(x) = 0, the curve has a horizontal tangent
at that particular point. This means that it becomes a point of special interest. In fact,
the maximum or minimum of a curve occurs at
f'(x_*) = 0,  (1.29)
Figure 1.3 (a) Feasible domain with nonlinear inequality constraints \psi_1(x) and \psi_2(x) (left) and linear
inequality constraint \psi_3(x). (b) An example with an objective of f(x) = x^2 subject to x \ge 2 (right).
Example 3
To find the minimum of f(x) = x^2 e^{-x^2} (see Fig. 1.2), we have the stationary condition f'(x) = 0, that is,
f'(x) = 2x e^{-x^2} - 2x^3 e^{-x^2} = 2x(1 - x^2) e^{-x^2} = 0,
whose solutions are x = 0 and x = \pm 1. The second derivative is
f''(x) = 2e^{-x^2}(1 - 5x^2 + 2x^4),
which gives f''(\pm 1) = -4e^{-1} < 0, so two maxima occur at x_* = \pm 1 with f_{max} = e^{-1}. At x = 0, we have f''(0) = 2 > 0, thus
the minimum of f(x) occurs at x_* = 0 with f_{min}(0) = 0.
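The stationary points and their classification in Example 3 can be verified symbolically; this sketch (our own, assuming SymPy is available) recovers the same results:

```python
import sympy as sp

x = sp.symbols('x')
f = x**2 * sp.exp(-x**2)
fp = sp.diff(f, x)                # equals 2x(1 - x^2) e^{-x^2} after factoring
print(sp.solve(fp, x))            # [-1, 0, 1]
fpp = sp.diff(f, x, 2)
for s in (-1, 0, 1):
    print(s, sp.simplify(fpp.subs(x, s)))  # -4/e (max), 2 (min), -4/e (max)
```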
Whatever the objective is, we have to evaluate it many times. In most cases, the
evaluations of the objective functions consume a substantial amount of computational
power (which costs money) and design time. Any efficient algorithm that can reduce
the number of objective evaluations saves both time and money.
In mathematical programming there are many important concepts, and we will
first introduce a few of them: feasible solutions, optimality criteria, and strong and
weak local optima.
Note that f'(x_*) = 0 is only a necessary condition: for f(x) = x^3, we have f'(x_*) = f'(0) = 0,
yet x_* = 0 is a saddle point, because f'(0) = 0 but f changes sign
from f(0+) > 0 to f(0-) < 0 as x moves from positive to negative.
Example 4
For example, to find the maximum or minimum of a univariate function f(x),
we first have to find its stationary points x_*, where the first derivative f'(x) vanishes.
Suppose the stationary points found in this way are
x_* = -1, \quad x_* = 2, \quad x_* = 0.
From basic calculus we know that a maximum requires f''(x_*) \le 0, whereas a minimum
requires f''(x_*) \ge 0, so evaluating f'' at each stationary point classifies it.
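A worked instance of this test (our own sketch; the example's specific function is elided in this copy, so an illustrative quartic with exactly the stated stationary points -1, 0, 2 is used):

```python
import sympy as sp

x = sp.symbols('x')
f = 3*x**4 - 4*x**3 - 12*x**2     # illustrative: f'(x) = 12x(x + 1)(x - 2)
fp = sp.diff(f, x)
print(sp.solve(fp, x))            # [-1, 0, 2]
fpp = sp.diff(f, x, 2)            # 36x^2 - 24x - 24
for s in (-1, 0, 2):
    print(s, fpp.subs(x, s))      # 36 (min), -24 (max), 72 (min)
```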