Introduction to
Algorithms for Data
Mining and Machine
Learning

Xin-She Yang
Middlesex University
School of Science and Technology
London, United Kingdom
Academic Press is an imprint of Elsevier
125 London Wall, London EC2Y 5AS, United Kingdom
525 B Street, Suite 1650, San Diego, CA 92101, United States
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom
Copyright © 2019 Elsevier Inc. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or
mechanical, including photocopying, recording, or any information storage and retrieval system, without
permission in writing from the publisher. Details on how to seek permission, further information about the
Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center
and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by the Publisher (other
than as may be noted herein).
Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our
understanding, changes in research methods, professional practices, or medical treatment may become necessary.
Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using
any information, methods, compounds, or experiments described herein. In using such information or methods
they should be mindful of their own safety and the safety of others, including parties for whom they have a
professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability
for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or
from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library

ISBN: 978-0-12-817216-2

For information on all Academic Press publications
visit our website at https://www.elsevier.com/books-and-journals

Publisher: Candice Janco
Acquisition Editor: J. Scott Bentley
Editorial Project Manager: Michael Lutz
Production Project Manager: Nilesh Kumar Shah
Designer: Miles Hitchen
Typeset by VTeX
Contents

About the author
Preface
Acknowledgments

1 Introduction to optimization
1.1 Algorithms
1.1.1 Essence of an algorithm
1.1.2 Issues with algorithms
1.1.3 Types of algorithms
1.2 Optimization
1.2.1 A simple example
1.2.2 General formulation of optimization
1.2.3 Feasible solution
1.2.4 Optimality criteria
1.3 Unconstrained optimization
1.3.1 Univariate functions
1.3.2 Multivariate functions
1.4 Nonlinear constrained optimization
1.4.1 Penalty method
1.4.2 Lagrange multipliers
1.4.3 Karush–Kuhn–Tucker conditions
1.5 Notes on software

2 Mathematical foundations
2.1 Convexity
2.1.1 Linear and affine functions
2.1.2 Convex functions
2.1.3 Mathematical operations on convex functions
2.2 Computational complexity
2.2.1 Time and space complexity
2.2.2 Complexity of algorithms
2.3 Norms and regularization
2.3.1 Norms
2.3.2 Regularization
2.4 Probability distributions
2.4.1 Random variables
2.4.2 Probability distributions
2.4.3 Conditional probability and Bayesian rule
2.4.4 Gaussian process
2.5 Bayesian network and Markov models
2.6 Monte Carlo sampling
2.6.1 Markov chain Monte Carlo
2.6.2 Metropolis–Hastings algorithm
2.6.3 Gibbs sampler
2.7 Entropy, cross entropy, and KL divergence
2.7.1 Entropy and cross entropy
2.7.2 KL divergence
2.8 Fuzzy rules
2.9 Data mining and machine learning
2.9.1 Data mining
2.9.2 Machine learning
2.10 Notes on software

3 Optimization algorithms
3.1 Gradient-based methods
3.1.1 Newton's method
3.1.2 Newton's method for multivariate functions
3.1.3 Line search
3.2 Variants of gradient-based methods
3.2.1 Stochastic gradient descent
3.2.2 Subgradient method
3.2.3 Conjugate gradient method
3.3 Optimizers in deep learning
3.4 Gradient-free methods
3.5 Evolutionary algorithms and swarm intelligence
3.5.1 Genetic algorithm
3.5.2 Differential evolution
3.5.3 Particle swarm optimization
3.5.4 Bat algorithm
3.5.5 Firefly algorithm
3.5.6 Cuckoo search
3.5.7 Flower pollination algorithm
3.6 Notes on software

4 Data fitting and regression
4.1 Sample mean and variance
4.2 Regression analysis
4.2.1 Maximum likelihood
4.2.2 Linear regression
4.2.3 Linearization
4.2.4 Generalized linear regression
4.2.5 Goodness of fit
4.3 Nonlinear least squares
4.3.1 Gauss–Newton algorithm
4.3.2 Levenberg–Marquardt algorithm
4.3.3 Weighted least squares
4.4 Overfitting and information criteria
4.5 Regularization and Lasso method
4.6 Notes on software

5 Logistic regression, PCA, LDA, and ICA
5.1 Logistic regression
5.2 Softmax regression
5.3 Principal component analysis
5.4 Linear discriminant analysis
5.5 Singular value decomposition
5.6 Independent component analysis
5.7 Notes on software

6 Data mining techniques
6.1 Introduction
6.1.1 Types of data
6.1.2 Distance metric
6.2 Hierarchy clustering
6.3 k-Nearest-neighbor algorithm
6.4 k-Means algorithm
6.5 Decision trees and random forests
6.5.1 Decision tree algorithm
6.5.2 ID3 algorithm and C4.5 classifier
6.5.3 Random forest
6.6 Bayesian classifiers
6.6.1 Naive Bayesian classifier
6.6.2 Bayesian networks
6.7 Data mining for big data
6.7.1 Characteristics of big data
6.7.2 Statistical nature of big data
6.7.3 Mining big data
6.8 Notes on software

7 Support vector machine and regression
7.1 Statistical learning theory
7.2 Linear support vector machine
7.3 Kernel functions and nonlinear SVM
7.4 Support vector regression
7.5 Notes on software

8 Neural networks and deep learning
8.1 Learning
8.2 Artificial neural networks
8.2.1 Neuron models
8.2.2 Activation models
8.2.3 Artificial neural networks
8.3 Back propagation algorithm
8.4 Loss functions in ANN
8.5 Optimizers and choice of optimizers
8.6 Network architecture
8.7 Deep learning
8.7.1 Convolutional neural networks
8.7.2 Restricted Boltzmann machine
8.7.3 Deep neural nets
8.7.4 Trends in deep learning
8.8 Tuning of hyperparameters
8.9 Notes on software

Bibliography
Index
About the author

Xin-She Yang obtained his PhD in Applied Mathematics from the University of Oxford. He then worked at Cambridge University and the National Physical Laboratory (UK) as a Senior Research Scientist. He is now a Reader at Middlesex University London and an elected Bye-Fellow at Cambridge University.
He is also the IEEE Computational Intelligence Society (CIS) Chair for the Task Force on Business Intelligence and Knowledge Management, Director of the International Consortium for Optimization and Modelling in Science and Industry (iCOMSI), and an Editor of Springer's Book Series Springer Tracts in Nature-Inspired Computing (STNIC).
With more than 20 years of research and teaching experience, he has authored 10 books and edited more than 15 books. He has published more than 200 research papers in international peer-reviewed journals and conference proceedings with more than 36,800 citations. He has been on the prestigious lists of Clarivate Analytics and Web of Science highly cited researchers in 2016, 2017, and 2018. He serves on the Editorial Boards of many international journals, including International Journal of Bio-Inspired Computation, Elsevier's Journal of Computational Science (JoCS), International Journal of Parallel, Emergent and Distributed Systems, and International Journal of Computer Mathematics. He is also the Editor-in-Chief of the International Journal of Mathematical Modelling and Numerical Optimisation.
Preface

Both data mining and machine learning are becoming popular subjects for university courses and industrial applications. This popularity is partly driven by the Internet and social media because they generate a huge amount of data every day, and the understanding of such big data requires sophisticated data mining techniques. In addition, many applications such as facial recognition and robotics have extensively used machine learning algorithms, leading to the increasing popularity of artificial intelligence. From a more general perspective, both data mining and machine learning are closely related to optimization. After all, in many applications, we have to minimize costs, errors, energy consumption, and environmental impact and to maximize sustainability, productivity, and efficiency. Many problems in data mining and machine learning are usually formulated as optimization problems so that they can be solved by optimization algorithms. Therefore, optimization techniques are closely related to many techniques in data mining and machine learning.
Courses on data mining, machine learning, and optimization are often compulsory for students studying computer science, management science, engineering design, operations research, data science, finance, and economics. All students have to develop a certain level of data modeling skills so that they can process and interpret data for classification, clustering, curve-fitting, and predictions. They should also be familiar with machine learning techniques that are closely related to data mining so as to carry out problem solving in many real-world applications. This book provides an introduction to all the major topics for such courses, covering the essential ideas of all key algorithms and techniques for data mining, machine learning, and optimization.
Though there are over a dozen good books on such topics, most of these books are either too specialized, with a specific readership, or too lengthy (often over 500 pages). This book fills the gap with a compact and concise approach by focusing on the key concepts, algorithms, and techniques at an introductory level. The main approach of this book is informal, theorem-free, and practical. By using an informal approach, all fundamental topics required for data mining and machine learning are covered, and the readers can gain basic knowledge of all important algorithms with a focus on their key ideas, without worrying about any tedious, rigorous mathematical proofs. In addition, the practical approach provides about 30 worked examples in this book so that the readers can see how each step of the algorithms and techniques works. Thus, the readers can build their understanding and confidence gradually and in a step-by-step manner. Furthermore, with the minimal requirements of basic high school mathematics and some basic calculus, such an informal and practical style can also enable the readers to learn the contents by self-study and at their own pace.
This book is suitable for undergraduates and graduates to rapidly develop all the fundamental knowledge of data mining, machine learning, and optimization. It can also be used by students and researchers as a reference to review and refresh their knowledge in data mining, machine learning, optimization, computer science, and data science.

Xin-She Yang
January 2019 in London
Acknowledgments

I would like to thank all my students and colleagues who have given valuable feedback
and comments on some of the contents and examples of this book. I also would like to
thank my editors, J. Scott Bentley and Michael Lutz, and the staff at Elsevier for their
professionalism. Last but not least, I thank my family for all the help and support.

Xin-She Yang
January 2019
1 Introduction to optimization

Contents
1.1 Algorithms
1.1.1 Essence of an algorithm
1.1.2 Issues with algorithms
1.1.3 Types of algorithms
1.2 Optimization
1.2.1 A simple example
1.2.2 General formulation of optimization
1.2.3 Feasible solution
1.2.4 Optimality criteria
1.3 Unconstrained optimization
1.3.1 Univariate functions
1.3.2 Multivariate functions
1.4 Nonlinear constrained optimization
1.4.1 Penalty method
1.4.2 Lagrange multipliers
1.4.3 Karush–Kuhn–Tucker conditions
1.5 Notes on software

This book introduces the fundamental concepts and algorithms related to optimization, data mining, and machine learning. The main requirement is some understanding of high-school mathematics and basic calculus; however, we will review and introduce some of the mathematical foundations in the first two chapters.

1.1 Algorithms
An algorithm is an iterative, step-by-step procedure for computation. The detailed
procedure can be a simple description, an equation, or a series of descriptions in
combination with equations. Finding the roots of a polynomial, checking if a natu-
ral number is a prime number, and generating random numbers are all algorithms.

1.1.1 Essence of an algorithm

In essence, an algorithm can be written as an iterative equation or a set of iterative equations. For example, to find a square root of a > 0, we can use the following iterative equation:

x_{k+1} = \frac{1}{2}\Big(x_k + \frac{a}{x_k}\Big),   (1.1)

where k is the iteration counter (k = 0, 1, 2, ...), starting with a random guess x_0 = 1.

Example 1
As an example, if x_0 = 1 and a = 4, then we have

x_1 = \frac{1}{2}\Big(1 + \frac{4}{1}\Big) = 2.5.   (1.2)

Similarly, we have

x_2 = \frac{1}{2}\Big(2.5 + \frac{4}{2.5}\Big) = 2.05, \quad x_3 = \frac{1}{2}\Big(2.05 + \frac{4}{2.05}\Big) \approx 2.0061,   (1.3)

x_4 \approx 2.00000927,   (1.4)

which is very close to the true value of \sqrt{4} = 2. The accuracy of this iterative formula or algorithm is high because it achieves the accuracy of five decimal places after four iterations.
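The iteration in Eq. (1.1) is easy to test on a computer. Below is a minimal Python sketch (the function name, tolerance, and iteration cap are illustrative choices, not from the text); it reproduces the values in Example 1 and also previews the behavior for negative starting points discussed in Section 1.1.2.

def sqrt_iteration(a, x0, tol=1e-12, max_iter=50):
    # Iterate x_{k+1} = (x_k + a/x_k)/2, Eq. (1.1), until successive
    # iterates agree to within tol.
    x = x0
    for _ in range(max_iter):
        x_next = 0.5 * (x + a / x)
        if abs(x_next - x) < tol:
            break
        x = x_next
    return x_next

print(sqrt_iteration(4.0, 1.0))    # 2.0 (via x1 = 2.5, x2 = 2.05, ...)
print(sqrt_iteration(4.0, -1.0))   # -2.0: the sign of x0 selects the root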

The convergence is very quick if we start from different initial values such as x_0 = 10 and even x_0 = 100. However, for an obvious reason, we cannot start with x_0 = 0 due to division by zero.
Finding the root of x = \sqrt{a} is equivalent to solving the equation

f(x) = x^2 - a = 0,   (1.5)

which is again equivalent to finding the roots of a polynomial f(x). We know that Newton's root-finding algorithm can be written as

x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)},   (1.6)

where f'(x) is the first derivative or gradient of f(x). In this case, we have f'(x) = 2x. Thus, Newton's formula becomes

x_{k+1} = x_k - \frac{x_k^2 - a}{2x_k},   (1.7)

which can be written as

x_{k+1} = \Big(x_k - \frac{x_k}{2}\Big) + \frac{a}{2x_k} = \frac{1}{2}\Big(x_k + \frac{a}{x_k}\Big).   (1.8)

This is exactly what we have in Eq. (1.1).

Newton's method has rigorous mathematical foundations and guaranteed convergence under certain conditions. However, Eq. (1.6) is more general, and the gradient information f'(x) is needed. In addition, for the formula to be valid, we must have f'(x) ≠ 0.

1.1.2 Issues with algorithms

The advantage of the algorithm given in Eq. (1.1) is that it converges very quickly. However, careful readers may have asked: we know that \sqrt{4} = ±2, so how can we find the other root −2 in addition to +2?
Even if we use different initial values such as x_0 = 10 or x_0 = 0.5, we can only reach x_* = 2, not −2.
What happens if we start with x_0 < 0? From x_0 = −1 we have

x_1 = \frac{1}{2}\Big(-1 + \frac{4}{-1}\Big) = -2.5, \quad x_2 = \frac{1}{2}\Big(-2.5 + \frac{4}{-2.5}\Big) = -2.05,   (1.9)

x_3 \approx -2.0061, \quad x_4 \approx -2.00000927,   (1.10)

which is approaching −2 very quickly. If we start from x_0 = −10 or x_0 = −0.5, then we always get x_* = −2, not +2.
This highlights a key issue here: the final solution seems to depend on the initial starting point for this algorithm, which is true for many algorithms.
Now the relevant question is: how do we know where to start to get a particular solution? The general short answer is "we do not know". Thus, some knowledge of the problem under consideration or an educated guess may be useful to find the final solution.
In fact, most algorithms may depend on the initial configuration, and such algorithms often carry out search moves locally. Thus, this type of algorithm is often referred to as local search. A good algorithm should be able to "forget" its initial configuration, though such algorithms may not exist at all for most types of problems. What we need in general is global search, which attempts to find final solutions that are less sensitive to the initial starting point(s).
Another important issue in our discussions is that the gradient information f'(x) is necessary for some algorithms such as Newton's method given in Eq. (1.6). This poses certain requirements on the smoothness of the function f(x). For example, we know that |x| is not differentiable at x = 0. Thus, we cannot directly use Newton's method to find the roots of f(x) = |x|x^2 - a = 0 for a > 0. Some modifications are needed.
There are other issues related to algorithms, such as the setting of parameters, the slow rate of convergence, condition numbers, and iteration structures. All these make algorithm design and usage somewhat challenging, and we will discuss these issues in more detail later in this book.

1.1.3 Types of algorithms

An algorithm can only do a specific computational task (at most a class of computational tasks), and no algorithm can do all tasks. Thus, algorithms can be classified according to their purposes. An algorithm to find roots of a polynomial belongs to root-finding algorithms, whereas an algorithm for ranking a set of numbers belongs to sorting algorithms. There are many classes of algorithms for different purposes. Even for the same purpose such as sorting, there are many different algorithms such as merge sort, bubble sort, quicksort, and others.
We can also categorize algorithms in terms of their characteristics. The root-finding algorithms we just introduced are deterministic algorithms because the final solutions are exactly the same if we start from the same initial guess. We obtain the same set of solutions every time we run the algorithm. On the other hand, we may introduce some randomization into the algorithm, for example, by using purely random initial points. Every time we run the algorithm, we use a new random initial guess. In this case, the algorithm can have some nondeterministic nature, and such algorithms are referred to as stochastic. Sometimes, using randomness may be advantageous. For example, in the example of \sqrt{4} = ±2 using Eq. (1.1), random initial values (both positive and negative) can allow the algorithm to find both roots. In fact, a major trend in modern metaheuristics is to use some randomization to suit different purposes.
For the algorithms to be introduced in this book, we are mainly concerned with algorithms for data mining, optimization, and machine learning. We use a relatively unified approach to link algorithms in data mining and machine learning to algorithms for optimization.

1.2 Optimization

Optimization is everywhere, from engineering design to business planning. After all, time and resources are limited, and optimal use of such valuable resources is crucial. In addition, designs of products have to maximize performance, sustainability, and energy efficiency and to minimize costs. Therefore, optimization is important for many applications.

1.2.1 A simple example

Let us start with a very simple example: design a container with volume capacity V_0 = 10 m^3. As the main cost is related to the cost of materials, the main aim is to minimize the total surface area S.
The first thing we have to decide is the shape of the container (cylinder, cube, sphere, ellipsoid, or a more complex geometry). For simplicity, let us start with a cylindrical shape with radius r and height h (see Fig. 1.1).

Figure 1.1 Design of a cylindrical container.

The total surface area of a cylinder is

S = 2(\pi r^2) + 2\pi r h,   (1.11)

and the volume is

V = \pi r^2 h.   (1.12)

There are only two design variables r and h and one objective function S to be minimized. Obviously, if there is no capacity constraint, then we can choose not to build the container, and then the cost of materials is zero for r = 0 and h = 0. However, the constraint requirement means that we have to build a container with fixed volume V_0 = \pi r^2 h = 10 m^3. Therefore, this optimization problem can be written as

minimize S = 2\pi r^2 + 2\pi r h,   (1.13)

subject to the equality constraint

\pi r^2 h = V_0 = 10.   (1.14)

To solve this problem, we can first try to use the equality constraint to reduce the number of design variables by solving for h. So we have

h = \frac{V_0}{\pi r^2}.   (1.15)

Substituting it into (1.13), we get

S = 2\pi r^2 + 2\pi r h = 2\pi r^2 + 2\pi r \frac{V_0}{\pi r^2} = 2\pi r^2 + \frac{2V_0}{r}.   (1.16)

This is a univariate function. From basic calculus we know that the minimum or maximum can occur at a stationary point, where the first derivative is zero, that is,

\frac{dS}{dr} = 4\pi r - \frac{2V_0}{r^2} = 0,   (1.17)

which gives

r^3 = \frac{V_0}{2\pi}, \quad \text{or} \quad r = \sqrt[3]{\frac{V_0}{2\pi}}.   (1.18)

Thus, the height satisfies

\frac{h}{r} = \frac{V_0/(\pi r^2)}{r} = \frac{V_0}{\pi r^3} = 2.   (1.19)

This means that the height is twice the radius: h = 2r. Thus, the minimum surface is

S_* = 2\pi r^2 + 2\pi r h = 2\pi r^2 + 2\pi r(2r) = 6\pi r^2 = 6\pi \Big(\frac{V_0}{2\pi}\Big)^{2/3} = \frac{6\pi}{\sqrt[3]{4\pi^2}} V_0^{2/3}.   (1.20)

For V_0 = 10, we have

r = \sqrt[3]{\frac{V_0}{2\pi}} = \sqrt[3]{\frac{10}{2\pi}} \approx 1.1675, \quad h = 2r = 2.335,

and the total surface area

S_* = 2\pi r^2 + 2\pi r h \approx 25.69.

It is worth pointing out that this optimal solution is based on the assumption or requirement to design a cylindrical container. If we decide to use a sphere with radius R, we know that its volume and surface area are

V_0 = \frac{4\pi}{3} R^3, \quad S = 4\pi R^2.   (1.21)

We can solve for R directly:

R^3 = \frac{3V_0}{4\pi}, \quad \text{or} \quad R = \sqrt[3]{\frac{3V_0}{4\pi}},   (1.22)

which gives the surface area

S = 4\pi \Big(\frac{3V_0}{4\pi}\Big)^{2/3} = \frac{4\pi \sqrt[3]{9}}{\sqrt[3]{16\pi^2}} V_0^{2/3}.   (1.23)

Since 6\pi/\sqrt[3]{4\pi^2} \approx 5.5358 and 4\pi \sqrt[3]{9}/\sqrt[3]{16\pi^2} \approx 4.83598, we have S < S_*, that is, the surface area of a sphere is smaller than the minimum surface area of a cylinder with the same volume. In fact, for the same V_0 = 10, we have

S(\text{sphere}) = \frac{4\pi \sqrt[3]{9}}{\sqrt[3]{16\pi^2}} V_0^{2/3} \approx 22.47,   (1.24)

which is smaller than S_* = 25.69 for a cylinder.
This highlights the importance of the choice of design type (here in terms of shape)
before we can do any truly useful optimization. Obviously, there are many other fac-
tors that can influence the choice of design, including the manufacturability of the
design, stability of the structure, ease of installation, space availability, and so on. For
a container, in most applications, a cylinder may be much easier to produce than a
sphere, and thus the overall cost may be lower in practice. Though there are so many
factors to be considered in engineering design, for the purpose of optimization, here
we will only focus on the improvement and optimization of a design with well-posed
mathematical formulations.
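Returning to the numbers above, the following Python sketch (an illustration, not part of the original text) evaluates the optimal cylinder from Eqs. (1.18)-(1.19) and the sphere of equal volume from Eq. (1.22), confirming the surface areas quoted in this section.

import math

V0 = 10.0  # required volume in m^3

# Optimal cylinder: r = (V0/(2*pi))^(1/3) and h = 2r, Eqs. (1.18)-(1.19)
r = (V0 / (2 * math.pi)) ** (1 / 3)
h = 2 * r
S_cyl = 2 * math.pi * r**2 + 2 * math.pi * r * h

# Sphere of the same volume: R = (3*V0/(4*pi))^(1/3), Eq. (1.22)
R = (3 * V0 / (4 * math.pi)) ** (1 / 3)
S_sph = 4 * math.pi * R**2

print(r, h)          # about 1.1675 and 2.3351
print(S_cyl, S_sph)  # about 25.69 versus 22.47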

1.2.2 General formulation of optimization

Whatever the real-world applications may be, it is usually possible to formulate an optimization problem in a generic form [49,53,160]. All optimization problems with explicit objectives can in general be expressed as a nonlinearly constrained optimization problem

maximize/minimize f(x), \quad x = (x_1, x_2, \ldots, x_D)^T \in R^D,
subject to \phi_j(x) = 0 \ (j = 1, 2, \ldots, M),
\psi_k(x) \le 0 \ (k = 1, \ldots, N),   (1.25)

where f (x), φj (x), and ψk (x) are scalar functions of the design vector x. Here the
components xi of x = (x1 , . . . , xD )T are called design or decision variables, and they
can be either continuous, discrete, or a mixture of these two. The vector x is often
called the decision vector, which varies in a D-dimensional space RD .
It is worth pointing out that we use a column vector here for x (thus with trans-
pose T ). We can also use a row vector x = (x1 , . . . , xD ) and the results will be the
same. Different textbooks may use slightly different formulations. Once we are aware
of such minor variations, it should cause no difficulty or confusion.
In addition, the function f (x) is called the objective function or cost function,
φj (x) are constraints in terms of M equalities, and ψk (x) are constraints written as
N inequalities. So there are M + N constraints in total. The optimization problem
formulated here is a nonlinear constrained problem. Here the inequalities ψk (x) ≤ 0
are written as “less than”, and they can also be written as “greater than” via a simple
transformation by multiplying both sides by −1.
The space spanned by the decision variables is called the search space RD , whereas
the space formed by the values of the objective function is called the objective or
response space, and sometimes the landscape. The optimization problem essentially
maps the domain RD or the space of decision variables into the solution space R (or
the real axis in general).
The objective function f (x) can be either linear or nonlinear. If the constraints φj
and ψk are all linear, it becomes a linearly constrained problem. Furthermore, when
φj , ψk , and the objective function f (x) are all linear, then it becomes a linear pro-
gramming problem [35]. If the objective is at most quadratic with linear constraints,
then it is called a quadratic programming problem. If all the values of the decision
variables can be only integers, then this type of linear programming is called integer
programming or integer linear programming.
On the other hand, if no constraints are specified and thus xi can take any values
in the real axis (or any integers), then the optimization problem is referred to as an
unconstrained optimization problem.
As a very simple example of optimization problems without any constraints, we
discuss the search of the maxima or minima of a univariate function.
Figure 1.2 A simple multimodal function f(x) = x^2 e^{-x^2}.

Example 2
For example, to find the maximum of a univariate function f(x),

f(x) = x^2 e^{-x^2}, \quad -\infty < x < \infty,   (1.26)

is a simple unconstrained problem, whereas the following problem is a simple constrained minimization problem:

f(x_1, x_2) = x_1^2 + x_1 x_2 + x_2^2, \quad (x_1, x_2) \in R^2,   (1.27)

subject to

x_1 \ge 1, \quad x_2 - 2 = 0.   (1.28)

It is worth pointing out that the objectives are explicitly known in all the optimiza-
tion problems to be discussed in this book. However, in reality, it is often difficult to
quantify what we want to achieve, but we still try to optimize certain things such as the
degree of enjoyment or service quality on holiday. In other cases, it may be impossible
to write the objective function in any explicit form mathematically.
From basic calculus we know that, for a given curve described by f(x), its gradient f'(x) describes the rate of change. When f'(x) = 0, the curve has a horizontal tangent at that particular point. This means that it becomes a point of special interest. In fact, the maximum or minimum of a curve occurs at

f'(x_*) = 0,   (1.29)

which is a critical condition or stationary condition. The solution x_* to this equation corresponds to a stationary point, and there may be multiple stationary points for a given curve.
To see if it is a maximum or minimum at x = x_*, we have to use the information of its second derivative f''(x). In fact, f''(x_*) > 0 corresponds to a minimum, whereas f''(x_*) < 0 corresponds to a maximum. Let us see a concrete example.

Example 3
To find the minimum of f(x) = x^2 e^{-x^2} (see Fig. 1.2), we have the stationary condition f'(x) = 0, or

f'(x) = 2x e^{-x^2} + x^2 (-2x) e^{-x^2} = 2(x - x^3) e^{-x^2} = 0.

As e^{-x^2} > 0, we have

x(1 - x^2) = 0, \quad \text{or} \quad x = 0 \ \text{and} \ x = \pm 1.

The second derivative is given by

f''(x) = 2 e^{-x^2} (1 - 5x^2 + 2x^4),

which is an even function with respect to x.
So at x = \pm 1, f''(\pm 1) = 2[1 - 5(\pm 1)^2 + 2(\pm 1)^4] e^{-(\pm 1)^2} = -4 e^{-1} < 0. Thus, there are two maxima that occur at x_* = \pm 1 with f_max = e^{-1}. At x = 0, we have f''(0) = 2 > 0, thus the minimum of f(x) occurs at x_* = 0 with f_min(0) = 0.
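The stationary-point analysis in Example 3 can be verified symbolically. This sketch uses the SymPy library (an illustrative tool choice, not something the book relies on) to recover the stationary points and classify them via the second derivative.

import sympy as sp

x = sp.symbols('x', real=True)
f = x**2 * sp.exp(-x**2)

fp = sp.diff(f, x)        # f'(x) = 2(x - x^3) exp(-x^2)
fpp = sp.diff(f, x, 2)    # f''(x) = 2 exp(-x^2) (1 - 5x^2 + 2x^4)
points = sp.solve(fp, x)  # [-1, 0, 1]

for p in points:
    # f''(p) < 0 indicates a maximum; f''(p) > 0 indicates a minimum
    print(p, sp.simplify(fpp.subs(x, p)), f.subs(x, p))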

Whatever the objective is, we have to evaluate it many times. In most cases, the
evaluations of the objective functions consume a substantial amount of computational
power (which costs money) and design time. Any efficient algorithm that can reduce
the number of objective evaluations saves both time and money.
In mathematical programming, there are many important concepts, and we will
first introduce a few related concepts: feasible solutions, optimality criteria, the strong
local optimum, and weak local optimum.

1.2.3 Feasible solution

A point x that satisfies all the constraints is called a feasible point and thus is a feasible solution to the problem. The set of all feasible points is called the feasible region (see Fig. 1.3).

Figure 1.3 (a) Feasible domain with nonlinear inequality constraints ψ1(x) and ψ2(x) (left) and a linear inequality constraint ψ3(x). (b) An example with an objective of f(x) = x^2 subject to x ≥ 2 (right).

For example, we know that the domain of f(x) = x^2 consists of all real numbers. If we want to minimize f(x) without any constraint, all solutions such as x = −1, x = 1, and x = 0 are feasible. In fact, the feasible region is the whole real axis. Obviously, x = 0 corresponds to f(0) = 0 as the true minimum.
However, if we want to find the minimum of f(x) = x^2 subject to x ≥ 2, then it becomes a constrained optimization problem. The points such as x = 1 and x = 0 are no longer feasible because they do not satisfy x ≥ 2. In this case the feasible solutions are all the points that satisfy x ≥ 2. So x = 2, x = 100, and x = 10^8 are all feasible. It is obvious that the minimum occurs at x = 2 with f(2) = 2^2 = 4, that is, the optimal solution for this problem occurs at the boundary point x = 2 (see Fig. 1.3).

Figure 1.4 Local optima, weak optima, and global optimality.

1.2.4 Optimality criteria

A point x_* is called a strong local maximum of the nonlinearly constrained optimization problem if f(x) is defined in a δ-neighborhood N(x_*, δ) and satisfies f(x_*) > f(u) for u ∈ N(x_*, δ), where δ > 0 and u ≠ x_*. If x_* is not a strong local maximum, then the inclusion of equality in the condition f(x_*) ≥ f(u) for all u ∈ N(x_*, δ) defines the point x_* as a weak local maximum (see Fig. 1.4). The local minima can be defined in a similar manner when > and ≥ are replaced by < and ≤, respectively.
Fig. 1.4 shows various local maxima and minima. Point A is a strong local maximum, whereas point B is a weak local maximum because there are many (in fact, infinitely many) different values of x that will lead to the same value of f(x_*). Point D is the global maximum, and point E is the global minimum. In addition, point F is a strong local minimum. Point C is also a strong local minimum, but it has a discontinuity in f'(x_*), so the stationary condition f'(x_*) = 0 is not valid for this point. We will not deal with these types of minima or maxima in detail.
As we briefly mentioned before, for a smooth curve f(x), optimal solutions usually occur at stationary points where f'(x) = 0. This is not always the case because optimal solutions can also occur at the boundary, as we have seen in the previous example of minimizing f(x) = x^2 subject to x ≥ 2. In our present discussion, we will assume that both f(x) and f'(x) are always continuous, or f(x) is everywhere twice continuously differentiable. Obviously, the information of f'(x) is not sufficient to determine whether a stationary point is a local maximum or minimum. Thus, higher-order derivatives such as f''(x) are needed, but we do not make any assumption at this stage. We will further discuss this in detail in the next section.

1.3 Unconstrained optimization

Optimization problems can be classified as either unconstrained or constrained. Unconstrained optimization problems can in turn be subdivided into univariate and multivariate problems.

1.3.1 Univariate functions

The simplest optimization problem without any constraints is probably the search for the maxima or minima of a univariate function f(x). For unconstrained optimization problems, the optimality occurs at the critical points given by the stationary condition f'(x) = 0.
However, this stationary condition is just a necessary condition, not a sufficient condition. If f'(x_*) = 0 and f''(x_*) > 0, then x_* is a local minimum. Conversely, if f'(x_*) = 0 and f''(x_*) < 0, then it is a local maximum. However, if f'(x_*) = 0 and f''(x_*) = 0, care should be taken because f''(x) may be indefinite (both positive and negative) as x → x_*, in which case x_* corresponds to a saddle point.
For example, for f(x) = x^3, we have

f'(x) = 3x^2, \quad f''(x) = 6x.   (1.30)

The stationary condition f'(x) = 3x^2 = 0 gives x_* = 0. However, we also have

f''(x_*) = f''(0) = 0.

In fact, f(x) = x^3 has a saddle point at x_* = 0 because f'(0) = 0, but f'' changes sign from f''(0+) > 0 to f''(0−) < 0 as x moves from positive to negative.

Example 4
For example, to find the maximum or minimum of the univariate function

f(x) = 3x^4 - 4x^3 - 12x^2 + 9, \quad -\infty < x < \infty,

we first have to find its stationary points x_*, where the first derivative f'(x) is zero, that is,

f'(x) = 12x^3 - 12x^2 - 24x = 12(x^3 - x^2 - 2x) = 0.

Since f'(x) = 12(x^3 - x^2 - 2x) = 12x(x + 1)(x - 2) = 0, we have

x_* = -1, \quad x_* = 2, \quad x_* = 0.

The second derivative of f(x) is simply

f''(x) = 36x^2 - 24x - 24.

From basic calculus we know that a maximum requires f''(x_*) ≤ 0, whereas a minimum requires f''(x_*) ≥ 0.
At x_* = -1, we have

f''(-1) = 36(-1)^2 - 24(-1) - 24 = 36 > 0,

so this point corresponds to a local minimum

f(-1) = 3(-1)^4 - 4(-1)^3 - 12(-1)^2 + 9 = 4.

Similarly, at x_* = 2, f''(x_*) = 72 > 0, and thus we have another local minimum

f(x_*) = -23.

However, at x_* = 0, we have f''(0) = -24 < 0, which corresponds to a local maximum f(0) = 9. This maximum is not a global maximum because the global maxima of f(x) occur at x = ±∞.
The global minimum occurs at x_* = 2 with f(2) = -23.
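For a polynomial objective such as the one in Example 4, the stationary points can also be found numerically. A minimal NumPy sketch (an illustrative choice of tool): the roots of f'(x) = 12x^3 − 12x^2 − 24x are computed directly, and the sign of f''(x) classifies each of them.

import numpy as np

# f(x) = 3x^4 - 4x^3 - 12x^2 + 9
f = np.poly1d([3, -4, -12, 0, 9])
fp = f.deriv()    # 12x^3 - 12x^2 - 24x
fpp = fp.deriv()  # 36x^2 - 24x - 24

for x_star in np.roots(fp.coeffs):
    kind = 'minimum' if fpp(x_star) > 0 else 'maximum'
    print(x_star, kind, f(x_star))
# -1 and 2 are local minima (f = 4 and -23); 0 is a local maximum (f = 9)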

The maximization of a function f(x) can be converted into the minimization of A − f(x), where A is usually a large positive number (though A = 0 will do). For example, we know the maximum of f(x) = e^{-x^2}, x ∈ (−∞, ∞), is 1 at x_* = 0. This problem can be converted to the minimization of −f(x). For this reason, optimization problems can be expressed as either minimization or maximization, depending on the context and the convenience of the formulation.
In fact, in the optimization literature, some books formulate all optimization problems in terms of maximization, whereas others write them in terms of minimization, though they are in essence dealing with the same problems.

1.3.2 Multivariate functions

We can extend the optimization procedure for univariate functions to multivariate functions using partial derivatives and relevant conditions. Let us start with the example

minimize f(x, y) = x^2 + y^2, \quad x, y \in R.   (1.31)

It is obvious that x = 0 and y = 0 is a minimum solution because f(0, 0) = 0. The question is how to solve this problem formally. We can extend the stationary condition to partial derivatives: \partial f/\partial x = 0 and \partial f/\partial y = 0. In this case, we have

\frac{\partial f}{\partial x} = 2x + 0 = 0, \quad \frac{\partial f}{\partial y} = 0 + 2y = 0.   (1.32)

The solution is obviously x_* = 0 and y_* = 0.
Now how do we know that it corresponds to a maximum or minimum? If we try to use the second derivatives, we have four different partial derivatives such as f_{xx} and f_{yy}, and which one should we use? In fact, we need to define the Hessian matrix from these second partial derivatives:

H = \begin{pmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{pmatrix} = \begin{pmatrix} \frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \partial y} \\ \frac{\partial^2 f}{\partial y \partial x} & \frac{\partial^2 f}{\partial y^2} \end{pmatrix}.   (1.33)

Since

\frac{\partial^2 f}{\partial x \partial y} = \frac{\partial^2 f}{\partial y \partial x},   (1.34)

we can conclude that the Hessian matrix is always symmetric. In the case of f(x, y) = x^2 + y^2, it is easy to check that the Hessian matrix is

H = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}.   (1.35)

Mathematically speaking, if H is positive definite, then the stationary point (x_*, y_*) corresponds to a local minimum. Similarly, if H is negative definite, then the stationary point corresponds to a maximum. The definiteness of a symmetric matrix is controlled by its eigenvalues. For this simple diagonal matrix H, its eigenvalues are its two diagonal entries 2 and 2. As both eigenvalues are positive, this matrix is positive definite. Since the Hessian matrix here does not involve any x or y, it is always positive definite in the whole search domain (x, y) ∈ R^2, so we can conclude that the solution at point (0, 0) is the global minimum.
Obviously, this is a particular case. In general, the Hessian matrix depends on the independent variables, but the definiteness test conditions still apply. That is, positive definiteness at a stationary point means a local minimum. Alternatively, for bivariate functions, we can define the determinant of the Hessian matrix in Eq. (1.33) as

\Delta = \det(H) = f_{xx} f_{yy} - (f_{xy})^2.   (1.36)

At the stationary point (x_*, y_*), if \Delta > 0 and f_{xx} > 0, then (x_*, y_*) is a local minimum. If \Delta > 0 but f_{xx} < 0, then it is a local maximum. If \Delta = 0, then the test is inconclusive, and we have to use other information such as higher-order derivatives. However, if \Delta < 0, then it is a saddle point. A saddle point is a special point where a local minimum occurs along one direction, whereas a maximum occurs along another (orthogonal) direction.

Example 5
To minimize f(x, y) = (x - 1)^2 + x^2 y^2, we have

\frac{\partial f}{\partial x} = 2(x - 1) + 2x y^2 = 0, \quad \frac{\partial f}{\partial y} = 0 + 2x^2 y = 0.   (1.37)

The second condition gives y = 0 or x = 0. Substituting y = 0 into the first condition, we have x = 1. However, x = 0 does not satisfy the first condition. Therefore, we have a solution x_* = 1 and y_* = 0.
For our example with f = (x - 1)^2 + x^2 y^2, we have

\frac{\partial^2 f}{\partial x^2} = 2y^2 + 2, \quad \frac{\partial^2 f}{\partial x \partial y} = 4xy, \quad \frac{\partial^2 f}{\partial y \partial x} = 4xy, \quad \frac{\partial^2 f}{\partial y^2} = 2x^2,   (1.38)

and thus we have

H = \begin{pmatrix} 2y^2 + 2 & 4xy \\ 4xy & 2x^2 \end{pmatrix}.   (1.39)

At the stationary point (x_*, y_*) = (1, 0), the Hessian matrix becomes

H = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix},

which is positive definite because its double eigenvalue 2 is positive. Alternatively, we have \Delta = 4 > 0 and f_{xx} = 2 > 0. Therefore, (1, 0) is a local minimum.
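The definiteness test in Example 5 is easy to check numerically by evaluating the Hessian at the stationary point and inspecting its eigenvalues, as in this NumPy sketch (offered only as an illustration):

import numpy as np

def hessian(x, y):
    # Hessian of f(x, y) = (x - 1)^2 + x^2 y^2, Eq. (1.39)
    return np.array([[2 * y**2 + 2, 4 * x * y],
                     [4 * x * y, 2 * x**2]])

H = hessian(1.0, 0.0)         # at the stationary point (1, 0)
print(np.linalg.eigvalsh(H))  # [2. 2.]: all positive, so a local minimum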

In fact, for a multivariate function f(x_1, x_2, \ldots, x_n) in an n-dimensional space, the stationary condition can be extended to

G = \nabla f = \Big(\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n}\Big)^T = 0,   (1.40)

where G is called the gradient vector. The second derivative test becomes the definiteness of the Hessian matrix

H = \begin{pmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{pmatrix}.   (1.41)

At a stationary point defined by G = \nabla f = 0, positive definiteness of H gives a local minimum, whereas negative definiteness corresponds to a local maximum. In essence, the eigenvalues of the Hessian matrix H determine the local behavior of the function. As we mentioned before, if H is positive semidefinite, then it corresponds to a local minimum.

1.4 Nonlinear constrained optimization

As most real-world problems are nonlinear, nonlinear mathematical programming forms an important part of mathematical optimization methods. A broad class of nonlinear programming problems is about the minimization or maximization of f(x) subject to no constraints, and another important class is the minimization of a quadratic objective function subject to nonlinear constraints. There are many other nonlinear programming problems as well.
Nonlinear programming problems are often classified according to the convexity of the defining functions. An interesting property of a convex function f is that the vanishing of the gradient, \nabla f(x_*) = 0, guarantees that the point x_* is a global minimum of f. We will introduce the concept of convexity in the next chapter. If a function is not convex or concave, then it is much more difficult to find its global minima or maxima.

1.4.1 Penalty method

For simple function optimization with equality and inequality constraints, a common method is the penalty method. For the optimization problem

minimize f(x), \quad x = (x_1, \ldots, x_n)^T \in R^n,
subject to \phi_i(x) = 0 \ (i = 1, \ldots, M), \quad \psi_j(x) \le 0 \ (j = 1, \ldots, N),   (1.42)

the idea is to define a penalty function so that the constrained problem is transformed into an unconstrained problem. Now we define

\Pi(x, \mu_i, \nu_j) = f(x) + \sum_{i=1}^{M} \mu_i \phi_i^2(x) + \sum_{j=1}^{N} \nu_j \max\{0, \psi_j(x)\}^2,   (1.43)

where \mu_i \gg 1 and \nu_j \ge 0.
For example, let us solve the following minimization problem:

minimize f(x) = 40(x - 1)^2, \ x \in R, \quad \text{subject to} \ g(x) = x - a \ge 0,   (1.44)

where a is a given value. Obviously, without this constraint, the minimum value occurs at x = 1 with f_min = 0. If a < 1, then the constraint will not affect the result. However, if a > 1, then the minimum should occur at the boundary x = a (which can be obtained by inspecting or visualizing the objective function and the constraint). Now we can define a penalty function \Pi(x) using a penalty parameter \mu \gg 1. We have

\Pi(x, \mu) = f(x) + \mu [g(x)]^2 = 40(x - 1)^2 + \mu (x - a)^2,   (1.45)

which converts the original constrained optimization problem into an unconstrained problem. From the stationarity condition \Pi'(x) = 0 we have

80(x - 1) - 2\mu(x - a) = 0, \quad \text{or} \quad x_* = \frac{40 - \mu a}{40 - \mu}.   (1.46)

For the particular case a = 1, we have x_* = 1, and the result does not depend on \mu. However, in the case of a > 1 (say, a = 5), the result will depend on \mu. When a = 5 and \mu = 100, we have x_* = (40 − 100 × 5)/(40 − 100) = 7.6667. If \mu = 1000, then this gives x_* = (40 − 1000 × 5)/(40 − 1000) = 5.1667. Both values are far from the exact solution x_true = a = 5. If we use \mu = 10^4, then we have x_* ≈ 5.0167. Similarly, for \mu = 10^5, we have x_* ≈ 5.00167. This clearly demonstrates that the solution in general depends on \mu. However, it is very difficult to use extremely large values of \mu without causing extra computational difficulties.
Ideally, the formulation using the penalty method should be properly designed so that the results will not depend on the penalty coefficient, or at least the dependence should be sufficiently weak.
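To see the dependence on the penalty parameter numerically, the sketch below minimizes a penalized objective for the problem in Eq. (1.44). Following the general form in Eq. (1.43), the inequality g(x) = x − a ≥ 0 is penalized only when violated, via max{0, a − x}^2; the use of SciPy's scalar minimizer and the specific values of μ are illustrative choices, not part of the original text.

from scipy.optimize import minimize_scalar

a = 5.0  # the constraint is x - a >= 0

def penalized(x, mu):
    # f(x) + mu * max{0, a - x}^2, cf. Eq. (1.43)
    return 40.0 * (x - 1.0)**2 + mu * max(0.0, a - x)**2

for mu in [1e2, 1e3, 1e4, 1e5]:
    res = minimize_scalar(lambda x: penalized(x, mu))
    print(mu, res.x)  # the minimizer approaches x* = a = 5 as mu grows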

1.4.2 Lagrange multipliers

Another powerful method without the limitation of using large \mu is the method of Lagrange multipliers. Suppose we want to minimize a function f(x):

minimize f(x), \quad x = (x_1, \ldots, x_n)^T \in R^n,   (1.47)

subject to the nonlinear equality constraint

h(x) = 0.   (1.48)

Then we can combine the objective function f(x) with the equality to form a new function, called the Lagrangian,

\Pi = f(x) + \lambda h(x),   (1.49)

where \lambda is the Lagrange multiplier, an unknown scalar to be determined. This again converts the constrained optimization into an unconstrained problem for \Pi(x), which is the beauty of this method. If we have M equalities

h_j(x) = 0 \quad (j = 1, \ldots, M),   (1.50)

then we need M Lagrange multipliers \lambda_j (j = 1, \ldots, M). We thus have

\Pi(x, \lambda_j) = f(x) + \sum_{j=1}^{M} \lambda_j h_j(x).   (1.51)

The requirement of stationary conditions leads to

\frac{\partial \Pi}{\partial x_i} = \frac{\partial f}{\partial x_i} + \sum_{j=1}^{M} \lambda_j \frac{\partial h_j}{\partial x_i} = 0 \ (i = 1, \ldots, n), \quad \frac{\partial \Pi}{\partial \lambda_j} = h_j = 0 \ (j = 1, \ldots, M).   (1.52)

These M + n equations determine the n components of x and the M Lagrange multipliers. As \partial \Pi / \partial h_j = \lambda_j, we can consider \lambda_j as the rate of change of \Pi as h_j varies.

Example 6
For the well-known monkey surface f(x, y) = x^3 - 3xy^2, the function does not have a unique maximum or minimum. In fact, the point x = y = 0 is a saddle point. However, if we impose an extra equality x - y^2 = 1, we can formulate an optimization problem as

minimize f(x, y) = x^3 - 3xy^2, \quad (x, y) \in R^2,

subject to

h(x, y) = x - y^2 = 1.

Now we can define

\Pi = f(x, y) + \lambda h(x, y) = x^3 - 3xy^2 + \lambda(x - y^2 - 1).

The stationary conditions become

\frac{\partial \Pi}{\partial x} = 3x^2 - 3y^2 + \lambda = 0, \quad \frac{\partial \Pi}{\partial y} = 0 - 6xy + (-2\lambda y) = 0, \quad \frac{\partial \Pi}{\partial \lambda} = x - y^2 - 1 = 0.

The second condition -6xy - 2\lambda y = -2y(3x + \lambda) = 0 implies that y = 0 or \lambda = -3x.
• If y = 0, then the third condition x - y^2 - 1 = 0 gives x = 1. The first condition 3x^2 - 3y^2 + \lambda = 0 then leads to \lambda = -3. Therefore, x = 1 and y = 0 is an optimal solution with f_min = 1. It is straightforward to verify that this solution corresponds to a minimum (not a maximum).
• If \lambda = -3x, then the first condition becomes 3x^2 - 3y^2 - 3x = 0. Substituting x = y^2 + 1 (from the third condition), we have

3(y^2 + 1)^2 - 3y^2 - 3(y^2 + 1) = 0, \quad \text{or} \quad 3y^4 = 0.

This gives y = 0 and thus x = 1 again, so no new solution arises in this case. Therefore, the optimality occurs at (1, 0) with f_min = 1.
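As a sanity check, the stationary conditions in Example 6 can be solved simultaneously with SymPy (again an illustrative tool choice, not part of the original text):

import sympy as sp

x, y, lam = sp.symbols('x y lambda', real=True)
Pi = x**3 - 3 * x * y**2 + lam * (x - y**2 - 1)  # the Lagrangian

stationary = [sp.diff(Pi, v) for v in (x, y, lam)]
print(sp.solve(stationary, [x, y, lam], dict=True))
# [{x: 1, y: 0, lambda: -3}], so the optimum is at (1, 0) with f(1, 0) = 1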

1.4.3 Karush–Kuhn–Tucker conditions

There is a counterpart of the Lagrange multipliers for nonlinear optimization with inequality constraints. The Karush–Kuhn–Tucker (KKT) conditions concern the requirements for a solution to be optimal in nonlinear programming [111].
Let us now focus on the nonlinear optimization problem

minimize f(x), \quad x \in R^n,
subject to \phi_i(x) = 0 \ (i = 1, \ldots, M), \quad \psi_j(x) \le 0 \ (j = 1, \ldots, N).   (1.53)

If all the functions are continuously differentiable at a local minimum x_*, then there exist constants \lambda_0, \lambda_1, \ldots, \lambda_N and \mu_1, \ldots, \mu_M such that

\lambda_0 \nabla f(x_*) + \sum_{i=1}^{M} \mu_i \nabla \phi_i(x_*) + \sum_{j=1}^{N} \lambda_j \nabla \psi_j(x_*) = 0,   (1.54)

\psi_j(x_*) \le 0, \quad \lambda_j \psi_j(x_*) = 0 \quad (j = 1, 2, \ldots, N),   (1.55)

where \lambda_j \ge 0 (j = 0, 1, \ldots, N). The constants satisfy \sum_{j=0}^{N} \lambda_j + \sum_{i=1}^{M} |\mu_i| \ge 0. This is essentially a generalized method of the Lagrange multipliers. However, there is a possibility of degeneracy when \lambda_0 = 0 under certain conditions.
It is worth pointing out that such KKT conditions can be useful to prove theorems and sometimes useful to gain insight into certain types of problems. However, they are not really helpful in practice in the sense that they do not give any indication where the optimal solutions may lie in the search domain so as to guide the search process.
Optimization problems, especially highly nonlinear multimodal problems, are usually difficult to solve. However, if we are mainly concerned about local optimal or suboptimal solutions (not necessarily global optimal solutions), there are relatively efficient methods such as interior-point methods, trust-region methods, the simplex method, sequential quadratic programming, and swarm intelligence-based methods [151]. All these methods have been implemented in a diverse range of software packages. Interested readers can refer to more advanced literature.

1.5 Notes on software


Though there are many different algorithms for optimization, most software packages and programming languages have some sort of optimization capability due to the popularity and relevance of optimization in many applications. For example, Wikipedia has some extensive lists of
• optimization software,1
• data mining and machine learning,2
• deep learning software.3
There is a huge list of software packages and internet resources; it requires a
lengthy book to cover most of it, which is not our intention here. Interested readers
can refer to them for more detail.

1 https://en.wikipedia.org/wiki/List_of_optimization_software.
2 https://en.wikipedia.org/wiki/Category:Data_mining_and_machine_learning_software.
3 https://en.wikipedia.org/wiki/Comparison_of_deep_learning_software.
2 Mathematical foundations

Contents
2.1 Convexity
2.1.1 Linear and affine functions
2.1.2 Convex functions
2.1.3 Mathematical operations on convex functions
2.2 Computational complexity
2.2.1 Time and space complexity
2.2.2 Complexity of algorithms
2.3 Norms and regularization
2.3.1 Norms
2.3.2 Regularization
2.4 Probability distributions
2.4.1 Random variables
2.4.2 Probability distributions
2.4.3 Conditional probability and Bayesian rule
2.4.4 Gaussian process
2.5 Bayesian network and Markov models
2.6 Monte Carlo sampling
2.6.1 Markov chain Monte Carlo
2.6.2 Metropolis–Hastings algorithm
2.6.3 Gibbs sampler
2.7 Entropy, cross entropy, and KL divergence
2.7.1 Entropy and cross entropy
2.7.2 KL divergence
2.8 Fuzzy rules
2.9 Data mining and machine learning
2.9.1 Data mining
2.9.2 Machine learning
2.10 Notes on software

Though the main requirement of this book is basic calculus, we will still briefly review
some basic concepts concerning functions and basic calculus and then introduce some
new concepts. The readers can skip this chapter if they are already familiar with such
topics.

2.1 Convexity

2.1.1 Linear and affine functions

Generally speaking, a function is a mapping from independent variables or inputs to a dependent variable or variables/outputs. For example, the function

f(x, y) = x^2 + y^2 + xy   (2.1)

depends on two independent variables. This function maps the domain R^2 (for −∞ < x < ∞ and −∞ < y < ∞) to f on the real axis as its range, so we use the notation f: R^2 → R to denote this.
In general, a function f(x, y, z, ...) maps n independent variables to m dependent variables, and we use the notation f: R^n → R^m to mean that the domain of the function is a subset of R^n, whereas its range is a subset of R^m. The domain of a function is sometimes denoted by dom(f) or dom f.
The inputs or independent variables can often be written as a vector. For simplicity, we often use a vector x = (x, y, z, ...)^T = (x_1, x_2, ..., x_n)^T for multiple variables. Therefore, f(x) is often used to mean f(x, y, z, ...) or f(x_1, x_2, ..., x_n).
A function L(x) is called linear if

L(x + y) = L(x) + L(y) \quad \text{and} \quad L(\alpha x) = \alpha L(x)   (2.2)

for any vectors x and y and any scalar α ∈ R.

Example 7
To see if f(x) = f(x_1, x_2) = 2x_1 + 3x_2 is linear, we use

f(x_1 + y_1, x_2 + y_2) = 2(x_1 + y_1) + 3(x_2 + y_2) = 2x_1 + 2y_1 + 3x_2 + 3y_2
= [2x_1 + 3x_2] + [2y_1 + 3y_2] = f(x_1, x_2) + f(y_1, y_2).

In addition, for any scalar α, we have

f(\alpha x_1, \alpha x_2) = 2\alpha x_1 + 3\alpha x_2 = \alpha [2x_1 + 3x_2] = \alpha f(x_1, x_2).

Therefore, this function is indeed linear. This function can also be written in the vector form

f(x) = \begin{pmatrix} 2 & 3 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = a \cdot x = a^T x,

where a · x = a^T x is the inner product of a = (2, 3)^T and x = (x_1, x_2)^T.

In general, functions can have multiple components, written as a vector F [22]. A function F is called affine if there exists a linear function L and a constant vector b such that F = L(x) + b. In general, an affine function is a linear function with translation, which can be written in the matrix form F = Ax + b, where A is an m × n matrix, and b is a column vector in R^m.
Knowing the properties of a function can be useful for finding its maximum or minimum. In fact, in mathematical optimization, nonlinear problems are often classified according to the convexity of the defining function(s). Geometrically speaking, an object is convex if for any two points within the object, every point on the straight line segment joining them is also within the object. Examples are a solid ball, a cube, and a pyramid. Obviously, a hollow object is not convex.
Mathematically speaking, a set S ∈ R^n in a real vector space is called a convex set if

\theta x + (1 - \theta) y \in S, \quad \forall (x, y) \in S, \quad \theta \in [0, 1].   (2.3)

Thus, an affine set is always convex, but a convex set is not necessarily affine.

2.1.2 Convex functions

A function f(x) defined on a convex set Ω is called convex if

f(\alpha x + \beta y) \le \alpha f(x) + \beta f(y), \quad \forall x, y \in \Omega,   (2.4)

where

\alpha \ge 0, \quad \beta \ge 0, \quad \alpha + \beta = 1.   (2.5)

Figure 2.1 Convex functions.

Some examples of convex functions are shown in Fig. 2.1.

Example 8
For example, the convexity of f(x) = x^2 − 1 requires

(αx + βy)^2 − 1 ≤ α(x^2 − 1) + β(y^2 − 1),  ∀x, y ∈ Ω,

where α, β ≥ 0 and α + β = 1. This is equivalent to

αx^2 + βy^2 − (αx + βy)^2 ≥ 0,

where we have used α + β = 1. We now have

αx^2 + βy^2 − α^2 x^2 − 2αβxy − β^2 y^2 = α(1 − α)(x − y)^2 = αβ(x − y)^2 ≥ 0,

which is always true because α, β ≥ 0 and (x − y)^2 ≥ 0. Therefore, f(x) = x^2 − 1 is convex for
all x ∈ R.
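The defining inequality (2.4) can also be spot-checked numerically; the following minimal sketch samples random points and weights for f(x) = x^2 − 1 (a sanity check, of course, not a proof):

```python
import numpy as np

def f(x):
    return x**2 - 1

rng = np.random.default_rng(42)
for _ in range(10000):
    x, y = rng.uniform(-10, 10, size=2)
    alpha = rng.uniform(0.0, 1.0)
    beta = 1.0 - alpha
    # Convexity (2.4): f(alpha*x + beta*y) <= alpha*f(x) + beta*f(y)
    assert f(alpha * x + beta * y) <= alpha * f(x) + beta * f(y) + 1e-9

print("No counterexample found, consistent with convexity.")
```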

A function f(x) on Ω is concave if and only if g(x) = −f(x) is convex. An
interesting property of a convex function f is that the vanishing of the gradient
df/dx|_{x∗} = 0 guarantees that the point x∗ is the global minimum of f. Similarly,
for a concave function, any local maximum is also the global maximum. If a function
is not convex or concave, then it is much more difficult to find its global minimum or
maximum.

2.1.3 Mathematical operations on convex functions


There are some important mathematical operations that still preserve the convexity:
nonnegative weighted sum, composition using affine functions, and maximization or
minimization. For example, if f is convex, then βf is also convex for β ≥ 0. The
nonnegative sum αf1 + βf2 is convex if f1 , f2 are convex and α, β ≥ 0.
The composition using an affine function also holds. For example, f (Ax + b) is
convex if f is convex. In addition, if f1 , f2 , . . . , fn are convex, then the maximum of
all these functions, max{f_1, f_2, . . . , f_n}, is also convex. Similarly, the piecewise-linear
function max_{i=1}^{n}(A_i x + b_i) is also convex.
If both f and g are convex, then ψ(x) = f(g(x)) is also convex, provided that f is
nondecreasing. For example, exp[f(x)] is convex if f(x) is convex. This
can be extended to the vector composition, and most interestingly, the log-sum-exp
function

f(x) = \log \sum_{k=1}^{n} e^{x_k},  (2.6)

is convex. For a more comprehensive introduction to convex functions, we refer the
readers to more advanced literature such as the book by Boyd and Vandenberghe [22].
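In computation, the log-sum-exp function (2.6) is usually evaluated by shifting out the maximum to avoid overflow; the short sketch below uses this standard stabilization trick (an added illustration, not discussed in the text above):

```python
import numpy as np

def log_sum_exp(x):
    """Stable evaluation of log(sum_k exp(x_k)) as in Eq. (2.6)."""
    x = np.asarray(x, dtype=float)
    m = x.max()
    # log sum_k e^{x_k} = m + log sum_k e^{x_k - m}, and x_k - m <= 0
    return m + np.log(np.exp(x - m).sum())

print(log_sum_exp([1000.0, 1000.5]))  # finite, while exp(1000) alone overflows
```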

2.2 Computational complexity


In the description of algorithmic complexity, we often have to use the order notations,
often in terms of big O and small o. Loosely speaking, for two functions f (x) and
g(x), if
\lim_{x → x_0} \frac{f(x)}{g(x)} → K,  (2.7)

where K is a finite, nonzero limit, we write

f = O(g). (2.8)

The big O notation means that f is asymptotically of the same order as g(x). If
the limit is unity, or K = 1, then we say that f(x) is asymptotically equivalent to g(x).
In this particular case, we write

f ∼ g, (2.9)

which is equivalent to f/g → 1 and g/f → 1 as x → x0 . Obviously, x0 can be any


value, including 0 and ∞. The notation ∼ does not necessarily mean ≈ in general,
though it may give the same results, especially in the case where x → 0. For example,
sin x ∼ x and sin x ≈ x as x → 0.
When we say f is order of 100 (or f ∼ 100), this does not mean f ≈ 100 but rather
that f can be between about 50 and 150. The small o notation is often used if the limit
tends to 0, that is,
\lim_{x → x_0} \frac{f}{g} → 0,  (2.10)
or

f = o(g). (2.11)

If g > 0, then f = o(g) is equivalent to f ≪ g (that is, f is much less than g).

Example 9
For example, for all x ∈ R, we have

e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + · · · + \frac{x^n}{n!} + · · · ,  (2.12)

which can be written as

e^x ≈ 1 + x + O(x^2) ≈ 1 + x + \frac{x^2}{2} + o(x),  (2.13)
depending on the accuracy of the approximation of interest.

It is worth pointing out that the expressions in computational complexity are mostly
concerned with functions f(n) of the input problem size n, where n ∈ N is an integer
in the set of natural numbers N = {1, 2, 3, . . .}.
For example, for the functions f(n) = 10n^2 + 20n + 100 and g(n) = 5n^2, we have

f(n) = O(g(n))  (2.14)

for every sufficiently large n. When n is sufficiently large, n^2 is much larger than n
(i.e., n^2 ≫ n), so the n^2 terms dominate both expressions. To emphasize the input n,
we can often write

f(n) = O(g(n)) = O(n^2).  (2.15)

In addition, f(n) is in general a polynomial of n in a loose sense: it not only includes
terms such as n^3 and n^2, but may also include terms such as n^{2.5} or log(n). Therefore,
f(n) = 100n^3 + 20n^{2.5} + 25n log(n) + 123n is a valid polynomial in the context of
computational complexity. In this case, we have

f(n) = 100n^3 + 20n^{2.5} + 25n \log(n) + 123n = O(n^3).  (2.16)

Here, we always implicitly assume that n is sufficiently large and the base of the
logarithm is 2.
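To see the dominance of the leading term numerically, one can evaluate the ratio f(n)/n^3 for growing n; this short sketch (added for illustration) shows the ratio approaching the constant 100, consistent with f(n) = O(n^3):

```python
import math

def f(n):
    return 100 * n**3 + 20 * n**2.5 + 25 * n * math.log2(n) + 123 * n

for n in (10, 1000, 1000000):
    print(n, f(n) / n**3)
# The ratio approaches 100 as n grows, so f(n) = O(n^3).
```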
To measure how easy or hard a problem is to solve, we need to estimate its
computational complexity. We cannot simply ask how long it takes to solve a particular
problem instance because the actual computational time depends on both hardware
and software used to solve it. Thus, time does not make much sense in this context.
A useful measure of complexity should be independent of the hardware and software
used. However, such complexity is closely linked to the algorithms used.

2.2.1 Time and space complexity


To find the maximum (or minimum) among n different numbers, we only need to go
through each number once by simply comparing the current number with the highest
(or lowest) number once and update the new highest (or lowest) when necessary. Thus,
the number of mathematical operations is simply O(n), which is the time complexity
of this problem.
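As a concrete sketch of this single-pass O(n) scan (added here for illustration):

```python
def find_max(numbers):
    """Single pass over n numbers: O(n) time, O(n) space for the stored input."""
    largest = numbers[0]
    for x in numbers[1:]:   # each number is compared with the current maximum once
        if x > largest:
            largest = x     # update the current maximum when necessary
    return largest

print(find_max([3, 17, 5, 42, 8]))  # 42
```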
In practice, comparing two big numbers may take slightly longer, and different
representations of numbers can also affect the speed of this comparison. In addition,
multiplication and division usually take more time than simple addition and subtrac-
tion. However, in computational complexity, we usually ignore such minor differences
and simply treat all operations as equal. In this sense, the complexity is about the num-
ber or order of mathematical operations, not the actual order of computational time.
On the other hand, space computational complexity estimates the size of com-
puter memory needed to solve the problem. In the previous simple problem of finding
the maximum or minimum among n different numbers, the memory needed is O(n)
because it needs to store n different numbers at n different entries in the computer
memory. Though we need one more entry to store the largest or smallest number, this
minor change does not affect the order of complexity because we implicitly assume
that n is sufficiently large [6,58].
In most literature, if neither time nor space is explicitly mentioned when talking about
computational complexity, it usually means time complexity. In discussing computa-
tional complexity, we often use the word “problem” to mean a class of problems of
the same type and an “instance” to mean a specific example of a problem class. Thus,
Ax = b is a problem (class) for linear algebra, whereas

  
\begin{pmatrix} 2 & 3 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 8 \\ 3 \end{pmatrix}  (2.17)

is an instance. In addition, a decision problem is a yes–no problem whose output is
binary (0 or 1), even though the inputs can be any values.
The computational complexity is closely linked to the type of problems. For the
same type of problems, different algorithms can be used, and the number of basic
mathematical operations may be different. In this case, we are concerned with the
complexity of an algorithm in terms of arithmetic complexity.

2.2.2 Complexity of algorithms


The computational complexity discussed up to now has focused on the problems, and
the algorithms are mainly described simply in terms of polynomial or exponential
time. From the perspective of algorithm development and analysis, different algo-
rithms will have different complexity even for the same type of problems. In this case,
we have to estimate the arithmetic complexity of an algorithm or simply algorithmic
complexity.
For example, to sort n different numbers from the smallest to the largest, we can
use different algorithms. The selection sort uses two nested loops over the n numbers
and thus has an algorithmic complexity of O(n^2), whereas quicksort (or partition-
and-exchange sort) has an average complexity of O(n log n). There are many different
sorting algorithms with different complexities.
It is worth pointing out that the algorithmic complexity here is mainly about time
complexity because the space (memory) complexity is less important. In this case, the
space algorithmic complexity is O(n).
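For illustration (a minimal sketch added here), a selection sort makes the two nested loops behind the O(n^2) estimate explicit:

```python
def selection_sort(a):
    """Sort n numbers from smallest to largest: two nested loops, O(n^2) time."""
    n = len(a)
    for i in range(n - 1):          # outer loop: position to fill
        k = i
        for j in range(i + 1, n):   # inner loop: find the smallest remaining number
            if a[j] < a[k]:
                k = j
        a[i], a[k] = a[k], a[i]     # swap it into place
    return a

print(selection_sort([5, 2, 9, 1, 7]))  # [1, 2, 5, 7, 9]
```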

Example 10
The multiplication of two n × n matrices A and B using the simple matrix multiplication rule has
a complexity of O(n^3). There are n rows and n columns in each matrix, and their product C
has n × n entries. To get each entry, we need to carry out the multiplication of a row of A by a
corresponding column of B and calculate their sum, and thus the complexity is O(n). As there
are n × n = n^2 entries, the overall complexity is O(n^2) × O(n) = O(n^3).
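A direct implementation of this rule (a sketch; practical libraries use faster blocked or Strassen-type schemes) exposes the three nested loops behind the O(n^3) count:

```python
def matmul(A, B):
    """Naive product of two n x n matrices: three nested loops, O(n^3) operations."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):             # n^2 entries of C ...
        for j in range(n):
            for k in range(n):     # ... each needing an O(n) row-by-column sum
                C[i][j] += A[i][k] * B[k][j]
    return C

print(matmul([[2, 3], [1, 1]], [[1, 0], [0, 1]]))  # [[2.0, 3.0], [1.0, 1.0]]
```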

In the rest of this book, we analyze different algorithms; the complexity to be given
is usually the arithmetic complexity of an algorithm under discussion.

2.3 Norms and regularization

2.3.1 Norms
In general, a vector in an n-dimensional space (n ≥ 1) can be written as a column
vector
x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = (x_1, x_2, . . . , x_n)^T  (2.18)

or a row vector

x = (x_1  x_2  · · ·  x_n).  (2.19)

A simple transpose (T) can convert a column vector into its corresponding row vector.
The length of x can be written as

||x|| = \sqrt{x_1^2 + x_2^2 + · · · + x_n^2},  (2.20)

which is the Euclidean norm.


The addition or subtraction of two vectors u and v is the addition or subtraction
of their corresponding components, that is,

u ± v = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix} ± \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix} = \begin{pmatrix} u_1 ± v_1 \\ u_2 ± v_2 \\ \vdots \\ u_n ± v_n \end{pmatrix}.  (2.21)

The dot product, also called the inner product, of two vectors u and v is defined as


u^T v ≡ u · v = \sum_{i=1}^{n} u_i v_i = u_1 v_1 + u_2 v_2 + · · · + u_n v_n.  (2.22)

For an n-dimensional vector x, we can define the p-norm or L_p-norm as

||x||_p ≡ \big( |x_1|^p + |x_2|^p + · · · + |x_n|^p \big)^{1/p} = \Big( \sum_{i=1}^{n} |x_i|^p \Big)^{1/p},  p > 0.  (2.23)

Obviously, the Cartesian norm or length is the L_2-norm

||x||_2 = \sqrt{|x_1|^2 + |x_2|^2 + · · · + |x_n|^2} = \sqrt{x_1^2 + x_2^2 + · · · + x_n^2}.  (2.24)

The three most widely used norms are those for p = 1, 2, and ∞ [160]. When p = 2, it
becomes the Cartesian L_2-norm discussed above. When p = 1, the L_1-norm is given by

||x||_1 = |x_1| + |x_2| + · · · + |x_n|.  (2.25)

For p = ∞, it becomes

||x||_∞ = max{|x_1|, |x_2|, . . . , |x_n|} = x_max,  (2.26)

which is the largest absolute component of x. This is because



||x||_∞ = \lim_{p→∞} \Big( \sum_{i=1}^{n} |x_i|^p \Big)^{1/p} = \lim_{p→∞} \Big( |x_max|^p \sum_{i=1}^{n} \Big| \frac{x_i}{x_max} \Big|^p \Big)^{1/p}

= x_max \lim_{p→∞} \Big( \sum_{i=1}^{n} \Big| \frac{x_i}{x_max} \Big|^p \Big)^{1/p} = x_max,  (2.27)

where we have used the fact that |x_i/x_max| < 1 (except for one component, say, |x_k| =
x_max), so that lim_{p→∞} |x_i/x_max|^p → 0 for all i ≠ k. Thus, the sum of all the ratio
terms tends to 1, that is,

\lim_{p→∞} \Big( \sum_{i=1}^{n} \Big| \frac{x_i}{x_max} \Big|^p \Big)^{1/p} = 1.  (2.28)

In general, for any two vectors u and v in the same space, we have the inequality

||u||_p + ||v||_p ≥ ||u + v||_p,  p ≥ 1.  (2.29)

Example 11
For two vectors u = [1 2 3]^T and v = [1 −2 −1]^T, we have

u^T v = 1 × 1 + 2 × (−2) + 3 × (−1) = −6,

||u||_1 = |1| + |2| + |3| = 6,  ||v||_1 = |1| + |−2| + |−1| = 4,

||u||_2 = \sqrt{1^2 + 2^2 + 3^2} = \sqrt{14},  ||v||_2 = \sqrt{1^2 + (−2)^2 + (−1)^2} = \sqrt{6},

||u||_∞ = max{|1|, |2|, |3|} = 3,  ||v||_∞ = max{|1|, |−2|, |−1|} = 2,

and

w = u + v = [1 + 1  2 + (−2)  3 + (−1)]^T = [2 0 2]^T

with norms

||w||_1 = |2| + |0| + |2| = 4,  ||w||_∞ = max{|2|, |0|, |2|} = 2,

||w||_2 = \sqrt{2^2 + 0^2 + 2^2} = \sqrt{8}.

Figure 2.2 Different p-norms for p = 1, 2, and ∞ (left) as well as p = 1/2 and p = 4 (right).

Using these values, it is straightforward to verify that

||u||p + ||v||p ≥ ||u + v||p (p = 1, 2, ∞).
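These values are easy to reproduce numerically, for instance with numpy.linalg.norm (a check added for illustration):

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([1.0, -2.0, -1.0])
w = u + v

for p in (1, 2, np.inf):
    lhs = np.linalg.norm(u, p) + np.linalg.norm(v, p)
    rhs = np.linalg.norm(w, p)
    print(p, lhs, rhs, lhs >= rhs)  # inequality (2.29) holds in each case
```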

In the particular case of two-dimensional (2D) vectors, different norms L_p =
(|x|^p + |y|^p)^{1/p} with different values of p are shown in Fig. 2.2.

2.3.2 Regularization
In many applications such as curve-fitting and machine learning, overfitting can be a
serious issue, and one way to avoid overfitting is using regularization. Loosely speak-
ing, regularization is using some penalty term added to the objective or loss function so
as to constrain certain model parameters. For example, in the method of least-squares
and many learning algorithms, the objective is to minimize the loss function L(x),
which represents the errors between data labels yi and the predictions fi = f (xi ) for
m data points (xi , yi ), i = 1, 2, . . . , m, that is,


L(x) = \sum_{i=1}^{m} \big[ y_i − f(x_i) \big]^2,  (2.30)

which is the squared L_2-norm of the errors E_i = y_i − f_i. The model prediction f(x, w) usually
has many model parameters, such as w = (w_1, w_2, . . . , w_K) for simple polynomial
curve-fitting. In general, a prediction model can have K different model parameters;
overfitting can occur if the model becomes too complex with too many parameters and
the oscillations of the fitted curve become significant. Thus, a penalty term in terms of some norm of the
model parameters is usually added to the loss function. For example, the well-known
Tikhonov regularization uses the L2 -norm, and we have

minimize  \sum_{i=1}^{m} \big[ y_i − f(x_i, w) \big]^2 + λ||w||_2,  (2.31)

where λ > 0 is the penalty parameter. Obviously, other norms can be used. For exam-
ple, in the Lasso method, the regularization uses the L_1-norm, which gives

minimize  \frac{1}{m} \sum_{i=1}^{m} \big[ y_i − f(x_i, w) \big]^2 + λ||w||_1.  (2.32)

We will introduce both the method of least squares and the Lasso method in later chapters.
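As a rough illustration of the regularized objective (2.31), the following sketch assumes a linear model f(x_i, w) = x_i^T w and the squared penalty λ||w||_2^2 (both assumptions made here for concreteness, since the text keeps f general), for which the minimizer has the standard closed form:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Minimize the sum of squared errors plus lam * ||w||_2^2 for a linear model.
    Closed form: w = (X^T X + lam * I)^{-1} X^T y."""
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

print(ridge_fit(X, y, lam=0.0))   # ordinary least squares
print(ridge_fit(X, y, lam=10.0))  # larger penalty shrinks the parameters
```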

2.4 Probability distributions

2.4.1 Random variables


For a discrete random variable X with distinct values such as the number of cars
passing through a junction, each value xi may occur with certain probability p(xi ).
In other words, the probability varies and is associated with the corresponding ran-
dom variable. Traditionally, an uppercase letter such as X is used to denote a random
variable, whereas a lowercase letter such as xi represents its values. For example, if
X means a coin-flipping event, then xi = 0 (tail) or 1 (head). A probability function
p(xi ) is a function that assigns probabilities to all the discrete values xi of the random
variable X.
As an event must occur inside a sample space, all the probabilities must sum to
one, which leads to

\sum_{i=1}^{n} p(x_i) = 1.  (2.33)

For example, the outcomes of tossing a fair coin form a sample space. The outcome
of a head (H) is an event with probability P (H ) = 1/2, and the outcome of a tail (T)
is also an event with probability P (T ) = 1/2. The sum of both probabilities should be
one, that is,

1 1
P (H ) + P (T ) = + = 1. (2.34)
2 2
The cumulative probability function of X is defined by

P(X ≤ x) = \sum_{x_i ≤ x} p(x_i).  (2.35)

Two main measures for a random variable X with given probability distribution
p(x) are its mean and variance. The mean μ, or the expectation E[X], is defined by

μ ≡ E[X] ≡ ⟨X⟩ = \int x p(x) dx  (2.36)

for a continuous distribution, where the integration is taken over the domain of the
distribution. If the random variable is discrete, then the integration becomes the
weighted sum

E[X] = \sum_i x_i p(x_i).  (2.37)

The variance var[X] = σ^2 is the expectation value of the squared deviation, that is,
E[(X − μ)^2]. We have

σ^2 ≡ var[X] = E[(X − μ)^2] = \int (x − μ)^2 p(x) dx.  (2.38)


The square root of the variance, σ = \sqrt{var[X]}, is called the standard deviation,
which is simply σ.
The above definition of mean μ = E[X] is essentially the first moment if we define
the kth moment of a random variable X (with a probability density distribution p(x))
by

μ_k ≡ E[X^k] = \int x^k p(x) dx  (k = 1, 2, 3, . . .).  (2.39)

Similarly, we can define the kth central moment by

ν_k ≡ E[(X − E[X])^k] ≡ E[(X − μ)^k] = \int (x − μ)^k p(x) dx  (k = 0, 1, 2, 3, . . .),  (2.40)

where μ is the mean (the first moment). Thus, the zeroth central moment (k = 0) is
the total probability, which gives ν_0 = 1. The first central moment is ν_1 = 0.
The second central moment ν_2 is the variance σ^2, that is, ν_2 = σ^2.
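A quick numerical check of these central moments (a sample-based sketch, so the values are only approximate):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=3.0, size=1000000)  # samples with mu = 2, sigma = 3

mu = x.mean()
for k in (0, 1, 2):
    nu_k = np.mean((x - mu) ** k)  # sample estimate of the kth central moment
    print(k, nu_k)
# Approximately: nu_0 = 1, nu_1 = 0, nu_2 = sigma^2 = 9
```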

2.4.2 Probability distributions


There are a number of other important distributions such as the normal distribution,
Poisson distribution, exponential distribution, binomial distribution, Cauchy distribu-
tion, Lévy distribution, and Student t-distribution.
A Bernoulli distribution is a distribution of outcomes of a binary random variable
X where the random variable can only take two values, either 1 (success or yes) or 0
(failure or no). The probability of taking 1 is 0 ≤ p ≤ 1, whereas the probability of
taking 0 is q = 1 − p. Then, the probability mass function can be written as

B(m, p) = \begin{cases} p, & if m = 1, \\ 1 − p, & if m = 0, \end{cases}  (2.41)

which can be written more compactly as

B(m, p) = p^m (1 − p)^{1−m},  m ∈ {0, 1}.  (2.42)
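The compact form (2.42) translates directly into code (a minimal sketch):

```python
import numpy as np

def bernoulli_pmf(m, p):
    """B(m, p) = p^m (1 - p)^(1 - m) for m in {0, 1}, as in Eq. (2.42)."""
    return p**m * (1 - p) ** (1 - m)

p = 0.3
print(bernoulli_pmf(1, p), bernoulli_pmf(0, p))  # 0.3 0.7

# The empirical mean of many draws approaches p
samples = np.random.default_rng(2).random(100000) < p
print(samples.mean())  # close to 0.3
```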


Other documents randomly have
different content
heather; thus the noon of the next was far advanced before he set
out once more.
Malise MacKim, his sullen acquaintance of the preceding evening,
conducted him for some distance beyond the Urr, and told him, what
Gray already knew well, that if he wished to reach the clachan of
Tongland, he must pass the Loch of Carlinwark on his right, and
pursue the road that lay through the wood on the left bank of the
Dee.
"And whither go you, my friend?" he asked, as the gigantic smith was
about to leave him.
"To join my seven sons, and scheme our vengeance; yet what can
mortal vengeance avail against the earl of Douglas?"——"How?" said
Gray; "in what manner?"
"Know you not that he wears a warlock jacket, against which the
sharpest swords are pointless?"
"What do you mean?" asked the soldier, keeping his horse in check.
"I mean a doublet made for him by a warlock in Glenkens, woven of
the skins of water-snakes caught in a south-running burn where
three lairds' lands met, and woven for him under the beams of a
March moon, on the haunted Moat of Urr."
Gray laughed and said, "I should like to test this dagger, my poor
MacLellan's gift, upon that same doublet."
"Moreover," said the smith, lowering his voice, while a deeper scowl
impressed his grisly visage, "it is said in Galloway here, that when
Earl James, a child, was held by his godmother at the font in
Tongland Abbey Kirk, the blessed water, as it fell from the hand of
Abbot John, hissed upon his little face as upon iron in a white heat."
"Peace, carle! can a stout fellow like thee be moonling enough to give
such stories credence?"
"'Tis folly, perhaps, to think of them, betouch us, too! so near the
Moat of Urr," said the smith, with a perceptible shudder, as he
glanced covertly over his shoulder.
"And why here more than elsewhere?"
"Know ye not?" asked the smith, in a whisper.
"You forget that I am a stranger."
"True. Then it was on this spot that James Achanna, the earl's sooth
fast-friend and henchman, sold himself to Satan, after conjuring him
up by performing some nameless rites of hell."
"Adieu, and God be wi' you," said Gray, laughing, yet nevertheless
making the sign of the cross, for the place was savage and solitary,
and he was not without a due share of the superstition incident to his
age and country. Turning his horse, he rode rapidly off.
As he did so, a cunning smile passed over the swarthy face of Malise
MacKim, who swung his mace round his head as if he were about to
brain an enemy.
The day was far advanced, when, at Kelton, Gray crossed the Dee by
a flat-bottomed boat, near a place where a group of peasants were
assembled under a gallows-tree. Thereon hung a man, and there, by
paying a fee to the doomster of Thrave, persons afflicted by wens, or
similar excrescences, came to obtain the benefit of the deid-strake—a
touch of the dead hand being deemed a certain cure.
When Gray saw the poor corse swinging in the wind, he remembered
the fate of Sir Herbert Maxwell, and reflected how easily Douglas
might release Murielle from her marriage-ties by putting him to
death, as he had done that powerful baron; yet his heart never
trembled, nor did he swerve from his resolution of attempting to save
MacLellan, in spite of every danger.
CHAPTER XLVI.
AN UNEXPECTED GUIDE.

And a good evening to my friend Don Carlos.


Some lucky star has led my steps this way:
I was in search of you.—The Spanish Student.
Before Gray crossed the Dee at Kelton, there came over the scenery
a dense white mist, which rolled like smoke along the hills, and hung
in dewdrops on his horse's mane and bridle, dimming the brightness
of his armour and the embroidery of his surcoat. In this obscurity he
lost his way amid the waste muirlands which the road, a mere bridle-
path destitute of wall or fence, traversed. Then a sharp shower of
hail fell, the stones rattling on his steel trappings as on a latticed
window; and through the openings in the haze, the far-stretching
dells and pastoral hills of Galloway seemed wet and grey and dreary.
The country was singularly desolate; he met no person to direct him;
thus, amid the obscurity of the mist and the approaching evening, he
knew not where he was, but continued to ride slowly and vaguely on.
Anon a breeze came, and the grey clouds began to disperse; the hail
ceased, and the haze rolled away like finely-carded wool along the
sides of the hills. The setting sun of August beamed forth in his
farewell splendour, the mavis and merle chorused merrily in the
sauch and hawthorn trees; for a time the hill-tops became green, and
the high corn that waved on the upland slopes seemed to brighten
with the partial heat and moisture. After a time, Sir Patrick found that
he had penetrated to the border of Glenkens, then the wildest and
most savage part of Galloway. Wheeling round his horse, he rode fast
in the direction from whence he had come, and just as the sun's
broad disc began to dip behind the grassy hills, and to shed its warm
light upon the windings of the Dee, from an eminence he could see
afar off the vast square keep of Thrave looming black and sombre,
with the dusky smoke ascending from its great chimney-stalks into
the calm sky in steady columns, unbroken by the breeze.
Soft was the evening light, and softer now the air, and no sound but
the occasional lowing of the black cattle, or those nameless country
noises which seem to come from afar, broke the stillness of the vast
pastoral landscape.
The Dee has all the characteristics of a Scottish stream: now gliding
stealthily and sullenly through deep pools and dark rocky chasms,
where the wiry pine, the crisp-leaved oak, and the feathery silver
birch cast their shadows on the darting trout or the lurking salmon;
now chafing and brawling in white foam over a precipitous ledge of
red sandstone, then gurgling down a bed of "unnumbered pebbles;"
and now sweeping broad and stilly past a thatched clachan, a baron's
moated tower, a ruined chapel, where bells were rung and masses
said when Alan was lord of Galloway and constable of Scotland; then
round some statelier fane like Tongland, or a vast feudal strength like
Thrave of the Douglases.
After seeing the latter, Gray rode slowly and thoughtfully, for it
brought the face, the form, the voice, the smile, and all the image of
Murielle more vividly before him. The scenery, the place, the very air,
seemed full of the presence of her, his loved and lost one.
And now the moon arose, but not brilliantly; it shot fitful gleams
between weird masses of flying cloud, with a pale and ghastly effect
which made the gnarled trunks of the old trees seem like spectres or
fantastic figures. Erelong, Gray entered a long and narrow glen,
clothed on each side by a thick fir forest, where the density of the
wiry foliage was such that the darkness became quite opaque.
Here he suddenly found himself joined by a horseman, who came
either from the wayside thicket or out of the ground—it might have
been either, so unexpected was his appearance. A gleam of the
moon, as it came down a ravine, showed that this man and his horse
were of great strength and stature. He wore a hunting suit, with a
sword, bugle, and small steel cap which glittered in the moonlight.
"Under favour, I presume, we may travel together?" said he, bluntly.
"Provided the road be broad enough," replied Gray in the same
manner, for there was something in this man's voice which strangely
affected him, causing his hair to bristle up, his pulses to quicken, and
the almost obliterated wounds on his face to smart.
Whence was this emotion? Where had he heard that voice before?
Where seen that grim and sturdy figure? Each looked from time to
time at the other, and seemed anxious to make out who or what he
was.
"Go you far this way?" asked the stranger.
"No," replied Gray, curtly.
"May I ask how far?"——"I am bound for Thrave."
"Indeed," said the other, looking fixedly at Gray, as they walked their
horses side by side; "have you made a long journey?"——"From
Edinburgh," replied Gray, briefly.
"You are a bold man to pass through the Johnstones of Annandale
and the Borderland at this time."
"How bolder at this time than at any other?"
"That you may soon learn," replied the stranger, laughing at Gray's
tone of displeasure.
"I am Sir Patrick Gray of Foulis, captain of the king's guard, and am
bound to ride wherever he may order, and woe to those who dare
obstruct me," said Gray, peering forward to discern the speaker, who
started visibly at this reply, and after the silence of a moment said, "I
too am bound for Thrave. For two days past I have been abroad
hunting, but have missed or outridden my friends. Well, what may
the news be from the good town of Edinburgh? and how fare the
king, his carpet knights, and cock-sparrow courtiers, eh?"
"Were I not riding on the king's errand, which makes my life more
precious than if I were riding on my own, I would find you a more
fitting reply than words," said Gray, who could scarcely repress his
rising wrath, for the tone of the other chafed him.
"You have chosen a perilous time, assuredly, to enter Galloway on the
king's service," observed the stranger, loftily; "and if my words
displease, I can give full reparation when your errand is sped."
"'Tis enough, sir," said Gray, hoarsely; "on the morrow I shall have
sure vengeance. No man shall slight the king in my presence, and
live."
"By my sooth, his last messenger—the Rothsay herald, who came
hither anent the laird of Teregles—left Galloway faster than he
entered it. We are about to teach this James Stuart, that the realm of
Scotland was not made for his especial use. What? after fighting for
centuries, and defeating Romans, Saxons, Danes, and Normans—in
short, all the invaders of England—we are now to tremble before this
boy-king and his little Gueldrian wife? Has he forgotten how his
father died?"
"How—what mean you?" asked Gray, making a vigorous effort to
control his passion.
"In the Black Friary at Perth," said the other, grinding his teeth, "with
the swords of Grahame and Athole clashing in his heart."
"Be assured our king has not forgotten it—but hush—be wary."
"And wherefore hush?" was the fierce response.
"Because they who did that deed, the most foul murder of God's
anointed king, perished miserably on the scaffold more than twenty
years ago. Their ashes have long since mingled with the earth, for
fire consumed and the wind of heaven scattered them; but their
names exist in the execration of all good men and true."
"Hard words," said the other, scoffingly; "hard words, sir, for us, who
are on the eve, perhaps, of a most just rebellion against his son, if he
and his Flemish princess, with that old fox, Crichton, push us too
severely; and then I think his dainty Falkland knights, and well-fed
Lothian infantry, may find it perilous work to march through Nithsdale
and penetrate among the wild hills of Galloway."
Gray did not answer; they had now emerged from the wood, and the
Loch of Carlinwark was shining like a mirror in the full splendour of
the moonlight. At some distance he could discern the three old thorn-
trees, where, on a similarly calm and lovely moonlit night, he had first
plighted love, life, and hope, to Murielle; and now, as then, he could
see Thrave, her home and her prison, casting its long black shadows
on the Dee.
"Here is Thrave," said the stranger, reining up his powerful horse
beside the barbican-gate.
"And you, sir?——"
"I am James, earl of Douglas," replied the other, sternly and loftily;
"you are on the king's errand, Sir Patrick Gray—'tis well; I bid you
welcome; but remember, save for that tabard which you wear, by the
bones of St. Bryde, I would hang you by the neck from that stone
knob above the gate!"
Gray bowed and smiled bitterly, as they rode into the court-yard, and
he found himself inclosed by the gates and surrounded by the
followers of his mortal enemy.
He had shuddered on passing under the barbican-gate, for a man
was hanging at the gallows-knob above it. Gray knew the dread
custom there—that each culprit or victim was replaced by another,
and he knew not who the next "tassel" might be.
The night-wind lifted the dead man's hair at times as the body swung
mournfully to and fro. Beneath this ghastly object, in the blaze of the
torches which were upheld by a crowd of liveried serving-men and
savage-looking kilted Galwegians, there shone a great shield of
carved and painted stone. It bore the arms of the ancient lords of
Galloway—azure a lion rampant, argent crowned with an imperial
diadem, impaled with the countless quarterings of the Douglases.
The moon was waning now; but the number of torches, as they
flared on the walls and grated windows of the vast keep, made the
court-yard seem light as if the night was noon.
As Gray dismounted, a familiar voice reached his ear, saying,
"Thanks, brave friend and kinsman—you have perilled much to save
me!"
"Thou art right, MacLellan—I come by the king's orders, so take
courage!" replied Gray, looking about him; but from which of the
black gratings of that lofty edifice the voice came his eye failed to
discover.
A cruel smile passed over the grim face of the earl, as he said, "Sir
Patrick Gray, it is ill speaking between a full man and a fasting; so get
you to bed for to-night; after breakfast to-morrow, we will consider
your errand from the king; and you have my knightly word for your
safety while within the walls of Thrave."
"And how, when I leave them, my lord?"
"That is as may be," said the other, turning on his heel.
With these dubious, or rather ominous words, the earl retired, and
within an hour, Gray, after partaking of some refreshment alone,
found himself lying on a couch with his armour on and his drawn
sword by his side, endeavouring to court sleep, with his mind full of
the terrible novelty of his situation; and not without a sense of
charm, for he knew that Murielle was near him, and that the same
roof covered them both.
On a tabourette by the bedside were placed a night-lamp, a cup of
sack posset, and the earl of Douglas's dagger as a symbol of peace
and protection—that he had armed his unwelcome guest against
even his own household; for such was the custom of the age and
country.
CHAPTER XLVII.
HUSBAND AND WIFE.

First rose a star out owre the hill,


And next the lovelier moon;
While the bonnie bride o' Galloway
Looked for her blythe bridegroom;
Lythlie she sang as the new moon rose,
Blythe as a young bride may,
When the new moon lights her lamp o' love,
And blinks the bride away.—Cromek.
Sir Patrick Gray sprang from a couch, where dreams, rather than
sleep, had pressed thick and fast upon him. He rose while yet the
summer sun was below the green Galloway hills, and while the dark
waters of the Dee were veiled by the white mists of the early
morning.
His mind was full of Murielle, and he was not without hope, that
while all the numerous household and powerful feudal garrison were
yet abed, he might find some means to communicate with her—to
see, to speak to one so beloved—one from whom he had been so
long, so wickedly separated—his seven years' wedded wife!
It seemed to Gray, while thinking of this, that some one had been
softly and timidly tapping at his door.
Gently drawing back the numerous bars of wood and iron, with which
the doors of all bed-chambers in old Scottish mansions were
furnished in those days and for long after, he stepped into an arched
corridor; then, on looking along its dusky vista, he saw a female
figure approach, and what were his emotions on beholding the
sudden realization of his dearest wish—Murielle, who had left the
room thus early on the same errand and with the same desperate yet
tenderly loving hope, had been watching the door of his chamber.
She seemed pale and wan, as one who had been sleepless; but
though more womanly and more full in figure, she was otherwise
unchanged as when he had seen her last, on that happy and yet
unfortunate night, in the church of St. Genevieve, in Flanders.
"At last, my Murielle!"
"At last we meet—but oh! for a moment only."
They clasped each other in a tender embrace—heart to heart, and lip
to lip. His face was bent on hers, and her tears of joy and fear fell
fast.
"You love me still, Murielle?"
"Still!" she reiterated reproachfully—"oh, with all my life and
strength."
"But to what a hopeless love and aimless life have my passion and its
selfish ties consigned you!" said he; "we are the slaves of others and
of destiny."
"Such have we ever been, since that fatal day on which my cousins,
William and David, were slain. That was in 1440, ten long, long years
ago; but—"
"A crisis in our fate is coming now, dear Murielle."
"But say why—oh, why are you here—here in Thrave, here, where
your life is in peril so deadly?"
"I am come, in our good king's name, to demand MacLellan's release,
and to invite the earl, under cartel, to meet the council at Stirling,
that all these evils may be peacefully ended."
"I pray to the kind Father of all, and to his Blessed Mother, who is in
heaven, that it may be so," sighed poor Murielle. "But oh, I am so
weary, weary—so sad and weary here! They keep me quite a
prisoner, though not so cruelly as they keep Sir Thomas MacLellan;
for I am told by Marion Douglas that he is confined in the pit."
"Mahoun! say you so, dearest—in that horrid vault?"
"Yes; but hush—we may be overheard."
"Ah! my brave and noble kinsman—such a doom! Was it not in that
dungeon that Earl Archibald, the first duke of Touraine, kept
MacLellan, the laird of Borgue, chained, like a wild beast, till he
became a jabbering idiot, and when found by the prior of St. Mary's
Isle, he was laughing as he strove to catch the single sunbeam that
fell through the grated slit into his prison—yea, striving fatuously to
catch it with his thin, wan, fettered hand—the same hand that carried
the king's banner at the battle of Homildon!"
"Do not frown thus, my dearest heart," said Murielle, weeping; "I
have little need to add to the hatred that grows apace in every breast
against the name of Douglas."
"Do not say in every breast, sweet Murielle—sweet wife," he added,
pressing her close and closer still in his embrace; "for my heart is
wrung with anguish and with love for you, and of this love God alone
knoweth the depth and the strength!"
Murielle continued to weep in silence.
"My love for you," resumed Gray, "and my duty to the king, whom my
father, old Sir Andrew Gray, taught me to love, respect, and almost
worship, are impulses that rend my heart between them. At the risk
of my life I have ridden here on the king's service, alone, with no
protection but my sword, my hand, and, it may be, this royal tabard
—a badge but little respected on this side of the Nith or Annan."
"And you came——"
"To see you, and to save MacLellan from the fate of Sir Herbert
Herries. God wot, though, I would give the last drop of my blood to
serve my kinsman. A king's herald might have borne the mandate as
well as I; but the hope of seeing you—of hearing your dear voice, of
concerting some plan for your escape and future freedom from a
tyranny that is maddening,—chiefly, if not alone, brought me into the
wilds of Galloway—the very land and stronghold of the enemies of
the throne."
"Say not the enemies," said Murielle mournfully. "I hope that men
misjudge us sorely."
"I hope they do; yet there are strange whispers abroad of a rebel
league with the earls of Ross and Crawford, with Henry of England,
and the lord of the Isles—a league to dethrone the king and plunge
the land in ruin. But let us speak at present of escape—of flight——"
"My disappearance would be your destruction; all Galloway, with
hound and horn, would be upon your track."
"True—Douglas gave me his word for safety only while within the
walls of Thrave," said Gray, bitterly.
"The most sunny summer-day may have its clouds, dear Patrick; but
here, in this dull residence, with me it is ever cloud, and never
sunshine—I mean the sunshine of the heart. My time is passed, as it
were, in perpetual winter. I have no solace—no friend—no
amusement, but my cithern and the songs you loved so well——"
"And love still, Murielle, for the sake of you!"
"So cheerlessly I live on without hope or aim, a wedded nun, amid
councils of fierce and stern men, whose meetings, debates, and
thoughts are all for opposition, and revenge for the terrible deed of
1440."
"In other words, Murielle, men who are ripe for treason and
rebellion."
"Why will you speak or think so harshly of us?" she asked so
imploringly that Gray kissed her tenderly as his only or best reply.
"And that spiteful beauty your sister—what says she of me
now?"——"That you are the king's liege man," was the cautious
answer.
"She is right, my beloved one—I am his till death——"
"And our enemy!" said a sharp voice close by them, like the hiss of a
snake.
They turned and saw Margaret, the countess of Douglas, standing at
the entrance of her bower-chamber, the tapestry covering the door of
which she held back with one hand. She was clad entirely in black,
with a long veil of fine lace depending from the apex of her lofty
head-dress, enveloping her haughty head and handsome white
shoulders. She was somewhat changed since Gray had seen her last,
for angry passions were lining her young face prematurely; her
marvellous beauty remained in all its striking power; but it was the
beauty of a devil—diavolessa, an Italian would term it. Ten years of
feud and anxious hostility to the crown and its adherents had
imparted a sternness to her fine brow, a keen boldness to her black
eyes, and a sneering scorn to her lovely lip that made her seem a
tragedy queen.
"And so another errand than the king's message, anent his minion's
life or death, has brought you hither, scurvy patch!" said she
scornfully; "but by St. Bryde I shall rid our house and my sister of
such intrusive visitors!"
"Madame," Gray began, with anger.
"Varlet—would you dare to threaten me?" she exclaimed, holding up
a clenched hand, which was white, small, and singularly beautiful;
"but, my gay moth, you will flutter about that poor candle until your
wings are burnt. I have but to say one word to Douglas of this
clandestine meeting and he will hang you in your boots and tabard
above our gate, where Herbert of Teregles and many a better man
has hung."
"Oh, sister Margaret," urged Murielle, trembling like an aspen-leaf.
"Ha—to speak that word would remove the only barrier to your being
duchess of Albany—and why should I not speak it?" she continued
fiercely and with flashing eyes; "Why should I not speak it?"
"Because, dear Maggie, you have still some gentle mercy left, and
Heaven forbids you," said Murielle, clinging to her pitiless sister.
"Begone, madam," said the latter imperiously; "your instant
obedience alone purchases my silence. But here comes Sir Alan
Lauder."
So ended abruptly, as at the abbot's house in Edinburgh, this
unexpected meeting. Terror for her lover-husband's life made Murielle
withdraw instantly with her sister, just as Sir Alan Lauder of the Bass,
who was still captain and governor of Thrave, approached with an
undisguised sneer on his lips to say that the earl would receive Sir
Patrick Gray at brekfaast, in his own chamber, and there give his
answer to the king's message.
CHAPTER XLVIII.
DOUGLAS AND GRAY.

And darest thou, then,


To beard the lion in his den,
The Douglas in his hall?—Scott.
Clad in a robe of fine scarlet cloth, which was lined with white fur,
and fastened by a brooch or jewel at the neck, and which was open
just enough to show an undershirt and long hose of buff-coloured
silk, the earl of Douglas was seated in a high-backed easy-chair,
which was covered with crimson taffety. His feet were placed on a
tabourette, and close by, with his long sharp nose resting on his
outstretched paws, crouched Souyllard, the snow-white bloodhound,
uttering deep growls from time to time.
Breakfast, which consisted of cold beef, partridge-pie, flour cakes,
wheaten bread, honey, ale, and wine, was spread upon the table,
which, like the rest of the carved oak furniture, and like the castle
itself, seemed strong enough to last twenty generations of Douglases.
The equipage was entirely composed of the beautiful pottery of
Avignon, which was all of a dark-brown metallic tint, like tortoise-
shell, but perforated and in relief.
At the earl's hand, in the old Scottish fashion, stood a flat-shaped
pilgrim-bottle of usquebaugh. It was from Beauvais, and bore a Scots
lion, with the three fleurs-de-lys, and the name of Charle Roy VII., for
it was a personal gift from the then reigning monarch of France to
Douglas, when last at his hunting-castle in the wood of Vincennes.
A painted casement, one half of which stood open, admitted the
warm rays of the summer sun through the deep embrasure of the
enormously thick castle-wall, and afforded a glimpse without of the
far-stretching landscape and the windings of the Dee, and, nearer
still, the green islet it formed around Thrave, on the grassy sward of
which some noisy urchins and pages belonging to the feudal garrison
were gambolling, playing at leapfrog, and launching stones and
mimic spears at an old battered quintain, or carved figure of an
armed man, which stood there for the use of those who practised
tilting.
As Gray entered in his armour, with his surcoat on and his helmet
open, readier for departure than a repast, the earl, without rising or
offering his hand, bowed with cold courtesy, with a sardonic smile on
his white lip and hatred in his deep-set eyes. He made a signal to a
page who was in attendance to withdraw, and they were immediately
left together. "I said last night that it was ill speaking between a full
man and a fasting," said he; "we are both fasting this morning, so,
Sir Patrick, let us eat, drink, and then to business."
As they were both anxious to come to the point at issue, after a few
morsels of food and a draught of spiced ale, the grim earl spoke
again:—
"I have read the king's letters—that anent the laird of Bombie, and
that which invites me to a conference at Stirling, with a safe conduct
to me and all my followers. By my faith, Sir Patrick, I think their
number, in horse, foot, and archers, will be their best safe conduct.
The first letter demands——"
"The release, the instant release of my kinsman, Sir Thomas
MacLellan, of Bombie, and of that ilk, steward of Kirkcudbright, whom
you unlawfully and most unworthily detain here a prisoner," said
Gray, whom the cool and insolent bearing of Douglas exasperated
beyond the point of prudence. "What if I refuse," asked the latter
with an icy smile.
"That the king and council will consider."
"What if he be dead?"
"He is not dead," exclaimed Gray with growing excitement; "last night
I heard his voice, for he addressed me as we entered the barbican."
"Ha!" said Douglas sharply, with a furious glance.
"But dead or alive, lord earl, in the king's name I demand his body!"
"That shall I grant you readily," replied Douglas grimly, as he blew on
a silver whistle which lay on the table. The arras of the doorway was
raised, and there reappeared the page, to whom the earl gave a ring,
which he drew from his finger, and said, "Tell James Achanna to lead
forth the laird of Bombie from the vault, and to obey my orders."
The page bowed and retired; and he observed, if Gray did not, the
terribly sinister expression which at that moment filled the earl's
eyes.
"You said lead forth, my lord; hence my kinsman lives, and I thank
you," said Gray with more composure.
"Some men die between the night and morning, others between the
morning and the night;—but now about this conference at Stirling:
what boon does the king hold out as an inducement for a Douglas to
risk so much as an acceptance of royal hospitality?"
"Boon, my lord?"
"By St. Bryde, I spoke plainly enough!"
"'Twas said the restoration to you of the office of Warden of the
Marches towards England, the seat at the Privy Council, and the
commission of Lieutenant-General of the Kingdom."
"In short, all of which I was unjustly deprived, by Crichton's
influence, during my absence in France and Italy," said the earl,
removing the breast from a partridge.
Gray did not reply, for at that moment the castle bell was tolled
slowly, and a strange foreboding seized him. "With the recollection of
the Black Dinner of 1440 before me, by the Devil's mass! I would
require some great boon to tempt me, assuredly," said the earl, with
a laugh which had something diabolical in its sound.
"My lord, there will be the king's safe conduct."
"King's memories are precarious. Since the day of the bloody banquet
at which my kinsman perished, I have been, as it were, a man of
granite in a shirt of steel—immoveable, implacable—and feeling no
sentiment but the longing for revenge!"
The earl, with a sparkling eye and a flushing cheek, spoke as
feelingly as if he had not had a secret hand in the execution of his
nephews, nor won anything by it.
"King James," he added, "has yet to learn what a ten years' hatred
is!"
"Ten years?" reiterated Gray, as he thought of Murielle.
"Ten years—we are in the year 1450, and for the ten which have
preceded it, my armour has scarcely ever been off," said the earl;
"and even in my own hall the sword and dagger have never been
from my side."
"For the same reason, my lord, you have kept other men's armour
on, and others' weapons on the grindstone," replied Gray; "but
endeavour to be as good and loyal as your fathers were, earl of
Douglas—renounce your evil leagues, and bonds for rebellion, else
you may find the king alike more wise and powerful than you
imagine."
"I seek not advice from you, laird of Foulis," said Douglas, with proud
disdain. "Within sound of the bells of Holyrood, or those of St. Giles,
your king may be both powerful and wise; but on this side the Moat
of Urr, I have my doubts of his power or wisdom."
"How, my lord?"
"He would scarcely be wise to venture into the wilds of Galloway,
even with all the forces he and that fangless wolf, his chancellor, may
collect; and never powerful enough to do so, with the hope of
success."——"Daring words, when said of one who is king of all
Scotland."
"But I am lord of all Galloway, and I shall uphold its ancient rights
and laws in battle, as stoutly as Earl Archibald did in Parliament."
"An old story, my lord," said Gray, rising.
"True; in the days of Robert II.," added Douglas, rising also.
"In 1385," observed Gray, with a scarcely perceptible smile.
"I forget."——"Though you forget many things, my lord," said Gray,
rashly and impetuously, "do not imagine that you or they are
forgotten."
"Does this imply a threat, my cock-laird of the north country?" asked
Douglas, with profound disdain.
"As you please—I am a plain soldier."
A ferocious expression darkened the earl's face; but Gray drew back a
pace, and laid a hand significantly on his tabard.
"Sir Patrick Gray, I advise you to get your horse and begone!"
thundered Douglas, starting to his feet. "Without there! Order Sir
Patrick's horse to the barbican-gate!"
"But your answer to the king?" said Gray, tightening his waistbelt,
and preparing for a sudden start.
"That I shall convey in person to Stirling."
"And my kinsman——"
"Dead or alive?" said Douglas, with a sullen glare in his eye. Alarmed
by the expression of the earl's face, Gray said earnestly:—
"Lest you might not obey the king's commands, proud lord, or might
scoff at my humble request, I bring you a mother's prayer for her
son."
"A mother's?" said Douglas, pausing as they descended the staircase
together.
"The prayer of my father's sister, Marion Gray, for her son's
release."——"It comes too late," muttered the earl under his black
moustache, as they issued into the sunny court-yard of Thrave.
CHAPTER XLIX.
THE FATE OF MACLELLAN.

Lift not the shroud! a speaking stain


Of blood upon its sable seen,
Tells how the spirit fled from plain,
For there the headsman's axe hath been.
Ballad.
"The king's demand shall be granted, but rather for your sake—come
hither," said the earl.
There was a cruel banter in his manner, a bitter smile on his face,
and Gray grew pale, and felt the blood rush back upon his heart with
a terrible foreboding as they crossed the court-yard.
Then his eye at once detected something like a human form
stretched at full length upon the ground, and covered by a sheet.
About it there could be no doubt—it was so cold, white, angular, and
fearfully rigid. Upon the breast was placed a platter filled with salt—a
Scottish superstition as old as the days of Turpin—and close by lay an
axe and bloodstained billet, about which the brown sparrows were
hopping and twittering in the warm morning sunshine. With a
choking sensation in his throat, Gray stepped resolutely forward.
"Remove this cloth," said the earl to some of his people who were
near; and on their doing so, there was seen the body of a headless
man—a body which Gray knew too well to be that of his friend and
kinsman, for on the breast of the soiled and faded pourpoint was
embroidered a gold scutcheon, with the three black chevrons of
MacLellan.
"Sir Patrick, you have come a little too late," said the sneering earl;
"here lies your father's sister's son; but, unfortunately, he wants the
head, and a head is an awkward loss. The body, however, is
completely at your service."
Grief and indignation almost choked Gray's utterance. He knelt down
and kissed the cold right hand, which yet bore the mark of an iron
fetter, and then turning to the earl, said, "My lord, you may now
dispose of the body as you please; but the head——"
"Behold it on the battlement above you!"[4]
Gray mounted his horse, which was at that moment led to the outer
gate by the earl's grooms; and mistrusting them, though feeling as
one in a terrible dream, before putting his foot in the stirrup he
carefully examined his bridle, girths, and crupper. Then, says Sir
Walter Scott in his history, "his resentment broke forth in spite of the
dangerous situation in which he was placed":—
"My lord," said he, shaking his gauntleted hand close to the earl's
beard, "if I live you shall bitterly pay for this day's work; and I—
Patrick Gray of Foulis—tell thee, that thou art a bloodthirsty coward—
a disgrace to knighthood and nobility!"
He then wheeled round his horse, pressed the sharp Rippon spurs
into its flanks, and galloped off.
"To horse and chase him!" cried the earl, furiously. "I will ride to
Stirling, false minion, with your head at my saddle-bow! To horse and
follow him—this venturesome knight must sleep beside his kinsman!"
"But he came on the king's service," urged Sir Alan Lauder, as he put
his foot in the stirrup of his horse, when some twenty or thirty
mounted moss-troopers came hurriedly from the stable court.
"Bah! love-lured and destiny dragged him hither. Let slip Souyllard
the sleuth-bratch. Horse and spear, I say, Lauder and Achanna—a
hundred crowns for the head of yonder minion! I swear by St. Bryde
of Douglas and Kildara, by the Blessed Virgin and her son, never to
eat at a table, sleep in a bed, to rest under a Christian roof, or to lay
aside sword and armour, till I have passed my dagger through the
heart of Patrick Gray, dead or alive!"——"If you break this terrible
vow," said Sir Alan, aghast at the earl's fury, and the form it took in
words.
"Then I pray Heaven, at the judgment day, to show such mercy to
me as I shall show my enemies," was the fierce response.
It was fortunate for the earl that he soon found a friar to release him
from a vow, the fulfilment of which must have entailed a vast deal of
trouble, fuss, and discomfort upon him and his followers.
"A hundred crowns and St. Bryde for Douglas!" was the shout of the
moss-troopers, as they dashed through the outer gate, and with their
light active horses, their steel caps, jacks, and spears, they clattered
over the drawbridge; but Gray, after escaping six or eight crossbow
bolts, was already three miles before them, and spurring in hot haste
along the road towards Dumfries.
It was a fortunate circumstance for him that he was well mounted on
a fleet, strong, and active horse; for he was a muscular man, and
heavily mailed, while his pursuers, being Border Prickers, wore but
little armour, and their wiry nags were used to scamper on forays in
all weathers and seasons by day and night, over moor, and moss, and
mountain sides.
Gray knew well that if taken his death was certain; for Douglas,
reckless, ruthless, and bloodthirsty, by nature, was certain now to
give full scope to his long-treasured hatred.
He no longer heard the whooping of his pursuers; he had either
distanced them, or they were husbanding their energies for a long
chase; but there came after him at times, upon the hollow wind, the
grunting bark of the sleuth-bratch, by which he was surely and
savagely tracked, Souyllard, the earl's favourite white bloodhound,
and the heart of the fugitive swelled anew, with grief and rage, and
hatred of his unrelenting tormentor.
He was far from shelter or succour, for until he reached the Lothians,
all the land belonged to his enemies, or to those who dared not
protect him. For miles and miles before him stretched flattened hills
and open plains covered with waving heather, the purple tints of
which were glowing in the noonday sun, and these tints deepened
into blue or black on the shaded sides of the glens. The whirr of the
blackcock was heard at times, as it rose from the pale green or
withered yellow leaves of the ferns that grew among the rushes,
where the trouting burn brawled, or by the lonely ravine, the red-
scaured bank, or stony gulley, which Gray made his foam-covered
horse clear by a flying leap.
Louder and nearer came the savage bay of the sleuth-bratch, and
Gray, as he looked back, could see it tracking him closely and surely;
while about three miles distant the border spears of his pursuers
glittered on the summit of a hill.
He had swam his horse through the Urr and spurred on for miles, and
now before him lay Lochrutton, with its old peel-house, named, from
its loneliness, the castle-of-the-hills; then, as he was about to cross a
foaming tributary of the loch—a stream that tore, all red and muddy,
through a stony ravine—the bounding sleuth-bratch came upon him,
and sprang at his horse's flanks, just as the terrified animal rose into
the air to cross by a flying leap.
Clenching his gauntleted hand, Gray struck the fierce brute on the
head, and it fell into the rushing torrent; but gained the other side as
soon as he, and sent its deep, hoarse bay into the air, as if to
summon the pursuers. Now, the terrible sleuth-bratch was running
parallel with his horse's flanks, and vainly he strove to strike at it with
his sword. His temples throbbed as if with fever, and now, for air, for
coolness, and relief, he drew off his barred helmet; then he tossed it
into a bush, for the double purpose of staying the hound and
concealing it as a trace of his flight. Spurring on—he redoubled his
efforts to escape; he called to his horse—he cheered and caressed it,
while the perspiration of toil oozed through the joints of his armour,
from his gorget to his spurs.
The terrible white hound was preparing for a final, and, doubtless, a
fatal spring, when a man, whose head and shoulders were enveloped
in a scarlet hood, suddenly rose from the whin bushes amid which
Gray was galloping, and, by a single blow from his ponderous mace,
dashed out the brains of the dog, killing it on the spot.
Gray had just time to perceive that his preserver was Malise MacKim,
the smith of Thrave, when he passed on like a whirlwind; and now he
saw before him, in the distance, the lovely and far-stretching valley of
the Nith, the long bridge, the red-walled town, the spires, and the old
castle of Dumfries. He dashed through the portal of the bridge, and
pushed on by the way for Edinburgh; but, owing to the rough and
devious nature of the roads, night overtook him among the wilds of
Tweedmuir. If he drew bridle for a moment, he still heard the tramp
of hoofs in his rear, for now fresh horsemen had joined in the pursuit;
and, but for the relays he had so wisely provided, and of which he
only once availed himself, he had assuredly been taken and slain.
At Moffat, he obtained his own favourite horse from the priest of the
village church.
"Will you not, for safety, have your charger's shoes reversed?" said
the churchman, as Gray mounted.
"Like King Robert of old; but there is no snow to reveal the track."
"But there is mud, and the ways are deep and soft."
"True; but who is here to do it for me?"
"The smith at the forge."
"Nay, nay, good father," said Gray, shortening his reins; "hear you
that!" a whoop and a bugle blast, came together on the night breeze;
"I must even trust me to my good Brechin blade and Clydesdale nag;
for both are our good king's gift," and he set forth with renewed
speed.
He had good reason for declining to loiter, for just as he rode off, the
mountain pass, which opens into the valley, rang with shouts and the
rush of many iron hoofs, as the laird of Hawkshaw and the Hunters of
Palmood, with Sir Alan Lauder and a band of Douglas moss-troopers,
came galloping down.
On, on, rushed the fleet horse, with its small head outstretched, and
cutting the night air, as the prow of a ship cleaves the water. The
taper ears lay flat on its neck, the mane streamed behind like smoke
from a funnel, and the quivering nostrils shot forth white puffs of
steamy breath at every bound; while foam and blood mingled
together on its flanks, as the sharp Rippon spurs of the daring rider
urged it fast and furiously on.
This flight of Sir Patrick Gray from Thrave suggested to Scott the
escape of Marmion from the future chief of the Douglases—the earl
of Angus, at Tantallan.
FOOTNOTES:
[4] One account states that the body of MacLellan was interred in
the church of Kirkconnel, and some old inscription is quoted in
proof; another, that he was conveyed to the abbey church of
Dundrennan, where a monument was erected to his memory; but
it is much more probable that he would be interred in the church
of Kirkcudbright.
CHAPTER L.
WILL HE ESCAPE?

Black is my steed as a cloud of rain,
There's a star of white on his brow;
The free gales play with his feathery mane,
And lightnings gleam round his feet of snow.
Polish Poetry.
Dark foliaged glens and heathy hills, furrowed fields and wayside
cottages, with the pale smoke curling through their roofs of yellow
thatch and emerald moss; rock-perched towers, with corbelled
battlements and grated windows; deep fordless rivers, pathless
woods, and uncultivated wolds, seemed to fly past, and still the steed
with its bare-headed rider rushed on at a frightful pace, as if it was
enchanted, or bestrode by an evil spirit, like that of Lenore in the
ballad of Bürger.
The moon was shining brightly now.
Near Stobo the pursuers came so close upon Gray that he began to
fear escape was impossible—the more so, as fresh blood-hounds
were baying on his track. On reaching the Tweed, instead of crossing
it by a ford—the river was deeper everywhere then than now—he
waded or swam his horse up the current for about a hundred yards,
and backed it into a low-browed fissure or cavern which he
discovered amid the rocks.
Dismounting, he drew his sword and dagger, threw the bridle over his
left arm, and stood at the cavern mouth to confront all or any who
might come near, and resolved if they discovered his lurking place to
sell his life dearly; but he felt how much the long ride in heavy
armour over rough ground had impaired his natural strength. His
sinews were stiffened, his overtasked muscles were swollen with
pain, and his mind was as weary as his body.
On the silver current of the Tweed, as it brawled over its broad bed of
pebbles, the moon shone bright and clearly; close by, a tributary from
the hills rushed over a brow of rock, and formed a feathery cascade,
which plunged into a deep pool. There the peasantry affirmed that a
kind fairy was wont to appear at times, and to bend over the
cascade, mingling her white arms and floating drapery with the foam,
as she sought to save those wayfarers whom the evil kelpie in the
darksome linn below sought to drown and devour.
Nor hideous kelpie, nor lovely fairy were visible to-night; but now
came the hoarse grunting bark of four large sleuth-bratches, as they
leaped with heavy plunges to the margin of the stream: there the
scent was lost, and they were once more, as at every running water,
at fault, so they ran snorting and sniffing to and fro among the
leaves, reeds, and water-docks, with the breath curling up from their
fierce red nostrils like white steam in the clear moonlight.
Then came furious surmises and angry oaths as a dozen or more
moss-troopers galloped down to the bank of the stream, and rode in
an excited manner hither and thither, seeking to put the dogs upon a
track or trail. Through the leafy screen of his hiding-place, Gray could
see their fierce and sun-burned faces, their rusty helmets and
battered trappings, their long reed-like lances that glittered in the
moonshine; for those moss-troopers, in their well-worn and half-
barbaric accoutrements, were the very Cossacks of the Scottish
borders.
James Achanna now came up and spurred his horse across to
examine the ford, and uttered a shout of exultation on discovering
the trace of horses' hoofs recently impressed in the soft mud.
Gray drew a long breath, and felt the edge of his sword, for he
thought the critical moment was at hand! But now a trooper, with an
oath expressive of disappointment, drew the attention of all to the
circumstance that the marks were those of a horse which had gone
towards the ford and must have crossed it for the south, and that, if