Learn Data Mining Through Excel: A Step-by-Step Approach for Understanding Machine Learning Methods, 2nd Edition, Hong Zhou
Learn Data Mining
Through Excel
A Step-by-Step Approach for
Understanding Machine
Learning Methods
Second Edition
Hong Zhou
Learn Data Mining Through Excel: A Step-by-Step Approach for Understanding
Machine Learning Methods
Hong Zhou
Department of Mathematics and Computer Science
University of Saint Joseph, West Hartford, CT, USA
About the Author
Hong Zhou, PhD, is a professor of computer science and
mathematics and has been teaching courses in computer
science, data science, mathematics, and informatics at the
University of Saint Joseph for nearly 20 years. His research
interests include bioinformatics, data mining, software
agents, and blockchain. Prior to his current position, he was
a Java developer in Silicon Valley. Dr. Zhou believes that
learners can develop a better foundation of data mining
models when they visually experience them step by step,
which is what Excel offers. He has employed Excel in teaching data mining and finds it
an effective approach for both data mining learners and educators.
About the Technical Reviewer
Adam Gladstone has over 25 years’ experience in software
development, mostly in C++ and C#. He has worked mainly
in the investment banking and finance sectors. For the last
few years, he has been developing data science and machine
learning skills, particularly in Python and R after completing
a degree in maths and statistics. He loves programming in
C++ and C# and his free time is spent developing software
tools.
CHAPTER 1
Why Excel?
If you are already an experienced data mining professional, I would say that you
are asking the right question and probably you should not read this book. However,
if you are a beginner in data mining, a visual learner, an educator, or someone who
wants to understand the mathematical background behind some popular data mining
techniques, then this book is right for you, and it is probably the first book you
should read before you start your data mining journey.
Excel allows you to work with data in a transparent manner, meaning when an Excel
file is opened, the data is visible immediately and every step of data processing is also
visible. Intermediate results are contained in the Excel worksheet and can be examined
while you are conducting your mining task. This allows you to obtain a deep and clear
understanding of how the data is manipulated and how the results are obtained.
Other software tools and programming languages hide critical aspects of the model
construction process. For most data mining projects, the goal is to find the internal
hidden patterns inside the data. Therefore, hiding the detailed process is beneficial to the
users of the tools or packages. But it is not helpful for beginners, visual learners, or those
who want to understand how the mining process works. Let me use the k-nearest neighbors
(K-NN) method to illustrate the learning differences between RapidMiner, R, and Excel.
Before we do that, we need to understand several data mining terms.
© Hong Zhou 2023
H. Zhou, Learn Data Mining Through Excel, https://doi.org/10.1007/978-1-4842-9771-1_1
Chapter 1 Excel and Data Mining
There are two types of data mining techniques: supervised and unsupervised.
Supervised methods require the use of a training dataset to “train” the software
programs or algorithms (such programs or algorithms are often referred to as machines)
first. Programs are trained to reach an optimal state called a model. This is why a
training process is also called modeling. Data mining methods can also be categorized
into parametric and nonparametric methods. For parametric methods, a model is just
a set of parameters or rules obtained through the training process that are believed to
allow the programs to work well with the training dataset. Nonparametric methods do
not generate a set of parameters. Instead, they dynamically evaluate the incoming data
based on the existing dataset. You may be confused by such definitions at this time. They
will make sense soon.
What is a training dataset? In a training dataset, the target variable (also called the
label, target, dependent variable, outcome variable, or response), the value of which is
to be predicted, is given or known. The value of the target variable depends on the values
of other variables, which are usually called attributes, predictors, or independent
variables. Based on the attribute values, a supervised data mining method computes (that
is, predicts) the value of the target variable. Some computed target values might not
match the known target values in the training dataset. A good model is an optimal set of
parameters or rules that minimizes the mismatches.
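To make "training" and "mismatches" concrete, here is a minimal sketch in Python. It is
not taken from the book: the dataset, the one-parameter threshold rule, and the function
names are all hypothetical, chosen only to show a parametric model whose single
parameter is picked to minimize mismatches on the training dataset.

```python
# Hypothetical training dataset: (income in $1000s, known response to an offer).
training = [(20, "No"), (35, "No"), (48, "No"), (52, "Yes"),
            (70, "Yes"), (90, "Yes"), (41, "Yes")]


def predict(income, threshold):
    """One-parameter rule: predict Yes when income reaches the threshold."""
    return "Yes" if income >= threshold else "No"


def mismatches(threshold, data):
    """Count records whose computed target differs from the known target."""
    return sum(1 for income, known in data if predict(income, threshold) != known)


def train(data):
    """Try each observed income as the threshold; keep the first one with fewest mismatches."""
    candidates = sorted(income for income, _ in data)
    return min(candidates, key=lambda t: mismatches(t, data))


best_threshold = train(training)
best_errors = mismatches(best_threshold, training)
print(best_threshold, best_errors)
```

Here the "model" is nothing more than the chosen threshold. Even the best threshold
leaves one mismatch, which illustrates that an optimal model minimizes mismatches rather
than eliminating them.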
In a supervised data mining method, a model is usually constructed to work on future
datasets with unknown target values. Such future datasets are commonly called scoring
datasets. In an unsupervised data mining method, however, there is no training dataset
and the model is an algorithm that can be applied directly to the scoring datasets. The
k-nearest neighbors method is a supervised data mining technique.
Suppose we want to predict if a person is likely to accept a credit card offer based
on the person’s age, gender, income, and number of credit cards they already have. The
target variable is the response to the credit card offer (assume it is either Yes or No),
while age, gender, income, and number of existing credit cards are the attributes. In the
training dataset, all variables including both the target and attributes are known. In such
a scenario, a K-NN model is constructed through the use of the training dataset. Based
on the constructed model, we can predict the responses to the credit card offer of people
whose information is stored in the scoring dataset.
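The credit card scenario can be sketched in a few lines of Python. This is only an
illustration under stated assumptions, not the book's Excel procedure: the records are
invented, gender is coded as 0/1 so that a distance can be computed, the distance is
plain Euclidean distance on unscaled attributes, and k = 3 is an arbitrary choice.

```python
import math
from collections import Counter

# Hypothetical training records: (age, gender, income in $1000s, cards held) -> response.
training = [
    ((25, 0, 30, 1), "No"),
    ((32, 1, 45, 2), "No"),
    ((41, 0, 80, 3), "Yes"),
    ((47, 1, 95, 4), "Yes"),
    ((52, 0, 60, 2), "Yes"),
    ((23, 1, 28, 0), "No"),
]


def distance(a, b):
    """Euclidean distance between two attribute tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(record, data, k=3):
    """Predict the response by majority vote among the k closest training records."""
    neighbors = sorted(data, key=lambda row: distance(record, row[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical scoring dataset: attributes known, response unknown.
scoring = [(45, 1, 90, 3), (24, 0, 29, 1)]
predictions = [knn_predict(record, training) for record in scoring]
print(predictions)
```

Notice that no parameters are learned: each prediction consults the training records
directly, which is what makes K-NN a nonparametric method. Because the attributes are
left unscaled in this sketch, income dominates the distance; a practical analysis would
usually rescale the attributes first.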
In RapidMiner, one of the best data mining tools, the prediction process is as follows:
retrieve both the training data and scoring data from the repository ➤ set role for the
training data ➤ apply the K-NN operator on the training data to construct the model ➤
connect the model and the scoring data to the Apply Model operator. That’s it! You can
now execute the process and the result is obtained. Yes, very straightforward. This is
shown in Figure 1-1. Be aware that there is no model validation in this simple process.
Applying the K-NN method is very simple in R, too. After loading the library “class”, we
read the training data and scoring data and call the K-NN function, and our job is
finished: the result is ready to view. This is demonstrated in Figure 1-2. Note that
lines starting with “#” are comments.
The knowledge you have gained from the preceding tasks is just enough to apply the
K-NN data mining method. But if you are trying to understand, step by step,
why and how K-NN works, you will need a lot more information. Excel can offer you the
opportunity to go through a step-by-step analysis process on a dataset during which you
can develop a solid understanding of the K-NN algorithm. With this solid understanding,
you can then be more proficient in using other powerful tools or programming
languages. Most importantly, you will have a better understanding of the quality and
value of your data mining results. You will see that in later chapters.
Of course, Excel is much more limited in data mining than Python, R, and
RapidMiner. Excel can only work with relatively small datasets: a single worksheet
holds at most 1,048,576 rows and 16,384 columns. Meanwhile, some data mining
techniques are too complicated to be practiced in Excel. Nonetheless, Excel gives us
a direct and visual understanding of the data mining mechanisms. In addition, Excel
is naturally suitable for data preparation.
Today, because of the software tools and other packages, most effort in a data mining
task is spent on understanding the task (including the business understanding and data
understanding), preparing the data, and presenting the results. Less than 10% of the
effort is spent on the modeling process. The process of preparing the data for modeling is
called data engineering. Excel has an advantage in data engineering when the datasets
are not too large because it can give us a visual representation of data engineering, which
allows us to be more confident in our data preparation process.
As an experienced educator, I realize that students can better develop a deep
understanding of data mining methods if these methods are also explained through
step-by-step instructions in Excel. Studying through Excel unveils the mystery behind
data mining or machine learning methods and makes students more confident in
applying these methods.
Did I just mention machine learning? Yes, I did. Machine learning is another buzz
phrase today. What is machine learning? What is the difference between data mining
and machine learning? Moreover, what is artificial intelligence (AI) and what is the
difference between AI and machine learning?
The purposes of machine learning and data mining are somewhat different. The
purpose of machine learning is to study how computers can develop human-like
learning ability by learning from data. The purpose of data mining is to find valuable
patterns or knowledge from data. However, data mining makes use of machine learning
methods to achieve its goals, that is, the methodologies of data mining and machine
learning can be the same. Anyhow, it is not necessary to differentiate them, and I would
suggest that we treat them the same at this moment.
The same scenario applies to AI and machine learning. Simply put, AI is computer
software that simulates human brain functions, while machine learning trains computer
algorithms with data to mimic human thinking ability. Because most AI makes use of
machine learning to achieve its goals, machine learning is usually considered a subset of
AI, and therefore it is almost impossible to differentiate AI and machine learning either.
49. In our mind’s eye, Horatio. Hamlet, Act I. Sc. 2.
Warton. Thomas Warton (1728–1790). See vol. V. Lectures on the
English Poets, p. 120 and note.
50. At every fall. Milton, Comus, 251.
51. Nod to him, elves. A Midsummer Night’s Dream, Act III. Sc. 1.
The breezy call. Gray’s Elegy written in a Country Churchyard.
52. Air [shape] and gesture proudly eminent. Paradise Lost, Book I.
590.
53. It is place which lessens. Cymbeline, Act III. Sc. 3.
54. Sigh our souls. Merchant of Venice, Act V. Sc. 1.
Snyders. Franz Snyders (1579–1657), of Antwerp, painter of hunting
scenes.
55. Of the earth, earthy. 1 Cor. xv. 47.
We think it had better not be seen. The Magazine article adds:—‘We
never very much liked this picture; but that may probably be our
fault.’
PAGE
61. Trace his footsteps.
Cf. ‘Where shall I seek
His bright appearances, or footstep trace?
For, though I fled him angry, yet, recalled
To life prolonged and promised race, I now
Gladly behold though but his utmost skirts
Of glory, and far-off his steps adore.’
Paradise Lost, XI. 328.
From the New Monthly Magazine, vol. IV., 1822, Table Talk, No. IV.
62. And dull [dead] cold winter. The Two Noble Kinsmen, Act II. Sc.
1.
Faded to the light. Wordsworth, Ode, Intimations of Immortality.
Ways were mire. Milton, Sonnet XX.
63. And still walking under. See ante, note to p. 10.
I was brutish [beastly] like, warlike as the wolf. Cymbeline, Act III.
Sc. 3.
Paul Potter. Of Enkhuizen (1625–1654), animal painter.
64. To see the sun to bed. Lamb, John Woodvil, Act II.
Hunt half a day. Wordsworth’s Hart-Leap Well, Part II.
65. Humbled by such rebuke. Paradise Lost, VI. 342.
And in its liquid texture. Ibid., VI. 348–9.
Inimitable on earth. Ibid., III. 508.
66. Hesperian fable true. Ibid., IV. 250.
Dream of a Painter. See Northcote’s Varieties on Art in his Memoirs
of Sir Joshua Reynolds, etc. (1813–1815), p. xvi. See also vol. I. The
Round Table, note to Guido, p. 162.
Paul Brill. Of Antwerp (1556–1626), a follower of Titian.
67. His light shone in darkness. Cf. S. John i. 5.
Luca Jordano. Luca Giordano (1632–1705), of Naples, ‘Il Presto,’ the
quick worker, who imitated all the great painters.
Grinling Gibbons. The wood carver (1648–1720), of Rotterdam. He
was brought to public notice by Evelyn, the Diarist, and his work may
be seen in St. Paul’s, London, and Trinity College Library,
Cambridge.
68. Lords who love their ladies like. Cf. Home’s Douglas, Act I. Sc. 1:
‘As women wish to be who love their lords.’
See vol. I. The Round Table, pp. 25 et seq., and notes thereto.
NOTES OF A JOURNEY THROUGH FRANCE
AND ITALY
The circumstances which led to and succeeded the tour in France
and Italy described in the following letters will be found detailed in
the Memoirs of William Hazlitt, pp. 107 et seq. The journey began in
August 1824, shortly after Hazlitt married Mrs. Bridgewater; and it
ended in October 1825, by the return home alone of Hazlitt and his
son.
CHAPTER I
CHAPTER II
September 17
94. Bidding the lovely scenes. Collins, Ode on the Passions.
98. The pomp of groves. Beattie, The Minstrel, I. 9.
99. Note. Gil Blas’s Supper. Cf. Book I. chap. 2.
Note. Chateaubriand ... On the Censorship. François René, Vicomte
de Chateaubriand’s (1768–1848) phase of politics between 1824 and
1830 was one of Liberalism. His writings in the Journal des Débats
and elsewhere caused the Chamber to abandon its proposed law
against the press.
100. Swinging slow with sullen roar. Il Penseroso, 76.
CHAPTER III
September 24
102. My tables. Hamlet, Act I. Sc. 5.
103. Like the fat weed. Hamlet, Act I. Sc. 5.
105. Exhalation [steam] of rich-distilled perfumes. Milton, Comus,
556.
106. Let their discreet hearts believe [think] it. Othello, Act II. Sc. 1.
CHAPTER IV.
September 28
106. First and last and midst. Paradise Lost, v. 165.
Worn them as a rich jewel. Hazlitt quotes from himself. See vol. VI.,
Table Talk, p. 174.
Thrown into the pit. Cf. Genesis xxxvii. 24.
School calleth unto School. Psalm xlii. 7: ‘deep calleth to deep.’
107. My theme [shame] in crowds. Goldsmith, The Deserted Village,
412.
Brave o’er-hanging firmament. Hamlet, Act II. Sc. 2.
Hang upon the beatings of my heart. Wordsworth, Tintern Abbey.
Stood the statue that enchants the world. Thomson, The Seasons,
Summer, 1347.
There was old Proteus. Altered from Wordsworth’s Sonnet, ‘The
world is too much with us.’
Sit squat, like a toad. Paradise Lost, IV. 800.
108. The death of the King. Louis XVIII. of France died in September
1824.
Sir Thomas Lawrence. Portrait-painter (1769–1830).
109. To cure [drive] all sadness but despair. Paradise Lost, IV. 156.
Verdurous wall of Paradise. Ibid., IV. 143.
In darkness visible. Ibid., I. 63.
Hulling. ‘Hull on the flood.’ Ibid., XI. 840.
Blind with rain.
Cf. ‘When the chill rain begins at shut of eve
In dull November, and their chancel vault,
The Heaven itself, is blinded throughout night.’
Keats’s Hyperion, II. 36–38.
CHAPTER V
CHAPTER VI
CHAPTER VII
PAGE
133. It out-herods Herod. Hamlet, Act III. Sc. 2.
Note. Dip it in the ocean. A Sentimental Journey, The Wig, Paris.
Note. Perilous stuff that weighs upon the heart. Macbeth, Act V. Sc.
3.
136. Like stars, shoot madly [start] from their spheres. Hamlet, Act
I. Sc. 5.
CHAPTER VIII
CHAPTER IX
November 17. Numbered X.
147. Mademoiselle Mars. See vol. VII., The Plain Speaker, pp. 324 et
seq.
Mrs. Jordan. Dorothea or Dorothy Jordan (1762–1816). See vol. VIII.,
containing Hazlitt’s dramatic writings, for criticism upon her and the
following actresses.
Mrs. Siddons. Sarah Siddons (1755–1831).
Miss Farren. Elizabeth Farren (1759?–1829), Countess of Derby. See
vol. VIII., Lectures on the Comic Writers, 165, etc.
Mrs. Abington. Frances Abington (1737–1815).
Miss O’Neil. Eliza O’Neil (1791–1872), afterwards Lady Becher. See
vol. I., The Round Table, note to p. 156, and vol. VIII. A View of the
English Stage, p. 291.
Flavia the least and slightest toy. Bishop Atterbury’s Flavia’s Fan.
149. Monsieur Damas. For more than twenty-five years one of the
most brilliant actors at the Comédie Française. He retired from the
stage in 1825 and died in 1834.
151. Midsummer madness. Twelfth Night, Act III. Sc. 4.
Mr. Bartolino Saddletree. See Scott’s Heart of Midlothian.
Whole loosened soul.
Cf. ‘All my loose soul unbounded springs to thee.’
Pope, Eloisa to Abelard, 228.
CHAPTER X
CHAPTER XI
CHAPTER XII
CHAPTER XIII
April 6. Numbered XV
Devoutly to be wished. Hamlet, Act III. Sc. 1.
184. Honest sonsie bawsont face. Burns, The Twa Dogs.
The icy fang and season’s difference. As You Like it, Act II. Sc. 1.
Mr. Theodore Hook. Theodore Edward Hook (1788–1841), novelist
and political writer, the Lucian Gay of ‘Coningsby,’ and editor of the
Tory ‘John Bull’ newspaper.
PAGE
186. Here was sympathy. The Merry Wives of Windsor, Act II. Sc. 1.
De Stutt—Tracey’s ‘Idéologie.’ Antoine Louis Claude Comte Destutt
de Tracy’s (1754–1836), Élémens d’Idéologie was published in 1817–
1818.
Mignet’s French Revolution. François-Auguste-Marie Mignet’s
(1796–1884) Histoire de la Révolution Française was published in
1824.
Sayings and Doings. Nine novels of Theodore Hook, published
1826–1829.
Irving’s Orations. Probably Edward Irving’s Four Orations for the
Oracles of God, published in 1823, a third edition of which was
issued in the following year. Cf. vol. iv. The Spirit of the Age, p. 228.
The Paris edition of ‘Table Talk.’ See vol. VI. Bibliographical Note to
Table Talk.
187. Note. Mr. Canning’s ‘faithlessness.’ He had the reputation for
preferring devious paths. ‘I said of him “that his mind’s-eye
squinted,”’ wrote Croker to Lord Brougham, March 1839. See the
Croker Papers, vol. II. p. 352.
Note. Like that ensanguined [sanguine] flower. Lycidas, 106.
Note. Francesco Guicciardini’s (1483–1540), History of Italy from
1494–1532.
Note. Enrico Caterino Davila (1576–1631) of Padua, author of a
History of the Civil Wars of France.
190. The merit of the death of Hotspur. 1 King Henry IV., Act V. Sc.
4.
He who relished. i.e., Rousseau.
The Magdalen Muse of Mr. Moore. See vol. VII. The Plain Speaker, p.
368.
191. Where Alps o’er [on] Alps arise. Pope, Essay on Criticism, II. 32.
This fortress, built by nature. King Richard II., Act II. Sc. 1.
Nodded to him. A Midsummer Night’s Dream, Act III. Sc. 1.
193. Hemskirk. Maerten van Veen of Heemskerk, near Haarlem
(1498–1574), a follower of Michael Angelo.
Kean. Edmund Kean (1787–1833).
194. With cautious haste [wanton heed] and giddy cunning.
L’Allegro, 141.
CHAPTER XV
CHAPTER XVI
CHAPTER XVII
CHAPTER XVIII
CHAPTER XIX
CHAPTER XX
CHAPTER XXI