(eBook PDF) Data Mining: Concepts and Techniques, 3rd Edition
To Y. Dora and Lawrence for your love and encouragement
J.H.
To Erik, Kevan, Kian, and Mikael for your love and inspiration
M.K.
Contents
Foreword
Foreword to Second Edition
Preface
Acknowledgments
About the Authors
Chapter 1 Introduction
1.1 Why Data Mining?
1.1.1 Moving toward the Information Age
1.1.2 Data Mining as the Evolution of Information Technology
1.2 What Is Data Mining?
1.3 What Kinds of Data Can Be Mined?
1.3.1 Database Data
1.3.2 Data Warehouses
1.3.3 Transactional Data
1.3.4 Other Kinds of Data
1.4 What Kinds of Patterns Can Be Mined?
1.4.1 Class/Concept Description: Characterization and Discrimination
1.4.2 Mining Frequent Patterns, Associations, and Correlations
1.4.3 Classification and Regression for Predictive Analysis
1.4.4 Cluster Analysis
1.4.5 Outlier Analysis
1.4.6 Are All Patterns Interesting?
1.5 Which Technologies Are Used?
1.5.1 Statistics
1.5.2 Machine Learning
1.5.3 Database Systems and Data Warehouses
1.5.4 Information Retrieval
Bibliography
Index
Foreword
Analyzing large amounts of data is a necessity. Even popular science books, like “Super
Crunchers,” give compelling cases where large amounts of data yield discoveries and
intuitions that surprise even experts. Every enterprise benefits from collecting and
analyzing its data: Hospitals can spot trends and anomalies in their patient records, search
engines can do better ranking and ad placement, and environmental and public health
agencies can spot patterns and abnormalities in their data. The list continues, with
cybersecurity and computer network intrusion detection; monitoring of the energy
consumption of household appliances; pattern analysis in bioinformatics and
pharmaceutical data; financial and business intelligence data; spotting trends in blogs, Twitter,
and many more. Storage is inexpensive and getting cheaper, as are data sensors. Thus,
collecting and storing data is easier than ever before.
The problem then becomes how to analyze the data. This is exactly the focus of this
Third Edition of the book. Jiawei, Micheline, and Jian give encyclopedic coverage of all
the related methods, from the classic topics of clustering and classification, to database
methods (e.g., association rules, data cubes) to more recent and advanced topics (e.g.,
SVD/PCA, wavelets, support vector machines).
The exposition is extremely accessible to beginners and advanced readers alike. The
book gives the fundamental material first and the more advanced material in follow-up
chapters. It also has numerous rhetorical questions, which I found extremely helpful for
maintaining focus.
We have used the first two editions as textbooks in data mining courses at Carnegie
Mellon and plan to continue to do so with this Third Edition. The new version has
significant additions: Notably, it has more than 100 citations to works from 2006
onward, focusing on more recent material such as graphs and social networks, sensor
networks, and outlier detection. This book has a new section for visualization, has
expanded outlier detection into a whole chapter, and has separate chapters for advanced
methods, for example, pattern mining with top-k patterns and more, and clustering
methods with biclustering and graph clustering.
Overall, it is an excellent book on classic and modern data mining methods, and it is
ideal not only for teaching but also as a reference book.
Christos Faloutsos
Carnegie Mellon University
Foreword to Second Edition
We are deluged by data—scientific data, medical data, demographic data, financial data,
and marketing data. People have no time to look at this data. Human attention has
become the precious resource. So, we must find ways to automatically analyze the
data, to automatically classify it, to automatically summarize it, to automatically
discover and characterize trends in it, and to automatically flag anomalies. This is one
of the most active and exciting areas of the database research community. Researchers
in areas including statistics, visualization, artificial intelligence, and machine learning
are contributing to this field. The breadth of the field makes it difficult to grasp the
extraordinary progress over the last few decades.
Six years ago, Jiawei Han and Micheline Kamber’s seminal textbook organized and
presented Data Mining. It heralded a golden age of innovation in the field. This revision
of their book reflects that progress; more than half of the references and historical notes
are to recent work. The field has matured with many new and improved algorithms, and
has broadened to include many more datatypes: streams, sequences, graphs, time-series,
geospatial, audio, images, and video. We are certainly not at the end of the golden age—
indeed research and commercial interest in data mining continues to grow—but we are
all fortunate to have this modern compendium.
The book gives quick introductions to database and data mining concepts with
particular emphasis on data analysis. It then covers in a chapter-by-chapter tour the
concepts and techniques that underlie classification, prediction, association, and
clustering. These topics are presented with examples, a tour of the best algorithms for each
problem class, and with pragmatic rules of thumb about when to apply each technique.
The Socratic presentation style is both very readable and very informative. I certainly
learned a lot from reading the first edition and got re-educated and updated in reading
the second edition.
Jiawei Han and Micheline Kamber have been leading contributors to data mining
research. This is the text they use with their students to bring them up to speed on
the field. The field is evolving very rapidly, but this book is a quick way to learn the
basic ideas, and to understand where the field is today. I found it very informative and
stimulating, and believe you will too.
Jim Gray
In his memory
Preface
The computerization of our society has substantially enhanced our capabilities for both
generating and collecting data from diverse sources. A tremendous amount of data has
flooded almost every aspect of our lives. This explosive growth in stored or transient
data has generated an urgent need for new techniques and automated tools that can
intelligently assist us in transforming the vast amounts of data into useful information
and knowledge. This has led to the generation of a promising and flourishing frontier
in computer science called data mining, and its various applications. Data mining, also
popularly referred to as knowledge discovery from data (KDD), is the automated or
convenient extraction of patterns representing knowledge implicitly stored or captured in
large databases, data warehouses, the Web, other massive information repositories, or
data streams.
This book explores the concepts and techniques of knowledge discovery and data mining.
As a multidisciplinary field, data mining draws on work from areas including statistics,
machine learning, pattern recognition, database technology, information retrieval,
network science, knowledge-based systems, artificial intelligence, high-performance
computing, and data visualization. We focus on issues relating to the feasibility, usefulness,
effectiveness, and scalability of techniques for the discovery of patterns hidden
in large data sets. As a result, this book is not intended as an introduction to statistics,
machine learning, database systems, or other such areas, although we do provide
some background knowledge to facilitate the reader’s comprehension of their respective
roles in data mining. Rather, the book is a comprehensive introduction to data mining.
It is useful for computing science students, application developers, and business
professionals, as well as researchers involved in any of the disciplines previously listed.
Data mining emerged during the late 1980s, made great strides during the 1990s, and
continues to flourish into the new millennium. This book presents an overall picture
of the field, introducing interesting data mining techniques and systems and discussing
applications and research directions. An important motivation for writing this book was
the need to build an organized framework for the study of data mining—a challenging
task, owing to the extensive multidisciplinary nature of this fast-developing field. We
hope that this book will encourage people with different backgrounds and experiences
to exchange their views regarding data mining so as to contribute toward the further
promotion and shaping of this exciting and dynamic field.
In 1844 Thomas Hood wrote and published his famous “Song of the
Howe first commenced his work on the sewing machine in 1844, and
although he had made a rough model of that date, he was too poor to
follow it up with more practical results until a former schoolmate, George
Fisher, provided $500 to build a machine and support his family while it
was being constructed, in consideration of which Mr. Fisher was to receive
a half interest in the invention. In April, 1845, the machine was
completed, and in July he sewed two suits of clothes on it, one for Mr.
Fisher and the other for himself. Notwithstanding the success of his
machine, which on public exhibition beat five of the swiftest hand sewers,
he met only discouragement and disappointment. He, however, built a
second machine, which was the basis of his patent, and is the one shown
in the illustration. After obtaining his United States patent Howe went to
England with the hope of introducing his machine there, but, failing, he
returned to America, some years later, only to find that his invention had
been taken up by infringers, and that sewing machines embodying his
invention were being built and sold. These infringers sought to break his
patent by endeavoring to prove, but without success, that Howe’s
invention was anticipated by the abandoned experiments of Walter Hunt
in 1834. Howe won his suit, and the infringers were obliged to pay him
royalties, which, for a time, amounted to $25 on each machine. Howe
then bought the outstanding interest in his patent, established a factory in
New York, and from the profits of his manufacture, and the royalties, he
soon reaped a princely fortune of several million dollars. In six years his
royalties had grown from $300 to $200,000 a year, and in 1863 his
royalties were estimated at $4,000 a day.
A patent that occupied an important place in sewing machine feeds was
that granted to Bachelder May 8, 1849, No. 6,439, in which a spiked and
endless belt passed horizontally around two pulleys. This patent contained
the first continuous feed, and it was re-issued and extended, and ran with
dominating claims on the continuous feed, until 1877.
FIG. 145.—WILSON SEWING MACHINE, 1852.
From the foregoing table it will be seen that as far back as a quarter of a
century ago the output of machines was over half a million a year. By
1877 all of the fundamental patents on the sewing machine had expired,
but the continued activity of inventors in this field is attested by the fact
that to-day there are many thousands of patents relating to the sewing
machine and its parts. Besides those relating to the organization of the
machine itself there is an endless variety of attachments, such as
hemmers, tuckers, fellers, quilters, binders, gatherers and rufflers,
embroiderers, corders and button hole attachments. Every part of the
machine has also received separate attention and separate patents, all
tending to the perfection of the machine, until to-day, with all fundamental
principles public property, and endless improvements in details, it is
difficult to discriminate as to comparative excellence.
There is to-day a great variety of sewing machines on the market,
standard machines for ordinary work, and special machines for numerous
special applications. It is said that one concern alone manufactures over
four hundred different varieties of sewing machines.
One of the most important and revolutionary of the applications of the
sewing machine is for making shoes. Prior to 1861 shoemaking was
confined to the slow, laborious hand methods of the shoemaker. Cheap
shoes could only be made by roughly fastening the soles to the uppers by
wooden pegs, whose row of projecting points within has made many a
man and boy do unnecessary penance. Hand sewed shoes cost from $8 to
$12 a pair, and were too expensive a luxury for any but the rich. With the
McKay shoe sewing machine in 1861, however, comfortable shoes were
made, with the soles strongly and substantially sewed to the uppers, at a
less price even than the coarse and clumsy pegged variety. The McKay
machine was the result of more than three years’ patient study and work.
It was covered by United States patents No. 35,105, April 29, 1862; No.
35,165, May 6, 1862; No. 36,163, Aug. 12, 1862; and No. 45,422, Dec.
13, 1864, and its development cost $130,000 before practical results were
obtained. A modern form of it is shown in Fig. 147. In preparing a shoe for
the machine, an inner sole is placed on the last, the upper is then lasted
and its edges secured to the inner sole. An outer sole, channeled to
receive the stitches, is then tacked on so that the edges of the upper are
caught and retained between the two soles. The shoe is then placed on
the end of a rotary support called a horn, which holds it up to the needle.
A spool containing thread coated with shoemakers’ wax is carried by the
horn, and the thread, with its wax kept soft by a lamp, runs up the inside
of the horn to the whirl. The latter is a small ring placed at the upper end
of the horn, and through which there is an opening for the passage of the
needle. The needle has a barb, or hook, and as it descends through the
sole the whirl lays the thread in this hook, and as the needle rises it draws
the thread through the soles and forms a chain stitch in the external
channel of the outer sole. As the sewing proceeds, the horn is rotated so
as to bring every part of the margin of the sole under the needle. With
this machine a single operator has been able to sew nine hundred pairs of
shoes in a day of ten hours, and five hundred to six hundred pairs is only
an average workman’s output. It is said that up to 1877 there were
350,000,000 pairs of shoes made on this machine in the United States,
and probably an equal or greater number in Europe. Shoes made on this
machine were strongly made and comfortable, but they could not be
resoled by a shoemaker, except by pegging or nailing, and the soles were
furthermore somewhat stiff and lacking in flexibility. To meet these
difficulties, a new machine known as the “Goodyear Welt Machine,” was
patented in 1871 and 1875, and brought out a little later. This sewed a
welt to an upper, which welt in a subsequent operation was sewed by an
external row of stitches to the sole. This gave much greater flexibility, and
the further advantage of enabling a shoemaker to half sole the shoe by
the old method of hand sewing. This advanced the art of shoemaking in
the finer varieties of shoes, and to-day nearly all men’s fine shoes are
made in this way. The introduction of the sewing machine into the shoe
industry made a new era in foot wear, and it is said that no nation on
earth is so well and cheaply shod as the people of the United States.
In the harvest scenes upon the tombs of ancient Thebes the thirsty