
Data Mining: Concepts and Techniques, Third Edition

To Y. Dora and Lawrence for your love and encouragement
J.H.

To Erik, Kevan, Kian, and Mikael for your love and inspiration
M.K.

To my wife, Jennifer, and daughter, Jacqueline
J.P.
Contents

Foreword xix
Foreword to Second Edition xxi
Preface xxiii
Acknowledgments xxxi
About the Authors xxxv

Chapter 1 Introduction 1
1.1 Why Data Mining? 1
1.1.1 Moving toward the Information Age 1
1.1.2 Data Mining as the Evolution of Information Technology 2
1.2 What Is Data Mining? 5
1.3 What Kinds of Data Can Be Mined? 8
1.3.1 Database Data 9
1.3.2 Data Warehouses 10
1.3.3 Transactional Data 13
1.3.4 Other Kinds of Data 14
1.4 What Kinds of Patterns Can Be Mined? 15
1.4.1 Class/Concept Description: Characterization and Discrimination 15
1.4.2 Mining Frequent Patterns, Associations, and Correlations 17
1.4.3 Classification and Regression for Predictive Analysis 18
1.4.4 Cluster Analysis 19
1.4.5 Outlier Analysis 20
1.4.6 Are All Patterns Interesting? 21
1.5 Which Technologies Are Used? 23
1.5.1 Statistics 23
1.5.2 Machine Learning 24
1.5.3 Database Systems and Data Warehouses 26
1.5.4 Information Retrieval 26
1.6 Which Kinds of Applications Are Targeted? 27
1.6.1 Business Intelligence 27
1.6.2 Web Search Engines 28
1.7 Major Issues in Data Mining 29
1.7.1 Mining Methodology 29
1.7.2 User Interaction 30
1.7.3 Efficiency and Scalability 31
1.7.4 Diversity of Database Types 32
1.7.5 Data Mining and Society 32
1.8 Summary 33
1.9 Exercises 34
1.10 Bibliographic Notes 35
Chapter 2 Getting to Know Your Data 39
2.1 Data Objects and Attribute Types 40
2.1.1 What Is an Attribute? 40
2.1.2 Nominal Attributes 41
2.1.3 Binary Attributes 41
2.1.4 Ordinal Attributes 42
2.1.5 Numeric Attributes 43
2.1.6 Discrete versus Continuous Attributes 44
2.2 Basic Statistical Descriptions of Data 44
2.2.1 Measuring the Central Tendency: Mean, Median, and Mode 45
2.2.2 Measuring the Dispersion of Data: Range, Quartiles, Variance, Standard Deviation, and Interquartile Range 48
2.2.3 Graphic Displays of Basic Statistical Descriptions of Data 51
2.3 Data Visualization 56
2.3.1 Pixel-Oriented Visualization Techniques 57
2.3.2 Geometric Projection Visualization Techniques 58
2.3.3 Icon-Based Visualization Techniques 60
2.3.4 Hierarchical Visualization Techniques 63
2.3.5 Visualizing Complex Data and Relations 64
2.4 Measuring Data Similarity and Dissimilarity 65
2.4.1 Data Matrix versus Dissimilarity Matrix 67
2.4.2 Proximity Measures for Nominal Attributes 68
2.4.3 Proximity Measures for Binary Attributes 70
2.4.4 Dissimilarity of Numeric Data: Minkowski Distance 72
2.4.5 Proximity Measures for Ordinal Attributes 74
2.4.6 Dissimilarity for Attributes of Mixed Types 75
2.4.7 Cosine Similarity 77
2.5 Summary 79
2.6 Exercises 79
2.7 Bibliographic Notes 81

Chapter 3 Data Preprocessing 83
3.1 Data Preprocessing: An Overview 84
3.1.1 Data Quality: Why Preprocess the Data? 84
3.1.2 Major Tasks in Data Preprocessing 85
3.2 Data Cleaning 88
3.2.1 Missing Values 88
3.2.2 Noisy Data 89
3.2.3 Data Cleaning as a Process 91
3.3 Data Integration 93
3.3.1 Entity Identification Problem 94
3.3.2 Redundancy and Correlation Analysis 94
3.3.3 Tuple Duplication 98
3.3.4 Data Value Conflict Detection and Resolution 99
3.4 Data Reduction 99
3.4.1 Overview of Data Reduction Strategies 99
3.4.2 Wavelet Transforms 100
3.4.3 Principal Components Analysis 102
3.4.4 Attribute Subset Selection 103
3.4.5 Regression and Log-Linear Models: Parametric Data Reduction 105
3.4.6 Histograms 106
3.4.7 Clustering 108
3.4.8 Sampling 108
3.4.9 Data Cube Aggregation 110
3.5 Data Transformation and Data Discretization 111
3.5.1 Data Transformation Strategies Overview 112
3.5.2 Data Transformation by Normalization 113
3.5.3 Discretization by Binning 115
3.5.4 Discretization by Histogram Analysis 115
3.5.5 Discretization by Cluster, Decision Tree, and Correlation Analyses 116
3.5.6 Concept Hierarchy Generation for Nominal Data 117
3.6 Summary 120
3.7 Exercises 121
3.8 Bibliographic Notes 123

Chapter 4 Data Warehousing and Online Analytical Processing 125
4.1 Data Warehouse: Basic Concepts 125
4.1.1 What Is a Data Warehouse? 126
4.1.2 Differences between Operational Database Systems and Data Warehouses 128
4.1.3 But, Why Have a Separate Data Warehouse? 129
4.1.4 Data Warehousing: A Multitiered Architecture 130
4.1.5 Data Warehouse Models: Enterprise Warehouse, Data Mart, and Virtual Warehouse 132
4.1.6 Extraction, Transformation, and Loading 134
4.1.7 Metadata Repository 134
4.2 Data Warehouse Modeling: Data Cube and OLAP 135
4.2.1 Data Cube: A Multidimensional Data Model 136
4.2.2 Stars, Snowflakes, and Fact Constellations: Schemas for Multidimensional Data Models 139
4.2.3 Dimensions: The Role of Concept Hierarchies 142
4.2.4 Measures: Their Categorization and Computation 144
4.2.5 Typical OLAP Operations 146
4.2.6 A Starnet Query Model for Querying Multidimensional Databases 149
4.3 Data Warehouse Design and Usage 150
4.3.1 A Business Analysis Framework for Data Warehouse Design 150
4.3.2 Data Warehouse Design Process 151
4.3.3 Data Warehouse Usage for Information Processing 153
4.3.4 From Online Analytical Processing to Multidimensional Data Mining 155
4.4 Data Warehouse Implementation 156
4.4.1 Efficient Data Cube Computation: An Overview 156
4.4.2 Indexing OLAP Data: Bitmap Index and Join Index 160
4.4.3 Efficient Processing of OLAP Queries 163
4.4.4 OLAP Server Architectures: ROLAP versus MOLAP versus HOLAP 164
4.5 Data Generalization by Attribute-Oriented Induction 166
4.5.1 Attribute-Oriented Induction for Data Characterization 167
4.5.2 Efficient Implementation of Attribute-Oriented Induction 172
4.5.3 Attribute-Oriented Induction for Class Comparisons 175
4.6 Summary 178
4.7 Exercises 180
4.8 Bibliographic Notes 184
Chapter 5 Data Cube Technology 187
5.1 Data Cube Computation: Preliminary Concepts 188
5.1.1 Cube Materialization: Full Cube, Iceberg Cube, Closed Cube, and Cube Shell 188
5.1.2 General Strategies for Data Cube Computation 192
5.2 Data Cube Computation Methods 194
5.2.1 Multiway Array Aggregation for Full Cube Computation 195
5.2.2 BUC: Computing Iceberg Cubes from the Apex Cuboid Downward 200
5.2.3 Star-Cubing: Computing Iceberg Cubes Using a Dynamic Star-Tree Structure 204
5.2.4 Precomputing Shell Fragments for Fast High-Dimensional OLAP 210
5.3 Processing Advanced Kinds of Queries by Exploring Cube Technology 218
5.3.1 Sampling Cubes: OLAP-Based Mining on Sampling Data 218
5.3.2 Ranking Cubes: Efficient Computation of Top-k Queries 225
5.4 Multidimensional Data Analysis in Cube Space 227
5.4.1 Prediction Cubes: Prediction Mining in Cube Space 227
5.4.2 Multifeature Cubes: Complex Aggregation at Multiple Granularities 230
5.4.3 Exception-Based, Discovery-Driven Cube Space Exploration 231
5.5 Summary 234
5.6 Exercises 235
5.7 Bibliographic Notes 240

Chapter 6 Mining Frequent Patterns, Associations, and Correlations: Basic Concepts and Methods 243
6.1 Basic Concepts 243
6.1.1 Market Basket Analysis: A Motivating Example 244
6.1.2 Frequent Itemsets, Closed Itemsets, and Association Rules 246
6.2 Frequent Itemset Mining Methods 248
6.2.1 Apriori Algorithm: Finding Frequent Itemsets by Confined Candidate Generation 248
6.2.2 Generating Association Rules from Frequent Itemsets 254
6.2.3 Improving the Efficiency of Apriori 254
6.2.4 A Pattern-Growth Approach for Mining Frequent Itemsets 257
6.2.5 Mining Frequent Itemsets Using Vertical Data Format 259
6.2.6 Mining Closed and Max Patterns 262
6.3 Which Patterns Are Interesting?—Pattern Evaluation Methods 264
6.3.1 Strong Rules Are Not Necessarily Interesting 264
6.3.2 From Association Analysis to Correlation Analysis 265
6.3.3 A Comparison of Pattern Evaluation Measures 267
6.4 Summary 271
6.5 Exercises 273
6.6 Bibliographic Notes 276

Chapter 7 Advanced Pattern Mining 279
7.1 Pattern Mining: A Road Map 279
7.2 Pattern Mining in Multilevel, Multidimensional Space 283
7.2.1 Mining Multilevel Associations 283
7.2.2 Mining Multidimensional Associations 287
7.2.3 Mining Quantitative Association Rules 289
7.2.4 Mining Rare Patterns and Negative Patterns 291
7.3 Constraint-Based Frequent Pattern Mining 294
7.3.1 Metarule-Guided Mining of Association Rules 295
7.3.2 Constraint-Based Pattern Generation: Pruning Pattern Space and Pruning Data Space 296
7.4 Mining High-Dimensional Data and Colossal Patterns 301
7.4.1 Mining Colossal Patterns by Pattern-Fusion 302
7.5 Mining Compressed or Approximate Patterns 307
7.5.1 Mining Compressed Patterns by Pattern Clustering 308
7.5.2 Extracting Redundancy-Aware Top-k Patterns 310
7.6 Pattern Exploration and Application 313
7.6.1 Semantic Annotation of Frequent Patterns 313
7.6.2 Applications of Pattern Mining 317
7.7 Summary 319
7.8 Exercises 321
7.9 Bibliographic Notes 323

Chapter 8 Classification: Basic Concepts 327
8.1 Basic Concepts 327
8.1.1 What Is Classification? 327
8.1.2 General Approach to Classification 328
8.2 Decision Tree Induction 330
8.2.1 Decision Tree Induction 332
8.2.2 Attribute Selection Measures 336
8.2.3 Tree Pruning 344
8.2.4 Scalability and Decision Tree Induction 347
8.2.5 Visual Mining for Decision Tree Induction 348
8.3 Bayes Classification Methods 350
8.3.1 Bayes’ Theorem 350
8.3.2 Naïve Bayesian Classification 351
8.4 Rule-Based Classification 355
8.4.1 Using IF-THEN Rules for Classification 355
8.4.2 Rule Extraction from a Decision Tree 357
8.4.3 Rule Induction Using a Sequential Covering Algorithm 359
8.5 Model Evaluation and Selection 364
8.5.1 Metrics for Evaluating Classifier Performance 364
8.5.2 Holdout Method and Random Subsampling 370
8.5.3 Cross-Validation 370
8.5.4 Bootstrap 371
8.5.5 Model Selection Using Statistical Tests of Significance 372
8.5.6 Comparing Classifiers Based on Cost–Benefit and ROC Curves 373
8.6 Techniques to Improve Classification Accuracy 377
8.6.1 Introducing Ensemble Methods 378
8.6.2 Bagging 379
8.6.3 Boosting and AdaBoost 380
8.6.4 Random Forests 382
8.6.5 Improving Classification Accuracy of Class-Imbalanced Data 383
8.7 Summary 385
8.8 Exercises 386
8.9 Bibliographic Notes 389
Chapter 9 Classification: Advanced Methods 393
9.1 Bayesian Belief Networks 393
9.1.1 Concepts and Mechanisms 394
9.1.2 Training Bayesian Belief Networks 396
9.2 Classification by Backpropagation 398
9.2.1 A Multilayer Feed-Forward Neural Network 398
9.2.2 Defining a Network Topology 400
9.2.3 Backpropagation 400
9.2.4 Inside the Black Box: Backpropagation and Interpretability 406
9.3 Support Vector Machines 408
9.3.1 The Case When the Data Are Linearly Separable 408
9.3.2 The Case When the Data Are Linearly Inseparable 413
9.4 Classification Using Frequent Patterns 415
9.4.1 Associative Classification 416
9.4.2 Discriminative Frequent Pattern–Based Classification 419
9.5 Lazy Learners (or Learning from Your Neighbors) 422
9.5.1 k-Nearest-Neighbor Classifiers 423
9.5.2 Case-Based Reasoning 425
9.6 Other Classification Methods 426
9.6.1 Genetic Algorithms 426
9.6.2 Rough Set Approach 427
9.6.3 Fuzzy Set Approaches 428
9.7 Additional Topics Regarding Classification 429
9.7.1 Multiclass Classification 430
9.7.2 Semi-Supervised Classification 432
9.7.3 Active Learning 433
9.7.4 Transfer Learning 434
9.8 Summary 436
9.9 Exercises 438
9.10 Bibliographic Notes 439
Chapter 10 Cluster Analysis: Basic Concepts and Methods 443
10.1 Cluster Analysis 444
10.1.1 What Is Cluster Analysis? 444
10.1.2 Requirements for Cluster Analysis 445
10.1.3 Overview of Basic Clustering Methods 448
10.2 Partitioning Methods 451
10.2.1 k-Means: A Centroid-Based Technique 451
10.2.2 k-Medoids: A Representative Object-Based Technique 454
10.3 Hierarchical Methods 457
10.3.1 Agglomerative versus Divisive Hierarchical Clustering 459
10.3.2 Distance Measures in Algorithmic Methods 461
10.3.3 BIRCH: Multiphase Hierarchical Clustering Using Clustering Feature Trees 462
10.3.4 Chameleon: Multiphase Hierarchical Clustering Using Dynamic Modeling 466
10.3.5 Probabilistic Hierarchical Clustering 467
10.4 Density-Based Methods 471
10.4.1 DBSCAN: Density-Based Clustering Based on Connected Regions with High Density 471
10.4.2 OPTICS: Ordering Points to Identify the Clustering Structure 473
10.4.3 DENCLUE: Clustering Based on Density Distribution Functions 476
10.5 Grid-Based Methods 479
10.5.1 STING: STatistical INformation Grid 479
10.5.2 CLIQUE: An Apriori-like Subspace Clustering Method 481
10.6 Evaluation of Clustering 483
10.6.1 Assessing Clustering Tendency 484
10.6.2 Determining the Number of Clusters 486
10.6.3 Measuring Clustering Quality 487
10.7 Summary 490
10.8 Exercises 491
10.9 Bibliographic Notes 494

Chapter 11 Advanced Cluster Analysis 497
11.1 Probabilistic Model-Based Clustering 497
11.1.1 Fuzzy Clusters 499
11.1.2 Probabilistic Model-Based Clusters 501
11.1.3 Expectation-Maximization Algorithm 505
11.2 Clustering High-Dimensional Data 508
11.2.1 Clustering High-Dimensional Data: Problems, Challenges, and Major Methodologies 508
11.2.2 Subspace Clustering Methods 510
11.2.3 Biclustering 512
11.2.4 Dimensionality Reduction Methods and Spectral Clustering 519
11.3 Clustering Graph and Network Data 522
11.3.1 Applications and Challenges 523
11.3.2 Similarity Measures 525
11.3.3 Graph Clustering Methods 528
11.4 Clustering with Constraints 532
11.4.1 Categorization of Constraints 533
11.4.2 Methods for Clustering with Constraints 535
11.5 Summary 538
11.6 Exercises 539
11.7 Bibliographic Notes 540

Chapter 12 Outlier Detection 543
12.1 Outliers and Outlier Analysis 544
12.1.1 What Are Outliers? 544
12.1.2 Types of Outliers 545
12.1.3 Challenges of Outlier Detection 548
12.2 Outlier Detection Methods 549
12.2.1 Supervised, Semi-Supervised, and Unsupervised Methods 549
12.2.2 Statistical Methods, Proximity-Based Methods, and Clustering-Based Methods 551
12.3 Statistical Approaches 553
12.3.1 Parametric Methods 553
12.3.2 Nonparametric Methods 558
12.4 Proximity-Based Approaches 560
12.4.1 Distance-Based Outlier Detection and a Nested Loop Method 561
12.4.2 A Grid-Based Method 562
12.4.3 Density-Based Outlier Detection 564
12.5 Clustering-Based Approaches 567
12.6 Classification-Based Approaches 571
12.7 Mining Contextual and Collective Outliers 573
12.7.1 Transforming Contextual Outlier Detection to Conventional Outlier Detection 573
12.7.2 Modeling Normal Behavior with Respect to Contexts 574
12.7.3 Mining Collective Outliers 575
12.8 Outlier Detection in High-Dimensional Data 576
12.8.1 Extending Conventional Outlier Detection 577
12.8.2 Finding Outliers in Subspaces 578
12.8.3 Modeling High-Dimensional Outliers 579
12.9 Summary 581
12.10 Exercises 582
12.11 Bibliographic Notes 583
Chapter 13 Data Mining Trends and Research Frontiers 585
13.1 Mining Complex Data Types 585
13.1.1 Mining Sequence Data: Time-Series, Symbolic Sequences, and Biological Sequences 586
13.1.2 Mining Graphs and Networks 591
13.1.3 Mining Other Kinds of Data 595
13.2 Other Methodologies of Data Mining 598
13.2.1 Statistical Data Mining 598
13.2.2 Views on Data Mining Foundations 600
13.2.3 Visual and Audio Data Mining 602
13.3 Data Mining Applications 607
13.3.1 Data Mining for Financial Data Analysis 607
13.3.2 Data Mining for Retail and Telecommunication Industries 609
13.3.3 Data Mining in Science and Engineering 611
13.3.4 Data Mining for Intrusion Detection and Prevention 614
13.3.5 Data Mining and Recommender Systems 615
13.4 Data Mining and Society 618
13.4.1 Ubiquitous and Invisible Data Mining 618
13.4.2 Privacy, Security, and Social Impacts of Data Mining 620
13.5 Data Mining Trends 622
13.6 Summary 625
13.7 Exercises 626
13.8 Bibliographic Notes 628

Bibliography 633
Index 673
Foreword

Analyzing large amounts of data is a necessity. Even popular science books, like “Super
Crunchers,” give compelling cases where large amounts of data yield discoveries and
intuitions that surprise even experts. Every enterprise benefits from collecting and
analyzing its data: Hospitals can spot trends and anomalies in their patient records, search
engines can do better ranking and ad placement, and environmental and public health
agencies can spot patterns and abnormalities in their data. The list continues, with
cybersecurity and computer network intrusion detection; monitoring of the energy
consumption of household appliances; pattern analysis in bioinformatics and pharmaceutical
data; financial and business intelligence data; spotting trends in blogs, Twitter,
and many more. Storage is inexpensive and getting even less so, as are data sensors. Thus,
collecting and storing data is easier than ever before.
The problem then becomes how to analyze the data. This is exactly the focus of this
Third Edition of the book. Jiawei, Micheline, and Jian give encyclopedic coverage of all
the related methods, from the classic topics of clustering and classification, to database
methods (e.g., association rules, data cubes) to more recent and advanced topics (e.g.,
SVD/PCA, wavelets, support vector machines).
The exposition is extremely accessible to beginners and advanced readers alike. The
book gives the fundamental material first and the more advanced material in follow-up
chapters. It also has numerous rhetorical questions, which I found extremely helpful for
maintaining focus.
We have used the first two editions as textbooks in data mining courses at Carnegie
Mellon and plan to continue to do so with this Third Edition. The new version has
significant additions: Notably, it has more than 100 citations to works from 2006
onward, focusing on more recent material such as graphs and social networks, sensor
networks, and outlier detection. This book has a new section for visualization, has
expanded outlier detection into a whole chapter, and has separate chapters for advanced
methods, for example, pattern mining with top-k patterns and more, and clustering
methods with biclustering and graph clustering.
Overall, it is an excellent book on classic and modern data mining methods, and it is
ideal not only for teaching but also as a reference book.

Christos Faloutsos
Carnegie Mellon University
Foreword to Second Edition

We are deluged by data—scientific data, medical data, demographic data, financial data,
and marketing data. People have no time to look at this data. Human attention has
become the precious resource. So, we must find ways to automatically analyze the
data, to automatically classify it, to automatically summarize it, to automatically discover
and characterize trends in it, and to automatically flag anomalies. This is one
of the most active and exciting areas of the database research community. Researchers
in areas including statistics, visualization, artificial intelligence, and machine learning
are contributing to this field. The breadth of the field makes it difficult to grasp the
extraordinary progress over the last few decades.
Six years ago, Jiawei Han’s and Micheline Kamber’s seminal textbook organized and
presented Data Mining. It heralded a golden age of innovation in the field. This revision
of their book reflects that progress; more than half of the references and historical notes
are to recent work. The field has matured with many new and improved algorithms, and
has broadened to include many more datatypes: streams, sequences, graphs, time-series,
geospatial, audio, images, and video. We are certainly not at the end of the golden age—
indeed research and commercial interest in data mining continues to grow—but we are
all fortunate to have this modern compendium.
The book gives quick introductions to database and data mining concepts with
particular emphasis on data analysis. It then covers in a chapter-by-chapter tour the
concepts and techniques that underlie classification, prediction, association, and clustering.
These topics are presented with examples, a tour of the best algorithms for each
problem class, and with pragmatic rules of thumb about when to apply each technique.
The Socratic presentation style is both very readable and very informative. I certainly
learned a lot from reading the first edition and got re-educated and updated in reading
the second edition.
Jiawei Han and Micheline Kamber have been leading contributors to data mining
research. This is the text they use with their students to bring them up to speed on
the field. The field is evolving very rapidly, but this book is a quick way to learn the
basic ideas, and to understand where the field is today. I found it very informative and
stimulating, and believe you will too.

Jim Gray
In his memory
Preface

The computerization of our society has substantially enhanced our capabilities for both
generating and collecting data from diverse sources. A tremendous amount of data has
flooded almost every aspect of our lives. This explosive growth in stored or transient
data has generated an urgent need for new techniques and automated tools that can
intelligently assist us in transforming the vast amounts of data into useful information
and knowledge. This has led to the generation of a promising and flourishing frontier
in computer science called data mining, and its various applications. Data mining, also
popularly referred to as knowledge discovery from data (KDD), is the automated or convenient
extraction of patterns representing knowledge implicitly stored or captured in
large databases, data warehouses, the Web, other massive information repositories, or
data streams.
This book explores the concepts and techniques of knowledge discovery and data mining.
As a multidisciplinary field, data mining draws on work from areas including statistics,
machine learning, pattern recognition, database technology, information retrieval,
network science, knowledge-based systems, artificial intelligence, high-performance
computing, and data visualization. We focus on issues relating to the feasibility, usefulness,
effectiveness, and scalability of techniques for the discovery of patterns hidden
in large data sets. As a result, this book is not intended as an introduction to statistics,
machine learning, database systems, or other such areas, although we do provide
some background knowledge to facilitate the reader’s comprehension of their respective
roles in data mining. Rather, the book is a comprehensive introduction to data mining.
It is useful for computing science students, application developers, and business
professionals, as well as researchers involved in any of the disciplines previously listed.
Data mining emerged during the late 1980s, made great strides during the 1990s, and
continues to flourish into the new millennium. This book presents an overall picture
of the field, introducing interesting data mining techniques and systems and discussing
applications and research directions. An important motivation for writing this book was
the need to build an organized framework for the study of data mining—a challenging
task, owing to the extensive multidisciplinary nature of this fast-developing field. We
hope that this book will encourage people with different backgrounds and experiences
to exchange their views regarding data mining so as to contribute toward the further
promotion and shaping of this exciting and dynamic field.

Organization of the Book

Since the publication of the first two editions of this book, great progress has been
made in the field of data mining. Many new data mining methodologies, systems, and
applications have been developed, especially for handling new kinds of data, including
information networks, graphs, complex structures, and data streams, as well as text,
Web, multimedia, time-series, and spatiotemporal data. Such fast development and rich,
new technical contents make it difficult to cover the full spectrum of the field in a single
book. Instead of continuously expanding the coverage of this book, we have decided to
cover the core material in sufficient scope and depth, and leave the handling of complex
data types to a separate forthcoming book.
The third edition substantially revises the first two editions of the book, with numerous
enhancements and a reorganization of the technical contents. The core technical
material, which handles mining on general data types, is expanded and substantially
enhanced. Several individual chapters for topics from the second edition (e.g., data preprocessing,
frequent pattern mining, classification, and clustering) are now augmented
and each split into two chapters for this new edition. For these topics, one chapter encapsulates
the basic concepts and techniques while the other presents advanced concepts
and methods.
Chapters from the second edition on mining complex data types (e.g., stream data,
sequence data, graph-structured data, social network data, and multirelational data,
as well as text, Web, multimedia, and spatiotemporal data) are now reserved for a new
book that will be dedicated to advanced topics in data mining. Still, to support readers
in learning such advanced topics, we have placed an electronic version of the relevant
chapters from the second edition onto the book’s web site as companion material for
the third edition.
The chapters of the third edition are described briefly as follows, with emphasis on
the new material.
Chapter 1 provides an introduction to the multidisciplinary field of data mining. It
discusses the evolutionary path of information technology, which has led to the need
for data mining, and the importance of its applications. It examines the data types to be
mined, including relational, transactional, and data warehouse data, as well as complex
data types such as time-series, sequences, data streams, spatiotemporal data, multimedia
data, text data, graphs, social networks, and Web data. The chapter presents a general
classification of data mining tasks, based on the kinds of knowledge to be mined, the
kinds of technologies used, and the kinds of applications that are targeted. Finally, major
challenges in the field are discussed.
Chapter 2 introduces the general data features. It first discusses data objects and
attribute types and then introduces typical measures for basic statistical data descriptions.
It overviews data visualization techniques for various kinds of data. In addition
to methods of numeric data visualization, methods for visualizing text, tags, graphs,
and multidimensional data are introduced. Chapter 2 also introduces ways to measure
similarity and dissimilarity for various kinds of data.
may be written, and in special cases, when copying from memory, a speed
of 150 words a minute has been maintained for a limited time. It was
estimated that in 1896 there were 150,000 typewriters in use in the United States,
and that up to that time 450,000 had been made altogether.
In the last four years this number has been greatly increased, and a fair
estimate of the present output in the United States is between 75,000 and
100,000 yearly. In 1898 there were exported from the United States
typewriting machines to the value of $1,902,153.
The typewriter has not only revolutionized modern business methods, by
furnishing a quick and legible copy that may be rapidly taken from
dictation, and also at the same time a duplicate carbon copy for the use of
the writer, but it has established a distinct avocation especially adapted to
the deftness and skill of women, who as bread winners at the end of the
Nineteenth Century are working out a destiny and place in the business
activities of life unthought of a hundred years ago. The typewriter saves
time, labor, postage and paper; it reduces the liability to mistakes, brings
system into official correspondence, and delights the heart of the printer.
It furnishes profitable amusement to the young, and satisfactory aid to the
nervous and paralytic. All over the world it has already traveled—from the
counting house of the merchant to the Imperial Courts of Europe, from
the home of the new woman in the Western Hemisphere to the harem of
the East—everywhere its familiar click is to be heard, faithfully translating
thought into all languages, and for all peoples.
CHAPTER XV.
The Sewing Machine.

Embroidering Machine, the Forerunner of the Sewing Machine—Sewing Machine of
Thomas Saint—The Thimonnier Wooden Machine—Greenough’s Double Pointed Needle
—Bean’s Stationary Needle—The Howe Sewing Machine—Bachelder’s Continuous Feed
—Improvements of Singer—Wilson’s Rotary Hook and Four-Motion Feed—The McKay
Shoe Sewing Machine—Buttonhole Machines—Carpet Sewing Machine—Statistics.

“With fingers weary and worn,
With eyelids heavy and red,
A woman sat in unwomanly rags,
Plying her needle and thread—
Stitch! Stitch! Stitch!
In poverty, hunger and dirt,
And still with a voice of dolorous pitch,
She sang the ‘Song of the Shirt.’”

In 1844 Thomas Hood wrote and published his famous “Song of the
Shirt,” in which the drudgery of the needle is portrayed with pathetic
fidelity. It is not to be supposed that any relation of cause and effect
exists between the events, but it is nevertheless a singular fact that
about this time Howe commenced work on his great invention, which
was patented in 1846, and was the prototype of the modern sewing
machine. If the sewing machine had appeared a few years earlier, the
“Song of the Shirt” would doubtless never have been written.
From the time of Mother Eve, who crudely stitched together her fig leaves,
sewing seems to have been set apart as an occupation peculiarly
belonging to women, and it may be that this was the reason why in the
history of mechanical progress the sewing machine was so late appearing,
for women are not, as a rule, inventors, and none of the sewing machines
were invented by women.
In all the preceding centuries of civilization hand sewing was exclusively
employed, and it was reserved for the Nineteenth Century to relieve
women from the drudgery which for so many centuries had enslaved
them.
Embroidery machines had been patented in England by Weisenthal in
1755, and Alsop in 1770, and on July 17, 1790, an English patent, No.
1,764, was granted to Thomas Saint for a crude form of sewing machine,
having a horizontal arm and vertical needle. In 1826 a patent was granted
in the United States to one Lye for a sewing machine, but no records of
it remain, as all were burned in the Patent Office fire of 1836. In 1830 B.
Thimonnier patented a sewing machine in France, 80 of which, made of
wood, were in use in 1841 for sewing army clothing, but they were
destroyed by a mob, as many other labor-saving inventions had been
before. Between 1832 and 1835 Walter Hunt, of New York, made a lock-
stitch sewing machine, but abandoned it. On Feb. 21, 1842, U. S. Pat. No.
2,466 was granted to J. J. Greenough for a sewing machine having a
double pointed needle with an eye in the middle, which needle was drawn
through the work by pairs of traveling pincers. It was designed for sewing
leather, and an awl pierced the hole in advance of the needle. On March 4,
1843, U. S. Pat. No. 2,982 was granted to B. W. Bean for a sewing
machine in which the needle was stationary, and the cloth was gathered in
crimps or folds and forced over the stationary needle. In 1844, British Pat.
No. 10,424 was granted to Fisher and Gibbons for working ornamental
designs by machinery, in which two threads were looped together, one
passing through the fabric, and the other looping with it on the surface
without passing through.
The great epoch of the sewing machine, however, begins with Elias Howe
and the sewing machine patented by him Sept. 10, 1846, No. 4,750.
Almost everyone is familiar with the modern Howe sewing machine, and it
will be therefore more interesting to present the form in which it originally
appeared. This is shown in Fig. 144. A curved eye-pointed needle was
carried at the end of a pendent vibrating lever, which had a motion
simulating that of a pick-ax in the hands of a workman. The needle took
its thread from a spool situated above the lever, and the tension on the
thread was produced by a spring brake whose semicircular end bore upon
the spool, the pressure being regulated by a vertical thumb screw. The
work was held in a vertical plane by means of a horizontal row of pins
projecting from the edge of a thin metal “baster plate,” to which an
intermittent motion was given by the teeth of a pinion. Above, and to one
side of the “baster plate” was the shuttle race, through which the shuttle
carrying the second thread was driven by two strikers, which were
operated by two arms and cams located on the horizontal main shaft. As
will be seen, this machine bears but little resemblance to any of the
modern machines, but it embodied the three essential features which
characterize almost all practical machines, viz.: a grooved needle with the
eye at the point, a shuttle operating on the opposite side of the cloth from
the needle to form a lock stitch, and an automatic feed.

FIG. 144.—HOWE’S SEWING MACHINE, 1846.

Howe first commenced his work on the sewing machine in 1844, and
although he had made a rough model of that date, he was too poor to
follow it up with more practical results until a former schoolmate, George
Fisher, provided $500 to build a machine and support his family while it
was being constructed, in consideration of which Mr. Fisher was to receive
a half interest in the invention. In April, 1845, the machine was
completed, and in July he sewed two suits of clothes on it, one for Mr.
Fisher and the other for himself. Notwithstanding the success of his
machine, which on public exhibition beat five of the swiftest hand sewers,
he met only discouragement and disappointment. He, however, built a
second machine, which was the basis of his patent, and is the one shown
in the illustration. After obtaining his United States patent Howe went to
England with the hope of introducing his machine there, but, failing, he
returned to America, some years later, only to find that his invention had
been taken up by infringers, and that sewing machines embodying his
invention were being built and sold. These infringers sought to break his
patent by endeavoring to prove, but without success, that Howe’s
invention was anticipated by the abandoned experiments of Walter Hunt
in 1834. Howe won his suit, and the infringers were obliged to pay him
royalties, which, for a time, amounted to $25 on each machine. Howe
then bought the outstanding interest in his patent, established a factory in
New York, and from the profits of his manufacture, and the royalties, he
soon reaped a princely fortune of several million dollars. In six years his
royalties had grown from $300 to $200,000 a year, and in 1863 his
royalties were estimated at $4,000 a day.
A patent that occupied an important place in sewing machine feeds was
that granted to Bachelder May 8, 1849, No. 6,439, in which a spiked and
endless belt passed horizontally around two pulleys. This patent contained
the first continuous feed, and it was re-issued and extended, and ran with
dominating claims on the continuous feed, until 1877.
FIG. 145.—WILSON SEWING MACHINE, 1852.

In connection with the development of the sewing machine the name of
A. B. Wilson stands next in rank to that of Howe. Wilson invented the
rotary hook carrying a bobbin, which took the place of the reciprocating
shuttle. This was patented by him June 15, 1852, No. 9,041, and is shown
in Fig. 145. He also invented the far more important improvement of the
four-motion feed, which is a characteristic feature of nearly all practical
family sewing machines. This four-motion feed was pooled in the early
sewing machine combination with the Bachelder and other patents, and
earned for its promoters a far greater pecuniary return than the original
Howe sewing machine itself. Estimates place this profit high in the
millions. The four-motion feed was patented December 19, 1854, No.
12,116, and it is a comparatively simple affair. Divested of its operating
mechanism, it consists simply of a little metal bar serrated with forwardly
projecting saw teeth on its upper surface, to which bar, by means of an
operating cam, a motion in four directions in the path of a rectangle is
given. The serrated bar first rises through a slot in the table, then moves
horizontally to advance the cloth, then drops below the table, and finally
moves back again horizontally below the table to its starting point.
Upon these two important features—the rotating hook patented by Wilson
in 1852, and the four-motion feed, patented in 1854—a large and
important business was built. In this business Mr. Nathaniel Wheeler was
associated with Mr. Wilson, and the well-known Wheeler & Wilson
machines are the result of their enterprise and ingenuity.

FIG. 146.—ORIGINAL SINGER SEWING MACHINE.

Contemporaneous with the Wheeler & Wilson machine were other
excellent machines, among which may be mentioned the Singer machine,
patented Aug. 12, 1851, No. 8,294, by Isaac M. Singer, the original model
of which is shown in Fig. 146. The Singer machine met the demands of
the tailoring and leather industries for a heavier and more powerful
machine. A characteristic feature was the vertical standard with horizontal
arm above the work table, which was afterwards adopted in many other
machines. Singer was the first to apply the treadle to the sewing machine
for actuating it by foot power in the place of the hand-driven crank wheel.
In 1851 W. O. Grover and W. E. Baker patented a machine which made
the double chain stitch, characteristic of the Grover & Baker machine.
James E. A. Gibbs invented and covered in several patents from 1856 to
1860 the single-thread rotating hook, which was embodied in the Wilcox &
Gibbs machine. In addition to these, the “Weed” machine, made under
Fairfield’s patents; the “Domestic” machine, made under Mack’s patents;
and the “Florence” machine, made under Langdon’s patents, were other
representative machines, which, in a few years after Howe’s patent,
helped to revolutionize the art of tailoring, introduced the great era of
ready-made clothing and ready-made shoes, emancipated women from
the drudgery of the needle, and increased the efficiency of one pair of
hands fully tenfold.
In 1856 the owners of the original sewing machine patents formed the
famous “sewing machine combination,” for the establishment of a
common license fee, and for the protection of their mutual interests. The
combination included Elias Howe, the Wheeler & Wilson Manufacturing
Company, the Grover & Baker Sewing Machine Company, and I. M. Singer
& Co. The following summary of machines made by the leading companies
from 1853 to 1876 illustrates the early growth of this industry:
Manufacturer.                          1853.    1859.    1867.    1871.    1873.    1876.
Wheeler & Wilson Manufacturing Co.       799   21,306   38,055  128,526  119,190  108,997
The Singer Manufacturing Company         810   10,953   43,053  181,260  232,444  262,316
Grover & Baker Sewing Machine Co.        657   10,280   32,999   50,838   36,179     ....
Howe Sewing Machine Company             ....     ....   11,053  134,010   90,000  109,294
Wilcox & Gibbs Sewing Machine Co.       ....     ....   14,152   30,127   15,881   12,758
Domestic Sewing Machine Company         ....     ....     ....   10,397   40,114   23,587

From the foregoing table it will be seen that as far back as a quarter of a
century ago the output of machines was over half a million a year. By
1877 all of the fundamental patents on the sewing machine had expired,
but the continued activity of inventors in this field is attested by the fact
that to-day there are many thousands of patents relating to the sewing
machine and its parts. Besides those relating to the organization of the
machine itself there is an endless variety of attachments, such as
hemmers, tuckers, fellers, quilters, binders, gatherers and rufflers,
embroiderers, corders and button hole attachments. Every part of the
machine has also received separate attention and separate patents, all
tending to the perfection of the machine, until to-day, with all fundamental
principles public property, and endless improvements in details, it is
difficult to discriminate as to comparative excellence.
There is to-day a great variety of sewing machines on the market,
standard machines for ordinary work, and special machines for numerous
special applications. It is said that one concern alone manufactures over
four hundred different varieties of sewing machines.
One of the most important and revolutionary of the applications of the
sewing machine is for making shoes. Prior to 1861 shoemaking was
confined to the slow, laborious hand methods of the shoemaker. Cheap
shoes could only be made by roughly fastening the soles to the uppers by
wooden pegs, whose row of projecting points within has made many a
man and boy do unnecessary penance. Hand sewed shoes cost from $8 to
$12 a pair, and were too expensive a luxury for any but the rich. With the
McKay shoe sewing machine in 1861, however, comfortable shoes were
made, with the soles strongly and substantially sewed to the uppers, at a
less price even than the coarse and clumsy pegged variety. The McKay
machine was the result of more than three years’ patient study and work.
It was covered by United States patents No. 35,105, April 29, 1862; No.
35,165, May 6, 1862; No. 36,163, Aug. 12, 1862; and No. 45,422, Dec.
13, 1864, and its development cost $130,000 before practical results were
obtained. A modern form of it is shown in Fig. 147. In preparing a shoe for
the machine, an inner sole is placed on the last, the upper is then lasted
and its edges secured to the inner sole. An outer sole, channeled to
receive the stitches, is then tacked on so that the edges of the upper are
caught and retained between the two soles. The shoe is then placed on
the end of a rotary support called a horn, which holds it up to the needle.
A spool containing thread coated with shoemakers’ wax is carried by the
horn, and the thread, with its wax kept soft by a lamp, runs up the inside
of the horn to the whirl. The latter is a small ring placed at the upper end
of the horn, and through which there is an opening for the passage of the
needle. The needle has a barb, or hook, and as it descends through the
sole the whirl lays the thread in this hook, and as the needle rises it draws
the thread through the soles and forms a chain stitch in the external
channel of the outer sole. As the sewing proceeds, the horn is rotated so
as to bring every part of the margin of the sole under the needle. With
this machine a single operator has been able to sew nine hundred pairs of
shoes in a day of ten hours, and five hundred to six hundred pairs is only
an average workman’s output. It is said that up to 1877 there were
350,000,000 pairs of shoes made on this machine in the United States,
and probably an equal or greater number in Europe. Shoes made on this
machine were strongly made and comfortable, but they could not be
resoled by a shoemaker, except by pegging or nailing, and the soles were
furthermore somewhat stiff and lacking in flexibility. To meet these
difficulties, a new machine known as the “Goodyear Welt Machine,” was
patented in 1871 and 1875, and brought out a little later. This sewed a
welt to an upper, which welt in a subsequent operation was sewed by an
external row of stitches to the sole. This gave much greater flexibility, and
the further advantage of enabling a shoemaker to half sole the shoe by
the old method of hand sewing. This advanced the art of shoemaking in
the finer varieties of shoes, and to-day nearly all men’s fine shoes are
made in this way. The introduction of the sewing machine into the shoe
industry opened a new era in footwear, and it is said that no nation on
earth is so well and cheaply shod as the people of the United States.

FIG. 147.—MCKAY SHOE SEWING MACHINE.

A buttonhole does not strike the average person as a thing of any
importance whatever. The needlewoman, however, who has to patiently
stitch around and form the buttonholes, knows differently, and when this
needlewoman, working in the great shirt factories and shoe factories, is
confronted with the many millions of buttonholes in collars, cuffs, shirts
and shoes, the great amount of this painstaking and nerve destroying
labor becomes appalling. For cheapening the cost of buttonholes, and
reducing the hand labor, various buttonhole machines and attachments to
sewing machines have been devised. Patents Nos. 36,616 and 36,617, to
Humphrey, Oct. 7, 1862, covered one of the earliest forms, but the Reece
buttonhole machine, which is specially devised for the work, is one of the
most modern and successful. It was patented April 26, 1881, Sept. 21,
1886, and Aug. 20, 1895. These machines mark an important departure,
which consists in working the buttonhole by moving the stitch forming
mechanism about the buttonhole, instead of moving the fabric. An
illustration of the machine is given in Fig. 148. Upon this machine 10,010
buttonholes have been made in nine hours and fifty minutes. The
machine first cuts the buttonhole, then transfers it to the stitching devices,
which stitch and bar the buttonhole, finishing it entirely in an automatic
manner. The saving involved to the manufacturer by this machine over the
hand method is several hundred per cent., but the relief to the
needlewoman is of far greater consequence.

FIG. 148.—REECE BUTTONHOLE MACHINE.

Many striking applications of the sewing machine to various kinds of work
have been made. A recent one is the automatic power carpet sewing
machine, made and sold by the Singer Manufacturing Company. It was
patented by E. B. Allen in 1894. This machine in general appearance
resembles a miniature elevated railroad. It consists of an elevated track
about thirty-six feet long, sustained every three or four feet upon
standards, and having clamping jaws, which hold together the upper
edges of the two lengths of carpet to be sewed together. A compact little
stitching apparatus, not larger than a tea-pot, is actuated by an endless
belt from an electric motor at one end. The little machine runs along and
stitches together the upper edges of the suspended carpet lengths, and as
it crawls along at its work, it strikingly reminds one of the movements of a
squirrel along the top of a rail fence. This machine will sew five yards of
seam every minute, fastening together evenly and strongly ten yards of
carpet, and entirely dispensing with all hand labor in this roughest and
most trying of all fabrics.
Probably no organized piece of machinery has ever been so systematically
exploited, so thoroughly advertised, so persistently canvassed, and so
extensively sold as the sewing machine. With their main central offices,
their branch offices, sub-agencies and traveling canvassers in wagons,
every city, village, hamlet, and farmhouse has been actively besieged, and
with the enticing system of payment by instalments there is scarcely a
home so humble as to be without its sewing machine. The retail price of
sewing machines bears no proper relation to their cost, but this price to
the consumer results from the liberal commissions to agents, and the
expensive methods of canvassing. In the early days of the sewing
machine its sales were chiefly for family use, but this is now no longer the
case. While almost every family owns a sewing machine, it is only brought
into requisition for finer and special varieties of work, since nearly all the
clothing of men, women and children can now be purchased ready made,
at a price much less than the cost of the material and the labor of making
it up. A man to-day buys a ready-made shirt for fifty cents, which fifty
years ago would have cost him $2. This has largely transferred the sphere
of action of the sewing machine from the family to the factory. Great
factories now make ready-made clothing for men, women and children,
shirts, collars and cuffs, shoes, hats, caps, awnings, tents, sails, bags,
flags, banners, corsets, gloves, pocketbooks, harness, saddlery, rubber
goods, etc., and all these industries are founded upon the sewing
machine, which may be seen in long rows beside the factory walls, busily
supplying the demand of the world. With this transition in the sewing
machine foot treadles are no longer relied on, but the machines are run by
power from countershafts. This, in turn, has opened up possibilities of
much higher speed and greater efficiency in the machine. Inventors have
found, however, that high speed is handicapped with certain limitations.
Beyond a certain speed the needle gets hot from friction, which burns off
the thread and draws the temper. Cams and springs, moreover, are not
positive enough in action, as the resilience of the spring does not act
quickly enough, and so more positive gearings, such as eccentrics and
cranks, must be employed. Despite these difficulties, however, the modern
factory machine has raised the speed of the old-time sewing machine from
a few hundred stitches a minute to three and four thousand stitches a
minute.
The United States is the home of the sewing machine, and New York City
is the center of the industry, probably 90 per cent. of the sewing machine
trade being managed and handled there. German manufacturers are
making great efforts to compete in this field, but American machines are
generally regarded as the best in the world.
Among those prominently interested in the machine in its early days were
Orlando B. Potter and the law firm of Jordan & Clarke. The latter were
attorneys representing some of the prominent inventors in litigation, and
in this way Mr. Edward Clarke became interested in the business, and it
was he who in 1856 instituted the system of selling on the instalment
plan. For some years before his death Mr. Clarke was the president of the
Singer Company.
Recent statistics in relation to the sewing machine industry are difficult to
obtain, partly by reason of the great extent and ramifications of the
business, and partly by reason of the unwillingness of the larger
companies to give out data for publication. At the Patent Centennial in
Washington, in 1891, Ex-Commissioner of Patents Butterworth made the
statement that “Cæsar conquered Gaul with a force numerically less than
was employed in inventing and perfecting the parts of the sewing
machine.” The great Singer Company, with headquarters at New York,
operates not only a factory at Elizabethport, N. J., employing 5,000 men,
but also other factories in Europe and Canada, the one at Kilbowie,
Scotland, employing 6,000 men. Of the total of 13,500,000 machines
made by this company from 1853 to the end of 1896, nearly 6,000,000
have been made in factories located abroad, but directly controlled and
managed by the New York office. It is stated that the present output of
the American factory of the Singer Company amounts to over 11,000
weekly, or more than half a million annually. Although so many sewing
machines are made abroad, the exports from the United States for 1899
amounted to $3,264,344.
In the early days of the Howe sewing machine it was denounced as a
menace to the occupations of the thousands of men and women who
worked in the clothing shops, and the struggles of the inventor against
this opposition and discouragement form an interesting page of history.
But it had come to stay and to grow. Some 7,000 United States patents
attest the interest and ingenuity in this field, in the neighborhood of
100,000 persons make a living from the manufacture and sale of the
machine, millions find profitable employment in its use, and from 700,000
to 800,000 machines are annually manufactured in the United States. The
output of all countries is estimated to be from 1,200,000 to 1,300,000
annually.
The sewing machine has for its objective result only the simple and
insignificant function of fastening one piece of fabric to another, but its
influence upon civilization in ministering to the wants of the race has been
so great as to cause it to be numbered with the epoch-making inventions
of the age. It has created new industries. It has given useful employment
to capital, has extended the lists of the wage earner, and increased his
daily pay. It has clothed the naked, fed the hungry, and warded off the
ravages of cold and death; but, best of all, its tuneful accompaniment has
lightened the heart and smoothed the pathway of life for Hood’s weary
working woman, to whose tired fingers and aching eyes it has brought the
balm of much-needed rest.
CHAPTER XVI.
The Reaper.

Early English Machines—Machine of Patrick Bell—The Hussey Reaper—McCormick’s
Reaper and Its Great Success—Rivalry Between the Two American Reapers—Self
Rakers—Automatic Binders—Combined Steam Reaper and Threshing Machine—Great
Wheat Fields of the West—Statistics.

In the harvest scenes upon the tombs of ancient Thebes the thirsty
reaper is depicted, with curved sickle in hand, alternately bending his
back to the grain and refreshing himself at the skin bottle. For more
than thirty centuries did man thus continue to earn his bread by the
sweat of his brow. Even to the present time the scythe, with its cradle
of wooden fingers, is occasionally met with, and it is to the older
generation a familiar suggestion of the sweat, toil, bustle and excitement
of the old harvest time. But all this has been changed by the advent of the
reaper, and ere long the grain cradle will hang on the walls of the museum
as an ethnological specimen only.
The first reaper of which we find historical evidence is that described by
Pliny in the first century of the Christian Era (A. D. 70). He says: “The
mode of getting in the harvest varies considerably. In the vast domains of
the province of Gaul a large hollow frame, armed with comb-like teeth,
and supported on two wheels, is driven through the standing grain, the
beasts being yoked behind it (in contrarium juncto), the result being that
the ears are torn off and fall within the frame.”
This crude machine has in late years been many times re-invented, and it
finds a special application to-day for the gathering of clover seeds, and is
called a “header.”
The first attempt of modern times to devise a reaper was the English
machine of Pitt, in 1786, which followed the principle of the old Gallic
implement, in that it stripped the heads from the standing grain. The Pitt
machine, however, had a revolving cylinder on which were rows of comb
teeth, which tore off the heads of grain and discharged them into a
receptacle. In 1799 Boyce, of England, invented the vertical shaft, with
horizontally rotating cutters. In 1800 Mears devised a machine employing
shears. In 1806 Gladstone devised a front-draft, side-cut machine, in
which a curved segment-bar with fingers gathered the grain and held it
while a horizontally revolving knife cut the same. In 1811 Cumming
introduced the reel, and in 1814 Dobbs described a wheelbarrow
arrangement of reaper in which he used the divider. In 1822 the important
improvement of the reciprocating knife bar was made by Ogle, which
became a characteristic feature of all subsequent successful reapers. It
was drawn by horses in front. The cutter bar projected at the side. It had
a reel to gather the grain to the cutter, and the grain platform was tilted to
drop the gavel. In 1826 Rev. Patrick Bell, of Scotland, devised a reaper
that had a movable vibrating cutter working like a series of shears, a reel,
and a traveling apron, which carried off the grain to one side. This
machine was pushed from behind, and, with a swath of five feet, cut an
acre in an hour. It was, however, for some reason laid aside till 1851,
when it was reorganized and put in service at the World’s Fair in London in
competition with the American machines. All the earlier experiments in the
development of the reaper were made in England. Grain raising was in its
infancy in the United States, and near the end of the Eighteenth Century
the Royal Agricultural Society of England had stimulated its own inventors
by offering a prize for the production of a successful reaper, and continued
thus to offer it for many years. There is no evidence, however, that the
preceding machines attained any practical results, and it remained for the
fertility of American genius to invent a practical reaper which satisfactorily
performed its work, and continued to do so. Quite a number of patents for
reapers were granted to American inventors in the early part of the
century, among which may be mentioned that to Manning, of Plainfield, N.
J., May 3, 1831, which embodied finger bars to hold the grain and a
reciprocating cutter bar with spear-shaped blades.
FIG. 149.—PATENT OFFICE DRAWING, HUSSEY’S REAPER,
DECEMBER 31, 1833.

Cyrus H. McCormick, of Virginia, and Obed Hussey, of Maryland, were the
men who brought the reaper to a condition of practical utility. The
commercial development of their machines was practically
contemporaneous, and their respective claims for superiority had about an
equal number of supporters among the farmers of that day. Hussey,
originally of Cincinnati, but afterwards of Maryland, was the first to obtain
a patent, which was granted December 31, 1833. An illustration of the
patent drawing is given in Fig. 149. It embodied a reciprocating saw tooth
cutter f sliding within double guard fingers e. It had a front draft, side-cut,
and a platform. The cutter was driven by a pitman from a crank shaft
operated through gear wheels from the main drive wheels. His
specification provided for the locking or unlocking of the drive wheels; also
for the hinging of the platform, and stated that the operator who takes off
the grain may ride on the machine.

FIG. 150.—PATENT OFFICE DRAWING, McCORMICK’S REAPER, JUNE 21, 1834.

On June 21, 1834, Cyrus H. McCormick, of Virginia, obtained a patent on
his reaper. In Fig. 150 appears an illustration of his patent drawing. This
had two features which were not found in the Hussey patent, viz., a reel
on a horizontal axis above the cutter, and a divider L, at the outer end of
the cutter, which divider projected in front of the cutter, and separated in
advance the grain which was to be cut from that which was to be left
standing. McCormick’s machine had two cutters or knives, reciprocated by
cranks in opposite directions to each other. This feature he afterward
abandoned, adopting the single knife, described by him as an alternative.
This machine was to be pushed ahead of the team, which was hitched to
the bar C of the tongue B in the rear, but provision was made for a front
draft by a pair of shafts in front, shown in dotted lines. The curved dotted
line beside the shafts indicated a bowed guard to press the standing grain
away from the horse. The divider L had a cloth screen extending to the
rear of the platform.
Neither Hussey nor McCormick appears at that time to have been
cognizant of the prior state of the art, and as the patent law of 1836 had
not yet been enacted, there was little or no examination as to novelty, and
no interference proceedings as to priority of invention, and consequently
their respective claims were drawn to much that was old, and probably
much that would have been in conflict with each other under the present