DATA MINING
A Tutorial-Based Primer
SECOND EDITION
Chapman & Hall/CRC
Data Mining and Knowledge Discovery Series
SERIES EDITOR
Vipin Kumar
University of Minnesota
Department of Computer Science and Engineering
Minneapolis, Minnesota, U.S.A.
PUBLISHED TITLES
ACCELERATING DISCOVERY: MINING UNSTRUCTURED INFORMATION FOR
HYPOTHESIS GENERATION
Scott Spangler
ADVANCES IN MACHINE LEARNING AND DATA MINING FOR ASTRONOMY
Michael J. Way, Jeffrey D. Scargle, Kamal M. Ali, and Ashok N. Srivastava
BIOLOGICAL DATA MINING
Jake Y. Chen and Stefano Lonardi
COMPUTATIONAL BUSINESS ANALYTICS
Subrata Das
COMPUTATIONAL INTELLIGENT DATA ANALYSIS FOR SUSTAINABLE
DEVELOPMENT
Ting Yu, Nitesh V. Chawla, and Simeon Simoff
COMPUTATIONAL METHODS OF FEATURE SELECTION
Huan Liu and Hiroshi Motoda
CONSTRAINED CLUSTERING: ADVANCES IN ALGORITHMS, THEORY,
AND APPLICATIONS
Sugato Basu, Ian Davidson, and Kiri L. Wagstaff
CONTRAST DATA MINING: CONCEPTS, ALGORITHMS, AND APPLICATIONS
Guozhu Dong and James Bailey
DATA CLASSIFICATION: ALGORITHMS AND APPLICATIONS
Charu C. Aggarwal
DATA CLUSTERING: ALGORITHMS AND APPLICATIONS
Charu C. Aggarwal and Chandan K. Reddy
DATA CLUSTERING IN C++: AN OBJECT-ORIENTED APPROACH
Guojun Gan
DATA MINING: A TUTORIAL-BASED PRIMER, SECOND EDITION
Richard J. Roiger
DATA MINING FOR DESIGN AND MARKETING
Yukio Ohsawa and Katsutoshi Yada
DATA MINING WITH R: LEARNING WITH CASE STUDIES, SECOND EDITION
Luís Torgo
EVENT MINING: ALGORITHMS AND APPLICATIONS
Tao Li
FOUNDATIONS OF PREDICTIVE ANALYTICS
James Wu and Stephen Coggeshall
GEOGRAPHIC DATA MINING AND KNOWLEDGE DISCOVERY,
SECOND EDITION
Harvey J. Miller and Jiawei Han
GRAPH-BASED SOCIAL MEDIA ANALYSIS
Ioannis Pitas
HANDBOOK OF EDUCATIONAL DATA MINING
Cristóbal Romero, Sebastian Ventura, Mykola Pechenizkiy, and Ryan S.J.d. Baker
HEALTHCARE DATA ANALYTICS
Chandan K. Reddy and Charu C. Aggarwal
INFORMATION DISCOVERY ON ELECTRONIC HEALTH RECORDS
Vagelis Hristidis
INTELLIGENT TECHNOLOGIES FOR WEB APPLICATIONS
Priti Srinivas Sajja and Rajendra Akerkar
INTRODUCTION TO PRIVACY-PRESERVING DATA PUBLISHING: CONCEPTS
AND TECHNIQUES
Benjamin C. M. Fung, Ke Wang, Ada Wai-Chee Fu, and Philip S. Yu
KNOWLEDGE DISCOVERY FOR COUNTERTERRORISM AND
LAW ENFORCEMENT
David Skillicorn
KNOWLEDGE DISCOVERY FROM DATA STREAMS
João Gama
MACHINE LEARNING AND KNOWLEDGE DISCOVERY FOR
ENGINEERING SYSTEMS HEALTH MANAGEMENT
Ashok N. Srivastava and Jiawei Han
MINING SOFTWARE SPECIFICATIONS: METHODOLOGIES AND APPLICATIONS
David Lo, Siau-Cheng Khoo, Jiawei Han, and Chao Liu
MULTIMEDIA DATA MINING: A SYSTEMATIC INTRODUCTION TO
CONCEPTS AND THEORY
Zhongfei Zhang and Ruofei Zhang
MUSIC DATA MINING
Tao Li, Mitsunori Ogihara, and George Tzanetakis
NEXT GENERATION OF DATA MINING
Hillol Kargupta, Jiawei Han, Philip S. Yu, Rajeev Motwani, and Vipin Kumar
RAPIDMINER: DATA MINING USE CASES AND BUSINESS ANALYTICS
APPLICATIONS
Markus Hofmann and Ralf Klinkenberg
RELATIONAL DATA CLUSTERING: MODELS, ALGORITHMS,
AND APPLICATIONS
Bo Long, Zhongfei Zhang, and Philip S. Yu
SERVICE-ORIENTED DISTRIBUTED KNOWLEDGE DISCOVERY
Domenico Talia and Paolo Trunfio
SPECTRAL FEATURE SELECTION FOR DATA MINING
Zheng Alan Zhao and Huan Liu
STATISTICAL DATA MINING USING SAS APPLICATIONS, SECOND EDITION
George Fernandez
SUPPORT VECTOR MACHINES: OPTIMIZATION BASED THEORY,
ALGORITHMS, AND EXTENSIONS
Naiyang Deng, Yingjie Tian, and Chunhua Zhang
TEMPORAL DATA MINING
Theophano Mitsa
TEXT MINING: CLASSIFICATION, CLUSTERING, AND APPLICATIONS
Ashok N. Srivastava and Mehran Sahami
TEXT MINING AND VISUALIZATION: CASE STUDIES USING OPEN-SOURCE
TOOLS
Markus Hofmann and Andrew Chisholm
THE TOP TEN ALGORITHMS IN DATA MINING
Xindong Wu and Vipin Kumar
UNDERSTANDING COMPLEX DATASETS: DATA MINING WITH MATRIX
DECOMPOSITIONS
David Skillicorn
DATA MINING
A Tutorial-Based Primer
SECOND EDITION
Richard J. Roiger
This book was previously published by Pearson Education, Inc.
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2017 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://
www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923,
978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For
organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for
identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
Contents
BIBLIOGRAPHY, 461
INDEX, 465
List of Figures
Figure 5.15 A scatterplot comparing age and life insurance promotion. 156
Figure 5.16 A decision tree process model. 157
Figure 5.17 A decision tree for the credit card promotion database. 158
Figure 5.18 A decision tree in descriptive form. 158
Figure 5.19 A list of operator options. 159
Figure 5.20 Customer churn—A training and test set scenario. 160
Figure 5.21 Removing instances of unknown outcome from the churn data set. 161
Figure 5.22 Partitioning the customer churn data. 162
Figure 5.23 The customer churn data set. 163
Figure 5.24 Filter Examples has removed all instances of unknown outcome. 163
Figure 5.25 A decision tree for the customer churn data set. 164
Figure 5.26 Output of the Apply Model operator. 164
Figure 5.27 A performance vector for the customer churn data set. 165
Figure 5.28 Adding a subprocess to the main process window. 166
Figure 5.29 A subprocess for data preprocessing. 167
Figure 5.30 Creating and saving a decision tree model. 168
Figure 5.31 Reading and applying a saved model. 169
Figure 5.32 An Excel file stores model predictions. 169
Figure 5.33 Testing a model using cross-validation. 170
Figure 5.34 A subprocess to read and filter customer churn data. 171
Figure 5.35 Nested subprocesses for cross-validation. 171
Figure 5.36 Performance vector for a decision tree tested using cross-validation. 172
Figure 5.37 Subprocess for the Tree to Rules operator. 174
Figure 5.38 Building a model with the Tree to Rules operator. 174
Figure 5.39 Rules generated by the Tree to Rules operator. 175
Figure 5.40 Performance vector for the customer churn data set. 175
Figure 5.41 A process design for rule induction. 176
Figure 5.42 Adding the Discretize by Binning operator. 177
Figure 5.43 Covering rules for customer churn data. 177
Figure 5.44 Performance vector for the covering rules of Figure 5.43. 178
Figure 5.45 Process design for subgroup discovery. 179
Figure 5.46 Subprocess design for subgroup discovery. 179
Figure 5.47 Rules generated by the Subgroup Discovery operator. 180
Figure 5.48 Ten rules identifying likely churn candidates. 181
Figure 5.49 Generating association rules for the credit card promotion database. 182
Figure 5.50 Preparing data for association rule generation. 183
Figure 5.51 Interface for listing association rules. 184
Figure 5.52 Association rules for the credit card promotion database. 184
Figure 5.53 Market basket analysis template. 185
Figure 5.54 The pivot operator rotates the example set. 186
Figure 5.55 Association rules for the Market Basket Analysis template. 186
Figure 5.56 Process design for clustering gamma-ray burst data. 188
Figure 5.57 A partial clustering of gamma-ray burst data. 189
Figure 5.58 Three clusters of gamma-ray burst data. 189
Figure 5.59 Decision tree illustrating a gamma-ray burst clustering. 190
Figure 5.60 A descriptive form of a decision tree showing a clustering
of gamma-ray burst data. 190
Figure 5.61 Benchmark performance for nearest neighbor classification. 192
Figure 5.62 Main process design for nearest neighbor classification. 192
Figure 5.63 Subprocess for nearest neighbor classification. 193
Figure 5.64 Forward selection subprocess for nearest neighbor classification. 193
Figure 5.65 Performance vector when forward selection is used for choosing
attributes. 194
Figure 5.66 Unsupervised clustering for attribute evaluation. 197
Figure 6.1 A seven-step KDD process model. 200
Figure 6.2 The Acme credit card database. 203
Figure 6.3 A process model for detecting outliers. 205
Figure 6.4 Two outlier instances from the diabetes patient data set. 206
Figure 6.5 Ten outlier instances from the diabetes patient data set. 207
Figure 7.1 Components for supervised learning. 222
Figure 7.2 A normal distribution. 225
Figure 7.3 Random samples from a population of 10 elements. 226
Figure 7.4 A process model for comparing three competing models. 239
Figure 7.5 Subprocess for comparing three competing models. 240
Figure 7.6 Cross-validation test for a decision tree with maximum depth = 5. 240
Figure 7.7 A matrix of t-test scores. 241
Figure 7.8 ANOVA comparing three competing models. 241
Figure 7.9 ANOVA operators for comparing nominal and numeric attributes. 242
Figure 7.10 The grouped ANOVA operator comparing class and maximum heart
rate. 243
Figure 7.11 The ANOVA matrix operator for the cardiology patient data set. 243
Figure 7.12 A process model for creating a lift chart. 244
Figure 7.13 Preprocessing the customer churn data set. 245
Figure 7.14 Output of the Apply Model operator for the customer churn data set. 245
Figure 7.15 Performance vector for customer churn. 246
Figure 7.16 A Pareto lift chart for customer churn. 247
Figure 8.1 A fully connected feed-forward neural network. 254
Figure 8.2 The sigmoid evaluation function. 257
Figure 8.3 A 3 × 3 Kohonen network with two input-layer nodes. 260
Figure 8.4 Connections for two output-layer nodes. 266
Figure 9.1 Graph of the XOR function. 272
Figure 9.2 XOR training data. 273
Figure 9.3 Satellite image data. 274
Figure 9.4 Weka's four graphical user interfaces (GUIs) for XOR training. 275
Figure 9.5 Backpropagation learning parameters. 276
Figure 9.6 Architecture for the XOR function. 278
Figure 9.7 XOR training output. 278
Figure 10.17 Green and red have been removed from the satellite image data set. 305
Figure 10.18 Correlation matrix for the satellite image data set. 306
Figure 10.19 Neural network model for predicting customer churn. 307
Figure 10.20 Preprocessing the customer churn data. 308
Figure 10.21 Cross-validation subprocess for customer churn. 308
Figure 10.22 Performance vector for customer churn. 309
Figure 10.23 Process for creating and saving a neural network model. 309
Figure 10.24 Process model for reading and applying a neural network model. 310
Figure 10.25 Neural network output for predicting customer churn. 310
Figure 10.26 SOM process model for the cardiology patient data set. 312
Figure 10.27 Clustered instances of the cardiology patient data set. 312
Figure 11.1 RapidMiner’s naïve Bayes operator. 325
Figure 11.2 Subprocess for applying naïve Bayes to customer churn data. 326
Figure 11.3 Naïve Bayes Distribution Table for customer churn data. 326
Figure 11.4 Naïve Bayes performance vector for customer churn data. 327
Figure 11.5 Life insurance promotion by gender. 328
Figure 11.6 Naïve Bayes model with output attribute = LifeInsPromo. 329
Figure 11.7 Predictions for the life insurance promotion. 329
Figure 11.8 Hyperplanes separating the circle and star classes. 330
Figure 11.9 Hyperplanes passing through their respective support vectors. 331
Figure 11.10 Maximal margin hyperplane separating the star and circle classes. 335
Figure 11.11 Loading the nine instances of Figure 11.8 into the Explorer. 338
Figure 11.12 Invoking SMO model. 339
Figure 11.13 Disabling data normalization/standardization. 339
Figure 11.14 The SMO-created MMH for the data shown in Figure 11.8. 340
Figure 11.15 Applying mySVM to the cardiology patient data set. 341
Figure 11.16 Normalized cardiology patient data. 342
Figure 11.17 Equation of the MMH for the cardiology patient data set. 342
Figure 11.18 Actual and predicted output for the cardiology patient data. 343
Figure 11.19 Performance vector for the cardiology patient data. 343
Figure 11.20 A linear regression model for the instances of Figure 11.8. 345
Figure 11.21 Main process window for applying RapidMiner’s linear regression
operator to the gamma-ray burst data set. 346
Figure 11.22 Subprocess windows for the Gamma Ray burst experiment. 346
Figure 11.23 Linear regression—actual and predicted output for the gamma-ray
burst data set. 347
Figure 11.24 Summary statistics and the linear regression equation for the
gamma-ray burst data set. 347
Figure 11.25 Scatterplot diagram showing the relationship between t90 and t50. 348
Figure 11.26 Performance vector resulting from the application of linear
regression to the gamma-ray burst data set. 348
Figure 11.27 A generic model tree. 349
Figure 11.28 The logistic regression equation. 351
Figure 12.1 A Cobweb-created hierarchy. 363
Figure 12.2 Applying EM to the gamma-ray burst data set. 366
Figure 12.3 Removing correlated attributes from the gamma-ray burst data set. 367
Figure 12.4 An EM clustering of the gamma-ray burst data set. 367
Figure 12.5 Summary statistics for an EM clustering of the gamma-ray burst data set. 368
Figure 12.6 Decision tree representing a clustering of the gamma-ray burst data set. 368
Figure 12.7 The decision tree of Figure 12.6 in descriptive form. 369
Figure 12.8 Classes of the sensor data set. 370
Figure 12.9 Generic object editor allows us to specify the number of clusters. 370
Figure 12.10 Classes to clusters summary statistics. 371
Figure 12.11 Unsupervised genetic clustering. 372
Figure 13.1 A process model for extracting historical market data. 380
Figure 13.2 Historical data for XIV. 381
Figure 13.3 Time-series data with numeric output. 382
Figure 13.4 Time-series data with categorical output. 383
Figure 13.5 Time-series data for processing with RapidMiner. 383
Preface
Data mining is the process of finding interesting patterns in data. The objective of data mining is to use discovered patterns to help explain current behavior or to predict future outcomes. Several aspects of the data mining process can be studied. A single book cannot concentrate on all areas of the data mining process. Although we furnish some detail about all aspects of data mining and knowledge discovery, our primary focus is on model building and testing, as well as on interpreting and validating results.
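To make the book's central workflow concrete before the tutorials begin, the short sketch below builds and tests a classification model in Python. It is a hypothetical illustration only, not material from the text (the tutorials themselves use the code-free Weka and RapidMiner tools), and it assumes the scikit-learn library is installed: it trains a decision tree on scikit-learn's bundled iris data and checks its predictions against a held-out test set.

# A minimal sketch (not from the text): model building and testing with a
# decision tree, assuming scikit-learn is available.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Split the data into training and test sets.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

# Model building: fit a decision tree to the training data.
model = DecisionTreeClassifier(max_depth=3, random_state=1)
model.fit(X_train, y_train)

# Model testing: apply the model to unseen data and evaluate the result.
predictions = model.predict(X_test)
print("Test set accuracy:", accuracy_score(y_test, predictions))

The same three steps, preparing the data, building a model, and evaluating it on data the model has not yet seen, are the steps the RapidMiner and Weka tutorials walk through graphically.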
OBJECTIVES
I wrote the text to facilitate the following student learning goals:
• Understand what data mining is and how data mining can be employed to solve real
problems
• Recognize whether a data mining solution is a feasible alternative for a specific
problem
• Step through the knowledge discovery process and write a report about the results of
a data mining session
• Know how to apply data mining software tools to solve real problems
• Apply basic statistical and nonstatistical techniques to evaluate the results of a data
mining session
• Recognize several data mining strategies and know when each strategy is appropriate
• Develop a comprehensive understanding of how several data mining techniques
build models to solve problems
• Develop a general awareness about the structure of a data warehouse and how a data
warehouse can be used
• Understand what online analytical processing (OLAP) is and how it can be applied
to analyze data
• Chapter 5 is all about data mining using RapidMiner Studio, a powerful open-source
and code-free version of RapidMiner’s commercial product. RapidMiner uses a
drag and drop workflow paradigm for building models to solve complex problems.
RapidMiner’s intuitive user interface, visualization capabilities, and assortment of
operators for preprocessing and mining data are second to none.
• This edition covers what are considered to be the top 10 data mining algorithms
(Wu and Kumar, 2009). Nine of the algorithms are used in one or more tutorials.
• Tutorials have been added for attribute selection, dealing with imbalanced data, outlier analysis, time-series analysis, and mining textual data.
• Over 90% of the tutorials are presented using both Weka and RapidMiner. This
allows readers maximum flexibility for their hands-on data mining experience.
INTENDED AUDIENCE
I developed most of the material for this book while teaching a one-semester data mining
course open to students majoring or minoring in business or computer science. In writing
this text, I directed my attention toward four groups of individuals.
CHAPTER FEATURES
I take the approach that model building is both an art and a science best understood from
the perspective of learning by doing. My view is supported by several features found within
the pages of the text. The following is a partial list of these features.
• Simple, detailed examples. I remove much of the mystery surrounding data mining
by presenting simple, detailed examples of how the various data mining techniques
build their models. Because of its tutorial nature, the text is appropriate as a self-study
guide as well as a college-level textbook for a course about data mining and knowledge discovery.
• Overall tutorial style. All examples in Chapters 4, 5, 9, and 10 are tutorials. Selected
sections in Chapters 6, 7, 11, 12, 13, and 14 offer easy-to-follow, step-by-step tutorials
for performing data analytics. All selected section tutorials are highlighted for easy
differentiation from regular text.
• Data sets for data mining. A variety of data sets from business, medicine, and science
are ready for data mining.
• Key term definitions. Each chapter introduces several key terms. A list of definitions
for these terms is provided at the end of each chapter.
• Review questions ask basic questions about the concepts and content found
within each chapter. The questions are designed to help determine if the reader
understands the major points conveyed in each chapter.
• Data mining questions require the reader to use one or several data mining tools
to perform data mining sessions.
CHAPTER CONTENT
The ordering of the chapters and the division of the book into separate parts are based on several years of experience in teaching courses on data mining. Section I introduces material that is fundamental to understanding the data mining process. The presentation is informal and easy to follow. Basic data mining concepts, strategies, and techniques are introduced. Students learn about the types of problems that can be solved
with data mining.
Once the basic concepts are understood, Section II provides the tools for knowledge
discovery with detailed tutorials taking you through the knowledge discovery process.
The fact that data preprocessing is fundamental to successful data mining is emphasized. Also, special attention is given to formal data mining evaluation techniques.
Section III is all about neural networks. A conceptual and detailed presentation is offered
for feed-forward networks trained with backpropagation learning and self-organizing
maps for unsupervised clustering. Section III contains several tutorials for neural network
learning with Weka and RapidMiner.
Section IV focuses on several specialized techniques. Topics of current interest such as
time-series analysis, textual data mining, imbalanced and streaming data, as well as Web-
based data mining are described.
• Chapter 1 offers an overview of data analytics and all aspects of the data mining process. Special emphasis is placed on helping the student determine when data mining
is an appropriate problem-solving strategy.
• Chapter 2 presents a synopsis of several common data mining strategies and techniques. Basic methods for evaluating the outcome of a data mining session are described.
• Chapter 3 details a decision tree algorithm, the Apriori algorithm for producing association rules, a covering rule algorithm, the K-means algorithm for unsupervised
clustering, and supervised genetic learning. Tools are provided to help determine
which data mining techniques should be used to solve specific problems.
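As a taste of how Chapter 3 traces its algorithms step by step, the sketch below implements the K-means procedure in plain Python. It is a hypothetical illustration, not code from the book: the data points, the choice of k = 2, and the fixed number of iterations are all assumptions made for this example.

# A minimal K-means sketch (not from the book): repeatedly assign each point
# to its nearest center, then move each center to the mean of its points.
import random

def kmeans(points, k, iterations=10):
    centers = random.sample(points, k)  # initial centers drawn from the data
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        # Assignment step: place each point in the cluster of its nearest center.
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            clusters[distances.index(min(distances))].append(p)
        # Update step: recompute each center as the mean of its assigned points.
        for i, cluster in enumerate(clusters):
            if cluster:
                centers[i] = tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
    return centers, clusters

data = [(1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0), (3.5, 5.0), (4.5, 5.0)]
centers, clusters = kmeans(data, k=2)
print("Cluster centers:", centers)

Random initialization means different runs can settle on different centers; the sketch keeps the logic short rather than robust.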
INSTRUCTOR SUPPLEMENTS
The following supplements are provided to help the instructor organize lectures and write
examinations:
• PowerPoint slides. Each figure and table in the text is part of a PowerPoint presentation. These slides are also offered in PDF format.
• A second set of slides containing the screenshots seen as you work through the
tutorials in Chapters 4 through 14.
• All RapidMiner processes used in the tutorials, demonstrations, and end-of-chapter
exercises are readily available together with simple installation instructions.
• Test questions. Several test questions are provided for each chapter.
• Answers to selected exercises. Answers are given for most of the end-of-chapter
exercises.
• Lesson planner. The lesson planner contains ideas for lecture format and points for
discussion. The planner also provides suggestions for using selected end-of-chapter
exercises in a laboratory setting.
Please note that these supplements are available to qualified instructors only. Contact
your CRC sales representative or get help by visiting https://www.crcpress.com/contactus
to access this material. Supplements will be updated as needed.
• Cover the following sections to gain enough knowledge to understand the tutorials
presented in later chapters.
• If Weka is your choice, at a minimum, work through Sections 4.1, 4.2, and 4.7 of
Chapter 4.
• If you are focusing on RapidMiner, cover at least Sections 5.1 and 5.2 of Chapter 5.
• Here is a summary of the tutorials given in Chapters 6, 7, 11, 12, 13, and 14.
• Chapter 6: RapidMiner is used to provide a tutorial on outlier analysis.
• Chapter 7: Tutorials are presented using RapidMiner’s T-Test and ANOVA opera-
tors for comparing model performance.
• Chapter 11: Both RapidMiner and Weka are used for tutorials highlighting the naïve Bayes classifier and support vector machines.
• Chapter 12: RapidMiner and Weka are used to illustrate unsupervised clustering
with the EM (Expectation Maximization) algorithm.
• Chapter 13: Both RapidMiner and Weka are employed for time-series analysis.
RapidMiner is used for a tutorial on textual data mining. Weka is employed for
a tutorial on ROC curves. RapidMiner is used to give an example of ensemble
learning.
• Chapter 14: Tutorials are given for creating simple and multidimensional MS
Excel pivot tables.
• Chapter 9 is about neural networks using Weka. Chapter 10 employs RapidMiner
to cover the same material. There are advantages to examining at least some of the
material in both chapters. Weka's neural network function is able to mine data having a numeric output attribute, and RapidMiner's self-organizing map operator can
perform dimensionality reduction as well as unsupervised clustering.
Acknowledgments
I am indebted to my editor Randi Cohen for the confidence she placed in me and for allowing me the freedom to make critical decisions about the content of the text. I am very grateful to Dr. Mark Polczynski, whose constructive comments were particularly helpful during revisions of the manuscript. Finally, I am most deeply indebted to my wife Suzanne for her extreme patience, helpful comments, and consistent support.
Author
I
Data Mining Fundamentals