Exploratory Multivariate Analysis by Example Using R, Second Edition
Author(s): Husson, François; Lê, Sébastien; Pagès, Jérôme
ISBN(s): 9781315301860, 1315301865
Edition: Second edition
Year: 2017
Language: English
Chapman & Hall/CRC
Computer Science and Data Analysis Series
The interface between the computer and statistical sciences is increasing, as each
discipline seeks to harness the power and resources of the other. This series aims to
foster the integration between the computer sciences and statistical, numerical, and
probabilistic methods by publishing a broad range of reference works, textbooks, and
handbooks.
SERIES EDITORS
David Blei, Princeton University
David Madigan, Rutgers University
Marina Meila, University of Washington
Fionn Murtagh, Royal Holloway, University of London
Proposals for the series should be sent directly to one of the series editors above, or submitted to:
Published Titles
Computational Statistics Handbook with MATLAB®, Third Edition
Wendy L. Martinez and Angel R. Martinez
R Graphics
Paul Murrell
Contents
Preface
4 Clustering
4.1 Data — Issues
4.2 Formalising the Notion of Similarity
4.2.1 Similarity between Individuals
4.2.1.1 Distances and Euclidean Distances
4.2.1.2 Example of Non-Euclidean Distance
4.2.1.3 Other Euclidean Distances
4.2.1.4 Similarities and Dissimilarities
4.2.2 Similarity between Groups of Individuals
4.3 Constructing an Indexed Hierarchy
4.3.1 Classic Agglomerative Algorithm
4.3.2 Hierarchy and Partitions
4.4 Ward’s Method
4.4.1 Partition Quality
4.4.2 Agglomeration According to Inertia
4.4.3 Two Properties of the Agglomeration Criterion
4.4.4 Analysing Hierarchies, Choosing Partitions
4.5 Direct Search for Partitions: K-Means Algorithm
4.5.1 Data — Issues
4.5.2 Principle
4.5.3 Methodology
4.6 Partitioning and Hierarchical Clustering
4.6.1 Consolidating Partitions
4.6.2 Mixed Algorithm
4.7 Clustering and Principal Component Methods
4.7.1 Principal Component Methods Prior to AHC
4.7.2 Simultaneous Analysis of a Principal Component Map and Hierarchy
4.8 Clustering and Missing Data
4.9 Example: The Temperature Dataset
4.9.1 Data Description — Issues
4.9.2 Analysis Parameters
4.9.3 Implementation of the Analysis
4.10 Example: The Tea Dataset
4.10.1 Data Description — Issues
4.10.2 Constructing the AHC
4.10.3 Defining the Clusters
4.11 Dividing Quantitative Variables into Classes
5 Visualisation
5.1 Data — Issues
5.2 Viewing PCA Data
5.2.1 Selecting a Subset of Objects — Cloud of Individuals
5.2.2 Selecting a Subset of Objects — Cloud of Variables
5.2.3 Adding Supplementary Information
Appendix
A.1 Percentage of Inertia Explained by the First Component or by the First Plane
A.2 R Software
A.2.1 Introduction
A.2.2 The Rcmdr Package
A.2.3 The FactoMineR Package
Bibliography
Index
Preface
each chapter on managing missing data, which will enable users to conduct
analyses from incomplete tables more easily.
The authors would like to thank Rebecca Clayton for her help in the translation.
1 Principal Component Analysis (PCA)
1.2 Objectives
The data table can be considered either as a set of rows (individuals) or as a
set of columns (variables), thus raising a number of questions relating to these
different types of objects.
TABLE 1.1
Some Examples of Datasets
Field      Individuals  Variables                    x_ik
Ecology    Rivers       Concentration of pollutants  Concentration of pollutant k in river i
Economics  Years        Economic indicators          Indicator value k for year i
Genetics   Patients     Genes                        Expression of gene k for patient i
Marketing  Brands       Measures of satisfaction     Value of measure k for brand i
Pedology   Soils        Granulometric composition    Content of component k in soil i
Biology    Animals      Measurements                 Measure k for animal i
TABLE 1.2
The Orange Juice Data
                Odour intensity  Odour typicality  Pulp content  Intensity of taste  Acidity  Bitterness  Sweetness
Pampryl amb.    2.82             2.53              1.66          3.46                3.15     2.97        2.60
Tropicana amb.  2.76             2.82              1.91          3.23                2.55     2.08        3.32
Fruvita fr.     2.83             2.88              4.00          3.45                2.42     1.76        3.38
Joker amb.      2.76             2.59              1.66          3.37                3.05     2.56        2.80
Tropicana fr.   3.20             3.02              3.69          3.12                2.33     1.97        3.34
Pampryl fr.     3.07             2.73              3.34          3.54                3.31     2.63        2.90
FIGURE 1.1
Representation of 40 individuals described by two variables: j and k.
[Three scatterplot panels; horizontal axis: Variable j, vertical axis: Variable k.]
very unusual variables and intermediate variables, which are to some extent
linked to both groups. In the example, each group can be represented by one
single variable as the variables within each group are very strongly correlated.
We refer to these variables as synthetic variables.
FIGURE 1.2
Representation of the relationships between four variables: j, k, l, and m,
taken two-by-two.
[Six scatterplot panels, one per pair of variables: A (j, k), B (j, l), C (k, l), D (j, m), E (k, m), F (l, m).]
the strongly correlated variables into sets, but even for this reduced number
of variables, grouping them this way is tedious.
TABLE 1.3
Orange Juice Data: Correlation Matrix
                    Odour intensity  Odour typicality  Pulp content  Intensity of taste  Acidity  Bitterness  Sweetness
Odour intensity      1.00             0.58              0.66         −0.27               −0.15    −0.15        0.23
Odour typicality     0.58             1.00              0.77         −0.62               −0.84    −0.88        0.92
Pulp content         0.66             0.77              1.00         −0.02               −0.47    −0.64        0.63
Intensity of taste  −0.27            −0.62             −0.02          1.00                0.73     0.51       −0.57
Acidity             −0.15            −0.84             −0.47          0.73                1.00     0.91       −0.90
Bitterness          −0.15            −0.88             −0.64          0.51                0.91     1.00       −0.98
Sweetness            0.23             0.92              0.63         −0.57               −0.90    −0.98        1.00
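
As an illustration of how a correlation matrix such as Table 1.3 is obtained from a data table such as Table 1.2, the following minimal sketch computes the Pearson correlation coefficient for every pair of variables. It is written in Python with the standard library only, purely for illustration (the book itself works in R); the helper names `pearson` and `correlation_matrix` are assumptions introduced here and do not come from the text. The scores are copied from Table 1.2.

```python
from math import sqrt

# Sensory descriptors, in the column order of Table 1.2.
VARIABLES = [
    "Odour intensity", "Odour typicality", "Pulp content",
    "Intensity of taste", "Acidity", "Bitterness", "Sweetness",
]

# One row per orange juice (individual), one value per descriptor (variable).
JUICES = {
    "Pampryl amb.":   [2.82, 2.53, 1.66, 3.46, 3.15, 2.97, 2.60],
    "Tropicana amb.": [2.76, 2.82, 1.91, 3.23, 2.55, 2.08, 3.32],
    "Fruvita fr.":    [2.83, 2.88, 4.00, 3.45, 2.42, 1.76, 3.38],
    "Joker amb.":     [2.76, 2.59, 1.66, 3.37, 3.05, 2.56, 2.80],
    "Tropicana fr.":  [3.20, 3.02, 3.69, 3.12, 2.33, 1.97, 3.34],
    "Pampryl fr.":    [3.07, 2.73, 3.34, 3.54, 3.31, 2.63, 2.90],
}


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(rows):
    """Correlation between every pair of columns of a row-oriented table."""
    columns = list(zip(*rows))  # switch from individuals to variables
    return [[pearson(cj, ck) for ck in columns] for cj in columns]


corr = correlation_matrix(list(JUICES.values()))

# Print the matrix with two decimals, one line per variable.
for name, row in zip(VARIABLES, corr):
    print(f"{name:<20}" + " ".join(f"{r:6.2f}" for r in row))
```

With these data, the coefficient between bitterness and sweetness comes out at about −0.98, in line with Table 1.3. Transposing the table with `zip(*rows)` reflects the two readings of the data described in the Objectives: the same table is processed first as a set of rows (individuals) and then as a set of columns (variables).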