Introduction to Statistics and Data Analysis: With Exercises, Solutions and Applications in R, 1st Edition, Christian Heumann

The document is an introduction to the book 'Introduction to Statistics and Data Analysis with Exercises, Solutions and Applications in R' by Christian Heumann and others, which aims to teach statistical concepts using the R programming language. It emphasizes a balance between comprehensible explanations of statistical methods and their practical application, making it suitable for beginners from various fields. The book includes exercises, solutions, and supplementary materials available online to enhance the learning experience.

Christian Heumann · Michael Schomaker · Shalabh

Introduction to Statistics and Data Analysis
With Exercises, Solutions and Applications in R

Christian Heumann
Department of Statistics
Ludwig-Maximilians-Universität München
München
Germany

Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur
Kanpur
India

Michael Schomaker
Centre for Infectious Disease Epidemiology and Research
University of Cape Town
Cape Town
South Africa

ISBN 978-3-319-46160-1 ISBN 978-3-319-46162-5 (eBook)

DOI 10.1007/978-3-319-46162-5

Library of Congress Control Number: 2016955516

© Springer International Publishing Switzerland 2016

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, express or implied, with respect to the material contained herein or
for any errors or omissions that may have been made.

Printed on acid-free paper

This Springer imprint is published by Springer Nature

The registered company is Springer International Publishing AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface

The success of the open-source statistical software “R” has made a significant
impact on the teaching and research of statistics in the last decade. Analysing data is
now easier and more affordable than ever, but choosing the most appropriate
statistical methods remains a challenge for many users. To understand and interpret
software output, it is necessary to engage with the fundamentals of statistics.
However, many readers do not feel comfortable with complicated mathematics.
In this book, we attempt to find a healthy balance between explaining statistical
concepts comprehensively and showing their application and interpretation using R.
This book will benefit beginners and self-learners from various backgrounds as
we complement each chapter with various exercises and detailed and comprehensible
solutions. The results involving mathematics and rigorous proofs are separated
from the main text, where possible, and are kept in an appendix for interested
readers. Our textbook covers material that is generally taught in introductory-level
statistics courses to students from various backgrounds, including sociology,
biology, economics, psychology, medicine, and others. Most often, we introduce
the statistical concepts using examples and illustrate the calculations both manually
and using R.
However, while we provide a gentle introduction to R (in the appendix), this is
not a software book. Our emphasis lies on explaining statistical concepts correctly
and comprehensively, using exercises and software to delve deeper into the subject
matter and learn about the conceptual challenges that the methods present.
This book’s homepage, http://chris.userweb.mwn.de/book/, contains additional
material, most notably the software codes needed to answer the software exercises,
and data sets. In the remainder of this book, we will use grey boxes
to introduce the relevant R commands. In many cases, the code can be directly
pasted into R to reproduce the results and graphs presented in the book; in others,
the code is abbreviated to improve readability and clarity, and the detailed code can
be found online.

Many years of teaching experience, from undergraduate to postgraduate level,
went into this book. The authors hope that the reader will enjoy reading it and find it a
useful reference for learning. We welcome critical feedback to improve future editions
of this book. Comments can be sent to christian.heumann@stat.uni-muenchen.de,
shalab@iitk.ac.in, and michael.schomaker@uct.ac.za, who contributed equally
to this book.
We thank Melanie Schomaker for producing some of the figures and giving
graphical advice, Alice Blanck from Springer for her continuous help and support,
and Lyn Imeson for her dedicated commitment which improved the earlier versions
of this book. We are grateful to our families who have supported us during the
preparation of this book.

München, Germany            Christian Heumann
Cape Town, South Africa     Michael Schomaker
Kanpur, India               Shalabh
November 2016
Contents

Part I Descriptive Statistics

1 Introduction and Framework
1.1 Population, Sample, and Observations
1.2 Variables
1.2.1 Qualitative and Quantitative Variables
1.2.2 Discrete and Continuous Variables
1.2.3 Scales
1.2.4 Grouped Data
1.3 Data Collection
1.4 Creating a Data Set
1.4.1 Statistical Software
1.5 Key Points and Further Issues
1.6 Exercises
2 Frequency Measures and Graphical Representation of Data
2.1 Absolute and Relative Frequencies
2.2 Empirical Cumulative Distribution Function
2.2.1 ECDF for Ordinal Variables
2.2.2 ECDF for Continuous Variables
2.3 Graphical Representation of a Variable
2.3.1 Bar Chart
2.3.2 Pie Chart
2.3.3 Histogram
2.4 Kernel Density Plots
2.5 Key Points and Further Issues
2.6 Exercises
3 Measures of Central Tendency and Dispersion
3.1 Measures of Central Tendency
3.1.1 Arithmetic Mean
3.1.2 Median and Quantiles
3.1.3 Quantile–Quantile Plots (QQ-Plots)
3.1.4 Mode
3.1.5 Geometric Mean
3.1.6 Harmonic Mean
3.2 Measures of Dispersion
3.2.1 Range and Interquartile Range
3.2.2 Absolute Deviation, Variance, and Standard Deviation
3.2.3 Coefficient of Variation
3.3 Box Plots
3.4 Measures of Concentration
3.4.1 Lorenz Curve
3.4.2 Gini Coefficient
3.5 Key Points and Further Issues
3.6 Exercises
4 Association of Two Variables
4.1 Summarizing the Distribution of Two Discrete Variables
4.1.1 Contingency Tables for Discrete Data
4.1.2 Joint, Marginal, and Conditional Frequency Distributions
4.1.3 Graphical Representation of Two Nominal or Ordinal Variables
4.2 Measures of Association for Two Discrete Variables
4.2.1 Pearson’s χ² Statistic
4.2.2 Cramer’s V Statistic
4.2.3 Contingency Coefficient C
4.2.4 Relative Risks and Odds Ratios
4.3 Association Between Ordinal and Continuous Variables
4.3.1 Graphical Representation of Two Continuous Variables
4.3.2 Correlation Coefficient
4.3.3 Spearman’s Rank Correlation Coefficient
4.3.4 Measures Using Discordant and Concordant Pairs
4.4 Visualization of Variables from Different Scales
4.5 Key Points and Further Issues
4.6 Exercises

Part II Probability Calculus

5 Combinatorics
5.1 Introduction
5.2 Permutations
5.2.1 Permutations without Replacement
5.2.2 Permutations with Replacement
5.3 Combinations
5.3.1 Combinations without Replacement and without Consideration of the Order
5.3.2 Combinations without Replacement and with Consideration of the Order
5.3.3 Combinations with Replacement and without Consideration of the Order
5.3.4 Combinations with Replacement and with Consideration of the Order
5.4 Key Points and Further Issues
5.5 Exercises
6 Elements of Probability Theory
6.1 Basic Concepts and Set Theory
6.2 Relative Frequency and Laplace Probability
6.3 The Axiomatic Definition of Probability
6.3.1 Corollaries Following from Kolmogorov’s Axioms
6.3.2 Calculation Rules for Probabilities
6.4 Conditional Probability
6.4.1 Bayes’ Theorem
6.5 Independence
6.6 Key Points and Further Issues
6.7 Exercises
7 Random Variables
7.1 Random Variables
7.2 Cumulative Distribution Function (CDF)
7.2.1 CDF of Continuous Random Variables
7.2.2 CDF of Discrete Random Variables
7.3 Expectation and Variance of a Random Variable
7.3.1 Expectation
7.3.2 Variance
7.3.3 Quantiles of a Distribution
7.3.4 Standardization
7.4 Tschebyschev’s Inequality
7.5 Bivariate Random Variables
7.6 Calculation Rules for Expectation and Variance
7.6.1 Expectation and Variance of the Arithmetic Mean
7.7 Covariance and Correlation
7.7.1 Covariance
7.7.2 Correlation Coefficient
7.8 Key Points and Further Issues
7.9 Exercises
8 Probability Distributions
8.1 Standard Discrete Distributions
8.1.1 Discrete Uniform Distribution
8.1.2 Degenerate Distribution
8.1.3 Bernoulli Distribution
8.1.4 Binomial Distribution
8.1.5 Poisson Distribution
8.1.6 Multinomial Distribution
8.1.7 Geometric Distribution
8.1.8 Hypergeometric Distribution
8.2 Standard Continuous Distributions
8.2.1 Continuous Uniform Distribution
8.2.2 Normal Distribution
8.2.3 Exponential Distribution
8.3 Sampling Distributions
8.3.1 χ²-Distribution
8.3.2 t-Distribution
8.3.3 F-Distribution
8.4 Key Points and Further Issues
8.5 Exercises

Part III Inductive Statistics

9 Inference
9.1 Introduction
9.2 Properties of Point Estimators
9.2.1 Unbiasedness and Efficiency
9.2.2 Consistency of Estimators
9.2.3 Sufficiency of Estimators
9.3 Point Estimation
9.3.1 Maximum Likelihood Estimation
9.3.2 Method of Moments
9.4 Interval Estimation
9.4.1 Introduction
9.4.2 Confidence Interval for the Mean of a Normal Distribution
9.4.3 Confidence Interval for a Binomial Probability
9.4.4 Confidence Interval for the Odds Ratio
9.5 Sample Size Determinations
9.6 Key Points and Further Issues
9.7 Exercises
10 Hypothesis Testing
10.1 Introduction
10.2 Basic Definitions
10.2.1 One- and Two-Sample Problems
10.2.2 Hypotheses
10.2.3 One- and Two-Sided Tests
10.2.4 Type I and Type II Error
10.2.5 How to Conduct a Statistical Test
10.2.6 Test Decisions Using the p-Value
10.2.7 Test Decisions Using Confidence Intervals
10.3 Parametric Tests for Location Parameters
10.3.1 Test for the Mean When the Variance is Known (One-Sample Gauss Test)
10.3.2 Test for the Mean When the Variance is Unknown (One-Sample t-Test)
10.3.3 Comparing the Means of Two Independent Samples
10.3.4 Test for Comparing the Means of Two Dependent Samples (Paired t-Test)
10.4 Parametric Tests for Probabilities
10.4.1 One-Sample Binomial Test for the Probability p
10.4.2 Two-Sample Binomial Test
10.5 Tests for Scale Parameters
10.6 Wilcoxon–Mann–Whitney (WMW) U-Test
10.7 χ²-Goodness-of-Fit Test
10.8 χ²-Independence Test and Other χ²-Tests
10.9 Key Points and Further Issues
10.10 Exercises
11 Linear Regression
11.1 The Linear Model
11.2 Method of Least Squares
11.2.1 Properties of the Linear Regression Line
11.3 Goodness of Fit
11.4 Linear Regression with a Binary Covariate
11.5 Linear Regression with a Transformed Covariate
11.6 Linear Regression with Multiple Covariates
11.6.1 Matrix Notation
11.6.2 Categorical Covariates
11.6.3 Transformations
11.7 The Inductive View of Linear Regression
11.7.1 Properties of Least Squares and Maximum Likelihood Estimators
11.7.2 The ANOVA Table
11.7.3 Interactions
11.8 Comparing Different Models
11.9 Checking Model Assumptions
11.10 Association Versus Causation
11.11 Key Points and Further Issues
11.12 Exercises
Appendix A: Introduction to R
Appendix B: Solutions to Exercises
Appendix C: Technical Appendix
Appendix D: Visual Summaries
References
Index
About the Authors

Prof. Christian Heumann is a professor at the Ludwig-Maximilians-Universität
München, Germany, where he teaches students in Bachelor and Master programs
offered by the Department of Statistics, as well as undergraduate students in the
Bachelor of Science programs in business administration and economics. His
research interests include statistical modeling, computational statistics and all
aspects of missing data.
Dr. Michael Schomaker is a Senior Researcher and Biostatistician at the Centre
for Infectious Disease Epidemiology & Research (CIDER), University of Cape
Town, South Africa. He received his doctoral degree from the University of
Munich. He has taught undergraduate students for many years and has written
contributions for various introductory textbooks. His research focuses on missing
data, causal inference, model averaging and HIV/AIDS.
Prof. Shalabh is a Professor at the Indian Institute of Technology Kanpur, India.
He received his Ph.D. from the University of Lucknow (India) and completed his
post-doctoral work at the University of Pittsburgh (USA) and University of Munich
(Germany). He has over twenty years of experience in teaching and research. His
main research areas are linear models, regression analysis, econometrics,
measurement error models, missing data models and sampling theory.

Part I Descriptive Statistics

1 Introduction and Framework

Statistics is a collection of methods which help us to describe, summarize, interpret,
and analyse data. Drawing conclusions from data is vital in research, administration,
and business. Researchers are interested in understanding whether a medical
intervention helps in reducing the burden of a disease, how personality relates to
decision-making, whether a new fertilizer increases the yield of crops, how a political
system affects trade policy, who is going to vote for a political party in the
next election, what are the long-term changes in the population of a fish species,
and many more questions. Governments and organizations may be interested in the
life expectancy of a population, the risk factors for infant mortality, geographical
differences in energy usage, migration patterns, or reasons for unemployment. In
business, identifying people who may be interested in a certain product, optimizing
prices, and evaluating the satisfaction of customers are possible areas of interest.
No matter what the question of interest is, it is important to collect data in a
way which allows its analysis. The representation of collected data in a data set or
data matrix allows the application of a variety of statistical methods. In the first
part of the book, we are going to introduce methods which help us in describing
data, and the second and third parts of the book focus on inferential statistics, which
means drawing conclusions from data. In this chapter, we are going to introduce the
framework of statistics which is needed to properly collect, administer, evaluate, and
analyse data.

1.1 Population, Sample, and Observations

Let us first introduce some terminology and related notations used in this book.
The units on which we measure data—such as persons, cars, animals, or plants—
are called observations. These units/observations are represented by the Greek
symbol ω. The collection of all units is called the population and is represented by Ω.
When we refer to ω ∈ Ω, we mean a single unit out of all units, e.g. one person out of
all persons of interest. If we consider a selection of observations ω1, ω2, ..., ωn, then
these observations are called a sample. A sample is always a subset of the population,
{ω1, ω2, ..., ωn} ⊆ Ω.

© Springer International Publishing Switzerland 2016
C. Heumann et al., Introduction to Statistics and Data Analysis,
DOI 10.1007/978-3-319-46162-5_1

Example 1.1.1

• If we are interested in the social conditions under which Indian people live, then
we would define all inhabitants of India as Ω and each of its inhabitants as ω. If we
want to collect data from a few inhabitants, then those would represent a sample
from the total population.
• Investigating the economic power of Africa’s platinum industry would require treating
each platinum-related company as ω, whereas all platinum-related companies
would be collected in Ω. A few companies ω1 , ω2 , . . . , ωn comprise a sample of
all companies.
• We may be interested in collecting information about those participating in a
statistics course. All participants in the course constitute the population Ω, and
each participant refers to a unit or observation ω.
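
The third case can be sketched in R. This is only an illustration under assumptions that are not part of the text: the participant labels and the sample size of 3 are made up, and `sample()` is just one possible way of selecting units from a population.

```r
# Hypothetical population Omega: all participants of a statistics course
# (the labels are invented for illustration)
omega <- c("Anna", "Ben", "Chen", "Dipti", "Emeka", "Farah")

# Draw a sample of n = 3 units without replacement
set.seed(1)  # only to make the draw reproducible
n <- 3
smpl <- sample(omega, size = n)

# Every sampled unit belongs to the population,
# i.e. the sample is a subset of Omega
all(smpl %in% omega)  # TRUE
```

Whatever units happen to be drawn, the final check holds, because a sample can only contain units that are part of the population.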

Remark 1.1.1 Sometimes, the concept of a population is not applicable or is difficult
to imagine. As an example, imagine that we measure the temperature in New Delhi
every hour. A sample would then be the time series of temperatures in a specific
time window, for example from January to March 2016. A population in the sense of
observational units does not exist here. But now assume that we measure temperatures
in several different cities; then, all the cities form the population, and a sample is any
subset of the cities.

1.2 Variables

If we have specified the population of interest for a specific research question, we
can think of what is of interest about our observations. A particular feature of these
observations can be collected in a statistical variable X. Any information we are
interested in may be captured in such a variable. For example, if our observations
refer to human beings, X may describe marital status, gender, age, or anything else
which may relate to a person. Of course, we can be interested in many different
features, each of them collected in a different variable X_i, i = 1, 2, ..., p. Each
observation ω takes a particular value for X. If X refers to gender, each observation,
i.e. each person, has a particular value x which refers to either “male” or “female”.
The formal definition of a variable is

$$X : \Omega \to S, \qquad \omega \mapsto x \tag{1.1}$$

This definition states that a variable X takes a value x for each observation ω ∈ Ω,
where all possible values of X are contained in the set S.

Example 1.2.1

• If X refers to gender, possible x-values are contained in S = {male, female}. Each
observation ω is either male or female, and this information is summarized in X .
• Let X be the country of origin for a car. Possible values to be taken by an observation
ω (i.e. a car) are S = {Italy, South Korea, Germany, France, India, China, Japan,
USA, . . .}.
• A variable X which refers to age may take any value between 1 and 125. Each
person ω is assigned a value x which represents the age of this person.

1.2.1 Qualitative and Quantitative Variables

Qualitative variables are the variables which take values x that cannot be ordered in
a logical or natural way. For example,

• the colour of the eye,
• the name of a political party, and
• the type of transport used to travel to work

are all qualitative variables. Neither is there any reason to list blue eyes before brown
eyes (or vice versa) nor does it make sense to list buses before trains (or vice versa).
Quantitative variables represent measurable quantities. The values which these
variables can take can be ordered in a logical and natural way. Examples of quanti-
tative variables are

• size of shoes,
• price for houses,
• number of semesters studied, and
• weight of a person.

Remark 1.2.1 It is common to assign numbers to qualitative variables for practical
purposes in data analyses (see Sect. 1.4 for more detail). For instance, if we consider
the variable “gender”, then each observation can take either the “value” male or
female. We may decide to assign 1 to female and 0 to male and use these numbers
instead of the original categories. However, this is arbitrary, and we could have also
chosen “1” for male and “0” for female, or “2” for male and “10” for female. There
is no logical and natural order on how to arrange male and female, and thus, the
variable gender remains a qualitative variable, even after using numbers for coding
the values that X can take.
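
The point of Remark 1.2.1 can be illustrated with a small R sketch. The data and the object names (gender, coding_a, coding_b) are hypothetical and chosen only for illustration; the two codings are the ones mentioned in the remark.

```r
# Hypothetical data: gender of six people, stored as character strings.
gender <- c("female", "male", "male", "female", "female", "male")

# Two arbitrary codings of the same two categories.
coding_a <- c(male = 0, female = 1)
coding_b <- c(male = 2, female = 10)

codes_a <- unname(coding_a[gender])
codes_b <- unname(coding_b[gender])

# The category frequencies do not depend on the coding ...
freq_a <- as.vector(table(codes_a))
freq_b <- as.vector(table(codes_b))

# ... but arithmetic on the codes does, so it carries no meaning.
mean_a <- mean(codes_a)
mean_b <- mean(codes_b)
```

Both codings give the same frequencies, whereas the "mean gender" changes with the coding, which is one way to see that the variable remains qualitative.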

1.2.2 Discrete and Continuous Variables

Discrete variables are variables which can only take a finite number of values.
All qualitative variables are discrete, such as the colour of the eye or the region of
a country. But also quantitative variables can be discrete: the size of shoes or the
number of semesters studied would be discrete because the number of values these
variables can take is limited.
Variables which can take an infinite number of values are called continuous
variables. Examples are the time it takes to travel to university, the length of an
antelope, and the distance between two planets. Sometimes, it is said that continuous
variables are variables which are “measured rather than counted”. This is a rather
informal definition which helps to understand the difference between discrete and
continuous variables. The crucial point is that continuous variables can, in theory,
take an infinite number of values; for instance, the height of a person may be recorded
as 172 cm. However, the actual height on the measuring tape might be 172.3 cm which
was rounded off to 172 cm. If one had a better measuring instrument, one might have
obtained 172.342 cm. But the real height of this person is a number with infinitely
many decimal places such as 172.342975328… cm. No matter what we eventually
report or obtain, a variable which can take an infinite number of values is defined to
be a continuous variable.
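
As a rough illustration, the following R sketch records the same hypothetical height with different precision. The value and the object names are assumptions; note that round() rounds, so the third result differs in the last digit from a value that was merely cut off after three decimals.

```r
# Hypothetical "true" height in cm, with more decimals than any tape shows.
height <- 172.342975328

recorded_0 <- round(height, 0)  # nearest centimetre
recorded_1 <- round(height, 1)  # one decimal place
recorded_3 <- round(height, 3)  # three decimal places

c(recorded_0, recorded_1, recorded_3)
```

Whatever precision is reported, the underlying variable is still treated as continuous.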

1.2.3 Scales

The thoughts and considerations from above indicate that different variables contain
different amounts of information. A useful classification of these considerations is
given by the concept of the scale of a variable. This concept will help us in the
remainder of this book to identify which methods are the appropriate ones to use in
a particular setting.

Nominal scale. The values of a nominal variable cannot be ordered. Examples are
the gender of a person (male–female) or the status of an application (pending–not
pending).
Ordinal scale. The values of an ordinal variable can be ordered. However, the differ-
ences between these values cannot be interpreted in a meaningful way. For exam-
ple, the possible values of education level (none–primary education–secondary
education–university degree) can be ordered meaningfully, but the differences
between these values cannot be interpreted. Likewise, the satisfaction with a prod-
uct (unsatisfied–satisfied–very satisfied) is an ordinal variable because the values
this variable can take can be ordered, but the differences between “unsatisfied–
satisfied” and “satisfied–very satisfied” cannot be compared in a numerical way.
Continuous scale. The values of a continuous variable can be ordered. Furthermore,
the differences between these values can be interpreted in a meaningful way. For
instance, the height of a person refers to a continuous variable because the values
can be ordered (170 cm, 171 cm, 172 cm, …), and differences between these
values can be compared (the difference between 170 and 171 cm is the same
as the difference between 171 and 172 cm). Sometimes, the continuous scale is
divided further into subscales. While in the remainder of the book we typically
do not need these classifications, it is still useful to reflect on them:

Interval scale. Only differences between values, but not ratios, can be interpreted.
An example for this scale would be temperature (measured in °C): the difference
between −2 °C and 4 °C is 6 °C, but the ratio 4/(−2) = −2 has no meaningful
interpretation; in particular, it does not mean that −2 °C is twice as cold as 4 °C.
Ratio scale. Both differences and ratios can be interpreted. An example is speed:
60 km/h is 40 km/h more than 20 km/h. Moreover, 60 km/h is three times as fast
as 20 km/h because the ratio between them is 3.
Absolute scale. The absolute scale is the same as the ratio scale, with the excep-
tion that the values are measured in “natural” units. An example is “number of
semesters studied” where no artificial unit such as km/h or °C is needed: the
values are simply 1, 2, 3, . . .
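
The differences between the scales can be sketched in R as follows. This is only an illustration under stated assumptions: the data are hypothetical, the Kelvin scale (°C + 273.15) is brought in here as a second temperature scale and is not part of the text above, and the conversion 1 mile = 1.609344 km is used for speed.

```r
# Ordinal scale: the levels have an order, but no meaningful distances.
education <- factor(
  c("primary", "university", "none", "secondary"),
  levels  = c("none", "primary", "secondary", "university"),
  ordered = TRUE
)
higher_than_primary <- education > "primary"

# Interval scale: ratios change with the (arbitrary) zero point.
celsius <- c(10, 20)
kelvin  <- celsius + 273.15
ratio_celsius <- celsius[2] / celsius[1]
ratio_kelvin  <- kelvin[2] / kelvin[1]

# Ratio scale: ratios survive a change of unit.
kmh <- c(20, 60)
mph <- kmh / 1.609344
ratio_kmh <- kmh[2] / kmh[1]
ratio_mph <- mph[2] / mph[1]
```

The ratio of the two temperatures depends on the scale used, while the ratio of the two speeds is 3 in both units.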

1.2.4 Grouped Data

Sometimes, data may be available only in a summarized form: instead of the original
value, one may only know the category or group the value belongs to. For example,

• it is often convenient in a survey to ask for the income (per year) by means of
groups: [€0–€20,000), [€20,000–€30,000), . . ., > €100,000;
• if there are many political parties in an election, those with a low number of voters
are often summarized in a new category “Other Parties”;
• instead of capturing the number of claims made by an insurance company customer,
the variable “claimed” may denote whether or not the customer claimed at all
(yes–no).

If data is available in grouped form, we call the respective variable capturing
this information a grouped variable. Sometimes, these variables are also known as
categorical variables. This is, however, not a complete definition because categorical
variables refer to any type of variable which takes a finite, possibly small, number of
values. Thus, any discrete and/or nominal and/or ordinal and/or qualitative variable
may be regarded as a categorical variable. Any grouped or categorical variable which
can only take two values is called a binary variable.
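
Grouping a variable can be sketched in R with cut(). The incomes and the class limits below are hypothetical (the text does not list all classes), and the last class is taken as 100,000 and above for simplicity.

```r
# Hypothetical yearly incomes in euros.
income <- c(12500, 20000, 27300, 45000, 99999, 150000)

# Left-closed, right-open classes such as [0, 20000), [20000, 30000), ...
breaks <- c(0, 20000, 30000, 100000, Inf)
income_group <- cut(income, breaks = breaks, right = FALSE, dig.lab = 6)

# A binary variable derived from the same data.
high_income <- income >= 100000

table(income_group)
```

With right = FALSE, a value that lies exactly on a class limit, such as 20000, falls into the class that starts at that limit.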
To gain a better understanding of how the definitions from the above sections
relate to each other, see Fig. 1.1. Qualitative data is always discrete, but quantitative
data can be both discrete (e.g. size of shoes or a grouped variable) and continuous
(e.g. temperature). Nominal variables are always qualitative and discrete (e.g. colour
of the eye), whereas continuous variables are always quantitative (e.g. temperature).
Categorical variables can be both qualitative (e.g. colour of the eye) and quantitative
(e.g. satisfaction level on a scale from 1 to 5). Categorical variables are never continuous.

Fig. 1.1 Summary of variable classifications

1.3 Data Collection

When collecting data, we may ask ourselves how to facilitate this in detail and
how much data needs to be collected. The latter question will be partly answered
in Sect. 9.5; but in general, we can think of collecting data either on all subjects of
interest, such as in a national census, or on a representative sample of the population.
Most commonly, we gather data on a sample (described in Part I of this book) and
then draw conclusions about the population of interest (discussed in Part III of
this book). A sample might either be chosen by us or obtained through third parties
(hospitals, government agencies), or created during an experiment. This depends on
the context as described below.
Survey. A survey typically (but not always) collects data by asking questions (in
person or by phone) or providing questionnaires to study participants (as a printout
or online). For example, an opinion poll before a national election provides evidence
about the future government: potential voters are asked by phone which party they are
going to vote for in the next election; on the day of the election, this information can
be updated by asking the same question to a sample of voters who have just delivered
their vote at the polling station (so-called exit poll). A behavioural research survey
may ask members of a community about their knowledge and attitudes towards drug
use. For this purpose, the study coordinators can send people with a questionnaire
to this community and interview members of randomly selected households.
Ideally, a survey is conducted in a way which makes the chosen sample repre-
sentative of the population of interest. If a marketing company interviews people in
a pedestrian zone to find their views about a new chocolate bar, then these people
may not be representative of those who will potentially be interested in this product.
Similarly, if students are asked to fill in an online survey to evaluate a lecture, it
may turn out that those who participate are on average less satisfied than those who
do not. Survey sampling is a complex topic on its own. The interested reader may
consult Groves et al. (2009) or Kauermann and Küchenhoff (2011).
Experiment. Experimental data is obtained in “controlled” settings. This can mean
many things, but essentially it is data which is generated by the researcher with full
control over one or many variables of interest. For instance, suppose there are two
competing toothpastes, both of which promise to reduce pain for people with sensitive
teeth. If the researcher decided to randomly assign toothpaste A to half of the study
participants, and toothpaste B to the other half, then this is an experiment because
it is only the researcher who decides which toothpaste is to be used by any of the
participants. It is not decided by the participant. The data of the variable toothpaste
is controlled by the experimenter. Consider another example where the production
time of a product can potentially be reduced by combining two processes. The
management could decide to implement the new process in three production facilities,
but leave it as it is in the other facilities. The production process for the different
units (facilities) is therefore under control of the management. However, if each
facility could decide for themselves if they wanted a change or not, it would not be
an experiment because factors not directly controlled by the management, such as the
leadership style of the facility manager, would determine which process is chosen.
Observational Data. Observational data is data which is collected routinely, without
a researcher designing a survey or conducting an experiment. Suppose a blood sample
is drawn from each patient with a particular acute infection when they arrive at a
hospital. This data may be stored in the hospital’s folders and later accessed by a
researcher who is interested in studying this infection. Or suppose a government
institution monitors where people live and move to. This data can later be used to
explore migration patterns.
Primary and Secondary Data. Primary data is data we collect ourselves, i.e. via a
survey or experiment. Secondary data, in contrast, is collected by someone else. For
example, data from a national census, publicly available databases, previous research
studies, government reports, historical data, and data from the internet, among others,
are secondary data.

1.4 Creating a Data Set

Data is usually prepared and stored in one standard form to facilitate statistical
analyses. The data is stored in a data matrix (=data set) with p columns and n rows
(see Fig. 1.2). Each row corresponds to an observation/unit ω and each column to
a variable X . This means that, for example, the entry in the fourth row and second
column (x42 ) describes the value of the fourth observation on the second variable.
The examples below will illustrate the concept of a data set in more detail.

ω     Variable 1    Variable 2    ···    Variable p
1     x11           x12           ···    x1p
2     x21           x22           ···    x2p
⋮     ⋮             ⋮                    ⋮
n     xn1           xn2           ···    xnp

Fig. 1.2 Data set or data matrix

ω            Music    Mathematics    Biology    Geography
Student A    65       70             85         45
Student B    77       82             80         60
Student C    78       73             93         68
Student D    88       71             63         58
Student E    75       83             63         57

Fig. 1.3 Data set of marks of five students

Example 1.4.1 Suppose five students take examinations in music, mathematics, biol-
ogy, and geography. Their marks, measured on a scale between 0 and 100 (where
100 is the best mark), can be written down as illustrated in Fig. 1.3. Note that each
row refers to a student and each column to a variable. We consider a larger data set
in the next example.
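
The data set of Fig. 1.3 can, for instance, be entered in R with data.frame(). This is a minimal sketch; the object name marks and the row labels are our own choices.

```r
# Marks of five students in four subjects (values from Fig. 1.3).
marks <- data.frame(
  Music       = c(65, 77, 78, 88, 75),
  Mathematics = c(70, 82, 73, 71, 83),
  Biology     = c(85, 80, 93, 63, 63),
  Geography   = c(45, 60, 68, 58, 57),
  row.names   = paste("Student", LETTERS[1:5])
)

dim(marks)                      # n = 5 rows (students), p = 4 columns (variables)
marks[4, 2]                     # x42: fourth observation, second variable
marks["Student C", "Biology"]   # the same idea, using names instead of indices
```

Here marks[4, 2] returns the mark of Student D in mathematics, i.e. the entry in the fourth row and second column.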

Example 1.4.2 Consider the data set described in Appendix A.4. A pizza delivery
service captures information related to each delivery, for example the delivery time,
the temperature of the pizza, the name of the driver, the date of the delivery, the
name of the branch, and many more. To capture the data of all deliveries during one
month, we create a data matrix. Each row refers to a particular delivery, therefore
representing the observations of the data. Each column refers to a variable. In Fig. 1.4,
the variables X1 (delivery time in minutes), X2 (temperature in °C), and X12 (name
of branch) are listed.

Delivery    Delivery Time    Temperature    ···    Branch
1           35.1             68.3           ···    East (1)
2           25.2             71.0           ···    East (1)
⋮           ⋮                ⋮                     ⋮
1266        35.7             60.8           ···    West (2)

Fig. 1.4 Pizza data set

Table 1.1 Coding list for branch

Variable    Values     Code
Branch      East       1
            West       2
            Centre     3
            Missing    4

The first row tells us about the features of the first pizza delivery: the delivery
time was 35.1 min, the pizza arrived with a temperature of 68.3 °C, and the pizza
was delivered from the branch in the East of the city. In total, there were n = 1266
deliveries. For nominal variables, such as branch, we may decide to produce a coding
list, as illustrated in Table 1.1: instead of referring to the branches as “East”, “West”,
and “Centre”, we may simply call them 1, 2, and 3. As we will see in Chap. 11, this
has benefits for some analysis methods, though this is not needed in general.
If some values are missing, for example because they were never captured or even
lost, then this requires special attention. In Table 1.1, we assign missing values the
number “4” and therefore treat them as a separate category. If we work with statistical
software (see below), we may need other coding such as NA in the statistical software
R or . in Stata. More detail can be found in Appendix A.
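
A coding list such as Table 1.1 can be mimicked in R with a factor. The branch entries below are hypothetical, and the object names are our own; the sketch shows both the NA coding used by R and the alternative of treating missing values as a fourth category.

```r
# Hypothetical branch entries; NA marks a value that was never recorded.
branch <- c("East", "East", "West", NA, "Centre")

# Fix the order of the levels so that East = 1, West = 2, Centre = 3.
branch_f    <- factor(branch, levels = c("East", "West", "Centre"))
branch_code <- as.integer(branch_f)

# Option used in Table 1.1: treat missing values as a separate category 4.
branch_code4 <- ifelse(is.na(branch_code), 4L, branch_code)

sum(is.na(branch_f))              # number of missing entries
table(branch_f, useNA = "ifany")  # frequencies, including missing values
```

Keeping NA has the advantage that R functions can recognise the value as missing, whereas the code 4 would be treated like any other category.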

Another consideration when collecting data is that of transformations: we may
have captured the velocity of cars in kilometres per hour, but may need to present
the data in miles per hour; we have captured the temperature in degrees Celsius,
whereas we need to communicate results in degrees Fahrenheit, or we have created a
satisfaction score which we want to range from −5 to +5, while the score currently
runs from 0 to 20. This is not a problem at all. We can simply create a new variable
which reflects the required transformation. However, valid transformations depend
on the scale of a variable. Variables on an interval scale can use transformations of
the following kind:
g(x) = a + bx, b > 0. (1.2)
For ratio scales, only the following transformations are valid:
g(x) = bx, b > 0. (1.3)
In the above equation, a is set to 0 because ratios only stay the same if we respect a
variable’s natural point of origin.

Example 1.4.3 The temperature in °F relates to the temperature in °C as follows:

Temperature in °F = 32 + 1.8 · Temperature in °C
             g(x) =  a +  b  · x
This means that 25 °C relates to (32 + 1.8 · 25) °F = 77 °F. If X1 is a variable
representing temperature in °C, we can simply create a new variable X2 which is
temperature in °F. Since temperature is measured on an interval scale, this transfor-
mation is valid.
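
Transformations of the form g(x) = a + bx are easy to carry out in R. The sketch below uses hypothetical values; the choice a = −5 and b = 0.5 for the satisfaction score is our own and simply maps the range 0 to 20 onto −5 to +5, assuming the score may be treated as interval scaled.

```r
# Temperature: g(x) = a + b * x with a = 32 and b = 1.8.
celsius    <- c(-2, 4, 25)
fahrenheit <- 32 + 1.8 * celsius

# Satisfaction score from 0..20 mapped to -5..+5 (a = -5, b = 0.5).
score     <- c(0, 10, 20)
score_new <- -5 + 0.5 * score

# Differences are rescaled by the constant factor b; the order is unchanged.
diff(fahrenheit) / diff(celsius)
```

The last line returns 1.8 for every pair of neighbouring values, which reflects that differences remain comparable after the transformation.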

Changing currencies is also possible. If we would like to represent the price of a
product not in South African Rand but in €, we simply apply the transformation
Price in € = b · Price in South African Rand
whereby b is the currency exchange rate.

1.4.1 Statistical Software

There are a number of statistical software packages which allow data collection,
management, and, most importantly, analysis. In this book, we focus on the statistical
software R which is freely available at http://cran.r-project.org/. A gentle introduction
to R is provided in Appendix A. A data matrix can be created manually using
commands such as matrix(), data.frame(), and others. Any data can be edited
using edit(). However, typically analysts have already typed their data into databases
or spreadsheets, for example in Excel, Access, or MySQL. In most of these
applications, it is possible to save the data as an ASCII file (.dat), as a tab-delimited
file (.txt), or as a comma-separated values file (.csv). All of these formats allow easy
switching between different software and database applications. Such data can easily
be read into R by means of the following commands:

setwd('C:/directory')              # set the working directory
read.table('pizza_delivery.dat')   # ASCII file
read.table('pizza_delivery.txt')   # tab-delimited file
read.csv('pizza_delivery.csv')     # comma-separated values file

where setwd specifies the working directory. Alternatively, loading the library
foreign allows the import of data from many different statistical software pack-
ages, notably Stata, SAS, Minitab, SPSS, among others. A detailed description of
data import and export can be found in the respective R manual available at
http://cran.r-project.org/doc/manuals/r-release/R-data.pdf. Once the data is read into R,
it can be viewed with

fix() # option 1
View() # option 2

We can also get an overview of the data directly in the R console by displaying
only the top lines of the data with head(). Both approaches are visualized in Fig. 1.5
for the pizza data introduced in Example 1.4.2.
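
The import steps can be tried out without the original files. The following sketch builds a tiny stand-in for the pizza data, writes it to a temporary CSV file, and reads it back. The column names and the third row are hypothetical; the first two rows reuse the values shown in Fig. 1.4.

```r
# A tiny stand-in for the pizza data (column names are assumptions).
toy <- data.frame(
  time        = c(35.1, 25.2, 45.6),
  temperature = c(68.3, 71.0, 53.4),
  branch      = c("East", "East", "West")
)

# Write the data to a temporary CSV file and read it back.
csv_file <- tempfile(fileext = ".csv")
write.csv(toy, csv_file, row.names = FALSE)
toy_read <- read.csv(csv_file)

head(toy_read, 2)   # first two rows
str(toy_read)       # variable names and types
```

Storing the result of read.csv() in an object, here toy_read, is what makes the data available for later commands such as head().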

Fig. 1.5 Viewing data in R

1.5 Key Points and Further Issues

Note:

• The scale of variables is not only a formalism but an essential framework
for choosing the correct analysis methods. This is particularly relevant
for association analysis (Chap. 4), statistical tests (Chap. 10), and linear
regression (Chap. 11).
• Even if variables are measured on a nominal scale (i.e. if they are cate-
gorical/qualitative), we may choose to assign a number to each category
of this variable. This eases the implementation of some analysis methods
introduced later in this book.
• Data is usually stored in a data matrix where the rows represent the
observations and the columns are variables. It can be analysed with
statistical software. We use R (R Core Team 2016) in this book. A
gentle introduction is provided in Appendix A and throughout the book.
A more comprehensive introduction can be found in other books, for
example in Albert and Rizzo (2012), Crawley (2013), or Ligges (2008).
Even advanced books, e.g. Adler (2012) or Everitt and Hothorn (2011),
can offer insights to beginners.

1.6 Exercises

Exercise 1.1 Describe both the population and the observations for the following
research questions:

(a) Evaluation of the satisfaction of employees from an airline.
(b) Description of the marks of students from an assignment.
(c) Comparison of two drugs which deal with high blood pressure.

Exercise 1.2 A national park conducts a study on the behaviour of their leopards.
A few of the park’s leopards are registered and receive a GPS device which allows
measuring the position of the leopard. Use this example to describe the following
concepts: population, sample, observation, value, and variable.

Exercise 1.3 Which of the following variables are qualitative, and which are quan-
titative? Specify which of the quantitative variables are discrete and which are
continuous:

Time to travel to work, shoe size, preferred political party, price for a canteen meal, eye
colour, gender, wavelength of light, customer satisfaction on a scale from 1 to 10, delivery
time for a parcel, blood type, number of goals in a hockey match, height of a child, subject
line of an email.

Exercise 1.4 Identify the scale of the following variables:

(a) Political party voted for in an election
(b) The difficulty of different levels in a computer game
(c) Production time of a car
(d) Age of turtles
(e) Calendar year
(f) Price of a chocolate bar
(g) Identification number of a student
(h) Final ranking at a beauty contest
(i) Intelligence quotient.

Exercise 1.5 Make yourself familiar with the pizza data set from Appendix A.4.

(a) First, browse through the introduction to R in Appendix A. Then, read in the
data.
(b) View the data both in the R data editor and in the R console.
(c) Create a new data matrix which consists of the first 5 rows and first 5 variables
of the data. Print this data set on the R console. Now, save this data set in your
preferred format.
(d) Add a new variable “NewTemperature” to the data set which converts the tem-
perature from °C to °F.
(e) Attach the data and list the values from the variable “NewTemperature”.
(f) Use “?” to make yourself familiar with the following commands: str, dim,
colnames, names, nrow, ncol, head, and tail. Apply these commands
to the data to get more information about it.

Exercise 1.6 Consider the research questions of describing parents’ attitudes towards
immunization, what proportion of them wants immunization against chicken pox for
their last-born child, and whether this proportion differs by gender and age.

(a) Which data collection method is the most suitable one to answer the above
questions: survey or experiment?
(b) How would you capture the attitudes towards immunization in a single variable?
(c) Which variables are needed to answer all the above questions? Describe the scale
of each of them.
(d) Reflect on what an appropriate data set would look like. Now, given this data
set, try to write down the above research questions as precisely as possible.

→ Solutions to all exercises in this chapter can be found on p. 321

2 Frequency Measures and Graphical Representation of Data

In Chap. 1, we highlighted that different variables contain different levels of informa-
tion. When summarizing or visualizing one or more variable(s), it is this information
which determines the appropriate statistical methods to use.
Suppose we are interested in studying the employment opportunities and starting
salaries of university graduates with a master’s degree. Let the variable X denote the
starting salaries measured in €/year. Now suppose 100 graduate students provide
their initial salaries. Let us write down the salary of the first student as x1 , the
salary of the second student as x2 , and so on. We therefore have 100 observations
x1 , x2 , . . . , x100 . How can we summarize those 100 values best to extract meaningful
information from them? The answer to this question depends upon several aspects
like the nature of the recorded data, e.g. how many observations have been obtained
(either small in number or large in number) or how the data was recorded (either
exact values were obtained or the values were obtained in intervals). For example, the
starting salaries may be obtained as exact values, say 51,500 €/year, 32,350 €/year,
etc. Alternatively, these values could have been summarized in categories such as low
income (<30,000 €/year), medium income (30,000–50,000 €/year), high income
(50,000–70,000 €/year), and very high income (>70,000 €/year). Another approach
is to ask whether the students were employed or not after graduating and record the
data in terms of “yes” or “no”. It is evident that the latter classification is less detailed
than the grouped income data which is less detailed than the exact data. Depending on
which conceptualization of “starting salary” we use, we need to choose the approach
to summarize the data, that is the 100 values relating to the 100 graduated students.

2.1 Absolute and Relative Frequencies

Discrete Data. Let us first consider a simple example to illustrate our notation.

C. Heumann et al., Introduction to Statistics and Data Analysis,
DOI 10.1007/978-3-319-46162-5_2

Example 2.1.1 Suppose there are ten people in a supermarket queue. Each of them
is either coded as “F” (if the person is female) or “M” (if the person is male). The
collected data may look like
M, F, M, F, M, M, M, F, M, M.
There are now two categories in the data: male (M) and female (F). We use $a_1$ to refer
to the male category and $a_2$ to refer to the female category. Since there are seven male
and three female customers, we have 7 values in category $a_1$, denoted as $n_1 = 7$, and 3
values in category $a_2$, denoted as $n_2 = 3$. The number of observations in a particular
category is called the absolute frequency. It follows that $n_1 = 7$ and $n_2 = 3$ are the
absolute frequencies of $a_1$ and $a_2$, respectively. Note that $n_1 + n_2 = n = 10$, which
is the same as the total number of collected observations. We can also calculate
the relative frequencies of $a_1$ and $a_2$ as $f_1 = f(a_1) = \frac{n_1}{n} = \frac{7}{10} = 0.7 = 70\,\%$ and
$f_2 = f(a_2) = \frac{n_2}{n} = \frac{3}{10} = 0.3 = 30\,\%$, respectively. This gives us information about
the proportions of male and female customers in the queue.
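
In R, absolute and relative frequencies for the queue data can be obtained with table() and prop.table(). This is a minimal sketch; the object names are our own, and R lists the categories alphabetically, so “F” appears before “M”.

```r
# The ten observations from the supermarket queue.
queue <- c("M", "F", "M", "F", "M", "M", "M", "F", "M", "M")

n_j <- table(queue)        # absolute frequencies
f_j <- prop.table(n_j)     # relative frequencies n_j / n

n_j
f_j
sum(n_j)                   # total number of observations n
```

The relative frequencies returned by prop.table() add up to 1, in line with the fact that the absolute frequencies add up to n.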

We now extend these concepts to a general framework for the summary of data
on discrete variables. Suppose there are k categories denoted as $a_1, a_2, \ldots, a_k$
with $n_j$ ($j = 1, 2, \ldots, k$) observations in category $a_j$. The absolute frequency $n_j$ is
defined as the number of units in the jth category $a_j$. The sum of absolute frequencies
equals the total number of units in the data: $\sum_{j=1}^{k} n_j = n$. The relative frequencies
of the jth class are defined as
\[ f_j = f(a_j) = \frac{n_j}{n}, \qquad j = 1, 2, \ldots, k. \tag{2.1} \]

The relative frequencies always lie between 0 and 1 and $\sum_{j=1}^{k} f_j = 1$.
Grouped Continuous Data. Data on continuous variables usually has a large number
(k) of different values. Sometimes k may even be the same as n and in such a case
the relative frequencies become $f_j = \frac{1}{n}$ for all j. However, it is possible to define
intervals in which the observed values are contained.
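
The case k = n can be reproduced with a few hypothetical measurements in which no value occurs twice. The heights below are invented for illustration only.

```r
# Hypothetical heights in cm; all values are distinct.
x <- c(172.3, 168.9, 181.4, 175.0, 160.2)

n <- length(x)
k <- length(unique(x))         # number of different values

f_j <- prop.table(table(x))    # relative frequency of each distinct value
f_j
```

Every value has the relative frequency 1/n = 0.2 here, which shows why a frequency table of ungrouped continuous data is rarely informative.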

Example 2.1.2 Consider the following n = 20 results of the written part of a driving
licence examination (a maximum of 100 points could be achieved):
28, 35, 42, 90, 70, 56, 75, 66, 30, 89, 75, 64, 81, 69, 55, 83, 72, 68, 73, 16.
We can summarize the results in class intervals such as 0–20, 21–40, 41–60, 61–80,
and 81–100, and the data can be presented as follows:

Class intervals         0–20            21–40           41–60           61–80           81–100
Absolute frequencies    $n_1 = 1$       $n_2 = 3$       $n_3 = 3$       $n_4 = 9$       $n_5 = 4$
Relative frequencies    $f_1 = 1/20$    $f_2 = 3/20$    $f_3 = 3/20$    $f_4 = 9/20$    $f_5 = 4/20$

We have $\sum_{j=1}^{5} n_j = 20 = n$ and $\sum_{j=1}^{5} f_j = 1$.
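
The class frequencies of Example 2.1.2 can be checked in R with cut() and table(). This is a sketch with our own object names; the breaks are chosen so that whole-number results fall into the classes 0–20, 21–40, 41–60, 61–80, and 81–100.

```r
# Results of the written driving licence examination (n = 20).
exam_points <- c(28, 35, 42, 90, 70, 56, 75, 66, 30, 89,
                 75, 64, 81, 69, 55, 83, 72, 68, 73, 16)

# Intervals [0, 20], (20, 40], (40, 60], (60, 80], (80, 100].
classes <- cut(exam_points,
               breaks = c(0, 20, 40, 60, 80, 100),
               include.lowest = TRUE,
               labels = c("0-20", "21-40", "41-60", "61-80", "81-100"))

n_j <- table(classes)              # absolute frequencies
f_j <- n_j / length(exam_points)   # relative frequencies

n_j
f_j
```

The absolute frequencies obtained this way are 1, 3, 3, 9, and 4, and the relative frequencies add up to 1.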
Turning our survey to the course of the Danube, we note that
several Magdalenian stations extend into the provinces of Lower
Austria, chief among them being both the open 'loess' station of
Aggsbach, and that of Gobelsburg; there is also the Hundssteig near
Krems, better known as the station of Krems, and the cavern known
as the Gudenushöhle; in the latter station the characteristic bâtons,
javelins, and bone needles have been found.[BB]

Fig. 244. The open loess station of Aggsbach, on the Danube, near Krems. After Obermaier.
The cavern district of Moravia attracted a relatively large population,
and among the numerous stations are the grottos of Kr̆ íz̆ , Žitný,
Kostelík, Bycis̆ kala, Schoschuwka, Balcarovaskala, Kůlna, and
Lautsch. Near the Russian border bone implements like those of
Gudenushöhle on the Danube have been found at the station of
Kůlna, and the industrial stratification of Šipka is very clear. Not far
from Cracow, across the Russian border, the caverns in the region of
Ojcow were entered by men carrying the Magdalenian culture.
Another site in Russia is the grotto of Mas̆ zycka, and characteristic
Magdalenian harpoons, needles, and bâtons de commandement with
other implements have also been found to the eastward, in the
neighborhood of Kiev, in the Ukraine.

Decline of the Magdalenian Culture

The highest point touched by the Crô-Magnon race in the middle or
high Magdalenian appears to correspond broadly with the cold arid
period of climate in the interval between the Bühl and Gschnitz
advances in the Alpine region, during which the steppe mammals
spread widely over southwestern Europe. The saiga antelope, for
example, a highly characteristic steppe type, is represented in one of
the most skilful bone carvings found in the late Magdalenian layers
of Mas d'Azil; also the steppe type of horse is frequently represented
in the most advanced engravings of late Magdalenian times. How far
this cold, relatively dry climate influenced the artistic and creative
energy of the Crô-Magnons is largely a matter of conjecture. The
entirely independent records of La Madeleine, of Schweizersbild, and
of Kesslerloch concur in associating the highest stage of
Magdalenian history of art with the predominance of the steppe
fauna and evidences of a cold dry climate. That the mammoth still
abounded is seen in the mammoth engravings which are superposed
on those of the bison in Font-de-Gaume.
Fig. 245. Front and side views of a saiga antelope carved upon a bone dart-thrower from the Magdalenian deposits of Mas d'Azil. After Piette.

The succeeding life period is that of the retreat of the tundra and
steppe mammals and of the increasing rarity of the reindeer and of
the mammoth in southwestern Europe; it corresponds broadly with
the returning cold and moist climate of the second Postglacial
advance known in the Alps as the Gschnitz stage. With the spread of
the forests and the retreat to the north of the reindeer, the principal
source both of the supply of food and clothing and of all the bone
implements of industry and of the chase, a new set of life conditions
may have gradually become established. If it is true, as most
students of geographical conditions and of the climate maintain, that
Europe at the same time became more densely forested, the chase
may have become more difficult, and the Crô-Magnons may have
begun to depend more and more upon the life of the streams and
the art of fishing. It is generally agreed that the harpoons were
chiefly used for fishing and that many of the microlithic flints, which
now begin to appear more abundantly, may have been attached to a
shaft for the same purpose. We know that similar microliths were
used as arrow points in predynastic Egypt.
Breuil(35) observes very significant industrial changes in closing
Magdalenian times: first, the beginning of small geometric forms of
flints suggesting the Tardenoisian types; second, the occasional use
of stag horn in place of reindeer horn; third, a modification in the
form of bone implements toward the patterns of Azilian times;
fourth, the rapid decline—one may almost say sudden disappearance
—of the artistic spirit. Schematic and conventional designs begin to
take the place of the free realistic art of the middle Magdalenian.
Thus the decline of the Crô-Magnons as a powerful race may have
been due partly to environmental causes and the abandonment of
their vigorous nomadic mode of life, or it may be that they had
reached the end of a long cycle of psychic development, which we
have traced from the beginning of Aurignacian times. We know as a
parallel that in the history of many civilized races a period of great
artistic and industrial development may be followed by a period of
stagnation and decline without any apparent environmental causes.

Crô-Magnon Descendants in Modern Europe

We might attribute this great change, which affected all of western
Europe, to the extinction of the Crô-Magnon race were it not for the
existing evidence that the race survived throughout the Azilian-
Tardenoisian or close of the Upper Palæolithic. At the close of the
Palæolithic the race broke up throughout western Europe into many
colonies, which can perhaps be traced into Neolithic and even into
recent times. The anatomical evidence for this survival theory chiefly
consists of the highly characteristic form of the head.
In Europe a very broad face and a long, narrow cranium is such an
infrequent combination that anthropologists maintain that it affords
a means of identifying the descendants of the prehistoric Crô-
Magnon race wherever they persist to-day. Since Dordogne was the
geographic centre of the race in Upper Palæolithic times, is it merely
a coincidence that Dordogne is still the centre of a similar type?
Ripley(36) has given us a valuable résumé of our present knowledge
of this subject. The most significant trait of the long-headed people
of Dordogne is that in many cases the face is almost as broad as in
the normal Alpine round-headed type; in other words, it is strongly
disharmonic; in profile the back part of the head rises and in front
view the head is narrowed at the top; the skull is very low-vaulted;
the brow ridges are prominent; the nose is well formed; the cheek-
bones are prominent, and the powerful cheek muscles give a
peculiarly rugged cast to the countenance. The appearance,
however, is not repellent, but more often open and kindly. The men
are of medium height, but very susceptible to environment as
regards stature; they are tall in fertile places, and stunted in less
prosperous districts. They are not degenerate at all, but keen and
alert of mind. The present people of Dordogne agree with but one
other type of men known to anthropologists, namely, the ancient
Crô-Magnon race. The geographical evidence that here in Dordogne
we have to do with the survivors of the real Crô-Magnon race seems
to be sustained by a comparison of the characteristics of the
prehistoric skulls found at Crô-Magnon, Laugerie Basse, and
elsewhere in Dordogne, with the heads of the types of to-day. The
cranial indices of the prehistoric skulls, varying from 70 per cent to
73 per cent, correspond with indices of the living head of 72 per
cent to 75 per cent. None of the people of Dordogne are quite so
long-headed as this, the average index of the living head in an
extreme district being 76 per cent; but within the whole population
there are much lower indices.
The probability of direct descent becomes stronger when we
consider the disharmonic low-skulled shape of the Crô-Magnon head
and the remarkable elongation of the skull at the back. In the
prehistoric Crô-Magnons the brows were strongly developed, the eye
orbits low, the chin prominent. The facial type has been
characterized by de Quatrefages(37) as follows: "The eye depressed
beneath the orbital vault; the nose straight rather than arched; the
lips somewhat thick, the jaw and the cheek-bones strongly
developed, the complexion very brown, the hair very dark and
growing low on the forehead—a whole which, without being
attractive, was in no way repulsive."
In southern France we observe a continuity not only of the head
form but of the prevalence of black hair and eyes. Why should this
Crô-Magnon type have survived at this point and have disappeared
elsewhere? In order to consider the particular cause of this
persistence of a Palæolithic race, we must, with Ripley, broaden our
horizon, and consider the whole southwest from the Mediterranean
to Brittany as a unit.
The survival is partly attributed to favorable geographical
environment and partly to geological and racial barriers. On the
north the intrusion of the Teutonic race was shut off and competition
was narrowed down to the Crô-Magnon and Alpine types.
If the people of Dordogne are veritable survivors of the Crô-
Magnons of the Upper Palæolithic, they certainly represent the
oldest living race in western Europe, and is it not extremely
significant that the most primitive language in Europe, that of the
Basques of the northern Pyrenees, is spoken near by, only 200 miles
to the southwest? Is there possibly a connection between the
original language of the Crô-Magnons, a race which once crowded
the region of the Cantabrian Mountains and the Pyrenees, and the
existing agglutinative language of the Basques, which is totally
different from all the European tongues? This hypothesis, suggested
by Ripley,(38) is very well worth considering, for it is not
inconceivable that the ancestors of the Basques conquered the Crô-
Magnons and subsequently acquired their language.
The prehistoric Crô-Magnon men would seem, therefore, to have
remained in or near their early settlements through all the changes
of time and the vicissitudes of history. "It is, perhaps," observes
Ripley, "the most striking instance known of a persistency of
population unchanged through thousands of years."
The geographic extension of this race was once very much wider
than it is to-day. The classical skull of Engis, Belgium, belongs to this
type. It has been traced from Alsace in the east to the Atlantic in the
west. Ranke asserts that it is to be found to-day in the hills of
Thuringia, and that it was a prevalent type there in the past.
Verneau considers that it was the type prevailing among the extinct
Guanches of the Canary Islands. Collignon(39) has identified it in
northern Africa, and regards the Crô-Magnons as a subvariety of the
Mediterranean race, an opinion consistent at least with the
archæological evidence that this race came into Europe with the
Aurignacian culture, which was circum-Mediterranean in distribution.
Traces of Crô-Magnon head formation are found among the living
Berbers.
At present, however, this race is believed to survive only in a few
isolated localities, namely, in Dordogne, at a small spot in Landes,
near the Garonne in southern France, and at Lannion in Brittany,
where nearly one-third of the population is of the Crô-Magnon type.
It is said to survive on the island of Oléron off the west coast of
France, and there is evidence of similar descent to be found among
the people of the islands of northern Holland. The people of Trysil,
on the Scandinavian peninsula, are characterized as having
disharmonic features, possibly representing an outcrop of the Crô-
Magnon type.
Our interest in the fate of the Crô-Magnons is so great that the
Guanche theory may also be considered; it is known to be favored
by many anthropologists: von Behr, von Luschan, Mehlis, and
especially by Verneau. The Guanches were a race of people who
formerly spread all over the Canary Islands and who preserved their
primitive characteristics even after their conquest by Spain in the
fifteenth century. The differences from the supposed modern Crô-
Magnon type may be mentioned first. The skin of the Guanches is
described by the poet Viana as light-colored, and Verneau considers
that the hair was blond or light chestnut and the eyes blue; the
coloring, however, is somewhat conjectural. The features of
resemblance to the ancient Crô-Magnons are numerous. The
minimum stature of the men was 5 feet 7 inches, and the maximum
6 feet 7 inches; in one locality the average male stature was over 6
feet. The women were comparatively small. The most striking
characters of the head were the fine forehead, the extremely long
skull, and the pentagonal form of the cranium, when seen from
above, caused by the prominence of the parietals—a Crô-Magnon
characteristic. Among the insignia of the chiefs was the arm-bone of
an ancestor; the skull also was carefully preserved. The offensive
weapons in warfare consisted of three stones, a club, and several
knives of obsidian; the defensive weapon was a simple lance. The
Guanches used wooden swords with great skill. The habitation of all
the people was in large, well-sheltered caverns, which honeycombed
the sides of the mountains; all the walls of these caverns were
decorated; the ceilings were covered with a uniform coat of red
ochre, while the walls were decorated with various geometric
designs in red, black, gray, and white. Hollowed-out stones served
as lamps. We may conclude with Verneau that there is evidence,
although not of a very convincing kind, that the Guanches were
related to the Crô-Magnons.(40) His observations on these supposed
Crô-Magnons of the Canary Islands are cited in the Appendix, Note
V. We regret that Verneau in his memoir(41) does not present his
more recent views in regard to the prehistoric distribution of this
great race.

(1) Breuil, 1912.7, p. 203.
(2) Op. cit., p. 205.
(3) James, 1902.1.
(4) Heim, 1894.1, p. 184.
(5) Schmidt, 1912.1, p. 262.
(6) Fraunholz, 1911.1.
(7) Geikie, 1914.1, pp. 25, 26.
(8) Boule, 1899.1.
(9) Breuil, 1912.7, pp. 203-205.
(10) Obermaier, 1912.1, pp. 341, 342.
(11) Martin, R., 1914.1, pp. 15, 16.
(12) Verworn, 1914.1.
(13) Op. cit., p. 646.
(14) Breuil, 1912.7, p. 201.
(15) Lartet, 1875.1.
(16) Breuil, 1912.7, p. 213.
(17) Schmidt, 1912.1, p. 136.
(18) Breuil, op. cit., pp. 216, 217.
(19) Breuil, 1909.3.
(20) Op. cit., p. 410.
(21) Cartailhac, 1906.1, pp. 227, 228.
(22) Rivière, 1897.1; 1897.2.
(23) Reinach, 1913.1.
(24) Breuil, 1912.1, p. 202.
(25) Cartailhac, 1908.1.
(26) Capitan, 1908.1, pp. 501-514.
(27) Ibid., 1910.1, pp. 59-132.
(28) Breuil, 1912.1, pp. 196, 197.
(29) Schmidt, 1912.1, p. 116.
(30) Fraunholz, 1911.1.
(31) Schmidt, 1912.1, p. 154.
(32) Déchelette, 1908.1, vol. I, pp. 191-194.
(33) Nehring, 1880.1; 1896.1.
(34) Bayer, 1912.1, pp. 13-21.
(35) Breuil, 1912.7, pp. 212, 216.
(36) Ripley, 1899.1, pp. 39, 165, 173, 174-179, 211, 406.
(37) Op. cit., p. 176.
(38) Op. cit., p. 181.
(39) Collignon, 1890.1.
(40) Verneau, 1891.1.
(41) Ibid., 1906.1.
CHAPTER VI
CLOSE OF THE OLD STONE AGE—
INVASION OF NEW RACES—
HISTORY OF THE MAS D'AZIL, OF
FÈRE-EN-TARDENOIS—FOREST
ENVIRONMENT AND LIFE—ORIGIN
OF THE AZILIAN-TARDENOISIAN
CULTURE—CHARACTERS AND
CUSTOMS OF THE NEW RACES—
TRANSITION TO THE NEOLITHIC
AND RELATIONS OF THE OLD AND
NEW RACES—APPARENT CHIEF
LINES OF HUMAN DESCENT AND OF
HUMAN MIGRATION INTO
WESTERN EUROPE.
We have now reached the very close of the Old Stone Age, a period
which is believed to extend between 10,000 and 7,000 years before
the present era. The entrance to the final cultures of the Upper
Palæolithic, known as the Azilian-Tardenoisian, marks a transition
even more abrupt than that witnessed in any preceding stage. It is
not a development; it is a revolution. The artistic spirit entirely
disappears; there is no trace of animal engraving or sculpture;
painting is found only on flattened pebbles or in schematic or
geometric designs on wall surfaces. Of bone implements only
harpoons and polishers remain, and even these are of inferior
workmanship and without any trace of art. The flint industry
continues the degeneration begun in the Magdalenian and exhibits a
new life and impulse only in the fashioning of the extremely small or
microlithic tools and weapons known as 'Tardenoisian.' Both bone
and flint weapons of the chase disappear, yet the stag is hunted and
its horns are used in the manufacture of harpoons. This is the 'Age
of the Stag,' the final stage of the 'Cave Period' in western Europe,
and is subsequent to the 'Age of the Reindeer' in the south.
It would appear as if the very same regions formerly occupied by the
great hunting Crô-Magnon race from Aurignacian to Magdalenian
times were now inhabited by a race or races largely employed in
fishing. The country is thickly forested. The climate is still cold and
extremely moist, and human life everywhere is in the grottos or
entrances to the caverns.

Invasion of Four New Races in Closing Upper Palæolithic Times

How far this revolution is due to the decline of the Crô-Magnon race
and how far to the invasion of one or more new races is very difficult
to determine in the absence of the anatomical evidence derived from
skeletal remains. Two new races had certainly found their way along
the Danube as shown in the burials of Ofnet, in eastern Bavaria; one
is extremely broad-headed and perhaps of central Asiatic origin,
while the other is extremely long-headed and perhaps of southerly
or Mediterranean origin. It is possible that these two races
correspond respectively with the easterly and southerly industrial
influences which are observed in the Azilian-Tardenoisian stage. The
former is the first brachycephalic race to enter western Europe, for it
will be recalled that all the previous races, the Crô-Magnons, the
Brünns, and the Neanderthals, are dolichocephalic. The long-headed
race found at Ofnet is very clearly distinguished from the
disharmonic long-headed Crô-Magnon race by the narrowness of the
face; in other words, it is an harmonic type of head and face, which
may have been Mediterranean in origin, like the so-called
'Mediterranean race' of Sergi.
This fresh invasion of western Europe by two races arriving by one
or more of the great migration routes from the vast Eurasiatic
mainland to the east, races with a relatively high brain development,
is certainly one of the most surprising features of the close of the
Palæolithic Period, for we have long been accustomed to think that
these fresh easterly and southerly invasions began only in Neolithic
times.
As the Upper Palæolithic draws to an end, there is, according to
Breuil, still another industrial influence making itself felt: it comes
from the northeast along the shores of the Baltic.
Putting together all the fragmentary evidence which we possess, we
may regard western Europe at the close of the Old Stone Age as
peopled by four and possibly by five distinct races, as follows:
5. Arriving late in Palæolithic times, a race along the
shores of the Baltic, known only by its Maglemose
industry; possibly a Teutonic race.
4. A south Mediterranean race, known only by its
Tardenoisian industry, migrating along the northern shores
of Africa and spreading over Spain; with a conventional
and schematic art; probably an advance wave of the true
'Mediterranean' race of Sergi; possibly identical with race
3 below. (The same as Race 4, p. 278.)
3. A long-headed race found at Ofnet, in eastern Bavaria;
possibly a branch of the true 'Mediterranean' race 4
above, but not related to the Brünn. (Possibly the same as
Race 4.)
2. The newly arriving Furfooz-Grenelle race, broad-
headed; known along the Danube at Ofnet, in eastern
Bavaria, and northward in Belgium; possibly a branch of
the 'Alpine' race. (The same as Race 5, p. 278.)
1. The surviving Crô-Magnons, in a stage of industrial
decline, pursuing the Azilian industry, probably inhabiting
France and northern Spain.
The broad-headed Ofnet race mentioned above is apparently the
same as the Furfooz-Grenelle race, and may also correspond with
the existing Alpine-Celtic race of western Europe. The long-headed
race of Ofnet may correspond with the existing 'Mediterranean' race
of Sergi.
The presence of the Crô-Magnon race in western Europe during
Azilian-Tardenoisian times is not sustained, so far as we know, by
any anatomical evidence, but is suggested by the mode of burial of
two skeletons found by Piette in the Azilian deposits of the station of
Mas d'Azil. This burial, like that of Ofnet, is typical of Upper
Palæolithic and not of Neolithic times. These skeletons lay in the
'Azilian' layer (VI) described below. As the smaller bones were
missing, Piette concluded that the remains had been for some time
exposed to the weather before burial, and that the larger bones had
been scraped and cleaned with flint knives, and then colored red
with oxide of iron before interment. According to other authorities,
the traces of scraping and cleaning are doubtful; there can be no
question, however, that the separation of the bones of the skeleton
and the use of coloring matter constitute strong evidence that this
Azilian burial was the work of members of the Crô-Magnon race.
In addition to what we have said as to the survival of the Crô-
Magnon race in the preceding chapter, the opinion of Cartailhac(1)
may be cited: "The race of Crô-Magnon is well determined. There is
no doubt about their high stature, and Topinard is not the only one
who believes that they were blonds. We have traced them through
the 'Reindeer Period' into the Neolithic Epoch, where they were
widely distributed and positively related either to the ancient or
actual populations of modern France, being especially characteristic
of our region [France] and of the western Mediterranean. While the
race of Crô-Magnon predominated in the south and in the west, that
of Furfooz predominated in the northeast of France and in Belgium.
These brachycephals were probably brown-haired or of dark
coloring."
But before observing further the characters of these four or five
races, let us examine their industries.

Discovery of the Azilian Type Station

As remarked above, it is believed that these industries prevailed
between 7,000 and 10,000 years before our era, that is, between the
close of Magdalenian times and the beginning of the Neolithic or
New Stone Age. This transition period corresponds with the interval
in which the Azilian-Tardenoisian culture swept all over western
Europe and completely replaced the Magdalenian. From Castillo in
the Cantabrian Mountains of northern Spain to Ofnet on the upper
Danube there is a complete replacement by this new culture. The
Magdalenian culture does not linger anywhere; it is totally
eliminated; the suddenness of the change both in the animal life and
in the industry is nowhere more clearly indicated than at the type
station of Mas d'Azil in southern France, which may now be
described.
In 1887 Edouard Piette commenced his exploration of the deposits in
the great cavern of Mas d'Azil. This station takes its name from the
little hamlet of Mas d'Azil in the foot-hills of the Pyrenees about forty
miles southwest from Toulouse. Here the River Arize winds for a
quarter of a mile through a lofty natural tunnel traversed by the
highway from St. Girons to Carcassonne. A rich layer of Magdalenian
deposits first attracted Piette's attention, and he found here some of
the finest examples of late Magdalenian art, but above these
deposits he discovered a hitherto unrecognized industrial stage, to
which he gave the name Azilian. The Azilian layers yielded over one
thousand specimens of flattened and double-barbed harpoons made
of the horns of the stag, thus widely differing from the late
Magdalenian harpoons which are rounded and made of the horns of
the reindeer. The entire succession of deposits, as explored by
Piette, is an epitome of the prehistory of Europe from early
Magdalenian times to the Age of Bronze, and should be compared
with the successive deposits of Castillo (p. 164), Sirgenstein (p.
202), Ofnet (p. 476), and Schweizersbild (p. 447).
Fig. 246. Western entrance
to the great station of
Mas d'Azil. "Here the
River Arize winds for a
quarter of a mile
through a lofty natural
tunnel traversed by
the highway from St.
Girons to
Carcassonne."
Photograph by N. C.
Nelson.
The Mas d'Azil section is as follows:
Prehistoric and Neolithic
IX. Iron implements, pottery of the Gauls. At the top
Gallo-Roman remains, glass and glazed pottery.
VIII. Middle Neolithic and Age of Bronze; layer of pottery,
polished stone implements, traces of copper and of
bronze.
VII. Dawn of the Neolithic. Fauna includes the horse, urus,
stag, and wild boar. Chipped and polished flints, awls and
polishers in bone; harpoons rare. Beginnings of pottery.
Upper Palæolithic
VI. Azilian, red archæological layer, masses of peroxide of
iron. Extremely moist climate. Broad flat harpoons of stag
horn perforated at the base, numerous flattened and
painted pebbles (galets), flints of degenerate Magdalenian
form, especially small rounded planers and knife flakes,
awls and polishers in bone. No trace of reindeer in the
fire-hearths; stag abundant, also roe-deer and brown
bear; wild boar, wild cattle, beaver, a variety of birds. No
trace of polished stone implements. Interred in this layer,
beneath the deposits of streaked cinders and quite
undisturbed, two human skeletons were found, which
Piette believed had been macerated with flints and then
colored red with peroxide of iron.
V. Sterile finely stratified loam layer, a flood deposit of the
River Arize.
IV. Late Magdalenian culture layer; twelve double-rowed
harpoons made of reindeer horn, a few fashioned from
stag horn; numerous engravings and sculptures in bone.
Remains of the reindeer rare in the hearths; those of the
royal stag (Cervus elaphus) abundant.
III. A sterile flood deposit of the River Arize.
II. Middle and Early Magdalenian culture layers, with barbed
harpoons of reindeer horn; flint implements of early
Magdalenian type, bone needles. Bones of the reindeer
abundant.
I. Gravel deposits. Interspersed fire-hearths.
The total thickness of these culture deposits is 8.03 m., or 26 feet 4
inches. The Azilian type layer (VI) containing flat harpoons of stag
horn and painted pebbles, intercalated between the deposits of the
Reindeer Age and the Neolithic layers, is, on account of its
stratigraphic position, the most interesting and instructive of all the
sites representing this phase of transition; and Piette was fully
justified in giving to the corresponding culture period the name of
Azilian.(2)
The transformation of art and industry, indicated in the Azilian
culture layer, is as decided as that in the animal life. We observe in
this layer no trace of the animal engravings or sculptures which
occur so abundantly in the late Magdalenian layer below; the use of
pigments is confined to the paintings of schematic or geometric
figures on the flattened pebbles. There is no suggestion of art in any
of the bone implements, and the harpoons of stag horn are rudely
fashioned; this type of harpoon appears to be the chief survivor of
the rich variety of implements noted in the Magdalenian layer below.
The stag horn harpoon, moreover, is fashioned with far less skill than
the beautiful Magdalenian harpoons; like them it has two rows of
barbs, but they are not cut with the same delicacy and exactness. As
to the form of the new model, it is explained by the nature of the
new material; the interior of the stag horn being composed of a
spongy tissue, could not be utilized as could the harder and more
compact interior of the reindeer horn; the craftsman, therefore, was
obliged to fashion his harpoon out of the exterior of one side of the
stag horn, and in consequence to make it flat.
Fig. 247. Typical Azilian
harpoons of stag horn.
After de Mortillet. 287.
A single-rowed
harpoon from Mas
d'Azil. 288. Harpoon
with perforated base
from the shelter of La
Tourasse, Haute-
Garonne. 289. Double-
rowed harpoon from
the same shelter. 290.
A similar harpoon with
the barbs alternate
instead of opposite,
from Mas d'Azil. 291.
Harpoon with
triangular base and
round perforation from
the Grotte de la Vache,
near Tarascon. All one-
third actual size,
except 291, which is
four-ninths actual size.

There are no bone needles, no javelins or sagaies; nor are there any
of the beautifully carved weapons of bone. There is also a reduction
in the uses to which the split bones are put, such as the large lissoirs
or polishers. The bone implements appear to be derived from an
impoverished late Aurignacian stage; the same is true of the flint
implements, for we observe a return of the keeled scraper (grattoir
caréné). There is also a return of certain types of graving tools and
of the knife-like form of the flake; even some of the small geometric
types of flints resemble those of the Aurignacian levels.
The many shells of the moisture-loving snail Helix nemoralis, found
in the fire-hearths of Mas d'Azil, are proofs of the humidity of the
climate, a fact confirmed by the contemporary flood deposits of the
Arize. The frequent and heavy rains drove the last few
representatives of the steppe fauna away to the north. These
climatic conditions favored the formation of peat-bogs, so frequent
to-day in the north of France, and also the growth of vast forests,
inhabited by the stag, which extended over the whole country.
The pebbles of Mas d'Azil are painted on one side with peroxide of
iron, a deposit of which is found in the neighborhood of the cave.
The color, mixed in shells of Pecten, or in hollowed pebbles or on flat
stones, was applied either with the finger or with a brush. The many
enigmatic designs consist chiefly of parallel bands, rows of discs or
points, bands with scalloped edges, cruciform designs, ladder-like
patterns (scalariform) such as are found in the 'Azilian' engravings
and paintings of the caverns, and undulating lines. These graphic
combinations resemble certain syllabic and alphabetic characters of
the Ægean, Cypriote, Phœnician, and Greco-Latin inscriptions.
However curious these resemblances may be, they are not sufficient
to warrant any theory connecting the signs on the painted pebbles
of the Azilians with the alphabetic characters of the oldest known
systems of writing.(3) Piette attempted to explain some of the
exceedingly crude designs on these pebbles as a system of notation,
others as pictographs and religious symbols, and some few as
genuine alphabetical signs, and suggested that the cavern of Mas
d'Azil was an Upper Palæolithic school where reading, reckoning,
writing, and the symbols of the sun were learned and taught. The
very wide distribution of these symbolic pebbles and the painting of
similar designs on the walls of the caverns certainly prove that they
had some religious or economic significance, which may be revealed
by subsequent research.
Fig. 248. Azilian galets
coloriés, flat, painted
pebbles, from the type
station of Mas d'Azil.
After Piette.

The Tardenoisian Type Station

Turning from the region of the Pyrenees in Azilian times, we observe
the region lying between the Seine and the Meuse in northern
France as the scene of a contemporary industry. At the station of
Fère-en-Tardenois, in the Department of the Aisne, is found an
especially large number of the pygmy flints;(4) these present various
geometric forms, including the primitive triangular, as well as the
rhomboidal, trapezoidal, and semicircular; together, they were
designated by de Mortillet as Tardenoisian flints, and in 1896, in
monographing this microlithic flint industry, he traced them
throughout France, Belgium, England, Portugal, Spain, Italy,
Germany, and Russia, also along the southern Mediterranean
through Algiers, Tunis, Egypt, and eastward into Syria and even
India.
These geometric flints were at first attributed to a primitive invasion
which was supposed to have occurred at the beginning of Neolithic
times; thus the Tardenoisian industry was considered as
contemporaneous with that of the Campignian, which is early
Neolithic. It was further observed that the topographical location of
the stations closely followed the borders of ocean inlets, or of river
courses, and when the food materials found in the hearths were
compared, it appeared that these flints were used principally by
fishermen or tribes subsisting upon fish. From an examination of the
flints, it would appear that a very large number of them were
adapted for insertion in small harpoons, or that those of grooved
form might even have been used as fish-hooks. Thus the picture was
drawn of a population of fishermen. The Tardenoisian, therefore,
was for a long time regarded as contemporaneous with the early
Neolithic rather than with the close of Palæolithic times, but as
exploration proceeded it was found that neither the remains of
domestic animals nor any traces of pottery occur in any of these
Tardenoisian deposits, which consequently have nothing in common
with the true Neolithic culture.
The problem was finally solved in 1909, when the grotto of Valle
near Gibaja, Santander, in northern Spain, was discovered by Breuil
and Obermaier.(5) Here was a classic Azilian deposit containing all
the well-known Azilian types of bone implements, such as fine
harpoons, carvings in deer horn, bone javelins, polishers of deer
bone, flint flakes resembling those of the late Magdalenian, also
microlithic flints of typical geometric Tardenoisian form. This
discovery established the fact that the lower levels of the
Tardenoisian industry were not really to be distinguished from the
Azilian, for here beneath layers with painted pebbles and harpoons
of Azilian style were harpoons with single and double rows of barbs
of Magdalenian pattern, but cut in stag horn instead of reindeer
horn.
The mammalian life in this true Azilian-Tardenoisian layer includes
the chamois, roe-deer, wild boar, and urus, or wild cattle. In a layer
just below, which represents the close of the Magdalenian industrial
period, there are found, although rarely, remains of the reindeer, an
animal hitherto unknown in this part of Spain, also the wild boar, the
bison, the ibex, and the lynx. After this discovery it could no longer
be questioned that the Azilian and Tardenoisian were contemporary.
As to the relation of these two industries, Breuil remarks(6) that the
prolongation of the Tardenoisian types of flints is observed in Italy
and in Belgium, but neither the term 'Tardenoisian' nor the term
'Azilian' is sufficiently comprehensive to embrace the totality of these
little industries, which will finally be distinguished clearly from each
other. Of the two the Azilian represents the prolongation of an
ancient period of industry, the progress of which was apparently
from south to north, as we can trace the distribution of the
characteristic flat harpoons of deer horn from the Cantabrian
Mountains and the Pyrenees, through southern and central France,
to Belgium, England, and the western coast of Scotland. The later
industrial phase, the Tardenoisian, with its geometric trapeziform
flints, originally appears along the southern Mediterranean in Tunis
and to the eastward in the Crimea, while in France it represents a
final phase of the Palæolithic, closely approaching the period of the
earliest Neolithic or pre-Campignian hearths common along the
Danube and observed in the vicinity of Liége. Thus the most
comprehensive term by which to designate the ensemble of these
implements, in Europe at least, would be Azilian-Tardenoisian.

Fig. 249. Small geometric
flints characteristic of
the Tardenoisian
industry. After de
Mortillet. 295 to 303,
321, 322, 326. From
various sites in
northern France. 311.
Uchaux, Vaucluse,
France. 305, 315, 320.
Valley of the Meuse,
Belgium. 312, 313.
Cabeço da Arruda,
Portugal. 304, 314.
Italy. 317, 318, 329.
Tunis. 325. Egypt. 306,
310, 324, 328. Kizil-
Koba, Crimea. 307 to
309, 316, 319, 323,
327. India. All one-half
actual size.

Environment and Mammalian Life

It appears that the chief geographic change during this period was a
subsidence of the northern coasts of Europe and an advance of the
sea causing the circulation of warm oceanic currents and a more
humid climate favorable to reforestation.
To the north, in Belgium, the tundra fauna lingered during the
extension of the early Tardenoisian industry, for here we still find
remains of the reindeer, the arctic fox, and the arctic hare mingled in
the fire-hearths with flints of Tardenoisian type. This, observes
Obermaier, constitutes proof that the Tardenoisian, with the Azilian,
must be placed at the very close of Postglacial time and with the
final stage of Upper Palæolithic industry.
To the south, in the region of Dordogne and the Pyrenees, the
tundra fauna had entirely disappeared, as well as that of the steppes
and of the alpine heights; the prevailing animal in the forests is the
royal stag, adapted to forests of temperate type and associated with
the Eurasiatic forest and meadow fauna which now dominated
western Europe.
The only survivor of the great African-Asiatic fauna is the lion, which
appears in the late Palæolithic stations in the region of the Pyrenees;
the arctic wolverene also gives the fauna a Postglacial aspect, for,
like the lion, it is never found in central or western Europe after the
close of Upper Palæolithic times. Other enemies of the herbivorous
fauna were the wolf and the brown bear.
Besides the red deer, or stag, the forests at this time were filled with
roe-deer. To the south in the Pyrenees the moose still survived, and
to the north there were still found herds of reindeer which survived
in central Europe as late as the twelfth century. Wild boars were
numerous, and in the streams were found the beaver and the otter.
In the forest borders and in the meadows hares and rabbits were
abundant. Through the forests and meadows of southern France and
along the borders of the Danube ranged the wild cattle (Bos
primigenius). It would appear from our limited knowledge of the life
of Azilian-Tardenoisian times that bison were found chiefly in the
northern parts of Europe. There is little direct evidence in regard to
the wild horse, the remains of which do not occur in the hearths of
Azilian times.
Our knowledge of the life of the Spanish peninsula at a period
closely succeeding this is indirectly derived from the animal frescos
in certain caverns of northern Spain, which were formerly attributed
to the Upper Palæolithic but are now referred rather to the early
Neolithic. Here are found representations of the ibex, the stag, the
fallow deer, the wild cattle, and also of the wild horses. This would
indicate that wild horses were still roaming all over western Europe
at the close of Upper Palæolithic times. The presence of the moose
in late Palæolithic times at Alpera, on the high plateaus of Spain, has
been determined; this animal has also been found in the Pyrenees
during the Azilian stage.(7)
The great contrast between the mammalian life of Magdalenian and
that of Azilian-Tardenoisian times is witnessed in the stations along
the upper Danube, as described by Koken.(8) In Höhlefels,
Schmiechenfels, and Propstfels, associated with implements of the
late Magdalenian industry, are found ten types of animals belonging
to the forests and four characteristic of the forests and meadows, or
fourteen species altogether. With these are mingled two alpine
forms, the ibex and the alpine shrew; also two types of mammals
belonging to the steppes, and no less than six mammals and birds
from the tundras, namely, the reindeer, the arctic fox, the ermine,
the arctic hare, the banded lemming, and the arctic ptarmigan.
In wide contrast to this assemblage of late Magdalenian life on the
upper Danube, there appear in Azilian times along the shores of the
middle Danube in the stations of Ofnet and of Istein the following
characteristic forest forms: Sus scrofa ferus (wild boar), Cervus
elaphus (stag), Capreolus capreolus (roe-deer), Bos (?) primigenius
(urus), Lepus (rabbit or hare), Ursus arctos (brown bear), Felis leo
(lion), Gulo luscus (common wolverene), Lynchus lynx (lynx), Vulpes
(fox), Mustela martes (marten), Castor fiber (European beaver), Mus
(field-mouse), Turdus (thrush). It thus appears that the alpine, the
steppe, and the tundra faunæ had entirely disappeared from this
region.

Origin and Distribution of the Azilian-Tardenoisian Industry

This industry represents the last stage of the Old Stone Age. The
decline in the art of fashioning flints, begun in Magdalenian times,
appears to continue in the Azilian-Tardenoisian. As to the tiny
symmetrical flints which are characteristic of this period, among the
microliths of almost all the late Magdalenian stations pre-
Tardenoisian forms are found which may be regarded as prototypes
of the geometric Tardenoisian flints;(9) this represents a new fashion
established in flint-making under influences coming from the south.
There was also a natural or local Azilian evolution from the
Magdalenian types and technique. In general the flint implements
which had so long prevailed in western Europe become smaller in
diameter and more carelessly retouched, showing marked
deterioration even from the late Magdalenian stages. For the
preparation of hides and the fashioning of bone we discover
unsymmetrical planing tools (grattoirs), also small, well-formed oval
scrapers (racloirs), and microlithic scrapers. Borers (perçoirs) with
oblique ends and gravers (burins) made of small flakes are the types
of implements which most frequently occur, but the great variety of
borers, so characteristic of the Aurignacian and the Magdalenian
industries, had entirely disappeared in Azilian times.
The marks of industrial degeneration are also conspicuous in the
bone implements, which show a very great deterioration in number
and quality as compared with the Magdalenian, and which are
principally confined to three types—the harpoons, the awls
(poinçons), and the smoothers (lissoirs), together with very small
bone borers (perçoirs). The distinctive feature of the Azilian bone
industry is the flat harpoon of stag horn; it is known that the use of
stags' antlers for fashioning harpoons began in the late Magdalenian,
when most of them were still being fashioned from reindeer horn.
These flat Azilian harpoons succeed the type of the double-rowed,
cylindrical harpoons of the late Magdalenian, and are found mainly
where the rivers, lakes, or pools offered favorable conditions for
fishing. Thus the Azilian bone-harpoon industry, like the Tardenoisian
microlithic flint industry, was largely pursued by fisherfolk.

