
R Programming for Data Science
Roger D. Peng
This book is for sale at http://leanpub.com/rprogramming

This version was published on 2019-12-11

This is a Leanpub book. Leanpub empowers authors and publishers with the Lean Publishing
process. Lean Publishing is the act of publishing an in-progress ebook using lightweight tools and
many iterations to get reader feedback, pivot until you have the right book and build traction once
you do.

© 2014 - 2019 Roger D. Peng


Also By Roger D. Peng
The Art of Data Science
Exploratory Data Analysis with R
Executive Data Science
Report Writing for Data Science in R
Advanced Statistical Computing
The Data Science Salon
Conversations On Data Science
Mastering Software Development in R
Essays on Data Analysis
Contents

1. Stay in Touch!

2. Preface

3. History and Overview of R
3.1 What is R?
3.2 What is S?
3.3 The S Philosophy
3.4 Back to R
3.5 Basic Features of R
3.6 Free Software
3.7 Design of the R System
3.8 Limitations of R
3.9 R Resources

4. Getting Started with R
4.1 Installation
4.2 Getting started with the R interface

5. R Nuts and Bolts
5.1 Entering Input
5.2 Evaluation
5.3 R Objects
5.4 Numbers
5.5 Attributes
5.6 Creating Vectors
5.7 Mixing Objects
5.8 Explicit Coercion
5.9 Matrices
5.10 Lists
5.11 Factors
5.12 Missing Values
5.13 Data Frames
5.14 Names
5.15 Summary

6. Getting Data In and Out of R
6.1 Reading and Writing Data
6.2 Reading Data Files with read.table()
6.3 Reading in Larger Datasets with read.table
6.4 Calculating Memory Requirements for R Objects

7. Using the readr Package

8. Using Textual and Binary Formats for Storing Data
8.1 Using dput() and dump()
8.2 Binary Formats

9. Interfaces to the Outside World
9.1 File Connections
9.2 Reading Lines of a Text File
9.3 Reading From a URL Connection

10. Subsetting R Objects
10.1 Subsetting a Vector
10.2 Subsetting a Matrix
10.3 Subsetting Lists
10.4 Subsetting Nested Elements of a List
10.5 Extracting Multiple Elements of a List
10.6 Partial Matching
10.7 Removing NA Values

11. Vectorized Operations
11.1 Vectorized Matrix Operations

12. Dates and Times
12.1 Dates in R
12.2 Times in R
12.3 Operations on Dates and Times
12.4 Summary

13. Managing Data Frames with the dplyr package
13.1 Data Frames
13.2 The dplyr Package
13.3 dplyr Grammar
13.4 Installing the dplyr package
13.5 select()
13.6 filter()
13.7 arrange()
13.8 rename()
13.9 mutate()
13.10 group_by()
13.11 %>%
13.12 Summary

14. Control Structures
14.1 if-else
14.2 for Loops
14.3 Nested for loops
14.4 while Loops
14.5 repeat Loops
14.6 next, break
14.7 Summary

15. Functions
15.1 Functions in R
15.2 Your First Function
15.3 Argument Matching
15.4 Lazy Evaluation
15.5 The ... Argument
15.6 Arguments Coming After the ... Argument
15.7 Summary

16. Scoping Rules of R
16.1 A Diversion on Binding Values to Symbol
16.2 Scoping Rules
16.3 Lexical Scoping: Why Does It Matter?
16.4 Lexical vs. Dynamic Scoping
16.5 Application: Optimization
16.6 Plotting the Likelihood
16.7 Summary

17. Coding Standards for R

18. Loop Functions
18.1 Looping on the Command Line
18.2 lapply()
18.3 sapply()
18.4 split()
18.5 Splitting a Data Frame
18.6 tapply
18.7 apply()
18.8 Col/Row Sums and Means
18.9 Other Ways to Apply
18.10 mapply()
18.11 Vectorizing a Function
18.12 Summary

19. Regular Expressions
19.1 Before You Begin
19.2 Primary R Functions
19.3 grep()
19.4 grepl()
19.5 regexpr()
19.6 sub() and gsub()
19.7 regexec()
19.8 The stringr Package
19.9 Summary

20. Debugging
20.1 Something’s Wrong!
20.2 Figuring Out What’s Wrong
20.3 Debugging Tools in R
20.4 Using traceback()
20.5 Using debug()
20.6 Using recover()
20.7 Summary

21. Profiling R Code
21.1 Using system.time()
21.2 Timing Longer Expressions
21.3 The R Profiler
21.4 Using summaryRprof()
21.5 Summary

22. Simulation
22.1 Generating Random Numbers
22.2 Setting the random number seed
22.3 Simulating a Linear Model
22.4 Random Sampling
22.5 Summary

23. Data Analysis Case Study: Changes in Fine Particle Air Pollution in the U.S.
23.1 Synopsis
23.2 Loading and Processing the Raw Data
23.3 Results

24. Parallel Computation
24.1 Hidden Parallelism
24.2 Embarrassing Parallelism
24.3 The Parallel Package
24.4 Example: Bootstrapping a Statistic
24.5 Building a Socket Cluster
24.6 Summary

25. Why I Indent My Code 8 Spaces

26. About the Author

1. Stay in Touch!
Thanks for purchasing this book. If you are interested in hearing more from me about things that
I’m working on (books, data science courses, podcast, etc.), I have a regular podcast called Not So
Standard Deviations¹ that I co-host with Dr. Hilary Parker, a Data Scientist at Stitch Fix. On this
podcast, Hilary and I talk about the craft of data science and discuss common issues and problems
in analyzing data. We’ll also compare how data science is approached in both academia and industry
contexts and discuss the latest industry trends. You can listen to recent episodes on our web page or
you can subscribe to it in iTunes² or your favorite podcasting app.
For those of you who purchased a printed copy of this book, I encourage you to go to the Leanpub
web site and obtain the e-book version³, which is available for free. The reason is that I will
occasionally update the book with new material and readers who purchase the e-book version are
entitled to free updates (this is unfortunately not yet possible with printed books).
Thanks again for purchasing this book and please do stay in touch!
¹http://nssdeviations.com
²https://itunes.apple.com/us/podcast/not-so-standard-deviations/id1040614570
³https://leanpub.com/rprogramming
2. Preface
I started using R in 1998 when I was a college undergraduate working on my senior thesis.
The version was 0.63. I was an applied mathematics major with a statistics concentration and
I was working with Dr. Nicolas Hengartner on an analysis of word frequencies in classic texts
(Shakespeare, Milton, etc.). The idea was to see if we could identify the authorship of each of the
texts based on how frequently they used certain words. We downloaded the data from Project
Gutenberg and used some basic linear discriminant analysis for the modeling. The work was
eventually published¹ and was my first ever peer-reviewed publication. I guess you could argue
it was my first real “data science” experience.
Back then, no one was using R. Most of my classes were taught with Minitab, SPSS, Stata, or
Microsoft Excel. The cool people on the cutting edge of statistical methodology used S-PLUS. I
was working on my thesis late one night and I had a problem. I didn’t have a copy of any of those
software packages because they were expensive and I was a student. I didn’t feel like trekking over
to the computer lab to use the software because it was late at night.
But I had the Internet! After a couple of Yahoo! searches I found a web page for something called R,
which I figured was just a play on the name of the S-PLUS package. From what I could tell, R was a
“clone” of S-PLUS that was free. I had already written some S-PLUS code for my thesis so I figured
I would try to download R and see if I could just run the S-PLUS code.
It didn’t work. At least not at first. It turns out that R is not exactly a clone of S-PLUS and quite a few
modifications needed to be made before the code would run in R. In particular, R was missing a lot of
statistical functionality that had existed in S-PLUS for a long time already. Luckily, R’s programming
language was pretty much there and I was able to more or less re-implement the features that were
missing in R.
After college, I enrolled in a PhD program in statistics at the University of California, Los Angeles.
At the time the department was brand new and they didn’t have a lot of policies or rules (or classes,
for that matter!). So you could kind of do what you wanted, which was good for some students and
not so good for others. The Chair of the department, Jan de Leeuw, was a big fan of XLisp-Stat and
so all of the department’s classes were taught using XLisp-Stat. I diligently bought my copy of Luke
Tierney’s book² and learned to really love XLisp-Stat. It had a number of features that R didn’t have
at all, most notably dynamic graphics.
But ultimately, there were only so many parentheses that I could type, and still all of the research-
level statistics was being done in S-PLUS. The department didn’t really have a lot of copies of S-PLUS
lying around so I turned back to R. When I looked around at my fellow students, I realized that I
was basically the only one who had any experience using R. Since there was a budding interest in R
¹http://amstat.tandfonline.com/doi/abs/10.1198/000313002100#.VQGiSELpagE
²http://www.amazon.com/LISP-STAT-Object-Oriented-Environment-Statistical-Probability/dp/0471509167/
around the department, I decided to start a “brown bag” series where every week for about an hour
I would talk about something you could do in R (which wasn’t much, really). People seemed to like
it, if only because there wasn’t really anyone to turn to if you wanted to learn about R.
By the time I left grad school in 2003, the department had essentially switched over from XLisp-Stat
to R for all its work (although there were a few holdouts). Jan discusses the rationale for the
transition in a paper³ in the Journal of Statistical Software.
In the next step of my career, I went to the Department of Biostatistics⁴ at the Johns Hopkins
Bloomberg School of Public Health, where I have been for the past 16 years. When I got to Johns
Hopkins people already seemed into R. Most people had abandoned S-PLUS a while ago and were
committed to using R for their research. Of all the available statistical packages, R had the most
powerful and expressive programming language, which was perfect for someone developing new
statistical methods.
However, we didn’t really have a class that taught students how to use R. This was a problem because
most of our grad students were coming into the program having never heard of R. Most likely in
their undergraduate programs, they used some other software package. So along with Rafael Irizarry,
Brian Caffo, Ingo Ruczinski, and Karl Broman, I started a new class to teach our graduate students
R and a number of other skills they’d need in grad school.
The class was basically a weekly seminar where one of us talked about a computing topic of interest.
I gave some of the R lectures in that class and when I asked people who had heard of R before, almost
no one raised their hand. And no one had actually used it before. The main selling point at the time
was “It’s just like S-PLUS but it’s free!” A lot of people had experience with SAS or Stata or SPSS.
A number of people had used something like Java or C/C++ before and so I often used that as a
reference frame. No one had ever used a functional-style programming language like Scheme or
Lisp.
To this day, I still teach the class, known as Biostatistics 140.776 (“Statistical Computing”). However,
the nature of the class has changed quite a bit over the years. The population of students (mostly
first-year graduate students) has shifted to the point where many of them have been introduced to R
as undergraduates. This trend mirrors the overall trend with statistics where we are seeing more and
more students do undergraduate majors in statistics (as opposed to, say, mathematics). Eventually,
by 2008–2009, when I asked how many people had heard of or used R before, everyone raised
their hand. However, even at that late date, I still felt the need to convince people that R was a “real”
language that could be used for real tasks.
R has grown a lot in recent years, and is being used in so many places now, that I think it’s
essentially impossible for a person to keep track of everything that is going on. That’s fine, but
it makes “introducing” people to R an interesting experience. Nowadays in class, students are often
teaching me something new about R that I’ve never seen or heard of before (they are quite good
at Googling around for themselves). I feel no need to “bring people over” to R. In fact it’s quite the
opposite–people might start asking questions if I weren’t teaching R.
³http://www.jstatsoft.org/v13/i07
⁴http://www.biostat.jhsph.edu
This book comes from my experience teaching R in a variety of settings and through different stages
of its (and my) development. Much of the material has been taken from my Statistical Computing
class as well as the R Programming⁵ class I teach through Coursera.
I’m looking forward to teaching R to people as long as people will let me, and I’m interested in
seeing how the next generation of students will approach it (and how my approach to them will
change). Overall, it’s been just an amazing experience to see the widespread adoption of R over the
past decade. I’m sure the next decade will be just as amazing.
⁵https://www.coursera.org/course/rprog
3. History and Overview of R
There are only two kinds of languages: the ones people complain about and the ones
nobody uses —Bjarne Stroustrup

Watch a video of this chapter¹

3.1 What is R?
This is an easy question to answer. R is a dialect of S.

3.2 What is S?
S is a language that was developed by John Chambers and others at the old Bell Telephone
Laboratories, originally part of AT&T Corp. S was initiated in 1976² as an internal statistical analysis
environment—originally implemented as Fortran libraries. Early versions of the language did not
even contain functions for statistical modeling.
In 1988 the system was rewritten in C and began to resemble the system that we have today (this
was Version 3 of the language). The book Statistical Models in S by Chambers and Hastie (the white
book) documents the statistical analysis functionality. Version 4 of the S language was released in
1998 and is the version we use today. The book Programming with Data by John Chambers (the
green book) documents this version of the language.
Since the early 1990s the life of the S language has gone down a rather winding path. In 1993 Bell Labs
gave StatSci (later Insightful Corp.) an exclusive license to develop and sell the S language. In 2004
Insightful purchased the S language from Lucent for $2 million. In 2006, Alcatel purchased Lucent
Technologies and is now called Alcatel-Lucent.
Insightful sold its implementation of the S language under the product name S-PLUS and built a
number of fancy features (GUIs, mostly) on top of it—hence the “PLUS”. In 2008 Insightful was
acquired by TIBCO for $25 million. As of this writing TIBCO is the current owner of the S language
and is its exclusive developer.
The fundamentals of the S language itself have not changed dramatically since the publication of the
Green Book by John Chambers in 1998. In 1998, S won the Association for Computing Machinery’s
Software System Award, a highly prestigious award in the computer science field.
¹https://youtu.be/STihTnVSZnI
²http://cm.bell-labs.com/stat/doc/94.11.ps
3.3 The S Philosophy


The general S philosophy is important to understand for users of S and R because it sets the stage for
the design of the language itself, which many programming veterans find a bit odd and confusing.
In particular, it’s important to realize that the S language had its roots in data analysis, and did not
come from a traditional programming language background. Its inventors were focused on figuring
out how to make data analysis easier, first for themselves, and then eventually for others.
In Stages in the Evolution of S³, John Chambers writes:

“[W]e wanted users to be able to begin in an interactive environment, where they did not
consciously think of themselves as programming. Then as their needs became clearer and
their sophistication increased, they should be able to slide gradually into programming,
when the language and system aspects would become more important.”

The key part here was the transition from user to developer. They wanted to build a language that
could easily service both “people”. More technically, they needed to build a language that would
be suitable for interactive data analysis (more command-line based) as well as for writing longer
programs (more traditional programming language-like).

3.4 Back to R
The R language came to use quite a bit after S had been developed. One key limitation of the S
language was that it was only available in a commercial package, S-PLUS. In 1991, R was created
by Ross Ihaka and Robert Gentleman in the Department of Statistics at the University of Auckland. In
1993 the first announcement of R was made to the public. Ross’s and Robert’s experience developing
R is documented in a 1996 paper in the Journal of Computational and Graphical Statistics:

Ross Ihaka and Robert Gentleman. R: A language for data analysis and graphics. Journal
of Computational and Graphical Statistics, 5(3):299–314, 1996

In 1995, Martin Mächler made an important contribution by convincing Ross and Robert to use the
GNU General Public License⁴ to make R free software. This was critical because it allowed for the
source code for the entire R system to be accessible to anyone who wanted to tinker with it (more
on free software later).
In 1996, public mailing lists were created (the R-help and R-devel lists) and in 1997 the R Core
Group was formed, containing some people associated with S and S-PLUS. Currently, the core group
controls the source code for R and is solely able to check in changes to the main R source tree. Finally,
in 2000 R version 1.0.0 was released to the public.
³http://www.stat.bell-labs.com/S/history.html
⁴http://www.gnu.org/licenses/gpl-2.0.html

3.5 Basic Features of R


In the early days, a key feature of R was that its syntax is very similar to S, making it easy for
S-PLUS users to switch over. While R’s syntax is nearly identical to that of S, R’s semantics,
while superficially similar to S, are quite different. In fact, R is technically much closer to the Scheme
language than it is to the original S language when it comes to how R works under the hood.
Today R runs on almost any standard computing platform and operating system. Its open source
nature means that anyone is free to adapt the software to whatever platform they choose. Indeed, R
has been reported to be running on modern tablets, phones, PDAs, and game consoles.
One nice feature that R shares with many popular open source projects is frequent releases. These
days there is a major annual release, typically in October, where major new features are incorporated
and released to the public. Throughout the year, smaller-scale bugfix releases will be made as needed.
The frequent releases and regular release cycle indicate active development of the software and
ensure that bugs will be addressed in a timely manner. Of course, while the core developers control
the primary source tree for R, many people around the world make contributions in the form of new
features, bug fixes, or both.
Another key advantage that R has over many other statistical packages (even today) is its sophisti-
cated graphics capabilities. R’s ability to create “publication quality” graphics has existed since the
very beginning and has generally been better than competing packages. Today, with many more
visualization packages available than before, that trend continues. R’s base graphics system allows
for very fine control over essentially every aspect of a plot or graph. Other newer graphics systems,
like lattice and ggplot2, allow for complex and sophisticated visualizations of high-dimensional data.
R has maintained the original S philosophy, which is that it provides a language that is useful
for interactive work but is also a powerful programming language for developing new tools. This
allows the user, who takes existing tools and applies them to data, to slowly but surely become a
developer who is creating new tools.
Finally, one of the joys of using R has nothing to do with the language itself, but rather with the
active and vibrant user community. In many ways, a language is successful inasmuch as it creates a
platform with which many people can create new things. R is that platform and thousands of people
around the world have come together to make contributions to R, to develop packages, and to help
each other use R for all kinds of applications. The R-help and R-devel mailing lists have been highly
active for over a decade now and there is considerable activity on web sites like Stack Overflow.

3.6 Free Software


A major advantage that R has over many other statistical packages is that it’s free in the sense
of free software (it’s also free in the sense of free beer). The copyright for the primary source code
for R is held by the R Foundation⁵ and is published under the GNU General Public License version
⁵http://www.r-project.org/foundation/

2.0⁶.
According to the Free Software Foundation, with free software, you are granted the following four
freedoms⁷:

• The freedom to run the program, for any purpose (freedom 0).
• The freedom to study how the program works, and adapt it to your needs (freedom 1). Access
to the source code is a precondition for this.
• The freedom to redistribute copies so you can help your neighbor (freedom 2).
• The freedom to improve the program, and release your improvements to the public, so that the
whole community benefits (freedom 3). Access to the source code is a precondition for this.

You can visit the Free Software Foundation’s web site⁸ to learn a lot more about free software. The
Free Software Foundation was founded by Richard Stallman in 1985 and Stallman’s personal web
site⁹ is an interesting read if you happen to have some spare time.

3.7 Design of the R System


The primary R system is available from the Comprehensive R Archive Network¹⁰, also known as
CRAN. CRAN also hosts many add-on packages that can be used to extend the functionality of R.
The R system is divided into 2 conceptual parts:

1. The “base” R system that you download from CRAN: Linux¹¹ Windows¹² Mac¹³ Source Code¹⁴
2. Everything else.

R functionality is divided into a number of packages.

• The “base” R system contains, among other things, the base package which is required to run
R and contains the most fundamental functions.
• The other packages contained in the “base” system include utils, stats, datasets, graphics,
grDevices, grid, methods, tools, parallel, compiler, splines, tcltk, stats4.
• There are also “Recommended” packages: boot, class, cluster, codetools, foreign, KernSmooth,
lattice, mgcv, nlme, rpart, survival, MASS, spatial, nnet, Matrix.
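One rough way to see this division on your own machine is to list the installed packages by their priority field. This is only a sketch and assumes a standard R installation; the exact set of packages can differ by R version and by how R was built, and the variable names are chosen here for illustration.

```r
## List installed packages by their "Priority" field
## (assumes a standard installation; results vary by R version)
base_pkgs <- rownames(installed.packages(priority = "base"))
rec_pkgs  <- rownames(installed.packages(priority = "recommended"))

base_pkgs   ## should include "base", "stats", "utils", ...
rec_pkgs    ## may be empty if the recommended packages were not installed
```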
⁶http://www.gnu.org/licenses/gpl-2.0.html
⁷http://www.gnu.org/philosophy/free-sw.html
⁸http://www.fsf.org
⁹https://stallman.org
¹⁰http://cran.r-project.org
¹¹http://cran.r-project.org/bin/linux/
¹²http://cran.r-project.org/bin/windows/
¹³http://cran.r-project.org/bin/macosx/
¹⁴http://cran.r-project.org/src/base/R-3/R-3.1.3.tar.gz

When you download a fresh installation of R from CRAN, you get all of the above, which represents
a substantial amount of functionality. However, there are many other packages available:

• There are over 4000 packages on CRAN that have been developed by users and programmers
around the world.
• There are also many packages associated with the Bioconductor project¹⁵.
• People often make packages available on their personal websites; there is no reliable way to
keep track of how many packages are available in this fashion.
• There are a number of packages being developed on repositories like GitHub and BitBucket but
there is no reliable listing of all these packages.

3.8 Limitations of R
No programming language or statistical analysis system is perfect. R certainly has a number of
drawbacks. For starters, R is essentially based on almost 50-year-old technology, going back to the
original S system developed at Bell Labs. There was originally little built-in support for dynamic or
3-D graphics (but things have improved greatly since the “old days”).
Another commonly cited limitation of R is that objects must generally be stored in physical memory.
This is in part due to the scoping rules of the language, but R generally is more of a memory hog
than other statistical packages. However, there have been a number of advancements to deal with
this, both in the R core and also in a number of packages developed by contributors. Also, computing
power and capacity have continued to grow over time and the amount of physical memory that can be
installed on even a consumer-level laptop is substantial. While we will likely never have enough
physical memory on a computer to handle the increasingly large datasets that are being generated,
the situation has gotten quite a bit easier over time.
At a higher level one “limitation” of R is that its functionality is based on consumer demand and
(voluntary) user contributions. If no one feels like implementing your favorite method, then it’s your
job to implement it (or you need to pay someone to do it). The capabilities of the R system generally
reflect the interests of the R user community. As the community has ballooned in size over the past
10 years, the capabilities have similarly increased. When I first started using R, there was very little
in the way of functionality for the physical sciences (physics, astronomy, etc.). However, now some
of those communities have adopted R and we are seeing more code being written for those kinds of
applications.
If you want to know my general views on the usefulness of R, you can see them in the following
exchange on the R-help mailing list with Douglas Bates and Brian Ripley in June 2004:

Roger D. Peng: I don’t think anyone actually believes that R is designed to make everyone
happy. For me, R does about 99% of the things I need to do, but sadly, when I need to order
a pizza, I still have to pick up the telephone.
¹⁵http://bioconductor.org

Douglas Bates: There are several chains of pizzerias in the U.S. that provide for Internet-
based ordering (e.g. www.papajohnsonline.com) so, with the Internet modules in R, it’s
only a matter of time before you will have a pizza-ordering function available.

Brian D. Ripley: Indeed, the GraphApp toolkit (used for the RGui interface under R for
Windows, but Guido forgot to include it) provides one (for use in Sydney, Australia, we
presume as that is where the GraphApp author hails from). Alternatively, a Padovian has
no need of ordering pizzas with both home and neighbourhood restaurants ….

At this point in time, I think it would be fairly straightforward to build a pizza ordering R package
using something like the RCurl or httr packages. Any takers?

3.9 R Resources

Official Manuals
As far as getting started with R by reading stuff, there is of course this book. Also, available from
CRAN¹⁶ are

• An Introduction to R¹⁷
• R Data Import/Export¹⁸
• Writing R Extensions¹⁹: Discusses how to write and organize R packages
• R Installation and Administration²⁰: This is mostly for building R from the source code
• R Internals²¹: This manual describes the low level structure of R and is primarily for developers
and R core members
• R Language Definition²²: This documents the R language and, again, is primarily for developers

Useful Standard Texts on S and R


• Chambers (2008). Software for Data Analysis, Springer
• Chambers (1998). Programming with Data, Springer: This book is not about R, but it describes
the organization and philosophy of the current version of the S language, and is a useful
reference.
• Venables & Ripley (2002). Modern Applied Statistics with S, Springer: This is a standard textbook
in statistics and describes how to use many statistical methods in R. This book has an associated
R package (the MASS package) that comes with every installation of R.
¹⁶http://cran.r-project.org
¹⁷http://cran.r-project.org/doc/manuals/r-release/R-intro.html
¹⁸http://cran.r-project.org/doc/manuals/r-release/R-data.html
¹⁹http://cran.r-project.org/doc/manuals/r-release/R-exts.html
²⁰http://cran.r-project.org/doc/manuals/r-release/R-admin.html
²¹http://cran.r-project.org/doc/manuals/r-release/R-ints.html
²²http://cran.r-project.org/doc/manuals/r-release/R-lang.html

• Venables & Ripley (2000). S Programming, Springer: This book is a little old but is still relevant
and accurate. Despite its title, this book is useful for R also.
• Murrell (2005). R Graphics, Chapman & Hall/CRC Press: Paul Murrell wrote and designed much
of the graphics system in R and this book essentially documents the underlying details. This
is not so much a “user-level” book as a developer-level book. But it is an important book for
anyone interested in designing new types of graphics or visualizations.
• Wickham (2014). Advanced R, Chapman & Hall/CRC Press: This book by Hadley Wickham
covers a number of areas including object-oriented programming, functional programming,
profiling and other advanced topics.

Other Resources
• Major technical publishers like Springer, Chapman & Hall/CRC have entire series of books
dedicated to using R in various applications. For example, Springer has a series of books called
Use R!.
• A longer list of books can be found on the CRAN web site²³.

²³http://www.r-project.org/doc/bib/R-books.html
4. Getting Started with R
4.1 Installation
The first thing you need to do to get started with R is to install it on your computer. R works on
pretty much every platform available, including the widely available Windows, Mac OS X, and Linux
systems. If you want to watch a step-by-step tutorial on how to install R for Mac or Windows, you
can watch these videos:

• Installing R on Windows¹
• Installing R on the Mac²

There is also an integrated development environment available for R that is built by RStudio. I really
like this IDE—it has a nice editor with syntax highlighting, there is an R object viewer, and there are
a number of other nice features that are integrated. You can see how to install RStudio here

• Installing RStudio³

The RStudio IDE is available from RStudio’s web site⁴.

4.2 Getting started with the R interface


After you install R you will need to launch it and start writing R code. Before we get to exactly how
to write R code, it’s useful to get a sense of how the system is organized. In these two videos I talk
about where to write code and how to set your working directory, which lets R know where to find
all of your files.

• Writing code and setting your working directory on the Mac⁵


• Writing code and setting your working directory on Windows⁶
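The working directory can also be inspected and changed from the R console itself. The following is a minimal sketch; tempdir() stands in for whatever folder actually holds your files, and old_dir is a name made up for this example.

```r
## Where will R look for files right now?
old_dir <- getwd()
old_dir

## Switch to another directory (tempdir() is only a placeholder here;
## substitute the folder that holds your own files)
setwd(tempdir())
list.files()      ## files visible from the new working directory

## Restore the original working directory
setwd(old_dir)
```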

¹http://youtu.be/Ohnk9hcxf9M
²https://youtu.be/uxuuWXU-7UQ
³https://youtu.be/bM7Sfz-LADM
⁴http://rstudio.com
⁵https://youtu.be/8xT3hmJQskU
⁶https://youtu.be/XBcvH1BpIBo
5. R Nuts and Bolts
5.1 Entering Input
Watch a video of this section¹
At the R prompt we type expressions. The <- symbol is the assignment operator.

> x <- 1
> print(x)
[1] 1
> x
[1] 1
> msg <- "hello"

The grammar of the language determines whether an expression is complete or not.

x <- ## Incomplete expression

The # character indicates a comment. Anything to the right of the # (including the # itself) is ignored.
This is the only comment character in R. Unlike some other languages, R does not support multi-line
comments or comment blocks.

5.2 Evaluation
When a complete expression is entered at the prompt, it is evaluated and the result of the evaluated
expression is returned. The result may be auto-printed.

> x <- 5 ## nothing printed
> x ## auto-printing occurs
[1] 5
> print(x) ## explicit printing
[1] 5

The [1] shown in the output indicates that x is a vector and 5 is its first element.
Typically with interactive work, we do not explicitly print objects with the print function; it is much
easier to just auto-print them by typing the name of the object and hitting return/enter. However,
¹https://youtu.be/vGY5i_J2c-c?t=4m43s

when writing scripts, functions, or longer programs, there is sometimes a need to explicitly print
objects because auto-printing does not work in those settings.
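A short sketch of the difference follows. The function name show_twice is invented for this illustration; the point is only that a bare object name inside a loop or function body is evaluated but not displayed.

```r
x <- 5

## At the top level, typing `x` would auto-print. Inside a loop,
## a bare name is evaluated but nothing is displayed.
for (i in 1:3) {
    i            ## no output
}

## An explicit print() is needed to display values from inside a loop
for (i in 1:3) {
    print(i)
}

## The same applies to intermediate values inside a function
show_twice <- function(v) {
    v            ## evaluated, not printed
    print(v)     ## printed
    invisible(v) ## return the value without auto-printing it at the prompt
}
show_twice(x)
```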
When an R vector is printed you will notice that an index for the vector is printed in square brackets
[] on the side. For example, see this integer sequence of length 20.

> x <- 11:30
> x
 [1] 11 12 13 14 15 16 17 18 19 20 21 22
[13] 23 24 25 26 27 28 29 30

The numbers in the square brackets are not part of the vector itself, they are merely part of the
printed output.
With R, it’s important that one understand that there is a difference between the actual R object
and the manner in which that R object is printed to the console. Often, the printed output may have
additional bells and whistles to make the output more friendly to the users. However, these bells and
whistles are not inherently part of the object.
Note that the : operator is used to create integer sequences.
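To check that the bracketed indices belong to the printout and not to the object, you can ask R about the object directly. A small sketch:

```r
x <- 11:30
class(x)     ## [1] "integer"
length(x)    ## [1] 20
```

The vector has exactly 20 elements; the [1] and [13] markers shown when it is printed are not among them.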

5.3 R Objects
Watch a video of this section²
R has five basic or “atomic” classes of objects:

• character
• numeric (real numbers)
• integer
• complex
• logical (TRUE/FALSE)

The most basic type of R object is a vector. Empty vectors can be created with the vector() function.
There is really only one rule about vectors in R, which is that a vector can only contain objects
of the same class.
But of course, like any good rule, there is an exception, which is a list, which we will get to a bit later.
A list is represented as a vector but can contain objects of different classes. Indeed, that’s usually
why we use them.
There is also a class for “raw” objects, but they are not commonly used directly in data analysis and
I won’t cover them here.
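As a quick illustration, the class() function reports which of these classes a value belongs to, and vector() builds an empty vector of a given class. The values used here are arbitrary.

```r
class("a")      ## [1] "character"
class(1.5)      ## [1] "numeric"
class(1L)       ## [1] "integer"
class(2+4i)     ## [1] "complex"
class(TRUE)     ## [1] "logical"

## Empty vectors of a given class and length
vector("character", length = 2)   ## [1] "" ""
vector("logical", length = 2)     ## [1] FALSE FALSE
```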
²https://youtu.be/vGY5i_J2c-c

5.4 Numbers
Numbers in R are generally treated as numeric objects (i.e. double precision real numbers). This
means that even if you see a number like “1” or “2” in R, which you might think of as integers, they
are likely represented behind the scenes as numeric objects (so something like “1.00” or “2.00”). This
isn’t important most of the time…except when it is.
If you explicitly want an integer, you need to specify the L suffix. So entering 1 in R gives you a
numeric object; entering 1L explicitly gives you an integer object.
There is also a special number Inf which represents infinity. This allows us to represent entities like
1 / 0. This way, Inf can be used in ordinary calculations; e.g. 1 / Inf is 0.

The value NaN represents an undefined value (“not a number”); e.g. 0 / 0; NaN can also be thought of
as a missing value (more on that later).
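The points in this section can be checked directly at the prompt. A brief sketch:

```r
class(1)          ## [1] "numeric"
class(1L)         ## [1] "integer"
identical(1, 1L)  ## [1] FALSE (same value, different storage type)
1 == 1L           ## [1] TRUE

1 / 0             ## [1] Inf
1 / Inf           ## [1] 0
0 / 0             ## [1] NaN
is.nan(0 / 0)     ## [1] TRUE
```

The identical() result is one of the places where the numeric versus integer distinction becomes visible.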

5.5 Attributes
R objects can have attributes, which are like metadata for the object. These metadata can be very
useful in that they help to describe the object. For example, column names on a data frame help to
tell us what data are contained in each of the columns. Some examples of R object attributes are

• names, dimnames
• dimensions (e.g. matrices, arrays)
• class (e.g. integer, numeric)
• length
• other user-defined attributes/metadata

Attributes of an object (if any) can be accessed using the attributes() function. Not all R objects
have attributes; for an object with none, the attributes() function returns NULL.
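Here is a small sketch of attributes in action. The attribute name "units" is an arbitrary choice made for this example, not something R defines.

```r
x <- 1:3
attributes(x)            ## NULL: a plain vector has no attributes

names(x) <- c("a", "b", "c")
attributes(x)            ## now a list with a $names component

## A user-defined attribute ("units" is an arbitrary name chosen here)
attr(x, "units") <- "cm"
attr(x, "units")         ## [1] "cm"
```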

5.6 Creating Vectors


Watch a video of this section³
The c() function can be used to create vectors of objects by concatenating things together.

³https://youtu.be/w8_XdYI3reU

> x <- c(0.5, 0.6) ## numeric
> x <- c(TRUE, FALSE) ## logical
> x <- c(T, F) ## logical
> x <- c("a", "b", "c") ## character
> x <- 9:29 ## integer
> x <- c(1+0i, 2+4i) ## complex

Note that in the above example, T and F are short-hand ways to specify TRUE and FALSE. However,
in general one should try to use the explicit TRUE and FALSE values when indicating logical values.
The T and F values are primarily there for when you’re feeling lazy.
You can also use the vector() function to initialize vectors.

> x <- vector("numeric", length = 10)
> x
[1] 0 0 0 0 0 0 0 0 0 0

5.7 Mixing Objects


There are occasions when different classes of R objects get mixed together. Sometimes this happens
by accident but it can also happen on purpose. So what happens with the following code?

> y <- c(1.7, "a") ## character
> y <- c(TRUE, 2) ## numeric
> y <- c("a", TRUE) ## character

In each case above, we are mixing objects of two different classes in a vector. But remember that
the only rule about vectors says this is not allowed. When different objects are mixed in a vector,
coercion occurs so that every element in the vector is of the same class.
In the example above, we see the effect of implicit coercion. What R tries to do is find a way to
represent all of the objects in the vector in a reasonable fashion. Sometimes this does exactly what
you want and…sometimes not. For example, combining a numeric object with a character object
will create a character vector, because numbers can usually be easily represented as strings.
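Inspecting the results of the three assignments above makes the coercion visible:

```r
class(c(1.7, "a"))     ## [1] "character"
c(1.7, "a")            ## [1] "1.7" "a"

class(c(TRUE, 2))      ## [1] "numeric"
c(TRUE, 2)             ## [1] 1 2

class(c("a", TRUE))    ## [1] "character"
c("a", TRUE)           ## [1] "a"    "TRUE"
```

Note that TRUE becomes the number 1 when mixed with numbers, but the string "TRUE" when mixed with characters.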

5.8 Explicit Coercion


Objects can be explicitly coerced from one class to another using the as.* functions, if available.

> x <- 0:6
> class(x)
[1] "integer"
> as.numeric(x)
[1] 0 1 2 3 4 5 6
> as.logical(x)
[1] FALSE TRUE TRUE TRUE TRUE TRUE TRUE
> as.character(x)
[1] "0" "1" "2" "3" "4" "5" "6"

Sometimes, R can’t figure out how to coerce an object and this can result in NAs being produced.

> x <- c("a", "b", "c")
> as.numeric(x)
Warning: NAs introduced by coercion
[1] NA NA NA
> as.logical(x)
[1] NA NA NA
> as.complex(x)
Warning: NAs introduced by coercion
[1] NA NA NA

When nonsensical coercion takes place, you will usually get a warning from R.

5.9 Matrices
Matrices are vectors with a dimension attribute. The dimension attribute is itself an integer vector
of length 2 (number of rows, number of columns).

> m <- matrix(nrow = 2, ncol = 3)
> m
     [,1] [,2] [,3]
[1,]   NA   NA   NA
[2,]   NA   NA   NA
> dim(m)
[1] 2 3
> attributes(m)
$dim
[1] 2 3

Matrices are constructed column-wise, so entries can be thought of starting in the “upper left” corner
and running down the columns.

> m <- matrix(1:6, nrow = 2, ncol = 3)
> m
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
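Because a matrix is a vector with a dimension attribute, stripping the dimensions gives back the values in column-wise order. If row-wise filling is wanted instead, matrix() accepts a byrow argument; that argument is not discussed in this chapter, so treat the second half of this sketch as an aside. The name m_by_row is made up for the example.

```r
m <- matrix(1:6, nrow = 2, ncol = 3)
as.vector(m)     ## [1] 1 2 3 4 5 6 (the underlying column-wise order)

## Fill the same values row by row instead
m_by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
m_by_row
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
```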

Matrices can also be created directly from vectors by adding a dimension attribute.

> m <- 1:10
> m
 [1]  1  2  3  4  5  6  7  8  9 10
> dim(m) <- c(2, 5)
> m
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    7    9
[2,]    2    4    6    8   10

Matrices can be created by column-binding or row-binding with the cbind() and rbind() functions.

> x <- 1:3
> y <- 10:12
> cbind(x, y)
     x  y
[1,] 1 10
[2,] 2 11
[3,] 3 12
> rbind(x, y)
  [,1] [,2] [,3]
x    1    2    3
y   10   11   12

5.10 Lists
Lists are a special type of vector that can contain elements of different classes. Lists are a very
important data type in R and you should get to know them well. Lists, in combination with the
various “apply” functions discussed later, make for a powerful combination.
Lists can be explicitly created using the list() function, which takes an arbitrary number of
arguments.

> x <- list(1, "a", TRUE, 1 + 4i)
> x
[[1]]
[1] 1

[[2]]
[1] "a"

[[3]]
[1] TRUE

[[4]]
[1] 1+4i

We can also create an empty list of a prespecified length with the vector() function.

> x <- vector("list", length = 5)
> x
[[1]]
NULL

[[2]]
NULL

[[3]]
NULL

[[4]]
NULL

[[5]]
NULL
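To confirm that a list really keeps each element's class, compare it with what c() does to the same values. This sketch uses sapply(), one of the “apply” functions covered later, only to report the class of each element.

```r
x <- list(1, "a", TRUE, 1 + 4i)
length(x)                ## [1] 4
class(x)                 ## [1] "list"
sapply(x, class)         ## "numeric" "character" "logical" "complex"

## Contrast with c(), which coerces everything to one class
class(c(1, "a", TRUE))   ## [1] "character"
```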

5.11 Factors
Watch a video of this section⁴
Factors are used to represent categorical data and can be unordered or ordered. One can think of
a factor as an integer vector where each integer has a label. Factors are important in statistical
modeling and are treated specially by modelling functions like lm() and glm().
Using factors with labels is better than using integers because factors are self-describing. Having a
variable that has values “Male” and “Female” is better than a variable that has values 1 and 2.
Factor objects can be created with the factor() function.
⁴https://youtu.be/NuY6jY4qE7I

> x <- factor(c("yes", "yes", "no", "yes", "no"))
> x
[1] yes yes no yes no
Levels: no yes
> table(x)
x
 no yes
  2   3
> ## See the underlying representation of factor
> unclass(x)
[1] 2 2 1 2 1
attr(,"levels")
[1] "no" "yes"

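Often factors will be automatically created for you when you read a dataset in using a function like
read.table(). In versions of R before 4.0.0, those functions defaulted to creating factors when they
encountered data that look like characters or strings; since R 4.0.0 the stringsAsFactors argument
defaults to FALSE, so factors must be requested explicitly.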
The order of the levels of a factor can be set using the levels argument to factor(). This can be
important in linear modelling because the first level is used as the baseline level.

> x <- factor(c("yes", "yes", "no", "yes", "no"))
> x ## Levels are put in alphabetical order
[1] yes yes no yes no
Levels: no yes
> x <- factor(c("yes", "yes", "no", "yes", "no"),
+ levels = c("yes", "no"))
> x
[1] yes yes no yes no
Levels: yes no
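The effect of setting the levels explicitly shows up in the underlying integer codes. A small sketch:

```r
x <- factor(c("yes", "yes", "no", "yes", "no"),
            levels = c("yes", "no"))
levels(x)        ## [1] "yes" "no"
as.integer(x)    ## [1] 1 1 2 1 2
table(x)         ## counts are reported in level order: yes 3, no 2
```

With "yes" listed first, it receives code 1 and would be treated as the baseline level.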

5.12 Missing Values


Missing values are denoted by NA, or by NaN for undefined mathematical operations.

• is.na() is used to test whether objects are NA
• is.nan() is used to test for NaN
• NA values have a class also, so there are integer NA, character NA, etc.
• A NaN value is also NA but the converse is not true

> ## Create a vector with NAs in it
> x <- c(1, 2, NA, 10, 3)
> ## Return a logical vector indicating which elements are NA
> is.na(x)
[1] FALSE FALSE TRUE FALSE FALSE
> ## Return a logical vector indicating which elements are NaN
> is.nan(x)
[1] FALSE FALSE FALSE FALSE FALSE

> ## Now create a vector with both NA and NaN values
> x <- c(1, 2, NaN, NA, 4)
> is.na(x)
[1] FALSE FALSE TRUE TRUE FALSE
> is.nan(x)
[1] FALSE FALSE TRUE FALSE FALSE
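The remark that NA values have a class can be checked with the typed NA constants that R provides. The last lines are a brief aside, not covered above, on how a missing value propagates through a summary such as mean().

```r
class(NA)              ## [1] "logical"
class(NA_integer_)     ## [1] "integer"
class(NA_real_)        ## [1] "numeric"
class(NA_character_)   ## [1] "character"

## NA takes on the class of the vector it sits in
class(c(1L, NA))       ## [1] "integer"
class(c("a", NA))      ## [1] "character"

## Many summary functions return NA unless told to drop missing values
x <- c(1, 2, NA, 10, 3)
mean(x)                ## [1] NA
mean(x, na.rm = TRUE)  ## [1] 4
```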

5.13 Data Frames


Data frames are used to store tabular data in R. They are an important type of object in R and
are used in a variety of statistical modeling applications. Hadley Wickham’s package dplyr⁵ has an
optimized set of functions designed to work efficiently with data frames.
Data frames are represented as a special type of list where every element of the list has to have the
same length. Each element of the list can be thought of as a column and the length of each element
of the list is the number of rows.
Unlike matrices, data frames can store different classes of objects in each column. Matrices must
have every element be the same class (e.g. all integers or all numeric).
In addition to column names, indicating the names of the variables or predictors, data frames have
a special attribute called row.names which indicates information about each row of the data frame.
Data frames are usually created by reading in a dataset using read.table() or read.csv().
However, data frames can also be created explicitly with the data.frame() function or they can be
coerced from other types of objects like lists.
Data frames can be converted to a matrix by calling data.matrix(). While it might seem that the
as.matrix() function should be used to coerce a data frame to a matrix, almost always, what you
want is the result of data.matrix().
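The following sketch builds a small data frame by hand and compares the two matrix conversions. The column names id, group, and score and their values are invented for illustration.

```r
df <- data.frame(id = 1:3,
                 group = c("a", "b", "a"),
                 score = c(2.5, 3.0, 4.5))
nrow(df)            ## [1] 3
ncol(df)            ## [1] 3
sapply(df, class)   ## id is "integer", score is "numeric"; group is
                    ## "character" or "factor" depending on the R version

## as.matrix() falls back to a character matrix because of `group`
class(as.matrix(df)[1, 1])    ## [1] "character"

## data.matrix() converts every column to numbers instead
## (non-numeric columns are replaced by integer codes)
dm <- data.matrix(df)
dm
```

Whether data.matrix() is really what you want depends on whether integer codes are a meaningful stand-in for the non-numeric columns.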

⁵https://github.com/hadley/dplyr
The vessels also had a parodus placed outside the vessel and
extending the whole length of the sides above the oars. The
contrivance was probably copied from the Egyptians, who introduced
it to enable the warriors to fight at close quarters when drawing
alongside an enemy, or to run to either end of the ship as occasion
might require without impeding, or being impeded by, the rowers.

GREEK BIREME.
From a Vase in the British Museum, found at Vulci.
GREEK WAR GALLEYS.
From a Vase in the British Museum found at Vulci.
Cancelli, or shields of basket work, were placed along the sides of
the ships at such a height that the heads of those on board are just
visible. The cancelli bore a striking resemblance to the circular
basket-work boats still to be found on the upper Euphrates; this
supports the supposition that the cancelli may have been used for
other purposes, particularly if they were made comparatively
watertight, as the function of a shield was not only to protect a
warrior in battle, but to help to keep him dry when on shipboard by
being disposed along the sides to prevent the spray from entering
the ships. A forecastle was constructed upon these ships, and upon
each forecastle a look-out man was stationed; and when these
structures came to be built of larger dimensions they served to
accommodate a number of fighting men who, from their superior
position, could throw their missiles with greater effect. The
forecastle had the further advantage of serving as a stronghold in
the event of an attempt being made to capture the ship by boarding
it.
Following the Phœnicians, the Greeks are thought to have begun to
build their own warships about 700 b.c., perhaps earlier, but it was
about that time that the first three-banked warship was launched at
Corinth. The three-banked ships were for many years the largest in
existence. During the fourth century b.c. shipbuilding was practised
extensively, four-banked ships being built at Chalcedon, five-banked
at Salamis, and six-banked ships at Syracuse. Ships of ten banks,
according to Pliny, were ordered by Alexander the Great, and about
300 b.c. ships having twelve banks are said to have been built for
Ptolemy, and fifteen-banked ships for Demetrios, for a battle near
Cyprus.
Ptolemy Philopater, who ruled in Egypt from 222 to 204 b.c., is
alleged to have had a forty-banked ship of a length of 280 cubits or,
reckoning the cubit at 18 inches, of 420 feet, and a beam of 57 feet.
While increasing the size and number of oars, it would, nevertheless,
be impossible to augment to any appreciable extent the speed at
which these ships could be rowed, and the more unwieldy would
they become, and the more difficult would it be to keep steering way
upon them. Again, the assertions of the historians are so
contradictory that it is a thankless task to attempt to reconcile all
their stories, especially as they depended much upon hearsay for
their information. For that reason, therefore, a great deal that has
been recorded as to the early ships and their numerous banks of
oars is not to be accepted without careful inquiry and verification.
It has never been established beyond question what is meant by
banks of oars, or whether the Greek text has been interpreted
correctly when it is taken to express forty superimposed banks of
oars. From constructional reasons it may be assumed that a ship
having forty superimposed banks of oars never existed, and it is very
doubtful whether ships having more than a fourth of that number of
banks passed beyond the imaginations of their inventors. In any
case they were soon dispensed with, and in course of time it was
found that the best results were obtained with galleys having two or
three banks of oars.
It is not definitely known how the rowers were disposed in the ships
of anything over seven or eight banks. If any vessels had forty banks
of oars, the upper rows must have been of an absolutely unwieldy
length. Assuming the oars to have been weighted with lead so that
the inborne and outborne portions were equally balanced, they must
nevertheless have been exceedingly difficult to row even by a
number of men, and it was impossible for any rowers to have moved
these great oars at the same speed as the men at the lower banks
moved their lighter and shorter ones. That some such difficulty was
experienced, even in biremes and triremes, is shown by the
arrangement of the oars, whereby all in a bank were not of equal
length, but were graded so that those nearer the ends of the banks
were longer in order that all the blades might enter the water in a
straight line. Each row above must have had its own line in the
water a little farther away from the side of the ship than the row
beneath it, or the blades would have interfered with each other and
the rowers thrown into hopeless confusion. The tremendous amount
of lead that would have to be carried to counterbalance the
outborne portions of several hundred oars would add materially to
the dead weight to be propelled, and, much of it being placed high
above the water, the stability of the vessel would be lessened.
The Athenians used leather or skin aprons or covers over the oar
holes to prevent the water entering, the oar passing through a hole
in the leather, and the apron was bound to the oar in such a way as
to be watertight. This contrivance was widely adopted later. The oar
ports were constructed between the ribs, but the oars instead of
being rowed against the ribs were pulled against thongs fastened to
the next rib, thus minimising the strain upon the ship’s structure and
preventing the oars being lost overboard. One man one oar was
apparently the general rule at that time.
In his most painstaking study of “Ancient Ships” Mr. Cecil Torr has
gone very closely into the subject of the oar equipment of the
galleys. An Athenian three-banked ship would carry two hundred
oars, of which thirty were worked from the upper decking, sixty-two
on the upper bank, and fifty-four to each of the lower. The earliest
two-banked ships had eighteen rowers. An Athenian four-banked
ship might carry two hundred and sixty-six oars. The Roman and
Carthaginian five-banked ships in use about 256 B.C. had three
hundred rowers besides the combatants. The statement is made by
an early historian that in 280 B.C. the Heraclean fleet on the Black
Sea included an eight-banked ship with a hundred rowers on each
file, or one thousand six hundred rowers in all. As the fighting men
carried usually exceeded the rowers in number, the ship must
have had close upon three thousand five hundred men aboard.
Warships of all the early Eastern nations were strengthened by
cables passed longitudinally round them in order to keep the timbers
in place and prevent them from being started under the strain
occasioned by the shock of ramming. Egyptian ships of about 1200
B.C. had cables stretched from stem to stern and passing over the
top of the mast and other posts, but this contrivance was to prevent
the vessel from drooping at the ends, a weakness known as
“hogging.” The shock to the ramming vessel was scarcely less severe
than that to the vessel receiving the blow. To take up the strain and
add to the power of the blow the bows were strengthened by means
of waling pieces which supported the ram proper. The Greek ships
were built with the keel, the stempost, and the lower pair of waling
pieces converging to hold the ram, while higher up the stem was a
smaller ram which in its turn was buttressed by another pair of
waling planks. The catheads, or beams projecting from the bows on
either side by which the anchors were raised, were so placed on a
level with the gangway and gunwale that they would sweep the
upper works of an enemy’s ship and smash its gangway and hurl
into the sea or the hold all the fighting men upon it. Ships of more
than three banks are believed to have carried another ram level with
the catheads, and to have had a ram for every pair of additional
waling beams. The ram heads were generally of bronze and weighed
170 lb. or more.

“AN ANCIENT BIREME, FROM BASIUS, HAVING ONE TIER OF OARS ONLY.”

“ONE OF THE ANCIENT LIBURNI, OR GALLEYS, HAVING A SINGLE TIER OF OARS, ACCORDING TO BASIUS.”
AN ANCIENT TRIREME, ACCORDING TO BASIUS.
From Charnock’s “History of the Marine Architecture of all Nations.”
The later rams varied considerably in shape. The triple ram was
sometimes made with the teeth pointing slightly downward, while
others had an upward tilt. The lowest ram often extended farther
forward than those above, the idea being that it would inflict severe
injury about or below the water line, and that the upper rams,
besides causing damage, would push the stricken vessel off the
lower ram and let her sink without the assailant being dragged down
by the head with her.
The build of the ships rendered it necessary that an engagement
should be fought on a calm sea, and daylight was preferred in order
that the combatants could see what they were doing. As the fleets
approached one another the commanders of the different vessels
decided upon their individual opponents. Much skilful manœuvring
ensued to ram the enemy or avoid a blow. The slaves strained at the
oars while their taskmasters ran between the files of rowers and,
with unmerciful blows from heavy sticks and whips, stimulated them
to still greater exertions if possible.
Poor slaves, mostly prisoners of war, their prospects were gloomy in
the extreme! If their ship were rammed some of them were sure to
be injured, and if she sank they went down with her, fastened to
their places and having no chance of escape. If the oars were
disabled in the collision between the ships the rowers were bound to
receive violent blows from the inboard end of the oars, or to be
cruelly pierced by splinters of wreckage. Showers of missiles from
the opposing ship fell upon the helpless wretches. In later years,
when the terrible Greek fire was added to the means of attack and
defence, it contributed the prospect of being burnt alive to the other
horrors of their situation. Victory meant no rejoicings for them. The
wounded were of little account and could be dispensed with when
slaves were to be had for the capturing, and it was easy to put them
overboard to die the more quickly. Those who survived the battle
unhurt or not too severely injured to recover rapidly, were retained.
If their ship were vanquished they might look forward to greater
cruelties as a punishment for their share of the defeat. If they
belonged to the victors, they had only more battles, the torturing
whips of their drivers, and insufficient food as their portion in life.
Death came as a welcome relief to the slaves of victor and
vanquished; in it lay their only hope of peace.
When the Roman navy was at its best the ships were painted a
colour which matched the waves, and the hulls were made as
watertight as possible with tar. Occasionally in the later Roman ships
layers of tarred cloth were placed outside the outer planking, and
the hull was then lead-sheathed. Bronze nails and wooden pegs
were used in fastening the timbers together, and some ships were so
built that they could be taken to pieces and transported overland if
necessary. Ships of three, four and five banks were even conveyed
from the Mediterranean to the Euphrates.
The facility with which the Liburnians handled the two-banked ships
in their Adriatic campaign induced the Romans to adopt these
vessels as models for their own two-banked ships, and in course of
time they adopted the name of liburna for all war-vessels of from
one to five banks.
If some of the historians may be believed, anything that could be
piled upon the ancient ship and did not capsize it was permissible.
One is said to have had a tower at the stern and another at the
prow. Another bore “a large tower of masonry with a great gate.
Here appear some vases, probably filled with combustibles.” Another
libernus has a mast or yard, suspended perpendicularly by the side
of the forward tower, and having at each end a crossbeam. Yet
another libernus, besides carrying a protector for the helm at the
stern, is said to have had six round towers; the largest, of embattled
masonry, was at the prow, two others, also of masonry, surmounted
by domes, and connected by a bridge, were near the stern, and the
other three were nearer the fore part of the ship, were roofed, and
two of them had windows.
Shipping in the Mediterranean extended with extraordinary rapidity
in the recovery after the stagnation caused by the fall of the Roman
Empire and the relapse into semi-barbarism which followed the
successful invasion of Italy by the wild tribesmen of the North. The
advent and rise of the Moslem power caused a series of struggles in
which every state was in a more or less constant condition of
warfare against its neighbour, and the Crusades served but to add
fuel to the fire of internecine and religious conflict. Some immense
ships are stated to have been employed up to and at the fall of
Constantinople. The early centuries of the Christian era saw the
evolution of a flat, shallow vessel, fitted with one or two masts
carrying sails, from which the lateen rig developed, equipped with a
long ram above the water line, with two or at most three banks of
oars. It appears from illustrations that some of these boats carried a
superstructure extending beyond the beam on either side. War
vessels of this type became common throughout the length and
breadth of the Mediterranean, and remained in use long after the
introduction of firearms.
Before the discovery of Greek fire, flaring missiles of some kind had
been devised. Frontinus mentions fire-ships, or hulls carrying
combustibles and allowed to drift with wind and tide upon the
enemy’s ships; stinkpots, to nauseate the enemy, though how the
others escaped the smells except by keeping to windward does not
appear; and Evelyn adds, “Nay, snake pots, and false colours.” The
Greek fire, however, was the most terrible of the weapons employed
at that time. By some means by which a fair amount of power was
exerted, the liquid was squirted—or vomited, to use one historian’s
phrase—through copper pipes upon an enemy’s ship, and as the
liquid had the peculiar property of igniting upon exposure to the air
and was inextinguishable by water, it was a most formidable engine
of destruction. Small vases filled with the liquid and sealed airtight
were used as hand grenades and flung at opposing ships and,
breaking, set them on fire. Heavy arrows carrying balls of flax
soaked in the liquid were used both in land and sea warfare, as also
were hand-flung javelins similarly equipped, and the flights of these
masses of inextinguishable flames must have been equally
demoralising to the combatants against whom they were directed
and destructive to the ships and inflammable buildings upon which
they fell. This composition is thought to have been invented in the
seventh century; the first occasion on which it was employed on an
extensive scale was in the great battle between the fleets of
Constantine and the Saracens, when the latter, through its agency,
lost practically their whole fleet and thirty thousand men killed. After
that both sides used Greek fire whenever possible.
Up to the introduction of gunpowder and artillery the methods of
fighting varied but little. The sea-fights of the Crusades were
conducted on the lines which had been recognised as the best for a
couple of thousand years or more, viz., ram the enemy and board
him. Greek fire added this rule: Burn him also if you can.
The countries along the northern and southern shores of the
Mediterranean had attained a high degree of civilisation when the
inhabitants of Western Europe and the British Islands were still more
or less savage. What may be regarded as circumstantial evidence in
support of the contention that the Phœnicians voyaged to Cornwall
and Ireland is the similarity which exists in shape between the
wicker shields, such as the Phœnicians are known to have used, and
the wicker coracles which the Britons employed at the time of the
invasion by Julius Cæsar. There must have been considerable
intercourse between the Phœnicians and the dwellers in the valley of
the Euphrates before the latter conquered the former; but whether
the dwellers in Nineveh, or those by the sea, invented wicker boats,
or whether both derived their knowledge of wicker boats from other
sources, are points of no immediate importance. But what is of
interest is that the British wicker coracles were covered with hides to
make them watertight, that they had keels and gunwales, and that
they were small enough to be used as shields if necessary, their
dimensions being rather over 4 feet in length, with a breadth of
about 3 feet, and a depth of a trifle over 12 inches. They were big
enough to carry one man of average size. There are on the
Euphrates to this day boats or rafts of proportionate dimensions, up
to a maximum length of 40 or 50 feet over all, which are constructed
with a light framework of wicker and timber, over which skins are
stretched to keep them watertight. These boats, when laden, drift
down the river with the current, and, on reaching their destination,
their cargo and skins are sold and the framework is made up into a
package and returned upon the back of an ass to the port of
departure. These cargo boats have been humorously referred to at a
meeting of the Institute of Marine Engineers as of “one ass-power.”
So far as Britain is concerned, the shipping of each coast seems to
have developed under the influence of the foreign shipping with
which it mostly came in contact. The east coast was largely
concerned with the Danes, and the south coast with its neighbours
across the Channel. The Danes and Vikings developed a type of
vessel peculiarly their own. The best specimen yet brought to light is
that known as the Gokstad ship.
AN ANGLO-SAXON SHIP OF ABOUT THE NINTH CENTURY.
(From Strutt.)
The Viking ships must have walked the waters almost with the grace
of motion of a modern yacht, and when the great square sail was
hoisted, bearing the escutcheon of some dread sea-rover, they must
have been fascinating emblems of human skill and power no less
than of the noblest and the basest passions of mankind.
The large rowing and sailing galleys of the Mediterranean were fine-
weather ships, it being the custom to suspend merchant voyages,
naval expeditions, and piracy in that sea during the winter months.
Obviously, such vessels were wholly unsuited to the Atlantic coasts
of Western Europe. The western coasts of Spain, France and
Portugal produced a ship, short and broad, and strong enough to be
beached even when a moderate sea was running. This model was
seemingly copied by the English of the south coast, and vessels of
this type, built in the eighth century, were planked and carried high,
erect stemposts and sternposts. The vessels were single-masted and
fitted with a yard and square sail, and the steering was effected by a
large oar at the stern. They were not unlike the Viking ships in some
respects, but they were of less average length and broader in
proportion, having bluffer bows, a less fine entry, and a long flat
floor extending farther aft than did that of the northern ships. Some
also had a ram.

VIKING SHIP FOUND AT GOKSTAD, SOUTH NORWAY.
Photograph: O. Vaering, Christiania.
What may be regarded as the first great national step in British
shipbuilding was inaugurated in the latter part of the ninth century,
when King Alfred saw that in order to beat the Danes he must meet
them with ships superior in size and strength to their own. His war
galleys were virtually double the size of those of the invaders, and in
some instances almost double their length. The Gokstad ship, by no
means one of the largest of its type, had sixteen oars a side. If
Alfred’s boats had thirty oars or more a side, as is stated, and were
double-banked—that is, two men to each oar—like those of his foes,
the fighting strength of the individual ships of his navy must have
been very great.
By the eleventh century the Norsemen had taken to painting their
vessels externally, besides making them larger and giving them
decks. The stempost and sternpost were more ornately decorated,
gilded copper being the material used for this purpose. Svend
Forkbeard’s own ship, the Great Dragon, is said to have been in the
form of this legendary beast, but what the historian most likely
meant is that the stern decoration or the design on the sail may
have shown a fantastic representation of the fearsome animal; the
Vikings were too good seamen to have built the ship in any form
likely to be inferior to the shape they had learned to appreciate so
highly. The Long Serpent, which appeared in that century, is said to
have been 117 feet in length, and decked, and to have carried six
hundred men. This is the first war vessel in the Western seas known
to have been decked throughout,[7] and in which cabin
accommodation was provided for the principal fighting men. Beneath
the deck the hull was divided into five cabins or compartments; the
foremost was the lokit, in which, in a royal vessel, the king’s
standard bearers were quartered; next, the sax or storeroom; then
the kraproom, where sails and tackle were kept; the foreroom,
containing the arms chest, and forming the living room of the
warriors; and astern of all was the lofting, or great cabin, devoted to
the commander. For the comfort of the rank and file of the fighting
men at night in port an awning was spread, supported by a ridge
pole on pillars. At other times they would seem to have had to put
up with sleeping on deck and making the best of it; they would
certainly be no worse off than in the old days of the open ships, and
being somewhat higher above the water would be less exposed to
the spray. At the end of the twelfth century King Sverre Sigurdsson
had some merchant ships cut across amidships and lengthened, and
then used them as war ships.
FLEET ATTACKING A FORTIFIED TOWN.
MS. Harl. 326.
William the Conqueror’s fleet in the eleventh century is estimated at
anything between six hundred and ninety-six vessels and three
thousand; a manuscript in the Bodleian Library gives the number as
one thousand. Most of the vessels were small, if the illustrations on
the Bayeux tapestry are to be accepted. The type of ship is no doubt
represented with a fair amount of accuracy, but in certain other
respects the efforts of the weavers of the tapestry are only less
grotesque than the so-called ships which appear on some of the
medals of the ports, but which nevertheless have been accepted as
correct representations of the ships of the times, whereas they
should be regarded as indicating approximately the type of vessel
then in vogue. With the exception that a few ships were built of
rather greater dimensions—the largest in the invading fleet can
hardly have been more than 80 tons burthen—shipbuilding shows
but little development on the Atlantic coast until after the
introduction of artillery.

WARSHIPS OF THE FOURTEENTH CENTURY.
(After Harleian MS.—1319. fol. 18.)
A battle between a Cinque Ports fleet under Hubert de Burgh and a
French fleet under Eustace is chiefly remarkable by reason of the
English manœuvring to secure the windward position, this being the
first occasion on which this manœuvre is recorded, and the attack
on the French rear ended in a signal English victory. The fame of the
English archers was great, and they added to their laurels by playing
no small part in the battle. From their positions in the tops and on
the forecastles they kept up a steady flight of arrows upon the
French. The arrows carried flasks of unslaked lime which broke on
striking the French ships, and the lime dust, borne on the wind,
entered the eyes of the enemy and blinded them, the defeat of the
French following. The ships of that period were provided with
platforms, elevated on wooden pillars, at the bow and stern. The
erections were the forerunners of the immense structures which
were added in later years and did so much to render ships unstable.
A Venetian ship constructed for Louis IX. of France in 1268, and
named the Roccafortis, was 70 feet long on the keel and 110 feet
over all, with a width at prow and poop of 40 feet. She is stated to
have had two decks and a fighting castle at each end. Possibly the
weight of the bellatorium, as the castle was called, may have
necessitated such an extraordinary beam near the bows and stern,
but she could never have been built with such dimensions to be
other than a floating fortress.
In the Mediterranean, however, great activity prevailed. The
Crusades gave a tremendous impetus to the shipping of the Middle
Sea. Christians and Saracens vied with each other in the production
of ships of war. The larger “busses” sent to the Levant in the fleet of
Richard Cœur de Lion carried, according to Richard of Devizes, a
captain and fifteen seamen, and forty knights with their horses, forty
footmen, fourteen servants, and twelve months’ provision for all.
Some vessels are said to have carried double this complement and
cargo. A Saracen ship, of which little is known, was encountered off
the Syrian coast, of so great a size that it could not be subdued until
the Christian galleys charged in line abreast and smashed in her side
so that she went down with nearly all of her one thousand five
hundred men.
A GALLEY OF THE KNIGHTS OF MALTA.
From the Model in the Victoria and Albert Museum.
MEDITERRANEAN GALLEY.
From a Model in the Museum of the Royal United Service Institution.
CHAPTER II
WAR CRAFT OF THE FAR WEST, CENTRAL AFRICA, THE FAR SOUTH,
THE PACIFIC, AND THE FAR EAST

Notwithstanding the enormous strides made in ship construction,
it is still possible to find in active use vessels but little removed from
the earliest types known. It is, of course, in the “Mysterious East,”
where anything that served its purpose very well centuries ago
seems to have been expected to retain its efficiency for ever, that
one finds those survivals from bygone ages. The earliest vessels
known were hollowed logs, or dug-outs; such are in use still. Planks
were stitched or lashed on above the bulwarks to raise the freeboard
and keep out the sea; the same contrivance is applied to this day. A
few strips of bamboo or other light material tied together formed
rafts; their exact counterparts are in existence in many parts of the
world. It was found possible to sail them by means of a sail of
matting attached to a yard which was supported by a stout mast
destitute of stays or standing rigging; a centre-board or drop keel
which could be lowered through the middle of the raft into the water
prevented leeway, and steering was effected by means of a pole
with a blade attached, usually tied on, this long paddle being
sometimes used near the middle of the after end of the raft and
sometimes at either of the after corners, the necessary leverage
being obtained by the provision of a stump for the purpose. The
origin of such rafts is lost in antiquity, yet they continue to be found
in active service.
The bark canoes which the Indians of North America employed on
the great rivers and lakes when white men first went there are
unchanged in their method of construction, and though in places
where civilisation and the mechanical arts have assumed sway the
old canoes have given way to the products of the modern boat-
builders’ skill, yet in the farther North-West the Indian canoe ripples
the summer surface of the lakes and streams as it did centuries ago.
The real Indian canoes were made by building the frame, and then
placing upon it a carefully prepared strip of birch bark sufficiently
large to cover the entire frame in one piece; it was lashed to the
frame and then stitched at the ends to form the bow and stern. The
larger canoes were sometimes stiffened by having two or three
pieces of wood lashed thwartwise. The canoes were propelled by
means of paddles, and the Indians sat or knelt on the bottom of the
boat. Many of these canoes weighed as little as 60 lb. and some
even less. Their chief use was in the migrations of the tribes
between their summer and winter quarters, and very picturesque
they must have appeared to the early settlers as a flotilla glided
past; that is, if an Indian could ever be regarded by an early settler
as anything but “pizen.” But these canoes served equally well to
convey the painted and feathered braves to battle; and anyone who
has seen the Indians in their canoes can well imagine how in days
now happily past, it is hoped for ever, a fleet of these boats, filled
with cruel and relentless men, passed swiftly and silently over the
waters at night, their paddles so skilfully wielded that the blades
entered and left the water with never a splash to break the solemn
stillness. Then the Indian canoe was no longer an emblem of joyous
happiness, made only for the sparkling waters and clear nights and
days of that foretaste of Paradise, the Indian summer, fit craft for
the romantic passing of Hiawatha to “the kingdom of Ponemah”; but
an evil thing, as swift and silent and terrible as the bloodthirsty men
it bore to victory or destruction.
WAR CANOES OF INDIANS OF THE NORTH-WEST.
From a Photograph of a Painting, supplied by the Curator of the Chicago Museum.
The skin canoe or kayak of the Eskimo holds only one person,
though its length may be anything from 7 or 8 feet to 25 feet. It is
simply a light frame, running to a fine point at either end, never
more than a few inches in depth, and with a breadth determined by
the breadth of the man who is to use it. It is entirely skin-covered,
except for a small hole in the deck, just abaft of amidships, in which
the solitary occupant sits. The Eskimo are very clever in the
management of their light craft—it weighs but a few pounds, and for
its size is probably the lightest sea-going vessel in the world—and
employ it chiefly in hunting, even at some distance from land.
The bark canoes of the Australian blacks were very primitive affairs;
they have almost disappeared, sharing the fate of the rapidly
dwindling aborigines. It may be doubted if a trace of one of these
canoes could now be found from one end of the Murray River to the
other. Since the blacks saw how easily the white man knocked
together a few planks and made a flat-bottomed, straight-sided
boat, they ceased to labour at bark canoes, but instead obtained a
few boards, usually by pilfering, “borrowed” or begged a few nails,
and with a stone for a hammer have done likewise, patching the
very leaky seams with anything that came handy, were it scrap of
tin, leather, raw hide, or well-greased fragment of a dirty, torn, old
blanket, and making up for deficiencies by incessant bailing. Never
again on the southern Australian rivers will the bark canoe convey
the braves to the scene of the tribal conflict, or ferry in the dying
glow of the setting sun the skeleton-painted men to the edge of the
grim, dark forest on the other shore to attend a great corroboree,
whether of war, rejoicing, or grief.
Nor have the African negroes made much progress beyond the dug-
out stage of war canoe construction. The Moors and Arabs long
since proved themselves excellent seamen and shipbuilders,
designing boats suitable to their needs, and are in quite another
category. The negroes of the Cross River district in Southern Nigeria
may be taken as typical of the African canoe makers. They usually
chose a mahogany or awosa tree, and, having felled it, burnt it
hollow where it fell. It was then dragged on rollers to the waterside
and finished with whatever tools were available, matchets, knives
and axes being used since the white man’s introduction of those
implements. Occasionally a canoe is “smoked” or hardened by being
exposed to the hot smoke of a fire built round it. Some of the war
canoes are as much as 60 feet in length, and are wide enough to
allow the men to sit two abreast. The larger ones have a steering
platform on a level with the gunwale or raised a foot or two above it,
and a smaller platform is placed at the bow, where a flagstaff may
also be fixed. When there are no thwarts or seats the crew sit on the
bottom of the canoe or on the gunwale, according to the size of the
vessel. Both bow and stern overhang. The paddles are made of
hardwood in one piece, 3 to 4 feet in length, and are pointed.
It is to the East Indies and the Pacific that we must turn to find the
most wonderful examples of the war canoe. They may be divided
into two classes: those with outriggers—this section including double
canoes—and those without.
A “DUG-OUT” CANOE OF NEW GUINEA.
NEW GUINEA CANOES WITH OUTRIGGERS.
From Photographs supplied by the Hon. J. E. Jenkins.
Many of the canoes lacked stability, even in calm waters, and the
risk of capsizing was greater in waters liable to sudden storms or
exposed to the ocean swell. To meet this difficulty and at the same
time permit of the continued use of the shallow harbours of their
coasts, the Malays are supposed to have invented the outrigger, and
this conjecture is based on the fact that wherever the Malay
influence is traceable there some form of the outrigger or double
canoe is to be found also.
The primitive hollowed log generally constitutes the hull of the
canoes of the Pacific Islanders. The rest is mainly a matter of
ornamentation. With but few exceptions, the islanders seem to have
believed that the higher and more imposing and ornamental they
could make the stems or sterns of their vessels, the more dreadful in
war were they likely to be. Many of these elevations are beautifully
carved; other canoes are merely grotesque, and not a few have no
artistic feature whatever to redeem them from absolute hideousness.
As a means of terrifying an enemy by presenting such things to his
astonished gaze they would doubtless have been effective, had it not been
that the enemy would retaliate by presenting something equally
ugly, with the result that the moral effect which each party sought to
exercise upon the other would be neutralised. Some of the islanders
are said to have decorated the prows of their vessels with the skulls
of opponents killed in previous expeditions; while others contented
themselves with locks of human hair, similarly derived, as naval
adornments. With the exception of bows, arrows and spears, all
their weapons were designed for fighting at close quarters. It must
have been a labour of love, as well as a feeling of pride in the
appearance of the fearfully shaped and murderous clubs, which led
them to carve their weapons as carefully as they did, to render them
so deadly, and to adorn them with mother-of-pearl and sharks’
teeth. Not a few of the paddles were given serrated edges in order
that they could be the more effectively employed as war clubs if
necessary.
There are not many native war canoes now left in the South Seas.
None of the islanders, except the head-hunters, habitually kept
canoes for war purposes, though at times one would be designed
and built for some special expedition. The last of the great Samoan
war canoes has almost rotted to pieces on the shore. It is doubtful if
it has ever been used in a warlike expedition. It was between 60 and
70 feet in length, and 18 to 20 feet beam over all. It consisted of
two large single canoes, placed parallel a few feet apart, and joined
by a plank deck which ran across the greater part of the vessels.
Amidships was a house-like erection, used as a shelter. It was
propelled by oars, but also carried a mast and sails. It could easily
carry a hundred men.
The great canoe to hold three hundred men is but a memory; all
that is left of it is its steering paddle, 40 feet in length, which adorns
the wall in the Ethnographical section of the British Museum.

STEM-PIECE, MAORI WAR CANOE.
STERN-POSTS OF MAORI WAR CANOES.
From Examples in the Dominion Museum, Wellington, New Zealand.
A MAORI WAR CANOE.
From Angas’s “New Zealand.”
The canoes of that mysterious people, the Maori of New Zealand,
well repay attention in greater detail than is possible in this book.
The origin of the people themselves is unknown, though, if their
traditions are to be accepted, they migrated a few hundred years
ago from certain of the islands in the Central Pacific, partly
conquered and partly absorbed the people whom they found there
already, and have remained ever since. There has been more than
one such expedition. There are affinities between the Maori and the
Hawaiians. Did the Maori come originally from Hawaii, or is there
some connection between them and the ancient Egyptians, as is
held to be indicated by certain points of resemblance in their
carvings and mural decorations? In what sort of canoes did they
cross the ocean, and how did they find their way? Unfortunately, the
old chiefs who held the traditions have all died, and it is only owing
to the painstaking researches of a few scholars who recognised the
need and value of preserving what could still be learnt, that anything
at all is known of the history of this strange people. Their legends
tell us that some of their canoes were of great size; some could
carry fires or places for cooking the food, and others were double
canoes. One of the latter is said to have had a platform connecting
the two hulls, and bearing a house; it was a three-masted vessel. All
the New Zealand canoes had names of symbolical or historical
interest. One of them was called Marutuahi, which, translated
literally, means a slaying or devouring fire.[8] The dimensions of the
historical or legendary canoes are not known. The straight, tall kauri
pines of the North Island enabled large canoes to be built; one is
said to have been 110 feet in length, and many of the later canoes
were 60 to 80 feet long, and held a hundred to a hundred and fifty
men. These boats had long, overhanging bows ornamented with a
figurehead and two carved boards extending some little distance
along either bow. Between these boards and resting on the stem the
carved figurehead was placed and was often adorned with tufts of
feathers. A mast set rather far forward and raking aft supported a
triangular mat sail, the foot of which extended along the boom one
and a half times to twice the length of its height, and enabled the
canoe to sail very near the wind. The stays of the mast and the
sheets of the sail were of plaited flax. The drawbacks to these
canoes were that having no keels they made great leeway, and that
their length made them awkward to manage whenever they were
caught in anything like a rough sea; they could not meet the seas
end on, but lay in the trough of the waves, yet were so well handled
that disasters were few. In rough weather they were covered with
flax mats over a portion of their length to prevent the seas breaking
inboard.
The long pine hull was of great strength, but to render it more
seaworthy topsides were lashed along the sides of the hull from end
to end of the vessel with braids of flax fibres,[9] and the seams and
holes were caulked with a species of down. As a precaution against