Beginning Data Science in R: Data Analysis, Visualization, and Modelling for the Data Scientist 1st Edition Thomas Mailund download

The document provides information about various data science books authored by Thomas Mailund, focusing on R programming for data analysis, visualization, and modeling. It includes links to download these books and mentions additional resources for data science. The content covers topics such as data manipulation, reproducible analysis, and advanced programming techniques in R.



Beginning
Data Science in R
Data Analysis, Visualization, and
Modelling for the Data Scientist

Thomas Mailund
Beginning Data Science in R: Data Analysis, Visualization, and Modelling for the Data Scientist
Thomas Mailund
Aarhus, Denmark
ISBN-13 (pbk): 978-1-4842-2670-4 ISBN-13 (electronic): 978-1-4842-2671-1
DOI 10.1007/978-1-4842-2671-1
Library of Congress Control Number: 2017934529
Copyright © 2017 by Thomas Mailund
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the
material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now
known or hereafter developed.
Trademarked names, logos, and images may appear in this book. Rather than use a trademark symbol with
every occurrence of a trademarked name, logo, or image we use the names, logos, and images only in an
editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the trademark.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are
not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to
proprietary rights.
While the advice and information in this book are believed to be true and accurate at the date of publication,
neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or
omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material
contained herein.
Managing Director: Welmoed Spahr
Editorial Director: Todd Green
Acquisitions Editor: Steve Anglin
Development Editor: Matthew Moodie
Technical Reviewer: Andrew Moskowitz
Coordinating Editor: Mark Powers
Copy Editor: Kezia Endsley
Compositor: SPi Global
Indexer: SPi Global
Artist: SPi Global
Cover image designed by Freepik
Distributed to the book trade worldwide by Springer Science+Business Media New York, 233 Spring Street,
6th Floor, New York, NY 10013. Phone 1-800-SPRINGER, fax (201) 348-4505, e-mail
orders-ny@springer-sbm.com, or visit www.springeronline.com. Apress Media, LLC is a California LLC and
the sole member (owner) is Springer Science + Business Media Finance, Inc (SSBM Finance, Inc). SSBM
Finance Inc is a Delaware corporation.
For information on translations, please e-mail rights@apress.com, or visit
http://www.apress.com/rights-permissions.
Apress titles may be purchased in bulk for academic, corporate, or promotional use. eBook versions and
licenses are also available for most titles. For more information, reference our Print and eBook Bulk Sales
web page at http://www.apress.com/bulk-sales.
Any source code or other supplementary material referenced by the author in this book is available to
readers on GitHub via the book’s product page, located at www.apress.com/9781484226704. For more
detailed information, please visit http://www.apress.com/source-code.
Printed on acid-free paper
Contents at a Glance

About the Author
About the Technical Reviewer
Acknowledgments
Introduction

■Chapter 1: Introduction to R Programming
■Chapter 2: Reproducible Analysis
■Chapter 3: Data Manipulation
■Chapter 4: Visualizing Data
■Chapter 5: Working with Large Datasets
■Chapter 6: Supervised Learning
■Chapter 7: Unsupervised Learning
■Chapter 8: More R Programming
■Chapter 9: Advanced R Programming
■Chapter 10: Object Oriented Programming
■Chapter 11: Building an R Package
■Chapter 12: Testing and Package Checking
■Chapter 13: Version Control
■Chapter 14: Profiling and Optimizing

Index
Contents

About the Author
About the Technical Reviewer
Acknowledgments
Introduction


■Chapter 1: Introduction to R Programming
  Basic Interaction with R
  Using R as a Calculator
    Simple Expressions
    Assignments
    Actually, All of the Above Are Vectors of Values…
    Indexing Vectors
    Vectorized Expressions
  Comments
  Functions
    Getting Documentation for Functions
    Writing Your Own Functions
    Vectorized Expressions and Functions
  A Quick Look at Control Structures
  Factors
  Data Frames
  Dealing with Missing Values
  Using R Packages
  Data Pipelines (or Pointless Programming)
    Writing Pipelines of Function Calls
    Writing Functions that Work with Pipelines
    The magical “.” argument
    Defining Functions Using . .
    Anonymous Functions
    Other Pipeline Operations
  Coding and Naming Conventions
  Exercises
    Mean of Positive Values
    Root Mean Square Error

■Chapter 2: Reproducible Analysis
  Literate Programming and Integration of Workflow and Documentation
  Creating an R Markdown/knitr Document in RStudio
  The YAML Language
  The Markdown Language
    Formatting Text
    Cross-Referencing
    Bibliographies
    Controlling the Output (Templates/Stylesheets)
  Running R Code in Markdown Documents
    Using Chunks when Analyzing Data (Without Compiling Documents)
    Caching Results
    Displaying Data
  Exercises
    Create an R Markdown Document
    Produce Different Output
    Add Caching


■Chapter 3: Data Manipulation
  Data Already in R
  Quickly Reviewing Data
  Reading Data
  Examples of Reading and Formatting Datasets
    Breast Cancer Dataset
    Boston Housing Dataset
    The readr Package
  Manipulating Data with dplyr
    Some Useful dplyr Functions
    Breast Cancer Data Manipulation
  Tidying Data with tidyr
  Exercises
    Importing Data
    Using dplyr
    Using tidyr


■Chapter 4: Visualizing Data
  Basic Graphics
  The Grammar of Graphics and the ggplot2 Package
    Using qplot()
    Using Geometries
    Facets
    Scaling
    Themes and Other Graphics Transformations
  Figures with Multiple Plots
  Exercises


■Chapter 5: Working with Large Datasets
  Subsample Your Data Before You Analyze the Full Dataset
  Running Out of Memory During Analysis
  Too Large to Plot
  Too Slow to Analyze
  Too Large to Load
  Exercises
    Subsampling
    Hex and 2D Density Plots


■Chapter 6: Supervised Learning
  Machine Learning
  Supervised Learning
    Regression versus Classification
    Inference versus Prediction
  Specifying Models
    Linear Regression
    Logistic Regression (Classification, Really)
    Model Matrices and Formula
  Validating Models
    Evaluating Regression Models
    Evaluating Classification Models
    Random Permutations of Your Data
    Cross-Validation
    Selecting Random Training and Testing Data
  Examples of Supervised Learning Packages
    Decision Trees
    Random Forests
    Neural Networks
    Support Vector Machines
  Naive Bayes
  Exercises
    Fitting Polynomials
    Evaluating Different Classification Measures
    Breast Cancer Classification
    Leave-One-Out Cross-Validation (Slightly More Difficult)
    Decision Trees
    Random Forests
    Neural Networks
    Support Vector Machines
    Compare Classification Algorithms


■Chapter 7: Unsupervised Learning
  Dimensionality Reduction
    Principal Component Analysis
    Multidimensional Scaling
  Clustering
    k-Means Clustering
    Hierarchical Clustering
  Association Rules
  Exercises
    Dealing with Missing Data in the HouseVotes84 Data
    Rescaling for k-Means Clustering
    Varying k
  Project 1
    Importing Data
    Exploring the Data
  Fitting Models
  Exercises
    Exploring Other Formulas
    Exploring Different Models
    Analyzing Your Own Dataset


■Chapter 8: More R Programming
  Expressions
    Arithmetic Expressions
    Boolean Expressions
  Basic Data Types
    The Numeric Type
    The Integer Type
    The Complex Type
    The Logical Type
    The Character Type
  Data Structures
    Vectors
    Matrix
    Lists
    Indexing
    Named Values
    Factors
    Formulas
  Control Structures
    Selection Statements
    Loops
    A Word of Warning About Looping
  Functions
    Named Arguments
    Default Parameters
    Return Values
    Lazy Evaluation
    Scoping
    Function Names Are Different from Variable Names
  Recursive Functions
  Exercises
    Fibonacci Numbers
    Outer Product
    Linear Time Merge
    Binary Search
    More Sorting
    Selecting the k Smallest Element


■Chapter 9: Advanced R Programming��������������������������������������������������������������� 233
Working with Vectors and Vectorizing Functions��������������������������������������������������������� 233
ifelse��������������������������������������������������������������������������������������������������������������������������������������������������� 235
Vectorizing Functions������������������������������������������������������������������������������������������������������������������������� 235
The apply Family�������������������������������������������������������������������������������������������������������������������������������� 237

Advanced Functions����������������������������������������������������������������������������������������������������� 242


Special Names������������������������������������������������������������������������������������������������������������������������������������ 242
Infix Operators������������������������������������������������������������������������������������������������������������������������������������ 242
Replacement Functions���������������������������������������������������������������������������������������������������������������������� 243

How Mutable Is Data Anyway?������������������������������������������������������������������������������������� 245


Functional Programming���������������������������������������������������������������������������������������������� 246
Anonymous Functions������������������������������������������������������������������������������������������������������������������������ 246
Functions Taking Functions as Arguments����������������������������������������������������������������������������������������� 247
Functions Returning Functions (and Closures)����������������������������������������������������������������������������������� 247
Filter, Map, and Reduce���������������������������������������������������������������������������������������������������������������������� 248

Function Operations: Functions as Input and Output��������������������������������������������������� 250


Ellipsis Parameters����������������������������������������������������������������������������������������������������������������������������� 253

Exercises
  between
  apply_if
  power
  Row and Column Sums
  Factorial Again
  Function Composition

■ Chapter 10: Object Oriented Programming
Immutable Objects and Polymorphic Functions
Data Structures
  Example: Bayesian Linear Model Fitting
Classes
Polymorphic Functions
  Defining Your Own Polymorphic Functions
Class Hierarchies
  Specialization as Interface
  Specialization in Implementations
Exercises
  Shapes
  Polynomials

■ Chapter 11: Building an R Package
Creating an R Package
  Package Names
  The Structure of an R Package
  .Rbuildignore
  Description
  NAMESPACE
  R/ and man/
Roxygen
  Documenting Functions
  Import and Export
  Package Scope Versus Global Scope
  Internal Functions
  File Load Order
Adding Data to Your Package
Building an R Package
Exercises

■ Chapter 12: Testing and Package Checking
Unit Testing
  Automating Testing
  Using testthat
  Writing Good Tests
  Using Random Numbers in Tests
  Testing Random Results
Checking a Package for Consistency
Exercise

■ Chapter 13: Version Control
Version Control and Repositories
Using git in RStudio
  Installing git
  Making Changes to Files, Staging Files, and Committing Changes
  Adding git to an Existing Project
  Bare Repositories and Cloning Repositories
  Pushing Local Changes and Fetching and Pulling Remote Changes
  Handling Conflicts
  Working with Branches
  Typical Workflows Involve Lots of Branches
  Pushing Branches to the Global Repository
GitHub
  Moving an Existing Repository to GitHub
  Installing Packages from GitHub
Collaborating on GitHub
  Pull Requests
  Forking Repositories Instead of Cloning
Exercises

■ Chapter 14: Profiling and Optimizing
Profiling
  A Graph-Flow Algorithm
Speeding Up Your Code
Parallel Execution
Switching to C++
Exercises
Project 2
Bayesian Linear Regression
  Exercises: Priors and Posteriors
  Predicting Target Variables for New Predictor Values
Formulas and Their Model Matrix
  Working with Model Matrices in R
  Exercises
  Model Matrices Without Response Variables
  Exercises
Interface to a blm Class
  Constructor
  Updating Distributions: An Example Interface
  Designing Your blm Class
  Model Methods
Building an R Package for blm
  Deciding on the Package Interface
  Organization of Source Files
  Document Your Package Interface Well
  Adding README and NEWS Files to Your Package
Testing
GitHub
Conclusions
  Data Science
  Machine Learning
  Data Analysis
  R Programming
The End
Acknowledgements

Index

About the Author

Thomas Mailund is an associate professor in bioinformatics at Aarhus University, Denmark. His background
is in math and computer science, but for the last decade, his main focus has been on genetics and
evolutionary studies, particularly comparative genomics, speciation, and gene flow between emerging
species.

About the Technical Reviewer

Andrew Moskowitz is a doctoral candidate in Quantitative Psychology at
UCLA and self-employed statistical consultant. His quantitative research
focuses mainly on hypothesis testing and effect sizes in mixed effects
models. While at UCLA, Andrew has collaborated with a number of
faculty, students, and enterprises to help them derive meaning from data
across an array of fields ranging from psychological services and health
care delivery to marketing.

Acknowledgments

I would like to thank Asger Hobolth for many useful comments on earlier versions of this manuscript. He
helped me improve the writing and the presentation of the material.

Introduction

Welcome to Introduction to Data Science with R. This book was written as a set of lecture notes for two
classes I teach, Data Science: Visualization and Analysis and Data Science: Software Development and
Testing. The book is written to fit the structure of these classes, where each class consists of seven weeks of
lectures and project work. This means that there are 14 chapters with the core material, where the first seven
focus on data analysis and the last seven on developing reusable software for data science.

What Is Data Science?


Oh boy! That is a difficult question. I don’t know if it is easy to find someone who is entirely sure what data
science is, but I am pretty sure that it would be difficult to find two people with fewer than three opinions
about it. It is certainly a popular buzzword, and everyone wants to have data scientists these days, so data
science skills are useful to have on the CV. But what is it?
Since I can’t really give you an agreed-upon definition, I will just give you my own: Data science is the
science of learning from data.
This is a very broad definition—almost too broad to be useful. I realize this. But then, I think data
science is an incredibly general field. I don’t have a problem with that. Of course, you could argue that any
science is all about getting information out of data, and you might be right, although I would say that there
is more to science than just transforming raw data into useful information. The sciences focus on
answering specific questions about the world, while data science focuses on how to manipulate data
efficiently and effectively. The primary focus is not which questions to ask of the data but how we can
answer them, whatever they may be. It is more like computer science and mathematics than it is like natural
sciences, in this way. It isn’t so much about studying the natural world as it is about how to compute data
efficiently.
Included in data science is the design of experiments. With the right data, we can address the questions
we are interested in. With a poor design of experiments or a poor choice of which data we gather, this can be
difficult. Study design might be the most important aspect of data science, but is not the topic of this book. In
this book I focus on the analysis of data, once gathered.
Computer science is also mainly the study of computations—as is hinted at in the name—but is a bit
broader in this focus. Datalogy, an earlier name for data science, was also suggested as a name for computer
science, and in Denmark, for example, it is the name used for computer science. Still, the name “computer
science” puts the focus on computation, while the name “data science” puts the focus on data. But of
course, the fields overlap. If you are writing a sorting algorithm, are you then focusing on the computation or
the data? Is that even a meaningful question to ask?
There is a huge overlap between computer science and data science and naturally the skillsets you need
overlap as well. To efficiently manipulate data you need the tools for doing that, so computer programming
skills are a must and some knowledge about algorithms and data structures usually is as well. For data
science, though, the focus is always on the data. In a data analysis project, the focus is on how the data flows
from its raw form through various manipulations until it is summarized in some useful form. Although the
difference can be subtle, the focus is not about what operations a program does during the analysis, but
about how the data flows and is transformed. It is also focused on why we do certain transformations of the
data, what purpose those changes serve, and how they help us gain knowledge about the data. It is as much
about deciding what to do with the data as it is about how to do it efficiently.
Statistics is of course also closely related to data science. So closely linked, in fact, that many consider
data science just a fancy word for statistics that looks slightly more modern and sexy. I can’t say that I
strongly disagree with this—data science does sound sexier than statistics—but just as data science is
slightly different from computer science, data science is also slightly different from statistics. Just, perhaps,
somewhat less different than computer science is.
A large part of doing statistics is building mathematical models for your data and fitting the models to
the data to learn about the data in this way. That is also what we do in data science. As long as the focus is on
the data, I am happy to call statistics data science. If the focus changes to the models and the mathematics,
then we are drifting away from data science into something else—just as if the focus changes from the data
to computations we are drifting from data science to computer science.
Data science is also related to machine learning and artificial intelligence, and again there are huge
overlaps. This is perhaps not surprising, since something like machine learning has its home both in computer
science and in statistics; if it is focusing on data analysis, it is also at home in data science. To be honest, it
has never been clear to me when a mathematical model changes from being a plain old statistical model to
becoming machine learning anyway.
For this book, we are just going to go with my definition and, as long as we are focusing on analyzing
data, we are going to call it data science.

Prerequisites for Reading this Book


In the first seven chapters in this book, the focus is on data analysis and not programming. For those
seven chapters, I do not assume a detailed familiarity with topics such as software design, algorithms, data
structures, and such. I do not expect you to have any experience with the R programming language either.
I do, however, expect that you have had some experience with programming, mathematical modeling, and
statistics.
Programming in R can be quite tricky at times if you are used to scripting languages or object-oriented
languages. R is a functional language that, as a rule, does not let you modify data in place, and while it does
have systems for object-oriented programming, it handles this programming paradigm very differently from
languages you are likely to have seen, such as Java or Python.
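The point about not modifying data can be seen in a small sketch. It uses only base R, and the function name set_first is made up for illustration: a function that appears to change its argument only changes a local copy, so the caller's vector stays as it was.

```r
# Hypothetical helper: tries to overwrite the first element of its argument.
set_first <- function(v, value) {
  v[1] <- value   # changes a local copy of v, not the caller's vector
  v               # the modified copy is the return value
}

x <- c(1, 2, 3)
y <- set_first(x, 100)

print(x)  # the original vector
print(y)  # the copy returned by the function
```

To keep a change, you assign the returned value to a variable, as is done with y here.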
For the data analysis part of this book, the first seven chapters, we will only use R for very
straightforward programming tasks, so none of this should pose a problem. We will have to write simple
scripts for manipulating and summarizing data so you should be familiar with how to write basic
expressions like function calls, if statements, loops, and so on. These things you will have to be comfortable
with. I will introduce every such construction in the book when we need them so you will see how they are
expressed in R, but I will not spend much time explaining them. I mostly will just expect you to be able to
pick it up from examples.
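As a rough indication of the level assumed, here is a minimal sketch of those basic constructions in base R: a function call, an if statement, and a loop. The variable names and numbers are invented for illustration only.

```r
# A vector of made-up values to work with.
values <- c(3, 8, 1, 6)

# A for loop with an if statement: add up the values larger than 2.
total <- 0
for (v in values) {
  if (v > 2) {
    total <- total + v
  }
}

# A call to a built-in function.
average <- mean(values)

print(total)
print(average)
```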
Similarly, I do not expect you to know already how to fit data and compare models in R. I do expect that
you have had enough introduction to statistics to be comfortable with basic terms like parameter estimation,
model fitting, explanatory and response variables, and model comparison. If not, I expect you to be at least
able to pick up what we are talking about when you need to.
I won’t expect you to know a lot about statistics and programming, but this isn’t Data Science for
Dummies, so I do expect you to be able to figure out examples without me explaining everything in detail.
After the first seven chapters comes a short description of a data analysis project that one of my students did
in an earlier class. It shows how such a project could look, but I suggest that you do not wait until you have
finished the first seven chapters to start doing such analysis yourself. To get the most benefit out of reading
this book, you should be applying what you learn continuously. As soon as you begin reading, I suggest
that you find a dataset that you would be interested in finding out more about and then apply what you learn
in each chapter to that data.
For the final seven chapters of the book, the focus is on programming. To read this part you should
be familiar with object-oriented programming. I will explain how it is handled in R and how it differs from
languages such as Python, Java, or C++, but I expect you to be familiar with terms such as class hierarchies,
inheritance, and polymorphic methods. I will not expect you to be already familiar with functional
programming (but if you are, there should still be plenty to learn in those chapters if you are not already
familiar with R programming as well).

Plan for the Book


In the book, we cover basic data manipulation—filtering and selecting relevant data; transforming data into
shapes readily analyzable; summarizing data; visualizing data in informative ways both for exploring data and
presenting results; and model building. These are the key aspects of doing analysis in data science. After this
we will cover how to develop R code that is reusable and works well with existing packages, and that is easy
to extend, and we will see how to build new R packages that other people will be able to use in their projects.
These are the essential skills you will need to develop your own methods and share them with the world.
We will do all this using the programming language R (https://www.r-project.org/about.html).
R is one of the most popular (and open source) data analysis programming languages around at the
moment. Of course, popularity doesn’t imply quality, but because R is so popular it has a rich ecosystem of
extensions (called “packages” in R) for just about any kind of analysis you could be interested in. People who
develop statistical methods often implement them as R packages, so you can quite often get the
state-of-the-art techniques very easily in R. The popularity also means that there is a large community of people who can
help if you have problems. Most problems you run into can be solved with a few minutes on Google because
you are unlikely to be the first to run into any particular issue. There are also plenty of online tutorials for
learning more about R and specialized packages, there are plenty of videos with talks about R and popular R
packages, and there are plenty of books you can buy if you want to learn more.
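To give a flavor of the data manipulation steps named in the plan above (filtering, selecting, transforming, and summarizing), here is a minimal sketch. It assumes only base R and the mtcars dataset that ships with R; the variable names are illustrative, and the chapters themselves may use other tools for the same steps.

```r
# mtcars ships with R (in the datasets package), so nothing needs downloading.
cars <- mtcars

# Filter: keep only the cars with 4 or 6 cylinders.
small <- cars[cars$cyl %in% c(4, 6), ]

# Select: keep only the columns of interest.
small <- small[, c("mpg", "cyl", "wt")]

# Transform: wt is recorded in units of 1000 lbs; add a column in kilograms.
small$wt_kg <- small$wt * 1000 * 0.45359237

# Summarize: mean miles per gallon for each cylinder count.
mpg_by_cyl <- aggregate(mpg ~ cyl, data = small, FUN = mean)
print(mpg_by_cyl)
```

Each step takes a data frame and produces a new one, which is the "data flows through transformations" view described earlier in this introduction.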

Data Analysis and Visualization


The topics focusing on data analysis and visualization are covered in the first seven chapters:
• Chapter 1, Introduction to R programming. In which you learn how to work with data
and write data pipelines.
• Chapter 2, Reproducible analysis. In which you find out how to integrate
documentation and analysis in a single document and how to use such documents
to produce reproducible research.
• Chapter 3, Data manipulation. In which you learn how to import, tidy up, and
transform data, and compute summaries from data.
• Chapter 4, Visualizing and exploring data. In which you learn how to make plots for
exploring data features and for presenting data features and analysis results.
• Chapter 5, Working with large datasets. In which you learn how to deal with data
where the number of observations makes the usual approaches too slow.
• Chapter 6, Supervised learning. In which you learn how to train models when you
have datasets with known classes or regression values.
• Chapter 7, Unsupervised learning. In which you learn how to search for patterns you
are not aware of in data.
These chapters are followed by the first project, where you see the various techniques in use.

Software Development
Software and package development is then covered in the following seven chapters:
• Chapter 8, More R programming. In which you’ll return to the basics of R
programming and get a few more details than the tutorial in Chapter 1.
• Chapter 9, Advanced R programming. In which you explore more advanced
features of the R programming language, in particular, functional programming.
• Chapter 10, Object oriented programming. In which you learn how R models object
orientation and how you can use it to write more generic code.
• Chapter 11, Building an R package. In which you learn the necessary components of
an R package and how to program your own.
• Chapter 12, Testing and checking. In which you learn techniques for testing your R
code and checking the consistency of your R packages.
• Chapter 13, Version control. In which you learn how to manage code under version
control and how to collaborate using GitHub.
• Chapter 14, Profiling and optimizing. In which you learn how to identify hotspots
of code where inefficient solutions are slowing you down and techniques for
alleviating this.
These chapters are then followed by the second project, where you’ll build a package for Bayesian linear
regression.

Getting R and RStudio


You will need to install R on your computer to do the exercises in this book. I suggest that you get an
integrated environment since it can be slightly easier to keep track of a project when you have your plots,
documentation, code, etc., all in the same program.
I personally use RStudio (https://www.rstudio.com/products/RStudio), which I warmly recommend.
You can get it for free—just follow the link—and I will assume that you have it when I need to refer to the
software environment you are using in the following chapters. There won’t be many RStudio specifics,
though, and most tools for working with R have the same features, so if you want to use something else you
can probably follow the notes without any difficulties.

Projects
You cannot learn how to analyze data without analyzing data, and you cannot learn how to develop software
without developing software either. Typing in examples from the book is nothing like writing code on your
own. Even doing exercises from the book—which you really ought to do—is not the same as working on your
own projects. Exercises, after all, cover small isolated aspects of problems you have just been introduced to.
In the real world, there is not a chapter of material presented before every task you have to deal with. You
need to work out by yourself what needs to be done and how. If you only do the exercises in this book, you
will miss the most important lessons in analyzing data: how to explore the data and get a feeling for it; how
to do the detective work necessary to pull out some understanding from the data; and how to deal with all
the noise and weirdness found in any dataset. And for developing a package, you need to think through how
to design and implement its functionality so that the various functions and data structures fit well together.


In this book, I go through a data analysis project to show you what that can look like. To actually
learn how to analyze data, you need to do it yourself as well, and you need to do it with a dataset that I
haven’t analyzed for you. You might have a dataset lying around you have worked on before, a dataset
from something you are just interested in, or you can probably find something interesting at a public data
repository, e.g., one of these:
• RDataMining.com
• UCI machine learning repository (http://archive.ics.uci.edu/ml/)
• KDNuggets (http://www.kdnuggets.com/datasets/index.html)
• Reddit r/datasets (https://www.reddit.com/r/datasets)
• GitHub awesome public datasets (https://github.com/caesar0301/awesome-public-datasets)
I suggest that you find yourself a dataset and that after each lesson, you use the skills you have learned
to explore this dataset. Pick data that is structured as a table with observations as rows and variables as
columns, since that is the form of the data we consider in this book. By the end of the first seven chapters,
you will have analyzed this data and can write a report about your analysis that others can follow, evaluate,
and perhaps modify. You will be doing reproducible science.
For the programming topics, I describe another project illustrating the design and implementation
issues involved in making an R package. There, you should be able to learn from just implementing your
own version of the project I use, but you will, of course, be more challenged by working on a project without
any of my help at all. Whichever you do, to get the full benefit of this book you should make your own
package while reading the programming chapters.

CHAPTER 1

Introduction to R Programming

We will use R for our data analysis so we need to know the basics of programming in the R language. R is a
full programming language with both functional programming and object oriented programming features.
Learning the language is far beyond the scope of this chapter and is something we return to later. The good
news, though, is that to use R for data analysis, you rarely need to do much programming. At least, if you do
the right kind of programming, you won’t need much.
For manipulating data—and how to do this is the topic of the next chapter—you mainly just have to
string together a couple of operations. Operations such as “group the data by this feature” followed by
“calculate the mean value of these features within each group” and then “plot these means”. This used to be
much more complicated to do in R, but a couple of new ideas on how to structure such data flow, together
with clever implementations of them in packages such as magrittr and dplyr, have significantly
simplified it. We will see some of this at the end of this chapter and more in the next chapter. First, though,
you need to get a taste for R.
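To give a rough idea of what such a data flow looks like, here is a minimal sketch in base R. It deliberately does not use magrittr or dplyr, which you may not have installed yet, and it assumes the built-in iris dataset as a stand-in for your own data.

```r
# Assumption: the built-in iris dataset stands in for your own data.
# "Group the data by this feature" (Species) and "calculate the mean
# value" of another feature (Sepal.Length) within each group.
means <- tapply(iris$Sepal.Length, iris$Species, mean)

# The group means are 5.006 (setosa), 5.936 (versicolor), and 6.588 (virginica).
means

# "Plot these means": calling barplot(means) would draw one bar per group.
```

The packages mentioned above express the same steps as a chain of operations, which usually reads more naturally once the analysis grows beyond a single line.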

Basic Interaction with R


Start by downloading RStudio if you haven’t done so already (https://www.rstudio.com/products/RStudio).
If you open it, you should see a window similar to Figure 1-1. Well, except that you will be in an
empty project while the figure shows (on the top right) that this RStudio is opened in a project called “Data
Science”. You always want to be working on a project. Projects keep track of the state of your analysis by
remembering variables and functions you have written and keep track of which files you have opened and
such. Choose File ➤ New Project to create a project. You can create a project from an existing directory, but
if this is the first time you are working with R you probably just want to create an empty project in a new
directory, so do that.

© Thomas Mailund 2017 1


T. Mailund, Beginning Data Science in R, DOI 10.1007/978-1-4842-2671-1_1
Chapter 1 ■ Introduction to R Programming

Figure 1-1. RStudio

Once you have opened RStudio, you can type R expressions into the console, which is the frame on
the left of the RStudio window. When you write an expression there, R will read it, evaluate it, and print the
result. When you assign values to variables, and you will see how to do this shortly, they will appear in the
Environment frame on the top right. At the bottom right, you have the directory where the project lives, and
files you create will go there.
To create a new file, choose File ➤ New File. You can select several different file types. We are interested
in the R Script and R Markdown types. The former is the file type for pure R code, while the latter is used for
creating reports where documentation text is mixed with R code. For data analysis projects, I recommend
using Markdown files. Writing documentation for what you are doing is really helpful when you need to go
back to a project several months down the line.
For most of this chapter, you can just write R code in the console, or you can create an R Script file. If
you create an R Script file, it will show up on the top left, as shown in Figure 1-2. You can evaluate single
expressions using the Run button on the top-right of this frame, or evaluate the entire file using the Source
button. For longer expressions, you might want to write them in an R Script file for now. In the next chapter,
we talk about R Markdown, which is the better solution for data science projects.


Figure 1-2. RStudio with a new R Script file

Using R as a Calculator
You can use the R console as a calculator where you just type in an expression you want calculated, press
Enter, and R gives you the result. You can play around with that a little bit to get familiar with how to write
expressions in R; there is some explanation of how to write them below. Moving from using R as a
calculator in this sense to writing more sophisticated analysis programs is only a question of degree. A data
analysis program is really little more than a sequence of calculations, after all.

Simple Expressions
Simple arithmetic expressions are written, as in most other programming languages, in the typical
mathematical notation that you are used to.

1 + 2
## [1] 3
4 / 2
## [1] 2
(2 + 2) * 3
## [1] 12


It also works pretty much as you are used to. Except, perhaps, that you might be used to integers
behaving as integers in a division. At least in some programming languages, division between integers is
integer division, but in R, you can divide integers and if there is a remainder you will get a floating-point
number back as the result.

4 / 3
## [1] 1.333333

When you write numbers like 4 and 3, they are interpreted as floating-point numbers. To explicitly get
an integer, you must write 4L and 3L.

class(4)
## [1] "numeric"
class(4L)
## [1] "integer"

You will still get a floating-point number if you divide two integers; there is no need to tell R explicitly
that you want floating-point division. If you want integer division, on the other hand, you need a different
operator, %/%:

4 %/% 3
## [1] 1
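If you want to check for yourself which type an operation gives you, a small sketch like the following combines class with the two division operators. The particular numbers are just illustrative choices.

```r
class(4L / 2L)    # ordinary division of two integers
## [1] "numeric"
class(4L %/% 3L)  # integer division of two integers
## [1] "integer"
class(4 %/% 3)    # integer division of two floating-point numbers
## [1] "numeric"
```

Note that %/% only returns an integer when both operands are integers; with floating-point operands the result is a whole number stored as floating-point.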

In many languages % is used to get the remainder of a division, but this doesn’t quite work with R, where
% is used to construct infix operators. So in R, the operator for this is %%:

4 %% 3
## [1] 1
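Integer division and the remainder operator fit together: multiplying the quotient by the divisor and adding the remainder gives you back the number you started with. Here is a short sketch, where the variable names a and b are only illustrative (assignment with <- is explained in the Assignments section).

```r
a <- 17
b <- 5
a %/% b
## [1] 3
a %% b
## [1] 2
(a %/% b) * b + a %% b == a
## [1] TRUE
```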

In addition to the basic arithmetic operators—addition, subtraction, multiplication, division, and the
modulus operator you just saw—you also have an exponentiation operator for taking powers. For this, you
can use ^ or ** as infix operators:

2^2
## [1] 4
2^3
## [1] 8
2**2
## [1] 4
2**3
## [1] 8

There are some other data types besides numbers, but we won’t go into an exhaustive list here. There
are two types you do need to know about early, though, since they are frequently used and since not knowing
about how they work can lead to all kinds of grief. Those are strings and “factors”.
Strings work as you would expect. You write them in quotes, either double quotes or single quotes, and
that is about it.

"hello,"
## [1] "hello,"
'world!'
## [1] "world!"


Strings are not particularly tricky, but I mention them because they look a lot like factors. Factors are
not like strings; they just look sufficiently like them to cause some confusion. I explain factors a little later in
this chapter when you have seen how functions and vectors work.

Assignments
To assign a value to a variable, you use the arrow operators. So to assign the value 2 to the variable x, you
would write the following:

x <- 2

You can test that x now holds the value 2 by evaluating x.

x
## [1] 2

And of course, you can now use x in expressions:

2 * x
## [1] 4

You can assign with arrows in both directions, so you could also write the following:

2 -> x

An assignment won’t print anything if you write it into the R terminal, but you can get R to print it just
by putting the assignment in parentheses.

x <- "invisible"
(y <- "visible")
## [1] "visible"

Actually, All of the Above Are Vectors of Values…


If you were wondering why all the values printed above had a [1] in front of them, I am going to explain
that right now. It is because we are usually not working with single values anywhere in R. We are working
with vectors of values (and you will hear more about vectors in the next section). The vectors we have seen
have length one—they consist of a single value—so there is nothing wrong about thinking about them as
individual values. But they really are vectors.
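You can convince yourself of this with a few quick checks. This is only a small sketch; the variable name x is an arbitrary choice.

```r
x <- 2
length(x)            # a single number is a vector of length one
## [1] 1
is.vector(x)
## [1] TRUE
identical(x, c(2))   # the same as a one-element vector built with c()
## [1] TRUE
```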
The [1] does not indicate that we are looking at a vector of length one, though. The [1] tells you that the
first value after [1] is the first value in the vector. With longer vectors, you get the index each time R moves to
the next line of output. This is just done to make it easier to count your way into a particular index.
You will see this if you make a longer vector. For example, you can make one of length 50 using the :
operator:

1:50
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
## [16] 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
## [31] 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45
## [46] 46 47 48 49 50


Because we are essentially always working on vectors, there is one caveat I want to warn you about. If
you want to know the length of a string, you might—reasonably enough—think you can get that using the
length function. You would be wrong. That function gives you the length of a vector, so if you give it a single
string, it will always return 1.

length("qax")
## [1] 1
length("quux")
## [1] 1
length(c("foo", "barz"))
## [1] 2

In the last expression, we used the function c() to concatenate two strings. This creates a vector of
two strings, and thus the result of calling length on that is 2. To get the length of the actual string, you want
nchar instead:

nchar("qax")
## [1] 3
nchar("quux")
## [1] 4
nchar(c("foo", "barz"))
## [1] 3 4

Indexing Vectors
If you have a vector and want the i’th element of that vector, you can index the vector to get it like this:

(v <- 1:5)
## [1] 1 2 3 4 5
v[1]
## [1] 1
v[3]
## [1] 3

We have parentheses around the first expression to see the output of the operation. An assignment is
usually silent in R, but by putting the expression in parentheses, we make sure that R prints the result, which
is the vector of integers from 1 to 5. Notice here that the first element is at index 1. Many programming
languages start indexing at 0, but R starts indexing at 1. A vector of length n is thus indexed from 1 to n,
unlike in zero-indexed languages, where the indices go from 0 to n–1.
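If you are used to a zero-indexed language, it can help to see what happens at the boundaries. The following sketch redefines the same vector v so that it can be run on its own.

```r
v <- 1:5
v[length(v)]   # the last element is at index length(v), not length(v) - 1
## [1] 5
v[0]           # index 0 selects nothing, giving an empty vector
## integer(0)
v[6]           # indexing past the end gives a missing value
## [1] NA
```

Neither of the last two expressions is an error in R, so an off-by-one mistake carried over from another language can go unnoticed.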
If you want to extract a subvector, you can also do this with indexing. You just use a vector of the indices
you want inside the square brackets. You can use the : operator for this or the concatenate function, c():

v[1:3]
## [1] 1 2 3
v[c(1,3,5)]
## [1] 1 3 5

You can even use a vector of Boolean values to pick out those values that are “true”:

v[c(TRUE, FALSE, TRUE, FALSE, TRUE)]
## [1] 1 3 5
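In practice, you rarely type such a Boolean vector by hand; you would typically compute it from the vector itself. As a hedged sketch, assuming you want the odd numbers in v, you could combine the %% operator from earlier with a comparison:

```r
v <- 1:5
v %% 2 == 1      # one Boolean value per element of v
## [1]  TRUE FALSE  TRUE FALSE  TRUE
v[v %% 2 == 1]   # use that Boolean vector as the index
## [1] 1 3 5
```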

"In none of the enactments in 5 & 6 Vict. c. 45 will be found
anything which prejudicially affects the right of sole
representation conferred by the statute of 3 & 4 Will IV. c. 15.
The first production of a dramatic piece mentioned in section 20
of the statute of Victoria confers no priority upon the first
producer, nor does it confer a title to the sole liberty of
representation. That is conferred by the statute 3 & 4 Will. IV. c.
15 upon the author or his assignee: it[701] only fixes the first
production as the point from which (if entitled to it) the
endurance of the sole liberty of representation is to be
calculated."

What Hawkins, J., decided was that there is a vested statutory
interest in a dramatic piece immediately it is composed, and
although it is not quite clear from his judgment, it seems necessarily
to follow that the whole rights and remedies given by 3 & 4 Will. IV.
c. 15 still attach immediately on composition, and that there is
nothing in 5 & 6 Vict. to divest the author of that right. When a
dramatic work is performed, no doubt the protection to performing
right is restricted as well as extended to the period given by 5 & 6
Vict. c. 45, i. e. forty-two years from the date of first performance,
or life and seven years: but as regards unperformed works, it is
submitted that the performing right is given by 3 & 4 Will. IV. c. 15
on composition, and is perpetual if the work be not printed and
published as a book within the British dominions, or if it be printed
and published as a book, then for forty-two years from the date of
publication as a book, or for the author's life and seven years.

Extremely difficult questions may arise as regards performing rights
when a dramatic or musical work has been published as a book or
publicly performed outside the British dominions before the first
publication or the first public performance within the British
dominions.

Section 19 of 7 & 8 Vict. c. 12, provides:

"That neither the author of any book, nor the author or
composer of any Dramatic Piece or Musical Composition ...
which shall, after the passing of this Act, be first published out
of Her Majesty's Dominions, shall have any copyright therein
respectively, or any exclusive right to the public representation
or performance thereof, otherwise than such, if any, as he may
become entitled to under this Act."[702]

The whole difficulty lies in the meaning of the words "first published"
as applied to the performing right. In Boucicault v. Delafield,[703]
and Boucicault v. Chatterton,[704] it was held that when an
unpublished play was first performed outside the British dominions
the performing right in this country was extinguished. "First
published" was held to include the "first performance" of a drama.
This, however, only provides for one possible contingency. As the
literary exchange with America, with which we have no international
convention, is becoming larger every year, it may be useful to
consider some of the other contingencies which may arise, and the
difficulties of which are not yet judicially solved. The cases
suggested are in connexion with the United States, but apply equally
to any foreign country, except in so far as rights may be acquired
under International Convention.

Dramatic or Musical Work unpublished, first performed in America.—
This has been decided as above. The performing right in this country
is lost.

Dramatic or Musical Work first published in America, subsequently
first performed within the British Dominions.—This problem is not
solved by the above cases. The alternative views are that
"publication" in the section means: (i) a putting before the public in
any form, whether by representation or in print, or (ii) as regards
copyright, a publication in print, as regards performing right, a
publication by representation. I am inclined to think that the second
alternative is the correct one, and that the performing right in this
country is not lost. The contrary, however, seems to have been
assumed in Boucicault v. Chatterton,[705] both by the bench and
bar.

Dramatic or Musical Work first published in the British Dominions,
subsequently first performed in America.—This problem depends on
the same two alternatives as the last. I therefore think that the
performing right here would be lost, even although there was first
publication as a book within the British dominions.

Dramatic or Musical Work first performed in America, subsequently
first published in the British Dominions.—The performing right in this
country would be lost, but probably not the copyright.

Dramatic or Musical Work first performed in the British Dominions,
subsequently first published in America.—The performing right in this
country would be secured, but the copyright lost.

Section V.—What is a Musical Composition.

The necessary originality in a musical composition consists either in
a new air or melody, or in the new arrangement and adaptation of
an old air. Thus an arrangement of an opera for the pianoforte is an
original work separate and distinct from the opera itself.[706] So the
adaptation of new words and accompaniment to an old air is a
musical composition entitled to protection.[707] It must always be
remembered, however, that a new arrangement or adaptation will
only be protected quoad its novelty. In so far as the new work is
taken from a non-copyright work, an unauthorised taking of that
part is not an infringement of the new work.

Section VI.—What Musical Works are Protected: Duration of Protection.

As in the case of dramatic works, so in the case of musical
compositions it is submitted that the statutory protection dates from
composition, not from first public performance. Musical compositions
are protected under the same provisions which protect dramatic
works. The protection is therefore identical, except as to the two
amending statutes noticed below which do not apply to dramatic
works. It was contended in one case that the extension of 3 & 4
Will. IV. c. 15 to musical compositions was only applicable to musical
compositions of a dramatic nature.[708] This, however, is not the
case, and all musical compositions are protected.[709]

By the Copyright (Musical Compositions) Act, 1882, the performing
right in musical compositions which have been published in "book"
form is conditional[710] on a notice reserving the performing right,
and printed on every published copy. If the copyright and performing
right are in different hands the owner of the performing right must
give notice in writing to the owner of the copyright, requiring him to
print such notice, and if the latter after due notice fail to do so, he
shall forfeit to the owner of the performing right the sum of £20.

Even if the musical composition is also a dramatic piece or part
thereof, it comes within this requirement as to notice of reservation
on published copies.[711]

Once a musical composition has been printed and published without
notice of reservation, it will probably be impossible to obtain any
protection for the performing right afterwards by publishing copies
with reservation.[712]

A limited reservation is constantly made, and is probably effectual,
e. g. reserving the right to sing in music halls, but permitting public
performances elsewhere without fee or licence.[713]

Section VII.—Registration of Performing Rights.


Section 20 of 5 & 6 Vict. c. 45 enacts that "the provisions
hereinbefore enacted" in respect of registering the copyright in
books shall apply to the liberty of representing or performing any
dramatic piece or musical composition; provided that in the case of a
dramatic piece or musical composition in manuscript it shall be
sufficient to register—

1. The title.
2. The name and place of abode of author or composer.
3. The name and place of abode of the proprietor.
4. The time and place of first representation.

In the case, therefore, of a dramatic piece or musical composition
which has been published as a book, the proper registration in
respect of both copyright and performing right would seem to be
that provided by section 11, viz.:

1. The title.
2. The time of first publication.
3. The name and place of abode of the publisher.[714]
4. The name and place of abode of the proprietor.[715]

This is probably correct, although it may not strictly be in accordance
with the proviso in section 20, viz.: "save and except that the first
public representation or performance of any dramatic piece or
musical composition shall be deemed equivalent in the construction
of this Act to the first publication of any book." If, however, the
provision as to registration in section 11 were strictly construed in
accordance with this proviso, the result is that the proper
registration would be:

1. The title.
2. The time of first representation.
3. The name and place of abode of the person who first
represented it.
4. The name and place of abode of the proprietor.

It is obviously absurd that this should be the form of registration
when the dramatic piece or musical composition has been printed
and published, and that the form in section 20 should be the form of
registration when it is in manuscript. The distinction between the
two forms is meaningless.

Section 24 of 5 & 6 Vict. c. 45, which enacts that no action for
infringement of copyright shall be brought unless the book is
registered, provides "that nothing herein contained shall prejudice
the remedies which the proprietor of the sole liberty of representing
any dramatic piece shall have by virtue of the Act 3 & 4 Will. IV. c.
15, or of this Act, although no entry shall be made in the book of
registry aforesaid."

The provisions as to registration of dramatic pieces are therefore
merely permissive and are in no way a condition precedent either to
the performing right itself or to the right of action upon
infringement;[716] but registration is primâ facie proof of the right of
representation subject to rebuttal by other evidence.[717]

All the provisions as to the keeping of the registry book,[718] making
false entries therein,[719] and motion to expunge,[720] apply equally
to registration of a dramatic piece for the purpose of protecting
performing right as to registration of a book for the purpose of
protecting copyright.[721]

Musical Compositions.—The requisite registration is the same as
for performing rights in dramatic works; but quære whether in the
case of performing right in a musical composition it is not a condition
precedent to action. This doubt is raised by section 24, which
provides that the registration of a book is a condition precedent to
an action for infringement of copyright, and it specially excepts "the
remedies which the proprietor of the sole liberty of representing any
dramatic piece shall have" from the operation of the section. It is
curious that "musical compositions" are omitted from this saving
clause, whereas in nearly every other part of the Act "dramatic piece
and musical compositions" are dealt with together. The arguments
against registration being a condition precedent are, (1) the first part
of section 24 relates only to copyright which does not include
performing right; (2) section 20 does not extend the provisions of
section 24 to performing right, since it only applies the provisions
"before enacted." There is also a suggestion that "dramatic piece" in
the saving clause of section 24 includes "musical composition," since
the definition of "dramatic piece" in section 2 includes "musical or
dramatic entertainment." There is no authority directly in point. In
Russell v. Smith[722] the song called "The Ship on Fire" was
protected without registration, but then it was held to be a "dramatic
piece" and something more than a musical composition. In Clark v.
Bishop[723] the song protected was also held to be a "dramatic
piece." In Lacy v. Rhys,[724] where it was held that in the case of a
dramatic piece there was clearly no obligation to register, Crompton,
J., said that if it had not been for the proviso in section 24, there
would have been a doubt whether registration were not necessary.[725]

In registering an unpublished arrangement of dance music taken
from an opera, the arranger, not the composer of the original opera,
must be entered as composer.[726]

Section VIII.—Assignment of Performing Rights.


The performing right in dramatic pieces and musical compositions
can only be transferred by a written assignment[727] or by entry on
the register.[728] See decisions as to assignment of copyright;[729]
but note that as regards performing right the assignment, even if
before publication or performance, must be in writing.[730] The
performing right will not pass by a mere conveyance of the copyright
in a dramatic or musical work[731] unless an entry shall be made of
such assignment in the register expressing the intention of the
parties that such right should pass.[732] As in the case of copyright,
there is no express enactment that assignment must be in writing;
but it is inferred from the fact that a licence which is a smaller right
cannot be given except by writing.[733] The assignment does not
require to be by deed,[734] and if by written document it is valid
without registration.[735] Section 22 of 5 & 6 Vict. c. 45 appears at
first sight to make registration necessary in every assignment of
performing right, at least if the copyright is assigned with it; but this
is not so. If in the written assignment there is a specific conveyance
of the performing right,[736] or if general words are used such as "all
other the estate, right, title, and interest," showing that something
else than the copyright was intended to be conveyed, the performing
right will pass without registration.[737] Cotton, L. J., in considering
this section, says:

"I incline to think that this enactment was not meant to control
the operation of deeds of assignment, but only to regulate the
effect of entries in the registry book."[738]

In fact it was passed on account of Cumberland v. Planché,[739]
which decided that the assignee of the copyright took the
performing right as well.

If the view is right that the statutory performing right vests
immediately on production,[740] there can be no question of
assignment of common law rights.[741]

Performing rights can probably be partially assigned so as to make a
grantee of provincial rights not only a licensee but an assignee, with
full power to sue alone and re-assign.[742]

Section IX.—Infringement of Dramatic Performing Rights.

By 3 & 4 Will. IV. c. 15, section 1, the author or his assignee has
"the sole liberty of representing, or causing to be represented, at
any place or places of dramatic entertainment whatsoever" in the
British dominions.

Public Performance.—It is no infringement of performing right in
a dramatic work to represent it otherwise than in a place of dramatic
entertainment; but it has been held that any place where a dramatic
work is publicly performed is for the time being a place of dramatic
entertainment. In Lee v. Simpson,[743] Wilde, C. J., says:

"The legislature clearly meant places where dramatic
entertainments are represented to which the public are
admitted."

In Russell v. Smith[744] the Court decided that a certain song, "The
Ship on Fire," was a dramatic piece. Denman, C. J., said:

"It follows that as Crosby Hall was used for the public
representation for profit of a dramatic piece, it became a place
of dramatic entertainment for the time, within the statutes now
in question. The use for the time in question and not for a
former time is the essential fact. As a regular theatre may be a
lecture-room, dining-room, ball-room, and concert-room on
successive days, so a room used ordinarily for either of these
purposes would become for the time being a theatre if used for
the representation of a regular stage play. In this sense, as "The
Ship on Fire" was a dramatic piece, in our view Crosby Hall,
when used for the public representation and performance of it
for profit, became a place of dramatic entertainment. In thus
deciding we do not declare that the defendant's performances
at Crosby Hall were unlawful without a theatrical licence within
Stat. 6 & 7 Vict. c. 68."[745]

In the judgment of Brett, M. R., in Wall v. Taylor[746] there is a
suggestion that although a single item in a programme might be
dramatic, that would not be sufficient to render the whole
entertainment dramatic or to make the place a place of dramatic
performance. In Duck v. Bates[747] the defendant represented a
dramatic piece without the author's consent. The representation
took place in a room of Guy's Hospital, and was provided entirely for
the amusement of the nurses and attendants of the hospital. The
medical officers of the hospital, the students and some of their
friends were present. A reporter to a theatrical newspaper was also
present by invitation. It was held by Brett, M. R., and Bowen, L. J.
(Fry, L. J., dissenting), that the room was not a place of dramatic
entertainment. Neither profit[748] nor habitual use were essential
elements, but there must be a representation to which a portion of
the public is admitted. Brett, M. R., said:

"Did the legislature intend to forbid a representation without the
author's consent by children in a nursery before their parents,
or by grown-up persons in a drawing-room? It is clear that
something more than that must have been intended; and why
should not a representation of that kind be called a dramatic
entertainment? Because it is obviously domestic and private.
Suppose that the servants of the household are invited to
witness the performance; nevertheless it is a domestic
entertainment. As I have already intimated, the author wants
protection for the pecuniary value of his drama, and a
representation in a private room is of no pecuniary value. In
order to entitle the author to penalties there must be a
representation which will injure the author's right to money;
such, for instance, as a representation which, although it is not
for profit, would attract persons who are willing to pay money,
and would induce them not to go and see a performance
licensed by the author. Suppose that a representation in the
presence of friends takes place for the amusement of friends
and of the members of the household in an unfurnished house
hired for the occasion: that is not an infringement of the
statute: the representation must be other than domestic or
private. There must be present a sufficient part of the public
who would go also to a performance licensed by the author as a
commercial transaction; otherwise the place where the drama is
represented will not be a 'place of dramatic entertainment'
within the meaning of the statute. Suppose that a drama is
represented in a county town, and that all persons of a certain
class throughout the county are free to come: suppose that a
member for a parliamentary constituency (I do not mean shortly
before or during an election) organises dramatic entertainments
to which the inhabitants are admitted without paying: suppose
that an amateur company choose to act some drama for a
charitable object, with admission upon payment or by tickets
issued generally: in each of these instances an infringement of
the statute has been committed.... I wish to say, by way of
warning, that those who go beyond the facts of the present
case may incur the penalties of the statute."

This case is most instructive as being quite on the border line
between a private and public representation. Performing right in a
drama may be infringed by a representation without scenery and
appropriate dresses.

"We should take away a part of the protection conferred on
authors if we hold that there could be no public representation
without these accompaniments."[749]

Substantial Part.—As in literary copyright the part taken must be
material and substantial in order to infringe performing right. In
Chatterton v. Cave,[750] Lord Chief Justice Coleridge at the trial
found as a fact "that two scenes or points of the drama of the
defendant had been taken directly from the drama of the plaintiff;"
there was no further copying. He thereupon gave judgment for the
defendant. On a rule for a new trial, Lord Coleridge, sitting in the
Court of Common Pleas, stated orally that what he meant to convey
by his finding was, "that looking to the general character of the
plaintiff's and defendant's dramas, the extent to which the one was
taken from the other was so slight, and the effect upon the total
composition was so small, that there was no substantial and material
taking of any one portion of the defendant's drama from any portion
of the plaintiff's." On this explanation the rule was discharged, and
the judgment subsequently affirmed by the Court of Appeal and the
House of Lords. Lord Hatherley said that the principle de minimis
non curat lex applied to a supposed wrong in taking a part of
dramatic works as well as in reproducing a part of a book. He could
not read the word "part" in the Dramatic Copyright Act as "particle,"
so that the crowing of the cock in "Hamlet," or the introduction of a
line in the dialogue might be held to be an invasion. In Planché v.
Braham,[751] Tindal, C. J., directed the jury that if either one song,
or more than one song be taken from a piece and be performed on
the stage or any place of theatrical entertainment, that would be a
"representing" within the Act of Parliament. The jury, having found
that the defendant had represented "a part of the plaintiff's opera,"
a rule for a new trial was refused.[752] In Beere v. Ellis,[753] two
plays purported to be founded on the same novel. The defendant's
play contained some of the dialogue and several dramatic incidents
and situations taken directly from the plaintiff's play. Baron Pollock
held that a small piece of dialogue would not alone amount to an
infringement, but the defendant had taken two dramatic incidents on
which the plot of the play depended. He had therefore taken a
material part, and although he had done a considerable quantity of
work for himself, he had "extracted the plums" from the plaintiff's
work, and this he was not entitled to do. An indirect taking is, as in
literary copyright, an infringement, e. g. to copy and perform
passages from a play by dramatizing a novel founded on that play.[754]
It is no infringement to produce a play almost identically similar
to that of another author, if this is the result of coincidence and not
of any piracy direct or indirect.[755] As to the taking of a plan or
idea, see the chapter on infringement of literary copyright.[756]
There must be more than the taking of a general idea or scheme.
Lord Blackburn, in Chatterton v. Cave,[757] said:

"An idea may be taken from a drama and used in forming
another without the representation of the second being a
representation of any part of the first. For example, I have no
doubt that Sheridan in composing 'The Critic' took the idea from
'The Rehearsal,' but I think it would be an abuse of language to
say that those who represent 'The Critic' represent 'The
Rehearsal,' or any part thereof, and if it were left to me to find
the fact, I should without hesitation find that they did not. On
the other hand, in composing 'The Trip to Scarborough,'
Sheridan took so much from 'The Relapse,' that if it were left to
me to find the fact, I should find that those who represent 'The
Trip to Scarborough' do represent parts of 'The Relapse.'"

Causing to be Represented.—The "penalty" prescribed by the Act
of 3 & 4 Will. IV. c. 15 is recoverable from those who "represent or
cause to be represented" an unauthorised work. Section 20 of 5 & 6
Vict. c. 45 provides "that the sole liberty of representing, or
performing, or causing or permitting to be represented or
performed, any dramatic piece or musical composition, shall
endure," &c. Notice that this section uses the word "permitting,"
whereas 3 & 4 Will. IV. c. 15 only uses "represent or cause to be
represented." The later statute, however, does not purport to extend
the nature of performing right, and therefore the word "permitting,"
if it have any meaning at all, can only be explanatory of the words
"cause to be represented" in the earlier statute. When then does a
person "cause a dramatic piece to be represented"? Shortly, the
answer probably is, that if he does not actually take part as an actor,
the defendant must be shown to have had some initiation in or
control over the performance. In Parsons v. Chapman,[758] an acting
manager, who paid the performers' salaries, and was entitled to
dismiss them, was held to have caused a dramatic piece to be
represented within the meaning of 10 Geo. II. c. 28, sec. 1. In
Russell v. Briant,[759] the defendant was the landlord of "The Horns"
Tavern, at Kennington. His premises included a large assembly room
which was hired for evening entertainments. The defendant
furnished the platform and the lights, and allowed bills to be put up
in the tavern, and tickets of admission to be advertised to be sold at
the bar. At one entertainment a song, "The Ship on Fire," which in
Russell v. Smith[760] was held to be a copyright dramatic piece, was
sung. It was held that the defendant had not represented or caused
to be represented the dramatic piece in question. Wilde, C. J., said
that no one could be considered as an offender unless by himself or
his agent he actually took part in the representation. In Lyon v.
Knowles[761] the defendant let his theatre. He provided and paid for
the scenery, lights, printing, advertising, band, doorkeepers, scene-
shifters, and supernumeraries. His servants collected the money at
the door, and he retained half the gross profits to recoup himself.
The lessee brought his own company, and represented pieces of his
own choice, the defendant having no control over any person
employed in the representation. It was held that the defendant had
not caused the piece to be represented within the meaning of the
Acts. In Marsh v. Conquest[762] the defendant was the proprietor of
a theatre, and his son, the acting manager, hired it for a "benefit."
The Court held that the defendant came within the statute. Erle, C.
J., delivered the judgment of the Court:

"It appears that the defendant is the proprietor of the Grecian
Theatre, and the employer of the dramatic corps attached
thereto; that his son, the stage manager, hired for his
benefit-night the theatre, together with the company of actors, and
servants, and lights, for the sum of £30; and that the son, in the
defendant's theatre, and with the aid of his actors and
actresses, musicians, servants, lights, and other paraphernalia,
represented the dramatic piece in question, in violation of the
plaintiff's sole and exclusive right of representing or causing it to
be represented. I think the defendant is responsible for that
representation. He was the proprietor of the theatre, and had
entire control over the establishment and all belonging to it, and
what was done by his son was done with his permission."

In Monaghan v. Taylor[763] the defendant was the proprietor of a
music hall, and paid a singer to perform, leaving him his own choice
of songs. The singer sang a copyright song. The Court held that the
defendant came within the statute. This decision would not now
apply to musical performing right, since, by the Musical Copyright
Act of 1888, a proprietor is not liable unless he permits the
performance knowing it to be an infringement. It is still applicable to
dramatic performing rights. Suppose, for instance, the proprietor of
a variety theatre hired the services of a troop of players, telling them
to fill up twenty minutes on the programme with any dramatic scene
they pleased. If they infringed a dramatic copyright, the proprietor
would be liable.

It seems to be doubtful whether if B, acting entirely as the agent of
A, causes C and others to perform a dramatic piece, he can be held
liable if he took no part in the representation. In Parsons v.
Chapman[764] Lord Tenterden, C. J., directed the jury that it was
sufficient if the defendant caused the piece to be performed; and
that it made no difference that he did so as an agent for others. This
was a decision under 10 Geo. II. c. 28, and the principle should be
the same under 3 & 4 Will. IV., and 5 & 6 Vict.; but in French v.
Day[765] Kennedy, J., took a different view. One of the defendants
was the manager of a theatre. He received instructions for the
production of the piece in question from the proprietor, and he could
not engage or dismiss artistes; he was in every respect bound to
conform to his employer's orders. Kennedy, J., said:

"The whole thing was carried on by the proprietor, who merely
used the manager as his mouthpiece. I think I ought not to hold
that a person in his position 'represented,' or 'caused to be
represented,' the piece."

Knowledge.—In an action for infringement of dramatic performing
right it is unnecessary to prove that the defendant knew the
performance was an infringement.[766]

Innocent Agents.—All the actors who take part in an unlawful
performance are within the section as "representing," and are liable
to penalties.[767]

Licence.—It is an infringement of performing right to perform
"without the consent in writing of the author or other proprietor."[768]
See decisions on licence as to copyright in books.[769] The
licence must be in writing,[770] but it does not require to be written
by the proprietor or signed by him or any one else.[771] The
secretary of a dramatic author's society may, if he has authority,
grant a good licence on behalf of the authors.[772] A part owner
cannot grant a licence without the consent of the other part owners.[773]

Section X.—Infringement of Musical Performing Rights.

Substantial Part.—The rule that the taking of a part but not of a
particle is infringement applies equally to musical compositions and
to the performing rights therein. In D'Almaine v. Boosey[774] the
taking of airs from an opera and arranging them as quadrilles and
waltzes was held to be an infringement of the copyright in the
opera. Lord Lyndhurst said:

"Substantially the piracy is when the appropriated music, though
adapted to a different purpose from that of the original, may
still be recognised by the ear."

This test, however, will hardly apply to the piracy of an adaptation
where the air or melody is a non-copyright one. A comparison of the
actual notes and treatment of the phrases would have to be made.

Public Performance.—It has been contended that the protection
afforded by 3 & 4 Will. IV. c. 15 to musical compositions is only an
exclusive right of performance in places of dramatic entertainment.
That is the protection given to dramatic pieces, and it was said that
5 & 6 Vict. c. 45, in applying 3 & 4 Will. IV. c. 15 to musical
compositions did not give them a wider protection than dramatic
pieces had. In Wall v. Taylor[775] the Court held that this view was
wrong. Bowen, L. J., said:

"I think the answer is this, that what is called in the argument a
'condition' of recovering a penalty in sec. 2 of 3 & 4 Will. IV. c.
15 is nothing of the kind, but part of the definition of the
offence upon which the penalty is to be incurred.... The right
granted is the privilege of representing at places of dramatic
entertainment.... Now sec. 20 of 5 & 6 Vict. c. 45 creates a new
right of property as to a musical composition, and gives the
author and his assigns the sole liberty of representing or
performing it. That is the right given, and sec. 21 says that the
person who shall have that right 'shall have and enjoy the
remedies given and provided' in the Act of 3 & 4 Will. IV. c. 15.
Why read into that word 'remedies' that the second section of
that Act is only to be put in force not where there is an
infringement of that right, but where there has been a
representation or performance at a place of dramatic
entertainment."

The view of Cotton, L. J., in the same case was that the remedies of
3 & 4 Will. IV. c. 15 were not applicable unless the musical
composition was performed in a place of dramatic entertainment;
but that in every case of public performance there was a remedy
under 5 & 6 Vict. c. 45 for damages and injunction. Since the
Musical Copyright Act of 1888 the distinction between these opinions
has become immaterial, for in every case in which the performance
is actionable at all the Court may assess the damages as it thinks
proper.

Causing to be Represented.—The offence is representing or
"causing to be represented." As to what the latter includes see page
139, on performing right in dramatic pieces. The liability for "causing
to be represented" differs from that in the case of dramatic pieces in
that since the Copyright (Musical Compositions) Act, 1888, "the
proprietor, tenant, or occupier of any place of dramatic
entertainment or other place at which any unauthorised
representation or performance of any musical composition shall take
place ... shall not by reason of such representation or performance
be liable to any penalty or damages in respect thereof, unless he
shall wilfully cause or permit such unauthorised representation or
performance, knowing it to be unauthorised."[776] In respect of
those who are not proprietors, tenants, or occupiers the liability is
the same as in the infringement of dramatic performing right.

Section XI.—Remedies for Infringement of Dramatic Performing Rights.

An action for—
1. Penalty[777] of 40s. for each performance, or the
defendant's profits, or the actual damage sustained,
whichever be the greater.
2. Injunction.[778]
3. A full and reasonable indemnity as to costs.[779]

Action must be brought within twelve calendar months of the
offence.[780]

Section XII.—Remedies for Infringement of Musical Performing Rights.

An action for—

1. Damages.[781]
2. Injunction.[782]
3. Costs in the discretion of the Court.[783]

Action must be brought within twelve calendar months of the
offence.[784]

CHAPTER VI

COPYRIGHT IN ENGRAVINGS

Section I.—What Works are Protected.

The following works are protected under the Engraving Acts:


1. Every original engraving or print:[785]
2. [Made within the British dominions:][786]
3. First published within the British dominions:[787]
4. Which bears the date of first publication and the
proprietor's name thereon:[788]
5. And is innocent.[789]

The protection endures for twenty-eight years from publication.[790]

The protection is limited to the United Kingdom.[791]

What is an Original Engraving.—By 8 Geo. II. c. 13 (1734)
copyright is given to "every person who shall invent and design,
engrave, etch, or work in mezzotinto or chiaro oscuro, or from his
own works and invention shall cause to be designed and engraved,
etched, or worked in mezzotinto or chiaro oscuro any historical or
other print or prints."

In Blackwell v. Harper[792] (1740) it was decided that the above Act
was not limited to works of invention such as an historical group, but
extended to the "designing or engraving anything that is already in
nature."

In Jefferys v. Baldwin[793] (1753) it was held that prints of herring
fishing-boats were not within the protection of the Act.

By 7 Geo. III. c. 38 (1766), which was passed in consequence
probably of the doubt thrown upon the earlier Act by the above and
other decisions, the copyright in engravings is given to "all and every
person or persons who shall invent or design, engrave, etch, or work
in mezzotinto or chiaro oscuro, or from his own work, design, or
invention shall cause or procure to be designed, engraved, etched,
or worked in mezzotinto or chiaro oscuro any historical print or
prints, or any print or prints of any portrait, conversation, landscape,
or architecture, map, chart, or plan, or any other print or prints
whatsoever," and "to all and every person who shall engrave, etch,
or work in mezzotinto or chiaro oscuro, or cause to be engraved,
etched, or worked any print taken from any picture, drawing, model,
or sculpture either ancient or modern."

Notwithstanding this widely worded protection, doubts arose as to
whether lithographs and certain new processes of reproducing prints
came within the Acts, and in consequence a clause was inserted in
the Copyright Act of 1852[794] whereby it was declared that the
provisions of the Engraving Acts were intended to include prints
taken by lithography or any other mechanical process by which
prints or impressions of drawings or designs are capable of being
multiplied indefinitely.

Prints of every description, therefore, are protected under the
Engraving Acts, and it is immaterial whether the design produced is:

1. The imaginative invention of the maker,
2. Taken from some object in nature, or
3. Taken from some other work of art, such as a picture or
model.

Originality.—The only originality required is an originality in
execution, i. e. the work must not be taken from some other print
and reproduce from that other print those characteristics of
execution wherein the peculiar merit of the engraver's art lies.

"The engraver produces his effects by the management of light
and shade, or as the term of his art expresses it, the chiaro
oscuro. The due degrees of light and shade are produced by
different lines and dots; he who is the engraver must decide on
the choice of the different lines or dots for himself, and on his
choice depends the success of his print. If he copies from
another engraving he may see how the person who engraved
that has produced the desired effect, and so without skill or
attention become a successful rival."[795]

Map, Chart, or Plan.—It will be remembered that maps, charts, and
plans are included under the definition of books in the Copyright Act,
1842,[796] and receive protection as such. Doubt has consequently
been raised as to whether a map must comply with both the
Engraving Acts and the Literary Act in order to obtain protection, or
whether it will be sufficient to comply with the requirements of one
only, and if so, which. The decided cases are unsatisfactory. In
Stannard v. Lee[797] protection was claimed for a "Panoramic Bird's-
eye view of France and Prussia," with the railway and strategic
positions illustrating the Franco-Prussian War of 1870. This was not
registered as a book under the Copyright Act, 1842, and the
objection was held to be fatal. The judges in the Court of Appeal
seemed to be of opinion that the Act of 1842 had taken maps,
charts, and plans out of the protection of the Engraving Acts and
placed them under the protection of the Literary Act, consequently
that the requirements of the latter and not of the Engraving Acts
must be observed. James, L. J., said:

"It was reasonable and proper to take a map out of the class of
artistic copyrights and to give to it the better and more
complete copyright which is intended to be given to literary
works. And there would be, as I have pointed out clearly, great
inconvenience in having two laws of copyright as to two sets of
maps or as to the same set of maps."[798]

Mellish, L. J., said:

"I think it is a perfectly rational enactment that maps shall no
longer be included among works of art but be classed in future
with literary works."[799]

After this case had been decided a petition was brought to the Court
praying that another case, Stannard v. Harrison,[800] in which the
same map had been copied, and to which the defendants had
consented to a decree for injunction and damages, should be
reheard. Bacon, V. C., refused the petition, and indicated in the
course of his judgment that a map not registered as a book might be
protected as an engraving if the claim was properly stated. The
judgment in Stannard v. Lee,[801] he said, had gone on a question of
pleading, the plaintiffs having voluntarily brought their map under
the category of books. This is by no means a satisfactory
explanation of the decision in the Court of Appeal, as it is abundantly
clear from the judgments as reported that in the view of the Lords
Justices the Copyright Act, 1842, took maps, charts, and plans out of
the category of artistic works and placed them in the category of
literary works. Whether this is a correct view is another matter, but
at present it would seem to be law. It is submitted that the true view
probably is that a map may be protected under either Act if the
requisite formalities are observed. The Literary Act will probably give a
wider protection than the Engraving Acts. The Engraving Acts will
protect a map from infringement of the method of execution, that is
to say, the work which is the peculiar work of the engraver; while
the Literary Act will protect it not only from that, but from a piratical
taking of information imparted. Thus suppose a map of India giving
battles and dates and, say, the principal products of the various
districts marked with printed letters on the surface. It is difficult to
see how the taking of all these dates and products and placing them
perhaps printed in different letters on the new map could be an
infringement of the engraving copyright in the map; there is nothing
in the nature of a design or drawing taken; and yet it is quite clear it
will be an infringement under the protection afforded by a literary
copyright, because there is a taking of the particular expression by
which information is imparted.

Engravings in a book are protected by the Copyright Act, 1842, as
part of the book, and, as such, do not require to comply with the
requirements of the Engraving Acts.[802] The protection of a print
forming part of a book is probably a double one, and if it had the
name and date inscribed would be protected without registration or
notwithstanding faulty registration of the book.

Must the Engraving be made within the British Dominions.—
The Act 17 Geo. III. c. 57 giving a remedy by action for damages is
expressly confined to works made in Great Britain. The other two
Acts, 8 Geo. II. c. 13 and 7 Geo. III. c. 38, are not expressly limited
to works there made, but it has been held that the limitation is to be
supplied in them.[803]

"It is plain that the object of the legislature was to protect those
works which were designed, engraved, etched, or worked in
Great Britain, and not those which were designed, engraved,
etched, or worked abroad, and only published in Great Britain."[804]

The Act 6 & 7 Will. IV. c. 59 extends the application of the Engraving
Acts to Ireland, and in section 2 there is a general proviso protecting
"any engraving or print of any description whatever ... which may
have been or which shall hereafter be published in any part of Great
Britain or Ireland." It is quite arguable that this extends the
protection to engravings wherever made if published in the United
Kingdom; but probably it would be held to be only applicable to
engravings made in the United Kingdom, and is merely a proviso
that there will be no copyright until publication in the United
Kingdom. The point, however, is one of great uncertainty.

Under the International Copyright Act, 1886,[805] works first
produced anywhere within the British dominions are protected
equally with those first produced in the United Kingdom. The doubt
still remains whether the engraving must not be made as well as
first published within the British dominions.

The Engraving must be first Published within the British
Dominions.—There is no protection until publication[806] except at
common law. Publication is an act which gives to the public an
opportunity on payment or otherwise of viewing the print. There
may probably be publication without offering copies for sale or
distribution. See as to publication of books[807] and pictures;[808]
but the analogy is not complete with either. There seems to be no
direct authority as to what constitutes publication of an engraving.

Before 1886 the work had to be published in the United Kingdom.[809]
Now first publication anywhere within the British dominions will
be sufficient to secure the copyright.[810]

Date of First Publication and Proprietor's Name.—It is a
condition precedent to protection that there must be truly engraved
on each plate, and printed on every print or prints[811]—

i. The name of the proprietor;
ii. The day of first publication.

This qualification of the engraver's right is only inserted in 8 Geo. II.
c. 13, and not in the subsequent Acts which extend the protection to
works not there included and give remedies not there given. It has
been held, however, that as the Acts are in pari materia they must
be taken together, and the qualification in the first read into the
others.[812]

The proviso as to the name and date is a condition precedent to
protection, and not merely directory.[813] In one case Lord
Hardwicke thought that, although no action for penalties would lie
unless the name and date were correctly published, an injunction
might be granted even although the name and date were not
published at all.[814] He was probably wrong.

Name of Proprietor.—There is some little doubt as to whether this
must be the name of the person who was proprietor at the date of
first publication or at the date on which protection is claimed. In
Thompson v. Symonds[815] Lord Kenyon said:

"The name of the proprietor should appear in order that those
who wish to copy it might know to whom to apply for consent.
It seems, therefore, necessary that the date should remain, but
that the name of the proprietor should be altered as often as
the property is changed."

But Buller, J., in the same case, thought the proprietor always meant
the inventor and first proprietor, notwithstanding the property had
passed to his assignee. The point is certainly doubtful, but the latter
view that the name of the first proprietor only need be on the print
seems the more reasonable, and not contrary to the wording of the
Act. The proprietor need not be described as such on the plate.[816]
If his name is there it is sufficient, even if there is more than one
name and it is uncertain which is the proprietor.[817] The proprietor
need not be described by his full name; his surname is sufficient.[818]
When a partnership firm are proprietors of an engraving the
trading name of the firm is a sufficient designation, inasmuch as it
enables parties to know whom to apply to for information.[819] If a
single proprietor trades under the designation of A. B. & Co. that is a
sufficient designation.[820]

Immoral Works.—There will be no copyright in profane, libellous,
or indecent prints.[821]

Duration of Protection.—The statutory right begins on
publication, and runs for twenty-eight years from the day of first
publishing.[822]

After publication protection will depend entirely on the statute.[823]
Before publication there is a common law right to prevent all
interference with what is a man's private property,[824] and to
protect this the formalities prescribed by the statute need not be
complied with.

Section II.—The Owner of the Copyright.

The Engraver.—The persons to whom the copyright is given by the
Acts are, "Every person and persons who shall invent or design,
engrave, etch, or work in mezzotinto or chiaro oscuro, or from his
own work, design, or invention, shall cause or procure to be
designed, engraved, etched, or worked in mezzotinto or chiaro
oscuro any prints ... and every person who shall engrave, etch, or
work in mezzotinto or chiaro oscuro, or cause to be engraved,
etched, or worked any print taken from any picture."

The engraver, therefore, is the first owner of the copyright when he
does the work on his own behalf, or, if he does it on behalf of
another, executes it entirely from his own work, design, or invention.

The Employer.—When one man employs another to execute an
engraving it would seem that by the Acts[825] the copyright vests ab
initio in the employer:

1. In the case of an engraving taken from another work of
art.
2. In the case of an engraving with an original design, if it is
executed from the employer's own work, design, or
invention.

An employer may be the inventor of a design even although he is
unable to draw, and would himself be unable to execute it. For
instance, in the case[826] of a war map for the Franco-Prussian war
in 1870, it was held that a publisher who had employed an engraver,
giving him material and instructions from time to time was the
inventor, and therefore the first owner in the copyright in the map.
Bacon, V. C., said:

"As to whether the design or invention is that of the plaintiff or
not is a mere matter of character.... The compiler has proved
that it is the design of the plaintiff; that the plaintiff brought to
him his rough sketch or draught, a drawing of the same size as
the stone on which it was to be engraved, pointing out, as the
compiler has said, 'a rough sketch of the forts and towns to give
me an idea; he furnished me also with a large French map, and
some maps published in the Times and Daily Telegraph; he gave
me notice also daily of the earthworks that were made and
produced, besides a picture published in the Illustrated London
News.' That the plaintiff cannot draw himself is a matter wholly
unimportant if he has caused other persons to draw for him. He
invents the subject of the design beyond all question. He
prescribes the proportions and the contents of the design; he
furnishes a part of the materials from which the drawing has to
be made in the first instance, and afterwards collects daily from
the proper sources, and even, if it be necessary to say so, from
official sources, the decrees, the reports, the bulletins and
accounts contained in the newspapers of the different phases of
the war, and especially of the places in which earthworks are
thrown up. These he communicates to the man whom he has
employed to make a drawing for him.... It is clear to my mind
that this is a work of diligence, industry, and for aught I know of
genius on the part of the plaintiff, for the notion never seems to
have occurred to the compiler himself."

If the person employed is the servant of the employer and not an
independent contractor, the whole right in the engraving will
probably, irrespective of the Acts, vest ab initio in the employer.[827]

The Assignee.—In one case[828] it was contended that there could
be no assignment under the Engraving Acts enabling an assignee to
sue in his own name, since these Acts only provide for the licence
and exemption from liabilities of a purchaser.[829] It was held,
however, that there could be an assignment, and that the assignee
could sue in his own name.[830]

As a licence is required to be in writing, signed by the proprietor and
in the presence of two or more credible witnesses,[831] so must the
assignment which passes a greater right.[832]

The sale of plates will not in itself operate as an assignment;[833]
but, if it were clearly intended to pass the whole right, probably it
would pass with the plates without assignment in writing.[834]

Before publication the whole right in the engraving, i.e. the common
law right, may be assigned without writing.[835]

Section III.—Infringement of the Copyright.

Prohibited Acts and Remedies.—It is an offence "for any print-seller
or other person whatsoever"[836]—

1. To engrave, etch, or work, or in any manner copy and sell
the protected work.
2. To print, reprint, or import for sale any pirated copy.
3. Knowingly to publish, sell, or expose for sale, or in any
other manner dispose of any pirated copy.
4. To cause or procure any of these acts to be done.

For any such offence the remedy is an action in the High Court for—

i. Forfeiture of plates and sheets to proprietor for destruction.[837]
ii. Penalty of 5s. for every published copy.[838]
iii. Damages.[839]
iv. Injunction.[840]
v. Inspection and Account.[841]

Further it is an offence—

5. Innocently to publish, sell, or expose for sale, any pirated
copy.[842]
6. To make a copy or copies, whether for sale or not.[843]
7. To cause or procure any of these acts to be done.

For any such offence the remedy is an action in the High Court for—

i. Damages.[844]
ii. Injunction.[845]
iii. Inspection and Account.[846]

Penalties and delivery of plates or copies may also be recovered by
summary proceeding before any two justices having jurisdiction
where the party offending resides.[847]

Guilty Knowledge.—It will be noticed that in order to recover
penalties and forfeiture of copies under 8 Geo. II. c. 13, for the
offence of selling a piratical copy, it must have been committed
knowing the copy to have been produced without consent. In 17
Geo. III. c. 57, however, the offence for which an action for damages