Quantitative Methods in the Humanities
and Social Sciences

Taylor Arnold
Lauren Tilton

Humanities
Data in R
Exploring Networks, Geospatial Data,
Images, and Text
Second Edition
Quantitative Methods in the Humanities
and Social Sciences

Series Editors
Thomas DeFanti, Calit2, University of California San Diego, La Jolla, CA, USA
Anthony Grafton, Princeton University, Princeton, NJ, USA
Thomas E. Levy, Calit2, University of California San Diego, La Jolla, CA, USA
Lev Manovich, The Graduate Center, CUNY, New York, NY, USA
Alyn Rockwood, KAUST, Boulder, CO, USA
Quantitative Methods in the Humanities and Social Sciences is a book series
designed to foster research-based conversation with all parts of the university
campus – from buildings of ivy-covered stone to technologically savvy walls
of glass. Scholarship from international researchers and the esteemed editorial
board represents the far-reaching applications of computational analysis, statistical
models, computer-based programs, and other quantitative methods. Methods are
integrated in a dialogue that is sensitive to the broader context of humanistic study
and social science research. Scholars, including among others historians, archaeologists,
new media specialists, classicists and linguists, promote this interdisciplinary
approach. These texts teach new methodological approaches for contemporary
research. Each volume exposes readers to a particular research method. Researchers
and students then benefit from exposure to subtleties of the larger project or corpus
of work in which the quantitative methods come to fruition.

Editorial Board:
Thomas DeFanti, University of California, San Diego & University of Illinois at
Chicago
Anthony Grafton, Princeton University
Thomas E. Levy, University of California, San Diego
Lev Manovich, The Graduate Center, CUNY
Alyn Rockwood, King Abdullah University of Science and Technology
Publishing Editor for the series at Springer: Faith Su, faith.su@springer.com
Taylor Arnold • Lauren Tilton

Humanities Data in R
Exploring Networks, Geospatial Data,
Images, and Text

Second Edition
Taylor Arnold
University of Richmond
Richmond, VA, USA

Lauren Tilton
University of Richmond
Richmond, VA, USA

ISSN 2199-0956 ISSN 2199-0964 (electronic)


Quantitative Methods in the Humanities and Social Sciences
ISBN 978-3-031-62565-7 ISBN 978-3-031-62566-4 (eBook)
https://doi.org/10.1007/978-3-031-62566-4

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland
AG 2015, 2024
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

If disposing of this product, please recycle the paper.


Preface

Published in 2015, the first edition of this book was written as digital humanities was
fully entering the lexicon of the academy. Debates over ideas such as computation,
digital, and data ensued. Miriam Posner [75] asked what it means to think of sources
as data, or “humanities data,” while Jessica Marie Johnson brought the longer history
of quantification to ask pressing questions about the process and effect of continuing
to turn people into data [47]. Amid
these questions and debates, cultural institutions such as the Library of Congress
made an incredible commitment to digitization and open data, making sources once
only accessible in person available in digital formats that were now amenable to
computational methods. What could be possible with all these sources of data?
We set out to demonstrate how methods from text, spatial, and image analyses
could animate humanities fields by rethinking our sources as data and using
programming, specifically the language R. This was a rather radical move at the
time, when humanities fields were particularly resistant to the idea of thinking of
materials such as books, photographs, and TV as the subject of analysis through
counting and probabilities, much less algorithms and modeling. The field of digital
humanities was pushing against this impulse, particularly led by scholars in digital
history and what we now call computational literary studies. Those interested in
learning how to bring computation and the humanities together were still often on their own. For many,
programming and humanities inquiry still seemed like a contradiction.
Yet to us, as graduate students, one in American Studies (Lauren) and the other in
Statistics (Taylor), bringing together humanities data such as historical photographs
with computational methods such as mapping seemed incredibly powerful. Our
work building photogrammar.org, and the project’s positive reception, demonstrated
the possibilities of layering mapping, text analysis, and image analysis to
further the study of visual culture. Computational methods did not replace the
training of humanities fields but instead fit with the experimentation,
transdisciplinarity, and creativity that American Studies articulated as central to its
project. At the same time, fields such as Statistics were continuing their emphasis
on mathematical theory, often disconnected from many of the realities of working
with actual data and the methodological problems that the messiness of human data
raised. An openness to thinking across these boundaries is a significant reason why
this book exists.
Our advisors Laura Wexler and Jay Emerson, along with our graduate colleague Carol
Chiodo at Yale, fundamentally understood what was possible, supporting us when
others questioned these two, perhaps precocious, graduate students. We eagerly
joined exciting efforts, such as the Programming Historian and the work of Matthew
Jockers and Lev Manovich, to demonstrate how computational methods could be part
of the methodological toolkit of the humanities; we are deeply grateful to Jockers
and Manovich for their support. Rather than being designed for industry or a highly
technical audience, Humanities Data in R filled the need for a book that introduced
computational methods to audiences interested in the sources that serve as primary
evidence for understanding the human experience.
Fast forward almost a decade, and a fair amount has changed. We are now
tenured professors at a flourishing small liberal arts college where interdisciplinarity
is celebrated. We teach digital humanities across the Department of Rhetoric
and Communication, Department of Mathematics and Statistics, and programs
in American Studies and Data Science. At the same time, the rapid ascent of
data science over the past 5 years has mostly silenced debates over whether the
humanities should be involved with data and computation. In fact, many of us
are noting how data and computation have never needed the humanities more.
Humanities scholars should be key interlocutors in interpreting the findings of
computational analyses of humanities data, and they bring important insights into
the ethical and social impact of computational methods. One goal of this book is
to provide the programming and methodological background needed to take part in these
interdisciplinary conversations and debates.
On the computational side, fields such as Statistics are now grappling
with the realities of working with messy data. It was already a decade ago that
Taylor realized that the most complicated data came from sources that animated the
humanities. How does one work with film, for example? The data is multimodal,
defies easy classifications, and breaks computer vision algorithms. To use the
gendered logic that permeates so many discussions of academia, humanities fields
weren’t some soft, squishy area of study that was easier, but rather worked with
the hard, complex sources and data that challenged what was seen as a given in
statistical and computational fields. We co-author because we believe that inter-
and transdisciplinary scholarship is key to the (digital) humanities and data science,
and we have so much to learn from each other. We see this book as part of that
exchange, and we wrote it for anyone who wants to work with humanities data.

Preface to Second Edition

The second edition is a significant revision, with almost every aspect of the text
rewritten in some way. The biggest difference is the incorporation of the set
of R packages commonly known as the tidyverse, consisting at its core of the
packages ggplot2 and dplyr. These packages have grown significantly in stability
and popularity over the past decade. They provide the kinds of functionality that we
wanted to highlight in the first edition of the book, but with less code, and they are
backed by theoretical models of how data processing should work. These
features make them ideal tools for an introduction to R for working
with humanities data.
As before, Part I introduces the R programming language and key concepts for
working with data. Exploratory data analysis (EDA) remains a key concept and
philosophy. EDA is an approach to analyzing and summarizing data in order to identify
patterns (and outliers). It is also a way of knowing that is amenable to the kinds
of questions and heuristics that animate how humanistic fields approach studying
the human experience. Years of teaching have shown us how important an understanding
of data collection is to data analysis, and yet how few resources exist on the topic,
so we have added Chap. 5: Collecting Data and Chap. 12: Data Formats to
address perhaps the most time-consuming part of the work: collecting and organizing data.
Part II of the text is still organized around data types, but we have reordered
the chapters to reflect our approach to data. In this edition, we wanted to show how
one can layer types of analysis using the same data set. Rather than each chapter
introducing a new data set, we build our analysis of Wikipedia data from Chaps. 6
to 8 as we move from text to networks to temporal data. Chapter 8: Temporal
Data is a new chapter, given the importance of time information, particularly if
we want to study change over time. Chapter 9: Spatial Data returns to the data
that was used in Part I to show how we can layer that information with additional
data. Chapter 10: Image Data introduces a new data set of 1940s photographs to
which we apply computer vision. While we are always wary of hype about technological
change, particularly given all the current (generative) AI boosterism, a significant
methodological shift in the last 10 years has been the advance of computer vision,
particularly the ascent of deep learning. The chapter now focuses on several of the
most popular tasks, such as object detection, and on how we can layer them with
additional methods such as networks. The reorganization, additional chapters, and new
data sets are all part of an effort to demonstrate how layering methods can add context
and nuance to our analysis.

Humanities Data

We now return to the term “humanities data.” For us, this means any data used in
analyzing any aspect of human societies and cultures. This is bigger
than any disciplinary or institutional formation. When we are working with the
messiness of human creativity and meaning, we are engaged in a challenging task,
particularly when we want to understand peoples’ beliefs, values, and behaviors,
whether today or in the past. This is inherently a transdisciplinary project that
traverses any walls that we try to build through academic journals, departments,
scholarly associations, and the university itself. Work with humanities data also
happens in industry and beyond. Working with this data carefully, ethically, and
precisely takes collaboration. The book is designed to provide the groundwork for
those who seek to engage with and analyze the data that documents, shapes, and
communicates who we are, where we have been, and the worlds we are building.
No book can do everything, and our orientation is centered on the United
States. The goal of this book is to walk readers through the methods and provide
the code that will give them the resources and confidence to computationally explore
humanities data. Data and methods such as image analysis are the subject of tens of
thousands of articles and books. At the end of each chapter and through our citations,
we offer further reading to start connecting with the wide range of scholarship on
each of these topics. We also do not go directly into all the debates over the
epistemology and ontology of data and statistics themselves; for these, we suggest
starting with Lisa Gitelman’s “Raw Data” is an Oxymoron [36] and Chris Wiggins and
Matthew L. Jones’s How Data Happened: A History from the Age of Reason to
the Age of Algorithms [104]. Along with work by danah boyd, Kate Crawford, Safiya
Noble, and Meredith Broussard, we find Catherine d’Ignazio and Lauren Klein’s
Data Feminism to also be a great place to start when it comes to data ethics and
justice [30].
Zooming out, there is significant domain-specific scholarship to draw on to
see the power of humanities data analysis. There are series and journals such as
Current Research in Digital History, Debates in the Digital Humanities, Digital
Scholarship in the Humanities, Journal of Cultural Analytics, Journal of Open
Source Software, and the new journal Computational Humanities Research along
with digital humanities special issues in journals like American Quarterly, Cinema
Journal, and Digital Humanities Quarterly. There are books like Ted Underwood’s
Distant Horizons [87], Andrew Piper’s Enumerations [73], and our own Distant
Viewing [7] that offer theories of computational methods. There are also
domain-specific works such as Cameron Blevins’ Paper Trails: The US Post and
the Making of the American West [16] and Lincoln Mullen’s America’s Public Bible
[63] that show how computational methods provide key evidence for scholarship in
religious studies, US history, and rhetorical studies. We offer the work above as a
starting point for the rich conversations and debates around humanities data.

Supplementary Materials

We make extensive use of example datasets throughout this text. Particular care was
taken to use data in the public domain, or otherwise freely and openly accessible.
Whenever possible, subsets of larger archives were used instead of smaller one-off
datasets. This approach has a dual benefit: the larger sets are often of
independent interest, and they provide an easy source of additional data for
use in course projects, lectures, and further study. These datasets are available (or
linked to) from the text’s website: http://humanitiesdata.org. Complete code
snippets from the text, further references, and additional links and notes are also
included on that site, which will continue to be updated.

Acknowledgments

For the first edition, it would not have been possible to write this text without
the collaboration and support offered by our many colleagues, friends, and family.
In particular, we would like to thank those who agreed to read and comment on
the early drafts of this text: Carol Chiodo, Jay Emerson, Alex Gil, Jason Heppler,
Matthew Jockers, Mike Kane, Lev Manovich, Laura Wexler, Jeri Wieringa, and two
anonymous readers.
For the second edition, we are deeply appreciative of the University of Richmond,
which has given us the time and resources to pursue this project. We
are grateful to Justin Wigard, who read a complete draft and offered crucial
feedback, and Agnieska Szymanska, who provided guidance in countless ways.
Working with Rob Nelson and the Digital Scholarship Lab (DSL) has been
incredible; their commitment to bringing together digital humanities and social
justice through award-winning projects like Mapping Inequality continues to inspire us.
We are also grateful to our departments (Rhetoric and Communication, and Math
and Statistics) and to Dean Jenny Cavanaugh, whose support, generosity, and
deep commitment to the liberal arts are a model for us all. It is a special place where
the University President takes the time to engage with faculty scholarship. Thank
you, Kevin Hallock, for your time and leadership. And finally, thank you to the awesome UR
students who took our classes, helped us refine our teaching, and shared in the
joys and challenges of working with humanities data.

Taylor Arnold
Lauren Tilton
Richmond, VA, USA
April 2024

Contents

Part I Core
1 Working with Data in R. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Working with R and R Markdown. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.4 Running R Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.5 Functions in R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.6 Loading Data in R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.7 Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.8 Formatting R Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.9 Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2 EDA I: Grammar of Graphics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2 Text Geometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.3 Lines and Bars . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.4 Optional Aesthetics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.5 Scales . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.6 Labels and Themes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.7 Conventions for Graphics Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.8 Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3 EDA II: Organizing Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.2 Choosing Rows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.3 Data and Layers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.4 Selecting Columns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.5 Arranging Rows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.6 Summarize and Group By . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.7 Geometries for Summaries. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.8 Mutate. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.9 Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
xi
xii Contents

4 EDA III: Restructuring Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59


4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.2 Joining by Relation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.3 Mutating and Filtering Joins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4.4 Pivot Longer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.5 Pivot Wider . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.6 Patterns for Table Pivots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
4.7 Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
5 Collecting Data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
5.2 Rectangular Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
5.3 Naming Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
5.4 What Goes in a Cell . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
5.5 Dates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
5.6 Output Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
5.7 Data Dictionary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
5.8 Summary of Data Collection Guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
5.9 Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

Part II Data Types


6 Textual Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
6.2 Working with a Textual Corpus. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
6.3 Natural Language Processing Pipeline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
6.4 Term Frequency-Inverse Document Frequency (TF-IDF). . . . . . . . . . 97
6.5 Document Distance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
6.6 Dimensionality Reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
6.7 Word Relationships . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
6.8 Texts in Other Languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
6.9 Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
7 Network Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
7.2 Creating a Network Object . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
7.3 Centrality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
7.4 Clusters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
7.5 Co-citation Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
7.6 Directed Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
7.7 Distance Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
7.8 Nearest Neighbor Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
7.9 Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
8 Temporal Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
8.2 Temporal Data and Ordering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
Contents xiii

8.3 Date Objects
8.4 Datetime Objects
8.5 Language and Time Zones
8.6 Manipulating Dates and Datetimes
8.7 Window Functions and Range Joins
8.8 Extensions
9 Spatial Data
9.1 Introduction
9.2 Spatial Points
9.3 Polygons
9.4 Spatial Metrics
9.5 Spatial Joins
9.6 Raster Maps
9.7 Extensions
10 Image Data
10.1 Introduction
10.2 Loading Images
10.3 Pixels and Color
10.4 Computer Vision
10.5 Object Detection
10.6 Face Detection
10.7 Pose Detection
10.8 Embeddings
10.9 Extensions

Part III Additional Methods


11 Programming in R
11.1 Introduction
11.2 Vectors
11.3 Data Types and Lists
11.4 Selecting and Modifying Vectors
11.5 Matrices
11.6 Control Flow
11.7 Functional Programming
11.8 Extensions
12 Data Formats
12.1 Introduction
12.2 Strings
12.3 Regular Expressions
12.4 JSON Data
12.5 XML and HTML Formats
12.6 XML Path Language (XPath)
12.7 Building Datasets Through an API
12.8 Extensions

References
Index
Part I
Core
Chapter 1
Working with Data in R

1.1 Introduction

In this book, we focus on tools and techniques for exploratory data analysis (EDA).
Initially described in John Tukey’s classic text of the same name, EDA is a general
approach to examining data through visualizations and broad summary statistics
[19, 85]. It prioritizes studying data directly in order to generate hypotheses and
ascertain general trends prior to, and often in lieu of, formal statistical modeling.
In the roughly 50 years since Tukey’s text appeared, techniques for EDA have enjoyed
great popularity within statistics, computer science, and many other data-driven
fields and professions. The growth in both data volume and complexity has further
increased the need for careful application of these exploratory techniques.
The histories of the R programming language and EDA are deeply entwined.
Concurrent with Tukey’s development of EDA, Rick Becker, John Chambers,
and Allan Wilks of Bell Labs began developing software designed specifically
for statistical computing. By 1980, the “S” language had been released for general
distribution outside Bell Labs. It was followed by a popular series of books and
updates, including “New S” and “S-Plus” [10–12, 21]. In the early 1990s, Ross
Ihaka and Robert Gentleman produced a fully open-source implementation of S
called “R.” The name reflects both that R is the “previous letter in the alphabet” and
that it is the shared initial of the authors’ first names. Their implementation has
become the de facto tool in the field of statistics and is often cited as being among
the 20 most-used programming languages in the world. Without the interactive
console and flexible graphics engine of a language such as R, modern data analysis
techniques would be largely intractable. Conversely, without the tools of EDA, R
would likely still have been a welcome simplification to programming in lower-level
languages but would have played a far less pivotal role in the development of
applied statistics.
The historical context of these two topics underscores the motivation for studying
both concurrently. In addition, we see this book as contributing to efforts to bring
new communities to learn from, and to help shape, data analysis by offering other
fields of study to engage with [4]. It is an attempt to provide an introduction to both
EDA and R for students and scholars in the humanities and the humanistic social
sciences. It also shows how data analysis with humanities data can be a powerful
method for humanistic inquiry. A visual summary of the steps of EDA is shown in
Fig. 1.1. We will see that the core chapters in this text map onto the steps outlined
in the diagram.

Fig. 1.1 Diagram of the process of exploratory data analysis

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
T. Arnold, L. Tilton, Humanities Data in R, Quantitative Methods in the Humanities
and Social Sciences, https://doi.org/10.1007/978-3-031-62566-4_1

1.2 Setup

While it is possible to read this book as a conceptual text, we expect that the majority
of readers will eventually want to follow along with the code and examples given
throughout the text. The first step in doing so is to obtain a working copy of R. The
Comprehensive R Archive Network, known as CRAN, is the official home of the
R language and supplies download instructions for each operating system (e.g.,
macOS, Windows, Linux): http://cran.r-project.org/.
Other download options exist for advanced users, up to and including a custom
build from the source code. Throughout this text, we make no assumptions about
which operating system readers use or how they have obtained or accessed R.
In the rare cases where differences arise from these choices, they will be
explicitly addressed. While one can work from the terminal, we recommend using
an integrated development environment (IDE) to more easily see the code and data.
A piece of open-source software called the RStudio IDE is highly recommended:
https://posit.co/download/rstudio-desktop/. When installed in conjunction
with the R environment, RStudio provides a convenient way of running R
code and seeing the output in a single window. In the next section, we show
screenshots of running R code in RStudio.
In addition to the R software, walking through the examples in this text requires
access to the datasets we explore. Care has been taken to ensure that these are all in
the public domain so that we can easily redistribute them to readers. The materials
and download instructions can be found at https://humanitiesdata.org/. A
complete copy of the code from the book is also provided to make replicating (and
extending) the results as easy as possible.
A major selling point of R is its extensive collection of user-contributed add-ons,
called packages. Details of how to install packages are included in the
supplemental materials. Specifically, the supplemental materials include a document
called setup.Rmd; opening it in RStudio provides instructions for installing all the
packages needed throughout this book. Like R itself, all the packages used here
are free and open-source software, thanks to a robust community dedicated to
developing and expanding R.
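The setup.Rmd file remains the authoritative guide to installation. As a rough, hypothetical sketch of what such a check can look like in base R, the helper below (missing_packages() is our own illustrative name, not part of any package) reports which packages from a list are not yet installed. The install.packages() function is the standard base R installer for CRAN packages; the call is left commented out so that running the block downloads nothing.

```r
# Hypothetical helper: return the names of packages that are not installed.
# requireNamespace() returns FALSE (quietly) when a package cannot be loaded.
missing_packages <- function(pkgs) {
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}

to_install <- missing_packages(c("tidyverse"))
to_install

# Uncomment to install anything that is missing from CRAN:
# if (length(to_install) > 0) install.packages(to_install)
```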
As mentioned in the preface, we make heavy use in this text of a set of R packages
known as the tidyverse. These include ggplot2, readr, dplyr, and tidyr. Loading the
meta-package tidyverse automatically loads these and the other core tidyverse
packages. Another package used in this book is hdir (Humanities Data in R), which
contains a set of wrapper functions created specifically for the text. This package,
like all the others used in this book, is released under an open-source license and
can be reused in other projects.
Learning to program is hard, and questions and issues will invariably arise
in the process (even the most experienced users need help surprisingly often).
As a first source of help, searching for a question or error message online will
often pull up one of the many third-party question-and-answer sites, such as
http://stackoverflow.com/, which are heavily frequented by new and advanced
R users alike. If we cannot find an immediate answer to a question, the next best
step is to find some local, in-person help. While we have done our best with this
static text to explain the concepts for working with R, nothing beats talking to a
real-life person. As a final step, we could post questions directly on third-party
sites. It may take a few days to get a response, but usually someone helpful from
the R community will answer. We invite everyone to participate in the community
by being active on forums, contributing packages, and supporting colleagues and
friends. There are also great groups like R-Ladies (rladies.org) and regional
groups that can provide further connections (see r-community.org).

1.3 Working with R and R Markdown

The supplemental materials for this book include all the data and code needed to
replicate the analyses and visualizations in this book, using exactly the same code
that is printed in the text. We have stored this code in the R Markdown file format,
which has an .Rmd extension, with one file corresponding to each chapter. The R
Markdown file format is a great choice for data analysis because it allows us to mix
code and descriptions within the same file [51]. In fact, we even wrote the text of
this book in the R Markdown format before converting it into LaTeX for printing.
The RStudio environment offers a convenient format for viewing and editing R
Markdown files. If we open an R Markdown file in RStudio, we should see a window
similar to the one shown in Fig. 1.2. We made this image on a recent version of
macOS; the specific view may be slightly different on Windows and may change
slightly depending on the screen size and the version of RStudio being used. On the
left is the file itself. Some output and other helpful bits of information are shown on
the right. There is also a Console window, which we generally will not need. We
have minimized it in the graphic, as we often do when working on a smaller screen.

Fig. 1.2 Default view of an R Markdown file in RStudio shown in a recent version of macOS
Looking at the R Markdown file, notice that some parts of the file are on a white
background and other parts are on a gray background. The white parts correspond
to text and the gray parts to code. To run the code and see its output, click on the
green triangle (play) button in the upper-right corner of each gray code block. When
we run code that reads or creates a dataset, the data will be listed in the Environment
tab on the upper-right-hand side of RStudio. Finally, clicking on the dataset’s name
there opens a spreadsheet view that we can use to understand the structure of our
data and to see all the columns that are available for analysis.
As with any digital file, it is a good idea to save the notebook frequently. Keep in
mind, however, that only the text and code are saved. The results (plots, tables, and
other output) are not automatically stored. While counterintuitive at first, this is a
helpful feature because the code is much smaller than the results; saving only the
code keeps file sizes small and tidy. If we would like to save the results in a way
that can be shared with others, we need to knit the file by clicking on the Knit button
(it has a ball-of-yarn icon) at the top of the notebook. After running all the code
from scratch, knitting produces an HTML version of our script that we can open in a
web browser.

1.4 Running R Code

Now, let’s see some examples of how to run R code. In this book, we will show
snippets of R code and their output rather than a screenshot of the entire RStudio
session. However, each snippet should be thought of as occurring inside one of
the gray code blocks in an R Markdown file. In one of its most basic forms, R can
be used as a fancy calculator. We can add 1 and 1 by typing 1 + 1 into a code
chunk of an R Markdown file. Hitting the run button will display the output (2)
below it. An example in RStudio is shown in Fig. 1.2. In the book, we will write
the R code inside a black box. Any output will be shown below it, with each line
preceded by two hash marks (##). An example is given below.

1 + 1

## [1] 2

We will often see numbers in the output surrounded by square brackets, such as the
[1] in the output above. These are a common cause of confusion and worry for
new users of R, but they simply count the values in the output. In the example
above, the [1] indicates that the value 2 is the first value in the output of our code.
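A short illustration of why even a single value carries the label [1]: R stores a lone number as a vector of length one, and the base function length() reports how many values a result holds.

```r
# Even a single number is a vector of length one, which is why the
# printed result begins with the index [1].
length(1 + 1)
## [1] 1
```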
In addition to just returning a value, running R code can also store values by
creating new objects within R. Objects in R are used to store anything (such as
numbers, datasets, functions, or models) that we want to use again later. Each
object has a name that we can use to access it in future code. To create an object,
we use the <- (arrow) symbol, with the name on the left-hand side of the arrow and
the code that produces the object on the right-hand side. For example, we can create
a new object called mynum with a value of 8 by running the following code.

mynum <- 3 + 5

Notice that this code did not print any results because the result was saved as
a new object. We can now use our new object mynum exactly the same way that we
would use the number 8. For example, we can add 1 to it to get the number 9:

mynum + 1

## [1] 9
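Because the right-hand side of an assignment is evaluated before the name on the left is updated, an object can also be used to update itself. This is a small sketch continuing the mynum example; assigning to an existing name simply replaces the value that was stored there.

```r
mynum <- 3 + 5
# Assigning to an existing name replaces its value. The right-hand
# side is evaluated first, so it still sees the old value (8).
mynum <- mynum + 1
mynum
## [1] 9
```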

Object names should start with a letter and can contain letters, numbers,
underscores, and periods. We recommend using only lowercase letters and
underscores. That makes it easier to read the code later on without needing to
remember if and where we used capital letters.
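One way to check whether a candidate name is allowed is the base R function make.names(), which returns a name unchanged when it is already syntactically valid and modifies it otherwise. The helper is_valid_name() below is our own illustrative wrapper, not a standard function.

```r
# Hypothetical helper: make.names() leaves an already valid name unchanged
# and modifies an invalid one, so comparing the two tells us which is which.
is_valid_name <- function(x) make.names(x) == x

is_valid_name("median_age")  # lowercase letters and an underscore
## [1] TRUE
is_valid_name("2nd_value")   # starts with a number
## [1] FALSE
is_valid_name("median age")  # contains a space
## [1] FALSE
```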

1.5 Functions in R

A function in R is something that takes a set of input values and returns an output
value. Generally, a function will have a format similar to that given in the code here:

function_name(arg1 = input1, arg2 = input2)

Here, arg1 and arg2 are the names of the inputs to the function (these are fixed by
the function), and input1 and input2 are the values that we assign to them. The
number of arguments is not always two, however. There may be any number of
arguments, including zero. There may also be optional arguments with default
values that can be modified. Let us look at an example function: seq. This function
returns a sequence of numbers. We can give the function two input arguments: the
starting point from and the ending point to.

seq(from = 1, to = 100)

## [1] 1 2 3 4 5 6 7 8 9 10 11 12
## [13] 13 14 15 16 17 18 19 20 21 22 23 24
## [25] 25 26 27 28 29 30 31 32 33 34 35 36
## [37] 37 38 39 40 41 42 43 44 45 46 47 48
## [49] 49 50 51 52 53 54 55 56 57 58 59 60
## [61] 61 62 63 64 65 66 67 68 69 70 71 72
## [73] 73 74 75 76 77 78 79 80 81 82 83 84
## [85] 85 86 87 88 89 90 91 92 93 94 95 96
## [97] 97 98 99 100

The function returns a sequence of numbers starting at 1 and ending at 100
in increments of 1. Here, we see the benefit of the square brackets in the output:
the [13] at the start of the second line indicates that the second line starts with the
13th value of the output. In addition to specifying arguments by name, we can also
pass arguments by position. When doing so, we need to know and follow the default
ordering of the arguments. Below is another, equivalent way to produce a sequence
of integers from 1 to 100, this time without the argument names. (For the sake of
saving space, we will sometimes not display the output of our code, as is the case
here.)

seq(1, 100)
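Arguments given by name are matched by name rather than by position, so named arguments can appear in any order. This small check, using only seq() and the base function identical(), confirms that the two styles produce the same result.

```r
# Named arguments are matched by name, so their order does not matter;
# unnamed arguments are matched by position (from, then to).
by_name <- seq(to = 100, from = 1)
by_position <- seq(1, 100)
identical(by_name, by_position)
## [1] TRUE
```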

How did we know the inputs to each function and what they do? In this text, we
will explain the names and usage of the required inputs to new functions as they
are introduced. To learn more about all of the possible inputs to a function, we can
look at the function’s documentation. For packages to be on CRAN, they must
include information about each of the inputs to a function and the values that are
returned. To see the documentation, we can run a line of code that starts with a
question mark followed by the name of the function, as in the example below. In
RStudio, the documentation will then show up in the Help tab in the lower-right
corner of the IDE. An example of the page is shown in Fig. 1.3.

Fig. 1.3 Example documentation page for the function “seq”

?seq

As shown in the documentation page, there is also an optional argument, called by,
that controls the spacing between consecutive numbers. By default, the by argument
is equal to 1, but we can change it to space the values at different intervals. For
example, below are the numbers from 1 to 10 in steps of one half.

seq(from = 1, to = 10, by = 0.5)

## [1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5
## [11] 6.0 6.5 7.0 7.5 8.0 8.5 9.0 9.5 10.0
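To see how the by argument determines the number of values, we can count them with length(). Going from 1 to 10 covers a distance of 9, which takes 18 steps of 0.5; including both endpoints, that gives 19 values.

```r
# With by = 0.5, seq() steps from 1 to 10 in half-unit increments,
# giving (10 - 1) / 0.5 + 1 = 19 values.
halves <- seq(from = 1, to = 10, by = 0.5)
length(halves)
## [1] 19
```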

We will learn how to use numerous functions in the coming chapters, each of which
will help us in exploring and understanding data. In order to do this, we need to first
load our data into R, which we will show in the next section.

1.6 Loading Data in R

In this book, we will be working with data that is stored in a tabular format.
Figure 1.4 shows an example of a tabular dataset consisting of information about
metropolitan regions in the United States supplied by the US Census Bureau.
These regions are called core-based statistical areas (CBSAs). In Fig. 1.4, we
have ten rows and five columns. Each row of the dataset represents a particular
metropolitan region, and we call each row an observation. The columns in a
tabular dataset represent the measurements that we record for each observation.
These measurements are called variables.

Fig. 1.4 Example of a tabular dataset



In our example dataset, we have five variables, which record the name of the
region, the quadrant of the country in which the region lies, the population of the
region in millions of people, the density given in tens of thousands of people per
square kilometer, and the median age of all people living in the region. More details
are given in the following section.
A larger version of this dataset, with more regions and variables, is included
in the book’s supplemental materials as a comma-separated values (CSV) file. We
will make extensive use of this dataset in the following chapters as a common
example for creating visualizations and performing data manipulation. To read
in the dataset, we use the function read_csv from the readr package [100]. To
make the functions from readr available, we need to run the line of code
library(tidyverse). As mentioned above, tidyverse automatically loads several
packages at once that we will use throughout this book. In each chapter, we will
assume that this package has already been loaded without including the explicit
library command. All other packages will be loaded once per chapter as needed.

library(tidyverse)

We call this function with the path to the file, relative to the location of the
script. If we are running the R Markdown notebooks from the supplemental
materials, the data file is called acs_cbsa.csv and is stored in a folder called
data. The following code loads the CBSA dataset into R, saves it as an object
called cbsa, and prints out the first several rows. The output dataset is stored as a
type of R object called a tibble.

cbsa <- read_csv(file.path("data", "acs_cbsa.csv"))


cbsa

## # A tibble: 934 x 13
##    name         geoid quad     lon   lat   pop density
##    <chr>        <dbl> <chr>  <dbl> <dbl> <dbl>   <dbl>
##  1 New York     35620 NE     -74.1  40.8 20.0    1051.
##  2 Los Angeles  31080 W     -118.   34.2 13.2    1041.
##  3 Chicago      16980 NC     -88.0  41.7  9.61    509.
##  4 Dallas       19100 S      -97.0  32.8  7.54    323.
##  5 Houston      26420 S      -95.4  29.8  7.05    317.
##  6 Washington   47900 S      -77.5  38.8  6.33    364.
##  7 Philadelphia 37980 NE     -75.3  39.9  6.22    506.
##  8 Miami        33100 S      -80.5  26.2  6.11    430.
##  9 Atlanta      12060 S      -84.4  33.7  6.03    263.
## 10 Boston       14460 NE     -71.1  42.6  4.91    518.
## # 924 more rows
## # 6 more variables: age_median <dbl>,
## #   hh_income_median <dbl>, percent_own <dbl>,
## #   rent_1br_median <dbl>, rent_perc_income <dbl>,
## #   division <chr>
12 1 Working with Data in R

Notice that the display shows a total of 934 rows and 13 columns. Or, in the
terms defined above, there are 934 observations and 13 variables. Only the first ten
observations and seven variables are shown in the output. At the bottom, the names
of the additional variables are given. As described above, if we run this code in
RStudio, we can view a full tabular version of the tibble by clicking on the dataset
name in the Environment tab.
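The row and column counts reported in the header of the printout can also be retrieved directly with the base functions nrow(), ncol(), and dim(). Applied to the full cbsa object, we would expect these to report the 934 rows and 13 columns shown above. To keep this sketch self-contained, we instead build a small data frame (cbsa_small, a name chosen for illustration) from three rows of that printout rather than reading the CSV file.

```r
# A small data frame built from the first three rows printed above,
# used here only to illustrate nrow(), ncol(), and dim().
cbsa_small <- data.frame(
  name = c("New York", "Los Angeles", "Chicago"),
  quad = c("NE", "W", "NC"),
  pop = c(20.0, 13.2, 9.61)
)

nrow(cbsa_small)  # number of observations (rows)
## [1] 3
ncol(cbsa_small)  # number of variables (columns)
## [1] 3
dim(cbsa_small)   # both at once: rows, then columns
## [1] 3 3
```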
The abbreviations in angle brackets below the variable names tell us the type
of data stored in each column. The abbreviation <chr>, which is seen below name,
quad (quadrant), and division, indicates that these columns contain character
data. Character data can consist of any sequence of letters, numbers, spaces, and
punctuation marks. Character variables are often used to represent fixed categories,
such as the quadrant and division of each CBSA region. They can also provide
unique identifiers and descriptions for each row, such as the name of the CBSA
region in our example. Values in a character vector are commonly called strings
throughout R documentation; we follow this convention and use “string” as a
synonym for a character value.
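As a small illustration outside of any dataset, a string is written in quotes, and the base functions typeof() and is.character() confirm that R stores it as character data. The object name region_name is chosen only for this example.

```r
# Character values (strings) are written in quotes and may mix letters,
# numbers, spaces, and punctuation.
region_name <- "New York"
typeof(region_name)
## [1] "character"
is.character(region_name)
## [1] TRUE
```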
The other abbreviation we see in the tibble from the CBSA data is <dbl>, which
indicates that a column contains numeric data. The abbreviation stands for double,
a historical designation of numeric data indicating how much computer memory is
needed to store a single value. While not seen in this example, the abbreviation
<int> is used to indicate that a column contains integer values (i.e., whole numbers).
There are few practical differences between doubles and integers when working
with R code; we will refer to any variable of either type as numeric data.
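To see the distinction in practice, the base function typeof() reports how a value is stored. Numbers typed directly (even whole numbers such as 20) are stored as doubles, while an L suffix creates an integer; as the paragraph above notes, the two behave alike in everyday arithmetic.

```r
# Numbers typed directly are stored as doubles; an L suffix
# produces an integer instead.
typeof(20.0)
## [1] "double"
typeof(20L)
## [1] "integer"

# Arithmetic treats the two types interchangeably, and both count as numeric.
20.0 + 20L
## [1] 40
is.numeric(20.0) && is.numeric(20L)
## [1] TRUE
```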
Knowing the type of data in each column is important because, as we will see
throughout the book, it affects the kinds of visualizations and analyses that can
be applied. The data types in the tibble are automatically determined by the
read_csv function. An optional argument, col_types, can be set to specify
alternative types, or we can modify data types after the tibble has been created using
the techniques shown in Chap. 3. The character and numeric data types are by far
the most common. Other possible types are explored in Chap. 8 (dates and times),
Chap. 9 (spatial variables), and Chap. 11 (lists and logical values).
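As a minimal sketch of the col_types argument, assuming the readr package is installed, the code below writes three rows copied from the CBSA printout to a temporary file and asks read_csv to read geoid as character data while guessing the other columns. Treating geoid as text is our own illustrative choice (it is an identifier rather than a quantity), not necessarily how the book’s dataset is read.

```r
library(readr)

# Write a tiny CSV (three rows copied from the CBSA printout above)
# to a temporary file so the example is self-contained.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "name,geoid,quad,pop",
  "New York,35620,NE,20.0",
  "Los Angeles,31080,W,13.2",
  "Chicago,16980,NC,9.61"
), path)

# Override the guessed type for geoid only; cols() guesses the
# remaining columns as read_csv() normally would.
cbsa_tiny <- read_csv(path, col_types = cols(geoid = col_character()))
cbsa_tiny
```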

1.7 Datasets

Throughout this book, we will use multiple datasets to illustrate different concepts
and show how each approach can be used across multiple application domains. We
draw on data that animates humanities inquiry in areas such as American Studies,
history, literary studies, and visual culture studies. Although we briefly reintroduce
each dataset when it first appears, this section offers a somewhat more detailed
description of the main datasets for readers making their way selectively through
the text.

To introduce the concept of EDA, we will make sustained use of the CBSA
dataset in Chaps. 2–5 to demonstrate new concepts in data visualization and
manipulation. As described above, the data comes from an annual survey conducted
by the US Census Bureau called the American Community Survey (ACS). The
survey collects data from a sample of 3.5 million households in the United States.
Outside of the constitutionally mandated decennial census, this is the largest survey
conducted by the Census Bureau. It asks several dozen questions covering topics
such as gender, race, income, housing, education, and transportation. Aggregated
data are released on a regular schedule, with summaries over one-, three-, and
five-year periods. Our data comes from the five-year summary in the most recent
version (2021) published at the time of writing. We selected a small set of
measurements that we felt did not require extensive background knowledge while
still capturing variation across the country. As seen in the table above, we have
selected the median age, the median household income (USD), the percentage of
households that own their housing, the median rent for a one-bedroom apartment
(USD), and the median rent as a percentage of household income.
The American Community Survey aggregates data to a variety of different
geographic regions. Most regions correspond to political boundaries, such as states,
counties, and cities. One particularly interesting type of geographic region is the
core-based statistical area (CBSA). These regions, of which there are nearly a
thousand, are defined by the US Office of Management and Budget. The
documentation defines a CBSA as “an area containing a large population nucleus and
adjacent communities that have a high degree of integration with that nucleus.” We
chose these regions for our dataset because their social, rather than political,
definition makes them particularly well suited for humanities research questions.
Our dataset includes a short, common name for each CBSA, a unique identifier
(geoid), and several geographic categorizations derived from spatial data provided
by the Census Bureau. All of the code to produce this dataset, using the tidycensus
package within R, is included in the book’s supplementary materials [91].
The core chapters of the book also make use of a dataset illustrating the relative
change in the price of various food items over more than 140 years in the United
States. This collection, which we use as published, was created by David S. Jacks
for his article “From boom to bust: a typology of real commodity prices in the long
run” [44]. The data is organized with one observation per year and variables
capturing the relative price of each of thirteen food commodities. We can read this
dataset into R using the same function that we used for the CBSA dataset, as shown
below.

food_prices <- read_csv(file.path("data", "food_prices.csv"))


food_prices

## # A tibble: 146 x 14
##     year   tea sugar peanuts coffee cocoa wheat   rye
##    <dbl> <dbl> <dbl>   <dbl>  <dbl> <dbl> <dbl> <dbl>
##  1  1870  129.  151.    203.   88.1  78.8  88.1 103.
##  2  1871  132.  167.    222.  109.   66.7 118.  105.
##  3  1872  134.  162.    189.  140.   71.6 122.  102.
##  4  1873  136.  154.    179.  173.   65.8 116.  106.
##  5  1874  146.  153.    231.  187.   69.9 113.  126.
##  6  1875  149.  150.    197.  176.   69.4 110.  116.
##  7  1876  150.  160.    172.  184.   80.7 114.  106.
##  8  1877  149.  189.    153.  198.   87.8 144.   97.0
##  9  1878  150.  165.    160.  169.   96.0 115.   91.6
## 10  1879  144.  158.    133.  149.  108.  118.  113.
## # 136 more rows
## # 6 more variables: rice <dbl>, corn <dbl>,
## #   barley <dbl>, pork <dbl>, beef <dbl>, lamb <dbl>

All of the prices are given on a relative scale where 100 is equal to the price in 1900.
We will use this dataset to show how to build data visualizations that show change
over time. It will also be useful for our study of table pivots in Chap. 5.
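As a worked sketch of what this relative scale means, the code below uses two rounded tea prices copied from the printout above. A value of 129 means the 1870 price was about 1.29 times the 1900 price. Rebasing the series to a different reference year (here 1870, an arbitrary choice for illustration) is a standard index-number operation, not necessarily one the book performs: divide by the value in the new base year and multiply by 100. Ratios between years do not depend on which year is the base.

```r
# Two rounded values from the tea column printed above, where the
# original scale sets the 1900 price equal to 100.
tea <- c(`1870` = 129, `1879` = 144)

# Divide by 100 to express each value as a multiple of the 1900 price.
tea / 100

# Rebasing: divide by the value for the new base year and multiply by 100,
# so that 1870 = 100 on the new scale.
tea_rebased <- tea / tea["1870"] * 100
round(tea_rebased, 1)
```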
Part II turns to data types. The first three application chapters focus on text
analysis, network analysis, and temporal analysis, respectively. While these three
chapters introduce different methods, we will make use of a consistent core dataset
across all three that we have created from Wikipedia. Specifically, we have a
dataset consisting of the text, links, page views, and change histories of a set of
75 Wikipedia pages about a sample of British authors. These data are contained
in several different tables, each of which will be introduced as needed. The main
metadata for the set of 75 pages is loaded by the following code.

meta <- read_csv(file.path("data", "wiki_uk_meta.csv.gz"))


meta

## # A tibble: 75 x 7
##    doc_id            born  died era   gender link  short
##    <chr>            <dbl> <dbl> <chr> <chr>  <chr> <chr>
##  1 Marie de France   1160  1215 Early female Mari  Mari
##  2 Geoffrey Chaucer  1343  1400 Early male   Geof  Chau
##  3 John Gower        1330  1408 Early male   John  Gower
##  4 William Langland  1332  1386 Early male   Will  Lang
##  5 Margery Kempe     1373  1438 Early female Marg  Kempe
##  6 Thomas Malory     1405  1471 Early male   Thom  Malo
##  7 Thomas More       1478  1535 Sixt  male   Thom  More
##  8 Edmund Spenser    1552  1599 Sixt  male   Edmu  Spen
##  9 Walter Raleigh    1552  1618 Sixt  male   Walt  Rale
## 10 Philip Sidney     1554  1586 Sixt  male   Phil  Sidn
## # 65 more rows

We decided to use Wikipedia data because it is freely available and can easily be
generated in the same format for other collections of pages on nearly any other
topic of interest. Wikipedia is also helpful because it allows us to look
at pages in other languages, which will allow us to demonstrate how to extend our
techniques to texts that are not in English. Finally, we will return to the Wikipedia
data in Chap. 12 to demonstrate how to build a dataset (specifically, this one) by
calling an API from within R using the httr package [95].

Several other datasets appear in the book, each used within a single chapter.
For example, Chap. 9 on spatial data makes use of a dataset showing the location
of French cities and Parisian metro stops in our study of geographic
data. Chapter 10 on image data uses a collection of documentary photographs and
associated metadata in our analysis of images. As these datasets are used only in one
part of the book, we will describe them in more detail when they first appear.

1.8 Formatting R Code

It is very important to format R code properly and consistently. Even though
poorly formatted code may run without errors and produce the desired results,
keeping the code well formatted will make it easier to read and debug. We follow
these guidelines throughout this book:
1. One space before and after an equals sign or assignment arrow.
2. One space after a comma, but no space before a comma.
3. One space around mathematical operations (such as + and *).
4. If a line of code becomes too long, split the arguments to a function across
separate lines, indenting them two additional spaces.
We have found that following these rules from the start, and whenever we write
R code, makes our lives a lot easier.
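Here is a small base-R sketch that makes the four rules concrete; the variable names are hypothetical. Both versions compute the same value, but only the second follows the guidelines. The call to `round()` is split across lines purely to illustrate rule 4.

```r
# Runs fine, but breaks every rule: no spaces around <-, =, * or +,
# and no space after commas
v<-c(3,1,4,1,5);m1<-round(mean(v*2+1),digits=2)

# The same computation following the four formatting rules
values <- c(3, 1, 4, 1, 5)
m2 <- round(
  mean(values * 2 + 1),
  digits = 2
)

identical(m1, m2)
```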

1.9 Extensions

Each chapter in this book contains a short, concluding section of extensions on the
main material. These include references for further study, additional R packages,
and other suggested methods that may be of interest to the study of each specific
type of humanities data.
In this chapter, we will mention a few standard R references that might be useful
to use in parallel or in sequence with our text. The classic introduction to the core R
language is An Introduction to R by William Venables and David Smith [89]. It
is freely available on CRAN, the same website that hosts the R language itself.
The content is rather terse to read linearly, but it serves as a great reference
for anyone coming from another programming language who wants to learn how to
do lower-level programming tasks. We briefly cover some of this material in Chap. 12,
but in far less detail.
For the higher-level version of R that we are using in the second edition of this
book, the standard reference is Wickham, Çetinkaya-Rundel, and Grolemund’s R
for Data Science [97]. This open-access book roughly follows the same material
covered in the first and third parts of our text. It introduces far more extensions and
often exhaustively explains all of the optional arguments to new functions. It is a
great reference text once the basics are in place, and it can work well as a primary
text in a classroom, where an instructor can provide more motivation and context
for each technique. It does not have any material for modeling textual, network,
spatial, or image data.
When working through the code in this book’s supplemental materials, as
mentioned above, we will need to run code using the R Markdown format. More
information about the format and what can be done with it can be found in R
Markdown: The Definitive Guide [109]. The philosophy behind the format can be
found in the corresponding research on reproducible research pipelines
[107, 108]. Recently, Quarto, a new extension of the R Markdown format, has
quickly gained popularity [74]. It is almost fully backward compatible with R
Markdown while extending its functionality to allow mixing in other programming
languages.
Chapter 2
EDA I: Grammar of Graphics

2.1 Introduction

As we outlined in Chap. 1, the concept of exploratory data analysis (EDA) is key
to our approach. As a result, data visualization is one of our most important and
powerful tools for the analysis of data. We start our study of exploratory data
analysis with visualization because it offers the most immediate demonstration of
how statistical programming can help us understand datasets of any size.
Visualizations also benefit those new to programming because it is relatively easy
to verify that our code is working: we can just look at the output and see if the
resulting plot is what we expected. Finally, data visualizations can be useful for
even very small collections of data.

In this chapter, we will learn and use the ggplot2 package for building informative
graphics [94, 106]. The package makes it easy to build fairly complex graphics
in a way that is guided by a general theory of data visualization. The only downside
is that, because it is built around a theoretical model rather than many one-off
solutions for different tasks, it has a somewhat steeper initial learning curve. The
chapter is designed to get us started using the package to make a variety of different
data visualizations.
The core idea of the grammar of graphics is that visualizations are composed
of independent layers. The term “grammar” is used to describe visualizations
because the theory builds connections from elements of the dataset to elements
of a visualization. It builds up complex elements from smaller ones, much like a
grammar provides relations between words in order to generate larger phrases and
sentences. To describe a specific layer, we need to specify several elements. First, we
need to specify the dataset from which data will be taken to construct the plot. Next,
we have to specify a set of mappings called aesthetics that describe how elements
of the plot are related to columns in our data. For example, we often indicate which
column corresponds to the horizontal axis of the plot and which one corresponds to
the vertical axis of the plot. It is also possible to describe elements such as color,
shape, and size of elements of the plot by associating these quantities with columns
in the data. Finally, we need to provide the geometry that will be used in the plot.
The geometry describes the kinds of objects that are associated with each row of
the data. A common example is the points geometry, which associates a single point
with each observation.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
T. Arnold, L. Tilton, Humanities Data in R, Quantitative Methods in the Humanities
and Social Sciences, https://doi.org/10.1007/978-3-031-62566-4_2
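The three components can be seen in a minimal sketch. It assumes ggplot2 is installed and uses a hypothetical four-row data frame in place of a real dataset. ggplot2's `layer_data()` function builds the plot and returns the data computed for one layer. Here it confirms that the points geometry produces one point per row of the data.

```r
library(ggplot2)

# Hypothetical toy data standing in for a real dataset (values are made up)
toy <- data.frame(
  density = c(100, 250, 400, 900),
  rent = c(900, 1100, 1000, 1500)
)

# Dataset (toy) + aesthetics (aes) + geometry (geom_point) = one layer
p <- toy |>
  ggplot() +
  geom_point(aes(x = density, y = rent))

# layer_data() returns the computed data for the first layer:
# one row per point, which here means one row per observation in toy
pts <- layer_data(p)
pts[, c("x", "y")]
```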
We can show how to use the grammar of graphics by starting with the CBSA
data that we introduced in the previous chapter, where each row is associated with
a particular metropolitan region in the United States. The first plot we will make is
a scatterplot that investigates the relationship between the median rent for a
one-bedroom apartment and the population density of the metropolitan region. In the
language of the grammar of graphics, we can start to describe this visualization by
providing the name of the dataset in R (cbsa). Next, we associate the horizontal
axis (called the x aesthetic) with the column in the data named density. The
vertical axis (the y aesthetic) can similarly be associated with the column named
rent_1br_median. We will make a scatterplot, with each point on the plot
describing one of our metropolitan regions, which leads us to use a point geometry.
Our plot will allow us to understand the relationship between city density and rental
prices.
In R, we need to use some special functions to indicate all of this information
and to instruct the program to produce a plot. We start by indicating the name of the
underlying dataset and piping it into a special function called ggplot that indicates
that we want to create a data visualization. The plot itself is created by adding—
literally, with the plus sign—the function geom_point. This function indicates that
we want to add a points geometry to the plot. Inside of the geometry function, we
apply the function aes (short for aesthetics), which indicates that we want to specify
the mappings between components of the plot and column names in our dataset.
Code to write this using the values described in the previous paragraph is given
below. A breakdown of the role of each component is detailed in Fig. 2.1.

cbsa |>
  ggplot() +
  geom_point(aes(x = density, y = rent_1br_median))

select(cbsa, name, quad, density, rent_1br_median)

## # A tibble: 30 x 4
##    name         quad  density rent_1br_median
##    <chr>        <chr>   <dbl>           <dbl>
##  1 New York     NE      1051.            1430
##  2 Los Angeles  W       1041.            1468
##  3 Chicago      NC       509.            1060
##  4 Dallas       S        323.            1106
##  5 Houston      S        317.             997
##  6 Washington   S        364.            1601
##  7 Philadelphia NE       506.            1083
##  8 Miami        S        430.            1230
##  9 Atlanta      S        263.            1181
## 10 Boston       NE       518.            1390
## # 20 more rows

Fig. 2.1 Diagram of how the elements of the grammar of graphics correspond to elements of the
code and visualization

Fig. 2.2 Plot of the largest 30 core-based statistical areas in the United States, showing their
density and the median price to rent a one-bedroom apartment from the 2021 American Community
Survey

Running the code above from an R Markdown file opened in RStudio will show the
desired visualization right below the block of code. In this book, we show the
resulting plots as figures; the plot here is shown in Fig. 2.2. In this plot,
each row of our dataset, a CBSA region, is represented as a point in the plot. The
location of each point is determined by the density and median rent price for a one-
bedroom apartment in the corresponding region. Notice that R has automatically
made several choices for the plot that we did not explicitly indicate in the code, for
example, the range of values on the two axes, the axis labels, the grid lines, and
the marks along the grid. R has also automatically picked the color, size, and shape
of the points. While the defaults work as a good starting point, it is often useful to
modify these values; we will see how to change these aspects of the plot in later
sections of this chapter.
Scatterplots are typically used to understand the relationship between two
numeric values. What does our first plot, shown in Fig. 2.2, tell us about the
relationship between city density and median rent? There is not a clear trend
between these two variables. Rather, the plot of these two economic metrics clusters
the regions into several groups. We see a couple of regions with a very high density
but only moderately large rental prices, one city with unusually high rental prices,
and the rest of the regions fairly uniformly distributed in the lower-left corner of the
plot. Let’s see if we can give some more context to the plot by adding additional
information.

2.2 Text Geometry

A common critique of computational methods is that they obscure a closer
understanding of each individual object of study in an attempt to search for
numeric patterns. This is certainly an important caution; computational analysis
of humanities data should always be paired with close analysis. However,
visualizations do not always have to reduce complex collections to a
few numerical summaries. This is particularly so when working with a dataset that
has a relatively small number of observations. Looking back at our first scatterplot,
how could we recover a closer analysis of individual cities while also looking for
general patterns between the two economic variables? One option is to add labels
indicating the names of the regions. These names would let anyone looking at the
plot add their own understanding of the individual regions as an additional
layer of information as they interpret the plot.
Adding the names of the regions can be done by using another type of geometry
called a text geometry. This geometry is created with the function geom_text. For
each row of a given dataset, this geometry adds a small textual label. As with the
point geometry, it requires us to specify which columns of our data correspond to the
x and y aesthetics. These values tell the plot where to place the label. Additionally,
the text geometry requires an aesthetic called label that indicates the column of the
dataset that the label should take its text from. In our case, we will use the column
called name to make textual labels on the plot; note that name refers to a column
of the data that we loaded into R. The code block below produces a text label
plot by changing the geometry type of the previous example and adding the label
aesthetic.

cbsa |>
  ggplot() +
  geom_text(aes(
    x = density, y = rent_1br_median, label = name
  ))

The plot generated by the code is shown in Fig. 2.3. We can now see which region
has the highest rents (San Francisco). And, we can identify which regions have the
highest density (New York and Los Angeles). We can also identify regions such as
Detroit that are relatively dense but inexpensive or regions such as Denver that are
not particularly dense but still one of the more expensive regions to rent in. While
we have added only a single additional piece of information to the plot, each of
the labels uniquely identifies each row of the data. This allows anyone familiar with
metropolitan regions in the United States to bring many more characteristics of each
data point to the plot through their own knowledge. For example, while the plot does
not include any information about overall population, anyone who knows the largest
cities in the United States can use the plot to see that the two densest cities (New
York and Los Angeles) are also the most populous. And, while the plot does not have
information about the location of the regions, if we know the general geography of
the country, it is possible to see that many of the cities that are expensive but not
particularly dense (Portland, Denver, Seattle, and San Diego) are on the West Coast.
These observations point to the power of including labels on a scatterplot.

Fig. 2.3 Plot of the largest 30 core-based statistical areas in the United States, showing their
density and the median price to rent a one-bedroom apartment from the 2021 American Community
Survey. Here, short descriptive names of the regions are included
While the text plot adds additional contextual information compared to the
scatterplot, it does have some shortcomings. Some of the labels for points at the
edges of the plot fall off and become truncated. Labels for points in the lower-left
corner of the plot start to overlap one another and become difficult to read. These
issues will only grow if we increase the number of regions in our dataset. Also, it is
not entirely clear what part of the label corresponds to the density of the cities. Is it
the center of the label, the start of the label, or the end of the label? We could add a
note that the value is the center of the label, but having to constantly remember this,
and remind others of it, quickly becomes cumbersome.
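In ggplot2 itself, `geom_text()` centers each label on its (x, y) position by default (`hjust = 0.5`, `vjust = 0.5`). The following minimal sketch checks this through `layer_data()`, using a hypothetical toy data frame. It also shows how `hjust = 0` instead makes each label start at its x value.

```r
library(ggplot2)

# Hypothetical toy data (names and values are made up)
toy <- data.frame(
  city = c("Alpha", "Beta", "Gamma"),
  density = c(100, 400, 900),
  rent = c(900, 1000, 1500)
)

# Default: each label is centered horizontally on its x position
p_center <- toy |>
  ggplot() +
  geom_text(aes(x = density, y = rent, label = city))

# hjust = 0 left-aligns each label so that it starts at its x position
p_left <- toy |>
  ggplot() +
  geom_text(aes(x = density, y = rent, label = city), hjust = 0)

layer_data(p_center)[, c("x", "label", "hjust")]
layer_data(p_left)[, c("x", "label", "hjust")]
```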
To start addressing these issues, we can add the points back into the plot with
the labels. We could do this in R by adding the two geometry layers (geom_point
and geom_text) one after the other. This will make it clearer where on the x-axis
each region falls, but at the same time it will make the names of the
cities even more difficult to read. To fix the second problem, we will replace the text
geometry with a different geometry called geom_text_repel, from the ggrepel
package. It also places labels on the plot but has special logic that avoids
intersecting labels. Instead, labels are moved away from the data points and
connected (when needed) by a line segment. As with the text geometry, the text
repel geometry requires specifying x, y, and label aesthetics. Below is the code to
make both of these modifications.

library(ggrepel)

cbsa |>
  ggplot() +
  geom_point(aes(x = density, y = rent_1br_median)) +
  geom_text_repel(aes(
    x = density, y = rent_1br_median, label = name
  ))

Fig. 2.4 Plot of the largest 30 core-based statistical areas in the United States, showing their
density and the median price to rent a one-bedroom apartment from the 2021 American Community
Survey. Here, short descriptive names of the regions are included but offset from the points to make
the plot easier to read

The output of the plot with the points and text repelled labels is shown in Fig. 2.4.
Notice that the repel feature has attempted to avoid writing labels that intersect
one another. It has also tried to avoid having the labels intersect the points and to
avoid having the labels get pushed outside of the plot. Since the points indicate the
specific values of the density and median rents, the labels are free to float around as
long as it is clear which label is associated with each point. Some of the labels do still
become a bit busy in the lower left-hand corner; this could be fixed by making the size
of the labels slightly smaller, which we will learn how to do later in the chapter. Once
the number of points becomes larger, it will eventually not be possible to label all
of the points. Several strategies exist for dealing with this, such as only labeling a
subset of the points. We will see these techniques as they arise in our examples. The
ggplot2 package and online communities have an entire ecosystem of strategies
for increasing interpretability and adding context to plots, making it possible to
use the exploratory and visual power of data visualization to garner insights from
humanities data.
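One of these strategies, labeling only a subset of the points, can be sketched by giving the text layer its own `data` argument. Three choices are assumptions made to keep the example self-contained with ggplot2 alone: the data frame, the rent threshold of 1200, and the use of `geom_text()` rather than `geom_text_repel()`.

```r
library(ggplot2)

# Hypothetical toy data (names and values are made up)
toy <- data.frame(
  city = c("Alpha", "Beta", "Gamma", "Delta", "Epsilon"),
  density = c(100, 250, 400, 650, 900),
  rent = c(900, 1600, 1000, 1100, 1500)
)

# Hypothetical rule: only label rows whose rent is above 1200
to_label <- toy[toy$rent > 1200, ]

p <- toy |>
  ggplot() +
  geom_point(aes(x = density, y = rent)) +
  # A layer's own data argument overrides the data given to ggplot()
  geom_text(
    aes(x = density, y = rent, label = city),
    data = to_label,
    vjust = -1  # a negative vjust places each label above its point
  )

nrow(layer_data(p, 1))  # points layer: all 5 rows
nrow(layer_data(p, 2))  # text layer: only the 2 labelled rows
```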

2.3 Lines and Bars

There are a large number of different geometries supplied by the ggplot2 package, in
addition to the even larger collection of extensions by other R packages. We will
look at two other types of geometries in this section that allow us to investigate
common relationships between pairs of columns of a dataset. Other geometries
will be discussed throughout the book as the need arises, and the full list of
geometries can be found in the ggplot2 package’s documentation. A summary of
all the geometries shown in this chapter is given in Fig. 2.5.
For a moment, we will switch gears and look at the food prices dataset, which was
introduced in the previous chapter. This data contains one row for every year from
1870 through 2015, with relative prices for thirteen different food items across the
United States [44]. Consider a visualization showing the change in the price of tea
over the 146 years in the dataset. We could create a scatterplot where each point is a
row of the data, the x aesthetic captures the year of each record, and the y aesthetic
measures the relative cost of tea. This visualization would be fine and could roughly
help us understand the changes in relative prices for this commodity. However, a
common visualization type for data of this format is a line plot, where the price in
each year is connected by a line to the price in the subsequent year. To create such
a plot, we can use the geom_line geometry. This is most commonly used when the
horizontal axis measures some unit of time, but it can represent other quantities that
we expect to change continuously and smoothly between measurements on the x-axis.
The line geometry requires the same aesthetics as the point geometry and can be
created with the same syntax, as shown in the following block of code.

food_prices |>
  ggplot() +
  geom_line(aes(x = year, y = tea))

Fig. 2.5 Examples of common geometries used in the grammar of graphics

Fig. 2.6 Plot of the price of tea in standardized units (100 is the price in 1900) over time

The output of this visualization, shown in Fig. 2.6, allows us to see the change over
time of the tea prices. Notice that the relative price decreased fairly steadily from
1870 through to 1920. It had a few sudden drops and reversals in the 1920s and
1930s, before increasing again in the 1950s. The relative cost of tea then decreased
again fairly steadily from the mid-1950s through to the end of the data range in
2015.
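`geom_line()` connects observations in order of their x values, regardless of how the rows are sorted. The related `geom_path()` connects them in row order instead. The following minimal sketch shows the difference on a hypothetical toy series, deliberately stored out of chronological order.

```r
library(ggplot2)

# Hypothetical toy series, stored out of chronological order (values made up)
toy_prices <- data.frame(
  year = c(1880, 1870, 1890, 1875),
  tea = c(120, 130, 110, 128)
)

# geom_line() sorts by x before connecting the points
p_line <- toy_prices |>
  ggplot() +
  geom_line(aes(x = year, y = tea))
layer_data(p_line)$x

# geom_path() keeps the original row order
p_path <- toy_prices |>
  ggplot() +
  geom_path(aes(x = year, y = tea))
layer_data(p_path)$x
```

For a yearly series like the food prices, this means `geom_line()` gives a correct plot even if the rows have been shuffled.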
Another common usage of a visualization is to see the value of a numeric column
of the dataset relative to a character column of the dataset. It is possible to represent
such a relationship with a geom_point layer. However, it is often more visually
meaningful to use a bar for each category, with the height or length of the bar
representing the numeric value. This type of plot is most common when showing
the counts of different categories, something we will see in the next chapter, but
can also be used in any situation where a numeric value is associated with different
categories. To create a plot with bars, we use the geom_col function, providing both
x and y aesthetics. R will automatically create vertical bars if we have a character
variable associated with the x aesthetic and horizontal bars if we have one in the
y aesthetic. Putting the character variable on the y-axis usually makes it easier to
read the labels, so we recommend it in most cases. In the code block below, we have
the commands to create a bar plot of the population in each region from the CBSA
dataset, which is shown in Fig. 2.7.

Fig. 2.7 Plot of the population of the largest 30 core-based statistical areas in the United States,
showing their population from the 2021 American Community Survey

cbsa |>
  ggplot() +
  geom_col(aes(x = pop, y = name))

One of the first things that stands out in the output shown in Fig. 2.7 is that the
regions are ordered alphabetically from bottom to top. The visualization would be
much more useful and readable if we could reorder the categories on the y-axis. This
is also something that we will address in the following chapter. For now, we can see
how ggplot2 offers a range of plot types for viewing our data from different angles.
We can add further context through additional aesthetics.
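As a preview, one base-R way to change the order of the bars is `stats::reorder()`, which orders the levels of a category by a numeric column. This is only a sketch with hypothetical region names and values. The following chapter may use a different tool.

```r
library(ggplot2)

# Hypothetical toy data (names and values are made up)
toy <- data.frame(
  name = c("Region A", "Region B", "Region C"),
  pop = c(5, 2, 8)
)

# By default, character categories are ordered alphabetically
levels(factor(toy$name))

# reorder() turns name into a factor whose levels follow pop (smallest first)
toy$name <- reorder(toy$name, toy$pop)
levels(toy$name)

# The first level is drawn at the bottom, so the largest bar ends up on top
p <- toy |>
  ggplot() +
  geom_col(aes(x = pop, y = name))
```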

2.4 Optional Aesthetics

In the previous sections, we have shown how visualizations can be built out of
geometry layers, where each geometry is associated with a dataset and a collection
of variable mappings known as aesthetics. The point, line, and bar geometries
require x and y aesthetics; the text and text repel geometries also require an
aesthetic named label. In addition to the required aesthetics, each geometry
type also has a number of optional aesthetics that we can use to add additional
information to the plot. For example, most geometries have a color aesthetic. The
syntax for describing this is exactly the same as with the required aesthetics: inside
aes(), we give the name of the aesthetic, an equals sign, and the name of the
associated column. Let’s see what happens when we add a color aesthetic to our
scatterplot by relating the column called quad to the aesthetic named color. Below is
the corresponding code; the output is shown in Fig. 2.8.

cbsa |>
  ggplot() +
  geom_point(aes(
    x = density, y = rent_1br_median, color = quad
  ))

Fig. 2.8 Plot of the largest 30 core-based statistical areas in the United States, showing their
density and the median price to rent a one-bedroom apartment from the 2021 American Community
Survey. Here, the points are colored based on the quadrant in which the city is found in the United
States

Associating a column in the dataset with a color produces a new variation of the
original scatterplot. We have the same set of points and locations on the plot, as well
as the same axes. However, now each color has been automatically associated with a
quadrant, and every point has been colored according to the value of quad in its row
of the data. The mapping between colors and quadrant names is shown in an
automatically created legend on the right-hand side of the plot. The ability to add
additional information to the plot by specifying a single aesthetic speaks to how
powerful the grammar of graphics is in terms of quickly producing informative
visualizations of data. In the first edition of this text, which used the built-in graphics
system in R, it was necessary to write nearly a dozen lines of code to produce a similar
plot. Now that we are able to use the ggplot2 package, this process has been greatly
simplified.
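The mapping from categories to colors can be checked directly. This minimal sketch uses hypothetical toy data; ggplot2 stores the mapped values in a column named `colour`. Rows that share a value of quad receive the same color.

```r
library(ggplot2)

# Hypothetical toy data (values are made up)
toy <- data.frame(
  density = c(100, 250, 400, 650, 900),
  rent = c(900, 1600, 1000, 1100, 1500),
  quad = c("NE", "S", "S", "W", "NE")
)

p <- toy |>
  ggplot() +
  geom_point(aes(x = density, y = rent, color = quad))

# One color per category: rows 1 and 5 (NE) match, as do rows 2 and 3 (S)
pts <- layer_data(p)
pts$colour
length(unique(pts$colour))
```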
In the previous example, we changed the color aesthetic from the fixed default of
black to a color that changes with another variable. It is also possible to specify an
alternative, fixed value for any aesthetic. We can draw on the color names available
in R. For example, we might want to change all of the points to be a shade of green.
This can be done with a small change to the function call. To do this, we set the
color aesthetic to the name of a color, such as “red.” However, unlike with variable
aesthetics, the value needs to be given outside of the aes() function but still
within the geom_* function. Below is an example of the code to redo our plot with
a different color; we use a color called “olivedrab,” which in print is much more
aesthetically pleasing than its name might at first suggest.

cbsa |>
  ggplot() +
  geom_point(aes(
    x = density, y = rent_1br_median
  ), color = "olivedrab")

While minor, the changed notation for specifying fixed aesthetics is a common
source of confusing errors for users new to the grammar of graphics, so be careful to
follow the correct syntax of arguments as in the code above. Fixed aesthetics can be
given before or after the aes() call without affecting the output; just be sure that
they sit outside of aes() rather than inside it (Fig. 2.9).
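The difference between the two placements is easy to see in the data that ggplot2 computes. This sketch uses a hypothetical toy data frame. Placed outside `aes()`, the fixed color passes through unchanged. Placed inside `aes()`, the same string is treated as a one-level category and colored by the default scale, which also adds a legend.

```r
library(ggplot2)

# Hypothetical toy data (values are made up)
toy <- data.frame(density = c(100, 400, 900), rent = c(900, 1000, 1500))

# Correct: a fixed color goes outside aes()
p_fixed <- toy |>
  ggplot() +
  geom_point(aes(x = density, y = rent), color = "olivedrab")

# Common mistake: inside aes(), "olivedrab" becomes a category label
p_mapped <- toy |>
  ggplot() +
  geom_point(aes(x = density, y = rent, color = "olivedrab"))

unique(layer_data(p_fixed)$colour)   # the literal color name
unique(layer_data(p_mapped)$colour)  # a default palette color instead
```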
While each geometry can have different required and optional aesthetics, the
ggplot2 package tries as much as possible to use a common set of terms for the
aesthetics in each geometry. We have already seen the x, y, and label aesthetics
in the previous sections and just introduced the color aesthetic. Color can also
be used to change the color of a line plot or the color of the font in a text or text
repel geometry. For applications such as the bar plot, we might want to modify both
the border and interior colors of the bars; these are set separately by the color
and fill aesthetics, respectively. The size aesthetic can be used to set the size
of the points in a scatterplot or the font size of the labels in a text geometry. The
shape aesthetic is used to modify the shape of the points. An aesthetic named
alpha controls the opacity of points, with a value of 1 being the default and 0
being completely invisible. Some of these, such as alpha, are most frequently used
with fixed values, but if needed, almost all can be given a variable mapping as well.
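The following short sketch sets several of these fixed aesthetics, again with hypothetical toy data. For bars, color and fill control the border and interior. For points, size and alpha control the size and opacity.

```r
library(ggplot2)

# Hypothetical toy data (names and values are made up)
toy <- data.frame(name = c("A", "B", "C"), pop = c(5, 2, 8))

# For bars, color sets the border and fill sets the interior
p_bars <- toy |>
  ggplot() +
  geom_col(aes(x = pop, y = name), color = "black", fill = "grey70")

# For points, size sets the point size; alpha = 0.5 is half transparent
p_pts <- toy |>
  ggplot() +
  geom_point(aes(x = pop, y = name), size = 4, alpha = 0.5)

bars <- layer_data(p_bars)
pts <- layer_data(p_pts)
unique(bars[, c("colour", "fill")])
unique(pts[, c("size", "alpha")])
```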

Fig. 2.9 Plot of the largest 30 core-based statistical areas in the United States, showing their
density and the median price to rent a one-bedroom apartment from the 2021 American Community
Survey. The color of the points has been changed to a dark green called “olivedrab”

2.5 Scales

R makes many choices for us automatically when creating any plot. In the example
shown in Fig. 2.8, in which we set the color of the points to follow another variable
in the dataset, R handles the details of how to pick the specific colors and sizes.
It has figured out how large to make the axes, where to add tick marks, and where to
draw grid lines. Letting R deal with these details is convenient because it frees us
up to focus on the data itself. Sometimes, such as when preparing to produce plots
for external distribution, or when the defaults are particularly hard to interpret, it is
useful to manually adjust these details. This is exactly what scales were designed
for.
Each aesthetic within the grammar of graphics is associated with a scale. Scales
detail how a plot should relate aesthetics to the concrete, perceivable features in a
plot. For example, a scale for the x aesthetic will describe the smallest and largest
values on the x-axis. It will also encode information about how to label the x-axis.
Similarly, a color scale describes which color corresponds to each category in a
dataset and how to format a legend for the plot. In order to change or modify the
default scales, we add an additional function to the code. The order of the scales
relative to the geometries does not affect the output; by convention, scales are usually
grouped after the geometries.
For example, a popular alternative to the default color palette shown in our
previous plot is the function scale_color_viridis_d(). It constructs a set of
colors that is color-blind friendly, looks nice when printed in black and white, and
displays fine on bad projectors. After specifying that the color of a geometry should
vary with a column in the dataset, we specify the viridis color scale by adding the
function as an extra line in the plot. An example is shown in the following code.

cbsa |>
  ggplot() +
  geom_point(aes(
    x = density, y = rent_1br_median, color = quad
  )) +
  scale_color_viridis_d()

Fig. 2.10 Plot of the largest 30 core-based statistical areas in the United States, showing their
density and the median price to rent a one-bedroom apartment from the 2021 American Community
Survey. Here, the points are colored based on the quadrant in which the city is found in the United
States, with a color-blind friendly color scale

The output shown in Fig. 2.10 shows that the colors now range from dark purple to
bright yellow in place of the rainbow of colors in the default plot. As with the
categories in the bar plot, the unique colors are ordered by putting the categories in
alphabetical order. Changing this requires modifying the dataset before passing it to
the plot, something that we will discuss in the next chapter. Note that the _d at the
end of the scale function indicates that the colors are used to create a set of mappings
for a character variable (it stands for “discrete”). There is also a complementary
function scale_color_viridis_c that produces a similar set of colors when making
the color of the points change according to a numeric variable. The code below
demonstrates the continuous case, where the population is treated as a numeric
variable.

cbsa |>
  ggplot() +
  geom_point(aes(
    x = density, y = rent_1br_median, color = pop
  )) +
  scale_color_viridis_c()
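The contrast between the two viridis scales can be checked on toy data. This sketch uses hypothetical values; the viridis scales ship with ggplot2. The discrete scale produces one color per category. The continuous scale produces a gradient, with one color per distinct numeric value here.

```r
library(ggplot2)

# Hypothetical toy data (values are made up)
toy <- data.frame(
  density = c(100, 250, 400, 650, 900),
  rent = c(900, 1600, 1000, 1100, 1500),
  quad = c("NE", "S", "S", "W", "NC"),
  pop = c(1.2, 3.4, 2.2, 5.0, 0.8)
)

# _d: discrete palette for a character column
p_d <- toy |>
  ggplot() +
  geom_point(aes(x = density, y = rent, color = quad)) +
  scale_color_viridis_d()

# _c: continuous gradient for a numeric column
p_c <- toy |>
  ggplot() +
  geom_point(aes(x = density, y = rent, color = pop)) +
  scale_color_viridis_c()

length(unique(layer_data(p_d)$colour))  # one color per quadrant
length(unique(layer_data(p_c)$colour))  # one color per distinct pop value
```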

Many other scales exist to control a variety of aesthetics. For example,
scale_size_area can be used to make the area of the points proportional to the
values of the column mapped to the size aesthetic. There are also several scales to
control the x and y axes. For example, we can add scale_x_log10() and
scale_y_log10() to a plot to produce values on a logarithmic scale, which can be
very useful when working with heavily skewed datasets. We will use this in later
chapters as needed.
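A log scale transforms the data before anything is drawn, while the axis labels are still shown in the original units. This minimal sketch uses hypothetical densities of 10, 100, and 1000. After `scale_x_log10()`, the values that ggplot2 works with are their base-10 logarithms.

```r
library(ggplot2)

# Hypothetical toy data spanning several orders of magnitude
toy <- data.frame(density = c(10, 100, 1000), rent = c(900, 1000, 1500))

p <- toy |>
  ggplot() +
  geom_point(aes(x = density, y = rent)) +
  scale_x_log10()

# Internally the x values are log10(density): 1, 2, and 3
layer_data(p)$x
```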
The default scale for a numeric variable on the x-axis is called scale_x_continuous. A
corresponding function scale_y_continuous is the default for the y-axis. Adding these
to a plot on their own has no visible effect. However, there are many helpful optional
arguments that we can provide to these functions that change the way a plot is
displayed. Setting n.breaks within one of these scales tells R the (approximate)
number of labels to put on the axis. Also, setting minor_breaks equal to NULL
turns off the minor grid lines. We can set the argument limits to a pair of numbers
in order to describe the starting and ending range on a plot. Below is the code to
produce the plot in Fig. 2.11, which shows the same data as our original scatterplot,
but now with modified grid lines, axis labels, and vertical range.

cbsa |>
  ggplot() +
  geom_point(aes(x = density, y = rent_1br_median)) +
  scale_x_continuous(n.breaks = 10, minor_breaks = NULL) +
  scale_y_continuous(limits = c(0, 2000))

Finally, there are two special scale types that can be useful for working with colors.
In some cases, we may already have a column in our dataset that explicitly describes
the color of an observation; here, it would make sense to use these colors directly. To
do that, we can add the scale scale_color_identity to the plot. Another type of
scale that can be useful for colors is scale_color_manual. Here, it is possible to
describe exactly which color should be used for each category. Below is the syntax
for producing manually defined colors for each region in the CBSA dataset.
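The sketch below illustrates both special color scales on a small hypothetical data frame rather than the CBSA data. The column names `quad` and `point_color`, the category labels, and the chosen colors are all assumptions made for illustration. A named vector passed to the values argument of scale_color_manual() pairs each category with a color, while scale_color_identity() uses the color names stored in the data exactly as they are.

```r
library(ggplot2)

# Hypothetical toy data; `quad` and `point_color` are illustrative
# column names, not columns of the CBSA dataset.
toy <- data.frame(
  density = c(1, 2, 3, 4),
  rent = c(900, 1100, 1300, 1500),
  quad = c("NE", "NW", "SE", "SW"),
  point_color = c("black", "gray50", "orange", "steelblue")
)

# scale_color_manual(): a named vector pairs each category with a color.
p_manual <- ggplot(toy) +
  geom_point(aes(x = density, y = rent, color = quad)) +
  scale_color_manual(values = c(
    NE = "firebrick", NW = "navy", SE = "darkgreen", SW = "goldenrod"
  ))

# scale_color_identity(): use the colors stored in the data unchanged.
p_identity <- ggplot(toy) +
  geom_point(aes(x = density, y = rent, color = point_color)) +
  scale_color_identity()

# Inspect the colors ggplot2 will draw, ordered by x position.
manual_data <- ggplot_build(p_manual)$data[[1]]
identity_data <- ggplot_build(p_identity)$data[[1]]
manual_colors <- as.character(manual_data$colour[order(manual_data$x)])
identity_colors <- as.character(identity_data$colour[order(identity_data$x)])
print(manual_colors)
print(identity_colors)
```

By default scale_color_identity() does not draw a legend, since the colors are taken literally from the data; passing guide = "legend" to it turns the legend back on.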
Bob’s sharp eyes took all these things in at a glance, and then they
turned toward the sheriff.
The latter looked solemn, but he did not appear to be at all
astonished. He knew that George Edwards had never put those
bundles in that hole; and there were other men in the party who
knew it, too.
But the question was: Who did do it?
It was answered in a very few minutes, and in a most unexpected
manner.
“George, I am astonished at you!” said Uncle Ruben, drawing the
back of his hand across his eyes, and wiping away the tears that
would not come at his bidding. “Neighbor Newton, these things come
from some of the stores that’s been robbed.”
The officer nodded his head, but said nothing.
“There’s been a heap of this sort of work goin’ on,” continued
Uncle Ruben; “an’ who knows but there’s something else hid away
about here? Let’s take a look through the bushes, all of us, an’ see if
we can find anything in ’em.”
Some of the party complied, moving about in a listless sort of way,
and showing by all their actions that their hearts were not in the
matter, while the others held the horses and awaited the result of the
search in silence.
Uncle Ruben kept clear of the thicket into which he had thrown the
chickens, hoping that some one would stumble upon it. Two or three
men did walk through it, but they found nothing.
Then Uncle Ruben went in himself; but he, too, came out empty-
handed. Beyond a doubt, some prowling fox or raccoon had been
there before him and carried off the chickens.
“Well, Mr. Edwards, you don’t seem to be having very good luck,”
said the sheriff, who was growing tired of this “spite-work business,”
as he afterward termed it.
“No, I don’t seem to find nothing—that’s a fact,” replied the man,
as he came out of the bushes, looking rather surprised and
crestfallen. “Queer, too, I must say—for my hen-roost was robbed
t’other night.”
While Uncle Ruben was wondering whether or not it would be safe
to accuse George of having stolen and eaten the chickens, the rest of
the searching party came out of the woods, one after the other.
And when they were all assembled, and were waiting for the officer
to speak, Bob Howard, after holding a short consultation with Dick,
stepped out where all could see him.
“Now, then, I’ve got the floor,” said he, “and I will show you how to
go to the bottom of this business in less than two minutes.”
Everybody seemed to know that there was something coming now.
The sheriff looked expectant, and those who had accompanied him to
the cabin, merely out of curiosity, led their horses closer to the
speaker and formed a complete circle around him.
As Bob uttered these words, he fastened his eyes upon Wallace and
his two friends, and kept them there so long that the rest of the party
began to look toward them, also.
Wallace, who showed himself to be possessed of uncommon nerve,
met his gaze without flinching; Forbes moved about uneasily and
smiled in a sickly sort of way; and Benson, utterly unable to endure
his close scrutiny, walked off as though he had no particular object in
view, leading his horse by the bridle.
“Don’t go away, Benson,” said Bob. “You are just the fellow I want
to talk to. Come back here.”
“Why, Bob, you’re crazy!” exclaimed Wallace. “What does Benson
know about Mr. Stebbins’ money? I mean—”
Wallace saw that he had made a false step, and he intended to
correct it; but Bob was too quick for him.
“Who said anything about Mr. Stebbins’ money?” he demanded.
“That subject was dropped long ago; but Benson knows all about it,
and so do you and Forbes.”
The horsemen moved up closer to Bob, and exclamations of
astonishment were heard on all sides. Forbes would have been glad
to run away with Benson, but Wallace stood his ground manfully.
“If I know all about it, why don’t you question me instead of
Benson?” he inquired, with a sneer.
“Because I don’t choose to, just now. I may have a few questions to
ask you, by-and-by.”
“Well, I shall do as I please about answering them.”
“Of course; that’s your privilege. But you’ll not do as you please
about answering them, when you find yourself hauled up before
Judge Baker. Come back here, Benson.”
But Benson paid no attention to him. He did not think it would be
quite safe to go back, for he knew too well what was coming. He led
his horse around the corner of the cabin, and there is every reason to
believe that he intended to mount him and ride away; but his
purpose was defeated by Dick Langdon and George, who sprang
around the opposite end of the cabin and ran along the front of it,
just in time to seize the bridle of Benson’s horse as the young fellow
was about to swing himself into the saddle.
“Look here, Benson! You’re only making a bad matter worse,”
warned Dick.
“Let me alone!” protested Benson, whose eyes filled with tears as
fast as he could wipe them away. “I don’t know anything about Mr.
Stebbins’ money.”
“Yes, you do,” said Dick, firmly. “Bob Howard and I were there,
and we drove you away just as you were about to go into the house
through the wood-shed window. I am sorry for you; but if you think
that Bob and I are going to stand still and let somebody accuse us of
a crime of which you are guilty, you will find that you are mistaken.”
When Dick took him by the arm and attempted to lead him behind
the cabin, Benson showed a disposition to resist him, and it is
probable that he would have done so if the sheriff had not put in an
appearance.
The latter had been looking for something strange and unexpected
to come of this morning’s work, but he had little dreamed that it
would be the means of putting him on the track of the burglars for
whom he had been so long watching.
He knew now, as well as he knew it ten minutes later, that Benson
and his two friends had made an effort to steal Mr. Stebbins’ money
—that they were responsible for at least one of the burglaries that
had been committed in the village—and he was astounded by the
discovery; but his face did not show it.
The culprits were the sons of the wealthiest and most prominent
men in the county, and, although the officer did not approve of their
idle, shiftless ways, and watched their conduct with some concern, as
many other good men in the village did, they were the last ones he
would have suspected of any crime. He wondered what it was that
had led them to it, and the next Monday he found out.
“Benson, come with me,” said the officer, kindly, but firmly. “I
should like to have a few words with you in private. Dick, you and
George go around where the others are, and tell them that I don’t
want to be interrupted.”
“Well, smart Alecks, what have you accomplished?” asked Wallace,
as Dick and his companion joined their friend, Bob Howard.
“We kept Benson from running away,” replied Dick, whose even
temper was not in the least ruffled by the other’s insulting tones. “We
couldn’t afford to let him get out of sight, you know, because we shall
need his evidence. You said last night that if you ever got into
trouble, it would be through him, and I guess you hit the nail right on
top of the head.”
“I never said any such thing,” denied Wallace, hoping by an
assumption of rage, which he did not feel, to hide the alarm he did
feel. “Now, I am sick of all this nonsense, and I want to know what
you mean by it.”
“You will find out all you want to know as soon as Benson has
finished his confession.”
“Confession!” gasped Wallace.
That was the thing of which he stood the most in fear. If Benson’s
courage gave way, there was no hope for them. The bare thought was
enough to terrify him beyond expression.
His face was fairly livid, while Forbes could only maintain an
upright position by clinging to the horn of his saddle.
CHAPTER XIV.
THE UPSHOT OF THE WHOLE MATTER.

“Where is Benson now?” asked Wallace, as soon as he could
speak. “What did you do with him?”
“We left him on the beach with the sheriff; but I wouldn’t advise
you to go around there,” said Dick, as Wallace handed his bridle to
Forbes and moved away. “Mr. Newton desired me to say to all of you
that he doesn’t wish to be interrupted.”
“You shut your mouth, and keep your advice until you are asked
for it!” said Wallace, fiercely.
Knowing Benson as well as he did, he dared not leave him alone
with the officer; so he kept on, and presently those who remained
behind heard loud voices on the other side of the cabin.
An animated conversation was kept up for a minute or two, and
then the officer appeared, bringing Wallace with him. The latter was
angry and excited, while the sheriff’s face wore a determined look.
“Steve,” said he, addressing one of the horsemen, and speaking in
an authoritative tone of voice, “I shall have to ask you to take charge
of this young man.”
“Hello! He’s been arrested,” whispered Dick.
“And I ask you once more, and for the last time, to take your hands
off me!” howled Wallace, trying in vain to twist his arm out of the
officer’s grasp. “You want to look out for me, for I’m dangerous when
I’m riled.”
“Arthur, if you don’t behave yourself, I shall put you under close
restraint,” said Mr. Newton, sternly.
“You mean by that, that you will put the bracelets on me, I
suppose!” yelled Wallace, who acted for all the world like a crazy boy.
“You can’t do it. Now, I am going to show you what Wild Harry is
made of.”
Before the officer could prevent it, Wallace thrust his hand into his
hip-pocket, and when he brought it out again, he brought with it an
ivory-handled revolver.
The spectators looked at it with the utmost consternation depicted
on their countenances, and Mr. Stebbins, uttering a cry of alarm,
started up his horse, from which he had never once dismounted, and
almost ran over Bob and George in his eagerness to get out of harm’s
way.
There is no doubt, whatever, that Wallace intended to use the
weapon he had so unexpectedly produced; but fortunately for
himself and all concerned, he had to deal with men who were not
easily intimidated, and who did not allow their astonishment to
prevent them from acting quickly and promptly.
Before Wallace could think twice, the revolver was wrenched from
his grasp, and the broad-shouldered Steve, rushing upon him from
behind, clasped him around the arms, pinning them securely to his
side.
A moment later there were two ominous “clicks,” and when Steve,
in obedience to a sign from the officer, released his hold upon the
captive, the latter was powerless, his wrists being encircled by a pair
of hand-cuffs.
“This is the most extraordinary thing I ever heard of. I don’t
understand it at all,” said the sheriff.
And the reason he did not understand it was because he had not
yet gone to the bottom of the matter. He knew more about it before
two days more had passed over his head.
“Forbes,” shouted Wallace, after he had made several desperate
but unsuccessful attempts to pull off the hand-cuffs, “where’s your
gun? Why do you stand there looking instead of helping me?”
This question very naturally suggested the idea that possibly the
youth appealed to have something dangerous about him, and two or
three of the party at once moved toward him, with the intention of
satisfying themselves on that point.
But Forbes did not wait to be searched. The ease with which his
companion had been conquered took all the courage out of him, and
he handed out his “gun”—a nickel-plated revolver—before he was
asked for it.
The sheriff put it into his pocket, to keep company with the one he
had taken from Wallace, and then went back to the front of the cabin
to hear the rest of Benson’s confession, leaving two prisoners instead
of one in Steve’s charge.
He did not think it necessary to put Forbes under “close restraint,”
for the latter was thoroughly cowed, and quite as willing to make a
clean breast of the whole matter as Benson was.
All these things, which we have been so long in describing,
occupied but a very short time in taking place—probably not over ten
minutes.
The spectators had had but little to say, because their
astonishment held them speechless. They had barely time to recover
from the surprise occasioned by one startling disclosure before they
were called upon to be surprised at something else.
They were all satisfied on one point, and that was that the events
of the preceding night had been the means of unearthing the thieves
of whom they had so long stood in fear.
But, like Bob Howard, they could not for the life of them see why
boys in their circumstances, who had indulgent parents, comfortable
homes and everything in the way of benefits and amusements that
reasonable boys ought to ask for, could become criminals.
When the sheriff came back, accompanied by Benson, who was
crying as though he had been whipped, they stared at him very hard,
in the hope of seeing something in the officer’s face that would
enlighten them on this point; but they were disappointed.
They could only judge of the result of his long interview with
Benson by his actions. Without saying a word, he tied the bundles
which Uncle Ruben had dug out of the ground, fastened them to the
horn of his saddle and mounted his horse.
When he was ready to start, he said, addressing himself to George
and his friends:
“Now, boys, I am going back to the village.”
“Do you want us to go with you?” asked Dick.
“No, I do not,” answered the officer. “I shall probably—”
At this point Uncle Ruben interrupted him. He was no less
astonished than the others were by the incidents that had transpired
during the last few minutes, and he was angry and disgusted, too.
He had come up there on purpose to find the chickens, which he
had killed himself, in order that he might have some excuse for
accusing George of robbing his hen-roost, and his failure to produce
the evidence he had so carefully prepared exasperated him. It looked
now as though his nephew was going to get off scot free.
“Look here, Newton,” exclaimed Uncle Ruben, “ain’t you goin’ to
arrest George, too?”
The officer replied very decidedly that he was not.
“What for?” demanded Uncle Ruben.
“Because I understand my business, and have no desire to put an
innocent boy to any trouble.”
“Well, it’s mighty strange where my two Plymouth Rock chickens
have gone to. They was wuth two dollars,” whined Uncle Ruben, who
thought quite as much of money as Mr. Stebbins did.
The sheriff made no reply. Addressing himself to George, he said:
“I shall probably need your services on Monday morning.”
“Very good, sir,” answered George. “Do you want me to go down to
the village?”
“No, I will come up here. And, Dick, I shall no doubt find you and
Bob at the academy if I have occasion to serve a summons on you?
All right. Good-by! I am sorry that we have put you to so much
trouble and anxiety.”
“I am not,” said Bob cheerfully. “This thing was bound to happen,
sooner or later, and now it is over.”
The sheriff and his party rode away, and the three boys went
around to the front of the cabin and seated themselves on the bench.
“Do you know, Dick, that we had a very narrow escape last night?”
said Bob, who was the first to speak.
“Of course I do. Didn’t you see that window this morning? It was
full of holes, and if we had been there—”
“I wasn’t thinking of that. I mean it was a lucky thing for us that we
didn’t try to approach the house after we drove the robbers away.
While you were telling your story to the sheriff, I heard Mr. Stebbins
say to a man near him that he stood guard at that window all night,
ready to shoot the first one of us who showed himself.”
“And he would have done it without realizing what he was about,”
replied George. “His fright took away all his sense. But what do you
suppose the sheriff is coming up here for on Monday morning?”
That was a question that neither Dick nor Bob could answer. Like
the causes that had impelled Wallace and his companions to take up
stealing as a pastime, it was a mystery, and so it would remain until
time unravelled it.
While they were discussing the matter, Dick Langdon caught a
momentary glimpse of something that brought him to his feet and
sent him post-haste into the cabin. When he came out again, he
carried his double-barrel in his hands, and his cartridge-belt was
buckled about his waist.
“Have you fellows forgotten that we are hungry, and that dinner
was to be served immediately?” he asked. “Now make yourselves
useful as well as ornamental, while I go out and shoot a squirrel. I
just saw one run up that hickory tree.”
Dick moved away with stealthy footsteps, holding his gun in
readiness for a shot, and Bob and George went about their work in
that listless, die-away manner that boys always assume when they
are compelled to do something in which they feel no interest. Their
excitement had taken away their appetites.
Their tongues were busier than their hands, and as soon as Bob
found an opportunity to do so, he asked George why it was that
Uncle Ruben had manifested so strong a desire to get him into
trouble. The latter replied by telling as much of his private history as
he cared to reveal to a boy who was almost a stranger to him, and
when he ceased speaking, Bob said:
“You may have the satisfaction of knowing that from this time on
you need never see him again, unless you are willing to do so.
Wallace and the others will be brought to trial, of course, and you
will have to appear as a witness. When you go down to the village in
obedience to the summons, be sure and take all your clothes with
you, for you are not coming back here to live like a wild Injun,” he
added with a laugh.
“What do you mean by that?”
“I mean that our old janitor is going to leave next Monday night—
he’s real hateful, and the boys played so many tricks on him, that he
can’t stand it any longer—and you are to take his place. Dick and I
have settled it.”
George could hardly believe that he had heard aright. If Uncle
Ruben had succeeded in proving that he was a chicken-thief, he
could not have been more amazed. He saw a bright prospect opening
before him. All he asked was an opportunity to get an education, and
he would answer for his own future.
“Lend me your knife long enough to open this can of milk,” said
Bob. “It’s bigger and stronger than mine. That’s the way the thing
stands. You are to take care of the buildings—there is another fellow
there who looks out for the grounds—ring the bell at certain hours,
and see to it that the boys don’t run off with it, or the ropes belonging
to it, every chance they get. You’ll have to report us for every
violation of the rules, and take a good thrashing every time you do it.
You’ll have to attend to lots of things that I can’t think of now, and, in
return, you’ll get your books and schooling free, and money enough
to keep you in clothes. Professor Boyle says he thinks you are just the
boy he has been looking for.”
“But I don’t know him,” stammered George.
“No matter. I know him, and so does Dick. My father knew him
well when they were boys together, and that is the reason he sends
me so far away from home to go to school.”
“You are at the bottom of this, Bob—you and Dick—but I don’t
know how to thank you for it,” said George, at length.
“Do you remember what you said to me when you brought my gun
up from the bottom of the lake?” asked Bob. “You needn’t try.”
George thought it best to act upon this advice, for he could not find
words with which to express his gratitude.
CHAPTER XV.
THE RENDEZVOUS.

George’s unexpected stroke of fortune put new life and energy
into him, and he worked to such good purpose that in less than
three-quarters of an hour the dinner was ready and waiting.
Neither of them had much to say, each being fully occupied with
his own thoughts. George was telling himself how good he was going
to be, how hard he was going to study when he was fairly installed at
the academy, and had learned how to perform the duties that were
required of him, while his companion was looking a little further into
the future.
Bob Howard had as good a home as any boy ever had, and, unlike
a good many of his age, he knew and fully appreciated the benefits of
it; but it was a lonely home in some respects, for he had no mother,
and not a playmate within many miles of him. Here was a boy who
had saved his life at the imminent risk of his own, who was also
motherless, who had no father worth mentioning, and if he found
that George, speaking in schoolboy parlance, “wore well”—if, after
summering and wintering him, he became satisfied that he was as
good a fellow in every respect as he seemed to be—why shouldn’t he
take him home with him when they had both completed the course at
the academy, and make a brother of him? The house was large
enough for them—if it were not, the mountain range around it was—
and Bob was sure that his father would give his friend a cordial
welcome.
Bob was resolved that he would think the matter over when he
could devote more time to it.
“What shall we do now?” said George, breaking in on his reverie.
“Dinner is ready, but Dick hasn’t returned.”
“We’ll not waste any time in waiting for him,” replied Bob. “The
last time he shot he was so far away that I could hardly hear the
report of his gun. Let’s eat our dinner and go back to the bass-hole.
Dick won’t come back as long as he can find a squirrel to shoot at,
and when he does come he can help himself.”
The boys did not have as good luck that afternoon as they did in
the morning, for they were on the ground too early to get the evening
fishing. Still, they added a few fine bass to their string; but, about the
time the fish began to show a disposition to take the bait promptly,
they were obliged to pull up the anchor and start for the cabin.
They found Dick sitting on the bench, picking the bones of a
squirrel he had broiled over the coals on a forked stick. He had
eighteen others to carry home with him.
Having a long walk before them, he and Bob decided to start for
the village at once. They wanted to get through the woods before
dark.
“We’ll leave our surplus provisions here, so that it will not be
necessary for us to bring a new supply when we come again,” said
Dick, as he proceeded to pack his squirrels and some of the fish away
in his basket. “Has Bob told you that you are to be janitor at the
academy? All right; but remember that you are to be easy on the
boys. If we are out after ten o’clock, you are to be at the gate to let us
in; and you are not to report us, no matter what we do. We’ll see you
on Monday, I suppose, and you must tell us what the sheriff wanted
of you.”
George took his friends across the lake in his boat, put them on the
road leading to the village, and returned to the cabin, feeling lonely,
indeed, but at the same time very much elated and encouraged.
Monday morning came at last, and with it came the deputy sheriff,
accompanied by two constables. They were all mounted, and one of
the constables led an extra horse, which George soon learned was
intended for his own use.
“This is my idea of a hunter’s home,” said Mr. Newton, who
seemed to enjoy the view that was spread out before him. “I
shouldn’t mind living this way myself, if I could make a support by
it.”
“You would find it a dog’s life,” said George. “At least, I have found
it so. I didn’t come here from choice, and I am heartily glad that this
is my last day here. How is everything in the village?”
“Oh, the excitement is intense, and the fathers of those young
rogues are very indignant! I have been called everything but a decent
man by them and their friends; but I was justified in arresting them,
for Benson and Forbes have made a full confession. Wallace is as
defiant as ever, and neither denies nor acknowledges anything. Now,
George, do you know where Dungan Brook is?”
George said that he did.
“It’s a wild place, I understand. Have you been there lately?”
“Not since last May, and then I caught the finest string of trout
there I ever saw.”
“Well,” continued the officer, “there’s one place in the ravine
through which the brook runs, that bears a striking resemblance, in
everything except grandeur and extent, to a famous valley
somewhere out West, and when some of the academy boys were
botanizing there, a few years ago, they named it the Little Yosemite.”
“I know right where it is,” said George.
“Then take us there by the quickest and shortest route.”
George closed the door of the cabin, mounted the horse that had
been provided for him, and led the way around the head of the lake.
The shortest route to the place they wanted to find was a long one,
and a rough one too; and, for almost the entire distance, it led
through a thick wood, where every step of the way was obstructed by
bushes and fallen logs, which were piled upon and across one
another in every conceivable shape.
After two hours of slow and laborious riding, George dismounted,
pushed aside the bushes, and gave his companions their first view of
the Little Yosemite. Dungan Brook they could not see. It was so far
below them that the ripple of its waters could be but faintly heard.
“As long as I have lived in this county I never knew before that it
could boast of scenery like this,” said the sheriff, as he drew back
from the edge of the gulf, after trying in vain to see the bottom of it.
“How are we going to get down there?”
“Hitch your horses, and I will see if I can find the path I cut the last
time I was here,” said George. “Here it is now, and, I declare, it looks
as though it had been used,” he added, in a tone of surprise.
The officers smiled, but said nothing. They followed their guide, as
he scrambled down the bluff, and in a few minutes more they were
standing beside the brook.
“There’s Le Capitan,” said George, pointing to a huge rock on the
other side of the stream, which rose to the height of two hundred feet
without a single break or crevice.
“I recognize the captain from the description I have received of
him,” said the sheriff, as he drew a note-book from his pocket, and
consulted a diagram that he or somebody else had drawn on one of
the pages. “He is in a bad business for he is standing guard over
stolen property.”
The officer led the way across the brook, and around the base of
the rock, to a thick cluster of bushes, in front of which he stopped
long enough to light a dark lantern he had brought with him. Then
he dived into the bushes, and when George and the constables
followed him they could not find him.
He had disappeared in a small opening in the ground, which
seemed to run back under the rock. Presently a bundle of something
came sailing out, then another and another, until there was a small
cartload of them piled up before the opening.
The constables examined them as fast as they came out, and found
that they contained a quantity of ready-made clothing, underwear of
all kinds, boxes of cigars, tobacco, jewelry, jack-knives, pistols,
cutlery, buffalo robes, blankets, cloaks, and a lot of other articles too
numerous to mention.
The constables opened their eyes in surprise when the sheriff came
out, and told them that these were not half the goods that had been
stolen. The rest had been sold to enable the thieves to raise money
enough for their Western trip.
“What were they going to do out West?” asked George.
“What do people of this stamp generally do out there?” asked the
constable, in reply. “Benson and Forbes would have died of home-
sickness, and Wallace would have been in the hands of a vigilance
committee in less than a week. Now let’s go up to headquarters, and
see what we shall find there.”
After taking another look at his diagram, the sheriff moved up the
ravine, closely examining the base of the bluff as he went, and when
he stopped, it was in front of a little pole cabin, which was so
effectually concealed by the thick shrubbery and trees that
surrounded it that one might have passed within five feet of it
without knowing that there was any cabin there. Having opened the
door, which was formed of half a dozen saplings that fitted loosely
into holes in the ground, the sheriff went in and flashed his lantern
around.
“This is where they used to come to hold their revels and plan their
expeditions,” said he.
Wallace and his two friends had passed the preceding Saturday
there, perfecting their scheme for driving George Edwards away from
the lake, and securing possession of Mr. Stebbins’ money, and
everything in the cabin was just as they had left it.
There were the dishes from which they had eaten their dinner, the
hammocks in which they had swung while talking over their plans,
and the books and papers that had helped them while away their
leisure time were scattered about.
The officer picked up one of the books, and turning to the title-
page, read the words, “The Life of Jesse James.” Throwing it aside
with an exclamation of disgust, he picked up another, which was
entitled, “Wild Harry, the Black Valley Demon.”
“Here is the secret of the whole matter, and I can now understand
some things that I couldn’t see through before,” said the officer.
“Those foolish boys have poisoned their minds by reading dime
novels, and are anxious to imitate the heroes of them. I see that
Wallace’s name is on some, and that Forbes and Benson own the
others. Pick them up and be careful of them, for they will do for
evidence.”
George accompanied the officers to the village, not forgetting to
take his clothes with him, as Bob had directed, appeared as one of
the witnesses at the preliminary examination which was held that
afternoon, and that night he slept at the academy, so that he could be
ready to assume his duties the next morning.
The arrest and trial of the guilty boys created a greater sensation
than the quiet little village of Montford had ever known before.
Their fathers exerted themselves to the utmost in their behalf; but
their efforts to clear them were entirely unsuccessful, and the most
they could do was to secure a mitigation of the punishment they so
richly deserved.
As soon as the excitement was over, our three friends settled down
to business, working hard for five days in the week, and spending
every pleasant Saturday at the lake.
George Edwards proved to be an apt pupil, and very soon became
one of the most popular students at the academy. At first, the boys
played tricks upon him, in spite of all his caution; but George
submitted so good-naturedly, and did his full duty in so manly a way,
that they finally left off bothering him.
At the end of his second school year, Bob was permitted to take up
his abode at a private house in the village, and, at his earnest
solicitation, George consented to room with him.
They studied, worked, and played together, and it finally came to
be understood between them, that, if they could possibly prevent it,
they were not to allow themselves to be separated as long as they
lived.
George did not know what he was bringing upon himself by
consenting to this arrangement.
Having described, as rapidly as we could, the various incidents
that had operated to bring these two boys together, let us go back to
where we first found them—to the day on which that telegram
arrived from Arizona.
It was the last day they ever expected to spend in Montford, and it
had been big with events. They had passed their examination with
flying colors, the base-ball club to which they belonged had
established its claim to the championship, after a hotly-contested
game, and the two friends—there were only two of them now, for
Dick Langdon had completed the course a year before—were in high
spirits.
Having exchanged their uniforms for their ordinary clothes, and
taken a run around the bases for the last time, they set out for their
boarding-house.
CHAPTER XVI.
HOW ONE TELEGRAM WAS RECEIVED.

Bob Howard and his companion had other reasons besides those
of which we have spoken, for feeling at peace with themselves
and all the world.
By hard work and strict attention to their books, they had
succeeded in winning an enviable position in their class, and this
night was to wind up their connection with the academy in a blaze of
glory.
George had written an essay on “Unconscious Influence,” which
was a very creditable effort for a boy of his years, and Bob had been
chosen, without one dissenting voice, to deliver the valedictory.
Their trunks were packed, their tickets had been purchased, and
their landlady had promised to give them an early breakfast, so that
they could reach the depot in time to catch the western-bound train
that passed through the village at six o’clock.
“The time draws near,” said Bob, with a tragic air, as he glanced at
the little clock on the mantelpiece. “In five hours we shall have made
our last bow to a Montford audience. The only thing I regret is the
absence of my father; but he was not at all well when I last heard
from him, and he didn’t feel as though he could stand the journey. By
this time to-morrow, if nothing happens to delay us, we shall be
hurrying to meet him as fast as steam can carry us. I tell you, George,
you may make up your mind to see some fun when we get out there
in that wilderness, and for once in your life you will have hunting,
fishing, and horseback riding until you are heartily tired of them all.
Father has a pack of splendid hounds, and it will make you laugh to
see them in pursuit of an antelope or prairie wolf. When you grow
weary of that sport, you can go out with a double-barrel and shoot
grouse and sage-hens over as fine a brace of setters as ever drew to a
scent. Trout streams are plenty, and any one who can throw the fly
can snatch out such beauties as you don’t see here in the Eastern
States this side of the Rangeley Lakes. There is one thing we must do,
George, as soon as we can gain father’s consent—we must clear up a
certain mystery that hangs over those mountains.”
“I have often heard you speak of it,” replied George, with a smile;
“but you have never told me what it is.”
“If I could tell you, it wouldn’t be a mystery, would it? You needn’t
laugh about it, for there is a mystery there, and in all that country
there is no one who has ever been able to solve it. The Indians or
some of the trappers might do it, but they won’t try, for their
superstition makes them timid. Several parties, composed of settlers
and soldiers, and one or two scientific expeditions from Eastern
colleges, have started out from our valley, declaring that they
wouldn’t come back until the thing was cleared up; but they have
always returned, after a few weeks’ absence, in a most dilapidated
condition.”
“There must be a good many obstacles to be overcome,” said
George, “but you may count on me every time.”
“All right. I shall some day put your courage to the test. Now I will
tell you what I have decided to do. If my father is no worse when I
reach home, I shall go to college. He wants me to do it, and I should
like to carry out his wishes, although I expect to be a ranchman all
my life. If he requires my presence at home, I shall remain there, and
you must stay with me. I will give you a position as herdsman at good
wages, and will pay you in money or sheep, or both, just as you
prefer. You can make enough in a few years, by steady work and
economy, to start a ranch of your own on a small scale.”
“You are very kind, Bob,” said George.
“No, I am not. I am only selfish. I am thinking quite as much of my
own comfort and pleasure as I am of yours. I don’t want to stay out
there with no congenial companion to help me while away the time.
It is lonely, especially in winter, when we are snowed up or confined
to the house for days at a time by those furious storms that we call
‘blizzards.’ And since you have no home of your own, and no father
or mother, why shouldn’t you go with me?”
“Wouldn’t it be more agreeable for you to take your Cousin Arthur
out there with you?” asked George. “I have often heard you speak of
him.”
“No, it wouldn’t,” answered Bob, quickly. “His father—Uncle Bob,
after whom I was named—treated my father most shamefully, and
they have not seen each other for years. Father has forgiven him, and
Uncle Bob now and then writes him very friendly letters; but I am
afraid of Uncle Bob, for I know that he is cunning and vindictive, and
always on the lookout for a chance to work some injury to those he
does not like, because my mother often told me so. I have seen him
and Arthur several times, but I did not like either of them. There is
too much ‘Oily Gammon’ about Uncle Bob, while Arthur is—Well, the
less said about him the better. I wouldn’t take him into my father’s
house under any consideration, for his presence there would be
enough to rob life of all its pleasure. I say, George!” exclaimed Bob,
suddenly, “What is that on the table there by your elbow?”
George raised his arm, and, discovering the brown envelope, he
picked it up and looked at it.
“Why, it is a telegram, addressed to you!” said he, handing it over
to his friend, whose face had suddenly grown as pale as death.
“A telegram!” gasped Bob. “It can mean but one of two things. My
father is worse, or else he is—”
Bob could say no more. With trembling hands, he tore open the
dispatch, and, with one swift glance, made himself master of its
contents. Then he pressed his hand to his forehead in a bewildered
sort of way, reeled a moment, as if some one had dealt him a
stunning blow, and, falling heavily back upon the sofa, he covered his
face with his hands and burst into tears. The telegram fluttered out
of his nerveless fingers.
George picked it up, and read the following fateful words:

“Your father died very suddenly this morning. Come home immediately, and
telegraph me from Leavenworth when to meet you at the station.
G. H. Evans.”

We will not speak of the scene that followed. Such sorrow as this,
which had come upon Bob Howard like a clap of thunder from a
clear sky, is too sacred to be intruded upon, even by a sympathizing
pen.
It will be enough to say that after the first overwhelming burst of
grief had passed away, Bob acted more like a caged tiger than a
human being. He longed to fly on the wings of the wind to his far-off
home, in order that he might gaze once more upon that loved face
before the darkness of the grave shut it out forever from his view.
But steam was the only power that could take him there. The next
train left the village at six in the morning, and that was the one Bob
had intended to take.
He ate no supper, and when the time came he began preparing
himself for the evening’s festivities. What a mockery they seemed to
him now!
“Don’t go,” said George, who had tried his best to say something
comforting to his almost heart-broken friend. “The professor will not
expect anything of you to-night.”
“I shall go and deliver my speech—that is, if I have brains enough
to remember it,” said Bob, quietly but firmly. “This sorrow is my
own. No one in the wide world has a share in it, and you will see that
I have self-control enough to take me through the exercises without
detracting in the least from anybody’s enjoyment.”
And he kept his word.
The news of his bereavement had spread all through the village by
this time, and not one of the vast audience that crowded the
Academy Chapel expected to see him on the stage.
When the valedictory was announced, and the young orator
appeared before the footlights, a silence that was almost oppressive
fell upon the assembly. They all sympathized with the boy, and their
sympathy was so intense that, like the darkness that covered the land
of Egypt, it could be felt.
Bob’s voice was husky, and trembled a little at first, but he
gradually regained the mastery of himself as he proceeded, and,
when he ended his peroration, the applause that followed fairly
shook the building.
It was a spontaneous outburst of admiration, not for the oratorical
effort of the student—which was something better than common—
but for the wonderful nerve he exhibited. Few boys could have
passed through such an ordeal.
Bob set out for his boarding-house as soon as he left the stage, and
when George entered the room, an hour later, he was pacing the
floor, with his hands buried deep in his pockets, and his chin resting
on his breast. He was calmer now, and he even smiled as he gave his
chum an approving slap on the back.
“You did yourself credit to-night, George,” said he. “If I could write
an essay like that, I should feel proud of myself. Now, go to bed, and
I will have you up at five o’clock in the morning. I will lie down on
the sofa when I get tired. I know how to sympathize with you now,
for I am alone in the world as you are.”
“There are your uncle and your cousin,” George ventured to
remark.
“They are no more to me than they are to you,” replied Bob. “I
shall drop them a line, telling them of father’s death, but beyond
that, I shall have nothing to do with them. They can stay at their
home in Indiana, and you and I will live on the ranch. You are all I
have, and you must stick to me.”
Neither of the two boys slept a wink that night. Bob walked the
floor, and George lay in bed, watching him through his half-closed
eyes.
At half-past five they disposed of a hasty breakfast, said “good-by”
to their landlady, and to a few friends among the students who had
come to the depot to see them off, and then the fast express whirled
them away toward St. Louis.
Up to this time, Bob Howard’s career had been rather an
uneventful one; but now, capricious fate had taken him in hand, and
ordered that during the next few months his life was to be crowded
full of such excitement and adventure, such perils and startling
surprises, as never before fell to the lot of any boy.
He was to be given ample opportunity for the exercise of the
extraordinary nerve and pluck which he had exhibited while
delivering his valedictory, but with this difference:
Then, he was in the presence of friends, who would willingly have
made every allowance for him, had any forbearance or consideration
on their part been necessary; but hereafter he was to be surrounded
by enemies, who were already plotting his ruin, and who stood ready
to take every possible advantage of him.
Let us follow that other telegram to its destination, and see who
some of these enemies were.
CHAPTER XVII.
TWO NEW CHARACTERS.

“If my last half-hour’s experience isn’t enough to disgust any one
with the dry-goods business, and everything connected with it, I
wouldn’t say so.”
Arthur Howard suspended for a moment the distasteful work of
rolling up the bolts of goods with which his counter was covered, and
gazed after a party of ladies who had just gone out.
While they were in the store he was all bows and smiles, struck
imposing attitudes, fumbled with the watch-chain that hung across
his vest, rested his white hands on the counter, so that the immense
seal-ring he wore on the third finger of his left hand could be plainly
seen, and tried in various other ways to make himself appear
interesting in the eyes of his fair customers; but now he frowned
fiercely, and slammed the heavy bolts about as if he were in no
amiable frame of mind.
He grew angry every time he looked toward the street. The day was
bright and pleasant, and not too warm for comfort, and everybody in
town seemed to have come out for a ride or a promenade.
“Everybody except me sees some pleasure in this world,” said Mr.
Arthur Howard, resuming his work. “I have to toil and slave all the
time for wages that are barely enough to keep me in cigars; and,
more than all, I can’t look forward to anything better. I shall lead a
dog’s life as long as I live. If I had money I should be perfectly happy,
and I would do anything in the world to get it. What did you say,
sir?”
This question was addressed to one of the proprietors of the store,
who leaned over the counter and said something in a tone so low that
Arthur did not catch the words.
“Mr. Allen desires your presence in the office,” was the reply.
The clerk’s under jaw dropped, and he grew red and pale by turns,
as he left his counter and walked toward the office, where the head of
the firm, a stern old gentleman, with gold eye glasses perched on the
top of his nose, sat in an easy chair waiting for him.
“Howard,” said the merchant, when the clerk in obedience to a
sign from his employer, had closed the door behind him, “how much
do we pay you for your services?”
“Twenty-five dollars a month, sir,” was the answer.
And the tone in which it was given was humble enough. The clerk
was always cringing in his demeanor toward his superiors, and
haughty and overbearing when in the presence of those whom he
considered to be beneath him in the social scale. He was just the sort
of person that tyrants are made of.
“Well, now, what I want to know is this,” continued the senior
partner. “How can you afford to dress as you do, and sport a watch
and chain, and rings, and patent-leather shoes, on twenty-five
dollars a month? I can’t afford so much finery on ten times that
amount. Then, your billiards and cigars must cost you a tidy sum,
and you don’t get those livery horses that you drive out into the
country every Sunday for nothing.”
“It takes all my salary, sir,” replied the clerk, pulling out his
handkerchief and arranging his moustache, not because it needed
arranging, but because he wanted to conceal his face from his
employer.
He knew that it was as red as fire, for he could feel it burn.
“Are you sure that you don’t spend more than your salary?” asked
the merchant, in a very significant tone of voice.
“Oh, yes, sir! yes, sir!—quite sure!” replied Mr. Howard, with more
earnestness than the occasion seemed to demand.
He wanted to add, “You surely do not suspect me of dishonesty?”
but the words stuck in his throat.
“Well,” said the merchant, after looking sharply at the clerk for a
moment, “all I have to say is, that you can make twenty-five dollars
go much further than I can. I cannot permit so much extravagance
among those in my employ, for, to say the least, it looks suspicious.
So I have called you in here for the purpose of telling you that we
shall have no further occasion for your services. There is the money
we owe you. Good-day!”
“I am well out of that scrape,” said Mr. Howard to himself, as he
walked rapidly away from the store. “I have been looking for it for a
long time, and I am glad it is over. They can’t prove anything against
me, for I have been very careful, and never took more than two
dollars at a time. Of course, when the receipts ran up to two or three
hundred dollars a day, so small an amount as that wouldn’t be
missed. Now, where shall I look for another situation? Well, I’ll not
think about that now. ‘Sufficient unto the day is the evil thereof,’ as
Shakespeare says. I guess I’ll smoke.”
This soliloquy would seem to indicate that trouble sat very lightly
on Mr. Howard’s shoulders, and that he was not very well posted in
either Shakespeare or the Bible.
It would also seem to indicate that the suspicions his late employer
entertained regarding his honesty were well founded.
Mr. Howard did not care a snap of his finger for those suspicions;
but he did care for the loss of his situation, for he knew that if he did
not work he could get no money to spend.
He turned into a little cigar store while he was communing with
himself, and when he came out, with a freshly-lighted Havana
between his fingers, he saw a sight that enraged him.
An elegant top-buggy, drawn by a pair of stylish, high-stepping
horses, which moved as if they were proud of the gold-mounted
harness they wore, dashed along the street.
The reins were held by an exquisitely-dressed young gentleman
who managed them adroitly with one hand, while with the other he
saluted the friends and acquaintances he saw on the sidewalk. But
there was no salute for Mr. Howard—only a barely perceptible nod of
the head, which the latter pretended he did not see.
“I declare, it’s enough to make one do something desperate,”
thought he, as he threw his cigar spitefully into the gutter and
resumed his walk. “Look at me, and then look at Coal Oil Tom! I have
just seventy dollars in my pocket, less what I paid for that cigar, and
no prospect of getting any more. Five years ago Tom was a hostler in
a hotel stable, somewhere in Pennsylvania—a low, ignorant hostler—
and all he had in the world was a little, rocky farm that he couldn’t
give away. But oil was discovered on that farm, and to-day Tom is
worth half a million dollars. He doesn’t know enough to keep him
over night, but his money takes him into the best society, while I—I
wish those horses would run away, and throw him out and break his
neck!”
Mr. Howard stopped, and looked back at the carriage that
contained the object of his envy, as if he fully expected that his
amiable wish would be gratified. But the rapidly-moving trotters
were kept under perfect control, and in a short time took their driver
safely out of Mr. Howard’s sight.
A quarter of an hour’s walk brought the clerk to his home—a little
cottage in an obscure street, whose surroundings bore testimony to
the poverty or shiftlessness of its occupants.
The house, as well as the fence in front of it, was sadly in need of
paint; some of the blinds hung by one hinge, disclosing to the public
gaze windows with broken panes and sashes heavily festooned with
cobwebs; and the flower garden, once the pride of Arthur’s mother,
now dead and gone, had been given up to weeds, which also covered
the walk that led from the gate through a narrow alley to the back
door.
“This is a pretty place for a white man to call home, I must say!”
said the clerk to himself, while bitterness rankled in his heart. “When
I come here, after passing the fine houses on Crosby Street, where
those happy young people spend every afternoon in playing croquet
on the finely-kept lawns, I tell you it makes me feel wicked when I
contrast their circumstances with my own. No one ever thinks of
inviting me to make one of such a party, and yet I am just as good as
the best of them. It’s the ready cash that determines one’s position in
this world. I wonder what the governor will have to say to me? Of
course I shall not tell him why I was discharged.”
Passing through the kitchen, where a slovenly servant girl was
moving leisurely about making preparations for supper, Arthur
entered the sitting-room, and found there a shabby-genteel old man,
who was slowly pacing the floor. This was Arthur’s father—the
“Uncle Bob” after whom our hero had been named.
He was not a man to inspire confidence at the first glance, and the
longer you looked at him, the less you would like him. He had an
insinuating—or rather, a sneaking—air that he could not shake off,
and his movements, as he trod the thread-bare carpet with his well-
worn gaiters, reminded you of the stealthy actions of a fox.
He had a high and very narrow forehead, a pair of piercing gray
eyes, which looked at you from under shaggy brows, and a long, thin
nose—a nose that seemed formed for thrusting itself into other
people’s affairs, and for finding out secrets that its owner had no
business to know.
Uncle Bob, as we shall call him in this story, had once been in
business for himself; but he was a gentleman of leisure now.
Following the example of more respected men, he had gone as
heavily in debt as his limited credit would allow, and failed when the
proper time came. But it is a dangerous thing for one to fail in
business with his pockets full, unless they are very full, and Uncle
Bob’s creditors had looked so closely into his way of doing business,
that he barely escaped being taken in hand by the law.
It was from this man that Arthur had inherited his great desire for
wealth and his utter abhorrence of any kind of work.
“You are home early to-night,” said Uncle Bob, pausing in his walk.
“Yes,” was the indifferent reply. “And I shall probably be at home
earlier to-morrow night. I have got my walking-papers.”
“Ah!” exclaimed Uncle Bob, elevating his shaggy eyebrows. “What
for?”
“Too many clerks.”
“And what are you going to do now? You can’t live without work.”
“I know that; but I shall not look for another place until the
seventy dollars I have in my pocket are gone. I am going to make
believe that it is two thousand, and live like a gentleman for awhile.
It is hard to be poor. You don’t respect yourself and no one respects
you. What is it, Jane?” he added, turning to the servant girl who just
then opened the door.
“A letter for Mr. Howard,” replied the girl.
“A letter?” repeated Uncle Bob, with a shade of anxiety in his
tones. “Why, it’s a telegram. Who in the world—”
He closed the door behind the girl, and stood with his eyes
fastened on the envelope as if he hoped to find something there that
would tell him where the dispatch came from and what it contained.
“Hand it over here and I will read it for you,” said Arthur, after he
had waited until his patience was all exhausted.
His father probably did not hear the request, or, if he did, he paid
no attention to it. He seated himself in the nearest chair and tore
open the envelope with the most exasperating deliberation.
Like Micawber, he had long clung firmly to the hope that
something would “turn up” in his favor—that the fickle goddess who
had hitherto frowned upon him would change her frowns to smiles—
and he little imagined how near he was to seeing his fond dream
realized.
CHAPTER XVIII.
HOW THE OTHER WAS RECEIVED.

“By the piper that played before Moses!” exclaimed the telegraph
operator at Bolton, when he had received and copied a
message that had come over the wires all the way from some little
place buried in the wilds of Arizona. “If that old villain, Bob Howard,
hasn’t struck it rich this time, I am beat!”
Here the operator read the message over again to make sure that
he had made no mistake in copying it, shaking his head and sighing
deeply all the while, and then he put it into an envelope, which he
handed over to a messenger boy who happened to enter the office at
that moment.
“Wonders will never cease!” he added, as he walked up and down
the office, with his hands buried deep in his pockets; “but this is a
little ahead of anything I ever heard of, and it doesn’t seem possible.
‘And the whole of your deceased brother’s property, roughly
estimated at—’ Whew! I wouldn’t give much for it by the time old
Bob and that scapegrace son of his get through handling it. I guess
that man out in Arizona couldn’t have known his brother as well as
we in Bolton know him. I pity that nephew, whoever he is.”
The messenger boy readily found his way to the little cottage in
that obscure street, of which we spoke in the last chapter, and there,
as we have seen, he found the man for whom it was intended.
“G. H. Evans,” said Uncle Bob, slowly reading the name that was
signed to the dispatch. “Who is he?”
“Why, it is from Arizona!” exclaimed Arthur, who was looking over
Uncle Bob’s shoulder. “Listen to this: ‘Your brother, Eben Howard,
died very suddenly this morning.’ Humph!” he ejaculated, walking
back to his seat with an air of disgust. “They probably expect you to