Python Data Analytics
With Pandas, NumPy, and Matplotlib
Third Edition
Fabio Nelli
Rome, Italy
This book is dedicated to all those who are constantly looking for awareness
Table of Contents

Chapter 1: An Introduction to Data Analysis
Data Analysis
Knowledge Domains of the Data Analyst
Computer Science
Mathematics and Statistics
Machine Learning and Artificial Intelligence
Professional Fields of Application
SciPy
NumPy
Pandas
matplotlib
Conclusions

Chapter 3: The NumPy Library
NumPy: A Little History
The NumPy Installation
ndarray: The Heart of the Library
Create an Array
Types of Data
The dtype Option
Intrinsic Creation of an Array
Basic Operations
Arithmetic Operators
The Matrix Product
General Concepts
Copies or Views of Objects
Vectorization
Broadcasting
Structured Arrays
Reading and Writing Array Data on Files
Loading and Saving Data in Binary Files
Reading Files with Tabular Data
Conclusions

Chapter 4: The pandas Library—An Introduction
pandas: The Python Data Analysis Library
Installation of pandas
Installation from Anaconda
Installation from PyPI
The Series
The Dataframe
The Index Objects
Conclusions

Chapter 5: pandas: Reading and Writing Data
I/O API Tools
CSV and Textual Files
Reading Data in CSV or Text Files
Using Regexp to Parse TXT Files
Reading TXT Files Into Parts
Writing Data in CSV
Concatenating
Combining
Pivoting
Removing
Permutation
Random Sampling
pyplot
The Plotting Window
Histograms
Bar Charts
Horizontal Bar Charts
Multiserial Bar Charts
Multiseries Bar Charts with a pandas Dataframe
Multiseries Stacked Bar Charts
Stacked Bar Charts with a pandas Dataframe
Other Bar Chart Representations

Chapter 8: Machine Learning with scikit-learn
The scikit-learn Library
Machine Learning
Supervised and Unsupervised Learning
Training Set and Testing Set
Conclusions

Chapter 9: Deep Learning with TensorFlow
Artificial Intelligence, Machine Learning, and Deep Learning
Artificial Intelligence
Machine Learning Is a Branch of Artificial Intelligence
Deep Learning Is a Branch of Machine Learning
The Relationship Between Artificial Intelligence, Machine Learning, and Deep Learning
TensorFlow
TensorFlow: Google’s Framework
TensorFlow: Data Flow Graph
Conclusions

Chapter 10: An Example—Meteorological Data
A Hypothesis to Be Tested: The Influence of the Proximity of the Sea
The System in the Study: The Adriatic Sea and the Po Valley
Conclusions

Chapter 11: Embedding the JavaScript D3 Library in the IPython Notebook
The Open Data Source for Demographics
The JavaScript D3 Library
Drawing a Clustered Bar Chart
The Choropleth Maps
The Choropleth Map of the U.S. Population in 2022
Conclusions

Chapter 12: Recognizing Handwritten Digits
Handwriting Recognition
Recognizing Handwritten Digits with scikit-learn
The Digits Dataset
Learning and Predicting
Recognizing Handwritten Digits with TensorFlow
Learning and Predicting with an SLP
Learning and Predicting with an MLP
Conclusions

Chapter 13: Textual Data Analysis with NLTK
Text Analysis Techniques
The Natural Language Toolkit (NLTK)
Import the NLTK Library and the NLTK Downloader Tool
Search for a Word with NLTK
Analyze the Frequency of Words
Select Words from Text
Bigrams and Collocations
Preprocessing Steps
Conclusions

Chapter 14: Image Analysis and Computer Vision with OpenCV
Image Analysis and Computer Vision
OpenCV and Python
OpenCV and Deep Learning
Installing OpenCV
First Approaches to Image Processing and Analysis
Before Starting
Load and Display an Image
Work with Images
Save the New Image
Elementary Operations on Images
Image Blending

Appendix B: Open Data Sources
Index
About the Author
Fabio Nelli is a data scientist and Python consultant who designs and develops Python applications for
data analysis and visualization. He also has experience in the scientific world, having performed various
data analysis roles in pharmaceutical chemistry for private research companies and universities. He has
been a computer consultant for many years at IBM, EDS, and Hewlett-Packard, along with several banks
and insurance companies. He holds a master’s degree in organic chemistry and a bachelor’s degree in
information technologies and automation systems, with many years of experience in life sciences (as a tech
specialist at Beckman Coulter, Tecan, and SCIEX).
For further info and other examples, visit his page at www.meccanismocomplesso.org and the GitHub
page at https://github.com/meccanismocomplesso.
About the Technical Reviewer
Preface
About five years have passed since the last edition of this book. In drafting this third edition, I made some
necessary changes, both to the text and to the code. First, all the Python code has been ported to 3.8 and
greater, and all references to Python 2.x versions have been dropped. Some chapters required a total
rewrite because the content was no longer compatible. I'm referring to TensorFlow 2.x which, compared
to TensorFlow 1.x (covered in the previous edition), has completely revamped its entire reference system.
In five years, the deep learning modules and code developed with version 1.x have proven completely
unusable. Keras and all its modules have been incorporated into the TensorFlow library, replacing all the
classes, functions, and modules that performed similar functions. The construction of neural network
models, their learning phases, and the functions they use have all completely changed. In this edition,
therefore, you have the opportunity to learn the methods of TensorFlow 2.x and to acquire familiarity with
the concepts and new paradigms in the new version.
Regarding data visualization, I decided to add information about the Seaborn library to the matplotlib
chapter. Seaborn, although still in version 0.x, is proving to be a very useful matplotlib extension for data
analysis, thanks to its statistical display of plots and its compatibility with pandas dataframes. I hope that,
with this completely updated third edition, I can further entice you to study and deepen your data analysis
with Python. This book will be a valuable learning tool for you now, and serve as a dependable reference in
the future.
—Fabio Nelli
CHAPTER 1
An Introduction to Data Analysis
In this chapter, you’ll take your first steps in the world of data analysis, learning in detail the concepts and
processes that make up this discipline. The concepts discussed in this chapter are helpful background
for the following chapters, where these concepts and procedures are applied in the form of Python code,
through the use of several libraries that are discussed in later chapters.
Data Analysis
In a world increasingly centralized around information technology, huge amounts of data are produced
and stored each day. Often these data come from automatic detection systems, sensors, and scientific
instrumentation, or you produce them daily and unconsciously every time you make a withdrawal from the
bank or purchase something, when you write on various blogs, or even when you post on social networks.
But what are the data? The data actually are not information, at least in terms of their form. In the
formless stream of bytes, it is difficult at first glance to understand anything about them beyond the bare
numbers, words, or times they report. Information is actually the result of processing, which, taking into
account a certain dataset, extracts conclusions that can be used in various ways. This process of extracting
information from raw data is called data analysis.
The purpose of data analysis is to extract information that is not easily deducible but, when understood,
enables you to carry out studies on the mechanisms of the systems that produced the data. This in turn
allows you to forecast possible responses of these systems and their evolution in time.
Starting as a simple methodical approach to data protection, data analysis has become a full-fledged
discipline, leading to the development of methodologies that generate models. A model is in fact
a translation of the system into a mathematical form. Once there is a mathematical or logical form that can
describe system responses at different levels of precision, you can predict its development or response
to certain inputs. Thus, the aim of data analysis is not the model itself, but the quality of its predictive power.
The predictive power of a model depends not only on the quality of the modeling techniques but also
on the ability to choose a good dataset upon which to build the entire analysis process. So the search for
data, their extraction, and their subsequent preparation, while representing preliminary activities of an
analysis, also belong to data analysis itself, because of their importance in the success of the results.
So far I have spoken of data, their handling, and their processing through calculation procedures. In
parallel to all the stages of data analysis processing, various methods of data visualization have also been
developed. In fact, to understand the data, both individually and in terms of the role they play in the dataset,
there is no better system than to develop the techniques of graphical representation. These techniques are
capable of transforming information, sometimes implicitly hidden, into figures, which help you more easily
understand the meaning of the data. Over the years, many display modes, called charts, have been
developed to suit different kinds of data.
At the end of the data analysis process, you have a model and a set of graphical displays, and you can
predict the responses of the system under study; after that, you move to the test phase. The model is tested
using another set of data for which you know the system response. These data are not used to build the
predictive model. Depending on how well the model replicates the real, observed responses, you obtain an
error calculation and knowledge of the validity of the model and its operating limits.
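The test phase can be illustrated with a minimal sketch. It assumes a hypothetical system whose response follows a straight line with a little noise, builds the model on one part of the data, and measures the error on a held-out part. The data, the 80/20 split, and the helper name fit_line are illustrative assumptions, and only the Python standard library is used so the sketch has no dependencies.

```python
import random

# Hypothetical system: y = 2x + 1 plus a little measurement noise.
random.seed(0)
xs = [i / 10 for i in range(100)]
ys = [2.0 * x + 1.0 + random.gauss(0, 0.1) for x in xs]

# Hold out 20% of the observations; they play no part in building the model.
indices = list(range(len(xs)))
random.shuffle(indices)
test_idx = indices[:20]
train_idx = indices[20:]


def fit_line(x, y):
    """Ordinary least squares for a straight line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])

# Test phase: compare predictions with known responses the model has never seen.
errors = [ys[i] - (slope * xs[i] + intercept) for i in test_idx]
rmse = (sum(e * e for e in errors) / len(errors)) ** 0.5
print(f"slope={slope:.2f} intercept={intercept:.2f} test RMSE={rmse:.3f}")
```

The root mean squared error on the held-out data is one possible error calculation; a value close to the noise level suggests the model generalizes, while a much larger value points to its operating limits.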
These results can be compared to any other models to understand if the newly created one is
more efficient than the existing ones. Once you have assessed that, you can move to the last phase of
data analysis—deployment. This phase consists of implementing the results produced by the analysis,
namely, implementing the decisions to be made based on the predictions generated by the model and its
associated risks.
Data analysis is well suited to many professional activities. So, knowledge of it and how it can be put
into practice is relevant. It allows you to test hypotheses and understand the systems you’ve analyzed
more deeply.
Computer Science
Knowledge of computer science is a basic requirement for any data analyst. Only when you have
good knowledge of and experience in computer science can you efficiently manage the necessary tools for
data analysis. In fact, every step of data analysis involves using calculation software (such as IDL and
MATLAB) and programming languages (such as C++, Java, and Python).
The large amount of data available today, thanks to information technology, requires specific skills in
order to be managed as efficiently as possible. The data are structured and stored in files or database tables
with particular formats, and data research and extraction require knowledge of these formats.
XML, JSON, or simply XLS or CSV files are now the common formats for storing and collecting data, and
many applications allow you to read and manage the data stored in them. When it comes to extracting data
contained in a database, things are not so immediate: you need to know the SQL query language or use
software specially developed for extracting data from a given database.
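As a rough illustration of these formats, the following sketch reads the same two records from CSV text, from JSON text, and from a database table queried with SQL. The sample records, the table name weather, and the column names are hypothetical; in-memory strings and an in-memory SQLite database stand in for real files and a real database.

```python
import csv
import io
import json
import sqlite3

# Hypothetical sample data; in practice these would be files or a real database.
csv_text = "city,temp\nRome,21.5\nMilan,18.0\n"
json_text = '[{"city": "Rome", "temp": 21.5}, {"city": "Milan", "temp": 18.0}]'

# CSV: each row becomes a dict keyed by the header line; values arrive as strings.
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))

# JSON: the parser preserves the structure and the numeric types.
json_rows = json.loads(json_text)

# SQL: load the same records into an in-memory SQLite table and query it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather (city TEXT, temp REAL)")
conn.executemany("INSERT INTO weather VALUES (:city, :temp)", json_rows)
warm = conn.execute(
    "SELECT city FROM weather WHERE temp > ? ORDER BY city", (20.0,)
).fetchall()
conn.close()

print(csv_rows[0]["temp"], json_rows[0]["temp"], warm)
```

Note the difference in typing: the CSV reader returns every value as text, so numeric columns need an explicit conversion, whereas JSON and SQL carry type information with the data.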
Moreover, for some specific types of data research, the data are not available in an explicit format, but
are present in text files (documents and log files) or web pages, or shown as charts, measures, number of
visitors, or HTML tables. This requires specific technical expertise to parse and eventually extract these data
(called web scraping).
Knowledge of information technology is necessary for using the various tools made available by
contemporary computer science, such as applications and programming languages. These tools, in turn, are
needed to perform data analysis and data visualization.
The purpose of this book is to provide all the necessary knowledge, as far as possible, regarding the
development of methodologies for data analysis. The book uses the Python programming language and
specialized libraries that contribute to the performance of the data analysis steps, from data research to data
mining, to publishing the results of the predictive model.
Types of Data
Data can be divided into two distinct categories:
• Categorical (nominal and ordinal)
• Numerical (discrete and continuous)
Categorical data are values or observations that can be divided into groups or categories. There are two
types of categorical values: nominal and ordinal. A nominal variable has no intrinsic order among
its categories. An ordinal variable instead has a predetermined order.
Numerical data are values or observations that come from measurements. There are two types of
numerical values: discrete and continuous numbers. Discrete values can be counted and are distinct and
separated from each other. Continuous values, on the other hand, are values produced by measurements or
observations that assume any value within a defined range.
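The distinction matters in practice because each type supports different operations. The sketch below uses a handful of hypothetical survey records (the field names and values are invented for illustration) to show one operation that makes sense for each type.

```python
from collections import Counter
from statistics import mean

# Hypothetical survey records illustrating the four kinds of values.
records = [
    {"color": "red", "size": "small", "children": 0, "height_cm": 162.5},
    {"color": "blue", "size": "large", "children": 2, "height_cm": 180.0},
    {"color": "red", "size": "medium", "children": 1, "height_cm": 171.2},
    {"color": "green", "size": "small", "children": 3, "height_cm": 158.3},
    {"color": "blue", "size": "medium", "children": 2, "height_cm": 175.0},
]

# Nominal: no order among categories, so counting is the natural summary.
color_counts = Counter(r["color"] for r in records)

# Ordinal: the order is part of the definition, so sorting and medians make sense.
SIZE_ORDER = ["small", "medium", "large"]
sizes_sorted = sorted((r["size"] for r in records), key=SIZE_ORDER.index)
median_size = sizes_sorted[len(sizes_sorted) // 2]

# Discrete: countable, separate values; totals are exact integers.
total_children = sum(r["children"] for r in records)

# Continuous: any value within a range; averages are meaningful.
mean_height = mean(r["height_cm"] for r in records)

print(color_counts, median_size, total_children, round(mean_height, 1))
```

Sorting the colors alphabetically would be possible but meaningless, which is exactly what it means for a nominal variable to have no intrinsic order.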
• Predictive modeling
• Model validation/testing
• Visualization and interpretation of results
• Deployment of the solution (implementation of the solution in the real world)
Figure 1-1 shows a schematic representation of all the processes involved in data analysis.
Problem Definition
The process of data analysis actually begins long before the collection of raw data. In fact, data analysis
always starts with a problem to be solved, which needs to be defined.
The problem is defined only after you have focused on the system you want to study; this may be a
mechanism, an application, or a process in general. Generally, the study aims to better understand
the system's operation, but in particular it is designed to understand the principles of its behavior in order to
be able to make predictions or informed choices.
The definition step and the corresponding documentation (deliverables) of the scientific or
business problem are both very important in order to focus the entire analysis strictly on getting results. In
fact, a comprehensive or exhaustive study of the system is sometimes complex and you do not always have
enough information to start with. So the definition of the problem and especially its planning can determine
the guidelines for the whole project.
Once the problem has been defined and documented, you can move to the project planning stage of
data analysis. Planning is needed to understand which professionals and resources are necessary to meet
the requirements to carry out the project as efficiently as possible. You consider the issues involving the
resolution of the problem. You look for specialists in various areas of interest and install the software needed
to perform data analysis.
Also during the planning phase, you choose an effective team. Generally, these teams should be cross-
disciplinary in order to solve the problem by looking at the data from different perspectives. So, building a
good team is certainly one of the key factors leading to success in data analysis.
Data Extraction
Once the problem has been defined, the first step is to obtain the data in order to perform the analysis.
The data must be chosen with the basic purpose of building the predictive model, and so data selection is
crucial for the success of the analysis as well. The sample data collected must reflect as much as possible
the real world, that is, how the system responds to stimuli from the real world. For example, if you’re using
huge datasets of raw data and they are not collected competently, these may portray false or unbalanced
situations.
Thus, a poor choice of data, or even performing analysis on a dataset that’s not perfectly representative of
the system, will lead to models that diverge from the system under study.
The search and retrieval of data often require a form of intuition that goes beyond mere technical
research and data extraction. This process also requires a careful understanding of the nature and form of
the data, which only good experience and knowledge in the problem’s application field can provide.
Regardless of the quality and quantity of data needed, another issue is using the best data sources.
If the study environment is a laboratory (technical or scientific) and the data generated are
experimental, then the data source is easily identifiable. In this case, the only problems
concern the experimental setup.
But it is not possible for data analysis to reproduce systems in which data are gathered in a strictly
experimental way in every field of application. Many fields require searching for data from the surrounding
world, often relying on external experimental data, or even more often collecting them through interviews
or surveys. So in these cases, finding a good data source that is able to provide all the information you need
for data analysis can be quite challenging. Often it is necessary to retrieve data from multiple data sources to
supplement any shortcomings, to identify any discrepancies, and to make the dataset as general as possible.
When you want to get the data, a good place to start is the web. But most of the data on the web can be
difficult to capture; in fact, not all data are available in a file or database, but might be content embedded
in HTML pages in many different formats. To this end, a methodology called web scraping allows the collection
of data through the recognition of specific occurrences of HTML tags within web pages. There is software
specifically designed for this purpose: once it finds an occurrence, it extracts the desired data. Once the
search is complete, you will get a list of data ready to be subjected to data analysis.
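The idea of recognizing HTML tags and extracting their content can be sketched with the standard library alone. The following is a minimal illustration, not a full scraper: the HTML fragment, the CellScraper class, and the choice of <td> cells are all assumptions made for the example, and a real project would first download the page and would probably rely on a dedicated scraping library.

```python
from html.parser import HTMLParser

# Hypothetical HTML fragment standing in for a downloaded web page.
SAMPLE_PAGE = """
<html><body>
  <table>
    <tr><td class="city">Rome</td><td class="price">3200</td></tr>
    <tr><td class="city">Milan</td><td class="price">4100</td></tr>
  </table>
</body></html>
"""


class CellScraper(HTMLParser):
    """Collect the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []         # one list of cell texts per <tr>
        self._in_cell = False  # True while the parser is inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        # Keep only text found inside a cell; ignore layout whitespace.
        if self._in_cell and self.rows:
            self.rows[-1].append(data.strip())


scraper = CellScraper()
scraper.feed(SAMPLE_PAGE)
print(scraper.rows)  # [['Rome', '3200'], ['Milan', '4100']]
```

The result is a plain list of rows, which is the kind of structure that can then be passed on to the data preparation step.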
Data Preparation
Among all the steps involved in data analysis, data preparation, although seemingly less problematic, in
fact requires more resources and more time to be completed. Data are often collected from different data
sources, each of which has data in it with a different representation and format. So, all of these data have to
be prepared for the process of data analysis.
The preparation of the data is concerned with obtaining, cleaning, normalizing, and transforming
data into an optimized dataset, that is, in a prepared format that’s normally tabular and is suitable for the
methods of analysis that have been scheduled during the design phase.
Many potential problems can arise, including invalid, ambiguous, or missing values, replicated fields,
and out-of-range data.
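As a rough illustration of these problems, the sketch below cleans a small list of records using only built-in Python. The records, the field names, and the accepted temperature range are hypothetical values chosen for the example; in practice the cleaning rules depend on the dataset and on the analysis planned.

```python
# Hypothetical raw records merged from two sources; field names are assumptions.
raw_records = [
    {"id": 1, "temperature": "21.5"},
    {"id": 2, "temperature": ""},        # missing value
    {"id": 3, "temperature": "abc"},     # invalid value
    {"id": 4, "temperature": "999"},     # out-of-range value
    {"id": 1, "temperature": "21.5"},    # replicated record
    {"id": 5, "temperature": " 19.0 "},  # stray whitespace
]

VALID_RANGE = (-50.0, 60.0)  # assumed plausible range, in degrees Celsius


def clean(records):
    """Return records with a numeric, in-range temperature and no duplicates."""
    cleaned, seen_ids = [], set()
    for record in records:
        if record["id"] in seen_ids:
            continue  # drop replicated records
        try:
            value = float(record["temperature"])
        except ValueError:
            continue  # drop missing or invalid values
        if not VALID_RANGE[0] <= value <= VALID_RANGE[1]:
            continue  # drop out-of-range values
        seen_ids.add(record["id"])
        cleaned.append({"id": record["id"], "temperature": value})
    return cleaned


prepared = clean(raw_records)
print(prepared)
```

Here invalid records are simply discarded. Depending on the analysis, it may be preferable to flag them or to replace the missing values instead.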
Data Exploration/Visualization
Exploring the data involves essentially searching the data in a graphical or statistical presentation in order
to find patterns, connections, and relationships. Data visualization is the best tool to highlight possible
patterns.
In recent years, data visualization has been developed to such an extent that it has become a real
discipline in itself. In fact, numerous technologies are utilized exclusively to display data, and many display
types are applied to extract the best possible information from a dataset.
Data exploration consists of a preliminary examination of the data, which is important for
understanding the type of information that has been collected and what it means. In combination with the
information acquired during problem definition, this categorization determines which method of data
analysis is most suitable for arriving at a model definition.
Generally, this phase, in addition to a detailed study of charts through data visualization, may
consist of one or more of the following activities:
• Summarizing data
• Grouping data
• Exploring the relationship between the various attributes
• Identifying patterns and trends
Generally, data analysis requires summarizing statements regarding the data to be studied.
Summarization is a process by which data are reduced to a form that is easier to interpret without
sacrificing important information.
Clustering is a method of data analysis that is used to find groups united by common attributes (also
called grouping).
Another important step of the analysis focuses on the identification of relationships, trends, and
anomalies in the data. In order to find this kind of information, you often have to resort to additional
tools and perform another round of data analysis, this time on the data visualization itself.
Other methods of data mining, such as decision trees and association rules, automatically extract
important facts or rules from the data. These approaches can be used in parallel with data visualization to
uncover relationships between the data.
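Summarizing and grouping can be illustrated with a few lines of standard-library Python. The sample below is a hypothetical list of (region, sales) pairs. Note that it groups records by an attribute that is already known, which is simpler than running a clustering algorithm to discover the groups.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical sample: (region, monthly sales) pairs.
sales = [("north", 120), ("south", 90), ("north", 150),
         ("south", 110), ("east", 200)]

# Summarizing: reduce the whole column to a handful of descriptive values.
values = [amount for _, amount in sales]
summary = {
    "count": len(values),
    "mean": mean(values),
    "median": median(values),
    "min": min(values),
    "max": max(values),
}

# Grouping: collect the values that share a common attribute (the region).
by_region = defaultdict(list)
for region, amount in sales:
    by_region[region].append(amount)
region_means = {region: mean(amounts) for region, amounts in by_region.items()}

print(summary)
print(region_means)
```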
Predictive Modeling
Predictive modeling is a process used in data analysis to create or choose a suitable statistical model to
predict the probability of a result.
After exploring the data, you have all the information needed to develop the mathematical model that
encodes the relationship between the data. These models are useful for understanding the system under
study, and in a specific way they are used for two main purposes. The first is to make predictions about the
data values produced by the system; in this case, you will be dealing with regression models if the result is
numeric or with classification models if the result is categorical. The second purpose is to classify new data
products, and in this case, you will be using classification models if the results are identified by classes or
clustering models if the results could be identified by segmentation. In fact, it is possible to divide the models
according to the type of result they produce:
• Classification models: If the result obtained by the model type is categorical.
• Regression models: If the result obtained by the model type is numeric.
• Clustering models: If the result obtained by the model type is a segmentation.
Simple methods to generate these models include techniques such as linear regression, logistic
regression, classification and regression trees, and k-nearest neighbors. But the methods of analysis are
numerous, and each has specific characteristics that make it excellent for some types of data and analysis.
Each of these methods produces a specific model, so the choice of method depends on the nature of the
model you want to obtain.
Some of these models provide values that correspond to the real system, and their structure explains
some characteristics of the system under study in a simple and clear way. Other
models will continue to give good predictions, but their structure will be no more than a “black box” with
limited ability to explain characteristics of the system.
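As an example of a simple model whose structure is easy to interpret, the following sketch fits a straight line by ordinary least squares using plain Python. The function name fit_line and the sample points are assumptions made for the illustration; a library implementation would normally be used in practice.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel out).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations that lie exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_line(xs, ys)
prediction = slope * 6.0 + intercept  # predict the value for a new input

print(slope, intercept)  # 2.0 1.0
print(prediction)        # 13.0
```

The two fitted numbers can be read directly: the slope says how much the result changes per unit of input, which is the kind of explanation a "black box" model does not offer.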
Model Validation
Model validation, that is, the test phase, allows you to validate the model built on the basis of the
starting data. It is important because it lets you assess the validity of the data produced by the model
by comparing them directly with data from the actual system. This time, however, you start from the set of
starting data on which the entire analysis has been established.
Generally, you refer to the data as the training set when you are using them to build the model, and as
the validation set when you are using them to validate the model.
Thus, by comparing the data produced by the model with those produced by the system, you can
evaluate the error, and using different test datasets, you can estimate the limits of validity of the generated
model. In fact the correctly predicted values could be valid only within a certain range, or they could have
different levels of matching depending on the range of values taken into account.
This process allows you not only to numerically evaluate the effectiveness of the model but also to
compare it with any other existing models. There are several techniques in this regard; the best known is
cross-validation. This technique is based on dividing the training set into different parts. Each of
these parts, in turn, is used as the validation set while the remaining parts are used as the training set.
By iterating in this way, you obtain a more reliable estimate of the model’s performance, which you can use
to progressively refine the model.
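The splitting scheme behind cross-validation can be sketched as follows. This is a simplified illustration under stated assumptions: the "model" merely predicts the mean of the training values, the data are hypothetical, and the folds are taken in order, whereas real implementations usually shuffle the data first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    fold_size, remainder = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `remainder` folds get one extra sample.
        stop = start + fold_size + (1 if fold < remainder else 0)
        validation = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield train, validation
        start = stop


def cross_validate(ys, k):
    """Mean absolute error, averaged over k folds, of a mean-value predictor."""
    errors = []
    for train, validation in k_fold_indices(len(ys), k):
        prediction = sum(ys[i] for i in train) / len(train)  # "training" step
        fold_error = sum(abs(ys[i] - prediction) for i in validation) / len(validation)
        errors.append(fold_error)
    return sum(errors) / len(errors)


ys = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]  # hypothetical target values
folds = list(k_fold_indices(len(ys), 3))
score = cross_validate(ys, 3)

print(folds[0])  # ([2, 3, 4, 5], [0, 1])
print(score)
```

Each sample is used for validation exactly once, so the averaged error does not depend on a single, possibly lucky, split.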
Deployment
This is the final step of the analysis process, which aims to present the results, that is, the conclusions of the
analysis. In a business environment, deployment translates the analysis into a benefit
for the client who commissioned it. In technical or scientific environments, it translates into design
solutions or scientific publications. In other words, deployment basically consists of putting into practice the
results obtained from the data analysis.
There are several ways to deploy the results of data analysis or data mining. Normally, a data analyst’s
deployment consists of writing a report for management or for the customer who requested the analysis.
This document conceptually describes the results obtained from the analysis of data. The report should
be directed to the managers, who are then able to make decisions. Then, they will put into practice the
conclusions of the analysis.
In the documentation supplied by the analyst, each of these four topics is discussed in detail:
• Analysis results
• Decision deployment
• Risk analysis
• Measuring the business impact
When the results of the project include the generation of predictive models, these models can be
deployed as stand-alone applications or can be integrated into other software.
Open Data
In support of the growing demand for data, a huge number of data sources are now available on the Internet.
These data sources freely provide information to anyone in need, and they are called open data.
Here is a list of some open data sources available online, covering different topics. You can find a more
complete list and details of the open data available online in Appendix B.
• Kaggle (www.kaggle.com/datasets) is a huge community of apprentices and expert
data scientists who provide a vast amount of datasets and code that they use for
their analyses. The extensive documentation and the introduction to every aspect
of machine learning are also excellent. They also hold interesting competitions
organized around the resolution of various problems.
• DataHub (datahub.io/search) is a community that makes a huge amount of
datasets freely available, along with tools for their command-line management. The
dataset topics cover various fields, ranging from the financial market, to population
statistics, to the prices of cryptocurrencies.
• NASA Earth Observations (https://neo.gsfc.nasa.gov/dataset_index.php/)
provides a wide range of datasets that contain data collected from global climate and
environmental observations.
• World Health Organization (www.who.int/data/collections) manages and
maintains a wide range of data collections related to global health and well-being.
• World Bank Open Data (https://data.worldbank.org/) provides a listing of
available World Bank datasets covering financial and banking data, development
indicators, and information on the World Bank’s lending projects from 1947 to the
present.
• Data.gov (https://data.gov) is intended to collect and provide access to the
U.S. government’s Open Data, a broad range of government information collected at
different levels (federal, state, local, and tribal).
• European Union Open Data Portal (https://data.europa.eu/en) collects and
makes publicly available a wide range of datasets concerning the public sector of the
European member states.
• Healthdata.gov (www.healthdata.gov/) provides data about health and health care
for doctors and researchers so they can carry out clinical studies and solve problems
regarding diseases, virus spread, and health practices, as well as improve the level of
global health.
• Google Trends Datastore (https://googletrends.github.io/data/) collects and
makes available, divided by topic, the data behind the well-known and very useful
Google Trends, so that you can carry out your own analyses.
Finally, Google has recently made available a search page dedicated to datasets,
where you can search for a topic and obtain a series of datasets (or even data
sources) that correspond as much as possible to what you are looking for. For
example, in Figure 1-3, you can see how, when researching the price of houses, a
series of datasets or data sources are suggested in real time.
Figure 1-3. Example of a search for a dataset regarding the prices of houses on Google Dataset Search
To get an idea of the open data sources available online, you can look at the LOD cloud diagram
(http://cas.lod-cloud.net), which displays the links among the open data sources currently
available on the network (see Figure 1-4). The diagram contains a series of circular elements corresponding
to the available data sources; their color corresponds to a specific topic of the data provided. The legend
indicates the topic-color correspondence. When you click an element on the diagram, you see a page
containing all the information about the selected data source and how to access it.
Figure 1-4. Linked open data cloud diagram 2023, by Max Schmachtenberg, Christian Bizer, Anja Jentzsch,
and Richard Cyganiak. http://cas.lod-cloud.net [CC-BY license]
Compared to other programming languages generally used for data analysis, such as R and MATLAB,
Python not only provides a platform for processing data, but it also has features that make it unique
compared to other languages and specialized applications. The development of an ever-increasing number
of support libraries, the implementation of algorithms of more innovative methodologies, and the ability to
interface with other programming languages (C and Fortran) all make Python unique among its kind.
Furthermore, Python is not only specialized for data analysis, but it also has many other applications,
such as generic programming, scripting, interfacing to databases, and more recently web development,
thanks to web frameworks like Django. So it is possible to develop data analysis projects that are compatible
with the web server with the possibility to integrate them on the web.
For those who want to perform data analysis, Python, with all its packages, is considered the best choice
for the foreseeable future.
Conclusions
In this chapter, you learned what data analysis is and, more specifically, the various processes that comprise
it. Also, you have begun to see the role that data play in building a prediction model and how their careful
selection is at the basis of a careful and accurate data analysis.
In the next chapter, you will take a broad look at Python and the tools it provides to perform data analysis.
CHAPTER 2
Introduction to the Python World
The Python language, and the world around it, is made up of interpreters, tools, editors, libraries, notebooks,
and so on. This Python world has expanded greatly in recent years, growing richer and taking forms that
developers who approach it for the first time can sometimes find complicated and somewhat confusing.
Thus, if you are approaching Python for the first time, you might feel lost among so many choices, especially
about where to start.
This chapter gives you an overview of the entire Python world. You’ll first gain an introduction to the
Python language and its unique characteristics. You’ll learn where to start, what an interpreter is, and how to
begin writing your first lines of code in Python. You’ll then be presented with some newer and more advanced
forms of interactive development than the shell, such as IPython and the IPython Notebook.
Python is an object-oriented programming language. In fact, it allows you to specify classes of objects
and implement their inheritance. Python handles object initialization and cleanup through the optional
special methods __init__ and __del__ rather than through C++-style constructors and destructors. Python
also allows you to implement specific constructs in your code to manage exceptions. However, the structure
of the language is so flexible that it allows you to program with alternative approaches with respect to the
object-oriented one. For example, you can use functional or vectorial approaches.
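The following short sketch puts these features side by side: a class hierarchy with inheritance, an exception handled with try/except, and the same data processed in a functional style. The Shape and Rectangle classes are hypothetical names invented for the illustration.

```python
class Shape:
    """Base class; __init__ initializes each new instance."""

    def __init__(self, name):
        self.name = name

    def area(self):
        raise NotImplementedError("subclasses must implement area()")


class Rectangle(Shape):
    """Subclass that inherits from Shape and overrides area()."""

    def __init__(self, width, height):
        super().__init__("rectangle")
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


shapes = [Rectangle(2, 3), Rectangle(4, 5), Shape("generic")]

# Exception handling: the base class cannot compute an area.
areas = []
for shape in shapes:
    try:
        areas.append(shape.area())
    except NotImplementedError:
        areas.append(None)

# Functional style: filter and map instead of an explicit loop.
doubled = list(map(lambda a: a * 2, filter(lambda a: a is not None, areas)))

print(areas)    # [6, 20, None]
print(doubled)  # [12, 40]
```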
Python is an interactive programming language. Because Python code is executed by an interpreter,
the language can take on very different aspects depending on the context in which it is used.
In fact, you can write many lines of code, similar to what you might do in languages like C++ or Java, and then
launch the program, or you can enter one command at a time at the command line and execute it, immediately
getting the results. Then, depending on the results, you can decide what command to run next. This highly
interactive way to execute code makes the Python computing environment similar to MATLAB. This feature
of Python is one reason it’s popular with the scientific community.
Python is a programming language that can be interfaced. In fact, this programming language can be
interfaced with code written in other programming languages such as C/C++ and FORTRAN. This too
was a winning choice. In fact, thanks to this aspect, Python can compensate for what is perhaps its only
weak point, the speed of execution. The nature of Python, as a highly dynamic programming language, can
sometimes lead to execution of programs up to 100 times slower than the corresponding static programs
compiled with other languages. The solution to this kind of performance problem is to interface Python to
the compiled code of other languages and use that code as if it were Python’s own.
Python is an open-source programming language. CPython, which is the reference implementation
of the Python language, is completely free and open source. Additionally, the vast majority of modules and
libraries available on the network are open source, and their code is available online. Every month, an
extensive developer community contributes improvements that make this language and all its libraries even
richer and more efficient. CPython is
managed by the nonprofit Python Software Foundation, which was created in 2001 and has given itself the
task of promoting, protecting, and advancing the Python programming language.
Finally, Python is a simple language to use and learn. This aspect is perhaps the most important,
because it is the most direct aspect that a developer, even a novice, faces. The high intuitiveness and ease of
reading of Python code often leads to “sympathy” for this programming language, and consequently most
newcomers to programming choose to use it. However, its simplicity does not mean narrowness, since
Python is a language that is spreading in every field of computing. Furthermore, Python is doing all of this
very simply, in comparison to existing programming languages such as C++, Java, and FORTRAN, which by
their nature are very complex.
Lexing, or tokenization, is the initial phase in which the Python (human-readable) code is converted
into a sequence of logical entities, the so-called lexical tokens (see Figure 2-1).
Parsing is the next stage in which the syntax and grammar of the lexical tokens are checked by a parser,
which produces an abstract syntax tree (AST) as a result.
Compiling is the phase in which the compiler reads the AST and, based on this information, generates
the Python bytecode (.pyc or .pyo files), which contains very basic execution instructions. Although this
is a compilation phase, the generated bytecode is still platform-independent, which is very similar to what
happens in the Java language.
The last phase is interpreting, in which the generated bytecode is executed by a Python virtual
machine (PVM).
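These four phases can be observed from within Python itself by using modules of the standard library. The sketch below is only an illustration of the steps described above, applied to a one-line program chosen for the example; the exact bytecode printed by dis depends on the Python version in use.

```python
import ast
import dis
import io
import tokenize

source = "total = 1 + 2\n"

# 1. Lexing: convert the source text into lexical tokens.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
token_strings = [tok.string for tok in tokens if tok.string.strip()]
print(token_strings)  # ['total', '=', '1', '+', '2']

# 2. Parsing: build the abstract syntax tree (AST).
tree = ast.parse(source)
print(type(tree.body[0]).__name__)  # Assign

# 3. Compiling: generate a bytecode object from the AST.
code_object = compile(tree, filename="<example>", mode="exec")
dis.dis(code_object)  # human-readable listing of the bytecode instructions

# 4. Interpreting: the virtual machine executes the bytecode.
namespace = {}
exec(code_object, namespace)
print(namespace["total"])  # 3
```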
CPython
The standard Python interpreter is CPython, which is written in C. This makes it possible to use C-based
libraries from Python. CPython is available on a variety of platforms, including ARM, iOS, and RISC. However,
CPython has been optimized for portability and other specifications rather than for speed.
Cython
The strongly intrinsic nature of C in the CPython interpreter has been taken further with the Cython project.
This project is based on creating a compiler that translates Python code into C. This code is then executed
within a Cython environment at runtime. This type of compilation system makes it possible to introduce C
semantics into the Python code to make it even more efficient. This system has led to the merging of two worlds
of programming language with the birth of Cython, which can be considered a new programming language.
You can find documentation about it online. I advise you to visit cython.readthedocs.io/en/latest/.
Pyston
Pyston (www.pyston.org/) is a fork of the CPython interpreter that implements performance optimization.
This project arises precisely from the need to obtain an interpreter that can replace CPython over time to
remedy its poor performance in terms of execution speed. Recent results seem to confirm these predictions,
reporting a 30 percent improvement in performance in the case of large, real-world applications.
Unfortunately, due to the lack of compatible binary packages, Pyston packages have to be rebuilt during the
download phase.
Jython
In parallel to CPython, there is an implementation built and compiled in Java, called Jython. It was created
by Jim Hugunin in 1997 (www.jython.org/). Jython is an implementation of the Python programming language in
Java; it is further characterized by using Java classes instead of Python modules to implement extensions and
packages of Python.
IronPython
Even the .NET framework offers the possibility of being able to execute Python code inside it. For this
purpose, you can use the IronPython interpreter (https://ironpython.net/). This interpreter allows .NET
developers to develop Python programs on the Visual Studio platform, integrating perfectly with the other
development tools of the .NET platform.
Initially built by Jim Hugunin in 2006 with the release of version 1.0, the project was later supported by a
small team at Microsoft until version 2.7 in 2010. Since then, numerous other versions have been released up
to the current 3.4, all ported forward by a group of volunteers on Microsoft’s CodePlex repository.
PyPy
The PyPy interpreter includes a JIT (just-in-time) compiler, which converts Python code to machine
code at runtime. This choice was made to speed up the execution of Python. PyPy itself is written in
RPython, a restricted subset of the Python language. For more information on this, consult
the official website at www.pypy.org/.
RustPython
As the name suggests, RustPython (rustpython.github.io/) is a Python interpreter written in Rust. This
programming language is quite new but it is gaining popularity. RustPython is an interpreter like CPython
but can also be used as a JIT compiler. It also allows you to run Python code embedded in Rust programs
and compile the code into WebAssembly, so you can run Python code directly from web browsers.
Installing Python
In order to develop programs in Python, you have to install it on your operating system. Linux distributions
and macOS X machines should have a preinstalled version of Python. If not, or if you want to replace that
version with another, you can easily install it. The process for installing Python differs from operating system
to operating system. However, it is a rather simple operation.
On Debian-Ubuntu Linux systems, the first thing to do is to check whether Python is already installed
on your system and what version is currently in use.
Open a terminal (by pressing ALT+CTRL+T) and enter the following command:
python3 --version
If you get the version number as output, then Python is already present on the Ubuntu system. If you get
an error message, Python hasn’t been installed yet.
In this last case
If, on the other hand, the current version is old, you can update it with the latest version of your Linux
distribution by entering the following command:
Finally, if instead you want to install a specific version on your system, you have to explicitly indicate it
in the following way:
On Red Hat and CentOS Linux systems working with rpm packages, run this command instead:
If you are running Windows or macOS X, you can go to the official Python site (www.python.org) and
download the version you prefer. The packages in this case are installed automatically.
However, today there are distributions that provide a number of tools that make the management and
installation of Python, all libraries, and associated applications easier. I strongly recommend you choose one
of the distributions available online.
Python Distributions
Due to the success of the Python programming language, many Python tools have been developed to meet
various functionalities over the years. There are so many that it’s virtually impossible to manage all of them
manually.
In this regard, many Python distributions efficiently manage hundreds of Python packages. In fact,
instead of individually downloading the interpreter, which includes only the standard libraries, and then
needing to individually install all the additional libraries, it is much easier to install a Python distribution.
At the heart of these distributions are the package managers, which are nothing more than applications
that automatically manage, install, upgrade, configure, and remove Python packages that are part of the
distribution.
Their functionality is very useful, since the user simply makes a request regarding a particular package
(which could be an installation for example). Then the package manager, usually via the Internet, performs
the operation by analyzing the necessary version, alongside all dependencies with any other packages, and
downloads them if they are not present.
Anaconda
Anaconda is a free distribution of Python packages distributed by Anaconda, Inc., formerly Continuum Analytics (www.anaconda.com).
This distribution supports Linux, Windows, and macOS X operating systems. Anaconda, in addition to
providing the latest packages released in the Python world, comes bundled with most of the tools you need
to set up a Python development environment.
Indeed, when you install the Anaconda distribution on your system, you can use many tools and
applications described in this chapter, without worrying about having to install and manage them
separately. The basic distribution includes Spyder, an IDE used to develop complex Python programs,
Jupyter Notebook, a wonderful tool for working interactively with Python in a graphical and orderly way, and
Anaconda Navigator, a graphical panel for managing packages and virtual environments.
The management of the entire Anaconda distribution is performed by an application called conda. This
is the package manager and the environment manager of the Anaconda distribution and it handles all of the
packages and their versions.
One of the most interesting aspects of this distribution is the ability to manage multiple development
environments, each with its own version of Python. With Anaconda, you can work independently with
different Python versions at the same time by creating several virtual environments.
You can create, for instance, an environment based on Python 3.11 even if the current Python version on
your system is still 3.10. To do this, you write the following command via the console:
This will generate a new Anaconda virtual environment with all the packages related to the Python
3.11 version. This installation will not affect the Python version installed on your system and won’t generate
any conflicts. When you no longer need the new virtual environment, you can simply uninstall it, leaving
the Python system installed on your operating system completely unchanged. Once it’s installed, you can
activate the new environment by entering the following command:
activate py311
C:\Users\Fabio>activate py311
(py311) C:\Users\Fabio>
You can create as many environments with different versions of Python as you want; you need only change
the parameter passed with the python option in the conda create command. When you want to return to work
with the original Python version, use the following command:
source deactivate
(py311) C:\Users\Fabio>deactivate
Deactivating environment "py311"...
C:\Users\Fabio>
Anaconda Navigator
Although at the base of the Anaconda distribution there is the conda command for the management of
packages and virtual environments, working through the command console is not always practical and
efficient. As you will see in the following chapters of the book, Anaconda provides a graphical tool called
Anaconda Navigator, which allows you to manage the virtual environments and related packages in a
graphical and very simplified way (see Figure 2-2).
Also from the Environments panel it is possible to create new virtual environments, selecting the basic
Python version. Similarly, the same virtual environments can be deleted, cloned, backed up, or imported
using the menu shown in Figure 2-4.
Figure 2-4. Button menu for managing virtual environments in Anaconda Navigator
But that is not all. Anaconda Navigator is not only a useful application for managing Python
applications, virtual environments, and packages. In the third panel, called Learning (see Figure 2-5), it
provides links to the main sites of many useful Python libraries (including those covered in this book). By
clicking one of these links, you can access a lot of documentation. This is always useful to have on hand if
you program in Python on a daily basis.
The next panel, called Community, is similar. It also contains links, but this time to
forums of the main Python development and data analytics communities.
The Anaconda platform, with its multiple applications and Anaconda Navigator, allows developers to
take advantage of this simple and organized work environment and be well prepared for the development
of Python code. It is no coincidence that this platform has become almost a standard for those belonging to
the sector.
Using Python
Python is rich, but simple and very flexible. It allows you to expand your development activities in many
areas of work (data analysis, scientific, graphic interfaces, etc.). Precisely for this reason, Python can be used
in many different contexts, often according to the taste and ability of the developer. This section presents
the various approaches to using Python in the course of the book. According to the various topics discussed
in different chapters, these different approaches will be used specifically, as they are more suited to the task
at hand.
Python Shell
The easiest way to approach the Python world is to open a session in the Python shell, which is a terminal
running a command line. In fact, you can enter one command at a time and test its operation immediately.
This mode makes clear the nature of the interpreter that underlies Python. In fact, the interpreter can read
one command at a time, keeping the status of the variables specified in the previous lines, a behavior similar
to that of MATLAB and other calculation software.
This approach is helpful when approaching Python the first time. You can test commands one at a time
without having to write, edit, and run an entire program, which could be composed of many lines of code.
This mode is also good for testing and debugging Python code one line at a time, or simply to make
calculations. To start a session on the terminal, simply type this on the command line:
C:\Users\nelli>python
Python 3.10 | packaged by Anaconda, Inc. | (main, Mar 1 2023, 18:18:21) [MSC v.1916 64 bit
(AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>
The Python shell is now active and the interpreter is ready to receive commands in Python. Start by
entering the simplest of commands, but a classic for getting started with programming.
If you have the Anaconda platform available on your system, you can open a Python shell related to a
specific virtual environment you want to work on. In this case, from Anaconda Navigator, in the Home panel,
activate the virtual environment from the drop-down menu and click the Launch button of the CMD.exe
Prompt application, as shown in Figure 2-6.
A command console will open with the name of the active virtual environment shown in parentheses at the
start of the prompt. From there, you can run the python command to activate the Python shell.
(Edition3) C:\Users\nelli>python
Python 3.11.0 | packaged by Anaconda, Inc. | (main, Mar 1 2023, 18:18:21) [MSC v.1916 64
bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>
Now you’ve written your first program in Python, and you can run it directly from the command line by
calling the python command and then the name of the file containing the program code.
python MyFirstProgram.py
From the output, the program will ask for your name. Once you enter it, it will say hello.
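The listing of MyFirstProgram.py is not reproduced here. As a hedged sketch of a program with the behavior described, the file could look like the following. The function names, the prompt text, and the greeting are all assumptions made for illustration.

```python
# MyFirstProgram.py -- hypothetical contents, not the book's own listing.

def greeting(name):
    """Build the message shown to the user."""
    return "Hello, " + name + "!"


def main(ask=input, show=print):
    """Ask for a name, then greet; ask and show can be swapped out for testing."""
    name = ask("What is your name? ")
    show(greeting(name))
```

Adding a final line that calls main() turns the file into the interactive script. The call is left out here so that the functions can be imported and tried without waiting for keyboard input.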
Make Calculations
You have already seen that the print() function is useful for printing almost anything. Python, in addition
to being a printing tool, is a great calculator. Start a session on the Python shell and begin to perform these
mathematical operations:
>>> 1 + 2
3
>>> (1.045 * 3)/4
0.78375
>>> 4 ** 2
16
>>> ((4 + 5j) * (2 + 3j))
(-7+22j)
>>> 4 < (2*3)
True
Python can work with many types of data, including complex numbers and Boolean conditions. As you can
see from these calculations, the Python interpreter directly returns the result without the need to use
the print() function. The same applies to values stored in variables: entering the name of a variable
is enough to see its contents.
>>> a = 12 * 3.4
>>> a
40.8
In this way, all the functions contained in the math package are available in your Python session so you
can call them directly. Thus, you have extended the standard set of functions available when you start a
Python session. These functions are called with the following expression.
library_name.function_name()
For example, you can now calculate the sine of the value contained in the variable a.
>>> math.sin(a)
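The import step that this passage relies on is not shown above. A minimal sketch, assuming the library was loaded with a plain import statement:

```python
import math  # makes every function of the math library available as math.<name>

a = 12 * 3.4           # same variable as in the earlier calculation
result = math.sin(a)   # library_name.function_name() form
print(result)          # roughly 0.0407
```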
As you can see, the function is called along with the name of the library. Sometimes you might find the
following expression for declaring an import.
Even if this works properly, it is best avoided. Writing an import in this way brings all the functions
of the library into the session, so that they are called without naming the library to which they belong.
>>> sin(a)
0.040693257349864856
This form of import can lead to serious errors, especially if many libraries are imported. It is not
unlikely that different libraries have functions with the same name, and importing all of them means that
each newly imported function silently overrides any previously imported function of the same name.
The program could then raise numerous errors or, worse, behave abnormally.
In practice, this kind of import is generally used for only a limited number of functions, namely those
that are strictly necessary for the program, thus avoiding the importation of an entire library when it
is unnecessary.
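To make the risk concrete, here is a small sketch comparing the three import styles. It assumes the wildcard form under discussion is from library import *, and it uses the standard math and cmath libraries, which both define a function named sqrt, purely as an illustration.

```python
import math
import cmath

# 1. Qualified calls: the library name shows where each function comes from.
real_root = math.sqrt(4)       # 2.0
complex_root = cmath.sqrt(4)   # (2+0j)

# 2. Wildcard imports: the later import silently rebinds the shared name sqrt.
from math import *
sqrt_after_math = sqrt
from cmath import *
sqrt_after_cmath = sqrt

print(sqrt_after_math is math.sqrt)     # True
print(sqrt_after_cmath is cmath.sqrt)   # True: math's sqrt has been overridden

# 3. Selective import: only the named function is (re)bound in the session.
from math import sin
print(sin(0.0))                         # 0.0
```

With qualified calls both square-root functions remain usable side by side, whereas after the two wildcard imports only the most recently imported sqrt can be reached by its bare name.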
Data Structure
You saw in the previous examples how to use simple variables containing a single value. Python provides a
number of extremely useful data structures. These data structures can contain lots of data simultaneously
and sometimes even data of different types. The various data structures provided are defined differently
depending on how their data are structured internally.
• List
• Set
• Strings
• Tuples
• Dictionary
• Deque
• Heap
This is only a small part of all the data structures that can be made with Python. Among all these data
structures, the most commonly used are dictionaries and lists.
The dictionary type, also called dict, is a data structure in which each value is associated with a
label, called a key. The data in a dictionary are defined as key/value pairs and are accessed by key
rather than by position (since Python 3.7, a dictionary also preserves the order in which its keys were
inserted).
If you want to access a specific value within the dictionary, you have to indicate the name of the
associated key.
>>> dict["name"]
'William'
If you want to iterate over the key/value pairs in a dictionary, you can use the for-in construct
together with the items() method.
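The dictionary used in this example is not defined in the passage. The sketch below assumes a small dictionary whose name key holds 'William', as in the output above. The other keys and values are invented for illustration, and the variable is called person rather than dict to avoid hiding Python's built-in dict type.

```python
# Hypothetical dictionary: only the "name" entry comes from the text.
person = {"name": "William", "age": 25, "city": "London"}

# Access a value through its key.
print(person["name"])          # William

# Iterate over the key/value pairs with for-in and items().
for key, value in person.items():
    print(key, value)

pairs = list(person.items())   # pairs come back in insertion order
```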
The list type is a data structure that contains a number of objects in a precise order, forming a sequence
to which elements can be added and from which they can be removed. Each item is marked with a number
corresponding to its position in the sequence, called the index.
If you want to access an individual element, it is sufficient to specify its index in square brackets (the
first item in the list has 0 as its index). If you want to extract a portion of the list, specify the range
i:j, where i is the index of the first element to include and j is the index at which to stop (the element
at index j is not included).
>>> list[2]
3
>>> list[1:3]
[2, 3]
Negative indices count from the end of the list: -1 refers to the last item, -2 to the one before it, and
so on back to the first.
>>> list[-1]
4
In order to do a scan of the elements of a list, you can use the for-in construct.
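The list itself is not defined in the passage. Judging from the outputs shown, it may well be [1, 2, 3, 4]. The sketch below assumes so, and names the variable numbers rather than list so that the built-in list type stays usable.

```python
numbers = [1, 2, 3, 4]   # assumed contents, consistent with the outputs above

print(numbers[2])        # 3       (indices start at 0)
print(numbers[1:3])      # [2, 3]  (index 3 is excluded)
print(numbers[-1])       # 4       (negative indices count from the end)

# Scan the elements with the for-in construct.
doubled = []
for item in numbers:
    doubled.append(item * 2)
print(doubled)           # [2, 4, 6, 8]
```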
who devours all without distinction. And the meaning in the phrase about
being all in the same boat is, not that there are no degrees among the people
in a boat, but that all those degrees are nothing compared with the
stupendous fact that the boat goes home or goes down. And it is when I
come to the particular criticism on my remarks about “the fact of having to
die” that I feel most confident that I was right and that Mr. Moore is wrong.
It will be noted that I spoke of the fact of having to die, not of the fact of
dying. The brotherhood of men, being a spiritual thing, is not concerned
merely with the truth that all men will die, but with the truth that all men
know it. It is true, as Mr. Moore says, that everything will die, “whether it
be leviathan or butterfly, oak or violet, worm or eagle”; but exactly what, at
the very start, we do not know is whether they know it. Can Mr. Moore
draw forth leviathan with a hook, and extract his hopes and fears about the
heavenly harpooner? Can he worm its philosophy out of a worm, or get the
caterpillar to talk about the faint possibility of a butterfly? The caterpillar on
the leaf may repeat to Blake his mother’s grief; but it does not repeat to
anybody its own grief about its own mother. Can he know whether oaks
confront their fate with hearts of oak, as the phrase is used in a sailor’s
song? He cannot; and this is the whole point about human brotherhood, the
point the vegetarians cannot see. This is why a harpooner is not an assassin;
this is why eating whale’s blubber, though not attractive to the fancy, is not
repulsive to the conscience. We do not know what a whale thinks of death;
still less what the other whales think of his being killed and eaten. He may
be a pessimistic whale, and be perpetually wishing that this too, too solid
blubber would melt, thaw and resolve itself into a dew. He may be a
fanatical whale, and feel frantically certain of passing instantly into a polar
paradise of whales, ruled by the sacred whale who swallowed Jonah. But
we can elicit no sign or gesture from him suggestive of such reflections; and
the working common sense of the thing is that no creatures outside man
seem to have any sense of death at all. Mr. Moore has therefore chosen a
strangely unlucky point upon which to challenge the true egalitarian
doctrine. Almost the most arresting and even startling stamp of the
solidarity and sameness of mankind is precisely this fact, not only of death,
but of the shadow of death. We do know of any man whatever what we do
not know of any other thing whatever, that his death is what we call a
tragedy. From the fact that it is a tragedy flow all the forms and tests by
which we say it is a murder or an execution, a martyrdom or a suicide. They
all depend on an echo or vibration, not only in the soul of man, but in the
souls of all men.
Oddly enough, Mr. Moore has made exactly the same mistake about the
comic as about the tragic. It is true, I think, that almost everything which
has a shape is humorous; but it is not true that everything which has a shape
has a sense of humour. The whale may be laughable, but it is not the whale
who laughs; the image indeed is almost alarming. And the instant the
question is raised, we collide with another colossal fact, dwarfing all human
differentiations; the fact that man is the only creature who does laugh. In the
presence of this prodigious fact, the fact that men laugh in different degrees,
and at different things, shrivels not merely into insignificance but into
invisibility. It is true that I have often felt the physical universe as
something like a firework display: the most practical of all practical jokes.
But if the cosmos is meant for a joke, men seem to be the only cosmic
conspirators who have been let into the joke. There could be no fraternity
like our freemasonry in that secret pleasure. It is true that there are no limits
to this jesting faculty, that it is not confined to common human jests; but it
is confined to human jesters. Mr. Moore may burst out laughing when he
beholds the morning star, or be thrown into convulsions of amusement by
the effect of moonrise seen through a mist. He may, to quote his own
catalogue, see all the fun of an eagle or an oak tree. We may come upon him
in some quiet dell rolling about in uproarious mirth at the sight of a violet.
But we shall not find the violet in a state of uproarious mirth at Mr. Moore.
He may laugh at the worm; but the worm will not turn and laugh at him. For
that comfort he must come to his fellow-sinners: I shall always be ready to
oblige.
The truth involved here has had many names; that man is the image of
God; that he is the microcosm; that he is the measure of all things. He is the
microcosm in the sense that he is the mirror, the only crystal we know in
which the fantasy and fear in things are, in the double and real sense, things
of reflection. In the presence of this mysterious monopoly the differences of
men are like dust. That is what the equality of men means to me; and that is
the only intelligible thing it ever meant to anybody. The common things of
men infinitely outclass all classes. For a man to disagree with this it is
necessary that he should understand it; Mr. Moore may really disagree with
it; but the ordinary modern anti-egalitarian does not understand it, or
apparently anything else. If a man says he had some transcendental dogma
of his own, as Mr. Moore may possibly have, which mixes man with nature
or claims to see other values in men, I shall say no more than that my
religion is different from his, and I am uncommonly glad of it. But if he
simply says that men cannot be equal because some of them are clever and
some of them are stupid—why then I shall merely agree (not without tears)
that some of them are very stupid.
The Sentimentalism of Divorce
DIVORCE is a thing which the newspapers now not only advertise, but
advocate, almost as if it were a pleasure in itself. It may be, indeed, that
all the flowers and festivities will now be transferred from the
fashionable wedding to the fashionable divorce. A superb iced and frosted
divorce-cake will be provided for the feast, and in military circles will be
cut with the co-respondent’s sword. A dazzling display of divorce presents
will be laid out for the inspection of the company, watched by a detective
dressed as an ordinary divorce guest. Perhaps the old divorce breakfast will
be revived; anyhow, toasts will be drunk, the guests will assemble on the
doorstep to see the husband and wife go off in opposite directions; and all
will go merry as a divorce-court bell. All this, though to some it might seem
a little fanciful, would really be far less fantastic than the sort of things that
are really said on the subject. I am not going to discuss the depth and
substance of that subject. I myself hold a mystical view of marriage; but I
am not going to debate it here. But merely in the interests of light and logic
I would protest against the way in which it is frequently debated. The
process cannot rationally be called a debate at all. It is a sort of chorus of
sentimentalists in the sensational newspapers, perpetually intoning some
such formula as this: “We respect marriage, we reverence marriage, holy,
sacred, ineffably exquisite and ideal marriage. True marriage is love, and
when love alters, marriage alters, and when love stops or begins again,
marriage does the same; wonderful, beautiful, beatific marriage.”
Now, with all reasonable sympathy with everything sentimental, I may
remark that all that talk is tosh. Marriage is an institution like any other, set
up deliberately to have certain functions and limitations; it is an institution
like private property, or conscription, or the legal liberties of the subject. To
talk as if it were made or melted with certain changing moods is a mere
waste of words. The object of private property is that as many citizens as
possible should have a certain dignity and pleasure in being masters of
material things. But suppose a dog-stealer were to say that as soon as a man
was bored with his dog it ceased to be his dog, and he ceased to be
responsible for it. Suppose he were to say that by merely coveting the dog,
he could immediately morally possess the dog. The answer would be that
the only way to make men responsible for dogs was to make the relation a
legal one, apart from the likes and dislikes of the moment. Suppose a
burglar were to say: “Private property I venerate, private property I revere;
but I am convinced that Mr. Brown does not truly value his silver Apostle
spoons as such sacred objects should be valued; they have therefore ceased
to be his property; in reality they have already become my property, for I
appreciate their precious character as nobody else can do.” Suppose a
murderer were to say: “What can be more amiable and admirable than
human life lived with a due sense of its priceless opportunity! But I regret
to observe that Mr. Robinson has lately been looking decidedly tired and
melancholy; life accepted in this depressing and demoralizing spirit can no
longer truly be called life; it is rather my own exuberant and perhaps
exaggerated joy of life which I must gratify by cutting his throat with a
carving-knife.”
It is obvious that these philosophers would fail to understand what we
mean by a rule, quite apart from the problem of its exceptions. They would
fail to grasp what we mean by an institution, whether it be the institution of
law, of property, or of marriage. A reasonable person will certainly reply to
the burglar: “You will hardly soothe us by merely poetical praises of
property; because your case would be much more convincing if you denied,
as the Communists do, that property ought to exist at all. There may be,
there certainly are, gross abuses in private property; but, so long as it is an
institution at all, it cannot alter merely with moods and emotions. A farm
cannot simply float away from a farmer, in proportion as his interest in it
grows fainter than it was. A house cannot shift away by inches from a
householder, by certain fine shades of feeling that he happens to have about
it. A dog cannot drift away like a dream, and begin to belong to somebody
else who happens just then to be dreaming of him. And neither can the
serious social relation of husband and wife, of mother and father, or even of
man and woman, be resolved in all its relations by passions and reactions of
sentiment.” This question is quite apart from the question of whether there
are exceptions to the rule of loyalty, or what they are. The primary point is
that there is an institution to which to be loyal. If the new sentimentalists
mean what they say, when they say they venerate that institution, they must
not suggest that an institution can be actually identical with an emotion.
And that is what their rhetoric does suggest, so far as it can be said to
suggest anything.
These writers are always explaining to us why they believe in divorce. I
think I can easily understand why they believe in divorce. What I do not
understand is why they believe in marriage. Just as the philosophical
burglar would be more philosophical if he were a Bolshevist, so this sort of
divorce advocate would be more philosophical if he were a free-lover. For
his arguments never seem to touch on marriage as an institution, or
anything more than an individual experience. The real explanation of this
strange indifference to the institutional idea is, I fancy, something not only
deeper, but wider; something affecting all the institutions of the modern
world. The truth is that these sociologists are not at all interested in
promoting the sort of social life that marriage does promote. The sort of
society of which marriage has always been the strongest pillar is what is
sometimes called the distributive society; the society in which most of the
citizens have a tolerable share of property, especially property in land.
Everywhere, all over the world, the farm goes with the family and the
family with the farm. Unless the whole domestic group hold together with a
sort of loyalty or local patriotism, unless the inheritance of property is
logical and legitimate, unless the family quarrels are kept out of the courts
of officialism, the tradition of family ownership cannot be handed on
unimpaired. On the other hand, the Servile State, which is the opposite of
the distributive state, has always been rather embarrassed by the institution
of marriage. It is an old story that the negro slavery of “Uncle Tom’s Cabin”
did its worst work in the breaking-up of families. But, curiously enough, the
same story is told from both sides. For the apologists of the Slave States, or,
at least, of the Southern States, made the same admission even in their own
defence. If they denied breaking up the slave family, it was because they
denied that there was any slave family to break up.
Free love is the direct enemy of freedom. It is the most obvious of all the
bribes that can be offered by slavery. In servile societies a vast amount of
sexual laxity can go on in practice, and even in theory, save when now and
then some cranky speculator or crazy squire has a fad for some special
breed of slaves like a breed of cattle. And even that lunacy would not last
long; for lunatics are the minority among slave-owners. Slavery has a much
more sane and a much more subtle appeal to human nature than that. It is
much more likely that, after a few such fads and freaks, the new Servile
State would settle down into the sleepy resignation of the old Servile State;
the old pagan repose in slavery, as it was before Christianity came to
trouble and perplex the world with ideals of liberty and chivalry. One of the
conveniences of that pagan world is that, below a certain level of society,
nobody really need bother about pedigree or paternity at all. A new world
began when slaves began to stand on their dignity as virgin martyrs.
Christendom is the civilization that such martyrs made; and slavery is its
returning enemy. But of all the bribes that the old pagan slavery can offer,
this luxury and laxity is the strongest; nor do I deny that the influences
desiring the degradation of human dignity have here chosen their
instrument well.
Street Cries and Stretching the Law
ABOUT a hundred years ago some enemy sowed among our people the
heresy that it is more practical to use a corkscrew to open a sardine-tin,
or to employ a door-scraper as a paperweight. Practical politics came to
mean the habit of using everything for some other purpose than its own; of
snatching up anything as a substitute for something else. A law that had
been meant to do one thing, and had conspicuously failed to do it, was
always excused because it might do something totally different and perhaps
directly contrary. A custom that was supposed to keep everything white was
allowed to survive on condition that it made everything black. In reality this
is so far from being practical that it does not even rise to the dignity of
being lazy. At the best it can only claim to save trouble, and it does not even
do that. What it really means is that some people will take every other kind
of trouble in the world, if they are saved the trouble of thinking. They will
sit for hours trying to open a tin with a corkscrew, rather than make the
mental effort of pursuing the abstract, academic, logical connexion between
a corkscrew and a cork.
Here is an example of the sort of thing I mean, which I came across in a
daily paper to-day. A headline announces in staring letters, and with startled
notes of exclamation, that some abominable judicial authority has made the
monstrous decision that musicians playing in the street are not beggars. The
journalist bitterly remarks that they may shove their hats under our very
noses for money, but yet we must not call them beggars. He follows this
remark with several notes of exclamation, and I feel inclined to add a few of
my own. The most astonishing thing about the matter, to my mind, is that
the journalist is quite innocent in his own indignation. It never so much as
crosses his mind that organ-grinders are not classed as beggars because they
are not beggars. They may be as much of a nuisance as beggars; they may
demand special legislation like beggars; it may be right and proper for
every philanthropist to stop them, starve them, harry them, and hound them
to death just as if they were beggars. But they are not beggars, by any
possible definition of begging. Nobody can be said to be a mere mendicant
who is offering something in exchange for money, especially if it is
something which some people like and are willing to pay for. A street singer
is no more of a mendicant than Madame Clara Butt, though the method
(and the scale) of remuneration differs more or less. Anybody who sells
anything, in the streets or in the shops, is begging in the sense of begging
people to buy. Mr. Selfridge is begging people to buy; the Imperial
International Universal Cosmic Stores is begging people to buy. The only
possible definition of the actual beggar is not that he is begging people to
buy, but that he has nothing to sell.
Now, it is interesting to ask ourselves what the newspaper really meant,
when it was so wildly illogical in what it said. Superficially and as a matter
of mood or feeling, we can all guess what was meant. The writer meant that
street musicians looked very much like beggars, because they wore thinner
and dirtier clothes than his own; and that he had grown quite used to people
who looked like that being treated anyhow and arrested for everything. That
is a state of mind not uncommon among those whom economic security has
kept as superficial as a varnish. But what was intellectually involved in his
vague argument was more interesting. What he meant was, in that deeper
sense, that it would be a great convenience if the law that punishes beggars
could be stretched to cover people who are certainly not beggars, but who
may be as much of a botheration as beggars. In other words, he wanted to
use the mendicity laws in a matter quite unconnected with mendicity; but he
wanted to use the old laws because it would save the trouble of making new
laws—as the corkscrew would save the trouble of going to look for the tin-
opener. And for this notion of the crooked and anomalous use of laws, for
ends logically different from their own, he could, of course, find much
support in the various sophists who have attacked reason in recent times.
But, as I have said, it does not really save trouble; and it is becoming
increasingly doubtful whether it will even save disaster. It used to be said
that this rough-and-ready method made the country richer; but it will be
found less and less consoling to explain why the country is richer when the
country is steadily growing poorer. It will not comfort us in the hour of
failure to listen to long and ingenious explanations of our success. The truth
is that this sort of practical compromise has not led to practical success. The
success of England came as the culmination of the highly logical and
theoretical eighteenth century. The method was already beginning to fail by
the time we came to the end of the compromising and constitutional
nineteenth century. Modern scientific civilization was launched by
logicians. It was only wrecked by practical men. Anyhow, by this time
everybody in England has given up pretending to be particularly rich. It is,
therefore, no appropriate moment for proving that a course of being
consistently unreasonable will always lead to riches.
In truth, it would be much more practical to be more logical. If street
musicians are a nuisance, let them be legislated against for being a
nuisance. If begging is really wrong, a logical law should be imposed on all
beggars, and not merely on those whom particular persons happen to regard
as being also nuisances. What this sort of opportunism does is simply to
prevent any question being considered as a whole. I happen to think the
whole modern attitude towards beggars is entirely heathen and inhuman. I
should be prepared to maintain, as a matter of general morality, that it is
intrinsically indefensible to punish human beings for asking for human
assistance. I should say that it is intrinsically insane to urge people to give
charity and forbid people to accept charity. Nobody is penalized for crying
for help when he is drowning; why should he be penalized for crying for
help when he is starving? Every one would expect to have to help a man to
save his life in a shipwreck; why not a man who has suffered a shipwreck of
his life? A man may be in such a position by no conceivable fault of his
own; but in any case his fault is never urged against him in the parallel
cases. A man is saved from shipwreck without inquiry about whether he has
blundered in the steering of his ship; and we fish him out of a pond before
asking whose fault it was that he fell into it. A striking social satire might be
written about a man who was rescued again and again out of mere motives
of humanity in all the wildest places of the world; who was heroically
rescued from a lion and skilfully saved out of a sinking ship; who was
sought out on a desert island and scientifically recovered from a deadly
swoon; and who only found himself suddenly deserted by all humanity
when he reached the city that was his home.
In the ultimate sense, therefore, I do not myself disapprove of
mendicants. Nor do I disapprove of musicians. It may not unfairly be
retorted that this is because I am not a musician. I allow full weight to the
fairness of the retort, but I cannot think it a good thing that even musicians
should lose all their feelings except the feeling for music. And it may surely
be said that a man must have lost most of his feelings if he does not feel the
pathos of a barrel-organ in a poor street. But there are other feelings besides
pathos covered by any comprehensive veto upon street music and
minstrelsy. There are feelings of history, and even of patriotism. I have seen
in certain rich and respectable quarters of London a notice saying that all
street cries are forbidden. If there were a notice up to say that all old
tombstones should be carted away like lumber, it would be rather less of an
act of vandalism. Some of the old street cries of London are among the last
links that we have with the London of Shakespeare and the London of
Chaucer. When I meet a man who utters one I am so far from regarding him
as a beggar; it is I who should be a beggar, and beg him to say it again.
But in any case it should be made clear that we cannot make one law do
the work of another. If we have real reasons for forbidding something like a
street cry, we should give the reasons that are real; we should forbid it
because it is a cry, because it is a noise, because it is a nuisance, or perhaps,
according to our tastes, because it is old, because it is popular, because it is
historic and a memory of Merry England. I suspect that the subconscious
prejudice against it is rooted in the fact that the pedlar or hawker is one of
the few free men left in the modern city; that he often sells his own wares
directly to the consumer, and does not pay rent for a shop. But if the
modern spirit wishes to veto him, to harry him, or to hang, draw and quarter
him for being free, at least let it so far recognize his dignity as to define
him; and let the law deal with him in principle as well as in practice.
The Revolt of the Spoilt Child
EVERYBODY says that each generation revolts against the last. Nobody
seems to notice that it generally revolts against the revolt of the last. I
mean that the latest grievance is really the last reform. To take but one
example in passing. There is a new kind of novel which I have seen widely
reviewed in the newspapers. No; it is not an improper novel. On the
contrary, it is more proper—almost in the sense of prim—than its authors
probably imagine. It is really a reaction towards a more old-fashioned
morality, and away from a new-fashioned one. It is not so much a revolt of
the daughters as a return of the grandmothers.
Miss May Sinclair wrote a novel of the kind I mean, about a spinster
whose life had been blighted by a tender and sensitive touch in her
education, which had taught her—or rather, expected her—always to
“behave beautifully.” Mrs. Delafield wrote a story with the refreshing name
of “Humbug” on somewhat similar lines. It suggests that children are
actually trained to deception, and especially self-deception, by a delicate
and considerate treatment that continually appealed to their better feelings,
which was always saying, “You would not hurt father.” Now, certainly a
more old-fashioned and simple style of education did not invariably say
“You would not hurt father.” Sometimes it preferred to say, “Father will
hurt you.” I am not arguing for or against the father with the big stick. I am
pointing out that Miss Sinclair and the modern novelists really are arguing
for the father with the big stick, and against a more recent movement that is
supposed to have reformed him. I myself can remember the time when the
progressives offered us, as a happy prospect, the very educational method
which the novelists now describe so bitterly in retrospect. We were told that
true education would only appeal to the better feelings of children; that it
would devote itself entirely to telling them to live beautifully; that it would
use no argument more arbitrary than saying “You would not hurt father.”
That ethical education was the whole plan for the rising generation in the
days of my youth. We were assured beforehand how much more effective
such a psychological treatment would be than the bullying and blundering
idea of authority. The hope of the future was in this humanitarian optimism
in the training of the young; in other words, the hope was set on something
which, when it is established, Mrs. Delafield instantly calls humbug and
Miss Sinclair appears to hate as a sort of hell. What they are suffering from,
apparently, is not the abuses of their grandfathers, but the most modern
reforms of their fathers. These complaints are the first fruits of reformed
education, of ethical societies and social idealists. I repeat that I am for the
moment talking about their opinions and not mine. I am not eulogizing
either big sticks or psychological scalpels; I am pointing out that the outcry
against the scalpel inevitably involves something of a case for the stick. I
have never tied myself to a final belief in either; but I point out that the
progressive, generation after generation, does elaborately tie himself up in
new knots, and then roar and yell aloud to be untied.
It seems a little hard on the late Victorian idealist to be so bitterly abused
merely for being kind to his children. There is something a little
unconsciously comic about the latest generation of critics, who are crying
out against their parents, “Never, never can I forgive the tenderness with
which my mother treated me.” There is a certain irony in the bitterness
which says, “My soul cries for vengeance when I remember that papa was
always polite at the breakfast-table; my soul is seared by the persistent
insolence of Uncle William in refraining from clouting me over the head.”
It seems harsh to blame these idealists for idealizing human life, when they
were only following what was seriously set before them as the only ideal of
education. But, if this is to be said for the late Victorian idealist, there is
also something to be said for the early Victorian authoritarian. Upon their
own argument, there is something to be said for Uncle William if he did
clout them over the head. It is rather hard, even on the great-grandfather
with the big stick, that we should still abuse him merely for having
neglected the persuasive methods that we have ourselves abandoned. It is
hard to revile him for not having discovered to be sound the very
sentimentalities that we have since discovered to be rotten.
For the case of these moderns is worst of all when they do try to find any
third ideal, which is neither the authority which they once condemned for
not being persuasion, nor the persuasion which they now condemn for
being worse than authority. The nearest they can get to any other alternative
is some notion about individuality; about drawing out the true personality of
the child, or allowing a human being to find his real self. It is, perhaps, the
most utterly meaningless talk in the whole muddle of the modern world.
How is a child of seven to decide whether he has or has not found his true
individuality? How, for that matter, is any grown-up person to tell it for
him? How is anybody to know whether anybody has become his true self?
In the highest sense it can only be a matter of mysticism; it can only mean
that there was a purpose in his creation. It can only be the purpose of God,
and even then it is a mystery. In anybody who does not accept the purpose
of God, it can only be a muddle. It is so unmeaning that it cannot be called
mystery but only mystification. Humanly considered, a human personality
is only the thing that does in fact emerge out of a combination of the forces
inside the child and the forces outside. The child cannot grow up in a void
or vacuum with no forces outside. Circumstances will control or contribute
to his character, whether they are the grandfather’s stick or the father’s
persuasion or the conversations among the characters of Miss May Sinclair.
Who in the world is to say positively which of these things has or has not
helped his real personality?
What is his real personality? These philosophers talk as if there was a
complete and complex animal curled up inside every baby, and we had
nothing to do but to let it come out with a yell. As a matter of fact, we all
know, in the case of the finest and most distinguished personalities, that it
would be very difficult to disentangle them from the trials they have
suffered, as well as from the truths they have found. But, anyhow, these
thinkers must give us some guidance as to how they propose to tell whether
their transcendental notion of a true self has been realized or no. As it is,
anybody can say of any part of any personality that it is or is not an
artificial addition obscuring that personality. In fiction, most of the wild and
anarchical characters strike me as entirely artificial. In real life they would
no doubt be much the same, if they could ever be met with in real life. But
anyhow, they would be the products of experience as well as of elemental
impulses; they would be influenced in some way by all they had gone
through; and anybody would be free to speculate on what they would have
been like if they had never had such experiences. Anybody might amuse
himself by trying to subtract the experiences and find the self; anybody who
wanted to waste his time.
Therefore, without feeling any fixed fanaticism for all the old methods,
whether coercive or persuasive, I do think they both had a basis of common
sense which is wanting in this third theory. The parent, whether persuading
or punishing the child, was at least aware of one simple truth. He knew that,
in the most serious sense, God alone knows what the child is really like, or
is meant to be really like. All we can do to him is to fill him with those
truths which we believe to be equally true whatever he is like. We must
have a code of morals which we believe to be applicable to all children, and
impose it on this child because it is applicable to all children. If it seems to
be a part of his personality to be a swindler or a torturer, we must tell him
that we do not want any personalities to be swindlers and torturers. In other
words, we must believe in a religion or philosophy firmly enough to take
the responsibility of acting on it, however much the rising generations may
knock, or kick, at the door. I know all about the word education meaning
drawing things out, and mere instruction meaning putting things in. And I
respectfully reply that God alone knows what there is to draw out; but we
can be reasonably responsible for what we are ourselves putting in.
The Innocence of the Criminal