Third Edition
Fabio Nelli
Python Data Analytics: With Pandas, NumPy, and Matplotlib
Fabio Nelli
Rome, Italy
This book is dedicated to all those who are constantly looking for awareness
Table of Contents
Chapter 1: An Introduction to Data Analysis
Data Analysis
Knowledge Domains of the Data Analyst
Computer Science
Mathematics and Statistics
Machine Learning and Artificial Intelligence
Professional Fields of Application
SciPy
NumPy
Pandas
matplotlib
Conclusions
Chapter 3: The NumPy Library
NumPy: A Little History
The NumPy Installation
ndarray: The Heart of the Library
Create an Array
Types of Data
The dtype Option
Intrinsic Creation of an Array
Basic Operations
Arithmetic Operators
The Matrix Product
General Concepts
Copies or Views of Objects
Vectorization
Broadcasting
Structured Arrays
Reading and Writing Array Data on Files
Loading and Saving Data in Binary Files
Reading Files with Tabular Data
Conclusions
Chapter 4: The pandas Library—An Introduction
pandas: The Python Data Analysis Library
Installation of pandas
Installation from Anaconda
Installation from PyPI
The Series
The Dataframe
The Index Objects
Conclusions
Chapter 5: pandas: Reading and Writing Data
I/O API Tools
CSV and Textual Files
Reading Data in CSV or Text Files
Using Regexp to Parse TXT Files
Reading TXT Files Into Parts
Writing Data in CSV
Concatenating
Combining
Pivoting
Removing
Permutation
Random Sampling
pyplot
The Plotting Window
Histograms
Bar Charts
Horizontal Bar Charts
Multiserial Bar Charts
Multiseries Bar Charts with a pandas Dataframe
Multiseries Stacked Bar Charts
Stacked Bar Charts with a pandas Dataframe
Other Bar Chart Representations
Chapter 8: Machine Learning with scikit-learn
The scikit-learn Library
Machine Learning
Supervised and Unsupervised Learning
Training Set and Testing Set
Conclusions
Chapter 9: Deep Learning with TensorFlow
Artificial Intelligence, Machine Learning, and Deep Learning
Artificial Intelligence
Machine Learning Is a Branch of Artificial Intelligence
Deep Learning Is a Branch of Machine Learning
The Relationship Between Artificial Intelligence, Machine Learning, and Deep Learning
TensorFlow
TensorFlow: Google’s Framework
TensorFlow: Data Flow Graph
Conclusions
Chapter 10: An Example—Meteorological Data
A Hypothesis to Be Tested: The Influence of the Proximity of the Sea
The System in the Study: The Adriatic Sea and the Po Valley
Conclusions
Chapter 11: Embedding the JavaScript D3 Library in the IPython Notebook
The Open Data Source for Demographics
The JavaScript D3 Library
Drawing a Clustered Bar Chart
The Choropleth Maps
The Choropleth Map of the U.S. Population in 2022
Conclusions
Chapter 12: Recognizing Handwritten Digits
Handwriting Recognition
Recognizing Handwritten Digits with scikit-learn
The Digits Dataset
Learning and Predicting
Recognizing Handwritten Digits with TensorFlow
Learning and Predicting with an SLP
Learning and Predicting with an MLP
Conclusions
Chapter 13: Textual Data Analysis with NLTK
Text Analysis Techniques
The Natural Language Toolkit (NLTK)
Import the NLTK Library and the NLTK Downloader Tool
Search for a Word with NLTK
Analyze the Frequency of Words
Select Words from Text
Bigrams and Collocations
Preprocessing Steps
Conclusions
Chapter 14: Image Analysis and Computer Vision with OpenCV
Image Analysis and Computer Vision
OpenCV and Python
OpenCV and Deep Learning
Installing OpenCV
First Approaches to Image Processing and Analysis
Before Starting
Load and Display an Image
Work with Images
Save the New Image
Elementary Operations on Images
Image Blending
Appendix B: Open Data Sources
Index
About the Author
Fabio Nelli is a data scientist and Python consultant who designs and develops Python applications for
data analysis and visualization. He also has experience in the scientific world, having performed various
data analysis roles in pharmaceutical chemistry for private research companies and universities. He has
been a computer consultant for many years at IBM, EDS, and Hewlett-Packard, along with several banks
and insurance companies. He holds a master’s degree in organic chemistry and a bachelor’s degree in
information technologies and automation systems, with many years of experience in life sciences (as a tech
specialist at Beckman Coulter, Tecan, and SCIEX).
For further info and other examples, visit his page at www.meccanismocomplesso.org and the GitHub
page at https://github.com/meccanismocomplesso.
About the Technical Reviewer
Preface
About five years have passed since the last edition of this book. In drafting this third edition, I made some
necessary changes, both to the text and to the code. First, all the Python code has been ported to Python 3.8
and later, and all references to Python 2.x versions have been dropped. Some chapters required a total
rewrite because their content was no longer compatible. I'm referring to TensorFlow 2.x which, compared
to TensorFlow 1.x (covered in the previous edition), has completely revamped its entire reference system.
In five years, the deep learning modules and code developed with version 1.x have become completely
unusable. Keras and all its modules have been incorporated into the TensorFlow library, replacing all the
classes, functions, and modules that performed similar functions. The construction of neural network
models, their learning phases, and the functions they use have all completely changed. In this edition,
therefore, you have the opportunity to learn the methods of TensorFlow 2.x and to become familiar with
the concepts and new paradigms of the new version.
Regarding data visualization, I decided to add information about the Seaborn library to the matplotlib
chapter. Seaborn, although still in version 0.x, is proving to be a very useful matplotlib extension for data
analysis, thanks to its statistical plots and its compatibility with pandas dataframes. With this completely
updated third edition, I hope to further entice you to study and deepen your data analysis with Python, and
I hope this book will be a valuable learning tool for you now and a dependable reference in the future.
—Fabio Nelli
CHAPTER 1
An Introduction to Data Analysis
In this chapter, you’ll take your first steps in the world of data analysis, learning in detail the concepts and
processes that make up this discipline. The concepts discussed here provide helpful background for the
following chapters, where they are applied in the form of Python code using several libraries.
Data Analysis
In a world increasingly centered on information technology, huge amounts of data are produced
and stored each day. Often these data come from automatic detection systems, sensors, and scientific
instrumentation, or you produce them daily and unconsciously every time you withdraw money from the
bank or purchase something, when you write on blogs, or even when you post on social networks.
But what exactly are data? Data are not information, at least in terms of their form. In a formless
stream of bytes, it is difficult at first glance to understand their essence, unless they are simply numbers,
words, or times. Information is actually the result of processing which, starting from a given dataset,
extracts conclusions that can be used in various ways. This process of extracting information from raw data
is called data analysis.
The purpose of data analysis is to extract information that is not easily deducible but, when understood,
enables you to carry out studies on the mechanisms of the systems that produced the data. This in turn
allows you to forecast possible responses of these systems and their evolution in time.
Starting from a simple methodical approach to data protection, data analysis has become a real
discipline, leading to the development of real methodologies that generate models. The model is in fact
a translation of the system to a mathematical form. Once there is a mathematical or logical form that can
describe system responses under different levels of precision, you can predict its development or response
to certain inputs. Thus, the aim of data analysis is not the model, but the quality of its predictive power.
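To make the idea of a model as a mathematical form of the system more concrete, here is a minimal sketch in plain Python. It assumes, purely for illustration, that the system responds linearly to a single input, so the model is y = a + b * x, fitted by ordinary least squares. The observations and the function names `fit_linear_model` and `predict` are hypothetical.

```python
# Minimal sketch: we ASSUME the system responds linearly to one input,
# so the model is y = a + b * x, fitted by ordinary least squares.

def fit_linear_model(xs, ys):
    """Return (a, b) minimizing the squared error of y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-deviation of x and y divided by the squared deviations of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def predict(model, x):
    """Use the fitted model to predict the system's response to input x."""
    a, b = model
    return a + b * x

# Hypothetical observations: inputs and the measured responses of the system
inputs = [1.0, 2.0, 3.0, 4.0, 5.0]
responses = [2.1, 3.9, 6.2, 7.8, 10.1]

model = fit_linear_model(inputs, responses)
print("a = %.3f, b = %.3f" % model)
print("predicted response to x = 6:", round(predict(model, 6.0), 3))
```

Once the parameters are known, the model can be queried for inputs that were never observed, which is what gives it predictive power.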
The predictive power of a model depends not only on the quality of the modeling techniques but also
on the ability to choose a good dataset upon which to build the entire analysis process. So the search for
data, their extraction, and their subsequent preparation, while representing preliminary activities of an
analysis, also belong to data analysis itself, because of their importance in the success of the results.
So far I have spoken of data, their handling, and their processing through calculation procedures. In
parallel with all the stages of data analysis processing, various methods of data visualization have also been
developed. In fact, to understand the data, both individually and in terms of the role they play in the dataset,
there is no better approach than graphical representation. These techniques can transform information,
sometimes implicitly hidden, into figures that help you understand the meaning of the data more easily.
Over the years, many display modes, called charts, have been developed for different kinds of data.
At the end of the data analysis process, you have a model and a set of graphical displays with which you can
predict the responses of the system under study; after that, you move to the test phase. The model is tested
using another set of data for which the system response is known. These data are not used to build the
predictive model. Depending on how well the model replicates the real, observed responses, you obtain an
error estimate and an understanding of the model’s validity and its operating limits.
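The test phase can be sketched as follows. This is only an illustration under stated assumptions: the test inputs, the observed responses, and the two candidate models are hypothetical, and mean absolute error (MAE) and root mean squared error (RMSE) are used as example error measures. Comparing the test errors of the candidates is one simple way to judge which model better replicates the observed responses.

```python
import math

# Hypothetical test set: inputs with known, observed system responses.
# These points were NOT used to build the models being evaluated.
test_inputs = [6.0, 7.0, 8.0]
test_observed = [12.2, 13.8, 16.1]

# Two hypothetical candidate models (e.g., fitted earlier on other data)
candidates = {
    "model_a": lambda x: 0.05 + 1.99 * x,
    "model_b": lambda x: 2.0 * x - 1.0,
}

def evaluate(model, xs, observed):
    """Return (MAE, RMSE) of the model's predictions on the test set."""
    errors = [model(x) - y for x, y in zip(xs, observed)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse

results = {name: evaluate(m, test_inputs, test_observed)
           for name, m in candidates.items()}
for name, (mae, rmse) in results.items():
    print(f"{name}: MAE={mae:.3f}  RMSE={rmse:.3f}")

# Pick the candidate with the lower RMSE on the unseen test data
best = min(results, key=lambda name: results[name][1])
print("Lower test error:", best)
```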
These results can be compared with those of other models to determine whether the newly created model is
more efficient than the existing ones. Once you have assessed that, you can move to the last phase of
data analysis: deployment. This phase consists of implementing the results produced by the analysis,
namely, implementing the decisions based on the predictions generated by the model and on their
associated risks.
Data analysis is well suited to many professional activities, so knowing what it is and how to put it into
practice is valuable. It allows you to test hypotheses and to understand the systems you’ve analyzed
more deeply.
Computer Science
Knowledge of computer science is a basic requirement for any data analyst. Only when you have
good knowledge of and experience in computer science can you efficiently manage the tools needed for
data analysis. In fact, every step of data analysis involves using calculation software (such as IDL,
MATLAB, etc.) and programming languages (such as C++, Java, and Python).
The large amount of data available today, thanks to information technology, requires specific skills in
order to be managed as efficiently as possible. Data are structured and stored in files or database tables with
particular formats, so data research and extraction require knowledge of these various formats. XML, JSON,
XLS, and CSV files are now the common formats for storing and collecting data, and many applications
allow you to read and manage the data stored in them. Extracting data contained in a database is less
immediate: you need to know the SQL query language or use software specially developed for extracting
data from a given database.
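As a rough illustration of how some of these formats are read in Python, the following sketch uses only the standard library (`csv`, `json`, and `sqlite3`) on small in-memory samples. The table name, column names, and values are hypothetical. The pandas library, covered in later chapters, offers higher-level readers such as `read_csv()`, `read_json()`, and `read_sql()`.

```python
import csv
import io
import json
import sqlite3

# Hypothetical in-memory samples standing in for real files
csv_text = "city,temperature\nRome,21.5\nMilan,18.0\n"
json_text = '[{"city": "Rome", "humidity": 60}, {"city": "Milan", "humidity": 72}]'

# CSV: each row becomes a dict keyed by the header fields (values stay strings)
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))

# JSON: parsed directly into Python lists and dicts (numbers keep their type)
json_rows = json.loads(json_text)

# Database: data must be extracted with an SQL query
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (city TEXT, pressure REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)",
                 [("Rome", 1013.2), ("Milan", 1009.8)])
sql_rows = conn.execute(
    "SELECT city, pressure FROM readings WHERE pressure > ?", (1010,)
).fetchall()
conn.close()

print(csv_rows)
print(json_rows)
print(sql_rows)
```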
Moreover, for some specific types of data research, the data are not available in an explicit format, but
are present in text files (documents and log files) or web pages, or shown as charts, measures, numbers of
visitors, or HTML tables. Parsing and extracting these data requires specific technical expertise; extracting
data from web pages in this way is called web scraping.
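Data buried in text files such as logs are often extracted with regular expressions. The sketch below assumes a hypothetical log format (date, time, level, user, action); a real log would need a pattern adapted to its own layout.

```python
import re

# Hypothetical log lines; real log formats vary and the pattern must match them
log_text = """\
2023-05-01 10:15:02 INFO  user=alice action=login
2023-05-01 10:17:45 ERROR user=bob action=upload
2023-05-01 10:20:11 INFO  user=alice action=logout
"""

# Named groups capture the fields of interest on each line
pattern = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>\w+)\s+user=(?P<user>\w+) action=(?P<action>\w+)$",
    re.MULTILINE,
)

# Turn the unstructured text into a list of structured records
records = [m.groupdict() for m in pattern.finditer(log_text)]
errors = [r for r in records if r["level"] == "ERROR"]

print(len(records), "records parsed")
print(errors)
```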
Chapter 1 ■ An Introduction to Data Analysis
Knowledge of information technology is necessary for using the various tools made available by
contemporary computer science, such as applications and programming languages. These tools, in turn, are
needed to perform data analysis and data visualization.
The purpose of this book is to provide, as far as possible, all the knowledge necessary to develop
methodologies for data analysis. The book uses the Python programming language and specialized libraries
that support each step of data analysis, from data research to data mining, to publishing the results of the
predictive model.
Types of Data
Data can be divided into two distinct categories:
• Categorical (nominal and ordinal)
• Numerical (discrete and continuous)
Categorical data are values or observations that can be divided into groups or categories. There are two
types of categorical values: nominal and ordinal. A nominal variable has no intrinsic order among its
categories. An ordinal variable, instead, has a predetermined order.
Numerical data are values or observations that come from measurements. There are two types of
numerical values: discrete and continuous numbers. Discrete values can be counted and are distinct and
separated from each other. Continuous values, on the other hand, are values produced by measurements or
observations that assume any value within a defined range.
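The four kinds of data can be illustrated with plain Python values. In this sketch the variables and their values are hypothetical, and the ordering of the ordinal sizes is an assumption encoded explicitly in a list, since the values themselves do not carry it.

```python
from collections import Counter
from statistics import mean

# Categorical, nominal: no intrinsic order, so we can only count categories
eye_colors = ["brown", "blue", "green", "brown", "blue", "brown"]
most_common_color = Counter(eye_colors).most_common(1)
print(most_common_color)

# Categorical, ordinal: the order of the categories is predetermined
SIZE_ORDER = ["small", "medium", "large"]  # assumed ordering
sizes = ["large", "small", "medium", "small"]
sorted_sizes = sorted(sizes, key=SIZE_ORDER.index)
print(sorted_sizes)

# Numerical, discrete: countable, separate values (e.g., visits per day)
visits_per_day = [3, 0, 5, 2]
total_visits = sum(visits_per_day)
print(total_visits)

# Numerical, continuous: any value within a range (e.g., temperature in °C)
temperatures = [21.4, 22.05, 19.8]
avg_temp = round(mean(temperatures), 2)
print(avg_temp)
```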
• Predictive modeling
• Model validation/testing
• Visualization and interpretation of results
• Deployment of the solution (implementation of the solution in the real world)
Figure 1-1 shows a schematic representation of all the processes involved in data analysis.
Problem Definition
The process of data analysis actually begins long before the collection of raw data. In fact, data analysis
always starts with a problem to be solved, which needs to be defined.
The problem is defined only after you have focused on the system you want to study; this may be a
mechanism, an application, or a process in general. Generally, the study aims to better understand how the
system operates, but in particular it is designed to understand the principles of its behavior in order to
make predictions or choices (defined as informed choices).
The definition step and the corresponding documentation (deliverables) of the scientific or business
problem are both very important in order to focus the entire analysis strictly on getting results. In fact, a
comprehensive or exhaustive study of the system is sometimes complex, and you do not always have enough
information to start with. So the definition of the problem, and especially its planning, can determine the
guidelines for the whole project.
Once the problem has been defined and documented, you can move to the project planning stage of
data analysis. Planning is needed to understand which professionals and resources are necessary to carry
out the project as efficiently as possible. You consider the issues involved in solving the problem, look for
specialists in the various areas of interest, and install the software needed to perform data analysis.
Also during the planning phase, you choose an effective team. Generally, these teams should be cross-
disciplinary in order to solve the problem by looking at the data from different perspectives. So, building a
good team is certainly one of the key factors leading to success in data analysis.
Data Extraction
Once the problem has been defined, the first step is to obtain the data in order to perform the analysis.
The data must be chosen with the basic purpose of building the predictive model, and so data selection is
crucial for the success of the analysis as well. The sample data collected must reflect as much as possible
the real world, that is, how the system responds to stimuli from the real world. For example, if you’re using
huge datasets of raw data and they are not collected competently, these may portray false or unbalanced
situations.
Thus, a poor choice of data, or even performing analysis on a dataset that is not perfectly representative
of the system, will lead to models that diverge from the system under study.
The search and retrieval of data often require a form of intuition that goes beyond mere technical
research and data extraction. This process also requires a careful understanding of the nature and form of
the data, which only good experience and knowledge in the problem’s application field can provide.
Regardless of the quality and quantity of data needed, another issue is using the best data sources.
If the study environment is a laboratory (technical or scientific) and the data generated are
experimental, the data source is easily identifiable. In this case, the only problems will concern the
experimental setup.
However, not every field of application allows data to be gathered in a strictly experimental way.
Many fields require searching for data in the surrounding world, often relying on external experimental
data or, even more often, collecting them through interviews or surveys. In these cases, finding a good data
source that can provide all the information you need for data analysis can be quite challenging. It is often
necessary to retrieve data from multiple data sources to supplement any shortcomings, to identify any
discrepancies, and to make the dataset as general as possible.
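Combining data from multiple sources can be sketched as follows. The two sources, the station identifiers, and the merge policy (prefer source A, fill its gaps from source B, and flag any conflicting values for review) are all hypothetical choices made for illustration.

```python
# Two hypothetical sources describing the same stations, keyed by station id.
# Each covers gaps in the other, and they disagree on one value.
source_a = {
    "ST01": {"temp": 21.5, "humidity": None},
    "ST02": {"temp": 18.0, "humidity": 70},
}
source_b = {
    "ST01": {"temp": 21.5, "humidity": 58},
    "ST02": {"temp": 19.2, "humidity": 70},
    "ST03": {"temp": 16.4, "humidity": 80},
}

merged = {}
discrepancies = []
for station in sorted(source_a.keys() | source_b.keys()):
    a = source_a.get(station, {})
    b = source_b.get(station, {})
    record = {}
    for field in sorted(a.keys() | b.keys()):
        va, vb = a.get(field), b.get(field)
        if va is not None and vb is not None and va != vb:
            # Both sources have a value but they disagree: flag it
            discrepancies.append((station, field, va, vb))
        # Assumed policy: prefer source A, fall back to B for missing values
        record[field] = va if va is not None else vb
    merged[station] = record

print(merged)
print("Discrepancies:", discrepancies)
```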
When you want to get data, a good place to start is the web. But most of the data on the web can be difficult to capture: not all data are available in a file or database, and much of it is content embedded in HTML pages in many different formats. To this end, a methodology called web scraping allows the collection of data by recognizing specific occurrences of HTML tags within web pages. There is software specifically designed for this purpose: once an occurrence is found, it extracts the desired data. When the search is complete, you have a list of data ready to be subjected to data analysis.
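The following is a minimal sketch of the tag-recognition idea behind web scraping, using only Python's standard `html.parser` module. The HTML fragment, the class name `PriceScraper`, and the choice of `<td class="price">` as the target tag are all hypothetical; a real scraper would first download the page over HTTP, and many projects use dedicated third-party libraries instead.

```python
from html.parser import HTMLParser

class PriceScraper(HTMLParser):
    """Hypothetical scraper: collects the text of every <td class="price"> element."""

    def __init__(self):
        super().__init__()
        self.in_price = False  # True while the parser is inside a matching tag
        self.prices = []       # extracted values

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag
        if tag == "td" and ("class", "price") in attrs:
            self.in_price = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_price = False

    def handle_data(self, data):
        # Keep only non-blank text found inside a matching tag
        if self.in_price and data.strip():
            self.prices.append(float(data.strip()))

# Hypothetical page fragment; in practice this text would be downloaded
html = """
<table>
  <tr><td class="item">Apples</td><td class="price">1.20</td></tr>
  <tr><td class="item">Pears</td><td class="price">0.95</td></tr>
</table>
"""

scraper = PriceScraper()
scraper.feed(html)
print(scraper.prices)  # [1.2, 0.95]
```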
Data Preparation
Among all the steps involved in data analysis, data preparation, although seemingly less problematic, in fact requires the most resources and time to complete. Data are often collected from different sources, each with its own representation and format, so all of these data have to be prepared for the data analysis process.
The preparation of the data is concerned with obtaining, cleaning, normalizing, and transforming data into an optimized dataset, that is, into a prepared format, normally tabular, that is suitable for the analysis methods planned during the design phase.
Many potential problems can arise, including invalid, ambiguous, or missing values, replicated fields, and out-of-range data.
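The problems listed above can be illustrated with a minimal sketch in plain Python (standard library only). The records, the `clean()` function, and the 0 to 120 age range are hypothetical. Replacing invalid or out-of-range values with `None` is just one possible policy: depending on the analysis, you might instead drop the row or impute a value.

```python
# Hypothetical raw records gathered from different sources
raw = [
    {"id": "1", "age": "34",  "city": "Rome"},
    {"id": "2", "age": "",    "city": "milan "},  # missing value, messy text
    {"id": "2", "age": "",    "city": "milan "},  # replicated record
    {"id": "3", "age": "abc", "city": "Turin"},   # invalid value
    {"id": "4", "age": "212", "city": "Naples"},  # out-of-range value
]

def clean(records, min_age=0, max_age=120):
    """Drop duplicates, mark invalid/out-of-range ages as missing, normalize text."""
    seen = set()
    prepared = []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key in seen:          # skip exact duplicates
            continue
        seen.add(key)
        try:
            age = int(rec["age"])
        except ValueError:       # empty or non-numeric: treat as missing
            age = None
        if age is not None and not (min_age <= age <= max_age):
            age = None           # out of range: treat as missing
        prepared.append({
            "id": int(rec["id"]),
            "age": age,
            "city": rec["city"].strip().title(),  # normalize spacing and case
        })
    return prepared

for row in clean(raw):
    print(row)
```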
Chapter 1 ■ An Introduction to Data Analysis
Data Exploration/Visualization
Exploring the data essentially means examining them in graphical or statistical form in order to find patterns, connections, and relationships. Data visualization is the best tool for highlighting possible patterns.
In recent years, data visualization has been developed to such an extent that it has become a real
discipline in itself. In fact, numerous technologies are utilized exclusively to display data, and many display
types are applied to extract the best possible information from a dataset.
Data exploration consists of a preliminary examination of the data, which is important for understanding the type of information that has been collected and what it means. Combined with the information acquired while defining the problem, this categorization determines which data analysis method is most suitable for arriving at a model definition.
Generally, in addition to a detailed study of the charts produced through data visualization, this phase may consist of one or more of the following activities:
• Summarizing data
• Grouping data
• Exploring the relationship between the various attributes
• Identifying patterns and trends
Data analysis generally requires summary statements about the data being studied. Summarization is a process by which data are reduced to a concise, interpretable form without sacrificing important information.
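As a minimal sketch of summarization, Python's standard `statistics` module can reduce a list of values to a handful of descriptive numbers. The temperature readings are hypothetical, and the choice of statistics (count, extremes, mean, median, standard deviation) is just a common starting point.

```python
import statistics

# Hypothetical daily temperatures (degrees Celsius) collected over two weeks
temps = [21.5, 22.0, 19.8, 23.1, 24.0, 22.7, 20.9,
         21.8, 25.2, 23.4, 22.1, 20.5, 21.0, 22.9]

# Reduce 14 readings to a few numbers that describe them
summary = {
    "count": len(temps),
    "min": min(temps),
    "max": max(temps),
    "mean": round(statistics.mean(temps), 2),
    "median": statistics.median(temps),
    "stdev": round(statistics.stdev(temps), 2),  # sample standard deviation
}
print(summary)
```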
Clustering (also called grouping) is a method of data analysis used to find groups of items that share common attributes.
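The text does not prescribe a specific clustering algorithm; k-means is one widely used option, and the sketch below applies it to one-dimensional data. The ages, the starting centroids, and the function name `kmeans_1d` are hypothetical, and the code favors clarity over efficiency.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Plain k-means on one-dimensional data (illustrative, not optimized)."""
    centroids = list(centroids)
    groups = []
    for _ in range(iterations):
        # Assignment step: put each value in the group of its nearest centroid
        groups = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            groups[nearest].append(v)
        # Update step: move each centroid to the mean of its group
        new = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
        if new == centroids:  # no change: the algorithm has converged
            break
        centroids = new
    return centroids, groups

# Hypothetical customer ages; start from two arbitrary guesses
ages = [18, 21, 22, 25, 47, 51, 55, 60]
centers, groups = kmeans_1d(ages, centroids=[20, 40])
print(centers)  # [21.5, 53.25]
print(groups)   # [[18, 21, 22, 25], [47, 51, 55, 60]]
```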
Another important step of the analysis focuses on identifying relationships, trends, and anomalies in the data. To find this kind of information, you often have to resort to visualization tools and perform another round of data analysis, this time on the visualizations themselves.
Other methods of data mining, such as decision trees and association rules, automatically extract
important facts or rules from the data. These approaches can be used in parallel with data visualization to
uncover relationships between the data.
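To make association rules concrete, here is a minimal sketch that computes the two classic measures, support (how often items appear together) and confidence (how often the right-hand item appears when the left-hand one does), for single-item rules. The baskets, the thresholds, and the helper names are hypothetical. Real algorithms such as Apriori handle larger itemsets far more efficiently.

```python
from itertools import combinations

# Hypothetical shopping baskets
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

def support(itemset):
    """Fraction of baskets that contain every item in itemset."""
    return sum(itemset <= b for b in baskets) / len(baskets)

def rules(min_support=0.4, min_confidence=0.7):
    """Return (lhs, rhs, support, confidence) for single-item rules lhs -> rhs."""
    items = sorted(set().union(*baskets))
    found = []
    for a, b in combinations(items, 2):
        pair = {a, b}
        if support(pair) < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            conf = support(pair) / support({lhs})
            if conf >= min_confidence:
                found.append((lhs, rhs, support(pair), conf))
    return found

for lhs, rhs, s, c in rules():
    print(f"{lhs} -> {rhs}  support={s:.2f}  confidence={c:.2f}")
```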
Predictive Modeling
Predictive modeling is a process used in data analysis to create or choose a suitable statistical model to
predict the probability of a result.
After exploring the data, you have all the information needed to develop the mathematical model that encodes the relationships within the data. These models are useful for understanding the system under study and are specifically used for two main purposes. The first is to make predictions about the data values produced by the system; in this case, you will be dealing with regression models if the result is numeric or with classification models if the result is categorical. The second purpose is to classify new data products; in this case, you will be using classification models if the results are identified by classes or clustering models if the results are identified by segmentation. In fact, it is possible to divide the models according to the type of result they produce:
• Classification models: If the result obtained by the model is categorical.
• Regression models: If the result obtained by the model is numeric.
• Clustering models: If the result obtained by the model is a segmentation.
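A classification model returns a category rather than a number. The following minimal sketch uses k-nearest neighbors (k-NN), chosen here only as an illustration. The training data, the labels, `k=3`, and the function name `knn_predict` are hypothetical, and the features are deliberately left unscaled to keep the code short. `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter

# Hypothetical training data: (height_cm, weight_kg) -> category
training = [
    ((150, 50), "small"),
    ((155, 55), "small"),
    ((160, 58), "small"),
    ((175, 75), "large"),
    ((180, 80), "large"),
    ((185, 90), "large"),
]

def knn_predict(point, data, k=3):
    """Return the majority label among the k training points nearest to point."""
    by_distance = sorted(data, key=lambda item: math.dist(point, item[0]))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

print(knn_predict((158, 54), training))  # small
print(knn_predict((178, 82), training))  # large
```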
Simple methods for generating these models include linear regression, logistic regression, classification and regression trees, and k-nearest neighbors. But the methods of analysis are numerous, and each has specific characteristics that make it excellent for some types of data and analysis. Each method produces a specific kind of model, so the choice of method depends on the nature of the model you want to obtain.
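Linear regression, the first of the simple methods listed above, produces a numeric result. The following is a minimal ordinary-least-squares sketch in plain Python. The advertising/sales figures are hypothetical and were chosen to lie exactly on the line y = 1 + 2x, so the fitted coefficients are easy to check.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b

# Hypothetical data: advertising spend (thousands) vs. sales (units)
spend = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]  # exactly sales = 1 + 2 * spend

a, b = fit_line(spend, sales)
print(a, b)       # 1.0 2.0
print(a + b * 6)  # numeric prediction for spend = 6: 13.0
```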
Some of these models provide values that correspond to the real system and, thanks to their structure, explain some characteristics of the system under study in a simple and clear way. Other models still give good predictions, but their structure is no more than a "black box" with limited ability to explain characteristics of the system.
Model Validation
Validation of the model, that is, the test phase, is an important phase that allows you to validate the model built on the basis of the starting data. It is important because it lets you assess the validity of the data produced by the model by comparing them directly with the actual system. This time, however, you are stepping outside the set of starting data on which the entire analysis has been established.
Generally, you refer to the data as the training set when you are using them to build the model, and as
the validation set when you are using them to validate the model.
Thus, by comparing the data produced by the model with those produced by the system, you can evaluate the error, and by using different test datasets, you can estimate the limits of validity of the generated model. In fact, the predicted values could be correct only within a certain range, or they could match the system to different degrees depending on the range of values considered.
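The error evaluation described above can be sketched in a few lines: a model is compared against values observed on the real system that were kept aside and never used to build it. The model y = 1 + 2x and the validation points are hypothetical, and mean squared error (MSE) is just one common way to summarize the error.

```python
# Hypothetical model obtained earlier from the training set: y = 1 + 2x
def model(x):
    return 1 + 2 * x

# Hypothetical validation set: (input, value observed on the real system),
# kept aside and not used to build the model
validation = [(6, 13.4), (7, 14.6), (8, 17.1)]

errors = [model(x) - y for x, y in validation]  # prediction minus reality
mse = sum(e ** 2 for e in errors) / len(errors)  # mean squared error
print(round(mse, 4))  # 0.11
```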
This process allows you not only to evaluate the effectiveness of the model numerically but also to compare it with other existing models. There are several techniques for this; the best known is cross-validation. This technique divides the training set into several parts. Each part, in turn, is used as the validation set while the remaining parts are used as the training set. Repeating this procedure gives you a more reliable estimate of the model's performance, which in turn helps you refine the model.
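Here is a minimal sketch of k-fold cross-validation in plain Python. To keep the focus on the splitting logic, the "model" is deliberately trivial: it always predicts the mean of its training targets. The data values and all function names are hypothetical. Each fold is held out once for validation, and the average of the per-fold errors estimates how the model performs on data it has not seen.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of (almost) equal size."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def fit_mean(ys):
    """Trivial hypothetical model: always predict the mean of the training targets."""
    return sum(ys) / len(ys)

def cross_validate(ys, k):
    scores = []
    for fold in k_fold_indices(len(ys), k):
        held_out = set(fold)
        train = [y for i, y in enumerate(ys) if i not in held_out]  # k-1 folds
        validation = [ys[i] for i in fold]                          # 1 fold
        prediction = fit_mean(train)
        mse = sum((prediction - y) ** 2 for y in validation) / len(validation)
        scores.append(mse)
    return scores

data = [2, 4, 4, 5, 6, 7, 8, 9]  # hypothetical target values
scores = cross_validate(data, k=4)
print(scores, sum(scores) / len(scores))  # per-fold errors and their average
```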
Deployment
This is the final step of the analysis process, which aims to present the results, that is, the conclusions of the analysis. In a business environment, deployment translates the analysis into a benefit for the client who commissioned it. In technical or scientific environments, it is translated into design solutions or scientific publications. In short, deployment consists of putting into practice the results obtained from the data analysis.
There are several ways to deploy the results of data analysis or data mining. Normally, a data analyst's deployment consists of writing a report for management or for the customer who requested the analysis. This document conceptually describes the results obtained from the analysis of the data. The report should be addressed to the managers, who can then make decisions and put the conclusions of the analysis into practice.
In the documentation supplied by the analyst, each of these four topics is discussed in detail:
• Analysis results
• Decision deployment
• Risk analysis
• Measuring the business impact
When the results of the project include the generation of predictive models, these models can be
deployed as stand-alone applications or can be integrated into other software.
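One simple way to integrate a predictive model into other software is to save its fitted parameters in a portable file that another application loads. The sketch below uses JSON from the standard library; the parameter names, the file name `model.json`, and the `predict` function are hypothetical. Python's `pickle` module or dedicated model formats are common alternatives.

```python
import json
import os
import tempfile

# Hypothetical parameters of a fitted linear model y = intercept + slope * x
params = {"model": "linear", "intercept": 1.0, "slope": 2.0}

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "model.json")

    # Analysis side: save the fitted parameters
    with open(path, "w") as f:
        json.dump(params, f)

    # Application side: load the parameters where predictions are needed
    with open(path) as f:
        loaded = json.load(f)

def predict(x, p=loaded):
    """Use the loaded parameters to make a prediction."""
    return p["intercept"] + p["slope"] * x

print(predict(10))  # 21.0
```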
Open Data
In support of the growing demand for data, a huge number of data sources are now available on the Internet.
These data sources freely provide information to anyone in need, and they are called open data.
Here is a list of some open data available online covering different topics. You can find a more complete
list and details of the open data available online in Appendix B.
• Kaggle (www.kaggle.com/datasets) is a huge community of beginner and expert data scientists who share a vast number of datasets and the code they use for their analyses. Its extensive documentation and introductions to every aspect of machine learning are also excellent. Kaggle also hosts interesting competitions organized around solving various problems.
• DataHub (datahub.io/search) is a community that makes a huge amount of
datasets freely available, along with tools for their command-line management. The
dataset topics cover various fields, ranging from the financial market, to population
statistics, to the prices of cryptocurrencies.
• NASA Earth Observations (https://neo.gsfc.nasa.gov/dataset_index.php/)
provides a wide range of datasets that contain data collected from global climate and
environmental observations.
• World Health Organization (www.who.int/data/collections) manages and
maintains a wide range of data collections related to global health and well-being.
• World Bank Open Data (https://data.worldbank.org/) provides a listing of
available World Bank datasets covering financial and banking data, development
indicators, and information on the World Bank’s lending projects from 1947 to the
present.
• Data.gov (https://data.gov) is intended to collect and provide access to the
U.S. government’s Open Data, a broad range of government information collected at
different levels (federal, state, local, and tribal).
• European Union Open Data Portal (https://data.europa.eu/en) collects and
makes publicly available a wide range of datasets concerning the public sector of the
European member states.
• Healthdata.gov (www.healthdata.gov/) provides data about health and health care
for doctors and researchers so they can carry out clinical studies and solve problems
regarding diseases, virus spread, and health practices, as well as improve the level of
global health.
• Google Trends Datastore (https://googletrends.github.io/data/) collects and makes available, divided by topic, the data behind the well-known and very useful Google Trends service, so that you can carry out analyses on your own.
Finally, Google has recently made available a search page dedicated to datasets, Google Dataset Search, where you can search for a topic and obtain a series of datasets (or data sources) that match what you are looking for as closely as possible. For example, in Figure 1-3, you can see how a series of datasets or data sources is suggested in real time when you search for house prices.
Figure 1-3. Example of a search for a dataset regarding the prices of houses on Google Dataset Search
To get an idea of the open data sources available online, you can look at the LOD cloud diagram (http://cas.lod-cloud.net), which displays the links among several open data sources currently available on the network (see Figure 1-4). The diagram contains a series of circular elements corresponding to the available data sources; their color indicates the topic of the data provided, and the legend shows the topic-color correspondence. When you click an element in the diagram, you see a page containing all the information about the selected data source and how to access it.
Figure 1-4. Linked open data cloud diagram 2023, by Max Schmachtenberg, Christian Bizer, Anja Jentzsch,
and Richard Cyganiak. http://cas.lod-cloud.net [CC-BY license]
Compared to other programming languages generally used for data analysis, such as R and MATLAB, Python not only provides a platform for processing data but also has features that set it apart from other languages and specialized applications. The ever-increasing number of support libraries, the implementation of the most innovative algorithms, and the ability to interface with other programming languages (C and Fortran) all make Python unique of its kind.
Furthermore, Python is not specialized only for data analysis; it has many other applications, such as general-purpose programming, scripting, interfacing with databases, and, more recently, web development thanks to web frameworks like Django. So it is possible to develop data analysis projects that run on a web server and integrate them into web applications.
For those who want to perform data analysis, Python, with all its packages, is considered by many to be the best choice for the foreseeable future.
Conclusions
In this chapter, you learned what data analysis is and, more specifically, the various processes that make it up. You have also begun to see the role that data play in building a prediction model and how their careful selection is the foundation of an accurate data analysis.
In the next chapter, you take a closer look at Python and the tools it provides to perform data analysis.
CHAPTER 2
Introduction to the Python World
The Python language, and the world around it, is made up of interpreters, tools, editors, libraries, notebooks, and so on. This Python world has expanded greatly in recent years, growing richer and taking forms that developers approaching it for the first time can find complicated and somewhat confusing. Thus, if you are approaching Python for the first time, you might feel lost among so many choices, especially about where to start.
This chapter gives you an overview of the entire Python world. You'll first get an introduction to the Python language and its unique characteristics. You'll learn where to start, what an interpreter is, and how to begin writing your first lines of code in Python before being introduced to newer and more advanced forms of interactive coding than the basic shell, such as IPython and the IPython Notebook.
Python is an object-oriented programming language. In fact, it allows you to define classes of objects and implement their inheritance. Unlike C++ and Java, however, Python does not rely on explicit constructors and destructors in the same way: objects are set up by the special __init__ method, and memory is released automatically by the interpreter. Python also allows you to implement specific constructs in your code to manage exceptions. However, the structure of the language is so flexible that it allows you to program with approaches other than the object-oriented one. For example, you can use functional or vectorial approaches.
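The features just described (classes, inheritance, an `__init__` initializer, exception handling, and a functional style) can all appear in a few lines. In this minimal sketch, the class names `Shape`, `Circle`, and `Square` are hypothetical.

```python
import math

class Shape:
    def __init__(self, name):  # initializer, run when the object is created
        self.name = name

    def area(self):
        raise NotImplementedError("subclasses must implement area()")

class Circle(Shape):  # Circle inherits from Shape
    def __init__(self, radius):
        super().__init__("circle")
        if radius < 0:
            raise ValueError("radius must be non-negative")
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2

class Square(Shape):  # Square also inherits from Shape
    def __init__(self, side):
        super().__init__("square")
        self.side = side

    def area(self):
        return self.side ** 2

shapes = [Circle(1.0), Square(2.0)]

# Exception handling: invalid input is reported instead of crashing the program
try:
    Circle(-1)
except ValueError as err:
    print("Error:", err)

# Functional style on the same objects: map() with a lambda instead of a loop
areas = list(map(lambda s: round(s.area(), 2), shapes))
print(areas)  # [3.14, 4.0]
```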
Python is an interactive programming language. Because Python code is executed by an interpreter, the language can take on very different aspects depending on the context in which it is used. You can write a long program, as you would in languages like C++ or Java, and then launch it, or you can enter one command at a time on the command line and immediately get the result. Then, depending on the result, you can decide which command to run next. This highly interactive way of executing code makes the Python computing environment similar to MATLAB, and it is one reason Python is popular with the scientific community.
Python is a programming language that can be interfaced with code written in other programming languages, such as C/C++ and FORTRAN. This too was a winning choice: thanks to it, Python can compensate for what is perhaps its only weak point, execution speed. As a highly dynamic programming language, Python can sometimes run programs up to 100 times slower than the corresponding statically compiled programs written in other languages. The solution to this kind of performance problem is to interface Python with compiled code from other languages and use that code as if it were Python's own.
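As a small illustration of calling compiled code from Python, the standard `ctypes` module can load a C library and call its functions directly. This minimal sketch assumes a POSIX system such as Linux or macOS, where the C math library provides `cos`; on Windows the library lookup differs. Data analysis libraries such as NumPy take a more elaborate route, shipping their own compiled C code, but the principle of delegating heavy work to compiled code is the same.

```python
import ctypes
import ctypes.util

# Locate the C math library (assumes a POSIX system such as Linux or macOS);
# if it cannot be found by name, fall back to the symbols already loaded
libm_name = ctypes.util.find_library("m")
libm = ctypes.CDLL(libm_name) if libm_name else ctypes.CDLL(None)

# Declare the C signature: double cos(double)
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double

print(libm.cos(0.0))  # 1.0, computed by compiled C code
```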
Python is an open-source programming language. CPython, which is the reference implementation of the Python language, is completely free and open source. Additionally, most modules and libraries available online are open source, and their code is publicly accessible. Every month, an extensive developer community contributes improvements that make the language and its libraries even richer and more efficient. CPython is managed by the nonprofit Python Software Foundation, which was created in 2001 and has given itself the task of promoting, protecting, and advancing the Python programming language.
Finally, Python is a simple language to use and learn. This aspect is perhaps the most important, because it is the first one a developer, even a novice, encounters. The intuitiveness and readability of Python code often create a natural liking for the language, and consequently many newcomers to programming choose it. However, its simplicity does not mean limited scope: Python is spreading into every field of computing, and it does so far more simply than languages such as C++, Java, and FORTRAN, which are by nature much more complex.
Chapter 2 ■ Introduction to the Python World
Lexing, or tokenization, is the initial phase in which the Python (human-readable) code is converted
into a sequence of logical entities, the so-called lexical tokens (see Figure 2-1).
Parsing is the next stage in which the syntax and grammar of the lexical tokens are checked by a parser,
which produces an abstract syntax tree (AST) as a result.
Compiling is the phase in which the compiler reads the AST and, based on this information, generates the Python bytecode (cached in .pyc files; versions before Python 3.5 also used .pyo files for optimized bytecode), which contains very basic execution instructions. Although this is a compilation phase, the generated bytecode is still platform-independent, much as happens in the Java language.
The last phase is interpreting, in which the generated bytecode is executed by a Python virtual
machine (PVM).
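The four phases can be observed directly from Python using the standard `tokenize`, `ast`, and `dis` modules together with the built-in `compile()` and `exec()` functions. In this minimal sketch, the two-line source program is hypothetical, and the exact bytecode listing printed by `dis` varies between Python versions.

```python
import ast
import dis
import io
import tokenize

source = "x = 2 + 3\nprint(x * 10)\n"

# 1. Lexing: turn the source text into lexical tokens
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
words = [tok.string for tok in tokens if tok.string.strip()]
print(words)

# 2. Parsing: build the abstract syntax tree (AST)
tree = ast.parse(source)
print(ast.dump(tree.body[0]))

# 3. Compiling: turn the AST into a code object containing bytecode
code = compile(tree, "<example>", "exec")
dis.dis(code)  # human-readable listing of the bytecode instructions

# 4. Interpreting: the virtual machine executes the bytecode
namespace = {}
exec(code, namespace)  # prints 50
```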
CPython
The standard Python interpreter is CPython, which is written in C. This makes it possible to use C-based libraries from Python. CPython is available on a variety of platforms, including ARM, iOS, and RISC. However, CPython has been optimized for portability and other characteristics rather than for speed.
Cython
The C foundation of the CPython interpreter has been taken further by the Cython project. This project provides a compiler that translates Python code into C; the generated C code is then compiled into an extension module that CPython can import and run. This type of compilation makes it possible to introduce C semantics, such as static type declarations, into Python code to make it even more efficient. The result merges two programming worlds, and Cython itself can be considered a new programming language.
You can find documentation about it online; I advise you to visit cython.readthedocs.io/en/latest/.
Pyston
Pyston (www.pyston.org/) is a fork of the CPython interpreter that implements performance optimizations. This project arose precisely from the need for an interpreter that could replace CPython over time and remedy its poor execution speed. Recent results seem to confirm these expectations, reporting a 30 percent performance improvement on large, real-world applications. Unfortunately, because compatible binary packages are lacking, packages for Pyston have to be rebuilt during installation.
Jython
In parallel to these C-based implementations, there is a version built in Java, called Jython. It was created by Jim Hugunin in 1997 (www.jython.org/). Jython is an implementation of the Python programming language in Java; it is further characterized by using Java classes instead of Python modules to implement Python extensions and packages.
IronPython
The .NET framework also offers the possibility of executing Python code inside it. For this purpose, you can use the IronPython interpreter (https://ironpython.net/). This interpreter allows .NET developers to develop Python programs on the Visual Studio platform, integrating seamlessly with the other development tools of the .NET platform.
Initially built by Jim Hugunin in 2006 with the release of version 1.0, the project was later supported by a
small team at Microsoft until version 2.7 in 2010. Since then, numerous other versions have been released up
to the current 3.4, all ported forward by a group of volunteers on Microsoft’s CodePlex repository.
PyPy
The PyPy interpreter includes a JIT (just-in-time) compiler that converts Python code directly to machine code at runtime. This choice was made to speed up the execution of Python. PyPy itself is written in RPython (Restricted Python), a subset of Python that can be statically compiled; the programs you run on PyPy, however, can use the full Python language. For more information, consult the official website at www.pypy.org/.
RustPython
As the name suggests, RustPython (rustpython.github.io/) is a Python interpreter written in Rust. This
programming language is quite new but it is gaining popularity. RustPython is an interpreter like CPython
but can also be used as a JIT compiler. It also allows you to run Python code embedded in Rust programs
and compile the code into WebAssembly, so you can run Python code directly from web browsers.
Installing Python
In order to develop programs in Python, you have to install it on your operating system. Linux distributions and macOS machines often have a preinstalled version of Python. If not, or if you want to replace that version with another, you can easily install it. The process for installing Python differs from one operating system to another, but it is a rather simple operation.
On Debian-Ubuntu Linux systems, the first thing to do is to check whether Python is already installed
on your system and what version is currently in use.
Open a terminal (by pressing ALT+CTRL+T) and enter the following command:
python3 --version
If you get the version number as output, then Python is already present on the Ubuntu system. If you get
an error message, Python hasn’t been installed yet.
In this last case, you have to install it.
If, on the other hand, the current version is old, you can update it with the latest version of your Linux
distribution by entering the following command:
Finally, if instead you want to install a specific version on your system, you have to explicitly indicate it
in the following way:
On Red Hat and CentOS Linux systems working with rpm packages, run this command instead:
If you are running Windows or macOS, you can go to the official Python site (www.python.org) and download the version you prefer. In this case, the installer takes care of the installation automatically.
However, today there are distributions that provide a number of tools that make the management and
installation of Python, all libraries, and associated applications easier. I strongly recommend you choose one
of the distributions available online.
Python Distributions
Due to the success of the Python programming language, many Python tools have been developed over the years to provide various functionalities. There are so many that it is virtually impossible to manage all of them manually.
In this regard, many Python distributions efficiently manage hundreds of Python packages. In fact,
instead of individually downloading the interpreter, which includes only the standard libraries, and then
needing to individually install all the additional libraries, it is much easier to install a Python distribution.
At the heart of these distributions are the package managers, which are nothing more than applications
that automatically manage, install, upgrade, configure, and remove Python packages that are part of the
distribution.
Their functionality is very useful: the user simply makes a request regarding a particular package (an installation, for example), and the package manager, usually via the Internet, carries out the operation, determining the necessary version and all dependencies on other packages and downloading them if they are not present.
Anaconda
Anaconda is a free distribution of Python packages distributed by Anaconda, Inc., formerly Continuum Analytics (www.anaconda.com). This distribution supports the Linux, Windows, and macOS operating systems. Anaconda, in addition to providing the latest packages released in the Python world, comes bundled with most of the tools you need to set up a Python development environment.
Indeed, when you install the Anaconda distribution on your system, you can use many tools and
applications described in this chapter, without worrying about having to install and manage them
separately. The basic distribution includes Spyder, an IDE used to develop complex Python programs,
Jupyter Notebook, a wonderful tool for working interactively with Python in a graphical and orderly way, and
Anaconda Navigator, a graphical panel for managing packages and virtual environments.
The management of the entire Anaconda distribution is performed by an application called conda. This
is the package manager and the environment manager of the Anaconda distribution and it handles all of the
packages and their versions.
One of the most interesting aspects of this distribution is the ability to manage multiple development environments, each with its own version of Python. With Anaconda, you can work independently with different Python versions at the same time by creating several virtual environments. You can create, for instance, an environment based on Python 3.11 even if the current Python version on your system is still 3.10. To do this, you write the following command via the console:
conda create -n py311 python=3.11 anaconda
This will generate a new Anaconda virtual environment with all the packages related to the Python
3.11 version. This installation will not affect the Python version installed on your system and won’t generate
any conflicts. When you no longer need the new virtual environment, you can simply uninstall it, leaving
the Python system installed on your operating system completely unchanged. Once it’s installed, you can
activate the new environment by entering the following command:
activate py311
C:\Users\Fabio>activate py311
(py311) C:\Users\Fabio>
You can create as many environments as you want, each with its own Python version; you need only change the value passed to the python option in the conda create command. When you want to return to working with the original Python version, use the following command:
source deactivate
(py311) C:\Users\Fabio>deactivate
Deactivating environment "py311"...
C:\Users\Fabio>
Anaconda Navigator
Although at the base of the Anaconda distribution there is the conda command for the management of
packages and virtual environments, working through the command console is not always practical and
efficient. As you will see in the following chapters of the book, Anaconda provides a graphical tool called
Anaconda Navigator, which allows you to manage the virtual environments and related packages in a
graphical and very simplified way (see Figure 2-2).
Also from the Environments panel it is possible to create new virtual environments, selecting the basic
Python version. Similarly, the same virtual environments can be deleted, cloned, backed up, or imported
using the menu shown in Figure 2-4.
Figure 2-4. Button menu for managing virtual environments in Anaconda Navigator
But that is not all. Anaconda Navigator is not only a useful application for managing Python
applications, virtual environments, and packages. In the third panel, called Learning (see Figure 2-5), it
provides links to the main sites of many useful Python libraries (including those covered in this book). By
clicking one of these links, you can access a lot of documentation. This is always useful to have on hand if
you program in Python on a daily basis.
The next panel, called Community, is similar. It also contains links, but this time to the forums of the main Python development and data analytics communities.
The Anaconda platform, with its multiple applications and Anaconda Navigator, allows developers to take advantage of a simple and organized work environment and be well prepared for developing Python code. It is no coincidence that this platform has become almost a standard for those working in the field.
Using Python
Python is rich, but simple and very flexible. It allows you to expand your development activities in many
areas of work (data analysis, scientific, graphic interfaces, etc.). Precisely for this reason, Python can be used
in many different contexts, often according to the taste and ability of the developer. This section presents
the various approaches to using Python in the course of the book. According to the various topics discussed
in different chapters, these different approaches will be used specifically, as they are more suited to the task
at hand.
Python Shell
The easiest way to approach the Python world is to open a session in the Python shell, which is a terminal running a command line. You can enter one command at a time and test its operation immediately. This mode makes clear the nature of the interpreter that underlies Python: the interpreter reads one command at a time, keeping the state of the variables defined in the previous lines, a behavior similar to that of MATLAB and other calculation software.
This approach is helpful when approaching Python the first time. You can test commands one at a time
without having to write, edit, and run an entire program, which could be composed of many lines of code.
This mode is also good for testing and debugging Python code one line at a time, or simply to make
calculations. To start a session on the terminal, simply type this on the command line:
C:\Users\nelli>python
Python 3.10 | packaged by Anaconda, Inc. | (main, Mar 1 2023, 18:18:21) [MSC v.1916 64 bit
(AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>
The Python shell is now active and the interpreter is ready to receive commands in Python. Start by
entering the simplest of commands, but a classic for getting started with programming.
If you have the Anaconda platform available on your system, you can open a Python shell related to a
specific virtual environment you want to work on. In this case, from Anaconda Navigator, in the Home panel,
activate the virtual environment from the drop-down menu and click the Launch button of the CMD.exe
Prompt application, as shown in Figure 2-6.
A command console will open with the name of the active virtual environment prefixed in brackets in
the prompt. From there, you can run the python command to activate the Python shell.
(Edition3) C:\Users\nelli>python
Python 3.11.0 | packaged by Anaconda, Inc. | (main, Mar 1 2023, 18:18:21) [MSC v.1916 64
bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>
Now you’ve written your first program in Python, and you can run it directly from the command line by
calling the python command and then the name of the file containing the program code.
python MyFirstProgram.py
From the output, the program will ask for your name. Once you enter it, it will say hello.
Make Calculations
You have already seen that the print() function is useful for printing almost anything. Beyond printing,
Python is also a great calculator. Start a session on the Python shell and try these
mathematical operations:
>>> 1 + 2
3
>>> (1.045 * 3)/4
0.78375
>>> 4 ** 2
16
>>> ((4 + 5j) * (2 + 3j))
(-7+22j)
>>> 4 < (2*3)
True
Python can evaluate many types of data, including complex numbers and Boolean conditions. As you can
see from these calculations, the Python interpreter directly returns the result without the need to use the
print() function. The same applies to values contained in variables: simply enter the variable's name to
see its contents.
>>> a = 12 * 3.4
>>> a
40.8
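To see which kind of value each of the calculations above produces, you can ask Python for the type of each result. Here is a small sketch that uses the built-in type() function on the same expressions:

```python
# Re-evaluate the expressions shown above and report the type of each result.
results = [1 + 2, (1.045 * 3) / 4, 4 ** 2, (4 + 5j) * (2 + 3j), 4 < (2 * 3)]

for value in results:
    # type(value).__name__ gives the type's name, e.g. "int" or "complex"
    print(type(value).__name__, value)

a = 12 * 3.4
print(type(a).__name__)  # multiplying an int by a float yields a float
```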
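Python also lets you extend the set of functions available in a session by importing libraries. For
example, enter the following command to import the math package:
>>> import math
In this way, all the functions contained in the math package are available in your Python session, so you
can call them directly. You have thus extended the standard set of functions available when you start a
Python session. These functions are called with the following syntax: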
library_name.function_name()
For example, you can now calculate the sine of the value contained in the variable a.
>>> math.sin(a)
0.040693257349864856
As you can see, the function is called along with the name of the library. Sometimes you might find the
following form of import instead:
>>> from math import *
Even though this works, it is best avoided as a matter of good practice. Writing an import this way brings
in all of the library's functions without keeping track of the library they belong to, so they are then called
without any prefix:
>>> sin(a)
0.040693257349864856
This form of import can lead to serious errors, especially when many libraries are imported. Different
libraries may well define functions with the same name, and importing all of them this way causes each
later import to silently override any previously imported function with the same name. As a result, the
program could raise numerous errors or, worse, behave abnormally.
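The name clash described above is easy to reproduce with two standard-library modules, math and cmath, which both define functions such as sqrt() and sin(). The following sketch is illustrative, not taken from the book:

```python
import cmath
import math

from math import *   # binds sqrt, sin, ... to the real-valued math versions
from cmath import *  # silently rebinds sqrt, sin, ... to the complex cmath versions

print(sqrt(4))        # (2+0j): the bare name now refers to cmath.sqrt
print(math.sqrt(4))   # 2.0: qualified names keep the two functions apart
print(cmath.sqrt(4))  # (2+0j)
```

Nothing warns you that sqrt changed meaning between the two star imports. The qualified calls at the end show why the library_name.function_name() form is safer.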
In practice, the from ... import form is generally used to import only a few functions by name, namely
those strictly necessary for the program to work, thus avoiding importing an entire library when it is not
needed.
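The following sketch shows the selective form. It imports only sin() and pi from math. The choice of names is just an example:

```python
# Import only the names the program needs, not the whole library or everything via *.
from math import pi, sin

a = 12 * 3.4
print(sin(a))       # called without the math. prefix
print(sin(pi / 2))  # 1.0
```

Only sin and pi are added to the namespace. Other math functions, such as cos(), stay unavailable unless they are imported explicitly, so accidental overrides are much less likely.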
Data Structure
You saw in the previous examples how to use simple variables containing a single value. Python also
provides a number of extremely useful data structures, which can hold many values at once, sometimes
even values of different types. These data structures differ in how their data are organized internally.
They include:
• List
• Set
• String
• Tuple
• Dictionary
• Deque
• Heap
This is only a small part of all the data structures that can be made with Python. Among all these data
structures, the most commonly used are dictionaries and lists.
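For a quick orientation, the following sketch creates one instance of each structure in the list above. Deque and heap are not built-in types. The sketch assumes the standard-library collections.deque class and the heapq module, which manages a heap stored in an ordinary list.

```python
import heapq
from collections import deque

numbers_list = [3, 1, 2]         # list: ordered, mutable sequence
numbers_set = {3, 1, 2, 3}       # set: unordered, duplicates are dropped
text = "data"                    # string: immutable sequence of characters
point = (10, 20)                 # tuple: immutable sequence
person = {"name": "William"}     # dictionary: key/value pairs

waiting = deque([1, 2, 3])       # deque: fast appends and pops at both ends
waiting.appendleft(0)            # now deque([0, 1, 2, 3])

heap = [5, 1, 4]                 # heap: a plain list managed by heapq
heapq.heapify(heap)              # rearranges in place so heap[0] is the smallest
smallest = heapq.heappop(heap)   # removes and returns the smallest item
```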
The dictionary type, also called dict, is a data structure in which each value is associated with a
particular label, called a key. A dictionary is a collection of key/value pairs whose values are accessed by
key rather than by position. (Since Python 3.7, dictionaries do preserve the insertion order of their pairs.)
To access a specific value within the dictionary, specify the associated key in square brackets.
>>> dict["name"]
'William'
To iterate over the key/value pairs of a dictionary, use the for-in construct together with the dictionary's
items() method.
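The definition of the dictionary used in the examples above is not shown in this excerpt, so the following sketch builds a comparable one. Only the "name": "William" pair comes from the text. The other keys are hypothetical. The sketch uses the name person instead of dict so that the built-in dict type is not shadowed.

```python
# Hypothetical dictionary: only "name": "William" appears in the text.
person = {"name": "William", "age": 25, "city": "London"}

print(person["name"])  # William

# items() yields (key, value) pairs, in insertion order (Python 3.7+).
for key, value in person.items():
    print(key, "->", value)
```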
The list type is a data structure that contains objects in a precise order, forming a sequence to which
elements can be added and from which they can be removed. Each item is identified by a number
corresponding to its position in the sequence, called the index.
To access an individual element, specify its index in square brackets (the first item in the list has index 0).
To extract a portion of the list (or of any sequence), specify a range i:j, where i and j are the indices
delimiting the portion: the element at index i is included, while the element at index j is excluded.
>>> list[2]
3
>>> list[1:3]
[2, 3]
Negative indices count from the end of the list instead: -1 refers to the last item, -2 to the one before it,
and so on toward the first.
>>> list[-1]
4
To scan the elements of a list one at a time, use the for-in construct.
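The following sketch pulls the list operations together. The list [1, 2, 3, 4] is an assumption chosen to be consistent with the outputs shown above. It is named numbers rather than list to avoid shadowing the built-in list type.

```python
# Hypothetical list, consistent with the indexing and slicing outputs shown above.
numbers = [1, 2, 3, 4]

print(numbers[2])    # 3: indices start at 0
print(numbers[1:3])  # [2, 3]: index 1 included, index 3 excluded
print(numbers[-1])   # 4: negative indices count from the end

for item in numbers:  # scan the elements in order
    print(item)

numbers.append(5)    # lists can grow...
numbers.remove(1)    # ...and shrink (remove() deletes the first matching value)
print(numbers)       # [2, 3, 4, 5]
```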
suggestion that a disguised and armed tramp should be employed as
a decoy ship.
"Then I'll have a wire sent to him," decided the admiral. "Perhaps
he would be able to assist us while you are on particular service
afloat."
"Very well, then. You can carry on with your leave for a few days,
but I wish you to be present when Mr. Vyse is here. We have your
address?" At two the same afternoon, Broadmayne was "rung up"
from the dockyard, the message stating that Mr. Vyse had arranged
to call at Admiralty House at three; would Mr. Broadmayne be
present?
"I've never heard him called by the name, sir," replied Broadmayne,
while Vyse replied in a similar strain.
This was on a Wednesday. Since the sale of Old Silas's cottage was
fixed for the following Monday, there was little time to be lost.
The matter of recovering the booty could, of course, be managed
by the use of a search-warrant, but for certain reasons the
Commander-in-Chief decided to deal with it without invoking the aid
of the law. Once the booty were taken possession of, then the
Admiralty Courts could take up the case and restore the plunder to
its lawful owners—the Norddeutscher-Lloyd Company.
"Very good, sir," he replied. "But I beg leave to state, sir, I've
already a little house at Mutley."
"You two can work together splendidly," declared the admiral. "If
you require additional assistance, wire at once."
The sale by auction was at eleven. At two o'clock came a wire from
Primmer addressed in a precautionary measure to a private address
at Plymouth—that of one of the Commander-in-Chief's staff. The
telegram was to the effect that Primmer had secured the house and
had paid the necessary deposit to Messrs. Jeremiah Built & Co.,
Auctioneers and Surveyors, of Penzance.
Directly Primmer reported that his furniture had arrived and that
his temporary abode was ready to receive his guest, Rollo Vyse took
train to Penzance. After making arrangements for his luggage to be
sent on, Vyse set out to walk to Mousehole.
The first man he met after the decision was a tall bronzed man
wearing fisherman's rig, including thigh boots.
"Up-along, Maaster," was the reply. "You'm see chimbly over atop o'
yon wall."
The recognition had been mutual, and the former mate of the
lugger was considerably perturbed at finding Vyse on his way to the
cottage where Porthoustoc lived.
"Well, I hope we shan't be here long, Mr. Primmer," said Rollo. "I'd
like to get away before Christmas."
"Same 'ere, sir," agreed the new tenant cordially. "We'll get to work
soon as you like. I've got crowbar, picks and spades an' such-like.
An' I brought a sack of cement up from Plymouth. Thought it 'ud
make 'em think if I got it hereabouts."
"I'll change, and then we'll have a look at the kitchen," decided
Rollo. "It'll make a bit of a mess, I fancy."
"My missus she don't mind," said Mr. Primmer reassuringly. "Fact is,
we've been doin' all the cooking in the spare room—proper sort o'
galley it makes."
Having completed the necessary change of clothing, Rollo,
accompanied by his host, went to the room under discussion. It was
about twenty feet in length and fifteen in breadth, stone walled and
stone floored. A doorway gave direct access to the garden; another
into the living-room. There were two narrow windows, which
gave the place a look of perpetual gloom. One wall was blank, the
kitchen having been partly let into the steep hillside at the back of
the cottage.
"I've been a-lookin' at it, sir," said the ex-bo'sun. "Wall's made of
stone set in cement. It don't look as if it's been touched come these
fifty year—maybe longer."
"I'll get a torch," said Rollo. "It's too dark to see much without
artificial light. We'll have to curtain those windows pretty heavily
when we work at night. Any one coming along that path—it's a
public one, I take it?—can see right in if we don't screen the
windows."
Throwing the rays of his electric torch upon the mass of masonry,
Vyse saw that the ex-bo'sun had good reason for his statement. The
stones were black with smoke, the cement as hard as iron. Further
examination showed that there was a small rectangular aperture in
the roof close to the wall. Evidently the former occupants were in
the habit of kindling a fire on the open hearth adjoining the wall and
allowing the smoke to escape through the hole in the roof.
"It looks like it," admitted Rollo, scraping the cement with the back
of the blade of his penknife. "I suppose the cave does exist? Wonder
if the entrance is under these flagstones?"
"We'll soon find that out, sir," declared the other. "I've a pick
and a crowbar close handy."
It was a long and difficult task chipping away the mortar between
the flagstones. As Rollo toiled and sweated, he wondered what it
would be like having to loosen cement. Mortar was hard enough.
At length, one stone was eased from its setting. With the aid of the
crowbar it was lifted. Underneath was soft soil mingled with rock.
Obviously that mixture would not hold over the mouth of a cave.
Rollo had not been scraping more than five minutes when he gave
an exclamation of satisfaction.
It was not the fog that had aroused him. A curious horripilation,
such as he had never before experienced, gripped him. For some
moments he lay with wide-open eyes fixed upon the dark grey
rectangular patch of open window.
His bedroom window was less than ten feet from the ground, the
house being low. On his left was the front of the kitchen—a one-
storeyed building. It was from that direction that the sound of the
mysterious footsteps came.
"Well, I think I scared him," he mused. "In future, while I'm here I
think I'll have a bed made up in the old kitchen. Then, if any one
tries to break in he'll feel sorry for himself."
After the midday meal, Vyse and his assistant got to work. They
were on the right track this time. Three hours' strenuous toil resulted
in the removal of a couple of large stones set in very hard cement.
Through the small aperture thus formed, they could discern a cavern
of generous proportions.
Armed with a torch, Rollo led the way. It was a matter of about
a three-feet drop to the floor of the cave, the natural mouth of
which was of oval section, seven feet in height and four in width. In
length it went back nearly eighty yards, the width and height
increasing at ten feet or so from the entrance.
Working at high pressure, Vyse and his companion removed all the
booty from the cave and stored it in one of the rooms. They then
proceeded to wall up the cave, carefully discolouring the cement in
order to impart the appearance of age.
At the same time, the new owner and master of the lugger Fairy
was composing an anonymous letter to the chief officer of the Water
Guard at Penzance.
Rollo had another disturbed night. With an automatic pistol ready
to hand, he slept on a camp-bed by the side of the large pile of
booty; but although he kept waking and tiptoeing to the window,
somewhat to his surprise there were no signs of the intruder of the
previous evening. As soon as the post office opened, a telegram was
dispatched to Devonport asking for a van to be sent to remove the
"furniture"; while to allay suspicion on the part of his neighbours,
Primmer spread the yarn that his recently-acquired cottage was
haunted, that his wife refused to remain there another night, and
that he had arranged to clear out that very day.
CHAPTER XVIII
THE DESTROYER AND THE DESTROYED
EXACTLY three weeks after the capture of the Mendez Nunez the
Alerte arrived off the mouth of the Wad-el-Abuam, a small river
flowing into the Atlantic a few miles south of Cape Bojador.
The estuary formed an ideal base for Captain Cain's new sphere of
operations. Nominally within the limits of Rio del Oro—Spain's
extensive, unproductive and loosely-held dependency, stretching
from Morocco on the north to French Senegal on the south—the
Wad-el-Abuam was hardly ever visited by vessels, except Moorish
coasters and fishing craft.
The entrance to the river was a difficult one, a bar on which the
surf broke heavily, extending practically right across it, although well
on the starboard hand was a narrow channel carrying twenty feet at
high water and protected by a long, narrow rocky island that not
only served as a breakwater, but also effectively screened the
estuary when viewed from seaward.
Within the bar the depth increased to sixty feet, with a bottom of
firm white sand. Farther up, the bed was composed of mud that
became more objectionable as the width of the river decreased.
The banks were almost destitute of vegetation, consisting of sand
with a few palms and a scanty scrub that afforded meagre food for
goats belonging to the inhabitants. There were four or five small
villages populated by a tribe of savages, half Arab, half Negro, who
had long resisted any attempt at subjection on the part of the
Spanish troops stationed at Villa Cisnero and other fortified posts of
Rio del Oro.
Within two hundred miles lay the Canary Islands, with Funchal, the
favourite port of call for ships running between Europe and the west
and south coasts of Africa. Farther to the south'ard was Teneriffe,
with Las Palmas, another frequented coaling-station. Both these
were within the Alerte's wireless radius, so that the pirates hoped to
obtain a fairly complete report of all vessels passing within striking
distance of their proposed base.
"It must be there," replied Cain, after consulting the latest but far
from reliable chart of this part of the coast. "We'll stand in a bit
more. If there's any doubt about it, we'll send a boat and take
soundings. The sailing directions state that the island is hardly
distinguishable from the mainland except at short distance."
He levelled his binoculars for the twentieth time during the last
hour.
While the Alerte was yet a quarter of a mile from her, the felucca
ported helm, close-hauled, and stood off in a nor' nor'-westerly
direction.
"So did I," agreed Captain Cain. "But now I think she's a Moorish
fishing vessel homeward bound. She had to stand out towards us to
avoid running on the shoals. We'll collar her, Pengelly. If the old boy
in the cotton nightgown is reasonable we'll pay him and let him go
when he's piloted us in."
A cast of the lead gave nine fathoms, and since the chart showed
that the soundings were remarkably even on this course, Captain
Cain had no apprehensions of running his vessel aground.
The crew of the felucca seemed quite apathetic when they saw the
Alerte in pursuit. At a sign from the white-robed Moor the two blacks
lowered the sails, one of them standing by to heave a line.
Captain Cain replied by indicating the longer cord and then holding
up six fingers. The Moor nodded gravely and motioned to the pirate
skipper to order the ship to forge ahead.
Slowly the Alerte made her way inside the island, and thence
through the channel over the bar. The while the lead was kept going,
Pengelly and the bo'sun taking bearings and noting how the channel
bore for future occasions.
"Stand by and let go!" roared Cain as the Alerte arrived at her
anchorage. "Is the buoy streamed, Mr. Barnard?"
With the rattle of chain tearing through the hawsepipe, the anchor
plunged to the bed of the Wad-el-Abuam.
Pengelly turned to his captain.
"Snug little crib, this, sir," he remarked. "What about our pilot? Are
we going to overhaul his boat in case there's anything useful? The
blighter might have been pearling. One never knows."
Turning to the Moor, who was standing a couple of paces off, the
pirate captain handed him a gold coin.
The pilot took the piece of money, made an elaborate salaam, and
went to the side, the felucca having been brought to the gangway.
Already the two negroes were hoisting sail. With another salaam,
the Moor boarded his own craft, the ropes were cast off, and the
felucca headed for the open sea.
In quick time the aerial was spread between the two masts and the
"lead-in" connected to a powerful wireless set concealed between
double bulkheads at the after end of the little fo'c'sle. A message
was then dispatched in code to the Officer Commanding H.M.S.
Canvey, giving the position of the pirate submarine's new base.
Allerton was in high feather. It was he who had "trailed the tail of
his coat" across the path of the pirate submarine and had piloted her
into the estuary of the Wad-el-Abuam.
For the next few days the decoy ship steamed to and fro between
the Canaries and St. Vincent sending out fictitious messages en clair
in the hope that the Alerte would emerge from her retreat and
come outside the three-mile limit in order to seize a likely prey. But
no Alerte put in an appearance.
The Villamil was an old vessel of three hundred and sixty tons, with
a speed of twenty-eight knots. Her armament consisted of five six-
pounders, of which three could fire ahead and three on the beam. In
addition, she carried two torpedo tubes.
While this work was in progress, Captain Cain had not allowed
other matters to slide. One of his first steps was to establish a signal
station on the rocky island guarding and screening the Alerte's
anchorage. Day and night armed men were on watch at the station,
ready to signal to the pirate vessel the moment any sail appeared
over the horizon.
Just before noon one morning, Captain Cain was informed that a
craft looking like a destroyer was approaching from the nor'ard and
steaming a course parallel to the coast.
In hot haste six men with the machine-gun were sent off in a boat
to the island with instructions to keep under cover and not to open
fire until the approaching destroyer came within a hundred yards of
the rock, which she must do by reason of the tortuous course of the
deep-water channel.
The Alerte was swung athwart the river to enable her six-inch
quick-firer to bear. With the exception of the captain, Mr.
Marchant and the gun's crew, all the rest of the hands were ordered
below to be ready to replace casualties amongst the men working
the quick-firer.
Alone on the bridge, Cain stood calm and confident. There was not
the slightest tremor in his large, powerful hands as he grasped his
binoculars ready to bring them to bear upon the as yet invisible
enemy.
Round the precipitous face of the island appeared the lean bows of
the Spanish destroyer. Then her round bridge, mast and funnels
came into view. Through his glasses Cain saw that her fo'c'sle gun
was manned by a crew of white-clad, swarthy-faced men.... There
was a deafening crash as the Alerte's six-inch sent the hundred-
pound projectile hurtling on its way.... Even as he looked, Cain saw a
vivid flash immediately in front of the destroyer's bridge... a cloud of
smoke torn by diverging blasts of air.... The smoke dispersed, or
rather the destroyer's speed carried her through it.... The crew of
her fo'c'sle six-pounder had dispersed, too; with them the gun and
its mounting.... The bridge didn't look the same as it had a few
seconds previously—a bit lopsided. Flames were pouring from a
heap of débris in the wake of the foremast.
At two thousand yards the appalling noise caused by the explosion
of the Alerte's first shell was inaudible to the solitary watcher on
her bridge. The scene brought within a very short distance through
the lenses of the powerful binoculars resembled a "close-up" picture
on the cinematograph—unrealistic by reason of the absence of
sound.
Two vivid flashes leapt from the Spanish destroyer's deck, one on
the port side, the other to starboard. They were her reply to the
destructive "sighting shot" from the pirate submarine.
The Villamil had received a rough awakening. Her crew, not one of
whom had previously been under fire, were lacking in that courage
and tenacity that marks the Anglo-Saxon race. Appalled by the havoc
wrought on the fo'c'sle, the gunlayers of the remaining weapons that
could be brought to bear certainly did make reply. Their aim was
bad. One shell whizzed high above the Alerte's masts, shrieking as it
sped to bury itself harmlessly in the sand three miles away. The
other, striking the water a hundred yards short of its objective,
ricochetted and hurtled through the air full fifty yards astern.
Cain paid no attention to either. His interest was centred upon his
attacker. He could hear the rapid crashes of the Alerte's quick-firer.
He could see the results by the frequent lurid bursts of flame and
the showers of débris as shell after shell struck the luckless
Spaniard.
Still she came on, leaving an eddying trail of smoke. One of her six-
pounders was firing spasmodically. She was reeling like a drunken
man.
Suddenly Cain put aside his glasses and made a spring for the
telegraph indicator, moving the starboard lever to "full ahead."
His quick eye had discerned a glistening object curving over the
Villamil's side. A torpedo was already on its way, travelling at the
speed of a train in the direction of the pirate submarine.
Well before the action the Alerte's oil-engines had been started
with the clutches in neutral position. It was a precaution that was
justified in its results. Under the action of one propeller only the
Alerte forged ahead, her stern swinging round as she overran her
anchors.
For ten seconds the captain held his breath. Looking aft, the rise of
the poop intercepted the wake of the torpedo. It seemed as if the
Alerte was doomed.
The Alerte's quick-firer was now silent. The manoeuvre that had
saved her from the torpedo had brought her almost bows-on to the
Villamil, with the result that the former's fo'c'sle masked her line of
fire.
For the first time since the action commenced Captain Cain spoke.
Leaning over the bridgerail he shouted to the gunlayer to aim for the
Spaniard's aft torpedo-tube.
The Villamil was well down by the head and had a pronounced list
to starboard. Her speed had appreciably fallen off. The menace of
being rammed was now hardly worth taking into account; but the
torpedo—— At that range, if the Spanish torpedo-gunner knew his
job, it was almost a matter of impossibility to miss.
Cain could see four or five grimy figures bringing the loading cage
to the after-end of the tube. The torpedo was launched home.... He
could see the convex metal cover swing into the closing position...
the torpedo coxswain was getting astride the tube... in another three
or four seconds...
A deafening crash told the anxious skipper of the Alerte that the
six-inch was again at work. At a range of six hundred yards the shell
got home. A terrific flash—it was far too vivid for the explosion of a
shell—leapt from the destroyer. An enormous cloud of smoke was
hurled skywards, completely obliterating the Villamil from Cain's
vision. A blast of hot air swept over the superstructure of the
submarine. Pieces of metal tinkled on her steel deck. Heavier pieces
were falling with a succession of splashes into the smoke-
enshrouded water.
The only casualty on board the Alerte was No. 3 of the gun's crew,
and he had been knocked out only after the Villamil had been
destroyed. A fragment of steel descending with terrific force had
struck him on the head, killing him instantly.
The action over, Captain Cain brought the rest of the hands on
deck.
"My lads!" he exclaimed, "if we were out for glory, we've got it. It
wasn't of our seeking. It's riches, not glory, we're after. Now, lads,
although there's no one of our opponents left to tell the tale, we'll
have to get a move on. One more good capture and we pay off.
With luck we'll finish repairs by nightfall. To-morrow I hope our aims
will be realised. There's a Belgian vessel due to leave St. Vincent at
dawn to-morrow. She's ours for the asking. I propose to capture
her and bring her in here until we can unload everything of value. All
then that remains to be done is to hide the booty, make our way
home and come out again as quite above-board West Coast traders.
That's all I have to say, lads. No hanging on to the slack, but plenty
of beef into your work for the next few hours and everything will be
plain sailing. Pipe down!"
CHAPTER XIX
RECALLED
And for a very good reason. No wireless message from the Villamil
was received by the Spanish naval authorities after a brief report
that the destroyer was about to enter the Wad-el-Abuam to attack
the Alerte. From that time the movements of the destroyer were
shrouded in mystery.
Presently it occurred to the Spanish Admiralty that all was not well
with the Villamil. There was something decidedly ominous about the
prolonged silence. The weather had been unusually quiet, so her
disappearance could not be attributed to a sudden tempest. It
seemed incredible that a unit of Spain's navy had been vanquished
by a contemptible pirate ship. But at last that supposition had to be
regarded as a fact.
The owner read the message. The corners of his mouth dropped.
In a few minutes the "buzz" was all over the ship. The feeling of
disappointment had a consoling feature. The Canvey would be
ordered home to be put out of commission, and that meant the
bluejackets' highly-prized privilege—paying off leave, or "leaf" as the
"matloe" insists on calling it.
CHAPTER XX
THE AFFAIR OF THE BRONX CITY
"THERE'S that Candide asking for trouble, sir," replied Pengelly, as
he entered the captain's cabin. "We've just intercepted a
message saying she's leaving St. Vincent to-day."
"Quite so," agreed Cain. "That's why I'm anxious to nab the
Candide. Pass the word to Mr. Barnard that I want to be under way
in an hour's time—just before high water."
Just before two bells in the afternoon watch, smoke was observed
on the southern horizon. Twenty minutes later the dark grey hull of
a fairly big steamer emerged from the patches of haze.
"She's the Candide right enough," declared Cain. "Clear away the
gun, my lads. One more hooker and our job's done.... No colours
yet, Mr. Barnard. We'll let 'em have a good sight of the Jolly Roger in
a brace of shakes. Pick your boarding-party, Mr. Pengelly. See that
everything's ready in the boat."
"Ay," agreed Cain, with a grin. "And there's the name Bronx City on
her bows as large as life. Yankee colours and Yankee name don't
turn a Belgian tramp into a United States hooker. I'm too old a
bird to be caught with chaff.... Starboard a bit, Quartermaster... at
that!"
The eyes of the signalman, the gun's crew and the seamen
standing aft with the rolled-up skull and cross-bones already toggled
to the halyards, were all fixed expectantly upon the skipper of the
pirate submarine as he stood at the extreme end of the port side of
the bridge.
Captain Cain raised his right hand. At the signal the black flag was
broken out, the International ID hoisted at the fore, while an instant
later a shot whizzed across the stranger's bows.
Promptly Pengelly and his men pushed off to the prize, under cover
of the Alerte's six-inch gun. Before the boat ran alongside the
stranger, the latter's accommodation-ladder had been lowered.
Pengelly realised that Cain had made a mistake. The vessel was not
the Candide disguised, but the Bronx City, registered and owned in
the United States. But having boarded her, Pengelly had no intention
of returning ignominiously to the Alerte.
"No quitting this time, skipper," he replied firmly. "I'm not here to
argue—this is my persuader."
He touched the barrel of his automatic with his left hand and then
pointed to the Alerte, which was still closing the prize.
"Guess you'll swing for this," exclaimed the captain of the Bronx
City.
"More ways than one of killing a cat," retorted Pengelly. "Now, you
—officers and men—for'ard you go and keep quiet, or it'll be the
worse for you."
"I am," rejoined Pengelly curtly. "Now let me see your papers."