Natural Language Processing Recipes: Unlocking Text Data with Machine Learning and Deep Learning Using Python 2nd Edition Akshay Kulkarni
Adarsha Shivananda
Bangalore, Karnataka, India
Apress Standard
The publisher, the authors and the editors are safe to assume that the
advice and information in this book are believed to be true and accurate
at the date of publication. Neither the publisher nor the authors or the
editors give a warranty, expressed or implied, with respect to the
material contained herein or for any errors or omissions that may have
been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
Adarsha Shivananda
is a lead data scientist at Indegene Inc.’s product and technology team,
where he leads a group of analysts who enable predictive analytics and
AI features to healthcare software products. These are mainly
multichannel activities for pharma products and solving the real-time
problems encountered by pharma sales reps. Adarsha aims to build a
pool of exceptional data scientists within the organization to solve
greater health care problems through
brilliant training programs. He always
wants to stay ahead of the curve.
His core expertise involves machine
learning, deep learning,
recommendation systems, and statistics.
Adarsha has worked on various data
science projects across multiple domains
using different technologies and
methodologies. Previously, he worked
for Tredence Analytics and IQVIA.
He lives in Bangalore, India, and loves
to read, ride, and teach data science.
About the Technical Reviewer
Aakash Kag
is a data scientist at AlixPartners and is a
co-founder of the Emeelan application.
He has six years of experience in big data
analytics and has a postgraduate degree
in computer science with a specialization
in big data analytics. Aakash is
passionate about developing social
platforms, machine learning, and
meetups, where he often talks.
© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2021
A. Kulkarni, A. Shivananda, Natural Language Processing Recipes
https://doi.org/10.1007/978-1-4842-7351-7_1
This chapter covers various sources of text data and the ways to extract it. Textual data can act as
information or insights for businesses. The following recipes are covered.
Recipe 1. Text data collection using APIs
Recipe 2. Reading a PDF file in Python
Recipe 3. Reading a Word document
Recipe 4. Reading a JSON object
Recipe 5. Reading an HTML page and HTML parsing
Recipe 6. Regular expressions
Recipe 7. String handling
Recipe 8. Web scraping
Introduction
Before getting into the details of the book, let’s look at generally available data sources. We need to identify
potential data sources that can help with solving data science use cases.
Client Data
For any problem statement, one of the sources is the data that is already present. The business decides
where it wants to store its data. Data storage depends on the type of business, the amount of data, and the
costs associated with the sources. The following are some examples.
SQL databases
HDFS
Cloud storage
Flat files
Free Sources
A large amount of data is freely available on the Internet. You just need to streamline the problem and start
exploring multiple free data sources.
Free APIs like Twitter
Wikipedia
Government data (e.g., http://data.gov)
Census data (e.g., www.census.gov/data.html)
Health care claim data (e.g., www.healthdata.gov)
Data science community websites (e.g., www.kaggle.com)
Google dataset search (e.g., https://datasetsearch.research.google.com)
Web Scraping
Web scraping extracts content and data from websites, blogs, forums, and retail websites (for reviews), with
permission from the respective sources, using web scraping packages in Python.
There are a lot of other sources, such as news data and economic data, that can be leveraged for analysis.
Problem
You want to collect text data using Twitter APIs.
Solution
Twitter has a gigantic amount of data with a lot of value in it. Social media marketers make their living from
it. There is an enormous number of tweets every day, and every tweet has some story to tell. When all of this
data is collected and analyzed, it gives a business tremendous insight into its company, product,
service, and so forth.
Let’s now look at how to pull data and then explore how to leverage it in the coming chapters.
How It Works
Step 1-1. Log in to the Twitter developer portal
Log in to the Twitter developer portal at https://developer.twitter.com.
Create your own app in the Twitter developer portal, and get the following keys. Once you have these
credentials, you can start pulling data.
consumer key: The key associated with the application (Twitter, Facebook, etc.)
consumer secret: The password used to authenticate with the authentication server (Twitter, Facebook,
etc.)
access token: The key given to the client after successful authentication of keys
access token secret: The password for the access key
# Install tweepy
!pip install tweepy
# Import the libraries
import numpy as np
import tweepy
import json
import pandas as pd
from tweepy import OAuthHandler
# credentials
consumer_key = "adjbiejfaaoeh"
consumer_secret = "had73haf78af"
access_token = "jnsfby5u4yuawhafjeh"
access_token_secret = "jhdfgay768476r"
# calling API
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
# Provide the query for the data you want to pull. For example, pulling data
# for the mobile phone ABC
query = "ABC"
# Fetching tweets (in tweepy 4.x, api.search is named api.search_tweets)
Tweets = api.search(query, count=10, lang='en', exclude='retweets', tweet_mode='extended')
This query pulls the top ten tweets when product ABC is searched. The API pulls English tweets since the
language given is 'en'. It excludes retweets.
Problem
You want to read a PDF file.
Solution
The simplest way to read a PDF file is by using the PyPDF2 library.
How It Works
Follow the steps in this section to extract data from PDF files.
Note You can download any PDF file from the web and place it in the location where you are running
this Jupyter notebook or Python script.
Please note that the function doesn’t work for scanned PDFs.
Problem
You want to read Word files.
Solution
The simplest way is to use the docx library.
How It Works
Follow the steps in this section to extract data from a Word file.
# Install python-docx (the package that provides the docx module)
!pip install python-docx
# Import library
from docx import Document
Note You can download any Word file from the web and place it in the location where you are running a
Jupyter notebook or Python script.
Problem
You want to read a JSON file/object.
Solution
The simplest way is to use the requests and json libraries.
How It Works
Follow the steps in this section to extract data from JSON.
import requests
import json
Step 4-2. Extract text from a JSON file
Now let’s extract the text.
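The extraction step can be sketched as follows. This is a minimal illustration rather than the book's own listing: the payload is a made-up, in-memory JSON string standing in for the body of a requests response, and the field names (id, author, text) are hypothetical.

```python
import json

# Hypothetical payload; in practice this string would come from a file
# or from the .text attribute of a requests response.
raw = '{"id": 7, "author": "A. Writer", "text": "Text data is everywhere."}'

# json.loads turns the JSON string into ordinary Python objects (here, a dict).
record = json.loads(raw)

# Pull out the text field; .get returns None if the key is absent.
text = record.get("text")
print(text)
```

Using .get instead of indexing keeps the script from failing when a record lacks the expected key.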
Problem
You want to read and parse HTML pages.
Solution
The simplest way is to use the bs4 library.
How It Works
Follow the steps in this section to extract data from the web.
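# Import libraries (urllib.request replaces Python 2's urllib2)
import urllib.request as urllib2
from bs4 import BeautifulSoup
# Fetching the HTML
response = urllib2.urlopen('https://en.wikipedia.org/wiki/Natural_language_processing')
html_doc = response.read()
# Parsing
soup = BeautifulSoup(html_doc, 'html.parser')
# Formatting the parsed HTML file
strhtm = soup.prettify()
# Print the first few lines
print(strhtm[:1000])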
#output
<!DOCTYPE html>
<html class="client-nojs" dir="ltr" lang="en">
<head>
<meta charset="utf-8"/>
<title>
Natural language processing - Wikipedia
</title>
<script>
document.documentElement.className = document.documentElement.className.rep
</script>
<script>
(window.RLQ=window.RLQ||[]).push(function()
{mw.config.set({"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":false,"
processing","wgCurRevisionId":860741853,"wgRevisionId":860741853,"wgArticleId"
["*"],"wgCategories":["Webarchive template wayback links","All accuracy disput
identifiers","Natural language processing","Computational linguistics","Speech
print(soup.title)
print(soup.title.string)
print(soup.a.string)
print(soup.b.string)
#output
<title>Natural language processing - Wikipedia</title>
Natural language processing - Wikipedia
None
Natural language processing
Problem
You want to parse text data using regular expressions.
Solution
The best way is to use the re library in Python.
How It Works
Let’s look at some of the ways we can use regular expressions for our tasks.
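The basic flags are I, L, M, S, U, X.
re.I ignores casing.
re.L makes matching locale dependent.
re.M makes ^ and $ match at the start and end of each line.
re.S makes the dot match any character, including a newline.
re.U applies Unicode matching rules (the default for string patterns in Python 3).
re.X lets you write the regex in a more readable format, with whitespace and comments.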
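A short sketch of how some of these flags change matching. The sample text and the decimal-number pattern are illustrative assumptions, not taken from the book.

```python
import re

text = "NLP is fun.\nnlp is useful."

# re.I: case-insensitive matching finds both spellings.
both_cases = re.findall(r"nlp", text, flags=re.I)

# re.M: ^ matches at the start of every line, not only the start of the string.
line_starts = re.findall(r"^\w+", text, flags=re.M)

# re.S: the dot also matches the newline, so the match can span both lines.
spans_lines = re.search(r"fun.*useful", text, flags=re.S) is not None
without_s = re.search(r"fun.*useful", text) is not None

# re.X: whitespace and comments inside the pattern are ignored.
number = re.compile(r"""
    \d+      # integer part
    \.       # decimal point
    \d+      # fractional part
""", flags=re.X)

print(both_cases, line_starts, spans_lines, without_s)
print(number.search("pi is about 3.14").group())
```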
The following describes regular expressions’ functionalities.
Find a single occurrence of character a or b: [ab]
Find characters except for a and b: [^ab]
Find the character range of a to z: [a-z]
Find a character range except a to z: [^a-z]
Find all the characters from both a to z and A to Z: [a-zA-Z]
Find any single character: .
Find any whitespace character: \s
Find any non-whitespace character: \S
Find any digit: \d
Find any non-digit: \D
Find any non-word character: \W
Find any word character: \w
Find either a or b: (a|b)
The occurrence of a is either zero or one: a? ; ? matches zero or one occurrence
The occurrence of a is zero or more times: a* ; * matches zero or more occurrences
The occurrence of a is one or more times: a+ ; + matches one or more occurrences
Match exactly three consecutive occurrences of a: a{3}
Match three or more consecutive occurrences of a: a{3,}
Match three to six consecutive occurrences of a: a{3,6}
Start of a string: ^
End of a string: $
Match word boundary: \b
Non-word boundary: \B
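A few of these constructs applied to a sample sentence. The sentence is a made-up example chosen so that each pattern has something to match.

```python
import re

sample = "Order 66 shipped to aaa-team on 2021-08-15."

# \d+ : runs of one or more digits
digits = re.findall(r"\d+", sample)

# \b[a-z]+\b : whole words made only of lowercase letters
lowercase_words = re.findall(r"\b[a-z]+\b", sample)

# a{3} : exactly three consecutive a's
three_a = re.findall(r"a{3}", sample)

# ^ and $ : anchors for the start and end of the string
starts_with_order = bool(re.search(r"^Order", sample))
ends_with_period = bool(re.search(r"\.$", sample))

# (?:to|on) : alternation, either "to" or "on"
either = re.findall(r"(?:to|on)", sample)

print(digits, lowercase_words, three_a)
print(starts_with_order, ends_with_period, either)
```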
The re.match() and re.search() functions find patterns, which are then processed according to
the requirements of the application.
Let’s look at the differences between re.match() and re.search().
re.match() checks for a match only at the beginning of the string. So, if it finds a pattern at the
beginning of the input string, it returns the match object; otherwise, it returns None.
re.search() checks for a match anywhere in the string and returns the first occurrence of the pattern. To get
all the occurrences in the given input string or data, use re.findall().
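The difference can be seen in a small sketch; the sample sentence is an assumption made for illustration. re.findall() is included to show how to collect every occurrence rather than only the first one.

```python
import re

text = "I like this book. I also like this chapter."

# re.match only looks at position 0, so "like" is not found.
at_start = re.match(r"like", text)

# re.search scans the string and stops at the first occurrence.
anywhere = re.search(r"like", text)

# re.findall returns every non-overlapping occurrence as a list.
all_hits = re.findall(r"like", text)

print(at_start)
print(anywhere.start())
print(all_hits)
```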
Now let’s look at a few examples using these regular expressions.
Tokenizing
Tokenizing means splitting a sentence into words. One way to do this is to use re.split.
# Import library
import re
# Run the split query (a raw string keeps \s from being read as a string escape)
re.split(r'\s+', 'I like this book.')
#output
['I', 'like', 'this', 'book.']
([a-zA-Z0-9+._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+)
There are even more complex ones to handle all the edge cases (e.g., “.co.in” email IDs). Please give it a
try.
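The email pattern above can be tried out as follows. This is a sketch under stated assumptions: the sample sentences and addresses are made up, and the pattern is used exactly as quoted, so it inherits the edge-case limits already mentioned.

```python
import re

# The pattern quoted above, used as-is.
EMAIL_PATTERN = r"([a-zA-Z0-9+._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+)"

# Hypothetical sample document.
doc = "For more details please mail us at: xyz@abc.com, pqr@mno.com"

# Extract every address that matches the pattern.
addresses = re.findall(EMAIL_PATTERN, doc)
print(addresses)

# re.sub with the same pattern swaps a matched address for another one.
replaced = re.sub(EMAIL_PATTERN, "user@example.org", "reach me at xyz@abc.com")
print(replaced)
```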
# Import library
import re
import requests
# URL you want to extract
url = 'https://www.gutenberg.org/files/2638/2638-0.txt'
# Function to extract
def get_book(url):
    # Sends an HTTP request to get the text from Project Gutenberg
    raw = requests.get(url).text
    # Discards the metadata from the beginning of the book
    start = re.search(r"\*\*\* START OF THIS PROJECT GUTENBERG EBOOK .* \*\*\*", raw).end()
    # Discards the metadata from the end of the book
    stop = re.search(r"II", raw).start()
    # Keeps the relevant text
    text = raw[start:stop]
    return text
# Processing
def preprocess(sentence):
    return re.sub('[^A-Za-z0-9.]+', ' ', sentence).lower()
# Calling the above functions
book = get_book(url)
processed_book = preprocess(book)
print(processed_book)
# Output
produced by martin adamson david widger with corrections by andrew sly
the idiot by fyodor dostoyevsky translated by eva martin part i i. towards
the end of november during a thaw at nine o clock one morning a train on
the warsaw and petersburg railway was approaching the latter city at full
speed. the morning was so damp and misty that it was only with great
difficulty that the day succeeded in breaking and it was impossible to
distinguish anything more than a few yards away from the carriage windows.
some of the passengers by this particular train were returning from abroad
but the third class carriages were the best filled chiefly with
insignificant persons of various occupations and degrees picked up at the
different stations nearer town. all of them seemed weary and most of them
had sleepy eyes and a shivering expression while their complexions
generally appeared to have taken on the colour of the fog outside. when da
2. Perform an exploratory data analysis on this data using regex.
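The exploratory step might look like the sketch below. It is not the book's listing: a short excerpt of the processed output stands in for processed_book so the example runs without a network call, and the choice of words to count and replace is an assumption.

```python
import re
from collections import Counter

# Short stand-in for processed_book (lowercased, punctuation mostly removed).
sample = ("the morning was so damp and misty that it was only with great "
          "difficulty that the day succeeded in breaking.")

# Count how often "the" appears as a whole word.
the_count = len(re.findall(r"\bthe\b", sample))

# Replace one word with another.
swapped = re.sub(r"\bday\b", "night", sample)

# Find every word together with its (start, end) position.
positions = [(m.group(), m.span()) for m in re.finditer(r"[a-z]+", sample)]

# Tally the words and keep the three most frequent.
top_words = Counter(re.findall(r"[a-z]+", sample)).most_common(3)

print(the_count)
print(positions[:3])
print(top_words)
```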
Problem
You want to explore handling strings.
Solution
The simplest way is to use the following string functionality.
s.find(t) is an index of the first instance of string t inside s (–1 if not found)
s.rfind(t) is an index of the last instance of string t inside s (–1 if not found)
s.index(t) is like s.find(t) except it raises ValueError if not found
s.rindex(t) is like s.rfind(t) except it raises ValueError if not found
s.join(text) combines the words of the text into a string using s as the glue
s.split(t) splits s into a list wherever a t is found (whitespace by default)
s.splitlines() splits s into a list of strings, one per line
s.lower() is a lowercase version of the string s
s.upper() is an uppercase version of the string s
s.title() is a titlecased version of the string s
s.strip() is a copy of s without leading or trailing whitespace
s.replace(t, u) replaces instances of t with u inside s
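Several of these methods in action, as a minimal sketch; the sample string is an assumption chosen for illustration.

```python
s = "  Natural Language Processing  "

clean = s.strip()                  # copy without leading/trailing whitespace
first_a = clean.find("a")          # index of the first "a"
last_a = clean.rfind("a")          # index of the last "a"
not_found = clean.find("xyz")      # find returns -1 when nothing matches

# index behaves like find but raises ValueError instead of returning -1.
try:
    clean.index("xyz")
    index_raised = False
except ValueError:
    index_raised = True

words = clean.lower().split()      # split on whitespace by default
slug = "-".join(words)             # "-" is the glue between the words
titled = slug.replace("-", " ").title()

print(first_a, last_a, not_found, index_raised)
print(words, slug, titled)
```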
How It Works
Now let’s look at a few of the examples.
Replacing Content
Create a string and replace the content. Creating strings is easy. It is done by enclosing the characters in
single or double quotes. And to replace, you can use the replace function.
1. Create a string.
s1 = "nlp"
s2 = "machine learning"
s3 = s1+s2
print(s3)
#output
nlpmachine learning
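The replacement itself can be sketched as follows. The replacement words are illustrative assumptions; only the replace method named in this recipe is used.

```python
s1 = "nlp"
s2 = "machine learning"
s3 = s1 + s2

# str.replace returns a new string; the original is left unchanged.
s4 = s3.replace("nlp", "deep")

# An optional third argument limits how many occurrences are replaced.
s5 = "a-b-c".replace("-", "+", 1)

print(s3)
print(s4)
print(s5)
```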
Caution Before scraping any websites, blogs, or ecommerce sites, please make sure you read the site’s
terms and conditions on whether it gives permissions for data scraping. Generally, robots.txt states which
parts of a site automated crawlers may access (e.g., see www.alixpartners.com/robots.txt), and a
sitemap lists the site’s URLs (e.g., see www.alixpartners.com/sitemap.xml).
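One way to check a site's robots.txt rules from Python is the standard library's urllib.robotparser. The sketch below is an illustration under stated assumptions: the rules and the example.com URLs are made up and parsed from memory, whereas a real script would point the parser at the live file with set_url() and read(). It does not replace reading the site's terms and conditions.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt rules for a hypothetical site.
sample_rules = [
    "User-agent: *",
    "Disallow: /private/",
]

parser = RobotFileParser()
parser.parse(sample_rules)

# can_fetch reports whether the given user agent may request the URL.
allowed_public = parser.can_fetch("*", "https://example.com/reviews/page1")
allowed_private = parser.can_fetch("*", "https://example.com/private/data")

print(allowed_public, allowed_private)
```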
Web scraping is also known as web harvesting and web data extraction. It is a technique to extract a large
amount of data from websites and save it in a database or locally. You can use this data to extract
information related to your customers, users, or products for the business’s benefit.
A basic understanding of HTML is a prerequisite.
Problem
You want to extract data from the web by scraping. Let’s use IMDB.com as an example of scraping top
movies.
Solution
The simplest way to do this is by using Python’s Beautiful Soup or Scrapy libraries. Let’s use Beautiful Soup
in this recipe.
How It Works
Follow the steps in this section to extract data from the web.
Step 8-4. Request the URL and download the content using Beautiful Soup
result = requests.get(url)
c = result.content
soup = BeautifulSoup(c,"lxml")
Step 8-5. Understand the website’s structure to extract the required information
Go to the website and right-click the page content to inspect the site’s HTML structure.
Identify the data and fields that you want to extract. For example, you want the movie name and IMDB
rating.
Check which div or class in the HTML contains the movie names and parse the Beautiful Soup
accordingly. In this example, you can parse the soup through <table class ="chart full-width">
and <td class="titleColumn"> to extract the movie name.
Similarly, you can fetch other data; refer to the code in step 8-6.
Step 8-6. Use Beautiful Soup to extract and parse the data from HTML tags
# Libraries used in this step
import re
from time import sleep
import requests
from bs4 import BeautifulSoup
from ipywidgets import FloatProgress
from IPython.display import display
summary = soup.find('div',{'class':'article'})
# Create empty lists to append the extracted data.
moviename = []
cast = []
description = []
rating = []
ratingoutof = []
year = []
genre = []
movielength = []
rot_audscore = []
rot_avgrating = []
rot_users = []
# Extracting the required data from the html soup.
rgx = re.compile('[%s]' % '()')
f = FloatProgress(min=0, max=250)
display(f)
for i, row in enumerate(summary.find('table').findAll('tr')):
    for sitem in row.findAll('span',{'class':'secondaryInfo'}):
        s = sitem.find(text=True)
        year.append(rgx.sub('', s))
    for ritem in row.findAll('td',{'class':'ratingColumn imdbRating'}):
        for iget in ritem.findAll('strong'):
            rating.append(iget.find(text=True))
            ratingoutof.append(iget.get('title').split(' ', 4)[3])
    for item in row.findAll('td',{'class':'titleColumn'}):
        for href in item.findAll('a',href=True):
            moviename.append(href.find(text=True))
            rurl = 'https://www.rottentomatoes.com/m/' + href.find(text=True)
            try:
                rc = requests.get(rurl).content
            except requests.exceptions.ConnectionError:
                # Connection refused: parse an empty page so the lookups below fall back to ""
                rc = ""
            rsoup = BeautifulSoup(rc, "lxml")
            try:
                rot_audscore.append(rsoup.find('div',{'class':'meter-value'}).find('span',{'class':'superPageFontColor'}).text)
                rot_avgrating.append(rsoup.find('div',{'class':'audience-info superPageFontColor'}).find('div').contents[2].strip())
                # The class name below is truncated ("hidd") in the source listing
                rot_users.append(rsoup.find('div',{'class':'audience-info hidd superPageFontColor'}).contents[3].contents[2].strip())
            except AttributeError:
                rot_audscore.append("")
                rot_avgrating.append("")
                rot_users.append("")
            cast.append(href.get('title'))
            imdb = "http://www.imdb.com" + href.get('href')
            try:
                iresult = requests.get(imdb)
                ic = iresult.content
                isoup = BeautifulSoup(ic, "lxml")
                description.append(isoup.find('div',{'class':'summary_text'}).find(text=True).strip())
                genre.append(isoup.find('span',{'class':'itemprop'}).find(text=True))
                movielength.append(isoup.find('time',{'itemprop':'duration'}).find(text=True).strip())
            except requests.exceptions.ConnectionError:
                description.append("")
                genre.append("")
                movielength.append("")
    sleep(.1)
    f.value = i
Note that there is a high chance that you might encounter an error while executing this script because of
the following reasons.
Your request to the URL fails. If so, try again after some time. This is common in web scraping.
The webpages are dynamic, which means the HTML tags keep changing. Study the tags and make small
changes in the code in accordance with HTML, and you should be good to go.
Step 8-7. Convert lists to a data frame and perform an analysis that meets
business requirements
# List to pandas Series
from pandas import Series
moviename = Series(moviename)
cast = Series(cast)
description = Series(description)
rating = Series(rating)
ratingoutof = Series(ratingoutof)
year = Series(year)
genre = Series(genre)
movielength = Series(movielength)
rot_audscore = Series(rot_audscore)
rot_avgrating = Series(rot_avgrating)
rot_users = Series(rot_users)
# creating dataframe and doing analysis
imdb_df = pd.concat([moviename,year,description,genre,movielength,cast,rating,
imdb_df.columns =
['moviename','year','description','genre','movielength','cast','imdb_rating','
imdb_df['rank'] = imdb_df.index + 1
imdb_df.head(1)
#output
This chapter implemented most of the techniques to extract text data from sources. In the coming
chapters, you look at how to explore, process, and clean data. You also learn about feature engineering and
building NLP applications.
© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2021
A. Kulkarni, A. Shivananda, Natural Language Processing Recipes
https://doi.org/10.1007/978-1-4842-7351-7_2
This chapter discusses various methods and techniques to preprocess textual data and to
perform exploratory data analysis. It covers the following recipes.
Recipe 1. Lowercasing
Recipe 2. Punctuation removal
Recipe 3. Stop words removal
Recipe 4. Text standardization
Recipe 5. Spelling correction
Recipe 6. Tokenization
Recipe 7. Stemming
Recipe 8. Lemmatization
Recipe 9. Exploratory data analysis
Recipe 10. Dealing with emojis and emoticons
Recipe 11. End-to-end processing pipeline
Before directly jumping into the recipes, let’s first understand the need for preprocessing
the text data. As you know, about 90% of the world’s data is unstructured and may be present
in the form of images, text, audio, and video. Text can come in various forms, from a list of
individual words to sentences to multiple paragraphs with special characters (like the symbols
and punctuation found in tweets). It also may be present in the form of web pages, HTML,
documents, and so on. This data is never clean and contains a lot of noise. It needs to be
treated with a few preprocessing functions to make sure you have the right input data for
feature engineering and model building. If you don’t preprocess the data, any algorithms built
on top of such data do not add any value to a business. This reminds us of a very popular
phrase in data science: “Garbage in, garbage out.”
Preprocessing involves transforming raw text data into an understandable format. Real-
world data is often incomplete, inconsistent, and filled with a lot of noise, and is likely to
contain many errors. Preprocessing is a proven method of resolving such issues. Data
preprocessing prepares raw text data for further processing.
Problem
You want to lowercase the text data.
Solution
The simplest way is to use the default lower() function in Python.
The lower() method converts all uppercase characters in a string to lowercase characters
and returns the resulting string.
How It Works
Follow the steps in this section to lowercase a given text or document. Here, Python is used.
x = 'Testing'
x2 = x.lower()
print(x2)
#output
testing
When you want to perform lowercasing on a data frame, use the apply function as follows.
ALIMENTARY CANAL,
AND UPON
ANTHELMINTIC MEDICINES.
Footnotes:
[40] See the Inquiry into the Diseases of the Indians, p. 19.
[41] Vol. II. of his Epidemics, p. 56.
[42] P. 136.
[43] Vol. I. p. 76.
[44] Vol. II. p. 329.
[45] Dolichos Pruriens, of Linnæus.
[46] Geoffrea, of Linnæus.
[47] Spigelia Marylandica, of Linnæus.
AN ACCOUNT
OF THE
CURE OF CANCERS.
A few years ago, a certain Doctor Hugh Martin, a surgeon of one
of the Pennsylvania regiments stationed at Pittsburg, during the
latter part of the late war, came to this city, and advertised to cure
cancers with a medicine which he said he had discovered in the
woods, in the neighbourhood of the garrison. As Dr. Martin had once
been my pupil, I took the liberty of waiting upon him, and asked him
some questions respecting his discovery. His answers were
calculated to make me believe, that his medicine was of a vegetable
nature, and that it was originally an Indian remedy. He showed me
some of the medicine, which appeared to be the powder of a well-
dried root of some kind. Anxious to see the success of this medicine
in cancerous sores, I prevailed upon the doctor to admit me to see
him apply it in two or three cases. I observed, in some instances, he
applied a powder to the parts affected, and in others only touched
them with a feather dipped in a liquid which had a white sediment,
and which he made me believe was the vegetable root diffused in
water. It gave me great pleasure to witness the efficacy of the
doctor's applications. In several cancerous ulcers, the cures he
performed were complete. Where the cancers were much connected
with the lymphatic system, or accompanied with a scrophulous habit
of body, his medicine always failed, and, in some instances, did
evident mischief.
Anxious to discover a medicine that promised relief in even a few
cases of cancers, and supposing that all the caustic vegetables were
nearly alike, I applied the phytolacca or poke-root, the stramonium,
the arum, and one or two others, to foul ulcers, in hopes of seeing
the same effects from them which I had seen from Doctor Martin's
powder; but in these I was disappointed. They gave some pain, but
performed no cures. At length I was furnished by a gentleman from
Pittsburg with a powder which I had no doubt, from a variety of
circumstances, was of the same kind as that used by Dr. Martin. I
applied it to a fungous ulcer, but without producing the degrees of
pain, inflammation, or discharge, which I had been accustomed to
see from the application of Dr. Martin's powder. After this, I should
have suspected that the powder was not a simple root, had not the
doctor continued upon all occasions to assure me, that it was wholly
a vegetable preparation.
In the beginning of the year 1784, the doctor died, and it was
generally believed that his medicine had died with him. A few weeks
after his death I procured, from one of his administrators, a few
ounces of the doctor's powder, partly with a view of applying it to a
cancerous sore which then offered, and partly with a view of
examining it more minutely than I had been able to do during the
doctor's life. Upon throwing the powder, which was of a brown
colour, upon a piece of white paper, I perceived distinctly a number
of white particles scattered through it. I suspected at first that they
were corrosive sublimate, but the usual tests of that metallic salt
soon convinced me, that I was mistaken. Recollecting that arsenic
was the basis of most of the celebrated cancer powders that have
been used in the world, I had recourse to the tests for detecting it.
Upon sprinkling a small quantity of the powder upon some coals of
fire, it emitted the garlick smell so perceptibly as to be known by
several persons whom I called into the room where I made the
experiment, and who knew nothing of the object of my inquiries.
After this, with some difficulty I picked out about three or four grains
of the white powder, and bound them between two pieces of copper,
which I threw into the fire. After the copper pieces became red hot,
I took them out of the fire, and when they had cooled, discovered
an evident whiteness imparted to both of them. One of the pieces
afterwards looked like dull silver. These two tests have generally
been thought sufficient to distinguish the presence of arsenic in any
bodies; but I made use of a third, which has lately been
communicated to the world by Mr. Bergman, and which is supposed
to be in all cases infallible.
I infused a small quantity of the powder in a solution of a
vegetable alkali in water for a few hours, and then poured it upon a
solution of blue vitriol in water. The colour of the vitriol was
immediately changed to a beautiful green, and afterwards
precipitated.
I shall close this paper with a few remarks upon this powder, and
upon the cure of cancers and foul ulcers of all kinds.
1. The use of caustics in cancers and foul ulcers is very ancient,
and universal. But I believe arsenic to be the most efficacious of any
that has ever been used. It is the basis of Plunket's and probably of
Guy's well-known cancer powders. The great art of applying it
successfully, is to dilute and mix it in such a manner as to mitigate
the violence of its action. Doctor Martin's composition was happily
calculated for this purpose. It gave less pain than the common or
lunar caustic. It excited a moderate inflammation, which separated
the morbid from the sound parts, and promoted a plentiful afflux of
humours to the sore during its application. It seldom produced an
escar; hence it insinuated itself into the deepest recesses of the
cancers, and frequently separated those fibres in an unbroken state,
which are generally called the roots of the cancer. Upon this account,
I think, in some ulcerated cancers it is to be preferred to the knife. It
has no action upon the sound skin. This Doctor Hall proved, by
confining a small quantity of it upon his arm for many hours. In
those cases where Doctor Martin used it to extract cancerous or
schirrous tumours that were not ulcerated, I have reason to believe
that he always broke the skin with Spanish flies.
2. The arsenic used by the doctor was the pure white arsenic. I
should suppose from the examination I made of the powder with the
eye, that the proportion of arsenic to the vegetable powder, could
not be more than one-fortieth part of the whole compound. I have
reason to think that the doctor employed different vegetable
substances at different times. The vegetable matter with which the
arsenic was combined in the powder which I used in my
experiments, was probably nothing more than the powder of the
root and berries of the solanum lethale, or deadly nightshade. As the
principal, and perhaps the only design of the vegetable addition was
to blunt the activity of the arsenic, I should suppose that the same
proportion of common wheat flour as the doctor used of his caustic
vegetables, would answer nearly the same purpose. In those cases
where the doctor applied a feather dipped in a liquid to the sore of
his patient, I have no doubt but his phial contained nothing but a
weak solution of arsenic in water. This is no new method of applying
arsenic to foul ulcers. Doctor Way of Wilmington has spoken in the
highest terms to me of a wash for foulnesses on the skin, as well as
old ulcers, prepared by boiling an ounce of white arsenic in two
quarts of water to three pints, and applying it once or twice a day.
3. I mentioned, formerly, that Doctor Martin was often
unsuccessful in the application of his powder. This was occasioned
by his using it indiscriminately in all cases. In schirrous and
cancerous tumours, the knife should always be preferred to the
caustic. In cancerous ulcers attended with a scrophulous or a bad
habit of body, such particularly as have their seat in the neck, in the
breasts of females, and in the axillary glands, it can only protract the
patient's misery. Most of the cancerous sores cured by Doctor Martin
were seated on the nose, or cheeks, or upon the surface or
extremities of the body. It remains yet to discover a cure for cancers
that taint the fluids, or infect the whole lymphatic system. This cure
I apprehend must be sought for in diet, or in the long use of some
internal medicine.
To pronounce a disease incurable, is often to render it so. The
intermitting fever, if left to itself, would probably prove frequently,
and perhaps more speedily fatal than cancers. And as cancerous
tumours and sores are often neglected, or treated improperly by
injudicious people, from an apprehension that they are incurable (to
which the frequent advice of physicians “to let them alone,” has no
doubt contributed), perhaps the introduction of arsenic into regular
practice as a remedy for cancers, may invite to a more early
application to physicians, and thereby prevent the deplorable cases
that have been mentioned, which are often rendered so by delay or
unskilful management.
4. It is not in cancerous sores only that Doctor Martin's powder
has been found to do service. In sores of all kinds, and from a
variety of causes, where they have been attended with fungous flesh
or callous edges, I have used the doctor's powder with advantage.
I flatter myself that I shall be excused in giving this detail of a
quack medicine, when we reflect that it was from the inventions and
temerity of quacks, that physicians have derived some of their most
active and most useful medicines.
OBSERVATIONS
UPON
THE TETANUS.
For a history of the different names and symptoms of this disease,
I beg leave to refer the reader to practical books, particularly to
Doctor Cullen's First Lines. My only design in this inquiry, is to deliver
such a theory of the disease, as may lead to a new and successful
use of old and common remedies for it.
All the remote and predisposing causes of the tetanus act by
inducing preternatural debility, and irritability in the muscular parts
of the body. In many cases, the remote causes act alone, but they
more frequently require the co-operation of an exciting cause. I shall
briefly enumerate, without discriminating them, or pointing out when
they act singly, or when in conjunction with each other.
I. Wounds on different parts of the body are the most frequent
causes of this disease. It was formerly supposed it was the effect
only of a wound, which partially divided a tendon, or a nerve; but
we now know it is often the consequence of læsions which affect the
body in a superficial manner. The following is a list of such wounds
and læsions as have been known to induce the disease:
1. Wounds in the soles of the feet, in the palms of the hands, and
under the nails, by means of nails or splinters of wood.
2. Amputations, and fractures of limbs.
3. Gun-shot wounds.
4. Venesection.
5. The extraction of a tooth, and the insertion of new teeth.
6. The extirpation of a schirrus.
7. Castration.
8. A wound on the tongue.
9. The injury which is done to the feet by frost.
10. The injury which is sometimes done to one of the toes, by
stumping it (as it is called) in walking.
11. Cutting a nail too closely. Also,
12. Cutting a corn too closely.
13. Wearing a shoe so tight as to abrade the skin of one of the
toes.
14. A wound, not more than an eighth part of an inch, upon the
forehead.
15. The stroke of a whip upon the arm, which only broke the skin.
16. Walking too soon upon a broken limb.
17. The sting of a wasp upon the glans penis.
18. A fish bone sticking in the throat.
19. Cutting the navel string in new-born infants.
Between the time in which the body is thus wounded or injured,
and the time in which the disease makes its appearance, there is an
interval which extends from one day to six weeks. In the person who
injured his toe by stumping it in walking, the disease appeared the
next day. The trifling wound on the forehead which I have
mentioned, produced both tetanus and death, the day after it was
received. I have known two instances of tetanus, from running nails
in the feet, which did not appear until six weeks afterwards. In most
of the cases of this disease from wounds which I have seen, there
was a total absence of pain and inflammation, or but very moderate
degrees of them, and in some of them the wounds had entirely
healed, before any of the symptoms of the disease had made their
appearance. Wounds and læsions are most apt to produce tetanus,
after the long continued application of heat to the body; hence its
greater frequency, from these causes, in warm than in cold climates,
and in warm than in cold weather, in northern countries.
II. Cold applied suddenly to the body, after it has been exposed to
intense heat. Of this Dr. Girdlestone mentions many instances, in his
Treatise upon Spasmodic Affections in India. It was most commonly
induced by sleeping upon the ground, after a warm day. Such is the
dampness and unwholesome nature of the ground, in some parts of
that country, that “fowls (the doctor says) put into coops at night, in
the sickly season of the year, and on the same soil that the men
slept, were always found dead the next morning, if the coop was not
placed at a certain height above the surface of the earth[48].” It was
brought on in a servant girl of Mr. Alexander Todd of Philadelphia, by
sleeping on a damp pavement, in the evening of a day in which the
mercury in Fahrenheit's thermometer stood at 90°. Dr. Chalmers
relates an instance of its having been induced by a person's sleeping
without a nightcap, after shaving his head. The late Dr. Bartram
informed me, that he had known a draught of cold water produce it
in a man who was in a preternaturally heated state. The cold air
more certainly brings on this disease, if it be applied to the body in
the form of a current. The stiff neck which is sometimes felt after
exposure to a stream of cool air from an open window, is a tendency
to a locked jaw, or a feeble and partial tetanus.
III. Worms and certain acrid matters in the alimentary canal.
Morgagni relates an instance of the former, and I shall hereafter
mention instances of the latter in new-born infants.
IV. Certain poisonous vegetables. There are several cases upon
record of its being induced by the hemlock dropwort, and the datura
stramonium, or Jamestown weed of our country.
V. It is sometimes a symptom of the bilious remitting and
intermitting fever. It is said to occur more frequently in those states
of fever in the island of Malta, than in any other part of the world.
VI. It is likewise a symptom of that malignant state of fever which
is brought on by the bite of a rabid animal, also of hysteria and gout.
VII. The grating noise produced by cutting with a knife upon a
pewter plate excited it in a servant, while he was waiting upon his
master's table in London. It proved fatal in three days.
VIII. The sight of food, after long fasting.
IX. Drunkenness.
X. Certain emotions and passions of the mind. Terror brought it on in
a brewer in this city. He had been previously debilitated by great
labour, in warm weather. I have heard of its having been induced in
a man by agitation of mind, occasioned by seeing a girl tread upon a
nail. Fear excited it in a soldier who kneeled down to be shot. Upon
being pardoned he was unable to rise, from a sudden attack of
tetanus. Grief produced it in a case mentioned by Dr. Willan.
XI. Parturition.
All these remote and exciting causes act with more or less
certainty and force, in proportion to the greater or less degrees of
fatigue which have preceded them.
It has been customary with authors to call all those cases of
tetanus, which are not brought on by wounds, symptomatic. They
are no more so than those which are said to be idiopathic. They all
depend alike upon irritating impressions, made upon one part of the
body, producing morbid excitement, or disease in another. It is
immaterial, whether the impression be made upon the intestines by
a worm, upon the ear by an ungrateful noise, upon the mind by a
strong emotion, or upon the sole of the foot by a nail; it is alike
communicated to the muscles, which, from their previous debility
and irritability, are thrown into commotions by it. In yielding to the
impression of irritants, they follow in their contractions the order of
their predisposing debility. The muscles which move the lower jaw
are affected more early, and more obstinately than any of the other
external muscles of the body, only because they are more constantly
in a relaxed, or idle state.