Akshay Kulkarni and Adarsha Shivananda

Natural Language Processing Recipes

Unlocking Text Data with Machine Learning and
Deep Learning Using Python
2nd ed.
Akshay Kulkarni
Bangalore, Karnataka, India

Adarsha Shivananda
Bangalore, Karnataka, India

ISBN 978-1-4842-7350-0 e-ISBN 978-1-4842-7351-7

https://doi.org/10.1007/978-1-4842-7351-7

© Akshay Kulkarni and Adarsha Shivananda 2021

Apress Standard

The use of general descriptive names, registered names, trademarks,
service marks, etc. in this publication does not imply, even in the
absence of a specific statement, that such names are exempt from the
relevant protective laws and regulations and therefore free for general
use.

The publisher, the authors and the editors are safe to assume that the
advice and information in this book are believed to be true and accurate
at the date of publication. Neither the publisher nor the authors or the
editors give a warranty, expressed or implied, with respect to the
material contained herein or for any errors or omissions that may have
been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Apress imprint is published by the registered company APress
Media, LLC part of Springer Nature.
The registered company address is: 1 New York Plaza, New York, NY
10004, U.S.A.
To our families
Introduction
According to industry estimates, more than 80% of the data being
generated is in an unstructured format in the form of text, images,
audio, or video. Data is being generated as we speak, write, tweet, use
social media platforms, send messages on messaging platforms, use
ecommerce to shop, and do various other activities. The majority of this
data exists in textual form.

So, what is unstructured data? Unstructured data is information that
doesn’t reside in a traditional relational database. Examples include
documents, blogs, social media feeds, pictures, and videos.
Most insights are locked within different types of
unstructured data. Unlocking unstructured data plays a vital role in
every organization that wants to make better decisions.
This book unlocks the potential of textual data.
Textual data is the most common and comprises more than 50% of
unstructured data. Examples include tweets/posts on social media, chat
conversations, news, blogs, articles, product or services reviews, and
patient records in the healthcare sector. Recent examples include voice-
driven bots like Siri and Alexa.
To retrieve significant and actionable insights from textual data and
unlock its potential, we use natural language processing coupled with
machine learning and deep learning.
But what is natural language processing? Machines and algorithms
do not understand text or characters, so it is very important to convert
textual data into a machine-understandable format (like numbers or
binary) to analyze it. Natural language processing (NLP) allows
machines to understand and interpret the human language.
If you want to use the power of unstructured text, this book is the
right starting point. This book unearths the concepts and
implementation of natural language processing and its applications in
the real world. NLP offers unbounded opportunities for solving
interesting problems in artificial intelligence, making it the latest
frontier for developing intelligent, deep learning–based applications.
What Does This Book Cover?
Natural Language Processing Recipes is a handy problem/solution
reference for learning and implementing NLP solutions using Python.
The book is packed with lots of code and approaches that help you
quickly learn and implement both basic and advanced NLP techniques.
You will learn how to efficiently use a wide range of NLP packages,
implement text classification, and identify parts of speech. You also
learn about topic modeling, text summarization, text generation,
sentiment analysis, and many other NLP applications.
This new edition of Natural Language Processing Recipes focuses on
implementing end-to-end projects using Python and leveraging cutting-
edge algorithms and transfer learning.
The book begins by discussing text data collections, web scraping,
and different types of data sources. You learn how to clean and
preprocess text data and analyze it using advanced algorithms.
Throughout the book, you explore the semantic as well as syntactic
analysis of text. It covers complex NLP solutions that involve text
normalization, various advanced preprocessing methods, part-of-
speech (POS) tagging, parsing, text summarization, sentiment analysis,
topic modeling, named-entity recognition (NER), word2vec, seq2seq,
and more.
The book covers both fundamental and state-of-the-art techniques
used in machine learning applications and deep learning natural
language processing. This edition includes various advanced techniques
to convert text to features, like GloVe, ELMo, and BERT. It also explains
how transformers work, using Sentence-BERT and GPT as examples.
The book closes by discussing some advanced industrial applications
of NLP, each with a solution approach and implementation. It also
leverages deep learning techniques for natural language processing and
natural language generation problems, employing advanced RNNs, like
long short-term memory (LSTM), to solve complex text generation tasks.
It also explores embeddings, which are high-quality representations of
words in a language.
In this second edition, a few advanced state-of-the-art embeddings
and industrial applications are explained along with end-to-end
implementation using deep learning.
Each chapter includes several code examples and illustrations.
By the end of the book, you will have a clear understanding of
implementing natural language processing. You will have worked on
multiple examples that implement NLP techniques in the real world.
You will be comfortable with various NLP techniques coupled with
machine learning and deep learning and their industrial applications,
making your NLP journey much more interesting and improving your
Python coding skills.
Who This Book Is For
This book explains various concepts and implementations to get more
clarity when applying NLP algorithms to chosen data. You learn about
all the ingredients you need to become successful in the NLP space.
Fundamental Python skills are assumed, as well as some knowledge of
machine learning and basic NLP. If you are an NLP or machine learning
enthusiast and an intermediate Python programmer who wants to
quickly master natural language processing, this learning path will do
you a lot of good.
All you need to know are the basics of machine learning and Python
to enjoy the book.

What You Will Learn

The core concepts of implementing NLP, its various approaches, and
using Python libraries such as NLTK, TextBlob, spaCy, and Stanford
CoreNLP
Text preprocessing and feature engineering in NLP along with
advanced methods of feature engineering
Information retrieval, text summarization, sentiment analysis, text
classification, and other advanced NLP techniques solved leveraging
machine learning and deep learning
The problems faced by industries and how to implement them using
NLP techniques
Implementing an end-to-end pipeline of NLP life cycle projects,
which includes framing the problem, finding the data, collecting,
preprocessing the data, and solving it using cutting-edge techniques
and tools

What Do You Need for This Book?

To perform all the recipes in this book successfully, you need Python 3.x
or higher running on any Windows- or Unix-based operating system
with a processor of 2.0 GHz or higher and a minimum of 4 GB RAM. You
can download Python from Anaconda and leverage a Jupyter notebook
for coding purposes. This book assumes you know Keras basics and
how to install the basic machine learning and deep learning libraries.
Please make sure you upgrade or install the latest version of all the
libraries.
Python is the most popular and widely used tool for building NLP
applications. It has many sophisticated libraries to perform NLP tasks,
from basic preprocessing to advanced techniques.
To install any library in a Python Jupyter notebook, use ! before the
pip install.
NLTK is a natural language toolkit and is commonly called “the
mother of all NLP libraries.” It is one of the primary resources when it
comes to Python and NLP.

!pip install nltk

import nltk
nltk.download()

spaCy is a trending library that comes with the added flavors of a
deep learning framework. Although spaCy doesn’t cover all NLP
functionalities, it does many things well.

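!pip install spacy

#if the above doesn't work, try this in your terminal/command prompt
conda install -c conda-forge spacy
#download an English model (older spaCy releases used the 'en' shortcut)
python -m spacy download en_core_web_sm
#then load the model in Python via
import spacy
nlp = spacy.load('en_core_web_sm')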

TextBlob is one of data scientists’ favorite libraries when it comes
to implementing NLP tasks. It is based on both NLTK and Pattern.
TextBlob isn’t the fastest or most complete library, however.

!pip install textblob

CoreNLP is a Python wrapper for Stanford CoreNLP. The toolkit
provides robust, accurate, and optimized techniques for tagging,
parsing, and analyzing text in various languages.
!pip install CoreNLP
There are hundreds of other NLP libraries, but these are the widely
used and important ones.
There is an immense number of NLP industrial applications that are
leveraged to uncover insights. By the end of the book, you will have
implemented many of these use cases, from framing a business
problem to building applications and drawing business insights. The
following are some examples.
Sentiment analysis—a customer’s emotions toward products offered
by the business
Topic modeling extracts the unique topics from the group of
documents.
Complaint classifications/email classifications/ecommerce product
classification, and so on
Document categorization/management using different clustering
techniques.
Résumé shortlisting and job description matching using similarity
methods
Advanced feature engineering techniques (word2vec and fastText) to
capture context
Information/document retrieval systems, for example, search
engines
Chatbots, Q&A, and voice-to-text applications like Siri, Alexa
Language detection and translation using neural networks
Text summarization using graph methods and advanced techniques
Text generation/predicting the next sequence of words using deep
learning algorithms
Acknowledgments
We are grateful to our families for their motivation and constant
support.
We want to express our gratitude to our mentors and friends for
their input, inspiration, and support. A special thanks to Anoosh R.
Kulkarni, a data scientist at Quantziq, for his support in writing this
book and his technical input. A big thanks to the Apress team for their
constant support and help.
Finally, we would like to thank you, the reader, for showing an
interest in this book and making your natural language processing
journey more exciting.
Note that the views and opinions expressed in this book are those of
the authors.
Table of Contents
Chapter 1:Extracting the Data
Introduction
Client Data
Free Sources
Web Scraping
Recipe 1-1.Collecting Data
Problem
Solution
How It Works
Recipe 1-2.Collecting Data from PDFs
Problem
Solution
How It Works
Recipe 1-3.Collecting Data from Word Files
Problem
Solution
How It Works
Recipe 1-4.Collecting Data from JSON
Problem
Solution
How It Works
Recipe 1-5.Collecting Data from HTML
Problem
Solution
How It Works
Recipe 1-6.Parsing Text Using Regular Expressions
Problem
Solution
How It Works
Recipe 1-7.Handling Strings
Problem
Solution
How It Works
Recipe 1-8.Scraping Text from the Web
Problem
Solution
How It Works
Chapter 2:Exploring and Processing Text Data
Recipe 2-1.Converting Text Data to Lowercase
Problem
Solution
How It Works
Recipe 2-2.Removing Punctuation
Problem
Solution
How It Works
Recipe 2-3.Removing Stop Words
Problem
Solution
How It Works
Recipe 2-4.Standardizing Text
Problem
Solution
How It Works
Recipe 2-5.Correcting Spelling
Problem
Solution
How It Works
Recipe 2-6.Tokenizing Text
Problem
Solution
How It Works
Recipe 2-7.Stemming
Problem
Solution
How It Works
Recipe 2-8.Lemmatizing
Problem
Solution
How It Works
Recipe 2-9.Exploring Text Data
Problem
Solution
How It Works
Recipe 2-10.Dealing with Emojis and Emoticons
Problem
Solution
How It Works
Problem
Solution
How It Works
Problem
Solution
How It Works
Problem
Solution
How It Works
Problem
Solution
How It Works
Recipe 2-11.Building a Text Preprocessing Pipeline
Problem
Solution
How It Works
Chapter 3:Converting Text to Features
Recipe 3-1.Converting Text to Features Using One-Hot
Encoding
Problem
Solution
How It Works
Recipe 3-2.Converting Text to Features Using a Count
Vectorizer
Problem
Solution
How It Works
Recipe 3-3.Generating n-grams
Problem
Solution
How It Works
Recipe 3-4.Generating a Co-occurrence Matrix
Problem
Solution
How It Works
Recipe 3-5.Hash Vectorizing
Problem
Solution
How It Works
Recipe 3-6.Converting Text to Features Using TF-IDF
Problem
Solution
How It Works
Recipe 3-7.Implementing Word Embeddings
Problem
Solution
How It Works
Recipe 3-8.Implementing fastText
Problem
Solution
How It Works
Recipe 3-9.Converting Text to Features Using State-of-the-Art
Embeddings
Problem
Solution
ELMo
Sentence Encoders
Open-AI GPT
How It Works
Chapter 4:Advanced Natural Language Processing
Recipe 4-1.Extracting Noun Phrases
Problem
Solution
How It Works
Recipe 4-2.Finding Similarity Between Texts
Solution
How It Works
Recipe 4-3.Tagging Part of Speech
Problem
Solution
How It Works
Recipe 4-4.Extracting Entities from Text
Problem
Solution
How It Works
Recipe 4-5.Extracting Topics from Text
Problem
Solution
How It Works
Recipe 4-6.Classifying Text
Problem
Solution
How It Works
Recipe 4-7.Carrying Out Sentiment Analysis
Problem
Solution
How It Works
Recipe 4-8.Disambiguating Text
Problem
Solution
How It Works
Recipe 4-9.Converting Speech to Text
Problem
Solution
How It Works
Recipe 4-10.Converting Text to Speech
Problem
Solution
How It Works
Recipe 4-11.Translating Speech
Problem
Solution
How It Works
Chapter 5:Implementing Industry Applications
Recipe 5-1.Implementing Multiclass Classification
Problem
Solution
How It Works
Recipe 5-2.Implementing Sentiment Analysis
Problem
Solution
How It Works
Recipe 5-3.Applying Text Similarity Functions
Problem
Solution
How It Works
Recipe 5-4.Summarizing Text Data
Problem
Solution
How It Works
Recipe 5-5.Clustering Documents
Problem
Solution
How It Works
Recipe 5-6.NLP in a Search Engine
Problem
Solution
How It Works
Recipe 5-7.Detecting Fake News
Problem
Solution
How It Works
Recipe 5-8.Movie Genre Tagging
Problem
Solution
How It Works
Chapter 6:Deep Learning for NLP
Introduction to Deep Learning
Convolutional Neural Networks
Data
Architecture
Convolution
Nonlinearity (ReLU)
Pooling
Flatten, Fully Connected, and Softmax Layers
Backpropagation:Training the Neural Network
Recurrent Neural Networks
Training RNN:Backpropagation Through Time (BPTT)
Long Short-Term Memory (LSTM)
Recipe 6-1.Retrieving Information
Problem
Solution
How It Works
Recipe 6-2.Classifying Text with Deep Learning
Problem
Solution
How It Works
Recipe 6-3.Next Word Prediction
Problem
Solution
How It Works
Recipe 6-4.Stack Overflow question recommendation
Problem
Solution
How It Works
Chapter 7:Conclusion and Next-Gen NLP
Recipe 7-1.Recent advancements in text to features or
distributed representations
Problem
Solution
Recipe 7-2.Advanced deep learning for NLP
Problem
Solution
Recipe 7-3.Reinforcement learning applications in NLP
Problem
Solution
Recipe 7-4.Transfer learning and pre-trained models
Problem
Solution
Recipe 7-5.Meta-learning in NLP
Problem
Solution
Recipe 7-6.Capsule networks for NLP
Problem
Solution
Index
About the Authors
Akshay Kulkarni
is a renowned AI and machine learning
evangelist and thought leader. He has
consulted several Fortune 500 and global
enterprises on driving AI and data
science–led strategic transformation.
Akshay has rich experience in building
and scaling AI and machine learning
businesses and creating significant
impact. He is currently a data science
and AI manager at Publicis Sapient,
where he is part of strategy and
transformation interventions through AI.
He manages high-priority growth
initiatives around data science and
works on various artificial intelligence engagements by applying state-
of-the-art techniques to this space.
Akshay is also a Google Developers Expert in machine learning, a
published author of books on NLP and deep learning, and a regular
speaker at major AI and data science conferences.
In 2019, Akshay was named one of the top “40 under 40 data
scientists” in India.
In his spare time, he enjoys reading, writing, coding, and mentoring
aspiring data scientists. He lives in Bangalore, India, with his family.

Adarsha Shivananda
is a lead data scientist at Indegene Inc.’s product and technology team,
where he leads a group of analysts who enable predictive analytics and
AI features to healthcare software products. These are mainly
multichannel activities for pharma products and solving the real-time
problems encountered by pharma sales reps. Adarsha aims to build a
pool of exceptional data scientists within the organization to solve
greater health care problems through
brilliant training programs. He always
wants to stay ahead of the curve.
His core expertise involves machine
learning, deep learning,
recommendation systems, and statistics.
Adarsha has worked on various data
science projects across multiple domains
using different technologies and
methodologies. Previously, he worked
for Tredence Analytics and IQVIA.
He lives in Bangalore, India, and loves
to read, ride, and teach data science.
About the Technical Reviewer
Aakash Kag
is a data scientist at AlixPartners and is a
co-founder of the Emeelan application.
He has six years of experience in big data
analytics and has a postgraduate degree
in computer science with a specialization
in big data analytics. Aakash is
passionate about developing social
platforms, machine learning, and
meetups, where he often talks.
© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2021
A. Kulkarni, A. Shivananda, Natural Language Processing Recipes
https://doi.org/10.1007/978-1-4842-7351-7_1

1. Extracting the Data

Akshay Kulkarni (1) and Adarsha Shivananda (1)
(1) Bangalore, Karnataka, India

This chapter covers various sources of text data and the ways to extract it. Textual data can act as
information or insights for businesses. The following recipes are covered.
Recipe 1. Text data collection using APIs
Recipe 2. Reading a PDF file in Python
Recipe 3. Reading a Word document
Recipe 4. Reading a JSON object
Recipe 5. Reading an HTML page and HTML parsing
Recipe 6. Regular expressions
Recipe 7. String handling
Recipe 8. Web scraping

Introduction
Before getting into the details of the book, let’s look at generally available data sources. We need to identify
potential data sources that can help with solving data science use cases.

Client Data
For any problem statement, one of the sources is the data that is already present. The business decides
where it wants to store its data. Data storage depends on the type of business, the amount of data, and the
costs associated with the sources. The following are some examples.
SQL databases
HDFS
Cloud storage
Flat files

Free Sources
A large amount of data is freely available on the Internet. You just need to streamline the problem and start
exploring multiple free data sources.
Free APIs like Twitter
Wikipedia
Government data (e.g., http://data.gov)
Census data (e.g., www.census.gov/data.html)
Health care claim data (e.g., www.healthdata.gov)
Data science community websites (e.g., www.kaggle.com)
Google dataset search (e.g., https://datasetsearch.research.google.com)

Web Scraping
Web scraping extracts content/data from websites, blogs, forums, and retail websites (for reviews, for
example), with permission from the respective sources, using web scraping packages in Python.
There are a lot of other sources, such as news data and economic data, that can be leveraged for analysis.

Recipe 1-1. Collecting Data

There are a lot of free APIs through which you can collect data and use it to solve problems. Let’s discuss the
Twitter API.

Problem
You want to collect text data using Twitter APIs.

Solution
Twitter has a gigantic amount of data with a lot of value in it. Social media marketers make their living from
it. There is an enormous number of tweets every day, and every tweet has some story to tell. When all of this
data is collected and analyzed, it gives a business tremendous insight into its company, product,
service, and so forth.
Let’s now look at how to pull data and then explore how to leverage it in the coming chapters.

How It Works
Step 1-1. Log in to the Twitter developer portal
Log in to the Twitter developer portal at https://developer.twitter.com.
Create your own app in the Twitter developer portal, and get the following keys. Once you have these
credentials, you can start pulling data.
consumer key: The key associated with the application (Twitter, Facebook, etc.)
consumer secret: The password used to authenticate with the authentication server (Twitter, Facebook,
etc.)
access token: The key given to the client after successful authentication of keys
access token secret: The password for the access key

Step 1-2. Execute query in Python

Once all the credentials are in place, use the following code to fetch the data.

# Install tweepy
!pip install tweepy
# Import the libraries
import numpy as np
import tweepy
import json
import pandas as pd
from tweepy import OAuthHandler
# credentials (placeholders; replace them with the keys from your own app)
consumer_key = "adjbiejfaaoeh"
consumer_secret = "had73haf78af"
access_token = "jnsfby5u4yuawhafjeh"
access_token_secret = "jhdfgay768476r"
# calling API
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
# Provide the query you want to pull the data for.
# For example, pulling data for the mobile phone ABC
query = "ABC"
# Fetching tweets
# Note: Tweepy 4.0 and later rename api.search to api.search_tweets
Tweets = api.search(query, count=10, lang='en', exclude='retweets',
                    tweet_mode='extended')
This query pulls the top ten tweets when product ABC is searched. The API pulls English tweets since the
language given is 'en'. It excludes retweets.

Recipe 1-2. Collecting Data from PDFs

A lot of data is stored in PDF files. You need to extract text from these files and store it for further
analysis.

Problem
You want to read a PDF file.

Solution
The simplest way to read a PDF file is by using the PyPDF2 library.

How It Works
Follow the steps in this section to extract data from PDF files.

Step 2-1. Install and import all the necessary libraries

Here are the first lines of code.

!pip install PyPDF2

import PyPDF2
from PyPDF2 import PdfReader

Note You can download any PDF file from the web and place it in the location where you are running
this Jupyter notebook or Python script.

Step 2-2. Extract text from a PDF file

Now let’s extract the text.

#Opening the pdf file in binary mode
with open("file.pdf","rb") as pdf:
    #creating pdf reader object
    #(PdfReader replaces PdfFileReader, which PyPDF2 3.x no longer supports)
    pdf_reader = PyPDF2.PdfReader(pdf)
    #checking number of pages in a pdf file
    print(len(pdf_reader.pages))
    #creating a page object
    page = pdf_reader.pages[0]
    #finally extracting text from the page
    print(page.extract_text())
#the pdf file is closed automatically when the with block ends

Please note that the function doesn’t work for scanned PDFs.

Recipe 1-3. Collecting Data from Word Files

Next, let’s look at another small recipe that reads Word files in Python.

Problem
You want to read Word files.

Solution
The simplest way is to use the python-docx library, which is imported as docx.

How It Works
Follow the steps in this section to extract data from a Word file.

Step 3-1. Install and import all the necessary libraries

The following is the code to install and import the library.

#Install python-docx (the package that provides the docx module)
!pip install python-docx
#Import library
from docx import Document

Note You can download any Word file from the web and place it in the location where you are running a
Jupyter notebook or Python script.

Step 3-2. Extract text from a Word file

Now let’s get the text.

#Creating a word file object
doc = open("file.docx","rb")
#creating word reader object
document = Document(doc)
#Create an empty string called docu. The document variable
#stores each paragraph in the Word document.
#We then create a "for" loop that goes through each paragraph in the Word
#document and appends the paragraph text to docu.
docu=""
for para in document.paragraphs:
    docu += para.text
#closing the word file
doc.close()
#to see the output call docu
print(docu)

Recipe 1-4. Collecting Data from JSON

JSON is an open standard file format that stands for JavaScript Object Notation. It’s often used when data is
sent to a webpage from a server. This recipe explains how to read a JSON file/object.

Problem
You want to read a JSON file/object.

Solution
The simplest way is to use requests and the JSON library.

How It Works
Follow the steps in this section to extract data from JSON.

Step 4-1. Install and import all the necessary libraries

Here is the code for importing the libraries.

import requests
import json

Step 4-2. Extract text from a JSON file

Now let’s extract the text.

#extracting the text from "https://quotes.rest/qod.json"

r = requests.get("https://quotes.rest/qod.json")
res = r.json()
print(json.dumps(res, indent = 4))
#output
{
    "success": {
        "total": 1
    },
    "contents": {
        "quotes": [
            {
                "quote": "Where there is ruin, there is hope for a treasure.",
                "length": "50",
                "author": "Rumi",
                "tags": [
                    "failure",
                    "inspire",
                    "learning-from-failure"
                ],
                "category": "inspire",
                "date": "2018-09-29",
                "permalink": "https://theysaidso.com/quote/dPKsui4sQnQqgMnXHLKtfweF/rumi-where-there-is-ruin-there-is-hope-for-a-treasure",
                "title": "Inspiring Quote of the day",
                "background": "https://theysaidso.com/img/bgs/man_on_the_mountain.jpg",
                "id": "dPKsui4sQnQqgMnXHLKtfweF"
            }
        ],
        "copyright": "2017-19 theysaidso.com"
    }
}
#extract contents
q = res['contents']['quotes'][0]
q
#output
{'author': 'Rumi',
'background': 'https://theysaidso.com/img/bgs/man_on_the_mountain.jpg',
'category': 'inspire',
'date': '2018-09-29',
'id': 'dPKsui4sQnQqgMnXHLKtfweF',
'length': '50',
'permalink': 'https://theysaidso.com/quote/dPKsui4sQnQqgMnXHLKtfweF/rumi-where-there-is-ruin-there-is-hope-for-a-treasure',
'quote': 'Where there is ruin, there is hope for a treasure.',
'tags': ['failure', 'inspire', 'learning-from-failure'],
'title': 'Inspiring Quote of the day'}
#extract only quote
print(q['quote'], '\n--', q['author'])
#output (the quote of the day changes; this is the result for the response shown above)
Where there is ruin, there is hope for a treasure.
-- Rumi
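
Because the quote endpoint returns different content each day (and may be unavailable), it can help to practice the same extraction offline. The following is a minimal sketch, assuming a payload shaped like the response shown above; the raw string is a trimmed copy of that response, and first_quote is a hypothetical helper name, not part of requests or json.

```python
import json

# Trimmed copy of the response shown above, kept as a plain string
# so the example runs without a network call.
raw = """
{
    "success": {"total": 1},
    "contents": {
        "quotes": [
            {
                "quote": "Where there is ruin, there is hope for a treasure.",
                "author": "Rumi",
                "tags": ["failure", "inspire", "learning-from-failure"]
            }
        ]
    }
}
"""

def first_quote(payload):
    """Hypothetical helper: return (quote, author) or None if no quote exists."""
    quotes = payload.get("contents", {}).get("quotes", [])
    if not quotes:
        return None
    q = quotes[0]
    return q.get("quote"), q.get("author")

# json.loads turns a JSON string into a dict, much as r.json() does for a response
res = json.loads(raw)
result = first_quote(res)
print(result)

# A payload without a "contents" key yields None instead of raising KeyError
empty_result = first_quote({"success": {"total": 0}})
print(empty_result)
```

Using .get() with defaults is one possible design choice; indexing with res['contents']['quotes'][0], as in the recipe, is shorter but raises an error when a key is missing.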

Recipe 1-5. Collecting Data from HTML

HTML is short for HyperText Markup Language. It structures webpages and displays them in a browser.
There are various HTML tags that build the content. This recipe looks at reading HTML pages.

Problem
You want to parse/read HTML pages.

Solution
The simplest way is to use the bs4 library.

How It Works
Follow the steps in this section to extract data from the web.

Step 5-1. Install and import all the necessary libraries

First, import the libraries.

!pip install bs4

import urllib.request as urllib2
from bs4 import BeautifulSoup

Step 5-2. Fetch the HTML file

You can pick any website that you want to extract. Let’s use Wikipedia in this example.

response = urllib2.urlopen(
    'https://en.wikipedia.org/wiki/Natural_language_processing')
html_doc = response.read()

Step 5-3. Parse the HTML file

Now let’s get the data.

#Parsing
soup = BeautifulSoup(html_doc, 'html.parser')
# Formatting the parsed html file
strhtm = soup.prettify()
# Print the first 1000 characters
print (strhtm[:1000])
#output
<!DOCTYPE html>
<html class="client-nojs" dir="ltr" lang="en">
<head>
<meta charset="utf-8"/>
<title>
Natural language processing - Wikipedia
</title>
<script>
document.documentElement.className = document.documentElement.className.rep
</script>
<script>
(window.RLQ=window.RLQ||[]).push(function()
{mw.config.set({"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":false,"
processing","wgCurRevisionId":860741853,"wgRevisionId":860741853,"wgArticleId"
["*"],"wgCategories":["Webarchive template wayback links","All accuracy disput
identifiers","Natural language processing","Computational linguistics","Speech

Step 5-4. Extract a tag value

You can extract a tag’s value from the first instance of the tag using the following code.

print(soup.title)
print(soup.title.string)
print(soup.a.string)
print(soup.b.string)
#output
<title>Natural language processing - Wikipedia</title>
Natural language processing - Wikipedia
None
Natural language processing

Step 5-5. Extract all instances of a particular tag

Here we get all the instances of the tag that we are interested in.

for x in soup.find_all('a'): print(x.string)

#sample output
None
Jump to navigation
Jump to search
Language processing in the brain
None
None
automated online assistant
customer service
[1]
computer science
artificial intelligence
natural language
speech recognition
natural language understanding
natural language generation

Step 5-6. Extract all text from a particular tag

Finally, we get the text.

for x in soup.find_all('p'): print(x.text)

#sample output
Natural language processing (NLP) is an area of computer science and
artificial intelligence concerned with the interactions between computers
and human (natural) languages, in particular how to program computers to
process and analyze large amounts of natural language data.
Challenges in natural language processing frequently involve speech
recognition, natural language understanding, and natural language
generation.
The history of natural language processing generally started in the 1950s,
although work can be found from earlier periods.
In 1950, Alan Turing published an article titled "Intelligence" which
proposed what is now called the Turing test as a criterion of intelligence.
Note that the p tag extracted most of the text on the page.
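
To see what collecting the text of every p tag involves, here is a minimal sketch that uses the standard library's html.parser module as a dependency-free stand-in for soup.find_all('p'). It is not how bs4 is implemented; the ParagraphCollector class and the sample_html string are hypothetical and exist only for illustration.

```python
from html.parser import HTMLParser

class ParagraphCollector(HTMLParser):
    """Hypothetical helper: collect the text found inside each <p> element."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []   # finished paragraph strings
        self._buffer = None    # text pieces while inside a <p>, otherwise None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._buffer is not None:
            self.paragraphs.append("".join(self._buffer).strip())
            self._buffer = None

    def handle_data(self, data):
        # Text in nested tags such as <b> or <a> still belongs to the paragraph
        if self._buffer is not None:
            self._buffer.append(data)

# Small made-up page; the link outside the paragraphs should be ignored
sample_html = """
<html><head><title>Sample page</title></head>
<body>
<p>Natural language processing (<b>NLP</b>) is an area of computer science.</p>
<a href="#top">Jump to navigation</a>
<p>Challenges include <a href="#sr">speech recognition</a>.</p>
</body></html>
"""

collector = ParagraphCollector()
collector.feed(sample_html)
for text in collector.paragraphs:
    print(text)
```

The nested b and a tags contribute their text to the enclosing paragraph, which mirrors why x.text on a p tag returns the full sentence, links included.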

Recipe 1-6. Parsing Text Using Regular Expressions

This recipe discusses how regular expressions are helpful when dealing with text data. Regular expressions
are required when dealing with raw data from the web that contains HTML tags, long text, and repeated text.
During the process of developing your application, as well as in output, you don’t need such data.
You can do all sorts of basic and advanced data cleaning using regular expressions.

Problem
You want to parse text data using regular expressions.

Solution
The best way is to use the re library in Python.

How It Works
Let’s look at some of the ways we can use regular expressions for our tasks.
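The basic flags are I, L, M, S, U, X.
re.I ignores casing.
re.L makes matching locale dependent (in Python 3, it applies only to bytes patterns).
re.M makes ^ and $ match at the start and end of every line, not just of the whole string.
re.S makes the dot (.) match any character, including a newline.
re.U applies Unicode matching rules (the default for str patterns in Python 3).
re.X lets you write a regex in a more readable format, with whitespace and comments.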
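
The effect of the most commonly used flags can be checked on a two-line string. This is a small sketch with made-up sample text; the variable names are illustrative only.

```python
import re

text = "Python is fun.\npython is popular."

# re.I: match regardless of case
ignore_case = re.findall(r"python", text, flags=re.I)

# re.M: ^ matches at the start of every line, not only at the start of the string
line_starts = re.findall(r"^\w+", text, flags=re.M)

# re.S: the dot also matches a newline, so one match can span both lines
without_dotall = re.search(r"fun.+popular", text)
with_dotall = re.search(r"fun.+popular", text, flags=re.S)

# re.X: whitespace and comments inside the pattern are ignored
year = re.compile(r"""
    \b      # word boundary
    \d{4}   # exactly four digits
    \b      # word boundary
""", flags=re.X)
years = year.findall("Published in 2021, revised from the 2019 edition.")

print(ignore_case)             # ['Python', 'python']
print(line_starts)             # ['Python', 'python']
print(without_dotall)          # None
print(with_dotall is not None) # True
print(years)                   # ['2021', '2019']
```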
The following describes what regular expressions can do.
Find a single occurrence of character a or b: [ab]
Find characters except for a and b: [^ab]
Find the character range of a to z: [a-z]
Find a character range except a to z: [^a-z]
Find all the characters from both a to z and A to Z: [a-zA-Z]
Find any single character (except a newline): .
Find any whitespace character: \s
Find any non-whitespace character: \S
Find any digit: \d
Find any non-digit: \D
Find any non-word character: \W
Find any word character (letter, digit, or underscore): \w
Find either a or b: (a|b)
The occurrence of a is either zero or one: a? ; ? matches zero or one occurrence
The occurrence of a is zero or more times: a* ; * matches zero or more occurrences
The occurrence of a is one or more times: a+ ; + matches one or more occurrences
Match exactly three consecutive occurrences of a: a{3}
Match three or more consecutive occurrences of a: a{3,}
Match three to six consecutive occurrences of a: a{3,6}
Start of a string: ^
End of a string: $
Match word boundary: \b
Non-word boundary: \B
The re.match() and re.search() functions find patterns, which are then processed according to
the requirements of the application.
Let’s look at the differences between re.match() and re.search().
re.match() checks for a match only at the beginning of the string. So, if it finds the pattern at the
beginning of the input string, it returns a match object; otherwise, it returns None.
re.search() checks for a match anywhere in the string. It returns a match object for the first occurrence
of the pattern in the given input string or data, or None if there is no match. To get all the occurrences,
use re.findall().
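The difference between the three functions can be sketched as follows (the sample sentence is an assumption chosen for illustration):

```python
import re

sentence = "I like this book and this recipe."

# re.match() only looks at the beginning of the string.
at_start = re.match(r'I', sentence)          # match object
not_at_start = re.match(r'this', sentence)   # None: "this" is not at position 0

# re.search() scans the whole string and stops at the first occurrence.
first_hit = re.search(r'this', sentence)

# re.findall() returns every non-overlapping occurrence as a list.
all_hits = re.findall(r'this', sentence)

print(at_start.group())
print(not_at_start)
print(first_hit.start())
print(all_hits)
```

Because re.match() and re.search() can return None, it is safer to check the result before calling methods such as group() on it.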
Now let’s look at a few examples using these regular expressions.

Tokenizing
Tokenizing means splitting a sentence into words. One way to do this is to use re.split.

# Import library
import re
#run the split query (a raw string keeps \s from being treated as a string escape)
re.split(r'\s+','I like this book.')
#output
['I', 'like', 'this', 'book.']

For an explanation of regex, please refer to the main recipe.
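Note that splitting on whitespace leaves the period attached to the last token. A possible alternative, shown here only as a sketch, is to match runs of word characters with re.findall instead of splitting:

```python
import re

sentence = 'I like this book.'

# Splitting on whitespace keeps the trailing period attached to "book".
tokens_with_punct = re.split(r'\s+', sentence)

# Matching runs of word characters instead drops the punctuation.
tokens_words_only = re.findall(r'\w+', sentence)

print(tokens_with_punct)
print(tokens_words_only)
```

Which variant is preferable depends on whether punctuation carries meaning for the task at hand.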

Extracting Email IDs

The simplest way to extract email IDs is to use re.findall.
1. Read/create the document or sentences.

doc = "For more details please mail us at: xyz@abc.com, pqr@mno.com"

2. Execute the re.findall function.

addresses = re.findall(r'[\w\.-]+@[\w\.-]+', doc)

for address in addresses:
    print(address)
#Output
xyz@abc.com
pqr@mno.com

Replacing Email IDs

Let’s replace email IDs in sentences or documents with other email IDs. The simplest way to do this is by
using re.sub.
1. Read/create the document or sentences.

doc = "For more details please mail us at xyz@abc.com"

2. Execute the re.sub function.

new_email_address = re.sub(r'([\w\.-]+)@([\w\.-]+)', r'pqr@mno.com', doc)

print(new_email_address)
#Output
For more details please mail us at pqr@mno.com
For an explanation of regex, please refer to Recipe 1-6.
Note that in both instances, we used a very basic regex for email IDs. It assumes that an email ID is any
run of word characters, dots, and hyphens on either side of an @. However, there are many edge cases;
for example, the domain name must contain a dot (.), and the part before the @ can contain numbers,
the + (plus sign), and so on.
The following is an advanced regex to extract/find/replace email IDs.

([a-zA-Z0-9+._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+)

There are even more complex ones to handle all the edge cases (e.g., “.co.in” email IDs). Please give it a
try.
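As a minimal sketch, the advanced pattern above can be compiled once and reused for both extraction and replacement. The sample sentence, the name EMAIL_PATTERN, and the placeholder text are assumptions made for illustration.

```python
import re

# The advanced pattern from the text, compiled once for reuse.
EMAIL_PATTERN = re.compile(r'([a-zA-Z0-9+._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+)')

# Made-up sample sentence with a plus sign, digits, and a hyphenated domain.
doc = "Mail john.doe+news@example.com or team42@test-site.org for details"

# Extract every address.
found = EMAIL_PATTERN.findall(doc)

# Replace every address with a placeholder (the placeholder is arbitrary).
masked = EMAIL_PATTERN.sub('<email>', doc)

print(found)
print(masked)
```

Compiling the pattern avoids repeating the long expression and keeps extraction and replacement consistent with each other.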

Extracting Data from an eBook and Performing regex

Let’s solve a case study that extracts data from an ebook by using the techniques you have learned so far.
1. Extract the content from the book.

# Import library
import re
import requests
#url you want to extract
url = 'https://www.gutenberg.org/files/2638/2638-0.txt'
#function to extract
def get_book(url):
    # Sends a http request to get the text from project Gutenberg
    raw = requests.get(url).text
    # Discards the metadata from the beginning of the book
    # (re.search returns None, and .end() fails, if the marker is not found)
    start = re.search(r"\*\*\* START OF THIS PROJECT GUTENBERG EBOOK .* \*\*\*",raw ).end()
    # Stops at the first occurrence of "II", which keeps only the opening of the book
    stop = re.search(r"II", raw).start()
    # Keeps the relevant text
    text = raw[start:stop]
    return text
# processing: replace everything except letters, digits, and periods with a space
def preprocess(sentence):
    return re.sub('[^A-Za-z0-9.]+' , ' ', sentence).lower()
#calling the above function
book = get_book(url)
processed_book = preprocess(book)
print(processed_book)
# Output
produced by martin adamson david widger with corrections by andrew sly
the idiot by fyodor dostoyevsky translated by eva martin part i i. towards
the end of november during a thaw at nine o clock one morning a train on
the warsaw and petersburg railway was approaching the latter city at full
speed. the morning was so damp and misty that it was only with great
difficulty that the day succeeded in breaking and it was impossible to
distinguish anything more than a few yards away from the carriage windows.
some of the passengers by this particular train were returning from abroad
but the third class carriages were the best filled chiefly with
insignificant persons of various occupations and degrees picked up at the
different stations nearer town. all of them seemed weary and most of them
had sleepy eyes and a shivering expression while their complexions
generally appeared to have taken on the colour of the fog outside. when da
2. Perform an exploratory data analysis on this data using regex.

# Count the number of times "the" appears in the book
# (as a substring, so words such as "them" and "their" also count)

len(re.findall(r'the', processed_book))
#Output
302
#Replace "i" with "I"
processed_book = re.sub(r'\si\s', " I ", processed_book)
print(processed_book)
#output
produced by martin adamson david widger with corrections by andrew sly
the idiot by fyodor dostoyevsky translated by eva martin part I i. towards
the end of november during a thaw at nine o clock one morning a train on
the warsaw and petersburg railway was approaching the latter city at full
speed. the morning was so damp and misty that it was only with great
difficulty that the day succeeded in breaking and it was impossible to
distinguish anything more than a few yards away from the carriage windows.
some of the passengers by this particular train were returning from abroad
but the third class carriages were the best filled chiefly with
insignificant persons of various occupations and degrees picked up at the
different stations nearer town. all of them seemed weary and most of them
had sleepy eyes and a shivering expression while their complexions
generally appeared to have taken on the colour of the fog outside. when da
#find all occurrences of text in the format "abc--xyz"
re.findall(r'[a-zA-Z0-9]*--[a-zA-Z0-9]*', book)
#output
['ironical--it',
'malicious--smile',
'fur--or',
'astrachan--overcoat',
'it--the',
'Italy--was',
'malady--a',
'money--and',
'little--to',
'No--Mr',
'is--where',
'I--I',
'I--',
'--though',
'crime--we',
'or--judge',
'gaiters--still',
'--if',
'through--well',
'say--through',
'however--and',
'Epanchin--oh',
'too--at',
'was--and',
'Andreevitch--that',
'everyone--that',
'reduce--or',
'raise--to',
'listen--and',
'history--but',
'individual--one',
'yes--I',
'but--',
't--not',
'me--then',
'perhaps--',
'Yes--those',
'me--is',
'servility--if',
'Rogojin--hereditary',
'citizen--who',
'least--goodness',
'memory--but',
'latter--since',
'Rogojin--hung',
'him--I',
'anything--she',
'old--and',
'you--scarecrow',
'certainly--certainly',
'father--I',
'Barashkoff--I',
'see--and',
'everything--Lebedeff',
'about--he',
'now--I',
'Lihachof--',
'Zaleshoff--looking',
'old--fifty',
'so--and',
'this--do',
'day--not',
'that--',
'do--by',
'know--my',
'illness--I',
'well--here',
'fellow--you']
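The count of "the" computed earlier in this step matches the three letters anywhere, including inside longer words. If only the standalone word is wanted, one option is the word boundary \b, sketched here on a made-up sentence rather than on the book text:

```python
import re

sample = "the other day they saw the theatre"

# Plain substring search: also counts "the" inside "other", "they", "theatre".
substring_count = len(re.findall(r'the', sample))

# \b anchors the match at word boundaries, so only the word "the" counts.
whole_word_count = len(re.findall(r'\bthe\b', sample))

print(substring_count)
print(whole_word_count)
```

The same idea applies to the "i" replacement above, where \s on both sides plays a similar role to \b.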

Recipe 1-7. Handling Strings

This recipe discusses how to handle strings and deal with textual data. You can do all sorts of basic text
explorations using string operations.

Problem
You want to explore handling strings.

Solution
The simplest way is to use the following string functionality.
s.find(t) is an index of the first instance of string t inside s (–1 if not found)
s.rfind(t) is an index of the last instance of string t inside s (–1 if not found)
s.index(t) is like s.find(t) except it raises ValueError if not found
s.rindex(t) is like s.rfind(t) except it raises ValueError if not found
s.join(text) combines the words of the text into a string using s as the glue
s.split(t) splits s into a list wherever a t is found (whitespace by default)
s.splitlines() splits s into a list of strings, one per line
s.lower() is a lowercase version of the string s
s.upper() is an uppercase version of the string s
s.title() is a titlecased version of the string s
s.strip() is a copy of s without leading or trailing whitespace
s.replace(t, u) replaces instances of t with u inside s
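A short sketch exercising several of the functions listed above (the sample string and variable names are chosen only for illustration):

```python
s = "  Natural Language Processing with Python  "

clean = s.strip()                    # copy without leading/trailing whitespace
position = clean.find("Language")    # index of the first match
missing = clean.find("Java")         # -1 when the substring is not found

try:
    clean.index("Java")              # like find(), but raises ValueError
    index_raised = False
except ValueError:
    index_raised = True

words = clean.split()                # splits on whitespace by default
joined = "-".join(words)             # glues the words back together with "-"
lowered = clean.lower()
replaced = clean.replace("Python", "regex")
lines = "first line\nsecond line".splitlines()

print(position, missing, index_raised)
print(joined)
print(replaced)
print(lines)
```

Strings are immutable in Python, so each of these calls returns a new value and leaves the original string unchanged.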

How It Works
Now let’s look at a few of the examples.

Replacing Content
Create a string and replace the content. Creating strings is easy. It is done by enclosing the characters in
single or double quotes. And to replace, you can use the replace function.
1. Create a string.

String_v1 = "I am exploring NLP"

#To extract particular character or range of characters from string
print(String_v1[0])
#output
I
#To extract the word "exploring"
print(String_v1[5:14])
#output
exploring

2. Replace "exploring" with "learning" in the preceding string.

String_v2 = String_v1.replace("exploring", "learning")

print(String_v2)
#Output
I am learning NLP

Concatenating Two Strings

The + operator joins two strings. Note that it does not add a space between them.

s1 = "nlp"
s2 = "machine learning"
s3 = s1+s2
print(s3)
#output
nlpmachine learning

Searching for a Substring in a String

Use the find function to fetch the starting index value of the substring in the whole string.

var="I am learning NLP"

f= "learn"
var.find(f)
#output
5

Recipe 1-8. Scraping Text from the Web

This recipe discusses how to scrape data from the web.

Caution Before scraping any websites, blogs, or ecommerce sites, please make sure you read the site’s
terms and conditions on whether it gives permissions for data scraping. Generally, robots.txt states which
parts of the site automated crawlers may access (e.g., see www.alixpartners.com/robots.txt), and a
site map lists the site’s URLs (e.g., see www.alixpartners.com/sitemap.xml).
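One way to check robots.txt rules programmatically is Python's standard urllib.robotparser module. The following is only a sketch: the rules and the example.com URLs are hypothetical, and they are parsed from a list so that the code runs without network access.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption for illustration).
robots_lines = [
    "User-agent: *",
    "Disallow: /private/",
]

parser = RobotFileParser()
parser.parse(robots_lines)

# can_fetch(user_agent, url) reports whether the rules allow the request.
allowed_public = parser.can_fetch("*", "https://example.com/chart/top")
allowed_private = parser.can_fetch("*", "https://example.com/private/data")

print(allowed_public)
print(allowed_private)
```

For a live site, parser.set_url() followed by parser.read() downloads the real file instead. A permissive robots.txt does not replace reading the site's terms and conditions.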

Web scraping is also known as web harvesting and web data extraction. It is a technique to extract a large
amount of data from websites and save it in a database or locally. You can use this data to extract
information related to your customers, users, or products for the business’s benefit.
A basic understanding of HTML is a prerequisite.

Problem
You want to extract data from the web by scraping. Let’s use IMDB.com as an example of scraping top
movies.

Solution
The simplest way to do this is by using Python’s Beautiful Soup or Scrapy libraries. Let’s use Beautiful Soup
in this recipe.

How It Works
Follow the steps in this section to extract data from the web.

Step 8-1. Install all the necessary libraries

!pip install bs4
!pip install requests

Step 8-2. Import the libraries

from bs4 import BeautifulSoup
import requests
import pandas as pd
from pandas import Series, DataFrame
from ipywidgets import FloatProgress
from time import sleep
from IPython.display import display
import re
import pickle

Step 8-3. Identify the URL to extract the data

url = 'http://www.imdb.com/chart/top?ref_=nv_mv_250_6'

Step 8-4. Request the URL and download the content using Beautiful Soup
result = requests.get(url)
c = result.content
soup = BeautifulSoup(c,"lxml")
Step 8-5. Understand the website’s structure to extract the required information
Go to the website and right-click the page content to inspect the site’s HTML structure.
Identify the data and fields that you want to extract. For example, you want the movie name and IMDB
rating.
Check which div or class in the HTML contains the movie names and parse the Beautiful Soup
accordingly. In this example, you can parse the soup through <table class ="chart full-width">
and <td class="titleColumn"> to extract the movie name.
Similarly, you can fetch other data; refer to the code in step 8-6.

Step 8-6. Use Beautiful Soup to extract and parse the data from HTML tags
summary = soup.find('div',{'class':'article'})
# Create empty lists to append the extracted data.
moviename = []
cast = []
description = []
rating = []
ratingoutof = []
year = []
genre = []
movielength = []
rot_audscore = []
rot_avgrating = []
rot_users = []
# Extracting the required data from the html soup.
rgx = re.compile('[%s]' % '()')
f = FloatProgress(min=0, max=250)
display(f)
# The end of the next line is cut off in the source; the closing
# findAll('tr')))): is assumed from the first argument of zip.
for row,i in zip(summary.find('table').findAll('tr'),range(len(summary.find('table').findAll('tr')))):
    for sitem in row.findAll('span',{'class':'secondaryInfo'}):
        s = sitem.find(text=True)
        # Remove the parentheses around the year
        year.append(rgx.sub('', s))
    for ritem in row.findAll('td',{'class':'ratingColumn imdbRating'}):
        for iget in ritem.findAll('strong'):
            rating.append(iget.find(text=True))
            ratingoutof.append(iget.get('title').split(' ', 4)[3])
    for item in row.findAll('td',{'class':'titleColumn'}):
        for href in item.findAll('a',href=True):
            moviename.append(href.find(text=True))
            rurl = 'https://www.rottentomatoes.com/m/'+ href.find(text=True)
            try:
                rresult = requests.get(rurl)
            except requests.exceptions.ConnectionError:
                status_code = "Connection refused"
            rc = rresult.content
            rsoup = BeautifulSoup(rc)
            try:
                rot_audscore.append(rsoup.find('div',{'class':'meter-value'}).find('span',{'class':'superPageFontColor'}).text)
                rot_avgrating.append(rsoup.find('div',{'class':'audience-info superPageFontColor'}).find('div').contents[2].strip())
                # The class name in the next line is cut off in the source after "hidd"
                rot_users.append(rsoup.find('div',{'class':'audience-info hidd superPageFontColor'}).contents[3].contents[2].strip())
            except AttributeError:
                rot_audscore.append("")
                rot_avgrating.append("")
                rot_users.append("")
            cast.append(href.get('title'))
            imdb = "http://www.imdb.com" + href.get('href')
            try:
                iresult = requests.get(imdb)
                ic = iresult.content
                isoup = BeautifulSoup(ic)
                description.append(isoup.find('div',{'class':'summary_text'}).find(text=True).strip())
                # The end of the next line is cut off in the source; find(text=True)) is assumed
                genre.append(isoup.find('span',{'class':'itemprop'}).find(text=True))
                movielength.append(isoup.find('time',{'itemprop':'duration'}).find(text=True).strip())
            except requests.exceptions.ConnectionError:
                description.append("")
                genre.append("")
                movielength.append("")
            sleep(.1)
    f.value = i
Note that there is a high chance that you might encounter an error while executing this script because of
the following reasons.
Your request to the URL fails. If so, try again after some time. This is common in web scraping.
The webpages are dynamic, which means the HTML tags keep changing. Study the tags and make small
changes in the code in accordance with HTML, and you should be good to go.
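Because failed requests are common, it can help to retry a request a few times before giving up. The following is a minimal sketch, not part of the original recipe: fetch_with_retries and flaky_fetch are hypothetical names, and the flaky function only simulates a request that fails twice so that the example runs without network access.

```python
import time

def fetch_with_retries(fetch, retries=3, delay=0.1):
    """Call fetch() until it succeeds or the attempts are exhausted.

    fetch   -- any zero-argument callable, e.g. lambda: requests.get(url)
    retries -- maximum number of attempts (at least 1)
    delay   -- seconds to wait before the next attempt (doubles each time)
    """
    last_error = None
    for attempt in range(retries):
        try:
            return fetch()
        except Exception as error:  # in real code, catch specific exceptions
            last_error = error
            if attempt < retries - 1:
                time.sleep(delay)
                delay *= 2
    raise last_error

# Stand-in for a flaky request: fails twice, then succeeds.
attempts = []

def flaky_fetch():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("temporary failure")
    return "<html>ok</html>"

page = fetch_with_retries(flaky_fetch, retries=3, delay=0.01)
print(page, len(attempts))
```

Increasing the wait between attempts also reduces the load placed on the site being scraped.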

Step 8-7. Convert lists to a data frame and perform an analysis that meets
business requirements
# List to pandas series
moviename = Series(moviename)
cast = Series(cast)
description = Series(description)
rating = Series(rating)
ratingoutof = Series(ratingoutof)
year = Series(year)
genre = Series(genre)
movielength = Series(movielength)
rot_audscore = Series(rot_audscore)
rot_avgrating = Series(rot_avgrating)
rot_users = Series(rot_users)
# creating dataframe and doing analysis
imdb_df = pd.concat([moviename,year,description,genre,movielength,cast,rating,
imdb_df.columns =
['moviename','year','description','genre','movielength','cast','imdb_rating','
imdb_df['rank'] = imdb_df.index + 1
imdb_df.head(1)
#output

Step 8-8. Download the data frame

# Saving the file as CSV.
imdb_df.to_csv("imdbdataexport.csv")

This chapter implemented most of the techniques to extract text data from sources. In the coming
chapters, you look at how to explore, process, and clean data. You also learn about feature engineering and
building NLP applications.
© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2021
A. Kulkarni, A. Shivananda, Natural Language Processing Recipes
https://doi.org/10.1007/978-1-4842-7351-7_2

2. Exploring and Processing Text Data

Akshay Kulkarni1 and Adarsha Shivananda1
(1) Bangalore, Karnataka, India

This chapter discusses various methods and techniques to preprocess textual data and
perform exploratory data analysis. It covers the following recipes.
Recipe 1. Lowercasing
Recipe 2. Punctuation removal
Recipe 3. Stop words removal
Recipe 4. Text standardization
Recipe 5. Spelling correction
Recipe 6. Tokenization
Recipe 7. Stemming
Recipe 8. Lemmatization
Recipe 9. Exploratory data analysis
Recipe 10. Dealing with emojis and emoticons
Recipe 11. End-to-end processing pipeline
Before directly jumping into the recipes, let’s first understand the need for preprocessing
the text data. As you know, about 90% of the world’s data is unstructured and may be present
in the form of images, text, audio, and video. Text can come in various forms, from a list of
individual words to sentences to multiple paragraphs with special characters and punctuation
(as in tweets). It also may be present in the form of web pages, HTML, documents, and so on.
This data is never clean and contains a lot of noise. It needs to be treated with a few
preprocessing functions to make sure you have the right input data for
feature engineering and model building. If you don’t preprocess the data, any algorithms built
on top of such data do not add any value to a business. This reminds us of a very popular
phrase in data science: “Garbage in, garbage out.”
Preprocessing involves transforming raw text data into an understandable format. Real-
world data is often incomplete, inconsistent, and filled with a lot of noise, and is likely to
contain many errors. Preprocessing is a proven method of resolving such issues. Data
preprocessing prepares raw text data for further processing.

Recipe 2-1. Converting Text Data to Lowercase

This recipe discusses how to lowercase the text data to have all the data in a uniform format
and make sure “NLP” and “nlp” are treated as the same.

Problem
You want to lowercase the text data.
Solution
The simplest way is to use the default lower() function in Python.
The lower() method converts all uppercase characters in a string to lowercase characters
and returns the result as a new string.

How It Works
Follow the steps in this section to lowercase a given text or document. Here, Python is used.

Step 1-1. Read/create the text data

Let’s create a list of strings and assign it to a variable.

text=['This is introduction to NLP','It is likely to be useful, to people ',
'Machine learning is the new electrcity',
'There would be less hype around AI and more action going forward',
'python is the best tool!','R is good langauage','I like this book',
'I want more books like this']
#convert list to data frame
import pandas as pd
df = pd.DataFrame({'tweet':text})
print(df)
#output
tweet
0 This is introduction to NLP
1 It is likely to be useful, to people
2 Machine learning is the new electrcity
3 There would be less hype around AI and more ac...
4 python is the best tool!
5 R is good langauage
6 I like this book
7 I want more books like this

Step 1-2. Execute the lower() function on the text data

When there is only a string, directly apply the lower() function as follows.

x = 'Testing'
x2 = x.lower()
print(x2)
#output
testing

When you want to perform lowercasing on a data frame, use the apply function as follows.

df['tweet'] = df['tweet'].apply(lambda x: " ".join(x.lower() for x in x.split()))
df['tweet']
#output
0 this is introduction to nlp
1 it is likely to be useful, to people
Random documents with unrelated
content Scribd suggests to you:
war; during which time he was perfectly free from all pulmonary
disease. The spitting of blood returned soon after he settled in
private practice. To remedy this complaint, he had recourse to a low
diet, but finding it ineffectual, he partook liberally of the usual diet of
healthy men, and he now enjoys a perfect exemption from it.
It would be very easy to add many other cases, in which labour,
the employments of agriculture, and a life of hardship by sea and
land, have prevented, relieved, or cured, not only the consumption,
but pulmonary diseases of all kinds.
To the cases that have been mentioned, I shall add only one
more, which was communicated to me by the venerable Doctor
Franklin, whose conversation at all times conveyed instruction, and
not less in medicine than upon other subjects. In travelling, many
years ago, through New-England, the doctor overtook the post-rider;
and after some inquiries into the history of his life, he informed him
that he was bred a shoe-maker; that his confinement, and other
circumstances, had brought on a consumption, for which he was
ordered by a physician to ride on horseback. Finding this mode of
exercise too expensive, he made interest, upon the death of an old
post-rider, to succeed to his appointment, in which he perfectly
recovered his health in two years. After this he returned to his old
trade, upon which his consumption returned. He again mounted his
horse, and rode post in all seasons and weathers, between New-York
and Connecticut river (about 140 miles), in which employment he
continued upwards of thirty years, in perfect health.
These facts, I hope, are sufficient to establish the advantages of
restoring the original vigour of the constitution, in every attempt to
effect a radical cure of consumption.
But how shall these remedies be applied in the time of peace, or
in a country where the want of woods, and brooks without bridges,
forbid the attainment of the laborious pleasures of the Indian mode
of hunting; or where the universal extent of civilization does not
admit of our advising the toils of a new settlement, and
improvements upon bare creation? Under these circumstances, I
conceive substitutes may be obtained for each of them, nearly of
equal efficacy, and attainable with much less trouble.
1. Doctor Sydenham pronounced riding on horseback, to be as
certain a cure for consumptions as bark is for an intermitting fever. I
have no more doubt of the truth of this assertion, than I have that
inflammatory fevers are now less frequent in London than they were
in the time of Doctor Sydenham. If riding on horseback in
consumptions has ceased to be a remedy in Britain, the fault is in
the patient, and not in the remedy. “It is a sign that the stomach
requires milk (says Doctor Cadogan), when it cannot bear it.” In like
manner, the inability of the patient to bear this manly and
wholesome exercise, serves only to demonstrate the necessity and
advantages of it. I suspect the same objections to this exercise
which have been made in Britain, will not occur in the United States
of America; for the Americans, with respect to the symptoms and
degrees of epidemic and chronic diseases, appear to be nearly in the
same state that the inhabitants of England were in the seventeenth
century. We find, in proportion to the decline of the vigour of the
body, that many occasional causes produce fever and inflammation,
which would not have done it a hundred years ago.
2. The laborious employments of agriculture, if steadily pursued,
and accompanied at the same time by the simple, but wholesome
diet of a farmhouse, and a hard bed, would probably afford a good
substitute for the toils of a savage or military life.
3. Such occupations or professions as require constant labour or
exercise in the open air, in all kinds of weather, may easily be chosen
for a young man who, either from hereditary predisposition, or an
accidental affection of the lungs, is in danger of falling into a
consumption. In this we should imitate the advice given by some
wise men, always to prefer those professions for our sons, which are
the least favourable to the corrupt inclinations of their hearts. For
example, where an undue passion for money, or a crafty disposition,
discover themselves in early life, we are directed to oppose them by
the less profitable and more disinterested professions of divinity or
physic, rather than cherish them by trade, or the practice of the law.
Agreeably to this analogy, weakly children should be trained to the
laborious, and the robust, to the sedentary occupations. From a
neglect of this practice, many hundred apprentices to taylors,
shoemakers, conveyancers, watchmakers, silversmiths, and mantua-
makers, perish every year by consumptions.
4. There is a case recorded by Dr. Smollet, of the efficacy of the
cold bath in a consumption; and I have heard of its having been
used with success, in the case of a negro man, in one of the West-
India islands. To render this remedy useful, or even safe, it will be
necessary to join it with labour, or to use it in degrees that shall
prevent the alternation of the system with vigour and debility; for I
take the cure of consumption ultimately to depend upon the simple
and constant action of tonic remedies. It is to be lamented that it
often requires so much time, or such remedies to remove the
inflammatory diathesis, which attends the first stage of
consumption, as to reduce the patient too low to make use of those
tonic remedies afterwards, which would effect a radical cure.
If it were possible to graduate the tone of the system by means of
a scale, I would add, that to cure consumption, the system should
be raised to the highest degree of this scale. Nothing short of an
equilibrium of tone, or a free and vigorous action of every muscle
and viscus in the body, will fully come up to a radical cure of this
disease.
In regulating the diet of consumptive patients, I conceive it to be
as necessary to feel the pulse, as it is in determining when and in
what quantity to draw blood. Where inflammatory diathesis prevails,
a vegetable diet is certainly proper; but where the patient has
escaped, or passed this stage of the disease, I believe a vegetable
diet alone to be injurious; and am sure a moderate quantity of
animal food may be taken with advantage.
The presence or absence of this inflammatory diathesis, furnishes
the indications for administering or refraining from the use of the
bark and balsamic medicines. With all the testimonies of their having
done mischief, many of which I could produce, I have known several
cases in which they have been given with obvious advantage; but it
was only when there was a total absence of inflammatory diathesis.
Perhaps the remedies I have recommended, and the opinions I
have delivered, may derive some support from attending to the
analogy of ulcers on the legs, and in other parts of the body. The
first of these occur chiefly in habits debilitated by spiritous liquors,
and the last frequently in habits debilitated by the scrophula. In
curing these diseases, it is in vain to depend upon internal or
external medicines. The whole system must be strengthened, or we
do nothing; and this is to be effected only by exercise and a
generous diet.
In relating the facts that are contained in this inquiry, I wish I
could have avoided reasoning upon them; especially as I am
confident of the certainty of the facts, and somewhat doubtful of the
truth of my reasonings.
I shall only add, that if the cure of consumptions should at last be
effected by remedies in every respect the opposites of those
palliatives which are now fashionable and universal, no more will
happen than what we have already seen in the tetanus, the small-
pox, and the management of fractured limbs.
Should this be the case, we shall not be surprised to hear of
physicians, instead of prescribing any one, or all of the medicines
formerly enumerated for consumptions, ordering their patients to
exchange the amusements, or indolence of a city, for the toils of a
country life; of their advising farmers to exchange their plentiful
tables, and comfortable fire-sides, for the scanty but solid
subsistence, and midnight exposure of the herdsman; or of their
recommending, not so much the exercise of a passive sea voyage,
as the active labours and dangers of a common sailor. Nor should it
surprise us, after what we have seen, to hear patients relate the
pleasant adventures of their excursions or labours, in quest of their
recovery from this disease, any more than it does now to see a
strong or well-shaped limb that has been broken; or to hear a man
talk of his studies, or pleasures, during the time of his being
inoculated and attended for the small-pox.
I will not venture to assert, that there does not exist a medicine
which shall supply, at least in some degree, the place of the labour
or exercises, whose usefulness in consumptions has been
established by the facts that have been mentioned. Many instances
of the analogous effects of medicines, and of exercise upon the
human body, forbid the supposition. If there does exist in nature
such a medicine, I am disposed to believe it will be found in the
class of TONICS. If this should be the case, I conceive its strength,
or its dose, must far exceed the present state of our knowledge or
practice, with respect to the efficacy or dose of tonic medicines.
I except the disease, which arises from recent abscesses in the
lungs, from the general observation which has been made,
respecting the inefficacy of the remedies that were formerly
enumerated for the cure of consumptions without labour or exercise.
These abscesses often occur without being preceded by general
debility, or accompanied by a consumptive diathesis, and are
frequently cured by nature, or by very simple medicines.
OBSERVATIONS UPON WORMS
IN THE

ALIMENTARY CANAL,
AND UPON

ANTHELMINTIC MEDICINES.

With great diffidence I venture to lay before the public my

opinions upon worms: nor should I have presumed to do it, had I
not entertained a hope of thereby exciting further inquiries upon this
subject.
When we consider how universally worms are found in all young
animals, and how frequently they exist in the human body, without
producing disease of any kind, it is natural to conclude, that they
serve some useful and necessary purposes in the animal economy.
Do they consume the superfluous aliment which all young animals
are disposed to take, before they have been taught, by experience
or reason, the bad consequences which arise from it? It is no
objection to this opinion, that worms are unknown in the human
body in some countries. The laws of nature are diversified, and often
suspended under peculiar circumstances in many cases, where the
departure from uniformity is still more unaccountable, than in the
present instance. Do worms produce diseases from an excess in
their number, and an error in their place, in the same manner that
blood, bile, and air produce diseases from an error in their place, or
from excess in their quantities? Before these questions are decided,
I shall mention a few facts which have been the result of my own
observations upon this subject.
1. In many instances, I have seen worms discharged in the small-
pox and measles, from children who were in perfect health
previously to their being attacked by those diseases, and who never
before discovered a single symptom of worms. I shall say nothing
here of the swarms of worms which are discharged in fevers of all
kinds, until I attempt to prove that an idiopathic fever is never
produced by worms.
2. Nine out of ten of the cases which I have seen of worms, have
been in children of the grossest habits and most vigorous
constitutions. This is more especially the case where the worms are
dislodged by the small-pox and measles. Doctor Capelle of
Wilmington, in a letter which I received from him, informed me, that
in the livers of sixteen, out of eighteen rats which he dissected, he
found a number of the tænia worms. The rats were fat, and
appeared in other respects to have been in perfect health. The two
rats in which he found no worms, he says, “were very lean, and their
livers smaller in proportion than the others.”
3. In weakly children, I have often known the most powerful
anthelmintics given without bringing away a single worm. If these
medicines have afforded any relief, it has been by their tonic quality.
From this fact, is it not probable—the conjecture, I am afraid, is too
bold, but I will risk it:—is it not probable, I say, that children are
sometimes disordered from the want of worms? Perhaps the tonic
medicines which have been mentioned, render the bowels a more
quiet and comfortable asylum for them, and thereby provide the
system with the means of obviating the effects of crapulas, to which
all children are disposed. It is in this way that nature, in many
instances, cures evil by evil. I confine the salutary office of worms
only to that species of them which is known by the name of the
round worm, and which occurs most frequently in children.
Is there any such disease as an idiopathic WORM-FEVER? The
Indians in this country say there is not, and ascribe the discharge of
worms to a fever, and not a fever to the worms[40].
By adopting this opinion, I am aware that I contradict the
observations of many eminent and respectable physicians.
Doctor Huxham describes an epidemic pleurisy, in the month of
March, in the year 1740, which he supposes was produced by his
patients feeding upon some corn that had been injured by the rain
the August before[41]. He likewise mentions that a number of
people, and those too of the elderly sort[42], were afflicted at one
time with worms, in the month of April, in the year 1743.
Lieutaud gives an account of an epidemic worm-fever from
Velchius, an Italian physician[43]; and Sauvages describes, from
Vandermonde, an epidemic dysentery from worms, which yielded
finally only to worm medicines[44]. Sir John Pringle, and Doctor
Monro, likewise frequently mention worms as accompanying the
dysentery and remitting fever, and recommend the use of calomel as
an antidote to them.
I grant that worms appear more frequently in some epidemic
diseases than in others, and oftener in some years than in others.
But may not the same heat, moisture, and diet which produced the
diseases, have produced the worms? And may not their discharge
from the bowels have been occasioned in those epidemics, as in the
small-pox and measles, by the increased heat of the body, by the
want of nourishment, or by an anthelmintic quality being accidentally
combined with some of the medicines that are usually given in
fevers?
In answer to this, we are told that we often see the crisis of a
fever brought on by the discharge of worms from the bowels by
means of a purge, or by an anthelmintic medicine. Whenever this is
the case, I believe it is occasioned by offending bile being dislodged
by means of the purge, at the same time with the worms, or by the
anthelmintic medicine (if not a purge) having been given on, or near
one of the usual critical days of the fever. What makes the latter
supposition probable is, that worms are seldom suspected in the
beginning of fevers, and anthelmintic medicines seldom given, till
every other remedy has failed of success; and this generally happens
about the usual time in which fevers terminate in life or death.
It is very remarkable, that since the discovery and description of
the hydrocephalus internus, we hear and read much less than
formerly of worm-fevers. I suspect that disease of the brain has laid
the foundation for the principal part of the cases of worm-fevers
which are upon record in books of medicine. I grant that worms
sometimes increase the danger from fevers, and often confound the
diagnosis and prognosis of them, by a number of new and
anomalous symptoms. But here we see nothing more than that
complication of symptoms which often occurs in diseases of a very
different and opposite nature.
Having rejected worms as the cause of fevers, I proceed to
remark, that the diseases most commonly produced by them, belong
to Dr. Cullen's class of NEUROSES. And here I might add, that there
is scarcely a disease, or a symptom of a disease, belonging to this
class, which is not produced by worms. It would be only publishing
extracts from books, to describe them.
The chronic and nervous diseases of children, which are so
numerous and frequently fatal, are, I believe, frequently occasioned
by worms. There is no great danger, therefore, of doing mischief, by
prescribing anthelmintic medicines in all our first attempts to cure
their chronic and nervous diseases.
I have been much gratified by finding myself supported in the
above theory of worm-fevers, by the late Dr. William Hunter, and by
Dr. Butter, in his excellent treatise upon the infantile remitting fever.
I have taken great pains to find out, whether the presence of the
different species of worms might not be discovered by certain
peculiar symptoms; but all to no purpose. I once attended a girl of
twelve years of age in a fever, who discharged four yards of a tænia,
and who was so far from having discovered any peculiar symptom of
this species of worms, that she had never complained of any other
indisposition, than now and then a slight pain in the stomach, which
often occurs in young girls from a sedentary life, or from errors in
their diet. I beg leave to add further, that there is not a symptom
which has been said to indicate the presence of worms of any kind,
as the cause of a disease, that has not deceived me; and none
oftener than the one that has been so much depended upon, viz.
the picking of the nose. A discharge of worms from the bowels, is,
perhaps, the only symptom that is pathognomonic of their presence
in the intestines.
I shall now make a few remarks upon anthelmintic remedies.
But I shall first give an account of some experiments which I
made in the year 1771, upon the common earth-worm, in order to
ascertain the anthelmintic virtues of a variety of substances. I made
choice of the earth-worm for this purpose, as it is, according to
naturalists, nearly the same in its structure, manner of subsistence,
and mode of propagating its species, with the round worm of the
human body.
In the first column I shall set down, under distinct heads, the
substances in which worms were placed; and in the second and third
columns the time of their death, from the action of these substances
upon them.
Substance                                       Hours   Minutes
I. Bitter and astringent substances.
Watery infusion of aloes                        2       48
Watery infusion of rhubarb                      1       30
Watery infusion of Peruvian bark                1       30
II. Purges.
Watery infusion of jalap                        1       -
Watery infusion of bear's-foot                  1       17
Watery infusion of gamboge                      1       -
III. Salts.
1. Acids.
Vinegar                                         -       1½       convulsed.
Lime juice                                      -       1
Diluted nitrous acid                            -       1½
2. Alkali.
A watery solution of salt of tartar             -       2        convulsed, throwing up a mucus on the surface of the water.
3. Neutral Salts.
In a watery solution of common salt             -       1        convulsed.
In a watery solution of nitre                   -       ditto
In a watery solution of sal diuretic            -       ditto
In a watery solution of sal ammoniac            -       1½
In a watery solution of common salt and sugar   -       4
4. Earthy and metallic salts.
In a watery solution of Epsom salt              -       15½
In a watery solution of rock alum               -       10
In a watery solution of corrosive sublimate     -       1½       convulsed.
In a watery solution of calomel                 -       49
In a watery solution of turpeth mineral         -       1        convulsed.
In a watery solution of sugar of lead           -       3
In a watery solution of green vitriol           -       1
In a watery solution of blue vitriol            -       10
In a watery solution of white vitriol           -       30
IV. Metals.
Filings of steel                                -       2½
Filings of tin                                  1       -
V. Calcareous Earth.
Chalk                                           2       -
VI. Narcotic Substances.
Watery infusion of opium                        -       11½      convulsed.
Watery infusion of Carolina pink-root           -       33
Watery infusion of tobacco                      -       14
VII. Essential Oils.
Oil of wormwood                                 -       3        convulsed.
Oil of mint                                     -       3
Oil of caraway seed                             -       3
Oil of amber                                    -       1½
Oil of anniseed                                 -       4½
Oil of turpentine                               -       6
VIII. Arsenic.
A watery solution of white arsenic              near 2  -
IX. Fermented Liquors.
In Madeira wine                                 -       3        convulsed.
Claret                                          -       10
X. Distilled Spirit.
Common rum                                      -       1        convulsed.
XI. The Fresh Juices of Ripe Fruits.
The juice of red cherries                       -       5½
The juice of black cherries                     -       5
The juice of red currants                       -       2½
The juice of gooseberries                       -       3½
The juice of whortleberries                     -       12
The juice of blackberries                       -       7
The juice of raspberries                        -       5½
The juice of plums                              -       15
The juice of peaches                            -       25
The juice of water-melons                       -       -        no effect.
XII. Saccharine Substances.
Honey                                           -       7
Molasses                                        -       7
Brown sugar                                     -       30
Manna                                           -       2½
XIII. Aromatic Substances.
Camphor                                         -       5
Pimento                                         -       3½
Black pepper                                    -       45
XIV. Foetid Substances.
Juice of onions                                 -       3½
Watery infusion of assafœtida                   -       27
Watery infusion of Santonicum, or worm seed     1       -
XV. Miscellaneous Substances.
Sulphur mixed with oil                          2       -
Æthiops mineral                                 2       -
Sulphur                                         2       -
Solution of gunpowder                           -       1½
Solution of soap                                -       19
Oxymel of squills                               -       3½
Sweet oil                                       2       30
In the application of these experiments to the human body, an
allowance must always be made for the alteration which the several
anthelmintic substances that have been mentioned, may undergo
from mixture and diffusion in the stomach and bowels.
In order to derive any benefit from these experiments, as well as
from the observations that have been made upon anthelmintic
medicines, it will be necessary to divide them into such as act,
1. Mechanically,
2. Chemically upon worms; and,
3. Into those which possess a power composed of chemical and
mechanical qualities.
1. The mechanical medicines act indirectly and directly upon the
worms.
Those which act indirectly are, vomits, purges, bitter and
astringent substances, particularly aloes, rhubarb, bark, bear's-foot,
and worm-seed. Sweet oil acts indirectly and very feebly upon
worms. It was introduced into medicine from its efficacy in
destroying the botts in horses; but the worms which infest the
human bowels, are of a different nature, and possess very different
organs of life from those which are found in the stomach of a horse.
Those mechanical medicines which act directly upon the worms,
are cowhage[45] and powder of tin. The last of these medicines has
been supposed to act chemically upon the worms, from the arsenic
which adheres to it; but from the length of time a worm lived in a
solution of white arsenic, it is probable the tin acts altogether
mechanically upon them.
2. The medicines which act chemically upon worms, appear, from
our experiments, to be very numerous.
Nature has wisely guarded children against the morbid effects of
worms, by implanting in them an early appetite for common salt,
ripe fruits, and saccharine substances; all of which appear to be
among the most speedy and effectual poisons for worms.
Let it not be said, that nature here counteracts her own purposes.
Her conduct in this business is conformable to many of her
operations in the human body, as well as throughout all her works.
The bile is a necessary part of the animal fluids, and yet an appetite
for ripe fruits seems to be implanted chiefly to obviate the
consequences of its excess, or acrimony, in the summer and
autumnal months.
The use of common salt as an anthelmintic medicine, is both
ancient and universal. Celsus recommends it. In Ireland it is a
common practice to feed children, who are afflicted by worms, for a
week or two upon a salt-sea weed, and when the bowels are well
charged with it, to give a purge of wort in order to carry off the
worms, after they are debilitated by the salt diet.
I have administered many pounds of common salt coloured with
cochineal, in doses of half a drachm, upon an empty stomach in the
morning, with great success in destroying worms.
Ever since I observed the effects of sugar and other sweet
substances upon worms, I have recommended the liberal use of all
of them in the diet of children, with the happiest effects. The sweet
substances probably act in preventing the diseases from worms in
the stomach only, into which they often insinuate themselves,
especially in the morning. When we wish to dislodge worms from the
bowels by sugar or molasses, we must give these substances in
large quantities, so that they may escape in part the action of the
stomach upon them.
I can say nothing from my own experience of the efficacy of the
mineral salts, composed of copper, iron, and zinc, combined with
vitriolic acid, in destroying worms in the bowels. Nor have I ever
used the corrosive sublimate in small doses as an anthelmintic.
I have heard of well-attested cases of the efficacy of the oil of
turpentine in destroying worms.
The expressed juices of onions and of garlic are very common
remedies for worms. From one of the experiments, it appears that
the onion juice possesses strong anthelmintic virtues.
I have often prescribed a tea-spoonful of gunpowder in the
morning upon an empty stomach, with obvious advantage. The
active medicine here is probably the nitre.
I have found a syrup made of the bark of the Jamaica
cabbage-tree[46], to be a powerful as well as a most agreeable anthelmintic
medicine. It sometimes purges and vomits, but its good effects may
be obtained without giving it in such doses as to produce these
evacuations.
There is not a more certain anthelmintic than Carolina
pink-root[47]. But as there have been instances of death having followed
excessive doses of it, imprudently administered, and as children are
often affected by giddiness, stupor, and a redness and pain in the
eyes after taking it, I acknowledge that I have generally preferred to
it, less certain, but more safe medicines for destroying worms.
3. Of the medicines whose action is compounded of mechanical
and chemical qualities, calomel, jalap, and the powder of steel, are
the principal.
Calomel, in order to be effectual, must be given in large doses. It
is a safe and powerful anthelmintic. Combined with jalap, it often
brings away worms when given for other purposes.
Of all the medicines that I have administered, I know of none
more safe and certain than the simple preparations of iron, whether
they be given in the form of steel-filings or of the rust of iron. If ever
they fail of success, it is because they are given in too small doses. I
generally prescribe from five to thirty grains every morning, to
children between one year, and ten years old; and I have been
taught by an old sea-captain, who was cured of a tænia by this
medicine, to give from two drachms to half an ounce of it, every
morning, for three or four days, not only with safety, but with
success.
I shall conclude this essay with the following remarks:
1. Where the action of medicines upon worms in the bowels does
not agree exactly with their action upon the earth-worms in the
experiments that have been related, it must be ascribed to the
medicines being more or less altered by the action of the stomach
upon them. I conceive that the superior anthelmintic qualities of
pink-root, steel-filings, and calomel (all of which acted but slowly
upon the earth-worms compared with many other substances) are in
a great degree occasioned by their escaping the digestive powers
unchanged, and acting in a concentrated state upon the worms.
2. In fevers attended with anomalous symptoms, which are
supposed to arise from worms, I have constantly refused to yield to
the solicitations of my patients, to abandon the indications of cure in
the fever, and to pursue worms as the principal cause of the disease.
While I have adhered steadily to the usual remedies for the different
states of fever, in all their stages, I have at the same time blended
those remedies occasionally with anthelmintic medicines. In this I
have imitated the practice of physicians in many other diseases, in
which troublesome and dangerous symptoms are pursued, without
seducing the attention from the original disease. The anthelmintic
medicines prescribed in these cases, should not be the rust of iron,
and common salt, which are so very useful in chronic diseases from
worms, but calomel and jalap, and such other medicines as aid in
the cure of fevers.

Footnotes:
[40] See the Inquiry into the Diseases of the Indians, p. 19.
[41] Vol. II. of his Epidemics, p. 56.
[42] P. 136.
[43] Vol. I. p. 76.
[44] Vol. II. p. 329.
[45] Dolichos Pruriens, of Linnæus.
[46] Geoffrea, of Linnæus.
[47] Spigelia Marylandica, of Linnæus.

AN ACCOUNT OF THE EXTERNAL USE OF ARSENIC, IN THE CURE OF CANCERS.

A few years ago, a certain Doctor Hugh Martin, a surgeon of one
of the Pennsylvania regiments stationed at Pittsburg, during the
latter part of the late war, came to this city, and advertised to cure
cancers with a medicine which he said he had discovered in the
woods, in the neighbourhood of the garrison. As Dr. Martin had once
been my pupil, I took the liberty of waiting upon him, and asked him
some questions respecting his discovery. His answers were
calculated to make me believe, that his medicine was of a vegetable
nature, and that it was originally an Indian remedy. He showed me
some of the medicine, which appeared to be the powder of a
well-dried root of some kind. Anxious to see the success of this medicine
in cancerous sores, I prevailed upon the doctor to admit me to see
him apply it in two or three cases. I observed, in some instances, he
applied a powder to the parts affected, and in others only touched
them with a feather dipped in a liquid which had a white sediment,
and which he made me believe was the vegetable root diffused in
water. It gave me great pleasure to witness the efficacy of the
doctor's applications. In several cancerous ulcers, the cures he
performed were complete. Where the cancers were much connected
with the lymphatic system, or accompanied with a scrophulous habit
of body, his medicine always failed, and, in some instances, did
evident mischief.
Anxious to discover a medicine that promised relief in even a few
cases of cancers, and supposing that all the caustic vegetables were
nearly alike, I applied the phytolacca or poke-root, the stramonium,
the arum, and one or two others, to foul ulcers, in hopes of seeing
the same effects from them which I had seen from Doctor Martin's
powder; but in these I was disappointed. They gave some pain, but
performed no cures. At length I was furnished by a gentleman from
Pittsburg with a powder which I had no doubt, from a variety of
circumstances, was of the same kind as that used by Dr. Martin. I
applied it to a fungous ulcer, but without producing the degrees of
pain, inflammation, or discharge, which I had been accustomed to
see from the application of Dr. Martin's powder. After this, I should
have suspected that the powder was not a simple root, had not the
doctor continued upon all occasions to assure me, that it was wholly
a vegetable preparation.
In the beginning of the year 1784, the doctor died, and it was
generally believed that his medicine had died with him. A few weeks
after his death I procured, from one of his administrators, a few
ounces of the doctor's powder, partly with a view of applying it to a
cancerous sore which then offered, and partly with a view of
examining it more minutely than I had been able to do during the
doctor's life. Upon throwing the powder, which was of a brown
colour, upon a piece of white paper, I perceived distinctly a number
of white particles scattered through it. I suspected at first that they
were corrosive sublimate, but the usual tests of that metallic salt
soon convinced me, that I was mistaken. Recollecting that arsenic
was the basis of most of the celebrated cancer powders that have
been used in the world, I had recourse to the tests for detecting it.
Upon sprinkling a small quantity of the powder upon some coals of
fire, it emitted the garlick smell so perceptibly as to be known by
several persons whom I called into the room where I made the
experiment, and who knew nothing of the object of my inquiries.
After this, with some difficulty I picked out about three or four grains
of the white powder, and bound them between two pieces of copper,
which I threw into the fire. After the copper pieces became red hot,
I took them out of the fire, and when they had cooled, discovered
an evident whiteness imparted to both of them. One of the pieces
afterwards looked like dull silver. These two tests have generally
been thought sufficient to distinguish the presence of arsenic in any
bodies; but I made use of a third, which has lately been
communicated to the world by Mr. Bergman, and which is supposed
to be in all cases infallible.
I infused a small quantity of the powder in a solution of a
vegetable alkali in water for a few hours, and then poured it upon a
solution of blue vitriol in water. The colour of the vitriol was
immediately changed to a beautiful green, and afterwards
precipitated.
I shall close this paper with a few remarks upon this powder, and
upon the cure of cancers and foul ulcers of all kinds.
1. The use of caustics in cancers and foul ulcers is very ancient,
and universal. But I believe arsenic to be the most efficacious of any
that has ever been used. It is the basis of Plunket's and probably of
Guy's well-known cancer powders. The great art of applying it
successfully, is to dilute and mix it in such a manner as to mitigate
the violence of its action. Doctor Martin's composition was happily
calculated for this purpose. It gave less pain than the common or
lunar caustic. It excited a moderate inflammation, which separated
the morbid from the sound parts, and promoted a plentiful afflux of
humours to the sore during its application. It seldom produced an
escar; hence it insinuated itself into the deepest recesses of the
cancers, and frequently separated those fibres in an unbroken state,
which are generally called the roots of the cancer. Upon this account,
I think, in some ulcerated cancers it is to be preferred to the knife. It
has no action upon the sound skin. This Doctor Hall proved, by
confining a small quantity of it upon his arm for many hours. In
those cases where Doctor Martin used it to extract cancerous or
schirrous tumours that were not ulcerated, I have reason to believe
that he always broke the skin with Spanish flies.
2. The arsenic used by the doctor was the pure white arsenic. I
should suppose from the examination I made of the powder with the
eye, that the proportion of arsenic to the vegetable powder, could
not be more than one-fortieth part of the whole compound. I have
reason to think that the doctor employed different vegetable
substances at different times. The vegetable matter with which the
arsenic was combined in the powder which I used in my
experiments, was probably nothing more than the powder of the
root and berries of the solanum lethale, or deadly nightshade. As the
principal, and perhaps the only design of the vegetable addition was
to blunt the activity of the arsenic, I should suppose that the same
proportion of common wheat flour as the doctor used of his caustic
vegetables, would answer nearly the same purpose. In those cases
where the doctor applied a feather dipped in a liquid to the sore of
his patient, I have no doubt but his phial contained nothing but a
weak solution of arsenic in water. This is no new method of applying
arsenic to foul ulcers. Doctor Way of Wilmington has spoken in the
highest terms to me of a wash for foulnesses on the skin, as well as
old ulcers, prepared by boiling an ounce of white arsenic in two
quarts of water to three pints, and applying it once or twice a day.
3. I mentioned, formerly, that Doctor Martin was often
unsuccessful in the application of his powder. This was occasioned
by his using it indiscriminately in all cases. In schirrous and
cancerous tumours, the knife should always be preferred to the
caustic. In cancerous ulcers attended with a scrophulous or a bad
habit of body, such particularly as have their seat in the neck, in the
breasts of females, and in the axillary glands, it can only protract the
patient's misery. Most of the cancerous sores cured by Doctor Martin
were seated on the nose, or cheeks, or upon the surface or
extremities of the body. It remains yet to discover a cure for cancers
that taint the fluids, or infect the whole lymphatic system. This cure
I apprehend must be sought for in diet, or in the long use of some
internal medicine.
To pronounce a disease incurable, is often to render it so. The
intermitting fever, if left to itself, would probably prove frequently,
and perhaps more speedily fatal than cancers. And as cancerous
tumours and sores are often neglected, or treated improperly by
injudicious people, from an apprehension that they are incurable (to
which the frequent advice of physicians “to let them alone,” has no
doubt contributed), perhaps the introduction of arsenic into regular
practice as a remedy for cancers, may invite to a more early
application to physicians, and thereby prevent the deplorable cases
that have been mentioned, which are often rendered so by delay or
unskilful management.
4. It is not in cancerous sores only that Doctor Martin's powder
has been found to do service. In sores of all kinds, and from a
variety of causes, where they have been attended with fungous flesh
or callous edges, I have used the doctor's powder with advantage.
I flatter myself that I shall be excused in giving this detail of a
quack medicine, when we reflect that it was from the inventions and
temerity of quacks, that physicians have derived some of their most
active and most useful medicines.

OBSERVATIONS UPON THE TETANUS.

For a history of the different names and symptoms of this disease,
I beg leave to refer the reader to practical books, particularly to
Doctor Cullen's First Lines. My only design in this inquiry, is to deliver
such a theory of the disease, as may lead to a new and successful
use of old and common remedies for it.
All the remote and predisposing causes of the tetanus act by
inducing preternatural debility, and irritability in the muscular parts
of the body. In many cases, the remote causes act alone, but they
more frequently require the co-operation of an exciting cause. I shall
briefly enumerate them, without discriminating between them, or pointing out when
they act singly, or when in conjunction with each other.
I. Wounds on different parts of the body are the most frequent
causes of this disease. It was formerly supposed it was the effect
only of a wound, which partially divided a tendon, or a nerve; but
we now know it is often the consequence of læsions which affect the
body in a superficial manner. The following is a list of such wounds
and læsions as have been known to induce the disease:
1. Wounds in the soles of the feet, in the palms of the hands, and
under the nails, by means of nails or splinters of wood.
2. Amputations, and fractures of limbs.
3. Gun-shot wounds.
4. Venesection.
5. The extraction of a tooth, and the insertion of new teeth.
6. The extirpation of a schirrous.
7. Castration.
8. A wound on the tongue.
9. The injury which is done to the feet by frost.
10. The injury which is sometimes done to one of the toes, by
stumping it (as it is called) in walking.
11. Cutting a nail too closely. Also,
12. Cutting a corn too closely.
13. Wearing a shoe so tight as to abrade the skin of one of the
toes.
14. A wound, not more than an eighth part of an inch, upon the
forehead.
15. The stroke of a whip upon the arm, which only broke the skin.
16. Walking too soon upon a broken limb.
17. The sting of a wasp upon the glans penis.
18. A fish bone sticking in the throat.
19. Cutting the navel string in new-born infants.
Between the time in which the body is thus wounded or injured,
and the time in which the disease makes its appearance, there is an
interval which extends from one day to six weeks. In the person who
injured his toe by stumping it in walking, the disease appeared the
next day. The trifling wound on the forehead which I have
mentioned, produced both tetanus and death, the day after it was
received. I have known two instances of tetanus, from running nails
in the feet, which did not appear until six weeks afterwards. In most
of the cases of this disease from wounds which I have seen, there
was a total absence of pain and inflammation, or but very moderate
degrees of them, and in some of them the wounds had entirely
healed, before any of the symptoms of the disease had made their
appearance. Wounds and læsions are most apt to produce tetanus,
after the long continued application of heat to the body; hence its
greater frequency, from these causes, in warm than in cold climates,
and in warm than in cold weather, in northern countries.
II. Cold applied suddenly to the body, after it has been exposed to
intense heat. Of this Dr. Girdlestone mentions many instances, in his
Treatise upon Spasmodic Affections in India. It was most commonly
induced by sleeping upon the ground, after a warm day. Such is the
dampness and unwholesome nature of the ground, in some parts of
that country, that “fowls (the doctor says) put into coops at night, in
the sickly season of the year, and on the same soil that the men
slept, were always found dead the next morning, if the coop was not
placed at a certain height above the surface of the earth[48].” It was
brought on by sleeping on a damp pavement in a servant girl of Mr.
Alexander Todd of Philadelphia, in the evening of a day in which the
mercury in Fahrenheit's thermometer stood at 90°. Dr. Chalmers
relates an instance of its having been induced by a person's sleeping
without a nightcap, after shaving his head. The late Dr. Bartram
informed me, that he had known a draught of cold water produce it
in a man who was in a preternaturally heated state. The cold air
more certainly brings on this disease, if it be applied to the body in
the form of a current. The stiff neck which is sometimes felt after
exposure to a stream of cool air from an open window, is a tendency
to a locked jaw, or a feeble and partial tetanus.
III. Worms and certain acrid matters in the alimentary canal.
Morgagni relates an instance of the former, and I shall hereafter
mention instances of the latter in new-born infants.
IV. Certain poisonous vegetables. There are several cases upon
record of its being induced by the hemlock dropwort, and the datura
stramonium, or Jamestown weed of our country.
V. It is sometimes a symptom of the bilious remitting and
intermitting fever. It is said to occur more frequently in those states
of fever in the island of Malta, than in any other part of the world.
VI. It is likewise a symptom of that malignant state of fever which
is brought on by the bite of a rabid animal, also of hysteria and gout.
VII. The grating noise produced by cutting with a knife upon a
pewter plate excited it in a servant, while he was waiting upon his
master's table in London. It proved fatal in three days.
VIII. The sight of food, after long fasting.
IX. Drunkenness.
X. Certain emotions and passions of the mind. Terror brought it on
a brewer in this city. He had been previously debilitated by great
labour, in warm weather. I have heard of its having been induced in
a man by agitation of mind, occasioned by seeing a girl tread upon a
nail. Fear excited it in a soldier who kneeled down to be shot. Upon
being pardoned he was unable to rise, from a sudden attack of
tetanus. Grief produced it in a case mentioned by Dr. Willan.
XI. Parturition.
All these remote and exciting causes act with more or less
certainty and force, in proportion to the greater or less degrees of
fatigue which have preceded them.
It has been customary with authors to call all those cases of
tetanus, which are not brought on by wounds, symptomatic. They
are no more so than those which are said to be idiopathic. They all
depend alike upon irritating impressions, made upon one part of the
body, producing morbid excitement, or disease in another. It is
immaterial, whether the impression be made upon the intestines by
a worm, upon the ear by an ungrateful noise, upon the mind by a
strong emotion, or upon the sole of the foot by a nail; it is alike
communicated to the muscles, which, from their previous debility
and irritability, are thrown into commotions by it. In yielding to the
impression of irritants, they follow in their contractions the order of
their predisposing debility. The muscles which move the lower jaw
are affected more early, and more obstinately than any of the other
external muscles of the body, only because they are more constantly
in a relaxed, or idle state.

