Table of Contents
Python Data Science Cookbook
Credits
About the Author
About the Reviewer
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why Subscribe?
Free Access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Sections
Getting ready
How to do it…
How it works…
There's more…
See also
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
1. Python for Data Science
Introduction
Using dictionary objects
Getting ready
How to do it…
How it works…
There's more…
See also
Working with a dictionary of dictionaries
Getting ready
How to do it…
How it works…
See also
Working with tuples
Getting ready
How to do it…
How it works…
There's more…
See also
Using sets
Getting ready
How to do it…
How it works…
There's more…
Writing a list
Getting ready
How to do it…
How it works…
There's more…
Creating a list from another list - list comprehension
Getting ready
How to do it…
How it works…
There's more…
Using iterators
Getting ready
How to do it…
How it works…
There's more…
Generating an iterator and a generator
Getting ready
How to do it…
How it works…
There's more…
Using iterables
Getting ready
How to do it…
How it works…
See also
Passing a function as a variable
Getting ready
How to do it…
How it works…
Embedding functions in another function
Getting ready
How to do it…
How it works…
Passing a function as a parameter
Getting ready
How to do it…
How it works…
Returning a function
Getting ready
How to do it…
How it works…
There's more…
Altering the function behavior with decorators
Getting ready
How to do it…
How it works…
Creating anonymous functions with lambda
Getting ready
How to do it…
How it works…
Using the map function
Getting ready
How to do it…
How it works…
There's more…
Working with filters
Getting ready
How to do it…
How it works…
Using zip and izip
Getting ready
How to do it…
How it works…
There's more…
See also
Processing arrays from the tabular data
Getting ready
How to do it…
How it works…
There's more…
Preprocessing the columns
Getting ready
How to do it…
How it works…
There's more…
Sorting lists
Getting ready
How to do it…
How it works…
There's more…
Sorting with a key
Getting ready
How to do it…
How it works…
There's more…
Working with itertools
Getting ready
How to do it…
How it works…
2. Python Environments
Introduction
Using NumPy libraries
Getting ready
How to do it…
How it works…
There's more…
See also
Plotting with matplotlib
Getting ready
How to do it…
How it works…
There's more…
Machine learning with scikit-learn
Getting ready
How to do it…
How it works…
There's more…
See also
3. Data Analysis – Explore and Wrangle
Introduction
Analyzing univariate data graphically
Getting ready
How to do it…
How it works…
See also
Grouping the data and using dot plots
Getting ready
How to do it…
How it works…
See also
Using scatter plots for multivariate data
Getting ready
How to do it…
How it works…
See also
Using heat maps
Getting ready
How to do it…
How it works…
There's more...
See also
Performing summary statistics and plots
Getting ready
How to do it…
How it works…
See also
Using a box-and-whisker plot
Getting ready
How to do it…
How it works…
There's more…
Imputing the data
Getting ready
How to do it…
How it works…
There's more…
See also
Performing random sampling
Getting ready
How to do it…
How it works…
There's more…
Stratified sampling
Progressive sampling
Scaling the data
Getting ready
How to do it…
How it works…
There's more…
Standardizing the data
Getting ready
How to do it…
How it works…
There's more…
Performing tokenization
Getting ready
How to do it…
How it works…
There's more…
See also
Removing stop words
How to do it…
How it works…
There's more…
See also
Stemming the words
Getting ready
How to do it…
How it works…
There's more…
See also
Performing word lemmatization
Getting ready
How to do it…
How it works…
There's more…
See also
Representing the text as a bag of words
Getting ready
How to do it…
How it works…
There's more…
See also
Calculating term frequencies and inverse document frequencies
Getting ready
How to do it…
How it works…
There's more…
4. Data Analysis – Deep Dive
Introduction
Matrix Decomposition:
Extracting the principal components
Getting ready
How to do it…
How it works…
There's more…
See also
Using Kernel PCA
Getting ready
How to do it…
How it works…
There's more…
Extracting features using singular value decomposition
Getting ready
How to do it…
How it works…
There's more…
Reducing the data dimension with random projection
Getting ready
How to do it…
How it works…
There's more…
See also
Decomposing the feature matrices using non-negative matrix factorization
Getting ready
How to do it…
How it works…
There's more…
See also
5. Data Mining – Needle in a Haystack
Introduction
Working with distance measures
Getting ready
How to do it…
How it works…
There's more...
See also
Learning and using kernel methods
Getting ready
How to do it…
How it works…
There's more...
See also
Clustering data using the k-means method
Getting ready
How to do it…
How it works…
There's more...
See also
Learning vector quantization
Getting ready
How to do it…
How it works…
There's more...
See also
Finding outliers in univariate data
Getting ready
How to do it…
How it works…
There's more…
See also
Discovering outliers using the local outlier factor method
Getting ready
How to do it…
How it works…
There's more…
6. Machine Learning 1
Introduction
Preparing data for model building
Getting ready
How to do it…
How it works…
There's more...
Finding the nearest neighbors
Getting ready
How to do it…
How it works…
There's more…
See also
Classifying documents using Naïve Bayes
Getting ready
How to do it…
How it works…
There's more…
See also
Building decision trees to solve multiclass problems
Getting ready
How to do it…
How it works…
There's more…
See also
7. Machine Learning 2
Introduction
Predicting real-valued numbers using regression
Getting ready
How to do it…
How it works…
There's more...
See also
Learning regression with L2 shrinkage – ridge
Getting ready
How to do it…
How it works…
There's more…
See also
Learning regression with L1 shrinkage – LASSO
Getting ready
How to do it…
How it works…
There's more…
See also
Using cross-validation iterators with L1 and L2 shrinkage
Getting ready
How to do it…
How it works…
There's more…
See also
8. Ensemble Methods
Introduction
Understanding Ensemble – Bagging Method
Getting ready…
How to do it
How it works…
There's more…
See also
Understanding Ensemble – Boosting Method
Getting Started…
How to do it
How it works…
There's more…
See also
Understanding Ensemble – Gradient Boosting
Getting Started…
How to do it
How it works…
There's more…
See also
9. Growing Trees
Introduction
Going from trees to Forest – Random Forest
Getting ready
How to do it...
How it works…
There's more…
See also
Growing Extremely Randomized Trees
Getting ready…
How to do it...
How it works…
There's more…
See also
Growing Rotational Forest
Getting ready…
How to do it...
How it works…
There's more…
See also
10. Large-Scale Machine Learning – Online Learning
Introduction
Using perceptron as an online learning algorithm
Getting ready
How to do it…
How it works…
There's more…
See also
Using stochastic gradient descent for regression
Getting ready
How to do it…
How it works…
There's more…
See also
Using stochastic gradient descent for classification
Getting ready
How to do it…
How it works…
There's more…
See also
Index
Python Data Science Cookbook
Copyright © 2015 Packt Publishing All rights reserved.
No part of this book may be reproduced, stored in a
retrieval system, or transmitted in any form or by any
means, without the prior written permission of the
publisher, except in the case of brief quotations
embedded in critical articles or reviews.
Livery Place
35 Livery Street
ISBN 978-1-78439-640-4
www.packtpub.com
Credits
Author
Gopi Subramanian
Reviewer
Bastiaan Sjardin
Commissioning Editor
Akram Hussain
Acquisition Editor
Nikhil Karkal
Siddhesh Salvi
Technical Editor
Danish Shaikh
Copy Editor
Tasneem Fatehi
Project Coordinator
Kranti Berde
Proofreader
Safis Editing
Indexer
Mariammal Chettiyar
Graphics
Disha Haria
Production Coordinator
Nilesh Mohite
Cover Work
Nilesh Mohite
About the Author
Gopi Subramanian is a data scientist with over 15 years
of experience in the field of data mining and machine
learning. During the past decade, he has designed,
conceived, developed, and led data mining, text mining,
natural language processing, information extraction and
retrieval, and search systems for various domains and
business verticals, including engineering infrastructure,
consumer finance, healthcare, and materials. In the
loyalty domain, he has conceived and built innovative
consumer loyalty models and designed enterprise-wide
systems for personalized promotions. He has filed over
ten patent applications at the US and Indian patent offices
and has several publications to his credit. He currently
lives and works in Bengaluru, India.
About the Reviewer
Bastiaan Sjardin is a data scientist and entrepreneur
with a background in artificial intelligence, mathematics,
and machine learning. He has an MSc degree in
cognitive science and mathematical statistics from the
University of Leiden. In the past 5 years, he has worked
on a wide range of data science projects. He is a
frequent community TA at Coursera in the social network
analysis course from the University of Michigan and the
practical machine learning course from Johns Hopkins
University. His programming languages of choice are R and
Python. Currently, he is the cofounder of Quandbee
(www.quandbee.com), a company specializing in
machine learning applications.
www.PacktPub.com
Support files, eBooks,
discount offers, and more
For support files and downloads related to your book,
please visit www.PacktPub.com.
https://www2.packtpub.com/books/subscription/packtlib
Why Subscribe?
Fully searchable across every book published by Packt
Getting ready
This section tells you what to expect in the recipe, and
describes how to set up any software or any preliminary
settings required for the recipe.
How to do it…
This section contains the steps required to follow the
recipe.
How it works…
This section usually consists of a detailed explanation of
what happened in the previous section.
There's more…
This section consists of additional information about the
recipe in order to make the reader more knowledgeable
about the recipe.
See also
This section provides helpful links to other useful
information for the recipe.
Conventions
In this book, you will find a number of text styles that
distinguish between different kinds of information. Here
are some examples of these styles and an explanation of
their meaning.
http://scikit-learn.org/stable/modules/generated/sklearn.metrics.log_loss.html
NOTE
Warnings or important notes appear in a box like this.
TIP
Tips and tricks appear like this.
Reader feedback
Feedback from our readers is always welcome. Let us
know what you think about this book—what you liked or
disliked. Reader feedback is important for us as it helps
us develop titles that you will really get the most out of.
Errata
Although we have taken every care to ensure the
accuracy of our content, mistakes do happen. If you find
a mistake in one of our books—maybe a mistake in the
text or the code—we would be grateful if you could
report this to us. By doing so, you can save other
readers from frustration and help us improve subsequent
versions of this book. If you find any errata, please report
them by visiting http://www.packtpub.com/submit-errata,
selecting your book, clicking on the Errata Submission
Form link, and entering the details of your errata. Once
your errata are verified, your submission will be accepted
and the errata will be uploaded to our website or added
to any list of existing errata under the Errata section of
that title.
Piracy
Piracy of copyrighted material on the Internet is an
ongoing problem across all media. At Packt, we take the
protection of our copyright and licenses very seriously. If
you come across any illegal copies of our works in any
form on the Internet, please provide us with the location
address or website name immediately so that we can
pursue a remedy.
Questions
If you have a problem with any aspect of this book, you
can contact us at <questions@packtpub.com>, and
we will do our best to address the problem.
Chapter 1. Python for Data Science
In this chapter, we will cover the following recipes:
Using sets
Writing a list
Using iterators
Using iterables
Returning a function
Sorting lists
Sorting with a key
Introduction
The Python programming language provides a lot of
built-in data structures and functions that are very handy
for data science programming. In this chapter, we will
look at some that are most frequently used. In the
subsequent chapters, you will see that these will be used
in various sections for different topics. A good grasp of
these will help you in the long run to quickly bootstrap a
program in order to handle data and develop algorithms.
Getting ready
Let's look at an example Python script to understand
how a dictionary operates. Given a text, the script
computes the word count, that is, how many times each
word appears in the text.
How to do it…
Let's demonstrate how to work with a dictionary in
Python, using a simple sentence as the input text. We
will follow it up with an actual dictionary creation:
How it works…
The preceding code builds a word frequency table: it
records every word together with the number of times it
occurs. The final print statement produces the following
output:
word_dict.setdefault(word,0)
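The word-frequency idea described above can be sketched with setdefault as follows. This is a minimal illustration, not the book's listing: the sample sentence is a hypothetical input chosen for the example, and the single-argument print call is written so that it runs under both Python 2 and Python 3.

```python
# Hypothetical sample text (an assumption for illustration).
sentence = "the quick brown fox jumps over the lazy dog the end"

word_dict = {}
for word in sentence.split():
    # setdefault inserts the key with the value 0 only when it is missing,
    # so the increment on the next line never raises a KeyError.
    word_dict.setdefault(word, 0)
    word_dict[word] += 1

print(word_dict)
```

Each distinct word ends up as a key, and its value is the number of times the word occurred in the sentence.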
There's more…
Python 2.5 and above provide a class named
defaultdict in the collections module. It takes care
of the setdefault step by supplying a default value
for missing keys. A defaultdict is created as follows:
word_dict = defaultdict(int)
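As a rough sketch of how defaultdict(int) removes the need for setdefault, the following counts words in a hypothetical sentence (the sentence is an assumption made for this example). A missing key is created on first access with the value int(), which is 0.

```python
from collections import defaultdict

# Hypothetical sample text (an assumption for illustration).
sentence = "to be or not to be"

word_dict = defaultdict(int)
for word in sentence.split():
    # A missing key is created with the default int() == 0 before the increment.
    word_dict[word] += 1

print(dict(word_dict))
```

Note that merely reading a missing key, such as word_dict["absent"], also inserts that key with the value 0.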
NOTE
In Python 2, a regular dictionary does not remember the order in which the keys were
inserted (since Python 3.7, the built-in dict preserves insertion order). In its
collections module, Python provides a container called OrderedDict that remembers
the order in which the keys were inserted. See the following Python documentation for
more details:
https://docs.python.org/2/library/collections.html#collections.OrderedDict
https://docs.python.org/2/tutorial/datastructures.html#dictionaries
https://docs.python.org/2/library/json.html
words = sentence.split()
word_count = Counter(words)
https://docs.python.org/2/library/collections.html#collections.Counter
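The two Counter lines above can be put into a self-contained sketch. The sentence is a hypothetical input assumed for this example; Counter and its most_common method come from the standard collections module.

```python
from collections import Counter

# Hypothetical sample text (an assumption for illustration).
sentence = "to be or not to be"

words = sentence.split()
word_count = Counter(words)

# most_common(n) returns the n most frequent words with their counts.
print(word_count.most_common(2))
```

Unlike a plain dictionary, a Counter returns 0 for a word that was never seen instead of raising a KeyError.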
See also
Working with a dictionary of dictionaries recipe in Chapter 1, Python
for Data Science
Working with a dictionary of dictionaries
As we mentioned earlier, the real power of these data
structures lies in how creatively you can use them to
achieve your tasks. Let's look at an example to
understand how to use dictionaries in a dictionary.
Getting ready
Look at the following table:
How to do it…
We will create the user_movie_rating dictionary
using an anonymous function to demonstrate the
concept of a dictionary of dictionaries.
user_movie_rating = defaultdict(lambda: defaultdict(int))
How it works…
The user_movie_rating object is a dictionary of
dictionaries. As explained in the previous section,
defaultdict takes a function as its argument; in this
case, we passed an anonymous function, defined with
lambda, that returns a new defaultdict(int). So, every
time a new key is looked up in user_movie_rating, a new
inner dictionary is created for that key. We will see
more about the lambda function in a subsequent section.
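A minimal sketch of this behavior follows. The user names, movie names, and ratings are hypothetical values made up for the example, since the original table is not reproduced here.

```python
from collections import defaultdict

# Hypothetical (user, movie, rating) records, assumed for illustration.
ratings = [
    ("alice", "movie_a", 4),
    ("alice", "movie_b", 5),
    ("bob", "movie_a", 3),
]

# The outer dictionary creates a new inner defaultdict(int) for every new user.
user_movie_rating = defaultdict(lambda: defaultdict(int))
for user, movie, rating in ratings:
    user_movie_rating[user][movie] = rating

print(user_movie_rating["alice"]["movie_b"])
```

Because both levels supply defaults, looking up an unrated movie for any user yields 0 rather than an error, at the cost of inserting the missing keys as a side effect.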
http://www.nltk.org/book/ch05.html
See also
Creating anonymous functions with lambda recipe in Chapter 1, Python
for Data Science
Working with tuples
A tuple is a container object that belongs to the sequence
types in Python. Tuples are immutable and can hold a
heterogeneous sequence of elements, separated by
commas and enclosed in parentheses. They support the
following operations:
in and not in
Getting ready
Rather than writing a full program as we did with
dictionaries, we will look at tuples through short code
fragments that concentrate on creating and
manipulating them.
How to do it…
Let's see some scripts demonstrating the creation and
manipulation of tuples:
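# 6 Slicing of tuples
a = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
print(a[1:])     # from index 1 to the end
print(a[1:3])    # indices 1 and 2
print(a[1:6:2])  # indices 1, 3, and 5
print(a[:-1])    # everything except the last element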
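To make the slicing rules concrete, the following sketch restates the four slices together with the values they evaluate to. The variable names on the left are hypothetical helpers introduced only for this example; a slice a[start:stop:step] starts at start, stops before stop, and advances by step.

```python
a = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

from_second = a[1:]     # (2, 3, 4, 5, 6, 7, 8, 9, 10)
second_third = a[1:3]   # (2, 3)
every_other = a[1:6:2]  # (2, 4, 6)
all_but_last = a[:-1]   # (1, 2, 3, 4, 5, 6, 7, 8, 9)

print(from_second)
print(second_third)
print(every_other)
print(all_but_last)
```

Slicing a tuple always returns a new tuple and leaves the original unchanged.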
How it works…
In step 1, we created a tuple. Strictly speaking, the
parentheses are not needed, but including them makes
the code more readable. As you can see, we created a
heterogeneous tuple with numeric and string values.
Step 2 details how the elements of a tuple can be
accessed through the index. Indices start from zero. A
negative index accesses the tuple from the end, so -1
refers to the last element. The output of the print
statement is as follows:
TIP
While building programs for machine learning, in particular during the feature generation
from raw data, creating feature tuples ensures that values cannot be changed by
downstream programs.
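c_tuple[2][0] = 100
print(c_tuple)
As you can see, the first element of the list stored inside
the tuple is changed to 100: the tuple itself is immutable,
but a mutable object that it contains can still be modified.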
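The difference between changing a tuple and changing an object inside it can be sketched as follows. The contents of c_tuple are hypothetical, assumed only for this example, because the original definition is not shown here.

```python
# Hypothetical tuple whose third element is a mutable list (an assumption).
c_tuple = (1, 2, [10, 20, 30])

error_message = None
try:
    c_tuple[0] = 100  # assigning to a tuple position is not allowed
except TypeError as err:
    error_message = str(err)

c_tuple[2][0] = 100  # mutating the list held by the tuple is allowed
print(error_message)
print(c_tuple)
```

The tuple still refers to the same three objects throughout; only the contents of the list object changed. For feature tuples that must stay fixed, it therefore helps to store only immutable values in them.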
His “star” had led him far from insignificant Ajaccio and was now
leading him still further. Unknown lad, cadet, lieutenant, general,
emperor, statesman, constructor, destructor, he had been all, and
more. Destiny had now set him a far more difficult task, namely, to
reign over himself. In this he was perhaps less successful than
myriads who have gone down to the grave in silence, and whose
names find no place in the printed page or the scrolls of history. In
lonely St Helena, isolated from other human habitation, spied on by
soldiers of the army which had done so much to bring about his
downfall, but surrounded by a little band of men who refused to
desert him in his last days of trial and despair, he spent the
remainder of a life which had been lived to the full. Sometimes his
old enthusiasm would revive as he reviewed the history of a
campaign, at others he would show the capriciousness of a spoilt
child at the over-conscientious sense of duty displayed by Sir Hudson
Lowe, the Governor of the island. It is perhaps a more dramatic
ending to so marvellous a story than if he had fallen in battle. Many
men have met their death in that way, but there has been but one
Imperial prisoner at St Helena, the exiled monarch whose soul took
its flight on the stormy night of the 5th May 1821.
“The glories of our blood and state
Are shadows, not substantial things;
There is no armour against fate;
Death lays his icy hand on kings;
Sceptre and crown
Must tumble down,
And in the dust be equal made
With the poor crooked scythe and spade.”
The less deeply shaded portion shows the extent of the French Empire at the height of
Napoleon’s power. The darker part shows its diminished size after 1815.
Index
Abensberg, 224
Acre, 101–2
Ajaccio, 15, 34, 39, 41, 44, 52, 55, 92, 105
Alexander I., 123, 178, 298, 306
Alexandria, 94, 95
Amiens (Treaty), 124, 131
Arcola, 85
Aspern, 236, 237
Augsburg, 148, 220
Austerlitz, 150, 153
Auxonne, 34, 39, 41, 42
Avignon, 59, 233
Cagliari, 52
Cairo, 95
Campo Formio, Treaty of, 88
Carteaux, 60, 65
Casa-Bianca, Commodore, 98
Charles, Archduke, 88, 147, 150, 221
Charles IV. (Spain), 194, 195, 196
Cintra, Convention of, 204
Clary, Mlle. Désirée, 68
Copenhagen, 123, 188, 189
Corsica, 15, 33, 34, 38, 39, 44, 66, 69, 202
Coruña, Battle of, 216
Danzig, 175
D’Enghien, Duc, 135
Desaix, General, 93, 99, 117, 119
Dnieper River, 289
Doppet, 60, 62, 65
Dresden, 259, 260, 261, 297, 298, 300
Dugommier, General, 61, 63
Dumouriez, General, 57, 58
Duroc, General, 75, 104, 298
Jaffa, 100
Jemappes, Battle of, 57
Jena, Battle of, 158, 159, 160, 162, 173, 194, 207
John, Archduke, 150
Joubert, General, 88
Junot, General, 61, 75, 87, 183, 193, 202, 203, 204, 268
Junot, Madame, 31
Landgrafenberg, 162
Lannes, Marshal, 80, 104, 107, 108, 116, 117, 119, 138, 161,
226, 227, 235, 236, 237
Leipzig, Campaign of, 291–301
Leoben, 88
Ligny, Battle of, 307
Lobau (Ile Napoléon), 238, 240, 241
Lodi, 80, 81, 85
Lonato, 84
Louis XVI., 37, 48, 49, 50, 57, 58
Lugo, 215–6
Lunéville, Treaty of, 122, 128, 131
Lützen, Battle of, 296
Lyons, 36, 43
Macdonald, Marshal, 113, 229, 241, 243, 244, 290, 293, 300,
304
Madrid, 199, 213
Malta, 93, 96, 125
Mantua, 84, 87
Marbœuf, 20, 21
Marengo, 117, 120
Marie Antoinette, 47, 49
Marmont, Marshal, 104, 107, 147, 148, 241, 245, 304
Marseilles, 38, 60, 68
Masséna, Marshal, 65, 76, 80, 83, 84, 106, 138, 150, 220, 223,
245
Médola, 84
Milan, 76, 79, 82, 83, 116, 140, 182
Millesimo, 77, 79
Montebello, 117
Montenotte Pass, 76
Moore, Sir John, 202, 204, 212, 215, 217
Moreau, General, 113, 114, 121, 122, 135
Mortier, Marshal, 134, 138, 304
Moscow, 275, 279
Munich, 115
Murat, King of Naples, 83, 93, 104, 107, 138, 160, 167, 195,
198, 295
Naples, 113, 196
Nelson, Admiral, 94, 97, 123, 133, 142, 143, 149, 215
Ney, Marshal, 128, 138, 150, 170, 175, 264, 273, 289, 307
Nice, 64, 74, 77
Valence, 30, 42
Victor, Marshal, 87, 289
Vienna, 122, 150, 233
Villeneuve, Admiral, 142, 143, 144
Vilna, 266, 268, 270, 290
Vimiero, 203
Vistula River, 169, 175, 293
Vitebsk, 207, 272, 277