Lectures on

Mathematical Computing

with Python

Jay Gopalakrishnan

Portland State University


© 2020 Jay Gopalakrishnan

This work is licensed under a Creative Commons Attribution–ShareAlike 4.0 International License.

You are free to:

• Share: copy and redistribute the material in any medium or format, and
• Adapt: remix, transform, and build upon the material for any purpose, even commercially.

The licensor cannot revoke these freedoms as long as you follow the license terms.
Under the following terms:

• Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes
were made. You may do so in any reasonable manner, but not in any way that suggests the licensor
endorses you or your use.
• ShareAlike: If you remix, transform, or build upon the material, you must distribute your contributions
under the same license as the original.
• No additional restrictions: You may not apply legal terms or technological measures that legally restrict
others from doing anything the license permits.

Digital Object Identifier (DOI): 10.15760/pdxopen-28 (https://doi.org/10.15760/pdxopen-28)

Recommended citation:
Gopalakrishnan, J., Lectures on Mathematical Computing with Python, PDXOpen: Open Educational Resource
28, Portland State University Library, DOI: 10.15760/pdxopen-28, July 2020.

This corrected version is dated August 4, 2020.

Preface

These lectures were prepared for a class of (mostly) second year mathematics and statistics
undergraduate students at Portland State University during Spring 2020. The term was unlike
any other. The onslaught of COVID-19 moved the course meetings online, an emergency
transition that few of us were prepared for. Many lectures reflect our preoccupations with
the damage inflicted by the virus. I have not attempted to edit these out since I felt that
a utilitarian course on computing need not be divested from the real world.

These materials offer class activities for studying basics of mathematical computing using
the python programming language, with glimpses into modern topics in scientific computation
and data science. The lectures attempt to illustrate computational thinking by examples.
They do not attempt to introduce programming from the ground up, although students, by
necessity, will learn programming skills from external materials. In my experience, students
are able and eager to learn programming by themselves using the abundant free online
resources that introduce python programming. In particular, my students and I found the two
(free and online) books of Jake VanderPlas invaluable. Many sections of these two books,
hyperlinked throughout these lectures, were assigned as required preparatory reading
materials during the course (see List of Preparatory Materials).

Materials usually covered in a first undergraduate linear algebra course and in a
one-variable differential calculus course form mathematical prerequisites for some lectures.
Concepts like convergence may not be covered rigorously in such prerequisites, but I have
not shied away from talking about them: I feel it is entirely appropriate that a first
encounter with such concepts is via computation.

Each lecture has a date of preparation. It may help the reader understand the context in
relation to current events and news headlines. The timestamp also serves as an indicator
of the state of the modules in the ever-changing python ecosystem of modules for scientific
computation. The specific version numbers of the modules used are listed overleaf. The
codes may need tinkering with to ensure compatibility with future versions. The materials
are best viewed as offering a starting point for your own adaptation.

If you are an instructor declaring these materials as a resource in your course syllabus, I
would be happy to provide any needed solutions to exercises or datafiles. If you find errors
please alert me. If you wish to contribute by updating or adding materials, please fork the
public GitHub Repository where these materials reside and send me a pull request.

Jay Gopalakrishnan
(gjay@pdx.edu)

Software Requirements:
• Python >= 3.7
• Jupyter >= 1
Main modules used:
• cartopy==0.18.0b2.dev48+
• geopandas==0.7.0
• gitpython==3.1.0
• matplotlib==3.2.1
• numpy==1.18.2
• pandas==1.0.4
• scipy==1.4.1
• scikit-learn==0.23.1
• seaborn==0.10.0
• spacy==2.2.4
Other (optional) facilities used include line_profiler, memory_profiler, numexpr,
pandas-datareader, and primesieve.

Table of Contents

Lecture Notebooks
• 01 Overview of some tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
• 02 Interacting with python . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
• 03 Working with git . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
• 04 Conversion table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
• 05 Approximating derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
• 06 Genome of SARS-CoV-2 virus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
• 07 Fibonacci primes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
• 08 Numpy blitz . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
• 09 The SEIR model of infectious diseases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
• 10 Singular value decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
• 11 Bikes on Tilikum Crossing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
• 12 Visualizing geospatial data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
• 13 Gambler’s ruin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
• 14 Google’s PageRank . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
• 15 Supervised learning by regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
• 16 Unsupervised learning by PCA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
• 17 Latent semantic analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185

Exercises
• Power sum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
• Graph functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
• Argument passing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
• Piecewise functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
• Row swap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
• Averaging matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
• Differentiation matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
• Pairwise differences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
• Hausdorff distance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
• k-nearest neighbors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
• Predator-prey model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
• Column space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
• Null space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
• Pandas from dictionaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
• Iris flowers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
• Stock prices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
• Passengers on the Titanic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
• Animate functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207

• Insurance company . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
• Probabilities on small graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
• Ehrenfest urns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
• Power method for large graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
• Google’s toy graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
• Atmospheric carbon dioxide . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
• Ovarian cancer data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
• Eigenfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
• Word vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214

Projects
• Bisection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
• Rise of CO2 in the atmosphere . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
• COVID-19 cases in the west coast . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
• World map of COVID-19 cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
• Neighbor’s color . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219

List of Preparatory Materials for Each Activity

The activities in the table of contents are enumerated again below in a linear ordering with
hyperlinks to external online preparatory materials for each.

Each entry below lists the required preparation, followed by the activities it prepares for.

Required Preparation: Watch the first few of the 44 Microsoft videos on python. Watch a 2014
video by SIAM: What is data science? Browse basic language facilities either from the
official Python tutorial or from [JV-W].
Activity: 01 Overview of tools

Required Preparation: Read about the iPython shell facilities from the first chapter of [JV-H].
Activity: 02 Interacting with python

Required Preparation: Browse Git Handbook
Activity: 03 Working with git

Required Preparation: Using the Python tutorial or [JV-W] work with if, while, for, range,
print, lists [], tuples (), and list comprehension.
Activity: 04 Conversion table; Exercise: Power sum

Required Preparation: Using Python tutorial or [JV-W], learn about functions, def, and lambda.
Activity: 05 Approximating derivatives; Project: Bisection

Required Preparation: Using Python tutorial or [JV-W], learn about dictionaries {}, strings
and file operations open, readlines
Activity: 06 Genome of SARS-CoV-2 virus; Project: Rise of CO2 in the atmosphere

Required Preparation: Learn about pytest, generator expressions, yield, line and cell magics
Activity: 07 Fibonacci primes

Required Preparation: Learn numpy basics from [JV-H], ufuncs, broadcasting, indexing, masking
Activity: 08 Numpy blitz; Exercise: Argument passing; Exercise: Piecewise functions

Required Preparation: Learn sorting, partitioning [JV-H], and quick ways to make matrices
from [numpy.org].
Activity: Exercise: Row Swap; Exercise: Averaging Matrix; Exercise: Differentiation Matrix

Required Preparation: Learn how to make simple plots using matplotlib. Read about
aggregation and masking from [JV-H]
Activity: Exercise: Graphing functions; Exercise: Pairwise differences; Exercise: Hausdorff
distance; Exercise: k-nearest neighbors

Required Preparation: Get an overview of scipy facilities. Online scipy lecture notes are
very helpful. Familiarize yourselves with scipy’s sparse and integrate modules.
Activity: 09 SEIR model of infectious diseases; Exercise: Predator-prey model

Required Preparation: Learn numpy facilities for matrix factorizations, eigenvalues etc.
Activity: 10 Singular value decomposition; Exercise: Column space; Exercise: Null space

Required Preparation: Introduce yourselves to the data analysis module pandas.
Activity: Exercise: Pandas from dictionaries; Exercise: Iris flower dataset

Required Preparation: Reinforce your pandas skills.
Activity: 11 Bikes on Tilikum Crossing; Exercise: Stock prices; Exercise: Passengers on the
Titanic; Project: Growth of COVID-19 cases in the west coast

Required Preparation: Familiarize yourselves with geopandas, cartopy, and
matplotlib.animation.
Activity: 12 Visualizing geospatial data; Exercise: Animate functions; Project: World map of
COVID-19 cases

Required Preparation: Review scipy.sparse. Introduce yourselves to NetworkX.
Activity: 13 Gambler’s Ruin; Exercise: Insurance Company; Exercise: Probabilities on small
graphs; Exercise: Ehrenfest thought experiment; Project: Neighbor’s color

Required Preparation: Be acquainted with scipy.sparse’s matrix format, specifically COO and
CSR formats.
Activity: Exercise: Power method for large graphs; Exercise: Google’s toy graph

Required Preparation: Read the good introduction to machine learning from [JV-H]
Activity: 15 Supervised learning by regression; Exercise: Atmospheric carbon dioxide

Required Preparation: Read about unsupervised machine learning, focusing specifically on PCA.
Also review the prior lecture on SVD.
Activity: 16 Unsupervised learning by PCA; Exercise: Ovarian cancer data; Exercise: Eigenfaces

Required Preparation: Learn about text features in machine learning from [JV-H].
Activity: 17 Latent semantic analysis; Exercise: Word vectors

I
Overview of some tools

March 31, 2020

This lecture is an introductory overview to give you a sense of the broad utility of a few
python tools you will encounter in later lectures. Each lecture or class activity is guided by
a Jupyter Notebook (like this document), which combines executable code, mathematical
formulae, and text notes. This overview notebook also serves to check and verify that you
have a working installation of some of the python modules we will need to use later. We
shall delve into basic programming using python (after this overview and a few further
start-up notes) starting from a later lecture.
The abilities to program, analyze, and compute with data are life skills. They are useful well
beyond your mathematics curriculum. To illustrate this claim, let us begin by considering
the most pressing current issue in our minds as we begin these lectures: the progression
of COVID-19 disease worldwide. The skills you will learn in depth later can be applied
to understand many types of data, including the data on COVID-19 disease progression.
In this overview, we shall use a few python tools to quickly obtain and visualize data on
COVID-19 disease worldwide. The live data on COVID-19 (which is changing in as yet
unknown ways) will also be used in several later activities.
Specifically, this notebook contains all the code needed to perform these tasks:
• download today’s data on COVID-19 from a cloud repository,
• make a data frame object out of the data,
• use a geospatial module to put the data on a world map,
• download county maps from US Census Bureau, and
• visualize the COVID-19 data restricted to Oregon.
The material here is intended just to give you an overview of the various tools we will learn
in depth later. There is no expectation that you can immediately digest the code here. The
goal of this overview is merely to whet your appetite and motivate you to allocate time to
learn the materials yet to come.

I.1 The modules you need


These are the python modules we shall use below.
• matplotlib (for various plotting & visualization tools in python)
• descartes (for specialized visualization of maps using matplotlib)
• gitpython (to work in python with Git repositories)
• pandas (to make data frame structures out of raw data)
• geopandas (for analysis of geospatial data)
• urllib (for fetching resources at an internet url)

Please install these modules if you do not have them already. (If you do not have these
installed, attempting to run the next cell will give you an error.)

[1]: import pandas as pd
     import os
     from git import Repo
     import matplotlib.pyplot as plt
     import geopandas as gpd
     import urllib.request   # load the request submodule explicitly; it is used below
     import shutil
     %matplotlib inline

I.2 Get the data


The Johns Hopkins University Center for Systems Science and Engineering has curated
data on COVID-19 from multiple sources and provided it online as a “git” repository in
a cloud server at https://github.com/CSSEGISandData/COVID-19. (We shall learn a bit
more about git in a later lecture.) These days, as the disease progresses, new data is being
pushed into this repository every day.
Git repositories in the cloud server can be cloned to get an identical local copy on our
computers. Let us begin by cloning a copy of the Johns Hopkins COVID-19 data repository
into a location in your computer. Please specify this location in your computer in the
variable called covidfolder below. Once you have cloned the repository, the next time
you run the same line of code, it does not clone it again. Instead, it only pulls updates
from the cloud to sync your local copy with the remote original.

[2]: # your local folder into which you want to download the covid data

covidfolder = '../../data_external/covid19'

Remember this location where you have stored the COVID-19 data. You will need to return
to it when you use the data during activities in later days, including assignment projects.

[3]: if os.path.isdir(covidfolder):   # if repo exists, pull newest data
         repo = Repo(covidfolder)
         repo.remotes.origin.pull()
     else:                            # otherwise, clone from remote
         repo = Repo.clone_from('https://github.com/CSSEGISandData/COVID-19.git',
                                covidfolder)
     datadir = repo.working_dir + '/csse_covid_19_data/csse_covid_19_daily_reports'

The folder datadir contains many files (all of which can be listed here using the command
os.listdir(datadir) if needed). The filenames begin with a date like 03-27-2020 and
end in .csv. The ending suffix csv stands for “comma separated values”, a common
simple format for storing uncompressed data.
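
Because the filenames encode dates, the most recent report can be found without opening any file. The following is a minimal sketch, assuming the month-day-year naming pattern described above; the function name latest_report and the sample listing are hypothetical, standing in for the result of os.listdir(datadir).

```python
from datetime import datetime

def latest_report(filenames):
    """Return the daily-report filename with the most recent date, or None."""
    dated = []
    for name in filenames:
        if not name.endswith('.csv'):
            continue                      # ignore files that are not csv reports
        try:
            day = datetime.strptime(name[:-4], '%m-%d-%Y')
        except ValueError:
            continue                      # ignore csv files not named by a date
        dated.append((day, name))
    return max(dated)[1] if dated else None

# hypothetical listing, standing in for os.listdir(datadir)
listing = ['03-27-2020.csv', '04-01-2020.csv', 'README.md', '12-31-2019.csv']
print(latest_report(listing))
```

Parsing the date matters here: sorting the names as plain strings would place 12-31-2019 after 04-01-2020, since the month comes first in the filename.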

I.3 Examine the data for a specific date
The python module pandas, the workhorse for all data science tasks in python, can make a
DataFrame object out of each such .csv file. You will learn more about pandas later in the
course. For now, let us pick a recent date, say March 27, 2020, and examine the COVID-19
data for that date.
[4]: c = pd.read_csv(datadir+'/03-27-2020.csv')

The DataFrame object c has over 3000 rows. An examination of the first five rows already
tells us a lot about the data layout:

[5]: c.head()

[5]:       FIPS     Admin2  Province_State  Country_Region          Last_Update  \
     0  45001.0  Abbeville  South Carolina              US  2020-03-27 22:14:55
     1  22001.0     Acadia       Louisiana              US  2020-03-27 22:14:55
     2  51001.0   Accomack        Virginia              US  2020-03-27 22:14:55
     3  16001.0        Ada           Idaho              US  2020-03-27 22:14:55
     4  19001.0      Adair            Iowa              US  2020-03-27 22:14:55

              Lat        Long_  Confirmed  Deaths  Recovered  Active  \
     0  34.223334   -82.461707          4       0          0       0
     1  30.295065   -92.414197          8       1          0       0
     2  37.767072   -75.632346          2       0          0       0
     3  43.452658  -116.241552         54       0          0       0
     4  41.330756   -94.471059          1       0          0       0

                         Combined_Key
     0  Abbeville, South Carolina, US
     1          Acadia, Louisiana, US
     2         Accomack, Virginia, US
     3                 Ada, Idaho, US
     4                Adair, Iowa, US

Note that depending on how the output is rendered where you are reading this, the later
columns may be line-wrapped or may be visible only after scrolling to the edges. This
object c, whose head part is printed above, looks like a structured array. There are features
corresponding to locations, as specified in latitude Lat and longitude Long_. The columns
Confirmed, Deaths, and Recovered represent the numbers of confirmed cases, deaths, and
recovered cases due to COVID-19 at a corresponding location.

I.4 Put the data on a map


Data like that in c contains geospatial information. One way to visualize geospatial data is
to indicate the quantity of interest on a map. We shall visualize the data in the Confirmed
column by positioning a marker at its geographical location and making the marker size
correspond to the number of confirmed cases at that position. The module geopandas
(gpd) is well-suited for visualizing geospatial data. It is built on top of the pandas library.
So it is easy to convert our pandas object c to a geopandas object.

[6]: # make a geometry object from Lat, Long


geo = gpd.points_from_xy(c['Long_'], c['Lat'])
# give the geometry to geopandas together with c
gc = gpd.GeoDataFrame(c, geometry=geo)
gc.head()

[6]:       FIPS     Admin2  Province_State  Country_Region          Last_Update  \
     0  45001.0  Abbeville  South Carolina              US  2020-03-27 22:14:55
     1  22001.0     Acadia       Louisiana              US  2020-03-27 22:14:55
     2  51001.0   Accomack        Virginia              US  2020-03-27 22:14:55
     3  16001.0        Ada           Idaho              US  2020-03-27 22:14:55
     4  19001.0      Adair            Iowa              US  2020-03-27 22:14:55

              Lat        Long_  Confirmed  Deaths  Recovered  Active  \
     0  34.223334   -82.461707          4       0          0       0
     1  30.295065   -92.414197          8       1          0       0
     2  37.767072   -75.632346          2       0          0       0
     3  43.452658  -116.241552         54       0          0       0
     4  41.330756   -94.471059          1       0          0       0

                         Combined_Key                     geometry
     0  Abbeville, South Carolina, US   POINT (-82.46171 34.22333)
     1          Acadia, Louisiana, US   POINT (-92.41420 30.29506)
     2         Accomack, Virginia, US   POINT (-75.63235 37.76707)
     3                 Ada, Idaho, US  POINT (-116.24155 43.45266)
     4                Adair, Iowa, US   POINT (-94.47106 41.33076)

The only difference between gc and c is the last column, which contains the new geometry
objects representing points on the globe. Next, in order to place markers at these points on
a map of the world, we need to get a simple low resolution world map:

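[7]: # gpd.datasets ships with the geopandas version listed earlier (0.7.0);
     # it was removed in geopandas 1.0, where the map file must be obtained separately
     world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
     world.plot();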

You can download and use maps with better resolution from Natural Earth, but that will
be too much of a digression for this overview. On top of the above low-resolution map, we
can now put the markers whose sizes are proportional to the number of confirmed cases.

[8]: base = world.plot(alpha=0.3)


msz = 500 * gc['Confirmed'] / gc['Confirmed'].max()
gc.plot(ax=base, column='Confirmed', markersize=msz, alpha=0.7);

These python tools have made it incredibly easy for us to immediately identify the COVID-
19 trouble spots in the world. Moreover, these visualizations can be updated easily by
re-running this code as data becomes available for other days.
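
The marker scaling used in the plot above can be isolated in a few lines. This is only an illustrative sketch in plain python; the function name marker_sizes and the parameter largest are hypothetical, and the factor 500 is simply the value chosen in the cell above.

```python
def marker_sizes(values, largest=500):
    """Scale nonnegative counts linearly so that the maximum maps to `largest`."""
    top = max(values)
    if top == 0:
        # avoid dividing by zero when there are no cases anywhere
        return [0.0 for _ in values]
    return [largest * v / top for v in values]

# made-up counts, for illustration only
print(marker_sizes([0, 5, 10]))   # [0.0, 250.0, 500.0]
```

Assuming the marker size is interpreted as an area, as it is in matplotlib's scatter, it is the area of each marker (not its diameter) that grows in proportion to the number of confirmed cases.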

I.5 Restricting to Oregon


Focusing on our part of the world, let us see how to restrict the COVID-19 data in the data
frame c to Oregon.

[9]: co = c[c['Province_State']=='Oregon']

The variable co now contains the data restricted to Oregon. However, we are now presented
with a problem. To visualize the restricted data, we need a map of Oregon. The
module geopandas does not carry any information about Oregon and its counties. However,
this information is available from the United States Census Bureau. (By the way, the
2020 census is happening now! Do not forget to respond to their survey. They are one of
our authoritative sources of quality data.)
To visualize the COVID-19 information on a map of Oregon, we need to get the county
boundary information from the census bureau. This illustrates a common situation that
arises when trying to analyze data: it is often necessary to procure and merge data from
multiple sources in order to understand a real-world phenomenon.
A quick internet search reveals the census page with county information. The information
is available in an online file cb_2018_us_county_500k.zip at the URL below. Python al-
lows you to download this file using its urllib module without even needing to leave this
notebook.
[10]: # url of the data
      census_url = 'https://www2.census.gov/geo/tiger/GENZ2018/shp/cb_2018_us_county_500k.zip'

      # location of your download
      your_download_folder = '../../data_external'
      if not os.path.isdir(your_download_folder):
          os.mkdir(your_download_folder)
      us_county_file = your_download_folder + '/cb_2018_us_county_500k.zip'

      # download if the file doesn't already exist
      if not os.path.isfile(us_county_file):
          with urllib.request.urlopen(census_url) as response, \
               open(us_county_file, 'wb') as out_file:
              shutil.copyfileobj(response, out_file)
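
The download-only-if-missing pattern of the cell above tends to recur whenever external data is used, so it may be convenient to wrap it in a function. The following is a sketch under that assumption; the name download_if_missing and its return convention are hypothetical and not part of these lectures.

```python
import os
import shutil
import urllib.request

def download_if_missing(url, path):
    """Download url to path unless path already exists.

    Returns True if a download happened, False if the file was already there.
    """
    if os.path.isfile(path):
        return False
    folder = os.path.dirname(path)
    if folder:
        # create the destination folder (and any parents) if needed
        os.makedirs(folder, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(path, 'wb') as out_file:
        shutil.copyfileobj(response, out_file)
    return True

# usage with the names from the cell above (not executed here):
# download_if_missing(census_url, us_county_file)
```

Returning a flag makes it easy to report whether a fresh copy was fetched, and os.makedirs with exist_ok=True also covers the case of nested folders that do not yet exist.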

Now, your local computer has a zip file, which has among its contents, files with geometry
information on the county boundaries, which can be read by geopandas. We let geopandas
directly read in the zip file: it knows which information to extract from the zip archive to
make a data frame with geometry.

[11]: us_counties = gpd.read_file(f"zip://{us_county_file}")


us_counties.head()

[11]:   STATEFP  COUNTYFP  COUNTYNS        AFFGEOID  GEOID     NAME  LSAD       ALAND  \
      0      21       007  00516850  0500000US21007  21007  Ballard    06   639387454
      1      21       017  00516855  0500000US21017  21017  Bourbon    06   750439351
      2      21       031  00516862  0500000US21031  21031   Butler    06  1103571974
      3      21       065  00516879  0500000US21065  21065   Estill    06   655509930
      4      21       069  00516881  0500000US21069  21069  Fleming    06   902727151

           AWATER                                           geometry
      0  69473325  POLYGON ((-89.18137 37.04630, -89.17938 37.053...
      1   4829777  POLYGON ((-84.44266 38.28324, -84.44114 38.283...
      2  13943044  POLYGON ((-86.94486 37.07341, -86.94346 37.074...
      3   6516335  POLYGON ((-84.12662 37.64540, -84.12483 37.646...
      4   7182793  POLYGON ((-83.98428 38.44549, -83.98246 38.450...

The object us_counties has information about all the counties. Now, we need to restrict
this data to just that of Oregon. Looking at the columns, we find something called
STATEFP. Searching through the government pages, we find that STATEFP refers to a
2-character state FIPS code. The FIPS code refers to Federal Information Processing Standard,
which was a “standard” at one time, then deemed obsolete, but still continues to be used
today. All that aside, it suffices to note that Oregon’s FIPS code is 41. Once we know this,
python makes it easy to restrict the data to Oregon:

[12]: ore = us_counties[us_counties['STATEFP']=='41']


ore.plot();
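
In the census table shown earlier, the GEOID of a county is a 5-character string formed by the 2-character STATEFP followed by the 3-character COUNTYFP (for example, 21 and 007 give 21007), whereas the COVID-19 table stores the same kind of code as a number such as 45001.0. A small sketch of converting between the two forms follows; the helper names fips_to_geoid and state_of are hypothetical.

```python
def fips_to_geoid(fips):
    """Convert a numeric county FIPS code (e.g. 45001.0) to a 5-character string."""
    return f"{int(fips):05d}"

def state_of(geoid):
    """Return the 2-character state code that begins a 5-character county code."""
    return geoid[:2]

geoid = fips_to_geoid(45001.0)      # value taken from the first row of c
print(geoid, state_of(geoid))       # 45001 45

# zero padding matters for states whose code starts with 0
print(fips_to_geoid(1001.0))        # 01001
```

Keeping the code as a zero-padded string preserves leading zeros, which are lost as soon as the code is treated as a number.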

Now we have the Oregon data in two data frames, ore and co. We must combine the two
data frames. This is again a situation so often encountered when dealing with real data
that there is a facility for it in pandas called merge. Both data frames have FIPS codes: in ore you
find it under column GEOID, and in co you find it called FIPS. The merged data frame is
represented by the variable orco below:

[13]: ore = ore.astype({'GEOID': 'int64'}).rename(columns={'GEOID' : 'FIPS'})


co = co.astype({'FIPS': 'int64'})
orco = pd.merge(ore, co.iloc[:,:-1], on='FIPS')

The orco object now has both the geometry information as well as the COVID-19 informa-
tion, making it extremely easy to visualize.
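To see what the merge step does, here is a minimal sketch on two tiny made-up data frames standing in for ore and co. The county names, counts, and the variable names left, right, and merged are all invented for illustration, and storing FIPS as floats in the second frame is only an assumption meant to show why a type conversion can be needed before merging.

```python
import pandas as pd

# Made-up stand-in for `ore`: codes stored as strings under GEOID.
left = pd.DataFrame({'GEOID': ['41001', '41003', '41005'],
                     'NAME': ['County A', 'County B', 'County C']})

# Made-up stand-in for `co`: codes stored as floats under FIPS (an assumption).
right = pd.DataFrame({'FIPS': [41001.0, 41003.0],
                      'Confirmed': [1, 2]})

# Give the key column the same name and the same integer type in both frames.
left = left.astype({'GEOID': 'int64'}).rename(columns={'GEOID': 'FIPS'})
right = right.astype({'FIPS': 'int64'})

# By default, merge keeps only the keys present in both frames (an inner join).
merged = pd.merge(left, right, on='FIPS')
print(merged)
```

In this sketch, County C has no matching row in the second frame, so it does not appear in the merged result.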

[14]: # plot coloring counties by number of confirmed cases
fig, ax = plt.subplots(figsize=(12, 8))
orco.plot(ax=ax, column='Confirmed', legend=True,
          legend_kwds={'label': '# confirmed cases',
                       'orientation': 'horizontal'})

# label the counties (x is the longitude, y is the latitude)
for x, y, county in zip(orco['Long_'], orco['Lat'], orco['NAME']):
    ax.text(x, y, county, color='grey')

ax.set_title('Confirmed COVID-19 cases in Oregon as of March 27 2020')
ax.set_xlabel('Longitude'); ax.set_ylabel('Latitude');

This is an example of a choropleth map, a map where regions are colored or shaded in
proportion to some data variable. It is an often-used data visualization tool.

I.6 Ask the data


Different ways of displaying data often give different insights. There are many visualiza-
tion tools in the python ecosystem and you will become more acquainted with these as we
proceed.
Meanwhile, you might have many questions whose answers already lie in the data we
have downloaded. For example, you may wonder how Oregon is doing in terms of
COVID-19 outbreak compared to the other two west coast states. Here is the answer ex-
tracted from the same data:

[Figure: Confirmed COVID-19 cases until 2020-03-30 in Oregon, Washington, and California, plotted against dates.]

How does the progression of infections in New York compare with Hubei where the dis-
ease started? Again the answer based on the data we have up to today is easy to extract,
and is displayed next.

[Figure: Confirmed COVID-19 cases until 2020-03-30 in New York and Hubei, plotted against dates.]

Of course, the COVID-19 situation is evolving, so these figures are immediately outdated
after today’s class. This situation is evolving in as yet unknown ways. I am sure that you,
like me, want to know more about how these plots will change in the next few months.
You will be able to generate plots like this and learn many more data analysis skills from
these lectures. As you amass more technical skills, let me encourage you to answer your
own questions on COVID-19 by returning to this overview, pulling the most recent data,
and modifying the code here to your needs. In fact, some later assignments will require
you to work further with this Johns Hopkins COVID-19 worldwide dataset. Visualizing
the COVID-19 data for any other state, or indeed, any other region in the world, is easily
accomplished by some small modifications to the code of this lecture.

II
Interacting with Python

March 31, 2020

Python is a modern, general-purpose, object-oriented, high-level programming language
with a clean and expressive syntax. The following features make for easy code
development and debugging in python:
• Python code is interpreted: There is no need to compile the code. Your code is read by
a python interpreter and made into executable instructions for your computer in real
time.
• Python is dynamically typed: There is no need to declare the type of a variable or the
type of an input to a function.
• Python has automatic garbage collection or memory management: There is no need to
explicitly allocate memory for variables before you use them or deallocate them after
use.
However, keep in mind that these features also make pure python code slower (than, say,
C) in repetitious loops because of repeated checking for the type of objects. Therefore
many python modules (such as numpy, which we shall see in detail soon) have C or other
compiled code, which is then wrapped in python to take advantage of python’s usability
without losing speed.
There are at least four ways to interact with your Python 3 installation.
1. Use a python shell
2. Use an iPython shell
3. Put code in a python file ending in .py
4. Write code + text in Jupyter notebook

II.1 Python shell


Type the python command you use in your system (python or python3) to get this shell. I
will use python3 since that is what my system requires, but please do make sure to replace
it by python if that’s what is needed on your system. Here is an image of the interactive
python shell within a terminal.

Note the following from the interactive session displayed in the figure above:
• Computing the square root of a number using sqrt is successful only after import-
ing math. Most of the functionality in Python is provided by modules, like the math
module. Some modules, like math, come with python, while others must be installed
after python is installed.
• Text that begins with # (like “# works!” in the figure) is a comment, not code.
This is the case in a python shell and also in the other forms of interacting with
python discussed below.
• The dir command shows the facilities provided by a module. As you can see, the
math module contains many functions in addition to sqrt.
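The points above refer to an interactive session. Here is a minimal sketch of what such a session might contain, written as a short script; the exact commands typed in the original session may differ, and the variable name public_names is chosen here only for illustration.

```python
# Before anything is imported, the name sqrt is not defined.
try:
    sqrt(2)
except NameError as err:
    print('NameError:', err)

import math

print(math.sqrt(2))   # works!

# dir lists the names provided by a module; keep the public ones.
public_names = [name for name in dir(math) if not name.startswith('_')]
print(public_names[:5])
```

In an interactive shell you would type these lines one at a time at the prompt and see each result immediately.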

II.2 iPython shell


A more powerful interactive shell environment is provided by the iPython shell (type in
ipython or ipython3 into your command prompt, or launch it from Anaconda navigator).
The iPython shell has features like auto-completion, coloring, history of commands, au-
tomatic help by tacking on ?, ability to interact with your operating system’s commands,
etc.

II.3 Jupyter Notebook
The Jupyter notebook is a web-browser based graphical environment consisting of cells,
which can consist of code, or text. The text cells should contain text in markdown syntax,
which allows you to type not just words in bold and italic, but also tables, mathematical
formulas using LaTeX, etc. The code cells of Jupyter can contain code in various languages,
but here we will exclusively focus on code cells with Python 3.
For example, this block of text that begins with this sentence marks the beginning of a
jupyter notebook cell containing markdown content. If you are viewing this from jupyter,
click on jupyter’s top menu -> Cell -> Cell Type to see what is the type of the current cell,
or to change the cell type. Computations must be done in a code cell, not a markdown cell.
For example, to compute $\cos(\pi\sqrt{\pi})^7$,
we open a code cell next with the following two lines of python code:

[1]: from math import cos, sqrt, pi

cos(pi*sqrt(pi))**7

[1]: 0.14008146171564725

This seamless integration of text and code makes Jupyter attractive for developing a repro-
ducible environment for scientific computing.

II.4 Python file
Open your favorite text editor, type in some python code, and then save the file as
myfirstpy.py. Here is a simple example of such a file.
#------- myfirstpy.py ---------------------------------
from math import cos, sqrt, pi

print('Hello, I can compute! ')


x = 3
y = cos(pi*sqrt(pi)*x)**7
print('Starting from x =', x, 'we have computed y=', y)
#------------------------------------------------------
One executes such a python file by typing the following on the command line
python3 ../pyfiles/myfirstpy.py
Note that depending on your operating system, you may have to replace the above com-
mand by python ..\pyfiles\myfirstpy.py or similar variants.
You can also execute the python file in a platform-independent way from within this Jupyter
notebook by loading the contents of the file into a cell. This is done using the line magic
command %load ../pyfiles/myfirstpy.py. Once you type in this command into a code cell
and execute the cell, the contents of the file will be copied into the cell (and simultaneously,
the load command will be commented out). Then, returning to the cell and executing the
cell a second time runs the same code that was in the file.
[2]: # %load ../pyfiles/myfirstpy.py
from math import cos, sqrt, pi

print('Hello, I can compute! ')


x = 3
y = cos(pi*sqrt(pi)*x)**7
print('Starting from x =', x, 'we have computed y=', y)

Hello, I can compute!


Starting from x = 3 we have computed y= -0.013884089495354414

The above output cell should display the same output as what one would have obtained
if we executed the python file on the command line.
For larger projects (including take-home assignments), you will need to create such python
files with many lines of python code. Therefore it is essential that you know how to create
and execute python files in your system.

III
Working with git

April 2, 2020

Git is a distributed version control system (and is a program often used independently of
python). A version control system tracks the history of changes in projects with many files,
including data files, and codes, which many people access simultaneously. Git facilitates
identification of changes made, fetching revisions from a cloud repository in git format,
and pushing revisions to the cloud.
GitHub is a cloud server that specializes in serving data in the form of git repositories.
Many other such cloud services exist, such as Atlassian’s BitBucket.
The notebooks that form these lectures are in a git repository served from GitHub. In this
notebook, we describe how to access materials from this remote git repository. We will
also use this opportunity to introduce some object-oriented terminology like classes, objects,
constructor, data members, and methods, which are pervasive in python. Those already
familiar with this terminology and GitHub may skip to the next activity.

III.1 Our materials in GitHub


Lecture notes, exercises, codes, and all accompanying materials can be found in the GitHub
repository at https://github.com/jayggg/mth271content
One of the reasons we use git is that many continuously updated datasets, like the COVID-
19 dataset, are served in git format. Another reason is that we may want to use current
news and fresh data in our activities. Such activities may be prepared with very little lead
time, so cloud git repositories are ideal for pushing in new materials as they get devel-
oped: once they are in the cloud, you have immediate access to them. After a lecture, the
materials may be revised and updated per your feedback and these revisions will also be
available for you from GitHub. Therefore, it is useful to be conversant with GitHub.
Let us spend a few minutes today on how to fetch materials from the git repository. In
particular, executing this notebook will pull the updated data from GitHub and place it in
a location you specify (below).
If you want to know more about git, there are many resources online, such as the Git
Handbook. The most common way to fetch materials from a remote repository is using
git’s command line tools, but for our purposes, the python code in this notebook will
suffice.

III.2 Git Repo class in python
We shall use the python module gitpython to work with git. (We already used this
module in the first overview lecture.) The documentation of gitpython contains a lot of
information on how to use its facilities. The main facility is the class called Repo which it uses
to represent git repositories.

[1]: from git import Repo

Python is an object-oriented language. Everything in the workspace is an object. An
object is an instance of a class. The definition and features of the class Repo were imported
into this workspace by the above line of code. A class has members, which could be data
members or attributes (which themselves are objects residing in the class’ memory layout),
or function members, called methods, which provide functionalities of the class.
You can query the functionalities of Repo using help. Open a cell and type in
help(Repo)
You will see that the output contains the extensive documentation for objects of class Repo,
including all its available methods.
Below, we will use the method called clone_from. Here is the class documentation for that
method:
[2]: help(Repo.clone_from)

Help on method clone_from in module git.repo.base:

clone_from(url, to_path, progress=None, env=None, multi_options=None, **kwargs) method of builtins.type instance
Create a clone from the given URL

:param url: valid git url, see http://www.kernel.org/pub/software/scm/git/docs/git-clone.html#URLS
:param to_path: Path to which the repository should be cloned to
:param progress: See 'git.remote.Remote.push'.
:param env: Optional dictionary containing the desired environment variables.
Note: Provided variables will be used to update the execution
environment for `git`. If some variable is not specified in `env`
and is defined in `os.environ`, value from `os.environ` will be used.
If you want to unset some variable, consider providing empty string
as its value.
:param multi_options: See ``clone`` method
:param kwargs: see the ``clone`` method
:return: Repo instance pointing to the cloned directory

Classes have a special method called the constructor, which you would find listed among its
methods as __init__.
[3]: help(Repo.__init__)

Help on function __init__ in module git.repo.base:

__init__(self, path=None, odbt=<class 'git.db.GitCmdObjectDB'>, search_parent_directories=False, expand_vars=True)
Create a new Repo instance

:param path:
the path to either the root git directory or the bare git repo::

repo = Repo("/Users/mtrier/Development/git-python")
repo = Repo("/Users/mtrier/Development/git-python.git")
repo = Repo("~/Development/git-python.git")
repo = Repo("$REPOSITORIES/Development/git-python.git")
repo = Repo("C:\Users\mtrier\Development\git-python\.git")

- In *Cygwin*, path may be a `'cygdrive/...'` prefixed path.


- If it evaluates to false, :envvar:`GIT_DIR` is used, and if this also evals to false,
the current-directory is used.
:param odbt:
Object DataBase type - a type which is constructed by providing
the directory containing the database objects, i.e. .git/objects. It will
be used to access all object data
:param search_parent_directories:
if True, all parent directories will be searched for a valid repo as well.

Please note that this was the default behaviour in older versions of GitPython,
which is considered a bug though.
:raise InvalidGitRepositoryError:
:raise NoSuchPathError:
:return: git.Repo

The __init__ method is called when you type in Repo(...) with the arguments allowed
in __init__. Below, we will see how to initialize a Repo object using our github repository.
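To make the terminology of classes, objects, constructors, data members, and methods concrete, here is a minimal sketch using a toy class. The class Folder, its member path, and its method describe are hypothetical names invented for this illustration; they are not part of gitpython.

```python
class Folder:
    """A hypothetical class, used only to illustrate the terminology."""

    def __init__(self, path):      # the constructor
        self.path = path           # a data member (attribute)

    def describe(self):            # a method
        return 'Folder at ' + self.path


f = Folder('/tmp/example')   # typing Folder(...) calls Folder.__init__
print(f.path)                # access a data member using a dot
print(f.describe())          # call a method using a dot
```

Here f is an object (an instance) of the class Folder, in the same way that a repository object is an instance of the class Repo.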

III.3 Your local copy of the repository


Next, each of you needs to specify a location on your computer where you want the course
materials to reside. This location can be specified as a string, where subfolders are
separated by forward slashes. Please revise the string below to suit your needs.

[4]: coursefolder = '/Users/Jay/tmpdir/'

Python provides a module os to perform operating system dependent tasks in a portable
(platform-independent) way. If you did not give the full name of the folder, os can attempt
to produce it as follows:

[5]: import os
os.path.abspath(coursefolder)

[5]: '/Users/Jay/tmpdir'

Please double-check that the output is what you expected on your operating system: if not,
please go back and revise coursefolder before proceeding. (Windows users should see
forward slashes converted to double backslashes, while mac and linux users will usually
retain the forward slashes.)
We proceed to download the course materials from GitHub. These materials will be stored
in a subfolder of coursefolder called mth271content, which is the name of the git reposi-
tory.

[6]: repodir = os.path.join(os.path.abspath(coursefolder), 'mth271content')
repodir # full path name of the subfolder

[6]: '/Users/Jay/tmpdir/mth271content'

Again, the value of the string variable repodir output above describes the location on your
computer where your copy of the course materials from GitHub will reside.

III.4 Two cases


Now there are two cases to consider:
1. Are you downloading the remote git repository for the first time?
2. Or, are you returning to the remote repository to update the materials?
In Case 1, you want to clone the repository. This will create a local copy (on your computer)
of the remote cloud repository.
In Case 2, you want to pull updates (only) from the repository, i.e., only changes in the
remote cloud that you don’t have in your existing local copy.
To decide which case you are in, I will assume the following. If the folder whose name is
the value of the string repodir already exists, then I will assume you are in Case 2. Oth-
erwise, you are in Case 1. To find out if a folder exists, we can use another facility from
os:
[7]: os.path.isdir(repodir)

[7]: True

The output above should be False if you are running this notebook for the first time, per
my assumption above. When you run it after you have executed this notebook successfully
at least once, you would already have cloned the repository, so the folder will exist.
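The behavior of os.path.isdir before and after a folder is created can be tried out safely with a short sketch like the following. It works inside a throwaway temporary directory, so it does not touch your coursefolder; the names tmp, candidate, before, and after are chosen here only for illustration.

```python
import os
import tempfile

# Work inside a temporary directory that is deleted at the end of the block.
with tempfile.TemporaryDirectory() as tmp:
    candidate = os.path.join(tmp, 'mth271content')
    before = os.path.isdir(candidate)   # the folder has not been created yet
    os.mkdir(candidate)
    after = os.path.isdir(candidate)    # now the folder exists

print(before, after)
```

The first value corresponds to Case 1 (nothing has been cloned yet) and the second to Case 2 (the folder already exists).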

III.5 Clone or pull


The code below uses the conditionals if and else (included in the prerequisite reading for
this lecture) to check if the folder exists: If it does not exist, a new local copy of the GitHub
repository is cloned into your local hard drive. If it exists, then only the differences (or
updates) between your local copy and the remote repository are fetched, so that your local
copy is up to date with the remote.

[8]: if os.path.isdir(repodir):  # if repo exists, pull newest data
    repo = Repo(repodir)
    repo.remotes.origin.pull()
else:  # otherwise, clone from remote
    repo = Repo.clone_from('https://github.com/jayggg/mth271content',
                           repodir)

• Here repo is an object of class Repo.


• Repo(repodir) invokes the constructor, namely the __init__ method.

• Repo.clone_from(...) calls the clone_from(...) method.
Now you have the updated course materials in your computer in a local folder. The object
repo stores information about this folder, which you gave to the constructor in the string
variable repodir, in a data member called working_dir. You can access any data members
of an object in memory, and you do so just like you access a method, using a dot . followed
by the member name. Here is an example:

[9]: repo.working_dir

[9]: '/Users/Jay/tmpdir/mth271content'

Note how the Repo object was either initialized with repodir (if that folder exists) or set to
clone a remote repository at a URL.

III.6 Updated and future materials


The following instructions are for those of you who want to keep tracking the git repository
closely in the future. Suppose you want to update your local folder with new materials
from GitHub. But at the same time, you want to experiment and modify the notebooks as
you like. This can create conflicting versions, which we should know how to handle.
Consider the situation where I have pushed changes to a file into the remote git repository
that you want your local folder to reflect. But you have been working with the same file
locally and have made changes to it - perhaps you have put a note to yourself to look
something up, or perhaps you have found a better explanation, or better code, than what I
gave. You want to keep your changes.
You should know that once you modify a file that is tracked by git as a local copy of a
remote file, and you ask git to update, git will refuse to overwrite your changes. Because the
remote version of the file and the local version of the file are now in conflict, a simple git
pull command will fail. Git provides constructs to help resolve such conflicts, but let’s try
to keep things simple today. The following method is a solution that doubles the number
of files, but has the advantage of simplicity:
Go to the repodir location in your computer. Copy the jupyter subfolder as, say
jupyterCopy. Overwrite the copy of this notebook (called 03_Working_with_git.ipynb)
in the jupyterCopy folder with this file, which you saved after making your changes to
variables like coursefolder above. Note that jupyterCopy is untracked by git: there is
no remote folder in the cloud repository with that name. So any changes you make in
jupyterCopy will be left untouched by git. So you can freely change any jupyter notebooks
within this folder. The next time you run this file from jupyterCopy it will pull updates
from the remote repository into the original jupyter folder. This way you get your up-
dates from the cloud in jupyter and at the same time get to retain your modifications in
jupyterCopy.
Alternately, if you like working on the command line, instead of running this notebook,
you can run the python file update_course.py on the command line. You should move this
file outside of the repository and save it after changing the value of the string coursefolder
to your specific local folder name.

IV
Conversion table

April 2, 2020

This elementary activity is intended to check and consolidate your understanding of very
basic python language features. It is modeled after a similar activity in [HPL] and involves
a simple temperature conversion formula. You may have seen kitchen cheat sheets (or have
one yourself) like the following:

                        Fahrenheit    Celsius
cool oven               200 F         90 C
very slow oven          250 F         120 C
slow oven               300-325 F     150-160 C
moderately slow oven    325-350 F     160-180 C
moderate oven           350-375 F     180-190 C
moderately hot oven     375-400 F     190-200 C
hot oven                400-450 F     200-230 C
very hot oven           450-500 F     230-260 C

This is modeled after a conversion table at the website Cooking Conversions for Old Time
Recipes, which many found particularly useful for translating old recipes from Europe.
Of course, the “old continent” has already moved on to the newer, more rational, metric
system, so all European recipes, old and new, are bound to have temperatures in Celsius
(C). Even if recipes don’t pique your interest, do know that every scientist must learn to
work with the metric system.
Celsius values can be converted to the Fahrenheit system by the formula
$$F = \frac{9}{5} C + 32.$$
The task in this activity is to print a table of F and C values per this formula. While ac-
complishing this task, you will recall basic python language features, like while loop, for
loop, range, print, list and tuples, zip, and list comprehension.

IV.1 Using the while loop


We start by making a table of F and C values, starting from 0 C to 250 C, using the while
loop.

[1]: print('F C')
C = 0
while C <= 250:
    F = 9 * C / 5 + 32
    print(F, C)
    C += 10

F C
32.0 0
50.0 10
68.0 20
86.0 30
104.0 40
122.0 50
140.0 60
158.0 70
176.0 80
194.0 90
212.0 100
230.0 110
248.0 120
266.0 130
284.0 140
302.0 150
320.0 160
338.0 170
356.0 180
374.0 190
392.0 200
410.0 210
428.0 220
446.0 230
464.0 240
482.0 250

This cell shows how to add, multiply, assign, increment, print, and run a while loop. Such
basic language features are introduced very well in the prerequisite reading for this lecture,
the official python tutorial’s section titled “An informal introduction to Python.” (Note
that all pointers to prerequisite reading materials are listed together just after the table of
contents in the beginning.)

IV.2 Adjusting the printed output


Examining the output above, we note that it is not perfectly aligned like a printed table.
Here is how we can use print’s features to format or align them to our tastes.
Formatting options like %10.3f can be used for alignment. It’s easy to describe this by an
example:
%10.3f: print 3 decimals, field width 10
%9.2e: print 2 decimals, field width 9, scientific notation
Type help(print) to recall these and other options. Below, we use a fixed width of 4 to
format F and C values.
[2]: print(' F C')
C = 0
while C <= 250:
    F = 9 * C / 5 + 32
    print('%4.0f %4.0f' % (F, C))
    C += 10

F C
32 0
50 10
68 20
86 30
104 40
122 50
140 60
158 70
176 80
194 90
212 100
230 110
248 120
266 130
284 140
302 150
320 160
338 170
356 180
374 190
392 200
410 210
428 220
446 230
464 240
482 250
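The format specifiers used above can also be tried out in isolation. The following is a small sketch in which the sample numbers and the variable names a, b, and c are chosen only for illustration; the square brackets are added so that the padding becomes visible.

```python
x = 3.14159

a = '[%10.3f]' % x                # 3 decimals, field width 10
b = '[%9.2e]' % 12345.678         # 2 decimals, field width 9, scientific notation
c = '[%4.0f %4.0f]' % (32.0, 0)   # no decimals, field width 4 for each value

print(a)
print(b)
print(c)
```

Each value is right-aligned within its field, which is what lines up the columns of the table.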

IV.3 Do the same using for loop


In addition to the while loop construct, python also has a for loop, which is often safer
from an accidental bug sending the system into an infinite loop. Also recall the very useful
range construct. The loop statement
for i in range(4):
runs over i=0,1,2,3 implicitly using range’s default starting value 0 and the default step-
ping value 1. For our temperature conversion task, we step by 10 degrees instead of the
default value of 1:
[3]: print(' F C')
for C in range(0, 250, 10):
    F = 9 * C / 5 + 32
    print('%4.0f %4.0f' % (F, C))

F C
32 0
50 10
68 20
86 30
104 40
122 50
140 60
158 70
176 80
194 90
212 100
230 110
248 120
266 130
284 140
302 150
320 160
338 170
356 180
374 190
392 200
410 210
428 220
446 230
464 240
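Note that the table above stops at 240 C, while the while loop version went up to 250 C. This is because range excludes its stop value. A minimal sketch of this behavior follows; the variable names short and full are chosen here for illustration.

```python
print(list(range(4)))             # default start 0 and default step 1

short = list(range(0, 250, 10))   # the stop value 250 is excluded
full = list(range(0, 251, 10))    # a stop value just past 250 includes 250

print(short[-1], full[-1])
```

So replacing 250 by 251 in the for loop would reproduce the table made with the while loop.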

IV.4 Is there a temperature whose F and C values are equal?


As you can see from the above values, for a 10 degree increase in the C column, we see
a corresponding 18 degree increase in the F column. Due to these different rates of
increase, we should see the values coincide by going to lower C values. Focusing on lower
C values, let us run the for loop again:

[4]: print(' F C')
for C in range(-50, 50, 5):
    F = 9 * C / 5 + 32
    print('%4.0f %4.0f' % (F, C))

F C
-58 -50
-49 -45
-40 -40
-31 -35
-22 -30
-13 -25
-4 -20
5 -15
14 -10
23 -5
32 0
41 5
50 10
59 15
68 20
77 25
86 30
95 35
104 40
113 45

As you see from the output above, at −40 degrees, the Fahrenheit scale and the Celsius
scale coincide. If you have lived in Minnesota, you probably know what −40 feels like, and
you likely already know the fact we just discovered above (it’s common for Minnesotans
to throw around this tidbit while commiserating in the cold).
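The same conclusion follows from a little algebra: setting F = C in the conversion formula gives C = 9C/5 + 32, that is, −4C/5 = 32, so C = −40. As a hedged sketch, a loop can also search for the coincidence directly; the search range of −100 to 100 and the variable name found are choices made here for illustration.

```python
found = None
for C in range(-100, 101):
    F = 9 * C / 5 + 32
    if F == C:          # the two scales agree at this temperature
        found = C

print('F and C coincide at', found)
```

Comparing with == is safe here only because the values involved are computed exactly; for general floating point results one would compare up to a small tolerance.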

IV.5 Store in a list
If we want to use the above-printed tables later, we would have to run a loop again. Our
conversion problem is so small that there is no cost to run as many loops as we like, but
in many practical problems, loops contain expensive computations. So one often wants
to store the quantities computed in the loop in order to reuse them later. Lists are good
constructs for this.
First we should note that python has lists and also tuples. Only the former can be modified
after creation. Here is an example of a list:

[5]: Cs = [0, 10] # create list using []


Cs.append(20) # modify by appending an entry
Cs

[5]: [0, 10, 20]

And here is an example of a tuple:

[6]: Cs = (0, 10) # create a tuple using ()

You access a tuple element just like a list element, so Cs[0] will give the first element
whether Cs is a list or a tuple. But the statement Cs[0] = -10 that changes an
element of the container will work only if Cs is a list. We say that a list is mutable, while
a tuple is immutable. Tuples are generally faster than lists, but lists are more flexible than
tuples.
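The difference between mutable and immutable containers can be seen in a short sketch like the one below. The names Cs_list and Cs_tuple are chosen here for illustration, so that both containers can be kept side by side.

```python
Cs_list = [0, 10]
Cs_list[0] = -10            # fine: lists are mutable
print(Cs_list)

Cs_tuple = (0, 10)
try:
    Cs_tuple[0] = -10       # not allowed: tuples are immutable
except TypeError as err:
    print('TypeError:', err)

print(Cs_tuple)             # the tuple is unchanged
```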
Here is an example of how to store the computed C and F values within a loop into lists.

[7]: Cs = []  # empty list
Fs = []

for C in range(0, 250, 25):
    Cs.append(C)
    Fs.append(9 * C / 5 + 32)

The lists Cs and Fs can be accessed later:


[8]: print(Cs)

[0, 25, 50, 75, 100, 125, 150, 175, 200, 225]

[9]: print(Fs)

[32.0, 77.0, 122.0, 167.0, 212.0, 257.0, 302.0, 347.0, 392.0, 437.0]

This is not as pretty an output as before. But we can easily run a loop and print the stored
values in any format we like. This is a good opportunity to show off a pythonic feature zip
that allows you to traverse two lists simultaneously:

[10]: print(' F C')
for C, F in zip(Cs, Fs):
    print('%4.0f %4.0f' % (F, C))

F C
32 0
77 25
122 50
167 75
212 100
257 125
302 150
347 175
392 200
437 225

IV.6 List comprehension


An alternate and very interesting way to make lists in python is by the list comprehension
feature. Codes with list comprehension read almost like English. Let’s illustrate this by
creating the list of F values from the existing list Cs of C values. Instead of making Fs in a
loop as above, in a list comprehension, we just say that each value of the list Fs is obtained
by applying a formula for each C in a list Cs:

[11]: Fs = [9 * C / 5 + 32 for C in Cs]

Note how this makes for compact code without sacrificing readability: constructs like this
are why you hear so much praise for python’s expressiveness. For mathematicians, the
list comprehension syntax is also reminiscent of the set notation in mathematics: the set
(list) Fs is described in mathematical notation by
$$\mathrm{Fs} = \left\{ \frac{9}{5} C + 32 : C \in \mathrm{Cs} \right\}.$$

Note how similar it is to the list comprehension code. (Feel free to check that the Fs com-
puted by the above one-liner is the same as the Fs we computed previously within a loop.)
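One way to carry out the suggested check is sketched below. The names Fs_loop and Fs_comp are chosen here so that both versions can be kept and compared; the list Cs is rebuilt so that the sketch is self-contained.

```python
Cs = list(range(0, 250, 25))

# Version 1: build the list in a loop.
Fs_loop = []
for C in Cs:
    Fs_loop.append(9 * C / 5 + 32)

# Version 2: build the same list by list comprehension.
Fs_comp = [9 * C / 5 + 32 for C in Cs]

print(Fs_loop == Fs_comp)   # lists are equal if all their entries agree
```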

V
Approximating the derivative

April 7, 2020

In calculus, you learnt about the derivative and its central role in modeling processes
where a rate of change is important. How do we compute the derivative on a computer?
Recall what you did in your first calculus course to compute the derivative. You memorized
derivatives of simple functions like cos x, sin x, exp x, $x^n$, etc. Then you learnt rules
like the product rule, quotient rule, chain rule, etc. In the end you could systematically
compute derivatives of complicated functions by reducing them to simpler components and
applying the rules. We could teach the computer the same rules and come up with an
algorithm for computing derivatives. This is the idea behind automatic differentiation. Python
modules like sympy can compute derivatives symbolically in this fashion. However, this
approach has its limits.
In the real world, we often encounter complicated functions, such as functions that cannot
be represented in terms of simple component functions, or functions whose values you can
only query from some proprietary company code, or functions whose values are based off
a table, like for instance this function.

[Figure: Price of TSLA stock, showing the daily closing price and the weekly mean, plotted against date.]

This function represents Tesla’s stock prices this year until yesterday (which I got, in case
you are curious, using just a few lines of python code). The function is complicated (not
to mention depressing - it reflects the market downturn due to the pandemic). But its
rate of change drives some investment decisions. Instead of the oscillatory daily stock
values, analysts often look at the rate of change of trend lines (like the rolling weekly
means above), a function certainly not expressible in terms of a few simple functions like
sines or cosines.
In this activity, we look at computing a numerical approximation to the derivative using
something you learnt in calculus.

V.1 Numerical differentiation


Suppose f is a function of a single real variable x. Its derivative at any point x is the slope
of the tangent of its graph at x. This slope, as you no doubt recall from calculus, can be
numerically approximated by the slope of a secant line:

$$f'(x) \approx \frac{f(x + h/2) - f(x - h/2)}{h}$$
Below is a plot of the tangent line of some function f at x, whose slope is f ′ ( x ), together
with the secant line whose slope is the approximation on the right hand side above. Clearly
as the spacing h decreases, the secant line becomes a better and better approximation to
the tangent line.

[Figure: graph of f(x) together with its tangent line and a secant line near x.]

The right hand side formula
$$\frac{f(x + h/2) - f(x - h/2)}{h}$$
can be implemented in python as long as we can compute the values f(x + h/2) and
f(x − h/2). As h → 0, we should obtain a good approximation to f′(x).
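As a minimal sketch of such an implementation, the secant slope can be coded in a few lines. The function name D1, the default spacing h = 1E-6, and the test on the sine function are choices made here for illustration, not something fixed by the formula.

```python
from math import sin, cos


def D1(f, x, h=1E-6):
    """Approximate f'(x) by the slope of the secant line
    through the points x - h/2 and x + h/2."""
    return (f(x + h/2) - f(x - h/2)) / h


approx = D1(sin, 0.2)   # secant slope approximation
exact = cos(0.2)        # the derivative of sin is cos
print(approx, exact)
```

The two printed values should agree to many digits, since the sine function is smooth and h is small.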

V.2 Second derivative


We take one further step and approximate the second derivative by
$$
\begin{aligned}
f''(x) &\approx \frac{f'(x + h/2) - f'(x - h/2)}{h} \\
&\approx \frac{\dfrac{f(x + h/2 + h/2) - f(x + h/2 - h/2)}{h} - \dfrac{f(x - h/2 + h/2) - f(x - h/2 - h/2)}{h}}{h} \\
&= \frac{f(x + h) - 2 f(x) + f(x - h)}{h^2}.
\end{aligned}
$$
This is the Central Difference Formula for the second derivative.
The first task in this activity is to write a function to compute the above-stated second
derivative approximation,
$$\frac{f(x - h) - 2 f(x) + f(x + h)}{h^2},$$
given any function f of a single variable x. The parameter h should also be input, but can
take a default value of $10^{-6}$.
The prerequisite reading for this activity included python functions, keyword arguments,
positional arguments, and lambda functions. Let’s apply all of these concepts while
computing the derivative approximation. Note that python allows you to pass functions
themselves as arguments to other functions. Therefore, without knowing the specific function
f to which the central difference formula will be applied, we can write a generic function D2
that implements the formula for any f.

[1]: def D2(f, x, h=1E-6):
         return (f(x-h) - 2*f(x) + f(x+h)) / (h*h)

Let’s apply the formula to some nice function, say the sine function.

[2]: from math import sin

D2(sin, 0.2)

[2]: -0.19864665468105613

Of course, we know that the second derivative of sin( x ) is the negative of itself, so a quick test of correctness is to compare the above value to that of − sin(0.2).

[3]: -sin(0.2)

[3]: -0.19866933079506122

How do we apply D2 to, say, sin(2x )? One way is to define a function returning sin(2 ∗ x )
and then pass it to D2, as follows.

[4]: def g(x):
         return sin(2*x)

     D2(g, 0.2)

[4]: -1.5576429035490946

An alternate way is using a lambda function. This gives a one-liner without damaging code
readability.

[5]: D2(lambda x: sin(2*x), 0.2) # central diff approximation

[5]: -1.5576429035490946

Of course, in either case the computed value approximates the actual value of the second derivative of sin(2x ), namely −4 sin(2x ), thus verifying our code.
[6]: -4*sin(2* 0.2) # actual 2nd derivative value

[6]: -1.557673369234602

V.3 Error
The error in the approximation formula we just implemented is
$$ \varepsilon(x) = f''(x) - \frac{f(x - h) - 2 f(x) + f(x + h)}{h^2}. $$
Although we can’t know the error ε( x ) without knowing the true value f ′′ ( x ), calculus
gives you all the tools to bound this error.
Substituting the Taylor expansions
$$ f(x + h) = f(x) + h f'(x) + \frac{h^2}{2} f''(x) + \frac{h^3}{6} f'''(x) + \frac{h^4}{24} f''''(x) + \cdots $$

and

$$ f(x - h) = f(x) - h f'(x) + \frac{h^2}{2} f''(x) - \frac{h^3}{6} f'''(x) + \frac{h^4}{24} f''''(x) + \cdots $$

into the definition of ε( x ), we find that, after several cancellations, the dominant term is $O(h^2)$ as h → 0.
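For readers who want to see the cancellations spelled out, here is a sketch of the computation, assuming f has enough continuous derivatives for the two expansions to hold. Adding the expansions removes the odd powers of h:

```latex
\begin{aligned}
f(x+h) + f(x-h) &= 2 f(x) + h^2 f''(x) + \frac{h^4}{12} f''''(x) + \cdots, \\
\frac{f(x-h) - 2 f(x) + f(x+h)}{h^2} &= f''(x) + \frac{h^2}{12} f''''(x) + \cdots, \\
\varepsilon(x) &= -\frac{h^2}{12} f''''(x) + \cdots .
\end{aligned}
```

The leading term is proportional to $h^2$, which is the source of the $O(h^2)$ statement.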
This means that if h is halved, the error should decrease by a factor of 4. Let us take a look at the error in the derivative approximations applied to a simple function

$$ f(x) = x^{-6} $$

at, say, x = 1. I am sure you can compute the exact derivative using your calculus knowledge (it is $f''(x) = 42 x^{-8}$, which equals 42 at x = 1). In the code below, we subtract this exact derivative from the computed derivative approximation to obtain the error.

[7]: print(' h D2 Result Error')
     for k in range(4,8):
         h = 2**(-k)
         d2g = D2(lambda x: x**-6, 1, h=h)
         e = d2g - 42
         print('%.0e %.5f %7.6f' %(h, d2g, e))

h D2 Result Error
6e-02 42.99863 0.998629
3e-02 42.24698 0.246977
2e-02 42.06158 0.061579
8e-03 42.01538 0.015384

Clearly, we observe that the error decreases by a factor of 4 when h is halved. This is in
accordance with what we expected from the Taylor expansion analysis above.
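The factor of 4 can also be checked directly by dividing successive errors. The following is only a sketch: it repeats the definition of D2 so that it runs on its own, and the names errors and ratios are introduced here for illustration.

```python
def D2(f, x, h=1E-6):
    """Central difference approximation of f''(x), as defined earlier."""
    return (f(x-h) - 2*f(x) + f(x+h)) / (h*h)

# Errors for h = 2**-4, 2**-5, 2**-6, 2**-7 (exact second derivative is 42 at x = 1).
errors = []
for k in range(4, 8):
    h = 2**(-k)
    errors.append(D2(lambda x: x**-6, 1, h=h) - 42)

# Each ratio compares the error at spacing h with the error at spacing h/2.
ratios = [errors[j] / errors[j+1] for j in range(len(errors) - 1)]
print(ratios)
```

Ratios close to 4 are what an $O(h^2)$ error predicts.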

V.4 Limitations
A serious limitation of numerical differentiation formulas like this can be seen when we
take values of h really close to 0. Although the limiting process in calculus relies on h going
to 0, your computer is not equipped to deal with very small numbers. This creates issues.
Instead of halving h, let us aggressively reduce h by a factor of 10, going down to $10^{-13}$, and look at the results.
[8]: for k in range(1,14):
         h = 10**(-k)
         d2g = D2(lambda x: x**-6,1, h)
         print('%.0e %18.5f' %(h, d2g))

1e-01 44.61504
1e-02 42.02521
1e-03 42.00025
1e-04 42.00000
1e-05 41.99999
1e-06 42.00074
1e-07 41.94423
1e-08 47.73959
1e-09 -666.13381
1e-10 0.00000
1e-11 0.00000
1e-12 -666133814.77509
1e-13 66613381477.50939

Although a mathematical argument led us to expect better approximations as h → 0, we find that the results from our computer for $h < 10^{-8}$ are totally wrong! The problem is that computers cannot do exact arithmetic: the infinite real number system is replaced by a finite set of numbers allowed in the so-called IEEE floating-point standard. This causes errors, called round-off errors, that are different from the approximation error ε( x ) we discussed. Specifically, what happened was that for small h we subtracted nearly equal numbers, creating round-off errors; we then multiplied by a big number ($1/h^2$), amplifying these round-off errors. We shall not deal in depth with round-off errors in this course, but it pays to be wary of them.
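To get a rough feel for the size of this effect, the sketch below looks at the machine epsilon of python floats and at how it is magnified by the factor $1/h^2$. The helper name roundoff_scale is an assumption made for this illustration, and the estimate is only an order-of-magnitude heuristic, not a rigorous bound.

```python
import sys

# Spacing between 1.0 and the next larger representable float.
eps = sys.float_info.epsilon

# Adding something smaller than half of eps to 1.0 is lost to rounding.
lost = (1.0 + eps/2 == 1.0)
kept = (1.0 + eps > 1.0)

def roundoff_scale(h):
    """Rough size of round-off in values near 1, after division by h*h (heuristic)."""
    return eps / (h*h)

for h in [1e-4, 1e-6, 1e-8, 1e-9]:
    print('%.0e %12.3e' % (h, roundoff_scale(h)))
```

Once this quantity is comparable to the value being computed, the output of the difference formula can no longer be trusted.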

VI
Genome of SARS-CoV-2

April 7, 2020

Since most data come in files and streams, a data scientist must be able to effectively work
with them. Python provides many facilities to make this easy. In this class activity, we
will review some of python’s file, string, and dictionary facilities by examining a file con-
taining the genetic code of the virus that has been disrupting our lives this term. Here is
a transmission electron micrograph showing the virus (a public domain image from the
CDC, credited to H. A. Bullock and A. Tamin).

The genetic code of each living organism is a long sequence of simple molecules called nu-
cleotides or bases. Although many nucleotides exist in nature, only 4 nucleotides, labeled
A, C, G, and T, have been found in DNA. They are abbreviations of Adenine, Cytosine,
Guanine, and Thymine. Although it is difficult to put viruses in the category of living
organisms, they also have genetic codes made up of nucleotides.

VI.1 Get the genome


The NCBI (National Center for Biotechnology Information) has recently started maintain-
ing a data hub for genetic sequences related to the virus causing COVID-19. Recall that the
name of the virus is SARS-CoV-2 (which is different from the name of the disease, COVID-
19), or “Severe Acute Respiratory Syndrome Coronavirus 2” in full. Searching the NCBI
website with the proper virus name will help you locate many publicly available data sets.
Let’s download NCBI’s Reference Sequence NC_045512 giving the complete genome extracted from a sample of SARS-CoV-2 from the Wuhan seafood market, called the Wuhan-Hu-1 isolate. Here is code using urllib that will attempt to download directly from the url specified below. It is unclear if this url would serve as a stable permanent link. In the event you have problems executing the next cell, please just head over to the webpage for NC_045512, click on “FASTA” (a data format) and then click on “Send to” a file. Then save the file in the same relative location mentioned below in f within the folder where we have been putting all the data files in this course.

[1]: # NCBI url:

url = 'https://www.ncbi.nlm.nih.gov/sviewer/viewer.cgi?tool=portal&' + \
'save=file&log$=seqview&db=nuccore&report=fasta&id=1798174254&' + \
'extrafeat=null&conwithfeat=on&hide-cdd=on'

# your local downloaded file:

f = '../../data_external/SARS-CoV-2-Wuhan-NC_045512.2.fasta'

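[2]: import os
     import urllib.request   # plain "import urllib" need not load the request submodule
     import shutil

     # create the data folder if it does not exist yet
     if not os.path.isdir('../../data_external/'):
         os.mkdir('../../data_external/')

     # download from the url and copy the response into the local file f
     r = urllib.request.urlopen(url)
     fo = open(f, 'wb')
     shutil.copyfileobj(r, fo)
     fo.close()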

As mentioned in the page describing the data, this file gives the RNA of the virus.

[3]: lines = open(f, 'r').readlines()

The file has been opened in read-only mode. The variable lines contains a list of all the
lines of the file. Here are the first five lines:
[4]: lines[0:5]

[4]: ['>NC_045512.2 Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome\n',
      'ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCTGTTCTCTAAA\n',
      'CGAACTTTAAAATCTGTGTGGCTGTCACTCGGCTGCATGCTTAGTGCACTCACGCAGTATAATTAATAAC\n',
      'TAATTACTGTCGTTGACAGGACACGAGTAACTCGTCTATCTTCTGCAGGCTGCTTACGGTTTCGTCCGTG\n',
      'TTGCAGCCGATCATCAGCACATCTAGGTTTCGTCCGGGTGTGACCGAAAGGTAAGATGGAGAGCCTTGTC\n']

The first line is a description of the data. The long genetic code is broken up into the
following lines. We need to strip end-of-line characters from each such line to re-assemble
the RNA string. Here is a way to strip off the end-of-line character:

[5]: lines[1].strip()

[5]: 'ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCTGTTCTCTAAA'

Let’s do so for every line except the first. Since lines is a list object, ignoring the first element of the list is done by lines[1:]. (If you don’t know this already, you must review the list access constructs.) The following code uses the string operation join to put together the lines into one long string. This is the RNA of the virus.

[6]: rna = ''.join([line.strip() for line in lines[1:]])
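The read, strip, and join steps can be bundled into one small function. The sketch below is an illustration only: the name read_fasta_sequence is hypothetical, it assumes a FASTA file holding a single record, and it is exercised on a tiny made-up file rather than on the downloaded genome.

```python
import os
import tempfile

def read_fasta_sequence(path):
    """Return (header, sequence) from a single-record FASTA file (hypothetical helper)."""
    with open(path, 'r') as fh:        # the with block closes the file for us
        lines = fh.readlines()
    header = lines[0].strip()          # first line describes the data
    sequence = ''.join(line.strip() for line in lines[1:])
    return header, sequence

# Try it on a small made-up file written to a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, 'demo.fasta')
    with open(demo_path, 'w') as fh:
        fh.write('>demo record (made up)\nATTAAAGG\nTTTATACC\n')
    demo_header, demo_seq = read_fasta_sequence(demo_path)

print(demo_header)
print(demo_seq)
```

Opening the file inside a with block means it is closed even if an error occurs while reading.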

The first thousand characters and the last thousand characters of the RNA of the coron-
avirus are printed below:

[7]: rna[:1000]

[7]: 'ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCTGTTCTCTAAACGAACTTTA
AAATCTGTGTGGCTGTCACTCGGCTGCATGCTTAGTGCACTCACGCAGTATAATTAATAACTAATTACTGTCGTTGACAG
GACACGAGTAACTCGTCTATCTTCTGCAGGCTGCTTACGGTTTCGTCCGTGTTGCAGCCGATCATCAGCACATCTAGGTT
TCGTCCGGGTGTGACCGAAAGGTAAGATGGAGAGCCTTGTCCCTGGTTTCAACGAGAAAACACACGTCCAACTCAGTTTG
CCTGTTTTACAGGTTCGCGACGTGCTCGTACGTGGCTTTGGAGACTCCGTGGAGGAGGTCTTATCAGAGGCACGTCAACA
TCTTAAAGATGGCACTTGTGGCTTAGTAGAAGTTGAAAAAGGCGTTTTGCCTCAACTTGAACAGCCCTATGTGTTCATCA
AACGTTCGGATGCTCGAACTGCACCTCATGGTCATGTTATGGTTGAGCTGGTAGCAGAACTCGAAGGCATTCAGTACGGT
CGTAGTGGTGAGACACTTGGTGTCCTTGTCCCTCATGTGGGCGAAATACCAGTGGCTTACCGCAAGGTTCTTCTTCGTAA
GAACGGTAATAAAGGAGCTGGTGGCCATAGTTACGGCGCCGATCTAAAGTCATTTGACTTAGGCGACGAGCTTGGCACTG
ATCCTTATGAAGATTTTCAAGAAAACTGGAACACTAAACATAGCAGTGGTGTTACCCGTGAACTCATGCGTGAGCTTAAC
GGAGGGGCATACACTCGCTATGTCGATAACAACTTCTGTGGCCCTGATGGCTACCCTCTTGAGTGCATTAAAGACCTTCT
AGCACGTGCTGGTAAAGCTTCATGCACTTTGTCCGAACAACTGGACTTTATTGACACTAAGAGGGGTGTATACTGCTGCC
GTGAACATGAGCATGAAATTGCTTGGTACACGGAACGTTCT'

[8]: rna[-1000:]

[8]: 'GCTGGCAATGGCGGTGATGCTGCTCTTGCTTTGCTGCTGCTTGACAGATTGAACCAGCTTGAGAGCAAAATGTCTGGTA
AAGGCCAACAACAACAAGGCCAAACTGTCACTAAGAAATCTGCTGCTGAGGCTTCTAAGAAGCCTCGGCAAAAACGTACT
GCCACTAAAGCATACAATGTAACACAAGCTTTCGGCAGACGTGGTCCAGAACAAACCCAAGGAAATTTTGGGGACCAGGA
ACTAATCAGACAAGGAACTGATTACAAACATTGGCCGCAAATTGCACAATTTGCCCCCAGCGCTTCAGCGTTCTTCGGAA
TGTCGCGCATTGGCATGGAAGTCACACCTTCGGGAACGTGGTTGACCTACACAGGTGCCATCAAATTGGATGACAAAGAT
CCAAATTTCAAAGATCAAGTCATTTTGCTGAATAAGCATATTGACGCATACAAAACATTCCCACCAACAGAGCCTAAAAA
GGACAAAAAGAAGAAGGCTGATGAAACTCAAGCCTTACCGCAGAGACAGAAGAAACAGCAAACTGTGACTCTTCTTCCTG
CTGCAGATTTGGATGATTTCTCCAAACAATTGCAACAATCCATGAGCAGTGCTGACTCAACTCAGGCCTAAACTCATGCA
GACCACACAAGGCAGATGGGCTATATAAACGTTTTCGCTTTTCCGTTTACGATATATAGTCTACTCTTGTGCAGAATGAA
TTCTCGTAACTACATAGCACAAGTAGATGTAGTTAACTTTAATCTCACATAGCAATCTTTAATCAGTGTGTAACATTAGG
GAGGACTTGAAAGAGCCACCACATTTTCACCGAGGCCACGCGGAGTACGATCGAGTGTACAGTGAACAATGCTAGGGAGA
GCTGCCTATATGGAAGAGCCCTAATGTGTAAAATTAATTTTAGTAGTGCTATCCCCATGTGATTTTAATAGCTTCTTAGG
AGAATGACAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'

Here is the total length of the RNA:

[9]: len(rna)

[9]: 29903

While the human genome is over 3 billion in length, the genome of this virus does not even
reach the length of 30000.

VI.2 Finding a protein


When describing RNA, the T (Thymine) is often replaced by U (Uracil). This is done for
example in an interesting New York Times article that came out last Friday. The article
explains how this RNA code makes infected host cells produce a variety of proteins. Sci-
entists have a good understanding of what some of these proteins do, but not all.
Here is a quote from the article on ORF7a, a protein it nicknamed Virus Liberator:
When new viruses try to escape a cell, the cell can snare them with
proteins called tetherin. Some research suggests that ORF7a cuts
down an infected cell’s supply of tetherin, allowing more of the
viruses to escape. Researchers have also found that the protein can
trigger infected cells to commit suicide - which contributes to the
damage Covid-19 causes to the lungs.
The article then gives the ORF7a sequence, which I have copied and pasted into the next
cell, adding some string breaks. Note how the article has used lower case characters and
the character u instead of T.
[10]: orf7a = \
'augaaaauuauucuuuucuuggcacugauaacacucgcuacuugugagcuuuaucacuaccaag' + \
'aguguguuagagguacaacaguacuuuuaaaagaaccuugcucuucuggaacauacgagggcaa' + \
'uucaccauuucauccucuagcugauaacaaauuugcacugacuugcuuuagcacucaauuugcu' + \
'uuugcuuguccugacggcguaaaacacgucuaucaguuacgugccagaucaguuucaccuaaac' + \
'uguucaucagacaagaggaaguucaagaacuuuacucuccaauuuuucuuauuguugcggcaau' + \
'aguguuuauaacacuuugcuucacacucaaaagaaagacagaaugauugaacuuucauuaauug' + \
'acuucuauuugugcuuuuuagccuuucugcuauuccuuguuuuaauuaugcuuauuaucuuuug' + \
'guucucacuugaacugcaagaucauaaugaaacuugucacgccuaaacgaac'

The next task in this class activity is to find if this sequence occurs in the RNA we just
downloaded, and if it does, where it occurs. To this end, we first make the replacements
required to read the string in terms of A, T, G, and C.

[11]: s = orf7a.replace('u', 'T').replace('a', 'A').replace('g', 'G').replace('c', 'C')
      s

[11]: 'ATGAAAATTATTCTTTTCTTGGCACTGATAACACTCGCTACTTGTGAGCTTTATCACTACCAAGAGTGTGTTAGAGGTA
CAACAGTACTTTTAAAAGAACCTTGCTCTTCTGGAACATACGAGGGCAATTCACCATTTCATCCTCTAGCTGATAACAAA
TTTGCACTGACTTGCTTTAGCACTCAATTTGCTTTTGCTTGTCCTGACGGCGTAAAACACGTCTATCAGTTACGTGCCAG
ATCAGTTTCACCTAAACTGTTCATCAGACAAGAGGAAGTTCAAGAACTTTACTCTCCAATTTTTCTTATTGTTGCGGCAA
TAGTGTTTATAACACTTTGCTTCACACTCAAAAGAAAGACAGAATGATTGAACTTTCATTAATTGACTTCTATTTGTGCT
TTTTAGCCTTTCTGCTATTCCTTGTTTTAATTATGCTTATTATCTTTTGGTTCTCACTTGAACTGCAAGATCATAATGAA
ACTTGTCACGCCTAAACGAAC'
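The chain of four replacements is not the only way to do this conversion. As a sketch, here are two alternatives tried on a short fragment (the first 15 characters of the ORF7a string); the variable names are made up for this illustration.

```python
fragment = 'augaaaauuauucuu'   # first 15 characters of the ORF7a string

# Option 1: upper-case everything, then swap U for T.
converted = fragment.upper().replace('U', 'T')

# Option 2: a translation table mapping each character in one pass.
table = str.maketrans('uagc', 'TAGC')
converted2 = fragment.translate(table)

print(converted, converted2)
```

Both options give the same result as the chained replace calls; which one reads best is a matter of taste.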

The next step is now a triviality in view of python’s exceptional string handling mecha-
nisms:
[12]: s in rna

[12]: True

We may also easily find the location of the ORF7a sequence and read off the entire string
beginning with the sequence.

[13]: rna.find(s)

[13]: 27393

[14]: rna[27393:]

[14]: 'ATGAAAATTATTCTTTTCTTGGCACTGATAACACTCGCTACTTGTGAGCTTTATCACTACCAAGAGTGTGTTAGAGGTA
CAACAGTACTTTTAAAAGAACCTTGCTCTTCTGGAACATACGAGGGCAATTCACCATTTCATCCTCTAGCTGATAACAAA
TTTGCACTGACTTGCTTTAGCACTCAATTTGCTTTTGCTTGTCCTGACGGCGTAAAACACGTCTATCAGTTACGTGCCAG
ATCAGTTTCACCTAAACTGTTCATCAGACAAGAGGAAGTTCAAGAACTTTACTCTCCAATTTTTCTTATTGTTGCGGCAA
TAGTGTTTATAACACTTTGCTTCACACTCAAAAGAAAGACAGAATGATTGAACTTTCATTAATTGACTTCTATTTGTGCT
TTTTAGCCTTTCTGCTATTCCTTGTTTTAATTATGCTTATTATCTTTTGGTTCTCACTTGAACTGCAAGATCATAATGAA
ACTTGTCACGCCTAAACGAACATGAAATTTCTTGTTTTCTTAGGAATCATCACAACTGTAGCTGCATTTCACCAAGAATG
TAGTTTACAGTCATGTACTCAACATCAACCATATGTAGTTGATGACCCGTGTCCTATTCACTTCTATTCTAAATGGTATA
TTAGAGTAGGAGCTAGAAAATCAGCACCTTTAATTGAATTGTGCGTGGATGAGGCTGGTTCTAAATCACCCATTCAGTAC
ATCGATATCGGTAATTATACAGTTTCCTGTTTACCTTTTACAATTAATTGCCAGGAACCTAAATTGGGTAGTCTTGTAGT
GCGTTGTTCGTTCTATGAAGACTTTTTAGAGTATCATGACGTTCGTGTTGTTTTAGATTTCATCTAAACGAACAAACTAA
AATGTCTGATAATGGACCCCAAAATCAGCGAAATGCACCCCGCATTACGTTTGGTGGACCCTCAGATTCAACTGGCAGTA
ACCAGAATGGAGAACGCAGTGGGGCGCGATCAAAACAACGTCGGCCCCAAGGTTTACCCAATAATACTGCGTCTTGGTTC
ACCGCTCTCACTCAACATGGCAAGGAAGACCTTAAATTCCCTCGAGGACAAGGCGTTCCAATTAACACCAATAGCAGTCC
AGATGACCAAATTGGCTACTACCGAAGAGCTACCAGACGAATTCGTGGTGGTGACGGTAAAATGAAAGATCTCAGTCCAA
GATGGTATTTCTACTACCTAGGAACTGGGCCAGAAGCTGGACTTCCCTATGGTGCTAACAAAGACGGCATCATATGGGTT
GCAACTGAGGGAGCCTTGAATACACCAAAAGATCACATTGGCACCCGCAATCCTGCTAACAATGCTGCAATCGTGCTACA
ACTTCCTCAAGGAACAACATTGCCAAAAGGCTTCTACGCAGAAGGGAGCAGAGGCGGCAGTCAAGCCTCTTCTCGTTCCT
CATCACGTAGTCGCAACAGTTCAAGAAATTCAACTCCAGGCAGCAGTAGGGGAACTTCTCCTGCTAGAATGGCTGGCAAT
GGCGGTGATGCTGCTCTTGCTTTGCTGCTGCTTGACAGATTGAACCAGCTTGAGAGCAAAATGTCTGGTAAAGGCCAACA
ACAACAAGGCCAAACTGTCACTAAGAAATCTGCTGCTGAGGCTTCTAAGAAGCCTCGGCAAAAACGTACTGCCACTAAAG
CATACAATGTAACACAAGCTTTCGGCAGACGTGGTCCAGAACAAACCCAAGGAAATTTTGGGGACCAGGAACTAATCAGA
CAAGGAACTGATTACAAACATTGGCCGCAAATTGCACAATTTGCCCCCAGCGCTTCAGCGTTCTTCGGAATGTCGCGCAT
TGGCATGGAAGTCACACCTTCGGGAACGTGGTTGACCTACACAGGTGCCATCAAATTGGATGACAAAGATCCAAATTTCA
AAGATCAAGTCATTTTGCTGAATAAGCATATTGACGCATACAAAACATTCCCACCAACAGAGCCTAAAAAGGACAAAAAG
AAGAAGGCTGATGAAACTCAAGCCTTACCGCAGAGACAGAAGAAACAGCAAACTGTGACTCTTCTTCCTGCTGCAGATTT
GGATGATTTCTCCAAACAATTGCAACAATCCATGAGCAGTGCTGACTCAACTCAGGCCTAAACTCATGCAGACCACACAA
GGCAGATGGGCTATATAAACGTTTTCGCTTTTCCGTTTACGATATATAGTCTACTCTTGTGCAGAATGAATTCTCGTAAC
TACATAGCACAAGTAGATGTAGTTAACTTTAATCTCACATAGCAATCTTTAATCAGTGTGTAACATTAGGGAGGACTTGA
AAGAGCCACCACATTTTCACCGAGGCCACGCGGAGTACGATCGAGTGTACAGTGAACAATGCTAGGGAGAGCTGCCTATA
TGGAAGAGCCCTAATGTGTAAAATTAATTTTAGTAGTGCTATCCCCATGTGATTTTAATAGCTTCTTAGGAGAATGACAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'
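If only the matched sequence is wanted, rather than everything from the match to the end, the slice can be limited using the length of the pattern. The sketch below uses short made-up strings to show this, and also what find returns when there is no match.

```python
genome = 'CCGATGAAATTTGGG'    # made-up toy string, not real data
pattern = 'ATGAAA'

start = genome.find(pattern)                 # index of the first match
match = genome[start:start + len(pattern)]   # exactly the matched characters

missing = genome.find('TTTTTT')              # find returns -1 when nothing matches
print(start, match, missing)
```

Since -1 is also a valid index in python (the last character), it is worth checking for it before slicing.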

VI.3 Nucleotide frequencies


The frequency of a base or a nucleotide in a genetic code is the number of times it occurs divided by the length of the code. The uneven frequency of different nucleotides, called the nucleotide bias, differs between organisms and is known to have biological implications. Biologists also often talk of the GC content, the percentage of bases that are either G or C, in an RNA or DNA to get insights into its stability.
The next task in this activity is to make a python dictionary, called freq, whose keys are the nucleotide characters and whose values are their frequencies in the virus RNA. Once you have made it, freq['A'], for example, should output the frequency of nucleotide A.
[15]: freq = {b: rna.count(b)/len(rna) for b in 'ATGC'}

[16]: freq

[16]: {'A': 0.29943483931378123,
       'T': 0.32083737417650404,
       'G': 0.19606728421897468,
       'C': 0.18366050229074005}
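The GC content mentioned earlier follows from the same counts. Here is a minimal sketch on a made-up toy string; the function name gc_content is an assumption for this illustration.

```python
from collections import Counter

def gc_content(seq):
    """Fraction of characters in seq that are G or C (hypothetical helper)."""
    counts = Counter(seq)
    return (counts['G'] + counts['C']) / len(seq)

toy = 'ATGCGCAT'    # made-up string with 2 G's and 2 C's out of 8 characters
print(gc_content(toy))
```

For the virus RNA, adding the G and C entries of freq above gives a GC content of about 0.38.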

VI.4 A Washington sample


A more recent dataset at NCBI, apparently just submitted for peer-review on April 3, claims to contain the genome of a virus sample from our neighboring state of Washington. You can find it labeled there as the data set MT293201. Let us take a look. (Again, if the url below fails, please head over to the NCBI webpage, find and download the corresponding data file for this sample, again in FASTA format, and save it using the name f2 below.)

[17]: url2 = 'https://www.ncbi.nlm.nih.gov/sviewer/viewer.cgi?' + \
'tool=portal&save=file&log$=seqview&db=nuccore&report=fasta&' + \
'id=1828694245&extrafeat=null&conwithfeat=on&hide-cdd=on'
f2 = '../../data_external/SARS-CoV-2-Washington_MT293201.1.fasta'

[18]: r2 = urllib.request.urlopen(url2)
      fo2 = open(f2, 'wb')
      shutil.copyfileobj(r2, fo2)
      fo2.close()   # close the file so that its contents are fully written before reading

You might have already heard in the news that there are multiple strains of the virus
around the globe. Let’s investigate this genetic code a bit closer.

Is this the same genetic code as from the Wuhan sample? Repeating the previous pro-
cedure on this new file, we now make a string object that contains the RNA from the
Washington sample. We shall call it rna2 below.

[19]: lines = open(f2, 'r').readlines()
      rna2 = ''.join([line.strip() for line in lines[1:]])

We should note that not all data sets use just ATGC. There is a standard notation that extends the four letters, e.g., N is used to indicate any nucleotide. So, it might be a good idea to answer this question first: what are the distinct characters in the new rna2? This can be done very simply in python if you use the set data structure, which removes duplicates.

[20]: set(rna2)

[20]: {'A', 'C', 'G', 'T'}
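Set difference gives a quick way to flag characters outside the four basic letters. The sketch below uses a made-up string containing N (any nucleotide) and R (which, in the extended notation, stands for A or G).

```python
toy = 'ATGNNCATR'                       # made-up string with two extended codes
distinct = set(toy)                     # duplicates are removed
unexpected = distinct - set('ACGT')     # characters beyond the four basic letters
print(sorted(unexpected))
```

An empty result means the sequence uses only A, C, G, and T, as is the case for rna2 above.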

The next natural question might be this. Are the lengths of rna and rna2 the same?

[21]: len(rna2), len(rna)

[21]: (29846, 29903)

We could also look at the first and last 30 characters and check if they are the same, like so:

[22]: rna2[:30], rna2[-30:]

[22]: ('AACCTTTAAACTTTCGATCTCTTGTAGATC', 'TTTAATAGCTTCTTAGGAGAATGACAAAAA')

[23]: rna[:30], rna[-30:]

[23]: ('ATTAAAGGTTTATACCTTCCCAGGTAACAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA')

Clearly, rna and rna2 are different strings.

Compare their nucleotide frequencies


[24]: freq2 = {b: rna2.count(b)/len(rna2) for b in 'ATGC'}

[25]: freq2

[25]: {'A': 0.29866648797158746,
       'T': 0.3214166052402332,
       'G': 0.1963077129263553,
       'C': 0.18360919386182403}

Although the Washington genome is not identical to the Wuhan one, its nucleotide frequencies are very close to those of the Wuhan one, reproduced here:

[26]: freq

[26]: {'A': 0.29943483931378123,
       'T': 0.32083737417650404,
       'G': 0.19606728421897468,
       'C': 0.18366050229074005}
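To put a number on "very close", one can compare the two dictionaries entry by entry. The sketch below copies the values printed above into two dictionaries whose names are chosen just for this illustration.

```python
# Frequencies copied from the outputs above.
freq_wuhan = {'A': 0.29943483931378123, 'T': 0.32083737417650404,
              'G': 0.19606728421897468, 'C': 0.18366050229074005}
freq_washington = {'A': 0.29866648797158746, 'T': 0.3214166052402332,
                   'G': 0.1963077129263553, 'C': 0.18360919386182403}

# Absolute difference in frequency for each nucleotide.
diff = {b: abs(freq_wuhan[b] - freq_washington[b]) for b in 'ATGC'}
max_diff = max(diff.values())
print(diff)
print(max_diff)
```

The largest difference is below one tenth of a percentage point.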

Does it contain ORF7a?


[27]: s in rna2

[27]: True

[28]: rna2.find(s)

[28]: 27364

Thus, we located the same ORF7a instruction in this virus at a different location. Although
the genetic code from the Washington sample and the Wuhan sample are different, they
can make the same protein ORF7a and their nucleotide frequencies are very close.
This activity provided you with just a glimpse into the large field of bioinformatics, which
studies, among other things, patterns of nucleotide arrangements. If you are interested in
this field, you should take a look at Biopython, a bioinformatics python package.

VII
Fibonacci primes

April 9, 2020

Fibonacci numbers appear in so many unexpected places that I am sure you have already
seen them. They are elements of the Fibonacci sequence $F_n$ defined by

$$ F_0 = 0, \quad F_1 = 1, \qquad F_n = F_{n-1} + F_{n-2} \quad \text{for } n > 1. $$

Obviously, this recursive formula gives infinitely many Fibonacci numbers. We also know
that there are infinitely many prime numbers: the ancient Greeks knew it (actually proved
it) in 300 BC!
But, to this day, we still do not know if there are infinitely many prime numbers in the Fibonacci
sequence. These numbers form the set of Fibonacci primes. Its (in)finiteness is one of the
still unsolved problems in mathematics.
In this activity, you will compute a few initial Fibonacci primes, while reviewing some
python features along the way, such as generator expressions, yield, next, all, line mag-
ics, modules, and test functions. Packages we shall come across include memory_profiler,
primesieve, and pytest.
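As a quick illustration of the recurrence defined above, here is a minimal sketch that computes $F_n$ with a simple loop. The function name fib is an assumption made for this illustration and is not necessarily how the rest of this activity proceeds.

```python
def fib(n):
    """Return the n-th Fibonacci number, with F_0 = 0 and F_1 = 1 (hypothetical helper)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b    # slide the pair (F_k, F_{k+1}) one step forward
    return a

first_ten = [fib(n) for n in range(10)]
print(first_ten)
```

Python integers have arbitrary precision, so this loop keeps working even when the Fibonacci numbers become very large.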

VII.1 Generator expressions


Representing sequences is one of the elementary tasks any programming language should
be able to do well. Python lists can certainly be used for this. For example, the following
list comprehension gives elements of the sequence

$$ n^i, \qquad n = 1, 2, \ldots, N - 1 $$

succinctly:

[1]: i=2; N=10

L = [n**i for n in range(1, N)]

If you change the brackets to parentheses, then instead of a list comprehension, you get a different object called a generator expression.

[2]: G = (n**i for n in range(1, N))

Both L and G are examples of iterables, objects whose elements you can loop over one after another. (G is, more specifically, an iterator: an abstraction of a sequence of things with the ability to tell, given an element, what is the next element of the sequence.) Since both L and G can be iterated over, you will generally not see a difference in the results if you run a loop to print their values, or if you use them within a list comprehension.

[3]: [l for l in L]

[3]: [1, 4, 9, 16, 25, 36, 49, 64, 81]

[4]: [g for g in G]

[4]: [1, 4, 9, 16, 25, 36, 49, 64, 81]

However, if you run the last statement again, what happens?

[5]: [g for g in G]

[5]: []

The difference between the generator expression G and the list L is that a generator expres-
sion does not actually compute the values until they are needed. Once an element of the
sequence is computed, the next time, the generator can only compute the next element in
the sequence. If the end of a finite sequence was already reached in a previous use of the
generator, then there are no more elements of the sequence to compute. This is why we
got the empty output above.
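The contrast can be seen side by side in the following sketch, which uses new variable names so as not to disturb L and G above.

```python
squares_list = [n**2 for n in range(1, 5)]   # values computed and stored right away
squares_gen = (n**2 for n in range(1, 5))    # values computed only on demand

# A list can be traversed as many times as we like.
list_pass_1 = [s for s in squares_list]
list_pass_2 = [s for s in squares_list]

# A generator expression is exhausted after one full traversal.
gen_pass_1 = [s for s in squares_gen]
gen_pass_2 = [s for s in squares_gen]

print(list_pass_1, list_pass_2)
print(gen_pass_1, gen_pass_2)
```

To traverse the generated values a second time, the generator expression has to be created again.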

VII.2 Generator functions


Just as list comprehensions can be viewed as abbreviations of loops, generator expressions can be viewed as abbreviations of functions containing a loop and the yield statement. The statement
G = (n**i for n in range(1, N))
is an abbreviation of the following function with a loop, where you find yield in the location where you might have expected a return statement.

[6]: def GG():
         for n in range(1, N):
             yield n**i

[7]: G2 = GG()
print(*G2) # see that you get the same values as before

1 4 9 16 25 36 49 64 81

The yield statement tells python that calling this function does not return a single value, but rather an iterator that produces the elements of a sequence one at a time. Internally, in order for something to be an iterator in python, it must have a well-defined __next__() method: even though you did not explicitly define anything called __next__ when you defined GG, python seeing yield defines one for you behind the scenes.

Recall that you have seen another method whose name also began with two underscores, the special __init__ method, which allows you to construct an object using the name of the class followed by parentheses. The __next__ method is also a “special” method in that it allows you to call next on the iterator to get its next value, like so:

[8]: G2 = GG()

# get the first 3 values of the sequence using next:

next(G2), next(G2), next(G2)

[8]: (1, 4, 9)

[9]: print(*G2) # print the remaining values of the sequence

16 25 36 49 64 81

As you can see, a generator “remembers” where it left off in a prior iteration.
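To see what python sets up behind the scenes, here is a sketch of a hand-written class that behaves like the generator above. The class name Powers is hypothetical; the point is only that defining __iter__ and __next__ is enough to make an object work with next and with loops.

```python
class Powers:
    """Iterator over n**i for n = 1, ..., N-1, written without yield (illustration only)."""

    def __init__(self, i, N):
        self.i = i
        self.N = N
        self.n = 1            # remembers where the iteration left off

    def __iter__(self):
        return self           # an iterator returns itself here

    def __next__(self):
        if self.n >= self.N:
            raise StopIteration   # signals that the sequence has ended
        value = self.n ** self.i
        self.n += 1
        return value

P = Powers(2, 10)
first_three = (next(P), next(P), next(P))
rest = list(P)                # continues from where next stopped
print(first_three, rest)
```

The attribute n plays the role of the generator's memory of where it left off.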

VII.3 Disposable generators or reusable lists?


It might seem that generators are dangerous disposable objects that are somehow inferior to reusable lists, which have all the same properties. Here is an example that checks that thinking:

[10]: i = -20
N = 10**8

To compute the sum

$$ \sum_{n=1}^{10^8} \frac{1}{n^{20}}, $$

would you use the following list comprehension?

sum([n**i for n in range(1, N)])

If you do, you would need to store the created list in memory. If you install the memory_profiler and use it as described in the prerequisite reading material from [JV-H], then you can see memory usage easily. If you don’t have a few GB of RAM free, be warned that running this list comprehension (mentioned above, and in the cell after next) might crash your computer.

[11]: %load_ext memory_profiler

[12]: %memit sum([n**i for n in range(1, N)])

peak memory: 3884.82 MiB, increment: 3842.59 MiB
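For comparison, here is a sketch of the same kind of sum computed with a generator expression, using a much smaller N so that it is safe to run on any machine. It uses sys.getsizeof as a crude stand-in for a memory profiler; that choice, and the variable names, are assumptions made for this illustration.

```python
import sys

i = -20
N = 10**5     # deliberately much smaller than 10**8

as_list = [n**i for n in range(1, N)]   # all values stored at once
as_gen = (n**i for n in range(1, N))    # values produced one at a time

# getsizeof reports the size of the container itself (for the list, its
# array of references), not of the numbers it refers to.
list_bytes = sys.getsizeof(as_list)
gen_bytes = sys.getsizeof(as_gen)

total_from_list = sum(as_list)
total_from_gen = sum(as_gen)

print(list_bytes, gen_bytes)
print(total_from_list, total_from_gen)
```

The generator object stays small no matter how large N is, because it never holds more than one term of the sum at a time.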

50
Random documents with unrelated
content Scribd suggests to you:
d ates e, ea g day s ace, ca ed o eus
"This mortal steals upon my sovereignty,
Stands brazen champion for the world of flesh,
Determines souls that waver towards the Styx—
Worse! hales the souls back from beyond the Styx,
Bringing the dead to life. This is more craft,
Brother, than we may suffer in a man.
Shall he with careless finger sway at will
The Balance of Destiny? Avenge me, Zeus!"
A Cyclops forged a thunder-bolt for Zeus,
And, black-browed, Zeus did launch it ... Thus I lost
My son Asklepios, killed thro' too much knowledge.

Asklepios! my dead Asklepios!

Let the dark King of Stygia howl for aid


To Olympos! I am King of Heaven and ask
No aid! I wreak my vengeance for myself.
I rose up in the wrath of my bereavement
And set an arrow to the silver bow
That none save I can bend, and let it fly.
I might not slay the wielder of the bolt,
But I did slay the forger of the bolt.
And when I saw the Cyclops pierced and dead
I came to Zeus and told him of my deed:
"Father, 'gainst whom my bow was never turned,
Father, that hast destroyed thine own son's son,
I defy thy doing and have destroyed thy tool."

Then while the Gods stood all aghast, Zeus spake:


"Go from among this immortal company
Which thou hast sinned against in daring so
To sin against me that am the head of all,
And learn to quell thy too fierce spirit, learn
To teach thy riotous blood obedience,
Serving the sons of men one year of days.
Go hence! thou art not of us for twelve moons."
I thi id d t F h G d
I nothing said, and went. For when we Gods
Revolt among ourselves the end is near,
And Zeus must levy justice as he will.

Asklepios! my dead Asklepios!


Had an hundred bolts been forged instead of one
I had slain an hundred Cyclops for thy sake
And suffered an hundred years of degradation!

Earth that receivest my body for a space,


I first saw light upon thee. Comfort me,
And tame a little the untamed blood in me.
Better will I endure to learn of thee
Than of the envious Gods, whom this disgrace
Serves for a secret feast to glut their hearts on.
For we have loved each other, thou and I,
And I have belted thee with golden arms,
And I have claspt thee daily with hot kisses,
And felt thee leap and pulse and answer to me
Like a shy maid grown bold and glad with love.
There's that in the core of thee that is so kin
To the core of me, it holds us twain inseverable,
Tho' from a billion blue-gold caverns of air
Translucent waves of space roll up an ocean
'Twixt earth and sun: our hearts beat time together.
My sister of the spheres has no such power
To quicken thee, be lov'd of thee and love thee.
She rains down light like argent snows; and thou,
Part shadow'd, part-illumin'd, wholly chill'd,
Submitt'st thyself to call her queen, who asks
No ardent service of thee, earth, as I do.
Yet, chaste twin-sister, we were of one birth;
Thy veins run all the silver, mine the gold.
What marvel Leto had nine days labour of us,
Strenuously thus disparting snow from flame,
To give the Gods one daughter all pure ice,
One son all perfect fire?
One son all perfect fire?...
O Thunderer!
That spark of immortal fire which, pregnant in her,
Evolved into my Godhead, issuèd
Out of thy Godhead; my humiliation
Is thy humiliation, Zeus! I stand
Supremest in thy shining progeny:
I am thy glittering symbol fix'd in heaven
To draw the dazed, adoring eyes of men:
I am thy arm of vengeance, I the hand
Bestowing thy good gifts: I am thy Voice
Of mystic prophecy and divination
Thro' which thou keep'st thy fingers on men's souls.
Daughters and sons thou hast whose attributes,
This one by twisty cunning, this by love
Too often base, this by remorseless carnage
Not bearing the high name of vengeance, these
By the insidious lusts of gold and wine,
Serve to express thee to the bodies of men;
But I express thee to the ghost in them,
For there is none whose vesture is like mine
Weft only of the spirit's highest tissues,
So that the world beholding thee thro' me
Beholds thee at thy zenith, and exalted
Out of the flesh struggles to sense an instant
The music, fire and essence of Olympos.
This Thunderer, wilt thou smirch? More dim, more dim
Than the imperial spark thou quenchest in me
Thou mak'st thy imperial fires whence I did spring,
The fount of us so indissoluble
That what shames thee shames me.
Earth, is this vengeance?

Nay, I see clearer. Rest unstained of me,


Thou God that art the father of my being.
The spirit of me, which is Thou, makes cause with thee
Against me. We must be inviolable
ga st e e ust be o ab e
Or men will point their fingers—when We fall.

Asklepios! farewell, Asklepios!

Earth, I will serve on thee my year of days


Nor chafe beneath them like a petulant boy.
Ay, tho' Zeus force my Godhead into bonds
I will yet bear my bondage like a God.
Transcriber's Note
Obvious punctuation and spelling errors have been
repaired.
*** END OF THE PROJECT GUTENBERG EBOOK PAN-WORSHIP, AND
OTHER POEMS ***

Updated editions will replace the previous one—the old editions


will be renamed.

Creating the works from print editions not protected by U.S.


copyright law means that no one owns a United States
copyright in these works, so the Foundation (and you!) can copy
and distribute it in the United States without permission and
without paying copyright royalties. Special rules, set forth in the
General Terms of Use part of this license, apply to copying and
distributing Project Gutenberg™ electronic works to protect the
PROJECT GUTENBERG™ concept and trademark. Project
Gutenberg is a registered trademark, and may not be used if
you charge for an eBook, except by following the terms of the
trademark license, including paying royalties for use of the
Project Gutenberg trademark. If you do not charge anything for
copies of this eBook, complying with the trademark license is
very easy. You may use this eBook for nearly any purpose such
as creation of derivative works, reports, performances and
research. Project Gutenberg eBooks may be modified and
printed and given away—you may do practically ANYTHING in
the United States with eBooks not protected by U.S. copyright
law. Redistribution is subject to the trademark license, especially
commercial redistribution.

START: FULL LICENSE


THE FULL PROJECT GUTENBERG LICENSE
PLEASE READ THIS BEFORE YOU DISTRIBUTE OR USE THIS WORK

To protect the Project Gutenberg™ mission of promoting the


free distribution of electronic works, by using or distributing this
work (or any other work associated in any way with the phrase
“Project Gutenberg”), you agree to comply with all the terms of
the Full Project Gutenberg™ License available with this file or
online at www.gutenberg.org/license.

Section 1. General Terms of Use and


Redistributing Project Gutenberg™
electronic works
1.A. By reading or using any part of this Project Gutenberg™
electronic work, you indicate that you have read, understand,
agree to and accept all the terms of this license and intellectual
property (trademark/copyright) agreement. If you do not agree
to abide by all the terms of this agreement, you must cease
using and return or destroy all copies of Project Gutenberg™
electronic works in your possession. If you paid a fee for
obtaining a copy of or access to a Project Gutenberg™
electronic work and you do not agree to be bound by the terms
of this agreement, you may obtain a refund from the person or
entity to whom you paid the fee as set forth in paragraph 1.E.8.

1.B. “Project Gutenberg” is a registered trademark. It may only
be used on or associated in any way with an electronic work by
people who agree to be bound by the terms of this agreement.
There are a few things that you can do with most Project
Gutenberg™ electronic works even without complying with the
full terms of this agreement. See paragraph 1.C below. There
are a lot of things you can do with Project Gutenberg™
electronic works if you follow the terms of this agreement and
help preserve free future access to Project Gutenberg™
electronic works. See paragraph 1.E below.
1.C. The Project Gutenberg Literary Archive Foundation (“the
Foundation” or PGLAF), owns a compilation copyright in the
collection of Project Gutenberg™ electronic works. Nearly all the
individual works in the collection are in the public domain in the
United States. If an individual work is unprotected by copyright
law in the United States and you are located in the United
States, we do not claim a right to prevent you from copying,
distributing, performing, displaying or creating derivative works
based on the work as long as all references to Project
Gutenberg are removed. Of course, we hope that you will
support the Project Gutenberg™ mission of promoting free
access to electronic works by freely sharing Project Gutenberg™
works in compliance with the terms of this agreement for
keeping the Project Gutenberg™ name associated with the
work. You can easily comply with the terms of this agreement
by keeping this work in the same format with its attached full
Project Gutenberg™ License when you share it without charge
with others.

1.D. The copyright laws of the place where you are located also
govern what you can do with this work. Copyright laws in most
countries are in a constant state of change. If you are outside
the United States, check the laws of your country in addition to
the terms of this agreement before downloading, copying,
displaying, performing, distributing or creating derivative works
based on this work or any other Project Gutenberg™ work. The
Foundation makes no representations concerning the copyright
status of any work in any country other than the United States.

1.E. Unless you have removed all references to Project Gutenberg:

1.E.1. The following sentence, with active links to, or other
immediate access to, the full Project Gutenberg™ License must
appear prominently whenever any copy of a Project
Gutenberg™ work (any work on which the phrase “Project
Gutenberg” appears, or with which the phrase “Project
Gutenberg” is associated) is accessed, displayed, performed,
viewed, copied or distributed:

This eBook is for the use of anyone anywhere in the United
States and most other parts of the world at no cost and
with almost no restrictions whatsoever. You may copy it,
give it away or re-use it under the terms of the Project
Gutenberg License included with this eBook or online at
www.gutenberg.org. If you are not located in the United
States, you will have to check the laws of the country
where you are located before using this eBook.

1.E.2. If an individual Project Gutenberg™ electronic work is
derived from texts not protected by U.S. copyright law (does not
contain a notice indicating that it is posted with permission of
the copyright holder), the work can be copied and distributed to
anyone in the United States without paying any fees or charges.
If you are redistributing or providing access to a work with the
phrase “Project Gutenberg” associated with or appearing on the
work, you must comply either with the requirements of
paragraphs 1.E.1 through 1.E.7 or obtain permission for the use
of the work and the Project Gutenberg™ trademark as set forth
in paragraphs 1.E.8 or 1.E.9.

1.E.3. If an individual Project Gutenberg™ electronic work is
posted with the permission of the copyright holder, your use and
distribution must comply with both paragraphs 1.E.1 through
1.E.7 and any additional terms imposed by the copyright holder.
Additional terms will be linked to the Project Gutenberg™
License for all works posted with the permission of the copyright
holder found at the beginning of this work.

1.E.4. Do not unlink or detach or remove the full Project
Gutenberg™ License terms from this work, or any files
containing a part of this work or any other work associated with
Project Gutenberg™.

1.E.5. Do not copy, display, perform, distribute or redistribute
this electronic work, or any part of this electronic work, without
prominently displaying the sentence set forth in paragraph 1.E.1
with active links or immediate access to the full terms of the
Project Gutenberg™ License.

1.E.6. You may convert to and distribute this work in any binary,
compressed, marked up, nonproprietary or proprietary form,
including any word processing or hypertext form. However, if
you provide access to or distribute copies of a Project
Gutenberg™ work in a format other than “Plain Vanilla ASCII” or
other format used in the official version posted on the official
Project Gutenberg™ website (www.gutenberg.org), you must,
at no additional cost, fee or expense to the user, provide a copy,
a means of exporting a copy, or a means of obtaining a copy
upon request, of the work in its original “Plain Vanilla ASCII” or
other form. Any alternate format must include the full Project
Gutenberg™ License as specified in paragraph 1.E.1.

1.E.7. Do not charge a fee for access to, viewing, displaying,
performing, copying or distributing any Project Gutenberg™
works unless you comply with paragraph 1.E.8 or 1.E.9.

1.E.8. You may charge a reasonable fee for copies of or
providing access to or distributing Project Gutenberg™
electronic works provided that:

• You pay a royalty fee of 20% of the gross profits you derive
from the use of Project Gutenberg™ works calculated using the
method you already use to calculate your applicable taxes. The
fee is owed to the owner of the Project Gutenberg™ trademark,
but he has agreed to donate royalties under this paragraph to
the Project Gutenberg Literary Archive Foundation. Royalty
payments must be paid within 60 days following each date on
which you prepare (or are legally required to prepare) your
periodic tax returns. Royalty payments should be clearly marked
as such and sent to the Project Gutenberg Literary Archive
Foundation at the address specified in Section 4, “Information
about donations to the Project Gutenberg Literary Archive
Foundation.”

• You provide a full refund of any money paid by a user who
notifies you in writing (or by e-mail) within 30 days of receipt
that s/he does not agree to the terms of the full Project
Gutenberg™ License. You must require such a user to return or
destroy all copies of the works possessed in a physical medium
and discontinue all use of and all access to other copies of
Project Gutenberg™ works.

• You provide, in accordance with paragraph 1.F.3, a full refund of
any money paid for a work or a replacement copy, if a defect in
the electronic work is discovered and reported to you within 90
days of receipt of the work.

• You comply with all other terms of this agreement for free
distribution of Project Gutenberg™ works.

1.E.9. If you wish to charge a fee or distribute a Project
Gutenberg™ electronic work or group of works on different
terms than are set forth in this agreement, you must obtain
permission in writing from the Project Gutenberg Literary
Archive Foundation, the manager of the Project Gutenberg™
trademark. Contact the Foundation as set forth in Section 3
below.

1.F.

1.F.1. Project Gutenberg volunteers and employees expend
considerable effort to identify, do copyright research on,
transcribe and proofread works not protected by U.S. copyright
law in creating the Project Gutenberg™ collection. Despite these
efforts, Project Gutenberg™ electronic works, and the medium
on which they may be stored, may contain “Defects,” such as,
but not limited to, incomplete, inaccurate or corrupt data,
transcription errors, a copyright or other intellectual property
infringement, a defective or damaged disk or other medium, a
computer virus, or computer codes that damage or cannot be
read by your equipment.

1.F.2. LIMITED WARRANTY, DISCLAIMER OF DAMAGES - Except
for the “Right of Replacement or Refund” described in
paragraph 1.F.3, the Project Gutenberg Literary Archive
Foundation, the owner of the Project Gutenberg™ trademark,
and any other party distributing a Project Gutenberg™ electronic
work under this agreement, disclaim all liability to you for
damages, costs and expenses, including legal fees. YOU AGREE
THAT YOU HAVE NO REMEDIES FOR NEGLIGENCE, STRICT
LIABILITY, BREACH OF WARRANTY OR BREACH OF CONTRACT
EXCEPT THOSE PROVIDED IN PARAGRAPH 1.F.3. YOU AGREE
THAT THE FOUNDATION, THE TRADEMARK OWNER, AND ANY
DISTRIBUTOR UNDER THIS AGREEMENT WILL NOT BE LIABLE
TO YOU FOR ACTUAL, DIRECT, INDIRECT, CONSEQUENTIAL,
PUNITIVE OR INCIDENTAL DAMAGES EVEN IF YOU GIVE
NOTICE OF THE POSSIBILITY OF SUCH DAMAGE.

1.F.3. LIMITED RIGHT OF REPLACEMENT OR REFUND - If you
discover a defect in this electronic work within 90 days of
receiving it, you can receive a refund of the money (if any) you
paid for it by sending a written explanation to the person you
received the work from. If you received the work on a physical
medium, you must return the medium with your written
explanation. The person or entity that provided you with the
defective work may elect to provide a replacement copy in lieu
of a refund. If you received the work electronically, the person
or entity providing it to you may choose to give you a second
opportunity to receive the work electronically in lieu of a refund.
If the second copy is also defective, you may demand a refund
in writing without further opportunities to fix the problem.

1.F.4. Except for the limited right of replacement or refund set
forth in paragraph 1.F.3, this work is provided to you ‘AS-IS’,
WITH NO OTHER WARRANTIES OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR ANY PURPOSE.

1.F.5. Some states do not allow disclaimers of certain implied
warranties or the exclusion or limitation of certain types of
damages. If any disclaimer or limitation set forth in this
agreement violates the law of the state applicable to this
agreement, the agreement shall be interpreted to make the
maximum disclaimer or limitation permitted by the applicable
state law. The invalidity or unenforceability of any provision of
this agreement shall not void the remaining provisions.

1.F.6. INDEMNITY - You agree to indemnify and hold the
Foundation, the trademark owner, any agent or employee of the
Foundation, anyone providing copies of Project Gutenberg™
electronic works in accordance with this agreement, and any
volunteers associated with the production, promotion and
distribution of Project Gutenberg™ electronic works, harmless
from all liability, costs and expenses, including legal fees, that
arise directly or indirectly from any of the following which you
do or cause to occur: (a) distribution of this or any Project
Gutenberg™ work, (b) alteration, modification, or additions or
deletions to any Project Gutenberg™ work, and (c) any Defect
you cause.

Section 2. Information about the Mission of Project Gutenberg™
Project Gutenberg™ is synonymous with the free distribution of
electronic works in formats readable by the widest variety of
computers including obsolete, old, middle-aged and new
computers. It exists because of the efforts of hundreds of
volunteers and donations from people in all walks of life.

Volunteers and financial support to provide volunteers with the
assistance they need are critical to reaching Project
Gutenberg™’s goals and ensuring that the Project Gutenberg™
collection will remain freely available for generations to come. In
2001, the Project Gutenberg Literary Archive Foundation was
created to provide a secure and permanent future for Project
Gutenberg™ and future generations. To learn more about the
Project Gutenberg Literary Archive Foundation and how your
efforts and donations can help, see Sections 3 and 4 and the
Foundation information page at www.gutenberg.org.

Section 3. Information about the Project Gutenberg Literary Archive Foundation
The Project Gutenberg Literary Archive Foundation is a non-
profit 501(c)(3) educational corporation organized under the
laws of the state of Mississippi and granted tax exempt status
by the Internal Revenue Service. The Foundation’s EIN or
federal tax identification number is 64-6221541. Contributions
to the Project Gutenberg Literary Archive Foundation are tax
deductible to the full extent permitted by U.S. federal laws and
your state’s laws.

The Foundation’s business office is located at 809 North 1500
West, Salt Lake City, UT 84116, (801) 596-1887. Email contact
links and up to date contact information can be found at the
Foundation’s website and official page at
www.gutenberg.org/contact
Section 4. Information about Donations to the Project Gutenberg Literary Archive Foundation
Project Gutenberg™ depends upon and cannot survive without
widespread public support and donations to carry out its mission
of increasing the number of public domain and licensed works
that can be freely distributed in machine-readable form
accessible by the widest array of equipment including outdated
equipment. Many small donations ($1 to $5,000) are particularly
important to maintaining tax exempt status with the IRS.

The Foundation is committed to complying with the laws
regulating charities and charitable donations in all 50 states of
the United States. Compliance requirements are not uniform
and it takes a considerable effort, much paperwork and many
fees to meet and keep up with these requirements. We do not
solicit donations in locations where we have not received written
confirmation of compliance. To SEND DONATIONS or determine
the status of compliance for any particular state visit
www.gutenberg.org/donate.

While we cannot and do not solicit contributions from states
where we have not met the solicitation requirements, we know
of no prohibition against accepting unsolicited donations from
donors in such states who approach us with offers to donate.

International donations are gratefully accepted, but we cannot
make any statements concerning tax treatment of donations
received from outside the United States. U.S. laws alone swamp
our small staff.

Please check the Project Gutenberg web pages for current
donation methods and addresses. Donations are accepted in a
number of other ways including checks, online payments and
credit card donations. To donate, please visit:
www.gutenberg.org/donate.

Section 5. General Information About Project Gutenberg™ electronic works
Professor Michael S. Hart was the originator of the Project
Gutenberg™ concept of a library of electronic works that could
be freely shared with anyone. For forty years, he produced and
distributed Project Gutenberg™ eBooks with only a loose
network of volunteer support.

Project Gutenberg™ eBooks are often created from several
printed editions, all of which are confirmed as not protected by
copyright in the U.S. unless a copyright notice is included. Thus,
we do not necessarily keep eBooks in compliance with any
particular paper edition.

Most people start at our website which has the main PG search
facility: www.gutenberg.org.

This website includes information about Project Gutenberg™,
including how to make donations to the Project Gutenberg
Literary Archive Foundation, how to help produce our new
eBooks, and how to subscribe to our email newsletter to hear
about new eBooks.