Machine Learning on Geographical Data Using Python, 1st Edition, by Joos Korstanje
Joos Korstanje
Apress Standard
The publisher, the authors and the editors are safe to assume that the
advice and information in this book are believed to be true and accurate
at the date of publication. Neither the publisher nor the authors or the
editors give a warranty, expressed or implied, with respect to the
material contained herein or for any errors or omissions that may have
been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
Source Code
All source code used in the book can be downloaded from
github.com/apress/machine-learning-geographic-data-python.
Any source code or other supplementary material referenced by the
author in this book is available to readers on GitHub
(https://github.com/Apress). For more detailed information, please
visit http://www.apress.com/source-code.
Table of Contents
Part I: General Introduction
Chapter 1: Introduction to Geodata
Reading Guide for This Book
Geodata Definitions
Cartesian Coordinates
Polar Coordinates and Degrees
The Difference with Reality
Geographic Information Systems and Common Tools
What Are Geographic Information Systems
Standard Formats of Geodata
Shapefile
Google KML File
GeoJSON
TIFF/JPEG/PNG
CSV/TXT/Excel
Overview of Python Tools for Geodata
Key Takeaways
Chapter 2: Coordinate Systems and Projections
Coordinate Systems
Geographic Coordinate Systems
Projected Coordinate Systems
Local Coordinate Systems
Which Coordinate System to Choose
Playing Around with Some Maps
Example: Working with Own Data
Key Takeaways
Chapter 3: Geodata Data Types
Vector vs. Raster Data
Dealing with Attributes in Vector and Raster
Points
Definition of a Point
Importing an Example Point Dataset in Python
Some Basic Operations with Points
Lines
Definition of a Line
An Example Line Dataset in Python
Polygons
Definition of a Polygon
An Example Polygon Dataset in Python
Some Simple Operations with Polygons
Rasters/Grids
Definition of a Grid or Raster
Importing a Raster Dataset in Python
Key Takeaways
Chapter 4: Creating Maps
Mapping Using Geopandas and Matplotlib
Getting a Dataset into Python
Making a Basic Plot
Plot Title
Plot Legend
Mapping a Point Dataset with Geopandas and Matplotlib
Concluding on Mapping with Geopandas and Matplotlib
Making a Map with Cartopy
Concluding on Mapping with Cartopy
Making a Map with Plotly
Concluding on Mapping with Plotly
Making a Map with Folium
Concluding on Mapping with Folium
Key Takeaways
Part II: GIS Operations
Chapter 5: Clipping and Intersecting
What Is Clipping?
A Schematic Example of Clipping
What Happens in Practice When Clipping?
Clipping in Python
What Is Intersecting?
What Happens in Practice When Intersecting?
Conceptual Examples of Intersecting Geodata
Intersecting in Python
Difference Between Clipping and Intersecting
Key Takeaways
Chapter 6: Buffers
What Are Buffers?
A Schematic Example of Buffering
What Happens in Practice When Buffering?
Creating Buffers in Python
Creating Buffers Around Points in Python
Creating Buffers Around Lines in Python
Creating Buffers Around Polygons in Python
Combining Buffers and Set Operations
Key Takeaways
Chapter 7: Merge and Dissolve
The Merge Operation
What Is a Merge?
A Schematic Example of Merging
Merging in Python
Row-Wise Merging in Python
Attribute Join in Python
Spatial Join in Python
The Dissolve Operation
What Is the Dissolve Operation?
Schematic Overview of the Dissolve Operation
The Dissolve Operation in Python
Key Takeaways
Chapter 8: Erase
The Erase Operation
Schematic Overview of Spatially Erasing Points
Schematic Overview of Spatially Erasing Lines
Schematic Overview of Spatially Erasing Polygons
Erase vs. Other Operations
Erase vs. Deleting a Feature
Erase vs. Clip
Erase vs. Overlay
Erasing in Python
Erasing Portugal from Iberia to Obtain Spain
Erasing Points in Portugal from the Dataset
Cutting Lines to Be Only in Spain
Key Takeaways
Part III: Machine Learning and Mathematics
Chapter 9: Interpolation
What Is Interpolation?
Different Types of Interpolation
Linear Interpolation
Polynomial Interpolation
Nearest Neighbor Interpolation
From One-Dimensional to Spatial Interpolation
Spatial Interpolation in Python
Linear Interpolation Using Scipy Interp2d
Kriging
Linear Ordinary Kriging
Gaussian Ordinary Kriging
Exponential Ordinary Kriging
Conclusion on Interpolation Methods
Key Takeaways
Chapter 10: Classification
Quick Intro to Machine Learning
Quick Intro to Classification
Spatial Classification Use Case
Feature Engineering with Additional Data
Importing and Inspecting the Data
Spatial Operations for Feature Engineering
Reorganizing and Standardizing the Data
Modeling
Model Benchmarking
Key Takeaways
Chapter 11: Regression
Introduction to Regression
Spatial Regression Use Case
Importing and Preparing Data
Iteration 1 of Data Exploration
Iteration 1 of the Model
Iteration 2 of Data Exploration
Iteration 2 of the Model
Iteration 3 of the Model
Iteration 4 of the Model
Interpretation of Iteration 4 Model
Key Takeaways
Chapter 12: Clustering
Introduction to Unsupervised Modeling
Introduction to Clustering
Different Clustering Models
Spatial Clustering Use Case
Importing and Inspecting the Data
Cluster Model for One Person
Tuning the Clustering Model
Applying the Model to All Data
Key Takeaways
Chapter 13: Conclusion
What You Should Remember from This Book
Recap of Chapter 1 – Introduction to Geodata
Recap of Chapter 2 – Coordinate Systems and Projections
Recap of Chapter 3 – Geodata Data Types
Recap of Chapter 4 – Creating Maps
Recap of Chapter 5 – Clipping and Intersecting
Recap of Chapter 6 – Buffers
Recap of Chapter 7 – Merge and Dissolve
Recap of Chapter 8 – Erase
Recap of Chapter 9 – Interpolation
Recap of Chapter 10 – Classification
Recap of Chapter 11 – Regression
Recap of Chapter 12 – Clustering
Further Learning Path
Going into Specialized GIS
Specializing in Machine Learning
Remote Sensing and Image Treatment
Other Specialties
Key Takeaways
Index
About the Author
Joos Korstanje
is a data scientist, with over five years of
industry experience in developing
machine learning tools. He has a double
MSc in Applied Data Science and in
Environmental Science and has extensive
experience working with geodata use
cases. He has worked at a number of
large companies in the Netherlands and
France, developing machine learning for
a variety of tools. His experience in
writing and teaching has motivated him
to write this book on machine learning
for geodata with Python.
About the Technical Reviewer
Xiaochi Liu
is a PhD researcher and data scientist at
Macquarie University, specializing in
machine learning, explainable artificial
intelligence, spatial analysis, and their
novel application in environmental and
public health. He is a programming
enthusiast using Python and R to
conduct end-to-end data analysis. His
current research applies cutting-edge AI
technologies to untangle the causal
nexus between trace metal
contamination and human health to
develop evidence-based intervention
strategies for mitigating environmental exposure.
Part I
General Introduction
© The Author(s), under exclusive license to APress Media, LLC, part of Springer
Nature 2022
J. Korstanje, Machine Learning on Geographical Data Using Python
https://doi.org/10.1007/978-1-4842-8287-8_1
1. Introduction to Geodata
Joos Korstanje (1)
(1) VIELS MAISONS, France
Geodata Definitions
To get started, I want to cover the basics of coordinate systems in the
simplest mathematical setting: the Euclidean space. Although the real world
does not satisfy the assumptions of Euclidean geometry, it is a great
entry point for a deeper understanding of coordinate systems.
A two-dimensional Euclidean space is often depicted as shown in
Figure 1-1.
Cartesian Coordinates
To locate points in the Euclidean space, we can use the Cartesian
coordinate system. This coordinate system specifies each point uniquely
by a pair of numerical coordinates. For example, look at the coordinate
system in Figure 1-2, in which two points are located: a square and a
triangle.
The square is located at x = 2 (horizontal axis) and y = 1 (vertical axis).
The triangle is located at x = -2 and y = -1.
Figure 1-2 Two points in a coordinate system. Image by author
The point where the x and y axes meet is called the Origin, and
distances are measured from there. Cartesian coordinates are among the
best-known coordinate systems and work easily and intuitively in the
Euclidean space.
The letter r signifies the distance and the letter φ is the angle. You can go
the other way as well, using the following formulas:
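As a minimal sketch of both directions, assuming the standard relations
x = r cos(φ), y = r sin(φ) and r = √(x² + y²), φ = atan2(y, x), the
conversions can be written in Python as follows. The function names are
illustrative and are not taken from the book.

```python
import math

def polar_to_cartesian(r, phi):
    """Convert polar coordinates (r, phi in radians) to Cartesian (x, y)."""
    return r * math.cos(phi), r * math.sin(phi)

def cartesian_to_polar(x, y):
    """Convert Cartesian (x, y) to polar coordinates (r, phi in radians)."""
    r = math.hypot(x, y)    # distance from the origin
    phi = math.atan2(y, x)  # atan2 selects the correct quadrant
    return r, phi

# Round trip for the square from Figure 1-2, located at x = 2, y = 1
r, phi = cartesian_to_polar(2, 1)
x, y = polar_to_cartesian(r, phi)
print(r, math.degrees(phi))
print(x, y)
```

Using atan2 rather than a plain arctangent of y/x keeps the angle correct
for points such as the triangle at x = -2, y = -1, which lies in the
opposite quadrant from the square.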
ArcGIS
ArcGIS, made by ESRI, is arguably the most famous software package for
working with Geographic Information Systems. It offers a very large number
of functions that can be accessed through a user-friendly point-and-click
interface, and it also supports visual programming of geodata processing
pipelines. Python integration is available for specific tasks for which
ArcGIS has no preexisting tools. Its tools also include AI and data
science options.
ArcGIS is a great piece of software for working with geodata. Yet it has
one big disadvantage: it is paid, proprietary software. It is therefore
accessible only to companies or individuals that can afford its
considerable price. Even though it may be worth its price, you’ll need to
be able to pay for it or convince your company to do so. Unfortunately,
this is often not possible.
Python/R Programming
Finally, you can use Python or R programming for working with geodata
as well. Programming, especially in Python or R, is a very common skill
among data professionals nowadays.
Although programming skills were less widespread a few years ago, the
boom in data science, machine learning, and artificial intelligence has
made languages like Python common throughout the workforce.
Now that many people are able to code or have access to courses to learn
how to code, the need for full-featured software packages decreases. The
availability of a number of well-functioning geodata packages is enough
for many to get started.
Python or R programming is a great tool for treating geodata with
common or more modern methods. By using these programming
languages, you can easily apply tools from other libraries to your geodata,
without having to convert them into QGIS modules, for example.
The only problem that is not very well solved by programming
languages is long-term geodata storage. For this, you will need a database.
Cloud-based databases are nowadays relatively easy to arrange and
manage, and this problem is therefore relatively easily solved.
Shapefile
The shapefile is a very commonly used file format for geodata because it
is the standard format for ArcGIS. The shapefile is not very convenient to
use outside of ArcGIS, but due to the popularity of ArcGIS, you will
likely encounter shapefiles at some point.
The shapefile is not really a single file. It is actually a collection of files
that are stored together in one and the same directory, all having the
same name. You have the following files that make up a shapefile:
– myfile.shp: The main file, also called the shapefile (confusing but true)
– myfile.shx: The shapefile index file
– myfile.dbf: The shapefile data file that stores attribute data
– myfile.prj: Optional file that stores spatial reference and projection
metadata
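Because a shapefile only works when its companion files sit next to the
.shp file, it can help to check for them before reading. The following is
a minimal sketch using only the Python standard library; the helper names
shapefile_components and is_complete are illustrative, and the check
assumes lowercase file extensions.

```python
from pathlib import Path

# Component extensions as listed above; the .prj file is optional
REQUIRED = (".shp", ".shx", ".dbf")
OPTIONAL = (".prj",)

def shapefile_components(shp_path):
    """Map each component extension to whether that file exists on disk."""
    shp_path = Path(shp_path)
    return {ext: shp_path.with_suffix(ext).exists() for ext in REQUIRED + OPTIONAL}

def is_complete(shp_path):
    """Return True when all mandatory components sit next to the .shp file."""
    found = shapefile_components(shp_path)
    return all(found[ext] for ext in REQUIRED)

print(shapefile_components("myfile.shp"))
```

A missing .prj file does not make the shapefile unreadable, but without it
the coordinate reference system of the data is not recorded.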
As an example, let’s look at an open dataset containing the
municipalities of the Paris region that is provided by the French
government. This dataset is freely available at
https://geo.data.gouv.fr/en/datasets/8fadd7040c4b94f2c318a0971e8faedb7b5675d6
On this website, you can download the data in SHP/L93 format, which
gives you a zip file containing a directory. Figure 1-6
shows what this contains.
Figure 1-6 The inside of the shapefile. Image by author. Data source: Ministry of
DINSIC. Original data downloaded from
https://geo.data.gouv.fr/en/datasets/8fadd7040c4b94f2c318a0971e8faedb7b5675d6,
updated on 1 July 2016. Open Licence 2.0
(www.etalab.gouv.fr/wp-content/uploads/2018/11/open-licence.pdf)
As you can see, there are the .shp file (the main file), the .shx file (the
index file), the .dbf file containing the attributes, and finally the optional
.prj file.
For this exercise, if you want to follow along, you can use your local
environment or a Google Colab Notebook at
https://colab.research.google.com/.
Make sure that geopandas is installed in your environment, for example
with pip install geopandas.
Then, make sure that in your environment you have a directory called
Communes_MGP.shp in which you have the four files:
– Communes_MGP.shp
– Communes_MGP.dbf
– Communes_MGP.prj
– Communes_MGP.shx
In a local environment, you need to put the “sample_data” file in the
same directory as the notebook, but when you are working on Colab, you
will need to upload the whole folder to your working environment, by
clicking the folder icon and then dragging and dropping the whole folder
onto there. You can then execute the Python code in Code Block 1-1 to
have a peek inside the data.
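A minimal sketch of such a peek, assuming the directory layout described
above, could look as follows. It relies on the geopandas read_file
function; the helper name load_shapefile and the exact path are
assumptions for illustration, while the variable name shapefile matches
the one used for plotting in Code Block 1-2.

```python
from pathlib import Path

def load_shapefile(path):
    """Read a shapefile into a GeoDataFrame, failing early if it is missing."""
    path = Path(path)
    if not path.exists():
        raise FileNotFoundError(f"No shapefile found at {path}")
    # Imported lazily so the path check does not require geopandas
    import geopandas as gpd
    return gpd.read_file(str(path))

# Assumed path: the .shp file inside the Communes_MGP.shp directory
shp_path = Path("Communes_MGP.shp") / "Communes_MGP.shp"
if shp_path.exists():
    shapefile = load_shapefile(shp_path)
    print(shapefile.head())  # first rows: attribute columns plus a geometry column
```

The resulting geodataframe behaves like a pandas dataframe with one extra
column that holds the geometry of each municipality.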
shapefile.plot()
Code Block 1-2 Plotting the shapefile
You will obtain the map corresponding to this dataset as in Figure 1-8.
Figure 1-8 The map resulting from Code Block 1-2. Image by author. Data source:
Ministry of DINSIC. Original data downloaded from
https://geo.data.gouv.fr/en/datasets/8fadd7040c4b94f2c318a0971e8faedb7b5675d6,
updated on 1 July 2016. Open Licence 2.0
(www.etalab.gouv.fr/wp-content/uploads/2018/11/open-licence.pdf)
import fiona
# Enable KML support, which is not active by default in fiona
fiona.drvsupport.supported_drivers['KML'] = 'rw'
You’ll then see the exact same geodataframe as before, which is shown
in Figure 1-10.
Figure 1-10 The KML data shown in Python. Image by author. Data source: Ministry of
DINSIC. Original data downloaded from
https://geo.data.gouv.fr/en/datasets/8fadd7040c4b94f2c318a0971e8faedb7b5675d6,
updated on 1 July 2016. Open Licence 2.0
(www.etalab.gouv.fr/wp-content/uploads/2018/11/open-licence.pdf)
As before, you can plot this geodataframe to obtain a basic map
containing the municipalities of Paris and its surroundings. This is done
in Code Block 1-4.
kmlfile.plot()
Code Block 1-4 Plotting the KML file data
Figure 1-11 The plot resulting from Code Block 1-4. Screenshot by author. Data source:
Ministry of DINSIC. Original data downloaded from
https://geo.data.gouv.fr/en/datasets/8fadd7040c4b94f2c318a0971e8faedb7b5675d6,
updated on 1 July 2016. Open Licence 2.0
(www.etalab.gouv.fr/wp-content/uploads/2018/11/open-licence.pdf)
GeoJSON
The JSON format is a data format that is well known and loved by
developers. JSON is widely used for communication between different
information systems, for example, on websites and over the Internet.
The JSON format is loved because it is very easy to parse, and this
makes it a perfect storage format for open source and other
developer-oriented tools.
A JSON object is a collection of key-value pairs, much like the
dictionary in Python. The whole is surrounded by curly braces. For
example, I could describe myself as a JSON object like this:
{ "first_name": "joos",
"last_name": "korstanje",
"job": "data scientist" }
As you can see, this is a very flexible format, and it is very easy to
adapt to all kinds of circumstances. You might easily add GPS coordinates
like this:
{ "first_name": "joos",
"last_name": "korstanje",
"job": "data scientist",
"latitude": "48.8566° N",
"longitude": "2.3522° E" }
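To make the link with Python dictionaries concrete, the sketch below
serializes such a record with Python's built-in json module and then wraps
the same coordinates in a GeoJSON Point feature. The feature layout
follows the GeoJSON convention of listing longitude before latitude; the
variable names and the choice of numeric coordinates are illustrative
assumptions.

```python
import json

# The record above as a Python dictionary, here with numeric coordinates
person = {
    "first_name": "joos",
    "last_name": "korstanje",
    "job": "data scientist",
    "latitude": 48.8566,
    "longitude": 2.3522,
}

# json.dumps produces JSON text, in which strings use double quotes
as_json = json.dumps(person)
# json.loads parses the text back into a dictionary
restored = json.loads(as_json)

# The same information as a GeoJSON Feature: a geometry plus free-form properties
feature = {
    "type": "Feature",
    "geometry": {
        "type": "Point",
        # GeoJSON positions are ordered [longitude, latitude]
        "coordinates": [person["longitude"], person["latitude"]],
    },
    "properties": {key: person[key] for key in ("first_name", "last_name", "job")},
}
print(json.dumps(feature, indent=2))
```

Storing the coordinates as numbers rather than as text such as
"48.8566° N" lets GIS tools use them directly, without parsing the degree
sign and the hemisphere letter first.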
As expected, the data looks exactly like before (Figure 1-13). This is
because it is transformed into a geodataframe, and therefore the original
representation as json is not maintained anymore.
Figure 1-13 The geojson content in Python. Image by author. Data source: Ministry of
DINSIC. Original data downloaded from
https://geo.data.gouv.fr/en/datasets/8fadd7040c4b94f2c318a0971e8faedb7b5675d6,
updated on 1 July 2016. Open Licence 2.0
(www.etalab.gouv.fr/wp-content/uploads/2018/11/open-licence.pdf)
You can plot this geodataframe to obtain a map, using the
code in Code Block 1-6.
geojsonfile.plot()
Code Block 1-6 Plotting the geojson data
TIFF/JPEG/PNG
Image file types can also be used to store geodata. After all, many maps
are 2D images that lend themselves well to being stored as images. Some of
the standard formats to store images are TIFF, JPEG, and PNG.
– The TIFF format is an uncompressed image. A georeferenced TIFF
image is called a GeoTIFF, and it consists of a directory with a TIFF file
and a TFW (world file).
– The better-known JPEG file type stores compressed image data. When
storing a JPEG in the same folder as a JPW (world file), it becomes a
GeoJPEG.
– The PNG format is another well-known image file format. You can make
this file into a GeoPNG as well when using it together with a PGW
(world file).
Image file types are generally used to store raster data. For now,
consider that raster data is image-like (one value per pixel), whereas
vector data contains objects like lines, points, and polygons. We’ll get to
the differences between raster and vector data in a later chapter.
On the following website, you can download a GeoTIFF file that
contains an interpolated terrain model of Kerbernez in France:
https://geo.data.gouv.fr/en/datasets/b0a420b9e003d45aaf0670446f0d600df14430cb
You can use the code in Code Block 1-7 to read and show the raster file
in Python.
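A minimal sketch of reading and showing a raster file, assuming the
rasterio package (listed in the overview of Python tools later in this
chapter) together with matplotlib, could look as follows. The helper name
show_raster and the file name are assumptions for illustration; replace
the file name with that of the file you downloaded.

```python
from pathlib import Path

def show_raster(path):
    """Open a raster file with rasterio and display it with matplotlib."""
    path = Path(path)
    if not path.exists():
        raise FileNotFoundError(f"No raster file found at {path}")
    # Imported lazily so the path check does not require rasterio
    import rasterio
    from rasterio.plot import show
    with rasterio.open(str(path)) as src:
        # Basic georeferencing metadata stored with the image
        print(src.crs, src.width, src.height)
        show(src)

# Hypothetical file name for the downloaded terrain model
tif_path = Path("terrain_model.tif")
if tif_path.exists():
    show_raster(tif_path)
```

Opening the file inside a with block closes the dataset again once the
plot has been drawn.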
Note Depending on your OS, you may obtain a .tiff file format rather
than a .tif when downloading the data. In this case, you can simply
change the path to become .tiff, and the result should be the same. In
both cases, you will obtain the image shown in Figure 1-15.
Figure 1-15 The plot resulting from Code Block 1-7. Image by author. Data source:
Ministry of DINSIC. Original data downloaded from
https://geo.data.gouv.fr/en/datasets/b0a420b9e003d45aaf0670446f0d600df14430cb,
updated on “unknown.” Open Licence 2.0
(www.etalab.gouv.fr/wp-content/uploads/2018/11/open-licence.pdf)
CSV/TXT/Excel
The same file as used in the first three examples is also available in CSV.
When downloading it and opening it with a text viewer, you will observe
something like Figure 1-16.
Figure 1-16 The contents of the CSV file. Image by author. Data source: Ministry of
DINSIC. Original data downloaded from
https://geo.data.gouv.fr/en/datasets/b0a420b9e003d45aaf0670446f0d600df14430cb,
updated on “unknown.” Open Licence 2.0
(www.etalab.gouv.fr/wp-content/uploads/2018/11/open-licence.pdf)
The important thing to take away from this part of the chapter is that
geodata is “just data,” but with geographic references. These can be stored
in different formats or in different coordinate systems, which can
complicate things, but in the end you simply need to make sure that you
understand what you have in your data.
You can use many different tools for working with geodata. The goal of
those tools is generally to make your life easier. As a last step in this
chapter, let’s take a short look at the different Python tools
that you may encounter on your geodata journey.
Overview of Python Tools for Geodata
Here is a list of Python packages that you may want to look into on your
journey into geodata with Python:
Geopandas
General GIS tool with a pandas-like code syntax that makes it very
accessible for the data science world.
Fiona
Reading and writing geospatial data.
Rasterio
Python package for reading and writing raster data.
GDAL/OGR
A Python package that can be used for translating between different GIS
file formats.
RSGISLIB
A package containing remote sensing tools together with raster
processing and analysis.
PyProj
A package that can transform coordinates with multiple geographic
reference systems.
Geopy
Find postal addresses using coordinates or the inverse.
Shapely
Manipulation of planar geometric objects.
PySAL
Spatial analysis package in Python.
Scipy.spatial
Spatial algorithms based on the famous scipy package for data science.
Cartopy