Graph Data Science with Neo4j: Learn how to use Neo4j 5 with Graph Data Science library 2.0 and its Python driver for your project, by Estelle Scifo. ISBN 9781804612743, 180461274X, 9781804612763, 180461234X
BIRMINGHAM—MUMBAI
All rights reserved. No part of this book may be reproduced, stored in a retrieval
system, or transmitted in any form or by any means, without the prior written
permission of the publisher, except in the case of brief quotations embedded in
critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy
of the information presented. However, the information contained in this book is
sold without warranty, either express or implied. Neither the author, nor Packt
Publishing or its dealers and distributors, will be held liable for any damages
caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the
companies and products mentioned in this book by the appropriate use of capitals.
However, Packt Publishing cannot guarantee the accuracy of this information.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-80461-274-3
www.packtpub.com
Contributors
Before that, she worked in several fields, starting out with research in particle
physics, during which she worked at CERN on uncovering Higgs boson properties.
She received her PhD in 2014 from the Laboratoire de l’Accélérateur Linéaire
(Orsay, France). Continuing her career in industry, she worked in real estate,
mobility, and logistics for almost 10 years. In the Neo4j community, she is known
as the creator of neomap, a map visualization application for data stored in Neo4j.
She also regularly gives talks at conferences such as NODES and PyCon. Her
domain expertise and deep insight into the perspective of a beginner’s needs make
her an excellent teacher.
There is only one name on the cover, but a book is not the work of one
person. I would like to thank everyone involved in making this book a reality.
Beyond everyone at Packt, the reviewers did an incredible job of suggesting
some very relevant improvements. Thank you, all!
I hope this book will inspire you as much as other books of this genre have
inspired me.
Sean William Grant is a product and analytics professional with over 20 years of
experience in technology and data analysis. His experience ranges from geospatial
intelligence with the United States Marine Corps, product management within the
aviation and autonomy space, to implementing advanced analytics and data
science within organizations. He is a graph data science and network analytics
enthusiast who frequently gives presentations and workshops on connected data.
He has also been a technical advisor to several early-stage start-ups. Sean is
passionate about data and technology, and how it can elevate our understanding
of ourselves.
Jose Ernesto Echeverria has worked with all kinds of databases, from relational
databases in the 1990s to non-SQL databases in the 2010s. He considers graph
databases to be the best fit for solving real-world problems, given their strong
capability for modeling and adaptability to change. As a polyglot programmer, he
has used languages such as Java, Ruby, and R and tools such as Jupyter with
Neo4j in order to solve data management problems for multinational corporations.
A long-time advocate of data science, he expects this long-awaited book to cover
the proper techniques and approach the intersections of this discipline, as well as
help readers to discover the possibilities of graph databases. When not working,
he enjoys spending time with friends and family.
Table of Contents
Preface
Part 1 – Creating Graph Data in Neo4j
Technical requirements
Importing CSV data into Neo4j with Cypher
Discovering the Netflix dataset
Defining the graph schema
Importing data
Introducing the APOC library to deal with JSON data
Browsing the dataset
Getting to know and installing the APOC plugin
Loading data
Dealing with temporal data
Discovering the Wikidata public knowledge graph
Data format
Query language – SPARQL
Enriching our graph with Wikidata information
Loading data into Neo4j for one person
Importing data for all people
Dealing with spatial data in Neo4j
Importing data in the cloud
Summary
Further reading
Exercises
Part 2 – Exploring and Characterizing Graph Data with Neo4j
Technical requirements
Digging into the Neo4j GDS library
GDS content
Installing the GDS library with Neo4j Desktop
GDS project workflow
Projecting a graph for use by GDS
Native projections
Cypher projections
Computing a node’s degree with GDS
stream mode
The YIELD keyword
write mode
mutate mode
Algorithm configuration
Other centrality metrics
Understanding a graph’s structure by looking for communities
Number of components
Modularity and the Louvain algorithm
Summary
Further reading
8
Building a GDS Pipeline for Node Classification Model Training
Technical requirements
The GDS pipelines
What is a pipeline?
Building and training a pipeline
Creating the pipeline and choosing the features
Setting the pipeline configuration
Training the pipeline
Making predictions
Computing the confusion matrix
Using embedding features
Choosing the graph embedding algorithm to use
Training using Node2Vec
Training using GraphSAGE
Summary
Further reading
Exercise
10
Index
Among the different tools on the market for working with graphs, Neo4j, a graph
database, is popular among developers for its ability to build simple and evolving
data models and query data easily with Cypher. For a few years now, it has also
stood out as a leader in graph analytics, especially since the release of the first
version of its Graph Data Science (GDS) library, which allows you to run graph
algorithms on data stored in Neo4j, even at a large scale.
This book is designed to guide you through the field of graph data science, always
using Neo4j and its GDS library as the main tool. By the end of this book, you will
be able to run your own GDS model on a graph dataset you created. You will even
be able to pass the Neo4j Data Science certification to prove your new skills to
the world.
Who this book is for
This book is for people who are curious about graphs and how this data structure
can be useful in data science. It can serve both data scientists who are learning
about graphs and Neo4j developers who want to get into data science.
The book assumes minimal data science knowledge (classification, training sets,
confusion matrices) and some experience with Python and its related data science
toolkit (pandas, matplotlib, and scikit-learn).
What this book covers
Chapter 1, Introducing and Installing Neo4j, introduces the basic principles of
graph databases and gives instructions on how to set up Neo4j locally, create your
first graph, and write your first Cypher queries.
Chapter 2, Using Existing Data to Build a Knowledge Graph, guides you through
loading data into Neo4j from different formats (CSV, JSON, and an HTTP API). This
is where you will build the dataset that will be used throughout this book.
Chapter 5, Visualizing Graph Data, delves into graph data visualization by drawing
nodes and edges, starting from static representations and moving on to dynamic
ones.
Chapter 6, Building a Machine Learning Model with Graph Features, talks about
machine learning model training using scikit-learn. This is where we will first use
the GDS Python client.
Chapter 9, Predicting Future Edges, gives a short introduction to the topic of link
prediction, a graph-specific machine learning task.
Chapter 10, Writing Your Custom Graph Algorithms with the Pregel API in Java,
covers the exciting topic of building an extension for the GDS plugin.
For the very last chapter, a JDK will also be required. The code was tested
with OpenJDK 11.
You will also need to install Neo4j plugins: APOC and GDS. Installation instructions
for Neo4j Desktop are given in the relevant chapters. However, if you are not using
a local Neo4j instance, please refer to the following pages for installation
instructions, especially regarding version compatibilities:
APOC: https://neo4j.com/docs/apoc/current/installation/
GDS: https://neo4j.com/docs/graph-data-science/current/installation/
If you are using the digital version of this book, we advise you to type
the code yourself or access the code from the book’s GitHub repository
(a link is available in the next section). Doing so will help you avoid any
potential errors related to the copying and pasting of code.
We also have other code bundles from our rich catalog of books and videos
available at https://github.com/PacktPublishing/. Check them out!
Conventions used
There are a number of text conventions used throughout this book.
Code in text: Indicates code words in text, database table names, folder names,
filenames, file extensions, pathnames, dummy URLs, user input, and Twitter
handles. Here is an example: “Mount the downloaded WebStorm-10*.dmg disk
image file as another disk in your system.”
A block of code is set as follows:
CREATE (:Movie {
    id: line.show_id,
    title: line.title,
    releaseYear: line.release_year
})
When we wish to draw your attention to a particular part of a code block, the
relevant lines or items are set in bold.
Any command-line input or output is written as follows:
$ mkdir css
$ cd css
Bold: Indicates a new term, an important word, or words that you see onscreen.
For instance, words in menus or dialog boxes appear in bold. Here is an example:
“Select System info from the Administration panel.”
Get in touch
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, email us
at customercare@packtpub.com and mention the book title in the subject of your
message.
Errata: Although we have taken every care to ensure the accuracy of our content,
mistakes do happen. If you have found a mistake in this book, we would be
grateful if you would report this to us. Please visit
www.packtpub.com/support/errata and fill in the form.
Piracy: If you come across any illegal copies of our works in any form on the
internet, we would be grateful if you would provide us with the location address or
website name. Please contact us at copyright@packt.com with a link to the
material.
If you are interested in becoming an author: If there is a topic that you have
expertise in and you are interested in either writing or contributing to a book,
please visit authors.packtpub.com.
Your review is important to us and the tech community and will help us make sure
we’re delivering excellent quality content.
Do you like to read on the go but are unable to carry your print books
everywhere? Is your eBook purchase not compatible with the device of your
choice?
Don’t worry, now with every Packt book you get a DRM-free PDF version of that
book at no cost.
Read anywhere, any place, on any device. Search, copy, and paste code from your
favorite technical books directly into your application.
The perks don’t stop there: you can get exclusive access to discounts, newsletters,
and great free content in your inbox daily.
1. Visit https://packt.link/free-ebook/9781804612743
2. Submit your proof of purchase
3. That’s it! We’ll send your free PDF and other benefits to your email directly
Part 1 – Creating Graph Data in Neo4j
In this first part, you will learn about Neo4j and set up your first graph database.
You will also build a graph dataset in Neo4j using Cypher, the APOC library, and
public knowledge graphs.
Fig. 513.—Hedysarum
coronarium.
10. Dalbergieæ. 25 genera; especially in Tropical America; the majority are
trees, a few shrubs or lianes; the leaves are simple or imparipinnate. The fruit is
indehiscent in all; in some it is a winged, in others a wingless nut (Machærium,
Dalbergia, Centrolobium, etc.), in others, again, a drupe, e.g. in Dipteryx (Tonquin-
bean) and Andira. In some genera the embryo is straight.
Pollination. Especially effected by Bees. The nectar is secreted by a ring or
disc-like portion round the base of the gynœceum or the inner surface of the
receptacle. The flower is constructed with a peculiar mechanism to ensure cross-
pollination by insects. The pollen is shed just before the flower opens, and is
retained in a pouch formed by the keel. An insect visiting the flower uses the wings
and keel for a landing-stage, and in attempting to reach the honey presses down
the wings and the keel which are locked together near the standard; the stylar-
brush by this means is forced through the apical opening of the keel and a little
pollen is thus swept out and deposited upon the abdomen of the visiting insect as
it presses against the apex of the keel; the insect thus carries away pollen and
may effect cross-pollination. In the different flowers this arrangement is modified in
various ways to promote pollination. 5000 species (319 genera); especially in the
Tropics, where many are important forest trees.—The following plants are used
for food: Pisum sativum (W. Asia?) and arvense (Italy); Phaseolus vulgaris
(Kidney-bean, American; Dolichos sinensis was known to the Greeks and Romans
under the name “φασηλος,” “phaseolus”), P. compressus (French-bean), etc.; Faba
vulgaris (Field-bean, Horse-bean; from the Old World); Ervum lens (Lentil, Eastern
Mediterranean); in tropical countries the oil-containing seeds of Arachis hypogæa.
—The following are fodder plants: Vicia sativa, Faba vulgaris, Onobrychis sativa
(Sainfoin), Medicago sativa (Lucerne), and lupulina (Medick), species of Trifolium,
Hedysarum coronarium. Officinal: “Liquorice root,” from Glycyrrhiza glabra (S.
Europe); “Red Sandalwood,” from Pterocarpus santalinus (Tropical E. Asia); Gum
Tragacanth, from Astragalus-species (E. Mediterranean); Balsam of Peru, from
Toluifera pereiræ, and Balsam of Tolu, from Toluifera balsamum. Calabar-beans,
from Physostigma venenosum; Kino, from Pterocarpus marsupium; the pith of
Andira araroba is used under the name of “Chrysarobin.”—Of use technically:
Genista tinctoria (yellow dye) and Indigofera-species (Indigo), the bast of
Crotalaria juncea (Sunn Hemp); the seeds of Dipteryx, which contain Coumarin,
and are highly scented, and Balsam of Myroxylon. Poisonous: the seeds of
Laburnum (Cytisus laburnum), various species of Lathyrus, and Abrus precatorius;
the latter contain two poisonous proteids, paraglobulin and albumose, which
resemble snake-poison in their effects. The following are ornamental plants:
Phaseolus multiflorus (Scarlet runner, from America), Robinia pseudacacia,
Amorpha, Colutea, Coronilla, Indigofera dosua, Wistaria polystachya, Cytisus
laburnum (Laburnum, S. Europe, Orient.) and other species.
Order 3. Mimosaceæ. The flowers are most frequently
hypogynous and regular, the æstivation of the corolla is valvate and,
in the majority of instances, that of the calyx also. The flower is 4-
merous, less frequently 5- or 3-merous.—The flowers are generally
small, but are always borne in compact, round capitula or spikes
(Fig. 514); they are hypogynous or perigynous. The calyx is
generally gamosepalous and the corolla gamopetalous, the latter
being frequently wanting. The stamens are equal or double the
number of the petals (Mimosa, etc., in M. pudica, e.g. S4, P4, A4,
G1) or (in Acacia, Inga, etc.) in a large, indefinite number, free or
monadelphous, often united to the corolla (Fig. 514 b). The colour of
the flower in most cases is due to the long and numerous stamens.
The fruit is various. The embryo is straight as in the Cæsalpiniaceæ.
Entada and many species of Mimosa have a flat, straight, or
somewhat sickle-like pod, which resembles the siliqua of the
Cruciferæ in that the sutures (in this instance, however, dorsal and
ventral suture) persist as a frame, but the intermediate portion
divides, as in the transversely divided siliqua, into as many nut-like
portions as there are seeds. Some species have a pod of enormous
dimensions. The seeds of Entada gigalobium are often carried from
the West Indies to the N. W. coasts of Europe by the Gulf Stream.—
The fruit of Acacia in some species is an ordinary pod, in others it is
transversely divided, or remains an undivided fruit, a nut.—This
order includes both trees and herbaceous plants, which are often
thorny; the leaves are usually bipinnate (Fig. 514) and are sensitive,
and also possess sleep-movements.—Many Australian Acacias
have compound leaves only when young, but when old have
phyllodia, i.e. leaf-like petioles without blades, placed vertically. A
large number have thorny stipules, which in some (Acacia
sphærocephala) attain an enormous size, and serve as a home for
ants, which in return protect their host-plant against the attacks of
other, leaf-cutting ants.
Fig. 514.—Acacia farnesiana: a inflorescence; b flower.
Other genera besides those mentioned are: Adenanthera,
Desmanthus, Parkia, Inga (with rather fleshy, indehiscent fruit),
Calliandra, etc.
1350 species (30 genera); none natives of Europe, their home being the Tropics
and sub-tropical regions, especially Australia and Africa.—Fossils in Tertiary.—
Gums are found in many species of Acacia, especially the African (Gum arabic)
and Australian, of which some are officinal. The bark, and also the fruits, contain a
large amount of tannic acid and are used as astringents and in tanning (“Bablah” is
the fruits of several species of Acacia). Catechu is a valuable tanning material
extracted from the wood of Acacia catechu (E. Ind). The flowers of Acacia
farnesiana (Fig. 514) are used in the manufacture of perfumes. With us they are
cultivated as ornamental plants, e.g. A. lophantha and many others, in
conservatories.