Graph Databases, by Ian Robinson
Graph Databases
Nutshell Handbook, the Nutshell Handbook logo, and the O’Reilly logo are registered trademarks of O’Reilly
Media, Inc. !!FILL THIS IN!! and related trade dress are trademarks of O’Reilly Media, Inc.
Many of the designations used by manufacturers and sellers to distinguish their products are claimed as
trademarks. Where those designations appear in this book, and O’Reilly Media, Inc., was aware of a
trademark claim, the designations have been printed in caps or initial caps.
While every precaution has been taken in the preparation of this book, the publisher and authors assume
no responsibility for errors or omissions, or for damages resulting from the use of the information contained
herein.
ISBN: 978-1-449-35626-2
Table of Contents
Preface vii
1. Introduction 1
About This Book 2
What is a Graph? 2
A High Level View of the Graph Space 5
Graph Databases 6
Graph Compute Engines 8
The Power of Graph Databases 10
Performance 10
Flexibility 10
Agility 11
Summary 11
A Comparison of Relational and Graph Modeling 30
Relational Modeling in a Systems Management Domain 31
Graph Modeling in a Systems Management Domain 34
Testing the Model 36
Cross-Domain Models 37
Creating the Shakespeare Graph 40
Beginning a Query 42
Declaring Information Patterns to Find 42
Constraining Matches 44
Processing Results 45
Query Chaining 46
Common Modeling Pitfalls 46
Email Provenance Problem Domain 47
A Sensible First Iteration? 47
Second Time’s the Charm 49
Evolving the Domain 51
Avoiding Anti-Patterns 54
Summary 55
Why Organizations Choose Graph Databases 93
Common Use Cases 94
Social 94
Recommendations 95
Geo 96
Master Data Management 96
Network and Data Center Management 97
Authorization and Access Control (Communications) 98
Real-World Examples 99
Social Recommendations (Professional Social Network) 99
Authorization and Access Control 107
Geo (Logistics) 113
Index 189
Preface
do not need to contact us for permission unless you’re reproducing a significant portion
of the code. For example, writing a program that uses several chunks of code from this
book does not require permission. Selling or distributing a CD-ROM of examples from
O’Reilly books does require permission. Answering a question by citing this book and
quoting example code does not require permission. Incorporating a significant amount
of example code from this book into your product’s documentation does require
permission.
We appreciate, but do not require, attribution. An attribution usually includes the title,
author, publisher, and ISBN. For example: “Book Title by Some Author (O’Reilly).
Copyright 2012 Some Copyright Holder, 978-0-596-xxxx-x.”
If you feel your use of code examples falls outside fair use or the permission given above,
feel free to contact us at permissions@oreilly.com.
How to Contact Us
Please address comments and questions concerning this book to the publisher:
707-829-0104 (fax)
We have a web page for this book, where we list errata, examples, and any additional
information. You can access this page at http://www.oreilly.com/catalog/<catalog page>.
To comment or ask technical questions about this book, send email to
bookquestions@oreilly.com.
For more information about our books, courses, conferences, and news, see our website
at http://www.oreilly.com.
Find us on Facebook: http://facebook.com/oreilly
Follow us on Twitter: http://twitter.com/oreillymedia
Watch us on YouTube: http://www.youtube.com/oreillymedia
Acknowledgments
CHAPTER 1
Introduction
Graph databases address one of the great macroscopic business trends of today: leveraging
complex and dynamic relationships in highly-connected data to generate insight
and competitive advantage. Whether we want to understand relationships between
customers, elements in a telephone or datacenter network, entertainment producers
and consumers, or genes and proteins, the ability to understand and analyze vast graphs
of highly-connected data will be key in determining which companies outperform their
competitors over the coming decade.
For data of any significant size or value, graph databases are the best way to represent
and query connected data. Connected data is data whose interpretation and value
requires us first to understand the ways in which its constituent elements are related. More
often than not, to generate this understanding, we need to name and qualify the
connections between things.
While large corporates realized this some time ago, creating their own proprietary graph
processing technologies, we’re now in an era where that technology has rapidly become
democratized. Today, general-purpose graph databases are a reality, allowing mainstream
users to experience the benefits of connected data without having to invest in
building their own graph infrastructure.
What’s remarkable about this renaissance of graph data and graph thinking is that graph
theory itself is not new. Graph theory was pioneered by Euler in the 18th century, and
has been actively researched and improved by mathematicians, sociologists, anthropologists,
and others ever since. However, it is only in the last few years that graph theory
and graph thinking have been applied to information management. In that time, graph
databases have helped solve important problems in the areas of social networking, master
data management, geospatial, recommendations, and more. This increased focus on
graph is driven by twin forces: by the massive commercial successes of companies such
as Facebook, Google, and Twitter, all of whom have centered their business models
around their own proprietary graph technologies; and by the introduction of general
purpose graph databases into the technology landscape.
What is a Graph?
Formally, a graph is just a collection of vertices and edges, or, in less intimidating language,
a set of nodes and the relationships that connect them. Graphs represent entities
as nodes and the ways in which those entities relate to the world as relationships. This
general-purpose, expressive structure allows us to model all kinds of scenarios, from
the construction of a space rocket, to a system of roads, and from the supply-chain or
provenance of foodstuff, to medical history for populations, and beyond.
1. http://www.gartner.com/id=2081316
2. For introductions to graph theory, see Richard J. Trudeau, Introduction To Graph Theory (Dover, 1993) and
Gary Chartrand, Introductory Graph Theory (Dover, 1985). For an excellent introduction to how graphs
provide insight into complex events and behaviors, see David Easley and Jon Kleinberg, Networks, Crowds,
and Markets: Reasoning about a Highly Connected World (Cambridge University Press, 2010).
For example, Twitter’s data is easily represented as a graph. In Figure 1-1 we see a small
network of followers. The relationships are key here in establishing the semantic context:
namely, that Billy follows Harry, and that Harry, in turn, follows Billy. Ruth and Harry
likewise follow each other, but sadly, while Ruth follows Billy, Billy hasn’t (yet)
reciprocated.
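As a rough illustration (not taken from the book), the follower network just described can be sketched in Python as a set of directed FOLLOWS relationships. The names follows, followers_of and is_mutual are hypothetical helpers chosen for this sketch only.

```python
# Directed FOLLOWS relationships from the example: (follower, followed).
follows = {
    ("Billy", "Harry"),
    ("Harry", "Billy"),
    ("Ruth", "Harry"),
    ("Harry", "Ruth"),
    ("Ruth", "Billy"),  # Billy has not (yet) reciprocated
}


def followers_of(user):
    """Return the set of users who follow `user`."""
    return {src for (src, dst) in follows if dst == user}


def is_mutual(a, b):
    """True when a follows b and b follows a."""
    return (a, b) in follows and (b, a) in follows


print(sorted(followers_of("Billy")))
print(is_mutual("Billy", "Harry"))
print(is_mutual("Ruth", "Billy"))
```

Because direction is part of each relationship, the asymmetry between Ruth and Billy falls out of the data without any extra flag or column.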
Of course, Twitter’s real graph is hundreds of millions of times larger than the example
in Figure 1-1, but it works on precisely the same principles. In Figure 1-2 we’ve expanded
the graph to include the messages published by Ruth.
Figure 1-2. Publishing messages
Though simple, Figure 1-2 shows the expressive power of the graph model. It’s easy to
see that Ruth has published a string of messages. The most recent message can be found
by following a relationship marked CURRENT; PREVIOUS relationships then create a timeline
of posts.
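The CURRENT and PREVIOUS relationships behave much like a linked list. The sketch below is one possible way to model that in Python; the class names, the publish method and the message texts are all placeholders invented for illustration, not the contents of Figure 1-2.

```python
class Message:
    """A message node; `previous` models the PREVIOUS relationship."""

    def __init__(self, text, previous=None):
        self.text = text
        self.previous = previous


class User:
    """A user node; `current` models the CURRENT relationship."""

    def __init__(self, name):
        self.name = name
        self.current = None

    def publish(self, text):
        # The new message points back at the old head, then becomes CURRENT.
        self.current = Message(text, previous=self.current)


def timeline(user):
    """Yield message texts from newest to oldest by following PREVIOUS."""
    message = user.current
    while message is not None:
        yield message.text
        message = message.previous


ruth = User("Ruth")
for text in ["first post", "second post", "third post"]:  # placeholder texts
    ruth.publish(text)

print(list(timeline(ruth)))
```

Reading the timeline only ever touches Ruth's own messages, however many other users and messages the graph contains.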
Most people find the property graph model intuitive and easy to understand. While
simple, it can be used to describe the overwhelming majority of graph use cases in ways
that yield useful insights into our data.
Graph Databases
A graph database management system (henceforth, a graph database) is an online database
management system with Create, Read, Update and Delete methods that expose
a graph data model. Graph databases are generally built for use with transactional
(OLTP) systems. Accordingly, they are normally optimized for transactional performance,
and engineered with transactional integrity and operational availability in mind.
There are two properties of graph databases you should consider when investigating
graph database technologies:
1. The underlying storage. Some graph databases use native graph storage that is optimized
and designed for storing and managing graphs. Not all graph database
technologies use native graph storage, however. Some serialize the graph data into
a relational database, an object-oriented database, or some other general-purpose
data store.
2. The processing engine. Some definitions require that a graph database use index-free
adjacency, meaning that connected nodes physically “point” to each other in
the database.4 Here we take a slightly broader view: any database that from the user’s
perspective behaves like a graph database, i.e. exposes a graph data model through
CRUD operations, qualifies as a graph database. We do acknowledge, however, the
significant performance advantages of index-free adjacency, and therefore use the
term native graph processing to describe graph databases that leverage index-free
adjacency.
4. See Rodriguez, M.A., Neubauer, P., “The Graph Traversal Pattern,” 2010 (http://arxiv.org/abs/1004.1001)
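To make the idea of index-free adjacency a little more concrete, here is a minimal in-memory sketch. It is an analogy only and says nothing about how any particular graph database lays out its storage. The Node class, the edge_table list and the example names are assumptions made for this illustration.

```python
class Node:
    """A node that holds direct references to its neighbours."""

    def __init__(self, name):
        self.name = name
        self.neighbours = []  # each entry "points" at another Node


def neighbours_native(node):
    # Index-free adjacency: just follow the references stored on the node.
    return [n.name for n in node.neighbours]


# Non-native alternative: relationships live in one global structure that has
# to be searched. A real system would use an index rather than a full scan,
# but every hop would still go through that shared lookup structure.
edge_table = [("Alice", "Bob"), ("Alice", "Carol"), ("Bob", "Carol")]


def neighbours_via_lookup(name):
    return [dst for (src, dst) in edge_table if src == name]


alice, bob, carol = Node("Alice"), Node("Bob"), Node("Carol")
alice.neighbours = [bob, carol]
bob.neighbours = [carol]

print(neighbours_native(alice))
print(neighbours_via_lookup("Alice"))
```

Both functions give the same answer. The difference is where the cost lies: the first depends only on the number of neighbours of the node in hand, while the second depends on the size of the shared structure being consulted.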
It’s important to note that native graph storage and native graph processing
are neither good nor bad; they’re simply classic engineering
tradeoffs. The benefit of native graph storage is that its purpose-built
stack is engineered for performance and scalability. The benefit of
non-native graph storage, in contrast, is that it typically depends on a mature
non-graph backend (such as MySQL) whose production characteristics
are well understood by operations teams. Native graph processing
(index-free adjacency) benefits traversal performance, but at the expense
of making some non-traversal queries difficult or memory
intensive.
Relationships are first-class citizens of the graph data model, unlike other database
management systems, which require us to infer connections between entities using
contrived properties such as foreign keys, or out-of-band processing like map-reduce.
By assembling the simple abstractions of nodes and relationships into connected structures,
graph databases allow us to build arbitrarily sophisticated models that map closely
to our problem domain. The resulting models are simpler and at the same time more
expressive than those produced using traditional relational databases and the other
NOSQL stores.
Figure 1-3 shows a pictorial overview of some of the graph databases on the market
today based on their storage and processing models:
Figure 1-4. A high level view of a typical graph compute engine deployment
A variety of different types of graph compute engines exist. Most notably there are
in-memory/single machine graph compute engines like Cassovary, and distributed graph
compute engines like Pegasus or Giraph. Most distributed graph compute engines are
based on the Pregel white paper, authored by Google, which describes the graph compute
engine Google uses to rank pages.5
Performance
One compelling reason, then, for choosing a graph database is the sheer performance
increase when dealing with connected data versus relational databases and NOSQL
stores. In contrast to relational databases, where join-intensive query performance deteriorates
as the dataset gets bigger, with a graph database performance tends to remain
relatively constant, even as the dataset grows. This is because queries are localized to a
portion of the graph. As a result, the execution time for each query is proportional only
to the size of the part of the graph traversed to satisfy that query, rather than the size of
the overall graph.
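The locality argument can be illustrated with a small breadth-first traversal. This is a sketch over plain Python dictionaries rather than a real graph database, and the function name friends_within and the toy data are hypothetical.

```python
from collections import deque


def friends_within(graph, start, max_depth):
    """Return every node reachable from `start` in at most `max_depth` hops."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the requested depth
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    return seen - {start}


small = {"a": ["b"], "b": ["c"], "c": ["d"]}

# The same neighbourhood embedded in a much larger graph of unrelated nodes.
large = dict(small)
large.update({f"x{i}": [f"x{i + 1}"] for i in range(100_000)})

print(friends_within(small, "a", 2) == friends_within(large, "a", 2))
```

The traversal starting at "a" expands only the nodes it actually reaches, so the hundred thousand unrelated nodes in the larger graph add no work to the query.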
Flexibility
As developers and data architects we want to connect data as the domain dictates,
thereby allowing structure and schema to emerge in tandem with our growing understanding
of the problem space, rather than being imposed upfront, when we know least
about the real shape and intricacies of the data. Graph databases address this want
directly. As we show in Chapter 3, the graph data model expresses and accommodates
the business’ needs in a way that enables IT to move at the speed of business.
Graphs are naturally additive, meaning we can add new kinds of relationships, new
nodes, and new subgraphs to an existing structure without disturbing existing queries
and application functionality. These things have generally positive implications for developer
productivity and project risk. Because of the graph model’s flexibility, we don’t
have to model our domain in exhaustive detail ahead of time—a practice which is all
but foolhardy in the face of changing business requirements. The additive nature of
graphs also means we tend to perform fewer migrations, thereby reducing maintenance
overhead and risk.
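A small sketch of what "additive" means in practice, using relationships stored as (start, type, end) triples. The WORKS_FOR relationship type and the Acme node are hypothetical additions made up for this example.

```python
# Relationships stored as (start, type, end) triples.
graph = {
    ("Ruth", "FOLLOWS", "Harry"),
    ("Harry", "FOLLOWS", "Ruth"),
}


def followed_by(user):
    """Existing query: whom does `user` follow?"""
    return {end for (start, rel, end) in graph
            if start == user and rel == "FOLLOWS"}


before = followed_by("Ruth")

# Later, a new kind of relationship and a new node are added to the graph.
graph.add(("Ruth", "WORKS_FOR", "Acme"))  # hypothetical type and node

after = followed_by("Ruth")
print(before == after)
```

The existing query names the relationship type it cares about, so data of a new type passes it by untouched and no migration of the existing relationships is needed.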
Agility
We want to be able to evolve our data model in step with the rest of our application,
using a technology aligned with today’s incremental and iterative software delivery
practices. Modern graph databases equip us to perform frictionless development and
graceful systems maintenance. In particular, the schema-free nature of the graph data
model, coupled with the testable nature of a graph database’s API and query language,
empowers us to evolve an application in a controlled manner.
Graph users cannot rely on fixed schemas to provide some level of governance at the
level of the database. But this is not a risk; rather it presents an opportunity to implement
more visible, actionable governance. As we show in Chapter 4, governance is typically
applied in a programmatic fashion, using tests to drive out the data model and queries,
as well as assert the business rules that depend upon the graph. This is no longer a
controversial practice: more so than relational development, graph database development
aligns well with today’s agile and test-driven software development practices, allowing
graph database-backed applications to evolve in step with changing business
environments.
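One way to picture governance through tests is a check that encodes a business rule and runs against the graph data. The rule used here (FOLLOWS may only connect two User nodes), the node labels and the function names are assumptions invented for this sketch; they are not the approach Chapter 4 describes.

```python
# Node name -> label, and relationships as (start, type, end) triples.
nodes = {"Ruth": "User", "Harry": "User", "msg-1": "Message"}
relationships = [
    ("Ruth", "FOLLOWS", "Harry"),
    ("Ruth", "PUBLISHED", "msg-1"),
]


def violations(nodes, relationships):
    """Return FOLLOWS relationships that do not connect two User nodes."""
    return [
        (start, rel, end)
        for (start, rel, end) in relationships
        if rel == "FOLLOWS"
        and not (nodes.get(start) == "User" and nodes.get(end) == "User")
    ]


def test_follows_only_connects_users():
    assert violations(nodes, relationships) == []


def test_rule_detects_bad_data():
    bad = relationships + [("Ruth", "FOLLOWS", "msg-1")]
    assert violations(nodes, bad) == [("Ruth", "FOLLOWS", "msg-1")]


test_follows_only_connects_users()
test_rule_detects_bad_data()
print("all graph rules hold")
```

In the absence of a fixed schema, a suite of such checks makes the rules visible in code and fails loudly when the data or the model drifts away from them.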
Summary
In this chapter we’ve defined connected data and reviewed the property graph model,
a simple yet expressive tool for representing connected data. Property graphs capture
complex domains in an expressive and flexible fashion, while graph databases make it
easy to develop applications that manipulate our graph models.
In the next chapter we’ll look in more detail at how several different technologies address
the challenge of connected data, starting with relational databases, moving on to aggregate
NOSQL stores, and ending with graph databases. In the course of the discussion,
we’ll see why graphs and graph databases provide the best means for modeling, storing
and querying connected data. Later chapters then go on to show how to design and
implement a graph database-based solution.
CHAPTER 2
Options for Storing Connected Data
We live in a connected world. To thrive and progress, we need to understand and influence
the web of connections that surrounds us.
How do today’s technologies deal with the challenge of connected data? In this chapter
we look at how relational databases and aggregate NOSQL stores manage graphs and
connected data, and compare their performance to that of a graph database.1
1. For readers interested in exploring the topic of NOSQL, Appendix A describes the four major types of NOSQL
databases
Figure 2-1 shows a relational schema for storing customer orders in a customer-centric,
transactional application.
The application exerts a tremendous influence over the design of this schema, making
some queries very easy, others more difficult:
• Join tables add accidental complexity; they mix business data with foreign key
metadata.
• Foreign key constraints add development and maintenance overhead
just to make the database work.
• Sparse tables with nullable columns require special checking in code, despite the
presence of a schema.
• Several expensive joins are needed just to discover what a customer bought.
• Reciprocal queries are even more costly. “What products did a customer buy?” is
relatively cheap compared to “which customers bought this product?”, which is the
basis of recommendation systems. We could introduce an index, but even with an
index, recursive questions such as “which customers bought this product who also
bought that product?” quickly become prohibitively expensive as the degree of recursion
increases.
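The join chains mentioned in the list above can be sketched with Python's built-in sqlite3 module. The schema below is a simplified, hypothetical stand-in and not the schema of Figure 2-1; the table names, column names and sample rows are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE products   (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders     (id INTEGER PRIMARY KEY,
                             customer_id INTEGER REFERENCES customers(id));
    CREATE TABLE line_items (order_id   INTEGER REFERENCES orders(id),
                             product_id INTEGER REFERENCES products(id));
    INSERT INTO customers  VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO products   VALUES (10, 'Kettle'), (11, 'Toaster');
    INSERT INTO orders     VALUES (100, 1), (101, 2);
    INSERT INTO line_items VALUES (100, 10), (100, 11), (101, 10);
""")


def products_bought_by(customer):
    """Three joins just to discover what one customer bought."""
    rows = conn.execute("""
        SELECT DISTINCT p.name
        FROM customers c
        JOIN orders o      ON o.customer_id = c.id
        JOIN line_items li ON li.order_id = o.id
        JOIN products p    ON p.id = li.product_id
        WHERE c.name = ?
        ORDER BY p.name
    """, (customer,))
    return [name for (name,) in rows]


def customers_who_bought(product):
    """The reciprocal question walks the same join chain from the other end."""
    rows = conn.execute("""
        SELECT DISTINCT c.name
        FROM products p
        JOIN line_items li ON li.product_id = p.id
        JOIN orders o      ON o.id = li.order_id
        JOIN customers c   ON c.id = o.customer_id
        WHERE p.name = ?
        ORDER BY c.name
    """, (product,))
    return [name for (name,) in rows]


print(products_bought_by("Alice"))
print(customers_who_bought("Kettle"))
```

A recursive question such as "which customers bought this product who also bought that product?" would need this join chain repeated for each further step, which is where the cost described above starts to grow.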
Once upon a time there was a man who lived in a dark
hut under a willow tree. His face, and his wife’s face,
and the faces of their six black-haired children, were as
dark and gnarled as the willow trunk. But when their
seventh son was born, he was a light-haired boy, with
clear blue eyes, and a smile like golden sunshine.
“This is not our child!” cried the black-eyed man and the
black-eyed woman; “this yellow-haired baby is a
changeling; the dwarfs have put him into the cradle!” So
they called him Peter Dwarf. They were very unkind to
him, and when he grew older they made him do hard,
ugly work, like picking nettles and killing lambs. Peter
liked to work, but he did not at all like to kill poor little
lambs.
One day it happened that the cat got into the larder and
ate a big piece of meat. The black-eyed woman took her
by the tail and flung her out of the window at Peter
Dwarf, telling him that he must get rid of her at once.
But when he had the lovely white cat in his arms, she
looked at him so pleadingly that tears came into his
eyes, and he said: “Minka, I cannot hurt you! But if I
don’t obey, my father and mother will be very angry.”
But the cat still looked at him so sorrowfully that he
said: “Minka, let us both run away. You shall not be
harmed.”
The little men all muttered and grunted; they did not
look unkind.
“Because I have blue eyes and yellow hair,” Peter
replied, “I was so different from my brothers, and so
ugly that my mother said I was not her own son, but a
fairy changeling whom the dwarfs have put into the
cradle.”
“Work?” said the dwarf with the hammer, “how can such
a slight and princely creature work? Peter, let me see
your hands.” He felt Peter’s hands; they were thin and
strong and callous. “Yes,” he said, “this boy knows what
it is to work, I think we had better let him stay with us.
And now, Peter, since you are coming with us, let us
have a general introduction. My name is Stroke,” and he
bowed as best he could over his round stomach. “I am a
Swordsmith, and he with the pick-axe is a Miner,
Mushroom by name; he of the pointed ears is Berry, the
Blacksmith; and those three who are talking to the Lady
Minka, are Hump, the Goldsmith, Crow the
Coppersmith, and Wisely, he that jingles the keys—a
Locksmith.”
Peter got up and bowed to the little men. They told him
to follow, then they led the way through winding
passages down to the very center of the earth.
“Now, if you will truly learn the trades,” they said, “you
must work with each one of us for a year. You shall be
given plenty to eat, and shall sleep beside the fire.”
The next year Peter worked for Thorn, the Smelter, and
his face became a ruddy brown from standing over the
roaring furnaces; then he learned from Berry, the
Blacksmith, how to make hammers and axes and other
tools; and the next year he helped Stroke to fashion
swords and armor. He made gold chains and brooches
and rings with Hump, and keys with Wisely, the
Locksmith. Before the seventh year was over, there was
not a lock in all Christendom which Peter could not
open.
All the little dwarfs waved their caps and their big brown
hands, as Peter and Minka went back to the sunny
upper Earth, which they had not seen for seven years.
Peter took off his leather apron and his red cap, and put
on a doublet and hose of light blue silk, and a mantle of
dark blue velvet. But happy as he was in his rich attire,
he did not forget about the king who was so ill. Every
time he met somebody who might know, he asked:
“Come,” said Peter to his cat, “I know what we must do;
but it must be quick work! Oh Minka—one more dawn,
and it will be too late!”
He went into a cavern at the foot of the mountain. Here
he called loudly down the dark passage way—“O
Mushroom, Thorn, Stroke, Wisely! Help me—help me—
help me!” And in another minute he saw little lights
approaching from all parts of the mountain, as the
faithful Diggerfolk came to his call.
Once upon a time in distant Orient Land, there lived a
beautiful royal maiden, the Princess Zarashne. Her hair
was long and black as the veils of night, her arms were
fair as the pink sea-shells, and her gowns were as
glorious in color as the feathers of the peacock.
Everybody in the palace where she lived, and in the
town outside the palace-walls, loved Princess Zarashne;
even the animals who dwelt far away in the desert had
heard how good and beautiful she was, and many of
them came to town hoping they might see her—the lion
and the tiger, and the striped zebra, the ostrich and the
camel and the curious giraffe. But the keeper of the
palace-gate was afraid to let them in, so the only ones
who saw her were the ostrich and the giraffe, because
they could look over the wall and watch her feed her
gold-fish in the fountain.