PDF SQL and NoSQL Databases: Modeling, Languages, Security and Architectures for Big Data Management Michael Kaufmann download
PDF SQL and NoSQL Databases: Modeling, Languages, Security and Architectures for Big Data Management Michael Kaufmann download
com
https://ebookmeta.com/product/sql-and-nosql-databases-
modeling-languages-security-and-architectures-for-big-data-
management-michael-kaufmann/
OR CLICK HERE
DOWLOAD NOW
https://ebookmeta.com/product/python-data-persistence-with-sql-and-
nosql-databases-1st-edition-lathkar/
ebookmeta.com
https://ebookmeta.com/product/nosql-and-sql-data-modeling-bringing-
together-data-semantics-and-software-first-edition-hills/
ebookmeta.com
https://ebookmeta.com/product/transplantation-imaging-ghaneh-
fananapazir-editor-ramit-lamba-editor/
ebookmeta.com
The Indonesian economy since 1965 a case study of
political economy Ingrid Palmer
https://ebookmeta.com/product/the-indonesian-economy-
since-1965-a-case-study-of-political-economy-ingrid-palmer/
ebookmeta.com
https://ebookmeta.com/product/insight-guides-israel-10th-edition-
insight-guides/
ebookmeta.com
https://ebookmeta.com/product/competing-interest-groups-and-lobbying-
in-the-construction-of-the-european-banking-union-giuseppe-montalbano/
ebookmeta.com
Triple Play for the Single Mom 1st Edition Rebel Bloom
https://ebookmeta.com/product/triple-play-for-the-single-mom-1st-
edition-rebel-bloom/
ebookmeta.com
https://ebookmeta.com/product/in-your-face-law-justice-and-niqab-
wearing-women-in-canada-1st-edition-natasha-bakht/
ebookmeta.com
Michael Kaufmann
Andreas Meier
Second Edition
Michael Kaufmann Andreas Meier
Informatik Institute of Informatics
Hochschule Luzern Universität Fribourg
Rotkreuz, Switzerland Fribourg, Switzerland
# The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland
AG 2023
The first edition of this book was published by Springer Vieweg in 2019
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or
dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Foreword
The term database has long since become part of people’s everyday vocabulary, for
managers and clerks as well as students of most subjects. They use it to describe a
logically organized collection of electronically stored data that can be directly
searched and viewed. However, they are generally more than happy to leave the
whys and hows of its inner workings to the experts.
Users of databases are rarely aware of the immaterial and concrete business
values contained in any individual database. This applies as much to a car importer’s
spare parts inventory as the IT solution containing all customer depots at a bank or
the patient information system of a hospital. Yet failure of these systems, or even
cumulative errors, can threaten the very existence of the respective company or
institution. For that reason, it is important for a much larger audience than just the
“database specialists” to be well-informed about what is going on. Anyone involved
with databases should understand what these tools are effectively able to do and
which conditions must be created and maintained for them to do so.
Probably the most important aspect concerning databases involves (a) the dis-
tinction between their administration and the data stored in them (user data) and
(b) the economic magnitude of these two areas. Database administration consists of
various technical and administrative factors, from computers, database systems, and
additional storage to the experts setting up and maintaining all these components—
the aforementioned database specialists. It is crucial to keep in mind that the
administration is by far the smaller part of standard database operation, constituting
only about a quarter of the entire efforts.
Most of the work and expenses concerning databases lie in gathering,
maintaining, and utilizing the user data. This includes the labor costs for all
employees who enter data into the database, revise it, retrieve information from
the database, or create files using this information. In the above examples, this means
warehouse employees, bank tellers, or hospital personnel in a wide variety of
fields—usually for several years.
In order to be able to properly evaluate the importance of the tasks connected with
data maintenance and utilization on the one hand and database administration on the
other hand, it is vital to understand and internalize this difference in the effort
required for each of them. Database administration starts with the design of the
database, which already touches on many specialized topics such as determining the
v
vi Foreword
consistency checks for data manipulation or regulating data redundancies, which are
as undesirable on the logical level as they are essential on the storage level. The
development of database solutions is always targeted on their later use, so
ill-considered decisions in the development process may have a permanent impact
on everyday operations. Finding ideal solutions, such as the golden mean between
too strict and too flexible when determining consistency conditions, may require
some experience. Unduly strict conditions will interfere with regular operations,
while excessively lax rules will entail a need for repeated expensive data repairs.
To avoid such issues, it is invaluable for anyone concerned with database
development and operation, whether in management or as a database specialist, to
gain systematic insight into this field of computer sciences. The table of contents
gives an overview of the wide variety of topics covered in this book. The title already
shows that, in addition to an in-depth explanation of the field of conventional
databases (relational model, SQL), the book also provides highly educational infor-
mation about current advancements and related fields, the keywords being NoSQL
and Big Data. I am confident that the newest edition of this book will once again be
well-received by both students and professionals—its authors are quite familiar with
both groups.
It is remarkable how stable some concepts are in the field of databases. Information
technology is generally known to be subject to rapid development, bringing forth
new technologies at an unbelievable pace. However, this is only superficially the
case. Many aspects of computer science do not essentially change. This includes not
only the basics, such as the functional principles of universal computing machines,
processors, compilers, operating systems, databases and information systems, and
distributed systems, but also computer language technologies such as C, TCP/IP, or
HTML that are decades old but in many ways provide a stable fundament of the
global, earth-spanning information system known as the World Wide Web. Like-
wise, the SQL language (Structured Query Language) has been in use for almost five
decades and will remain so in the foreseeable future. The theory of relational
database systems was initiated in the 1970s by Codd (relation model) and
Chamberlin and Boyce (SEQUEL). However, these technologies have a major
impact on the practice of data management today. Especially, with the Big Data
revolution and the widespread use of data science methods for decision support,
relational databases and the use of SQL for data analysis are actually becoming more
important. Even though sophisticated statistics and machine learning are enhancing
the possibilities for knowledge extraction from data, many if not most data analyses
for decision support rely on descriptive statistics using SQL for grouped aggrega-
tion. SQL is also used in the field of Big Data with MapReduce technology. In this
sense, although SQL database technology is quite mature, it is more relevant today
than ever.
Nevertheless, the developments in the Big Data ecosystem brought new
technologies into the world of databases, to which we pay enough attention too.
Non-relational database technologies, which find more and more fields of applica-
tion under the generic term NoSQL, differ not only superficially from the classical
relational databases but also in the underlying principles. Relational databases were
developed in the twentieth century with the purpose of tightly organized, operational
forms of data management, which provided stability but limited flexibility. In
contrast, the NoSQL database movement emerged in the beginning of the new
century, focusing on horizontal partitioning, schema flexibility, and index-free
neighborhood with the goal of solving the Big Data problems of volume, variety,
and velocity, especially in Web-scale data systems. This has far-reaching
vii
viii Preface
1 Database Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Information Systems and Databases . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 SQL Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2.1 Relational Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2.2 Structured Query Language SQL . . . . . . . . . . . . . . . . . . . 6
1.2.3 Relational Database Management System . . . . . . . . . . . . . 8
1.3 Big Data and NoSQL Databases . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.3.1 Big Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.3.2 NoSQL Database Management System . . . . . . . . . . . . . . . 12
1.4 Graph Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.4.1 Graph-Based Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.4.2 Graph Query Language Cypher . . . . . . . . . . . . . . . . . . . . 15
1.5 Document Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.5.1 Document Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.5.2 Document-Oriented Database Language MQL . . . . . . . . . . 19
1.6 Organization of Data Management . . . . . . . . . . . . . . . . . . . . . . . . 21
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2 Database Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.1 From Requirements Analysis to Database . . . . . . . . . . . . . . . . . . . 25
2.2 The Entity-Relationship Model . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.2.1 Entities and Relationships . . . . . . . . . . . . . . . . . . . . . . . . 28
2.2.2 Associations and Association Types . . . . . . . . . . . . . . . . . 29
2.2.3 Generalization and Aggregation . . . . . . . . . . . . . . . . . . . . 32
2.3 Implementation in the Relational Model . . . . . . . . . . . . . . . . . . . . 35
2.3.1 Dependencies and Normal Forms . . . . . . . . . . . . . . . . . . . 35
2.3.2 Mapping Rules for Relational Databases . . . . . . . . . . . . . . 42
2.4 Implementation in the Graph Model . . . . . . . . . . . . . . . . . . . . . . . 47
2.4.1 Graph Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.4.2 Mapping Rules for Graph Databases . . . . . . . . . . . . . . . . . 51
2.5 Implementation in the Document Model . . . . . . . . . . . . . . . . . . . . 55
2.5.1 Document-Oriented Database Modeling . . . . . . . . . . . . . . 55
xi
xii Contents
Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
Database Management
1
The evolution from the industrial society via the service society to the information
and knowledge society is represented by the assessment of information as a factor in
production. The following characteristics distinguish information from material
goods:
These properties clearly show that digital goods (information, software, multime-
dia, etc.), i.e., data, are vastly different from material goods in both handling and
economic or legal evaluation. A good example is the loss in value that physical
products often experience when they are used—the shared use of information, on the
other hand, may increase its value. Another difference lies in the potentially high
production costs for material goods, while information can be multiplied easily and
at significantly lower costs (only computing power and storage medium). This
causes difficulties in determining property rights and ownership, even though digital
watermarks and other privacy and security measures are available.
Information System
Communication
Database System Application Software network
or WWW
User
User guidance
Database Dialog design Request
Management
Business logic
Data querying Response
Data manipulation
Database Access permissions
Storage Data protection
One of the simplest and most intuitive ways to collect and present data is in a table.
Most tabular data sets can be read and understood without additional explanations.
To collect information about employees, a table structure as shown in Fig. 1.2 can
be used. The all-capitalized table name EMPLOYEE refers to the entire table, while
the individual columns are given the desired attribute names as headers, for example,
the employee number “E#,” the employee’s name “Name,” and their city of resi-
dence “City.”
An attribute assigns a specific data value from a predefined value range called
domain as a property to each entry in the table. In the EMPLOYEE table, the
attribute E# allows to uniquely identify individual employees, making it the key of
the table. To mark key attributes more clearly, they will be written in italics in the
table headers throughout this book.1 The attribute City is used to label the respective
1
Some major works of database literature mark key attributes by underlining.
4 1 Database Management
Table name
EMPLOYEE Attribute
E# Name City
Key attribute
E# Name City
E19 Stewart Stow
E4 Bell Kent
E1 Murphy Kent
E7 Howard Cleveland
places of residence and the attribute Name for the names of the respective employees
(Fig. 1.3).
The required information of the employees can now easily be entered row by row.
In the columns, values may appear more than once. In our example, Kent is listed as
the place of residence of two employees. This is an important fact, telling us that both
employee Murphy and employee Bell are living in Kent. In our EMPLOYEE table,
not only cities but also employee names may exist multiple times. For that reason,
the aforementioned key attribute E# is required to uniquely identify each employee
in the table.
1.2 SQL Databases 5
Identification Key
An identification key or just key of a table is one attribute or a minimal combination
of attributes whose values uniquely identify the records (called rows or tuples)
within the table. If there are multiple keys, one of them can be chosen as the primary
key. This short definition lets us infer two important properties of keys:
• Uniqueness: Each key value uniquely identifies one record within the table, i.e.,
different tuples must not have identical keys.
• Minimality: If the key is a combination of attributes, this combination must be
minimal, i.e., no attribute can be removed from the combination without
eliminating the unique identification.
Table Definition
To summarize, a table is a set of rows presented in tabular form. The data records
stored in the table rows, also called tuples, establish a relation between singular data
values. According to this definition, the relational model considers each table as a set
of unordered tuples. Tables in this sense meet the following requirements:
EMPLOYEE
E# Name City
E19 Stewart Stow
E4 Bell Kent
E1 Murphy Kent
E7 Howard Cleveland
Example query:
“Select the names of the employees living in Kent.”
As explained, the relational model presents information in tabular form, where each
table is a set of tuples (or records) of the same type. Seeing all the data as sets makes
it possible to offer query and manipulation options based on sets.
The result of a selective operation, for example, is a set, i.e., each search result is
returned by the database management system as a table. If no tuples of the scanned
table show the respective properties, the user gets a blank result table. Manipulation
operations similarly target sets and affect an entire table or individual table sections.
The primary query and data manipulation language for tables is called Structured
Query Language, usually shortened to SQL (see Fig. 1.4). It was standardized by
1.2 SQL Databases 7
2
ANSI is the national standards organization of the USA. The national standardization
organizations are part of ISO.
8 1 Database Management
Natural language:
Descriptive language:
SELECT Name
FROM EMPLOYEE
WHERE City = 'Kent'
Procedural language:
fact, there are modern relational database management systems that can be accessed
with natural language.
Databases are used in the development and operation of information systems in order
to store data centrally, permanently, and in a structured manner.
As shown in Fig. 1.6, relational database management systems are integrated
systems for the consistent management of tables. They offer service functionalities
and the descriptive language SQL for data description, selection, and manipulation.
Every relational database management system consists of a storage and a man-
agement component. The storage component stores both data and the relationships
between pieces of information in tables. In addition to tables with user data from
various applications, it contains predefined system tables necessary for database
operation. These contain descriptive information and can be queried, but not
manipulated, by users.
The management component’s most important part is the language SQL for
relational data definition, selection, and manipulation. This component also contains
service functions for data restoration after errors, for data protection, and for backup.
Relational database management systems (RDBMS) have the following properties:
1.2 SQL Databases 9
• Model: The database model follows the relational model, i.e., all data and data
relations are represented in tables. Dependencies between attribute values of
tuples or multiple instances of data can be discovered (cf. normal forms in Sect.
2.3.1).
• Schema: The definitions of tables and attributes are stored in the relational
database schema. The schema further contains the definition of the identification
keys and rules for integrity assurance.
• Language: The database system includes SQL for data definition, selection, and
manipulation. The language component is descriptive and facilitates analyses and
programming tasks for users.
• Architecture: The system ensures extensive data independence, i.e., data and
applications are mostly segregated. This independence is reached by separating
the actual storage component from the user side using the management compo-
nent. Ideally, physical changes to relational databases are possible without having
to adjust related applications.
• Multi-user operation: The system supports multi-user operation (cf. Sect. 4.1),
i.e., several users can query or manipulate the same database at the same time.
The RDBMS ensures that parallel transactions in one database do not interfere
with each other or worse, with the correctness of data (Sect. 4.2).
• Consistency assurance: The database management system provides tools for
ensuring data integrity, i.e., the correct and uncompromised storage of data.
• Data security and data protection: The database management system provides
mechanisms to protect data from destruction, loss, or unauthorized access.
NoSQL database management systems meet these criteria only partially (see
Chaps. 4 and 7). For that reason, most corporations, organizations, and especially
10 1 Database Management
SMEs (small and medium enterprises) rely heavily on relational database manage-
ment systems. However, for spread-out Web applications or applications handling
Big Data, relational database technology must be augmented with NoSQL technol-
ogy in order to ensure uninterrupted global access to these services.
The term Big Data is used to label large volumes of data that push the limits of
conventional software. This data can be unstructured (see Sect. 5.1) and may
originate from a wide variety of sources: social media postings; e-mails; electronic
archives with multimedia content; search engine queries; document repositories of
content management systems; sensor data of various kinds; rate developments at
stock exchanges; traffic flow data and satellite images; smart meters in household
appliances; order, purchase, and payment processes in online stores; e-health
applications; monitoring systems; etc.
There is no binding definition for Big Data yet, but most data specialists will
agree on three Vs: volume (extensive amounts of data), variety (multiple formats:
structured, semi-structured, and unstructured data; see Fig. 1.7), and velocity (high-
speed and real-time processing). Gartner Group’s IT glossary offers the following
definition:
Big Data
“Big data is high-volume, high-velocity and high-variety information assets that
demand cost-effective, innovative forms of information processing for enhanced
insight and decision making.”
Multimedia
With this definition, Big Data are information assets for companies. It is indeed
vital for companies and organizations to generate decision-relevant knowledge in
order to survive. In addition to internal information systems, they increasingly utilize
the numerous resources available online to better anticipate economic, ecologic, and
social developments on the markets.
Big Data is a challenge faced by not only for-profit-oriented companies in digital
markets but also governments, public authorities, NGOs (non-governmental
organizations), and NPOs (nonprofit organizations).
A good example are programs to create smart or ubiquitous cities, i.e., by using
Big Data technologies in cities and urban agglomerations for sustainable develop-
ment of social and ecologic aspects of human living spaces. They include projects
facilitating mobility, the use of intelligent systems for water and energy supply, the
promotion of social networks, expansion of political participation, encouragement of
entrepreneurship, protection of the environment, and an increase of security and
quality of life.
All use of Big Data applications requires successful management of the three Vs
mentioned above:
• Volume: There are massive amounts of data involved, ranging from giga- to
zettabytes (megabyte, 106 bytes; gigabyte, 109 bytes; terabyte, 1012 bytes;
petabyte, 1015 bytes; exabyte, 1018 bytes; zettabyte, 1021 bytes).
• Variety: Big Data involves storing structured, semi-structured, and unstructured
multimedia data (text, graphics, images, audio, and video; cf. Fig. 1.7).
• Velocity: Applications must be able to process and analyze data streams in real
time as the data is gathered.
• Value: Big Data applications are meant to increase the enterprise value, so
investments in personnel and technical infrastructure are made where they will
bring leverage or added value can be generated.
To complete our discussion of the concept of Big Data, we will look at another V:
Veracity is an important factor in Big Data, where the available data is of variable
quality, which must be taken into consideration in analyses. Aside from statistical
methods, there are fuzzy methods of soft computing which assign a truth value
between 0 (false) and 1 (true) to any result or statement.
12 1 Database Management
NoSQL
The term NoSQL is used for any non-relational database management approach
meeting at least one of two criteria:
Parallel execution
Weak to strong consistency
Document E
DI
Document D
Document
Document C D MOVIE
CT
Value = Order-Nr 1
ED
Document
DocumentB C
_B
Document A
Document B
Y
Key= Session-ID 2
AC
Key= Order-Nr A
Document
TE
Shopping cart
Key= Session-ID 3 Item 1
Item 2 ACTOR
Value = Order-Nr 3
Item 3
Figure 1.9 shows three different NoSQL database management systems. Key-
value stores (see also Sect. 7.2) are the simplest version. Data is stored as an
identification key <key = "key"> and a list of values <value = "value 1", "value
2", . . .>. A good example is an online store with session management and shopping
basket. The session ID is the identification key; the order number is the value stored
in the cache. In document stores, records are managed as documents within the
NoSQL database. These documents are structured files which describe an entire
subject matter in a self-contained manner. For instance, together with an order
number, the individual items from the basket are stored as values in addition to the
customer profile. The third example shows a graph database on movies and actors
discussed in the next section.
NoSQL databases support various database models (see Fig. 1.9). Here, we discuss
graph databases as a first example to look at and discuss its characteristics.
Property Graph
Property graphs consist of nodes (concepts, objects) and directed edges
(relationships) connecting the nodes. Both nodes and edges are given a label and
can have properties. Properties are given as attribute-value pairs with the names of
attributes and the respective values.
1.4 Graph Databases 15
MOVIE
Title HAS
Year
GENRE
Type
ACTED_IN
Role DIRECTED_BY
ACTOR
Name DIRECTOR
Birthyear Name
Nationality
A graph abstractly presents the nodes and edges with their properties. Figure 1.10
shows part of a movie collection as an example. It contains the nodes MOVIE with
attributes Title and Year (of release), GENRE with the respective Type (e.g., crime,
mystery, comedy, drama, thriller, western, science fiction, documentary, etc.),
ACTOR with Name and Year of Birth, and DIRECTOR with Name and Nationality.
The example uses three directed edges: The edge ACTED_IN shows which artist
from the ACTOR node starred in which film from the MOVIE node. This edge also
has a property, the Role of the actor in the movie. The other two edges, HAS and
DIRECTED_BY, go from the MOVIE node to the GENRE and DIRECTOR node,
respectively.
In the manifestation level, i.e., the graph database, the property graph contains the
concrete values (Fig. 1.11). For each node and for each edge, a separate record is
stored. Thus, in contrast to relational databases, the connections between the data are
not stored and indexed as key references, but as separate records. This leads to
efficient processing of network analyses.
Cypher is a declarative query language for extracting patterns from graph databases.
ISO plans to extend Cypher to become the international standard for graph-based
database languages as Graph Query Language (GQL) by 2023.
Users define their query by specifying nodes and edges. The database manage-
ment system then calculates all patterns meeting the criteria by analyzing the
16 1 Database Management
ACTOR
Name: Keanu Reeves
Birthyear: 1964
AC ole
k
ar
R
TE : N
M
DIRECTED_BY
D eo
D IN
_I
ak
e : D_
N
on
ol E
R CT
A
MOVIE MOVIE
Title: Man of Tai Chi Title: The Matrix
Year: 2013 Year: 1999
possible paths (connections between nodes via edges). The user declares the struc-
ture of the desired pattern, and the database management system’s algorithms
traverse all necessary connections (paths) and assemble the results.
As described in Sect. 1.4.1, the data model of a graph database consists of nodes
(concepts, objects) and directed edges (relationships between nodes). In addition to
their name, both nodes and edges can have a set of properties (see Property Graph in
Sect. 1.4.1). These properties are represented by attribute-value pairs.
Figure 1.11 shows a segment of a graph database on movies and actors. To keep
things simple, only two types of nodes are shown: ACTOR and MOVIE. ACTOR
nodes contain two attribute-value pairs, specifically (Name: FirstName LastName)
and (YearOfBirth: Year).
The segment in Fig. 1.11 includes different types of edges: The ACTED_IN
relationship represents which actors starred in which movies. Edges can also have
properties if attribute-value pairs are added to them. For the ACTED_IN relation-
ship, the respective roles of the actors in the movies are listed. For example, Keanu
Reeves is the hacker Neo in “The Matrix.”
Nodes can be connected by multiple relationship edges. The movie “Man of Tai
Chi” and actor Keanu Reeves are linked not only by the actor’s role (ACTED_IN)
but also by the director position (DIRECTED_BY). The diagram therefore shows
that Keanu Reeves both directed the movie “Man of Tai Chi” and starred in it as
Donaka Mark.
If we want to analyze this graph database on movies, we can use Cypher. It uses
the following basic query elements:
1.4 Graph Databases 17
For instance, the Cypher query for the year the movie “The Matrix” was released
would be:
The query sends out the variable m for the movie “The Matrix” to return the
movie’s year of release by m.Year. In Cypher, parentheses always indicate nodes,
i.e., (m: Movie) declares the control variable m for the MOVIE node. In addition to
control variables, individual attribute-value pairs can be included in curly brackets.
Since we are specifically interested in the movie “The Matrix,” we can add {Title:
“The Matrix”} to the node (m: Movie).
Queries regarding the relationships within the graph database are a bit more
complicated. Relationships between two arbitrary nodes (a) and (b) are expressed
in Cypher by the arrow symbol “->,” i.e., the path from (a) to (b) is declared as “(a) -
> (b).” If the specific relationship between (a) and (b) is of importance, the edge
[r] can be inserted in the middle of the arrow. The square brackets represent edges,
and r is our variable for relationships.
Now, if we want to find out who played Neo in “The Matrix,” we use the
following query to analyze the ACTED_IN path between ACTOR and MOVIE:
Cypher will return the result Keanu Reeves. For a list of movie titles (m), actor
names (a), and respective roles (r), the query would have to be:
Since our example graph database only contains one actor and two movies, the
result would be the movie “Man of Tai Chi” with actor Keanu Reeves in the role of
Donaka Mark and the movie “The Matrix” with Keanu Reeves as Neo.
Random documents with unrelated
content Scribd suggests to you:
prohibition against accepting unsolicited donations from donors in
such states who approach us with offers to donate.
Please check the Project Gutenberg web pages for current donation
methods and addresses. Donations are accepted in a number of
other ways including checks, online payments and credit card
donations. To donate, please visit: www.gutenberg.org/donate.
Most people start at our website which has the main PG search
facility: www.gutenberg.org.