Michael Kaufmann
Andreas Meier

SQL and NoSQL Databases
Modeling, Languages, Security and Architectures for Big Data Management
Second Edition
Michael Kaufmann
Informatik
Hochschule Luzern
Rotkreuz, Switzerland

Andreas Meier
Institute of Informatics
Universität Fribourg
Fribourg, Switzerland

ISBN 978-3-031-27907-2 ISBN 978-3-031-27908-9 (eBook)


https://doi.org/10.1007/978-3-031-27908-9

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland
AG 2023
The first edition of this book was published by Springer Vieweg in 2019
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or
dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Foreword

The term database has long since become part of people’s everyday vocabulary, for
managers and clerks as well as students of most subjects. They use it to describe a
logically organized collection of electronically stored data that can be directly
searched and viewed. However, they are generally more than happy to leave the
whys and hows of its inner workings to the experts.
Users of databases are rarely aware of the immaterial and concrete business
values contained in any individual database. This applies as much to a car importer’s
spare parts inventory as to the IT solution containing all customer depots at a bank or
the patient information system of a hospital. Yet failure of these systems, or even
cumulative errors, can threaten the very existence of the respective company or
institution. For that reason, it is important for a much larger audience than just the
“database specialists” to be well-informed about what is going on. Anyone involved
with databases should understand what these tools are effectively able to do and
which conditions must be created and maintained for them to do so.
Probably the most important aspect concerning databases involves (a) the
distinction between their administration and the data stored in them (user data) and
(b) the economic magnitude of these two areas. Database administration consists of
various technical and administrative factors, from computers, database systems, and
additional storage to the experts setting up and maintaining all these components—
the aforementioned database specialists. It is crucial to keep in mind that the
administration is by far the smaller part of standard database operation, constituting
only about a quarter of the entire effort.
Most of the work and expenses concerning databases lie in gathering,
maintaining, and utilizing the user data. This includes the labor costs for all
employees who enter data into the database, revise it, retrieve information from
the database, or create files using this information. In the above examples, this means
warehouse employees, bank tellers, or hospital personnel in a wide variety of
fields—usually for several years.
In order to be able to properly evaluate the importance of the tasks connected with
data maintenance and utilization on the one hand and database administration on the
other hand, it is vital to understand and internalize this difference in the effort
required for each of them. Database administration starts with the design of the
database, which already touches on many specialized topics such as determining the
consistency checks for data manipulation or regulating data redundancies, which are
as undesirable on the logical level as they are essential on the storage level. The
development of database solutions is always targeted at their later use, so
ill-considered decisions in the development process may have a permanent impact
on everyday operations. Finding ideal solutions, such as the golden mean between
too strict and too flexible when determining consistency conditions, may require
some experience. Unduly strict conditions will interfere with regular operations,
while excessively lax rules will entail a need for repeated expensive data repairs.
To avoid such issues, it is invaluable for anyone concerned with database
development and operation, whether in management or as a database specialist, to
gain systematic insight into this field of computer science. The table of contents
gives an overview of the wide variety of topics covered in this book. The title already
shows that, in addition to an in-depth explanation of the field of conventional
databases (relational model, SQL), the book also provides highly educational
information about current advancements and related fields, the keywords being NoSQL
and Big Data. I am confident that the newest edition of this book will once again be
well-received by both students and professionals—its authors are quite familiar with
both groups.

Carl August Zehnder
Professor Emeritus for Databases
ETH Zürich
Zürich, Switzerland
Preface

It is remarkable how stable some concepts are in the field of databases. Information
technology is generally known to be subject to rapid development, bringing forth
new technologies at an unbelievable pace. However, this is only superficially the
case. Many aspects of computer science do not essentially change. This includes not
only the basics, such as the functional principles of universal computing machines,
processors, compilers, operating systems, databases and information systems, and
distributed systems, but also computer language technologies such as C, TCP/IP, or
HTML that are decades old but in many ways provide a stable foundation for the
global, earth-spanning information system known as the World Wide Web. Likewise,
the SQL language (Structured Query Language) has been in use for almost five
decades and will remain so in the foreseeable future. The theory of relational
database systems was initiated in the 1970s by Codd (relational model) and
Chamberlin and Boyce (SEQUEL). Nevertheless, these technologies still have a major
impact on the practice of data management today. Especially with the Big Data
revolution and the widespread use of data science methods for decision support,
relational databases and the use of SQL for data analysis are actually becoming more
important. Even though sophisticated statistics and machine learning are enhancing
the possibilities for knowledge extraction from data, many if not most data analyses
for decision support rely on descriptive statistics using SQL for grouped aggregation.
SQL is also used in the field of Big Data with MapReduce technology. In this
sense, although SQL database technology is quite mature, it is more relevant today
than ever.
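To make the point about grouped aggregation concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `sales` table, its columns, and its values are hypothetical, and SQLite merely stands in for any SQL database; the query groups the rows by region and computes simple descriptive statistics (count, sum, average) for each group.

```python
import sqlite3

# In-memory SQLite database used purely for illustration; the table
# and column names (sales, region, amount) are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("North", 120.0), ("South", 80.0), ("North", 30.0),
     ("South", 20.0), ("East", 50.0)],
)

# Grouped aggregation: one result row per region with descriptive statistics.
query = """
    SELECT region,
           COUNT(*)    AS n_sales,
           SUM(amount) AS total,
           AVG(amount) AS average
    FROM sales
    GROUP BY region
    ORDER BY region
"""
rows = conn.execute(query).fetchall()
for region, n_sales, total, average in rows:
    print(f"{region}: n={n_sales}, total={total}, avg={average}")
conn.close()
```

Each row of the result summarizes one group, which is the typical shape of a descriptive report for decision support.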
Nevertheless, the developments in the Big Data ecosystem brought new
technologies into the world of databases, to which we also pay due attention.
Non-relational database technologies, which find more and more fields of application
under the generic term NoSQL, differ not only superficially from the classical
relational databases but also in the underlying principles. Relational databases were
developed in the twentieth century with the purpose of tightly organized, operational
forms of data management, which provided stability but limited flexibility. In
contrast, the NoSQL database movement emerged in the beginning of the new
century, focusing on horizontal partitioning, schema flexibility, and index-free
neighborhood with the goal of solving the Big Data problems of volume, variety,
and velocity, especially in Web-scale data systems. This has far-reaching
consequences and leads to a new approach in data management, which deviates
significantly from the previous theories on the basic concept of databases: the way
data is modeled, how data is queried and manipulated, how data consistency is
handled, and how data is stored and made accessible. That is why in all chapters we
compare these two worlds, SQL and NoSQL databases.
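The index-free neighborhood named in this paragraph (listed in the contents as index-free adjacency) can be sketched roughly as follows, assuming a toy in-memory graph in Python: each node keeps direct references to its neighbors, so a traversal follows those references instead of consulting a separate global index. The `Node` class and the `reachable` function are hypothetical illustrations, not the storage layout of Neo4j or any other product.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps identity-based equality and hashing
class Node:
    """A graph node that stores direct references to its neighbors.

    Following an edge means dereferencing an object reference held by the
    node itself, not looking up a key in a separate global index.
    """
    name: str
    neighbors: list[Node] = field(default_factory=list)

    def connect(self, other: Node) -> None:
        # Directed edge self -> other, stored on the source node.
        self.neighbors.append(other)


def reachable(start: Node) -> list[str]:
    """Depth-first traversal that only follows stored neighbor references."""
    seen: set[Node] = set()
    order: list[str] = []
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node.name)
        # Push neighbors in reverse so they are visited in insertion order.
        stack.extend(reversed(node.neighbors))
    return order


# Hypothetical example graph: Alice -> Bob -> Carol, Alice -> Dave
alice, bob, carol, dave = Node("Alice"), Node("Bob"), Node("Carol"), Node("Dave")
alice.connect(bob)
bob.connect(carol)
alice.connect(dave)
print(reachable(alice))  # → ['Alice', 'Bob', 'Carol', 'Dave']
```

The cost of each traversal step here depends only on the neighbors of the current node, not on the total size of the graph, which is the intuition behind the term.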
In the first five chapters, we analyze in detail the management, modeling,
languages, security, and architecture of SQL databases, graph databases, and, new in
the second English edition, document databases. In Chaps. 6 and 7, we provide
an overview of other SQL- and NoSQL-based database approaches.
In addition to classic concepts such as the entity-relationship model and its
mapping to SQL or NoSQL database schemas, query languages, or transaction
management, we explain aspects of NoSQL databases such as the MapReduce
procedure, distribution options (fragmentation, replication), or the CAP theorem
(consistency, availability, partition tolerance).
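The basic idea of the MapReduce procedure can be illustrated with a small, sequential Python sketch, assuming hypothetical comma-separated sales records spread over two partitions. The map phase emits key-value pairs, the shuffle groups them by key, and the reduce phase aggregates each group. Real frameworks such as Hadoop distribute these phases across many machines; this single-process sketch does not attempt that, and its function names are illustrative only.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical input split into partitions, as if stored on different nodes.
partitions = [
    ["North,120", "South,80"],
    ["North,30", "South,20", "East,50"],
]


def map_phase(records: Iterable[str]) -> Iterator[tuple[str, float]]:
    """Map: turn each raw record into a (key, value) pair."""
    for record in records:
        region, amount = record.split(",")
        yield region, float(amount)


def shuffle(pairs: Iterable[tuple[str, float]]) -> dict[str, list[float]]:
    """Shuffle: collect all intermediate values belonging to the same key."""
    groups: dict[str, list[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[float]]) -> dict[str, float]:
    """Reduce: aggregate the values of each key (here, a sum), sorted by key."""
    return {key: sum(values) for key, values in sorted(groups.items())}


# In a cluster each partition would be mapped in parallel on its own node;
# here the phases simply run one after another in a single process.
intermediate = [pair for part in partitions for pair in map_phase(part)]
result = reduce_phase(shuffle(intermediate))
print(result)  # → {'East': 50.0, 'North': 150.0, 'South': 100.0}
```

Because the map step works on each partition independently and the reduce step works on each key independently, both can in principle be parallelized, which is what makes the procedure attractive for large data volumes.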
In the second English edition, we offer a new in-depth introduction to document
databases with a method for modeling document structures, an overview of the
database language MQL, as well as security and architecture aspects. The new
edition also takes into account new developments in the Cypher language. The
topic of database security is newly introduced as a separate chapter and analyzed
in detail with regard to data protection, integrity, and transactions. Texts on data
management, database programming, and data warehousing and data lakes have
been updated. In addition, the second English edition explains the concepts of JSON,
JSON Schema, BSON, index-free neighborhood, cloud databases, search engines,
and time series databases.
We have launched a Website called sql-nosql.org, where we share teaching and
tutoring materials such as slides, tutorials for SQL and Cypher, case studies, and a
workbench for MySQL and Neo4j, so that language training can be done either with
SQL or with Cypher, the graph-oriented query language of the NoSQL database
Neo4j.
We thank Alexander Denzler and Marcel Wehrle for the development of the
workbench for relational and graph-oriented databases. For the redesign of the
graphics, we were able to work with Thomas Riediker. We thank him for his tireless
efforts. He has succeeded in giving the pictures a modern style and an individual
touch. In the ninth edition, we have tried to keep his style in our new graphics. For
the further development of the tutorials and case studies, which are available on the
website sql-nosql.org, we thank the computer science students Andreas Waldis,
Bettina Willi, Markus Ineichen, and Simon Studer for their contributions to the
tutorial in Cypher, respectively, to the case study Travelblitz with OpenOffice Base
and with Neo4J. For the feedback on the manuscript, we thank Alexander Denzler,
Daniel Fasel, Konrad Marfurt, Thomas Olnhoff, and Stefan Edlich for their willingness
to contribute to the quality of our work by reading our manuscript and
providing valuable feedback. A heartfelt thank you goes out to Michael Kaufmann’s
wife Melody Reymond for proofreading our manuscript. Special thanks to Andy
Oppel of the University of California, Berkeley, for grammatical and technological
review of the English text. Many thanks go to Leonardo Milla of Springer, who has
supported us with patience and expertise.

Michael Kaufmann, Rotkreuz, Switzerland
Andreas Meier, Fribourg, Switzerland
October 2022
Contents

1 Database Management
1.1 Information Systems and Databases
1.2 SQL Databases
1.2.1 Relational Model
1.2.2 Structured Query Language SQL
1.2.3 Relational Database Management System
1.3 Big Data and NoSQL Databases
1.3.1 Big Data
1.3.2 NoSQL Database Management System
1.4 Graph Databases
1.4.1 Graph-Based Model
1.4.2 Graph Query Language Cypher
1.5 Document Databases
1.5.1 Document Model
1.5.2 Document-Oriented Database Language MQL
1.6 Organization of Data Management
Bibliography
2 Database Modeling
2.1 From Requirements Analysis to Database
2.2 The Entity-Relationship Model
2.2.1 Entities and Relationships
2.2.2 Associations and Association Types
2.2.3 Generalization and Aggregation
2.3 Implementation in the Relational Model
2.3.1 Dependencies and Normal Forms
2.3.2 Mapping Rules for Relational Databases
2.4 Implementation in the Graph Model
2.4.1 Graph Properties
2.4.2 Mapping Rules for Graph Databases
2.5 Implementation in the Document Model
2.5.1 Document-Oriented Database Modeling
2.5.2 Mapping Rules for Document Databases
2.6 Formula for Database Design
Bibliography
3 Database Languages
3.1 Interacting with Databases
3.2 Relational Algebra
3.2.1 Overview of Operators
3.2.2 Set Operators
3.2.3 Relation Operators
3.2.4 Relationally Complete Languages
3.3 Relational Language SQL
3.3.1 Creating and Populating the Database Schema
3.3.2 Relational Operators
3.3.3 Built-In Functions
3.3.4 Null Values
3.4 Graph-Based Language Cypher
3.4.1 Creating and Populating the Database Schema
3.4.2 Relation Operators
3.4.3 Built-In Functions
3.4.4 Graph Analysis
3.5 Document-Oriented Language MQL
3.5.1 Creating and Filling the Database Schema
3.5.2 Relation Operators
3.5.3 Built-In Functions
3.5.4 Null Values
3.6 Database Programming with Cursors
3.6.1 Embedding of SQL in Procedural Languages
3.6.2 Embedding Graph-Based Languages
3.6.3 Embedding Document Database Languages
Bibliography
4 Database Security
4.1 Security Goals and Measures
4.2 Access Control
4.2.1 Authentication and Authorization in SQL
4.2.2 Authentication in Cypher
4.2.3 Authentication and Authorization in MQL
4.3 Integrity Constraints
4.3.1 Relational Integrity Constraints
4.3.2 Integrity Constraints for Graphs in Cypher
4.3.3 Integrity Constraints in Document Databases with MQL
4.4 Transaction Consistency
4.4.1 Multi-user Operation
4.4.2 ACID
4.4.3 Serializability
4.4.4 Pessimistic Methods
4.4.5 Optimistic Methods
4.4.6 Recovery
4.5 Soft Consistency in Massive Distributed Data
4.5.1 BASE and the CAP Theorem
4.5.2 Nuanced Consistency Settings
4.5.3 Vector Clocks for the Serialization of Distributed Events
4.5.4 Comparing ACID and BASE
4.6 Transaction Control Language Elements
4.6.1 Transaction Control in SQL
4.6.2 Transaction Management in the Graph Database Neo4J and in the Cypher Language
4.6.3 Transaction Management in MongoDB and MQL
Bibliography
5 System Architecture
5.1 Processing of Homogeneous and Heterogeneous Data
5.2 Storage and Access Structures
5.2.1 Indexes
5.2.2 Tree Structures
5.2.3 Hashing Methods
5.2.4 Consistent Hashing
5.2.5 Multi-dimensional Data Structures
5.2.6 Binary JavaScript Object Notation BSON
5.2.7 Index-Free Adjacency
5.3 Translation and Optimization of Relational Queries
5.3.1 Creation of Query Trees
5.3.2 Optimization by Algebraic Transformation
5.3.3 Calculation of Join Operators
5.3.4 Cost-Based Optimization of Access Paths
5.4 Parallel Processing with MapReduce
5.5 Layered Architecture
5.6 Use of Different Storage Structures
5.7 Cloud Databases
Bibliography
6 Post-relational Databases
6.1 The Limits of SQL and What Lies Beyond
6.2 Federated Databases
6.3 Temporal Databases
6.4 Multi-dimensional Databases
6.5 Data Warehouse and Data Lake Systems
6.6 Object-Relational Databases
6.7 Knowledge Databases
6.8 Fuzzy Databases
Bibliography
7 NoSQL Databases
7.1 Development of Non-relational Technologies
7.2 Key-Value Stores
7.3 Column-Family Stores
7.4 Document Databases
7.5 XML Databases
7.6 Graph Databases
7.7 Search Engine Databases
7.8 Time Series Databases
Bibliography

Glossary
Index
1 Database Management

1.1 Information Systems and Databases

The evolution from the industrial society via the service society to the information
and knowledge society is represented by the assessment of information as a factor in
production. The following characteristics distinguish information from material
goods:

• Representation: Information is specified by data (signs, signals, messages, or language elements).
• Processing: Information can be transmitted, stored, categorized, found, or
converted into other representation formats using algorithms and data structures
(calculation rules).
• Combination: Information can be freely combined. The origin of individual parts
cannot be traced. Manipulation is possible at any point.
• Age: Information is not subject to physical aging processes.
• Original: Information can be copied without limit and does not distinguish
between original and copy.
• Vagueness: Information can be imprecise and of differing validity (quality).
• Medium: Information does not require a fixed medium and is therefore independent of location.

These properties clearly show that digital goods (information, software, multimedia,
etc.), i.e., data, are vastly different from material goods in both handling and
economic or legal evaluation. A good example is the loss in value that physical
products often experience when they are used—the shared use of information, on the
other hand, may increase its value. Another difference lies in the potentially high
production costs for material goods, while information can be multiplied easily and
at significantly lower costs (only computing power and storage medium). This
causes difficulties in determining property rights and ownership, even though digital
watermarks and other privacy and security measures are available.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
M. Kaufmann, A. Meier, SQL and NoSQL Databases,
https://doi.org/10.1007/978-3-031-27908-9_1

[Figure: an information system consisting of a database system (database management, database storage) and application software (user guidance, dialog design, business logic, data querying, data manipulation, access permissions, data protection), connected to the user via a communication network or WWW through requests and responses.]

Fig. 1.1 Architecture and components of information systems

Considering data as the basis of information as a production factor in a company
has significant consequences:

• Basis for decision-making: Data allows well-informed decisions, making it vital
for all organizational functions.
• Quality level: Data can be available from different sources; information quality
depends on the availability, correctness, and completeness of the data.
• Need for investments: Data gathering, storage, and processing cause work and
expenses.
• Degree of integration: Fields and holders of duties within any organization are
connected by informational relations, meaning that the fulfillment of the said
duties largely depends on the degree of data integration.

Once data is viewed as a factor in production, it must be planned, governed,
monitored, and controlled. This makes it necessary to see data management as a task
for the executive level, inducing a major change within the company. In addition to
the technical function of operating the information and communication infrastructure
(production), the planning and design of data flows (application portfolio) are crucial.
As shown in Fig. 1.1, an information system enables users to store and connect
information interactively, to ask questions, and to get answers. Depending on the
type of information system, the acceptable questions may be limited. There are,
however, open information systems and online platforms in the World Wide Web
that use search engines to process arbitrary queries.
The computer-based information system in Fig. 1.1 is connected to a communication
network such as the World Wide Web in order to allow for online interaction
Discovering Diverse Content Through
Random Scribd Documents
1.E.5. Do not copy, display, perform, distribute or redistribute
this electronic work, or any part of this electronic work, without
prominently displaying the sentence set forth in paragraph 1.E.1
with active links or immediate access to the full terms of the
Project Gutenberg™ License.

1.E.6. You may convert to and distribute this work in any binary,
compressed, marked up, nonproprietary or proprietary form,
including any word processing or hypertext form. However, if
you provide access to or distribute copies of a Project
Gutenberg™ work in a format other than “Plain Vanilla ASCII” or
other format used in the official version posted on the official
Project Gutenberg™ website (www.gutenberg.org), you must, at
no additional cost, fee or expense to the user, provide a copy, a
means of exporting a copy, or a means of obtaining a copy upon
request, of the work in its original “Plain Vanilla ASCII” or other
form. Any alternate format must include the full Project
Gutenberg™ License as specified in paragraph 1.E.1.

1.E.7. Do not charge a fee for access to, viewing, displaying,
performing, copying or distributing any Project Gutenberg™
works unless you comply with paragraph 1.E.8 or 1.E.9.

1.E.8. You may charge a reasonable fee for copies of or
providing access to or distributing Project Gutenberg™
electronic works provided that:

• You pay a royalty fee of 20% of the gross profits you derive from
the use of Project Gutenberg™ works calculated using the
method you already use to calculate your applicable taxes. The
fee is owed to the owner of the Project Gutenberg™ trademark,
but he has agreed to donate royalties under this paragraph to
the Project Gutenberg Literary Archive Foundation. Royalty
payments must be paid within 60 days following each date on
which you prepare (or are legally required to prepare) your
periodic tax returns. Royalty payments should be clearly marked
as such and sent to the Project Gutenberg Literary Archive
Foundation at the address specified in Section 4, “Information
about donations to the Project Gutenberg Literary Archive
Foundation.”

• You provide a full refund of any money paid by a user who
notifies you in writing (or by e-mail) within 30 days of receipt that
s/he does not agree to the terms of the full Project Gutenberg™
License. You must require such a user to return or destroy all
copies of the works possessed in a physical medium and
discontinue all use of and all access to other copies of Project
Gutenberg™ works.

• You provide, in accordance with paragraph 1.F.3, a full refund of
any money paid for a work or a replacement copy, if a defect in
the electronic work is discovered and reported to you within 90
days of receipt of the work.

• You comply with all other terms of this agreement for free
distribution of Project Gutenberg™ works.

1.E.9. If you wish to charge a fee or distribute a Project
Gutenberg™ electronic work or group of works on different
terms than are set forth in this agreement, you must obtain
permission in writing from the Project Gutenberg Literary
Archive Foundation, the manager of the Project Gutenberg™
trademark. Contact the Foundation as set forth in Section 3
below.

1.F.

1.F.1. Project Gutenberg volunteers and employees expend
considerable effort to identify, do copyright research on,
transcribe and proofread works not protected by U.S. copyright
law in creating the Project Gutenberg™ collection. Despite
these efforts, Project Gutenberg™ electronic works, and the
medium on which they may be stored, may contain “Defects,”
such as, but not limited to, incomplete, inaccurate or corrupt
data, transcription errors, a copyright or other intellectual
property infringement, a defective or damaged disk or other
medium, a computer virus, or computer codes that damage or
cannot be read by your equipment.

1.F.2. LIMITED WARRANTY, DISCLAIMER OF DAMAGES -
Except for the “Right of Replacement or Refund” described in
paragraph 1.F.3, the Project Gutenberg Literary Archive
Foundation, the owner of the Project Gutenberg™ trademark,
and any other party distributing a Project Gutenberg™ electronic
work under this agreement, disclaim all liability to you for
damages, costs and expenses, including legal fees. YOU
AGREE THAT YOU HAVE NO REMEDIES FOR NEGLIGENCE,
STRICT LIABILITY, BREACH OF WARRANTY OR BREACH
OF CONTRACT EXCEPT THOSE PROVIDED IN PARAGRAPH
1.F.3. YOU AGREE THAT THE FOUNDATION, THE
TRADEMARK OWNER, AND ANY DISTRIBUTOR UNDER
THIS AGREEMENT WILL NOT BE LIABLE TO YOU FOR
ACTUAL, DIRECT, INDIRECT, CONSEQUENTIAL, PUNITIVE
OR INCIDENTAL DAMAGES EVEN IF YOU GIVE NOTICE OF
THE POSSIBILITY OF SUCH DAMAGE.

1.F.3. LIMITED RIGHT OF REPLACEMENT OR REFUND - If
you discover a defect in this electronic work within 90 days of
receiving it, you can receive a refund of the money (if any) you
paid for it by sending a written explanation to the person you
received the work from. If you received the work on a physical
medium, you must return the medium with your written
explanation. The person or entity that provided you with the
defective work may elect to provide a replacement copy in lieu
of a refund. If you received the work electronically, the person or
entity providing it to you may choose to give you a second
opportunity to receive the work electronically in lieu of a refund.
If the second copy is also defective, you may demand a refund
in writing without further opportunities to fix the problem.

1.F.4. Except for the limited right of replacement or refund set
forth in paragraph 1.F.3, this work is provided to you ‘AS-IS’,
WITH NO OTHER WARRANTIES OF ANY KIND, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR
ANY PURPOSE.

1.F.5. Some states do not allow disclaimers of certain implied
warranties or the exclusion or limitation of certain types of
damages. If any disclaimer or limitation set forth in this
agreement violates the law of the state applicable to this
agreement, the agreement shall be interpreted to make the
maximum disclaimer or limitation permitted by the applicable
state law. The invalidity or unenforceability of any provision of
this agreement shall not void the remaining provisions.

1.F.6. INDEMNITY - You agree to indemnify and hold the
Foundation, the trademark owner, any agent or employee of the
Foundation, anyone providing copies of Project Gutenberg™
electronic works in accordance with this agreement, and any
volunteers associated with the production, promotion and
distribution of Project Gutenberg™ electronic works, harmless
from all liability, costs and expenses, including legal fees, that
arise directly or indirectly from any of the following which you do
or cause to occur: (a) distribution of this or any Project
Gutenberg™ work, (b) alteration, modification, or additions or
deletions to any Project Gutenberg™ work, and (c) any Defect
you cause.

Section 2. Information about the Mission of Project Gutenberg™
Project Gutenberg™ is synonymous with the free distribution of
electronic works in formats readable by the widest variety of
computers including obsolete, old, middle-aged and new
computers. It exists because of the efforts of hundreds of
volunteers and donations from people in all walks of life.

Volunteers and financial support to provide volunteers with the
assistance they need are critical to reaching Project
Gutenberg™’s goals and ensuring that the Project Gutenberg™
collection will remain freely available for generations to come. In
2001, the Project Gutenberg Literary Archive Foundation was
created to provide a secure and permanent future for Project
Gutenberg™ and future generations. To learn more about the
Project Gutenberg Literary Archive Foundation and how your
efforts and donations can help, see Sections 3 and 4 and the
Foundation information page at www.gutenberg.org.

Section 3. Information about the Project Gutenberg Literary Archive Foundation
The Project Gutenberg Literary Archive Foundation is a non-
profit 501(c)(3) educational corporation organized under the
laws of the state of Mississippi and granted tax exempt status by
the Internal Revenue Service. The Foundation’s EIN or federal
tax identification number is 64-6221541. Contributions to the
Project Gutenberg Literary Archive Foundation are tax
deductible to the full extent permitted by U.S. federal laws and
your state’s laws.

The Foundation’s business office is located at 809 North 1500
West, Salt Lake City, UT 84116, (801) 596-1887. Email contact
links and up to date contact information can be found at the
Foundation’s website and official page at
www.gutenberg.org/contact

Section 4. Information about Donations to the Project Gutenberg Literary Archive Foundation
Project Gutenberg™ depends upon and cannot survive without
widespread public support and donations to carry out its mission
of increasing the number of public domain and licensed works
that can be freely distributed in machine-readable form
accessible by the widest array of equipment including outdated
equipment. Many small donations ($1 to $5,000) are particularly
important to maintaining tax exempt status with the IRS.

The Foundation is committed to complying with the laws
regulating charities and charitable donations in all 50 states of
the United States. Compliance requirements are not uniform
and it takes a considerable effort, much paperwork and many
fees to meet and keep up with these requirements. We do not
solicit donations in locations where we have not received written
confirmation of compliance. To SEND DONATIONS or
determine the status of compliance for any particular state visit
www.gutenberg.org/donate.

While we cannot and do not solicit contributions from states
where we have not met the solicitation requirements, we know
of no prohibition against accepting unsolicited donations from
donors in such states who approach us with offers to donate.

International donations are gratefully accepted, but we cannot
make any statements concerning tax treatment of donations
received from outside the United States. U.S. laws alone swamp
our small staff.

Please check the Project Gutenberg web pages for current
donation methods and addresses. Donations are accepted in a
number of other ways including checks, online payments and
credit card donations. To donate, please visit:
www.gutenberg.org/donate.

Section 5. General Information About Project Gutenberg™ electronic works
Professor Michael S. Hart was the originator of the Project
Gutenberg™ concept of a library of electronic works that could
be freely shared with anyone. For forty years, he produced and
distributed Project Gutenberg™ eBooks with only a loose
network of volunteer support.

Project Gutenberg™ eBooks are often created from several
printed editions, all of which are confirmed as not protected by
copyright in the U.S. unless a copyright notice is included. Thus,
we do not necessarily keep eBooks in compliance with any
particular paper edition.

Most people start at our website which has the main PG search
facility: www.gutenberg.org.

This website includes information about Project Gutenberg™,
including how to make donations to the Project Gutenberg
Literary Archive Foundation, how to help produce our new
eBooks, and how to subscribe to our email newsletter to hear
about new eBooks.
