Building the Data Lakehouse

Bill Inmon
Mary Levins
Ranjeet Srivastava

Technics Publications
2 Lindsley Road
Basking Ridge, NJ 07920 USA
https://www.TechnicsPub.com
Edited by Jamie Hoberman
Cover design by Lorena Molinari
All rights reserved. No part of this book may be reproduced or
transmitted in any form or by any means, electronic or mechanical,
including photocopying, recording or by any information storage and
retrieval system, without written permission from the publisher, except
for brief quotations in a review.
The authors and publisher have taken care in the preparation of this
book, but make no expressed or implied warranty of any kind and
assume no responsibility for errors or omissions. No liability is assumed
for incidental or consequential damages in connection with or arising
out of the use of the information or programs contained herein.
All trade and product names are trademarks, registered trademarks, or
service marks of their respective companies and are the property of
their respective holders and should be treated as such.
First Printing 2021
Copyright © 2021 by Bill Inmon, Mary Levins, and Ranjeet Srivastava
ISBN, print ed.   9781634629669
ISBN, Kindle ed.  9781634629676
ISBN, ePub ed.    9781634629683
ISBN, PDF ed.     9781634629690
Library of Congress Control Number: 2021945389
           Acknowledgments
The authors would like to acknowledge the contributions
made by Bharath Gowda from Databricks Corporation.
Bharath made sure that when we started wandering off
into the desert, he led us back to civilization. We sincerely
thank Bharath for his contributions.
In addition, we thank Sean Owen and Jason Pohl from
Databricks for their contributions. Thank you, Sean and
Jason.
Contents

Introduction

Chapter 1: Evolution to the Data Lakehouse
  The evolution of technology
  All the data in the organization
  Where is business value?
  The data lake
  Current data architecture challenges
  Emergence of the data lakehouse
  Comparing data warehouse and data lake with data lakehouse

Chapter 2: Data Scientists and End Users
  The data lake
  The analytical infrastructure
  Different audiences
  The tools of analysis
  What is being analyzed?
  The analytical approaches
  Types of data

Chapter 3: Different Types of Data in the Data Lakehouse
  Types of data
  Different volumes of data
  Relating data across the diverse types of data
  Segmenting data based on probability of access
  Relating data in the IoT and the analog environment
  The analytical infrastructure

Chapter 4: The Open Environment
  The evolution of open systems
  Innovation today
  The unstructured portion builds on open standards
  Open source lakehouse software
  Lakehouse provides open APIs beyond SQL
  Lakehouse enables open data sharing
  Lakehouse supports open data exploration
  Lakehouse simplifies discovery with open data catalogs
  Lakehouse leverages cloud architecture
  An evolution to the open data lakehouse

Chapter 5: Machine Learning and the Data Lakehouse
  Machine learning
  What machine learning needs from a lakehouse
  New value from data
  Resolving the dilemma
  The problem of unstructured data
  The importance of open source
  Taking advantage of cloud elasticity
  Designing “MLOps” for a data platform
  Example: Learning to classify chest x-rays
  An evolution of the unstructured component

Chapter 6: The Analytical Infrastructure for the Data Lakehouse
  Metadata
  The data model
  Data quality
  ETL
  Textual ETL
  Taxonomies
  Volume of data
  Lineage of data
  KPIs
  Granularity
  Transactions
  Keys
  Schedule of processing
  Summarizations
  Minimum requirements

Chapter 7: Blending Data in the Data Lakehouse
  The lakehouse and the data lakehouse
  The origins of data
  Different types of analysis
  Common identifiers
  Structured identifiers
  Repetitive data
  Identifiers from the textual environment
  Combining text and structured data
  The importance of matching

Chapter 8: Types of Analysis Across the Data Lakehouse Architecture
  Known queries
  Heuristic analysis

Chapter 9: Data Lakehouse Housekeeping™
  Data integration and interoperability
  Master references for the data lakehouse
  Data lakehouse privacy, confidentiality, and data protection
  “Data future-proofing™” in a data lakehouse
  Five phases of “Data Future-proofing”
  Data lakehouse routine maintenance

Chapter 10: Visualization
  Turning data into information
  What is data visualization and why is it important?
  Data visualization, data analysis, and data interpretation
  Advantage of data visualization

Chapter 11: Data Lineage in the Data Lakehouse Architecture
  The chain of calculations
  Selection of data
  Algorithmic differences
  Lineage for the textual environment
  Lineage for the other unstructured environment
  Data lineage

Chapter 12: Probability of Access in the Data Lakehouse Architecture
  Efficient arrangement of data
  Probability of access
  Different types of data in the data lakehouse
  Relative data volume differences
  Advantages of segmentation
  Using bulk storage
  Incidental indexes

Chapter 13: Crossing the Chasm
  Merging data
  Different kinds of data
  Different business needs
  Crossing the chasm

Chapter 14: Managing Volumes of Data in the Data Lakehouse
  Distribution of the volumes of data
  High performance/bulk storage of data
  Incidental indexes and summarization
  Periodic filtering
  Tokenization of data
  Separating text and databases
  Archival storage
  Monitoring activity
  Parallel processing

Chapter 15: Data Governance and the Lakehouse
  Purpose of data governance
  Data lifecycle management
  Data quality management
  Importance of metadata management
  Data governance over time
  Types of governance
  Data governance across the lakehouse
  Data governance considerations

Index
                    Introduction
Once the world had simple applications. But in today’s
world, we have all sorts of data, technology, hardware,
and other gadgets. Data comes to us from a myriad of
places and comes in many forms. And the volume of data
is just crushing.
There are three different types of data that an organization
uses for analytical purposes. First, there is classical
structured data that principally comes from executing
transactions. This structured data has been around the
longest. Second, there is textual data from emails, call
center conversations, contracts, medical records, and
elsewhere. Once text was a “black box” that could only be
stored but not analyzed by the computer.
Now, textual Extract, Transform, and Load (ETL)
technology has opened the door of text to standard
analytical techniques. Third, there is the world of
analog/IoT. Machines of every kind, such as drones,
electric eyes, temperature gauges, and wristwatches—all
can generate data. Analog/IoT data is in a much rougher
form than structured or textual data. And there is a
tremendous amount of this data generated in an
automated manner. Analog/IoT data is the domain of the
data scientist.
At first, we threw all of this data into a pit called the “data
lake.” But we soon discovered that merely throwing data
into a pit was a pointless exercise. To be useful—to be
analyzed—data needed to (1) be related to each other and
(2) have its analytical infrastructure carefully arranged and
made available to the end user.
     Unless we meet these two conditions, the data lake turns into
            a swamp, and swamps start to smell after a while.
A data lake that does not meet the criteria for analysis is a
waste of time and money.
Enter the data lakehouse. The data lakehouse adds to the data
lake the elements it needs to become useful and productive.
Stated differently, if all you build is a data lake without
turning it into a data lakehouse, you have just created an
expensive eyesore. Over time, that eyesore will turn into an
expensive liability.
The first of those elements needed for analysis and
machine learning is the analytical infrastructure. The
analytical infrastructure contains a combination of familiar
things and some things that may not be familiar. For
example, the analytical infrastructure of the data lakehouse
contains the following (a short code sketch follows the list):
    •    Metadata
    •    Lineage of the data
   •   Volumetric measurements
   •   Historical records of creation
   •   Transformation descriptions
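To make these elements concrete, the following minimal sketch
(ours, not the book's; all names and values are hypothetical)
shows what a single catalog entry in such an analytical
infrastructure might record:

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class DatasetDescriptor:
    """One hypothetical catalog entry, covering the five elements above."""
    name: str                   # metadata: what the dataset is
    source_tables: list         # lineage: where the data came from
    row_count: int              # volumetric measurement
    created_at: datetime        # historical record of creation
    transformations: list = field(default_factory=list)  # transformation descriptions

entry = DatasetDescriptor(
    name="daily_sales",
    source_tables=["raw.pos_transactions"],
    row_count=1_250_000,
    created_at=datetime(2021, 6, 1),
    transformations=["filtered test stores", "aggregated to day level"],
)
```

An end user who finds the dataset can then see at a glance where
it came from, how big it is, and how it was shaped.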
The second essential element of the data lakehouse needed
for analysis and machine learning is recognizing and using
the universal common connector. The universal common
connector allows data of all varieties to be combined and
compared. Without the universal common connector, it is
very difficult (if not impossible) for the diverse types of
data found in the data lakehouse to be related. But with
the universal common connector, it is possible to relate any
kind of data.
With the data lakehouse, it is possible to achieve a level of
analytics and machine learning that is not feasible or
possible any other way. But like all architectural
structures, the data lakehouse requires an understanding
of architecture and an ability to plan and create a
blueprint.
CHAPTER 1
Evolution to the Data Lakehouse
Most evolutions occur over eons of time. The
            evolution occurs so slowly that the steps in the
            evolution are not observable on a day-to-day
basis. Watching the daily progression of an evolution
makes watching paint dry look like a spectator sport.
However, the evolution of computer technology has
progressed at warp speed, starting in the 1960s.
The evolution of technology
Once upon a time, life was simple when it came to the
computer. Data went in, was processed, and data came
out. In the beginning, there was paper tape. Paper tape
was automated but stored a minuscule amount of data in a
fixed format. Then came punched cards. One of the
problems with punched cards was that they were in a
fixed format. Huge volumes of punched cards consumed
huge amounts of paper and dropping a deck of cards led
to a tedious effort to get the cards back in order.
Then modern data processing began with magnetic tape,
which opened up the door to the storage and usage of
larger volumes of data not in a fixed format. The problem
with magnetic tape was that you had to search the entire
file to find a particular record. Stated differently, with
magnetic tape files, you had to search data sequentially.
And magnetic tapes were notoriously fragile, so storing
data for long periods was not advisable.
Then came disk storage. Disk storage truly opened the
door even wider to modern IT processing by introducing
direct data access. With disk storage, you could go to a
record directly, not sequentially. Although there were cost
and availability issues early on, disk storage became much
less expensive, and large volumes of disk storage became
widely available over time.
Online transaction processing (OLTP)
      The fact that data could be accessed directly opened up the
         door to high-performance, direct access applications.
With high-performance, direct data access, online transaction
processing (OLTP) systems became possible. Once OLTP systems
became available,
businesses found that computers had entered into the very
fabric of the business. Now there could be online
reservation systems, banking teller systems, ATM systems,
and the like. Now computers could directly interact with
customers.
In the early days of the computer, the computer was useful
for doing repetitive activities. But with online transaction
processing systems, the computer was useful for direct
interaction with the customer. In doing so, the business
value of the computer increased dramatically.
Computer applications
Very quickly, applications grew like weeds in the
springtime. Soon there were applications everywhere.
Figure 1-1. Lots of applications for lots of reasons.
The problem of data integrity
And with the growth of applications came a new and
unanticipated problem. In the early days of the computer,
the end user complained about not having his/her data.
But after being inundated with applications, the end user
then complained about not finding the RIGHT data.
     The end user switched from not being able to find data to not
      being able to find the right data. This sounds like an almost
              trivial shift, but it was anything but trivial.
With the proliferation of applications came the problem of
data integrity. The same data appeared in many places
with sometimes different values. To make a decision, the
end user had to find which version of the data was the right
one to use among the many available applications. Poor
business choices resulted when the end user did not find
and use the right version of data.
Figure 1-2. Trying to find the correct data on which to base decisions
    was an enormous task. (The figure shows the same data element, ABC,
    holding conflicting values of 45, 3200, -30, and 0 in different
    applications.)
The challenge of finding the right data was one that few people
understood. But over time, people began to
understand the complexity of finding the right data to use
for decision making. People discovered that they needed a
different architectural approach than simply building
more applications. Adding more machines, technology,
and consultants made matters relating to the integrity of
data worse, not better.
     Adding more technology exacerbated the problems of the lack
                         of integrity of data.
The data warehouse
Enter the data warehouse. The data warehouse led to
disparate application data being copied into a separate
physical location. Thus, the data warehouse became an
architectural solution to an architectural problem.
Figure 1-3. An entirely new infrastructure around the data warehouse
    was needed.
Merely integrating data and placing it into a physically
separate location was only the start of the architecture. To
be successful, the designer had to build an entirely new
infrastructure around the data warehouse. The infrastructure
that surrounded the data warehouse made the data found in the
data warehouse usable and easily analyzed. Stated differently,
as important as the data warehouse was, the end user found
little value in the data warehouse without the surrounding
analytical infrastructure. The analytical infrastructure
included:
    •   Metadata—a guide to what data was located where
    •   Data model—an abstraction of the data found in
        the data warehouse
    •   Data lineage—the tale of the origins and
        transformations of data found in the data
        warehouse
    •   Summarization—a description of the algorithmic
        work to create the data in the data warehouse
    •   KPIs—where are key performance indicators
        found
    •   ETL—technology that allowed applications data to
        be transformed automatically into corporate data
The issue of historical data
Data warehousing opened other doors for analytical
processing. Before data warehousing, there was no
convenient place to store older and archival data easily
and efficiently—it was normal for organizations to store a
week, a month, or even a quarter’s worth of data in their
systems. But it was rare for an organization to store a year
or five years’ worth of data. But with data warehousing,
organizations could store ten years or more.
And there was great value in being able to store a longer
spectrum of time-valued data. For example, when
organizations became interested in looking at a customer’s
buying habits, understanding past buying patterns led the
way to understanding current and future buying patterns.
          The past became a great predictor of the future.
Data warehousing then added the dimension of a greater
length of time for data storage to the world of analysis.
Now historical data was no longer a burden.
As important and useful as data warehouses are, for the
most   part,   data    warehouses        focus     on    structured,
transaction-based data. It is worth pointing out that many
other data types are not available in the structured
environment or the data warehouse.
The evolution of technology did not stop with the advent
of structured data. Soon data appeared from many
different and diverse sources. There were call centers.
There was the internet. There were machines that
produced data. Data seemed to come from everywhere.
The evolution continued well beyond structured,
transaction-based data.
The limitations of data warehouses became evident with
the increasing variety of data (text, IoT, images, audio,
videos, drones, etc.) in the enterprise. In addition, the rise
of Machine Learning (ML) and Artificial Intelligence (AI)
introduced iterative algorithms that required direct data
access to data not based on SQL.
All the data in the organization
As important and useful as data warehouses are, for the most
part, data warehouses are centered around structured data. But
now, there are many other data types in the organization. To
see what data resides in an organization, consider a simple
graph.
Figure 1-4. A simple graph.
Structured data is typically transaction-based data generated
by an organization to conduct day-to-day business activities.
Textual data, by contrast, is generated by letters, emails,
and conversations within the organization. Other unstructured
data has other sources, such as IoT, image, video, and
analog-based data.
Structured data
The first type of data to appear was structured data. For
the most part, structured data was a by-product of
transaction processing. A record was written when a
transaction was executed. This could be a sale, payment,
phone call, bank activity, or other transaction type. Each
new record had a similar structure to the previous record.
To see this similarity of processing, consider the making of
a deposit in a bank. A bank customer walks up to the teller
window and makes a deposit. The next person comes to
the window and also makes a deposit. Although the
account numbers and deposit amounts are different, the
structures of both records are the same.
   We call this “structured data” because the same data structure
                is written and rewritten repeatedly.
Typically when you have structured data, you have many
records—one for each transaction that has occurred. So
naturally, there is a high degree of business value placed
on structured data for no other reason than transactions
are very near the business’s heart.
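As a small illustration (ours, with made-up values), two
deposit records differ in their values but share one structure:

```python
# Two bank-deposit records: different values, identical structure.
deposit_1 = {"account": "100-234", "amount": 250.00, "timestamp": "2021-06-01T09:14"}
deposit_2 = {"account": "100-987", "amount": 75.50, "timestamp": "2021-06-01T09:16"}
```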
Textual data
Raw text by itself is not very useful, because text must be
paired with its context to be understood. Therefore, it is not
sufficient to merely read and analyze raw text.
      To analyze text, we must understand both the text and the
                              context of the text.
However, we need to consider other aspects of text. We
must consider that text exists in a language, such as
English, Spanish, German, etc. Also, some text is predictable,
but other text is not. Analyzing predictable text is very
different from analyzing unpredictable text. Another obstacle
to incisive analysis is
that the same word can have multiple meanings. The word
“record” can mean a vinyl recording of a song. Or it can
mean the speed of a race. Or other things. And other
obstacles await the person that tries to read and analyze
raw text.
Textual ETL
Fortunately, creating text in a structured format is a real
possibility. There is technology known as textual ETL.
With textual ETL, you can read the raw text and transform
it into a standard database format, identifying both text
and context. And in doing so, you can now start to blend
structured data and text. Or you can do an independent
analysis of the text by itself.
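The book treats textual ETL as a product category rather than
an algorithm, but a toy sketch (ours; the names and the
sentence-splitting rule are simplifications) conveys the idea
of pairing text with its context:

```python
import re

def textual_etl(doc_id, raw_text, language="English"):
    """Toy textual ETL: turn raw text into database-ready rows
    that carry both the text and its context."""
    rows = []
    for position, sentence in enumerate(re.split(r"(?<=[.!?])\s+", raw_text.strip())):
        if sentence:
            rows.append({
                "doc_id": doc_id,      # context: which document
                "position": position,  # context: where in the document
                "language": language,  # context: which language
                "text": sentence,      # the text itself
            })
    return rows

# Each row can now be loaded into a standard database table and
# blended with structured data on a shared key such as doc_id.
rows = textual_etl("claim-4711", "Customer reported a leak. Technician dispatched.")
```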
Analog data/IoT data
The operation of a machine, such as a car, watch, or
manufacturing machine, creates analog data. As long as
the machine is operating, it spews out measurements. The
measurements may be of many things—temperature,
chemical makeup, speed, time of day, etc. In fact, the
analog data may be of many different variables measured
and captured simultaneously.
       Electronic eyes, temperature monitors, video equipment,
      telemetry, timers—there are many sources of analog data.
It is normal for there to be many occurrences of analog
data. Depending on the machine and what processing is
occurring, it is normal to take measurements every second,
every ten seconds, or perhaps every minute.
In truth, most of the measurements—those within the
band of normality—may not be very interesting or useful.
But occasionally, there will be a measurement outside the
band of normality that indeed is very interesting.
The challenge in capturing and managing analog and IoT
data is in determining:
    •      What types of data to capture and measure
    •      The frequency of data capture
    •      The band of normality
Other challenges include the volume of data collected, the
need to occasionally transform the data, finding and
removing outliers, relating the analog data to other data,
and so forth. As a rule, store data inside the band of
normality in bulk storage and data outside the band of
normality in a separate store.
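A minimal sketch of that rule (ours; the band and the
destination names are hypothetical):

```python
NORMAL_BAND = (10.0, 80.0)  # assumed band of normality for this sensor

def route_reading(value):
    """In-band readings go to cheap bulk storage; out-of-band
    readings, the interesting ones, go to a separate store."""
    low, high = NORMAL_BAND
    return "bulk_storage" if low <= value <= high else "exception_store"

readings = [21.4, 22.0, 97.3, 20.9]
routed = [(r, route_reading(r)) for r in readings]
# 97.3 lands in exception_store; everything else goes to bulk_storage.
```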
Another way to store data is by relevancy to problem-solving.
Typically, certain types of data are more relevant to solving
a problem than other sorts of data.
There typically are three things that catch the attention of
the person analyzing analog data:
    •      Specific values of data
    •      Trends of data across many occurrences
    •      Correlative patterns
Other types of unstructured data
        The majority of the data generated by enterprises today falls
     under unstructured data—images, audio, and video content.
You cannot store this data in a typical database table as it
normally lacks a tabular structure. Given the massive
volume of analog and IoT data, storing and managing
these datasets is very expensive.
It isn’t easy to analyze unstructured data with SQL-only
interfaces. However, with the advent of cheap blob storage and
elastic compute in the cloud, machine learning algorithms can
access unstructured data directly, and enterprises are
beginning to understand the potential of these datasets.
Here are some emerging use cases for unstructured data.
Image Data
   •   Medical image analysis to help radiologists with X-
       Rays, CT, and MRI scans
   •   Image classification for hotels and restaurants to
       classify pictures of their properties and food
   •   Visual search for product discovery to improve the
       experience for e-commerce companies
   •   Brand identification in social media images to
       identify demographics for marketing campaigns
Audio Data
   •   Automated transcription of call-center audio data
       to help provide better customer service
    •   Conversational AI techniques to recognize speech
        and communicate in a similar way to human
        conversation
    •   Audio AI to map out the various acoustic
        signatures of machines in a manufacturing plant to
        proactively monitor the equipment
Video Data
    •   In-store video analytics to provide people counting,
        queue analysis, heat maps, etc., to understand how
        people are interacting with products
    •   Video analytics to automatically track inventory
        and also detect product faults in the manufacturing
        process
    •   Video data to provide deep usage data, helping
        policy makers and governments decide when
        public infrastructure requires maintenance work
    •   Facial recognition to allow healthcare workers to be
        alerted if and when a patient with dementia leaves
        the facility and respond appropriately
Where is business value?
There are different kinds of business value associated with
different classifications of data. First, there is business
value in day-to-day activities. Second, there is long-term
strategic business value. Third, there is business value in
the management and operation of mechanical devices.
Not surprisingly, there is a very strong relationship
between structured data and business value. The world of
transactions and structured data is where the organization
conducts its day-to-day business. And there is also a
strong relationship between textual data and business
value. Text is the very fabric of the business.
But there is a different kind of business relationship
between analog/IoT and today’s business. Organizations
are only beginning to understand the potential of
analog/IoT data today with access to massive cloud
computing resources and machine learning frameworks.
For example, organizations use image data to identify
quality defects in manufacturing, audio data in call centers
to analyze customer sentiment, and video data of remote
operations such as oil and gas pipelines to perform
predictive maintenance.
The data lake
    The data lake is an amalgamation of all of the different kinds of
                    data found in the organization.
The first type of data in the lake is structured data. The
second type of data is textual data. And the third type of
data is analog/IoT data. There are many challenges with
the data that resides in the data lake. But one of the biggest
challenges is that the form and structure of analog/IoT data
is very different from the classical structured data in the
data warehouse. To complicate matters, the volumes of
data across the different types of data found in the data
lake are very different. As a rule, there is a very large
amount of data found in the analog/IoT portion of the data
lake compared to the volume of data found in other types
of data.
The data lake is where enterprises offload all their data,
given its low-cost storage systems with a file API that
holds data in generic and open file formats, such as
Apache Parquet and ORC. The use of open formats also
made data lake data directly accessible to a wide range of
other analytics engines, such as machine learning systems.
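To illustrate the open-format point, here is a small sketch
(ours, assuming pandas with the pyarrow package installed).
Any engine that understands Parquet, whether Spark, Trino,
pandas, or an ML library, can read the same file without a
proprietary engine:

```python
import pandas as pd

# Write device readings to Parquet, an open columnar format.
df = pd.DataFrame({"device_id": [1, 2], "temp_c": [21.4, 97.3]})
df.to_parquet("readings.parquet")

# Any Parquet-aware engine can read the file back directly.
again = pd.read_parquet("readings.parquet")
```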
In the beginning, it was thought that all that was required
was to extract data and place it in the data lake. Once in
the data lake, the end user could just dive in and find data
and do analysis. However, organizations quickly
discovered that using the data in the data lake was a
completely different story than merely having the data
placed in the lake. Stated differently, the end user’s needs
were very different from the needs of the data scientist.
Figure 1-5. The data lake uses open formats.
The end user ran into all sorts of obstacles:
    •   Where was the data that was needed?
    •   How did one unit of data relate to another unit of
        data?
    •   Was the data up to date?
    •   How accurate was the data?
Many of the promises of data lakes have not been realized due
to the lack of some critical infrastructure features: no
support for transactions, no enforcement of data quality or
governance, and poor performance optimizations. As a result,
most of the data lakes in the enterprise have become data
swamps.
    In a data swamp, data just sits there and no one uses it. In the
                 data swamp, data just rots over time.
Current data architecture challenges
A common analytical approach is to use multiple
systems—a data lake, several data warehouses, and other
specialized systems, resulting in three common problems:
    1. Expensive data movement with dual architecture.
        More than 90% of analog/IoT data is stored in data
        lakes due to their flexibility, open direct access to
        files, and low-cost storage. To
        overcome the data lake’s lack of performance and
        quality issues, enterprises use ETL
        (Extract/Transform/Load) to copy a small subset of
        data in the data lake to a downstream data
        warehouse for the most important decision support
        and BI applications. This dual system architecture
        requires continuous engineering to ETL data
        between the lake and warehouse. Each ETL step
        risks incurring failures or introducing bugs that
        reduce data quality—keeping the data lake and
        warehouse consistent is difficult and costly. At the
        same time, the ETL step is also what integrates the data.
   2. Limited support for machine learning. Despite
       much research on the confluence of ML and data
       management, none of the leading machine learning
       systems, such as TensorFlow, PyTorch, and
       XGBoost, work well on top of warehouses. Unlike
       Business Intelligence (BI) which extracts a small
       amount of data, ML systems process large datasets
       using complex non-SQL code.
   3. Lack of openness. Data warehouses lock data into
       proprietary formats that increase the cost of
       migrating data or workloads to other systems.
       Given that data warehouses primarily provide
        SQL-only access, it is hard to run other analytics
        engines, such as machine learning systems, against the
        data warehouse.
Emergence of the data lakehouse
From the data swamp, there emerges a new class of data
architecture called the data lakehouse. The data lakehouse
has several components:
   •   Data from the structured environment
   •   Data from the textual environment
    •   Data from the analog/IoT environment
    •   An analytical infrastructure allowing data in the
        lakehouse to be read and understood
A new open and standardized system design enables
analog/IoT data analysis by implementing similar data
structures and data management features to those found in
a data warehouse but operating directly on the kind of
low-cost storage used for data lakes.
Figure 1-6. The data lakehouse architecture.
   The data lakehouse architecture addresses the key challenges of
   current data architectures discussed in the previous section by
               building on top of existing data lakes.
Here are the six steps to build out the analog/IoT
component of the data lakehouse architecture:
1. Taking a lake-first approach
Leverage the analog and IoT data already found in the data
lake, as the data lake already stores most structured,
textual, and other unstructured data on low-cost storage such
as Amazon S3, Azure Blob Storage, or Google Cloud Storage.
2. Bringing reliability and quality to the data lake
   •   Transaction support leverages ACID transactions
       to ensure consistency as multiple parties
       concurrently read or write data, typically using
       SQL
    •   Schema support provides for DW schema architectures
        like star/snowflake schemas and provides robust
        governance and auditing mechanisms
   •   Schema enforcement provides the ability to specify
       the desired schema and enforce it, preventing bad
       data from causing data corruption
    •   Schema evolution allows data to change constantly,
        enabling the end user to make changes to a table schema
        that can be applied automatically, without the need for
        cumbersome DDL (enforcement and evolution are both
        illustrated in the sketch after this list)
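Delta Lake (one of the open source lakehouse technologies the
book takes up in Chapter 4) is one open table format that
supplies these features. A sketch, assuming PySpark with the
delta-spark package; the path and column names are
hypothetical:

```python
from pyspark.sql import SparkSession

# Delta-enabled Spark session.
spark = (SparkSession.builder
         .config("spark.sql.extensions",
                 "io.delta.sql.DeltaSparkSessionExtension")
         .config("spark.sql.catalog.spark_catalog",
                 "org.apache.spark.sql.delta.catalog.DeltaCatalog")
         .getOrCreate())

# ACID write: readers never see a half-written table.
df = spark.range(5).withColumnRenamed("id", "reading_id")
df.write.format("delta").save("/lake/readings")

# Schema enforcement: appending a mismatched schema fails...
bad = spark.range(5).withColumnRenamed("id", "wrong_column")
# bad.write.format("delta").mode("append").save("/lake/readings")  # AnalysisException

# ...unless schema evolution is requested explicitly, without DDL.
bad.write.format("delta").mode("append") \
   .option("mergeSchema", "true").save("/lake/readings")
```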
3. Adding governance and security controls
    •   DML support through Scala, Java, Python, and SQL
        APIs to merge, update and delete datasets,
        enabling compliance with GDPR and CCPA and
        also simplifying use cases like change data capture
    •   History provides details about every change made to
        data, providing a full audit trail of the changes
    •   Data snapshots enable developers to access and
        revert to earlier versions of data for audits,
        rollbacks, or to reproduce experiments
    •   Role-based access control provides fine-grained
        security and governance at the row and column level for
        tables
4. Optimizing performance
Enable various optimization techniques, such as caching,
multi-dimensional clustering, Z-ordering, and data skipping,
by leveraging file statistics and data compaction to
right-size the files.
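For instance, Delta Lake exposes compaction and
multi-dimensional clustering through a single SQL command (a
sketch, assuming the Delta-enabled Spark session from the
earlier example; the path and column are hypothetical):

```python
# Compact small files and Z-order by a frequently filtered column
# so that queries can skip unrelated files (data skipping relies
# on the per-file statistics mentioned above).
spark.sql("OPTIMIZE delta.`/lake/readings` ZORDER BY (reading_id)")
```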
5. Supporting machine learning
    •   Support for diverse data types to store, refine,
        analyze and access data for many new applications,
       including images, video, audio, semi-structured
       data, and text
   •   Efficient non-SQL direct reads of large volumes of
       data for running machine learning experiments
       using R and Python libraries
   •   Support for DataFrame API via a built-in
       declarative DataFrame API with query
       optimizations for data access in ML workloads,
       since ML systems such as TensorFlow, PyTorch,
       and XGBoost have adopted DataFrames as the
       main abstraction for manipulating data
    •   Data versioning for ML experiments, providing snapshots
        of data that enable data science and machine learning
        teams to access and revert to earlier versions for
        audits and rollbacks, or to reproduce ML experiments
        (see the sketch after this list)
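Data versioning is something Delta Lake provides out of the
box; a sketch of reading an earlier snapshot for a reproducible
experiment (ours; the version number and path are hypothetical,
and the Delta-enabled session from the earlier sketch is
assumed):

```python
# "Time travel": read the table exactly as it was at version 3,
# so an ML experiment can be reproduced or audited later.
train_v3 = (spark.read.format("delta")
            .option("versionAsOf", 3)
            .load("/lake/readings"))

features = train_v3.toPandas()  # hand off to an ML library as a DataFrame
```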
6. Providing openness
   •   Open file formats, such as Apache Parquet and
       ORC
    •   Open APIs that can efficiently access the data
        directly, without the need for proprietary engines and
        vendor lock-in
   •   Language support for not only SQL access but also
       a variety of other tools and engines, including
       machine learning and Python/R libraries
Comparing data warehouse and data lake with data lakehouse

Data format
    Data warehouse:  Closed, proprietary format
    Data lake:       Open format
    Data lakehouse:  Open format

Types of data
    Data warehouse:  Structured data, with limited support for
                     semi-structured data
    Data lake:       All types: structured data, semi-structured
                     data, textual data, unstructured (raw) data
    Data lakehouse:  All types: structured data, semi-structured
                     data, textual data, unstructured (raw) data

Data access
    Data warehouse:  SQL-only
    Data lake:       Open APIs for direct access to files with
                     SQL, R, Python, and other languages
    Data lakehouse:  Open APIs for direct access to files with
                     SQL, R, Python, and other languages

Reliability
    Data warehouse:  High quality, reliable data with ACID
                     transactions
    Data lake:       Low quality, data swamp
    Data lakehouse:  High quality, reliable data with ACID
                     transactions

Governance and security
    Data warehouse:  Fine-grained security and governance at the
                     row and column level for tables
    Data lake:       Poor governance, as security needs to be
                     applied to files
    Data lakehouse:  Fine-grained security and governance at the
                     row and column level for tables

Performance
    Data warehouse:  High
    Data lake:       Low
    Data lakehouse:  High

Scalability
    Data warehouse:  Scaling becomes exponentially more expensive
    Data lake:       Scales to hold any amount of data at low
                     cost, regardless of type
    Data lakehouse:  Scales to hold any amount of data at low
                     cost, regardless of type

Use case support
    Data warehouse:  Limited to BI, SQL applications, and
                     decision support
    Data lake:       Limited to machine learning
    Data lakehouse:  One data architecture for BI, SQL, and
                     machine learning
         The data lakehouse architecture presents an opportunity
      comparable to the one seen during the early years of the data
        warehouse market. The unique ability of the lakehouse to
       manage data in an open environment, blend all varieties of
        data from all parts of the enterprise, and combine the data
      science focus of the data lake with the end user analytics of the
      data warehouse will unlock incredible value for organizations.
CHAPTER 2
Data Scientists and End Users
First applications, then data warehouses, and then
        came a whole host of types of data. The volume of
        data and the diversity of data were bewildering.
Soon these data types were placed in a data lake.
The data lake
Figure 2-1. The first rendition of a data lake was a repository of raw
    data. Data was simply placed into the data lake for anyone to
    analyze or use. The data lake data came from a wide variety of
    sources.
The analytical infrastructure
As time passed, we discovered the need for another data lake
component: the analytical infrastructure. The analytical
infrastructure was built from the raw data found in the data
lake, and performed many functions, such as:
    •   Identify how data related to each other
    •   Identify the timeliness of data
    •   Examine the quality of the data
    •   Identify the lineage of data
Figure 2-2. The analytical infrastructure consisted of many different
    components (metadata, taxonomies, lineage, granularity, model,
    summarization, KPI, source, document, key, record, transaction),
    which we will describe in a later chapter.
Different audiences
    The analytical infrastructure served one audience and the data
                    lake served a different audience.
The primary audience served by the data lake was the data
scientist.
Figure 2-3. The data scientist used the data lake to find new and
    interesting patterns and data trends in the organization.
The end user was the other type of community served by
the data lake and the analytical infrastructure.
Figure 2-4. The end user’s role was to keep the business moving
    forward productively and profitably on an ongoing basis.
The tools of analysis
One distinct difference between the end user and the data
scientist was the tools used to analyze data. The data
scientist primarily uses statistical analytical tools.
Occasionally, the data scientist uses exploratory tools, but
for the most part relies on statistical analysis tools.
The end user addresses data analysis in a completely
different manner. The end user uses tools of simple
calculation and visualization. The end user looks to create
charts, diagrams, and other visual representations of data.
The data scientist’s tools operate on rough accumulations of
data. The end user’s tools operate on uniform, well-defined
data.
Figure 2-5. There is a very basic difference in the data that the two
    different communities operate on. (The figure contrasts
    visualization, for the end user, with statistics, for the data
    scientist.)
What is being analyzed?
Another difference between the data scientist and end user
is that the two roles look for different things. The data
science community is looking for new and profound
patterns and trends in the data. Once the patterns and trends
are discovered, the data scientist can improve the life and
profitability of the organization.
The end user is not interested in discovering new patterns
of data. Instead, the end user is interested in recalculating
and reexamining old patterns of data. For example, the
end user is interested in monthly and quarterly KPIs
covering profitability, new customers, new types of sales,
etc.
Figure 2-6. The data that the data scientist is interested in is very
    different from the end user’s. (The figure contrasts the end
    user’s KPIs and quarterly profits with the data scientist’s
    open-ended questions, such as the mind of a 6-year-old or the
    impact of a new competitive product.)
The analytical approaches
The analytical approaches taken by the data scientist and
the end user are very different as well.
The data scientist uses a heuristic model of analysis. In the
heuristic approach, the next step of analysis depends on
the results obtained from the previous steps. When the
data scientist first starts an analysis, the data scientist does
not know what will be discovered or if anything will be
discovered. In many cases, the data scientist discovers
nothing. In other cases, the data scientist uncovers useful
patterns that have never before been seen or recognized.
The end user operates entirely differently from the data
scientist. The end user operates on the basis of regularly
occurring patterns of data. The end user relies upon simple
methods of calculation.
Figure 2-7. The end user repeats the same analysis over and over on
    different segments of time. The data scientist operates in a mode
    of discovery. (The figure contrasts calculation and regular usage
    with discovery and irregular usage.)
Types of data
The data scientist operates on data with a low level of
granularity that is widely diverse. Typically the data
scientist works with data generated by a machine. Part of
the exploration experience is the ability to roam over and
examine a wide variety of different kinds of data.
The end user operates on summarized (or lightly
summarized) data that is highly organized and appears
regularly. Each month, each week, each day, the same type
of data is examined and recalculated.
Figure 2-8. Even the types of data the different communities operate
    on are different. (The figure contrasts summarization and highly
    organized data with low granularity and a wide diversity of data.)
Given the stark differences in the needs of the different
communities, it is no surprise that the different communities
are attracted to different parts of the data lake.
Does this difference in attraction preclude the different
communities from looking at data that is foreign to them?
The answer is not at all. There is no reason why the end
user cannot look at and use the raw data found in the data
lake. And conversely, there is no reason why the data
scientist cannot use the analytical infrastructure.
Figure 2-9. The data scientist is attracted to the raw data found in
    the data lake, and the end user is attracted to the data found in
    the analytical infrastructure (metadata, taxonomies, lineage,
    granularity, model, summarization, KPI, source, document, key,
    record, transaction).
Indeed, the data scientist may well find the analytical
infrastructure to be useful. However, although data
scientists learn techniques for data analysis, when they go
into the real world, they become data garbage men, as they
spend 95% of their time cleaning data and 5% of their time
doing data analysis.
     There are then very different types of people that use the data
     lakehouse for very different purposes. The purpose of the data
           lakehouse is to serve all the different communities.
CHAPTER 3
Different Types of Data in the Data Lakehouse
      The data lakehouse is an amalgamation of different types of
      data. Each of the different types of data has its own physical
      characteristics.
Figure 3-1. A data lakehouse and its infrastructure. (The figure shows
    structured, textual, and other unstructured data flowing into the
    data lakehouse through extract/transform/load, textual ETL with
    taxonomies, and streaming, API, and app integration, all resting
    on the analytical infrastructure: transaction, record, document,
    lineage, taxonomies, summarization, model, KPI, key, source,
    metadata, and granularity.)
The data lakehouse consists of:
    •    A data lake, where raw data is placed
    •    An analytical infrastructure, where descriptive
         information is made available to the end user
    •    A collection of different kinds of data—structured,
         textual, and other unstructured data
The data found in the data lakehouse is open.
Let’s dive deeper into each of these components.
Types of data
The three different types of data found in the data
lakehouse include:
    •    Structured data—transaction-based data
    •    Textual data—data from conversations and written
         text
    •    Other unstructured data—analog data and IoT
         data, typically machine-generated data
Figure 3-2. The three different types of data found in the data
    lakehouse: structured, textual, and other unstructured.
 Another Random Document on
Scribd Without Any Related Topics
   TURNING OVER THE TRENCHES.
                           THE RELIEF.
96.—In the afternoon and in each section:
   Have all the tools and supplies collected and list drawn up ready
to hand over to successor against receipt for same.
  Inspect equipment of men that they may be taken out completely.
  Check up exact itinerary of relief in and out.
97.—At the time of relief:
  Have rifles inspected and emptied.
  Give strict orders for silence.
  Follow same marching order as when coming in.
  Have officer march in rear.
98.—On reaching billets.
  Have the roll called and sent to the officer of the day.
  Have rifles inspected.
             THE DAY AFTER THE RELIEF.
99.—Replace equipment.
  Have all arms cleaned and oiled.
  Have broken arms turned in and others issued.
  Inspect shoes, clothes, equipment, tools, and replace when
needed.
  Have special inspection of gas-masks and replace if needed.
100.—Sanitation.
   Have underwear washed, and personal cleanliness attended to,
baths, hair-cuts, etc.
  Have premises kept clean and latrines disinfected daily.
           OUT OF THE TRENCHES.
101.—Specialists' Instruction.
  While in rest billets: Have all specialists' instruction continued:
sharpshooters, bomb-throwers, signallers, etc.
102.—Bayonet exercises.
  Should be given special attention.
103.—Close and extended order drill
  and marching give the men needed exercise.
104.—Relaxation.
  should also be provided: in the form of games, contests,
entertainments, etc. They help to keep the men "fit."
105.—Efficiency.
   The company commander should make it is his constant concern
that his men be kept at the highest possible point of efficiency.
                      QUESTIONS.
    The following questions are topical. Supplements to the
  answers found in this book should be looked for in the larger
  works referred to in the preface.
          Trench Life and Trench Warfare.
    1.—What inspections should be made on the day before the
relief?
  2.—State orders to be issued one hour before departure.
   3.—What may be the marching order on the way to the trenches?
   4.—Describe precautions to be taken against the enemy's fire and
against aeroplanes.
  5.—What other precautions should be taken?
   6.—What should the company commander attend to on reaching
the trenches?
   7.—What possible improvements of trenches are obviously called
for?
  8.—What special attention should be given the parapet?
  9.—Give rules for drainage and sanitation.
   10.—What precautions may be taken against capture of fire-
trench?
  11.—What does trench warfare correspond to in open warfare?
  12.—What does the safety of a sector depend on?
  13.—What is the fundamental duty in trench warfare?
    14.—What rule determines the number of men to be posted in the
fire-trench?
   15.—Sum up their orders about firing before open terrain, before
covered terrain.
  16.—What is meant by double sentinels?
  17.—Why is listening attentively even more important than
keeping a sharp look out?
    18.—Why should the sentinels refrain from answering the enemy's
fire?
  19.—What is expected of the men in the listening posts?
  20.—When should the sentinels fire on a clear night? When, on a
dark night?
   21.—What should the sentinels do, if they hear the enemy's
digging?
  22.—When and where are sharpshooters posted and what is their
duty?
  23.—What information may patrols bring back?
   24.—When should patrols be sent out and how should they be
assigned?
   25.—What should the sentinels along a sector know about the
patrols, and the several possible patrols know about one another?
  26.—Describe dress and equipment of men on patrols.
  27.—Describe their method of advance.
  28.—What should they do on encountering a hostile patrol?
  29.—What should be the motto of men on patrol?
  30.—What are some of the most useful pieces of information about
the enemy that you should try to obtain?
  31.—What motto should you have about ammunition?
  32.—Describe several ways of leading the enemy to waste
ammunition.
   33.—What is the distinction between legitimate and illegitimate
ruses?
   34.—On what principle is the enemy's ruse of using blank
cartridges based?
  35.—How may this ruse be foiled?
   36.—What should the sentinels, and what should the men on
patrol do, when the enemy sends up flares?
  37.—How should the enemy's machine gun fire be answered?
  An Enemy's Attack.
  38.—Describe procedure when enemy's patrols are sighted by
sentinels and when an attack develops.
  39.—When are the trench mortars and the machine guns fired?
  40.—How are hand grenades thrown?
  41.—Where should the rifle fire be aimed?
  42.—When are bayonets used?
  43.—Is it sufficient to repulse an attack?
  44.—What formation should be adopted for the counter attack?
  45.—How is the advance made and the counter attack carried
out?
  46.—Describe what is meant by organization of a newly
conquered trench.
   47.—What should be done, if the enemy bombards the fire-
trench?
  48.—What should the sentinels do?
   49.—What should be done if the bombardment is back of the fire-
trench?
  50.—What general rule applies to the use of all trench artillery?
  51.—What are its ordinary objectives?
  52.—How are trench mortars handled?
  53.—What is meant by calling trench-artillery mobile weapons?
  54.—Give a general caution for the use of all ammunition.
  55.—What is essential to secure effective artillery fire?
  56.—What should be done if one's own artillery fire falls short
upon one's own trenches?
   57.—How is coordination between artillery and infantry secured in
case of a raid?
   58.—What are the principal items of the morning schedule, of the
afternoon schedule?
  59.—Describe the preparations for leaving the trenches.
  60.—What orders are given at the time of relief?
  61.—What is done before the men are dismissed to their billets?
  62.—How should the days in rest billets be utilized?
  63.—Describe a typical day in the trenches.
  64.—Describe a typical day in rest billets.
  65.—What should be the supreme aim alike of men and officers?
                  Part II.
        FRENCH INFANTRY COMBAT
              PRINCIPLES.
                   OPEN WARFARE.
106.—Is open warfare probable?
   It is improbable that, in this war, trench warfare will definitely give
place to open warfare on all sectors of the front.
   But the tactics that have forced several retirements will force
others.
   If sufficient troops are available, tried and fit and resolute, with
the necessary quantities of ammunition and improved artillery, we
shall see German arrogance and brutality in victory become again
cringing fear and demoralization in defeat; the experience of the
Marne will be repeated and the invaders will be driven out of the
territory they swarmed over through treacherous breaking of
treaties.
107.—The need of training in Infantry Combat
     Principles.
   That day the infantry will come again into its own, and its dash
and resolution will insure victory.
   To achieve this, it must be a well-trained infantry, in the old sense of
the word. Officers, non-commissioned officers, and men must have a
thorough and practical knowledge of Infantry Combat Principles.
   These should be practiced in the intervals of trench service when
the battalion is in rest billets.
  Their theory should be thoroughly mastered by all on whom may
devolve responsibility.
108.—The two phases of the Combat.
  We shall study here the two principal phases of the combat: the
approach and the attack, from the point of view of the company
commander.
109.—The Defense.
  We shall also consider the Combat from the standpoint of the
Defense.
                   THE APPROACH.
110.—All maneuvering at close range
     impossible.
   In the attack, the infantry can proceed only straight ahead. Under
infantry fire all maneuvering is impossible. Therefore by "approach"
is meant all maneuvering preparatory to the attack: it brings the
troops directly in front of and as near as possible to the objective.
 PRELIMINARY DISPOSITIONS TO START THE
              APPROACH.
111.—The orders to attack.
   The company commander will receive his orders from the
battalion commander.
112.—Equipment and Liaison.
  In the meanwhile let the lieutenants:
    a) make sure that the men are fully equipped and provided with
      full allotment of ammunition;
    b) appoint and parade connecting files (runners) to await
      orders.
113.—Distribution of Orders.
   The company commander having received his orders from the
battalion commander, will then call his subordinates and issue his
own orders accordingly, including the formation to be adopted.
114.—Combat patrols.
   He will make sure that there are combat patrols on the exposed
flank or flanks and to the front and rear if need be.
   It is well to have combat patrols detach automatically. It may be
understood, once for all, that, without further orders, the first squad
will cover in front, the second to the right, the third to the left, the
fourth to the rear, whenever needed. Still, the officer in charge
should make sure that this arrangement is carried out.
   A combat patrol, if not a full advance guard, will thus always
precede a unit and be the first to take contact with the enemy.
115.—Officers as guides.
   The officers serve as guides to their units until deployment, a
mounted officer in liaison with the advance guard or advanced
combat patrol checking up the itinerary.
116.—Keep Close Order as long as possible.
   The advance of a company into an engagement is conducted in
close order, preferably in columns of squads, until possible observation
by the enemy or the encountering of hostile fire makes it advisable to
deploy.
  Deployment should not be premature and should always follow
upon the conditions arising during the progress of the advance.
  PRECAUTIONS AGAINST HOSTILE
          ARTILLERY.
            AGAINST SILENT ARTILLERY.
117.—Nearing artillery which may open fire.
   About two or three miles from the positions liable to be occupied
by the enemy's field artillery, precautions should be taken against
the possibility of its opening fire.
118.—Deployment.
   The deployments best suited to escape observation should be
adopted:
119.—To escape direct observation:
  March in single or double file, the whole section[D] keeping closed
up so as to diminish the number of files seen from the front.
120.—Under aeroplane observation:
   Avoid especially the center of roads as they show white, utilize on
the contrary the spaces between cultivated fields of different colors,
make use of all possible cover, trees, shrubs, ditches, embankments.
Always walk in the shade when possible. If hostile aeroplanes are
flying low, halt and lie down on left side, hiding face in elbow.
                          FOOTNOTE:
        [D] The French "section" comprises 54 men. It is thus
      equivalent to 7 squads, and may be considered as 2 platoons.
   CROSSING A BOMBARDED ZONE.
121.—Case I. Artillery opening fire to register.
   A registering fire is easily recognized as the German artillery
registers either with a single percussion shell at a time, or with two
time-shells at a three-second interval.
   In the German field gun, the setting of the angle of sight[E] and
of the elevation[F] involves two operations.
122.—Oblique to right then to left.
   Therefore infantry under a registering fire should oblique forward
rapidly, first to the right and then to the left: each change of direction
obliges the enemy's gunners to repeat both setting operations.
123.—Case II. Artillery opening fire for effect.
   The zone has necessarily been previously registered. Such a zone
is easily recognized by the presence of shell holes.
124.—Avoid Zone if possible.
  It should be avoided and the advance made on its outskirts.
125.—The five cases of fire for effect.
   If this cannot be done and the fire for effect materializes, five
cases are to be distinguished, as the shells may be:
    1.   Shrapnel shells bursting at right height;
    2.   Shrapnel shells bursting high;
    3.   Time-Fuse high explosive shells bursting at right height;
    4.   Time-Fuse high explosive shells bursting high;
    5.   Percussion high explosive shells.
126.—Case 1. Burst Area of Shrapnel shells
     bursting at right height.
   The area of burst is about 250 to 300 yards in length and 30
yards in width, half the bullets falling on the first 50 yards of the
beaten zone.
127.—Protective Formation against Shrapnel.
   Advance in line of sections, in single or double file, keeping as
closed up as possible, with intervals of 30 yards between sections.
  The second line should be 250 to 300 yards behind the first.
128.—Case 2. Shrapnel shells bursting high.
   Much less dangerous than when bursting at the right height, as the
initial speed of the bullets is spent. Same formation as for Case 1.
129.—Case 3. Burst area of Time-fuse high
     explosive shells bursting at right height.
   The area of burst is opposite to that of shrapnel: short depth and
large width, only 7 to 10 yards in depth as opposed to 60 to 100 yards
in width.
130.—Protective Formation against Time-fuse
     high explosives.
   Advance in line of sections, in single or double file, keeping as closed
up as possible, with intervals of 60 to 100 yards between sections.
  The second line may be 15 yards behind the first.
131.—Case 4. Time-fuse high explosive shells
     bursting high.
   The depth of the area of burst is greater than when shells burst at
the right height; therefore widen the interval between the lines.
132.—Case 5. Burst area of percussion high
     explosive shells.
   The radius of the explosion is only about 25 yards, but the local
effect is intense, and the displacement is effective over more than
double that radius.
133.—Protective Formation against percussion
     high explosive shells.
  Advance in line of section in double file, keeping as closed up as
possible, with about 100 yards intervals between sections.
  The second line may be about 50 yards behind the first.
 GENERAL RECOMMENDATIONS AGAINST ALL
        TYPES OF EFFECTIVE FIRE.
134.—Dangerous to stop, useless to run.
   Do not stop in a zone under fire for effect, as lying down only
provides a larger target. If absolutely obliged to stop, remain
standing, packed together like sardines, maintaining the above
formations and intervals. It is useless to run; as much as possible,
advance steadily.
135.—Protective Formation against all types of
     shells.
   As may appear from the study of the above, the following
formation and intervals will afford the best protection against all
types and combinations of types of shells, as a shell will never affect
more than one section.
   Advance in line of sections, in double file, keeping as closed up as
possible, with intervals[G] of 85 to 110 yards between sections.
   The second line should be 250 to 300 yards behind the first.
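These combined figures can be checked against the burst areas quoted
in paragraphs 126 to 133. The worked comparison below is an editorial
addition, not part of the original manual; in each direction the
governing dimension is simply the largest one quoted:

    \[
    \text{interval between sections} \ge \max(30,\ 100,\ 2 \times 2 \times 25) = 100 \text{ yards}
    \]
    \[
    \text{distance between lines} \ge \max(300,\ 10,\ 100) = 300 \text{ yards}
    \]

Here 30 and 300 yards are the shrapnel width and length (par. 126),
100 and 10 yards the time-fuse high explosive width and depth
(par. 129), and 2 x 2 x 25 = 100 yards the diameter over which a
percussion shell's displacement is effective (par. 132). The first result
matches the order of the 85 to 110 yards given above, footnote [G]
noting that maximums are quoted, and the second matches the 250 to
300 yards between lines.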
                             FOOTNOTES:
         [E] Inclination of the line of sight to the horizontal.
         [F] The vertical inclination of the gun.
        [G] All through this chapter, maximum intervals are given.
      They may have to be shortened to secure closer order at the
      expense of greater safety.
SPECIAL FEATURES OF THE
      APPROACH.
    USE OF WOODS AS SHELTER ON
           THE ADVANCE.
136.—Avoid if small.
   Woods should be used for the advance or for a halt only if they are
of considerable size. Then they hide movements and provide some
shelter from fire. On the contrary, when they are small, they are to
be avoided, as they draw artillery fire and do not offer sufficient
protection.
137.—Liaison difficult.
    When advancing in woods, special care should be taken to keep
all fractions connected.
138.—Exit quickly at one time.
   To exit from a wood, take all necessary dispositions under cover so
that, on the signal of the commander, all fractions may be ready to
spring out together. They should then continue to advance as rapidly
as possible, to avoid the enemy's likely shelling of the outskirts.
139.—Otherwise exit in different places.
   If the exit cannot be made by all fractions at one time, the
elements of the second line should avoid coming out at the same
point as those of the first line.
                TO CROSS A CREST.
140.—Cross all together and rapidly.
   Let the line of sections assemble at the top of the crest, crouching
carefully below the sky line. Then, upon a concerted signal, all should
leap quickly across and down the descending slope, making bounds as
extended as possible.
   This makes the crossing fairly safe, as even the infantry will have to
modify both its elevation and angle of sight for every new position of
this quickly moving target.
 PRECAUTIONS AGAINST CAVALRY.
141.—Cavalry Patrols.
   During the whole "approach," watch should be kept for possible
cavalry patrols. The elements acting as advance guard, flank guards,
or combat patrols have, as part of their special mission, to keep the
cavalry away from the main body.
142.—Face and Fire.
   To repulse cavalry, the infantry must be able to face quickly
toward the charging horsemen and furnish a heavy fire.
143.—Protective formations.
   If cavalry patrols are expected ahead, deployment as skirmishers
will secure this; if they are expected on the flanks, deploy in columns
of squads marching in double file. A formation in echelon is effective
at all times.
144.—Repulsing the charge.
    If cavalry appears, stop, face the charge quickly, fix bayonets and
fire at will, the section leaders controlling the fire.
145.—In case of surprise.
If surprised, deploy quickly and lie down.