Building the Data Lakehouse

Bill Inmon
Mary Levins
Ranjeet Srivastava

Technics Publications
2 Lindsley Road
Basking Ridge, NJ 07920 USA
https://www.TechnicsPub.com
Edited by Jamie Hoberman
Cover design by Lorena Molinari
All rights reserved. No part of this book may be reproduced or
transmitted in any form or by any means, electronic or mechanical,
including photocopying, recording or by any information storage and
retrieval system, without written permission from the publisher, except
for brief quotations in a review.
The authors and publisher have taken care in the preparation of this
book, but make no expressed or implied warranty of any kind and
assume no responsibility for errors or omissions. No liability is assumed
for incidental or consequential damages in connection with or arising
out of the use of the information or programs contained herein.
All trade and product names are trademarks, registered trademarks, or
service marks of their respective companies and are the property of
their respective holders and should be treated as such.
First Printing 2021
Copyright © 2021 by Bill Inmon, Mary Levins, and Ranjeet Srivastava
ISBN, print ed.   9781634629669
ISBN, Kindle ed.  9781634629676
ISBN, ePub ed.    9781634629683
ISBN, PDF ed.     9781634629690
Library of Congress Control Number: 2021945389
           Acknowledgments
The authors would like to acknowledge the contributions
made by Bharath Gowda from Databricks Corporation.
Bharath made sure that when we started wandering off
into the desert, he led us back to civilization. We sincerely
thank Bharath for his contributions.
In addition, we thank Sean Owen and Jason Pohl from
Databricks for their contributions. Thank you, Sean and
Jason.
Contents

Introduction

Chapter 1: Evolution to the Data Lakehouse
  The evolution of technology
  All the data in the organization
  Where is business value?
  The data lake
  Current data architecture challenges
  Emergence of the data lakehouse
  Comparing data warehouse and data lake with data lakehouse

Chapter 2: Data Scientists and End Users
  The data lake
  The analytical infrastructure
  Different audiences
  The tools of analysis
  What is being analyzed?
  The analytical approaches
  Types of data

Chapter 3: Different Types of Data in the Data Lakehouse
  Types of data
  Different volumes of data
  Relating data across the diverse types of data
  Segmenting data based on probability of access
  Relating data in the IoT and the analog environment
  The analytical infrastructure

Chapter 4: The Open Environment
  The evolution of open systems
  Innovation today
  The unstructured portion builds on open standards
  Open source lakehouse software
  Lakehouse provides open APIs beyond SQL
  Lakehouse enables open data sharing
  Lakehouse supports open data exploration
  Lakehouse simplifies discovery with open data catalogs
  Lakehouse leverages cloud architecture
  An evolution to the open data lakehouse

Chapter 5: Machine Learning and the Data Lakehouse
  Machine learning
  What machine learning needs from a lakehouse
  New value from data
  Resolving the dilemma
  The problem of unstructured data
  The importance of open source
  Taking advantage of cloud elasticity
  Designing “MLOps” for a data platform
  Example: Learning to classify chest x-rays
  An evolution of the unstructured component

Chapter 6: The Analytical Infrastructure for the Data Lakehouse
  Metadata
  The data model
  Data quality
  ETL
  Textual ETL
  Taxonomies
  Volume of data
  Lineage of data
  KPIs
  Granularity
  Transactions
  Keys
  Schedule of processing
  Summarizations
  Minimum requirements

Chapter 7: Blending Data in the Data Lakehouse
  The lakehouse and the data lakehouse
  The origins of data
  Different types of analysis
  Common identifiers
  Structured identifiers
  Repetitive data
  Identifiers from the textual environment
  Combining text and structured data
  The importance of matching

Chapter 8: Types of Analysis Across the Data Lakehouse Architecture
  Known queries
  Heuristic analysis

Chapter 9: Data Lakehouse Housekeeping™
  Data integration and interoperability
  Master references for the data lakehouse
  Data lakehouse privacy, confidentiality, and data protection
  “Data future-proofing™” in a data lakehouse
  Five phases of “Data Future-proofing”
  Data lakehouse routine maintenance

Chapter 10: Visualization
  Turning data into information
  What is data visualization and why is it important?
  Data visualization, data analysis, and data interpretation
  Advantage of data visualization

Chapter 11: Data Lineage in the Data Lakehouse Architecture
  The chain of calculations
  Selection of data
  Algorithmic differences
  Lineage for the textual environment
  Lineage for the other unstructured environment
  Data lineage

Chapter 12: Probability of Access in the Data Lakehouse Architecture
  Efficient arrangement of data
  Probability of access
  Different types of data in the data lakehouse
  Relative data volume differences
  Advantages of segmentation
  Using bulk storage
  Incidental indexes

Chapter 13: Crossing the Chasm
  Merging data
  Different kinds of data
  Different business needs
  Crossing the chasm

Chapter 14: Managing Volumes of Data in the Data Lakehouse
  Distribution of the volumes of data
  High performance/bulk storage of data
  Incidental indexes and summarization
  Periodic filtering
  Tokenization of data
  Separating text and databases
  Archival storage
  Monitoring activity
  Parallel processing

Chapter 15: Data Governance and the Lakehouse
  Purpose of data governance
  Data lifecycle management
  Data quality management
  Importance of metadata management
  Data governance over time
  Types of governance
  Data governance across the lakehouse
  Data governance considerations

Index
                    Introduction
Once the world had simple applications. But in today’s
world, we have all sorts of data, technology, hardware,
and other gadgets. Data comes to us from a myriad of
places and comes in many forms. And the volume of data
is just crushing.
There are three different types of data that an organization
uses for analytical purposes. First, there is classical
structured data that principally comes from executing
transactions. This structured data has been around the
longest. Second, there is textual data from emails, call
center conversations, contracts, medical records, and
elsewhere. Once text was a “black box” that could only be
stored but not analyzed by the computer.
Now, textual Extract, Transform, and Load (ETL)
technology has opened the door of text to standard
analytical techniques. Third, there is the world of
analog/IoT. Machines of every kind, such as drones,
electric eyes, temperature gauges, and wristwatches—all
can generate data. Analog/IoT data is in a much rougher
form than structured or textual data. And there is a
tremendous amount of this data generated in an
automated manner. Analog/IoT data is the domain of the
data scientist.
At first, we threw all of this data into a pit called the “data
lake.” But we soon discovered that merely throwing data
into a pit was a pointless exercise. To be useful—to be
analyzed—data needed to (1) be related to each other and
(2) have its analytical infrastructure carefully arranged and
made available to the end user.
     Unless we meet these two conditions, the data lake turns into
            a swamp, and swamps start to smell after a while.
A data lake that does not meet the criteria for analysis is a
waste of time and money.
Enter the data lakehouse. The data lakehouse adds to the data
lake the elements it needs to become useful and productive.
Stated differently, if all you build is a data lake without
turning it into a data lakehouse, you have just created an
expensive eyesore. Over time, that eyesore will turn into an
expensive liability.
The first of those elements needed for analysis and
machine learning is the analytical infrastructure. The
analytical infrastructure contains a combination of familiar
things and some things that may not be familiar. For
example, the analytical infrastructure of the data lakehouse
contains the following (a short code sketch follows the list):
    •    Metadata
    •    Lineage of the data
   •   Volumetric measurements
   •   Historical records of creation
   •   Transformation descriptions
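To make these elements concrete, the following minimal sketch
(ours, not the book's; all names and values are hypothetical)
shows what a single catalog entry in such an analytical
infrastructure might record:

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class DatasetDescriptor:
    """One hypothetical catalog entry, covering the five elements above."""
    name: str                   # metadata: what the dataset is
    source_tables: list         # lineage: where the data came from
    row_count: int              # volumetric measurement
    created_at: datetime        # historical record of creation
    transformations: list = field(default_factory=list)  # transformation descriptions

entry = DatasetDescriptor(
    name="daily_sales",
    source_tables=["raw.pos_transactions"],
    row_count=1_250_000,
    created_at=datetime(2021, 6, 1),
    transformations=["filtered test stores", "aggregated to day level"],
)
```

An end user who finds the dataset can then see at a glance where
it came from, how big it is, and how it was shaped.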
The second essential element of the data lakehouse needed
for analysis and machine learning is recognizing and using
the universal common connector. The universal common
connector allows data of all varieties to be combined and
compared. Without the universal common connector, it is
very difficult (if not impossible) for the diverse types of
data found in the data lakehouse to be related. But with
the universal common connector, it is possible to relate any
kind of data.
With the data lakehouse, it is possible to achieve a level of
analytics and machine learning that is not feasible or
possible any other way. But like all architectural
structures, the data lakehouse requires an understanding
of architecture and an ability to plan and create a
blueprint.
CHAPTER 1
Evolution to the Data Lakehouse
Most evolutions occur over eons of time. The
            evolution occurs so slowly that the steps in the
            evolution are not observable on a day-to-day
basis. Watching the daily progression of an evolution
makes watching paint dry look like a spectator sport.
However, the evolution of computer technology has
progressed at warp speed, starting in the 1960s.
The evolution of technology
Once upon a time, life was simple when it came to the
computer. Data went in, was processed, and data came
out. In the beginning, there was paper tape. Paper tape
was automated but stored a minuscule amount of data in a
fixed format. Then came punched cards. One of the
problems with punched cards was that they were in a
fixed format. Huge volumes of punched cards consumed
huge amounts of paper and dropping a deck of cards led
to a tedious effort to get the cards back in order.
Then modern data processing began with magnetic tape,
which opened up the door to the storage and usage of
larger volumes of data not in a fixed format. The problem
with magnetic tape was that you had to search the entire
file to find a particular record. Stated differently, with
magnetic tape files, you had to search data sequentially.
And magnetic tapes were notoriously fragile, so storing
data for long periods was not advisable.
Then came disk storage. Disk storage truly opened the
door even wider to modern IT processing by introducing
direct data access. With disk storage, you could go to a
record directly, not sequentially. Although there were cost
and availability issues early on, disk storage became much
less expensive, and large volumes of disk storage became
widely available over time.
Online transaction processing (OLTP)
      The fact that data could be accessed directly opened up the
         door to high-performance, direct access applications.
With high-performance, direct data access, online transaction
processing (OLTP) systems became possible. Once OLTP systems
became available,
businesses found that computers had entered into the very
fabric of the business. Now there could be online
reservation systems, banking teller systems, ATM systems,
and the like. Now computers could directly interact with
customers.
In the early days of the computer, the computer was useful
for doing repetitive activities. But with online transaction
processing systems, the computer was useful for direct
interaction with the customer. In doing so, the business
value of the computer increased dramatically.
Computer applications
Very quickly, applications grew like weeds in the
springtime. Soon there were applications everywhere.
Figure 1-1. Lots of applications for lots of reasons.
The problem of data integrity
And with the growth of applications came a new and
unanticipated problem. In the early days of the computer,
the end user complained about not having his/her data.
But after being inundated with applications, the end user
then complained about not finding the RIGHT data.
     The end user switched from not being able to find data to not
      being able to find the right data. This sounds like an almost
              trivial shift, but it was anything but trivial.
With the proliferation of applications came the problem of
data integrity. The same data appeared in many places
with sometimes different values. To make a decision, the
end user had to find which version of the data was the right
one to use among the many available applications. Poor
business choices resulted when the end user did not find
and use the right version of data.
Figure 1-2. Trying to find the correct data on which to base decisions
    was an enormous task. (The figure shows the same data element, ABC,
    holding conflicting values of 45, 3200, -30, and 0 in different
    applications.)
The challenge of finding the right data was one that few people
understood. But over time, people began to
understand the complexity of finding the right data to use
for decision making. People discovered that they needed a
different architectural approach than simply building
more applications. Adding more machines, technology,
and consultants made matters relating to the integrity of
data worse, not better.
     Adding more technology exacerbated the problems of the lack
                         of integrity of data.
The data warehouse
Enter the data warehouse. The data warehouse led to
disparate application data being copied into a separate
physical location. Thus, the data warehouse became an
architectural solution to an architectural problem.
Figure 1-3. An entirely new infrastructure around the data warehouse
    was needed.
Merely integrating data and placing it into a physically
separate location was only the start of the architecture. To
be successful, the designer had to build an entirely new
infrastructure around the data warehouse. The infrastructure
that surrounded the data warehouse made the data found in the
data warehouse usable and easily analyzed. Stated differently,
as important as the data warehouse was, the end user found
little value in the data warehouse without the surrounding
analytical infrastructure. The analytical infrastructure
included:
    •   Metadata—a guide to what data was located where
    •   Data model—an abstraction of the data found in
        the data warehouse
    •   Data lineage—the tale of the origins and
        transformations of data found in the data
        warehouse
    •   Summarization—a description of the algorithmic
        work to create the data in the data warehouse
    •   KPIs—where are key performance indicators
        found
    •   ETL—technology that allowed applications data to
        be transformed automatically into corporate data
The issue of historical data
Data warehousing opened other doors for analytical
processing. Before data warehousing, there was no
convenient place to store older and archival data easily
and efficiently—it was normal for organizations to store a
week, a month, or even a quarter’s worth of data in their
systems. But it was rare for an organization to store a year
or five years’ worth of data. But with data warehousing,
organizations could store ten years or more.
And there was great value in being able to store a longer
spectrum of time-valued data. For example, when
organizations became interested in looking at a customer’s
buying habits, understanding past buying patterns led the
way to understanding current and future buying patterns.
          The past became a great predictor of the future.
Data warehousing then added the dimension of a greater
length of time for data storage to the world of analysis.
Now historical data was no longer a burden.
As important and useful as data warehouses are, for the
most   part,   data    warehouses        focus     on    structured,
transaction-based data. It is worth pointing out that many
other data types are not available in the structured
environment or the data warehouse.
The evolution of technology did not stop with the advent
of structured data. Soon data appeared from many
different and diverse sources. There were call centers.
There was the internet. There were machines that
produced data. Data seemed to come from everywhere.
The evolution continued well beyond structured,
transaction-based data.
The limitations of data warehouses became evident with
the increasing variety of data (text, IoT, images, audio,
videos, drones, etc.) in the enterprise. In addition, the rise
of Machine Learning (ML) and Artificial Intelligence (AI)
introduced iterative algorithms that required direct data
access to data not based on SQL.
All the data in the organization
As important and useful as data warehouses are, for the most
part, data warehouses are centered around structured data. But
now, there are many other data types in the organization. To
see what data resides in an organization, consider a simple
graph.
Figure 1-4. A simple graph.
Structured data is typically transaction-based data generated
by an organization to conduct day-to-day business activities.
Textual data, by contrast, is generated by letters, emails,
and conversations within the organization. Other unstructured
data has other sources, such as IoT, image, video, and
analog-based data.
Structured data
The first type of data to appear was structured data. For
the most part, structured data was a by-product of
transaction processing. A record was written when a
transaction was executed. This could be a sale, payment,
phone call, bank activity, or other transaction type. Each
new record had a similar structure to the previous record.
To see this similarity of processing, consider the making of
a deposit in a bank. A bank customer walks up to the teller
window and makes a deposit. The next person comes to
the window and also makes a deposit. Although the
account numbers and deposit amounts are different, the
structures of both records are the same.
   We call this “structured data” because the same data structure
                is written and rewritten repeatedly.
Typically when you have structured data, you have many
records—one for each transaction that has occurred. So
naturally, there is a high degree of business value placed
on structured data for no other reason than transactions
are very near the business’s heart.
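As a small illustration (ours, with made-up values), two
deposit records differ in their values but share one structure:

```python
# Two bank-deposit records: different values, identical structure.
deposit_1 = {"account": "100-234", "amount": 250.00, "timestamp": "2021-06-01T09:14"}
deposit_2 = {"account": "100-987", "amount": 75.50, "timestamp": "2021-06-01T09:16"}
```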
Textual data
Raw text by itself is not very useful, because text must be
paired with its context to be understood. Therefore, it is not
sufficient to merely read and analyze raw text.
      To analyze text, we must understand both the text and the
                              context of the text.
However, we need to consider other aspects of text. We
must consider that text exists in a language, such as
English, Spanish, German, etc. Also, some text is predictable,
but other text is not. Analyzing predictable text is very
different from analyzing unpredictable text. Another obstacle
to incisive analysis is
that the same word can have multiple meanings. The word
“record” can mean a vinyl recording of a song. Or it can
mean the speed of a race. Or other things. And other
obstacles await the person that tries to read and analyze
raw text.
Textual ETL
Fortunately, creating text in a structured format is a real
possibility. There is technology known as textual ETL.
With textual ETL, you can read the raw text and transform
it into a standard database format, identifying both text
and context. And in doing so, you can now start to blend
structured data and text. Or you can do an independent
analysis of the text by itself.
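The book treats textual ETL as a product category rather than
an algorithm, but a toy sketch (ours; the names and the
sentence-splitting rule are simplifications) conveys the idea
of pairing text with its context:

```python
import re

def textual_etl(doc_id, raw_text, language="English"):
    """Toy textual ETL: turn raw text into database-ready rows
    that carry both the text and its context."""
    rows = []
    for position, sentence in enumerate(re.split(r"(?<=[.!?])\s+", raw_text.strip())):
        if sentence:
            rows.append({
                "doc_id": doc_id,      # context: which document
                "position": position,  # context: where in the document
                "language": language,  # context: which language
                "text": sentence,      # the text itself
            })
    return rows

# Each row can now be loaded into a standard database table and
# blended with structured data on a shared key such as doc_id.
rows = textual_etl("claim-4711", "Customer reported a leak. Technician dispatched.")
```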
Analog data/IoT data
The operation of a machine, such as a car, watch, or
manufacturing machine, creates analog data. As long as
the machine is operating, it spews out measurements. The
measurements may be of many things—temperature,
chemical makeup, speed, time of day, etc. In fact, the
analog data may be of many different variables measured
and captured simultaneously.
       Electronic eyes, temperature monitors, video equipment,
      telemetry, timers—there are many sources of analog data.
It is normal for there to be many occurrences of analog
data. Depending on the machine and what processing is
occurring, it is normal to take measurements every second,
every ten seconds, or perhaps every minute.
In truth, most of the measurements—those within the
band of normality—may not be very interesting or useful.
But occasionally, there will be a measurement outside the
band of normality that indeed is very interesting.
The challenge in capturing and managing analog and IoT
data is in determining:
    •      What types of data to capture and measure
    •      The frequency of data capture
    •      The band of normality
Other challenges include the volume of data collected, the
need to occasionally transform the data, finding and
removing outliers, relating the analog data to other data,
and so forth. As a rule, store data inside the band of
normality in bulk storage and data outside the band of
normality in a separate store.
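A minimal sketch of that rule (ours; the band and the
destination names are hypothetical):

```python
NORMAL_BAND = (10.0, 80.0)  # assumed band of normality for this sensor

def route_reading(value):
    """In-band readings go to cheap bulk storage; out-of-band
    readings, the interesting ones, go to a separate store."""
    low, high = NORMAL_BAND
    return "bulk_storage" if low <= value <= high else "exception_store"

readings = [21.4, 22.0, 97.3, 20.9]
routed = [(r, route_reading(r)) for r in readings]
# 97.3 lands in exception_store; everything else goes to bulk_storage.
```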
Another way to store data is by relevancy to problem-solving.
Typically, certain types of data are more relevant to solving
a problem than other sorts of data.
There typically are three things that catch the attention of
the person analyzing analog data:
    •      Specific values of data
    •      Trends of data across many occurrences
    •      Correlative patterns
Other types of unstructured data
        The majority of the data generated by enterprises today falls
     under unstructured data—images, audio, and video content.
You cannot store this data in a typical database table as it
normally lacks a tabular structure. Given the massive
volume of analog and IoT data, storing and managing
these datasets is very expensive.
It isn’t easy to analyze unstructured data with SQL-only
interfaces. However, with the advent of cheap blob storage and
elastic compute in the cloud, machine learning algorithms can
access unstructured data directly, and enterprises are
beginning to understand the potential of these datasets.
Here are some emerging use cases for unstructured data.
Image Data
   •   Medical image analysis to help radiologists with X-
       Rays, CT, and MRI scans
   •   Image classification for hotels and restaurants to
       classify pictures of their properties and food
   •   Visual search for product discovery to improve the
       experience for e-commerce companies
   •   Brand identification in social media images to
       identify demographics for marketing campaigns
Audio Data
   •   Automated transcription of call-center audio data
       to help provide better customer service
    •   Conversational AI techniques to recognize speech
        and communicate in a similar way to human
        conversation
    •   Audio AI to map out the various acoustic
        signatures of machines in a manufacturing plant to
        proactively monitor the equipment
Video Data
    •   In-store video analytics to provide people counting,
        queue analysis, heat maps, etc., to understand how
        people are interacting with products
    •   Video analytics to automatically track inventory
        and also detect product faults in the manufacturing
        process
    •   Video data to provide deep usage data, helping
        policy makers and governments decide when
        public infrastructure requires maintenance work
    •   Facial recognition to allow healthcare workers to be
        alerted if and when a patient with dementia leaves
        the facility and respond appropriately
Where is business value?
There are different kinds of business value associated with
different classifications of data. First, there is business
value in day-to-day activities. Second, there is long-term
strategic business value. Third, there is business value in
the management and operation of mechanical devices.
Not surprisingly, there is a very strong relationship
between structured data and business value. The world of
transactions and structured data is where the organization
conducts its day-to-day business. And there is also a
strong relationship between textual data and business
value. Text is the very fabric of the business.
But there is a different kind of business relationship
between analog/IoT and today’s business. Organizations
are only beginning to understand the potential of
analog/IoT data today with access to massive cloud
computing resources and machine learning frameworks.
For example, organizations use image data to identify
quality defects in manufacturing, audio data in call centers
to analyze customer sentiment, and video data of remote
operations such as oil and gas pipelines to perform
predictive maintenance.
The data lake
    The data lake is an amalgamation of all of the different kinds of
                    data found in the organization.
The first type of data in the lake is structured data. The
second type of data is textual data. And the third type of
data is analog/IoT data. There are many challenges with
the data that resides in the data lake. But one of the biggest
challenges is that the form and structure of analog/IoT data
is very different from the classical structured data in the
data warehouse. To complicate matters, the volumes of
data across the different types of data found in the data
lake are very different. As a rule, there is a very large
amount of data found in the analog/IoT portion of the data
lake compared to the volume of data found in other types
of data.
The data lake is where enterprises offload all their data,
given its low-cost storage systems with a file API that
holds data in generic and open file formats, such as
Apache Parquet and ORC. The use of open formats also
made data lake data directly accessible to a wide range of
other analytics engines, such as machine learning systems.
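To illustrate the open-format point, here is a small sketch
(ours, assuming pandas with the pyarrow package installed).
Any engine that understands Parquet, whether Spark, Trino,
pandas, or an ML library, can read the same file without a
proprietary engine:

```python
import pandas as pd

# Write device readings to Parquet, an open columnar format.
df = pd.DataFrame({"device_id": [1, 2], "temp_c": [21.4, 97.3]})
df.to_parquet("readings.parquet")

# Any Parquet-aware engine can read the file back directly.
again = pd.read_parquet("readings.parquet")
```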
In the beginning, it was thought that all that was required
was to extract data and place it in the data lake. Once in
the data lake, the end user could just dive in and find data
and do analysis. However, organizations quickly
discovered that using the data in the data lake was a
completely different story than merely having the data
placed in the lake. Stated differently, the end user’s needs
were very different from the needs of the data scientist.
Figure 1-5. The data lake uses open formats.
The end user ran into all sorts of obstacles:
    •   Where was the data that was needed?
    •   How did one unit of data relate to another unit of
        data?
    •   Was the data up to date?
    •   How accurate was the data?
Many of the promises of data lakes have not been realized due
to the lack of some critical infrastructure features: no
support for transactions, no enforcement of data quality or
governance, and poor performance optimizations. As a result,
most of the data lakes in the enterprise have become data
swamps.
    In a data swamp, data just sits there and no one uses it. In the
                 data swamp, data just rots over time.
Current data architecture challenges
A common analytical approach is to use multiple
systems—a data lake, several data warehouses, and other
specialized systems, resulting in three common problems:
    1. Expensive data movement with dual architecture.
        More than 90% of analog/IoT data is stored in data
        lakes due to their flexibility, open direct access to
        files, and low-cost storage. To
        overcome the data lake’s lack of performance and
        quality issues, enterprises use ETL
        (Extract/Transform/Load) to copy a small subset of
        data in the data lake to a downstream data
        warehouse for the most important decision support
        and BI applications. This dual system architecture
        requires continuous engineering to ETL data
        between the lake and warehouse. Each ETL step
        risks incurring failures or introducing bugs that
        reduce data quality—keeping the data lake and
        warehouse consistent is difficult and costly. At the
        same time, the ETL step is also what integrates the data.
   2. Limited support for machine learning. Despite
       much research on the confluence of ML and data
       management, none of the leading machine learning
       systems, such as TensorFlow, PyTorch, and
       XGBoost, work well on top of warehouses. Unlike
       Business Intelligence (BI) which extracts a small
       amount of data, ML systems process large datasets
       using complex non-SQL code.
   3. Lack of openness. Data warehouses lock data into
       proprietary formats that increase the cost of
       migrating data or workloads to other systems.
       Given that data warehouses primarily provide
        SQL-only access, it is hard to run other analytics
        engines, such as machine learning systems, against the
        data warehouse.
Emergence of the data lakehouse
From the data swamp, there emerges a new class of data
architecture called the data lakehouse. The data lakehouse
has several components:
   •   Data from the structured environment
   •   Data from the textual environment
    •   Data from the analog/IoT environment
    •   An analytical infrastructure allowing data in the
        lakehouse to be read and understood
A new open and standardized system design enables
analog/IoT data analysis by implementing similar data
structures and data management features to those found in
a data warehouse but operating directly on the kind of
low-cost storage used for data lakes.
Figure 1-6. The data lakehouse architecture.
   The data lakehouse architecture addresses the key challenges of
   current data architectures discussed in the previous section by
               building on top of existing data lakes.
Here are the six steps to build out the analog/IoT
component of the data lakehouse architecture:
1. Taking a lake-first approach
Leverage the analog and IoT data already found in the data
lake, as the data lake already stores most structured,
textual, and other unstructured data on low-cost storage such
as Amazon S3, Azure Blob Storage, or Google Cloud Storage.
2. Bringing reliability and quality to the data lake
   •   Transaction support leverages ACID transactions
       to ensure consistency as multiple parties
       concurrently read or write data, typically using
       SQL
    •   Schema support provides for DW schema architectures
        like star/snowflake schemas and provides robust
        governance and auditing mechanisms
   •   Schema enforcement provides the ability to specify
       the desired schema and enforce it, preventing bad
       data from causing data corruption
    •   Schema evolution allows data to change constantly,
        enabling the end user to make changes to a table schema
        that can be applied automatically, without the need for
        cumbersome DDL (enforcement and evolution are both
        illustrated in the sketch after this list)
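Delta Lake (one of the open source lakehouse technologies the
book takes up in Chapter 4) is one open table format that
supplies these features. A sketch, assuming PySpark with the
delta-spark package; the path and column names are
hypothetical:

```python
from pyspark.sql import SparkSession

# Delta-enabled Spark session.
spark = (SparkSession.builder
         .config("spark.sql.extensions",
                 "io.delta.sql.DeltaSparkSessionExtension")
         .config("spark.sql.catalog.spark_catalog",
                 "org.apache.spark.sql.delta.catalog.DeltaCatalog")
         .getOrCreate())

# ACID write: readers never see a half-written table.
df = spark.range(5).withColumnRenamed("id", "reading_id")
df.write.format("delta").save("/lake/readings")

# Schema enforcement: appending a mismatched schema fails...
bad = spark.range(5).withColumnRenamed("id", "wrong_column")
# bad.write.format("delta").mode("append").save("/lake/readings")  # AnalysisException

# ...unless schema evolution is requested explicitly, without DDL.
bad.write.format("delta").mode("append") \
   .option("mergeSchema", "true").save("/lake/readings")
```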
3. Adding governance and security controls
    •   DML support through Scala, Java, Python, and SQL
        APIs to merge, update and delete datasets,
        enabling compliance with GDPR and CCPA and
        also simplifying use cases like change data capture
    •   History provides details about every change made to
        data, providing a full audit trail of the changes
    •   Data snapshots enable developers to access and
        revert to earlier versions of data for audits,
        rollbacks, or to reproduce experiments
    •   Role-based access control provides fine-grained
        security and governance at the row and column level for
        tables
4. Optimizing performance
Enable various optimization techniques, such as caching,
multi-dimensional clustering, Z-ordering, and data skipping,
by leveraging file statistics and data compaction to
right-size the files.
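For instance, Delta Lake exposes compaction and
multi-dimensional clustering through a single SQL command (a
sketch, assuming the Delta-enabled Spark session from the
earlier example; the path and column are hypothetical):

```python
# Compact small files and Z-order by a frequently filtered column
# so that queries can skip unrelated files (data skipping relies
# on the per-file statistics mentioned above).
spark.sql("OPTIMIZE delta.`/lake/readings` ZORDER BY (reading_id)")
```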
5. Supporting machine learning
    •   Support for diverse data types to store, refine,
        analyze and access data for many new applications,
       including images, video, audio, semi-structured
       data, and text
   •   Efficient non-SQL direct reads of large volumes of
       data for running machine learning experiments
       using R and Python libraries
   •   Support for DataFrame API via a built-in
       declarative DataFrame API with query
       optimizations for data access in ML workloads,
       since ML systems such as TensorFlow, PyTorch,
       and XGBoost have adopted DataFrames as the
       main abstraction for manipulating data
    •   Data versioning for ML experiments, providing snapshots
        of data that enable data science and machine learning
        teams to access and revert to earlier versions for
        audits and rollbacks, or to reproduce ML experiments
        (see the sketch after this list)
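Data versioning is something Delta Lake provides out of the
box; a sketch of reading an earlier snapshot for a reproducible
experiment (ours; the version number and path are hypothetical,
and the Delta-enabled session from the earlier sketch is
assumed):

```python
# "Time travel": read the table exactly as it was at version 3,
# so an ML experiment can be reproduced or audited later.
train_v3 = (spark.read.format("delta")
            .option("versionAsOf", 3)
            .load("/lake/readings"))

features = train_v3.toPandas()  # hand off to an ML library as a DataFrame
```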
6. Providing openness
   •   Open file formats, such as Apache Parquet and
       ORC
    •   Open APIs that can efficiently access the data
        directly, without the need for proprietary engines and
        vendor lock-in
   •   Language support for not only SQL access but also
       a variety of other tools and engines, including
       machine learning and Python/R libraries
Comparing data warehouse and data lake with data lakehouse

Data format
    Data warehouse:  Closed, proprietary format
    Data lake:       Open format
    Data lakehouse:  Open format

Types of data
    Data warehouse:  Structured data, with limited support for
                     semi-structured data
    Data lake:       All types: structured data, semi-structured
                     data, textual data, unstructured (raw) data
    Data lakehouse:  All types: structured data, semi-structured
                     data, textual data, unstructured (raw) data

Data access
    Data warehouse:  SQL-only
    Data lake:       Open APIs for direct access to files with
                     SQL, R, Python, and other languages
    Data lakehouse:  Open APIs for direct access to files with
                     SQL, R, Python, and other languages

Reliability
    Data warehouse:  High quality, reliable data with ACID
                     transactions
    Data lake:       Low quality, data swamp
    Data lakehouse:  High quality, reliable data with ACID
                     transactions

Governance and security
    Data warehouse:  Fine-grained security and governance at the
                     row and column level for tables
    Data lake:       Poor governance, as security needs to be
                     applied to files
    Data lakehouse:  Fine-grained security and governance at the
                     row and column level for tables

Performance
    Data warehouse:  High
    Data lake:       Low
    Data lakehouse:  High

Scalability
    Data warehouse:  Scaling becomes exponentially more expensive
    Data lake:       Scales to hold any amount of data at low
                     cost, regardless of type
    Data lakehouse:  Scales to hold any amount of data at low
                     cost, regardless of type

Use case support
    Data warehouse:  Limited to BI, SQL applications, and
                     decision support
    Data lake:       Limited to machine learning
    Data lakehouse:  One data architecture for BI, SQL, and
                     machine learning
         The data lakehouse architecture presents an opportunity
      comparable to the one seen during the early years of the data
        warehouse market. The unique ability of the lakehouse to
       manage data in an open environment, blend all varieties of
        data from all parts of the enterprise, and combine the data
      science focus of the data lake with the end user analytics of the
      data warehouse will unlock incredible value for organizations.
CHAPTER 2
Data Scientists and End Users
First applications, then data warehouses, and then
        came a whole host of types of data. The volume of
        data and the diversity of data were bewildering.
Soon these data types were placed in a data lake.
The data lake
Figure 2-1. The first rendition of a data lake was a repository of raw
    data. Data was simply placed into the data lake for anyone to
    analyze or use. The data lake data came from a wide variety of
    sources.
The analytical infrastructure
As time passed, we discovered the need for another data lake
component: the analytical infrastructure. The analytical
infrastructure was built from the raw data found in the data
lake, and performed many functions, such as:
    •   Identify how data related to each other
    •   Identify the timeliness of data
    •   Examine the quality of the data
    •   Identify the lineage of data
Figure 2-2. The analytical infrastructure consisted of many different
    components (metadata, taxonomies, lineage, granularity, model,
    summarization, KPI, source, document, key, record, transaction),
    which we will describe in a later chapter.
Different audiences
    The analytical infrastructure served one audience and the data
                    lake served a different audience.
The primary audience served by the data lake was the data
scientist.
Figure 2-3. The data scientist used the data lake to find new and
    interesting patterns and data trends in the organization.
The end user was the other type of community served by
the data lake and the analytical infrastructure.
Figure 2-4. The end user’s role was to keep the business moving
    forward productively and profitably on an ongoing basis.
The tools of analysis
One distinct difference between the end user and the data
scientist was the tools used to analyze data. The data
scientist primarily uses statistical analytical tools.
Occasionally, the data scientist uses exploratory tools, but
for the most part relies on statistical analysis tools.
The end user addresses data analysis in a completely
different manner. The end user uses tools of simple
calculation and visualization. The end user looks to create
charts, diagrams, and other visual representations of data.
The data scientist’s tools operate on rough accumulations of
data. The end user’s tools operate on uniform, well-defined
data.
Figure 2-5. There is a very basic difference in the data that the two
    different communities operate on. (The figure contrasts
    visualization, for the end user, with statistics, for the data
    scientist.)
What is being analyzed?
Another difference between the data scientist and end user
is that the two roles look for different things. The data
science community is looking for new and profound
patterns and trends in the data. Once the patterns and trends
are discovered, the data scientist can improve the life and
profitability of the organization.
The end user is not interested in discovering new patterns
of data. Instead, the end user is interested in recalculating
and reexamining old patterns of data. For example, the
end user is interested in monthly and quarterly KPIs
covering profitability, new customers, new types of sales,
etc.
Figure 2-6. The data that the data scientist is interested in is very
    different from the end user’s. (The figure contrasts the end
    user’s KPIs and quarterly profits with the data scientist’s
    open-ended questions, such as the mind of a 6-year-old or the
    impact of a new competitive product.)
The analytical approaches
The analytical approaches taken by the data scientist and
the end user are very different as well.
The data scientist uses a heuristic model of analysis. In the
heuristic approach, the next step of analysis depends on
the results obtained from the previous steps. When the
data scientist first starts an analysis, the data scientist does
not know what will be discovered or if anything will be
discovered. In many cases, the data scientist discovers
nothing. In other cases, the data scientist uncovers useful
patterns that have never before been seen or recognized.
The end user operates entirely differently from the data
scientist. The end user operates on the basis of regularly
occurring patterns of data. The end user relies upon simple
methods of calculation.
Figure 2-7. The end user repeats the same analysis over and over on
    different segments of time. The data scientist operates in a mode
    of discovery. (The figure contrasts calculation and regular usage
    with discovery and irregular usage.)
Types of data
The data scientist operates on data with a low level of
granularity that is widely diverse. Typically the data
scientist works with data generated by a machine. Part of
the exploration experience is the ability to roam over and
examine a wide variety of different kinds of data.
The end user operates on summarized (or lightly
summarized) data that is highly organized and appears
regularly. Each month, each week, each day, the same type
of data is examined and recalculated.
Figure 2-8. Even the types of data the different communities operate
    on are different. (The figure contrasts summarization and highly
    organized data with low granularity and a wide diversity of data.)
Given the stark differences in the needs of the different
communities, it is no surprise that the different communities
are attracted to different parts of the data lake.
Does this difference in attraction preclude the different
communities from looking at data that is foreign to them?
The answer is not at all. There is no reason why the end
user cannot look at and use the raw data found in the data
lake. And conversely, there is no reason why the data
scientist cannot use the analytical infrastructure.
Figure 2-9. The data scientist is attracted to the raw data found in
    the data lake, and the end user is attracted to the data found in
    the analytical infrastructure (metadata, taxonomies, lineage,
    granularity, model, summarization, KPI, source, document, key,
    record, transaction).
Indeed, the data scientist may well find the analytical
infrastructure to be useful. However, although data
scientists learn techniques for data analysis, when they go
into the real world, they become data garbage men, as they
spend 95% of their time cleaning data and 5% of their time
doing data analysis.
     There are then very different types of people that use the data
     lakehouse for very different purposes. The purpose of the data
           lakehouse is to serve all the different communities.
CHAPTER 3
Different Types of Data in the Data Lakehouse
      The data lakehouse is an amalgamation of different types of
      data. Each of the different types of data has its own physical
      characteristics.
Figure 3-1. A data lakehouse and its infrastructure. (The figure shows
    structured, textual, and other unstructured data flowing into the
    data lakehouse through extract/transform/load, textual ETL with
    taxonomies, and streaming, API, and app integration, all resting
    on the analytical infrastructure: transaction, record, document,
    lineage, taxonomies, summarization, model, KPI, key, source,
    metadata, and granularity.)
The data lakehouse consists of:
    •    A data lake, where raw data is placed
    •    An analytical infrastructure, where descriptive
         information is made available to the end user
    •    A collection of different kinds of data—structured,
         textual, and other unstructured data
The data found in the data lakehouse is open.
Let’s dive deeper into each of these components.
Types of data
The three different types of data found in the data
lakehouse include:
    •    Structured data—transaction-based data
    •    Textual data—data from conversations and written
         text
    •    Other unstructured data—analog data and IoT
         data, typically machine-generated data
Figure 3-2. The three different types of data found in the data
    lakehouse: structured, textual, and other unstructured.
 Another Random Document on
Scribd Without Any Related Topics
   TURNING OVER THE TRENCHES.
                           THE RELIEF.
96.—In the afternoon and in each section:
   Have all the tools and supplies collected and list drawn up ready
to hand over to successor against receipt for same.
  Inspect equipment of men that they may be taken out completely.
  Check up exact itinerary of relief in and out.
97.—At the time of relief:
  Have rifles inspected and emptied.
  Give strict orders for silence.
  Follow same marching order as when coming in.
  Have officer march in rear.
98.—On reaching billets.
  Have the roll called and sent to the officer of the day.
  Have rifles inspected.
             THE DAY AFTER THE RELIEF.
99.—Replace equipment.
  Have all arms cleaned and oiled.
  Have broken arms turned in and others issued.
  Inspect shoes, clothes, equipment, tools, and replace when
needed.
  Have special inspection of gas-masks and replace if needed.
100.—Sanitation.
   Have underwear washed, and personal cleanliness attended to,
baths, hair-cuts, etc.
  Have premises kept clean and latrines disinfected daily.
           OUT OF THE TRENCHES.
101.—Specialists' Instruction.
  While in rest billets: Have all specialists' instruction continued:
sharpshooters, bomb-throwers, signallers, etc.
102.—Bayonet exercises.
  Should be given special attention.
103.—Close and extended order drill
  and marching give the men needed exercise.
104.—Relaxation.
  should also be provided: in the form of games, contests,
entertainments, etc. They help to keep the men "fit."
105.—Efficiency.
   The company commander should make it is his constant concern
that his men be kept at the highest possible point of efficiency.
                      QUESTIONS.
    The following questions are topical. Supplements to the
  answers found in this book should be looked for in the larger
  works referred to in the preface.
          Trench Life and Trench Warfare.
    1.—What inspections should be made on the day before the
relief?
  2.—State orders to be issued one hour before departure.
   3.—What may be the marching order on the way to the trenches?
   4.—Describe precautions to be taken against the enemy's fire and
against aeroplanes.
  5.—What other precautions should be taken?
   6.—What should the company commander attend to on reaching
the trenches?
   7.—What possible improvements of trenches are obviously called
for?
  8.—What special attention should be given the parapet?
  9.—Give rules for drainage and sanitation.
   10.—What precautions may be taken against capture of fire-
trench?
  11.—What does trench warfare correspond to in open warfare?
  12.—What does the safety of a sector depend on?
  13.—What is the fundamental duty in trench warfare?
    14.—What rule determines the number of men to be posted in the
fire-trench?
   15.—Sum up their orders about firing before open terrain, before
covered terrain.
  16.—What is meant by double sentinels?
  17.—Why is listening attentively even more important than
keeping a sharp look out?
    18.—Why should the sentinels refrain from answering the enemy's
fire?
  19.—What is expected of the men in the listening posts?
  20.—When should the sentinels fire on a clear night? When, on a
dark night?
   21.—What should the sentinels do, if they hear the enemy's
digging?
  22.—When and where are sharpshooters posted and what is their
duty?
  23.—What information may patrols bring back?
   24.—When should patrols be sent out and how should they be
assigned?
   25.—What should the sentinels along a sector know about the
patrols, and the several possible patrols know about one another?
  26.—Describe dress and equipment of men on patrols.
  27.—Describe their method of advance.
  28.—What should they do on encountering a hostile patrol?
  29.—What should be the motto of men on patrol?
  30.—What are some of the most useful pieces of information about
the enemy that you should try to obtain?
  31.—What motto should you have about ammunition?
  32.—Describe several ways of leading the enemy to waste
ammunition.
   33.—What is the distinction between legitimate and illegitimate
ruses?
   34.—On what principle is the enemy's ruse of using blank
cartridges based?
  35.—How may this ruse be foiled?
   36.—What should the sentinels, and what should the men on
patrol do, when the enemy sends up flares?
  37.—How should the enemy's machine gun fire be answered?
  An Enemy's Attack.
  38.—Describe procedure when enemy's patrols are sighted by
sentinels and when an attack develops.
  39.—When are the trench mortars and the machine guns fired?
  40.—How are hand grenades thrown?
  41.—Where should the rifle fire be aimed?
  42.—When are bayonets used?
  43.—Is it sufficient to repulse an attack?
  44.—What formation should be adopted for the counter attack?
  45.—How is the advance made and the counter attack carried
out?
  46.—Describe what is meant by organization of a newly
conquered trench.
   47.—What should be done, if the enemy bombards the fire-
trench?
  48.—What should the sentinels do?
   49.—What should be done if the bombardment is back of the fire-
trench?
  50.—What general rule applies to the use of all trench artillery?
  51.—What are its ordinary objectives?
  52.—How are trench mortars handled?
  53.—What is meant by calling trench-artillery mobile weapons?
  54.—Give a general caution for the use of all ammunition.
  55.—What is essential to secure effective artillery fire?
  56.—What should be done if one's own artillery fire falls short
upon one's own trenches?
   57.—How is coordination between artillery and infantry secured in
case of a raid?
   58.—What are the principal items of the morning schedule, of the
afternoon schedule?
  59.—Describe the preparations for leaving the trenches.
  60.—What orders are given at the time of relief?
  61.—What is done before the men are dismissed to their billets?
  62.—How should the days in rest billets be utilized?
  63.—Describe a typical day in the trenches.
  64.—Describe a typical day in rest billets.
  65.—What should be the supreme aim alike of men and officers?
                  Part II.
        FRENCH INFANTRY COMBAT
              PRINCIPLES.
                   OPEN WARFARE.
106.—Is open warfare probable?
   It is improbable that, in this war, trench warfare will definitely give
place to open warfare on all sectors of the front.
   But the tactics that have forced several retirements will force
others.
   If sufficient troops are available, tried and fit and resolute, with
the necessary quantities of ammunition and improved artillery, we
shall see German arrogance and brutality in victory become again
cringing fear and demoralization in defeat; the experience of the
Marne will be repeated and the invaders will be driven out of the
territory they swarmed over through treacherous breaking of
treaties.
107.—The need of training in Infantry Combat
     Principles.
   That day the infantry will come again into its own, and its dash
and resolution will insure victory.
   To achieve this, it must be a well-trained infantry, in the old sense of
the word. Officers, non-commissioned officers, and men must have a
thorough and practical knowledge of Infantry Combat Principles.
   These should be practiced in the intervals of trench service when
the battalion is in rest billets.
  Their theory should be thoroughly mastered by all on whom may
devolve responsibility.
108.—The two phases of the Combat.
  We shall study here the two principal phases of the combat: the
approach and the attack, from the point of view of the company
commander.
109.—The Defense.
  We shall also consider the Combat from the standpoint of the
Defense.
                   THE APPROACH.
110.—All maneuvering at close range
     impossible.
   In the attack, the infantry can proceed only straight ahead. Under
infantry fire all maneuvering is impossible. Therefore by "approach"
is meant all maneuvering preparatory to the attack: it brings the
troops directly in front of and as near as possible to the objective.
 PRELIMINARY DISPOSITIONS TO START THE
              APPROACH.
111.—The orders to attack.
   The company commander will receive his orders from the
battalion commander.
112.—Equipment and Liaison.
  In the meanwhile let the lieutenants:
    a) make sure that the men are fully equipped and provided with
      full allotment of ammunition;
    b) appoint and parade connecting files (runners) to await
      orders.
113.—Distribution of Orders.
   The company commander having received his orders from the
battalion commander, will then call his subordinates and issue his
own orders accordingly, including the formation to be adopted.
114.—Combat patrols.
   He will make sure that there are combat patrols on the exposed
flank or flanks and to the front and rear if need be.
   It is well to have combat patrols detach automatically. It may be
understood, once for all, that, without further orders, the first squad
will cover in front, the second to the right, the third to the left, the
fourth to the rear, whenever needed. Still, the officer in charge
should make sure that this arrangement is carried out.
   A combat patrol, if not a full advance guard, will thus always
precede a unit and be the first to take contact with the enemy.
115.—Officers as guides.
   The officers serve as guides to their units until deployment, a
mounted officer in liaison with the advance guard or advanced
combat patrol checking up the itinerary.
116.—Keep Close Order as long as possible.
   The advance of a company into an engagement is conducted in
close order, preferably in columns of squads, until possible observation
by the enemy or the encountering of hostile fire makes it advisable to
deploy.
  Deployment should not be premature and should always follow
upon the conditions arising during the progress of the advance.
  PRECAUTIONS AGAINST HOSTILE
          ARTILLERY.
            AGAINST SILENT ARTILLERY.
117.—Nearing artillery which may open fire.
   About two or three miles from the positions liable to be occupied
by the enemy's field artillery, precautions should be taken against
the possibility of its opening fire.
118.—Deployment.
   The deployments best suited to escape observation should be
adopted:
119.—To escape direct observation:
  March in single or double file, the whole section[D] keeping closed
up so as to diminish the number of files seen from the front.
120.—Under aeroplane observation:
   Avoid especially the center of roads as they show white, utilize on
the contrary the spaces between cultivated fields of different colors,
make use of all possible cover, trees, shrubs, ditches, embankments.
Always walk in the shade when possible. If hostile aeroplanes are
flying low, halt and lie down on left side, hiding face in elbow.
                          FOOTNOTE:
        [D] The French "section" comprises 54 men. It is thus
      equivalent to 7 squads, and may be considered as 2 platoons.
   CROSSING A BOMBARDED ZONE.
121.—Case I. Artillery opening fire to register.
   A registering fire is easily recognized as the German artillery
registers either with a single percussion shell at a time, or with two
time-shells at a three-second interval.
   In the German field gun, the setting of the angle of sight[E] and
of the elevation[F] involves two operations.
122.—Oblique to right then to left.
   Therefore infantry under a registering fire should oblique forward
rapidly, first to the right and then to the left: each change of direction
obliges the enemy's gunners to repeat both setting operations.
123.—Case II. Artillery opening fire for effect.
   The zone has necessarily been previously registered. Such a zone
is easily recognized by the presence of shell holes.
124.—Avoid Zone if possible.
  It should be avoided and the advance made on its outskirts.
125.—The five cases of fire for effect.
   If this cannot be done and the fire for effect materializes, five
cases are to be distinguished, as the shells may be:
    1.   Shrapnel shells bursting at right height;
    2.   Shrapnel shells bursting high;
    3.   Time-Fuse high explosive shells bursting at right height;
    4.   Time-Fuse high explosive shells bursting high;
    5.   Percussion high explosive shells.
126.—Case 1. Burst Area of Shrapnel shells
     bursting at right height.
   The area of burst is about 250 to 300 yards in length and 30
yards in width, half the bullets falling on the first 50 yards of the
beaten zone.
127.—Protective Formation against Shrapnel.
   Advance in line of sections, in single or double file, keeping as
closed up as possible, with intervals of 30 yards between sections.
  The second line should be 250 to 300 yards behind the first.
128.—Case 2. Shrapnel shells bursting high.
   Much less dangerous than when bursting at the right height, as the
initial speed of the bullets is spent. Same formation as for Case 1.
129.—Case 3. Burst area of Time-fuse high
     explosive shells bursting at right height.
   The area of burst is opposite to that of shrapnel: short depth and
large width, only 7 to 10 yards in depth as opposed to 60 to 100 yards
in width.
130.—Protective Formation against Time-fuse
     high explosives.
   Advance in line of sections, in single or double file, keeping as closed
up as possible, with intervals of 60 to 100 yards between sections.
  The second line may be 15 yards behind the first.
131.—Case 4. Time-fuse high explosive shells
     bursting high.
   The depth of the area of burst is greater than when shells burst at
the right height; therefore widen the interval between the lines.
132.—Case 5. Burst area of percussion high
     explosive shells.
   The radius of the explosion is only about 25 yards, but the local
effect is intense, and the displacement is effective over more than
double that radius.
133.—Protective Formation against percussion
     high explosive shells.
  Advance in line of section in double file, keeping as closed up as
possible, with about 100 yards intervals between sections.
  The second line may be about 50 yards behind the first.
 GENERAL RECOMMENDATIONS AGAINST ALL
        TYPES OF EFFECTIVE FIRE.
134.—Dangerous to stop, useless to run.
   Do not stop in a zone under fire for effect, as lying down only
provides a larger target. If absolutely obliged to stop, remain
standing, packed together like sardines, maintaining the above
formations and intervals. It is useless to run; as much as possible,
advance steadily.
135.—Protective Formation against all types of
     shells.
   As may appear from the study of the above, the following
formation and intervals will afford the best protection against all
types and combinations of types of shells, as a shell will never affect
more than one section.
   Advance in line of sections, in double file, keeping as closed up as
possible, with intervals[G] of 85 to 110 yards between sections.
   The second line should be 250 to 300 yards behind the first.
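These combined figures can be checked against the burst areas quoted
in paragraphs 126 to 133. The worked comparison below is an editorial
addition, not part of the original manual; in each direction the
governing dimension is simply the largest one quoted:

    \[
    \text{interval between sections} \ge \max(30,\ 100,\ 2 \times 2 \times 25) = 100 \text{ yards}
    \]
    \[
    \text{distance between lines} \ge \max(300,\ 10,\ 100) = 300 \text{ yards}
    \]

Here 30 and 300 yards are the shrapnel width and length (par. 126),
100 and 10 yards the time-fuse high explosive width and depth
(par. 129), and 2 x 2 x 25 = 100 yards the diameter over which a
percussion shell's displacement is effective (par. 132). The first result
matches the order of the 85 to 110 yards given above, footnote [G]
noting that maximums are quoted, and the second matches the 250 to
300 yards between lines.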
                             FOOTNOTES:
         [E] Inclination of the line of sight to the horizontal.
         [F] The vertical inclination of the gun.
        [G] All through this chapter, maximum intervals are given.
      They may have to be shortened to secure closer order at the
      expense of greater safety.
SPECIAL FEATURES OF THE
      APPROACH.
    USE OF WOODS AS SHELTER ON
           THE ADVANCE.
136.—Avoid if small.
   Woods should be used for the advance or for a halt only if they are
of considerable size. Then they hide movements and provide some
shelter from fire. On the contrary, when they are small, they are to
be avoided, as they draw artillery fire and do not offer sufficient
protection.
137.—Liaison difficult.
    When advancing in woods, special care should be taken to keep
all fractions connected.
138.—Exit quickly at one time.
   To exit from a wood, take all necessary dispositions under cover so
that, on the signal of the commander, all fractions may be ready to
spring out together. They should then continue to advance as rapidly
as possible, to avoid the enemy's likely shelling of the outskirts.
139.—Otherwise exit in different places.
   If the exit cannot be made by all fractions at one time, the
elements of the second line should avoid coming out at the same
point as those of the first line.
                TO CROSS A CREST.
140.—Cross all together and rapidly.
   Let the line of sections assemble at the top of the crest, crouching
carefully below the sky line. Then, upon a concerted signal, all should
leap quickly across and down the descending slope, making bounds as
extended as possible.
   This makes the crossing fairly safe, as even the infantry will have to
modify both its elevation and angle of sight for every new position of
this quickly moving target.
 PRECAUTIONS AGAINST CAVALRY.
141.—Cavalry Patrols.
   During the whole "approach," watch should be kept for possible
cavalry patrols. The elements acting as advance guard, flank guards,
or combat patrols have, as part of their special mission, to keep the
cavalry away from the main body.
142.—Face and Fire.
   To repulse cavalry, the infantry must be able to face quickly
toward the charging horsemen and furnish a heavy fire.
143.—Protective formations.
   If cavalry patrols are expected ahead, deployment as skirmishers
will secure this; if they are expected on the flanks, deploy in columns
of squads marching in double file. A formation in echelon is effective
at all times.
144.—Repulsing the charge.
    If cavalry appears, stop, face the charge quickly, fix bayonets and
fire at will, the section leaders controlling the fire.
145.—In case of surprise.
If surprised, deploy quickly and lie down.