Data Engineering with dbt
Roberto Zagni
BIRMINGHAM—MUMBAI
Data Engineering with dbt
Copyright © 2023 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted
in any form or by any means, without the prior written permission of the publisher, except in the case
of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information
presented. However, the information contained in this book is sold without warranty, either express
or implied. Neither the author, nor Packt Publishing, nor its dealers and distributors will be held liable
for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and
products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot
guarantee the accuracy of this information.
ISBN 978-1-80324-628-4
www.packtpub.com
To the four females in my daily life: my wife, who supports me every day, my daughters,
who keep me grounded in reality, and our dog Lily for the sparkles of love and happiness
that she spreads around every day.
To my mother and my late father who through their sacrifices and support allowed me
to become what I wanted to be.
– Roberto Zagni
Contributors
I would like to thank my customers and colleagues for all the problems we tackled and the discussions we had to get to a working solution; they helped me become a better software and data engineer and collect a wide array of experiences in software and data engineering.
This book is my little contribution to the data engineering community.
I hope that I have been able to put in one place the core set of knowledge that I would have loved to have in my days as a data engineer, along with a simple, opinionated way to build data platforms using the modern data stack and proven patterns that scale and simplify everyday work.
About the reviewers
Hari Krishnan has been in the data space for close to 20 years now. He started at Infosys Limited,
working on mainframe technology for about 6 years, and then moved over to Informatica, then
eventually into business intelligence, big data, and the cloud in general. Currently, he is senior manager
of data engineering at Beachbody LLC, where he manages a team of data engineers with Airflow, dbt,
and Snowflake as their primary tech stack. He built the data lake and migrated the data warehouse
as well as the ETL/ELT pipelines from on-premises to the cloud. He spent close to 13 years working for
Infosys and has spent the last 7 years with Beachbody. He is a technology enthusiast and always has
an appetite to discover and explore new avenues and to innovate in the data space.
Daniel Joshua Jayaraj S R is a data evangelist and business intelligence engineer, with over six years of
experience in the field of data analytics, data modeling, and visualization. He has helped organizations
understand the full potential of their data by providing stakeholders with strong business-oriented
visuals, thereby enhancing data-driven decisions. He has worked with multiple tools and technologies
during his career and completed his master’s in big data and business analytics.
I would like to thank my mother, S J Inbarani, who has been my motivation my whole life. I would also
like to thank Roberto Zagni for allowing me to review this wonderful book on dbt.
Table of Contents
Preface xv

2
Setting Up Your dbt Cloud Development Environment 53
Technical requirements 54
Setting up your GitHub account 54
Introducing Version Control 54
Creating your GitHub account 56
Setting up your first repository for dbt 61
Setting up your dbt Cloud account 63
Signing up for a dbt Cloud account 63
Setting up your first dbt Cloud project 65
Adding the default project to an empty repository 80
Comparing dbt Core and dbt Cloud workflows 85
dbt Core workflows 85
dbt Cloud workflows 88
Experimenting with SQL in dbt Cloud 91
Exploring the dbt Cloud IDE 92
Executing SQL from the dbt IDE 93
Introducing the source and ref dbt functions 94
Exploring the dbt default model 95
Using ref and source to connect models 97
Running your first models 98
Testing your first models 100
Editing your first model 101
Summary 103
Further reading 103

3
Data Modeling for Data Engineering 105
Technical requirements 106
What is and why do we need data modeling? 106
Understanding data 106
What is data modeling? 106
Why we need data modeling 107
Complementing a visual data model 109
Conceptual, logical, and physical data models 109
Conceptual data model 110
Logical data model 111
Physical data model 113
Tools to draw data models 114
Entity-Relationship modeling 114
Main notation 114
Cardinality 115
Time perspective 120
An example of an E-R model at different levels of detail 122
Generalization and specialization 122
Modeling use cases and patterns 124
Header-detail use case 124
Hierarchical relationships 126
Forecasts and actuals 131
Libraries of standard data models 132
Common problems in data models 132
Fan trap 133
Chasm trap 135
Modeling styles and architectures 137
Kimball method or dimensional modeling or star schema 138
Unified Star Schema 141
Inmon design style 144
Data Vault 145
Data mesh 148
Our approach, the Pragmatic Data Platform - PDP 150
Summary 151
Further reading 151

4
Analytics Engineering as the New Core of Data Engineering 153
Technical requirements 154
The data life cycle and its evolution 154
Understanding the data flow 154
Data creation 155
Data movement and storage 156
Data transformation 162
Business reporting 164
Feeding back to the source systems 165
Understanding the modern data stack 166
The traditional data stack 166
The modern data stack 167
Defining analytics engineering 168
The roles in the modern data stack 169
The analytics engineer 169
DataOps – software engineering best practices for data 170
Version control 171
Quality assurance 171
The modularity of the code base 172
Development environments 173
Designing for maintainability 174
Summary 176
Further reading 176

5
Transforming Data with dbt 177
Technical requirements 177
The dbt Core workflow for ingesting and transforming data 178
Introducing our stock tracking project 181
The initial data model and glossary 181
Setting up the project in dbt, Snowflake, and GitHub 183
Defining data sources and providing reference data 187
Defining data sources in dbt 187
Loading the first data for the portfolio project 192
How to write and test transformations 198
Writing the first dbt model 198
Real-time lineage and project navigation 200
Deploying the first dbt model 200
Committing the first dbt model 201
Configuring our project and where we store data 202
Re-deploying our environment to the desired schema 204
Configuring the layers for our architecture 207
Ensuring data quality with tests 209
Generating the documentation 217
Summary 219

7
Working with Dimensional Data 259
Adding dimensional data 260
Creating clear data models for the refined and data mart layers 260
Loading the data of the first dimension 262
Creating and loading a CSV as a seed 262
Configuring the seeds and loading them 263
Adding data types and a load timestamp to your seed 263
Building the STG model for the first dimension 266
Defining the external data source for seeds 266
Creating an STG model for the security dimension 267
Adding the default record to the STG 269
Saving history for the dimensional data 269
Saving the history with a snapshot 270
Building the REF layer with the dimensional data 271
Adding the dimensional data to the data mart 272
Exercise – adding a few more hand-maintained dimensions 273
Summary 274

8
Delivering Consistency in Your Data 275
Technical requirements 275
Keeping consistency by reusing code – macros 276
Repetition is inherent in data projects 276
Why copy and paste kills your future self 277
How to write a macro 277
Refactoring the “current” CTE into a macro 278
Fixing data loaded from our CSV file 282
The basics of macro writing 284
Building on the shoulders of giants – dbt packages 290
Creating dbt packages 291
How to import a package in dbt 292
Browsing through noteworthy packages for dbt 296
Adding the dbt-utils package to our project 299
Summary 302

9
Delivering Reliability in Your Data 303
Testing to provide reliability 303
Types of tests 304
Singular tests 304
Generic tests 305
Defining a generic test 308
Testing the right things in the right places 313
What do we test? 314
Where to test what? 316
Testing our models to ensure good quality 319
Summary 329

10
Agile Development 331
Technical requirements 331
Agile development and collaboration 331
Organizing work the agile way 336
Managing the backlog in an agile way 337
S3.x – developing with dbt models the pipeline for the XYZ table 354
S4 – an acceptance test of the data produced in the data mart 356
S5 – development and verification of the report in the BI application 357
Summary 357

11
Team Collaboration 359
Enabling collaboration 359
Core collaboration practices 360
Collaboration with dbt Cloud 361
Working with branches and PRs 363
Working with Git in dbt Cloud 364
The dbt Cloud Git process 364
Keeping your development environment healthy 368
Suggested Git branch naming 369
Adopting frequent releases 371
Making your first PR 372
Summary 377
Further reading 377

Summary 426

13
Moving Beyond the Basics 427
Technical requirements 427
Building for modularity 428
Modularity in the storage layer 429
Modularity in the refined layer 431
Modularity in the delivery layer 434
Managing identity 436
Identity and semantics – defining your concepts 436
Different types of keys 437
Main uses of keys 440
Master Data management 440
Data for Master Data management 441
A light MDM approach with DBT 442
Saving history at scale 445
Understanding the save_history macro 447
Understanding the current_from_history macro 455
Summary 458

14
Enhancing Software Quality 459
Technical requirements 459
Refactoring and evolving models 460
Dealing with technical debt 460
Implementing real-world code and business rules 462
Replacing snapshots with HIST tables 463
Renaming the REF_ABC_BANK_SECURITY_INFO model 466
Handling orphans in facts 468
Calculating closed positions 473
Calculating transactions 482
Publishing dependable datasets 484
Managing data marts like APIs 485
What shape should you use for your data mart? 485
Self-completing dimensions 487
History in reports – that is, slowly changing dimensions type two 492
Summary 493
Further reading 493

15
Patterns for Frequent Use Cases 495
Technical requirements 495
Ingestion patterns 495
Basic setup for ingestion 498
Loading data from files 501
External tables 505
Landing tables 507

Index 539