Data Engineering with dbt: A practical guide to building a cloud-based, pragmatic, and dependable data platform with SQL
Data Engineering with dbt
Roberto Zagni
BIRMINGHAM—MUMBAI
Data Engineering with dbt
Copyright © 2023 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted
in any form or by any means, without the prior written permission of the publisher, except in the case
of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information
presented. However, the information contained in this book is sold without warranty, either express
or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable
for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and
products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot
guarantee the accuracy of this information.
ISBN 978-1-80324-628-4
www.packtpub.com
To the four females in my daily life: my wife, who supports me every day, my daughters,
who keep me grounded in reality, and our dog Lily for the sparkles of love and happiness
that she spreads around every day.
To my mother and my late father who through their sacrifices and support allowed me
to become what I wanted to be.
– Roberto Zagni
Contributors
I would like to thank my customers and colleagues for all the problems, and for the discussions on the
way to working solutions, that helped me become a better software and data engineer and collect a wide
array of experiences in software and data engineering.
This book is my little contribution to the data engineering community.
I hope that I have been able to put the core set of knowledge that I would have loved to have in my
days as a data engineer in one place, along with a simple, opinionated way to build data platforms
using the modern data stack and proven patterns that scale and simplify everyday work.
About the reviewers
Hari Krishnan has been in the data space for close to 20 years now. He started at Infosys Limited,
working on mainframe technology for about 6 years, and then moved over to Informatica, then
eventually into business intelligence, big data, and the cloud in general. Currently, he is senior manager
of data engineering at Beachbody LLC, where he manages a team of data engineers with Airflow, dbt,
and Snowflake as their primary tech stack. He built the data lake and migrated the data warehouse
as well as the ETL/ELT pipelines from on-premises to the cloud. He spent close to 13 years working for
Infosys and has spent the last 7 years with Beachbody. He is a technology enthusiast and always has
an appetite to discover, explore, and innovate new avenues in the data space.
Daniel Joshua Jayaraj S R is a data evangelist and business intelligence engineer, with over six years of
experience in the field of data analytics, data modeling, and visualization. He has helped organizations
understand the full potential of their data by providing stakeholders with strong business-oriented
visuals, thereby enhancing data-driven decisions. He has worked with multiple tools and technologies
during his career and completed his master’s in big data and business analytics.
I would like to thank my mother, S J Inbarani, who has been my motivation my whole life. I would also
like to thank Roberto Zagni for allowing me to review this wonderful book on dbt.
Table of Contents
Preface
2
Setting Up Your dbt Cloud Development Environment
Technical requirements
Setting up your GitHub account
Introducing Version Control
Creating your GitHub account
Setting up your first repository for dbt
Setting up your dbt Cloud account
Signing up for a dbt Cloud account
Setting up your first dbt Cloud project
Adding the default project to an empty repository
Comparing dbt Core and dbt Cloud workflows
dbt Core workflows
dbt Cloud workflows
Experimenting with SQL in dbt Cloud
Exploring the dbt Cloud IDE
Executing SQL from the dbt IDE
Introducing the source and ref dbt functions
Exploring the dbt default model
Using ref and source to connect models
Running your first models
Testing your first models
Editing your first model
Summary
Further reading
3
Data Modeling for Data Engineering
Technical requirements
What is and why do we need data modeling?
Understanding data
What is data modeling?
Why we need data modeling
Complementing a visual data model
Conceptual, logical, and physical data models
Conceptual data model
Logical data model
Physical data model
Tools to draw data models
Entity-Relationship modeling
Main notation
Cardinality
Time perspective
An example of an E-R model at different levels of detail
Generalization and specialization
Modeling use cases and patterns
Header-detail use case
Hierarchical relationships
Forecasts and actuals
Libraries of standard data models
Common problems in data models
Fan trap
Chasm trap
Modeling styles and architectures
Kimball method or dimensional modeling or star schema
Unified Star Schema
Inmon design style
Data Vault
Data mesh
Our approach, the Pragmatic Data Platform - PDP
Summary
Further reading
4
Analytics Engineering as the New Core of Data Engineering
Technical requirements
The data life cycle and its evolution
Understanding the data flow
Data creation
Data movement and storage
Data transformation
Business reporting
Feeding back to the source systems
Understanding the modern data stack
The traditional data stack
The modern data stack
Defining analytics engineering
The roles in the modern data stack
The analytics engineer
DataOps – software engineering best practices for data
Version control
Quality assurance
The modularity of the code base
Development environments
Designing for maintainability
Summary
Further reading
5
Transforming Data with dbt
Technical requirements
The dbt Core workflow for ingesting and transforming data
Introducing our stock tracking project
The initial data model and glossary
Setting up the project in dbt, Snowflake, and GitHub
Defining data sources and providing reference data
Defining data sources in dbt
Loading the first data for the portfolio project
How to write and test transformations
Writing the first dbt model
Real-time lineage and project navigation
Deploying the first dbt model
Committing the first dbt model
Configuring our project and where we store data
Re-deploying our environment to the desired schema
Configuring the layers for our architecture
Ensuring data quality with tests
Generating the documentation
Summary
7
Working with Dimensional Data
Adding dimensional data
Creating clear data models for the refined and data mart layers
Loading the data of the first dimension
Creating and loading a CSV as a seed
Configuring the seeds and loading them
Adding data types and a load timestamp to your seed
Building the STG model for the first dimension
Defining the external data source for seeds
Creating an STG model for the security dimension
Adding the default record to the STG
Saving history for the dimensional data
Saving the history with a snapshot
Building the REF layer with the dimensional data
Adding the dimensional data to the data mart
Exercise – adding a few more hand-maintained dimensions
Summary
8
Delivering Consistency in Your Data
Technical requirements
Keeping consistency by reusing code – macros
Repetition is inherent in data projects
Why copy and paste kills your future self
How to write a macro
Refactoring the “current” CTE into a macro
Fixing data loaded from our CSV file
The basics of macro writing
Building on the shoulders of giants – dbt packages
Creating dbt packages
How to import a package in dbt
Browsing through noteworthy packages for dbt
Adding the dbt-utils package to our project
Summary
9
Delivering Reliability in Your Data
Testing to provide reliability
Types of tests
Singular tests
Generic tests
Defining a generic test
Testing the right things in the right places
What do we test?
Where to test what?
Testing our models to ensure good quality
Summary
10
Agile Development
Technical requirements
Agile development and collaboration
Organizing work the agile way
Managing the backlog in an agile way
S3.x – developing with dbt models the pipeline for the XYZ table
S4 – an acceptance test of the data produced in the data mart
S5 – development and verification of the report in the BI application
Summary
11
Team Collaboration
Enabling collaboration
Core collaboration practices
Collaboration with dbt Cloud
Working with branches and PRs
Working with Git in dbt Cloud
The dbt Cloud Git process
Keeping your development environment healthy
Suggested Git branch naming
Adopting frequent releases
Making your first PR
Summary
Further reading
Summary
13
Moving Beyond the Basics
Technical requirements
Building for modularity
Modularity in the storage layer
Modularity in the refined layer
Modularity in the delivery layer
Managing identity
Identity and semantics – defining your concepts
Different types of keys
Main uses of keys
Master Data management
Data for Master Data management
A light MDM approach with DBT
Saving history at scale
Understanding the save_history macro
Understanding the current_from_history macro
Summary
14
Enhancing Software Quality
Technical requirements
Refactoring and evolving models
Dealing with technical debt
Implementing real-world code and business rules
Replacing snapshots with HIST tables
Renaming the REF_ABC_BANK_SECURITY_INFO model
Handling orphans in facts
Calculating closed positions
Calculating transactions
Publishing dependable datasets
Managing data marts like APIs
What shape should you use for your data mart?
Self-completing dimensions
History in reports – that is, slowly changing dimensions type two
Summary
Further reading
15
Patterns for Frequent Use Cases
Technical requirements
Ingestion patterns
Basic setup for ingestion
Loading data from files
External tables
Landing tables
Index
Chapter 2, Setting Up Your dbt Cloud Development Environment, gets you started with dbt by creating
your GitHub and dbt accounts. You will learn why version control is important and what the data
engineering workflow is when working with dbt.
You will also understand the difference between the open source dbt Core and the commercial dbt
Cloud. Finally, you will experiment with the default project and set up your environment for running
basic SQL with dbt on Snowflake and understand the key functions of dbt: ref and source.
Chapter 3, Data Modeling for Data Engineering, shows why and how you describe data, and how to
travel through different abstraction levels, from business processes to the storage of the data that
supports them: conceptual, logical, and physical data models.
You will understand entities, relationships, attributes, entity-relationship (E-R) diagrams, modeling
use cases and modeling patterns, Data Vault, dimensional models, wide tables, and business reporting.
Chapter 4, Analytics Engineering as the New Core of Data Engineering, showcases the full data life cycle
and the different roles and responsibilities of people that work on data.
You will understand the modern data stack, the role of dbt, and analytics engineering. You will learn
how to adopt software engineering practices to build data platforms (or DataOps), and about working
as a team, not as a silo.
Chapter 5, Transforming Data with dbt, shows you how to develop an example application in dbt and
teaches you all the steps to create, deploy, run, test, and document a data application with dbt.
Chapter 6, Writing Maintainable Code, continues the example that we started in the previous chapter,
and we will guide you to configure dbt and write some basic but functionally complete code to build the
three layers of our reference architecture: staging/storage, refined data, and delivery with data marts.
Chapter 7, Working with Dimensional Data, shows you how to incorporate dimensional data in our
data models and utilize it for fact-checking and a multitude of purposes. We will explore how to create
data models, edit the data for our reference architecture, and incorporate the dimensional data in data
marts. We will also recap everything we learned in the previous chapters with an example.
Chapter 8, Delivering Consistency in Your Code, shows you how to add consistency to your transformations.
You will learn how to go beyond basic SQL and bring the power of scripting into your code, write
your first macros, and learn how to use external libraries in your projects.
Chapter 9, Delivering Reliability in Your Data, shows you how to ensure the reliability of your code by
adding tests that verify your expectations and check the results of your transformations.
Chapter 10, Agile Development, teaches you how to develop with agility by mixing philosophy and
practical hints, discussing how to keep the backlog agile through the phases of your projects, and
taking a deep dive into building data marts.
Chapter 11, Team Collaboration, touches on a few practices that help developers work as a team and the
support that dbt provides toward this.
Chapter 12, Deployment, Execution, and Documentation Automation, helps you learn how to automate
the operation of your data platform, by setting up environments and jobs that automate the release
and execution of your code following your deployment design.
Chapter 13, Moving Beyond the Basics, helps you learn how to manage the identity of your entities so that
you can apply master data management to combine data from different systems. At the same time,
you will review the best practices to apply modularity in your pipelines to simplify their evolution
and maintenance. You will also discover macros to implement patterns.
Chapter 14, Enhancing Software Quality, helps you discover and apply more advanced patterns that
provide high-quality results in real-life projects, and you will experiment with how to evolve your
code with confidence through refactoring.
Chapter 15, Patterns for Frequent Use Cases, presents you with a small library of patterns that are
frequently used for ingesting data from external files and storing this ingested data in what we call
history tables. You will also get the insights and the code to ingest data in Snowflake.
If you are using the digital version of this book, we advise you to type the code yourself or access
the code from the book’s GitHub repository (a link is available in the next section). Doing so will
help you avoid any potential errors related to the copying and pasting of code.
Conventions used
There are a number of text conventions used throughout this book.
Code in text: Indicates code words in text, database table names, folder names, filenames, file
extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: “Create
the new database using the executor role. We named it PORTFOLIO_TRACKING.”
When we wish to draw your attention to a particular part of a code block, the relevant lines or items
are set in bold:
CREATE VIEW BIG_ORDERS AS
SELECT * FROM ORDERS
WHERE TOTAL_AMOUNT > 1000;
Bold: Indicates a new term, an important word, or words that you see onscreen. For instance,
words in menus or dialog boxes appear in bold. Here is an example: “Select System info from the
Administration panel.”
Get in touch
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, email us at
customercare@packtpub.com and mention the book title in the subject of your message.
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen.
If you have found a mistake in this book, we would be grateful if you would report this to us. Please
visit www.packtpub.com/support/errata and fill in the form.
Piracy: If you come across any illegal copies of our works in any form on the internet, we would
be grateful if you would provide us with the location address or website name. Please contact us at
copyright@packt.com with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you
are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Some time later, once more the footsteps came, crash crash on
the coral, squeak squeak on the verandah, again my door opened
and the squeak changed to the tramp of booted feet on the boarded
floor; as I looked to see who it was, the tramp passed close behind
my chair and across the room to the door, which opened, then again
the tramp changed to the squeak and the squeak to the crash on the
coral. I was by this time getting very puzzled, but, after a little
thought, decided my imagination was playing me tricks, and that I
had not really closed the doors when I thought I had. I made
certain, however, that I did close them this time, and went on with
my work again. Once more the whole thing was repeated, only this
time I rose from the table, took my lamp in my hand, and gazed
hard at the places on the floor from which the sound came, but
could see nothing.
Then I went on to the verandah and yelled for Giorgi and
Poruma. “Who is playing tricks here?” I asked in a rage. Before
Poruma could answer, again came the sound of footsteps through
my room. “I did not know that you had any one with you,” said
Poruma in surprise, as he heard the steps. “I have no one with me,
but somebody keeps opening my door and walking about,” I replied,
“and I want him caught.” “No one would dare come into the
Government compound and play tricks on the R.M.,” said Poruma,
“unless he were mad.” I was by this time thoroughly angry. “Giorgi,
go to the guard-house, send up the gate-keeper and all the men
there, then go to the gaol and send Manigugu (the gaoler) and all
his warders; then send to the Siai for her men; I mean to get to the
bottom of all this fooling.” The gate-keeper arrived, and swore he
had locked the gate at ten o’clock, that no other than Government
people had passed through before that hour; that since then, until
Giorgi went for him, he had been sitting on his verandah with some
friends, and nobody could have passed without his knowledge. Then
came the men from the gaol and the Siai, and I told them some
scoundrel had been playing tricks upon me and I wanted him
caught.
First they searched the house, not a big job, as there were only
three rooms furnished with spartan simplicity; that being completed,
I placed four men with lanterns under the house, which was raised
on piles about four feet from the ground: at the back and front and
sides I stationed others, until it was impossible for a mouse to have
entered or left that house unseen. Then again I searched the house
myself; after which Poruma, Giorgi and I shut the doors of my room
and sat inside. Exactly the same thing occurred once more; through
that line of men came the footsteps, through my room in precisely
the same manner came the tread of a heavily-booted man, then on
to the palm verandah, where—in the now brilliant illumination—we
could see the depression at the spots from which the sound came,
as though a man were stepping there. “Well, what do you make of
it?” I asked my men. “No man living could have passed unseen,” was
the answer; “it’s either the spirit of a dead man or a devil.” “Spirit of
dead man or devil, it’s all one to me,” I remarked; “if it’s taken a
fancy to prance through my room, it can do so alone; shift my things
off to the Siai for the night.”
The following day I sought out Armit. “Do you know anything
about spooks?” I asked; “because something of that nature has
taken a fancy to Moreton’s house.” “Moreton once or twice hinted at
something of the sort,” said Armit, “but he would never speak out; I
will come and spend to-night with you, and we will investigate.”
Armit came, but nothing out of the ordinary occurred; nor did I ever
hear of it afterwards, and before a year had elapsed the house had
been pulled down. When Moreton returned, I related my experience
to him, and he then told me that one night, when he was sleeping in
his hammock, he was awakened by footsteps, such as I have
described, and upon his calling out angrily to demand who was
making the racket, his hammock was violently banged against the
wall. “I didn’t care to say anything about it,” he said, “as I was alone
at the time, and didn’t want to be laughed at.”
I have told this story for what it is worth: I leave my readers,
who are interested in the occult or psychical research, to form what
opinion they choose; all I say is, that the story, as I have related it,
is absolutely true.
Some few days after Moreton had resumed his duties, the Merrie
England came in with Sir William on board, and his Excellency told
me that as Ballantine, the Treasurer and Collector of Customs, had
broken down in health, it was necessary for him to be relieved at
once, and that I was to take up his duties. I protested that I knew
nothing about accountants’ work or book-keeping, and respectfully
declined the appointment. “You can do simple addition and
subtraction, that’s all I want; find your way to Port Moresby as soon
as you can,” was all the Governor replied. Then the Merrie England
left; and I consulted Moreton. “The Lord help you, laddie,” said he;
“you will make a devil of a mess of it, but you must do what Jock
says.” Then Armit. “You must take it, or you will never get another
job; but you will be all right if you sit tight, and refuse to sign
anything without the authority of the Governor or Government
Secretary.” Then I went to Arbouine and unfolded my tale of woe.
“Oh, that’s all right,” said he; “I will write a line to Gors, our
manager at Port Moresby, and if you get stuck, he will lend you a
good clerk for a day or two, who will keep you all right.”
Then I resigned myself to the inevitable; Treasurer and Collector
of Customs I had to be. The next thing was to find my way to Port
Moresby, and break the news to Ballantine. A steamer came in, the
Mount Kembla, an Australian-owned boat recently chartered to carry
coal to German New Guinea; Burns, Philp and Co. were the agents,
and upon my going to book a passage to Port Moresby, Arbouine
said, “This vessel is bound by her insurances to carry a pilot in New
Guinea waters; I can’t let her leave here without one, and you are
the only man I can get hold of capable of acting as a local pilot.”
“Damn it all,” I said, “I only want a passage, and you can hardly
expect the Acting Treasurer and Collector of Customs of New Guinea
to act as your blanky pilot.” “Oh, all right,” said Arbouine, “if you
don’t sign on as pilot, the ship won’t leave.”
Eventually I did take on the job as pilot of the Mount Kembla,
and left for Port Moresby. She was an iron collier with iron decks,
and utterly unsuited for tropical work; hardly had we got out of
Samarai Harbour, before the skipper, a nice, genial little man, came
to me, and said, “I’m feeling very ill, for Heaven’s sake look after the
ship.” I looked at him and, taking his temperature with a clinical
thermometer, found he was in a high state of fever. “Get away to
bed, man,” I said, “and I will dose you.” Then I told the mate to fill
him up with brandy and quinine. “I can’t do it, pilot,” said the mate;
“everything is in the lazerette and under Government seals, and I
dare not break them.” I soon settled that by smashing the seals
myself, meanwhile explaining to the mate that the ship’s pilot
happened to be the Collector of Customs for the Possession. “My
God!” said the mate, “I’ve been in the coal trade all my life, and
been in many parts of the world, but I have never been in a country
like this before.” I took the Mount Kembla safely into Port Moresby,
from whence she departed two days later; and, to my regret, I
afterwards heard that hardly had she cleared the harbour before her
nice little skipper died.
Leaving the Mount Kembla, I went to the office of the
Government Secretary, the Hon. Anthony Musgrave, and told him I
had been sent by the Governor to relieve Ballantine. “I suppose, Mr.
Monckton, you have had previous experience of accountancy and
audit work?” said Mr. Musgrave. “On the contrary,” was my reply, “if
you searched New Guinea from end to end, you could not find a
man more blankly ignorant of the subject.” Muzzy—as he was
generally termed in the service—gasped. “Did you tell the Governor
that?” he asked. “Of course I did; but he seemed to think that a man
who knew navigation and could do simple addition and subtraction
was all he required,” was my reply. Muzzy sighed, and then sent for
Ballantine and introduced me to him, after which, he gladly washed
his hands of the matter. Ballantine was very nice and kind about it
all. “You had better work with me for a few days,” he said, “it’s not
all quite as simple as his Excellency appears to imagine.” Three days
satisfied me that the job was quite beyond me; Ballantine was doing
sums all day long, and could do work, in five minutes, that would
take me a full day. At the end of the three days, I got him to
accompany me to the Government Secretary, to whom I pointed out,
that if I were to carry out the Treasurer’s duties for one month, at
the end of that time it would require at least ten clerks and one
expert accountant to unravel the tangle. “What am I to do?” said Mr.
Musgrave. “Sir William must be obeyed.” Ballantine also intimated
that he was Registrar for Births, Deaths, and Marriages, and that, as
the Death Register had not been written up for some years, I might
delve into piles of letters and papers reporting deaths, and write it
up; to which cheerful occupation I betook myself.
A MOTUAN GIRL