Learn Azure Synapse Data Explorer: A guide to building real-time analytics solutions to unlock log and telemetry data
Peri Rocha
Learn Azure Synapse Data Explorer
BIRMINGHAM—MUMBAI
Copyright © 2023 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted
in any form or by any means, without the prior written permission of the publisher, except in the case
of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information
presented. However, the information contained in this book is sold without warranty, either express
or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable
for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and
products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot
guarantee the accuracy of this information.
ISBN 978-1-80323-395-6
www.packtpub.com
To my daughter, Isabella, I love you to the moon and all the way back. To my wife, Cecilia,
my partner, and the love of my life, thank you for your patience, love, friendship, and partnership
in life. I love you. To my brother, Plinio, my best friend, and my favorite companion in the things
we do together. And last but not least, in loving memory of my mother, Yara, and my father, Jose.
This work is dedicated to all of you.
Contributors
I’d like to thank everyone who crossed my path through 25 years of professional experience. All of you
helped me shape my own story and I am deeply thankful for it.
About the reviewer
Felipe Andrade is a client technical lead at Microsoft Canada. He has been at Microsoft for 9 years and
has been working with data analytics for over 10 years. He has spent most of his career at Microsoft
in analytics technical roles working with Power BI, SQL, Synapse, Databricks, and machine learning.
He also worked in a couple of startups as a software engineer running social network analytics.
I’d like to thank Peri Rocha for inviting me to be a technical reviewer for his book. Thanks to my family,
Leticia, Luisa, and Alice, for their patience and kindness.
Table of Contents
Preface
2. Creating Your First Data Explorer Pool
  Technical requirements
  Creating a free Azure account
  Creating an Azure Synapse workspace
    Basics tab
    Security tab
    Networking tab
    Tags tab
    Review + create tab
    Finding your new workspace
  Creating a Data Explorer pool using Azure Synapse Studio
    Basics tab
    Additional settings tab
    Tags tab
    Review + create tab
  Creating a Data Explorer pool using the Azure portal
  Creating a Data Explorer pool using the Azure CLI
  Summary
3. Exploring Azure Synapse Studio
  Technical requirements
  Exploring the user interface of Azure Synapse Studio
  Running your first query
    Creating a database
    Loading the data
    Verifying whether your data has loaded successfully
  Working with data in Azure Synapse notebooks
  Saving your work and configuring source control
  Managing and monitoring Data Explorer pools
    Scaling Data Explorer pools
    Pausing and resuming pools
    Monitoring Data Explorer pools
  Summary
4. Real-World Usage Scenarios
  Technical requirements
  Building a multi-purpose end-to-end analytics environment
    Sources
    Ingest
    Store
6. Data Analysis and Exploration with KQL and Python
  Technical requirements
  Analyzing data with KQL
    Selecting data
    Working with calculated columns
    Plotting charts
    Obtaining percentiles
    Creating a time series
    Detecting outliers
    Using linear regression
  Exploring Data Explorer pool data with Python
    Creating an Apache Spark pool
    Working with Azure Synapse notebooks
    Reading data from Data Explorer pools
    Plotting charts
    Performing data transformation tasks
    Creating a lake database
  Summary
7. Data Visualization with Power BI
  Technical requirements
  Introduction to the Power BI integration
  Creating a Power BI report
  Adding data sources to your Power BI report
  Connecting Power BI with your Azure Synapse workspace
  Authoring Power BI reports from Azure Synapse Studio
  Summary
8. Building Machine Learning Experiments
  Technical requirements
  Understanding the application of ML
  Introducing ML into your projects with AutoML
    Creating an Azure Machine Learning workspace
    Configuring the Azure Machine Learning integration
    Finding the best model with AutoML
  Exploring additional ML capabilities in Azure Synapse
    Using pre-trained models with Cognitive Services
    Finding patterns using KQL
    Training models with Apache Spark MLlib
    Building applications with SynapseML
  Summary
9. Exporting Data from Data Explorer Pools
  Technical requirements
  Understanding data export scenarios
  Exporting data with client tools
  Using server-side export to pull data
  Performing robust exports with server-side data push
    Exporting to cloud storage
    Exporting to SQL tables
    Exporting to external tables
  Configuring continuous data export
  Summary
11. Tuning and Resource Management
  Technical requirements
  Implementing resource governance with workload groups
    Managing workload groups
    Classifying user requests
    Queuing requests for delayed execution
  Speeding up queries using cache policies
  Summary
12. Securing Your Environment
  Technical requirements
  Security overview
  Managing data encryption
    Configuring data encryption at rest
    Understanding data encryption in transit
  Implementing network security
    Using a managed virtual network
    Managed private endpoint connection
    Enabling data exfiltration protection
    Controlling public network access
13. Advanced Data Management
  Technical requirements
  Managing extents
    Extent tagging
    Moving extents
    Dropping extents
  Purging personal data
    Enabling purge on Data Explorer pools
    Executing data purge operations
    Monitoring data purge operations
  Summary
Index
Chapter 2, Creating Your First Data Explorer Pool, gets your hands busy by walking you through the
creation of your first Azure Synapse workspace and a Data Explorer pool using the Azure portal, Azure
Synapse Studio, or the Azure Command-Line Interface (CLI). If you are not familiar with Azure yet,
don’t worry; this chapter guides you through the steps to create your first free Azure account, allowing
you to follow the examples in the book.
Chapter 3, Exploring Azure Synapse Studio, introduces the development and management environment
of Azure Synapse. You will learn about the user interface elements of Azure Synapse Studio, and where
to find what you are looking for by navigating through the hubs. You will also load some data into
a database and run your first query to help you familiarize yourself
with the query editor. This chapter closes with an overview of where to manage and monitor your
environment using Azure Synapse Studio.
Chapter 4, Real-World Usage Scenarios, describes some example solution architectures you can use in
common log and telemetry data analytics scenarios. It looks at five real-world use cases that integrate
Azure Synapse Data Explorer with other Azure services and helps you understand the blueprints so
that you can build your own.
Chapter 5, Ingesting Data into Data Explorer Pools, kicks off Part 2, Working with Data. It walks you
through the data loading process, helps you choose your own data loading strategy, and shows you
different ways to load data into Data Explorer pools. This chapter builds the data assets that you will
use in most chapters of the book.
Chapter 6, Data Analysis and Exploration with KQL and Python, is all about learning how to query,
transform, and get insights from your data using Kusto Query Language (KQL) and Python. You will
learn how to use KQL to explore the data you have at hand and familiarize yourself with the schema,
plot simple charts in the query editor, obtain percentiles, and even use native KQL commands to
look at trends in your data using linear regression. In the second half of this chapter, you will create
an Azure Synapse notebook to explore and transform data using Python and create a lake database.
Chapter 7, Data Visualization with Power BI, complements the previous chapter by helping you
configure Power BI integration with Azure Synapse and author new Power BI reports directly from
Azure Synapse Studio. It walks you through the creation of reports that connect to data in Data
Explorer pools, as well as to your new lake database.
Chapter 8, Building Machine Learning Experiments, provides an overview of applied machine learning,
and how to introduce advanced analytics to your Azure Synapse projects using automated machine
learning (AutoML). You will use Python to prepare your data for machine learning experiments,
train a series of models, and find the best model to help you predict values.
Chapter 9, Exporting Data from Data Explorer Pools, closes Part 2, Working with Data, by walking you
through data export scenarios. It explains scenarios where data exports are needed and walks you
through different options you have available to perform data exports, including continuous data exports.
Chapter 10, System Monitoring and Diagnostics, is the first of four chapters in Part 3, Managing Azure
Synapse Data Explorer. In this chapter, you will learn about managing a platform-as-a-service (PaaS) offering
such as Azure Synapse, and which parts of the service you should be concerned with. Through code
examples and guidance through the user interface, you will learn how to stay on top of your Data
Explorer pools and proactively monitor them. By setting up alerts, you’ll learn how to get notified on
your phone if an event of interest happens in your environment.
Chapter 11, Tuning and Resource Management, introduces resources to help you provide predictable
performance to end users and shows how to use cache policies to speed up queries. It walks you through the
implementation of resource management to help you categorize user requests to prioritize the execution
of critical workloads while queueing requests that can wait.
Chapter 12, Securing Your Environment, provides you with the information you need to make sure
your data is secure at rest and in transit, and that only people who are intended to access your data
have access to it. It walks you through an overview of the security issues you need to consider for
your own implementations, how to double-encrypt your data for an added layer of security, how to
authenticate and authorize users, and how to protect the network environment your data travels through.
Chapter 13, Advanced Data Management, covers how to adhere to governmental regulations for data
handling, including how to permanently purge personal data. You will learn how to use extents, or
data shards, in Azure Synapse Data Explorer to move large volumes of data quickly for archival.
The Azure portal and Azure Synapse Studio are web-based tools that are used to manage, develop,
and build solutions for Azure Synapse Data Explorer. Microsoft supports the latest versions of the
following browsers: Microsoft Edge, Safari (Mac only), Chrome, and Firefox.
To install Power BI Desktop, visit https://learn.microsoft.com/power-bi/fundamentals/desktop-get-the-desktop.
To install the Microsoft Azure App, visit http://aka.ms/getazureapp on your mobile device,
or look for the Microsoft Azure App in your device’s app store.
If you are using the digital version of this book, we advise you to type the code yourself or access
the code from the book’s GitHub repository (a link is available in the next section). Doing so will
help you avoid any potential errors related to the copying and pasting of code.
Conventions used
There are a number of text conventions used throughout this book.
Code in text: Indicates code words in text, database table names, folder names, filenames, file
extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: “To create
or alter a new workload group, use the .create-or-alter workload_group command.”
A block of code is set as follows:
Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words
in menus or dialog boxes appear in bold. Here is an example: “To enable it, you must select the Enable
option next to Double encryption using a customer-managed key, in the Security tab of the Create
Synapse workspace wizard.”
Get in touch
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, email us at
customercare@packtpub.com and mention the book title in the subject of your message.
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen.
If you have found a mistake in this book, we would be grateful if you would report this to us. Please
visit www.packtpub.com/support/errata and fill in the form.
Piracy: If you come across any illegal copies of our works in any form on the internet, we would
be grateful if you would provide us with the location address or website name. Please contact us at
copyright@packt.com with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you
are interested in either writing or contributing to a book, please visit authors.packtpub.com.
To maximize your learning experience, you should first become familiar with the core concepts
and tools used in the book's examples, and with how they can help you in real-life projects. The
first part of the book focuses on introducing
Azure Synapse Data Explorer and all of its layers. You will learn about the service architecture, all
of the platform elements within Azure Synapse, and how to create your own lab environment to
run through the book examples. You will also become familiar with Azure Synapse Studio, the
development and management interface of Azure Synapse. Finally, you will learn about solution
templates from real-world usage scenarios that will help you speed up your own Azure Synapse Data
Explorer implementations.
This part comprises the following chapters: