Refactoring Legacy T-SQL for Improved Performance: Modern Practices for
SQL Server Applications
Lisa Bohm
Chardon, OH, USA
Table of Contents

Introduction
Chapter 2: Documentation
    Incorporating Existing Documentation
    Functionality Documentation
Index
About the Author
Lisa Bohm leads a team of database administrators (DBAs)
for a software development company. Her history with
legacy database code began early in her career with a
summer project to rewrite the chemical inventory database
for the research division of a local VA hospital. From
there, she went on to build front-end web applications.
When the web calls timed out, Lisa dug in to learn what
databases can do. She has since transitioned into database
administration, inheriting and improving legacy applications
along the way. Her personal focus remains on solid database
architecture and writing well-performing T-SQL.
About the Technical Reviewer
Kathi Kellenberger is a Data Platform MVP and the editor of Simple Talk at Redgate Software. She has worked with SQL Server for over 20 years. She is also co-leader of the PASS Women in Technology Virtual Group and an instructor at LaunchCode. In her
spare time, Kathi enjoys spending time with family and friends, singing, and cycling.
Acknowledgments
I would like to thank all of the people who believed in me, encouraged me, and pushed me to continue to grow and learn. Special thanks go to Mindy Curnutt, Eric Blinn, and Tim
Tarbet, who showed me how amazing someone can be at the job they choose to do, and
believed that I could be that good too.
I cannot leave out the people who work for me. I lead a wonderful team of involved
people who are active in their continued learning, and continue to inspire me every day
by finding solutions to really difficult problems.
Also thank you to my family (including my #sqlfamily) who have always been
supportive, loving, and unstinting of hugs and moral support when needed!
Introduction
What is legacy code? There are a few definitions floating around out there, but as a
working definition, we’re going to use the following:
Legacy code is code that is no longer being actively supported by the people who
wrote it.
Why are we going to use that? In software development, good documentation goes a
long way. Developers should understand what code is trying to accomplish and how it’s
trying to do so. When documentation doesn't exist or isn't as thorough as it should be, and the original programmers aren't available to explain why something was written a particular way, the code can be a nightmare to fix. In some cases, it may not even
be clear whether code ever worked as intended, or if the functionality of the change
someone is requesting is within the original intent of the programmer(s).
A Tale of Woe
How does legacy code start? Let’s look at this story. Code is written to solve a problem –
for example, someone is copying data into Excel every day and doing some hand
manipulation to generate a graph to add to a larger report. A developer sets up a quick
application to pull the data from the database and export it into Excel automatically for
the user, also performing the calculations the user was doing by hand.
This user then trains their successor and another person in the department on how
to view this report. One of them says, “Hey, this is great! Can you also make it pull data
for this other report and we can show how these numbers reflect against each other?”
Someone else loves the additional functionality but needs the code to work in a different
way, or do different statistical calculations, or needs to add an additional field on the
report. That person’s manager is intrigued by the functionality and wants a weekly
summary report to review. Code structure starts to resemble something that is cobbled
together, as multiple developers add bits of functionality over time. Oftentimes, there
is little to no documentation on the functionality or the choice of code – everyone just
adds a bunch of lines at the end of the code to handle the small part they were asked to
develop.
Many times, front-end developers don't specialize in T-SQL, so they rarely have a deep understanding of the SQL Server optimizer. Especially in the case of "let's just
add lines of code to the bottom of this to handle additional functionality,” calls to the
database can increase exponentially; in many cases, calls grab the same data over and
over. And, oh, by now, over half the company is using this app in one way or another – or
perhaps three ways. The vast majority of these uses, by the way, were never intended by
anyone who had ever touched the code.
Users complain about slowness and performance. Even more frustrating, all of the
other business-critical applications that use the same database(s) become slower and
slower as they fight for resources and locks with the application and its chatty data calls.
Also, of course, every developer who has ever touched this application has moved on or has been promoted and hasn't looked at code for years, so has no recollection of ever
manipulating any code even remotely similar to this patched-together behemoth that is
rampaging through the company infrastructure.
Congratulations!
You have inherited one of these types of applications, or you probably wouldn’t be
here reading this book. Although there will be (possibly many) times that you may
want to cry, yell, or swear, this will also give you some unparalleled opportunities to be
a hero and pull off some very spectacular-seeming fixes. Just remember, though, that
when you really fix something amazing, most people will be completely oblivious to
that fact. Then, when you do something you think is so obvious that a worm out there
on the sidewalk could probably manage it, you may get so many congratulations and
thanks that you’ll wonder if you really did something magical. That is probably more
of a general life/job observation and not related specifically to legacy code, but it’s also
prevalent here.
We are going to continue on from the point of “Okay, this has been identified as an
issue. Now what do I do with it?” Most of what we’ll be doing is actually looking at the
code with the help of a few performance measures and learning about best practices to
help identify problem areas. You should be familiar with basic T-SQL coding syntax and
techniques and how to do a bit more advanced querying.
We will cover most of the issues commonly seen by object type, as well as a couple
of less common problems just for fun. Once these problem areas within the object are
identified, you can then mitigate the performance issues with relatively low effort and
cost. Some objects may require a much deeper dive. Once we’ve done some triage to
help alleviate the immediate pain an object is causing, we will cover what is involved in
effectively performing the deeper dive.
We also will talk about how to quickly tell if you’re on the right track in terms of the
fixes you want to apply. We’ll go over some simple (and free) tools that can be used to
measure performance, so you can document the before/after metrics to go along with
the rest of the documentation you’re going to be sure to add so the next poor sod
(I mean the next person) who has to maintain this system will have an easier time of it!
This is not a deep dive into indexes, statistics, or database maintenance. I’d hope
that you have all of those things set up. If you do not, please go find a book that discusses
those items and make sure you are doing them! If you are not in charge of the database
administration specifically, talk to whoever is and make sure THEY have those bases
covered. We will definitely mention indexes and statistics, but there is much more
complete information on them in other places, so our discussion will be more of a flyby than an exhaustive treatment.
This is also not a way to identify problematic hardware, external forces, or database objects.
We mentioned in the preceding text what the assumptions are. If you skipped those, please
go back and at least read those points so we’re all on the same page when we start our triage.
The Tools
We will be using a data dump from Stack Overflow. Brent Ozar kindly provided it on his web site at www.brentozar.com/archive/2015/10/how-to-download-the-stack-overflow-database-via-bittorrent/, which he in turn got from the awesome Stack Overflow folks.
I will also include a backup with the files for the book so everyone can start on the
same foot. If you want to follow along with the examples, please make sure to run the
database setup script (included in the files for the book) after restoring the database
backup to add the additional objects we’ll be using throughout the book.
SQL Server Management Studio (SSMS) will be the code-running application of choice. It is a free download from Microsoft: https://docs.microsoft.com/en-us/sql/ssms/download-sql-server-management-studio-ssms
SentryOne Plan Explorer is also a free download. We will be using it to look at
execution plans and performance metrics related to the plan:
www.sentryone.com/plan-explorer
The last tool is a web site. You can paste in your exhaustively long statistics IO/time
output, and it will parse it into nice handy tables. It is worth its weight in… um… bitcoin?
If you’ve ever spent time trying to wade through output from a cursor or loop, you’ll
completely understand. Please be forewarned though that there is a limit to how much
you can dump in and have it still actually regurgitate output:
http://statisticsparser.com/
Let’s Go!
Now that we’ve gotten that out of the way, let’s pull up the code for our first painful
object and go do some first aid!
PART I
Everything Is Slow
CHAPTER 1
T-SQL Triage
The most dreaded call a DBA can get is this: “Everything in the application is slow! FIX IT!”
In many cases, the database is one of the culprits of the poor performance of the legacy
code. When you are approached to deal with some of these issues, what do you do? Well,
first you identify painful code. We’re going to assume that you (or someone else) have
already identified the areas of concern. Once that happens, we need to assess the situation, starting with the relative effort a fix will take and where the specific problem areas are.
Relative Effort
If this is a several-thousand-line-of-code object, just shake your head sadly and walk
away. Just kidding! However, the effort to just understand what is going on will be
significant, let alone the effort to actually fix the performance issues. This is where great
documentation comes into play, but it’s unlikely that if you’re reading this book, you
have that kind of help at your disposal.
Additionally, you need to think about QA (quality assurance) effort. The best
scenario is to never EVER let untested code go into production. By untested, I mean code
that hasn’t passed a rigorous QA process. Running something once on your laptop seems
like a great idea, but there are all sorts of weird scenarios that you may not be aware of,
that a QA professional will have a much better grasp on.
How many distinct pain points are in this code? This will be the biggest factor in determining what kind of triage we can perform to put smiles on the users' faces (or
at least make them stop complaining to your boss). Regardless of any of these answers
though, if code has been identified as a real problem, triage is only your first step.
Even if you “fix” it with an index or other smooth wizardry, DO NOT STOP there! Fully
document the code and rewrite if necessary.
Problem Areas
“Hello? Sam DBA? Every time a user tries to change their display name, the application
hangs for seconds, and it’s really frustrating everyone.”
The best way to see what’s going on with code is to go run some of the code that is
causing problems. Sometimes, settings outside of the actual SQL code can cause issues
as well. Application connection strings can set ANSI options, which can make the same SQL code behave very differently than it does when run from SQL Server Management Studio (SSMS), for example. We're going to assume that this has already been ruled out and isn't the
cause of what we’re seeing in this book, but I wanted you to be aware that if you haven’t
checked into those possibilities, you probably should.
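If you do want to rule session settings in or out quickly, the session DMV exposes them. The query below is a sketch of my own, not one of the book's listings; viewing sessions other than your own requires VIEW SERVER STATE permission.

-- Compare the ANSI settings of your SSMS session (@@SPID) with those
-- of a suspect application session by substituting its session_id
SELECT session_id, ansi_nulls, ansi_padding, ansi_warnings,
       arithabort, quoted_identifier
FROM sys.dm_exec_sessions
WHERE session_id = @@SPID;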
This is really useful information that you just received from the caller. Be aware,
however, that people like to throw around terms like “always” and “never” pretty lightly.
Try to get details: “When you say every time, is it 100% of the time? Is it 90% of the time?
Is it any time of day or limited to certain times?” Document the answers to go along with
the ticket/request/complaint documentation.
Let’s go look at what’s happening when a user tries to change their display name.
We’re going to try to change the following users’ display names in the sections to follow
as we look for where the issues are. We’ll use Jon Skeet who has an Id of 22656 and stic
who has an Id of 31996. The query will be run against the dbo.Users table.
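If you're following along, a quick sanity check (again, my own query rather than one of the book's) confirms the two rows we'll be working with:

-- Confirm the two users we'll be updating
SELECT Id, DisplayName
FROM dbo.Users
WHERE Id IN (22656, 31996);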
In SSMS, open Query > Query Options > Execution > Advanced. Make sure you check BOTH the SET STATISTICS TIME and the SET STATISTICS IO boxes and then click OK.
If you want to go all scripting, simply type the code shown in Listing 1-1 into the query window and click Execute (or press Ctrl+E). Please note that either way you set STATISTICS
TIME and IO on, it will only be set for that specific query window or connection (SPID).
If the connection gets reset (e.g., restarting SSMS) or if you open another query window,
you will need to turn STATISTICS TIME and IO on again for the new query window.
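Listing 1-1 itself did not survive this extract; a minimal reconstruction is simply the two SET statements below.

-- Listing 1-1 (reconstructed): enable runtime statistics for this session
SET STATISTICS TIME ON;
SET STATISTICS IO ON;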
Code Tests
Next, we want to update a user’s name. So, first, I went and found a user by querying the
Users table and came up with a quick query to change a name. I ran it twice, because the
first run usually includes compile time. This user’s original name was “stic”, by the way.
We’ll change it back later.
Listing 1-2. Changing a display name

UPDATE Users
SET DisplayName = 'stic in mud'
WHERE Id = 31996;
Listing 1-3 shows what the STATISTICS IO and TIME output looks like for the query in
Listing 1-2.
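The raw output was also lost in this extract. Working backward from the totals in Table 1-1, it would look roughly like the following; the CPU and elapsed times are not recoverable, so they are left elided.

Listing 1-3 (reconstructed, abridged). STATISTICS IO and TIME output for Listing 1-2

Table 'Users'. Scan count 0, logical reads 3, physical reads 0, read-ahead reads 0.
Table 'WidePosts'. Scan count 2, logical reads 366192, physical reads 0, read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = ... ms,  elapsed time = ... ms.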
There is a much nicer way to look at the output in Listing 1-3. Go to the web site
http://statisticsparser.com/, and paste that output into the big window. Click the
Parse button, and scroll to the bottom of the page. You’ll get a nice summary table that
will give you what you need to know at a glance. This is the section labeled “Totals” on
the web page. Table 1-1 shows the read columns from the “Totals” section after running
the output shown in Listing 1-3 through the Statistics Parser web site.
Table 1-1. The "Totals" section read columns of the Statistics Parser site output for Listing 1-3

Table        Scan Count   Logical Reads   Physical Reads   Read-Ahead Reads
Total        2            366,195         0                0
Users        0            3               0                0
WidePosts    2            366,192         0                0
So what are we looking at with Table 1-1? When we’re using STATISTICS output
to help with query tuning, we’re mostly going to focus on logical reads. The first time
you run a query, if your data isn’t in memory, you will see higher physical reads. When
testing queries, I generally ignore the first run and focus on the second, which is closer to what is usually seen in production, where frequently called data will already be in memory. Also, by always using this method, we can make sure we're
comparing apples to apples.
We are seeing a LOT of page reads in Table 1-1, especially for the update of a single
row. But what is this WidePosts table? We weren’t updating that table… were we? No,
we were updating the dbo.Users table. Somehow, this WidePosts table is related to
the Users table. In SSMS's Object Explorer, are there any objects attached to the Users table? A foreign key
constraint? A… wait, a trigger? Hmm, let’s look at the definition of the trigger, which
is shown in Listing 1-4.
Listing 1-4. The trigger definition

/*****************************************************************
Object Description: Pushes user changes to the WidePosts table.
Revision History:
Date          Name       Label/PTS    Description
-----------   --------   ----------   -------------------
2019.05.12    LBohm                   Initial Release
*****************************************************************/
-- NOTE: the extract cuts off partway through this listing. The CREATE
-- TRIGGER header and everything after the inner IF EXISTS predicate are
-- completed below as a plausible sketch; the trigger name, the full
-- column comparisons, and the final UPDATE are assumptions.
CREATE TRIGGER dbo.Users_Update ON dbo.Users
AFTER UPDATE
AS
IF EXISTS
  (
  SELECT 1
  FROM INSERTED i
  INNER JOIN dbo.WidePosts wp ON i.Id = wp.OwnerUserId
  )
BEGIN
  IF EXISTS
    (
    SELECT 1
    FROM INSERTED i
    INNER JOIN dbo.WidePosts wp ON i.Id = wp.OwnerUserId
    WHERE i.Age <> wp.Age
      OR i.DisplayName <> wp.DisplayName  -- assumed: more columns compared here
    )
  BEGIN
    -- assumed: push the changed user columns into every matching row
    UPDATE wp
    SET wp.Age = i.Age,
        wp.DisplayName = i.DisplayName
    FROM dbo.WidePosts wp
    INNER JOIN INSERTED i ON i.Id = wp.OwnerUserId;
  END;
END;
Huh. Every time we update the Users table, the trigger pushes the change to any rows in this WidePosts table whose OwnerUserId equals the Users table Id. It's doing a LOT of reads to perform this task. Let's run another example, and this time we'll grab an
execution plan.
Execution Plans
An execution plan shows how SQL Server decided to run your query or statement. It
shows the operators that the optimizer chose. If you have never looked at one, it can be
confusing. The execution plan is XML data that can be parsed into a diagram of what
operators the SQL Server engine uses to fulfill your query request. (Well, that’s clear as
mud, right?) When a query is run, SQL Server uses the Query Optimizer to figure out
the best way to return data or perform the request. For each table that we’re getting data
from or performing an operation against, SQL Server will use one or more operators to
access that table. For example, if we need data from a large portion of the Posts table,
SQL Server will perform a “table scan” – that is, it will read all of the data pages for the
table – to find the data to return. The plan will also show how many rows SQL Server is
expecting to push along to the next operator.
The first things I look for at a quick glance are big fat lines (indicating a LOT of data, that is, a large number of rows, being pushed to the next operator). Also, I review which operators
show the most work being done. I use Plan Explorer to help sort operations by the work
being done per operation, which allows us to easily find the most expensive culprits. So
how do we get this information?
In SSMS, there is a button you can click above the query window. When you hover over it, it will show you the text "Include Actual Execution Plan". This icon is shown in
Figure 1-2, surrounded by a box.
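If you prefer to stay in scripts, SET STATISTICS XML ON returns the actual plan as an extra XML result set alongside your query output. This is standard T-SQL, though the book itself demonstrates the toolbar button:

-- Capture the actual execution plan for this session via script
SET STATISTICS XML ON;
-- run the statement you want to profile here
SET STATISTICS XML OFF;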
Statistics
An actual execution plan is simply an estimated execution plan with runtime information (such as actual row counts) added, so generally they're not very different. Statistics are metadata that
SQL Server keeps against columns in tables to indicate the distribution of data. In a
dictionary, for example, there are lots of words starting with the letters "st" and far fewer starting with "zy". This skew of data is tracked through statistics, which helps the
Query Optimizer find a better plan.
I usually use an actual execution plan because it’s also an easy way to see if statistics
might be out of date and require maintenance. If we get back a single row and the Query
Optimizer thinks we’re returning 3,000 rows, statistics maintenance might be needed.
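If you suspect stale statistics, one quick way to check, sketched here as my own example rather than the book's, is the sys.dm_db_stats_properties DMV:

-- When were the statistics on dbo.WidePosts last updated, and how many
-- row modifications have accumulated since?
SELECT s.name AS stats_name,
       sp.last_updated,
       sp.rows,
       sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID('dbo.WidePosts');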
Let’s run a query updating a different display name now.
Listing 1-5. Changing another display name

UPDATE Users
SET DisplayName = 'Rita Skeeter'
WHERE Id = 22656;
What can our execution plan tell us? Let’s take a look at the plan generated by
running the code in Listing 1-5. If you have SentryOne Plan Explorer installed, you
can right-click the execution plan in SSMS (look for the tab near your Results tab) and
choose to open the plan in Plan Explorer from the menu that appears.
Once you look at the execution plan in Plan Explorer, you’ll see something similar to
what we see in Figure 1-3. Please make sure you have clicked the third row of the Results
tab Statement output – we want to be looking at the query plan for the most expensive
(highest estimated cost) part of the query we ran.
Figure 1-3. Initial execution plan for the query in Listing 1-5
We see that the most effort was spent on a clustered index scan for WidePosts, since
the trigger updates the WidePosts table when the Users table is updated.
What is a clustered index scan? Well, a clustered index is the index where all of the
table’s data is stored, in the order specified by the index. There can only be a single
clustered index per table. If a table does not have a clustered index, it is considered a
heap. So a scan of the clustered index is comparable to a scan of the entire table.
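As an aside, you can confirm whether a table is a heap or has a clustered index from the system catalog; this query is my own illustration:

-- index_id 0 means the table is a heap; index_id 1 is the clustered index
SELECT index_id, name, type_desc
FROM sys.indexes
WHERE object_id = OBJECT_ID('dbo.WidePosts');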
Also, there are a lot of rows being read in that operation. Well, if we decided to
look in the WidePosts table, we’d find that there were 4,394 rows of data that included
the OwnerUserID corresponding to user ID 22656. That’s all well and good, but we are
reading over 850,000 rows. (We can see the 850,000 rows in Figure 1-3; it's the number
right under the wide arrow leading from the Clustered Index Scan.) This is a problem
that we will need to address.
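To foreshadow the kind of triage that can help here: a nonclustered index on the join column would let the trigger seek rather than scan. The index below is a sketch of my own; the name and design are assumptions, not necessarily the book's eventual fix.

-- Hypothetical supporting index for the trigger's join on OwnerUserId
CREATE NONCLUSTERED INDEX IX_WidePosts_OwnerUserId
    ON dbo.WidePosts (OwnerUserId);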