PolyBase
Revealed
Data Virtualization with SQL Server,
Hadoop, Apache Spark, and Beyond
—
Kevin Feasel
PolyBase Revealed: Data Virtualization with SQL Server, Hadoop, Apache Spark,
and Beyond
Kevin Feasel
Durham, NC, USA
Table of Contents
Acknowledgments
Introduction
Chapter 11: Query Tuning with Statistics and Execution Plans
  Statistics in SQL Server
    Statistics on External Tables
    Managing External Statistics
    The Performance Impact of Statistics
  Tuning Queries with Execution Plans
    Reviewing an Execution Plan
    Reviewing MapReduce Queries
  Conclusion
Index
About the Author
Kevin Feasel is a Microsoft Data Platform MVP and CTO at
Envizage, where he specializes in T-SQL and R development,
forcing Spark clusters to do his bidding, fighting with Kafka,
and pulling rabbits out of hats on demand. He is the lead
curator at Curated SQL (https://curatedsql.com).
A resident of Durham, North Carolina, Kevin can be found
cycling the trails along the Triangle whenever the weather is
nice enough.
About the Technical Reviewer
Ike Ellis has over 18 years of experience in data engineering.
He’s a current Microsoft MVP. He is the General Manager of
Data & AI for Solliance. He is a partner in Crafting Bytes,
a San Diego software studio and Data Engineering group.
He is an author who has written several books on Microsoft
data and Azure topics. Ike has spoken at conferences around
the world including PASS Summit, SQLBits, DevIntersection,
TechEd, TechEd Europe, and SQL in the City. He’ll be
speaking at Craft in Romania in 2020. For more information,
see www.ikeellis.com or follow him on Twitter at @ike_ellis.
Acknowledgments
This book is the product of many hours of research, rants, and head-scratching
moments. Most of the head-scratching was mine, and I could not have written this book
without the help of some very smart people. First, I would like to thank several Microsoft
employees. Murshed Zaman gets top billing here for starting me on my PolyBase
journey and helping solve my initial problems. James Rowland-Jones is next—James
patiently explained various PolyBase details to me over and over, until I finally got
them. Eric Burgess, Nathan Schoenack, and Suresh Kandoth at Microsoft CSS helped
me understand common pain points and shaped the direction of several chapters,
particularly Chapter 5. Finally, Jasraj Dange and Weiyun Huang helped connect the
dots on PolyBase V2 functionality and left me excited about the future of the technology
beyond SQL Server 2019.
I am grateful to Bill Preachuk and Scott Shaw at Cloudera (and previously, when they
were at Hortonworks) for helping me through several land mines early on when working
with Hadoop. I’ll never forget going back and forth between Murshed at CAT and Bill
and Scott at the Hortonworks booth at PASS Summit trying to get PolyBase and the
Hortonworks Data Platform sandbox working at the same time on my laptop.
In addition to these two companies, I am heavily indebted to several people in the
community. Ike Ellis is the best technical editor I could have asked for, at least until he
calls in the favor. Gerhard Brueckl was great to bounce ideas off of and provided me with
insights I never would have discovered otherwise. Jason Horner helped me test things I
thought I knew about the product and made this book better as a result. Hasan Savran
taught me most of what I know about Cosmos DB, though don’t hold my limitations
against him. There are plenty of others who played a role in this book, and I thank you all
for it.
As is natural in these works, any errors which remain are mine and mine alone,
although I will probably try to blame gremlins.
Introduction
This is an exciting time to be a data platform professional. Over the past decade, we have
seen a proliferation of data platform technologies, all trying to solve the critical problem
of our era: collecting, storing, managing, and querying ever-increasing amounts of data.
To solve this problem, we have seen the rise of technologies like Apache Hadoop, Apache
Spark, Google BigTable, Amazon Redshift, Microsoft Azure Synapse Analytics, and plenty
more. In the meantime, the hard core of the data platform space—relational databases—
has not ossified. Each new edition of SQL Server, Oracle, and PostgreSQL has new
features and the ability to handle even more data. As great as these relational database
platform products are, however, they do not fit every use case. Where there are gaps,
other products fill the void. This leaves data platform developers at most enterprises—
even companies of moderate size—juggling data between several systems.
Historically, the way we managed this juggling act was to learn a separate language
for each platform: T-SQL for SQL Server; PL/SQL for Oracle; HiveQL for Apache Hive;
Spark SQL, Scala, or Python for Apache Spark; and so on. The common adoption of SQL
as a general interface (sometimes in spite of product developers’ wishes) has simplified
the task, but each product has its own dialect of ANSI SQL, and it can be difficult to
remember which operators and functions exist in one database vs. another.
Furthermore, the most common task I see in this space is some variant of Extract-
Transform-Load (ETL): moving data from one system to another, sometimes reshaping
it along the way, in order to combine the products of two separate systems. We spend so
much time moving data within systems, going from write-heavy Online Transaction
Processing (OLTP) systems into reporting-friendly Online Analytical Processing (OLAP)
databases. Add different data platform technologies and the problem grows even further:
now we need to combine that general ledger data from Oracle, device statuses from
Spark, device metadata from SQL Server, customer data entry from Cosmos DB, and
historical rollups from Teradata in order to build a neural network which will solve all of
our problems. The traditional approach has been to use purpose-built ETL tools like SQL
Server Integration Services or Informatica, or to write custom code in a programming
language of choice. These techniques work, but they are relatively effort-heavy, particularly
in maintaining this separate ETL process as the source systems evolve over time.
A modern take on the classic problem of ETL is data virtualization: making this data
appear to come from one source system while under the covers defining links to where
the data really lives. An end user or analyst can read this data using one SQL dialect and
join together structured data sets from different systems without needing to know the
provenance of each data set and without waiting for database developers to build in the
plumbing needed to move data from one system to the next. This greatly simplifies the
analyst’s life and is one of the key selling points of Microsoft’s PolyBase technology.
PolyBase has been around since 2010 but came to the general public in SQL Server
2016. Its purpose was to integrate SQL Server with Hadoop by allowing us to run
MapReduce jobs against a remote Hadoop cluster, bringing the results back into SQL
Server and thus reducing the computational burden on our relatively more expensive
SQL Server instances. Now, with SQL Server 2019, PolyBase has grown and adapted to
this era of data virtualization. As you will see throughout this book, PolyBase gives us
the ability to integrate with a variety of source systems. In the book, we will connect to
a Hadoop cluster, Azure Blob Storage, other SQL Server instances, an Oracle database,
Cosmos DB, an Apache Spark cluster, Apache Hive tables, and even Microsoft Excel! This
leaves out the wide variety of other data sources, such as Teradata, MongoDB, DB2, and
many more. The best part of it is that our developers need only one language for all of
this: T-SQL.
PolyBase is no panacea, and there are certainly trade-offs compared to storing all
data natively in one source system, particularly around performance. If you do, however,
have existing, disparate systems which need to interact, PolyBase has a few tricks up its
sleeves to make those integrations easier.
This book is intended for database developers, database administrators, and
architects looking to solve multisystem integration problems. My key assumption
throughout this book is that you are already familiar with the T-SQL language but
might be less familiar with different data platform technologies such as Hadoop, Spark,
or Cosmos DB. Naturally, having more experience with these other data platform
technologies will help considerably when dealing with the headaches which come when
trying to interconnect disparate systems.
My intent in this book is as much narrative as reference, meaning that the best way
to read the book is in chapter order. Even if you do not make use of a particular data
platform technology, there can be key components in the chapter which apply to other
technologies. Where this is particularly important, I will point it out so you do
not miss out on critical information.
caution, but if no such habits are the result, danger ceases to excite
this emotion, and a man becomes at once fearless and careless. So
with sympathy in the sufferings of others; if no habits of benevolent
efforts to relieve are induced, that sensibility diminishes, and men
become at once unsympathising, hard and cruel. So it is with
shame; if it does not lead to habits of honor and duty, the
susceptibility continually diminishes. And so it is with remorse; if
habits of rectitude are not induced by its emotions, the conscience
becomes “seared as with hot iron.”
One point in the history of our race has a mournful pertinence to this
question. We find that the improvement and the safety of the great
commonwealth is always, more or less, promoted by the ruin of
individuals. Multitudes are deterred from evil courses by the
miserable end of those who pursue them; so that the good are often
preserved by the destruction of the bad.
So, too, we find exhibitions of the fact that minds are utterly ruined,
and ruined for ever, so far as we can perceive. The man who has
stultified his intellect, ruined his health, seared his conscience, and
blunted all his generous and benevolent sensibilities by a course of
debauchery, cruelty and crime, is a wreck as total and irretrievable,
so far as we can see, as a watch whose springs and pivots are
crushed beneath the hammer, or a human body whose every
lineament is effaced beneath the rushing locomotive train.
Add to this the teaching of experience, that when men are bad, the
increase of blessings only increases indulgence and crime. At the
same time punishment does not tend to reformation. The more men
suffer for their folly and guilt, the more hardened they become. The
victims of licentiousness and intemperance, though they suffer such
miseries, have ever been regarded as the farthest removed from the
probabilities of reformation.
That any portion, either of matter or mind, is to be annihilated, can
not be inferred from any past experience. All that we can learn are
the laws of perpetual succession and change. One single fact of
annihilation has never yet been made known to man by any process
of reasoning, or any recorded experience.
Here, as before, we have only the nature and past history of mind,
from which the future is to be deduced. In this world we have found
the changes in the character of individuals and of communities to
proceed by slow and imperceptible movement. We have nothing in
the past to lead to the belief that this slow process of discipline,
culture and change may not proceed on for ages. As in this life,
multitudes have the impress and direction of character given in early
life, so that the first few years determine all their future history in
this world, so the career of this short life may fix the future through
eternal years. And yet the process of change to the full
consummation of character may involve ages.
In studying the works of the Creator, we find that every thing goes
forward on a system of developments. Nothing comes into being in
full perfection, and unless there is an interruption of the natural
tendencies of things, every thing reaches its full and perfected state
before its existence ends. And the nobler, larger, and
grander the existence, the slower it proceeds to its consummated
perfection. The oak and the palm demand centuries ere they reach
their perfected prime. The highest grades of animal life are slowest
in gaining their full development. The horse, the elephant, and the
camel, are going forward to perfection for years after the feebler
tribes that started with them have perfected and perished.
We can suppose the body a veil to hide our mind from another, and
that death makes every soul “open and naked,” in all its thoughts
and feelings, to every other disembodied spirit. What would be the
effect of such a revelation, no one could say. But we should fear
rather than hope.
The conduct and character formed in this life will have an abiding
influence on the character and happiness of every mind through
eternal ages.
Chapter XXIX. What Must We Do To Be Saved?
The next question is, what are the teachings of reason and
experience as to the most successful modes of securing true virtue,
or voluntary obedience to all the laws of God?
Mind itself is the only producing cause of its own volitions. Excited
desires, and those objects which excite desire, are the occasional
causes of choice.
The question is, in what sense can any being be the cause of
virtuous actions, or virtuous character, in another mind?
Here we must recur to the fact that the Creator, as the author of all
minds, and of all the things that excite desire, is the cause, in one
sense, of all the volitions and of all the characters of all finite minds.
It is in this sense that, in the Bible, the Jehovah of the Old
Testament says, “I make peace and create evil.” No other being but
the Creator can be regarded as the cause of volitions in this sense,
viz., as the author of all minds and their circumstances of
temptation.
The only sense, then, in which God can be called the author or
cause of sinful volitions in the minds of his creatures, is the fact that
he is the author of all created minds and of their circumstances of
temptation.
In regard to man, there are only two conceivable modes, in which he
can be the cause of sinful or virtuous character in other minds.
When these two modes are employed with the design to induce
wrong action, then men are blamable causes of sinful action and
character in their fellow men. God, as above shown, never thus
causes sin. When these modes are employed with the intention to
induce virtuous actions and character, then both God and man are
causes of right moral action in mankind.
The blamable causes of all failure in right and virtuous action are self
and the finite educators of self. The unblamable causes are God,
educators and self, so far as they are faithful in doing all they can to
educate aright.
The next thing that has been found efficacious in forming virtuous
character is the formation of uniform habits of obedience to parental
rule, in the early periods of existence. To secure this,
invariable steadiness in government has been found indispensable. If
a child finds that sometimes he is to obey and sometimes he is not,
there is always a temptation to struggle against law. But if a parent's
laws, rewards and penalties are as steady and sure as those of God,
in due time the child submits as cheerfully to the domestic rules and
commands, as he does to the laws of nature. He is no more tempted
to contest parental commands than he is to attempt to stop the flow
of a river or the falling of rain. In this way a habit of submission to
law is generated, which makes all the future discipline and training
of life comparatively easy. A child learns cheerfully to obey a
heavenly Father, just in proportion as he thus obeys his earthly
parents.
Sympathy with a child in all its trials and in all its enjoyments, still
further increases this power of another mind in right guidance.
This sympathetic influence is greatly increased by the power of a
virtuous example—especially if this example is exhibited by a
beloved friend and benefactor, who would be gratified by thus
guiding a dependent mind.
These are the influences which experience has shown to be
most effective in securing virtuous character.
In view of the above teachings, each one for himself must seek,
first, knowledge of the laws of God, and of their rewards and
penalties as discovered by the experience of mankind. In order to do
this, each must take all means to gain true teachers, and to receive
their teachings in true faith, that is, that practical faith, which
includes the purpose of obedience. Each must cultivate the intellect,
the reason and the moral sense, in order to judge correctly in
receiving and applying the rules of rectitude; each must seek to
discover the reasonableness and benevolence of these laws, and
form habits of steady obedience; each must seek to discover and
rightly to appreciate all the good and lovable qualities of all who
institute and administer laws, from the Creator to all subordinate
rulers and governors in the domestic and civil state; each must seek
the society of those whose sympathy and example would encourage
and promote virtuous conduct; and finally, each must make
obedience to all the laws of God the chief end or ruling purpose.
This, briefly, is the reply to the great question in relation to self.
Here we are to take into account two subjects previously
illustrated; the first is that great law of sacrifice, by which each
individual must make his own wishes and welfare subordinate to the
higher interests of the great commonwealth; the second is the fact
that all questions of right and wrong are dependent on the risks and
dangers that threaten the commonwealth. In cases where there is
little peril or evil, each individual has little responsibility for others.
On the contrary, when all are exposed to terrific dangers and
hazards, every individual is bound to think and care as much for the
danger of each one as for his own. And just as much as the interests
of all are of more value than those of one, so much more should
each place the public welfare above that of self.
In regard to those who are the educators of the young, each must
strive to maintain that invariable steadiness in government which is
so effective in forming virtuous habits and in rendering obedience to
the laws of God more and more easy.
Finally, it should be the aim of each to establish such a community
around all who are being trained to virtue, that every social influence
shall repress vice and encourage virtue.
To suppose that God can impart at creation of each mind all the
knowledge of the millions of rules needed for all the
myriads of new relations, of myriads of beings through all eternity, is
to suppose an impossibility in the nature of things.
These things being granted, the teachings of experience would lead
us to suppose, still farther, that the Creator must do all that is
possible to maintain invariable steadiness of government. We can
see that this, which is so important in family government, must be
still more so in an infinite family. For this end, the natural penalties
for wrong doing, must be as invariable as the rewards for well doing.
Again, the Creator must instruct his creatures in his laws and their
rewards and penalties to the full extent of his power. That is to say,
he must provide well-trained educators of mind, as fast and as fully
as is possible in the nature of things, having in view the results of
eternal ages to guide his decisions.
Whether mankind ever have, or ever would, fully evolve this system
of religious belief without any aid by revelation from the Creator, is a
question which we can not readily decide—inasmuch as the claim of
Christianity is, that from the first, our race have been instructed by
revelations from God, which have been more or less preserved in
traditions and written records. It is certain that the
elimination of this system, by unaided humanity, is dependent on the
development of both the intellectual and moral powers, just as much
so, as the physical discoveries of Newton, Copernicus and Columbus
were dependent on the intellectual progress of the race.
We will now notice how far the system of Boodh corresponds with
that of common sense.
Sins are divided into these three classes: first, those of the body,
such as killing, theft, fornication, etc.; those of the tongue, as
falsehood, harsh language, idle talk, etc.; and those of the mind, as
pride, covetousness, envy, heretical thoughts, etc.
For all such sins the most awful conceivable punishments are to
follow in a future state, and for millions of ages.
The Boodhists have a hierarchy very much like the Catholic church,
with varied grades and ranks. The priests are required to practice
celibacy, and are mainly supported by voluntary gifts from the
people.
There are eight principal hells; four that torment with cold and four
with heat. In the other hells are other sufferings, although not
connected with heat and cold. Worms bite, bowels are torn out,
limbs are racked, bodies are lacerated, they are pierced with hot
spits, crucified head downward, gnawed by dogs, torn by vultures.
These are described with minuteness in the Bedegat and often
depicted by the native artists in drawings, reminding one of Dante's
Inferno illustrated.
For killing a parent or a priest a man will suffer in one of the hells of
fire for inconceivable millions of ages. Denying the doctrines of
Gaudama incurs eternal suffering in fire. Insulting women, old men
or priests, receiving bribes, selling intoxicating drinks and parricide,
are punished in the worst hell.
Merit gained by any good conduct in these hells enables the person
to rise even to the celestial regions.
The system of Boodhism commenced about six hundred years
before Christ, and has pervaded eastern, central and southern Asia
about as long and as fully as Christianity has pervaded Europe. The
Burman empire, where this account of that faith was obtained,
presents the most favorable results of this system on the character
and condition of its votaries.
In Thibet and Tartary, the religion of the Grand Lama chiefly prevails,
which is one form of Boodhism.
In past ages the two most highly developed heathen nations were
those of Greece and Rome, and of their religion we have the fullest
records. It is not probable that any one will consider their system of
religion superior to this now exhibited of modern paganism.
The result is that the most highly developed heathen nations, as yet,
have attained but very imperfectly the system of common sense.
No heathen religion ever taught an eternally-existing Creator, perfect
in knowledge, wisdom, power and benevolence. None ever taught
that the chief end of our Creator is happiness-making on the
greatest possible scale. None ever taught that this also is the chief
end for which man is created. None ever taught that right moral
action, or true virtue, consists in good willing toward the Creator,
toward self, and toward our fellow-beings, according to the laws of
the Creator, so that every mind shall make the good of self
subordinate to the general good. None ever taught that all questions
of right and wrong, or what is for the best, are to be decided with
reference to the risks and dangers of a future life. None ever
presented communion with, and the care, sympathy, sacrifices, and
example of a “long-suffering” Creator, as motives to secure
virtuous self-sacrifice from his creatures. If all this is taught by
revelations from God in the Bible, it is what was never taught by any
other religion yet known on earth.
These have been the mournful questionings of every age and every
race, while the wisest sages of the wisest nations, without a
revelation, have been unable to give any satisfactory reply.
Greece and Rome were the most civilized of all ancient nations, and
they give us Socrates, Plato, Aristotle and Cicero, as their best and
wisest men, who most deeply pondered these great questions.
Aristotle held to one superior deity, but taught that the stars are true
and eternal deities. Cicero leads to the belief of many gods, and
approves of worshiping distinguished men as gods. Socrates held to
a plurality of deities, and also to transmigration. He held
that the common sort of good men will go into the forms of bees,
ants, and other animals of a mild and social kind. Plato held to two
principles, God and matter, and that God was not concerned either in
the creation or government of this world. He argued for the
immortality of the soul on the ground of its pre-existence, and
concludes some of his speculations thus:
“We can not of ourselves know what will be pleasing to God, or what
worship to pay him; but it is needful that a lawgiver be sent from
heaven. Such an one do I expect, and O how greatly do I desire to
see him, and who he is!”
Chapter XXXI. Augustinian Creeds and Theologians Teach the Common-Sense System.
In the former portion of this work the Augustinian theory, with the
system based on it, has been presented as it is taught by creeds and
theologians. In contrast with it, has been presented the common-
sense system of religion as evolved by reason and experience.
The evidence will now be presented, to show that those who teach
the Augustinian system, at the same time teach the main points of
the common-sense system; and where the two systems are
contradictory, that they teach both sides of the contradiction, at
once affirming and denying the same things.
A leading feature of the common-sense system is, that the nature of
the human mind is our only guide to the natural attributes of God.
“In the very frame and constitution of his nature he still bears
the natural image of his Maker. In a word, man is the living
image of the living God, in whom is displayed more of the
divine nature and glory than in all the works and creatures of
God upon earth.”
The celebrated Scotch metaphysician, Sir W. Hamilton, says:
In proof of this from the Bible, these writers quote from the Apostle
James, that “men are made after the similitude of God.”
Another leading feature of the common-sense system is the position,
that we can discover the chief end or design of the Creator, by the
nature of his works, and that this end is to produce the greatest
possible happiness with the least possible evil.