Synthetic Data for Deep Learning
Generate Synthetic Data for Decision Making and Applications with Python and R
Necmi Gürsakal
Sadullah Çelik
Esma Birişçi
Necmi Gürsakal, Bursa, Turkey
Sadullah Çelik, Aydın, Turkey
Esma Birişçi, Bursa, Turkey
Table of Contents
Introduction
Computer Vision
Summary
References
Index
About the Authors
Necmi Gürsakal is a statistics professor at Mudanya University
in Turkey, where he shares his experience and knowledge
with his students. Before that, he worked as a faculty member
in the Econometrics Department at Bursa Uludağ University
for more than 40 years. Necmi has many published Turkish
books and English and Turkish articles on data science,
machine learning, artificial intelligence, social network
analysis, and big data. In addition, he has served as a
consultant to various business organizations.
About the Technical Reviewer
Fatih Gökmenoğlu is a researcher focused on synthetic
data, computational intelligence, domain adaptation, and
active learning. He also likes reporting on the results of his
research.
His knowledge closely aligns with computer vision,
especially with deepfake technology. He studies both the
technology itself and ways of countering it.
When he’s not on the computer, you’ll likely find him
spending time with his little daughter, whose development
provides many inspirations for his work on machine learning.
Preface
In 2017, The Economist wrote, “The world’s most valuable resource is no longer oil,
but data,” and this becomes truer with every passing day. The gathering and analysis
of massive amounts of data drive the business world, public administration, and
science, giving leaders the information they need to make accurate, strategically-sound
decisions. Although some worry about the implications of this new “data economy,” it is
clear that data is here to stay. Those who can harness the power of data will be in a good
position to shape the future.
To use data ever more efficiently, machine and deep learning—forms of artificial
intelligence (AI)—continue to evolve. And every new development in how data and
AI are used impacts innumerable areas of everyday life. In other words, from banking
to healthcare to scientific research to sports and entertainment, data has become
everything. But, for privacy reasons, it is not always possible to find sufficient data.
As the lines between the real and virtual worlds continue to blur, data scientists have
begun to generate synthetic data, with or without real data, to understand, control, and
regulate decision-making in the real world. Instead of focusing on how to overcome
barriers to data, data professionals have the option of either transforming existing data
for their specific use or producing it synthetically. We have written this book to explore
the importance and meaning of these two avenues through real-world examples. If you
work with or are interested in data science, statistics, machine learning, deep learning, or
AI, this book is for you.
While deep learning models’ huge data needs are a bottleneck for such applications,
synthetic data has allowed these models to be, in a sense, self-fueled. Synthetic data
is still an emerging topic, from healthcare to retail, manufacturing to autonomous
driving. It should be noted that, since labeling processes start with real data, real data,
augmented data, and synthetic data all take part in these deep learning processes.
This book includes examples of Python and R applications for synthetic data
production. We hope that it proves to be as comprehensive as you need it to be.
—Necmi Gürsakal
— Sadullah Çelik
— Esma Birişçi
Introduction
“The claim is that nature itself operates in a way that is analogous to a priori reasoning.
The way nature operates is, of course, via causation: the processes we see unfolding
around us are causal processes, with earlier stages linked to later ones by causal
relations” [1]. Data is extremely important in the operation of causal relationships and
can be described as the “sine qua non” of these processes. In addition, data quality is
related to quantity and diversity, especially in the AI framework.
Data is the key to understanding causal relationships. Without data, it would
be impossible to understand how the world works. The philosopher David Hume
understood this better than anyone. According to Hume, our knowledge of the world
comes from our experiences. Experiences produce data, which can be stored on a
computer or in the cloud. Based on this data, we can make predictions about what will
happen in the future. These predictions allow us to test our hypotheses and theories. If
our predictions are correct, we can have confidence in our ideas. If they are wrong, we
need to rethink our hypotheses and theories. This cycle of testing and refinement is how
we make progress, both in science and in life, as scientists and as human beings.
Many of today’s technology giants, such as Amazon, Facebook, and Google, have
made data-driven decision-making the core of their business models. They have done
this by harnessing the power of big data and AI to make decisions that would otherwise
be impossible. In many ways, these companies are following in the footsteps of Hume,
using data to better understand the world around them.
As technology advances, how we collect and store data also changes. In the past,
data was collected through experiments and observations made by scientists. However,
with the advent of computers and the internet, data can now be collected automatically
and stored in a central location. This has led to a change in the way we think about
knowledge. Instead of knowledge being stored in our minds, it is now something that is
stored in computers and accessed through algorithms.
This change in the way we think about knowledge has had a profound impact on the
way we live and work. In the past, we would have to rely on our memory and experience
to make decisions. However, now we can use data to make more informed decisions.
For example, we can use data about the past behavior of consumers to predict what
they might buy in the future. This has led to a more efficient and effective way of doing
business.
In the age of big data, it is more important than ever to have high-quality data to
make accurate predictions. However, it is not only the quantity and quality of the data
that matter but also its diversity. Different data sources can provide different
perspectives on the same issue, which helps to avoid bias, and more data sources
can provide a more complete and accurate picture of what is happening in the world.
Machine Learning
In recent years, methods have been developed to teach machines to see, read, and hear
via data input. The starting point is the idea that the brain produces
output by passing inputs through a large network of neurons. In this framework, we
are trying to give machines the ability to learn by modeling artificial neural networks.
Although some authors suggest that the brain does not work that way, this is the path
followed today.
Many machine learning projects in new application areas began with the labeling
of data by humans to initiate machine training. These projects were categorized under
the title of supervised learning. This labeling task is similar to the structured content
analysis applied in social sciences and humanities. Supervised learning is a type of
machine learning that is based on providing the machine with training data that is
already labeled. This allows the machine to learn and generalize from the data to make
predictions about new data. Supervised learning is a powerful tool for many machine
learning applications.
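The idea of learning from labeled examples and then generalizing to new data can be sketched in a few lines of Python. The snippet below is only an illustration, not a method taken from the book: it uses a very simple nearest-centroid classifier, and the feature values, the labels ("cat", "dog"), and the function names are hypothetical.

```python
from statistics import mean


def fit_centroids(samples, labels):
    """Learn one centroid (mean feature vector) per label from labeled data."""
    grouped = {}
    for features, label in zip(samples, labels):
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Hypothetical labeled training data: (height_cm, weight_kg) -> label
train_x = [(20.0, 4.0), (25.0, 5.0), (60.0, 30.0), (70.0, 35.0)]
train_y = ["cat", "cat", "dog", "dog"]

centroids = fit_centroids(train_x, train_y)

# Generalize to an example that was not in the training data.
prediction = predict(centroids, (22.0, 4.5))
print(prediction)
```

The labels supplied by humans are what make this "supervised": without `train_y`, the function would have nothing to learn the two groups from.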
The quality of data used in machine learning studies is crucial for the accuracy
of the findings. A study by Geiger et al. (2020) showed that the data used to train a
machine learning model for credit scoring was of poor quality, which led to an unfair
and inaccurate model. The study highlights the importance of data quality in machine
learning research. Data quality is essential for accurate results. Furthermore, the study
showed how data labeling impacts data quality. About half of the papers using original
human annotation had multiple annotators label overlapping items, and about 70% of the
papers with such overlap report metrics of inter-annotator agreement [2]. This
suggests that the reliability of the labels in many of these studies cannot be verified
and that further research is needed to improve data quality.
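One widely used inter-annotator agreement metric is Cohen's kappa, which corrects the observed agreement between two annotators for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The following sketch is an illustration only; the annotations are made up, and the source does not state which agreement metric the surveyed papers used.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty item list.")
    n = len(labels_a)

    # Observed agreement: share of items on which both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: probability that both pick the same label independently.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used one and the same label for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical annotations of eight items by two annotators.
annotator_a = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg"]
annotator_b = ["pos", "neg", "neg", "neg", "pos", "neg", "pos", "pos"]

kappa = cohens_kappa(annotator_a, annotator_b)
print(round(kappa, 3))
```

Here the annotators agree on six of eight items (p_o = 0.75), while chance agreement is p_e = 0.5, so kappa is 0.5. A value of 1 means perfect agreement and 0 means agreement no better than chance.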
As more business decisions are informed by data analysis, more companies are
built on data. However, data quality remains a problem. Unfortunately, “garbage in,
garbage out,” a frequently used motto about computers in the past, still applies
to data sampling, which is also used in machine learning.
According to the AI logic most employed today, if qualified college graduates have been
successful in obtaining doctorates in the past, they will continue to do so in the future.
In this context, the way to get a good result in machine learning is to include
“black swans” in our training data, and their absence is also a problem with our datasets.
A “black swan” is a term used to describe outliers in datasets: a rare event that
is difficult to predict and has a major impact on a system. In machine learning, a black
swan event is one that is not represented in the training data but could significantly
impact the results of the machine learning algorithm. Training on black swans makes models
more robust to unexpected inputs, so it is important to include them in training datasets
to avoid biased results.
Over time, technological development has moved decision-making from the framework of human
decision-making with data into the decision-making framework of machines. Now,
machines evaluate big data and make decisions with algorithms written by humans.
For example, a driverless car can navigate toward the desired destination by constantly
collecting data on stationary and moving objects around it in various ways. Autonomous
driving is a very important and constantly developing application area for synthetic data.
Autonomous driving systems should be developed at a capability level that can solve
complex and varied traffic problems in simulation. The scenarios in these
simulations are sometimes built with game engines such as Unreal and Unity. Creating
accurate and useful “synthetic data” with simulations based on real data will be the approach
companies prefer when real data cannot be easily found.
Synthetic data is becoming an increasingly important tool for businesses looking to
improve their AI initiatives and overcome many of the associated challenges. By creating
synthetic data, businesses can shape and form data to their needs and augment and
de-bias their datasets. This makes synthetic data an essential part of any AI strategy.
DataGen, Mostly, Cvedia, Hazy, AI.Reverie, Omniverse, and Anyverse can be counted
among the startups that produce synthetic data. Sample images from synthetic outdoor
datasets produced by such companies can be seen in the given source.
In addition to the benefits mentioned, synthetic data can also help businesses train
their AI models more effectively and efficiently. Businesses can avoid the need for costly
and time-consuming data collection processes by using synthetic data. This can help
businesses save money and resources and get their AI initiatives up and running more
quickly.
Book Structure
Synthetic data is not originally collected from real-world sources. It is generated by
artificial means, using algorithms or mathematical models, and has many applications
in deep learning, particularly in training neural networks. This book, which discusses the
structure and application of synthetic data, consists of five chapters.
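As a minimal illustration of generating data from a mathematical model rather than collecting it, the sketch below draws points from a linear relationship with Gaussian noise. The model, the parameter values (`slope`, `intercept`, `noise_std`), and the function names are assumptions made for this example and are not taken from the book's own code.

```python
import random


def generate_synthetic_rows(n_rows, slope=2.0, intercept=1.0, noise_std=0.5, seed=42):
    """Generate (x, y) pairs from the assumed model y = slope * x + intercept + noise."""
    rng = random.Random(seed)  # a fixed seed makes the dataset reproducible
    rows = []
    for _ in range(n_rows):
        x = rng.uniform(0.0, 10.0)
        y = slope * x + intercept + rng.gauss(0.0, noise_std)
        rows.append((x, y))
    return rows


def least_squares_slope(rows):
    """Estimate the slope of y on x with ordinary least squares."""
    mean_x = sum(x for x, _ in rows) / len(rows)
    mean_y = sum(y for _, y in rows) / len(rows)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    variance = sum((x - mean_x) ** 2 for x, _ in rows)
    return covariance / variance


synthetic = generate_synthetic_rows(1000)

# The estimated slope should be close to the slope used by the generator.
estimated_slope = least_squares_slope(synthetic)
print(len(synthetic), round(estimated_slope, 2))
```

Because the generating model is known, the quantity and the properties of the data can be chosen freely, and any analysis run on the data can be checked against the parameters that produced it.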
Chapter 1 covers synthetic data, why it is important, and how it can be used in
data science and artificial intelligence applications. This chapter also discusses the
accuracy problems associated with synthetic data, the life cycle of data, and the
tradeoffs between data collection and privacy. Finally, this chapter describes some
applications of synthetic data, including financial services, manufacturing, healthcare,
automotive, robotics, security, social media, marketing, natural language processing,
and computer vision.
Chapter 2 provides information about different ways of generating synthetic data. It
covers how to generate fair synthetic data, as well as how to use video games to create
synthetic data. The chapter also discusses the synthetic-to-real domain gap and how
to overcome it using domain transfer, domain adaptation, and domain randomization.
Finally, the chapter discusses whether a real-world experience is necessary for training
machine learning models and, if not, how to achieve it using pretraining, reinforcement
learning, and self-supervised learning.
Chapter 3 explains the content and purpose of a generative adversarial network, or
GAN, a type of AI used to generate new data, like training data.
Chapter 4 explores synthetic data generation with R.
Chapter 5 covers different methods of synthetic data generation with Python.
Source Code
The datasets and source code used in this book can be downloaded from
github.com/apress/synthetic-data-deep-learning.
References
[1] M. Rozemond, “The Nature of the Mind,” in The Blackwell Guide
to Descartes’ Meditations, S. Gaukroger, Ed. John Wiley & Sons, 2006.
CHAPTER 1
An Introduction to Synthetic Data
In this chapter, we will explore the concept of data and its importance in today’s world.
We will discuss the lifecycle of data from collection to storage and how synthetic data can
be used to improve accuracy in data science and artificial intelligence (AI) applications.
Next, we will explore synthetic data applications in financial services, manufacturing,
healthcare, automotive, robotics, security, social media, and marketing. Finally, we
will examine natural language processing, computer vision, understanding of visual
scenes, and segmentation problems in terms of synthetic data.
© Necmi Gürsakal, Sadullah Çelik, and Esma Birişçi 2022
N. Gürsakal et al., Synthetic Data for Deep Learning, https://doi.org/10.1007/978-1-4842-8587-9_1
hypotheses and models before they are applied to real-world data. This can help data
scientists avoid making errors that could have negative consequences in the real world.
Synthetic data is data that is artificially generated by a computer program or simulation,
rather than being collected from real-world sources [11]. When we examine this
definition, we see that it includes the following concepts:
“annotated information,” “computer simulations,” “algorithm,” and “not measured in the
real world.”
The key features of synthetic data are as follows:
generated for a specific purpose, it will not be randomly scattered. In some cases,
synthetic data may even be of a higher quality than real data. Real data may need heavy
processing before use, and more data than necessary may end up being processed.
These actions can reduce the quality of the data. Synthetic data, on the other hand, can
be of higher quality, thanks to the model used to generate the data.
Overall, synthetic data has many advantages over real data. Synthetic data is more
controlled, of higher quality, and can be generated in the desired quantities. These
factors make synthetic data a valuable tool for many applications. A final reason why
synthetic data is important is that it can be used to generate data for research purposes,
allowing researchers to study data that is not biased or otherwise not representative of
the real data.
Now let’s explain the importance of synthetic data for data science and artificial
intelligence.
Alas, alas for your life!
POLYXENA
Why do you call out with that ill-omened voice? It bodes me no good.
Speak, do not hide it from me any longer. I am afraid, mother, I am
afraid!
HECUBA
I bring, O daughter, a fatal rumor: they say the Argives have
decreed to tear your life from me.
POLYXENA
O mother, who suffer such sorrows! O you, the most unhappy of
mothers! O wretched woman! What deity has again stirred up against you
so many dire and unheard-of calamities? No longer shall I be your
companion in slavery; no longer shall I, as your daughter, be able to console you in your
pitiable old age. Like a lion cub reared in the forests, like a
young heifer, you will see me torn from you, you will see my throat cut, and I shall go down to
the subterranean darkness of Hades, where I shall lie with the dead.
For you I weep, O wretched mother, for you I lament bitterly.
Not for my own life, full of evils and disgrace, for my lot is better
in dying.
THE CHORUS
Strophe 1. O breeze, sea breeze, you who drive the swift
ships as they cleave the waves! Where will you carry this wretched woman? What master
will buy me and drag me to his home? Shall I go to the shores of
Doris,[45] or to those of Phthia,[46] where they say the Apidanus,[47] river
of crystalline waters, makes the fields fertile?
Antistrophe 1. Or to one of the islands, to the sound of the sea oar,
to live a sad life where grows the first palm that men
ever saw,[48] and the laurel sacred in honor of Leto and her children,
the delight of Zeus? Shall I sing hymns with the Delian maidens to the goddess
Artemis, and celebrate her golden hair and her bow?
Strophe 2. Or in the city of Pallas, on the yellow peplos of
Athena, shall I embroider with the needle the four-horse chariot and its horses, strewing it with
woven and artful flowers, or the race of the Titans, whom Zeus, the
son of Cronus, condemned with his thunderbolts to perpetual sleep?[49]
Antistrophe 2. Alas for my parents, alas for my children, alas for my homeland,
which fell wrapped in smoke, conquered in war by the Greeks! I
leave Asia the slave of Europe, exchanging the bridal chamber for Orcus,[50] and
they will call me a slave in a foreign land.
TALTHYBIUS
Where, O Trojan maidens, can I find Hecuba,
who not long ago was queen of Ilium?
THE CHORUS
What shall I say, O Zeus? Do you care for mankind, or do they believe so
falsely, thinking that there are gods, while fortune at the
same time rules over mortals? Was not Hecuba queen of the Phrygians,
rich in gold? Was she not the wife of Priam, gloriously fortunate? The
spear has overthrown her city, and she, a slave and an old woman, bereft of
her children, lies on the ground, soiling her unfortunate head
with dust. Ah! Ah! I am old, but I would rather die than suffer
shameful evils. (Approaching Hecuba.) Rise, O unhappy
woman! Let your body and your white head leave the ground.
HECUBA (rising).
Woe is me! What are you saying? Have you not come looking for me, when I am on the
point of death, only to announce evils to me? You have perished, O daughter,
torn from your mother's arms: I am left without children, without you at
least; oh, how wretched I am! How did you sacrifice her? With
respect, or did you treat her cruelly, O old man, as if she were an
enemy? Speak, even though your words grieve me.
TALTHYBIUS[51]
You will make me weep twice, O woman, out of pity for your daughter:
now I shall wet my eyes in recalling it, and when she died I also wept
beside the tomb. The countless multitude of the Achaean army gathered
around the burial mound to witness the sacrifice of Polyxena: the son of
Achilles led her by the hand and set her on top of the mound,
with me at his side; the foremost young Achaeans followed him to
hold the victim down in the convulsions of her death agony. The son of Achilles,
with the golden libation cup, poured offerings to the shade of his father,
then ordered me to impose silence on the whole army. Then I,
standing in their midst, said: “Be silent, O Greeks; let there be silence
among the people; let no one speak, let all keep their composure,” and the
multitude did indeed fall silent. He, in turn, spoke thus: “Receive,
O my father, son of Peleus, these libations that summon the
dead, and show yourself favorable: come and drink the dark, untasted blood
of this maiden, which the army and I offer you; favor us, unbind
our sterns, loose our ships, and grant that we may all
return happily from Troy to our homeland.” So he spoke, and the whole
army joined him in his prayer. Then he grasped the golden hilt
of his sword and, unsheathing it, signaled to the young Greeks
to hold the victim. She, perceiving this, spoke in this
way: “Willingly I die, O Argives who ruined my
homeland; let no one touch my body, for I shall offer my neck to the steel with
a resolute spirit; but by the gods I beg you not to hold me,
so that I may die as a free woman should die, for the name of slave would shame me
before the shades, I who am a queen.” Murmurs of
approval were heard in the crowd, and King Agamemnon ordered
the young men to release the maiden. She, on hearing it, tore her
peplos from the shoulders to the waist,[52] and showed her breast, as
beautiful as that of a statue, and sank to her knees on the ground, and
spoke these most courageous words: “Here is my breast; strike it, O
young man, if you wish; if it is to be at the throat, make ready the blade.” He
hesitated, moved to compassion; but in the end he killed her, and her blood
flowed in torrents. In dying she did not forget her modesty, and she hid from
our gaze what men ought not to see. After she
breathed out her soul, the Greeks busied themselves with different tasks, some
covering her with leaves, others heaping the pyre with pine branches. Those who
did nothing heard the others say to them: “Will you stand idle, O
lazy one, and offer this maiden neither funeral adornments nor your peplos?
Will you give nothing to this victim, as brave as she is noble?” This is what
I can tell you about the death of Polyxena, and I consider you, if I look
at your many children, the happiest of women, and if at your lot,
the most unfortunate.
THE CHORUS
Terrible misfortunes have befallen the children of Priam and my
homeland by the inexorable decree of the gods.
HECUBA
Cassandra lives; but do you not mourn for this dead one? Look at his naked
body and, contrary to what you expected, you will behold a wonder.
HECUBA
Woe is me! The dead one I see is my son Polydorus, whom the Thracian
was keeping safe for me. I am dying; I can live no longer! O son! Son
of my heart! Now I begin another mournful song, since an evil
deity announces new calamities to me.
THE SLAVE WOMAN
Why not? Was it not women who killed the sons of
Aegyptus[67] and wiped out the men of Lemnos?[68] So it shall be done, and
let us speak no more of this; give orders that this slave not be detained anywhere in
the camp, and you, servant, go to the Thracian guest and tell him:
“Hecuba, who not long ago was queen of Ilium, summons you, because it
is in your interest and in hers; let your sons come with you, for they too must
know what she intends to do.” Delay, O Agamemnon, the
burial of Polyxena, so that both, brother and sister, the twofold
object of my maternal love, may burn on the same pyre and be
buried together.
AGAMEMNON
So it shall be done, for if the army were sailing, I could not grant you
this favor; but now, since by the work of the gods no
favorable winds are blowing, we must remain here, calmly
waiting to set sail later. May all turn out
well; it is in the interest of everyone in general, of each person in particular,
and of the state that the wicked suffer evil and the good be
fortunate. (They both leave in different directions.)
THE CHORUS
Strike, spare nothing; break down the doors; your eyes will never see the
light, nor your sons, dead at my hands.
THE CHORUS
Did you defeat the Thracian, did you triumph over him, O my mistress, and did you do what
you intended?
HECUBA
What do I hear? Have you done this just as he says? Have you, Hecuba,
had such audacity?
POLYMESTOR