Shivakumar R. Goniwada
Introduction to Datafication: Implement Datafication Using AI and ML Algorithms
Shivakumar R. Goniwada
Gubbalala, Bangalore, Karnataka, India
This work is subject to copyright. All rights are reserved by the publisher, whether the whole or
part of the material is concerned, specifically the rights of translation, reprinting, reuse of
illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way,
and transmission or information storage and retrieval, electronic adaptation, computer software,
or by similar or dissimilar methodology now known or hereafter developed.
Trademarked names, logos, and images may appear in this book. Rather than use a trademark
symbol with every occurrence of a trademarked name, logo, or image we use the names, logos,
and images only in an editorial fashion and to the benefit of the trademark owner, with no
intention of infringement of the trademark.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if
they are not identified as such, is not to be taken as an expression of opinion as to whether or not
they are subject to proprietary rights.
While the advice and information in this book are believed to be true and accurate at the date of
publication, neither the authors nor the editors nor the publisher can accept any legal
responsibility for any errors or omissions that may be made. The publisher makes no warranty,
express or implied, with respect to the material contained herein.
Managing Director, Apress Media LLC: Welmoed Spahr
Acquisitions Editor: Celestin Suresh John
Development Editor: Laura Berendson
Coordinating Editor: Mark Powers
Copy Editor: April Rondeau
Cover designed by eStudioCalamar
Cover image by Pawel Czerwinsk on Unsplash (www.unsplash.com)
Distributed to the book trade worldwide by Apress Media, LLC, 1 New York Plaza, New York, NY
10004, U.S.A. Phone 1-800-SPRINGER, fax (201) 348-4505, email orders-ny@springer-sbm.com,
or visit www.springeronline.com. Apress Media, LLC is a California LLC and the sole member
(owner) is Springer Science+Business Media Finance Inc. (SSBM Finance Inc.). SSBM Finance
Inc. is a Delaware corporation.
For information on translations, please e-mail booktranslations@springernature.com;
for reprint, paperback, or audio rights, please e-mail bookpermissions@springernature.com.
Apress titles may be purchased in bulk for academic, corporate, or promotional use. eBook
versions and licenses are also available for most titles. For more information, reference our Print
and eBook Bulk Sales web page at http://www.apress.com/bulk-sales.
Any source code or other supplementary material referenced by the author in this book is
available to readers on GitHub (https://github.com/Apress). For more detailed information,
please visit http://www.apress.com/source-code.
Printed on acid-free paper
This book is dedicated to those who may need access to the
resources and opportunities many take for granted. May
this book serve as a reminder that knowledge and learning
are powerful tools that can transform lives and create new
opportunities for those who seek them.
Table of Contents
About the Author
Acknowledgments
Introduction
Sentiment Analytics
Audio Analytics
Video Analytics
Comparison in Analytics
Datafication Metrics
Datafication Analysis
Data Sources
Data Gathering
Introduction to Algorithms
Supervised Machine Learning
Linear Regression
Support Vector Machines (SVM)
Decision Trees
Neural Networks
Naïve Bayes Algorithm
K-Nearest Neighbor (KNN) Algorithm
Random Forest
Unsupervised Machine Learning
Clustering
Association Rule Learning
Dimensionality Reduction
Reinforcement Machine Learning
Summary
Data-Sharing Decisions
Data-Sharing Styles
Unidirectional, Asynchronous Push Integration Style
Real-Time and Event-based Integration Style
Bidirectional, Synchronous, API-led Integration Style
Mediated Data Exchange with an Event-Driven Approach
Designing a Data-Sharing Pipeline
Types of Data Pipeline
Batch Processing
Extract, Transform, and Load Data Pipeline (ETL)
Extract, Load, and Transform Data Pipeline (ELT)
Streaming and Event Processing
Change Data Capture (CDC)
Lambda Data Pipeline Architecture
Kappa Data Pipeline Architecture
Data as a Service (DaaS)
Data Lineage
Data Quality
Data Integration Governance
Summary
Match Result
Create an Analysis Report
Summary
Index
About the Author
Shivakumar R. Goniwada is an author,
inventor, chief enterprise architect, and
technology leader with over 23 years of
experience architecting cloud-native, data
analytics, and event-driven systems. He works
at Accenture and leads a highly experienced
team of enterprise technology and cloud architects.
Over the years, he has led many
complex projects across industries and around the
globe. He holds ten software patents in cloud
computing, polyglot architecture, software
engineering, data analytics, and IoT. He authored a book on Cloud Native
Architecture and Design. He is a speaker at multiple global and in-house
conferences. Shivakumar has earned Master Technology Architecture,
Google Professional, AWS, and data science certifications. He completed
his executive MBA at the MIT Sloan School of Management.
About the Technical Reviewer
Dr. Mohan H M is a technical program
manager and research engineer (HMI, AI/
ML) at Digital Shark Technology, supporting
the research and development of new
products, promotion of existing products, and
investigation of new applications for existing
products.
Previously, he worked as a technical education evangelist and
traveled extensively across India delivering training on artificial
intelligence, embedded systems, and the Internet of Things (IoT) to research
scholars and faculty members in engineering colleges under the MeitY scheme. He
also worked as an assistant professor at the T. John Institute of
Technology. Mohan holds a master’s degree in embedded systems and
VLSI design from Visvesvaraya Technological University. He earned
his Ph.D. on the topic of non-invasive myocardial infarction prediction
using computational intelligence techniques from the same university.
He has been a peer reviewer for technical publications, including BMC
Informatics, Springer Nature, Scientific Reports, and more. His research
interests include computer vision, IoT, and biomedical signal processing.
Acknowledgments
Many thanks to my mother, S. Jayamma, and late father, G.M. Rudrapp,
who taught me the value of hard work, and to my wife, Nirmala, and
daughter, Neeharika, without whom I wouldn’t have been able to work
long hours into the night every day of the week. Last but not least, I’d like
to thank my friends, colleagues, and mentors at Mphasis, Accenture, and
other corporations who have guided me throughout my career.
Thank you also to my colleagues Mark Powers, Celestin Suresh John,
Shobana Srinivasan, and other Apress team members for allowing me to
work with you and Apress, and to all who have helped this book become
a reality. Thank you to my mentors Bert Hooyman and Abubacker
Mohamed, and to my colleague Raghu Pasupuleti for providing
key inputs.
Introduction
The motivation to write this book goes back to the words of Swami
Vivekananda: “Everything is easy when you are busy, but nothing is easy
when you are lazy,” and “Take up one idea, make that one idea your life,
dream of it, think of it, live on that idea.”
Data is increasingly shaping the world in which we live. The
proliferation of digital devices, social media platforms, and the Internet
of Things (IoT) has led to an explosion in the amount of data generated
daily. This has created new opportunities and challenges for everyone
as we seek to harness the power of data to drive innovation and improve
decision making.
This book is a comprehensive guide to the world of datafication and its
development, governing process, and security. We explore fundamental
principles and patterns, analysis frameworks, techniques to implement
artificial intelligence (AI) and machine learning (ML) algorithms, models,
and regulations to govern datafication systems.
We will start by exploring the basics of datafication and how it
transforms the world, and then delve into the fundamental principles and
patterns and how data are ingested and processed with an extensive data
analysis framework. We will examine the ethics, regulations, and security
of datafication in real-world scenarios.
Throughout the book, we will use real-world examples and case
studies to illustrate key concepts and techniques and provide practical
guidance in sentiment and behavior analysis.
Whether you are a student, analyst, engineer, technologist, or someone
simply interested in the world of datafication, this book will provide you
with a comprehensive understanding of datafication.
CHAPTER 1
Introduction to Datafication
A comprehensive look at datafication must first begin with its definition.
This chapter provides that and details why datafication plays a significant
role in modern business and data architecture.
Datafication has profoundly impacted many aspects of society,
including business, finance, health care, politics, and education. It
has enabled companies to gain insights into consumer behavior and
preferences, health care providers to improve patient outcomes, financial
institutions to enhance the customer experience and strengthen risk
management and compliance, and educators to personalize learning experiences.
Datafication helps you take facts and statistics gathered from myriad
sources, give them domain-specific context, and aggregate them so that
they are accessible for strategy building and decision making.
This can improve sales and profits, health outcomes, and influence over
public policy.
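As a minimal sketch of the aggregation step described above, assume that sales facts arrive from several channels and that a hypothetical product-to-category lookup (`PRODUCT_CATEGORY`) supplies the domain-specific context. The Python snippet below tags each raw fact with its category and totals the amounts per category. The field names and the lookup table are illustrative assumptions, not part of any particular system.

```python
from collections import defaultdict

# Hypothetical lookup that supplies domain-specific context (retail categories)
PRODUCT_CATEGORY = {"P1": "electronics", "P2": "grocery", "P3": "electronics"}

def aggregate_by_category(sales):
    """Attach a category to each raw sale and total the amounts per category."""
    totals = defaultdict(float)
    for sale in sales:
        # Products missing from the lookup are grouped as "unknown", not dropped
        category = PRODUCT_CATEGORY.get(sale["product_id"], "unknown")
        totals[category] += sale["amount"]
    return dict(totals)

# Illustrative raw facts from different channels (assumed shape)
raw_sales = [
    {"source": "web", "product_id": "P1", "amount": 120.0},
    {"source": "store", "product_id": "P2", "amount": 15.5},
    {"source": "mobile", "product_id": "P3", "amount": 80.0},
    {"source": "store", "product_id": "P9", "amount": 5.0},
]

print(aggregate_by_category(raw_sales))
# {'electronics': 200.0, 'grocery': 15.5, 'unknown': 5.0}
```

Keeping unmatched products under an `unknown` category, rather than silently discarding them, keeps gaps in the context data visible to whoever uses the aggregated figures.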
Datafication is the process of turning data into a usable and accessible
format and involves the following:
• What is datafication?
What Is Datafication?
Datafication involves using digital technologies such as the cloud, data
products, and AI/ML algorithms to collect and process vast amounts of
data on human behavior, preferences, and activities.
Datafication converts various forms of information, such as text,
images, audio recordings, comments, claps, and likes/dislikes, into a
curated format that can be easily analyzed and processed by multiple
algorithms. This involves extracting relevant data from sources such as social media,
hospitals, and Internet of Things (IoT) devices. The data are then organized into
a consistent format and stored in a way that makes them accessible for
further analysis.
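The following Python sketch illustrates the extract, organize, and store steps under stated assumptions: the input shapes for social media reactions and IoT readings, the `CuratedRecord` schema, the numeric encoding of likes and dislikes, and the JSON Lines output are all hypothetical choices made for illustration. Each source-specific adapter maps its raw payload onto the same set of fields so that downstream algorithms can treat every record uniformly.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

# Hypothetical curated schema shared by every source
@dataclass
class CuratedRecord:
    source: str       # origin of the data, e.g. "social" or "iot"
    entity_id: str    # user or device identifier
    signal: str       # what was observed, e.g. "comment", "like", "heart_rate"
    value: object     # observed value: text for comments, a number otherwise
    observed_at: str  # ISO 8601 timestamp in UTC

# Assumed numeric encoding of reactions
REACTION_SCORES = {"like": 1, "dislike": -1}

def from_social(post: dict) -> CuratedRecord:
    """Map a raw social media comment or reaction onto the curated schema."""
    kind = post["type"]
    if kind == "comment":
        value = post["text"].strip()
    elif kind in REACTION_SCORES:
        value = REACTION_SCORES[kind]
    else:
        raise ValueError(f"unsupported reaction type: {kind}")
    # Convert Unix epoch seconds to an ISO 8601 UTC timestamp
    ts = datetime.fromtimestamp(post["epoch"], tz=timezone.utc).isoformat()
    return CuratedRecord("social", post["user"], kind, value, ts)

def from_iot(reading: dict) -> CuratedRecord:
    """Map a raw IoT sensor reading onto the curated schema."""
    return CuratedRecord("iot", reading["device_id"], reading["metric"],
                         float(reading["reading"]), reading["time"])

def to_jsonl(records) -> str:
    """Serialize curated records as JSON Lines for storage and later analysis."""
    return "\n".join(json.dumps(asdict(r)) for r in records)

# Illustrative raw inputs (assumed shapes)
raw_posts = [
    {"user": "u1", "type": "comment", "text": "  Great service! ", "epoch": 0},
    {"user": "u2", "type": "like", "epoch": 60},
]
raw_readings = [
    {"device_id": "d7", "metric": "heart_rate", "reading": "72",
     "time": "1970-01-01T00:02:00+00:00"},
]

curated = [from_social(p) for p in raw_posts] + [from_iot(r) for r in raw_readings]
print(to_jsonl(curated))
```

Adding another source, such as hospital records, would only require one more adapter that returns a `CuratedRecord`; the storage step and any downstream analysis stay unchanged.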