Google Cloud Platform for Data Science: A Crash Course on Big Data, Machine Learning, and Data Analytics Services, by Dr. Shitalkumar R. Sukhdeve
Google Cloud
Platform for Data
Science
A Crash Course on Big Data,
Machine Learning, and Data
Analytics Services
Table of Contents
Acknowledgments
Preface
Introduction
Bibliography
Index
About the Authors
Dr. Shitalkumar R. Sukhdeve is an
experienced senior data scientist with a strong
track record of developing and deploying
transformative data science and machine
learning solutions to solve complex business
problems in the telecom industry. He has
notable achievements in developing a machine
learning–driven customer churn prediction
and root cause exploration solution, a
customer credit scoring system, and a product
recommendation engine.
Shitalkumar is skilled in enterprise data science and research
ecosystem development, dedicated to optimizing key business indicators
and adding revenue streams for companies. He is pursuing a doctorate
in business administration from SSBM, Switzerland, and an MTech in
computer science and engineering from VNIT Nagpur.
Shitalkumar has authored a book titled Step Up for Leadership
in Enterprise Data Science and Artificial Intelligence with Big Data:
Illustrations with R and Python and co-authored a book titled Web
Application Development with R Using Shiny, Third Edition. He is a
speaker at various technology and business events such as World AI Show
Jakarta 2021, 2022, and 2023, NXT CX Jakarta 2022, Global Cloud-Native
and Open Source Summit 2022, Cyber Security Summit 2022, and ASEAN
Conversational Automation Webinar. You can find him on LinkedIn at
www.linkedin.com/in/shitalkumars/.
About the Technical Reviewer
Sachin G. Narkhede is a highly skilled data
scientist and software engineer with over
12 years of experience in Python and R
programming for data analytics and machine
learning. He has a strong background in
building machine learning models using scikit-
learn, Pandas, Seaborn, and NLTK, as well as
developing question-answering machines and
chatbots using Python and IBM Watson.
Sachin's expertise also extends to data visualization using Microsoft
BI and the data analytics tool RapidMiner. With a master's degree in
information technology, he has a proven track record of delivering
successful projects, including transaction monitoring, trade-based money
laundering detection, and chatbot development for banking solutions. He
has worked on GCP (Google Cloud Platform).
Sachin's passion for research is evident in his published papers on
brain tumor detection using symmetry and mathematical analysis. His
dedication to learning is demonstrated through various certifications and
workshop participation. Sachin's combination of technical prowess and
innovative thinking makes him a valuable asset in the field of data science.
Acknowledgments
We extend our sincerest thanks to all those who have supported us
throughout the writing process of this book. Your encouragement,
guidance, and unwavering belief in our abilities have contributed to
bringing this project to fruition.
Above all, we express our deepest gratitude to our parents, whose
unconditional love, unwavering support, and sacrifices have allowed us
to pursue our passions. Your unwavering belief in us has been the driving
force behind our motivation.
We are grateful to our family for their understanding and patience
during our countless hours researching, writing, and editing this book.
Your love and encouragement have served as a constant source of
inspiration.
A special thank you goes to our friends for their words of
encouragement, motivation, and continuous support throughout this
journey. Your belief in our abilities and willingness to lend an ear during
moments of doubt have been invaluable.
We would also like to express our appreciation to our mentors
and colleagues who generously shared their knowledge and expertise,
providing valuable insights and feedback that have enriched the content of
this book.
Lastly, we want to express our deepest gratitude to the readers of this
book. Your interest and engagement in the subject matter make all our
efforts worthwhile. We sincerely hope this book proves to be a valuable
resource for your journey in understanding and harnessing the power of
technology.
Once again, thank you for your unwavering support, love, and
encouragement. This book would not have been possible without each
and every one of you.
Sincerely,
Shitalkumar and Sandika
Preface
The integration of data science and machine learning is transforming
the business landscape, and cloud computing platforms have become
indispensable for handling and analyzing vast datasets. Google Cloud
Platform (GCP) stands out as a leading cloud computing platform, offering
extensive services for data science and machine learning.
This book is a guide to learning GCP for data science,
using only the free-tier services offered by the platform. Whether you are
a data analyst, data scientist, software engineer, or student, it offers a
progressive, step-by-step approach to GCP's data science services,
covering everything from fundamental concepts to advanced topics.
The book begins with an introduction to GCP and its data science
services, including BigQuery, Cloud AI Platform, Cloud Dataflow, Cloud
Storage, and more. You will learn how to set up a GCP account and
project and use Google Colaboratory to create and run Jupyter notebooks,
including machine learning models.
The book then covers big data and machine learning, including
BigQuery ML, Google Cloud AI Platform, and TensorFlow. Within this
learning journey, you will acquire the skills to leverage Vertex AI for
training and deploying machine learning models and harness the power of
Google Cloud Dataproc for the efficient processing of large-scale datasets.
The book then delves into data visualization and business intelligence,
covering Looker Studio and Google Colaboratory. You will learn to
generate and share data visualizations and reports using Looker Studio
and to build interactive dashboards.
Introduction
Welcome to Google Cloud Platform for Data Science: A Crash Course on
Big Data, Machine Learning, and Data Analytics Services. In this book, we
embark on an exciting journey into the world of Google Cloud Platform
(GCP) for data science. GCP is a cutting-edge cloud computing platform
that has revolutionized how we handle and analyze data, making it an
indispensable tool for businesses seeking to unlock valuable insights and
drive innovation in the modern digital landscape.
As a widely recognized leader in cloud computing, GCP offers a
comprehensive suite of services specifically tailored for data science
and machine learning tasks. This book provides a progressive and
comprehensive approach to mastering GCP's data science services,
utilizing only the free-tier services offered by the platform. Whether you’re
a seasoned data analyst, a budding data scientist, a software engineer, or
a student, this book equips you with the skills and knowledge needed to
leverage GCP for data science purposes.
Chapter 1: “Introduction to GCP”
This chapter explores the transformative shift that data science and
machine learning have brought about in the business landscape. We highlight
cloud computing platforms' crucial role in handling and analyzing vast
datasets. We then introduce GCP as a leading cloud computing platform
renowned for its comprehensive suite of services designed specifically for
data science and machine learning tasks.
CHAPTER 1
Introduction to GCP
Over the past few years, the way organizations manage data has undergone
a remarkable transformation. The rise of big data and machine learning
requires businesses to store, process, and analyze vast quantities of
data. As a result, demand has surged for cloud-based data science
platforms like Google Cloud Platform (GCP).
According to a report by IDC, the worldwide public cloud services
market was expected to grow by 18.3% in 2021, reaching $304.9 billion.
GCP has gained significant traction in this market, becoming the third-
largest cloud service provider with a market share of 9.5% (IDC, 2021). This
growth can be attributed to GCP’s ability to provide robust infrastructure,
data analytics, and machine learning services.
GCP offers various data science services, including data storage,
processing, analytics, and machine learning. It also provides tools for building
and deploying applications, managing databases, and securing resources.
Let’s look at a few businesses that shifted to GCP and achieved
remarkable results:
Your project is now set up, and you can start using GCP services.
Note: If you are using the free tier, make sure to monitor your usage
to avoid charges, as some services have limitations. Also, you may
need to enable specific services for your project to use them.
Summary
Google Cloud Platform (GCP) offers a comprehensive suite of cloud
computing services that leverage the same robust infrastructure used by
Google’s products. This chapter introduced GCP, highlighting its essential
services and their relevance to data science.
We explored several essential GCP services for data science, including
BigQuery, Cloud AI Platform, Cloud Dataflow, Cloud DataLab, Cloud
Dataproc, Cloud Storage, and Cloud Vision API. Each of these services
serves a specific purpose in the data science workflow, ranging from data
storage and processing to machine learning model development and
deployment.
CHAPTER 2
Google Colaboratory
Google Colaboratory is a free, cloud-based Jupyter Notebook environment
provided by Google. It allows individuals to write and run code in Python
and other programming languages and perform data analysis, data
visualization, and machine learning tasks. The platform is designed to be
accessible, easy to use, and collaboration-friendly, making it a popular tool
for data scientists, software engineers, and students.
This chapter will guide you through the process of getting started
with Colab, from accessing the platform to understanding its features and
leveraging its capabilities effectively. We will cover how to create and run
Jupyter notebooks, run machine learning models, and access GCP services
and data from Colab.
Features of Colab
Cloud-based environment: Colaboratory runs on Google’s servers,
eliminating the need for users to install software on their devices.
Easy to use: Colaboratory provides a user-friendly interface for
working with Jupyter notebooks, making it accessible for individuals with
limited programming experience.
Access to GCP services: Colaboratory integrates with Google
Cloud Platform (GCP) services, allowing users to access and use GCP
resources, such as BigQuery and Cloud Storage, from within the notebook
environment.
Hands-On Example
Insert text in the notebook to describe the code by clicking the +
Text button.
import random
import matplotlib.pyplot as plt
Importing Libraries
The following is an example of importing libraries into your notebook. If
a library is not already installed, use the “!pip install” command followed
by the library’s name to install it:
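As a minimal sketch of this pattern, you can check whether a library is importable before deciding to install it. The helper name `is_installed` is hypothetical and not part of Colab, and `seaborn` is only an illustrative package name:

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the top-level module `module_name` can be imported here."""
    # find_spec returns None when a top-level module cannot be located.
    return importlib.util.find_spec(module_name) is not None


# Standard-library modules are always available.
print(is_installed("json"))

# In a Colab cell, a missing library could then be installed with a line such as:
#   !pip install seaborn
```

The check avoids reinstalling libraries that the Colab runtime already provides.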
Once you execute the code, a prompt will appear, asking you to
sign into your Google account and grant Colaboratory the necessary
permissions to access your Google Drive.
Once you’ve authorized Colaboratory, you can access your Google
Drive data by navigating to /content/drive in the Colaboratory file explorer.
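In Colab, the mounting step described above is normally performed with `drive.mount` from the `google.colab` module. The following is a minimal sketch, assuming the default mount point `/content/drive`; the wrapper function `mount_drive` is a hypothetical helper added so the snippet also runs outside Colab:

```python
def mount_drive(mount_point: str = "/content/drive") -> bool:
    """Mount Google Drive when running inside Colab; return False elsewhere."""
    try:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import drive
    except ImportError:
        return False
    drive.mount(mount_point)  # triggers the sign-in and permission prompt
    return True


mounted = mount_drive()
print("Drive mounted:", mounted)
```

Inside Colab, the call to `drive.mount` is what produces the authorization prompt.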
To write data to Google Drive, you can save files to a path under the
mounted folder. For example, to write a Pandas DataFrame to a CSV file in
Google Drive, you can use the DataFrame's to_csv method:
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df.to_csv('/content/drive/My Drive/Colab Notebooks/data.csv', index=False)
import pandas as pd
df = pd.read_csv('/content/drive/My Drive/Colab Notebooks/data.csv')
Visualize Data
To visualize data in Colaboratory, you can use libraries such as Matplotlib
or Seaborn to create plots and charts.
Create a data frame in Python using the Pandas library with the
following code:
import pandas as pd
import numpy as np
This will create a data frame with 100 rows and 2 columns named “age”
and “weight”, populated with random integer values between 20 and 80 for
age and 50 and 100 for weight.
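A data frame matching this description could be built as follows. This is a minimal sketch, not necessarily the book's exact code: it assumes NumPy's `default_rng` generator, an arbitrary seed, and that both range endpoints are meant to be included.

```python
import numpy as np
import pandas as pd

# Seeded generator so the sketch is reproducible (the seed is arbitrary).
rng = np.random.default_rng(42)

# integers() excludes the upper bound by default; endpoint=True includes it.
df = pd.DataFrame({
    "age": rng.integers(20, 80, size=100, endpoint=True),
    "weight": rng.integers(50, 100, size=100, endpoint=True),
})

print(df.shape)
print(df.head())
```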
Visualize the data in the Pandas data frame using the Matplotlib
library in Python. Here’s an example that draws a scatter plot of the age
and weight columns:
import matplotlib.pyplot as plt
plt.scatter(df['age'], df['weight'])
plt.xlabel('Age')
plt.ylabel('Weight')
plt.show()
This will create a scatter plot with the age values on the x-axis and
weight values on the y-axis. You can also use other types of plots, like
histograms, line plots, bar plots, etc., to visualize the data in a Pandas
data frame.
import pandas as pd
import numpy as np
5. X = np.random.randn(n_samples, n_features):
This line generates a random array of shape
(n_samples, n_features) using the np.random.randn()
function from NumPy. Each element in the array is
drawn from a standard normal distribution
(mean = 0, standard deviation = 1).
8. df = pd.DataFrame(np.hstack((X, y[:, np.newaxis])),
columns=["feature_1", "feature_2", "feature_3",
"feature_4", "feature_5", "target"]):
This line combines the features (X) and target (y)
arrays into a Pandas DataFrame. The np.hstack()
function horizontally stacks the X and y arrays, and
the resulting combined array is passed to the
pd.DataFrame() function to create a DataFrame. The
columns parameter is used to assign column names
to the DataFrame, specifying the names of the
features and the target.
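The steps explained above can be sketched end to end as follows. The value of `n_samples` and the definition of `y` are not shown in this excerpt, so both are assumptions here: 1,000 samples and random binary labels.

```python
import numpy as np
import pandas as pd

n_samples = 1000   # assumed value; not given in the text above
n_features = 5     # matches the five feature columns named above

# Features drawn from a standard normal distribution.
X = np.random.randn(n_samples, n_features)

# Assumed target: random binary labels (the original definition of y is not shown).
y = np.random.randint(0, 2, size=n_samples)

# y[:, np.newaxis] turns the 1-D target into a column so hstack can append it.
columns = [f"feature_{i}" for i in range(1, n_features + 1)] + ["target"]
df = pd.DataFrame(np.hstack((X, y[:, np.newaxis])), columns=columns)

print(df.shape)
```

Because `np.hstack` produces a single floating-point array, the target column is stored as floats in the resulting DataFrame.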
8. clf = RandomForestClassifier(n_estimators=100):
This line creates an instance of the
RandomForestClassifier class with 100 decision
trees. The n_estimators parameter determines the
number of trees in the random forest.
9. clf.fit(X_train, y_train): This line trains the random
forest classifier (clf) on the training data (X_train
and y_train). The classifier learns patterns and
relationships in the data to make predictions.
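The two steps above can be tried in isolation with the sketch below. The data is a hypothetical stand-in (200 random samples whose label is the sign of the first feature), and the 80/20 split and fixed `random_state` values are assumptions made for reproducibility, not settings taken from the book.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 200 samples, 5 features, label from the first feature's sign.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = (X[:, 0] > 0).astype(int)

# Hold out 20% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# score() reports mean accuracy on the held-out split.
accuracy = clf.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Evaluating on a held-out split, rather than on the training data, gives a more honest estimate of how the classifier generalizes.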
import joblib
model = clf  # Your trained model
# Save the model
joblib.dump(model, '/content/drive/My Drive/Colab Notebooks/model.joblib')
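A saved model can later be restored with `joblib.load`. The following is a minimal sketch of the save-and-load round trip; it uses a plain dictionary as a hypothetical stand-in for the trained model and a temporary directory instead of Google Drive, so it runs anywhere:

```python
import os
import tempfile

import joblib

# Hypothetical stand-in for a trained model; any picklable object works the same way.
model = {"n_estimators": 100}

# A temporary directory is used here; in Colab the path would point into Google Drive.
path = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump(model, path)

# Load the saved object back and confirm it round-trips unchanged.
restored = joblib.load(path)
print(restored == model)
```

In Colab, replacing `path` with a location under the mounted Drive folder keeps the model available across sessions.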