Data Science in Theory and Practice

Techniques for Big Data Analytics and Complex Data Sets

Maria Cristina Mariani


University of Texas, El Paso
El Paso, United States

Osei Kofi Tweneboah


Ramapo College of New Jersey
Mahwah, United States

Maria Pia Beccar-Varela


University of Texas, El Paso
El Paso, United States
This first edition first published 2022
© 2022 John Wiley and Sons, Inc.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system,
or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or
otherwise, except as permitted by law. Advice on how to obtain permission to reuse material
from this title is available at http://www.wiley.com/go/permissions

The right of Maria Cristina Mariani, Osei Kofi Tweneboah, and Maria Pia Beccar-Varela to be
identified as the authors of this work has been asserted in accordance with law.

Registered Office
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA

Editorial Office
111 River Street, Hoboken, NJ 07030, USA

For details of our global editorial offices, customer services, and more information about Wiley
products visit us at www.wiley.com

Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some
content that appears in standard print versions of this book may not be available in other
formats.

Limit of Liability/Disclaimer of Warranty


In view of ongoing research, equipment modifications, changes in governmental regulations,
and the constant flow of information relating to the use of experimental reagents, equipment,
and devices, the reader is urged to review and evaluate the information provided in the package
insert or instructions for each chemical, piece of equipment, reagent, or device for, among other
things, any changes in the instructions or indication of usage and for added warnings and
precautions. While the publisher and authors have used their best efforts in preparing this work,
they make no representations or warranties with respect to the accuracy or completeness of the
contents of this work and specifically disclaim all warranties, including without limitation any
implied warranties of merchantability or fitness for a particular purpose. No warranty may be
created or extended by sales representatives, written sales materials or promotional statements
for this work. The fact that an organization, website, or product is referred to in this work as a
citation and/or potential source of further information does not mean that the publisher and
authors endorse the information or services the organization, website, or product may provide
or recommendations it may make. This work is sold with the understanding that the publisher is
not engaged in rendering professional services. The advice and strategies contained herein may
not be suitable for your situation. You should consult with a specialist where appropriate.
Further, readers should be aware that websites listed in this work may have changed or
disappeared between when this work was written and when it is read. Neither the publisher nor
authors shall be liable for any loss of profit or any other commercial damages, including but not
limited to special, incidental, consequential, or other damages.

Library of Congress Cataloging-in-Publication Data applied for


ISBN: 9781119674689

Cover Design: Wiley


Cover Image: © nobeastsofierce/Shutterstock

Set in 9.5/12.5pt STIXTwoText by Straive, Chennai, India

10 9 8 7 6 5 4 3 2 1

Contents

List of Figures xvii


List of Tables xxi
Preface xxiii

1 Background of Data Science 1


1.1 Introduction 1
1.2 Origin of Data Science 2
1.3 Who is a Data Scientist? 2
1.4 Big Data 3
1.4.1 Characteristics of Big Data 4
1.4.2 Big Data Architectures 5

2 Matrix Algebra and Random Vectors 7


2.1 Introduction 7
2.2 Some Basics of Matrix Algebra 7
2.2.1 Vectors 7
2.2.2 Matrices 8
2.3 Random Variables and Distribution Functions 12
2.3.1 The Dirichlet Distribution 15
2.3.2 Multinomial Distribution 17
2.3.3 Multivariate Normal Distribution 18
2.4 Problems 19

3 Multivariate Analysis 21
3.1 Introduction 21
3.2 Multivariate Analysis: Overview 21
3.3 Mean Vectors 22
3.4 Variance–Covariance Matrices 24
3.5 Correlation Matrices 26

3.6 Linear Combinations of Variables 28


3.6.1 Linear Combinations of Sample Means 29
3.6.2 Linear Combinations of Sample Variance and Covariance 29
3.6.3 Linear Combinations of Sample Correlation 30
3.7 Problems 31

4 Time Series Forecasting 35


4.1 Introduction 35
4.2 Terminologies 36
4.3 Components of Time Series 39
4.3.1 Seasonal 39
4.3.2 Trend 40
4.3.3 Cyclical 41
4.3.4 Random 42
4.4 Transformations to Achieve Stationarity 42
4.5 Elimination of Seasonality via Differencing 44
4.6 Additive and Multiplicative Models 44
4.7 Measuring Accuracy of Different Time Series Techniques 45
4.7.1 Mean Absolute Deviation 46
4.7.2 Mean Absolute Percent Error 46
4.7.3 Mean Square Error 47
4.7.4 Root Mean Square Error 48
4.8 Averaging and Exponential Smoothing Forecasting Methods 48
4.8.1 Averaging Methods 49
4.8.1.1 Simple Moving Averages 49
4.8.1.2 Weighted Moving Averages 51
4.8.2 Exponential Smoothing Methods 54
4.8.2.1 Simple Exponential Smoothing 54
4.8.2.2 Adjusted Exponential Smoothing 55
4.9 Problems 57

5 Introduction to R 61
5.1 Introduction 61
5.2 Basic Data Types 62
5.2.1 Numeric Data Type 62
5.2.2 Integer Data Type 62
5.2.3 Character 63
5.2.4 Complex Data Types 63
5.2.5 Logical Data Types 64
5.3 Simple Manipulations – Numbers and Vectors 64
5.3.1 Vectors and Assignment 64

5.3.2 Vector Arithmetic 65


5.3.3 Vector Index 66
5.3.4 Logical Vectors 67
5.3.5 Missing Values 68
5.3.6 Index Vectors 69
5.3.6.1 Indexing with Logicals 69
5.3.6.2 A Vector of Positive Integral Quantities 69
5.3.6.3 A Vector of Negative Integral Quantities 69
5.3.6.4 Named Indexing 69
5.3.7 Other Types of Objects 70
5.3.7.1 Matrices 70
5.3.7.2 List 72
5.3.7.3 Factor 73
5.3.7.4 Data Frames 75
5.3.8 Data Import 76
5.3.8.1 Excel File 76
5.3.8.2 CSV File 76
5.3.8.3 Table File 77
5.3.8.4 Minitab File 77
5.3.8.5 SPSS File 77
5.4 Problems 78

6 Introduction to Python 81
6.1 Introduction 81
6.2 Basic Data Types 82
6.2.1 Number Data Type 82
6.2.1.1 Integer 82
6.2.1.2 Floating-Point Numbers 83
6.2.1.3 Complex Numbers 84
6.2.2 Strings 84
6.2.3 Lists 85
6.2.4 Tuples 86
6.2.5 Dictionaries 86
6.3 Number Type Conversion 87
6.4 Python Conditions 87
6.4.1 If Statements 88
6.4.2 The Else and Elif Clauses 89
6.4.3 The While Loop 90
6.4.3.1 The Break Statement 91
6.4.3.2 The Continue Statement 91
6.4.4 For Loops 91

6.4.4.1 Nested Loops 92


6.5 Python File Handling: Open, Read, and Close 93
6.6 Python Functions 93
6.6.1 Calling a Function in Python 94
6.6.2 Scope and Lifetime of Variables 94
6.7 Problems 95

7 Algorithms 97
7.1 Introduction 97
7.2 Algorithm – Definition 97
7.3 How to Write an Algorithm 98
7.3.1 Algorithm Analysis 99
7.3.2 Algorithm Complexity 99
7.3.3 Space Complexity 100
7.3.4 Time Complexity 100
7.4 Asymptotic Analysis of an Algorithm 101
7.4.1 Asymptotic Notations 102
7.4.1.1 Big O Notation 102
7.4.1.2 The Omega Notation, Ω 102
7.4.1.3 The Θ Notation 102
7.5 Examples of Algorithms 104
7.6 Flowchart 104
7.7 Problems 105

8 Data Preprocessing and Data Validations 109


8.1 Introduction 109
8.2 Definition – Data Preprocessing 109
8.3 Data Cleaning 110
8.3.1 Handling Missing Data 110
8.3.2 Types of Missing Data 110
8.3.2.1 Missing Completely at Random 110
8.3.2.2 Missing at Random 110
8.3.2.3 Missing Not at Random 111
8.3.3 Techniques for Handling the Missing Data 111
8.3.3.1 Listwise Deletion 111
8.3.3.2 Pairwise Deletion 111
8.3.3.3 Mean Substitution 112
8.3.3.4 Regression Imputation 112
8.3.3.5 Multiple Imputation 112
8.3.4 Identifying Outliers and Noisy Data 113
8.3.4.1 Binning 113

8.3.4.2 Box and Whisker plot 113


8.4 Data Transformations 115
8.4.1 Min–Max Normalization 115
8.4.2 Z-score Normalization 115
8.5 Data Reduction 116
8.6 Data Validations 117
8.6.1 Methods for Data Validation 117
8.6.1.1 Simple Statistical Criterion 117
8.6.1.2 Fourier Series Modeling and SSC 118
8.6.1.3 Principal Component Analysis and SSC 118
8.7 Problems 119

9 Data Visualizations 121


9.1 Introduction 121
9.2 Definition – Data Visualization 121
9.2.1 Scientific Visualization 123
9.2.2 Information Visualization 123
9.2.3 Visual Analytics 124
9.3 Data Visualization Techniques 126
9.3.1 Time Series Data 126
9.3.2 Statistical Distributions 127
9.3.2.1 Stem-and-Leaf Plots 127
9.3.2.2 Q–Q Plots 127
9.4 Data Visualization Tools 129
9.4.1 Tableau 129
9.4.2 Infogram 130
9.4.3 Google Charts 132
9.5 Problems 133

10 Binomial and Trinomial Trees 135


10.1 Introduction 135
10.2 The Binomial Tree Method 135
10.2.1 One Step Binomial Tree 136
10.2.2 Using the Tree to Price a European Option 139
10.2.3 Using the Tree to Price an American Option 140
10.2.4 Using the Tree to Price Any Path Dependent Option 141
10.3 Binomial Discrete Model 141
10.3.1 One-Step Method 141
10.3.2 Multi-step Method 145
10.3.2.1 Example: European Call Option 146
10.4 Trinomial Tree Method 147

10.4.1 What is the Meaning of Little o and Big O? 148


10.5 Problems 148

11 Principal Component Analysis 151


11.1 Introduction 151
11.2 Background of Principal Component Analysis 151
11.3 Motivation 152
11.3.1 Correlation and Redundancy 152
11.3.2 Visualization 153
11.4 The Mathematics of PCA 153
11.4.1 The Eigenvalues and Eigenvectors 156
11.5 How PCA Works 159
11.5.1 Algorithm 160
11.6 Application 161
11.7 Problems 162

12 Discriminant and Cluster Analysis 165


12.1 Introduction 165
12.2 Distance 165
12.3 Discriminant Analysis 166
12.3.1 Kullback–Leibler Divergence 167
12.3.2 Chernoff Distance 167
12.3.3 Application – Seismic Time Series 169
12.3.4 Application – Financial Time Series 171
12.4 Cluster Analysis 173
12.4.1 Partitioning Algorithms 174
12.4.2 k-Means Algorithm 174
12.4.3 k-Medoids Algorithm 175
12.4.4 Application – Seismic Time Series 176
12.4.5 Application – Financial Time Series 176
12.5 Problems 177

13 Multidimensional Scaling 179


13.1 Introduction 179
13.2 Motivation 180
13.3 Number of Dimensions and Goodness of Fit 182
13.4 Proximity Measures 183
13.5 Metric Multidimensional Scaling 183
13.5.1 The Classical Solution 184
13.6 Nonmetric Multidimensional Scaling 186
13.6.1 Shepard–Kruskal Algorithm 186
13.7 Problems 187

14 Classification and Tree-Based Methods 191


14.1 Introduction 191
14.2 An Overview of Classification 191
14.2.1 The Classification Problem 192
14.2.2 Logistic Regression Model 192
14.2.2.1 l1 Regularization 193
14.2.2.2 l2 Regularization 194
14.3 Linear Discriminant Analysis 194
14.3.1 Optimal Classification and Estimation of Gaussian Distribution 195
14.4 Tree-Based Methods 197
14.4.1 One Single Decision Tree 197
14.4.2 Random Forest 198
14.5 Applications 200
14.6 Problems 202

15 Association Rules 205


15.1 Introduction 205
15.2 Market Basket Analysis 205
15.3 Terminologies 207
15.3.1 Itemset and Support Count 207
15.3.2 Frequent Itemset 207
15.3.3 Closed Frequent Itemset 207
15.3.4 Maximal Frequent Itemset 208
15.3.5 Association Rule 208
15.3.6 Rule Evaluation Metrics 208
15.4 The Apriori Algorithm 210
15.4.1 An Example of the Apriori Algorithm 211
15.5 Applications 213
15.5.1 Confidence 214
15.5.2 Lift 215
15.5.3 Conviction 215
15.6 Problems 216

16 Support Vector Machines 219


16.1 Introduction 219
16.2 The Maximal Margin Classifier 219
16.3 Classification Using a Separating Hyperplane 223
16.4 Kernel Functions 225
16.5 Applications 225
16.6 Problems 227

17 Neural Networks 231


17.1 Introduction 231
17.2 Perceptrons 231
17.3 Feed Forward Neural Network 231
17.4 Recurrent Neural Networks 233
17.5 Long Short-Term Memory 234
17.5.1 Residual Connections 235
17.5.2 Loss Functions 236
17.5.3 Stochastic Gradient Descent 236
17.5.4 Regularization – Ensemble Learning 237
17.6 Application 237
17.6.1 Emergent and Developed Market 237
17.6.2 The Lehman Brothers Collapse 237
17.6.3 Methodology 238
17.6.4 Analyses of Data 238
17.6.4.1 Results of the Emergent Market Index 238
17.6.4.2 Results of the Developed Market Index 238
17.7 Significance of Study 239
17.8 Problems 240

18 Fourier Analysis 245


18.1 Introduction 245
18.2 Definition 245
18.3 Discrete Fourier Transform 246
18.4 The Fast Fourier Transform (FFT) Method 247
18.5 Dynamic Fourier Analysis 250
18.5.1 Tapering 251
18.5.2 Daniell Kernel Estimation 252
18.6 Applications of the Fourier Transform 253
18.6.1 Modeling Power Spectrum of Financial Returns Using Fourier
Transforms 253
18.6.2 Image Compression 259
18.7 Problems 259

19 Wavelets Analysis 261


19.1 Introduction 261
19.1.1 Wavelets Transform 262
19.2 Discrete Wavelets Transforms 264
19.2.1 Haar Wavelets 265
19.2.1.1 Haar Functions 265
19.2.1.2 Haar Transform Matrix 266

19.2.2 Daubechies Wavelets 267


19.3 Applications of the Wavelets Transform 269
19.3.1 Discriminating Between Mining Explosions and Cluster of
Earthquakes 269
19.3.1.1 Background of Data 269
19.3.1.2 Results 269
19.3.2 Finance 271
19.3.3 Damage Detection in Frame Structures 275
19.3.4 Image Compression 275
19.3.5 Seismic Signals 275
19.4 Problems 276

20 Stochastic Analysis 279


20.1 Introduction 279
20.2 Necessary Definitions from Probability Theory 279
20.3 Stochastic Processes 280
20.3.1 The Index Set  281
20.3.2 The State Space  281
20.3.3 Stationary and Independent Components 281
20.3.4 Stationary and Independent Increments 282
20.3.5 Filtration and Standard Filtration 283
20.4 Examples of Stochastic Processes 284
20.4.1 Markov Chains 285
20.4.1.1 Examples of Markov Processes 286
20.4.1.2 The Chapman–Kolmogorov Equation 287
20.4.1.3 Classification of States 289
20.4.1.4 Limiting Probabilities 290
20.4.1.5 Branching Processes 291
20.4.1.6 Time Homogeneous Chains 293
20.4.2 Martingales 294
20.4.3 Simple Random Walk 294
20.4.4 The Brownian Motion (Wiener Process) 294
20.5 Measurable Functions and Expectations 295
20.5.1 Radon–Nikodym Theorem and Conditional Expectation 296
20.6 Problems 299

21 Fractal Analysis – Lévy, Hurst, DFA, DEA 301


21.1 Introduction and Definitions 301
21.2 Lévy Processes 301
21.2.1 Examples of Lévy Processes 304
21.2.1.1 The Poisson Process (Jumps) 305
21.2.1.2 The Compound Poisson Process 305

21.2.1.3 Inverse Gaussian (IG) Process 306


21.2.1.4 The Gamma Process 307
21.2.2 Exponential Lévy Models 307
21.2.3 Subordination of Lévy Processes 308
21.2.4 Stable Distributions 309
21.3 Lévy Flight Models 311
21.4 Rescaled Range Analysis (Hurst Analysis) 312
21.5 Detrended Fluctuation Analysis (DFA) 315
21.6 Diffusion Entropy Analysis (DEA) 316
21.6.1 Estimation Procedure 317
21.6.1.1 The Shannon Entropy 317
21.6.2 The H–𝛼 Relationship for the Truncated Lévy Flight 319
21.7 Application – Characterization of Volcanic Time Series 321
21.7.1 Background of Volcanic Data 321
21.7.2 Results 321
21.8 Problems 323

22 Stochastic Differential Equations 325


22.1 Introduction 325
22.2 Stochastic Differential Equations 325
22.2.1 Solution Methods of SDEs 326
22.3 Examples 335
22.3.1 Modeling Asset Prices 335
22.3.2 Modeling Magnitude of Earthquake Series 336
22.4 Multidimensional Stochastic Differential Equations 337
22.4.1 The Multidimensional Ornstein–Uhlenbeck Processes 337
22.4.2 Solution of the Ornstein–Uhlenbeck Process 338
22.5 Simulation of Stochastic Differential Equations 340
22.5.1 Euler–Maruyama Scheme for Approximating Stochastic Differential
Equations 340
22.5.2 Euler–Milstein Scheme for Approximating Stochastic Differential
Equations 341
22.6 Problems 343

23 Ethics: With Great Power Comes Great Responsibility 345


23.1 Introduction 345
23.2 Data Science Ethical Principles 346
23.2.1 Enhance Value in Society 346
23.2.2 Avoiding Harm 346
23.2.3 Professional Competence 347
23.2.4 Increasing Trustworthiness 348

23.2.5 Maintaining Accountability and Oversight 348


23.3 Data Science Code of Professional Conduct 348
23.4 Application 350
23.4.1 Project Planning 350
23.4.2 Data Preprocessing 350
23.4.3 Data Management 350
23.4.4 Analysis and Development 351
23.5 Problems 351

Bibliography 353
Index 359

List of Figures

Figure 4.1 Time series data of phase arrival times of an earthquake. 36


Figure 4.2 Time series data of financial returns corresponding to Bank of
America (BAC) stock index. 37
Figure 4.3 Seasonal trend component. 40
Figure 4.4 Linear trend component. The horizontal axis is time t, and the
vertical axis is the time series Yt . (a) Linear increasing trend.
(b) Linear decreasing trend. 41
Figure 4.5 Nonlinear trend component. The horizontal axis is time t and the
vertical axis is the time series Yt . (a) Nonlinear increasing trend.
(b) Nonlinear decreasing trend. 41
Figure 4.6 Cyclical component (imposed on the underlying trend). The
horizontal axis is time t and the vertical axis is the time series
Yt . 42
Figure 7.1 The big O notation. 102
Figure 7.2 The Ω notation. 103
Figure 7.3 The Θ notation. 103
Figure 7.4 Symbols used in flowchart. 105
Figure 7.5 Flowchart to add two numbers entered by user. 106
Figure 7.6 Flowchart to find all roots of a quadratic equation
ax2 + bx + c = 0. 107
Figure 7.7 Flowchart. 108
Figure 8.1 The box plot. 113
Figure 8.2 Box plot example. 114
Figure 9.1 Scatter plot of temperature versus ice cream sales. 122
Figure 9.2 Heatmap of handwritten digit data. 124
Figure 9.3 Map of earthquake magnitudes recorded in Chile. 125

Figure 9.4 Spatial distribution of earthquake magnitudes (Mariani et al.


2016). 126
Figure 9.5 Number of text messages sent. 128
Figure 9.6 Normal Q–Q plot. 128
Figure 9.7 Risk of loan default. Source: Tableau Viz Gallery. 130
Figure 9.8 Top five publishing markets. Source: Modified from International
Publishers Association – Annual Report. 131
Figure 9.9 High yield defaulted issuer and volume trends. Source: Based on
Fitch High Yield Default Index, Bloomberg. 131
Figure 9.10 Statistics page for popular movies and cinema locations. Source:
Google Charts. 132
Figure 10.1 One-step binomial tree for the return process. 137
Figure 11.1 Height versus weight. 153
Figure 11.2 Visualizing low-dimensional data. 154
Figure 11.3 2D data set. 157
Figure 11.4 First PCA axis. 157
Figure 11.5 Second PCA axis. 157
Figure 11.6 New axis. 158
Figure 11.7 Scatterplot of Royal Dutch Shell stock versus Exxon Mobil
stock. 161
Figure 12.1 Classification (by quadrant) of earthquakes and explosions using
the Chernoff and Kullback–Leibler differences. 171
Figure 12.2 Classification (by quadrant) of Lehman Brothers collapse and
Flash crash event using the Chernoff and Kullback–Leibler
differences. 173
Figure 12.3 Clustering results for the earthquake and explosion series based
on symmetric divergence using PAM algorithm. 176
Figure 12.4 Clustering results for the Lehman Brothers collapse, Flash crash
event, Citigroup (2009), and IAG (2011) stock data based on
symmetric divergence using the PAM algorithm. 177
Figure 13.1 Scatter plot of data in Table 13.1. 180
Figure 16.1 The xy-plane and several other horizontal planes. 220
Figure 16.2 The xy-plane and several parallel planes. 221
Figure 16.3 The plane x + y + z = 1. 221
Figure 16.4 Two class problem when data is linearly separable. 224

Figure 16.5 Two class problem when data is not linearly separable. 224
Figure 16.6 ROC curve for linear SVM. 226
Figure 16.7 ROC curve for nonlinear SVM. 227
Figure 17.1 Single hidden layer feed-forward neural networks. 232
Figure 17.2 Simple recurrent neural network. 234
Figure 17.3 Long short-term memory unit. 235
Figure 17.4 Philippines (PSI). (a) Basic RNN. (b) LSTM. 239
Figure 17.5 Thailand (SETI). (a) Basic RNN. (b) LSTM. 240
Figure 17.6 United States (NASDAQ). (a) Basic RNN. (b) LSTM. 241
Figure 17.7 JPMorgan Chase & Co. (JPM). (a) Basic RNN. (b) LSTM. 242
Figure 17.8 Walmart (WMT). (a) Basic RNN. (b) LSTM. 243
Figure 18.1 3D power spectra of the daily returns from the four analyzed stock
companies. (a) Discover. (b) Microsoft. (c) Walmart. (d) JPM
Chase. 255
Figure 18.2 3D power spectra of the returns (generated per minute) from the
four analyzed stock companies. (a) Discover. (b) Microsoft.
(c) Walmart. (d) JPM Chase. 257
Figure 19.1 Time-frequency image of explosion 1 recorded by ANMO
(Table 19.2). 270
Figure 19.2 Time-frequency image of earthquake 1 recorded by ANMO
(Table 19.2). 270
Figure 19.3 Three-dimensional graphic information of explosion 1 recorded
by ANMO (Table 19.2). 272
Figure 19.4 Three-dimensional graphic information of earthquake 1 recorded
by ANMO (Table 19.2). 272
Figure 19.5 Time-frequency image of explosion 2 recorded by TUC
(Table 19.3). 273
Figure 19.6 Time-frequency image of earthquake 2 recorded by TUC
(Table 19.3). 273
Figure 19.7 Three-dimensional graphic information of explosion 2 recorded
by TUC (Table 19.3). 274
Figure 19.8 Three-dimensional graphic information of earthquake 2 recorded
by TUC (Table 19.3). 274
Figure 21.1 R∕S for volcanic eruptions 1 and 2. 322
Figure 21.2 DFA for volcanic eruptions 1 and 2. 323
Figure 21.3 DEA for volcanic eruptions 1 and 2. 323

List of Tables

Table 2.1 Examples of random vectors. 13


Table 3.1 Ramus Bone Length at Four Ages for 20 Boys. 33
Table 4.1 Time series data of the volume of sales over a six-hour
period. 50
Table 4.2 Simple moving average forecasts. 50
Table 4.3 Time series data used in Example 4.6. 52
Table 4.4 Weighted moving average forecasts. 52
Table 4.5 Trend projection of weighted moving average forecasts. 53
Table 4.6 Exponential smoothing forecasts of volume of sales. 55
Table 4.7 Exponential smoothing forecasts from Example 4.9. 56
Table 4.8 Adjusted exponential smoothing forecasts. 57
Table 6.1 Numbers. 83
Table 6.2 File modes in Python. 93
Table 7.1 Common asymptotic notations. 103
Table 9.1 Temperature versus ice cream sales. 122
Table 12.1 Events information. 170
Table 12.2 Discriminant scores for earthquakes and explosions groups. 170
Table 12.3 Discriminant scores for Lehman Brothers collapse and Flash crash
event. 172
Table 12.4 Discriminant scores for Citigroup in 2009 and IAG stock in
2011. 172
Table 13.1 Data matrix. 180
Table 13.2 Distance matrix. 181
Table 13.3 Stress and goodness of fit. 182

Table 13.4 Data matrix. 188


Table 14.1 Models’ performances on the test dataset with 23 variables using
AUC and mean square error (MSE) values for the five
models. 201
Table 14.2 Top 10 variables selected by the Random forest algorithm. 201
Table 14.3 Performance for the four models using the top 10 features from
model Random forest on the test dataset. 201
Table 15.1 Market basket transaction data. 206
Table 15.2 A binary 0∕1 representation of market basket transaction
data. 206
Table 15.3 Grocery transactional data. 211
Table 15.4 Transaction data. 216
Table 16.1 Models’ performances on the test dataset. 226
Table 18.1 Percentage of power for Discover data. 254
Table 18.2 Percentage of power for JPM data. 254
Table 18.3 Percentage of power for Microsoft data. 254
Table 18.4 Percentage of power for Walmart data. 254
Table 19.1 Determining p and q for N = 16. 266
Table 19.2 Percentage of total power (energy) for Albuquerque, New Mexico
(ANMO) seismic station. 271
Table 19.3 Percentage of total power (energy) for Tucson, Arizona (TUC)
seismic station. 271
Table 21.1 Moments of the Poisson distribution with intensity 𝜆. 306
Table 21.2 Moments of the Γ(a, b) distribution. 307
Table 21.3 Scaling exponents of Volcanic Data time series. 322

Preface

This textbook is dedicated to practitioners, graduate students, and advanced
undergraduate students who have an interest in Data Science, Business Analytics,
and Statistical and Mathematical Modeling in different disciplines such as Finance,
Geophysics, and Engineering. This book is designed to serve as a textbook for
several courses in the aforementioned areas and a reference guide for
practitioners in the industry.
The book has a strong theoretical background and several applications to
specific practical problems. It contains numerous techniques applicable to
modern data science and other disciplines. In today’s world, many fields are
confronted with increasingly large amounts of complex data. Financial,
healthcare, and geophysical data sampled with high frequency is no exception.
These staggering amounts of data pose special challenges to the world of
finance and other disciplines such as healthcare and geophysics, as traditional
models and information technology tools can be poorly suited to grapple with
their size and complexity. Probabilistic modeling, mathematical modeling, and
statistical data analysis attempt to discover order from apparent disorder;
this textbook may serve as a guide to various new systematic approaches on how
to implement these quantitative activities with complex data sets.
The textbook is split into five distinct parts. In the first part of this book,
Foundations of Data Science, we will discuss some fundamental mathematical and
statistical concepts which form the basis for the study of data science. In the
second part of the book, Data Science in Practice, we will present a brief
introduction to R and Python programming and how to write algorithms. In
addition, various techniques for data preprocessing, validations, and
visualizations will be discussed. In the third part, Data Mining and Machine
Learning Techniques for Complex Data Sets, and the fourth part of the book,
Advanced Models for Big Data Analytics and Complex Data Sets, we will provide
exhaustive techniques for analyzing and predicting different types of complex
data sets.

We conclude this book with a discussion of ethics in data science: With great
power comes great responsibility.
The authors express their deepest gratitude to Wiley for making the publication
a reality.

El Paso, TX and Mahwah, NJ, USA
September 2021

Maria Cristina Mariani
Osei Kofi Tweneboah
Maria Pia Beccar-Varela
1

Background of Data Science

1.1 Introduction
Data science is one of the most promising and high-demand career paths for
skilled professionals in the 21st century. Currently, successful data
professionals understand that they must advance past the traditional skills of
analyzing large amounts of data, statistical learning, and programming. In
order to explore and discover useful information for their companies or
organizations, data scientists must have a good grasp of the full spectrum of
the data science life cycle as well as the flexibility and understanding needed
to maximize returns at each phase of the process.
Data science is a “concept to unify statistics, mathematics, computer science,
data analysis, machine learning and their related methods” in order to find
trends, understand, and analyze actual phenomena with data. Due to the
Coronavirus disease (COVID-19), many colleges, institutions, and large
organizations asked their nonessential employees to work virtually. The virtual
meetings have provided colleges and companies with plenty of data. Some aspects
of the data suggest that virtual fatigue is on the rise. Virtual fatigue is
defined as the burnout associated with overdependence on virtual platforms for
communication. Data science provides tools to explore and reveal the best and
worst aspects of virtual work.
In the past decade, data scientists have become necessary assets and are present
in almost all institutions and organizations. These professionals are data-driven
individuals with high-level technical skills who are capable of building complex
quantitative algorithms to organize and synthesize large amounts of information
used to answer questions and drive strategy in their organization. This is coupled
with the experience in communication and leadership needed to deliver tangible
results to various stakeholders across an organization or business.
Data scientists need to be curious and result-oriented, with good
domain-specific knowledge and communication skills that allow them to explain
very technical results to their nontechnical counterparts. They possess a
strong quantitative background in statistics and mathematics as well as
programming knowledge, with a focus on data warehousing, mining, and modeling
to build and analyze algorithms. In fact, data scientists are a group of
analytical data experts who have the technical skills to solve complex problems
and the curiosity to explore how problems need to be solved.

1.2 Origin of Data Science


Data scientists are part mathematicians, part statisticians, and part computer scientists. And because they span both the business and information technology (IT) worlds, they’re in high demand and well-paid. Data scientists were not very popular
some decades ago; however, their sudden popularity reflects how businesses now
think about “Big data.” Big data is defined as a field that treats ways to analyze,
systematically extract information from, or otherwise deal with data sets that are
too large or complex to be dealt with by traditional data-processing application
software. That bulky mass of unstructured information can no longer be ignored
and forgotten. It is a virtual gold mine that helps boost revenue as long as there
is someone who explores and discovers business insights that no one thought
to look for before. Many data scientists began their careers as statisticians, business analysts, or data analysts. However, as big data began to grow and evolve, those roles evolved as well. Data is no longer just an add-on for IT to handle.
It is vital information that requires analysis, creative curiosity, and the ability to translate high-tech ideas into innovative ways to make profit and to help practitioners make informed decisions.

1.3 Who is a Data Scientist?


The term “data scientist” was invented as recently as 2008 when companies real-
ized the need for data professionals who are skilled in organizing and analyz-
ing massive amounts of data. Data scientists are quantitative and analytical data
experts who utilize their skills in both technology and social science to find trends
and manage the data around them. With the growth of big data integration in business, they have emerged at the forefront of the data revolution. They are part mathematicians, statisticians, computer programmers, and analysts who are equipped with a diverse and wide-ranging skill set, balancing knowledge in several computer programming languages with advanced experience in statistical learning and data visualization.
There is not a definitive job description when it comes to a data scientist role. However, we outline here some of the tasks they perform:
● Collecting and recording large amounts of unruly data and transforming it into
a more usable format.

● Solving business-related problems using data-driven techniques.


● Working with a variety of programming languages, including SAS, Minitab, R,
and Python.
● Having a strong background of mathematics and statistics including statistical
tests and distributions.
● Staying on top of quantitative and analytical techniques such as machine learn-
ing, deep learning, and text analytics.
● Communicating and collaborating with both IT and business.
● Looking for order and patterns in data, as well as spotting trends that enable businesses to make informed decisions.
Some of the useful tools that every data scientist or practitioner needs are outlined
below:
● Data preparation: The process of cleaning and transforming raw data into suit-
able formats prior to processing and analysis.
● Data visualization: The presentation of data in a pictorial or graphical format so
it can be easily analyzed.
● Statistical learning or Machine learning: A branch of artificial intelligence based
on mathematical algorithms and automation. Artificial intelligence (AI) refers
to the process of building smart machines capable of performing tasks that typ-
ically require human intelligence. They are designed to make decisions, often
using real-time data. Real-time data are information that is passed along to the end user as soon as it is gathered.
● Deep learning: An area of statistical learning research that uses data to model
complex abstractions.
● Pattern recognition: Technology that recognizes patterns in data (often used
interchangeably with machine learning).
● Text analytics: The process of examining unstructured data and drawing mean-
ing out of written communication.
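Data preparation is typically the first of these steps in practice. The following Python sketch (using pandas; the column names and values are invented for illustration) deduplicates records, fixes types, and imputes a missing value:

```python
import pandas as pd

# Hypothetical raw survey data: duplicated rows, string-typed numbers,
# and a missing value -- typical problems data preparation must fix.
raw = pd.DataFrame({
    "employee": ["Ana", "Ben", "Cara", "Ben"],
    "hours_virtual": ["31", "42", None, "42"],
    "fatigue_score": [3.5, 4.8, 2.1, 4.8],
})

clean = (
    raw.drop_duplicates()                                    # remove repeated records
       .assign(hours_virtual=lambda d: pd.to_numeric(d["hours_virtual"]))
       .fillna({"hours_virtual": 0})                         # impute missing hours
)
print(len(clean))  # 3 rows remain after deduplication
```

The cleaned, numerically typed table can then be passed to visualization or modeling steps.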
We will discuss all the above tools in detail in this book. There are several scientific and programming skills that every data scientist should have. They must be able to utilize key technical tools and skills, including R, Python, SAS, SQL, Tableau, and several others. Because technology is ever growing, data scientists must always learn new and emerging techniques to stay on top of their game. We will discuss R and Python programming in Chapters 5 and 6.

1.4 Big Data


Big data is a term applied to ways to analyze, systematically extract information
from, or otherwise deal with data sets that are too large or complex to be dealt
with by classical data-processing tools. In particular, it refers to data sets whose

size or type is beyond the ability of traditional relational databases to capture,


manage, and process the data with low latency. Sources of big data include sensors, stock markets, devices, video/audio, networks, log files, transactional applications, the web, and social media, with much of it generated in real time and at a very large scale.
In recent times, the use of the term “big data” (both stored and real-time) tends to refer to the use of user behavior analytics (UBA), predictive analytics, or certain other advanced data analytics methods that extract value from data. UBA solutions look at patterns of human behavior and then apply algorithms and statistical analysis to detect meaningful anomalies in those patterns, anomalies that indicate potential threats. Examples include detection of hackers, insider threats, targeted attacks, and financial fraud.
Predictive analytics deals with the process of extracting information from
existing data sets in order to determine patterns and predict future outcomes and
trends. Generally, predictive analytics does not tell you what will happen in the
future. However, it forecasts what might happen in the future with some degree
of certainty. Predictive analytics goes hand in hand with big data: businesses and organizations collect large amounts of real-time customer data, and predictive analytics uses this historical data, combined with customer insight, to forecast future events. Predictive analytics helps organizations use big data to move
from a historical view to a forward-looking perspective of the customer. In this
book, we will discuss several methods for analyzing big data.

1.4.1 Characteristics of Big Data


Big data has one or more of the following characteristics: high volume, high veloc-
ity, high variety, and high veracity. That is, the data sets are characterized by huge
amounts (volume) of frequently updated data (velocity) in various types, such as
numeric, textual, audio, images and videos (variety), with high quality (veracity).
We briefly discuss each characteristic.
● Volume: Volume describes the quantity of generated and stored data. The size of the data determines the value and potential insight, and whether it can be considered big data or not.
● Velocity: Velocity describes the speed at which the data is generated and processed to meet the demands and challenges that lie in the path of growth and development. Big data is often available in both stored and real-time forms. Compared to small data, big data are produced more continually (it could be every nanosecond, second, minute, hour, etc.). Two types of velocity related to big data are the frequency of generation and the frequency of handling, recording, and reporting.
● Variety: Variety describes the types and formats of the data. This helps people who analyze the data to effectively use the resulting insight. Big data draws from different formats and completes missing pieces through data fusion. Data fusion is a term used to describe the technique of integrating multiple data sources to produce more consistent, accurate, and useful information than that provided by any individual data source.
● Veracity: Veracity describes the quality of the data and the data value. The quality of data obtained can greatly affect the accuracy of the analyzed results.
In the next subsection we will discuss some big data architectures. A comprehensive study of this topic can be found in the application architecture guide of the Microsoft technical documentation.

1.4.2 Big Data Architectures


Big data architectures are designed to handle the ingestion, processing, and anal-
ysis of data that is too large or complex for classical data-processing application
tools. Some popular big data architectures are the Lambda architecture, the Kappa architecture, and the Internet of Things (IoT) architecture. We refer the reader to the Microsoft
technical documentation on Big data architectures for a detailed discussion on the
different architectures. Almost all big data architectures include all or some of the
following components:
● Data sources: All big data solutions begin with one or more data sources. Some common data sources include the following: application data stores such as relational databases, static files produced by applications such as web server log files, and real-time data sources such as Internet of Things (IoT) devices.
● Data storage: Data for batch processing operations is typically stored in a dis-
tributed file store that can hold high volumes of large files in various formats.
This kind of store is often called a data lake. A data lake is a storage repository
that allows one to store structured and unstructured data at any scale until it is
needed.
● Batch processing: Since data sets are enormous, often a big data solution must
process data files using long-running batch jobs to filter, aggregate, and other-
wise prepare the data for analysis. Normally, these jobs involve reading source
files, processing them, and writing the output to new files. Options include running U-SQL jobs or using Java, Scala, R, or Python programs. U-SQL is a data processing language that merges the benefits of SQL with the expressive power of one's own code.
● Real-time message ingestion: If the solution includes real-time sources, the archi-
tecture must include a way to capture and store real-time messages for stream
processing. This might be a simple data store, where incoming messages are
stored into a folder for processing. However, many solutions need a message
ingestion store to act as a buffer for messages and to support scale-out process-
ing, reliable delivery, and other message queuing semantics.
● Stream processing: After obtaining real-time messages, the solution must process
them by filtering, aggregating, and preparing the data for analysis. The processed
stream data is then written to an output sink.

● Analytical data store: Several big data solutions prepare data for analysis and
then serve the processed data in a structured format that can be queried using
analytical tools. The analytical data store used to serve these queries can be a
Kimball-style relational data warehouse, as observed in most classical business
intelligence (BI) solutions. Alternatively, the data could be presented through a
low-latency NoSQL technology, such as HBase, or an interactive Hive database
that provides a metadata abstraction over data files in the distributed data store.
● Analysis and reporting: The goal of most big data solutions is to provide insights
into the data through analysis and reporting. Users can analyze the data using mathematical and statistical models as well as using data visualization techniques.
Analysis and reporting can also take the form of interactive data exploration by
data scientists or data analysts.
● Orchestration: Several big data solutions consist of repeated data processing
operations, encapsulated in workflows, that transform source data, move data
between multiple sources and sinks, load the processed data into an analytical
data store, or move the results to a report or dashboard.
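As a toy sketch of the batch processing component described above (the record layout and field names are invented for illustration), a batch job reads source records, filters and aggregates them, and writes the prepared output to a sink:

```python
import csv
import io

# Hypothetical source file: one log record per line (user, bytes transferred).
source = io.StringIO("alice,120\nbob,300\nalice,80\n")

# Batch job: read the source records and aggregate bytes per user.
totals = {}
for user, nbytes in csv.reader(source):
    totals[user] = totals.get(user, 0) + int(nbytes)

# Write the prepared output to a new sink (a file or data store in a real pipeline).
sink = io.StringIO()
writer = csv.writer(sink)
for user in sorted(totals):
    writer.writerow([user, totals[user]])
```

A production job would follow the same read-process-write pattern, but distributed over many large files rather than an in-memory string.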
2 Matrix Algebra and Random Vectors

2.1 Introduction

The matrix algebra and random vectors presented in this chapter will enable us to
precisely state statistical models. We will begin by discussing some basic concepts
that will be essential throughout this chapter. For more details on matrix algebra, please consult Axler (2015).

2.2 Some Basics of Matrix Algebra


2.2.1 Vectors

Definition 2.1 (Vector) A vector x is an array of real numbers x_1, x_2, ..., x_n, and it is written as:
\[
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}.
\]

Definition 2.2 (Scalar multiplication of vectors) The product of a scalar c and a vector x is the vector obtained by multiplying each entry in the vector by the scalar:
\[
c\mathbf{x} = \begin{bmatrix} cx_1 \\ cx_2 \\ \vdots \\ cx_n \end{bmatrix}.
\]


Definition 2.3 (Vector addition) The sum of two vectors of the same size is the vector obtained by adding corresponding entries in the vectors:
\[
\mathbf{x} + \mathbf{y} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} + \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{bmatrix},
\]
so that x + y is the vector with ith element x_i + y_i.
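The two vector operations above can be checked numerically; a minimal NumPy sketch (the entries are arbitrary illustrations):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])
c = 2.0

# Scalar multiplication: every entry of x is multiplied by c.
cx = c * x

# Vector addition: entries are added componentwise.
s = x + y

print(cx)  # [2. 4. 6.]
print(s)   # [5. 7. 9.]
```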

2.2.2 Matrices

Definition 2.4 (Matrix) Let m and n denote positive integers. An m-by-n matrix is a rectangular array of real numbers with m rows and n columns:
\[
A = \begin{bmatrix} A_{1,1} & \cdots & A_{1,n} \\ \vdots & & \vdots \\ A_{m,1} & \cdots & A_{m,n} \end{bmatrix}.
\]
The notation A_{i,j} denotes the entry in row i, column j of A. In other words, the first index refers to the row number and the second index refers to the column number.

Example 2.1 If
\[
A = \begin{pmatrix} 1 & 4 & 8 \\ 0 & 4 & 9 \\ 7 & -1 & 7 \end{pmatrix},
\]
then A_{3,1} = 7.

Definition 2.5 (Transpose of a matrix) The transpose operation A^T of a matrix changes the columns into rows, i.e. in matrix notation (A^T)_{i,j} = A_{j,i}, where “T” denotes transpose.

Example 2.2 If
\[
A_{2\times 3} = \begin{pmatrix} 1 & 4 & 8 \\ 0 & 4 & 9 \end{pmatrix}, \quad \text{then} \quad A^T_{3\times 2} = \begin{pmatrix} 1 & 0 \\ 4 & 4 \\ 8 & 9 \end{pmatrix}.
\]

Definition 2.6 (Scalar multiplication of a matrix) The product of a scalar c and a matrix A is the matrix obtained by multiplying each entry in the matrix by the scalar:
\[
cA = \begin{bmatrix} cA_{1,1} & \cdots & cA_{1,n} \\ \vdots & & \vdots \\ cA_{m,1} & \cdots & cA_{m,n} \end{bmatrix}.
\]
In other words, (cA)_{i,j} = cA_{i,j}.

Definition 2.7 (Matrix addition) The sum of two matrices of the same size is the matrix obtained by adding corresponding entries in the matrices:
\[
A + B = \begin{bmatrix} A_{1,1} & \cdots & A_{1,n} \\ \vdots & & \vdots \\ A_{m,1} & \cdots & A_{m,n} \end{bmatrix} + \begin{bmatrix} B_{1,1} & \cdots & B_{1,n} \\ \vdots & & \vdots \\ B_{m,1} & \cdots & B_{m,n} \end{bmatrix} = \begin{bmatrix} A_{1,1} + B_{1,1} & \cdots & A_{1,n} + B_{1,n} \\ \vdots & & \vdots \\ A_{m,1} + B_{m,1} & \cdots & A_{m,n} + B_{m,n} \end{bmatrix}.
\]
In other words, (A + B)_{i,j} = A_{i,j} + B_{i,j}.

Definition 2.8 (Matrix multiplication) Suppose A is an m-by-n matrix and B is an n-by-p matrix. Then AB is defined to be the m-by-p matrix whose entry in row i, column j, is given by the following equation:
\[
(AB)_{i,j} = \sum_{k=1}^{n} A_{i,k} B_{k,j}.
\]
In other words, the entry in row i, column j, of AB is computed by taking row i of A and column j of B, multiplying together corresponding entries, and then summing. The number of columns of A must be equal to the number of rows of B.

Example 2.3 If
\[
A = \begin{bmatrix} 1 & 4 \\ 0 & 4 \\ 7 & -1 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 1 & 1 \\ 2 & 1 \end{bmatrix},
\]
then
\[
AB = \begin{bmatrix} 1 & 4 \\ 0 & 4 \\ 7 & -1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 2 & 1 \end{bmatrix} = \begin{bmatrix} 1(1)+4(2) & 1(1)+4(1) \\ 0(1)+4(2) & 0(1)+4(1) \\ 7(1)-1(2) & 7(1)-1(1) \end{bmatrix} = \begin{bmatrix} 9 & 5 \\ 8 & 4 \\ 5 & 6 \end{bmatrix}.
\]
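A quick NumPy check of Example 2.3 (the `@` operator performs matrix multiplication):

```python
import numpy as np

A = np.array([[1, 4],
              [0, 4],
              [7, -1]])
B = np.array([[1, 1],
              [2, 1]])

# (AB)[i, j] = sum over k of A[i, k] * B[k, j];
# the number of columns of A must equal the number of rows of B.
AB = A @ B
print(AB)  # rows: [9 5], [8 4], [5 6]
```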

Definition 2.9 (Square matrix) A matrix A is said to be a square matrix if


the number of rows is the same as the number of columns.

Definition 2.10 (Symmetric matrix) A square matrix A is said to be symmetric if A = A^T, or in matrix notation (A^T)_{i,j} = A_{j,i} = A_{i,j} for all i and j.

Example 2.4 The matrix \(A = \begin{bmatrix} 1 & 4 \\ 4 & 4 \end{bmatrix}\) is symmetric; the matrix \(B = \begin{bmatrix} 1 & 6 \\ 4 & -4 \end{bmatrix}\) is not symmetric.

Definition 2.11 (Trace) For any square matrix A, the trace of A, denoted by tr(A), is defined as the sum of the diagonal elements, i.e.
\[
\operatorname{tr}(A) = \sum_{i=1}^{n} a_{ii} = a_{11} + a_{22} + \cdots + a_{nn}.
\]

Example 2.5 Let A be the matrix
\[
A = \begin{bmatrix} 1 & 4 & 9 \\ 1 & 0 & 0 \\ 1 & 4 & -9 \end{bmatrix}.
\]
Then
\[
\operatorname{tr}(A) = \sum_{i=1}^{3} a_{ii} = a_{11} + a_{22} + a_{33} = 1 + 0 + (-9) = -8.
\]
We remark that the trace is only defined for square matrices.
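The trace in Example 2.5 can be verified with NumPy:

```python
import numpy as np

A = np.array([[1, 4, 9],
              [1, 0, 0],
              [1, 4, -9]])

# The trace is the sum of the diagonal entries: 1 + 0 + (-9).
print(np.trace(A))  # -8
```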

Definition 2.12 (Determinant of a matrix) Suppose A is an n-by-n matrix,
\[
A = \begin{bmatrix} a_{1,1} & \cdots & a_{1,n} \\ \vdots & & \vdots \\ a_{n,1} & \cdots & a_{n,n} \end{bmatrix}.
\]
The determinant of A, denoted det A or |A|, is defined by
\[
\det A = a_{i1} C_{i1} + a_{i2} C_{i2} + \cdots + a_{in} C_{in},
\]
where the C_{ij} are referred to as the “cofactors” and are computed from
\[
C_{ij} = (-1)^{i+j} \det M_{ij}.
\]
The term M_{ij} is known as the “minor matrix” and is the matrix you get if you eliminate row i and column j from matrix A.

Finding the determinant depends on the dimension of the matrix A; determi-


nants only exist for square matrices.

Example 2.6 For a 2 by 2 matrix
\[
A = \begin{bmatrix} a & b \\ c & d \end{bmatrix},
\]
we have
\[
\det A = |A| = \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc.
\]

Example 2.7 For a 3 by 3 matrix
\[
A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix},
\]
we have
\[
\det A = |A| = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = a_{11}(a_{22}a_{33} - a_{23}a_{32}) - a_{12}(a_{21}a_{33} - a_{23}a_{31}) + a_{13}(a_{21}a_{32} - a_{22}a_{31}).
\]
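Both determinant formulas can be checked numerically (the matrix entries below are arbitrary illustrations):

```python
import numpy as np

# 2-by-2 case: det A = ad - bc.
A2 = np.array([[3.0, 1.0],
               [4.0, 2.0]])
print(np.linalg.det(A2))  # 3*2 - 1*4 = 2

# 3-by-3 case: cofactor expansion along the first row gives 1.
A3 = np.array([[1.0, 2.0, 3.0],
               [0.0, 1.0, 4.0],
               [5.0, 6.0, 0.0]])
print(round(np.linalg.det(A3)))  # 1
```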

Definition 2.13 (Positive definite matrix) A square n × n matrix A is called positive definite if, for any vector u ∈ ℝ^n not identically zero, we have
\[
\mathbf{u}^T A \mathbf{u} > 0.
\]

Example 2.8 Let A be the 2 by 2 matrix
\[
A = \begin{bmatrix} 9 & -2 \\ -2 & 6 \end{bmatrix}.
\]
To show that A is positive definite, by definition
\[
\mathbf{u}^T A \mathbf{u} = [u_1, u_2] \begin{bmatrix} 9 & -2 \\ -2 & 6 \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \end{bmatrix} = 9u_1^2 - 4u_1 u_2 + 6u_2^2 = (2u_1 - u_2)^2 + 5(u_1^2 + u_2^2) > 0 \quad \text{for } [u_1, u_2] \neq [0, 0].
\]
Therefore, A is positive definite.

Definition 2.14 (Positive semidefinite matrix) A matrix A is called positive semidefinite (or nonnegative definite) if, for any vector u ∈ ℝ^n, we have
\[
\mathbf{u}^T A \mathbf{u} \geq 0.
\]

Definition 2.15 (Negative definite matrix) A square n × n matrix A is called negative definite if, for any vector u ∈ ℝ^n not identically zero, we have
\[
\mathbf{u}^T A \mathbf{u} < 0.
\]

Example 2.9 Let A be the 2 by 2 matrix
\[
A = \begin{bmatrix} -2 & 1 \\ 1 & -2 \end{bmatrix}.
\]
To show that A is negative definite, by definition
\[
\mathbf{u}^T A \mathbf{u} = [u_1, u_2] \begin{bmatrix} -2 & 1 \\ 1 & -2 \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \end{bmatrix} = -2u_1^2 + 2u_1 u_2 - 2u_2^2 = -(u_1 - u_2)^2 - u_1^2 - u_2^2 < 0 \quad \text{for } [u_1, u_2] \neq [0, 0].
\]
Therefore, A is negative definite.
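For a symmetric matrix, definiteness can also be checked through its eigenvalues: the matrix is positive (negative) definite exactly when all eigenvalues are positive (negative). A NumPy check of Examples 2.8 and 2.9:

```python
import numpy as np

A_pos = np.array([[9.0, -2.0],
                  [-2.0, 6.0]])   # the matrix from Example 2.8
A_neg = np.array([[-2.0, 1.0],
                  [1.0, -2.0]])   # the matrix from Example 2.9

# eigvalsh computes the eigenvalues of a symmetric matrix (ascending order).
print(np.linalg.eigvalsh(A_pos))  # [ 5. 10.] -> all positive: positive definite
print(np.linalg.eigvalsh(A_neg))  # [-3. -1.] -> all negative: negative definite
```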

Definition 2.16 (Negative semidefinite matrix) A matrix A is called negative semidefinite if, for any vector u ∈ ℝ^n, we have
\[
\mathbf{u}^T A \mathbf{u} \leq 0.
\]
We state the following theorem without proof.

Theorem 2.1 A 2 by 2 symmetric matrix
\[
A = \begin{bmatrix} a & b \\ b & d \end{bmatrix}
\]
is:
1. positive definite if and only if a > 0 and det A > 0;
2. negative definite if and only if a < 0 and det A > 0;
3. indefinite if and only if det A < 0.

2.3 Random Variables and Distribution Functions


We begin this section with the definition of a 𝜎-algebra.

Definition 2.17 (𝜎-algebra) A 𝜎-algebra ℱ is a collection of subsets of Ω satisfying the following conditions:
1. ∅ ∈ ℱ.
2. If F ∈ ℱ, then its complement F^c ∈ ℱ.
3. If F_1, F_2, … is a countable collection of sets in ℱ, then their union ⋃_{n=1}^{∞} F_n ∈ ℱ.

Definition 2.18 (Measurable function) A real-valued function f defined on Ω is called measurable with respect to a sigma algebra ℱ in that space if the inverse image of the set B, defined as f^{-1}(B) ≡ {𝜔 ∈ Ω : f(𝜔) ∈ B}, is a set in the 𝜎-algebra ℱ for all Borel sets B of ℝ. Borel sets are sets that are constructed from open or closed sets by repeatedly taking countable unions, countable intersections, and relative complements.

Definition 2.19 (Random vector) A random vector X is any measurable function defined on the probability space (Ω, ℱ, p) with values in ℝ^n (Table 2.1). Measurable functions will be discussed in detail in Section 20.5.

Suppose we have a random vector X defined on a space (Ω, ℱ, p). The sigma algebra generated by X is the smallest sigma algebra in (Ω, ℱ, p) that contains all the preimages of sets in ℝ through X. That is,
\[
\sigma(X) = \sigma(\{X^{-1}(B) \mid \text{for all Borel sets } B \text{ in } \mathbb{R}\}).
\]
This abstract concept is necessary to make sure that we may calculate any probability related to the random variable X.
Any random vector has a distribution function, defined similarly to the one-dimensional case. Specifically, if the random vector X has components X = (X_1, …, X_n), its cumulative distribution function or cdf is defined as:
\[
F_{\mathbf{X}}(\mathbf{x}) = P(\mathbf{X} \leq \mathbf{x}) = P(X_1 \leq x_1, \ldots, X_n \leq x_n) \quad \text{for all } \mathbf{x}.
\]
Associated with a random variable X and its cdf FX is another function,
called the probability density function (pdf) or probability mass function (pmf).
The terms pdf and pmf refer to the continuous and discrete cases of random
variables, respectively.

Table 2.1 Examples of random vectors.

Experiment               Random variable
Toss two dice            X = sum of the numbers
Toss a coin 10 times     X = number of tails in 10 tosses

Definition 2.20 (Probability mass function) The pmf of a discrete random


variable X is given by
fX (x) = P(X = x) for all x.
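For the first experiment in Table 2.1 (X = the sum of the numbers on two dice), the pmf can be tabulated directly in Python:

```python
from collections import Counter
from fractions import Fraction

# X = sum of the numbers on two fair dice; each of the 36 outcomes is
# equally likely, so P(X = x) = (# outcomes summing to x) / 36.
counts = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))
pmf = {x: Fraction(n, 36) for x, n in counts.items()}

print(pmf[7])             # 1/6: six of the 36 outcomes sum to 7
print(sum(pmf.values()))  # 1, as required of any pmf
```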

Definition 2.21 (Probability density function) The pdf f_X(x) of a continuous random variable X is the function that satisfies
\[
F(\mathbf{x}) = F(x_1, \ldots, x_n) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_n} f_X(t_1, \ldots, t_n)\, dt_n \cdots dt_1.
\]
We will discuss these notions in detail in Chapter 20.


Using these concepts, we can define the moments of the distribution. In fact, suppose that g : ℝ^n → ℝ is any function; then we can calculate the expected value of the random variable g(X_1, …, X_n), when the joint density exists, as:
\[
E[g(X_1, \ldots, X_n)] = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g(x_1, \ldots, x_n) f(x_1, \ldots, x_n)\, dx_1 \cdots dx_n.
\]
Now we can define the moments of the random vector. The first moment is a vector
\[
E[\mathbf{X}] = \boldsymbol{\mu}_X = \begin{pmatrix} E[X_1] \\ \vdots \\ E[X_n] \end{pmatrix}.
\]
The expectation applies to each component in the random vector. Expectations of
functions of random vectors are computed just as with univariate random vari-
ables. We recall that expectation of a random variable is its average value.
The second moment requires calculating all the combinations of the components. The result can be presented in matrix form. The second central moment can be presented as the covariance matrix:
\[
\operatorname{Cov}(\mathbf{X}) = E[(\mathbf{X} - \boldsymbol{\mu}_X)(\mathbf{X} - \boldsymbol{\mu}_X)^T] = \begin{pmatrix} \operatorname{Var}(X_1) & \operatorname{Cov}(X_1, X_2) & \ldots & \operatorname{Cov}(X_1, X_n) \\ \operatorname{Cov}(X_2, X_1) & \operatorname{Var}(X_2) & \ldots & \operatorname{Cov}(X_2, X_n) \\ \vdots & \vdots & \ddots & \vdots \\ \operatorname{Cov}(X_n, X_1) & \operatorname{Cov}(X_n, X_2) & \ldots & \operatorname{Var}(X_n) \end{pmatrix}, \tag{2.1}
\]
where we used the transpose matrix notation, and since Cov(X_i, X_j) = Cov(X_j, X_i), the matrix is symmetric.
We note that the covariance matrix is positive semidefinite (nonnegative definite), i.e. for any vector u ∈ ℝ^n, we have u^T Cov(X) u ≥ 0.
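A numerical illustration of these properties (with arbitrary simulated data): the sample covariance matrix returned by NumPy is symmetric with nonnegative eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 100))   # 3 components, 100 observations each

S = np.cov(X)   # 3-by-3 sample covariance matrix (rows are variables)

# Symmetric, and positive semidefinite: all eigenvalues are >= 0.
print(np.allclose(S, S.T))                        # True
print(bool((np.linalg.eigvalsh(S) >= 0).all()))   # True
```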
Now we explain why the covariance matrix has to be positive semidefinite. Take any vector u ∈ ℝ^n. Then the product
\[
\mathbf{u}^T \mathbf{X} = \sum_{i=1}^{n} u_i X_i \tag{2.2}
\]
observed chemical properties. These calculations have been
criticised by Nicholson[8], who has attempted to show that the
configurations chosen for the electrons in the atoms are inconsistent
with the main principles of the theory, and has also attempted to
prove the impossibility of accounting for other spectra by help of
assumptions similar to those used in the interpretation of the
hydrogen spectrum.
Although I am quite ready to admit that these points involve
great and unsolved difficulties, I am unable to agree with Nicholson’s
conclusions. In the first place, his calculations rest upon a particular
application to non-circular orbits of the principle of constancy of
angular momentum for each electron, which it does not seem
possible to justify either on the Quantum theory or on the ordinary
mechanics, and which has no direct connexion with the assumptions
used in my papers. It has not been proved that the configurations
proposed are inconsistent with the assumption C. But even if it were
possible to prove that the unrestricted use of ordinary mechanics to
the stationary states is inconsistent with the configurations of the
electrons, apparently necessary to explain the observed properties of
the elements, this would not constitute a serious objection to the
deductions in my papers. It must be remarked that all the
applications of ordinary mechanics are essentially connected with the
assumption of periodic orbits. As far as the applications are
concerned, the first part of the assumption C might just as well have
been given the following more cautious form:—
“The relation between the frequency and energy of the particles
in the stationary states can be determined by means of the ordinary
laws of mechanics if these laws lead to periodic orbits.”
The possible necessity for an alteration of this kind in assumption
C may perhaps not seem unlikely when it is remembered that the
laws of mechanics are only known to hold for certain mean values of
the motion of the electrons. In this connexion it should also be
remarked that when considering periodic orbits only mean values are
essential (comp. I. p. 7). The preliminary and tentative character of
the formulation of the general assumptions cannot be too strongly
emphasized, and admittedly they are made to suit certain simple
applications. For example, it has been already shown in paper IV.
that the assumption B needs modification in order to account for the
effect of a magnetic field on spectral lines. In the following sections
some of the recent experimental evidence on line spectra and
characteristic Röntgen rays will be considered, and I shall endeavour
to show that it seems to give strong support to the main principles
of the theory.
§ 2. Spectra emitted from systems
containing only one electron.
In the former papers it was shown that the general assumptions
led to the following formula for the spectrum emitted by an electron
rotating round a positive nucleus

    ν = (2π²e²E²m/h³) · (M/(M+m)) · (1/n₂² − 1/n₁²), . . . . (3)

where E, e and M, m are the electric charges and the masses of the
nucleus and the electron respectively, and n₁ and n₂ are whole
numbers. The frequency of rotation ω and the major axis 2a of the
relative orbit of the particles in the stationary states are given by

    ω = (4π²e²E²m/n³h³) · (M/(M+m)),  2a = (n²h²/2π²eEm) · ((M+m)/M). . . . (4)
The energy W necessary to remove the electron to infinite distance from
the nucleus is

    W = (2π²e²E²m/n²h²) · (M/(M+m)). . . . (5)
This expression is also equal to the mean value of the kinetic energy
of the system. Since −W is equal to the total energy of the
system we get from (4) and (5)

    ω = 2W/nh. . . . (6)
If we compare (6) with the relation (1), we see that the connexion
with ordinary mechanics in the region of slow vibration, mentioned in
the former section, is satisfied.
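As a numerical check (an editorial addition, using modern constants rather than the values available in 1915), the formulæ (3)–(6) with E = e reproduce the familiar hydrogen values: Hα near 6565 Å and an ionization energy near 13.6 volts, while relation (6) holds identically.

```python
# Numerical check of formulas (3)-(6) for hydrogen (E = e), in SI units.
# In SI the paper's Gaussian e^2 is replaced by e^2/(4*pi*eps0).
import math

h = 6.62607015e-34       # Planck constant, J s
m = 9.1093837015e-31     # electron mass, kg
M = 1.67262192369e-27    # proton mass, kg
e = 1.602176634e-19      # elementary charge, C
eps0 = 8.8541878128e-12  # vacuum permittivity
c = 2.99792458e8         # speed of light, m/s

e2 = e**2 / (4 * math.pi * eps0)  # Gaussian e^2 equivalent
red = m * M / (m + M)             # the reduced-mass factor mM/(m+M)

def nu(n2, n1):
    """Formula (3) with E = e: frequency emitted in the transition n1 -> n2."""
    return 2 * math.pi**2 * e2**2 * red / h**3 * (1/n2**2 - 1/n1**2)

def W(n):
    """Formula (5) with E = e: energy to remove the electron from state n."""
    return 2 * math.pi**2 * e2**2 * red / (n**2 * h**2)

def omega(n):
    """Formula (4): frequency of rotation in state n."""
    return 4 * math.pi**2 * e2**2 * red / (n**3 * h**3)

lam_Halpha = c / nu(2, 3)
print(lam_Halpha * 1e9)   # ~656.5 nm
print(W(1) / e)           # ~13.6 electron-volts

# Relation (6): omega = 2W/nh, exactly, for every n
for n in (1, 2, 3):
    assert abs(omega(n) - 2 * W(n) / (n * h)) / omega(n) < 1e-12
```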
Putting E = e in (3) we get the ordinary series spectrum of
hydrogen. Putting E = 2e we get a spectrum which, on the theory,
should be expected to be emitted by an electron rotating round a
helium nucleus. The formula is found very closely to represent some
series of lines observed by Fowler[9] and Evans[10]. These series
correspond to n₁ = 3 and n₁ = 4[11]. The theoretical value for the
ratio between the second factor in (3) for this spectrum and for the
hydrogen spectrum is 1.000409; the value calculated from Fowler’s
measurements is 1.000408[12]. Some of the lines under consideration
have been observed earlier in star spectra, and have been ascribed to
hydrogen not only on account of the close numerical relation with the
lines of the Balmer series, but also on account of the fact that the
lines observed, together with the lines of the Balmer series,
constitute a spectrum which shows a marked analogy with the
spectra of the alkali metals. This analogy, however, has been
completely disturbed by Fowler’s and Evans’ observations, that the
two new series contain twice as many lines as is to be expected on
this analogy. In addition, Evans has succeeded in obtaining the lines
in such pure helium that no trace of the ordinary hydrogen lines
could be observed[13]. The great difference between the conditions
for the production of the Balmer series and the series under
consideration is also brought out very strikingly by some recent
experiments of Rau[14] on the minimum voltage necessary for the
production of spectral lines. While about 13 volts was sufficient to
excite the lines of the Balmer series, about 80 volts was found
necessary to excite the other series. These values agree closely with
the values calculated from the assumption E for the energies
necessary to remove the electron from the hydrogen atom and to
remove both electrons from the helium atom, viz. 13.6 and 81.3 volts
respectively. It has recently been argued[15] that the lines are not so
sharp as should be expected from the atomic weight of helium on
Lord Rayleigh’s theory of the width of spectral lines. This might,
however, be explained by the fact that the systems emitting the
spectrum, in contrast to those emitting the hydrogen spectrum, are
supposed to carry an excess positive charge, and therefore must be
expected to acquire great velocities in the electric field in the
discharge-tube.
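The two comparisons in this paragraph can be rechecked arithmetically (an editorial addition; modern mass ratios and Rydberg energy are used, so the third decimals differ slightly from the 1915 values). The ratio 1.000409 is the ratio of the factor M/(M+m) in (3) for the helium and hydrogen nuclei, and the 81.3 and 29.3 volt figures follow from the ring model of assumption E, in which each of the two helium electrons is bound by an effective charge (2 − 1/4)e.

```python
# Reduced-mass ratio between the He-nucleus and H-nucleus spectra, formula (3)
m_over_MH = 1 / 1836.15267    # electron/proton mass ratio
m_over_MHe = 1 / 7294.29954   # electron/alpha-particle mass ratio
ratio = (1 + m_over_MH) / (1 + m_over_MHe)
print(ratio)   # ~1.000407, against the paper's 1.000409 and Fowler's 1.000408

# Ring model of assumption E for helium: the 1/4 is the mutual-repulsion
# correction for a two-electron ring.
Ry = 13.606                        # modern Rydberg energy, electron-volts
both = 2 * (2 - 0.25)**2 * Ry      # remove both electrons: ~83 eV (paper: 81.3)
one = both - 4 * Ry                # remove one, leaving He+ (energy 4*Ry): ~29 eV
print(both, one)
```

The residual differences from the quoted 81.3 and 29.3 volts trace to the constants in use in 1915.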
In paper IV. an attempt was made on the basis of the present
theory to explain the characteristic effect of an electric field on the
hydrogen spectrum recently discovered by Stark. This author
observed that if luminous hydrogen is placed in an intense electric
field, each of the lines of the Balmer series is split up into a number
of homogeneous components. These components are situated
symmetrically with regard to the original lines, and their distance
apart is proportional to the intensity of the external electric field. By
spectroscopic observation in a direction perpendicular to the field, the
components are linearly polarized, some parallel and some
perpendicular to the field. Further experiments have shown that the
phenomenon is even more complex than was at first expected. By
applying greater dispersion, the number of components observed has
been greatly increased, and the numbers as well as the intensities of
the components are found to vary in a complex manner from line to
line[16]. Although the present development of the theory does not
allow us to account in detail for the observations, it seems that the
considerations in paper IV. offer a simple interpretation of several
characteristic features of the phenomenon.
The calculation can be made considerably simpler than in the
former paper by an application of Hamilton’s principle. Consider a
particle moving in a closed orbit in a stationary field. Let ω be the
frequency of revolution, T̄ the mean value of the kinetic energy
during the revolution, and Ē the mean value of the sum of the
kinetic energy and the potential energy of the particle relative to the
stationary field. We have then for a small arbitrary variation of the
orbit

    δĒ = ω δ(2T̄/ω). . . . (7)
This equation was used in paper IV. to prove the equivalence of the
formulæ (2) and (6) for any system governed by ordinary mechanics.
The equation (7) further shows that if the relations (2) and (6) hold
for a system of orbits, they will hold also for any small variation of
these orbits for which the value of T̄ is unaltered. If a hydrogen
atom in one of its stationary states is placed in an external electric
field and the electron rotates in a closed orbit, we shall therefore
expect that T̄ is not altered by the introduction of the atom in the
field, and that the only variation of the total energy of the system will
be due to the variation of the mean value of the potential energy
relative to the external field.
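For the Kepler case the relation (7) can be verified directly (an editorial sketch): with the action I = 2T̄/ω, (7) asserts dĒ/dI = ω, and for a Bohr orbit with a heavy nucleus the energy and rotation frequency of (4)–(6) are Ē(I) = −2π²me⁴/I² and ω(I) = 4π²me⁴/I³. In natural units with 2π²me⁴ = 1:

```python
# Finite-difference check of equation (7), dE/dI = omega, for Kepler orbits
def E(I):
    return -1.0 / I**2       # total energy as a function of the action

def omega(I):
    return 2.0 / I**3        # rotation frequency, formula (4) in these units

I0, dI = 3.0, 1e-6
dE_dI = (E(I0 + dI) - E(I0 - dI)) / (2 * dI)   # central difference
assert abs(dE_dI - omega(I0)) < 1e-9
```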
In the former paper it was pointed out that the orbit of the
electron will be deformed by the external field. This deformation will
in course of time be considerable even if the external electric force is
very small compared with the force of attraction between the
particles. The orbit of the electron may at any moment be considered
as an ellipse with the nucleus in the focus, and the length of the
major axis will approximately remain constant, but the effect of the
field will consist in a gradual variation of the direction of the major
axis as well as the excentricity of the orbit. A detailed investigation of
the very complicated motion of the electron was not attempted, but it
was simply pointed out that the problem allows of two stationary
orbits of the electron, and that these may be taken as representing
two possible stationary states. In these orbits the excentricity is equal
to 1, and the major axis parallel to the external force; the orbits
simply consisting of a straight line through the nucleus parallel to the
axis of the field, one on each side of it. It can very simply be shown
that the mean value of the potential energy relative to the field for
these rectilinear orbits is equal to ± (3/4)eE·2a, where E is the external
electric force and 2a the major axis of the orbit, and the two signs
correspond to orbits in which the direction of the major axis from the
nucleus is the same or opposite to that of the electric force
respectively. Using the formulæ (4) and (5) and neglecting the mass
of the electron compared with that of the nucleus, we get, therefore,
for the energy of the system in the two states

    − (2π²me⁴/n²h²) ± (3n²h²E/8π²em) . . . . (8)

respectively. This expression is the same as that deduced in paper IV.
by an application of (6) to the expressions for the energy and
frequency of the system. Applying the relation (1) and using the
same arguments as in paper IV. p. 10, we are therefore led to expect
that the hydrogen spectrum in an electric field will contain two
components polarized parallel to the field and of a frequency given by

    ν = (2π²me⁴/h³)(1/n₂² − 1/n₁²) ± (3hE/8π²em)(n₁² − n₂²). . . . (9)
The table below contains Stark’s recent measurements of the
frequency difference Δν between the two strong outer components
polarized parallel to the field for the five first lines in the Balmer
series[17]. The first column gives the values for the numbers n₂ and
n₁. The second and fourth columns give the frequency difference
corresponding to a field of 28500 and 74000 volts per cm.
respectively. The third and fifth columns give the values of

    k = (4π²em/3hE) · Δν/(n₁² − n₂²),

where k should be a constant for all the lines and equal to unity.
             28500 volts per cm.    74000 volts per cm.
    n₂  n₁      Δν       k             Δν       k
    2   3      0.46     0.83           —        —
    2   4      1.04     0.79          2.86     0.83
    2   5      2.06     0.89          5.41     0.90
    2   6      3.16     0.90          7.81     0.85
    2   7      4.47     0.90           —        —

(the frequency differences Δν are given in units of 10¹² sec.⁻¹)

Considering the difficulties of accurate measurement of the
quantities involved, it will be seen that the agreement with regard to
the variation of the frequency differences from line to line is very
good. The fact that all the observed values are a little smaller than
the calculated may be due to a slight over-estimate of the intensity of
the fields used in the experiments (see Stark, loc. cit. pp. 38 and
118). Besides the two strong outer components polarized parallel to
the field, Stark’s experiments have revealed a large number of inner
weaker components polarized in the same way, and also a number of
components polarized perpendicular to the field. This complexity of
the phenomenon, however, cannot be considered as inconsistent with
the theory. The above simple calculations deal only with the two
extreme cases, and we may expect to find a number of stationary
states corresponding to orbits of smaller excentricity. In a discussion
of such non-periodic orbits, however, the general principles applied
are no longer sufficient guidance.
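The k-values of the table can be recomputed (an editorial addition). In modern SI form the predicted separation of the two outer components is Δν = 3eFa₀(n₁² − n₂²)/h, with a₀ the Bohr radius and F the field strength; reading the observed Δν in units of 10¹² sec⁻¹ (an assumption, but the only reading consistent with the quoted k-values) reproduces the table to within the shift of the constants since 1915.

```python
# Recompute k = observed / predicted separation for the Stark-effect table
h = 6.62607015e-34      # Planck constant, J s
e = 1.602176634e-19     # elementary charge, C
a0 = 5.29177210903e-11  # Bohr radius, m

def k(dnu_1e12, F_volts_per_cm, n1, n2=2):
    """k for an observed separation dnu (units of 1e12 / s) at field F."""
    F = F_volts_per_cm * 100.0                      # V/cm -> V/m
    pred = 3 * e * F * a0 / h * (n1**2 - n2**2)     # predicted separation, 1/s
    return dnu_1e12 * 1e12 / pred

print(round(k(1.04, 28500, 4), 2))   # ~0.79, as in the table
print(round(k(2.06, 28500, 5), 2))   # ~0.90
print(round(k(2.86, 74000, 4), 2))   # ~0.84 (table: 0.83)
```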
Apart from the agreement with the calculations, Stark’s
experiments seem to give strong support to the interpretation of the
origin of the two outer components. It was found that the two outer
components have not always equal intensities; when the spectrum is
produced by positive rays, it was found that the component of
highest frequency is the stronger if the rays travel against the electric
field, while if they travel in the direction of the field the component of
smallest frequency is the stronger (loc. cit. p. 40). This indicates that
the components are produced independently of each other—a result
to be expected if they correspond to quite different orbits of the
electron. That the orbit of the electron in general need not be circular
is also very strongly indicated by the observation that the hydrogen
lines emitted from positive rays under certain conditions are partly
polarized without the presence of a strong external field (loc. cit. p.
12). This polarization, as well as the observed intensity differences of
the two components, would be explained if we can assume that for
some reason, when the atom is in rapid motion, there is a greater
probability for the orbit of the electron to lie behind the nucleus
rather than in front of it.
§ 3. Spectra emitted from systems
containing more than one electron.
According to Rydberg and Ritz, the frequency of the lines in the
ordinary spectrum of an element is given by

    ν = Fs(n₂) − Fr(n₁), . . . . (11)

where n₁ and n₂ are whole numbers and F₁, F₂, F₃, ...... are a series
of functions of n which can be expressed by

    F(n) = (K/n²) φ(n), . . . . (12)

where K is a universal constant and φ(n) a function which for large
values of n approaches unity. The complete spectrum is obtained by
combining the numbers n₁ and n₂, as well as the functions Fr, Fs,
...... in every possible way.
On the present theory, this indicates that the system which emits
the spectrum possesses a number of series of stationary states for
which the energy in the nth state in the rth series is given by (see
IV. p. 6)

    A_r(n) = C − (2π²e²E²m/n²h²) · (M/(M+m)) · φr(n), . . . . (13)

where C is an arbitrary constant, the same for the whole system of
stationary states. The first factor in the second term is equal to the
expression (5) if E = e.
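The combination structure of (11)–(13) can be illustrated in the simplest, hydrogen-like case φ(n) = 1 (an editorial sketch using the modern Rydberg frequency constant): every line is a difference of two terms K/n², so term differences combine additively in Ritz's manner.

```python
# Combination principle behind (11)-(13), hydrogen-like case phi(n) = 1
K = 3.2898e15   # Rydberg frequency constant, 1/s (modern value)

def F(n):
    return K / n**2            # spectral term, formula (12) with phi = 1

def nu(n2, n1):
    return F(n2) - F(n1)       # line frequency, formula (11)

# Ritz combination: the (1,3) line is the sum of the (1,2) and (2,3) lines
assert abs(nu(1, 3) - (nu(1, 2) + nu(2, 3))) < 1e3
```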
In the present state of the theory it is not possible to account in
detail for the formula (13), but it was pointed out in my previous
papers that a simple interpretation can be given of the fact that in
every series φ(n) approaches unity for large values of n. It was
assumed that in the stationary states corresponding to such values
of n, one of the electrons in the atom moves at a distance from the
nucleus large compared with the distance of the other electrons. If
the atom is neutral, the outer electron will be subject to very nearly
the same forces as the electron in the hydrogen atom, and the
formula (13) indicates the presence of a number of series of
stationary states of the atom in which the configuration of the inner
electrons is very nearly the same for all states in one series, while
the configuration of the outer electron changes from state to state in
the series approximately in the same way as the electron in the
hydrogen atom. From the considerations in the former sections it will
therefore appear that the frequency calculated from the relations (1)
and (13) for the radiation emitted during the transition between
successive stationary states within each series will approach that to
be expected on ordinary electrodynamics in the region of slow
vibrations[18].
From (13) it follows that for high values of n the configuration of
the inner electrons possesses the same energy in all the series of
stationary states corresponding to the same spectrum (11). The
different series of stationary states must therefore correspond to
different types of orbits of the outer electron, involving different
relations between energy and frequency. In order to fix our ideas, let
us for a moment consider the helium atom. This atom contains only
two electrons, and in the previous papers it was assumed that in the
normal state of the atom the electrons rotate in a circular ring round
a nucleus. Now the helium spectrum contains two complete systems
of series given by formulæ of the type (11) and the measurements
of Rau mentioned below indicate that the configuration of the inner
electron in the two corresponding systems of stationary states
possesses the same energy. A simple assumption is therefore that in
one of the two systems the orbit of the electron is circular and in the
other very flat. For high values of n the inner electron in the two
configurations will act on the outer electron very nearly as a ring of
uniformly distributed charge with the nucleus in the centre or as a
line charge extending from the nucleus, respectively. In both cases
several different types of orbit for the outer electron present
themselves, for instance, circular orbits perpendicular to the axis of
the system or very flat orbits parallel to this axis. The different
configurations of the inner electrons might be due to different ways
of removing the electron from the neutral atom: thus, if it is
removed by impact perpendicular to the plane of the ring, we might
expect the orbit of the remaining electron to be circular, if it is
removed by an impact in the plane of the ring we might expect the
orbit to be flat. Such considerations may offer a simple explanation
of the fact that in contrast with the helium spectrum the lithium
spectrum contains only one system of series of the type (11). The
neutral lithium atom contains three electrons, and according to the
configuration proposed in paper II. the two electrons move in an
inner ring and the other electron in an outer orbit; for such a
configuration we should expect that the mode of removal of the
outer electron would be of no influence on the configuration of the
inner electrons. It is unnecessary to point out the hypothetical
nature of these considerations, but the intention is only to show that
it does not seem impossible to obtain simple interpretations of the
spectra observed on the general principles of the theory. However, in
a quantitative comparison with the measurements we meet with the
difficulties mentioned in the first section of applying assumptions
analogous with C and D to systems for which ordinary mechanics
do not lead to periodic orbits.
The above interpretation of the formulæ (11) and (12) has
recently obtained very strong support by Fowler’s work on series of
enhanced lines in spark spectra[19]. Fowler showed that the
frequency of the lines in these spectra, as of the lines in the ordinary
spectra, can be represented by the formula (11). The only difference
is that the Rydberg constant K in (12) is replaced by a constant
4K. It will be seen that this is just what we should expect on the
present theory if the spectra are emitted by atoms which have lost
two electrons and are regaining one of them. In this case, the outer
electron will rotate round a system of double charge, and we must
assume that in the stationary states it will have configurations
approximately the same as an electron rotating round a helium
nucleus. This view seems in conformity with the general evidence as
to the conditions of the excitation of the ordinary spectra and the
spectra of enhanced lines. From Fowler’s results, it will appear that
the helium spectrum given by (3) for E = 2e has exactly the same
relation to the spectra of enhanced lines of other elements as the
hydrogen spectrum has to the ordinary spectra. It may be expected
that it will be possible to observe spectra of a new class
corresponding to a loss of 3 electrons from the atom, and in which
the Rydberg constant is replaced by 9K. No definite evidence,
however, has so far been obtained of the existence of such
spectra[20].
Additional evidence of the essential validity of the interpretation
of formula (13) seems also to be derived from the result of Stark’s
experiments on the effect of electric fields on spectral lines. For
other spectra, this effect is even more complex than for the
hydrogen spectrum, in some cases not only are a great number of
components observed, but the components are generally not
symmetrical with regard to the original line, and their distance apart
varies from line to line in the same series in a far more irregular way
than for the hydrogen lines[21]. Without attempting to account in
detail for any of the electrical effects observed, we shall see that a
simple interpretation can be given of the general way in which the
magnitude of the effect varies from series to series.
In the theory of the electrical effect on the hydrogen spectrum
given in the former section, it was supposed that this effect was due
to an alteration of the energy of the systems in the external field,
and that this alteration was intimately connected with a considerable
deformation of the orbit of the electron. The possibility of this
deformation is due to the fact that without the external field every
elliptical orbit of the electron in the hydrogen atom is stationary. This
condition will only be strictly satisfied if the forces which act upon
the electron vary exactly as the inverse square of the distance from
the nucleus, but this will not be the case for the outer electron in an
atom containing more than one electron. It was pointed out in paper
IV. that the deviation of the function φ from unity gives us an
estimate for the deviation of the forces from the inverse square, and
that on the theory we can only expect a Stark effect of the same
order of magnitude as for the hydrogen lines for those series in
which φ differs very little from unity.
This conclusion was consistent with Stark’s original
measurements of the electric effect on the different series in the
helium spectrum, and it has since been found to be in complete
agreement with the later measurements for a great number of other
spectral series. An electric effect of the same order of magnitude as
that for hydrogen lines has been observed only for the lines in the
two diffuse series of the helium spectrum and the diffuse series of
lithium. This corresponds to the observation that for these three
series is very much nearer to unity than for any other series; even
for the deviation of from 1 is less than one part in a
thousand. The distance between the outer components for all three
series is smaller than that observed for the hydrogen line
corresponding to the same value of , but the ratio between this
distance and that of the hydrogen lines approaches rapidly to unity
as increases. This is just what would be expected on the above
considerations. The series for which the effect, although much
smaller, comes next in magnitude to the three series mentioned, is
the principal single line series in the helium spectrum. This
corresponds to the fact that the deviation of φ from unity, although
several times greater than for the three first series, is much smaller
for this series than for any other of the series examined by Stark.
For all the other series the effect was very small, and in most cases
even difficult of detection.
Quite apart from the question of the detailed theoretical
interpretation of the formula (13), it seems that it may be possible to
test the validity of this formula by direct measurements of the
minimum voltages necessary to produce spectral lines. Such
measurements have recently been made by Rau[22] for the lines in
the ordinary helium spectrum. This author found that the different
lines within each series appeared for slightly different voltages,
higher voltages being necessary to produce the lines corresponding
to higher values of n, and he pointed out that the differences
between the voltages observed were of the magnitude to be
expected from the differences in the energies of the different
stationary states calculated by (13). In addition Rau found that the
lines corresponding to high values of n appeared for very nearly the
same voltages for all the different series in both helium spectra. The
absolute values for the voltages could not be determined very
accurately with the experimental arrangement, but apparently nearly
30 volts was necessary to produce the lines corresponding to high
values of n. This agrees very closely with the value calculated on the
present theory for the energy necessary to remove one electron
from the helium atom, viz., 29.3 volts. On the other hand, the latter
value is considerably larger than the ionization potential in helium
(20.5 volts) measured directly by Franck and Hertz[23]. This
apparent disagreement, however, may possibly be explained by the
assumption, that the ionization potential measured does not
correspond to the removal of the electron from the atom but only to
a transition from the normal state of the atom to some other
stationary state where the one electron rotates outside the other,
and that the ionization observed is produced by the radiation
emitted when the electron falls back to its original position. This
radiation would be of a sufficiently high frequency to ionize any
impurity which may be present in the helium gas, and also to
liberate electrons from the metal part of the apparatus. The
frequency of the radiation would be ν = (29.3 − 20.5)e/h ≈ 2.1 × 10¹⁵
sec.⁻¹, which is of the same order of magnitude as the characteristic
frequency calculated from experiments on dispersion in helium[24].
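The frequency of this hypothetical radiation is simple arithmetic (an editorial check with modern constants): the energy difference between the calculated removal energy and the measured ionization potential, divided by Planck's constant.

```python
# Frequency of radiation carrying the 29.3 - 20.5 volt energy difference
h = 6.62607015e-34   # Planck constant, J s
e = 1.602176634e-19  # elementary charge, C (converts volts to joules)

nu = (29.3 - 20.5) * e / h
print(nu)   # ~2.1e15 per second
```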
Similar considerations may possibly apply also to the recent
remarkable experiments of Franck and Hertz on ionization in
mercury vapour[25]. These experiments show strikingly that an
electron does not lose energy by collision with a mercury atom if its
energy is smaller than a certain value corresponding to 4.9 volts, but
as soon as the energy is equal to this value the electron has a great
probability of losing all its energy by impact with the atom. It was
further shown that the atom, as the result of such an impact, emits
a radiation consisting only of the ultraviolet mercury line of wave-
length 2536, and it was pointed out that if the frequency of this line
is multiplied by Planck’s constant, we obtain a value which, within
the limit of experimental error, is equal to the energy acquired by an
electron by a fall through a potential difference of 4.9 volts. Franck
and Hertz assume that 4.9 volts corresponds to the energy
necessary to remove an electron from the mercury atom, but it
seems that their experiments may possibly be consistent with the
assumption that this voltage corresponds only to the transition from
the normal state to some other stationary state of the neutral atom.
On the present theory we should expect that the value for the
energy necessary to remove an electron from the mercury atom
could be calculated from the limit of the single line series of
Paschen, 1850, 1403, 1269[26]. For since mercury vapour absorbs
light of wave-length 1850[27], the lines of this series as well as the
line 2536 must correspond to a transition from the normal state of
the atom to other stationary states of the neutral atom (see I. p.
16). Such a calculation gives 10.5 volts for the ionization potential
instead of 4.9 volts[28]. If the above considerations are correct it will
be seen that Franck and Hertz’s measurements give very strong
support to the theory considered in this paper. If, on the other hand,
the ionization potential of mercury should prove to be as low as
assumed by Franck and Hertz, it would constitute a serious difficulty
for the above interpretation of the Rydberg constant, at any rate for
the mercury spectrum, since this spectrum contains lines of greater
frequency than the line 2536.
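The quantum relations invoked in this paragraph can be rechecked from E = hc/λ (an editorial addition with modern constants): the 2536 Å line carries almost exactly 4.9 volts, and the photon energies of the series 1850, 1403, 1269 climb toward a limit compatible with the quoted 10.5 volt ionization potential.

```python
# Photon energies (in volts) of the mercury lines discussed above
h = 6.62607015e-34   # Planck constant, J s
c = 2.99792458e8     # speed of light, m/s
e = 1.602176634e-19  # elementary charge, C

def volts(angstrom):
    """Energy in volts of a photon of the given wavelength in Angstrom."""
    return h * c / (angstrom * 1e-10) / e

print(round(volts(2536), 2))   # ~4.89 V: Franck and Hertz's 4.9 volts
print([round(volts(l), 2) for l in (1850, 1403, 1269)])  # ~6.70, 8.84, 9.77 V
# The series converges toward a limit above its last member, consistent
# with the ~10.5 volt ionization potential computed in the text.
```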
It will be remarked that it is assumed that all the spectra
considered in this section are essentially connected with the
displacement of a single electron. This assumption—which is in
contrast to the assumptions used by Nicholson in his criticism of the
present theory—does not only seem supported by the
measurements of the energy necessary to produce the spectra, but
it is also strongly advocated by general reasons if we base our
considerations on the assumption of stationary states. Thus it may
happen that the atom loses several electrons by a violent impact,
but the probability that the electrons will be removed to exactly the
same distance from the nucleus or will fall back into the atom again
at exactly the same time would appear to be very small. For
molecules, i. e. systems containing more than one nucleus, we have
further to take into consideration that if the greater part of the
electrons are removed there is nothing to keep the nuclei together,
and that we must assume that the molecules in such cases will split
up into single atoms (comp. III. p. 2).
§ 4. The high frequency spectra of the
elements.
In paper II. it was shown that the assumption E led to an estimate of the energy
necessary to remove an electron from the innermost ring of an atom which was in
approximate agreement with Whiddington’s measurements of the minimum kinetic
energy of cathode rays required to produce the characteristic Röntgen radiation of
the K type. The value calculated for this energy was equal to the expression (5) if
we put n = 1 and E = Ne, where N is the atomic number. In the calculation the
repulsion from the other electrons in the ring was
neglected. This must result in making the value a little too large, but on account of
the complexity of the problem no attempt at that time was made to obtain a more
exact determination of the energy.
These considerations have obtained strong support through Moseley’s important
researches on the high frequency spectra of the elements[29]. Moseley found that
the frequency of the strongest lines in these spectra varied in a remarkably simple
way with the atomic number N of the corresponding element. For the strongest line in
the K radiation he found that the frequency ν for a great number of elements was
represented with considerable accuracy by the empirical formula

    ν = (3/4) ν₀ (N − 1)²,    (14)

where ν₀ is the Rydberg constant in the hydrogen spectrum. It will be seen that this
result is in approximate agreement with the calculation mentioned above if we
assume that the radiation is emitted as a quantum hν.
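As a purely illustrative check (not part of the original text), Moseley's empirical rule can be evaluated numerically. The function name and the constants below are assumptions of this sketch: a modern value of the Rydberg frequency, ν₀ ≈ 3.29 × 10¹⁵ s⁻¹, and the speed of light in cgs units.

```python
# Sketch: Moseley's empirical rule nu = (3/4) * nu0 * (N - 1)**2 for the
# strongest K line, evaluated for copper (atomic number N = 29).
# nu0 (Rydberg frequency) and c are modern values, not taken from the paper.

NU0 = 3.29e15   # Rydberg frequency, 1/s
C = 2.998e10    # speed of light, cm/s

def k_alpha_frequency(N):
    """Frequency of the strongest K line by the empirical formula (14)."""
    return 0.75 * NU0 * (N - 1) ** 2

nu = k_alpha_frequency(29)          # copper, N = 29
wavelength_angstrom = C / nu * 1e8  # convert cm to Angstrom

print(nu)                   # ~1.93e18 1/s
print(wavelength_angstrom)  # ~1.55 Angstrom, near the observed Cu K line
```

The resulting wavelength of roughly 1.55 Å for copper lies close to the measured value, which illustrates the "considerable accuracy" of the empirical formula.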
Moseley pointed out the analogy between the formula (14) and the formula (3) in
section 2, and remarked that the constant 3/4 was equal to the last factor in this
formula, if we put n₁ = 1 and n₂ = 2. He therefore proposed the explanation of
the formula (14), that the line was emitted during a transition of the innermost ring
between two states in which the angular momentum of each electron was equal to
2h/2π and h/2π respectively. From the replacement of N by N − 1 he deduced
that the number of electrons in the ring was equal to 4. This view, however, can
hardly be maintained. The approximate agreement mentioned above with
Whiddington’s measurements for the energy necessary to produce the characteristic
K radiation indicates very strongly that the K spectrum is due to a displacement of a
single electron, and not to a whole ring. In the latter case the energy should be
several times larger. It is also pointed out by Nicholson[30] that Moseley’s explanation
would imply the emission of several quanta at the same time; but this assumption is
apparently not necessitated for the explanation of other phenomena. At present it
seems impossible to obtain a detailed interpretation of Moseley’s results, but much
light seems to be thrown on the whole problem by some recent interesting
considerations by W. Kossel[31].
Kossel takes the view of the nucleus atom and assumes that the electrons are
arranged in rings, the one outside the other. As in the present theory, it is assumed
that any radiation emitted from the atom is due to a transition of the system
between two steady states, and that the frequency of the radiation is determined by
the relation (1). He considers now the radiation which results from the removal of an
electron from one of the rings, assuming that the radiation is emitted when the atom
settles down in its original state. The latter process may take place in different ways.
The vacant place in the ring may be taken by an electron coming directly from
outside the whole system, but it may also be taken by an electron jumping from one
of the outer rings. In the latter case a vacant place will be left in that ring to be
replaced in turn by another electron, etc. For the sake of brevity, we shall refer to
the innermost ring as ring 1, the next one as ring 2, and so on. Kossel now assumes
that the K radiation results from the removal of an electron from ring 1, and makes
the interesting suggestion that the line denoted by Moseley as Kα corresponds to
the radiation emitted when an electron jumps from ring 2 to ring 1, and that the line
Kβ corresponds to a jump from ring 3 to ring 1. On this view, we should expect
that the K radiation consists of as many lines as there are rings in the atom, the
lines forming a series of rapidly increasing intensities. For the L radiation, Kossel
makes assumptions analogous to those for the K radiation, with the distinction that
the L radiation is ascribed to the removal of an electron from ring 2 instead of ring 1.
A possible M radiation is ascribed to ring 3, and so on. The interest of these
considerations is that they lead to the prediction of some simple relations between
the frequencies of the different lines. Thus it follows as an immediate consequence
of the assumption used that we must have

    ν(Kβ) − ν(Kα) = ν(Lα),    ν(Kγ) − ν(Kα) = ν(Lβ), . . .
It will be seen that these relations correspond exactly to the ordinary principle of
combination of spectral lines. By using Moseley’s measurements for Kα and Kβ and
extrapolating for the values of Lα by the help of Moseley’s empirical formula, Kossel
showed that the first relation was closely satisfied for the elements from calcium to
zinc. Recently I. Malmer[32] has measured the wave-lengths of Kα and Kβ for a
number of elements of higher atomic weight, and it is therefore possible to test the
relation over a wider range and without extrapolation. The table gives Malmer’s
values for ν(Kβ) − ν(Kα) and Moseley’s values for ν(Lα).
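Kossel's scheme can be sketched in a few lines: if each ring n is assigned a "term" value T(n), and every emitted line is the difference of two terms, the combination relations follow identically. The term values below are arbitrary illustrative numbers chosen for this sketch, not data from the paper.

```python
# Sketch of Kossel's bookkeeping: each ring gets a "term" value; every line
# is a difference of two terms, so combination relations such as
# nu(K_beta) - nu(K_alpha) = nu(L_alpha) hold automatically.
# The term values are arbitrary illustrative numbers.

T = {1: 100.0, 2: 25.0, 3: 11.0, 4: 6.0}  # hypothetical term per ring

def line(outer, inner):
    """Frequency of the line emitted when an electron jumps from the
    outer ring into a vacancy in the inner ring."""
    return T[inner] - T[outer]

K_alpha, K_beta, K_gamma = line(2, 1), line(3, 1), line(4, 1)
L_alpha, L_beta = line(3, 2), line(4, 2)

# The combination relations predicted by the ring picture:
print(K_beta - K_alpha == L_alpha)   # True
print(K_gamma - K_alpha == L_beta)   # True
```

Whatever numbers are chosen for the terms, the relations hold exactly, which is why testing them against Moseley's and Malmer's measured frequencies is a sharp check on the ring picture.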