Quantitative Methods of Data Analysis for the Physical Sciences
and Engineering
This book provides thorough and comprehensive coverage of most of the new and
important quantitative methods of data analysis for college and graduate students and
practitioners. In recent years, data analysis methods have proliferated alongside advances in
computing power, and an understanding of such methods is critical to getting the most
out of data and to extracting signal from noise. The book excels in explaining difficult
concepts through simple explanations and detailed explanatory illustrations. Most
distinctive is the focus on confidence limits for power spectra and their proper interpretation,
something rare or completely missing in other books. Likewise, there is a thorough
discussion of how to assess uncertainty via use of Expectancy, and of the easy-to-apply and
easy-to-understand Bootstrap method. The book is written so that descriptions of each method
are as self-contained as possible. Many examples are presented to clarify interpretations,
as are user tips in highlighted boxes.
www.cambridge.org
Information on this title: www.cambridge.org/9781107029767
DOI: 10.1017/9781139342568
© Douglas G. Martinson 2018
This publication is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without the written
permission of Cambridge University Press.
First published 2018
Printed in the United Kingdom by TJ International Ltd, Padstow, Cornwall
A catalogue record for this publication is available from the British Library.
Library of Congress Cataloging-in-Publication Data
Names: Martinson, Douglas G.
Title: Quantitative methods of data analysis for the physical sciences and engineering /
Douglas G. Martinson (Columbia University, New York)
Description: Cambridge, United Kingdom ; New York, NY : Cambridge University Press, 2018.
Identifiers: LCCN 2017055413| ISBN 9781107029767 (hbk.) | ISBN 1107029767 (hbk.)
Subjects: LCSH: Statistics. | Physical sciences – Statistical methods. | Engineering – Statistical methods.
Classification: LCC QA276 .M3375 2018 | DDC 519.5–dc23
LC record available at https://lccn.loc.gov/2017055413
ISBN 978-1-107-02976-7 Hardback
Additional resources for this publication at www.cambridge.org/martinson
Cambridge University Press has no responsibility for the persistence or accuracy of
URLs for external or third-party internet websites referred to in this publication
and does not guarantee that any content on such websites is, or will remain,
accurate or appropriate.
To the love of my life, my wife Rhonda
Contents
Part I Fundamentals
2 Probability Theory
2.1 Overview
2.2 Definitions
2.3 Probability
2.4 Univariate Distributions
2.5 Multivariate Distributions
2.6 Moments of Random Variables
2.7 Common Distributions and Their Moments
2.8 Take-Home Points
2.9 Questions
3 Statistics
3.1 Overview
3.2 Estimation
3.3 Estimating the Distribution
3.4 Point Estimates
3.5 Principle of Maximum Likelihood (An Important Principle)
4 Interpolation
4.1 Overview
4.2 Piecewise Continuous Interpolants
4.3 Continuous Interpolants
4.4 Take-Home Points
4.5 Questions

Preface
This book is the outcome of a one-semester graduate class taught in the Department of
Earth and Environmental Sciences at Columbia University, although the book could be
used over two or even three semesters, if desired. I have taught this class since 1985,
having taken over from a departing marine seismologist who had taught the course as
one on Fourier analysis, the only such analysis that computers of the day were capable of
performing, thanks to the development of the Fast Fourier Transform. However, at
that time computers were rapidly becoming powerful enough to allow application of
methods requiring more power and memory. New methods were sprouting yearly, and
as the computers grew faster, previously sluggish methodologies were becoming
realizable. At the time I started teaching the course, there were no textbooks (none!)
that gave a thorough introduction to the primary methods. Numerical Recipes –
published in the early 1980s – did present a brief overview and the computer code
necessary to run nearly every method, and it was a godsend. It occurred to me that my
class notes should be converted to a book to fill this void. Over the last 30 years many
other books have been published, but in my opinion there is still a need for an
introductory-level book that spans a broad number of the most useful techniques.
Despite its introductory nature, I have tried to give the reader a complete enough
understanding to allow him or her to properly apply the methods while avoiding
common pitfalls and misunderstandings.
I try to present the methods following a few fundamental themes: e.g., Principle of
Maximum Likelihood for deriving optimal methods, and Expectancy for estimating
uncertainty. I hope this makes these important themes better understood and the material
easier to grasp.
This book is designed to fill many needs, according to the level of the student. Some, like
myself, see the methods clearly if they understand their complete derivation, while
others don’t require that detailed understanding. In an effort to satisfy both readerships,
I have placed complete derivations in boxes highlighted with 25 percent grayscale: these
boxes are optional, and the reader is free, if preferred, to skip the box and go straight to
the answer (all equations in derivation boxes are prefaced by “D” – for example,
“D5.1”). There are student exercises at the end of each chapter, some of which require
computing. I do not present code because it changes so quickly, but I do show some
MATLAB code in the solution manual. Data for exercises requiring such can be found at
www.cambridge.org/martinson. Most of the examples in the book are taken from the
natural sciences, although they are presented so as to be understandable to anyone.
Special user tips are included in boxes highlighted with 15 percent grayscale. I have
attempted to make each chapter stand on its own (as far as possible), so the reader
doesn’t need to have read the entire book in order to understand material from previous
chapters. This should make the book good for easy use and reference for a particular
method.
Read it, practice, and when not sure what road to take, take all possible roads and then
determine which is the most appropriate for your particular analysis. Then, perhaps,
present several results, explaining the differences and why you favor the method you
chose.
Acknowledgments
As with all books evolving from a course, one must acknowledge the considerable input
from students and teaching assistants. As any teacher knows, it is usually the one
teaching who learns more than anyone. When I first taught this course there were
numerous derivations that were only partly developed; then a “miracle occurred” that
skipped some “intuitively obvious” steps to reach the final result. No such skipped steps occur
within this book. Over the years, excellent questions from students that I could not
answer on the spot forced me to fill in many aspects of the material. So I offer a heartfelt
thanks to those who stumped me in class. In that same vein, I would appreciate hearing
about any errors still present in the book. The class has benefited from some incredibly
smart and motivated teaching assistants, and many of the exercises appearing at the end
of chapters originated from them (special thanks go to Sharon Stammerjohn, Chen Chen,
and Darren McKee, among many others). Unfortunately, as I transformed my class notes
into a textbook, my wife, Rhonda, became a writer’s widow for nearly a year – I can’t
thank her enough for all the support she has given me. And finally, but not least, thanks to
my editor, Matt Lloyd, at Cambridge University Press, who was a constant source of
improvements and encouragement!
Finally, that ubiquitous message accompanying all such books: any errors in the book
are strictly mine. Oh, and the other statement: any views expressed in this book (and
there are many) are entirely mine. Enjoy!
Part I
Fundamentals
Analysis of data requires the use of a broad range of techniques spanning a wide range of
complexities. Common to most of these techniques is a fundamental foundation consisting
of a nomenclature (not always consistent from one author to the next) as well as
a set of mathematical and statistical tools. The purpose of Part I is to define that
nomenclature and those basic tools.
For this Part, the order in which the observations occur is not important. This is in
contrast to sequential data, which are generically referred to as time series (whether
they vary with time or not), the subject of Part II. For the latter, the order in which the
observations occur is important.
Part I is dominated by techniques of classical statistics, such as regression analysis,
though some newer techniques such as nonparametric and resampling (bootstrap)
statistics represent valuable additions to that traditional arsenal. While the tools of
statistics are extensive and span a broad range of approaches, the concepts of
Expectation and Maximum Likelihood Estimators are particularly useful in data analysis
and will be stressed throughout. Because resampling statistics offers a significant
increase in our processing capabilities, it too will be presented for general analyses.
1 The Nature of Data and Analysis
1.1 Analysis
The Random House College Dictionary defines analysis as “the separation of any material
or abstract entity into its constituent elements.” For our analysis to be meaningful, it is
implicit that the data being analyzed contain some “signal” representing the phenomenon
of interest (or some aspect of it). Satisfying this, we might attempt to separate the signal
from the noise present in the data. Then we can characterize the signal in terms of its
robust features and, in the case of complex phenomena, separate the signal even further
into constituents, each of which may afford additional insights regarding the character,
behavior or makeup of multiple processes contributing to our single phenomenon.
Mathematically, we often desire to rewrite a data set, $y_i$, as

$$y_i = \sum_{k} a_k \varphi_{ki}, \qquad (1.1)$$

where the constituents of the data are now described by functions or vectors, $\varphi_{ki}$. These
constituents (or some subset) with the appropriate weights $a_k$ can then be recombined to
synthesize (reconstruct) the original data $y_i$ (or just the signal portion of it). Typically, the
fewer the constituent terms needed to describe the greatest amount of the data, the better
(presuming that the signal is contained in a few, hopefully understandable, constituents).
Equation (1.1) represents a simple linear foundation on which a large number of the
techniques and analysis tools developed in this text will build.
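To make the decomposition in equation (1.1) concrete, the brief Python sketch below (an illustrative example, not taken from the text; the constituent functions chosen here, a constant, a trend and a sinusoid, are arbitrary assumptions) builds a small set of constituent vectors $\varphi_{ki}$, synthesizes noisy data from known weights $a_k$, and then recovers estimates of those weights by least squares.

```python
import numpy as np

# Independent variable sampled at N discrete points
N = 200
x = np.linspace(0.0, 10.0, N)

# Constituent terms phi_k of equation (1.1), evaluated at the discrete points;
# each column of Phi is one constituent vector.
Phi = np.column_stack([
    np.ones(N),                    # constant term
    x,                             # linear trend
    np.sin(2 * np.pi * x / 5.0),   # oscillatory constituent
])

# "True" weights a_k (chosen arbitrarily for this illustration)
a_true = np.array([2.0, 0.5, 1.5])

# Synthesize data = signal + noise
rng = np.random.default_rng(0)
y = Phi @ a_true + 0.3 * rng.standard_normal(N)

# Recover estimates of the weights by least squares, then reconstruct the signal
a_est, *_ = np.linalg.lstsq(Phi, y, rcond=None)
y_signal = Phi @ a_est

print("estimated weights:", a_est)   # close to a_true
```

With only three constituent terms the reconstruction captures nearly all of the structure in the data, in the spirit of describing the greatest amount of data with the fewest, hopefully understandable, constituents.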
Caution: Most analysis techniques will produce something satisfying that technique’s
definition of signal, even when performed on pure noise, so be aware that the analysis
result may actually be nothing more than a statistical construct. A proper interpretation
of your analysis is possible when multiple pieces of evidence support or refute
a hypothesized answer to the question being addressed.
Multivariate data are those in which there are multiple dependent variables varying
as a function of a single independent variable. Or, if they are not time series or sequential
data, then they simply represent a data set that includes two or more dependent variables.
An additional, slightly more restrictive criterion is added to this term as used in
probability and statistical applications (Chapter 2).
Real versus Complex Data. “Real” data are what we deal with in the real world, but
treating them mathematically as complex numbers (described in more detail in later
chapters) affords us the ability to conveniently consider rotation (or phase) of a quantity,
as well as offering several additional mathematical conveniences. Therefore, real data
will sometimes be organized as if they are complex quantities. Contrary to the name,
complex quantities are often considerably easier to deal with than real ones.
Data are further classified according to statistical considerations (e.g., samples, realizations,
etc.). These are presented in the next chapter (Probability Theory). Since the analyses
presented in this text involve computer manipulation, all data are considered to be discrete.
It is helpful to use the most convenient and standard form to represent data. This involves
the concept of vectors and matrices. A more detailed summary of the matrix techniques
utilized in this text is presented in Appendix 1. Here, only the concept of how one stores
discrete data and mathematical functions in vectors is presented.
Typically, data are stored in tables (matrices). For example, if one measures the
temperature at noontime on each of m days, at n different locations, the data are stored
in a table as shown in Table 1.1:
Table 1.1 Noontime temperatures, with the m days as rows and the n measurement locations as columns.
Alternatively, you can store the measurement days in columns and the different locations in rows
(preferred), as shown in Table 1.2:
Table 1.2 The same noontime temperatures, with the n locations as rows and the m days as columns.
Which of the two storage schemes you use is a matter of personal taste, but you must
pay close attention to the storage form when performing the mathematical manipulations
so that the appropriate values are being manipulated as required. For consistency between
matrix and vector operations, the form of Table 1.2 is the form used throughout this text.
Initially, however, we will deal predominantly with single-column vectors of data.
In the above examples, this is equivalent to having the temperature measured each
noontime at one site only (e.g., the first column of Table 1.1, or the first row in Table 1.2),
or the noontime temperature of one particular day at n different sites (e.g., the first
column of Table 1.2).
Data storage in organized rows and columns is precisely the method used in a matrix.
Indeed, each column of numbers represents a column vector. For simplicity, we will
assume all vectors are column vectors and thus will drop the descriptor “column” (row
vectors are indicated as the transpose of a column vector).
In addition to storing discrete observations or data values in table (matrix) form,
mathematical functions, conveniently expressed as formulas when dealing with continuous
data, must be evaluated at discrete values in order to be represented in vectors;
hence the indexing of the constituent terms, $\varphi_{kj}$, in equation (1.1), where k indicates
which constituent term (function or vector), and j, the jth discrete value of that term.
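As a small illustration of these storage conventions (a sketch, not from the text; the temperature values are invented), the following Python/NumPy snippet stores noontime temperatures in the form of Table 1.2, with locations in rows and days in columns, extracts a single column vector of data, and discretizes a continuous function into a vector in the same way the constituent terms $\varphi_{kj}$ of equation (1.1) are stored.

```python
import numpy as np

# Noontime temperatures (invented values) in the form of Table 1.2:
# n = 3 locations in rows, m = 4 days in columns.
T = np.array([
    [12.1, 13.4, 11.8, 12.9],   # location 1
    [ 8.7,  9.2,  8.5,  9.0],   # location 2
    [15.3, 16.1, 14.9, 15.7],   # location 3
])

# A single column vector: temperatures at all n locations on day 1
day1 = T[:, 0].reshape(-1, 1)

# A row vector (the transpose of a column vector): location 1 over all m days
loc1 = T[0, :].reshape(1, -1)

# Discretizing a continuous function into a vector, as done for the
# constituent terms phi_kj of equation (1.1): here phi(x) = cos(x),
# evaluated at j = 1, ..., 10 discrete values of x.
x = np.linspace(0.0, 2 * np.pi, 10)
phi = np.cos(x)          # a vector of 10 discrete values of the function

print(day1.shape, loc1.shape, phi.shape)   # (3, 1) (1, 4) (10,)
```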
1.4.1 Domain
Domain represents the spread or extent of the independent variable over which the
quantity being measured varies. It is usually given as the maximum value of the
independent variable minus the minimum value. Because no phenomenon is observed
over all time or over all space, data have a limited domain (though the domain may be
complete relative to the phenomenon of interest, e.g., the finite Earth’s surface).
1.4.2 Range
Range represents the spread or extent over which the dependent variable (i.e., the
quantity being measured, possibly as a function of time or space) can take on values.
You will typically present range as the maximum value of the dependent variable minus
the minimum value. No measuring technique can record (or transmit) values that are
arbitrarily large or small. The lower limit on very small quantities is often set by the
noise level of the measuring instrument.
Dynamic Range (DR) is the actual range over which dependent variables are
measured or reproduced. Often this is less than the true range of the variable. You
present dynamic range on a logarithmic scale in decibels (dB):¹

¹ You can use some other form to express this if you are uncomfortable with decibels; just make it clear what your form is.
$$\mathrm{DR} = 10 \log_{10}\!\left(\frac{\text{largest power}}{\text{smallest (nonzero) power}}\right) \qquad (1.2)$$

or

$$\mathrm{DR} = 20 \log_{10}\!\left(\frac{|\text{largest value}|}{|\text{smallest (nonzero) value}|}\right). \qquad (1.3)$$
The first formula (1.2) is used if the data represent a measure of power (a squared
quantity such as variance or square of the signal amplitude). Otherwise the second
formula (1.3) is used. Use the smallest nonzero value for measurement devices that
report zero values.
Since power is a quantity squared, the first formula (1.2) is related to the second (1.3) by

$$\mathrm{DR} = 10 \log_{10}\!\left[\left(\frac{|\text{largest value}|}{|\text{smallest (nonzero) value}|}\right)^{2}\right]. \qquad (1.4)$$
Therefore, the two formulas yield the same answer, given the appropriate input. This is
especially useful for instruments that return measurements proportional to variance or
power.
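As a quick check of these formulas (an illustrative sketch with invented numbers, not an example from the text), the Python snippet below computes the dynamic range of a set of measurements from their amplitudes using (1.3) and from the corresponding powers using (1.2), and confirms that the two forms agree.

```python
import numpy as np

# Invented amplitude measurements (same units throughout)
values = np.array([0.02, 0.5, 3.7, 120.0])

largest = np.max(np.abs(values))
smallest = np.min(np.abs(values[values != 0]))   # smallest nonzero value

# Equation (1.3): dynamic range from amplitudes
DR_amplitude = 20 * np.log10(largest / smallest)

# Equation (1.2): dynamic range from powers (amplitude squared)
DR_power = 10 * np.log10(largest**2 / smallest**2)

print(DR_amplitude, DR_power)   # identical, as equation (1.4) implies
```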
An increment of 10 in DR equates to a factor of 10 in $R_p$ (the power ratio):

$$\mathrm{DR} = 10 \log_{10}(R_p), \qquad (1.5)$$

so

$$R_p = 10^{\mathrm{DR}/10}. \qquad (1.6)$$
Moreover, any real data set spans only a finite extent of the independent variable,

$$X_S \le x \le X_L, \qquad (1.7)$$

and is bounded in magnitude,

$$|y(x)| < M, \qquad (1.8)$$

where $X_S$, $X_L$ and $M$ are finite constants. Such functions are manageable and can always
be integrated. This seemingly esoteric fact proves extremely useful in the practical
analysis of data.
1.4.3 Frequency
Most methods of data measurement cannot respond instantly to sudden change.
The resulting data are thus said to be band limited. That is, they will not contain
frequency information higher than that representing the fastest response of the recording
device. This is an invaluable constraint for some important analysis techniques, though it
can be severely limiting regarding the study of certain high-frequency (rapidly varying)
phenomena.
Data are never perfect. Errors can enter data through experimental design, measurement
and collection techniques, assumptions concerning the nature and fidelity of the data,
discretization and computational or analysis procedures. In analyzing and interpreting
data it is important to attempt to estimate all of the potential errors (either quantitatively
or qualitatively). That is, it is important to estimate the uncertainty contained in the
measurements and their influence on your interpretation. Too often, errors related to one
easily determined component are presented while others are completely ignored. You
needn’t be fanatic in your attempt to estimate the uncertainty – scientific progress
needn’t be held hostage to unreasonable quantification – rather, it is important to
make an honest assessment, to the best of your ability, of the uncertainties
associated with your measurements. If you can’t formally estimate an error, simply
say so, then make your best educated guess or give a range for the error.
Measurement errors and limitations are commonly described by the following terms:
1) Precision specifies how well repeated measurements of the same quantity agree with
one another, that is, the degree of random scatter among replicate measurements.
For example, if a substance was repeatedly weighed 100 times, giving a mean
weight of 100 kg but with a scatter about this mean of 0.1 kg, then 0.1 kg would
represent the precision of the measurement.
2) Accuracy specifies how well a specific measurement actually represents the true
value of the measured quantity (often considered in terms of, say, a long-term
instrument drift). In statistical terms, accuracy is often reported in terms of bias.
For example, if a scale repeatedly returns a weight of ~100.0 kg for a substance whose
true weight is 105.3 kg, the mismatch between the measured value and the true
value reflects the bias of the measurement. So the scale is good to an accuracy of just
over 5 kg, or the scale has a bias of ~5 kg.
3) Resolution specifies the size of a discrete measurement interval of the recording
instrument used in the discretization process. In other words, it indicates how well the
instrument (or digitized data) can resolve changes in the quantity being measured.
For example, if a thermometer only registers changes in temperature of 0.01° C, then
it cannot distinguish changes in temperature smaller than this resolution. One would
achieve the same resolution if, in the process of digitizing higher-resolution data, all
values were rounded off to the nearest 0.01° C.
4) Response time specifies how quickly an instrument can respond to a change in the
quantity being measured. This will limit the bandwidth (range of frequencies) of
a measured time series (discussed in more detail later).
In general, accuracy reflects the degree of systematic errors, while precision reflects
the degree of random errors. Statistics are well designed to treat the latter (precision),
whereas they are not generally designed to address the former (accuracy). Accuracy
must be estimated, using whatever means are practical and reasonable, by the person
who understands the instrument.
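To connect these terms to a simple computation (a minimal sketch with invented numbers, not an example from the text), the snippet below estimates precision as the scatter of replicate measurements and bias as the offset of their mean from an assumed true value.

```python
import numpy as np

rng = np.random.default_rng(1)

true_weight = 105.3                      # assumed true value (kg)
# 100 replicate weighings: biased low by ~5 kg, with ~0.1 kg random scatter
measurements = 100.0 + 0.1 * rng.standard_normal(100)

precision = np.std(measurements, ddof=1)       # random error (scatter)
bias = np.mean(measurements) - true_weight     # systematic error (accuracy)

print(f"precision ~ {precision:.2f} kg, bias ~ {bias:.1f} kg")
```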
In this respect, precision errors may well be attributable to a combination of both instrument
and experimental error. This combination is responsible for the observed random scatter
in replicate measurements, which is often referred to as noise in data. Noise can also represent
any portion of the data that does not conform with preconceived ideas concerning the nature
of the data – recall the expression that “one person’s noise is another person’s signal.”
One of the goals in data analysis is to detect signal in noise or reduce the degree of noise
contamination. Noise is sometimes classified according to its contribution relative to some
more-stable (or non-fluctuating) component of the observations, referred to as the signal.
Signal-to-Noise Ratio (SN) is the common measure for comparing the ratio of signal
to noise in a data set. As with dynamic range, you give this ratio on a logarithmic scale in
decibels (dB):
$$\mathrm{SN} = 10 \log_{10}\!\left(\frac{\text{power of signal}}{\text{power of noise}}\right) \qquad (1.9)$$

or

$$\mathrm{SN} = 20 \log_{10}\!\left(\frac{|\text{amplitude of signal}|}{|\text{amplitude of noise}|}\right). \qquad (1.10)$$
Exactly how one determines the values to insert in the above formulas depends on the
data and how they were collected. In some cases, it is appropriate to use the mean value
of the data (or measured range, for time-series data) as the amplitude of signal and the
(known) instrument error (or precision) as the amplitude of the noise. With time series,
the noise may be alternatively estimated by computing the measured scatter in a series of
replicate time-series measurements of the same quantity (under the same sampling
conditions). The amplitude of signal might then be regarded as the observed range in
the average of the replicate time series.
The signal-to-noise ratio is also given as the ratio of the variance (a power) of the
signal to the variance of the noise, or it can be written in any other manner that essentially
provides some ratio between the variance of the signal and that of the noise.
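As a short numerical illustration (a sketch with synthetic data, not an example from the text), the following snippet estimates the signal-to-noise ratio of a noisy sinusoid from the variances (powers) of its signal and noise components, using equation (1.9), and checks the equivalent amplitude form (1.10).

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic time series: a known sinusoidal signal plus random noise
t = np.linspace(0.0, 10.0, 1000)
signal = 2.0 * np.sin(2 * np.pi * 0.5 * t)
noise = 0.5 * rng.standard_normal(t.size)
data = signal + noise

# Equation (1.9): SN from the ratio of signal power to noise power,
# here using variance as the measure of power.
SN_power = 10 * np.log10(np.var(signal) / np.var(noise))

# Equation (1.10): the equivalent form using amplitudes (standard deviations)
SN_amplitude = 20 * np.log10(np.std(signal) / np.std(noise))

print(SN_power, SN_amplitude)   # the two forms agree
```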
If one uses the mean of the data as the signal and the standard deviation as the noise,
then it is often convenient to present this form of SN (actually, SN⁻¹) as a coefficient of
variation:²

$$V = \frac{s}{\bar{y}},$$

where $s$ is the standard deviation and $\bar{y}$ the mean of the data.

² It is common to present many statistical quantities (“moments”) as coefficients of the moment. For moment $\mu_k$, a coefficient of the moment is given as $\mu_k/\mu_{k-1}$ (as is the case for V here). This will make more sense after the discussion of moments in Chapter 2, “Probability Theory.”