An Introduction to Statistical Computing: A Simulation-based Approach, 1st Edition, Jochen Voss
Author(s): Jochen Voss
ISBN(s): 9781118357729, 1118357728
Edition: 1
Year: 2013
Language: english
An Introduction to
Statistical Computing
WILEY SERIES IN COMPUTATIONAL STATISTICS
Consulting Editors:
Paolo Giudici
University of Pavia, Italy
Geof H. Givens
Colorado State University, USA
Bani K. Mallick
Texas A & M University, USA
Jochen Voss
School of Mathematics, University of Leeds, UK
This edition first published 2014
© 2014 John Wiley & Sons, Ltd
Registered office
John Wiley & Sons, Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ,
United Kingdom
For details of our global editorial offices, for customer services and for information about how to apply
for permission to reuse the copyright material in this book please see our website at www.wiley.com.
The right of the author to be identified as the author of this work has been asserted in accordance with the
Copyright, Designs and Patents Act 1988.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or
transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise,
except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of
the publisher.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may
not be available in electronic books.
Designations used by companies to distinguish their products are often claimed as trademarks. All brand
names and product names used in this book are trade names, service marks, trademarks or registered
trademarks of their respective owners. The publisher is not associated with any product or vendor
mentioned in this book.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in
preparing this book, they make no representations or warranties with respect to the accuracy or
completeness of the contents of this book and specifically disclaim any implied warranties of
merchantability or fitness for a particular purpose. It is sold on the understanding that the publisher is not
engaged in rendering professional services and neither the publisher nor the author shall be liable for
damages arising herefrom. If professional advice or other expert assistance is required, the services of a
competent professional should be sought.
Library of Congress Cataloging-in-Publication Data
Voss, Jochen.
An introduction to statistical computing : a simulation-based approach / Jochen Voss. – First edition.
pages cm. – (Wiley series in computational statistics)
Includes bibliographical references and index.
ISBN 978-1-118-35772-9 (hardback)
1. Mathematical statistics–Data processing. I. Title.
QA276.4.V66 2013
519.501 13–dc23
2013019321
A catalogue record for this book is available from the British Library.
ISBN: 978-1-118-35772-9
Typeset in 10/12pt Times by Aptara Inc., New Delhi, India
1 2014
Contents
List of algorithms
Preface
Nomenclature
References
Index
List of algorithms
Continuous-time models
alg. 6.6 Brownian motion
alg. 6.12 Euler–Maruyama scheme
alg. 6.15 Milstein scheme
alg. 6.26 multilevel Monte Carlo estimates
alg. 6.29 Euler–Maruyama scheme for the Heston model
Preface
This is a book about exploring random systems using computer simulation, and thus
it combines two topic areas which have always fascinated me:
the mathematical theory of probability and the art of programming computers. The
method of using computer simulations to study a system is very different from the
more traditional, purely mathematical approach. On the one hand, computer experiments
normally can only provide approximate answers to quantitative questions,
but on the other hand, results can be obtained for a much wider class of systems,
including large and complex systems where a purely theoretical approach becomes
difficult.
In this text we will focus on three different types of questions. The first, easiest
question is about the normal behaviour of the system: what is a typical state of the system?
Such questions can be easily answered using computer experiments: simulating
a few random samples of the system gives examples of typical behaviour. The second
kind of question is about variability: how large are the random fluctuations? This
type of question can be answered statistically by analysing large samples, generated
using repeated computer simulations. A final, more complicated class of questions is
about exceptional behaviour: how small is the probability of the system behaving in
a specified untypical way? Often, advanced methods are required to answer this third
type of question. The purpose of this book is to explain how such questions can be
answered. My hope is that, after reading this book, the reader will not only be able
to confidently use methods from statistical computing for answering such questions,
but also to adjust existing methods to the requirements of a given problem and, for
use in more complex situations, to develop new specialised variants of the existing
methods.
This text originated as a set of handwritten notes which I used for teaching
the ‘Statistical Computing’ module at the University of Leeds, but it has since been greatly
extended by the addition of many examples and more advanced topics. The material
we managed to cover in the ‘Statistical Computing’ course during one semester is less
than half of what is now the contents of the book! This book is aimed at postgraduate
students and their lecturers; it can be used both for self-study and as the basis of
taught courses. With the inclusion of many examples and exercises, the text should
also be accessible to interested undergraduate students and to mathematically inclined
researchers from areas outside mathematics.
Only very few prerequisites are required for this book. On the mathematical side,
the text assumes that the reader is familiar with basic probability, up to and including
the law of large numbers; Appendix A summarises the required results. As a consequence
of the decision to require so little mathematical background, some of the
finer mathematical subtleties are not discussed in this book. Results are presented in a
way which makes them easily accessible to readers with limited mathematical background,
but the statements are given in a form which allows the mathematically more
knowledgeable reader to easily add the required detail on his/her own. (For example,
I often use phrases such as ‘every set A ⊆ ℝ^d’ where full mathematical rigour would
require us to write ‘every measurable set A ⊆ ℝ^d’.) On the computational side, basic
programming skills are required to make use of the numerical methods introduced
in this book. While the text is written independently of any specific programming
language, the reader will need to choose a language when implementing methods
from this book on a computer. Possible choices of programming language include
Python, Matlab and C/C++. For my own implementations, provided as part of the
solutions to the exercises in Appendix C, I used the R programming language; a short
introduction to programming with R is provided in Appendix B.
Writing this book has been a big adventure for me. When I started this project,
more than a year ago, my aim was to cover enough material so that I could discuss
the topics of multilevel Monte Carlo and reversible jump Markov Chain Monte Carlo
methods. I estimated that 350 pages would be enough to cover this material but it
quickly transpired that I had been much too optimistic: my estimates for the final
page count kept rising and even after several rounds of throwing out side-topics and
generally tightening the text, the book is still stretching this limit! Nevertheless, the
text now covers most of the originally planned topics, including multilevel Monte
Carlo methods near the very end of the book. Due to my travel during the last year,
parts of this book have been written on a laptop in exciting places. For example, the
initial draft of section 1.5 was written on a coach travelling through the beautiful
island of Kyushu, halfway around the world from where I live! All in all, I greatly
enjoyed writing this book and I hope that the result is useful to the reader.
This book has an accompanying website. Please visit
www.wiley.com/go/statistical_computing
Jochen Voss
Leeds, March 2013
Nomenclature
For reference, the following list summarises some of the notation used throughout
this book.
The topic of this book is the study of statistical models using computer simulations.
Here we use the term ‘statistical models’ to mean any mathematical models which
include a random component. Our interest in this chapter and the next is in simulation
of the random component of these models. The basic building block of such
simulations is the ability to generate random numbers on a computer, and this is the
topic of the present chapter. Later, in Chapter 2, we will see how the methods from
Chapter 1 can be combined to simulate more complicated models.
Generation of random numbers, or more generally random objects, on a computer
is complicated by the fact that computer programs are inherently deterministic: while
the output of a computer program may look random, it is obtained by executing the
steps of some algorithm and thus is totally predictable. For example, the output of a
program computing the decimal digits of the number
π = 3.14159265358979323846264338327950288419716939937510...
(the ratio between the perimeter and diameter of a circle) looks random at first sight,
but of course π is not random at all! The output can only start with the string of digits
given above and running the program twice will give the same output twice.
We will split the problem of generating random numbers into two distinct subproblems:
first we will study the problem of generating any randomness at all, concentrating
on the simple case of generating independent random numbers, uniformly
distributed on the interval [0, 1]. This problem and related concerns will be discussed
in Section 1.1. In the following sections, starting with Section 1.2, we will study the
generation of random numbers from different distributions, using the independent,
uniformly distributed random numbers obtained in the previous step as a basis.
(a) True random numbers are generated using some physical phenomenon which
is random. Generating such numbers requires specialised hardware and can
be expensive and slow. Classical examples of this include tossing a coin or
throwing dice. Modern methods utilise quantum effects, thermal noise in
electric circuits, the timing of radioactive decay, etc.
(b) Pseudo random numbers are generated by computer programs. While these
methods are normally fast and resource effective, a challenge with this
approach is that computer programs are inherently deterministic and therefore
cannot produce ‘truly random’ output.
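The determinism described in (b) can be seen directly in a short sketch. The example below uses Python's standard `random` module purely as an illustration; the seed value 2013 and the function name `pseudo_random_run` are arbitrary choices made here, not taken from the text. `random.SystemRandom` draws from the operating system's entropy source, which is only an approximation to the physical sources listed in (a) and whose quality depends on the platform.

```python
import random


def pseudo_random_run(seed, count=5):
    """Return `count` pseudo random numbers from a generator seeded with `seed`."""
    rng = random.Random(seed)  # deterministic generator; its state is fixed by the seed
    return [rng.random() for _ in range(count)]


# Same seed and same program: the two runs produce identical output.
first = pseudo_random_run(seed=2013)
second = pseudo_random_run(seed=2013)
print(first == second)

# SystemRandom ignores seeding and asks the operating system for entropy,
# so repeated runs are not expected to agree.
os_rng = random.SystemRandom()
print([os_rng.random() for _ in range(5)])
```

Reproducibility of this kind is often useful in practice, since a simulation can be rerun exactly by reusing the seed.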
In the algorithm, ‘mod’ denotes the modulus for integer division, that is the value
n mod m is the remainder of the division of n by m, in the range 0, 1, ..., m − 1.
Thus the sequence generated by algorithm 1.2 consists of integers X_n from the
range {0, 1, 2, ..., m − 1}. The output depends on the parameters m, a, c and on the
seed X_0. We will see that, if m, a and c are carefully chosen, the resulting sequence
behaves ‘similar’ to a sequence of independent, uniformly distributed random variables.
By choosing different values for the seed X_0, different sequences of pseudo
random numbers can be obtained.
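A minimal sketch of a linear congruential generator, assuming the usual recursion X_n = (a · X_{n−1} + c) mod m, may help to make the roles of m, a, c and the seed concrete. The function names `lcg` and `lcg_uniform` are hypothetical, and the small parameters m = 8, a = 5, c = 1 with seed X_0 = 0 are an assumption chosen only for illustration.

```python
from itertools import islice


def lcg(m, a, c, seed):
    """Yield X_1, X_2, ... where X_n = (a * X_{n-1} + c) mod m and X_0 = seed."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x


def lcg_uniform(m, a, c, seed):
    """Yield the LCG output rescaled to the interval [0, 1)."""
    for x in lcg(m, a, c, seed):
        yield x / m


# Illustrative parameters (an assumption, not a recommendation for real use).
first_ten = list(islice(lcg(m=8, a=5, c=1, seed=0), 10))
print(first_ten)  # [1, 6, 7, 4, 5, 2, 3, 0, 1, 6]
```

Dividing by m, as in `lcg_uniform`, is one common way to turn the integer output into numbers in [0, 1); other conventions exist.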
 n    5X_{n−1} + 1    X_n
 1         1           1
 2         6           6
 3        31           7
 4        36           4
 5        21           5
 6        26           2
 7        11           3
 8        16           0
 9         1           1
10         6           6
While the output of the LCG looks random, from the way it is generated it is
clear that the output has several properties which make it different from truly random
sequences. For example, since each new value of X_n is computed from X_{n−1}, once the
generated series reaches a value X_n which has been generated before, the output starts
to repeat. In example 1.3 this happens for X_8 = X_0 and we get X_9 = X_1, X_10 = X_2
and so on. Since X_n can take only m different values, the output of an LCG starts
repeating itself after at most m steps; the generated sequence is eventually periodic.
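The repetition argument can be turned into a small sketch that finds where the cycle begins and how long it is. The helper name `lcg_cycle` is hypothetical, and the parameter sets below are assumptions picked for illustration: the first one closes its cycle at the seed, the second one has a non-empty initial stretch before it becomes periodic.

```python
def lcg_cycle(m, a, c, seed):
    """Return (tail, period) for the sequence X_0 = seed, X_n = (a*X_{n-1} + c) mod m.

    `tail` is the number of steps before the periodic part starts and
    `period` is the length of the repeating cycle.
    """
    first_seen = {}  # value -> index n at which it first occurred
    x, n = seed, 0
    while x not in first_seen:
        first_seen[x] = n
        x = (a * x + c) % m
        n += 1
    return first_seen[x], n - first_seen[x]


print(lcg_cycle(m=8, a=5, c=1, seed=0))  # (0, 8): all 8 values occur, then X_8 = X_0
print(lcg_cycle(m=8, a=2, c=0, seed=1))  # (3, 1): 1, 2, 4, 0, 0, ... is only eventually periodic
```

Since the loop stores each value at most once, it terminates after at most m steps, in line with the bound stated above.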
Sometimes the periodicity of a sequence of pseudo random numbers can cause
problems, but on the other hand, if the period length is longer than the number of
random numbers we use, periodicity cannot affect our result. For this reason, one
needs to carefully choose the parameters m, a and c in order to achieve a long enough
period. In particular m, since it is an upper bound for the period length, needs to be
chosen large. In practice, typical values of m are on the order of m = 2^32 ≈ 4 · 10^9
and a and c are then chosen such that the generator actually achieves the maximum
possible period length of m. A criterion for the choice of m, a and c is given in the
following theorem (Knuth, 1981, Section 3.2.1.2).
Theorem 1.4 The LCG has period m if and only if the following three conditions
are satisfied:
In the situation of the theorem, the period length does not depend on the seed X_0
and usually this parameter is left to be chosen by the user of the PRNG.
Example 1.5 Let m = 2^32, a = 1 103 515 245 and c = 12 345. Since the only
prime factor of m is 2 and c is odd, the values m and c are relatively prime and condition
(a) of the theorem is satisfied. Similarly, condition (b) is satisfied, since a − 1 is
even and thus divisible by 2. Finally, since m is a multiple of 4, we have to check
condition (c) but, since a − 1 = 1 103 515 244 = 275 878 811 · 4, this condition also
holds. Therefore the LCG with these parameters m, a and c has period 2^32 for every
seed X_0.
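The three checks carried out in Example 1.5 can be sketched in code. The conditions below are written as they are used in the example: (a) m and c are relatively prime, (b) a − 1 is divisible by every prime factor of m, and (c) a − 1 is a multiple of 4 whenever m is. The function names `prime_factors` and `has_full_period` are hypothetical, and trial division is an assumption that is adequate only for moduli whose prime factors are small.

```python
from math import gcd


def prime_factors(n):
    """Return the set of prime factors of n, found by simple trial division."""
    factors = set()
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors.add(p)
            n //= p
        p += 1
    if n > 1:
        factors.add(n)
    return factors


def has_full_period(m, a, c):
    """Check conditions (a), (b) and (c) as they are applied in Example 1.5."""
    cond_a = gcd(m, c) == 1                                    # m and c relatively prime
    cond_b = all((a - 1) % p == 0 for p in prime_factors(m))   # a - 1 divisible by each prime factor of m
    cond_c = m % 4 != 0 or (a - 1) % 4 == 0                    # only relevant if 4 divides m
    return cond_a and cond_b and cond_c


print(has_full_period(2**32, 1103515245, 12345))  # parameters of Example 1.5
print(has_full_period(8, 3, 1))                   # a - 1 = 2 is not a multiple of 4
```

For the second call the check fails, and indeed the sequence 0, 1, 4, 5, 0, ... generated by m = 8, a = 3, c = 1 has period 4 rather than 8.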
values in a finite set S = {0, 1, . . . , m − 1}, in the long run, for every set A ⊆ S we
should have
\[
  \frac{\#\{\, i \mid 1 \le i \le N,\ X_i \in A \,\}}{N} \approx \frac{\#A}{\#S},
  \tag{1.1}
\]
Example 1.6 Assume that we have a PRNG with m = 1024 possible output values
and that we perform a chi-squared test for the hypothesis
for j = 0, 1, . . . , 15.
If we consider a sample X_1, X_2, ..., X_N, the test statistic of the chi-squared test
is computed from the observed numbers of samples in each block, given by
\[
  O_j = \#\{\, i \mid 64 j \le X_i < 64 (j + 1) \,\}.
\]
With the expected counts
\[
  E_j = N \cdot 64 / 1024 = N / 16,
\]
the test statistic is
\[
  Q = \sum_{j=0}^{15} \frac{(O_j - E_j)^2}{E_j}.
\]
For large sample size N, and under the hypothesis (1.1), the value Q follows a
χ²-distribution with 15 degrees of freedom. Some quantiles of this distribution are:
Thus, for a one-sided test with significance level α = 5% (that is, 1 − α = 95%) we would
reject the hypothesis if Q > 24.996. In contrast, for a two-sided test with the same
significance level, we would reject the hypothesis if either Q < 6.262 or Q > 27.488.
We consider two different test cases: first, if X_n = n mod 1024 for n =
1, 2, ..., N = 10^6, we find Q = 0.244368. Since the series is very regular, the value
of Q is very low. The one-sided test would accept this sequence as being uniformly
distributed, whereas the two-sided test would reject the sequence.
Secondly, we consider X_n = n mod 1020 for n = 1, 2, ..., N = 10^6. Since this
series never takes the values 1020 to 1023, the distribution is wrong and we expect a
large value of Q. Indeed, for this case we get Q = 232.5864 and thus both versions
of the test reject this sequence.
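The block-counting test of Example 1.6 can be sketched as follows. The function name `chi_squared_blocks` is hypothetical, and the sketch assumes 16 blocks of width 64 with the index n running from 1 to N. The thresholds are the quantiles quoted in the text. The exact value of Q depends on details such as the indexing convention, so the code prints only the decisions of the two tests.

```python
def chi_squared_blocks(samples, m=1024, blocks=16):
    """Return Q = sum_j (O_j - E_j)^2 / E_j for equal-width blocks of {0, ..., m-1}."""
    width = m // blocks          # 64 values per block for m = 1024
    observed = [0] * blocks      # O_j: number of samples falling into block j
    count = 0
    for x in samples:
        observed[x // width] += 1
        count += 1
    expected = count / blocks    # E_j = N / 16 under the uniform hypothesis
    return sum((o - expected) ** 2 / expected for o in observed)


N = 10**6
q_regular = chi_squared_blocks(n % 1024 for n in range(1, N + 1))
q_truncated = chi_squared_blocks(n % 1020 for n in range(1, N + 1))

LOWER, UPPER_ONE_SIDED, UPPER_TWO_SIDED = 6.262, 24.996, 27.488

for name, q in [("n mod 1024", q_regular), ("n mod 1020", q_truncated)]:
    one_sided_rejects = q > UPPER_ONE_SIDED
    two_sided_rejects = q < LOWER or q > UPPER_TWO_SIDED
    print(name, "one-sided rejects:", one_sided_rejects,
          "two-sided rejects:", two_sided_rejects)
```

Under these assumptions the very regular first sequence is rejected only by the two-sided test, while the second sequence is rejected by both, matching the conclusions of the example.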
Random number generators used in practice, and even the LCG for large enough
values of m, pass statistical tests for the distribution of the output samples without
problems.
[Figure 1.1: four scatter plots, panels (a) to (d), each showing X_{i+1} (vertical axis)
against X_i (horizontal axis), with both axes running from 0.0 to 1.0.]
Figure 1.1 Scatter plots to illustrate the correlation between consecutive outputs
X_i and X_{i+1} of different pseudo random number generators. The random number
generators used are the runif function in R (a), the LCG with m = 81, a = 1 and
c = 8 (b), the LCG with m = 1024, a = 401, c = 101 (c) and finally the LCG with
parameters m = 2^32, a = 1 664 525, c = 1 013 904 223 (d). Clearly the output in the
second and third example does not behave like a sequence of independent random
variables.
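The structure visible in panels (b) and (c) can also be checked numerically. The sketch below collects consecutive pairs for the two small generators named in the caption; the function name `consecutive_pairs`, the seed 0 and the sample counts are assumptions made for illustration. Because each output determines the next one, an LCG with modulus m can produce at most m distinct pairs, whereas m² pairs would be possible for independent samples.

```python
def consecutive_pairs(m, a, c, seed, count):
    """Return the first `count` pairs (X_i / m, X_{i+1} / m) of the LCG output."""
    pairs = []
    x = seed
    for _ in range(count):
        nxt = (a * x + c) % m
        pairs.append((x / m, nxt / m))
        x = nxt
    return pairs


pairs_b = consecutive_pairs(m=81, a=1, c=8, seed=0, count=1000)
pairs_c = consecutive_pairs(m=1024, a=401, c=101, seed=0, count=5000)

# Number of distinct points that a scatter plot of these pairs could show.
print(len(set(pairs_b)), "distinct pairs out of", 81**2, "possible")
print(len(set(pairs_c)), "distinct pairs out of", 1024**2, "possible")
```

For a = 1 the recursion reduces to X_{i+1} = (X_i + c) mod m, so all points of panel (b) lie on two parallel line segments.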
One way to quantify the independence of the output samples of a PRNG is the
following criterion.
Definition 1.7 A periodic sequence (X_n)_{n∈ℕ} with values in a finite set S and
period length P is k-dimensionally equidistributed, if every possible subsequence
x = (x_1, ..., x_k) ∈ S^k of length k occurs equally often in the sequence X, that is if
\[
  N_x = \#\{\, i \mid 0 \le i < P,\ X_{i+1} = x_1, \ldots, X_{i+k} = x_k \,\}
\]