Probability: Theory and Examples
Fourth Edition
Editorial Board:
Z. Ghahramani, Department of Engineering, University of Cambridge
R. Gill, Department of Mathematics, Utrecht University
F. Kelly, Statistics Laboratory, University of Cambridge
B. D. Ripley, Department of Statistics, University of Oxford
S. Ross, Department of Industrial & Systems Engineering, University of
Southern California
M. Stein, Department of Statistics, University of Chicago
RICK DURRETT
Department of Mathematics, Duke University
Cambridge University Press
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore,
São Paulo, Delhi, Dubai, Tokyo, Mexico City
© Rick Durrett 1991, 1995, 2004, 2010
A catalog record for this publication is available from the British Library.
Cambridge University Press has no responsibility for the persistence or accuracy of URLs for
external or third-party Internet Web sites referred to in this publication and does not
guarantee that any content on such Web sites is, or will remain, accurate or appropriate.
Contents
Preface
1 Measure Theory
1.1 Probability Spaces
1.2 Distributions
1.3 Random Variables
1.4 Integration
1.5 Properties of the Integral
1.6 Expected Value
1.6.1 Inequalities
1.6.2 Integration to the Limit
1.6.3 Computing Expected Values
1.7 Product Measures, Fubini's Theorem
2 Laws of Large Numbers
2.1 Independence
2.1.1 Sufficient Conditions for Independence
2.1.2 Independence, Distribution, and Expectation
2.1.3 Sums of Independent Random Variables
2.1.4 Constructing Independent Random Variables
2.2 Weak Laws of Large Numbers
2.2.1 L² Weak Laws
2.2.2 Triangular Arrays
2.2.3 Truncation
2.3 Borel-Cantelli Lemmas
2.4 Strong Law of Large Numbers
2.5 Convergence of Random Series*
2.5.1 Rates of Convergence
2.5.2 Infinite Mean
2.6 Large Deviations*
3 Central Limit Theorems
3.1 The De Moivre-Laplace Theorem
3.2 Weak Convergence
3.2.1 Examples
3.2.2 Theory
References
Index
Preface
In 1989 when the first edition of this book was completed, my sons David and
Greg were 3 and 1, and the cover picture showed the Dow Jones at 2650. The past
20 years have brought many changes, but the song remains the same. The title
of the book indicates that as we develop the theory, we will focus our attention
on examples. Hoping that the book would be a useful reference for people who
apply probability in their work, we have tried to emphasize the results that are
important for applications, and have illustrated their use with roughly 200 examples.
Probability is not a spectator sport, so the book contains almost 450 exercises to
challenge readers and to deepen their understanding.
This fourth edition has two major changes (in addition to a new publisher):
(i) The book has been converted from TeX to LaTeX. The systematic use of labels
should eventually eliminate problems with references to other points in the
text. In addition, the picture environment and the graphicx package have allowed
the figures lost from the third edition to be reintroduced and a number of new
ones to be added.
(ii) Four sections of the old appendix have been combined with the first three
sections of Chapter 1 to make a new first chapter on measure theory, which
should allow the book to be used by people who do not have this background
without making the text tedious for those who have.
Acknowledgments. I am always grateful to the many people who sent me com-
ments and typos. Helping to correct the first edition were David Aldous, Ken
Alexander, Daren Cline, Ted Cox, Robert Dalang, Joe Glover, David Griffeath, Phil
Griffin, Joe Horowitz, Olav Kallenberg, Jim Kuelbs, Robin Pemantle, Yuval Peres,
Ken Ross, Steve Samuels, Byron Schmuland, Jon Wellner, and Ruth Williams.
The third edition benefited from input from Manel Baucells, Eric Blair, Zhen-
Qing Chen, Finn Christensen, Ted Cox, Bradford Crain, Winston Crandall, Amir
Dembo, Neil Falkner, Changyong Feng, Brighten Godfrey, Boris Granovsky, Jan
Hannig, Andrew Hayen, Martin Hildebrand, Kyoungmun Jang, Anatole Joffe,
Daniel Kifer, Steve Krone, Greg Lawler, T. Y. Lee, Shlomo Levental, Torgny Lind-
vall, Arif Mardin, Carl Mueller, Robin Pemantle, Yuval Peres, Mark Pinsky, Ross
Pinsky, Boris Pittel, David Pokorny, Vinayak Prabhu, Brett Presnell, Jim Propp,
Yossi Schwarzfuchs, Rami Shakarchi, Lian Shen, Marc Shivers, Rich Sowers, Bob
Strain, Tsachy Weissman, and Hao Zhang.
New helpers for the fourth edition include John Angus, Phillipe Charmony, Adam
Cruz, Ricky Der, Justin Dyer, Piet Groeneboom, Vlad Island, Elena Kosygina,
Richard Laugesen, Sungchul Lee, Shlomo Levental, Ping Li, Freddy López, Lutz
Mattner, Piotr Milos, Davey Owen, Brett Presnell, Igal Sason, Alex Smith, Laurent
Tournier, Harsha Wabgaonkar, John Walsh, Tsachy Weissman, Neil Wu, Ofer
Zeitouni, Martin Zerner, and Andrei Zherebtsov. I apologize to those whose names
have been omitted or are new typos.
Family update. David graduated from Ithaca College in May 2009 with a degree
in print journalism, and like many of his peers is struggling to find work. Greg has
one semester to go at MIT and is applying to graduate schools in computer science.
He says he wants to do research in “machine learning,” so perhaps he can write a
program to find and correct the typos in my books.
After 25 years in Ithaca, we moved to Durham in June 2010 and I have taken a
position in the math department at Duke. Everyone seems to focus on the fact that
we are trading very cold winters for hotter summers and a much longer growing
season, but the real attraction is the excellent opportunities for interdisciplinary
research in the Research Triangle.
The more things change, the more they stay the same: inevitably there will be
typos in the new version. You can email me at rtd@math.duke.edu.
1 Measure Theory

1.1 Probability Spaces

In this chapter, we recall some definitions and results from measure theory. Our
purpose here is to provide an introduction for readers who have not seen these
concepts before and to review that material for those who have. Harder proofs,
especially those that do not contribute much to one’s intuition, are hidden away
in the Appendix. Readers with a solid background in measure theory can skip
Sections 1.4, 1.5, and 1.7, which were previously part of the Appendix.
A probability space is a triple (Ω, F, P), where Ω is a set of "outcomes," F is a set of "events," and P : F → [0, 1] is a function that assigns probabilities to events. We assume that F is a σ-field (or σ-algebra), that is, a (nonempty) collection of subsets of Ω satisfying (i) if A ∈ F, then Aᶜ ∈ F, and (ii) if A_i ∈ F is a countable sequence of sets, then ∪_i A_i ∈ F. Here and in what follows, countable means finite or countably infinite. Since ∩_i A_i = (∪_i A_iᶜ)ᶜ, it follows that a σ-field is closed under countable intersections. We omit the last property from the definition to make it easier to check.
Without P, (Ω, F) is called a measurable space, that is, it is a space on which we can put a measure. A measure is a nonnegative countably additive set function; that is, a function µ : F → R with
(i) µ(A) ≥ µ(∅) = 0 for all A ∈ F, and
(ii) if A_i ∈ F is a countable sequence of disjoint sets, then µ(∪_i A_i) = Σ_i µ(A_i).
If µ(Ω) = 1, we call µ a probability measure.
The next result gives some consequences of the definition of a measure that we will need later. In all cases, we assume that the sets we mention are in F.

Theorem 1.1.1. Let µ be a measure on (Ω, F).
(i) Monotonicity. If A ⊂ B, then µ(A) ≤ µ(B).
(ii) Subadditivity. If A ⊂ ∪_{m=1}^∞ A_m, then µ(A) ≤ Σ_{m=1}^∞ µ(A_m).
(iii) Continuity from below. If A_i ↑ A (that is, A_1 ⊂ A_2 ⊂ · · · and ∪_i A_i = A), then µ(A_i) ↑ µ(A).
(iv) Continuity from above. If A_i ↓ A (that is, A_1 ⊃ A_2 ⊃ · · · and ∩_i A_i = A), with µ(A_1) < ∞, then µ(A_i) ↓ µ(A).
Proof.
(i) Let B − A = B ∩ Aᶜ be the difference of the two sets. Using + to denote disjoint union, B = A + (B − A), so
µ(B) = µ(A) + µ(B − A) ≥ µ(A).
(ii) Let A′_n = A_n ∩ A, B_1 = A′_1, and for n > 1, B_n = A′_n − ∪_{m=1}^{n−1} A′_m. Since the B_n are disjoint and have union A, we have, using (ii) of the definition of measure, B_m ⊂ A_m, and (i) of this theorem,
µ(A) = Σ_{m=1}^∞ µ(B_m) ≤ Σ_{m=1}^∞ µ(A_m)
(iii) Let B_n = A_n − A_{n−1}. Then the B_n are disjoint and have ∪_{m=1}^∞ B_m = A and ∪_{m=1}^n B_m = A_n, so
µ(A) = Σ_{m=1}^∞ µ(B_m) = lim_{n→∞} Σ_{m=1}^n µ(B_m) = lim_{n→∞} µ(A_n)
(iv) A_1 − A_n ↑ A_1 − A, so (iii) implies µ(A_1 − A_n) ↑ µ(A_1 − A). Since A_1 ⊃ A_n and µ(A_1) < ∞, we have µ(A_1 − A_n) = µ(A_1) − µ(A_n), and it follows that µ(A_n) ↓ µ(A).
Example 1.1.1. Discrete probability spaces. Let Ω = a countable set, that is, finite or countably infinite. Let F = the set of all subsets of Ω. Let
P(A) = Σ_{ω∈A} p(ω), where p(ω) ≥ 0 and Σ_{ω∈Ω} p(ω) = 1
A little thought reveals that this is the most general probability measure on this space. In many cases when Ω is a finite set, we have p(ω) = 1/|Ω|, where |Ω| = the number of points in Ω.
For a simple concrete example that requires this level of generality, consider the
astragali, dice used in ancient Egypt made from the ankle bones of sheep. This die
could come to rest on the top side of the bone for four points or on the bottom for
three points. The side of the bone was slightly rounded. The die could come to rest
on a flat and narrow piece for six points or somewhere on the rest of the side for
one point. There is no reason to think that all four outcomes are equally likely, so
we need probabilities p1 , p3 , p4 , and p6 to describe P .
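To make the discrete setup concrete, here is a minimal Python sketch of a discrete probability space; the specific weights below are illustrative guesses, not historical data for the astragalus.

    # Discrete probability space: Omega countable, F = all subsets,
    # P(A) = sum of p(omega) over omega in A.
    p = {1: 0.1, 3: 0.4, 4: 0.4, 6: 0.1}  # hypothetical astragalus weights
    assert abs(sum(p.values()) - 1.0) < 1e-12  # total mass 1

    def P(A):
        """Probability of an event A, a subset of Omega."""
        return sum(p[w] for w in A)

    print(P({3, 4}))   # 0.8
    print(P(set(p)))   # 1.0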
Let R^d be the set of vectors (x_1, . . . , x_d) of real numbers and ℛ^d be the Borel sets, the smallest σ-field containing the open sets. When d = 1, we drop the superscript.
Example 1.1.2. Measures on the real line. Measures on (R, ℛ) are defined by giving a Stieltjes measure function, that is, a function F with the following properties:
(i) F is nondecreasing.
(ii) F is right continuous, that is, lim_{y↓x} F(y) = F(x).

Theorem 1.1.2. Associated with each Stieltjes measure function F there is a unique measure µ on (R, ℛ) with
µ((a, b]) = F(b) − F(a)   (1.1.1)
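As a quick illustration of (1.1.1), here is a small Python sketch; it uses the exponential distribution function from Section 1.2 as the Stieltjes measure function, and the function names are ours.

    import math

    def F(x):
        """F(x) = 1 - e^{-x} for x >= 0: a Stieltjes measure function."""
        return 0.0 if x <= 0 else 1.0 - math.exp(-x)

    def mu(a, b):
        """Measure of the half-open interval (a, b] via (1.1.1)."""
        return F(b) - F(a)

    print(mu(0, 1))             # 1 - 1/e, about 0.632
    print(mu(0, 1) + mu(1, 2))  # equals mu(0, 2): additivity on intervals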
Example 1.1.3. Sd = the empty set plus all sets of the form
(a1 , b1 ] × · · · × (ad , bd ] ⊂ Rd where − ∞ ≤ ai < bi ≤ ∞
Example 1.1.5. Let Ω = R and S = S_1. Then S̄_1 = the empty set plus all sets of the form
∪_{i=1}^k (a_i, b_i]  where −∞ ≤ a_i < b_i ≤ ∞
In (ii) above, and in what follows, i ≥ 1 indicates a countable union, while a plain subscript i or j indicates a finite union. The proof of Theorem 1.1.4 is rather involved, so it is given in Section A.1. To check condition (ii) in the theorem, the following is useful.
∪i Bi = F1 + · · · + Fn
A = A ∩ (∪i Bi ) = (A ∩ F1 ) + · · · + (A ∩ Fn )
and µ((a, b]) = F (b) − F (a) makes sense for all −∞ ≤ a < b ≤ ∞ since
F (∞) > −∞ and F (−∞) < ∞.
If (a, b] = +_{i=1}^n (a_i, b_i], then after relabeling the intervals we must have a_1 = a, b_n = b, and a_i = b_{i−1} for 2 ≤ i ≤ n, so condition (i) in Theorem 1.1.4 holds. To check (ii), suppose first that −∞ < a < b < ∞, and (a, b] ⊂ ∪_{i≥1} (a_i, b_i], where (without loss of generality) −∞ < a_i < b_i < ∞. Pick δ > 0 so that F(a + δ) < F(a) + ε, and pick η_i so that
F(b_i + η_i) < F(b_i) + ε2^{−i}
The open intervals (a_i, b_i + η_i) cover [a + δ, b], so there is a finite subcover (α_j, β_j), 1 ≤ j ≤ J. Since (a + δ, b] ⊂ ∪_{j=1}^J (α_j, β_j], (b) in Lemma 1.1.5 implies
F(b) − F(a + δ) ≤ Σ_{j=1}^J (F(β_j) − F(α_j)) ≤ Σ_{i=1}^∞ (F(b_i + η_i) − F(a_i)) ≤ ε + Σ_{i=1}^∞ (F(b_i) − F(a_i))
and since ε is arbitrary, we have proved the result in the case −∞ < a < b < ∞. To
remove the last restriction, observe that if (a, b] ⊂ ∪i (ai , bi ] and (A, B] ⊂ (a, b]
has −∞ < A < B < ∞, then we have
F(B) − F(A) ≤ Σ_{i=1}^∞ (F(b_i) − F(a_i))
Since the last result holds for any finite (A, B] ⊂ (a, b], the desired result
follows.
Measures on Rd
Our next goal is to prove a version of Theorem 1.1.2 for Rd . The first step is to
introduce the assumptions on the defining function F . By analogy with the case
d = 1 it is natural to assume:
(i) It is nondecreasing, that is, if x ≤ y (meaning xi ≤ yi for all i), then F (x) ≤
F (y).
(ii) F is right continuous, that is, limy↓x F (y) = F (x) (here y ↓ x means each
yi ↓ xi ).
However, this time it is not enough. Consider the following F:
F(x_1, x_2) =
  1    if x_1 ≥ 1 and x_2 ≥ 1
  2/3  if x_1 ≥ 1 and 0 ≤ x_2 < 1
  2/3  if x_2 ≥ 1 and 0 ≤ x_1 < 1
  0    otherwise
See Figure 1.1 for a picture. A little thought shows that
µ((a_1, b_1] × (a_2, b_2]) = µ((−∞, b_1] × (−∞, b_2]) − µ((−∞, a_1] × (−∞, b_2])
  − µ((−∞, b_1] × (−∞, a_2]) + µ((−∞, a_1] × (−∞, a_2])
  = F(b_1, b_2) − F(a_1, b_2) − F(b_1, a_2) + F(a_1, a_2)
Using this with a_1 = a_2 = 1 − ε and b_1 = b_2 = 1, and letting ε → 0, we see that
µ({(1, 1)}) = 1 − 2/3 − 2/3 + 0 = −1/3
Figure 1.1. The values of F(x_1, x_2) on the regions of the plane.
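A numeric check of the computation above, as a Python sketch (the function F is the one just defined; eps plays the role of ε):

    def F(x1, x2):
        """The function from the text that fails to define a measure."""
        if x1 >= 1 and x2 >= 1:
            return 1.0
        if x1 >= 1 and 0 <= x2 < 1:
            return 2/3
        if x2 >= 1 and 0 <= x1 < 1:
            return 2/3
        return 0.0

    def mu_rect(a1, b1, a2, b2):
        """Inclusion-exclusion mass of the rectangle (a1, b1] x (a2, b2]."""
        return F(b1, b2) - F(a1, b2) - F(b1, a2) + F(a1, a2)

    eps = 1e-9
    print(mu_rect(1 - eps, 1, 1 - eps, 1))  # -1/3: F assigns negative mass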
For a finite rectangle A = (a_1, b_1] × · · · × (a_d, b_d], let V = {a_1, b_1} × · · · × {a_d, b_d} be the set of vertices of A, let sgn(v) = (−1)^{#{i : v_i = a_i}}, and let
Δ_A F = Σ_{v∈V} sgn(v) F(v)
be the inclusion-exclusion mass that F assigns to A. The example above shows that in addition to (i) and (ii) we must assume:
(iii) Δ_A F ≥ 0 for all finite rectangles A.

Theorem 1.1.6. Suppose F : R^d → [0, 1] satisfies (i)–(iii) given above. Then there is a unique probability measure µ on (R^d, ℛ^d) so that µ(A) = Δ_A F for all finite rectangles.

Example 1.1.6. Suppose F(x) = Π_{i=1}^d F_i(x_i), where the F_i satisfy (i) and (ii) of Theorem 1.1.2. In this case,
Δ_A F = Π_{i=1}^d (F_i(b_i) − F_i(a_i))
When F_i(x) = x for all i, the resulting measure is Lebesgue measure on R^d.
Proof. We let µ(A) = Δ_A F for all finite rectangles and then use monotonicity to extend the definition to S_d. To check (i) of Theorem 1.1.4, call A = +_k B_k a regular subdivision of A if there are sequences a_i = α_{i,0} < α_{i,1} < · · · < α_{i,n_i} = b_i so that each rectangle B_k has the form
(α_{1,j_1−1}, α_{1,j_1}] × · · · × (α_{d,j_d−1}, α_{d,j_d}]  where 1 ≤ j_i ≤ n_i
It is easy to see that for regular subdivisions λ(A) = Σ_k λ(B_k). (First consider the case in which all the endpoints are finite, and then take limits to get the general case.) To extend this result to a general finite subdivision A = +_j A_j, subdivide further to get a regular one; see Figure 1.2.
The proof of (ii) is almost identical to that in Theorem 1.1.2. To make things easier to write and to bring out the analogies with Theorem 1.1.2, we let
(x, y] = (x_1, y_1] × · · · × (x_d, y_d]
for x, y ∈ R^d. Suppose first that −∞ < a < b < ∞, where the inequalities mean that each component is finite, and suppose (a, b] ⊂ ∪_{i≥1} (a^i, b^i], where (without loss of generality) −∞ < a^i < b^i < ∞. Let 1̄ = (1, . . . , 1), pick δ > 0 so that
µ((a + δ1̄, b]) > µ((a, b]) − ε
and pick η_i so that
µ((a^i, b^i + η_i 1̄]) < µ((a^i, b^i]) + ε2^{−i}
The open rectangles (a^i, b^i + η_i 1̄) cover [a + δ1̄, b], so there is a finite subcover (α^j, β^j), 1 ≤ j ≤ J. Since (a + δ1̄, b] ⊂ ∪_{j=1}^J (α^j, β^j], (b) in Lemma 1.1.5 implies
µ((a + δ1̄, b]) ≤ Σ_{j=1}^J µ((α^j, β^j]) ≤ Σ_{i=1}^∞ µ((a^i, b^i + η_i 1̄])
and since ε is arbitrary, we have proved the result in the case −∞ < a < b < ∞. The proof can now be completed exactly as before.
Exercises
1.1.2. Let Ω = R, F = all subsets A so that A or Aᶜ is countable, P(A) = 0 in the first case and P(A) = 1 in the second. Show that (Ω, F, P) is a probability space.
1.1.3. Recall the definition of S_d from Example 1.1.3. Show that σ(S_d) = ℛ^d, the Borel subsets of R^d.
1.1.4. A set A ⊂ {1, 2, . . .} is said to have asymptotic density θ if |A ∩ {1, 2, . . . , n}|/n → θ as n → ∞. Let A be the collection of sets for which the asymptotic density exists. Is A a σ-algebra? an algebra?
1.2 Distributions
Probability spaces become a little more interesting when we define random vari-
ables on them. A real-valued function X defined on is said to be a random
variable if for every Borel set B ⊂ R we have X −1 (B) = {ω : X(ω) ∈ B} ∈ F.
When we need to emphasize the σ -field, we will say that X is F-measurable or
write X ∈ F. If (Ω, F, P) is a discrete probability space (see Example 1.1.1), then any function X : Ω → R is a random variable. A second trivial, but useful, type of example of a random variable is the indicator function of a set A ∈ F:
1_A(ω) = 1 if ω ∈ A, 0 if ω ∉ A
The notation is supposed to remind you that this function is 1 on A. Analysts call
this object the characteristic function of A. In probability, that term is used for
something quite different. (See Section 3.3.)
If X is a random variable, then X induces a probability measure on R called
its distribution by setting µ(A) = P (X ∈ A) for Borel sets A. Using the notation
introduced above, the right-hand side can be written as P(X^{−1}(A)). In words, we pull A ∈ ℛ back to X^{−1}(A) ∈ F and then take P of that set.
To check that µ is a probability measure, we observe that if the A_i are disjoint, then using the definition of µ; the fact that X lands in the union if and only if it lands in one of the A_i; the fact that if the sets A_i ∈ ℛ are disjoint then the events {X ∈ A_i} are disjoint; and the definition of µ again, we have:
µ(∪_i A_i) = P(X ∈ ∪_i A_i) = P(∪_i {X ∈ A_i}) = Σ_i P(X ∈ A_i) = Σ_i µ(A_i)
Figure 1.3. Definition of the distribution of X.
Theorem 1.2.1. Any distribution function F(x) = P(X ≤ x) has the following properties:
(i) F is nondecreasing.
(ii) lim_{x→∞} F(x) = 1 and lim_{x→−∞} F(x) = 0.
(iii) F is right continuous, that is, lim_{y↓x} F(y) = F(x).
(iv) If F(x−) = lim_{y↑x} F(y), then F(x−) = P(X < x).
(v) P(X = x) = F(x) − F(x−).

Proof. To prove (i), note that if x ≤ y then {X ≤ x} ⊂ {X ≤ y}, and then use (i) in Theorem 1.1.1 to conclude that P(X ≤ x) ≤ P(X ≤ y).
To prove (ii), we observe that if x ↑ ∞, then {X ≤ x} ↑ Ω, while if x ↓ −∞, then {X ≤ x} ↓ ∅, and then use (iii) and (iv) of Theorem 1.1.1.
To prove (iii), we observe that if y ↓ x, then {X ≤ y} ↓ {X ≤ x}.
To prove (iv), we observe that if y ↑ x, then {X ≤ y} ↑ {X < x}.
For (v), note P(X = x) = P(X ≤ x) − P(X < x) and use (iii) and (iv).
The next result shows that we have found more than enough properties to char-
acterize distribution functions.
Theorem 1.2.2. If F satisfies (i), (ii), and (iii) in Theorem 1.2.1, then it is the distribution function of some random variable.

Proof. Let Ω = (0, 1), F = the Borel sets, and P = Lebesgue measure. If ω ∈ (0, 1), let
X(ω) = sup{y : F(y) < ω}
Once we show that
(∗) {ω : X(ω) ≤ x} = {ω : ω ≤ F(x)}
the desired result follows immediately, since P(ω : ω ≤ F(x)) = F(x). (Recall P is Lebesgue measure.) To check (∗), we observe that if ω ≤ F(x), then X(ω) ≤ x, since x ∉ {y : F(y) < ω}. On the other hand, if ω > F(x), then since F is right continuous, there is an ε > 0 so that F(x + ε) < ω and X(ω) ≥ x + ε > x.
Figure 1.4. Picture of the inverse defined in the proof of Theorem 1.2.2.
Even though F may not be 1-1 and onto, we will call X the inverse of F and
denote it by F −1 . The scheme in the proof of Theorem 1.2.2 is useful in generating
random variables on a computer. Standard algorithms generate random variables
U with a uniform distribution; then one applies the inverse of the distribution func-
tion defined in Theorem 1.2.2 to get a random variable F −1 (U ) with distribution
function F .
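For instance, a minimal Python sketch of this scheme for the exponential distribution, where F^{−1} has a closed form (the function names are ours):

    import math
    import random

    def F_inv(u):
        """Inverse of F(x) = 1 - e^{-x} on (0, 1)."""
        return -math.log(1.0 - u)

    # X = F^{-1}(U) with U uniform on (0, 1) has distribution function F.
    sample = [F_inv(random.random()) for _ in range(100_000)]
    print(sum(sample) / len(sample))  # about 1, the mean of the exponential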
If X and Y induce the same distribution µ on (R, R), we say X and Y are equal
in distribution. In view of Theorem 1.1.2, this holds if and only if X and Y have
the same distribution function, that is, P (X ≤ x) = P (Y ≤ x) for all x. When X
and Y have the same distribution, we like to write
X =ᵈ Y
with the d set above the equals sign, but this is too tall to use in text, so for typographical reasons we will also use X =d Y.
When the distribution function F(x) = P(X ≤ x) has the form
F(x) = ∫_{−∞}^x f(y) dy   (1.2.1)
we say that X has density function f.
For example, if X has an exponential distribution, then X has density f(x) = e^{−x} for x ≥ 0 and distribution function
F(x) = 0 for x ≤ 0,  F(x) = 1 − e^{−x} for x ≥ 0
For the standard normal distribution, the density is f(x) = (2π)^{−1/2} e^{−x²/2}. In this case, there is no closed-form expression for F(x), but we have the following bounds that are useful for large x:

Theorem 1.2.3. For x > 0,
(x^{−1} − x^{−3}) e^{−x²/2} ≤ ∫_x^∞ e^{−y²/2} dy ≤ x^{−1} e^{−x²/2}
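A numeric sanity check of these bounds, as a sketch that assumes scipy is available (norm.sf is the exact upper tail of the standard normal):

    import math
    from scipy.stats import norm

    for x in [1.0, 2.0, 4.0]:
        # integral of exp(-y^2/2) over [x, infinity)
        tail = math.sqrt(2 * math.pi) * norm.sf(x)
        lower = (1/x - 1/x**3) * math.exp(-x**2 / 2)
        upper = (1/x) * math.exp(-x**2 / 2)
        print(x, lower <= tail <= upper)  # True for each x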
Example 1.2.4. Uniform distribution on the Cantor set. The Cantor set C is
defined by removing (1/3, 2/3) from [0,1] and then removing the middle third
of each interval that remains. We define an associated distribution function by
setting F (x) = 0 for x ≤ 0, F (x) = 1 for x ≥ 1, F (x) = 1/2 for x ∈ [1/3, 2/3],
F (x) = 1/4 for x ∈ [1/9, 2/9], F (x) = 3/4 for x ∈ [7/9, 8/9], . . . There is no f
for which (1.2.1) holds because such an f would be equal to 0 on a set of measure 1.
From the definition, it is immediate that the corresponding measure has µ(C c ) = 0.
In Section 1.6, we will see the Bernoulli, Poisson, and geometric distributions.
The next example shows that the distribution function associated with a discrete
probability measure can be quite wild.
Exercises
1.2.1. Suppose X and Y are random variables on (Ω, F, P) and let A ∈ F. Show that if we let Z(ω) = X(ω) for ω ∈ A and Z(ω) = Y(ω) for ω ∈ Aᶜ, then Z is a random variable.
1.2.2. Let χ have the standard normal distribution. Use Theorem 1.2.3 to get upper
and lower bounds on P (χ ≥ 4).
1.2.3. Show that a distribution function has at most countably many discontinuities.
1.2.6. Suppose X has a normal distribution. Use the previous exercise to compute
the density of exp(X). (The answer is called the lognormal distribution.)
1.2.7. (i) Suppose X has density function f . Compute the distribution function
of X2 and then differentiate to find its density function. (ii) Work out the answer
when X has a standard normal distribution to find the density of the chi-square
distribution.
It follows from the two equations displayed in the previous proof that if S is a σ-field, then {{X ∈ B} : B ∈ S} is a σ-field. It is the smallest σ-field on Ω that makes X a measurable map. It is called the σ-field generated by X and denoted σ(X). For future reference, we note that
σ(X) = {{X ∈ B} : B ∈ S}   (1.3.1)
Example 1.3.1. If (S, S) = (R, ℛ), then possible choices of A in Theorem 1.3.1 are {(−∞, x] : x ∈ R} or {(−∞, x) : x ∈ Q}, where Q = the rationals.
{(X_1, . . . , X_n) ∈ A_1 × · · · × A_n} = ∩_i {X_i ∈ A_i} ∈ F
Since sets of the form A_1 × · · · × A_n generate ℛⁿ, the desired result follows from Theorem 1.3.1.
Proof. Since the infimum of a sequence is < a if and only if some term is < a (if all terms are ≥ a, then so is the infimum), we have
{inf_n X_n < a} = ∪_n {X_n < a} ∈ F
A similar argument shows {sup_n X_n > a} = ∪_n {X_n > a} ∈ F. For the last two, we observe
lim inf_{n→∞} X_n = sup_n inf_{m≥n} X_m
lim sup_{n→∞} X_n = inf_n sup_{m≥n} X_m
To complete the proof in the first case, note that Yn = inf m≥n Xm is a random
variable for each n, so supn Yn is as well.
X_∞ = lim sup_{n→∞} X_n
but this random variable may take the value +∞ or −∞. To accommodate this
and some other headaches, we will generalize the definition of random variable.
A function whose domain is a set D ∈ F and whose range is R* ≡ [−∞, ∞] is said to be a random variable if for all B ∈ ℛ* we have X^{−1}(B) = {ω : X(ω) ∈ B} ∈ F. Here ℛ* = the Borel subsets of R*, with R* given the usual topology, that is, the one generated by intervals of the form [−∞, a), (a, b), and (b, ∞], where a, b ∈ R. The reader should note that the extended real line (R*, ℛ*) is a
measurable space, so all the results above generalize immediately.
Exercises
1.3.1. Show that if A generates S, then X −1 (A) ≡ {{X ∈ A} : A ∈ A} generates
σ (X) = {{X ∈ B} : B ∈ S}.
1.3.4. (i) Show that a continuous function from R^d → R is a measurable map from (R^d, ℛ^d) to (R, ℛ). (ii) Show that ℛ^d is the smallest σ-field that makes all the continuous functions measurable.
1.3.7. A function f is said to be simple if
f(ω) = Σ_{m=1}^n c_m 1_{A_m}(ω)
where the c_m are real numbers and A_m ∈ F. Show that the class of F-measurable functions is the smallest class containing the simple functions and closed under pointwise limits.
1.3.8. Use the previous exercise to conclude that Y is measurable with respect to
σ (X) if and only if Y = f (X) where f : R → R is measurable.
1.3.9. To get a constructive proof of the last result, note that {ω : m2^{−n} ≤ Y < (m + 1)2^{−n}} = {X ∈ B_{m,n}} for some B_{m,n} ∈ ℛ, and set f_n(x) = m2^{−n} for x ∈ B_{m,n}. Show that as n → ∞, f_n(x) → f(x) and Y = f(X).
1.4 Integration
Let µ be a σ -finite measure on (, F). We will be primarily interested in the
special case µ is a probability measure, but we will sometimes need to integrate
with respect to infinite measure, and and it is no harder to develop the results in
general.
In this section we will define f dµ for a class of measurable functions. This
is a four-step procedure:
1. Simple functions
2. Bounded functions
3. Nonnegative functions
4. General functions
This sequence of four steps is also useful in proving integration formulas. See, for
example, the proofs of Theorems 1.6.9 and 1.7.2.
Step 1. ϕ is said to be a simple function if ϕ(ω) = Σ_{i=1}^n a_i 1_{A_i} and the A_i are disjoint sets with µ(A_i) < ∞. If ϕ is a simple function, we let
∫ϕ dµ = Σ_{i=1}^n a_i µ(A_i)
The representation of ϕ is not unique since we have not supposed that the ai
are distinct. However, it is easy to see that the last definition does not contradict
itself.
We will prove the next six conclusions four times, but before we can state them for the first time, we need a definition. ϕ ≥ ψ µ-almost everywhere (or ϕ ≥ ψ µ-a.e.) means µ({ω : ϕ(ω) < ψ(ω)}) = 0. When there is no doubt about what measure we are referring to, we drop the µ.
Lemma 1.4.2. Let ϕ and ψ be simple functions.
(i) If ϕ ≥ 0 a.e., then ∫ϕ dµ ≥ 0.
(ii) For any a ∈ R, ∫aϕ dµ = a∫ϕ dµ.
(iii) ∫(ϕ + ψ) dµ = ∫ϕ dµ + ∫ψ dµ.
(iv) If ϕ ≤ ψ a.e., then ∫ϕ dµ ≤ ∫ψ dµ.
(v) If ϕ = ψ a.e., then ∫ϕ dµ = ∫ψ dµ.
(vi) |∫ϕ dµ| ≤ ∫|ϕ| dµ.

Proof. (i) and (ii) are immediate consequences of the definition. To prove (iii), suppose
ϕ = Σ_{i=1}^m a_i 1_{A_i}  and  ψ = Σ_{j=1}^n b_j 1_{B_j}
To make the supports of the two functions the same, we let A_0 = ∪_j B_j − ∪_i A_i, let B_0 = ∪_i A_i − ∪_j B_j, and let a_0 = b_0 = 0. Now
ϕ + ψ = Σ_{i=0}^m Σ_{j=0}^n (a_i + b_j) 1_{A_i ∩ B_j}
and the A_i ∩ B_j are disjoint, so
∫(ϕ + ψ) dµ = Σ_{i=0}^m Σ_{j=0}^n (a_i + b_j) µ(A_i ∩ B_j)
= Σ_{i=0}^m Σ_{j=0}^n a_i µ(A_i ∩ B_j) + Σ_{j=0}^n Σ_{i=0}^m b_j µ(A_i ∩ B_j)
= Σ_{i=0}^m a_i µ(A_i) + Σ_{j=0}^n b_j µ(B_j) = ∫ϕ dµ + ∫ψ dµ
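In code, the integral of a simple function is just the finite sum from Step 1; a minimal Python sketch, with the measure given by the masses µ(A_i) of the disjoint sets:

    def integral_simple(terms):
        """terms: list of (a_i, mu(A_i)) for disjoint sets A_i.
        Returns the integral sum_i a_i * mu(A_i)."""
        return sum(a * m for a, m in terms)

    # phi = 2 on a set of measure 0.5, -1 on a set of measure 0.25
    print(integral_simple([(2.0, 0.5), (-1.0, 0.25)]))  # 0.75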
Step 2. Let E be a set with µ(E) < ∞ and let f be a bounded function that vanishes on Eᶜ. To define the integral of f, we observe that if ϕ ≤ f ≤ ψ are simple functions that vanish on Eᶜ, we would like to have
∫ϕ dµ ≤ ∫f dµ ≤ ∫ψ dµ
so we let
∫f dµ = sup_{ϕ≤f} ∫ϕ dµ = inf_{ψ≥f} ∫ψ dµ
Here and for the rest of Step 2, we assume that ϕ and ψ vanish on Eᶜ. To justify
the definition, we have to prove that the sup and inf are equal. It follows from (iv)
in Lemma 1.4.2 that
sup_{ϕ≤f} ∫ϕ dµ ≤ inf_{ψ≥f} ∫ψ dµ
Lemma 1.4.3. Let E be a set with µ(E) < ∞. If f and g are bounded functions that vanish on Eᶜ, then:
(i) If f ≥ 0 a.e., then ∫f dµ ≥ 0.
(ii) For any a ∈ R, ∫af dµ = a∫f dµ.
(iii) ∫(f + g) dµ = ∫f dµ + ∫g dµ.
(iv) If g ≤ f a.e., then ∫g dµ ≤ ∫f dµ.
(v) If g = f a.e., then ∫g dµ = ∫f dµ.
(vi) |∫f dµ| ≤ ∫|f| dµ.
Proof. Since we can take ϕ ≡ 0, (i) is clear from the definition. To prove (ii), we observe that if a > 0, then aϕ ≤ af if and only if ϕ ≤ f, so taking sup's gives ∫af dµ = a∫f dµ; a similar argument works for the inf's and for a < 0. To prove (iii), observe that if ψ_1 ≥ f and ψ_2 ≥ g, then ψ_1 + ψ_2 ≥ f + g, so
inf_{ψ≥f+g} ∫ψ dµ ≤ inf_{ψ_1≥f, ψ_2≥g} ∫(ψ_1 + ψ_2) dµ
Using (iii) of Lemma 1.4.2 for simple functions, we conclude that
∫(f + g) dµ = inf_{ψ≥f+g} ∫ψ dµ ≤ inf_{ψ_1≥f, ψ_2≥g} (∫ψ_1 dµ + ∫ψ_2 dµ) = ∫f dµ + ∫g dµ
To prove the other inequality, observe that the last conclusion applied to −f and −g, together with (ii), implies
−∫(f + g) dµ ≤ −∫f dµ − ∫g dµ
As usual, (iv) and (v) follow from (i), (iii), and (ii), and (vi) follows from (iv) since −|f| ≤ f ≤ |f|.

Step 3. If f ≥ 0, then we let
∫f dµ = sup{∫h dµ : 0 ≤ h ≤ f, h is bounded and µ({x : h(x) > 0}) < ∞}
where for E ⊂ Ω we use the notation
∫_E f dµ ≡ ∫f · 1_E dµ
The last definition is nice since it is clear that this is well defined. The next result will help us compute the value of the integral.
Lemma 1.4.4. Let E_n ↑ Ω have µ(E_n) < ∞ and let a ∧ b = min(a, b). Then
∫_{E_n} f ∧ n dµ ↑ ∫f dµ  as n ↑ ∞
Proof. It is clear from (iv) in Lemma 1.4.3 that the left-hand side increases as n does. Since h = (f ∧ n)1_{E_n} is a possibility in the sup, each term is smaller than the integral on the right. To prove that the limit is ∫f dµ, observe that if 0 ≤ h ≤ f, h ≤ M, and µ({x : h(x) > 0}) < ∞, then for n ≥ M, using h ≤ M, (iv), and (iii),
∫_{E_n} f ∧ n dµ ≥ ∫_{E_n} h dµ = ∫h dµ − ∫_{E_n^c} h dµ ≥ ∫h dµ − M µ(E_n^c ∩ {x : h(x) > 0})
Since µ(E_n^c ∩ {x : h(x) > 0}) → 0 as n → ∞, it follows that
lim inf_{n→∞} ∫_{E_n} f ∧ n dµ ≥ ∫h dµ
which proves the desired result since h is an arbitrary member of the class that defines the integral of f.
Lemma 1.4.5. Suppose f, g ≥ 0.
(i) ∫f dµ ≥ 0.
(ii) For any a > 0, ∫af dµ = a∫f dµ.
(iii) ∫(f + g) dµ = ∫f dµ + ∫g dµ.
(iv) If 0 ≤ g ≤ f a.e., then ∫g dµ ≤ ∫f dµ.
(v) If 0 ≤ g = f a.e., then ∫g dµ = ∫f dµ.

Proof. (i) is trivial from the definition. (ii) is clear, since when a > 0, ah ≤ af if and only if h ≤ f, and we have ∫ah dµ = a∫h dµ for h in the defining class. For (iii), we observe that if f ≥ h and g ≥ k, then f + g ≥ h + k, so taking the sup over h and k in the defining classes for f and g gives
∫(f + g) dµ ≥ ∫f dµ + ∫g dµ
For the other direction, we observe that (f + g) ∧ n ≤ (f ∧ n) + (g ∧ n), so (iii) of Lemma 1.4.3 implies
∫_{E_n} (f + g) ∧ n dµ ≤ ∫_{E_n} f ∧ n dµ + ∫_{E_n} g ∧ n dµ
Letting n → ∞ and using Lemma 1.4.4 gives (iii). As before, (iv) and (v) follow from (i), (iii), and Lemma 1.4.2.
Step 4. We say f is integrable if ∫|f| dµ < ∞. Letting f⁺ = f ∨ 0 and f⁻ = (−f) ∨ 0 (so that f = f⁺ − f⁻ and |f| = f⁺ + f⁻), we define
∫f dµ = ∫f⁺ dµ − ∫f⁻ dµ
The right-hand side is well defined since f + , f − ≤ |f | and we have (iv) in Lemma
1.4.5. For the final time, we will prove our six properties. To do this, it is useful to
know:
Lemma 1.4.6. Suppose f = f_1 − f_2, where f_1, f_2 ≥ 0 and ∫f_i dµ < ∞. Then
∫f dµ = ∫f_1 dµ − ∫f_2 dµ
Proof. f_1 + f⁻ = f_2 + f⁺, and all four functions are ≥ 0, so by (iii) of Lemma 1.4.5,
∫f_1 dµ + ∫f⁻ dµ = ∫(f_1 + f⁻) dµ = ∫(f_2 + f⁺) dµ = ∫f_2 dµ + ∫f⁺ dµ
Rearranging gives the desired conclusion.
Theorem 1.4.7. Suppose f and g are integrable.
(i) If f ≥ 0 a.e., then ∫f dµ ≥ 0.
(ii) For all a ∈ R, ∫af dµ = a∫f dµ.
(iii) ∫(f + g) dµ = ∫f dµ + ∫g dµ.
(iv) If g ≤ f a.e., then ∫g dµ ≤ ∫f dµ.
(v) If g = f a.e., then ∫g dµ = ∫f dµ.
(vi) |∫f dµ| ≤ ∫|f| dµ.

Proof. (i) is trivial. (ii) is clear since if a > 0, then (af)⁺ = a(f⁺), and so on. To prove (iii), observe that f + g = (f⁺ + g⁺) − (f⁻ + g⁻), so using Lemma 1.4.6 and Lemma 1.4.5,
∫(f + g) dµ = ∫(f⁺ + g⁺) dµ − ∫(f⁻ + g⁻) dµ
= ∫f⁺ dµ + ∫g⁺ dµ − ∫f⁻ dµ − ∫g⁻ dµ = ∫f dµ + ∫g dµ
Exercises
1.4.1. Show that if f ≥ 0 and ∫f dµ = 0, then f = 0 a.e.
1.4.2. Let f ≥ 0 and E_{n,m} = {x : m/2ⁿ ≤ f(x) < (m + 1)/2ⁿ}. As n ↑ ∞,
Σ_{m=1}^∞ (m/2ⁿ) µ(E_{n,m}) ↑ ∫f dµ
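A sketch of this approximation for Lebesgue measure on [0, 1], estimating µ(E_{n,m}) with a fine grid of points (the test function and grid size are our choices):

    # Lebesgue sums for f(x) = x^2 on [0, 1]: each grid point carries mass 1/N,
    # and x contributes its dyadic level floor(f(x) * 2^n) / 2^n.
    N = 1_000_000
    f = lambda x: x * x
    xs = [(i + 0.5) / N for i in range(N)]

    for n in [2, 4, 8, 16]:
        s = sum(int(f(x) * 2**n) / 2**n for x in xs) / N
        print(n, s)  # increases toward the integral, 1/3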
1.4.3. Let g be an integrable function on R and ε > 0. (i) Use the definition of the integral to conclude there is a simple function ϕ = Σ_k b_k 1_{A_k} with ∫|g − ϕ| dx < ε. (ii) Use Exercise A.2.1 to approximate the A_k by finite unions of intervals to get a step function
q = Σ_{j=1}^k c_j 1_{(a_{j−1}, a_j)}
with a_0 < a_1 < · · · < a_k, so that ∫|ϕ − q| dx < ε. (iii) Round the corners of q to get a continuous function r so that ∫|q − r| dx < ε.
1.4.4. Prove the Riemann-Lebesgue lemma: if g is integrable, then
lim_{n→∞} ∫g(x) cos(nx) dx = 0
Hint: If g is a step function, this is easy. Now use the previous exercise.
1.5 Properties of the Integral

Theorem 1.5.1. Jensen's inequality. Suppose ϕ is convex, that is,
λϕ(x) + (1 − λ)ϕ(y) ≥ ϕ(λx + (1 − λ)y)
for all λ ∈ (0, 1) and x, y ∈ R. If µ is a probability measure and f and ϕ(f) are integrable, then
ϕ(∫f dµ) ≤ ∫ϕ(f) dµ

Proof. Let c = ∫f dµ and let ℓ(x) = ax + b be a linear function that has ℓ(c) = ϕ(c) and ϕ(x) ≥ ℓ(x). To see that such a function exists, recall that convexity implies
lim_{h↓0} (ϕ(c) − ϕ(c − h))/h ≤ lim_{h↓0} (ϕ(c + h) − ϕ(c))/h
(The limits exist since the difference quotients are monotone.) If we let a be any number between the two limits and let ℓ(x) = a(x − c) + ϕ(c), then ℓ has the desired properties. With the existence of ℓ established, the rest is easy. (iv) in Theorem 1.4.7 implies
∫ϕ(f) dµ ≥ ∫(af + b) dµ = a∫f dµ + b = ℓ(∫f dµ) = ϕ(∫f dµ)
Theorem 1.5.2. Hölder's inequality. If p, q ∈ (1, ∞) with 1/p + 1/q = 1, then
∫|fg| dµ ≤ ‖f‖_p ‖g‖_q
where ‖f‖_r = (∫|f|^r dµ)^{1/r}. In the case p = q = 2, this is the Cauchy-Schwarz inequality, and there is a simple direct proof: for any real θ,
0 ≤ ∫(θ|f| + |g|)² dµ = θ² ∫f² dµ + 2θ ∫|fg| dµ + ∫g² dµ
so the quadratic aθ² + bθ + c on the right-hand side has at most one real root. Recalling the formula for the roots of a quadratic,
(−b ± √(b² − 4ac)) / 2a
we see b² − 4ac ≤ 0, which is the desired result.
Our next goal is to give conditions that guarantee
lim_{n→∞} ∫f_n dµ = ∫ lim_{n→∞} f_n dµ
First, we need a definition. We say that f_n → f in measure if for any ε > 0, µ({x : |f_n(x) − f(x)| > ε}) → 0 as n → ∞. On a space of finite measure, this is
a weaker assumption than fn → f a.e., but the next result is easier to prove in the
greater generality.
Theorem 1.5.3. Bounded convergence theorem. Let E be a set with µ(E) < ∞. Suppose f_n vanishes on Eᶜ, |f_n(x)| ≤ M, and f_n → f in measure. Then
∫f dµ = lim_{n→∞} ∫f_n dµ
Example 1.5.1. Consider the real line R equipped with the Borel sets ℛ and Lebesgue measure λ. The functions f_n(x) = 1/n on [0, n] and 0 otherwise show that the conclusion of Theorem 1.5.3 does not hold when µ(E) = ∞: here f_n → 0 uniformly, yet ∫f_n dλ = 1 for all n.
Proof. Let ε > 0, G_n = {x : |f_n(x) − f(x)| < ε}, and B_n = E − G_n. Using (iii) and (vi) from Theorem 1.4.7,
|∫f dµ − ∫f_n dµ| = |∫(f − f_n) dµ| ≤ ∫|f − f_n| dµ
= ∫_{G_n} |f − f_n| dµ + ∫_{B_n} |f − f_n| dµ ≤ εµ(E) + 2Mµ(B_n)
Since f_n → f in measure, µ(B_n) → 0, and since ε is arbitrary, the desired result follows.
Theorem 1.5.4. Fatou's lemma. If f_n ≥ 0, then
lim inf_{n→∞} ∫f_n dµ ≥ ∫(lim inf_{n→∞} f_n) dµ

Example 1.5.2. Example 1.5.1 shows that we may have strict inequality in Theorem 1.5.4. The functions f_n(x) = n1_{(0,1/n]}(x) on (0,1), equipped with the Borel sets and Lebesgue measure, show that this can happen on a space of finite measure.

Proof. Let g_n = inf_{m≥n} f_m, so that g_n ↑ g = lim inf_{n→∞} f_n and g_n ≤ f_n. Thus it suffices to show that
lim inf_{n→∞} ∫g_n dµ ≥ ∫g dµ
Let E_m ↑ Ω be sets of finite measure. Since g_n ≥ (g_n ∧ m)1_{E_m} ≥ 0, the bounded convergence theorem implies
lim inf_{n→∞} ∫g_n dµ ≥ lim_{n→∞} ∫_{E_m} g_n ∧ m dµ = ∫_{E_m} g ∧ m dµ
Taking the sup over m and using Lemma 1.4.4 gives the desired result.
Theorem 1.5.5. Monotone convergence theorem. If f_n ≥ 0 and f_n ↑ f, then
∫f_n dµ ↑ ∫f dµ
Proof. Fatou's lemma implies lim inf_{n→∞} ∫f_n dµ ≥ ∫f dµ, while (iv) of Lemma 1.4.5 and f_n ≤ f imply ∫f_n dµ ≤ ∫f dµ for all n.

Theorem 1.5.6. Dominated convergence theorem. If f_n → f a.e., |f_n| ≤ g for all n, and g is integrable, then ∫f_n dµ → ∫f dµ.
Proof. Since f_n + g ≥ 0, Fatou's lemma implies
lim inf_{n→∞} ∫(f_n + g) dµ ≥ ∫(f + g) dµ
Subtracting ∫g dµ from both sides gives
lim inf_{n→∞} ∫f_n dµ ≥ ∫f dµ
Applying this conclusion to −f_n (which also satisfies |−f_n| ≤ g) gives
lim sup_{n→∞} ∫f_n dµ ≤ ∫f dµ
and the proof is complete.
Exercises
1.5.1. Let ‖f‖_∞ = inf{M : µ({x : |f(x)| > M}) = 0}. Prove that
∫|fg| dµ ≤ ‖f‖_1 ‖g‖_∞
1.6 Expected Value

In this section, we will restate some properties of the integral derived in the last section in terms of expected value and prove some new ones. To organize things, we will divide the developments into three subsections. If X is a random variable on (Ω, F, P), its expected value is EX = ∫X dP, which is well defined whenever X ≥ 0 or E|X| < ∞. Restating parts of Theorem 1.4.7 in this notation gives:

Theorem 1.6.1. Suppose X, Y ≥ 0, or E|X| and E|Y| < ∞.
(a) E(X + Y) = EX + EY.
(b) E(aX + b) = aEX + b for any real numbers a, b.
(c) If X ≥ Y, then EX ≥ EY.
1.6.1 Inequalities
For probability measures, Theorem 1.5.1 becomes:
Theorem 1.6.2. Jensen's inequality. Suppose ϕ is convex and that E|X| and E|ϕ(X)| < ∞. Then
ϕ(EX) ≤ Eϕ(X)

Figure 1.6. A convex function lies below its chords.
To recall the direction in which the inequality goes, note that if P (X = x) = λ and
P (X = y) = 1 − λ, then (see Figure 1.6)
Eϕ(X) = λϕ(x) + (1 − λ)ϕ(y) ≥ ϕ(λx + (1 − λ)y) = ϕ(EX)
Two useful special cases are |EX| ≤ E|X| and (EX)2 ≤ E(X 2 ).
To state our next result, we need some notation. If we only integrate over A ⊂ Ω, we write
E(X; A) = ∫_A X dP
Theorem 1.6.4. Chebyshev's inequality. Suppose ϕ : R → R has ϕ ≥ 0, let A ∈ ℛ, and let i_A = inf{ϕ(y) : y ∈ A}. Then
i_A P(X ∈ A) ≤ E(ϕ(X); X ∈ A) ≤ Eϕ(X)
Proof. The definition of i_A implies i_A 1_{(X∈A)} ≤ ϕ(X)1_{(X∈A)} ≤ ϕ(X). So taking expected values and using part (c) of Theorem 1.6.1 gives the desired result.
Remark. Some authors call this result Markov’s inequality and use the name
Chebyshev’s inequality for the special case in which ϕ(x) = x 2 and A = {x :
|x| ≥ a}:
a 2 P (|X| ≥ a) ≤ EX 2 (1.6.1)
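A quick Monte Carlo illustration of (1.6.1), as a sketch that assumes numpy is available:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal(1_000_000)  # EX^2 = 1

    for a in [1.0, 2.0, 3.0]:
        lhs = a**2 * np.mean(np.abs(X) >= a)  # a^2 P(|X| >= a), estimated
        print(a, lhs, "<=", np.mean(X**2))    # bounded by EX^2, about 1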
The special case of Theorem 1.6.7 in which Y is constant is called the bounded
convergence theorem.
In the developments below, we will need another result on integration to the limit.
Perhaps the most important special case of this result occurs when g(x) = |x|p with
p > 1 and h(x) = x.
where ε_M = sup{|h(x)|/g(x) : |x| ≥ M}. To check the second inequality, note that when |Y| ≤ M, Ȳ = Y, and we have supposed h(0) = 0. The third inequality follows from the definition of ε_M.
Taking Y = Xn in (b) and using (iii), it follows that
1.6.3 Computing Expected Values

Theorem 1.6.9. Change of variables formula. Let X be a random element of (S, S) with distribution µ, that is, µ(A) = P(X ∈ A). If f is a measurable function from (S, S) to (R, ℛ) so that f ≥ 0 or E|f(X)| < ∞, then
E f(X) = ∫_S f(y) µ(dy)
Remark. To explain the name, write h for X and P ◦ h^{−1} for µ to get
∫_Ω f(h(ω)) dP = ∫_S f(y) (P ◦ h^{−1})(dy)
Proof. We will prove this result by verifying it in four increasingly general special
cases that parallel the way that the integral was defined in Section 1.4. The reader
should note the method employed, since it will be used several times below.
Case 1: Indicator functions. If B ∈ S and f = 1_B, then recalling the relevant definitions shows
∫_Ω 1_B(X(ω)) dP = P(X ∈ B) = µ(B) = ∫_S 1_B(y) µ(dy)
Case 2: Simple functions. Let f(y) = Σ_{m=1}^n c_m 1_{B_m}(y). By the linearity of expected value, the result for indicator functions, and the linearity of the integral,
E f(X) = Σ_{m=1}^n c_m E 1_{B_m}(X) = Σ_{m=1}^n c_m ∫_S 1_{B_m}(y) µ(dy) = ∫_S f(y) µ(dy)
Case 3: Nonnegative functions. Now if f ≥ 0 and we let
f_n(y) = ([2ⁿ f(y)]/2ⁿ) ∧ n
where [x] = the largest integer ≤ x and a ∧ b = min{a, b}, then the f_n are simple and f_n ↑ f, so using the result for simple functions and the monotone convergence theorem,
E f(X) = lim_{n→∞} E f_n(X) = lim_{n→∞} ∫_S f_n(y) µ(dy) = ∫_S f(y) µ(dy)
Case 4: Integrable functions. The general case now follows by writing f = f⁺ − f⁻ and applying Case 3 to f⁺ and f⁻.
If X is a random variable with EX² < ∞ and we let µ = EX, then the variance of X is defined to be var(X) = E(X − µ)². To compute the variance, the following formula is useful:
var(X) = E(X − µ)² = EX² − 2µEX + µ² = EX² − (EX)²
Here EX² is the expected value of X². When we want the square of EX, we will write (EX)². Since E(aX + b) = aEX + b by (b) of Theorem 1.6.1, it follows easily from the definition that
var(aX + b) = E(aX + b − E(aX + b))² = E(a(X − µ))² = a² var(X)
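A numeric check of these identities, again assuming numpy (the exponential sample is an arbitrary choice):

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.exponential(scale=2.0, size=1_000_000)

    var_def = np.mean((X - X.mean())**2)   # E(X - mu)^2
    var_alt = np.mean(X**2) - X.mean()**2  # EX^2 - (EX)^2
    print(var_def, var_alt)                # both about 4 = scale^2

    a, b = 3.0, 7.0
    print(np.var(a * X + b), a**2 * np.var(X))  # var(aX+b) = a^2 var(X)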
We turn now to concrete examples and leave the calculus in the first two examples
to the reader. (Integrate by parts.)
We will next consider some discrete distributions. The first is very simple, but
will be useful several times below, so we record it here.
For the Bernoulli distribution, with P(X = 1) = p and P(X = 0) = 1 − p,
EX = p · 1 + (1 − p) · 0 = p
and since X² = X, var(X) = EX² − (EX)² = p − p² = p(1 − p).
To evaluate the moments of the Poisson random variable X, which has P(X = j) = e^{−λ} λ^j/j! for j = 0, 1, 2, . . ., we use a little inspiration to observe that for k ≥ 1
E(X(X − 1) · · · (X − k + 1)) = Σ_{j=k}^∞ j(j − 1) · · · (j − k + 1) e^{−λ} λ^j/j!
= λ^k Σ_{j=k}^∞ e^{−λ} λ^{j−k}/(j − k)! = λ^k
where the equalities follow from (i) the fact that j(j − 1) · · · (j − k + 1) = 0 when j < k, (ii) canceling part of the factorial, and (iii) the fact that the Poisson distribution has total mass 1. Using the last formula with k = 1 and k = 2, it follows that EX = λ, while
var(X) = E(X(X − 1)) + EX − (EX)² = λ² + λ − λ² = λ
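A numeric check of the factorial-moment computation, as a sketch assuming numpy:

    import numpy as np

    rng = np.random.default_rng(2)
    lam = 3.5
    X = rng.poisson(lam, size=1_000_000)

    print(np.mean(X))            # about lam:     EX = lambda
    print(np.mean(X * (X - 1)))  # about lam**2:  E X(X-1) = lambda^2
    print(np.var(X))             # about lam:     var(X) = lambda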