Lecture notes in advanced probability theory

ADVANCED PROBABILITY

JAMES NORRIS

Contents
0. Review of measure and integration
1. Conditional expectation
2. Martingales in discrete time
3. Applications of martingale theory
4. Random processes in continuous time
5. Weak convergence
6. Large deviations
7. Brownian motion
8. Poisson random measures
9. Lévy processes

Date: November 19, 2016.

0. Review of measure and integration

This review covers briefly some notions which are discussed in detail in my notes on Probability and Measure (from now on, [PM]), Sections 1 to 3.

0.1. Measurable spaces. Let $E$ be a set. A set $\mathcal{E}$ of subsets of $E$ is called a $\sigma$-algebra on $E$ if it contains the empty set and, for all $A \in \mathcal{E}$ and every sequence $(A_n : n \in \mathbb{N})$ in $\mathcal{E}$,
\[
E \setminus A \in \mathcal{E}, \qquad \bigcup_{n \in \mathbb{N}} A_n \in \mathcal{E}.
\]
Let $\mathcal{E}$ be a $\sigma$-algebra on $E$. A pair such as $(E, \mathcal{E})$ is called a measurable space. The elements of $\mathcal{E}$ are called measurable sets. A function $\mu : \mathcal{E} \to [0, \infty]$ is called a measure on $(E, \mathcal{E})$ if $\mu(\emptyset) = 0$ and, for every sequence $(A_n : n \in \mathbb{N})$ of disjoint sets in $\mathcal{E}$,
\[
\mu\Bigl(\bigcup_{n \in \mathbb{N}} A_n\Bigr) = \sum_{n \in \mathbb{N}} \mu(A_n).
\]
A triple such as $(E, \mathcal{E}, \mu)$ is called a measure space.

Given a set $E$ which is equipped with a topology, the Borel $\sigma$-algebra on $E$ is the smallest $\sigma$-algebra containing all the open sets. We denote this $\sigma$-algebra by $\mathcal{B}(E)$ and call its elements Borel sets. We use this construction most often in the cases where $E$ is the real line $\mathbb{R}$ or the extended half-line $[0, \infty]$. We write $\mathcal{B}$ for $\mathcal{B}(\mathbb{R})$.

0.2. Integration of measurable functions. Given measurable spaces $(E, \mathcal{E})$ and $(E', \mathcal{E}')$ and a function $f : E \to E'$, we say that $f$ is measurable if $f^{-1}(A) \in \mathcal{E}$ whenever $A \in \mathcal{E}'$. If we refer to a measurable function $f$ on $(E, \mathcal{E})$ without specifying its range then, by default, we take $E' = \mathbb{R}$ and $\mathcal{E}' = \mathcal{B}$. By a non-negative measurable function on $E$ we mean any function $f : E \to [0, \infty]$ which is measurable when we use the Borel $\sigma$-algebra on $[0, \infty]$. Note that we allow the value $\infty$ for non-negative measurable functions but not for real-valued measurable functions. We denote the set of real-valued measurable functions by $m\mathcal{E}$ and the set of non-negative measurable functions by $m\mathcal{E}^+$.

Theorem 0.2.1. Let $(E, \mathcal{E}, \mu)$ be a measure space. There exists a unique map $\tilde\mu : m\mathcal{E}^+ \to [0, \infty]$ with the following properties:
(a) $\tilde\mu(1_A) = \mu(A)$ for all $A \in \mathcal{E}$,
(b) $\tilde\mu(\alpha f + \beta g) = \alpha \tilde\mu(f) + \beta \tilde\mu(g)$ for all $f, g \in m\mathcal{E}^+$ and all $\alpha, \beta \in [0, \infty)$,
(c) $\tilde\mu(f_n) \to \tilde\mu(f)$ as $n \to \infty$ whenever $(f_n : n \in \mathbb{N})$ is a non-decreasing sequence in $m\mathcal{E}^+$ with pointwise limit $f$.

The map $\tilde\mu$ is called the integral with respect to $\mu$. We will usually simply write $\mu$ instead of $\tilde\mu$. We say that $f$ is a simple function if it is a finite linear combination of indicator functions of measurable sets, with positive coefficients. Thus $f$ is a simple function if there exist $n \ge 0$, and $\alpha_k \in (0, \infty)$ and $A_k \in \mathcal{E}$ for $k = 1, \dots, n$, such that
\[
f = \sum_{k=1}^n \alpha_k 1_{A_k}.
\]
Note that properties (a) and (b) force the integral of such a simple function $f$ to be
\[
\mu(f) = \sum_{k=1}^n \alpha_k \mu(A_k).
\]
Note also that property (b) implies that $\mu(f) \le \mu(g)$ whenever $f \le g$.

Property (c) is called monotone convergence. Given $f \in m\mathcal{E}^+$, we can define a non-decreasing sequence of simple functions $(f_n : n \in \mathbb{N})$ by
\[
f_n(x) = \bigl(2^{-n} \lfloor 2^n f(x) \rfloor\bigr) \wedge n, \qquad x \in E.
\]
Then $f_n(x) \to f(x)$ as $n \to \infty$ for all $x \in E$. So, by monotone convergence, we have
\[
\mu(f) = \lim_{n \to \infty} \mu(f_n).
\]
We have proved the uniqueness statement in Theorem 0.2.1.

For measurable functions $f$ and $g$, we say that $f = g$ almost everywhere if
\[
\mu(\{x \in E : f(x) \ne g(x)\}) = 0.
\]
It is straightforward to see that, for $f \in m\mathcal{E}^+$, we have $\mu(f) = 0$ if and only if $f = 0$ almost everywhere.
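The dyadic approximation above is easy to see in action numerically. The following minimal Python sketch (not part of the notes; the integrand $f(x) = x^2$ on $[0,1]$ and the grid are arbitrary illustrative choices) computes $\mu(f_n)$ for Lebesgue measure on $[0,1]$ by a Riemann sum and shows the values increasing to $\mu(f) = 1/3$, as monotone convergence predicts.

```python
import numpy as np

def simple_approx(f, x, n):
    # f_n(x) = (2^{-n} * floor(2^n f(x))) ∧ n, the dyadic simple-function approximation
    return np.minimum(np.floor(2.0**n * f(x)) / 2.0**n, n)

f = lambda x: x**2
x = np.linspace(0.0, 1.0, 200001)      # fine grid standing in for Lebesgue measure on [0, 1]
dx = x[1] - x[0]

for n in [1, 2, 4, 8, 16]:
    mu_fn = simple_approx(f, x, n).sum() * dx   # Riemann sum approximating mu(f_n)
    print(n, round(mu_fn, 6))                   # non-decreasing, approaching mu(f) = 1/3
```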
Lemma 0.2.2 (Fatou's lemma). Let $(f_n : n \in \mathbb{N})$ be a sequence of non-negative measurable functions. Then
\[
\mu\bigl(\liminf_{n \to \infty} f_n\bigr) \le \liminf_{n \to \infty} \mu(f_n).
\]
The proof is by applying monotone convergence to the non-decreasing sequence of functions $(\inf_{m \ge n} f_m : n \in \mathbb{N})$.

Given a (real-valued) measurable function $f$, we say that $f$ is integrable with respect to $\mu$ if $\mu(|f|) < \infty$. We write $L^1(E, \mathcal{E}, \mu)$ for the set of such integrable functions, or simply $L^1$ when the choice of measure space is clear. The integral is extended to $L^1$ by setting
\[
\mu(f) = \mu(f^+) - \mu(f^-)
\]
where $f^\pm = (\pm f) \vee 0$. Then $L^1$ is a vector space and the map $\mu : L^1 \to \mathbb{R}$ is linear.

Theorem 0.2.3 (Dominated convergence). Let $(f_n : n \in \mathbb{N})$ be a sequence of measurable functions. Suppose that $f_n(x)$ converges as $n \to \infty$, with limit $f(x)$, for all $x \in E$. Suppose further that there exists an integrable function $g$ such that $|f_n| \le g$ for all $n$. Then $f_n$ is integrable for all $n$, and so is $f$, and $\mu(f_n) \to \mu(f)$ as $n \to \infty$.

The proof is by applying Fatou's lemma to the two sequences of non-negative measurable functions $(g \pm f_n : n \in \mathbb{N})$.

0.3. Product measure and Fubini's theorem. Let $(E_1, \mathcal{E}_1, \mu_1)$ and $(E_2, \mathcal{E}_2, \mu_2)$ be finite (or $\sigma$-finite) measure spaces. The product $\sigma$-algebra $\mathcal{E} = \mathcal{E}_1 \otimes \mathcal{E}_2$ is the $\sigma$-algebra on $E = E_1 \times E_2$ generated by subsets of the form $A_1 \times A_2$ for $A_1 \in \mathcal{E}_1$ and $A_2 \in \mathcal{E}_2$.

Theorem 0.3.1. There exists a unique measure $\mu = \mu_1 \otimes \mu_2$ on $\mathcal{E}$ such that, for all $A_1 \in \mathcal{E}_1$ and $A_2 \in \mathcal{E}_2$,
\[
\mu(A_1 \times A_2) = \mu_1(A_1)\,\mu_2(A_2).
\]

Theorem 0.3.2 (Fubini's theorem). Let $f$ be a non-negative $\mathcal{E}$-measurable function on $E$. For $x_1 \in E_1$, define a function $f_{x_1}$ on $E_2$ by $f_{x_1}(x_2) = f(x_1, x_2)$. Then $f_{x_1}$ is $\mathcal{E}_2$-measurable for all $x_1 \in E_1$. Hence, we can define a function $f_1$ on $E_1$ by $f_1(x_1) = \mu_2(f_{x_1})$. Then $f_1$ is $\mathcal{E}_1$-measurable and $\mu_1(f_1) = \mu(f)$.

By some routine arguments, it is not hard to see that $\mu(f) = \tilde\mu(\tilde f)$, where $\tilde\mu = \mu_2 \otimes \mu_1$ and $\tilde f$ is the function on $E_2 \times E_1$ given by $\tilde f(x_2, x_1) = f(x_1, x_2)$. Hence, with obvious notation, it follows from Fubini's theorem that, for any non-negative $\mathcal{E}$-measurable function $f$, we have $\mu_1(f_1) = \mu_2(f_2)$. This is more usually written as
\[
\int_{E_1}\Bigl(\int_{E_2} f(x_1, x_2)\,\mu_2(dx_2)\Bigr)\mu_1(dx_1) = \int_{E_2}\Bigl(\int_{E_1} f(x_1, x_2)\,\mu_1(dx_1)\Bigr)\mu_2(dx_2).
\]
We refer to [PM, Section 3.6] for more discussion, in particular for the case where the assumption of non-negativity is replaced by one of integrability.

1. Conditional expectation

We say that $(\Omega, \mathcal{F}, \mathbb{P})$ is a probability space if it is a measure space with the property that $\mathbb{P}(\Omega) = 1$. Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space. The elements of $\mathcal{F}$ are called events and $\mathbb{P}$ is called a probability measure. A measurable function $X$ on $(\Omega, \mathcal{F})$ is called a random variable. The integral of a random variable $X$ with respect to $\mathbb{P}$ is written $\mathbb{E}(X)$ and is called the expectation of $X$. We use 'almost surely' to mean 'almost everywhere' in this context.

A probability space gives us a mathematical framework in which to model probabilities of events subject to randomness and average values of random quantities. It is often natural also to take a partial average, which may be thought of as integrating out some variables and not others. This is made precise in greatest generality in the notion of conditional expectation. We first give three motivating examples, then establish the notion in general, and finally discuss some of its properties.

1.1. Discrete case. Let $(G_n : n \in \mathbb{N})$ be a sequence of disjoint events whose union is $\Omega$. Set
\[
\mathcal{G} = \sigma(G_n : n \in \mathbb{N}) = \Bigl\{\bigcup_{n \in I} G_n : I \subseteq \mathbb{N}\Bigr\}.
\]
For any integrable random variable $X$, we can define
\[
Y = \sum_{n \in \mathbb{N}} \mathbb{E}(X \mid G_n)\,1_{G_n}
\]
where we set $\mathbb{E}(X \mid G_n) = \mathbb{E}(X 1_{G_n})/\mathbb{P}(G_n)$ when $\mathbb{P}(G_n) > 0$ and set $\mathbb{E}(X \mid G_n) = 0$ when $\mathbb{P}(G_n) = 0$. It is easy to check that $Y$ has the following two properties:
(a) $Y$ is $\mathcal{G}$-measurable,
(b) $Y$ is integrable and $\mathbb{E}(X 1_A) = \mathbb{E}(Y 1_A)$ for all $A \in \mathcal{G}$.
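Here is a minimal Monte Carlo sketch of the discrete construction (not from the notes; the block structure and the distribution of $X$ are arbitrary illustrative choices): $Y$ replaces $X$ on each block $G_n$ by the block average, and property (b) can be checked numerically for a union of blocks.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10**6
block = rng.integers(0, 4, size=N)        # which event G_n (n = 0,...,3) each sample lies in
X = rng.normal(size=N) + block            # an integrable random variable X

# Y = sum_n E(X | G_n) 1_{G_n}: on each block, Y equals the block average of X
block_mean = np.array([X[block == n].mean() for n in range(4)])
Y = block_mean[block]

# Property (b): E(X 1_A) = E(Y 1_A) for A in the sigma-algebra generated by the blocks,
# e.g. A = G_0 ∪ G_2
A = (block == 0) | (block == 2)
print((X * A).mean(), (Y * A).mean())     # agree up to Monte Carlo error
```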
1.2. Gaussian case. Let $(W, X)$ be a Gaussian random variable in $\mathbb{R}^2$. Set
\[
\mathcal{G} = \sigma(W) = \{\{W \in B\} : B \in \mathcal{B}\}.
\]
Write $Y = aW + b$, where $a, b \in \mathbb{R}$ are chosen to satisfy
\[
a\,\mathbb{E}(W) + b = \mathbb{E}(X), \qquad a \operatorname{var} W = \operatorname{cov}(W, X).
\]
Then $\mathbb{E}(X - Y) = 0$ and
\[
\operatorname{cov}(W, X - Y) = \operatorname{cov}(W, X) - \operatorname{cov}(W, Y) = 0
\]
so $W$ and $X - Y$ are independent. Hence $Y$ satisfies
(a) $Y$ is $\mathcal{G}$-measurable,
(b) $Y$ is integrable and $\mathbb{E}(X 1_A) = \mathbb{E}(Y 1_A)$ for all $A \in \mathcal{G}$.

1.3. Conditional density functions. Suppose that $U$ and $V$ are random variables having a joint density function $f_{U,V}(u, v)$ in $\mathbb{R}^2$. Then $U$ has density function $f_U$ given by
\[
f_U(u) = \int_{\mathbb{R}} f_{U,V}(u, v)\,dv.
\]
The conditional density function $f_{V \mid U}(v \mid u)$ of $V$ given $U$ is defined by
\[
f_{V \mid U}(v \mid u) = f_{U,V}(u, v)/f_U(u)
\]
where we interpret $0/0$ as $0$ if necessary. Let $h : \mathbb{R} \to \mathbb{R}$ be a Borel function and suppose that $X = h(V)$ is integrable. Let
\[
g(u) = \int_{\mathbb{R}} h(v) f_{V \mid U}(v \mid u)\,dv.
\]
Set $\mathcal{G} = \sigma(U)$ and $Y = g(U)$. Then $Y$ satisfies
(a) $Y$ is $\mathcal{G}$-measurable,
(b) $Y$ is integrable and $\mathbb{E}(X 1_A) = \mathbb{E}(Y 1_A)$ for all $A \in \mathcal{G}$.
To see (b), note that every $A \in \mathcal{G}$ takes the form $A = \{U \in B\}$, for some Borel set $B$. Then, by Fubini's theorem,
\[
\mathbb{E}(X 1_A) = \int_{\mathbb{R}^2} h(v) 1_B(u) f_{U,V}(u, v)\,du\,dv
= \int_{\mathbb{R}}\Bigl(\int_{\mathbb{R}} h(v) f_{V \mid U}(v \mid u)\,dv\Bigr) f_U(u) 1_B(u)\,du = \mathbb{E}(Y 1_A).
\]
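A quick simulation check of the Gaussian case of Section 1.2 (illustrative only; the covariance matrix and the event are arbitrary choices): with $a = \operatorname{cov}(W,X)/\operatorname{var} W$ and $b = \mathbb{E}(X) - a\,\mathbb{E}(W)$, the variable $Y = aW + b$ satisfies $\mathbb{E}(X 1_A) = \mathbb{E}(Y 1_A)$ for events of the form $A = \{W \in B\}$.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 10**6
mean = [1.0, 2.0]
cov = [[2.0, 0.8],
       [0.8, 1.0]]                      # joint Gaussian law of (W, X)
W, X = rng.multivariate_normal(mean, cov, size=N).T

a = np.cov(W, X)[0, 1] / W.var()        # a * var(W) = cov(W, X)
b = X.mean() - a * W.mean()             # a * E(W) + b = E(X)
Y = a * W + b

A = (W > 1.5)                           # an event of the form {W ∈ B}
print((X * A).mean(), (Y * A).mean())   # agree up to Monte Carlo error
```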
1.4. Existence and uniqueness. We will use in this subsection the Hilbert space structure of the set $L^2$ of square integrable random variables. See [PM, Section 5] for details.

Theorem 1.4.1. Let $X$ be an integrable random variable and let $\mathcal{G} \subseteq \mathcal{F}$ be a $\sigma$-algebra. Then there exists a random variable $Y$ such that
(a) $Y$ is $\mathcal{G}$-measurable,
(b) $Y$ is integrable and $\mathbb{E}(X 1_A) = \mathbb{E}(Y 1_A)$ for all $A \in \mathcal{G}$.
Moreover, if $Y'$ also satisfies (a) and (b), then $Y = Y'$ almost surely.

The same statement holds with 'integrable' replaced by 'non-negative' throughout. We leave this extension as an exercise.

We call $Y$ (a version of) the conditional expectation of $X$ given $\mathcal{G}$ and write
\[
Y = \mathbb{E}(X \mid \mathcal{G}) \quad \text{almost surely}.
\]
In the case where $\mathcal{G} = \sigma(G)$ for some random variable $G$, we also write $Y = \mathbb{E}(X \mid G)$ almost surely. In the case where $X = 1_A$ for some event $A$, we write $Y = \mathbb{P}(A \mid \mathcal{G})$ almost surely. The preceding three examples show how to construct explicit versions of the conditional expectation in certain simple cases. In general, we have to live with the indirect approach provided by the theorem.

Proof. (Uniqueness.) Suppose that $Y$ satisfies (a) and (b) and that $Y'$ satisfies (a) and (b) for another integrable random variable $X'$, with $X \le X'$ almost surely. Consider the non-negative random variable $Z = (Y - Y') 1_A$, where $A = \{Y \ge Y'\} \in \mathcal{G}$. Then
\[
\mathbb{E}(Y 1_A) = \mathbb{E}(X 1_A) \le \mathbb{E}(X' 1_A) = \mathbb{E}(Y' 1_A)
\]
so $\mathbb{E}(Z) \le 0$ and so $Z = 0$ almost surely, which implies that $Y \le Y'$ almost surely. In the case $X = X'$, we deduce that $Y = Y'$ almost surely.

(Existence.) Assume for now that $X \in L^2(\mathcal{F})$. Since $L^2(\mathcal{G})$ is complete, it is a closed subspace of $L^2(\mathcal{F})$, so $X$ has an orthogonal projection $Y$ on $L^2(\mathcal{G})$, that is, there exists $Y \in L^2(\mathcal{G})$ such that $\mathbb{E}((X - Y)Z) = 0$ for all $Z \in L^2(\mathcal{G})$. In particular, for any $A \in \mathcal{G}$, we can take $Z = 1_A$ to see that $\mathbb{E}(X 1_A) = \mathbb{E}(Y 1_A)$. Thus $Y$ satisfies (a) and (b).

Assume now that $X \ge 0$. Then $X_n = X \wedge n \in L^2(\mathcal{F})$ and $0 \le X_n \uparrow X$ as $n \to \infty$. We have shown, for each $n$, that there exists $Y_n \in L^2(\mathcal{G})$ such that, for all $A \in \mathcal{G}$,
\[
\mathbb{E}(X_n 1_A) = \mathbb{E}(Y_n 1_A)
\]
and moreover that $0 \le Y_n \le Y_{n+1}$ almost surely. Define
\[
\Omega_0 = \{\omega \in \Omega : 0 \le Y_n(\omega) \le Y_{n+1}(\omega) \text{ for all } n\}
\]
and set $Y_\infty = \lim_{n \to \infty} Y_n 1_{\Omega_0}$. Then $Y_\infty$ is a non-negative $\mathcal{G}$-measurable random variable and, by monotone convergence, for all $A \in \mathcal{G}$,
\[
\mathbb{E}(X 1_A) = \mathbb{E}(Y_\infty 1_A).
\]
In particular, since $X$ is integrable, we have $\mathbb{E}(Y_\infty) = \mathbb{E}(X) < \infty$, so $Y_\infty < \infty$ almost surely. Set $Y = Y_\infty 1_{\{Y_\infty < \infty\}}$. Then $Y$ is a random variable satisfying (a) and (b).

Finally, for a general integrable random variable $X$, we can apply the preceding construction to $X^+$ and $X^-$ to obtain $Y^+$ and $Y^-$. Then $Y = Y^+ - Y^-$ satisfies (a) and (b). □

1.5. Properties of conditional expectation. Let $X$ be an integrable random variable and let $\mathcal{G} \subseteq \mathcal{F}$ be a $\sigma$-algebra. The following properties follow directly from Theorem 1.4.1:
(i) $\mathbb{E}(\mathbb{E}(X \mid \mathcal{G})) = \mathbb{E}(X)$,
(ii) if $X$ is $\mathcal{G}$-measurable, then $\mathbb{E}(X \mid \mathcal{G}) = X$ almost surely,
(iii) if $X$ is independent of $\mathcal{G}$, then $\mathbb{E}(X \mid \mathcal{G}) = \mathbb{E}(X)$ almost surely.
In the proof of Theorem 1.4.1, we showed also
(iv) if $X \ge 0$ almost surely, then $\mathbb{E}(X \mid \mathcal{G}) \ge 0$ almost surely.
Next, for $\alpha, \beta \in \mathbb{R}$ and any integrable random variable $Y$, we have
(v) $\mathbb{E}(\alpha X + \beta Y \mid \mathcal{G}) = \alpha\,\mathbb{E}(X \mid \mathcal{G}) + \beta\,\mathbb{E}(Y \mid \mathcal{G})$ almost surely.
To see this, one checks that the right hand side satisfies the properties (a) and (b) from Theorem 1.4.1 which characterize the left hand side.

The basic convergence theorems for expectation have counterparts for conditional expectation. Consider a sequence of random variables $X_n$ in the limit $n \to \infty$. If $0 \le X_n \uparrow X$ almost surely, then $\mathbb{E}(X_n \mid \mathcal{G}) \uparrow Y$ almost surely, for some $\mathcal{G}$-measurable random variable $Y$; so, by monotone convergence, for all $A \in \mathcal{G}$,
\[
\mathbb{E}(X 1_A) = \lim_n \mathbb{E}(X_n 1_A) = \lim_n \mathbb{E}(\mathbb{E}(X_n \mid \mathcal{G}) 1_A) = \mathbb{E}(Y 1_A),
\]
which implies that $Y = \mathbb{E}(X \mid \mathcal{G})$ almost surely. We have proved the conditional monotone convergence theorem:
(vi) if $0 \le X_n \uparrow X$ almost surely, then $\mathbb{E}(X_n \mid \mathcal{G}) \uparrow \mathbb{E}(X \mid \mathcal{G})$ almost surely.
Next, by essentially the same arguments used for the original results, we can deduce conditional forms of Fatou's lemma and the dominated convergence theorem:
(vii) if $X_n \ge 0$ for all $n$, then $\mathbb{E}(\liminf_n X_n \mid \mathcal{G}) \le \liminf_n \mathbb{E}(X_n \mid \mathcal{G})$ almost surely,
(viii) if $X_n \to X$ and $|X_n| \le Y$ for all $n$, almost surely, for some integrable random variable $Y$, then $\mathbb{E}(X_n \mid \mathcal{G}) \to \mathbb{E}(X \mid \mathcal{G})$ almost surely.

There is a conditional form of Jensen's inequality. Let $c : \mathbb{R} \to (-\infty, \infty]$ be a convex function. Then $c$ is the supremum of a sequence of affine functions:
\[
c(x) = \sup_{n \in \mathbb{N}} (a_n x + b_n), \qquad x \in \mathbb{R}.
\]
Hence, $\mathbb{E}(c(X) \mid \mathcal{G})$ is well defined and, almost surely, for all $n$,
\[
\mathbb{E}(c(X) \mid \mathcal{G}) \ge a_n\,\mathbb{E}(X \mid \mathcal{G}) + b_n.
\]
On taking the supremum over $n \in \mathbb{N}$ in this inequality, we obtain
(ix) if $c : \mathbb{R} \to (-\infty, \infty]$ is convex, then $\mathbb{E}(c(X) \mid \mathcal{G}) \ge c(\mathbb{E}(X \mid \mathcal{G}))$ almost surely.
In particular, for $1 \le p < \infty$,
\[
\|\mathbb{E}(X \mid \mathcal{G})\|_p^p = \mathbb{E}(|\mathbb{E}(X \mid \mathcal{G})|^p) \le \mathbb{E}(\mathbb{E}(|X|^p \mid \mathcal{G})) = \mathbb{E}(|X|^p) = \|X\|_p^p.
\]
So we have
(x) $\|\mathbb{E}(X \mid \mathcal{G})\|_p \le \|X\|_p$ for all $1 \le p \le \infty$.

For any $\sigma$-algebra $\mathcal{H} \subseteq \mathcal{G}$, the random variable $Y = \mathbb{E}(\mathbb{E}(X \mid \mathcal{G}) \mid \mathcal{H})$ is $\mathcal{H}$-measurable and satisfies, for all $A \in \mathcal{H}$,
\[
\mathbb{E}(Y 1_A) = \mathbb{E}(\mathbb{E}(X \mid \mathcal{G}) 1_A) = \mathbb{E}(X 1_A)
\]
so we have the tower property:
(xi) if $\mathcal{H} \subseteq \mathcal{G}$, then $\mathbb{E}(\mathbb{E}(X \mid \mathcal{G}) \mid \mathcal{H}) = \mathbb{E}(X \mid \mathcal{H})$ almost surely.
We can always take out what is known:
(xii) if $Y$ is bounded and $\mathcal{G}$-measurable, then $\mathbb{E}(YX \mid \mathcal{G}) = Y\,\mathbb{E}(X \mid \mathcal{G})$ almost surely.
To see this, consider first the case where $Y = 1_B$ for some $B \in \mathcal{G}$. Then, for $A \in \mathcal{G}$,
\[
\mathbb{E}(Y\,\mathbb{E}(X \mid \mathcal{G}) 1_A) = \mathbb{E}(\mathbb{E}(X \mid \mathcal{G}) 1_{A \cap B}) = \mathbb{E}(X 1_{A \cap B}) = \mathbb{E}(YX 1_A),
\]
which implies that $\mathbb{E}(YX \mid \mathcal{G}) = Y\,\mathbb{E}(X \mid \mathcal{G})$ almost surely. The result extends to simple $\mathcal{G}$-measurable random variables $Y$ by linearity, then to the case $X \ge 0$ and any bounded non-negative $\mathcal{G}$-measurable random variable $Y$ by monotone convergence. The general case follows by writing $X = X^+ - X^-$ and $Y = Y^+ - Y^-$.
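The listed properties are easy to test numerically. The following sketch (illustrative only, not from the notes; the partitions and the law of $X$ are arbitrary choices) checks the tower property (xi) and conditional Jensen (ix) for nested discrete $\sigma$-algebras generated by a finer and a coarser partition of the sample space.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 10**6
D = rng.integers(0, 6, size=N)          # fine partition: G = sigma(D)
parity = D % 2                          # coarser partition: H = sigma(parity), H ⊆ G
X = rng.exponential(size=N) * (1 + D)

def cond_exp(X, labels):
    # E(X | sigma(labels)): replace X by its average on each block of the partition
    means = {v: X[labels == v].mean() for v in np.unique(labels)}
    return np.vectorize(means.get)(labels)

EX_G = cond_exp(X, D)
EX_H = cond_exp(X, parity)

# (xi) tower property: E( E(X|G) | H ) = E(X|H)
print(np.max(np.abs(cond_exp(EX_G, parity) - EX_H)))        # ≈ 0 (floating-point rounding only)

# (ix) conditional Jensen with c(x) = x^2: E(X^2|G) ≥ (E(X|G))^2
print(np.all(cond_exp(X**2, D) >= EX_G**2 - 1e-9))          # True
```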
Finally,
(xiii) if $\sigma(X, \mathcal{G})$ is independent of $\mathcal{H}$, then $\mathbb{E}(X \mid \sigma(\mathcal{G}, \mathcal{H})) = \mathbb{E}(X \mid \mathcal{G})$ almost surely.
For, suppose $A \in \mathcal{G}$ and $B \in \mathcal{H}$; then
\[
\mathbb{E}\bigl(\mathbb{E}(X \mid \sigma(\mathcal{G}, \mathcal{H})) 1_{A \cap B}\bigr) = \mathbb{E}(X 1_{A \cap B}) = \mathbb{E}(X 1_A)\,\mathbb{P}(B) = \mathbb{E}(\mathbb{E}(X \mid \mathcal{G}) 1_A)\,\mathbb{P}(B) = \mathbb{E}(\mathbb{E}(X \mid \mathcal{G}) 1_{A \cap B}).
\]
The set of such intersections $A \cap B$ is a $\pi$-system generating $\sigma(\mathcal{G}, \mathcal{H})$, so the desired formula follows from [PM, Proposition 3.1.4].

Lemma 1.5.1. Let $X \in L^1$. Then the set of random variables $Y$ of the form $Y = \mathbb{E}(X \mid \mathcal{G})$, where $\mathcal{G} \subseteq \mathcal{F}$ is a $\sigma$-algebra, is uniformly integrable.

Proof. By [PM, Lemma 6.2.1], given $\varepsilon > 0$, we can find $\delta > 0$ so that $\mathbb{E}(|X| 1_A) \le \varepsilon$ whenever $\mathbb{P}(A) \le \delta$. Then choose $\lambda < \infty$ so that $\mathbb{E}(|X|) \le \lambda\delta$. Suppose $Y = \mathbb{E}(X \mid \mathcal{G})$; then $|Y| \le \mathbb{E}(|X| \mid \mathcal{G})$. In particular, $\mathbb{E}(|Y|) \le \mathbb{E}(|X|)$, so
\[
\mathbb{P}(|Y| \ge \lambda) \le \lambda^{-1}\mathbb{E}(|Y|) \le \delta.
\]
Then
\[
\mathbb{E}(|Y| 1_{\{|Y| \ge \lambda\}}) \le \mathbb{E}(|X| 1_{\{|Y| \ge \lambda\}}) \le \varepsilon.
\]
Since $\lambda$ was chosen independently of $\mathcal{G}$, we are done. □

2. Martingales in discrete time

2.1. Definitions. Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space. We assume that $(\Omega, \mathcal{F}, \mathbb{P})$ is equipped with a filtration, that is to say, a sequence $(\mathcal{F}_n)_{n \ge 0}$ of $\sigma$-algebras such that, for all $n \ge 0$,
\[
\mathcal{F}_n \subseteq \mathcal{F}_{n+1} \subseteq \mathcal{F}.
\]
Set $\mathcal{F}_\infty = \sigma(\mathcal{F}_n : n \ge 0)$. Then $\mathcal{F}_\infty \subseteq \mathcal{F}$. We allow the possibility that $\mathcal{F}_\infty \ne \mathcal{F}$. We interpret the parameter $n$ as time, and the $\sigma$-algebra $\mathcal{F}_n$ as the extent of our knowledge at time $n$.

By a random process (in discrete time) we mean a sequence of random variables $(X_n)_{n \ge 0}$. Each random process $X = (X_n)_{n \ge 0}$ has a natural filtration $(\mathcal{F}_n^X)_{n \ge 0}$, given by
\[
\mathcal{F}_n^X = \sigma(X_0, \dots, X_n).
\]
Then $\mathcal{F}_n^X$ models what we know about $X$ by time $n$. We say that $(X_n)_{n \ge 0}$ is adapted if $X_n$ is $\mathcal{F}_n$-measurable for all $n \ge 0$. It is equivalent to require that $\mathcal{F}_n^X \subseteq \mathcal{F}_n$ for all $n$. In this section we consider only real-valued or non-negative random processes. We say that $(X_n)_{n \ge 0}$ is integrable if $X_n$ is an integrable random variable for all $n \ge 0$.

A martingale is an adapted integrable random process $(X_n)_{n \ge 0}$ such that, for all $n \ge 0$,
\[
\mathbb{E}(X_{n+1} \mid \mathcal{F}_n) = X_n \quad \text{almost surely}.
\]
If equality is replaced in this condition by $\le$, then we call $X$ a supermartingale. On the other hand, if equality is replaced by $\ge$, then we call $X$ a submartingale. Note that every process which is a martingale with respect to the given filtration $(\mathcal{F}_n)_{n \ge 0}$ is also a martingale with respect to its natural filtration.

2.2. Optional stopping. We say that a random variable $T : \Omega \to \{0, 1, 2, \dots\} \cup \{\infty\}$ is a stopping time if $\{T \le n\} \in \mathcal{F}_n$ for all $n \ge 0$. For a stopping time $T$, we set
\[
\mathcal{F}_T = \{A \in \mathcal{F}_\infty : A \cap \{T \le n\} \in \mathcal{F}_n \text{ for all } n \ge 0\}.
\]
It is easy to check that, if $T(\omega) = n$ for all $\omega$, then $T$ is a stopping time and $\mathcal{F}_T = \mathcal{F}_n$. Given a process $X$, we define
\[
X_T(\omega) = X_{T(\omega)}(\omega) \quad \text{whenever } T(\omega) < \infty
\]
and we define the stopped process $X^T$ by
\[
X_n^T(\omega) = X_{T(\omega) \wedge n}(\omega), \qquad n \ge 0.
\]

Proposition 2.2.1. Let $S$ and $T$ be stopping times and let $X$ be an adapted process. Then
(a) $S \wedge T$ is a stopping time,
(b) $\mathcal{F}_T$ is a $\sigma$-algebra,
(c) if $S \le T$, then $\mathcal{F}_S \subseteq \mathcal{F}_T$,
(d) $X_T 1_{\{T < \infty\}}$ is an $\mathcal{F}_T$-measurable random variable,
(e) $X^T$ is adapted,
(f) if $X$ is integrable, then $X^T$ is integrable.

Throughout these notes, a 'Proposition' indicates a straightforward result whose proof is left as an exercise.

Theorem 2.2.2 (Optional stopping theorem). Let $X$ be a supermartingale and let $S$ and $T$ be bounded stopping times with $S \le T$. Then $\mathbb{E}(X_T) \le \mathbb{E}(X_S)$.

Note that $X$ is a submartingale if and only if $-X$ is a supermartingale, and $X$ is a martingale if and only if both $X$ and $-X$ are supermartingales. So the optional stopping theorem immediately implies a submartingale version with $\mathbb{E}(X_T) \ge \mathbb{E}(X_S)$ and a martingale version with $\mathbb{E}(X_T) = \mathbb{E}(X_0) = \mathbb{E}(X_S)$. We will prove a more comprehensive result on the relationship between supermartingales and stopping times.
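A simulation check of the martingale version of optional stopping (an illustrative sketch, not from the notes; the walk, the barrier level and the horizon are arbitrary choices): for a simple symmetric random walk $X$, which is a martingale, and the bounded stopping time $T = \inf\{n : |X_n| \ge 10\} \wedge 200$, we should find $\mathbb{E}(X_T) = \mathbb{E}(X_0) = 0$.

```python
import numpy as np

rng = np.random.default_rng(3)
n_paths, horizon, level = 20000, 200, 10

steps = rng.choice([-1, 1], size=(n_paths, horizon))
X = np.concatenate([np.zeros((n_paths, 1)), steps.cumsum(axis=1)], axis=1)  # X_0 = 0

# Bounded stopping time T = inf{n : |X_n| >= level} ∧ horizon
hit = np.abs(X) >= level
T = np.where(hit.any(axis=1), hit.argmax(axis=1), horizon)
X_T = X[np.arange(n_paths), T]

print(X_T.mean())   # ≈ 0 = E(X_0), consistent with optional stopping
```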
For a direct proof of the optional stopping theorem, you can write out the implication from (a) to (b) below in the case where $S \le T$ and $A = \Omega$.

Theorem 2.2.3. Let $X$ be an adapted integrable process. Then the following are equivalent:
(a) $X$ is a supermartingale,
(b) for all bounded stopping times $T$ and all stopping times $S$,
\[
\mathbb{E}(X_T \mid \mathcal{F}_S) \le X_{S \wedge T} \quad \text{almost surely},
\]
(c) for all stopping times $T$, the stopped process $X^T$ is a supermartingale,
(d) for all bounded stopping times $T$ and all stopping times $S \le T$,
\[
\mathbb{E}(X_T) \le \mathbb{E}(X_S).
\]

Proof. For stopping times $S$ and $T$ with $T \le n$, we have
\[
X_T = X_{S \wedge T} + \sum_{S \wedge T \le k < T}(X_{k+1} - X_k) = X_{S \wedge T} + \sum_{k=0}^{n-1} (X_{k+1} - X_k) 1_{\{S \le k < T\}}. \tag{2.1}
\]
Suppose that $X$ is a supermartingale and that $S$ and $T$ are stopping times, with $T \le n$. Let $A \in \mathcal{F}_S$. Then $A \cap \{S \le k\} \in \mathcal{F}_k$ and $\{T > k\} \in \mathcal{F}_k$, so
\[
\mathbb{E}\bigl((X_{k+1} - X_k) 1_{\{S \le k < T\}} 1_A\bigr) \le 0.
\]
Hence, on multiplying (2.1) by $1_A$ and taking expectations, we obtain
\[
\mathbb{E}(X_T 1_A) \le \mathbb{E}(X_{S \wedge T} 1_A).
\]
We have shown that (a) implies (b). It is obvious that (b) implies (c) and (d) and that (c) implies (a).

Let $m \le n$ and $A \in \mathcal{F}_m$. Set $T = m 1_A + n 1_{A^c}$. Then $T$ is a stopping time and $T \le n$. We note that
\[
\mathbb{E}(X_n 1_A) - \mathbb{E}(X_m 1_A) = \mathbb{E}(X_n) - \mathbb{E}(X_T).
\]
It follows that (d) implies (a). □

2.3. Doob's upcrossing inequality. Let $X$ be a random process and let $a, b \in \mathbb{R}$ with $a < b$. Fix $\omega \in \Omega$. By an upcrossing of $[a, b]$ by $X(\omega)$, we mean an interval of times $\{j, j+1, \dots, k\}$ such that $X_j(\omega) \le a$ and $X_k(\omega) \ge b$. Write $U_n[a, b](\omega)$ for the number of disjoint upcrossings contained in $\{0, 1, \dots, n\}$ and write $U[a, b](\omega)$ for the total number of disjoint upcrossings. Then, as $n \to \infty$, we have $U_n[a, b] \uparrow U[a, b]$.

Theorem 2.3.1 (Doob's upcrossing inequality). Let $X$ be a supermartingale. Then
\[
(b - a)\,\mathbb{E}(U[a, b]) \le \sup_{n \ge 0} \mathbb{E}\bigl((X_n - a)^-\bigr).
\]

Proof. Set $T_0 = 0$ and define recursively for $k \ge 0$
\[
S_{k+1} = \inf\{m \ge T_k : X_m \le a\}, \qquad T_{k+1} = \inf\{m \ge S_{k+1} : X_m \ge b\}.
\]
Note that, if $T_k < \infty$, then $\{S_k, S_k + 1, \dots, T_k\}$ is an upcrossing of $[a, b]$, and indeed $T_k$ is the time of completion of the $k$th disjoint upcrossing. Note that $U_n[a, b] \le n$. For $m \le n$, we have
\[
\{U_n[a, b] = m\} = \{T_m \le n < T_{m+1}\}
\]
and, on this event,
\[
X_{T_k \wedge n} - X_{S_k \wedge n} =
\begin{cases}
X_{T_k} - X_{S_k} \ge b - a, & \text{if } k \le m,\\
X_n - X_{S_{m+1}} \ge -(X_n - a)^-, & \text{if } k = m + 1 \text{ and } S_{m+1} \le n,\\
0, & \text{otherwise.}
\end{cases}
\]
Hence, on summing over $k \le n$, we obtain
\[
\sum_{k=1}^n (X_{T_k \wedge n} - X_{S_k \wedge n}) \ge (b - a)\,U_n[a, b] - (X_n - a)^-.
\]
Since $X$ is a supermartingale and $T_k \wedge n$ and $S_k \wedge n$ are bounded stopping times with $S_k \wedge n \le T_k \wedge n$, by optional stopping,
\[
\mathbb{E}(X_{T_k \wedge n}) \le \mathbb{E}(X_{S_k \wedge n}).
\]
Hence, on taking expectations, we obtain
\[
(b - a)\,\mathbb{E}(U_n[a, b]) \le \mathbb{E}\bigl((X_n - a)^-\bigr) \tag{2.2}
\]
and the desired estimate follows by monotone convergence. □
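The upcrossing count is simple to compute for a concrete path. The sketch below (illustrative only, not from the notes; the drifted random walk and the interval $[a,b]$ are arbitrary choices) counts disjoint upcrossings using the times $S_k$, $T_k$ of the proof above, and compares the two sides of inequality (2.2) by Monte Carlo.

```python
import numpy as np

def upcrossings(path, a, b):
    # Count completed disjoint upcrossings of [a, b], following the times S_k, T_k above
    count, below = 0, False
    for x in path:
        if not below and x <= a:
            below = True              # an upcrossing attempt starts at S_k
        elif below and x >= b:
            count += 1                # ...and completes at T_k
            below = False
    return count

rng = np.random.default_rng(4)
n, paths, a, b = 500, 5000, -1.0, 1.0
steps = rng.choice([-1.0, 1.0], size=(paths, n)) - 0.05     # negative drift: X is a supermartingale
X = np.concatenate([np.zeros((paths, 1)), steps.cumsum(axis=1)], axis=1)

lhs = (b - a) * np.mean([upcrossings(p, a, b) for p in X])
rhs = np.maximum(a - X[:, -1], 0.0).mean()                  # E((X_n - a)^-) at the final time
print(lhs, "<=", rhs)                                       # the inequality (2.2) holds
```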
2.4. Doob's maximal inequalities. Define, for a random process $X$,
\[
X_n^* = \sup_{k \le n} |X_k|, \qquad X^* = \sup_{n \ge 0} |X_n|.
\]
In the next two theorems, we see that the martingale (or submartingale) property allows us effectively to move the supremum outside the probability or expectation.

Theorem 2.4.1 (Doob's maximal inequality). Let $X$ be a martingale or non-negative submartingale. Then, for all $\lambda \ge 0$,
\[
\lambda\,\mathbb{P}(X^* \ge \lambda) \le \sup_{n \ge 0} \mathbb{E}(|X_n|).
\]

Proof. If $X$ is a martingale, then $|X|$ is a non-negative submartingale. It therefore suffices to consider the case where $X$ is non-negative. Set
\[
T = \inf\{k \ge 0 : X_k \ge \lambda\} \wedge n.
\]
Then $T$ is a stopping time and $T \le n$ so, by optional stopping,
\[
\mathbb{E}(X_n) \ge \mathbb{E}(X_T) = \mathbb{E}(X_T 1_{\{X_n^* \ge \lambda\}}) + \mathbb{E}(X_n 1_{\{X_n^* < \lambda\}}) \ge \lambda\,\mathbb{P}(X_n^* \ge \lambda) + \mathbb{E}(X_n 1_{\{X_n^* < \lambda\}}).
\]
Hence
\[
\lambda\,\mathbb{P}(X_n^* \ge \lambda) \le \mathbb{E}(X_n 1_{\{X_n^* \ge \lambda\}}) \le \mathbb{E}(X_n). \tag{2.3}
\]
On letting $n \to \infty$, since $X_n^* \uparrow X^*$, we have $\mathbb{P}(X_n^* \ge \lambda) \to \mathbb{P}\bigl(\bigcup_n\{X_n^* \ge \lambda\}\bigr) \ge \mathbb{P}(X^* > \lambda)$. Hence, from (2.3), we obtain
\[
\lambda\,\mathbb{P}(X^* > \lambda) \le \sup_{n \ge 0} \mathbb{E}(X_n).
\]
Finally, for $\lambda > 0$, we apply this with $\lambda' \in (0, \lambda)$ in place of $\lambda$ and let $\lambda' \uparrow \lambda$ to obtain the desired inequality. □

Theorem 2.4.2 (Doob's $L^p$-inequality). Let $X$ be a martingale or non-negative submartingale. Then, for all $p > 1$ and $q = p/(p-1)$,
\[
\|X^*\|_p \le q \sup_{n \ge 0} \|X_n\|_p.
\]

Proof. If $X$ is a martingale, then $|X|$ is a non-negative submartingale. So it suffices to consider the case where $X$ is non-negative. Fix $k \ge 1$. By Fubini's theorem, equation (2.3) and Hölder's inequality,
\[
\mathbb{E}\bigl((X_n^* \wedge k)^p\bigr) = \mathbb{E}\int_0^{X_n^* \wedge k} p\lambda^{p-1}\,d\lambda = \int_0^k p\lambda^{p-1}\,\mathbb{P}(X_n^* \ge \lambda)\,d\lambda
\le \int_0^k p\lambda^{p-2}\,\mathbb{E}\bigl(X_n 1_{\{X_n^* \ge \lambda\}}\bigr)\,d\lambda = q\,\mathbb{E}\bigl(X_n (X_n^* \wedge k)^{p-1}\bigr) \le q\,\|X_n\|_p\,\|X_n^* \wedge k\|_p^{p-1}.
\]
Hence $\|X_n^* \wedge k\|_p \le q\,\|X_n\|_p$ and the result follows by monotone convergence on letting $k \to \infty$ and then $n \to \infty$. □

2.5. Doob's martingale convergence theorems. We say that a random process $X$ is $L^p$-bounded if
\[
\sup_{n \ge 0} \|X_n\|_p < \infty.
\]
We say that $X$ is uniformly integrable if
\[
\sup_{n \ge 0} \mathbb{E}\bigl(|X_n| 1_{\{|X_n| \ge \lambda\}}\bigr) \to 0 \quad \text{as } \lambda \to \infty.
\]
By Hölder's inequality, if $X$ is $L^p$-bounded for some $p > 1$, then $X$ is uniformly integrable. On the other hand, if $X$ is uniformly integrable, then $X$ is $L^1$-bounded.

Theorem 2.5.1 (Almost sure martingale convergence theorem). Let $X$ be an $L^1$-bounded supermartingale. Then there exists an integrable $\mathcal{F}_\infty$-measurable random variable $X_\infty$ such that $X_n \to X_\infty$ almost surely as $n \to \infty$.

Proof. Recall that, for a sequence of real numbers $(x_n)_{n \ge 0}$, as $n \to \infty$, either $x_n$ converges, or $|x_n| \to \infty$, or $\liminf x_n < \limsup x_n$. In the last case, since the rationals are dense, there exist $a, b \in \mathbb{Q}$ such that $\liminf x_n < a < b < \limsup x_n$. Set
\[
\Omega_0 = \Omega_1 \cap \bigcap_{a, b \in \mathbb{Q},\, a < b} \Omega_{a,b}
\]
where
\[
\Omega_1 = \{\liminf_n |X_n| < \infty\}, \qquad \Omega_{a,b} = \{U[a, b] < \infty\}.
\]
Then $X_n(\omega)$ converges for all $\omega \in \Omega_0$. By Fatou's lemma and Doob's upcrossing inequality, for all $a < b$,
\[
\mathbb{E}(\liminf_n |X_n|) \le \liminf_n \mathbb{E}|X_n|, \qquad (b - a)\,\mathbb{E}(U[a, b]) \le |a| + \sup_{n \ge 0} \mathbb{E}|X_n|.
\]
So, since $(X_n)_{n \ge 0}$ is $L^1$-bounded, we have $\mathbb{P}(\Omega_0) = 1$. Define
\[
X_\infty = \lim_n X_n 1_{\Omega_0}.
\]
Then $X_n \to X_\infty$ almost surely, $X_\infty$ is $\mathcal{F}_\infty$-measurable and $|X_\infty| \le \liminf_n |X_n|$, so $X_\infty$ is integrable. □

Note, in particular, that every non-negative supermartingale is $L^1$-bounded and hence, by the theorem, converges almost surely.
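As an illustration of the last remark (a standard example, not from the notes): the product $X_n = \prod_{k \le n} \xi_k$ of i.i.d. non-negative random variables with mean $1$ is a non-negative martingale, so it converges almost surely; here the limit is $0$, and the convergence is not in $L^1$ since $\mathbb{E}(X_n) = 1$ for every $n$.

```python
import numpy as np

rng = np.random.default_rng(5)
n, paths = 2000, 10000

# xi_k takes values 1/2 and 3/2 with equal probability, so E(xi_k) = 1 but E(log xi_k) < 0
xi = rng.choice([0.5, 1.5], size=(paths, n))
X = xi.cumprod(axis=1)            # X_n = xi_1 * ... * xi_n, a non-negative martingale

print(X[:, 9].mean())             # ≈ 1: E(X_n) = 1 exactly for every n (shown here at n = 10)
print((X[:, -1] < 1e-6).mean())   # ≈ 1: essentially every path is already near the a.s. limit 0
print(X[:, -1].mean())            # far below 1 in a finite sample: the convergence is not in L^1
```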
Theorem 2.5.2 ($L^1$ martingale convergence theorem). Let $(X_n)_{n \ge 0}$ be a uniformly integrable martingale. Then there exists a random variable $X_\infty \in L^1(\mathcal{F}_\infty)$ such that $X_n \to X_\infty$ as $n \to \infty$ almost surely and in $L^1$. Moreover, $X_n = \mathbb{E}(X_\infty \mid \mathcal{F}_n)$ almost surely for all $n \ge 0$. Moreover, we may obtain all $L^1(\mathcal{F}_\infty)$ random variables in this way.

Proof. Let $(X_n)_{n \ge 0}$ be a uniformly integrable martingale. By the almost sure martingale convergence theorem, there exists $X_\infty \in L^1(\mathcal{F}_\infty)$ such that $X_n \to X_\infty$ almost surely. Since $X$ is uniformly integrable, it follows that $X_n \to X_\infty$ in $L^1$, by [PM, Theorems 2.5.1 and 6.2.3]. Next, for $m \ge n$,
\[
\|X_n - \mathbb{E}(X_\infty \mid \mathcal{F}_n)\|_1 = \|\mathbb{E}(X_m - X_\infty \mid \mathcal{F}_n)\|_1 \le \|X_m - X_\infty\|_1.
\]
Let $m \to \infty$ to deduce $X_n = \mathbb{E}(X_\infty \mid \mathcal{F}_n)$ almost surely.

Suppose now that $Y \in L^1(\mathcal{F}_\infty)$ and let $X_n$ be a version of $\mathbb{E}(Y \mid \mathcal{F}_n)$ for all $n$. Then $(X_n)_{n \ge 0}$ is a martingale by the tower property and is uniformly integrable by Lemma 1.5.1. Hence there exists $X_\infty \in L^1(\mathcal{F}_\infty)$ such that $X_n \to X_\infty$ almost surely and in $L^1$. For all $n \ge 0$ and all $A \in \mathcal{F}_n$ we have
\[
\mathbb{E}(X_\infty 1_A) = \lim_{n \to \infty} \mathbb{E}(X_n 1_A) = \mathbb{E}(Y 1_A).
\]
Now $X_\infty, Y \in L^1(\mathcal{F}_\infty)$ and $\bigcup_n \mathcal{F}_n$ is a $\pi$-system generating $\mathcal{F}_\infty$. Hence, by [PM, Proposition 3.1.4], $X_\infty = Y$ almost surely. □

This theorem can be seen as setting up a bijection between the set of uniformly integrable martingales and $L^1(\mathcal{F}_\infty)$, given by $X \mapsto X_\infty$, provided that we identify martingales and random variables which agree almost surely.

Theorem 2.5.3 ($L^p$ martingale convergence theorem). Let $p \in (1, \infty)$. Let $(X_n)_{n \ge 0}$ be an $L^p$-bounded martingale. Then there exists a random variable $X_\infty \in L^p(\mathcal{F}_\infty)$ such that $X_n \to X_\infty$ as $n \to \infty$ almost surely and in $L^p$. Moreover, $X_n = \mathbb{E}(X_\infty \mid \mathcal{F}_n)$ almost surely for all $n \ge 0$. Moreover, we may obtain all $L^p(\mathcal{F}_\infty)$ random variables in this way.

Proof. Let $(X_n)_{n \ge 0}$ be an $L^p$-bounded martingale. By the almost sure martingale convergence theorem, there exists $X_\infty \in L^1(\mathcal{F}_\infty)$ such that $X_n \to X_\infty$ almost surely. By Doob's $L^p$-inequality,
\[
\|X^*\|_p \le q \sup_{n \ge 0} \|X_n\|_p < \infty.
\]
Since $|X_n - X_\infty|^p \le (2X^*)^p$ for all $n$, it follows by dominated convergence that $X_n \to X_\infty$ in $L^p$. Then $X_n = \mathbb{E}(X_\infty \mid \mathcal{F}_n)$ almost surely for all $n \ge 0$, as in the $L^1$ case.

Suppose now that $Y \in L^p(\mathcal{F}_\infty)$ and let $X_n$ be a version of $\mathbb{E}(Y \mid \mathcal{F}_n)$ for all $n$. Then $(X_n)_{n \ge 0}$ is a martingale by the tower property and
\[
\|X_n\|_p = \|\mathbb{E}(Y \mid \mathcal{F}_n)\|_p \le \|Y\|_p
\]
for all $n$, so $(X_n)_{n \ge 0}$ is $L^p$-bounded. Hence there exists $X_\infty \in L^p(\mathcal{F}_\infty)$ such that $X_n \to X_\infty$ almost surely and in $L^p$. Finally, we must have $X_\infty = Y$ almost surely, as in the $L^1$ case. □

In the next result, we dispense with the filtration $(\mathcal{F}_n)_{n \ge 0}$ and suppose given instead a backward filtration $(\mathcal{F}_n)_{n \ge 0}$, that is to say, a sequence of $\sigma$-algebras $\mathcal{F}_n$ such that, for all $n \ge 0$,
\[
\mathcal{F} \supseteq \mathcal{F}_n \supseteq \mathcal{F}_{n+1}.
\]
We write $\mathcal{F}_\infty$ for the $\sigma$-algebra given by
\[
\mathcal{F}_\infty = \bigcap_{n \ge 0} \mathcal{F}_n.
\]

Theorem 2.5.4 (Backward martingale convergence theorem). For all $Y \in L^1(\mathcal{F})$, we have $\mathbb{E}(Y \mid \mathcal{F}_n) \to \mathbb{E}(Y \mid \mathcal{F}_\infty)$ as $n \to \infty$, almost surely and in $L^1$.

Proof. Write $X_n = \mathbb{E}(Y \mid \mathcal{F}_n)$ for all $n \ge 0$. Fix $n \ge 0$. By the tower property, $(X_{n-k})_{0 \le k \le n}$ is a martingale for the filtration $(\mathcal{F}_{n-k})_{0 \le k \le n}$. For $a < b$, the number $U_n[a, b]$ of upcrossings of $[a, b]$ by $(X_k)_{0 \le k \le n}$ equals the number of upcrossings of $[-b, -a]$ by $(-X_{n-k})_{0 \le k \le n}$. Hence, from (2.2), we obtain
\[
(b - a)\,\mathbb{E}(U_n[a, b]) \le \mathbb{E}\bigl((X_0 - b)^+\bigr)
\]
and so, by monotone convergence,
\[
(b - a)\,\mathbb{E}(U[a, b]) \le \mathbb{E}\bigl((X_0 - b)^+\bigr) \le \mathbb{E}|Y| + |b| < \infty.
\]
Also, we have
\[
\mathbb{E}(\liminf_n |X_n|) \le \liminf_n \mathbb{E}|X_n| \le \mathbb{E}|Y| < \infty.
\]
Hence the argument used in the proof of the almost sure martingale convergence theorem applies to show that $\mathbb{P}(\Omega_0) = 1$, where
\[
\Omega_0 = \{X_n \text{ converges as } n \to \infty\}.
\]
Set $X_\infty = 1_{\Omega_0} \lim_{n \to \infty} X_n$. Then $X_\infty \in L^1(\mathcal{F}_\infty)$ and $X_n \to X_\infty$ almost surely. Now $(X_n)_{n \ge 0}$ is uniformly integrable by Lemma 1.5.1, so $X_n \to X_\infty$ also in $L^1$. Finally, for all $A \in \mathcal{F}_\infty$, we have
\[
\mathbb{E}\bigl((X_\infty - \mathbb{E}(Y \mid \mathcal{F}_\infty)) 1_A\bigr) = \lim_{n \to \infty} \mathbb{E}((X_n - Y) 1_A) = 0
\]
and this implies that $X_\infty = \mathbb{E}(Y \mid \mathcal{F}_\infty)$ almost surely. □

Recall that, for a stopping time $T$ and a random process $X$, $X_T$ has been defined only on the event $\{T < \infty\}$. Given an almost sure limit $X_\infty$ for $X$, we define $X_T = X_\infty$ on $\{T = \infty\}$. Then the optional stopping theorem extends to all stopping times for uniformly integrable martingales.

Theorem 2.5.5. Let $X$ be a uniformly integrable martingale and let $T$ be any stopping time. Then $\mathbb{E}(X_T) = \mathbb{E}(X_0)$. Moreover, for all stopping times $S$ and $T$, we have
\[
\mathbb{E}(X_T \mid \mathcal{F}_S) = X_{S \wedge T} \quad \text{almost surely}.
\]

Proof. By the $L^1$ martingale convergence theorem, there exists $X_\infty \in L^1(\mathcal{F}_\infty)$ such that $X_n \to X_\infty$ as $n \to \infty$, almost surely and in $L^1$, and $X_n = \mathbb{E}(X_\infty \mid \mathcal{F}_n)$ almost surely, for all $n$. In particular, we have $X_{T \wedge n} \to X_T$ almost surely. Since $\mathcal{F}_{T \wedge n} \subseteq \mathcal{F}_n$, by Theorem 2.2.3 and the tower property,
\[
X_{T \wedge n} = \mathbb{E}(X_n \mid \mathcal{F}_{T \wedge n}) = \mathbb{E}(X_\infty \mid \mathcal{F}_{T \wedge n}).
\]
By Lemma 1.5.1, the random process $(X_{T \wedge n})_{n \ge 0}$ is then uniformly integrable. Hence $X_{T \wedge n} \to X_T$ in $L^1$ and so also $\mathbb{E}(X_{T \wedge n} \mid \mathcal{F}_S) \to \mathbb{E}(X_T \mid \mathcal{F}_S)$ in $L^1$. Now, the optional stopping theorem and Theorem 2.2.3 apply at the bounded stopping time $T \wedge n$ to show
\[
\mathbb{E}(X_{T \wedge n}) = \mathbb{E}(X_0), \qquad \mathbb{E}(X_{T \wedge n} \mid \mathcal{F}_S) = X_{S \wedge T \wedge n} \quad \text{almost surely}
\]
and the claimed identities follow on letting $n \to \infty$. □
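A concrete instance of the bijection described after Theorem 2.5.2 (an illustrative sketch, not from the notes; the choice of $\Omega$, $Y$ and the dyadic filtration is arbitrary): take $\Omega = [0,1)$ with Lebesgue measure, $Y(\omega) = \omega^2$, and let $\mathcal{F}_n$ be generated by the dyadic intervals of length $2^{-n}$. Then $X_n = \mathbb{E}(Y \mid \mathcal{F}_n)$ averages $Y$ over the dyadic interval containing $\omega$, and $X_n \to Y$ almost surely and in $L^1$.

```python
import numpy as np

rng = np.random.default_rng(6)
omega = rng.random(10**5)                 # sample points of Omega = [0,1) under Lebesgue measure
Y = omega**2

def doob_martingale(omega, n):
    # X_n = E(Y | F_n), where F_n is generated by the dyadic intervals [k 2^{-n}, (k+1) 2^{-n})
    k = np.floor(omega * 2**n)
    left = k / 2**n
    # exact average of y^2 over the dyadic interval containing omega
    return ((left + 2.0**-n)**3 - left**3) / (3 * 2.0**-n)

for n in [1, 2, 5, 10, 15]:
    X_n = doob_martingale(omega, n)
    print(n, np.abs(X_n - Y).mean())      # E|X_n - Y| decreases towards 0
```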
3. Applications of martingale theory

3.1. Sums of independent random variables. We use martingale arguments to analyse some aspects of the behaviour of the partial sums
\[
S_n = X_1 + \dots + X_n
\]
of a sequence $(X_n)_{n \ge 1}$ of independent random variables. We will have more to say about such sums in Theorem 6.1.1 and Theorem 7.10.3.

Theorem 3.1.1 (Strong law of large numbers). Let $(X_n)_{n \ge 1}$ be a sequence of independent, identically distributed, integrable random variables. Set $\nu = \mathbb{E}(X_1)$. Then $S_n/n \to \nu$ as $n \to \infty$, almost surely and in $L^1$.

Proof. Define for $n \ge 1$
\[
\mathcal{F}_n = \sigma(S_m : m \ge n), \qquad \mathcal{T}_n = \sigma(X_m : m \ge n+1), \qquad \mathcal{T} = \bigcap_{n \ge 1} \mathcal{T}_n.
\]
Then $\mathcal{F}_n = \sigma(S_n, \mathcal{T}_n)$ and $(\mathcal{F}_n)_{n \ge 1}$ is a backward filtration. Since $(X_1, S_n)$ is independent of $\mathcal{T}_n$, we have $\mathbb{E}(X_1 \mid \mathcal{F}_n) = \mathbb{E}(X_1 \mid S_n)$ almost surely for all $n$. For $k \le n$ and all Borel sets $B$, we have $\mathbb{E}(X_k 1_{\{S_n \in B\}}) = \mathbb{E}(X_1 1_{\{S_n \in B\}})$ by symmetry, so $\mathbb{E}(X_k \mid S_n) = \mathbb{E}(X_1 \mid S_n)$ almost surely. But
\[
\mathbb{E}(X_1 \mid S_n) + \dots + \mathbb{E}(X_n \mid S_n) = \mathbb{E}(S_n \mid S_n) = S_n \quad \text{almost surely}
\]
so we must have
\[
\mathbb{E}(X_1 \mid \mathcal{F}_n) = \mathbb{E}(X_1 \mid S_n) = S_n/n \quad \text{almost surely}.
\]
Then, by the backward martingale convergence theorem,
\[
S_n/n \to Y \quad \text{almost surely and in } L^1
\]
for some random variable $Y$. Then $Y$ is $\mathcal{T}$-measurable so, by Kolmogorov's zero-one law [PM, Theorem 2.6.1], $Y$ is constant almost surely. Hence
\[
Y = \mathbb{E}(Y) = \lim_{n \to \infty} \mathbb{E}(S_n/n) = \nu \quad \text{almost surely}. \qquad \square
\]

Since almost sure convergence implies convergence in probability [PM, Theorem 2.5.1], the following is an immediate corollary.

Corollary 3.1.2 (Weak law of large numbers). Let $(X_n)_{n \ge 1}$ be a sequence of independent, identically distributed, integrable random variables. Set $\nu = \mathbb{E}(X_1)$. Then $\mathbb{P}(|S_n/n - \nu| > \varepsilon) \to 0$ as $n \to \infty$ for all $\varepsilon > 0$.

The main point of the next result is that, if a sum of independent random variables converges in $L^2$, then it also converges almost surely, without passing to a subsequence.

Proposition 3.1.3. Let $(X_n)_{n \ge 1}$ be a sequence of independent random variables in $L^2$. Set $S_n = X_1 + \dots + X_n$ and write
\[
\mu_n = \mathbb{E}(S_n) = \mathbb{E}(X_1) + \dots + \mathbb{E}(X_n), \qquad \sigma_n^2 = \operatorname{var}(S_n) = \operatorname{var}(X_1) + \dots + \operatorname{var}(X_n).
\]
Then the following are equivalent:
(a) the sequences $(\mu_n)_{n \ge 1}$ and $(\sigma_n^2)_{n \ge 1}$ converge in $\mathbb{R}$,
(b) there exists a random variable $S$ such that $S_n \to S$ almost surely and in $L^2$.

The following identities allow estimation of exit probabilities and the mean exit time for a random walk in an interval. They are of some historical interest, having been developed by Wald in the 1940s to compute the efficiency of the sequential probability ratio test.

Proposition 3.1.4 (Wald's identities). Let $(X_n)_{n \ge 1}$ be a sequence of independent, identically distributed random variables, having mean $\mu$ and variance $\sigma^2 \in (0, \infty)$. Fix $a, b \in \mathbb{R}$ with $a < 0 < b$ and set
\[
T = \inf\{n \ge 0 : S_n \le a \text{ or } S_n \ge b\}.
\]
Then $\mathbb{E}(T) < \infty$ and
\[
\mathbb{E}(S_T) = \mu\,\mathbb{E}(T).
\]
Moreover, in the case $\mu = 0$, we have
\[
\mathbb{E}(S_T^2) = \sigma^2\,\mathbb{E}(T)
\]
while, in the case $\mu \ne 0$, if we can find $\lambda \ne 0$ such that $\mathbb{E}(e^{\lambda X_1}) = 1$, then
\[
\mathbb{E}(e^{\lambda S_T}) = 1.
\]
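Wald's first identity is easy to check by simulation (an illustrative sketch, not from the notes; the Gaussian step distribution and the interval are arbitrary choices): for a random walk with step mean $\mu$ stopped on first exit from $(a, b)$, the simulated averages should satisfy $\mathbb{E}(S_T) \approx \mu\,\mathbb{E}(T)$.

```python
import numpy as np

rng = np.random.default_rng(7)
a, b, mu = -10.0, 15.0, 0.2
paths, horizon = 20000, 5000           # horizon chosen large enough that exit happens first

steps = rng.normal(loc=mu, scale=1.0, size=(paths, horizon))
S = steps.cumsum(axis=1)

exited = (S <= a) | (S >= b)
T = exited.argmax(axis=1) + 1          # first exit time; exit occurs before the horizon w.h.p.
S_T = S[np.arange(paths), T - 1]

print(S_T.mean(), mu * T.mean())       # the two sides of E(S_T) = mu * E(T), up to Monte Carlo error
```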
3.2. Non-negative martingales and change of measure. Given a random variable $X$, with $X \ge 0$ and $\mathbb{E}(X) = 1$, we can define a new probability measure $\tilde{\mathbb{P}}$ on $\mathcal{F}$ by
\[
\tilde{\mathbb{P}}(A) = \mathbb{E}(X 1_A), \qquad A \in \mathcal{F}.
\]
Moreover, by [PM, Proposition 3.1.4], given $\tilde{\mathbb{P}}$, this equation determines $X$ uniquely, up to almost sure modification. We say that $\tilde{\mathbb{P}}$ has a density with respect to $\mathbb{P}$ and $X$ is a version of the density.

Let $(\mathcal{F}_n)_{n \ge 0}$ be a filtration in $\mathcal{F}$ and assume for simplicity that $\mathcal{F}_\infty = \mathcal{F}$. Let $(X_n)_{n \ge 0}$ be an adapted random process, with $X_n \ge 0$ and $\mathbb{E}(X_n) = 1$ for all $n$. We can define for each $n$ a probability measure $\tilde{\mathbb{P}}_n$ on $\mathcal{F}_n$ by
\[
\tilde{\mathbb{P}}_n(A) = \mathbb{E}(X_n 1_A), \qquad A \in \mathcal{F}_n.
\]
Since we require $X_n$ to be $\mathcal{F}_n$-measurable, this equation determines $X_n$ uniquely, up to almost sure modification.

Proposition 3.2.1. The measures $\tilde{\mathbb{P}}_n$ are consistent, that is $\tilde{\mathbb{P}}_{n+1}|_{\mathcal{F}_n} = \tilde{\mathbb{P}}_n$ for all $n$, if and only if $(X_n)_{n \ge 0}$ is a martingale. Moreover, there is a measure $\tilde{\mathbb{P}}$ on $\mathcal{F}$, which has a density with respect to $\mathbb{P}$, such that $\tilde{\mathbb{P}}|_{\mathcal{F}_n} = \tilde{\mathbb{P}}_n$ for all $n$, if and only if $(X_n)_{n \ge 0}$ is a uniformly integrable martingale.

This construction can also give rise to new probability measures which do not have a density with respect to $\mathbb{P}$ on $\mathcal{F}$, as the following result suggests.

Theorem 3.2.2. There exists a measure $\tilde{\mathbb{P}}$ on $\mathcal{F}$ such that $\tilde{\mathbb{P}}|_{\mathcal{F}_n} = \tilde{\mathbb{P}}_n$ for all $n$ if and only if $\mathbb{E}(X_T) = 1$ for all finite stopping times $T$.

Proof. Suppose that $\mathbb{E}(X_T) = 1$ for all finite stopping times $T$. Then, since bounded stopping times are finite, $(X_n)_{n \ge 0}$ is a martingale, by optional stopping. Hence we can define consistently a set function $\tilde{\mathbb{P}}$ on $\bigcup_n \mathcal{F}_n$ such that $\tilde{\mathbb{P}}|_{\mathcal{F}_n} = \tilde{\mathbb{P}}_n$ for all $n$. Note that $\bigcup_n \mathcal{F}_n$ is a ring. By Carathéodory's extension theorem [PM, Theorem 1.6.1], $\tilde{\mathbb{P}}$ extends to a measure on $\mathcal{F}$ if and only if $\tilde{\mathbb{P}}$ is countably additive on $\bigcup_n \mathcal{F}_n$. Since each $\tilde{\mathbb{P}}_n$ is countably additive, it is not hard to see that this condition holds if and only if
\[
\sum_n \tilde{\mathbb{P}}(A_n) = 1
\]
for all partitions $(A_n : n \ge 0)$ of $\Omega$ such that $A_n \in \mathcal{F}_n$ for all $n$. But such partitions are in one-to-one correspondence with finite stopping times $T$, by $\{T = n\} = A_n$, and then
\[
\mathbb{E}(X_T) = \sum_n \tilde{\mathbb{P}}(A_n).
\]
Hence $\tilde{\mathbb{P}}$ extends to a measure on $\mathcal{F}$ with the claimed property. Conversely, given such a measure, the last equation shows that $\mathbb{E}(X_T) = 1$ for all finite stopping times $T$. □

Theorem 3.2.3 (Radon–Nikodym theorem). Let $\mu$ and $\nu$ be $\sigma$-finite measures on a measurable space $(E, \mathcal{E})$. Then the following are equivalent:
(a) $\nu(A) = 0$ for all $A \in \mathcal{E}$ such that $\mu(A) = 0$,
(b) there exists a measurable function $f$ on $E$ such that $f \ge 0$ and
\[
\nu(A) = \mu(f 1_A), \qquad A \in \mathcal{E}.
\]
The function $f$, which is unique up to modification $\mu$-almost everywhere, is called (a version of) the Radon–Nikodym derivative of $\nu$ with respect to $\mu$. We write
\[
f = \frac{d\nu}{d\mu} \quad \text{$\mu$-almost everywhere}.
\]

We will give a proof for the case where $\mathcal{E}$ is countably generated. Thus, we assume further that there is a sequence $(G_n : n \in \mathbb{N})$ of subsets of $E$ which generates $\mathcal{E}$. This holds, for example, whenever $\mathcal{E}$ is the Borel $\sigma$-algebra of a topology with countable basis. A further martingale argument, which we omit, allows one to deduce the general case.

Proof. It is obvious that (b) implies (a). Assume then that (a) holds. There is a countable partition of $E$ by measurable sets on which both $\mu$ and $\nu$ are finite. It will suffice to show that (b) holds on each of these sets, so we reduce without loss to the case where $\mu$ and $\nu$ are finite. The case where $\nu(E) = 0$ is clear. Assume then that $\nu(E) > 0$. Then also $\mu(E) > 0$, by (a). Write $\Omega = E$ and $\mathcal{F} = \mathcal{E}$ and consider the probability measures $\mathbb{P} = \mu/\mu(E)$ and $\tilde{\mathbb{P}} = \nu/\nu(E)$ on $(\Omega, \mathcal{F})$. It will suffice to show that there is a random variable $X \ge 0$ such that $\tilde{\mathbb{P}}(A) = \mathbb{E}(X 1_A)$ for all $A \in \mathcal{F}$.

Set $\mathcal{F}_n = \sigma(G_k : k \le n)$. There exist $m \in \mathbb{N}$ and a partition of $\Omega$ by events $A_1, \dots, A_m$ such that $\mathcal{F}_n = \sigma(A_1, \dots, A_m)$. Set
\[
X_n = \sum_{j=1}^m a_j 1_{A_j}
\]
where $a_j = \tilde{\mathbb{P}}(A_j)/\mathbb{P}(A_j)$ if $\mathbb{P}(A_j) > 0$ and $a_j = 0$ otherwise. Then $X_n \ge 0$, $X_n$ is $\mathcal{F}_n$-measurable and, using (a), we have $\tilde{\mathbb{P}}(A) = \mathbb{E}(X_n 1_A)$ for all $A \in \mathcal{F}_n$. Observe that $(\mathcal{F}_n)_{n \ge 0}$ is a filtration and $(X_n)_{n \ge 0}$ is a non-negative martingale. We will show that $(X_n)_{n \ge 0}$ is uniformly integrable. Then, by the $L^1$ martingale convergence theorem, there exists a random variable $X \ge 0$ such that $\mathbb{E}(X 1_A) = \mathbb{E}(X_n 1_A)$ for all $A \in \mathcal{F}_n$. Define a probability measure $\mathbb{Q}$ on $\mathcal{F}$ by $\mathbb{Q}(A) = \mathbb{E}(X 1_A)$. Then $\mathbb{Q} = \tilde{\mathbb{P}}$ on $\bigcup_n \mathcal{F}_n$, which is a $\pi$-system generating $\mathcal{F}$. Hence $\mathbb{Q} = \tilde{\mathbb{P}}$ on $\mathcal{F}$, by uniqueness of extension [PM, Theorem 1.7.1], which implies (b).

It remains to show that $(X_n)_{n \ge 0}$ is uniformly integrable. Given $\varepsilon > 0$ we can find $\delta > 0$ such that $\tilde{\mathbb{P}}(B) \le \varepsilon$ for all $B \in \mathcal{F}$ with $\mathbb{P}(B) \le \delta$. For, if not, there would be a sequence of sets $B_n \in \mathcal{F}$ with $\mathbb{P}(B_n) \le 2^{-n}$ and $\tilde{\mathbb{P}}(B_n) > \varepsilon$ for all $n$. Then
\[
\mathbb{P}\Bigl(\bigcap_n \bigcup_{m \ge n} B_m\Bigr) = 0, \qquad \tilde{\mathbb{P}}\Bigl(\bigcap_n \bigcup_{m \ge n} B_m\Bigr) \ge \varepsilon,
\]
which contradicts (a). Set $\lambda = 1/\delta$; then $\mathbb{P}(X_n \ge \lambda) \le \mathbb{E}(X_n)/\lambda = 1/\lambda = \delta$ for all $n$, so
\[
\mathbb{E}(X_n 1_{\{X_n \ge \lambda\}}) = \tilde{\mathbb{P}}(X_n \ge \lambda) \le \varepsilon.
\]
Hence $(X_n)_{n \ge 0}$ is uniformly integrable. □
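A standard example of this change-of-measure construction (an illustrative sketch, not from the notes; the Gaussian steps and the numerical values are arbitrary choices) is exponential tilting: for i.i.d. standard Gaussian steps $\xi_k$ and $\mathcal{F}_n = \sigma(\xi_1, \dots, \xi_n)$, the process $X_n = \exp(\theta S_n - n\theta^2/2)$ is a non-negative martingale with $\mathbb{E}(X_n) = 1$, and under $\tilde{\mathbb{P}}_n(A) = \mathbb{E}(X_n 1_A)$ the steps become $N(\theta, 1)$, so $S_n \sim N(n\theta, n)$. The sketch checks one instance of this numerically.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(8)
theta, n, paths = 0.3, 20, 10**6

xi = rng.normal(size=(paths, n))                  # steps under P: N(0, 1)
S_n = xi.sum(axis=1)
X_n = np.exp(theta * S_n - n * theta**2 / 2)      # density of P~_n with respect to P on F_n

c = 7.0
lhs = (X_n * (S_n > c)).mean()                    # P~_n(S_n > c), computed under P
# Under the tilted law, S_n ~ N(n*theta, n), so P~_n(S_n > c) has a closed form:
rhs = 0.5 * (1 - erf((c - n * theta) / sqrt(2 * n)))
print(lhs, rhs)                                   # agree up to Monte Carlo error
```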
3.3. Markov chains. Let $E$ be a countable set. We identify each measure $\mu$ on $E$ with its mass function $(\mu_x : x \in E)$, where $\mu_x = \mu(\{x\})$. Then, for each function $f$ on $E$, the integral is conveniently written as the matrix product
\[
\mu(f) = \mu f = \sum_{x \in E} \mu_x f_x
\]
where we consider $\mu$ as a row vector and identify $f$ with the column vector $(f_x : x \in E)$ given by $f_x = f(x)$. A transition matrix on $E$ is a matrix $P = (p_{xy} : x, y \in E)$ such that each row $(p_{xy} : y \in E)$ is a probability measure.

Let a filtration $(\mathcal{F}_n)_{n \ge 0}$ be given and let $(X_n)_{n \ge 0}$ be an adapted process with values in $E$. We say that $(X_n)_{n \ge 0}$ is a Markov chain with transition matrix $P$ if, for all $n \ge 0$, all $x, y \in E$ and all $A \in \mathcal{F}_n$ with $A \subseteq \{X_n = x\}$ and $\mathbb{P}(A) > 0$,
\[
\mathbb{P}(X_{n+1} = y \mid A) = p_{xy}.
\]
Our notion of Markov chain depends on the choice of $(\mathcal{F}_n)_{n \ge 0}$. The following result shows that our definition agrees with the usual one for the most obvious such choice.

Proposition 3.3.1. Let $(X_n)_{n \ge 0}$ be a random process in $E$ and take $\mathcal{F}_n = \sigma(X_k : k \le n)$. The following are equivalent:
(a) $(X_n)_{n \ge 0}$ is a Markov chain with initial distribution $\lambda$ and transition matrix $P$,
(b) for all $n$ and all $x_0, x_1, \dots, x_n \in E$,
\[
\mathbb{P}(X_0 = x_0, X_1 = x_1, \dots, X_n = x_n) = \lambda_{x_0} p_{x_0 x_1} \cdots p_{x_{n-1} x_n}.
\]

Proposition 3.3.2. Let $E^{\mathbb{N}}$ denote the set of sequences $x = (x_n : n \ge 0)$ in $E$ and define $X_n : E^{\mathbb{N}} \to E$ by $X_n(x) = x_n$. Set $\mathcal{E}^{\mathbb{N}} = \sigma(X_k : k \ge 0)$. Let $P$ be a transition matrix on $E$. Then, for each $x \in E$, there is a unique probability measure $\mathbb{P}_x$ on $(E^{\mathbb{N}}, \mathcal{E}^{\mathbb{N}})$ such that $(X_n)_{n \ge 0}$ is a Markov chain with transition matrix $P$ and starting from $x$.

An example of a Markov chain in $\mathbb{Z}^d$ is the simple symmetric random walk, whose transition matrix is given by
\[
p_{xy} = \begin{cases} 1/(2d), & \text{if } |x - y| = 1,\\ 0, & \text{otherwise.}\end{cases}
\]
The following result shows a simple instance of a general relationship between Markov processes and martingales. We will see a second instance of this for Brownian motion in Theorem 7.4.4.

Proposition 3.3.3. Let $(X_n)_{n \ge 0}$ be an adapted process in $E$. Then the following are equivalent:
(a) $(X_n)_{n \ge 0}$ is a Markov chain with transition matrix $P$,
(b) for all bounded functions $f$ on $E$, the following process is a martingale:
\[
M_n^f = f(X_n) - f(X_0) - \sum_{k=0}^{n-1} (P - I)f(X_k).
\]

A bounded function $f$ on $E$ is said to be harmonic if $Pf = f$, that is to say, if
\[
\sum_{y \in E} p_{xy} f_y = f_x, \qquad x \in E.
\]
Note that, if $f$ is a bounded harmonic function, then $(f(X_n))_{n \ge 0}$ is a bounded martingale. Then, by Doob's convergence theorems, $f(X_n)$ converges almost surely and in $L^p$ for all $p \ge 1$. More generally, for $D \subseteq E$, a bounded function $f$ on $E$ is harmonic in $D$ if
\[
\sum_{y \in E} p_{xy} f_y = f_x, \qquad x \in D.
\]
Suppose now that we are given $D \subseteq E$. Write $D^c = E \setminus D$ and fix a bounded function $f$ on $D^c$. Set
\[
T = \inf\{n \ge 0 : X_n \in D^c\}
\]
and define a function $u$ on $E$ by
\[
u(x) = \mathbb{E}_x\bigl(f(X_T) 1_{\{T < \infty\}}\bigr).
\]

Theorem 3.3.4. The function $u$ is bounded, harmonic in $D$, and $u = f$ on $D^c$. Moreover, if $\mathbb{P}_x(T < \infty) = 1$ for all $x \in D$, then $u$ is the unique bounded extension of $f$ which is harmonic in $D$.

Proof. It is clear that $u$ is bounded and $u = f$ on $D^c$. For all $x, y \in E$ with $p_{xy} > 0$, under $\mathbb{P}_x$, conditional on $\{X_1 = y\}$, $(X_{n+1})_{n \ge 0}$ has distribution $\mathbb{P}_y$. So, for $x \in D$,
\[
u(x) = \sum_{y \in E} p_{xy} u(y)
\]
showing that $u$ is harmonic in $D$.

On the other hand, suppose that $g$ is a bounded function, harmonic in $D$ and such that $g = f$ on $D^c$. Then $M = M^g$ is a martingale and $T$ is a stopping time, so $M^T$ is also a martingale by optional stopping. But $(P - I)g(X_k) = 0$ for $k < T$, since $X_k \in D$, so $M_n^T = g(X_{T \wedge n}) - g(X_0)$ and hence $\mathbb{E}_x(g(X_{T \wedge n})) = g(x)$ for all $n$. So, if $\mathbb{P}_x(T < \infty) = 1$ for all $x \in D$, then
\[
g(X_{T \wedge n}) \to f(X_T) \quad \text{almost surely}
\]
so, by bounded convergence, for all $x \in D$,
\[
g(x) = \mathbb{E}_x(g(X_{T \wedge n})) \to \mathbb{E}_x(f(X_T)) = u(x). \qquad \square
\]

In Theorem 7.9.3 we will prove an analogous result for Brownian motion.
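The classical gambler's ruin problem illustrates Theorem 3.3.4 (an illustrative sketch, not from the notes; the state space size and starting point are arbitrary choices): for the simple symmetric random walk on $\{0, 1, \dots, N\}$ with $D = \{1, \dots, N-1\}$ and $f = 1_{\{N\}}$ on $D^c = \{0, N\}$, the unique bounded harmonic extension is $u(x) = x/N$, the probability of hitting $N$ before $0$. The sketch solves the harmonic equations directly and compares with simulation.

```python
import numpy as np

N = 10
# Harmonic equations in D = {1,...,N-1}: u(x) = (u(x-1) + u(x+1)) / 2, with u(0) = 0, u(N) = 1
A = np.zeros((N + 1, N + 1)); b = np.zeros(N + 1)
A[0, 0] = 1.0; A[N, N] = 1.0; b[N] = 1.0            # boundary values: u = f on D^c
for x in range(1, N):
    A[x, x] = 1.0; A[x, x - 1] = -0.5; A[x, x + 1] = -0.5
u = np.linalg.solve(A, b)
print(u)                                            # equals x/N, the unique bounded harmonic extension

# Monte Carlo check of u(3) = P_3(hit N before 0)
rng = np.random.default_rng(9)
hits, trials = 0, 20000
for _ in range(trials):
    x = 3
    while 0 < x < N:
        x += rng.choice([-1, 1])
    hits += (x == N)
print(hits / trials, u[3])                          # both ≈ 0.3
```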
4. Random processes in continuous time

4.1. Definitions. A continuous random process is a family of random variables $(X_t)_{t \ge 0}$ such that, for all $\omega \in \Omega$, the path $t \mapsto X_t(\omega) : [0, \infty) \to \mathbb{R}$ is continuous.

A function $x : [0, \infty) \to \mathbb{R}$ is said to be cadlag if it is right-continuous with left limits, that is to say, for all $t \ge 0$,
\[
x_s \to x_t \quad \text{as } s \to t \text{ with } s \ge t
\]
and, for all $t > 0$, there exists $x_{t-} \in \mathbb{R}$ such that
\[
x_s \to x_{t-} \quad \text{as } s \to t \text{ with } s < t.
\]
The term is a French acronym for continu à droite, limite à gauche. A cadlag random process is a family of random variables $(X_t)_{t \ge 0}$ such that, for all $\omega \in \Omega$, the path $t \mapsto X_t(\omega) : [0, \infty) \to \mathbb{R}$ is cadlag.

The spaces of continuous and cadlag functions on $[0, \infty)$ are denoted $C([0, \infty), \mathbb{R})$ and $D([0, \infty), \mathbb{R})$ respectively. We equip both these spaces with the $\sigma$-algebra generated by the coordinate functions $(x \mapsto x_t : t \ge 0)$. A continuous random process $(X_t)_{t \ge 0}$ can then be considered as a random variable $X$ in $C([0, \infty), \mathbb{R})$ given by
\[
X(\omega) = (t \mapsto X_t(\omega) : t \ge 0).
\]
A cadlag random process can be thought of as a random variable in $D([0, \infty), \mathbb{R})$. The finite-dimensional distributions of a continuous or cadlag process $X$ are the laws $\mu_{t_1, \dots, t_n}$ on $\mathbb{R}^n$ given by
\[
\mu_{t_1, \dots, t_n}(A) = \mathbb{P}\bigl((X_{t_1}, \dots, X_{t_n}) \in A\bigr), \qquad A \in \mathcal{B}(\mathbb{R}^n),
\]
where $n \in \mathbb{N}$ and $t_1, \dots, t_n \in [0, \infty)$ with $t_1 < \dots < t_n$. Since the cylinder sets $\{(X_{t_1}, \dots, X_{t_n}) \in A\}$ form a generating $\pi$-system, they determine uniquely the law of $X$. We make analogous definitions when $\mathbb{R}$ is replaced by a general topological space.
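A basic example of a cadlag path (an illustrative sketch, not from the notes; counting processes of this kind reappear with Poisson random measures later in these notes) is the path of a counting process: with i.i.d. exponential inter-arrival times, $N_t = \#\{k : J_k \le t\}$ is right-continuous with left limits, jumping by $1$ at each arrival time $J_k$.

```python
import numpy as np

rng = np.random.default_rng(10)
rate = 2.0

gaps = rng.exponential(1.0 / rate, size=100)     # inter-arrival times
J = np.cumsum(gaps)                              # jump (arrival) times J_1 < J_2 < ...

def N(t):
    # A cadlag path: N_t = #{k : J_k <= t}; right-continuous, with left limit N_{t-} = #{k : J_k < t}
    return np.searchsorted(J, t, side="right")

t = J[3]                                         # evaluate exactly at a jump time
print(N(t), np.searchsorted(J, t, side="left"))  # value 4 at t, left limit 3: a jump of size 1
```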
