Linearity and Monotonicity of Expectation

Stat 110 Unit 4: Expectation
Chapter 4 in the text

Unit 4 Outline
• Definition of Expectation
• Linearity and Monotonicity of Expectation
• LOTUS (expectation of a function)
• Variance and Standard Deviation
• Geometric and Negative Binomial distributions
• Indicator r.v.s and the Fundamental Bridge
• Poisson distribution

Summarizing a r.v.'s Distribution
• Suppose a discrete random variable has the following distribution. How would you summarize this distribution?
• Center
• Spread
• Shape
• The concept of expectation of a r.v. formalizes these ideas.

Definition: Expectation
• The expected value (or expectation, or mean), E(X), of a discrete random variable X is defined as (text, pp. 138–139):
  E(X) = Σ_{all x} x · P(X = x)
• Sometimes the mean of X is written as μ_X.
• Intuitively, what is E(X) measuring?
• It is the theoretical weighted average of the values of X, weighted by their probabilities (hence the name, mean).
• It is a measure of the "center" of a distribution, but says nothing about its spread or shape.
• Physically, it is the balance point of the PMF.

Expected Value Examples
• Concrete example: let X have the following distribution:
  x:        0    1    2
  P(X = x): 0.5  0.4  0.1
• Intuitively, what should be the mean of X? Calculate it.
  E(X) = Σ_{all x} x · P(X = x) = 0(0.5) + 1(0.4) + 2(0.1) = 0.6
• Let X ~ Bern(p). Intuitively, what should be the mean of X? Calculate it.
  E(X) = Σ_{all x} x · P(X = x) = 0(1 − p) + 1(p) = p

Non-uniqueness of Expectation
• Based on its definition, it can be shown that E(X) depends only on the distribution of X. So two r.v.s with the same distribution (same PMF) will have the same expected value.
• The converse is not true: two r.v.s could have the same expected value but completely different distributions.
• Example: X ~ Bin(n = 2, p = 0.5) and Y ~ DUnif(0, 1, 2) both have mean 1 but different PMFs.

Linearity of Expectation
• For any r.v.s X, Y and any constants a, b, linearity of expectation holds (text, p. 140):
  E(X + Y) = E(X) + E(Y)
  E(aX + b) = a·E(X) + b
• What do these results mean?
• The second equation says that we can take constant factors out of an expectation; this is both intuitively reasonable and easily verified from the definition.
• The first equation, E(X + Y) = E(X) + E(Y), also seems reasonable when X and Y are independent. What may be surprising is that it holds even if X and Y are dependent!

Why E(X + Y) = E(X) + E(Y)
• We can equivalently calculate expectations based on the sum over all outcomes s in S directly. Thus:
  E(X) = Σ_{all s} X(s) · P({s})
• What the heck does X(s) mean? How about P({s})?
• So if X and Y are both defined on the outcomes (key: they have to be if they are dependent), then:
  E(X) + E(Y) = Σ_{all s} X(s)·P({s}) + Σ_{all s} Y(s)·P({s}) = Σ_{all s} (X + Y)(s)·P({s}) = E(X + Y)
• An example is worth a thousand words…

Linearity of Expectation is handy
• Let X ~ Bin(n, p). Intuitively, what should be the mean of X? Calculate it.
• Two ways to do this:
  1) Brute force: E(X) = Σ_{k=0}^n k·C(n, k)·p^k·(1 − p)^{n−k}, which takes some algebra to reduce to np.
  2) Applying linearity (much easier): write X = I₁ + … + I_n, where I_j ~ Bern(p) indicates success on trial j; then E(X) = E(I₁) + … + E(I_n) = np.

Linearity of Expectation is handy
• Let X ~ HGeom(w, b, n). Intuitively, what should be the mean of X? Calculate it. Hint: define X as a sum of dependent Bernoulli r.v.s.
• Writing X = I₁ + … + I_n, where I_j indicates that draw j is white, each I_j ~ Bern(w/(w + b)) by symmetry, so by linearity E(X) = nw/(w + b), even though the I_j are dependent.
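Linearity with dependent indicators is easy to check numerically. Here is a minimal Python sketch (the urn sizes w = 6, b = 14 and sample size n = 5 are arbitrary choices for illustration): it simulates draws without replacement and compares the sample mean with nw/(w + b).

```python
import random

# X ~ HGeom(w, b, n): number of white balls in a sample of size n drawn
# without replacement from w white and b black balls. Each draw is
# marginally Bern(w/(w+b)) even though the draws are dependent, so
# linearity gives E(X) = n*w/(w+b).
w, b, n = 6, 14, 5             # arbitrary illustrative parameters
urn = [1] * w + [0] * b        # 1 = white, 0 = black

trials = 100_000
total = sum(sum(random.sample(urn, n)) for _ in range(trials))

print("simulated mean:", total / trials)     # ~ 1.5
print("n*w/(w+b):     ", n * w / (w + b))    # exactly 1.5
```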
Monotonicity of Expectation
• Let X and Y be r.v.s such that X ≥ Y with probability 1. Then E(X) ≥ E(Y), with equality holding if and only if X = Y with probability 1.
• Proof:
• We will prove it only for discrete r.v.s (but it holds for all r.v.s). The r.v. Z = X − Y is nonnegative (with probability 1), so E(Z) ≥ 0, since E(Z) is defined as a sum of nonnegative terms. By linearity, E(X) − E(Y) = E(X − Y) ≥ 0, as desired.
• If E(X) = E(Y), then by linearity we also have E(Z) = 0, which implies that P(X = Y) = P(Z = 0) = 1, since if even one term in the sum defining E(Z) were positive, the whole sum would be positive.
• Monotonicity is not as useful as linearity.

Law of the Unconscious Statistician (LOTUS)
• If X is a discrete r.v. and g is a function from ℝ to ℝ, then (text, p. 156):
  E[g(X)] = Σ_{all x} g(x) · P(X = x)
• What does this mean? It means that we can get the expected value of g(X) knowing only P(X = x), the PMF of X; we don't need to know the PMF of g(X).
• The name comes from the fact that in going from E(X) to E[g(X)] it is tempting just to change x to g(x) in the definition, which can be done very easily and mechanically, perhaps in a state of unconsciousness.
• Be careful: E[g(X)] does not necessarily equal g(E(X))!

Example to illustrate LOTUS
• Concrete example: let X have the following distribution:
  x:        −2   −1   0    1    2
  P(X = x): 0.1  0.2  0.3  0.2  0.2
• Let Y = X². Find E(Y) two ways: based on the distribution of Y, and using LOTUS.
• Why does it work out to be the same answer?

Variance and Standard Deviation
• The variance of a r.v. X is defined as (text, p. 158):
  Var(X) = E[(X − μ)²]
  where μ = E(X). Sometimes the variance of X is written as σ²_X.
• The square root of the variance is called the standard deviation:
  SD(X) = √Var(X)
• What is variance measuring? What are its units?
• What is standard deviation measuring? What are its units?
• Which is more interpretable?

Variance: an equivalent formula
• For any r.v. X:
  Var(X) = E(X²) − μ²
  where μ = E(X).
• Proof (expand (X − μ)² and use linearity):
  Var(X) = E[(X − μ)²] = E[X² − 2μX + μ²] = E(X²) − 2μ·E(X) + μ² = E(X²) − 2μ² + μ² = E(X²) − μ²
• This result is often useful when calculating the variance of a r.v.

Variance Examples
• Concrete example: let X have the following distribution:
  x:        0    1    2
  P(X = x): 0.5  0.4  0.1
• What is the variance of X? What is its standard deviation?
  Var(X) = E[(X − μ)²] = Σ_{all x} (x − μ)²·P(X = x)
         = (0 − 0.6)²(0.5) + (1 − 0.6)²(0.4) + (2 − 0.6)²(0.1) = 0.44
  SD(X) = √Var(X) = √0.44 ≈ 0.663
• Let X ~ Bern(p). Calculate Var(X).
  E(X²) = Σ_{all x} x²·P(X = x) = 0²(1 − p) + 1²(p) = p
  Var(X) = E(X²) − [E(X)]² = p − p² = p(1 − p)
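The LOTUS example and the variance examples above are easy to verify numerically. This short Python sketch uses exactly the two PMFs from the slides; no other assumptions are needed.

```python
# Check LOTUS and the shortcut formula Var(X) = E(X^2) - mu^2
# on the two concrete PMFs from the slides above.

# LOTUS example: P(X = x) for x in {-2, -1, 0, 1, 2}; Y = X^2.
pmf = {-2: 0.1, -1: 0.2, 0: 0.3, 1: 0.2, 2: 0.2}
E_Y_lotus = sum(x**2 * p for x, p in pmf.items())        # LOTUS: sum of g(x) P(X=x)

pmf_Y = {}                                               # build the PMF of Y itself
for x, p in pmf.items():
    pmf_Y[x**2] = pmf_Y.get(x**2, 0) + p
E_Y_direct = sum(y * p for y, p in pmf_Y.items())
print(E_Y_lotus, E_Y_direct)                             # both 1.6

# Variance example: P(X = x) for x in {0, 1, 2}.
pmf2 = {0: 0.5, 1: 0.4, 2: 0.1}
mu = sum(x * p for x, p in pmf2.items())                 # 0.6
var_def = sum((x - mu)**2 * p for x, p in pmf2.items())  # E[(X - mu)^2]
var_shortcut = sum(x**2 * p for x, p in pmf2.items()) - mu**2
print(var_def, var_shortcut)                             # both 0.44
```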
Properties of Variance
• Var(X + c) = Var(X)
• Var(cX) = c²·Var(X)
• Var(X) ≥ 0
• If X and Y are independent, then Var(X + Y) = Var(X) + Var(Y)
• Why do these properties make sense, intuitively?
• The 4th property we will prove later (we need to define covariance first). But intuitively, what should Var(X + X) be?
• When is Var(X) = 0?

Variance of a Binomial r.v.
• Let X ~ Bin(n, p). What is Var(X)?
• Two ways to do this:
  1) Brute force: apply LOTUS to the Bin(n, p) PMF (messy).
  2) Using properties of variance (much easier): let X_i ~ Bern(p), i.i.d.; then X = X₁ + … + X_n, and by independence Var(X) = n·Var(X₁) = np(1 − p).
• How should the variance of a Hypergeometric r.v. compare?

Story of the Geometric Distribution
• Consider a sequence of independent Bernoulli trials, each with the same success probability p, with trials performed until a success occurs. Let X be the number of failures before the first successful trial. Then X has the Geometric distribution with parameter p; we denote this by X ~ Geom(p).
• What is X's distribution? That is, what is the probability mass function of X (don't forget to mention X's support)?

Geometric Distribution Definition
• If X ~ Geom(p), then the PMF of X is:
  P(X = k) = (1 − p)^k · p = q^k · p, for k = 0, 1, 2, …
  where q = 1 − p.
• Why is this a valid PMF?
• Key (recall your geometric series):
  Σ_{k=0}^∞ q^k = 1/(1 − q)
• Note: the First Success distribution, FS(p), counts the trial of the first success as well; an FS(p) r.v. is a Geom(p) r.v. plus one.

[Figure: PMF and CDF of the Geom(0.5) distribution]

The Geometric Distribution: Mean and Variance
• Let X ~ Geom(p). Find E(X) and Var(X). Hint: take the derivative of the geometric series w.r.t. q.
  Σ_{k=0}^∞ k·q^{k−1} = d/dq [Σ_{k=0}^∞ q^k] = d/dq [1/(1 − q)] = 1/(1 − q)² = 1/p²
  E(X) = Σ_{k=0}^∞ k·P(X = k) = Σ_{k=0}^∞ k·q^k·p = pq·Σ_{k=0}^∞ k·q^{k−1} = pq/p² = q/p
• Var(X) = E(X²) − μ². Hint: differentiate q times the differentiated geometric series w.r.t. q.
  Σ_{k=0}^∞ k²·q^{k−1} = d/dq [q/(1 − q)²] = (1 + q)/(1 − q)³ = (1 + q)/p³
  E(X²) = Σ_{k=0}^∞ k²·P(X = k) = p·Σ_{k=0}^∞ k²·q^k = pq·(1 + q)/p³ = q(1 + q)/p²
  Var(X) = E(X²) − [E(X)]² = q(1 + q)/p² − q²/p² = q/p²

Story of the Negative Binomial Dist.
• The Negative Binomial distribution is just an extension of the Geometric distribution: instead of waiting for just one success, we wait for a known, predetermined number, r, of successes.
• So in a sequence of independent Bernoulli trials with success probability p, if X is the number of failures before the r-th success, then X is said to have the Negative Binomial distribution with parameters r and p, denoted X ~ NBin(r, p).
• What is X's distribution? That is, what is the probability mass function of X (don't forget to mention X's support)?

Negative Binomial Dist. Definition
• If X ~ NBin(r, p), then the PMF of X is:
  P(X = n) = C(n + r − 1, r − 1) · p^r · (1 − p)^n, for n = 0, 1, 2, …
• How does the Negative Binomial distribution relate to the Geometric distribution?
• Let X_i ~ Geom(p), i.i.d., and let X = X₁ + X₂ + … + X_r.
• What distribution does X have?
• So a Negative Binomial r.v. is a sum of i.i.d. Geometric r.v.s.

The Negative Binomial Distribution: Mean and Variance
• Let X ~ NBin(r, p). Find E(X) and Var(X). Hint: use the fact that a Negative Binomial r.v. is a sum of r i.i.d. Geom(p) r.v.s, so linearity gives E(X) = rq/p and independence gives Var(X) = rq/p². (A simulation check appears below.)

How everything relates…
• Below is a table of 4 different types of r.v.s based on sampling, where each observation can be considered the result of a Bernoulli trial (not necessarily independent). Here are the 4 types we have seen:

                              With replacement     Without replacement
  Fixed number of trials      Binomial             Hypergeometric
  Fixed number of successes   Negative Binomial    Negative Hypergeometric

• We won't talk about the Negative Hypergeometric, but realize there is such a thing (it rarely comes into play in practice).
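To make the "sum of i.i.d. Geometrics" story concrete, here is a small Python simulation (the values r = 5, p = 0.3 are arbitrary illustrative choices): it builds an NBin(r, p) draw from r Geometric draws and checks the mean rq/p and variance rq/p² derived above.

```python
import random

# Simulate X ~ NBin(r, p) as a sum of r i.i.d. Geom(p) r.v.s (each counting
# failures before a success), then compare the sample mean and variance
# with E(X) = r*q/p and Var(X) = r*q/p^2.
def geom(p):
    """Number of failures before the first success in Bernoulli(p) trials."""
    failures = 0
    while random.random() >= p:   # a trial fails with probability q = 1 - p
        failures += 1
    return failures

r, p = 5, 0.3                     # arbitrary illustrative parameters
q = 1 - p
samples = [sum(geom(p) for _ in range(r)) for _ in range(100_000)]

mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
print("sample mean:", mean, "  r*q/p:  ", r * q / p)       # ~ 11.67
print("sample var: ", var,  "  r*q/p^2:", r * q / p**2)    # ~ 38.89
```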
Properties of Indicator Variables
• Recall from Unit 3 that an indicator r.v., I_A or I(A), for an event A takes on the value 1 if A occurs and 0 if A does not occur.
• Indicator r.v.s have the following properties (for events A and B):
• (I_A)^k = I_A for any positive integer k
• I_{Aᶜ} = 1 − I_A
• I_{A∩B} = I_A·I_B
• I_{A∪B} = I_A + I_B − I_A·I_B

The Fundamental Bridge
• There is a one-to-one correspondence between events and indicator r.v.s, and the probability of an event A is the expected value of its indicator r.v. I_A:
  P(A) = E(I_A)
• Proof: What distribution does I_A have? What is that distribution's expectation?

The Fundamental Bridge: Example 1
• In a group of n people, under the usual assumptions about birthdays, what is the expected number of distinct birthdays among the n people, i.e., the expected number of days on which at least one of the people was born?
• Hint: define a set of indicator variables I₁, …, I₃₆₅, one per day, and use the fundamental bridge and linearity.
• What is the expected number of birthday matches, i.e., pairs of people with the same birthday?
• Hint: define a set of indicator variables J₁, …, J_{C(n,2)}, one per pair of people, and use the fundamental bridge and linearity. (A simulation check of both answers appears after Example 2.)

The Fundamental Bridge: Example 2
• Let X be a nonnegative integer-valued r.v. Let F be the CDF of X, and G(x) = 1 − F(x) = P(X > x). The function G is called the survival function of X. Then:
  E(X) = Σ_{n=0}^∞ G(n)
• Let X ~ Geom(p), and q = 1 − p. Using the Geometric story, X > n is the event that the first n + 1 trials are all failures, so P(X > n) = q^{n+1}.
• So, by expectation via the survival function:
  E(X) = Σ_{n=0}^∞ P(X > n) = Σ_{n=0}^∞ q^{n+1} = q/(1 − q) = q/p
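Here is the promised Python simulation of Example 1. The closed forms it checks, 365·(1 − (364/365)^n) expected distinct birthdays and C(n, 2)/365 expected matches, are what the indicator argument gives; the group size n = 50 is an arbitrary choice for illustration.

```python
import random
from math import comb

# Fundamental bridge + linearity, checked by simulation:
# each day's indicator has expectation 1 - (364/365)**n, and each
# pair's match indicator has expectation 1/365.
n = 50                       # arbitrary group size
trials = 20_000
distinct = matches = 0
for _ in range(trials):
    bdays = [random.randrange(365) for _ in range(n)]
    distinct += len(set(bdays))
    # a day shared by c people contributes C(c, 2) matching pairs
    matches += sum(comb(c, 2) for c in (bdays.count(d) for d in set(bdays)))

print("distinct:", distinct / trials, 365 * (1 - (364 / 365) ** n))  # ~ 46.8
print("matches: ", matches / trials, comb(n, 2) / 365)               # ~ 3.36
```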
Story of the Poisson Distribution
• Imagine you are trying to determine the number of occurrences ("successes") of a certain rare type of cancer (melanoma) in a large population (like the state of Massachusetts) over a fixed period of time (say, a year).
• What would be the Binomial distribution used to model this situation?
• The Poisson distribution is instead often used in situations like this, where we are counting the number of successes in a particular region or interval of time, and there are a large number of trials, each with a small probability of success.

Poisson Distribution Definition
• A r.v. X has the Poisson distribution with parameter λ, written X ~ Pois(λ), if the PMF of X is:
  P(X = k) = e^{−λ}·λ^k / k!, for k = 0, 1, 2, …
• This is a valid PMF because of the Taylor series:
  Σ_{k=0}^∞ λ^k / k! = e^λ

[Figure: PMFs and CDFs of Pois(2) (top) and Pois(5) (bottom)]

The Poisson Distribution: Examples
• The following random variables could reasonably follow a distribution that is approximately Poisson:
• The number of emails you receive in an hour. There are a lot of people who could potentially email you in that hour, but it is unlikely that any specific person will actually email you in that hour. Alternatively, imagine subdividing the hour into milliseconds: there are 3.6×10⁶ milliseconds in an hour, but in any specific millisecond it is unlikely that you will get an email.
• The number of chips in a chocolate chip cookie. Imagine subdividing the cookie into small cubes; the probability of getting a chocolate chip in a single cube is small, but the number of cubes is large.
• The parameter λ is interpreted as the rate of occurrence of these rare events; in the examples above, λ could be 20 (emails per hour) or 10 (chips per cookie).

The Poisson Distribution: Mean and Variance
• Let X ~ Pois(λ). Find E(X) and Var(X). (Both turn out to equal λ; a numerical check appears at the end of the deck.)

Poisson Distribution: Concrete Example
• The Boston Bruins score on average 3 goals per 60-minute game. Let X = goals scored in the next Bruins game.
• Is it reasonable to assume X follows a Poisson distribution?
• What should be the value of λ?
• What is the probability that they get shut out in a game?
• Note: more on the Poisson in the notes for next week…

Last Word: what is a Poisson r.v.?
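As a closing numerical check of the Bruins example: a minimal Python sketch with λ = 3 computes the shutout probability P(X = 0) = e^{−3} and confirms that the mean and variance of the Pois(λ) PMF are both λ (the sums are truncated at k = 60, which is plenty for λ = 3).

```python
from math import exp, factorial

# Poisson check for the Bruins example: lambda = 3 goals per game.
lam = 3.0
pmf = lambda k: exp(-lam) * lam**k / factorial(k)

print("P(X = 0) =", pmf(0))                       # e^{-3} ~ 0.0498: shutout
mean = sum(k * pmf(k) for k in range(60))         # truncated sum, tail negligible
var = sum(k**2 * pmf(k) for k in range(60)) - mean**2
print("mean:", mean, " variance:", var)           # both ~ 3.0 = lambda
```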