Notes on Probability

Peter J. Cameron

Preface

Here are the course lecture notes for the course MAS108, Probability I, at Queen Mary, University of London, taken by most Mathematics students and some others in the first semester. The description of the course is as follows:

This course introduces the basic notions of probability theory and develops them to the stage where one can begin to use probabilistic ideas in statistical inference and modelling, and the study of stochastic processes. Probability axioms. Conditional probability and independence. Discrete random variables and their distributions. Continuous distributions. Joint distributions. Independence. Expectations. Mean, variance, covariance, correlation. Limiting distributions.

The syllabus is as follows:

1. Basic notions of probability. Sample spaces, events, relative frequency, probability axioms.
2. Finite sample spaces. Methods of enumeration. Combinatorial probability.
3. Conditional probability. Theorem of total probability. Bayes' theorem.
4. Independence of two events. Mutual independence of n events. Sampling with and without replacement.
5. Random variables. Univariate distributions: discrete, continuous, mixed. Standard distributions: hypergeometric, binomial, geometric, Poisson, uniform, normal, exponential. Probability mass function, density function, distribution function. Probabilities of events in terms of random variables.
6. Transformations of a single random variable. Mean, variance, median, quantiles.
7. Joint distribution of two random variables. Marginal and conditional distributions. Independence.
8. Covariance, correlation. Means and variances of linear functions of random variables.
9. Limiting distributions in the Binomial case.

These course notes explain the material in the syllabus. They have been "field-tested" on the class of 2000. Many of the examples are taken from the course homework sheets or past exam papers.

Set books

The notes cover only material in the Probability I course. The text books listed below will be useful for other courses on probability and statistics. You need at most one of the three textbooks listed below, but you will need the statistical tables.

• Probability and Statistics for Engineering and the Sciences by Jay L. Devore (fifth edition), published by Wadsworth.

Chapters 2–5 of this book are very close to the material in the notes, both in order and notation. However, the lectures go into more detail at several points, especially proofs. If you find the course difficult then you are advised to buy this book, read the corresponding sections straight after the lectures, and do extra exercises from it.

Other books which you can use instead are:

• Probability and Statistics in Engineering and Management Science by W. W. Hines and D. C. Montgomery, published by Wiley, Chapters 2–8.
• Mathematical Statistics and Data Analysis by John A. Rice, published by Wadsworth, Chapters 1–4.

You should also buy a copy of

• New Cambridge Statistical Tables by D. V. Lindley and W. F. Scott, published by Cambridge University Press.

You need to become familiar with the tables in this book, which will be provided for you in examinations. All of these books will also be useful to you in the courses Statistics I and Statistical Inference.

The next book is not compulsory but introduces the ideas in a friendly way:

• Taking Chances: Winning with Probability, by John Haigh, published by Oxford University Press.

Web resources

Course material for the MAS108 course is kept on the Web at the address

http://www.maths.qmw.ac.uk/~pjc/MAS108/

This includes a preliminary version of these notes, together with coursework sheets, test and past exam papers, and some solutions.
Other web pages of interest include:

http://www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_book/pdf.html
A textbook, Introduction to Probability, by Charles M. Grinstead and J. Laurie Snell, available free, with many exercises.

http://www.math.uah.edu/stat/
The Virtual Laboratories in Probability and Statistics, a set of web-based resources for students and teachers of probability and statistics, where you can run simulations etc.

http://www.newton.cam.ac.uk/wmy2kposters/july/
The Birthday Paradox (poster in the London Underground, July 2000).

http://www.combinatorics.org/Surveys/ds5/VennEJC.html
An article on Venn diagrams by Frank Ruskey, with history and many nice pictures.

Web pages for other Queen Mary maths courses can be found from the on-line version of the Maths Undergraduate Handbook.

Peter J. Cameron
December 2000

Contents

1 Basic ideas
1.1 Sample space, events
1.2 What is probability?
1.3 Kolmogorov's Axioms
1.4 Proving things from the axioms
1.5 Inclusion-Exclusion Principle
1.6 Other results about sets
1.7 Sampling
1.8 Stopping rules
1.9 Questionnaire results
1.10 Independence
1.11 Mutual independence
1.12 Properties of independence
1.13 Worked examples

2 Conditional probability
2.1 What is conditional probability?
2.2 Genetics
2.3 The Theorem of Total Probability
2.4 Sampling revisited
2.5 Bayes' Theorem
2.6 Iterated conditional probability
2.7 Worked examples

3 Random variables
3.1 What are random variables?
3.2 Probability mass function
3.3 Expected value and variance
3.4 Joint p.m.f. of two random variables
3.5 Some discrete random variables
3.6 Continuous random variables
3.7 Median, quartiles, percentiles
3.8 Some continuous random variables
3.9 On using tables
3.10 Worked examples

4 More on joint distribution
4.1 Covariance and correlation
4.2 Conditional random variables
4.3 Joint distribution of continuous r.v.s
4.4 Transformation of random variables
4.5 Worked examples
A Mathematical notation

B Probability and random variables

Chapter 1

Basic ideas

In this chapter, we don't really answer the question 'What is probability?' Nobody has a really good answer to this question. We take a mathematical approach, writing down some basic axioms which probability must satisfy, and making deductions from these. We also look at different kinds of sampling, and examine what it means for events to be independent.

1.1 Sample space, events

The general setting is: We perform an experiment which can have a number of different outcomes. The sample space is the set of all possible outcomes of the experiment. We usually call it S.

It is important to be able to list the outcomes clearly. For example, if I plant ten bean seeds and count the number that germinate, the sample space is

S = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}.

If I toss a coin three times and record the result, the sample space is

S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT},

where (for example) HTH means 'heads on the first toss, then tails, then heads again'.

Sometimes we can assume that all the outcomes are equally likely. (Don't assume this unless either you are told to, or there is some physical reason for assuming it. In the beans example, it is most unlikely. In the coins example, the assumption will hold if the coin is 'fair': this means that there is no physical reason for it to favour one side over the other.) If all outcomes are equally likely, then each has probability 1/|S|. (Remember that |S| is the number of elements in the set S.)

On this point, Albert Einstein wrote, in his 1905 paper On a heuristic point of view concerning the production and transformation of light (for which he was awarded the Nobel Prize):

    In calculating entropy by molecular-theoretic methods, the word "probability" is often used in a sense differing from the way the word is defined in probability theory. In particular, "cases of equal probability" are often hypothetically stipulated when the theoretical methods employed are definite enough to permit a deduction rather than a stipulation.

In other words: Don't just assume that all outcomes are equally likely, especially when you are given enough information to calculate their probabilities!

An event is a subset of S. We can specify an event by listing all the outcomes that make it up. In the above example, let A be the event 'more heads than tails' and B the event 'heads on last throw'. Then

A = {HHH, HHT, HTH, THH},
B = {HHH, HTH, THH, TTH}.

The probability of an event is calculated by adding up the probabilities of all the outcomes comprising that event. So, if all outcomes are equally likely, we have

P(A) = |A| / |S|.

In our example, both A and B have probability 4/8 = 1/2.

An event is simple if it consists of just a single outcome, and is compound otherwise. In the example, A and B are compound events, while the event 'heads on every throw' is simple (as a set, it is {HHH}). If A = {a} is a simple event, then the probability of A is just the probability of the outcome a, and we usually write P(a), which is simpler to write than P({a}). (Note that a is an outcome, while {a} is an event, indeed a simple event.)

We can build new events from old ones:

• A ∪ B (read 'A union B') consists of all the outcomes in A or in B (or both);
• A ∩ B (read 'A intersection B') consists of all the outcomes in both A and B;
• A \ B (read 'A minus B') consists of all the outcomes in A but not in B;
• A′ (read 'A complement') consists of all outcomes not in A (that is, S \ A);
• ∅ (read 'empty set') for the event which doesn't contain any outcomes.

Note the backward-sloping slash \ ; this is not the same as either a vertical slash | or a forward slash /.
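These small sample spaces can be checked by brute force. Here is a quick sketch in Python (the events A and B are those of the example above; equally likely outcomes are assumed, so P(E) = |E|/|S|):

```python
from itertools import product
from fractions import Fraction

# All eight outcomes of three tosses of a fair coin, e.g. 'HTH'.
S = {''.join(t) for t in product('HT', repeat=3)}

def P(E):
    # Equally likely outcomes, so P(E) = |E| / |S|.
    return Fraction(len(E), len(S))

A = {s for s in S if s.count('H') > s.count('T')}   # more heads than tails
B = {s for s in S if s.endswith('H')}               # heads on the last throw

print(sorted(A), P(A))   # ['HHH', 'HHT', 'HTH', 'THH'] 1/2
print(sorted(B), P(B))   # ['HHH', 'HTH', 'THH', 'TTH'] 1/2
print(P(A & B))          # 3/8, used again just below
```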
0 In the example, A is the event ‘more tails than heads’, and A∩B is the event HHH,THH,HTH. Note that P(A∩B)=3/8; this is not equal to P(A)·P(B), despitewhatyoureadinsomebooks 1.2 Whatisprobability Thereisreallynoanswertothisquestion. Somepeoplethinkofitas‘limitingfrequency’. Thatis,tosaythattheproba bilityofgettingheadswhenacoinistossedmeansthat,ifthecoinistossedmany times, it is likely to come down heads about half the time. But if you toss a coin 1000times,youarenotlikelytogetexactly500heads. Youwouldn’tbesurprised togetonly495. Butwhatabout450,or100 Some people would say that you can work out probability by physical argu ments, like the one we used for a fair coin. But this argument doesn’t work in all cases,anditdoesn’texplainwhatprobabilitymeans. Some people say it is subjective. You say that the probability of heads in a cointossis1/2becauseyouhavenoreasonforthinkingeitherheadsortailsmore likely; you might change your view if you knew that the owner of the coin was a magicianoraconman. Butwecan’tbuildatheoryonsomethingsubjective. Weregardprobabilityasamathematicalconstructionsatisfyingsomeaxioms (devised by the Russian mathematician A. N. Kolmogorov). We develop ways of doing calculations with probability, so that (for example) we can calculate how unlikely it is to get 480 or fewer heads in 1000 tosses of a fair coin. The answer agreeswellwithexperiment. 1.3 Kolmogorov’sAxioms Remember that an event is a subset of the sample space S. A number of events, say A ,A ,..., are called mutually disjoint or pairwise disjoint if A ∩A =0/ for 1 2 i j anytwooftheeventsA andA ;thatis,notwooftheeventsoverlap. i j According to Kolmogorov’s axioms, each event A has a probability P(A), whichisanumber. Thesenumberssatisfythreeaxioms: Axiom1: ForanyeventA,wehaveP(A)≥0. Axiom2: P(S)=1.4 CHAPTER1. BASICIDEAS Axiom3: IftheeventsA ,A ,...arepairwisedisjoint,then 1 2 P(A ∪A ∪···)=P(A )+P(A )+··· 1 2 1 2 Note that in Axiom 3, we have the union of events and the sum of numbers. Don’t mix these up; never write P(A )∪P(A ), for example. Sometimes we sep 1 2 arate Axiom 3 into two parts: Axiom 3a if there are only finitely many events A ,A ,...,A ,sothatwehave 1 2 n n P(A ∪···∪A )= P(A ), 1 n i ∑ i=1 andAxiom3bforinfinitelymany. WewillonlyuseAxiom3a,but3bisimportant lateron. Noticethatwewrite n P(A ) i ∑ i=1 for P(A )+P(A )+···+P(A ). n 1 2 1.4 Provingthingsfromtheaxioms You can prove simple properties of probability from the axioms. That means, every step must be justified by appealing to an axiom. These properties seem obvious,justasobviousastheaxioms;butthepointofthisgameisthatweassume onlytheaxioms,andbuildeverythingelsefromthat. Herearesomeexamplesofthingsprovedfromtheaxioms. Thereisreallyno difference between a theorem, a proposition, and a corollary; they all have to be proved. Usually, a theorem is a big, important statement; a proposition a rather smaller statement; and a corollary is something that follows quite easily from a theoremorpropositionthatcamebefore. Proposition1.1 If the event A contains only a finite number of outcomes, say A=a ,a ,...,a ,then 1 2 n P(A)=P(a )+P(a )+···+P(a ). 1 2 n To prove the proposition, we define a new event A containing only the out i come a, that is, A =a, for i=1,...,n. Then A ,...,A are mutually disjoint i i i 1 n1.4. PROVINGTHINGSFROMTHEAXIOMS 5 (each contains only one element which is in none of the others), and A ∪A ∪ 1 2 ···∪A =A;sobyAxiom3a,wehave n P(A)=P(a )+P(a )+···+P(a ). 
1 2 n Corollary1.2 IfthesamplespaceS isfinite,sayS =a ,...,a ,then 1 n P(a )+P(a )+···+P(a )=1. 1 2 n For P(a )+P(a )+···+P(a )=P(S) by Proposition 1.1, and P(S)=1 by 1 2 n Axiom 2. Notice that once we have proved something, we can use it on the same basisasanaxiomtoprovefurtherfacts. Now we see that, if all the n outcomes are equally likely, and their probabil ities sum to 1, then each has probability 1/n, that is, 1/S. Now going back to Proposition1.1,weseethat,ifalloutcomesareequallylikely,then A P(A)= S foranyeventA,justifyingtheprincipleweusedearlier. 0 Proposition1.3 P(A )=1−P(A)foranyeventA. 0 LetA =AandA =A (thecomplementofA). ThenA ∩A =0/ (thatis,the 1 2 1 2 eventsA andA aredisjoint),andA ∪A =S. So 1 2 1 2 P(A )+P(A ) = P(A ∪A ) (Axiom3) 1 2 1 2 = P(S) = 1 (Axiom2). SoP(A)=P(A )=1−P(A ). 1 2 Corollary1.4 P(A)≤1foranyeventA. 0 0 For 1−P(A)=P(A ) by Proposition 1.3, and P(A )≥0 by Axiom 1; so 1− P(A)≥0,fromwhichwegetP(A)≤1. Remember that if you ever calculate a probability to be less than 0 or more than1,youhavemadeamistake Corollary1.5 P(0/)=0. 0 / / For0=S ,soP(0)=1−P(S)byProposition1.3;andP(S)=1byAxiom2, soP(0/)=0.6 CHAPTER1. BASICIDEAS Hereisanotherresult. ThenotationA⊆BmeansthatAiscontainedinB,that is,everyoutcomeinAalsobelongstoB. Proposition1.6 IfA⊆B,thenP(A)≤P(B). This time, take A = A, A = B\A. Again we have A ∩A =0/ (since the 1 2 1 2 elementsofB\Aare,bydefinition,notinA),andA ∪A =B. SobyAxiom3, 1 2 P(A )+P(A )=P(A ∪A )=P(B). 1 2 1 2 Inotherwords,P(A)+P(B\A)=P(B). NowP(B\A)≥0byAxiom1;so P(A)≤P(B), aswehadtoshow. 1.5 InclusionExclusionPrinciple  A B  A Venn diagram for two sets A and B suggests that, to find the size of A∪B, weaddthesizeofAandthesizeofB,butthenwehaveincludedthesizeofA∩B twice,sowehavetotakeitoff. Intermsofprobability: Proposition1.7 P(A∪B)=P(A)+P(B)−P(A∩B). We now prove this from the axioms, using the Venn diagram as a guide. We seethatA∪Bismadeupofthreeparts,namely A =A∩B, A =A\B, A =B\A. 1 2 3 Indeed we do have A∪B=A ∪A ∪A , since anything in A∪B is in both these 1 2 3 setsorjustthefirstorjustthesecond. SimilarlywehaveA ∪A =AandA ∪A = 1 2 1 3 B. ThesetsA ,A ,A aremutuallydisjoint. (Wehavethreepairsofsetstocheck. 1 2 3 / Now A ∩A =0, since all elements of A belong to B but no elements of A do. 1 2 1 2 Theargumentsfortheothertwopairsaresimilar–youshoulddothemyourself.)1.6. OTHERRESULTSABOUTSETS 7 So,byAxiom3,wehave P(A) = P(A )+P(A ), 1 2 P(B) = P(A )+P(A ), 1 3 P(A∪B) = P(A )+P(A )+P(A ). 1 2 3 Fromthisweobtain P(A)+P(B)−P(A∩B) = (P(A )+P(A ))+(P(A )+P(A ))−P(A ) 1 2 1 3 1 = P(A )+P(A )+P(A ) 1 2 3 = P(A∪B) asrequired. The InclusionExclusion Principle extends to more than two events, but gets morecomplicated. Hereitisforthreeevents;trytoproveityourself.  C   A B  Tocalculate P(A∪B∪C), wefirstaddup P(A), P(B), and P(C). Thepartsin commonhavebeencountedtwice,sowesubtractP(A∩B),P(A∩C)andP(B∩C). But then we find that the outcomes lying in all three sets have been taken off completely,somustbeputback,thatis,weaddP(A∩B∩C). Proposition1.8 ForanythreeeventsA,B,C,wehave P(A∪B∪C)=P(A)+P(B)+P(C)−P(A∩B)−P(A∩C)−P(B∩C)+P(A∩B∩C). Canyouextendthistoanynumberofevents 1.6 Otherresultsaboutsets There are other standard results about sets which are often useful in probability theory. Herearesomeexamples. Proposition1.9 LetA,B,C besubsetsofS. Distributivelaws: (A∩B)∪C =(A∪C)∩(B∪C)and (A∪B)∩C =(A∩C)∪(B∩C). 0 0 0 0 0 0 DeMorgan’sLaws: (A∪B) =A ∩B and (A∩B) =A ∪B. We will not give formal proofs of these. You should draw Venn diagrams and convinceyourselfthattheywork.8 CHAPTER1. 
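All of these consequences of the axioms can be confirmed numerically on a small sample space. The following Python sketch (illustrative events on the three-coin sample space, equally likely outcomes assumed) checks the complement rule and the Inclusion-Exclusion Principle for two and for three events:

```python
from itertools import product
from fractions import Fraction

S = {''.join(t) for t in product('HT', repeat=3)}   # three tosses of a fair coin
P = lambda E: Fraction(len(E), len(S))               # equally likely outcomes

A = {s for s in S if s.count('H') >= 2}   # at least two heads
B = {s for s in S if s[0] == 'H'}         # heads on the first toss
C = {s for s in S if s[2] == 'H'}         # heads on the last toss

# Proposition 1.3: P(A') = 1 - P(A).
assert P(S - A) == 1 - P(A)

# Proposition 1.7: P(A ∪ B) = P(A) + P(B) - P(A ∩ B).
assert P(A | B) == P(A) + P(B) - P(A & B)

# Proposition 1.8: Inclusion-Exclusion for three events.
assert P(A | B | C) == (P(A) + P(B) + P(C)
                        - P(A & B) - P(A & C) - P(B & C)
                        + P(A & B & C))

print("complement and inclusion-exclusion checks pass")
```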
BASICIDEAS 1.7 Sampling Ihavefourpensinmydeskdrawer;theyarered,green,blue,andpurple. Idrawa pen;eachpenhasthesamechanceofbeingselected. Inthiscase,S =R,G,B,P, where R means ‘red pen chosen’ and so on. In this case, if A is the event ‘red or greenpenchosen’,then A 2 1 P(A)= = = . S 4 2 More generally, if I have a set of n objects and choose one, with each one equally likely to be chosen, then each of the n outcomes has probability 1/n, and aneventconsistingofmoftheoutcomeshasprobabilitym/n. Whatifwechoosemorethanonepen Wehavetobemorecarefultospecify thesamplespace. First,wehavetosaywhetherweare • samplingwithreplacement,or • samplingwithoutreplacement. Sampling with replacement means that we choose a pen, note its colour, put it back and shake the drawer, then choose a pen again (which may be the same penasbeforeoradifferentone),andsoonuntiltherequirednumberofpenshave beenchosen. Ifwechoosetwopenswithreplacement,thesamplespaceis RR, RG, RB, RP, GR, GG, GB, GP, BR, BG, BB, BP, PR, PG, PB, PP The event ‘at least one red pen’ isRR,RG,RB,RP,GR,BR,PR, and has proba bility7/16. Sampling without replacement means that we choose a pen but do not put it back, so that our final selection cannot include two pens of the same colour. In thiscase,thesamplespaceforchoosingtwopensis RG, RB, RP, GR, GB, GP, BR, BG, BP, PR, PG, PB and the event ‘at least one red pen’ isRG,RB,RP,GR,BR,PR, with probability 6/12=1/2.1.7. SAMPLING 9 Now there is another issue, depending on whether we care about the order in which the pens are chosen. We will only consider this in the case of sampling without replacement. It doesn’t really matter in this case whether we choose the pens one at a time or simply take two pens out of the drawer; and we are not interestedinwhichpenwaschosenfirst. Sointhiscasethesamplespaceis R,G,R,B,R,P,G,B,G,P,B,P, containingsixelements. (Eachelementiswrittenasasetsince,inaset,wedon’t carewhichelementisfirst,onlywhichelementsareactuallypresent. Sothesam plespaceisasetofsets) Theevent‘atleastoneredpen’isR,G,R,B,R,P, withprobability3/6=1/2. Weshouldnotbesurprisedthatthisisthesameasin thepreviouscase. There are formulae for the sample space size in these three cases. These in volvethefollowingfunctions: n = n(n−1)(n−2)···1 n P = n(n−1)(n−2)···(n−k+1) k n n C = P /k k k Notethatnistheproductofallthewholenumbersfrom1ton;and n n P = , k (n−k) sothat n n C = . k k(n−k) Theorem1.10 The number of selections of k objects from a set of n objects is giveninthefollowingtable. withreplacement withoutreplacement k n orderedsample n P k n unorderedsample C k n+k−1 In fact the number that goes in the empty box is C , but this is much k hardertoprovethantheothers,andyouareveryunlikelytoneedit. Here are the proofs of the other three cases. First, for sampling with replace ment and ordered sample, there are n choices for the first object, and n choices forthesecond,andsoon;wemultiplythechoicesfordifferentobjects. (Thinkof thechoicesasbeingdescribedbyabranchingtree.) Theproductofk factorseach k equaltonisn .10 CHAPTER1. BASICIDEAS Forsamplingwithoutreplacementandorderedsample,therearestillnchoices for the first object, but now only n−1 choices for the second (since we do not replacethefirst),andn−2forthethird,andsoon;therearen−k+1choicesfor the kth object, since k−1 have previously been removed and n−(k−1) remain. n Asbefore,wemultiply. Thisproductistheformulafor P . k Forsamplingwithoutreplacementandunorderedsample,thinkfirstofchoos n inganorderedsample,whichwecandoin P ways. 
Buteachunorderedsample k couldbeobtainedbydrawingitinkdifferentorders. Sowedividebyk,obtain n n ing P /k= C choices. k k 2 In our example with the pens, the numbers in the three boxes are 4 = 16, 4 4 P =12, and C =6, in agreement with what we got when we wrote them all 2 2 out. Note that, if we use the phrase ‘sampling without replacement, ordered sam ple’, or any other combination, we are assuming that all outcomes are equally likely. Example The names of the seven days of the week are placed in a hat. Three names are drawn out; these will be the days of the Probability I lectures. What is theprobabilitythatnolectureisscheduledattheweekend Here the sampling is without replacement, and we can take it to be either ordered or unordered; the answers will be the same. For ordered samples, the 7 size of the sample space is P =7·6·5 =210. If A is the event ‘no lectures at 3 weekends’, then A occurs precisely when all three days drawn are weekdays; so 5 A= P =5·4·3=60. Thus,P(A)=60/210=2/7. 3 5 7 Ifwedecidedtouseunorderedsamplesinstead,theanswerwouldbe C / C , 3 3 whichisonceagain2/7. Example Asixsideddieisrolledtwice. Whatistheprobabilitythatthesumof thenumbersisatleast10 This time we are sampling with replacement, since the two numbers may be 2 thesameordifferent. Sothenumberofelementsinthesamplespaceis6 =36. Toobtainasumof10ormore,thepossibilitiesforthetwonumbersare(4,6), (5,5), (6,4), (5,6), (6,5)or (6,6). Sotheprobabilityoftheeventis6/36=1/6. Example Aboxcontains20balls,ofwhich10areredand10areblue. Wedraw tenballsfromthebox,andweareinterestedintheeventthatexactly5oftheballs are red and 5 are blue. Do you think that this is more likely to occur if the draws aremadewithorwithoutreplacement Let S be the sample space, and A the event that five balls are red and five are blue.1.7. SAMPLING 11 10 Consider sampling with replacement. Then S = 20 . What is A The numberofwaysinwhichwecanchoosefirstfiveredballsandthenfiveblueones 5 5 10 (thatis,RRRRRBBBBB),is10 ·10 =10 . Buttherearemanyotherwaystoget five red and five blue balls. In fact, the five red balls could appear in any five of 10 the ten draws. This means that there are C =252 different patterns of five Rs 5 andfiveBs. Sowehave 10 A=252·10 , andso 10 252·10 P(A)= =0.246... 10 20 Nowconsidersamplingwithoutreplacement. Ifweregardthesampleasbeing 20 10 ordered, then S = P . There are P ways of choosing five of the ten red 10 5 balls, and the same for the ten blue balls, and as in the previous case there are 10 C patternsofredandblueballs. So 5 10 2 10 A=( P ) · C , 5 5 and 10 2 10 ( P ) · C 5 5 P(A)= =0.343... 20 P 10 20 10 If we regard the sample as being unordered, thenS= C . There are C 10 5 choicesofthefiveredballsandthesamefortheblueballs. Wenolongerhaveto countpatternssincewedon’tcareabouttheorderoftheselection. So 10 2 A=( C ) , 5 and 10 2 ( C ) 5 P(A)= =0.343... 20 C 10 Thisisthesameanswerasinthecasebefore,asitshouldbe;thequestiondoesn’t careaboutorderofchoices Sotheeventismorelikelyifwesamplewithreplacement. Example I have 6 gold coins, 4 silver coins and 3 bronze coins in my pocket. I takeoutthreecoinsatrandom. Whatistheprobabilitythattheyareallofdifferent material Whatistheprobabilitythattheyareallofthesamematerial Inthiscasethesamplingiswithoutreplacementandthesampleisunordered. 13 SoS = C =286. The event that the three coins are all of different material 3 canoccurin6·4·3=72ways,sincewemusthaveoneofthesixgoldcoins,and soon. Sotheprobabilityis72/286=0.252...12 CHAPTER1. 
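Counting arguments like these are easy to set up wrongly, so it is worth confirming the answers obtained so far by direct computation before finishing the example. A short Python sketch (math.comb and math.perm need Python 3.8 or later):

```python
from itertools import product
from math import comb, perm

# Lecture days: 3 of the 7 day names drawn, none falling at the weekend.
print(perm(5, 3) / perm(7, 3), comb(5, 3) / comb(7, 3))   # both equal 2/7

# Two rolls of a six-sided die: sum at least 10.
rolls = list(product(range(1, 7), repeat=2))
print(sum(a + b >= 10 for a, b in rolls), "/", len(rolls))   # 6 / 36

# Ten balls drawn from 10 red + 10 blue: exactly five of each colour.
with_repl    = comb(10, 5) * 10**5 * 10**5 / 20**10   # sampling with replacement
without_repl = comb(10, 5)**2 / comb(20, 10)          # sampling without replacement
print(round(with_repl, 4), round(without_repl, 4))    # 0.2461 and 0.3437

# Three coins from 6 gold, 4 silver, 3 bronze: all of different material.
print(round(6 * 4 * 3 / comb(13, 3), 3))              # 0.252
```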
BASICIDEAS Theeventthatthethreecoinsareofthesamematerialcanoccurin 6 4 3 C + C + C =20+4+1=25 3 3 3 ways,andtheprobabilityis25/286=0.087... Inasamplingproblem,youshouldfirstreadthequestioncarefullyanddecide whetherthesamplingiswithorwithoutreplacement. Ifitiswithoutreplacement, decide whether the sample is ordered (e.g. does the question say anything about the first object drawn). If so, then use the formula for ordered samples. If not, then you can use either ordered or unordered samples, whichever is convenient; they should give the same answer. If the sample is with replacement, or if it involves throwing a die or coin several times, then use the formula for sampling withreplacement. 1.8 Stoppingrules Suppose that you take a typing proficiency test. You are allowed to take the test up to three times. Of course, if you pass the test, you don’t need to take it again. Sothesamplespaceis S =p, fp, f fp, f f f, where for example f fp denotes the outcome that you fail twice and pass on your thirdattempt. Ifalloutcomeswereequallylikely,thenyourchanceofeventuallypassingthe testandgettingthecertificatewouldbe3/4. But it is unreasonable here to assume that all the outcomes are equally likely. For example, you may be very likely to pass on the first attempt. Let us assume that the probability that you pass the test is 0.8. (By Proposition 3, your chance of failingis 0.2.) Letus furtherassume that, no matterhow manytimes youhave failed,yourchanceofpassingatthenextattemptisstill0.8. Thenwehave P(p) = 0.8, P(fp) = 0.2·0.8=0.16, 2 P(f fp) = 0.2 ·0.8=0.032, 3 P(f f f) = 0.2 =0.008. Thus the probability that you eventually get the certificate is P(p, fp, f fp) = 0.8+0.16+0.032=0.992. Alternatively,youeventuallygetthecertificateunless youfailthreetimes,sotheprobabilityis1−0.008=0.992. Astoppingruleisaruleofthetypedescribedhere,namely,continuetheexper iment until some specified occurrence happens. The experiment may potentially beinfinite.1.9. QUESTIONNAIRERESULTS 13 For example, if you toss a coin repeatedly until you obtain heads, the sample spaceis S =H,TH,TTH,TTTH,... since in principle you may get arbitrarily large numbers of tails before the first head. (Wehavetoallowallpossibleoutcomes.) Inthetypingtest,theruleis‘stopifeitheryoupassoryouhavetakenthetest threetimes’. Thisensuresthatthesamplespaceisfinite. Inthenextchapter,wewillhavemoretosayaboutthe‘multiplicationrule’we used forcalculating theprobabilities. In themeantime youmight liketo consider whether it is a reasonable assumption for tossing a coin, or for someone taking a seriesoftests. Other kinds of stopping rules are possible. For example, the number of coin tosses might be determined by some other random process such as the roll of a die; or we might toss a coin until we have obtained heads twice; and so on. We willnotdealwiththese. 1.9 Questionnaireresults The students in the Probability I class in Autumn 2000 filled in the following questionnaire: 1. I have a hat containing 20 balls, 10 red and 10 blue. I draw 10 balls from the hat. I am interested in the event that I draw exactly five red and fiveblueballs. DoyouthinkthatthisismorelikelyifInotethecolourof each ball I draw and replace it in the hat, or if I don’t replace the balls in thehatafterdrawing Morelikelywithreplacement2 Morelikelywithoutreplacement2 2. Whatcolourareyoureyes Blue2 Brown2 Green2 Other2 3. 
Doyouownamobilephone Yes2 No2 Afterdiscardingincompletequestionnaires,theresultswereasfollows: Answerto “Morelikely “Morelikely question withreplacement” withoutreplacement” Eyes Brown Other Brown Other Mobilephone 35 4 35 9 Nomobilephone 10 3 7 114 CHAPTER1. BASICIDEAS Whatcanweconclude Halftheclassthoughtthat,intheexperimentwiththecolouredballs,sampling with replacement make the result more likely. In fact, as we saw in Chapter 1, actually it is more likely if we sample without replacement. (This doesn’t matter, sincethestudentswereinstructednottothinktoohardaboutit) Youmightexpectthateyecolourandmobilephoneownershipwouldhaveno influenceonyouranswer. Let’stestthis. Iftrue,thenofthe87peoplewithbrown eyes, half of them (i.e. 43 or 44) would answer “with replacement”, whereas in fact45did. Also,ofthe83peoplewithmobilephones,wewouldexpecthalf(that is,41or42)wouldanswer“withreplacement”,whereasinfact39ofthemdid. So perhaps we have demonstrated that people who own mobile phones are slightly smarterthanaverage,whereaspeoplewithbrowneyesareslightlylesssmart In fact we have shown no such thing, since our results refer only to the peo ple who filled out the questionnaire. But they do show that these events are not independent,inasensewewillcometosoon. On the other hand, since 83 out of 104 people have mobile phones, if we think that phone ownership and eye colour are independent, we would expect that the same fraction 83/104 of the 87 browneyed people would have phones, i.e. (83·87)/104 =69.4 people. In fact the number is 70, or as near as we can expect. So indeed it seems that eye colour and phone ownership are moreorless independent. 1.10 Independence TwoeventsAandBaresaidtobeindependent if P(A∩B)=P(A)·P(B). This is the definition of independence of events. If you are asked in an exam to define independence of events, this is the correct answer. Do not say that two eventsareindependentifonehasnoinfluenceontheother;andundernocircum stances say that A and B are independent if A∩B =0/ (this is the statement that A and B are disjoint, which is quite a different thing) Also, do not ever say that P(A∩B) = P(A)·P(B) unless you have some good reason for assuming that A and B are independent (either because this is given in the question, or as in the nextbutoneparagraph). Let us return to the questionnaire example. Suppose that a student is chosen atrandomfromthosewhofilledoutthequestionnaire. LetAbetheeventthatthis student thought that the event was more likely if we sample with replacement; B the event that the student has brown eyes; andC the event that the student has a1.10. INDEPENDENCE 15 mobilephone. Then P(A) = 52/104=0.5, P(B) = 87/104=0.8365, P(C) = 83/104=0.7981. Furthermore, P(A∩B)=45/104=0.4327, P(A)·P(B)=0.4183, P(A∩C)=39/104=0.375, P(A)·P(C)=0.3990, P(B∩C)=70/104=0.6731, P(B)∩P(C)=0.6676. So none of the three pairs is independent, but in a sense B and C ‘come closer’ thaneitheroftheothers,aswenoted. In practice, if it is the case that the event A has no effect on the outcome of event B, then A and B are independent. But this does not apply in the other direction. There might be a very definite connection between A and B, but still it could happen that P(A∩B) =P(A)·P(B), so that A and B are independent. We willseeanexampleshortly. Example If we toss a coin more than once, or roll a die more than once, then you may assume that different tosses or rolls are independent. 
More precisely, if we roll a fair sixsided die twice, then the probability of getting 4 on the first throw and 5 on the second is 1/36, since we assume that all 36 combinations of the two throws are equally likely. But (1/36) = (1/6)·(1/6), and the separate probabilitiesofgetting4onthefirstthrowandofgetting5onthesecondareboth equalto1/6. Sothetwoeventsareindependent. Thiswouldworkjustaswellfor anyothercombination. Ingeneral,itisalwaysOKtoassumethattheoutcomesofdifferenttossesofa coin,ordifferentthrowsofadie,areindependent. Thisholdseveniftheexamples arenotallequallylikely. Wewillseeanexamplelater. Example I have two red pens, one green pen, and one blue pen. I choose two pens without replacement. Let A be the event that I choose exactly one red pen, andBtheeventthatIchooseexactlyonegreenpen. IfthepensarecalledR ,R ,G,B,then 1 2 S = R R ,R G,R B,R G,R B,GB, 1 2 1 1 2 2 A = R G,R B,R G,R B, 1 1 2 2 B = R G,R G,GB 1 216 CHAPTER1. BASICIDEAS WehaveP(A)=4/6=2/3,P(B)=3/6=1/2,P(A∩B)=2/6=1/3=P(A)P(B), soAandBareindependent. But before you say ‘that’s obvious’, suppose that I have also a purple pen, and I do the same experiment. This time, if you write down the sample space andthetwoeventsanddothecalculations,youwillfindthatP(A)=6/10=3/5, P(B) = 4/10 = 2/5, P(A∩B) = 2/10 = 1/5 = 6 P(A)P(B), so adding one more penhasmadetheeventsnonindependent Weseethatitisverydifficulttotellwhethereventsareindependentornot. In practice, assume that events are independent only if either you are told to assume it, or the events are the outcomes of different throws of a coin or die. (There is one other case where you can assume independence: this is the result of different draws,withreplacement,fromasetofobjects.) Example Consider the experiment where we toss a fair coin three times and note the results. Each of the eight possible outcomes has probability 1/8. Let A be the event ‘there are more heads than tails’, and B the event ‘the results of the firsttwotossesarethesame’. Then • A=HHH,HHT,HTH,THH,P(A)=1/2, • B=HHH,HHT,TTH,TTT,P(B)=1/2, • A∩B=HHH,HHT,P(A∩B)=1/4; so A and B are independent. However, both A and B clearly involve the results of the first two tosses and it is not possible to make a convincing argument that one of these events has no influence or effect on the other. For example, letC be the event‘headsonthelasttoss’. Then,aswesawinPart1, • C =HHH,HTH,THH,TTH,P(C)=1/2, • A∩C =HHH,HTH,THH,P(A∩C)=3/8; soAandC arenotindependent. AreBandC independent 1.11 Mutualindependence Thissectionisabittechnical. Youwillneedtoknowtheconclusions,thoughthe argumentsweusetoreachthemarenotsoimportant. We saw in the cointossing example above that it is possible to have three eventsA,B,CsothatAandBareindependent,BandCareindependent,butAand C arenotindependent.1.12. PROPERTIESOFINDEPENDENCE 17 If all three pairs of events happen to be independent, can we then conclude that P(A∩B∩C)=P(A)·P(B)·P(C) At first sight this seems very reasonable; inAxiom3,weonlyrequiredallpairsofeventstobeexclusiveinordertojustify ourconclusion. Unfortunatelyitisnottrue... Example Inthecointossingexample,let Abetheevent‘firstandsecondtosses have same result’, B the event ‘first and third tosses have the same result, and C the event ‘second and third tosses have same result’. You should check that P(A)=P(B)=P(C)=1/2,andthattheeventsA∩B,B∩C,A∩C,andA∩B∩C are all equal toHHH,TTT, with probability 1/4. Thus any pair of the three eventsareindependent,but P(A∩B∩C) = 1/4, P(A)·P(B)·P(C) = 1/8. SoA,B,C arenotmutuallyindependent. 
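Both claims in this last example, pairwise independence and the failure of mutual independence, can be verified by enumerating the eight outcomes. A quick Python sketch (event names as in the example):

```python
from itertools import product
from fractions import Fraction

S = {''.join(t) for t in product('HT', repeat=3)}
P = lambda E: Fraction(len(E), len(S))

A = {s for s in S if s[0] == s[1]}   # first and second tosses agree
B = {s for s in S if s[0] == s[2]}   # first and third tosses agree
C = {s for s in S if s[1] == s[2]}   # second and third tosses agree

print(P(A), P(B), P(C))                    # 1/2 1/2 1/2
print(P(A & B) == P(A) * P(B))             # True: pairwise independent
print(P(A & C) == P(A) * P(C))             # True
print(P(B & C) == P(B) * P(C))             # True
print(P(A & B & C), P(A) * P(B) * P(C))    # 1/4 versus 1/8: not mutually independent
```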
Thecorrectdefinitionandpropositionrunasfollows. LetA ,...,A beevents. Wesaythattheseeventsaremutuallyindependentif, 1 n givenanydistinctindicesi ,i ,...,i withk≥1,theevents 1 2 k A ∩A ∩···∩A and A i i i i 1 2 k−1 k are independent. In other words, any one of the events is independent of the intersectionofanynumberoftheothereventsintheset. Proposition1.11 LetA ,...,A bemutuallyindependent. Then 1 n P(A ∩A ∩···∩A )=P(A )·P(A )···P(A ). 1 2 n 1 2 n Now all you really need to know is that the same ‘physical’ arguments that justify that two events (such as two tosses of a coin, or two throws of a die) are independent,alsojustifythatanynumberofsucheventsaremutuallyindependent. So, for example, if we toss a fair coin six times, the probability of getting the 6 sequence HHTHHT is (1/2) =1/64, and the same would apply for any other sequence. Inotherwords,all64possibleoutcomesareequallylikely. 1.12 Propertiesofindependence 0 Proposition1.12 IfAandBareindependent,thenAandB areindependent.18 CHAPTER1. BASICIDEAS 0 WearegiventhatP(A∩B)=P(A)·P(B),andaskedtoprovethatP(A∩B )= 0 P(A)·P(B ). 0 FromCorollary4,weknowthatP(B )=1−P(B). Also,theeventsA∩Band 0 0 A∩B are disjoint (since no outcome can be both in B and B), and their union 0 is A (since every event in A is either in B or in B); so by Axiom 3, we have that 0 P(A)=P(A∩B)+P(A∩B ). Thus, 0 P(A∩B ) = P(A)−P(A∩B) = P(A)−P(A)·P(B) (sinceAandBareindependent) = P(A)(1−P(B)) 0 = P(A)·P(B ), whichiswhatwewererequiredtoprove. 0 0 Corollary1.13 IfAandBareindependent,soareA andB. 0 Apply the Proposition twice, first to A and B (to show that A and B are inde 0 0 0 pendent),andthentoB andA(toshowthatB andA areindependent). Moregenerally,ifeventsA ,...,A aremutuallyindependent,andwereplace 1 n some of them by their complements, then the resulting events are mutually inde 0 pendent. Wehavetobeabitcarefulthough. Forexample,AandA arenotusually independent Resultslikethefollowingarealsotrue. Proposition1.14 Let events A, B,C be mutually independent. Then A and B∩C areindependent,andAandB∪C areindependent. Example Consider the example of the typing proficiency test that we looked at earlier. Youarealloweduptothreeattemptstopassthetest. Suppose that your chance of passing the test is 0.8. Suppose also that the eventsofpassingthetestonanynumberofdifferentoccasionsaremutuallyinde pendent. Then,byProposition1.11,theprobabilityofanysequenceofpassesand failsistheproductoftheprobabilitiesofthetermsinthesequence. Thatis, 2 3 P(p)=0.8, P(fp)=(0.2)·(0.8), P(f fp)=(0.2) ·(0.8), P(f f f)=(0.2) , asweclaimedintheearlierexample. In other words, mutual independence is the condition we need to justify the argumentweusedinthatexample.1.12. PROPERTIESOFINDEPENDENCE 19 Example  The electrical apparatus in the diagram A B works so long as current can flow from left  to right. The three components are inde pendent. The probability that component A works is 0.8; the probability that compo  nent B works is 0.9; and the probability that C  componentC worksis0.75. Findtheprobabilitythattheapparatusworks. At risk of some confusion, we use the letters A, B andC for the events ‘com ponent A works’, ‘component B works’, and ‘componentC works’, respectively. Now the apparatus will work if either A and B are working, or C is working (or possiblyboth). Thustheeventweareinterestedinis (A∩B)∪C. Now P((A∩B)∪C)) = P(A∩B)+P(C)−P(A∩B∩C) (byInclusion–Exclusion) = P(A)·P(B)+P(C)−P(A)·P(B)·P(C) (bymutualindependence) = (0.8)·(0.9)+(0.75)−(0.8)·(0.9)·(0.75) = 0.93. 
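Before looking at a second way to analyse this circuit, the value 0.93 can be sanity-checked by simulation. A rough Python sketch (the component probabilities are those given above; the number of trials is arbitrary):

```python
import random

random.seed(0)
pA, pB, pC = 0.8, 0.9, 0.75      # P(component works), as in the example
trials = 200_000

works = 0
for _ in range(trials):
    a = random.random() < pA
    b = random.random() < pB
    c = random.random() < pC
    # Current can flow if A and B both work, or if C works.
    if (a and b) or c:
        works += 1

print(works / trials)                   # close to 0.93
print(pA * pB + pC - pA * pB * pC)      # exact value from Inclusion-Exclusion, 0.93
```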
The problem can also be analysed in a different way. The apparatus will not work if both paths are blocked, that is, ifC is not working and one of A and B is 0 0 0 alsonotworking. Thus,theeventthattheapparatusdoesnotworkis(A ∪B )∩C . 0 0 0 0 BytheDistributiveLaw,thisisequalto (A ∩C )∪(B ∩C ). Wehave 0 0 0 0 0 0 0 0 0 0 0 P((A ∩C )∪(B ∩C ) = P(A ∩C )+P(B ∩C )−P(A ∩B ∩C ) (byInclusion–Exclusion) 0 0 0 0 0 0 0 = P(A )·P(C )+P(B )·P(C )−P(A )·P(B )·P(C ) 0 0 0 (bymutualindependenceofA,B,C ) = (0.2)·(0.25)+(0.1)·(0.25)−(0.2)·(0.1)·(0.25) = 0.07, sotheapparatusworkswithprobability1−0.07=0.93. Thereisatrapherewhichyoushouldtakecaretoavoid. Youmightbetempted 0 0 0 0 tosay P(A ∩C )=(0.2)·(0.25)=0.05,and P(B ∩C )=(0.1)·(0.25)=0.025; andconcludethat 0 0 0 0 P((A ∩C )∪(B ∩C ))=0.05+0.025−(0.05)·(0.025)=0.07375 by the Principle of Inclusion and Exclusion. But this is not correct, since the 0 0 0 0 eventsA ∩C andB ∩C arenot independent20 CHAPTER1. BASICIDEAS Example We can always assume that successive tosses of a coin are mutually independent, even if it is not a fair coin. Suppose that I have a coin which has probability 0.6 of coming down heads. I toss the coin three times. What are the probabilitiesofgettingthreeheads,twoheads,onehead,ornoheads For three heads, since successive tosses are mutually independent, the proba 3 bilityis (0.6) =0.216. The probability of tails on any toss is 1−0.6 = 0.4. Now the event ‘two heads’canoccurinthreepossibleways,asHHT,HTH,orTHH. Eachoutcome has probability (0.6)·(0.6)·(0.4) = 0.144. So the probability of two heads is 3·(0.144)=0.432. 2 Similarlytheprobabilityofoneheadis3·(0.6)·(0.4) =0.288,andtheprob 3 abilityofnoheadsis (0.4) =0.064. Asacheck,wehave 0.216+0.432+0.288+0.064=1. 1.13 Workedexamples Question (a)Yougototheshoptobuyatoothbrush. Thetoothbrushestherearered,blue, green, purple and white. The probability that you buy a red toothbrush is threetimestheprobabilitythatyoubuyagreenone;theprobabilitythatyou buyablueoneistwicetheprobabilitythatyoubuyagreenone; theproba bilities of buying green, purple, and white are all equal. You are certain to buy exactly one toothbrush. For each colour, find the probability that you buyatoothbrushofthatcolour. (b) James and Simon share a flat, so it would be confusing if their toothbrushes were the same colour. On the first day of term they both go to the shop to buy a toothbrush. For each of James and Simon, the probability of buying various colours of toothbrush is as calculated in (a), and their choices are independent. Find the probability that they buy toothbrushes of the same colour. (c) James and Simon live together for three terms. On the first day of each term they buy new toothbrushes, with probabilities as in (b), independently of what they had bought before. This is the only time that they change their toothbrushes. Find the probablity that James and Simon have differently coloured toothbrushes from each other for all three terms. Is it more likely that they will have differently coloured toothbrushes from each other for1.13. WORKEDEXAMPLES 21 all three terms or that they will sometimes have toothbrushes of the same colour Solution (a) Let R,B,G,P,W be the events that you buy a red, blue, green, purple and whitetoothbrushrespectively. Letx=P(G). Wearegiventhat P(R)=3x, P(B)=2x, P(P)=P(W)=x. Sincetheseoutcomescomprisethewholesamplespace,Corollary2gives 3x+2x+x+x+x=1, sox=1/8. Thus,theprobabilitiesare3/8,1/4,1/8,1/8,1/8respectively. (b)LetRBdenotetheevent‘JamesbuysaredtoothbrushandSimonbuysablue toothbrush’,etc. 
Byindependence(given),wehave,forexample, P(RR)=(3/8)·(3/8)=9/64. The event that the toothbrushes have the same colour consists of the five outcomesRR,BB,GG,PP,WW,soitsprobabilityis P(RR)+P(BB)+P(GG)+P(PP)+P(WW) 9 1 1 1 1 1 = + + + + = . 64 16 64 64 64 4 (c)Theevent‘differentcolouredtoothbrushesintheithterm’hasprobability3/4 (from part (b)), and these events are independent. So the event ‘different colouredtoothbrushesinallthreeterms’hasprobability 3 3 3 27 · · = . 4 4 4 64 The event ‘same coloured toothbrushes in at least one term’ is the comple ment of the above, so has probability 1−(27/64) = (37)/(64). So it is morelikelythattheywillhavethesamecolourinatleastoneterm. Question There are 24 elephants in a game reserve. The warden tags six of the elephants with small radio transmitters and returns them to the reserve. The next month,herandomlyselectsfiveelephantsfromthereserve. Hecountshowmany oftheseelephantsaretagged. Assumethatnoelephantsleaveorenterthereserve, or die or give birth, between the tagging and the selection; and that all outcomes of the selection are equally likely. Find the probability that exactly two of the selectedelephantsaretagged,givingtheanswercorrectto3decimalplaces.22 CHAPTER1. BASICIDEAS Solution Theexperiment consistsof pickingthe fiveelephants, not theoriginal 24 choiceofsixelephantsfortagging. LetS bethesamplespace. ThenS= C . 5 LetAbetheeventthattwooftheselectedelephantsaretagged. Thisinvolves choosingtwoofthesixtaggedelephantsandthreeoftheeighteenuntaggedones, 6 18 soA= C · C . Thus 2 3 6 18 C · C 2 3 P(A)= =0.288 24 C 5 to3d.p. Note: Should the sample should be ordered or unordered Since the answer doesn’tdependontheorderinwhichtheelephantsarecaught,anunorderedsam pleispreferable. Ifyouwanttouseanorderedsample,thecalculationis 6 18 5 P · P · C 2 3 2 P(A)= =0.288, 24 P 5 5 since it is necessary to multiply by the C possible patterns of tagged and un 2 taggedelephantsinasampleoffivewithtwotagged. Question A couple are planning to have a family. They decide to stop having children either when they have two boys or when they have four children. Sup posethattheyaresuccessfulintheirplan. (a)Writedownthesamplespace. (b) Assume that, each time that they have a child, the probability that it is a boy is 1/2, independent of all other times. Find P(E) and P(F) where E =“thereareatleasttwogirls”,F =“therearemoregirlsthanboys”. Solution (a)S =BB,BGB,GBB,BGGB,GBGB,GGBB,BGGG,GBGG, GGBG,GGGB,GGGG. (b)E =BGGB,GBGB,GGBB,BGGG,GBGG,GGBG,GGGB,GGGG, F =BGGG,GBGG,GGBG,GGGB,GGGG. NowwehaveP(BB)=1/4,P(BGB)=1/8,P(BGGB)=1/16,andsimilarly fortheotheroutcomes. SoP(E)=8/16=1/2,P(F)=5/16.Chapter2 Conditionalprobability In this chapter we develop the technique of conditional probability to deal with caseswhereeventsarenotindependent. 2.1 Whatisconditionalprobability Alice and Bob are going out to dinner. They toss a fair coin ‘best of three’ to decide who pays: if there are more heads than tails in the three tosses then Alice pays,otherwiseBobpays. Clearlyeachhasa50chanceofpaying. Thesamplespaceis S =HHH,HHT,HTH,HTT,THH,THT,TTH,TTT, andtheevents‘Alicepays’and‘Bobpays’arerespectively A=HHH,HHT,HTH,THH, B=HTT,THT,TTH,TTT. They toss the coin once and the result is heads; call this event E. How should wenowreassesstheirchances Wehave E =HHH,HHT,HTH,HTT, andifwearegiventheinformationthattheresultofthefirsttossisheads,thenE nowbecomesthesamplespaceoftheexperiment,sincetheoutcomesnotinE are no longer possible. 
In the new experiment, the outcomes ‘Alice pays’ and ‘Bob pays’are A∩E =HHH,HHT,HTH, B∩E =HTT. 2324 CHAPTER2. CONDITIONALPROBABILITY Thus the new probabilities that Alice and Bob pay for dinner are 3/4 and 1/4 respectively. In general, suppose that we are given that an event E has occurred, and we wanttocomputetheprobabilitythatanothereventAoccurs. Ingeneral,wecanno longercount,sincetheoutcomesmaynotbeequallylikely. Thecorrectdefinition isasfollows. Let E be an event with nonzero probability, and let A be any event. The conditionalprobabilityofAgivenE isdefinedas P(A∩E) P(AE)= . P(E) AgainIemphasisethatthisisthedefinition. Ifyouareaskedforthedefinition of conditional probability, it is not enough to say “the probability of A given that E hasoccurred”,althoughthisisthebestwaytounderstandit. Thereisnoreason whyeventE shouldoccurbeforeeventA Notetheverticalbarinthenotation. ThisisP(AE),notP(A/E)orP(A\E). Note also that the definition only applies in the case where P(E) is not equal tozero,sincewehavetodividebyit,andthiswouldmakenosenseifP(E)=0. Tochecktheformulainourexample: P(A∩E) 3/8 3 P(AE)= = = , P(E) 1/2 4 P(B∩E) 1/8 1 P(BE)= = = . P(E) 1/2 4 It may seem like a small matter, but you should be familiar enough with this formula that you can write it down without stopping to think about the names of theevents. Thus,forexample, P(A∩B) P(AB)= P(B) ifP(B)= 6 0. Example A random car is chosen among all those passing through Trafalgar Square on a certain day. The probability that the car is yellow is 3/100: the probability that the driver is blonde is 1/5; and the probability that the car is yellowandthedriverisblondeis1/50. Find the conditional probability that the driver is blonde given that the car is yellow.2.2. GENETICS 25 Solution: IfY istheevent‘thecarisyellow’andBtheevent‘thedriverisblonde’, thenwearegiventhatP(Y)=0.03, P(B)=0.2,andP(Y∩B)=0.02. So P(B∩Y) 0.02 P(BY)= = =0.667 P(Y) 0.03 to3d.p. Notethatwehaven’tusedalltheinformationgiven. Thereisaconnectionbetweenconditionalprobabilityandindependence: Proposition2.1 LetAandBbeeventswithP(B)6=0. ThenAandBareindepen dentifandonlyifP(AB)=P(A). Proof Thewords‘ifandonlyif’tellusthatwehavetwojobstodo: wehaveto showthatifAandBareindependent,thenP(AB)=P(A);andthatifP(AB)= P(A),thenAandBareindependent. SofirstsupposethatAandBareindependent. Rememberthatthismeansthat P(A∩B)=P(A)·P(B). Then P(A∩B) P(A)·P(B) P(AB)= = =P(A), P(B) P(B) thatis,P(AB)=P(A),aswehadtoprove. NowsupposethatP(AB)=P(A). Inotherwords, P(A∩B) =P(A), P(B) usingthedefinitionofconditionalprobability. Nowclearingfractionsgives P(A∩B)=P(A)·P(B), whichisjustwhatthestatement‘AandBareindependent’means. This proposition is most likely what people have in mind when they say ‘A andBareindependentmeansthatBhasnoeffectonA’. 2.2 Genetics Here is a simplified version of how genes code eye colour, assuming only two coloursofeyes. Eachpersonhastwogenesforeyecolour. EachgeneiseitherBorb. Achild receives one gene from each of its parents. The gene it receives from its father is one of its father’s two genes, each with probability 1/2; and similarly for its mother. Thegenesreceivedfromfatherandmotherareindependent. IfyourgenesareBBorBborbB,youhavebrowneyes; ifyourgenesarebb, youhaveblueeyes.26 CHAPTER2. CONDITIONALPROBABILITY Example Suppose that John has brown eyes. So do both of John’s parents. His sisterhasblueeyes. WhatistheprobabilitythatJohn’sgenesareBB Solution John’ssisterhasgenesbb,soonebmusthavecomefromeachparent. Thus each of John’s parents is Bb or bB; we may assume Bb. 
So the possibilities forJohnare(writingthegenefromhisfatherfirst) BB,Bb,bB,bb eachwithprobability1/4. (Forexample,Johngetshisfather’sBgenewithprob ability 1/2 and his mother’s B gene with probability 1/2, and these are indepen dent, so the probability that he gets BB is 1/4. Similarly for the other combina tions.) Let X be the event ‘John has BB genes’ and Y the event ‘John has brown eyes’. Then X =BB andY =BB,Bb,bB. The question asks us to calculate P(XY). Thisisgivenby P(X∩Y) 1/4 P(XY)= = =1/3. P(Y) 3/4 2.3 TheTheoremofTotalProbability Sometimeswearefacedwithasituationwherewedonotknowtheprobabilityof an event B, but we know what its probability would be if we were sure that some othereventhadoccurred. Example Anicecreamsellerhastodecidewhethertoordermorestockforthe Bank Holiday weekend. He estimates that, if the weather is sunny, he has a 90 chanceofsellingallhisstock;ifitiscloudy,hischanceis60;andifitrains,his chanceisonly20. Accordingtotheweatherforecast,theprobabilityofsunshine is 30, the probability of cloud is 45, and the probability of rain is 25. (We assume that these are all the possible outcomes, so that their probabilities must addupto100.) Whatistheoverallprobabilitythatthesalesmanwillsellallhis stock ThisproblemisansweredbytheTheoremofTotalProbability,whichwenow state. First we need a definition. The events A ,A ,...,A form a partition of the 1 2 n samplespaceifthefollowingtwoconditionshold: / (a) the events are pairwise disjoint, that is, A ∩A =0 for any pair of events A i j i andA ; j (b)A ∪A ∪···∪A =S. 1 2 n2.3. THETHEOREMOFTOTALPROBABILITY 27 Another way of saying the same thing is that every outcome in the sample space lies in exactly one of the events A ,A ,...,A . The picture shows the idea of a 1 2 n partition. A A ... A 1 2 n NowwestateandprovetheTheoremofTotalProbability. Theorem2.2 LetA ,A ,...,A formapartitionofthesamplespacewithP(A )= 6 1 2 n i 0foralli,andletBbeanyevent. Then n P(B)= P(BA )·P(A ). i i ∑ i=1 Proof Bydefinition,P(BA )=P(B∩A )/P(A ). Multiplyingup,wefindthat i i i P(B∩A )=P(BA )·P(A ). i i i Now consider the events B∩A ,B∩A ,...,B∩A . These events are pairwise 1 2 n disjoint;foranyoutcomelyinginboth B∩A and B∩A wouldlieinboth A and i j i A , and by assumption there are no such outcomes. Moreover, the union of all j these events is B, since every outcome lies in one of the A. So, by Axiom 3, we i concludethat n P(B∩A )=P(B). i ∑ i=1 SubstitutingourexpressionforP(B∩A )givestheresult. i   B   A A ... A 1 2 n Consider the icecream salesman at the start of this section. Let A be the 1 event ‘it is sunny’, A the event ‘it is cloudy’, and A the event ‘it is rainy’. Then 2 3 A ,A andA formapartitionofthesamplespace,andwearegiventhat 1 2 3 P(A )=0.3, P(A )=0.45, P(A )=0.25. 1 2 328 CHAPTER2. CONDITIONALPROBABILITY LetBbetheevent‘thesalesmansellsallhisstock’. Theotherinformationweare givenisthat P(BA )=0.9, P(BA )=0.6, P(BA )=0.2. 1 2 3 BytheTheoremofTotalProbability, P(B)=(0.9×0.3)+(0.6×0.45)+(0.2×0.25)=0.59. YouwillnowrealisethattheTheoremofTotalProbabilityisreallybeingused whenyoucalculateprobabilitiesbytreediagrams. Itisbettertogetintothehabit ofusingitdirectly,sinceitavoidsanyaccidentalassumptionsofindependence. One special case of the Theorem of Total Probability is very commonly used, 0 and is worth stating in its own right. For any event A, the events A and A form a 0 partition of S. To say that both A and A have nonzero probability is just to say thatP(A)6=0,1. Thuswehavethefollowingcorollary: Corollary2.3 LetAandBbeevents,andsupposethatP(A)= 6 0,1. 
Then 0 0 P(B)=P(BA)·P(A)+P(BA )·P(A ). 2.4 Samplingrevisited We can use the notion of conditional probability to treat sampling problems in volvingorderedsamples. Example I have two red pens, one green pen, and one blue pen. I select two penswithoutreplacement. (a)Whatistheprobabilitythatthefirstpenchosenisred (b)Whatistheprobabilitythatthesecondpenchosenisred For the first pen, there are four pens of which two are red, so the chance of selectingaredpenis2/4=1/2. Forthesecondpen,wemustseparatecases. LetA betheevent‘firstpenred’, 1 A theevent‘firstpengreen’andA theevent‘firstpenblue’. ThenP(A )=1/2, 2 3 1 P(A )=P(A )=1/4(arguingasabove). LetBbetheevent‘secondpenred’. 2 3 If the first pen is red, then only one of the three remaining pens is red, so that P(BA ) =1/3. On the other hand, if the first pen is green or blue, then two of 1 theremainingpensarered,soP(BA )=P(BA )=2/3. 2 32.5. BAYES’THEOREM 29 BytheTheoremofTotalProbability, P(B) = P(BA )P(A )+P(BA )P(A )+P(BA )P(A ) 1 1 2 2 3 3 = (1/3)×(1/2)+(2/3)×(1/4)+(2/3)×(1/4) = 1/2. We have reached by a roundabout argument a conclusion which you might thinktobeobvious. Ifwehavenoinformationaboutthefirstpen,thenthesecond pen is equally likely to be any one of the four, and the probability should be 1/2, just as for the first pen. This argument happens to be correct. But, until your ability to distinguish between correct arguments and plausiblelooking false ones is very well developed, you may be safer to stick to the calculation that we did. Beware of obviouslooking arguments in probability Many clever people have beencaughtout. 2.5 Bayes’Theorem ThereisaverybigdifferencebetweenP(AB)andP(BA). Supposethatanewtestisdevelopedtoidentifypeoplewhoareliabletosuffer from some genetic disease in later life. Of course, no test is perfect; there will be somecarriersofthedefectivegenewhotestnegative,andsomenoncarrierswho test positive. So, for example, let A be the event ‘the patient is a carrier’, and B theevent‘thetestresultispositive’. The scientists who develop the test are concerned with the probabilities that 0 0 the test result is wrong, that is, with P(BA ) and P(B A). However, a patient who has taken the test has different concerns. If I tested positive, what is the chancethatIhavethedisease IfItestednegative,howsurecanIbethatIamnot 0 0 acarrier Inotherwords,P(AB)andP(A B ). TheseconditionalprobabilitiesarerelatedbyBayes’Theorem: Theorem2.4 LetAandBbeeventswithnonzeroprobability. Then P(BA)·P(A) P(AB)= . P(B) Theproofisnothard. Wehave P(AB)·P(B)=P(A∩B)=P(BA)·P(A), using the definition of conditional probability twice. (Note that we need both A andBtohavenonzeroprobabilityhere.) Nowdividethisequationby P(B)toget theresult.30 CHAPTER2. CONDITIONALPROBABILITY IfP(A)6=0,1andP(B)= 6 0,thenwecanuseCorollary17towritethisas P(BA)·P(A) P(AB)= . 0 0 P(BA)·P(A)+P(BA )·P(A ) Bayes’Theoremisoftenstatedinthisform. Example ConsidertheicecreamsalesmanfromSection2.3. Giventhathesold all his stock of icecream, what is the probability that the weather was sunny (Thisquestionmightbeaskedbythewarehousemanagerwhodoesn’tknowwhat the weather was actually like.) Using the same notation that we used before, A 1 isthe event‘itis sunny’and B theevent ‘thesalesmansells all hisstock’. Weare askedforP(A B). WeweregiventhatP(BA )=0.9andthatP(A )=0.3,and 1 1 1 wecalculatedthatP(B)=0.59. SobyBayes’Theorem, P(BA )P(A ) 0.9×0.3 1 1 P(A B)= = =0.46 1 P(B) 0.59 to2d.p. Example Considertheclinicaltestdescribedatthestartofthissection. 
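As a concrete check on the last two sections, the pen example of Section 2.4 is small enough to enumerate. The sketch below (Python, ordered draws without replacement, pen names illustrative) recovers P(second pen red) = 1/2 and confirms the identity P(A ∩ B) = P(A | B)·P(B) = P(B | A)·P(A) used in the proof above.

```python
from itertools import permutations
from fractions import Fraction

pens = ['R1', 'R2', 'G', 'B']                 # two red, one green, one blue
S = list(permutations(pens, 2))               # ordered draws without replacement
P = lambda E: Fraction(len(E), len(S))

A = {s for s in S if s[0].startswith('R')}    # first pen red
B = {s for s in S if s[1].startswith('R')}    # second pen red

print(P(A), P(B))                             # 1/2 1/2

# The identity behind Bayes' Theorem:
cond = lambda X, Y: P(X & Y) / P(Y)           # P(X | Y)
print(P(A & B) == cond(A, B) * P(B) == cond(B, A) * P(A))   # True
```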
Suppose that 1 in 1000 of the population is a carrier of the disease. Suppose also that the probability that a carrier tests negative is 1, while the probability that a non carrier tests positive is 5. (A test achieving these values would be regarded as very successful.) Let A be the event ‘the patient is a carrier’, and B the event ‘the 0 test result is positive’. We are given that P(A) =0.001 (so that P(A ) =0.999), andthat 0 P(BA)=0.99, P(BA )=0.05. (a) A patient has just had a positive test result. What is the probability that the patientisacarrier Theansweris P(BA)P(A) P(AB) = 0 0 P(BA)P(A)+P(BA )P(A ) 0.99×0.001 = (0.99×0.001)+(0.05×0.999) 0.00099 = =0.0194. 0.05094 (b) A patient has just had a negative test result. What is the probability that the patientisacarrier Theansweris 0 P(B A)P(A) 0 P(AB ) = 0 0 0 0 P(B A)P(A)+P(B A )P(A )2.6. ITERATEDCONDITIONALPROBABILITY 31 0.01×0.001 = (0.01×0.001)+(0.95×0.999) 0.00001 = =0.00001. 0.94095 Soapatientwithanegativetestresultcanbereassured; butapatientwithaposi tivetestresultstillhaslessthan2chanceofbeingacarrier,soislikelytoworry unnecessarily. Of course, these calculations assume that the patient has been selected at ran dom from the population. If the patient has a family history of the disease, the calculationswouldbequitedifferent. Example 2 of the population have a certain blood disease in a serious form; 10 have it in a mild form; and 88 don’t have it at all. A new blood test is developed; the probability of testing positive is 9/10 if the subject has the serious form, 6/10 if the subject has the mild form, and 1/10 if the subject doesn’t have thedisease. Ihavejusttestedpositive. WhatistheprobabilitythatIhavetheseriousform ofthedisease Let A be ‘has disease in serious form’, A be ‘has disease in mild form’, and 1 2 A be ‘doesn’t have disease’. Let B be ‘test positive’. Then we are given that A , 3 1 A ,A formapartitionand 2 3 P(A )=0.02 P(A )=0.1 P(A )=0.88 1 2 3 P(BA )=0.9 P(BA )=0.6 P(BA )=0.1 1 2 3 Thus,bytheTheoremofTotalProbability, P(B)=0.9×0.02+0.6×0.1+0.1×0.88=0.166, andthenbyBayes’Theorem, P(BA )P(A ) 0.9×0.02 1 1 P(A B)= = =0.108 1 P(B) 0.166 to3d.p. 2.6 Iteratedconditionalprobability The conditional probability of C, given that both A and B have occurred, is just P(CA∩B). SometimesinsteadwejustwriteP(CA,B). Itisgivenby P(C∩A∩B) P(CA,B)= , P(A∩B)32 CHAPTER2. CONDITIONALPROBABILITY so P(A∩B∩C)=P(CA,B)P(A∩B). Nowwealsohave P(A∩B)=P(BA)P(A), sofinally(assumingthatP(A∩B)= 6 0),wehave P(A∩B∩C)=P(CA,B)P(BA)P(A). Thisgeneralisestoanynumberofevents: Proposition2.5 Let A ,...,A be events. Suppose that P(A ∩···∩A )6= 0. 1 n 1 n−1 Then P(A ∩A ∩···∩A )=P(A A ,...,A )···P(A A )P(A ). 1 2 n n 1 n−1 2 1 1 Weapplythistothebirthdayparadox. Thebirthdayparadoxisthefollowingstatement: If there are 23 or more people in a room, then the chances are better thaneventhattwoofthemhavethesamebirthday. Tosimplifytheanalysis,weignore29February,andassumethattheother365 daysareallequallylikelyasbirthdaysofarandomperson. (Thisisnotquitetrue but not inaccurate enough to have much effect on the conclusion.) Suppose that we have n people p ,p ,...,p . Let A be the event ‘p has a different birthday 1 2 n 2 2 1 from p ’. Then P(A ) =1− , since whatever p ’s birthday is, there is a 1 in 1 2 1 365 365chancethat p willhavethesamebirthday. 2 Let A be the event ‘p has a different birthday from p and p ’. It is not 3 3 1 2 straightforward to evaluate P(A ), since we have to consider whether p and p 3 1 2 have the same birthday or not. (See below). 
2.6 Iterated conditional probability

The conditional probability of C, given that both A and B have occurred, is just P(C | A ∩ B). Sometimes instead we just write P(C | A, B). It is given by

P(C | A, B) = P(C ∩ A ∩ B) / P(A ∩ B),

so

P(A ∩ B ∩ C) = P(C | A, B) P(A ∩ B).

Now we also have

P(A ∩ B) = P(B | A) P(A),

so finally (assuming that P(A ∩ B) ≠ 0), we have

P(A ∩ B ∩ C) = P(C | A, B) P(B | A) P(A).

This generalises to any number of events:

Proposition 2.5 Let A_1, ..., A_n be events. Suppose that P(A_1 ∩ ··· ∩ A_{n−1}) ≠ 0. Then

P(A_1 ∩ A_2 ∩ ··· ∩ A_n) = P(A_n | A_1, ..., A_{n−1}) ··· P(A_2 | A_1) P(A_1).

We apply this to the birthday paradox.

The birthday paradox is the following statement:

If there are 23 or more people in a room, then the chances are better than even that two of them have the same birthday.

To simplify the analysis, we ignore 29 February, and assume that the other 365 days are all equally likely as birthdays of a random person. (This is not quite true but not inaccurate enough to have much effect on the conclusion.) Suppose that we have n people p_1, p_2, ..., p_n. Let A_2 be the event 'p_2 has a different birthday from p_1'. Then P(A_2) = 1 − 1/365, since whatever p_1's birthday is, there is a 1 in 365 chance that p_2 will have the same birthday.

Let A_3 be the event 'p_3 has a different birthday from p_1 and p_2'. It is not straightforward to evaluate P(A_3), since we have to consider whether p_1 and p_2 have the same birthday or not. (See below.) But we can calculate that P(A_3 | A_2) = 1 − 2/365, since if A_2 occurs then p_1 and p_2 have birthdays on different days, and A_3 will occur only if p_3's birthday is on neither of these days. So

P(A_2 ∩ A_3) = P(A_2) P(A_3 | A_2) = (1 − 1/365)(1 − 2/365).

What is A_2 ∩ A_3? It is simply the event that all three people have birthdays on different days.

Now this process extends. If A_i denotes the event 'p_i's birthday is not on the same day as any of p_1, ..., p_{i−1}', then

P(A_i | A_1, ..., A_{i−1}) = 1 − (i−1)/365,

and so by Proposition 2.5,

P(A_1 ∩ ··· ∩ A_i) = (1 − 1/365)(1 − 2/365) ··· (1 − (i−1)/365).

Call this number q_i; it is the probability that all of the people p_1, ..., p_i have their birthdays on different days.

The numbers q_i decrease, since at each step we multiply by a factor less than 1. So there will be some value of n such that

q_{n−1} > 0.5, q_n ≤ 0.5;

that is, n is the smallest number of people for which the probability that they all have different birthdays is less than 1/2, that is, the probability of at least one coincidence is greater than 1/2.

By calculation, we find that q_22 = 0.5243, q_23 = 0.4927 (to 4 d.p.); so 23 people are enough for the probability of coincidence to be greater than 1/2.

Now return to a question we left open before. What is the probability of the event A_3? (This is the event that p_3 has a different birthday from both p_1 and p_2.)

If p_1 and p_2 have different birthdays, the probability is 1 − 2/365: this is the calculation we already did. On the other hand, if p_1 and p_2 have the same birthday, then the probability is 1 − 1/365. These two numbers are P(A_3 | A_2) and P(A_3 | A_2′) respectively. So, by the Theorem of Total Probability,

P(A_3) = P(A_3 | A_2)P(A_2) + P(A_3 | A_2′)P(A_2′)
       = (1 − 2/365)(1 − 1/365) + (1 − 1/365)(1/365)
       = 0.9945

to 4 d.p.

Problem How many people would you need to pick at random to ensure that the chance of two of them being born in the same month is better than even?

Assuming all months equally likely, if B_i is the event that p_i is born in a different month from any of p_1, ..., p_{i−1}, then as before we find that

P(B_i | B_1, ..., B_{i−1}) = 1 − (i−1)/12,

so

P(B_1 ∩ ··· ∩ B_i) = (1 − 1/12)(1 − 2/12) ··· (1 − (i−1)/12).

We calculate that this probability is

(11/12) × (10/12) × (9/12) = 0.5729

for i = 4 and

(11/12) × (10/12) × (9/12) × (8/12) = 0.3819

for i = 5. So, with five people, it is more likely that two will have the same birth month.

A true story. Some years ago, in a probability class with only ten students, the lecturer started discussing the Birthday Paradox. He said to the class, “I bet that no two people in the room have the same birthday”. He should have been on safe ground, since q_11 = 0.859. (Remember that there are eleven people in the room!) However, a student in the back said “I'll take the bet”, and after a moment all the other students realised that the lecturer would certainly lose his wager. Why? (Answer in the next chapter.)
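The numbers q_i are easy to generate by machine. Here is a small Python sketch (my own addition, not in the notes; the function name is mine) that multiplies up the factors from Proposition 2.5 and reports the first n at which the probability of all-distinct birthdays drops to 1/2 or below, for both the day version and the month version above.

```python
def first_even_chance(categories):
    """Smallest n with P(all n birthdays distinct) <= 1/2,
    assuming `categories` equally likely days (or months)."""
    q, n = 1.0, 1
    while q > 0.5:
        n += 1
        q *= 1 - (n - 1) / categories   # the factor from Proposition 2.5
    return n, q

print(first_even_chance(365))   # (23, 0.4927...) -- the birthday paradox
print(first_even_chance(12))    # (5, 0.3819...)  -- same birth month
```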
2.7 Worked examples

Question Each person has two genes for cystic fibrosis. Each gene is either N or C. Each child receives one gene from each parent. If your genes are NN or NC or CN then you are normal; if they are CC then you have cystic fibrosis.

(a) Neither of Sally's parents has cystic fibrosis. Nor does she. However, Sally's sister Hannah does have cystic fibrosis. Find the probability that Sally has at least one C gene (given that she does not have cystic fibrosis).

(b) In the general population the ratio of N genes to C genes is about 49 to 1. You can assume that the two genes in a person are independent. Harry does not have cystic fibrosis. Find the probability that he has at least one C gene (given that he does not have cystic fibrosis).

(c) Harry and Sally plan to have a child. Find the probability that the child will have cystic fibrosis (given that neither Harry nor Sally has it).

Solution During this solution, we will use a number of times the following principle. Let A and B be events with A ⊆ B. Then A ∩ B = A, and so

P(A | B) = P(A ∩ B) / P(B) = P(A) / P(B).

(a) This is the same as the eye colour example discussed earlier. We are given that Sally's sister has genes CC, and one gene must come from each parent. But neither parent is CC, so each parent is CN or NC. Now by the basic rules of genetics, all the four combinations of genes for a child of these parents, namely CC, CN, NC, NN, will have probability 1/4.

If S_1 is the event 'Sally has at least one C gene', then S_1 = {CN, NC, CC}; and if S_2 is the event 'Sally does not have cystic fibrosis', then S_2 = {CN, NC, NN}. Then

P(S_1 | S_2) = P(S_1 ∩ S_2) / P(S_2) = (2/4) / (3/4) = 2/3.

(b) We know nothing specific about Harry, so we assume that his genes are randomly and independently selected from the population. We are given that the probability of a random gene being C or N is 1/50 and 49/50 respectively. Then the probabilities of Harry having genes CC, CN, NC, NN are (1/50)², (1/50)·(49/50), (49/50)·(1/50), and (49/50)², respectively. So, if H_1 is the event 'Harry has at least one C gene', and H_2 is the event 'Harry does not have cystic fibrosis', then

P(H_1 | H_2) = P(H_1 ∩ H_2) / P(H_2)
             = ((49/2500) + (49/2500)) / ((49/2500) + (49/2500) + (2401/2500)) = 2/51.

(c) Let X be the event that Harry's and Sally's child has cystic fibrosis. As in (a), this can only occur if Harry and Sally both have CN or NC genes. That is, X ⊆ S_3 ∩ H_3, where S_3 = S_1 ∩ S_2 and H_3 = H_1 ∩ H_2. Now if Harry and Sally are both CN or NC, these genes pass independently to the baby, and so

P(X | S_3 ∩ H_3) = P(X) / P(S_3 ∩ H_3) = 1/4.

(Remember the principle that we started with!)

We are asked to find P(X | S_2 ∩ H_2), in other words (since X ⊆ S_3 ∩ H_3 ⊆ S_2 ∩ H_2),

P(X) / P(S_2 ∩ H_2).

Now Harry's and Sally's genes are independent, so

P(S_3 ∩ H_3) = P(S_3) · P(H_3),
P(S_2 ∩ H_2) = P(S_2) · P(H_2).

Thus,

P(X) / P(S_2 ∩ H_2) = (P(X) / P(S_3 ∩ H_3)) · (P(S_3 ∩ H_3) / P(S_2 ∩ H_2))
                    = (1/4) · (P(S_1 ∩ S_2) / P(S_2)) · (P(H_1 ∩ H_2) / P(H_2))
                    = (1/4) · P(S_1 | S_2) · P(H_1 | H_2)
                    = (1/4) · (2/3) · (2/51)
                    = 1/153.

I thank Eduardo Mendes for pointing out a mistake in my previous solution to this problem.

Question The Land of Nod lies in the monsoon zone, and has just two seasons, Wet and Dry. The Wet season lasts for 1/3 of the year, and the Dry season for 2/3 of the year. During the Wet season, the probability that it is raining is 3/4; during the Dry season, the probability that it is raining is 1/6.

(a) I visit the capital city, Oneirabad, on a random day of the year. What is the probability that it is raining when I arrive?

(b) I visit Oneirabad on a random day, and it is raining when I arrive. Given this information, what is the probability that my visit is during the Wet season?

(c) I visit Oneirabad on a random day, and it is raining when I arrive. Given this information, what is the probability that it will be raining when I return to Oneirabad in a year's time?

(You may assume that in a year's time the season will be the same as today but, given the season, whether or not it is raining is independent of today's weather.)

Solution (a) Let W be the event 'it is the wet season', D the event 'it is the dry season', and R the event 'it is raining when I arrive'. We are given that P(W) = 1/3, P(D) = 2/3, P(R | W) = 3/4, P(R | D) = 1/6. By the ToTP,

P(R) = P(R | W)P(W) + P(R | D)P(D) = (3/4)·(1/3) + (1/6)·(2/3) = 13/36.

(b) By Bayes' Theorem,

P(W | R) = P(R | W)P(W) / P(R) = ((3/4)·(1/3)) / (13/36) = 9/13.

(c) Let R′ be the event 'it is raining in a year's time'.
The information we are given is that P(R ∩ R′ | W) = P(R | W)P(R′ | W), and similarly for D. Thus

P(R ∩ R′) = P(R ∩ R′ | W)P(W) + P(R ∩ R′ | D)P(D)
          = (3/4)²·(1/3) + (1/6)²·(2/3) = 89/432,

and so

P(R′ | R) = P(R ∩ R′) / P(R) = (89/432) / (13/36) = 89/156.

Chapter 3  Random variables

In this chapter we define random variables and some related concepts such as probability mass function, expected value, variance, and median; and look at some particularly important types of random variables including the binomial, Poisson, and normal.

3.1 What are random variables?

The Holy Roman Empire was, in the words of the historian Voltaire, “neither holy, nor Roman, nor an empire”. Similarly, a random variable is neither random nor a variable:

A random variable is a function defined on a sample space.

The values of the function can be anything at all, but for us they will always be numbers. The standard abbreviation for 'random variable' is r.v.

Example I select at random a student from the class and measure his or her height in centimetres.

Here, the sample space is the set of students; the random variable is 'height', which is a function from the set of students to the real numbers: h(S) is the height of student S in centimetres. (Remember that a function is nothing but a rule for associating with each element of its domain set an element of its target or range set. Here the domain set is the sample space S, the set of students in the class, and the target space is the set of real numbers.)

Example I throw a six-sided die twice; I am interested in the sum of the two numbers. Here the sample space is

S = {(i, j) : 1 ≤ i, j ≤ 6},

and the random variable F is given by F(i, j) = i + j. The target set is the set {2, 3, ..., 12}.

The two random variables in the above examples are representatives of the two types of random variables that we will consider. These definitions are not quite precise, but more examples should make the idea clearer.

A random variable F is discrete if the values it can take are separated by gaps. For example, F is discrete if it can take only finitely many values (as in the second example above, where the values are the integers from 2 to 12), or if the values of F are integers (for example, the number of nuclear decays which take place in a second in a sample of radioactive material – the number is an integer but we can't easily put an upper limit on it).

A random variable is continuous if there are no gaps between its possible values. In the first example, the height of a student could in principle be any real number between certain extreme limits. A random variable whose values range over an interval of real numbers, or even over all real numbers, is continuous.

One could concoct random variables which are neither discrete nor continuous (e.g. the possible values could be 1, 2, 3, or any real number between 4 and 5), but we will not consider such random variables.

We begin by considering discrete random variables.

3.2 Probability mass function

Let F be a discrete random variable. The most basic question we can ask is: given any value a in the target set of F, what is the probability that F takes the value a? In other words, if we consider the event

A = {x ∈ S : F(x) = a},

what is P(A)? (Remember that an event is a subset of the sample space.) Since events of this kind are so important, we simplify the notation: we write

P(F = a)

in place of P({x ∈ S : F(x) = a}).

(There is a fairly common convention in probability and statistics that random variables are denoted by capital letters and their values by lower-case letters. In fact, it is quite common to use the same letter in lower case for a value of the random variable; thus, we would write P(F = f) in the above example. But remember that this is only a convention, and you are not bound to it.)
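Because a random variable is just a function on the sample space, probabilities of the form P(F = a) can be found by brute-force enumeration when the sample space is small. The Python sketch below is my own illustration (not from the notes) for the die-sum example above; the names F and probs are simply labels I have chosen.

```python
from fractions import Fraction
from collections import Counter

# Sample space for two throws of a fair die, all 36 outcomes equally likely.
sample_space = [(i, j) for i in range(1, 7) for j in range(1, 7)]

def F(outcome):
    """The random variable: sum of the two numbers thrown."""
    i, j = outcome
    return i + j

# P(F = a) = (number of outcomes with F = a) / 36.
counts = Counter(F(x) for x in sample_space)
probs = {a: Fraction(n, len(sample_space)) for a, n in sorted(counts.items())}
print(probs[7])    # 1/6: six of the 36 outcomes sum to 7
print(probs[12])   # 1/36
```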
The probability mass function of a discrete random variable F is the function, formula or table which gives the value of P(F = a) for each element a in the target set of F. If F takes only a few values, it is convenient to list it in a table; otherwise we should give a formula if possible. The standard abbreviation for 'probability mass function' is p.m.f.

Example I toss a fair coin three times. The random variable X gives the number of heads recorded. The possible values of X are 0, 1, 2, 3, and its p.m.f. is

a          0    1    2    3
P(X = a)  1/8  3/8  3/8  1/8

For the sample space is {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}, and each outcome is equally likely. The event 'X = 1', for example, when written as a set of outcomes, is equal to {HTT, THT, TTH}, and has probability 3/8.

Two random variables X and Y are said to have the same distribution if the values they take and their probability mass functions are equal. We write X ∼ Y in this case.

In the above example, if Y is the number of tails recorded during the experiment, then X and Y have the same distribution, even though their actual values are different (indeed, Y = 3 − X).

3.3 Expected value and variance

Let X be a discrete random variable which takes the values a_1, ..., a_n. The expected value or mean of X is the number E(X) given by the formula

E(X) = ∑_{i=1}^{n} a_i P(X = a_i).

That is, we multiply each value of X by the probability that X takes that value, and sum these terms. The expected value is a kind of 'generalised average': if each of the values is equally likely, so that each has probability 1/n, then E(X) = (a_1 + ··· + a_n)/n, which is just the average of the values.

There is an interpretation of the expected value in terms of mechanics. If we put a mass p_i on the axis at position a_i for i = 1, ..., n, where p_i = P(X = a_i), then the centre of mass of all these masses is at the point E(X).

If the random variable X takes infinitely many values, say a_1, a_2, a_3, ..., then we define the expected value of X to be the infinite sum

E(X) = ∑_{i=1}^{∞} a_i P(X = a_i).

Of course, now we have to worry about whether this means anything, that is, whether this infinite series is convergent. This is a question which is discussed at great length in analysis. We won't worry about it too much. Usually, discrete random variables will only have finitely many values; in the few examples we consider where there are infinitely many values, the series will usually be a geometric series or something similar, which we know how to sum. In the proofs below, we assume that the number of values is finite.

The variance of X is the number Var(X) given by

Var(X) = E(X²) − E(X)².

Here, X² is just the random variable whose values are the squares of the values of X. Thus

E(X²) = ∑_{i=1}^{n} a_i² P(X = a_i)

(or an infinite sum, if necessary). The next theorem shows that, if E(X) is a kind of average of the values of X, then Var(X) is a measure of how spread-out the values are around their average.

Proposition 3.1 Let X be a discrete random variable with E(X) = μ. Then

Var(X) = E((X − μ)²) = ∑_{i=1}^{n} (a_i − μ)² P(X = a_i).

For the second term is equal to the third by definition, and the third is

∑_{i=1}^{n} (a_i − μ)² P(X = a_i)
  = ∑_{i=1}^{n} (a_i² − 2μ a_i + μ²) P(X = a_i)
  = ∑_{i=1}^{n} a_i² P(X = a_i) − 2μ ∑_{i=1}^{n} a_i P(X = a_i) + μ² ∑_{i=1}^{n} P(X = a_i).

(What is happening here is that the entire sum consists of n rows with three terms in each row. We add it up by columns instead of by rows, getting three parts with n terms in each part.) Continuing, we find

E((X − μ)²) = E(X²) − 2μE(X) + μ² = E(X²) − E(X)²,

and we are done. (Remember that E(X) = μ, and that ∑_{i=1}^{n} P(X = a_i) = 1 since the events X = a_i form a partition.)
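Proposition 3.1 gives two routes to the variance, and it is reassuring to see them agree numerically. The sketch below is my own check (not part of the notes), run on the three-coin p.m.f. tabulated above.

```python
from fractions import Fraction

# p.m.f. of X = number of heads in three tosses of a fair coin.
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

mu = sum(a * p for a, p in pmf.items())                     # E(X)
var_def = sum(a**2 * p for a, p in pmf.items()) - mu**2     # E(X^2) - E(X)^2
var_alt = sum((a - mu)**2 * p for a, p in pmf.items())      # E((X - mu)^2)

print(mu, var_def, var_alt)   # 3/2 3/4 3/4 -- the two formulas agree
```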
JOINTP.M.F.OFTWORANDOMVARIABLES 43 Some people take the conclusion of this proposition as the definition of vari ance. Example I toss a fair coin three times; X is the number of heads. What are the expectedvalueandvarianceofX E(X)=0×(1/8)+1×(3/8)+2×(3/8)+3×(1/8)=3/2, 2 2 2 2 2 Var(X)=0 ×(1/8)+1 ×(3/8)+2 ×(3/8)+3 ×(1/8)−(3/2) =3/4. IfwecalculatethevarianceusingProposition3.1,weget         2 2 2 2 3 1 1 3 1 3 3 1 3 Var(X)= − × + − × + × + × = . 2 8 2 8 2 8 2 8 4 Twopropertiesofexpectedvalueandvariancecanbeusedasacheckonyour calculations. • TheexpectedvalueofX alwaysliesbetweenthesmallestandlargestvalues ofX. • The variance of X is never negative. (For the formula in Proposition 3.1 is 2 a sum of terms, each of the form (a −μ) (a square, hence nonnegative) i timesP(X =a )(aprobability,hencenonnegative). i 3.4 Jointp.m.f. oftworandomvariables Let X be a random variable taking the values a ,...,a , and let Y be a random 1 n variable taking the values b ,...,b . We say that X andY are independent if, for 1 m anypossiblevaluesiand j,wehave P(X =a,Y =b )=P(X =a )·P(Y =b ). i j i j Here P(X =a,Y =b ) means the probability of the event that X takes the value i j a andY takesthevalueb . Sowecouldrestatethedefinitionasfollows: i j Therandomvariables X andY are independent if, foranyvalue a of i X andanyvalueb ofY,theeventsX =a andY =b areindependent j i j (events). Note the difference between ‘independent events’ and ‘independent random vari ables’.44 CHAPTER3. RANDOMVARIABLES Example In Chapter 2, we saw the following: I have two red pens, one green pen, and one blue pen. I select two pens without replacement. Then the events ‘exactly one red pen selected’ and ‘exactly one green pen selected’ turned out to be independent. Let X be the number of red pens selected, andY the number of greenpensselected. Then P(X =1,Y =1)=P(X =1)·P(Y =1). AreX andY independentrandomvariables No,because P(X =2)=1/6, P(Y =1)=1/2,but P(X =2,Y =1)=0(itis impossibletohavetworedandonegreeninasampleoftwo). Ontheotherhand,ifIrolladietwice,andX andY arethenumbersthatcome up on the first and second throws, then X andY will be independent, even if the dieisnotfair(sothattheoutcomesarenotallequallylikely). If we have more than two random variables (for example X,Y,Z), we say that theyaremutuallyindependentiftheeventsthattherandomvariablestakespecific values (for example, X =a, Y =b, Z =c) are mutually independent. (You may wanttorevisethematerialonmutuallyindependentevents.) Whatabouttheexpectedvaluesofrandomvariables Forexpectedvalue,itis easy,butforvarianceithelpsifthevariablesareindependent: Theorem3.2 LetX andY berandomvariables. (a)E(X+Y)=E(X)+E(Y). (b)IfX andY areindependent,thenVar(X+Y)=Var(X)+Var(Y). Wewillseetheprooflater. IftworandomvariablesX andY arenotindependent,thenknowingthep.m.f. ofeachvariabledoesnottellthewholestory. The joint probability mass function (or joint p.m.f.) of X and Y is the table giving, for each value a of X and each i value b of Y, the probability that X = a and Y = b . We arrange the table so j i j that the rows correspond to the values of X and the columns to the values of Y. Note that summing the entries in the row corresponding to the value a gives the i probability that X =a; that is, the row sums form the p.m.f. of X. Similarly the i column sums form the p.m.f. of Y. (The row and column sums are sometimes calledthemarginaldistributionsormarginals.) In particular, X and Y are independent r.v.s if and only if each entry of the tableisequaltotheproductofitsrowsumanditscolumnsum.3.4. 
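This row-sum/column-sum test is easy to automate. The following Python sketch is my own aside (not from the notes); it applies the test to the joint p.m.f. of the two throws of a fair die mentioned above, which should (and do) come out independent.

```python
from fractions import Fraction

# Joint p.m.f. of (X, Y) = (first throw, second throw) of a fair die.
joint = {(i, j): Fraction(1, 36) for i in range(1, 7) for j in range(1, 7)}

# Marginal p.m.f.s are the row and column sums of the joint table.
px = {i: sum(p for (a, _), p in joint.items() if a == i) for i in range(1, 7)}
py = {j: sum(p for (_, b), p in joint.items() if b == j) for j in range(1, 7)}

# Independent if and only if every entry equals (row sum) x (column sum).
independent = all(joint[i, j] == px[i] * py[j] for (i, j) in joint)
print(independent)   # True
```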
JOINTP.M.F.OFTWORANDOMVARIABLES 45 Example I have two red pens, one green pen, and one blue pen, and I choose twopenswithoutreplacement. LetX bethenumberofredpensthatIchooseand Y the number of green pens. Then the joint p.m.f. of X and Y is given by the followingtable: Y 0 1 1 0 0 6 1 1 X 1 3 3 1 2 0 6 Therowandcolumnsumsgiveusthep.m.f.sforX andY: a 0 1 2 b 0 1 1 2 1 1 1 P(X =a) P(Y =b) 6 3 6 2 2 NowwegivetheproofofTheorem3.2. Weconsiderthejointp.m.f. ofX andY. TherandomvariableX+Y takesthe values a +b for i =1,...,n and j =1,...,m. Now the probability that it takes i j a given value c is the sum of the probabilities P(X =a,Y =b ) over all i and j i j k suchthata +b =c . Thus, i j k E(X+Y) = c P(X+Y =c ) k k ∑ k n m = (a +b )P(X =a,Y =b ) ∑∑ i j i j i=1 j=1 n m m n = a P(X =a,Y =b ) + b P(X =a,Y =b ) . i i j j i j ∑ ∑ ∑ ∑ i=1 j=1 j=1 i=1 m Now P(X =a,Y =b ) is a row sum of the joint p.m.f. table, so is equal to ∑ i j j=1 n P(X =a ), and similarly∑ P(X =a,Y =b ) is a column sum and is equal to i i j i=1 P(Y =b ). So j n m E(X+Y) = a P(X =a )+ b P(Y =b ) i i j j ∑ ∑ i=1 j=1 = E(X)+E(Y). Thevarianceisabittrickier. Firstwecalculate 2 2 2 2 2 E((X+Y) )=E(X +2XY +Y )=E(X )+2E(XY)+E(Y ),46 CHAPTER3. RANDOMVARIABLES using part (a) of the Theorem. We have to consider the term E(XY). For this, we havetomaketheassumptionthatX andY areindependent,thatis, P(X =a ,Y =b )=P(X =a )·P(Y =b ). 1 j i j Asbefore,wehave n m E(XY) = a b P(X =a,Y =b ) i j i j ∑∑ i=1 j=1 n n = a b P(X =a )P(Y =b ) i j i j ∑∑ i=1 j=1 n m = a P(X =a ) · b P(Y =b ) ∑ i i ∑ j j i=1 j=1 = E(X)·E(Y). So 2 2 Var(X+Y) = E((X+Y) )−(E(X+Y)) 2 2 2 2 = (E(X )+2E(XY)+E(Y ))−(E(X) +2E(X)E(Y)+E(Y) ) 2 2 2 2 = (E(X )−E(X) )+2(E(XY)−E(X)E(Y))+(E(Y )−E(Y) ) = Var(X)+Var(Y). To finish this section, we consider constant random variables. (If the thought of a ‘constant variable’ worries you, remember that a random variable is not a variableatallbutafunction,andthereisnothingamisswithaconstantfunction.) Proposition3.3 LetC be a constant random variable with value c. Let X be any randomvariable. (a)E(C)=c,Var(C)=0. (b)E(X+c)=E(X)+c,Var(X+c)=Var(X). 2 (c)E(cX)=cE(X),Var(cX)=c Var(X). Proof (a)TherandomvariableC takesthesinglevaluecwithP(C=c)=1. So E(C)=c·1=c. Also, 2 2 2 2 Var(C)=E(C )−E(C) =c −c =0. 2 2 (ForC isaconstantrandomvariablewithvaluec .)3.5. SOMEDISCRETERANDOMVARIABLES 47 (b) This follows immediately from Theorem 3.2, once we observe that the constant random variableC and any random variable X are independent. (This is truebecauseP(X =a,C =c)=P(X =a)·1.) Then E(X+c)=E(X)+E(C)=E(X)+c, Var(X+c)=Var(X)+Var(C)=Var(X). (c)Ifa ,...,a arethevaluesofX,thenca ,...,ca arethevaluesofcX,and 1 n 1 n P(cX =ca )=P(x=a ). So i i n E(cX) = ca P(cX =ca ) ∑ i i i=1 n = c a P(X =a ) i i ∑ i=1 = cE(X). Then 2 2 2 Var(cX) = E(c X )−E(cX) 2 2 2 = c E(X )−(cE(X)) 2 2 2 = c (E(X )−E(X) ) 2 = c Var(X). 3.5 Somediscreterandomvariables Wenowlookatfivetypesofdiscreterandomvariables,eachdependingononeor more parameters. We describe for each type the situations in which it arises, and give the p.m.f., the expected value, and the variance. If the variable is tabulated in the New Cambridge Statistical Tables, we give the table number, and some examples of using the tables. You should have a copy of the tables to follow the examples. AsummaryofthisinformationisgiveninAppendixB. Before we begin, a comment on the New Cambridge Statistical Tables. They don’t givethe probability massfunction (or p.m.f.), but a closelyrelated function called the cumulative distribution function. 
It is defined for a discrete random variableasfollows. Let X be a random variable taking values a ,a ,...,a . We assume that these 1 2 n are arranged in ascending order: a a ···a . The cumulative distribution 1 2 n function,orc.d.f.,ofX isgivenby F (a )=P(X≤a ). X i i48 CHAPTER3. RANDOMVARIABLES Weseethatitcanbeexpressedintermsofthep.m.f. ofX asfollows: i F (a )=P(X =a )+···+P(X =a )= P(X =a ). X i 1 i ∑ j j=1 Intheotherdirection,wecnrecoverthep.m.f. fromthec.d.f.: P(X =a )=F (a )−F (a ). i X i X i−1 We won’t use the c.d.f. of a discrete random variable except for looking up thetables. Itismuchmoreimportantforcontinuousrandomvariables BernoullirandomvariableBernoulli(p) A Bernoulli random variable is the simplest type of all. It only takes two values, 0and1. Soitsp.m.f. looksasfollows: x 0 1 P(X =x) q p Here, p is the probability that X = 1; it can be any number between 0 and 1. Necessarily q (the probability that X = 0) is equal to 1− p. So p determines everything. For a Bernoulli random variable X, we sometimes describe the experiment as a‘trial’,theeventX =1as‘success’,andtheeventX =0as‘failure’. For example, if a biased coin has probability p of coming down heads, then the number of heads that we get when we toss the coin once is a Bernoulli(p) randomvariable. More generally, let A be any event in a probability space S. With A, we asso ciate a random variable I (remember that a random variable is just a function on A S)bytherule n 1 ifs∈A; I (s)= A 0 ifs∈ / A. The random variable I is called the indicator variable of A, because its value A indicates whether or not A occurred. It is a Bernoulli(p) random variable, where p=P(A). (TheeventI =1isjusttheeventA.) Somepeoplewrite11 insteadof A A I . A CalculationoftheexpectedvalueandvarianceofaBernoullirandomvariable is easy. Let X ∼ Bernoulli(p). (Remember that ∼ means “has the same p.m.f. as”.) E(X)=0·q+1·p= p; 2 2 2 2 Var(X)=0 ·q+1 ·p−p = p−p = pq. (Rememberthatq=1−p.)3.5. SOMEDISCRETERANDOMVARIABLES 49 BinomialrandomvariableBin(n,p) RememberthatforaBernoullirandomvariable,wedescribetheeventX =1asa ‘success’. Now a binomial random variable counts the number of successes in n independenttrialseachassociatedwithaBernoulli(p)randomvariable. For example, suppose that we have a biased coin for which the probability of headsis p. Wetossthecoinntimesandcountthenumberofheadsobtained. This numberisaBin(n,p)randomvariable. ABin(n,p)randomvariableX takesthevalues0,1,2,...,n,andthep.m.f. of X isgivenby n n−k k P(X =k)= C q p k n fork=0,1,2,...,n,whereq=1−p. Thisisbecausethereare C differentways k of obtaining k heads in a sequence of n throws (the number of choices of the k positions in which the heads occur), and the probability of getting k heads and n−k k n−k tailsinaparticularorderisq p . Note that we have given a formula rather than a table here. For small values wecouldtabulatetheresults;forexample,forBin(4,p): k 0 1 2 3 4 4 3 2 2 3 4 P(X =k) q 4q p 6q p 4qp p Note: whenweaddupalltheprobabilitiesinthetable,weget n n n−k k n C q p =(q+p) =1, k ∑ k=0 asitshouldbe: hereweusedthebinomialtheorem n n n n−k k (x+y) = C x y . ∑ k k=0 (Thisargumentexplainsthenameofthebinomialrandomvariable) IfX∼Bin(n,p),then E(X)=np, Var(X)=npq. There are two ways to prove this, an easy way and a harder way. The easy way only works for the binomial, but the harder way is useful for many random vari ables. However, you can skip it if you wish: I have set it in smaller type for this reason. Here is the easy method. 
We have a coin with probability p of coming down heads, and we toss it n times and count the number X of heads. Then X is our Bin(n,p)randomvariable. LetX betherandomvariabledefinedby k  1 ifwegetheadsonthekthtoss, X = k 0 ifwegettailsonthekthtoss.50 CHAPTER3. RANDOMVARIABLES In other words, X is the indicator variable of the event ‘heads on the kth toss’. i Nowwehave X =X +X +···+X 1 2 n (canyouseewhy),andX ,...,X areindependentBernoulli(p)randomvariables 1 n (sincetheyaredefinedbydifferenttossesofacoin). So,aswesawearlier,E(X)= i p, Var(X) = pq. Then, by Theorem 21, since the variables are independent, we i have E(X) = p+p+···+p=np, Var(X) = pq+pq+···+pq=npq. The other method uses a gadget called the probability generating function. We only use it here for calculating expected values and variances, but if you learn more probability theory you willseeotherusesforit. LetX bearandomvariablewhosevaluesarenonnegativeintegers. (We don’t insist that it takes all possible values; this method is fine for the binomial Bin(n,p), which takes values between 0 and n. To save space, we write p for the probability P(X =k). Now the k probabilitygeneratingfunctionofX isthepowerseries k G (x)= p x . X k ∑ (Thesumisoverallvaluesk takenbyX.) Weusethenotation F(x) fortheresultofsubstitutingx=1intheseriesF(x). x=1 Proposition3.4 LetG (x)bethe probability generating function of a random variable X. Then X (a) G (x) =1; X x=1   d (b)E(X)= G (x) ; X dx x=1 h i 2 d 2 (c)Var(X)= G (x) +E(X)−E(X) . X 2 dx x=1 Part (a) is just the statement that probabilities add up to 1: when we substitute x = 1 in the powerseriesforG (x)wejustget p . X ∑ k For part (b), when we differentiate the series termbyterm (you will learn later in Analysis thatthisisOK),weget d k−1 G (x)= kp x . X k ∑ dx Nowputtingx=1inthisseriesweget kp =E(X). ∑ k Forpart(c),differentiatingtwicegives 2 d k−2 G (x)= k(k−1)p x . X k ∑ 2 dx Nowputtingx=1inthisseriesweget 2 2 k(k−1)p = k p − kp =E(X )−E(X). k k k ∑ ∑ ∑ 2 2 2 Adding E(X)andsubtractingE(X) givesE(X )−E(X) ,whichbydefinitionisVar(X).3.5. SOMEDISCRETERANDOMVARIABLES 51 NowletusappplythistothebinomialrandomvariableX∼Bin(n,p). Wehave n n−k k p =P(X =k)= C q p , k k sotheprobabilitygeneratingfunctionis n n n−k k k n C q p x =(q+px) , ∑ k k=0 n bytheBinomialTheorem. Puttingx=1gives(q+p) =1,inagreementwithProposition3.4(a). n−1 Differentiatingonce,usingtheChainRule,wegetnp(q+px) . Puttingx=1wefindthat E(X)=np. 2 n−2 2 Differentiatingagain,weget n(n−1)p (q+px) . Putting x=1gives n(n−1)p . Nowadding 2 E(X)−E(X) ,weget 2 2 2 2 Var(X)=n(n−1)p +np−n p =np−np =npq. ThebinomialrandomvariableistabulatedinTable1oftheCambridgeStatis tical Tables 1. As explained earlier, the tables give the cumulative distribution function. For example, suppose that the probability that a certain coin comes down heads is 0.45. If the coin is tossed 15 times, what is the probability of five or fewer heads Turning to the page n=15 in Table 1 and looking at the row 0.45, youreadofftheanswer0.2608. Whatistheprobabilityofexactlyfiveheads This isP(5orfewer)−P(4orfewer),andfromtablestheansweris0.2608−0.1204= 0.1404. The tables only go up to p=0.5. For larger values of p, use the fact that the number of failures in Bin(n,p) is equal to the number of successes in Bin(n,1− p). Sotheprobabilityoffiveheadsin15tossesofacoinwith p=0.55is0.9745− 0.9231=0.0514. Another interpretation of the binomial random variable concerns sampling. Suppose that we have N balls in a box, of which M are red. 
We sample n balls from the box with replacement; let the random variable X be the number of red ballsinthesample. WhatisthedistributionofX Sinceeachballhasprobability M/N of being red, and different choices are independent, X ∼ Bin(n,p), where p=M/N istheproportionofredballsinthesample. What about sampling without replacement This leads us to our next random variable: HypergeometricrandomvariableHg(n,M,N) Suppose that we have N balls in a box, of which M are red. We sample n balls from the box without replacement. Let the random variable X be the number of52 CHAPTER3. RANDOMVARIABLES red balls in the sample. Such an X is called a hypergeometric random variable Hg(n,M,N). The random variable X can take any of the values 0,1,2,...,n. Its p.m.f. is givenbytheformula M N−M C · C k n−k P(X =k)= . N Cn N For the number of samples of n balls from N is C ; the number of ways of n M N−M choosingkoftheM redballsandn−koftheN−M othersis C · C ;and k n−k allchoicesareequallylikely. The expected value and variance of a hypergeometric random variable are as follows(wewon’tgointotheproofs):       M M N−M N−n E(X)=n , Var(X)=n . N N N N−1 Youshouldcomparethesetothevaluesforabinomialrandomvariable. Ifwe let p=M/N betheproportionofredballsinthehat,thenE(X)=np,andVar(X) isequaltonpqmultipliedbya‘correctionfactor’ (N−n)/(N−1). In particular, if the numbers M and N−M of red and nonred balls in the hat are both very large compared to the size n of the sample, then the difference between sampling with and without replacement is very small, and indeed the ‘correction factor’ is close to 1. So we can say that Hg(n,M,N) is approximately Bin(n,M/N)ifnissmallcomparedtoM andN−M. Consider our example of choosing two pens from four, where two pens are red, one green, and one blue. The number X of red pens is a Hg(2,2,4) random variable. We calculated earlier that P(X =0)=1/6, P(X =1)=2/3 and P(X = 2)=1/6. FromthiswefindbydirectcalculationthatE(X)=1andVar(X)=1/3. Theseagreewiththeformulaeabove. GeometricrandomvariableGeom(p) The geometric random variable is like the binomial but with a different stopping rule. We have again a coin whose probability of heads is p. Now, instead of tossingitafixednumberoftimesandcountingtheheads,wetossituntilitcomes down heads for the first time, and count the number of times we have tossed the coin. Thus, the values of the variable are the positive integers 1,,2,3,... (In theory we might never get a head and toss the coin infinitely often, but if p0 this possibility is ‘infinitely unlikely’, i.e. has probability zero, as we will see.) Wealwaysassumethat0 p1. More generally, the number of independent Bernoulli trials required until the firstsuccessisobtainedisageometricrandomvariable.3.5. SOMEDISCRETERANDOMVARIABLES 53 Thep.m.fofaGeom(p)randomvariableisgivenby k−1 P(X =k)=q p, where q = 1−p. For the event X = k means that we get tails on the first k−1 k−1 tossesandheadsonthe kth,andthiseventhasprobability q p,since‘tails’has probabilityqanddifferenttossesareindependent. Let’sadduptheseprobabilities: ∞ p k−1 2 q p= p+qp+q p+···= =1, ∑ 1−q k=1 since the series is a geometric progression with first term p and common ratio q, where q1. (Just as the binomial theorem shows that probabilities sum to 1 for abinomial randomvariable, and givesits name tothe randomvariable, so the geometricprogressiondoesforthegeometricrandomvariable.) We calculate the expected value and the variance using the probability gener atingfunction. IfX∼Geom(p),theresultwillbethat 2 E(X)=1/p, Var(X)=q/p . 
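Before the generating-function calculation that follows, here is a quick empirical check of these two values. This simulation sketch is my own (not part of the notes); the function name and the choice p = 0.5 are mine, chosen only for illustration.

```python
import random

def geometric_sample(p, rng):
    """Number of tosses of a p-coin up to and including the first head."""
    tosses = 1
    while rng.random() >= p:
        tosses += 1
    return tosses

rng = random.Random(0)
p, n = 0.5, 100_000
xs = [geometric_sample(p, rng) for _ in range(n)]
mean = sum(xs) / n
var = sum(x * x for x in xs) / n - mean**2
print(round(mean, 2), round(var, 2))   # both close to 1/p = 2 and q/p^2 = 2
```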
Wehave ∞ px k−1 k G (x)= q px = , X ∑ 1−qx k=1 againbysummingageometricprogression. Differentiating,weget d (1−qx)p+pxq p G (x)= = . X 2 2 dx (1−qx) (1−qx) Puttingx=1,weobtain p 1 E(X)= = . 2 (1−q) p 3 Differentiatingagaingives2pq/(1−qx) ,so 2pq 1 1 q Var(X)= + − = . 3 2 2 p p p p Forexample,ifwetossafaircoinuntilheadsisobtained,theexpectednumber of tosses until the first head is 2 (so the expected number of tails is 1); and the varianceofthisnumberisalso2.54 CHAPTER3. RANDOMVARIABLES PoissonrandomvariablePoisson(λ) ThePoissonrandomvariable,unliketheoneswehaveseenbefore,isveryclosely connectedwithcontinuousthings. Suppose that ‘incidents’ occur at random times, but at a steady rate overall. The best example is radioactive decay: atomic nuclei decay randomly, but the average number λ which will decay in a given interval is constant. The Poisson randomvariabeX countsthenumberof‘incidents’whichoccurinagiveninterval. So if, on average, there are 2.4 nuclear decays per second, then the number of decaysinonesecondstartingnowisaPoisson(2.4)randomvariable. Another example might be the number of telephone calls a minute to a busy telephonenumber. Although we will not prove it, the p.m.f. for a Poisson(λ) variable X is given bytheformula k λ −λ P(X =k)= e . k Let’scheckthattheseprobabilitiesadduptoone. Weget ∞ k λ −λ λ −λ e =e ·e =1, ∑ k k=0 sincetheexpressioninbracketsisthesumoftheexponentialseries. By analogy with what happened for the binomial and geometric random vari ables, you might have expected that this random variable would be called ‘expo nential’. Unfortunately, this name has been given to a closelyrelated continuous random variable which we will meet later. However, if you speak a little French, youmightuseasamnemonicthefactthatifIgofishing,andthefisharebitingat the rate ofλ per hour on average, then the number of fish I will catch in the next hourisaPoisson(λ)randomvariable. TheexpectedvalueandvarianceofaPoisson(λ)randomvariableX aregiven by E(X)=Var(X)=λ. Againweusetheprobabilitygeneratingfunction. IfX∼Poisson(λ),then ∞ k (λx) −λ λ(x−1) G (x)= e =e , X ∑ k k=0 againusingtheseriesfortheexponentialfunction. λ(x−1) 2 λ(x−1) Differentiationgivesλe ,soE(X)=λ. Differentiatingagaingivesλ e ,so 2 2 Var(X)=λ +λ−λ =λ.3.6. CONTINUOUSRANDOMVARIABLES 55 ThecumulativedistributionfunctionofaPoissonrandomvariableistabulated inTable2oftheNewCambridgeStatisticalTables. So,forexample,wefindfrom the tables that, if 2.4 fish bite per hour on average, then the probability that I will catchnofishinthenexthouris0.0907,whiletheprobabilitythatIcatchatfiveor feweris0.9643(sothattheprobabilitythatIcatchsixormoreis0.0357). There is another situation in which the Poisson distribution arises. Suppose I amlookingforsomeveryrareeventwhichonlyoccursoncein1000trialsonav erage. So I conduct 1000 independent trials. How many occurrences of the event do I see This number is really a binomial random variable Bin(1000,1/1000). But it turns out to be Poisson(1), to a very good approximation. So, for example, theprobabilitythattheeventdoesn’toccurisabout1/e. Thegeneralruleis: If n is large, p is small, and np =λ, then Bin(n,p) can be approxi matedbyPoisson(λ). 3.6 Continuousrandomvariables Wehaven’tsofarreallyexplainedwhatacontinuousrandomvariableis. Itstarget set is the set of real numbers, or perhaps the nonnegative real numbers or just an interval. Thecrucialpropertyisthat,foranyrealnumbera,wehave(X =a)=0; that is, the probability that the height of a random student, or the time I have to waitforabus,ispreciselya,iszero. 
Sowecan’tusetheprobabilitymassfunction forcontinuousrandomvariables;itwouldalwaysbezeroandgivenoinformation. Weusethecumulativedistributionfunctionorc.d.f. instead. Rememberfrom lastweekthatthec.d.f. oftherandomvariableX isthefunctionF definedby X F (x)=P(X≤x). X Note: The name of the function is F ; the lower case x refers to the argument X of the function, the number which is substituted into the function. It is common butnotuniversaltouseastheargumentthelowercaseversionofthenameofthe randomvariable,ashere. NotethatF (y)isthesamefunctionwrittenintermsof X the variable y instead of x, whereas F (x) is the c.d.f. of the random variable Y Y (whichmightbeacompletelydifferentfunction.) Now let X be a continuous random variable. Then, since the probability that X takes the precise value x is zero, there is no difference between P(X ≤x) and P(X x). Proposition3.5 The c.d.f. is an increasing function (this means that F (x)≤ X F (y)ifxy),andapproachesthelimits0asx→−∞and1asx→∞. X56 CHAPTER3. RANDOMVARIABLES Thefunctionisincreasingbecause,ifxy,then F (y)−F (x)=P(X≤y)−P(X≤x)=P(xX≤y)≥0. X X AlsoF (∞)=1becauseX mustcertainlytakesomefinitevalue;andF (−∞)=0 X X becausenovalueissmallerthan−∞ Another important function is the probability density function f . It is ob X tainedbydifferentiatingthec.d.f.: d f (x)= F (x). X X dx Now f (x) is nonnegative, since it is the derivative of an increasing function. If X we know f (x), then F is obtained by integrating. Because F (−∞) = 0, we X X X have Z x F (x)= f (t)dt. X X −∞ Notetheuseofthe“dummyvariable”t inthisintegral. Notealsothat Z b P(a≤X≤b)=F (b)−F (a)= f (t)dt. X X X a You can think of the p.d.f. like this: the probability that the value of X lies in a very small interval from x to x+h is approximately f (x)·h. So, although the X probability of getting exactly the value x is zero, the probability of being close to xisproportionalto f (x). X There is a mechanical analogy which you may find helpful. Remember that wemodelledadiscreterandomvariableX byplacingateachvalueaofX amass equal to P(X = a). Then the total mass is one, and the expected value of X is the centre of mass. For a continuous random variable, imagine instead a wire of variable thickness, so that the density of the wire (mass per unit length) at the point x is equal to f (x). Then again the total mass is one; the mass to the left of X xisF (x);andagainitwillholdthatthecentreofmassisatE(X). X Most facts about continuous random variables are obtained by replacing the p.m.f. by the p.d.f. and replacing sums by integrals. Thus, the expected value of X isgivenby Z ∞ E(X)= xf (x)dx, X −∞ andthevarianceis(asbefore) 2 2 Var(X)=E(X )−E(X) , where Z ∞ 2 2 E(X )= x f (x)dx. X −∞ 2 ItisalsotruethatVar(X)=E((X−μ) ),whereμ=E(X).3.7. MEDIAN,QUARTILES,PERCENTILES 57 Wewillseeexamplesofthesecalculationsshortly. Buthereisasmallexample to show the ideas. The support of a continuous random variable is the smallest intervalcontainingallvaluesofxwhere f (x)0. X SupposethattherandomvariableX hasp.d.f. givenby n 2x if0≤x≤1, f (x)= X 0 otherwise. ThesupportofXistheinterval 0,1. Wechecktheintegral: Z Z ∞ 1   x=1 2 f (x)dx= 2xdx= x =1. X x=0 −∞ 0 ThecumulativedistributionfunctionofX is ( Z 0 ifx0, x 2 F (x)= f (t)dt = x if0≤x≤1, X X −∞ 1 ifx1. (Studythiscarefullytoseehowitworks.) Wehave Z Z ∞ 1 2 2 E(X) = xf (x)dx= 2x dx= , X 3 −∞ 0 Z Z ∞ 1 1 2 2 3 E(X ) = x f (x)dx= 2x dx= , X 2 −∞ 0   2 1 2 1 Var(X) = − = . 
2 3 18 3.7 Median,quartiles,percentiles Anothermeasurecommonlyusedforcontinuousrandomvariablesisthemedian; thisisthevaluemsuchthat“halfofthedistributionliestotheleftofmandhalfto theright”. Moreformally,mshouldsatisfyF (m)=1/2. Itisnotthesameasthe X meanorexpectedvalue. Intheexampleattheendofthelastsection,wesawthat E(X) =2/3. The median of X is the value of m for which F (m) =1/2. Since X √ 2 F (x)=x for0≤x≤1,weseethatm=1/ 2. X Ifthereisavaluemsuchthatthegraphofy= f (x)issymmetricaboutx=m, X thenboththeexpectedvalueandthemedianofX areequaltom. Thelowerquartilel andtheupperquartileuaresimilarlydefinedby F (l)=1/4, F (u)=3/4. X X Thus,theprobabilitythat X liesbetween l and uis3/4−1/4=1/2,sothequar tiles give an estimate of how spreadout the distribution is. More generally, we definethenthpercentileofX tobethevalueofx suchthat n F (x )=n/100, X n58 CHAPTER3. RANDOMVARIABLES thatis,theprobabilitythatX issmallerthanx isn. n Reminder Ifthec.d.f. ofX isF (x)andthep.d.f. is f (x),then X X • differentiateF toget f ,andintegrate f togetF ; X X X X • use f tocalculateE(X)andVar(X); X • use F to calculate P(a≤ X ≤ b) (this is F (b)−F (a)), and the median X X X andpercentilesofX. 3.8 Somecontinuousrandomvariables Inthissectionweexaminethreeimportantcontinuousrandomvariables: theuni form,exponential,andnormal. ThedetailsaresummarisedinAppendixB. UniformrandomvariableU(a,b) Letaandbberealnumberswithab. Auniformrandomvariableontheinterval a,bis,roughlyspeaking,“equallylikelytobeanywhereintheinterval”. Inother words, its probability density function is constant on the interval a,b (and zero outside the interval). What should the constant value c be The integral of the p.d.f. is the area of a rectangle of height c and base b−a; this must be 1, so c=1/(b−a). Thus,thep.d.f. oftherandomvariableX∼U(a,b)isgivenby n 1/(b−a) ifa≤x≤b, f (x)= X 0 otherwise. Byintegration,wefindthatthec.d.f. is ( 0 if xa, F (x)= (x−a)/(b−a) ifa≤x≤b, X 1 if xb. Further calculation (or the symmetry of the p.d.f.) shows that the expected value and the median of X are both given by (a+b)/2 (the midpoint of the interval), 2 whileVar(X)=(b−a) /12. Theuniformrandomvariabledoesn’treallyariseinpracticalsituations. How ever, it is very useful for simulations. Most computer systems include a random number generator, which apparently produces independent values of a uniform randomvariableontheinterval0,1. Ofcourse,theyarenotreallyrandom,since thecomputerisadeterministicmachine;butthereshouldbenoobviouspatternto3.8. SOMECONTINUOUSRANDOMVARIABLES 59 the numbers produced, and in a large number of trials they should be distributed uniformlyovertheinterval. You will learn in the Statistics course how to use a uniform random variable to construct values of other types of discrete or continuous random variables. Its greatsimplicitymakesitthebestchoiceforthispurpose. ExponentialrandomvariableExp(λ) The exponential random variable arises in the same situation as the Poisson: be careful not to confuse them We have events which occur randomly but at a con stant average rate of λ per unit time (e.g. radioactive decays, fish biting). The Poisson random variable, which is discrete, counts how many events will occur in the next unit of time. The exponential random variable, which is continuous, measures exactly how long from now it is until the next event occurs. Not that it takesnonnegativerealnumbersasvalues. IfX∼Exp(λ),thep.d.f. ofX is  0 if x0, f (x)= X −λx λe ifx≥0. Byintegration,wefindthec.d.f. tobe  0 if x0, F (x)= X −λx 1−e ifx≥0. 
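As a quick numerical aside of my own (not in the notes): with the fishing rate λ = 2.4 per hour used in the Poisson example, this c.d.f. gives the probability of waiting at most one hour for the first bite as 1 − e^{−2.4} ≈ 0.9093, which matches 1 minus the Poisson probability of no bites in the hour (the 0.0907 read from the tables earlier). The sketch below just evaluates both sides.

```python
import math

lam = 2.4   # average number of bites per hour, as in the Poisson example

def exp_cdf(x, lam):
    """c.d.f. of an Exp(lam) random variable."""
    return 0.0 if x < 0 else 1.0 - math.exp(-lam * x)

# P(wait for the first bite is at most one hour)
print(round(exp_cdf(1.0, lam), 4))                           # 0.9093
# Same number from the Poisson side: 1 - P(Poisson(2.4) = 0)
p_no_bites = math.exp(-lam) * lam**0 / math.factorial(0)
print(round(1 - p_no_bites, 4))                              # 0.9093 again
```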
Further calculation gives

E(X) = 1/λ, Var(X) = 1/λ².

The median m satisfies 1 − e^{−λm} = 1/2, so that m = log 2 / λ. (The logarithm is to base e, so that log 2 = 0.69314718056 approximately.)

Normal random variable N(μ, σ²)

The normal random variable is the commonest of all in applications, and the most important. There is a theorem called the central limit theorem which says that, for virtually any random variable X which is not too bizarre, if you take the sum (or the average) of n independent random variables with the same distribution as X, the result will be approximately normal, and will become more and more like a normal variable as n grows. This partly explains why a random variable affected by many independent factors, like a man's height, has an approximately normal distribution.

More precisely, if n is large, then a Bin(n, p) random variable is well approximated by a normal random variable with the same expected value np and the same variance npq. (If you are approximating any discrete random variable by a continuous one, you should make a “continuity correction” – see the next section for details and an example.)

The p.d.f. of the random variable X ∼ N(μ, σ²) is given by the formula

f_X(x) = (1 / (σ√(2π))) e^{−(x−μ)²/2σ²}.

We have E(X) = μ and Var(X) = σ². The picture below shows the graph of this function for μ = 0, the familiar 'bell-shaped curve'.

[Figure: the bell-shaped curve of the normal p.d.f. with μ = 0, symmetric about 0.]

The c.d.f. of X is obtained as usual by integrating the p.d.f. However, it is not possible to write the integral of this function (which, stripped of its constants, is e^{−x²}) in terms of 'standard' functions. So there is no alternative but to make tables of its values.

The crucial fact that means that we don't have to tabulate the function for all values of μ and σ is the following:

Proposition 3.6 If X ∼ N(μ, σ²), and Y = (X − μ)/σ, then Y ∼ N(0, 1).

So we only need tables of the c.d.f. for N(0, 1) – this is the so-called standard normal random variable – and we can find the c.d.f. of any normal random variable. The c.d.f. of the standard normal is given in Table 4 of the New Cambridge Statistical Tables [1]. The function is called Φ in the tables.

For example, suppose that X ∼ N(6, 25). What is the probability that X ≤ 8? Putting Y = (X − 6)/5, so that Y ∼ N(0, 1), we find that X ≤ 8 if and only if Y ≤ (8 − 6)/5 = 0.4. From the tables, the probability of this is Φ(0.4) = 0.6554.

The p.d.f. of a standard normal r.v. Y is symmetric about zero. This means that, for any positive number c,

Φ(−c) = P(Y ≤ −c) = P(Y ≥ c) = 1 − P(Y ≤ c) = 1 − Φ(c).

So it is only necessary to tabulate the function for positive values of its argument. So, if X ∼ N(6, 25) and Y = (X − 6)/5 as before, then

P(X ≤ 3) = P(Y ≤ −0.6) = 1 − P(Y ≤ 0.6) = 1 − 0.7257 = 0.2743.

3.9 On using tables

We end this section with a few comments about using tables, not tied particularly to the normal distribution (though most of the examples will come from there).

Interpolation

Any table is limited in the number of entries it contains.
Tabulating something withtheinputgiventooneextradecimalplacewouldmakethetabletentimesas bulky Interpolationcanbeusedtoextendtherangeofvaluestabulated. SupposethatsomefunctionF istabulatedwiththeinputgiventothreeplaces of decimals. It is probably true that F is changing at a roughly constant rate between, say, 0.28 and 0.29. So F(0.283) will be about threetenths of the way betweenF(0.28)andF(0.29). For example, if Φ is the c.d.f. of the normal distribution, then Φ(0.28) = 0.6103andΦ(0.29)=0.6141,soΦ(0.283)=0.6114. (Threetenthsof0 .0038is 0.0011.) Usingtablesinreverse This means, if you have a table of values of F, use it to find x such that F(x) is a givenvaluec. Usually,cwon’tbeinthetableandwehavetointerpolatebetween valuesx andx ,whereF(x )isjustlessthancandF(x )isjustgreater. 1 2 1 2 For example, if Φ is the c.d.f. of the normal distribution, and we want the upperquartile,thenwefindfromtablesΦ(0.67)=0.7486andΦ(0.68)=0.7517, sotherequiredvalueisabout0.6745(since0.0014/0.0031=0.45). Inthiscase,thepercentilepointsofthestandardnormalr.v. aregiveninTable 5 of the New Cambridge Statistical Tables 1, so you don’t need to do this. But youwillfinditnecessaryinothercases. Continuitycorrection Suppose we know that a discrete random variable X is well approximated by a continuousrandomvariableY. Wearegivenatableofthec.d.f. ofY andwantto find information about X. For example, suppose that X takes integer values and we want to find P(a≤ X ≤ b), where a and b are integers. This probability is equalto P(X =a)+P(x=a+1)+···+P(X =b). To say that X can be approximated by Y means that, for example, P(X = a) is approximatelyequalto f (a),where f isthep.d.f. ofY. Thisisequaltothearea Y Y62 CHAPTER3. RANDOMVARIABLES ofarectangleofheight f (a)andbase1(froma−0.5toa+0.5). Thisinturnis, Y to a good approximation, the area under the curve y = f (x) from x =a−0.5 to Y x=a+0.5,sincethepiecesofthecurveaboveandbelowtherectangleoneither sideofx=awillapproximatelycancel. Similarlyfortheothervalues. ... ..... .... .... ..... .... ..... .... ..... y=f (x) ..... ..... Y .... ..... ..... ..... ..... ..... ...... ..... ..... ...... ...... ...... P(X=a) ..... ...... ......u ....... ...... ....... ...... ....... ........ ....... ....... ......... ......... ......... .......... ........... ............ . a−0.5 a a+0.5 Adding all these pieces. we find that P(a≤X ≤b) is approximately equal to the area under the curve y = f (x) from x =a−0.5 to x =b+0.5. This area is Y givenbyF (b+0.5)−F (a−0.5),sinceF istheintegralof f . Saidotherwise, Y Y Y Y thisisP(a−0.5≤Y ≤b+0.5). Wesummarisethecontinuitycorrection: SupposethatthediscreterandomvariableX,takingintegervalues,is approximatedbythecontinuousrandomvariableY. Then P(a≤X≤b)≈P(a−0.5≤Y≤b+0.5)=F (b+0.5)−F (a−0.5). Y Y (Here,≈ means “approximately equal”.) Similarly, for example, P(X ≤b)≈ P(Y ≤b+0.5),andP(X≥a)≈P(Y ≥a−0.5). Example The probability that a light bulb will fail in a year is 0.75, and light bulbsfailindependently. If192bulbsareinstalled,whatistheprobabilitythatthe numberwhichfailinayearliesbetween140and150inclusive Solution Let X be the number of light bulbs which fail in a year. Then X ∼ Bin(192,3/4), and so E(X) =144, Var(X) =36. So X is approximated byY ∼ N(144,36),and P(140≤X≤150)≈P(139.5≤Y ≤150.5) bythecontinuitycorrection.3.10. WORKEDEXAMPLES 63 LetZ =(Y−144)/6. ThenZ∼N(0,1),and   139.5−144 150.5−144 P(139.5≤Y ≤150.5) = P ≤Z≤ 6 6 = P(−0.75≤Z≤1.083) = 0.8606−0.2268 (fromtables) = 0.6338. 3.10 Workedexamples Question I roll a fair die twice. 
Let the random variable X be the maximum of the two numbers obtained, and let Y be the modulus of their difference (that is, thevalueofY isthelargernumberminusthesmallernumber). (a)Writedownthejointp.m.f. of (X,Y). (b)Writedownthep.m.f. ofX,andcalculateitsexpectedvalueanditsvariance. (c)Writedownthep.m.f. ofY,andcalculateitsexpectedvalueanditsvariance. (d)AretherandomvariablesX andY independent Solution (a) Y 0 1 2 3 4 5 1 1 0 0 0 0 0 36 1 2 2 0 0 0 0 36 36 1 2 2 3 0 0 0 36 36 36 1 2 2 2 X 4 0 0 36 36 36 36 1 2 2 2 2 5 0 36 36 36 36 36 1 2 2 2 2 2 6 36 36 36 36 36 36 Thebestwaytoproducethisistowriteouta6×6tablegivingallpossiblevalues for the two throws, work out for each cell what the values of X and Y are, and then count the number of occurrences of each pair. For example: X =5, Y =2 canoccurintwoways: thenumbersthrownmustbe (5,3)or (3,5). (b)Takerowsums: x 1 2 3 4 5 6 1 3 5 7 9 11 P(X =x) 36 36 36 36 36 3664 CHAPTER3. RANDOMVARIABLES Henceintheusualway 161 2555 E(X)= , Var(X)= . 36 1296 (c)Takecolumnsums: y 0 1 2 3 4 5 6 10 8 6 4 2 P(Y =y) 36 36 36 36 36 36 andso 35 665 E(Y)= , Var(Y)= . 18 324 8 (d)No: e.g. P(X =1,Y =2)=0butP(X =1)·P(Y =2)= . 1296 Question Anarchershootsanarrowatatarget. Thedistanceofthearrowfrom thecentreofthetargetisarandomvariableX whosep.d.f. isgivenby  2 (3+2x−x )/9 ifx≤3, f (x)= X 0 if x3. Thearcher’sscoreisdeterminedasfollows: Distance X 0.5 0.5≤X 1 1≤X 1.5 1.5≤X 2 X≥2 Score 10 7 4 1 0 Constructtheprobabilitymassfunctionforthearcher’sscore,andfindthearcher’s expectedscore. Solution First we work out the probability of the arrow being in each of the givenbands: Z 2 0.5 3+2x−x P(X 0.5)=F (0.5)−F (0) = dx X X 9 0   1/2 2 3 9x+3x −x = 27 0 41 = . 216 Similarly we find that P(0.5≤ X 1) = 47/216, P(1≤ X 1.5) = 47/216, P(1.5≤X 2)=41/216,andP(X≥2)=40/216. Sothep.m.f. fotthearcher’s scoreS is s 0 1 4 7 10 40 41 47 47 41 P(S=s) 216 216 216 216 2163.10. WORKEDEXAMPLES 65 Hence 41+47·4+47·7+41·10 121 E(S)= = . 216 27 Question Let T be the lifetime in years of new bus engines. Suppose that T is continuouswithprobabilitydensityfunction    0 forx1 f (x)= T d   forx1 3 x forsomeconstantd. (a)Findthevalueofd. (b)FindthemeanandmedianofT. (c) Suppose that 240 new bus engines are installed at the same time, and that their lifetimes are independent. By making an appropriate approximation, findtheprobabilitythatatmost10oftheengineslastfor4yearsormore. Solution (a)Theintegralof f (x),overthesupportofT,mustbe1. Thatis, T Z ∞ d 1 = dx 3 x 1   ∞ −d = 2 2x 1 = d/2, sod =2. (b)Thec.d.f. ofT isobtainedbyintegratingthep.d.f.;thatis,itis    0 for x1 F (x)= T 1   1− forx1 2 x ThemeanofT is Z Z ∞ ∞ 2 xf (x)dx= dx=2. T 2 x 1 1 2 The median is the value m such that F (m) =1/2. That is, 1−1/m =1/2, T √ orm= 2.66 CHAPTER3. RANDOMVARIABLES (c)Theprobabilitythatanenginelastsforfouryearsormoreis   2 1 1 1−F (4)=1− 1− = . T 4 16 So, if 240 engines are installed, the number which last for four years or more is a binomial random variable X ∼ Bin(240,1/16), with expected value 240× (1/16)=15andvariance240×(1/16)×(15/16)=225/16. 2 We approximate X by Y ∼ N(15,(15/4) ). Using the continuity correction, P(X≤10)≈P(Y ≤10.5). Now,ifZ =(Y−15)/(15/4),thenZ∼N(0,1),and P(Y ≤10.5) = P(Z≤−1.2) = 1−P(Z≤1.2) = 0.1151 usingthetableofthestandardnormaldistribution. NotethatwestartwiththecontinuousrandomvariableT,movetothediscrete random variable X, and then move on to the continuous random variablesY and Z,wherefinallyZ isstandardnormalandsoisinthetables. 
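Both normal-approximation calculations in this chapter (the light bulbs and the bus engines) can also be checked against the exact binomial sums. The Python sketch below is my own addition (not part of the notes); it uses the standard identity Φ(x) = (1 + erf(x/√2))/2 for the normal c.d.f. and math.comb for the exact binomial probabilities, and the function names are mine.

```python
import math

def phi(x):
    """Standard normal c.d.f."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Bin(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

# Light bulbs: X ~ Bin(192, 3/4) approximated by N(144, 36), with continuity correction.
lo, hi = (139.5 - 144) / 6, (150.5 - 144) / 6
print(round(phi(hi) - phi(lo), 3))   # 0.634, close to the 0.6338 read from tables
print(round(binom_cdf(150, 192, 0.75) - binom_cdf(139, 192, 0.75), 3))   # exact value, for comparison

# Bus engines: X ~ Bin(240, 1/16) approximated by N(15, (15/4)^2).
print(round(phi((10.5 - 15) / 3.75), 4))   # 0.1151, as in the worked example
print(round(binom_cdf(10, 240, 1 / 16), 4))   # exact value, for comparison
```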
A true story The answer to the question at the end of the last chapter: As the studentsintheclassobviouslyknew,theclassincludedapairoftwins (Thetwins wereLeoandWillyMoser,whobothhadsuccessfulcareersasmathematicians.) But what went wrong with our argument for the Birthday Paradox We as sumed(withoutsayingso)thatthebirthdaysofthepeopleintheroomwereinde pendent;butofcoursethebirthdaysoftwinsareclearlynotindependentChapter4 Moreonjointdistribution We have seen the joint p.m.f. of two discrete random variables X andY, and we have learned what it means for X and Y to be independent. Now we examine this further to see measures of nonindependence and conditional distributions of randomvariables. 4.1 Covarianceandcorrelation InthissectionweconsiderapairofdiscreterandomvariablesX andY. Remember thatX andY areindependentif P(X =a,Y =b )=P(X =a )·P(Y =b ) i j i j holds for any pair (a,b ) of values of X and Y. We introduce a number (called i j thecovarianceof X andY)whichgivesameasureofhowfartheyarefrombeing independent. Look back at the proof of Theorem 21(b), where we showed that if X and Y areindependentthenVar(X+Y)=Var(X)+Var(Y). Wefoundthat,inanycase, Var(X+Y)=Var(X)+Var(Y)+2(E(XY)−E(X)E(Y)), andthenprovedthatifX andY areindependentthenE(XY)=E(X)E(Y),sothat thelasttermiszero. Now we define the covariance of X and Y to be E(XY)−E(X)E(Y). We write Cov(X,Y) for this quantity. Then the argument we had earlier shows the following: Theorem4.1 (a)Var(X+Y)=Var(X)+Var(Y)+2Cov(X,Y). (b)IfX andY areindependent,thenCov(X,Y)=0. 6768 CHAPTER4. MOREONJOINTDISTRIBUTION Infact,amoregeneralversionof(a),provedbythesameargument,saysthat 2 2 Var(aX+bY)=a Var(X)+b Var(Y)+2abCov(X,Y). (4.1) Another quantity closely related to covariance is the correlation coefficient, corr(X,Y),whichisjusta“normalised”versionofthecovariance. Itisdefinedas follows: Cov(X,Y) p corr(X,Y)= . Var(X)Var(Y) Thepointofthisisthefirstpartofthefollowingtheorem. Theorem4.2 LetX andY berandomvariables. Then (a)−1≤corr(X,Y)≤1; (b)ifX andY areindependent,thencorr(X,Y)=0; (c) ifY =mX+c for some constants m6=0 and c, then corr(X,Y)=1 if m0, andcorr(X,Y)=−1ifm0. Theproofofthefirstpartisoptional: seetheendofthissection. Butnotethat thisisanothercheckonyourcalculations: ifyoucalculateacorrelationcoefficient whichisbiggerthan1orsmallerthan−1,thenyouhavemadeamistake. Part(b) followsimmediatelyfrompart(b)oftheprecedingtheorem. 2 2 Forpart(c),supposethatY =mX+c. LetE(X)=μandVar(X)=α,sothatE(X )=μ +α. Nowwejustcalculateeverythinginsight. E(Y) = E(mX+c)=mE(X)+c=mμ+c 2 2 2 2 2 2 2 E(Y ) = E(m X +2mcX+c )=m (μ +α)+2mcμ+c 2 2 2 Var(Y) = E(Y )−E(Y) =m α 2 2 E(XY) = E(mX +cX)=m(μ +α)+cμ; Cov(X,Y) = E(XY)−E(X)E(Y)=mα p √ 2 2 corr(X,Y) = Cov(X,Y)/ Var(X)Var(Y)=mα/ m α n +1 ifm0, = −1 ifm0. Thus the correlation coefficient is a measure of the extent to which the two variablesarerelated. Itis+1ifY increaseslinearlywithX;0ifthereisnorelation between them; and−1 ifY decreases linearly as X increases. More generally, a positive correlation indicates a tendency for larger X values to be associated with largerY values;anegativevalue,forsmallerX valuestobeassociatedwithlarger Y values.4.1. COVARIANCEANDCORRELATION 69 Example I have two red pens, one green pen, and one blue pen, and I choose twopenswithoutreplacement. LetX bethenumberofredpensthatIchooseand Y the number of green pens. Then the joint p.m.f. of X and Y is given by the followingtable: Y 0 1 1 0 0 6 1 1 X 1 3 3 1 2 0 6 From this we can calculate the marginal p.m.f. 
of X and ofY and hence find theirexpectedvaluesandvariances: E(X)=1, Var(X)=1/3, E(Y)=1/2, Var(Y)=1/4. Also,E(XY)=1/3,sincethesum E(XY)= a b P(X =a,Y =b ) i j i j ∑ i,j containsonlyonetermwhereallthreefactorsarenonzero. Hence Cov(X,Y)=1/3−1/2=−1/6, and −1/6 1 corr(X,Y)=p =−√ . 1/12 3 The negative correlation means that small values of X tend to be associated with larger values ofY. Indeed, if X =0 thenY must be 1, and if X =2 thenY must be0,butifX =1thenY canbeeither0or1. Example We have seen that if X and Y are independent then Cov(X,Y) = 0. However, it doesn’t work the other way around. Consider the following joint p.m.f. Y −1 0 1 1 1 −1 0 5 5 1 X 0 0 0 5 1 1 1 0 5 570 CHAPTER4. MOREONJOINTDISTRIBUTION Now calculation shows that E(X) =E(Y) =E(XY) =0, so Cov(X,Y) =0. But X andY arenotindependent: for P(X =−1)=2/5, P(Y =0)=1/5, but P(X = −1,Y =0)=0. WecalltworandomvariablesX andY uncorrelatedifCov(X,Y)=0(inother words,ifcorr(X,Y)=0). Sowecansay: Independentrandomvariablesareuncorrelated,butuncorrelatedran domvariablesneednotbeindependent. Hereistheproofthatthecorrelationcoefficientliesbetween−1and1. Clearlythisisexactly equivalenttoprovingthatitssquareisatmost1,thatis,that 2 Cov(X,Y) ≤Var(X)·Var(Y). Thisdependsonthefollowingfact: 2 Let p,q,r be real numbers with p0. Suppose that px +2qx+r≥0 for all real 2 numbersx. Then q ≤ pr. 2 For, when we plot the graph y= px +2qx+r, we get a parabola; the hypothesis means that this parabola never goes below the Xaxis, so that either it lies entirely above the axis, or it touches it 2 in one point. This means that thequadraticequation px +2qx+r =0eitherhasnorealroots, or 2 hastwoequalrealroots. Fromhighschoolalgebra,weknowthatthismeansthat q ≤ pr. Nowlet p=Var(X),q=Cov(X,Y),and r =Var(Y). Equation(4.1)showsthat 2 px +2qx+r =Var(xX+Y). (NotethatxisanarbitraryrealnumberhereandhasnoconnectionwiththerandomvariableX) 2 Sincethevarianceofarandomvariableisnevernegative,weseethat px +2qx+r≥0forall 2 2 choicesof x. Nowourargumentaboveshowsthatq ≤ pr,thatis,Cov(X,Y) ≤Var(X)·Var(Y), asrequired. 4.2 Conditionalrandomvariables RememberthattheconditionalprobabilityofeventBgiveneventAisP(BA)= P(A∩B)/P(A). SupposethatX isadiscreterandomvariable. Thentheconditionalprobability thatX takesacertainvaluea,givenA,isjust i P(AholdsandX =a ) i P(X =a A)= . i P(A) This defines the probability mass function of the conditional random variable XA. Sowecan,forexample,talkabouttheconditionalexpectation E(XA)= a P(X =a A). i i ∑ i4.2. CONDITIONALRANDOMVARIABLES 71 Nowtheevent Amightitselfbedefinedbyarandomvariable;forexample, A mightbetheeventthatY takesthevalueb . Inthiscase,wehave j P(X =a,Y =b ) i j P(X =a Y =b )= . i j P(Y =b ) j In other words, we have taken the column of the joint p.m.f. table of X and Y corresponding to the valueY =b . The sum of the entries in this column is just j P(Y =b ),themarginaldistributionofY. Wedividetheentriesinthecolumnby j thisvaluetoobtainanewdistributionofX (whoseprobabilitiesaddupto1). Inparticular,wehave E(XY =b )= a P(X =a Y =b ). j i i j ∑ i Example I have two red pens, one green pen, and one blue pen, and I choose twopenswithoutreplacement. LetX bethenumberofredpensthatIchooseand Y the number of green pens. Then the joint p.m.f. of X and Y is given by the followingtable: Y 0 1 1 0 0 6 1 1 X 1 3 3 1 2 0 6 Inthiscase,theconditionaldistributionsofX correspondingtothetwovalues ofY areasfollows: a 0 1 2 a 0 1 2 2 1 1 2 P(X =aY =0) 0 P(X =aY =1) 0 3 3 3 3 Wehave 4 2 E(XY =0)= , E(XY =1)= . 
If we know the conditional expectation of X for all values of Y, we can find the expected value of X:

Proposition 4.3

    E(X) = Σ_j E(X | Y = b_j) P(Y = b_j).

Proof:

    E(X) = Σ_i a_i P(X = a_i)
         = Σ_i a_i Σ_j P(X = a_i | Y = b_j) P(Y = b_j)
         = Σ_j ( Σ_i a_i P(X = a_i | Y = b_j) ) P(Y = b_j)
         = Σ_j E(X | Y = b_j) P(Y = b_j).

In the above example, we have

    E(X) = E(X | Y = 0) P(Y = 0) + E(X | Y = 1) P(Y = 1)
         = (4/3) × (1/2) + (2/3) × (1/2)
         = 1.

Example  Let us revisit the geometric random variable and calculate its expected value. Recall the situation: I have a coin with probability p of showing heads; I toss it repeatedly until heads appears for the first time; X is the number of tosses.

Let Y be the Bernoulli random variable whose value is 1 if the result of the first toss is heads, and 0 if it is tails. If Y = 1, then we stop the experiment then and there; so if Y = 1, then necessarily X = 1, and we have E(X | Y = 1) = 1. On the other hand, if Y = 0, then the sequence of tosses from that point on has the same distribution as the original experiment; so E(X | Y = 0) = 1 + E(X) (the 1 counting the first toss). So

    E(X) = E(X | Y = 0) P(Y = 0) + E(X | Y = 1) P(Y = 1)
         = (1 + E(X)) · q + 1 · p
         = E(X)(1 − p) + 1;

rearranging this equation, we find that E(X) = 1/p, confirming our earlier value.
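The conditioning argument can also be backed up by simulation. The following Python sketch simulates the experiment many times and compares the sample mean of X with 1/p; the helper name toss_until_heads and the choice p = 0.3 are arbitrary.

```python
import random

def toss_until_heads(p):
    """Number of tosses of a p-coin up to and including the first head."""
    tosses = 1
    while random.random() >= p:   # a tail occurs with probability 1 - p
        tosses += 1
    return tosses

p = 0.3
trials = 100_000
mean = sum(toss_until_heads(p) for _ in range(trials)) / trials
print(mean, 1 / p)   # the sample mean should be close to 1/p
```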
In Proposition 2.1, we saw that independence of events can be characterised in terms of conditional probabilities: A and B are independent if and only if they satisfy P(A | B) = P(A). A similar result holds for independence of random variables:

Proposition 4.4  Let X and Y be discrete random variables. Then X and Y are independent if and only if, for any values a_i and b_j of X and Y respectively, we have

    P(X = a_i | Y = b_j) = P(X = a_i).

This is obtained by applying that characterisation to the events X = a_i and Y = b_j. It can be stated in the following way: X and Y are independent if the conditional p.m.f. of X | (Y = b_j) is equal to the p.m.f. of X, for any value b_j of Y.

4.3 Joint distribution of continuous r.v.s

For continuous random variables, the covariance and correlation can be defined by the same formulae as in the discrete case, and Equation (4.1) remains valid. But we have to examine what is meant by independence for continuous random variables. The formalism here needs even more concepts from calculus than we have used before: functions of two variables, partial derivatives, double integrals. I assume that this is unfamiliar to you, so this section will be brief and can mostly be skipped.

Let X and Y be continuous random variables. The joint cumulative distribution function of X and Y is the function F_{X,Y} of two real variables given by

    F_{X,Y}(x, y) = P(X ≤ x, Y ≤ y).

We define X and Y to be independent if P(X ≤ x, Y ≤ y) = P(X ≤ x) · P(Y ≤ y) for any x and y, that is, F_{X,Y}(x, y) = F_X(x) · F_Y(y). (Note that, just as in the one-variable case, X is part of the name of the function, while x is the argument of the function.)

The joint probability density function of X and Y is

    f_{X,Y}(x, y) = ∂²F_{X,Y}(x, y) / ∂x∂y.

In other words, differentiate with respect to x keeping y constant, and then differentiate with respect to y keeping x constant (or the other way round: the answer is the same for all functions we consider).

The probability that the pair of values of (X, Y) corresponds to a point in some region of the plane is obtained by taking the double integral of f_{X,Y} over that region. For example,

    P(a ≤ X ≤ b, c ≤ Y ≤ d) = ∫_c^d ∫_a^b f_{X,Y}(x, y) dx dy

(the right-hand side means: integrate with respect to x between a and b keeping y fixed; the result is a function of y; integrate this function with respect to y from c to d).

The marginal p.d.f. of X is given by

    f_X(x) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dy,

and the marginal p.d.f. of Y is similarly

    f_Y(y) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dx.

Then the conditional p.d.f. of X | (Y = b) is

    f_{X | (Y = b)}(x) = f_{X,Y}(x, b) / f_Y(b).

The expected value of XY is, not surprisingly,

    E(XY) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x y f_{X,Y}(x, y) dx dy,

and then, as in the discrete case,

    Cov(X,Y) = E(XY) − E(X)E(Y),   corr(X,Y) = Cov(X,Y) / sqrt(Var(X) Var(Y)).

Finally, and importantly:

    The continuous random variables X and Y are independent if and only if f_{X,Y}(x, y) = f_X(x) · f_Y(y).

As usual this holds if and only if the conditional p.d.f. of X | (Y = b) is equal to the marginal p.d.f. of X, for any value b. Also, if X and Y are independent, then Cov(X,Y) = corr(X,Y) = 0 (but not conversely).

4.4 Transformation of random variables

If a continuous random variable Y is a function of another r.v. X, we can find the distribution of Y in terms of that of X.

Example  Let X and Y be random variables. Suppose that X ∼ U[0,4] (uniform on [0,4]) and Y = √X. What is the support of Y? Find the cumulative distribution function and the probability density function of Y.

Solution  (a) The support of X is [0,4], and Y = √X, so the support of Y is [0,2].

(b) We have F_X(x) = x/4 for 0 ≤ x ≤ 4. Now

    F_Y(y) = P(Y ≤ y) = P(X ≤ y²) = F_X(y²) = y²/4

for 0 ≤ y ≤ 2; of course F_Y(y) = 0 for y < 0 and F_Y(y) = 1 for y > 2. (Note that Y ≤ y if and only if X ≤ y², since Y = √X.)

(c) We have

    f_Y(y) = d F_Y(y)/dy = y/2 if 0 ≤ y ≤ 2, and 0 otherwise.

The argument in (b) is the key. If we know Y as a function of X, say Y = g(X), where g is an increasing function, then the event Y ≤ y is the same as the event X ≤ h(y), where h is the inverse function of g. This means that y = g(x) if and only if x = h(y). (In our example, g(x) = √x, and so h(y) = y².) Thus

    F_Y(y) = F_X(h(y)),

and so, by the Chain Rule,

    f_Y(y) = f_X(h(y)) h′(y),

where h′ is the derivative of h. (This is because f_X(x) is the derivative of F_X(x) with respect to its argument x, and the Chain Rule says that if x = h(y) we must multiply by h′(y) to find the derivative with respect to y.)

Applying this formula in our example, we have

    f_Y(y) = (1/4) · 2y = y/2

for 0 ≤ y ≤ 2, since the p.d.f. of X is f_X(x) = 1/4 for 0 ≤ x ≤ 4.

Here is a formal statement of the result.

Theorem 4.5  Let X be a continuous random variable. Let g be a real function which is either strictly increasing or strictly decreasing on the support of X, and which is differentiable there. Let Y = g(X). Then

(a) the support of Y is the image of the support of X under g;
(b) the p.d.f. of Y is given by f_Y(y) = f_X(h(y)) |h′(y)|, where h is the inverse function of g.

(The modulus sign only matters when g is decreasing, since then h′(y) is negative.)

For example, here is the proof of Proposition 3.6: if X ∼ N(μ, σ²) and Y = (X − μ)/σ, then Y ∼ N(0,1). Recall that

    f_X(x) = (1/(σ√(2π))) e^{−(x−μ)²/(2σ²)}.

We have Y = g(X), where g(x) = (x − μ)/σ; this function is everywhere strictly increasing (the graph is a straight line with slope 1/σ), and the inverse function is x = h(y) = σy + μ. Thus h′(y) = σ, and

    f_Y(y) = f_X(σy + μ) · σ = (1/√(2π)) e^{−y²/2},

the p.d.f. of a standard normal variable.

However, rather than remember this formula, together with the conditions for its validity, I recommend going back to the argument we used in the example.
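Theorem 4.5 is easy to sanity-check by simulation. The Python sketch below draws many values of X ∼ U[0,4], sets Y = √X, and compares the observed proportion of values falling in a short interval starting at y with the approximate prediction f_Y(y) × (interval width) = (y/2) × width; the sample size, seed and bin width are arbitrary choices.

```python
import random
from math import sqrt

random.seed(1)
n, width = 200_000, 0.1
ys = [sqrt(random.uniform(0, 4)) for _ in range(n)]   # Y = sqrt(X), X ~ U[0, 4]

for y in (0.5, 1.0, 1.5):
    observed = sum(y <= v < y + width for v in ys) / n
    predicted = (y / 2) * width        # roughly f_Y(y) * width, with f_Y(y) = y/2
    print(y, round(observed, 4), round(predicted, 4))
```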
If the transforming function g is not monotonic (that is, not either increasing or decreasing), then life is a bit more complicated. For example, if X is a random variable taking both positive and negative values, and Y = X², then a given value y of Y could arise from either of the values √y and −√y of X, so we must work out the two contributions and add them up.

Example  X ∼ N(0,1) and Y = X². Find the p.d.f. of Y.

The p.d.f. of X is (1/√(2π)) e^{−x²/2}. Let Φ(x) be its c.d.f., so that P(X ≤ x) = Φ(x), and

    Φ′(x) = (1/√(2π)) e^{−x²/2}.

Now Y = X², so Y ≤ y if and only if −√y ≤ X ≤ √y. Thus

    F_Y(y) = P(Y ≤ y)
           = P(−√y ≤ X ≤ √y)
           = Φ(√y) − Φ(−√y)
           = Φ(√y) − (1 − Φ(√y))     (by symmetry of N(0,1))
           = 2Φ(√y) − 1.

So

    f_Y(y) = d F_Y(y)/dy
           = 2 Φ′(√y) · 1/(2√y)      (by the Chain Rule)
           = (1/√(2πy)) e^{−y/2}.

Of course, this is valid for y > 0; for y < 0, the p.d.f. is zero.

Note the 2 in the line labelled "by the Chain Rule". If you blindly applied the formula of Theorem 4.5, using h(y) = √y, you would not get this 2; it arises from the fact that, since Y = X², each value of Y corresponds to two values of X (one positive, one negative), and each value gives the same contribution, by the symmetry of the p.d.f. of X.

4.5 Worked examples

Question  Two numbers X and Y are chosen independently from the uniform distribution on the unit interval [0,1]. Let Z be the maximum of the two numbers. Find the p.d.f. of Z, and hence find its expected value, variance and median.

Solution  The c.d.f.s of X and Y are identical, that is,

    F_X(x) = F_Y(x) = 0 if x < 0,  x if 0 ≤ x ≤ 1,  1 if x > 1.

(The variable can be called x in both cases; its name doesn't matter.)

The key to the argument is to notice that

    Z = max(X,Y) ≤ x  if and only if  X ≤ x and Y ≤ x.

(For, if both X and Y are smaller than a given value x, then so is their maximum; but if at least one of them is greater than x, then again so is their maximum.) For 0 ≤ x ≤ 1, we have P(X ≤ x) = P(Y ≤ x) = x; by independence,

    P(X ≤ x and Y ≤ x) = x · x = x².

Thus P(Z ≤ x) = x². Of course this probability is 0 if x < 0 and is 1 if x > 1. So the c.d.f. of Z is

    F_Z(x) = 0 if x < 0,  x² if 0 ≤ x ≤ 1,  1 if x > 1.

The median of Z is the value of m such that F_Z(m) = 1/2, that is, m² = 1/2, or m = 1/√2. We obtain the p.d.f. of Z by differentiating:

    f_Z(x) = 2x if 0 ≤ x ≤ 1, and 0 otherwise.

Then we can find E(Z) and Var(Z) in the usual way:

    E(Z) = ∫_0^1 2x² dx = 2/3,    Var(Z) = ∫_0^1 2x³ dx − (2/3)² = 1/18.

Question  I roll a fair die bearing the numbers 1 to 6. If N is the number showing on the die, I then toss a fair coin N times. Let X be the number of heads I obtain.

(a) Write down the p.m.f. for X.
(b) Calculate E(X) without using this information.

Solution  (a) If we were given that N = n, say, then X would be a binomial Bin(n, 1/2) random variable, so P(X = k | N = n) = nCk (1/2)^n.

By the ToTP,

    P(X = k) = Σ_{n=1}^{6} P(X = k | N = n) P(N = n).

Clearly P(N = n) = 1/6 for n = 1, ..., 6. So to find P(X = k), we add up the probability that X = k for a Bin(n, 1/2) r.v. for n = k, ..., 6 and divide by 6. (We start at k because you can't get k heads with fewer than k coin tosses!) The answer comes to

    k          0       1        2       3       4       5       6
    P(X = k)  63/384  120/384  99/384  64/384  29/384  8/384   1/384

For example,

    P(X = 4) = ( 4C4 (1/2)^4 + 5C4 (1/2)^5 + 6C4 (1/2)^6 ) / 6 = (4 + 10 + 15)/384 = 29/384.

(b) By Proposition 4.3,

    E(X) = Σ_{n=1}^{6} E(X | N = n) P(N = n).

Now if we are given that N = n then, as we remarked, X has a binomial Bin(n, 1/2) distribution, with expected value n/2. So

    E(X) = Σ_{n=1}^{6} (n/2) · (1/6) = (1+2+3+4+5+6)/(2·6) = 7/4.

Try working it out from the p.m.f. to check that the answer is the same!
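Both parts of this question can be checked in a few lines of Python. The sketch below builds the p.m.f. of X by the same total-probability sum and then confirms that Σ_k k P(X = k) agrees with the conditional-expectation answer 7/4; exact arithmetic with fractions is used so that the comparison is not blurred by rounding.

```python
from fractions import Fraction
from math import comb

# p.m.f. of X: average the Bin(n, 1/2) p.m.f.s over n = 1, ..., 6.
pmf = {k: sum(Fraction(comb(n, k), 2**n) for n in range(max(k, 1), 7)) / 6
       for k in range(7)}

print(pmf)                                  # same values as the table above, in lowest terms
print(sum(k * p for k, p in pmf.items()))   # 7/4, matching Proposition 4.3
```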
Appendix A

Mathematical notation

The Greek alphabet

Mathematicians use the Greek alphabet for an extra supply of symbols. Some, like π, have standard meanings. You don't need to learn this; keep it for reference. Apologies to Greek students: you may not recognise this, but it is the Greek alphabet that mathematicians use!

Pairs that are often confused are zeta and xi, or nu and upsilon, which look alike; and chi and xi, or epsilon and upsilon, which sound alike.

    Name      Capital   Lowercase
    alpha     A         α
    beta      B         β
    gamma     Γ         γ
    delta     Δ         δ
    epsilon   E         ε
    zeta      Z         ζ
    eta       H         η
    theta     Θ         θ
    iota      I         ι
    kappa     K         κ
    lambda    Λ         λ
    mu        M         μ
    nu        N         ν
    xi        Ξ         ξ
    omicron   O         o
    pi        Π         π
    rho       P         ρ
    sigma     Σ         σ
    tau       T         τ
    upsilon   ϒ         υ
    phi       Φ         φ
    chi       X         χ
    psi       Ψ         ψ
    omega     Ω         ω

Numbers

    Notation           Meaning                           Example
    N                  natural numbers                   1, 2, 3, ...  (some people include 0)
    Z                  integers                          ..., −2, −1, 0, 1, 2, ...
    R                  real numbers                      1/2, √2, π, ...
    |x|                modulus of x                      |2| = 2, |−3| = 3
    a/b                a over b                          12/3 = 4, 2/4 = 0.5
    a | b              a divides b                       4 | 12
    mCn or (m n)       m choose n                        5C2 = 10
    n!                 n factorial                       5! = 120
    Σ_{i=a}^{b} x_i    x_a + x_{a+1} + ··· + x_b         Σ_{i=1}^{3} i² = 1² + 2² + 3² = 14
                       (see the section on Summation below)
    x ≈ y              x is approximately equal to y

Sets

    Notation    Meaning                                      Example
    {...}       a set                                        {1,2,3}   NOTE: {1,2} = {2,1}
    x ∈ A       x is an element of the set A                 2 ∈ {1,2,3}
    {x : ...}   the set of all x such that ...               {x : x² = 4} = {−2, 2}
    (or {x | ...})
    |A|         cardinality of A                             |{1,2,3}| = 3
                (number of elements in A)
    A ∪ B       A union B (elements in either A or B)        {1,2,3} ∪ {2,4} = {1,2,3,4}
    A ∩ B       A intersection B (elements in both A and B)  {1,2,3} ∩ {2,4} = {2}
    A \ B       set difference (elements in A but not B)     {1,2,3} \ {2,4} = {1,3}
    A ⊆ B       A is a subset of B (or equal)                {1,3} ⊆ {1,2,3}
    A′          complement of A (everything not in A)
    ∅           empty set (no elements)                      {1,2} ∩ {3,4} = ∅
    (x,y)       ordered pair                                 NOTE: (1,2) ≠ (2,1)
    A × B       Cartesian product                            {1,2} × {1,3} =
                (set of all ordered pairs)                      {(1,1),(2,1),(1,3),(2,3)}

Summation

What is it?  Let a_1, a_2, a_3, ... be numbers. The notation

    Σ_{i=1}^{n} a_i

(read "sum, from i equals 1 to n, of a_i") means: add up the numbers a_1, a_2, ..., a_n; that is,

    Σ_{i=1}^{n} a_i = a_1 + a_2 + ··· + a_n.

The notation Σ_{j=1}^{n} a_j means exactly the same thing. The variable i or j is called a "dummy variable".

The notation Σ_{i=1}^{m} a_i is not the same, since (if m and n are different) it is telling us to add up a different number of terms.

The sum doesn't have to start at 1. For example,

    Σ_{i=10}^{20} a_i = a_10 + a_11 + ··· + a_20.

Sometimes I get lazy and don't bother to write out the values: I just say Σ_i a_i to mean "add up all the relevant values". For example, if X is a discrete random variable, then we say that

    E(X) = Σ_i a_i P(X = a_i),

where the sum is over all i such that a_i is a value of the random variable X.

Manipulation  The following three rules hold.

    Σ_{i=1}^{n} (a_i + b_i) = Σ_{i=1}^{n} a_i + Σ_{i=1}^{n} b_i.        (A.1)

Imagine the a's and b's written out with a_1 + b_1 on the first line, a_2 + b_2 on the second line, and so on. The left-hand side says: add the two terms in each line, and then add up all the results. The right-hand side says: add the first column (all the a's) and the second column (all the b's), and then add the results. The answers must be the same.

    ( Σ_{i=1}^{n} a_i ) · ( Σ_{j=1}^{m} b_j ) = Σ_{i=1}^{n} Σ_{j=1}^{m} a_i b_j.        (A.2)

The double sum says: add up all these products, for all values of i and j. A simple example shows how it works:

    (a_1 + a_2)(b_1 + b_2) = a_1 b_1 + a_1 b_2 + a_2 b_1 + a_2 b_2.

If in place of numbers we have functions of x, then we can "differentiate term by term":

    d/dx Σ_{i=1}^{n} f_i(x) = Σ_{i=1}^{n} d/dx f_i(x).                  (A.3)

The left-hand side says: add up the functions and differentiate the sum. The right-hand side says: differentiate each function and add up the derivatives.

Another useful result is the Binomial Theorem:

    (x + y)^n = Σ_{k=0}^{n} nCk x^{n−k} y^k.

Infinite sums  Sometimes we meet infinite sums, which we write as Σ_{i=1}^{∞} a_i, for example. This doesn't just mean "add up infinitely many values", since that is not possible. We need Analysis to give us a definition in general. But sometimes we know the answer another way: for example, if a_i = a r^{i−1}, where −1 < r < 1, then

    Σ_{i=1}^{∞} a_i = a + ar + ar² + ··· = a/(1 − r),

using the formula for the sum of the "geometric series". You also need to know the sum of the "exponential series":

    Σ_{i=0}^{∞} x^i / i! = 1 + x + x²/2 + x³/6 + x⁴/24 + ··· = e^x.

Do the three rules of the preceding section hold? Sometimes yes, sometimes no. In Analysis you will see some answers to this question. In all the examples you meet in this book, the rules will be valid.
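These two series identities are easy to check numerically. Here is a small Python sketch comparing partial sums with the closed forms a/(1 − r) and e^x; the values a = 3, r = 0.4, x = 1.5 and the cut-off of 60 terms are arbitrary choices.

```python
from math import exp, factorial

a, r, x = 3.0, 0.4, 1.5

geometric = sum(a * r**(i - 1) for i in range(1, 61))      # partial sum of a*r^(i-1), i = 1..60
exponential = sum(x**i / factorial(i) for i in range(61))  # partial sum of x^i / i!, i = 0..60

print(geometric, a / (1 - r))   # both close to 5.0
print(exponential, exp(x))      # both close to e^1.5
```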
Appendix B

Probability and random variables

Notation

In the table, A and B are events, X and Y are random variables.

    Notation       Meaning                                       Page
    P(A)           probability of A                               3
    P(A | B)       conditional probability of A given B          24
    X = Y          the values of X and Y are equal
    X ∼ Y          X and Y have the same distribution            41
                   (that is, same p.m.f. or same p.d.f.)
    E(X)           expected value of X                           41
    Var(X)         variance of X                                 42
    Cov(X,Y)       covariance of X and Y                         67
    corr(X,Y)      correlation coefficient of X and Y            68
    X | B          conditional random variable                   70
    X | (Y = b)    conditional random variable                   71

Bernoulli random variable Bernoulli(p) (p. 48)
• Occurs when there is a single trial with a fixed probability p of success.
• Takes only the values 0 and 1.
• p.m.f.: P(X = 0) = q, P(X = 1) = p, where q = 1 − p.
• E(X) = p, Var(X) = pq.

Binomial random variable Bin(n, p) (p. 49)
• Occurs when we are counting the number of successes in n independent trials with fixed probability p of success in each trial, e.g. the number of heads in n coin tosses. Also, sampling with replacement from a population with a proportion p of distinguished elements.
• The sum of n independent Bernoulli(p) random variables.
• Values 0, 1, 2, ..., n.
• p.m.f.: P(X = k) = nCk q^{n−k} p^k for 0 ≤ k ≤ n, where q = 1 − p.
• E(X) = np, Var(X) = npq.

Hypergeometric random variable Hg(n, M, N) (p. 51)
• Occurs when we are sampling n elements without replacement from a population of N elements of which M are distinguished.
• Values 0, 1, 2, ..., n.
• p.m.f.: P(X = k) = ( MCk · (N−M)C(n−k) ) / NCn.
• E(X) = n(M/N), Var(X) = n (M/N) ((N−M)/N) ((N−n)/(N−1)).
• Approximately Bin(n, M/N) if n is small compared to N, M, and N−M.

Geometric random variable Geom(p) (p. 52)
• Describes the number of trials up to and including the first success in a sequence of independent Bernoulli trials, e.g. the number of tosses until the first head when tossing a coin.
• Values 1, 2, ... (any positive integer).
• p.m.f.: P(X = k) = q^{k−1} p, where q = 1 − p.
• E(X) = 1/p, Var(X) = q/p².

Poisson random variable Poisson(λ) (p. 54)
• Describes the number of occurrences of a random event in a fixed time interval, e.g. the number of fish caught in a day.
• Values 0, 1, 2, ... (any non-negative integer).
• p.m.f.: P(X = k) = e^{−λ} λ^k / k!.
• E(X) = λ, Var(X) = λ.
• If n is large, p is small, and np = λ, then Bin(n, p) is approximately equal to Poisson(λ) (in the sense that the p.m.f.s are approximately equal).

Uniform random variable U[a, b] (p. 58)
• Occurs when a number is chosen at random from the interval [a, b], with all values equally likely.
• p.d.f.: f(x) = 1/(b − a) if a ≤ x ≤ b, and 0 otherwise.
• c.d.f.: F(x) = 0 if x < a, (x − a)/(b − a) if a ≤ x ≤ b, 1 if x > b.
• E(X) = (a + b)/2, Var(X) = (b − a)²/12.

Exponential random variable Exp(λ) (p. 59)
• Occurs in the same situations as the Poisson random variable, but measures the time from now until the first occurrence of the event.
• p.d.f.: f(x) = λ e^{−λx} if x ≥ 0, and 0 if x < 0.
• c.d.f.: F(x) = 1 − e^{−λx} if x ≥ 0, and 0 if x < 0.
• E(X) = 1/λ, Var(X) = 1/λ².
• However long you wait, the time until the next occurrence has the same distribution.

Normal random variable N(μ, σ²) (p. 59)
• The limit of the sum (or average) of many independent Bernoulli random variables. This also works for many other types of random variables: this statement is known as the Central Limit Theorem.
• p.d.f.: f(x) = (1/(σ√(2π))) e^{−(x−μ)²/(2σ²)}.
• No simple formula for the c.d.f.; use tables.
• E(X) = μ, Var(X) = σ².
• For large n, Bin(n, p) is approximately N(np, npq).
• The standard normal N(0,1) is given in the table. If X ∼ N(μ, σ²), then (X − μ)/σ ∼ N(0,1).

The c.d.f.s of the Binomial, Poisson, and Standard Normal random variables are tabulated in the New Cambridge Statistical Tables, Tables 1, 2 and 4.
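The two approximation statements above can also be illustrated numerically. Here is a short Python sketch comparing a Binomial p.m.f. value with its Poisson approximation, and a Binomial c.d.f. value with its Normal approximation (computed from the error function); the parameter choices n = 100, p = 0.03 and n = 400, p = 0.5 are arbitrary.

```python
from math import comb, exp, factorial, sqrt, erf

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def normal_cdf(x, mu, sigma):
    # Phi((x - mu)/sigma), written in terms of the error function.
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

# Poisson approximation: n large, p small, lam = n*p.
n, p, k = 100, 0.03, 2
lam = n * p
print(binom_pmf(k, n, p), exp(-lam) * lam**k / factorial(k))

# Normal approximation: Bin(n, p) is roughly N(np, npq) for large n.
n, p, k = 400, 0.5, 210
binom_cdf = sum(binom_pmf(j, n, p) for j in range(k + 1))
print(binom_cdf, normal_cdf(k + 0.5, n * p, sqrt(n * p * (1 - p))))   # k + 0.5 is a continuity correction
```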