Lecture Notes on Probability Theory and Random Processes

Jean Walrand
Department of Electrical Engineering and Computer Sciences
University of California, Berkeley, CA 94720
August 25, 2004

Table of Contents

Abstract
Introduction
1 Modelling Uncertainty
  1.1 Models and Physical Reality
  1.2 Concepts and Calculations
  1.3 Function of Hidden Variable
  1.4 A Look Back
  1.5 References
2 Probability Space
  2.1 Choosing At Random
  2.2 Events
  2.3 Countable Additivity
  2.4 Probability Space
  2.5 Examples
    2.5.1 Choosing uniformly in {1, 2, ..., N}
    2.5.2 Choosing uniformly in [0, 1]
    2.5.3 Choosing uniformly in [0, 1]^2
  2.6 Summary
    2.6.1 Stars and Bars Method
  2.7 Solved Problems
3 Conditional Probability and Independence
  3.1 Conditional Probability
  3.2 Remark
  3.3 Bayes' Rule
  3.4 Independence
    3.4.1 Example 1
    3.4.2 Example 2
    3.4.3 Definition
    3.4.4 General Definition
  3.5 Summary
  3.6 Solved Problems
4 Random Variable
  4.1 Measurability
  4.2 Distribution
  4.3 Examples of Random Variable
  4.4 Generating Random Variables
  4.5 Expectation
  4.6 Function of Random Variable
  4.7 Moments of Random Variable
  4.8 Inequalities
  4.9 Summary
  4.10 Solved Problems
5 Random Variables
  5.1 Examples
  5.2 Joint Statistics
  5.3 Independence
  5.4 Summary
  5.5 Solved Problems
6 Conditional Expectation
  6.1 Examples
    6.1.1 Example 1
    6.1.2 Example 2
    6.1.3 Example 3
  6.2 MMSE
  6.3 Two Pictures
  6.4 Properties of Conditional Expectation
  6.5 Gambling System
  6.6 Summary
  6.7 Solved Problems
7 Gaussian Random Variables
  7.1 Gaussian
    7.1.1 N(0, 1): Standard Gaussian Random Variable
    7.1.2 N(μ, σ^2)
  7.2 Jointly Gaussian
    7.2.1 N(0, I)
    7.2.2 Jointly Gaussian
  7.3 Conditional Expectation J.G.
  7.4 Summary
  7.5 Solved Problems
8 Detection and Hypothesis Testing
  8.1 Bayesian
  8.2 Maximum Likelihood Estimation
  8.3 Hypothesis Testing Problem
    8.3.1 Simple Hypothesis
    8.3.2 Examples
    8.3.3 Proof of the Neyman-Pearson Theorem
  8.4 Composite Hypotheses
    8.4.1 Example 1
    8.4.2 Example 2
    8.4.3 Example 3
  8.5 Summary
    8.5.1 MAP
    8.5.2 MLE
    8.5.3 Hypothesis Test
  8.6 Solved Problems
9 Estimation
  9.1 Properties
  9.2 Linear Least Squares Estimator: LLSE
  9.3 Recursive LLSE
  9.4 Sufficient Statistics
  9.5 Summary
    9.5.1 LLSE
  9.6 Solved Problems
10 Limits of Random Variables
  10.1 Convergence in Distribution
  10.2 Transforms
  10.3 Almost Sure Convergence
    10.3.1 Example
  10.4 Convergence in Probability
  10.5 Convergence in L^2
  10.6 Relationships
  10.7 Convergence of Expectation
11 Law of Large Numbers & Central Limit Theorem
  11.1 Weak Law of Large Numbers
  11.2 Strong Law of Large Numbers
  11.3 Central Limit Theorem
  11.4 Approximate Central Limit Theorem
  11.5 Confidence Intervals
  11.6 Summary
  11.7 Solved Problems
12 Random Processes: Bernoulli - Poisson
  12.1 Bernoulli Process
    12.1.1 Time until next 1
    12.1.2 Time since previous 1
    12.1.3 Intervals between 1s
    12.1.4 Saint Petersburg Paradox
    12.1.5 Memoryless Property
    12.1.6 Running Sum
    12.1.7 Gambler's Ruin
    12.1.8 Reflected Running Sum
    12.1.9 Scaling: SLLN
    12.1.10 Scaling: Brownian
  12.2 Poisson Process
    12.2.1 Memoryless Property
    12.2.2 Number of jumps in [0, t]
    12.2.3 Scaling: SLLN
    12.2.4 Scaling: Bernoulli → Poisson
    12.2.5 Sampling
    12.2.6 Saint Petersburg Paradox
    12.2.7 Stationarity
    12.2.8 Time reversibility
    12.2.9 Ergodicity
    12.2.10 Markov
    12.2.11 Solved Problems
13 Filtering Noise
  13.1 Linear Time-Invariant Systems
    13.1.1 Definition
    13.1.2 Frequency Domain
  13.2 Wide Sense Stationary Processes
  13.3 Power Spectrum
  13.4 LTI Systems and Spectrum
  13.5 Solved Problems
14 Markov Chains - Discrete Time
  14.1 Definition
  14.2 Examples
  14.3 Classification
  14.4 Invariant Distribution
  14.5 First Passage Time
  14.6 Time Reversal
  14.7 Summary
  14.8 Solved Problems
15 Markov Chains - Continuous Time
  15.1 Definition
  15.2 Construction (regular case)
  15.3 Examples
  15.4 Invariant Distribution
  15.5 Time-Reversibility
  15.6 Summary
  15.7 Solved Problems
16 Applications
  16.1 Optical Communication Link
  16.2 Digital Wireless Communication Link
  16.3 M/M/1 Queue
  16.4 Speech Recognition
  16.5 A Simple Game
  16.6 Decisions
A Mathematics Review
  A.1 Numbers
    A.1.1 Real, Complex, etc.
    A.1.2 Min, Max, Inf, Sup
  A.2 Summations
  A.3 Combinatorics
    A.3.1 Permutations
    A.3.2 Combinations
    A.3.3 Variations
  A.4 Calculus
  A.5 Sets
  A.6 Countability
  A.7 Basic Logic
    A.7.1 Proof by Contradiction
    A.7.2 Proof by Induction
  A.8 Sample Problems
B Functions
C Nonmeasurable Set
  C.1 Overview
  C.2 Outline
  C.3 Constructing S
D Key Results
E Bertrand's Paradox
F Simpson's Paradox
G Familiar Distributions
  G.1 Table
  G.2 Examples
Bibliography

Abstract

These notes are derived from lectures and office-hour conversations in a junior/senior-level course on probability and random processes in the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley.

The notes do not replace a textbook. Rather, they provide a guide through the material. The style is casual, with no attempt at mathematical rigor. The goal is to help the student figure out the meaning of various concepts and to illustrate them with examples.

When choosing a textbook for this course, we always face a dilemma. On the one hand, there are many excellent books on probability theory and random processes; however, we find that these texts are too demanding for the level of the course. On the other hand, books written for engineering students tend to be fuzzy in their attempt to avoid subtle mathematical concepts. As a result, we always end up having to complement the textbook we select. If we select a math book, we need to help the student understand the meaning of the results and to provide many illustrations. If we select a book for engineers, we need to provide a more complete conceptual picture. These notes grew out of these efforts at filling the gaps.

You will notice that we are not trying to be comprehensive. All the details are available in textbooks. There is no need to repeat the obvious.

The author wants to thank the many inquisitive students he has had in that class and the very good teaching assistants, in particular Teresa Tung, Mubaraq Misra, and Eric Chi, who helped him over the years; they contributed many of the problems.

Happy reading, and keep testing hypotheses!

Berkeley, June 2004 - Jean Walrand

Introduction

Engineering systems are designed to operate well in the face of uncertainty in the characteristics of components and operating conditions. In some cases, uncertainty is introduced into the operations of the system on purpose.
Understanding how to model uncertainty and how to analyze its effects is, or should be, an essential part of an engineer's education. Randomness is a key element of all the systems we design. Communication systems are designed to compensate for noise. Internet routers are built to absorb traffic fluctuations. Buildings must resist the unpredictable vibrations of an earthquake. The power distribution grid carries an unpredictable load. Integrated circuit manufacturing steps are subject to unpredictable variations. Searching for genes is looking for patterns among unknown strings.

What should you understand about probability? It is a complex subject that has been constructed over decades by pure and applied mathematicians. Thousands of books explore various aspects of the theory. How much do you really need to know, and where do you start?

The first key concept is how to model uncertainty (see Chapters 2-3). What do we mean by a "random experiment"? Once you understand that concept, the notion of a random variable should become transparent (see Chapters 4-5). You may be surprised to learn that a random variable does not vary! Terms may be confusing. Once you appreciate the notion of randomness, you should get some understanding of the idea of expectation (Section 4.5) and of how observations modify it (Chapter 6). A special class of random variables (Gaussian) is particularly useful in many applications (Chapter 7). After you master these key notions, you are ready to look at detection (Chapter 8) and estimation problems (Chapter 9). These are representative examples of how one can process observations to reduce uncertainty; that is, how one learns. Many systems are subject to the cumulative effect of many sources of randomness; we study such effects in Chapter 11, after having provided some background in Chapter 10. The final set of important notions concerns random processes: uncertain evolution over time. We look at particularly useful models of such processes in Chapters 12-15. We conclude the notes by discussing a few applications in Chapter 16.

The concepts are difficult, but the math is not (Appendix A reviews what you should know). The trick is to know what we are trying to compute. Look at examples and invent new ones to reinforce your understanding of the ideas. Don't get discouraged if some ideas seem obscure at first, but do not let the obscurity persist! This stuff is not that hard; it is only new to you.

Chapter 1: Modelling Uncertainty

In this chapter we introduce the concept of a model of an uncertain physical system. We stress the importance of concepts that justify the structure of the theory. We comment on the notion of a hidden variable. We conclude the chapter with a very brief historical look at the key contributors and some notes on references.

1.1 Models and Physical Reality

Probability Theory is a mathematical model of uncertainty. In these notes, we introduce examples of uncertainty and we explain how the theory models them. It is important to appreciate the difference between uncertainty in the physical world and the models of Probability Theory. That difference is similar to the one between the laws of theoretical physics and the real world: even though mathematicians view the theory as standing on its own, when engineers use it, they see it as a model of the physical world.

Consider flipping a fair coin repeatedly. Designate by 0 and 1 the two possible outcomes of a coin flip (say 0 for head and 1 for tail). This experiment takes place in the physical world. The outcomes are uncertain. In this chapter, we try to appreciate the probability model of this experiment and to relate it to the physical reality.
1.2 Concepts and Calculations

In our many years of teaching probability models, we have always found that what is most subtle is the interpretation of the models, not the calculations. In particular, this introductory course uses mostly elementary algebra and some simple calculus. However, understanding the meaning of the models, and what one is trying to calculate, requires becoming familiar with some new and nontrivial ideas.

Mathematicians frequently state that "definitions do not require interpretation." We beg to disagree. As a logical edifice, it is perfectly true that no interpretation is needed; but to develop some intuition about the theory, to be able to anticipate theorems and results, and to relate these developments to physical reality, it is important to have some interpretation of the definitions and of the basic axioms of the theory. We will attempt to develop such interpretations as we go along, using physical examples and pictures.

1.3 Function of Hidden Variable

One idea is that the uncertainty in the world is fully contained in the selection of some hidden variable. (This model does not apply to quantum mechanics, which we do not consider here.) If this variable were known, then nothing would be uncertain anymore.

Think of this variable as being picked by nature at the big bang. Many choices were possible, but one particular choice was made and everything derives from it. In most cases, it is easier to think of nature's choice only as it affects a specific experiment, but we worry about this type of detail later.

In other words, everything that is uncertain is a function of that hidden variable. By function, we mean that if we know the hidden variable, then we know everything else. Let us denote the hidden variable by ω. Take one uncertain thing, such as the outcome of the fifth coin flip. This outcome is a function of ω. If we designate the outcome of the fifth coin flip by X, then we conclude that X is a function of ω. We can denote that function by X(ω). Another uncertain thing could be the outcome of the twelfth coin flip. We can denote it by Y(ω). The key point here is that X and Y are functions of the same ω. Remember, there is only one ω (picked by nature at the big bang).

Summing up, everything that is random is some function X of some hidden variable ω. This is a model. To make this model more precise, we need to explain how ω is selected and what these functions X(ω) are like. These ideas will keep us busy for a while!
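To make this picture concrete, here is a minimal sketch of the idea (ours, not part of the notes; the helper flip is purely illustrative). It treats the hidden variable ω as a single number drawn uniformly from [0, 1] and reads off every coin flip as a deterministic function of that one ω, namely its n-th binary digit.

```python
import random

def flip(omega, n):
    """Outcome of the n-th coin flip: the n-th binary digit of omega.

    Once omega is fixed, flip(omega, 1), flip(omega, 2), ... are all
    determined; the only randomness was in the choice of omega itself.
    """
    return int(omega * 2 ** n) % 2

omega = random.random()   # "nature's choice", made once and for all
X = flip(omega, 5)        # outcome of the fifth coin flip, X(omega)
Y = flip(omega, 12)       # outcome of the twelfth coin flip, Y(omega)
print(X, Y)
```

Because the binary digits of a point chosen uniformly in [0, 1] behave like independent fair coin flips, this single ω carries the whole sequence of flips; Chapter 2 makes the choice of such an ω precise.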
1.4 A Look Back

The theory was developed by a number of inquiring minds. We briefly review some of their contributions. (We condense this historical account from the very nice book by S. M. Stigler [9]. For ease of exposition, we simplify the examples and the notation.)

Adrien Marie LEGENDRE, 1752-1833
Best use of inaccurate measurements: Method of Least Squares.

[Figure 1.1: Adrien Marie Legendre]

To start our exploration of "uncertainty," we propose to review very briefly the various attempts at making use of inaccurate measurements.

Say that an amplifier has some gain A that we would like to measure. We observe the input X and the output Y, and we know that Y = AX. If we could measure X and Y precisely, then we could determine A by a simple division. However, assume that we cannot measure these quantities precisely. Instead we make two sets of measurements: (X, Y) and (X', Y'). We would like to find A so that Y = AX and Y' = AX'. For concreteness, say that (X, Y) = (2, 5) and (X', Y') = (4, 7). No value of A works exactly for both sets of measurements. The problem is that we did not measure the input and the output accurately enough, but that may be unavoidable. What should we do?

One approach is to average the measurements, say by taking the arithmetic means: ((X + X')/2, (Y + Y')/2) = (3, 6), and to find the gain A so that 6 = A × 3, that is, A = 2. This approach was commonly used in astronomy before 1750.

A second approach is to solve for A for each pair of measurements: for (X, Y) we find A = 2.5, and for (X', Y') we find A = 1.75. We can average these values and decide that A should be close to (2.5 + 1.75)/2 = 2.125. We skip over many variations proposed by Mayer, Euler, and Laplace.

Another approach is to try to find A so as to minimize the sum of the squares of the errors between Y and AX and between Y' and AX'. That is, we look for the A that minimizes (Y − AX)^2 + (Y' − AX')^2. In our example, we need to find the A that minimizes (5 − 2A)^2 + (7 − 4A)^2 = 74 − 76A + 20A^2. Setting the derivative with respect to A equal to 0, we find −76 + 40A = 0, or A = 1.9. This is the solution proposed by Legendre in 1805. He called this approach the method of least squares.
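The arithmetic above is easy to verify numerically. The sketch below (ours, purely illustrative, not part of the notes) recomputes the three candidate gains for the measurements (X, Y) = (2, 5) and (X', Y') = (4, 7).

```python
# Two noisy measurements (input, output) of an amplifier with Y = A * X (ideally).
pairs = [(2, 5), (4, 7)]

# Approach 1: average the measurements, then solve 6 = A * 3.
A1 = sum(y for _, y in pairs) / sum(x for x, _ in pairs)          # 12 / 6 = 2.0

# Approach 2: solve each pair separately, then average the two gains.
A2 = sum(y / x for x, y in pairs) / len(pairs)                    # (2.5 + 1.75) / 2 = 2.125

# Approach 3 (Legendre): minimize (5 - 2A)^2 + (7 - 4A)^2.
# Setting the derivative to zero gives A = sum(x*y) / sum(x^2).
A3 = sum(x * y for x, y in pairs) / sum(x * x for x, _ in pairs)  # 38 / 20 = 1.9

print(A1, A2, A3)  # 2.0 2.125 1.9
```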
The method of least squares is one that produces the "best" prediction of the output based on the input, under rather general conditions. However, to understand this notion, we need to make a short excursion on the characterization of uncertainty.

Jacob BERNOULLI, 1654-1705
Making sense of uncertainty and chance: Law of Large Numbers.

[Figure 1.2: Jacob Bernoulli]

If an urn contains 5 red balls and 7 blue balls, then the odds of picking "at random" a red ball from the urn are 5 out of 12. One can view the likelihood of a complex event as being the ratio of the number of favorable cases divided by the total number of "equally likely" cases. This is a somewhat circular definition, but not completely: from symmetry considerations, one may postulate the existence of equally likely events. However, in most situations, one cannot determine, let alone count, the equally likely cases nor the favorable cases. (Consider for instance the odds of having a sunny Memorial Day in Berkeley.)

Jacob Bernoulli (one of twelve Bernoullis who contributed to Mathematics, Physics, and Probability) showed the following result. If we pick a ball from an urn with r red balls and b blue balls a large number N of times (always replacing the ball before the next attempt), then the fraction of times that we pick a red ball approaches r/(r + b). More precisely, he showed that the probability that this fraction differs from r/(r + b) by more than any given ε > 0 goes to 0 as N increases. We will learn this result as the weak law of large numbers.

Abraham DE MOIVRE, 1667-1754
Bounding the probability of deviation: Normal distribution.

[Figure 1.3: Abraham de Moivre]

De Moivre found a useful approximation of the probability that preoccupied Jacob Bernoulli. When N is large and ε small, he derived the normal approximation to the probability discussed earlier. This is the first mention of this distribution and an example of the Central Limit Theorem.

Thomas SIMPSON, 1710-1761
A first attempt at posterior probability.

[Figure 1.4: Thomas Simpson]

Looking again at Bernoulli's and de Moivre's problem, we see that they assumed p = r/(r + b) known and worried about the probability that the fraction of N balls selected from the urn differs from p by more than a fixed ε > 0. Bernoulli showed that this probability goes to zero (he also got some conservative estimates of the N needed for that probability to be a given small number). De Moivre improved on these estimates.

Simpson (a heavy drinker) worried about the "reverse" question. Assume we do not know p and that we observe the fraction q of a large number N of balls being red. We believe that p should be close to q, but how close can we be confident that it is? Simpson proposed a naive answer by making arbitrary assumptions on the likelihood of the values of p.

Thomas BAYES, 1701-1761
The importance of the prior distribution: Bayes' rule.

[Figure 1.5: Thomas Bayes]

Bayes understood Simpson's error. To appreciate Bayes' argument, assume that q = 0.6 and that we have made 100 experiments. What are the odds that p ∈ [0.55, 0.65]? If you are told that p = 0.5, then these odds are 0. However, if you are told that the urn was chosen such that p = 0.5 or p = 0.6, with equal probabilities, then the odds that p ∈ [0.55, 0.65] are now close to 1 (a small numerical sketch of this calculation appears at the end of this section).

Bayes understood how to include systematically the information about the prior distribution in the calculation of the posterior distribution. He discovered what we know today as Bayes' rule, a simple but very useful identity.

Pierre Simon LAPLACE, 1749-1827
Posterior distribution: Analytical methods.

[Figure 1.6: Pierre Simon Laplace]

Laplace introduced transform methods to evaluate probabilities. He provided derivations of the central limit theorem and various approximation results for integrals (based on what is known as Laplace's method).

Carl Friedrich GAUSS, 1777-1855
Least Squares Estimation with Gaussian errors.

[Figure 1.7: Carl Friedrich Gauss]

Gauss developed the systematic theory of least squares estimation when the errors are Gaussian. We explain in the notes the remarkable fact that the best estimate is linear in the observations.

Andrei Andreyevich MARKOV, 1856-1922
Markov Chains.

[Figure 1.8: Andrei Andreyevich Markov]

A sequence of coin flips produces results that are independent. Many physical systems exhibit a more complex behavior that requires a new class of models. Markov introduced a class of such models that makes it possible to capture dependencies over time. His models, called Markov chains, are both fairly general and tractable.

Andrei Nikolaevich KOLMOGOROV, 1903-1987

Kolmogorov was one of the most prolific mathematicians of the 20th century. He made fundamental contributions to dynamic systems, ergodic theory, the theory of functions and functional analysis, the theory of probability and mathematical statistics, the analysis of turbulence and hydrodynamics, mathematical logic, the theory of complexity, geometry, and topology. In probability theory, he formulated probability as part of measure theory and established some essential properties such as the extension theorem and many other fundamental results.
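To close this look back, here is the small numerical sketch promised in the discussion of Bayes above (ours, not from the notes). It assumes the urn's proportion p of red balls is either 0.5 or 0.6 with equal prior probability, and that 60 of N = 100 draws came out red (q = 0.6); Bayes' rule then weighs each candidate p by how likely it makes the observed data.

```python
from math import comb

prior = {0.5: 0.5, 0.6: 0.5}   # assumed prior: p is 0.5 or 0.6, equally likely
N, k = 100, 60                 # 100 draws, 60 of them red (q = 0.6)

# Likelihood of the observed count under each candidate p (binomial).
likelihood = {p: comb(N, k) * p ** k * (1 - p) ** (N - k) for p in prior}

# Bayes' rule: posterior(p) is proportional to prior(p) * likelihood(p).
evidence = sum(prior[p] * likelihood[p] for p in prior)
posterior = {p: prior[p] * likelihood[p] / evidence for p in prior}

print(posterior)   # roughly {0.5: 0.12, 0.6: 0.88} -- most of the mass on p = 0.6
```

The point is the one Bayes made: the answer depends on the prior. A prior concentrated entirely on p = 0.5 would leave the odds that p ∈ [0.55, 0.65] at 0 no matter what is observed.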
