Lecture Notes on Probability and Statistics

Prof. Williams Hibbs, United States, Teacher
Published: 28-07-2017
Chapter 1
BASIC PROBABILITY

IN THIS CHAPTER:
✔ Random Experiments
✔ Sample Spaces
✔ Events
✔ The Concept of Probability
✔ The Axioms of Probability
✔ Some Important Theorems on Probability
✔ Assignment of Probabilities
✔ Conditional Probability
✔ Theorem on Conditional Probability
✔ Independent Events
✔ Bayes' Theorem or Rule
✔ Combinatorial Analysis
✔ Fundamental Principle of Counting
✔ Permutations
✔ Combinations
✔ Binomial Coefficients
✔ Stirling's Approximation to n!

Copyright 2001 by the McGraw-Hill Companies, Inc.

Random Experiments

We are all familiar with the importance of experiments in science and engineering. Experimentation is useful to us because we can assume that if we perform certain experiments under very nearly identical conditions, we will arrive at results that are essentially the same. In these circumstances, we are able to control the values of the variables that affect the outcome of the experiment.

In some experiments, however, we are not able to ascertain or control the values of certain variables, so that the results will vary from one performance of the experiment to the next even though most of the conditions are the same. These experiments are described as random. Here is an example:

Example 1.1. If we toss a die, the result of the experiment is that it will come up with one of the numbers in the set {1, 2, 3, 4, 5, 6}.

Sample Spaces

A set S that consists of all possible outcomes of a random experiment is called a sample space, and each outcome is called a sample point. Often there will be more than one sample space that can describe the outcomes of an experiment, but there is usually only one that provides the most information.

Example 1.2. If we toss a die, then one sample space is given by {1, 2, 3, 4, 5, 6} while another is {even, odd}. It is clear, however, that the latter would not be adequate to determine, for example, whether an outcome is divisible by 3.
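A random experiment like the die toss of Examples 1.1 and 1.2 is easy to simulate. The sketch below (plain Python; the seed value and number of tosses are arbitrary choices) performs the experiment repeatedly and confirms that every outcome is a sample point of S, then maps each outcome into the coarser sample space {even, odd}:

```python
import random

# Sample space for one toss of a die (Examples 1.1 and 1.2).
S = {1, 2, 3, 4, 5, 6}

random.seed(1)  # arbitrary seed, chosen only for repeatability

# Perform the random experiment 20 times.
tosses = [random.randint(1, 6) for _ in range(20)]

# Every outcome of the experiment is a sample point of S.
assert all(t in S for t in tosses)

# The coarser sample space {even, odd} from Example 1.2 loses
# information: it cannot tell us whether an outcome is divisible by 3.
coarse = ["even" if t % 2 == 0 else "odd" for t in tosses]
assert set(coarse) <= {"even", "odd"}
```

Running the experiment under nearly identical conditions still produces varying results from one toss to the next, which is exactly what makes it random.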
It is often useful to portray a sample space graphically. In such cases, it is desirable to use numbers in place of letters whenever possible.

If a sample space has a finite number of points, it is called a finite sample space. If it has as many points as there are natural numbers 1, 2, 3, …, it is called a countably infinite sample space. If it has as many points as there are in some interval on the x axis, such as 0 ≤ x ≤ 1, it is called a noncountably infinite sample space. A sample space that is finite or countably infinite is often called a discrete sample space, while one that is noncountably infinite is called a nondiscrete sample space.

Example 1.3. Tossing a die yields a discrete sample space. However, picking any number, not just integers, from 1 to 10 yields a nondiscrete sample space.

Events

An event is a subset A of the sample space S, i.e., it is a set of possible outcomes. If the outcome of an experiment is an element of A, we say that the event A has occurred. An event consisting of a single point of S is called a simple or elementary event.

As particular events, we have S itself, which is the sure or certain event since an element of S must occur, and the empty set ∅, which is called the impossible event because an element of ∅ cannot occur.

By using set operations on events in S, we can obtain other events in S. For example, if A and B are events, then

1. A ∪ B is the event "either A or B or both." A ∪ B is called the union of A and B.
2. A ∩ B is the event "both A and B." A ∩ B is called the intersection of A and B.
3. A′ is the event "not A." A′ is called the complement of A.
4. A − B = A ∩ B′ is the event "A but not B." In particular, A′ = S − A.

If the sets corresponding to events A and B are disjoint, i.e., A ∩ B = ∅, we often say that the events are mutually exclusive. This means that they cannot both occur.
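Because events are just subsets of S, the event operations above map directly onto Python's built-in set type. A minimal sketch (the particular die events A and B are illustrative choices, not from the text):

```python
# Events are subsets of the sample space S, so Python set
# operations mirror the event operations in the text.
S = {1, 2, 3, 4, 5, 6}          # sample space for one die toss
A = {2, 4, 6}                   # event "even"
B = {3, 6}                      # event "divisible by 3"

union = A | B                   # A ∪ B: "either A or B or both"
intersection = A & B            # A ∩ B: "both A and B"
complement = S - A              # A′ = S − A: "not A"
difference = A - B              # A − B = A ∩ B′: "A but not B"

assert union == {2, 3, 4, 6}
assert intersection == {6}
assert complement == {1, 3, 5}
assert difference == {2, 4}

# A and its complement are mutually exclusive: their
# intersection is the empty set, so they cannot both occur.
assert A & complement == set()
```

The last assertion is the disjointness condition A ∩ A′ = ∅ stated above.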
We say that a collection of events A_1, A_2, …, A_n is mutually exclusive if every pair in the collection is mutually exclusive.

The Concept of Probability

In any random experiment there is always uncertainty as to whether a particular event will or will not occur. As a measure of the chance, or probability, with which we can expect the event to occur, it is convenient to assign a number between 0 and 1. If we are sure or certain that an event will occur, we say that its probability is 100% or 1. If we are sure that the event will not occur, we say that its probability is zero. If, for example, the probability is 1/4, we would say that there is a 25% chance it will occur and a 75% chance that it will not occur. Equivalently, we can say that the odds against occurrence are 75% to 25%, or 3 to 1.

There are two important procedures by means of which we can estimate the probability of an event.

1. CLASSICAL APPROACH: If an event can occur in h different ways out of a total of n possible ways, all of which are equally likely, then the probability of the event is h/n.

2. FREQUENCY APPROACH: If after n repetitions of an experiment, where n is very large, an event is observed to occur in h of these, then the probability of the event is h/n. This is also called the empirical probability of the event.

Both the classical and frequency approaches have serious drawbacks, the first because the words "equally likely" are vague and the second because the "large number" involved is vague. Because of these difficulties, mathematicians have been led to an axiomatic approach to probability.

The Axioms of Probability

Suppose we have a sample space S. If S is discrete, all subsets correspond to events and conversely; if S is nondiscrete, only special subsets (called measurable) correspond to events. To each event A in the class C of events, we associate a real number P(A).
Then P is called a probability function, and P(A) the probability of the event A, if the following axioms are satisfied.

Axiom 1. For every event A in the class C, P(A) ≥ 0.

Axiom 2. For the sure or certain event S in the class C, P(S) = 1.

Axiom 3. For any number of mutually exclusive events A_1, A_2, … in the class C,

P(A_1 ∪ A_2 ∪ …) = P(A_1) + P(A_2) + …

In particular, for two mutually exclusive events A_1 and A_2,

P(A_1 ∪ A_2) = P(A_1) + P(A_2)

Some Important Theorems on Probability

From the above axioms we can now prove various theorems on probability that are important in further work.

Theorem 1-1: If A_1 ⊂ A_2, then P(A_1) ≤ P(A_2) and

P(A_2 − A_1) = P(A_2) − P(A_1)   (1)

Theorem 1-2: For every event A,

0 ≤ P(A) ≤ 1   (2)

i.e., a probability is between 0 and 1.

Theorem 1-3: For ∅, the empty set,

P(∅) = 0   (3)

i.e., the impossible event has probability zero.

Theorem 1-4: If A′ is the complement of A, then

P(A′) = 1 − P(A)   (4)

Theorem 1-5: If A = A_1 ∪ A_2 ∪ … ∪ A_n, where A_1, A_2, …, A_n are mutually exclusive events, then

P(A) = P(A_1) + P(A_2) + … + P(A_n)   (5)

Theorem 1-6: If A and B are any two events, then

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)   (6)

More generally, if A_1, A_2, A_3 are any three events, then

P(A_1 ∪ A_2 ∪ A_3) = P(A_1) + P(A_2) + P(A_3) − P(A_1 ∩ A_2) − P(A_2 ∩ A_3) − P(A_3 ∩ A_1) + P(A_1 ∩ A_2 ∩ A_3)

Generalizations to n events can also be made.

Theorem 1-7: For any events A and B,

P(A) = P(A ∩ B) + P(A ∩ B′)   (7)

Assignment of Probabilities

If a sample space S consists of a finite number of outcomes a_1, a_2, …, a_n, then by Theorem 1-5,

P(A_1) + P(A_2) + … + P(A_n) = 1   (8)

where A_1, A_2, …, A_n are the elementary events given by A_i = {a_i}.

It follows that we can arbitrarily choose any nonnegative numbers for the probabilities of these simple events as long as the previous equation is satisfied.
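One such assignment, equal probabilities for all the simple events of a die toss, lets us check several of the theorems above by direct counting. A small sketch (the events A and B are illustrative choices; exact fractions avoid rounding issues):

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}

def P(event):
    # Assign each simple event probability 1/|S|, so that
    # P(A) = (number of points in A) / (number of points in S).
    return Fraction(len(event), len(S))

A = {2, 4, 6}   # "even"
B = {3, 6}      # "divisible by 3"

# Theorem 1-4: P(A') = 1 - P(A)
assert P(S - A) == 1 - P(A)

# Theorem 1-6: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
assert P(A | B) == P(A) + P(B) - P(A & B)

# Theorem 1-7: P(A) = P(A ∩ B) + P(A ∩ B')
assert P(A) == P(A & B) + P(A & (S - B))
```

Since the function assigns nonnegative numbers summing to 1, the three axioms hold, and the theorems follow for any choice of events in this finite sample space.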
In particular, if we assume equal probabilities for all simple events, then

P(A_k) = 1/n,  k = 1, 2, …, n   (9)

and if A is any event made up of h such simple events, we have

P(A) = h/n   (10)

This is equivalent to the classical approach to probability. We could of course use other procedures for assigning probabilities, such as the frequency approach.

Assigning probabilities provides a mathematical model, the success of which must be tested by experiment, in much the same manner that theories in physics or other sciences must be tested by experiment.

Remember: The probability of any event must be between 0 and 1.

Conditional Probability

Let A and B be two events such that P(A) > 0. Denote by P(B | A) the probability of B given that A has occurred. Since A is known to have occurred, it becomes the new sample space, replacing the original S. From this we are led to the definition

P(B | A) ≡ P(A ∩ B) / P(A)   (11)

or

P(A ∩ B) ≡ P(A) P(B | A)   (12)

In words, this says that the probability that both A and B occur is equal to the probability that A occurs times the probability that B occurs given that A has occurred. We call P(B | A) the conditional probability of B given A, i.e., the probability that B will occur given that A has occurred. It is easy to show that conditional probability satisfies the axioms of probability previously discussed.

Theorem on Conditional Probability

Theorem 1-8: For any three events A_1, A_2, A_3, we have

P(A_1 ∩ A_2 ∩ A_3) = P(A_1) P(A_2 | A_1) P(A_3 | A_1 ∩ A_2)   (13)

In words, the probability that A_1 and A_2 and A_3 all occur is equal to the probability that A_1 occurs, times the probability that A_2 occurs given that A_1 has occurred, times the probability that A_3 occurs given that both A_1 and A_2 have occurred. The result is easily generalized to n events.

Theorem 1-9: If an event A must result in one of the mutually exclusive events A_1, A_2, …, A_n, then

P(A) = P(A_1) P(A | A_1) + P(A_2) P(A | A_2) + … + P(A_n) P(A | A_n)   (14)
Independent Events

If P(B | A) = P(B), i.e., the probability of B occurring is not affected by the occurrence or nonoccurrence of A, then we say that A and B are independent events. This is equivalent to

P(A ∩ B) = P(A) P(B)   (15)

Conversely, if this equation holds, then A and B are independent.

We say that three events A_1, A_2, A_3 are independent if they are pairwise independent,

P(A_j ∩ A_k) = P(A_j) P(A_k),  j ≠ k, where j, k = 1, 2, 3   (16)

and

P(A_1 ∩ A_2 ∩ A_3) = P(A_1) P(A_2) P(A_3)   (17)

Both of these properties must hold in order for the events to be independent. Independence of more than three events is easily defined.

Note: In order to use this multiplication rule, all of your events must be independent.

Bayes' Theorem or Rule

Suppose that A_1, A_2, …, A_n are mutually exclusive events whose union is the sample space S, i.e., one of these events must occur. Then if A is any event, we have the following important theorem:

Theorem 1-10 (Bayes' Rule):

P(A_k | A) = P(A_k) P(A | A_k) / Σ_{j=1}^{n} P(A_j) P(A | A_j)   (18)

This enables us to find the probabilities of the various events A_1, A_2, …, A_n that can occur. For this reason Bayes' theorem is often referred to as a theorem on the probability of causes.

Combinatorial Analysis

In many cases the number of sample points in a sample space is not very large, and so direct enumeration or counting of the sample points needed to obtain probabilities is not difficult. However, problems arise where direct counting becomes a practical impossibility.
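Bayes' rule (18) is straightforward to evaluate numerically. The sketch below uses two hypothetical causes A_1 and A_2 with made-up prior and conditional probabilities (none of these numbers come from the text; they only illustrate the computation):

```python
from fractions import Fraction

# Hypothetical partition of the sample space into two causes.
# Priors P(A_k) and likelihoods P(A | A_k) are illustrative values.
prior = {"A1": Fraction(1, 3), "A2": Fraction(2, 3)}
likelihood = {"A1": Fraction(3, 4), "A2": Fraction(1, 4)}  # P(A | A_k)

def bayes(k):
    # Theorem 1-10: P(A_k | A) = P(A_k)P(A|A_k) / sum_j P(A_j)P(A|A_j)
    total = sum(prior[j] * likelihood[j] for j in prior)
    return prior[k] * likelihood[k] / total

assert bayes("A1") == Fraction(3, 5)
assert bayes("A2") == Fraction(2, 5)

# The posterior probabilities of the causes sum to 1, as they must.
assert bayes("A1") + bayes("A2") == 1
```

The denominator is exactly the total probability of Theorem 1-9, which is why the posteriors over all the causes always sum to 1.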
In such cases use is made of combinatorial analysis, which could also be called a sophisticated way of counting.

Fundamental Principle of Counting

If one thing can be accomplished in n_1 different ways, and after this a second thing can be accomplished in n_2 different ways, …, and finally a kth thing can be accomplished in n_k different ways, then all k things can be accomplished in the specified order in n_1 n_2 … n_k different ways.

Permutations

Suppose that we are given n distinct objects and wish to arrange r of these objects in a line. Since there are n ways of choosing the first object, and after this is done n − 1 ways of choosing the second object, …, and finally n − r + 1 ways of choosing the rth object, it follows by the fundamental principle of counting that the number of different arrangements, or permutations as they are often called, is given by

nPr = n(n − 1)…(n − r + 1)   (19)

where it is noted that the product has r factors. We call nPr the number of permutations of n objects taken r at a time.

Example 1.4. It is required to seat 5 men and 4 women in a row so that the women occupy the even places. How many such arrangements are possible?

The men may be seated in 5P5 ways, and the women in 4P4 ways. Each arrangement of the men may be associated with each arrangement of the women. Hence,

Number of arrangements = 5P5 · 4P4 = (120)(24) = 2880

In the particular case when r = n, (19) becomes

nPn = n(n − 1)(n − 2)…1 = n!   (20)

which is called n factorial. We can write (19) in terms of factorials as

nPr = n! / (n − r)!   (21)

If r = n, we see that (20) and (21) agree only if we have 0! = 1, and we shall actually take this as the definition of 0!.

Suppose that a set consists of n objects of which n_1 are of one type (i.e., indistinguishable from each other), n_2 are of a second type, …, n_k are of a kth type. Here, of course, n = n_1 + n_2 + … + n_k.
Then the number of different permutations of the objects is

n! / (n_1! n_2! … n_k!)   (22)

Combinations

In a permutation we are interested in the order of arrangement of the objects. For example, abc is a different permutation from bca. In many problems, however, we are interested only in selecting or choosing objects without regard to order. Such selections are called combinations. For example, abc and bca are the same combination.

The total number of combinations of r objects selected from n (also called the combinations of n things taken r at a time) is denoted by nCr, or by the binomial coefficient symbol. We have

nCr = n! / (r! (n − r)!)   (23)

It can also be written

nCr = n(n − 1)…(n − r + 1) / r! = nPr / r!   (24)

It is easy to show that

nCr = nC(n − r)   (25)

Example 1.5. From 7 consonants and 5 vowels, how many words can be formed consisting of 4 different consonants and 3 different vowels? The words need not have meaning.

The 4 different consonants can be selected in 7C4 ways, the 3 different vowels can be selected in 5C3 ways, and the resulting 7 different letters can then be arranged among themselves in 7P7 = 7! ways. Then

Number of words = 7C4 · 5C3 · 7! = 35 · 10 · 5040 = 1,764,000

Binomial Coefficients

The numbers from the combinations formula are often called binomial coefficients because they arise in the binomial expansion

(x + y)^n = x^n + nC1 x^(n−1) y + nC2 x^(n−2) y^2 + … + y^n   (26)

Stirling's Approximation to n!

When n is large, a direct evaluation of n! may be impractical. In such cases, use can be made of the approximate formula

n! ∼ √(2πn) n^n e^(−n)   (27)

where e = 2.71828…, which is the base of natural logarithms.
The symbol ∼ in (27) means that the ratio of the left side to the right side approaches 1 as n → ∞. Computing technology has largely eclipsed the value of Stirling's formula for numerical computations, but the approximation remains valuable for theoretical estimates (see Appendix A).

Chapter 2
DESCRIPTIVE STATISTICS

IN THIS CHAPTER:
✔ Descriptive Statistics
✔ Measures of Central Tendency
✔ Mean
✔ Median
✔ Mode
✔ Measures of Dispersion
✔ Variance and Standard Deviation
✔ Percentiles
✔ Interquartile Range
✔ Skewness

Descriptive Statistics

When giving a report on a data set, it is useful to describe the data set with terms familiar to most people. Therefore, we shall develop widely accepted terms that can help describe a data set. We shall discuss ways to describe the center, spread, and shape of a given data set.

Measures of Central Tendency

A measure of central tendency gives a single value that acts as a representative or average of the values of all the outcomes of your experiment. The main measure of central tendency we will use is the arithmetic mean. While the mean is used the most, two other measures of central tendency are also employed. These are the median and the mode.

Note: There are many ways to measure the central tendency of a data set, with the most common being the arithmetic mean, the median, and the mode. Each has advantages and disadvantages, depending on the data and the intended purpose.

Mean

If we are given a set of n numbers, say x_1, x_2, …, x_n, then the mean, usually denoted by x̄ or µ, is given by

x̄ = (x_1 + x_2 + … + x_n) / n   (1)

Example 2.1. Consider the following set of integers:

S = {1, 2, 3, 4, 5, 6, 7, 8, 9}

The mean, x̄, of the set S is

x̄ = (1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9) / 9 = 5

Median

The median is that value x for which P(X ≤ x) ≥ 1/2 and P(X ≥ x) ≥ 1/2.
In other words, the median is the value where half of the values of x_1, x_2, …, x_n are larger than the median, and half of the values are smaller than the median.

Example 2.2. Consider the following set of integers:

S = {1, 6, 3, 8, 2, 4, 9}

If we want to find the median, we need to find the value x where half the values are above x and half the values are below x. Begin by ordering the list:

S = {1, 2, 3, 4, 6, 8, 9}

Notice that the value 4 has three scores below it and three scores above it. Therefore, the median in this example is 4.

In some instances, it is quite possible that the value of the median will not be one of your observed values.

Example 2.3. Consider the following set of integers:

S = {1, 2, 3, 4, 6, 8, 9, 12}

Since the set is already ordered, we can skip that step, but notice that we don't have just one value in the middle of the list. Instead, we have two values, namely 4 and 6. Therefore, the median can be any number between 4 and 6. In most cases, the average of the two numbers is reported. So the median for this set of integers is

(4 + 6) / 2 = 5

In general, if we have n ordered data points and n is an odd number, then the median is the data point located exactly in the middle of the set. This can be found in location (n + 1)/2 of your set. If n is an even number, then the median is the average of the two middle terms of the ordered set. These can be found in locations n/2 and n/2 + 1.

Mode

The mode of a data set is the value that occurs most often, or in other words, has the highest probability of occurring. Sometimes we can have two, three, or more values that have relatively large probabilities of occurrence. In such cases, we say that the distribution is bimodal, trimodal, or multimodal, respectively.

Example 2.4. Consider the following rolls of a ten-sided die:

R = {2, 8, 1, 9, 5, 2, 7, 2, 7, 9, 4, 7, 1, 5, 2}

The number that appears most often is the number 2. It appears four times.
Therefore, the mode for the set R is the number 2. Note that if the number 7 had appeared one more time, it would have been present four times as well. In this case, we would have had a bimodal distribution, with 2 and 7 as the modes.

Measures of Dispersion

Consider the following two sets of integers:

S = {5, 5, 5, 5, 5, 5}  and  R = {0, 0, 0, 10, 10, 10}

If we calculated the mean for both S and R, we would get the number 5 both times. However, these are two vastly different data sets. Therefore, we need another descriptive statistic besides a measure of central tendency, which we shall call a measure of dispersion. We shall measure the dispersion or scatter of the values of our data set about the mean of the data set. If the values tend to be concentrated near the mean, then this measure shall be small, while if the values tend to be distributed far from the mean, then the measure will be large. The two measures of dispersion that are usually used are called the variance and the standard deviation.

Variance and Standard Deviation

A quantity of great importance in probability and statistics is called the variance. The variance, denoted by σ², for a set of n numbers x_1, x_2, …, x_n, is given by

σ² = [(x_1 − µ)² + (x_2 − µ)² + … + (x_n − µ)²] / n   (2)

The variance is a nonnegative number. The positive square root of the variance is called the standard deviation.

Example 2.5. Find the variance and standard deviation for the following set of test scores:

T = {75, 80, 82, 87, 96}

Since we are measuring dispersion about the mean, we will need to find the mean for this data set:

µ = (75 + 80 + 82 + 87 + 96) / 5 = 84

Using the mean, we can now find the variance:

σ² = [(75 − 84)² + (80 − 84)² + (82 − 84)² + (87 − 84)² + (96 − 84)²] / 5

which leads to the following:

σ² = (81 + 16 + 4 + 9 + 144) / 5 = 50.8

Therefore, the variance for this set of test scores is 50.8.
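Example 2.5 can be reproduced in a few lines of Python using the population formulas above (dividing by n, not n − 1):

```python
# Test scores from Example 2.5.
T = [75, 80, 82, 87, 96]

# Mean: formula (1).
mean = sum(T) / len(T)

# Population variance: formula (2), dividing by n.
variance = sum((x - mean) ** 2 for x in T) / len(T)

# The square root of the variance is the standard deviation.
std = variance ** 0.5

assert mean == 84
assert variance == 50.8
print(round(std, 4))  # → 7.1274
```

This matches the hand computation: the squared deviations sum to 254, and 254/5 = 50.8.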
To get the standard deviation, denoted by σ, simply take the square root of the variance:

σ = √σ² = √50.8 ≈ 7.1274

The variance and standard deviation are generally the most used quantities to report the measure of dispersion. However, there are other quantities that can also be reported.

You Need to Know: It is also widely accepted to divide the variance by (n − 1) as opposed to n. While this leads to a different result, as n gets large, the difference becomes minimal.

Percentiles

It is often convenient to subdivide your ordered data set by use of ordinates so that the number of data points less than the ordinate is some percentage of the total number of observations. The values corresponding to such areas are called percentile values, or briefly, percentiles. Thus, for example, the percentage of scores that fall below the ordinate at x_α is α. For instance, the number of scores less than x_0.10 would be 0.10 or 10% of the total, and x_0.10 would be called the 10th percentile. Another example is the median. Since half the data points fall below the median, it is the 50th percentile (or fifth decile), and can be denoted by x_0.50.

The 25th percentile is often thought of as the median of the scores below the median, and the 75th percentile is often thought of as the median of the scores above the median. The 25th percentile is called the first quartile, while the 75th percentile is called the third quartile. As you can imagine, the median is also known as the second quartile.

Interquartile Range

Another measure of dispersion is the interquartile range. The interquartile range is defined to be the first quartile subtracted from the third quartile. In other words,

x_0.75 − x_0.25

Example 2.6. Find the interquartile range from the following set of golf scores:

S = {67, 69, 70, 71, 74, 77, 78, 82, 89}

Since we have nine data points, and the set is ordered, the median is located in position (9 + 1)/2, or the 5th position.
That means that the median for this set is 74. The first quartile, x_0.25, is the median of the scores below the fifth
