Lecture notes in business statistics

ZiaAhuja,Canada,Professional
Published Date:17-07-2017
MAT 211 Introduction to Business Statistics I
Lecture Notes

Muhammad El-Taha
Department of Mathematics and Statistics
University of Southern Maine
96 Falmouth Street
Portland, ME 04104-9300

MAT 211, Spring 97, revised Fall 97, revised Spring 98

Course Content
Topic 1: Data Analysis
Topic 2: Probability
Topic 3: Random Variables and Discrete Distributions
Topic 4: Continuous Probability Distributions
Topic 5: Sampling Distributions
Topic 6: Point and Interval Estimation

Contents
1 Data Analysis
  1 Introduction
  2 Graphical Methods
  3 Numerical Methods
  4 Percentiles
  5 Sample Mean and Variance for Grouped Data
  6 z-score
2 Probability
  1 Sample Space and Events
  2 Probability of an Event
  3 Laws of Probability
  4 Counting Sample Points
  5 Random Sampling
  6 Modeling Uncertainty
3 Discrete Random Variables
  1 Random Variables
  2 Expected Value and Variance
  3 Discrete Distributions
  4 Markov Chains
4 Continuous Distributions
  1 Introduction
  2 The Normal Distribution
  3 Uniform: U(a, b)
  4 Exponential
5 Sampling Distributions
  1 The Central Limit Theorem (CLT)
  2 Sampling Distributions
6 Large Sample Estimation
  1 Introduction
  2 Point Estimators and Their Properties
  3 Single Quantitative Population
  4 Single Binomial Population
  5 Two Quantitative Populations
  6 Two Binomial Populations

Chapter 1
Data Analysis

Chapter Content.
Introduction
  Statistical Problems
  Descriptive Statistics
Graphical Methods
  Frequency Distributions (Histograms)
  Other Methods
Numerical Methods
  Measures of Central Tendency
  Measures of Variability
  Empirical Rule
  Percentiles

1 Introduction

Statistical Problems
1. A market analyst wants to know the effectiveness of a new diet.
2. A pharmaceutical company wants to know whether a new drug is superior to existing drugs, or whether it has possible side effects.
3. How fuel efficient is a certain car model?
4. Is there any relationship between your GPA and employment opportunities?
5. If you answer all questions on a (T,F) (or multiple choice) examination completely randomly, what are your chances of passing?
6. What is the effect of package designs on sales?
7. How should polls be interpreted? How many individuals do you need to sample for your inferences to be acceptable? What is meant by the margin of error?
8. What is the effect of market strategy on market share?
9. How do you pick the stocks to invest in?

I. Definitions
Probability: A game of chance
Statistics: Branch of science that deals with data analysis
Course objective: To make decisions in the presence of uncertainty

Terminology
Data: Any recorded event (e.g. times to assemble a product)
Information: Any acquired data (e.g. a collection of numbers)
Knowledge: Useful data
Population: The set of all measurements of interest (e.g. all registered voters, all freshman students at the university)
Sample: A subset of measurements selected from the population of interest
Variable: A property of an individual population unit (e.g.
major, height, weight of freshman students)
Descriptive Statistics: deals with procedures used to summarize the information contained in a set of measurements.
Inferential Statistics: deals with procedures used to make inferences (predictions) about a population parameter from information contained in a sample.

Elements of a statistical problem:
(i) A clear definition of the population and variable of interest.
(ii) A design of the experiment or sampling procedure.
(iii) Collection and analysis of data (gathering and summarizing data).
(iv) A procedure for making predictions about the population based on sample information.
(v) A measure of “goodness” or reliability for the procedure.

Objective. (better statement)
To make inferences (predictions, decisions) about certain characteristics of a population based on information contained in a sample.

Types of data: qualitative vs quantitative OR discrete vs continuous
Descriptive statistics: graphical vs numerical methods

2 Graphical Methods

Frequency and relative frequency distributions (Histograms):

Example
Weight Loss Data
20.5  19.5  15.6  24.1   9.9
15.4  12.7   5.4  17.0  28.6
16.9   7.8  23.3  11.8  18.4
13.4  14.3  19.2   9.2  16.8
 8.8  22.1  20.8  12.6  15.9

Objective: Provide a useful summary of the available information.
Method: Construct a statistical graph called a “histogram” (or frequency distribution).

Weight Loss Data
class   boundaries    tally   freq, f   rel. freq, f/n
1       5.0-9.0-              3         3/25 (.12)
2       9.0-13.0-             5         5/25 (.20)
3       13.0-17.0-            7         7/25 (.28)
4       17.0-21.0-            6         6/25 (.24)
5       21.0-25.0-            3         3/25 (.12)
6       25.0-29.0             1         1/25 (.04)
Totals                        25        1.00

Let
k = number of classes
max = largest measurement
min = smallest measurement
n = sample size
w = class width

Rule of thumb:
- The number of classes chosen is usually between 5 and 20. (Most of the time between 7 and 13.)
- The more data one has, the larger the number of classes.

Formulas:
$k = 1 + 3.3\log_{10}(n)$;
$w = \frac{\max - \min}{k}$.

Note: $w = \frac{28.6 - 5.4}{6} = 3.87$. But we used $w = \frac{29 - 5}{6} = 4.0$ (why?)

Graphs: Graph the frequency and relative frequency distributions.

Exercise. Repeat the above example using 12 and 4 classes respectively. Comment on the usefulness of each, including k = 6.

Steps in Constructing a Frequency Distribution (Histogram)
1. Determine the number of classes
2. Determine the class width
3. Locate class boundaries
4. Proceed as above

Possible shapes of frequency distributions
1. Normal distribution (Bell shape)
2. Exponential
3. Uniform
4. Binomial, Poisson (discrete variables)

Important
- The normal distribution is the most popular, most useful, easiest to handle
- It occurs naturally in practical applications
- It lends itself easily to more in-depth analysis

Other Graphical Methods
- Statistical Table: Comparing different populations
- Bar Charts
- Line Charts
- Pie Charts
- Cheating with Charts

3 Numerical Methods

Measures of Central Tendency     Measures of Dispersion (Variability)
1. Sample mean                   1. Range
2. Sample median                 2. Mean Absolute Deviation (MAD)
3. Sample mode                   3. Sample Variance
                                 4. Sample Standard Deviation

I. Measures of Central Tendency
Given a sample of measurements $(x_1, x_2, \cdots, x_n)$ where
n = sample size
$x_i$ = value of the $i$th observation in the sample

1. Sample Mean (arithmetic average)
$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$ or $\bar{x} = \frac{\sum x}{n}$

Example 1: Given a sample of 5 test grades (90, 95, 80, 60, 75), then
$\sum x = 90 + 95 + 80 + 60 + 75 = 400$
$\bar{x} = \frac{\sum x}{n} = \frac{400}{5} = 80$.

Example 2: Let x = age of a randomly selected student
sample: (20, 18, 22, 29, 21, 19)
$\sum x = 20 + 18 + 22 + 29 + 21 + 19 = 129$
$\bar{x} = \frac{\sum x}{n} = \frac{129}{6} = 21.5$

2. Sample Median
The median of a sample (data set) is the middle number when the measurements are arranged in ascending order.
Note:
If n is odd, the median is the middle number.
If n is even, the median is the average of the middle two numbers.

Example 1: Sample (9, 2, 7, 11, 14), n = 5
Step 1: arrange in ascending order: 2, 7, 9, 11, 14
Step 2: med = 9.

Example 2: Sample (9, 2, 7, 11, 6, 14), n = 6
Step 1: 2, 6, 7, 9, 11, 14
Step 2: med $= \frac{7 + 9}{2} = 8$.

Remarks:
(i) $\bar{x}$ is sensitive to extreme values
(ii) the median is insensitive to extreme values (because the median is a measure of location or position).

3. Mode
The mode is the value of x (observation) that occurs with the greatest frequency.
Example: Sample: (9, 2, 7, 11, 14, 7, 2, 7), mode = 7

Effect of $\bar{x}$, median and mode on relative frequency distribution.

II. Measures of Variability
Given: a sample of size n
sample: $(x_1, x_2, \cdots, x_n)$

1. Range:
Range = largest measurement - smallest measurement
or Range = max - min
Example 1: Sample (90, 85, 65, 75, 70, 95)
Range = max - min = 95 - 65 = 30

2. Mean Absolute Deviation (MAD) (not in textbook)
$\text{MAD} = \frac{\sum |x - \bar{x}|}{n}$

Example 2: Same sample
$\bar{x} = \frac{\sum x}{n} = 80$

x        $x - \bar{x}$   $|x - \bar{x}|$
90       10              10
85       5               5
65       -15             15
75       -5              5
70       -10             10
95       15              15
Totals
480      0               60

$\text{MAD} = \frac{\sum |x - \bar{x}|}{n} = \frac{60}{6} = 10$.

Remarks:
(i) MAD is a good measure of variability
(ii) It is difficult to manipulate mathematically

3. Sample Variance, $s^2$
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

4. Sample Standard Deviation, s
$s = \sqrt{s^2}$ or $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$

Example: Same sample as before ($\bar{x} = 80$)

x        $x - \bar{x}$   $(x - \bar{x})^2$
90       10              100
85       5               25
65       -15             225
75       -5              25
70       -10             100
95       15              225
Totals
480      0               700

Therefore
$\bar{x} = \frac{\sum x}{n} = \frac{480}{6} = 80$
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{700}{5} = 140$
$s = \sqrt{s^2} = \sqrt{140} = 11.83$

Shortcut Formula for Calculating $s^2$ and s
$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$
$s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}}$ (or $s = \sqrt{s^2}$).

Example: Same sample

x        $x^2$
90       8100
85       7225
65       4225
75       5625
70       4900
95       9025
Totals
480      39,100

$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{39{,}100 - \frac{(480)^2}{6}}{5} = \frac{39{,}100 - 38{,}400}{5} = \frac{700}{5} = 140$
$s = \sqrt{s^2} = \sqrt{140} = 11.83$.
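
The numerical measures above can be checked with a few lines of Python. The following is a minimal sketch, not part of the original notes: the function name `describe` and the dictionary keys are illustrative choices, and the data are the six grades used in the range, MAD and variance examples. It computes the variance both from the defining formula and from the shortcut formula so the two can be compared.

```python
from math import sqrt
from statistics import median

def describe(sample):
    """Return the descriptive measures discussed in this section for a list of numbers."""
    n = len(sample)
    sum_x = sum(sample)
    mean = sum_x / n

    # Deviations from the mean drive MAD and the defining variance formula.
    deviations = [x - mean for x in sample]
    mad = sum(abs(d) for d in deviations) / n
    var_defining = sum(d * d for d in deviations) / (n - 1)

    # Shortcut formula: (sum of x^2 - (sum of x)^2 / n) / (n - 1).
    sum_x2 = sum(x * x for x in sample)
    var_shortcut = (sum_x2 - sum_x ** 2 / n) / (n - 1)

    return {
        "mean": mean,
        "median": median(sample),
        "range": max(sample) - min(sample),
        "mad": mad,
        "variance_defining": var_defining,
        "variance_shortcut": var_shortcut,
        "std_dev": sqrt(var_defining),
    }

# The sample used in the worked examples above.
grades = [90, 85, 65, 75, 70, 95]
summary = describe(grades)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Both variance entries should agree with the hand computation ($s^2 = 140$), which is a quick way to confirm that the shortcut formula is only a rearrangement of the defining one.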
Numerical methods (Summary)
Data: $x_1, x_2, \cdots, x_n$
(i) Measures of central tendency
Sample mean: $\bar{x} = \frac{\sum x_i}{n}$
Sample median: the middle number when the measurements are arranged in ascending order
Sample mode: most frequently occurring value
(ii) Measures of variability
Range: r = max - min
Sample variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
Sample standard deviation: $s = \sqrt{s^2}$

Exercise: Find all the measures of central tendency and measures of variability for the weight loss example.

Graphical Interpretation of the Variance:

Finite Populations
Let N = population size.
Data: $x_1, x_2, \cdots, x_N$
Population mean: $\mu = \frac{\sum x_i}{N}$
Population variance: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
Population standard deviation: $\sigma = \sqrt{\sigma^2}$, i.e. $\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$

Population parameters vs sample statistics.
Sample statistics: $\bar{x}, s^2, s$.
Population parameters: $\mu, \sigma^2, \sigma$.

Practical Significance of the Standard Deviation

Chebyshev’s Inequality. (Regardless of the shape of the frequency distribution)
Given a number $k \geq 1$ and a set of measurements $x_1, x_2, \ldots, x_n$, at least $(1 - \frac{1}{k^2})$ of the measurements lie within k standard deviations of their sample mean.
Restated. At least $(1 - \frac{1}{k^2})$ of the observations lie in the interval $(\bar{x} - ks, \bar{x} + ks)$.

Example. A set of grades has $\bar{x} = 75$, s = 6. Then
(i) (k = 1): at least 0% of all grades lie in (69, 81)
(ii) (k = 2): at least 75% of all grades lie in (63, 87)
(iii) (k = 3): at least 88% of all grades lie in (57, 93)
(iv) (k = 4): at least ?% of all grades lie in (?, ?)
(v) (k = 5): at least ?% of all grades lie in (?, ?)

Suppose that you are told that the frequency distribution is bell shaped. Can you improve the estimates in Chebyshev’s Inequality?

Empirical rule. Given a set of measurements $x_1, x_2, \ldots, x_n$ that is bell shaped. Then
(i) approximately 68% of the measurements lie within one standard deviation of their sample mean, i.e. $(\bar{x} - s, \bar{x} + s)$
(ii) approximately 95% of the measurements lie within two standard deviations of their sample mean, i.e.
$(\bar{x} - 2s, \bar{x} + 2s)$
(iii) almost all (at least 99%) of the measurements lie within three standard deviations of their sample mean, i.e. $(\bar{x} - 3s, \bar{x} + 3s)$

Example. A data set has $\bar{x} = 75$, s = 6. The frequency distribution is known to be normal (bell shaped). Then
(i) (69, 81) contains approximately 68% of the observations
(ii) (63, 87) contains approximately 95% of the observations
(iii) (57, 93) contains at least 99% (almost all) of the observations

Comments.
(i) The empirical rule works better if the sample size is large
(ii) In your calculations always keep 6 significant digits
(iii) Approximation: $s \approx \frac{\text{range}}{4}$
(iv) Coefficient of variation (c.v.) $= \frac{s}{\bar{x}}$

4 Percentiles

Percentiles are useful when the data are badly skewed.
Let $x_1, x_2, \ldots, x_n$ be a set of measurements arranged in increasing order.
Definition. Let $0 < p < 100$. The $p$th percentile is a number x such that p% of all measurements fall below the $p$th percentile and (100 - p)% fall above it.

Example. Data: 2, 5, 8, 10, 11, 14, 17, 20.
(i) Find the 30th percentile.
Solution.
(S1) position = .3(n + 1) = .3(9) = 2.7
(S2) 30th percentile = 5 + .7(8 - 5) = 5 + 2.1 = 7.1

Special Cases.
1. Lower Quartile (25th percentile)
Example.
(S1) position = .25(n + 1) = .25(9) = 2.25
(S2) $Q_1$ = 5 + .25(8 - 5) = 5 + .75 = 5.75
2. Median (50th percentile)
Example.
(S1) position = .5(n + 1) = .5(9) = 4.5
(S2) median: $Q_2$ = 10 + .5(11 - 10) = 10.5
3. Upper Quartile (75th percentile)
Example.
(S1) position = .75(n + 1) = .75(9) = 6.75
(S2) $Q_3$ = 14 + .75(17 - 14) = 16.25

Interquartile range. $IQ = Q_3 - Q_1$
Exercise. Find the interquartile range (IQ) in the above example.

5 Sample Mean and Variance for Grouped Data

Example: (weight loss data)

Weight Loss Data
class   boundaries    mid-pt. x   freq. f   xf     $x^2 f$
1       5.0-9.0-      7           3         21     147
2       9.0-13.0-     11          5         55     605
3       13.0-17.0-    15          7         105    1,575
4       17.0-21.0-    19          6         114    2,166
5       21.0-25.0-    23          3         69     1,587
6       25.0-29.0     27          1         27     729
Totals                            25        391    6,809

Let k = number of classes.
Formulas.
$\bar{x}_g = \frac{\sum xf}{n}$
$s_g^2 = \frac{\sum x^2 f - (\sum xf)^2 / n}{n - 1}$
where the summation is over the number of classes k.
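
The grouped-data formulas can be tried out in Python. This is a minimal sketch under stated assumptions rather than part of the original notes: the helper names `frequency_table` and `grouped_mean_variance` are hypothetical, the raw values are transcribed from the weight loss example with the smallest measurement read as 5.4 (the value used in the class-width note), and the classes are the six intervals of width 4 starting at 5.0. The sketch first rebuilds the frequency column from the raw data and then applies the formulas for $\bar{x}_g$ and $s_g^2$.

```python
from math import sqrt

# Raw weight loss data (25 measurements).
# Assumption: the smallest value is 5.4, as in the class-width calculation.
weight_loss = [20.5, 19.5, 15.6, 24.1, 9.9, 15.4, 12.7, 5.4, 17.0, 28.6,
               16.9, 7.8, 23.3, 11.8, 18.4, 13.4, 14.3, 19.2, 9.2, 16.8,
               8.8, 22.1, 20.8, 12.6, 15.9]

def frequency_table(data, start, width, k):
    """Count values in k classes [start + i*width, start + (i+1)*width)."""
    freqs = [0] * k
    for x in data:
        # A value equal to the final upper boundary is placed in the last class.
        i = min(int((x - start) // width), k - 1)
        freqs[i] += 1
    midpoints = [start + width * (i + 0.5) for i in range(k)]
    return midpoints, freqs

def grouped_mean_variance(midpoints, freqs):
    """Apply the grouped-data formulas using class midpoints as x."""
    n = sum(freqs)
    sum_xf = sum(x * f for x, f in zip(midpoints, freqs))
    sum_x2f = sum(x * x * f for x, f in zip(midpoints, freqs))
    mean_g = sum_xf / n
    var_g = (sum_x2f - sum_xf ** 2 / n) / (n - 1)
    return mean_g, var_g

midpoints, freqs = frequency_table(weight_loss, start=5.0, width=4.0, k=6)
mean_g, var_g = grouped_mean_variance(midpoints, freqs)
std_g = sqrt(var_g)

print("midpoints:", midpoints)
print("frequencies:", freqs)
print("grouped mean, variance, std dev:",
      round(mean_g, 2), round(var_g, 2), round(std_g, 2))
```

Because every value in a class is replaced by its midpoint, the grouped results are approximations; comparing them with the raw-data mean and variance shows how much information the grouping loses.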
Exercise: Use the grouped data formulas to calculate the sample mean, sample variance and sample standard deviation of the grouped data in the weight loss example. Compare with the raw data results.

6 z-score

1. The sample z-score for a measurement x is
$z = \frac{x - \bar{x}}{s}$
2. The population z-score for a measurement x is
$z = \frac{x - \mu}{\sigma}$

Example. A set of grades has $\bar{x} = 75$, s = 6. Suppose your score is 85. What is your relative standing (i.e. how many standard deviations, s, above or below the mean is your score)?
Answer.
$z = \frac{x - \bar{x}}{s} = \frac{85 - 75}{6} \approx 1.67$
standard deviations above average.

Review Exercises: Data Analysis

Please show all work. No credit for a correct final answer without a valid argument. Use the formula, substitution, answer method whenever possible. Show your work graphically in all relevant questions.

1. (Fluoride Problem) The regulation board of health in a particular state specifies that the fluoride level must not exceed 1.5 ppm (parts per million). The 25 measurements below represent the fluoride level for a sample of 25 days. Although fluoride levels are measured more than once per day, these data represent the early morning readings for the 25 days sampled.

.75   .86   .84   .85   .97
.94   .89   .84   .83   .89
.88   .78   .77   .76   .82
.71   .92   1.05  .94   .83
.81   .85   .97   .93   .79

(i) Show that $\bar{x} = .8588$, $s^2 = .0065$, $s = .0803$.
(ii) Find the range, R.
(iii) Using k = 7 classes, find the width, w, of each class interval.
(iv) Locate class boundaries.
(v) Construct the frequency and relative frequency distributions for the data.

class        frequency   relative frequency
.70-.75-
.75-.80-
.80-.85-
.85-.90-
.90-.95-
.95-1.00-
1.00-1.05
Totals

(vi) Graph the frequency and relative frequency distributions and state your conclusions. (Vertical axis must be clearly labeled)

2. Given the following data set (weight loss per week)
(9, 2, 5, 8, 4, 5)
(i) Find the sample mean.
(ii) Find the sample median.
(iii) Find the sample mode.
(iv) Find the sample range.
(v) Find the mean absolute deviation (MAD).
(vi) Find the sample variance using the defining formula.
(vii) Find the sample variance using the short-cut formula.
(viii) Find the sample standard deviation.
(ix) Find the first and third quartiles, $Q_1$ and $Q_3$.
(x) Repeat (i)-(ix) for the data set (21, 24, 15, 16, 24).
Answers: $\bar{x} = 5.5$, med = 5, mode = 5, range = 7, MAD = 2, $s^2 = 6.7$, $s = 2.588$, $Q_3 = 8.25$.

3. Grades for 50 students from a previous MAT test are summarized below.

class     frequency, f   xf   $x^2 f$
40-50-    4
50-60-    6
60-70-    10
70-80-    15
80-90-    10
90-100    5
Totals

(i) Complete all entries in the table.
(ii) Graph the frequency distribution. (Vertical axis must be clearly labeled)
(iii) Find the sample mean for the grouped data.
(iv) Find the sample variance and standard deviation for the grouped data.
Answers: $\sum xf = 3610$, $\sum x^2 f = 270{,}250$, $\bar{x} = 72.2$, $s^2 = 196$, $s = 14$.

4. Refer to the raw data in the fluoride problem.
(i) Find the sample mean and standard deviation for the raw data.
(ii) Find the sample mean and standard deviation for the grouped data.
(iii) Compare the answers in (i) and (ii).
Answers: $\sum xf = 21.475$, $\sum x^2 f = 18.58$, $\bar{x}_g = .859$, $s_g = .0745$.

5. Suppose that the mean of a population is 30. Assume the standard deviation is known to be 4 and that the frequency distribution is known to be bell-shaped.
(i) Approximately what percentage of measurements fall in the interval (22, 34)?
(ii) Approximately what percentage of measurements fall in the interval $(\mu, \mu + 2\sigma)$?
(iii) Find the interval around the mean that contains 68% of measurements.
(iv) Find the interval around the mean that contains 95% of measurements.

6. Refer to the data in the fluoride problem. Suppose that the relative frequency distribution is bell-shaped. Using the empirical rule
(i) find the interval around the mean that contains 99.6% of measurements.
(ii) find the percentage of measurements that fall in the interval $(\mu + 2\sigma, \infty)$.

7. (4 pts.) Answer True or False. (Circle your choice.)
T F (i) The median is insensitive to extreme values.
T F (ii) The mean is insensitive to extreme values.
T F (iii) For a positively skewed frequency distribution, the mean is larger than the median.
T F (iv) The variance is equal to the square of the standard deviation.
T F (v) Numerical descriptive measures computed from sample measurements are called parameters.
T F (vi) The number of students attending a Mathematics lecture on any given day is a discrete variable.
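
For checking quartile answers in the review exercises, the position rule from the Percentiles section (position = p(n + 1), then interpolate between neighbouring order statistics) can be sketched in Python. This is only an illustration under stated assumptions: the function name `percentile` is hypothetical, and the handling of positions that fall before the first or after the last observation (returning the smallest or largest value) is an added convention that the notes do not specify.

```python
def percentile(data, p):
    """p-th percentile (0 < p < 100) using the position rule p/100 * (n + 1)."""
    xs = sorted(data)
    n = len(xs)
    position = p / 100 * (n + 1)   # 1-based position, may be fractional
    rank = int(position)           # integer part: rank of the lower neighbour
    fraction = position - rank     # fractional part used for interpolation

    # Assumption (not in the notes): clamp positions outside 1..n to the extremes.
    if rank < 1:
        return xs[0]
    if rank >= n:
        return xs[-1]

    lower, upper = xs[rank - 1], xs[rank]
    return lower + fraction * (upper - lower)

# Data from the worked percentile example.
example = [2, 5, 8, 10, 11, 14, 17, 20]
p30 = percentile(example, 30)
q1 = percentile(example, 25)
q2 = percentile(example, 50)
q3 = percentile(example, 75)
print(p30, q1, q2, q3)

# Data from Review Exercise 2.
exercise2 = [9, 2, 5, 8, 4, 5]
print(percentile(exercise2, 75))
```

Statistical software often uses other interpolation conventions for percentiles, so results from a library routine may differ slightly from the values produced by this rule.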
