Linear regression analysis ppt

Slides covering linear regression in machine learning and multiple linear regression analysis.
Prof. Kristian Hardy, Austria, Teacher
Published: 26-07-2017
(Slides from SLDM III, Hastie & Tibshirani, March 7, 2013.)

Linearity assumption

\eta(x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p

The linear model is almost always thought of as an approximation to the truth: functions in nature are rarely linear, and true regression functions are never linear. Although it may seem overly simplistic, linear regression is extremely useful both conceptually and practically.

[Figure: a nonlinear true regression function f(X) plotted against X, with a linear approximation.]

Linear regression

Linear regression is a simple approach to supervised learning. It assumes that the dependence of Y on X_1, X_2, \ldots, X_p is linear.

Linear regression for the advertising data

Consider the advertising data shown on the next slide. Questions we might ask:
- Is there a relationship between advertising budget and sales?
- How strong is the relationship between advertising budget and sales?
- Which media contribute to sales?
- How accurately can we predict future sales?
- Is the relationship linear?
- Is there synergy among the advertising media?

Advertising data

[Figure: scatterplots of Sales against TV, Radio, and Newspaper advertising budgets.]

Simple linear regression using a single predictor X

- We assume a model
  Y = \beta_0 + \beta_1 X + \epsilon,
  where \beta_0 and \beta_1 are two unknown constants that represent the intercept and slope, also known as coefficients or parameters, and \epsilon is the error term.
- Given some estimates \hat{\beta}_0 and \hat{\beta}_1 for the model coefficients, we predict future sales using
  \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x,
  where \hat{y} indicates a prediction of Y on the basis of X = x. The hat symbol denotes an estimated value.

Estimation of the parameters by least squares

- Let \hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i be the prediction for Y based on the ith value of X. Then e_i = y_i - \hat{y}_i represents the ith residual.
- We define the residual sum of squares (RSS) as
  RSS = e_1^2 + e_2^2 + \cdots + e_n^2,
  or equivalently as
  RSS = (y_1 - \hat{\beta}_0 - \hat{\beta}_1 x_1)^2 + (y_2 - \hat{\beta}_0 - \hat{\beta}_1 x_2)^2 + \cdots + (y_n - \hat{\beta}_0 - \hat{\beta}_1 x_n)^2.
- The least squares approach chooses \hat{\beta}_0 and \hat{\beta}_1 to minimize the RSS. The minimizing values can be shown to be
  \hat{\beta}_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x},
  where \bar{y} \equiv \frac{1}{n}\sum_{i=1}^n y_i and \bar{x} \equiv \frac{1}{n}\sum_{i=1}^n x_i are the sample means.
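To make the closed-form least squares formulas above concrete, here is a minimal NumPy sketch. The arrays x and y are illustrative placeholders (a few TV-budget/sales pairs), not the full advertising dataset, and the variable names are ours; any paired numeric data would work.

```python
import numpy as np

# Illustrative placeholder data: a few (TV budget, sales) pairs.
x = np.array([230.1, 44.5, 17.2, 151.5, 180.8])
y = np.array([22.1, 10.4, 9.3, 18.5, 12.9])

x_bar, y_bar = x.mean(), y.mean()

# beta1_hat = sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)^2)
beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
# beta0_hat = y_bar - beta1_hat * x_bar
beta0_hat = y_bar - beta1_hat * x_bar

# Fitted values, residuals, and the residual sum of squares (RSS)
y_hat = beta0_hat + beta1_hat * x
residuals = y - y_hat
rss = np.sum(residuals ** 2)

print(f"beta0_hat={beta0_hat:.4f}, beta1_hat={beta1_hat:.4f}, RSS={rss:.4f}")
```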
Example: advertising data

[Figure 3.1: For the Advertising data, the least squares fit for the regression of sales onto TV. The fit is found by minimizing the sum of squared errors; each grey line segment represents an error, and the fit makes a compromise by averaging their squares.]

The least squares fit for the regression of sales onto TV gives \hat{\beta}_0 = 7.03 and \hat{\beta}_1 = 0.0475. In this case a linear fit captures the essence of the relationship, although it is somewhat deficient in the left of the plot.

Assessing the Accuracy of the Coefficient Estimates

- The standard error of an estimator reflects how it varies under repeated sampling. We have
  SE(\hat{\beta}_1)^2 = \frac{\sigma^2}{\sum_{i=1}^n (x_i - \bar{x})^2}, \qquad SE(\hat{\beta}_0)^2 = \sigma^2 \left[ \frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^n (x_i - \bar{x})^2} \right],
  where \sigma^2 = Var(\epsilon).
- These standard errors can be used to compute confidence intervals. A 95% confidence interval is defined as a range of values such that with 95% probability, the range will contain the true unknown value of the parameter. For \beta_1 it has the form
  \hat{\beta}_1 \pm 2 \cdot SE(\hat{\beta}_1).
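As a sketch of how the standard-error formulas above translate into code: \sigma^2 is unknown in practice, so the snippet below estimates it by RSS/(n-2) (the square of the residual standard error defined later in these slides) and then forms the approximate 95% interval \hat{\beta}_1 \pm 2 \cdot SE(\hat{\beta}_1). The data are the same illustrative placeholders as in the earlier sketch.

```python
import numpy as np

# Same illustrative placeholder data as in the earlier sketch.
x = np.array([230.1, 44.5, 17.2, 151.5, 180.8])
y = np.array([22.1, 10.4, 9.3, 18.5, 12.9])
n = len(x)

x_bar, y_bar = x.mean(), y.mean()
sxx = np.sum((x - x_bar) ** 2)
beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / sxx
beta0_hat = y_bar - beta1_hat * x_bar

# sigma^2 = Var(eps) is unknown; estimate it by RSS / (n - 2).
rss = np.sum((y - beta0_hat - beta1_hat * x) ** 2)
sigma2_hat = rss / (n - 2)

# SE(beta1_hat)^2 = sigma^2 / sum((x_i - x_bar)^2)
se_beta1 = np.sqrt(sigma2_hat / sxx)
# SE(beta0_hat)^2 = sigma^2 * (1/n + x_bar^2 / sum((x_i - x_bar)^2))
se_beta0 = np.sqrt(sigma2_hat * (1.0 / n + x_bar ** 2 / sxx))

# Approximate 95% confidence interval: beta1_hat +/- 2 * SE(beta1_hat)
ci_low, ci_high = beta1_hat - 2 * se_beta1, beta1_hat + 2 * se_beta1
print(f"SE(beta0)={se_beta0:.4f}, SE(beta1)={se_beta1:.4f}, "
      f"95% CI for beta1: [{ci_low:.4f}, {ci_high:.4f}]")
```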
Confidence intervals continued

That is, there is approximately a 95% chance that the interval
\left[ \hat{\beta}_1 - 2 \cdot SE(\hat{\beta}_1), \; \hat{\beta}_1 + 2 \cdot SE(\hat{\beta}_1) \right]
will contain the true value of \beta_1 (under a scenario where we got repeated samples like the present sample). For the advertising data, the 95% confidence interval for \beta_1 is [0.042, 0.053].

Hypothesis testing

- Standard errors can also be used to perform hypothesis tests on the coefficients. The most common hypothesis test involves testing the null hypothesis
  H_0: There is no relationship between X and Y
  versus the alternative hypothesis
  H_A: There is some relationship between X and Y.
- Mathematically, this corresponds to testing
  H_0: \beta_1 = 0 \quad versus \quad H_A: \beta_1 \neq 0,
  since if \beta_1 = 0 then the model reduces to Y = \beta_0 + \epsilon, and X is not associated with Y.

Hypothesis testing continued

- To test the null hypothesis, we compute a t-statistic, given by
  t = \frac{\hat{\beta}_1 - 0}{SE(\hat{\beta}_1)}.
- This will have a t-distribution with n - 2 degrees of freedom, assuming \beta_1 = 0.
- Using statistical software, it is easy to compute the probability of observing any value equal to |t| or larger. We call this probability the p-value.

Results for the advertising data

            Coefficient   Std. Error   t-statistic   p-value
Intercept   7.0325        0.4578       15.36         < 0.0001
TV          0.0475        0.0027       17.67         < 0.0001

Assessing the Overall Accuracy of the Model

- We compute the Residual Standard Error
  RSE = \sqrt{\frac{1}{n-2} RSS} = \sqrt{\frac{1}{n-2} \sum_{i=1}^n (y_i - \hat{y}_i)^2},
  where the residual sum of squares is RSS = \sum_{i=1}^n (y_i - \hat{y}_i)^2.
- R-squared, or fraction of variance explained, is
  R^2 = \frac{TSS - RSS}{TSS} = 1 - \frac{RSS}{TSS},
  where TSS = \sum_{i=1}^n (y_i - \bar{y})^2 is the total sum of squares.
- It can be shown that in this simple linear regression setting R^2 = r^2, where r is the correlation between X and Y:
  r = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^n (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^n (y_i - \bar{y})^2}}.
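Here is a rough sketch of the t-test described above, using SciPy's t distribution for the two-sided p-value. The data and variable names are illustrative placeholders; with so few points the numbers will not match the advertising-data table.

```python
import numpy as np
from scipy.stats import t as t_dist

# Illustrative placeholder data, as in the earlier sketches.
x = np.array([230.1, 44.5, 17.2, 151.5, 180.8])
y = np.array([22.1, 10.4, 9.3, 18.5, 12.9])
n = len(x)

x_bar, y_bar = x.mean(), y.mean()
sxx = np.sum((x - x_bar) ** 2)
beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / sxx
beta0_hat = y_bar - beta1_hat * x_bar
rss = np.sum((y - beta0_hat - beta1_hat * x) ** 2)
se_beta1 = np.sqrt(rss / (n - 2) / sxx)

# t = (beta1_hat - 0) / SE(beta1_hat); under H0 it follows t_{n-2}.
t_stat = (beta1_hat - 0) / se_beta1
# Two-sided p-value: P(|T| >= |t|) for T ~ t with n - 2 degrees of freedom.
p_value = 2 * t_dist.sf(abs(t_stat), df=n - 2)
print(f"t = {t_stat:.2f}, p-value = {p_value:.4g}")
```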

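Finally, a short sketch of the overall-accuracy measures defined above (RSE, R^2, and the R^2 = r^2 identity for simple linear regression), again on illustrative placeholder data rather than the full advertising dataset.

```python
import numpy as np

# Illustrative placeholder data, as in the earlier sketches.
x = np.array([230.1, 44.5, 17.2, 151.5, 180.8])
y = np.array([22.1, 10.4, 9.3, 18.5, 12.9])
n = len(x)

x_bar, y_bar = x.mean(), y.mean()
beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
beta0_hat = y_bar - beta1_hat * x_bar
y_hat = beta0_hat + beta1_hat * x

rss = np.sum((y - y_hat) ** 2)   # residual sum of squares
tss = np.sum((y - y_bar) ** 2)   # total sum of squares

rse = np.sqrt(rss / (n - 2))     # residual standard error
r_squared = 1 - rss / tss        # fraction of variance explained

# In simple linear regression, R^2 equals the squared correlation r^2.
r = np.corrcoef(x, y)[0, 1]
print(f"RSE={rse:.4f}, R^2={r_squared:.4f}, r^2={r**2:.4f}")
```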