
Simple Regression

Session 11

Review of Last Lecture


We have learnt to:

- Compute the equation of a simple regression line from a sample of data
- Compute and interpret r squared
- Test hypotheses about the overall significance of the regression model
- Test hypotheses about the intercept and slope of the regression model and interpret the results

Learning Objectives

- Residual analysis
- To compute and interpret the standard error of the estimate
- To estimate values of Y using the regression model
- To construct the confidence interval of estimated Y
- To construct the prediction interval of estimated Y
- To use residual analysis to test the assumptions of the regression model

Residual Analysis: Airline Cost Example


Number of Passengers (X)   Cost in $1,000 (Y)   Predicted Value (Ŷ)   Residual (Y - Ŷ)
61                         4.28                 4.053                  .227
63                         4.08                 4.134                 -.054
67                         4.42                 4.297                  .123
69                         4.17                 4.378                 -.208
70                         4.48                 4.419                  .061
74                         4.30                 4.582                 -.282
76                         4.82                 4.663                  .157
81                         4.70                 4.867                 -.167
86                         5.11                 5.070                  .040
91                         5.13                 5.274                 -.144
95                         5.64                 5.436                  .204
97                         5.56                 5.518                  .042

Σ(Y - Ŷ) = -.001
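
The residual column above can be reproduced with a short script. This is a minimal sketch, assuming the predicted values come from an ordinary least-squares fit to the twelve data points; the function name `fit_line` and the variable names are illustrative, not from the lecture.

```python
# Airline cost data from the table above
passengers = [61, 63, 67, 69, 70, 74, 76, 81, 86, 91, 95, 97]
cost = [4.28, 4.08, 4.42, 4.17, 4.48, 4.30, 4.82, 4.70, 5.11, 5.13, 5.64, 5.56]


def fit_line(x, y):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = fit_line(passengers, cost)
predicted = [b0 + b1 * xi for xi in passengers]
# Residual = observed Y minus predicted Y
residuals = [yi - y_hat for yi, y_hat in zip(cost, predicted)]

for xi, yi, y_hat, e in zip(passengers, cost, predicted, residuals):
    print(f"{xi:3d}  {yi:5.2f}  {y_hat:6.3f}  {e:7.3f}")
print(f"sum of residuals = {sum(residuals):.6f}")
```

Computed at full precision, least-squares residuals sum to zero apart from floating-point error; the -.001 in the table comes from rounding each predicted value to three decimals.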

Residual Analysis: Airline Cost Example


[Figure: Excel graph of residuals for the Airline Cost Example. Horizontal axis: No. of Passengers (60 to 100); vertical axis: Residuals (-0.4 to 0.4).]

Standard Error of the Estimate

- Residuals represent errors of estimation for individual points.
- With a large sample of data, residual computations become laborious.
- A more useful measurement of error is the standard error of the estimate.
- The standard error of the estimate, denoted Se, is the standard deviation of the error of the regression model.

Standard Error of the Estimate


Sum of Squares Error

$$SSE = \sum (Y - \hat{Y})^2 = \sum Y^2 - b_0 \sum Y - b_1 \sum XY$$

Standard Error of the Estimate

$$S_e = \sqrt{\frac{SSE}{n - 2}}$$

Determining SSE for the Airline Cost Example


Number of Passengers (X)   Cost in Rs.1,000 (Y)   Residual (Y - Ŷ)   (Y - Ŷ)²
61                         4.28                    .227               .05153
63                         4.08                   -.054               .00292
67                         4.42                    .123               .01513
69                         4.17                   -.208               .04326
70                         4.48                    .061               .00372
74                         4.30                   -.282               .07952
76                         4.82                    .157               .02465
81                         4.70                   -.167               .02789
86                         5.11                    .040               .00160
91                         5.13                   -.144               .02074
95                         5.64                    .204               .04162
97                         5.56                    .042               .00176

Σ(Y - Ŷ) = -.001

Σ(Y - Ŷ)² = .31434

Sum of squares of error = SSE = .31434

Determining SSE for the Airline Cost Example MINITAB Output

SSE = 0.3141

Standard Error of the Estimate for the Airline Cost Example


Sum of Squares Error

$$SSE = \sum (Y - \hat{Y})^2 = 0.31434$$

Standard Error of the Estimate

$$S_e = \sqrt{\frac{SSE}{n - 2}} = \sqrt{\frac{0.31434}{10}} = 0.1773$$

Standard Error of the Estimate for the Airline Cost Example

Se = 0.177217
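
The hand calculation gives Se = 0.1773 while the software output shows 0.177217. The sketch below recomputes SSE and Se at full precision, both from the definition and from the computational formula SSE = ΣY² - b0ΣY - b1ΣXY. It assumes an ordinary least-squares fit; the variable names are illustrative.

```python
import math

# Airline cost data from the example above
x = [61, 63, 67, 69, 70, 74, 76, 81, 86, 91, 95, 97]
y = [4.28, 4.08, 4.42, 4.17, 4.48, 4.30, 4.82, 4.70, 5.11, 5.13, 5.64, 5.56]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)
sum_y2 = sum(yi ** 2 for yi in y)

# Least-squares slope and intercept
b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
b0 = sum_y / n - b1 * sum_x / n

# SSE from the definition: sum of squared residuals
sse_definition = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# SSE from the computational formula
sse_shortcut = sum_y2 - b0 * sum_y - b1 * sum_xy

# Standard error of the estimate, with n - 2 degrees of freedom
se = math.sqrt(sse_definition / (n - 2))

print(f"SSE (definition) = {sse_definition:.5f}")
print(f"SSE (shortcut)   = {sse_shortcut:.5f}")
print(f"Se               = {se:.6f}")
```

Both routes agree with each other and with the software figures to the precision shown there. The slightly larger hand-computed SSE of .31434 results from squaring residuals that were already rounded to three decimals.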

Confidence Interval to Estimate Y: Airline Cost Example


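$$\hat{Y} \pm t_{\alpha/2,\,n-2}\, S_e \sqrt{\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS_{XX}}}$$

where: $X_0$ = a particular value of X, and

$$SS_{XX} = \sum X^2 - \frac{\left(\sum X\right)^2}{n}$$

For $X_0 = 73$ and a 95% confidence level,

$$4.5411 \pm (2.228)(0.1773)\sqrt{\frac{1}{12} + \frac{(73 - 77.5)^2}{73{,}764 - \frac{(930)^2}{12}}} = 4.5411 \pm .1220$$

$$4.4191 \le E(Y_{73}) \le 4.6631$$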

Confidence Interval to Estimate the Average Value of Y for some Values of X: Airline Cost Example
X     Ŷ ± margin         Confidence Interval
62    4.0934 ± .1876     3.9058 to 4.2810
68    4.3376 ± .1461     4.1915 to 4.4837
73    4.5411 ± .1220     4.4191 to 4.6631
85    5.0295 ± .1349     4.8946 to 5.1644
90    5.2330 ± .1656     5.0674 to 5.3986
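
The intervals in this table can be recomputed with the sketch below. It assumes an ordinary least-squares fit and hard-codes the critical value t = 2.228 (95% confidence, 10 degrees of freedom) used in the worked example; the function name `confidence_interval` is illustrative.

```python
import math

# Airline cost data from the example above
x = [61, 63, 67, 69, 70, 74, 76, 81, 86, 91, 95, 97]
y = [4.28, 4.08, 4.42, 4.17, 4.48, 4.30, 4.82, 4.70, 5.11, 5.13, 5.64, 5.56]
T_CRIT = 2.228  # t value for 95% confidence and n - 2 = 10 df, taken from the example

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
ss_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
se = math.sqrt(sse / (n - 2))


def confidence_interval(x0):
    """Interval for the average value of Y at X = x0."""
    y_hat = b0 + b1 * x0
    half_width = T_CRIT * se * math.sqrt(1 / n + (x0 - x_bar) ** 2 / ss_xx)
    return y_hat - half_width, y_hat + half_width


for x0 in (62, 68, 73, 85, 90):
    low, high = confidence_interval(x0)
    print(f"X0 = {x0}: {low:.4f} to {high:.4f}")
```

Because the script keeps full precision for the coefficients and Se, its results can differ from the table in the fourth decimal place. The intervals widen as X0 moves away from the mean of X (77.5), which is the effect of the (X0 - X̄)² term.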

Prediction Interval to Estimate Y for a given value of X


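$$\hat{Y} \pm t_{\alpha/2,\,n-2}\, S_e \sqrt{1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS_{XX}}}$$

where: $X_0$ = a particular value of X, and

$$SS_{XX} = \sum X^2 - \frac{\left(\sum X\right)^2}{n}$$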
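
As an illustration of the prediction interval formula, the sketch below evaluates it for the airline cost data at X0 = 73. It assumes an ordinary least-squares fit and reuses the critical value t = 2.228 from the confidence interval example; the function name `prediction_interval` is illustrative.

```python
import math

# Airline cost data from the example above
x = [61, 63, 67, 69, 70, 74, 76, 81, 86, 91, 95, 97]
y = [4.28, 4.08, 4.42, 4.17, 4.48, 4.30, 4.82, 4.70, 5.11, 5.13, 5.64, 5.56]
T_CRIT = 2.228  # t value for 95% confidence and n - 2 = 10 df, taken from the example

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
ss_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
se = math.sqrt(sse / (n - 2))


def prediction_interval(x0):
    """Interval for a single value of Y at X = x0."""
    y_hat = b0 + b1 * x0
    # The leading 1 under the square root accounts for the variation
    # of an individual observation around the regression line.
    half_width = T_CRIT * se * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / ss_xx)
    return y_hat - half_width, y_hat + half_width


low, high = prediction_interval(73)
print(f"95% prediction interval at X0 = 73: {low:.4f} to {high:.4f}")
```

The only difference from the confidence interval is the extra 1 under the square root, so the prediction interval for a single value of Y is always wider than the confidence interval for the average value of Y at the same X0.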

Minitab Output for Prediction Interval


[Figure: Minitab output showing the prediction interval]

Residual Analysis Test the Assumptions of the Regression Models


Residual analysis is used to test four major assumptions:

- The model is linear
- The error terms have constant variance (homoscedasticity)
- The error terms are normally distributed
- The error terms are independent

Residual Analysis Linearity of the Model


- Linearity of the relationship between each X and Y can be checked with a scatter plot of Y against each X (to get a preliminary idea).
- Linearity of the relationship between X and Y can also be checked with a residual plot.
- A nonlinear pattern in the residual plot implies that the model is not linear.

Residual Analysis Linearity/Non Linearity of the Model

Residual Analysis Linearity of the Model (Airline Cost Example)


Airline Cost Example Minitab Output

Residual Analysis Homoscedasticity


- The assumption of constant error variance (homoscedasticity) can be checked with a residual plot.
- If the error variances are not constant, the condition is called heteroscedasticity.
- In case of heteroscedasticity, either
  - the error variance is smaller for smaller values of x and larger for larger values of x, or
  - the error variance is greater for small values of x and smaller for large values of x
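
Alongside the residual plot, a rough numerical look at the spread of the residuals can help. The sketch below splits the airline cost residuals into the six smallest and six largest X values and compares their sample variances. The split point is an arbitrary illustrative choice; this is an informal check, not a formal test and not a method taken from the lecture.

```python
import statistics

# Residuals from the airline cost example, ordered by number of passengers
passengers = [61, 63, 67, 69, 70, 74, 76, 81, 86, 91, 95, 97]
residuals = [0.227, -0.054, 0.123, -0.208, 0.061, -0.282,
             0.157, -0.167, 0.040, -0.144, 0.204, 0.042]

# Split at the midpoint of the X-ordered data (illustrative choice)
half = len(residuals) // 2
low_x_resid = residuals[:half]
high_x_resid = residuals[half:]

var_low = statistics.variance(low_x_resid)    # sample variance, small X
var_high = statistics.variance(high_x_resid)  # sample variance, large X
ratio = max(var_low, var_high) / min(var_low, var_high)

print(f"variance for small X: {var_low:.4f}")
print(f"variance for large X: {var_high:.4f}")
print(f"ratio of larger to smaller variance: {ratio:.2f}")
```

A ratio far from 1 would hint at a fan-shaped residual plot. With only six points per group, the ratio is very noisy, so it should be read together with the plot and not on its own.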

Residual Analysis Heteroscedasticity

In Next Lecture
- Residual Analysis to Test the Assumptions of the Regression Model
- Unusual and Influential Data Points
