(3rd Edition)
Chap 10-1
Chapter Topics
Types of Regression Models
Determining the Simple Linear Regression Equation
Measures of Variation
Assumptions of Regression and Correlation
Residual Analysis
Measuring Autocorrelation
Inferences about the Slope
Chap 10-2
Chapter Topics
(continued)
Correlation - Measuring the Strength of the Association
Estimation of Mean Values and Prediction of Individual Values
Pitfalls in Regression and Ethical Issues
Chap 10-3
Regression analysis is used to:
Predict the values of a dependent (response) variable based on values of at least one independent (explanatory) variable
Explain the effect of the independent variables on the dependent variable
Chap 10-4
[Figure: scatter diagram showing no relationship between the variables]
Chap 10-5
Relationship Between Variables is Described by a Linear Function
The Change of One Variable Causes the Other Variable to Change
A Dependency of One Variable on the Other
Chap 10-6
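(continued)
Population regression model: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, where $\varepsilon_i$ is the random error
Population regression line (conditional mean): $\mu_{Y|X} = \beta_0 + \beta_1 X_i$
(continued)
[Figure: an observed value of Y, $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, with $\varepsilon_i$ = random error and intercept $\beta_0$]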
2003 Prentice-Hall, Inc.
$\mu_{Y|X} = \beta_0 + \beta_1 X_i$ (Conditional Mean), plotted against X
Chap 10-8
Sample regression model: $Y_i = b_0 + b_1 X_i + e_i$, where $e_i$ is the residual
Sample regression line: $\hat{Y}_i = b_0 + b_1 X_i$
(continued)
$b_0$ and $b_1$ are obtained by finding the values of $b_0$ and $b_1$ that minimize the sum of the squared residuals:
$\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} e_i^2$
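The least-squares idea can be sketched in a few lines of Python. This is only an illustration, assuming the standard closed-form solution for simple linear regression; the function names and the five data points are hypothetical and do not come from the slides.

```python
def least_squares_fit(x, y):
    """Return (b0, b1) that minimize the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of squared deviations of X, and sum of cross-products of X and Y
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / ssx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sum_squared_residuals(x, y, b0, b1):
    """Sum of e_i^2, where e_i = Y_i - (b0 + b1 * X_i)."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data, for illustration only (not the store data)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

b0, b1 = least_squares_fit(x, y)
sse = sum_squared_residuals(x, y, b0, b1)

# Any other line should give a sum of squared residuals at least as large
sse_other = sum_squared_residuals(x, y, b0 + 0.1, b1 - 0.05)
```

Comparing `sse` with `sse_other` is a quick way to see what "minimize" means here: nudging either coefficient away from the fitted values cannot lower the sum of squared residuals.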
(continued)
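[Figure: sample regression line $\hat{Y}_i = b_0 + b_1 X_i$ (intercept $b_0$, slope $b_1$, residual $e_i$) compared with the population regression line $\mu_{Y|X} = \beta_0 + \beta_1 X_i$ (intercept $\beta_0$, slope $\beta_1$, random error $\varepsilon_i$); the observed value $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ is marked; horizontal axis X]
Chap 10-11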
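$\beta_0 = \mu_{Y|X=0}$ is the mean value of Y when X = 0
$\beta_1 = \dfrac{\Delta \mu_{Y|X}}{\Delta X}$ is the change in the mean value of Y per one-unit change in X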
Chap 10-12
(continued)
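$b_0 = \hat{\mu}_{Y|X=0}$ is the estimated mean value of Y when X = 0
$b_1 = \dfrac{\Delta \hat{\mu}_{Y|X}}{\Delta X}$ is the estimated change in the mean value of Y per one-unit change in X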
Store   Annual Sales ($1000)
1       3,681
2       3,395
3       6,653
4       9,543
5       3,318
6       5,563
7       3,760
Chap 10-14
[Figure: Excel output, scatter diagram of Annual Sales ($000) against Square Feet]
Chap 10-15
Chap 10-16
[Figure: plot with Square Feet on the horizontal axis]
Chap 10-17
In Excel, use PHStat | Regression | Simple Linear Regression
Excel spreadsheet of the regression of Sales on Footage
Chap 10-19
SST = SSR + SSE
Total Variability = Explained Variability + Unexplained Variability
Chap 10-20
(continued)
SST = total sum of squares: measures the variation of the $Y_i$ values around their mean, $\bar{Y}$
SSR = regression sum of squares: explained variation attributable to the relationship between X and Y
SSE = error sum of squares: variation attributable to factors other than the relationship between X and Y
Chap 10-21
(continued)
$SSE = \sum (Y_i - \hat{Y}_i)^2$
[Figure: Y plotted against X, with the mean $\bar{Y}$ and the value $X_i$ marked]
Chap 10-22
Chap 10-23
ANOVA
             df   SS            MS          F          Significance F
Regression   1    30380456.12   30380456    81.17909   0.000281201
Residual     5    1871199.595   374239.92
Total        6    32251655.71
SSR is the Regression SS, SSE is the Residual SS, and SST is the Total SS.
Chap 10-24
Coefficient of determination: $r^2 = \dfrac{SSR}{SST}$
Measures the proportion of variation in Y that is explained by the independent variable X in the regression model
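As a quick check, the proportion of explained variation can be recomputed from the sums of squares in the ANOVA output, using the standard definition $r^2 = SSR / SST$. The variable names below are illustrative; the three numbers are the SS values printed in the ANOVA table.

```python
# Sums of squares as printed in the ANOVA table for the store example
ssr = 30380456.12   # regression (explained) sum of squares
sse = 1871199.595   # residual (unexplained) sum of squares
sst = 32251655.71   # total sum of squares

# Coefficient of determination: share of total variation that is explained
r_squared = ssr / sst

# The remaining share is attributed to factors other than X
unexplained_share = sse / sst

print(f"r^2 = {r_squared:.4f}, unexplained share = {unexplained_share:.4f}")
```

The result should agree with the R Square value in the Excel regression statistics up to rounding.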
Chap 10-25
[Figure: scatter diagrams of Y against X with fitted line $\hat{Y}_i = b_0 + b_1 X_i$; panels: $r^2 = .81$, $r = +0.9$ and $r^2 = 0$, $r = 0$]
Chap 10-26
$S_{YX} = \sqrt{\dfrac{SSE}{n-2}} = \sqrt{\dfrac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n-2}}$
The standard deviation of the variation of observations around the regression equation
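A short sketch of this calculation for the store example, assuming the usual formula $S_{YX} = \sqrt{SSE/(n-2)}$ and taking SSE and n from the ANOVA table (residual SS 1871199.595, 7 observations). Variable names are illustrative.

```python
import math

sse = 1871199.595   # residual sum of squares from the ANOVA table
n = 7               # number of stores

# Standard error of the estimate: typical distance of observations
# from the fitted regression line, in the units of Y ($000)
s_yx = math.sqrt(sse / (n - 2))

print(f"S_YX = {s_yx:.2f}")
```

The value should match the Standard Error reported in the Excel regression statistics up to rounding.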
Chap 10-27
$r^2 = .94$, $S_{YX} = 611.75$
94% of the variation in annual sales can be explained by the variability in the size of the store as measured by square footage
Chap 10-28
Normality
Y values are normally distributed for each X
Probability distribution of error is normal
Chap 10-29
[Figure: Y plotted against X, with the values $X_1$ and $X_2$ marked]
Residual Analysis
Purposes
Examine linearity
Evaluate violations of assumptions
Plot residuals vs. X and time
Chap 10-31
[Figure: residual plots (e against X) for a relationship that is not linear and for one that is linear]
Chap 10-32
[Figure: two SR plots, one showing heteroscedasticity and one showing homoscedasticity]
Chap 10-33
[Figure: Residual Plot, residuals against Square Feet]
Chap 10-34
Used when data are collected over time to detect autocorrelation (residuals in one time period are related to residuals in another period)
Measures violation of the independence assumption
$D = \dfrac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2}$
D should be close to 2.
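The statistic is simple to compute from a list of residuals ordered in time. The sketch below assumes the formula $D = \sum_{i=2}^{n}(e_i - e_{i-1})^2 / \sum_{i=1}^{n} e_i^2$; the function name and both residual series are hypothetical, chosen only to show how D moves away from 2.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by sum of squared residuals."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residual series, ordered in time (not from the slides)
alternating = [1.0, -1.0, 1.0, -1.0]      # signs flip every period
trending = [-2.0, -1.0, 0.0, 1.0, 2.0]    # neighbours stay close together

d_alternating = durbin_watson(alternating)
d_trending = durbin_watson(trending)
```

Residuals that flip sign every period push D above 2 (negative autocorrelation), while residuals that drift slowly push D toward 0 (positive autocorrelation).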
Chap 10-36
Critical values of the Durbin-Watson statistic, $\alpha = .05$
        k = 1          k = 2
n       dL     dU      dL     dU
15      1.08   1.36    .95    1.54
16      1.10   1.37    .98    1.54
Chap 10-37
[Figure: number line from 0 to 4 marked at $d_L$, $d_U$, $4 - d_U$, and $4 - d_L$]
Chap 10-38
[Figure: residuals plotted against Time; one panel with a cyclical pattern, one panel with no particular pattern (independent)]
$S_{b_1} = \dfrac{S_{YX}}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$, d.f. = n - 2
Chap 10-40
Annual Sales ($000): 3,681; 3,395; 6,653; 9,543; 3,318; 5,563; 3,760
$\hat{Y}_i = 1636.415 + 1.487 X_i$
The slope of this model is 1.487.
Does square footage affect annual sales?
Chap 10-41
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
Test Statistic: $t = \dfrac{b_1}{S_{b_1}} = 9.0099$
From Excel Printout:
               Coefficients   Standard Error   t Stat   P-value
Intercept      1636.415       451.4953         3.6244   0.01515
X Variable 1   1.487          0.1650           9.0099   0.00028
Decision: Reject $H_0$
Conclusion: There is evidence that square footage affects annual sales.
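The decision can be reproduced from the printed values. This sketch assumes the test statistic is the slope divided by its standard error, and it uses the rounded slope 1.487, the standard error 0.1650, and the critical value 2.5706 shown in the slides, so the result differs slightly from the 9.0099 that Excel computes from unrounded values.

```python
b1 = 1.487           # estimated slope (rounded, from the regression equation)
s_b1 = 0.1650        # standard error of the slope (Excel printout)
t_critical = 2.5706  # two-tailed critical value, alpha = .05, d.f. = 5

# Test statistic for H0: beta1 = 0
t_stat = b1 / s_b1

# Two-tailed decision rule: reject H0 if |t| exceeds the critical value
reject_h0 = abs(t_stat) > t_critical

print(f"t = {t_stat:.4f}, reject H0: {reject_h0}")
```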
Chap 10-42
[Figure: t distribution centered at 0 with critical values -2.5706 and 2.5706]
Confidence interval estimate for the slope: $b_1 \pm t_{n-2} S_{b_1}$
Excel Printout for Produce Stores
               Lower 95%     Upper 95%
Intercept      475.810926    2797.01853
X Variable 1   1.06249037    1.91077694
At the 95% level of confidence, the confidence interval for the slope is (1.062, 1.911), which does not include 0.
Conclusion: There is a significant linear dependency of annual sales on the size of the store.
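The interval can be rebuilt by hand from $b_1 \pm t_{n-2} S_{b_1}$. The sketch below uses the rounded slope 1.487, the standard error 0.1650, and the critical value 2.5706 from the slides, so the endpoints agree with the Excel printout only to about three decimal places.

```python
b1 = 1.487           # estimated slope (rounded)
s_b1 = 0.1650        # standard error of the slope
t_critical = 2.5706  # t value for 95% confidence with d.f. = 5

margin = t_critical * s_b1
lower = b1 - margin
upper = b1 + margin

# If 0 lies outside the interval, the slope is significantly different from 0
contains_zero = lower <= 0.0 <= upper

print(f"95% CI for the slope: ({lower:.3f}, {upper:.3f})")
```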
Chap 10-43
Test Statistic
$F = \dfrac{SSR / 1}{SSE / (n-2)}$
Chap 10-44
$H_0: \beta_1 = 0$
$(t_{n-2})^2 = F_{1,n-2}$
Chap 10-45
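$\alpha = .05$; critical value $F_{1,n-2} = F_{1,5} = 6.61$
Test Statistic: F = 81.179 (from the ANOVA table)
Conclusion: Since 81.179 > 6.61, reject $H_0$; there is evidence that square footage affects annual sales.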
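The F statistic and its link to the t test can be checked numerically. This sketch assumes $F = (SSR/1) / (SSE/(n-2))$ and takes the sums of squares from the ANOVA table, the critical value 6.61 from this slide, and the t statistic 9.0099 from the Excel printout for the slope.

```python
ssr = 30380456.12   # regression sum of squares
sse = 1871199.595   # residual sum of squares
n = 7               # number of stores

msr = ssr / 1         # mean square regression (1 d.f.)
mse = sse / (n - 2)   # mean square error (n - 2 d.f.)
f_stat = msr / mse

f_critical = 6.61     # F(1, 5) critical value at alpha = .05
reject_h0 = f_stat > f_critical

# In simple linear regression the squared t statistic equals F (up to rounding)
t_stat = 9.0099
t_squared = t_stat ** 2

print(f"F = {f_stat:.3f}, t^2 = {t_squared:.3f}, reject H0: {reject_h0}")
```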
Correlation Analysis is Used to Measure Strength of Association (Linear Relationship) Between 2 Numerical Variables
Chap 10-47
(continued)
Population Correlation Coefficient $\rho$ (Rho) is Used to Measure the Strength between the Variables
Sample Correlation Coefficient r is an Estimate of $\rho$ and is Used to Measure the Strength of the Linear Relationship in the Sample Observations
Chap 10-48
[Figure: scatter diagrams of Y against X for r = -1, r = -.6, r = 0, r = .6, and r = 1]
Chap 10-49
Features of $\rho$ and r
Unit Free
Range between -1 and 1
The Closer to -1, the Stronger the Negative Linear Relationship
The Closer to 1, the Stronger the Positive Linear Relationship
The Closer to 0, the Weaker the Linear Relationship
Chap 10-50
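Hypotheses
$H_0: \rho = 0$ (no correlation)
$H_1: \rho \neq 0$ (correlation)
Test Statistic
$t = \dfrac{r - \rho}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$
where $r = \sqrt{r^2} = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$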
Chap 10-51
Is there any evidence of linear relationship between Annual Sales of a store and its Square Footage at .05 level of significance?
Regression Statistics
Multiple R       0.9705572
R Square         0.94198129
Standard Error   611.751517
Observations     7
Chap 10-52
Critical Value(s): -2.5706 and 2.5706
[Figure: t distribution with rejection regions of area .025 in each tail, beyond -2.5706 and 2.5706]
The value of the t statistic is exactly the same as the t statistic value for the test on the slope coefficient
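This equality is easy to check with the store example. The sketch assumes the usual test statistic for $H_0: \rho = 0$, namely $t = r / \sqrt{(1 - r^2)/(n - 2)}$, and uses Multiple R = 0.9705572 and n = 7 from the regression statistics; the comparison value 9.0099 is the t Stat reported for the slope.

```python
import math

r = 0.9705572   # sample correlation coefficient (Multiple R)
n = 7           # number of observations

# t statistic for testing H0: rho = 0
t_corr = r / math.sqrt((1 - r ** 2) / (n - 2))

t_slope = 9.0099     # t statistic for the slope from the Excel printout
t_critical = 2.5706  # two-tailed critical value, alpha = .05, d.f. = 5

print(f"t (correlation) = {t_corr:.4f}, t (slope) = {t_slope}")
```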
Chap 10-53
Confidence interval estimate for $\mu_{Y|X=X_i}$:
$\hat{Y}_i \pm t_{n-2} S_{YX} \sqrt{\dfrac{1}{n} + \dfrac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$
t value from table with d.f. = n - 2
Chap 10-54
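Prediction interval estimate for an individual response $Y_i$ at a particular $X_i$:
$\hat{Y}_i \pm t_{n-2} S_{YX} \sqrt{1 + \dfrac{1}{n} + \dfrac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$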
Chap 10-55
[Figure: fitted line $\hat{Y}_i = b_0 + b_1 X_i$ plotted against X, with a given X marked]
Chap 10-56
Annual Sales ($000): 3,681; 3,395; 6,653; 9,543; 3,318; 5,563; 3,760
$\hat{Y}_i = 1636.415 + 1.487 X_i$
Chap 10-57
Confidence interval estimate for $\mu_{Y|X=X_i}$
Find the 95% confidence interval for the average annual sales for stores of 2,000 square feet
Predicted Sales: $\hat{Y}_i = 1636.415 + 1.487 X_i = 4610.45$ ($000)
$\bar{X} = 2350.29$, $S_{YX} = 611.75$, $t_{n-2} = t_5 = 2.5706$
$\hat{Y}_i \pm t_{n-2} S_{YX} \sqrt{\dfrac{1}{n} + \dfrac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}} = 4610.45 \pm 612.66$
Chap 10-58
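Find the 95% prediction interval for annual sales of one particular store of 2,000 square feet
$t_{n-2} = t_5 = 2.5706$
$\hat{Y}_i \pm t_{n-2} S_{YX} \sqrt{1 + \dfrac{1}{n} + \dfrac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}} = 4610.45 \pm 1687.68$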
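Both half-widths for a 2,000 square foot store can be approximated from numbers that appear in the slides. One assumption is needed: the sum of squares $\sum (X_i - \bar{X})^2$ is not printed, so the sketch backs it out from the relation $S_{b_1} = S_{YX} / \sqrt{\sum (X_i - \bar{X})^2}$ using the rounded values 611.75 and 0.1650. The results therefore match the slide values only approximately. Variable names are illustrative.

```python
import math

n = 7                # number of stores
x_bar = 2350.29      # mean square footage
s_yx = 611.75        # standard error of the estimate
s_b1 = 0.1650        # standard error of the slope (Excel printout)
t_critical = 2.5706  # t value for 95% confidence with d.f. = 5
y_hat = 4610.45      # predicted sales ($000) reported in the slides
x_new = 2000.0       # store size of interest (square feet)

# Assumption: recover the sum of squared X deviations from S_b1 = S_YX / sqrt(SSX)
ssx = (s_yx / s_b1) ** 2

# Term shared by both intervals: 1/n + (X_i - X_bar)^2 / SSX
h = 1 / n + (x_new - x_bar) ** 2 / ssx

# Half-width for the mean response (confidence interval)
ci_half_width = t_critical * s_yx * math.sqrt(h)

# Half-width for one individual store (prediction interval)
pi_half_width = t_critical * s_yx * math.sqrt(1 + h)

print(f"CI: {y_hat} +/- {ci_half_width:.2f}")
print(f"PI: {y_hat} +/- {pi_half_width:.2f}")
```

The prediction interval is much wider than the confidence interval because the extra 1 under the square root adds the variability of a single store around the mean response.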
Chap 10-59
Chap 10-60
Lacking an Awareness of the Assumptions Underlying Least-squares Regression
Not Knowing How to Evaluate the Assumptions
Not Knowing What the Alternatives to Least-squares Regression are if a Particular Assumption is Violated
Using a Regression Model Without Knowledge of the Subject Matter
Chap 10-61
Start with a scatter plot of X on Y to observe a possible relationship
Perform residual analysis to check the assumptions
Use a histogram, stem-and-leaf display, box-and-whisker plot, or normal probability plot of the residuals to uncover possible nonnormality
Chap 10-62
(continued)
If any assumption is violated, use alternative methods to least-squares regression (e.g., least absolute deviation regression or least median of squares regression) or alternative least-squares models (e.g., curvilinear or multiple regression)
If there is no evidence of assumption violation, then test for the significance of the regression coefficients and construct confidence intervals and prediction intervals
Chap 10-63
Chapter Summary
Introduced Types of Regression Models
Discussed Determining the Simple Linear Regression Equation
Described Measures of Variation
Addressed Assumptions of Regression and Correlation
Discussed Residual Analysis
Addressed Measuring Autocorrelation
Chap 10-64
Chapter Summary
(continued)
Described Inference about the Slope
Discussed Correlation - Measuring the Strength of the Association
Addressed Estimation of Mean Values and Prediction of Individual Values
Discussed Pitfalls in Regression and Ethical Issues
Chap 10-65