Simple Linear Regression

Learning Objectives
1. Describe the Linear Regression Model
2. State the Regression Modeling Steps
3. Explain Ordinary Least Squares
4. Compute Regression Coefficients
5. Predict Response Variable
6. Interpret Computer Output

Models
1. Representation of Some Phenomenon
2. Mathematical Model Is a Mathematical
Expression of Some Phenomenon
3. Often Describe Relationships between
Variables
4. Types
   Deterministic Models
   Probabilistic Models

Deterministic Models
1. Hypothesize Exact Relationships
2. Suitable When Prediction Error Is Negligible
3. Example: Force Is Exactly Mass Times Acceleration

F = ma

1984-1994 T/Maker Co.

Probabilistic Models
1. Hypothesize 2 Components
   Deterministic
   Random Error
2. Example: Sales Volume Is 10 Times Advertising Spending + Random Error

   $Y = 10X + \varepsilon$

   Random Error May Be Due to Factors Other Than Advertising
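The split into a deterministic component and a random error can be sketched in a few lines. This is only an illustration: the slide does not specify an error distribution here, so the normal error with standard deviation 2, and the names `simulate_sales`, `noise_sd`, and `seed`, are assumptions.

```python
import random

def simulate_sales(advertising, noise_sd=2.0, seed=0):
    """Return (deterministic, observed) sales for Y = 10X + random error.

    The normal error with standard deviation noise_sd is an assumption
    made for illustration; the slide only says "random error".
    """
    rng = random.Random(seed)
    deterministic = [10 * x for x in advertising]
    observed = [d + rng.gauss(0.0, noise_sd) for d in deterministic]
    return deterministic, observed

advertising = [1, 2, 3, 4, 5]
deterministic, observed = simulate_sales(advertising)
for x, d, y in zip(advertising, deterministic, observed):
    # The error is whatever part of observed sales 10X does not account for.
    print(f"X={x}: deterministic={d}, observed={y:.2f}, error={y - d:+.2f}")
```

Rerunning with a different `seed` changes the errors but not the deterministic part, which is the distinction the slide draws.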

Types of Probabilistic Models
[Diagram: Probabilistic Models branch into Regression Models, Correlation Models, and Other Models]

Regression Models
1. Answer "What Is the Relationship Between the Variables?"
2. Equation Used
   1 Numerical Dependent (Response) Variable
      What Is to Be Predicted
   1 or More Numerical or Categorical Independent (Explanatory) Variables
3. Used Mainly for Prediction & Estimation

Regression Modeling Steps
1. Hypothesize Deterministic Component
2. Estimate Unknown Model Parameters
3. Specify Probability Distribution of Random Error Term
   Estimate Standard Deviation of Error
4. Evaluate Model
5. Use Model for Prediction & Estimation

Model Specification


Specifying the Model
1. Define Variables
   Conceptual (e.g., Advertising, Price)
   Empirical (e.g., List Price, Regular Price)
   Measurement (e.g., $, Units)
2. Hypothesize Nature of Relationship
   Expected Effects (i.e., Coefficients' Signs)
   Functional Form (Linear or Non-Linear)
   Interactions

Model Specification Is Based on Theory
1. Theory of Field (e.g., Sociology)
2. Mathematical Theory
3. Previous Research
4. Common Sense

Thinking Challenge: Which Is More Logical?
[Figure: four candidate plots of Sales (vertical axis) against Advertising (horizontal axis)]

Types of Regression Models
[Diagram: Regression Models split into Simple (1 Explanatory Variable) and Multiple (2+ Explanatory Variables); each of these is either Linear or Non-Linear]

Linear Regression Model


Linear Equations
Y = mX + b
m = Slope = Change in Y / Change in X
b = Y-intercept
[Figure: plot of the line Y = mX + b; clip art captioned "High School Teacher"]

Linear Regression Model
1. Relationship Between Variables Is a Linear Function

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

   $Y_i$: Dependent (Response) Variable
   $\beta_0$: Population Y-Intercept
   $\beta_1$: Population Slope
   $X_i$: Independent (Explanatory) Variable
   $\varepsilon_i$: Random Error

Population & Sample Regression Models
[Figure: the Population has an Unknown Relationship, $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$; a Random Sample drawn from it gives the estimated relationship $Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\varepsilon}_i$]

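Population Linear Regression Model

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
$E(Y) = \beta_0 + \beta_1 X_i$
$\varepsilon_i$ = Random error

[Figure: Y against X, showing observed values scattered around the population line $E(Y)$, with $\varepsilon_i$ the vertical distance from an observed value to the line]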

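Sample Linear Regression Model

$Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\varepsilon}_i$
$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$
$\hat{\varepsilon}_i$ = Random error

[Figure: Y against X, showing observed values and an unsampled observation around the fitted line $\hat{Y}_i$]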

Estimating Parameters: Least Squares Method


Scattergram
1. Plot of All ($X_i$, $Y_i$) Pairs
2. Suggests How Well Model Will Fit
[Figure: scattergram of Y against X, both axes marked from 0 to 60]

Thinking Challenge
How would you draw a line through the points? How do you determine which line fits best?
[Figure: scattergram of Y against X, both axes marked from 0 to 60, shown with a series of candidate lines]


Least Squares
1. Best Fit Means Differences Between Actual Y Values & Predicted Y Values Are a Minimum
   But Positive Differences Off-Set Negative

   $\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} \hat{\varepsilon}_i^2$

2. LS Minimizes the Sum of the Squared Differences (SSE)
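The point that positive differences offset negative ones can be checked numerically. The sketch below uses five illustrative points and two candidate lines; the helper names `residuals` and `sse` are assumptions for this example, not part of the slides.

```python
def residuals(xs, ys, b0, b1):
    """Differences between actual Y and the Y predicted by b0 + b1 * X."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

def sse(xs, ys, b0, b1):
    """Sum of squared differences for the candidate line."""
    return sum(e ** 2 for e in residuals(xs, ys, b0, b1))

# Illustrative data and two candidate lines (intercept, slope).
xs = [1, 2, 3, 4, 5]
ys = [1, 1, 2, 2, 4]
candidates = {"flat line at mean Y": (2.0, 0.0), "sloped line": (-0.1, 0.7)}

for name, (b0, b1) in candidates.items():
    raw_sum = sum(residuals(xs, ys, b0, b1))
    # Raw differences sum to (about) zero for both lines,
    # so only the squared version can tell them apart.
    print(f"{name}: sum of differences={raw_sum:.4f}, SSE={sse(xs, ys, b0, b1):.4f}")
```

Both lines have differences that cancel to zero, yet their SSE values differ, which is why the squared criterion is used to pick the best-fitting line.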

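Least Squares Graphically

LS minimizes $\sum_{i=1}^{n} \hat{\varepsilon}_i^2 = \hat{\varepsilon}_1^2 + \hat{\varepsilon}_2^2 + \hat{\varepsilon}_3^2 + \hat{\varepsilon}_4^2$

[Figure: four points around the fitted line $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$, with residuals $\hat{\varepsilon}_1$ to $\hat{\varepsilon}_4$ drawn as vertical distances; for example, $Y_2 = \hat{\beta}_0 + \hat{\beta}_1 X_2 + \hat{\varepsilon}_2$]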

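Coefficient Equations

Prediction Equation

$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$

Sample Slope

$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} X_i Y_i - \frac{\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n}}{\sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}}$

Sample Y-intercept

$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$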

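Computation Table

$X_i$        $Y_i$        $X_i^2$        $Y_i^2$        $X_i Y_i$
$X_1$        $Y_1$        $X_1^2$        $Y_1^2$        $X_1 Y_1$
$X_2$        $Y_2$        $X_2^2$        $Y_2^2$        $X_2 Y_2$
:            :            :              :              :
$X_n$        $Y_n$        $X_n^2$        $Y_n^2$        $X_n Y_n$
$\sum X_i$   $\sum Y_i$   $\sum X_i^2$   $\sum Y_i^2$   $\sum X_i Y_i$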

Interpretation of Coefficients
1. Slope ($\hat{\beta}_1$)
   Estimated Y Changes by $\hat{\beta}_1$ for Each 1 Unit Increase in X
   If $\hat{\beta}_1 = 2$, then Sales (Y) Is Expected to Increase by 2 for Each 1 Unit Increase in Advertising (X)
2. Y-Intercept ($\hat{\beta}_0$)
   Average Value of Y When X = 0
   If $\hat{\beta}_0 = 4$, then Average Sales (Y) Is Expected to Be 4 When Advertising (X) Is 0

Parameter Estimation Example
You're a marketing analyst for Hasbro Toys. You gather the following data:

Ad $    Sales (Units)
1       1
2       1
3       2
4       2
5       4

What is the relationship between sales & advertising?

Scattergram: Sales vs. Advertising
[Figure: scattergram of Sales (vertical axis, 0 to 4) against Advertising (horizontal axis)]

Parameter Estimation Solution Table

$X_i$    $Y_i$    $X_i^2$    $Y_i^2$    $X_i Y_i$
1        1        1          1          1
2        1        4          1          2
3        2        9          4          6
4        2        16         4          8
5        4        25         16         20
15       10       55         26         37

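Parameter Estimation Solution

$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} X_i Y_i - \frac{\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n}}{\sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}} = \frac{37 - \frac{(15)(10)}{5}}{55 - \frac{(15)^2}{5}} = 0.70$

$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} = 2 - (0.70)(3) = -0.10$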
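The hand calculation above can be reproduced with a short function that follows the computation-table sums. This is a sketch; the function name `least_squares` and the variable names are illustrative, not taken from the slides.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) from the sums used in the computation table."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Sample slope: (sum XY - (sum X)(sum Y)/n) / (sum X^2 - (sum X)^2/n)
    slope = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    # Sample Y-intercept: mean of Y minus slope times mean of X
    intercept = sum_y / n - slope * sum_x / n
    return intercept, slope

ad = [1, 2, 3, 4, 5]        # Ad $ from the Hasbro example
sales = [1, 1, 2, 2, 4]     # Sales (Units) from the Hasbro example
b0, b1 = least_squares(ad, sales)
print(f"slope = {b1:.2f}, intercept = {b0:.2f}")
```

For the Hasbro data this gives a slope of 0.70 and an intercept of -0.10 after rounding, in agreement with the solution slide.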

Coefficient Interpretation Solution
1. Slope ($\hat{\beta}_1$)
   Sales Volume (Y) Is Expected to Increase by .7 Units for Each $1 Increase in Advertising (X)
2. Y-Intercept ($\hat{\beta}_0$)
   Average Value of Sales Volume (Y) Is -.10 Units When Advertising (X) Is 0
   Difficult to Explain to Marketing Manager
   Expect Some Sales Without Advertising

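Parameter Estimation Computer Output

Parameter Estimates
                  Parameter    Standard    T for H0:
Variable    DF    Estimate     Error       Param=0     Prob>|T|
INTERCEP    1     -0.1000      0.6350      -0.157      0.8849
ADVERT      1      0.7000      0.1914       3.656      0.0354

The Parameter Estimate column holds $\hat{\beta}_k$: $\hat{\beta}_0$ in the INTERCEP row and $\hat{\beta}_1$ in the ADVERT row.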

Parameter Estimation Thinking Challenge
You're an economist for the county cooperative. You gather the following data:

Fertilizer (lb.)    Yield (lb.)
4                   3.0
6                   5.5
10                  6.5
12                  9.0

What is the relationship between fertilizer & crop yield?

Scattergram: Crop Yield vs. Fertilizer*
[Figure: scattergram of Yield (lb.) (vertical axis, 0 to 10) against Fertilizer (lb.) (horizontal axis, 0 to 15)]

Parameter Estimation Solution Table*

$X_i$    $Y_i$    $X_i^2$    $Y_i^2$    $X_i Y_i$
4        3.0      16         9.00       12
6        5.5      36         30.25      33
10       6.5      100        42.25      65
12       9.0      144        81.00      108
32       24.0     296        162.50     218

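Parameter Estimation Solution*

$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} X_i Y_i - \frac{\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n}}{\sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}} = \frac{218 - \frac{(32)(24)}{4}}{296 - \frac{(32)^2}{4}} = 0.65$

$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} = 6 - (0.65)(8) = 0.80$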

Coefficient Interpretation Solution*
1. Slope ($\hat{\beta}_1$)
   Crop Yield (Y) Is Expected to Increase by .65 lb. for Each 1 lb. Increase in Fertilizer (X)
2. Y-Intercept ($\hat{\beta}_0$)
   Average Crop Yield (Y) Is Expected to Be 0.8 lb. When No Fertilizer (X) Is Used

Probability Distribution of Random Error


Linear Regression Assumptions
1. Mean of Probability Distribution of Error Is 0
2. Probability Distribution of Error Has Constant Variance
3. Probability Distribution of Error Is Normal
4. Errors Are Independent

Error Probability Distribution
[Figure: probability density $f(\varepsilon)$ of the error drawn over the X-Y plane at different values of X]

Random Error Variation
1. Variation of Actual Y from Predicted Y
2. Measured by Standard Error of Regression Model
   Sample Standard Deviation of $\hat{\varepsilon}$, s
3. Affects Several Factors
   Parameter Significance
   Prediction Accuracy
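As a rough illustration of how s is obtained, the sketch below computes it for the Hasbro advertising data with the fitted coefficients -0.1 and 0.7. The slides do not print the formula at this point; the divisor n - 2 is the usual choice for simple linear regression and is an assumption here, as is the function name `standard_error`.

```python
import math

def standard_error(xs, ys, b0, b1):
    """Sample standard deviation of the residuals, s = sqrt(SSE / (n - 2)).

    The n - 2 divisor (two estimated coefficients) is the conventional
    choice for simple linear regression, assumed here.
    """
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (len(xs) - 2))

ad = [1, 2, 3, 4, 5]
sales = [1, 1, 2, 2, 4]
s = standard_error(ad, sales, b0=-0.1, b1=0.7)
print(f"s = {s:.5f}")
```

The result matches the value s = .60553 that the later Hasbro examples quote.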

Measures of Variation in Regression
1. Total Sum of Squares ($SS_{yy}$)
   Measures Variation of Observed $Y_i$ Around the Mean $\bar{Y}$
2. Explained Variation (SSR)
   Variation Due to Relationship Between X & Y
3. Unexplained Variation (SSE)
   Variation Due to Other Factors

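Variation Measures
Total sum of squares: $(Y_i - \bar{Y})^2$
Unexplained sum of squares: $(Y_i - \hat{Y}_i)^2$
Explained sum of squares: $(\hat{Y}_i - \bar{Y})^2$
[Figure: an observed $Y_i$ at $X_i$, the fitted line $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$, and the mean $\bar{Y}$, with the three deviations marked]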

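Coefficient of Determination
1. Proportion of Variation Explained by Relationship Between X & Y

$0 \le r^2 \le 1$

$r^2 = \frac{\text{Explained Variation}}{\text{Total Variation}} = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2 - \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}$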

Coefficient of Determination Examples
[Figure: four scatterplots of Y against X, labeled $r^2 = 1$, $r^2 = 1$, $r^2 = .8$, and $r^2 = 0$]

Coefficient of Determination Example
You're a marketing analyst for Hasbro Toys. You find $\hat{\beta}_0 = -0.1$ & $\hat{\beta}_1 = 0.7$.

Ad $    Sales (Units)
1       1
2       1
3       2
4       2
5       4

Interpret a coefficient of determination of 0.8167.

Computer Output

Root MSE     0.60553     R-square    0.8167
Dep Mean     2.00000     Adj R-sq    0.7556
C.V.        30.27650

R-square is $r^2$.
Adj R-sq is $r^2$ adjusted for number of explanatory variables & sample size.
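The figures in this output can be reproduced from the Hasbro data as a check. The sketch below is hedged: the adjusted r-squared formula `1 - (1 - r2) * (n - 1) / (n - k - 1)` and the C.V. definition (100 times Root MSE over the mean of Y) are the conventional ones and are assumptions, since the slides show only the resulting numbers.

```python
import math

ad = [1, 2, 3, 4, 5]
sales = [1, 1, 2, 2, 4]
b0, b1 = -0.1, 0.7          # fitted coefficients from the Hasbro example
n, k = len(ad), 1           # k = number of explanatory variables

mean_y = sum(sales) / n
ss_yy = sum((y - mean_y) ** 2 for y in sales)                    # total variation
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(ad, sales))   # unexplained

r2 = (ss_yy - sse) / ss_yy                       # explained / total
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)    # assumed conventional adjustment
root_mse = math.sqrt(sse / (n - 2))
cv = 100 * root_mse / mean_y                     # assumed definition of C.V.

print(f"R-square={r2:.4f}  Adj R-sq={adj_r2:.4f}  Root MSE={root_mse:.5f}  C.V.={cv:.4f}")
```

An r-squared of 0.8167 means that about 82% of the variation in sales around its mean is explained by the linear relationship with advertising.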

Evaluating the Model
Testing for Significance


Test of Slope Coefficient
1. Shows If There Is a Linear Relationship Between X & Y
2. Involves Population Slope $\beta_1$
3. Hypotheses
   $H_0$: $\beta_1 = 0$ (No Linear Relationship)
   $H_a$: $\beta_1 \ne 0$ (Linear Relationship)
4. Theoretical Basis Is Sampling Distribution of Slope

Sampling Distribution of Sample Slopes
[Figure: sample regression lines (Sample 1 Line, Sample 2 Line) drawn around the Population Line]

All Possible Sample Slopes
Sample 1: 2.5
Sample 2: 1.6
Sample 3: 1.8
Sample 4: 2.1
:
Very large number of sample slopes

[Figure: sampling distribution of $\hat{\beta}_1$, with standard error $S_{\hat{\beta}_1}$]

Slope Coefficient Test Statistic

$t_{n-2} = \frac{\hat{\beta}_1 - \beta_1}{S_{\hat{\beta}_1}}$

where

$S_{\hat{\beta}_1} = \frac{S}{\sqrt{\sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}}}$

Test of Slope Coefficient Example
You're a marketing analyst for Hasbro Toys. You find b0 = -.1, b1 = .7 & s = .60553.

Ad $    Sales (Units)
1       1
2       1
3       2
4       2
5       4

Is the relationship significant at the .05 level?

Solution Table

$X_i$    $Y_i$    $X_i^2$    $Y_i^2$    $X_i Y_i$
1        1        1          1          1
2        1        4          1          2
3        2        9          4          6
4        2        16         4          8
5        4        25         16         20
15       10       55         26         37

Test of Slope Parameter Solution

$H_0$: $\beta_1 = 0$
$H_a$: $\beta_1 \ne 0$
$\alpha = .05$
df = 5 - 2 = 3
Critical Value(s): -3.1824 and 3.1824 (reject $H_0$ in either tail, .025 in each)

Test Statistic:

$t = \frac{\hat{\beta}_1 - \beta_1}{S_{\hat{\beta}_1}} = \frac{0.70 - 0}{0.1915} = 3.656$

Decision:
Reject at $\alpha = .05$

Conclusion:
There is evidence of a relationship

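Test Statistic Solution

$t_{n-2} = \frac{\hat{\beta}_1 - \beta_1}{S_{\hat{\beta}_1}} = \frac{0.70 - 0}{0.1915} = 3.656$

where

$S_{\hat{\beta}_1} = \frac{S}{\sqrt{\sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}}} = \frac{0.60553}{\sqrt{55 - \frac{(15)^2}{5}}} = 0.1915$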
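The same test can be traced step by step in code. This is a sketch using the values quoted on the slides (s = 0.60553, slope 0.70, critical value 3.1824 for 3 degrees of freedom); the variable names are illustrative, and the critical value is copied from the slide rather than computed.

```python
import math

ad = [1, 2, 3, 4, 5]
s = 0.60553        # standard error of the regression, from the slide
b1 = 0.70          # sample slope, from the slide
t_crit = 3.1824    # two-tailed critical value for alpha = .05, df = 3 (from the slide)

n = len(ad)
# Denominator term: sum of X^2 minus (sum of X)^2 / n
ss_xx = sum(x * x for x in ad) - sum(ad) ** 2 / n
se_b1 = s / math.sqrt(ss_xx)      # standard error of the slope
t_stat = (b1 - 0) / se_b1         # hypothesized slope under H0 is 0

reject_h0 = abs(t_stat) > t_crit
print(f"S_b1 = {se_b1:.4f}, t = {t_stat:.3f}, reject H0: {reject_h0}")
```

Because the test statistic exceeds the critical value, the sketch reaches the same decision as the slide: reject the hypothesis of no linear relationship at the .05 level.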

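Test of Slope Parameter Computer Output

Parameter Estimates
                  Parameter    Standard    T for H0:
Variable    DF    Estimate     Error       Param=0     Prob>|T|
INTERCEP    1     -0.1000      0.6350      -0.157      0.8849
ADVERT      1      0.7000      0.1914       3.656      0.0354

Parameter Estimate is $\hat{\beta}_k$; Standard Error is $S_{\hat{\beta}_k}$; T for H0 is $t = \hat{\beta}_k / S_{\hat{\beta}_k}$; Prob>|T| is the P-Value.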

Using the Model for Prediction & Estimation


Prediction With Regression Models
1. Types of Predictions
   Point Estimates
   Interval Estimates
2. What Is Predicted
   Population Mean Response E(Y) for Given X
      Point on Population Regression Line
   Individual Response ($Y_i$) for Given X

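What Is Predicted
[Figure: Y against X at the value $X_P$, marking an individual Y, the mean Y, E(Y), on the line $E(Y) = \beta_0 + \beta_1 X$, and the prediction $\hat{Y}$ on the line $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$]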

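Confidence Interval Estimate of Mean Y

$\hat{Y} - t_{n-2,\,\alpha/2} \, S_{\hat{Y}} \le E(Y) \le \hat{Y} + t_{n-2,\,\alpha/2} \, S_{\hat{Y}}$

where

$S_{\hat{Y}} = S \sqrt{\frac{1}{n} + \frac{(X_p - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$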

Factors Affecting Interval Width
1. Level of Confidence ($1 - \alpha$)
   Width Increases as Confidence Increases
2. Data Dispersion (s)
   Width Increases as Variation Increases
3. Sample Size
   Width Decreases as Sample Size Increases
4. Distance of $X_p$ from Mean $\bar{X}$
   Width Increases as Distance Increases

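Why Distance from Mean?
[Figure: Y against X with the mean $\bar{Y}$ marked and two X values, $X_1$ and $X_2$; the fitted lines show greater dispersion at $X_2$ than at $X_1$]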

Confidence Interval Estimate Example
You're a marketing analyst for Hasbro Toys. You find b0 = -.1, b1 = .7 & s = .60553.

Ad $    Sales (Units)
1       1
2       1
3       2
4       2
5       4

Estimate the mean sales when advertising is $4 at the .05 level.

Solution Table

$X_i$    $Y_i$    $X_i^2$    $Y_i^2$    $X_i Y_i$
1        1        1          1          1
2        1        4          1          2
3        2        9          4          6
4        2        16         4          8
5        4        25         16         20
15       10       55         26         37

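Confidence Interval Estimate Solution

$\hat{Y} - t_{n-2,\,\alpha/2} \, S_{\hat{Y}} \le E(Y) \le \hat{Y} + t_{n-2,\,\alpha/2} \, S_{\hat{Y}}$

$\hat{Y} = -0.1 + (0.7)(4) = 2.7$ (4 is the X to be predicted)

$S_{\hat{Y}} = .60553 \sqrt{\frac{1}{5} + \frac{(4 - 3)^2}{10}} = 0.3316$

$2.7 - (3.1824)(0.3316) \le E(Y) \le 2.7 + (3.1824)(0.3316)$

$1.6445 \le E(Y) \le 3.7555$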

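Prediction Interval of Individual Response

$\hat{Y} - t_{n-2,\,\alpha/2} \, S_{(Y - \hat{Y})} \le Y_P \le \hat{Y} + t_{n-2,\,\alpha/2} \, S_{(Y - \hat{Y})}$

where

$S_{(Y - \hat{Y})} = S \sqrt{1 + \frac{1}{n} + \frac{(X_p - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$

Note! The extra 1 under the square root is what distinguishes this from the interval for the mean.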
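To see the effect of the extra term, the sketch below computes both the confidence interval for mean Y and the prediction interval for an individual Y, using the Hasbro figures from the confidence interval example (b0 = -.1, b1 = .7, s = .60553, X = 4, t = 3.1824). The function name `interval_estimates` and its parameters are illustrative assumptions.

```python
import math

def interval_estimates(xs, b0, b1, s, x_p, t_crit):
    """Return (y_hat, confidence interval for mean Y, prediction interval)."""
    n = len(xs)
    mean_x = sum(xs) / n
    ss_xx = sum((x - mean_x) ** 2 for x in xs)
    y_hat = b0 + b1 * x_p
    # Shared term: 1/n plus squared distance of x_p from the mean of X
    distance_term = 1 / n + (x_p - mean_x) ** 2 / ss_xx
    se_mean = s * math.sqrt(distance_term)            # for E(Y)
    se_individual = s * math.sqrt(1 + distance_term)  # extra 1 for an individual Y
    ci = (y_hat - t_crit * se_mean, y_hat + t_crit * se_mean)
    pi = (y_hat - t_crit * se_individual, y_hat + t_crit * se_individual)
    return y_hat, ci, pi

ad = [1, 2, 3, 4, 5]
y_hat, ci, pi = interval_estimates(ad, b0=-0.1, b1=0.7, s=0.60553, x_p=4, t_crit=3.1824)
print(f"Y-hat = {y_hat:.1f}")
print(f"Confidence interval for mean Y: {ci[0]:.4f} to {ci[1]:.4f}")
print(f"Prediction interval for individual Y: {pi[0]:.4f} to {pi[1]:.4f}")
```

The prediction interval is always the wider of the two, because an individual response varies around its mean in addition to the uncertainty in estimating that mean.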

Why the Extra S?
[Figure: Y against X at the value $X_P$, marking the individual Y we're trying to predict, the Expected (Mean) Y on the line $E(Y) = \beta_0 + \beta_1 X$, and the Prediction, $\hat{Y}$]

Interval Estimate Computer Output

       Dep Var    Pred     Std Err    Low95%    Upp95%    Low95%     Upp95%
Obs    SALES      Value    Predict    Mean      Mean      Predict    Predict
1      1.000      0.600    0.469      -0.892    2.092     -1.837     3.037
2      1.000      1.300    0.332       0.244    2.355     -0.897     3.497
3      2.000      2.000    0.271       1.138    2.861     -0.111     4.111
4      2.000      2.700    0.332       1.644    3.755      0.502     4.897
5      4.000      3.400    0.469       1.907    4.892      0.962     5.837

Obs 4 is the Predicted Y when X = 4. Std Err Predict is $S_{\hat{Y}}$. Low95% Mean and Upp95% Mean give the Confidence Interval; Low95% Predict and Upp95% Predict give the Prediction Interval.

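Hyperbolic Interval Bands
[Figure: Y against X with interval bands around the fitted line that are narrowest at $\bar{X}$ and widen toward $X_P$]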

Correlation Models


Correlation Models
1. Answer "How Strong Is the Linear Relationship Between 2 Variables?"
2. Coefficient of Correlation Used
   Population Correlation Coefficient Denoted $\rho$ (Rho)
   Values Range from -1 to +1
   Measures Degree of Association
3. Used Mainly for Understanding

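Sample Coefficient of Correlation
1. Pearson Product Moment Coefficient of Correlation, r:

$r = \sqrt{\text{Coefficient of Determination}}$ (taking the sign of the slope)

$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \, \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$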
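The link between r and the coefficient of determination can be checked on the Hasbro data. The following is a minimal sketch of the Pearson formula; the function name `pearson_r` is an illustrative assumption.

```python
import math

def pearson_r(xs, ys):
    """Pearson product moment coefficient of correlation."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_xx = sum((x - mean_x) ** 2 for x in xs)
    ss_yy = sum((y - mean_y) ** 2 for y in ys)
    return ss_xy / math.sqrt(ss_xx * ss_yy)

ad = [1, 2, 3, 4, 5]
sales = [1, 1, 2, 2, 4]
r = pearson_r(ad, sales)
print(f"r = {r:.4f}, r squared = {r ** 2:.4f}")
```

Squaring r for this data recovers the coefficient of determination of about 0.8167 reported in the regression output, and r is positive because the fitted slope is positive.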

Coefficient of Correlation Values
-1.0: Perfect Negative Correlation
 0: No Correlation
+1.0: Perfect Positive Correlation
From 0 toward -1.0: Increasing degree of negative correlation
From 0 toward +1.0: Increasing degree of positive correlation
[Figure: number line marked -1.0, -.5, 0, +.5, +1.0]

Coefficient of Correlation Examples
[Figure: four scatterplots of Y against X, labeled r = 1, r = -1, r = .89, and r = 0]

Test of Coefficient of Correlation
1. Shows If There Is a Linear Relationship Between 2 Numerical Variables
2. Same Conclusion as Testing Population Slope $\beta_1$
3. Hypotheses
   $H_0$: $\rho = 0$ (No Correlation)
   $H_a$: $\rho \ne 0$ (Correlation)

Conclusion
1. Described the Linear Regression Model
2. Stated the Regression Modeling Steps
3. Explained Ordinary Least Squares
4. Computed Regression Coefficients
5. Predicted Response Variable
6. Interpreted Computer Output