
Analysis of Variance (ANOVA) for Simple Linear Regression

The variability in Y values can be partitioned into two pieces.


Σ_{i=1}^{n} (Yi − Ȳ)² = Σ_{i=1}^{n} (Ŷi − Ȳ)² + Σ_{i=1}^{n} (Yi − Ŷi)²

Total Sum of Squares = Regression Sum of Squares + Error (or Residual) Sum of Squares

SSTO = SSREG + SSE
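As a quick numerical check of the decomposition, the sketch below fits a least-squares line to a small hypothetical data set (the numbers are illustrative, not from these notes) and verifies that SSTO equals SSREG + SSE up to floating-point error.

```python
import numpy as np

# Hypothetical toy data, used only to illustrate the decomposition.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 9.9])

# Least-squares fit of Y on X.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

ssto = np.sum((y - y.mean())**2)       # total sum of squares
ssreg = np.sum((y_hat - y.mean())**2)  # regression sum of squares
sse = np.sum((y - y_hat)**2)           # error (residual) sum of squares

print(ssto, ssreg + sse)  # the two agree up to floating-point error
```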

We can organize the results of a simple linear regression analysis in an ANOVA table.

Source       D.F.      Sum of Squares   Mean Square   F           P-value
Regression   df_REG    SSREG            MSREG         MSREG/MSE   P(T² ≥ MSREG/MSE), where T² ∼ F(df_REG, df_E)
Error        df_E      SSE              MSE
Total        df_TO     SSTO

Source       D.F.    Sum of Squares           Mean Square                       F           P-value
Regression   1       Σ_{i=1}^{n}(Ŷi − Ȳ)²    [Σ_{i=1}^{n}(Ŷi − Ȳ)²]/1          MSREG/MSE   P(T² ≥ MSREG/MSE), where T² ∼ F(1, n − 2)
Error        n − 2   Σ_{i=1}^{n}(Yi − Ŷi)²   [Σ_{i=1}^{n}(Yi − Ŷi)²]/(n − 2)
Total        n − 1   Σ_{i=1}^{n}(Yi − Ȳ)²
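The table entries can be filled in by hand. The sketch below does so for a small hypothetical data set (the numbers are illustrative, not from these notes); the p-value would come from comparing the F-statistic to an F(1, n − 2) distribution.

```python
import numpy as np

# Hypothetical toy data used to fill in the ANOVA table.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.8, 4.1, 5.6, 8.3, 9.7, 12.2])
n = len(x)

# Least-squares fit of Y on X.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

ssreg = np.sum((y_hat - y.mean())**2)  # regression SS, df = 1
sse = np.sum((y - y_hat)**2)           # error SS, df = n - 2
ssto = np.sum((y - y.mean())**2)       # total SS, df = n - 1

msreg = ssreg / 1
mse = sse / (n - 2)
f_stat = msreg / mse  # compare to F(1, n - 2) to get the p-value

print(f"SSREG={ssreg:.3f}, SSE={sse:.3f}, SSTO={ssto:.3f}, F={f_stat:.2f}")
```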

The F-statistic F = MSREG/MSE is used to test

H0 : µ{Y |X} = β0   versus   HA : µ{Y |X} = β0 + β1 X for some β1 ≠ 0,

or H0 : β1 = 0 vs. HA : β1 ≠ 0 for short. The test is equivalent to the t-test that we learned about previously
because

(1) F = MSREG/MSE = β̂1² / [SE(β̂1)]² = t²   and   (2) T² ∼ F with 1 and n − 2 d.f. ⇐⇒ T ∼ t with n − 2 d.f.
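The identity F = t² can be checked numerically. The sketch below computes both statistics from scratch on a small hypothetical data set (illustrative numbers only) and confirms they match.

```python
import numpy as np

# Hypothetical toy data: check F = t^2 for the slope test.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
y = np.array([2.3, 2.9, 4.8, 4.9, 6.8, 7.5, 8.1])
n = len(x)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

mse = np.sum((y - y_hat)**2) / (n - 2)
se_b1 = np.sqrt(mse / np.sum((x - x.mean())**2))  # standard error of the slope
t_stat = b1 / se_b1                               # t-statistic for H0: beta1 = 0

msreg = np.sum((y_hat - y.mean())**2) / 1
f_stat = msreg / mse                              # ANOVA F-statistic

print(f_stat, t_stat**2)  # equal up to floating-point error
```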
The F-statistic MSREG/MSE is a special case of the F-statistic used to compare full and reduced models:

F = {[RSS(red.) − RSS(full)] / [df_RSS(red.) − df_RSS(full)]} / {RSS(full) / df_RSS(full)}

Recall that our null and alternative hypotheses are

H0 : µ{Y |X} = β0   versus   HA : µ{Y |X} = β0 + β1 X for some β1 ≠ 0.

The full model corresponds to the situation where β1 can be any value. The reduced model forces β1 to be
0, just like H0 . Write down formulas for RSS(red.), RSS(full), df_RSS(red.), and df_RSS(full) for the special
case of simple linear regression; and show that the resulting reduced vs. full model F-statistic is the same as
F = MSREG/MSE.
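A numerical check of the algebra in this exercise can be done with a few lines of code. The sketch below fits the full model by least squares and the reduced model by the sample mean (hypothetical toy data, illustrative only), then confirms the two F-statistics agree.

```python
import numpy as np

# Hypothetical toy data: verify numerically that the reduced-vs-full F
# equals MSREG/MSE in simple linear regression.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([3.2, 3.9, 6.1, 6.0, 8.4, 9.1])
n = len(x)

# Full model: Y = b0 + b1*X, fitted by least squares; df = n - 2.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b0 = y.mean() - b1 * x.mean()
rss_full = np.sum((y - (b0 + b1 * x))**2)

# Reduced model: Y = b0 only, fitted by the sample mean; df = n - 1.
rss_red = np.sum((y - y.mean())**2)

f_general = ((rss_red - rss_full) / ((n - 1) - (n - 2))) / (rss_full / (n - 2))
f_anova = (np.sum(((b0 + b1 * x) - y.mean())**2) / 1) / (rss_full / (n - 2))

print(f_general, f_anova)  # identical up to floating-point error
```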

Because SSTO = SSREG + SSE, we may write 1 = SSREG/SSTO + SSE/SSTO.

SSE/SSTO is the proportion of total variation in the Y values that was not explained by the regression of Y on X.

The remaining proportion of variation in the Y values is 1 − SSE/SSTO = SSREG/SSTO. This quantity, known as the
coefficient of determination, is the proportion of the variation in the Y values that was explained by the
regression of Y on X.

It can be shown that the coefficient of determination is equal to the square of the sample linear correlation
coefficient between X and Y .
1 − SSE/SSTO = SSREG/SSTO = r²
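This identity, too, is easy to verify numerically. The sketch below computes the coefficient of determination both ways, from the ANOVA decomposition and from the sample correlation, on hypothetical toy data.

```python
import numpy as np

# Hypothetical toy data: coefficient of determination two ways.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 2.8, 3.1, 4.9, 5.8])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

ssto = np.sum((y - y.mean())**2)
sse = np.sum((y - y_hat)**2)
r2_from_anova = 1 - sse / ssto        # 1 - SSE/SSTO

r = np.corrcoef(x, y)[0, 1]           # sample correlation coefficient

print(r2_from_anova, r**2)  # equal up to floating-point error
```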
