
Multiple linear regression

Part 2

Beginning of next lecture: online course evaluation (bring a tablet,
laptop, phone?)
Model comparison by F-Test
Tests whether the addition (removal) of one or more terms
significantly increases (decreases) model fit

Works via a model comparison, just like we discussed for t-tests,
regression & ANOVA

The reduced model has to be a subset of the full model (i.e. they are
nested models)
E.g. all terms in the reduced model are also in the full model; the full
model has additional terms unique to it, but the reduced model has no
terms unique to it

In R, use anova(model1, model2). Order doesn't matter to the
interpretation.
Model comparison by F-Test
> anova(model.polynomial.reduced, full.model.2ndinteractions)
Analysis of Variance Table

Model 1: logherp ~ logarea + thtden + swamp + I(swamp^2)
Model 2: logherp ~ logarea + cpfor2 + thtden + swamp + I(swamp^2) + logarea:cpfor2 +
    logarea:thtden + logarea:swamp + cpfor2:thtden + cpfor2:swamp + thtden:swamp

  Res.Df     RSS Df Sum of Sq      F Pr(>F)
1     23 0.25999
2     16 0.18651  7  0.073486 0.9006 0.5294

> anova(full.model.2ndinteractions, model.polynomial.reduced)
Analysis of Variance Table

Model 1: logherp ~ logarea + cpfor2 + thtden + swamp + I(swamp^2) + logarea:cpfor2 +
    logarea:thtden + logarea:swamp + cpfor2:thtden + cpfor2:swamp + thtden:swamp
Model 2: logherp ~ logarea + thtden + swamp + I(swamp^2)

  Res.Df     RSS Df Sum of Sq      F Pr(>F)
1     16 0.18651
2     23 0.25999 -7 -0.073486 0.9006 0.5294
What if relationship between Y and
one or more Xs is nonlinear?
Option 1: transform the data.
Option 2: use non-linear regression.
Option 3: use polynomial regression.
(A brief R sketch of all three options follows below.)
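A minimal sketch of the three options, assuming a data frame dat with a
response y and a predictor x (hypothetical names, not from the slides);
the non-linear curve and its starting values are likewise only illustrative:

# Option 1: transform the data, then fit an ordinary linear model
m.trans <- lm(log10(y) ~ x, data = dat)

# Option 2: non-linear regression with an explicitly chosen curve
# (an illustrative exponential; starting values are guesses)
m.nls <- nls(y ~ a * exp(b * x), data = dat, start = list(a = 1, b = 0.1))

# Option 3: polynomial regression (a 2nd-order polynomial shown)
m.poly <- lm(y ~ x + I(x^2), data = dat)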
The polynomial regression model

In polynomial regression, the regression model includes terms of
increasingly higher powers of the independent variable:

Y_i = β_0 + β_1 X_i + β_2 X_i^2 + ... + β_k X_i^k + ε_i

[Figure: black fly biomass (mg DM/m) vs. current velocity (cm/s), with a
linear fit and a 2nd-order polynomial fit]
The polynomial regression model: procedure

Fit a simple linear model.

Fit a model with a quadratic term added; test for the increase in SSmodel.

Continue with higher orders (cubic, quartic, etc.) until there is no
further significant increase in SSmodel (see the R sketch below).

[Figure: black fly biomass (mg DM/m) vs. current velocity (cm/s), with the
linear and 2nd-order polynomial fits]
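The fitting calls behind this procedure are not shown on the slides; a
minimal sketch in R, assuming a data frame dat with a response y and a
predictor x (hypothetical names):

m1 <- lm(y ~ x, data = dat)                    # linear
m2 <- lm(y ~ x + I(x^2), data = dat)           # add quadratic term
m3 <- lm(y ~ x + I(x^2) + I(x^3), data = dat)  # add cubic term

anova(m1, m2)  # does the quadratic term significantly increase SSmodel?
anova(m2, m3)  # any further gain from the cubic term? stop when not significant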
Polynomial regression: caveats

The biological significance of the higher-order terms in a polynomial
regression (if any) is often not known.

By definition, polynomial terms are strongly correlated (i.e.
multicollinearity will be high). Hence standard errors will be large
(precision is low) and will increase with the order of the terms.

Extrapolation of polynomial models is usually nonsense.

[Figure: the curve Y = X1 - X1^2 plotted against X1]
Example

[Figure: scatterplot of y vs. x with a lowess curve]
Coefficients:
              Value Std. Error t value Pr(>|t|)
(Intercept) -0.3701     0.1562 -2.3699   0.0188
x            0.0065     0.0027  2.4114   0.0168

Residual standard error: 1.096 on 198 degrees of freedom
Multiple R-Squared: 0.02853
F-statistic: 5.815 on 1 and 198 degrees of freedom, the p-value is 0.01681

[Figure: partial residual plot (partial for x) vs. x]
Coefficients:
               Value Std. Error  t value Pr(>|t|)
(Intercept)   1.7718     0.1244  14.2486   0.0000
x            -0.1195     0.0057 -21.0307   0.0000
I(x^2)        0.0012     0.0001  22.8826   0.0000

Residual standard error: 0.5745 on 197 degrees of freedom
Multiple R-Squared: 0.7344
F-statistic: 272.4 on 2 and 197 degrees of freedom, the p-value is 0

[Figure: scatterplot of y vs. x with the fitted curve]
Coefficients:
               Value Std. Error t value Pr(>|t|)
(Intercept)   1.0295     0.1499  6.8655   0.0000
x            -0.0335     0.0128 -2.6150   0.0096
I(x^2)       -0.0009     0.0003 -2.9716   0.0033
I(x^3)        0.0000     0.0000  7.3217   0.0000

Residual standard error: 0.5104 on 196 degrees of freedom
Multiple R-Squared: 0.7915
F-statistic: 248 on 3 and 196 degrees of freedom, the p-value is 0

[Figure: scatterplot of y vs. x with the fitted curve]
Coefficients:
               Value Std. Error t value Pr(>|t|)
(Intercept)   1.0258     0.1923  5.3334   0.0000
x            -0.0328     0.0261 -1.2547   0.2111
I(x^2)       -0.0009     0.0010 -0.8652   0.3880
I(x^3)        0.0000     0.0000  0.9337   0.3516
I(x^4)        0.0000     0.0000 -0.0307   0.9755

Residual standard error: 0.5117 on 195 degrees of freedom
Multiple R-Squared: 0.7915
F-statistic: 185 on 4 and 195 degrees of freedom, the p-value is 0

[Figure: scatterplot of y vs. x with the fitted curve]
> anova(degre1, degre2)
Analysis of Variance Table

Response: y
      Terms Resid. Df      RSS    Test Df Sum of Sq  F Value Pr(F)
1         x       198 237.8462
2   x + x^2       197  65.0221 +I(x^2)  1  172.8242 523.6125     0

> anova(degre2, degre3)
Analysis of Variance Table

Response: y
            Terms Resid. Df      RSS    Test Df Sum of Sq  F Value         Pr(F)
1         x + x^2       197 65.02205
2   x + x^2 + x^3       196 51.05763 +I(x^3)  1  13.96443 53.60663 6.188605e-012

> anova(degre4, degre3)
Analysis of Variance Table

Response: y
                  Terms Resid. Df      RSS    Test Df     Sum of Sq      F Value     Pr(F)
1 x + x^2 + x^3 + x^4       195 51.05738
2       x + x^2 + x^3       196 51.05763 -I(x^4) -1 -0.0002468752 0.0009428737 0.9755352
Overfitting/Underfitting

[Figure: three fits to the same data, labelled "underfit", "pretty good",
and "overfit"]

More examples/discussion:
https://stats.stackexchange.com/questions/128616/whats-a-real-world-example-of-overfitting
Multiway ANOVA vs Multiple regression

Multiway ANOVA:
Test for significance of main terms and usually interactions
Often balanced or close to balanced, with no or very low multicollinearity
(a balanced design with no collinearity means Type I and Type III SS
(sequential vs. partial) are identical)
Rarely used for prediction (coefficients are of little interest; the model
does not need to be simplified)
Models include few terms in general (fewer than 10, including interactions)

Multiple regression:
Test for significance of main terms and sometimes interactions
Often some multicollinearity (especially for interactions), meaning
sequential vs. partial SS can be very different, so order matters (see the
R sketch below)
Often used for prediction (so coefficients matter; simple models are
usually better; overfitting is an issue)
Full models, with interactions, can include many terms
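A small sketch of the sequential-vs-partial point in R, using a
hypothetical data frame dat with correlated predictors a and b:

fit.ab <- lm(y ~ a + b, data = dat)
fit.ba <- lm(y ~ b + a, data = dat)

anova(fit.ab)              # sequential (Type I) SS: a fitted first, then b
anova(fit.ba)              # same model, different order: different SS when a and b are correlated
drop1(fit.ab, test = "F")  # partial (marginal) tests: each term assessed last, so order does not matter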
When, and why, do we pool in ANOVA?

Pooling: dropping terms and estimating sums of squares and df by combining
across levels of the dropped factor

When some terms are not significant
i.e. they do not contribute much to model fit and can be dropped

Why? To increase power and/or simplify interpretation
Increases df for F tests
Often only a minor impact; substantial only when there are many levels per
factor
How and why do you seek simple models in multiple regression?

How: by dropping terms from the full model (analogous to pooling)

Why? To increase power
Increases df for the F test
Small impact
Increases factor SS
Large impact when multicollinearity is an issue (which it often is)
Multiple regression: the general idea for inference

Evaluate the significance of a variable by fitting two models: one with
the term in (Model A, X1 in), the other with it removed (Model B, X1 out).

Test for the change in model fit (e.g. R²) associated with removal of the
term in question: retain X1 if the change is large, delete X1 if it is
small.

Unfortunately, the change in model fit may depend on what other variables
are in the model if there is multicollinearity!
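A minimal sketch of this comparison, borrowing the herptile variables used
later in the deck (assuming mydata as on those slides, and treating thtden
as the X1 being tested):

modelA <- lm(logherp ~ logarea + cpfor2 + thtden, data = mydata)  # X1 (thtden) in
modelB <- lm(logherp ~ logarea + cpfor2,          data = mydata)  # X1 (thtden) out
# (make sure both models are fit to the same rows if there are missing values)

summary(modelA)$r.squared - summary(modelB)$r.squared  # change in R^2 from removing thtden
anova(modelB, modelA)                                   # F-test of the same comparison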
Fitting multiple regression models

Goal: find the "best" model, given the available data.

Problem #1: what is "best"?
highest R²?
lowest RMS?
highest R² but containing only individually significant independent variables?
maximizes R² with the minimum number of independent variables?
Selection of independent variables (cont'd)

Problem #2: even if "best" is defined, by what method do we find it?

Possibilities:
compute all possible models (2^k - 1 of them) and choose the best one.
use some procedure for winnowing down the set of possible models.
Strategy I: computing all possible models

Compute all possible models and choose the best one.

cons:
time-consuming
leaves the definition of "best" to the researcher

pros:
if the "best" model is defined, you will find it!

[Diagram: the 2^3 - 1 = 7 possible models for three predictors:
{X1}, {X2}, {X3}, {X1, X2}, {X1, X3}, {X2, X3}, {X1, X2, X3}]
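One way to run the exhaustive search in R is the leaps package (not used
elsewhere in this deck); a sketch using the herptile variables from the
later example:

library(leaps)
all.subsets <- regsubsets(logherp ~ logarea + cpfor2 + thtden,
                          data = mydata, method = "exhaustive")
summary(all.subsets)$which   # which variables are in the best model of each size
summary(all.subsets)$adjr2   # adjusted R^2 of each of those models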
Strategy II: stepwise forward selection

Begin with the naïve (i.e. simplest) model (i.e. intercept only)

The next entry is the variable which most improves model fit
E.g. greatest increase in adjusted R², most significant F-test

Continue until no remaining variable improves model fit by the criterion
employed
Strategy III: stepwise backward selection

Start with a full model with all the variables.

Drop variables whose removal does not compromise model fit by whatever
criterion is used (e.g. adjusted R², a non-significant F-test, etc.), one
at a time, starting with the one with the smallest effect

Continue until only significant variables remain (i.e. those whose removal
would compromise model fit)

Note: once Xj is dropped, it stays out, even if it would explain a
significant amount of the remaining variability once other variables are
excluded.
AIC: Akaike Information Criterion

An index of quality of fit, penalized for model complexity

AIC = 2k - 2 ln(L)
k = number of parameters in the model
L = likelihood of the model

Calculated assuming some distribution for the residuals
Assuming a normal distribution for the residuals, with variance = the
residual variance:
For each residual, evaluate its likelihood (normal density) under N(0, RMS)
L = product of all these likelihoods
If the fit is good, L will be large
If the fit is bad, L will be very small
The lower (i.e. closer to -∞) the AIC, the better the model fit
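A sketch of this calculation done "by hand" for a fitted lm and checked
against R's built-in AIC() (model and data names are placeholders):

fit <- lm(y ~ x, data = dat)
res <- residuals(fit)
n   <- length(res)
sigma2 <- sum(res^2) / n                 # ML estimate of the residual variance
logL   <- sum(dnorm(res, mean = 0, sd = sqrt(sigma2), log = TRUE))  # log-likelihood
k      <- length(coef(fit)) + 1          # number of parameters (+ 1 for the variance)
2 * k - 2 * logL                         # AIC; should match the next line
AIC(fit)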
Example

log of herptile species richness (logherp) as a function of log wetland
area (logarea), percentage of land within 1 km covered in forest (cpfor2),
and density of hard-surface roads within 1 km (thtden)
Example (all variables)
Call:
lm(formula = logherp ~ logarea + cpfor2 + thtden, data = mydata)

Residuals:
Min 1Q Median 3Q Max
-0.30729 -0.13779 0.02627 0.11441 0.29582

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.284765 0.191420 1.488 0.149867
logarea 0.228490 0.057647 3.964 0.000578 ***
cpfor2 0.001095 0.001414 0.774 0.446516
thtden -0.035794 0.015726 -2.276 0.032055 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1619 on 24 degrees of freedom
  (2 observations deleted due to missingness)
Multiple R-squared: 0.5471,  Adjusted R-squared: 0.4904
F-statistic: 9.662 on 3 and 24 DF,  p-value: 0.0002291
Example: stepwise backward

AIC = Akaike Information Criterion; to be minimized.

Start: AIC=-98.27
logherp ~ logarea + cpfor2 + thtden

          Df Sum of Sq     RSS     AIC
- cpfor2   1   0.01571 0.64508 -99.576
<none>                 0.62937 -98.267
- thtden   1   0.13585 0.76522 -94.794
- logarea  1   0.41198 1.04135 -86.167

Step: AIC=-99.58
logherp ~ logarea + thtden

          Df Sum of Sq     RSS     AIC
<none>                 0.64508 -99.576
- thtden   1   0.25092 0.89600 -92.376
- logarea  1   0.40204 1.04712 -88.013
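The call that produces this trace is not shown on the slide; a minimal
sketch, assuming the full model of the earlier example and stepAIC() from
the MASS package (as used on the next slide):

library(MASS)
model.full <- lm(logherp ~ logarea + cpfor2 + thtden, data = mydata)
step.back  <- stepAIC(model.full, direction = "backward")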
Example: forward stepwise

> model.null <- lm(logherp ~ 1, data = mydata)
> step <- stepAIC(model.null, scope = ~. + logarea +
+     cpfor2 + thtden, direction = "forward")

Start: AIC=-82.09
logherp ~ 1

          Df Sum of Sq   RSS     AIC
+ logarea  1     0.494 0.896 -92.376
+ thtden   1     0.342 1.047 -88.013
+ cpfor2   1     0.129 1.260 -82.820
<none>                 1.390 -82.091

Step: AIC=-92.38
logherp ~ logarea

          Df Sum of Sq   RSS     AIC
+ thtden   1     0.251 0.645 -99.576
+ cpfor2   1     0.131 0.765 -94.794
<none>                 0.896 -92.376

Step: AIC=-99.58
logherp ~ logarea + thtden

          Df Sum of Sq   RSS     AIC
<none>                 0.645 -99.576
+ cpfor2   1     0.016 0.629 -98.267
Information Theoretic Approach

Fit all models and compute AIC for each one.

Select all "likely" models, i.e. those within 4 AIC units of the best one.

Compute average coefficients across these models, weighted by each model's
probability of being the best one.
> #####################################################################
> # Dredging and the information theoretical approach
> library(MuMIn)
> # fit all possible models
> dd <- dredge(full.model.2ndinteractions)
>
> # keep the models within 4 AICc units of the best, then average them
> top.models.1 <- get.models(dd, subset = delta < 4)
> model.avg(top.models.1)

(AICc = AIC corrected for small data sets, used when N < 40k, i.e. fewer
than 40 observations per parameter.)

Model summary:
            Deviance   AICc Delta Weight
2+3+4+5+7       0.23 -36.46  0.00   0.34
2+3+4+5         0.26 -35.91  0.55   0.26
1+2+3+4+5+7     0.22 -33.75  2.72   0.09
2+3+4+5+7+8     0.22 -33.67  2.79   0.08
1+2+3+4+5       0.25 -33.25  3.21   0.07
2+3+4+5+8       0.25 -33.02  3.44   0.06
2+3+4+5+6+7     0.22 -32.90  3.56   0.06
2+3+4+5+6       0.26 -32.50  3.97   0.05

Variables:
1 cpfor2          2 I(swamp^2)       3 logarea   4 swamp   5 thtden
6 logarea:swamp   7 logarea:thtden   8 swamp:thtden
Averaged model parameters:
               Coefficient       SE Adjusted SE  Lower CI  Upper CI
(Intercept)      -2.02e-01 2.50e-01    2.61e-01 -0.713000  0.310000
cpfor2           -1.27e-04 4.84e-04    5.02e-04 -0.001110  0.000856
I(swamp^2)       -2.68e-04 4.91e-05    5.16e-05 -0.000370 -0.000167
logarea           1.27e-01 1.20e-01    1.23e-01 -0.114000  0.369000
swamp             3.20e-02 6.13e-03    6.45e-03  0.019400  0.044700
thtden           -7.02e-02 5.35e-02    5.49e-02 -0.178000  0.037500
logarea:swamp     4.73e-05 5.53e-04    5.84e-04 -0.001100  0.001190
logarea:thtden    2.23e-02 2.52e-02    2.58e-02 -0.028300  0.072900
swamp:thtden     -3.49e-05 1.46e-04    1.52e-04 -0.000333  0.000263

Relative variable importance:
I(swamp^2) logarea swamp thtden logarea:thtden cpfor2 swamp:thtden logarea:swamp
      1.00    1.00  1.00   1.00           0.57   0.16         0.14          0.10
What is the best approach?

Stepwise is the most used
Be very critical of p-values (not corrected for multiple tests)
Beware of multicollinearity
The best model may not be found
Forward and backward solutions can differ

AIC and the Information Theoretic Approach are increasingly used
R does it easily
Explicit recognition that several models are equivalent
Averaged models proposed as a defence against multicollinearity and the
resulting ambiguities
What is the partial regression coefficient?

β_j measures the amount by which Y changes when X_j is increased by one
unit and all other independent variables are held constant.

[Figure: Y vs. X1 at several fixed values of X2 (X2 = -3, -1, 1, 3),
contrasting the simple regression slope with the partial regression slope]
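One way to see what "holding the other variables constant" means is the
residual-on-residual (added-variable) construction; a sketch with a
hypothetical data frame dat containing y, x1 and x2:

# partial coefficient of x1 from the full multiple regression
b.partial <- coef(lm(y ~ x1 + x2, data = dat))["x1"]

# the same number, obtained by first removing the effect of x2 from both y and x1
ry <- residuals(lm(y  ~ x2, data = dat))   # y adjusted for x2
rx <- residuals(lm(x1 ~ x2, data = dat))   # x1 adjusted for x2
coef(lm(ry ~ rx))["rx"]                    # equals b.partial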
https://stats.stackexchange.com/questions/78828/is-there-a-difference-between-controlling-for-and-ignoring-other-variables-i/78830#78830
How to check the normality of data in multiple regression?

I don't understand why independent variables are always considered fixed.
Why not random?

I was a bit confused on the difference between when two variables (i.e. X1
and X2) have an interaction versus when two variables are correlated.
Would it be possible to get an example of both?

Could we go through an example where we transform the data to their
z-scores to observe the relative strength? I understand the theory of this
but I'm not sure exactly how it's done.

Why do we have to bin continuous variables to visualize their
interactions? Can we use continuous variables directly?

Can you explain "binning" again or provide an example of when we would
"bin" a variable? Also, how would we recognize variations within bins, as
slide 2 mentioned?
I did not fully understand the definition of a parameter in the context of
multiple regression, and how these relate to the X variables.

Parameters and variables are NOT interchangeable!

Linear regression:

Y_i = β_0 + β_1 X_i + ε_i

Y_i is the dependent variable, X_i the independent variable, and ε_i the
unexplained variation; β_0 and β_1 are the parameters.
If you know, for two dependent variables, that one is determined by the
other, does it make sense to omit it in a multiple regression?

I still do not understand what Cook's distance is and how to interpret the
graphs of residuals vs leverage.

How is tolerance useful if we can gain the same information from VIF?

I don't think I really understand the difference between interactions and
multicollinearity (in terms of how to identify them).

Are we required to memorize the degrees of freedom formulae that would be
used to calculate an F ratio, or just know that they are different for
random or fixed effects?

In lecture 16, slide 43: I don't understand "sensitivity of parameter
estimates to small changes in data (multicollinearity)".

Table 37.3 (pg. 353) mentions that repeated measures (multiple observations
of the same subject at different times) are a violation of independence in
multiple regression. Would the appropriate test(s) be a paired t-test, or a
two-/multi-way ANOVA?
