
Econometrics using R

Rajat Tayal
Fourth Quantitative Finance Workshop
December 21-24, 2012
Indian Institute of Technology, Kanpur

23 December 2012


Outline of the presentation


Linear regression
  - Simple linear regression
  - Multiple linear regression
  - Partially linear models
  - Factors, interactions, and weights
  - Linear regression with time series data
  - Linear regression with panel data
  - Systems of linear equations

Regression diagnostics
  - Leverage and standardized residuals
  - Deletion diagnostics
  - The function influence.measures()
  - Testing for heteroskedasticity
  - Testing for functional form
  - Testing for autocorrelation
  - Robust standard errors and tests

Part I
Linear regression


Introduction
The linear regression model, typically estimated by ordinary least squares (OLS), is the workhorse of applied econometrics. The model is

    y_i = x_i^T β + ε_i,   i = 1, 2, . . . , n,                (1)

or, in matrix form,

    y = X β + ε.                                               (2)

For cross-sections, the standard assumptions are exogeneity and homoskedasticity:

    E(ε | X) = 0,                                              (3)
    Var(ε | X) = σ² I.                                         (4)

For time series, exogeneity is often too restrictive; a weaker assumption is that the regressors are predetermined:

    E(ε_j | x_i) = 0,   i ≤ j.                                 (5)


Introduction

The OLS estimator of β is

    β̂ = (X^T X)^{-1} X^T y.                                   (6)

The corresponding fitted values are ŷ = X β̂, the residuals are ε̂ = y − ŷ, and the residual sum of squares is ε̂^T ε̂.

In R, models are typically fitted by calling a model-fitting function, in this case lm(), with a formula object describing the model and a data.frame object containing the variables used in the formula:

fm <- lm(formula, data, ...)
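Equation (6) can be checked numerically for any fitted model fm; a minimal sketch using base R matrix arithmetic (an illustration added here, not part of the original slides):

> X <- model.matrix(fm)                   # regressor matrix X
> y <- model.response(model.frame(fm))    # response vector y
> solve(crossprod(X), crossprod(X, y))    # (X^T X)^{-1} X^T y, matches coef(fm)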


The first example


Data from Stock & Watson (2007) on subscriptions to economics journals
at US libraries for the year 2000:
> install.packages("AER", dependencies = TRUE)
> library("AER")
> data("Journals"); names(Journals)
> journals <- Journals[, c("subs", "price")]
> journals$citeprice <- Journals$price/Journals$citations
> summary(journals)
      subs             price           citeprice
 Min.   :   2.0   Min.   :  20.0   Min.   : 0.005223
 1st Qu.:  52.0   1st Qu.: 134.5   1st Qu.: 0.464495
 Median : 122.5   Median : 282.0   Median : 1.320513
 Mean   : 196.9   Mean   : 417.7   Mean   : 2.548455
 3rd Qu.: 268.2   3rd Qu.: 540.8   3rd Qu.: 3.440171
 Max.   :1098.0   Max.   :2120.0   Max.   :24.459459

The first example

In view of the wide range of the variables, combined with a considerable amount of skewness, it is useful to take logarithms.

The goal is to estimate the effect of the price per citation on the number of library subscriptions. To explore this issue quantitatively, we fit the linear regression model

    log(subs)_i = β₁ + β₂ log(citeprice)_i + ε_i.              (7)

The first example

Here, the formula of interest is log(subs) ~ log(citeprice). This can be used
both for plotting and for model fitting:
> plot(log(subs) ~ log(citeprice), data = journals)
> jour_lm <- lm(log(subs) ~ log(citeprice), data = journals)
> abline(jour_lm)
abline() extracts the coefficients of the fitted model and adds the
corresponding regression line to the plot.


The first example

Figure: Scatterplot of log(subs) against log(citeprice) for the journals data, with the regression line added by abline()

The first example

The function lm() returns a fitted-model object, here stored as jour_lm.
It is an object of class "lm".

> class(jour_lm)
[1] "lm"
> names(jour_lm)
 [1] "coefficients"  "residuals"     "effects"       "rank"          "fitted.values" "assign"
 [7] "qr"            "df.residual"   "xlevels"       "call"          "terms"         "model"


The first example


> summary(jour_lm)

Call:
lm(formula = log(subs) ~ log(citeprice), data = journals)

Residuals:
     Min       1Q   Median       3Q      Max
-2.72478 -0.53609  0.03721  0.46619  1.84808

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)     4.76621    0.05591   85.25   <2e-16 ***
log(citeprice) -0.53305    0.03561  -14.97   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.7497 on 178 degrees of freedom
Multiple R-squared: 0.5573,     Adjusted R-squared: 0.5548
F-statistic: 224 on 1 and 178 DF,  p-value: < 2.2e-16

Generic functions for fitted (linear) model objects


Function                         Description
print()                          simple printed display
summary()                        standard regression output
coef() (or coefficients())       extracting the regression coefficients
residuals() (or resid())         extracting residuals
fitted() (or fitted.values())    extracting fitted values
anova()                          comparison of nested models
predict()                        predictions for new data
plot()                           diagnostic plots
confint()                        confidence intervals for the regression coefficients
deviance()                       residual sum of squares
vcov()                           (estimated) variance-covariance matrix
logLik()                         log-likelihood (assuming normally distributed errors)
AIC()                            information criteria including AIC, BIC/SBC (assuming normally distributed errors)
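Several of these generics are not illustrated elsewhere in these slides; a minimal sketch applying a few of them to jour_lm (output omitted):

> vcov(jour_lm)      # estimated variance-covariance matrix of the coefficients
> deviance(jour_lm)  # residual sum of squares
> logLik(jour_lm)    # log-likelihood assuming normally distributed errors
> AIC(jour_lm)       # Akaike information criterion
> AIC(jour_lm, k = log(nrow(journals)))  # BIC/SBC via the penalty argument k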

The first example


It is instructive to take a brief look at what the summary() method
returns for a fitted lm object:

> jour_slm <- summary(jour_lm)


> class(jour_slm)
[1] "summary.lm"
> names(jour_slm)
[1] "call" "terms" "residuals" "coefficients" "aliased" "sigm
[7] "df" "r.squared" "adj.r.squared" "fstatistic" "cov.unscal
> jour_slm$coefficients
Estimate Std. Error
t value
Pr(>|t|)
(Intercept)
4.7662121 0.05590908 85.24934 2.953913e-146
log(citeprice) -0.5330535 0.03561320 -14.96786 2.563943e-33


Analysis of variance
> anova(jour_lm)
Analysis of Variance Table

Response: log(subs)
                Df Sum Sq Mean Sq F value    Pr(>F)
log(citeprice)   1 125.93 125.934  224.04 < 2.2e-16 ***
Residuals      178 100.06   0.562
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The ANOVA table breaks the sum of squares about the mean (for the
dependent variable, here log(subs)) into two parts: a part that is
accounted for by a linear function of log(citeprice) and a part attributed to
residual variation.
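
The table ties directly to the R² reported by summary(jour_lm): the model sum of squares divided by the total sum of squares gives

    R² = 125.93 / (125.93 + 100.06) ≈ 0.5573,

matching the Multiple R-squared shown earlier.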

Point and Interval estimates

To extract the estimated regression coefficients β̂, the function coef() can be used:

> coef(jour_lm)
   (Intercept) log(citeprice)
     4.7662121     -0.5330535

> confint(jour_lm, level = 0.95)
                    2.5 %     97.5 %
(Intercept)     4.6558822  4.8765420
log(citeprice) -0.6033319 -0.4627751


Prediction

Two types of predictions:
 1. the prediction of points on the regression line, and
 2. the prediction of a new data value.

The standard errors of predictions for new data take into account both the uncertainty in the regression line and the variation of the individual points about the line.

Thus, the prediction interval for prediction of new data is larger than that for prediction of points on the line. The function predict() provides both types of standard errors.


Prediction

> predict(jour_lm, newdata = data.frame(citeprice = 2.11),
+   interval = "confidence")
       fit      lwr     upr
1 4.368188 4.247485 4.48889
> predict(jour_lm, newdata = data.frame(citeprice = 2.11),
+   interval = "prediction")
       fit      lwr      upr
1 4.368188 2.883746 5.852629

The point estimates (fit) are identical, but the intervals differ.
The prediction intervals can also be used for computing and visualizing confidence bands.


Prediction

> lciteprice <- seq(from = -6, to = 4, by = 0.25)
> jour_pred <- predict(jour_lm, interval = "prediction",
+   newdata = data.frame(citeprice = exp(lciteprice)))
> plot(log(subs) ~ log(citeprice), data = journals)
> lines(jour_pred[, 1] ~ lciteprice, col = 1)
> lines(jour_pred[, 2] ~ lciteprice, col = 1, lty = 2)
> lines(jour_pred[, 3] ~ lciteprice, col = 1, lty = 2)


Prediction

Figure: Scatterplot with prediction intervals for the journals data



Plotting lm objects

The plot() method for class "lm" provides six types of diagnostic plots, four of which are shown by default.

We set the graphical parameter mfrow to c(2, 2) using the par() function, creating a 2 x 2 matrix of plotting areas so that all four plots are shown simultaneously:

> par(mfrow = c(2, 2))
> plot(jour_lm)
> par(mfrow = c(1, 1))

The first provides a graph of residuals versus fitted values, the second is a QQ plot for normality, and plots three and four are a scale-location plot and a plot of standardized residuals against leverages, respectively.
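
The two diagnostics not shown by default (Cook's distances, and Cook's distance against leverage) can be requested explicitly via the which argument; a minimal sketch:

> par(mfrow = c(3, 2))
> plot(jour_lm, which = 1:6)  # all six plots; 4 and 6 are the Cook's distance plots
> par(mfrow = c(1, 1))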


Plotting lm objects

Figure: Diagnostic plots for the journals data



Testing a linear hypothesis

The standard regression output as provided by summary() only indicates individual significance of each regressor and joint significance of all regressors, in the form of t and F statistics, respectively. Often it is necessary to test more general hypotheses. This is possible using the function linear.hypothesis() from the car package.

Suppose we want to test the hypothesis that the elasticity of the number of library subscriptions with respect to the price per citation equals -0.5:

    H₀: β₂ = −0.5                                              (8)

Testing a linear hypothesis

> linear.hypothesis(jour_lm, "log(citeprice) = -0.5")
Linear hypothesis test

Hypothesis:
log(citeprice) = - 0.5

Model 1: restricted model
Model 2: log(subs) ~ log(citeprice)

  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1    179 100.54
2    178 100.06  1   0.48421 0.8614 0.3546


Multiple linear regression


In economics, most regression analyses comprise more than a single regressor. Often there are regressors of a special type, usually referred to as dummy variables in econometrics, which are used for coding categorical variables.

> data("CPS1988")
> summary(CPS1988)
      wage            education       experience     ethnicity      smsa
 Min.   :   50.05   Min.   : 0.00   Min.   :-4.0   cauc:25923   no : 7223
 1st Qu.:  308.64   1st Qu.:12.00   1st Qu.: 8.0   afam: 2232   yes:20932
 Median :  522.32   Median :12.00   Median :16.0
 Mean   :  603.73   Mean   :13.07   Mean   :18.2
 3rd Qu.:  783.48   3rd Qu.:15.00   3rd Qu.:27.0
 Max.   :18777.20   Max.   :18.00   Max.   :63.0
       region       parttime
 northeast:6441   no :25631
 midwest  :6863   yes: 2524
 south    :8760
 west     :6091

The model of interest is

    log(wage) = β₁ + β₂ experience + β₃ experience² + β₄ education + β₅ ethnicity + ε.    (9)

Multiple linear regression


> cps_lm <- lm(log(wage) ~ experience + I(experience^2) +
+   education + ethnicity, data = CPS1988)
> summary(cps_lm)

Call:
lm(formula = log(wage) ~ experience + I(experience^2) + education +
    ethnicity, data = CPS1988)

Residuals:
    Min      1Q  Median      3Q     Max
-2.9428 -0.3162  0.0580  0.3756  4.3830

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)      4.321e+00  1.917e-02  225.38   <2e-16 ***
experience       7.747e-02  8.800e-04   88.03   <2e-16 ***
I(experience^2) -1.316e-03  1.899e-05  -69.31   <2e-16 ***
education        8.567e-02  1.272e-03   67.34   <2e-16 ***
ethnicityafam   -2.434e-01  1.292e-02  -18.84   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.5839 on 28150 degrees of freedom
Multiple R-squared: 0.3347,     Adjusted R-squared: 0.3346
F-statistic: 3541 on 4 and 28150 DF,  p-value: < 2.2e-16

Comparison of models
With more than a single explanatory variable, it is interesting to test for the relevance of subsets of regressors. For any two nested models, this can be done using the function anova(). E.g., to test for the relevance of the variable ethnicity, we explicitly fit the model without ethnicity and then compare both models:

> cps_noeth <- lm(log(wage) ~ experience + I(experience^2) +
+   education, data = CPS1988)
> anova(cps_noeth, cps_lm)
Analysis of Variance Table

Model 1: log(wage) ~ experience + I(experience^2) + education
Model 2: log(wage) ~ experience + I(experience^2) + education + ethnicity
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)
1  28151 9719.6
2  28150 9598.6  1    121.02 354.91 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This reveals that the effect of ethnicity is significant at any reasonable level.


Comparison of models

The same comparison can be carried out with waldtest() from the lmtest package, where the restricted model can be specified symbolically (or refitted via update()):

> cps_noeth <- update(cps_lm, formula = . ~ . - ethnicity)
> waldtest(cps_lm, . ~ . - ethnicity)
Wald test

Model 1: log(wage) ~ experience + I(experience^2) + education + ethnicity
Model 2: log(wage) ~ experience + I(experience^2) + education
  Res.Df Df      F    Pr(>F)
1  28150
2  28151 -1 354.91 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
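
waldtest() also accepts a vcov argument, so the same comparison can be carried out with robust standard errors; a minimal sketch, assuming the sandwich package is available:

> library("sandwich")
> waldtest(cps_lm, cps_noeth, vcov = vcovHC)  # Wald test with heteroskedasticity-consistent covariance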


Part II
Linear regression with panel data


Introduction

There has been considerable interest in panel data econometrics over the last two decades.

The package plm (Croissant and Millo 2008) contains the relevant fitting functions and methods for such specifications in R.

Two types of panel data models:
 1. Static linear models
 2. Dynamic linear models


Introduction
For illustrating the basic fixed- and random-effects methods, we use the well-known Grunfeld data (Grunfeld 1958), comprising 20 annual observations (1935-1954) on the three variables real gross investment (invest), real value of the firm (value), and real value of the capital stock (capital) for 11 large US firms. Here, we restrict attention to three firms:

> data("Grunfeld", package = "AER")
> library("plm")
> gr <- subset(Grunfeld, firm %in% c("General Electric",
+   "General Motors", "IBM"))
> pgr <- plm.data(gr, index = c("firm", "year"))

The last command tells R that the individuals are called "firm", whereas the time identifier is called "year". Pooled OLS can then be run using:

> gr_pool <- plm(invest ~ value + capital, data = pgr,
+   model = "pooling")

One-way panel regression


    invest_it = β₁ value_it + β₂ capital_it + α_i + ν_it,     (10)

where i = 1, . . . , n, t = 1, . . . , T, and the α_i denote the individual-specific effects. A fixed-effects version is estimated by running OLS on a within-transformed model:

> gr_fe <- plm(invest ~ value + capital, data = pgr, model = "within")
> summary(gr_fe)
Oneway (individual) effect Within Model

Call:
plm(formula = invest ~ value + capital, data = pgr, model = "within")

Balanced Panel: n=3, T=20, N=60

Residuals :
   Min. 1st Qu.  Median 3rd Qu.    Max.
-167.00  -26.10    2.09   26.80  202.00

Coefficients :
        Estimate Std. Error t-value  Pr(>|t|)
value   0.104914   0.016331  6.4242 3.296e-08 ***
capital 0.345298   0.024392 14.1564 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Total Sum of Squares:    1888900
Residual Sum of Squares: 243980
R-Squared      : 0.87084
Adj. R-Squared : 0.79827
F-statistic: 185.407 on 2 and 55 DF, p-value: < 2.22e-16

One-way panel regression


A two-way model could have been estimated upon setting effect = "twoways".

If the fixed effects need to be inspected, a fixef() method and an associated summary() method are available (see the sketch at the end of this slide).

To check whether the fixed effects are really needed, we compare the fixed-effects and the pooled OLS fits by means of pFtest():

> gr_fe <- plm(invest ~ value + capital, data = pgr, model = "within")
> pFtest(gr_fe, gr_pool)
F test for individual effects

data: invest ~ value + capital
F = 56.8247, df1 = 2, df2 = 55, p-value = 4.148e-14
alternative hypothesis: significant effects
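
For the first two points, a minimal sketch (reusing pgr from above; gr_fe2 is just an illustrative name):

> gr_fe2 <- plm(invest ~ value + capital, data = pgr,
+   effect = "twoways", model = "within")   # two-way fixed effects
> summary(fixef(gr_fe))                     # individual effects of the one-way model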


One-way panel regression

It is also possible to fit a random-effects version of (10) using the same fitting function upon setting model = "random" and selecting a method for estimating the variance components.

Four methods are available: Swamy-Arora, Amemiya, Wallace-Hussain, and Nerlove.

> gr_re <- plm(invest ~ value + capital, data = pgr,
+   model = "random", random.method = "walhus")
> summary(gr_re)


One-way panel regression


Oneway (individual) effect Random Effect Model
   (Wallace-Hussain's transformation)

Call:
plm(formula = invest ~ value + capital, data = pgr, model = "random",
    random.method = "walhus")

Balanced Panel: n=3, T=20, N=60

Effects:
                  var std.dev share
idiosyncratic 4389.31   66.25 0.352
individual    8079.74   89.89 0.648
theta: 0.8374

Residuals :
   Min. 1st Qu.  Median 3rd Qu.    Max.
-187.00  -32.90    6.96   31.40  210.00

Coefficients :
               Estimate  Std. Error t-value  Pr(>|t|)
(Intercept) -109.976572   61.701384 -1.7824   0.08001 .
value          0.104280    0.014996  6.9539 3.797e-09 ***
capital        0.344784    0.024520 14.0613 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Total Sum of Squares:    1988300
Residual Sum of Squares: 257520
R-Squared      : 0.87048
Adj. R-Squared : 0.82696
F-statistic: 191.545 on 2 and 57 DF, p-value: < 2.22e-16

One-way panel regression


A comparison of the regression coefficients shows that fixed- and random-effects methods yield rather similar results for these data.

To check whether the random effects are really needed, a Lagrange multiplier test is available in plmtest(), defaulting to the test proposed by Honda (1985).

> plmtest(gr_pool)
Lagrange Multiplier Test - (Honda)

data: invest ~ value + capital
normal = 15.4704, p-value < 2.2e-16
alternative hypothesis: significant effects


One-way panel regression


Random-effects methods are more efficient than the fixed-effects estimator under more restrictive assumptions, namely exogeneity of the individual effects. It is therefore important to test for endogeneity, and the standard approach employs a Hausman test. The relevant function phtest() requires two panel regression objects, in our case yielding:

> phtest(gr_re, gr_fe)
Hausman Test

data: invest ~ value + capital
chisq = 0.0404, df = 2, p-value = 0.98
alternative hypothesis: one model is inconsistent

In line with the rather similar estimates presented above, endogeneity does not appear to be a problem here.

Dynamic linear models

To conclude this section, we present a more advanced example, the dynamic panel data model

    y_it = Σ_{j=1}^{p} ρ_j y_{i,t-j} + x_it^T β + u_it,   u_it = α_i + λ_t + ν_it.    (11)

This is estimated by the method of Arellano and Bond (1991), viz. a generalized method of moments (GMM) estimator utilizing lagged endogenous regressors after a first-differences transformation.


Dynamic linear models

> data("EmplUK", package = "plm")


> form <- log(emp) ~ log(wage) + log(capital) + log(output)
# The function providing the Arellano-Bond estimator
is pgmm().
>
+
+
+

empl_ab <- pgmm(dynformula(form, list(2, 1, 0, 1)),


data = EmplUK, index = c("firm", "year"),
effect = "twoways", model = "twosteps",
gmm.inst = ~ log(emp), lag.gmm = list(c(2, 99)))


Dynamic linear models


> summary(empl_ab)
Twoways effects Two steps model

Call:
pgmm(formula = dynformula(form, list(2, 1, 0, 1)), data = EmplUK,
    effect = "twoways", model = "twosteps", index = c("firm",
    "year"), ... = list(gmm.inst = ~log(emp), lag.gmm = list(c(2,
    99))))

Unbalanced Panel: n=140, T=7-9, N=1031

Number of Observations Used: 611

Residuals
      Min.    1st Qu.     Median       Mean    3rd Qu.       Max.
-0.6191000 -0.0255700  0.0000000 -0.0001339  0.0332000  0.6410000

Coefficients
                         Estimate Std. Error  z-value  Pr(>|z|)
lag(log(emp), c(1, 2))1  0.474151   0.085303   5.5584 2.722e-08 ***
lag(log(emp), c(1, 2))2 -0.052967   0.027284  -1.9413 0.0522200 .
log(wage)               -0.513205   0.049345 -10.4003 < 2.2e-16 ***
lag(log(wage), 1)        0.224640   0.080063   2.8058 0.0050192 **
log(capital)             0.292723   0.039463   7.4177 1.191e-13 ***
log(output)              0.609775   0.108524   5.6188 1.923e-08 ***
lag(log(output), 1)     -0.446373   0.124815  -3.5763 0.0003485 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Dynamic linear models

Sargan Test: chisq(25) = 30.11247 (p.value=0.22011)
Autocorrelation test (1): normal = -2.427829 (p.value=0.0075948)
Autocorrelation test (2): normal = -0.3325401 (p.value=0.36974)
Wald test for coefficients: chisq(7) = 371.9877 (p.value=< 2.22e-16)
Wald test for time dummies: chisq(6) = 26.9045 (p.value=0.0001509)

The results suggest that autoregressive dynamics are important for these data.


Part III
Regression diagnostics


Review

> data("Journals")
> journals <- Journals[, c("subs", "price")]
> journals$citeprice <- Journals$price/Journals$citations
> journals$age <- 2000 - Journals$foundingyear
> jour_lm <- lm(log(subs) ~ log(citeprice), data = journals)


Testing for heteroskedasticity

For cross-section regressions, the assumption Var(ε_i | x_i) = σ² is typically in doubt. A popular test for checking this assumption is the Breusch-Pagan test (Breusch and Pagan 1979).

For our model fitted to the journals data, stored in jour_lm, the diagnostic plots shown earlier suggest that the variance decreases with the fitted values or, equivalently, increases with the price per citation. Hence, the regressor log(citeprice) used in the main model should also be employed in the auxiliary regression.

Under H₀, the test statistic of the Breusch-Pagan test approximately follows a χ²(q) distribution, where q is the number of regressors in the auxiliary regression (excluding the constant term).
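
In fact, the studentized statistic is simply n·R² from the auxiliary regression of the squared residuals on the chosen regressors; a minimal sketch by hand, which should reproduce the bptest() result below:

> aux <- lm(I(residuals(jour_lm)^2) ~ log(citeprice), data = journals)
> bp <- nrow(journals) * summary(aux)$r.squared   # studentized BP statistic
> pchisq(bp, df = 1, lower.tail = FALSE)          # chi-squared(1) p-value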


Testing for heteroskedasticity


The function bptest() implements several flavors of the Breusch-Pagan test. By default, it computes the studentized statistic for the auxiliary regression utilizing the original regressors X.

> bptest(jour_lm)
        studentized Breusch-Pagan test

data:  jour_lm
BP = 9.803, df = 1, p-value = 0.001742

Alternatively, the White test also picks up heteroskedasticity; it uses the original regressors as well as their squares and interactions in the auxiliary regression, which can be passed as a second formula to bptest():

> bptest(jour_lm, ~ log(citeprice) + I(log(citeprice)^2),
+   data = journals)
        studentized Breusch-Pagan test

data:  jour_lm
BP = 10.912, df = 2, p-value = 0.004271

Testing the functional form


The assumption E(ε | X) = 0 is crucial for consistency of the least-squares estimator. A typical source of violation of this assumption is a misspecification of the functional form, e.g., by omitting relevant variables. One strategy for testing the functional form is to construct auxiliary variables and assess their significance using a simple F test. This is what Ramsey's RESET does.

The function resettest() defaults to using second and third powers of the fitted values as auxiliary variables.

> resettest(jour_lm)
        RESET test

data:  jour_lm
RESET = 1.4409, df1 = 2, df2 = 176, p-value = 0.2395
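
resettest() can alternatively use powers of the original regressors or of the first principal component as auxiliary variables; a minimal sketch of the regressor-based variant:

> resettest(jour_lm, power = 2:3, type = "regressor")  # RESET with squares and cubes of log(citeprice)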


Testing the functional form

The rainbow test (Utts 1982) takes a different approach to testing the functional form. It fits a model to a subsample (typically the middle 50%) and compares it with the model fitted to the full sample using an F test. Here, the observations are ordered by the age of the journal:

> raintest(jour_lm, order.by = ~ age, data = journals)
        Rainbow test

data:  jour_lm
Rain = 1.774, df1 = 90, df2 = 88, p-value = 0.003741

The null hypothesis is clearly rejected, signaling that the relationship between the number of subscriptions and the price per citation also depends on the age of the journal.


Testing for autocorrelation


Let us consider a distributed lag model for the US consumption function:

> library("dynlm")
> data("USMacroG")
> consump1 <- dynlm(consumption ~ dpi + L(dpi), data = USMacroG)

A classical procedure for assessing autocorrelation in regression relationships is the Durbin-Watson test (Durbin and Watson 1950).

> dwtest(consump1)
        Durbin-Watson test

data:  consump1
DW = 0.0866, p-value < 2.2e-16
alternative hypothesis: true autocorrelation is greater than 0

Further tests for autocorrelation are the Box-Pierce test and the Ljung-Box test, both implemented in the function Box.test() in base R.

> Box.test(residuals(consump1), type = "Ljung-Box")
        Box-Ljung test

data:  residuals(consump1)

That's all for today

Thank You....

