
Chapter 14, 15, 16

Time Series
Regression


1. Time Series Data: What's Different?


Time series data are data collected on the same observational unit at multiple time periods:
Yt = β0 + β1X1t + β2X2t + ut
Aggregate consumption and GDP for a country (for
example, 20 years of quarterly observations = 80
observations)
Yen/$, pound/$ and Euro/$ exchange rates (daily
data for 1 year = 365 observations)
Cigarette consumption per capita in California, by
year (annual data)

Some monthly U.S. macro and financial time series

Logarithm:


Monthly Percentage Change



Some uses of time series data


Forecasting (SW Ch. 14) - covered in a separate class, Econ 373
Estimation of dynamic causal effects (SW Ch. 15)
If the Fed increases the Federal Funds rate now,
what will be the effect on the rates of inflation and
unemployment in 3 months? in 12 months?
What is the effect over time on cigarette
consumption of a hike in the cigarette tax?
Modeling risks, which is used in financial markets
(one aspect of this, modeling changing variances
and volatility clustering, is discussed in SW Ch.
16)

Time series data raise new technical issues

Time lags
Correlation over time (serial correlation, a.k.a. autocorrelation, which we also encounter in panel data)
Calculation of standard errors when the errors are serially correlated
A good way to learn about time series data is to investigate it yourself! A great source for U.S. macro time series data, and some international data, is the Federal Reserve Bank of St. Louis's FRED database.

3. Time Series Data

Time series basics:


A. Notation
B. Lags, first differences, and growth rates
C. Autocorrelation (serial correlation)
D. Stationarity.


A. Notation
Yt = value of Y in period t.
Data set: {Y1, …, YT} are T observations on the time series variable Y
We consider only consecutive, evenly-spaced observations (for example, monthly, 1960 to 1999, no missing months)
Missing and unevenly spaced data introduce technical complications


B. Lags, first differences, and growth rates


3. Time Series Data

Time series basics:


A. Notation
B. Lags, first differences, and growth rates
C. Autocorrelation (serial correlation)
D. Stationarity.


AUTOCORRELATION
(Serial Correlation):
treated following the same logic as heteroskedasticity in the multiple regression model

C. Autocorrelation (serial correlation)


The correlation of a series Yt with its own lagged values is called autocorrelation or serial correlation.
The first autocovariance of Yt is cov(Yt, Yt-1)
The first autocorrelation of Yt is corr(Yt, Yt-1)
Thus
corr(Yt, Yt-1) = cov(Yt, Yt-1) / sqrt[var(Yt) var(Yt-1)] = ρ1
These are population correlations - they describe the population joint distribution of (Yt, Yt-1)
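A quick way to compute sample autocovariances and autocorrelations is sketched below in Python (a minimal illustration, assuming pandas is available; the series y is simulated here purely for the example):

import numpy as np
import pandas as pd

# simulated series, purely for illustration
rng = np.random.default_rng(0)
y = pd.Series(rng.normal(size=200)).cumsum()

rho1 = y.autocorr(lag=1)     # first sample autocorrelation, corr(Yt, Yt-1)
cov1 = y.cov(y.shift(1))     # first sample autocovariance, cov(Yt, Yt-1)
print(rho1, cov1)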

Pure Auto/Serial Correlation


Pure serial correlation occurs when the assumption of uncorrelated observations of the error term is violated (in a correctly specified equation!)
The most commonly assumed kind of serial correlation is first-order serial correlation, in which the current value of the error term is a function of the previous value of the error term:
εt = ρεt-1 + ut (9.1)
where: ε = the error term of the equation in question
ρ = the first-order autocorrelation coefficient
u = a classical (not serially correlated) error term

Pure Serial Correlation (cont.)

εt = ρεt-1 + ut
The magnitude of ρ indicates the strength of the serial correlation:
If ρ is zero, there is no serial correlation
As ρ approaches one in absolute value, the previous observation of the error term becomes more important in determining the current value of εt, and a high degree of serial correlation exists
For ρ to exceed one in absolute value is unreasonable, since the error term effectively would explode
As a result of this, we can state that:
-1 < ρ < +1 (9.2)

Pure Serial Correlation (cont.)


The sign of ρ indicates the nature of the serial correlation in an equation:
Positive:
implies that the error term tends to have the same sign from one time period to the next
this is called positive serial correlation
Negative:
implies that the error term has a tendency to switch signs from negative to positive and back again in consecutive observations
this is called negative serial correlation
Figures 9.1-9.3 illustrate several different scenarios

Positive or Negative Serial Correlation? (figure)

Figure 9.1b: Positive Serial Correlation

Figure 9.2: No Serial Correlation

Positive or Negative Serial Correlation? (figure)

Positive or Negative Serial Correlation? (figure)

Impure Serial Correlation


Impure serial correlation is serial correlation that is caused
by a specification error such as:
an omitted variable and/or
an incorrect functional form
How does this happen? Just as with heteroskedasticity in cross-sectional data
As an example, suppose that the true equation is:
Yt = β0 + β1X1t + β2X2t + εt (9.3)
where εt is a classical error term. As learned, if X2 is accidentally omitted from the equation (or if data for X2 are unavailable), then:
Yt = β0 + β1X1t + εt*, where εt* = εt + β2X2t (9.4)


Impure Serial Correlation (Omitted Variable)

Instead, the error term is also a function of one of the explanatory variables, X2
As a result, the new error term, ε*, can be serially correlated even if the true error term, ε, is not
In particular, the new error term will tend to be serially correlated when:
1. X2 itself is serially correlated (this is quite likely in a time series) and
2. the size of εt is small compared to the size of β2X2t
Figure 9.4 illustrates 1., for the case of U.S. disposable income

U.S. Disposable Income as a Function of Time (figure)

Impure Serial Correlation


(Incorrect Functional Form - IFF)
Turn now to the case of impure serial correlation caused by an incorrect functional form
Suppose that the true equation is polynomial in nature: (9.7)
but that instead a linear regression is run: (9.8)
The new error term ε* is now a function of the true error term ε and of the differences between the linear and the polynomial functional forms
Figure 9.5 illustrates how these differences often follow autoregressive patterns
Figure 9.5a: Incorrect Functional Form as a Source of Impure Serial Correlation

Incorrect Functional Form as a Source of Impure Serial Correlation (figure)

The Consequences of Serial Correlation
The existence of serial correlation in the error term leads the estimation of the equation with OLS to have at least three consequences:
1. Pure serial correlation does not cause bias in the coefficient estimates
2. Serial correlation causes OLS to no longer be the minimum variance estimator (of all the linear unbiased estimators): So what doesn't it minimize anymore? R2?
3. Serial correlation causes the OLS estimates of the SE to be biased, leading to unreliable hypothesis testing. Typically the bias in the SE estimate is negative, meaning that OLS underestimates the standard errors of the coefficients (and thus overestimates the t-scores). How does this compare to heteroskedasticity?

The Durbin-Watson d Test

Two main ways to detect serial correlation:
Informal: observing a pattern in the residuals, like we did in the figures
Formal: testing for serial correlation using the Durbin-Watson d test
We will now go through the second of these in detail
First, it is important to note that the Durbin-Watson d test is only applicable if the following three assumptions are met:
1. The model includes an intercept term: Yt = β1X1t + β2X2t + εt is NOT ok
2. The serial correlation is first-order in nature:
εt = ρεt-1 + ut, where ρ is the autocorrelation coefficient and u is a classical (normally distributed) error term
3. The regression model does not include a lagged dependent variable as an independent variable:
Yt = β0 + β1X1t + β2X2t + β3Yt-1 + ut is NOT ok

The Durbin-Watson d Test (cont.)
The equation for the Durbin-Watson d statistic for T observations is:
d = Σ(et - et-1)² / Σ et², summing the numerator from t = 2 to T and the denominator from t = 1 to T (9.10)
where the et's are the OLS residuals
There are three main cases:
1. Extreme positive serial correlation: d ≈ 0
2. Extreme negative serial correlation: d ≈ 4
3. No serial correlation: d ≈ 2
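As a sketch of how Equation 9.10 can be computed in practice (Python with statsmodels assumed; the data are simulated with AR(1) errors purely for illustration):

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(1)
T = 100
X = rng.normal(size=T)
eps = np.zeros(T)
for t in range(1, T):
    eps[t] = 0.6 * eps[t - 1] + rng.normal()   # AR(1) errors with rho = 0.6
y = 2.0 + 1.5 * X + eps

res = sm.OLS(y, sm.add_constant(X)).fit()
e = res.resid
d_manual = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)  # Equation 9.10
d_pkg = durbin_watson(e)                              # same statistic from statsmodels
print(d_manual, d_pkg)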

The Durbin-Watson d Test (cont.)
To test for positive (note that we rarely, if ever, test for negative!) serial correlation, the following steps are required:
1. Obtain the OLS residuals from the equation to be tested and calculate the d statistic by using Equation 9.10
2. Determine the sample size and the number of explanatory variables and then consult a statistical table to find the upper critical d value, dU, and the lower critical d value, dL, respectively

The Durbin-Watson d Test (cont.)
3. Set up the test hypotheses and decision rule:
H0: ρ ≤ 0 (no positive serial correlation)
HA: ρ > 0 (positive serial correlation)
if d < dL          Reject H0
if d > dU          Do not reject H0
if dL ≤ d ≤ dU     Inconclusive
In rare circumstances, perhaps first-differenced equations, a two-sided d test might be appropriate
In such a case, steps 1 and 2 are still used, but step 3 is now:

The Durbin-Watson d Test (cont.)
3. Set up the test hypotheses and decision rule:
H0: ρ = 0 (no serial correlation)
HA: ρ ≠ 0 (serial correlation)
if d < dL              Reject H0
if d > 4 - dL          Reject H0
if 4 - dU > d > dU     Do not reject H0
Otherwise              Inconclusive
Figure 9.6 gives an example of a one-sided Durbin-Watson d test

Figure 9.6: An Example of a One-Sided Durbin-Watson d Test

https://www3.nd.edu/~wevans1/econ30331/Durbin_Watson_tables.pdf

More in class practice and examples


A farmers' association hires you to predict inches of growth for corn as a function of rain on a monthly basis (they provide you with the data they have been collecting for the past 14 months). You estimate the model:
InGrwtht = β0 + β1InRaint + β2Tempt + ut
1. What sign do you expect each coefficient to have?
2. Your results are:
InGrwtht = 1.2 + .07 InRaint + .03 Tempt,  R2 = .48
(standard errors: .07, .003, .02)
Which coefficients are significant?
3. Interpret in words the findings for your employer.

More in class practice and examples


A farmers' association hires you to predict inches of growth for corn as a function of rain on a monthly basis (they provide you with the data they have been collecting for the past 14 months). You estimate the model:
InGrwtht = 1.2 + .07 InRaint + .03 Tempt,  R2 = .48
(standard errors: .07, .003, .02)
4. How would you test whether your model suffers from serial correlation?
5. You run the DW test and find: d = 2.8. Do you have serial correlation?

More in class practice and examples


5. You run the DW test and find: d = 2.8. Do you have serial correlation?


Remedy 1: Generalized Least Squares
Start with an equation that has first-order serial correlation:
Yt = β0 + β1X1t + εt (9.15)
Which, if εt = ρεt-1 + ut (due to pure serial correlation), also equals:
Yt = β0 + β1X1t + ρεt-1 + ut (9.16)
Multiply Equation 9.15 by ρ and then lag the new equation by one period, obtaining:
ρYt-1 = ρβ0 + ρβ1X1t-1 + ρεt-1 (9.17)


Generalized Least Squares (cont.)
Next, subtract Equation 9.17 from Equation 9.16, obtaining:
Yt - ρYt-1 = β0(1 - ρ) + β1(X1t - ρX1t-1) + ut (9.18)
Finally, rewrite Equation 9.18 as:
Yt* = β0* + β1X1t* + ut (9.19)
where Yt* = Yt - ρYt-1, X1t* = X1t - ρX1t-1, and β0* = β0(1 - ρ) (9.20)


Generalized Least Squares


Equation 9.19 is called a Generalized Least Squares (or quasi-differenced) version of Equation 9.16. Notice that:
1. The error term is not serially correlated
a. As a result, OLS estimation of Equation 9.19 will be minimum variance
b. This is true if we know ρ or if we accurately estimate ρ
2. The slope coefficient β1 is the same as the slope coefficient of the original serially correlated equation, Equation 9.16. Thus coefficients estimated with GLS have the same meaning as those estimated with OLS.

Generalized Least Squares


3. The dependent variable has changed compared to that in Equation 9.16:
This means that the GLS R2 is not directly comparable to the OLS R2.
4. To forecast with GLS, adjustments discussed later are required
Unfortunately, we cannot use OLS directly to estimate a GLS model, because ρ is unknown and the GLS equation is inherently nonlinear in the coefficients
Fortunately, there are at least two other methods available:

1. The Cochrane-Orcutt Method
This is a two-step iterative technique that first produces an estimate of ρ and then estimates the GLS equation using that estimate.
The two steps are:
1. Estimate ρ by running a regression based on the residuals of the equation suspected of having serial correlation:
et = ρet-1 + ut (9.21)
where the et's are the OLS residuals from the equation suspected of having pure serial correlation and ut is a classical error term
2. Use this ρ̂ to estimate the GLS equation by substituting ρ̂ into Equation 9.18 and using OLS to estimate Equation 9.18 with the adjusted data
These two steps are repeated (iterated) until further iteration results in little change in ρ̂
Once ρ̂ has converged (usually in just a few iterations), the last estimate of step 2 is used as a final estimate of Equation 9.18
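A hedged sketch of this kind of iteration in Python uses statsmodels' GLSAR, which alternates between estimating rho from the residuals and re-estimating the quasi-differenced equation (the data below are simulated purely for illustration):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
T = 200
X = rng.normal(size=T)
eps = np.zeros(T)
for t in range(1, T):
    eps[t] = 0.7 * eps[t - 1] + rng.normal()   # pure first-order serial correlation
y = 1.0 + 0.5 * X + eps

exog = sm.add_constant(X)
model = sm.GLSAR(y, exog, rho=1)               # rho=1 -> one AR lag in the error term
results = model.iterative_fit(maxiter=10)      # iterate: estimate rho, re-fit, repeat
print(model.rho, results.params)               # estimated rho and GLS coefficients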

2. The AR(1) Method
The AR(1) method estimates a GLS equation like Equation 9.18 by estimating β0, β1, and ρ simultaneously with iterative nonlinear regression techniques (that are well beyond the scope of this class!)
The AR(1) method tends to produce the same coefficient estimates as Cochrane-Orcutt
However, the estimated standard errors are smaller
This is why the AR(1) approach is recommended as long as your software can support such nonlinear regression

Remedies for Serial Correlation
The place to start in correcting a serial correlation problem is to consider whether a specification error might be causing impure serial correlation
Remember we said there are two main remedies for pure serial correlation:
1. Generalized Least Squares - we just learned it
2. Newey-West standard errors - what is this? And when would we use this instead of GLS? Next!


Remedy 2: Newey-West Standard Errors
Not all corrections for pure serial correlation involve Generalized Least Squares (GLS does not do well in small samples)
Newey-West standard errors take account of serial correlation by correcting the standard errors without changing the estimated coefficients
The logic behind Newey-West standard errors is powerful:
If serial correlation does not cause bias in the estimated coefficients but does impact the standard errors, then it makes sense to adjust the estimated equation in a way that changes the standard errors but not the coefficients

Newey-West Standard Errors (cont.)
The Newey-West SEs are biased but generally more accurate than uncorrected standard errors for large samples in the face of serial correlation
As a result, Newey-West standard errors can be used for t-tests and other hypothesis tests in most samples without the errors of inference potentially caused by serial correlation
Typically, Newey-West SEs are larger than OLS SEs, thus producing lower t-scores
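A minimal sketch of computing Newey-West (HAC) standard errors in Python with statsmodels; the coefficients are identical to OLS, only the SEs change (data simulated for illustration; the choice of maxlags is an assumption the user must make):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
T = 150
X = rng.normal(size=T)
eps = np.zeros(T)
for t in range(1, T):
    eps[t] = 0.6 * eps[t - 1] + rng.normal()
y = 1.0 + 0.8 * X + eps
exog = sm.add_constant(X)

ols = sm.OLS(y, exog).fit()
nw = sm.OLS(y, exog).fit(cov_type="HAC", cov_kwds={"maxlags": 4})  # Newey-West SEs
print(ols.params, nw.params)   # identical coefficient estimates
print(ols.bse, nw.bse)         # only the standard errors differ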


DYNAMIC MODELS


Dynamic Models: Distributed Lag Models
An (ad hoc) distributed lag model explains the current value of Y as a function of current and past values of X, thus distributing the impact of X over a number of time periods
For example, we might be interested in the impact of a change in the money supply (X) on GDP (Y) and model this as:
Yt = α0 + β0Xt + β1Xt-1 + β2Xt-2 + ... + βpXt-p + εt (12.2)
Or, in our example:
GDPt = α0 + β0MSt + β1MSt-1 + β2MSt-2 + ... + βpMSt-p + εt


Dynamic Models: Distributed Lag Models
We might be interested in the impact of a change in the money supply (X) on GDP (Y) and model this as:
Yt = α0 + β0Xt + β1Xt-1 + β2Xt-2 + ... + βpXt-p + εt (12.2)
Or, in our example:
GDPt = α0 + β0MSt + β1MSt-1 + β2MSt-2 + ... + βpMSt-p + εt
If we estimate such a model, what would we expect?


What Is a Dynamic Model? (DLM)
Yt = α0 + β0Xt + β1Xt-1 + β2Xt-2 + ... + βpXt-p + εt (12.2)
where: β1 = λβ0
β2 = λ²β0
β3 = λ³β0
.
.
βp = λ^p β0 (12.8)
As long as λ is between 0 and 1, these coefficients will indeed smoothly decline, as shown in Figure 12.1

Figure 12.1: Geometric Weighting Schemes for Various Dynamic Models


Potential issues from estimating Equation 12.2 with OLS:
Yt = α0 + β0Xt + β1Xt-1 + β2Xt-2 + ... + βpXt-p + εt (12.2)
GDPt = α0 + β0MSt + β1MSt-1 + β2MSt-2 + ... + βpMSt-p + εt
1. The various lagged values of X are likely to be severely multicollinear, making coefficient estimates imprecise
There is no guarantee that the estimated coefficients will follow the smoothly declining pattern that economic theory would suggest
Instead, it's quite typical to get something like:


Potential issues from estimating Equation 12.2 with OLS:
Yt = α0 + β0Xt + β1Xt-1 + β2Xt-2 + ... + βpXt-p + εt (12.2)
GDPt = α0 + β0MSt + β1MSt-1 + β2MSt-2 + ... + βpMSt-p + εt
2. The degrees of freedom tend to decrease, sometimes substantially, since we have to:
estimate a coefficient for each lagged X, thus increasing K and lowering the degrees of freedom (N - K - 1)
decrease the sample size by one for each lagged X, thus lowering the number of observations, N, and therefore the degrees of freedom (unless data for lagged Xs outside the sample are available)

If Ad Hoc Distributed Lag Models
Yt = α0 + β0Xt + β1Xt-1 + β2Xt-2 + ... + βpXt-p + εt
GDPt = α0 + β0MSt + β1MSt-1 + β2MSt-2 + ... + βpMSt-p + εt
have all these problems, how can we still correctly estimate, say, the impact of a change in the money supply on GDP?

Ad Hoc DLM problem resolution
Because of the aforementioned problems with an Ad Hoc Distributed Lag Model:
Yt = α0 + β0Xt + β1Xt-1 + β2Xt-2 + ... + βpXt-p + εt
we always want to rewrite it as
Yt = α0 + β0Xt + λYt-1 + ut (12.3)
GDPt = α0 + β0MSt + λGDPt-1 + ut
Note that Y is on the left-hand side as Yt, and on the right-hand side as Yt-1
It's this difference in time period that makes the equation dynamic

What Is a Dynamic Model?
The simplest dynamic model is:
Yt = α0 + β0Xt + λYt-1 + ut (12.3)
GDPt = α0 + β0MSt + λGDPt-1 + ut
Note that Y is on the left-hand side as Yt, and on the right-hand side as Yt-1
It's this difference in time period that makes the equation dynamic


Serial Correlation and Dynamic Models
Dynamic models:
Now serial correlation causes bias in the coefficient estimates produced by OLS
Yt = α0 + β0Xt + λYt-1 + ut
GDPt = α0 + β0MSt + λGDPt-1 + ut
Can we use the Durbin-Watson d test to detect this? Why or why not?

Testing for Serial Correlation in Dynamic Models
Yt = α0 + β0Xt + λYt-1 + ut
Using the Lagrange Multiplier to test for serial correlation in a typical dynamic model involves three steps:
1. Obtain the residuals of the estimated equation:
et = Yt - α̂0 - β̂0Xt - λ̂Yt-1
2. Use these residuals as the dependent variable in an auxiliary regression that includes as independent variables all those on the right-hand side of the original equation as well as the lagged residuals:
et = a0 + a1Xt + a2Yt-1 + a3et-1 + ut (12.18)

Testing for Serial Correlation in Dynamic Models
3. Estimate Equation 12.18 using OLS and then test the null hypothesis that a3 = 0 with the following test statistic:
LM = N*R2 (12.19)
where: N = the sample size
R2 = the unadjusted coefficient of determination of the auxiliary regression
For large samples, LM has a chi-square distribution with degrees of freedom equal to the number of restrictions in the null hypothesis (in this case, one).
If LM is greater than the critical chi-square value from the corresponding statistical table, then we reject the null hypothesis that a3 = 0 and conclude that there is indeed serial correlation in the original equation
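A sketch of this LM procedure for a dynamic model with one lag of Y (this is the Breusch-Godfrey test; the data are simulated for illustration, and statsmodels' packaged version is shown for comparison):

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

rng = np.random.default_rng(4)
T = 200
x = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 1.0 + 0.5 * x[t] + 0.4 * y[t - 1] + rng.normal()

# dynamic model: regress Y_t on X_t and Y_{t-1}
X = sm.add_constant(np.column_stack([x[1:], y[:-1]]))
res = sm.OLS(y[1:], X).fit()
e = res.resid

# auxiliary regression: e_t on X_t, Y_{t-1}, and e_{t-1}
aux_X = sm.add_constant(np.column_stack([x[2:], y[1:-1], e[:-1]]))
aux = sm.OLS(e[1:], aux_X).fit()
LM = aux.nobs * aux.rsquared                     # LM = N * R^2, chi-square(1) under H0
print(LM)
print(acorr_breusch_godfrey(res, nlags=1)[:2])   # packaged LM statistic and p-value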

More in class practice and examples


A farmers' association hires you to predict inches of growth for corn as a function of rain on a monthly basis (they provide you with the data they have been collecting for the past 14 months). You estimate the model:
InGrwtht = β0 + β1InRaint + β2Tempt + β3InGrwtht-1 + ut
1. What sign do you expect each coefficient to have?
2. Your results are:
InGrwtht = 1.3 + .11 InRaint + .19 Tempt - .01 InGrwtht-1,  R2 = .48
(standard errors: .07, .003, .02, .003)
Which coefficients are significant?
3. Was introducing the lag of the dependent variable a good idea or should you remove it?

More in class practice and examples


A farmers' association hires you to predict inches of growth for corn as a function of rain on a monthly basis (they provide you with the data they have been collecting for the past 14 months). You estimate the model:
InGrwtht = 1.3 + .11 InRaint + .19 Tempt - .01 InGrwtht-1,  R2 = .48
(standard errors: .07, .003, .02, .003)
4. Interpret in words the findings for your employer.
5. How would you test whether your model suffers from serial correlation?
6. You run the LM test and find: LM = _____
7. Do you have serial correlation?

More in class practice and examples


A farmers' association hires you to predict inches of growth for corn as a function of rain on a monthly basis (they provide you with the data they have been collecting for the past 14 months). You estimate the model:
InGrwtht = 1.3 + .11 InRaint + .19 Tempt - .01 InGrwtht-1,  R2 = .48
(standard errors: .07, .003, .02, .003)
6. You run the LM test and find: LM = N*R2 = 6.72
7. Do you have serial correlation? Compare LM to the chi-square critical value.


Correcting for Serial Correlation in Dynamic Models
There are essentially three strategies for attempting to rid a dynamic model of serial correlation:
improving the specification:
Only relevant if the serial correlation is impure
instrumental variables:
substituting an instrument (a variable that is highly correlated with Yt-1 but is uncorrelated with ut) for Yt-1 in the original equation effectively eliminates the correlation between Yt-1 and ut
Problem: good instruments are hard to come by (more in Ch 12)
modified GLS:
Technique similar to the GLS procedure we learned
Potential issues: sample must be large and the standard


Then, are Ad Hoc Distributed Lag Models
Yt = α0 + β0Xt + β1Xt-1 + β2Xt-2 + ... + βpXt-p + εt
GDPt = α0 + β0MSt + β1MSt-1 + β2MSt-2 + ... + βpMSt-p + εt
useless, or do they offer any information to a researcher?
They can tell us if one time-series variable consistently and predictably changes before another one.

Granger Causality
Granger causality, or precedence, is a circumstance in which one time series variable consistently and predictably changes before another variable
A word of caution: even if one variable precedes (Granger-causes) another, this does not mean that the first variable causes the other to change
There are several tests for Granger causality
They all involve distributed lag models in one form or another, however
We'll discuss an expanded version of a test originally developed by Granger


Granger Causality (cont.)
Granger suggested that to see if A Granger-caused Y, we should run:
Yt = β0 + β1Yt-1 + ... + βpYt-p + α1At-1 + ... + αpAt-p + εt (12.20)
and test the null hypothesis that the coefficients of the lagged As (the αs) jointly equal zero
If we can reject this null hypothesis using the F-test, then we have evidence that A Granger-causes Y
Note that if p = 1, Equation 12.20 is similar to the dynamic model, Equation 12.3: Yt = α0 + β0Xt + λYt-1 + ut
Applications of this test involve running two Granger tests, one in each direction

Granger Causality (cont.)
That is, run Equation 12.20:
Yt = β0 + β1Yt-1 + ... + βpYt-p + α1At-1 + ... + αpAt-p + εt (12.20)
and also run:
At = β0 + β1At-1 + ... + βpAt-p + α1Yt-1 + ... + αpYt-p + εt (12.21)
testing for Granger causality in both directions by testing the null hypothesis that the coefficients of the lagged Ys (again, the αs) jointly equal zero
If the F-test is significant for Equation 12.20 but not for Equation 12.21, then we can conclude that A Granger-causes Y
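A sketch of running the two Granger tests in Python (statsmodels' grangercausalitytests reports the F-test on the lagged terms; the data are simulated for illustration, and the lag length of 2 is an arbitrary choice):

import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(5)
T = 300
a = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    a[t] = 0.5 * a[t - 1] + rng.normal()
    y[t] = 0.3 * y[t - 1] + 0.4 * a[t - 1] + rng.normal()   # A leads Y by construction

data = pd.DataFrame({"y": y, "a": a})
# convention: the test asks whether the SECOND column Granger-causes the FIRST
grangercausalitytests(data[["y", "a"]], maxlag=2)   # does A Granger-cause Y?
grangercausalitytests(data[["a", "y"]], maxlag=2)   # does Y Granger-cause A?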

3. Time Series Data

Time series basics:


A. Notation
B. Lags, first differences, and growth rates
C. Autocorrelation (serial correlation)
D. Stationarity.


STATIONARITY


Spurious Correlation and Nonstationarity
Independent variables can appear to be more significant than they actually are if they have the same underlying trend as the dependent variable
Example: In a country with rampant inflation, almost any nominal variable will appear to be highly correlated with all other nominal variables
Why? Nominal variables are unadjusted for inflation, so every nominal variable will have a powerful inflationary component
Such a problem is an example of spurious correlation: a strong relationship between two or more variables that is not caused by a real underlying causal relationship
If you run a regression in which the dependent variable and one or more independent variables are spuriously correlated, the result is a spurious regression, and the t-scores and overall fit of such spurious regressions are likely to be overstated and untrustworthy


What is a main cause of spurious correlation?
NONSTATIONARY TIME SERIES
Let's see what that means and how we can correct for it


Spurious Correlation and Nonstationarity
Independent variables can appear to be more significant than they actually are if they have the same underlying trend as the dependent variable
Such a problem is an example of spurious correlation: a strong relationship between two or more variables that is not caused by a real underlying causal relationship
If you run a regression in which the dependent variable and one or more independent variables are spuriously correlated, the result is a spurious regression
coefficients are biased: upward or downward?
the t-scores and overall fit of such spurious regressions are likely to be overstated and untrustworthy


Stationary and Nonstationary Time Series


Stationary and Nonstationary Time Series
A time-series variable, Xt, is stationary if:
1. the mean of Xt is constant over time,
2. the variance of Xt is constant over time, and
3. the simple correlation coefficient between Xt and Xt-k depends on the length of the lag (k) but on no other variable (for all k)
If one or more of these properties is not met, then Xt is nonstationary
If a series is nonstationary, that problem is often referred to as nonstationarity

Stationary and Nonstationary Time Series
A time-series variable, Xt, is stationary if:
1. the mean of Xt is constant over time,
2. the variance of Xt is constant over time, and
3. the simple correlation coefficient between Xt and Xt-k depends on the length of the lag (k) but on no other variable (for all k)
What about real per capita output?
What about the growth rate of real per capita output?

Stationary and Nonstationary Time Series
To get a better understanding of these issues, consider the case where Yt is generated by an equation that includes only past values of itself (an autoregressive equation):
Yt = γYt-1 + vt (12.22)
GDPt = γGDPt-1 + vt
where vt is a classical error term
Can we see that if |γ| < 1, then the expected value of Yt will eventually approach 0 (and therefore be stationary) as the sample size gets bigger and bigger? (Remember, since vt is a classical error term, its expected value = 0)
Similarly, can we see that if |γ| > 1, then the expected value of Yt will continuously increase, making Yt nonstationary?
This is nonstationarity due to a trend, but it still can cause spurious regression results

Stationary and Nonstationary Time Series
Most importantly, what about if |γ| = 1? In this case:
Yt = Yt-1 + vt (12.23)
GDPt = GDPt-1 + vt
This is a random walk: the expected value of Yt does not converge on any value, meaning that it is nonstationary
This circumstance, where γ = 1 in Equation 12.23 (or similar equations), is called a unit root
If a variable has a unit root, then Equation 12.23 holds, and the variable follows a random walk and is nonstationary

The Dickey-Fuller Test
From the previous discussion of stationarity and unit roots, it makes sense to estimate Equation 12.22:
Yt = γYt-1 + vt (12.22)
GDPt = γGDPt-1 + vt
and then determine if |γ| < 1 to see if Y is stationary
This is almost exactly how the Dickey-Fuller test works:
1. Subtract Yt-1 from both sides of Equation 12.22, yielding:
(Yt - Yt-1) = (γ - 1)Yt-1 + vt (12.26)

The Dickey-Fuller Test
(Yt - Yt-1) = (γ - 1)Yt-1 + vt
GDPt - GDPt-1 = (γ - 1)GDPt-1 + vt
If we define ΔYt = Yt - Yt-1, then we have the simplest form of the Dickey-Fuller test:
ΔYt = β1Yt-1 + vt (12.27)
where β1 = γ - 1
Note: alternative Dickey-Fuller tests additionally include a constant and/or a constant and a trend term
2. Set up the test hypotheses:
H0: β1 = 0 (unit root)
HA: β1 < 0 (stationary)

The Dickey-Fuller Test (cont.)
3. Set up the decision rule:
If β̂1 is statistically significantly less than 0, then we can reject the null hypothesis of nonstationarity
If β̂1 is not statistically significantly less than 0, then we cannot reject the null hypothesis of nonstationarity
Note that the standard t-table does not apply to Dickey-Fuller tests
For the case of no constant and no trend (Equation 12.27), the large-sample critical values for tc are listed on the next slide

Table 12.1: Large-Sample Critical Values for the Dickey-Fuller Test (table)

Augmented Dickey-Fuller tests: what and when

When should you include a time trend in the DF test?
The decision to use the intercept-only DF test or the intercept & trend DF test depends on what the alternative is and what the data look like.
In the intercept-only specification, the alternative is that Y is stationary around a constant - no long-term growth in the series:
ΔYt = β0 + β1Yt-1 + vt
In the intercept & trend specification, the alternative is that Y is stationary around a linear time trend - the series has long-term growth:
ΔYt = β0 + β1Yt-1 + β2t + vt
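A sketch of both specifications using statsmodels' adfuller (the series here is a simulated random walk, so the unit-root null is true by construction; the lag length is chosen automatically by AIC, an assumption of this example):

import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(6)
y = np.cumsum(rng.normal(size=300))        # random walk: H0 (unit root) is true

stat_c, pval_c, *_ = adfuller(y, regression="c")     # intercept only
stat_ct, pval_ct, *_ = adfuller(y, regression="ct")  # intercept and linear trend
print(stat_c, pval_c)
print(stat_ct, pval_ct)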

Δln(GDPt) = 0.244 + 0.0002t - 0.030 ln(GDPt-1) + 0.269 Δln(GDPt-1) + 0.178 Δln(GDPt-2)
(standard errors: 0.109, 0.0001, 0.014, 0.069, 0.070)
DF t-statistic = -2.18
Note that the standard t-table does not apply to Dickey-Fuller tests
Don't compare this to 1.96 - use the Dickey-Fuller table!


DF t-statistic = -2.18 (intercept and time trend):
t = -2.18 does not reject a unit root at the 10% level.


More in class practice and examples

Let's check if there is non-stationarity:
1. Which is the coefficient you have to test for significance?
2. Which of the three Dickey-Fuller tables would you use?
3. Do we have non-stationarity in our study or not?

Typical examples of spurious correlation
What was that again?
a strong relationship between two or more variables that is not caused by a real underlying causal relationship
What was its main cause?
Nonstationarity
Some more examples: http://www.tylervigen.com/spurious-correlations


NONSTATIONARITY AND COINTEGRATION

Cointegration
If the Dickey-Fuller test reveals nonstationarity, what should we do?
The traditional approach has been to take first differences (ΔY = Yt - Yt-1 and ΔX = Xt - Xt-1) and use them in place of Yt and Xt in the regressions
Issue: first-differencing basically throws away information about the possible equilibrium relationships between the variables
Alternatively, one might want to test whether the time series are cointegrated, which means that even though individual variables might be nonstationary, it's possible for linear combinations of nonstationary variables to be stationary

Cointegration (cont.)
To see how this works, consider Equation 12.24:
Yt = β0 + β1Xt + ut (12.24)
Assume that both Yt and Xt have a unit root
Solving Equation 12.24 for ut, we get:
ut = Yt - β0 - β1Xt (12.30)
In Equation 12.24, ut is a function of two nonstationary variables, so ut might be expected also to be nonstationary
Cointegration refers to the case where this is not so: Yt and Xt are both nonstationary, yet a linear combination of them, as given by Equation 12.24, is stationary
How does this happen? This could happen if economic theory supports Equation 12.24 as an equilibrium

Cointegration (cont.)
We thus see that if Xt and Yt are cointegrated, then OLS estimation of the coefficients in Equation 12.24 can avoid spurious results
To determine if Xt and Yt are cointegrated, we begin with OLS estimation of Equation 12.24 and calculate the OLS residuals:
ût = Yt - β̂0 - β̂1Xt (12.31)
Next, perform a Dickey-Fuller test on the residuals
Remember to use the critical values from the Dickey-Fuller table!
If we are able to reject the null hypothesis of a unit root in the residuals, we can conclude that Xt and Yt are cointegrated and our OLS estimates are not spurious
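A sketch of this Engle-Granger-style procedure in Python (statsmodels' coint runs the levels regression and applies the appropriate critical values to the residual test; the data are simulated so that Y and X share a stochastic trend):

import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import coint

rng = np.random.default_rng(7)
T = 300
x = np.cumsum(rng.normal(size=T))          # X has a unit root
y = 2.0 + 0.5 * x + rng.normal(size=T)     # Y shares X's trend -> cointegrated

# step 1 by hand: levels regression and its residuals
res = sm.OLS(y, sm.add_constant(x)).fit()
resid = res.resid
# step 2: unit-root test on the residuals; coint() uses the correct critical values
t_stat, p_value, crit = coint(y, x)
print(t_stat, p_value)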

A Standard Sequence of Steps for Dealing with Nonstationary Time Series
1. Specify the model (lags vs. no lags, etc.)
2. Test all variables for nonstationarity (technically, unit roots) using the appropriate version of the Dickey-Fuller test
3. If the variables don't have unit roots, estimate the equation in its original units (Y and X)
4. If the variables have unit roots, test the residuals of the equation for cointegration using the Dickey-Fuller test
5. If the variables have unit roots but are not cointegrated, then change the functional form of the model to first differences (ΔX and ΔY) and estimate the equation
6. If the variables have unit roots and also are cointegrated, then estimate the equation in its original units

More in class practice and examples


Assume we are estimating the following model:
GDPt = α0 + β0MSt + εt
1. We first check if each variable is nonstationary: How would you do that?
2. Assume we find out both are. Please write out step by step how you would check for cointegration.
3. If you find no evidence of cointegration, how can you still estimate your model correctly?

AUTOREGRESSION


4. Autoregressions (SW Section 14.3)
A natural starting point for a forecasting model is to use past values of Y (that is, Yt-1, Yt-2, …) to forecast Yt.
An autoregression is a regression model in which Yt is regressed against its own lagged values.
The number of lags used as regressors is called the order of the autoregression.
In a first-order autoregression, Yt is regressed against Yt-1.
In a pth-order autoregression, Yt is regressed against Yt-1, Yt-2, …, Yt-p.

The First-Order Autoregressive (AR(1)) Model
The population AR(1) model is
Yt = β0 + β1Yt-1 + ut
β0 and β1 do not have causal interpretations
If β1 = 0, Yt-1 is not useful for forecasting Yt
The AR(1) model can be estimated by an OLS regression of Yt against Yt-1 (mechanically, how would you run this regression?)
Testing β1 = 0 vs. β1 ≠ 0 provides a test of the hypothesis that Yt-1 is not useful for forecasting Yt
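Mechanically, the AR(1) regression can be run either as OLS of Yt on a constant and Yt-1, or with a packaged AR estimator. A minimal sketch in Python (simulated data, statsmodels assumed):

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.tsa.ar_model import AutoReg

rng = np.random.default_rng(8)
T = 200
y = np.zeros(T)
for t in range(1, T):
    y[t] = 1.0 + 0.4 * y[t - 1] + rng.normal()
y = pd.Series(y)

# by hand: regress Y_t on a constant and Y_{t-1}, dropping the first observation
ols_res = sm.OLS(y[1:], sm.add_constant(y.shift(1)[1:])).fit()

# packaged equivalent
ar_res = AutoReg(y, lags=1).fit()
print(ols_res.params)
print(ar_res.params)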

Example: AR(1) model for the growth rate of GDP
Estimated using data from 1962:Q1 - 2012:Q4:
GDPGRt = 1.991 + 0.344 GDPGRt-1,  R2 = 0.11
(standard errors: 0.349, 0.075)
Is the lagged growth rate of GDP a useful predictor of the current growth rate of GDP?
1. t = 0.344/0.075 = 4.59 > 1.96 (in absolute value)
2. Reject H0: β1 = 0 at the 5% significance level
3. Yes, the lagged growth rate of GDP is a useful predictor of the current growth rate - but the R2 is pretty low.

The AR(p) model: using multiple lags for forecasting
The pth-order autoregressive model (AR(p)) is
Yt = β0 + β1Yt-1 + β2Yt-2 + … + βpYt-p + ut
The AR(p) model uses p lags of Y as regressors
The AR(1) model is a special case
The coefficients do not have a causal interpretation
To test the hypothesis that Yt-2, …, Yt-p do not further help forecast Yt, beyond Yt-1, use an F-test
Use t- or F-tests to determine the lag order p
Or, better, determine p using an information criterion (more on this later)


Lag Length Selection Using Information Criteria
How to choose the number of lags p in an AR(p)?


AR(1) model for the growth rate of GDP
Estimated using data from 1962:Q1 - 2012:Q4:
GDPGRt = 1.991 + 0.344 GDPGRt-1,  R2 = 0.11
(standard errors: 0.349, 0.075)
Is the lagged growth rate of GDP a useful predictor of the current growth rate of GDP?
1. t = 0.344/0.075 = 4.59 > 1.96 (in absolute value)
2. Reject H0: β1 = 0 at the 5% significance level
3. Yes, the lagged growth rate of GDP is a useful predictor of the current growth rate - but the R2 is pretty low.

Example: AR(2) model for the growth rate of GDP
GDPGRt = 1.63 + 0.28 GDPGRt-1 + 0.17 GDPGRt-2,  R2 = 0.14
(standard errors: 0.40, 0.08, 0.08)
The t-statistic testing lag 2 is 2.27 (p-value = .02)
R2 increased from .11 to .14 by adding lag 2
So, lag 2 helps to predict the growth of GDP.


Lag Length Selection Using Information Criteria (SW Section 14.5)
How to choose the number of lags p in an AR(p)?
You can use sequential downward t- or F-tests, but the models chosen tend to be too large
Another, better way to determine lag lengths is to use an information criterion
Information criteria trade off bias (too few lags) vs. variance (too many lags)
Two ICs are the Bayes (BIC) and Akaike (AIC)


The Bayes Information Criterion (BIC)
BIC(p) = ln[SSR(p)/T] + (p + 1)(ln T)/T
First term: always decreasing in p (larger p, better fit)
Second term: always increasing in p.
The variance of the forecast due to estimation error increases with p, so you don't want a forecasting model with too many coefficients - but what is too many?
This term is a penalty for using more parameters and thus increasing the forecast variance.
Minimizing BIC(p) trades off bias and variance to determine a best value of p for your forecast.
The result is that p̂BIC is a consistent estimator of p: p̂BIC → p (SW, App. 14.5)


Another information criterion: the Akaike Information Criterion (AIC)
AIC(p) = ln[SSR(p)/T] + (p + 1)(2/T)
BIC(p) = ln[SSR(p)/T] + (p + 1)(ln T)/T
The penalty term is smaller for AIC than BIC (2 < ln T)
AIC estimates more lags (larger p) than the BIC
This might be desirable if you think longer lags might be important.
However, the AIC estimator of p isn't consistent - it can overestimate p - the penalty isn't big enough
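A sketch of computing BIC(p) and AIC(p) by hand for an AR(p), using a common estimation sample so the SSRs are comparable across p (simulated data; the formulas follow the definitions above):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
T = 200
y = np.zeros(T)
for t in range(2, T):
    y[t] = 0.5 * y[t - 1] + 0.2 * y[t - 2] + rng.normal()

def ar_ic(y, p, max_p=6):
    # estimate an AR(p) on the sample with the first max_p observations dropped
    yy = y[max_p:]
    n = len(yy)
    cols = [np.ones(n)] + [y[max_p - j:len(y) - j] for j in range(1, p + 1)]
    ssr = sm.OLS(yy, np.column_stack(cols)).fit().ssr
    bic = np.log(ssr / n) + (p + 1) * np.log(n) / n
    aic = np.log(ssr / n) + (p + 1) * 2 / n
    return bic, aic

for p in range(7):
    print(p, ar_ic(y, p))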

Example: AR model of GDP growth, lags 0-6:
BIC chooses 2 lags, AIC chooses 2 lags.


Example: AR model of inflation, lags 0-6:

# Lags   BIC     AIC     R2
0        1.095   1.076   0.000
1        1.067   1.030   0.056
2        0.955   0.900   0.181
3        0.957   0.884   0.203
4        0.986   0.895   0.204
5        1.016   0.906   0.204
6        1.046   0.918   0.204

BIC chooses 2 lags, AIC chooses 3 lags.


Time Series Regression with Additional Predictors and the Autoregressive Distributed Lag (ADL) Model
Can you use lags of more than just the dependent variable in your regression?
If so, how do you decide how many lags for those independent variables?

Time Series Regression with Additional Predictors and the Autoregressive Distributed Lag (ADL) Model (SW Section 14.4)
So far we have considered models that use only past values of Y
It makes sense to add other variables (X) that might be useful predictors of Y, above and beyond the predictive value of lagged values of Y:
Yt = β0 + β1Yt-1 + … + βpYt-p + δ1Xt-1 + … + δrXt-r + ut
This is an autoregressive distributed lag model with p lags of Y and r lags of X - ADL(p,r).
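A minimal sketch of estimating an ADL(1,1) by OLS with one lag of Y and one lag of X (simulated data, pandas/statsmodels assumed; an ADL(p,r) just adds more lag columns):

import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(10)
T = 200
x = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.5 + 0.4 * y[t - 1] + 0.3 * x[t - 1] + rng.normal()

df = pd.DataFrame({"y": y, "x": x})
df["y_lag1"] = df["y"].shift(1)
df["x_lag1"] = df["x"].shift(1)
df = df.dropna()

adl = sm.OLS(df["y"], sm.add_constant(df[["y_lag1", "x_lag1"]])).fit()
print(adl.params)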


Example: interest rates and the term spread


ADL(2,2) Model (1962-2012):
GDPGRt = 0.97 + 0.24 GDPGRt-1 + 0.18 GDPGRt-2 - 0.14 TSpreadt-1 + 0.66 TSpreadt-2
(standard errors: 0.48, 0.08, 0.08, 0.42, 0.43)
R2 = 0.17
F-statistic for coefficients on lags of TSpread: F = 4.43 (p-value = 0.01)

Generalization of BIC to multivariate (ADL) models
Let K = the total number of coefficients in the model (intercept, lags of Y, lags of X). The BIC is:
BIC(K) = ln[SSR(K)/T] + K(ln T)/T
We can compute this over all possible combinations of lags of Y and lags of X (but this is a lot)!
Shortcut? Yes:
require the same number of lags for each variable used: Y, X1, X2, …
you might choose lags of Y by BIC, and decide whether or not to include X using a Granger causality test with a fixed number of lags (the number depends on the data and application)
