
STATA TIME-SERIES

REFERENCE MANUAL
RELEASE 13

A Stata Press Publication


StataCorp LP
College Station, Texas

Copyright © 1985–2013 StataCorp LP
All rights reserved
Version 13

Published by Stata Press, 4905 Lakeway Drive, College Station, Texas 77845
Typeset in TeX
ISBN-10: 1-59718-127-7
ISBN-13: 978-1-59718-127-3
This manual is protected by copyright. All rights are reserved. No part of this manual may be reproduced, stored
in a retrieval system, or transcribed, in any form or by any means (electronic, mechanical, photocopy, recording, or
otherwise) without the prior written permission of StataCorp LP unless permitted subject to the terms and conditions
of a license granted to you by StataCorp LP to use the software and documentation. No license, express or implied,
by estoppel or otherwise, to any intellectual property rights is granted by this document.
StataCorp provides this manual "as is" without warranty of any kind, either expressed or implied, including, but
not limited to, the implied warranties of merchantability and fitness for a particular purpose. StataCorp may make
improvements and/or changes in the product(s) and the program(s) described in this manual at any time and without
notice.
The software described in this manual is furnished under a license agreement or nondisclosure agreement. The software
may be copied only in accordance with the terms of the agreement. It is against the law to copy the software onto
DVD, CD, disk, diskette, tape, or any other medium for any purpose other than backup or archival purposes.
The automobile dataset appearing on the accompanying media is Copyright © 1979 by Consumers Union of U.S.,
Inc., Yonkers, NY 10703-1057 and is reproduced by permission from CONSUMER REPORTS, April 1979.
Stata, Stata Press, Mata, and NetCourse are registered trademarks of StataCorp LP.

Stata and Stata Press are registered trademarks with the World Intellectual Property Organization of the United Nations.
NetCourseNow is a trademark of StataCorp LP.
Other brand and product names are registered trademarks or trademarks of their respective companies.
For copyright information about the software, type help copyright within Stata.

The suggested citation for this software is


StataCorp. 2013. Stata: Release 13. Statistical Software. College Station, TX: StataCorp LP.

Contents
intro . . . . . Introduction to time-series manual
time series . . . . . Introduction to time-series commands

arch . . . . . Autoregressive conditional heteroskedasticity (ARCH) family of estimators
arch postestimation . . . . . Postestimation tools for arch
arfima . . . . . Autoregressive fractionally integrated moving-average models
arfima postestimation . . . . . Postestimation tools for arfima
arima . . . . . ARIMA, ARMAX, and other dynamic regression models
arima postestimation . . . . . Postestimation tools for arima

corrgram . . . . . Tabulate and graph autocorrelations
cumsp . . . . . Cumulative spectral distribution
dfactor . . . . . Dynamic-factor models
dfactor postestimation . . . . . Postestimation tools for dfactor
dfgls . . . . . DF-GLS unit-root test
dfuller . . . . . Augmented Dickey–Fuller unit-root test

estat acplot . . . . . Plot parametric autocorrelation and autocovariance functions
estat aroots . . . . . Check the stability condition of ARIMA estimates
fcast compute . . . . . Compute dynamic forecasts after var, svar, or vec
fcast graph . . . . . Graph forecasts after fcast compute
forecast . . . . . Econometric model forecasting
forecast adjust . . . . . Adjust a variable by add factoring, replacing, etc.
forecast clear . . . . . Clear current model from memory
forecast coefvector . . . . . Specify an equation via a coefficient vector
forecast create . . . . . Create a new forecast model
forecast describe . . . . . Describe features of the forecast model
forecast drop . . . . . Drop forecast variables
forecast estimates . . . . . Add estimation results to a forecast model
forecast exogenous . . . . . Declare exogenous variables
forecast identity . . . . . Add an identity to a forecast model
forecast list . . . . . List forecast commands composing current model
forecast query . . . . . Check whether a forecast model has been started
forecast solve . . . . . Obtain static and dynamic forecasts

irf . . . . . Create and analyze IRFs, dynamic-multiplier functions, and FEVDs
irf add . . . . . Add results from an IRF file to the active IRF file
irf cgraph . . . . . Combined graphs of IRFs, dynamic-multiplier functions, and FEVDs
irf create . . . . . Obtain IRFs, dynamic-multiplier functions, and FEVDs
irf ctable . . . . . Combined tables of IRFs, dynamic-multiplier functions, and FEVDs
irf describe . . . . . Describe an IRF file
irf drop . . . . . Drop IRF results from the active IRF file
irf graph . . . . . Graphs of IRFs, dynamic-multiplier functions, and FEVDs
irf ograph . . . . . Overlaid graphs of IRFs, dynamic-multiplier functions, and FEVDs
irf rename . . . . . Rename an IRF result in an IRF file
irf set . . . . . Set the active IRF file
irf table . . . . . Tables of IRFs, dynamic-multiplier functions, and FEVDs

mgarch . . . . . Multivariate GARCH models
mgarch ccc . . . . . Constant conditional correlation multivariate GARCH models
mgarch ccc postestimation . . . . . Postestimation tools for mgarch ccc
mgarch dcc . . . . . Dynamic conditional correlation multivariate GARCH models
mgarch dcc postestimation . . . . . Postestimation tools for mgarch dcc
mgarch dvech . . . . . Diagonal vech multivariate GARCH models
mgarch dvech postestimation . . . . . Postestimation tools for mgarch dvech
mgarch vcc . . . . . Varying conditional correlation multivariate GARCH models
mgarch vcc postestimation . . . . . Postestimation tools for mgarch vcc

newey . . . . . Regression with Newey–West standard errors
newey postestimation . . . . . Postestimation tools for newey
pergram . . . . . Periodogram
pperron . . . . . Phillips–Perron unit-root test
prais . . . . . Prais–Winsten and Cochrane–Orcutt regression
prais postestimation . . . . . Postestimation tools for prais
psdensity . . . . . Parametric spectral density estimation after arima, arfima, and ucm

rolling . . . . . Rolling-window and recursive estimation
sspace . . . . . State-space models
sspace postestimation . . . . . Postestimation tools for sspace
tsappend . . . . . Add observations to a time-series dataset
tsfill . . . . . Fill in gaps in time variable
tsfilter . . . . . Filter a time-series, keeping only selected periodicities
tsfilter bk . . . . . Baxter–King time-series filter
tsfilter bw . . . . . Butterworth time-series filter
tsfilter cf . . . . . Christiano–Fitzgerald time-series filter
tsfilter hp . . . . . Hodrick–Prescott time-series filter
tsline . . . . . Plot time-series data
tsreport . . . . . Report time-series aspects of a dataset or estimation sample
tsrevar . . . . . Time-series operator programming command
tsset . . . . . Declare data to be time-series data
tssmooth . . . . . Smooth and forecast univariate time-series data
tssmooth dexponential . . . . . Double-exponential smoothing
tssmooth exponential . . . . . Single-exponential smoothing
tssmooth hwinters . . . . . Holt–Winters nonseasonal smoothing
tssmooth ma . . . . . Moving-average filter
tssmooth nl . . . . . Nonlinear filter
tssmooth shwinters . . . . . Holt–Winters seasonal smoothing

ucm . . . . . Unobserved-components model
ucm postestimation . . . . . Postestimation tools for ucm
var intro . . . . . Introduction to vector autoregressive models
var . . . . . Vector autoregressive models
var postestimation . . . . . Postestimation tools for var
var svar . . . . . Structural vector autoregressive models
var svar postestimation . . . . . Postestimation tools for svar
varbasic . . . . . Fit a simple VAR and graph IRFs or FEVDs
varbasic postestimation . . . . . Postestimation tools for varbasic
vargranger . . . . . Perform pairwise Granger causality tests after var or svar

varlmar . . . . . Perform LM test for residual autocorrelation after var or svar
varnorm . . . . . Test for normally distributed disturbances after var or svar
varsoc . . . . . Obtain lag-order selection statistics for VARs and VECMs
varstable . . . . . Check the stability condition of VAR or SVAR estimates
varwle . . . . . Obtain Wald lag-exclusion statistics after var or svar
vec intro . . . . . Introduction to vector error-correction models
vec . . . . . Vector error-correction models
vec postestimation . . . . . Postestimation tools for vec
veclmar . . . . . Perform LM test for residual autocorrelation after vec
vecnorm . . . . . Test for normally distributed disturbances after vec
vecrank . . . . . Estimate the cointegrating rank of a VECM
vecstable . . . . . Check the stability condition of VECM estimates

wntestb . . . . . Bartlett's periodogram-based test for white noise
wntestq . . . . . Portmanteau (Q) test for white noise
xcorr . . . . . Cross-correlogram for bivariate time series

Glossary

Subject and author index

Cross-referencing the documentation


When reading this manual, you will find references to other Stata manuals. For example,
[U] 26 Overview of Stata estimation commands
[R] regress
[D] reshape

The first example is a reference to chapter 26, Overview of Stata estimation commands, in the User's
Guide; the second is a reference to the regress entry in the Base Reference Manual; and the third
is a reference to the reshape entry in the Data Management Reference Manual.
All the manuals in the Stata Documentation have a shorthand notation:
[GSM]   Getting Started with Stata for Mac
[GSU]   Getting Started with Stata for Unix
[GSW]   Getting Started with Stata for Windows
[U]     Stata User's Guide
[R]     Stata Base Reference Manual
[D]     Stata Data Management Reference Manual
[G]     Stata Graphics Reference Manual
[XT]    Stata Longitudinal-Data/Panel-Data Reference Manual
[ME]    Stata Multilevel Mixed-Effects Reference Manual
[MI]    Stata Multiple-Imputation Reference Manual
[MV]    Stata Multivariate Statistics Reference Manual
[PSS]   Stata Power and Sample-Size Reference Manual
[P]     Stata Programming Reference Manual
[SEM]   Stata Structural Equation Modeling Reference Manual
[SVY]   Stata Survey Data Reference Manual
[ST]    Stata Survival Analysis and Epidemiological Tables Reference Manual
[TS]    Stata Time-Series Reference Manual
[TE]    Stata Treatment-Effects Reference Manual: Potential Outcomes/Counterfactual Outcomes
[I]     Stata Glossary and Index

[M]     Mata Reference Manual

Title
intro - Introduction to time-series manual

Description
This entry describes this manual and what has changed since Stata 12.

Remarks and examples


This manual documents Stata's time-series commands and is referred to as [TS] in cross-references.
After this entry, [TS] time series provides an overview of the ts commands. The other parts of
this manual are arranged alphabetically. If you are new to Stata's time-series features, we recommend
that you read the following sections first:
    [TS] time series    Introduction to time-series commands
    [TS] tsset          Declare a dataset to be time-series data

Stata is continually being updated, and Stata users are always writing new commands. To ensure
that you have the latest features, you should install the most recent official update; see [R] update.

What's new
For a complete list of all the new features in Stata 13, see [U] 1.3 What's new.

Also see
[U] 1.3 What's new
[R] intro    Introduction to base reference manual

Title
time series - Introduction to time-series commands

Description
The Time-Series Reference Manual organizes the commands alphabetically, making it easy to find
individual command entries if you know the name of the command. This overview organizes and
presents the commands conceptually, that is, according to the similarities in the functions that they
perform. The table below lists the manual entries that you should see for additional information.
Data management tools and time-series operators.
These commands help you prepare your data for further analysis.
Univariate time series.
These commands are grouped together because they are either estimators or filters designed for
univariate time series or preestimation or postestimation commands that are conceptually related
to one or more univariate time-series estimators.
Multivariate time series.
These commands are similarly grouped together because they are either estimators designed for
use with multivariate time series or preestimation or postestimation commands conceptually related
to one or more multivariate time-series estimators.
Forecasting models.
These commands work as a group to provide the tools you need to create models by combining
estimation results, identities, and other objects and to solve those models to obtain forecasts.

Within these broad categories, similar commands have been grouped together.

Data management tools and time-series operators


[TS] tsset       Declare data to be time-series data
[TS] tsfill      Fill in gaps in time variable
[TS] tsappend    Add observations to a time-series dataset
[TS] tsreport    Report time-series aspects of a dataset or estimation sample
[TS] tsrevar     Time-series operator programming command
[TS] rolling     Rolling-window and recursive estimation
[D] datetime business calendars    User-definable business calendars

Univariate time series

Estimators
[TS] arfima                   Autoregressive fractionally integrated moving-average models
[TS] arfima postestimation    Postestimation tools for arfima
[TS] arima                    ARIMA, ARMAX, and other dynamic regression models
[TS] arima postestimation     Postestimation tools for arima
[TS] arch                     Autoregressive conditional heteroskedasticity (ARCH) family of estimators
[TS] arch postestimation      Postestimation tools for arch
[TS] newey                    Regression with Newey–West standard errors
[TS] newey postestimation     Postestimation tools for newey
[TS] prais                    Prais–Winsten and Cochrane–Orcutt regression
[TS] prais postestimation     Postestimation tools for prais
[TS] ucm                      Unobserved-components model
[TS] ucm postestimation       Postestimation tools for ucm

Time-series smoothers and filters
[TS] tsfilter bk              Baxter–King time-series filter
[TS] tsfilter bw              Butterworth time-series filter
[TS] tsfilter cf              Christiano–Fitzgerald time-series filter
[TS] tsfilter hp              Hodrick–Prescott time-series filter
[TS] tssmooth ma              Moving-average filter
[TS] tssmooth dexponential    Double-exponential smoothing
[TS] tssmooth exponential     Single-exponential smoothing
[TS] tssmooth hwinters        Holt–Winters nonseasonal smoothing
[TS] tssmooth shwinters       Holt–Winters seasonal smoothing
[TS] tssmooth nl              Nonlinear filter

Diagnostic tools
[TS] corrgram        Tabulate and graph autocorrelations
[TS] xcorr           Cross-correlogram for bivariate time series
[TS] cumsp           Cumulative spectral distribution
[TS] pergram         Periodogram
[TS] psdensity       Parametric spectral density estimation
[TS] estat acplot    Plot parametric autocorrelation and autocovariance functions
[TS] estat aroots    Check the stability condition of ARIMA estimates
[TS] dfgls           DF-GLS unit-root test
[TS] dfuller         Augmented Dickey–Fuller unit-root test
[TS] pperron         Phillips–Perron unit-root test
[R] regress postestimation time series    Postestimation tools for regress with time series
[TS] wntestb         Bartlett's periodogram-based test for white noise
[TS] wntestq         Portmanteau (Q) test for white noise

Multivariate time series

Estimators
[TS] dfactor                        Dynamic-factor models
[TS] dfactor postestimation         Postestimation tools for dfactor
[TS] mgarch ccc                     Constant conditional correlation multivariate GARCH models
[TS] mgarch ccc postestimation      Postestimation tools for mgarch ccc
[TS] mgarch dcc                     Dynamic conditional correlation multivariate GARCH models
[TS] mgarch dcc postestimation      Postestimation tools for mgarch dcc
[TS] mgarch dvech                   Diagonal vech multivariate GARCH models
[TS] mgarch dvech postestimation    Postestimation tools for mgarch dvech
[TS] mgarch vcc                     Varying conditional correlation multivariate GARCH models
[TS] mgarch vcc postestimation      Postestimation tools for mgarch vcc
[TS] sspace                         State-space models
[TS] sspace postestimation          Postestimation tools for sspace
[TS] var                            Vector autoregressive models
[TS] var postestimation             Postestimation tools for var
[TS] var svar                       Structural vector autoregressive models
[TS] var svar postestimation        Postestimation tools for svar
[TS] varbasic                       Fit a simple VAR and graph IRFs or FEVDs
[TS] varbasic postestimation        Postestimation tools for varbasic
[TS] vec                            Vector error-correction models
[TS] vec postestimation             Postestimation tools for vec

Diagnostic tools
[TS] varlmar      Perform LM test for residual autocorrelation
[TS] varnorm      Test for normally distributed disturbances
[TS] varsoc       Obtain lag-order selection statistics for VARs and VECMs
[TS] varstable    Check the stability condition of VAR or SVAR estimates
[TS] varwle       Obtain Wald lag-exclusion statistics
[TS] veclmar      Perform LM test for residual autocorrelation
[TS] vecnorm      Test for normally distributed disturbances
[TS] vecrank      Estimate the cointegrating rank of a VECM
[TS] vecstable    Check the stability condition of VECM estimates

Forecasting, inference, and interpretation


[TS] irf create       Obtain IRFs, dynamic-multiplier functions, and FEVDs
[TS] fcast compute    Compute dynamic forecasts after var, svar, or vec
[TS] vargranger       Perform pairwise Granger causality tests

Graphs and tables

[TS] corrgram       Tabulate and graph autocorrelations
[TS] xcorr          Cross-correlogram for bivariate time series
[TS] pergram        Periodogram
[TS] irf graph      Graphs of IRFs, dynamic-multiplier functions, and FEVDs
[TS] irf cgraph     Combined graphs of IRFs, dynamic-multiplier functions, and FEVDs
[TS] irf ograph     Overlaid graphs of IRFs, dynamic-multiplier functions, and FEVDs
[TS] irf table      Tables of IRFs, dynamic-multiplier functions, and FEVDs
[TS] irf ctable     Combined tables of IRFs, dynamic-multiplier functions, and FEVDs
[TS] fcast graph    Graph forecasts after fcast compute
[TS] tsline         Plot time-series data
[TS] varstable      Check the stability condition of VAR or SVAR estimates
[TS] vecstable      Check the stability condition of VECM estimates
[TS] wntestb        Bartlett's periodogram-based test for white noise

Results management tools


[TS] irf add         Add results from an IRF file to the active IRF file
[TS] irf describe    Describe an IRF file
[TS] irf drop        Drop IRF results from the active IRF file
[TS] irf rename      Rename an IRF result in an IRF file
[TS] irf set         Set the active IRF file

Forecasting models
[TS] forecast               Econometric model forecasting
[TS] forecast adjust        Adjust a variable by add factoring, replacing, etc.
[TS] forecast clear         Clear current model from memory
[TS] forecast coefvector    Specify an equation via a coefficient vector
[TS] forecast create        Create a new forecast model
[TS] forecast describe      Describe features of the forecast model
[TS] forecast drop          Drop forecast variables
[TS] forecast estimates     Add estimation results to a forecast model
[TS] forecast exogenous     Declare exogenous variables
[TS] forecast identity      Add an identity to a forecast model
[TS] forecast list          List forecast commands composing current model
[TS] forecast query         Check whether a forecast model has been started
[TS] forecast solve         Obtain static and dynamic forecasts

Remarks and examples


Remarks are presented under the following headings:
    Data management tools and time-series operators
    Univariate time series
        Estimators
        Time-series smoothers and filters
        Diagnostic tools
    Multivariate time series
        Estimators
        Diagnostic tools
    Forecasting models

We also offer a NetCourse on Stata's time-series capabilities; see
http://www.stata.com/netcourse/nc461.html.

Data management tools and time-series operators


Because time-series estimators are, by definition, a function of the temporal ordering of the
observations in the estimation sample, Stata's time-series commands require the data to be sorted and
indexed by time, using the tsset command, before they can be used. tsset is simply a way for you
to tell Stata which variable in your dataset represents time; tsset then sorts and indexes the data
appropriately for use with the time-series commands. Once your dataset has been tsset, you can
use Stata's time-series operators in data manipulation or programming using that dataset and when
specifying the syntax for most time-series commands. Stata has time-series operators for representing
the lags, leads, differences, and seasonal differences of a variable. The time-series operators are
documented in [TS] tsset.
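
As a minimal sketch of this workflow, the commands below build a small simulated quarterly series, declare it with tsset, and then use the lag (L.), lead (F.), difference (D.), and seasonal-difference (S#.) operators. The variable names t and y, the seed, and the simulated values are illustrative assumptions, not data from this manual.

```stata
* Simulated quarterly data; t, y, and the seed are illustrative assumptions
clear
set obs 40
set seed 12345
generate t = tq(2000q1) + _n - 1
format t %tq
generate y = rnormal()

* Declare t as the time variable; tsset sorts and indexes the data by t
tsset t

* Time-series operators can be used once the data are tsset
* L.y is y at t-1, F.y is y at t+1
generate y_lag1  = L.y
generate y_lead1 = F.y
* D.y is y(t) - y(t-1); S4.y is y(t) - y(t-4)
generate dy      = D.y
generate sdy     = S4.y
```

The first observation has no lag and the last has no lead, so the corresponding generated values are missing there; S4.y is missing for the first four quarters.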
You can also define a business-day calendar so that Stata's time-series operators respect the structure
of missing observations in your data. The most common example is having Monday come after Friday
in market data. [D] datetime business calendars provides a discussion and examples.
tsset can also be used to declare that your dataset contains cross-sectional time-series data, often
referred to as panel data. When you use tsset to declare your dataset to contain panel data, you
specify a variable that identifies the panels and a variable that identifies the time periods. Once your
dataset has been tsset as panel data, the time-series operators work appropriately for the data.
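
A small sketch of the panel case follows, using a hypothetical two-panel dataset (the names id, year, and x are assumptions made for illustration). After tsset with a panel variable and a time variable, the lag operator works within each panel, so the first period of the second panel gets a missing lag rather than the last value of the first panel.

```stata
* Hypothetical panel: 2 panels (id) observed over 3 years each
clear
set obs 6
generate id   = ceil(_n/3)
generate year = 2000 + mod(_n - 1, 3)
generate x    = _n

* Panel variable first, time variable second
tsset id year

* The lag is taken within each panel
generate x_lag = L.x
list id year x x_lag, sepby(id)
```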
tsfill, which is documented in [TS] tsfill, can be used after tsset to fill in missing times with
missing observations. tsset will report any gaps in your data, and tsreport will provide more
details about the gaps. tsappend adds observations to a time-series dataset by using the information
set by tsset. This command can be particularly useful when you wish to predict out of sample after
fitting a model with a time-series estimator. tsrevar is a programmer's command that provides a
way to use varlists that contain time-series operators with commands that do not otherwise support
time-series operators.
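
The gap-handling commands can be sketched on a tiny hand-entered dataset with one missing time period (the values and the names t and y are made up for illustration):

```stata
* Hypothetical series with a gap at t = 3
clear
input t y
1 10
2 12
4 15
5 11
end
tsset t

* Report the gaps found in the time variable
tsreport

* Add an observation for t = 3, with y set to missing
tsfill

* Add two more time periods (t = 6 and t = 7) at the end of the dataset
tsappend, add(2)
list
```

After these commands, the dataset runs from t = 1 to t = 7 with no gaps in the time variable, and y is missing in the filled-in and appended observations.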
rolling performs rolling regressions, recursive regressions, and reverse recursive regressions.
Any command that stores results in e() or r() can be used with rolling.
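
As a hedged illustration of rolling, the sketch below runs a 20-period rolling regression on simulated data. The data-generating process, the seed, and the names t, x, and y are assumptions; the clear option replaces the data in memory with the collected results.

```stata
* Simulated data for illustration only
clear
set obs 100
set seed 2013
generate t = _n
tsset t
generate x = rnormal()
generate y = 1 + 0.5*x + rnormal()

* Fit regress on each 20-period window and collect the coefficients
rolling _b, window(20) clear: regress y x

* One row of results per window
list in 1/5
```

With 100 periods and a 20-period window, there are 81 windows (periods 1 to 20, 2 to 21, and so on through 81 to 100). Adding the recursive option would instead hold the starting period fixed and extend the window one period at a time.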

Univariate time series


Estimators
The six univariate time-series estimators currently available in Stata are arfima, arima, arch,
newey, prais, and ucm. newey and prais are really just extensions to ordinary linear regression.
When you fit a linear regression on time-series data via ordinary least squares (OLS), if the disturbances
are autocorrelated, the parameter estimates are usually consistent, but the estimated standard errors
tend to be underestimated. Several estimators have been developed to deal with this problem. One
strategy is to use OLS for estimating the regression parameters and use a different estimator for the
variances, one that is consistent in the presence of autocorrelated disturbances, such as the Newey–West
estimator implemented in newey. Another strategy is to model the dynamics of the disturbances. The
estimators found in prais, arima, arch, arfima, and ucm are based on such a strategy.
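
The first strategy can be sketched on simulated data with AR(1) disturbances. The coefficients, the seed, the variable names, and the choice of lag(4) are illustrative assumptions, not recommendations. regress and newey report the same coefficient estimates and differ only in the standard errors.

```stata
* Simulated regression with AR(1) disturbances (illustrative values)
clear
set obs 200
set seed 42
generate t = _n
tsset t
generate x = rnormal()
generate e = rnormal()
generate u = e in 1
replace  u = 0.7*u[_n-1] + e in 2/l
generate y = 1 + 2*x + u

* OLS coefficients with conventional standard errors
regress y x

* Same OLS coefficients, Newey–West standard errors with up to 4 lags
newey y x, lag(4)
```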
prais implements two such estimators: the Prais–Winsten and the Cochrane–Orcutt generalized
least-squares (GLS) estimators. They are fairly restrictive
in that they permit only first-order autocorrelation in the disturbances. Although they have certain
pedagogical and historical value, they are somewhat obsolete. Faster computers with more memory
have made it possible to implement full information maximum likelihood (FIML) estimators, such
as Stata's arima command. These estimators permit much greater flexibility when modeling the
disturbances and are more efficient estimators.
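
For comparison, the sketch below fits the same simulated regression with AR(1) disturbances three ways: Prais–Winsten (the prais default), Cochrane–Orcutt (the corc option), and maximum likelihood via arima. The simulated values, the seed, and the variable names are assumptions made for illustration.

```stata
* Simulated regression with AR(1) disturbances (illustrative values)
clear
set obs 200
set seed 7
generate t = _n
tsset t
generate x = rnormal()
generate e = rnormal()
generate u = e in 1
replace  u = 0.6*u[_n-1] + e in 2/l
generate y = 1 + 2*x + u

* Prais–Winsten GLS (keeps the first observation)
prais y x

* Cochrane–Orcutt GLS (drops the first observation)
prais y x, corc

* Linear regression with AR(1) disturbances fit by maximum likelihood
arima y x, ar(1)
```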
arima provides the means to fit linear models with autoregressive moving-average (ARMA)
disturbances, or in the absence of linear predictors, autoregressive integrated moving-average (ARIMA)
models. This means that, whether you think that your data are best represented as a distributed-lag
model, a transfer-function model, or a stochastic difference equation, or you simply wish to apply
a Box–Jenkins filter to your data, the model can be fit using arima. arch, a conditional maximum
likelihood estimator, has similar modeling capabilities for the mean of the time series but can also model
autoregressive conditional heteroskedasticity in the disturbances with a wide variety of specifications
for the variance equation.
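
A brief sketch of both commands on simulated data follows. The series y is built as an integrated AR(1) process and r as an ARCH(1) process; these data-generating choices, the seed, and the variable names are assumptions made only so that the commands have something sensible to fit.

```stata
* Simulated data for illustration only
clear
set obs 500
set seed 99
generate t = _n
tsset t

* y: cumulative sum of an AR(1) process, so D.y is AR(1)
generate e  = rnormal()
generate dy = e in 1
replace  dy = 0.5*dy[_n-1] + e in 2/l
generate y  = sum(dy)

* r: ARCH(1) process with conditional variance 0.5 + 0.4*r(t-1)^2
generate z = rnormal()
generate r = z in 1
replace  r = z*sqrt(0.5 + 0.4*r[_n-1]^2) in 2/l

* ARIMA(1,1,0) for y, written two equivalent ways
arima y, arima(1,1,0)
arima D.y, ar(1)

* ARCH(1) model for r
arch r, arch(1)
```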
arfima estimates the parameters of autoregressive fractionally integrated moving-average (ARFIMA)
models, which handle higher degrees of dependence than ARIMA models. ARFIMA models allow the
autocorrelations to decay at the slower hyperbolic rate, whereas ARIMA models handle processes
whose autocorrelations decay at an exponential rate.
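
The contrast between the two decay rates can be written out as follows. The notation is an assumption introduced here for illustration and is not taken from this entry: $L$ is the lag operator, $\phi(L)$ and $\theta(L)$ are the AR and MA polynomials, $d$ is the fractional-integration parameter, $\rho_k$ is the autocorrelation at lag $k$, and $C$ and $r$ are generic constants. The mean is omitted for simplicity.

```latex
\begin{align*}
\text{ARFIMA}(p,d,q):\quad
  & \phi(L)\,(1-L)^{d}\, y_t = \theta(L)\,\varepsilon_t \\
\text{ARFIMA with } 0 < d < \tfrac{1}{2}:\quad
  & \rho_k \sim C\,k^{2d-1} \ \text{as } k \to \infty
  && \text{(hyperbolic decay)} \\
\text{stationary ARMA}:\quad
  & |\rho_k| \le C\,r^{k}, \quad 0 < r < 1
  && \text{(exponential decay)}
\end{align*}
```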
Unobserved-components models (UCMs) decompose a time series into trend, seasonal, cyclical,
and idiosyncratic components and allow for exogenous variables. ucm estimates the parameters of
UCMs by maximum likelihood. UCMs can also model the stationary cyclical component using the
stochastic-cycle parameterization that has an intuitive frequency-domain interpretation.

Time-series smoothers and filters
In addition to the estimators mentioned above, Stata also provides time-series filters and smoothers.
The Baxter–King and Christiano–Fitzgerald band-pass filters and the Butterworth and Hodrick–Prescott
high-pass filters are implemented in tsfilter; see [TS] tsfilter for an overview.
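
As an illustrative sketch, the commands below apply the Hodrick–Prescott and Baxter–King filters to a simulated quarterly random walk. The series, the seed, the variable names, and the parameter values (smoothing parameter 1600, periods of 6 to 32 quarters, a 12-term symmetric moving average) are assumptions chosen for the example.

```stata
* Simulated quarterly random walk (illustrative)
clear
set obs 100
set seed 31
generate t = tq(1990q1) + _n - 1
format t %tq
tsset t
generate y = sum(rnormal())

* Hodrick–Prescott: hp_cyc holds the cyclical component, hp_trend the trend
tsfilter hp hp_cyc = y, smooth(1600) trend(hp_trend)

* Baxter–King band-pass filter keeping periods of 6 to 32 quarters
tsfilter bk bk_cyc = y, minperiod(6) maxperiod(32) smaorder(12)
```

The Hodrick–Prescott cyclical and trend components sum to the original series.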
Also included are a simple, uniformly weighted, moving-average filter with unit weights; a
weighted moving-average filter in which you can specify the weights; single- and double-exponential
smoothers; Holt–Winters seasonal and nonseasonal smoothers; and a nonlinear smoother. Most of
these smoothers were originally developed as ad hoc procedures and are used for reducing the noise in
a time series (smoothing) or forecasting. Although they have limited application for signal extraction,
these smoothers have all been found to be optimal for some underlying modern time-series models;
see [TS] tssmooth.
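A short sketch of one filter and two smoothers follows. The dataset (wpi1) and the new variable names (wpi_cyc, wpi_trend, wpi_ma, wpi_exp) are assumptions chosen for illustration only.

```stata
* Assumption: the wpi1 example dataset (quarterly, already tsset) is available
webuse wpi1, clear

* Hodrick-Prescott high-pass filter: cyclical component plus the implied trend
tsfilter hp wpi_cyc = ln_wpi, trend(wpi_trend)

* Uniformly weighted moving average: 2 lags, the current value, and 2 leads
tssmooth ma wpi_ma = ln_wpi, window(2 1 2)

* Single-exponential smoother; the smoothing parameter is chosen by tssmooth
tssmooth exponential wpi_exp = ln_wpi
```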
Diagnostic tools
Stata's time-series commands also include several preestimation and postestimation diagnostic and
interpretation commands. corrgram estimates the autocorrelation function and partial autocorrelation
function of a univariate time series, as well as Q statistics. These functions and statistics are often used
to determine the appropriate model specification before fitting ARIMA models. corrgram can also be
used with wntestb and wntestq to examine the residuals after fitting a model for evidence of model
misspecification. Stata's time-series commands also include the commands pergram and cumsp,
which provide the log-standardized periodogram and the cumulative-sample spectral distribution,
respectively, for time-series analysts who prefer to estimate in the frequency domain rather than the
time domain.
psdensity computes the spectral density implied by the parameters estimated by arfima, arima,
or ucm. The estimated spectral density shows the relative importance of components at different
frequencies. estat acplot computes the autocorrelation and autocovariance functions implied by
the parameters estimated by arima. These functions provide a measure of the dependence structure
in the time domain.

xcorr estimates the cross-correlogram for bivariate time series and can similarly be used for both
preestimation and postestimation. For example, the cross-correlogram can be used before fitting a
transfer-function model to produce initial estimates of the IRF. This estimate can then be used to
determine the optimal lag length of the input series to include in the model specification. It can
also be used as a postestimation tool after fitting a transfer function. The cross-correlogram between
the residual from a transfer-function model and the prewhitened input series of the model can be
examined for evidence of model misspecification.
When you fit ARMA or ARIMA models, the dependent variable being modeled must be covariance
stationary (ARMA models), or the order of integration must be known (ARIMA models). Stata has three
commands that can test for the presence of a unit root in a time-series variable: dfuller performs
the augmented Dickey–Fuller test, pperron performs the Phillips–Perron test, and dfgls performs
a modified Dickey–Fuller test. arfima can also be used to investigate the order of integration. After
estimation, you can use estat aroots to check the stationarity of an ARMA process.
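The preestimation diagnostics above might be run as follows. This is a sketch under the assumption that the wpi1 example dataset is available; the lag choices are arbitrary illustrations, not recommendations.

```stata
* Assumption: the wpi1 example dataset (quarterly, already tsset) is available
webuse wpi1, clear

* Autocorrelations, partial autocorrelations, and Q statistics
corrgram D.ln_wpi, lags(20)

* Portmanteau (Q) test for white noise
wntestq D.ln_wpi, lags(20)

* Unit-root tests on the level of the series
dfuller ln_wpi, lags(4) trend
pperron ln_wpi
dfgls ln_wpi, maxlag(8)
```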
The remaining diagnostic tools for univariate time series are for use after fitting a linear model via
OLS with Stata's regress command. They are documented collectively in [R] regress postestimation
time series. They include estat dwatson, estat durbinalt, estat bgodfrey, and estat
archlm. estat dwatson computes the Durbin–Watson d statistic to test for the presence of first-order
autocorrelation in the OLS residuals. estat durbinalt likewise tests for the presence of
autocorrelation in the residuals. By comparison, however, Durbin's alternative test is more general
and easier to use than the Durbin–Watson test. With estat durbinalt, you can test for higher
orders of autocorrelation, the assumption that the covariates in the model are strictly exogenous is
relaxed, and there is no need to consult tables to compute rejection regions, as you must with the
Durbin–Watson test. estat bgodfrey computes the Breusch–Godfrey test for autocorrelation in the
residuals, and although the computations are different, the test in estat bgodfrey is asymptotically
equivalent to the test in estat durbinalt. Finally, estat archlm performs Engle's LM test for the
presence of autoregressive conditional heteroskedasticity.
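The postestimation tests above can be sketched as follows, assuming the wpi1 example dataset; the constant-only regression and the lag choices are illustrative assumptions.

```stata
* Assumption: the wpi1 example dataset (quarterly, already tsset) is available
webuse wpi1, clear

* Constant-only OLS model for the differenced log series
regress D.ln_wpi

estat dwatson                  // Durbin-Watson d statistic
estat durbinalt, lags(1/4)     // Durbin's alternative test, orders 1 through 4
estat bgodfrey, lags(1/4)      // Breusch-Godfrey test, orders 1 through 4
estat archlm, lags(1)          // Engle's LM test for ARCH(1) effects
```

A small p-value from estat archlm suggests ARCH effects, in which case arch is a natural next step.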

Multivariate time series


Estimators
Stata provides commands for fitting the most widely applied multivariate time-series models. var
and svar fit vector autoregressive and structural vector autoregressive models to stationary data. vec
fits cointegrating vector error-correction models. dfactor fits dynamic-factor models. mgarch ccc,
mgarch dcc, mgarch dvech, and mgarch vcc fit multivariate GARCH models. sspace fits state-space
models. Many linear time-series models, including vector autoregressive moving-average (VARMA)
models and structural time-series models, can be cast as state-space models and fit by sspace.
Diagnostic tools
Before fitting a multivariate time-series model, you must specify the number of lags of the dependent
variable to include. varsoc produces statistics for determining the order of a VAR or VECM.
Several postestimation commands perform the most common specification analysis on a previously
fitted VAR or SVAR. You can use varlmar to check for serial correlation in the residuals, varnorm
to test the null hypothesis that the disturbances come from a multivariate normal distribution, and
varstable to see if the fitted VAR or SVAR is stable. Two common types of inference about VAR
models are whether one variable Granger-causes another and whether a set of lags can be excluded
from the model. vargranger reports Wald tests of Granger causation, and varwle reports Wald lag
exclusion tests.

Similarly, several postestimation commands perform the most common specification analysis on a
previously fitted VECM. You can use veclmar to check for serial correlation in the residuals, vecnorm
to test the null hypothesis that the disturbances come from a multivariate normal distribution, and
vecstable to analyze the stability of the previously fitted VECM.
VARs and VECMs are often fit to produce baseline forecasts. fcast produces dynamic forecasts
from previously fitted VARs and VECMs.

Many researchers fit VARs, SVARs, and VECMs because they want to analyze how unexpected
shocks affect the dynamic paths of the variables. Stata has a suite of irf commands for estimating
impulse–response functions (IRFs) and interpreting, presenting, and managing these estimates; see [TS] irf.
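The multivariate workflow described in this section (lag selection, estimation, specification checks, forecasts, and IRFs) can be sketched as follows. The lutkepohl2 example dataset, its variable names, the sample restriction, and the names f_, order1, and myirf are assumptions made for illustration.

```stata
* Assumption: the lutkepohl2 example dataset (quarterly, already tsset) is available
webuse lutkepohl2, clear

* Lag-order selection statistics
varsoc dln_inv dln_inc dln_consump if qtr <= tq(1978q4), maxlag(4)

* Fit a VAR with two lags, holding back the last observations
var dln_inv dln_inc dln_consump if qtr <= tq(1978q4), lags(1/2)

varlmar        // LM test for residual autocorrelation
varnorm        // tests for normally distributed disturbances
varstable      // eigenvalue stability condition
vargranger     // Wald tests of Granger causation
varwle         // Wald lag-exclusion tests

* Dynamic forecasts for 8 periods; new variables are prefixed with f_
fcast compute f_, step(8)

* Estimate IRFs and tabulate one orthogonalized response
irf create order1, step(10) set(myirf, replace)
irf table oirf, impulse(dln_inc) response(dln_consump)
```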

Forecasting models
Stata provides a set of commands for obtaining forecasts by solving models, collections of equations
that jointly determine the outcomes of one or more variables. You use Stata estimation commands such
as regress, reg3, var, and vec to fit stochastic equations and store the results using estimates
store. Then you create a forecast model using forecast create and use commands, including
forecast estimates and forecast identity, to build models consisting of estimation results,
nonstochastic relationships (identities), and other model features. Models can be as simple as a single
linear regression for which you want to obtain dynamic forecasts, or they can be complicated systems
consisting of dozens of estimation results and identities representing a complete macroeconometric
model.
The forecast solve command allows you to obtain both stochastic and dynamic forecasts.
Confidence intervals for forecasts can be obtained via stochastic simulation incorporating both
parameter uncertainty and additive random shocks. By using forecast adjust, you can incorporate
outside information and specify different paths for some of the model's variables to obtain forecasts
under alternative scenarios.
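The following is a minimal sketch of that workflow: one stochastic equation plus one identity, solved dynamically. The dataset (wpi1), the generated variable dln_wpi, the stored-results name ar1, the model name demo, and the start date are all assumptions chosen for illustration.

```stata
* Assumption: the wpi1 example dataset (quarterly, already tsset) is available
webuse wpi1, clear

* Stochastic equation: AR(1) for the growth rate, stored for later use
generate dln_wpi = D.ln_wpi
regress dln_wpi L.dln_wpi
estimates store ar1

* Build the forecast model: one estimation result and one identity
forecast create demo, replace
forecast estimates ar1
forecast identity ln_wpi = L.ln_wpi + dln_wpi

* Dynamic forecasts from 1985q1 onward; results are stored with prefix f_
forecast solve, begin(tq(1985q1)) prefix(f_)
```

The identity rebuilds the log level from the forecast growth rate, so the model produces forecasts for both dln_wpi and ln_wpi.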

References
Baum, C. F. 2005. Stata: The language of choice for time-series analysis? Stata Journal 5: 46–63.
Becketti, S. 2013. Introduction to Time Series Using Stata. College Station, TX: Stata Press.
Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
Lütkepohl, H. 1993. Introduction to Multiple Time Series Analysis. 2nd ed. New York: Springer.
Lütkepohl, H. 2005. New Introduction to Multiple Time Series Analysis. New York: Springer.
Pisati, M. 2001. sg162: Tools for spatial data analysis. Stata Technical Bulletin 60: 21–37. Reprinted in Stata Technical
Bulletin Reprints, vol. 10, pp. 277–298. College Station, TX: Stata Press.
Stock, J. H., and M. W. Watson. 2001. Vector autoregressions. Journal of Economic Perspectives 15: 101–115.

Also see
[U] 1.3 What's new

[R] intro Introduction to base reference manual

Title
    arch    Autoregressive conditional heteroskedasticity (ARCH) family of estimators

    Syntax                  Menu                  Description             Options
    Remarks and examples    Stored results        Methods and formulas    References
    Also see

Syntax
    arch depvar [indepvars] [if] [in] [weight] [, options]

    options                     Description
    -----------------------------------------------------------------------------------------
    Model
      noconstant                suppress constant term
      arch(numlist)             ARCH terms
      garch(numlist)            GARCH terms
      saarch(numlist)           simple asymmetric ARCH terms
      tarch(numlist)            threshold ARCH terms
      aarch(numlist)            asymmetric ARCH terms
      narch(numlist)            nonlinear ARCH terms
      narchk(numlist)           nonlinear ARCH terms with single shift
      abarch(numlist)           absolute value ARCH terms
      atarch(numlist)           absolute threshold ARCH terms
      sdgarch(numlist)          lags of $\sigma_t$
      earch(numlist)            news terms in Nelson's (1991) EGARCH model
      egarch(numlist)           lags of $\ln(\sigma_t^2)$
      parch(numlist)            power ARCH terms
      tparch(numlist)           threshold power ARCH terms
      aparch(numlist)           asymmetric power ARCH terms
      nparch(numlist)           nonlinear power ARCH terms
      nparchk(numlist)          nonlinear power ARCH terms with single shift
      pgarch(numlist)           power GARCH terms
      constraints(constraints)  apply specified linear constraints
      collinear                 keep collinear variables

    Model 2
      archm                     include ARCH-in-mean term in the mean-equation specification
      archmlags(numlist)        include specified lags of conditional variance in mean equation
      archmexp(exp)             apply transformation in exp to any ARCH-in-mean terms
      arima(#p,#d,#q)           specify ARIMA(p, d, q) model for dependent variable
      ar(numlist)               autoregressive terms of the structural model disturbance
      ma(numlist)               moving-average terms of the structural model disturbances

    Model 3
      distribution(dist [#])    use dist distribution for errors (may be gaussian, normal, t,
                                  or ged; default is gaussian)
      het(varlist)              include varlist in the specification of the conditional variance
      savespace                 conserve memory during estimation

    Priming
      arch0(xb)                 compute priming values on the basis of the expected unconditional
                                  variance; the default
      arch0(xb0)                compute priming values on the basis of the estimated variance of the
                                  residuals from OLS
      arch0(xbwt)               compute priming values on the basis of the weighted sum of squares
                                  from OLS residuals
      arch0(xb0wt)              compute priming values on the basis of the weighted sum of squares
                                  from OLS residuals, with more weight at earlier times
      arch0(zero)               set priming values of ARCH terms to zero
      arch0(#)                  set priming values of ARCH terms to #
      arma0(zero)               set all priming values of ARMA terms to zero; the default
      arma0(p)                  begin estimation after observation p, where p is the
                                  maximum AR lag in model
      arma0(q)                  begin estimation after observation q, where q is the
                                  maximum MA lag in model
      arma0(pq)                 begin estimation after observation (p + q)
      arma0(#)                  set priming values of ARMA terms to #
      condobs(#)                set conditioning observations at the start of the sample to #

    SE/Robust
      vce(vcetype)              vcetype may be opg, robust, or oim

    Reporting
      level(#)                  set confidence level; default is level(95)
      detail                    report list of gaps in time series
      nocnsreport               do not display constraints
      display_options           control column formats, row spacing, and line width

    Maximization
      maximize_options          control the maximization process; seldom used

      coeflegend                display legend instead of statistics
    -----------------------------------------------------------------------------------------

You must tsset your data before using arch; see [TS] tsset.
depvar and varlist may contain time-series operators; see [U] 11.4.4 Time-series varlists.
by, fp, rolling, statsby, and xi are allowed; see [U] 11.1.10 Prefix commands.
iweights are allowed; see [U] 11.1.6 weight.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.

To fit an ARCH(#m) model with Gaussian errors, type

    . arch depvar ..., arch(1/#m)

To fit a GARCH(#m, #k) model assuming that the errors follow Student's t distribution with 7 degrees
of freedom, type

    . arch depvar ..., arch(1/#m) garch(1/#k) distribution(t 7)

You can also fit many other models.
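As a concrete sketch of the two templates above with #m = 2 and #k = 1, assuming the wpi1 example dataset and its variable ln_wpi (both are illustrative assumptions, not part of the syntax):

```stata
* Assumption: the wpi1 example dataset (quarterly, already tsset) is available
webuse wpi1, clear

* ARCH(2) model with Gaussian errors
arch D.ln_wpi, arch(1/2)

* GARCH(2,1) model with Student's t errors and 7 degrees of freedom
arch D.ln_wpi, arch(1/2) garch(1) distribution(t 7)
```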


Details of syntax
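The basic model arch fits is

$$y_t = \mathbf{x}_t\boldsymbol{\beta} + \epsilon_t$$
$$\mathrm{Var}(\epsilon_t) = \sigma_t^2 = \gamma_0 + A(\boldsymbol{\sigma}, \boldsymbol{\epsilon}) + B(\boldsymbol{\sigma}, \boldsymbol{\epsilon})^2 \tag{1}$$

The $y_t$ equation may optionally include ARCH-in-mean and ARMA terms:

$$y_t = \mathbf{x}_t\boldsymbol{\beta} + \sum_i \psi_i\, g(\sigma_{t-i}^2) + \mathrm{ARMA}(p, q) + \epsilon_t$$

If no options are specified, A() = B() = 0, and the model collapses to linear regression. The
following options add to A() ($\alpha$, $\gamma$, and $\kappa$ represent parameters to be estimated):

    Option      Terms added to A()
    arch()      $A() = A() + \alpha_{1,1}\epsilon_{t-1}^2 + \alpha_{1,2}\epsilon_{t-2}^2 + \cdots$
    garch()     $A() = A() + \alpha_{2,1}\sigma_{t-1}^2 + \alpha_{2,2}\sigma_{t-2}^2 + \cdots$
    saarch()    $A() = A() + \alpha_{3,1}\epsilon_{t-1} + \alpha_{3,2}\epsilon_{t-2} + \cdots$
    tarch()     $A() = A() + \alpha_{4,1}\epsilon_{t-1}^2(\epsilon_{t-1} > 0) + \alpha_{4,2}\epsilon_{t-2}^2(\epsilon_{t-2} > 0) + \cdots$
    aarch()     $A() = A() + \alpha_{5,1}(|\epsilon_{t-1}| + \gamma_{5,1}\epsilon_{t-1})^2 + \alpha_{5,2}(|\epsilon_{t-2}| + \gamma_{5,2}\epsilon_{t-2})^2 + \cdots$
    narch()     $A() = A() + \alpha_{6,1}(\epsilon_{t-1} - \kappa_{6,1})^2 + \alpha_{6,2}(\epsilon_{t-2} - \kappa_{6,2})^2 + \cdots$
    narchk()    $A() = A() + \alpha_{7,1}(\epsilon_{t-1} - \kappa_7)^2 + \alpha_{7,2}(\epsilon_{t-2} - \kappa_7)^2 + \cdots$

The following options add to B():

    Option      Terms added to B()
    abarch()    $B() = B() + \alpha_{8,1}|\epsilon_{t-1}| + \alpha_{8,2}|\epsilon_{t-2}| + \cdots$
    atarch()    $B() = B() + \alpha_{9,1}|\epsilon_{t-1}|(\epsilon_{t-1} > 0) + \alpha_{9,2}|\epsilon_{t-2}|(\epsilon_{t-2} > 0) + \cdots$
    sdgarch()   $B() = B() + \alpha_{10,1}\sigma_{t-1} + \alpha_{10,2}\sigma_{t-2} + \cdots$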

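Each option requires a numlist argument (see [U] 11.1.8 numlist), which determines the lagged
terms included. arch(1) specifies $\alpha_{1,1}\epsilon_{t-1}^2$, arch(2) specifies $\alpha_{1,2}\epsilon_{t-2}^2$, arch(1,2) specifies
$\alpha_{1,1}\epsilon_{t-1}^2 + \alpha_{1,2}\epsilon_{t-2}^2$, arch(1/3) specifies $\alpha_{1,1}\epsilon_{t-1}^2 + \alpha_{1,2}\epsilon_{t-2}^2 + \alpha_{1,3}\epsilon_{t-3}^2$, etc.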
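As a worked illustration of how options compose (a sketch derived directly from (1) and the option tables, not an additional model definition): specifying arch(1) tarch(1) garch(1) adds three terms to A() and leaves B() = 0, whereas specifying abarch(1) sdgarch(1) adds two terms to B() and leaves A() = 0. Substituting into (1) gives

```latex
\begin{align*}
\texttt{arch(1) tarch(1) garch(1)}:\quad
  \sigma_t^2 &= \gamma_0 + \alpha_{1,1}\epsilon_{t-1}^2
              + \alpha_{4,1}\epsilon_{t-1}^2(\epsilon_{t-1} > 0)
              + \alpha_{2,1}\sigma_{t-1}^2 \\
\texttt{abarch(1) sdgarch(1)}:\quad
  \sigma_t^2 &= \gamma_0 + \bigl(\alpha_{8,1}|\epsilon_{t-1}|
              + \alpha_{10,1}\sigma_{t-1}\bigr)^2
\end{align*}
```

Note that the B() terms enter (1) squared, so options that add to B() model the conditional standard deviation rather than the variance.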
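If the earch() or egarch() option is specified, the basic model fit is

$$y_t = \mathbf{x}_t\boldsymbol{\beta} + \sum_i \psi_i\, g(\sigma_{t-i}^2) + \mathrm{ARMA}(p, q) + \epsilon_t$$
$$\ln\mathrm{Var}(\epsilon_t) = \ln\sigma_t^2 = \gamma_0 + C(\ln\boldsymbol{\sigma}, \mathbf{z}) + A(\boldsymbol{\sigma}, \boldsymbol{\epsilon}) + B(\boldsymbol{\sigma}, \boldsymbol{\epsilon})^2 \tag{2}$$

where $z_t = \epsilon_t/\sigma_t$. A() and B() are given as above, but A() and B() now add to $\ln\sigma_t^2$ rather than
$\sigma_t^2$. (The options corresponding to A() and B() are rarely specified here.) C() is given by

    Option      Terms added to C()
    earch()     $C() = C() + \alpha_{11,1}z_{t-1} + \gamma_{11,1}(|z_{t-1}| - \sqrt{2/\pi})$
                $\qquad\quad + \alpha_{11,2}z_{t-2} + \gamma_{11,2}(|z_{t-2}| - \sqrt{2/\pi}) + \cdots$
    egarch()    $C() = C() + \alpha_{12,1}\ln\sigma_{t-1}^2 + \alpha_{12,2}\ln\sigma_{t-2}^2 + \cdots$
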

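Instead, if the parch(), tparch(), aparch(), nparch(), nparchk(), or pgarch() options are
specified, the basic model fit is

$$y_t = \mathbf{x}_t\boldsymbol{\beta} + \sum_i \psi_i\, g(\sigma_{t-i}^2) + \mathrm{ARMA}(p, q) + \epsilon_t$$
$$\{\mathrm{Var}(\epsilon_t)\}^{\varphi/2} = \sigma_t^{\varphi} = \gamma_0 + D(\boldsymbol{\sigma}, \boldsymbol{\epsilon}) + A(\boldsymbol{\sigma}, \boldsymbol{\epsilon}) + B(\boldsymbol{\sigma}, \boldsymbol{\epsilon})^2 \tag{3}$$

where $\varphi$ is a parameter to be estimated. A() and B() are given as above, but A() and B() now add
to $\sigma_t^{\varphi}$. (The options corresponding to A() and B() are rarely specified here.) D() is given by

    Option      Terms added to D()
    parch()     $D() = D() + \alpha_{13,1}\epsilon_{t-1}^{\varphi} + \alpha_{13,2}\epsilon_{t-2}^{\varphi} + \cdots$
    tparch()    $D() = D() + \alpha_{14,1}\epsilon_{t-1}^{\varphi}(\epsilon_{t-1} > 0) + \alpha_{14,2}\epsilon_{t-2}^{\varphi}(\epsilon_{t-2} > 0) + \cdots$
    aparch()    $D() = D() + \alpha_{15,1}(|\epsilon_{t-1}| + \gamma_{15,1}\epsilon_{t-1})^{\varphi} + \alpha_{15,2}(|\epsilon_{t-2}| + \gamma_{15,2}\epsilon_{t-2})^{\varphi} + \cdots$
    nparch()    $D() = D() + \alpha_{16,1}|\epsilon_{t-1} - \kappa_{16,1}|^{\varphi} + \alpha_{16,2}|\epsilon_{t-2} - \kappa_{16,2}|^{\varphi} + \cdots$
    nparchk()   $D() = D() + \alpha_{17,1}|\epsilon_{t-1} - \kappa_{17}|^{\varphi} + \alpha_{17,2}|\epsilon_{t-2} - \kappa_{17}|^{\varphi} + \cdots$
    pgarch()    $D() = D() + \alpha_{18,1}\sigma_{t-1}^{\varphi} + \alpha_{18,2}\sigma_{t-2}^{\varphi} + \cdots$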

Common models

    Common term                                                            Options to specify
    ------------------------------------------------------------------------------------------------
    ARCH (Engle 1982)                                                      arch()
    GARCH (Bollerslev 1986)                                                arch() garch()
    ARCH-in-mean (Engle, Lilien, and Robins 1987)                          archm arch() [garch()]
    GARCH with ARMA terms                                                  arch() garch() ar() ma()
    EGARCH (Nelson 1991)                                                   earch() egarch()
    TARCH, threshold ARCH (Zakoian 1994)                                   abarch() atarch() sdgarch()
    GJR, form of threshold ARCH (Glosten, Jagannathan, and Runkle 1993)    arch() tarch() [garch()]
    SAARCH, simple asymmetric ARCH (Engle 1990)                            arch() saarch() [garch()]
    PARCH, power ARCH (Higgins and Bera 1992)                              parch() [pgarch()]
    NARCH, nonlinear ARCH                                                  narch() [garch()]
    NARCHK, nonlinear ARCH with one shift                                  narchk() [garch()]
    A-PARCH, asymmetric power ARCH (Ding, Granger, and Engle 1993)         aparch() [pgarch()]
    NPARCH, nonlinear power ARCH                                           nparch() [pgarch()]
    ------------------------------------------------------------------------------------------------
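Two rows of the table can be sketched as commands, assuming the wpi1 example dataset. The AR and MA lag choices are illustrative assumptions, and whether these specifications suit other data is not implied.

```stata
* Assumption: the wpi1 example dataset (quarterly, already tsset) is available
webuse wpi1, clear

* "GARCH with ARMA terms": AR(1), MA terms at lags 1 and 4, GARCH(1,1) variance
arch D.ln_wpi, ar(1) ma(1 4) arch(1) garch(1)

* "EGARCH (Nelson 1991)": the same mean equation with an EGARCH variance equation
arch D.ln_wpi, ar(1) ma(1 4) earch(1) egarch(1)
```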


In all cases, you type

    arch depvar [indepvars] [, options]

where options are chosen from the table above. Each option requires that you specify as its argument
a numlist that specifies the lags to be included. For most ARCH models, that value will be 1. For
instance, to fit the classic first-order GARCH model on cpi, you would type
. arch cpi, arch(1) garch(1)

If you wanted to fit a first-order GARCH model of cpi on wage, you would type
. arch cpi wage, arch(1) garch(1)

If, for any of the options, you want first- and second-order terms, specify optionname(1/2). Specifying
garch(1) arch(1/2) would fit a GARCH model with first- and second-order ARCH terms. If you
specified arch(2), only the lag 2 term would be included.


Reading arch output


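The regression table reported by arch when using the normal distribution for the errors will appear
as

       op.depvar |    Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
    -------------+-------------------------------------------------------------
    depvar       |
              x1 |      #     ...
              x2 |
             L1. |      #     ...
             L2. |      #     ...
           _cons |      #     ...
    -------------+-------------------------------------------------------------
    ARCHM        |
          sigma2 |      #     ...
    -------------+-------------------------------------------------------------
    ARMA         |
              ar |
             L1. |      #     ...
              ma |
             L1. |      #     ...
    -------------+-------------------------------------------------------------
    HET          |
              z1 |      #     ...
              z2 |
             L1. |      #     ...
             L2. |      #     ...
    -------------+-------------------------------------------------------------
    ARCH         |
            arch |
             L1. |      #     ...
           garch |
             L1. |      #     ...
          aparch |
             L1. |      #     ...
            etc. |
           _cons |      #     ...
    -------------+-------------------------------------------------------------
    POWER        |
           power |      #     ...

Dividing lines separate "equations".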


The first one, two, or three equations report the mean model:

$$y_t = \mathbf{x}_t\boldsymbol{\beta} + \sum_i \psi_i\, g(\sigma_{t-i}^2) + \mathrm{ARMA}(p, q) + \epsilon_t$$

The first equation reports $\boldsymbol{\beta}$, and the equation will be named [depvar]; if you fit a model on d.cpi,
the first equation would be named [cpi]. In Stata, the coefficient on x1 in the above example could
be referred to as [depvar]_b[x1]. The coefficient on the lag 2 value of x2 would be referred to
as [depvar]_b[L2.x2]. Such notation would be used, for instance, in a later test command; see
[R] test.


The [ARCHM] equation reports the $\psi$ coefficients if your model includes ARCH-in-mean terms;
see options discussed under the Model 2 tab below. Most ARCH-in-mean models include only a
contemporaneous variance term, so the term $\sum_i \psi_i\, g(\sigma_{t-i}^2)$ becomes $\psi\sigma_t^2$. The coefficient $\psi$ will
be [ARCHM]_b[sigma2]. If your model includes lags of $\sigma_t^2$, the additional coefficients will be
[ARCHM]_b[L1.sigma2], and so on. If you specify a transformation g() (option archmexp()),
the coefficients will be [ARCHM]_b[sigma2ex], [ARCHM]_b[L1.sigma2ex], and so on. sigma2ex
refers to $g(\sigma_t^2)$, the transformed value of the conditional variance.
The [ARMA] equation reports the ARMA coefficients if your model includes them; see options discussed
under the Model 2 tab below. This equation includes one or two variables named ar and ma. In
later test statements, you could refer to the coefficient on the first lag of the autoregressive term
by typing [ARMA]_b[L1.ar] or simply [ARMA]_b[L.ar] (the L operator is assumed to be lag 1 if
you do not specify otherwise). The second lag on the moving-average term, if there were one, could
be referred to by typing [ARMA]_b[L2.ma].
The next one, two, or three equations report the variance model.
The [HET] equation reports the multiplicative heteroskedasticity if the model includes it. When
you fit such a model, you specify the variables (and their lags), determining the multiplicative
heteroskedasticity; after estimation, their coefficients are simply [HET]_b[op.varname].
The [ARCH] equation reports the ARCH, GARCH, etc., terms by referring to "variables" arch,
garch, and so on. For instance, if you specified arch(1) garch(1) when you fit the model, the
conditional variance is given by $\sigma_t^2 = \gamma_0 + \alpha_{1,1}\epsilon_{t-1}^2 + \alpha_{2,1}\sigma_{t-1}^2$. The coefficients would be named
[ARCH]_b[_cons] ($\gamma_0$), [ARCH]_b[L.arch] ($\alpha_{1,1}$), and [ARCH]_b[L.garch] ($\alpha_{2,1}$).
The [POWER] equation appears only if you are fitting a variance model in the form of (3) above; the
estimated $\varphi$ is the coefficient [POWER]_b[power].
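The equation and coefficient names can be put to use after estimation, as in the following sketch. The wpi1 example dataset and the particular hypothesis tested are assumptions made for illustration.

```stata
* Assumption: the wpi1 example dataset (quarterly, already tsset) is available
webuse wpi1, clear
arch D.ln_wpi, arch(1) garch(1)

* Replay the results, showing the name of each coefficient instead of statistics
arch, coeflegend

* Refer to individual coefficients by equation and name
display "alpha_{1,1} = " [ARCH]_b[L.arch]
display "alpha_{2,1} = " [ARCH]_b[L.garch]
display "gamma_0     = " [ARCH]_b[_cons]

* Wald test that the ARCH and GARCH coefficients sum to 1
test [ARCH]_b[L.arch] + [ARCH]_b[L.garch] = 1
```

The coeflegend replay is a convenient way to confirm a name before typing it into test or display.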
Also, if you use the distribution() option and specify either Student's t or the generalized
error distribution but do not specify the degree-of-freedom or shape parameter, then you will see
two additional rows in the table. The final row contains the estimated degree-of-freedom or shape
parameter. Immediately preceding the final row is a transformed version of the parameter that arch
used during estimation to ensure that the degree-of-freedom parameter is greater than two or that the
shape parameter is positive.
The naming convention for estimated ARCH, GARCH, etc., parameters is as follows (definitions for
parameters $\alpha_i$, $\gamma_i$, and $\kappa_i$ can be found in the tables for A(), B(), C(), and D() above):


    Option      1st parameter                        2nd parameter                          Common parameter
    ---------------------------------------------------------------------------------------------------------------
    arch()      $\alpha_1$ = [ARCH]_b[arch]
    garch()     $\alpha_2$ = [ARCH]_b[garch]
    saarch()    $\alpha_3$ = [ARCH]_b[saarch]
    tarch()     $\alpha_4$ = [ARCH]_b[tarch]
    aarch()     $\alpha_5$ = [ARCH]_b[aarch]         $\gamma_5$ = [ARCH]_b[aarch_e]
    narch()     $\alpha_6$ = [ARCH]_b[narch]         $\kappa_6$ = [ARCH]_b[narch_k]
    narchk()    $\alpha_7$ = [ARCH]_b[narch]         $\kappa_7$ = [ARCH]_b[narch_k]

    abarch()    $\alpha_8$ = [ARCH]_b[abarch]
    atarch()    $\alpha_9$ = [ARCH]_b[atarch]
    sdgarch()   $\alpha_{10}$ = [ARCH]_b[sdgarch]

    earch()     $\alpha_{11}$ = [ARCH]_b[earch]      $\gamma_{11}$ = [ARCH]_b[earch_a]
    egarch()    $\alpha_{12}$ = [ARCH]_b[egarch]

    parch()     $\alpha_{13}$ = [ARCH]_b[parch]                                             $\varphi$ = [POWER]_b[power]
    tparch()    $\alpha_{14}$ = [ARCH]_b[tparch]                                            $\varphi$ = [POWER]_b[power]
    aparch()    $\alpha_{15}$ = [ARCH]_b[aparch]     $\gamma_{15}$ = [ARCH]_b[aparch_e]     $\varphi$ = [POWER]_b[power]
    nparch()    $\alpha_{16}$ = [ARCH]_b[nparch]     $\kappa_{16}$ = [ARCH]_b[nparch_k]     $\varphi$ = [POWER]_b[power]
    nparchk()   $\alpha_{17}$ = [ARCH]_b[nparch]     $\kappa_{17}$ = [ARCH]_b[nparch_k]     $\varphi$ = [POWER]_b[power]
    pgarch()    $\alpha_{18}$ = [ARCH]_b[pgarch]                                            $\varphi$ = [POWER]_b[power]
    ---------------------------------------------------------------------------------------------------------------

Menu
ARCH/GARCH
    Statistics > Time series > ARCH/GARCH > ARCH and GARCH models
EARCH/EGARCH
    Statistics > Time series > ARCH/GARCH > Nelson's EGARCH model
ABARCH/ATARCH/SDGARCH
    Statistics > Time series > ARCH/GARCH > Threshold ARCH model
ARCH/TARCH/GARCH
    Statistics > Time series > ARCH/GARCH > GJR form of threshold ARCH model
ARCH/SAARCH/GARCH
    Statistics > Time series > ARCH/GARCH > Simple asymmetric ARCH model
PARCH/PGARCH
    Statistics > Time series > ARCH/GARCH > Power ARCH model
NARCH/GARCH
    Statistics > Time series > ARCH/GARCH > Nonlinear ARCH model
NARCHK/GARCH
    Statistics > Time series > ARCH/GARCH > Nonlinear ARCH model with one shift
APARCH/PGARCH
    Statistics > Time series > ARCH/GARCH > Asymmetric power ARCH model
NPARCH/PGARCH
    Statistics > Time series > ARCH/GARCH > Nonlinear power ARCH model

Description
arch fits regression models in which the volatility of a series varies through time. Usually, periods
of high and low volatility are grouped together. ARCH models estimate future volatility as a function of
prior volatility. To accomplish this, arch fits models of autoregressive conditional heteroskedasticity
(ARCH) by using conditional maximum likelihood. In addition to ARCH terms, models may include
multiplicative heteroskedasticity. Gaussian (normal), Student's t, and generalized error distributions
are supported.
Concerning the regression equation itself, models may also contain ARCH-in-mean and ARMA
terms.

Options


Model

noconstant; see [R] estimation options.


arch(numlist) specifies the ARCH terms (lags of $\epsilon_t^2$).
Specify arch(1) to include first-order terms, arch(1/2) to specify first- and second-order terms,
arch(1/3) to specify first-, second-, and third-order terms, etc. Terms may be omitted. Specify
arch(1/3 5) to specify terms with lags 1, 2, 3, and 5. All the options work this way.
arch() may not be specified with aarch(), narch(), narchk(), nparchk(), or nparch(), as
this would result in collinear terms.
garch(numlist) specifies the GARCH terms (lags of $\sigma_t^2$).
saarch(numlist) specifies the simple asymmetric ARCH terms. Adding these terms is one way to
make the standard ARCH and GARCH models respond asymmetrically to positive and negative
innovations. Specifying saarch() with arch() and garch() corresponds to the SAARCH model
of Engle (1990).
saarch() may not be specified with narch(), narchk(), nparchk(), or nparch(), as this
would result in collinear terms.
tarch(numlist) specifies the threshold ARCH terms. Adding these is another way to make the
standard ARCH and GARCH models respond asymmetrically to positive and negative innovations.
Specifying tarch() with arch() and garch() corresponds to one form of the GJR model (Glosten,
Jagannathan, and Runkle 1993).
tarch() may not be specified with tparch() or aarch(), as this would result in collinear terms.
aarch(numlist) specifies the lags of the two-parameter term $\alpha_i(|\epsilon_t| + \gamma_i\epsilon_t)^2$. This term provides the
same underlying form of asymmetry as including arch() and tarch(), but it is expressed in a
different way.
aarch() may not be specified with arch() or tarch(), as this would result in collinear terms.
narch(numlist) specifies the lags of the two-parameter term $\alpha_i(\epsilon_t - \kappa_i)^2$. This term allows the
minimum conditional variance to occur at a value of lagged innovations other than zero. For any
term specified at lag L, the minimum contribution to conditional variance of that lag occurs when
$\epsilon_{t-L} = \kappa_L$, that is, when the innovation at that lag equals the estimated constant $\kappa_L$.


narch() may not be specified with arch(), saarch(), narchk(), nparchk(), or nparch(),
as this would result in collinear terms.
narchk(numlist) specifies the lags of the two-parameter term $\alpha_i(\epsilon_t - \kappa)^2$; this is a variation of
narch() with $\kappa$ held constant for all lags.
narchk() may not be specified with arch(), saarch(), narch(), nparchk(), or nparch(),
as this would result in collinear terms.
abarch(numlist) specifies lags of the term $|\epsilon_t|$.
atarch(numlist) specifies lags of $|\epsilon_t|(\epsilon_t > 0)$, where $(\epsilon_t > 0)$ represents the indicator function
returning 1 when true and 0 when false. Like the TARCH terms, these ATARCH terms allow the
effect of unanticipated innovations to be asymmetric about zero.
sdgarch(numlist) specifies lags of $\sigma_t$. Combining atarch(), abarch(), and sdgarch() produces
the model by Zakoian (1994) that the author called the TARCH model. The acronym TARCH,
however, refers to any model using thresholding to obtain asymmetry.
earch(numlist) specifies lags of the two-parameter term $\alpha z_t + \gamma(|z_t| - \sqrt{2/\pi})$. These terms represent
the influence of news (lagged innovations) in Nelson's (1991) EGARCH model. For these terms,
$z_t = \epsilon_t/\sigma_t$, and arch assumes $z_t \sim N(0, 1)$. Nelson derived the general form of an EGARCH model
for any assumed distribution and performed estimation assuming a generalized error distribution
(GED). See Hamilton (1994) for a derivation where $z_t$ is assumed normal. The $z_t$ terms can be
parameterized in either of these two equivalent ways. arch uses Nelson's original parameterization;
see Hamilton (1994) for an equivalent alternative.
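The constant $\sqrt{2/\pi}$ in the earch() term is the expected absolute value of a standard normal variate, so the second part of the term has mean zero under the normality assumption stated above. A short derivation, with $\phi(z)$ denoting the standard normal density (notation introduced here for illustration only):

```latex
\begin{align*}
E|z_t| &= \int_{-\infty}^{\infty} |z|\,\phi(z)\,dz
        = 2\int_{0}^{\infty} z\,\frac{1}{\sqrt{2\pi}}\,e^{-z^2/2}\,dz \\
       &= \frac{2}{\sqrt{2\pi}}\Bigl[-e^{-z^2/2}\Bigr]_{0}^{\infty}
        = \frac{2}{\sqrt{2\pi}} = \sqrt{2/\pi} \approx 0.798
\end{align*}
```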
egarch(numlist) specifies lags of $\ln(\sigma_t^2)$.

For the following options, the model is parameterized in terms of $h(\epsilon_t)^{\varphi}$ and $\sigma_t^{\varphi}$. One $\varphi$ is estimated,
even when more than one option is specified.
parch(numlist) specifies lags of $|\epsilon_t|^{\varphi}$. parch() combined with pgarch() corresponds to the class
of nonlinear models of conditional variance suggested by Higgins and Bera (1992).
tparch(numlist) specifies lags of $(\epsilon_t > 0)|\epsilon_t|^{\varphi}$, where $(\epsilon_t > 0)$ represents the indicator function
returning 1 when true and 0 when false. As with tarch(), tparch() specifies terms that allow
for a differential impact of good (positive innovations) and bad (negative innovations) news
for lags specified by numlist.
tparch() may not be specified with tarch(), as this would result in collinear terms.
aparch(numlist) specifies lags of the two-parameter term $\alpha(|\epsilon_t| + \gamma\epsilon_t)^{\varphi}$. This asymmetric power
ARCH model, A-PARCH, was proposed by Ding, Granger, and Engle (1993) and corresponds to
a Box–Cox function in the lagged innovations. The authors fit the original A-PARCH model on
more than 16,000 daily observations of the Standard and Poor's 500, and for good reason. As the
number of parameters and the flexibility of the specification increase, more data are required to
estimate the parameters of the conditional heteroskedasticity. See Ding, Granger, and Engle (1993)
for a discussion of how seven popular ARCH models nest within the A-PARCH model.
When $\gamma$ goes to 1, the full term goes to zero for many observations and can then be numerically
unstable.
nparch(numlist) specifies lags of the two-parameter term $\alpha|\epsilon_t - \kappa_i|^{\varphi}$.
nparch() may not be specified with arch(), saarch(), narch(), narchk(), or nparchk(),
as this would result in collinear terms.
nparchk(numlist) specifies lags of the two-parameter term $\alpha|\epsilon_t - \kappa|^{\varphi}$; this is a variation of nparch()
with $\kappa$ held constant for all lags. This is the direct analog of narchk(), except for the power
of $\varphi$. nparchk() corresponds to an extended form of the model of Higgins and Bera (1992) as
presented by Bollerslev, Engle, and Nelson (1994). nparchk() would typically be combined with
the pgarch() option.
nparchk() may not be specified with arch(), saarch(), narch(), narchk(), or nparch(),
as this would result in collinear terms.

pgarch(numlist) specifies lags of $\sigma_t^{\varphi}$.


constraints(constraints), collinear; see [R] estimation options.

Model 2

archm specifies that an ARCH-in-mean term be included in the specification of the mean equation. This
term allows the expected value of depvar to depend on the conditional variance. ARCH-in-mean is
most commonly used in evaluating financial time series when a theory supports a tradeoff between
asset risk and return. By default, no ARCH-in-mean terms are included in the model.
archm specifies that the contemporaneous expected conditional variance be included in the mean
equation. For example, typing
. arch y x, archm arch(1)

specifies the model

$$y_t = \beta_0 + \beta_1 x_t + \psi\sigma_t^2 + \epsilon_t$$
$$\sigma_t^2 = \gamma_0 + \gamma\epsilon_{t-1}^2$$

archmlags(numlist) is an expansion of archm that includes lags of the conditional variance $\sigma_t^2$ in
the mean equation. To specify a contemporaneous and once-lagged variance, specify either archm
archmlags(1) or archmlags(0/1).
archmexp(exp) applies the transformation in exp to any ARCH-in-mean terms in the model. The
expression should contain an X wherever a value of the conditional variance is to enter the expression.
This option can be used to produce the commonly used ARCH-in-mean of the conditional standard
deviation. With the example from archm, typing
. arch y x, archm arch(1) archmexp(sqrt(X))

specifies the mean equation $y_t = \beta_0 + \beta_1 x_t + \psi\sigma_t + \epsilon_t$. Alternatively, typing

. arch y x, archm arch(1) archmexp(1/sqrt(X))

specifies $y_t = \beta_0 + \beta_1 x_t + \psi/\sigma_t + \epsilon_t$.
arima(#p,#d,#q) is an alternative, shorthand notation for specifying autoregressive models in the
dependent variable. The dependent variable and any independent variables are differenced #d times,
1 through #p lags of autocorrelations are included, and 1 through #q lags of moving averages are
included. For example, the specification
. arch y, arima(2,1,3)

is equivalent to
. arch D.y, ar(1/2) ma(1/3)

The former is easier to write for classic ARIMA models of the mean equation, but it is not nearly
as expressive as the latter. If gaps in the AR or MA lags are to be modeled, or if different operators
are to be applied to independent variables, the latter syntax is required.
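The equivalence of the two notations can be checked with a sketch like the following, which fits the same model both ways and displays the two log likelihoods side by side. The wpi1 example dataset, the ARIMA(1,1,1) orders, the GARCH(1,1) variance equation, and the scalar name ll_short are assumptions for illustration.

```stata
* Assumption: the wpi1 example dataset (quarterly, already tsset) is available
webuse wpi1, clear

* Shorthand notation: ARIMA(1,1,1) mean equation with a GARCH(1,1) variance
arch ln_wpi, arima(1,1,1) arch(1) garch(1)
scalar ll_short = e(ll)

* Explicit notation: difference the variable and list the AR and MA lags
arch D.ln_wpi, ar(1) ma(1) arch(1) garch(1)

* The two log likelihoods should agree if the specifications are equivalent
display "shorthand: " ll_short "   explicit: " e(ll)
```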


ar(numlist) specifies the autoregressive terms of the structural model disturbance to be included in
the model. For example, ar(1/3) specifies that lags 1, 2, and 3 of the structural disturbance be
included in the model. ar(1,4) specifies that lags 1 and 4 be included, possibly to account for
quarterly effects.
If the model does not contain regressors, these terms can also be considered autoregressive terms
for the dependent variable; see [TS] arima.
ma(numlist) specifies the moving-average terms to be included in the model. These are the terms for
the lagged innovations or white-noise disturbances.

Model 3

 
distribution(dist [#]) specifies the distribution to assume for the error term. dist may be
gaussian, normal, t, or ged. gaussian and normal are synonyms, and # cannot be specified
with them.
If distribution(t) is specified, arch assumes that the errors follow Student's t distribution,
and the degree-of-freedom parameter is estimated along with the other parameters of the model.
If distribution(t #) is specified, then arch uses Student's t distribution with # degrees of
freedom. # must be greater than 2.
If distribution(ged) is specified, arch assumes that the errors have a generalized error
distribution, and the shape parameter is estimated along with the other parameters of the model.
If distribution(ged #) is specified, then arch uses the generalized error distribution with
shape parameter #. # must be positive. The generalized error distribution is identical to the normal
distribution when the shape parameter equals 2.
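The forms of distribution() described above might be used as follows. This is a minimal sketch that assumes the WPI dataset used in the examples later in this entry; the fixed value of 10 degrees of freedom is an arbitrary illustration, not a recommendation.

```stata
use http://www.stata-press.com/data/r13/wpi1, clear

* Student's t errors, degrees of freedom estimated with the other parameters
arch D.ln_wpi, arch(1) garch(1) distribution(t)

* Student's t errors with degrees of freedom fixed at 10 (must exceed 2)
arch D.ln_wpi, arch(1) garch(1) distribution(t 10)

* Generalized error distribution, shape parameter estimated
arch D.ln_wpi, arch(1) garch(1) distribution(ged)
```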
het(varlist) specifies that varlist be included in the specification of the conditional variance. varlist
may contain time-series operators. This varlist enters the variance specification collectively as
multiplicative heteroskedasticity; see Judge et al. (1985). If het() is not specified, the model will
not contain multiplicative heteroskedasticity.
Assume that the conditional variance depends on variables x and w and has an ARCH(1) component.
We request this specification by using the het(x w) arch(1) options, and this corresponds to the
conditional-variance model

$$\sigma_t^2 = \exp(\lambda_0 + \lambda_1 x_t + \lambda_2 w_t) + \alpha\,\epsilon_{t-1}^2$$

Multiplicative heteroskedasticity enters differently with an EGARCH model because the variance is
already specified in logs. For the het(x w) earch(1) egarch(1) options, the variance model is

$$\ln(\sigma_t^2) = \lambda_0 + \lambda_1 x_t + \lambda_2 w_t + \alpha z_{t-1} + \gamma\,(|z_{t-1}| - \sqrt{2/\pi}) + \delta \ln(\sigma_{t-1}^2)$$
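The two variance specifications just described could be requested as sketched below. The variables y, x, and w are the hypothetical names used in the text, so these lines assume a tsset dataset that contains them.

```stata
* Multiplicative heteroskedasticity in x and w plus an ARCH(1) term
arch y, het(x w) arch(1)

* The same regressors in an EGARCH(1,1) variance equation, where they enter the log variance
arch y, het(x w) earch(1) egarch(1)
```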

savespace conserves memory by retaining only those variables required for estimation. The original
dataset is restored after estimation. This option is rarely used and should be specified only if
there is insufficient memory to fit a model without the option. arch requires considerably more
temporary storage during estimation than most estimation commands in Stata.

Priming

arch0(cond_method) is a rarely used option that specifies how to compute the conditioning (presample
or priming) values for $\sigma_t^2$ and $\epsilon_t^2$. In the presample period, it is assumed that $\sigma_t^2 = \epsilon_t^2$ and that this
value is constant. If arch0() is not specified, the priming values are computed as the expected
unconditional variance given the current estimates of the $\beta$ coefficients and any ARMA parameters.


arch0(xb), the default, specifies that the priming values are the expected unconditional variance
of the model, which is $\sum_{t=1}^{T} \hat\epsilon_t^2 / T$, where $\hat\epsilon_t$ is computed from the mean equation and any
ARMA terms.
arch0(xb0) specifies that the priming values are the estimated variance of the residuals from an
OLS estimate of the mean equation.
arch0(xbwt) specifies that the priming values are the weighted sum of the $\hat\epsilon_t^2$ from the current
conditional mean equation (and ARMA terms) that places more weight on estimates of $\epsilon_t^2$ at the
beginning of the sample.
arch0(xb0wt) specifies that the priming values are the weighted sum of the $\hat\epsilon_t^2$ from an OLS
estimate of the mean equation (and ARMA terms) that places more weight on estimates of $\epsilon_t^2$
at the beginning of the sample.
arch0(zero) specifies that the priming values are 0. Unlike the priming values for ARIMA
models, 0 is generally not a consistent estimate of the presample conditional variance or squared
innovations.
arch0(#) specifies that $\sigma_t^2 = \epsilon_t^2 = \#$ for any specified nonnegative #. Thus arch0(0) is equivalent
to arch0(zero).
arma0(cond_method) is a rarely used option that specifies how the $\epsilon_t$ values are initialized at the
beginning of the sample for the ARMA component, if the model has one. This option has an effect
only when AR or MA terms are included in the model (the ar(), ma(), or arima() options
specified).
arma0(zero), the default, specifies that all priming values of $\epsilon_t$ be taken as 0. This fits the model
over the entire requested sample and takes $\epsilon_t$ as its expected value of 0 for all lags required
by the ARMA terms; see Judge et al. (1985).
arma0(p), arma0(q), and arma0(pq) specify that estimation begin after priming the recursions
for a certain number of observations. p specifies that estimation begin after the pth observation
in the sample, where p is the maximum AR lag in the model; q specifies that estimation begin
after the qth observation in the sample, where q is the maximum MA lag in the model; and pq
specifies that estimation begin after the (p + q)th observation in the sample.
During the priming period, the recursions necessary to generate predicted disturbances are performed,
but results are used only to initialize preestimation values of $\epsilon_t$. To understand the definition
of preestimation, say that you fit a model in 10/100. If the model is specified with ar(1,2),
preestimation refers to observations 10 and 11.
The ARCH terms $\sigma_t^2$ and $\epsilon_t^2$ are also updated over these observations. Any required lags of $\epsilon_t$
before the priming period are taken to be their expected value of 0, and $\epsilon_t^2$ and $\sigma_t^2$ take the
values specified in arch0().
arma0(#) specifies that the presample values of $\epsilon_t$ are to be taken as # for all lags required by
the ARMA terms. Thus arma0(0) is equivalent to arma0(zero).
condobs(#) is a rarely used option that specifies a fixed number of conditioning observations at
the start of the sample. Over these priming observations, the recursions necessary to generate
predicted disturbances are performed, but only to initialize preestimation values of $\epsilon_t$, $\epsilon_t^2$, and $\sigma_t^2$.
Any required lags of $\epsilon_t$ before the initialization period are taken to be their expected value of 0
(or the value specified in arma0()), and required values of $\epsilon_t^2$ and $\sigma_t^2$ assume the values specified
by arch0(). condobs() can be used if conditioning observations are desired for the lags in the
ARCH terms of the model. If arma0() is also specified, the maximum number of conditioning
observations required by arma0() and condobs(#) is used.
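The priming options above are rarely needed, but a sketch of how they are written may help. The example assumes the WPI dataset from the examples later in this entry; the value 4 in condobs() is arbitrary, and nothing here implies that these settings improve the fit.

```stata
use http://www.stata-press.com/data/r13/wpi1, clear

* Prime the variance recursion with the residual variance from an OLS fit of the mean equation
arch D.ln_wpi, arch(1) garch(1) arch0(xb0)

* Begin estimation after (max AR lag + max MA lag) observations have primed the ARMA recursion
arch D.ln_wpi, ar(1) ma(1 4) arch(1) garch(1) arma0(pq)

* Reserve a fixed number of conditioning observations at the start of the sample
arch D.ln_wpi, arch(1) garch(1) condobs(4)
```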


SE/Robust

vce(vcetype) specifies the type of standard error reported, which includes types that are robust to
some kinds of misspecification (robust) and that are derived from asymptotic theory (oim, opg);
see [R] vce option.
For ARCH models, the robust or quasimaximum likelihood estimates (QMLE) of variance are robust
to symmetric nonnormality in the disturbances. The robust variance estimates generally are not
robust to functional misspecification of the mean equation; see Bollerslev and Wooldridge (1992).
The robust variance estimates computed by arch are based on the full Huber/White/sandwich
formulation, as discussed in [P] robust. Many other software packages report robust estimates
that set some terms to their expectations of zero (Bollerslev and Wooldridge 1992), which saves
them from calculating second derivatives of the log-likelihood function.
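A minimal sketch of requesting the robust (QMLE) variance estimate, assuming the WPI dataset used in the examples later in this entry. The first fit is included only so that the two sets of standard errors can be compared; the coefficient estimates themselves are not affected by vce().

```stata
use http://www.stata-press.com/data/r13/wpi1, clear

* Default fit (standard errors are labeled OPG in the output)
arch D.ln_wpi, arch(1) garch(1)

* Same model with the full Huber/White/sandwich variance estimate
arch D.ln_wpi, arch(1) garch(1) vce(robust)
```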

Reporting

level(#); see [R] estimation options.


detail specifies that a detailed list of any gaps in the series be reported, including gaps due to
missing observations or missing data for the dependent variable or independent variables.
nocnsreport; see [R] estimation options.
display_options: vsquish, cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch;
see [R] estimation options.

Maximization

 
maximize_options: difficult, technique(algorithm_spec), iterate(#), [no]log, trace,
gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#),
gtolerance(#), nrtolerance(#), nonrtolerance, and from(init_specs); see [R] maximize
for all options except gtolerance(), and see below for information on gtolerance().
These options are often more important for ARCH models than for other maximum likelihood
models because of convergence problems associated with ARCH models; ARCH model likelihoods
are notoriously difficult to maximize.
Setting technique() to something other than the default or BHHH changes the vcetype to vce(oim).
The following options are all related to maximization and are either particularly important in fitting
ARCH models or not available for most other estimators.
gtolerance(#) specifies the tolerance for the gradient relative to the coefficients. When
$|g_i b_i| \le$ gtolerance() for all parameters $b_i$ and the corresponding elements of the
gradient $g_i$, the gradient tolerance criterion is met. The default gradient tolerance for arch
is gtolerance(.05).
gtolerance(999) may be specified to disable the gradient criterion. If the optimizer becomes
stuck with repeated "(backed up)" messages, the gradient probably still contains substantial
values, but an uphill direction cannot be found for the likelihood. With this option, results can
often be obtained, but whether the global maximum likelihood has been found is unclear.
When the maximization is not going well, it is also possible to set the maximum number of
iterations (see [R] maximize) to the point where the optimizer appears to be stuck and to inspect
the estimation results at that point.
from(init_specs) specifies the initial values of the coefficients. ARCH models may be sensitive
to initial values and may have coefficient values that correspond to local maximums. The
default starting values are obtained via a series of regressions, producing results that, on
the basis of asymptotic theory, are consistent for the $\beta$ and ARMA parameters and generally
reasonable for the rest. Nevertheless, these values may not always be feasible in that the
likelihood function cannot be evaluated at the initial values arch first chooses. In such cases,
the estimation is restarted with ARCH and ARMA parameters initialized to zero. It is possible,
but unlikely, that even these values will be infeasible and that you will have to supply initial
values yourself.
The standard syntax for from() accepts a matrix, a list of values, or coefficient name/value
pairs; see [R] maximize. arch also allows the following:
from(archb0) sets the starting value for all the ARCH/GARCH/... parameters in the
conditional-variance equation to 0.
from(armab0) sets the starting value for all ARMA parameters in the model to 0.
from(archb0 armab0) sets the starting value for all ARCH/GARCH/... and ARMA parameters
to 0.
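The forms of from() listed above can be sketched as follows, assuming the WPI dataset from the examples later in this entry. The matrix name b0 is a hypothetical name chosen for this illustration, and reusing a saved coefficient vector is only one possible way to supply starting values.

```stata
use http://www.stata-press.com/data/r13/wpi1, clear

* Start all ARCH/GARCH and ARMA parameters at 0
arch D.ln_wpi, ar(1) ma(1 4) arch(1) garch(1) from(archb0 armab0)

* Save the fitted coefficient vector and reuse it as starting values for a refit
matrix b0 = e(b)
arch D.ln_wpi, ar(1) ma(1 4) arch(1) garch(1) from(b0) vce(robust)
```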
The following option is available with arch but is not shown in the dialog box:
coeflegend; see [R] estimation options.

Remarks and examples


The volatility of a series is not constant through time; periods of relatively low volatility and periods
of relatively high volatility tend to be grouped together. This is a commonly observed characteristic
of economic time series and is even more pronounced in many frequently sampled financial series.
ARCH models seek to estimate this time-dependent volatility as a function of observed prior volatility.
Sometimes the model of volatility is of more interest than the model of the conditional mean. As
implemented in arch, the volatility model may also include regressors to account for a structural
component in the volatility, usually referred to as multiplicative heteroskedasticity.
ARCH models were introduced by Engle (1982) in a study of inflation rates, and there has since
been a barrage of proposed parametric and nonparametric specifications of autoregressive conditional
heteroskedasticity. Overviews of the literature can be found in Bollerslev, Engle, and Nelson (1994) and
Bollerslev, Chou, and Kroner (1992). Introductions to basic ARCH models appear in many general
econometrics texts, including Davidson and MacKinnon (1993, 2004), Greene (2012), Kmenta (1997),
Stock and Watson (2011), and Wooldridge (2013). Harvey (1989) and Enders (2004) provide introductions to ARCH in the larger context of econometric time-series modeling, and Hamilton (1994) gives
considerably more detail in the same context. Becketti (2013, chap. 8) provides a simple introduction
to ARCH modeling with an emphasis on how to use Stata's arch command.

arch fits models of autoregressive conditional heteroskedasticity (ARCH, GARCH, etc.) using conditional
maximum likelihood. By "conditional", we mean that the likelihood is computed based on
an assumed or estimated set of priming values for the squared innovations $\epsilon_t^2$ and variances $\sigma_t^2$ prior
to the estimation sample; see Hamilton (1994) or Bollerslev (1986). Sometimes more conditioning is
done on the first a, g, or a + g observations in the sample, where a is the maximum ARCH term lag
and g is the maximum GARCH term lag (or the maximum lags from the other ARCH family terms).
The original ARCH model proposed by Engle (1982) modeled the variance of a regression model's
disturbances as a linear function of lagged values of the squared regression disturbances. We can
write an ARCH(m) model as

$$y_t = \mathbf{x}_t\boldsymbol\beta + \epsilon_t \qquad \text{(conditional mean)}$$
$$\sigma_t^2 = \gamma_0 + \gamma_1\epsilon_{t-1}^2 + \gamma_2\epsilon_{t-2}^2 + \cdots + \gamma_m\epsilon_{t-m}^2 \qquad \text{(conditional variance)}$$

where

$\epsilon_t^2$ is the squared residuals (or innovations)
$\gamma_i$ are the ARCH parameters

The ARCH model has a specification for both the conditional mean and the conditional variance,
and the variance is a function of the size of prior unanticipated innovations $\epsilon_t^2$. This model was
generalized by Bollerslev (1986) to include lagged values of the conditional variance, giving a GARCH
model. The GARCH(m, k) model is written as

$$y_t = \mathbf{x}_t\boldsymbol\beta + \epsilon_t$$
$$\sigma_t^2 = \gamma_0 + \gamma_1\epsilon_{t-1}^2 + \gamma_2\epsilon_{t-2}^2 + \cdots + \gamma_m\epsilon_{t-m}^2 + \delta_1\sigma_{t-1}^2 + \delta_2\sigma_{t-2}^2 + \cdots + \delta_k\sigma_{t-k}^2$$

where

$\gamma_i$ are the ARCH parameters
$\delta_i$ are the GARCH parameters
In his pioneering work, Engle (1982) assumed that the error term, $\epsilon_t$, followed a Gaussian
(normal) distribution: $\epsilon_t \sim N(0, \sigma_t^2)$. However, as Mandelbrot (1963) and many others have noted,
the distribution of stock returns appears to be leptokurtotic, meaning that extreme stock returns are
more frequent than would be expected if the returns were normally distributed. Researchers have
therefore assumed other distributions that can have fatter tails than the normal distribution; arch
allows you to fit models assuming the errors follow Student's t distribution or the generalized error
distribution. The t distribution has fatter tails than the normal distribution; as the degree-of-freedom
parameter approaches infinity, the t distribution converges to the normal distribution. The generalized
error distribution's tails are fatter than the normal distribution's when the shape parameter is less than
two and are thinner than the normal distribution's when the shape parameter is greater than two.
The GARCH model of conditional variance can be considered an ARMA process in the squared
innovations, although not in the variances as the equations might seem to suggest; see Hamilton (1994).
Specifically, the standard GARCH model implies that the squared innovations result from

$$\epsilon_t^2 = \gamma_0 + (\gamma_1 + \delta_1)\epsilon_{t-1}^2 + (\gamma_2 + \delta_2)\epsilon_{t-2}^2 + \cdots + (\gamma_k + \delta_k)\epsilon_{t-k}^2 + w_t - \delta_1 w_{t-1} - \delta_2 w_{t-2} - \delta_3 w_{t-3}$$

where

$w_t = \epsilon_t^2 - \sigma_t^2$
$w_t$ is a white-noise process that is fundamental for $\epsilon_t^2$
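For the GARCH(1,1) case, the ARMA representation can be checked by direct substitution. The sketch below writes $\gamma$ for the ARCH parameters and $\delta$ for the GARCH parameters, uses only the definition $w_t = \epsilon_t^2 - \sigma_t^2$, and replaces each variance by $\epsilon^2 - w$; it is offered as an illustration rather than a general proof.

```latex
\begin{align*}
\sigma_t^2 &= \gamma_0 + \gamma_1\epsilon_{t-1}^2 + \delta_1\sigma_{t-1}^2 \\
\epsilon_t^2 - w_t &= \gamma_0 + \gamma_1\epsilon_{t-1}^2 + \delta_1\left(\epsilon_{t-1}^2 - w_{t-1}\right) \\
\epsilon_t^2 &= \gamma_0 + (\gamma_1 + \delta_1)\,\epsilon_{t-1}^2 + w_t - \delta_1 w_{t-1}
\end{align*}
```

The last line is an ARMA(1,1) in the squared innovations, with autoregressive coefficient $\gamma_1 + \delta_1$ and moving-average coefficient $-\delta_1$.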

One of the primary benefits of the GARCH specification is its parsimony in identifying the conditional
variance. As with ARIMA models, the ARMA specification in GARCH allows the conditional variance
to be modeled with fewer parameters than with an ARCH specification alone. Empirically, many series
with a conditionally heteroskedastic disturbance have been adequately modeled with a GARCH(1,1)
specification.
An ARMA process in the disturbances can easily be added to the mean equation. For example, the
mean equation can be written with an ARMA(1, 1) disturbance as

$$y_t = \mathbf{x}_t\boldsymbol\beta + \rho\,(y_{t-1} - \mathbf{x}_{t-1}\boldsymbol\beta) + \theta\epsilon_{t-1} + \epsilon_t$$

with an obvious generalization to ARMA(p, q) by adding terms; see [TS] arima for more discussion
of this specification. This change affects only the conditional-variance specification in that $\epsilon_t^2$ now
results from a different specification of the conditional mean.


Much of the literature on ARCH models focuses on alternative specifications of the variance equation.
arch allows many of these specifications to be requested using the saarch() through pgarch()
options, which imply that one or more terms may be changed or added to the specification of the
variance equation.
These alternative specifications also address asymmetry. Both the ARCH and GARCH specifications
imply a symmetric impact of innovations. Whether an innovation $\epsilon_t$ is positive or negative makes
no difference to the expected variance $\sigma_t^2$ in the ensuing periods; only the size of the innovation
matters: good news and bad news have the same effect. Many theories, however, suggest that positive
and negative innovations should vary in their impact. For risk-averse investors, a large unanticipated
drop in the market is more likely to lead to higher volatility than a large unanticipated increase (see
Black [1976], Nelson [1991]). saarch(), tarch(), aarch(), abarch(), earch(), aparch(), and
tparch() allow various specifications of asymmetric effects.
narch(), narchk(), nparch(), and nparchk() imply an asymmetric impact of a specific form.
All the models considered so far have a minimum conditional variance when the lagged innovations
are all zero. "No news is good news" when it comes to keeping the conditional variance small.
narch(), narchk(), nparch(), and nparchk() also have a symmetric response to innovations,
but they are not centered at zero. The entire news-response function (response to innovations) is
shifted horizontally so that minimum variance lies at some specific positive or negative value for prior
innovations.
ARCH-in-mean models allow the conditional variance of the series to influence the conditional
mean. This is particularly convenient for modeling the risk-return relationship in financial series; the
riskier an investment, with all else equal, the lower its expected return. ARCH-in-mean models modify
the specification of the conditional mean equation to be

$$y_t = \mathbf{x}_t\boldsymbol\beta + \psi\sigma_t^2 + \epsilon_t \qquad \text{(ARCH-in-mean)}$$

Although this linear form in the current conditional variance has dominated the literature, arch allows
the conditional variance to enter the mean equation through a nonlinear transformation g() and for
this transformed term to be included contemporaneously or lagged.

$$y_t = \mathbf{x}_t\boldsymbol\beta + \psi_0\, g(\sigma_t^2) + \psi_1\, g(\sigma_{t-1}^2) + \psi_2\, g(\sigma_{t-2}^2) + \cdots + \epsilon_t$$

Square root is the most commonly used g() transformation because researchers want to include a
linear term for the conditional standard deviation, but any transform g() is allowed.

Example 1: ARCH model


Consider a simple model of the U.S. Wholesale Price Index (WPI) (Enders 2004, 87-93), which
we also consider in [TS] arima. The data are quarterly over the period 1960q1 through 1990q4.
In [TS] arima, we fit a model of the continuously compounded rate of change in the WPI,
$\ln(\mathrm{WPI}_t) - \ln(\mathrm{WPI}_{t-1})$. The graph of the differenced series (see [TS] arima) clearly shows periods
of high volatility and other periods of relative tranquility. This makes the series a good candidate for
ARCH modeling. Indeed, price indices have been a common target of ARCH models. Engle (1982)
presented the original ARCH formulation in an analysis of U.K. inflation rates.
First, we fit a constant-only model by OLS and test ARCH effects by using Engle's Lagrange
multiplier test (estat archlm).

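. use http://www.stata-press.com/data/r13/wpi1
. regress D.ln_wpi

      Source |       SS       df       MS              Number of obs =     123
-------------+------------------------------           F(  0,   122) =    0.00
       Model |           0     0           .           Prob > F      =       .
    Residual |   .02521709   122  .000206697           R-squared     =  0.0000
-------------+------------------------------           Adj R-squared =  0.0000
       Total |   .02521709   122  .000206697           Root MSE      =  .01438

------------------------------------------------------------------------------
    D.ln_wpi |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   .0108215   .0012963     8.35   0.000     .0082553    .0133878
------------------------------------------------------------------------------

. estat archlm, lags(1)
LM test for autoregressive conditional heteroskedasticity (ARCH)

---------------------------------------------------------------------------
    lags(p)  |          chi2               df                 Prob > chi2
-------------+-------------------------------------------------------------
       1     |          8.366               1                   0.0038
---------------------------------------------------------------------------
         H0: no ARCH effects      vs.  H1: ARCH(p) disturbance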

Because the LM test shows a p-value of 0.0038, which is well below 0.05, we reject the null hypothesis
of no ARCH(1) effects. Thus we can further estimate the ARCH(1) parameter by specifying arch(1).
See [R] regress postestimation time series for more information on Engle's LM test.
The first-order generalized ARCH model (GARCH, Bollerslev 1986) is the most commonly used
specification for the conditional variance in empirical work and is typically written GARCH(1, 1). We
can estimate a GARCH(1, 1) process for the log-differenced series by typing
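. arch D.ln_wpi, arch(1) garch(1)
(setting optimization to BHHH)
Iteration 0:   log likelihood =  355.23458
Iteration 1:   log likelihood =  365.64586
 (output omitted)
Iteration 10:  log likelihood =  373.23397

ARCH family regression

Sample: 1960q2 - 1990q4                         Number of obs      =       123
Distribution: Gaussian                          Wald chi2(.)       =         .
Log likelihood =   373.234                      Prob > chi2        =         .

------------------------------------------------------------------------------
             |                 OPG
    D.ln_wpi |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ln_wpi       |
       _cons |   .0061167   .0010616     5.76   0.000     .0040361    .0081974
-------------+----------------------------------------------------------------
ARCH         |
        arch |
         L1. |   .4364123   .2437428     1.79   0.073    -.0413147    .9141394
             |
       garch |
         L1. |   .4544606   .1866606     2.43   0.015     .0886127    .8203086
             |
       _cons |   .0000269   .0000122     2.20   0.028     2.97e-06    .0000508
------------------------------------------------------------------------------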

We have estimated the ARCH(1) parameter to be 0.436 and the GARCH(1) parameter to be 0.454, so
our fitted GARCH(1, 1) model is

$$y_t = 0.0061 + \epsilon_t$$
$$\sigma_t^2 = 0.436\,\epsilon_{t-1}^2 + 0.454\,\sigma_{t-1}^2$$

where $y_t = \ln(\mathrm{wpi}_t) - \ln(\mathrm{wpi}_{t-1})$.


The model Wald test and probability are both reported as missing (.). By convention, Stata reports
the model test for the mean equation. Here and fairly often for ARCH models, the mean equation
consists only of a constant, and there is nothing to test.
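One quantity that can be read off a fitted GARCH(1,1) model is the implied unconditional variance, which by a standard result equals the variance-equation constant divided by one minus the sum of the ARCH and GARCH coefficients (when that sum is below 1). This formula is not stated in this entry; the sketch below simply applies it to the stored coefficients, and the label uncond_var is a hypothetical name.

```stata
use http://www.stata-press.com/data/r13/wpi1, clear
arch D.ln_wpi, arch(1) garch(1)

* Implied unconditional variance: constant / (1 - ARCH coefficient - GARCH coefficient)
display [ARCH]_b[_cons] / (1 - [ARCH]_b[L.arch] - [ARCH]_b[L.garch])

* The same quantity with a delta-method standard error
nlcom (uncond_var: [ARCH]_b[_cons] / (1 - [ARCH]_b[L.arch] - [ARCH]_b[L.garch]))
```

With the coefficients reported above, this is roughly .0000269/(1 - .436 - .454), or about .00025, which is of the same order as the residual mean square of .000207 from the OLS fit.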

Example 2: ARCH model with ARMA process


We can retain the GARCH(1, 1) specification for the conditional variance and model the mean as
an ARMA process with AR(1) and MA(1) terms as well as a fourth-lag MA term to control for quarterly
seasonal effects by typing
. arch D.ln_wpi, ar(1) ma(1 4) arch(1) garch(1)
(setting optimization to BHHH)
Iteration 0:   log likelihood =   380.9997
Iteration 1:   log likelihood =  388.57823
Iteration 2:   log likelihood =  391.34143
Iteration 3:   log likelihood =  396.36991
Iteration 4:   log likelihood =  398.01098
(switching optimization to BFGS)
Iteration 5:   log likelihood =  398.23668
BFGS stepping has contracted, resetting BFGS Hessian (0)
Iteration 6:   log likelihood =  399.21497
Iteration 7:   log likelihood =  399.21537  (backed up)
 (output omitted)
(switching optimization to BHHH)
Iteration 15:  log likelihood =  399.51441
Iteration 16:  log likelihood =  399.51443
Iteration 17:  log likelihood =  399.51443

ARCH family regression -- ARMA disturbances

Sample: 1960q2 - 1990q4                         Number of obs      =       123
Distribution: Gaussian                          Wald chi2(3)       =    153.56
Log likelihood =  399.5144                      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
             |                 OPG
    D.ln_wpi |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ln_wpi       |
       _cons |   .0069541   .0039517     1.76   0.078     -.000791    .0146992
-------------+----------------------------------------------------------------
ARMA         |
          ar |
         L1. |   .7922674   .1072225     7.39   0.000     .5821153     1.00242
             |
          ma |
         L1. |   -.341774   .1499943    -2.28   0.023    -.6357574   -.0477905
         L4. |   .2451724   .1251131     1.96   0.050    -.0000447    .4903896
-------------+----------------------------------------------------------------
ARCH         |
        arch |
         L1. |   .2040449   .1244991     1.64   0.101    -.0399688    .4480586
             |
       garch |
         L1. |   .6949687   .1892176     3.67   0.000     .3241091    1.065828
             |
       _cons |   .0000119   .0000104     1.14   0.253    -8.52e-06    .0000324
------------------------------------------------------------------------------

To clarify exactly what we have estimated, we could write our model as

$$y_t = 0.007 + 0.792\,(y_{t-1} - 0.007) - 0.342\,\epsilon_{t-1} + 0.245\,\epsilon_{t-4} + \epsilon_t$$
$$\sigma_t^2 = 0.204\,\epsilon_{t-1}^2 + 0.695\,\sigma_{t-1}^2$$

where $y_t = \ln(\mathrm{wpi}_t) - \ln(\mathrm{wpi}_{t-1})$.


The ARCH(1) coefficient, 0.204, is not significantly different from zero, but the ARCH(1) and
GARCH(1) coefficients are significant collectively. If you doubt this, you can check with test.
. test [ARCH]L1.arch [ARCH]L1.garch

 ( 1)  [ARCH]L.arch = 0
 ( 2)  [ARCH]L.garch = 0

           chi2(  2) =   84.92
         Prob > chi2 =    0.0000

(For comparison, we fit the model over the same sample used in example 1 of [TS] arima; Enders
fits this GARCH model but over a slightly different sample.)
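After a fit such as this one, the estimated conditional variance series can be generated and inspected. The following is a minimal sketch; the variable name htvar is a hypothetical name chosen for this illustration.

```stata
use http://www.stata-press.com/data/r13/wpi1, clear
arch D.ln_wpi, ar(1) ma(1 4) arch(1) garch(1)

* Predicted conditional variance for each quarter in the sample
predict htvar, variance

* Plot the variance path to see the high- and low-volatility periods
tsline htvar
```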

Technical note
The rather ugly iteration log on the previous result is typical, as difficulty in converging is common
in ARCH models. This is actually a fairly well-behaved likelihood for an ARCH model. The "switching
optimization to ..." messages are standard messages from the default optimization method for arch.
The "backed up" messages are typical of BFGS stepping as the BFGS Hessian is often overoptimistic,
particularly during early iterations. These messages are nothing to be concerned about.
Nevertheless, watch out for the messages "BFGS stepping has contracted, resetting BFGS Hessian"
and "backed up", which can flag problems that may result in an iteration log that goes on and on.
Stata will never report convergence and will never report final results. The question is, when do you
give up and press Break, and if you do, what then?
If the "BFGS stepping has contracted" message occurs repeatedly (more than, say, five times), it
often indicates that convergence will never be achieved. Literally, it means that the BFGS algorithm
was stuck, reset its Hessian, and took a steepest-descent step.
The "backed up" message, if it occurs repeatedly, also indicates problems, but only if the likelihood
value is simultaneously not changing. If the message occurs repeatedly but the likelihood value is
changing, as it did above, all is going well; it is just going slowly.
If you have convergence problems, you can specify options to assist the current maximization
method or try a different method. Or, your model specification and data may simply lead to a likelihood
that is not concave in the allowable region and thus cannot be maximized.
If you see the "backed up" message with no change in the likelihood, you can reset the gradient
tolerance to a larger value. Specifying the gtolerance(999) option disables gradient checking,
allowing convergence to be declared more easily. This does not guarantee that convergence will be
declared, and even if it is, the global maximum likelihood may not have been found.
You can also try to specify initial values.
Finally, you can try a different maximization method; see options discussed under the Maximization
tab above.
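The remedies listed in this note might be tried as sketched below, using the model from example 2. The iteration limit of 20 and the choice of technique(nr) are arbitrary illustrations, and none of these settings guarantees convergence or a global maximum.

```stata
use http://www.stata-press.com/data/r13/wpi1, clear

* Stop after a fixed number of iterations and inspect the results at that point
arch D.ln_wpi, ar(1) ma(1 4) arch(1) garch(1) iterate(20)

* Disable the gradient criterion so that convergence can be declared more easily
arch D.ln_wpi, ar(1) ma(1 4) arch(1) garch(1) gtolerance(999)

* Supply different starting values
arch D.ln_wpi, ar(1) ma(1 4) arch(1) garch(1) from(archb0)

* Try a different maximization method (this also changes the default vcetype to oim)
arch D.ln_wpi, ar(1) ma(1 4) arch(1) garch(1) technique(nr) difficult
```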


ARCH models are notorious for having convergence difficulties. Unlike in most estimators in Stata,
it is common for convergence to require many steps or even to fail. This is particularly true of the
explicitly nonlinear terms such as aarch(), narch(), aparch(), or archm (ARCH-in-mean), and of
any model with several lags in the ARCH terms. There is not always a solution. You can try other
maximization methods or different starting values, but if your data do not support your assumed ARCH
structure, convergence simply may not be possible.
ARCH models can be susceptible to irrelevant regressors or unnecessary lags, whether in the
specification of the conditional mean or in the conditional variance. In these situations, arch will
often continue to iterate, making little to no improvement in the likelihood. We view this conservative
approach as better than declaring convergence prematurely when the likelihood has not been fully
maximized. arch is estimating the conditional form of second sample moments, often with flexible
functions, and that is asking much of the data.

Technical note
if exp and in range are interpreted differently with commands accepting time-series operators.
The time-series operators are resolved before the conditions are tested, which may lead to some
confusion. Note the results of the following list commands:
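. use http://www.stata-press.com/data/r13/archxmpl
. list t y l.y in 5/10

     +------------------------+
     |                     L. |
     |      t      y        y |
     |------------------------|
  5. | 1961q1   30.8     30.7 |
  6. | 1961q2   30.5     30.8 |
  7. | 1961q3   30.5     30.5 |
  8. | 1961q4   30.6     30.5 |
  9. | 1962q1   30.7     30.6 |
     |------------------------|
 10. | 1962q2   30.6     30.7 |
     +------------------------+

. keep in 5/10
(118 observations deleted)
. list t y l.y

     +------------------------+
     |                     L. |
     |      t      y        y |
     |------------------------|
  1. | 1961q1   30.8        . |
  2. | 1961q2   30.5     30.8 |
  3. | 1961q3   30.5     30.5 |
  4. | 1961q4   30.6     30.5 |
  5. | 1962q1   30.7     30.6 |
     |------------------------|
  6. | 1962q2   30.6     30.7 |
     +------------------------+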

We have one more lagged observation for y in the first case: l.y was resolved before the in
restriction was applied. In the second case, the dataset no longer contains the value of y to compute
the first lag. This means that
. use http://www.stata-press.com/data/r13/archxmpl, clear
. arch y l.x if twithin(1962q2, 1990q3), arch(1)

is not the same as
. keep if twithin(1962q2, 1990q3)
. arch y l.x, arch(1)

Example 3: Asymmetric effects, EGARCH model


Continuing with the WPI data, we might be concerned that the economy as a whole responds
differently to unanticipated increases in wholesale prices than it does to unanticipated decreases.
Perhaps unanticipated increases lead to cash flow issues that affect inventories and lead to more
volatility. We can see if the data support this supposition by specifying an ARCH model that allows an
asymmetric effect of "news" (innovations or unanticipated changes). One of the most popular such
models is EGARCH (Nelson 1991). The full first-order EGARCH model for the WPI can be specified
as follows:
. use http://www.stata-press.com/data/r13/wpi1, clear
. arch D.ln_wpi, ar(1) ma(1 4) earch(1) egarch(1)
(setting optimization to BHHH)
Iteration 0:   log likelihood =   227.5251
Iteration 1:   log likelihood =  381.68426
 (output omitted)
Iteration 23:  log likelihood =  405.31453

ARCH family regression -- ARMA disturbances

Sample: 1960q2 - 1990q4                         Number of obs      =       123
Distribution: Gaussian                          Wald chi2(3)       =    156.02
Log likelihood =  405.3145                      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
             |                 OPG
    D.ln_wpi |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ln_wpi       |
       _cons |   .0087342   .0034004     2.57   0.010     .0020696    .0153989
-------------+----------------------------------------------------------------
ARMA         |
          ar |
         L1. |    .769212   .0968396     7.94   0.000     .5794099     .959014
             |
          ma |
         L1. |  -.3554617   .1265725    -2.81   0.005    -.6035393   -.1073841
         L4. |    .241463   .0863832     2.80   0.005      .072155    .4107711
-------------+----------------------------------------------------------------
ARCH         |
       earch |
         L1. |   .4064007    .116351     3.49   0.000      .178357    .6344445
             |
     earch_a |
         L1. |   .2467351   .1233365     2.00   0.045     .0049999    .4884702
             |
      egarch |
         L1. |   .8417291   .0704079    11.96   0.000     .7037322    .9797261
             |
       _cons |  -1.488402   .6604397    -2.25   0.024     -2.78284   -.1939643
------------------------------------------------------------------------------

Our result for the variance is

$$\ln(\sigma_t^2) = -1.49 + .406\, z_{t-1} + .247\,\left(|z_{t-1}| - \sqrt{2/\pi}\right) + .842\, \ln(\sigma_{t-1}^2)$$

where $z_t = \epsilon_t/\sigma_t$, which is distributed as $N(0, 1)$.


This is a strong indication for a leverage effect. The positive L1.earch coefficient implies that
positive innovations (unanticipated price increases) are more destabilizing than negative innovations.
The effect appears strong (0.406) and is substantially larger than the symmetric effect (0.247). In fact,
the relative scales of the two coefficients imply that the positive leverage completely dominates the
symmetric effect.
This can readily be seen if we plot what is often referred to as the news-response or news-impact
function. This curve shows the resulting conditional variance as a function of unanticipated news,
in the form of innovations, that is, the conditional variance $\sigma_t^2$ as a function of $\epsilon_t$. Thus we must
evaluate $\sigma_t^2$ for various values of $\epsilon_t$ (say, -4 to 4) and then graph the result.
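One way to carry out this evaluation is with the at() option of predict after arch, which computes static predictions at supplied values of the innovation and the lagged variance. The sketch below rests on several assumptions: the variable names et and sigma2 are hypothetical, the grid formula is chosen only so that et runs from roughly -4 to 4 over the dataset's observations, and the lagged variance is held at 1.

```stata
use http://www.stata-press.com/data/r13/wpi1, clear
arch D.ln_wpi, ar(1) ma(1 4) earch(1) egarch(1)

* Grid of hypothetical innovations spanning roughly -4 to 4
generate et = (_n-64)/15

* Conditional variance implied by each et, holding the lagged variance at 1
predict sigma2, variance at(et 1)

* The first observation has no lagged value, so plot from the second onward
line sigma2 et in 2/l, title(News response function)
```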

Example 4: Asymmetric power ARCH model


As an example of a frequently sampled, long-run series, consider the daily closing indices of the
Dow Jones Industrial Average, variable dowclose. To avoid the first half of the century, when the
New York Stock Exchange was open for Saturday trading, only data after 1jan1953 are used. The
compound return of the series is used as the dependent variable and is graphed below.

[Figure: DOW, compound return on DJIA, plotted against date, 01jan1950 through 01jan1990]

We formed this difference by referring to D.ln_dow, but only after playing a trick. The series is
daily, and each observation represents the Dow closing index for the day. Our data included a time
variable recorded as a daily date. We wanted, however, to model the log differences in the series,
and we wanted the span from Friday to Monday to appear as a single-period difference. That is, the
day before Monday is Friday. Because our dataset was tsset with date, the span from Friday to
Monday was 3 days. The solution was to create a second variable that sequentially numbered the
observations. By tsseting the data with this new variable, we obtained the desired differences.
. generate t = _n
. tsset t
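
To see why the sequential index matters, here is a small Python sketch, independent of Stata, that differences the first three ln_dow values from the listing in this example in two ways: by looking up the previous calendar day and by looking up the previous observation. The helper names are hypothetical and the logic is only meant to mimic the idea, not the implementation of the D. operator.

```python
from datetime import date, timedelta

# First three trading days of 1953 and their ln_dow values, from this example's listing.
ln_dow = {
    date(1953, 1, 2): 5.677096,   # Friday
    date(1953, 1, 5): 5.682899,   # Monday
    date(1953, 1, 6): 5.677439,   # Tuesday
}

def diff_by_calendar(series):
    """First difference when the lag is the previous calendar day."""
    out = {}
    for day, value in series.items():
        previous = day - timedelta(days=1)
        out[day] = value - series[previous] if previous in series else None
    return out

def diff_by_position(series):
    """First difference when observations are simply numbered 1, 2, 3, ..."""
    days = sorted(series)
    out = {days[0]: None}
    for previous, current in zip(days, days[1:]):
        out[current] = series[current] - series[previous]
    return out

calendar = diff_by_calendar(ln_dow)
sequential = diff_by_position(ln_dow)
monday = date(1953, 1, 5)
print("calendar lag:  ", calendar[monday])
print("sequential lag:", sequential[monday])
```

With calendar dates, Monday has no lag-1 value because Sunday is not in the data; with the sequential index, Monday's difference is taken against Friday.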


Now our data look like this:


. use http://www.stata-press.com/data/r13/dow1, clear
. generate dayofwk = dow(date)
. list date dayofwk t ln_dow D.ln_dow in 1/8
             date   dayofwk      t     ln_dow    D.ln_dow
  1.    02jan1953         5      1   5.677096           .
  2.    05jan1953         1      2   5.682899    .0058026
  3.    06jan1953         2      3   5.677439   -.0054603
  4.    07jan1953         3      4   5.672636   -.0048032
  5.    08jan1953         4      5   5.671259   -.0013762

  6.    09jan1953         5      6   5.661223   -.0100365
  7.    12jan1953         1      7   5.653191   -.0080323
  8.    13jan1953         2      8   5.659134    .0059433

. list date dayofwk t ln_dow D.ln_dow in -8/l


             date   dayofwk      t     ln_dow    D.ln_dow
9334.   08feb1990         4   9334   7.880188    .0016198
9335.   09feb1990         5   9335   7.881635    .0014472
9336.   12feb1990         1   9336   7.870601    -.011034
9337.   13feb1990         2   9337   7.872665    .0020638
9338.   14feb1990         3   9338   7.872577   -.0000877

9339.   15feb1990         4   9339    7.88213     .009553
9340.   16feb1990         5   9340   7.876863   -.0052676
9341.   20feb1990         2   9341   7.862054   -.0148082

The difference operator D spans weekends because the specified time variable, t, is not a true date
and has a difference of 1 for all observations. We must leave this contrived time variable in place
during estimation, or arch will be convinced that our dataset has gaps. If we were using calendar
dates, we would indeed have gaps.
Ding, Granger, and Engle (1993) fit an A-PARCH model of daily returns of the Standard and Poor's
500 (S&P 500) for 3jan1928–30aug1991. We will fit the same model for the Dow data shown above.
The model includes an AR(1) term as well as the A-PARCH specification of conditional variance.



. arch D.ln_dow, ar(1) aparch(1) pgarch(1)
(setting optimization to BHHH)
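Iteration 0:   log likelihood =  31139.547
Iteration 1:   log likelihood =  31350.751
 (output omitted)
Iteration 68:  log likelihood =  32273.555
Iteration 69:  log likelihood =  32273.555  (backed up)

ARCH family regression -- AR disturbances
Sample: 2 - 9341                                Number of obs   =       9340
Distribution: Gaussian                          Wald chi2(1)    =     175.46
Log likelihood =  32273.56                      Prob > chi2     =     0.0000

------------------------------------------------------------------------------
             |                 OPG
    D.ln_dow |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ln_dow       |
       _cons |   .0001786   .0000875     2.04   0.041     7.15e-06      .00035
-------------+----------------------------------------------------------------
ARMA         |
          ar |
         L1. |   .1410944   .0106519    13.25   0.000     .1202171    .1619716
-------------+----------------------------------------------------------------
ARCH         |
      aparch |
         L1. |   .0626323   .0034307    18.26   0.000     .0559082    .0693564
             |
    aparch_e |
         L1. |  -.3645093   .0378485    -9.63   0.000    -.4386909   -.2903277
             |
      pgarch |
         L1. |   .9299015   .0030998   299.99   0.000      .923826     .935977
             |
       _cons |   7.19e-06   2.53e-06     2.84   0.004     2.23e-06    .0000121
-------------+----------------------------------------------------------------
POWER        |
       power |   1.585187   .0629186    25.19   0.000     1.461868    1.708505
------------------------------------------------------------------------------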

In the iteration log, the final iteration reports the message backed up. For most estimators,
ending on a backed up message would be a cause for great concern, but not with arch or, for that
matter, arima, as long as you do not specify the gtolerance() option. arch and arima, by default,
monitor the gradient and declare convergence only if, in addition to everything else, the gradient is
small enough.
The fitted model demonstrates substantial asymmetry, with the large negative L1.aparch_e
coefficient indicating that the market responds with much more volatility to unexpected drops in
returns (bad news) than it does to increases in returns (good news).

Example 5: ARCH model with nonnormal errors


Stock returns tend to be leptokurtotic, meaning that large returns (either positive or negative) occur
more frequently than one would expect if returns were in fact normally distributed. Here we refit the
previous A-PARCH model assuming the errors follow the generalized error distribution, and we let
arch estimate the shape parameter of the distribution.


. use http://www.stata-press.com/data/r13/dow1, clear


. arch D.ln_dow, ar(1) aparch(1) pgarch(1) distribution(ged)
(setting optimization to BHHH)
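Iteration 0:   log likelihood =  31139.547
Iteration 1:   log likelihood =   31348.13
 (output omitted)
Iteration 60:  log likelihood =  32486.461

ARCH family regression -- AR disturbances
Sample: 2 - 9341                                Number of obs   =       9340
Distribution: GED                               Wald chi2(1)    =     178.22
Log likelihood =  32486.46                      Prob > chi2     =     0.0000

------------------------------------------------------------------------------
             |                 OPG
    D.ln_dow |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ln_dow       |
       _cons |   .0002735    .000078     3.51   0.000     .0001207    .0004264
-------------+----------------------------------------------------------------
ARMA         |
          ar |
         L1. |   .1337473   .0100187    13.35   0.000     .1141109    .1533836
-------------+----------------------------------------------------------------
ARCH         |
      aparch |
         L1. |   .0641762   .0049401    12.99   0.000     .0544938    .0738587
             |
    aparch_e |
         L1. |  -.4052109   .0573054    -7.07   0.000    -.5175273   -.2928944
             |
      pgarch |
         L1. |   .9341738   .0045668   204.56   0.000      .925223    .9431246
             |
       _cons |   .0000216   .0000117     1.84   0.066    -1.39e-06    .0000446
-------------+----------------------------------------------------------------
POWER        |
       power |   1.325313   .1030748    12.86   0.000      1.12329    1.527336
-------------+----------------------------------------------------------------
    /lnshape |   .3527009    .009482    37.20   0.000     .3341166    .3712853
-------------+----------------------------------------------------------------
       shape |   1.422906    .013492                      1.396706    1.449597
------------------------------------------------------------------------------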

The ARMA and ARCH coefficients are similar to those we obtained when we assumed normally
distributed errors, though we do note that the power term is now closer to 1. The estimated shape
parameter for the generalized error distribution is shown at the bottom of the output. Here the shape
parameter is 1.42; because it is less than 2, the distribution of the errors has tails that are fatter than
they would be if the errors were normally distributed.
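
As a rough numerical check on the fat-tails statement, the Python sketch below converts the /lnshape estimate to the shape parameter and evaluates the kurtosis of a generalized error distribution. The kurtosis expression, Gamma(1/s)Gamma(5/s)/Gamma(3/s)^2, is a standard result brought in here as an assumption; it does not appear in this entry, and the function name is hypothetical.

```python
import math

def ged_kurtosis(shape):
    """Kurtosis of a generalized error distribution with the given shape parameter."""
    return (math.gamma(1.0 / shape) * math.gamma(5.0 / shape)
            / math.gamma(3.0 / shape) ** 2)

# /lnshape is the log of the shape parameter, so exponentiate the estimate.
shape_hat = math.exp(0.3527009)

print("estimated shape:      ", round(shape_hat, 4))
print("kurtosis at estimate: ", round(ged_kurtosis(shape_hat), 3))
print("kurtosis at shape = 2:", round(ged_kurtosis(2.0), 3))  # Gaussian case
```

A shape of 2 reproduces the Gaussian kurtosis of 3, and any shape below 2, including the estimate here, gives a larger value.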

Example 6: ARCH model with constraints


Engle's (1982) original model, which sparked the interest in ARCH, provides an example requiring
constraints. Most current ARCH specifications use GARCH terms to provide flexible dynamic properties
without estimating an excessive number of parameters. The original model was limited to ARCH
terms, and to help cope with the collinearity of the terms, a declining lag structure was imposed in
the parameters. The conditional variance equation was specified as


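$$\sigma_t^2 = \alpha_0 + \alpha\,(.4\,\epsilon_{t-1}^2 + .3\,\epsilon_{t-2}^2 + .2\,\epsilon_{t-3}^2 + .1\,\epsilon_{t-4}^2)$$

$$\phantom{\sigma_t^2} = \alpha_0 + .4\,\alpha\,\epsilon_{t-1}^2 + .3\,\alpha\,\epsilon_{t-2}^2 + .2\,\alpha\,\epsilon_{t-3}^2 + .1\,\alpha\,\epsilon_{t-4}^2$$

From the earlier arch output, we know how the coefficients will be named. In Stata, the formula is

$$\sigma_t^2 = \texttt{[ARCH]\_cons} + .4\,\texttt{[ARCH]L1.arch}\,\epsilon_{t-1}^2 + .3\,\texttt{[ARCH]L2.arch}\,\epsilon_{t-2}^2 + .2\,\texttt{[ARCH]L3.arch}\,\epsilon_{t-3}^2 + .1\,\texttt{[ARCH]L4.arch}\,\epsilon_{t-4}^2$$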
We could specify these linear constraints many ways, but the following seems fairly intuitive; see
[R] constraint for syntax.
. use http://www.stata-press.com/data/r13/wpi1, clear
. constraint 1 (3/4)*[ARCH]l1.arch = [ARCH]l2.arch
. constraint 2 (2/4)*[ARCH]l1.arch = [ARCH]l3.arch
. constraint 3 (1/4)*[ARCH]l1.arch = [ARCH]l4.arch

The original model was fit on U.K. inflation; we will again use the WPI data and retain our earlier
specification of the mean equation, which differs from Engle's U.K. inflation model. With our
constraints, we type
. arch D.ln_wpi, ar(1) ma(1 4) arch(1/4) constraints(1/3)
(setting optimization to BHHH)
Iteration 0:   log likelihood =  396.80198
Iteration 1:   log likelihood =  399.07809
 (output omitted)
Iteration 9:   log likelihood =  399.46243

ARCH family regression -- ARMA disturbances
Sample: 1960q2 - 1990q4                         Number of obs   =        123
Distribution: Gaussian                          Wald chi2(3)    =     123.32
Log likelihood =  399.4624                      Prob > chi2     =     0.0000

 ( 1)  .75*[ARCH]L.arch - [ARCH]L2.arch = 0
 ( 2)  .5*[ARCH]L.arch - [ARCH]L3.arch = 0
 ( 3)  .25*[ARCH]L.arch - [ARCH]L4.arch = 0

------------------------------------------------------------------------------
             |                 OPG
    D.ln_wpi |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ln_wpi       |
       _cons |   .0077204   .0034531     2.24   0.025     .0009525    .0144883
-------------+----------------------------------------------------------------
ARMA         |
          ar |
         L1. |   .7388168   .1126811     6.56   0.000     .5179659    .9596676
             |
          ma |
         L1. |  -.2559691   .1442861    -1.77   0.076    -.5387646    .0268264
         L4. |   .2528923   .1140185     2.22   0.027       .02942    .4763645
-------------+----------------------------------------------------------------
ARCH         |
        arch |
         L1. |   .2180138   .0737787     2.95   0.003     .0734101    .3626174
         L2. |   .1635103    .055334     2.95   0.003     .0550576    .2719631
         L3. |   .1090069   .0368894     2.95   0.003     .0367051    .1813087
         L4. |   .0545034   .0184447     2.95   0.003     .0183525    .0906544
             |
       _cons |   .0000483   7.66e-06     6.30   0.000     .0000333    .0000633
------------------------------------------------------------------------------


L1.arch, L2.arch, L3.arch, and L4.arch coefficients have the constrained relative sizes.
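
The constrained relative sizes can be verified by hand. The short Python sketch below, which is only an arithmetic check and not Stata code, copies the four ARCH estimates from the output above, compares each with the multiple of the lag-1 coefficient that the corresponding constraint imposes, and then divides each coefficient by its weight (.4, .3, .2, .1) to recover the common scale factor. The name alpha for that factor is a label chosen here.

```python
# Constrained ARCH estimates copied from the output above.
arch = {1: 0.2180138, 2: 0.1635103, 3: 0.1090069, 4: 0.0545034}

# Declining lag weights from the conditional variance specification.
weights = {1: 0.4, 2: 0.3, 3: 0.2, 4: 0.1}

# Each constraint fixes the lag-k coefficient at (weight_k / weight_1) times the lag-1 coefficient.
for lag in (2, 3, 4):
    implied = weights[lag] / weights[1] * arch[1]
    print(f"L{lag}.arch: reported {arch[lag]:.7f}, implied {implied:.7f}")

# Dividing any coefficient by its weight gives the same scale factor, up to rounding.
alpha = {lag: arch[lag] / weights[lag] for lag in arch}
for lag, value in alpha.items():
    print(f"alpha from L{lag}.arch: {value:.6f}")
```

Up to the rounding of the displayed estimates, all four ratios agree, so any one of the ARCH coefficients carries the same information about the scale factor.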

Stored results
arch stores the following in e():
Scalars
    e(N)               number of observations
    e(N_gaps)          number of gaps
    e(condobs)         number of conditioning observations
    e(k)               number of parameters
    e(k_eq)            number of equations in e(b)
    e(k_eq_model)      number of equations in overall model test
    e(k_dv)            number of dependent variables
    e(k_aux)           number of auxiliary parameters
    e(df_m)            model degrees of freedom
    e(ll)              log likelihood
    e(chi2)            $\chi^2$
    e(p)               significance
    e(archi)           $\sigma_0^2 = \epsilon_0^2$, priming values
    e(archany)         1 if model contains ARCH terms, 0 otherwise
    e(tdf)             degrees of freedom for Student's t distribution
    e(shape)           shape parameter for generalized error distribution
    e(tmin)            minimum time
    e(tmax)            maximum time
    e(power)           $\varphi$ for power ARCH terms
    e(rank)            rank of e(V)
    e(ic)              number of iterations
    e(rc)              return code
    e(converged)       1 if converged, 0 otherwise

Macros
    e(cmd)             arch
    e(cmdline)         command as typed
    e(depvar)          name of dependent variable
    e(covariates)      list of covariates
    e(eqnames)         names of equations
    e(wtype)           weight type
    e(wexp)            weight expression
    e(title)           title in estimation output
    e(tmins)           formatted minimum time
    e(tmaxs)           formatted maximum time
    e(dist)            distribution for error term: gaussian, t, or ged
    e(mhet)            1 if multiplicative heteroskedasticity
    e(dfopt)           yes if degrees of freedom for t distribution or shape parameter for GED
                       distribution was estimated; no otherwise
    e(chi2type)        Wald; type of model $\chi^2$ test
    e(vce)             vcetype specified in vce()
    e(vcetype)         title used to label Std. Err.
    e(ma)              lags for moving-average terms
    e(ar)              lags for autoregressive terms
    e(arch)            lags for ARCH terms
    e(archm)           ARCH-in-mean lags
    e(archmexp)        ARCH-in-mean exp
    e(earch)           lags for EARCH terms
    e(egarch)          lags for EGARCH terms
    e(aarch)           lags for AARCH terms
    e(narch)           lags for NARCH terms
    e(aparch)          lags for A-PARCH terms
    e(nparch)          lags for NPARCH terms
    e(saarch)          lags for SAARCH terms
    e(parch)           lags for PARCH terms
    e(tparch)          lags for TPARCH terms
    e(abarch)          lags for ABARCH terms
    e(tarch)           lags for TARCH terms
    e(atarch)          lags for ATARCH terms
    e(sdgarch)         lags for SDGARCH terms
    e(pgarch)          lags for PGARCH terms
    e(garch)           lags for GARCH terms
    e(opt)             type of optimization
    e(ml_method)       type of ml method
    e(user)            name of likelihood-evaluator program
    e(technique)       maximization technique
    e(tech)            maximization technique, including number of iterations
    e(tech_steps)      number of iterations performed before switching techniques
    e(properties)      b V
    e(estat_cmd)       program used to implement estat
    e(predict)         program used to implement predict
    e(marginsok)       predictions allowed by margins
    e(marginsnotok)    predictions disallowed by margins

Matrices
    e(b)               coefficient vector
    e(Cns)             constraints matrix
    e(ilog)            iteration log (up to 20 iterations)
    e(gradient)        gradient vector
    e(V)               variance–covariance matrix of the estimators
    e(V_modelbased)    model-based variance

Functions
    e(sample)          marks estimation sample

Methods and formulas


The mean equation for the model fit by arch and with ARMA terms can be written as

$$
y_t = \mathbf{x}_t\boldsymbol{\beta} + \sum_{i=1}^{p} \psi_i\, g(\sigma_{t-i}^2)
    + \sum_{j=1}^{p} \rho_j \Bigl\{ y_{t-j} - \mathbf{x}_{t-j}\boldsymbol{\beta} - \sum_{i=1}^{p} \psi_i\, g(\sigma_{t-j-i}^2) \Bigr\}
    + \sum_{k=1}^{q} \theta_k\, \epsilon_{t-k} + \epsilon_t
    \qquad \text{(conditional mean)}
$$

where
    $\boldsymbol{\beta}$ are the regression parameters,
    $\psi$ are the ARCH-in-mean parameters,
    $\rho$ are the autoregression parameters,
    $\theta$ are the moving-average parameters, and
    $g()$ is a general function; see the archmexp() option.

Any of the parameters in this full specification of the conditional mean may be zero. For example,
the model need not have moving-average parameters ($\theta = 0$) or ARCH-in-mean parameters ($\psi = 0$).
The variance equation will be one of the following:

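$$\sigma_t^2 = \gamma_0 + A(\sigma, \epsilon) + B(\sigma, \epsilon)^2 \qquad (1)$$

$$\ln \sigma_t^2 = \gamma_0 + C(\ln\sigma, z) + A(\sigma, \epsilon) + B(\sigma, \epsilon)^2 \qquad (2)$$

$$\sigma_t^{\varphi} = \gamma_0 + D(\sigma, \epsilon) + A(\sigma, \epsilon) + B(\sigma, \epsilon)^2 \qquad (3)$$

where $A(\sigma, \epsilon)$, $B(\sigma, \epsilon)$, $C(\ln\sigma, z)$, and $D(\sigma, \epsilon)$ are linear sums of the appropriate ARCH terms; see
Details of syntax for more information. Equation (1) is used if no EGARCH or power ARCH terms
are included in the model, (2) if EGARCH terms are included, and (3) if any power ARCH terms are
included; see Details of syntax.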
Methods and formulas are presented under the following headings:
Priming values
Likelihood from prediction error decomposition
Missing data


Priming values
The above model is recursive with potentially long memory. It is necessary to assume preestimation
sample values for $\epsilon_t$, $\epsilon_t^2$, and $\sigma_t^2$ to begin the recursions, and the remaining computations are therefore
conditioned on these priming values, which can be controlled using the arch0() and arma0()
options. See options discussed under the Priming tab above.
The arch0(xb0wt) and arch0(xbwt) options compute a weighted sum of estimated disturbances
with more weight on the early observations. With either of these options,

$$\sigma_{t_0-i}^2 = \epsilon_{t_0-i}^2 = (1 - .7) \sum_{t=0}^{T-1} .7^{\,T-t-1}\, \epsilon_{T-t}^2$$

where $t_0$ is the first observation for which the likelihood is computed; see options discussed under the
Priming tab above. The $\epsilon_t^2$ are all computed from the conditional mean equation. If arch0(xb0wt)
is specified, $\boldsymbol{\beta}$, $\psi_i$, $\rho_j$, and $\theta_k$ are taken from initial regression estimates and held constant during
optimization. If arch0(xbwt) is specified, the current estimates of $\boldsymbol{\beta}$, $\psi_i$, $\rho_j$, and $\theta_k$ are used to
compute $\epsilon_t^2$ on every iteration. If any $\psi_i$ is in the mean equation (ARCH-in-mean is specified), the
estimates of $\epsilon_t^2$ from the initial regression estimates are not consistent.
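
The weighted sum above can be illustrated with a few lines of Python. This is a sketch of the formula only, under the assumption that the disturbances are supplied in time order; it is not how arch is implemented, and the function name and the decay argument are hypothetical.

```python
def priming_value(eps, decay=0.7):
    """Weighted sum of squared disturbances used as the presample value.

    eps[0] is the first in-sample disturbance and eps[-1] the last, so the
    weight decay**(T - t - 1) attached to eps_{T-t} is largest for the
    earliest observations.
    """
    T = len(eps)
    total = 0.0
    for t in range(T):
        e = eps[T - t - 1]                      # eps_{T-t} in 1-based notation
        total += decay ** (T - t - 1) * e ** 2
    return (1.0 - decay) * total

# A unit shock in the first period receives the full weight (1 - .7) ...
print(priming_value([1.0, 0.0, 0.0, 0.0]))
# ... whereas the same shock in the last period is discounted by .7**3.
print(priming_value([0.0, 0.0, 0.0, 1.0]))
```

With a long series of constant unit disturbances, the weights sum to nearly 1, so the priming value is close to the common squared disturbance.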

Likelihood from prediction error decomposition


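The likelihood function for ARCH has a particularly simple form. Given priming (or conditioning)
values of $\epsilon_t$, $\epsilon_t^2$, and $\sigma_t^2$, the mean equation above can be solved recursively for every $\epsilon_t$ (prediction error
decomposition). Likewise, the conditional variance can be computed recursively for each observation
by using the variance equation. Using these predicted errors, their associated variances, and the
assumption that $\epsilon_t \sim N(0, \sigma_t^2)$, we find that the log likelihood for each observation $t$ is

$$\ln L_t = -\frac{1}{2} \left\{ \ln(2\pi\sigma_t^2) + \frac{\epsilon_t^2}{\sigma_t^2} \right\}$$

If we assume that $\epsilon_t \sim t(\mathrm{df})$, then as given in Hamilton (1994, 662),

$$\ln L_t = \ln\Gamma\!\left(\frac{\mathrm{df}+1}{2}\right) - \ln\Gamma\!\left(\frac{\mathrm{df}}{2}\right)
 - \frac{1}{2}\left[ \ln\bigl\{(\mathrm{df}-2)\pi\sigma_t^2\bigr\} + (\mathrm{df}+1)\ln\left\{1 + \frac{\epsilon_t^2}{(\mathrm{df}-2)\sigma_t^2}\right\} \right]$$

The likelihood is not defined for $\mathrm{df} \le 2$, so instead of estimating df directly, we estimate $m =
\ln(\mathrm{df} - 2)$. Then $\mathrm{df} = \exp(m) + 2 > 2$ for any $m$.
Following Bollerslev, Engle, and Nelson (1994, 2978), the log likelihood for the $t$th observation,
assuming $\epsilon_t \sim \mathrm{GED}(s)$, is

$$\ln L_t = \ln s - \ln\lambda - \frac{s+1}{s}\ln 2 - \ln\Gamma(s^{-1}) - \frac{1}{2}\left|\frac{\epsilon_t}{\lambda\sigma_t}\right|^{s}$$

where

$$\lambda = \left\{ \frac{\Gamma(s^{-1})}{2^{2/s}\,\Gamma(3s^{-1})} \right\}^{1/2}$$

To enforce the restriction that $s > 0$, we estimate $r = \ln s$.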
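
For readers who want to see the per-observation contributions numerically, the following Python sketch codes the Gaussian and Student's t expressions given above, with the t version taking the unconstrained parameter m = ln(df - 2) as its argument. It is a minimal illustration under the stated formulas, not arch's evaluator; the function names are hypothetical.

```python
import math

def loglik_gaussian(eps, sigma2):
    """Log likelihood of one observation under Gaussian errors."""
    return -0.5 * (math.log(2.0 * math.pi * sigma2) + eps ** 2 / sigma2)

def loglik_student_t(eps, sigma2, m):
    """Log likelihood of one observation under Student's t errors.

    m = ln(df - 2) is the parameter that is free to take any real value;
    df = exp(m) + 2 is therefore always greater than 2.
    """
    df = math.exp(m) + 2.0
    return (math.lgamma((df + 1.0) / 2.0) - math.lgamma(df / 2.0)
            - 0.5 * (math.log((df - 2.0) * math.pi * sigma2)
                     + (df + 1.0) * math.log1p(eps ** 2 / ((df - 2.0) * sigma2))))

# With a very large df, the t contribution is close to the Gaussian one.
print(loglik_gaussian(1.0, 2.0))
print(loglik_student_t(1.0, 2.0, math.log(1e6)))
# With df = 5 (m = ln 3), the same observation is evaluated under fatter tails.
print(loglik_student_t(1.0, 2.0, math.log(3.0)))
```

Working with m rather than df means an optimizer can move freely over the real line without ever proposing a value for which the likelihood is undefined.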


This command supports the Huber/White/sandwich estimator of the variance using vce(robust).
See [P] robust, particularly Maximum likelihood estimators and Methods and formulas.

Missing data
ARCH allows missing data or missing observations but does not attempt to condition on the
surrounding data. If a dynamic component cannot be computed ($\epsilon_t$, $\epsilon_t^2$, and/or $\sigma_t^2$), its priming
value is substituted. If a covariate, the dependent variable, or the entire observation is missing, the
observation does not enter the likelihood, and its dynamic components are set to their priming values
for that observation. This is acceptable only asymptotically and should not be used with a great deal
of missing data.

Robert Fry Engle (1942– ) was born in Syracuse, New York. He earned degrees in physics
and economics at Williams College and Cornell and then worked at MIT and the University of
California, San Diego, before moving to New York University Stern School of Business in 2000.
He was awarded the 2003 Nobel Prize in Economics for research on autoregressive conditional
heteroskedasticity and is a leading expert in time-series analysis, especially the analysis of
financial markets.

References
Adkins, L. C., and R. C. Hill. 2011. Using Stata for Principles of Econometrics. 4th ed. Hoboken, NJ: Wiley.
Baum, C. F. 2000. sts15: Tests for stationarity of a time series. Stata Technical Bulletin 57: 36–39. Reprinted in
Stata Technical Bulletin Reprints, vol. 10, pp. 356–360. College Station, TX: Stata Press.
Baum, C. F., and R. I. Sperling. 2000. sts15.1: Tests for stationarity of a time series: Update. Stata Technical Bulletin
58: 35–36. Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp. 360–362. College Station, TX: Stata Press.
Baum, C. F., and V. L. Wiggins. 2000. sts16: Tests for long memory in a time series. Stata Technical Bulletin 57:
39–44. Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp. 362–368. College Station, TX: Stata Press.
Becketti, S. 2013. Introduction to Time Series Using Stata. College Station, TX: Stata Press.
Berndt, E. K., B. H. Hall, R. E. Hall, and J. A. Hausman. 1974. Estimation and inference in nonlinear structural
models. Annals of Economic and Social Measurement 3/4: 653–665.
Black, F. 1976. Studies of stock price volatility changes. Proceedings of the American Statistical Association, Business
and Economics Statistics 177–181.
Bollerslev, T. 1986. Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics 31: 307–327.
Bollerslev, T., R. Y. Chou, and K. F. Kroner. 1992. ARCH modeling in finance. Journal of Econometrics 52: 5–59.
Bollerslev, T., R. F. Engle, and D. B. Nelson. 1994. ARCH models. In Vol. 4 of Handbook of Econometrics, ed.
R. F. Engle and D. L. McFadden. Amsterdam: Elsevier.
Bollerslev, T., and J. M. Wooldridge. 1992. Quasi-maximum likelihood estimation and inference in dynamic models
with time-varying covariances. Econometric Reviews 11: 143–172.
Davidson, R., and J. G. MacKinnon. 1993. Estimation and Inference in Econometrics. New York: Oxford University
Press.
Davidson, R., and J. G. MacKinnon. 2004. Econometric Theory and Methods. New York: Oxford University Press.
Diebold, F. X. 2003. The ET Interview: Professor Robert F. Engle. Econometric Theory 19: 1159–1193.
Ding, Z., C. W. J. Granger, and R. F. Engle. 1993. A long memory property of stock market returns and a new
model. Journal of Empirical Finance 1: 83–106.
Enders, W. 2004. Applied Econometric Time Series. 2nd ed. New York: Wiley.
Engle, R. F. 1982. Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom
inflation. Econometrica 50: 987–1007.
Engle, R. F. 1990. Discussion: Stock volatility and the crash of '87. Review of Financial Studies 3: 103–106.
Engle, R. F., D. M. Lilien, and R. P. Robins. 1987. Estimating time varying risk premia in the term structure: The
ARCH-M model. Econometrica 55: 391–407.
Glosten, L. R., R. Jagannathan, and D. E. Runkle. 1993. On the relation between the expected value and the volatility
of the nominal excess return on stocks. Journal of Finance 48: 1779–1801.
Greene, W. H. 2012. Econometric Analysis. 7th ed. Upper Saddle River, NJ: Prentice Hall.
Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
Harvey, A. C. 1989. Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge: Cambridge
University Press.
Harvey, A. C. 1990. The Econometric Analysis of Time Series. 2nd ed. Cambridge, MA: MIT Press.
Higgins, M. L., and A. K. Bera. 1992. A class of nonlinear ARCH models. International Economic Review 33:
137–158.
Hill, R. C., W. E. Griffiths, and G. C. Lim. 2011. Principles of Econometrics. 4th ed. Hoboken, NJ: Wiley.
Judge, G. G., W. E. Griffiths, R. C. Hill, H. Lütkepohl, and T.-C. Lee. 1985. The Theory and Practice of Econometrics.
2nd ed. New York: Wiley.
Kmenta, J. 1997. Elements of Econometrics. 2nd ed. Ann Arbor: University of Michigan Press.
Mandelbrot, B. B. 1963. The variation of certain speculative prices. Journal of Business 36: 394–419.
Nelson, D. B. 1991. Conditional heteroskedasticity in asset returns: A new approach. Econometrica 59: 347–370.
Press, W. H., S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. 2007. Numerical Recipes: The Art of Scientific
Computing. 3rd ed. New York: Cambridge University Press.
Stock, J. H., and M. W. Watson. 2011. Introduction to Econometrics. 3rd ed. Boston: Addison–Wesley.
Wooldridge, J. M. 2013. Introductory Econometrics: A Modern Approach. 5th ed. Mason, OH: South-Western.
Zakoian, J. M. 1994. Threshold heteroskedastic models. Journal of Economic Dynamics and Control 18: 931–955.

Also see
[TS] arch postestimation Postestimation tools for arch
[TS] tsset Declare data to be time-series data
[TS] arima ARIMA, ARMAX, and other dynamic regression models
[TS] mgarch Multivariate GARCH models
[R] regress Linear regression
[U] 20 Estimation and postestimation commands

Title
arch postestimation Postestimation tools for arch
Description
Remarks and examples

Syntax for predict


Also see

Menu for predict

Options for predict

Description
The following postestimation commands are available after arch:
Command            Description

estat ic           Akaike's and Schwarz's Bayesian information criteria (AIC and BIC)
estat summarize    summary statistics for the estimation sample
estat vce          variance–covariance matrix of the estimators (VCE)
estimates          cataloging estimation results
forecast           dynamic forecasts and simulations
lincom             point estimates, standard errors, testing, and inference for linear combinations
                   of coefficients
lrtest             likelihood-ratio test
margins            marginal means, predictive margins, marginal effects, and average marginal
                   effects
marginsplot        graph the results from margins (profile plots, interaction plots, etc.)
nlcom              point estimates, standard errors, testing, and inference for nonlinear
                   combinations of coefficients
predict            predictions, residuals, influence statistics, and other diagnostic measures
predictnl          point estimates, standard errors, testing, and inference for generalized predictions
test               Wald tests of simple and composite linear hypotheses
testnl             Wald tests of nonlinear hypotheses

Syntax for predict


predict [type] newvar [if] [in] [, statistic options]

statistic          Description

Main
  xb               predicted values for mean equation (the differenced series); the default
  y                predicted values for the mean equation in y (the undifferenced series)
  variance         predicted values for the conditional variance
  het              predicted values of the variance, considering only the multiplicative
                   heteroskedasticity
  residuals        residuals or predicted innovations
  yresiduals       residuals or predicted innovations in y (the undifferenced series)

These statistics are available both in and out of sample; type predict . . . if e(sample) . . . if wanted only
for the estimation sample.

options                                     Description

Options
  dynamic(time_constant)                    how to handle the lags of $y_t$
  at(varname_ε | #_ε  varname_σ² | #_σ²)    make static predictions
  t0(time_constant)                         set starting point for the recursions to time_constant
  structural                                calculate considering the structural component only

time_constant is a # or a time literal, such as td(1jan1995) or tq(1995q1), etc.; see
Conveniently typing SIF values in [D] datetime.

Menu for predict


Statistics > Postestimation > Predictions, residuals, etc.

Options for predict


Six statistics can be computed by using predict after arch: the predictions of the mean equation
(option xb, the default), the undifferenced predictions of the mean equation (option y), the predictions
of the conditional variance (option variance), the predictions of the multiplicative heteroskedasticity
component of variance (option het), the predictions of residuals or innovations (option residuals),
and the predictions of residuals or innovations in terms of y (option yresiduals). Given the dynamic
nature of ARCH models and because the dependent variable might be differenced, there are other
ways of computing each statistic. We can use all the data on the dependent variable available right
up to the time of each prediction (the default, which is often called a one-step prediction), or we
can use the data up to a particular time, after which the predicted value of the dependent variable
is used recursively to make later predictions (option dynamic()). Either way, we can consider or
ignore the ARMA disturbance component, which is considered by default and is ignored if you specify
the structural option. We might also be interested in predictions at certain fixed points where we
specify the prior values of $\epsilon_t$ and $\sigma_t^2$ (option at()).

Main

xb, the default, calculates the predictions from the mean equation. If D.depvar is the dependent
variable, these predictions are of D.depvar and not of depvar itself.
y specifies that predictions of depvar are to be made even if the model was specified for, say,
D.depvar.
variance calculates predictions of the conditional variance $\hat\sigma_t^2$.
het calculates predictions of the multiplicative heteroskedasticity component of variance.
residuals calculates the residuals. If no other options are specified, these are the predicted innovations
$\epsilon_t$; that is, they include any ARMA component. If the structural option is specified, these are
the residuals from the mean equation, ignoring any ARMA terms; see structural below. The
residuals are always from the estimated equation, which may have a differenced dependent variable;
if depvar is differenced, they are not the residuals of the undifferenced depvar.
yresiduals calculates the residuals for depvar, even if the model was specified for, say, D.depvar. As
with residuals, the yresiduals are computed from the model, including any ARMA component.
If the structural option is specified, any ARMA component is ignored and yresiduals are the
residuals from the structural equation; see structural below.


Options

dynamic(time_constant) specifies how lags of $y_t$ in the model are to be handled. If dynamic()
is not specified, actual values are used everywhere lagged values of $y_t$ appear in the model to
produce one-step-ahead forecasts.
dynamic(time_constant) produces dynamic (also known as recursive) forecasts. time_constant
specifies when the forecast is to switch from one step ahead to dynamic. In dynamic forecasts,
references to $y_t$ evaluate to the prediction of $y_t$ for all periods at or after time_constant; they
evaluate to the actual value of $y_t$ for all prior periods.
dynamic(10), for example, would calculate predictions where any reference to $y_t$ with $t < 10$
evaluates to the actual value of $y_t$ and any reference to $y_t$ with $t \ge 10$ evaluates to the prediction
of $y_t$. This means that one-step-ahead predictions would be calculated for $t < 10$ and dynamic
predictions would be calculated thereafter. Depending on the lag structure of the model, the dynamic
predictions might still refer to some actual values of $y_t$.
You may also specify dynamic(.) to have predict automatically switch from one-step-ahead to
dynamic predictions at $p + q$, where $p$ is the maximum AR lag and $q$ is the maximum MA lag.
at(varname_ε | #_ε  varname_σ² | #_σ²) makes static predictions. at() and dynamic() may not be
specified together.
Specifying at() allows static evaluation of results for a given set of disturbances. This is useful,
for instance, in generating the news response function. at() specifies two sets of values to be
used for $\epsilon_t$ and $\sigma_t^2$, the dynamic components in the model. These specified values are treated as
given. Also, any lagged values of depvar in the model are obtained from the real values of the
dependent variable. All computations are based on actual data and the given values.
at() requires that you specify two arguments, which can be either a variable name or a number.
The first argument supplies the values to be used for $\epsilon_t$; the second supplies the values to be used
for $\sigma_t^2$. If $\sigma_t^2$ plays no role in your model, the second argument may be specified as . to indicate
missing.
t0(time_constant) specifies the starting point for the recursions to compute the predicted statistics;
disturbances are assumed to be 0 for $t <$ t0(). The default is to set t0() to the minimum $t$
observed in the estimation sample, meaning that observations before that are assumed to have
disturbances of 0.
t0() is irrelevant if structural is specified because then all observations are assumed to have
disturbances of 0.
t0(5), for example, would begin recursions at $t = 5$. If your data were quarterly, you might
instead type t0(tq(1961q2)) to obtain the same result.
Any ARMA component in the mean equation or GARCH term in the conditional-variance equation
makes arch recursive and dependent on the starting point of the predictions. This includes
one-step-ahead predictions.
structural makes the calculation considering the structural component only, ignoring any ARMA
terms, and producing the steady-state equilibrium predictions.
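
The difference between one-step-ahead and dynamic predictions described under dynamic() can be made concrete with a toy example. The Python sketch below uses a hypothetical first-order autoregression in which each value is predicted as a constant plus a multiple of the previous value; this simplified form, the coefficient values, and the function name are assumptions made for illustration and do not reproduce how predict handles ARMA disturbances.

```python
def ar1_predictions(y, const, rho, dynamic_start=None):
    """Predictions of y[t] from y[t-1] for a hypothetical AR(1) model.

    Without dynamic_start, every lag is the actual value (one-step-ahead).
    With dynamic_start, a lag dated at or after dynamic_start is replaced
    by its own prediction, so errors compound from that point on.
    """
    preds = [None] * len(y)
    for t in range(1, len(y)):
        lag_is_predicted = dynamic_start is not None and (t - 1) >= dynamic_start
        lag = preds[t - 1] if lag_is_predicted else y[t - 1]
        preds[t] = const + rho * lag
    return preds

y = [1.0, 2.0, 4.0, 8.0, 16.0]

one_step = ar1_predictions(y, const=0.0, rho=0.5)
dynamic = ar1_predictions(y, const=0.0, rho=0.5, dynamic_start=2)

print("one-step:", one_step)
print("dynamic: ", dynamic)
```

In the dynamic run, the prediction for period 2 still uses the actual value from period 1, which is the sense in which dynamic predictions may continue to refer to some actual values; only from period 3 onward does the recursion feed on its own output.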


Remarks and examples


Example 1
Continuing with our EGARCH model example (example 3) in [TS] arch, we can see that predict,
at() calculates $\sigma_t^2$ given a set of specified innovations ($\epsilon_t, \epsilon_{t-1}, \ldots$) and prior conditional variances
($\sigma_{t-1}^2, \sigma_{t-2}^2, \ldots$). The syntax is
. predict newvar, variance at(epsilon sigma2)

epsilon and sigma2 are either variables or numbers. Using sigma2 is a little tricky because you specify
values of $\sigma_t^2$, which predict is supposed to predict. predict does not simply copy variable sigma2
into newvar but uses the lagged values contained in sigma2 to produce the predicted value of $\sigma_t^2$. It
does this for all $t$, and those results are saved in newvar. (If you are interested in dynamic predictions
of $\sigma_t^2$, see Options for predict.)
We will generate predictions for $\sigma_t^2$, assuming that the lagged values of $\sigma_t^2$ are 1, and we will
vary $\epsilon_t$ from $-4$ to 4. First, we will create variable et containing $\epsilon_t$, and then we will create and
graph the predictions:
. generate et = (_n-64)/15
. predict sigma2, variance at(et 1)
. line sigma2 et in 2/l, m(i) c(l) title(News response function)

[Figure: News response function; conditional variance, one-step, plotted against et]

The positive asymmetry does indeed dominate the shape of the news response function. In fact, the
response is a monotonically increasing function of news. The form of the response function shows
that, for our simple model, only positive, unanticipated price increases have the destabilizing effect
that we observe as larger conditional variances.


Example 2
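Continuing with our ARCH model with constraints example (example 6) in [TS] arch, using lincom
we can recover the $\alpha$ parameter from the original specification.
. lincom [ARCH]l1.arch/.4
 ( 1)  2.5*[ARCH]L.arch = 0

------------------------------------------------------------------------------
    D.ln_wpi |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   .5450344   .1844468     2.95   0.003     .1835253    .9065436
------------------------------------------------------------------------------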

Any arch parameter could be used to produce an identical estimate.

Also see
[TS] arch Autoregressive conditional heteroskedasticity (ARCH) family of estimators
[U] 20 Estimation and postestimation commands

Title
arfima Autoregressive fractionally integrated moving-average models
Syntax
Remarks and examples
Also see

Menu
Stored results

Description
Methods and formulas

Options
References

Syntax
arfima depvar [indepvars] [if] [in] [, options]

options                  Description

Model
  noconstant             suppress constant term
  ar(numlist)            autoregressive terms
  ma(numlist)            moving-average terms
  smemory                estimate short-memory model without fractional integration
  mle                    maximum likelihood estimates; the default
  mpl                    maximum modified-profile-likelihood estimates
  constraints(numlist)   apply specified linear constraints
  collinear              do not drop collinear variables

SE/Robust
  vce(vcetype)           vcetype may be oim or robust

Reporting
  level(#)               set confidence level; default is level(95)
  nocnsreport            do not display constraints
  display_options        control column formats, row spacing, line width, display of omitted
                         variables and base and empty cells, and factor-variable labeling

Maximization
  maximize_options       control the maximization process; seldom used

  coeflegend             display legend instead of statistics

You must tsset your data before using arfima; see [TS] tsset.
indepvars may contain factor variables; see [U] 11.4.3 Factor variables.
depvar and indepvars may contain time-series operators; see [U] 11.4.4 Time-series varlists.
by, fp, rolling, and statsby are allowed; see [U] 11.1.10 Prefix commands.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.

Menu
Statistics > Time series > ARFIMA models


Description
arfima estimates the parameters of autoregressive fractionally integrated moving-average (ARFIMA)
models.
Long-memory processes are stationary processes whose autocorrelation functions decay more
slowly than those of short-memory processes. The ARFIMA model provides a parsimonious parameterization of
long-memory processes that nests the autoregressive moving-average (ARMA) model, which is widely
used for short-memory processes. By allowing for fractional degrees of integration, the ARFIMA model
also generalizes the autoregressive integrated moving-average (ARIMA) model with integer degrees of
integration. See [TS] arima for ARMA and ARIMA parameter estimation.

Options


Model

noconstant; see [R] estimation options.


ar(numlist) specifies the autoregressive (AR) terms to be included in the model. An AR(p), p ≥ 1,
specification would be ar(1/p). This model includes all lags from 1 to p, but not all lags need
to be included. For example, the specification ar(1 p) would specify an AR(p) with only lags 1
and p included, setting all the other AR lag parameters to 0.
ma(numlist) specifies the moving-average terms to be included in the model. These are the terms for
the lagged innovations (white-noise disturbances). ma(1/q), q ≥ 1, specifies an MA(q) model, but
like the ar() option, not all lags need to be included.
smemory causes arfima to fit a short-memory model with d = 0. This option causes arfima to
estimate the parameters of an ARMA model by a method that is asymptotically equivalent to that
produced by arima; see [TS] arima.
mle causes arfima to estimate the parameters by maximum likelihood. This method is the default.
mpl causes arfima to estimate the parameters by maximum modified profile likelihood (MPL). The
MPL estimator of the fractional-difference parameter has less small-sample bias than the maximum
likelihood estimator when there are covariates in the model. mpl may only be specified when there
is a constant term or indepvars in the model, and it may not be combined with the mle option.
constraints(numlist), collinear; see [R] estimation options.

SE/Robust

vce(vcetype) specifies the type of standard error reported, which includes types that are robust to
some kinds of misspecification (robust) and that are derived from asymptotic theory (oim); see
[R] vce option.
Options vce(robust) and mpl may not be combined.

Reporting

level(#), nocnsreport; see [R] estimation options.


display_options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel, fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and
nolstretch; see [R] estimation options.


Maximization

 
maximize_options: difficult, technique(algorithm_spec), iterate(#), [no]log, trace,
gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#),
nrtolerance(#), gtolerance(#), nonrtolerance, and from(init_specs); see [R] maximize for all options.
Some special points for arfima's maximize_options are listed below.
technique(algorithm_spec) sets the optimization algorithm. The default algorithm is BFGS, and
BHHH is not allowed. See [R] maximize for a description of the available optimization algorithms.
You can specify multiple optimization methods. For example, technique(bfgs 10 nr) requests
that the optimizer perform 10 BFGS iterations and then switch to Newton-Raphson until convergence.
iterate(#) sets the maximum number of iterations. When the maximization is not going well,
set the maximum number of iterations to the point where the optimizer appears to be stuck and
inspect the estimation results at that point.
from(matname) allows you to specify starting values for the model parameters in a row vector.
We recommend that you use the iterate(0) option, retrieve the initial estimates from e(b),
and modify these elements.
The following option is available with arfima but is not shown in the dialog box:
coeflegend; see [R] estimation options.

Remarks and examples


Long-memory processes are stationary processes whose autocorrelation functions decay more
slowly than those of short-memory processes. Because the autocorrelations die out so slowly, long-memory
processes display a type of long-run dependence. The autoregressive fractionally integrated moving-average
(ARFIMA) model provides a parsimonious parameterization of long-memory processes. This
parameterization nests the autoregressive moving-average (ARMA) model, which is widely used for
short-memory processes.
The ARFIMA model also generalizes the autoregressive integrated moving-average (ARIMA) model
with integer degrees of integration. ARFIMA models provide a solution for the tendency to overdifference
stationary series that exhibit long-run dependence. In the ARIMA approach, a nonstationary time series
is differenced d times until the differenced series is stationary, where d is an integer. Such series
are said to be integrated of order d, denoted I(d), with no differencing, I(0), being the option for
stationary series. Many series exhibit too much dependence to be I(0) but are not I(1), and ARFIMA
models are designed to represent these series.
The ARFIMA model allows for a continuum of fractional differences, $-0.5 < d < 0.5$. The
generalization to fractional differences allows the ARFIMA model to handle processes that are neither
I(0) nor I(1), to test for overdifferencing, and to model long-run effects that only die out at long
horizons.


Technical note
An ARIMA model for the series $y_t$ is given by

$$\rho(L)(1 - L)^d y_t = \theta(L)\epsilon_t \qquad (1)$$

where $\rho(L) = (1 - \rho_1 L - \rho_2 L^2 - \cdots - \rho_p L^p)$ is the autoregressive (AR) polynomial in the lag
operator $L$; $L y_t = y_{t-1}$; $\theta(L) = (1 + \theta_1 L + \theta_2 L^2 + \cdots + \theta_q L^q)$ is the moving-average (MA) lag
polynomial; $\epsilon_t$ is the independent and identically distributed innovation term; and $d$ is the integer
number of differences required to make the $y_t$ stationary. An ARFIMA model is also specified by (1)
with the generalization that $-0.5 < d < 0.5$. Series with $d \geq 0.5$ are handled by differencing and
subsequent ARFIMA modeling.

Because long-memory processes are stationary, one might be tempted to approximate the processes
with many terms in an ARMA model. But these approximate models are difficult to fit and to interpret
because ARMA models with many terms are difficult to estimate and the ARMA parameterization has
an inherent short-run nature. In contrast, the ARFIMA model has the d parameter for the long-run
dependence and ARMA parameters for short-run dependence. Using different parameters for different
types of dependence facilitates estimation and interpretation, as discussed by Sowell (1992a).

Technical note
An ARFIMA model specifies a fractionally integrated ARMA process. Formally, the ARFIMA model
specifies that

$$y_t = (1 - L)^{-d} \{\rho(L)\}^{-1} \theta(L)\epsilon_t$$

The short-run ARMA process $\rho(L)^{-1}\theta(L)\epsilon_t$ captures the short-run effects, and the long-run effects
are captured by fractionally integrating the short-run ARMA process.
Essentially, the fractional-integration parameter d captures the long-run effects, and the ARMA
parameters capture the short-run effects. Having separate parameters for short-run and long-run
effects makes the ARFIMA model more flexible and easier to interpret than the ARMA model. After
estimating the ARFIMA parameters, the short-run effects are obtained by setting d = 0, whereas the
long-run effects use the estimated value for d. The short-run effects describe the behavior of the
fractionally differenced process $(1 - L)^d y_t$, whereas the long-run effects describe the behavior of the
fractionally integrated $y_t$.
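To make the fractional filters concrete, the following Python sketch (an illustration only, not arfima's implementation) builds the lag coefficients of $(1 - L)^d$ from the standard binomial-series recursion and checks that fractional differencing undoes fractional integration. The value d = 0.4, the series length, and the helper names frac_weights and apply_filter are assumptions chosen for the illustration.

```python
def frac_weights(d, n):
    """First n lag coefficients of (1 - L)^d from the binomial-series recursion."""
    w = [1.0]
    for j in range(1, n):
        w.append(w[-1] * (j - 1 - d) / j)
    return w


def apply_filter(weights, x):
    """Causal filter: out[t] = sum over j <= t of weights[j] * x[t - j]."""
    return [sum(weights[j] * x[t - j] for j in range(t + 1)) for t in range(len(x))]


d = 0.4                       # illustrative value inside (-0.5, 0.5)
n = 50
shock = [1.0] + [0.0] * (n - 1)                          # unit impulse

integrated = apply_filter(frac_weights(-d, n), shock)    # (1 - L)^(-d): long-run response
recovered = apply_filter(frac_weights(d, n), integrated) # (1 - L)^d applied afterwards

print([round(v, 4) for v in integrated[:5]])
print(max(abs(r - s) for r, s in zip(recovered, shock)))
```

The slowly decaying values in integrated are the long-run response to a single shock; applying $(1 - L)^d$ returns the original impulse, which is the sense in which the short-run effects describe the fractionally differenced series.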

ARFIMA models have been useful in fields as diverse as hydrology and economics. Long-memory
processes were first introduced in hydrology by Hurst (1951). Hosking (1981), in hydrology, and
Granger and Joyeux (1980), in economics, independently discovered the ARFIMA representation of
long-memory processes. Beran (1994), Baillie (1996), and Palma (2007) provide good introductions
to long-memory processes and ARFIMA models.

Example 1: Mount Campito tree ring data


Baillie (1996) discusses a time series of measurements of the widths of the annual rings of a
Mount Campito Bristlecone pine. The series contains measurements on rings formed in the tree from
3436 BC to 1969 AD. Essentially, larger widths were good years for the tree and narrower widths
were harsh years.


We begin by plotting the time series.

. use http://www.stata-press.com/data/r13/campito
(Campito Mnt. tree ring data from 3435BC to 1969AD)
. tsline width, xlabel(-3435(500)1969) ysize(2)

[Figure: time-series line plot. y axis: tree ring width in 0.01mm; x axis: year]

Good years and bad years seem to run together, causing the appearance of local trends. The local
trends are evidence of dependence, but they are not as pronounced as those in a nonstationary series.
We plot the autocorrelations for another view:

. ac width, ysize(2)

[Figure: Autocorrelations of width plotted against Lag, with Bartlett's formula for MA(q) 95% confidence bands]

The autocorrelations start well below 1 but decay very slowly.


Granger and Joyeux (1980) show that the autocorrelations from an ARMA model decay exponentially,
whereas the autocorrelations from an ARFIMA process decay at the much slower hyperbolic rate. Box,
Jenkins, and Reinsel (2008) define short-memory processes as those whose autocorrelations decay
exponentially fast and long-memory processes as those whose autocorrelations decay at the hyperbolic
rate. The above plot of autocorrelations looks closer to hyperbolic than exponential.
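The difference between the two decay rates can be seen numerically. The sketch below compares the autocorrelations of a pure fractional-noise process, ARFIMA(0, d, 0), computed with the recursion $\rho_k = \rho_{k-1}(k - 1 + d)/(k - d)$ given by Hosking (1981), with those of an AR(1) process that has the same lag-1 autocorrelation. The value d = 0.3 and the function name are assumptions made for the illustration; nothing here is estimated from the tree ring data.

```python
def fractional_noise_acf(d, max_lag):
    """Autocorrelations of ARFIMA(0, d, 0) for lags 0..max_lag (hyperbolic decay)."""
    rho = [1.0]
    for k in range(1, max_lag + 1):
        rho.append(rho[-1] * (k - 1 + d) / (k - d))
    return rho


d = 0.3                                   # illustrative long-memory parameter
max_lag = 40
frac = fractional_noise_acf(d, max_lag)

phi = frac[1]                             # AR(1) matched at lag 1: rho_k = phi**k
ar1 = [phi ** k for k in range(max_lag + 1)]

for k in (1, 5, 10, 20, 40):
    print(f"lag {k:2d}: fractional noise {frac[k]:.4f}   AR(1) {ar1[k]:.2e}")
```

Both sequences start from the same lag-1 value, but the AR(1) autocorrelations are numerically negligible by lag 40, whereas the fractional-noise autocorrelations are still clearly positive.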
Together, the above plots make us suspect that the series was generated by a long-memory process.
We see evidence that the series is stationary but that the autocorrelations die out much more slowly than a
short-memory process would predict.
Given that we believe the data were generated by a stationary process, we begin by fitting an ARMA
model to the data. We begin with a short-memory model because a comparison of the results
highlights the advantages of using an ARFIMA model for a long-memory process.



. arima width, ar(1/2) ma(1) technique(bhhh 4 nr)
(setting optimization to BHHH)
Iteration 0:   log likelihood = -18934.593
Iteration 1:   log likelihood = -18914.337
Iteration 2:   log likelihood = -18913.407
Iteration 3:   log likelihood =  -18913.24
(switching optimization to Newton-Raphson)
Iteration 4:   log likelihood = -18913.214
Iteration 5:   log likelihood = -18913.208
Iteration 6:   log likelihood = -18913.208

ARIMA regression
Sample:  -3435 - 1969                           Number of obs      =      5405
                                                Wald chi2(3)       = 133686.46
Log likelihood = -18913.21                      Prob > chi2        =    0.0000

             |                 OIM
       width |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
width        |
       _cons |   42.45055    1.02142    41.56   0.000     40.44861     44.4525
-------------+----------------------------------------------------------------
ARMA         |
          ar |
         L1. |   1.264367   .0253199    49.94   0.000     1.214741    1.313994
         L2. |  -.2848827   .0227534   -12.52   0.000    -.3294785   -.2402869
             |
          ma |
         L1. |  -.8066007   .0189699   -42.52   0.000    -.8437811   -.7694204
-------------+----------------------------------------------------------------
      /sigma |   8.005814   .0770004   103.97   0.000     7.854896    8.156732

Note: The test of the variance against zero is one sided, and the two-sided
      confidence interval is truncated at zero.

The inverse roots of the AR polynomial are 0.971 and 0.293, and the inverse root of the MA polynomial is 0.807;
all of these are less than one in magnitude, indicating that the series is stationary and invertible
but has a high level of persistence. See Hamilton (1994, 59) for how to compute the roots of the
polynomials from the estimated coefficients.
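As a check on these values, the following Python sketch recomputes them from the coefficients reported in the arima output above. It assumes the convention that the AR polynomial is $1 - \rho_1 z - \rho_2 z^2$ and the MA polynomial is $1 + \theta_1 z$, and it reports inverse roots (reciprocals of the polynomial roots), which is why the values lie inside the unit circle. The variable names are assumptions for the illustration.

```python
import cmath

# Coefficients copied from the arima output above
ar1, ar2 = 1.264367, -0.2848827
ma1 = -0.8066007

# Inverse roots of 1 - ar1*z - ar2*z^2 solve x^2 - ar1*x - ar2 = 0
disc = cmath.sqrt(ar1 ** 2 + 4 * ar2)
ar_inverse_roots = [(ar1 + disc) / 2, (ar1 - disc) / 2]

# Inverse root of 1 + ma1*z
ma_inverse_root = -ma1

for r in ar_inverse_roots:
    print(f"AR inverse root: {abs(r):.3f}")
print(f"MA inverse root: {abs(ma_inverse_root):.3f}")
```

An AR inverse root of about 0.97 is close to the unit circle, which is the high persistence noted in the text.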
Below we estimate the parameters of an ARFIMA model with only the fractional difference parameter
and a constant.



. arfima width
Iteration 0:   log likelihood = -18918.219
Iteration 1:   log likelihood =  -18916.84
Iteration 2:   log likelihood = -18908.508
Iteration 3:   log likelihood = -18908.508
Iteration 4:   log likelihood = -18907.379
Iteration 5:   log likelihood = -18907.318
Iteration 6:   log likelihood = -18907.279
Iteration 7:   log likelihood = -18907.279
(backed up)
Refining estimates:
Iteration 0:   log likelihood = -18907.279
Iteration 1:   log likelihood = -18907.279

ARFIMA regression
Sample:  -3435 - 1969                           Number of obs      =      5405
                                                Wald chi2(1)       =   1864.44
Log likelihood = -18907.279                     Prob > chi2        =    0.0000

             |                 OIM
       width |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
width        |
       _cons |   44.01432   9.174319     4.80   0.000     26.03299    61.99566
-------------+----------------------------------------------------------------
ARFIMA       |
           d |   .4468888   .0103496    43.18   0.000     .4266038    .4671737
-------------+----------------------------------------------------------------
     /sigma2 |   63.92927   1.229754    51.99   0.000       61.519    66.33955

Note: The test of the variance against zero is one sided, and the two-sided
      confidence interval is truncated at zero.

The estimate of d is large and statistically significant. The relative parsimony of the ARFIMA model
is illustrated by the fact that the estimates of the standard deviation of the idiosyncratic errors are
about the same in the 5-parameter ARMA model and the 3-parameter ARFIMA model.


Let's add an AR parameter to the above ARFIMA model:


. arfima width, ar(1)
Iteration 0:   log likelihood = -18910.997
Iteration 1:   log likelihood = -18910.949
Iteration 2:   log likelihood = -18908.158
Iteration 3:   log likelihood = -18907.248
Iteration 4:   log likelihood = -18907.233
Iteration 5:   log likelihood = -18907.233
Iteration 6:   log likelihood = -18907.233
(backed up)
(backed up)
Refining estimates:
Iteration 0:   log likelihood = -18907.233
Iteration 1:   log likelihood = -18907.233

ARFIMA regression
Sample:  -3435 - 1969                           Number of obs      =      5405
                                                Wald chi2(2)       =   1875.35
Log likelihood = -18907.233                     Prob > chi2        =    0.0000

             |                 OIM
       width |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
width        |
       _cons |   43.98774    8.68516     5.06   0.000     26.96513    61.01034
-------------+----------------------------------------------------------------
ARFIMA       |
          ar |
         L1. |   .0063323    .020731     0.31   0.760    -.0342997    .0469642
             |
           d |   .4432471   .0157775    28.09   0.000     .4123238    .4741704
-------------+----------------------------------------------------------------
     /sigma2 |   63.92915   1.229754    51.99   0.000     61.51888    66.33942

Note: The test of the variance against zero is one sided, and the two-sided
      confidence interval is truncated at zero.

That the estimated AR term is tiny and statistically insignificant indicates that the d parameter has
accounted for all the dependence in the series.

As mentioned above, there is a sense in which the main advantages of an ARFIMA model over an
ARMA model for long-memory processes are the relative parsimony of the ARFIMA parameterization
and the ability of the ARFIMA parameterization to separate out the long-run effects from the short-run
effects. If the true process was generated from an ARFIMA model, an ARMA model with many terms
can approximate the process, but the terms make estimation difficult and the lack of separate long-run
and short-run parameters complicates interpretation.
This example highlights the relative parsimony of the ARFIMA model. In the examples below, we
illustrate the advantages of having separate parameters for long-run and short-run effects.

Technical note
You may be wondering what long-run effects can be produced by a model for stationary processes.
Because the autocorrelations of a long-memory process die out so slowly, the spectral density becomes
infinite as the frequency goes to 0 and the impulse-response functions die out at a much slower rate.
The spectral density of a process describes the relative contributions of random components at
different frequencies to the variance of the process, with the low-frequency components corresponding
to long-run effects. See [TS] psdensity for an introduction to estimating and interpreting spectral
densities implied by the estimated parameters of parametric models.


Granger and Joyeux (1980) motivate ARFIMA models by noting that their implied spectral densities
are finite except at frequency 0 with 0 < d < 0.5, whereas stationary ARMA models have finite spectral
densities at all frequencies. Granger and Joyeux (1980) argue that the ability of ARFIMA models to
capture this long-range dependence, which cannot be captured by stationary ARMA models, is an
important advantage of ARFIMA models over ARMA models when modeling long-memory processes.
Impulse-response functions are the coefficients on the infinite-order MA representation of a process,
and they describe how a shock feeds through the dynamic system. If the process is stationary, the
coefficients decay to 0 and they sum to a finite constant. As expected, the coefficients from an ARFIMA
model die out at a slower rate than those from an ARMA model. Because the ARMA terms model
the short-run effects and the d parameter models the long-run effects, an ARFIMA model specifies
both a short-run impulse-response function and a long-run impulse-response function. When an
ARMA model is used to approximate a long-memory model, the ARMA impulse-response-function
coefficients confound the two effects.

Example 2
In this example, we model the log of the monthly levels of carbon dioxide above Mauna Loa,
Hawaii. To remove the seasonality, we model the twelfth seasonal difference of the log of the series.
This example illustrates that the ARFIMA model parameterizes long-run and short-run effects, whereas
the ARMA model confounds the two effects. (Sowell [1992a] discusses this point in greater depth.)
We begin by fitting the series to an ARMA model with an AR(1) term and an MA(2).
. use http://www.stata-press.com/data/r13/mloa
. arima S12.log, ar(1) ma(2)
(setting optimization to BHHH)
Iteration 0:   log likelihood =  2000.9262
Iteration 1:   log likelihood =  2001.5484
Iteration 2:   log likelihood =  2001.5637
Iteration 3:   log likelihood =  2001.5641
Iteration 4:   log likelihood =  2001.5641

ARIMA regression
Sample:  1960m1 - 1990m12                       Number of obs      =       372
                                                Wald chi2(2)       =    500.41
Log likelihood =  2001.564                      Prob > chi2        =    0.0000

             |                 OPG
     S12.log |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
log          |
       _cons |   .0036754   .0002475    14.85   0.000     .0031903    .0041605
-------------+----------------------------------------------------------------
ARMA         |
          ar |
         L1. |   .7354346   .0357715    20.56   0.000     .6653237    .8055456
             |
          ma |
         L2. |   .1353086   .0513156     2.64   0.008     .0347319    .2358853
-------------+----------------------------------------------------------------
      /sigma |   .0011129   .0000401    27.77   0.000     .0010344    .0011914

Note: The test of the variance against zero is one sided, and the two-sided
      confidence interval is truncated at zero.

All the parameters are statistically significant, and they indicate a high degree of dependence.


Below we nest the previously fit ARMA model into an ARFIMA model.
. arfima S12.log, ar(1) ma(2)
Iteration 0:   log likelihood =  2006.0757
Iteration 1:   log likelihood =  2006.0774
Iteration 2:   log likelihood =  2006.0775
Iteration 3:   log likelihood =  2006.0804
Iteration 4:   log likelihood =  2006.0805
Refining estimates:
Iteration 0:   log likelihood =  2006.0805
Iteration 1:   log likelihood =  2006.0805
(backed up)
(backed up)

ARFIMA regression
Sample:  1960m1 - 1990m12                       Number of obs      =       372
                                                Wald chi2(3)       =    248.88
Log likelihood =  2006.0805                     Prob > chi2        =    0.0000

             |                 OIM
     S12.log |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
S12.log      |
       _cons |    .003616   .0012968     2.79   0.005     .0010743    .0061578
-------------+----------------------------------------------------------------
ARFIMA       |
          ar |
         L1. |   .2160894   .1015575     2.13   0.033     .0170403    .4151385
             |
          ma |
         L2. |   .1633916    .051691     3.16   0.002     .0620791    .2647041
             |
           d |   .4042573   .0805442     5.02   0.000     .2463935    .5621211
-------------+----------------------------------------------------------------
     /sigma2 |   1.20e-06   8.84e-08    13.63   0.000     1.03e-06    1.38e-06

Note: The test of the variance against zero is one sided, and the two-sided
      confidence interval is truncated at zero.

All the parameters are statistically significant at the 5% level. That the confidence interval for the
fractional-difference parameter d includes numbers greater than 0.5 is evidence that the series may be
nonstationary. Alternatively, we proceed as if the series is stationary, and the wide confidence interval
for d reflects the difficulty of fitting a complicated dynamic model with only 372 observations.
With the above caveat, we can now proceed to compare the interpretations of the ARMA and ARFIMA
estimates. We compare these estimates in terms of their implied spectral densities. The spectral density
of a stationary time series describes the relative importance of components at different frequencies.
See [TS] psdensity for an introduction to spectral densities.
Below we quietly refit the ARMA model and use psdensity to estimate the parametric spectral
density implied by the ARMA parameter estimates.
. quietly arima S12.log, ar(1) ma(2)
. psdensity d_arma omega1

The psdensity command above put the estimated ARMA spectral density into the new variable
d_arma at the frequencies stored in the new variable omega1.
Below we quietly refit the ARFIMA model and use psdensity to estimate the long-run parametric
spectral density and then the short-run parametric spectral density implied by the ARFIMA parameter
estimates. The long-run estimates use the estimated d, and the short-run estimates set d to 0 (as is
implied by specifying the smemory option). The long-run estimates describe the fractionally integrated
series, and the short-run estimates describe the fractionally differenced series.



. quietly arfima S12.log, ar(1) ma(2)
. psdensity d_arfima omega2
. psdensity ds_arfima omega3, smemory

Now that we have the ARMA estimates, the long-run ARFIMA estimates, and the short-run ARFIMA
estimates, we graph them below.
. line d_arma d_arfima omega1, name(lmem) nodraw
. line d_arma ds_arfima omega1, name(smem) nodraw
. graph combine lmem smem, cols(1) xcommon

[Figure: two stacked panels plotted against Frequency. Top panel: ARMA spectral density and ARFIMA long-memory spectral density. Bottom panel: ARMA spectral density and ARFIMA short-memory spectral density.]

The top graph contains a plot of the spectral densities implied by the ARMA parameter estimates
and by the long-run ARFIMA parameter estimates. As discussed by Granger and Joyeux (1980), the
two models imply different spectral densities for frequencies close to 0 when d > 0. When d > 0,
the spectral density implied by the ARFIMA estimates diverges to infinity, whereas the spectral density
implied by the ARMA estimates remains finite at frequency 0 for stable ARMA processes. This difference
reflects the ability of ARFIMA models to capture long-run effects that ARMA models only capture as
the parameters approach those of an unstable model.
The bottom graph contains a plot of the spectral densities implied by the ARMA parameter estimates
and by the short-run ARFIMA parameter estimates, which are the ARMA parameters for the fractionally
differenced process. Comparing the two plots illustrates the ability of the short-run ARFIMA parameters
to capture both low-frequency and high-frequency components in the fractionally differenced series. In
contrast, the ARMA parameters captured only low-frequency components in the fractionally integrated
series.
Comparing the ARFIMA and ARMA spectral densities in the two graphs illustrates that the additional
fractional-difference parameter allows the ARFIMA model to identify both long-run and short-run
effects, which the ARMA model confounds.
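The behavior near frequency 0 can be reproduced with a short calculation. The sketch below evaluates the textbook ARFIMA spectral density, $f(\omega) = (\sigma^2/2\pi)\,|1 - e^{-i\omega}|^{-2d}\,|\theta(e^{-i\omega})|^2 / |\rho(e^{-i\omega})|^2$, at the ARFIMA point estimates reported above, once with the estimated d (long run) and once with d = 0 (short run). The function name and the normalization are assumptions; psdensity may scale its output differently, so only the shapes are comparable.

```python
import cmath
import math


def spectral_density(omega, d, ar, ma, sigma2):
    """ARFIMA spectral density at frequency omega > 0; ar and ma map lag -> coefficient."""
    z = cmath.exp(-1j * omega)
    ar_poly = 1 - sum(c * z ** lag for lag, c in ar.items())
    ma_poly = 1 + sum(c * z ** lag for lag, c in ma.items())
    frac = abs(1 - z) ** (-2 * d)          # fractional-integration factor
    return sigma2 / (2 * math.pi) * frac * abs(ma_poly) ** 2 / abs(ar_poly) ** 2


# Point estimates copied from the arfima output above
d_hat = 0.4042573
ar = {1: 0.2160894}
ma = {2: 0.1633916}
sigma2 = 1.20e-06

for omega in (1.0, 0.1, 0.01, 0.001):
    long_run = spectral_density(omega, d_hat, ar, ma, sigma2)
    short_run = spectral_density(omega, 0.0, ar, ma, sigma2)
    print(f"omega={omega:<6} long-run={long_run:.3e}  short-run={short_run:.3e}")
```

As the frequency shrinks, the long-run density keeps growing while the short-run density levels off at a finite value, which is the pattern described for the two graphs.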

Technical note
As noted above, the spectral density of an ARFIMA process with d > 0 diverges to infinity as
the frequency goes to 0. In contrast, the spectral density of an ARFIMA process with d < 0 is 0 at
frequency 0.


The autocorrelation function of an ARFIMA process with d < 0 also decays at the slower hyperbolic
rate. ARFIMA processes with d < 0 are sometimes called antipersistent because all the autocorrelations
for lags greater than 0 are negative.
Hosking (1981), Baillie (1996), and others refer to ARFIMA processes with d < 0 as intermediate
memory processes and ARFIMA processes with d > 0 as long-memory processes. Box, Jenkins, and
Reinsel (2008, 429) define long-memory processes as those with the slower hyperbolic rate of decay,
which includes ARFIMA processes with d < 0. We follow Box, Jenkins, and Reinsel (2008) and thus
call ARFIMA processes for $-0.5 < d < 0$ and $0 < d < 0.5$ long-memory processes.
Sowell (1992a) uses the properties of ARFIMA processes with d < 0 to derive tests for whether a
series was generated by an I(1) process or an I(d) process with d < 1.

Example 3
In this example, we use arfima to test whether a series is nonstationary. More specifically, we
test whether the series was generated by an I(1) process by testing whether the first difference of
the series is overdifferenced.
We have monthly data on the log of the number of reported cases of mumps in New York City
between January 1928 and June 1972. We believe that the series is stationary, after accounting
for the monthly seasonal effects. We use an ARFIMA model for differenced series to test the null
hypothesis of nonstationarity. We use the confidence interval for the d parameter from an ARFIMA
model for the first difference of the log of the series to perform the test. If the right-hand end of the
95% CI is less than 0, we conclude that the differenced series was overdifferenced, which implies
that the original series was not nonstationary.
More formally, if $y_t$ is I(1), then $\Delta y_t = y_t - y_{t-1}$ must be I(0). If $\Delta y_t$ is I(d) with $d < 0$,
then $\Delta y_t$ is overdifferenced and $y_t$ is I(d) with $d < 1$.
We use seasonal indicators to account for the seasonal effects. In the output below, we specify the
mpl option to use the MPL estimator that is less biased in the presence of covariates.
arfima computes the maximum likelihood estimates (MLE) for the parameters of this stationary
and invertible Gaussian process. Alternatively, the MPL estimates may be computed. See
Methods and formulas for a description of these two estimation techniques, but suffice it to say
that the ML estimates of d are biased in the presence of exogenous variables, even the constant
term, for small samples. The MPL estimator reduces this bias; see Hauser (1999) and Doornik and
Ooms (2004).
. use http://www.stata-press.com/data/r13/mumps2, clear
(Hipel and Mcleod (1994), http://robjhyndman.com/tsdldata/epi/mumps.dat)
. arfima D.log i.month, ma(1 2) mpl
Iteration 0:   log modified profile likelihood =  53.766763
Iteration 1:   log modified profile likelihood =  54.388641
Iteration 2:   log modified profile likelihood =  54.934726
Iteration 3:   log modified profile likelihood =  54.937524
Iteration 4:   log modified profile likelihood =  55.002186
Iteration 5:   log modified profile likelihood =   55.20462
Iteration 6:   log modified profile likelihood =  55.205939
Iteration 7:   log modified profile likelihood =  55.205949
Iteration 8:   log modified profile likelihood =  55.205949
Refining estimates:
Iteration 0:   log modified profile likelihood =  55.205949
Iteration 1:   log modified profile likelihood =  55.205949
(backed up)
(backed up)

ARFIMA regression
Sample:  1928m2 - 1972m6                        Number of obs      =       533
                                                Wald chi2(14)      =   1360.28
Log modified profile likelihood =  55.205949    Prob > chi2        =    0.0000

             |                 OIM
       D.log |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
D.log        |
       month |
   February  |   -.220719   .0428112    -5.16   0.000    -.3046275   -.1368105
      March  |   .0314683   .0424718     0.74   0.459    -.0517749    .1147115
      April  |  -.2800296   .0460084    -6.09   0.000    -.3702043   -.1898548
        May  |  -.3703179   .0449932    -8.23   0.000    -.4585029   -.2821329
       June  |  -.4722035   .0446764   -10.57   0.000    -.5597676   -.3846394
       July  |  -.9613239   .0448375   -21.44   0.000    -1.049204    -.873444
     August  |  -1.063042   .0449272   -23.66   0.000    -1.151098   -.9749868
  September  |  -.7577301   .0452529   -16.74   0.000    -.8464242    -.669036
    October  |  -.3024251   .0462887    -6.53   0.000    -.3931494   -.2117009
   November  |  -.0115317   .0426911    -0.27   0.787    -.0952046    .0721413
   December  |   .0247135   .0430401     0.57   0.566    -.0596435    .1090705
             |
       _cons |   .3656807   .0303215    12.06   0.000     .3062517    .4251096
-------------+----------------------------------------------------------------
ARFIMA       |
          ma |
         L1. |    .258056   .0684414     3.77   0.000     .1239132    .3921988
         L2. |   .1972011   .0506439     3.89   0.000     .0979409    .2964613
             |
           d |  -.2329426   .0673361    -3.46   0.001    -.3649188   -.1009663

We interpret the fact that the estimated 95% CI for d lies strictly below 0 to mean that the differenced
series is overdifferenced, which implies that the original series is stationary.
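The arithmetic behind this conclusion can be sketched in a few lines of Python. The snippet rebuilds the normal-based 95% CI for d from the point estimate and standard error reported above and then adds 1 to translate it into an implied order of integration for the undifferenced series. The variable names are assumptions for the illustration, and the translation to the level series relies on the identity that $y_t$ is I(d + 1) when $\Delta y_t$ is I(d).

```python
from statistics import NormalDist

# Estimate and standard error of d copied from the arfima output above
d_hat = -0.2329426
se_d = 0.0673361

z = NormalDist().inv_cdf(0.975)            # two-sided 95% critical value
ci_low, ci_high = d_hat - z * se_d, d_hat + z * se_d

# If the first difference is I(d), the level series is I(d + 1)
level_low, level_high = ci_low + 1, ci_high + 1

overdifferenced = ci_high < 0
print(f"95% CI for d:            [{ci_low:.7f}, {ci_high:.7f}]")
print(f"implied order of levels: [{level_low:.3f}, {level_high:.3f}]")
print(f"upper bound below 0:     {overdifferenced}")
```

Because the upper end of the interval for d is below 0, the implied order of integration of the level series is below 1, which is the basis for rejecting the I(1) null.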

Stored results
arfima stores the following in e():
Scalars
  e(N)              number of observations
  e(k)              number of parameters
  e(k_eq)           number of equations in e(b)
  e(k_dv)           number of dependent variables
  e(k_aux)          number of auxiliary parameters
  e(df_m)           model degrees of freedom
  e(ll)             log likelihood
  e(chi2)           $\chi^2$
  e(p)              significance
  e(s2)             idiosyncratic error variance estimate, if e(method) = mpl
  e(tmin)           minimum time
  e(tmax)           maximum time
  e(ar_max)         maximum AR lag
  e(ma_max)         maximum MA lag
  e(rank)           rank of e(V)
  e(ic)             number of iterations
  e(rc)             return code
  e(converged)      1 if converged, 0 otherwise
  e(constant)       0 if noconstant, 1 otherwise
Macros
  e(cmd)            arfima
  e(cmdline)        command as typed
  e(depvar)         name of dependent variable
  e(covariates)     list of covariates
  e(eqnames)        names of equations
  e(title)          title in estimation output
  e(tmins)          formatted minimum time
  e(tmaxs)          formatted maximum time
  e(chi2type)       Wald; type of model $\chi^2$ test
  e(vce)            vcetype specified in vce()
  e(vcetype)        title used to label Std. Err.
  e(ma)             lags for MA terms
  e(ar)             lags for AR terms
  e(technique)      maximization technique
  e(tech_steps)     number of iterations performed before switching techniques
  e(properties)     b V
  e(estat_cmd)      program used to implement estat
  e(predict)        program used to implement predict
  e(marginsok)      predictions allowed by margins
  e(marginsnotok)   predictions disallowed by margins
Matrices
  e(b)              coefficient vector
  e(Cns)            constraints matrix
  e(ilog)           iteration log (up to 20 iterations)
  e(gradient)       gradient vector
  e(V)              variance-covariance matrix of the estimators
  e(V_modelbased)   model-based variance
Functions
  e(sample)         marks estimation sample

Methods and formulas


Methods and formulas are presented under the following headings:
Introduction
The likelihood function
The autocovariance function
The profile likelihood
The MPL

Introduction
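We model an observed second-order stationary time series $y_t$, $t = 1, \ldots, T$, using the
ARFIMA(p, d, q) model defined as

$$\rho(L^p)(1 - L)^d (y_t - x_t\beta) = \theta(L^q)\epsilon_t$$

where

$$\rho(L^p) = 1 - \rho_1 L - \rho_2 L^2 - \cdots - \rho_p L^p$$
$$\theta(L^q) = 1 + \theta_1 L + \theta_2 L^2 + \cdots + \theta_q L^q$$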
$$(1 - L)^d = \sum_{j=0}^{\infty} \frac{\Gamma(j - d)}{\Gamma(j + 1)\Gamma(-d)} L^j$$

and the lag operator is defined as Lj yt = ytj , t = 1, . . . , T and j = 1, . . . , t 1; t N (0, 2 );


() is the gamma function; and 0.5 < d < 0.5, d 6= 0. The row vector xt contains the exogenous
variables specified as indepvars in the arfima syntax.
The process is stationary and invertible for $-0.5 < d < 0.5$; the roots of the AR polynomial,
$\boldsymbol{\rho}(z) = 1 - \rho_1 z - \rho_2 z^2 - \cdots - \rho_p z^p = 0$, and the MA polynomial,
$\boldsymbol{\theta}(z) = 1 + \theta_1 z + \theta_2 z^2 + \cdots + \theta_q z^q = 0$, lie outside the unit circle and there are no
common roots. When $0 < d < 0.5$, the process has long memory in that the autocovariance function,
$\gamma_h$, decays to 0 at a hyperbolic rate, such that $\sum_{h=-\infty}^{\infty} |\gamma_h| = \infty$. When $-0.5 < d < 0$, the
process also has long memory in that the autocovariance function, $\gamma_h$, decays to 0 at a hyperbolic
rate such that $\sum_{h=-\infty}^{\infty} |\gamma_h| < \infty$. (As discussed in the text, some authors refer to ARFIMA
processes with $-0.5 < d < 0$ as having intermediate memory, but we follow Box, Jenkins, and
Reinsel [2008] and refer to them as long-memory processes.)
Granger and Joyeux (1980), Hosking (1981), Sowell (1992a, 1992b), Baillie (1996), and
Palma (2007) provide overviews of long-memory processes, fractional integration, and introductions
to ARFIMA models.

The likelihood function


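Estimation of the ARFIMA parameters $\boldsymbol{\rho}$, $\boldsymbol{\theta}$, $d$, and $\sigma^2$ is done by the method of maximum
likelihood. The log Gaussian likelihood of $\mathbf{y}$ given parameter estimates
$\hat{\boldsymbol{\eta}} = (\hat{\boldsymbol{\beta}}', \hat{\boldsymbol{\rho}}', \hat{\boldsymbol{\theta}}', \hat{d}, \hat{\sigma}^2)$ is

$$\ell(\mathbf{y}|\hat{\boldsymbol{\eta}}) = -\frac{1}{2}\left\{ T\log(2\pi) + \log|\hat{\mathbf{V}}| + (\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})'\hat{\mathbf{V}}^{-1}(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}) \right\} \qquad (1)$$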

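where the covariance matrix $\mathbf{V}$ has a Toeplitz structure

$$\mathbf{V} = \begin{pmatrix}
\gamma_0 & \gamma_1 & \gamma_2 & \dots & \gamma_{T-1} \\
\gamma_1 & \gamma_0 & \gamma_1 & \dots & \gamma_{T-2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\gamma_{T-1} & \gamma_{T-2} & \gamma_{T-3} & \dots & \gamma_0
\end{pmatrix} \qquad (2)$$

$\mathrm{Var}(y_t) = \gamma_0$, $\mathrm{Cov}(y_t, y_{t-h}) = \gamma_h$ (for $h = 1, \dots, t - 1$), and $t = 1, \dots, T$ (Sowell 1992b).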


We use the Durbin-Levinson algorithm (Palma 2007; Golub and Van Loan 1996) to factor and
invert $\mathbf{V}$. Using only the vector of autocovariances $\boldsymbol{\gamma}$, the Durbin-Levinson algorithm will compute
$\hat{\boldsymbol{\epsilon}} = \hat{\mathbf{D}}^{-0.5}\hat{\mathbf{L}}^{-1}(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})$, where $\mathbf{L}$ is lower triangular and $\mathbf{V} = \mathbf{L}\mathbf{D}\mathbf{L}'$ and $\mathbf{D} = \mathrm{Diag}(\nu_t)$,
$\nu_t = \mathrm{Var}(y_t)$. The algorithm performs these computations without generating the $T \times T$ matrix $\mathbf{L}^{-1}$.
During optimization, we restrict the fractional-integration parameter to $(-0.5, 0.5)$ using a logistic
transform, $d^* = \log\{(x + 0.5)/(0.5 - x)\}$, so that the range of $d^*$ encompasses the real line. During
the "Refining estimates" step, the fractional-integration parameter is transformed back to the restricted
space, where we obtain its standard error from the observed information matrix.
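
The logistic transform described above and its inverse can be checked numerically. The following is a minimal sketch, not arfima's internal code; the starting value 0.46 and the scalar names are assumptions chosen for illustration. It relies on the fact that log{(x + 0.5)/(0.5 - x)} is the logit of x + 0.5.

```stata
* Assumed value on the restricted scale (-0.5, 0.5)
scalar x = 0.46

* Forward transform to the real line
scalar dstar = ln((scalar(x) + 0.5)/(0.5 - scalar(x)))
display "transformed value:       " scalar(dstar)
display "same value via logit():  " logit(scalar(x) + 0.5)

* Inverse transform back to the restricted scale
display "back-transformed value:  " invlogit(scalar(dstar)) - 0.5
```

Optimizing over the unrestricted value means the optimizer can never step outside (-0.5, 0.5) on the original scale.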

The autocovariance function


Computation of the autocovariances $\gamma_h$ is given by Sowell (1992b) with numerical enhancements
by Doornik and Ooms (2003) and is reviewed by Palma (2007, sec. 3.2.4). We reproduce it here.
The autocovariance of an ARFIMA($0, d, 0$) process is

$$\gamma_h^* = \sigma^2 \frac{\Gamma(1 - 2d)}{\Gamma(1 - d)\Gamma(d)} \frac{\Gamma(h + d)}{\Gamma(1 + h - d)}$$

where $h = 0, 1, \dots$. For ARFIMA($p, d, q$), we have

$$\gamma_h = \sigma^2 \sum_{i=-q}^{q} \sum_{j=1}^{p} \psi(i)\,\xi_j\, C(d, p + i - h, \rho_j) \qquad (3)$$

where

$$\psi(i) = \sum_{k=\max(0,i)}^{\min(q,q+i)} \theta_k \theta_{k-i}$$

$$\xi_j = \left[ \rho_j \prod_{i=1}^{p} (1 - \rho_i \rho_j) \prod_{m \neq j} (\rho_j - \rho_m) \right]^{-1}$$

and

$$C(d, h, \rho) = \frac{\gamma_h^*}{\sigma^2} \left\{ \rho^{2p} F(d + h, 1, 1 - d + h, \rho) + F(d - h, 1, 1 - d - h, \rho) - 1 \right\}$$

$F(\cdot)$ is the hypergeometric series (Gradshteyn and Ryzhik 2007)

$$F(a, b, c, x) = 1 + \frac{ab}{c \cdot 1} x + \frac{a(a + 1)b(b + 1)}{c(c + 1) \cdot 1 \cdot 2} x^2 + \frac{a(a + 1)(a + 2)b(b + 1)(b + 2)}{c(c + 1)(c + 2) \cdot 1 \cdot 2 \cdot 3} x^3 + \dots$$

The series recursions are evaluated backward as Doornik and Ooms (2003) emphasize. Doornik and
Ooms (2003) also provide other computational enhancements, such as not dividing by $\rho_j$ in (3).
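
The ARFIMA(0, d, 0) autocovariance formula above is easy to evaluate directly. The sketch below is only an illustration under assumed values (d = 0.4 and an innovation variance of 1, with hypothetical variable names); it is not how arfima computes autocovariances. It works on the log scale with lngamma(), which is why it is restricted to 0 < d < 0.5, where every gamma-function argument is positive. It is written for a do-file.

```stata
clear
set obs 11
scalar d  = 0.4     // assumed fractional-integration parameter, 0 < d < 0.5
scalar s2 = 1       // assumed innovation variance

generate h = _n - 1                     // lags 0, 1, ..., 10

* gamma_h = s2 * Gamma(1-2d)/{Gamma(1-d)Gamma(d)} * Gamma(h+d)/Gamma(1+h-d),
* evaluated with lngamma() to avoid overflow at large h
generate double gam = scalar(s2)*exp(lngamma(1 - 2*scalar(d))   ///
        - lngamma(1 - scalar(d)) - lngamma(scalar(d))            ///
        + lngamma(h + scalar(d)) - lngamma(1 + h - scalar(d)))

generate double rho = gam/gam[1]        // implied autocorrelations
list h gam rho, noobs
```

Listing more lags makes the slow, hyperbolic decay of the autocorrelations visible.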

The profile likelihood


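Doornik and Ooms (2003) show that the parameters $\sigma^2$ and $\boldsymbol{\beta}$ can be concentrated out of the
likelihood. Using (2), the MLE for $\sigma^2$ is

$$\hat{\sigma}^2 = \frac{1}{T} (\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})'\hat{\mathbf{R}}^{-1}(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}) \qquad (4)$$

where $\mathbf{R} = \frac{1}{\sigma^2}\mathbf{V}$ and

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\hat{\mathbf{R}}^{-1}\mathbf{X})^{-1}\mathbf{X}'\hat{\mathbf{R}}^{-1}\mathbf{y} \qquad (5)$$

is the weighted least-squares estimate for $\boldsymbol{\beta}$. Substituting (4) into (2) results in the profile likelihood

$$\ell_p(\mathbf{y}|\hat{\boldsymbol{\eta}}_r) = -\frac{T}{2} \left\{ 1 + \log(2\pi) + \frac{1}{T}\log|\hat{\mathbf{R}}| + \log\hat{\sigma}^2 \right\}$$

We compute the MLEs using the profile likelihood for the reduced parameter set $\boldsymbol{\eta}_r = (\boldsymbol{\rho}', \boldsymbol{\theta}', d)$.
Equations (4) and (5) provide MLEs for $\sigma^2$ and $\boldsymbol{\beta}$ to create the full parameter vector
$\boldsymbol{\eta} = (\boldsymbol{\beta}', \boldsymbol{\rho}', \boldsymbol{\theta}', d, \sigma^2)$. We follow with the "Refining estimates" step, optimizing on the log likelihood
(1). The refining step does not change the estimates; it produces the coefficient variance-covariance
matrix from the observed information matrix.
Using this profile likelihood prevents the use of the BHHH optimization method because there are
no observation-level scores.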
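
To make the concentration step concrete, here is a minimal Mata sketch that evaluates (5), (4), and the profile log likelihood for a given scaled covariance matrix R. Everything in it is an assumption for illustration: the data are simulated, and R is taken to be a Toeplitz matrix with first column 0.5^h rather than one implied by ARFIMA parameters. It forms and inverts R explicitly, which the Durbin-Levinson approach described earlier avoids.

```stata
mata:
rseed(12345)
T = 50

// Made-up data: a constant and one regressor (not from any real dataset)
X = (J(T, 1, 1), rnormal(T, 1, 0, 1))
y = X*(1 \ 2) + rnormal(T, 1, 0, 1)

// Assumed scaled covariance matrix R, first column 0.5^h for h = 0, ..., T-1
r = J(T, 1, .)
for (h=1; h<=T; h++) r[h] = 0.5^(h-1)
R = Toeplitz(r, r')

Rinv = invsym(R)
b    = invsym(X'*Rinv*X)*X'*Rinv*y      // weighted least squares, equation (5)
res  = y - X*b
s2   = (res'*Rinv*res)/T                // variance estimate, equation (4)

// Profile log likelihood evaluated at this R
lp = -(T/2)*(1 + ln(2*pi()) + ln(det(R))/T + ln(s2))

b'
s2
lp
end
```

In an actual optimization, R would be rebuilt from the current values of the AR, MA, and fractional-integration parameters at every iteration, and only those parameters would be searched over.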


The MPL
The small-sample MLE for $d$ can be biased when there are exogenous variables in the model. The
MPL reduces this bias (Hauser 1999; Doornik and Ooms 2004). The mpl option will direct arfima
to use this optimization criterion. The MPL is expressed as

$$\ell_m(\mathbf{y}|\hat{\boldsymbol{\eta}}_r) = -\frac{T}{2}\{1 + \log(2\pi)\} + \left( \frac{1}{T} - \frac{1}{2} \right) \log|\hat{\mathbf{R}}| - \frac{T - k - 2}{2} \log\hat{\sigma}^2 - \frac{1}{2} \log|\mathbf{X}'\hat{\mathbf{R}}^{-1}\mathbf{X}|$$

where $k = \mathrm{rank}(\mathbf{X})$ (An and Bloomfield 1993).

There is no MPL estimator for $\sigma^2$, and you will notice its absence from the coefficient table.
However, the unbiased estimate assuming ARFIMA($0, 0, 0$),

$$\tilde{\sigma}^2 = \frac{(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})'\hat{\mathbf{R}}^{-1}(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})}{T - k}$$

is stored in e() for postestimation computation of the forecast and residual root mean squared errors.
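
A simple way to see the effect of the mpl option is to fit the same specification with both criteria and compare the estimates. The sketch below reuses the dataset from the [TS] arfima postestimation examples; the specification is arbitrary and has only a constant as its exogenous variable, so the two sets of estimates may well be close. The names mle and mpl given to estimates store are hypothetical.

```stata
use http://www.stata-press.com/data/r13/tb1yr, clear

* Default criterion: maximum likelihood
arfima tb1yr, ar(1/2) ma(1)
estimates store mle

* Same specification, estimated with the MPL criterion
arfima tb1yr, ar(1/2) ma(1) mpl
estimates store mpl

* Compare point estimates and standard errors side by side
estimates table mle mpl, b se
```

The variance parameter should appear only in the first column, in line with the statement above that there is no MPL estimator for it.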

References
An, S., and P. Bloomfield. 1993. Cox and Reid's modification in regression models with correlated errors. Technical
report, Department of Statistics, North Carolina State University, Raleigh, NC.
Baillie, R. T. 1996. Long memory processes and fractional integration in econometrics. Journal of Econometrics 73:
5-59.
Beran, J. 1994. Statistics for Long-Memory Processes. Boca Raton: Chapman & Hall/CRC.
Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. 2008. Time Series Analysis: Forecasting and Control. 4th ed.
Hoboken, NJ: Wiley.
Doornik, J. A., and M. Ooms. 2003. Computational aspects of maximum likelihood estimation of autoregressive
fractionally integrated moving average models. Computational Statistics & Data Analysis 42: 333-348.
Doornik, J. A., and M. Ooms. 2004. Inference and forecasting for ARFIMA models with an application to US and UK
inflation. Studies in Nonlinear Dynamics & Econometrics 8: 1-23.
Golub, G. H., and C. F. Van Loan. 1996. Matrix Computations. 3rd ed. Baltimore: Johns Hopkins University Press.
Gradshteyn, I. S., and I. M. Ryzhik. 2007. Table of Integrals, Series, and Products. 7th ed. San Diego: Elsevier.
Granger, C. W. J., and R. Joyeux. 1980. An introduction to long-memory time series models and fractional differencing.
Journal of Time Series Analysis 1: 15-29.
Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
Hauser, M. A. 1999. Maximum likelihood estimators for ARMA and ARFIMA models: a Monte Carlo study. Journal
of Statistical Planning and Inference 80: 229-255.
Hosking, J. R. M. 1981. Fractional differencing. Biometrika 68: 165-176.
Hurst, H. E. 1951. Long-term storage capacity of reservoirs. Transactions of the American Society of Civil Engineers
116: 770-779.
Palma, W. 2007. Long-Memory Time Series: Theory and Methods. Hoboken, NJ: Wiley.
Sowell, F. 1992a. Modeling long-run behavior with the fractional ARIMA model. Journal of Monetary Economics
29: 277-302.
Sowell, F. 1992b. Maximum likelihood estimation of stationary univariate fractionally integrated time series models.
Journal of Econometrics 53: 165-188.


Also see
[TS] arfima postestimation Postestimation tools for arfima
[TS] tsset Declare data to be time-series data
[TS] arima ARIMA, ARMAX, and other dynamic regression models
[TS] sspace State-space models
[U] 20 Estimation and postestimation commands


Title
arfima postestimation Postestimation tools for arfima
Description             Syntax for predict      Menu for predict    Options for predict
Remarks and examples    Methods and formulas    References          Also see

Description
The following postestimation commands are of special interest after arfima:
    Command           Description
    estat acplot      estimate autocorrelations and autocovariances
    irf               create and analyze IRFs
    psdensity         estimate the spectral density

The following standard postestimation commands are also available:

    Command           Description
    contrast          contrasts and ANOVA-style joint tests of estimates
    estat ic          Akaike's and Schwarz's Bayesian information criteria (AIC and BIC)
    estat summarize   summary statistics for the estimation sample
    estat vce         variance-covariance matrix of the estimators (VCE)
    estimates         cataloging estimation results
    forecast          dynamic forecasts and simulations
    lincom            point estimates, standard errors, testing, and inference for linear combinations
                      of coefficients
    lrtest            likelihood-ratio test
    margins           marginal means, predictive margins, marginal effects, and average marginal
                      effects
    marginsplot       graph the results from margins (profile plots, interaction plots, etc.)
    nlcom             point estimates, standard errors, testing, and inference for nonlinear combinations
                      of coefficients
    predict           predictions, residuals, influence statistics, and other diagnostic measures
    predictnl         point estimates, standard errors, testing, and inference for generalized predictions
    pwcompare         pairwise comparisons of estimates
    test              Wald tests of simple and composite linear hypotheses
    testnl            Wald tests of nonlinear hypotheses

estat ic, margins, marginsplot, nlcom, and predictnl are not appropriate after arfima, mpl.
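
As an illustration of how some of these commands are typically called after arfima, consider the sketch below. It assumes the tb1yr model used in the examples later in this entry; sdens and omega are hypothetical names for the new variables that psdensity creates.

```stata
use http://www.stata-press.com/data/r13/tb1yr, clear
arfima tb1yr, ar(1/2) ma(1)

* Commands of special interest after arfima
estat acplot                // plot the estimated autocorrelation function
psdensity sdens omega       // estimated spectral density and its frequencies
line sdens omega            // plot the spectral density against frequency

* A few of the standard postestimation commands
estat ic                    // AIC and BIC (not appropriate after arfima, mpl)
estat summarize             // summary statistics for the estimation sample
estat vce                   // variance-covariance matrix of the estimators
```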


Syntax for predict


    predict [type] newvar [if] [in] [, statistic options]

    statistic             Description
    Main
      xb                  predicted values; the default
      residuals           predicted innovations
      rstandard           standardized innovations
      fdifference         fractionally differenced series

    These statistics are available both in and out of sample; type predict ... if e(sample) ... if wanted only for
    the estimation sample.

    options               Description
    Options
      rmse([type] newvar) put the estimated root mean squared error of the predicted statistic
                          in a new variable; only permitted with options xb and residuals
      dynamic(datetime)   forecast the time series starting at datetime; only permitted with
                          option xb

    datetime is a # or a time literal, such as td(1jan1995) or tq(1995q1); see [D] datetime.

Menu for predict


Statistics > Postestimation > Predictions, residuals, etc.

Options for predict




Main

xb, the default, calculates the predictions for the level of depvar.
residuals calculates the predicted innovations.
rstandard calculates the standardized innovations.
fdifference calculates the fractionally differenced predictions of depvar.


Options


rmse([type] newvar) puts the root mean squared errors of the predicted statistics into the specified
new variables. The root mean squared errors measure the variances due to the disturbances but do
not account for estimation error. rmse() is only permitted with the xb and residuals options.
dynamic(datetime) specifies when predict starts producing dynamic forecasts. The specified datetime must be in the scale of the time variable specified in tsset, and the datetime must be
inside a sample for which observations on the dependent variables are available. For example, dynamic(tq(2008q4)) causes dynamic predictions to begin in the fourth quarter of 2008, assuming
that your time variable is quarterly; see [D] datetime. If the model contains exogenous variables,
they must be present for the whole predicted sample. dynamic() may only be specified with xb.
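
The options above can be combined as in the following sketch, which assumes the tb1yr model from example 1 below. All new-variable names (yhat, yhat_rmse, eps, eps_std) are hypothetical.

```stata
use http://www.stata-press.com/data/r13/tb1yr, clear
arfima tb1yr, ar(1/2) ma(1)

* One-step predictions together with their root mean squared errors
predict yhat, xb rmse(yhat_rmse)

* Predicted and standardized innovations, for the estimation sample only
predict eps if e(sample), residuals
predict eps_std if e(sample), rstandard

summarize yhat yhat_rmse eps eps_std
```

Because rmse() is permitted only with xb and residuals, it is not specified on the rstandard call.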


Remarks and examples


Remarks are presented under the following headings:
Forecasting after ARFIMA
IRF results for ARFIMA

Forecasting after ARFIMA


We assume that you have already read [TS] arfima. In this section, we illustrate some of the
features of predict after fitting an ARFIMA model using arfima.

Example 1
We have monthly data on the one-year Treasury bill secondary market rate imported from the
Federal Reserve Bank (FRED) database using freduse; see Drukker (2006) and Stata YouTube video:
Using freduse to download time-series data from the Federal Reserve for an introduction to freduse.
Below we fit an ARFIMA model with two autoregressive terms and one moving-average term to the
data.


. use http://www.stata-press.com/data/r13/tb1yr
(FRED, 1-year treasury bill; secondary market rate, monthly 1959-2001)
. arfima tb1yr, ar(1/2) ma(1)
Iteration 0:   log likelihood = -235.31856
Iteration 1:   log likelihood = -235.26104  (backed up)
Iteration 2:   log likelihood = -235.25974  (backed up)
Iteration 3:   log likelihood =  -235.2544  (backed up)
Iteration 4:   log likelihood = -235.13353
Iteration 5:   log likelihood = -235.13063
Iteration 6:   log likelihood = -235.12108
Iteration 7:   log likelihood = -235.11917
Iteration 8:   log likelihood = -235.11869
Iteration 9:   log likelihood = -235.11868
Refining estimates:
Iteration 0:   log likelihood = -235.11868
Iteration 1:   log likelihood = -235.11868
ARFIMA regression
Sample:  1959m7 - 2001m8                        Number of obs      =       506
                                                Wald chi2(4)       =   1864.15
Log likelihood = -235.11868                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
             |                 OIM
       tb1yr |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
tb1yr        |
       _cons |   5.496709   2.920357     1.88   0.060    -.2270864     11.2205
-------------+----------------------------------------------------------------
ARFIMA       |
          ar |
         L1. |   .2326107   .1136655     2.05   0.041     .0098304    .4553911
         L2. |   .3885212   .0835665     4.65   0.000     .2247337    .5523086
          ma |
         L1. |   .7755848   .0669562    11.58   0.000     .6443531    .9068166
           d |   .4606489   .0646542     7.12   0.000      .333929    .5873688
-------------+----------------------------------------------------------------
     /sigma2 |   .1466495    .009232    15.88   0.000     .1285551    .1647439
------------------------------------------------------------------------------
Note: The test of the variance against zero is one sided, and the two-sided
      confidence interval is truncated at zero.

All the ARFIMA parameters are statistically significant at the 5% level (the constant is significant
only at the 10% level), and they indicate a high degree of dependence in the series. In fact, the
confidence interval for the fractional-difference parameter d extends above 0.5, which indicates that
the series may be nonstationary. We will proceed as if the series is stationary and
suppose that it is fractionally integrated of order 0.46.
We begin our postestimation analysis by predicting the series in sample:
. predict ptb
(option xb assumed)

We continue by using the estimated fractional-difference parameter to fractionally difference the
original series and by plotting the original series, the predicted series, and the fractionally differenced
series. See [TS] arfima for a definition of the fractional-difference operator.


. predict fdtb, fdifference


. twoway tsline tb1yr ptb fdtb, legend(cols(1))

[Figure: time-series line plot against month, 1960m1 to 2000m1, of three series: 1-Year Treasury Bill: Secondary
Market Rate; xb prediction; tb1yr fractionally differenced]

The above graph shows that the in-sample predictions appear to track the original series well and
that the fractionally differenced series looks much more like a stationary series than does the original.

Example 2
In this example, we use the above estimates to produce a dynamic forecast and a confidence
interval for the forecast for the one-year treasury bill rate and plot them.
We begin by extending the dataset and using predict to put the dynamic forecast in the new
ftb variable and the root mean squared error of the forecast in the new rtb variable. (As discussed
in Methods and formulas, the root mean squared error of the forecast accounts for the idiosyncratic
error but not for the estimation error.)
. tsappend, add(12)
. predict ftb, xb dynamic(tm(2001m9)) rmse(rtb)

Now we compute a 90% confidence interval around the dynamic forecast and plot the original
series, the in-sample forecast, the dynamic forecast, and the confidence interval of the dynamic
forecast.
. scalar z = invnormal(0.95)
. generate lb = ftb - z*rtb if month>=tm(2001m9)
(506 missing values generated)
. generate ub = ftb + z*rtb if month>=tm(2001m9)
(506 missing values generated)
. twoway tsline tb1yr ftb if month>tm(1998m12) ||
>        tsrline lb ub if month>=tm(2001m9),
>        legend(cols(1) label(3 "90% prediction interval"))


[Figure: time-series line plot against month, 1999m1 to 2002m1, of 1-Year Treasury Bill: Secondary Market Rate;
xb prediction, dynamic(tm(2001m9)); and the 90% prediction interval]

IRF results for ARFIMA


We assume that you have already read [TS] irf and [TS] irf create. In this section, we illustrate
how to calculate the impulse-response function (IRF) of an ARFIMA model.

Example 3
Here we use the estimates obtained in example 1 to calculate the IRF of the ARFIMA model; see
[TS] irf and [TS] irf create for more details about IRFs.
. irf create arfima, step(50) set(myirf)
(file myirf.irf created)
(file myirf.irf now active)
(file myirf.irf updated)

. irf graph irf


[Figure: impulse-response function (irf) with 95% CI for irfname arfima, impulse variable tb1yr, and response
variable tb1yr, plotted against step from 0 to 50. Graphs by irfname, impulse variable, and response variable]


The figure shows that a shock to tb1yr causes an initial spike in tb1yr, after which the impact
of the shock starts decaying slowly. This behavior is characteristic of long-memory processes.
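
The same results can be viewed as numbers rather than as a graph. The short sketch below assumes that myirf.irf from example 3 is still the active IRF file and that it holds the results named arfima.

```stata
* Tabulate the impulse-response function stored under the name arfima
irf table irf, irf(arfima)

* List what the active IRF file contains
irf describe
```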

Methods and formulas


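Denote $\gamma_h$, $h = 1, \dots, t$, to be the autocovariance function of the ARFIMA($p, d, q$) process for
two observations, $y_t$ and $y_{t-h}$, $h$ time periods apart. The covariance matrix $\mathbf{V}$ of the process of
length $T$ has a Toeplitz structure of

$$\mathbf{V} = \begin{pmatrix}
\gamma_0 & \gamma_1 & \gamma_2 & \dots & \gamma_{T-1} \\
\gamma_1 & \gamma_0 & \gamma_1 & \dots & \gamma_{T-2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\gamma_{T-1} & \gamma_{T-2} & \gamma_{T-3} & \dots & \gamma_0
\end{pmatrix}$$

where the process variance is $\gamma_0 = \mathrm{Var}(y_t)$. We factor $\mathbf{V} = \mathbf{L}\mathbf{D}\mathbf{L}'$, where $\mathbf{L}$ is lower triangular and
$\mathbf{D} = \mathrm{Diag}(\nu_t)$. The structure of $\mathbf{L}^{-1}$ is of importance.

$$\mathbf{L}^{-1} = \begin{pmatrix}
1 & 0 & 0 & \dots & 0 & 0 \\
-\tau_{1,1} & 1 & 0 & \dots & 0 & 0 \\
-\tau_{2,2} & -\tau_{2,1} & 1 & \dots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
-\tau_{T-1,T-1} & -\tau_{T-1,T-2} & -\tau_{T-1,T-3} & \dots & -\tau_{T-1,1} & 1
\end{pmatrix}$$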

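Let $z_t = y_t - \mathbf{x}_t\boldsymbol{\beta}$. The best linear predictor of $z_{t+1}$ based on $z_1, z_2, \dots, z_t$ is
$\hat{z}_{t+1} = \sum_{k=1}^{t} \tau_{t,k} z_{t-k+1}$. Define $\boldsymbol{\tau}_t = (\tau_{t,t}, \tau_{t,t-1}, \dots, \tau_{t,1})$ to be the $t$th row of $\mathbf{L}^{-1}$ up to, but
not including, the diagonal. Then $\boldsymbol{\tau}_t = \mathbf{V}_t^{-1}\boldsymbol{\gamma}_t$, where $\mathbf{V}_t$ is the $t \times t$ upper left submatrix of $\mathbf{V}$ and
$\boldsymbol{\gamma}_t = (\gamma_1, \gamma_2, \dots, \gamma_t)'$. Hence, the best linear predictor of the innovations is computed as
$\hat{\boldsymbol{\epsilon}} = \mathbf{L}^{-1}\mathbf{z}$, and the one-step predictions are $\hat{\mathbf{y}} = \hat{\boldsymbol{\epsilon}} + \mathbf{X}\hat{\boldsymbol{\beta}}$. In practice, the computation is

$$\hat{\mathbf{y}} = \hat{\mathbf{L}}^{-1}\left( \mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}} \right) + \mathbf{X}\hat{\boldsymbol{\beta}}$$

where $\hat{\mathbf{L}}$ and $\hat{\mathbf{V}}$ are computed from the maximum likelihood estimates. We use the Durbin-Levinson
algorithm (Palma 2007; Golub and Van Loan 1996) to factor $\hat{\mathbf{V}}$, invert $\hat{\mathbf{L}}$, and scale $\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}$ using
only the vector of estimated autocovariances $\hat{\boldsymbol{\gamma}}$.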
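The prediction error variances of the one-step predictions are computed recursively in the
Durbin-Levinson algorithm. They are the $\nu_t$ elements in the diagonal matrix $\mathbf{D}$ computed from the Cholesky
factorization of $\mathbf{V}$. The recursive formula is $\nu_0 = \gamma_0$, and $\nu_t = \nu_{t-1}(1 - \tau_{t,t}^2)$.

Forecasting is carried out as described by Beran (1994, sec. 8.7), $\hat{z}_{T+k} = \tilde{\boldsymbol{\gamma}}_k'\hat{\mathbf{V}}^{-1}\hat{\mathbf{z}}$, where
$\tilde{\boldsymbol{\gamma}}_k' = (\hat{\gamma}_{T+k-1}, \hat{\gamma}_{T+k-2}, \dots, \hat{\gamma}_k)$. The forecast mean squared error is computed as
$\mathrm{MSE}(\hat{z}_{T+k}) = \hat{\gamma}_0 - \tilde{\boldsymbol{\gamma}}_k'\hat{\mathbf{V}}^{-1}\tilde{\boldsymbol{\gamma}}_k$. Computation of $\hat{\mathbf{V}}^{-1}\tilde{\boldsymbol{\gamma}}_k$ is carried out efficiently using algorithm 4.7.2 of Golub and Van
Loan (1996).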
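
The recursion for the prediction error variances can be sketched in Mata. This is a minimal illustration, not the code that predict uses: it assumes the textbook Durbin-Levinson updates for the coefficients (stored here in a matrix called tau), and its input is a made-up autocovariance sequence, that of an AR(1) process with coefficient 0.5 and unit innovation variance.

```stata
mata:
T   = 6
phi = 0.5                       // assumed AR(1) coefficient

// g[h] holds gamma_{h-1}, the autocovariance at lag h-1
g = J(T, 1, .)
for (h=1; h<=T; h++) g[h] = phi^(h-1)/(1 - phi^2)

nu  = J(T, 1, .)                // nu[t+1] holds nu_t
tau = J(T, T, 0)                // tau[t,k] holds tau_{t,k}
nu[1] = g[1]                    // nu_0 = gamma_0

for (t=1; t<T; t++) {
    // tau_{t,t} = (gamma_t - sum_k tau_{t-1,k} gamma_{t-k}) / nu_{t-1}
    num = g[t+1]
    for (k=1; k<t; k++) num = num - tau[t-1,k]*g[t-k+1]
    tau[t,t] = num/nu[t]

    // tau_{t,k} = tau_{t-1,k} - tau_{t,t} tau_{t-1,t-k}, for k < t
    for (k=1; k<t; k++) tau[t,k] = tau[t-1,k] - tau[t,t]*tau[t-1,t-k]

    // nu_t = nu_{t-1} (1 - tau_{t,t}^2)
    nu[t+1] = nu[t]*(1 - tau[t,t]^2)
}

nu
end
```

For this AR(1) input, the first element of nu is the process variance and the later elements settle at the innovation variance, because one lag already carries all the predictive information. With long-memory autocovariances, the variances decline much more gradually.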


References
Beran, J. 1994. Statistics for Long-Memory Processes. Boca Raton: Chapman & Hall/CRC.
Drukker, D. M. 2006. Importing Federal Reserve economic data. Stata Journal 6: 384-386.
Golub, G. H., and C. F. Van Loan. 1996. Matrix Computations. 3rd ed. Baltimore: Johns Hopkins University Press.
Palma, W. 2007. Long-Memory Time Series: Theory and Methods. Hoboken, NJ: Wiley.

Also see
[TS] arfima Autoregressive fractionally integrated moving-average models
[TS] estat acplot Plot parametric autocorrelation and autocovariance functions
[TS] irf Create and analyze IRFs, dynamic-multiplier functions, and FEVDs
[TS] psdensity Parametric spectral density estimation after arima, arfima, and ucm
[U] 20 Estimation and postestimation commands

Title
arima ARIMA, ARMAX, and other dynamic regression models
Syntax                  Menu                Description             Options
Remarks and examples    Stored results      Methods and formulas    References
Also see

Syntax
Basic syntax for a regression model with ARMA disturbances
    arima depvar [indepvars], ar(numlist) ma(numlist)
Basic syntax for an ARIMA(p, d, q) model
    arima depvar, arima(#p,#d,#q)
Basic syntax for a multiplicative seasonal ARIMA(p, d, q) × (P, D, Q)s model
    arima depvar, arima(#p,#d,#q) sarima(#P,#D,#Q,#s)
Full syntax
    arima depvar [indepvars] [if] [in] [weight] [, options]

    options                     Description
    Model
      noconstant                suppress constant term
      arima(#p,#d,#q)           specify ARIMA(p, d, q) model for dependent variable
      ar(numlist)               autoregressive terms of the structural model disturbance
      ma(numlist)               moving-average terms of the structural model disturbance
      constraints(constraints)  apply specified linear constraints
      collinear                 keep collinear variables
    Model 2
      sarima(#P,#D,#Q,#s)       specify period-#s multiplicative seasonal ARIMA term
      mar(numlist, #s)          multiplicative seasonal autoregressive term; may be repeated
      mma(numlist, #s)          multiplicative seasonal moving-average term; may be repeated
    Model 3
      condition                 use conditional MLE instead of full MLE
      savespace                 conserve memory during estimation
      diffuse                   use diffuse prior for starting Kalman filter recursions
      p0(# | matname)           use alternate prior for starting Kalman recursions; seldom used
      state0(# | matname)       use alternate state vector for starting Kalman filter recursions
    SE/Robust
      vce(vcetype)              vcetype may be opg, robust, or oim
    Reporting
      level(#)                  set confidence level; default is level(95)
      detail                    report list of gaps in time series
      nocnsreport               do not display constraints
      display_options           control column formats, row spacing, and line width
    Maximization
      maximize_options          control the maximization process; seldom used

      coeflegend                display legend instead of statistics

You must tsset your data before using arima; see [TS] tsset.
depvar and indepvars may contain time-series operators; see [U] 11.4.4 Time-series varlists.
by, fp, rolling, statsby, and xi are allowed; see [U] 11.1.10 Prefix commands.
iweights are allowed; see [U] 11.1.6 weight.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.

Menu
Statistics > Time series > ARIMA and ARMAX models

Description
arima fits univariate models with time-dependent disturbances. arima fits a model of depvar on
indepvars where the disturbances are allowed to follow a linear autoregressive moving-average (ARMA)
specification. The dependent and independent variables may be differenced or seasonally differenced
to any degree. When independent variables are included in the specification, such models are often
called ARMAX models; and when independent variables are not specified, they reduce to Box-Jenkins
autoregressive integrated moving-average (ARIMA) models in the dependent variable. Multiplicative
seasonal ARMAX and ARIMA models can also be fit. Missing data are allowed and are handled using
the Kalman filter and methods suggested by Harvey (1989 and 1993); see Methods and formulas.
In the full syntax, depvar is the variable being modeled, and the structural or regression part of
the model is specified in indepvars. ar() and ma() specify the lags of autoregressive and
moving-average terms, respectively; and mar() and mma() specify the multiplicative seasonal autoregressive
and moving-average terms, respectively.
arima allows time-series operators in the dependent variable and independent variable lists, and
making extensive use of these operators is often convenient; see [U] 11.4.4 Time-series varlists and
[U] 13.9 Time-series operators for an extended discussion of time-series operators.
arima typed without arguments redisplays the previous estimates.

Options


Model

noconstant; see [R] estimation options.


arima(#p,#d,#q) is an alternative, shorthand notation for specifying models with ARMA disturbances.
The dependent variable and any independent variables are differenced #d times, and 1 through #p
lags of autocorrelations and 1 through #q lags of moving averages are included in the model. For
example, the specification
. arima D.y, ar(1/2) ma(1/3)

is equivalent to
. arima y, arima(2,1,3)

The latter is easier to write for simple ARMAX and ARIMA models, but if gaps in the AR or MA
lags are to be modeled, or if different operators are to be applied to independent variables, the
first syntax is required.
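
The equivalence can be tried on real data. The sketch below reuses the tb1yr dataset from the [TS] arfima postestimation examples purely as a convenient monthly series; the specifications are arbitrary illustrations rather than recommended models, and the last one shows a gap in the AR lags, which the shorthand cannot express.

```stata
use http://www.stata-press.com/data/r13/tb1yr, clear

* Shorthand and explicit forms of the same ARIMA(1,1,1) specification
arima tb1yr, arima(1,1,1)
arima D.tb1yr, ar(1) ma(1)

* AR lags 1 and 3 only (no lag 2), which requires the explicit form
arima D.tb1yr, ar(1 3) ma(1)
```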
ar(numlist) specifies the autoregressive terms of the structural model disturbance to be included in
the model. For example, ar(1/3) specifies that lags of 1, 2, and 3 of the structural disturbance
be included in the model; ar(1 4) specifies that lags 1 and 4 be included, perhaps to account for
additive quarterly effects.
If the model does not contain regressors, these terms can also be considered autoregressive terms
for the dependent variable.
ma(numlist) specifies the moving-average terms to be included in the model. These are the terms for
the lagged innovations (white-noise disturbances).
constraints(constraints), collinear; see [R] estimation options.
If constraints are placed between structural model parameters and ARMA terms, the first few
iterations may attempt steps into nonstationary areas. This process can be ignored if the final
solution is well within the bounds of stationary solutions.

Model 2

sarima(#P,#D,#Q,#s) is an alternative, shorthand notation for specifying the multiplicative seasonal
components of models with ARMA disturbances. The dependent variable and any independent
variables are lag-#s seasonally differenced #D times, and 1 through #P seasonal lags of autoregressive
terms and 1 through #Q seasonal lags of moving-average terms are included in the model. For
example, the specification
. arima DS12.y, ar(1/2) ma(1/3) mar(1/2,12) mma(1/2,12)

is equivalent to
. arima y, arima(2,1,3) sarima(2,1,2,12)

mar(numlist, #s) specifies the lag-#s multiplicative seasonal autoregressive terms. For example,
mar(1/2,12) requests that the first two lag-12 multiplicative seasonal autoregressive terms be
included in the model.
mma(numlist, #s) specifies the lag-#s multiplicative seasonal moving-average terms. For example,
mma(1 3,12) requests that the first and third (but not the second) lag-12 multiplicative seasonal
moving-average terms be included in the model.
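
The seasonal options can be illustrated in the same way. The sketch below again borrows the monthly tb1yr series from the [TS] arfima postestimation examples; nothing suggests that this series needs seasonal terms, so the specifications only demonstrate the syntax and are not guaranteed to converge quickly.

```stata
use http://www.stata-press.com/data/r13/tb1yr, clear

* Shorthand: ARIMA(1,1,1) with one lag-12 seasonal AR term and one seasonal MA term
arima tb1yr, arima(1,1,1) sarima(1,0,1,12)

* Explicit form of the same model
arima D.tb1yr, ar(1) ma(1) mar(1,12) mma(1,12)

* First and third (but not second) lag-12 seasonal moving-average terms
arima D.tb1yr, ar(1) mma(1 3,12)
```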

Model 3

condition specifies that conditional, rather than full, maximum likelihood estimates be produced.
The presample values for $\epsilon_t$ and $\mu_t$ are taken to be their expected value of zero, and the estimate
of the variance of $\epsilon_t$ is taken to be constant over the entire sample; see Hamilton (1994, 132).
This estimation method is not appropriate for nonstationary series but may be preferable for long
series or for models that have one or more long AR or MA lags. diffuse, p0(), and state0()
have no meaning for models fit from the conditional likelihood and may not be specified with
condition.


If the series is long and stationary and the underlying data-generating process does not have a long
memory, estimates will be similar, whether estimated by unconditional maximum likelihood (the
default), conditional maximum likelihood (condition), or maximum likelihood from a diffuse
prior (diffuse).
In small samples, however, results of conditional and unconditional maximum likelihood may
differ substantially; see Ansley and Newbold (1980). Whereas the default unconditional maximum
likelihood estimates make the most use of sample information when all the assumptions of the model
are met, Harvey (1989) and Ansley and Kohn (1985) argue for diffuse priors often, particularly in
ARIMA models corresponding to an underlying structural model.
The condition or diffuse options may also be preferred when the model contains one or more
long AR or MA lags; this avoids inverting potentially large matrices (see diffuse below).
When condition is specified, estimation is performed by the arch command (see [TS] arch),
and more control of the estimation process can be obtained using arch directly.
condition cannot be specified if the model contains any multiplicative seasonal terms.
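
One way to judge how much the choice matters for a given series is to fit the same model both ways and compare. The following sketch uses the tb1yr dataset from the [TS] arfima postestimation examples with an arbitrary specification; full and cond are hypothetical names for the stored estimates.

```stata
use http://www.stata-press.com/data/r13/tb1yr, clear

* Default: unconditional (full) maximum likelihood
arima D.tb1yr, ar(1) ma(1)
estimates store full

* Conditional maximum likelihood
arima D.tb1yr, ar(1) ma(1) condition
estimates store cond

* Side-by-side comparison of coefficients and standard errors
estimates table full cond, b se
```

With roughly 500 monthly observations, the discussion above suggests that the two sets of estimates should be similar.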
savespace specifies that memory use be conserved by retaining only those variables required for
estimation. The original dataset is restored after estimation. This option is rarely used and should
be used only if there is not enough space to fit a model without the option. However, arima
requires considerably more temporary storage during estimation than most estimation commands
in Stata.
diffuse specifies that a diffuse prior (see Harvey 1989 or 1993) be used as a starting point for the
Kalman filter recursions. Using diffuse, nonstationary models may be fit with arima (see the
p0() option below; diffuse is equivalent to specifying p0(1e9)).
By default, arima uses the unconditional expected value of the state vector $\xi_t$ (see Methods and
formulas) and the mean squared error (MSE) of the state vector to initialize the filter. When the
process is stationary, this corresponds to the expected value and expected variance of a random draw
from the state vector and produces unconditional maximum likelihood estimates of the parameters.
When the process is not stationary, however, this default is not appropriate, and the unconditional
MSE cannot be computed. For a nonstationary process, another starting point must be used for the
recursions.
In the absence of nonsample or presample information, diffuse may be specified to start the
recursions from a state vector of zero and a state MSE matrix corresponding to an effectively
infinite variance on this initial state. This method amounts to an uninformative and improper prior
that is updated to a proper MSE as data from the sample become available; see Harvey (1989).
Nonstationary models may also correspond to models with infinite variance given a particular
specification. This and other problems with nonstationary series make convergence difficult and
sometimes impossible.
diffuse can also be useful if a model contains one or more long AR or MA lags. Computation
of the unconditional MSE of the state vector (see Methods and formulas) requires construction
and inversion of a square matrix that is of dimension $\{\max(p, q + 1)\}^2$, where $p$ and $q$ are the
maximum AR and MA lags, respectively. If q = 27, for example, we would require a 784-by-784
matrix. Estimation with diffuse does not require this matrix.
For large samples, there is little difference between using the default starting point and the diffuse
starting point. Unless the series has a long memory, the initial conditions affect the likelihood of
only the first few observations.
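
The matrix dimension quoted above follows directly from the formula. In the sketch below, q = 27 comes from the text, whereas p = 2 is an assumption (any p of 28 or less gives the same answer); the model fit with diffuse afterward is an arbitrary illustration on the tb1yr dataset from the [TS] arfima postestimation examples.

```stata
* Dimension of the square matrix needed for the unconditional state MSE
scalar p = 2        // assumed maximum AR lag
scalar q = 27       // maximum MA lag from the example in the text
display "rows and columns: " max(scalar(p), scalar(q) + 1)^2

* Estimation with a diffuse prior does not require that matrix
use http://www.stata-press.com/data/r13/tb1yr, clear
arima D.tb1yr, ar(1) ma(1) diffuse
```

The display command reproduces the 784 cited in the text.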


p0(# | matname) is a rarely specified option that can be used for nonstationary series or when an
alternate prior for starting the Kalman recursions is desired (see diffuse above for a discussion
of the default starting point and Methods and formulas for background).
matname specifies a matrix to be used as the MSE of the state vector for starting the Kalman filter
recursions, $\mathbf{P}_{1|0}$. Instead, one number, #, may be supplied, and the MSE of the initial state vector
$\mathbf{P}_{1|0}$ will have this number on its diagonal and all off-diagonal values set to zero.
This option may be used with nonstationary series to specify a larger or smaller diagonal for $\mathbf{P}_{1|0}$
than that supplied by diffuse. It may also be used with state0() when you believe that you
have a better prior for the initial state vector and its MSE.
state0(# | matname) is a rarely used option that specifies an alternate initial state vector, $\boldsymbol{\xi}_{1|0}$ (see
Methods and formulas), for starting the Kalman filter recursions. If # is specified, all elements of
the vector are taken to be #. The default initial state vector is state0(0).
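
The relationship between diffuse, p0(), and state0() can be seen by fitting one model under different starting values for the Kalman filter. The sketch below uses the tb1yr dataset from the [TS] arfima postestimation examples; the specification and the value 100 are arbitrary assumptions, and convergence is not guaranteed for every choice.

```stata
use http://www.stata-press.com/data/r13/tb1yr, clear

* Per the description of diffuse above, these two calls request the same prior
arima tb1yr, ar(1) ma(1) diffuse
arima tb1yr, ar(1) ma(1) p0(1e9)

* A much smaller diagonal for the initial state MSE, with a zero initial state
arima tb1yr, ar(1) ma(1) state0(0) p0(100)
```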

SE/Robust

vce(vcetype) specifies the type of standard error reported, which includes types that are robust to
some kinds of misspecification (robust) and that are derived from asymptotic theory (oim, opg);
see [R] vce option.
For state-space models in general and ARMAX and ARIMA models in particular, the robust or
quasimaximum likelihood estimates (QMLEs) of variance are robust to symmetric nonnormality
in the disturbances, including, as a special case, heteroskedasticity. The robust variance estimates
are not generally robust to functional misspecification of the structural or ARMA components of
the model; see Hamilton (1994, 389) for a brief discussion.

Reporting

level(#); see [R] estimation options.


detail specifies that a detailed list of any gaps in the series be reported, including gaps due to
missing observations or missing data for the dependent variable or independent variables.
nocnsreport; see [R] estimation options.
display_options: vsquish, cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch;
see [R] estimation options.

Maximization

 
maximize_options: difficult, technique(algorithm_spec), iterate(#), [no]log, trace,
gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#),
nrtolerance(#), gtolerance(#), nonrtolerance, and from(init_specs); see [R] maximize for all options except gtolerance(), and see below for information on gtolerance().
These options are sometimes more important for ARIMA models than most maximum likelihood
models because of potential convergence problems with ARIMA models, particularly if the specified
model and the sample data imply a nonstationary model.
Several alternate optimization methods, such as Berndt-Hall-Hall-Hausman (BHHH) and Broyden-Fletcher-Goldfarb-Shanno
(BFGS), are provided for ARIMA models. Although ARIMA models are
not as difficult to optimize as ARCH models, their likelihoods are nevertheless generally not quadratic
and often pose optimization difficulties; this is particularly true if a model is nonstationary or
nearly nonstationary. Because each method approaches optimization differently, some problems
can be successfully optimized by an alternate method when one method fails.
Setting technique() to something other than the default or BHHH changes the vcetype to vce(oim).


The following options are all related to maximization and are either particularly important in fitting
ARIMA models or not available for most other estimators.
technique(algorithm_spec) specifies the optimization technique to use to maximize the
likelihood function.
technique(bhhh) specifies the Berndt-Hall-Hall-Hausman (BHHH) algorithm.
technique(dfp) specifies the Davidon-Fletcher-Powell (DFP) algorithm.
technique(bfgs) specifies the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm.
technique(nr) specifies Stata's modified Newton-Raphson (NR) algorithm.
You can specify multiple optimization methods. For example,
technique(bhhh 10 nr 20)
requests that the optimizer perform 10 BHHH iterations, switch to Newton-Raphson for 20
iterations, switch back to BHHH for 10 more iterations, and so on.
The default for arima is technique(bhhh 5 bfgs 10).
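The technique() syntax described above might be used as follows. This is a sketch that borrows the dataset and model from Example 2 of this entry; the iteration counts are simply those quoted in the text, not a recommendation.

```stata
use http://www.stata-press.com/data/r13/wpi1, clear
* Alternate between 10 BHHH iterations and 20 Newton-Raphson iterations
arima D.ln_wpi, ar(1) ma(1 4) technique(bhhh 10 nr 20)
```

Because this specification is neither the default nor pure BHHH, the note above implies that the reported standard errors would be of type vce(oim) rather than OPG.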
gtolerance(#) specifies the tolerance for the gradient relative to the coefficients. When
$|g_i b_i| \le$ gtolerance() for all parameters $b_i$ and the corresponding elements of the
gradient $g_i$, the gradient tolerance criterion is met. The default gradient tolerance for arima
is gtolerance(.05).
gtolerance(999) may be specified to disable the gradient criterion. If the optimizer becomes
stuck with repeated "(backed up)" messages, the gradient probably still contains substantial
values, but an uphill direction cannot be found for the likelihood. With this option, results can
often be obtained, but whether the global maximum likelihood has been found is unclear.
When the maximization is not going well, it is also possible to set the maximum number of
iterations (see [R] maximize) to the point where the optimizer appears to be stuck and to inspect
the estimation results at that point.
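The two troubleshooting devices just described can be sketched as follows, again using the Example 2 model from this entry. The iteration limit of 5 is an arbitrary assumption standing in for "the point where the optimizer appears to be stuck".

```stata
use http://www.stata-press.com/data/r13/wpi1, clear
* Disable the gradient criterion so convergence is judged by the other tolerances
arima D.ln_wpi, ar(1) ma(1 4) gtolerance(999)
* Stop after at most 5 iterations and inspect the estimates at that point
arima D.ln_wpi, ar(1) ma(1 4) iterate(5)
```

Results obtained either way should be read with caution, because neither run establishes that the global maximum has been reached.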
from(init_specs) allows you to set the starting values of the model coefficients; see [R] maximize
for a general discussion and syntax options.
The standard syntax for from() accepts a matrix, a list of values, or coefficient name-value
pairs; see [R] maximize. arima also accepts from(armab0), which sets the starting value for
all ARMA parameters in the model to zero prior to optimization.
ARIMA models may be sensitive to initial conditions and may have coefficient values that
correspond to local maximums. The default starting values for arima are generally good,
particularly in large samples for stationary series.
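To see whether a fit is sensitive to starting values, one might refit with all ARMA parameters started at zero and compare log likelihoods. A minimal sketch, assuming the Example 2 model from this entry:

```stata
use http://www.stata-press.com/data/r13/wpi1, clear
* Default starting values
arima D.ln_wpi, ar(1) ma(1 4)
display e(ll)
* ARMA parameters started at zero
arima D.ln_wpi, ar(1) ma(1 4) from(armab0)
display e(ll)
```

Matching log likelihoods from the two runs are reassuring but do not by themselves rule out other local maximums.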

The following option is available with arima but is not shown in the dialog box:
coeflegend; see [R] estimation options.


Remarks and examples


Remarks are presented under the following headings:
Introduction
ARIMA models
Multiplicative seasonal ARIMA models
ARMAX models
Dynamic forecasting
Video example

Introduction
arima fits both standard ARIMA models that are autoregressive in the dependent variable and
structural models with ARMA disturbances. Good introductions to the former models can be found in
Box, Jenkins, and Reinsel (2008); Hamilton (1994); Harvey (1993); Newton (1988); Diggle (1990);
and many others. The latter models are developed fully in Hamilton (1994) and Harvey (1989), both of
which provide extensive treatment of the Kalman filter (Kalman 1960) and the state-space form used
by arima to fit the models. Becketti (2013) discusses ARIMA models and Stata's arima command,
and he devotes an entire chapter to explaining how the principles of ARIMA models are applied to real
datasets in practice.
Consider a first-order autoregressive moving-average process. Then arima estimates all the parameters in the model

$$y_t = \mathbf{x}_t\boldsymbol{\beta} + \mu_t \qquad \text{structural equation}$$
$$\mu_t = \rho\mu_{t-1} + \theta\epsilon_{t-1} + \epsilon_t \qquad \text{disturbance, ARMA}(1,1)$$

where

$\rho$ is the first-order autocorrelation parameter
$\theta$ is the first-order moving-average parameter
$\epsilon_t \sim$ i.i.d. $N(0, \sigma^2)$, meaning that $\epsilon_t$ is a white-noise disturbance

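You can combine the two equations and write a general ARMA($p, q$) in the disturbances process as

$$y_t = \mathbf{x}_t\boldsymbol{\beta} + \rho_1(y_{t-1} - \mathbf{x}_{t-1}\boldsymbol{\beta}) + \rho_2(y_{t-2} - \mathbf{x}_{t-2}\boldsymbol{\beta}) + \cdots + \rho_p(y_{t-p} - \mathbf{x}_{t-p}\boldsymbol{\beta}) + \theta_1\epsilon_{t-1} + \theta_2\epsilon_{t-2} + \cdots + \theta_q\epsilon_{t-q} + \epsilon_t$$

It is also common to write the general form of the ARMA model more succinctly using lag operator
notation as

$$\rho(L^p)(y_t - \mathbf{x}_t\boldsymbol{\beta}) = \theta(L^q)\epsilon_t \qquad \text{ARMA}(p, q)$$

where

$$\rho(L^p) = 1 - \rho_1 L - \rho_2 L^2 - \cdots - \rho_p L^p$$
$$\theta(L^q) = 1 + \theta_1 L + \theta_2 L^2 + \cdots + \theta_q L^q$$

and $L^j y_t = y_{t-j}$.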
For stationary series, full or unconditional maximum likelihood estimates are obtained via the
Kalman filter. For nonstationary series, if some prior information is available, you can specify initial
values for the filter by using state0() and p0() as suggested by Hamilton (1994) or assume an
uninformative prior by using the diffuse option as suggested by Harvey (1989).


ARIMA models
Pure ARIMA models without a structural component do not have regressors and are often written
as autoregressions in the dependent variable, rather than autoregressions in the disturbances from a
structural equation. For example, an ARMA(1, 1) model can be written as

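$$y_t = \alpha + \rho y_{t-1} + \theta\epsilon_{t-1} + \epsilon_t \tag{1a}$$

Other than a scale factor for the constant term $\alpha$, these models are equivalent to the ARMA in the
disturbances formulation estimated by arima, though the latter are more flexible and allow a wider
class of models.
To see this effect, replace $\mathbf{x}_t\boldsymbol{\beta}$ in the structural equation above with a constant term $\beta_0$ so that

$$\begin{aligned}
y_t &= \beta_0 + \mu_t \\
    &= \beta_0 + \rho\mu_{t-1} + \theta\epsilon_{t-1} + \epsilon_t \\
    &= \beta_0 + \rho(y_{t-1} - \beta_0) + \theta\epsilon_{t-1} + \epsilon_t \\
    &= (1 - \rho)\beta_0 + \rho y_{t-1} + \theta\epsilon_{t-1} + \epsilon_t
\end{aligned} \tag{1b}$$

Equations (1a) and (1b) are equivalent, with $\alpha = (1 - \rho)\beta_0$, so whether we consider an ARIMA model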
as autoregressive in the dependent variable or disturbances is immaterial. Our illustration can easily
be extended from the ARMA(1, 1) case to the general ARIMA(p, d, q) case.

Example 1: ARIMA model


Enders (2004, 87-93) considers an ARIMA model of the U.S. Wholesale Price Index (WPI)
using quarterly data over the period 1960q1 through 1990q4. The simplest ARIMA model that includes
differencing and both autoregressive and moving-average components is the ARIMA(1,1,1) specification.
We can fit this model with arima by typing



. use http://www.stata-press.com/data/r13/wpi1
. arima wpi, arima(1,1,1)
(setting optimization to BHHH)
Iteration 0:   log likelihood = -139.80133
Iteration 1:   log likelihood =  -135.6278
Iteration 2:   log likelihood = -135.41838
Iteration 3:   log likelihood = -135.36691
Iteration 4:   log likelihood = -135.35892
(switching optimization to BFGS)
Iteration 5:   log likelihood = -135.35471
Iteration 6:   log likelihood = -135.35135
Iteration 7:   log likelihood = -135.35132
Iteration 8:   log likelihood = -135.35131
ARIMA regression
Sample:  1960q2 - 1990q4                        Number of obs      =       123
                                                Wald chi2(2)       =    310.64
Log likelihood = -135.3513                      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
             |                 OPG
       D.wpi |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
wpi          |
       _cons |   .7498197   .3340968     2.24   0.025     .0950019    1.404637
-------------+----------------------------------------------------------------
ARMA         |
          ar |
         L1. |   .8742288   .0545435    16.03   0.000     .7673256     .981132
             |
          ma |
         L1. |  -.4120458   .1000284    -4.12   0.000    -.6080979   -.2159938
-------------+----------------------------------------------------------------
      /sigma |   .7250436   .0368065    19.70   0.000     .6529042    .7971829
------------------------------------------------------------------------------
Note: The test of the variance against zero is one sided, and the two-sided
      confidence interval is truncated at zero.

Examining the estimation results, we see that the AR(1) coefficient is 0.874, the MA(1) coefficient
is -0.412, and both are highly significant. The estimated standard deviation of the white-noise
disturbance $\epsilon$ is 0.725.
This model also could have been fit by typing
. arima D.wpi, ar(1) ma(1)

The D. placed in front of the dependent variable wpi is the Stata time-series operator for differencing.
Thus we would be modeling the first difference in WPI from the second quarter of 1960 through
the fourth quarter of 1990; the first observation is lost to differencing. This second
syntax allows a richer choice of models.
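To connect these estimates to the equivalence between (1a) and (1b), the intercept of the autoregression in the dependent variable can be recovered from the reported constant and AR(1) coefficient. The following is a worked sketch using the rounded estimates from the output above, with $y_t$ taken to be the differenced series D.wpi:

```latex
\hat{\alpha} = (1 - \hat{\rho})\,\hat{\beta}_0
             = (1 - 0.8742)(0.7498)
             = (0.1258)(0.7498)
             \approx 0.094
```

So the reported constant of about 0.750 is the estimated mean of the differenced series, whereas the intercept in form (1a) would be about 0.094.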

Example 2: ARIMA model with additive seasonal effects


After examining first-differences of WPI, Enders chose a model of differences in the natural
logarithms to stabilize the variance in the differenced series. The raw data and first-difference of the
logarithms are graphed below.

[Figure: two time-series plots over 1960q1 to 1990q1. Panel titles: "US Wholesale Price Index" and "US Wholesale Price Index, difference of logs".]

On the basis of the autocorrelations, partial autocorrelations (see graphs below), and the results of
preliminary estimations, Enders identified an ARMA model in the log-differenced series.

. ac D.ln_wpi, ylabels(-.4(.2).6)
. pac D.ln_wpi, ylabels(-.4(.2).6)

[Figure: two panels plotted against lag (0 to 40). Panel titles: "Autocorrelations of D.ln_wpi", with Bartlett's formula for MA(q) 95% confidence bands, and "Partial autocorrelations of D.ln_wpi", with 95% confidence bands [se = 1/sqrt(n)].]

In addition to an autoregressive term and an MA(1) term, an MA(4) term is included to account
for a remaining quarterly effect. Thus the model to be fit is

$$\Delta\ln(\text{wpi}_t) = \beta_0 + \rho_1\{\Delta\ln(\text{wpi}_{t-1}) - \beta_0\} + \theta_1\epsilon_{t-1} + \theta_4\epsilon_{t-4} + \epsilon_t$$


We can fit this model with arima and Stata's standard difference operator:
. arima D.ln_wpi, ar(1) ma(1 4)
(setting optimization to BHHH)
Iteration 0:   log likelihood =  382.67447
Iteration 1:   log likelihood =  384.80754
Iteration 2:   log likelihood =  384.84749
Iteration 3:   log likelihood =  385.39213
Iteration 4:   log likelihood =  385.40983
(switching optimization to BFGS)
Iteration 5:   log likelihood =   385.9021
Iteration 6:   log likelihood =  385.95646
Iteration 7:   log likelihood =  386.02979
Iteration 8:   log likelihood =  386.03326
Iteration 9:   log likelihood =  386.03354
Iteration 10:  log likelihood =  386.03357
ARIMA regression
Sample:  1960q2 - 1990q4                        Number of obs      =       123
                                                Wald chi2(3)       =    333.60
Log likelihood =  386.0336                      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
             |                 OPG
    D.ln_wpi |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ln_wpi       |
       _cons |   .0110493   .0048349     2.29   0.022     .0015731    .0205255
-------------+----------------------------------------------------------------
ARMA         |
          ar |
         L1. |   .7806991   .0944946     8.26   0.000     .5954931     .965905
             |
          ma |
         L1. |  -.3990039   .1258753    -3.17   0.002    -.6457149   -.1522928
         L4. |   .3090813   .1200945     2.57   0.010     .0737003    .5444622
-------------+----------------------------------------------------------------
      /sigma |   .0104394   .0004702    22.20   0.000     .0095178    .0113609
------------------------------------------------------------------------------
Note: The test of the variance against zero is one sided, and the two-sided
      confidence interval is truncated at zero.

In this final specification, the log-differenced series is still highly autocorrelated at a level of 0.781,
though innovations have a negative impact in the ensuing quarter (-0.399) and a positive seasonal
impact of 0.309 in the following year.

Technical note
In one way, the results differ from most of Stata's estimation commands: the standard error of
the coefficients is reported as OPG Std. Err. The default standard errors and covariance matrix
for arima estimates are derived from the outer product of gradients (OPG). This is one of three
asymptotically equivalent methods of estimating the covariance matrix of the coefficients (only two of
which are usually tractable to derive). Discussions and derivations of all three estimates can be found
in Davidson and MacKinnon (1993), Greene (2012), and Hamilton (1994). Bollerslev, Engle, and
Nelson (1994) suggest that the OPG estimates are more numerically stable in time-series regressions
when the likelihood and its derivatives depend on recursive computations, which is certainly the case
for the Kalman filter. To date, we have found no numerical instabilities in either estimate of the
covariance matrix, subject to the stability and convergence of the overall model.


Most of Stata's estimation commands provide covariance estimates derived from the Hessian of
the likelihood function. These alternate estimates can also be obtained from arima by specifying the
vce(oim) option.
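One way to compare the two covariance estimates side by side is to store each fit and tabulate the standard errors. This is a sketch assuming the Example 2 model; the names opg and oim are arbitrary labels for the stored results.

```stata
use http://www.stata-press.com/data/r13/wpi1, clear
* Default: outer-product-of-gradients (OPG) standard errors
arima D.ln_wpi, ar(1) ma(1 4)
estimates store opg
* Hessian-based (observed information matrix) standard errors
arima D.ln_wpi, ar(1) ma(1 4) vce(oim)
estimates store oim
* Coefficients agree by construction; only the standard errors differ
estimates table opg oim, se
```

Because the two estimators are only asymptotically equivalent, some difference in the reported standard errors is to be expected in a sample of this size.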

Multiplicative seasonal ARIMA models


Many time series exhibit a periodic seasonal component, and a seasonal ARIMA model, often
abbreviated SARIMA, can then be used. For example, monthly sales data for air conditioners have a
strong seasonal component, with sales high in the summer months and low in the winter months.
In the previous example, we accounted for quarterly effects by fitting the model

$$(1 - \rho_1 L)\{\Delta\ln(\text{wpi}_t) - \beta_0\} = (1 + \theta_1 L + \theta_4 L^4)\epsilon_t$$

This is an additive seasonal ARIMA model, in the sense that the first- and fourth-order MA terms work
additively: $(1 + \theta_1 L + \theta_4 L^4)$.
Another way to handle the quarterly effect would be to fit a multiplicative seasonal ARIMA model.
A multiplicative SARIMA model of order $(1, 1, 1) \times (0, 0, 1)_4$ for the $\ln(\text{wpi}_t)$ series is

$$(1 - \rho_1 L)\{\Delta\ln(\text{wpi}_t) - \beta_0\} = (1 + \theta_1 L)(1 + \theta_{4,1} L^4)\epsilon_t$$

or, upon expanding terms,

$$\Delta\ln(\text{wpi}_t) = \beta_0 + \rho_1\{\Delta\ln(\text{wpi}_{t-1}) - \beta_0\} + \theta_1\epsilon_{t-1} + \theta_{4,1}\epsilon_{t-4} + \theta_1\theta_{4,1}\epsilon_{t-5} + \epsilon_t \tag{2}$$

In the notation $(1, 1, 1) \times (0, 0, 1)_4$, the $(1, 1, 1)$ means that there is one nonseasonal autoregressive
term $(1 - \rho_1 L)$ and one nonseasonal moving-average term $(1 + \theta_1 L)$ and that the time series is
first-differenced one time. The $(0, 0, 1)_4$ indicates that there is no lag-4 seasonal autoregressive term,
that there is one lag-4 seasonal moving-average term $(1 + \theta_{4,1} L^4)$, and that the series is seasonally
differenced zero times. This is known as a multiplicative SARIMA model because the nonseasonal
and seasonal factors work multiplicatively: $(1 + \theta_1 L)(1 + \theta_{4,1} L^4)$. Multiplying the terms imposes
nonlinear constraints on the parameters of the fifth-order lagged values; arima imposes these constraints
automatically.
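To further clarify the notation, consider a $(2, 1, 1) \times (1, 1, 2)_4$ multiplicative SARIMA model:

$$(1 - \rho_1 L - \rho_2 L^2)(1 - \rho_{4,1} L^4)\Delta\Delta_4 z_t = (1 + \theta_1 L)(1 + \theta_{4,1} L^4 + \theta_{4,2} L^8)\epsilon_t \tag{3}$$

where $\Delta$ denotes the difference operator $\Delta y_t = y_t - y_{t-1}$ and $\Delta_s$ denotes the lag-$s$ seasonal
difference operator $\Delta_s y_t = y_t - y_{t-s}$. Expanding (3), we have

$$\tilde{z}_t = \rho_1\tilde{z}_{t-1} + \rho_2\tilde{z}_{t-2} + \rho_{4,1}\tilde{z}_{t-4} - \rho_1\rho_{4,1}\tilde{z}_{t-5} - \rho_2\rho_{4,1}\tilde{z}_{t-6} + \theta_1\epsilon_{t-1} + \theta_{4,1}\epsilon_{t-4} + \theta_1\theta_{4,1}\epsilon_{t-5} + \theta_{4,2}\epsilon_{t-8} + \theta_1\theta_{4,2}\epsilon_{t-9} + \epsilon_t$$

where

$$\tilde{z}_t = \Delta\Delta_4 z_t = \Delta(z_t - z_{t-4}) = z_t - z_{t-1} - (z_{t-4} - z_{t-5})$$

and $z_t = y_t - \mathbf{x}_t\boldsymbol{\beta}$ if regressors are included in the model, $z_t = y_t - \beta_0$ if just a constant term is
included, and $z_t = y_t$ otherwise.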


More generally, a $(p, d, q) \times (P, D, Q)_s$ multiplicative SARIMA model is

$$\rho(L^p)\rho_s(L^P)\Delta^d\Delta_s^D z_t = \theta(L^q)\theta_s(L^Q)\epsilon_t$$

where

$$\rho_s(L^P) = (1 - \rho_{s,1}L^s - \rho_{s,2}L^{2s} - \cdots - \rho_{s,P}L^{Ps})$$
$$\theta_s(L^Q) = (1 + \theta_{s,1}L^s + \theta_{s,2}L^{2s} + \cdots + \theta_{s,Q}L^{Qs})$$

$\rho(L^p)$ and $\theta(L^q)$ were defined previously, $\Delta^d$ means apply the $\Delta$ operator $d$ times, and similarly
for $\Delta_s^D$. Typically, $d$ and $D$ will be 0 or 1; and $p$, $q$, $P$, and $Q$ will seldom be more than 2 or 3. $s$
will typically be 4 for quarterly data and 12 for monthly data. In fact, the model can be extended to
include both monthly and quarterly seasonal factors, as we explain below.
If a plot of the data suggests that the seasonal effect is proportional to the mean of the series, then
the seasonal effect is probably multiplicative and a multiplicative SARIMA model may be appropriate.
Box, Jenkins, and Reinsel (2008, sec. 9.3.1) suggest starting with a multiplicative SARIMA model with
any data that exhibit seasonal patterns and then exploring nonmultiplicative SARIMA models if the
multiplicative models do not fit the data well. On the other hand, Chatfield (2004, 14) suggests that
taking the logarithm of the series will make the seasonal effect additive, in which case an additive
SARIMA model as fit in the previous example would be appropriate. In short, the analyst should
probably try both additive and multiplicative SARIMA models to see which provides better fits and
forecasts.
Unless diffuse is used, arima must create square matrices of dimension $\{\max(p, q+1)\}^2$, where
$p$ and $q$ are the maximum AR and MA lags, respectively; and the inclusion of long seasonal terms can
make this dimension rather large. For example, with monthly data, you might fit a $(0, 1, 1) \times (0, 1, 2)_{12}$
SARIMA model. The maximum MA lag is $2 \times 12 + 1 = 25$, requiring a matrix with $26^2 = 676$ rows
and columns.
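The model just described could be requested with the diffuse option, which avoids building that matrix. The sketch below assumes the monthly airline passenger data used elsewhere in this entry; whether this particular specification converges or fits well on those data is not claimed here.

```stata
use http://www.stata-press.com/data/r13/air2, clear
generate lnair = ln(air)
* Maximum MA lag is 2*12 + 1 = 25, so the matrix would have (25 + 1)^2 rows
display (2*12 + 1 + 1)^2
* (0,1,1) x (0,1,2)_12 model started from a diffuse prior instead
arima lnair, arima(0,1,1) sarima(0,1,2,12) noconstant diffuse
```

The display command reports 676, matching the count given above.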

Example 3: Multiplicative SARIMA model


One of the most common multiplicative SARIMA specifications is the $(0, 1, 1) \times (0, 1, 1)_{12}$ "airline"
model of Box, Jenkins, and Reinsel (2008, sec. 9.2). The dataset airline.dta contains monthly
international airline passenger data from January 1949 through December 1960. After first- and
seasonally differencing the data, we do not suspect the presence of a trend component, so we use the
noconstant option with arima:

. use http://www.stata-press.com/data/r13/air2
(TIMESLAB: Airline passengers)
. generate lnair = ln(air)
. arima lnair, arima(0,1,1) sarima(0,1,1,12) noconstant
(setting optimization to BHHH)
Iteration 0:   log likelihood =   223.8437
Iteration 1:   log likelihood =  239.80405
 (output omitted )
Iteration 8:   log likelihood =  244.69651
ARIMA regression
Sample:  14 - 144                               Number of obs      =       131
                                                Wald chi2(2)       =     84.53
Log likelihood =  244.6965                      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
             |                 OPG
  DS12.lnair |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ARMA         |
          ma |
         L1. |  -.4018324   .0730307    -5.50   0.000    -.5449698   -.2586949
-------------+----------------------------------------------------------------
ARMA12       |
          ma |
         L1. |  -.5569342   .0963129    -5.78   0.000     -.745704   -.3681644
-------------+----------------------------------------------------------------
      /sigma |   .0367167   .0020132    18.24   0.000     .0327708    .0406625
------------------------------------------------------------------------------
Note: The test of the variance against zero is one sided, and the two-sided
      confidence interval is truncated at zero.

Thus our model of the monthly number of international airline passengers is

$$\Delta\Delta_{12}\text{lnair}_t = -0.402\epsilon_{t-1} - 0.557\epsilon_{t-12} + 0.224\epsilon_{t-13} + \epsilon_t \qquad \hat{\sigma} = 0.037$$

As with the fifth-order term in (2), the coefficient on $\epsilon_{t-13}$ is the product of the coefficients on the $\epsilon_{t-1}$ and $\epsilon_{t-12}$
terms ($0.224 \approx -0.402 \times -0.557$). arima labeled the dependent variable DS12.lnair to indicate
that it has applied the difference operator $\Delta$ and the lag-12 seasonal difference operator $\Delta_{12}$ to
lnair; see [U] 11.4.4 Time-series varlists for more information.
We could have fit this model by typing
. arima DS12.lnair, ma(1) mma(1, 12) noconstant

For simple multiplicative models, using the sarima() option is easier, though this second syntax
allows us to incorporate more complicated seasonal terms.

The mar() and mma() options can be repeated, allowing us to control for multiple seasonal
patterns. For example, we may have monthly sales data that exhibit a quarterly pattern as businesses
purchase our product at the beginning of calendar quarters when new funds are budgeted, and our
product is purchased more frequently in a few months of the year than in most others, even after we
control for quarterly fluctuations. Thus we might choose to fit the model

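$$(1 - \rho L)(1 - \rho_{4,1}L^4)(1 - \rho_{12,1}L^{12})(\Delta\Delta_4\Delta_{12}\text{sales}_t - \beta_0) = (1 + \theta L)(1 + \theta_{4,1}L^4)(1 + \theta_{12,1}L^{12})\epsilon_t$$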


Although this model looks rather complicated, estimating it using arima is straightforward:
. arima DS4S12.sales, ar(1) mar(1, 4) mar(1, 12) ma(1) mma(1, 4) mma(1, 12)

If we instead wanted to include two lags in the lag-4 seasonal AR term and the first and third (but
not the second) term in the lag-12 seasonal MA term, we would type
. arima DS4S12.sales, ar(1) mar(1 2, 4) mar(1, 12) ma(1) mma(1, 4) mma(1 3, 12)

However, models with multiple seasonal terms can be difficult to fit. Usually, one seasonal factor
with just one or two AR or MA terms is adequate.

ARMAX models
Thus far all our examples have been pure ARIMA models in which the dependent variable was
modeled solely as a function of its past values and disturbances. arima can also fit ARMAX models,
which model the dependent variable in terms of a linear combination of independent variables, as
well as an ARMA disturbance process. The prais command (see [TS] prais), for example, allows
you to control for only AR(1) disturbances, whereas arima allows you to control for a much richer
dynamic error structure. arima allows for both nonseasonal and seasonal ARMA components in the
disturbances.

Example 4: ARMAX model


For a simple example of a model including covariates, we can estimate an update of Friedman and
Meiselman's (1963) equation representing the quantity theory of money. They postulate a straightforward relationship between personal-consumption expenditures (consump) and the money supply
as measured by M2 (m2).

$$\text{consump}_t = \beta_0 + \beta_1\text{m2}_t + \mu_t$$
Friedman and Meiselman fit the model over a period ending in 1956; we will refit the model over
the period 1959q1 through 1981q4. We restrict our attention to the period prior to 1982 because the
Federal Reserve manipulated the money supply extensively in the later 1980s to control inflation, and
the relationship between consumption and the money supply becomes much more complex during
the later part of the decade.
To demonstrate arima, we will include both an autoregressive term and a moving-average term for
the disturbances in the model; the original estimates included neither. Thus we model the disturbance
of the structural equation as
$$\mu_t = \rho\mu_{t-1} + \theta\epsilon_{t-1} + \epsilon_t$$
As per the original authors, the relationship is estimated on seasonally adjusted data, so there is no
need to include seasonal effects explicitly. Obtaining seasonally unadjusted data and simultaneously
modeling the structural and seasonal effects might be preferable.
We will restrict the estimation to the desired sample by using the tin() function in an if
expression; see [D] functions. By leaving the first argument of tin() blank, we are including all
available data through the second date (1981q4). We fit the model by typing

. use http://www.stata-press.com/data/r13/friedman2, clear
. arima consump m2 if tin(, 1981q4), ar(1) ma(1)
(setting optimization to BHHH)
Iteration 0:   log likelihood = -344.67575
Iteration 1:   log likelihood = -341.57248
 (output omitted )
Iteration 10:  log likelihood = -340.50774
ARIMA regression
Sample:  1959q1 - 1981q4                        Number of obs      =        92
                                                Wald chi2(3)       =   4394.80
Log likelihood = -340.5077                      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
             |                 OPG
     consump |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
consump      |
          m2 |   1.122029   .0363563    30.86   0.000     1.050772    1.193286
       _cons |  -36.09872   56.56703    -0.64   0.523    -146.9681    74.77062
-------------+----------------------------------------------------------------
ARMA         |
          ar |
         L1. |   .9348486   .0411323    22.73   0.000     .8542308    1.015467
             |
          ma |
         L1. |   .3090592   .0885883     3.49   0.000     .1354293    .4826891
-------------+----------------------------------------------------------------
      /sigma |   9.655308   .5635157    17.13   0.000     8.550837    10.75978
------------------------------------------------------------------------------
Note: The test of the variance against zero is one sided, and the two-sided
      confidence interval is truncated at zero.

We find a relatively small money velocity with respect to consumption (1.122) over this period,
although consumption is only one facet of the income velocity. We also note a very large first-order
autocorrelation in the disturbances, as well as a statistically significant first-order moving average.
We might be concerned that our specification has led to disturbances that are heteroskedastic or
non-Gaussian. We refit the model by using the vce(robust) option.

. arima consump m2 if tin(, 1981q4), ar(1) ma(1) vce(robust)
(setting optimization to BHHH)
Iteration 0:   log pseudolikelihood = -344.67575
Iteration 1:   log pseudolikelihood = -341.57248
 (output omitted )
Iteration 10:  log pseudolikelihood = -340.50774
ARIMA regression
Sample:  1959q1 - 1981q4                        Number of obs      =        92
                                                Wald chi2(3)       =   1176.26
Log pseudolikelihood = -340.5077                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
             |             Semirobust
     consump |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
consump      |
          m2 |   1.122029   .0433302    25.89   0.000     1.037103    1.206954
       _cons |  -36.09872   28.10477    -1.28   0.199    -91.18306    18.98561
-------------+----------------------------------------------------------------
ARMA         |
          ar |
         L1. |   .9348486   .0493428    18.95   0.000     .8381385    1.031559
             |
          ma |
         L1. |   .3090592   .1605359     1.93   0.054    -.0055854    .6237038
-------------+----------------------------------------------------------------
      /sigma |   9.655308   1.082639     8.92   0.000     7.533375    11.77724
------------------------------------------------------------------------------
Note: The test of the variance against zero is one sided, and the two-sided
      confidence interval is truncated at zero.

We note a substantial increase in the estimated standard errors, and our once clearly significant
moving-average term is now only marginally significant.

Dynamic forecasting
Another feature of the arima command is the ability to use predict afterward to make dynamic
forecasts. Suppose that we wish to fit the regression model

$$y_t = \beta_0 + \beta_1 x_t + \rho y_{t-1} + \epsilon_t$$

by using a sample of data from $t = 1, \ldots, T$ and make forecasts beginning at time $f$.
If we use regress or prais to fit the model, then we can use predict to make one-step-ahead
forecasts. That is, predict will compute

$$\hat{y}_f = \hat{\beta}_0 + \hat{\beta}_1 x_f + \hat{\rho}\,y_{f-1}$$

Most importantly, here predict will use the actual value of $y$ at period $f - 1$ in computing the
forecast for time $f$. Thus, if we use regress or prais, we cannot make forecasts for any periods
beyond $f = T + 1$ unless we have observed values for $y$ for those periods.
If we instead fit our model with arima, then predict can produce dynamic forecasts by using
the Kalman filter. If we use the dynamic(f) option, then for period $f$ predict will compute

$$\hat{y}_f = \hat{\beta}_0 + \hat{\beta}_1 x_f + \hat{\rho}\,y_{f-1}$$

by using the observed value of $y_{f-1}$ just as predict after regress or prais. However, for period
$f + 1$ predict newvar, dynamic(f) will compute

$$\hat{y}_{f+1} = \hat{\beta}_0 + \hat{\beta}_1 x_{f+1} + \hat{\rho}\,\hat{y}_f$$

using the predicted value of $y_f$ instead of the observed value. Similarly, the period $f + 2$ forecast
will be

$$\hat{y}_{f+2} = \hat{\beta}_0 + \hat{\beta}_1 x_{f+2} + \hat{\rho}\,\hat{y}_{f+1}$$
Of course, because our model includes the regressor xt , we can make forecasts only through periods
for which we have observations on xt . However, for pure ARIMA models, we can compute dynamic
forecasts as far beyond the final period of our dataset as desired.
For more information on predict after arima, see [TS] arima postestimation.
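The difference between one-step-ahead and dynamic predictions can be sketched with the WPI data from Example 1. The holdout date 1988q4 and the new variable names wpi_onestep and wpi_dynamic are assumptions made for illustration; the y and dynamic() options are those documented in [TS] arima postestimation.

```stata
use http://www.stata-press.com/data/r13/wpi1, clear
* Fit the ARIMA(1,1,1) model on data through 1988q4 only
arima wpi if tin(, 1988q4), arima(1,1,1)
* One-step-ahead predictions of the level of wpi (observed lags used throughout)
predict wpi_onestep, y
* Dynamic predictions from 1989q1 on: after that date, predicted values
* take the place of observed wpi on the right-hand side
predict wpi_dynamic, y dynamic(tq(1989q1))
list wpi wpi_onestep wpi_dynamic if tin(1988q3, 1990q4)
```

The two prediction variables should agree through 1989q1, the first dynamic period, because that forecast still uses the observed 1988q4 value; they can diverge afterward.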

Video example
Time series, part 5: Introduction to ARMA/ARIMA models

Stored results
arima stores the following in e():
Scalars
    e(N)               number of observations
    e(N_gaps)          number of gaps
    e(k)               number of parameters
    e(k_eq)            number of equations in e(b)
    e(k_eq_model)      number of equations in overall model test
    e(k_dv)            number of dependent variables
    e(k1)              number of variables in first equation
    e(df_m)            model degrees of freedom
    e(ll)              log likelihood
    e(sigma)           sigma
    e(chi2)            $\chi^2$
    e(p)               significance
    e(tmin)            minimum time
    e(tmax)            maximum time
    e(ar_max)          maximum AR lag
    e(ma_max)          maximum MA lag
    e(rank)            rank of e(V)
    e(ic)              number of iterations
    e(rc)              return code
    e(converged)       1 if converged, 0 otherwise

Macros
    e(cmd)             arima
    e(cmdline)         command as typed
    e(depvar)          name of dependent variable
    e(covariates)      list of covariates
    e(eqnames)         names of equations
    e(wtype)           weight type
    e(wexp)            weight expression
    e(title)           title in estimation output
    e(tmins)           formatted minimum time
    e(tmaxs)           formatted maximum time
    e(chi2type)        Wald; type of model $\chi^2$ test
    e(vce)             vcetype specified in vce()
    e(vcetype)         title used to label Std. Err.
    e(ma)              lags for moving-average terms
    e(ar)              lags for autoregressive terms
    e(mari)            multiplicative AR terms and lag i=1... (# seasonal AR terms)
    e(mmai)            multiplicative MA terms and lag i=1... (# seasonal MA terms)
    e(seasons)         seasonal lags in model
    e(unsta)           unstationary or blank
    e(opt)             type of optimization
    e(ml_method)       type of ml method
    e(user)            name of likelihood-evaluator program
    e(technique)       maximization technique
    e(tech_steps)      number of iterations performed before switching techniques
    e(properties)      b V
    e(estat_cmd)       program used to implement estat
    e(predict)         program used to implement predict
    e(marginsok)       predictions allowed by margins
    e(marginsnotok)    predictions disallowed by margins

Matrices
    e(b)               coefficient vector
    e(Cns)             constraints matrix
    e(ilog)            iteration log (up to 20 iterations)
    e(gradient)        gradient vector
    e(V)               variance-covariance matrix of the estimators
    e(V_modelbased)    model-based variance

Functions
    e(sample)          marks estimation sample
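The stored results listed above can be inspected after any arima fit. A brief sketch, assuming the Example 1 model:

```stata
use http://www.stata-press.com/data/r13/wpi1, clear
arima wpi, arima(1,1,1)
* Individual scalars
display e(ll)
display e(sigma)
display e(ar_max)
* Coefficient vector, then the full set of stored results
matrix list e(b)
ereturn list
```

ereturn list shows every scalar, macro, and matrix currently stored in e(), which is a convenient way to check which of the entries above were set by a particular fit.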

Methods and formulas


Estimation is by maximum likelihood using the Kalman filter via the prediction error decomposition;
see Hamilton (1994), Gourieroux and Monfort (1997), or, in particular, Harvey (1989). Any of these
sources will serve as excellent background for the fitting of these models with the state-space form;
each source also provides considerable detail on the method outlined below.
Methods and formulas are presented under the following headings:
ARIMA model
Kalman filter equations
Kalman filter or state-space representation of the ARIMA model
Kalman filter recursions
Kalman filter initial conditions
Likelihood from prediction error decomposition
Missing data


ARIMA model
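The model to be fit is

$$y_t = \mathbf{x}_t\boldsymbol{\beta} + \mu_t$$
$$\mu_t = \sum_{i=1}^{p}\rho_i\mu_{t-i} + \sum_{j=1}^{q}\theta_j\epsilon_{t-j} + \epsilon_t$$

which can be written as the single equation

$$y_t = \mathbf{x}_t\boldsymbol{\beta} + \sum_{i=1}^{p}\rho_i(y_{t-i} - \mathbf{x}_{t-i}\boldsymbol{\beta}) + \sum_{j=1}^{q}\theta_j\epsilon_{t-j} + \epsilon_t$$

Some of the $\rho$s and $\theta$s may be constrained to zero or, for multiplicative seasonal models, the products
of other parameters.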

Kalman filter equations


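We will roughly follow Hamilton's (1994) notation and write the Kalman filter

$$\boldsymbol{\xi}_t = \mathbf{F}\boldsymbol{\xi}_{t-1} + \mathbf{v}_t \qquad \text{(state equation)}$$
$$\mathbf{y}_t = \mathbf{A}'\mathbf{x}_t + \mathbf{H}'\boldsymbol{\xi}_t + \mathbf{w}_t \qquad \text{(observation equation)}$$

and

$$\begin{pmatrix}\mathbf{v}_t \\ \mathbf{w}_t\end{pmatrix} \sim N\left\{\mathbf{0}, \begin{pmatrix}\mathbf{Q} & \mathbf{0} \\ \mathbf{0} & \mathbf{R}\end{pmatrix}\right\}$$

We maintain the standard Kalman filter matrix and vector notation, although for univariate models
$y_t$, $w_t$, and $R$ are scalars.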

Kalman filter or state-space representation of the ARIMA model


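A univariate ARIMA model can be cast in state-space form by defining the Kalman filter matrices
as follows (see Hamilton [1994], or Gourieroux and Monfort [1997], for details):

$$\mathbf{F} = \begin{bmatrix}
\rho_1 & \rho_2 & \cdots & \rho_{p-1} & \rho_p \\
1 & 0 & \cdots & 0 & 0 \\
0 & 1 & \cdots & 0 & 0 \\
0 & 0 & \cdots & 1 & 0
\end{bmatrix}
\qquad
\mathbf{v}_t = \begin{bmatrix} \epsilon_{t-1} \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$

$$\mathbf{A}' = \boldsymbol{\beta}$$
$$\mathbf{H}' = [\,1 \;\; \theta_1 \;\; \theta_2 \;\; \cdots \;\; \theta_q\,]$$
$$\mathbf{w}_t = 0$$

The Kalman filter representation does not require the moving-average terms to be invertible.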

Kalman filter recursions


To demonstrate how missing data are handled, the updating recursions for the Kalman filter will
be left in two steps. Writing the updating equations as one step using the gain matrix K is common.
We will provide the updating equations with little justification; see the sources listed above for details.
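As a linear combination of a vector of random variables, the state $\boldsymbol{\xi}_t$ can be updated to its expected value on the basis of the prior state as

$$\boldsymbol{\xi}_{t|t-1} = \mathbf{F}\boldsymbol{\xi}_{t-1} + \mathbf{v}_{t-1} \tag{4}$$

This state is a quadratic form that has the covariance matrix

$$\mathbf{P}_{t|t-1} = \mathbf{F}\mathbf{P}_{t-1}\mathbf{F}' + \mathbf{Q} \tag{5}$$

The estimator of $y_t$ is

$$\widehat{y}_{t|t-1} = \mathbf{x}_t\boldsymbol{\beta} + \mathbf{H}'\boldsymbol{\xi}_{t|t-1}$$

which implies an innovation or prediction error

$$\widehat{\iota}_t = y_t - \widehat{y}_{t|t-1}$$

This value or vector has mean squared error (MSE)

$$\mathbf{M}_t = \mathbf{H}'\mathbf{P}_{t|t-1}\mathbf{H} + \mathbf{R}$$

Now the expected value of $\boldsymbol{\xi}_t$ conditional on a realization of $y_t$ is

$$\boldsymbol{\xi}_t = \boldsymbol{\xi}_{t|t-1} + \mathbf{P}_{t|t-1}\mathbf{H}\mathbf{M}_t^{-1}\widehat{\iota}_t \tag{6}$$

with MSE

$$\mathbf{P}_t = \mathbf{P}_{t|t-1} - \mathbf{P}_{t|t-1}\mathbf{H}\mathbf{M}_t^{-1}\mathbf{H}'\mathbf{P}_{t|t-1} \tag{7}$$
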
This expression gives the full set of Kalman filter recursions.
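
The two-step form of the recursions can be sketched for the simplest case, a scalar state. The following Python fragment is only an illustration under stated assumptions: the function names `predict` and `update` and all numeric values are hypothetical, the state is one-dimensional (so every matrix product reduces to ordinary multiplication), and the mean-zero disturbance is dropped from the prediction step. It is not the code arima uses.

```python
def predict(xi_prev, p_prev, f, q):
    """Prediction step for a scalar state: state forecast and its MSE."""
    xi_pred = f * xi_prev        # expected state given the prior state
    p_pred = f * p_prev * f + q  # P_{t|t-1} = F P_{t-1} F' + Q
    return xi_pred, p_pred


def update(xi_pred, p_pred, y, xb, h, r):
    """Updating step for a scalar state: fold the observation y into the state."""
    y_hat = xb + h * xi_pred                  # estimator of y_t
    innov = y - y_hat                         # innovation (prediction error)
    m = h * p_pred * h + r                    # MSE of the innovation
    xi = xi_pred + p_pred * h / m * innov     # updated state
    p = p_pred - p_pred * h / m * h * p_pred  # updated state MSE
    return xi, p, innov, m


# Hypothetical AR(1) disturbance: f = rho = 0.5, q = sigma^2 = 1, h = 1, r = 0.
xi_pred, p_pred = predict(xi_prev=0.0, p_prev=4 / 3, f=0.5, q=1.0)
xi, p, innov, m = update(xi_pred, p_pred, y=2.0, xb=1.0, h=1.0, r=0.0)
print(xi_pred, p_pred, innov, m, xi, p)
```

Because the observation equation has no measurement error in this sketch (r = 0), observing y pins down the state exactly and the updated MSE p collapses to zero.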


Kalman filter initial conditions


When the series is stationary, conditional on $\mathbf{x}_t\boldsymbol{\beta}$, the initial conditions for the filter can be considered a random draw from the stationary distribution of the state equation. The initial values of the state and the state MSE are the expected values from this stationary distribution. For an ARIMA model, these can be written as

$$\boldsymbol{\xi}_{1|0} = \mathbf{0}$$

and

$$\text{vec}(\mathbf{P}_{1|0}) = (\mathbf{I}_{r^2} - \mathbf{F} \otimes \mathbf{F})^{-1}\,\text{vec}(\mathbf{Q})$$

where vec() is an operator representing the column matrix resulting from stacking each successive
column of the target matrix.
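
As a rough illustration of the vec formula, the sketch below solves $(\mathbf{I}_{r^2} - \mathbf{F} \otimes \mathbf{F})\,\text{vec}(\mathbf{P}) = \text{vec}(\mathbf{Q})$ in plain Python for a hypothetical AR(2) disturbance written in companion form. The helper names (`kron`, `solve`, `stationary_mse`) and the coefficient values are assumptions made for this example, and the linear solve is an ordinary Gaussian elimination rather than whatever routine Stata uses.

```python
def kron(a, b):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[x * y for x in a_row for y in b_row] for a_row in a for b_row in b]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(col + 1, n):
            factor = aug[k][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[k][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def stationary_mse(f, q):
    """Return P solving vec(P) = (I - F kron F)^{-1} vec(Q)."""
    r = len(f)
    ff = kron(f, f)
    lhs = [[(1.0 if i == j else 0.0) - ff[i][j] for j in range(r * r)]
           for i in range(r * r)]
    vec_q = [q[i][j] for j in range(r) for i in range(r)]  # stack the columns
    vec_p = solve(lhs, vec_q)
    return [[vec_p[j * r + i] for j in range(r)] for i in range(r)]


# Hypothetical AR(2) with rho_1 = 0.5, rho_2 = 0.3, and unit innovation variance.
f = [[0.5, 0.3], [1.0, 0.0]]
q = [[1.0, 0.0], [0.0, 0.0]]
p = stationary_mse(f, q)
print(p)
```

The result can be checked by substituting it back into $\mathbf{P} = \mathbf{F}\mathbf{P}\mathbf{F}' + \mathbf{Q}$, which is the fixed point that the stationary distribution must satisfy.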
If the series is not stationary, the initial state conditions do not constitute a random draw from a stationary distribution, and some other values must be chosen. Hamilton (1994) suggests that they be chosen based on prior expectations, whereas Harvey suggests a diffuse and improper prior having a state vector of 0 and an infinite variance. This method corresponds to $\mathbf{P}_{1|0}$ with diagonal elements of $\infty$. Stata allows either approach to be taken for nonstationary series: initial priors may be specified with state0() and p0(), and a diffuse prior may be specified with diffuse.

Likelihood from prediction error decomposition


Given the outputs from the Kalman filter recursions and assuming that the state and observation
vectors are Gaussian, the likelihood for the state-space model follows directly from the resulting
multivariate normal in the predicted innovations. The log likelihood for observation t is

$$\ln L_t = -\frac{1}{2}\left\{ \ln(2\pi) + \ln(|\mathbf{M}_t|) + \widehat{\iota}_t'\,\mathbf{M}_t^{-1}\,\widehat{\iota}_t \right\}$$


This command supports the Huber/White/sandwich estimator of the variance using vce(robust).
See [P] robust, particularly Maximum likelihood estimators and Methods and formulas.

Missing data
Missing data, whether a missing dependent variable $y_t$, one or more missing covariates $\mathbf{x}_t$, or completely missing observations, are handled by continuing the state-updating equations without any contribution from the data; see Harvey (1989 and 1993). That is, (4) and (5) are iterated for every missing observation, whereas (6) and (7) are ignored. Thus, for observations with missing data, $\boldsymbol{\xi}_t = \boldsymbol{\xi}_{t|t-1}$ and $\mathbf{P}_t = \mathbf{P}_{t|t-1}$. Without any information from the sample, this effectively assumes that the prediction error for the missing observations is 0. Other methods of handling missing data on the basis of the EM algorithm have been suggested, for example, Shumway (1984, 1988).
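
The treatment of missing observations can be sketched by running the filter over a series in which gaps are marked with `None`. The Python fragment below is a minimal illustration, assuming a constant-only model with a stationary AR(1) disturbance, no measurement error, and no likelihood contribution from missing periods; the function name `ar1_loglik` and the data are hypothetical.

```python
import math


def ar1_loglik(y, const, rho, sigma2):
    """Gaussian log likelihood of y_t = const + mu_t, mu_t = rho*mu_{t-1} + e_t.

    Entries of y equal to None are treated as missing: the prediction step is
    still applied, but the updating step and the likelihood term are skipped.
    """
    xi_pred = 0.0                         # stationary mean of the state
    p_pred = sigma2 / (1.0 - rho * rho)   # stationary variance of the state
    loglik = 0.0
    for obs in y:
        if obs is None:
            xi, p = xi_pred, p_pred       # carry the prediction forward
        else:
            innov = obs - (const + xi_pred)
            m = p_pred                    # H = 1 and R = 0 in this sketch
            loglik += -0.5 * (math.log(2 * math.pi) + math.log(m) + innov ** 2 / m)
            xi = xi_pred + p_pred / m * innov
            p = p_pred - p_pred / m * p_pred
        xi_pred = rho * xi                # predict the next state
        p_pred = rho * p * rho + sigma2   # and its MSE
    return loglik


complete = ar1_loglik([1.0, 0.5], const=0.0, rho=0.5, sigma2=1.0)
with_gap = ar1_loglik([1.0, None, 0.5], const=0.0, rho=0.5, sigma2=1.0)
print(complete, with_gap)
```

With the gap, the third observation is predicted two steps ahead from the first, so its prediction MSE grows from $\sigma^2$ to $\sigma^2(1 + \rho^2)$.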



George Edward Pelham Box (1919–2013) was born in Kent, England, and earned degrees
in statistics at the University of London. After work in the chemical industry, he taught and
researched at Princeton and the University of Wisconsin. His many major contributions to statistics
include papers and books in Bayesian inference, robustness (a term he introduced to statistics),
modeling strategy, experimental design and response surfaces, time-series analysis, distribution
theory, transformations, and nonlinear estimation.

Gwilym Meirion Jenkins (1933–1982) was a British mathematician and statistician who spent
his career in industry and academia, working for extended periods at Imperial College London
and the University of Lancaster before running his own company. His interests were centered on
time series and he collaborated with G. E. P. Box on what are often called Box–Jenkins models.
The last years of Jenkins's life were marked by a slowly losing battle against Hodgkin's disease.

References
Ansley, C. F., and R. J. Kohn. 1985. Estimation, filtering, and smoothing in state space models with incompletely specified initial conditions. Annals of Statistics 13: 1286–1316.
Ansley, C. F., and P. Newbold. 1980. Finite sample properties of estimators for autoregressive moving average models. Journal of Econometrics 13: 159–183.
Baum, C. F. 2000. sts15: Tests for stationarity of a time series. Stata Technical Bulletin 57: 36–39. Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp. 356–360. College Station, TX: Stata Press.
Baum, C. F., and T. Room. 2001. sts18: A test for long-range dependence in a time series. Stata Technical Bulletin 60: 37–39. Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp. 370–373. College Station, TX: Stata Press.
Baum, C. F., and R. I. Sperling. 2000. sts15.1: Tests for stationarity of a time series: Update. Stata Technical Bulletin 58: 35–36. Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp. 360–362. College Station, TX: Stata Press.
Baum, C. F., and V. L. Wiggins. 2000. sts16: Tests for long memory in a time series. Stata Technical Bulletin 57: 39–44. Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp. 362–368. College Station, TX: Stata Press.
Becketti, S. 2013. Introduction to Time Series Using Stata. College Station, TX: Stata Press.
Berndt, E. K., B. H. Hall, R. E. Hall, and J. A. Hausman. 1974. Estimation and inference in nonlinear structural models. Annals of Economic and Social Measurement 3/4: 653–665.
Bollerslev, T., R. F. Engle, and D. B. Nelson. 1994. ARCH models. In Vol. 4 of Handbook of Econometrics, ed. R. F. Engle and D. L. McFadden. Amsterdam: Elsevier.
Box, G. E. P. 1983. Obituary: G. M. Jenkins, 1933–1982. Journal of the Royal Statistical Society, Series A 146: 205–206.
Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. 2008. Time Series Analysis: Forecasting and Control. 4th ed. Hoboken, NJ: Wiley.
Chatfield, C. 2004. The Analysis of Time Series: An Introduction. 6th ed. Boca Raton, FL: Chapman & Hall/CRC.
David, J. S. 1999. sts14: Bivariate Granger causality test. Stata Technical Bulletin 51: 40–41. Reprinted in Stata Technical Bulletin Reprints, vol. 9, pp. 350–351. College Station, TX: Stata Press.
Davidson, R., and J. G. MacKinnon. 1993. Estimation and Inference in Econometrics. New York: Oxford University Press.
DeGroot, M. H. 1987. A conversation with George Box. Statistical Science 2: 239–258.
Diggle, P. J. 1990. Time Series: A Biostatistical Introduction. Oxford: Oxford University Press.
Enders, W. 2004. Applied Econometric Time Series. 2nd ed. New York: Wiley.
Friedman, M., and D. Meiselman. 1963. The relative stability of monetary velocity and the investment multiplier in the United States, 1897–1958. In Stabilization Policies, Commission on Money and Credit, 123–126. Englewood Cliffs, NJ: Prentice Hall.
Gourieroux, C. S., and A. Monfort. 1997. Time Series and Dynamic Models. Trans. ed. G. M. Gallo. Cambridge: Cambridge University Press.


Greene, W. H. 2012. Econometric Analysis. 7th ed. Upper Saddle River, NJ: Prentice Hall.
Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
Harvey, A. C. 1989. Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge: Cambridge University Press.
Harvey, A. C. 1993. Time Series Models. 2nd ed. Cambridge, MA: MIT Press.
Hipel, K. W., and A. I. McLeod. 1994. Time Series Modelling of Water Resources and Environmental Systems. Amsterdam: Elsevier.
Holan, S. H., R. Lund, and G. Davis. 2010. The ARMA alphabet soup: A tour of ARMA model variants. Statistics Surveys 4: 232–274.
Kalman, R. E. 1960. A new approach to linear filtering and prediction problems. Transactions of the ASME–Journal of Basic Engineering, Series D 82: 35–45.
McDowell, A. W. 2002. From the help desk: Transfer functions. Stata Journal 2: 71–85.
McDowell, A. W. 2004. From the help desk: Polynomial distributed lag models. Stata Journal 4: 180–189.
Newton, H. J. 1988. TIMESLAB: A Time Series Analysis Laboratory. Belmont, CA: Wadsworth.
Press, W. H., S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. 2007. Numerical Recipes: The Art of Scientific Computing. 3rd ed. New York: Cambridge University Press.
Sanchez, G. 2012. Comparing predictions after arima with manual computations. The Stata Blog: Not Elsewhere Classified. http://blog.stata.com/2012/02/16/comparing-predictions-after-arima-with-manual-computations/.
Shumway, R. H. 1984. Some applications of the EM algorithm to analyzing incomplete time series data. In Time Series Analysis of Irregularly Observed Data, ed. E. Parzen, 290–324. New York: Springer.
Shumway, R. H. 1988. Applied Statistical Time Series Analysis. Upper Saddle River, NJ: Prentice Hall.
Wang, Q., and N. Wu. 2012. Menu-driven X-12-ARIMA seasonal adjustment in Stata. Stata Journal 12: 214–241.

Also see
[TS] arima postestimation Postestimation tools for arima
[TS] tsset Declare data to be time-series data
[TS] arch Autoregressive conditional heteroskedasticity (ARCH) family of estimators
[TS] dfactor Dynamic-factor models
[TS] forecast Econometric model forecasting
[TS] mgarch Multivariate GARCH models
[TS] prais Prais–Winsten and Cochrane–Orcutt regression
[TS] sspace State-space models
[TS] ucm Unobserved-components model
[R] regress Linear regression
[U] 20 Estimation and postestimation commands

Title
arima postestimation Postestimation tools for arima
Description
Syntax for predict
Menu for predict
Options for predict
Remarks and examples
Reference
Also see

Description
The following postestimation commands are of special interest after arima:
Command          Description
---------------------------------------------------------------
estat acplot     estimate autocorrelations and autocovariances
estat aroots     check stability condition of estimates
irf              create and analyze IRFs
psdensity        estimate the spectral density

The following standard postestimation commands are also available:


Command          Description
---------------------------------------------------------------
estat ic         Akaike's and Schwarz's Bayesian information criteria (AIC and BIC)
estat summarize  summary statistics for the estimation sample
estat vce        variance–covariance matrix of the estimators (VCE)
estimates        cataloging estimation results
forecast         dynamic forecasts and simulations
lincom           point estimates, standard errors, testing, and inference for linear
                 combinations of coefficients
lrtest           likelihood-ratio test
margins          marginal means, predictive margins, marginal effects, and average
                 marginal effects
marginsplot      graph the results from margins (profile plots, interaction plots, etc.)
nlcom            point estimates, standard errors, testing, and inference for nonlinear
                 combinations of coefficients
predict          predictions, residuals, influence statistics, and other diagnostic measures
predictnl        point estimates, standard errors, testing, and inference for generalized
                 predictions
test             Wald tests of simple and composite linear hypotheses
testnl           Wald tests of nonlinear hypotheses


Syntax for predict


predict [type] newvar [if] [in] [, statistic options]

statistic        Description
---------------------------------------------------------------
Main
  xb             predicted values for mean equation (the differenced series); the default
  stdp           standard error of the linear prediction
  y              predicted values for the mean equation in y (the undifferenced series)
  mse            mean squared error of the predicted values
  residuals      residuals or predicted innovations
  yresiduals     residuals or predicted innovations in y, reversing any time-series operators

These statistics are available both in and out of sample; type predict ... if e(sample) ... if wanted only for
the estimation sample.
Predictions are not available for conditional ARIMA models fit to panel data.

options                   Description
---------------------------------------------------------------
Options
  dynamic(time_constant)  how to handle the lags of y_t
  t0(time_constant)       set starting point for the recursions to time_constant
  structural              calculate considering the structural component only

time_constant is a # or a time literal, such as td(1jan1995) or tq(1995q1); see
Conveniently typing SIF values in [D] datetime.

Menu for predict


Statistics > Postestimation > Predictions, residuals, etc.

Options for predict


Five statistics can be computed using predict after arima: the predictions from the model (the
default also given by xb), the predictions after reversing any time-series operators applied to the
dependent variable (y), the MSE of xb (mse), the predictions of residuals or innovations (residual),
and the predicted residuals or innovations in terms of y (yresiduals). Given the dynamic nature
of the ARMA component and because the dependent variable might be differenced, there are other
ways of computing each. We can use all the data on the dependent variable that is available right
up to the time of each prediction (the default, which is often called a one-step prediction), or we
can use the data up to a particular time, after which the predicted value of the dependent variable is
used recursively to make later predictions (dynamic()). Either way, we can consider or ignore the
ARMA disturbance component (the component is considered by default and is ignored if you specify
structural).
All calculations can be made in or out of sample.


Main

xb, the default, calculates the predictions from the model. If D.depvar is the dependent variable,
these predictions are of D.depvar and not of depvar itself.
stdp calculates the standard error of the linear prediction xb. stdp does not include the variation
arising from the disturbance equation; use mse to calculate standard errors and confidence bands
around the predicted values.
y specifies that predictions of depvar be made, even if the model was specified in terms of, say,
D.depvar.
mse calculates the MSE of the predictions.
residuals calculates the residuals. If no other options are specified, these are the predicted innovations
$\epsilon_t$; that is, they include the ARMA component. If structural is specified, these are the residuals
$\mu_t$ from the structural equation; see structural below.
yresiduals calculates the residuals in terms of depvar, even if the model was specified in terms of,
say, D.depvar. As with residuals, the yresiduals are computed from the model, including any
ARMA component. If structural is specified, any ARMA component is ignored, and yresiduals
are the residuals from the structural equation; see structural below.

Options

dynamic(time_constant) specifies how lags of $y_t$ in the model are to be handled. If dynamic() is
not specified, actual values are used everywhere that lagged values of $y_t$ appear in the model to
produce one-step-ahead forecasts.
dynamic(time_constant) produces dynamic (also known as recursive) forecasts. time_constant
specifies when the forecast is to switch from one step ahead to dynamic. In dynamic forecasts,
references to $y_t$ evaluate to the prediction of $y_t$ for all periods at or after time_constant; they
evaluate to the actual value of $y_t$ for all prior periods.
For example, dynamic(10) would calculate predictions in which any reference to $y_t$ with $t < 10$
evaluates to the actual value of $y_t$ and any reference to $y_t$ with $t \geq 10$ evaluates to the prediction of
$y_t$. This means that one-step-ahead predictions are calculated for $t < 10$ and dynamic predictions
thereafter. Depending on the lag structure of the model, the dynamic predictions might still refer
to some actual values of $y_t$.
You may also specify dynamic(.) to have predict automatically switch from one-step-ahead to
dynamic predictions at $p + q$, where $p$ is the maximum AR lag and $q$ is the maximum MA lag.
t0(time_constant) specifies the starting point for the recursions to compute the predicted statistics;
disturbances are assumed to be 0 for t < t0(). The default is to set t0() to the minimum t
observed in the estimation sample, meaning that observations before that are assumed to have
disturbances of 0.
t0() is irrelevant if structural is specified because then all observations are assumed to have
disturbances of 0.
t0(5) would begin recursions at t = 5. If the data were quarterly, you might instead type
t0(tq(1961q2)) to obtain the same result.
The ARMA component of ARIMA models is recursive and depends on the starting point of the
predictions. This includes one-step-ahead predictions.
structural specifies that the calculation be made considering the structural component only, ignoring
the ARMA terms, producing the steady-state equilibrium predictions.
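
The switch that dynamic() describes, from actual lagged values to predicted lagged values, can be sketched in Python. This is a simplified illustration and not how predict is implemented: it assumes a pure AR(1) model with a constant and no MA terms, and the function name `ar1_predict`, the argument `dynamic_from`, and the data are hypothetical.

```python
def ar1_predict(y, const, rho, dynamic_from=None):
    """Predictions of y_t = const + rho*(y_{t-1} - const) + e_t for t >= 1.

    A reference to y at period s uses the actual value when s < dynamic_from
    and the prediction when s >= dynamic_from. With dynamic_from=None every
    prediction is one step ahead.
    """
    preds = [None]  # no prediction for the first period in this sketch
    for t in range(1, len(y)):
        use_pred = dynamic_from is not None and (t - 1) >= dynamic_from
        if use_pred and preds[t - 1] is not None:
            lagged = preds[t - 1]   # dynamic: feed the forecast back in
        else:
            lagged = y[t - 1]       # one step ahead: use the observed value
        preds.append(const + rho * (lagged - const))
    return preds


y = [1.0, 2.0, 3.0, 4.0, 5.0]
one_step = ar1_predict(y, const=0.0, rho=0.5)
dynamic = ar1_predict(y, const=0.0, rho=0.5, dynamic_from=2)
print(one_step)
print(dynamic)
```

In the dynamic run, the prediction for period 2 still uses the observed value at period 1, and from period 3 onward each prediction is built on the previous prediction.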


Remarks and examples


Remarks are presented under the following headings:
Forecasting after ARIMA
IRF results for ARIMA

Forecasting after ARIMA


We assume that you have already read [TS] arima. In this section, we illustrate some of the features
of predict after fitting ARIMA, ARMAX, and other dynamic models by using arima. In example 2
of [TS] arima, we fit the model

$$\Delta\ln(\text{wpi}_t) = \beta_0 + \rho_1\{\Delta\ln(\text{wpi}_{t-1}) - \beta_0\} + \theta_1\epsilon_{t-1} + \theta_4\epsilon_{t-4} + \epsilon_t$$


by typing
. use http://www.stata-press.com/data/r13/wpi1
. arima D.ln_wpi, ar(1) ma(1 4)
(output omitted )

If we use the command


. predict xb, xb

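then Stata computes $\text{xb}_t$ as

$$\text{xb}_t = \widehat{\beta}_0 + \widehat{\rho}_1\{\Delta\ln(\text{wpi}_{t-1}) - \widehat{\beta}_0\} + \widehat{\theta}_1\widehat{\epsilon}_{t-1} + \widehat{\theta}_4\widehat{\epsilon}_{t-4}$$

where

$$\widehat{\epsilon}_{t-j} = \begin{cases} \Delta\ln(\text{wpi}_{t-j}) - \text{xb}_{t-j} & t - j > 0 \\ 0 & \text{otherwise} \end{cases}$$

meaning that predict newvar, xb calculates predictions by using the metric of the dependent variable.
In this example, the dependent variable represented changes in $\ln(\text{wpi}_t)$, and so the predictions are
likewise for changes in that variable.
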
If we instead use
. predict y, y

Stata computes $y_t$ as $y_t = \text{xb}_t + \ln(\text{wpi}_{t-1})$ so that $y_t$ represents the predicted levels of $\ln(\text{wpi}_t)$. In
general, predict newvar, y will reverse any time-series operators applied to the dependent variable
during estimation.
If we want to ignore the ARMA error components when making predictions, we use the structural
option,
. predict xbs, xb structural

which generates $\text{xbs}_t = \widehat{\beta}_0$ because there are no regressors in this model, and
. predict ys, y structural

generates $\text{ys}_t = \widehat{\beta}_0 + \ln(\text{wpi}_{t-1})$.
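
The relationship between the xb and y predictions for a first-differenced dependent variable can be sketched in a few lines of Python. The sketch assumes one-step-ahead predictions and a single first difference; the function name `levels_from_diff_predictions` and the numbers are hypothetical and are not taken from the wpi1 data.

```python
def levels_from_diff_predictions(levels, xb):
    """Undo a first difference: predicted level = predicted change + prior level."""
    return [
        None if (t == 0 or xb[t] is None) else xb[t] + levels[t - 1]
        for t in range(len(levels))
    ]


log_levels = [3.0, 3.1, 3.3]    # hypothetical observed log levels
xb = [None, 0.05, 0.15]         # hypothetical predicted changes
y_hat = levels_from_diff_predictions(log_levels, xb)
print(y_hat)
```

With the structural option the same arithmetic applies, except that every predicted change is the estimated constant.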


Example 1: Dynamic forecasts


An attractive feature of the arima command is the ability to make dynamic forecasts. In example 4
of [TS] arima, we fit the model

$$\text{consump}_t = \beta_0 + \beta_1\,\text{m2}_t + \mu_t$$

$$\mu_t = \rho\mu_{t-1} + \theta\epsilon_{t-1} + \epsilon_t$$

First, we refit the model by using data up through the first quarter of 1978, and then we will evaluate
the one-step-ahead and dynamic forecasts.
. use http://www.stata-press.com/data/r13/friedman2
. keep if time<=tq(1981q4)
(67 observations deleted)
. arima consump m2 if tin(, 1978q1), ar(1) ma(1)
(output omitted )

To make one-step-ahead forecasts, we type


. predict chat, y
(52 missing values generated)

(Because our dependent variable contained no time-series operators, we could have instead used
predict chat, xb and accomplished the same thing.) We will also make dynamic forecasts,
switching from observed values of consump to forecasted values at the first quarter of 1978:
. predict chatdy, dynamic(tq(1978q1)) y
(52 missing values generated)

The following graph compares the forecasted values to the observed values for the first few years
following the estimation sample:

[Figure: Personal consumption, in billions of dollars, by quarter, 1977q1 to 1982q1. Series: Observed; One-step-ahead forecast; Dynamic forecast (1978q1)]

The one-step-ahead forecasts never deviate far from the observed values, though over time the
dynamic forecasts have larger errors. To understand why that is the case, rewrite the model as

$$\begin{aligned} \text{consump}_t &= \beta_0 + \beta_1\,\text{m2}_t + \rho\mu_{t-1} + \theta\epsilon_{t-1} + \epsilon_t \\ &= \beta_0 + \beta_1\,\text{m2}_t + \rho\left(\text{consump}_{t-1} - \beta_0 - \beta_1\,\text{m2}_{t-1}\right) + \theta\epsilon_{t-1} + \epsilon_t \end{aligned}$$


This form shows that the forecasted value of consumption at time $t$ depends on the value of consumption
at time $t - 1$. When making the one-step-ahead forecast for period $t$, we know the actual value of
consumption at time $t - 1$. On the other hand, with the dynamic(tq(1978q1)) option, the forecasted
value of consumption for period 1978q1 is based on the observed value of consumption in period
1977q4, but the forecast for 1978q2 is based on the forecast value for 1978q1, the forecast for 1978q3
is based on the forecast value for 1978q2, and so on. Thus, with dynamic forecasts, prior forecast
errors accumulate over time. The following graph illustrates this effect.

[Figure: Forecast error (forecast minus actual) by quarter, 1978q1 to 1982q1. Series: One-step-ahead forecast; Dynamic forecast (1978q1)]

IRF results for ARIMA


We assume that you have already read [TS] irf and [TS] irf create. In this section, we illustrate
how to calculate the impulse–response function (IRF) of an ARIMA model.

Example 2
Consider a model of the quarterly U.S. money supply, as measured by M1, from Enders (2004).
Enders (2004, 93–97) discusses why seasonal shopping patterns cause seasonal effects in M1. The
variable lnm1 contains data on the natural log of the money supply. We fit seasonal and nonseasonal
ARIMA models and compare the IRFs calculated from both models.
We fit the following nonseasonal ARIMA model

$$\Delta\Delta_4\text{lnm1}_t = \rho_1(\Delta\Delta_4\text{lnm1}_{t-1}) + \rho_4(\Delta\Delta_4\text{lnm1}_{t-4}) + \epsilon_t$$


The code below fits the above model and saves a set of IRF results to a file called myirf.irf.
. use http://www.stata-press.com/data/r13/m1nsa, clear
(U.S. money supply (M1) from Enders (2004), 95-99.)
. arima DS4.lnm1, ar(1 4) noconstant nolog
ARIMA regression
Sample:  1961q2 - 2008q2                        Number of obs      =        189
                                                Wald chi2(2)       =      78.34
Log likelihood =  579.3036                      Prob > chi2        =     0.0000

------------------------------------------------------------------------------
             |                 OPG
    DS4.lnm1 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ARMA         |
          ar |
         L1. |   .3551862   .0503011     7.06   0.000     .2565979    .4537745
         L4. |  -.3275808   .0594953    -5.51   0.000    -.4441895    -.210972
-------------+----------------------------------------------------------------
      /sigma |   .0112678   .0004882    23.08   0.000     .0103109    .0122246
------------------------------------------------------------------------------
Note: The test of the variance against zero is one sided, and the two-sided
      confidence interval is truncated at zero.
. irf create nonseasonal, set(myirf) step(30)
(file myirf.irf created)
(file myirf.irf now active)
(file myirf.irf updated)

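We fit the following seasonal ARIMA model

$$(1 - \rho_1 L)(1 - \rho_{4,1} L^4)\,\Delta\Delta_4\text{lnm1}_t = \epsilon_t$$

The code below fits this seasonal ARIMA model and saves a set of IRF results to the active IRF
file, which is myirf.irf.
. arima DS4.lnm1, ar(1) mar(1,4) noconstant nolog
ARIMA regression
Sample:  1961q2 - 2008q2                        Number of obs      =        189
                                                Wald chi2(2)       =     119.78
Log likelihood =  588.6689                      Prob > chi2        =     0.0000

------------------------------------------------------------------------------
             |                 OPG
    DS4.lnm1 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ARMA         |
          ar |
         L1. |    .489277   .0538033     9.09   0.000     .3838245    .5947296
-------------+----------------------------------------------------------------
ARMA4        |
          ar |
         L1. |  -.4688653   .0601248    -7.80   0.000    -.5867076   -.3510229
-------------+----------------------------------------------------------------
      /sigma |   .0107075   .0004747    22.56   0.000     .0097771    .0116379
------------------------------------------------------------------------------
Note: The test of the variance against zero is one sided, and the two-sided
      confidence interval is truncated at zero.
. irf create seasonal, step(30)
(file myirf.irf updated)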


We now have two sets of IRF results in the file myirf.irf. We can graph both IRF functions side
by side by calling irf graph.
. irf graph irf

[Figure: Impulse–response functions (irf) with 95% CI, plotted against step (0 to 30). Panels: nonseasonal, DS4.lnm1, DS4.lnm1; seasonal, DS4.lnm1, DS4.lnm1. Graphs by irfname, impulse variable, and response variable]

The trajectories of the IRF functions are similar: each figure shows that a shock to lnm1 causes a
temporary oscillation in lnm1 that dies out after about 15 time periods. This behavior is characteristic
of short-memory processes.
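
The shape of these responses can be approximated by hand from the reported coefficients. The Python sketch below traces the response of a pure AR process to a one-time unit shock; the function name `ar_irf` is hypothetical, the unit-shock scaling is an assumption that may differ from the scaling irf create applies, and the seasonal model is expanded by multiplying out $(1 - \rho_1 L)(1 - \rho_{4,1} L^4)$, which adds a lag-5 term equal to $-\rho_1\rho_{4,1}$.

```python
def ar_irf(ar_coefs, steps):
    """Response at steps 0..steps to a unit shock at step 0.

    ar_coefs maps each lag to its autoregressive coefficient.
    """
    irf = []
    for s in range(steps + 1):
        if s == 0:
            irf.append(1.0)  # the shock itself
        else:
            irf.append(sum(c * irf[s - lag]
                           for lag, c in ar_coefs.items() if s - lag >= 0))
    return irf


# Coefficients copied from the two arima outputs in example 2.
nonseasonal = {1: 0.3551862, 4: -0.3275808}
rho1, rho41 = 0.489277, -0.4688653
seasonal = {1: rho1, 4: rho41, 5: -rho1 * rho41}

irf_nonseasonal = ar_irf(nonseasonal, steps=30)
irf_seasonal = ar_irf(seasonal, steps=30)
for step in (0, 1, 4, 8, 15, 30):
    print(step, round(irf_nonseasonal[step], 4), round(irf_seasonal[step], 4))
```

Both sequences change sign at the seasonal lags and shrink toward zero, which is the damped oscillation described above.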

See [TS] psdensity for an introduction to estimating spectral densities using the parameters estimated
by arima.

Reference
Enders, W. 2004. Applied Econometric Time Series. 2nd ed. New York: Wiley.

Also see
[TS] arima ARIMA, ARMAX, and other dynamic regression models
[TS] estat acplot Plot parametric autocorrelation and autocovariance functions
[TS] estat aroots Check the stability condition of ARIMA estimates
[TS] irf Create and analyze IRFs, dynamic-multiplier functions, and FEVDs
[TS] psdensity Parametric spectral density estimation after arima, arfima, and ucm
[U] 20 Estimation and postestimation commands

Title
corrgram Tabulate and graph autocorrelations
Syntax
Menu
Description
Options for corrgram
Options for ac and pac
Remarks and examples
Stored results
Methods and formulas
Acknowledgment
References
Also see

Syntax
Autocorrelations, partial autocorrelations, and portmanteau (Q) statistics

    corrgram varname [if] [in] [, corrgram_options]

Graph autocorrelations with confidence intervals

    ac varname [if] [in] [, ac_options]

Graph partial autocorrelations with confidence intervals

    pac varname [if] [in] [, pac_options]

corrgram_options          Description
---------------------------------------------------------------
Main
  lags(#)                 calculate # autocorrelations
  noplot                  suppress character-based plots
  yw                      calculate partial autocorrelations by using Yule–Walker equations

ac_options                Description
---------------------------------------------------------------
Main
  lags(#)                 calculate # autocorrelations
  generate(newvar)        generate a variable to hold the autocorrelations
  level(#)                set confidence level; default is level(95)
  fft                     calculate autocorrelation by using Fourier transforms
Plot
  line_options            change look of dropped lines
  marker_options          change look of markers (color, size, etc.)
  marker_label_options    add marker labels; change look or position
CI plot
  ciopts(area_options)    affect rendition of the confidence bands
Add plots
  addplot(plot)           add other plots to the generated graph
Y axis, X axis, Titles, Legend, Overall
  twoway_options          any options other than by() documented in [G-3] twoway_options

pac_options               Description
---------------------------------------------------------------
Main
  lags(#)                 calculate # partial autocorrelations
  generate(newvar)        generate a variable to hold the partial autocorrelations
  yw                      calculate partial autocorrelations by using Yule–Walker equations
  level(#)                set confidence level; default is level(95)
Plot
  line_options            change look of dropped lines
  marker_options          change look of markers (color, size, etc.)
  marker_label_options    add marker labels; change look or position
CI plot
  ciopts(area_options)    affect rendition of the confidence bands
SRV plot
  srv                     include standardized residual variances in graph
  srvopts(marker_options) affect rendition of the plotted standardized residual variances (SRVs)
Add plots
  addplot(plot)           add other plots to the generated graph
Y axis, X axis, Titles, Legend, Overall
  twoway_options          any options other than by() documented in [G-3] twoway_options

You must tsset your data before using corrgram, ac, or pac; see [TS] tsset. Also, the time series
must be dense (nonmissing and no gaps in the time variable) in the sample if you specify the fft option.
varname may contain time-series operators; see [U] 11.4.4 Time-series varlists.

Menu
corrgram
    Statistics > Time series > Graphs > Autocorrelations & partial autocorrelations

ac
    Statistics > Time series > Graphs > Correlogram (ac)

pac
    Statistics > Time series > Graphs > Partial correlogram (pac)

Description
corrgram produces a table of the autocorrelations, partial autocorrelations, and portmanteau (Q)
statistics. It also displays a character-based plot of the autocorrelations and partial autocorrelations.
See [TS] wntestq for more information on the Q statistic.
ac produces a correlogram (a graph of autocorrelations) with pointwise confidence intervals that
is based on Bartlett's formula for MA(q) processes.
pac produces a partial correlogram (a graph of partial autocorrelations) with confidence intervals
calculated using a standard error of $1/\sqrt{n}$. The residual variances for each lag may optionally be
included on the graph.
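
To make the two kinds of confidence bands concrete, the following Python sketch computes sample autocorrelations and two sets of standard errors for a short series. It is an illustration under assumptions rather than a description of Stata's internals: the function names are hypothetical, the autocovariances use the divisor n, and the Bartlett standard error for lag k is taken to be $\sqrt{(1 + 2\sum_{j<k} r_j^2)/n}$, the usual form under an MA(k-1) null.

```python
import math


def autocorrelations(x, lags):
    """Sample autocorrelations r_1..r_lags, using autocovariances with divisor n."""
    n = len(x)
    mean = sum(x) / n

    def autocov(v):
        return sum((x[i] - mean) * (x[i + v] - mean) for i in range(n - v)) / n

    r0 = autocov(0)
    return [autocov(v) / r0 for v in range(1, lags + 1)]


def bartlett_se(ac, n):
    """Standard error of r_k assuming autocorrelations beyond lag k-1 are zero."""
    return [math.sqrt((1.0 + 2.0 * sum(r * r for r in ac[:k])) / n)
            for k in range(len(ac))]


x = [1.0, 2.0, 3.0, 4.0, 5.0]          # hypothetical data
ac = autocorrelations(x, lags=2)
se_ac = bartlett_se(ac, n=len(x))      # widens as the lag increases
se_pac = 1.0 / math.sqrt(len(x))       # the same at every lag
print(ac, se_ac, se_pac)
```

An approximate 95% band at each lag is then plus or minus 1.96 times the corresponding standard error.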

Options for corrgram




Main

lags(#) specifies the number of autocorrelations to calculate. The default is to use $\min(\lfloor n/2 \rfloor - 2, 40)$,
where $\lfloor n/2 \rfloor$ is the greatest integer less than or equal to $n/2$.
noplot prevents the character-based plots from being in the listed table of autocorrelations and partial
autocorrelations.
yw specifies that the partial autocorrelations be calculated using the Yule–Walker equations instead
of using the default regression-based technique. yw cannot be used if srv is used.

Options for ac and pac




Main

lags(#) specifies the number of autocorrelations to calculate. The default is to use $\min(\lfloor n/2 \rfloor - 2, 40)$,
where $\lfloor n/2 \rfloor$ is the greatest integer less than or equal to $n/2$.
generate(newvar) specifies a new variable to contain the autocorrelation (ac command) or partial
autocorrelation (pac command) values. This option is required if the nograph option is used.
nograph (implied when using generate() in the dialog box) prevents ac and pac from constructing
a graph. This option requires the generate() option.
yw (pac only) specifies that the partial autocorrelations be calculated using the Yule–Walker equations
instead of using the default regression-based technique. yw cannot be used if srv is used.
level(#) specifies the confidence level, as a percentage, for the confidence bands in the ac or pac
graph. The default is level(95) or as set by set level; see [R] level.
fft (ac only) specifies that the autocorrelations be calculated using two Fourier transforms. This
technique can be faster than simply iterating over the requested number of lags.

Plot

line_options, marker_options, and marker_label_options affect the rendition of the plotted autocorrelations (with ac) or partial autocorrelations (with pac).
line_options specify the look of the dropped lines, including pattern, width, and color; see
[G-3] line_options.
marker_options specify the look of markers. This look includes the marker symbol, the marker
size, and its color and outline; see [G-3] marker_options.
marker_label_options specify if and how the markers are to be labeled; see
[G-3] marker_label_options.

CI plot

ciopts(area_options) affects the rendition of the confidence bands; see [G-3] area_options.

SRV plot

srv (pac only) specifies that the standardized residual variances be plotted with the partial autocorrelations. srv cannot be used if yw is used.
srvopts(marker_options) (pac only) affects the rendition of the plotted standardized residual
variances; see [G-3] marker_options. This option implies the srv option.

Add plots

addplot(plot) adds specified plots to the generated graph; see [G-3] addplot_option.

Y axis, X axis, Titles, Legend, Overall

twoway_options are any of the options documented in [G-3] twoway_options, excluding by(). These
include options for titling the graph (see [G-3] title_options) and for saving the graph to disk (see
[G-3] saving_option).

Remarks and examples


Remarks are presented under the following headings:
Basic examples
Video example

Basic examples
corrgram tabulates autocorrelations, partial autocorrelations, and portmanteau (Q) statistics and
plots the autocorrelations and partial autocorrelations. The Q statistics are the same as those produced
by [TS] wntestq. ac produces graphs of the autocorrelations, and pac produces graphs of the partial
autocorrelations. See Becketti (2013) for additional examples of how these commands are used in
practice.

Example 1
Here we use the international airline passengers dataset (Box, Jenkins, and Reinsel 2008, Series G).
This dataset has 144 observations on the monthly number of international airline passengers from
1949 through 1960. We can list the autocorrelations and partial autocorrelations by using corrgram.



. use http://www.stata-press.com/data/r13/air2
(TIMESLAB: Airline passengers)
. corrgram air, lags(20)
                                            -1       0       1 -1       0       1
 LAG       AC       PAC        Q     Prob>Q  [Autocorrelation]  [Partial Autocor]
---------------------------------------------------------------------------------
   1     0.9480    0.9589   132.14   0.0000
   2     0.8756   -0.3298   245.65   0.0000
   3     0.8067    0.2018   342.67   0.0000
   4     0.7526    0.1450   427.74   0.0000
   5     0.7138    0.2585    504.8   0.0000
   6     0.6817   -0.0269    575.6   0.0000
   7     0.6629    0.2043   643.04   0.0000
   8     0.6556    0.1561   709.48   0.0000
   9     0.6709    0.5686   779.59   0.0000
  10     0.7027    0.2926   857.07   0.0000
  11     0.7432    0.8402   944.39   0.0000
  12     0.7604    0.6127   1036.5   0.0000
  13     0.7127   -0.6660     1118   0.0000
  14     0.6463   -0.3846   1185.6   0.0000
  15     0.5859    0.0787   1241.5   0.0000
  16     0.5380   -0.0266     1289   0.0000
  17     0.4997   -0.0581   1330.4   0.0000
  18     0.4687   -0.0435     1367   0.0000
  19     0.4499    0.2773   1401.1   0.0000
  20     0.4416   -0.0405   1434.1   0.0000
(character-based plots not shown)

We can use ac to produce a graph of the autocorrelations.

. ac air, lags(20)

(figure omitted: Autocorrelations of air plotted against Lag, with Bartlett's formula for MA(q) 95% confidence bands)

The data probably have a trend component as well as a seasonal component. First-differencing
will mitigate the effects of the trend, and seasonal differencing will help control for seasonality. To
accomplish this goal, we can use Stata's time-series operators. Here we graph the partial autocorrelations
after controlling for trends and seasonality. We also use srv to include the standardized residual
variances.

. pac DS12.air, lags(20) srv

(figure omitted: Partial autocorrelations of DS12.air plotted against Lag; the legend shows the 95% CI, the partial autocorrelations of DS12.air, and the standardized variances; 95% confidence bands [se = 1/sqrt(n)])

See [U] 11.4.4 Time-series varlists for more information about time-series operators.
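As a small sketch of how the operators compose, the following do-file fragment builds DS12.air by hand and checks it against the operator form. It assumes the air2 dataset is tsset as in the example above; the variable names s12air, ds12air, and check are hypothetical and used only for illustration.

```stata
* DS12.air is the first difference of the seasonal (lag-12) difference
use http://www.stata-press.com/data/r13/air2, clear
generate double s12air  = air - L12.air        // same values as S12.air
generate double ds12air = s12air - L.s12air    // same values as DS12.air
generate double check   = DS12.air             // operator form, for comparison
* the hand-built series should match wherever the operator form is defined
assert reldif(ds12air, check) < 1e-12 if !missing(check)
```

Typing the operators directly in the command, as in pac DS12.air above, avoids creating these temporary variables.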

Video example
Time series, part 4: Correlograms and partial correlograms

Stored results
corrgram stores the following in r():
Scalars
    r(lags)     number of lags
    r(ac#)      AC for lag #
    r(pac#)     PAC for lag #
    r(q#)       Q for lag #

Matrices
    r(AC)       vector of autocorrelations
    r(PAC)      vector of partial autocorrelations
    r(Q)        vector of Q statistics
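
The stored results can be picked up after corrgram in the usual way. The fragment below is only a sketch using the air2 data from example 1; the matrix name AC is our own choice.

```stata
use http://www.stata-press.com/data/r13/air2, clear
quietly corrgram air, lags(20)
display "lags   = " r(lags)
display "AC(1)  = " %6.4f r(ac1)
display "PAC(1) = " %6.4f r(pac1)
display "Q(1)   = " %8.2f r(q1)
* copy r(AC) before another r-class command overwrites r()
matrix AC = r(AC)
matrix list AC
```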

Methods and formulas


Box, Jenkins, and Reinsel (2008, sec. 2.1.4); Newton (1988); Chatfield (2004); and Hamilton (1994)
provide excellent descriptions of correlograms. Newton (1988) also discusses the calculation of the
various quantities.
The autocovariance function for a time series $x_1, x_2, \ldots, x_n$ is defined for $|v| < n$ as

$$\widehat{R}(v) = \frac{1}{n} \sum_{i=1}^{n-|v|} (x_i - \bar{x})(x_{i+v} - \bar{x})$$


where $\bar{x}$ is the sample mean, and the autocorrelation function is then defined as

$$\widehat{\rho}_v = \frac{\widehat{R}(v)}{\widehat{R}(0)}$$
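
To make the definition concrete, the following sketch computes the lag-1 autocorrelation of air by hand and compares it with r(ac1). The 1/n factors cancel in the ratio, so only the two sums are needed. The variable and scalar names (dev, prod, dev2, num, den) are hypothetical.

```stata
use http://www.stata-press.com/data/r13/air2, clear
quietly summarize air
generate double dev  = air - r(mean)     // x_i - xbar
generate double prod = dev * L.dev       // lag-1 cross products; missing in the first period
generate double dev2 = dev^2
quietly summarize prod
scalar num = r(sum)                      // n times the lag-1 autocovariance
quietly summarize dev2
scalar den = r(sum)                      // n times the lag-0 autocovariance
quietly corrgram air, lags(5)
display "by hand:  " scalar(num)/scalar(den)
display "corrgram: " r(ac1)
```

If the two displayed values differ by more than rounding error, check that the data are tsset with no gaps.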

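The variance of $\widehat{\rho}_v$ is given by Bartlett's formula for MA(q) processes. From Brockwell and Davis (2002,
94), we have

$$\mathrm{Var}(\widehat{\rho}_v) = \begin{cases} 1/n & v = 1 \\[4pt] \dfrac{1}{n}\left\{ 1 + 2\sum_{i=1}^{v-1} \widehat{\rho}^{\,2}(i) \right\} & v > 1 \end{cases}$$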

The partial autocorrelation at lag $v$ measures the correlation between $x_t$ and $x_{t+v}$ after the effects
of $x_{t+1}, \ldots, x_{t+v-1}$ have been removed. By default, corrgram and pac use a regression-based
method to estimate it. We run an OLS regression of $x_t$ on $x_{t-1}, \ldots, x_{t-v}$ and a constant term. The
estimated coefficient on $x_{t-v}$ is our estimate of the $v$th partial autocorrelation. The residual variance
is the estimated variance of that regression, which we then standardize by dividing by $\widehat{R}(0)$.
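
The regression-based estimate can be reproduced directly with regress. The sketch below does this for lag 3 of air and compares the coefficient on the third lag with r(pac3); the scalar name pac3 is hypothetical.

```stata
use http://www.stata-press.com/data/r13/air2, clear
* OLS of x_t on x_{t-1}, x_{t-2}, x_{t-3}, and a constant
quietly regress air L(1/3).air
scalar pac3 = _b[L3.air]                 // coefficient on the longest lag
quietly corrgram air, lags(3)
display "regression: " scalar(pac3)
display "corrgram:   " r(pac3)
```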
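If the yw option is specified, corrgram and pac use the Yule–Walker equations to estimate the
partial autocorrelations. Per Enders (2010, 66–67), let $\phi_{vv}$ denote the $v$th partial autocorrelation
coefficient. We then have

$$\widehat{\phi}_{11} = \widehat{\rho}_1$$

and for $v > 1$

$$\widehat{\phi}_{vv} = \frac{\widehat{\rho}_v - \sum_{j=1}^{v-1} \widehat{\phi}_{v-1,j}\, \widehat{\rho}_{v-j}}{1 - \sum_{j=1}^{v-1} \widehat{\phi}_{v-1,j}\, \widehat{\rho}_j}$$

and

$$\widehat{\phi}_{vj} = \widehat{\phi}_{v-1,j} - \widehat{\phi}_{vv}\, \widehat{\phi}_{v-1,v-j} \qquad j = 1, 2, \ldots, v-1$$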

Unlike the regression-based method, the Yule–Walker equations-based method ensures that the first
sample partial autocorrelation equals the first sample autocorrelation coefficient, as must be true in the
population; see Greene (2008, 725).
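
A quick way to see this property is to compare r(pac1) with r(ac1) under both methods. This is only a sketch using the air2 data from example 1.

```stata
use http://www.stata-press.com/data/r13/air2, clear
* Yule-Walker method: the lag-1 PAC is set equal to the lag-1 AC
quietly corrgram air, lags(5) yw
display "Yule-Walker: PAC(1) - AC(1) = " r(pac1) - r(ac1)
* regression-based method (the default): the two generally differ
quietly corrgram air, lags(5)
display "regression:  PAC(1) - AC(1) = " r(pac1) - r(ac1)
```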
McCullough (1998) discusses other methods of estimating $\phi_{vv}$; he finds that relative to other
methods, such as linear regression, the Yule–Walker equations-based method performs poorly, in part
because it is susceptible to numerical error. Box, Jenkins, and Reinsel (2008, 69) also caution against
using the Yule–Walker equations-based method, especially with data that are nearly nonstationary.

Acknowledgment
The ac and pac commands are based on the ac and pac commands written by Sean Becketti (1992),
a past editor of the Stata Technical Bulletin and author of the Stata Press book Introduction to Time
Series Using Stata.

References
Becketti, S. 1992. sts1: Autocorrelation and partial autocorrelation graphs. Stata Technical Bulletin 5: 27–28. Reprinted
in Stata Technical Bulletin Reprints, vol. 1, pp. 221–223. College Station, TX: Stata Press.
Becketti, S. 2013. Introduction to Time Series Using Stata. College Station, TX: Stata Press.


Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. 2008. Time Series Analysis: Forecasting and Control. 4th ed.
Hoboken, NJ: Wiley.
Brockwell, P. J., and R. A. Davis. 2002. Introduction to Time Series and Forecasting. 2nd ed. New York: Springer.
Chatfield, C. 2004. The Analysis of Time Series: An Introduction. 6th ed. Boca Raton, FL: Chapman & Hall/CRC.
Enders, W. 2010. Applied Econometric Time Series. 3rd ed. New York: Wiley.
Greene, W. H. 2008. Econometric Analysis. 6th ed. Upper Saddle River, NJ: Prentice Hall.
Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
McCullough, B. D. 1998. Algorithm choice for (partial) autocorrelation functions. Journal of Economic and Social
Measurement 24: 265–278.
Newton, H. J. 1988. TIMESLAB: A Time Series Analysis Laboratory. Belmont, CA: Wadsworth.

Also see
[TS] tsset Declare data to be time-series data
[TS] pergram Periodogram
[TS] wntestq Portmanteau (Q) test for white noise

Title
cumsp Cumulative spectral distribution

Syntax
cumsp varname [if] [in] [, options]

options                    Description
Main
  generate(newvar)         create newvar holding distribution values
Plot
  cline_options            affect rendition of the plotted points connected by lines
  marker_options           change look of markers (color, size, etc.)
  marker_label_options     add marker labels; change look or position
Add plots
  addplot(plot)            add other plots to the generated graph
Y axis, X axis, Titles, Legend, Overall
  twoway_options           any options other than by() documented in [G-3] twoway_options

You must tsset your data before using cumsp; see [TS] tsset. Also, the time series must be dense
(nonmissing with no gaps in the time variable) in the sample specified.
varname may contain time-series operators; see [U] 11.4.4 Time-series varlists.

Menu
Statistics > Time series > Graphs > Cumulative spectral distribution

Description
cumsp plots the cumulative sample spectral-distribution function evaluated at the natural frequencies
for a (dense) time series.

Options


Main

generate(newvar) specifies a new variable to contain the estimated cumulative spectral-distribution
values.

Plot

cline_options affect the rendition of the plotted points connected by lines; see [G-3] cline_options.
marker_options specify the look of markers. This look includes the marker symbol, the marker size,
and its color and outline; see [G-3] marker_options.
marker_label_options specify if and how the markers are to be labeled; see [G-3] marker_label_options.

Add plots

addplot(plot) provides a way to add other plots to the generated graph; see [G-3] addplot option.

Y axis, X axis, Titles, Legend, Overall

twoway_options are any of the options documented in [G-3] twoway_options, excluding by(). These
include options for titling the graph (see [G-3] title_options) and for saving the graph to disk (see
[G-3] saving_option).

Remarks and examples


Example 1
Here we use the international airline passengers dataset (Box, Jenkins, and Reinsel 2008, Series G).
This dataset has 144 observations on the monthly number of international airline passengers from
1949 through 1960. In the cumulative sample spectral distribution function for these data, we also
request a vertical line at frequency 1/12. Because the data are monthly, there will be a pronounced
jump in the cumulative sample spectral-distribution plot at the 1/12 value if there is an annual cycle
in the data.
. use http://www.stata-press.com/data/r13/air2
(TIMESLAB: Airline passengers)
. cumsp air, xline(.083333333)


(figure omitted: "Airline Passengers (1949–1960)", sample spectral distribution function; cumulative spectral distribution on the vertical axis from 0.00 to 1.00 plotted against Frequency from 0.00 to 0.50; points evaluated at the natural frequencies)

The cumulative sample spectral-distribution function clearly illustrates the annual cycle.
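
To inspect the plotted values numerically, the generate() option can be added to the same command. In this sketch the variable name cumdist is hypothetical, and we assume the values are stored in frequency order starting with the first observation; check the layout in your release before relying on it.

```stata
use http://www.stata-press.com/data/r13/air2, clear
* same graph as above, but also keep the distribution values
cumsp air, generate(cumdist) xline(.083333333)
* with n = 144, frequency 1/12 corresponds to k - 1 = 12, that is, k = 13
list cumdist in 1/15
```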


Methods and formulas


A time series of interest is decomposed into a unique set of sinusoids of various frequencies and
amplitudes.
A plot of the sinusoidal amplitudes versus the frequencies for the sinusoidal decomposition of a
time series gives us the spectral density of the time series. If we calculate the sinusoidal amplitudes
for a discrete set of natural frequencies (1/n, 2/n, . . . , q/n), we obtain the periodogram.
Let $x(1), \ldots, x(n)$ be a time series, and let $\omega_k = (k-1)/n$ denote the natural frequencies for
$k = 1, \ldots, \lfloor n/2 \rfloor + 1$, where $\lfloor\,\cdot\,\rfloor$ indicates the greatest integer function. Define

$$C_k^2 = \frac{1}{n^2} \left| \sum_{t=1}^{n} x(t)\, e^{2\pi i (t-1)\omega_k} \right|^2$$

A plot of $nC_k^2$ versus $\omega_k$ is then called the periodogram.

The sample spectral density may then be defined as $\widehat{f}(\omega_k) = nC_k^2$.
If we let $\widehat{f}(\omega_1), \ldots, \widehat{f}(\omega_Q)$ be the sample spectral density function of the time series evaluated
at the frequencies $\omega_j = (j-1)/Q$ for $j = 1, \ldots, Q$ and we let $q = \lfloor Q/2 \rfloor + 1$, then

$$\widehat{F}(\omega_k) = \frac{\sum_{i=1}^{k} \widehat{f}(\omega_i)}{\sum_{i=1}^{q} \widehat{f}(\omega_i)}$$

is the sample spectral-distribution function of the time series.

References
Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. 2008. Time Series Analysis: Forecasting and Control. 4th ed.
Hoboken, NJ: Wiley.
Newton, H. J. 1988. TIMESLAB: A Time Series Analysis Laboratory. Belmont, CA: Wadsworth.

Also see
[TS] tsset Declare data to be time-series data
[TS] corrgram Tabulate and graph autocorrelations
[TS] pergram Periodogram

Title
dfactor Dynamic-factor models

Syntax
dfactor obs_eq [fac_eq] [if] [in] [, options]

obs_eq specifies the equation for the observed dependent variables, and it has the form

    (depvars = [exog_d] [, sopts])

fac_eq specifies the equation for the unobserved factors, and it has the form

    (facvars = [exog_f] [, sopts])

depvars are the observed dependent variables. exog_d are the exogenous variables that enter into
the equations for the observed dependent variables. (All factors are automatically entered into the
equations for the observed dependent variables.) facvars are the names for the unobserved factors
in the model. You may specify the names of existing variables in facvars, but dfactor treats
them only as names and takes no notice that they are also variables. exog_f are the exogenous
variables that enter into the equations for the factors.
options                        Description
Model
  constraints(constraints)     apply specified linear constraints
SE/Robust
  vce(vcetype)                 vcetype may be oim or robust
Reporting
  level(#)                     set confidence level; default is level(95)
  nocnsreport                  do not display constraints
  display_options              control column formats, row spacing, display of omitted variables
                               and base and empty cells, and factor-variable labeling
Maximization
  maximize_options             control the maximization process; seldom used
  from(matname)                specify initial values for the maximization process; seldom used
Advanced
  method(method)               specify the method for calculating the log likelihood; seldom used

  coeflegend                   display legend instead of statistics

sopts                          Description
Model
  noconstant                   suppress constant term from the equation; allowed only in obs_eq
  ar(numlist)                  autoregressive terms
  arstructure(arstructure)     structure of autoregressive coefficient matrices
  covstructure(covstructure)   covariance structure

arstructure                    Description
  diagonal                     diagonal matrix; the default
  ltriangular                  lower triangular matrix
  general                      general matrix

covstructure                   Description
  identity                     identity matrix
  dscalar                      diagonal scalar matrix
  diagonal                     diagonal matrix
  unstructured                 symmetric, positive-definite matrix

method                         Description
  hybrid                       use the stationary Kalman filter and the De Jong diffuse Kalman
                               filter; the default
  dejong                       use the stationary De Jong method and the De Jong diffuse Kalman
                               filter

You must tsset your data before using dfactor; see [TS] tsset.
exog_d and exog_f may contain factor variables; see [U] 11.4.3 Factor variables.
depvars, exog_d, and exog_f may contain time-series operators; see [U] 11.4.4 Time-series varlists.
by, fp, rolling, and statsby are allowed; see [U] 11.1.10 Prefix commands.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.

Menu
Statistics > Multivariate time series > Dynamic-factor models

Description
dfactor estimates the parameters of dynamic-factor models by maximum likelihood. Dynamic-factor
models are flexible models for multivariate time series in which unobserved factors have a
vector autoregressive structure, exogenous covariates are permitted in both the equations for the latent
factors and the equations for observable dependent variables, and the disturbances in the equations
for the dependent variables may be autocorrelated.


Options


Model

constraints(constraints) apply linear constraints. Some specifications require linear constraints for
parameter identification.
noconstant suppresses the constant term.
ar(numlist) specifies the vector autoregressive lag structure in the equation. By default, no lags are
included in either the observable or the factor equations.
arstructure(diagonal|ltriangular|general) specifies the structure of the matrices in the vector
autoregressive lag structure.
arstructure(diagonal) specifies the matrices to be diagonal: separate parameters for each
lag, but no cross-equation autocorrelations. arstructure(diagonal) is the default for both
the observable and the factor equations.
arstructure(ltriangular) specifies the matrices to be lower triangular; it parameterizes a
recursive, or Wold causal, structure.
arstructure(general) specifies the matrices to be general matrices: separate parameters for
each possible autocorrelation and cross-correlation.
covstructure(identity | dscalar | diagonal | unstructured) specifies the covariance structure
of the errors.
covstructure(identity) specifies a covariance matrix equal to an identity matrix, and it is the
default for the errors in the factor equations.
covstructure(dscalar) specifies a covariance matrix equal to $\sigma^2$ times an identity matrix.
covstructure(diagonal) specifies a diagonal covariance matrix, and it is the default for the
errors in the observable variables.
covstructure(unstructured) specifies a symmetric, positive-definite covariance matrix with
parameters for all variances and covariances.

SE/Robust

vce(vcetype) specifies the estimator for the variance–covariance matrix of the estimator.
vce(oim), the default, causes dfactor to use the observed information matrix estimator.
vce(robust) causes dfactor to use the Huber/White/sandwich estimator.

Reporting

level(#); see [R] estimation options.


nocnsreport; see [R] estimation options.
display_options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel, fvwrap(#),
fvwrapon(style), cformat(%fmt), pformat(%fmt), and sformat(%fmt); see [R] estimation options.

Maximization

 
maximize_options: difficult, technique(algorithm_spec), iterate(#), [no]log, trace,
gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#),
nrtolerance(#), and from(matname); see [R] maximize for all options except from(), and
see below for information on from(). These options are seldom used.


from(matname) specifies initial values for the maximization process. from(b0) causes dfactor
to begin the maximization algorithm with the values in b0. b0 must be a row vector; the number
of columns must equal the number of parameters in the model; and the values in b0 must be
in the same order as the parameters in e(b). This option is seldom used.
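
As a sketch of how from() might be used, the fragment below saves e(b) from one fit of the model in example 1 and passes it back as starting values for the same specification. The matrix name b0 follows the text above; refitting an identical model is done here only to show the mechanics.

```stata
use http://www.stata-press.com/data/r13/dfex, clear
quietly dfactor (D.(ipman income hours unemp) = , noconstant) (f = , ar(1/2))
matrix b0 = e(b)          // row vector, ordered as the parameters in e(b)
matrix list b0            // column names show the parameter order
* refit the same specification, starting the maximization at b0
dfactor (D.(ipman income hours unemp) = , noconstant) (f = , ar(1/2)), from(b0)
```

Starting values taken from a different specification would need to be rearranged so that the number and order of columns match the new model.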

Advanced

method(method) specifies how to compute the log likelihood. dfactor writes the model in
state-space form and uses sspace to estimate the parameters; see [TS] sspace. method() offers two
methods for dealing with some of the technical aspects of the state-space likelihood. This option
is seldom used.
method(hybrid), the default, uses the Kalman filter with model-based initial values when the
model is stationary and uses the De Jong (1988, 1991) diffuse Kalman filter when the model
is nonstationary.
method(dejong) uses the De Jong (1988) method for estimating the initial values for the Kalman
filter when the model is stationary and uses the De Jong (1988, 1991) diffuse Kalman filter
when the model is nonstationary.
The following option is available with dfactor but is not shown in the dialog box:
coeflegend; see [R] estimation options.

Remarks and examples


Remarks are presented under the following headings:
An introduction to dynamic-factor models
Some examples

An introduction to dynamic-factor models


dfactor estimates the parameters of dynamic-factor models by maximum likelihood (ML). Dynamic-factor
models represent a vector of $k$ endogenous variables as linear functions of $n_f < k$ unobserved
factors and some exogenous covariates. The unobserved factors and the disturbances in the equations
for the observed variables may follow vector autoregressive structures.
Dynamic-factor models have been developed and applied in macroeconomics; see Geweke (1977),
Sargent and Sims (1977), Stock and Watson (1989, 1991), and Watson and Engle (1983).
Dynamic-factor models are very flexible; in a sense, they are too flexible. Constraints must be
imposed to identify the parameters of dynamic-factor and static-factor models. The parameters in the
default specifications in dfactor are identified, but other specifications require additional restrictions.
The factors are identified only up to a sign, which means that the coefficients on the unobserved factors
can flip signs and still produce the same predictions and the same log likelihood. The flexibility of
the model sometimes produces convergence problems.
dfactor is designed to handle cases in which the number of modeled endogenous variables, k ,
is small. The ML estimator is implemented by writing the model in state-space form and by using
the Kalman filter to derive and implement the log likelihood. As k grows, the number of parameters
quickly exceeds the number that can be estimated.


A dynamic-factor model has the form

$$y_t = P f_t + Q x_t + u_t$$
$$f_t = R w_t + A_1 f_{t-1} + A_2 f_{t-2} + \cdots + A_p f_{t-p} + \nu_t$$
$$u_t = C_1 u_{t-1} + C_2 u_{t-2} + \cdots + C_q u_{t-q} + \epsilon_t$$

where the definitions are given in the following table:

Item            Dimension           Definition
$y_t$           $k \times 1$        vector of dependent variables
$P$             $k \times n_f$      matrix of parameters
$f_t$           $n_f \times 1$      vector of unobservable factors
$Q$             $k \times n_x$      matrix of parameters
$x_t$           $n_x \times 1$      vector of exogenous variables
$u_t$           $k \times 1$        vector of disturbances
$R$             $n_f \times n_w$    matrix of parameters
$w_t$           $n_w \times 1$      vector of exogenous variables
$A_i$           $n_f \times n_f$    matrix of autocorrelation parameters for $i \in \{1, 2, \ldots, p\}$
$\nu_t$         $n_f \times 1$      vector of disturbances
$C_i$           $k \times k$        matrix of autocorrelation parameters for $i \in \{1, 2, \ldots, q\}$
$\epsilon_t$    $k \times 1$        vector of disturbances

By selecting different numbers of factors and lags, the dynamic-factor model encompasses the six
models in the table below:

Dynamic factors with vector autoregressive errors    (DFAR)    $n_f > 0$    $p > 0$    $q > 0$
Dynamic factors                                      (DF)      $n_f > 0$    $p > 0$    $q = 0$
Static factors with vector autoregressive errors     (SFAR)    $n_f > 0$    $p = 0$    $q > 0$
Static factors                                       (SF)      $n_f > 0$    $p = 0$    $q = 0$
Vector autoregressive errors                         (VAR)     $n_f = 0$    $p = 0$    $q > 0$
Seemingly unrelated regression                       (SUR)     $n_f = 0$    $p = 0$    $q = 0$

In addition to the time-series models, dfactor can estimate the parameters of SF models and SUR
models. dfactor can place equality constraints on the disturbance covariances, which sureg and
var do not allow.

Some examples
Example 1: Dynamic-factor model
Stock and Watson (1989, 1991) wrote a simple macroeconomic model as a DF model, estimated the
parameters by ML, and extracted an economic indicator. In this example, we estimate the parameters
of a DF model. In [TS] dfactor postestimation, we extend this example and extract an economic
indicator for the differenced series.
We have data on an industrial-production index, ipman; real disposable income, income; an
aggregate weekly hours index, hours; and aggregate unemployment, unemp. We believe that these
variables are first-difference stationary. We model their first-differences as linear functions of an
unobserved factor that follows a second-order autoregressive process.
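
In the notation of the general model, this specification can be sketched as follows. Here $k = 4$, $n_f = 1$, $p = 2$, and $q = 0$, there are no exogenous variables, $\nu_t$ and $\epsilon_t$ denote the factor and observation disturbances, and the scalars $a_1$ and $a_2$ (our labels for the $1 \times 1$ matrices $A_1$ and $A_2$) are the autoregressive coefficients of the factor.

```latex
\begin{align*}
y_t &= \begin{pmatrix}
         \mathrm{D.ipman}_t \\ \mathrm{D.income}_t \\
         \mathrm{D.hours}_t \\ \mathrm{D.unemp}_t
       \end{pmatrix}
     = P f_t + \epsilon_t ,
     \qquad P \text{ is } 4 \times 1 \\
f_t &= a_1 f_{t-1} + a_2 f_{t-2} + \nu_t
\end{align*}
```

Because $q = 0$, the disturbance $u_t$ in the observation equation reduces to $\epsilon_t$.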



. use http://www.stata-press.com/data/r13/dfex
(St. Louis Fed (FRED) macro data)
. dfactor (D.(ipman income hours unemp) = , noconstant) (f = , ar(1/2))
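searching for initial values ..................
(setting technique to bhhh)
Iteration 0:   log likelihood = -675.18934
Iteration 1:   log likelihood = -667.47825
 (output omitted )
Refining estimates:
Iteration 0:   log likelihood = -662.09507
Iteration 1:   log likelihood = -662.09507

Dynamic-factor model
Sample: 1972m2 - 2008m11                        Number of obs   =        442
                                                Wald chi2(6)    =     751.95
Log likelihood = -662.09507                     Prob > chi2     =     0.0000

------------------------------------------------------------------------------
             |                 OIM
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
f            |
           f |
         L1. |   .2651932   .0568663     4.66   0.000     .1537372    .3766491
         L2. |   .4820398   .0624635     7.72   0.000     .3596136     .604466
-------------+----------------------------------------------------------------
D.ipman      |
           f |   .3502249   .0287389    12.19   0.000     .2938976    .4065522
-------------+----------------------------------------------------------------
D.income     |
           f |   .0746338   .0217319     3.43   0.001     .0320401    .1172276
-------------+----------------------------------------------------------------
D.hours      |
           f |   .2177469   .0186769    11.66   0.000     .1811407     .254353
-------------+----------------------------------------------------------------
D.unemp      |
           f |  -.0676016   .0071022    -9.52   0.000    -.0815217   -.0536816
-------------+----------------------------------------------------------------
var(De.ipman)|   .1383158   .0167086     8.28   0.000     .1055675    .1710641
var(De.inc~e)|   .2773808   .0188302    14.73   0.000     .2404743    .3142873
var(De.hours)|   .0911446   .0080847    11.27   0.000     .0752988    .1069903
var(De.unemp)|   .0237232   .0017932    13.23   0.000     .0202086    .0272378
------------------------------------------------------------------------------
Note: Tests of variances against zero are one sided, and the two-sided
      confidence intervals are truncated at zero.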

For a discussion of the atypical iteration log, see example 1 in [TS] sspace.
The header in the output describes the estimation sample, reports the log-likelihood function at the
maximum, and gives the results of a Wald test against the null hypothesis that the coefficients on the
independent variables, the factors, and the autoregressive components are all zero. In this example,
the null hypothesis that all parameters except for the variance parameters are zero is rejected at all
conventional levels.
The results in the estimation table indicate that the unobserved factor is quite persistent and that
it is a significant predictor for each of the observed variables.


dfactor writes the DF model as a state-space model and uses the same methods as sspace to
estimate the parameters. Example 5 in [TS] sspace writes the model considered here in state-space
form and uses sspace to estimate the parameters.

Technical note
The signs of the coefficients on the unobserved factors are not identified. They are not identified
because we can multiply the unobserved factors and the coefficients on the unobserved factors by
negative one without changing the log likelihood or any of the model predictions.
Altering either the starting values for the maximization process, the maximization technique()
used, or the platform on which the command is run can cause the signs of the estimated coefficients
on the unobserved factors to change.
Changes in the signs of the estimated coefficients on the unobserved factors do not alter the
implications of the model or the model predictions.
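
The invariance can be written out in the notation of the general model. The following is a sketch that omits the exogenous terms and assumes the distribution of the factor disturbance $\nu_t$ is symmetric about zero, as it is under a Gaussian likelihood.

```latex
\begin{align*}
y_t &= P f_t + u_t = (-P)(-f_t) + u_t \\
-f_t &= A_1(-f_{t-1}) + \cdots + A_p(-f_{t-p}) + (-\nu_t)
\end{align*}
```

The pair $(-P, -f_t)$ therefore satisfies the same equations as $(P, f_t)$, with $-\nu_t$ distributed as $\nu_t$, so both parameterizations yield the same likelihood and the same predictions for $y_t$.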

Example 2: Dynamic-factor model with covariates


Here we extend the previous example by allowing the errors in the equations for the observables to
be autocorrelated. This extension yields a constrained VAR model with an unobserved autocorrelated
factor.
We estimate the parameters by typing



. dfactor (D.(ipman income hours unemp) = , noconstant ar(1)) (f = , ar(1/2))
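searching for initial values ..............
(setting technique to bhhh)
Iteration 0:   log likelihood = -654.19377
Iteration 1:   log likelihood = -627.46986
 (output omitted )
Refining estimates:
Iteration 0:   log likelihood = -610.28846
Iteration 1:   log likelihood = -610.28846

Dynamic-factor model
Sample: 1972m2 - 2008m11                        Number of obs   =        442
                                                Wald chi2(10)   =     990.91
Log likelihood = -610.28846                     Prob > chi2     =     0.0000

------------------------------------------------------------------------------
             |                 OIM
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
f            |
           f |
         L1. |   .4058457   .0906183     4.48   0.000     .2282371    .5834544
         L2. |   .3663499   .0849584     4.31   0.000     .1998344    .5328654
-------------+----------------------------------------------------------------
De.ipman     |
     e.ipman |
         LD. |  -.2772149    .068808    -4.03   0.000    -.4120761   -.1423538
-------------+----------------------------------------------------------------
De.income    |
    e.income |
         LD. |  -.2213824   .0470578    -4.70   0.000    -.3136141   -.1291508
-------------+----------------------------------------------------------------
De.hours     |
     e.hours |
         LD. |  -.3969317   .0504256    -7.87   0.000     -.495764   -.2980994
-------------+----------------------------------------------------------------
De.unemp     |
     e.unemp |
         LD. |  -.1736835   .0532071    -3.26   0.001    -.2779675   -.0693995
-------------+----------------------------------------------------------------
D.ipman      |
           f |   .3214972    .027982    11.49   0.000     .2666535    .3763408
-------------+----------------------------------------------------------------
D.income     |
           f |   .0760412   .0173844     4.37   0.000     .0419684     .110114
-------------+----------------------------------------------------------------
D.hours      |
           f |   .1933165   .0172969    11.18   0.000     .1594151    .2272179
-------------+----------------------------------------------------------------
D.unemp      |
           f |  -.0711994   .0066553   -10.70   0.000    -.0842435   -.0581553
-------------+----------------------------------------------------------------
var(De.ipman)|   .1387909   .0154558     8.98   0.000     .1084981    .1690837
var(De.inc~e)|   .2636239   .0179043    14.72   0.000     .2285322    .2987157
var(De.hours)|   .0822919   .0071096    11.57   0.000     .0683574    .0962265
var(De.unemp)|   .0218056   .0016658    13.09   0.000     .0185407    .0250704
------------------------------------------------------------------------------
Note: Tests of variances against zero are one sided, and the two-sided
      confidence intervals are truncated at zero.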


The autoregressive (AR) terms are displayed in error notation. e.varname stands for the error in
the equation for varname. The estimate of the pth AR term from y1 on y2 is reported as Lpe.y1 in
equation e.y2. In the above output, the estimated first-order AR term of D.ipman on D.ipman is
-0.277 and is labeled as LDe.ipman in equation De.ipman.

The previous two examples illustrate how to use dfactor to estimate the parameters of DF models.
Although the previous example indicates that the more general DFAR model fits the data well, we use
these data to illustrate how to estimate the parameters of more restrictive models.

Example 3: A VAR with constrained error variance


In this example, we use dfactor to estimate the parameters of a SUR model with constraints on the
error-covariance matrix. The model is also a constrained VAR with constraints on the error-covariance
matrix, because we include the lags of two dependent variables as exogenous variables to model the
dynamic structure of the data. Previous exploratory work suggested that we should drop the lag of
D.unemp from the model.



. constraint 1 [cov(De.unemp,De.income)]_cons = 0
. dfactor (D.(ipman income unemp) = LD.(ipman income), noconstant
> covstructure(unstructured)), constraints(1)
searching for initial values ............
(setting technique to bhhh)
Iteration 0:   log likelihood =  -569.3512
Iteration 1:   log likelihood = -548.76963
 (output omitted )
Refining estimates:
Iteration 0:   log likelihood = -535.12973
Iteration 1:   log likelihood = -535.12973

Dynamic-factor model
Sample: 1972m3 - 2008m11                        Number of obs   =        441
                                                Wald chi2(6)    =      88.32
Log likelihood = -535.12973                     Prob > chi2     =     0.0000
 ( 1)  [cov(De.income,De.unemp)]_cons = 0

------------------------------------------------------------------------------
             |                 OIM
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
D.ipman      |
       ipman |
         LD. |    .206276   .0471654     4.37   0.000     .1138335    .2987185
      income |
         LD. |   .1867384   .0512139     3.65   0.000      .086361    .2871158
-------------+----------------------------------------------------------------
D.income     |
       ipman |
         LD. |   .1043733   .0434048     2.40   0.016     .0193015    .1894451
      income |
         LD. |  -.1957893   .0471305    -4.15   0.000    -.2881634   -.1034153
-------------+----------------------------------------------------------------
D.unemp      |
       ipman |
         LD. |  -.0865823   .0140747    -6.15   0.000    -.1141681   -.0589964
      income |
         LD. |  -.0200749   .0152828    -1.31   0.189    -.0500285    .0098788
-------------+----------------------------------------------------------------
var(De.ipman)|   .3243902   .0218533    14.84   0.000     .2815584    .3672219
cov(De.ipman,|
  De.income) |   .0445794    .013696     3.25   0.001     .0177358     .071423
cov(De.ipman,|
   De.unemp) |  -.0298076   .0047755    -6.24   0.000    -.0391674   -.0204478
var(De.inc~e)|   .2747234   .0185008    14.85   0.000     .2384624    .3109844
cov(De.inc~e,|
   De.unemp) |          0  (constrained)
var(De.unemp)|   .0288866   .0019453    14.85   0.000     .0250738    .0326994
------------------------------------------------------------------------------
Note: Tests of variances against zero are one sided, and the two-sided
      confidence intervals are truncated at zero.

The output indicates that the model fits well, except that the lag of first-differenced income is not
a significant predictor of first-differenced unemployment.


Technical note
The previous example shows how to use dfactor to estimate the parameters of a SUR model
with constraints on the error-covariance matrix. Neither sureg nor var allows for constraints on the
error-covariance matrix. Without the constraints on the error-covariance matrix and including the lag
of D.unemp,
. dfactor (D.(ipman income unemp) = LD.(ipman income unemp),
> noconstant covstructure(unstructured))
(output omitted )
. var D.(ipman income unemp), lags(1) noconstant
(output omitted )

and
. sureg (D.ipman LD.(ipman income unemp), noconstant)
>       (D.income LD.(ipman income unemp), noconstant)
>       (D.unemp LD.(ipman income unemp), noconstant)
 (output omitted )

produce the same estimates after allowing for small numerical differences.

Example 4: A lower-triangular VAR with constrained error variance


The previous example estimated the parameters of a constrained VAR model with a constraint on
the error-covariance matrix. This example makes two refinements on the previous one: we use an
unconditional estimator instead of a conditional estimator, and we constrain the AR parameters to
have a lower triangular structure. (See the next technical note for a discussion of conditional and
unconditional estimators.) The results are



. constraint 1 [cov(De.unemp,De.income)]_cons = 0
. dfactor (D.(ipman income unemp) = , ar(1) arstructure(ltriangular) noconstant
> covstructure(unstructured)), constraints(1)
searching for initial values ............
(setting technique to bhhh)
Iteration 0:   log likelihood = -543.89836
Iteration 1:   log likelihood = -541.47455
 (output omitted )
Refining estimates:
Iteration 0:   log likelihood = -540.36159
Iteration 1:   log likelihood = -540.36159

Dynamic-factor model
Sample: 1972m2 - 2008m11                        Number of obs   =        442
                                                Wald chi2(6)    =      75.48
Log likelihood = -540.36159                     Prob > chi2     =     0.0000
 ( 1)  [cov(De.income,De.unemp)]_cons = 0

------------------------------------------------------------------------------
             |                 OIM
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
De.ipman     |
     e.ipman |
         LD. |   .2297308   .0473147     4.86   0.000     .1369957    .3224659
-------------+----------------------------------------------------------------
De.income    |
     e.ipman |
         LD. |   .1075441   .0433357     2.48   0.013     .0226077    .1924805
    e.income |
         LD. |  -.2209485    .047116    -4.69   0.000    -.3132943   -.1286028
-------------+----------------------------------------------------------------
De.unemp     |
     e.ipman |
         LD. |  -.0975759   .0151301    -6.45   0.000    -.1272304   -.0679215
    e.income |
         LD. |  -.0000467   .0147848    -0.00   0.997    -.0290244    .0289309
     e.unemp |
         LD. |  -.0795348   .0482213    -1.65   0.099    -.1740469    .0149773
-------------+----------------------------------------------------------------
var(De.ipman)|   .3335286   .0224282    14.87   0.000     .2895702     .377487
cov(De.ipman,|
  De.income) |   .0457804   .0139123     3.29   0.001     .0185127    .0730481
cov(De.ipman,|
   De.unemp) |  -.0329438   .0051423    -6.41   0.000    -.0430226    -.022865
var(De.inc~e)|   .2743375   .0184657    14.86   0.000     .2381454    .3105296
cov(De.inc~e,|
   De.unemp) |          0  (constrained)
var(De.unemp)|   .0292088     .00199    14.68   0.000     .0253083    .0331092
------------------------------------------------------------------------------
Note: Tests of variances against zero are one sided, and the two-sided
      confidence intervals are truncated at zero.

The estimated AR terms of D.income and D.unemp on D.unemp are -0.000047 and -0.079535,
and they are not significant at the 1% or 5% levels. The estimated AR term of D.ipman on D.income
is 0.107544 and is significant at the 5% level but not at the 1% level.


Technical note
We obtained the unconditional estimator in example 4 by specifying the ar() option instead of
including the lags of the endogenous variables as exogenous variables, as we did in example 3. The
unconditional estimator has an additional observation and is more efficient. This change is analogous
to estimating an AR coefficient by arima instead of using regress on the lagged endogenous variable.
For example, to obtain the unconditional estimator in a univariate model, typing
. arima D.ipman, ar(1) noconstant technique(nr)
(output omitted )

will produce the same estimated AR coefficient as


. dfactor (D.ipman, ar(1) noconstant)
(output omitted )

We obtain the conditional estimator by typing either


. regress D.ipman LD.ipman, noconstant
(output omitted )

or
. dfactor (D.ipman = LD.ipman, noconstant)
(output omitted )
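
As a minimal sketch of the sample-size difference described in this note, the commands below fit the univariate model both ways and display e(N) after each. The sketch assumes the dfex dataset used in the earlier examples; it is an illustration, not part of the original example set.

```stata
* Sketch only: assumes the dfex data used in the earlier examples
use http://www.stata-press.com/data/r13/dfex, clear

* Unconditional estimator: AR(1) disturbance specified through ar()
quietly dfactor (D.ipman, ar(1) noconstant)
display "unconditional estimator, N = " e(N)

* Conditional estimator: lagged dependent variable used as a regressor
quietly regress D.ipman LD.ipman, noconstant
display "conditional estimator,   N = " e(N)
```

The first count should exceed the second by one, which is the additional observation mentioned above.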

Example 5: A static factor model


In this example, we fit regional unemployment data to an SF model. We have data on the
unemployment levels for the four regions in the U.S. census: west for the West, south for the
South, ne for the Northeast, and midwest for the Midwest. We treat the variables as first-difference
stationary and model the first-differences of these variables. Using dfactor yields



. use http://www.stata-press.com/data/r13/urate
(Monthly unemployment rates in US Census regions)
. dfactor (D.(west south ne midwest) = , noconstant ) (z = )
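searching for initial values .............
(setting technique to bhhh)
Iteration 0:   log likelihood =  872.72029
Iteration 1:   log likelihood =  873.04781
 (output omitted )
Refining estimates:
Iteration 0:   log likelihood =   873.0755
Iteration 1:   log likelihood =   873.0755

Dynamic-factor model
Sample: 1990m2 - 2008m12                        Number of obs   =        227
                                                Wald chi2(4)    =     342.56
Log likelihood =  873.0755                      Prob > chi2     =     0.0000

------------------------------------------------------------------------------
             |                 OIM
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
D.west       |
           z |   .0978324   .0065644    14.90   0.000     .0849664    .1106983
-------------+----------------------------------------------------------------
D.south      |
           z |   .0859494   .0061762    13.92   0.000     .0738442    .0980546
-------------+----------------------------------------------------------------
D.ne         |
           z |   .0918607   .0072814    12.62   0.000     .0775893     .106132
-------------+----------------------------------------------------------------
D.midwest    |
           z |   .0861102   .0074652    11.53   0.000     .0714787    .1007417
-------------+----------------------------------------------------------------
 var(De.west)|   .0036887   .0005834     6.32   0.000     .0025453    .0048322
var(De.south)|   .0038902   .0005228     7.44   0.000     .0028656    .0049149
   var(De.ne)|   .0064074   .0007558     8.48   0.000     .0049261    .0078887
var(De.mid~t)|   .0074749   .0008271     9.04   0.000     .0058538     .009096
------------------------------------------------------------------------------
Note: Tests of variances against zero are one sided, and the two-sided
      confidence intervals are truncated at zero.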

The estimates indicate that we could reasonably suppose that the unobserved factor has the same
effect on the changes in unemployment in all four regions. The output below shows that we cannot
reject the null hypothesis that these coefficients are the same.
. test [D.west]z = [D.south]z = [D.ne]z = [D.midwest]z
 ( 1)  [D.west]z - [D.south]z = 0
 ( 2)  [D.west]z - [D.ne]z = 0
 ( 3)  [D.west]z - [D.midwest]z = 0
           chi2(  3) =    3.58
         Prob > chi2 =    0.3109
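
The Wald test above has a likelihood-ratio counterpart, because lrtest is available after dfactor. The following is a hedged sketch rather than part of the original example: it refits the static-factor model with and without equal loadings and compares the two fits. The stored-estimate names (unrestricted, restricted) and the constraint numbers are arbitrary choices made for illustration.

```stata
* Sketch only: likelihood-ratio counterpart of the Wald test above
use http://www.stata-press.com/data/r13/urate, clear

* Unrestricted static-factor model
quietly dfactor (D.(west south ne midwest) = , noconstant) (z = )
estimates store unrestricted

* Restricted model: equal factor loadings (constraint numbers are arbitrary)
constraint 2 [D.west]z = [D.south]z
constraint 3 [D.west]z = [D.ne]z
constraint 4 [D.west]z = [D.midwest]z
quietly dfactor (D.(west south ne midwest) = , noconstant) (z = ), ///
    constraints(2/4)
estimates store restricted

* Three restrictions, so the test has 3 degrees of freedom
lrtest unrestricted restricted
```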

Example 6: A static factor with constraints


In this example, we impose the constraint that the unobserved factor has the same impact on
changes in unemployment in all four regions. This constraint was suggested by the results of the
previous example. The previous example did not allow for any dynamics in the variables, a problem
we alleviate by allowing the disturbances in the equation for each observable to follow an AR(1)
process.



. constraint 2 [D.west]z = [D.south]z
. constraint 3 [D.west]z = [D.ne]z
. constraint 4 [D.west]z = [D.midwest]z
. dfactor (D.(west south ne midwest) = , noconstant ar(1)) (z = ),
> constraints(2/4)
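searching for initial values .............
(setting technique to bhhh)
Iteration 0:   log likelihood =  828.22533
Iteration 1:   log likelihood =  874.84221
 (output omitted )
Refining estimates:
Iteration 0:   log likelihood =  880.97488
Iteration 1:   log likelihood =  880.97488

Dynamic-factor model
Sample: 1990m2 - 2008m12                        Number of obs   =        227
                                                Wald chi2(5)    =     363.34
Log likelihood =  880.97488                     Prob > chi2     =     0.0000
 ( 1)  [D.west]z - [D.south]z = 0
 ( 2)  [D.west]z - [D.ne]z = 0
 ( 3)  [D.west]z - [D.midwest]z = 0

------------------------------------------------------------------------------
             |                 OIM
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
De.west      |
      e.west |
         LD. |   .1297198   .0992663     1.31   0.191    -.0648386    .3242781
-------------+----------------------------------------------------------------
De.south     |
     e.south |
         LD. |  -.2829014   .0909205    -3.11   0.002    -.4611023   -.1047004
-------------+----------------------------------------------------------------
De.ne        |
        e.ne |
         LD. |   .2866958   .0847851     3.38   0.001       .12052    .4528715
-------------+----------------------------------------------------------------
De.midwest   |
   e.midwest |
         LD. |   .0049427   .0782188     0.06   0.950    -.1483634    .1582488
-------------+----------------------------------------------------------------
D.west       |
           z |   .0904724   .0049326    18.34   0.000     .0808047    .1001401
-------------+----------------------------------------------------------------
D.south      |
           z |   .0904724   .0049326    18.34   0.000     .0808047    .1001401
-------------+----------------------------------------------------------------
D.ne         |
           z |   .0904724   .0049326    18.34   0.000     .0808047    .1001401
-------------+----------------------------------------------------------------
D.midwest    |
           z |   .0904724   .0049326    18.34   0.000     .0808047    .1001401
-------------+----------------------------------------------------------------
 var(De.west)|   .0038959   .0005111     7.62   0.000     .0028941    .0048977
var(De.south)|   .0035518   .0005097     6.97   0.000     .0025528    .0045507
   var(De.ne)|   .0058173   .0006983     8.33   0.000     .0044488    .0071859
var(De.mid~t)|   .0075444   .0008268     9.12   0.000     .0059239     .009165
------------------------------------------------------------------------------
Note: Tests of variances against zero are one sided, and the two-sided
      confidence intervals are truncated at zero.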


The results indicate that the model might not fit well. Two of the four AR coefficients are statistically
insignificant, while the two significant coefficients have opposite signs and sum to about zero. We
suspect that a DF model might fit these data better than an SF model with autocorrelated disturbances.

Stored results
dfactor stores the following in e():

Scalars
    e(N)                  number of observations
    e(k)                  number of parameters
    e(k_aux)              number of auxiliary parameters
    e(k_eq)               number of equations in e(b)
    e(k_eq_model)         number of equations in overall model test
    e(k_dv)               number of dependent variables
    e(k_obser)            number of observation equations
    e(k_factor)           number of factors specified
    e(o_ar_max)           number of AR terms for the disturbances
    e(f_ar_max)           number of AR terms for the factors
    e(df_m)               model degrees of freedom
    e(ll)                 log likelihood
    e(chi2)               chi-squared
    e(p)                  significance
    e(tmin)               minimum time in sample
    e(tmax)               maximum time in sample
    e(stationary)         1 if the estimated parameters indicate a stationary model, 0 otherwise
    e(rank)               rank of VCE
    e(ic)                 number of iterations
    e(rc)                 return code
    e(converged)          1 if converged, 0 otherwise

Macros
    e(cmd)                dfactor
    e(cmdline)            command as typed
    e(depvar)             unoperated names of dependent variables in observation equations
    e(obser_deps)         names of dependent variables in observation equations
    e(covariates)         list of covariates
    e(indeps)             independent variables
    e(factor_deps)        names of unobserved factors in model
    e(tvar)               variable denoting time within groups
    e(eqnames)            names of equations
    e(model)              type of dynamic-factor model specified
    e(title)              title in estimation output
    e(tmins)              formatted minimum time
    e(tmaxs)              formatted maximum time
    e(o_ar)               list of AR terms for disturbances
    e(f_ar)               list of AR terms for factors
    e(observ_cov)         structure of observation-error covariance matrix
    e(factor_cov)         structure of factor-error covariance matrix
    e(chi2type)           Wald; type of model chi-squared test
    e(vce)                vcetype specified in vce()
    e(vcetype)            title used to label Std. Err.
    e(opt)                type of optimization
    e(method)             likelihood method
    e(initial_values)     type of initial values
    e(technique)          maximization technique
    e(tech_steps)         iterations taken in maximization technique(s)
    e(datasignature)      the checksum
    e(datasignaturevars)  variables used in calculation of checksum
    e(properties)         b V
    e(estat_cmd)          program used to implement estat
    e(predict)            program used to implement predict
    e(marginsok)          predictions allowed by margins
    e(marginsnotok)       predictions disallowed by margins

Matrices
    e(b)                  coefficient vector
    e(Cns)                constraints matrix
    e(ilog)               iteration log (up to 20 iterations)
    e(gradient)           gradient vector
    e(V)                  variance-covariance matrix of the estimators
    e(V_modelbased)       model-based variance

Functions
    e(sample)             marks estimation sample
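
A minimal sketch of how these stored results might be inspected after a fit. The model refit here is the static-factor model of example 5, chosen only for illustration; any dfactor fit would do.

```stata
* Sketch only: inspect stored results after a dfactor fit
use http://www.stata-press.com/data/r13/urate, clear
quietly dfactor (D.(west south ne midwest) = , noconstant) (z = )

* List everything dfactor left behind in e()
ereturn list

* Pull out individual scalars and the coefficient vector
display "log likelihood    = " e(ll)
display "stationary model? = " e(stationary)
display "converged?        = " e(converged)
matrix list e(b)
```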

Methods and formulas


dfactor writes the specified model as a state-space model and uses sspace to estimate the
parameters by maximum likelihood. See Lütkepohl (2005, 619–621) for how to write the DF model
in state-space form. See [TS] sspace for the technical details.

References
De Jong, P. 1988. The likelihood for a state space model. Biometrika 75: 165–169.
De Jong, P. 1991. The diffuse Kalman filter. Annals of Statistics 19: 1073–1083.
Geweke, J. 1977. The dynamic factor analysis of economic time series models. In Latent Variables in Socioeconomic
Models, ed. D. J. Aigner and A. S. Goldberger, 365–383. Amsterdam: North-Holland.
Lütkepohl, H. 2005. New Introduction to Multiple Time Series Analysis. New York: Springer.
Sargent, T. J., and C. A. Sims. 1977. Business cycle modeling without pretending to have too much a priori economic
theory. In New Methods in Business Cycle Research: Proceedings from a Conference, ed. C. A. Sims, 45–109.
Minneapolis: Federal Reserve Bank of Minneapolis.
Stock, J. H., and M. W. Watson. 1989. New indexes of coincident and leading economic indicators. In NBER
Macroeconomics Annual 1989, ed. O. J. Blanchard and S. Fischer, vol. 4, 351–394. Cambridge, MA: MIT Press.
Stock, J. H., and M. W. Watson. 1991. A probability model of the coincident economic indicators. In Leading Economic
Indicators: New Approaches and Forecasting Records, ed. K. Lahiri and G. H. Moore, 63–89. Cambridge: Cambridge
University Press.
Watson, M. W., and R. F. Engle. 1983. Alternative algorithms for the estimation of dynamic factor, MIMIC and
varying coefficient regression models. Journal of Econometrics 23: 385–400.

Also see
[TS] dfactor postestimation Postestimation tools for dfactor
[TS] arima ARIMA, ARMAX, and other dynamic regression models
[TS] sspace State-space models
[TS] tsset Declare data to be time-series data
[TS] var Vector autoregressive models
[R] regress Linear regression
[R] sureg Zellner's seemingly unrelated regression
[U] 20 Estimation and postestimation commands

Title
dfactor postestimation Postestimation tools for dfactor
Description    Syntax for predict    Menu for predict    Options for predict
Remarks and examples    Methods and formulas    Also see


Description
The following standard postestimation commands are available after dfactor:
Command            Description
estat ic           Akaike's and Schwarz's Bayesian information criteria (AIC and BIC)
estat summarize    summary statistics for the estimation sample
estat vce          variance-covariance matrix of the estimators (VCE)
estimates          cataloging estimation results
forecast           dynamic forecasts and simulations
lincom             point estimates, standard errors, testing, and inference for linear combinations
                   of coefficients
lrtest             likelihood-ratio test
nlcom              point estimates, standard errors, testing, and inference for nonlinear combinations
                   of coefficients
predict            predictions, residuals, influence statistics, and other diagnostic measures
predictnl          point estimates, standard errors, testing, and inference for generalized predictions
test               Wald tests of simple and composite linear hypotheses
testnl             Wald tests of nonlinear hypotheses

Syntax for predict


predict [type] {stub* | newvarlist} [if] [in] [, statistic options]

statistic         Description
Main
  y               dependent variable, which is xbf + residuals
  xb              linear predictions using the observable independent variables
  xbf             linear predictions using the observable independent variables plus the factor
                  contributions
  factors         unobserved factor variables
  residuals       autocorrelated disturbances
  innovations     innovations, the observed dependent variable minus the predicted y

These statistics are available both in and out of sample; type predict ... if e(sample) ... if wanted only for
the estimation sample.

options                       Description
Options
  equation(eqnames)           specify name(s) of equation(s) for which predictions are to be made
  rmse(stub* | newvarlist)    put estimated root mean squared errors of predicted objects in new
                              variables
  dynamic(time_constant)      begin dynamic forecast at specified time
Advanced
  smethod(method)             method for predicting unobserved states

method            Description
  onestep         predict using past information
  smooth          predict using all sample information
  filter          predict using past and contemporaneous information

Menu for predict


Statistics > Postestimation > Predictions, residuals, etc.

Options for predict


The mathematical notation used in this section is defined in Description of [TS] dfactor.

Main

y, xb, xbf, factors, residuals, and innovations specify the statistic to be predicted.
y, the default, predicts the dependent variables. The predictions include the contributions of the
unobserved factors, the linear predictions by using the observable independent variables, and
any autocorrelation, $\widehat{P}\widehat{f}_t + \widehat{Q}x_t + \widehat{u}_t$.
xb calculates the linear prediction by using the observable independent variables, $\widehat{Q}x_t$.
xbf calculates the contributions of the unobserved factors plus the linear prediction by using the
observable independent variables, $\widehat{P}\widehat{f}_t + \widehat{Q}x_t$.
factors estimates the unobserved factors,
$\widehat{f}_t = \widehat{R}w_t + \widehat{A}_1\widehat{f}_{t-1} + \widehat{A}_2\widehat{f}_{t-2} + \cdots + \widehat{A}_{t-p}\widehat{f}_{t-p}$.
residuals calculates the autocorrelated residuals,
$\widehat{u}_t = \widehat{C}_1\widehat{u}_{t-1} + \widehat{C}_2\widehat{u}_{t-2} + \cdots + \widehat{C}_{t-q}\widehat{u}_{t-q}$.
innovations calculates the innovations, the observed dependent variable minus the predicted y,
$\widehat{\epsilon}_t = y_t - \widehat{P}\widehat{f}_t - \widehat{Q}x_t - \widehat{u}_t$.


Options

equation(eqnames) specifies the equation(s) for which the predictions are to be calculated.
You specify equation names, such as equation(income consumption) or equation(factor1
factor2), to identify the equations. For the factors statistic, you must specify names of equations
for factors; for all other statistics, you must specify names of equations for observable variables.


If you do not specify equation() and do not specify stub*, the results are the same as if you
had specified the name of the first equation for the predicted statistic.
equation() may not be specified with stub*.
rmse(stub* | newvarlist) puts the root mean squared errors of the predicted objects into the specified
new variables. The root mean squared errors measure the variances due to the disturbances but do
not account for estimation error.
dynamic(time_constant) specifies when predict starts producing dynamic forecasts. The specified
time_constant must be in the scale of the time variable specified in tsset, and the time_constant
must be inside a sample for which observations on the dependent variables are available. For
example, dynamic(tq(2008q4)) causes dynamic predictions to begin in the fourth quarter of
2008, assuming that your time variable is quarterly; see [D] datetime. If the model contains
exogenous variables, they must be present for the whole predicted sample. dynamic() may not
be specified with xb, xbf, innovations, smethod(filter), or smethod(smooth).
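
The options above can be combined. The following is a rough sketch, not taken from the original examples, that pairs a dynamic forecast with its root mean squared error to draw an approximate band. The variable names (Dip_f, Dip_rmse, Dip_lo, Dip_hi) are illustrative, and the 1.96 multiplier assumes approximately normal forecast errors; the band ignores estimation error, as noted for rmse().

```stata
* Sketch only: dynamic forecast with an approximate 95% band
use http://www.stata-press.com/data/r13/dfex, clear
quietly dfactor (D.(ipman income hours unemp) = , noconstant ar(1)) ///
    (f = , ar(1/2))

* Extend the dataset so that out-of-sample periods exist
tsappend, add(6)

* Forecast D.ipman and store the RMSE of the prediction
predict Dip_f, dynamic(tm(2008m12)) equation(D.ipman) rmse(Dip_rmse)

* Assumption: normal forecast errors, hence the 1.96 multiplier
generate Dip_lo = Dip_f - 1.96*Dip_rmse
generate Dip_hi = Dip_f + 1.96*Dip_rmse
tsline Dip_lo Dip_f Dip_hi if month >= tm(2008m1)
```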

Advanced

smethod(method) specifies the method used to predict the unobserved states in the model. smethod()
may not be specified with xb.
smethod(onestep), the default, causes predict to use previous information on the dependent
variables. The Kalman filter is performed on previous periods, but only the one-step predictions
are made for the current period.
smethod(smooth) causes predict to estimate the states at each time period using all the sample
data by the Kalman smoother.
smethod(filter) causes predict to estimate the states at each time period using previous
and contemporaneous data by the Kalman filter. The Kalman filter is performed on previous
periods and the current period. smethod(filter) may be specified only with factors and
residuals.
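
To see how the three methods differ in practice, one could predict the same factor under each and plot the results together. This is a sketch under the assumption that the model of example 2 in [TS] dfactor has been fit; the new variable names are illustrative.

```stata
* Sketch only: compare the three state-prediction methods for the factor
use http://www.stata-press.com/data/r13/dfex, clear
quietly dfactor (D.(ipman income hours unemp) = , noconstant ar(1)) ///
    (f = , ar(1/2))

predict f_onestep if e(sample), factors smethod(onestep)  // past data only
predict f_filter  if e(sample), factors smethod(filter)   // past and current
predict f_smooth  if e(sample), factors smethod(smooth)   // whole sample

tsline f_onestep f_filter f_smooth, legend(rows(3))
```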

Remarks and examples


We assume that you have already read [TS] dfactor. In this entry, we illustrate some of the features
of predict after using dfactor.
dfactor writes the specified model as a state-space model and estimates the parameters by
maximum likelihood. The unobserved factors and the residuals are states in the state-space form of
the model, and they are estimated by the Kalman filter or the Kalman smoother. The smethod()
option controls how these states are estimated.
The Kalman filter or Kalman smoother is run over the specified sample. Changing the sample can
alter the predicted value for a given observation, because the Kalman filter and Kalman smoother are
recursive algorithms.
After estimating the parameters of a dynamic-factor model, there are many quantities of potential
interest. Here we will discuss several of these statistics and illustrate how to use predict to compute
them.

Example 1: One-step, out-of-sample forecasts


Let's begin by estimating the parameters of the dynamic-factor model considered in example 2 in
[TS] dfactor.


. use http://www.stata-press.com/data/r13/dfex
(St. Louis Fed (FRED) macro data)
. dfactor (D.(ipman income hours unemp) = , noconstant ar(1)) (f = , ar(1/2))
(output omitted )

While several of the six statistics computed by predict might be of interest, we will look only at
a few of these statistics for D.ipman. We begin by obtaining one-step predictions in the estimation
sample and a six-month dynamic forecast for D.ipman. The graph of the in-sample predictions
indicates that our model accounts only for a small fraction of the variability in D.ipman.
. tsappend, add(6)
. predict Dipman_f, dynamic(tm(2008m12)) equation(D.ipman)
(option y assumed; fitted values)

. tsline D.ipman Dipman_f if month<=tm(2008m11), lcolor(gs13) xtitle("")
> legend(rows(2))

[Figure: time-series line plot, 1970m1 to 2010m1, of Dipman and "y prediction, Dipman, dynamic(tm(2008m12))"]

Graphing the last year of the sample and the six-month out-of-sample forecast yields

. tsline D.ipman Dipman_f if month>=tm(2008m1), xtitle("") legend(rows(2))

[Figure: time-series line plot, 2008m1 to 2009m4, of Dipman and "y prediction, Dipman, dynamic(tm(2008m12))"]


Example 2: Estimating an unobserved factor


Another common task is to estimate an unobserved factor. We can estimate the unobserved factor
at each time period by using only previous information (the smethod(onestep) option), previous
and contemporaneous information (the smethod(filter) option), or all the sample information (the
smethod(smooth) option). We are interested in the one-step predictive power of the unobserved
factor, so we use the default, smethod(onestep).

. predict fac if e(sample), factor
. tsline D.ipman fac, lcolor(gs10) xtitle("") legend(rows(2))

[Figure: time-series line plot, 1970m1 to 2010m1, of Dipman and "factors, f, onestep"]

Methods and formulas


dfactor estimates the parameters by writing the model in state-space form and using sspace.
Analogously, predict after dfactor uses the methods described in [TS] sspace postestimation. The
unobserved factors and the residuals are states in the state-space form of the model.
See Methods and formulas of [TS] sspace postestimation for how predictions are made after
estimating the parameters of a state-space model.

Also see
[TS] dfactor Dynamic-factor models
[TS] sspace State-space models
[TS] sspace postestimation Postestimation tools for sspace
[U] 20 Estimation and postestimation commands

Title
dfgls DF-GLS unit-root test
Syntax    Menu    Description    Options
Remarks and examples    Stored results    Methods and formulas    Acknowledgments
References    Also see

Syntax
dfgls varname [if] [in] [, options]

options        Description
Main
  maxlag(#)    use # as the highest lag order for Dickey–Fuller GLS regressions
  notrend      series is stationary around a mean instead of around a linear time trend
  ers          present interpolated critical values from Elliott, Rothenberg, and Stock (1996)

You must tsset your data before using dfgls; see [TS] tsset.
varname may contain time-series operators; see [U] 11.4.4 Time-series varlists.

Menu
Statistics > Time series > Tests > DF-GLS test for a unit root

Description
dfgls performs a modified Dickey–Fuller t test for a unit root in which the series has been
transformed by a generalized least-squares regression.

Options


Main

maxlag(#) sets the value of k, the highest lag order for the first-differenced, detrended variable
in the Dickey–Fuller regression. By default, dfgls sets k according to the method proposed by
Schwert (1989); that is, dfgls sets $k_{\max} = \text{floor}[12\{(T + 1)/100\}^{0.25}]$.
notrend specifies that the alternative hypothesis be that the series is stationary around a mean instead
of around a linear time trend. By default, a trend is included.
ers specifies that dfgls should present interpolated critical values from tables presented by Elliott,
Rothenberg, and Stock (1996), which they obtained from simulations. See Critical values under
Methods and formulas for details.


Remarks and examples


dfgls tests for a unit root in a time series. It performs the modified Dickey–Fuller t test (known
as the DF-GLS test) proposed by Elliott, Rothenberg, and Stock (1996). Essentially, the test is an
augmented Dickey–Fuller test, similar to the test performed by Stata's dfuller command, except
that the time series is transformed via a generalized least squares (GLS) regression before performing
the test. Elliott, Rothenberg, and Stock and later studies have shown that this test has significantly
greater power than the previous versions of the augmented Dickey–Fuller test.
dfgls performs the DF-GLS test for the series of models that include 1 to k lags of the
first-differenced, detrended variable, where k can be set by the user or by the method described in
Schwert (1989). Stock and Watson (2011, 644–649) provide an excellent discussion of the approach.
As discussed in [TS] dfuller, the augmented Dickey–Fuller test involves fitting a regression of the
form
$$\Delta y_t = \alpha + \beta y_{t-1} + \delta t + \zeta_1 \Delta y_{t-1} + \zeta_2 \Delta y_{t-2} + \cdots + \zeta_k \Delta y_{t-k} + \epsilon_t$$
and then testing the null hypothesis $H_0\colon \beta = 0$. The DF-GLS test is performed analogously but on
GLS-detrended data. The null hypothesis of the test is that $y_t$ is a random walk, possibly with drift.
There are two possible alternative hypotheses: $y_t$ is stationary about a linear time trend or $y_t$ is
stationary with a possibly nonzero mean but with no linear time trend. The default is to use the
former. To specify the latter alternative, use the notrend option.
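
As a brief sketch of how the options map onto these choices, the commands below run the test under each alternative hypothesis and then with a user-set lag limit. The dataset is the one used in example 1; the value maxlag(4) is an arbitrary choice for illustration.

```stata
* Sketch only: the two alternatives and a user-chosen maximum lag
use http://www.stata-press.com/data/r13/lutkepohl2, clear

* Default: the alternative is stationarity around a linear time trend
dfgls ln_inv

* Alternative is stationarity around a possibly nonzero mean
dfgls ln_inv, notrend

* Cap the lag order at 4 and report the interpolated ERS critical values
dfgls ln_inv, maxlag(4) ers
```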

Example 1
Here we use the German macroeconomic dataset and test whether the natural log of investment
exhibits a unit root. We use the default options with dfgls.
. use http://www.stata-press.com/data/r13/lutkepohl2
(Quarterly SA West German macro data, Bil DM, from Lutkepohl 1993 Table E.1)
. dfgls ln_inv
DF-GLS for ln_inv                                        Number of obs =    80
Maxlag = 11 chosen by Schwert criterion

               DF-GLS tau      1% Critical       5% Critical      10% Critical
  [lags]     Test Statistic        Value             Value             Value
------------------------------------------------------------------------------
    11           -2.925           -3.610            -2.763            -2.489
    10           -2.671           -3.610            -2.798            -2.523
     9           -2.766           -3.610            -2.832            -2.555
     8           -3.259           -3.610            -2.865            -2.587
     7           -3.536           -3.610            -2.898            -2.617
     6           -3.115           -3.610            -2.929            -2.646
     5           -3.054           -3.610            -2.958            -2.674
     4           -3.016           -3.610            -2.986            -2.699
     3           -2.071           -3.610            -3.012            -2.723
     2           -1.675           -3.610            -3.035            -2.744
     1           -1.752           -3.610            -3.055            -2.762

Opt Lag (Ng-Perron seq t) =  7 with RMSE  .0388771
Min SC   = -6.169137 at lag  4 with RMSE  .0398949
Min MAIC = -6.136371 at lag  1 with RMSE  .0440319


The null hypothesis of a unit root is not rejected for lags 1–3, it is rejected at the 10% level for
lags 9–10, and it is rejected at the 5% level for lags 4–8 and 11. For comparison, we also test for
a unit root in the log of investment by using dfuller with two different lag specifications. We need to
use the trend option with dfuller because a trend is not included by default.

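. dfuller ln_inv, lag(4) trend

Augmented Dickey-Fuller test for unit root         Number of obs   =        87

                               ---------- Interpolated Dickey-Fuller ---------
                  Test         1% Critical       5% Critical      10% Critical
               Statistic           Value             Value             Value
------------------------------------------------------------------------------
 Z(t)             -3.133            -4.069            -3.463            -3.158
------------------------------------------------------------------------------
MacKinnon approximate p-value for Z(t) = 0.0987

. dfuller ln_inv, lag(7) trend

Augmented Dickey-Fuller test for unit root         Number of obs   =        84

                               ---------- Interpolated Dickey-Fuller ---------
                  Test         1% Critical       5% Critical      10% Critical
               Statistic           Value             Value             Value
------------------------------------------------------------------------------
 Z(t)             -3.994            -4.075            -3.466            -3.160
------------------------------------------------------------------------------
MacKinnon approximate p-value for Z(t) = 0.0090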

The critical values and the test statistic produced by dfuller with 4 lags do not support rejecting
the null hypothesis, although the MacKinnon approximate p-value is less than 0.1. With 7 lags, the
critical values and the test statistic reject the null hypothesis at the 5% level, and the MacKinnon
approximate p-value is less than 0.01.
That the dfuller results are not as strong as those produced by dfgls is not surprising because
the DF-GLS test with a trend has been shown to be more powerful than the standard augmented
Dickey–Fuller test.

Stored results
If maxlag(0) is specified, dfgls stores the following in r():

Scalars
    r(rmse0)      RMSE
    r(dft0)       DF-GLS statistic

Otherwise, dfgls stores the following in r():

Scalars
    r(maxlag)     highest lag order k
    r(N)          number of observations
    r(sclag)      lag chosen by Schwarz criterion
    r(maiclag)    lag chosen by modified AIC method
    r(optlag)     lag chosen by sequential-t method

Matrices
    r(results)    k, MAIC, SIC, RMSE, and DF-GLS statistics
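
A minimal sketch of retrieving these results after the default run of example 1. Nothing here goes beyond the stored results listed above; the display labels are illustrative.

```stata
* Sketch only: retrieve the lag choices stored by dfgls
use http://www.stata-press.com/data/r13/lutkepohl2, clear
quietly dfgls ln_inv

return list

display "highest lag order k = " r(maxlag)
display "sequential-t lag    = " r(optlag)
display "minimum-SIC lag     = " r(sclag)
display "minimum-MAIC lag    = " r(maiclag)

* One row per lag: k, MAIC, SIC, RMSE, and the DF-GLS statistic
matrix list r(results)
```

The three displayed lags should agree with those reported at the foot of the example 1 output.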

Methods and formulas


dfgls tests for a unit root. There are two possible alternative hypotheses: yt is stationary around
a linear trend or yt is stationary with no linear time trend. Under the first alternative hypothesis, the
DF-GLS test is performed by first estimating the intercept and trend via GLS. The GLS estimation is
performed by generating the new variables, $\widetilde{y}_t$, $x_t$, and $z_t$, where

$$\begin{aligned}
\widetilde{y}_1 &= y_1 \\
\widetilde{y}_t &= y_t - \alpha^* y_{t-1}, \qquad t = 2, \ldots, T \\
x_1 &= 1 \\
x_t &= 1 - \alpha^*, \qquad t = 2, \ldots, T \\
z_1 &= 1 \\
z_t &= t - \alpha^*(t - 1)
\end{aligned}$$

and $\alpha^* = 1 - (13.5/T)$. An OLS regression is then estimated for the equation

$$\widetilde{y}_t = \delta_0 x_t + \delta_1 z_t + \epsilon_t$$

The OLS estimators $\widehat{\delta}_0$ and $\widehat{\delta}_1$ are then used to remove the trend from $y_t$; that is, we generate

$$y^*_t = y_t - (\widehat{\delta}_0 + \widehat{\delta}_1 t)$$

Finally, we perform an augmented Dickey–Fuller test on the transformed variable by fitting the OLS
regression

$$\Delta y^*_t = \alpha + \beta y^*_{t-1} + \sum_{j=1}^{k} \zeta_j \Delta y^*_{t-j} + \epsilon_t \tag{1}$$

and then test the null hypothesis $H_0\colon \beta = 0$ by using tabulated critical values.
To perform the DF-GLS test under the second alternative hypothesis, we proceed as before but
define $\alpha^* = 1 - (7/T)$, eliminate z from the GLS regression, compute $y^*_t = y_t - \widehat{\delta}_0$, fit the augmented
Dickey–Fuller regression by using the newly transformed variable, and perform a test of the null
hypothesis that $\beta = 0$ by using the tabulated critical values.
dfgls reports the DF-GLS statistic and its critical values obtained from the regression in (1) for
$k \in \{1, 2, \ldots, k_{\max}\}$. By default, dfgls sets $k_{\max} = \text{floor}[12\{(T + 1)/100\}^{0.25}]$ as proposed by
Schwert (1989), although you can override this choice with another value. The sample size available
with $k_{\max}$ lags is used in all the regressions. Because there are $k_{\max}$ lags of the first-differenced
series, $k_{\max} + 1$ observations are lost, leaving $T - k_{\max}$ observations. dfgls requires that the sample
of $T + 1$ observations on $y_t = (y_0, y_1, \ldots, y_T)$ have no gaps.
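
The default lag limit can be checked by hand. In this sketch, T + 1 = 92 is inferred from example 1 (80 usable observations plus the 11 + 1 observations lost) rather than read from the dataset.

```stata
* Sketch only: Schwert's rule evaluated for the series in example 1
* Assumption: T + 1 = 92, that is, 80 usable obs + 11 lags + 1
display floor(12*(92/100)^0.25)
```

The expression evaluates to 11, which matches the Maxlag reported in the example 1 output.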
dfgls reports the results of three different methods for choosing which value of k to use. These
are method 1, the Ng–Perron sequential t; method 2, the minimum Schwarz information criterion
(SIC); and method 3, the Ng–Perron modified Akaike information criterion (MAIC). Although the SIC
has a long history in time-series modeling, the Ng–Perron sequential t was developed by Ng and
Perron (1995), and the MAIC was developed by Ng and Perron (2000).
The SIC can be calculated using either the log likelihood or the sum-of-squared errors from a
regression; dfgls uses the latter definition. Specifically, for each k,

$$\text{SIC} = \ln(\widehat{\text{rmse}}) + (k + 1)\frac{\ln(T - k_{\max})}{(T - k_{\max})}$$

where

$$\widehat{\text{rmse}} = \frac{1}{(T - k_{\max})} \sum_{t=k_{\max}+1}^{T} \widehat{e}_t^{\,2}$$

dfgls reports the value of the smallest SIC and the k that produced it.
Ng and Perron (1995) derived a sequential-t algorithm for choosing k:
i. Set n = 0 and run the regression in (1) with all $k_{\max} - n$ lags. If the coefficient on lag
$k_{\max}$ is significantly different from zero at level $\alpha$, choose k to be $k_{\max}$. Otherwise, continue
to ii.
ii. If $n < k_{\max}$, set n = n + 1 and continue to iii. Otherwise, set k = 0 and stop.
iii. Run the regression in (1) with $k_{\max} - n$ lags. If the coefficient on lag $k_{\max} - n$ is
significantly different from zero at level $\alpha$, choose k to be $k_{\max} - n$. Otherwise, return to ii.
Per Ng and Perron (1995), dfgls uses $\alpha$ = 10%. dfgls reports the k selected by this sequential-t
algorithm and the $\widehat{\text{rmse}}$ from the regression.
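Method 3 is based on choosing k to minimize the MAIC. The MAIC is calculated as

$$\text{MAIC}(k) = \ln(\widehat{\text{rmse}}^{\,2}) + \frac{2\{\tau(k) + k\}}{T - k_{\max}}$$

where

$$\tau(k) = \frac{1}{\widehat{\text{rmse}}^{\,2}}\, \widehat{\beta}_0^{\,2} \sum_{t=k_{\max}+1}^{T} \widetilde{y}_t^{\,2}$$

and $\widetilde{y}$ was defined previously.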


Critical values
By default, dfgls uses the 5% and 10% critical values computed from the response surface
analysis of Cheung and Lai (1995). Because Cheung and Lai (1995) did not present results for the
1% case, the 1% critical values are always interpolated from the critical values presented by
Elliott, Rothenberg, and Stock (1996), hereafter ERS.
ERS presented critical values, obtained from simulations, for the DF-GLS test with a linear trend
and showed that the critical values for the mean-only DF-GLS test were the same as those for the ADF
test. If dfgls is run with the ers option, dfgls will present interpolated critical values from these
tables. The method of interpolation is standard. For the trend case, below 50 observations and above
200 there is no interpolation; the values for 50 and $\infty$ are reported from the tables. For a value N
that lies between two values in the table, say, $N_1$ and $N_2$, with corresponding critical values $CV_1$
and $CV_2$, the critical value

$$cv = CV_1 + \frac{N - N_1}{N_1}(CV_2 - CV_1)$$

is presented. The same method is used for the mean-only case, except that interpolation is possible
for values between 50 and 500.


Acknowledgments
We thank Christopher F. Baum of the Department of Economics at Boston College and author of
the Stata Press books An Introduction to Modern Econometrics Using Stata and An Introduction to
Stata Programming and Richard Sperling for a previous version of dfgls.

References
Cheung, Y.-W., and K. S. Lai. 1995. Lag order and critical values of a modified Dickey–Fuller test. Oxford Bulletin
of Economics and Statistics 57: 411–419.
Dickey, D. A., and W. A. Fuller. 1979. Distribution of the estimators for autoregressive time series with a unit root.
Journal of the American Statistical Association 74: 427–431.
Elliott, G. R., T. J. Rothenberg, and J. H. Stock. 1996. Efficient tests for an autoregressive unit root. Econometrica
64: 813–836.
Ng, S., and P. Perron. 1995. Unit root tests in ARMA models with data-dependent methods for the selection of the
truncation lag. Journal of the American Statistical Association 90: 268–281.
Ng, S., and P. Perron. 2000. Lag length selection and the construction of unit root tests with good size and power.
Econometrica 69: 1519–1554.
Schwert, G. W. 1989. Tests for unit roots: A Monte Carlo investigation. Journal of Business and Economic Statistics
7: 147–159.
Stock, J. H., and M. W. Watson. 2011. Introduction to Econometrics. 3rd ed. Boston: Addison–Wesley.

Also see
[TS] dfuller Augmented Dickey–Fuller unit-root test
[TS] pperron Phillips–Perron unit-root test
[TS] tsset Declare data to be time-series data
[XT] xtunitroot Panel-data unit-root tests

Title
dfuller Augmented Dickey–Fuller unit-root test
Syntax    Menu    Description    Options
Remarks and examples    Stored results    Methods and formulas    References
Also see

Syntax
dfuller varname [if] [in] [, options]

options        Description
Main
  noconstant   suppress constant term in regression
  trend        include trend term in regression
  drift        include drift term in regression
  regress      display regression table
  lags(#)      include # lagged differences

You must tsset your data before using dfuller; see [TS] tsset.
varname may contain time-series operators; see [U] 11.4.4 Time-series varlists.

Menu
Statistics > Time series > Tests > Augmented Dickey-Fuller unit-root test

Description
dfuller performs the augmented Dickey–Fuller test that a variable follows a unit-root process.
The null hypothesis is that the variable contains a unit root, and the alternative is that the variable
was generated by a stationary process. You may optionally exclude the constant, include a trend term,
and include lagged values of the difference of the variable in the regression.

Options


Main

noconstant suppresses the constant term (intercept) in the model and indicates that the process
under the null hypothesis is a random walk without drift. noconstant cannot be used with the
trend or drift option.
trend specifies that a trend term be included in the associated regression and that the process under
the null hypothesis is a random walk, perhaps with drift. This option may not be used with the
noconstant or drift option.
drift indicates that the process under the null hypothesis is a random walk with nonzero drift. This
option may not be used with the noconstant or trend option.
regress specifies that the associated regression table appear in the output. By default, the regression
table is not produced.
lags(#) specifies the number of lagged difference terms to include in the covariate list.

Remarks and examples


Dickey and Fuller (1979) developed a procedure for testing whether a variable has a unit root or,
equivalently, that the variable follows a random walk. Hamilton (1994, 528–529) describes the four
different cases to which the augmented Dickey–Fuller test can be applied. The null hypothesis is
always that the variable has a unit root. The cases differ in whether the null hypothesis includes a
drift term and whether the regression used to obtain the test statistic includes a constant term and
time trend. Becketti (2013, chap. 9) provides additional examples showing how to conduct these tests.
The true model is assumed to be

$$y_t = \alpha + y_{t-1} + u_t$$

where $u_t$ is an independently and identically distributed zero-mean error term. In cases one and two,
presumably $\alpha = 0$, which is a random walk without drift. In cases three and four, we allow for a
drift term by letting $\alpha$ be unrestricted.
The Dickey–Fuller test involves fitting the model

$$y_t = \alpha + \rho y_{t-1} + \delta t + u_t$$

by ordinary least squares (OLS), perhaps setting $\alpha = 0$ or $\delta = 0$. However, such a regression is likely
to be plagued by serial correlation. To control for that, the augmented Dickey–Fuller test instead fits
a model of the form

$$\Delta y_t = \alpha + \beta y_{t-1} + \delta t + \zeta_1 \Delta y_{t-1} + \zeta_2 \Delta y_{t-2} + \cdots + \zeta_k \Delta y_{t-k} + \epsilon_t \tag{1}$$

where $k$ is the number of lags specified in the lags() option. The noconstant option removes the
constant term $\alpha$ from this regression, and the trend option includes the time trend $\delta t$, which by
default is not included. Because $\beta = \rho - 1$, testing $\beta = 0$ is equivalent to testing $\rho = 1$, or,
equivalently, that $y_t$ follows a unit root process.
In the first case, the null hypothesis is that $y_t$ follows a random walk without drift, and (1) is fit
without the constant term $\alpha$ and the time trend $\delta t$. The second case has the same null hypothesis as
the first, except that we include $\alpha$ in the regression. In both cases, the population value of $\alpha$ is zero
under the null hypothesis. In the third case, we hypothesize that $y_t$ follows a unit root with drift, so
that the population value of $\alpha$ is nonzero; we do not include the time trend in the regression. Finally,
in the fourth case, the null hypothesis is that $y_t$ follows a unit root with or without drift so that $\alpha$ is
unrestricted, and we include a time trend in the regression.
The following table summarizes the four cases.

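    Case   Process under null hypothesis        Regression restrictions       dfuller option
    ------------------------------------------------------------------------------------------
    1      Random walk without drift            $\alpha = 0$, $\delta = 0$    noconstant
    2      Random walk without drift            $\delta = 0$                  (default)
    3      Random walk with drift               $\delta = 0$                  drift
    4      Random walk with or without drift    (none)                        trend
    ------------------------------------------------------------------------------------------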

Except in the third case, the t-statistic used to test $H_0$: $\beta = 0$ does not have a standard distribution.
Hamilton (1994, chap. 17) derives the limiting distributions, which are different for each of the
three other cases. The critical values reported by dfuller are interpolated based on the tables in
Fuller (1996). MacKinnon (1994) shows how to approximate the p-values on the basis of a regression
surface, and dfuller also reports that p-value. In the third case, where the regression includes a
constant term and under the null hypothesis the series has a nonzero drift parameter $\alpha$, the t statistic
has the usual t distribution; dfuller reports the one-sided critical values and p-value for the test of
$H_0$ against the alternative $H_a$: $\beta < 0$, which is equivalent to $\rho < 1$.


Deciding which case to use involves a combination of theory and visual inspection of the data.
If economic theory favors a particular null hypothesis, the appropriate case can be chosen based on
that. If a graph of the data shows an upward trend over time, then case four may be preferred. If the
data do not show a trend but do have a nonzero mean, then case two would be a valid alternative.
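The four cases correspond to four ways of calling dfuller. The following is a minimal sketch, meant to be run from a do-file; it assumes the airline dataset used in this entry, and the choice of lags(3) is purely illustrative rather than a recommendation.

```stata
* Sketch: one dfuller call per case (dataset and lag length are illustrative)
use http://www.stata-press.com/data/r13/air2, clear

dfuller air, lags(3) noconstant   // case 1: no constant, no trend
dfuller air, lags(3)              // case 2: constant only (the default)
dfuller air, lags(3) drift        // case 3: constant, null includes drift
dfuller air, lags(3) trend        // case 4: constant and time trend
```

Only one of noconstant, drift, and trend may be specified in a given call, so each case needs its own command.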

Example 1
In this example, we examine the international airline passengers dataset from Box, Jenkins, and
Reinsel (2008, Series G). This dataset has 144 observations on the monthly number of international
airline passengers from 1949 through 1960. Because the data show a clear upward trend, we use the
trend option with dfuller to include a constant and time trend in the augmented Dickey–Fuller
regression.
. use http://www.stata-press.com/data/r13/air2
(TIMESLAB: Airline passengers)
. dfuller air, lags(3) trend regress
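Augmented Dickey-Fuller test for unit root         Number of obs   =       140
                               ---------- Interpolated Dickey-Fuller ---------
                  Test         1% Critical       5% Critical      10% Critical
               Statistic           Value             Value             Value
------------------------------------------------------------------------------
 Z(t)             -6.936            -4.027            -3.445            -3.145
------------------------------------------------------------------------------
MacKinnon approximate p-value for Z(t) = 0.0000
------------------------------------------------------------------------------
       D.air |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         air |
         L1. |  -.5217089   .0752195    -6.94   0.000      -.67048   -.3729379
         LD. |   .5572871   .0799894     6.97   0.000      .399082    .7154923
        L2D. |    .095912   .0876692     1.09   0.276    -.0774825    .2693065
        L3D. |     .14511   .0879922     1.65   0.101    -.0289232    .3191433
      _trend |   1.407534   .2098378     6.71   0.000     .9925118    1.822557
       _cons |   44.49164    7.78335     5.72   0.000     29.09753    59.88575
------------------------------------------------------------------------------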

Here we can overwhelmingly reject the null hypothesis of a unit root at all common significance
levels. From the regression output, the estimated $\beta$ of $-0.522$ implies that $\rho = (1 - 0.522) = 0.478$.
Experiments with fewer or more lags in the augmented regression yield the same conclusion.
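One way to run such an experiment is to loop over lag lengths and print the stored results after each call. This is a sketch only: the range of 0 to 6 lags is an arbitrary choice, and the loop assumes the air2 dataset from this example.

```stata
* Sketch: rerun the test for several lag lengths (range 0/6 is arbitrary)
use http://www.stata-press.com/data/r13/air2, clear

forvalues k = 0/6 {
    quietly dfuller air, lags(`k') trend
    display "lags = " r(lags) "   Z(t) = " %7.3f r(Zt) "   p-value = " %6.4f r(p)
}
```

Each line of output can then be compared with the critical values that dfuller reports for the trend case.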

Example 2
In this example, we use the German macroeconomic dataset to determine whether the log of
consumption follows a unit root. We will again use the trend option, because consumption grows
over time.



. use http://www.stata-press.com/data/r13/lutkepohl2
(Quarterly SA West German macro data, Bil DM, from Lutkepohl 1993 Table E.1)
. tsset qtr
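        time variable:  qtr, 1960q1 to 1982q4
                delta:  1 quarter
. dfuller ln_consump, lags(4) trend
Augmented Dickey-Fuller test for unit root         Number of obs   =        87
                               ---------- Interpolated Dickey-Fuller ---------
                  Test         1% Critical       5% Critical      10% Critical
               Statistic           Value             Value             Value
------------------------------------------------------------------------------
 Z(t)             -1.318            -4.069            -3.463            -3.158
------------------------------------------------------------------------------
MacKinnon approximate p-value for Z(t) = 0.8834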

As we might expect from economic theory, here we cannot reject the null hypothesis that log
consumption exhibits a unit root. Again, using different numbers of lag terms yields the same conclusion.

Stored results
dfuller stores the following in r():
Scalars
    r(N)       number of observations
    r(lags)    number of lagged differences
    r(Zt)      Dickey–Fuller test statistic
    r(p)       MacKinnon approximate p-value (if there is a constant or trend in associated regression)
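The stored scalars make it possible to act on the test result in a do-file. The sketch below copies them into scalars with hypothetical names (zt and pval) and prints a message based on a 5% threshold, which is an illustrative choice; it assumes the lutkepohl2 dataset from example 2.

```stata
* Sketch: use r(Zt) and r(p) programmatically (scalar names are hypothetical)
use http://www.stata-press.com/data/r13/lutkepohl2, clear
quietly dfuller ln_consump, lags(4) trend

scalar zt   = r(Zt)
scalar pval = r(p)
display "Z(t) = " %7.3f zt "   approximate p-value = " %6.4f pval

if pval < 0.05 {
    display "unit-root null rejected at the 5% level"
}
else {
    display "unit-root null not rejected at the 5% level"
}
```

Copying r() results right away is a defensive habit, because the next r-class command replaces them.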

Methods and formulas


In the OLS estimation of an AR(1) process with Gaussian errors,

$$y_t = \rho y_{t-1} + \epsilon_t$$

where $\epsilon_t$ are independently and identically distributed as $N(0, \sigma^2)$ and $y_0 = 0$, the OLS estimate
(based on an n-observation time series) of the autocorrelation parameter $\rho$ is given by

$$\widehat{\rho}_n = \frac{\sum_{t=1}^{n} y_{t-1} y_t}{\sum_{t=1}^{n} y_t^2}$$

If $|\rho| < 1$, then

$$\sqrt{n}\,(\widehat{\rho}_n - \rho) \rightarrow N(0, 1 - \rho^2)$$

If this result were valid when $\rho = 1$, the resulting distribution would have a variance of zero. When
$\rho = 1$, the OLS estimate $\widehat{\rho}$ still converges in probability to one, though we need to find a suitable
nondegenerate distribution so that we can perform hypothesis tests of $H_0$: $\rho = 1$. Hamilton (1994,
chap. 17) provides a superb exposition of the requisite theory.


To compute the test statistics, we fit the augmented Dickey–Fuller regression

$$\Delta y_t = \alpha + \beta y_{t-1} + \delta t + \sum_{j=1}^{k} \zeta_j \Delta y_{t-j} + e_t$$

via OLS where, depending on the options specified, the constant term $\alpha$ or time trend $\delta t$ is omitted
and $k$ is the number of lags specified in the lags() option. The test statistic for $H_0$: $\beta = 0$ is
$Z_t = \widehat{\beta}/\widehat{\sigma}_\beta$, where $\widehat{\sigma}_\beta$ is the standard error of $\widehat{\beta}$.
The critical values included in the output are linearly interpolated from the table of values that
appears in Fuller (1996), and the MacKinnon approximate p-values use the regression surface published
in MacKinnon (1994).
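The test statistic can be reproduced by fitting the regression directly. The sketch below does this for the trend case with three lags on the air2 data; the trend variable tt is a hypothetical name created here from the observation number, which assumes the data are sorted by the time variable and have no gaps.

```stata
* Sketch: reproduce Z(t) with regress (tt is a hypothetical trend variable)
use http://www.stata-press.com/data/r13/air2, clear
generate tt = _n

regress D.air L.air LD.air L2D.air L3D.air tt
display "Z(t) by hand = " %7.3f _b[L.air]/_se[L.air]

* For comparison, the same statistic as reported by dfuller
dfuller air, lags(3) trend
```

The ratio should agree with the Z(t) that dfuller reports, but the ordinary p-value printed by regress for L.air should not be used, because the statistic does not follow a t distribution under the null in this case.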


David Alan Dickey (1945– ) was born in Ohio and obtained degrees in mathematics at Miami
University and a PhD in statistics at Iowa State University in 1976 as a student of Wayne Fuller.
He works at North Carolina State University and specializes in time-series analysis.

Wayne Arthur Fuller (1931– ) was born in Iowa, obtained three degrees at Iowa State University,
and then served on the faculty between 1959 and 2001. He has made many distinguished
contributions to time series, measurement-error models, survey sampling, and econometrics.

References
Becketti, S. 2013. Introduction to Time Series Using Stata. College Station, TX: Stata Press.
Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. 2008. Time Series Analysis: Forecasting and Control. 4th ed.
Hoboken, NJ: Wiley.
Dickey, D. A., and W. A. Fuller. 1979. Distribution of the estimators for autoregressive time series with a unit root.
Journal of the American Statistical Association 74: 427–431.
Fuller, W. A. 1996. Introduction to Statistical Time Series. 2nd ed. New York: Wiley.
Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
MacKinnon, J. G. 1994. Approximate asymptotic distribution functions for unit-root and cointegration tests. Journal
of Business and Economic Statistics 12: 167–176.
Newton, H. J. 1988. TIMESLAB: A Time Series Analysis Laboratory. Belmont, CA: Wadsworth.

Also see
[TS] tsset – Declare data to be time-series data
[TS] dfgls – DF-GLS unit-root test
[TS] pperron – Phillips–Perron unit-root test
[XT] xtunitroot – Panel-data unit-root tests

Title
estat acplot – Plot parametric autocorrelation and autocovariance functions

Syntax    Menu for estat    Description    Options
Remarks and examples    Methods and formulas    References    Also see

Syntax
    estat acplot [, options]

    options                     Description
    ------------------------------------------------------------------------------------
    saving(filename[, ...])     save results to filename; save variables in double
                                  precision; save variables with prefix stubname
    level(#)                    set confidence level; default is level(95)
    lags(#)                     use # autocorrelations
    covariance                  calculate autocovariances; the default is to calculate
                                  autocorrelations
    smemory                     report short-memory ACF; only allowed after arfima

    CI plot
    ciopts(rcap_options)        affect rendition of the confidence bands

    Plot
    marker_options              change look of markers (color, size, etc.)
    marker_label_options        add marker labels; change look or position
    cline_options               affect rendition of the plotted points

    Y axis, X axis, Titles, Legend, Overall
    twoway_options              any options other than by() documented in
                                  [G-3] twoway_options
    ------------------------------------------------------------------------------------

Menu for estat


Statistics > Postestimation > Reports and statistics

Description
estat acplot plots the estimated autocorrelation and autocovariance functions of a stationary
process using the parameters of a previously fit parametric model.
estat acplot is available after arima and arfima; see [TS] arima and [TS] arfima.

Options


saving(filename[, suboptions]) creates a Stata data file (.dta file) consisting of the autocorrelation
estimates, standard errors, and confidence bounds.
Five variables are saved: lag (lag number), ac (autocorrelation estimate), se (standard error),
ci_l (lower confidence bound), and ci_u (upper confidence bound).

double specifies that the variables be saved as doubles, meaning 8-byte reals. By default, they
are saved as floats, meaning 4-byte reals.
name(stubname) specifies that variables be saved with prefix stubname.
replace indicates that filename be overwritten if it exists.
level(#) specifies the confidence level, as a percentage, for confidence intervals. The default is
level(95) or as set by set level; see [R] level.
lags(#) specifies the number of autocorrelations to calculate. The default is to use
min{floor(n/2) − 2, 40}, where floor(n/2) is the greatest integer less than or equal to n/2 and
n is the number of observations.
covariance specifies the calculation of autocovariances instead of the default autocorrelations.
smemory specifies that the ARFIMA fractional integration parameter be ignored. The computed autocorrelations are for the short-memory ARMA component of the model. This option is allowed only
after arfima.

CI plot

ciopts(rcap_options) affects the rendition of the confidence bands; see [G-3] rcap_options.

Plot

marker_options affect the rendition of markers drawn at the plotted points, including their shape,
size, color, and outline; see [G-3] marker_options.
marker_label_options specify if and how the markers are to be labeled; see [G-3] marker_label_options.
cline_options affect whether lines connect the plotted points and the rendition of those lines; see
[G-3] cline_options.

Y axis, X axis, Titles, Legend, Overall

twoway_options are any of the options documented in [G-3] twoway_options, except by(). These
include options for titling the graph (see [G-3] title_options) and options for saving the graph to
disk (see [G-3] saving_option).
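As an illustration of how several of these options combine, the sketch below fits the ARIMA(1,1,1) model for the wpi1 data and saves the plotted estimates to a file. The filename acf_wpi is hypothetical, and the choices of 20 lags and a 90% level are arbitrary.

```stata
* Sketch: save the parametric ACF estimates (acf_wpi is a hypothetical filename)
use http://www.stata-press.com/data/r13/wpi1, clear
quietly arima wpi, arima(1,1,1)

estat acplot, lags(20) level(90) saving(acf_wpi, replace double)

* Inspect what was written to disk
use acf_wpi, clear
describe
```

Running describe on the saved file is a quick way to confirm which variables were written and that they were stored as doubles.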

Remarks and examples


In the time-domain representation, the dependent variable evolves over time because of random shocks. The autocovariances $\gamma_j$, $j \in \{0, 1, \ldots, \infty\}$, of a covariance-stationary process $y_t$ specify its
variance and dependence structure, and the autocorrelations $\rho_j$, $j \in \{1, 2, \ldots, \infty\}$, provide a scale-free measure of the dependence structure of $y_t$. The autocorrelation at lag $j$ specifies whether realizations
at time $t$ and realizations at time $t - j$ are positively related, unrelated, or negatively related. estat
acplot uses the estimated parameters of a parametric model to estimate and plot the autocorrelations
and autocovariances of a stationary process.


Example 1
In example 1 of [TS] arima, we fit an ARIMA(1,1,1) model of the U.S. Wholesale Price Index
(WPI) using quarterly data over the period 1960q1 through 1990q4.
. use http://www.stata-press.com/data/r13/wpi1
. arima wpi, arima(1,1,1)
(setting optimization to BHHH)
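Iteration 0:   log likelihood = -139.80133
Iteration 1:   log likelihood =  -135.6278
Iteration 2:   log likelihood = -135.41838
Iteration 3:   log likelihood = -135.36691
Iteration 4:   log likelihood = -135.35892
(switching optimization to BFGS)
Iteration 5:   log likelihood = -135.35471
Iteration 6:   log likelihood = -135.35135
Iteration 7:   log likelihood = -135.35132
Iteration 8:   log likelihood = -135.35131
ARIMA regression
Sample:  1960q2 - 1990q4                        Number of obs      =       123
                                                Wald chi2(2)       =    310.64
Log likelihood = -135.3513                      Prob > chi2        =    0.0000
------------------------------------------------------------------------------
             |                 OPG
       D.wpi |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
wpi          |
       _cons |   .7498197   .3340968     2.24   0.025     .0950019    1.404637
-------------+----------------------------------------------------------------
ARMA         |
          ar |
         L1. |   .8742288   .0545435    16.03   0.000     .7673256     .981132
             |
          ma |
         L1. |  -.4120458   .1000284    -4.12   0.000    -.6080979   -.2159938
-------------+----------------------------------------------------------------
      /sigma |   .7250436   .0368065    19.70   0.000     .6529042    .7971829
------------------------------------------------------------------------------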

Note: The test of the variance against zero is one sided, and the two-sided
confidence interval is truncated at zero.

Now we use estat acplot to estimate the autocorrelations implied by the estimated ARMA
parameters. We include lags(50) to indicate that autocorrelations be computed for 50 lags. By
default, a 95% confidence interval is provided for each autocorrelation.


. estat acplot, lags(50)

(Figure: Parametric autocorrelations of D.wpi with 95% confidence intervals; y axis: Autocorrelations; x axis: quarterly lag, 0 through 50)

The graph is similar to a typical autocorrelation function of an AR(1) process with a positive
coefficient. The autocorrelations of a stationary AR(1) process decay exponentially toward zero.
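The resemblance to an AR(1) pattern can be checked with the standard autocorrelation formulas for a stationary ARMA(1,1) process. These are a textbook result rather than something stated in this entry; here $\phi$ denotes the AR coefficient and $\theta$ the MA coefficient, with the MA polynomial written as $1 + \theta L$.

```latex
\rho_1 = \frac{(1 + \phi\theta)(\phi + \theta)}{1 + 2\phi\theta + \theta^2},
\qquad
\rho_j = \phi\,\rho_{j-1} \quad (j \ge 2)

% Plugging in the estimates from the output above,
% \phi = 0.874 and \theta = -0.412:
\rho_1 \approx \frac{(0.640)(0.462)}{0.449} \approx 0.66,
\qquad
\rho_j \approx 0.66 \times 0.874^{\,j-1}
```

After the first lag, the autocorrelations shrink by the factor $\phi$ at each step, which is the geometric decay of an AR(1) process; the MA term affects only the starting value $\rho_1$.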

Methods and formulas


The autocovariance function for ARFIMA models is described in Methods and formulas of [TS] arfima.
The autocovariance function for ARIMA models is obtained by setting the fractional difference parameter
to zero.
Box, Jenkins, and Reinsel (2008) provide excellent descriptions of the autocovariance function for
ARIMA and seasonal ARIMA models. Palma (2007) provides an excellent summary of the autocovariance
function for ARFIMA models.

References
Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. 2008. Time Series Analysis: Forecasting and Control. 4th ed.
Hoboken, NJ: Wiley.
Palma, W. 2007. Long-Memory Time Series: Theory and Methods. Hoboken, NJ: Wiley.

Also see
[TS] arfima Autoregressive fractionally integrated moving-average models
[TS] arima ARIMA, ARMAX, and other dynamic regression models

Title
estat aroots – Check the stability condition of ARIMA estimates

Syntax    Menu for estat    Description    Options    Remarks and examples
Stored results    Methods and formulas    Reference    Also see

Syntax
    estat aroots [, options]

    options                   Description
    ------------------------------------------------------------------------------------
    nograph                   suppress graph of eigenvalues for the companion matrices
    dlabel                    label eigenvalues with the distance from the unit circle
    modlabel                  label eigenvalues with the modulus

    Grid
    nogrid                    suppress polar grid circles
    pgrid([...])              specify radii and appearance of polar grid circles; see
                                Options for details

    Plot
    marker_options            change look of markers (color, size, etc.)

    Reference unit circle
    rlopts(cline_options)     affect rendition of reference unit circle

    Y axis, X axis, Titles, Legend, Overall
    twoway_options            any options other than by() documented in
                                [G-3] twoway_options
    ------------------------------------------------------------------------------------

Menu for estat


Statistics > Postestimation > Reports and statistics

Description
estat aroots checks the eigenvalue stability condition after estimating the parameters of an
ARIMA model using arima. A graph of the eigenvalues of the companion matrices for the AR and
MA polynomials is also produced.

estat aroots is available only after arima; see [TS] arima.

Options
nograph specifies that no graph of the eigenvalues of the companion matrices be drawn.
dlabel labels each eigenvalue with its distance from the unit circle. dlabel cannot be specified
with modlabel.
modlabel labels the eigenvalues with their moduli. modlabel cannot be specified with dlabel.

Grid

nogrid suppresses the polar grid circles.





pgrid([numlist] [, line_options]) determines the radii and appearance of the polar grid circles.
By default, the graph includes nine polar grid circles with radii 0.1, 0.2, . . . , 0.9 that have the grid
line style. The numlist specifies the radii for the polar grid circles. The line_options determine the
appearance of the polar grid circles; see [G-3] line_options. Because the pgrid() option can be
repeated, circles with different radii can have distinct appearances.

Plot

marker_options specify the look of markers. This look includes the marker symbol, the marker size,
and its color and outline; see [G-3] marker_options.

Reference unit circle

rlopts(cline_options) affect the rendition of the reference unit circle; see [G-3] cline_options.

Y axis, X axis, Titles, Legend, Overall

twoway_options are any of the options documented in [G-3] twoway_options, except by(). These
include options for titling the graph (see [G-3] title_options) and for saving the graph to disk (see
[G-3] saving_option).

Remarks and examples


Inference after arima requires that the variable $y_t$ be covariance stationary. The variable $y_t$ is
covariance stationary if its first two moments exist and are time invariant. More explicitly, $y_t$ is
covariance stationary if
1. $E(y_t)$ is finite and not a function of $t$;
2. $\mathrm{Var}(y_t)$ is finite and independent of $t$; and
3. $\mathrm{Cov}(y_t, y_s)$ is a finite function of $|t - s|$ but not of $t$ or $s$ alone.
The stationarity of an ARMA process depends on the autoregressive (AR) parameters. If the inverse
roots of the AR polynomial all lie inside the unit circle, the process is stationary, invertible, and
has an infinite-order moving-average (MA) representation. Hamilton (1994, chap. 1) shows that if
the modulus of each eigenvalue of the matrix $F(\rho)$ is strictly less than 1, the estimated ARMA is
stationary; see Methods and formulas for the definition of the matrix $F(\rho)$.
The MA part of an ARMA process can be rewritten as an infinite-order AR process provided that
the MA process is invertible. Hamilton (1994, chap. 1) shows that if the modulus of each eigenvalue
of the matrix $F(\theta)$ is strictly less than 1, the estimated ARMA is invertible; see Methods and formulas
for the definition of the matrix $F(\theta)$.

Example 1
In this example, we check the stability condition of the SARIMA model that we fit in example 3
of [TS] arima. We begin by reestimating the parameters of the model.
. use http://www.stata-press.com/data/r13/air2
(TIMESLAB: Airline passengers)
. generate lnair = ln(air)



. arima lnair, arima(0,1,1) sarima(0,1,1,12) noconstant
(setting optimization to BHHH)
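Iteration 0:   log likelihood =   223.8437
Iteration 1:   log likelihood =  239.80405
Iteration 2:   log likelihood =  244.10265
Iteration 3:   log likelihood =  244.65895
Iteration 4:   log likelihood =  244.68945
(switching optimization to BFGS)
Iteration 5:   log likelihood =  244.69431
Iteration 6:   log likelihood =  244.69647
Iteration 7:   log likelihood =  244.69651
Iteration 8:   log likelihood =  244.69651
ARIMA regression
Sample:  14 - 144                               Number of obs      =       131
                                                Wald chi2(2)       =     84.53
Log likelihood =  244.6965                      Prob > chi2        =    0.0000
------------------------------------------------------------------------------
             |                 OPG
  DS12.lnair |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ARMA         |
          ma |
         L1. |  -.4018324   .0730307    -5.50   0.000    -.5449698   -.2586949
-------------+----------------------------------------------------------------
ARMA12       |
          ma |
         L1. |  -.5569342   .0963129    -5.78   0.000     -.745704   -.3681644
-------------+----------------------------------------------------------------
      /sigma |   .0367167   .0020132    18.24   0.000     .0327708    .0406625
------------------------------------------------------------------------------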

Note: The test of the variance against zero is one sided, and the two-sided
confidence interval is truncated at zero.

We can now use estat aroots to check the stability condition of the MA part of the model.
. estat aroots
   Eigenvalue stability condition
  +----------------------------------------+
  |        Eigenvalue        |   Modulus   |
  |--------------------------+-------------|
  |   .824798 +  .4761974i   |   .952395   |
  |   .824798 -  .4761974i   |   .952395   |
  |  .9523947                |   .952395   |
  |  -.824798 +  .4761974i   |   .952395   |
  |  -.824798 -  .4761974i   |   .952395   |
  | -.4761974 +   .824798i   |   .952395   |
  | -.4761974 -   .824798i   |   .952395   |
  | 2.776e-16 +  .9523947i   |   .952395   |
  | 2.776e-16 -  .9523947i   |   .952395   |
  |  .4761974 +   .824798i   |   .952395   |
  |  .4761974 -   .824798i   |   .952395   |
  | -.9523947                |   .952395   |
  |  .4018324                |   .401832   |
  +----------------------------------------+
   All the eigenvalues lie inside the unit circle.
   MA parameters satisfy invertibility condition.


(Figure: Inverse roots of MA polynomial; x axis: Real; y axis: Imaginary)

Because the modulus of each eigenvalue is strictly less than 1, the MA process is invertible and
can be represented as an infinite-order AR process.
The graph produced by estat aroots displays the eigenvalues with the real components on the x
axis and the imaginary components on the y axis. The graph indicates visually that these eigenvalues
are just inside the unit circle.
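The pattern in the eigenvalue table can be verified by hand. The check below assumes the multiplicative MA polynomial $(1 + \theta_1 L)(1 + \Theta_1 L^{12})$ with the estimates $\theta_1 = -0.4018$ and $\Theta_1 = -0.5569$ from the output above; the symbol $\Theta_1$ for the seasonal coefficient is notation introduced here for illustration.

```latex
% Nonseasonal factor: a single real inverse root
\lambda = -\theta_1 = 0.4018

% Seasonal factor: twelve inverse roots solving \lambda^{12} = -\Theta_1
\lambda_k = (0.5569)^{1/12}\, e^{2\pi i k/12}, \qquad k = 0, 1, \ldots, 11,
\qquad |\lambda_k| = (0.5569)^{1/12} \approx 0.9524

% Example, k = 1 (an angle of 30 degrees):
\lambda_1 \approx 0.9524\,(\cos 30^\circ + i \sin 30^\circ) \approx 0.8248 + 0.4762\,i
```

This accounts for the twelve eigenvalues with a common modulus of about 0.9524, evenly spaced around a circle, and the single eigenvalue at 0.4018.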

Stored results
estat aroots stores the following in r():
Matrices
    r(Re_ar)         real part of the eigenvalues of $F(\rho)$
    r(Im_ar)         imaginary part of the eigenvalues of $F(\rho)$
    r(Modulus_ar)    modulus of the eigenvalues of $F(\rho)$
    r(ar)            $F(\rho)$, the AR companion matrix
    r(Re_ma)         real part of the eigenvalues of $F(\theta)$
    r(Im_ma)         imaginary part of the eigenvalues of $F(\theta)$
    r(Modulus_ma)    modulus of the eigenvalues of $F(\theta)$
    r(ma)            $F(\theta)$, the MA companion matrix
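The stored matrices can be copied and listed after the command runs. The sketch below refits the model from example 1 and lists the MA moduli; it assumes the stored matrix is named r(Modulus_ma), with an underscore, and the matrix name M is hypothetical.

```stata
* Sketch: retrieve the MA moduli after estat aroots (M is a hypothetical name)
use http://www.stata-press.com/data/r13/air2, clear
generate lnair = ln(air)
quietly arima lnair, arima(0,1,1) sarima(0,1,1,12) noconstant

estat aroots, nograph
matrix M = r(Modulus_ma)
matrix list M
```

Because this model has no AR terms, only the MA results are retrieved here.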

Methods and formulas


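Recall the general form of the ARMA model,

$$\rho(L^p)(y_t - x_t \beta) = \theta(L^q)\,\epsilon_t$$

where

$$\rho(L^p) = 1 - \rho_1 L - \rho_2 L^2 - \cdots - \rho_p L^p$$
$$\theta(L^q) = 1 + \theta_1 L + \theta_2 L^2 + \cdots + \theta_q L^q$$

and $L^j y_t = y_{t-j}$.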


estat aroots forms the companion matrix

$$F(\gamma) = \begin{pmatrix}
\gamma_1 & \gamma_2 & \cdots & \gamma_{r-1} & \gamma_r \\
1 & 0 & \cdots & 0 & 0 \\
0 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 1 & 0
\end{pmatrix}$$

where $\gamma = \rho$ and $r = p$ for the AR part of ARMA, and $\gamma = \theta$ and $r = q$ for the MA part of
ARMA. aroots obtains the eigenvalues of $F$ by using matrix eigenvalues. The modulus of the
complex eigenvalue $r + ci$ is $\sqrt{r^2 + c^2}$. As shown by Hamilton (1994, chap. 1), a process is stable
and invertible if the modulus of each eigenvalue of $F$ is strictly less than 1.
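The calculation can be mimicked by hand for a small case. The sketch below builds the 2 x 2 companion matrix for hypothetical AR(2) coefficients of 0.5 and 0.3 and computes the moduli of its eigenvalues; the matrix names F, evre, and evim are made up for the illustration, and the code is not a description of how estat aroots is implemented internally.

```stata
* Sketch: companion matrix for hypothetical AR(2) coefficients 0.5 and 0.3
matrix F = (0.5, 0.3 \ 1, 0)

* Real and imaginary parts of the eigenvalues are returned as row vectors
matrix eigenvalues evre evim = F
matrix list evre
matrix list evim

* Modulus of each eigenvalue: sqrt(real^2 + imaginary^2)
display "modulus 1 = " %8.5f sqrt(evre[1,1]^2 + evim[1,1]^2)
display "modulus 2 = " %8.5f sqrt(evre[1,2]^2 + evim[1,2]^2)
```

For these coefficients, the eigenvalues solve $\lambda^2 - 0.5\lambda - 0.3 = 0$, so both are real with moduli of roughly 0.85 and 0.35, and the stability condition holds.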

Reference
Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.

Also see
[TS] arima ARIMA, ARMAX, and other dynamic regression models

Title
fcast compute – Compute dynamic forecasts after var, svar, or vec

Syntax    Menu    Description    Options
Remarks and examples    Methods and formulas    References    Also see

Syntax
After var and svar
    fcast compute prefix [, options1]

After vec
    fcast compute prefix [, options2]

prefix is the prefix appended to the names of the dependent variables to create the names of the
variables holding the dynamic forecasts.

    options1                       Description
    ------------------------------------------------------------------------------------
    Main
      step(#)                      set # periods to forecast; default is step(1)
      dynamic(time_constant)       begin dynamic forecasts at time_constant
      estimates(estname)           use previously stored results estname; default is to
                                     use active results
      replace                      replace existing forecast variables that have the
                                     same prefix

    Std. Errors
      nose                         suppress asymptotic standard errors
      bs                           obtain standard errors from bootstrapped residuals
      bsp                          obtain standard errors from parametric bootstrap
      bscentile                    estimate bounds by using centiles of bootstrapped
                                     dataset
      reps(#)                      perform # bootstrap replications; default is reps(200)
      nodots                       suppress the usual dot after each bootstrap replication
      saving(filename[, replace])  save bootstrap results as filename; use replace to
                                     overwrite existing filename

    Reporting
      level(#)                     set confidence level; default is level(95)
    ------------------------------------------------------------------------------------


    options2                       Description
    ------------------------------------------------------------------------------------
    Main
      step(#)                      set # periods to forecast; default is step(1)
      dynamic(time_constant)       begin dynamic forecasts at time_constant
      estimates(estname)           use previously stored results estname; default is to
                                     use active results
      replace                      replace existing forecast variables that have the
                                     same prefix
      differences                  save dynamic predictions of the first-differenced
                                     variables

    Std. Errors
      nose                         suppress asymptotic standard errors

    Reporting
      level(#)                     set confidence level; default is level(95)
    ------------------------------------------------------------------------------------

Default is to use asymptotic standard errors if no options are specified.


fcast compute can be used only after var, svar, and vec; see [TS] var, [TS] var svar, and [TS] vec.
You must tsset your data before using fcast compute; see [TS] tsset.

Menu
Statistics > Multivariate time series > VEC/VAR forecasts > Compute forecasts (required for graph)

Description
fcast compute produces dynamic forecasts of the dependent variables in a model previously fit
by var, svar, or vec. fcast compute creates new variables and, if necessary, extends the time
frame of the dataset to contain the prediction horizon.

Options


Main

step(#) specifies the number of periods to be forecast. The default is step(1).


dynamic(time_constant) specifies the period to begin the dynamic forecasts. The default is the period
after the last observation in the estimation sample. The dynamic() option accepts either a Stata
date function that returns an integer or an integer that corresponds to a date using the current tsset
format. dynamic() must specify a date in the range of two or more periods into the estimation
sample to one period after the estimation sample.
estimates(estname) specifies that fcast compute use the estimation results stored as estname. By
default, fcast compute uses the active estimation results. See [R] estimates for more information
on manipulating estimation results.
replace causes fcast compute to replace the variables in memory with the specified predictions.
differences specifies that fcast compute also save dynamic predictions of the first-differenced
variables. differences can be specified only with vec estimation results.


Std. Errors

nose specifies that the asymptotic standard errors of the forecasted levels, and thus the asymptotic
confidence intervals for the levels, not be calculated. By default, the asymptotic standard errors
and the asymptotic confidence intervals of the forecasted levels are calculated.
bs specifies that fcast compute use confidence bounds estimated by a simulation method based on
bootstrapping the residuals.
bsp specifies that fcast compute use confidence bounds estimated via simulation in which the
innovations are drawn from a multivariate normal distribution.
bscentile specifies that fcast compute use centiles of the bootstrapped dataset to estimate the
bounds of the confidence intervals. By default, fcast compute uses the estimated standard errors
and the quantiles of the standard normal distribution determined by level().
reps(#) gives the number of repetitions used in the simulations. The default is 200.
nodots specifies that no dots be displayed while obtaining the simulation-based standard errors. By
default, for each replication, a dot is displayed.


saving(filename[, replace]) specifies the name of the file to hold the dataset that contains the
bootstrap replications. The replace option overwrites any file with this name.
replace specifies that filename be overwritten if it exists. This option is not shown in the dialog
box.

Reporting

level(#) specifies the confidence level, as a percentage, for confidence intervals. The default is
level(95) or as set by set level; see [U] 20.7 Specifying the width of confidence intervals.
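As an illustration of the Std. Errors options, the sketch below requests bootstrap-based bounds after var. The prefix bs_, the seed, and the choice of 500 replications are arbitrary, and the model is the one used in example 1 of this entry.

```stata
* Sketch: bootstrap confidence bounds (prefix, seed, and reps are arbitrary)
use http://www.stata-press.com/data/r13/lutkepohl2, clear
quietly var dln_inc dln_consump dln_inv if qtr<tq(1979q1)

set seed 12345
fcast compute bs_, step(8) bs bscentile reps(500) nodots
fcast graph bs_dln_inc, observed
```

Setting the seed first makes the simulated bounds reproducible from one run to the next.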

Remarks and examples


Researchers often use VARs and VECMs to construct dynamic forecasts. fcast compute computes
dynamic forecasts of the dependent variables in a VAR or VECM previously fit by var, svar, or vec.
If you are interested in conditional, one-step-ahead predictions, use predict (see [TS] var, [TS] var
svar, and [TS] vec).
To obtain and analyze dynamic forecasts, you fit a model, use fcast compute to compute the
dynamic forecasts, and use fcast graph to graph the results.

Example 1
Typing
. use http://www.stata-press.com/data/r13/lutkepohl2
. var dln_inc dln_consump dln_inv if qtr<tq(1979q1)
. fcast compute m2_, step(8)
. fcast graph m2_dln_inc m2_dln_inv m2_dln_consump, observed

fits a VAR with two lags, computes eight-step dynamic predictions for each endogenous variable, and
produces the graph

(Figure: three panels titled Forecast for dln_inc, Forecast for dln_consump, and Forecast for dln_inv; each shows the 95% CI, the forecast, and the observed series over 1978q3 through 1980q3)

The graph shows that the model is better at predicting changes in income and investment than in
consumption. The graph also shows how quickly the predictions from the two-lag model settle down
to their mean values.

fcast compute creates new variables in the dataset. If there are K dependent variables in the
previously fitted model, fcast compute generates 4K new variables:

    K new variables that hold the forecasted levels, named by appending the specified prefix to
    the name of the original variable
    K estimated lower bounds for the forecast interval, named by appending the specified prefix
    and the suffix _LB to the name of the original variable
    K estimated upper bounds for the forecast interval, named by appending the specified prefix
    and the suffix _UB to the name of the original variable
    K estimated standard errors of the forecast, named by appending the specified prefix and the
    suffix _SE to the name of the original variable
If you specify options so that fcast compute does not calculate standard errors, the 3K variables
that hold them and the bounds of the confidence intervals are not generated.
If the model previously fit is a VECM, specifying differences generates another K variables
that hold the forecasts of the first differences of the dependent variables, named by appending the
prefix prefixD_ to the name of the original variable.
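
The naming scheme can be checked directly. The sketch below assumes the two-lag VAR from Example 1 and the prefix m2_; with K = 3 dependent variables, 4K = 12 new variables would be expected.

```stata
* Sketch only: inspect the variables that fcast compute creates
use http://www.stata-press.com/data/r13/lutkepohl2, clear
quietly var dln_inc dln_consump dln_inv if qtr<tq(1979q1)
fcast compute m2_, step(8)
* forecasts, lower bounds, upper bounds, and standard errors
describe m2_*
* point forecast, bounds, and standard error for one variable
list qtr m2_dln_inc m2_dln_inc_LB m2_dln_inc_UB m2_dln_inc_SE if m2_dln_inc < .
```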

Example 2
Plots of the forecasts from different models along with the observations from a holdout sample
can provide insight into their relative forecasting performance. Continuing the previous example,

. var dln_inc dln_consump dln_inv if qtr<tq(1979q1), lags(1/6)
(output omitted)
. fcast compute m6_, step(8)
. graph twoway line m6_dln_inv m2_dln_inv dln_inv qtr
> if m6_dln_inv < ., legend(cols(1))

[Figure: line plot against quarter, 1978q4 to 1980q4, of m6_dln_inv, dyn(1979q1); m2_dln_inv, dyn(1979q1); and the first difference of ln_inv.]

The model with six lags predicts changes in investment better than the two-lag model in some periods
but markedly worse in other periods.
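
A visual comparison can be supplemented with a summary measure. The sketch below computes the root mean squared error of each model's investment forecasts over the holdout quarters; the variable names sq2 and sq6 are hypothetical, and RMSE is only one of several reasonable criteria.

```stata
* Sketch only: holdout RMSE for the two-lag and six-lag models
use http://www.stata-press.com/data/r13/lutkepohl2, clear
quietly var dln_inc dln_consump dln_inv if qtr<tq(1979q1)
fcast compute m2_, step(8)
quietly var dln_inc dln_consump dln_inv if qtr<tq(1979q1), lags(1/6)
fcast compute m6_, step(8)
* squared forecast errors over the holdout quarters only
generate double sq2 = (dln_inv - m2_dln_inv)^2 if qtr >= tq(1979q1) & m2_dln_inv < .
generate double sq6 = (dln_inv - m6_dln_inv)^2 if qtr >= tq(1979q1) & m6_dln_inv < .
quietly summarize sq2
display "RMSE, two-lag model: " sqrt(r(mean))
quietly summarize sq6
display "RMSE, six-lag model: " sqrt(r(mean))
```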

Methods and formulas


Predictions after var and svar
A VAR with endogenous variables $y_t$ and exogenous variables $x_t$ can be written as

$$y_t = v + A_1 y_{t-1} + \cdots + A_p y_{t-p} + B x_t + u_t$$

where

    $t = 1, \ldots, T$
    $y_t = (y_{1t}, \ldots, y_{Kt})'$ is a $K \times 1$ random vector,
    the $A_i$ are fixed $(K \times K)$ matrices of parameters,
    $x_t$ is an $(M \times 1)$ vector of exogenous variables,
    $B$ is a $(K \times M)$ matrix of coefficients,
    $v$ is a $(K \times 1)$ vector of fixed parameters, and
    $u_t$ is assumed to be white noise; that is,
        $E(u_t) = 0_K$
        $E(u_t u_t') = \Sigma$
        $E(u_t u_s') = 0_K$ for $t \neq s$

fcast compute will dynamically predict the variables in the vector $y_t$ conditional on p initial values
of the endogenous variables and any exogenous $x_t$. Adopting the notation from Lütkepohl (2005,
402) to fit the case at hand, the optimal h-step-ahead forecast of $y_{t+h}$ conditional on $x_t$ is

$$y_t(h) = \hat{v} + \hat{A}_1 y_t(h-1) + \cdots + \hat{A}_p y_t(h-p) + \hat{B} x_t \tag{1}$$

If there are no exogenous variables, (1) becomes

$$y_t(h) = \hat{v} + \hat{A}_1 y_t(h-1) + \cdots + \hat{A}_p y_t(h-p)$$
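
To make the recursion concrete, consider the simplest case as a worked illustration: a VAR(1) with no exogenous variables, under the usual convention (assumed here) that $y_t(j) = y_{t+j}$ for $j \le 0$. Each forecast feeds into the next:

```latex
\begin{aligned}
y_t(1) &= \hat{v} + \hat{A}_1 y_t \\
y_t(2) &= \hat{v} + \hat{A}_1 y_t(1) = (I_K + \hat{A}_1)\hat{v} + \hat{A}_1^2 y_t \\
y_t(h) &= (I_K + \hat{A}_1 + \cdots + \hat{A}_1^{h-1})\hat{v} + \hat{A}_1^h y_t
\end{aligned}
```

If the estimated VAR is stable, $\hat{A}_1^h$ shrinks toward zero as h grows, so the forecasts approach $(I_K - \hat{A}_1)^{-1}\hat{v}$, which is consistent with dynamic forecasts settling down to mean values.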
When there are no exogenous variables, fcast compute can compute the asymptotic confidence
bounds.
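As shown by Lütkepohl (2005, 204–205), the asymptotic estimator of the covariance matrix of
the prediction error is given by

$$\hat{\Sigma}_{\hat{y}}(h) = \hat{\Sigma}_y(h) + \frac{1}{T}\hat{\Omega}(h) \tag{2}$$

where

$$\hat{\Sigma}_y(h) = \sum_{i=0}^{h-1} \hat{\Phi}_i \hat{\Sigma} \hat{\Phi}_i'$$

$$\hat{\Omega}(h) = \frac{1}{T} \sum_{t=0}^{T} \left\{ \sum_{i=0}^{h-1} Z_t' (\hat{B}')^{h-1-i} \otimes \hat{\Phi}_i \right\} \hat{\Sigma}_{\hat{\beta}} \left\{ \sum_{i=0}^{h-1} Z_t' (\hat{B}')^{h-1-i} \otimes \hat{\Phi}_i \right\}'$$

$$\hat{B} = \begin{bmatrix}
1 & 0 & 0 & \cdots & 0 & 0 \\
\hat{v} & \hat{A}_1 & \hat{A}_2 & \cdots & \hat{A}_{p-1} & \hat{A}_p \\
0 & I_K & 0 & \cdots & 0 & 0 \\
0 & 0 & I_K & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & I_K & 0
\end{bmatrix}$$

$$Z_t = (1, y_t', \ldots, y_{t-p+1}')'$$

$$\hat{\Phi}_0 = I_K$$

$$\hat{\Phi}_i = \sum_{j=1}^{i} \hat{\Phi}_{i-j} \hat{A}_j \qquad i = 1, 2, \ldots$$

$$\hat{A}_j = 0 \quad \text{for } j > p$$

$\hat{\Sigma}$ is the estimate of the covariance matrix of the innovations, and $\hat{\Sigma}_{\hat{\beta}}$ is the estimated VCE of the
coefficients in the VAR. The formula in (2) is general enough to handle the case in which constraints
are placed on the coefficients in the VAR(p).
Equation (2) is made up of two terms. $\hat{\Sigma}_y(h)$ is the estimated mean squared error (MSE) of the
forecast. $\hat{\Sigma}_y(h)$ estimates the error in the forecast arising from the unseen innovations. $T^{-1}\hat{\Omega}(h)$
estimates the error in the forecast that is due to using estimated coefficients instead of the true
coefficients. As the sample size grows, uncertainty with respect to the coefficient estimates decreases,
and $T^{-1}\hat{\Omega}(h)$ goes to zero.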
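
As a worked illustration of the first term only, take a VAR(1), for which the recursion for the $\hat{\Phi}_i$ gives $\hat{\Phi}_i = \hat{A}_1^i$. The scalar values $a = 0.5$ and $\sigma^2 = 1$ below are hypothetical and chosen purely for the arithmetic.

```latex
\begin{aligned}
\hat{\Sigma}_y(1) &= \hat{\Sigma} \\
\hat{\Sigma}_y(2) &= \hat{\Sigma} + \hat{A}_1 \hat{\Sigma} \hat{A}_1' \\
\hat{\Sigma}_y(3) &= \hat{\Sigma} + \hat{A}_1 \hat{\Sigma} \hat{A}_1' + \hat{A}_1^2 \hat{\Sigma} (\hat{A}_1^2)' \\[4pt]
\text{Scalar case, } a = 0.5,\ \sigma^2 = 1: \quad
\hat{\Sigma}_y(1) &= 1, \quad \hat{\Sigma}_y(2) = 1.25, \quad \hat{\Sigma}_y(3) = 1.3125
\end{aligned}
```

In the scalar case the sum converges to $\sigma^2/(1 - a^2) \approx 1.333$, so the innovation part of the forecast-error variance grows with the horizon but levels off for a stable process.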

If $y_t$ is normally distributed, the bounds for the asymptotic $(1-\alpha)100\%$ interval around the
forecast for the kth component of $y_t$, h periods ahead, are

$$\hat{y}_{k,t}(h) \pm z_{(\alpha/2)}\, \hat{\sigma}_k(h) \tag{3}$$

where $\hat{\sigma}_k(h)$ is the square root of the kth diagonal element of $\hat{\Sigma}_{\hat{y}}(h)$.
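
The interval in (3) can be rebuilt by hand from the stored forecasts and standard errors. This sketch assumes the default 95% level and the m2_ prefix from Example 1; the names zcrit, lb_check, and ub_check are hypothetical.

```stata
* Sketch only: reconstruct the asymptotic bounds from forecast and SE
use http://www.stata-press.com/data/r13/lutkepohl2, clear
quietly var dln_inc dln_consump dln_inv if qtr<tq(1979q1)
fcast compute m2_, step(8)
* critical value z(alpha/2) for a 95% interval
scalar zcrit = invnormal(0.975)
generate double lb_check = m2_dln_inc - zcrit*m2_dln_inc_SE
generate double ub_check = m2_dln_inc + zcrit*m2_dln_inc_SE
* the stored and rebuilt bounds should agree up to rounding
list qtr m2_dln_inc_LB lb_check m2_dln_inc_UB ub_check if m2_dln_inc_SE < .
```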
Specifying the bs option causes the standard errors to be computed via simulation, using bootstrapped
residuals. Both var and svar contain estimators for the coefficients of a VAR that are conditional
on the first p observations on the endogenous variables in the data. Similarly, these algorithms
are conditional on the first p observations of the endogenous variables in the data. However, the
simulation-based estimates of the standard errors are also conditional on the estimated coefficients.
The asymptotic standard errors are not conditional on the coefficient estimates because the second
term on the right-hand side of (2) accounts for the uncertainty arising from using estimated parameters.
For a simulation with R repetitions, this method uses the following algorithm:
1. Fit the model and save the estimated coefficients.
2. Use the estimated coefficients to calculate the residuals.
3. Repeat steps 3a–3c R times.
    3a. Draw a simple random sample with replacement of size T + h from the residuals.
        When the tth observation is drawn, all K residuals are selected, preserving any
        contemporaneous correlation among the residuals.
    3b. Use the sampled residuals, p initial values of the endogenous variables, any
        exogenous variables, and the estimated coefficients to construct a new sample
        dataset.
    3c. Save the simulated endogenous variables for the h forecast periods in the bootstrapped
        dataset.
4. For each endogenous variable and each forecast period, the simulated standard error is the
   estimated standard error of the R simulated forecasts. By default, the upper and lower bounds
   of the $(1-\alpha)100\%$ confidence interval are estimated using the simulation-based estimates of
   the standard errors and the normality assumption, as in (3). If the bscentile option is specified,
   the sample centiles for the upper and lower bounds of the R simulated forecasts are used for the
   upper and lower bounds of the confidence intervals.
If the bsp option is specified, a parametric simulation algorithm is used. Specifically, everything
is as above except that 3a is replaced by 3a(bsp) as follows:
    3a(bsp). Draw T + h observations from a multivariate normal distribution with covariance
        matrix $\hat{\Sigma}$.
The algorithm above assumes that h forecast periods come after the original sample of T
observations. If the h forecast periods lie within the original sample, smaller simulated datasets are
sufficient.
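
The three simulation variants described above might be requested as follows. This is a sketch: the seed, the number of repetitions, and the prefixes bs_, bsc_, and bsp_ are arbitrary choices, not recommendations.

```stata
* Sketch only: simulation-based standard errors and bounds
use http://www.stata-press.com/data/r13/lutkepohl2, clear
quietly var dln_inc dln_consump dln_inv if qtr<tq(1979q1)
* fix the seed so that the resampling is reproducible
set seed 20130101
* bootstrapped residuals; bounds from simulated SEs and normality
fcast compute bs_, step(8) bs reps(200) nodots
* bootstrapped residuals; bounds from centiles of simulated forecasts
fcast compute bsc_, step(8) bs bscentile reps(200) nodots
* parametric simulation: innovations drawn from a multivariate normal
fcast compute bsp_, step(8) bsp reps(200) nodots
* compare the resulting standard errors for one variable
list qtr bs_dln_inc_SE bsc_dln_inc_SE bsp_dln_inc_SE if bs_dln_inc_SE < .
```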
Dynamic forecasts after vec
Methods and formulas of [TS] vec discusses how to obtain the one-step predicted differences and
levels. fcast compute uses the previous dynamic predictions as inputs for later dynamic predictions.

Per Lütkepohl (2005, sec. 6.5), fcast compute uses

$$\hat{\Sigma}_{\hat{y}}(h) = \left(\frac{T}{T-d}\right) \sum_{i=0}^{h-1} \hat{\Phi}_i \hat{\Omega} \hat{\Phi}_i'$$

where the $\hat{\Phi}_i$ are the estimated matrices of impulse–response functions, T is the number of observations
in the sample, d is the number of degrees of freedom, and $\hat{\Omega}$ is the estimated cross-equation variance
matrix. The formulas for d and $\hat{\Omega}$ are given in Methods and formulas of [TS] vec.
The estimated standard errors at step h are the square roots of the diagonal elements of $\hat{\Sigma}_{\hat{y}}(h)$.
Per Lütkepohl (2005), the estimated forecast-error variance does not consider parameter uncertainty.
As the sample size gets infinitely large, the importance of parameter uncertainty diminishes to zero.
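
For a VECM, the level forecasts and, optionally, the forecasts of the first differences can be inspected after fcast compute. The sketch below borrows the urates model used in [TS] fcast graph; the prefix v_ is hypothetical.

```stata
* Sketch only: dynamic forecasts of levels and differences after vec
use http://www.stata-press.com/data/r13/urates, clear
quietly vec missouri indiana kentucky illinois if t < tm(2003m7), ///
        trend(rconstant) rank(2) lags(4)
* differences additionally saves forecasts of the first differences
fcast compute v_, step(6) differences
* list every variable created with the v_ prefix
describe v_*
```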

References
Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
Lütkepohl, H. 2005. New Introduction to Multiple Time Series Analysis. New York: Springer.

Also see
[TS] fcast graph Graph forecasts after fcast compute
[TS] var intro Introduction to vector autoregressive models
[TS] vec intro Introduction to vector error-correction models

Title
fcast graph Graph forecasts after fcast compute

Syntax    Menu    Description    Options    Remarks and examples    Also see

Syntax
    fcast graph varlist [if] [in] [, options]

where varlist contains one or more forecasted variables generated by fcast compute.

    options                   Description
    ---------------------------------------------------------------------------------------
    Main
      differences             graph forecasts of the first-differenced variables (vec only)
      noci                    suppress confidence bands
      observed                include observed values of the predicted variables

    Forecast plot
      cline_options           affect rendition of the forecast lines

    CI plot
      ciopts(area_options)    affect rendition of the confidence bands

    Observed plot
      obopts(cline_options)   affect rendition of the observed values

    Y axis, Time axis, Titles, Legend, Overall
      twoway_options          any options other than by() documented in [G-3] twoway_options
      byopts(by_option)       affect appearance of the combined graph; see [G-3] by_option
    ---------------------------------------------------------------------------------------

Menu
Statistics > Multivariate time series > VEC/VAR forecasts > Graph forecasts

Description
fcast graph graphs dynamic forecasts of the endogenous variables from a VAR(p) or VECM that
has already been obtained from fcast compute; see [TS] fcast compute.

Options


Main

differences specifies that the forecasts of the first-differenced variables be graphed. This option is
available only with forecasts computed by fcast compute after vec. The differences option
implies noci.

noci specifies that the confidence intervals be suppressed. By default, the confidence intervals are
included.
observed specifies that observed values of the predicted variables be included in the graph. By
default, observed values are not graphed.

Forecast plot

cline_options affect the rendition of the plotted lines corresponding to the forecast; see
[G-3] cline_options.

CI plot

ciopts(area_options) affects the rendition of the confidence bands for the forecasts; see
[G-3] area_options.

Observed plot

obopts(cline_options) affects the rendition of the observed values of the predicted variables; see
[G-3] cline_options. This option implies the observed option.

Y axis, Time axis, Titles, Legend, Overall

twoway_options are any of the options documented in [G-3] twoway_options, excluding by().
byopts(by_option) are documented in [G-3] by_option. These options affect the appearance of the
combined graph.
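
A few of these options in combination, as a sketch. It assumes forecasts with the m2_ prefix from [TS] fcast compute; the particular suboptions (lpattern(), fcolor(), cols()) are illustrative choices from the cline, area, and by option families rather than recommended settings.

```stata
* Sketch only: common fcast graph option combinations
use http://www.stata-press.com/data/r13/lutkepohl2, clear
quietly var dln_inc dln_consump dln_inv if qtr<tq(1979q1)
fcast compute m2_, step(8)
* forecast line only, without the confidence band
fcast graph m2_dln_inc, noci
* obopts() implies observed; restyle the observed line and the CI band
fcast graph m2_dln_inc m2_dln_inv, obopts(lpattern(dash)) ciopts(fcolor(gs14))
* stack the panels of the combined graph in a single column
fcast graph m2_dln_inc m2_dln_inv, observed byopts(cols(1))
```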

Remarks and examples


fcast graph graphs dynamic forecasts created by fcast compute.

Example 1
In this example, we use a cointegrating VECM to model the state-level unemployment rates in
Missouri, Indiana, Kentucky, and Illinois, and we graph the forecasts against a 6-month holdout
sample.
. use http://www.stata-press.com/data/r13/urates
. vec missouri indiana kentucky illinois if t < tm(2003m7), trend(rconstant)
> rank(2) lags(4)
(output omitted )
. fcast compute m1_, step(6)

. fcast graph m1_missouri m1_indiana m1_kentucky m1_illinois, observed

[Figure: four panels, "Forecast for missouri", "Forecast for indiana", "Forecast for kentucky", and "Forecast for illinois", spanning 2003m6 to 2003m12; each shows the forecast, the 95% CI, and the observed values.]

Because the 95% confidence bands for the predicted unemployment rates in Missouri and Indiana do
not contain all their observed values, the model does not reliably predict these unemployment rates.
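
The same comparison can be made numerically by flagging holdout months in which the observed rate falls outside the bounds. This sketch covers Missouri only; the variable name out_mo is hypothetical.

```stata
* Sketch only: count holdout months outside the 95% bounds
use http://www.stata-press.com/data/r13/urates, clear
quietly vec missouri indiana kentucky illinois if t < tm(2003m7), ///
        trend(rconstant) rank(2) lags(4)
fcast compute m1_, step(6)
* 1 if the observed value lies outside [LB, UB], 0 otherwise
generate byte out_mo = (missouri < m1_missouri_LB | missouri > m1_missouri_UB) ///
        if m1_missouri_SE < . & t >= tm(2003m7)
tabulate out_mo
```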

Also see
[TS] fcast compute Compute dynamic forecasts after var, svar, or vec
[TS] var intro Introduction to vector autoregressive models
[TS] vec intro Introduction to vector error-correction models

Title
forecast Econometric model forecasting

Syntax    Description    Remarks and examples    References    Also see

Syntax
    forecast subcommand ... [, options]

    subcommand      Description
    --------------------------------------------------------------------
    create          create a new model
    estimates       add estimation result to current model
    identity        specify an identity (nonstochastic equation)
    coefvector      specify an equation via a coefficient vector
    exogenous       declare exogenous variables
    solve           obtain one-step-ahead or dynamic forecasts
    adjust          adjust a variable by add factoring, replacing, etc.
    describe        describe a model
    list            list all forecast commands composing current model
    clear           clear current model from memory
    drop            drop forecast variables
    query           check whether a forecast model has been started
    --------------------------------------------------------------------

See [TS] forecast create, [TS] forecast estimates, [TS] forecast identity, [TS] forecast coefvector,
[TS] forecast exogenous, [TS] forecast solve, [TS] forecast adjust, [TS] forecast describe,
[TS] forecast list, [TS] forecast clear, [TS] forecast drop, and [TS] forecast query for details about
these subcommands.

Description
forecast is a suite of commands for obtaining forecasts by solving models, collections of
equations that jointly determine the outcomes of one or more variables. Equations can be stochastic
relationships fit using estimation commands such as regress, ivregress, var, or reg3; or they can
be nonstochastic relationships, called identities, that express one variable as a deterministic function
of other variables. Forecasting models may also include exogenous variables whose values are already
known or determined by factors outside the purview of the system being examined. The forecast
commands can also be used to obtain dynamic forecasts in single-equation models.
The forecast suite lets you incorporate outside information into your forecasts through the use
of add factors and similar devices, and you can specify the future path for some model variables
and obtain forecasts for other variables conditional on that path. Each set of forecast variables has its
own name prefix or suffix, so you can compare forecasts based on alternative scenarios. Confidence
intervals for forecasts can be obtained via stochastic simulation and can incorporate both parameter
uncertainty and additive error terms.
forecast works with both time-series and panel datasets. Time-series datasets may not contain
any gaps, and panel datasets must be strongly balanced.


This manual entry provides an overview of forecasting models and several examples showing how
the forecast commands are used together. See the individual subcommands' manual entries for
detailed discussions of the various options available and specific remarks about those subcommands.

Remarks and examples


A forecasting model is a system of equations that jointly determine the outcomes of one or more
endogenous variables, whereby the term endogenous variables contrasts with exogenous variables,
whose values are not determined by the interplay of the system's equations. A model, in the context
of the forecast commands, consists of
1. zero or more stochastic equations fit using Stata estimation commands and added to the
current model using forecast estimates. These stochastic equations describe the behavior
of endogenous variables.
2. zero or more nonstochastic equations (identities) defined using forecast identity. These
equations often describe the behavior of endogenous variables that are based on accounting
identities or adding-up conditions.
3. zero or more equations stored as coefficient vectors and added to the current model using
forecast coefvector. Typically, you will fit your equations in Stata and use forecast
estimates to add them to the model. forecast coefvector is used to add equations
obtained elsewhere.
4. zero or more exogenous variables declared using forecast exogenous.
5. at least one stochastic equation or identity.
6. optional adjustments to be made to the variables of the model declared using forecast adjust.
One use of adjustments is to produce forecasts under alternative scenarios.
The forecast commands are designed to be easy to use, so without further ado, we dive headfirst
into an example.

Example 1: Klein's model

Example 3 of [R] reg3 shows how to fit Klein's (1950) model of the U.S. economy using the
three-stage least-squares estimator (3SLS). Here we focus on how to make forecasts from that model
once the parameters have been estimated. In Klein's model, there are seven equations that describe
the seven endogenous variables. Three of those equations are stochastic relationships, while the rest
are identities:

$$\begin{aligned}
c_t &= \beta_0 + \beta_1 p_t + \beta_2 p_{t-1} + \beta_3 w_t + \epsilon_{1t} && (1) \\
i_t &= \beta_4 + \beta_5 p_t + \beta_6 p_{t-1} + \beta_7 k_{t-1} + \epsilon_{2t} && (2) \\
wp_t &= \beta_8 + \beta_9 y_t + \beta_{10} y_{t-1} + \beta_{11} yr_t + \epsilon_{3t} && (3) \\
y_t &= c_t + i_t + g_t && (4) \\
p_t &= y_t - t_t - wp_t && (5) \\
k_t &= k_{t-1} + i_t && (6) \\
w_t &= wg_t + wp_t && (7)
\end{aligned}$$

The variables in the model are defined as follows:


    Name    Description                          Type
    ---------------------------------------------------------
    c       Consumption                          endogenous
    p       Private-sector profits               endogenous
    wp      Private-sector wages                 endogenous
    wg      Government-sector wages              exogenous
    w       Total wages                          endogenous
    i       Investment                           endogenous
    k       Capital stock                        endogenous
    y       National income                      endogenous
    g       Government spending                  exogenous
    t       Indirect bus. taxes + net exports    exogenous
    yr      Time trend = Year − 1931             exogenous
    ---------------------------------------------------------

Our model has four exogenous variables: government-sector wages (wg), government spending (g),
a time-trend variable (yr), and, for simplicity, a variable that lumps indirect business taxes and net
exports together (t). To make out-of-sample forecasts, we must populate those variables over the
entire forecast horizon before solving our model. (We use the phrases solve our model and obtain
forecasts from our model interchangeably.)
We will illustrate the entire process of fitting and forecasting our model, though our focus will be
on the latter task. See [R] reg3 for a more in-depth look at fitting models like this one. Before we
solve our model, we first estimate the parameters of the stochastic equations by loading the dataset
and calling reg3:

. use http://www.stata-press.com/data/r13/klein2
. reg3 (c p L.p w) (i p L.p L.k) (wp y L.y yr), endog(w p y) exog(t wg g)

Three-stage least-squares regression

Equation          Obs  Parms        RMSE    "R-sq"       chi2        P
----------------------------------------------------------------------
c                  21      3    .9443305    0.9801     864.59   0.0000
i                  21      3    1.446736    0.8258     162.98   0.0000
wp                 21      3    .7211282    0.9863    1594.75   0.0000
----------------------------------------------------------------------

             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
c            |
           p |
         --. |   .1248904   .1081291     1.16   0.248    -.0870387    .3368194
         L1. |   .1631439   .1004382     1.62   0.104    -.0337113    .3599992
             |
           w |    .790081   .0379379    20.83   0.000      .715724    .8644379
       _cons |   16.44079   1.304549    12.60   0.000     13.88392    18.99766
-------------+----------------------------------------------------------------
i            |
           p |
         --. |  -.0130791   .1618962    -0.08   0.936    -.3303898    .3042316
         L1. |   .7557238   .1529331     4.94   0.000     .4559805    1.055467
             |
           k |
         L1. |  -.1948482   .0325307    -5.99   0.000    -.2586072   -.1310893
             |
       _cons |   28.17785   6.793768     4.15   0.000     14.86231    41.49339
-------------+----------------------------------------------------------------
wp           |
           y |
         --. |   .4004919   .0318134    12.59   0.000     .3381388     .462845
         L1. |    .181291   .0341588     5.31   0.000     .1143411    .2482409
             |
          yr |    .149674   .0279352     5.36   0.000      .094922    .2044261
       _cons |   1.797216   1.115854     1.61   0.107    -.3898181    3.984251
------------------------------------------------------------------------------
Endogenous variables:  c i wp w p y
Exogenous variables:   L.p L.k L.y yr t wg g

The output from reg3 indicates that we have a total of six endogenous variables even though our
model in fact has seven. The discrepancy stems from (6) of our model. The capital stock variable (k)
is a function of the endogenous investment variable and is therefore itself endogenous. However, $k_t$
does not appear in any of our model's stochastic equations, so we did not declare it in the endog()
option of reg3; from a purely estimation perspective, the contemporaneous value of the capital stock
variable is irrelevant, though it does play a role in terms of solving our model. We next store the
estimation results using estimates store:
. estimates store klein

Now we are ready to define our model using the forecast commands. We first tell Stata to
initialize a new model; we will call our model kleinmodel:
. forecast create kleinmodel
Forecast model kleinmodel started.


The name you give the model mainly controls how output from forecast commands is labeled.
More importantly, forecast create creates the internal data structures Stata uses to keep track of
your model.
The next step is to add all the equations to the model. To add the three stochastic equations we
fit using reg3, we use forecast estimates:
. forecast estimates klein
Added estimation results from reg3.
Forecast model kleinmodel now contains 3 endogenous variables.

That command tells Stata to find the estimates stored as klein and add them to our model. forecast
estimates uses those estimation results to determine that there are three endogenous variables (c, i,
and wp), and it will save the estimated parameters and other information that forecast solve will
later need to obtain predictions for those variables. forecast estimates confirmed our request by
reporting that the estimation results added were from reg3.
forecast estimates reports that our forecast model has three endogenous variables because our
reg3 command included three left-hand-side variables. The fact that we specified three additional
endogenous variables in the endog() option of reg3 so that reg3 reports a total of six endogenous
variables is irrelevant to forecast. All that matters is the number of left-hand-side variables in the
model.
We also need to specify the four identities, equations (4) through (7), that determine the other four
endogenous variables in our model. To do that, we use forecast identity:
. forecast identity y = c + i + g
Forecast model kleinmodel now contains 4 endogenous variables.
. forecast identity p = y - t - wp
Forecast model kleinmodel now contains 5 endogenous variables.
. forecast identity k = L.k + i
Forecast model kleinmodel now contains 6 endogenous variables.
. forecast identity w = wg + wp
Forecast model kleinmodel now contains 7 endogenous variables.

You specify identities similarly to how you use the generate command, except that the left-hand-side
variable is an endogenous variable in your model rather than a new variable you want to create in your
dataset. Time-series operators often come in handy when specifying identities; here we expressed
capital, a stock variable, as its previous value plus current-period investment, a flow variable. An
identity defines an endogenous variable, so each time we use forecast identity, the number of
endogenous variables in our forecast model increases by one.
Finally, we will tell Stata about the four exogenous variables. We do that with the forecast
exogenous command:
. forecast exogenous wg
Forecast model kleinmodel now contains 1 declared exogenous variable.
. forecast exogenous g
Forecast model kleinmodel now contains 2 declared exogenous variables.
. forecast exogenous t
Forecast model kleinmodel now contains 3 declared exogenous variables.
. forecast exogenous yr
Forecast model kleinmodel now contains 4 declared exogenous variables.

forecast keeps track of the exogenous variables that you declare using the forecast exogenous
command and reports the number currently in the model. When you later use forecast solve,
forecast verifies that these variables contain nonmissing data over the forecast horizon. In fact, we
could have instead typed


. forecast exogenous wg g t yr

but to avoid confusing ourselves, we prefer to issue one command for each variable in our model.
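
Before solving, it can help to review what the model contains. The sketch below rebuilds the model compactly (declaring all four exogenous variables in one command) and then uses forecast describe and forecast list, both subcommands from the Syntax table above; replace is added only so that the block can be rerun.

```stata
* Sketch only: rebuild Klein's model and review its contents
use http://www.stata-press.com/data/r13/klein2, clear
quietly reg3 (c p L.p w) (i p L.p L.k) (wp y L.y yr), endog(w p y) exog(t wg g)
estimates store klein
forecast create kleinmodel, replace
forecast estimates klein
forecast identity y = c + i + g
forecast identity p = y - t - wp
forecast identity k = L.k + i
forecast identity w = wg + wp
forecast exogenous wg g t yr
* summarize the model, then list the commands that define it
forecast describe
forecast list
```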
Now Stata knows everything it needs to know about the structure of our model. klein2.dta in
memory contains annual observations from 1920 to 1941. Before we make out-of-sample forecasts,
we should first see how well our model works by comparing its forecasts with actual data. There
are a couple of ways to do that. The first is to produce static forecasts. In static forecasts, actual
values of all lagged variables that appear in the model are used. Because actual values will be missing
beyond the last historical time period in the dataset, static forecasts can only forecast one period
into the future (assuming only first lags appear in the model); for that reason, they are often called
one-step-ahead forecasts. To obtain these one-step-ahead forecasts, we type
. forecast solve, prefix(s_) begin(1921) static
Computing static forecasts for model kleinmodel.
Starting period: 1921
Ending period:   1941
Forecast prefix: s_
1921: ............................................
1922: ..............................................
1923: .............................................
(output omitted )
1940: .............................................
1941: ..............................................
Forecast 7 variables spanning 21 periods.

We specified begin(1921) to request that the first year for which forecasts are produced be 1921. Our
model includes variables that are lagged one period; because our data start in 1920, 1921 is the first
year in which we can evaluate all the equations of the model. Had we not specified the begin(1921)
option, forecast solve would have started forecasting in 1941. By default, forecast solve looks
for the earliest time period in which any of the endogenous variables contains a missing value and
begins forecasting in that period. In klein2.dta, k is missing in 1941.
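
To see where the default starting period comes from, one can list the periods in which any endogenous variable is missing. The sketch assumes klein2.dta is in memory and that its time variable is named year, as the graph axes in this entry suggest.

```stata
* Sketch only: find periods with a missing endogenous variable
* (assumes the time variable in klein2.dta is named year)
list year c i wp y p k w if missing(c, i, wp, y, p, k, w)
```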
The header of the output confirms that we requested static forecasts for our model, and it indicates
that it will produce forecasts from 1921 through 1941, the last year in our dataset. By default,
forecast solve produces a status report in which the time period being forecast is displayed along
with a dot for each iteration the equation solver performs. The footer of the output confirms that we
forecast seven endogenous variables for 21 years.
The command we just typed will create seven new variables in our dataset, one for each endogenous
variable, containing the static forecasts. Because we specified prefix(s_), the seven new variables
will be named s_c, s_i, s_wp, s_y, s_p, s_k, and s_w. Here we graph a subset of the variables
and their forecasts:

[Figure: Static Forecasts. Four panels (Total Income, Consumption, Investment, Private Wages) plotted against year, 1920 to 1940. Solid lines denote actual values; dashed lines denote forecast values.]

Our static forecasts appear to fit the data relatively well. Had they not fit well, we would have to go
back and reexamine the specification of our model. If the static forecasts are poor, then the dynamic
forecasts that use previous periods' forecast values are unlikely to work well either. On the other
hand, even if the model produces good static forecasts, it may not produce accurate dynamic forecasts
more than one or two periods into the future.
Another way to check how well a model forecasts is to produce dynamic forecasts for time periods
in which observed values are available. Here we begin dynamic forecasts in 1936, giving us six years
of data with which to compare actual and forecast values, and then graph our results:
. forecast solve, prefix(d_) begin(1936)
Computing dynamic forecasts for model kleinmodel.
Starting period: 1936
Ending period:   1941
Forecast prefix: d_
1936: ............................................
1937: ..........................................
1938: .............................................
1939: .............................................
1940: ............................................
1941: ..............................................
Forecast 7 variables spanning 6 periods.

[Figure: Dynamic Forecasts. Four panels (Total Income, Consumption, Investment, Private Wages) plotted against year, 1920 to 1940. Solid lines denote actual values; dashed lines denote forecast values.]

Most of the in-sample forecasts look okay, though our model was unable to predict the outsized
increase in investment in 1936 and the sharp drop in 1938.
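
The static and dynamic forecasts can also be compared side by side in a listing. This sketch continues the session above, so it assumes the s_ and d_ forecast variables exist and that the time variable is named year; abs_s and abs_d are hypothetical names.

```stata
* Sketch only: investment, static forecast, and dynamic forecast
list year i s_i d_i if year >= 1936
* absolute forecast errors over 1936 through 1941
generate double abs_s = abs(i - s_i) if year >= 1936
generate double abs_d = abs(i - d_i) if year >= 1936
summarize abs_s abs_d
```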

Our first example was particularly easy because all the endogenous variables appeared in levels.
However, oftentimes the endogenous variables are better modeled using mathematical transformations
such as logarithms, first differences, or percentage changes; transformations of the endogenous
variables may appear as explanatory variables in other equations. The next few examples illustrate
these complications.

Example 2: Models with transformed endogenous variables


hardware.dta contains hypothetical quarterly sales data from the Hughes Hardware Company,
a huge regional distributor of building products. Hughes Hardware has three main product lines:
dimensional lumber (dim), sheet goods such as plywood and fiberboard (sheet), and miscellaneous
hardware, including fasteners and hand tools (misc). Based on past experience, we know that
dimensional lumber sales are closely tied to the level of new home construction and that other product
lines' sales can be modeled in terms of the quantity of lumber sold. We are going to use the following
set of equations to model sales of the three product lines:

$$\begin{aligned}
\%\Delta \text{dim}_t &= \beta_{10} + \beta_{11} \ln(\text{starts}_t) + \beta_{12}\, \%\Delta \text{gdp}_t + \beta_{13}\, \text{unrate}_t + \epsilon_{1t} \\
\text{sheet}_t &= \beta_{20} + \beta_{21}\, \text{dim}_t + \beta_{22}\, \%\Delta \text{gdp}_t + \beta_{23}\, \text{unrate}_t + \epsilon_{2t} \\
\text{misc}_t &= \beta_{30} + \beta_{31}\, \text{dim}_t + \beta_{32}\, \%\Delta \text{gdp}_t + \beta_{33}\, \text{unrate}_t + \epsilon_{3t}
\end{aligned}$$

Here $\text{starts}_t$ represents the number of new homes for which construction began in quarter t, $\text{gdp}_t$
denotes real (inflation-adjusted) gross domestic product (GDP), and $\text{unrate}_t$ represents the quarterly
average unemployment rate. Our equation for $\text{dim}_t$ is written in terms of percentage changes from
quarter to quarter rather than in levels, and the percentage change in GDP appears as a regressor in
each equation rather than the level of GDP itself. In our model, these three macroeconomic factors
are exogenous, and here we will reserve the last few years' data to make forecasts; in practice, we
would need to make our own forecasts of these macroeconomic variables or else purchase a forecast.
We will approximate the percentage change variables by taking first-differences of the natural
logarithms of the respective underlying variables. In terms of estimation, this does not present any
challenges. Here we load the dataset into memory, create the necessary log-transformed variables,


and fit the three equations using regress with the data through the end of 2009. We use quietly
to suppress the output from regress to save space, and we store each set of estimation results as
we go. In Stata, we type
. use http://www.stata-press.com/data/r13/hardware, clear
(Hughes Hardware sales data)
. generate lndim = ln(dim)
. generate lngdp = ln(gdp)
. generate lnstarts = ln(starts)
. quietly regress D.lndim lnstarts D.lngdp unrate if qdate <= tq(2009q4)
. estimates store dim
. quietly regress sheet dim D.lngdp unrate if qdate <= tq(2009q4)
. estimates store sheet
. quietly regress misc dim D.lngdp unrate if qdate <= tq(2009q4)
. estimates store misc
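
The regressions above use D.lndim and D.lngdp in place of percentage changes. The reasoning behind that approximation can be sketched as follows, writing the percentage change as a proportion; the 2% and 10% figures are illustrative values only.

```latex
\begin{aligned}
\%\Delta x_t &= \frac{x_t - x_{t-1}}{x_{t-1}} \\
\ln x_t - \ln x_{t-1} &= \ln\!\left(\frac{x_t}{x_{t-1}}\right) = \ln(1 + \%\Delta x_t) \approx \%\Delta x_t
  \quad \text{for small } \%\Delta x_t \\[4pt]
\ln(1.02) &\approx 0.0198, \qquad \ln(1.10) \approx 0.0953
\end{aligned}
```

The approximation is close for changes of a few percent and deteriorates as the change grows.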

The equations for sheet goods and miscellaneous items do not present any challenges for forecast,
so we proceed by creating a new forecast model named salesfcast and adding those two equations:
. forecast create salesfcast, replace
(Forecast model kleinmodel ended.)
Forecast model salesfcast started.
. forecast estimates sheet
Added estimation results from regress.
Forecast model salesfcast now contains 1 endogenous variable.
. forecast estimates misc
Added estimation results from regress.
Forecast model salesfcast now contains 2 endogenous variables.

The equation for dimensional lumber requires more finesse. First, because our dependent variable
contains a time-series operator, we must use the names() option of forecast estimates to specify
a valid name for the endogenous variable being added:
. forecast estimates dim, names(dlndim)
Added estimation results from regress.
Forecast model salesfcast now contains 3 endogenous variables.

We have entered the endogenous variable dlndim into our model, but it represents the left-hand-side
variable of the regression equation we just added. That is, dlndim is the first-difference of the
logarithm of dim, the sales variable we ultimately want to forecast. We can specify an identity to
reverse the first-differencing, providing us with a variable containing the logarithm of dim:
. forecast identity lndim = L.lndim + dlndim
Forecast model salesfcast now contains 4 endogenous variables.

Finally, we can specify another identity to obtain dim from lndim:


. forecast identity dim = exp(lndim)
Forecast model salesfcast now contains 5 endogenous variables.


Now we can solve the model. We will obtain dynamic forecasts starting in the first quarter of
2010, and we will use the log(off) option to suppress the iteration log:
. forecast solve, begin(tq(2010q1)) log(off)
Computing dynamic forecasts for model salesfcast.
Starting period: 2010q1
Ending period: 2012q3
Forecast prefix: f_
Forecast 5 variables spanning 11 periods.

We did not specify the prefix() or suffix() option, so by default, forecast prefixed our forecast
variables with f_. The following graph illustrates our forecasts:
[Figure: Hughes Hardware Sales ($mil.). Three panels (Dimensional Lumber, Sheet Goods, Miscellany) plot forecast and actual values by quarter; the time axis is labeled 2008q1 to 2012q1.]

Our model performed well in 2010, but it did not forecast the pickup in sales that occurred in 2011
and 2012.

Technical note
For more information about working with log-transformed variables, see the second technical note
in [TS] forecast estimates.

The forecast commands can also be used to make forecasts for strongly balanced panel datasets.
A panel dataset is strongly balanced when all the panels have the same number of observations, and
the observations for different panels were all made at the same times. Our next example illustrates
how to produce a forecast with panel data and highlights a couple of key assumptions one must make.

Example 3: Forecasting a panel dataset


In the previous example, we mentioned that Hughes Hardware was a regional distributor of building
products. In fact, Hughes Hardware operates in five states across the southern United States: Texas,
Oklahoma, Louisiana, Arkansas, and Mississippi. The company is in the process of deciding whether
it should open additional distribution centers or move existing ones to new locations. As part of the
process, we need to make sales forecasts for each of the states the company serves.

To make our state-level forecasts, we will use essentially the same model that we did for the
company-wide forecast, though we will also include state-specific effects. The model we will use is

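\begin{align*}
\%\Delta\text{dim}_{it} &= \beta_{10} + \beta_{11}\ln(\text{starts}_{it}) + \beta_{12}\,\text{rgspgrowth}_{it} + \beta_{13}\,\text{unrate}_{it} + u_{1i} + \epsilon_{1it} \\
\text{sheet}_{it} &= \beta_{20} + \beta_{21}\,\text{dim}_{it} + \beta_{22}\,\text{rgspgrowth}_{it} + \beta_{23}\,\text{unrate}_{it} + u_{2i} + \epsilon_{2it} \\
\text{misc}_{it} &= \beta_{30} + \beta_{31}\,\text{dim}_{it} + \beta_{32}\,\text{rgspgrowth}_{it} + \beta_{33}\,\text{unrate}_{it} + u_{3i} + \epsilon_{3it}
\end{align*}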
The subscript i indexes states, and we have replaced the gdp variable that was in our previous model
with rgspgrowth, which measures the annual growth rate in real gross state product (GSP), the
state-level analogue to national GDP. The GSP data are released only annually, so we have replicated
the annual growth rate for all four quarterly observations in a given year. For example, rgspgrowth
is about 5.3 for the four observations for the state of Texas in the year 2007; in 2007, Texas real
GSP was 5.3% higher than in 2006.
The state-level error terms are $u_{1i}$, $u_{2i}$, and $u_{3i}$. Here we will use the fixed-effects estimator and
fit the three equations via xtreg, fe, again using data only through the end of 2009 so that we
can examine how well our model forecasts. Our first task is to fit the three equations and store the
estimation results. At the same time, we will also use predict to obtain the predicted fixed-effects
terms. You will see why in just a moment. Because the regression results are not our primary concern
here, we will use quietly to suppress the output.
In Stata, we type
. use http://www.stata-press.com/data/r13/statehardware, clear
(Hughes state-level sales data)
. generate lndim = ln(dim)
. generate lnstarts = ln(starts)
. quietly xtreg D.lndim lnstarts rgspgrowth unrate if qdate <= tq(2009q4), fe
. predict dlndim_u, u
(45 missing values generated)
. estimates store dim
. quietly xtreg sheet dim rgspgrowth unrate if qdate <= tq(2009q4), fe
. predict sheet_u, u
(40 missing values generated)
. estimates store sheet
. quietly xtreg misc dim rgspgrowth unrate if qdate <= tq(2009q4), fe
. predict misc_u, u
(40 missing values generated)
. estimates store misc

Having fit the model, we are almost ready to make forecasts. First, though, we need to consider
how to handle the state-level error terms. If we simply created a forecast model, added our three
estimation results, then called forecast solve, Stata would forecast $\text{misc}_{it}$, for example, as a
function of $\text{dim}_{it}$, $\text{rgspgrowth}_{it}$, $\text{unrate}_{it}$, and the estimate of the constant term $\beta_{30}$. However,
our model implies that $\text{misc}_{it}$ also depends on $u_{3i}$ and the idiosyncratic error term $\epsilon_{3it}$. We will
ignore the idiosyncratic error for now (but see the discussion of simulations in [TS] forecast solve).
By construction, $u_{3i}$ has a mean of zero when averaged across all panels, but in general, $u_{3i}$ is
nonzero for any individual panel. Therefore, we should include it in our forecasts.
After you fit a model with xtreg, you can predict the panel-specific error component for the
subset of observations in the estimation sample. Typically, xtreg is used in situations where the
number of observations per panel T is modest. In those cases, the estimates of the panel-specific
error components are likely to be noisy (analogous to estimating a sample mean with just a few
observations). Often asymptotic analyses of panel-data estimators assume T is fixed, and in those
cases, the estimators of the panel-specific errors are inconsistent.

However, in forecasting applications, the number of observations per panel is usually larger than
in most other panel-data applications. With enough observations, we can have more confidence in
the estimated panel-specific errors. If we are willing to assume that we have decent estimates of the
panel-specific errors and that those panel-level effects will remain constant over the forecast horizon,
then we can incorporate them into our forecasts. Because predict only provided us with estimates
of the panel-level effects for the estimation sample, we need to extend them into the forecast horizon.
An easy way to do that is to use egen to create a new set of variables:
. by state: egen dlndim_u2 = mean(dlndim_u)
. by state: egen sheet_u2 = mean(sheet_u)
. by state: egen misc_u2 = mean(misc_u)
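To see why taking the panel mean extends the estimates into the forecast horizon, consider the following sketch. It uses a hypothetical two-panel dataset in which u stands in for a predicted panel effect that is missing in the last period of each panel; the variable names and values are invented for illustration.

```stata
* Hypothetical data: u is constant within a panel but missing
* outside the (pretend) estimation sample
clear
input state t u
1 1  0.5
1 2  0.5
1 3  .
2 1 -0.5
2 2 -0.5
2 3  .
end

* mean() ignores missing values, so the panel mean is defined in
* every observation of the panel, including the out-of-sample one
bysort state (t): egen u2 = mean(u)
list, sepby(state)
assert !missing(u2)
```

Because the predicted effect is constant within each panel, its mean equals that constant, and egen simply copies it to the observations where predict left missing values.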

We can use forecast adjust to incorporate these terms into our forecasts. The following commands
define our forecast model, including the estimated panel-specific terms:
. forecast create statemodel, replace
(Forecast model salesfcast ended.)
Forecast model statemodel started.
. forecast estimates dim, name(dlndim)
Added estimation results from xtreg.
Forecast model statemodel now contains 1 endogenous variable.
. forecast adjust dlndim = dlndim + dlndim_u2
Endogenous variable dlndim now has 1 adjustment.
. forecast identity lndim = L.lndim + dlndim
Forecast model statemodel now contains 2 endogenous variables.
. forecast identity dim = exp(lndim)
Forecast model statemodel now contains 3 endogenous variables.
. forecast estimates sheet
Added estimation results from xtreg.
Forecast model statemodel now contains 4 endogenous variables.
. forecast adjust sheet = sheet + sheet_u2
Endogenous variable sheet now has 1 adjustment.
. forecast estimates misc
Added estimation results from xtreg.
Forecast model statemodel now contains 5 endogenous variables.
. forecast adjust misc = misc + misc_u2
Endogenous variable misc now has 1 adjustment.

We used forecast adjust to perform our adjustment to dlndim immediately after we added those
estimation results so that we would not forget to do so and before we used identities to obtain the
actual dim variable. However, we could have specified the adjustment at any time. Regardless of
when you specify an adjustment, forecast solve performs those adjustments immediately after the
variable being adjusted is computed.

Now we can solve our model. Here we obtain dynamic forecasts beginning in the first quarter of
2010:
. forecast solve, begin(tq(2010q1))
Computing dynamic forecasts for model statemodel.
Starting period: 2010q1
Ending period: 2011q4
Number of panels: 5
Forecast prefix: f_
Solving panel 1
Solving panel 2
Solving panel 3
Solving panel 4
Solving panel 5
Forecast 5 variables spanning 8 periods for 5 panels.

Here is our state-level forecast for sheet goods:


[Figure: Sales of Sheet Goods ($mil.). One panel per state (MS, LA, AR, TX, OK) plots forecast and actual values; the time axis is labeled 2008 to 2012.]

Similar to our company-wide forecast, our state-level forecast failed to call the bottom in sales that
occurred in 2011. Because our model missed the shift in sales momentum in every one of the five
states, we would be inclined to go back and try respecifying one or more of the equations in our
model. On the other hand, if our model forecasted most of the states well but performed poorly in
just a few states, then we would first want to investigate whether any events in those states could
account for the unexpected results.

Technical note
Stata also provides the areg command for fitting a linear regression with a large dummy-variable
set; areg is designed for situations where the number of groups (panels) is fixed, while the number of
observations per panel increases with the sample size. When the goal is to create a forecast model
for panel data, you should nevertheless use xtreg rather than areg. The forecast commands
require knowledge of the panel-data settings declared using xtset as well as panel-related estimation
information saved by the other panel-data commands in order to produce forecasts with panel datasets.

In the previous example, none of our equations contained lagged dependent variables as regressors.
If an equation did contain a lagged dependent variable, then one could use a dynamic panel-data
(DPD) estimator such as xtabond, xtdpd, or xtdpdsys. DPD estimators are designed for cases
where the number of observations per panel T is small. As shown by Nickell (1981), the bias
of the standard fixed- and random-effects estimators in the presence of lagged dependent variables
is of order 1/T and is thus particularly severe when each panel has relatively few observations.
Judson and Owen (1999) perform Monte Carlo experiments to examine the relative performance of
different panel-data estimators in the presence of lagged dependent variables when used with panel
datasets having dimensions more commonly encountered in macroeconomic applications. Based on
their results, while the bias of the standard fixed-effects estimator (LSDV in their notation) is not
inconsequential even when T = 20, for T = 30, the fixed-effects estimator does work as well as most
alternatives. The only estimator that appreciably outperformed the standard fixed-effects estimator
when T = 30 is the least-squares dummy variable corrected estimator (LSDVC in their notation).
Bruno (2005) provides a Stata implementation of that estimator. Many datasets used in forecasting
situations contain even more observations per panel, so the Nickell bias is unlikely to be a major
concern.
In this manual entry, we have provided an overview of the forecast commands and provided
several examples to get you started. The command-specific entries fill in the details.

References
Bruno, G. S. F. 2005. Estimation and inference in dynamic unbalanced panel-data models with a small number of
individuals. Stata Journal 5: 473–500.
Judson, R. A., and A. L. Owen. 1999. Estimating dynamic panel data models: a guide for macroeconomists. Economics
Letters 65: 9–15.
Klein, L. R. 1950. Economic Fluctuations in the United States 1921–1941. New York: Wiley.
Nickell, S. J. 1981. Biases in dynamic models with fixed effects. Econometrica 49: 1417–1426.

Also see
[TS] var Vector autoregressive models
[TS] tsset Declare data to be time-series data
[R] ivregress Single-equation instrumental-variables regression
[R] reg3 Three-stage estimation for systems of simultaneous equations
[R] regress Linear regression
[XT] xtreg Fixed-, between-, and random-effects and population-averaged linear models
[XT] xtset Declare data to be panel data

Title
forecast adjust Adjust a variable by add factoring, replacing, etc.
Syntax    Description    Remarks and examples    Stored results    Reference    Also see

Syntax
forecast adjust varname = exp [if] [in]

varname is the name of an endogenous variable that has been previously added to the model using
forecast estimates, forecast coefvector, or forecast identity.
exp represents a Stata expression; see [U] 13 Functions and expressions.

Description
forecast adjust specifies an adjustment to be applied to an endogenous variable in the model.
Adjustments are typically used to produce alternative forecast scenarios or to incorporate outside
information into a model. For example, you could use forecast adjust with a macroeconomic
model to simulate the effect of an oil price shock whereby the price of oil spikes $50 higher than
your model otherwise predicts in a given quarter.

Remarks and examples


When preparing a forecast, you often want to produce several different scenarios. The baseline
scenario is the default forecast that your model produces. It reflects the interplay among the equations
and exogenous variables without any outside forces acting on the model. Users of forecasts often
want answers to questions like "What happens to the economy if housing prices decline 10% more
than your baseline forecast suggests they will?" or "What happens to unemployment and interest rates
if tax rates increase?" forecast adjust lets you explore such questions by specifying alternative
paths for one or more endogenous variables in your model.

Example 1: Revisiting the Klein model


In example 1 of [TS] forecast, we produced a baseline forecast for the classic Klein (1950) model.
We noted that investment declined quite substantially in 1938. Suppose the government had a plan
such as a one-year investment tax credit that it could enact in 1939 to stimulate investment. Based
on discussions with accountants, tax experts, and business leaders, say this plan would encourage an
additional $1 billion in investment in 1939. How would this additional investment affect the economy?
To answer this question, we first refit the Klein (1950) model from [TS] forecast using the data
through 1938 and then obtain dynamic forecasts starting in 1939. We will prefix these forecast
variables with bl_ to indicate they are the baseline forecasts. In Stata, we type

. use http://www.stata-press.com/data/r13/klein2
. quietly reg3 (c p L.p w) (i p L.p L.k) (wp y L.y yr) if year < 1939,
> endog(w p y) exog(t wg g)
. estimates store klein

. forecast create kleinmodel
Forecast model kleinmodel started.
. forecast estimates klein
Added estimation results from reg3.
Forecast model kleinmodel now contains 3 endogenous variables.
. forecast identity y = c + i + g
Forecast model kleinmodel now contains 4 endogenous variables.
. forecast identity p = y - t - wp
Forecast model kleinmodel now contains 5 endogenous variables.
. forecast identity k = L.k + i
Forecast model kleinmodel now contains 6 endogenous variables.
. forecast identity w = wg + wp
Forecast model kleinmodel now contains 7 endogenous variables.
. forecast exogenous wg
Forecast model kleinmodel now contains 1 declared exogenous variable.
. forecast exogenous g
Forecast model kleinmodel now contains 2 declared exogenous variables.
. forecast exogenous t
Forecast model kleinmodel now contains 3 declared exogenous variables.
. forecast exogenous yr
Forecast model kleinmodel now contains 4 declared exogenous variables.
. forecast solve, prefix(bl_) begin(1939)
Computing dynamic forecasts for model kleinmodel.
Starting period: 1939
Ending period: 1941
Forecast prefix: bl_
1939: .......................................................................
....................................................
1940: .......................................................................
................................................
1941: .......................................................................
.................................................
Forecast 7 variables spanning 3 periods.

To model our $1 billion increase in investment in 1939, we type


. forecast adjust i = i + 1 if year == 1939
Endogenous variable i now has 1 adjustment.

While computing the forecasts for 1939, whenever forecast evaluates the equation for i, it will set
i to be higher than it would otherwise be by 1. Now we re-solve our model using the prefix alt_
to indicate this is an alternative forecast:

. forecast solve, prefix(alt_) begin(1939)
Computing dynamic forecasts for model kleinmodel.
Starting period: 1939
Ending period: 1941
Forecast prefix: alt_
1939: .......................................................................
...................................................
1940: .......................................................................
..............................................
1941: .......................................................................
................................................

Forecast 7 variables spanning 3 periods.

The following graph shows how investment and total income respond to this policy shock.
[Figure: Effect of $1 billion investment tax credit. Two panels, Investment and Total Income ($ billion), plotted by year from 1938 to 1941. Solid lines denote the forecast without the tax credit; dashed lines denote the forecast with the tax credit.]

Both investment and total income would be higher not just in 1939 but also in 1940; the higher
capital stock implied by the additional investment raises total output (and hence income) even after
the tax credit expires. Let's look at these two variables in more detail:
. list year bl_i alt_i bl_y alt_y if year >= 1938, sep(0)

       year       bl_i      alt_i       bl_y      alt_y
 19.   1938       -1.9       -1.9       60.9       60.9
 20.   1939   3.757227   6.276423   75.57685   80.71709
 21.   1940   7.971523   9.501909   89.67435   94.08473
 22.   1941   16.16375   16.20362   123.0809    124.238

Although we simulated a policy that we thought would encourage $1 billion in investment,
investment in fact rises about $2.5 billion in 1939 according to our model. That is because higher
investment raises total income, which also affects private-sector profits, which beget further changes
in investment, and so on.
The investment multiplier in this example might strike you as implausibly large, but it highlights an
important attribute of forecasting models. Studying each equation's estimated coefficients in isolation
can help to unveil some specification errors, but one must also consider how those equations interact.
It is possible to construct models in which each equation appears to be well specified, but the model
nevertheless forecasts poorly or suggests unlikely behavior in response to policy shocks.
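The size of the response can be read off the listing of baseline and alternative forecasts. As a minimal check of the arithmetic, the commands below difference the 1939 values of alt_i and bl_i and of alt_y and bl_y, with the numbers typed in by hand from that listing:

```stata
* Change in 1939 investment: alternative minus baseline
display 6.276423 - 3.757227

* Change in 1939 total income: alternative minus baseline
display 80.71709 - 75.57685
```

Investment is about 2.52 higher and total income about 5.14 higher in 1939, even though the adjustment itself added only 1 to investment.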

In the previous example, we applied a single adjustment to a single endogenous variable in a
single time period. However, forecast allows you to specify forecast adjust multiple times with
each endogenous variable, and many real-world policy simulations require adjustments to multiple
variables. You can also consider policies that affect variables for multiple periods.
For example, suppose we wanted to see what would happen if our investment tax credit lasted
two years instead of one. One way would be to use forecast adjust twice:
. forecast adjust i = i + 1 if year == 1939
. forecast adjust i = i + 1 if year == 1940

A second way would be to make that adjustment using one command:


. forecast adjust i = i + 1 if year == 1939 | year == 1940

To make adjustments lasting more than one or two periods, it makes more sense to create an
adjustment variable. A third way to simulate our two-year tax credit is
. generate i_adj = 0
. replace i_adj = 1 if year == 1939 | year == 1940
. forecast adjust i = i + i_adj
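An adjustment variable also makes it easy to let the size of the adjustment vary over time. As a sketch, suppose the credit were phased out over three years rather than ending abruptly. The amounts 1, 0.5, and 0.25 are invented for illustration and do not come from the Klein example, and the dataset below is a toy containing only a year variable.

```stata
* Toy dataset with years 1936 through 1941
clear
set obs 6
generate year = 1935 + _n

* Hypothetical phased-out credit: full effect in 1939, then tapering
generate i_adj = 0
replace i_adj = 1    if year == 1939
replace i_adj = 0.5  if year == 1940
replace i_adj = 0.25 if year == 1941
list year i_adj

* With the Klein model in memory, the adjustment would then be added with
*     forecast adjust i = i + i_adj
```

Only the contents of i_adj change from one scenario to the next; the forecast adjust command stays the same.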

So far in our discussion of forecast adjust, we have always shown an endogenous variable
being adjusted by adding a number or variable to it. However, any valid expression is allowed on the
right-hand side of the equals sign. If you want to explore the effects of a policy that will increase
investment by 10% in 1939, you could type
. forecast adjust i = 1.1*i if year == 1939

If you believe investment will be -2.0 in 1939, you could type


. forecast adjust i = -2.0 if year == 1939

An alternative way to force forecasts of endogenous variables to take on prespecified values is
discussed in example 1 of [TS] forecast solve.

Stored results
forecast adjust stores the following in r():
Macros
    r(lhs)          left-hand-side (endogenous) variable
    r(rhs)          right-hand side of identity
    r(basenames)    base names of variables found on right-hand side
    r(fullnames)    full names of variables found on right-hand side

Reference
Klein, L. R. 1950. Economic Fluctuations in the United States 1921–1941. New York: Wiley.

Also see
[TS] forecast Econometric model forecasting
[TS] forecast solve Obtain static and dynamic forecasts

Title
forecast clear Clear current model from memory

Syntax    Description    Remarks and examples    Also see

Syntax
forecast clear

Description
forecast clear removes the current forecast model from memory.

Remarks and examples


For an overview of the forecast commands, see [TS] forecast. This manual entry assumes you
have already read that manual entry. forecast allows you to have only one model in memory
at a time. You use forecast clear to remove the current model from memory. Forecast models
themselves do not consume a significant amount of memory, so there is no need to clear a model from
memory unless you intend to create a new one. An alternative to forecast clear is the replace
option with forecast create.
Calling forecast clear when no forecast model exists in memory does not result in an error.
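Because forecast clear does not fail when no model is in memory, a do-file can call it unconditionally before creating a model. A minimal sketch, reusing the model name salesfcast from the examples in [TS] forecast:

```stata
* Safe whether or not a forecast model is currently in memory
forecast clear

* Start a fresh model
forecast create salesfcast
```

The two commands together leave the same model in memory as forecast create salesfcast, replace.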

Also see
[TS] forecast Econometric model forecasting
[TS] forecast create Create a new forecast model

Title
forecast coefvector Specify an equation via a coefficient vector
Syntax    Description    Options    Remarks and examples    Methods and formulas    Also see

Syntax
forecast coefvector cname [, options]

cname is a Stata matrix with one row.

options                       Description
variance(vname)               specify parameter variance matrix
errorvariance(ename)          specify additive error variance matrix
names(namelist[, replace])    use namelist for names of left-hand-side variables

Description
forecast coefvector adds equations that are stored as coefficient vectors to your forecast model.
Typically, equations are added using forecast estimates and forecast identity. forecast
coefvector is used in less-common situations where you have a vector of parameters that represent
a linear equation.
Most users of the forecast commands will not need to use forecast coefvector. We recommend skipping this manual entry until you are familiar with the other features of forecast.

Options
variance(vname) specifies that Stata matrix vname contains the variance matrix of the estimated
parameters. This option only has an effect if you specify the simulate() option when calling
forecast solve and request sim_techniques betas or residuals. See [TS] forecast solve.
errorvariance(ename) specifies that the equations being added include an additive error term with
variance ename, where ename is the name of a Stata matrix. The number of rows and columns in
ename must match the number of equations represented by coefficient vector cname. This option
only has an effect if you specify the simulate() option when calling forecast solve and
request sim_techniques errors or residuals. See [TS] forecast solve.
names(namelist[, replace]) instructs forecast coefvector to use namelist as the names of the
left-hand-side variables in the coefficient vector being added. By default, forecast coefvector
uses the equation names on the column stripe of cname. You must use this option if any of the
equation names stored with cname contains time-series operators.

Remarks and examples


For an overview of the forecast commands, see [TS] forecast. This manual entry assumes you
have already read that manual entry. This manual entry also assumes that you are familiar with Stata's
matrices and the concepts of row and column names that can be attached to them; see [P] matrix.
You use forecast coefvector to add endogenous variables to your model that are defined by linear
equations, where the linear equations are stored in a coefficient (parameter) vector.
Remarks are presented under the following headings:
Introduction
Simulations with coefficient vectors

Introduction
forecast coefvector can be used to add equations that you obtained elsewhere to your model.
For example, you might see the estimated coefficients for an equation in an article and want to add
that equation to your model. User-written estimators that do not implement a predict command can
also be included in forecast models via forecast coefvector. forecast coefvector can also
be useful in situations where you want to simulate time-series data, as the next example illustrates.

Example 1: A shock to an autoregressive process


Consider the following autoregressive process:

\[ y_t = 0.9\,y_{t-1} - 0.6\,y_{t-2} + 0.3\,y_{t-3} \]


Suppose $y_t$ is initially equal to zero. How does $y_t$ evolve in response to a one-unit shock at time
t = 5? We can use forecast coefvector to find out. First, we create a small dataset with time
variable t and set our target variable y equal to zero:
. set obs 20
obs was 0, now 20
. generate t = _n
. tsset t
        time variable:  t, 1 to 20
                delta:  1 unit

. generate y = 0

Now let's think about our coefficient vector. The only tricky part is in labeling the columns. We can
represent the lagged values of $y_t$ using time-series operators; there is just one equation, corresponding
to variable y. We can use matrix coleq to apply both variable and equation names to the columns
of our matrix. In Stata, we type
. matrix y = (.9, -.6, 0.3)
. matrix coleq y = y:L.y y:L2.y y:L3.y
. matrix list y
y[1,3]
      y:   y:   y:
      L.  L2.  L3.
       y    y    y
r1    .9  -.6   .3

forecast coefvector ignores the row name of the vector being added (r1 here), so we can leave
it as is. Next we create a forecast model and add y:
. forecast create
Forecast model started.
. forecast coefvector y
Forecast model now contains 1 endogenous variable.

To shock our system at t = 5, we can use forecast adjust:


. forecast adjust y = 1 in 5
Endogenous variable y now has 1 adjustment.

Now we can solve our model. Because our y variable is filled in for the entire dataset, forecast
solve will not be able to automatically determine when forecasting should commence. We have three
lags in our process, so we will start at t = 4. To reduce the amount of output, we specify log(off):
. forecast solve, begin(4) log(off)
Computing dynamic forecasts for current model.
Starting period: 4
Ending period: 20
Forecast prefix: f_

Forecast 1 variable spanning 17 periods.

[Figure: Impulse-Response Function. Response plotted against t. Evolution of $y_t$ in response to a unit shock at t = 5.]

The graph shows our shock causing y to jump to 1 at t = 5. At t = 6, we can see that y = 0.9, and
at t = 7, we can see that $y = 0.9 \times 0.9 - 0.6 \times 1 = 0.21$.
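The same path can be traced by hand without a forecast model, which is a useful cross-check on the values just quoted. The sketch below applies the recursion directly with replace; it is our own illustration of the arithmetic, not a description of how forecast solve works internally.

```stata
* Rebuild the dataset: y is zero everywhere except for the shock at t = 5
clear
set obs 20
generate t = _n
generate double y = 0
replace y = 1 in 5

* replace works through the observations in order, so each new value
* uses the values already computed for the three preceding periods
replace y = 0.9*y[_n-1] - 0.6*y[_n-2] + 0.3*y[_n-3] in 6/20

list t y in 4/8
assert abs(y[6] - 0.9)  < 1e-12
assert abs(y[7] - 0.21) < 1e-12
```

The two assert statements confirm the values at t = 6 and t = 7 given in the text.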

The previous example used a coefficient vector representing a single equation. However, coefficient
vectors can contain multiple equations. For example, say we read an article and saw the following
results displayed:

\begin{align*}
x_t &= 0.2 + 0.3\,x_{t-1} - 0.8\,y_t \\
y_t &= 0.1 + 0.7\,y_{t-1} + 0.3\,x_t - 0.2\,x_{t-1}
\end{align*}

We can add both equations at once to our forecast model. Again the key is in labeling the columns.
forecast coefvector understands _cons to mean a constant term, and it looks at the equation
names on the vector's columns to determine how many equations there are and to what endogenous
variables they correspond:
. matrix eqvector = (0.2, 0.3, -0.8, 0.1, 0.7, 0.3, -0.2)
. matrix coleq eqvector = x:_cons x:L.x x:y y:_cons y:L.y y:x y:L.x
. matrix list eqvector
eqvector[1,7]
         x:      x:      x:      y:      y:      y:      y:
                 L.                      L.              L.
      _cons       x       y   _cons       y       x       x
r1       .2      .3     -.8      .1      .7      .3     -.2

We could then type


. forecast coefvector eqvector

to add our coefficient vector to a model.


Just like with estimation results whose left-hand-side variables contain time-series operators, if
any of the equation names of the coefficient vector being added contains time-series operators, you
must use the names() option of forecast coefvector to specify alternative names.

Simulations with coefficient vectors


The forecast solve command provides the option simulate(sim_technique, ...) to perform
stochastic simulations and obtain measures of forecast uncertainty. How forecast solve handles
coefficient vectors when performing these simulations depends on the options provided with forecast
coefvector. There are four cases to consider:
1. You specify neither variance() nor errorvariance() with forecast coefvector. You
have provided no measures of uncertainty with this coefficient vector. Therefore, forecast
solve treats it like an identity. No random errors or residuals are added to this coefficient
vector's linear combination, nor are the coefficients perturbed in any way.
2. You specify variance() but not errorvariance(). The variance() option provides
the covariance matrix of the estimated parameters in the coefficient vector. Therefore, the
coefficient vector is taken to be stochastic. If you request sim_technique betas, this coefficient
vector is assumed to be distributed multivariate normal with a mean equal to the original
value of the vector and covariance matrix as specified in the variance() option, and random
draws are taken from this distribution. If you request sim_technique residuals, randomly
chosen static residuals are added to this coefficient vector's linear combination. Because
you did not specify a covariance matrix for the error terms with the errorvariance()
option, sim_technique errors cannot draw random errors for this coefficient vector's linear
combination, so sim_technique errors has no impact on the equations.
3. You specify errorvariance() but not variance(). Because you specified a covariance
matrix for the assumed additive error term, the equations represented by this coefficient vector
are stochastic. If you request sim_technique residuals, randomly chosen static residuals
are added to this coefficient vector's linear combination. If you request sim_technique
errors, multivariate normal errors with mean zero and covariance matrix as specified
in the errorvariance() option are added during the simulations. However, specifying
sim_technique betas does not affect the equations because there is no covariance matrix
associated with the coefficients.

4. You specify both variance() and errorvariance(). The equations represented by this
coefficient vector are stochastic, and forecast solve treats the coefficient vector just like
an estimation result. sim_techniques betas, residuals, and errors all work as expected.

Methods and formulas


Let $\beta$ denote the $1 \times k$ coefficient vector being added. Then the matrix specified in the variance()
option must be $k \times k$. Row and column names for that matrix are ignored.
Let m denote the number of equations represented by $\beta$. That is, if $\beta$ is stored as Stata matrix
beta and local macro m is to hold the number of equations, then in Stata parlance,
. local eqnames : coleq beta
. local eq : list uniq eqnames
. local m : list sizeof eq

Then the matrix specified in the errorvariance() option must be $m \times m$. Row and column names
for that matrix are ignored.
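As a concrete sketch of these dimension rules, the commands below apply the macro code above to a matrix named beta that holds the same seven coefficients and column names as the two-equation vector eqvector from the remarks. The use of colsof() to obtain k is our addition.

```stata
* Two-equation coefficient vector with equation names x and y
matrix beta = (0.2, 0.3, -0.8, 0.1, 0.7, 0.3, -0.2)
matrix coleq beta = x:_cons x:L.x x:y y:_cons y:L.y y:x y:L.x

* k is the number of columns; m is the number of distinct equation names
local k = colsof(beta)
local eqnames : coleq beta
local eq : list uniq eqnames
local m : list sizeof eq
display "k = `k', m = `m'"
```

Here k is 7 and m is 2, so a matrix passed in variance() would need to be 7 x 7 and one passed in errorvariance() would need to be 2 x 2.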

Also see
[TS] forecast Econometric model forecasting
[TS] forecast solve Obtain static and dynamic forecasts
[P] matrix Introduction to matrix commands
[P] matrix rownames Name rows and columns

Title
forecast create Create a new forecast model
Syntax

Description

Option

Remarks and examples

Also see

Syntax
forecast create [name] [, replace]

name is an optional name that can be given to the model. name must follow the naming conventions
described in [U] 11.3 Naming conventions.

Description
forecast create creates a new forecast model in Stata.

Option
replace causes Stata to clear the existing model from memory before creating name. You may have
only one model in memory at a time. By default, forecast create issues an error message if
another model is already in memory.

Remarks and examples


For an overview of the forecast commands, see [TS] forecast. This manual entry assumes you
have already read that manual entry. The forecast create command creates a new forecast model
in Stata. You must create a model before you can add equations or solve it. You can have only one
model in memory at a time.
You may optionally specify a name for your model. That name will appear in the output produced
by the various forecast subcommands.

Example 1
Here we create a model named salesfcast:
. forecast create salesfcast
Forecast model salesfcast started.
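
Because only one model can be in memory at a time, starting a second model requires the replace option. A short sketch of that behavior, in which the second model name, salesfcast2, is hypothetical:

```stata
* Start a model named salesfcast (the name used in example 1);
* replace discards any model that may already be in memory
forecast create salesfcast, replace

* A model is now in memory, so a plain forecast create would issue an error;
* specifying replace clears salesfcast before creating salesfcast2
forecast create salesfcast2, replace
```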

Technical note
Warning: Do not type clear all, clear mata, or clear results after creating a forecast
model with forecast create unless you intend to remove your forecast model. Typing clear all
or clear mata eliminates the internal structures used to store your forecast model. Typing clear
results clears all estimation results from memory. If your forecast model includes estimation results
that rely on the ability to call predict, you will not be able to solve your model.

Also see
[TS] forecast Econometric model forecasting
[TS] forecast clear Clear current model from memory

Title
forecast describe Describe features of the forecast model

Syntax
Reference

Description
Also see

Options

Remarks and examples

Stored results

Syntax
Describe the current forecast model

forecast describe [, options]

Describe particular aspects of the current forecast model

forecast describe aspect [, options]

aspect        Description
estimates     estimation results
coefvector    coefficient vectors
identity      identities
exogenous     declared exogenous variables
adjust        adjustments to endogenous variables
solve         forecast solution information
endogenous    all endogenous variables

options       Description
brief         provide a one-line summary
detail        provide more-detailed information

Specifying detail provides no additional information with aspects exogenous, endogenous, and solve.

Description
forecast describe displays information about the forecast model currently in memory. For
example, you can type forecast describe endogenous to obtain information regarding all the
endogenous variables in the model. Typing forecast describe without specifying a particular
aspect of the model is equivalent to typing forecast describe aspect for every aspect in the table
above and can result in more output than you want, particularly if you specify the detail option.

Options
brief requests that forecast describe produce a one-sentence summary of the aspect specified.
For example, forecast describe exogenous, brief will tell you just the current forecast
model's name and the number of exogenous variables in the model.

detail requests a more-detailed description of the aspect specified. For example, typing forecast
describe estimates lists all the estimation results added to the model using forecast estimates, the estimation commands used, and the number of left-hand-side variables in each estimation
result. When you specify forecast describe estimates, detail, the output includes a list
of all the left-hand-side variables entered with forecast estimates.
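
The three levels of verbosity can be compared directly. This sketch assumes that a forecast model containing at least one estimation result is already in memory; it uses only the options documented above.

```stata
* One-sentence summary: model name and number of estimation results
forecast describe estimates, brief

* Default output: each estimation result, its command, and its number of LHS variables
forecast describe estimates

* Detailed output: additionally lists the left-hand-side variables themselves
forecast describe estimates, detail
```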

Remarks and examples


For an overview of the forecast commands, see [TS] forecast. This manual entry assumes you
have already read that manual entry. forecast describe displays information about the forecast
model currently in memory. You can obtain either all the information at once or information about
individual aspects of your model, where we use the word "aspect" to refer to, for example, just
the estimation results, identities, or solution information.

Example 1
In example 1 of [TS] forecast, we created and forecasted Klein's (1950) model of the U.S. economy.
Here we obtain information about all the endogenous variables in the model:
. forecast describe endogenous
Forecast model kleinmodel contains 7 endogenous variables:

      Variable    Source       # adjustments
  1.  c           estimates    0
  2.  i           estimates    0
  3.  wp          estimates    0
  4.  y           identity     0
  5.  p           identity     0
  6.  k           identity     0
  7.  w           identity     0

As we mentioned in [TS] forecast, there are seven endogenous variables in this model. Three of those
variables (c, i, and wp) were left-hand-side variables in equations we fitted and added to our forecast
model with forecast estimates. The other four variables were defined by identities added with
forecast identity. The right-hand column of the table indicates that none of our endogenous
variables contains adjustments specified using forecast adjust.
We can obtain more information about the estimated equations in our model using forecast
describe estimates:
. forecast describe estimates, detail
Forecast model kleinmodel contains 1 estimation result:

      Estimation
      result        Command    LHS variables
  1.  klein         reg3       c
                               i
                               wp

Our model has one estimation result, klein, containing results produced by the reg3 command. If
we had not specified the detail option, forecast describe estimates would have simply stated
the number of left-hand-side variables (3) rather than listing them.


At the end of example 1 in [TS] forecast, we obtained dynamic forecasts beginning in 1936. Here
we obtain information about the solution:
. forecast describe solve
Forecast model kleinmodel has been solved:

Forecast horizon
  Begin                            1936
  End                              1941
  Number of periods                6
Forecast variables
  Prefix                           d_
  Number of variables              7
  Storage type                     float
  Type of forecast                 Dynamic
Solution
  Technique                        Damped Gauss-Seidel (0.200)
  Maximum iterations               500
  Tolerance for function values    1.0e-09
  Tolerance for function zero      (not applicable)

We obtain information about the forecast horizon, how the variables holding our forecasts were
created and stored, and the solution technique used. If we had used the simulate() option with
forecast solve, we would have obtained information about the types of simulations performed and
the variables used to hold the results.

Stored results
When you specify option brief, only a limited number of results are stored. In the tables
below, a superscript B indicates results that are available even after brief is specified. forecast
describe coefvector saves certain results only if detail is specified; these are indicated by superscript D.
Typing forecast describe without specifying an aspect does not return any results.
forecast describe estimates stores the following in r():
Scalars
  r(n_estimates)^B    number of estimation results
  r(n_lhs)            number of left-hand-side variables defined by estimation results
Macros
  r(model)^B          name of forecast model, if named
  r(lhs)              left-hand-side variables
  r(estimates)        names of estimation results

forecast describe identity stores the following in r():
Scalars
  r(n_identities)^B   number of identities
Macros
  r(model)^B          name of forecast model, if named
  r(lhs)              left-hand-side variables
  r(identities)       list of identities


forecast describe coefvector stores the following in r():
Scalars
  r(n_coefvectors)^B  number of coefficient vectors
  r(n_lhs)^B          number of left-hand-side variables defined by coefficient vectors
Macros
  r(model)^B          name of forecast model, if named
  r(lhs)              left-hand-side variables
  r(rhs)^D            right-hand-side variables
  r(names)            names of coefficient vectors
  r(Vnames)^D         names of variance matrices (. if not specified)
  r(Enames)^D         names of error variance matrices (. if not specified)

forecast describe exogenous stores the following in r():
Scalars
  r(n_exogenous)^B    number of declared exogenous variables
Macros
  r(model)^B          name of forecast model, if named
  r(exogenous)        declared exogenous variables

forecast describe endogenous stores the following in r():
Scalars
  r(n_endogenous)^B   number of endogenous variables
Macros
  r(model)^B          name of forecast model, if named
  r(varlist)          endogenous variables
  r(source_list)      sources of endogenous variables (estimates, identity, coefvector)
  r(adjust_cnt)       number of adjustments per endogenous variable

forecast describe solve stores the following in r():
Scalars
  r(periods)          number of periods forecast per panel
  r(Npanels)          number of panels forecast
  r(Nvar)             number of forecast variables
  r(damping)          damping parameter for damped Gauss-Seidel
  r(maxiter)          maximum number of iterations
  r(vtolerance)       tolerance for forecast values
  r(ztolerance)       tolerance for function zero
  r(sim_nreps)        number of simulations
Macros
  r(solved)^B         solved, if the model has been solved
  r(model)^B          name of forecast model, if named
  r(actuals)          actuals, if specified with forecast solve
  r(double)           double, if specified with forecast solve
  r(static)           static, if specified with forecast solve
  r(begin)            first period in forecast horizon
  r(end)              last period in forecast horizon
  r(technique)        solver technique
  r(sim_technique)    specified sim_technique
  r(prefix)           forecast variable prefix
  r(suffix)           forecast variable suffix
  r(sim_prefix_i)     ith simulation statistic prefix
  r(sim_suffix_i)     ith simulation statistic suffix
  r(sim_stat_i)       ith simulation statistic

forecast describe adjust stores the following in r():
Scalars
  r(n_adjustments)^B  total number of adjustments
  r(n_adjust_vars)^B  number of variables with adjustments
Macros
  r(model)^B          name of forecast model, if named
  r(varlist)          variables with adjustments
  r(adjust_cnt)       number of adjustments per endogenous variable
  r(adjust_list)      list of adjustments
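
The stored results can be used programmatically after forecast describe is called with a specific aspect. The sketch below assumes that the kleinmodel forecast model from the examples above is still in memory; the display text is illustrative only.

```stata
* Describe one aspect so that results are left in r()
forecast describe endogenous

* Scalars are referenced directly; macros are referenced with macro quotes
display "Number of endogenous variables: " r(n_endogenous)
display "Endogenous variables: `r(varlist)'"

* With brief, only the results marked B in the tables are stored
forecast describe endogenous, brief
return list
```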

Reference
Klein, L. R. 1950. Economic Fluctuations in the United States 1921–1941. New York: Wiley.

Also see
[TS] forecast Econometric model forecasting
[TS] forecast list List forecast commands composing current model


Title
forecast drop Drop forecast variables

Syntax
Also see

Description

Options

Remarks and examples

Stored results

Syntax
forecast drop [, options]

options           Description
prefix(string)    specify prefix for forecast variables
suffix(string)    specify suffix for forecast variables

You can specify prefix() or suffix() but not both.

Description
forecast drop drops variables previously created by forecast solve.

Options
prefix(string) and suffix(string) specify either a name prefix or a name suffix that will be used to
identify forecast variables to be dropped. You may specify prefix() or suffix() but not both.
By default, forecast drop removes all forecast variables produced by the previous invocation
of forecast solve.
Suppose, however, that you previously specified the simulate() option with forecast solve
and wish to remove variables containing simulation results but retain the variables containing the
point forecasts. Then you can use the prefix() or suffix() option to identify the simulation
variables you want dropped.

Remarks and examples


For an overview of the forecast commands, see [TS] forecast. This manual entry assumes you
have already read that manual entry. forecast drop safely removes variables previously created
using forecast solve. Say you previously solved your model and created forecast variables that
were suffixed with _f. Do not type
. drop *_f

to remove those variables from the dataset. Rather, type


. forecast drop


The former command is dangerous: Suppose you were given the dataset and asked to produce the
forecast. The person who previously worked with the dataset created other variables that ended with
_f. Using drop would remove those variables as well. forecast drop removes only those variables
that were previously created by forecast solve based on the model in memory.
If you do not specify any options, forecast drop removes all the forecast variables created by
the current model, including the variables that contain the point forecasts as well as any variables
that contain simulation results specified by the simulate() option with forecast solve. Suppose
you had typed
. forecast solve, prefix(s_) simulate(betas, statistic(stddev, prefix(sd_)))

Then if you type


. forecast drop, prefix(sd_)

forecast drop will remove the variables containing the standard deviations of the forecasts and
will leave the variables containing the point forecasts (prefixed with s_) untouched.
forecast drop does not exit with an error if a variable it intends to drop does not exist in the
dataset.

Stored results
forecast drop stores the following in r():
Scalars
  r(n_dropped)        number of variables dropped
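
As a brief sketch of how this stored result might be used, the following assumes a model that has already been solved with simulation statistics prefixed by sd_, as in the forecast solve command shown in the remarks above.

```stata
* Remove only the simulation-statistic variables, keeping the point forecasts
forecast drop, prefix(sd_)

* Report how many variables were actually removed
display "Variables dropped: " r(n_dropped)
```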

Also see
[TS] forecast Econometric model forecasting
[TS] forecast solve Obtain static and dynamic forecasts

Title
forecast estimates Add estimation results to a forecast model
Syntax

Description

Options

Remarks and examples

References

Also see

Syntax
Add estimation result currently in memory to model

forecast estimates name [, options]

name is the name of a stored estimation result; see [R] estimates store.

Add estimation result currently saved on disk to model

forecast estimates using filename [, number(#) options]

filename is an estimation results file created by estimates save; see [R] estimates save. If no file
extension is specified, .ster is assumed.
options                       Description
predict(p_options)            call predict using p_options
names(namelist[, replace])    use namelist for names of left-hand-side variables
advise                        advise whether estimation results can be dropped from memory

Description
forecast estimates adds estimation results to the forecast model currently in memory. You
must first create a new model using forecast create before you can add estimation results with
forecast estimates. After estimating the parameters of an equation or set of equations, you must
use estimates store to store the estimation results in memory or use estimates save to save
them on disk before adding them to the model.

Options
predict(p_options) specifies the predict options to use when predicting the dependent variables.
For a single-equation estimation command, you simply specify the appropriate options to pass to
predict. If multiple options are required, enclose them in quotation marks:
. forecast estimates ..., predict("pr outcome(#1)")
For a multiple-equation estimation command, you can either specify one set of options that will
be applied to all equations or specify p options, where p is the number of endogenous variables
being added. If multiple options are required for each equation, enclose each equation's options
in quotes:
. forecast estimates ..., predict("pr eq(#1)" "pr eq(#2)")

If you do not specify the eq() option for any of the equations, forecast automatically includes
it for you.
If you are adding results from a linear estimation command that forecast recognizes as one
whose predictions can be calculated as $x_t'\beta$, do not specify the predict() option, because this
will slow forecast's computation time substantially. Use the advise option to determine whether
forecast needs to call predict.
If you do not specify any predict options, forecast uses the default type of prediction for the
command whose results are being added.


names(namelist[, replace]) instructs forecast estimates to use namelist as the names of the
left-hand-side variables in the estimation result being added. You must use this option if any of
the left-hand-side variables contains time-series operators. By default, forecast estimates uses
the names stored in the e(depvar) macro of the results being added.
forecast estimates creates a new variable in the dataset for each element of namelist. If a
variable of the same name already exists in your dataset, forecast estimates exits with an
error unless you specify the replace option, in which case existing variables are overwritten.
advise requests that forecast estimates report a message indicating whether the estimation
results being added can be removed from memory. This option is useful if you expect your model
to contain more than 300 sets of estimation results, the maximum number that Stata allows you to
store in memory; see [R] limits. This option also provides an indication of the speed with which
the model can be solved: forecast executes much more slowly with estimation results that must
remain in memory.
number(#), for use with forecast estimates using, specifies that the #th set of estimation results
from filename be loaded. This assumes that multiple sets of estimation results have been saved
in filename. The default is number(1). See [R] estimates save for more information on saving
multiple sets of estimation results in a single file.

Remarks and examples


For an overview of the forecast commands, see [TS] forecast. This manual entry assumes you
have already read that manual entry. forecast estimates adds stochastic equations previously fit
by Stata estimation commands to a forecast model.
Remarks are presented under the following headings:
Introduction
The advise option
Using saved estimation results
The predict option
Forecasting with ARIMA models

Introduction
After you fit an equation that will become a part of your model, you must use either estimates
store to store the estimation results in memory or estimates save to save the estimation results
to disk. Then you can use forecast estimates to add that equation to your model.
We usually refer to equation in the singular, but of course, you can also use a multiple-equation
estimation command to fit several equations at once and add them to the model. When we discuss
adding a stochastic equation to a model, we really mean adding a single estimation result.


In this discussion, we also need to make a distinction between making a forecast and obtaining a
prediction. We use the word predict to refer to the process of obtaining a fitted value for a single
equation, just as you can use the predict command to obtain fitted values, residuals, or other statistics
after fitting a model with an estimation command. We use the word forecast to mean finding a
solution to the complete set of equations that compose the forecast model. The iterative techniques
we use to solve the model and produce forecasts require that we be able to obtain predictions from
each of the equations in the model.

Example 1: A simple example


Here we illustrate how to add estimation results from a regression model in which none of
the left-hand-side variables contains time-series operators or mathematical transformations. We use
quietly with the estimation command because the output is not relevant here. We type
. use http://www.stata-press.com/data/r13/klein2
. quietly reg3 (c p L.p w) (i p L.p L.k) (wp y L.y yr), endog(w p y) exog(t wg g)
. estimates store klein
. forecast create kleinmodel
  Forecast model kleinmodel started.
. forecast estimates klein
  Added estimation results from reg3.
  Forecast model kleinmodel now contains 3 endogenous variables.

forecast estimates indicated that three endogenous variables were added to the forecast model.
That is because we specified three equations in our call to reg3. As we mentioned in example 1 in
[TS] forecast, the endog() option of reg3 has no bearing on forecast. All that matters are the
three left-hand-side variables.

Technical note
When you add an estimation result to your forecast model, forecast looks at the macro e(depvar)
to determine the endogenous variables being added. If that macro is empty, forecast tries a few
other macros to account for nonstandard commands. The number of endogenous variables being added
to the model is based on the number of words found in the macro containing the dependent variables.

You can fit equations with the D. and S. first- and seasonal-difference time-series operators
adorning the left-hand-side variables, but in those cases, when you add the equations to the model,
you must use the names() option of forecast estimates. When you specify names(namelist),
forecast estimates uses namelist as the names of the newly declared endogenous variables and
ignores what is in e(depvar). Moreover, forecast does not automatically undo the operators on
left-hand-side variables. For example, you might fit a regression with D.x as the regressand and then
add it to the model using forecast estimates ..., names(Dx). In that case, forecast will solve
the model in terms of Dx. You must add an identity to convert Dx to the corresponding level variable
x, as the next example illustrates.
Of course, you are free to use the D., S., and L. time-series operators on endogenous variables
when they appear on the right-hand sides of equations. It is only when D. or S. appears on the
left-hand side that you must use the names() option to provide alternative names for them. You
cannot add equations to models for which the L. operator appears on left-hand-side variables. You
cannot use the F. forward operator anywhere in forecast models.


Example 2: Differenced and log-transformed dependent variables


Consider the following model:

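$$\mathtt{D.logC} = \beta_{10} + \beta_{11}\,\mathtt{D.logW} + \beta_{12}\,\mathtt{D.logY} + u_{1t} \tag{1}$$
$$\mathtt{logW} = \beta_{20} + \beta_{21}\,\mathtt{L.logW} + \beta_{22}\,\mathtt{M} + \beta_{23}\,\mathtt{logY} + \beta_{24}\,\mathtt{logC} + u_{2t} \tag{2}$$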

Here logY and M are exogenous variables, so we will assume they are filled in over the forecast
horizon before solving the model. Ultimately, we are interested in forecasting C and W. However,
the first equation is specified in terms of changes in the logarithm of C, and the second equation is
specified in terms of the logarithm of W.
We will refer to variables and transformations like logC, D.logC, and C as related variables
because they are related to one another by simple mathematical functions. Including the related
variables, we in fact have a five-equation model with two stochastic equations and three identities:

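$$\mathtt{dlogC} = \beta_{10} + \beta_{11}\,\mathtt{D.logW} + \beta_{12}\,\mathtt{D.logY} + u_{1t}$$
$$\mathtt{logC} = \mathtt{L.logC} + \mathtt{dlogC}$$
$$\mathtt{C} = \exp(\mathtt{logC})$$
$$\mathtt{logW} = \beta_{20} + \beta_{21}\,\mathtt{L.logW} + \beta_{22}\,\mathtt{M} + \beta_{23}\,\mathtt{logY} + \beta_{24}\,\mathtt{logC} + u_{2t}$$
$$\mathtt{W} = \exp(\mathtt{logW})$$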
To fit (1) and (2) in Stata and create a forecast model, we type
. use http://www.stata-press.com/data/r13/fcestimates, clear
(1978 Automobile Data)
. quietly regress D.logC D.logW D.logY
. estimates store dlogceq
. quietly regress logW L.logW M logY logC
. estimates store logweq
. forecast create cwmodel, replace
(Forecast model kleinmodel ended.)
Forecast model cwmodel started.
. forecast estimates dlogceq, names(dlogC)
Added estimation results from regress.
Forecast model cwmodel now contains 1 endogenous variable.
. forecast identity logC = L.logC + dlogC
Forecast model cwmodel now contains 2 endogenous variables.
. forecast identity C = exp(logC)
Forecast model cwmodel now contains 3 endogenous variables.
. forecast estimates logweq
Added estimation results from regress.
Forecast model cwmodel now contains 4 endogenous variables.
. forecast identity W = exp(logW)
Forecast model cwmodel now contains 5 endogenous variables.

Because the left-hand-side variable in (1) contains a time-series operator, we had to use the names()
option of forecast estimates when adding that equation's estimation results to our forecast model.
Here we named this endogenous variable dlogC. We then added the other four equations to our
model. In general, when we have a set of related variables, we prefer to specify the identities right
after we add the stochastic equation so that we do not forget about them.


Technical note
In the previous example, we undid the log-transformations by simply exponentiating the logarithmic variable. However, that is only an approximation that does not work well in many applications.
Suppose we fit the linear regression model
$$\ln y_t = x_t'\beta + u_t$$
where $u_t$ is a zero-mean regression error term. Then $E(y_t \mid x_t) = \exp(x_t'\beta)\,E\{\exp(u_t)\}$. Although
$E(u_t) = 0$, Jensen's inequality suggests that $E\{\exp(u_t)\} \neq 1$, implying that we cannot predict $y_t$
by simply taking the exponential of the linear prediction $x_t'\beta$.
If we assume that $u_t \sim N(0, \sigma^2)$, then $E\{\exp(u_t)\} = \exp(\sigma^2/2)$. Moreover, many estimation
commands like regress provide an estimate $\hat\sigma^2$ of $\sigma^2$, so for regression models that contain a
logarithmic dependent variable, we can obtain better forecasts for the dependent variable in levels if
we approximate $E\{\exp(u_t)\}$ as $\exp(\hat\sigma^2/2)$. Suppose we run the regression
. regress lny x1 x2 x3
. estimates store myreg

then we could add lny and y as endogenous variables like this:


. forecast estimates myreg
. forecast identity y = exp(lny)*`=exp(e(rmse)^2 / 2)'

In the second command, Stata will first evaluate the expression `=exp(e(rmse)^2 / 2)' and replace it with
its numerical value. After regress, e(rmse) contains the square root of the estimate
$\hat\sigma^2$, so the value of this expression will be our estimate of $E\{\exp(u_t)\}$. Then forecast will forecast
y as the product of this number and exp(lny). Here we had to use a macro expression including
an equals sign to force Stata to evaluate the expression immediately and obtain the expression's
value. Identities are not associated with estimation results, so as soon as we used another estimation
command or restored some other estimation results (perhaps unknowingly by invoking forecast
solve), our reference to e(rmse) would no longer be meaningful. See [U] 18.3.8 Macro expressions
for more information on macro evaluation.
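
The normal-theory correction factor used in this note can be verified by completing the square in the exponent. The short derivation below holds only under the assumption stated above that the errors are normally distributed with mean zero and variance $\sigma^2$.

```latex
\begin{aligned}
E\{\exp(u_t)\}
  &= \int_{-\infty}^{\infty} \exp(u)\,
     \frac{1}{\sqrt{2\pi\sigma^2}}
     \exp\!\left(-\frac{u^2}{2\sigma^2}\right) du \\
  &= \exp\!\left(\frac{\sigma^2}{2}\right)
     \int_{-\infty}^{\infty}
     \frac{1}{\sqrt{2\pi\sigma^2}}
     \exp\!\left\{-\frac{(u-\sigma^2)^2}{2\sigma^2}\right\} du \\
  &= \exp\!\left(\frac{\sigma^2}{2}\right)
\end{aligned}
```

The second line uses $u - u^2/(2\sigma^2) = \sigma^2/2 - (u-\sigma^2)^2/(2\sigma^2)$, and the remaining integral equals 1 because its integrand is a normal density with mean $\sigma^2$ and variance $\sigma^2$.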
Another alternative would be to use Duan's (1983) smearing technique. Stata code for this is
provided in Cameron and Trivedi (2010).
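
As a rough illustration of the smearing idea, the correction factor can be estimated by the sample mean of the exponentiated residuals rather than by a normal-theory formula. The sketch below is not the code from Cameron and Trivedi (2010); it reuses the hypothetical variables lny, x1, x2, and x3 from this note, and the names uhat, expu, and smear are introduced only for illustration. It assumes a forecast model has already been created.

```stata
* Fit the log-linear regression and store the results
regress lny x1 x2 x3
estimates store myreg

* Smearing factor: the sample mean of exp(residual)
predict double uhat, residuals
generate double expu = exp(uhat)
summarize expu, meanonly
local smear = r(mean)

* Add the stochastic equation, then an identity that rescales exp(lny)
forecast estimates myreg
forecast identity y = exp(lny)*`smear'
```

Because the local macro smear is expanded to a number before forecast identity is executed, the identity does not depend on any results that later commands may overwrite.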
A third alternative is to use the generalized linear model (GLM) as implemented by the glm
command with a log-link function. In a GLM framework, we would be modeling $\ln\{E(y_t)\}$ rather
than $E\{\ln(y_t)\}$ as we would be when using regress, but oftentimes, the two quantities are similar.
Moreover, obtaining predicted values for $y_t$ in the GLM does not present the transformation problem as
happens with linear regression. The forecast commands contain special code to handle estimation
results obtained by using glm with the link(log) option, and you do not need to specify an identity
to obtain y as a function of lny. All you would need to do is
. glm y x1 x2 x3, link(log)
. estimates store myglm
. forecast estimates myglm


The advise option


To produce forecasts from your model, forecast must be able to obtain predictions for each
estimation result that you have added. For many of the most commonly used estimation commands such
as regress, ivregress, and var, forecast includes special code to quickly obtain these predictions.
For estimation commands that either require more involved computations to obtain predictions or
are not widely used in forecasting, forecast instead relies on the predict command to obtain
predictions.
The advise option of forecast estimates advises you as to whether forecast includes the
special code to obtain fast predictions for the command whose estimation results are being added
to the model. For example, here we use advise with forecast estimates when building the
Klein (1950) model.

Example 3: Using the advise option


. use http://www.stata-press.com/data/r13/klein2, clear
. quietly reg3 (c p L.p w) (i p L.p L.k) (wp y L.y yr), endog(w p y) exog(t wg g)
. estimates store klein
. forecast create kleinmodel, replace
  (Forecast model cwmodel ended.)
  Forecast model kleinmodel started.
. forecast estimates klein, advise
  (These estimation results are no longer needed; you can drop them.)
  Added estimation results from reg3.
  Forecast model kleinmodel now contains 3 endogenous variables.

After we typed forecast estimates, Stata advised us that "[t]hese estimation results are no longer
needed; you can drop them." That means forecast includes code to obtain predictions from reg3
without having to call predict. forecast has recorded all the information it needs about the
estimation results stored in klein, and we could type
. estimates drop klein

to remove those estimates from memory.

For relatively small models, there is no need to use estimates drop to remove estimation results
from memory. However, Stata allows no more than 300 sets of estimation results to be in memory
at once, and forecast solve requires estimation results to be in memory (and not merely saved
on disk) before it can produce forecasts. For very large models in which that limit may bind, you
can use the advise option to determine which estimation results are needed to solve the model and
which can be dropped.
Suppose we had estimation results from a command for which forecast must call predict to
obtain predictions. Then instead of obtaining the note saying the estimation results were no longer
needed, we would obtain a note stating
. forecast estimates IUsePredict, advise
(These estimation results are needed to solve the model.)

In that case, the estimation results would need to be in memory before calling forecast solve.
The advise option also provides an indication of how quickly forecasts can be produced from
the model. Models for which forecast never needs to call predict can be solved much more
quickly than models that include equations for which forecast must restore estimation results and
call predict to obtain predictions.


Using saved estimation results


Stata's estimates commands allow you to save estimation results to disk so that they are available
in subsequent Stata sessions. You can use the using option of forecast estimates to use estimation
results saved on disk without having to first call estimates use. In fact, estimates use can even
retrieve estimation results stored on a website, as the next example demonstrates.

Example 4: Adding saved estimation results


The file klein.ster contains the estimation results produced by reg3 for the three stochastic
equations of Klein's (1950) model. That file is stored on the Stata Press website in the same location
as the example datasets. Here we create a forecast model and add those results:
. use http://www.stata-press.com/data/r13/klein2
. forecast create example4, replace
(Forecast model kleinmodel ended.)
Forecast model example4 started.
. forecast estimates using http://www.stata-press.com/data/r13/klein
Added estimation results from reg3.
Forecast model example4 now contains 3 endogenous variables.

If you do not specify a file extension, forecast estimates assumes the file ends in .ster. You
are more likely to save your estimation results on your computer's disk drive rather than a web server,
but in either case, this example shows that you can fit equations in one session of Stata, save the
results to disk, and then build your forecast model later.

The estimates save command allows you to save multiple estimation results to the same file and
numbers them sequentially starting at 1. You can use the number() option of forecast estimates
using to specify which set of estimation results from the specified file you wish to add to the forecast
model. If you do not specify number(), forecast estimates using uses the first set of results.
When you use forecast estimates using, forecast loads the estimation results from disk
and stores them in memory using a temporary name. Later, when you proceed to solve your model,
forecast checks to see whether those estimation results are still in memory. If not, it will attempt
to reload them from the file you had specified. You should therefore not move or rename estimation
result files between the time you add them to your model and the time you solve the model.
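
As a minimal sketch of the workflow just described, the following do-file fragment saves two sets of estimation results to one file and later adds each set to a forecast model by number. The variables y, z, and x1, the file name myresults, and the model name mymodel are hypothetical; the fragment assumes a tsset dataset is in memory.

```stata
* Hypothetical variables y, z, x1; myresults and mymodel are made-up names
regress y L.y x1
estimates save myresults, replace
regress z L.z y
* append stores this as the second set of results in the same file
estimates save myresults, append

forecast create mymodel, replace
* add the first and then the second set of results from myresults.ster
forecast estimates using myresults, number(1)
forecast estimates using myresults, number(2)
```

Because forecast may reload the results when the model is solved, myresults.ster should stay where it is until after forecast solve has run.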

The predict option


As we mentioned while discussing the advise option, the forecast commands include code
to quickly obtain predictions from some of the most commonly used commands, while they use
predict to obtain predictions from other estimation commands. When you add estimation results
that require forecast to use predict, by default, forecast assumes that it can pass the option
xb on to predict to obtain the appropriate predicted values. You use the predict() option of
forecast estimates to specify the option that predict must use to obtain predicted values from
the estimates being added.
For example, suppose you used tobit to fit an equation whose dependent variable is left-censored
at zero and then stored the estimation results under the name tobitreg. When solving the model,
you want to use the predicted values of the left-truncated mean, the expected value of the dependent
variable conditional on its being greater than zero. Looking at the Syntax for predict in [R] tobit
postestimation, we see that the appropriate option we must pass to predict is e(0,.). To add this
estimation result to an existing forecast model, we would therefore type

. forecast estimates tobitreg, predict(e(0,.))

Now, whenever forecast calls predict with those estimation results, it will pass the option e(0,.)
so that we obtain the appropriate predictions. If you are adding results from a multiple-equation
estimation command with k dependent variables, then you must specify k predict options within
the predict() option, separated by spaces.

Forecasting with ARIMA models


Practitioners often use ARIMA models to forecast some of the variables in their models, and you
can certainly use estimation results produced by commands such as arima with forecast. There are
just two rules to follow when using commands that use the Kalman filter to obtain predictions. First,
do not specify the predict() option with forecast estimates. The forecast commands know
how to handle these estimators automatically. Second, as we stated earlier, the forecast commands
do not undo any time-series operators that may adorn the left-hand-side variables of estimation
results, so you must use forecast identity to specify identities to recover the underlying variables
in levels.
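
The two rules above can be sketched for the simplest case, a first-differenced dependent variable. The series y, the stored-results name dyarima, the model name dexample, and the new name dy are all hypothetical, and a tsset dataset is assumed.

```stata
* Hypothetical series y in a tsset dataset
arima D.y, ar(1) ma(1)
estimates store dyarima

forecast create dexample, replace
* Rule 1: no predict() option; name() gives the differenced left-hand side a plain name
forecast estimates dyarima, name(dy)
* Rule 2: D.y = y - L.y, so the level is recovered as y = L.y + dy
forecast identity y = L.y + dy
```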

Example 5: An ARIMA model with first- and seasonal-differencing


wpi1.dta contains quarterly observations on the variable wpi. First, let's fit a multiplicative
seasonal ARIMA model with both first- and seasonal-difference operators applied to the dependent
variable and store the estimation results:
. use http://www.stata-press.com/data/r13/wpi1
. arima wpi, arima(1, 1, 1) sarima(1, 1, 1, 4)
(output omitted )
. estimates store arima

(For details on fitting seasonal ARIMA models, see [TS] arima).


With the difference operators used here, when forecast calls predict, it will obtain predictions
in terms of DS4.wpi. Using the definitions of time-series operators in [TS] tsset, we have
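$$\mathtt{DS4.wpi}_t = (\mathtt{wpi}_t - \mathtt{wpi}_{t-4}) - (\mathtt{wpi}_{t-1} - \mathtt{wpi}_{t-5})$$
so that
$$\mathtt{wpi}_t = \mathtt{DS4.wpi}_t + \mathtt{wpi}_{t-4} + (\mathtt{wpi}_{t-1} - \mathtt{wpi}_{t-5})$$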
Because our arima results include a dependent variable with time-series operators, we must use the
name() option of forecast estimates to specify an alternative variable name. We will name ours
ds4wpi. Then we can specify an identity by using the previous equation to recover our forecasts in
terms of wpi. We type
. forecast create arimaexample, replace
(Forecast model example4 ended.)
Forecast model arimaexample started.
. forecast estimates arima, name(ds4wpi)
Added estimation results from arima.
Forecast model arimaexample now contains 1 endogenous variable.
. forecast identity wpi = ds4wpi + L4.wpi + (L.wpi - L5.wpi)
Forecast model arimaexample now contains 2 endogenous variables.

. forecast solve, begin(tq(1988q1))
Computing dynamic forecasts for model arimaexample.
Starting period: 1988q1
Ending period:   1990q4
Forecast prefix: f_

1988q1: .............
1988q2: ...............
1988q3: ...............
(output omitted )
1990q4: ............
Forecast 2 variables spanning 12 periods.

Because our entire forecast model consists of a single equation fit by arima, we can also call predict
to obtain forecasts:
. predict a_wpi, y dynamic(tq(1988q1))
(5 missing values generated)
. list t f_wpi a_wpi in -5/l

            t      f_wpi      a_wpi
120.   1989q4   110.2182   110.2182
121.   1990q1   111.6782   111.6782
122.   1990q2   112.9945   112.9945
123.   1990q3   114.3281   114.3281
124.   1990q4   115.5142   115.5142

Looking at the last few observations in the dataset, we see that the forecasts produced by forecast
(f_wpi) match those produced by predict (a_wpi). Of course, the advantage of forecast is that
we can combine multiple sets of estimation results and obtain forecasts for an entire system of
equations.

Technical note
Do not add estimation results to your forecast model that you have stored after calling an estimation
command with the by: prefix. The stored estimation results will contain information from only the
last group on which the estimation command was executed. forecast will then use those results for
all observations in the forecast horizon regardless of the value of the group variable you specified
with by:.

References
Cameron, A. C., and P. K. Trivedi. 2010. Microeconometrics Using Stata. Rev. ed. College Station, TX: Stata Press.
Duan, N. 1983. Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical
Association 78: 605-610.
Klein, L. R. 1950. Economic Fluctuations in the United States 1921-1941. New York: Wiley.

Also see
[TS] forecast Econometric model forecasting
[R] estimates Save and manipulate estimation results
[R] predict Obtain predictions, residuals, etc., after estimation

Title
forecast exogenous Declare exogenous variables

Syntax

Description

Remarks and examples

Also see

Syntax
forecast exogenous varlist

Description
forecast exogenous declares exogenous variables in the current forecast model.

Remarks and examples


For an overview of the forecast commands, see [TS] forecast. This manual entry assumes you
have already read that manual entry. forecast exogenous declares exogenous variables in your
forecast model.
Before you can solve your model, all the exogenous variables must be filled in with nonmissing
values over the entire forecast horizon. When you use forecast solve, Stata first checks your
exogenous variables and exits with an error message if any of them contains missing values for any
periods being forecast. When you assemble a large model with many variables, it is easy to forget
some variables and then have problems obtaining forecasts. forecast exogenous provides you with
a mechanism to explicitly declare the exogenous variables in your model so that you do not forget
about them.
Declaring exogenous variables with forecast exogenous is not strictly necessary, but we
nevertheless strongly encourage doing so. Stata can check the exogenous variables before solving the
model and issue an appropriate error message if missing values are found, whereas troubleshooting
models for which forecasting failed is more difficult after the fact.

Example 1
Here we fit a simple single-equation dynamic model with two exogenous variables, x1 and x2:
. use http://www.stata-press.com/data/r13/forecastex1
. quietly regress y L.y x1 x2
. estimates store exregression
. forecast create myexample
Forecast model myexample started.
. forecast estimates exregression
Added estimation results from regress.
Forecast model myexample now contains 1 endogenous variable.
. forecast exogenous x1
Forecast model myexample now contains 1 declared exogenous variable.
. forecast exogenous x2
Forecast model myexample now contains 2 declared exogenous variables.

Instead of using forecast exogenous twice, we could have typed


. forecast exogenous x1 x2
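
Because forecast solve stops with an error when a declared exogenous variable has missing values in the forecast horizon, a quick check before solving can save time. The line below is only a sketch using the variables from this example; it counts missing values over the whole dataset, so you may want to add an if condition that restricts it to your own forecast horizon.

```stata
* Count observations in which x1 or x2 is missing (whole dataset, not just the horizon)
count if missing(x1, x2)
```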

Also see
[TS] forecast Econometric model forecasting

Title
forecast identity Add an identity to a forecast model
Syntax
Also see

Description

Options

Remarks and examples

Stored results

Syntax
forecast identity varname = exp [, options]

options       Description
generate      create new variable varname
double        store new variable as a double instead of as a float

varname is the name of an endogenous variable to be added to the forecast model.

You can only specify double if you also specify generate.

Description
forecast identity adds an identity to the forecast model currently in memory. You must
first create a new model using forecast create before you can add an identity with forecast
identity. An identity is a nonstochastic equation that expresses an endogenous variable in the model
as a function of other variables in the model. Identities often describe the behavior of endogenous
variables that are based on accounting identities or adding-up conditions.

Options
generate specifies that the new variable varname be created equal to exp for all observations in the
current dataset. By default, forecast identity exits with an error if varname does not exist.
double, for use in conjunction with the generate option, requests that the new variable be created
as a double. By default, the new variable is created as a float. See [D] data types.

Remarks and examples


For an overview of the forecast commands, see [TS] forecast. This manual entry assumes you
have already read that manual entry. forecast identity specifies a nonstochastic equation that
determines the value of an endogenous variable in the model. When you type
. forecast identity varname = exp

forecast identity registers varname as an endogenous variable in your forecast model that is
equal to exp, where exp is a valid Stata expression that is typically a function of other endogenous
variables and exogenous variables in your model and perhaps lagged values of varname as well.
forecast identity was used in all the examples in [TS] forecast.

Example 1: Variables with constant growth rates


Some models contain variables that you are willing to assume will grow at a constant rate throughout
the forecast horizon. For example, say we have a model using annual data and want to assume that
our population variable pop grows at 0.75% per year. Then we can declare endogenous variable pop
by using forecast identity:
. forecast identity pop = 1.0075*L.pop
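
Because this identity is a first-order recursion, its solution over the forecast horizon can be written in closed form. The following is a worked illustration rather than part of the original entry; T denotes the last year with observed data and h a hypothetical number of years ahead.

$$\mathtt{pop}_{T+h} = 1.0075^{h}\,\mathtt{pop}_{T}, \qquad 1.0075^{10} \approx 1.0776$$

Under this assumption, the forecast population is about 7.8% larger after ten years.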

Typically, you use forecast identity to define the relationship that determines an endogenous
variable that is already in your dataset. For example, in example 1 of [TS] forecast, we used forecast
identity to define total wages as the sum of government and private-sector wages, and the total
wage variable already existed in our dataset.
The generate option of forecast identity is useful when you wish to use a transformation of
one or more endogenous variables as a right-hand-side variable in a stochastic equation that describes
another endogenous variable. For example, say you want to use regress to model variable y as
a function of the ratio of two endogenous variables, u and w, as well as other covariates. Without
the generate option of forecast identity, you would have to define the ratio variable, say, r = u/w,
twice: first, you would have to use the generate command to create the variable before fitting your
regression model, and then you would have to use forecast identity to add an identity to your
forecast model to define r in terms of u and w. Assuming you have already created your forecast
model, the generate option allows you to define the ratio variable just once, before you fit the
regression equation. In this example, the ratio variable is easy enough to specify twice, but it is very
easy to forget to include identities that define regressors used in estimation results while building
large forecast models. In other cases, an endogenous variable may be a more complicated function of
other endogenous variables, so having to specify the function only once reduces the chance for error.
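
A minimal sketch of that workflow follows. All names are hypothetical: u and w are endogenous variables defined elsewhere in the model, x1 stands in for the other covariates, r is the new ratio variable, and yeq is the name under which the regression results are stored. A forecast model is assumed to have been created already.

```stata
* Define the ratio once: this creates r in the dataset and adds the identity to the model
forecast identity r = u/w, generate
* Fit the stochastic equation that uses the ratio as a regressor
regress y r x1
estimates store yeq
forecast estimates yeq
```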

Stored results
forecast identity stores the following in r():
Macros
    r(lhs)          left-hand-side (endogenous) variable
    r(rhs)          right-hand side of identity
    r(basenames)    base names of variables found on right-hand side
    r(fullnames)    full names of variables found on right-hand side

Also see
[TS] forecast Econometric model forecasting

Title
forecast list List forecast commands composing current model
Syntax

Description

Options

Remarks and examples

Reference

Also see

Syntax
forecast list [, options]

options                          Description
saving(filename[, replace])      save list of commands to file
notrim                           do not remove extraneous white space

Description
forecast list produces a list of forecast commands issued since the current model was
started.

Options


saving(filename[, replace]) requests that forecast list write the list of commands to disk
with filename. If no extension is specified, .do is assumed. If filename already exists, an error is
issued unless you specify replace, in which case the file is overwritten.
notrim requests that forecast list not remove any extraneous spaces and that commands be
shown exactly as they were originally entered. By default, superfluous white space is removed.

Remarks and examples


For an overview of the forecast commands, see [TS] forecast. This manual entry assumes you
have already read that manual entry. forecast list produces a list of all the forecast commands
you would need to enter to re-create the forecast model currently in memory. Unlike using a command
log, forecast list shows only the forecast-related commands, not any estimation commands
or other commands you may have issued. If you specify saving(filename), forecast list saves
the list as filename.do, which you can then edit using the Do-file Editor.
forecast creates models by accumulating estimation results, identities, and other features that you
add to the model by using various forecast subcommands. Once you add a feature to a model, it
remains a part of the model until you clear the entire model from memory. forecast list provides
a list of all the forecast commands you would need to rebuild the current model.
When building all but the smallest forecast models, you will typically write a do-file to load
your dataset, perhaps call some estimation commands, and issue a sequence of forecast commands
to build and solve your forecast model. There are times, though, when you will type a forecast
command interactively and then later want to undo the command or else wish you had not typed the
command in the first place. forecast list provides the solution.

Suppose you use forecast adjust to perform some policy simulations and then decide you want
to remove those adjustments from the model. forecast list makes this easy to do. You simply
call forecast list with the saving() option to produce a do-file that contains all the forecast
commands issued since the model was created. Then you can edit the do-file to remove the forecast
adjust command, type forecast clear, and run the do-file.
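
The steps just described might look like the following when typed interactively. The file name mymodel is hypothetical, and the editing step is done by hand in the Do-file Editor.

```stata
* Write the forecast commands issued so far to mymodel.do
forecast list, saving(mymodel, replace)
* Open the file, delete the forecast adjust line(s), and save it
doedit mymodel.do
* Clear the model from memory and rebuild it without the adjustments
forecast clear
do mymodel
```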

Example 1: Klein's model


In example 1 of [TS] forecast, we obtained forecasts from Klein's (1950) macroeconomic model.
If we type forecast list after typing all the commands in that example, we obtain
. forecast list
forecast create kleinmodel
forecast estimates klein
forecast identity y = c + i + g
forecast identity p = y - t - wp
forecast identity k = L.k + i
forecast identity w = wg + wp
forecast exogenous wg
forecast exogenous g
forecast exogenous t
forecast exogenous yr

The forecast solve command is not included in output produced by forecast list because
solving the model does not add any features to the model.

Technical note
To prevent you from accidentally destroying the model in memory, forecast list does not add
the replace option to forecast create even if you specified replace when you originally called
forecast create.

Reference
Klein, L. R. 1950. Economic Fluctuations in the United States 1921-1941. New York: Wiley.

Also see
[TS] forecast Econometric model forecasting

Title
forecast query Check whether a forecast model has been started

Syntax

Description

Remarks and examples

Stored results

Also see

Syntax
forecast query

Description
forecast query issues a message indicating whether a forecast model has been started.

Remarks and examples


For an overview of the forecast commands, see [TS] forecast. This manual entry assumes you
have already read that manual entry. forecast query allows you to check whether a forecast model
has been started. Most users of the forecast commands will not need to use forecast query.
This command is most useful to programmers.
Suppose there is no forecast model in memory:
. forecast query
No forecast model exists.

Now we create a forecast model named fcmodel:


. forecast create fcmodel
Forecast model fcmodel started.
. forecast query
Forecast model fcmodel exists.

Stored results
forecast query stores the following in r():
Scalars
    r(found)    1 if model started; 0 otherwise
Macros
    r(name)     model name
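
For programmers, the stored results make it possible to branch on whether a model already exists. The following do-file fragment is a sketch only; the model name fcmodel is illustrative.

```stata
* Start a model only when none is in memory
quietly forecast query
if r(found) == 0 {
    forecast create fcmodel
}
else {
    display "using existing forecast model `r(name)'"
}
```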

Also see
[TS] forecast Econometric model forecasting
[TS] forecast describe Describe features of the forecast model

Title
forecast solve Obtain static and dynamic forecasts
Syntax
Stored results

Description
Methods and formulas

Options
References

Remarks and examples


Also see

Syntax
forecast solve [, { prefix(stub) | suffix(stub) } options]

options                    Description
Model
  prefix(string)           specify prefix for forecast variables
  suffix(string)           specify suffix for forecast variables
  begin(time_constant)     specify period to begin forecasting
  end(time_constant)       specify period to end forecasting
  periods(#)               specify number of periods to forecast
  double                   store forecast variables as doubles instead of as floats
  static                   produce static forecasts instead of dynamic forecasts
  actuals                  use actual values if available instead of forecasts
Simulation
  simulate(sim_technique, sim_statistic sim_options)
                           specify simulation technique and options
Reporting
  log(log_level)           specify level of logging display; log_level may be detail,
                           on, brief, or off
Solver
  vtolerance(#)            specify tolerance for forecast values
  ztolerance(#)            specify tolerance for function zero
  iterate(#)               specify maximum number of iterations
  technique(technique)     specify solution method; may be dampedgaussseidel #,
                           gaussseidel, broydenpowell, or newtonraphson

You can specify prefix() or suffix() but not both.
You can specify end() or periods() but not both.

sim_technique              Description
betas                      draw multivariate-normal parameter vectors
errors                     draw additive errors from multivariate normal distribution
residuals                  draw additive residuals based on static forecast errors

You can specify one or two sim_methods separated by a space, though you cannot specify both errors and residuals.

sim_statistic is
    statistic(statistic, { prefix(string) | suffix(string) })
and may be repeated up to three times.

statistic                  Description
mean                       record the mean of the simulation forecasts
variance                   record the variance of the simulation forecasts
stddev                     record the standard deviation of the simulation forecasts

sim_options                Description
saving(filename, ...)      save results to file; save statistics in double precision; save results to
                           filename every # replications
nodots                     suppress replication dots
reps(#)                    perform # replications; default is reps(50)

Description
forecast solve computes static or dynamic forecasts based on the model currently in memory.
Before you can solve a model, you must first create a new model using forecast create and add
equations and variables to it using the commands summarized in [TS] forecast.

Options


Model

prefix(string) and suffix(string) specify a name prefix or suffix that will be used to name the
variables holding the forecast values of the variables in the model. You may specify prefix() or
suffix() but not both. Sometimes, it is more convenient to have all forecast variables start with
the same set of characters, while other times, it is more convenient to have all forecast variables
end with the same set of characters.
If you specify prefix(f_), then the forecast values of endogenous variables x, y, and z will be
stored in new variables f_x, f_y, and f_z.
If you specify suffix(_g), then the forecast values of endogenous variables x, y, and z will be
stored in new variables x_g, y_g, and z_g.
begin(time_constant) requests that forecast begin forecasting at period time_constant. By default,
forecast determines when to begin forecasting automatically.
end(time_constant) requests that forecast end forecasting at period time_constant. By default,
forecast produces forecasts for all periods on or after begin() in the dataset.
periods(#) specifies the number of periods after begin() to forecast. By default, forecast
produces forecasts for all periods on or after begin() in the dataset.
double requests that the forecast and simulation variables be stored in double precision. The default
is to use single-precision floats. See [D] data types for more information.

static requests that static forecasts be produced. Actual values of variables are used wherever
lagged values of the endogenous variables appear in the model. By default, dynamic forecasts are
produced, which use the forecast values of variables wherever lagged values of the endogenous
variables appear in the model. Static forecasts are also called one-step-ahead forecasts.
actuals specifies how nonmissing values of endogenous variables in the forecast horizon are treated.
By default, nonmissing values are ignored, and forecasts are produced for all endogenous variables.
When you specify actuals, forecast sets the forecast values equal to the actual values if they
are nonmissing. The forecasts for the other endogenous variables are then conditional on the known
values of the endogenous variables with nonmissing data.
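
The Model options above can be combined as in the following sketch. The dates and the prefix and suffix strings are illustrative, and a quarterly tsset dataset with an existing forecast model is assumed; the two commands are alternatives, not steps to run in sequence.

```stata
* Dynamic forecasts for eight quarters starting in 2008q1, stored in x_g, y_g, ...
forecast solve, suffix(_g) begin(tq(2008q1)) periods(8)

* Static (one-step-ahead) forecasts over an explicit window, stored as doubles in s_x, s_y, ...
forecast solve, prefix(s_) begin(tq(2008q1)) end(tq(2009q4)) static double
```

Static forecasts use actual lagged values of the endogenous variables, so they are meaningful only over periods for which those actual values are present in the dataset.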

Simulation

simulate(sim_technique, sim_statistic sim_options) allows you to simulate your model to obtain
measures of uncertainty surrounding the point forecasts produced by the model. Simulating a
model involves repeatedly solving the model, each time accounting for the uncertainty associated
with the error terms and the estimated coefficient vectors.
sim_technique can be betas, errors, or residuals, or you can specify both betas and one of
errors or residuals separated by a space. You cannot specify both errors and residuals.
The sim_technique controls how uncertainty is introduced into the model.
sim_statistic specifies a summary statistic to summarize the forecasts over all the simulations.
sim_statistic takes the form
statistic(statistic, { prefix(string) | suffix(string) })
where statistic may be mean, variance, or stddev. You may specify either the prefix or the
suffix that will be used to name the variables that will contain the requested statistic. You
may specify up to three sim_statistics, allowing you to track the mean, variance, and standard
deviations of your forecasts.


sim_options include saving(filename[, suboptions]), nodots, and reps(#).


saving(filename[, suboptions]) creates a Stata data file (.dta file) consisting of (for each
endogenous variable in the model) a variable containing the simulated values.
double specifies that the results for each replication be saved as doubles, meaning 8-byte reals.
By default, they are saved as floats, meaning 4-byte reals.
replace specifies that filename be overwritten if it exists.
every(#) specifies that results be written to disk every #th replication. every() should be
specified only in conjunction with saving() when the command takes a long time for each
replication. This will allow recovery of partial results should some other software crash your
computer. See [P] postfile.
nodots suppresses display of the replication dots. By default, one dot character is displayed for
each successful replication. If during a replication convergence is not achieved, forecast
solve exits with an error message.
reps(#) requests that forecast solve perform # replications; the default is reps(50).
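
Putting the pieces of simulate() together, a call might look like the sketch below. The prefixes s_ and sd_, the start date, and the number of replications are illustrative choices, and the fragment assumes a forecast model whose estimation results support the betas and errors techniques.

```stata
* Point forecasts go in s_*; the standard deviation across 100 simulated paths goes in sd_*
forecast solve, prefix(s_) begin(tq(2008q1))       ///
    simulate(betas errors, statistic(stddev, prefix(sd_)) reps(100))
```

For a model with endogenous variables gdp and oil, for example, this would leave the point forecasts in s_gdp and s_oil and the simulated standard deviations in sd_gdp and sd_oil.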

Reporting

log(log_level) specifies the level of logging provided while solving the model. log_level may be
detail, on, brief, or off.

log(detail) provides a detailed iteration log including the current values of the convergence
criteria for each period in each panel (in the case of panel data) for which the model is being
solved.
log(on), the default, provides an iteration log showing the current panel and period for which
the model is being solved as well as a sequence of dots for each period indicating the number of
iterations.
log(brief), when used with a time-series dataset, is equivalent to log(on). When used with a
panel dataset, log(brief) produces an iteration log showing the current panel being solved but
does not show which period within the current panel is being solved.
log(off) requests that no iteration log be produced.

Solver

vtolerance(#), ztolerance(#), and iterate(#) control when the solver of the system of
equations stops. ztolerance() is ignored if either technique(dampedgaussseidel #) or
technique(gaussseidel) is specified. These options are seldom used. See [M-5] solvenl( ).
technique(technique) specifies the technique to use to solve the system of equations. technique
may be dampedgaussseidel #, gaussseidel, broydenpowell, or newtonraphson, where
0 < # < 1 specifies the amount of damping with smaller numbers indicating less damping.
The default is technique(dampedgaussseidel 0.2), which works well in most situations.
If you have convergence issues, first try continuing to use dampedgaussseidel # but with a
larger damping factor. Techniques broydenpowell and newtonraphson usually work well, but
because they require the computation of numerical derivatives, they tend to be much slower. See
[M-5] solvenl( ).
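
If the default solver does not converge, the options above might be used as follows. The damping factor 0.5 and the iteration limit 500 are arbitrary illustrative values, and the two commands are alternatives to try, not a sequence.

```stata
* Alternative 1: keep damped Gauss-Seidel, use a larger damping factor, allow more iterations
forecast solve, technique(dampedgaussseidel 0.5) iterate(500)

* Alternative 2: switch to a derivative-based technique and show the detailed iteration log
forecast solve, technique(newtonraphson) log(detail)
```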

Remarks and examples


For an overview of the forecast commands, see [TS] forecast. This manual entry assumes you
have already read that manual entry. The forecast solve command solves a forecast model in
Stata. Before you can solve a model, you must first create a model using forecast create, and you
must add at least one equation using forecast estimates, forecast coefvector, or forecast
identity. We covered the most commonly used options of forecast solve in the examples in
[TS] forecast.
Here we focus on two sets of options that are available with forecast solve. First, we discuss
the actuals option, which allows you to obtain forecasts conditional on prespecified values for one
or more of the endogenous variables. Then we focus on performing simulations to obtain estimates
of uncertainty around the point forecasts.
Remarks are presented under the following headings:
Performing conditional forecasts
Using simulations to measure forecast accuracy

Performing conditional forecasts


Sometimes, you already know the values of some of the endogenous variables in the forecast
horizon and would like to obtain forecasts for the remaining endogenous variables conditional on
those known values. Other times, you may not know the values but would nevertheless like to specify
a path for some endogenous variables and see how the others would evolve conditional on that path.
To accomplish these types of exercises, you can use the actuals option of forecast solve.

Example 1: Specifying alternative scenarios


gdpoil.dta contains quarterly data on the annualized growth rate of GDP and the percentage
change in the quarterly average price of oil through the end of 2007. We want to explore how GDP
would have evolved if the price of oil had risen 10% in each of the first three quarters of 2008 and
then held steady for several years. We will use a bivariate vector autoregression (VAR) to forecast the
variables gdp and oil. Results obtained from the varsoc command indicate that the Hannan-Quinn
information criterion is minimized when the VAR includes two lags. First, we fit our VAR model and
store the estimation results:
. use http://www.stata-press.com/data/r13/gdpoil
. var gdp oil, lags(1 2)
Vector autoregression

Sample:  1986q4 - 2007q4                        No. of obs      =         85
Log likelihood = -500.0749                      AIC             =   12.00176
FPE            =  559.0724                      HQIC            =   12.11735
Det(Sigma_ml)  =  441.7362                      SBIC            =   12.28913

Equation           Parms      RMSE     R-sq       chi2     P>chi2
gdp                   5     1.88516   0.1820   18.91318    0.0008
oil                   5     11.8776   0.1140   10.93614    0.0273

             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
gdp          |
         gdp |
         L1. |   .1498285   .1015076     1.48   0.140    -.0491227    .3487797
         L2. |   .3465238   .1022446     3.39   0.001      .146128    .5469196
             |
         oil |
         L1. |  -.0374609   .0167968    -2.23   0.026     -.070382   -.0045399
         L2. |   .0119564   .0164599     0.73   0.468    -.0203043    .0442172
             |
       _cons |   1.519983   .4288145     3.54   0.000     .6795226    2.360444
-------------+----------------------------------------------------------------
oil          |
         gdp |
         L1. |   .8102233   .6395579     1.27   0.205    -.4432871    2.063734
         L2. |   1.090244   .6442017     1.69   0.091    -.1723684    2.352856
             |
         oil |
         L1. |   .0995271   .1058295     0.94   0.347    -.1078949    .3069491
         L2. |  -.1870052    .103707    -1.80   0.071    -.3902672    .0162568
             |
       _cons |  -4.041859   2.701785    -1.50   0.135     -9.33726    1.253543

. estimates store var

The dataset ends in the fourth quarter of 2007, so before we can produce forecasts for 2008 and
beyond, we need to extend our dataset. We can do that using the tsappend command. Here we
extend our dataset three years:
. tsappend, add(12)

Now we can create a forecast model and obtain baseline forecasts:


. forecast create oilmodel
Forecast model oilmodel started.
. forecast estimates var
Added estimation results from var.
Forecast model oilmodel now contains 2 endogenous variables.
. forecast solve, prefix(bl_)
Computing dynamic forecasts for model oilmodel.
Starting period: 2008q1
Ending period:   2010q4
Forecast prefix: bl_
2008q1: .................
(output omitted )
2010q4: ............
Forecast 2 variables spanning 12 periods.

To see how GDP evolves if oil prices increase 10% in each of the first three quarters of 2008
and then remain flat, we need to obtain a forecast for gdp conditional on a specified path for
oil. The actuals option of forecast solve will do that for us. With the actuals option, if an
endogenous variable contains a nonmissing value for the period currently being forecast, forecast
solve will use that value as the forecast, overriding whatever value might be produced by that
variable's underlying estimation result or identity. Then the endogenous variables with missing values
will be forecast conditional on the endogenous variables that do have valid data. Here we fill in oil
with our hypothesized price path:
. replace oil = 10 if qdate == tq(2008q1)
(1 real change made)
. replace oil = 10 if qdate == tq(2008q2)
(1 real change made)
. replace oil = 10 if qdate == tq(2008q3)
(1 real change made)
. replace oil = 0 if qdate > tq(2008q3)
(9 real changes made)
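
The same price path can be written more compactly with the tin() function, which selects a range of periods in a tsset dataset. This is an equivalent alternative to the four replace commands above, offered as a sketch, and is not meant to be run in addition to them.

```stata
* 10% growth in each of the first three quarters of 2008, then flat
replace oil = 10 if tin(2008q1, 2008q3)
replace oil = 0  if qdate > tq(2008q3)
```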

Now we obtain forecasts conditional on our oil variable. We will use the prefix alt_ for these
forecast variables:
. forecast solve, prefix(alt_) actuals
Computing dynamic forecasts for model oilmodel.
Starting period: 2008q1
Ending period:   2010q4
Forecast prefix: alt_

2008q1: ...............
(output omitted )
2010q4: ...........
Forecast 2 variables spanning 12 periods.
Forecasts used actual values if available.

Finally, we make a variable containing the difference between our alternative and our baseline gdp
forecasts and graph it:
. generate diff_gdp = alt_gdp - bl_gdp
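
The graph command itself is not shown in this entry. One way to plot the difference over the forecast horizon is sketched below; it is an assumption about how such a graph could be drawn, not the command used to produce the manual's figure, and it plots against calendar quarters.

```stata
* Line plot of the difference between the alternative and baseline gdp forecasts
tsline diff_gdp if tin(2008q1, 2010q4),            ///
    title("Oil's Effect on GDP") ytitle("Change in Annualized GDP Growth")
```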

[Figure: Oil's Effect on GDP. Vertical axis: Change in Annualized GDP Growth. Horizontal axis: Quarters since shock. Note: Assumes oil increases 10% for 3 quarters, then holds steady.]
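
The manual does not show the graph command it used. A minimal sketch of one way to plot the difference follows; it assumes qdate is the quarterly time variable used in the replace commands above, and the titles are illustrative choices rather than the original graph's settings.

```stata
* Sketch only: plot the gap between the alternative and baseline GDP forecasts.
* Assumes qdate is the quarterly time variable from the example above.
twoway line diff_gdp qdate if qdate >= tq(2008q1), ///
    yline(0)                                       ///
    ytitle("Change in annualized GDP growth")      ///
    title("Oil's effect on GDP")
```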

Our model indicates GDP growth would be about 0.4% less in the second through fourth quarters of
2008 than it would otherwise be, but would be mostly unaffected thereafter if oil prices followed our
hypothetical path. The one-quarter lag in the response of GDP is due to our using a VAR model. In
our VAR model, lagged values of oil predict the current value of gdp, but the current value of oil
does not.

Technical note
The previous example allowed us to demonstrate forecast solve's actuals option, but in fact
measuring the economy's response to oil shocks is much more difficult than our simple VAR analysis
would suggest. One obvious complication is that positive and negative oil price shocks do not have
symmetric effects on the economy. In our simple model, if a 50% increase in oil prices lowers GDP
by x%, then a 50% decrease in oil prices must raise GDP by x%. However, a 50% decrease in oil
prices is perhaps more likely to portend weakness in the economy rather than an imminent growth
spurt. See, for example, Hamilton (2003) and Kilian and Vigfusson (2013).

Another way to specify alternative scenarios for your forecasts is to use the forecast adjust
command. That command is more flexible in the types of manipulations you can perform on endogenous
variables but, depending on the task at hand, may involve more effort. The actuals option of
forecast solve and the forecast adjust command are complementary. There is much overlap
in what you can achieve; in some situations, specifying the actuals option will be easier, while in
other situations, using adjustments via forecast adjust will prove to be easier.
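
As a rough illustration of the overlap, the oil-price scenario above could perhaps be expressed with adjustments instead of the actuals option. This is a sketch, not taken from the manual. It assumes the forecast adjust syntax documented in [TS] forecast adjust (varname = exp with an optional if qualifier) and that model oilmodel is still the active forecast model. The prefix adj_ is a hypothetical name.

```stata
* Sketch only: pin oil to the hypothesized path through adjustments.
forecast adjust oil = 10 if qdate >= tq(2008q1) & qdate <= tq(2008q3)
forecast adjust oil = 0  if qdate >  tq(2008q3)
* Solve without the actuals option; adj_ is a hypothetical prefix.
forecast solve, prefix(adj_)
```

Adjustments remain part of the forecast model after it is solved, so this approach changes the model itself, whereas the actuals option only changes how one particular solve treats the data.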

228

forecast solve Obtain static and dynamic forecasts

Using simulations to measure forecast accuracy


To motivate the discussion, we will focus on the simple linear regression model. Even though
forecast can handle models with many equations with equal ease, all the issues that arise can be
illustrated with one equation. Suppose we have the following relationship between variables y and
x:
$$y_t = \alpha + \beta x_t + \epsilon_t \tag{1}$$
where $\epsilon_t$ is a zero-mean error term. Say we fit (1) by ordinary least squares (OLS) using observations
$1, \ldots, T$ and obtain the point estimates $\widehat{\alpha}$ and $\widehat{\beta}$. Assuming we have data for exogenous variable $x$
at time $T + 1$, we could forecast $y_{T+1}$ as
$$\widehat{y}_{T+1} = \widehat{\alpha} + \widehat{\beta} x_{T+1} \tag{2}$$
However, there are several factors that prevent us from guaranteeing ex ante that $y_{T+1}$ will indeed
equal $\widehat{y}_{T+1}$. We must assume that (1) specifies the correct relationship between y and x. Even if that
relationship held for times 1 through T, are we sure it will hold at time T + 1? Uncertainty due to
issues like these is inherent to the type of forecasting that the forecast commands are designed for.
Here we discuss two additional sources of uncertainty that forecast solve can help you measure.
First, we estimated $\alpha$ and $\beta$ by OLS to obtain $\widehat{\alpha}$ and $\widehat{\beta}$, but we must emphasize the word estimated.
Our estimates are subject to sampling error. When you fit a regression using regress or any other
estimation command, Stata presents not just the point estimates of the parameters but also the standard
errors and confidence intervals representing the level of uncertainty surrounding those point estimates.
Uncertainty surrounding the true values of $\alpha$ and $\beta$ means that there is some level of uncertainty
surrounding our predicted value $\widehat{y}_{T+1}$ as well.
Second, (1) states that $y_t$ depends not just on $\alpha$, $\beta$, and $x_t$ but also on an unobserved error term
$\epsilon_t$. When we make our forecast using (2), we assume that the error term will equal its expected value
of zero. Saying a random error has an expected value of zero is clearly not the same as saying it
will be zero every time. If a positive outside shock occurs at T + 1, $y_{T+1}$ will be higher than our
estimate based on (2) would lead us to believe.
Fortunately, quantifying both these sources of uncertainty is straightforward using simulation. First,
we solve our model as usual, providing us with our point forecasts. To see how uncertainty surrounding
our estimated parameters affects our forecasts, we can take random draws from a multivariate normal
distribution whose mean is $(\widehat{\alpha}, \widehat{\beta})$ and whose variance is the covariance matrix produced by regress.
We then solve our model using these randomly drawn parameters rather than the original point
estimates. If we repeat the process of drawing random parameters and solving the model many times,
we can use the variance or standard deviation across replications for each time period as a measure
of uncertainty.
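
The idea of drawing parameters can be sketched by hand for a single regression. The example below is only an illustration of the principle, not how forecast solve is implemented. The auto dataset, the variable names b_weight and b_cons, and the evaluation point weight = 3000 are all stand-ins chosen for the sketch.

```stata
* Illustration only: auto.dta and weight = 3000 are stand-ins.
sysuse auto, clear
regress mpg weight
matrix b = e(b)            // point estimates, ordered (weight, _cons)
matrix V = e(V)            // estimated covariance matrix of the estimates
clear
* Draw 1,000 parameter vectors from a multivariate normal with mean b, variance V
drawnorm b_weight b_cons, n(1000) means(b) cov(V) seed(1)
* "Solve the model" once per draw: predicted mpg for a 3,000-lb car
generate double yhat = b_cons + b_weight*3000
* The standard deviation across draws measures parameter uncertainty
summarize yhat
```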
To account for uncertainty surrounding the error term, we can also use simulation. Here, at each
replication, we add a random noise term to our forecast for $y_{T+1}$, where we draw our random errors
such that they have the same characteristics as $\epsilon_t$. There are two ways we can do that. First, all the
estimation commands commonly used in forecasting provide us with an estimate of the variance or
standard deviation of the error term. For example, regress labels the estimated standard deviation
of the error term Root MSE and conveniently stores it in e(rmse), where forecast can access it. If
we are willing to assume that all the errors in the equations in our model are normally distributed,
then we can use random-normal errors drawn with means equal to zero and variances as reported by
the estimation command used to fit each equation.
Sometimes the assumption of normality is unpalatable. In those cases, an alternative is to solve the
model to obtain static forecasts and then compute the sample residuals based on the observations for
which we have nonmissing values of the endogenous variables. Then in our simulations, we randomly
choose one of the residuals observed for that equation.


At each replication, whether we draw errors from the normal distribution or from the pool of
static-forecast residuals, we add the drawn value to our estimate of $\widehat{y}_{T+1}$ to provide a simulated value
for our forecast. Then, just like when simulating parameter uncertainty, we can use the variance or
standard deviation across replications to measure uncertainty. In fact, we can perform simulations that
draw both random parameters and random errors to account for both sources of uncertainty at once.
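
The two ways of drawing errors can be sketched for a single regression as follows. This is an illustration of the idea under simple assumptions, not forecast solve's internal code. The auto dataset and the variable names yhat, res, pick, y_normal, and y_resamp are hypothetical.

```stata
* Illustration only: auto.dta stands in for a forecasting model.
sysuse auto, clear
regress mpg weight
predict double yhat, xb          // point prediction
predict double res, residuals    // in-sample residuals
set seed 1
* (a) random-normal errors with mean 0 and standard deviation e(rmse)
generate double y_normal = yhat + rnormal(0, e(rmse))
* (b) resampled errors: pick one observed residual at random
generate long pick = floor(runiform()*_N) + 1
generate double y_resamp = yhat + res[pick]
summarize yhat y_normal y_resamp
```

Approach (b) makes no distributional assumption but can only reproduce error values that were actually observed in the sample.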

Example 2: Accounting for parameter uncertainty


Here we revisit our Klein (1950) model from example 1 of [TS] forecast and perform simulations
in which we account for uncertainty associated with the estimated parameters of the model. First, we
load the dataset and set up our model:
. use http://www.stata-press.com/data/r13/klein2, clear
. quietly reg3 (c p L.p w) (i p L.p L.k) (wp y L.y yr), endog(w p y)
> exog(t wg g)
. estimates store klein
. forecast create kleinmodel, replace
(Forecast model oilmodel ended.)
Forecast model kleinmodel started.
. forecast estimates klein
Added estimation results from reg3.
Forecast model kleinmodel now contains 3 endogenous variables.
. forecast identity y = c + i + g
Forecast model kleinmodel now contains 4 endogenous variables.
. forecast identity p = y - t - wp
Forecast model kleinmodel now contains 5 endogenous variables.
. forecast identity k = L.k + i
Forecast model kleinmodel now contains 6 endogenous variables.
. forecast identity w = wg + wp
Forecast model kleinmodel now contains 7 endogenous variables.
. forecast exogenous wg
Forecast model kleinmodel now contains 1 declared exogenous variable.
. forecast exogenous g
Forecast model kleinmodel now contains 2 declared exogenous variables.
. forecast exogenous t
Forecast model kleinmodel now contains 3 declared exogenous variables.
. forecast exogenous yr
Forecast model kleinmodel now contains 4 declared exogenous variables.

Now we are ready to solve our model. We are going to begin dynamic forecasts in 1936, and we
are going to perform 100 replications. We will store the point forecasts in variables prefixed with d_,
and we will store the standard deviations of our forecasts in variables prefixed with sd_. Because
the simulations involve the use of random numbers, we must remember to set the random-number
seed if we want to be able to replicate our results; see [R] set seed. We type



. set seed 1
. forecast solve, prefix(d_) begin(1936)
> simulate(betas, statistic(stddev, prefix(sd_)) reps(100))
Computing dynamic forecasts for model kleinmodel.
Starting period:  1936
Ending period:    1941
Forecast prefix:  d_
1936: ............................................
1937: ..........................................
1938: .............................................
1939: .............................................
1940: ............................................
1941: ..............................................
Performing simulations (100)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50
..................................................   100
Forecast 7 variables spanning 6 periods.

The key here is the simulate() option. We requested that forecast solve perform 100 simulations
by taking random draws for the parameters (betas), and we requested that it record the standard
deviation (stddev) of each endogenous variable in new variables that begin with sd_. Next we
compute the upper and lower bounds of a 95% prediction interval for our forecast of total income y:
. gen d_y_up = d_y + invnormal(0.975)*sd_y
(16 missing values generated)
. gen d_y_dn = d_y + invnormal(0.025)*sd_y
(16 missing values generated)

We obtained 16 missing values after each generate because the simulation summary variables only
contain nonmissing data for the periods in which forecasts were made. The point-forecast variables
that begin with d_ in this example are filled in with the corresponding actual values of the endogenous
variables for periods before the beginning of the forecast horizon; in our experience, having both the
historical data and forecasts in one set of variables simplifies many tasks. Here we graph our forecast
of total income along with the 95% prediction interval:


[Figure: Total Income, 1935 to 1941. Solid lines denote actual values. Dashed lines denote forecast values. 95% confidence bands based on parameter uncertainty.]
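
The graph command itself is not shown in the manual. One possible way to draw a similar picture is sketched below. It assumes the data are tsset on a yearly time variable (forecast solve needs that anyway) and that d_y, d_y_up, and d_y_dn exist as created above. The line patterns and title are illustrative choices.

```stata
* Sketch only: actual income, point forecast, and interval bounds.
* Before 1936 the d_y series coincides with the actual values of y.
tsline y d_y d_y_up d_y_dn if tin(1935, 1941), ///
    lpattern(solid dash shortdash shortdash)   ///
    ytitle("Total income")
```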

Our next example will use the same forecast model, but we will not need the forecast variables
we just created. forecast drop makes removing those variables easy:
. forecast drop
(dropped 14 variables)

forecast drop drops all variables created by the previous invocation of forecast solve, including
both the point-forecast variables and any variables that contain simulation results. In this case,
forecast drop will remove all the variables that begin with sd_ as well as d_y, d_c, d_i, and
so on. However, we are not done yet. We created the variables d_y_dn and d_y_up ourselves, and
they were not part of the forecast model. Therefore, they are not removed by forecast drop, and
we need to do that ourselves:
. drop d_y_dn d_y_up

Example 3: Accounting for both parameter uncertainty and random errors


In the previous example, we measured uncertainty in our model stemming from the fact that our
parameters were estimated. Here we not only simulate random draws for the parameters but also add
random-normal errors to the stochastic equations. We type



. set seed 1
. forecast solve, prefix(d_) begin(1936)
> simulate(betas errors, statistic(stddev, prefix(sd_)) reps(100))
Computing dynamic forecasts for model kleinmodel.
Starting period:  1936
Ending period:    1941
Forecast prefix:  d_
1936: ............................................
1937: ..........................................
1938: .............................................
1939: .............................................
1940: ............................................
1941: ..............................................
Performing simulations (100)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50
..................................................   100
Forecast 7 variables spanning 6 periods.

The only difference between this call to forecast solve and the one in the previous example is that
here we specified betas errors in the simulate() option rather than just betas. Had we wanted
to perform simulations involving the parameters and random draws from the pool of static-forecast
residuals rather than random-normal errors, we would have specified betas residuals. After we
re-create the variables containing the bounds on our prediction interval, we obtain the following graph:

[Figure: Total Income, 1935 to 1941. Solid lines denote actual values. Dashed lines denote forecast values. 95% confidence bands based on parameter uncertainty and normally distributed errors.]

Notice that by accounting for both parameter and additive error uncertainty, our prediction interval
became much wider.
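
For comparison, the residual-resampling variant mentioned above could be run as follows. This is a sketch that reuses the option names from this example. It assumes the forecast variables from the previous solve are still in memory, which is why they are dropped first.

```stata
* Sketch only: same simulation, but with errors resampled from the
* static-forecast residuals instead of drawn from a normal distribution.
forecast drop                      // remove d_* and sd_* from the last solve
capture drop d_y_up d_y_dn         // bounds we created ourselves, if present
set seed 1
forecast solve, prefix(d_) begin(1936) ///
    simulate(betas residuals, statistic(stddev, prefix(sd_)) reps(100))
* Re-create the bounds exactly as in example 2
generate d_y_up = d_y + invnormal(0.975)*sd_y
generate d_y_dn = d_y + invnormal(0.025)*sd_y
```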


Stored results
forecast solve stores the following in r():
Scalars
    r(first_obs)       first observation in forecast horizon
    r(last_obs)        last observation in forecast horizon
                       (of first panel if forecasting panel data)
    r(Npanels)         number of panels forecast
    r(Nvar)            number of forecast variables
    r(vtolerance)      tolerance for forecast values
    r(ztolerance)      tolerance for function zero
    r(iterate)         maximum number of iterations
    r(sim_nreps)       number of simulations
    r(damping)         damping parameter for damped Gauss-Seidel
Macros
    r(prefix)          forecast variable prefix
    r(suffix)          forecast variable suffix
    r(actuals)         actuals, if specified
    r(static)          static, if specified
    r(double)          double, if specified
    r(sim_technique)   specified sim_technique
    r(logtype)         on, off, brief, or detail
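
A short sketch of how these results might be inspected follows. The prefix chk_ is a hypothetical name chosen so that the new forecast variables do not collide with existing ones, and the Klein model from the examples is assumed to be the active forecast model.

```stata
* Sketch only: look at what forecast solve leaves behind in r().
forecast solve, prefix(chk_) begin(1936)
return list
display "variables forecast: " r(Nvar)
display "prefix used:        `r(prefix)'"
forecast drop                      // clean up the chk_* variables
```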

Methods and formulas


Formalizing the definition of a model provided in [TS] forecast, we represent the endogenous
variables in the model as the $k \times 1$ vector $y$, and we represent the exogenous variables in the model as
the $m \times 1$ vector $x$. We refer to the contemporaneous values as $y_t$ and $x_t$; for notational simplicity,
we refer to lagged values as $y_{t-1}$ and $x_{t-1}$ with the implication that further lags of the variables
can also be included with no loss of generality. We use $\theta$ to refer to the vector of all the estimated
parameters in all the equations of the model. We use $u_t$ and $u_{t-1}$ to refer to contemporaneous and
lagged error terms, respectively.
The forecast commands solve models of the form
$$y_{it} = f_i(y_{-i,t}, y_{t-1}, x_t, x_{t-1}, u_t, u_{t-1}; \theta) \tag{3}$$
where $i = 1, \ldots, k$ and $y_{-i,t}$ refers to the $(k - 1) \times 1$ vector of endogenous variables other than $y_i$
at time $t$. If equation $j$ is an identity, we take $u_{jt} = 0$ for all $t$; for stochastic equations, the errors
correspond to the usual regression error terms. Equation (3) does not include subscripts indexing
panels for notational simplicity, but the extension is obvious. A model is solvable if $k \geq 1$. $m$ may
be zero.
Endogenous variables are added to the forecast model via forecast estimates, forecast
identity, and forecast coefvector. Equations added via forecast estimates are always
stochastic, while equations added via forecast identity are always nonstochastic. Equations added
via forecast coefvector are treated as stochastic if options variance() or errorvariance()
(or both) are specified and nonstochastic if neither is specified.
Exogenous variables are declared using forecast exogenous, but the model may contain additional
exogenous variables. For example, the right-hand side of an equation may contain exogenous variables
that are not declared using forecast exogenous. Before solving the model, forecast solve
determines whether the declared exogenous variables contain missing values over the forecast horizon
and issues an informative error message if any do. Undeclared exogenous variables that contain
missing values within the forecast horizon will cause forecast solve to exit with a less-informative
error message and require the user to do more work to pinpoint the problem.
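
A quick manual check along these lines can save time when undeclared exogenous variables are involved. The sketch below uses the exogenous variables of the Klein model from the examples and assumes the data are tsset with a yearly time variable. The horizon 1936 to 1941 is the one used in those examples.

```stata
* Sketch only: count observations in the forecast horizon where any of
* the listed exogenous variables is missing.
count if missing(wg, g, t, yr) & tin(1936, 1941)
```

A count of zero means none of the listed variables has gaps in the horizon.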


Adjustments added via forecast adjust easily fit within the framework of (3). Simply let $f_i(\cdot)$
represent the value of $y_{it}$ obtained by first evaluating the appropriate estimation result, coefficient
vector, or identity and then performing the adjustments based on that intermediate result. Endogenous
variables may have multiple adjustments; adjustments are made in the order in which they were
specified via forecast adjust. For single-equation estimation results and coefficient vectors as well
as identities, adjustments are performed right after the equation is evaluated. For multiple-equation
estimation results and coefficient vectors, adjustments are made after all the equations within that set
of results are evaluated. Suppose an estimation result that uses predict includes two left-hand-side
variables, $y_{1t}$ and $y_{2t}$, and you have added two adjustments to $y_{1t}$ and one adjustment to $y_{2t}$. Here
forecast solve first calls predict twice to obtain candidate values for $y_{1t}$ and $y_{2t}$; then it performs
the two adjustments to $y_{1t}$, and finally it adjusts $y_{2t}$.
forecast solve offers four solution techniques: Gauss-Seidel, damped Gauss-Seidel, Broyden-Powell,
and Newton-Raphson. The Gauss-Seidel techniques are simple iterative techniques that are
often fast and typically work well, particularly when a damping factor is used. Gauss-Seidel is simply
damped Gauss-Seidel without damping (a damping factor of 0). By default, damped Gauss-Seidel
with a damping factor of 0.2 is used, representing a small amount of damping. As Fair (1984, 250)
notes, while these techniques often work well, there is no guarantee that they will converge. Technique
Newton-Raphson typically works well but is slow because it requires the use of numerical derivatives at
every iteration to obtain a Jacobian matrix. The Broyden-Powell (Broyden 1970; Powell 1970) method
is analogous to quasi-Newton methods used for function optimization in that an updating method is
used at each iteration to update an estimate of the Jacobian matrix rather than actually recalculating
it. For additional details as well as a discussion of the convergence criteria, see [M-5] solvenl().
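
If the default technique fails to converge, a different one can be requested through the technique() option of forecast solve. The sketch below assumes the Klein model from the examples is the active forecast model. The prefixes gs_ and nr_ are hypothetical names, and the damping factor 0.5 is an arbitrary illustrative value.

```stata
* Sketch only: solve the same model with two different techniques.
forecast solve, prefix(gs_) begin(1936) technique(dampedgaussseidel 0.5)
forecast solve, prefix(nr_) begin(1936) technique(newtonraphson)
```

Comparing the two sets of forecast variables is one informal way to check that the solution does not depend on the solver.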
If you do not specify the begin() option, forecast solve uses the following algorithm to select
the starting time period. Suppose the time variable t runs from 1 to T . If, at time T , none of the
endogenous variables contains missing values, forecast solve exits with an error message: there
are no periods in which the endogenous variables are not known; therefore, there are no periods
where a forecast is obviously required. Otherwise, consider period $T - 1$. If none of the endogenous
variables contains missing values in that period, then the only period to forecast is T . Otherwise,
work back through time to find the latest period in which all of the endogenous variables contain
nonmissing values and then begin forecasting in the subsequent period. In the case of panel datasets,
the same algorithm is applied to each panel, and forecasts for all panels begin on the earliest period
selected.
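
For a single time series with unit time steps, the rule above amounts to finding the latest period with complete endogenous data and starting one period later. The sketch below mimics that by hand for the Klein model. It is not forecast solve's own code. The variable name nmiss and local macro tvar are hypothetical, and the data are assumed to be tsset.

```stata
* Sketch only: locate the default starting period by hand.
quietly tsset
local tvar `r(timevar)'                       // name of the time variable
egen byte nmiss = rowmiss(c i wp y p k w)     // the 7 endogenous variables
summarize `tvar' if nmiss == 0, meanonly
display "latest complete period: " r(max)
display "forecasts would begin:  " (r(max) + 1)
drop nmiss
```

If the latest complete period is also the last period in the dataset, there is nothing to forecast, which corresponds to the error case described above.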
When you specify the simulate() option with sim_technique betas, forecast solve draws
random vectors from the multivariate normal distribution for each estimation result individually.
The mean and variance are based on the estimation result's e(b) and e(V) matrices, respectively.
If the estimation result is from a multiple-equation estimator, the corresponding Stata command
stores in e(b) and e(V) the full parameter vector and covariance matrix for all equations so that
forecast solve's simulations will account for covariances among parameters in that estimation
result's equations. However, covariances among parameters that appear in different estimation results
are taken to be zero.
If you specify a coefficient vector using forecast coefvector and specify a variance matrix in
the variance() option, then those coefficient vectors are simulated just like the parameter vectors
from estimation results. If you do not specify the variance() option, then the coefficient vector is
assumed to be nonstochastic and therefore is not simulated.
When you specify the simulate() option with sim_technique residuals, forecast solve
first obtains static forecasts from your model for all possible periods. For each endogenous variable
defined by a stochastic equation, it then computes residuals as the forecast value minus the actual
value for all observations with nonmissing data. At each replication and for each period in the forecast
horizon, forecast solve randomly selects one element from each stochastic equation's pool of
residuals before solving the model for that replication and period. Then whenever forecast solve
evaluates a stochastic equation, it adds the chosen element to the predicted value for that equation.
Suppose an estimation result represents a multiple-equation estimator with $m$ equations, and suppose
that there are $n$ time periods for which sample residuals are available. Arrange the residuals into the
$n \times m$ matrix $R$. Then when forecast solve is randomly selecting residuals for this estimation
result, it will choose a random number $j$ between 1 and $n$ and select the entire $j$th row from $R$.
That preserves the correlation structure among the error terms of the estimation result's equations.
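
The row-wise selection can be illustrated with a few lines of Mata. The residual matrix below is made up for the sketch (n = 4 periods, m = 2 equations) and the code only illustrates the idea of drawing a whole row, not forecast solve's implementation.

```stata
mata:
// Hypothetical 4 x 2 residual matrix: rows are periods, columns are equations
R = (0.5, -0.2 \ -1.1, 0.7 \ 0.3, 0.1 \ 0.9, -0.6)
rseed(1)
j = floor(runiform(1, 1)*rows(R)) + 1    // random row index between 1 and n
R[j, .]                                  // the whole row is used together
end
```

Drawing each column independently would break any correlation between the two equations' errors, which is why an entire row is selected at once.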
If you specify a coefficient vector using forecast coefvector and specify either the variance()
option or the errorvariance() option (or both), sim_technique residuals considers the equation
represented by the coefficient vector to be stochastic and resamples residuals for that equation.
When you specify the simulate() option with sim_technique errors, forecast solve, for
each stochastic equation, replication, and period, takes a random draw from a multivariate normal
distribution with zero mean before solving the model for that replication and period. Then whenever
forecast solve evaluates a stochastic equation, it adds that random draw to the predicted value
for that equation. The variance of the distribution from which errors are drawn is based on the
estimation results for that equation. The forecast commands look in e(rmse), e(sigma), and
e(Sigma) to find the estimated variance. If you add an estimation result that does not set any of those
three results and you request sim_technique errors, forecast solve exits with an error message.
Multiple-equation commands typically set e(Sigma) so that the randomly drawn errors reflect the
estimated error correlation structure.
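
To see which of these results a given estimation command provides, the stored results can be listed directly. The sketch below assumes the reg3 results were stored under the name klein, as in example 2.

```stata
* Sketch only: inspect the error-variance results of a stored estimate.
estimates restore klein
ereturn list
matrix list e(Sigma)     // reg3 stores the cross-equation error covariance here
```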
If you specify a coefficient vector using forecast coefvector and specify the errorvariance()
option, sim_technique errors simulates errors for that equation. Otherwise, the equation is treated
like an identity and no errors are added.
forecast solve solves panel-data models by solving for all periods in the forecast horizon for
the first panel in the dataset, then the second panel, and so on. When you perform simulations with
panel datasets, one replication is completed for all panels in the dataset before moving to the next
replication. Simulations that include residual resampling select residuals from the pool containing
residuals for all panels; forecast solve does not restrict itself to the static-forecast residuals for a
single panel when simulating that panel.

References
Broyden, C. G. 1970. Recent developments in solving nonlinear algebraic systems. In Numerical Methods for Nonlinear
Algebraic Equations, ed. P. Rabinowitz, 61-73. London: Gordon and Breach Science Publishers.
Fair, R. C. 1984. Specification, Estimation, and Analysis of Macroeconometric Models. Cambridge, MA: Harvard
University Press.
Hamilton, J. D. 2003. What is an oil shock? Journal of Econometrics 113: 363-398.
Kilian, L., and R. J. Vigfusson. 2013. Do oil prices help forecast U.S. real GDP? The role of nonlinearities and
asymmetries. Journal of Business and Economic Statistics 31: 78-93.
Klein, L. R. 1950. Economic Fluctuations in the United States 1921-1941. New York: Wiley.
Powell, M. J. D. 1970. A hybrid method for nonlinear equations. In Numerical Methods for Nonlinear Algebraic
Equations, ed. P. Rabinowitz, 87-114. London: Gordon and Breach Science Publishers.

Also see
[TS] forecast Econometric model forecasting
[TS] forecast adjust Adjust a variable by add factoring, replacing, etc.
[TS] forecast drop Drop forecast variables
[R] set seed Specify initial value of random-number seed

Title
irf Create and analyze IRFs, dynamic-multiplier functions, and FEVDs
Syntax

Description

Remarks and examples

References

Also see

Syntax
irf subcommand ... [, ...]

subcommand    Description
create        create IRF file containing IRFs, dynamic-multiplier functions, and FEVDs
set           set the active IRF file

graph         graph results from active file
cgraph        combine graphs of IRFs, dynamic-multiplier functions, and FEVDs
ograph        graph overlaid IRFs, dynamic-multiplier functions, and FEVDs
table         create tables of IRFs, dynamic-multiplier functions, and FEVDs from active file
ctable        combine tables of IRFs, dynamic-multiplier functions, and FEVDs

describe      describe contents of active file
add           add results from an IRF file to the active IRF file
drop          drop IRF results from active file
rename        rename IRF results within a file

IRF stands for impulse-response function; FEVD stands for forecast-error variance decomposition.
irf can be used only after var, svar, vec, arima, or arfima; see [TS] var, [TS] var svar, [TS] vec,
[TS] arima, and [TS] arfima.
See [TS] irf create, [TS] irf set, [TS] irf graph, [TS] irf cgraph, [TS] irf ograph, [TS] irf table,
[TS] irf ctable, [TS] irf describe, [TS] irf add, [TS] irf drop, and [TS] irf rename for details about
subcommands.

Description
irf creates and manipulates IRF files that contain estimates of the IRFs, dynamic-multiplier
functions, and forecast-error variance decompositions (FEVDs) created after estimation by var, svar,
or vec; see [TS] var, [TS] var svar, or [TS] vec.
irf creates and manipulates IRF files that contain estimates of the IRFs created after estimation
by arima or arfima; see [TS] arima or [TS] arfima.
IRFs and FEVDs are described below, and the process of analyzing them is outlined. After reading
this entry, please see [TS] irf create.

Remarks and examples


An IRF measures the effect of a shock to an endogenous variable on itself or on another
endogenous variable; see Lütkepohl (2005, 51-63) and Hamilton (1994, 318-323) for formal definitions.
Becketti (2013) provides an approachable, gentle introduction to IRF analysis. Of the many types of
IRFs, irf create estimates the five most important: simple IRFs, orthogonalized IRFs, cumulative
IRFs, cumulative orthogonalized IRFs, and structural IRFs.

A dynamic-multiplier function, or transfer function, measures the impact of a unit increase in an
exogenous variable on the endogenous variables over time; see Lütkepohl (2005, chap. 10) for formal
definitions. irf create estimates simple and cumulative dynamic-multiplier functions after var.
The forecast-error variance decomposition (FEVD) measures the fraction of the forecast-error
variance of an endogenous variable that can be attributed to orthogonalized shocks to itself or to
another endogenous variable; see Lütkepohl (2005, 63-66) and Hamilton (1994, 323-324) for formal
definitions. Of the many types of FEVDs, irf create estimates the two most important: Cholesky
and structural.
To analyze IRFs and FEVDs in Stata, you first fit a model, then use irf create to estimate the
IRFs and FEVDs and save them in a file, and finally use irf graph or any of the other irf analysis
commands to examine results:
. use http://www.stata-press.com/data/r13/lutkepohl2
(Quarterly SA West German macro data, Bil DM, from Lutkepohl 1993 Table E.1)
. var dln_inv dln_inc dln_consump if qtr<=tq(1978q4), lags(1/2) dfk
(output omitted )
. irf create order1, step(10) set(myirf1)
(file myirf1.irf created)
(file myirf1.irf now active)
(file myirf1.irf updated)
. irf graph oirf, impulse(dln_inc) response(dln_consump)

[Figure: orthogonalized IRF with 95% CI. Panel: order1, dln_inc, dln_consump. Horizontal axis: step. Graphs by irfname, impulse variable, and response variable.]

Multiple sets of IRFs and FEVDs can be placed in the same file, with each set of results in a
file bearing a distinct name. The irf create command above created file myirf1.irf and put
one set of results in it, named order1. The order1 results include estimates of the simple IRFs,
orthogonalized IRFs, cumulative IRFs, cumulative orthogonalized IRFs, and Cholesky FEVDs.
Below we use the same estimated var but use a different Cholesky ordering to create a second set
of IRF results, which we will save as order2 in the same file, and then we will graph both results:



. irf create order2, step(10) order(dln_inc dln_inv dln_consump)
(file myirf1.irf updated)
. irf graph oirf, irf(order1 order2) impulse(dln_inc) response(dln_consump)

[Figure: orthogonalized IRFs with 95% CIs. Panels: order1, dln_inc, dln_consump and order2, dln_inc, dln_consump. Horizontal axis: step. Graphs by irfname, impulse variable, and response variable.]

We have compared results for one model under two different identification schemes. We could just
as well have compared results of two different models. We now use irf table to display the results
tabularly:
. irf table oirf, irf(order1 order2) impulse(dln_inc) response(dln_consump)
Results from order1 order2

          (1)         (1)         (1)         (2)         (2)         (2)
step      oirf        Lower       Upper       oirf        Lower       Upper
0         .004934     .003016     .006852     .005244     .003252     .007237
1         .001309     -.000931    .003549     .001235     -.001011    .003482
2         .003573     .001285     .005862     .00391      .001542     .006278
3         -.000692    -.002333    .00095      -.000677    -.002347    .000993
4         .000905     -.000541    .002351     .00094      -.000576    .002456
5         .000328     -.0005      .001156     .000341     -.000518    .001201
6         .000021     -.000675    .000717     .000042     -.000693    .000777
7         .000154     -.000206    .000515     .000161     -.000218    .00054
8         .000026     -.000248    .0003       .000027     -.000261    .000315
9         .000026     -.000121    .000174     .00003      -.000125    .000184
10        .000026     -.000061    .000113     .000027     -.000065    .00012

95% lower and upper bounds reported
(1) irfname = order1, impulse = dln_inc, and response = dln_consump
(2) irfname = order2, impulse = dln_inc, and response = dln_consump

Both the table and the graph show that the two orthogonalized IRFs are essentially the same. In both
functions, an increase in the orthogonalized shock to dln_inc causes a short series of increases in
dln_consump that dies out after four or five periods.
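
The other statistics stored with order1 and order2 can be examined in the same way. The sketch below assumes myirf1.irf is still the active IRF file. It lists the file's contents and then graphs and tabulates the Cholesky FEVDs for the same impulse and response.

```stata
* Sketch only: inspect the active IRF file and look at the FEVDs.
irf describe
irf graph fevd, irf(order1 order2) impulse(dln_inc) response(dln_consump)
irf table fevd, irf(order1 order2) impulse(dln_inc) response(dln_consump)
```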


References
Becketti, S. 2013. Introduction to Time Series Using Stata. College Station, TX: Stata Press.
Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
Lütkepohl, H. 1993. Introduction to Multiple Time Series Analysis. 2nd ed. New York: Springer.
Lütkepohl, H. 2005. New Introduction to Multiple Time Series Analysis. New York: Springer.

Also see
[TS] arfima Autoregressive fractionally integrated moving-average models
[TS] arima ARIMA, ARMAX, and other dynamic regression models
[TS] var Vector autoregressive models
[TS] var svar Structural vector autoregressive models
[TS] varbasic Fit a simple VAR and graph IRFs or FEVDs
[TS] vec Vector error-correction models
[TS] var intro Introduction to vector autoregressive models
[TS] vec intro Introduction to vector error-correction models


Title
irf add Add results from an IRF file to the active IRF file
Syntax
Remarks and examples

Menu
Also see

Description

Option

Syntax
irf add { _all | [newname=] oldname ... }, using(irf_filename)

Menu
Statistics > Multivariate time series > Manage IRF results and files > Add IRF results

Description
irf add copies results from one IRF file to another: from the specified using() file to the active
IRF file, which is set by irf set; see [TS] irf set.

Option
using(irf_filename) specifies the file from which results are to be obtained and is required. If
irf_filename is specified without an extension, .irf is assumed.

Remarks and examples


If you have not read [TS] irf, please do so.

Example 1
After fitting a VAR model, we create two separate IRF files:
. use http://www.stata-press.com/data/r13/lutkepohl2
(Quarterly SA West German macro data, Bil DM, from Lutkepohl 1993 Table E.1)
. var dln_inv dln_inc dln_consump if qtr<=tq(1978q4), lags(1/2) dfk
(output omitted )
. irf create original, set(irf1, replace)
(file irf1.irf created)
(file irf1.irf now active)
(file irf1.irf updated)
. irf create order2, order(dln_inc dln_inv dln_consump) set(irf2, replace)
(file irf2.irf created)
(file irf2.irf now active)
(file irf2.irf updated)

We copy the IRF results original to the active file, giving them the name order1.
. irf add order1 = original, using(irf1)
(file irf2.irf updated)


Here we create new IRF results and save them in the new file irf3.
. irf create order3, order(dln_inc dln_consump dln_inv) set(irf3, replace)
(file irf3.irf created)
(file irf3.irf now active)
(file irf3.irf updated)

Now we copy all the IRF results in file irf2 into the active file.
. irf add _all, using(irf2)
(file irf3.irf updated)
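
As a possible follow-up to the example above, the contents of the active file can be checked after copying. This is only a sketch: it assumes that irf2.irf and irf3.irf from the example still exist in the working directory, and order2_copy is a hypothetical name introduced here for illustration. irf describe is documented in [TS] irf describe.

```stata
* list the IRF results now stored in the active file (irf3.irf)
irf describe

* copy a single result under a new name instead of using _all;
* order2_copy is a hypothetical name (irfnames are limited to 15 characters)
irf add order2_copy = order2, using(irf2)
```

Renaming on copy is one way to avoid a clash when the active file already holds results saved under the original name.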

Also see
[TS] irf Create and analyze IRFs, dynamic-multiplier functions, and FEVDs
[TS] var intro Introduction to vector autoregressive models
[TS] vec intro Introduction to vector error-correction models


Title
irf cgraph Combined graphs of IRFs, dynamic-multiplier functions, and FEVDs

Syntax
    irf cgraph (spec_1) [(spec_2) ... (spec_N)] [, options]

where (spec_k) is

    (irfname impulsevar responsevar stat [, spec_options])

irfname is the name of a set of IRF results in the active IRF file. impulsevar should be specified as an
endogenous variable for all statistics except dm and cdm; for those, specify as an exogenous variable.
responsevar is an endogenous variable name. stat is one or more statistics from the list below:
stat       Description
Main
  irf      impulse-response function
  oirf     orthogonalized impulse-response function
  dm       dynamic-multiplier function
  cirf     cumulative impulse-response function
  coirf    cumulative orthogonalized impulse-response function
  cdm      cumulative dynamic-multiplier function
  fevd     Cholesky forecast-error variance decomposition
  sirf     structural impulse-response function
  sfevd    structural forecast-error variance decomposition

Notes: 1. No statistic may appear more than once.
       2. If confidence intervals are included (the default), only two statistics may be included.
       3. If confidence intervals are suppressed (option noci), up to four statistics may be included.

options              Description
Main
  set(filename)      make filename active
Options
  combine_options    affect appearance of combined graph
Y axis, X axis, Titles, Legend, Overall
  twoway_options     any options other than by() documented in [G-3] twoway_options

  spec_options       level, steps, and rendition of plots and their CIs
  individual         graph each combination individually

spec_options appear on multiple tabs in the dialog box.
individual does not appear in the dialog box.

spec_options               Description
Main
  noci                     suppress confidence bands
Options
  level(#)                 set confidence level; default is level(95)
  lstep(#)                 use # for first step
  ustep(#)                 use # for maximum step
Plots
  plot#opts(line_options)  affect rendition of the line plotting the # stat
CI plots
  ci#opts(area_options)    affect rendition of the confidence interval for the # stat

spec_options may be specified within a graph specification, globally, or in both. When specified in a graph
specification, the spec_options affect only the specification in which they are used. When supplied globally, the
spec_options affect all graph specifications. When supplied in both places, options in the graph specification take
precedence.

Menu
Statistics > Multivariate time series > IRF and FEVD analysis > Combined graphs

Description
irf cgraph makes a graph or a combined graph of IRF results. Each block within a pair of
matching parentheses, that is, each (spec_k), specifies the information for a specific graph. irf cgraph
combines these graphs into one image, unless the individual option is also specified, in which case
separate graphs for each block are created.
To become familiar with this command, we recommend that you type db irf cgraph.

Options


Main

noci suppresses graphing the confidence interval for each statistic. noci is assumed when the model
was fit by vec because no confidence intervals were estimated.
set(filename) specifies the file to be made active; see [TS] irf set. If set() is not specified, the
active file is used.

Options

level(#) specifies the default confidence level, as a percentage, for confidence intervals, when they
are reported. The default is level(95) or as set by set level; see [U] 20.7 Specifying the
width of confidence intervals. The value set by an overall level() can be overridden by the
level() inside a (spec_k).
lstep(#) specifies the first step, or period, to be included in the graph. lstep(0) is the default.
ustep(#), # ≥ 1, specifies the maximum step, or period, to be included in the graph.
combine_options affect the appearance of the combined graph; see [G-2] graph combine.


Plots

plot1opts(cline_options), ..., plot4opts(cline_options) affect the rendition of the plotted statistics.
plot1opts() affects the rendition of the first statistic; plot2opts(), the second; and so
on. cline_options are as described in [G-3] cline_options.

CI plots

ci1opts(area_options) and ci2opts(area_options) affect the rendition of the confidence intervals
for the first (ci1opts()) and second (ci2opts()) statistics. See [TS] irf graph for a description
of this option and [G-3] area_options for the suboptions that change the look of the CI.

Y axis, X axis, Titles, Legend, Overall

twoway_options are any of the options documented in [G-3] twoway_options, excluding by(). These
include options for titling the graph (see [G-3] title_options) and for saving the graph to disk (see
[G-3] saving_option).
The following option is available with irf cgraph but is not shown in the dialog box:
individual specifies that each graph be displayed individually. By default, irf cgraph combines
the subgraphs into one image.
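
The interaction between global and per-specification spec_options can be sketched as follows. This is an illustration only: it assumes that IRF results named modela and modelb already exist in the active IRF file (they are created in example 1 below), and the `///` line continuation assumes the commands are run from a do-file.

```stata
* assumes modela and modelb exist in the active IRF file
* the global level(90) applies to every specification;
* the second specification overrides it with its own level(95)
irf cgraph (modela dln_inc dln_consump oirf)             ///
           (modelb dln_inc dln_consump oirf, level(95)), ///
           level(90)

* with confidence intervals suppressed, up to four statistics
* may be drawn in one specification
irf cgraph (modela dln_inc dln_consump irf oirf cirf coirf, noci)
```

Placing an option inside the parentheses is the natural choice when only one panel should differ from the rest.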

Remarks and examples


If you have not read [TS] irf, please do so.
The relationship between irf cgraph and irf graph is syntactically and conceptually the same
as that between irf ctable and irf table; see [TS] irf ctable for a description of the syntax.
irf cgraph is much the same as using irf graph to make individual graphs and then using
graph combine to put them together. If you cannot use irf cgraph to do what you want, consider
the other approach.
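
The "other approach" mentioned above might look like the sketch below. It assumes that modela is stored in the active IRF file (see example 1 below); g1 and g2 are hypothetical graph names chosen for illustration, and name() is passed as one of the twoway_options.

```stata
* draw each graph separately and keep it in memory under a name
irf graph oirf sirf, irf(modela) impulse(dln_inc) response(dln_consump) ///
    name(g1, replace)
irf graph fevd sfevd, irf(modela) impulse(dln_inc) response(dln_consump) ///
    lstep(1) name(g2, replace)

* assemble the named graphs into one image
graph combine g1 g2, title("Results from modela")
```

This route takes more typing but allows anything graph combine supports, which is the reason to consider it when irf cgraph is too restrictive.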

Example 1
You have previously issued the commands
. use http://www.stata-press.com/data/r13/lutkepohl2
. mat a = (., 0, 0\0,.,0\.,.,.)
. mat b = I(3)
. svar dln_inv dln_inc dln_consump, aeq(a) beq(b)
. irf create modela, set(results3) step(8)
. svar dln_inc dln_inv dln_consump, aeq(a) beq(b)
. irf create modelb, step(8)


You now type


. irf cgraph (modela dln_inc dln_consump oirf sirf)
>            (modelb dln_inc dln_consump oirf sirf)
>            (modela dln_inc dln_consump fevd sfevd, lstep(1))
>            (modelb dln_inc dln_consump fevd sfevd, lstep(1)),
>            title("Results from modela and modelb")

[Graph: "Results from modela and modelb". Four panels plotted against step. Top row: oirf and sirf
with their 95% CIs for modela: dln_inc -> dln_consump and for modelb: dln_inc -> dln_consump.
Bottom row: fevd and sfevd with their 95% CIs for the same two impulse-response pairs.]

Stored results
irf cgraph stores the following in r():
Scalars
  r(k)            number of specific graph commands
Macros
  r(individual)   individual, if specified
  r(save)         filename, replace from saving() option for combined graph
  r(name)         name, replace from name() option for combined graph
  r(title)        title of the combined graph
  r(save#)        filename, replace from saving() option for individual graphs
  r(name#)        name, replace from name() option for individual graphs
  r(title#)       title for the #th graph
  r(ci#)          level applied to the #th confidence interval or noci
  r(response#)    response specified in the #th command
  r(impulse#)     impulse specified in the #th command
  r(irfname#)     IRF name specified in the #th command
  r(stats#)       statistics specified in the #th command
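
A brief sketch of how these stored results might be inspected. It assumes the commands are run immediately after an irf cgraph command, before any other r-class command replaces the contents of r().

```stata
* show everything irf cgraph left behind in r()
return list

* number of graph specifications that were supplied
display r(k)

* statistics and impulse variable from the first specification
display "`r(stats1)'"
display "`r(impulse1)'"
```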

Also see
[TS] irf Create and analyze IRFs, dynamic-multiplier functions, and FEVDs
[TS] var intro Introduction to vector autoregressive models
[TS] vec intro Introduction to vector error-correction models


Title
irf create Obtain IRFs, dynamic-multiplier functions, and FEVDs


Syntax
After var
    irf create irfname [, var_options]

After svar
    irf create irfname [, svar_options]

After vec
    irf create irfname [, vec_options]

After arima
    irf create irfname [, arima_options]

After arfima
    irf create irfname [, arfima_options]

irfname is any valid name that does not exceed 15 characters.


var_options                     Description
Main
  set(filename[, replace])      make filename active
  replace                       replace irfname if it already exists
  step(#)                       set forecast horizon to #; default is step(8)
  order(varlist)                specify Cholesky ordering of endogenous variables
  estimates(estname)            use previously stored results estname; default is to use active
                                results
Std. errors
  nose                          do not calculate standard errors
  bs                            obtain standard errors from bootstrapped residuals
  bsp                           obtain standard errors from parametric bootstrap
  nodots                        do not display "." for each bootstrap replication
  reps(#)                       use # bootstrap replications; default is reps(200)
  bsaving(filename[, replace])  save bootstrap results in filename

svar_options                    Description
Main
  set(filename[, replace])      make filename active
  replace                       replace irfname if it already exists
  step(#)                       set forecast horizon to #; default is step(8)
  estimates(estname)            use previously stored results estname; default is to use active
                                results
Std. errors
  nose                          do not calculate standard errors
  bs                            obtain standard errors from bootstrapped residuals
  bsp                           obtain standard errors from parametric bootstrap
  nodots                        do not display "." for each bootstrap replication
  reps(#)                       use # bootstrap replications; default is reps(200)
  bsaving(filename[, replace])  save bootstrap results in filename

vec_options                     Description
Main
  set(filename[, replace])      make filename active
  replace                       replace irfname if it already exists
  step(#)                       set forecast horizon to #; default is step(8)
  estimates(estname)            use previously stored results estname; default is to use active
                                results

arima_options                   Description
Main
  set(filename[, replace])      make filename active
  replace                       replace irfname if it already exists
  step(#)                       set forecast horizon to #; default is step(8)
  estimates(estname)            use previously stored results estname; default is to use active
                                results
Std. errors
  nose                          do not calculate standard errors

arfima_options                  Description
Main
  set(filename[, replace])      make filename active
  replace                       replace irfname if it already exists
  step(#)                       set forecast horizon to #; default is step(8)
  smemory                       calculate short-memory IRFs
  estimates(estname)            use previously stored results estname; default is to use active
                                results
Std. errors
  nose                          do not calculate standard errors


The default is to use asymptotic standard errors if no options are specified.


irf create is for use after fitting a model with the var, svar, vec, arima, or arfima command; see [TS] var,
[TS] var svar, [TS] vec, [TS] arima, and [TS] arfima.
You must tsset your data before using var, svar, vec, arima, or arfima and, hence, before using irf create;
see [TS] tsset.

Menu
Statistics > Multivariate time series > IRF and FEVD analysis > Obtain IRFs, dynamic-multiplier functions, and FEVDs

Description
irf create estimates multiple sets of impulse-response functions (IRFs), dynamic-multiplier
functions, and forecast-error variance decompositions (FEVDs) after estimation by var, svar, or vec;
see [TS] var, [TS] var svar, or [TS] vec. irf create also estimates multiple sets of IRFs after
estimation by arima or arfima; see [TS] arima or [TS] arfima. All of these estimates and their
standard errors are known collectively as IRF results and are saved in an IRF file under the specified
irfname.
The following types of IRFs and dynamic-multiplier functions are saved:
  simple IRFs                      after var, svar, vec, arima, or arfima
  orthogonalized IRFs              after var, svar, vec, arima, or arfima
  dynamic multipliers              after var
  cumulative IRFs                  after var, svar, vec, arima, or arfima
  cumulative orthogonalized IRFs   after var, svar, vec, arima, or arfima
  cumulative dynamic multipliers   after var
  structural IRFs                  after svar, arima, or arfima

The following types of FEVDs are saved:


  Cholesky FEVDs                   after var, svar, or vec
  structural FEVDs                 after svar only

Once you have created a set of IRF results, use the other irf commands to analyze them.

Options


Main

set(filename[, replace]) specifies the IRF file to be used. If set() is not specified, the active IRF
file is used; see [TS] irf set.
If set() is specified, the specified file becomes the active file, just as if you had issued an irf
set command.
replace specifies that the results saved under irfname may be replaced, if they already exist. IRF
results are saved in files, and one file may contain multiple IRF results.
step(#) specifies the step (forecast) horizon; the default is eight periods.


order(varlist) is allowed only after estimation by var; it specifies the Cholesky ordering of the
endogenous variables to be used when estimating the orthogonalized IRFs. By default, the order
in which the variables were originally specified on the var command is used.
smemory is allowed only after estimation by arfima; it specifies that the IRFs are calculated based
on a short-memory model with the fractional difference parameter d set to zero.
estimates(estname) specifies that estimation results previously estimated by var, svar, or vec,
and stored by estimates, be used. This option is rarely specified; see [R] estimates.

Std. errors

nose, bs, and bsp are alternatives that specify how (whether) standard errors are to be calculated. If
none of these options is specified, asymptotic standard errors are calculated, except in two cases:
after estimation by vec and after estimation by svar in which long-run constraints were applied.
In those two cases, the default is as if nose were specified, although in the second case, you could
specify bs or bsp. After estimation by vec, standard errors are simply not available.
nose specifies that no standard errors be calculated.
bs specifies that standard errors be calculated by bootstrapping the residuals. bs may not be
specified if there are gaps in the data.
bsp specifies that standard errors be calculated via a multivariate-normal parametric bootstrap.
bsp may not be specified if there are gaps in the data.


nodots, reps(#), and bsaving(filename[, replace]) are relevant only if bs or bsp is specified.
nodots specifies that dots not be displayed each time irf create performs a bootstrap replication.
reps(#), # > 50, specifies the number of bootstrap replications to be performed. reps(200) is
the default.


bsaving(filename[, replace]) specifies that file filename be created and that the bootstrap
replications be saved in it. New file filename is just a .dta dataset that can be loaded later using
use; see [D] use. If filename is specified without an extension, .dta is assumed.
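
The standard-error options above can be combined as in the following sketch. The names bsp1, bsreps, and myirfs are hypothetical and chosen only for illustration; the dataset and model follow the examples elsewhere in this entry, and the `///` continuation assumes a do-file.

```stata
use http://www.stata-press.com/data/r13/lutkepohl2, clear
var dln_inv dln_inc dln_consump if qtr>=tq(1961q2) & qtr<=tq(1978q4), lags(1/2)

* bootstrapping is random, so set the seed to make the results reproducible
set seed 12345

* parametric bootstrap with 200 replications, no progress dots,
* replications saved in bsreps.dta, results saved in myirfs.irf
irf create bsp1, step(12) bsp reps(200) nodots ///
    bsaving(bsreps, replace) set(myirfs, replace)

* look at the saved replications without disturbing the data in memory
describe using bsreps
```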

Remarks and examples


If you have not read [TS] irf, please do so. An introductory example using IRFs is presented there.
Remarks are presented under the following headings:
    Introductory examples
    Technical aspects of IRF files
    IRFs and FEVDs
    IRF results for VARs
        An introduction to impulse-response functions for VARs
        An introduction to dynamic-multiplier functions for VARs
        An introduction to forecast-error variance decompositions for VARs
    IRF results for VECMs
        An introduction to impulse-response functions for VECMs
        An introduction to forecast-error variance decompositions for VECMs
    IRF results for ARIMA and ARFIMA


Introductory examples
Example 1: After var
Below we compare bootstrap and asymptotic standard errors for a specific FEVD. We begin by
fitting a VAR(2) model to the Lütkepohl data (we use the var command). We next use the irf create
command twice, first to create results with asymptotic standard errors (saved under the name asymp)
and then to re-create the same results, this time with bootstrap standard errors (saved under the name
bs). Because bootstrapping is a random process, we set the random-number seed (set seed 123456)
before using irf create the second time; this makes our results reproducible. Finally, we compare
results by using the IRF analysis command irf ctable.
. use http://www.stata-press.com/data/r13/lutkepohl2
(Quarterly SA West German macro data, Bil DM, from Lutkepohl 1993 Table E.1)
. var dln_inv dln_inc dln_consump if qtr>=tq(1961q2) & qtr<=tq(1978q4), lags(1/2)
(output omitted )
. irf create asymp, step(8) set(results1)
(file results1.irf created)
(file results1.irf now active)
(file results1.irf updated)
. set seed 123456
. irf create bs, step(8) bs reps(250) nodots
(file results1.irf updated)
. irf ctable (asymp dln_inc dln_consump fevd)
> (bs dln_inc dln_consump fevd), noci stderror

           (1)        (1)        (2)        (2)
  step     fevd       S.E.       fevd       S.E.

  0        0          0          0          0
  1        .282135    .087373    .282135    .104073
  2        .278777    .083782    .278777    .096954
  3        .33855     .090006    .33855     .100452
  4        .339942    .089207    .339942    .099085
  5        .342813    .090494    .342813    .099326
  6        .343119    .090517    .343119    .09934
  7        .343079    .090499    .343079    .099325
  8        .34315     .090569    .34315     .099368

(1) irfname = asymp, impulse = dln_inc, and response = dln_consump


(2) irfname = bs, impulse = dln_inc, and response = dln_consump

Point estimates are, of course, the same. The bootstrap estimates of the standard errors, however,
are larger than the asymptotic estimates, which suggests that the sample size of 71 is not large
enough for the distribution of the estimator of the FEVD to be well approximated by the asymptotic
distribution. Here we would expect the bootstrap confidence interval to be more reliable than the
confidence interval that is based on the asymptotic standard error.

Technical note
The details of the bootstrap algorithms are given in Methods and formulas. These algorithms are
conditional on the first p observations, where p is the order of the fitted VAR. (In an SVAR model, p
is the order of the VAR that underlies the SVAR.) The bootstrapped estimates are conditional on the
first p observations, just as the estimators of the coefficients in VAR models are conditional on the
first p observations. With bootstrap standard errors (option bs), the p initial observations are used
with resampling the residuals to produce the bootstrap samples used for estimation. With the more
parametric bootstrap (option bsp), the p initial observations are used with draws from a multivariate
normal distribution with variance-covariance matrix $\widehat{\Sigma}$ to generate the bootstrap samples.

Technical note
For var and svar e() results, irf uses $\widehat{\Sigma}$, the estimated variance matrix of the disturbances, in
computing the asymptotic standard errors of all the functions. The point estimates of the
orthogonalized impulse-response functions, the structural impulse-response functions, and all the variance
decompositions also depend on $\widehat{\Sigma}$. As discussed in [TS] var, var and svar use the ML estimator of
this matrix by default, but they have option dfk, which will instead use an estimator that includes a
small-sample correction. Specifying dfk when the model is fit (that is, when the var or svar command is
given) changes the estimate of $\widehat{\Sigma}$ and will change the IRF results that depend on it.
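
One way to see the effect of dfk is to fit the same model with and without it and compare a statistic that depends on the estimated disturbance covariance matrix. The sketch below is an illustration under assumptions: the irfnames ml and dfk and the file name dfkdemo are hypothetical, and the sample restriction is borrowed from the examples in this manual.

```stata
use http://www.stata-press.com/data/r13/lutkepohl2, clear

* default: ML estimator of the disturbance covariance matrix
var dln_inv dln_inc dln_consump if qtr<=tq(1978q4), lags(1/2)
irf create ml, step(8) set(dfkdemo, replace)

* same model, small-sample correction requested at estimation time
var dln_inv dln_inc dln_consump if qtr<=tq(1978q4), lags(1/2) dfk
irf create dfk, step(8)

* orthogonalized IRFs depend on the covariance estimate, so the columns may differ
irf ctable (ml dln_inc dln_consump oirf) (dfk dln_inc dln_consump oirf), noci
```

Note that dfk is an option of var and svar, not of irf create, so the model has to be refit to change it.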

Example 2: After var with exogenous variables


After fitting a VAR, irf create computes estimates of the dynamic multipliers, which describe
the impact of a unit change in an exogenous variable on each endogenous variable. For instance,
below we estimate and report the cumulative dynamic multipliers from a model in which changes in
investment are exogenous. The results indicate that both of the cumulative dynamic multipliers are
significant.
. var dln_inc dln_consump if qtr>=tq(1961q2) & qtr<=tq(1978q4), lags(1/2)
> exog(L(0/2).dln_inv)
(output omitted )
. irf create dm, step(8)
(file results1.irf updated)



. irf table cdm, impulse(dln_inv) irf(dm)
Results from dm

           (1)        (1)        (1)
  step     cdm        Lower      Upper

  0        .032164    -.027215   .091544
  1        .096568    .003479    .189656
  2        .140107    .022897    .257317
  3        .150527    .032116    .268938
  4        .148979    .031939    .26602
  5        .151247    .033011    .269482
  6        .150267    .033202    .267331
  7        .150336    .032858    .267813
  8        .150525    .033103    .267948

           (2)        (2)        (2)
  step     cdm        Lower      Upper

  0        .058681    .012529    .104832
  1        .062723    -.005058   .130504
  2        .126167    .032497    .219837
  3        .136583    .038691    .234476
  4        .146482    .04442     .248543
  5        .146075    .045201    .24695
  6        .145542    .044988    .246096
  7        .146309    .045315    .247304
  8        .145786    .045206    .246365

95% lower and upper bounds reported


(1) irfname = dm, impulse = dln_inv, and response = dln_inc
(2) irfname = dm, impulse = dln_inv, and response = dln_consump

Example 3: After vec


Although all IRFs and orthogonalized IRFs (OIRFs) from models with stationary variables will taper
off to zero, some of the IRFs and OIRFs from models with first-difference stationary variables will not.
This is the key difference between IRFs and OIRFs from systems of stationary variables fit by var or
svar and those obtained from systems of first-difference stationary variables fit by vec. When the
effect of the innovations dies out over time, the shocks are said to be transitory. In contrast, when
the effect does not taper off, shocks are said to be permanent.
In this example, we look at the OIRF from one of the VECMs fit to the unemployment-rate data
analyzed in example 2 of [TS] vec. We see that an orthogonalized shock to Indiana has a permanent
effect on the unemployment rate in Missouri:
. use http://www.stata-press.com/data/r13/urates
. vec missouri indiana kentucky illinois, trend(rconstant) rank(2) lags(4)
(output omitted )
. irf create vec1, set(vecirfs) step(50)
(file vecirfs.irf created)
(file vecirfs.irf now active)
(file vecirfs.irf updated)


Now we can use irf graph to graph the OIRF of interest:


. irf graph oirf, impulse(indiana) response(missouri)

[Graph: panel "vec1, indiana, missouri" showing the OIRF plotted against step (0 to 50), with the
vertical axis running from 0 to .3. Graphs by irfname, impulse variable, and response variable.]

The graph shows that the estimated OIRF converges to a positive asymptote, which indicates that
an orthogonalized innovation to the unemployment rate in Indiana has a permanent effect on the
unemployment rate in Missouri.

Technical aspects of IRF files


This section is included for programmers wishing to extend the irf system.
irf create estimates a series of impulse-response functions and their standard errors. Although
these estimates are saved in an IRF file, most users will never need to look at the contents of this
file. The IRF commands fill in, analyze, present, and manage IRF results.
IRF files are just Stata datasets that have names ending in .irf instead of .dta. The dataset in
the file has a nested panel structure.

Variable irfname contains the irfname specified by the user. Variable impulse records the name
of the endogenous variable whose innovations are the impulse. Variable response records the name
of the endogenous variable that is responding to the innovations. In a model with K endogenous
variables, there are $K^2$ combinations of impulse and response. Variable step records the periods
for which these estimates were computed.
Below is a catalog of the statistics that irf create estimates and the variable names under which
they are saved in the IRF file.



Statistic                                                                     Name

impulse-response functions                                                    irf
orthogonalized impulse-response functions                                     oirf
dynamic-multiplier functions                                                  dm
cumulative impulse-response functions                                         cirf
cumulative orthogonalized impulse-response functions                          coirf
cumulative dynamic-multiplier functions                                       cdm
Cholesky forecast-error decomposition                                         fevd
structural impulse-response functions                                         sirf
structural forecast-error decomposition                                       sfevd
standard error of the impulse-response functions                              stdirf
standard error of the orthogonalized impulse-response functions               stdoirf
standard error of the cumulative impulse-response functions                   stdcirf
standard error of the cumulative orthogonalized impulse-response functions    stdcoirf
standard error of the Cholesky forecast-error decomposition                   stdfevd
standard error of the structural impulse-response functions                   stdsirf
standard error of the structural forecast-error decomposition                 stdsfevd

In addition to the variables, information is stored in _dta characteristics. Much of the following
information is also available in r() after irf describe, where it is often more convenient to obtain
the information. Characteristic _dta[version] contains the version number of the IRF file, which
is currently 1.1. Characteristic _dta[irfnames] contains a list of all the irfnames in the IRF file.
For each irfname, there is a series of additional characteristics:
Name                      Contents

_dta[irfname_model]       var, sr var, lr var, vec, arima, or arfima
_dta[irfname_order]       Cholesky order used in IRF estimates
_dta[irfname_exog]        exogenous variables, and their lags, in VAR
_dta[irfname_exogvars]    exogenous variables in VAR
_dta[irfname_constant]    constant or noconstant, depending on whether
                          noconstant was specified in var or svar
_dta[irfname_lags]        lags in model
_dta[irfname_exlags]      lags of exogenous variables in model
_dta[irfname_tmin]        minimum value of timevar in the estimation sample
_dta[irfname_tmax]        maximum value of timevar in the estimation sample
_dta[irfname_timevar]     name of tsset timevar
_dta[irfname_tsfmt]       format of timevar
_dta[irfname_varcns]      constrained or colon-separated list of
                          constraints placed on VAR coefficients
_dta[irfname_svarcns]     constrained or colon-separated list of
                          constraints placed on VAR coefficients
_dta[irfname_step]        maximum step in IRF estimates
_dta[irfname_stderror]    asymptotic, bs, bsp, or none,
                          depending on the type of standard errors requested
_dta[irfname_reps]        number of bootstrap replications performed
_dta[irfname_version]     version of the IRF file that originally
                          held irfname IRF results
_dta[irfname_rank]        number of cointegrating equations
_dta[irfname_trend]       trend() specified in vec
_dta[irfname_veccns]      constraints placed on VECM parameters
_dta[irfname_sind]        normalized seasonal indicators included in vec
_dta[irfname_d]           fractional difference parameter d in arfima
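
Because an IRF file is an ordinary Stata dataset, its variables and characteristics can be examined directly. The following is a sketch for programmers only; it assumes that results1.irf from example 1 exists in the working directory, and it replaces the data in memory.

```stata
* load the IRF file as a dataset (the .irf extension must be typed explicitly)
use results1.irf, clear

* variables such as irfname, impulse, response, step, oirf, stdoirf, ...
describe

* first few rows of the nested panel structure
list irfname impulse response step oirf stdoirf in 1/10

* all dataset-level characteristics, including the per-irfname entries
char list _dta[]
```

For routine work, irf describe remains the more convenient way to obtain the same information.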


IRFs and FEVDs


irf create can estimate several types of IRFs and FEVDs for VARs and VECMs. irf create can
also estimate IRFs and cumulative IRFs for ARIMA and ARFIMA models. We first discuss IRF results for
VAR and SVAR models, and then we discuss them in the context of VECMs. Because the cointegrating
VECM is an extension of the stationary VAR framework, the section that discusses the IRF results for
VECMs draws on the earlier VAR material. We conclude our discussion with IRF results for ARIMA
and ARFIMA models.

IRF results for VARs


An introduction to impulse-response functions for VARs
A pth-order vector autoregressive model (VAR) with exogenous variables is given by

$$y_t = v + A_1 y_{t-1} + \cdots + A_p y_{t-p} + B x_t + u_t$$

where

$y_t = (y_{1t}, \ldots, y_{Kt})'$ is a $K \times 1$ random vector,
the $A_i$ are fixed $K \times K$ matrices of parameters,
$x_t$ is an $R_0 \times 1$ vector of exogenous variables,
$B$ is a $K \times R_0$ matrix of coefficients,
$v$ is a $K \times 1$ vector of fixed parameters, and
$u_t$ is assumed to be white noise; that is,

$$E(u_t) = 0, \qquad E(u_t u_t') = \Sigma, \qquad E(u_t u_s') = 0 \ \text{ for } t \neq s$$

As discussed in [TS] varstable, a VAR can be rewritten in moving-average form only if it is stable.
Any exogenous variables are assumed to be covariance stationary. Because the functions of interest
in this section depend only on the exogenous variables through their effect on the estimated $A_i$, we
can simplify the notation by dropping them from the analysis. All the formulas given below still
apply, although the $A_i$ are estimated jointly with $B$ on the exogenous variables.
Below we discuss conditions under which the IRFs and forecast-error variance decompositions have a
causal interpretation. Although estimation requires only that the exogenous variables be predetermined,
that is, that $E(x_{jt} u_{it}) = 0$ for all $i$, $j$, and $t$, assigning a causal interpretation to IRFs and FEVDs
requires that the exogenous variables be strictly exogenous, that is, that $E(x_{js} u_{it}) = 0$ for all $i$, $j$,
$s$, and $t$.
IRFs describe how the innovations to one variable affect another variable after a given number of
periods. For an example of how IRFs are interpreted, see Stock and Watson (2001). They use IRFs to
investigate the effect of surprise shocks to the Federal Funds rate on inflation and unemployment. In
another example, Christiano, Eichenbaum, and Evans (1999) use IRFs to investigate how shocks to
monetary policy affect other macroeconomic variables.

Consider a VAR without exogenous variables:

$$y_t = v + A_1 y_{t-1} + \cdots + A_p y_{t-p} + u_t \qquad (1)$$

The VAR represents the variables in $y_t$ as functions of its own lags and serially uncorrelated innovations
$u_t$. All the information about contemporaneous correlations among the $K$ variables in $y_t$ is contained
in $\Sigma$. In fact, as discussed in [TS] var svar, a VAR can be viewed as the reduced form of a dynamic
simultaneous-equation model.


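To see how the innovations affect the variables in $y_t$ after, say, $i$ periods, rewrite the model in its
moving-average form

$$y_t = \mu + \sum_{i=0}^{\infty} \Phi_i u_{t-i} \qquad (2)$$

where $\mu$ is the $K \times 1$ time-invariant mean of $y_t$, and

$$\Phi_i = \begin{cases} I_K & \text{if } i = 0 \\ \sum_{j=1}^{i} \Phi_{i-j} A_j & \text{if } i = 1, 2, \ldots \end{cases}$$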
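
As a small worked illustration of the recursion for the moving-average coefficient matrices $\Phi_i$, take a VAR with $p = 2$ lags. The sketch adopts the usual convention that $A_j = 0$ for $j > p$; this is an assumption here, because the text above does not state it. The first few terms then work out as follows:

```latex
\begin{aligned}
\Phi_0 &= I_K \\
\Phi_1 &= \Phi_0 A_1 = A_1 \\
\Phi_2 &= \Phi_1 A_1 + \Phi_0 A_2 = A_1^2 + A_2 \\
\Phi_3 &= \Phi_2 A_1 + \Phi_1 A_2 + \Phi_0 A_3 = A_1^3 + A_2 A_1 + A_1 A_2
\end{aligned}
```

Each new matrix uses only the previously computed ones and the estimated lag matrices, so the sequence can be built up step by step to whatever horizon is requested.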

We can rewrite a VAR in the moving-average form only if it is stable. Essentially, a VAR is stable
if the variables are covariance stationary and none of the autocorrelations are too high (the issue of
stability is discussed in greater detail in [TS] varstable).
The $\Phi_i$ are the simple IRFs. The $j, k$ element of $\Phi_i$ gives the effect of a one-time unit increase in
the $k$th element of $u_t$ on the $j$th element of $y_t$ after $i$ periods, holding everything else constant.
Unfortunately, these effects have no causal interpretation, which would require us to be able to answer
the question, "How does an innovation to variable $k$, holding everything else constant, affect variable $j$
after $i$ periods?" Because the $u_t$ are contemporaneously correlated, we cannot assume that everything
else is held constant. Contemporaneous correlation among the $u_t$ implies that a shock to one variable
is likely to be accompanied by shocks to some of the other variables, so it does not make sense to
shock one variable and hold everything else constant. For this reason, (2) cannot provide a causal
interpretation.
This shortcoming may be overcome by rewriting (2) in terms of mutually uncorrelated innovations. Suppose that we had a matrix $P$ such that $\Sigma = PP'$. If we had such a $P$, then $P^{-1}\Sigma P'^{-1} = I_K$, and

$$E\{P^{-1}u_t (P^{-1}u_t)'\} = P^{-1}E\{u_t u_t'\}P'^{-1} = P^{-1}\Sigma P'^{-1} = I_K$$

We can thus use $P^{-1}$ to orthogonalize the $u_t$ and rewrite (2) as

$$\begin{aligned}
y_t &= \mu + \sum_{i=0}^{\infty} \Phi_i P P^{-1} u_{t-i} \\
    &= \mu + \sum_{i=0}^{\infty} \Theta_i P^{-1} u_{t-i} \\
    &= \mu + \sum_{i=0}^{\infty} \Theta_i w_{t-i}
\end{aligned}$$

where $\Theta_i = \Phi_i P$ and $w_t = P^{-1}u_t$. If we had such a $P$, the $w_k$ would be mutually orthogonal, and no information would be lost in the holding-everything-else-constant assumption, implying that the $\Theta_i$ would have the causal interpretation that we seek.
Choosing a P is similar to placing identification restrictions on a system of dynamic simultaneous
equations. The simple IRFs do not identify the causal relationships that we wish to analyze. Thus we
seek at least as many identification restrictions as necessary to identify the causal IRFs.
So, where do we get such a $P$? Sims (1980) popularized the method of choosing $P$ to be the Cholesky decomposition of $\widehat{\Sigma}$. The IRFs based on this choice of $P$ are known as the orthogonalized IRFs (OIRFs). Choosing $P$ to be the Cholesky decomposition of $\widehat{\Sigma}$ is equivalent to imposing a recursive structure for the corresponding dynamic structural equation model. The ordering of the recursive structure is the same as the ordering imposed in the Cholesky decomposition. Because this choice is arbitrary, some researchers will look at the OIRFs with different orderings assumed in the Cholesky decomposition. The order() option available with irf create facilitates this type of analysis.
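
The recursion for the simple IRFs and the effect of the Cholesky ordering can be illustrated numerically. The following is a minimal sketch in plain Python, not the code that irf create runs; the coefficient matrix A1, the covariance matrix SIGMA, and every function name are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
import math

def matmul(a, b):
    """Plain-list matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def cholesky(s):
    """Lower-triangular P with P P' = s, for symmetric positive-definite s."""
    n = len(s)
    p = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = s[i][j] - sum(p[i][k] * p[j][k] for k in range(j))
            p[i][j] = math.sqrt(acc) if i == j else acc / p[j][j]
    return p

def simple_irfs(a_list, steps):
    """Phi_0 = I_K and Phi_i = sum_{j=1}^{i} Phi_{i-j} A_j, with A_j = 0 for j > p."""
    k = len(a_list[0])
    phi = [[[1.0 if r == c else 0.0 for c in range(k)] for r in range(k)]]
    for i in range(1, steps + 1):
        total = [[0.0] * k for _ in range(k)]
        for j in range(1, min(i, len(a_list)) + 1):
            term = matmul(phi[i - j], a_list[j - 1])
            total = [[total[r][c] + term[r][c] for c in range(k)] for r in range(k)]
        phi.append(total)
    return phi

def cholesky_for_order(sigma, order):
    """P with P P' = sigma that is recursive in the given variable ordering."""
    inverse = [order.index(v) for v in range(len(order))]
    permuted = [[sigma[r][c] for c in order] for r in order]
    lower = cholesky(permuted)
    # Undo the permutation so rows and columns refer to the original variables.
    return [[lower[r][c] for c in inverse] for r in inverse]

# Hypothetical VAR(1) coefficients and innovation covariance (K = 2).
A1 = [[0.5, 0.1], [0.4, 0.5]]
SIGMA = [[4.0, 2.0], [2.0, 5.0]]

phi = simple_irfs([A1], steps=2)

# Orthogonalized IRFs Theta_i = Phi_i P under two Cholesky orderings.
P = cholesky_for_order(SIGMA, [0, 1])
P_rev = cholesky_for_order(SIGMA, [1, 0])
oirf = [matmul(ph, P) for ph in phi]
oirf_rev = [matmul(ph, P_rev) for ph in phi]
```

Both P and P_rev reproduce SIGMA when multiplied by their transposes, yet the impact responses differ: with the first ordering, a shock to the second variable has no contemporaneous effect on the first, whereas with the reversed ordering it does.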


The SVAR approach integrates the need to identify the causal IRFs into the model specification and
estimation process. Sufficient identification restrictions can be obtained by placing either short-run or
long-run restrictions on the model. The VAR in (1) can be rewritten as

$$y_t - v - A_1 y_{t-1} - \cdots - A_p y_{t-p} = u_t$$

Similarly, a short-run SVAR model can be written as

$$A(y_t - v - A_1 y_{t-1} - \cdots - A_p y_{t-p}) = Au_t = Be_t \tag{3}$$

where $A$ and $B$ are $K \times K$ nonsingular matrices of parameters to be estimated, $e_t$ is a $K \times 1$ vector of disturbances with $e_t \sim N(0, I_K)$, and $E(e_t e_s') = 0_K$ for all $s \neq t$. Sufficient constraints must be placed on $A$ and $B$ so that $P$ is identified. One way to see the connection is to draw out the implications of the latter equality in (3). From (3) it can be shown that

$$\Sigma = A^{-1}B(A^{-1}B)'$$

As discussed in [TS] var svar, the estimates $\widehat{A}$ and $\widehat{B}$ are obtained by maximizing the concentrated log-likelihood function on the basis of the $\widehat{\Sigma}$ obtained from the underlying VAR. The short-run SVAR approach chooses $P = \widehat{A}^{-1}\widehat{B}$ to identify the causal IRFs. The long-run SVAR approach works similarly, with $P = \widehat{C} = \widehat{\bar{A}}^{-1}\widehat{B}$, where $\widehat{\bar{A}}^{-1}$ is the matrix of estimated long-run or accumulated effects of the reduced-form VAR shocks.
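
As a small numerical check of the relationship $\Sigma = A^{-1}B(A^{-1}B)'$, the sketch below builds $P = A^{-1}B$ for a two-variable short-run model. The matrices A and B are hypothetical values, not estimates from svar, and the helper names are illustrative only.

```python
def matmul(a, b):
    """Plain-list matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def inverse_2x2(m):
    """Inverse of a nonsingular 2 x 2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

# Hypothetical short-run restrictions: A is lower triangular with a unit
# diagonal and B is diagonal, that is, a recursive structure.
A = [[1.0, 0.0], [-0.5, 1.0]]
B = [[2.0, 0.0], [0.0, 2.0]]

P = matmul(inverse_2x2(A), B)        # P = A^{-1} B
SIGMA = matmul(P, transpose(P))      # implied covariance of u_t
```

Here P is lower triangular with a positive diagonal, so it coincides with the Cholesky factor of the implied SIGMA; this is the sense in which a Cholesky decomposition corresponds to a recursive structural model.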
There is one important difference between long-run and short-run SVAR models. As discussed by
Amisano and Giannini (1997, chap. 6), in the short-run model the constraints are applied directly to
the parameters in A and B. Then A and B interact with the estimated parameters of the underlying
VAR. In contrast, in a long-run model, the constraints are placed on functions of the estimated VAR
parameters. Although estimation and inference of the parameters in C is straightforward, obtaining
the asymptotic standard errors of the structural IRFs requires untenable assumptions. For this reason,
irf create does not estimate the asymptotic standard errors of the structural IRFs generated by
long-run SVAR models. However, bootstrap standard errors are still available.
An introduction to dynamic-multiplier functions for VARs
A dynamic-multiplier function measures the effect of a unit change in an exogenous variable on the
endogenous variables over time. Per Lütkepohl (2005, chap. 10), if the VAR with exogenous variables is stable, it can be rewritten as

$$y_t = \sum_{i=0}^{\infty} D_i x_{t-i} + \sum_{i=0}^{\infty} \Phi_i u_{t-i}$$

where the Di are the dynamic-multiplier functions. (See Methods and formulas for details.) Some
authors refer to the dynamic-multiplier functions as transfer functions because they specify how a
unit change in an exogenous variable is transferred to the endogenous variables.


Technical note
irf create computes dynamic-multiplier functions only after var. After short-run SVAR models,
the dynamic multipliers from the VAR are the same as those from the SVAR. The dynamic multipliers
for long-run SVARs have not yet been worked out.

An introduction to forecast-error variance decompositions for VARs


Another measure of the effect of the innovations in variable k on variable j is the FEVD. This
method, which is also known as innovation accounting, measures the fraction of the error in forecasting
variable j after h periods that is attributable to the orthogonalized innovations in variable k . Because
deriving the FEVD requires orthogonalizing the ut innovations, the FEVD is always predicated upon
a choice of P.
Lütkepohl (2005, sec. 2.2.2) shows that the $h$-step forecast error can be written as

$$y_{t+h} - \widehat{y}_t(h) = \sum_{i=0}^{h-1} \Phi_i u_{t+h-i} \tag{4}$$

where $y_{t+h}$ is the value observed at time $t + h$ and $\widehat{y}_t(h)$ is the $h$-step-ahead predicted value for $y_{t+h}$ that was made at time $t$.

Because the $u_t$ are contemporaneously correlated, their distinct contributions to the forecast error cannot be ascertained. However, if we choose a $P$ such that $\Sigma = PP'$, as above, we can orthogonalize the $u_t$ into $w_t = P^{-1}u_t$. We can then ascertain the relative contribution of the distinct elements of $w_t$. Thus we can rewrite (4) as

$$y_{t+h} - \widehat{y}_t(h) = \sum_{i=0}^{h-1} \Phi_i P P^{-1} u_{t+h-i} = \sum_{i=0}^{h-1} \Theta_i w_{t+h-i}$$

Because the forecast errors can be written in terms of the orthogonalized errors, the forecast-error variance can be written in terms of the orthogonalized error variances. Forecast-error variance decompositions measure the fraction of the total forecast-error variance that is attributable to each orthogonalized shock.

Technical note
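The details in this note are not critical to the discussion that follows. A forecast-error variance decomposition is derived for a given $P$. Per Lütkepohl (2005, sec. 2.3.3), letting $\theta_{mn,i}$ be the $m, n$th element of $\Theta_i$, we can express the $h$-step forecast error of the $j$th component of $y_t$ as

$$\begin{aligned}
y_{j,t+h} - \widehat{y}_j(h) &= \sum_{i=0}^{h-1} \left( \theta_{j1,i} w_{1,t+h-i} + \cdots + \theta_{jK,i} w_{K,t+h-i} \right) \\
&= \sum_{k=1}^{K} \left( \theta_{jk,0} w_{k,t+h} + \cdots + \theta_{jk,h-1} w_{k,t+1} \right)
\end{aligned}$$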


The $w_t$, which were constructed using $P$, are mutually orthogonal with unit variance. This allows us to compute easily the mean squared error (MSE) of the forecast of variable $j$ at horizon $h$ in terms of the contributions of the components of $w_t$. Specifically,

$$E[\{y_{j,t+h} - y_{j,t}(h)\}^2] = \sum_{k=1}^{K} \left( \theta_{jk,0}^2 + \cdots + \theta_{jk,h-1}^2 \right)$$

The $k$th term in the sum above is interpreted as the contribution of the orthogonalized innovations in variable $k$ to the $h$-step forecast error of variable $j$. Note that the $k$th element in the sum above can be rewritten as

$$\left( \theta_{jk,0}^2 + \cdots + \theta_{jk,h-1}^2 \right) = \sum_{i=0}^{h-1} \left( e_j' \Theta_i e_k \right)^2$$

where $e_i$ is the $i$th column of $I_K$. Normalizing by the forecast error for variable $j$ at horizon $h$ yields

$$\omega_{jk,h} = \frac{\sum_{i=0}^{h-1} \left( e_j' \Theta_i e_k \right)^2}{\text{MSE}\{y_{j,t}(h)\}}$$

where $\text{MSE}\{y_{j,t}(h)\} = \sum_{i=0}^{h-1} \sum_{k=1}^{K} \theta_{jk,i}^2$.

Because the FEVD depends on the choice of $P$, there are different forecast-error variance decompositions associated with each distinct $P$. irf create can estimate the FEVD for a VAR or an SVAR. For a VAR, $P$ is the Cholesky decomposition of $\widehat{\Sigma}$. For an SVAR, $P$ is the estimated structural decomposition, $P = \widehat{A}^{-1}\widehat{B}$ for short-run models and $P = \widehat{C}$ for long-run SVAR models. Due to the same complications that arose with the structural impulse–response functions, the asymptotic standard errors of the structural FEVD are not available after long-run SVAR models, but bootstrap standard errors are still available.
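
The normalization that defines the FEVD is easy to reproduce by hand. The sketch below is an illustration only, with hypothetical orthogonalized IRF matrices THETA for a two-variable system and a hypothetical function name; it computes the share of the $h$-step forecast-error variance of each variable $j$ that is attributable to each orthogonalized shock $k$.

```python
def fevd(theta, h):
    """Return omega[j][k] for horizon h, given theta[i] = Theta_i for i = 0..h-1.

    omega[j][k] = sum_i theta[i][j][k]**2 / MSE_j(h), where MSE_j(h) is the
    sum of squared (j, k) elements over all shocks k and steps i < h.
    """
    k_vars = len(theta[0])
    omega = []
    for j in range(k_vars):
        contrib = [sum(theta[i][j][k] ** 2 for i in range(h)) for k in range(k_vars)]
        mse_j = sum(contrib)
        omega.append([c / mse_j for c in contrib])
    return omega

# Hypothetical orthogonalized IRFs Theta_0 and Theta_1 (K = 2).
THETA = [
    [[2.0, 0.0], [1.0, 2.0]],
    [[1.1, 0.2], [1.3, 1.0]],
]

one_step = fevd(THETA, 1)
two_step = fevd(THETA, 2)
```

Each row of the result sums to 1. At one step, all the forecast-error variance of the first variable is attributed to its own shock because the (1, 2) element of the hypothetical Theta_0 is zero, as it would be under a Cholesky decomposition with that variable ordered first.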

IRF results for VECMs


An introduction to impulse–response functions for VECMs
As discussed in [TS] vec intro, the VECM is a reparameterization of the VAR that is especially useful for fitting VARs with cointegrating variables. This implies that the estimated parameters for the corresponding VAR model can be backed out from the estimated parameters of the VECM model. This relationship means we can use the VAR form of the cointegrating VECM to discuss the IRFs for VECMs.

Consider a cointegrating VAR with one lag with no constant or trend,

$$y_t = Ay_{t-1} + u_t \tag{5}$$

where $y_t$ is a $K \times 1$ vector of endogenous, first-difference stationary variables among which there are $1 \leq r < K$ cointegration equations; $A$ is a $K \times K$ matrix of parameters; and $u_t$ is a $K \times 1$ vector of i.i.d. disturbances.


We developed intuition for the IRFs from a stationary VAR by rewriting the VAR as an infinite-order vector moving-average (VMA) process. While the Granger representation theorem establishes the existence of a VMA formulation of this model, because the cointegrating VAR is not stable, the inversion is not nearly so intuitive. (See Johansen [1995, chapters 3 and 4] for more details.) For this reason, we use (5) to develop intuition for the IRFs from a cointegrating VAR.

Suppose that $K$ is 3, that $u_1 = (1, 0, 0)$, and that we want to analyze the time paths of the variables in $y$ conditional on the initial values $y_0 = 0$, $A$, and the condition that there are no more shocks to the system, that is, $0 = u_2 = u_3 = \cdots$. These assumptions and (5) imply that

$$\begin{aligned}
y_1 &= u_1 \\
y_2 &= Ay_1 = Au_1 \\
y_3 &= Ay_2 = A^2 u_1
\end{aligned}$$

and so on. The $i$th-row element of the first column of $A^s$ contains the effect of the unit shock to the first variable on the $i$th variable after $s$ periods. The first column of $A^s$ contains the IRF of a unit impulse to the first variable after $s$ periods. We could deduce the IRFs of a unit impulse to any of the other variables by administering the unit shock to one of them instead of to the first variable. Thus we can see that the $(i, j)$th element of $A^s$ contains the unit IRF from variable $j$ to variable $i$ after $s$ periods. By starting with orthogonalized shocks of the form $P^{-1}u_t$, we can use the same logic to derive the OIRFs to be $A^s P$.

For the stationary VAR, stability implies that all the eigenvalues of $A$ have moduli strictly less than one, which in turn implies that all the elements of $A^s \to 0$ as $s \to \infty$. This implies that all the IRFs from a stationary VAR taper off to zero as $s \to \infty$. In contrast, in a cointegrating VAR, some of the eigenvalues of $A$ are 1, while the remaining eigenvalues have moduli strictly less than 1. This implies that in cointegrating VARs some of the elements of $A^s$ are not going to zero as $s \to \infty$, which in turn implies that some of the IRFs and OIRFs are not going to zero as $s \to \infty$. The fact that the IRFs and OIRFs taper off to zero for stationary VARs but not for cointegrating VARs is one of the key differences between the two models.
When the IRF or OIRF from the innovation in one variable to another tapers off to zero as time
goes on, the innovation to the first variable is said to have a transitory effect on the second variable.
When the IRF or OIRF does not go to zero, the effect is said to be permanent.
Note that, because some of the IRFs and OIRFs do not taper off to zero, some of the cumulative
IRFs and OIRFs diverge over time.
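
The contrast between transitory and permanent effects can be seen by raising $A$ to successive powers. The sketch below uses two hypothetical $2 \times 2$ coefficient matrices, one with both eigenvalues inside the unit circle and one with an eigenvalue equal to 1; the names and values are assumptions made for illustration.

```python
def matmul(a, b):
    """Plain-list matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def unit_irf(a, impulse, response, steps):
    """(response, impulse) element of A^s for s = 0, 1, ..., steps."""
    k = len(a)
    power = [[1.0 if r == c else 0.0 for c in range(k)] for r in range(k)]
    path = []
    for _ in range(steps + 1):
        path.append(power[response][impulse])
        power = matmul(power, a)
    return path

# Hypothetical triangular matrices, so the eigenvalues are the diagonal entries.
A_STATIONARY = [[0.5, 0.0], [0.2, 0.3]]   # eigenvalues 0.5 and 0.3
A_COINT = [[1.0, 0.0], [0.5, 0.5]]        # eigenvalues 1 and 0.5

transitory = unit_irf(A_STATIONARY, impulse=0, response=1, steps=50)
permanent = unit_irf(A_COINT, impulse=0, response=1, steps=50)

# Running sum of the permanent response, which grows without bound.
cumulative = []
total = 0.0
for value in permanent:
    total += value
    cumulative.append(total)
```

In the second system the response of the second variable to a unit shock in the first is $1 - 0.5^s$, which converges to 1 rather than 0, so its cumulative IRF keeps growing with the horizon.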

An introduction to forecast-error variance decompositions for VECMs


The results from An introduction to impulse–response functions for VECMs can be used to show that the interpretation of FEVDs for a finite number of steps in cointegrating VARs is essentially the same as in the stationary case. Because the MSE of the forecast is diverging, this interpretation is valid only for a finite number of steps. (See [TS] vec intro and [TS] fcast compute for more information on this point.)


IRF results for ARIMA and ARFIMA


A covariance-stationary additive ARMA($p, q$) model can be written as

$$\rho(L^p)(y_t - x_t\beta) = \theta(L^q)\epsilon_t$$

where

$$\begin{aligned}
\rho(L^p) &= 1 - \rho_1 L - \rho_2 L^2 - \cdots - \rho_p L^p \\
\theta(L^q) &= 1 + \theta_1 L + \theta_2 L^2 + \cdots + \theta_q L^q
\end{aligned}$$

and $L^j y_t = y_{t-j}$.

We can rewrite the above model as an infinite-order moving-average process

$$y_t = x_t\beta + \psi(L)\epsilon_t$$

where

$$\psi(L) = \frac{\theta(L)}{\rho(L)} = 1 + \psi_1 L + \psi_2 L^2 + \cdots \tag{6}$$

This representation shows the impact of the past innovations on the current $y_t$. The $i$th coefficient describes the response of $y_t$ to a one-time impulse in $\epsilon_{t-i}$, holding everything else constant. The $\psi_i$ coefficients are collectively referred to as the impulse–response function of the ARMA model. For a covariance-stationary series, the $\psi_i$ coefficients decay exponentially.

A covariance-stationary multiplicative seasonal ARMA model, often abbreviated SARMA, of order $(p, q) \times (P, Q)_s$ can be written as

$$\rho(L^p)\rho_s(L^P)(y_t - x_t\beta) = \theta(L^q)\theta_s(L^Q)\epsilon_t$$

where

$$\begin{aligned}
\rho_s(L^P) &= (1 - \rho_{s,1} L^s - \rho_{s,2} L^{2s} - \cdots - \rho_{s,P} L^{Ps}) \\
\theta_s(L^Q) &= (1 + \theta_{s,1} L^s + \theta_{s,2} L^{2s} + \cdots + \theta_{s,Q} L^{Qs})
\end{aligned}$$

with $\rho(L^p)$ and $\theta(L^q)$ defined as above.

We can express this model as an additive ARMA model by multiplying the terms and imposing nonlinear constraints on multiplied coefficients. For example, consider the SARMA model given by

$$(1 - \rho_1 L)(1 - \rho_{4,1} L^4)y_t = \epsilon_t$$

Expanding the above equation and solving for $y_t$ yields

$$y_t = \rho_1 y_{t-1} + \rho_{4,1} y_{t-4} - \rho_1 \rho_{4,1} y_{t-5} + \epsilon_t$$

or, in ARMA terms,

$$y_t = \rho_1 y_{t-1} + \rho_4 y_{t-4} + \rho_5 y_{t-5} + \epsilon_t$$

subject to the constraint $\rho_5 = -\rho_1 \rho_{4,1}$.


An ARFIMA($p, d, q$) model can be written as

$$\rho(L^p)(1 - L)^d (y_t - x_t\beta) = \theta(L^q)\epsilon_t$$

with $(1 - L)^d$ denoting a fractional integration operation.

Solving for $y_t$, we obtain

$$y_t = x_t\beta + (1 - L)^{-d}\psi(L)\epsilon_t$$

This makes it clear that the impulse–response function for an ARFIMA model corresponds to a fractionally differenced impulse–response function for an ARIMA model. Because of the fractional differencing, the $\psi_i$ coefficients decay very slowly; see Remarks and examples in [TS] arfima.

Methods and formulas


Methods and formulas are presented under the following headings:
Impulse–response function formulas for VARs
Dynamic-multiplier function formulas for VARs
Forecast-error variance decomposition formulas for VARs
Impulse–response function formulas for VECMs
Algorithms for bootstrapping the VAR IRF and FEVD standard errors
Impulse–response function formulas for ARIMA and ARFIMA

Impulse–response function formulas for VARs


The previous discussion implies that there are three different choices of $P$ that can be used to obtain distinct $\Theta_i$. $P$ is the Cholesky decomposition of $\Sigma$ for the OIRFs. For the structural IRFs, $P = A^{-1}B$ for short-run models, and $P = C$ for long-run models. We will distinguish between the three by defining $\Theta_i^{o}$ to be the OIRFs, $\Theta_i^{sr}$ to be the short-run structural IRFs, and $\Theta_i^{lr}$ to be the long-run structural IRFs.

We also define $\widehat{P}_c$ to be the Cholesky decomposition of $\widehat{\Sigma}$, $\widehat{P}_{sr} = \widehat{A}^{-1}\widehat{B}$ to be the short-run structural decomposition, and $\widehat{P}_{lr} = \widehat{C}$ to be the long-run structural decomposition.

Given estimates of the $\widehat{A}_i$ and $\widehat{\Sigma}$ from var or svar, the estimates of the simple IRFs and the OIRFs are, respectively,

$$\widehat{\Phi}_i = \sum_{j=1}^{i} \widehat{\Phi}_{i-j} \widehat{A}_j$$

and

$$\widehat{\Theta}_i^{o} = \widehat{\Phi}_i \widehat{P}_c$$

where $\widehat{A}_j = 0_K$ for $j > p$.

Given the estimates $\widehat{A}$ and $\widehat{B}$, or $\widehat{C}$, from svar, the estimates of the structural IRFs are either

$$\widehat{\Theta}_i^{sr} = \widehat{\Phi}_i \widehat{P}_{sr}$$

or

$$\widehat{\Theta}_i^{lr} = \widehat{\Phi}_i \widehat{P}_{lr}$$


The estimated structural IRFs stored in an IRF file with the variable name sirf may be from either a short-run model or a long-run model, depending on the estimation results used to create the IRFs. As discussed in [TS] irf describe, you can easily determine whether the structural IRFs were generated from a short-run or a long-run SVAR model using irf describe.

Following Lütkepohl (2005, sec. 3.7), estimates of the cumulative IRFs and the cumulative orthogonalized impulse–response functions (COIRFs) at period $n$ are, respectively,

$$\widehat{\Psi}_n = \sum_{i=0}^{n} \widehat{\Phi}_i$$

and

$$\widehat{\Xi}_n = \sum_{i=0}^{n} \widehat{\Theta}_i$$

The asymptotic standard errors of the different impulse–response functions are obtained by applications of the delta method. See Lütkepohl (2005, sec. 3.7) and Amisano and Giannini (1997, chap. 4) for the derivations. See Serfling (1980, sec. 3.3) for a discussion of the delta method. In presenting the variance–covariance matrix estimators, we make extensive use of the vec() operator, where vec($X$) is the vector obtained by stacking the columns of $X$.

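Lütkepohl (2005, sec. 3.7) derives the asymptotic VCEs of $\text{vec}(\Phi_i)$, $\text{vec}(\Theta_i^{o})$, $\text{vec}(\widehat{\Psi}_n)$, and $\text{vec}(\widehat{\Xi}_n)$. Because $\text{vec}(\Phi_i)$ is $K^2 \times 1$, the asymptotic VCE of $\text{vec}(\Phi_i)$ is $K^2 \times K^2$, and it is given by

$$G_i \widehat{\Sigma}_{\widehat{\alpha}} G_i'$$

where

$$G_i = \sum_{m=0}^{i-1} J(\widehat{M}')^{(i-1-m)} \otimes \widehat{\Phi}_m \qquad G_i \text{ is } K^2 \times K^2 p$$

$$J = (I_K, 0_K, \ldots, 0_K) \qquad J \text{ is } K \times Kp$$

$$\widehat{M} = \begin{bmatrix}
\widehat{A}_1 & \widehat{A}_2 & \cdots & \widehat{A}_{p-1} & \widehat{A}_p \\
I_K & 0_K & \cdots & 0_K & 0_K \\
0_K & I_K & \cdots & 0_K & 0_K \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0_K & 0_K & \cdots & I_K & 0_K
\end{bmatrix} \qquad \widehat{M} \text{ is } Kp \times Kp$$

The $\widehat{A}_i$ are the estimates of the coefficients on the lagged variables in the VAR, and $\widehat{\Sigma}_{\widehat{\alpha}}$ is the VCE matrix of $\widehat{\alpha} = \text{vec}(\widehat{A}_1, \ldots, \widehat{A}_p)$. $\widehat{\Sigma}_{\widehat{\alpha}}$ is a $K^2 p \times K^2 p$ matrix whose elements come from the VCE of the VAR coefficient estimator. As such, this VCE is the VCE of the constrained estimator if there are any constraints placed on the VAR coefficients.

The $K^2 \times K^2$ asymptotic VCE matrix for $\text{vec}(\widehat{\Psi}_n)$ after $n$ periods is given by

$$F_n \widehat{\Sigma}_{\widehat{\alpha}} F_n'$$

where

$$F_n = \sum_{i=1}^{n} G_i$$

The $K^2 \times K^2$ asymptotic VCE matrix of the vectorized, orthogonalized IRFs at horizon $i$, $\text{vec}(\Theta_i^{o})$, is

$$C_i \widehat{\Sigma}_{\widehat{\alpha}} C_i' + \overline{C}_i \widehat{\Sigma}_{\widehat{\sigma}} \overline{C}_i'$$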


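where

$$C_0 = 0 \qquad C_0 \text{ is } K^2 \times K^2 p$$

$$C_i = (\widehat{P}_c' \otimes I_K) G_i, \quad i = 1, 2, \ldots \qquad C_i \text{ is } K^2 \times K^2 p$$

$$\overline{C}_i = (I_K \otimes \Phi_i) H, \quad i = 0, 1, \ldots \qquad \overline{C}_i \text{ is } K^2 \times K\tfrac{(K+1)}{2}$$

$$H = L_K' \left\{ L_K N_K (\widehat{P}_c \otimes I_K) L_K' \right\}^{-1} \qquad H \text{ is } K^2 \times K\tfrac{(K+1)}{2}$$

$$L_K \text{ solves } \text{vech}(F) = L_K \text{vec}(F) \text{ for } F \; K \times K \text{ and symmetric} \qquad L_K \text{ is } K\tfrac{(K+1)}{2} \times K^2$$

$$K_K \text{ solves } K_K \text{vec}(G) = \text{vec}(G') \text{ for any } K \times K \text{ matrix } G \qquad K_K \text{ is } K^2 \times K^2$$

$$N_K = \tfrac{1}{2}(I_{K^2} + K_K) \qquad N_K \text{ is } K^2 \times K^2$$

$$\widehat{\Sigma}_{\widehat{\sigma}} = 2 D_K^{+} (\widehat{\Sigma} \otimes \widehat{\Sigma}) D_K^{+\prime} \qquad \widehat{\Sigma}_{\widehat{\sigma}} \text{ is } K\tfrac{(K+1)}{2} \times K\tfrac{(K+1)}{2}$$

$$D_K^{+} = (D_K' D_K)^{-1} D_K' \qquad D_K^{+} \text{ is } K\tfrac{(K+1)}{2} \times K^2$$

$$D_K \text{ solves } D_K \text{vech}(F) = \text{vec}(F) \text{ for } F \; K \times K \text{ and symmetric} \qquad D_K \text{ is } K^2 \times K\tfrac{(K+1)}{2}$$

$$\text{vech}(X) = (x_{11}, x_{21}, \ldots, x_{K1}, x_{22}, \ldots, x_{K2}, \ldots, x_{KK})' \text{ for } X \; K \times K \qquad \text{vech}(X) \text{ is } K\tfrac{(K+1)}{2} \times 1$$

Note that $\widehat{\Sigma}_{\widehat{\sigma}}$ is the VCE of $\text{vech}(\widehat{\Sigma})$. More details about $L_K$, $K_K$, $D_K$, and vech() are available in Lütkepohl (2005, sec. A.12). Finally, as Lütkepohl (2005, 113–114) discusses, $D_K^{+}$ is the Moore–Penrose inverse of $D_K$.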
As discussed in Amisano and Giannini (1997, chap. 6), the asymptotic standard errors of the structural IRFs are available for short-run SVAR models but not for long-run SVAR models. Following Amisano and Giannini (1997, chap. 5), the asymptotic $K^2 \times K^2$ VCE of the short-run structural IRFs after $i$ periods, when a maximum of $h$ periods are estimated, is the $i, i$ block of

$$\widehat{\Sigma}(h)_{ij} = \widetilde{G}_i \widehat{\Sigma}_{\widehat{\alpha}} \widetilde{G}_j' + \left\{ I_K \otimes (J\widehat{M}^i J') \right\} \widehat{\Sigma}(0) \left\{ I_K \otimes (J\widehat{M}^j J') \right\}'$$


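where

$$\widetilde{G}_0 = 0_K \qquad \widetilde{G}_0 \text{ is } K^2 \times K^2 p$$

$$\widetilde{G}_i = \sum_{k=0}^{i-1} \left\{ \widehat{P}_{sr}' J (\widehat{M}')^{i-1-k} \otimes \left( J\widehat{M}^k J' \right) \right\} \qquad \widetilde{G}_i \text{ is } K^2 \times K^2 p$$

$$\widehat{\Sigma}(0) = Q_2 \widehat{\Sigma}_W Q_2' \qquad \widehat{\Sigma}(0) \text{ is } K^2 \times K^2$$

$$\widehat{\Sigma}_W = Q_1 \widehat{\Sigma}_{AB} Q_1' \qquad \widehat{\Sigma}_W \text{ is } K^2 \times K^2$$

$$Q_2 = \widehat{P}_{sr}' \otimes \widehat{P}_{sr} \qquad Q_2 \text{ is } K^2 \times K^2$$

$$Q_1 = \left\{ (I_K \otimes \widehat{B}^{-1}), \; (-\widehat{P}_{sr}'^{-1} \otimes \widehat{B}^{-1}) \right\} \qquad Q_1 \text{ is } K^2 \times 2K^2$$

and $\widehat{\Sigma}_{AB}$ is the $2K^2 \times 2K^2$ VCE of the estimator of $\text{vec}(A, B)$.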

Dynamic-multiplier function formulas for VARs


This section provides the details of how irf create estimates the dynamic-multiplier functions
and their asymptotic standard errors.
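A $p$th-order vector autoregressive model (VAR) with exogenous variables may be written as

$$y_t = v + A_1 y_{t-1} + \cdots + A_p y_{t-p} + B_0 x_t + B_1 x_{t-1} + \cdots + B_s x_{t-s} + u_t$$

where all the notation is the same as above except that the $s$ $K \times R$ matrices $B_1, B_2, \ldots, B_s$ are explicitly included and $s$ is the number of lags of the $R$ exogenous variables in the model.

Lütkepohl (2005) shows that the dynamic multipliers $D_i$ are consistently estimated by

$$\widehat{D}_i = J_x \widetilde{A}_x^{\,i} \widehat{B}_x \qquad i \in \{0, 1, \ldots\}$$

where

$$J_x = (I_K, 0_K, \ldots, 0_K) \qquad J_x \text{ is } K \times (Kp + Rs)$$

$$\widetilde{A}_x = \begin{bmatrix} \widehat{M} & \widehat{B} \\ \widetilde{0} & \widetilde{I} \end{bmatrix} \qquad \widetilde{A}_x \text{ is } (Kp + Rs) \times (Kp + Rs)$$

$$\widehat{B} = \begin{bmatrix}
\widehat{B}_1 & \widehat{B}_2 & \cdots & \widehat{B}_s \\
\ddot{0} & \ddot{0} & \cdots & \ddot{0} \\
\vdots & \vdots & \ddots & \vdots \\
\ddot{0} & \ddot{0} & \cdots & \ddot{0}
\end{bmatrix} \qquad \widehat{B} \text{ is } Kp \times Rs$$

$$\widetilde{I} = \begin{bmatrix}
0_R & 0_R & \cdots & 0_R & 0_R \\
I_R & 0_R & \cdots & 0_R & 0_R \\
0_R & I_R & \cdots & 0_R & 0_R \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0_R & 0_R & \cdots & I_R & 0_R
\end{bmatrix} \qquad \widetilde{I} \text{ is } Rs \times Rs$$

$$\widehat{B}_x' = \begin{bmatrix} \widetilde{B}' & \ddot{I}' \end{bmatrix} \qquad \widehat{B}_x' \text{ is } R \times (Kp + Rs)$$

$$\widetilde{B}' = \begin{bmatrix} \widehat{B}_0' & \ddot{0}' & \cdots & \ddot{0}' \end{bmatrix} \qquad \widetilde{B}' \text{ is } R \times Kp$$

$$\ddot{I}' = \begin{bmatrix} I_R & 0_R & \cdots & 0_R \end{bmatrix} \qquad \ddot{I}' \text{ is } R \times Rs$$

and $\ddot{0}$ is a $K \times R$ matrix of 0s and $\widetilde{0}$ is a $Rs \times Kp$ matrix of 0s.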


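Consistent estimators of the cumulative dynamic-multiplier functions are given by

$$\overline{D}_i = \sum_{j=0}^{i} \widehat{D}_j$$

Letting $\beta_x = \text{vec}[A_1 \; A_2 \; \cdots \; A_p \; B_1 \; B_2 \; \cdots \; B_s \; B_0]$ and letting $\widehat{\Sigma}_{\widehat{\beta}_x}$ be the asymptotic variance–covariance estimator (VCE) of $\widehat{\beta}_x$, Lütkepohl shows that an asymptotic VCE of $\widehat{D}_i$ is $\widetilde{G}_i \widehat{\Sigma}_{\widehat{\beta}_x} \widetilde{G}_i'$, where

$$\widetilde{G}_i = \left[ \sum_{j=0}^{i-1} \widehat{B}_x' \left( \widetilde{A}_x' \right)^{i-1-j} \otimes J_x \widetilde{A}_x^{\,j} J_x' , \;\; I_R \otimes J_x \widetilde{A}_x^{\,i} J_x' \right]$$

Similarly, an asymptotic VCE of $\overline{D}_i$ is

$$\left( \sum_{j=0}^{i} \widetilde{G}_j \right) \widehat{\Sigma}_{\widehat{\beta}_x} \left( \sum_{j=0}^{i} \widetilde{G}_j' \right)$$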
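
To make the companion-form expression $\widehat{D}_i = J_x \widetilde{A}_x^{\,i} \widehat{B}_x$ concrete, the sketch below works through the smallest nontrivial case, $K = 1$, $p = 1$, $R = 1$, $s = 1$, and cross-checks it by simulating a unit pulse in the exogenous variable. The coefficient values and function names are hypothetical and are not output from var.

```python
def matmul(a, b):
    """Plain-list matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dynamic_multipliers(a1, b0, b1, steps):
    """D_i = J_x A_x^i B_x for y_t = a1*y_{t-1} + b0*x_t + b1*x_{t-1} + u_t."""
    a_x = [[a1, b1],       # top block row: [M, B]
           [0.0, 0.0]]     # bottom block row: zeros, and the s = 1 shift block is 0
    b_x = [[b0], [1.0]]    # stacked [B_0; I_R]
    j_x = [[1.0, 0.0]]     # selects the K = 1 endogenous variable
    power = [[1.0, 0.0], [0.0, 1.0]]
    out = []
    for _ in range(steps + 1):
        out.append(matmul(matmul(j_x, power), b_x)[0][0])
        power = matmul(power, a_x)
    return out

def simulate_pulse(a1, b0, b1, steps):
    """Response of y to x_0 = 1 and x_t = 0 otherwise, with no innovations."""
    x = [1.0] + [0.0] * steps
    y = []
    for t in range(steps + 1):
        y_lag = y[t - 1] if t > 0 else 0.0
        x_lag = x[t - 1] if t > 0 else 0.0
        y.append(a1 * y_lag + b0 * x[t] + b1 * x_lag)
    return y

# Hypothetical coefficients.
dm = dynamic_multipliers(a1=0.5, b0=2.0, b1=1.0, steps=3)
cdm = [sum(dm[: i + 1]) for i in range(len(dm))]   # cumulative multipliers
check = simulate_pulse(a1=0.5, b0=2.0, b1=1.0, steps=3)
```

With these values the multipliers are 2, 2, 1, and 0.5, and the cumulative multipliers are their running sums; the simulated pulse response agrees with the companion-form calculation step by step.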

Forecast-error variance decomposition formulas for VARs


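This section provides details of how irf create estimates the Cholesky FEVD, the structural FEVD, and their standard errors. Beginning with the Cholesky-based forecast-error decompositions, the fraction of the $h$-step-ahead forecast-error variance of variable $j$ that is attributable to the Cholesky orthogonalized innovations in variable $k$ can be estimated as

$$\widehat{\omega}_{jk,h} = \frac{\sum_{i=0}^{h-1} \left( e_j' \widehat{\Theta}_i e_k \right)^2}{\widehat{\text{MSE}}_j(h)}$$

where $\text{MSE}_j(h)$ is the $j$th diagonal element of

$$\sum_{i=0}^{h-1} \widehat{\Phi}_i \widehat{\Sigma} \widehat{\Phi}_i'$$

(See Lütkepohl [2005, 109] for a discussion of this result.) $\widehat{\omega}_{jk,h}$ and $\text{MSE}_j(h)$ are scalars. The square of the standard error of $\widehat{\omega}_{jk,h}$ is

$$d_{jk,h} \widehat{\Sigma}_{\widehat{\alpha}} d_{jk,h}' + \overline{d}_{jk,h} \widehat{\Sigma}_{\widehat{\sigma}} \overline{d}_{jk,h}'$$

where

$$d_{jk,h} = \frac{2}{\text{MSE}_j(h)^2} \sum_{i=0}^{h-1} \left\{ \text{MSE}_j(h) \left( e_j' \widehat{\Phi}_i \widehat{P}_c e_k \right) \left( e_k' \widehat{P}_c' \otimes e_j' \right) G_i - \left( e_j' \widehat{\Phi}_i \widehat{P}_c e_k \right)^2 \sum_{m=0}^{h-1} \left( e_j' \widehat{\Phi}_m \widehat{\Sigma} \otimes e_j' \right) G_m \right\} \qquad d_{jk,h} \text{ is } 1 \times K^2 p$$

$$\overline{d}_{jk,h} = \sum_{i=0}^{h-1} \left\{ \text{MSE}_j(h) \left( e_j' \widehat{\Phi}_i \widehat{P}_c e_k \right) \left( e_k' \otimes e_j' \widehat{\Phi}_i \right) H - \left( e_j' \widehat{\Phi}_i \widehat{P}_c e_k \right)^2 \sum_{m=0}^{h-1} \left( e_j' \widehat{\Phi}_m \otimes e_j' \widehat{\Phi}_m \right) D_K \right\} \frac{1}{\text{MSE}_j(h)^2} \qquad \overline{d}_{jk,h} \text{ is } 1 \times K\tfrac{(K+1)}{2}$$

$$G_0 = 0 \qquad G_0 \text{ is } K^2 \times K^2 p$$

and $D_K$ is the $K^2 \times K\{(K + 1)/2\}$ duplication matrix defined previously.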


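For the structural forecast-error decompositions, we follow Amisano and Giannini (1997, sec. 5.2). They define the matrix of structural forecast-error decompositions at horizon $s$, when a maximum of $h$ periods are estimated, as

$$\widehat{W}_s = \widehat{F}_s^{-1} \widehat{\widetilde{M}}_s \qquad \text{for } s = 1, \ldots, h + 1$$

$$\widehat{F}_s = \left( \sum_{i=0}^{s-1} \widehat{\Theta}_i^{sr} \widehat{\Theta}_i^{sr\prime} \right) \odot I_K$$

$$\widehat{\widetilde{M}}_s = \sum_{i=0}^{s-1} \widehat{\Theta}_i^{sr} \odot \widehat{\Theta}_i^{sr}$$

where $\odot$ is the Hadamard, or element-by-element, product.

The $K^2 \times K^2$ asymptotic VCE of $\text{vec}(\widehat{W}_s)$ is given by

$$\widetilde{Z}_s \widehat{\Sigma}(h) \widetilde{Z}_s'$$

where $\widehat{\Sigma}(h)$ is as derived previously, and

$$\widetilde{Z}_s = \left\{ \frac{\partial \text{vec}(\widehat{W}_s)}{\partial \text{vec}(\widehat{\Theta}_0^{sr})}, \frac{\partial \text{vec}(\widehat{W}_s)}{\partial \text{vec}(\widehat{\Theta}_1^{sr})}, \cdots, \frac{\partial \text{vec}(\widehat{W}_s)}{\partial \text{vec}(\widehat{\Theta}_h^{sr})} \right\}$$

$$\frac{\partial \text{vec}(\widehat{W}_s)}{\partial \text{vec}(\widehat{\Theta}_j^{sr})} = 2 \left\{ (I_K \otimes \widehat{F}_s^{-1}) \widetilde{D}(\widehat{\Theta}_j^{sr}) - (\widehat{W}_s' \otimes \widehat{F}_s^{-1}) \widetilde{D}(I_K) N_K (\widehat{\Theta}_j^{sr} \otimes I_K) \right\}$$

If $X$ is an $n \times n$ matrix, then $\widetilde{D}(X)$ is the $n^2 \times n^2$ matrix with $\text{vec}(X)$ on the diagonal and zeros in all the off-diagonal elements, and $N_K$ is as defined previously.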

Impulse–response function formulas for VECMs

We begin by providing the formulas for backing out the estimates of the $A_i$ from the $\Gamma_i$ estimated by vec. As discussed in [TS] vec intro, the VAR in (1) can be rewritten as a VECM:

$$\Delta y_t = v + \Pi y_{t-1} + \Gamma_1 \Delta y_{t-1} + \cdots + \Gamma_{p-1} \Delta y_{t-p+1} + \epsilon_t$$

vec estimates $\Pi$ and the $\Gamma_i$. Johansen (1995, 25) notes that

$$\Pi = \sum_{i=1}^{p} A_i - I_K \tag{6}$$

where $I_K$ is the $K$-dimensional identity matrix, and

$$\Gamma_i = -\sum_{j=i+1}^{p} A_j \tag{7}$$


Defining

$$\Gamma = I_K - \sum_{i=1}^{p-1} \Gamma_i$$

and using (6) and (7) allow us to solve for the $A_i$ as

$$\begin{aligned}
A_1 &= \Pi + \Gamma_1 + I_K \\
A_i &= \Gamma_i - \Gamma_{i-1} \qquad \text{for } i = \{2, \ldots, p - 1\}
\end{aligned}$$

and

$$A_p = -\Gamma_{p-1}$$

Using these formulas, we can back out estimates of $A_i$ from the estimates of the $\Gamma_i$ and $\Pi$ produced by vec. Then we simply use the formulas for the IRFs and OIRFs presented in Impulse–response function formulas for VARs.

The running sums of the IRFs and OIRFs over the steps within each impulse–response pair are the cumulative IRFs and OIRFs.
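
The mapping between the VECM parameters and the VAR coefficients can be verified with a round trip. The sketch below is an illustration under assumed values: the three coefficient matrices are hypothetical, and the function names are not part of any Stata command.

```python
def mat_add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def mat_sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def identity(k):
    return [[1.0 if r == c else 0.0 for c in range(k)] for r in range(k)]

def zeros(k):
    return [[0.0] * k for _ in range(k)]

def var_to_vecm(a_list):
    """Pi = sum_i A_i - I_K and Gamma_i = -sum_{j=i+1}^{p} A_j, i = 1..p-1."""
    k, p = len(a_list[0]), len(a_list)
    total = zeros(k)
    for a in a_list:
        total = mat_add(total, a)
    pi = mat_sub(total, identity(k))
    gammas = []
    for i in range(1, p):
        tail = zeros(k)
        for a in a_list[i:]:          # A_{i+1}, ..., A_p
            tail = mat_add(tail, a)
        gammas.append(mat_sub(zeros(k), tail))
    return pi, gammas

def vecm_to_var(pi, gammas):
    """A_1 = Pi + Gamma_1 + I_K, A_i = Gamma_i - Gamma_{i-1}, A_p = -Gamma_{p-1}."""
    k = len(pi)
    if not gammas:                    # p = 1: no lagged differences
        return [mat_add(pi, identity(k))]
    a_list = [mat_add(mat_add(pi, gammas[0]), identity(k))]
    for i in range(1, len(gammas)):
        a_list.append(mat_sub(gammas[i], gammas[i - 1]))
    a_list.append(mat_sub(zeros(k), gammas[-1]))
    return a_list

# Hypothetical VAR(3) coefficients (K = 2).
A = [
    [[0.5, 0.25], [0.0, 0.75]],
    [[0.25, 0.0], [0.125, 0.0]],
    [[0.125, -0.25], [0.0, 0.25]],
]

PI, GAMMAS = var_to_vecm(A)
A_BACK = vecm_to_var(PI, GAMMAS)
```

Applying vecm_to_var to the output of var_to_vecm returns the original coefficient matrices, which is the property that lets the VAR formulas for IRFs and OIRFs be reused after vec.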

Algorithms for bootstrapping the VAR IRF and FEVD standard errors
irf create offers two bootstrap algorithms for estimating the standard errors of the various IRFs and FEVDs. Both var and svar contain estimators for the coefficients in a VAR that are conditional on the first $p$ observations. The two bootstrap algorithms are also conditional on the first $p$ observations.

Specifying the bs option calculates the standard errors by bootstrapping the residuals. For a bootstrap with $R$ repetitions, this method uses the following algorithm:

1. Fit the model and save the estimated parameters.
2. Use the estimated coefficients to calculate the residuals.
3. Repeat steps 3a to 3d $R$ times.
   3a. Draw a simple random sample of size $T$ with replacement from the residuals. The random samples are drawn over the $K \times 1$ vectors of residuals. When the $t$th vector is drawn, all $K$ residuals are selected. This preserves the contemporaneous correlations among the residuals.
   3b. Use the $p$ initial observations, the sampled residuals, and the estimated coefficients to construct a new sample dataset.
   3c. Fit the model and calculate the different IRFs and FEVDs.
   3d. Save these estimates as observation $r$ in the bootstrapped dataset.
4. For each IRF and FEVD, the estimated standard deviation from the $R$ bootstrapped estimates is the estimated standard error of that impulse–response function or forecast-error variance decomposition.

Specifying the bsp option estimates the standard errors by a multivariate normal parametric bootstrap. The algorithm for the multivariate normal parametric bootstrap is identical to the one above, with the exception that 3a is replaced by 3a(bsp):

   3a(bsp). Draw $T$ pseudovariates from a multivariate normal distribution with covariance matrix $\widehat{\Sigma}$.
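
The residual-bootstrap steps can be sketched for the simplest possible case. The code below is an illustration only, assuming a univariate AR(1) without a constant ($K = 1$, $p = 1$) so that the IRF at step $i$ is the $i$th power of the coefficient; the simulated data, the seed, and the function names are all hypothetical, and this is not the implementation used by irf create.

```python
import random
import statistics

def fit_ar1(y):
    """Least-squares slope of y_t on y_{t-1} (no constant)."""
    num = sum(y[t] * y[t - 1] for t in range(1, len(y)))
    den = sum(y[t - 1] ** 2 for t in range(1, len(y)))
    return num / den

def irf(a, steps):
    """IRF of an AR(1): a**i for i = 0..steps."""
    return [a ** i for i in range(steps + 1)]

def bootstrap_irf_se(y, steps, reps, rng):
    a_hat = fit_ar1(y)                                         # step 1
    resid = [y[t] - a_hat * y[t - 1] for t in range(1, len(y))]  # step 2
    draws = []
    for _ in range(reps):                                      # step 3
        sampled = [rng.choice(resid) for _ in resid]           # 3a: with replacement
        y_star = [y[0]]                                        # 3b: keep the p = 1 initial obs
        for e in sampled:
            y_star.append(a_hat * y_star[-1] + e)
        draws.append(irf(fit_ar1(y_star), steps))              # 3c and 3d
    # Step 4: standard deviation across replications, step by step.
    return [statistics.stdev([d[i] for d in draws]) for i in range(steps + 1)]

# Hypothetical data: 200 observations from an AR(1) with coefficient 0.6.
rng = random.Random(12345)
y = [0.0]
for _ in range(200):
    y.append(0.6 * y[-1] + rng.gauss(0.0, 1.0))

se = bootstrap_irf_se(y, steps=4, reps=200, rng=rng)
```

The step-0 standard error is exactly zero because the impact response of an AR(1) is 1 in every replication. For a parametric variant in the spirit of bsp, step 3a would instead draw normal pseudovariates with the estimated residual variance.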


Impulse–response function formulas for ARIMA and ARFIMA

The previous discussion showed that a SARMA process can be rewritten as an ARMA process and that for an ARMA process, we can express $\psi(L)$ in terms of $\theta(L)$ and $\rho(L)$,

$$\psi(L) = \frac{\theta(L)}{\rho(L)}$$

Expanding the above, we obtain

$$\psi_0 + \psi_1 L + \psi_2 L^2 + \cdots = \frac{1 + \theta_1 L + \theta_2 L^2 + \cdots}{1 - \rho_1 L - \rho_2 L^2 - \cdots}$$

Given the estimate of the autoregressive terms $\widehat{\rho}$ and the moving-average terms $\widehat{\theta}$, the IRF is obtained by solving the above equation for the $\psi$ weights. The $\widehat{\psi}_i$ are calculated using the recursion

$$\widehat{\psi}_i = \widehat{\theta}_i + \sum_{j=1}^{p} \widehat{\rho}_j \widehat{\psi}_{i-j}$$

with $\psi_0 = 1$ and $\theta_i = 0$ for $i > \max(p, q + 1)$.
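
The recursion for the $\psi$ weights is short enough to write out directly. The sketch below is a minimal illustration with hypothetical coefficient values and function names; it treats $\psi_{i-j}$ as zero whenever $i - j < 0$ and treats moving-average coefficients beyond order $q$ as zero. It also expands the one-factor SARMA example $(1 - \rho_1 L)(1 - \rho_{4,1} L^4)$ into its additive AR coefficients before applying the same recursion.

```python
def psi_weights(rho, theta, steps):
    """psi_0 = 1 and psi_i = theta_i + sum_{j=1}^{p} rho_j * psi_{i-j}.

    rho[j-1] holds rho_j and theta[i-1] holds theta_i; terms with a
    negative psi index are dropped, and theta_i is 0 beyond len(theta).
    """
    psi = [1.0]
    for i in range(1, steps + 1):
        ma_part = theta[i - 1] if i <= len(theta) else 0.0
        ar_part = sum(rho[j - 1] * psi[i - j]
                      for j in range(1, min(i, len(rho)) + 1))
        psi.append(ma_part + ar_part)
    return psi

def expand_sarma_ar(rho_1, rho_s1, s):
    """Additive AR coefficients of (1 - rho_1 L)(1 - rho_s1 L^s), for s > 1."""
    rho = [0.0] * (s + 1)
    rho[0] = rho_1                 # coefficient on y_{t-1}
    rho[s - 1] = rho_s1            # coefficient on y_{t-s}
    rho[s] = -rho_1 * rho_s1       # constrained coefficient on y_{t-s-1}
    return rho

# Hypothetical ARMA(1,1): rho_1 = 0.5, theta_1 = 0.25.
arma_irf = psi_weights([0.5], [0.25], steps=3)

# Hypothetical SARMA with rho_1 = 0.5 and rho_{4,1} = 0.5.
sarma_rho = expand_sarma_ar(0.5, 0.5, s=4)
sarma_irf = psi_weights(sarma_rho, [], steps=5)
```

For the ARMA(1,1) values the weights are 1, 0.75, 0.375, and 0.1875, halving after the first step as expected for an autoregressive coefficient of 0.5; the seasonal example shows a second bump at lag 4.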


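The asymptotic standard errors for the IRF for ARMA are calculated using the delta method; see Serfling (1980, sec. 3.3) for a discussion of the delta method. Let $\widehat{\Sigma}$ be the estimate of the variance–covariance matrix for $\widehat{\rho}$ and $\widehat{\theta}$, and let $\Psi$ be a matrix of derivatives of $\psi_i$ with respect to $\widehat{\rho}$ and $\widehat{\theta}$. Then the square of the standard error of $\widehat{\psi}_i$ is calculated as

$$\Psi_i \widehat{\Sigma} \Psi_i'$$

The IRF for the ARFIMA($p, d, q$) model is obtained by applying the filter $(1 - L)^{-d}$ to $\psi(L)$. The filter is given by Hassler and Kokoszka (2010) as

$$(1 - L)^{-d} = \sum_{i=0}^{\infty} \widehat{b}_i L^i$$

with $\widehat{b}_0 = 1$ and subsequent $\widehat{b}_i$ calculated by the recursion

$$\widehat{b}_i = \frac{\widehat{d} + i - 1}{i} \, \widehat{b}_{i-1}$$

The resulting IRF is then given by

$$\widehat{\phi}_i = \sum_{j=0}^{i} \widehat{\psi}_j \widehat{b}_{i-j}$$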
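
The fractional filter and the convolution that produces the ARFIMA IRF can be sketched as follows. The values of d and the short-memory weights passed in are hypothetical, as are the function names; the two limiting cases in the usage lines serve as sanity checks.

```python
def frac_filter(d, steps):
    """Coefficients b_i of (1 - L)^(-d): b_0 = 1, b_i = (d + i - 1) / i * b_{i-1}."""
    b = [1.0]
    for i in range(1, steps + 1):
        b.append((d + i - 1) / i * b[-1])
    return b

def arfima_irf(psi, d):
    """Convolve the short-memory weights psi with the fractional filter."""
    b = frac_filter(d, len(psi) - 1)
    return [sum(psi[j] * b[i - j] for j in range(i + 1)) for i in range(len(psi))]

# Hypothetical short-memory weights of an AR(1) with coefficient 0.5.
PSI = [1.0, 0.5, 0.25]

no_memory = arfima_irf(PSI, d=0.0)      # d = 0 leaves the ARMA IRF unchanged
unit_root = arfima_irf(PSI, d=1.0)      # d = 1 gives the cumulative ARMA IRF
pure_frac = frac_filter(0.5, steps=3)   # ARFIMA(0, 0.5, 0)
```

With d = 0.5 the filter coefficients are 1, 0.5, 0.375, and 0.3125, which fall off far more slowly than the geometric decay of a stationary ARMA IRF.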

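The asymptotic standard errors for the IRF for ARFIMA are calculated using the delta method. Let $\widehat{\Sigma}$ be the estimate of the variance–covariance matrix for $\widehat{\rho}$, $\widehat{\theta}$, and $\widehat{d}$, and let $\Phi$ be a matrix of derivatives of $\phi_i$ with respect to $\widehat{\rho}$, $\widehat{\theta}$, and $\widehat{d}$. Then the square of the standard error of $\widehat{\phi}_i$ is calculated as

$$\Phi_i \widehat{\Sigma} \Phi_i'$$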


References
Amisano, G., and C. Giannini. 1997. Topics in Structural VAR Econometrics. 2nd ed. Heidelberg: Springer.
Christiano, L. J., M. Eichenbaum, and C. L. Evans. 1999. Monetary policy shocks: What have we learned and to what end? In Handbook of Macroeconomics: Volume 1A, ed. J. B. Taylor and M. Woodford. New York: Elsevier.
Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
Hassler, U., and P. Kokoszka. 2010. Impulse responses of fractionally integrated processes with long memory. Econometric Theory 26: 1855–1861.
Johansen, S. 1995. Likelihood-Based Inference in Cointegrated Vector Autoregressive Models. Oxford: Oxford University Press.
Lütkepohl, H. 1993. Introduction to Multiple Time Series Analysis. 2nd ed. New York: Springer.
Lütkepohl, H. 2005. New Introduction to Multiple Time Series Analysis. New York: Springer.
Serfling, R. J. 1980. Approximation Theorems of Mathematical Statistics. New York: Wiley.
Sims, C. A. 1980. Macroeconomics and reality. Econometrica 48: 1–48.
Stock, J. H., and M. W. Watson. 2001. Vector autoregressions. Journal of Economic Perspectives 15: 101–115.

Also see
[TS] irf Create and analyze IRFs, dynamic-multiplier functions, and FEVDs
[TS] var intro Introduction to vector autoregressive models
[TS] vec intro Introduction to vector error-correction models

Title
irf ctable Combined tables of IRFs, dynamic-multiplier functions, and FEVDs
Syntax
Remarks and examples

Menu
Stored results

Description
Also see

Options

Syntax
irf ctable (spec_1) [(spec_2) ... [(spec_N)]] [, options]

where (spec_k) is

(irfname impulsevar responsevar stat [, spec_options])

irfname is the name of a set of IRF results in the active IRF file. impulsevar should be specified as an
endogenous variable for all statistics except dm and cdm; for those, specify as an exogenous variable.
responsevar is an endogenous variable name. stat is one or more statistics from the list below:
stat            Description

irf             impulse–response function
oirf            orthogonalized impulse–response function
dm              dynamic-multiplier function
cirf            cumulative impulse–response function
coirf           cumulative orthogonalized impulse–response function
cdm             cumulative dynamic-multiplier function
fevd            Cholesky forecast-error variance decomposition
sirf            structural impulse–response function
sfevd           structural forecast-error variance decomposition

options         Description

set(filename)   make filename active
noci            do not report confidence intervals
stderror        include standard errors for each statistic
individual      make an individual table for each combination
title("text")   use text as overall table title
step(#)         set common maximum step
level(#)        set confidence level; default is level(95)

spec_options    Description

noci            do not report confidence intervals
stderror        include standard errors for each statistic
level(#)        set confidence level; default is level(95)

ititle("text")  use text as individual subtitle for specific table

spec_options may be specified within a table specification, globally, or both. When specified in a table specification, the spec_options affect only the specification in which they are used. When supplied globally, the spec_options affect all table specifications. When specified in both places, options for the table specification take precedence.
ititle() does not appear in the dialog box.

Menu
Statistics > Multivariate time series > IRF and FEVD analysis > Combined tables

Description
irf ctable makes a table or a combined table of IRF results. Each block within a pair of matching parentheses, that is, each (spec_k), specifies the information for a specific table. irf ctable combines these tables into one table, unless the individual option is specified, in which case separate tables for each block are created.
irf ctable operates on the active IRF file; see [TS] irf set.

Options
set(filename) specifies the file to be made active; see [TS] irf set. If set() is not specified, the
active file is used.
noci suppresses reporting of the confidence intervals for each statistic. noci is assumed when the
model was fit by vec because no confidence intervals were estimated.
stderror specifies that standard errors for each statistic also be included in the table.
individual places each block, or (spec_k), in its own table. By default, irf ctable combines all the blocks into one table.
title("text") specifies a title for the table or the set of tables.
step(#) specifies the maximum number of steps to use for all tables. By default, each table is
constructed using all steps available.
level(#) specifies the default confidence level, as a percentage, for confidence intervals, when they
are reported. The default is level(95) or as set by set level; see [U] 20.7 Specifying the
width of confidence intervals.
The following option is available with irf ctable but is not shown in the dialog box:
ititle("text") specifies an individual subtitle for a specific table. ititle() may be specified only
when the individual option is also specified.

Remarks and examples


If you have not read [TS] irf, please do so.
Also see [TS] irf table for a slightly easier to use, but less powerful, table command.
irf ctable creates a series of tables from IRF results. The information enclosed within each set
of parentheses,


(irfname impulsevar responsevar stat [, spec_options])
forms a request for a specific table.
The first part (irfname impulsevar responsevar) identifies a set of IRF estimates or a set of variance
decomposition estimates. The next part (stat) specifies which statistics are to be included in the
table. The last part (spec_options) includes the noci, level(), and stderror options, and places
(or suppresses) additional columns in the table.


Each specific table displays the requested statistics corresponding to the specified combination of
irfname, impulsevar, and responsevar over the step horizon. By default, all the individual tables are
combined into one table. Also by default, all the steps, or periods, available are included in the table.
You can use the step() option to impose a common maximum for all tables.
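
The interplay between per-specification and global options can be sketched as follows. This is a minimal illustration, not part of the original entry: the file name ctable_demo is a hypothetical name chosen for the sketch, and the model is the same VAR on the lutkepohl2 data used elsewhere in this manual.

```stata
* Setup: fit the VAR and store two sets of IRF results
* (ctable_demo is a hypothetical file name used only in this sketch)
use http://www.stata-press.com/data/r13/lutkepohl2, clear
var dln_inv dln_inc dln_consump
irf set ctable_demo, replace
irf create ordera, step(8)
irf create orderb, order(dln_inc dln_inv dln_consump) step(8)

* noci is supplied globally, so it applies to both specifications;
* stderror is supplied only inside the second specification;
* step(4) imposes a common maximum step on every table
irf ctable (ordera dln_inc dln_consump oirf)              ///
           (orderb dln_inc dln_consump oirf, stderror),   ///
           noci step(4)
```

Under these assumptions, only the orderb block should carry a standard-error column, and both blocks should stop at the step given in step().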

Example 1
In example 1 of [TS] irf table, we fit a model using var and we saved the IRFs for two different
orderings. The commands we used were
. use http://www.stata-press.com/data/r13/lutkepohl2
. var dln_inv dln_inc dln_consump
. irf set results4
. irf create ordera, step(8)
. irf create orderb, order(dln_inc dln_inv dln_consump) step(8)

We then formed the desired table by typing


. irf table oirf fevd, impulse(dln_inc) response(dln_consump) noci std
> title("Ordera versus orderb")

Using irf ctable, we can form the equivalent table by typing


. irf ctable (ordera dln_inc dln_consump oirf fevd)
>            (orderb dln_inc dln_consump oirf fevd),
>            noci std title("Ordera versus orderb")

                   Ordera versus orderb

            (1)         (1)         (1)         (1)
  step      oirf        S.E.        fevd        S.E.
  0         .005123     .000878     0           0
  1         .001635     .000984     .288494     .077483
  2         .002948     .000993     .294288     .073722
  3         -.000221    .000662     .322454     .075562
  4         .000811     .000586     .319227     .074063
  5         .000462     .000333     .322579     .075019
  6         .000044     .000275     .323552     .075371
  7         .000151     .000162     .323383     .075314
  8         .000091     .000114     .323499     .075386

            (2)         (2)         (2)         (2)
  step      oirf        S.E.        fevd        S.E.
  0         .005461     .000925     0           0
  1         .001578     .000988     .327807     .08159
  2         .003307     .001042     .328795     .077519
  3         -.00019     .000676     .370775     .080604
  4         .000846     .000617     .366896     .079019
  5         .000491     .000349     .370399     .079941
  6         .000069     .000292     .371487     .080323
  7         .000158     .000172     .371315     .080287
  8         .000096     .000122     .371438     .080366

(1) irfname = ordera, impulse = dln_inc, and response = dln_consump
(2) irfname = orderb, impulse = dln_inc, and response = dln_consump

The output is displayed in one table. Because the table did not fit horizontally, it automatically
wrapped. At the bottom of the table is a list of keys that appear at the top of each column. The
results in the table above indicate that the orthogonalized IRFs do not change by much. Because the
estimated forecast-error variances do change, we might want to produce two tables that contain the
estimated forecast-error variance decompositions and their 95% confidence intervals:
. irf ctable (ordera dln_inc dln_consump fevd)
>            (orderb dln_inc dln_consump fevd), individual

                        Table 1

            (1)         (1)         (1)
  step      fevd        Lower       Upper
  0         0           0           0
  1         .288494     .13663      .440357
  2         .294288     .149797     .43878
  3         .322454     .174356     .470552
  4         .319227     .174066     .464389
  5         .322579     .175544     .469613
  6         .323552     .175826     .471277
  7         .323383     .17577      .470995
  8         .323499     .175744     .471253

95% lower and upper bounds reported
(1) irfname = ordera, impulse = dln_inc, and response = dln_consump

                        Table 2

            (2)         (2)         (2)
  step      fevd        Lower       Upper
  0         0           0           0
  1         .327807     .167893     .487721
  2         .328795     .17686      .48073
  3         .370775     .212794     .528757
  4         .366896     .212022     .52177
  5         .370399     .213718     .52708
  6         .371487     .214058     .528917
  7         .371315     .213956     .528674
  8         .371438     .213923     .528953

95% lower and upper bounds reported
(2) irfname = orderb, impulse = dln_inc, and response = dln_consump

Because we specified the individual option, the output contains two tables, one for each specific
table command. At the bottom of each table is a list of the keys used in that table and a note indicating
the level of the confidence intervals that we requested. The results from table 1 and table 2 indicate
that each estimated function is well within the confidence interval of the other, so we conclude that
the functions are not significantly different.


Stored results
irf ctable stores the following in r():
Scalars
    r(ncols)      number of columns in all tables
    r(k_umax)     number of distinct keys
    r(k)          number of specific table commands
Macros
    r(key#)       #th key
    r(tnotes)     list of keys applied to each column
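
A short sketch of how these stored results might be inspected after a call to irf ctable follows. The file name ctable_demo is a hypothetical name used only here; the r() names are those listed above.

```stata
* Setup (ctable_demo is a hypothetical file name used only in this sketch)
use http://www.stata-press.com/data/r13/lutkepohl2, clear
var dln_inv dln_inc dln_consump
irf set ctable_demo, replace
irf create ordera, step(8)
irf create orderb, order(dln_inc dln_inv dln_consump) step(8)

irf ctable (ordera dln_inc dln_consump fevd)   ///
           (orderb dln_inc dln_consump fevd), noci

return list                                  // everything left in r()
display "specific table commands: " r(k)     // scalar result
display "first key: `r(key1)'"               // macro result
```

Because r() is overwritten by the next r-class command, results that are needed later are best copied into local macros or scalars immediately.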

Also see
[TS] irf Create and analyze IRFs, dynamic-multiplier functions, and FEVDs
[TS] var intro Introduction to vector autoregressive models
[TS] vec intro Introduction to vector error-correction models


Title
irf describe Describe an IRF file
Syntax    Menu    Description    Options    Remarks and examples    Stored results    Also see

Syntax
irf describe [irf_resultslist] [, options]

options                  Description

set(filename)            make filename active
using(irf_filename)      describe irf_filename without making active
detail                   show additional details of IRF results
variables                show underlying structure of the IRF dataset

Menu
Statistics > Multivariate time series > Manage IRF results and files > Describe IRF file

Description
irf describe describes the IRF results saved in an IRF file.
If set() or using() is not specified, the IRF results of the active IRF file are described.

Options
set(filename) specifies the IRF file to be described and set; see [TS] irf set. If filename is specified
without an extension, .irf is assumed.
using(irf_filename) specifies the IRF file to be described. The active IRF file, if any, remains
unchanged. If irf_filename is specified without an extension, .irf is assumed.
detail specifies that irf describe display detailed information about each set of IRF results.
detail is implied when irf_resultslist is specified.
variables is a programmer's option; it additionally displays the output produced by the describe
command.

Remarks and examples


If you have not read [TS] irf, please do so.

Example 1
. use http://www.stata-press.com/data/r13/lutkepohl2
(Quarterly SA West German macro data, Bil DM, from Lutkepohl 1993 Table E.1)
. var dln_inv dln_inc dln_consump if qtr<=tq(1978q4), lags(1/2) dfk
(output omitted )

We create three sets of IRF results:


. irf create order1, set(myirfs, replace)
(file myirfs.irf created)
(file myirfs.irf now active)
(file myirfs.irf updated)
. irf create order2, order(dln_inc dln_inv dln_consump)
(file myirfs.irf updated)
. irf create order3, order(dln_inc dln_consump dln_inv)
(file myirfs.irf updated)
. irf describe
Contains irf results from myirfs.irf (dated 4 Apr 2013 12:36)

    irfname    model    endogenous variables and order (*)
    order1     var      dln_inv dln_inc dln_consump
    order2     var      dln_inc dln_inv dln_consump
    order3     var      dln_inc dln_consump dln_inv

(*) order is relevant only when model is var

The output reveals the order in which we specified the variables.


. irf describe order1

irf results for order1


Estimation specification
model: var
endog: dln_inv dln_inc dln_consump
sample: quarterly data from 1960q4 to 1978q4
lags: 1 2
constant: constant
exog: none
exogvars: none
exlags: none
varcns: unconstrained
IRF specification
step: 8
order: dln_inv dln_inc dln_consump
std error: asymptotic
reps: none

Here we see a summary of the model we fit as well as the specification of the IRFs.
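
The difference between the set() and using() options, and the results that irf describe leaves in r(), can be sketched as follows. This is an illustrative sketch only; describe_demo is a hypothetical file name, and the option behavior is taken from the Options section of this entry.

```stata
* Setup (describe_demo is a hypothetical file name used only in this sketch)
use http://www.stata-press.com/data/r13/lutkepohl2, clear
var dln_inv dln_inc dln_consump if qtr<=tq(1978q4), lags(1/2) dfk
irf create order1, set(describe_demo, replace)
irf create order2, order(dln_inc dln_inv dln_consump)

irf set, clear                              // unset the active IRF file
irf describe, using(describe_demo)          // summary; no file is made active
irf describe order2, using(describe_demo)   // naming a result implies detail

irf describe, set(describe_demo)            // describe and make the file active
return list                                 // inspect the r() results
display "`r(irfnames)'"                     // names of the stored IRF results
```

using() is convenient when a file should be inspected without disturbing whichever file is currently active; set() is the choice when later irf commands should operate on the described file.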


Stored results
irf describe stores the following in r():
Scalars
    r(N)                   number of observations in the IRF file
    r(k)                   number of variables in the IRF file
    r(width)               width of dataset in the IRF file
    r(N_max)               maximum number of observations
    r(k_max)               maximum number of variables
    r(widthmax)            maximum width of the dataset
    r(changed)             flag indicating that data have changed since last saved
Macros
    r(_version)            version of IRF results file
    r(irfnames)            names of IRF results in the IRF file
    r(irfname_model)       var, sr var, lr var, or vec
    r(irfname_order)       Cholesky order assumed in IRF estimates
    r(irfname_exog)        exogenous variables, and their lags, in VAR or underlying VAR
    r(irfname_exogvar)     exogenous variables in VAR or underlying VAR
    r(irfname_constant)    constant or noconstant
    r(irfname_lags)        lags in model
    r(irfname_exlags)      lags of exogenous variables in model
    r(irfname_tmin)        minimum value of timevar in the estimation sample
    r(irfname_tmax)        maximum value of timevar in the estimation sample
    r(irfname_timevar)     name of tsset timevar
    r(irfname_tsfmt)       format of timevar in the estimation sample
    r(irfname_varcns)      unconstrained or colon-separated list of constraints placed on
                           VAR coefficients
    r(irfname_svarcns)     "." or colon-separated list of constraints placed on SVAR coefficients
    r(irfname_step)        maximum step in IRF estimates
    r(irfname_stderror)    asymptotic, bs, bsp, or none, depending on type
                           of standard errors specified to irf create
    r(irfname_reps)        "." or number of bootstrap replications performed
    r(irfname_version)     version of IRF file that originally held irfname IRF results
    r(irfname_rank)        "." or number of cointegrating equations
    r(irfname_trend)       "." or trend() specified in vec
    r(irfname_veccns)      "." or constraints placed on VECM parameters
    r(irfname_sind)        "." or normalized seasonal indicators included in vec

Also see
[TS] irf Create and analyze IRFs, dynamic-multiplier functions, and FEVDs
[TS] var intro Introduction to vector autoregressive models
[TS] vec intro Introduction to vector error-correction models

Title
irf drop Drop IRF results from the active IRF file
Syntax    Menu    Description    Option    Remarks and examples    Also see

Syntax
irf drop irf_resultslist [, set(filename)]

Menu
Statistics > Multivariate time series > Manage IRF results and files > Drop IRF results

Description
irf drop removes IRF results from the active IRF file.

Option
set(filename) specifies the file to be made active; see [TS] irf set. If set() is not specified, the
active file is used.

Remarks and examples


If you have not read [TS] irf, please do so.

Example 1
. use http://www.stata-press.com/data/r13/lutkepohl2
(Quarterly SA West German macro data, Bil DM, from Lutkepohl 1993 Table E.1)
. var dln_inv dln_inc dln_consump if qtr<=tq(1978q4), lags(1/2) dfk
(output omitted )

We create three sets of IRF results:


. irf create order1, set(myirfs, replace)
(file myirfs.irf created)
(file myirfs.irf now active)
(file myirfs.irf updated)
. irf create order2, order(dln_inc dln_inv dln_consump)
(file myirfs.irf updated)
. irf create order3, order(dln_inc dln_consump dln_inv)
(file myirfs.irf updated)

. irf describe
Contains irf results from myirfs.irf (dated 4 Apr 2013 12:59)

    irfname    model    endogenous variables and order (*)
    order1     var      dln_inv dln_inc dln_consump
    order2     var      dln_inc dln_inv dln_consump
    order3     var      dln_inc dln_consump dln_inv

(*) order is relevant only when model is var

Now let's remove order1 and order2 from myirfs.irf.


. irf drop order1 order2
(order1 dropped)
(order2 dropped)
file myirfs.irf updated
. irf describe
Contains irf results from myirfs.irf (dated 4 Apr 2013 12:59)

    irfname    model    endogenous variables and order (*)
    order3     var      dln_inc dln_consump dln_inv

(*) order is relevant only when model is var

order1 and order2 have been dropped.
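
The set() option lets irf drop activate a file and remove results from it in one step. The following is a minimal sketch under the assumption that no IRF file is active beforehand; drop_demo is a hypothetical file name.

```stata
* Setup (drop_demo is a hypothetical file name used only in this sketch)
use http://www.stata-press.com/data/r13/lutkepohl2, clear
var dln_inv dln_inc dln_consump if qtr<=tq(1978q4), lags(1/2) dfk
irf create order1, set(drop_demo, replace)
irf create order2, order(dln_inc dln_inv dln_consump)

irf set, clear                     // unset the active IRF file
irf drop order1, set(drop_demo)    // make drop_demo.irf active, then drop order1
irf describe                       // only order2 should remain
```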

Also see
[TS] irf Create and analyze IRFs, dynamic-multiplier functions, and FEVDs
[TS] var intro Introduction to vector autoregressive models
[TS] vec intro Introduction to vector error-correction models

Title
irf graph Graphs of IRFs, dynamic-multiplier functions, and FEVDs
Syntax    Menu    Description    Options    Remarks and examples    Stored results    Also see

Syntax
irf graph stat [, options]

stat        Description

irf         impulse-response function
oirf        orthogonalized impulse-response function
dm          dynamic-multiplier function
cirf        cumulative impulse-response function
coirf       cumulative orthogonalized impulse-response function
cdm         cumulative dynamic-multiplier function
fevd        Cholesky forecast-error variance decomposition
sirf        structural impulse-response function
sfevd       structural forecast-error variance decomposition

Notes: 1. No statistic may appear more than once.
       2. If confidence intervals are included (the default), only two statistics may be included.
       3. If confidence intervals are suppressed (option noci), up to four statistics may be included.

options                               Description

Main
  set(filename)                       make filename active
  irf(irfnames)                       use irfnames IRF result sets
  impulse(impulsevar)                 use impulsevar as impulse variables
  response(endogvars)                 use endogenous variables as response variables
  noci                                suppress confidence bands
  level(#)                            set confidence level; default is level(95)
  lstep(#)                            use # for first step
  ustep(#)                            use # for maximum step

Advanced
  individual                          graph each combination individually
  iname(namestub[, replace])          stub for naming the individual graphs
  isaving(filenamestub[, replace])    stub for saving the individual graphs to files

Plots
  plot#opts(cline_options)            affect rendition of the line plotting the # stat

CI plots
  ci#opts(area_options)               affect rendition of the confidence interval for the # stat

Y axis, X axis, Titles, Legend, Overall
  twoway_options                      any options other than by() documented in
                                      [G-3] twoway_options
  byopts(by_option)                   how subgraphs are combined, labeled, etc.

Menu
Statistics > Multivariate time series > IRF and FEVD analysis > Graphs by impulse or response

Description
irf graph graphs impulse-response functions (IRFs), dynamic-multiplier functions, and forecast-error variance decompositions (FEVDs) over time.

Options


Main

set(filename) specifies the file to be made active; see [TS] irf set. If set() is not specified, the
active file is used.
irf(irfnames) specifies the IRF result sets to be used. If irf() is not specified, each of the results in
the active IRF file is used. (Files often contain just one set of IRF results saved under one irfname;
in that case, those results are used.)
impulse(impulsevar) and response(endogvars) specify the impulse and response variables. Usually
one of each is specified, and one graph is drawn. If multiple variables are specified, a separate
subgraph is drawn for each impulse-response combination. If impulse() and response() are
not specified, subgraphs are drawn for all combinations of impulse and response variables.
impulsevar should be specified as an endogenous variable for all statistics except dm or cdm; for
those, specify as an exogenous variable.
noci suppresses graphing the confidence interval for each statistic. noci is assumed when the model
was fit by vec because no confidence intervals were estimated.
level(#) specifies the default confidence level, as a percentage, for confidence intervals, when they
are reported. The default is level(95) or as set by set level; see [U] 20.7 Specifying the
width of confidence intervals. Also see [TS] irf cgraph for a graph command that allows the
confidence level to vary over the graphs.
lstep(#) specifies the first step, or period, to be included in the graphs. lstep(0) is the default.
ustep(#), # ≥ 1, specifies the maximum step, or period, to be included in the graphs.

Advanced

individual specifies that each graph be displayed individually. By default, irf graph combines
the subgraphs into one image. When individual is specified, byopts() may not be specified,
but the isaving() and iname() options may be specified.


iname(namestub[, replace]) specifies that the ith individual graph be stored in memory under
the name namestubi, which must be a valid Stata name of 24 characters or fewer. iname() may
be specified only with the individual option.


isaving(filenamestub[, replace]) specifies that the ith individual graph should be saved to disk
in the current working directory under the name filenamestubi.gph. isaving() may be specified
only when the individual option is also specified.
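
As an illustration of the individual and iname() options, the sketch below draws one graph per impulse-response pair and keeps each in memory. The file name graph_demo and the stub g are hypothetical names chosen for this example.

```stata
* Setup (graph_demo is a hypothetical file name used only in this sketch)
use http://www.stata-press.com/data/r13/lutkepohl2, clear
var dln_inv dln_inc dln_consump
irf create ordera, step(8) set(graph_demo, replace)

* One impulse and two responses give two individual graphs;
* with the stub g, the ith graph is stored in memory as gi
irf graph oirf, impulse(dln_inc) response(dln_inv dln_consump)   ///
    individual iname(g, replace)

graph dir        // list the graphs currently in memory
```

Replacing iname(g, replace) with isaving(g, replace) would instead be expected to write the individual graphs to .gph files in the current working directory, as described above.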


Plots

plot1opts(cline_options), ..., plot4opts(cline_options) affect the rendition of the plotted statistics (the stat). plot1opts() affects the rendition of the first statistic; plot2opts(), the second;
and so on. cline_options are as described in [G-3] cline_options.

CI plots

ci1opts(area_options) and ci2opts(area_options) affect the rendition of the confidence intervals
for the first (ci1opts()) and second (ci2opts()) statistics in stat. area_options are as described
in [G-3] area_options.

Y axis, X axis, Titles, Legend, Overall

twoway_options are any of the options documented in [G-3] twoway_options, excluding by(). These
include options for titling the graph (see [G-3] title_options) and for saving the graph to disk (see
[G-3] saving_option). Note that the saving() and name() options may not be combined with the
individual option.
byopts(by_option) is as documented in [G-3] by_option and may not be specified when individual
is specified. byopts() affects how the subgraphs are combined, labeled, etc.
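
The options above can be combined in a single call. The following sketch is illustrative only: graph_demo is a hypothetical file name, and the particular rendition choices (a dashed line, a gray band, rescaled y axes) are arbitrary picks from the cline_options, area_options, and by_option families rather than recommended settings.

```stata
* Setup (graph_demo is a hypothetical file name used only in this sketch)
use http://www.stata-press.com/data/r13/lutkepohl2, clear
var dln_inv dln_inc dln_consump
irf create ordera, step(8) set(graph_demo, replace)

* One statistic with a 90% band, steps 1 through 6, and custom rendition;
* byopts(yrescale) lets each subgraph have its own y-axis scale
irf graph oirf, impulse(dln_inc) response(dln_inv dln_consump)   ///
    lstep(1) ustep(6) level(90)                                  ///
    plot1opts(lpattern(dash)) ci1opts(color(gs13))               ///
    byopts(yrescale)

* With noci, more than two statistics may share a graph
irf graph irf oirf cirf, impulse(dln_inc) response(dln_consump) noci
```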

Remarks and examples


If you have not read [TS] irf, please do so.
Also see [TS] irf cgraph, which produces combined graphs; [TS] irf ograph, which produces
overlaid graphs; and [TS] irf table, which displays results in tabular form.
irf graph produces one or more graphs and displays them arrayed into one image unless the
individual option is specified, in which case the individual graphs are displayed separately. Each
individual graph consists of all the specified stat and represents one impulse-response combination.
Because all the specified stat appear on the same graph, putting together statistics with very
different scales is not recommended. For instance, sometimes sirf and oirf are on similar scales
while irf is on a different scale. In such cases, combining sirf and oirf on the same graph looks
fine, but combining either with irf produces an uninformative graph.

Example 1
Suppose that we have results generated from two different SVAR models. We want to know whether
the shapes of the structural IRFs and the structural FEVDs are similar in the two models. We are also
interested in knowing whether the structural IRFs and the structural FEVDs differ significantly from
their Cholesky counterparts.
Filling in the background, we have previously issued the commands
. use http://www.stata-press.com/data/r13/lutkepohl2
. mat a = (., 0, 0\0,.,0\.,.,.)
. mat b = I(3)
. svar dln_inv dln_inc dln_consump, aeq(a) beq(b)
. irf create modela, set(results3) step(8)
. svar dln_inc dln_inv dln_consump, aeq(a) beq(b)
. irf create modelb, step(8)

To see whether the shapes of the structural IRFs and the structural FEVDs are similar in the two
models, we type



. irf graph oirf sirf, impulse(dln_inc) response(dln_consump)

(graph omitted; panels: "modela, dln_inc, dln_consump" and "modelb, dln_inc, dln_consump"; x axis: step;
legend: 95% CI for oirf, 95% CI for sirf, orthogonalized irf, structural irf;
note: Graphs by irfname, impulse variable, and response variable)

The graph reveals that the oirf and the sirf estimates are essentially the same for both models and
that the shapes of the functions are very similar for the two models.
To see whether the structural IRFs and the structural FEVDs differ significantly from their Cholesky
counterparts, we type
. irf graph fevd sfevd, impulse(dln_inc) response(dln_consump) lstep(1)
> legend(cols(1))

(graph omitted; panels: "modela, dln_inc, dln_consump" and "modelb, dln_inc, dln_consump"; x axis: step;
legend: 95% CI for fevd, 95% CI for sfevd, fraction of mse due to impulse,
(structural) fraction of mse due to impulse;
note: Graphs by irfname, impulse variable, and response variable)

This combined graph reveals that the shapes of these functions are also similar for the two models.
However, the graph illuminates one minor difference between them: In modela, the estimated structural
FEVD is slightly larger than the Cholesky-based estimates, whereas in modelb the Cholesky-based
estimates are slightly larger than the structural estimates. For both models, however, the structural
estimates are close to the center of the wide confidence intervals for the two estimates.

Example 2
Let's focus on the results from modela. Suppose that we were interested in examining how
dln_consump responded to impulses in its own structural innovations, structural innovations to
dln_inc, and structural innovations to dln_inv. We type
. irf graph sirf, irf(modela) response(dln_consump)

(graph omitted; panels: "modela, dln_consump, dln_consump", "modela, dln_inc, dln_consump", and
"modela, dln_inv, dln_consump"; x axis: step; legend: 95% CI, structural irf;
note: Graphs by irfname, impulse variable, and response variable)

The upper-left graph shows the structural IRF of an innovation in dln_consump on dln_consump. It
indicates that the identification restrictions used in modela imply that a positive shock to dln_consump
causes an increase in dln_consump, followed by a decrease, followed by an increase, and so on,
until the effect dies out after roughly 5 periods.
The upper-right graph shows the structural IRF of an innovation in dln_inc on dln_consump,
indicating that a positive shock to dln_inc causes an increase in dln_consump, which dies out after
4 or 5 periods.

Technical note
[TS] irf table contains a technical note warning you to be careful in naming variables when you
fit models. What is said there applies equally here.


Stored results
irf graph stores the following in r():
Scalars
    r(k)             number of graphs
Macros
    r(stats)         statlist
    r(irfname)       resultslist
    r(impulse)       impulselist
    r(response)      responselist
    r(plot#)         contents of plot#opts()
    r(ci)            level applied to confidence intervals or noci
    r(ciopts#)       contents of ci#opts()
    r(byopts)        contents of byopts()
    r(saving)        supplied saving() option
    r(name)          supplied name() option
    r(individual)    individual or blank
    r(isaving)       contents of saving()
    r(iname)         contents of name()
    r(subtitle#)     subtitle for individual graph #

Also see
[TS] irf Create and analyze IRFs, dynamic-multiplier functions, and FEVDs
[TS] var intro Introduction to vector autoregressive models
[TS] vec intro Introduction to vector error-correction models

Title
irf ograph Overlaid graphs of IRFs, dynamic-multiplier functions, and FEVDs
Syntax    Menu    Description    Options    Remarks and examples    Stored results    Also see

Syntax
irf ograph (spec_1) [(spec_2) ... [(spec_15)]] [, options]

where (spec_k) is

(irfname impulsevar responsevar stat [, spec_options])

irfname is the name of a set of IRF results in the active IRF file or ., which means the first named
result in the active IRF file. impulsevar should be specified as an endogenous variable for all statistics
except dm and cdm; for those, specify as an exogenous variable. responsevar is an endogenous variable
name. stat is one or more statistics from the list below:
stat        Description

irf         impulse-response function
oirf        orthogonalized impulse-response function
dm          dynamic-multiplier function
cirf        cumulative impulse-response function
coirf       cumulative orthogonalized impulse-response function
cdm         cumulative dynamic-multiplier function
fevd        Cholesky forecast-error variance decomposition
sirf        structural impulse-response function
sfevd       structural forecast-error variance decomposition

options                    Description

Plots
  plot_options             define the IRF plots
  set(filename)            make filename active

Options
  common_options           level and steps

Y axis, X axis, Titles, Legend, Overall
  twoway_options           any options other than by() documented in [G-3] twoway_options

plot_options               Description

Main
  set(filename)            make filename active
  irf(irfnames)            use irfnames IRF result sets
  impulse(impulsevar)      use impulsevar as impulse variables
  response(endogvars)      use endogenous variables as response variables
  ci                       add confidence bands to the graph

spec_options               Description

Options
  common_options           level and steps

Plot
  cline_options            affect rendition of the plotted lines

CI plot
  ciopts(area_options)     affect rendition of the confidence intervals

common_options             Description

Options
  level(#)                 set confidence level; default is level(95)
  lstep(#)                 use # for first step
  ustep(#)                 use # for maximum step

common_options may be specified within a plot specification, globally, or in both. When specified in a plot
specification, the common_options affect only the specification in which they are used. When supplied globally,
the common_options affect all plot specifications. When supplied in both places, options in the plot specification
take precedence.

Menu
Statistics > Multivariate time series > IRF and FEVD analysis > Overlaid graph

Description
irf ograph displays plots of irf results on one graph (one pair of axes).
To become familiar with this command, type db irf ograph.

Options


Plots

plot_options define the IRF plots and are found under the Main, Plot, and CI plot tabs.
set(filename) specifies the file to be made active; see [TS] irf set. If set() is not specified, the
active file is used.


Main

set(filename) specifies the file to be made active; see [TS] irf set. If set() is not specified, the
active file is used.
irf(irfnames) specifies the IRF result sets to be used. If irf() is not specified, each of the results in
the active IRF file is used. (Files often contain just one set of IRF results saved under one irfname;
in that case, those results are used.)
impulse(varlist) and response(endogvars) specify the impulse and response variables. Usually
one of each is specified, and one graph is drawn. If multiple variables are specified, a separate
subgraph is drawn for each impulse-response combination. If impulse() and response() are
not specified, subgraphs are drawn for all combinations of impulse and response variables.
ci adds confidence bands to the graph. The noci option may be used within a plot specification to
suppress its confidence bands when the ci option is supplied globally.

Plot

cline options affect the rendition of the plotted lines; see [G-3] cline options.

CI plot

ciopts(area options) affects the rendition of the confidence bands for the plotted statistic; see
[G-3] area options. ciopts() implies ci.

Options

level(#) specifies the confidence level, as a percentage, for confidence bands; see [U] 20.7 Specifying
the width of confidence intervals.
lstep(#) specifies the first step, or period, to be included in the graph. lstep(0) is the default.
ustep(#), # ≥ 1, specifies the maximum step, or period, to be included.

Y axis, X axis, Titles, Legend, Overall

twoway options are any of the options documented in [G-3] twoway options, excluding by(). These
include options for titling the graph (see [G-3] title options) and for saving the graph to disk (see
[G-3] saving option).
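
The precedence between global and per-specification options can be sketched as follows. This is a minimal illustration; ograph_demo is a hypothetical file name, and the model mirrors the one used in example 1 below.

```stata
* Setup (ograph_demo is a hypothetical file name used only in this sketch)
use http://www.stata-press.com/data/r13/lutkepohl2, clear
var dln_inv dln_inc dln_consump if qtr<=tq(1978q4), lags(1/2) dfk
irf create order1, step(10) set(ograph_demo, replace)
irf create order2, step(10) order(dln_inc dln_inv dln_consump)

* ci and ustep(6) are global; level(90) applies only to the first plot;
* noci suppresses the band for the second plot despite the global ci
irf ograph (order1 dln_inc dln_consump oirf, level(90))   ///
           (order2 dln_inc dln_consump oirf, noci),       ///
           ci ustep(6)
```

Under these assumptions, the graph should show a 90% band around the order1 line only, with both lines ending at step 6.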

Remarks and examples


If you have not read [TS] irf, please do so.
irf ograph overlays plots of IRFs and FEVDs on one graph.

Example 1
We have previously issued the commands
. use http://www.stata-press.com/data/r13/lutkepohl2
. var dln_inv dln_inc dln_consump if qtr<=tq(1978q4), lags(1/2) dfk
. irf create order1, step(10) set(myirf1, new)
. irf create order2, step(10) order(dln_inc dln_inv dln_consump)


We now wish to compare the oirf for impulse dln_inc and response dln_consump for two different
Cholesky orderings:

. irf ograph (order1 dln_inc dln_consump oirf)
>            (order2 dln_inc dln_consump oirf)

(graph omitted; x axis: step, 0 to 10;
legend: order1: oirf of dln_inc -> dln_consump, order2: oirf of dln_inc -> dln_consump)

Technical note
Graph options allow you to change the appearance of each plot. The following graph contains the
plots of the forecast-error variance decompositions (FEVDs) for impulse dln_inc and each response
using the results from the first collection of results in the active IRF file (using the . shortcut).
In the second plot, we supply the clpat(dash) option (an abbreviation for clpattern(dash)) to give
the line a dashed pattern. In the third plot, we supply the m(o) clpat(dash_dot) recast(connected)
options to get small circles connected by a line with a dash-dot pattern; the cilines option plots the
confidence bands by using lines instead of areas. We use the title() option to add a descriptive title
to the graph and supply the ci option globally to add confidence bands to all the plots.


. irf ograph (. dln_inc dln_inc fevd)
>            (. dln_inc dln_consump fevd, clpat(dash))
>            (. dln_inc dln_inv fevd, cilines m(o) clpat(dash_dot)
>             recast(connected))
>            , ci title("Comparison of forecast-error variance decomposition")

(graph omitted; title: Comparison of forecast-error variance decomposition; x axis: step, 0 to 10;
legend: 95% CI of fevd of dln_inc -> dln_inc, 95% CI of fevd of dln_inc -> dln_consump,
95% CI of fevd of dln_inc -> dln_inv, fevd of dln_inc -> dln_inc,
fevd of dln_inc -> dln_consump, fevd of dln_inc -> dln_inv)

The clpattern() option is described in [G-3] connect_options, msymbol() is described in
[G-3] marker_options, title() is described in [G-3] title_options, and recast() is described
in [G-3] advanced_options.

Stored results
irf ograph stores the following in r():
Scalars
    r(plots)        number of plot specifications
    r(ciplots)      number of plotted confidence bands
Macros
    r(irfname#)     irfname from (spec#)
    r(impulse#)     impulse from (spec#)
    r(response#)    response from (spec#)
    r(stat#)        statistics from (spec#)
    r(ci#)          level from (spec#) or noci
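
These results might be inspected right after drawing an overlaid graph, as in the sketch below. The file name ograph_demo is a hypothetical name used only here; the r() names are those listed above.

```stata
* Setup (ograph_demo is a hypothetical file name used only in this sketch)
use http://www.stata-press.com/data/r13/lutkepohl2, clear
var dln_inv dln_inc dln_consump if qtr<=tq(1978q4), lags(1/2) dfk
irf create order1, step(10) set(ograph_demo, replace)
irf create order2, step(10) order(dln_inc dln_inv dln_consump)

irf ograph (order1 dln_inc dln_consump oirf)          ///
           (order2 dln_inc dln_consump oirf, noci), ci

return list                                   // everything left in r()
display "plot specifications: " r(plots)      // scalar result
display "band setting for spec 2: `r(ci2)'"   // macro result
```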

Also see
[TS] irf Create and analyze IRFs, dynamic-multiplier functions, and FEVDs
[TS] var intro Introduction to vector autoregressive models
[TS] vec intro Introduction to vector error-correction models

Title
irf rename Rename an IRF result in an IRF file
Syntax    Menu    Description    Option    Remarks and examples    Stored results    Also see

Syntax
irf rename oldname newname [, set(filename)]

Menu
Statistics > Multivariate time series > Manage IRF results and files > Rename IRF results

Description
irf rename changes the name of a set of IRF results saved in the active IRF file.

Option
set(filename) specifies the file to be made active; see [TS] irf set. If set() is not specified, the
active file is used.

Remarks and examples


If you have not read [TS] irf, please do so.

Example 1
. use http://www.stata-press.com/data/r13/lutkepohl2
(Quarterly SA West German macro data, Bil DM, from Lutkepohl 1993 Table E.1)
. var dln_inv dln_inc dln_consump if qtr<=tq(1978q4), lags(1/2) dfk
(output omitted )

We create three sets of IRF results:


. irf create original, set(myirfs, replace)
(file myirfs.irf created)
(file myirfs.irf now active)
(file myirfs.irf updated)
. irf create order2, order(dln_inc dln_inv dln_consump)
(file myirfs.irf updated)
. irf create order3, order(dln_inc dln_consump dln_inv)
(file myirfs.irf updated)



. irf describe
Contains irf results from myirfs.irf (dated 4 Apr 2013 13:06)

    irfname     model    endogenous variables and order (*)
    original    var      dln_inv dln_inc dln_consump
    order2      var      dln_inc dln_inv dln_consump
    order3      var      dln_inc dln_consump dln_inv

(*) order is relevant only when model is var

Now let's rename IRF result original to order1.


. irf rename original order1
(81 real changes made)
original renamed to order1
. irf describe
Contains irf results from myirfs.irf (dated 4 Apr 2013 13:06)

    irfname     model   endogenous variables and order (*)
    order1      var     dln_inv dln_inc dln_consump
    order2      var     dln_inc dln_inv dln_consump
    order3      var     dln_inc dln_consump dln_inv

(*) order is relevant only when model is var

original has been renamed to order1.

Stored results
irf rename stores the following in r():
Macros
    r(irfnames)     irfnames after rename
    r(oldnew)       oldname newname
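
The stored results can be inspected directly after a rename. The following do-file sketch repeats the setup from Example 1 and then lists r(); the file name myirfs and the result names are the ones used in the example, and no output is shown because the exact display is not reproduced here.

```stata
* Fit the model and save one set of IRF results, as in Example 1
use http://www.stata-press.com/data/r13/lutkepohl2, clear
var dln_inv dln_inc dln_consump if qtr<=tq(1978q4), lags(1/2) dfk
irf create original, set(myirfs, replace)

* Rename the result set in the active file
irf rename original order1

* List everything irf rename left in r(): r(irfnames) and r(oldnew)
return list
```

Because set() makes a file active before the rename is carried out, the same rename can be applied to a file that is not currently active by adding set(filename) to the irf rename command.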

Also see
[TS] irf Create and analyze IRFs, dynamic-multiplier functions, and FEVDs
[TS] var intro Introduction to vector autoregressive models
[TS] vec intro Introduction to vector error-correction models


Title
irf set Set the active IRF file
Syntax    Menu    Description    Options
Remarks and examples    Stored results    Also see

Syntax
Report identity of active file
    irf set
Set, and if necessary create, active file
    irf set irf_filename
Create, and if necessary replace, active file
    irf set irf_filename, replace
Clear any active IRF file
    irf set, clear

Menu
Statistics > Multivariate time series > Manage IRF results and files > Set active IRF file

Description
In the first syntax, irf set reports the identity of the active file, if there is one. Also see [TS] irf
describe for obtaining reports on the contents of an IRF file.
In the second syntax, irf set irf_filename specifies that the file be set as the active file and, if
the file does not exist, that it be created as well.
In the third syntax, irf set irf_filename, replace specifies that even if file irf_filename exists,
a new, empty file is to be created and set.
In the rarely used fourth syntax, irf set, clear specifies that, if any IRF file is set, it be unset
and that there be no active IRF file.
IRF files are just files: they can be erased by erase, listed by dir, and copied by copy; see
[D] erase, [D] dir, and [D] copy.

If irf_filename is specified without an extension, .irf is assumed.


Options
replace specifies that if irf_filename already exists, the file is to be erased and a new, empty IRF
file is to be created in its place. If it does not already exist, a new, empty file is created.
clear unsets the active IRF file.

Remarks and examples


If you have not read [TS] irf, please do so.
irf set reports the identity of the active IRF file:
. irf set
no irf file active

irf set irf_filename creates and sets an IRF file:


. irf set results1
(file results1.irf now active)

We specified the name results1, and results1.irf became the active file. The suffix .irf was
added for us.
irf set irf_filename can also be used to create a new file:
. use http://www.stata-press.com/data/r13/lutkepohl2
(Quarterly SA West German macro data, Bil DM, from Lutkepohl 1993 Table E.1)
. var dln_inc dln_consump, exog(l.dln_inv)
(output omitted )
. irf set results2
(file results2.irf created)
(file results2.irf now active)
. irf create order1
(file results2.irf updated)
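
The four syntaxes can be run back to back to see how the active file changes. This is a do-file sketch only; results1 is an example file name, and the messages that each command prints are not reproduced.

```stata
* Second syntax: make results1.irf active, creating it if necessary
irf set results1

* First syntax: report which file is active
irf set

* Third syntax: discard the contents and start again with an empty file
irf set results1, replace

* Fourth syntax: leave no IRF file active
irf set, clear
irf set

* An IRF file is an ordinary file, so it can be listed like any other
dir results1.irf
```

Clearing the active file does not erase it from disk; use erase for that.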

Stored results
irf set stores the following in r():
Macros
    r(Orville)      name of active IRF file, if there is an active IRF

Also see
[TS] irf Create and analyze IRFs, dynamic-multiplier functions, and FEVDs
[TS] var intro Introduction to vector autoregressive models
[TS] vec intro Introduction to vector error-correction models

Title
irf table Tables of IRFs, dynamic-multiplier functions, and FEVDs
Syntax    Menu    Description    Options
Remarks and examples    Stored results    Also see

Syntax
irf table [stat] [, options]

stat        Description
Main
    irf       impulse-response function
    oirf      orthogonalized impulse-response function
    dm        dynamic-multiplier function
    cirf      cumulative impulse-response function
    coirf     cumulative orthogonalized impulse-response function
    cdm       cumulative dynamic-multiplier function
    fevd      Cholesky forecast-error variance decomposition
    sirf      structural impulse-response function
    sfevd     structural forecast-error variance decomposition

If stat is not specified, all statistics are included, unless option nostructural is also specified, in which case
sirf and sfevd are excluded. You may specify more than one stat.

options                   Description
Main
    set(filename)           make filename active
    irf(irfnames)           use irfnames IRF result sets
    impulse(impulsevar)     use impulsevar as impulse variables
    response(endogvars)     use endogenous variables as response variables
    individual              make an individual table for each result set
    title("text")           use text for overall table title
Options
    level(#)                set confidence level; default is level(95)
    noci                    suppress confidence intervals
    stderror                include standard errors in the tables
    nostructural            suppress sirf and sfevd from the default list of statistics
    step(#)                 use common maximum step horizon # for all tables


Menu
Statistics > Multivariate time series > IRF and FEVD analysis > Tables by impulse or response

Description
irf table makes a table from the specified IRF results.
The rows of the tables are the time since impulse. Each column represents a combination of
impulse() variable and response() variable for a stat from the irf() results.

Options


Main

set(filename) specifies the file to be made active; see [TS] irf set. If set() is not specified, the
active file is used.
All results are obtained from one IRF file. If you have results in different files that you want in
one table, use irf add to copy results into one file; see [TS] irf add.
irf(irfnames) specifies the IRF result sets to be used. If irf() is not specified, all the results in the
active IRF file are used. (Files often contain just one set of IRF results, saved under one irfname;
in that case, those results are used. When there are multiple IRF results, you may also wish to
specify the individual option.)
impulse(impulsevar) specifies the impulse variables for which the statistics are to be reported. If
impulse() is not specified, each model variable, in turn, is used. impulsevar should be specified
as an endogenous variable for all statistics except dm or cdm; for those, specify as an exogenous
variable.
response(endogvars) specifies the response variables for which the statistics are to be reported. If
response() is not specified, each endogenous variable, in turn, is used.
individual specifies that each set of IRF results be placed in its own table, with its own title and
footer. By default, irf table places all the IRF results in one table with one title and one footer.
individual may not be combined with title().
title("text") specifies a title for the overall table.

Options

level(#) specifies the default confidence level, as a percentage, for confidence intervals, when they
are reported. The default is level(95) or as set by set level; see [U] 20.7 Specifying the
width of confidence intervals.
noci suppresses reporting of the confidence intervals for each statistic. noci is assumed when the
model was fit by vec because no confidence intervals were estimated.
stderror specifies that standard errors for each statistic also be included in the table.
nostructural specifies that stat, when not specified, exclude sirf and sfevd.
step(#) specifies the maximum step horizon for all tables. If step() is not specified, each table is
constructed using all steps available.
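
As a rough illustration of how these options combine, the do-file sketch below builds two result sets from the lutkepohl2 data and then requests tables with irf(), step(), stderror, level(), and individual. The file name results4 and the result names ordera and orderb are example names; output is omitted.

```stata
* Fit a VAR and save IRF results under two orderings
use http://www.stata-press.com/data/r13/lutkepohl2, clear
var dln_inv dln_inc dln_consump
irf set results4, replace
irf create ordera, step(8)
irf create orderb, order(dln_inc dln_inv dln_consump) step(8)

* One result set only, steps 0 through 4, with standard errors added
irf table oirf, irf(ordera) impulse(dln_inc) response(dln_consump) step(4) stderror

* Both result sets, 90% intervals, one table per result set
* (title() is left out because it may not be combined with individual)
irf table fevd, impulse(dln_inc) response(dln_consump) level(90) individual
```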


Remarks and examples


If you have not read [TS] irf, please do so.
Also see [TS] irf graph, which produces output in graphical form, and see [TS] irf ctable, which
also produces tabular output. irf ctable is more difficult to use but provides more control over
how tables are formed.

Example 1
We have fit a model with var, and we saved the IRFs from two different orderings. The commands
we previously used were
. use http://www.stata-press.com/data/r13/lutkepohl2
. var dln_inv dln_inc dln_consump
. irf set results4
. irf create ordera, step(8)
. irf create orderb, order(dln_inc dln_inv dln_consump) step(8)

We now wish to compare the two orderings:


. irf table oirf fevd, impulse(dln_inc) response(dln_consump) noci std
> title("Ordera versus orderb")

                Ordera versus orderb

            (1)         (1)         (1)         (1)
    step    oirf        S.E.        fevd        S.E.
    0       .005123     .000878     0           0
    1       .001635     .000984     .288494     .077483
    2       .002948     .000993     .294288     .073722
    3       -.000221    .000662     .322454     .075562
    4       .000811     .000586     .319227     .074063
    5       .000462     .000333     .322579     .075019
    6       .000044     .000275     .323552     .075371
    7       .000151     .000162     .323383     .075314
    8       .000091     .000114     .323499     .075386

            (2)         (2)         (2)         (2)
    step    oirf        S.E.        fevd        S.E.
    0       .005461     .000925     0           0
    1       .001578     .000988     .327807     .08159
    2       .003307     .001042     .328795     .077519
    3       -.00019     .000676     .370775     .080604
    4       .000846     .000617     .366896     .079019
    5       .000491     .000349     .370399     .079941
    6       .000069     .000292     .371487     .080323
    7       .000158     .000172     .371315     .080287
    8       .000096     .000122     .371438     .080366

(1) irfname = ordera, impulse = dln_inc, and response = dln_consump
(2) irfname = orderb, impulse = dln_inc, and response = dln_consump

The output is displayed as a single table; because the table did not fit horizontally, it wrapped
automatically. At the bottom of the table is a definition of the keys that appear at the top of each
column. The results in the table above indicate that the orthogonalized IRFs do not change by much.


Example 2
Because the estimated FEVDs differ more noticeably between the two orderings, we might want to
produce two tables that contain the estimated FEVDs and their 95% confidence intervals:
. irf table fevd, impulse(dln_inc) response(dln_consump) individual

Results from ordera

            (1)         (1)         (1)
    step    fevd        Lower       Upper
    0       0           0           0
    1       .288494     .13663      .440357
    2       .294288     .149797     .43878
    3       .322454     .174356     .470552
    4       .319227     .174066     .464389
    5       .322579     .175544     .469613
    6       .323552     .175826     .471277
    7       .323383     .17577      .470995
    8       .323499     .175744     .471253

95% lower and upper bounds reported
(1) irfname = ordera, impulse = dln_inc, and response = dln_consump

Results from orderb

            (1)         (1)         (1)
    step    fevd        Lower       Upper
    0       0           0           0
    1       .327807     .167893     .487721
    2       .328795     .17686      .48073
    3       .370775     .212794     .528757
    4       .366896     .212022     .52177
    5       .370399     .213718     .52708
    6       .371487     .214058     .528917
    7       .371315     .213956     .528674
    8       .371438     .213923     .528953

95% lower and upper bounds reported
(1) irfname = orderb, impulse = dln_inc, and response = dln_consump

Because we specified the individual option, the output contains two tables, one for each set of
IRF results. Examining the results in the tables indicates that each of the estimated functions is well
within the confidence interval of the other, so we conclude that the functions are not significantly
different.

Technical note
Be careful in how you name variables when you fit models. Say that you fit one model with var
and used time-series operators to form one of the endogenous variables
. var d.ln_inv ...
and in another model, you created a new variable:
. gen dln_inv = d.ln_inv
. var dln_inv ...


Say that you saved IRF results from both (perhaps they differ in the number of lags). Now you
wish to use irf table to compare them. You would not be able to specify response(d.ln_inv)
or response(dln_inv) because neither variable is in both models. Similarly, you could not specify
impulse(d.ln_inv) or impulse(dln_inv) for the same reason.
All is not lost; if impulse() is not specified, all endogenous variables are used, and similarly if
response() is not specified, so you could obtain the result you desired by simply not specifying
the options, but you would also obtain a lot more besides. If you want to specify the impulse() or
response() options, be sure to name variables consistently.
Also, you may forget how the endogenous variables were named. If so, irf describe, detail
can provide the answer. In irf describe's output, the endogenous variables are listed next to
endog.
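
One way to follow this advice is sketched below: two models that differ only in lag length are fit with identically named endogenous variables, so impulse() and response() can name a variable that exists in both result sets. The file name compare and the result names lags2 and lags3 are hypothetical, chosen only for this illustration.

```stata
use http://www.stata-press.com/data/r13/lutkepohl2, clear

* Same variable names in both models; only the lag length differs
var dln_inv dln_inc dln_consump, lags(1/2)
irf create lags2, set(compare, replace)
var dln_inv dln_inc dln_consump, lags(1/3)
irf create lags3

* dln_inv and dln_consump appear in both result sets, so this comparison works
irf table oirf, impulse(dln_inv) response(dln_consump) noci

* If the names have been forgotten, look them up in the stored results
irf describe, detail
```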

Stored results
If the individual option is not specified, irf table stores the following in r():
Scalars
    r(ncols)        number of columns in table
    r(k_umax)       number of distinct keys
    r(k)            number of specific table commands
Macros
    r(key#)         #th key
    r(tnotes)       list of keys applied to each column

If the individual option is specified, then for each irfname, irf table stores the following in
r():
Scalars
    r(irfname_ncols)     number of columns in table for irfname
    r(irfname_k_umax)    number of distinct keys in table for irfname
    r(irfname_k)         number of specific table commands used to create table for irfname
Macros
    r(irfname_key#)      #th key for irfname table
    r(irfname_tnotes)    list of keys applied to each column in table for irfname

Also see
[TS] irf Create and analyze IRFs, dynamic-multiplier functions, and FEVDs
[TS] var intro Introduction to vector autoregressive models
[TS] vec intro Introduction to vector error-correction models

Title
mgarch Multivariate GARCH models

Syntax    Description    Remarks and examples    References    Also see

Syntax
mgarch model eq [eq ... eq] [if] [in] [, ...]

Family                                   model
Vech
    Diagonal vech                        dvech
Conditional correlation
    constant conditional correlation     ccc
    dynamic conditional correlation      dcc
    varying conditional correlation      vcc

See [TS] mgarch dvech, [TS] mgarch ccc, [TS] mgarch dcc, and [TS] mgarch vcc for details.

Description
mgarch estimates the parameters of multivariate generalized autoregressive conditional-heteroskedasticity
(MGARCH) models. MGARCH models allow both the conditional mean and the
conditional covariance to be dynamic.
The general MGARCH model is so flexible that not all the parameters can be estimated. For this
reason, there are many MGARCH models that parameterize the problem more parsimoniously.
mgarch implements four commonly used parameterizations: the diagonal vech model, the constant
conditional correlation model, the dynamic conditional correlation model, and the time-varying
conditional correlation model.

Remarks and examples


Remarks are presented under the following headings:
An introduction to MGARCH models
Diagonal vech MGARCH models
Conditional correlation MGARCH models
Constant conditional correlation MGARCH model
Dynamic conditional correlation MGARCH model
Varying conditional correlation MGARCH model
Error distributions and quasi-maximum likelihood
Treatment of missing data


An introduction to MGARCH models


Multivariate GARCH models allow the conditional covariance matrix of the dependent variables to
follow a flexible dynamic structure and allow the conditional mean to follow a vector-autoregressive
(VAR) structure.
The general MGARCH model is too flexible for most problems. There are many restricted MGARCH
models in the literature because there is no parameterization that always provides an optimal trade-off
between flexibility and parsimony.
mgarch implements four commonly used parameterizations: the diagonal vech (DVECH) model,
the constant conditional correlation (CCC) model, the dynamic conditional correlation (DCC) model,
and the time-varying conditional correlation (VCC) model.
Bollerslev, Engle, and Wooldridge (1988); Bollerslev, Engle, and Nelson (1994); Bauwens,
Laurent, and Rombouts (2006); Silvennoinen and Teräsvirta (2009); and Engle (2009) provide general
introductions to MGARCH models. We provide a quick introduction organized around the models
implemented in mgarch.
We give a formal definition of the general MGARCH model to establish notation that facilitates
comparisons of the models. The general MGARCH model is given by

\[ y_t = C x_t + \epsilon_t \]
\[ \epsilon_t = H_t^{1/2} \nu_t \]

where
$y_t$ is an $m \times 1$ vector of dependent variables;
$C$ is an $m \times k$ matrix of parameters;
$x_t$ is a $k \times 1$ vector of independent variables, which may contain lags of $y_t$;
$H_t^{1/2}$ is the Cholesky factor of the time-varying conditional covariance matrix $H_t$; and
$\nu_t$ is an $m \times 1$ vector of zero-mean, unit-variance, and independent and identically distributed
innovations.
In the general MGARCH model, $H_t$ is a matrix generalization of univariate GARCH models. For
example, in a general MGARCH model with one autoregressive conditional heteroskedastic (ARCH)
term and one GARCH term,

\[ \mathrm{vech}(H_t) = s + A\,\mathrm{vech}(\epsilon_{t-1}\epsilon_{t-1}') + B\,\mathrm{vech}(H_{t-1}) \tag{1} \]

where the vech() function stacks the unique elements that lie on or below the main diagonal in a
symmetric matrix into a vector, $s$ is a vector of parameters, and $A$ and $B$ are conformable matrices
of parameters. Because this model uses the vech() function to extract and model the unique elements
of $H_t$, it is also known as the VECH model.
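
To make the stacking concrete, consider the case of two series ($m = 2$), writing $h_{ij,t}$ for element $(i,j)$ of $H_t$. The display below is only an illustration of what vech() does to a symmetric $2 \times 2$ matrix.

```latex
\[
H_t = \begin{pmatrix} h_{11,t} & h_{21,t} \\ h_{21,t} & h_{22,t} \end{pmatrix},
\qquad
\mathrm{vech}(H_t) = \begin{pmatrix} h_{11,t} \\ h_{21,t} \\ h_{22,t} \end{pmatrix}
\]
```

In general, a symmetric $m \times m$ matrix has $m(m+1)/2$ unique elements, so in (1) the vector $s$ has $m(m+1)/2$ elements and $A$ and $B$ are each $m(m+1)/2 \times m(m+1)/2$.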
Because it is a conditional covariance matrix, $H_t$ must be positive definite. Equation (1) can be
used to show that the parameters in $s$, $A$, and $B$ are not uniquely identified and that further restrictions
must be placed on $s$, $A$, and $B$ to ensure that $H_t$ is positive definite for all $t$.


The various MGARCH models proposed in the literature differ in how they trade off flexibility
and parsimony in their specifications for $H_t$. Increased flexibility allows a model to capture more
complex $H_t$ processes. Increased parsimony makes parameter estimation feasible for more datasets.
An important measure of the flexibility-parsimony trade-off is how fast the number of model parameters
increases with the number of time series $m$, because many applied models use multiple time series.

Diagonal vech MGARCH models


Bollerslev, Engle, and Wooldridge (1988) derived the diagonal vech (DVECH) model by restricting
$A$ and $B$ to be diagonal. Although the DVECH model is much more parsimonious than the general
model, it can only handle a few series because the number of parameters grows quadratically with
the number of series. For example, there are $3m(m + 1)/2$ parameters in a DVECH(1,1) model for
$H_t$.
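
The growth rate is easy to tabulate. The loop below is a small sketch that evaluates the DVECH(1,1) count 3m(m+1)/2 for a few values of m and, for comparison, the count for an unrestricted model with one ARCH and one GARCH term, taken here to be N + 2N^2 with N = m(m+1)/2 (one vector s of length N plus two full N x N matrices A and B). The local macro names are arbitrary.

```stata
* Parameter counts for the conditional covariance equation
* N = m(m+1)/2 is the number of unique elements of H_t
foreach m of numlist 2 3 5 10 {
    local N     = `m'*(`m'+1)/2
    local dvech = 3*`N'
    local vech  = `N' + 2*`N'^2
    display "m = `m':  DVECH(1,1) = `dvech'   general VECH(1,1) = `vech'"
}
```

For m = 2 the counts are 9 and 21; for m = 5 they are 45 and 465.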
Despite the large number of parameters, the diagonal structure implies that each conditional variance
and each conditional covariance depends on its own past but not on the past of the other conditional
variances and covariances. Formally, in the DVECH(1,1) model each element of $H_t$ is modeled by

\[ h_{ij,t} = s_{ij} + a_{ij}\,\epsilon_{i,(t-1)}\epsilon_{j,(t-1)} + b_{ij}\,h_{ij,(t-1)} \]

Parameter estimation can be difficult because it requires that $H_t$ be positive definite for each
$t$. The requirement that $H_t$ be positive definite for each $t$ imposes complicated restrictions on the
off-diagonal elements.
See [TS] mgarch dvech for more details about this model.

Conditional correlation MGARCH models


Conditional correlation (CC) models use nonlinear combinations of univariate GARCH models to
represent the conditional covariances. In each of the conditional correlation models, the conditional
covariance matrix is positive definite by construction and has a simple structure, which facilitates
parameter estimation. CC models have a slower parameter growth rate than DVECH models as the
number of time series increases.
In CC models, $H_t$ is decomposed into a matrix of conditional correlations $R_t$ and a diagonal
matrix of conditional variances $D_t$:

\[ H_t = D_t^{1/2} R_t D_t^{1/2} \tag{2} \]

where each conditional variance follows a univariate GARCH process and the parameterizations of $R_t$
vary across models.
Equation (2) implies that

\[ h_{ij,t} = \rho_{ij,t}\,\sigma_{i,t}\,\sigma_{j,t} \tag{3} \]

where $\sigma_{i,t}^2$ is modeled by a univariate GARCH process. Equation (3) highlights that CC models use
nonlinear combinations of univariate GARCH models to represent the conditional covariances and that
the parameters in the model for $\rho_{ij,t}$ describe the extent to which the errors from equations $i$ and $j$
move together.
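
Writing out the two-series case shows how (3) follows from (2). The display is an illustration only, using the fact that a correlation matrix has ones on its diagonal.

```latex
\[
D_t = \begin{pmatrix} \sigma_{1,t}^2 & 0 \\ 0 & \sigma_{2,t}^2 \end{pmatrix},
\qquad
R_t = \begin{pmatrix} 1 & \rho_{12,t} \\ \rho_{12,t} & 1 \end{pmatrix}
\]
\[
H_t = D_t^{1/2} R_t D_t^{1/2}
    = \begin{pmatrix}
        \sigma_{1,t}^2 & \rho_{12,t}\,\sigma_{1,t}\,\sigma_{2,t} \\
        \rho_{12,t}\,\sigma_{1,t}\,\sigma_{2,t} & \sigma_{2,t}^2
      \end{pmatrix}
\]
```

The diagonal elements are the conditional variances themselves, and the single off-diagonal element is the conditional covariance $h_{12,t}$ given by (3).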


Comparing (1) and (2) shows that the number of parameters increases more slowly with the number
of time series in a CC model than in a DVECH model.
The three CC models implemented in mgarch differ in how they parameterize $R_t$.

Constant conditional correlation MGARCH model

Bollerslev (1990) proposed a CC MGARCH model in which the correlation matrix is time invariant.
It is for this reason that the model is known as a constant conditional correlation (CCC) MGARCH
model. Restricting $R_t$ to a constant matrix reduces the number of parameters and simplifies the
estimation but may be too strict in many empirical applications.
See [TS] mgarch ccc for more details about this model.

Dynamic conditional correlation MGARCH model

Engle (2002) introduced a dynamic conditional correlation (DCC) MGARCH model in which the
conditional quasicorrelations $R_t$ follow a GARCH(1,1)-like process. (As described by Engle [2009]
and Aielli [2009], the parameters in $R_t$ are not standardized to be correlations and are thus known
as quasicorrelations.) To preserve parsimony, all the conditional quasicorrelations are restricted to
follow the same dynamics. The DCC model is significantly more flexible than the CCC model without
introducing an unestimable number of parameters for a reasonable number of series.
See [TS] mgarch dcc for more details about this model.

Varying conditional correlation MGARCH model

Tse and Tsui (2002) derived the varying conditional correlation (VCC) MGARCH model in which the
conditional correlations at each period are a weighted sum of a time-invariant component, a measure
of recent correlations among the residuals, and last period's conditional correlations. For parsimony,
all the conditional correlations are restricted to follow the same dynamics.
See [TS] mgarch vcc for more details about this model.

Error distributions and quasi-maximum likelihood


By default, mgarch dvech, mgarch ccc, mgarch dcc, and mgarch vcc estimate the parameters
of MGARCH models by maximum likelihood (ML), assuming that the errors come from a multivariate
normal distribution. Both the ML estimator and the quasi-maximum likelihood (QML) estimator,
which drops the normality assumption, are assumed to be consistent and normally distributed in large
samples; see Jeantheau (1998), Berkes and Horvath (2003), Comte and Lieberman (2003), Ling and
McAleer (2003), and Fiorentini and Sentana (2007). Specify vce(robust) to estimate the parameters
by QML. The QML parameter estimates are the same as the ML estimates, but the VCEs are different.
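
The relationship between the ML and QML results can be checked by fitting the same model with and without vce(robust). The sketch below assumes that the stocks dataset used in the [TS] mgarch ccc examples, with daily return variables toyota and nissan, is available at the URL shown; the dataset, the model specification, and the names ml and qml are assumptions made for illustration, not part of this entry.

```stata
* Assumed dataset: daily stock returns, already tsset
use http://www.stata-press.com/data/r13/stocks, clear

* ML with the default OIM variance estimator
mgarch ccc (toyota nissan = , noconstant), arch(1) garch(1)
estimates store ml

* QML: same point estimates, robust variance estimator
mgarch ccc (toyota nissan = , noconstant), arch(1) garch(1) vce(robust)
estimates store qml

* Coefficients should agree across columns; standard errors should differ
estimates table ml qml, se
```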
Based on low-level assumptions, Jeantheau (1998), Comte and Lieberman (2003), and Ling and
McAleer (2003) prove that some of the ML and QML estimators implemented in mgarch are consistent
and asymptotically normal. Based on higher-level assumptions, Fiorentini and Sentana (2007) prove
that all the ML and QML estimators implemented in mgarch are consistent and asymptotically normal.
The low-level assumption proofs specify the technical restrictions on the data-generating processes
more precisely than the high-level proofs, but they do not cover as many models or cases as the
high-level proofs.


It is generally accepted that there could be more low-level theoretical work done to substantiate
the claims that the ML and QML estimators are consistent and asymptotically normally distributed.
These widely applied estimators have been subjected to many Monte Carlo studies that show that the
large-sample theory performs well in finite samples.
The distribution(t) option causes the mgarch commands to estimate the parameters of the
corresponding model by ML assuming that the errors come from a multivariate Student t distribution.
The choice between the multivariate normal and the multivariate t distributions is one between
robustness and efficiency. If the disturbances come from a multivariate Student t, then the ML
estimates based on the multivariate Student t assumption will be consistent and efficient, while the
QML estimates based on the multivariate normal assumption will be consistent but not efficient. In
contrast, if the disturbances come from a well-behaved distribution that is neither multivariate Student
t nor multivariate normal, then the ML estimates based on the multivariate Student t assumption
will not be consistent, while the QML estimates based on the multivariate normal assumption will be
consistent but not efficient.
Fiorentini and Sentana (2007) compare the ML and QML estimators implemented in mgarch and
provide many useful technical results pertaining to the estimators.

Treatment of missing data


mgarch allows for gaps due to missing data. The unconditional expectations are substituted for
the dynamic components that cannot be computed because of gaps. This method of handling gaps
can only handle the case in which g/T goes to zero as T goes to infinity, where g is the number of
observations lost to gaps in the data and T is the number of nonmissing observations.

References
Aielli, G. P. 2009. Dynamic Conditional Correlations: On Properties and Estimation. Working paper, Dipartimento di
Statistica, University of Florence, Florence, Italy.
Bauwens, L., S. Laurent, and J. V. K. Rombouts. 2006. Multivariate GARCH models: A survey. Journal of Applied
Econometrics 21: 79-109.
Berkes, I., and L. Horváth. 2003. The rate of consistency of the quasi-maximum likelihood estimator. Statistics and
Probability Letters 61: 133-143.
Bollerslev, T. 1990. Modelling the coherence in short-run nominal exchange rates: A multivariate generalized ARCH
model. Review of Economics and Statistics 72: 498-505.
Bollerslev, T., R. F. Engle, and D. B. Nelson. 1994. ARCH models. In Vol. 4 of Handbook of Econometrics, ed.
R. F. Engle and D. L. McFadden. Amsterdam: Elsevier.
Bollerslev, T., R. F. Engle, and J. M. Wooldridge. 1988. A capital asset pricing model with time-varying covariances.
Journal of Political Economy 96: 116-131.
Comte, F., and O. Lieberman. 2003. Asymptotic theory for multivariate GARCH processes. Journal of Multivariate
Analysis 84: 61-84.
Engle, R. F. 2002. Dynamic conditional correlation: A simple class of multivariate generalized autoregressive conditional
heteroskedasticity models. Journal of Business & Economic Statistics 20: 339-350.
Engle, R. F. 2009. Anticipating Correlations: A New Paradigm for Risk Management. Princeton, NJ: Princeton University
Press.
Fiorentini, G., and E. Sentana. 2007. On the efficiency and consistency of likelihood estimation in multivariate
conditionally heteroskedastic dynamic regression models. Working paper 0713, CEMFI, Madrid, Spain.
ftp://ftp.cemfi.es/wp/07/0713.pdf.
Jeantheau, T. 1998. Strong consistency of estimators for multivariate ARCH models. Econometric Theory 14: 70-86.
Ling, S., and M. McAleer. 2003. Asymptotic theory for a vector ARMA-GARCH model. Econometric Theory 19:
280-310.
Silvennoinen, A., and T. Teräsvirta. 2009. Multivariate GARCH models. In Handbook of Financial Time Series, ed.
T. G. Andersen, R. A. Davis, J.-P. Kreiß, and T. Mikosch, 201-229. New York: Springer.
Tse, Y. K., and A. K. C. Tsui. 2002. A multivariate generalized autoregressive conditional heteroscedasticity model
with time-varying correlations. Journal of Business & Economic Statistics 20: 351-362.

Also see
[TS] arch Autoregressive conditional heteroskedasticity (ARCH) family of estimators
[TS] var Vector autoregressive models
[U] 20 Estimation and postestimation commands

Title
mgarch ccc Constant conditional correlation multivariate GARCH models
Syntax    Menu    Description    Options
Remarks and examples    Stored results    Methods and formulas    References
Also see

Syntax
mgarch ccc eq [eq ... eq] [if] [in] [, options]

where each eq has the form
    (depvars = [indepvars] [, eqoptions])

options                     Description
Model
    arch(numlist)             ARCH terms for all equations
    garch(numlist)            GARCH terms for all equations
    het(varlist)              include varlist in the specification of the conditional variance
                              for all equations
    distribution(dist [#])    use dist distribution for errors [may be gaussian
                              (synonym normal) or t; default is gaussian]
    unconcentrated            perform optimization on unconcentrated log likelihood
    constraints(numlist)      apply linear constraints
SE/Robust
    vce(vcetype)              vcetype may be oim or robust
Reporting
    level(#)                  set confidence level; default is level(95)
    nocnsreport               do not display constraints
    display_options           control column formats, row spacing, line width, display of omitted
                              variables and base and empty cells, and factor-variable labeling
Maximization
    maximize_options          control the maximization process; seldom used
    from(matname)             initial values for the coefficients; seldom used

    coeflegend                display legend instead of statistics


eqoptions           Description
    noconstant        suppress constant term in the mean equation
    arch(numlist)     ARCH terms
    garch(numlist)    GARCH terms
    het(varlist)      include varlist in the specification of the conditional variance

You must tsset your data before using mgarch ccc; see [TS] tsset.
indepvars and varlist may contain factor variables; see [U] 11.4.3 Factor variables.
depvars, indepvars, and varlist may contain time-series operators; see [U] 11.4.4 Time-series varlists.
by, fp, rolling, and statsby are allowed; see [U] 11.1.10 Prefix commands.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.

Menu
Statistics > Multivariate time series > Multivariate GARCH

Description
mgarch ccc estimates the parameters of constant conditional correlation (CCC) multivariate generalized autoregressive conditionally heteroskedastic (MGARCH) models in which the conditional variances
are modeled as univariate generalized autoregressive conditionally heteroskedastic (GARCH) models
and the conditional covariances are modeled as nonlinear functions of the conditional variances. The
conditional correlation parameters that weight the nonlinear combinations of the conditional variance
are constant in the CCC MGARCH model.
The CCC MGARCH model is less flexible than the dynamic conditional correlation MGARCH model
(see [TS] mgarch dcc) and varying conditional correlation MGARCH model (see [TS] mgarch vcc),
which specify GARCH-like processes for the conditional correlations. The conditional correlation
MGARCH models are more parsimonious than the diagonal vech MGARCH model (see [TS] mgarch
dvech).

Options


Model

arch(numlist) specifies the ARCH terms for all equations in the model. By default, no ARCH terms
are specified.
garch(numlist) specifies the GARCH terms for all equations in the model. By default, no GARCH
terms are specified.
het(varlist) specifies that varlist be included in the model in the specification of the conditional
variance for all equations. This varlist enters the variance specification collectively as multiplicative
heteroskedasticity.
 
distribution(dist [#]) specifies the assumed distribution for the errors. dist may be gaussian,
normal, or t.
gaussian and normal are synonyms; each causes mgarch ccc to assume that the errors come
from a multivariate normal distribution. # cannot be specified with either of them.


t causes mgarch ccc to assume that the errors follow a multivariate Student t distribution, and
the degree-of-freedom parameter is estimated along with the other parameters of the model. If
distribution(t #) is specified, then mgarch ccc uses a multivariate Student t distribution
with # degrees of freedom. # must be greater than 2.
unconcentrated specifies that optimization be performed on the unconcentrated log likelihood. The
default is to start with the concentrated log likelihood.
constraints(numlist) specifies linear constraints to apply to the parameter estimates.
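
The two forms of distribution(t) can be compared side by side. This sketch assumes that the stocks dataset used in the examples for this entry, with return variables toyota and nissan, is available at the URL shown; the specification and the choice of 8 degrees of freedom are arbitrary illustrations.

```stata
* Assumed dataset: daily stock returns, already tsset
use http://www.stata-press.com/data/r13/stocks, clear

* Student t errors with the degree-of-freedom parameter estimated
mgarch ccc (toyota nissan = , noconstant), arch(1) garch(1) distribution(t)

* Student t errors with the degrees of freedom fixed at 8 (must exceed 2)
mgarch ccc (toyota nissan = , noconstant), arch(1) garch(1) distribution(t 8)
```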

SE/Robust

vce(vcetype) specifies the estimator for the variance-covariance matrix of the estimator.
vce(oim), the default, specifies to use the observed information matrix (OIM) estimator.
vce(robust) specifies to use the Huber/White/sandwich estimator.

Reporting

level(#); see [R] estimation options.


nocnsreport; see [R] estimation options.
display_options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel,
fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and
nolstretch; see [R] estimation options.

Maximization

 
maximize_options: difficult, technique(algorithm_spec), iterate(#), [no]log, trace,
gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#),
nrtolerance(#), nonrtolerance, and from(matname); see [R] maximize for all options except
from(), and see below for information on from(). These options are seldom used.
from(matname) specifies initial values for the coefficients. from(b0) causes mgarch ccc to begin
the optimization algorithm with the values in b0. b0 must be a row vector, and the number of
columns must equal the number of parameters in the model.
The following option is available with mgarch ccc but is not shown in the dialog box:
coeflegend; see [R] estimation options.
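
A minimal sketch of how from() might be used, assuming the stocks dataset from the examples in this entry: the estimates from a first fit are saved and passed back as starting values for a second fit of the same model. The matrix name b0 is arbitrary, and refitting with vce(robust) is chosen only for illustration.

```stata
* Sketch only: reuse the estimates of one fit as starting values for another.
use http://www.stata-press.com/data/r13/stocks, clear

* First fit; e(b) holds a 1 x k row vector of all estimated parameters
quietly mgarch ccc (toyota nissan = , noconstant), arch(1) garch(1)
matrix b0 = e(b)

* Second fit of the same specification, started at b0
mgarch ccc (toyota nissan = , noconstant), arch(1) garch(1) ///
    vce(robust) from(b0)
```

Because the second model has the same parameters as the first, b0 has the required number of columns.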

Eqoptions
noconstant suppresses the constant term in the mean equation.
arch(numlist) specifies the ARCH terms in the equation. By default, no ARCH terms are specified.
This option may not be specified with model-level arch().
garch(numlist) specifies the GARCH terms in the equation. By default, no GARCH terms are specified.
This option may not be specified with model-level garch().
het(varlist) specifies that varlist be included in the specification of the conditional variance. This
varlist enters the variance specification collectively as multiplicative heteroskedasticity. This option
may not be specified with model-level het().


Remarks and examples


We assume that you have already read [TS] mgarch, which provides an introduction to MGARCH
models and the methods implemented in mgarch ccc.
MGARCH models are dynamic multivariate regression models in which the conditional variances
and covariances of the errors follow an autoregressive-moving-average structure. The CCC MGARCH
model uses a nonlinear combination of univariate GARCH models in which the cross-equation weights
are time invariant to model the conditional covariance matrix of the disturbances.

As discussed in [TS] mgarch, MGARCH models differ in the parsimony and flexibility of their
specifications for a time-varying conditional covariance matrix of the disturbances, denoted by Ht .
In the conditional correlation family of MGARCH models, the diagonal elements of Ht are modeled
as univariate GARCH models, whereas the off-diagonal elements are modeled as nonlinear functions
of the diagonal terms. In the CCC MGARCH model,

$$h_{ij,t} = \rho_{ij}\sqrt{h_{ii,t}\,h_{jj,t}}$$

where the diagonal elements $h_{ii,t}$ and $h_{jj,t}$ follow univariate GARCH processes and $\rho_{ij}$ is a time-invariant weight interpreted as a conditional correlation.
In the dynamic conditional correlation (DCC) and varying conditional correlation (VCC) MGARCH
models discussed in [TS] mgarch dcc and [TS] mgarch vcc, the $\rho_{ij}$ are allowed to vary over
time. Although the conditional-correlation structure provides a useful trade-off between parsimony
and flexibility in the DCC MGARCH and VCC MGARCH models, the time-invariant parameterization
used in the CCC MGARCH model is generally viewed as too restrictive for many applications; see
Silvennoinen and Terasvirta (2009). The baseline CCC MGARCH estimates are frequently compared
with DCC MGARCH and VCC MGARCH estimates.

Technical note
Formally, the CCC MGARCH model derived by Bollerslev (1990) can be written as

$$y_t = C x_t + \epsilon_t$$
$$\epsilon_t = H_t^{1/2} \nu_t$$
$$H_t = D_t^{1/2} R D_t^{1/2}$$

where

$y_t$ is an $m \times 1$ vector of dependent variables;

$C$ is an $m \times k$ matrix of parameters;

$x_t$ is a $k \times 1$ vector of independent variables, which may contain lags of $y_t$;

$H_t^{1/2}$ is the Cholesky factor of the time-varying conditional covariance matrix $H_t$;

$\nu_t$ is an $m \times 1$ vector of normal, independent, and identically distributed innovations;

$D_t$ is a diagonal matrix of conditional variances,

$$D_t = \begin{pmatrix}
\sigma^2_{1,t} & 0 & \cdots & 0 \\
0 & \sigma^2_{2,t} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma^2_{m,t}
\end{pmatrix}$$

in which each $\sigma^2_{i,t}$ evolves according to a univariate GARCH model of the form

$$\sigma^2_{i,t} = s_i + \sum_{j=1}^{p_i} \alpha_j \epsilon^2_{i,t-j} + \sum_{j=1}^{q_i} \beta_j \sigma^2_{i,t-j}$$

by default, or

$$\sigma^2_{i,t} = \exp(\gamma_i z_{i,t}) + \sum_{j=1}^{p_i} \alpha_j \epsilon^2_{i,t-j} + \sum_{j=1}^{q_i} \beta_j \sigma^2_{i,t-j}$$

when the het() option is specified, where $\gamma_i$ is a $1 \times p$ vector of parameters, $z_{i,t}$ is a $p \times 1$ vector of independent variables including a constant term, the $\alpha_j$'s are ARCH parameters, and the $\beta_j$'s are GARCH parameters; and

$R$ is a matrix of time-invariant unconditional correlations of the standardized residuals $D_t^{-1/2}\epsilon_t$,

$$R = \begin{pmatrix}
1 & \rho_{12} & \cdots & \rho_{1m} \\
\rho_{12} & 1 & \cdots & \rho_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
\rho_{1m} & \rho_{2m} & \cdots & 1
\end{pmatrix}$$

This model is known as the constant conditional correlation MGARCH model because R is time
invariant.
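
As a rough numerical illustration of how the covariance matrix is assembled from a diagonal matrix of variances and a correlation matrix when m = 2, the following sketch uses Stata's matrix commands. The variances (0.0004 and 0.0009) and the correlation (0.65) are made-up values, not estimates, and the matrix names are arbitrary.

```stata
* Sketch only: illustrative numbers, not estimates from any model.
matrix D = (0.0004, 0 \ 0, 0.0009)      // conditional variances on the diagonal
matrix R = (1, 0.65 \ 0.65, 1)          // time-invariant correlation matrix

* For a diagonal matrix, the Cholesky factor holds the square roots (0.02, 0.03)
matrix Dhalf = cholesky(D)

* H = D^(1/2) * R * D^(1/2)
matrix H = Dhalf*R*Dhalf
matrix list H
```

The diagonal of H reproduces the two variances, and the off-diagonal element is 0.65 × 0.02 × 0.03 = 0.00039, that is, the correlation times the square root of the product of the two variances.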

Some examples
Example 1: Model with common covariates
We have daily data on the stock returns of three car manufacturers (Toyota, Nissan, and Honda)
from January 2, 2003, to December 31, 2010, in the variables toyota, nissan, and honda. We
model the conditional means of the returns as a first-order vector autoregressive process and the
conditional covariances as a CCC MGARCH process in which the variance of each disturbance term
follows a GARCH(1,1) process. We specify the noconstant option, because the returns have mean
zero. The estimated constants in the variance equations are near zero in this example because of how
the data are scaled.

. use http://www.stata-press.com/data/r13/stocks
(Data from Yahoo! Finance)
. mgarch ccc (toyota nissan honda = L.toyota L.nissan L.honda, noconstant),
> arch(1) garch(1)
Calculating starting values....
Optimizing concentrated log likelihood
(setting technique to bhhh)
Iteration 0:   log likelihood =  16898.994
Iteration 1:   log likelihood =  17008.914
Iteration 2:   log likelihood =  17156.946
Iteration 3:   log likelihood =  17249.527
Iteration 4:   log likelihood =  17287.251
Iteration 5:   log likelihood =    17313.5
Iteration 6:   log likelihood =  17335.087
Iteration 7:   log likelihood =  17356.534
Iteration 8:   log likelihood =  17376.051
Iteration 9:   log likelihood =  17400.035
(switching technique to nr)
Iteration 10:  log likelihood =  17423.634
Iteration 11:  log likelihood =  17440.807
Iteration 12:  log likelihood =  17446.865
Iteration 13:  log likelihood =  17447.637
Iteration 14:  log likelihood =  17447.645
Iteration 15:  log likelihood =  17447.645
Optimizing unconcentrated log likelihood
Iteration 0:   log likelihood =  17447.645
Iteration 1:   log likelihood =  17447.651
Iteration 2:   log likelihood =  17447.651

Constant conditional correlation MGARCH model
Sample: 1 - 2015                                   Number of obs   =      2014
Distribution: Gaussian                             Wald chi2(9)    =     17.46
Log likelihood =  17447.65                         Prob > chi2     =    0.0420

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
toyota       |
      toyota |
         L1. |  -.0537817   .0353211    -1.52   0.128    -.1230098    .0154463
             |
      nissan |
         L1. |    .026686    .024841     1.07   0.283    -.0220015    .0753734
             |
       honda |
         L1. |  -.0043073   .0302761    -0.14   0.887    -.0636473    .0550327
-------------+----------------------------------------------------------------
ARCH_toyota  |
        arch |
         L1. |   .0615321   .0087313     7.05   0.000     .0444191    .0786452
             |
       garch |
         L1. |   .9213798   .0110412    83.45   0.000     .8997395    .9430201
             |
       _cons |   4.42e-06   1.12e-06     3.93   0.000     2.21e-06    6.62e-06
-------------+----------------------------------------------------------------
nissan       |
      toyota |
         L1. |  -.0232321   .0400563    -0.58   0.562    -.1017411    .0552769
             |
      nissan |
         L1. |  -.0299552   .0309362    -0.97   0.333    -.0905891    .0306787
             |
       honda |
         L1. |   .0369229   .0360532     1.02   0.306    -.0337402    .1075859
-------------+----------------------------------------------------------------
ARCH_nissan  |
        arch |
         L1. |   .0740294   .0119353     6.20   0.000     .0506366    .0974222
             |
       garch |
         L1. |   .9102547   .0142328    63.95   0.000     .8823589    .9381506
             |
       _cons |   6.36e-06   1.76e-06     3.61   0.000     2.91e-06    9.81e-06
-------------+----------------------------------------------------------------
honda        |
      toyota |
         L1. |  -.0378616    .036792    -1.03   0.303    -.1099727    .0342495
             |
      nissan |
         L1. |   .0551649   .0272559     2.02   0.043     .0017444    .1085855
             |
       honda |
         L1. |  -.0431919   .0331268    -1.30   0.192    -.1081193    .0217354
-------------+----------------------------------------------------------------
ARCH_honda   |
        arch |
         L1. |   .0433036   .0070224     6.17   0.000     .0295399    .0570674
             |
       garch |
         L1. |    .939117    .010131    92.70   0.000     .9192605    .9589735
             |
       _cons |   5.02e-06   1.31e-06     3.83   0.000     2.45e-06    7.59e-06
-------------+----------------------------------------------------------------
corr(toyota, |
     nissan) |   .6532264   .0128035    51.02   0.000      .628132    .6783208
corr(toyota, |
      honda) |   .7185412   .0108132    66.45   0.000     .6973477    .7397347
corr(nissan, |
      honda) |   .6298972   .0135336    46.54   0.000     .6033717    .6564226
------------------------------------------------------------------------------

The iteration log has three parts: the dots from the search for initial values, the iteration log from
optimizing the concentrated log likelihood, and the iteration log from maximizing the unconcentrated
log likelihood. A detailed discussion of the optimization methods can be found in Methods and
formulas.
The header describes the estimation sample and reports a Wald test against the null hypothesis
that all the coefficients on the independent variables in the mean equations are zero. Here the null
hypothesis is rejected at the 5% level.
The output table first presents results for the mean or variance parameters used to model each
dependent variable. Subsequently, the output table presents results for the conditional correlation
parameters. For example, the conditional correlation between the standardized residuals for Toyota
and Nissan is estimated to be 0.65.


The output above indicates that we may not need all the vector autoregressive parameters, but that
each of the univariate ARCH, univariate GARCH, and conditional correlation parameters is statistically
significant. That the estimated conditional correlation parameters are positive and significant indicates
that the returns on these stocks rise or fall together.
That the conditional correlations are time invariant is a restrictive assumption. The DCC MGARCH
model and the VCC MGARCH model nest the CCC MGARCH model. When we test the time-invariance
assumption with Wald tests on the parameters of these more general models in [TS] mgarch dcc and
[TS] mgarch vcc, we reject the null hypothesis that these conditional correlations are time invariant.
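
The remark that not all the vector autoregressive parameters may be needed can be checked with a joint Wald test after estimation. The sketch below assumes the model from this example is refit first; coefficients are referenced with the _b[eqname:coefname] pattern, and the particular hypotheses tested are chosen only for illustration.

```stata
* Sketch only: follow-up tests after the example 1 model.
use http://www.stata-press.com/data/r13/stocks, clear
quietly mgarch ccc (toyota nissan honda = L.toyota L.nissan L.honda, noconstant), ///
    arch(1) garch(1)

* Joint Wald test that the three lagged returns in the toyota mean equation are zero
test (_b[toyota:L.toyota] = 0) (_b[toyota:L.nissan] = 0) (_b[toyota:L.honda] = 0)

* Sum of the ARCH and GARCH coefficients for Toyota, a common persistence summary
lincom _b[ARCH_toyota:L.arch] + _b[ARCH_toyota:L.garch]
```

From the coefficient table, the point estimate reported by lincom should be about 0.0615 + 0.9214 = 0.983.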

Example 2: Model with covariates that differ by equation


We improve the previous example by removing the insignificant parameters from the model. To
remove these parameters, we specify the honda equation separately from the toyota and nissan
equations:
. mgarch ccc (toyota nissan = , noconstant) (honda = L.nissan, noconstant),
> arch(1) garch(1)
Calculating starting values....
Optimizing concentrated log likelihood
(setting technique to bhhh)
Iteration 0:   log likelihood =   16886.88
Iteration 1:   log likelihood =  16974.779
Iteration 2:   log likelihood =  17147.893
Iteration 3:   log likelihood =  17247.473
Iteration 4:   log likelihood =  17285.549
Iteration 5:   log likelihood =  17311.153
Iteration 6:   log likelihood =  17333.588
Iteration 7:   log likelihood =  17353.717
Iteration 8:   log likelihood =  17374.895
Iteration 9:   log likelihood =  17400.669
(switching technique to nr)
Iteration 10:  log likelihood =  17425.661
Iteration 11:  log likelihood =  17436.784
Iteration 12:  log likelihood =   17439.74
Iteration 13:  log likelihood =  17439.865
Iteration 14:  log likelihood =  17439.866
Optimizing unconcentrated log likelihood
Iteration 0:   log likelihood =  17439.866
Iteration 1:   log likelihood =  17439.872
Iteration 2:   log likelihood =  17439.872

Constant conditional correlation MGARCH model
Sample: 1 - 2015                                   Number of obs   =      2014
Distribution: Gaussian                             Wald chi2(1)    =      1.81
Log likelihood =  17439.87                         Prob > chi2     =    0.1781

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ARCH_toyota  |
        arch |
         L1. |   .0619604   .0087942     7.05   0.000      .044724    .0791968
             |
       garch |
         L1. |   .9208961   .0110995    82.97   0.000     .8991414    .9426508
             |
       _cons |   4.43e-06   1.13e-06     3.94   0.000     2.23e-06    6.64e-06
-------------+----------------------------------------------------------------
ARCH_nissan  |
        arch |
         L1. |   .0773095    .012328     6.27   0.000     .0531471    .1014719
             |
       garch |
         L1. |    .906088   .0147303    61.51   0.000     .8772171    .9349589
             |
       _cons |   6.77e-06   1.85e-06     3.66   0.000     3.14e-06    .0000104
-------------+----------------------------------------------------------------
honda        |
      nissan |
         L1. |   .0186628   .0138575     1.35   0.178    -.0084975    .0458231
-------------+----------------------------------------------------------------
ARCH_honda   |
        arch |
         L1. |   .0433741    .006996     6.20   0.000     .0296622    .0570861
             |
       garch |
         L1. |   .9391094   .0100707    93.25   0.000     .9193712    .9588477
             |
       _cons |   5.02e-06   1.31e-06     3.83   0.000     2.45e-06    7.60e-06
-------------+----------------------------------------------------------------
corr(toyota, |
     nissan) |    .652299   .0128271    50.85   0.000     .6271583    .6774396
corr(toyota, |
      honda) |   .7189531   .0108005    66.57   0.000     .6977845    .7401218
corr(nissan, |
      honda) |    .628435   .0135653    46.33   0.000     .6018475    .6550225
------------------------------------------------------------------------------

It turns out that the coefficient on L1.nissan in the honda equation is now statistically insignificant.
We could further improve the model by removing L1.nissan from the model.
As expected, removing the insignificant parameters from conditional mean equations had almost
no effect on the estimated conditional variance parameters.
There is no mean equation for Toyota or Nissan. In [TS] mgarch ccc postestimation, we discuss
prediction from models without covariates.


Example 3: Model with constraints


Here we fit a bivariate CCC MGARCH model for the Toyota and Nissan shares. We believe that
the shares of these car manufacturers follow the same process, so we impose the constraints that the
ARCH and the GARCH coefficients are the same for the two companies.
. constraint 1 _b[ARCH_toyota:L.arch] = _b[ARCH_nissan:L.arch]
. constraint 2 _b[ARCH_toyota:L.garch] = _b[ARCH_nissan:L.garch]
. mgarch ccc (toyota nissan = , noconstant), arch(1) garch(1) constraints(1 2)
Calculating starting values....
Optimizing concentrated log likelihood
(setting technique to bhhh)
Iteration 0:   log likelihood =  10317.225
Iteration 1:   log likelihood =  10630.464
Iteration 2:   log likelihood =  10865.964
Iteration 3:   log likelihood =  11063.329
 (output omitted )
Iteration 8:   log likelihood =  11273.962
Iteration 9:   log likelihood =  11274.409
(switching technique to nr)
Iteration 10:  log likelihood =  11274.494
Iteration 11:  log likelihood =  11274.499
Iteration 12:  log likelihood =  11274.499
Optimizing unconcentrated log likelihood
Iteration 0:   log likelihood =  11274.499
Iteration 1:   log likelihood =  11274.501
Iteration 2:   log likelihood =  11274.501

Constant conditional correlation MGARCH model
Sample: 1 - 2015                                   Number of obs   =      2015
Distribution: Gaussian                             Wald chi2(.)    =         .
Log likelihood =   11274.5                         Prob > chi2     =         .

 ( 1)  [ARCH_toyota]L.arch - [ARCH_nissan]L.arch = 0
 ( 2)  [ARCH_toyota]L.garch - [ARCH_nissan]L.garch = 0
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ARCH_toyota  |
        arch |
         L1. |   .0742678   .0095464     7.78   0.000     .0555572    .0929785
             |
       garch |
         L1. |   .9131674   .0111558    81.86   0.000     .8913024    .9350323
             |
       _cons |   3.77e-06   1.02e-06     3.71   0.000     1.78e-06    5.77e-06
-------------+----------------------------------------------------------------
ARCH_nissan  |
        arch |
         L1. |   .0742678   .0095464     7.78   0.000     .0555572    .0929785
             |
       garch |
         L1. |   .9131674   .0111558    81.86   0.000     .8913024    .9350323
             |
       _cons |   5.30e-06   1.36e-06     3.89   0.000     2.63e-06    7.97e-06
-------------+----------------------------------------------------------------
corr(toyota, |
     nissan) |    .651389   .0128482    50.70   0.000     .6262071    .6765709
------------------------------------------------------------------------------


We could test our constraints by fitting the unconstrained model and performing a likelihood-ratio
test. The results indicate that the restricted model is preferable.
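
A likelihood-ratio test of this kind might be set up as sketched below. The stored-estimate names unrestricted and restricted are arbitrary labels introduced here; the constraints are the ones defined in this example.

```stata
* Sketch only: likelihood-ratio test of the two equality constraints.
use http://www.stata-press.com/data/r13/stocks, clear

* Unconstrained bivariate model
quietly mgarch ccc (toyota nissan = , noconstant), arch(1) garch(1)
estimates store unrestricted

* Constrained model: equal ARCH and equal GARCH coefficients across equations
constraint 1 _b[ARCH_toyota:L.arch]  = _b[ARCH_nissan:L.arch]
constraint 2 _b[ARCH_toyota:L.garch] = _b[ARCH_nissan:L.garch]
quietly mgarch ccc (toyota nissan = , noconstant), arch(1) garch(1) constraints(1 2)
estimates store restricted

* Compare the two fits
lrtest unrestricted restricted
```

Both models are fit on the same sample by maximum likelihood with the default vce(oim), which is what lrtest presumes. A large p-value would be consistent with retaining the restricted model.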

Example 4: Model with a GARCH term


In this example, we have data on fictional stock returns for the Acme and Anvil corporations and
we believe that the movement of the two stocks is governed by different processes. We specify one
ARCH and one GARCH term for the conditional variance equation for Acme and two ARCH terms for
the conditional variance equation for Anvil. In addition, we include the lagged value of the stock
return for Apex, the main subsidiary of Anvil corporation, in the variance equation of Anvil. For
Acme, we have data on the changes in an index of futures prices of products related to those produced
by Acme in afrelated. For Anvil, we have data on the changes in an index of futures prices of
inputs used by Anvil in afinputs.



. use http://www.stata-press.com/data/r13/acmeh
. mgarch ccc (acme = afrelated, noconstant arch(1) garch(1))
> (anvil = afinputs, arch(1/2) het(L.apex))
Calculating starting values....
Optimizing concentrated log likelihood
(setting technique to bhhh)
Iteration 0:   log likelihood = -12996.245
Iteration 1:   log likelihood = -12609.982
Iteration 2:   log likelihood = -12563.103
Iteration 3:   log likelihood =  -12554.73
Iteration 4:   log likelihood = -12554.542
Iteration 5:   log likelihood = -12554.534
Iteration 6:   log likelihood = -12554.534
Iteration 7:   log likelihood = -12554.534
Optimizing unconcentrated log likelihood
Iteration 0:   log likelihood = -12554.534
Iteration 1:   log likelihood = -12554.533

Constant conditional correlation MGARCH model
Sample: 1 - 2500                                   Number of obs   =      2499
Distribution: Gaussian                             Wald chi2(2)    =   2212.30
Log likelihood = -12554.53                         Prob > chi2     =    0.0000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
acme         |
   afrelated |   .9175148   .0651088    14.09   0.000     .7899039    1.045126
-------------+----------------------------------------------------------------
ARCH_acme    |
        arch |
         L1. |   .0798719   .0169526     4.71   0.000     .0466455    .1130983
             |
       garch |
         L1. |   .7336823    .060157    12.20   0.000     .6157768    .8515877
             |
       _cons |   2.880836   .7602061     3.79   0.000     1.390859    4.370812
-------------+----------------------------------------------------------------
anvil        |
    afinputs |  -1.015561   .0226437   -44.85   0.000    -1.059942     -.97118
       _cons |   .0703606   .0211689     3.32   0.001     .0288703    .1118508
-------------+----------------------------------------------------------------
ARCH_anvil   |
        arch |
         L1. |   .4893288   .0286012    17.11   0.000     .4332714    .5453862
         L2. |   .2782296   .0208172    13.37   0.000     .2374287    .3190305
             |
        apex |
         L1. |   1.894972   .0616293    30.75   0.000     1.774181    2.015763
             |
       _cons |   .1034111   .0735512     1.41   0.160    -.0407466    .2475688
-------------+----------------------------------------------------------------
  corr(acme, |
      anvil) |  -.5354047   .0143275   -37.37   0.000     -.563486   -.5073234
------------------------------------------------------------------------------

The results indicate that increases in the futures prices for related products lead to higher returns on
the Acme stock, and increased input prices lead to lower returns on the Anvil stock. In the conditional
variance equation for Anvil, the coefficient on L1.apex is positive and significant, which indicates
that an increase in the return on the Apex stock leads to more variability in the return on the Anvil
stock. That the estimated conditional correlation between the two returns is −0.54 indicates that these
returns tend to move in opposite directions; in other words, an increase in the return for the Acme
stock tends to be associated with a decrease in the return for the Anvil stock, and vice versa.

Stored results
mgarch ccc stores the following in e():
Scalars
    e(N)               number of observations
    e(k)               number of parameters
    e(k_aux)           number of auxiliary parameters
    e(k_extra)         number of extra estimates added to _b
    e(k_eq)            number of equations in e(b)
    e(k_dv)            number of dependent variables
    e(df_m)            model degrees of freedom
    e(ll)              log likelihood
    e(chi2)            χ²
    e(p)               significance
    e(estdf)           1 if distribution parameter was estimated, 0 otherwise
    e(usr)             user-provided distribution parameter
    e(tmin)            minimum time in sample
    e(tmax)            maximum time in sample
    e(N_gaps)          number of gaps
    e(rank)            rank of e(V)
    e(ic)              number of iterations
    e(rc)              return code
    e(converged)       1 if converged, 0 otherwise

Macros
    e(cmd)             mgarch
    e(model)           ccc
    e(cmdline)         command as typed
    e(depvar)          names of dependent variables
    e(covariates)      list of covariates
    e(dv_eqs)          dependent variables with mean equations
    e(indeps)          independent variables in each equation
    e(tvar)            time variable
    e(title)           title in estimation output
    e(chi2type)        Wald; type of model χ² test
    e(vce)             vcetype specified in vce()
    e(vcetype)         title used to label Std. Err.
    e(tmins)           formatted minimum time
    e(tmaxs)           formatted maximum time
    e(dist)            distribution for error term: gaussian or t
    e(arch)            specified ARCH terms
    e(garch)           specified GARCH terms
    e(technique)       maximization technique
    e(properties)      b V
    e(estat_cmd)       program used to implement estat
    e(predict)         program used to implement predict
    e(marginsok)       predictions allowed by margins
    e(marginsnotok)    predictions disallowed by margins

Matrices
    e(b)               coefficient vector
    e(Cns)             constraints matrix
    e(ilog)            iteration log (up to 20 iterations)
    e(gradient)        gradient vector
    e(hessian)         Hessian matrix
    e(V)               variance–covariance matrix of the estimators
    e(pinfo)           parameter information, used by predict

Functions
    e(sample)          marks estimation sample
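
The stored results can be inspected after any fit. The following sketch, which assumes the stocks dataset used in the examples above, lists everything in e() and then pulls out a few individual items; the matrix name b is arbitrary.

```stata
* Sketch only: looking at what mgarch ccc leaves behind in e().
use http://www.stata-press.com/data/r13/stocks, clear
quietly mgarch ccc (toyota nissan = , noconstant), arch(1) garch(1)

ereturn list                              // all scalars, macros, and matrices
display "log likelihood = " e(ll)
display "converged      = " e(converged)
display "`e(cmdline)'"                    // the command as typed

matrix b = e(b)                           // copy the coefficient vector
matrix list b

count if e(sample)                        // observations in the estimation sample
```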

Methods and formulas


mgarch ccc estimates the parameters of the CCC MGARCH model by maximum likelihood. The
unconcentrated log-likelihood function based on the multivariate normal distribution for observation t
is

$$l_t = -0.5\,m\log(2\pi) - 0.5\log\{\det(R)\} - \log\{\det(D_t^{1/2})\} - 0.5\,\tilde\epsilon_t' R^{-1}\tilde\epsilon_t \tag{1}$$

where $\tilde\epsilon_t = D_t^{-1/2}\epsilon_t$ is an $m \times 1$ vector of standardized residuals, $\epsilon_t = y_t - C x_t$. The log-likelihood
function is $\sum_{t=1}^{T} l_t$.
If we assume that $\nu_t$ follow a multivariate t distribution with degrees of freedom (df) greater than
2, then the unconcentrated log-likelihood function for observation t is

$$l_t = \log\Gamma\left(\frac{df+m}{2}\right) - \log\Gamma\left(\frac{df}{2}\right) - \frac{m}{2}\log\{(df-2)\pi\} - 0.5\log\{\det(R)\} - \log\{\det(D_t^{1/2})\} - \frac{df+m}{2}\log\left(1 + \frac{\tilde\epsilon_t' R^{-1}\tilde\epsilon_t}{df-2}\right) \tag{2}$$

The correlation matrix R can be concentrated out of (1) and (2) by defining the (i, j)th element
of R as

$$\hat\rho_{ij} = \left(\sum_{t=1}^{T}\tilde\epsilon_{it}\tilde\epsilon_{jt}\right)\left(\sum_{t=1}^{T}\tilde\epsilon_{it}^2\right)^{-1/2}\left(\sum_{t=1}^{T}\tilde\epsilon_{jt}^2\right)^{-1/2}$$
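
The concentrated estimator of a correlation is an uncentered sample correlation of two standardized residual series. The sketch below computes it by hand for one pair after fitting the model of example 1; all new variable and scalar names are hypothetical, and the standardized residuals are built from predict's residuals and variance statistics. The result should be close to the reported corr(toyota,nissan) but need not match it exactly, because mgarch ccc finishes by maximizing the unconcentrated log likelihood.

```stata
* Sketch only: uncentered correlation of standardized residuals for one pair.
use http://www.stata-press.com/data/r13/stocks, clear
quietly mgarch ccc (toyota nissan honda = L.toyota L.nissan L.honda, noconstant), ///
    arch(1) garch(1)

* Residuals and conditional variances for the two equations
predict double e_toy, residuals equation(toyota)
predict double e_nis, residuals equation(nissan)
predict double h_toy, variance  equation(toyota)
predict double h_nis, variance  equation(nissan)

* Standardized residuals and the three sums in the formula
generate double z_toy = e_toy/sqrt(h_toy)
generate double z_nis = e_nis/sqrt(h_nis)
generate double zz    = z_toy*z_nis
generate double z2toy = z_toy^2
generate double z2nis = z_nis^2

quietly summarize zz
scalar num = r(sum)
quietly summarize z2toy
scalar d1 = r(sum)
quietly summarize z2nis
scalar d2 = r(sum)

display "rho_hat(toyota,nissan) = " num/sqrt(d1*d2)
```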

mgarch ccc starts the optimization process with the concentrated log-likelihood function.
The starting values for the parameters in the mean equations and the initial residuals $\hat\epsilon_t$ are
obtained by least-squares regression. The starting values for the parameters in the variance equations
are obtained by a procedure proposed by Gourieroux and Monfort (1997, sec. 6.2.2). If the optimization
is started with the unconcentrated log likelihood, then the initial values for the parameters in R are
calculated from the standardized residuals $\tilde\epsilon_t$.
GARCH estimators require initial values that can be plugged in for $\epsilon_{t-i}\epsilon_{t-i}'$ and
$H_{t-j}$ when $t - i < 1$ and $t - j < 1$. mgarch ccc substitutes an estimator of the unconditional covariance of the
disturbances

$$\hat\Sigma = T^{-1}\sum_{t=1}^{T}\hat{\hat\epsilon}_t\,\hat{\hat\epsilon}_t' \tag{3}$$

for $\epsilon_{t-i}\epsilon_{t-i}'$ when $t - i < 1$ and for $H_{t-j}$ when $t - j < 1$, where $\hat{\hat\epsilon}_t$ is the vector of residuals
calculated using the estimated parameters.
mgarch ccc requires a sample size that at the minimum is equal to the number of parameters in
the model plus twice the number of equations.
mgarch ccc uses numerical derivatives in maximizing the log-likelihood function.

References
Bollerslev, T. 1990. Modelling the coherence in short-run nominal exchange rates: A multivariate generalized ARCH
model. Review of Economics and Statistics 72: 498–505.
Gourieroux, C. S., and A. Monfort. 1997. Time Series and Dynamic Models. Trans. ed. G. M. Gallo. Cambridge:
Cambridge University Press.
Silvennoinen, A., and T. Terasvirta. 2009. Multivariate GARCH models. In Handbook of Financial Time Series, ed.
T. G. Andersen, R. A. Davis, J.-P. Kreis, and T. Mikosch, 201–229. Berlin: Springer.

Also see
[TS] mgarch ccc postestimation Postestimation tools for mgarch ccc
[TS] mgarch Multivariate GARCH models
[TS] tsset Declare data to be time-series data
[TS] arch Autoregressive conditional heteroskedasticity (ARCH) family of estimators
[TS] var Vector autoregressive models
[U] 20 Estimation and postestimation commands

Title
mgarch ccc postestimation Postestimation tools for mgarch ccc
Description
Syntax for predict
Menu for predict
Options for predict
Remarks and examples
Methods and formulas
Also see
Description
The following standard postestimation commands are available after mgarch ccc:
Command            Description
contrast           contrasts and ANOVA-style joint tests of estimates
estat ic           Akaike's and Schwarz's Bayesian information criteria (AIC and BIC)
estat summarize    summary statistics for the estimation sample
estat vce          variance–covariance matrix of the estimators (VCE)
estimates          cataloging estimation results
forecast           dynamic forecasts and simulations
lincom             point estimates, standard errors, testing, and inference for linear
                   combinations of coefficients
lrtest             likelihood-ratio test
margins            marginal means, predictive margins, marginal effects, and average
                   marginal effects
marginsplot        graph the results from margins (profile plots, interaction plots, etc.)
nlcom              point estimates, standard errors, testing, and inference for nonlinear
                   combinations of coefficients
predict            predictions, residuals, influence statistics, and other diagnostic measures
predictnl          point estimates, standard errors, testing, and inference for generalized
                   predictions
pwcompare          pairwise comparisons of estimates
test               Wald tests of simple and composite linear hypotheses
testnl             Wald tests of nonlinear hypotheses


Syntax for predict


predict [type] {stub* | newvarlist} [if] [in] [, statistic options]

statistic                 Description
Main
  xb                      linear prediction; the default
  residuals               residuals
  variance                conditional variances and covariances
  correlation             conditional correlations

These statistics are available both in and out of sample; type predict ... if e(sample) ... if wanted only for
the estimation sample.

options                   Description
Options
  equation(eqnames)       names of equations for which predictions are made
  dynamic(time_constant)  begin dynamic forecast at specified time

Menu for predict


Statistics > Postestimation > Predictions, residuals, etc.

Options for predict




Main

xb, the default, calculates the linear predictions of the dependent variables.
residuals calculates the residuals.
variance predicts the conditional variances and conditional covariances.
correlation predicts the conditional correlations.

Options

equation(eqnames) specifies the equation for which the predictions are calculated. Use this option
to predict a statistic for a particular equation. Equation names, such as equation(income), are
used to identify equations.
One equation name may be specified when predicting the dependent variable, the residuals, or
the conditional variance. For example, specifying equation(income) causes predict to predict
income, and specifying variance equation(income) causes predict to predict the conditional
variance of income.
Two equations may be specified when predicting a conditional variance or covariance. For example, specifying equation(income, consumption) variance causes predict to predict the
conditional covariance of income and consumption.


dynamic(time_constant) specifies when predict starts producing dynamic forecasts. The specified
time_constant must be in the scale of the time variable specified in tsset, and the time_constant
must be inside a sample for which observations on the dependent variables are available. For
example, dynamic(tq(2008q4)) causes dynamic predictions to begin in the fourth quarter of
2008, assuming that your time variable is quarterly; see [D] datetime. If the model contains
exogenous variables, they must be present for the whole predicted sample. dynamic() may not
be specified with residuals.
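
The equation() option can be illustrated with a short sketch. It assumes the three-equation model from example 1 of [TS] mgarch ccc; the new variable names (v_toyota, cov_tn, and the stub rho) are arbitrary.

```stata
* Sketch only: selecting equations when predicting after mgarch ccc.
use http://www.stata-press.com/data/r13/stocks, clear
quietly mgarch ccc (toyota nissan honda = L.toyota L.nissan L.honda, noconstant), ///
    arch(1) garch(1)

* One equation name: conditional variance of toyota
predict double v_toyota, variance equation(toyota)

* Two equation names: conditional covariance of toyota and nissan
predict double cov_tn, variance equation(toyota, nissan)

* No equation(): a stub requests all conditional correlations
* (constant over time in a CCC model)
predict double rho*, correlation
```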

Remarks and examples


We assume that you have already read [TS] mgarch ccc. In this entry, we use predict after
mgarch ccc to make in-sample and out-of-sample forecasts.

Example 1: Dynamic forecasts


In this example, we obtain dynamic forecasts for the Toyota, Nissan, and Honda stock returns
modeled in example 2 of [TS] mgarch ccc. In the output below, we reestimate the parameters of the
model, use tsappend (see [TS] tsappend) to extend the data, and use predict to obtain in-sample
one-step-ahead forecasts and dynamic forecasts of the conditional variances of the returns. We graph
the forecasts below.

. use http://www.stata-press.com/data/r13/stocks
(Data from Yahoo! Finance)
. quietly mgarch ccc (toyota nissan = , noconstant)
> (honda = L.nissan, noconstant), arch(1) garch(1)
. tsappend, add(50)
. predict H*, variance dynamic(2016)

[Figure: time-series plot of the predicted conditional variances against Date (01jan2009 through 01jan2011), with vertical-axis ticks at .001, .002, and .003. Series shown: Variance prediction (toyota,toyota), dynamic(2016); Variance prediction (nissan,nissan), dynamic(2016); Variance prediction (honda,honda), dynamic(2016).]

Recent in-sample one-step-ahead forecasts are plotted to the left of the vertical line in the above
graph, and the dynamic out-of-sample forecasts appear to the right of the vertical line. The graph
shows the tail end of the huge increase in return volatility that took place in 2008 and 2009. It also
shows that the dynamic forecasts quickly converge.
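
A graph along these lines might be drawn as sketched below. The variable names H_toyota_toyota, H_nissan_nissan, and H_honda_honda are an assumption about how predict names the variables created from the stub H; describe is run first so that the names can be checked. Placing the vertical line at 2015 assumes that the time variable runs from 1 to 2015 in the estimation sample, as the Sample line of the estimation output suggests.

```stata
* Sketch only: plotting one-step-ahead and dynamic variance forecasts.
use http://www.stata-press.com/data/r13/stocks, clear
quietly mgarch ccc (toyota nissan = , noconstant) ///
    (honda = L.nissan, noconstant), arch(1) garch(1)
tsappend, add(50)
predict H*, variance dynamic(2016)

* Check the names of the variables that predict created
describe H*

* Variable names below are assumed; adjust them to match the describe output
tsline H_toyota_toyota H_nissan_nissan H_honda_honda, ///
    legend(rows(3)) xline(2015)
```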


Methods and formulas


All one-step predictions are obtained by substituting the parameter estimates into the model. The
estimated unconditional variance matrix of the disturbances, $\hat\Sigma$, is the initial value for the ARCH and
GARCH terms. The postestimation routines recompute $\hat\Sigma$ using the prediction sample, the parameter
estimates stored in e(b), and (3) in Methods and formulas of [TS] mgarch ccc.
For observations in which the residuals are missing, the estimated unconditional variance matrix
of the disturbances is used in place of the outer product of the residuals.
Dynamic predictions of the dependent variables use previously predicted values beginning in the
period specified by dynamic().
Dynamic variance predictions are implemented by substituting $\hat\Sigma$ for the outer product of the
residuals beginning in the period specified in dynamic().
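
To see why dynamic variance forecasts settle down, consider a worked sketch for one equation with a single ARCH and a single GARCH term. The notation is introduced here for illustration: $t_0$ is the period given in dynamic(), $\hat\Sigma_{ii}$ is the $i$th diagonal element of the estimated unconditional variance matrix, and the exact timing of the first substituted period is an assumption based on the description above.

```latex
% Once the squared residual is replaced by \hat\Sigma_{ii}, the recursion is linear:
\hat\sigma^2_{i,t} = \hat s_i + \hat\alpha_1 \hat\Sigma_{ii} + \hat\beta_1 \hat\sigma^2_{i,t-1},
\qquad t > t_0 .

% Its fixed point is
\bar\sigma^2_i = \frac{\hat s_i + \hat\alpha_1 \hat\Sigma_{ii}}{1 - \hat\beta_1},

% and deviations from the fixed point shrink geometrically:
\hat\sigma^2_{i,t_0+k} - \bar\sigma^2_i
  = \hat\beta_1^{\,k}\left(\hat\sigma^2_{i,t_0} - \bar\sigma^2_i\right),
\qquad k = 0, 1, 2, \ldots
```

Provided $0 \le \hat\beta_1 < 1$, the forecast approaches $\bar\sigma^2_i$ monotonically as the horizon $k$ grows.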

Also see
[TS] mgarch ccc Constant conditional correlation multivariate GARCH models
[U] 20 Estimation and postestimation commands

Title
mgarch dcc Dynamic conditional correlation multivariate GARCH models
Syntax
Menu
Description
Options
Remarks and examples
Stored results
Methods and formulas
References
Also see

Syntax
mgarch dcc eq [eq ... eq] [if] [in] [, options]

where each eq has the form

    (depvars = [indepvars] [, eqoptions])

options                    Description
Model
  arch(numlist)            ARCH terms for all equations
  garch(numlist)           GARCH terms for all equations
  het(varlist)             include varlist in the specification of the conditional variance
                           for all equations
  distribution(dist [#])   use dist distribution for errors [may be gaussian
                           (synonym normal) or t; default is gaussian]
  constraints(numlist)     apply linear constraints
SE/Robust
  vce(vcetype)             vcetype may be oim or robust
Reporting
  level(#)                 set confidence level; default is level(95)
  nocnsreport              do not display constraints
  display_options          control column formats, row spacing, line width, display of omitted
                           variables and base and empty cells, and factor-variable labeling
Maximization
  maximize_options         control the maximization process; seldom used
  from(matname)            initial values for the coefficients; seldom used

  coeflegend               display legend instead of statistics

eqoptions                  Description
  noconstant               suppress constant term in the mean equation
  arch(numlist)            ARCH terms
  garch(numlist)           GARCH terms
  het(varlist)             include varlist in the specification of the conditional variance

You must tsset your data before using mgarch dcc; see [TS] tsset.
indepvars and varlist may contain factor variables; see [U] 11.4.3 Factor variables.
depvars, indepvars, and varlist may contain time-series operators; see [U] 11.4.4 Time-series varlists.
by, fp, rolling, and statsby are allowed; see [U] 11.1.10 Prefix commands.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.

Menu
Statistics > Multivariate time series > Multivariate GARCH

Description
mgarch dcc estimates the parameters of dynamic conditional correlation (DCC) multivariate
generalized autoregressive conditionally heteroskedastic (MGARCH) models in which the conditional
variances are modeled as univariate generalized autoregressive conditionally heteroskedastic (GARCH)
models and the conditional covariances are modeled as nonlinear functions of the conditional variances.
The conditional quasicorrelation parameters that weight the nonlinear combinations of the conditional
variances follow the GARCH-like process specified in Engle (2002).
The DCC MGARCH model is about as flexible as the closely related varying conditional correlation
MGARCH model (see [TS] mgarch vcc), more flexible than the conditional correlation MGARCH
model (see [TS] mgarch ccc), and more parsimonious than the diagonal vech MGARCH model (see
[TS] mgarch dvech).

Options


Model

arch(numlist) specifies the ARCH terms for all equations in the model. By default, no ARCH terms
are specified.
garch(numlist) specifies the GARCH terms for all equations in the model. By default, no GARCH
terms are specified.
het(varlist) specifies that varlist be included in the specification of the conditional variance for all
equations. This varlist enters the variance specification collectively as multiplicative heteroskedasticity.
 
distribution(dist [#]) specifies the assumed distribution for the errors. dist may be gaussian,
normal, or t.
gaussian and normal are synonyms; each causes mgarch dcc to assume that the errors come
from a multivariate normal distribution. # may not be specified with either of them.
t causes mgarch dcc to assume that the errors follow a multivariate Student t distribution, and
the degree-of-freedom parameter is estimated along with the other parameters of the model. If
distribution(t #) is specified, then mgarch dcc uses a multivariate Student t distribution
with # degrees of freedom. # must be greater than 2.
constraints(numlist) specifies linear constraints to apply to the parameter estimates.
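The two forms of distribution(t) described above can be sketched as follows. This is only an illustration: it assumes the stocks dataset used in the examples later in this entry, and the value 8 for the degrees of freedom is an arbitrary choice, not a recommendation.

```stata
* Illustrative only: dataset and model borrowed from the examples below
use http://www.stata-press.com/data/r13/stocks, clear

* Student t errors; the degree-of-freedom parameter is estimated
mgarch dcc (toyota nissan = , noconstant), arch(1) garch(1) distribution(t)

* Student t errors with the degrees of freedom fixed at 8 (arbitrary value > 2)
mgarch dcc (toyota nissan = , noconstant), arch(1) garch(1) distribution(t 8)
```

After the second fit, e(estdf) and e(usr) (see Stored results) record whether the parameter was estimated and the value supplied.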

SE/Robust

vce(vcetype) specifies the estimator for the variance–covariance matrix of the estimator.
vce(oim), the default, specifies to use the observed information matrix (OIM) estimator.
vce(robust) specifies to use the Huber/White/sandwich estimator.

Reporting

level(#); see [R] estimation options.


nocnsreport; see [R] estimation options.
display_options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel,
fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and
nolstretch; see [R] estimation options.

Maximization

 
maximize_options: difficult, technique(algorithm_spec), iterate(#), [no]log, trace,
gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#),
nrtolerance(#), nonrtolerance, and from(matname); see [R] maximize for all options except
from(), and see below for information on from(). These options are seldom used.
from(matname) specifies initial values for the coefficients. from(b0) causes mgarch dcc to begin
the optimization algorithm with the values in b0. b0 must be a row vector, and the number of
columns must equal the number of parameters in the model.
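One way from() might be used is to restart the optimizer from a previously fitted coefficient vector. The sketch below assumes the stocks dataset from the examples in this entry; the matrix name b0 is arbitrary.

```stata
* Illustrative only: fit once, save the coefficients, and refit from them
use http://www.stata-press.com/data/r13/stocks, clear
quietly mgarch dcc (toyota nissan = , noconstant), arch(1) garch(1)

* e(b) is a row vector with one column per model parameter
matrix b0 = e(b)

* Start the optimization from the values in b0
mgarch dcc (toyota nissan = , noconstant), arch(1) garch(1) from(b0)
```

Because b0 is taken from the same model specification, it has the required number of columns.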
The following option is available with mgarch dcc but is not shown in the dialog box:
coeflegend; see [R] estimation options.

Eqoptions
noconstant suppresses the constant term in the mean equation.
arch(numlist) specifies the ARCH terms in the equation. By default, no ARCH terms are specified.
This option may not be specified with model-level arch().
garch(numlist) specifies the GARCH terms in the equation. By default, no GARCH terms are specified.
This option may not be specified with model-level garch().
het(varlist) specifies that varlist be included in the specification of the conditional variance. This
varlist enters the variance specification collectively as multiplicative heteroskedasticity. This option
may not be specified with model-level het().

Remarks and examples


We assume that you have already read [TS] mgarch, which provides an introduction to MGARCH
models and the methods implemented in mgarch dcc.


MGARCH models are dynamic multivariate regression models in which the conditional variances
and covariances of the errors follow an autoregressive-moving-average structure. The DCC MGARCH
model uses a nonlinear combination of univariate GARCH models with time-varying cross-equation
weights to model the conditional covariance matrix of the errors.

As discussed in [TS] mgarch, MGARCH models differ in the parsimony and flexibility of their
specifications for a time-varying conditional covariance matrix of the disturbances, denoted by Ht .
In the conditional correlation family of MGARCH models, the diagonal elements of Ht are modeled
as univariate GARCH models, whereas the off-diagonal elements are modeled as nonlinear functions
of the diagonal terms. In the DCC MGARCH model,

$$h_{ij,t} = \rho_{ij,t} \sqrt{h_{ii,t}\, h_{jj,t}}$$

where the diagonal elements $h_{ii,t}$ and $h_{jj,t}$ follow univariate GARCH processes and $\rho_{ij,t}$ follows the
dynamic process specified in Engle (2002) and discussed below.
Because the $\rho_{ij,t}$ varies with time, this model is known as the DCC GARCH model.
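As a small numeric illustration of this formula, suppose (hypothetical values, not taken from any output in this entry) that at some time $t$ the conditional variances are $h_{ii,t} = 0.0004$ and $h_{jj,t} = 0.0009$ and the conditional correlation is $\rho_{ij,t} = 0.67$. The implied conditional covariance is then

```latex
h_{ij,t} = \rho_{ij,t}\sqrt{h_{ii,t}\,h_{jj,t}}
         = 0.67 \times \sqrt{0.0004 \times 0.0009}
         = 0.67 \times 0.0006
         = 0.000402
```

The covariance changes over time both because the variances change and because the correlation itself changes.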

Technical note
The DCC GARCH model proposed by Engle (2002) can be written as

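$$y_t = C x_t + \epsilon_t$$
$$\epsilon_t = H_t^{1/2} \nu_t$$
$$H_t = D_t^{1/2} R_t D_t^{1/2}$$
$$R_t = \mathrm{diag}(Q_t)^{-1/2}\, Q_t\, \mathrm{diag}(Q_t)^{-1/2}$$
$$Q_t = (1 - \lambda_1 - \lambda_2) R + \lambda_1 \tilde{\epsilon}_{t-1} \tilde{\epsilon}_{t-1}' + \lambda_2 Q_{t-1} \qquad (1)$$

where

$y_t$ is an $m \times 1$ vector of dependent variables;

$C$ is an $m \times k$ matrix of parameters;

$x_t$ is a $k \times 1$ vector of independent variables, which may contain lags of $y_t$;

$H_t^{1/2}$ is the Cholesky factor of the time-varying conditional covariance matrix $H_t$;

$\nu_t$ is an $m \times 1$ vector of normal, independent, and identically distributed innovations;

$D_t$ is a diagonal matrix of conditional variances,

$$D_t = \begin{pmatrix}
\sigma^2_{1,t} & 0 & \cdots & 0 \\
0 & \sigma^2_{2,t} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma^2_{m,t}
\end{pmatrix}$$

in which each $\sigma^2_{i,t}$ evolves according to a univariate GARCH model of the form

$$\sigma^2_{i,t} = s_i + \sum_{j=1}^{p_i} \alpha_j \epsilon^2_{i,t-j} + \sum_{j=1}^{q_i} \beta_j \sigma^2_{i,t-j}$$

by default, or

$$\sigma^2_{i,t} = \exp(\gamma_i z_{i,t}) + \sum_{j=1}^{p_i} \alpha_j \epsilon^2_{i,t-j} + \sum_{j=1}^{q_i} \beta_j \sigma^2_{i,t-j}$$

when the het() option is specified, where $\gamma_i$ is a $1 \times p$ vector of parameters, $z_{i,t}$ is a $p \times 1$
vector of independent variables including a constant term, the $\alpha_j$'s are ARCH parameters,
and the $\beta_j$'s are GARCH parameters;

$R_t$ is a matrix of conditional quasicorrelations,

$$R_t = \begin{pmatrix}
1 & \rho_{12,t} & \cdots & \rho_{1m,t} \\
\rho_{12,t} & 1 & \cdots & \rho_{2m,t} \\
\vdots & \vdots & \ddots & \vdots \\
\rho_{1m,t} & \rho_{2m,t} & \cdots & 1
\end{pmatrix}$$

$\tilde{\epsilon}_t$ is an $m \times 1$ vector of standardized residuals, $D_t^{-1/2} \epsilon_t$; and

$\lambda_1$ and $\lambda_2$ are parameters that govern the dynamics of conditional quasicorrelations. $\lambda_1$ and
$\lambda_2$ are nonnegative and satisfy $0 \le \lambda_1 + \lambda_2 < 1$.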
When $Q_t$ is stationary, the $R$ matrix in (1) is a weighted average of the unconditional covariance
matrix of the standardized residuals $\tilde{\epsilon}_t$, denoted by $\bar{R}$, and the unconditional mean of $Q_t$, denoted by
$\bar{Q}$. Because $\bar{R} \neq \bar{Q}$, as shown by Aielli (2009), $R$ is neither the unconditional correlation matrix nor
the unconditional mean of $Q_t$. For this reason, the parameters in $R$ are known as quasicorrelations;
see Aielli (2009) and Engle (2009) for discussions.
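To make the recursion in the technical note concrete, the following Mata sketch carries out a single update of the quasicorrelation matrix and builds the implied conditional correlation and covariance matrices for two series. All numbers are hypothetical inputs chosen for illustration, and the variable names (Qlag, elag, and so on) are not part of mgarch dcc; this is not how the command itself is implemented.

```stata
mata:
// Hypothetical inputs (not estimates from any dataset)
lambda1 = 0.03
lambda2 = 0.87
R    = (1, 0.67 \ 0.67, 1)     // quasicorrelation matrix R
Qlag = R                       // Q_{t-1}, started at R for illustration
elag = (1.2 \ -0.5)            // standardized residuals at t-1 (m x 1)

// Q_t = (1 - lambda1 - lambda2) R + lambda1 e e' + lambda2 Q_{t-1}
Q = (1 - lambda1 - lambda2)*R + lambda1*(elag*elag') + lambda2*Qlag

// R_t = diag(Q_t)^(-1/2) Q_t diag(Q_t)^(-1/2)
s  = 1 :/ sqrt(diagonal(Q))    // m x 1 vector of scaling factors
Rt = diag(s) * Q * diag(s)

// H_t = D_t^(1/2) R_t D_t^(1/2), with hypothetical conditional variances
D  = diag((0.0004, 0.0009))
Ht = sqrt(D) * Rt * sqrt(D)    // elementwise sqrt is D^(1/2) for diagonal D

Rt
Ht
end
```

The rescaling step guarantees that Rt has ones on its diagonal, so the diagonal of Ht reproduces the conditional variances in D.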

Some examples
Example 1: Model with common covariates
We have daily data on the stock returns of three car manufacturers (Toyota, Nissan, and Honda)
from January 2, 2003, to December 31, 2010, in the variables toyota, nissan, and honda. We
model the conditional means of the returns as a first-order vector autoregressive process and the
conditional covariances as a DCC MGARCH process in which the variance of each disturbance term
follows a GARCH(1,1) process.



. use http://www.stata-press.com/data/r13/stocks
(Data from Yahoo! Finance)
. mgarch dcc (toyota nissan honda = L.toyota L.nissan L.honda, noconstant),
> arch(1) garch(1)
Calculating starting values....
Optimizing log likelihood
(setting technique to bhhh)
Iteration 0:   log likelihood =  16902.435
Iteration 1:   log likelihood =  17005.448
Iteration 2:   log likelihood =  17157.958
Iteration 3:   log likelihood =  17267.363
Iteration 4:   log likelihood =   17318.29
Iteration 5:   log likelihood =  17353.029
Iteration 6:   log likelihood =  17369.115
Iteration 7:   log likelihood =  17388.035
Iteration 8:   log likelihood =  17401.254
Iteration 9:   log likelihood =  17435.556
(switching technique to nr)
Iteration 10:  log likelihood =  17451.739
Iteration 11:  log likelihood =  17474.645
Iteration 12:  log likelihood =  17481.987
Iteration 13:  log likelihood =  17484.827
Iteration 14:  log likelihood =  17484.949
Iteration 15:  log likelihood =   17484.95
Refining estimates
Iteration 0:   log likelihood =   17484.95
Iteration 1:   log likelihood =   17484.95

Dynamic conditional correlation MGARCH model
Sample: 1 - 2015                                   Number of obs   =      2014
Distribution: Gaussian                             Wald chi2(9)    =     19.54
Log likelihood =  17484.95                         Prob > chi2     =    0.0210

---------------------------------------------------------------------------------
                     |      Coef.  Std. Err.       z  P>|z|  [95% Conf. Interval]
---------------------+-----------------------------------------------------------
toyota               |
              toyota |
                 L1. |  -.0510866   .0339824   -1.50  0.133   -.117691   .0155177
              nissan |
                 L1. |   .0297834   .0247455    1.20  0.229  -.0187169   .0782837
               honda |
                 L1. |  -.0162826   .0300323   -0.54  0.588  -.0751449   .0425797
---------------------+-----------------------------------------------------------
ARCH_toyota          |
                arch |
                 L1. |   .0608223   .0086686    7.02  0.000   .0438321   .0778124
               garch |
                 L1. |   .9222207   .0111053   83.04  0.000   .9004547   .9439868
               _cons |   4.47e-06   1.15e-06    3.90  0.000   2.22e-06   6.72e-06
---------------------+-----------------------------------------------------------
nissan               |
              toyota |
                 L1. |   -.005672   .0389348   -0.15  0.884  -.0819828   .0706387
              nissan |
                 L1. |  -.0287095   .0309379   -0.93  0.353  -.0893466   .0319276
               honda |
                 L1. |   .0154979   .0358802    0.43  0.666   -.054826   .0858218
---------------------+-----------------------------------------------------------
ARCH_nissan          |
                arch |
                 L1. |    .084424   .0128192    6.59  0.000   .0592989   .1095492
               garch |
                 L1. |   .8994206   .0151125   59.52  0.000   .8698007   .9290406
               _cons |   7.21e-06   1.93e-06    3.74  0.000   3.43e-06    .000011
---------------------+-----------------------------------------------------------
honda                |
              toyota |
                 L1. |   -.027242   .0361819   -0.75  0.451  -.0981572   .0436732
              nissan |
                 L1. |   .0617495   .0271378    2.28  0.023   .0085603   .1149386
               honda |
                 L1. |   -.063507   .0332918   -1.91  0.056  -.1287578   .0017438
---------------------+-----------------------------------------------------------
ARCH_honda           |
                arch |
                 L1. |   .0490135   .0073695    6.65  0.000   .0345696   .0634573
               garch |
                 L1. |   .9331126   .0103685   90.00  0.000   .9127907   .9534344
               _cons |   5.35e-06   1.35e-06    3.95  0.000   2.69e-06   8.00e-06
---------------------+-----------------------------------------------------------
 corr(toyota,nissan) |   .6689543   .0168021   39.81  0.000   .6360228   .7018858
  corr(toyota,honda) |   .7259625   .0140156   51.80  0.000   .6984923   .7534326
  corr(nissan,honda) |   .6335659   .0180412   35.12  0.000   .5982058    .668926
---------------------+-----------------------------------------------------------
Adjustment           |
             lambda1 |   .0315274   .0088386    3.57  0.000   .0142041   .0488506
             lambda2 |   .8704193   .0613329   14.19  0.000    .750209   .9906295
---------------------------------------------------------------------------------

The iteration log has three parts: the dots from the search for initial values, the iteration log from
optimizing the log likelihood, and the iteration log from the refining step. A detailed discussion of
the optimization methods is in Methods and formulas.
The header describes the estimation sample and reports a Wald test against the null hypothesis
that all the coefficients on the independent variables in the mean equations are zero. Here the null
hypothesis is rejected at the 5% level.
The output table first presents results for the mean or variance parameters used to model each
dependent variable. Subsequently, the output table presents results for the conditional quasicorrelations.


For example, the conditional quasicorrelation between the standardized residuals for Toyota and Nissan
is estimated to be 0.67. Finally, the output table presents results for the adjustment parameters $\lambda_1$
and $\lambda_2$. In the example at hand, the estimates for both $\lambda_1$ and $\lambda_2$ are statistically significant.
The DCC MGARCH model reduces to the CCC MGARCH model when $\lambda_1 = \lambda_2 = 0$. The output
below shows that a Wald test rejects the null hypothesis that $\lambda_1 = \lambda_2 = 0$ at all conventional levels.
. test _b[Adjustment:lambda1] = _b[Adjustment:lambda2] = 0
 ( 1)  [Adjustment]lambda1 - [Adjustment]lambda2 = 0
 ( 2)  [Adjustment]lambda1 = 0
           chi2(  2) = 1102.45
         Prob > chi2 =    0.0000

These results indicate that the assumption of time-invariant conditional correlations maintained in
the CCC MGARCH model is too restrictive for these data.
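A related quantity of interest is the sum of the two adjustment parameters, which the model requires to be below 1 and which governs how persistent the quasicorrelation dynamics are. As a sketch, lincom (listed among the postestimation commands for mgarch dcc) can report this sum with a standard error and confidence interval; the commands below simply refit the model from example 1 so that the block is self-contained.

```stata
* Illustrative only: refit the example 1 model, then combine the estimates
use http://www.stata-press.com/data/r13/stocks, clear
quietly mgarch dcc (toyota nissan honda = L.toyota L.nissan L.honda, ///
        noconstant), arch(1) garch(1)

* Point estimate, standard error, and CI for lambda1 + lambda2
lincom _b[Adjustment:lambda1] + _b[Adjustment:lambda2]
```

Adding the two point estimates reported in the table above gives roughly 0.90.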

Example 2: Model with covariates that differ by equation


We improve the previous example by removing the insignificant parameters from the model. To
remove these parameters, we specify the honda equation separately from the toyota and nissan
equations:
. mgarch dcc (toyota nissan = , noconstant) (honda = L.nissan, noconstant),
> arch(1) garch(1)
Calculating starting values....
Optimizing log likelihood
(setting technique to bhhh)
Iteration 0:   log likelihood =  16884.502
Iteration 1:   log likelihood =  16970.755
Iteration 2:   log likelihood =  17140.318
Iteration 3:   log likelihood =  17237.807
Iteration 4:   log likelihood =   17306.12
Iteration 5:   log likelihood =  17342.533
Iteration 6:   log likelihood =  17363.511
Iteration 7:   log likelihood =  17392.501
Iteration 8:   log likelihood =  17407.242
Iteration 9:   log likelihood =  17448.702
(switching technique to nr)
Iteration 10:  log likelihood =  17472.199
Iteration 11:  log likelihood =  17475.842
Iteration 12:  log likelihood =  17476.345
Iteration 13:  log likelihood =   17476.35
Iteration 14:  log likelihood =   17476.35
Refining estimates
Iteration 0:   log likelihood =   17476.35
Iteration 1:   log likelihood =   17476.35

Dynamic conditional correlation MGARCH model
Sample: 1 - 2015                                   Number of obs   =      2014
Distribution: Gaussian                             Wald chi2(1)    =      2.21
Log likelihood =  17476.35                         Prob > chi2     =    0.1374

---------------------------------------------------------------------------------
                     |      Coef.  Std. Err.       z  P>|z|  [95% Conf. Interval]
---------------------+-----------------------------------------------------------
ARCH_toyota          |
                arch |
                 L1. |   .0608188   .0086675    7.02  0.000   .0438308   .0778067
               garch |
                 L1. |   .9219957   .0111066   83.01  0.000   .9002271   .9437643
               _cons |   4.49e-06   1.14e-06    3.95  0.000   2.27e-06   6.72e-06
---------------------+-----------------------------------------------------------
ARCH_nissan          |
                arch |
                 L1. |   .0876161     .01302    6.73  0.000   .0620974   .1131348
               garch |
                 L1. |   .8950964   .0152908   58.54  0.000    .865127   .9250658
               _cons |   7.69e-06   1.99e-06    3.86  0.000   3.79e-06   .0000116
---------------------+-----------------------------------------------------------
honda                |
              nissan |
                 L1. |    .019978   .0134488    1.49  0.137  -.0063811   .0463371
---------------------+-----------------------------------------------------------
ARCH_honda           |
                arch |
                 L1. |   .0488799   .0073767    6.63  0.000   .0344218    .063338
               garch |
                 L1. |   .9330047   .0103944   89.76  0.000    .912632   .9533774
               _cons |   5.42e-06   1.36e-06    3.98  0.000   2.75e-06   8.08e-06
---------------------+-----------------------------------------------------------
 corr(toyota,nissan) |   .6668433   .0163209   40.86  0.000   .6348548   .6988317
  corr(toyota,honda) |   .7258101   .0137072   52.95  0.000   .6989446   .7526757
  corr(nissan,honda) |   .6313515   .0175454   35.98  0.000   .5969631   .6657399
---------------------+-----------------------------------------------------------
Adjustment           |
             lambda1 |   .0324493   .0074013    4.38  0.000   .0179429   .0469556
             lambda2 |   .8574681   .0476274   18.00  0.000   .7641202   .9508161
---------------------------------------------------------------------------------

It turns out that the coefficient on L1.nissan in the honda equation is now statistically insignificant.
We could further improve the model by removing L1.nissan from the model.
There is no mean equation for Toyota or Nissan. In [TS] mgarch dcc postestimation, we discuss
prediction from models without covariates.
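The further simplification suggested above, dropping L1.nissan from the honda equation, could be sketched as follows. With no covariates left in any mean equation, all three series can share a single equation specification; this refit is an illustration and its output is not shown in this entry.

```stata
* Illustrative only: the example 2 model without L1.nissan in the honda equation
use http://www.stata-press.com/data/r13/stocks, clear
mgarch dcc (toyota nissan honda = , noconstant), arch(1) garch(1)
```

This follows the same pattern as the (toyota nissan = , noconstant) specification used in example 2.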


Example 3: Model with constraints


Here we fit a bivariate DCC MGARCH model for the Toyota and Nissan shares. We believe that
the shares of these car manufacturers follow the same process, so we impose the constraints that the
ARCH coefficients are the same for the two companies and that the GARCH coefficients are also the
same.
. constraint 1 _b[ARCH_toyota:L.arch] = _b[ARCH_nissan:L.arch]
. constraint 2 _b[ARCH_toyota:L.garch] = _b[ARCH_nissan:L.garch]
. mgarch dcc (toyota nissan = , noconstant), arch(1) garch(1) constraints(1 2)
Calculating starting values....
Optimizing log likelihood
(setting technique to bhhh)
Iteration 0:   log likelihood =  10307.609
Iteration 1:   log likelihood =  10656.153
Iteration 2:   log likelihood =  10862.137
Iteration 3:   log likelihood =  10987.457
Iteration 4:   log likelihood =  11062.347
Iteration 5:   log likelihood =  11135.207
Iteration 6:   log likelihood =  11245.619
Iteration 7:   log likelihood =   11253.56
Iteration 8:   log likelihood =      11294
Iteration 9:   log likelihood =  11296.364
(switching technique to nr)
Iteration 10:  log likelihood =   11296.76
Iteration 11:  log likelihood =  11297.087
Iteration 12:  log likelihood =  11297.091
Iteration 13:  log likelihood =  11297.091
Refining estimates
Iteration 0:   log likelihood =  11297.091
Iteration 1:   log likelihood =  11297.091

Dynamic conditional correlation MGARCH model
Sample: 1 - 2015                                   Number of obs   =      2015
Distribution: Gaussian                             Wald chi2(.)    =         .
Log likelihood =  11297.09                         Prob > chi2     =         .
 ( 1)  [ARCH_toyota]L.arch - [ARCH_nissan]L.arch = 0
 ( 2)  [ARCH_toyota]L.garch - [ARCH_nissan]L.garch = 0

---------------------------------------------------------------------------------
                     |      Coef.  Std. Err.       z  P>|z|  [95% Conf. Interval]
---------------------+-----------------------------------------------------------
ARCH_toyota          |
                arch |
                 L1. |    .080889   .0103227    7.84  0.000    .060657   .1011211
               garch |
                 L1. |   .9060711   .0119107   76.07  0.000   .8827267   .9294156
               _cons |   4.21e-06   1.10e-06    3.83  0.000   2.05e-06   6.36e-06
---------------------+-----------------------------------------------------------
ARCH_nissan          |
                arch |
                 L1. |    .080889   .0103227    7.84  0.000    .060657   .1011211
               garch |
                 L1. |   .9060711   .0119107   76.07  0.000   .8827267   .9294156
               _cons |   5.92e-06   1.47e-06    4.03  0.000   3.04e-06   8.80e-06
---------------------+-----------------------------------------------------------
 corr(toyota,nissan) |   .6646283   .0187793   35.39  0.000   .6278215   .7014351
---------------------+-----------------------------------------------------------
Adjustment           |
             lambda1 |   .0446559   .0123017    3.63  0.000    .020545   .0687668
             lambda2 |   .8686054   .0510885   17.00  0.000   .7684739    .968737
---------------------------------------------------------------------------------

We could test our constraints by fitting the unconstrained model and performing a likelihood-ratio
test. The results indicate that the restricted model is preferable.
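The likelihood-ratio test mentioned above is not shown in this entry. One way it might be carried out is sketched below, using estimates store and lrtest; the names unrestricted and restricted are arbitrary labels chosen for illustration.

```stata
* Illustrative only: compare the constrained and unconstrained bivariate models
use http://www.stata-press.com/data/r13/stocks, clear

* Unconstrained model
quietly mgarch dcc (toyota nissan = , noconstant), arch(1) garch(1)
estimates store unrestricted

* Constrained model: equal ARCH and equal GARCH coefficients
constraint 1 _b[ARCH_toyota:L.arch] = _b[ARCH_nissan:L.arch]
constraint 2 _b[ARCH_toyota:L.garch] = _b[ARCH_nissan:L.garch]
quietly mgarch dcc (toyota nissan = , noconstant), arch(1) garch(1) ///
        constraints(1 2)
estimates store restricted

* Likelihood-ratio test of the two constraints
lrtest unrestricted restricted
```

A large p-value from lrtest would be consistent with the statement that the restricted model is preferable.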

Example 4: Model with a GARCH term


In this example, we have data on fictional stock returns for the Acme and Anvil corporations, and
we believe that the movement of the two stocks is governed by different processes. We specify one
ARCH and one GARCH term for the conditional variance equation for Acme and two ARCH terms for
the conditional variance equation for Anvil. In addition, we include the lagged value of the stock
return for Apex, the main subsidiary of Anvil corporation, in the variance equation of Anvil. For
Acme, we have data on the changes in an index of futures prices of products related to those produced
by Acme in afrelated. For Anvil, we have data on the changes in an index of futures prices of
inputs used by Anvil in afinputs.


. use http://www.stata-press.com/data/r13/acmeh
. mgarch dcc (acme = afrelated, noconstant arch(1) garch(1))
> (anvil = afinputs, arch(1/2) het(L.apex))
Calculating starting values....
Optimizing log likelihood
(setting technique to bhhh)
Iteration 0:   log likelihood = -13260.522
 (output omitted )
Iteration 9:   log likelihood = -12362.876
(switching technique to nr)
Iteration 10:  log likelihood = -12362.876
Refining estimates
Iteration 0:   log likelihood = -12362.876
Iteration 1:   log likelihood = -12362.876

Dynamic conditional correlation MGARCH model
Sample: 1 - 2500                                   Number of obs   =      2499
Distribution: Gaussian                             Wald chi2(2)    =   2596.18
Log likelihood = -12362.88                         Prob > chi2     =    0.0000

---------------------------------------------------------------------------------
                     |      Coef.  Std. Err.       z  P>|z|  [95% Conf. Interval]
---------------------+-----------------------------------------------------------
acme                 |
           afrelated |    .950805   .0557082   17.07  0.000    .841619   1.059991
---------------------+-----------------------------------------------------------
ARCH_acme            |
                arch |
                 L1. |   .1063295   .0157161    6.77  0.000   .0755266   .1371324
               garch |
                 L1. |   .7556294   .0391568   19.30  0.000   .6788836   .8323753
               _cons |   2.197566    .458343    4.79  0.000    1.29923   3.095901
---------------------+-----------------------------------------------------------
anvil                |
            afinputs |  -1.015657   .0209959  -48.37  0.000  -1.056808  -.9745054
               _cons |   .0808653    .019445    4.16  0.000   .0427538   .1189767
---------------------+-----------------------------------------------------------
ARCH_anvil           |
                arch |
                 L1. |   .5261675   .0281586   18.69  0.000   .4709777   .5813572
                 L2. |   .2866454   .0196504   14.59  0.000   .2481314   .3251595
                apex |
                 L1. |   1.953173   .0594862   32.83  0.000   1.836582   2.069764
               _cons |  -.0062964   .0710842   -0.09  0.929  -.1456188   .1330261
---------------------+-----------------------------------------------------------
    corr(acme,anvil) |  -.5600358   .0326358  -17.16  0.000  -.6240008  -.4960708
---------------------+-----------------------------------------------------------
Adjustment           |
             lambda1 |   .1904321   .0154449   12.33  0.000   .1601607   .2207035
             lambda2 |   .7147267   .0226204   31.60  0.000   .6703916   .7590618
---------------------------------------------------------------------------------

The results indicate that increases in the futures prices for related products lead to higher returns on
the Acme stock, and increased input prices lead to lower returns on the Anvil stock. In the conditional
variance equation for Anvil, the coefficient on L1.apex is positive and significant, which indicates
that an increase in the return on the Apex stock leads to more variability in the return on the Anvil
stock.
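To see how the het() estimates enter, the fitted conditional variance equation for Anvil can be written out by substituting the rounded ARCH_anvil estimates from the table into the het() form of the variance equation given in the technical note. This is a reading of the reported output under the assumption that the ARCH_anvil constant and the coefficient on L1.apex both sit inside the exponential, as that specification implies.

```latex
\widehat{\sigma}^2_{\text{anvil},t}
  = \exp\left(-0.0063 + 1.9532\,\text{apex}_{t-1}\right)
  + 0.5262\,\widehat{\epsilon}^{\,2}_{\text{anvil},t-1}
  + 0.2866\,\widehat{\epsilon}^{\,2}_{\text{anvil},t-2}
```

Because the lagged Apex return enters through the exponential, its effect on the variance is multiplicative rather than additive.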

Stored results
mgarch dcc stores the following in e():

Scalars
  e(N)              number of observations
  e(k)              number of parameters
  e(k_aux)          number of auxiliary parameters
  e(k_extra)        number of extra estimates added to _b
  e(k_eq)           number of equations in e(b)
  e(k_dv)           number of dependent variables
  e(df_m)           model degrees of freedom
  e(ll)             log likelihood
  e(chi2)           χ²
  e(p)              significance
  e(estdf)          1 if distribution parameter was estimated, 0 otherwise
  e(usr)            user-provided distribution parameter
  e(tmin)           minimum time in sample
  e(tmax)           maximum time in sample
  e(N_gaps)         number of gaps
  e(rank)           rank of e(V)
  e(ic)             number of iterations
  e(rc)             return code
  e(converged)      1 if converged, 0 otherwise

Macros
  e(cmd)            mgarch
  e(model)          dcc
  e(cmdline)        command as typed
  e(depvar)         names of dependent variables
  e(covariates)     list of covariates
  e(dv_eqs)         dependent variables with mean equations
  e(indeps)         independent variables in each equation
  e(tvar)           time variable
  e(title)          title in estimation output
  e(chi2type)       Wald; type of model χ² test
  e(vce)            vcetype specified in vce()
  e(vcetype)        title used to label Std. Err.
  e(tmins)          formatted minimum time
  e(tmaxs)          formatted maximum time
  e(dist)           distribution for error term: gaussian or t
  e(arch)           specified ARCH terms
  e(garch)          specified GARCH terms
  e(technique)      maximization technique
  e(properties)     b V
  e(estat_cmd)      program used to implement estat
  e(predict)        program used to implement predict
  e(marginsok)      predictions allowed by margins
  e(marginsnotok)   predictions disallowed by margins

Matrices
  e(b)              coefficient vector
  e(Cns)            constraints matrix
  e(ilog)           iteration log (up to 20 iterations)
  e(gradient)       gradient vector
  e(hessian)        Hessian matrix
  e(V)              variance–covariance matrix of the estimators
  e(pinfo)          parameter information, used by predict

Functions
  e(sample)         marks estimation sample
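As a brief sketch of how these results might be inspected after estimation, the commands below fit a small model on the stocks dataset used in the examples and then display a few of the stored scalars, macros, and matrices. The choice of model is arbitrary.

```stata
* Illustrative only: fit a model, then look at some stored results
use http://www.stata-press.com/data/r13/stocks, clear
quietly mgarch dcc (toyota nissan = , noconstant), arch(1) garch(1)

ereturn list                                  // everything in e()
display "log likelihood = " e(ll)
display "converged      = " e(converged)
display "model          = `e(model)'"
matrix list e(b)                              // coefficient vector
count if e(sample)                            // size of the estimation sample
```

Macros such as e(model) are referenced with macro quotes, whereas scalars such as e(ll) can be used directly in expressions.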


Methods and formulas


mgarch dcc estimates the parameters of the DCC MGARCH model by maximum likelihood. The
log-likelihood function based on the multivariate normal distribution for observation t is

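$$l_t = -0.5\, m \log(2\pi) - 0.5 \log\{\det(R_t)\} - \log\{\det(D_t^{1/2})\} - 0.5\, \tilde{\epsilon}_t R_t^{-1} \tilde{\epsilon}_t'$$

where $\tilde{\epsilon}_t = D_t^{-1/2} \epsilon_t$ is an $m \times 1$ vector of standardized residuals, $\epsilon_t = y_t - C x_t$. The log-likelihood
function is $\sum_{t=1}^{T} l_t$.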
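If we assume that $\nu_t$ follow a multivariate t distribution with degrees of freedom (df) greater than
2, then the log-likelihood function for observation t is

$$l_t = \log \Gamma\left(\frac{\mathrm{df} + m}{2}\right) - \log \Gamma\left(\frac{\mathrm{df}}{2}\right) - \frac{m}{2} \log\{(\mathrm{df} - 2)\pi\} - 0.5 \log\{\det(R_t)\} - \log\{\det(D_t^{1/2})\} - \frac{\mathrm{df} + m}{2} \log\left(1 + \frac{\tilde{\epsilon}_t R_t^{-1} \tilde{\epsilon}_t'}{\mathrm{df} - 2}\right)$$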

The starting values for the parameters in the mean equations and the initial residuals $\hat{\epsilon}_t$ are
obtained by least-squares regression. The starting values for the parameters in the variance equations
are obtained by a procedure proposed by Gourieroux and Monfort (1997, sec. 6.2.2). The starting
values for the quasicorrelation parameters are calculated from the standardized residuals $\tilde{\epsilon}_t$. Given
the starting values for the mean and variance equations, the starting values for the parameters $\lambda_1$ and
$\lambda_2$ are obtained from a grid search performed on the log likelihood.
The initial optimization step is performed in the unconstrained space. Once the maximum is found,
we impose the constraints $\lambda_1 \ge 0$, $\lambda_2 \ge 0$, and $0 \le \lambda_1 + \lambda_2 < 1$, and maximize the log likelihood
in the constrained space. This step is reported in the iteration log as the refining step.
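GARCH estimators require initial values that can be plugged in for $\epsilon_{t-i} \epsilon_{t-i}'$ and
$H_{t-j}$ when $t - i < 1$ and $t - j < 1$. mgarch dcc substitutes an estimator of the unconditional covariance of the
disturbances

$$\widehat{\Sigma} = T^{-1} \sum_{t=1}^{T} \widehat{\widehat{\epsilon}}_t\, \widehat{\widehat{\epsilon}}_t' \qquad (2)$$

for $\epsilon_{t-i} \epsilon_{t-i}'$ when $t - i < 1$ and for $H_{t-j}$ when $t - j < 1$, where $\widehat{\widehat{\epsilon}}_t$ is the vector of residuals
calculated using the estimated parameters.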
mgarch dcc uses numerical derivatives in maximizing the log-likelihood function.

References
Aielli, G. P. 2009. Dynamic Conditional Correlations: On Properties and Estimation. Working paper, Dipartimento di
Statistica, University of Florence, Florence, Italy.
Engle, R. F. 2002. Dynamic conditional correlation: A simple class of multivariate generalized autoregressive conditional
heteroskedasticity models. Journal of Business & Economic Statistics 20: 339–350.
Engle, R. F. 2009. Anticipating Correlations: A New Paradigm for Risk Management. Princeton, NJ: Princeton University
Press.
Gourieroux, C. S., and A. Monfort. 1997. Time Series and Dynamic Models. Trans. ed. G. M. Gallo. Cambridge:
Cambridge University Press.

Also see
[TS] mgarch dcc postestimation Postestimation tools for mgarch dcc
[TS] mgarch Multivariate GARCH models
[TS] tsset Declare data to be time-series data
[TS] arch Autoregressive conditional heteroskedasticity (ARCH) family of estimators
[TS] var Vector autoregressive models
[U] 20 Estimation and postestimation commands

Title
mgarch dcc postestimation Postestimation tools for mgarch dcc
Description
Syntax for predict
Menu for predict
Options for predict
Remarks and examples
Methods and formulas
Also see

Description
The following standard postestimation commands are available after mgarch dcc:

Command           Description

contrast          contrasts and ANOVA-style joint tests of estimates
estat ic          Akaike's and Schwarz's Bayesian information criteria (AIC and BIC)
estat summarize   summary statistics for the estimation sample
estat vce         variance–covariance matrix of the estimators (VCE)
estimates         cataloging estimation results
forecast          dynamic forecasts and simulations
lincom            point estimates, standard errors, testing, and inference for linear combinations
                  of coefficients
lrtest            likelihood-ratio test
margins           marginal means, predictive margins, marginal effects, and average marginal
                  effects
marginsplot       graph the results from margins (profile plots, interaction plots, etc.)
nlcom             point estimates, standard errors, testing, and inference for nonlinear combinations
                  of coefficients
predict           predictions, residuals, influence statistics, and other diagnostic measures
predictnl         point estimates, standard errors, testing, and inference for generalized predictions
pwcompare         pairwise comparisons of estimates
test              Wald tests of simple and composite linear hypotheses
testnl            Wald tests of nonlinear hypotheses
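As a short illustration of the commands in this table, the sketch below fits a small model on the stocks dataset from [TS] mgarch dcc and then requests information criteria and summary statistics. The model specification is arbitrary.

```stata
* Illustrative only: two of the listed postestimation commands
use http://www.stata-press.com/data/r13/stocks, clear
quietly mgarch dcc (toyota nissan = , noconstant), arch(1) garch(1)

estat ic                 // AIC and BIC for the fitted model
matrix S = r(S)          // keep the table of criteria returned by estat ic
estat summarize          // summary statistics for the estimation sample
```

Information criteria from estat ic can be compared across alternative specifications fit to the same sample.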


Syntax for predict

    predict [type] {stub* | newvarlist} [if] [in] [, statistic options]

statistic         Description

Main
  xb              linear prediction; the default
  residuals       residuals
  variance        conditional variances and covariances
  correlation     conditional correlations

These statistics are available both in and out of sample; type predict ... if e(sample) ... if wanted only for
the estimation sample.

options                   Description

Options
  equation(eqnames)       names of equations for which predictions are made
  dynamic(time_constant)  begin dynamic forecast at specified time

Menu for predict
Statistics > Postestimation > Predictions, residuals, etc.

Options for predict




Main

xb, the default, calculates the linear predictions of the dependent variables.
residuals calculates the residuals.
variance predicts the conditional variances and conditional covariances.
correlation predicts the conditional correlations.

Options

equation(eqnames) specifies the equation for which the predictions are calculated. Use this option
to predict a statistic for a particular equation. Equation names, such as equation(income), are
used to identify equations.
One equation name may be specified when predicting the dependent variable, the residuals, or
the conditional variance. For example, specifying equation(income) causes predict to predict
income, and specifying variance equation(income) causes predict to predict the conditional
variance of income.
Two equations may be specified when predicting a conditional variance or covariance. For example, specifying equation(income, consumption) variance causes predict to predict the
conditional covariance of income and consumption.


dynamic(time_constant) specifies when predict starts producing dynamic forecasts. The specified
time_constant must be in the scale of the time variable specified in tsset, and the time_constant
must be inside a sample for which observations on the dependent variables are available. For
example, dynamic(tq(2008q4)) causes dynamic predictions to begin in the fourth quarter of
2008, assuming that your time variable is quarterly; see [D] datetime. If the model contains
exogenous variables, they must be present for the whole predicted sample. dynamic() may not
be specified with residuals.
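The statistic and equation() combinations described above can be sketched as follows. The model is the one from example 1 of [TS] mgarch dcc, so the equation names toyota, nissan, and honda are available; the new variable names (xb_honda, v_toyota, cov_tn, and the stub rho) are arbitrary choices for illustration.

```stata
* Illustrative only: predictions for selected equations
use http://www.stata-press.com/data/r13/stocks, clear
quietly mgarch dcc (toyota nissan honda = L.toyota L.nissan L.honda, ///
        noconstant), arch(1) garch(1)

* Linear prediction for one equation
predict xb_honda, xb equation(honda)

* Conditional variance of one series
predict v_toyota, variance equation(toyota)

* Conditional covariance of two series
predict cov_tn, variance equation(toyota, nissan)

* All conditional correlations, one new variable per pair
predict rho*, correlation
```

When a stub such as rho* is given, predict chooses the suffixes itself; describe rho* shows the names it created.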

Remarks and examples


We assume that you have already read [TS] mgarch dcc. In this entry, we use predict after
mgarch dcc to make in-sample and out-of-sample forecasts.

Example 1: Dynamic forecasts


In this example, we obtain dynamic forecasts for the Toyota, Nissan, and Honda stock returns
modeled in example 2 of [TS] mgarch dcc. In the output below, we reestimate the parameters of the
model, use tsappend (see [TS] tsappend) to extend the data, and use predict to obtain in-sample
one-step-ahead forecasts and dynamic forecasts of the conditional variances of the returns. We graph
the forecasts below.

. use http://www.stata-press.com/data/r13/stocks
(Data from Yahoo! Finance)
. quietly mgarch dcc (toyota nissan = , noconstant)
> (honda = L.nissan, noconstant), arch(1) garch(1)
. tsappend, add(50)
. predict H*, variance dynamic(2016)

[Figure: time-series plot of the predicted conditional variances against Date, 01jan2009 to 01jan2011. Series: Variance prediction (toyota,toyota), dynamic(2016); Variance prediction (nissan,nissan), dynamic(2016); Variance prediction (honda,honda), dynamic(2016).]

Recent in-sample one-step-ahead forecasts are plotted to the left of the vertical line in the above
graph, and the dynamic out-of-sample forecasts appear to the right of the vertical line. The graph
shows the tail end of the huge increase in return volatility that took place in 2008 and 2009. It also
shows that the dynamic forecasts quickly converge.
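The command that draws the graph is not shown above. A plot of this kind might be produced along the following lines. This is a sketch under assumptions: it supposes that predict H*, variance names the variance predictions H_toyota_toyota, H_nissan_nissan, and H_honda_honda, that the time variable is called t, and that observation 1600 is a reasonable place to start the plot; describe H* and tsset can be used to confirm the first two.

```stata
* Illustrative only: refit, forecast, and graph the conditional variances
use http://www.stata-press.com/data/r13/stocks, clear
quietly mgarch dcc (toyota nissan = , noconstant) ///
        (honda = L.nissan, noconstant), arch(1) garch(1)
tsappend, add(50)
predict H*, variance dynamic(2016)

* Confirm the names that predict assigned to the new variables
describe H*

* Assumed variable names and time variable; the vertical line marks
* the last in-sample observation
tsline H_toyota_toyota H_nissan_nissan H_honda_honda if t > 1600, ///
       xline(2015) legend(rows(3))
```

The vertical line at 2015 separates the in-sample one-step-ahead forecasts from the dynamic out-of-sample forecasts.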


Methods and formulas


All one-step predictions are obtained by substituting the parameter estimates into the model. The
estimated unconditional variance matrix of the disturbances, $\widehat{\Sigma}$, is the initial value for the ARCH and
GARCH terms. The postestimation routines recompute $\widehat{\Sigma}$ using the prediction sample, the parameter
estimates stored in e(b), and (2) in Methods and formulas of [TS] mgarch dcc.
For observations in which the residuals are missing, the estimated unconditional variance matrix
of the disturbances is used in place of the outer product of the residuals.
Dynamic predictions of the dependent variables use previously predicted values beginning in the
period specified by dynamic().
Dynamic variance predictions are implemented by substituting $\widehat{\Sigma}$ for the outer product of the
residuals beginning in the period specified in dynamic().

Also see
[TS] mgarch dcc Dynamic conditional correlation multivariate GARCH models
[U] 20 Estimation and postestimation commands

Title
mgarch dvech Diagonal vech multivariate GARCH models
Syntax
Menu
Description
Options
Remarks and examples
Stored results
Methods and formulas
References
Also see

Syntax
    mgarch dvech eq [eq ... eq] [if] [in] [, options]

where each eq has the form

    (depvars = [indepvars] [, noconstant])
options                       Description

Model
  arch(numlist)               ARCH terms
  garch(numlist)              GARCH terms
  distribution(dist [#])      use dist distribution for errors (may be gaussian,
                              normal, or t; default is gaussian)
  constraints(numlist)        apply linear constraints

SE/Robust
  vce(vcetype)                vcetype may be oim or robust

Reporting
  level(#)                    set confidence level; default is level(95)
  nocnsreport                 do not display constraints
  display_options             control column formats, row spacing, line width, display of omitted
                              variables and base and empty cells, and factor-variable labeling

Maximization
  maximize_options            control the maximization process; seldom used
  from(matname)               initial values for the coefficients; seldom used
  svtechnique(algorithm_spec) starting-value maximization algorithm
  sviterate(#)                number of starting-value iterations; default is sviterate(25)

  coeflegend                  display legend instead of statistics

You must tsset your data before using mgarch dvech; see [TS] tsset.
indepvars may contain factor variables; see [U] 11.4.3 Factor variables.
depvars and indepvars may contain time-series operators; see [U] 11.4.4 Time-series varlists.
by, fp, rolling, and statsby are allowed; see [U] 11.1.10 Prefix commands.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.


Menu
Statistics > Multivariate time series > Multivariate GARCH

Description
mgarch dvech estimates the parameters of diagonal vech (DVECH) multivariate generalized autoregressive conditionally heteroskedastic (MGARCH) models in which each element of the conditional
covariance matrix is parameterized as a linear function of its own past and past shocks.
DVECH MGARCH models are less parsimonious than the conditional correlation models discussed
in [TS] mgarch ccc, [TS] mgarch dcc, and [TS] mgarch vcc because the number of parameters in
DVECH MGARCH models increases more rapidly with the number of series modeled.

Options


Model

noconstant suppresses the constant term(s).


arch(numlist) specifies the ARCH terms in the model. By default, no ARCH terms are specified.
garch(numlist) specifies the GARCH terms in the model. By default, no GARCH terms are specified.
 
distribution(dist [#]) specifies the assumed distribution for the errors. dist may be gaussian,
normal, or t.
gaussian and normal are synonyms; each causes mgarch dvech to assume that the errors come
from a multivariate normal distribution. # cannot be specified with either of them.
t causes mgarch dvech to assume that the errors follow a multivariate Student t distribution, and
the degree-of-freedom parameter is estimated along with the other parameters of the model. If
distribution(t #) is specified, then mgarch dvech uses a multivariate Student t distribution
with # degrees of freedom. # must be greater than 2.
constraints(numlist) specifies linear constraints to apply to the parameter estimates.

SE/Robust

vce(vcetype) specifies the estimator for the variance-covariance matrix of the estimator.
vce(oim), the default, specifies to use the observed information matrix (OIM) estimator.
vce(robust) specifies to use the Huber/White/sandwich estimator.
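
As a minimal sketch of the Model and SE/Robust options described above, the commands below refit one bivariate model under different error distributions and then with robust standard errors. The dataset and model are borrowed from example 1 in Remarks and examples; the fixed value of 5 degrees of freedom is an arbitrary illustration, and convergence is not guaranteed for every specification.

```stata
* Sketch only: dataset and model are borrowed from example 1 of this entry
use http://www.stata-press.com/data/r13/irates4, clear

* Student t errors; the degree-of-freedom parameter is estimated
mgarch dvech (D.bond D.tbill = LD.bond LD.tbill), arch(1) distribution(t)

* Student t errors with degrees of freedom fixed at 5 (arbitrary; # must exceed 2)
mgarch dvech (D.bond D.tbill = LD.bond LD.tbill), arch(1) distribution(t 5)

* Gaussian errors (the default) with Huber/White/sandwich standard errors
mgarch dvech (D.bond D.tbill = LD.bond LD.tbill), arch(1) vce(robust)
```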

Reporting

level(#); see [R] estimation options.


nocnsreport; see [R] estimation options.
display_options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel, fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and
nolstretch; see [R] estimation options.


Maximization

 
maximize_options: difficult, technique(algorithm_spec), iterate(#), [no]log, trace,
gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#),
nrtolerance(#), nonrtolerance, and from(matname); see [R] maximize for all options except
from(), and see below for information on from(). These options are seldom used.
from(matname) specifies initial values for the coefficients. from(b0) causes mgarch dvech to begin
the optimization algorithm with the values in b0. b0 must be a row vector, and the number of
columns must equal the number of parameters in the model.
svtechnique(algorithm_spec) and sviterate(#) specify options for the starting-value search
process.
svtechnique(algorithm_spec) specifies the algorithm used to search for initial values. The
syntax for algorithm_spec is the same as for the technique() option; see [R] maximize.
svtechnique(bhhh 5 nr 16000) is the default. This option may not be specified with
from().
sviterate(#) specifies the maximum number of iterations that the search algorithm may
perform. The default is sviterate(25). This option may not be specified with from().
The following option is available with mgarch dvech but is not shown in the dialog box:
coeflegend; see [R] estimation options.
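
The maximization options above can be sketched as follows. This is an illustration under assumptions, not a recommended workflow: the dataset and model come from example 1, the value 50 for sviterate() is arbitrary, and b0 is simply a name chosen here for the matrix of initial values.

```stata
* Sketch only: dataset and model are borrowed from example 1 of this entry
use http://www.stata-press.com/data/r13/irates4, clear

* Allow up to 50 iterations (arbitrary) in the starting-value search
mgarch dvech (D.bond D.tbill = LD.bond LD.tbill), arch(1) sviterate(50)

* e(b) is a row vector with one column per parameter, as from() requires
matrix b0 = e(b)

* Refit starting from b0; svtechnique() and sviterate() may not be combined with from()
mgarch dvech (D.bond D.tbill = LD.bond LD.tbill), arch(1) from(b0)
```

Because the second fit starts at the previous solution, it would be expected to converge in very few iterations.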

Remarks and examples


We assume that you have already read [TS] mgarch, which provides an introduction to MGARCH
models and the methods implemented in mgarch dvech.
MGARCH models are dynamic multivariate regression models in which the conditional variances
and covariances of the errors follow an autoregressive-moving-average structure. The DVECH MGARCH
model parameterizes each element of the current conditional covariance matrix as a linear function
of its own past and past shocks.

As discussed in [TS] mgarch, MGARCH models differ in the parsimony and flexibility of their
specifications for a time-varying conditional covariance matrix of the disturbances, denoted by $H_t$.
In a DVECH MGARCH model with one ARCH term and one GARCH term, the $(i, j)$th element of the
conditional covariance matrix is modeled by

$$h_{ij,t} = s_{ij} + a_{ij}\,\epsilon_{i,t-1}\,\epsilon_{j,t-1} + b_{ij}\,h_{ij,t-1}$$

where $s_{ij}$, $a_{ij}$, and $b_{ij}$ are parameters and $\epsilon_{t-1}$ is the vector of errors from the previous period. This
expression shows the linear form in which each element of the current conditional covariance matrix
is a function of its own past and past shocks.
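
As a worked illustration (a sketch that simply writes out the expression above for two dependent variables, m = 2), the model consists of three scalar recursions, one per distinct element of the conditional covariance matrix:

```latex
\begin{aligned}
h_{11,t} &= s_{11} + a_{11}\,\epsilon_{1,t-1}^{2} + b_{11}\,h_{11,t-1}
  && \text{(conditional variance of variable 1)} \\
h_{21,t} &= s_{21} + a_{21}\,\epsilon_{2,t-1}\,\epsilon_{1,t-1} + b_{21}\,h_{21,t-1}
  && \text{(conditional covariance)} \\
h_{22,t} &= s_{22} + a_{22}\,\epsilon_{2,t-1}^{2} + b_{22}\,h_{22,t-1}
  && \text{(conditional variance of variable 2)}
\end{aligned}
```

The subscripts 11, 21, and 22 correspond to the labels 1_1, 2_1, and 2_2 used in the estimation output.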

Technical note
The general vech MGARCH model developed by Bollerslev, Engle, and Wooldridge (1988) can be
written as

$$y_t = Cx_t + \epsilon_t \qquad (1)$$

$$\epsilon_t = H_t^{1/2}\nu_t \qquad (2)$$

$$h_t = s + \sum_{i=1}^{p} A_i\,\mathrm{vech}(\epsilon_{t-i}\epsilon_{t-i}') + \sum_{j=1}^{q} B_j\,h_{t-j} \qquad (3)$$

where

$y_t$ is an $m \times 1$ vector of dependent variables;
$C$ is an $m \times k$ matrix of parameters;
$x_t$ is a $k \times 1$ vector of independent variables, which may contain lags of $y_t$;
$H_t^{1/2}$ is the Cholesky factor of the time-varying conditional covariance matrix $H_t$;
$\nu_t$ is an $m \times 1$ vector of independent and identically distributed innovations;
$h_t = \mathrm{vech}(H_t)$;
the vech() function stacks the lower diagonal elements of a symmetric matrix into a column
vector, for example,

$$\mathrm{vech}\begin{pmatrix} 1 & 2 \\ 2 & 3 \end{pmatrix} = (1, 2, 3)'$$

$s$ is an $m(m + 1)/2 \times 1$ vector of parameters;
each $A_i$ is an $\{m(m + 1)/2\} \times \{m(m + 1)/2\}$ matrix of parameters; and
each $B_j$ is an $\{m(m + 1)/2\} \times \{m(m + 1)/2\}$ matrix of parameters.
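
The vech() operation defined above can be tried directly in Mata, which provides a built-in vech() function and its inverse, invvech(). The following is a minimal sketch using the same 2 x 2 matrix as the example; the names H and h are chosen here for illustration.

```stata
mata:
// Symmetric 2 x 2 matrix from the example above
H = (1, 2 \ 2, 3)

// Stack the lower-triangular elements into a column vector
h = vech(H)
h               // displays the 3 x 1 column vector (1 \ 2 \ 3)

// invvech() rebuilds the symmetric matrix from the stacked vector
invvech(h)
end
```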
Bollerslev, Engle, and Wooldridge (1988) argued that the general-vech MGARCH model in (1)-(3)
was too flexible to fit to data, so they proposed restricting the matrices $A_i$ and $B_j$ to be diagonal
matrices. It is for this restriction that the model is known as a diagonal vech MGARCH model. The
diagonal vech MGARCH model can also be expressed by replacing (3) with

$$H_t = S + \sum_{i=1}^{p} A_i \odot \epsilon_{t-i}\epsilon_{t-i}' + \sum_{j=1}^{q} B_j \odot H_{t-j} \qquad (3')$$

where $S$ is an $m \times m$ symmetric parameter matrix; each $A_i$ is an $m \times m$ symmetric parameter
matrix; $\odot$ is the elementwise or Hadamard product; and each $B_j$ is an $m \times m$ symmetric parameter
matrix. In (3'), A and B are symmetric but not diagonal matrices because we used the Hadamard
product. The matrices are diagonal in the vech representation of (3) but not in the Hadamard-product
representation of (3').
The Hadamard-product representation in (3') clarifies that each element in $H_t$ depends on its
past values and the past values of the corresponding ARCH terms. Although this representation does
not allow cross-covariance effects, it is still quite flexible. The rapid rate at which the number of
parameters grows with m, p, or q is one aspect of the model's flexibility.
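
To make the growth in parameters concrete, the following count compares the variance equations of the two models. It is a sketch derived only from the matrix dimensions stated above, not a formula given in the source, and the shorthand n = m(m + 1)/2 is introduced here for convenience.

```latex
\begin{aligned}
n &= m(m+1)/2 \\
\text{general vech, (3):}\quad
  & n + (p+q)\,n^{2}
  && \text{($s$ has $n$ elements; each $A_i$, $B_j$ has $n^2$)} \\
\text{diagonal vech, (3'):}\quad
  & (1+p+q)\,n
  && \text{($S$, $A_i$, $B_j$ symmetric: $n$ free elements each)} \\[4pt]
m=2,\ p=q=1:\quad & 3 + 2\cdot 9 = 21 \quad\text{versus}\quad 3\cdot 3 = 9 \\
m=5,\ p=q=1:\quad & 15 + 2\cdot 225 = 465 \quad\text{versus}\quad 3\cdot 15 = 45
\end{aligned}
```

The count of 9 for m = 2 and p = q = 1 agrees with the three Sigma0, three L.ARCH, and three L.GARCH entries reported when a bivariate model with one ARCH and one GARCH term is fit without constraints.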

Some examples
Example 1: Model with common covariates
We have data on a secondary market rate of a six-month U.S. Treasury bill, tbill, and on
Moody's seasoned AAA corporate bond yield, bond. We model the first-differences of tbill and the
first-differences of bond as a VAR(1) with an ARCH(1) term.

. use http://www.stata-press.com/data/r13/irates4
(St. Louis Fed (FRED) financial data)
. mgarch dvech (D.bond D.tbill = LD.bond LD.tbill), arch(1)
Getting starting values
(setting technique to bhhh)
Iteration 0:   log likelihood =  3569.2723
Iteration 1:   log likelihood =  3708.4561
  (output omitted)
Iteration 6:   log likelihood =  4183.8853
Iteration 7:   log likelihood =  4184.2424
(switching technique to nr)
Iteration 8:   log likelihood =  4184.4141
Iteration 9:   log likelihood =  4184.5973
Iteration 10:  log likelihood =  4184.5975
Estimating parameters
(setting technique to bhhh)
Iteration 0:   log likelihood =  4184.5975
Iteration 1:   log likelihood =  4200.6303
Iteration 2:   log likelihood =  4208.5342
Iteration 3:   log likelihood =   4212.426
Iteration 4:   log likelihood =  4215.2373
(switching technique to nr)
Iteration 5:   log likelihood =  4217.0676
Iteration 6:   log likelihood =  4221.5706
Iteration 7:   log likelihood =  4221.6576
Iteration 8:   log likelihood =  4221.6577

Diagonal vech MGARCH model
Sample: 3 - 2456                                   Number of obs   =      2454
Distribution: Gaussian                             Wald chi2(4)    =   1183.52
Log likelihood =  4221.658                         Prob > chi2     =    0.0000

-------------+----------------------------------------------------------------
             |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
D.bond       |
        bond |
         LD. |   .2967674   .0247149    12.01   0.000     .2483271    .3452077
       tbill |
         LD. |   .0947949   .0098683     9.61   0.000     .0754533    .1141364
       _cons |   .0003991     .00143     0.28   0.780    -.0024036    .0032019
-------------+----------------------------------------------------------------
D.tbill      |
        bond |
         LD. |   .0108373   .0301501     0.36   0.719    -.0482558    .0699304
       tbill |
         LD. |   .4344747   .0176497    24.62   0.000     .3998819    .4690675
       _cons |   .0011611   .0021033     0.55   0.581    -.0029612    .0052835
-------------+----------------------------------------------------------------
Sigma0       |
         1_1 |    .004894   .0002006    24.40   0.000     .0045008    .0052871
         2_1 |   .0040986   .0002396    17.10   0.000     .0036289    .0045683
         2_2 |   .0115149   .0005227    22.03   0.000     .0104904    .0125395
-------------+----------------------------------------------------------------
L.ARCH       |
         1_1 |   .4514942   .0456835     9.88   0.000     .3619562    .5410323
         2_1 |   .2518879    .036736     6.86   0.000     .1798866    .3238893
         2_2 |    .843368   .0608055    13.87   0.000     .7241914    .9625446
-------------+----------------------------------------------------------------

The output has three parts: an iteration log, a header, and an output table. The iteration log has
two parts: the first part reports the iterations from the process of searching for starting values, and
the second part reports the iterations from maximizing the log-likelihood function.
The header describes the estimation sample and reports a Wald test against the null hypothesis that
all the coefficients on the independent variables in each equation are zero. Here the null hypothesis
is rejected at all conventional levels.
The output table reports point estimates, standard errors, tests against zero, and confidence intervals
for the estimated coefficients, the estimated elements of S, and any estimated elements of A or B.
Here the output indicates that in the equation for D.tbill, neither the coefficient on LD.bond nor
the constant is statistically significant. The elements of S are reported in the Sigma0 equation. The
estimate of S[1, 1] is 0.005, and the estimate of S[2, 1] is 0.004. The ARCH term results are reported
in the L.ARCH equation. In the L.ARCH equation, 1_1 is the coefficient on the ARCH term for the
conditional variance of the first dependent variable, 2_1 is the coefficient on the ARCH term for the
conditional covariance between the first and second dependent variables, and 2_2 is the coefficient
on the ARCH term for the conditional variance of the second dependent variable.
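
The estimated S and ARCH matrices discussed above are also available as stored results (see Stored results in this entry). A minimal sketch for inspecting them after the fit in this example follows; the exact layout of e(A) when several ARCH lags are specified is not described here, so treat the display as exploratory.

```stata
* Sketch only: refit the model of example 1 quietly, then list stored matrices
use http://www.stata-press.com/data/r13/irates4, clear
quietly mgarch dvech (D.bond D.tbill = LD.bond LD.tbill), arch(1)

matrix list e(S)    // estimates of the Sigma0 matrix
matrix list e(A)    // estimates of the A (ARCH) matrices
```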

Example 2: Model with covariates that differ by equation


We improve the previous example by removing the insignificant parameters from the model:
. mgarch dvech (D.bond = LD.bond LD.tbill, noconstant)
> (D.tbill = LD.tbill, noconstant), arch(1)
Getting starting values
(setting technique to bhhh)
Iteration 0:   log likelihood =  3566.8824
Iteration 1:   log likelihood =  3701.6181
Iteration 2:   log likelihood =  3952.8048
Iteration 3:   log likelihood =  4076.5164
Iteration 4:   log likelihood =  4166.6842
Iteration 5:   log likelihood =  4180.2998
Iteration 6:   log likelihood =  4182.4545
Iteration 7:   log likelihood =  4182.9563
(switching technique to nr)
Iteration 8:   log likelihood =  4183.0293
Iteration 9:   log likelihood =  4183.1112
Iteration 10:  log likelihood =  4183.1113
Estimating parameters
(setting technique to bhhh)
Iteration 0:   log likelihood =  4183.1113
Iteration 1:   log likelihood =  4202.0304
Iteration 2:   log likelihood =  4210.2929
Iteration 3:   log likelihood =  4215.7798
Iteration 4:   log likelihood =  4217.7755
(switching technique to nr)
Iteration 5:   log likelihood =  4219.0078
Iteration 6:   log likelihood =  4221.4197
Iteration 7:   log likelihood =   4221.433
Iteration 8:   log likelihood =   4221.433

Diagonal vech MGARCH model
Sample: 3 - 2456                                   Number of obs   =      2454
Distribution: Gaussian                             Wald chi2(3)    =   1197.76
Log likelihood =  4221.433                         Prob > chi2     =    0.0000

-------------+----------------------------------------------------------------
             |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
D.bond       |
        bond |
         LD. |   .2941649   .0234734    12.53   0.000     .2481579    .3401718
       tbill |
         LD. |   .0953158   .0098077     9.72   0.000      .076093    .1145386
-------------+----------------------------------------------------------------
D.tbill      |
       tbill |
         LD. |   .4385945   .0136672    32.09   0.000     .4118072    .4653817
-------------+----------------------------------------------------------------
Sigma0       |
         1_1 |   .0048922   .0002005    24.40   0.000     .0044993    .0052851
         2_1 |   .0040949   .0002394    17.10   0.000     .0036256    .0045641
         2_2 |   .0115043   .0005184    22.19   0.000     .0104883    .0125203
-------------+----------------------------------------------------------------
L.ARCH       |
         1_1 |   .4519233    .045671     9.90   0.000     .3624099    .5414368
         2_1 |   .2515474   .0366701     6.86   0.000     .1796752    .3234195
         2_2 |   .8437212   .0600839    14.04   0.000     .7259589    .9614836
-------------+----------------------------------------------------------------

We specified each equation separately to remove the insignificant parameters. All the parameter
estimates are statistically significant.

Example 3: Model with constraints


Here we analyze some fictional weekly data on the percentages of bad widgets found in the
factories of Acme Inc. and Anvil Inc. We model the levels as a first-order autoregressive process.
We believe that the adaptive management style in these companies causes the variances to follow
a diagonal vech MGARCH process with one ARCH term and one GARCH term. Furthermore, these
close competitors follow essentially the same process, so we impose the constraints that the ARCH
coefficients are the same for the two companies and that the GARCH coefficients are also the same.


Imposing these constraints yields


. use http://www.stata-press.com/data/r13/acme
. constraint 1 [L.ARCH]1_1 = [L.ARCH]2_2
. constraint 2 [L.GARCH]1_1 = [L.GARCH]2_2
. mgarch dvech (acme = L.acme) (anvil = L.anvil), arch(1) garch(1)
> constraints(1 2)
Getting starting values
(setting technique to bhhh)
Iteration 0:   log likelihood = -6087.0665
Iteration 1:   log likelihood = -6022.2046
Iteration 2:   log likelihood = -5986.6152
Iteration 3:   log likelihood = -5976.5739
Iteration 4:   log likelihood = -5974.4342
Iteration 5:   log likelihood = -5974.4046
Iteration 6:   log likelihood = -5974.4036
Iteration 7:   log likelihood = -5974.4035

Estimating parameters
(setting technique to bhhh)
Iteration 0:
log likelihood
Iteration 1:
log likelihood
Iteration 2:
log likelihood
Iteration 3:
log likelihood
Iteration 4:
log likelihood

=
=
=
=
=

-5974.4035
-5973.812
-5973.8004
-5973.7999
-5973.7999

(not concave)

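Diagonal vech MGARCH model
Sample: 1969w35 - 1998w25                          Number of obs   =      1499
Distribution: Gaussian                             Wald chi2(2)    =    272.47
Log likelihood =   -5973.8                         Prob > chi2     =    0.0000

 ( 1)  [L.ARCH]1_1 - [L.ARCH]2_2 = 0
 ( 2)  [L.GARCH]1_1 - [L.GARCH]2_2 = 0

-------------+----------------------------------------------------------------
             |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
acme         |
        acme |
         L1. |   .3365278   .0255134    13.19   0.000     .2865225    .3865331
       _cons |   1.124611    .060085    18.72   0.000     1.006847    1.242376
-------------+----------------------------------------------------------------
anvil        |
       anvil |
         L1. |   .3151955   .0263287    11.97   0.000     .2635922    .3667988
       _cons |   1.215786   .0642052    18.94   0.000     1.089947    1.341626
-------------+----------------------------------------------------------------
Sigma0       |
         1_1 |   1.889237   .2168733     8.71   0.000     1.464173    2.314301
         2_1 |   .4599576   .1139843     4.04   0.000     .2365525    .6833626
         2_2 |   2.063113   .2454633     8.40   0.000     1.582014    2.544213
-------------+----------------------------------------------------------------
L.ARCH       |
         1_1 |   .2813443   .0299124     9.41   0.000      .222717    .3399716
         2_1 |    .181877   .0335393     5.42   0.000     .1161412    .2476128
         2_2 |   .2813443   .0299124     9.41   0.000      .222717    .3399716
-------------+----------------------------------------------------------------
L.GARCH      |
         1_1 |   .1487581   .0697531     2.13   0.033     .0120445    .2854716
         2_1 |    .085404   .1446524     0.59   0.555    -.1981094    .3689175
         2_2 |   .1487581   .0697531     2.13   0.033     .0120445    .2854716
-------------+----------------------------------------------------------------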


We could test our constraints by fitting the unconstrained model and performing either a Wald or a
likelihood-ratio test. The results indicate that we might further restrict the time-invariant components
of the conditional variances to be the same across companies.
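
The tests mentioned above could be carried out along the following lines. This is a hedged sketch: the names constrained and unconstrained are arbitrary labels chosen here, the unconstrained model is assumed to converge, and the equation name Sigma0 used in the last command is taken from the label in the output table rather than from documented syntax.

```stata
* Sketch only: likelihood-ratio and Wald tests of the constraints in example 3
use http://www.stata-press.com/data/r13/acme, clear
constraint 1 [L.ARCH]1_1 = [L.ARCH]2_2
constraint 2 [L.GARCH]1_1 = [L.GARCH]2_2

* Constrained fit, as in the example
quietly mgarch dvech (acme = L.acme) (anvil = L.anvil), arch(1) garch(1) ///
    constraints(1 2)
estimates store constrained

* Unconstrained fit
quietly mgarch dvech (acme = L.acme) (anvil = L.anvil), arch(1) garch(1)
estimates store unconstrained

* Likelihood-ratio test of the two constraints
lrtest unconstrained constrained

* Wald test of the same restrictions, using the unconstrained estimates
test ([L.ARCH]1_1 = [L.ARCH]2_2) ([L.GARCH]1_1 = [L.GARCH]2_2)

* Wald test of equal time-invariant variance components (equation name assumed)
test [Sigma0]1_1 = [Sigma0]2_2
```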

Example 4: Model with a GARCH term


Some models of financial data include no covariates or constant terms. For example, in modeling
fictional data on the stock returns of Acme Inc. and Anvil Inc., we found it best not to include
any covariates or constant terms. We include two ARCH terms and one GARCH term to model the
conditional variances.
. use http://www.stata-press.com/data/r13/aacmer
. mgarch dvech (acme anvil = , noconstant), arch(1/2) garch(1)
Getting starting values
(setting technique to bhhh)
Iteration 0:   log likelihood = -18417.243  (not concave)
Iteration 1:   log likelihood = -18215.005
Iteration 2:   log likelihood = -18199.691
Iteration 3:   log likelihood = -18136.699
Iteration 4:   log likelihood = -18084.256
Iteration 5:   log likelihood = -17993.662
Iteration 6:   log likelihood =   -17731.1
Iteration 7:   log likelihood = -17629.505
(switching technique to nr)
Iteration 8:   log likelihood = -17548.172
Iteration 9:   log likelihood = -17544.987
Iteration 10:  log likelihood = -17544.937
Iteration 11:  log likelihood = -17544.937
Estimating parameters
(setting technique to bhhh)
Iteration 0:   log likelihood = -17544.937
Iteration 1:   log likelihood = -17544.937

Diagonal vech MGARCH model
Sample: 1 - 5000                                   Number of obs   =      5000
Distribution: Gaussian                             Wald chi2(.)    =         .
Log likelihood = -17544.94                         Prob > chi2     =         .

-------------+----------------------------------------------------------------
             |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Sigma0       |
         1_1 |   1.026283   .0823348    12.46   0.000     .8649096    1.187656
         2_1 |   .4300997   .0590294     7.29   0.000     .3144042    .5457952
         2_2 |   1.019753   .0837146    12.18   0.000     .8556751     1.18383
-------------+----------------------------------------------------------------
L.ARCH       |
         1_1 |   .2878739     .02157    13.35   0.000     .2455975    .3301504
         2_1 |   .1036685   .0161446     6.42   0.000     .0720256    .1353114
         2_2 |   .2034196    .019855    10.25   0.000     .1645044    .2423347
-------------+----------------------------------------------------------------
L2.ARCH      |
         1_1 |   .1837825   .0274555     6.69   0.000     .1299706    .2375943
         2_1 |   .0884425     .02208     4.01   0.000     .0451665    .1317185
         2_2 |   .2025718   .0272639     7.43   0.000     .1491355     .256008
-------------+----------------------------------------------------------------
L.GARCH      |
         1_1 |   .0782467    .053944     1.45   0.147    -.0274816     .183975
         2_1 |   .2888104   .0818303     3.53   0.000     .1284261    .4491948
         2_2 |    .201618   .0470584     4.28   0.000     .1093853    .2938508
-------------+----------------------------------------------------------------


The model test is omitted from the output, because there are no covariates in the model. The univariate
tests indicate that the included parameters fit the data well. In [TS] mgarch dvech postestimation,
we discuss prediction from models without covariates.

Stored results
mgarch dvech stores the following in e():
Scalars
  e(N)              number of observations
  e(k)              number of parameters
  e(k_extra)        number of extra estimates added to e(b)
  e(k_eq)           number of equations in e(b)
  e(k_dv)           number of dependent variables
  e(df_m)           model degrees of freedom
  e(ll)             log likelihood
  e(chi2)           chi-squared
  e(p)              significance
  e(estdf)          1 if distribution parameter was estimated, 0 otherwise
  e(usr)            user-provided distribution parameter
  e(tmin)           minimum time in sample
  e(tmax)           maximum time in sample
  e(N_gaps)         number of gaps
  e(rank)           rank of e(V)
  e(ic)             number of iterations
  e(rc)             return code
  e(converged)      1 if converged, 0 otherwise

Macros
  e(cmd)            mgarch
  e(model)          dvech
  e(cmdline)        command as typed
  e(depvar)         names of dependent variables
  e(covariates)     list of covariates
  e(dv_eqs)         dependent variables with mean equations
  e(indeps)         independent variables in each equation
  e(tvar)           time variable
  e(title)          title in estimation output
  e(chi2type)       Wald; type of model chi-squared test
  e(vce)            vcetype specified in vce()
  e(vcetype)        title used to label Std. Err.
  e(tmins)          formatted minimum time
  e(tmaxs)          formatted maximum time
  e(dist)           distribution for error term: gaussian or t
  e(arch)           specified ARCH terms
  e(garch)          specified GARCH terms
  e(svtechnique)    maximization technique(s) for starting values
  e(technique)      maximization technique
  e(properties)     b V
  e(estat_cmd)      program used to implement estat
  e(predict)        program used to implement predict
  e(marginsok)      predictions allowed by margins
  e(marginsnotok)   predictions disallowed by margins

Matrices
  e(b)              coefficient vector
  e(Cns)            constraints matrix
  e(ilog)           iteration log (up to 20 iterations)
  e(gradient)       gradient vector
  e(hessian)        Hessian matrix
  e(A)              estimates of A matrices
  e(B)              estimates of B matrices
  e(S)              estimates of Sigma0 matrix
  e(Sigma)          Sigma hat
  e(V)              variance-covariance matrix of the estimators
  e(pinfo)          parameter information, used by predict

Functions
  e(sample)         marks estimation sample

Methods and formulas


Recall that the diagonal vech MGARCH model can be written as

$$y_t = Cx_t + \epsilon_t$$

$$\epsilon_t = H_t^{1/2}\nu_t$$

$$H_t = S + \sum_{i=1}^{p} A_i \odot \epsilon_{t-i}\epsilon_{t-i}' + \sum_{j=1}^{q} B_j \odot H_{t-j}$$

where

$y_t$ is an $m \times 1$ vector of dependent variables;
$C$ is an $m \times k$ matrix of parameters;
$x_t$ is a $k \times 1$ vector of independent variables, which may contain lags of $y_t$;
$H_t^{1/2}$ is the Cholesky factor of the time-varying conditional covariance matrix $H_t$;
$\nu_t$ is an $m \times 1$ vector of normal, independent, and identically distributed innovations;
$S$ is an $m \times m$ symmetric matrix of parameters;
each $A_i$ is an $m \times m$ symmetric matrix of parameters;
$\odot$ is the elementwise or Hadamard product; and
each $B_j$ is an $m \times m$ symmetric matrix of parameters.
mgarch dvech estimates the parameters by maximum likelihood. The log-likelihood function based
on the multivariate normal distribution for observation t is

$$l_t = -0.5\,m\log(2\pi) - 0.5\log\{\det(H_t)\} - 0.5\,\epsilon_t' H_t^{-1}\epsilon_t$$

where $\epsilon_t = y_t - Cx_t$. The log-likelihood function is $\sum_{t=1}^{T} l_t$.
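
The Gaussian contribution for a single observation can be evaluated numerically with a few lines of Mata. The sketch below is illustrative only: lt_normal() is a hypothetical helper written here, not a function supplied by Stata, and it is not how mgarch dvech computes the likelihood internally.

```stata
mata:
// Hypothetical helper: Gaussian log-likelihood contribution l_t for one
// observation, given the m x 1 residual vector and the m x m matrix H_t
real scalar lt_normal(real colvector resid, real matrix H)
{
    real scalar m
    m = rows(resid)
    return(-0.5*m*ln(2*pi()) - 0.5*ln(det(H)) - 0.5*(resid'*invsym(H)*resid))
}

// With zero residuals and H_t equal to the 2 x 2 identity matrix, the result
// reduces to -ln(2*pi), approximately -1.8379
lt_normal((0 \ 0), I(2))
end
```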

If we assume that $\nu_t$ follow a multivariate t distribution with degrees of freedom (df) greater than
2, then the log-likelihood function for observation t is

$$l_t = \log\Gamma\!\left(\frac{\mathrm{df}+m}{2}\right) - \log\Gamma\!\left(\frac{\mathrm{df}}{2}\right) - \frac{m}{2}\log\{(\mathrm{df}-2)\pi\} - 0.5\log\{\det(H_t)\} - \frac{\mathrm{df}+m}{2}\log\left(1 + \frac{\epsilon_t' H_t^{-1}\epsilon_t}{\mathrm{df}-2}\right)$$

mgarch dvech ensures that $H_t$ is positive definite for each t.


By default, mgarch dvech performs an iterative search for starting values. mgarch dvech estimates
starting values for C by seemingly unrelated regression, uses these estimates to compute residuals $\widehat{\epsilon}_t$,
plugs $\widehat{\epsilon}_t$ into the above log-likelihood function, and optimizes this log-likelihood function over the
parameters in $H_t$. This starting-value method plugs in consistent estimates of the parameters for the
conditional means of the dependent variables and then iteratively searches for the variance parameters
that maximize the log-likelihood function. Lütkepohl (2005, chap. 16) discusses this method as an
estimator for the variance parameters.
GARCH estimators require initial values that can be plugged in for $\epsilon_{t-i}\epsilon_{t-i}'$ and $H_{t-j}$ when
$t - i < 1$ and $t - j < 1$. mgarch dvech substitutes an estimator of the unconditional covariance of
the disturbances,

$$\widehat{\Sigma} = T^{-1}\sum_{t=1}^{T} \widehat{\widehat{\epsilon}}_t\,\widehat{\widehat{\epsilon}}_t{}' \qquad (4)$$

for $\epsilon_{t-i}\epsilon_{t-i}'$ when $t - i < 1$ and for $H_{t-j}$ when $t - j < 1$, where $\widehat{\widehat{\epsilon}}_t$ is the vector of residuals
calculated using the estimated parameters.
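
Equation (4) can be mimicked after estimation from the predicted residuals. The sketch below uses the model of example 4; the variable and matrix names (e_acme, e_anvil, EE, Sigma_hat) are chosen here for illustration, and the comparison with e(Sigma) rests on the assumption that this stored result holds the estimate in (4).

```stata
* Sketch only: rebuild the unconditional covariance estimate in (4) by hand
use http://www.stata-press.com/data/r13/aacmer, clear
quietly mgarch dvech (acme anvil = , noconstant), arch(1/2) garch(1)

* Residuals for each equation over the estimation sample
predict double e_acme if e(sample), residuals equation(acme)
predict double e_anvil if e(sample), residuals equation(anvil)

* Uncentered cross-product matrix of the residuals, divided by T
matrix accum EE = e_acme e_anvil, noconstant
matrix Sigma_hat = EE / r(N)

* Compare with the stored result (assumed to correspond to (4))
matrix list Sigma_hat
matrix list e(Sigma)
```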
mgarch dvech uses analytic first and second derivatives in maximizing the log-likelihood function
based on the multivariate normal distribution. mgarch dvech uses numerical derivatives in maximizing
the log-likelihood function based on the multivariate t distribution.

References
Bollerslev, T., R. F. Engle, and J. M. Wooldridge. 1988. A capital asset pricing model with time-varying covariances.
Journal of Political Economy 96: 116-131.
Lütkepohl, H. 2005. New Introduction to Multiple Time Series Analysis. New York: Springer.

Also see
[TS] mgarch dvech postestimation Postestimation tools for mgarch dvech
[TS] mgarch Multivariate GARCH models
[TS] tsset Declare data to be time-series data
[TS] arch Autoregressive conditional heteroskedasticity (ARCH) family of estimators
[TS] var Vector autoregressive models
[U] 20 Estimation and postestimation commands

Title
mgarch dvech postestimation Postestimation tools for mgarch dvech
Description
Syntax for predict
Menu for predict
Options for predict
Remarks and examples
Methods and formulas
Also see

Description
The following standard postestimation commands are available after mgarch dvech:
Command           Description

contrast          contrasts and ANOVA-style joint tests of estimates
estat ic          Akaike's and Schwarz's Bayesian information criteria (AIC and BIC)
estat summarize   summary statistics for the estimation sample
estat vce         variance-covariance matrix of the estimators (VCE)
estimates         cataloging estimation results
forecast          dynamic forecasts and simulations
lincom            point estimates, standard errors, testing, and inference for linear combinations
                    of coefficients
lrtest            likelihood-ratio test
margins           marginal means, predictive margins, marginal effects, and average marginal
                    effects
marginsplot       graph the results from margins (profile plots, interaction plots, etc.)
nlcom             point estimates, standard errors, testing, and inference for nonlinear combinations
                    of coefficients
predict           predictions, residuals, influence statistics, and other diagnostic measures
predictnl         point estimates, standard errors, testing, and inference for generalized predictions
pwcompare         pairwise comparisons of estimates
test              Wald tests of simple and composite linear hypotheses
testnl            Wald tests of nonlinear hypotheses


Syntax for predict


    predict [type] {stub* | newvarlist} [if] [in] [, statistic options]

statistic                Description

Main
  xb                     linear prediction; the default
  residuals              residuals
  variance               conditional variances and covariances

These statistics are available both in and out of sample; type predict ... if e(sample) ... if wanted only for
the estimation sample.

options                  Description

Options
  equation(eqnames)      names of equations for which predictions are made
  dynamic(time_constant) begin dynamic forecast at specified time

Menu for predict


Statistics > Postestimation > Predictions, residuals, etc.

Options for predict




Main

xb, the default, calculates the linear predictions of the dependent variables.
residuals calculates the residuals.
variance predicts the conditional variances and conditional covariances.

Options

equation(eqnames) specifies the equation for which the predictions are calculated. Use this option
to predict a statistic for a particular equation. Equation names, such as equation(income), are
used to identify equations.
One equation name may be specified when predicting the dependent variable, the residuals, or
the conditional variance. For example, specifying equation(income) causes predict to predict
income, and specifying variance equation(income) causes predict to predict the conditional
variance of income.
Two equations may be specified when predicting a conditional variance or covariance. For example, specifying equation(income, consumption) variance causes predict to predict the
conditional covariance of income and consumption.

dynamic(time_constant) specifies when predict starts producing dynamic forecasts. The specified
time constant must be in the scale of the time variable specified in tsset, and the time constant
must be inside a sample for which observations on the dependent variables are available. For
example, dynamic(tq(2008q4)) causes dynamic predictions to begin in the fourth quarter of
2008, assuming that your time variable is quarterly; see [D] datetime. If the model contains
exogenous variables, they must be present for the whole predicted sample. dynamic() may not
be specified with residuals.
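
The equation() cases described above can be sketched with the model of example 3 of [TS] mgarch dvech, whose equations are named acme and anvil. The new variable names below are arbitrary choices made for this illustration.

```stata
* Sketch only: predictions for particular equations after mgarch dvech
use http://www.stata-press.com/data/r13/acme, clear
constraint 1 [L.ARCH]1_1 = [L.ARCH]2_2
constraint 2 [L.GARCH]1_1 = [L.GARCH]2_2
quietly mgarch dvech (acme = L.acme) (anvil = L.anvil), arch(1) garch(1) ///
    constraints(1 2)

* One equation name: linear prediction, residuals, and conditional variance
predict double xb_acme, xb equation(acme)
predict double res_acme, residuals equation(acme)
predict double var_acme, variance equation(acme)

* Two equation names: conditional covariance of acme and anvil
predict double cov_acme_anvil, variance equation(acme, anvil)

summarize xb_acme res_acme var_acme cov_acme_anvil
```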

Remarks and examples


We assume that you have already read [TS] mgarch dvech. In this entry, we illustrate some of the
features of predict after using mgarch dvech to estimate the parameters of diagonal vech MGARCH
models.

Example 1: Dynamic forecasts


In this example, we obtain dynamic predictions for the Acme Inc. and Anvil Inc. fictional widget
data modeled in example 3 of [TS] mgarch dvech. We begin by reestimating the parameters of the
model.

. use http://www.stata-press.com/data/r13/acme
. constraint 1 [L.ARCH]1_1 = [L.ARCH]2_2
. constraint 2 [L.GARCH]1_1 = [L.GARCH]2_2
. mgarch dvech (acme = L.acme) (anvil = L.anvil), arch(1) garch(1)
> constraints(1 2)
Getting starting values
(setting technique to bhhh)
Iteration 0:   log likelihood = -6087.0665
Iteration 1:   log likelihood = -6022.2046
Iteration 2:   log likelihood = -5986.6152
Iteration 3:   log likelihood = -5976.5739
Iteration 4:   log likelihood = -5974.4342
Iteration 5:   log likelihood = -5974.4046
Iteration 6:   log likelihood = -5974.4036
Iteration 7:   log likelihood = -5974.4035

Estimating parameters
(setting technique to bhhh)
Iteration 0:
log likelihood
Iteration 1:
log likelihood
Iteration 2:
log likelihood
Iteration 3:
log likelihood
Iteration 4:
log likelihood

=
=
=
=
=

-5974.4035
-5973.812
-5973.8004
-5973.7999
-5973.7999

(not concave)

Diagonal vech MGARCH model
Sample: 1969w35 - 1998w25                          Number of obs   =      1499
Distribution: Gaussian                             Wald chi2(2)    =    272.47
Log likelihood =   -5973.8                         Prob > chi2     =    0.0000

 ( 1)  [L.ARCH]1_1 - [L.ARCH]2_2 = 0
 ( 2)  [L.GARCH]1_1 - [L.GARCH]2_2 = 0

-------------+----------------------------------------------------------------
             |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
acme         |
        acme |
         L1. |   .3365278   .0255134    13.19   0.000     .2865225    .3865331
       _cons |   1.124611    .060085    18.72   0.000     1.006847    1.242376
-------------+----------------------------------------------------------------
anvil        |
       anvil |
         L1. |   .3151955   .0263287    11.97   0.000     .2635922    .3667988
       _cons |   1.215786   .0642052    18.94   0.000     1.089947    1.341626
-------------+----------------------------------------------------------------
Sigma0       |
         1_1 |   1.889237   .2168733     8.71   0.000     1.464173    2.314301
         2_1 |   .4599576   .1139843     4.04   0.000     .2365525    .6833626
         2_2 |   2.063113   .2454633     8.40   0.000     1.582014    2.544213
-------------+----------------------------------------------------------------
L.ARCH       |
         1_1 |   .2813443   .0299124     9.41   0.000      .222717    .3399716
         2_1 |    .181877   .0335393     5.42   0.000     .1161412    .2476128
         2_2 |   .2813443   .0299124     9.41   0.000      .222717    .3399716
-------------+----------------------------------------------------------------
L.GARCH      |
         1_1 |   .1487581   .0697531     2.13   0.033     .0120445    .2854716
         2_1 |    .085404   .1446524     0.59   0.555    -.1981094    .3689175
         2_2 |   .1487581   .0697531     2.13   0.033     .0120445    .2854716
-------------+----------------------------------------------------------------


Now we use tsappend (see [TS] tsappend) to extend the data, use predict to obtain the dynamic
predictions, and graph the predictions.

. tsappend, add(12)
. predict H*, variance dynamic(tw(1998w26))
. tsline H_acme_acme H_anvil_anvil if t>=tw(1995w25), legend(rows(2))

[Graph: time-series line plot against t, from 1995w26 to 1998w26, of two series:
variance prediction (acme, acme) dynamic(tw(1998w26)) and
variance prediction (anvil, anvil) dynamic(tw(1998w26))]

The graph shows that the in-sample predictions are similar for the conditional variances of Acme
Inc. and Anvil Inc. and that the dynamic forecasts converge to similar levels. It also shows that
the ARCH and GARCH parameters cause substantial time-varying volatility. The predicted conditional
variance of acme ranges from lows of just over 2 to highs above 10.

Example 2: Predicting in-sample conditional variances


In this example, we obtain the in-sample predicted conditional variances of the returns for the
fictional Acme Inc., which we modeled in example 4 of [TS] mgarch dvech. First, we reestimate the
parameters of the model.
. use http://www.stata-press.com/data/r13/aacmer, clear
. mgarch dvech (acme anvil = , noconstant), arch(1/2) garch(1)
Getting starting values
(setting technique to bhhh)
Iteration 0:   log likelihood = -18417.243  (not concave)
Iteration 1:   log likelihood = -18215.005
Iteration 2:   log likelihood = -18199.691
Iteration 3:   log likelihood = -18136.699
Iteration 4:   log likelihood = -18084.256
Iteration 5:   log likelihood = -17993.662
Iteration 6:   log likelihood =   -17731.1
Iteration 7:   log likelihood = -17629.505
(switching technique to nr)
Iteration 8:   log likelihood = -17548.172
Iteration 9:   log likelihood = -17544.987
Iteration 10:  log likelihood = -17544.937
Iteration 11:  log likelihood = -17544.937
Estimating parameters
(setting technique to bhhh)
Iteration 0:   log likelihood = -17544.937
Iteration 1:   log likelihood = -17544.937

Diagonal vech MGARCH model
Sample: 1 - 5000                                   Number of obs   =      5000
Distribution: Gaussian                             Wald chi2(.)    =         .
Log likelihood = -17544.94                         Prob > chi2     =         .

-------------+----------------------------------------------------------------
             |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Sigma0       |
         1_1 |   1.026283   .0823348    12.46   0.000     .8649096    1.187656
         2_1 |   .4300997   .0590294     7.29   0.000     .3144042    .5457952
         2_2 |   1.019753   .0837146    12.18   0.000     .8556751     1.18383
-------------+----------------------------------------------------------------
L.ARCH       |
         1_1 |   .2878739     .02157    13.35   0.000     .2455975    .3301504
         2_1 |   .1036685   .0161446     6.42   0.000     .0720256    .1353114
         2_2 |   .2034196    .019855    10.25   0.000     .1645044    .2423347
-------------+----------------------------------------------------------------
L2.ARCH      |
         1_1 |   .1837825   .0274555     6.69   0.000     .1299706    .2375943
         2_1 |   .0884425     .02208     4.01   0.000     .0451665    .1317185
         2_2 |   .2025718   .0272639     7.43   0.000     .1491355     .256008
-------------+----------------------------------------------------------------
L.GARCH      |
         1_1 |   .0782467    .053944     1.45   0.147    -.0274816     .183975
         2_1 |   .2888104   .0818303     3.53   0.000     .1284261    .4491948
         2_2 |    .201618   .0470584     4.28   0.000     .1093853    .2938508
-------------+----------------------------------------------------------------

Now we use predict to obtain the in-sample conditional variances of acme and use tsline (see
[TS] tsline) to graph the results.

. predict h_acme, variance eq(acme, acme)
. tsline h_acme

(Graph omitted: time-series plot of "variance prediction (acme, acme)", vertical axis from 10 to 50, horizontal axis from 1000 to 5000.)

The graph shows that the predicted conditional variances vary substantially over time, as the
parameter estimates indicated.


Because there are no covariates in the model for acme, specifying xb puts a prediction of 0 in each
observation, and specifying residuals puts the value of the dependent variable into the prediction.

Methods and formulas


All one-step predictions are obtained by substituting the parameter estimates into the model. The
estimated unconditional variance matrix of the disturbances, $\widehat{\Sigma}$, is the initial value for the ARCH and
GARCH terms. The postestimation routines recompute $\widehat{\Sigma}$ using the prediction sample, the parameter
estimates stored in e(b), and (4) in Methods and formulas of [TS] mgarch dvech.
For observations in which the residuals are missing, the estimated unconditional variance matrix
of the disturbances is used in place of the outer product of the residuals.
Dynamic predictions of the dependent variables use previously predicted values beginning in the
period specified by dynamic().
Dynamic variance predictions are implemented by substituting $\widehat{\Sigma}$ for the outer product of the
residuals beginning in the period specified by dynamic().
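
The substitution rule can be illustrated with a scalar analogue. The Python sketch below is not Stata's implementation: it assumes a single series with one ARCH and one GARCH term, and the function name `variance_path` and all parameter values are hypothetical. It only shows how the unconditional variance stands in for the squared residual at startup, where a residual is missing, and from the dynamic() period onward.

```python
def variance_path(resid, s, a, b, sigma2, dyn_start=None):
    """Scalar sketch of h_t = s + a*e_{t-1}^2 + b*h_{t-1}.

    resid     : list of residuals; None marks a missing residual
    sigma2    : unconditional variance, used as the initial value for the
                ARCH and GARCH terms and as the stand-in for e^2
    dyn_start : index at which dynamic prediction begins (None = never)
    """
    h = []
    prev_e2, prev_h = sigma2, sigma2   # initial values for ARCH and GARCH terms
    for t, e in enumerate(resid):
        h_t = s + a * prev_e2 + b * prev_h
        h.append(h_t)
        dynamic = dyn_start is not None and t >= dyn_start
        # Use sigma2 in place of e_t^2 when the residual is missing or when
        # we are in the dynamic part of the prediction sample.
        prev_e2 = sigma2 if (e is None or dynamic) else e * e
        prev_h = h_t
    return h


one_step = variance_path([1.0, -2.0, 0.5, None, None], s=0.2, a=0.3, b=0.5, sigma2=1.0)
dynamic = variance_path([1.0, -2.0, 0.5, None, None], s=0.2, a=0.3, b=0.5, sigma2=1.0,
                        dyn_start=2)
```

In the one-step path the observed residual at index 2 feeds the prediction at index 3; in the dynamic path it is ignored from index 2 onward, so the two paths differ from index 3.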

Also see
[TS] mgarch dvech Diagonal vech multivariate GARCH models
[U] 20 Estimation and postestimation commands

Title
mgarch vcc Varying conditional correlation multivariate GARCH models
Syntax
Remarks and examples
Also see

Menu
Stored results

Description
Methods and formulas

Options
References

Syntax
    mgarch vcc eq [eq ... eq] [if] [in] [, options]

where each eq has the form

    (depvars = [indepvars] [, eqoptions])

options                     Description
------------------------------------------------------------------------------
Model
  arch(numlist)             ARCH terms for all equations
  garch(numlist)            GARCH terms for all equations
  het(varlist)              include varlist in the specification of the
                              conditional variance for all equations
  distribution(dist [#])    use dist distribution for errors [may be gaussian
                              (synonym normal) or t; default is gaussian]
  constraints(numlist)      apply linear constraints

SE/Robust
  vce(vcetype)              vcetype may be oim or robust

Reporting
  level(#)                  set confidence level; default is level(95)
  nocnsreport               do not display constraints
  display_options           control column formats, row spacing, line width,
                              display of omitted variables and base and empty
                              cells, and factor-variable labeling

Maximization
  maximize_options          control the maximization process; seldom used
  from(matname)             initial values for the coefficients; seldom used

  coeflegend                display legend instead of statistics
------------------------------------------------------------------------------

eqoptions                   Description
------------------------------------------------------------------------------
  noconstant                suppress constant term in the mean equation
  arch(numlist)             ARCH terms
  garch(numlist)            GARCH terms
  het(varlist)              include varlist in the specification of the
                              conditional variance
------------------------------------------------------------------------------

You must tsset your data before using mgarch vcc; see [TS] tsset.
indepvars and varlist may contain factor variables; see [U] 11.4.3 Factor variables.
depvars, indepvars, and varlist may contain time-series operators; see [U] 11.4.4 Time-series varlists.
by, fp, rolling, and statsby are allowed; see [U] 11.1.10 Prefix commands.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.

Menu
Statistics > Multivariate time series > Multivariate GARCH

Description
mgarch vcc estimates the parameters of varying conditional correlation (VCC) multivariate generalized autoregressive conditionally heteroskedastic (MGARCH) models in which the conditional variances
are modeled as univariate generalized autoregressive conditionally heteroskedastic (GARCH) models
and the conditional covariances are modeled as nonlinear functions of the conditional variances. The
conditional correlation parameters that weight the nonlinear combinations of the conditional variance
follow the GARCH-like process specified in Tse and Tsui (2002).
The VCC MGARCH model is about as flexible as the closely related dynamic conditional correlation
MGARCH model (see [TS] mgarch dcc), more flexible than the constant conditional correlation MGARCH
model (see [TS] mgarch ccc), and more parsimonious than the diagonal vech model (see [TS] mgarch
dvech).

Options


Model

arch(numlist) specifies the ARCH terms for all equations in the model. By default, no ARCH terms
are specified.
garch(numlist) specifies the GARCH terms for all equations in the model. By default, no GARCH
terms are specified.
het(varlist) specifies that varlist be included in the model in the specification of the conditional
variance for all equations. This varlist enters the variance specification collectively as multiplicative
heteroskedasticity.
 
distribution(dist [#]) specifies the assumed distribution for the errors. dist may be gaussian,
normal, or t.
gaussian and normal are synonyms; each causes mgarch vcc to assume that the errors come
from a multivariate normal distribution. # may not be specified with either of them.


t causes mgarch vcc to assume that the errors follow a multivariate Student t distribution, and
the degree-of-freedom parameter is estimated along with the other parameters of the model. If
distribution(t #) is specified, then mgarch vcc uses a multivariate Student t distribution
with # degrees of freedom. # must be greater than 2.
constraints(numlist) specifies linear constraints to apply to the parameter estimates.

SE/Robust

vce(vcetype) specifies the estimator for the variance-covariance matrix of the estimator.
vce(oim), the default, specifies to use the observed information matrix (OIM) estimator.
vce(robust) specifies to use the Huber/White/sandwich estimator.

Reporting

level(#); see [R] estimation options.


nocnsreport; see [R] estimation options.
display_options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel, fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and
nolstretch; see [R] estimation options.

Maximization

 
maximize_options: difficult, technique(algorithm_spec), iterate(#), [no]log, trace,
gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#),
nrtolerance(#), nonrtolerance, and from(matname); see [R] maximize for all options except
from(), and see below for information on from(). These options are seldom used.
from(matname) specifies initial values for the coefficients. from(b0) causes mgarch vcc to begin
the optimization algorithm with the values in b0. b0 must be a row vector, and the number of
columns must equal the number of parameters in the model.
The following option is available with mgarch vcc but is not shown in the dialog box:
coeflegend; see [R] estimation options.

Eqoptions
noconstant suppresses the constant term in the mean equation.
arch(numlist) specifies the ARCH terms in the equation. By default, no ARCH terms are specified.
This option may not be specified with model-level arch().
garch(numlist) specifies the GARCH terms in the equation. By default, no GARCH terms are specified.
This option may not be specified with model-level garch().
het(varlist) specifies that varlist be included in the specification of the conditional variance. This
varlist enters the variance specification collectively as multiplicative heteroskedasticity. This option
may not be specified with model-level het().

Remarks and examples


We assume that you have already read [TS] mgarch, which provides an introduction to MGARCH
models and the methods implemented in mgarch vcc.


MGARCH models are dynamic multivariate regression models in which the conditional variances
and covariances of the errors follow an autoregressive-moving-average structure. The VCC MGARCH
model uses a nonlinear combination of univariate GARCH models with time-varying cross-equation
weights to model the conditional covariance matrix of the errors.

As discussed in [TS] mgarch, MGARCH models differ in the parsimony and flexibility of their
specifications for a time-varying conditional covariance matrix of the disturbances, denoted by Ht .
In the conditional correlation family of MGARCH models, the diagonal elements of Ht are modeled
as univariate GARCH models, whereas the off-diagonal elements are modeled as nonlinear functions
of the diagonal terms. In the VCC MGARCH model,

$$ h_{ij,t} = \rho_{ij,t} \sqrt{h_{ii,t}\, h_{jj,t}} $$

where the diagonal elements $h_{ii,t}$ and $h_{jj,t}$ follow univariate GARCH processes and $\rho_{ij,t}$ follows the
dynamic process specified in Tse and Tsui (2002) and discussed below.
Because $\rho_{ij,t}$ varies with time, this model is known as the VCC GARCH model.
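
As a small numeric illustration of this relationship, the Python sketch below builds a conditional covariance matrix from a vector of conditional variances and a conditional correlation matrix. The function name `covariance_matrix` and the input values are hypothetical and are not taken from mgarch vcc; the code only applies the formula elementwise.

```python
import math


def covariance_matrix(variances, R):
    """Return H with h_ij = rho_ij * sqrt(h_ii * h_jj).

    variances : conditional variances h_11, ..., h_mm for one period
    R         : m x m conditional correlation matrix (ones on the diagonal)
    """
    m = len(variances)
    return [[R[i][j] * math.sqrt(variances[i] * variances[j]) for j in range(m)]
            for i in range(m)]


# Hypothetical values: variances 4 and 9 with correlation 0.5.
H = covariance_matrix([4.0, 9.0], [[1.0, 0.5], [0.5, 1.0]])
```

The diagonal of H reproduces the variances, and the off-diagonal element is the correlation scaled by the product of the two standard deviations.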

Technical note
The VCC GARCH model proposed by Tse and Tsui (2002) can be written as

$$ y_t = C x_t + \epsilon_t $$
$$ \epsilon_t = H_t^{1/2} \nu_t $$
$$ H_t = D_t^{1/2} R_t D_t^{1/2} $$
$$ R_t = (1 - \lambda_1 - \lambda_2) R + \lambda_1 \Psi_{t-1} + \lambda_2 R_{t-1} \qquad (1) $$

where

$y_t$ is an $m \times 1$ vector of dependent variables;

$C$ is an $m \times k$ matrix of parameters;

$x_t$ is a $k \times 1$ vector of independent variables, which may contain lags of $y_t$;

$H_t^{1/2}$ is the Cholesky factor of the time-varying conditional covariance matrix $H_t$;

$\nu_t$ is an $m \times 1$ vector of independent and identically distributed innovations;

$D_t$ is a diagonal matrix of conditional variances,

$$ D_t = \begin{pmatrix} \sigma^2_{1,t} & 0 & \cdots & 0 \\ 0 & \sigma^2_{2,t} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2_{m,t} \end{pmatrix} $$

in which each $\sigma^2_{i,t}$ evolves according to a univariate GARCH model of the form

$$ \sigma^2_{i,t} = s_i + \sum_{j=1}^{p_i} \alpha_j \epsilon^2_{i,t-j} + \sum_{j=1}^{q_i} \beta_j \sigma^2_{i,t-j} $$

by default, or

$$ \sigma^2_{i,t} = \exp(\gamma_i z_{i,t}) + \sum_{j=1}^{p_i} \alpha_j \epsilon^2_{i,t-j} + \sum_{j=1}^{q_i} \beta_j \sigma^2_{i,t-j} $$

when the het() option is specified, where $\gamma_i$ is a $1 \times p$ vector of parameters, $z_i$ is a $p \times 1$
vector of independent variables including a constant term, the $\alpha_j$'s are ARCH parameters,
and the $\beta_j$'s are GARCH parameters;

$R_t$ is a matrix of conditional correlations,

$$ R_t = \begin{pmatrix} 1 & \rho_{12,t} & \cdots & \rho_{1m,t} \\ \rho_{12,t} & 1 & \cdots & \rho_{2m,t} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{1m,t} & \rho_{2m,t} & \cdots & 1 \end{pmatrix} $$

$R$ is the matrix of means to which the dynamic process in (1) reverts;

$\Psi_t$ is the rolling estimator of the correlation matrix of $\widetilde{\epsilon}_t$, which uses the previous $m + 1$
observations; and

$\lambda_1$ and $\lambda_2$ are parameters that govern the dynamics of conditional correlations. $\lambda_1$ and $\lambda_2$
are nonnegative and satisfy $0 \le \lambda_1 + \lambda_2 < 1$.
To differentiate this model from Engle (2002), Tse and Tsui (2002) call their model a VCC MGARCH
model.
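
The correlation recursion in (1) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the estimator used by mgarch vcc: the function names `rolling_corr` and `vcc_step` are hypothetical, the rolling estimator is assumed to be the uncentered correlation of the standardized residuals over a fixed window (the form used by Tse and Tsui), and the window length is left as an argument.

```python
import math


def rolling_corr(std_resid, t, window):
    """Assumed form of Psi_{t-1}: uncentered correlation matrix of the
    `window` standardized-residual vectors observed just before period t."""
    rows = std_resid[t - window:t]
    m = len(rows[0])
    cross = [[sum(r[i] * r[j] for r in rows) for j in range(m)] for i in range(m)]
    return [[cross[i][j] / math.sqrt(cross[i][i] * cross[j][j]) for j in range(m)]
            for i in range(m)]


def vcc_step(R_bar, Psi_prev, R_prev, lam1, lam2):
    """One step of R_t = (1 - lam1 - lam2) R + lam1 Psi_{t-1} + lam2 R_{t-1}."""
    if lam1 < 0 or lam2 < 0 or lam1 + lam2 >= 1:
        raise ValueError("need lam1 >= 0, lam2 >= 0, and lam1 + lam2 < 1")
    m = len(R_bar)
    w = 1.0 - lam1 - lam2
    return [[w * R_bar[i][j] + lam1 * Psi_prev[i][j] + lam2 * R_prev[i][j]
             for j in range(m)] for i in range(m)]


# Hypothetical inputs for a two-equation model.
Psi = rolling_corr([[1.0, 1.0], [1.0, -1.0], [2.0, 2.0]], t=3, window=3)
R_t = vcc_step(R_bar=[[1.0, 0.6], [0.6, 1.0]],
               Psi_prev=[[1.0, 0.2], [0.2, 1.0]],
               R_prev=[[1.0, 0.8], [0.8, 1.0]],
               lam1=0.1, lam2=0.8)
```

Because the three weights sum to 1 and each input matrix has a unit diagonal, the diagonal of R_t stays at 1, which is why the restriction on the two adjustment parameters keeps R_t a weighted average of correlation matrices.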

Some examples
Example 1: Model with common covariates
We have daily data on the stock returns of three car manufacturers (Toyota, Nissan, and Honda)
from January 2, 2003, to December 31, 2010, in the variables toyota, nissan, and honda. We
model the conditional means of the returns as a first-order vector autoregressive process and the
conditional covariances as a VCC MGARCH process in which the variance of each disturbance term
follows a GARCH(1,1) process.
. use http://www.stata-press.com/data/r13/stocks
(Data from Yahoo! Finance)
. mgarch vcc (toyota nissan honda = L.toyota L.nissan L.honda, noconstant),
> arch(1) garch(1)
Calculating starting values....
Optimizing log likelihood
(setting technique to bhhh)
Iteration 0:   log likelihood =    16901.2
Iteration 1:   log likelihood =  17028.644
Iteration 2:   log likelihood =  17145.905
Iteration 3:   log likelihood =  17251.485
Iteration 4:   log likelihood =  17306.115
Iteration 5:   log likelihood =   17332.59
Iteration 6:   log likelihood =  17353.617
Iteration 7:   log likelihood =   17374.86
Iteration 8:   log likelihood =  17398.526
Iteration 9:   log likelihood =  17418.748
(switching technique to nr)
Iteration 10:  log likelihood =  17442.552
Iteration 11:  log likelihood =  17455.702
Iteration 12:  log likelihood =  17463.605
Iteration 13:  log likelihood =  17463.922
Iteration 14:  log likelihood =  17463.925
Iteration 15:  log likelihood =  17463.925
Refining estimates
Iteration 0:   log likelihood =  17463.925
Iteration 1:   log likelihood =  17463.925

Varying conditional correlation MGARCH model
Sample: 1 - 2015                                   Number of obs   =      2014
Distribution: Gaussian                             Wald chi2(9)    =     17.67
Log likelihood =  17463.92                         Prob > chi2     =    0.0392

                     |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------------+----------------------------------------------------------------
toyota               |
              toyota |
                 L1. |  -.0565645   .0335696    -1.68   0.092    -.1223597    .0092307
              nissan |
                 L1. |   .0248101   .0252701     0.98   0.326    -.0247184    .0743385
               honda |
                 L1. |   .0035836   .0298895     0.12   0.905    -.0549986    .0621659
---------------------+----------------------------------------------------------------
ARCH_toyota          |
                arch |
                 L1. |   .0602805   .0086798     6.94   0.000     .0432683    .0772926
               garch |
                 L1. |   .9224692   .0110316    83.62   0.000     .9008477    .9440907
               _cons |   4.38e-06   1.12e-06     3.91   0.000     2.18e-06    6.58e-06
---------------------+----------------------------------------------------------------
nissan               |
              toyota |
                 L1. |  -.0196399   .0387112    -0.51   0.612    -.0955124    .0562325
              nissan |
                 L1. |  -.0306663    .031051    -0.99   0.323     -.091525    .0301925
               honda |
                 L1. |   .0383151   .0354691     1.08   0.280    -.0312031    .1078332
---------------------+----------------------------------------------------------------
ARCH_nissan          |
                arch |
                 L1. |   .0774227   .0119642     6.47   0.000     .0539733    .1008722
               garch |
                 L1. |   .9076856   .0139339    65.14   0.000     .8803756    .9349956
               _cons |   6.20e-06   1.70e-06     3.65   0.000     2.87e-06    9.53e-06
---------------------+----------------------------------------------------------------
honda                |
              toyota |
                 L1. |  -.0358293   .0340492    -1.05   0.293    -.1025645     .030906
              nissan |
                 L1. |   .0544071   .0276156     1.97   0.049     .0002814    .1085327
               honda |
                 L1. |  -.0424383   .0326249    -1.30   0.193    -.1063819    .0215054
---------------------+----------------------------------------------------------------
ARCH_honda           |
                arch |
                 L1. |   .0458673   .0072714     6.31   0.000     .0316157    .0601189
               garch |
                 L1. |   .9369252   .0101756    92.08   0.000     .9169815    .9568689
               _cons |   4.99e-06   1.29e-06     3.85   0.000     2.45e-06    7.52e-06
---------------------+----------------------------------------------------------------
 corr(toyota,nissan) |   .6643028   .0151086    43.97   0.000     .6346905    .6939151
  corr(toyota,honda) |   .7302092   .0126361    57.79   0.000      .705443    .7549755
  corr(nissan,honda) |    .634732   .0159738    39.74   0.000     .6034239    .6660401
---------------------+----------------------------------------------------------------
Adjustment           |
             lambda1 |   .0277374   .0086942     3.19   0.001      .010697    .0447778
             lambda2 |   .8255524   .0755882    10.92   0.000     .6774023    .9737025

The output has three parts: an iteration log, a header, and an output table.
The iteration log has three parts: the dots from the search for initial values, the iteration log from
optimizing the log likelihood, and the iteration log from the refining step. A detailed discussion of
the optimization methods is in Methods and formulas.
The header describes the estimation sample and reports a Wald test against the null hypothesis
that all the coefficients on the independent variables in the mean equations are zero. Here the null
hypothesis is rejected at the 5% level.
The output table first presents results for the mean or variance parameters used to model each
dependent variable. Subsequently, the output table presents results for the parameters in R. For
example, the estimate of the mean of the process that associates Toyota and Nissan is 0.66. Finally,
the output table presents results for the adjustment parameters $\lambda_1$ and $\lambda_2$. In the example at hand,
the estimates for both $\lambda_1$ and $\lambda_2$ are statistically significant.
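
The z statistic, p-value, and confidence interval reported for an adjustment parameter follow from its coefficient and standard error under a normal approximation. The Python sketch below is only an arithmetic check of the lambda1 row in the output table; the function name `wald_summary` is hypothetical.

```python
from statistics import NormalDist


def wald_summary(coef, se, level=0.95):
    """Return (z, two-sided p-value, confidence interval) for one coefficient."""
    z = coef / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)   # about 1.96 for level=0.95
    return z, p, (coef - crit * se, coef + crit * se)


# Coefficient and standard error of lambda1 from the output table above.
z, p, (lo, hi) = wald_summary(0.0277374, 0.0086942)
```

Rounded as in the table, z is 3.19, the p-value is about 0.001, and the interval matches the reported 95% bounds.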
The VCC MGARCH model reduces to the CCC MGARCH model when $\lambda_1 = \lambda_2 = 0$. The output
below shows that a Wald test rejects the null hypothesis that $\lambda_1 = \lambda_2 = 0$ at all conventional levels.
. test _b[Adjustment:lambda1] = _b[Adjustment:lambda2] = 0
 ( 1)  [Adjustment]lambda1 - [Adjustment]lambda2 = 0
 ( 2)  [Adjustment]lambda1 = 0
           chi2(  2) =  482.80
         Prob > chi2 =    0.0000

These results indicate that the assumption of time-invariant conditional correlations maintained in
the CCC MGARCH model is too restrictive for these data.
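
For a chi-squared statistic with 2 degrees of freedom, the upper-tail probability has the closed form exp(-x/2), so the reported p-value can be checked by hand. The Python sketch below does only that; the function name `chi2_sf_2df` is hypothetical, and the closed form applies to 2 degrees of freedom only.

```python
import math


def chi2_sf_2df(x):
    """Upper-tail probability of a chi-squared(2) variable: exp(-x / 2)."""
    return math.exp(-x / 2.0)


# Wald statistic reported by the test command above.
p_value = chi2_sf_2df(482.80)
```

The resulting probability is far below any conventional significance level, which is why the output displays it as 0.0000.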


Example 2: Model with covariates that differ by equation


We improve the previous example by removing the insignificant parameters from the model. To
accomplish that, we specify the honda equation separately from the toyota and nissan equations:
. mgarch vcc (toyota nissan = , noconstant) (honda = L.nissan, noconstant),
> arch(1) garch(1)
Calculating starting values....
Optimizing log likelihood
(setting technique to bhhh)
Iteration 0:   log likelihood =   16889.43
Iteration 1:   log likelihood =  17002.567
Iteration 2:   log likelihood =  17134.525
Iteration 3:   log likelihood =  17233.192
Iteration 4:   log likelihood =  17295.342
Iteration 5:   log likelihood =  17326.347
Iteration 6:   log likelihood =  17348.063
Iteration 7:   log likelihood =  17363.988
Iteration 8:   log likelihood =  17387.216
Iteration 9:   log likelihood =  17404.734
(switching technique to nr)
Iteration 10:  log likelihood =  17438.432  (not concave)
Iteration 11:  log likelihood =  17450.001
Iteration 12:  log likelihood =  17455.442
Iteration 13:  log likelihood =  17455.971
Iteration 14:  log likelihood =   17455.98
Iteration 15:  log likelihood =   17455.98
Refining estimates
Iteration 0:   log likelihood =   17455.98
Iteration 1:   log likelihood =   17455.98  (backed up)

Varying conditional correlation MGARCH model
Sample: 1 - 2015                                   Number of obs   =      2014
Distribution: Gaussian                             Wald chi2(1)    =      1.62
Log likelihood =  17455.98                         Prob > chi2     =    0.2032

                     |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------------+----------------------------------------------------------------
ARCH_toyota          |
                arch |
                 L1. |   .0609064   .0087784     6.94   0.000      .043701    .0781117
               garch |
                 L1. |    .921703   .0111493    82.67   0.000     .8998509    .9435552
               _cons |   4.42e-06   1.13e-06     3.91   0.000     2.20e-06    6.64e-06
---------------------+----------------------------------------------------------------
ARCH_nissan          |
                arch |
                 L1. |   .0806598   .0123529     6.53   0.000     .0564486     .104871
               garch |
                 L1. |   .9035239    .014421    62.65   0.000     .8752592    .9317886
               _cons |   6.61e-06   1.79e-06     3.70   0.000     3.11e-06    .0000101
---------------------+----------------------------------------------------------------
honda                |
              nissan |
                 L1. |   .0175566   .0137982     1.27   0.203    -.0094874    .0446005
---------------------+----------------------------------------------------------------
ARCH_honda           |
                arch |
                 L1. |   .0461398   .0073048     6.32   0.000     .0318226     .060457
               garch |
                 L1. |   .9366096   .0102021    91.81   0.000     .9166139    .9566053
               _cons |   5.03e-06   1.31e-06     3.85   0.000     2.47e-06    7.59e-06
---------------------+----------------------------------------------------------------
 corr(toyota,nissan) |   .6635251   .0150293    44.15   0.000     .6340682     .692982
  corr(toyota,honda) |   .7299703   .0124828    58.48   0.000     .7055045     .754436
  corr(nissan,honda) |   .6338207   .0158681    39.94   0.000     .6027198    .6649217
---------------------+----------------------------------------------------------------
Adjustment           |
             lambda1 |   .0285319   .0092448     3.09   0.002     .0104124    .0466514
             lambda2 |   .8113924   .0854955     9.49   0.000     .6438243    .9789604

It turns out that the coefficient on L1.nissan in the honda equation is now statistically insignificant.
We could further improve the model by removing L1.nissan from the model.
There is no mean equation for Toyota or Nissan. In [TS] mgarch vcc postestimation, we discuss
prediction from models without covariates.


Example 3: Model with constraints


Here we fit a bivariate VCC MGARCH model for the Toyota and Nissan shares. We believe that
the shares of these car manufacturers follow the same process, so we impose the constraints that the
ARCH coefficients are the same for the two companies and that the GARCH coefficients are also the
same.
. constraint 1 _b[ARCH_toyota:L.arch] = _b[ARCH_nissan:L.arch]
. constraint 2 _b[ARCH_toyota:L.garch] = _b[ARCH_nissan:L.garch]
. mgarch vcc (toyota nissan = , noconstant), arch(1) garch(1) constraints(1 2)
Calculating starting values....
Optimizing log likelihood
(setting technique to bhhh)
Iteration 0:   log likelihood =  10326.298
Iteration 1:   log likelihood =   10680.73
Iteration 2:   log likelihood =  10881.388
Iteration 3:   log likelihood =  11043.345
Iteration 4:   log likelihood =  11122.459
Iteration 5:   log likelihood =  11202.411
Iteration 6:   log likelihood =  11253.657
Iteration 7:   log likelihood =  11276.325
Iteration 8:   log likelihood =  11279.823
Iteration 9:   log likelihood =  11281.704
(switching technique to nr)
Iteration 10:  log likelihood =  11282.313
Iteration 11:  log likelihood =   11282.46
Iteration 12:  log likelihood =  11282.461
Refining estimates
Iteration 0:   log likelihood =  11282.461
Iteration 1:   log likelihood =  11282.461  (backed up)

Varying conditional correlation MGARCH model
Sample: 1 - 2015                                   Number of obs   =      2015
Distribution: Gaussian                             Wald chi2(.)    =         .
Log likelihood =  11282.46                         Prob > chi2     =         .
 ( 1)  [ARCH_toyota]L.arch - [ARCH_nissan]L.arch = 0
 ( 2)  [ARCH_toyota]L.garch - [ARCH_nissan]L.garch = 0

                     |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------------+----------------------------------------------------------------
ARCH_toyota          |
                arch |
                 L1. |   .0797459   .0101634     7.85   0.000      .059826    .0996659
               garch |
                 L1. |   .9063808   .0118211    76.67   0.000      .883212    .9295497
               _cons |   4.24e-06   1.10e-06     3.85   0.000     2.08e-06    6.40e-06
---------------------+----------------------------------------------------------------
ARCH_nissan          |
                arch |
                 L1. |   .0797459   .0101634     7.85   0.000      .059826    .0996659
               garch |
                 L1. |   .9063808   .0118211    76.67   0.000      .883212    .9295497
               _cons |   5.91e-06   1.47e-06     4.03   0.000     3.03e-06    8.79e-06
---------------------+----------------------------------------------------------------
 corr(toyota,nissan) |   .6720056   .0162585    41.33   0.000     .6401394    .7038718
---------------------+----------------------------------------------------------------
Adjustment           |
             lambda1 |   .0343012   .0128097     2.68   0.007     .0091945    .0594078
             lambda2 |   .7945548    .101067     7.86   0.000      .596467    .9926425

We could test our constraints by fitting the unconstrained model and performing a likelihood-ratio
test. The results indicate that the restricted model is preferable.

Example 4: Model with a GARCH term


In this example, we have data on fictional stock returns for the Acme and Anvil corporations, and
we believe that the movement of the two stocks is governed by different processes. We specify one
ARCH and one GARCH term for the conditional variance equation for Acme and two ARCH terms for
the conditional variance equation for Anvil. In addition, we include the lagged value of the stock
return for Apex, the main subsidiary of Anvil corporation, in the variance equation of Anvil. For
Acme, we have data on the changes in an index of futures prices of products related to those produced
by Acme in afrelated. For Anvil, we have data on the changes in an index of futures prices of
inputs used by Anvil in afinputs.
. use http://www.stata-press.com/data/r13/acmeh
. mgarch vcc (acme = afrelated, noconstant arch(1) garch(1))
> (anvil = afinputs, arch(1/2) het(L.apex))
Calculating starting values....
Optimizing log likelihood
(setting technique to bhhh)
Iteration 0:   log likelihood = -13252.793
Iteration 1:   log likelihood = -12859.124
Iteration 2:   log likelihood =  -12522.14
Iteration 3:   log likelihood = -12406.487
Iteration 4:   log likelihood = -12304.275
Iteration 5:   log likelihood = -12273.103
Iteration 6:   log likelihood = -12256.104
Iteration 7:   log likelihood =  -12254.55
Iteration 8:   log likelihood = -12254.482
Iteration 9:   log likelihood = -12254.478
(switching technique to nr)
Iteration 10:  log likelihood = -12254.478
Iteration 11:  log likelihood = -12254.478
Refining estimates
Iteration 0:   log likelihood = -12254.478
Iteration 1:   log likelihood = -12254.478

Varying conditional correlation MGARCH model
Sample: 1 - 2500                                   Number of obs   =      2499
Distribution: Gaussian                             Wald chi2(2)    =   5226.19
Log likelihood = -12254.48                         Prob > chi2     =    0.0000

                     |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------------+----------------------------------------------------------------
acme                 |
           afrelated |   .9672465   .0510066    18.96   0.000     .8672753    1.067218
---------------------+----------------------------------------------------------------
ARCH_acme            |
                arch |
                 L1. |   .0949142   .0147302     6.44   0.000     .0660435    .1237849
               garch |
                 L1. |   .7689442    .038885    19.77   0.000     .6927309    .8451574
               _cons |   2.129468    .464916     4.58   0.000     1.218249    3.040687
---------------------+----------------------------------------------------------------
anvil                |
            afinputs |  -1.018629   .0145027   -70.24   0.000    -1.047053   -.9902037
               _cons |   .1015986   .0177952     5.71   0.000     .0667205    .1364766
---------------------+----------------------------------------------------------------
ARCH_anvil           |
                arch |
                 L1. |   .4990272   .0243531    20.49   0.000     .4512959    .5467584
                 L2. |   .2839812   .0181966    15.61   0.000     .2483165    .3196459
                apex |
                 L1. |   1.897144   .0558791    33.95   0.000     1.787623    2.006665
               _cons |   .0682724   .0662257     1.03   0.303    -.0615276    .1980724
---------------------+----------------------------------------------------------------
    corr(acme,anvil) |  -.6574256   .0294259   -22.34   0.000    -.7150994   -.5997518
---------------------+----------------------------------------------------------------
Adjustment           |
             lambda1 |   .2375029   .0179114    13.26   0.000     .2023971    .2726086
             lambda2 |   .6492072   .0254493    25.51   0.000     .5993274    .6990869


The results indicate that increases in the futures prices for related products lead to higher returns on
the Acme stock, and increased input prices lead to lower returns on the Anvil stock. In the conditional
variance equation for Anvil, the coefficient on L1.apex is positive and significant, which indicates
that an increase in the return on the Apex stock leads to more variability in the return on the Anvil
stock.
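
To see how the het() term enters the Anvil variance equation, the Python sketch below evaluates the multiplicative-heteroskedasticity form given in the technical note, using the ARCH_anvil point estimates from the output table above. The function name `anvil_variance` and the input values for the lagged Apex return and lagged squared residuals are hypothetical; this is an arithmetic illustration, not output from predict.

```python
import math


def anvil_variance(apex_lag, e2_lag1, e2_lag2,
                   gamma_cons=0.0682724, gamma_apex=1.897144,
                   alpha1=0.4990272, alpha2=0.2839812):
    """sigma^2_t = exp(gamma_cons + gamma_apex * apex_{t-1})
                   + alpha1 * e^2_{t-1} + alpha2 * e^2_{t-2}

    Default coefficients are the ARCH_anvil estimates reported above.
    """
    return (math.exp(gamma_cons + gamma_apex * apex_lag)
            + alpha1 * e2_lag1 + alpha2 * e2_lag2)


# Hypothetical inputs: no recent shocks, lagged Apex return of 0 versus 1.
base = anvil_variance(apex_lag=0.0, e2_lag1=0.0, e2_lag2=0.0)
high = anvil_variance(apex_lag=1.0, e2_lag1=0.0, e2_lag2=0.0)
```

Because the covariate enters through exp(), a one-unit increase in the lagged Apex return multiplies the het() component by exp(1.897144), which is the sense in which a positive coefficient means more variability.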

Stored results
mgarch vcc stores the following in e():
Scalars
    e(N)              number of observations
    e(k)              number of parameters
    e(k_aux)          number of auxiliary parameters
    e(k_extra)        number of extra estimates added to _b
    e(k_eq)           number of equations in e(b)
    e(k_dv)           number of dependent variables
    e(df_m)           model degrees of freedom
    e(ll)             log likelihood
    e(chi2)           chi-squared
    e(p)              significance
    e(estdf)          1 if distribution parameter was estimated, 0 otherwise
    e(usr)            user-provided distribution parameter
    e(tmin)           minimum time in sample
    e(tmax)           maximum time in sample
    e(N_gaps)         number of gaps
    e(rank)           rank of e(V)
    e(ic)             number of iterations
    e(rc)             return code
    e(converged)      1 if converged, 0 otherwise

Macros
    e(cmd)            mgarch
    e(model)          vcc
    e(cmdline)        command as typed
    e(depvar)         names of dependent variables
    e(covariates)     list of covariates
    e(dv_eqs)         dependent variables with mean equations
    e(indeps)         independent variables in each equation
    e(tvar)           time variable
    e(title)          title in estimation output
    e(chi2type)       Wald; type of model chi-squared test
    e(vce)            vcetype specified in vce()
    e(vcetype)        title used to label Std. Err.
    e(tmins)          formatted minimum time
    e(tmaxs)          formatted maximum time
    e(dist)           distribution for error term: gaussian or t
    e(arch)           specified ARCH terms
    e(garch)          specified GARCH terms
    e(technique)      maximization technique
    e(properties)     b V
    e(estat_cmd)      program used to implement estat
    e(predict)        program used to implement predict
    e(marginsok)      predictions allowed by margins
    e(marginsnotok)   predictions disallowed by margins

Matrices
    e(b)              coefficient vector
    e(Cns)            constraints matrix
    e(ilog)           iteration log (up to 20 iterations)
    e(gradient)       gradient vector
    e(hessian)        Hessian matrix
    e(V)              variance-covariance matrix of the estimators
    e(pinfo)          parameter information, used by predict

Functions
    e(sample)         marks estimation sample

Methods and formulas


mgarch vcc estimates the parameters of the varying conditional correlation MGARCH model by
maximum likelihood. The log-likelihood function based on the multivariate normal distribution for
observation t is

$$ l_t = -0.5\, m \log(2\pi) - 0.5 \log\{\det(R_t)\} - \log\{\det(D_t^{1/2})\} - 0.5\, \widetilde{\epsilon}_t R_t^{-1} \widetilde{\epsilon}_t' $$

where $\widetilde{\epsilon}_t = D_t^{-1/2} \epsilon_t$ is an $m \times 1$ vector of standardized residuals, $\epsilon_t = y_t - C x_t$. The log-likelihood
function is $\sum_{t=1}^{T} l_t$.
If we assume that $\nu_t$ follow a multivariate t distribution with degrees of freedom (df) greater than
2, then the log-likelihood function for observation t is

$$ l_t = \log\Gamma\left(\frac{df + m}{2}\right) - \log\Gamma\left(\frac{df}{2}\right) - \frac{m}{2} \log\{(df - 2)\pi\} - 0.5 \log\{\det(R_t)\} - \log\{\det(D_t^{1/2})\} - \frac{df + m}{2} \log\left(1 + \frac{\widetilde{\epsilon}_t R_t^{-1} \widetilde{\epsilon}_t'}{df - 2}\right) $$

The starting values for the parameters in the mean equations and the initial residuals $\widehat{\epsilon}_t$ are
obtained by least-squares regression. The starting values for the parameters in the variance equations
are obtained by a procedure proposed by Gourieroux and Monfort (1997, sec. 6.2.2). The starting
values for the parameters in $R$ are calculated from the standardized residuals $\widetilde{\epsilon}_t$. Given the starting
values for the mean and variance equations, the starting values for the parameters $\lambda_1$ and $\lambda_2$ are
obtained from a grid search performed on the log likelihood.
The initial optimization step is performed in the unconstrained space. Once the maximum is found,
we impose the constraints $\lambda_1 \ge 0$, $\lambda_2 \ge 0$, and $0 \le \lambda_1 + \lambda_2 < 1$, and maximize the log likelihood
in the constrained space. This step is reported in the iteration log as the refining step.
GARCH estimators require initial values that can be plugged in for $\epsilon_{t-i}\epsilon_{t-i}'$ and $H_{t-j}$ when
$t - i < 1$ and $t - j < 1$. mgarch vcc substitutes an estimator of the unconditional covariance of the
disturbances

$$ \widehat{\Sigma} = T^{-1} \sum_{t=1}^{T} \widehat{\widehat{\epsilon}}_t\, \widehat{\widehat{\epsilon}}_t' \qquad (2) $$

for $\epsilon_{t-i}\epsilon_{t-i}'$ when $t - i < 1$ and for $H_{t-j}$ when $t - j < 1$, where $\widehat{\widehat{\epsilon}}_t$ is the vector of residuals
calculated using the estimated parameters.
mgarch vcc uses numerical derivatives in maximizing the log-likelihood function.
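
The Gaussian log-likelihood contribution for one observation can be written out explicitly in the bivariate case, where the determinant and inverse of the correlation matrix have simple closed forms. The Python sketch below is an illustration under that m = 2 assumption, with a hypothetical function name `loglik_bivariate` and hypothetical inputs; it is not the code mgarch vcc uses.

```python
import math


def loglik_bivariate(e1, e2, h1, h2, rho):
    """Gaussian l_t for m = 2.

    e1, e2 : residuals epsilon_t
    h1, h2 : conditional variances (diagonal of D_t)
    rho    : conditional correlation (off-diagonal of R_t)
    """
    m = 2
    z1, z2 = e1 / math.sqrt(h1), e2 / math.sqrt(h2)     # standardized residuals
    det_R = 1.0 - rho * rho
    log_det_D_half = 0.5 * (math.log(h1) + math.log(h2))
    # Quadratic form of the standardized residuals in the inverse of R_t.
    quad = (z1 * z1 - 2.0 * rho * z1 * z2 + z2 * z2) / det_R
    return (-0.5 * m * math.log(2.0 * math.pi) - 0.5 * math.log(det_R)
            - log_det_D_half - 0.5 * quad)


# Hypothetical observation with zero correlation.
ll = loglik_bivariate(e1=1.0, e2=-2.0, h1=4.0, h2=9.0, rho=0.0)
```

With a zero correlation the expression reduces to the sum of two univariate normal log densities, which gives a quick way to check an implementation.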

References
Engle, R. F. 2002. Dynamic conditional correlation: A simple class of multivariate generalized autoregressive conditional
heteroskedasticity models. Journal of Business & Economic Statistics 20: 339-350.
Gourieroux, C. S., and A. Monfort. 1997. Time Series and Dynamic Models. Trans. ed. G. M. Gallo. Cambridge:
Cambridge University Press.
Tse, Y. K., and A. K. C. Tsui. 2002. A multivariate generalized autoregressive conditional heteroscedasticity model
with time-varying correlations. Journal of Business & Economic Statistics 20: 351-362.

Also see
[TS] mgarch vcc postestimation Postestimation tools for mgarch vcc
[TS] mgarch Multivariate GARCH models
[TS] tsset Declare data to be time-series data
[TS] arch Autoregressive conditional heteroskedasticity (ARCH) family of estimators
[TS] var Vector autoregressive models
[U] 20 Estimation and postestimation commands

Title
mgarch vcc postestimation Postestimation tools for mgarch vcc
Description
Remarks and examples

Syntax for predict


Methods and formulas

Menu for predict


Also see

Options for predict

Description
The following standard postestimation commands are available after mgarch vcc:
Command            Description
------------------------------------------------------------------------------
contrast           contrasts and ANOVA-style joint tests of estimates
estat ic           Akaike's and Schwarz's Bayesian information criteria (AIC
                     and BIC)
estat summarize    summary statistics for the estimation sample
estat vce          variance-covariance matrix of the estimators (VCE)
estimates          cataloging estimation results
forecast           dynamic forecasts and simulations
lincom             point estimates, standard errors, testing, and inference
                     for linear combinations of coefficients
lrtest             likelihood-ratio test
margins            marginal means, predictive margins, marginal effects, and
                     average marginal effects
marginsplot        graph the results from margins (profile plots, interaction
                     plots, etc.)
nlcom              point estimates, standard errors, testing, and inference
                     for nonlinear combinations of coefficients
predict            predictions, residuals, influence statistics, and other
                     diagnostic measures
predictnl          point estimates, standard errors, testing, and inference
                     for generalized predictions
pwcompare          pairwise comparisons of estimates
test               Wald tests of simple and composite linear hypotheses
testnl             Wald tests of nonlinear hypotheses
------------------------------------------------------------------------------


Syntax for predict


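    predict [type] { stub* | newvarlist } [if] [in] [, statistic options]

statistic                   Description
------------------------------------------------------------------------------
Main
  xb                        linear prediction; the default
  residuals                 residuals
  variance                  conditional variances and covariances
  correlation               conditional correlations
------------------------------------------------------------------------------
These statistics are available both in and out of sample; type
predict ... if e(sample) ... if wanted only for the estimation sample.

options                     Description
------------------------------------------------------------------------------
Options
  equation(eqnames)         names of equations for which predictions are made
  dynamic(time_constant)    begin dynamic forecast at specified time
------------------------------------------------------------------------------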

Menu for predict

    Statistics > Postestimation > Predictions, residuals, etc.

Options for predict




Main

xb, the default, calculates the linear predictions of the dependent variables.
residuals calculates the residuals.
variance predicts the conditional variances and conditional covariances.
correlation predicts the conditional correlations.

Options

equation(eqnames) specifies the equation for which the predictions are calculated. Use this option
to predict a statistic for a particular equation. Equation names, such as equation(income), are
used to identify equations.
One equation name may be specified when predicting the dependent variable, the residuals, or
the conditional variance. For example, specifying equation(income) causes predict to predict
income, and specifying variance equation(income) causes predict to predict the conditional
variance of income.
Two equations may be specified when predicting a conditional variance or covariance. For example, specifying equation(income, consumption) variance causes predict to predict the
conditional covariance of income and consumption.
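
As a minimal do-file sketch of equation() in practice, the commands below refit the stock-return model from example 2 of [TS] mgarch vcc and request one conditional variance, one conditional covariance, and the conditional correlations. The new variable names (v_toyota, cov_toy_nis, and the stub rho) are illustrative assumptions, not names used in this manual.

```stata
* Sketch only: the new variable names below are hypothetical
use http://www.stata-press.com/data/r13/stocks, clear
quietly mgarch vcc (toyota nissan = , noconstant) ///
        (honda = L.nissan, noconstant), arch(1) garch(1)

* conditional variance of a single dependent variable
predict v_toyota, variance equation(toyota)

* conditional covariance of two dependent variables
predict cov_toy_nis, variance equation(toyota, nissan)

* conditional correlations; the stub lets predict create the variables it needs
predict rho*, correlation

summarize v_toyota cov_toy_nis rho*
```

Naming one equation returns a variance and naming two returns their covariance, so the same variance statistic covers both cases.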


dynamic(time_constant) specifies when predict starts producing dynamic forecasts. The specified
time_constant must be in the scale of the time variable specified in tsset, and the time_constant
must be inside a sample for which observations on the dependent variables are available. For
example, dynamic(tq(2008q4)) causes dynamic predictions to begin in the fourth quarter of
2008, assuming that your time variable is quarterly; see [D] datetime. If the model contains
exogenous variables, they must be present for the whole predicted sample. dynamic() may not
be specified with residuals.

Remarks and examples


We assume that you have already read [TS] mgarch vcc. In this entry, we use predict after
mgarch vcc to make in-sample and out-of-sample forecasts.

Example 1: Dynamic forecasts


In this example, we obtain dynamic forecasts for the Toyota, Nissan, and Honda stock returns
modeled in example 2 of [TS] mgarch vcc. In the output below, we reestimate the parameters of the
model, use tsappend (see [TS] tsappend) to extend the data, and use predict to obtain in-sample
one-step-ahead forecasts and dynamic forecasts of the conditional variances of the returns. We graph
the forecasts below.

. use http://www.stata-press.com/data/r13/stocks
(Data from Yahoo! Finance)
. quietly mgarch vcc (toyota nissan = , noconstant)
>         (honda = L.nissan, noconstant), arch(1) garch(1)
. tsappend, add(50)
. predict H*, variance dynamic(2016)

(graph omitted: variance predictions (toyota,toyota), (nissan,nissan), and (honda,honda),
dynamic(2016), plotted against Date from 01jan2009 to 01jan2011)

Recent in-sample one-step-ahead forecasts are plotted to the left of the vertical line in the above
graph, and the dynamic out-of-sample forecasts appear to the right of the vertical line. The graph
shows the tail end of the huge increase in return volatility that took place in 2008 and 2009. It also
shows that the dynamic forecasts quickly converge.
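
A graph of this kind could be produced along the following lines. This is a sketch under stated assumptions rather than the commands used for the figure: the variable names v_toyota, v_nissan, and v_honda are hypothetical, and the choice to plot the last 150 periods is arbitrary.

```stata
* Sketch only: variable names and the plotting window are assumptions
use http://www.stata-press.com/data/r13/stocks, clear
quietly mgarch vcc (toyota nissan = , noconstant) ///
        (honda = L.nissan, noconstant), arch(1) garch(1)
tsappend, add(50)

* one conditional variance per equation; dynamic forecasts start in period 2016
predict v_toyota, variance equation(toyota) dynamic(2016)
predict v_nissan, variance equation(nissan) dynamic(2016)
predict v_honda,  variance equation(honda)  dynamic(2016)

* plot the last 150 periods; tline() marks where the dynamic forecasts begin
tsline v_toyota v_nissan v_honda in -150/l, tline(2016)
```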


Methods and formulas


All one-step predictions are obtained by substituting the parameter estimates into the model. The
estimated unconditional variance matrix of the disturbances, $\widehat{\Sigma}$, is the initial value for the ARCH and
GARCH terms. The postestimation routines recompute $\widehat{\Sigma}$ using the prediction sample, the parameter
estimates stored in e(b), and (2) in Methods and formulas of [TS] mgarch vcc.
For observations in which the residuals are missing, the estimated unconditional variance matrix
of the disturbances is used in place of the outer product of the residuals.
Dynamic predictions of the dependent variables use previously predicted values beginning in the
period specified by dynamic().
Dynamic variance predictions are implemented by substituting $\widehat{\Sigma}$ for the outer product of the
residuals beginning in the period specified in dynamic().

Also see
[TS] mgarch vcc Varying conditional correlation multivariate GARCH models
[U] 20 Estimation and postestimation commands

Title
newey Regression with Newey-West standard errors
Syntax                  Menu                    Description             Options
Remarks and examples    Stored results          Methods and formulas    References
Also see

Syntax

    newey depvar [indepvars] [if] [in] [weight], lag(#) [options]

    options            Description
    ------------------------------------------------------------------------------
    Model
      lag(#)           set maximum lag order of autocorrelation
      noconstant       suppress constant term

    Reporting
      level(#)         set confidence level; default is level(95)
      display_options  control column formats, row spacing, line width, display of
                         omitted variables and base and empty cells, and
                         factor-variable labeling

      coeflegend       display legend instead of statistics
    ------------------------------------------------------------------------------

lag(#) is required.
You must tsset your data before using newey; see [TS] tsset.
indepvars may contain factor variables; see [U] 11.4.3 Factor variables.
depvar and indepvars may contain time-series operators; see [U] 11.4.4 Time-series varlists.
by, rolling, and statsby are allowed; see [U] 11.1.10 Prefix commands.
aweights are allowed; see [U] 11.1.6 weight.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.

Menu

    Statistics > Time series > Regression with Newey-West std. errors

Description
newey produces Newey-West standard errors for coefficients estimated by OLS regression. The
error structure is assumed to be heteroskedastic and possibly autocorrelated up to some lag.

Options


Model

lag(#) specifies the maximum lag to be considered in the autocorrelation structure. If you specify
lag(0), the output is the same as regress, vce(robust). lag() is required.
noconstant; see [R] estimation options.

Reporting

level(#); see [R] estimation options.


display_options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel, fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and
nolstretch; see [R] estimation options.
The following option is available with newey but is not shown in the dialog box:
coeflegend; see [R] estimation options.

Remarks and examples


The Huber/White/sandwich robust variance estimator (see White [1980]) produces consistent
standard errors for OLS regression coefficient estimates in the presence of heteroskedasticity. The
Newey-West (1987) variance estimator is an extension that produces consistent estimates when there
is autocorrelation in addition to possible heteroskedasticity.
The Newey-West variance estimator handles autocorrelation up to and including a lag of m,
where m is specified by stipulating the lag() option. Thus, it assumes that any autocorrelation at
lags greater than m can be ignored.
If lag(0) is specified, the variance estimates produced by newey are simply the Huber/White/sandwich robust variance estimates calculated by regress, vce(robust); see [R] regress.

Example 1
newey, lag(0) is equivalent to regress, vce(robust):
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. regress price weight displ, vce(robust)

Linear regression                                      Number of obs =      74
                                                       F(  2,    71) =   14.44
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.2909
                                                       Root MSE      =  2518.4

             |               Robust
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      weight |   1.823366   .7808755     2.34   0.022     .2663445    3.380387
displacement |   2.087054   7.436967     0.28   0.780    -12.74184    16.91595
       _cons |    247.907   1129.602     0.22   0.827    -2004.455    2500.269

. generate t = _n
. tsset t
        time variable:  t, 1 to 74
                delta:  1 unit

. newey price weight displ, lag(0)

Regression with Newey-West standard errors             Number of obs =      74
maximum lag: 0                                         F(  2,    71) =   14.44
                                                       Prob > F      =  0.0000

             |             Newey-West
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      weight |   1.823366   .7808755     2.34   0.022     .2663445    3.380387
displacement |   2.087054   7.436967     0.28   0.780    -12.74184    16.91595
       _cons |    247.907   1129.602     0.22   0.827    -2004.455    2500.269

Because newey requires the dataset to be tsset, we generated a dummy time variable t, which in
this example played no role in the estimation.
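
The equivalence can also be checked numerically instead of by reading the two tables. The following do-file sketch compares the stored variance matrices; the matrix names V_robust and V_newey are illustrative assumptions.

```stata
* Sketch only: compare e(V) from regress, vce(robust) with e(V) from newey, lag(0)
use http://www.stata-press.com/data/r13/auto, clear
generate t = _n
tsset t

quietly regress price weight displ, vce(robust)
matrix V_robust = e(V)

quietly newey price weight displ, lag(0)
matrix V_newey = e(V)

* largest relative difference between the two matrices; should be near zero
display mreldif(V_robust, V_newey)
```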

Example 2
Say that we have time-series measurements on variables usr and idle and now wish to fit an
OLS model but obtain Newey-West standard errors allowing for a lag of up to 3:
. use http://www.stata-press.com/data/r13/idle2, clear
. tsset time
        time variable:  time, 1 to 30
                delta:  1 unit
. newey usr idle, lag(3)

Regression with Newey-West standard errors             Number of obs =      30
maximum lag: 3                                         F(  1,    28) =   10.90
                                                       Prob > F      =  0.0026

             |             Newey-West
         usr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        idle |  -.2281501   .0690927    -3.30   0.003    -.3696801     -.08662
       _cons |   23.13483   6.327031     3.66   0.001     10.17449    36.09516


Stored results
newey stores the following in e():

    Scalars
      e(N)             number of observations
      e(df_m)          model degrees of freedom
      e(df_r)          residual degrees of freedom
      e(F)             F statistic
      e(lag)           maximum lag
      e(rank)          rank of e(V)

    Macros
      e(cmd)           newey
      e(cmdline)       command as typed
      e(depvar)        name of dependent variable
      e(wtype)         weight type
      e(wexp)          weight expression
      e(title)         title in estimation output
      e(vcetype)       title used to label Std. Err.
      e(properties)    b V
      e(estat_cmd)     program used to implement estat
      e(predict)       program used to implement predict
      e(asbalanced)    factor variables fvset as asbalanced
      e(asobserved)    factor variables fvset as asobserved

    Matrices
      e(b)             coefficient vector
      e(Cns)           constraints matrix
      e(V)             variance-covariance matrix of the estimators

    Functions
      e(sample)        marks estimation sample

Methods and formulas


newey calculates the estimates

$$\widehat{\beta}_{\rm OLS} = (X'X)^{-1}X'y$$

$$\widehat{\rm Var}(\widehat{\beta}_{\rm OLS}) = (X'X)^{-1}\,X'\widehat{\Omega}X\,(X'X)^{-1}$$

That is, the coefficient estimates are simply those of OLS linear regression.
For lag(0) (no autocorrelation), the variance estimates are calculated using the White formulation:

$$X'\widehat{\Omega}X = X'\widehat{\Omega}_0X = \frac{n}{n-k}\sum_{i}\widehat{e}_i^{\,2}\,x_i'x_i$$

Here $\widehat{e}_i = y_i - x_i\widehat{\beta}_{\rm OLS}$, where $x_i$ is the ith row of the X matrix, n is the number of observations,
and k is the number of predictors in the model, including the constant if there is one. The above
formula is the same as that used by regress, vce(robust) with the regression-like formula (the
default) for the multiplier $q_c$; see Methods and formulas of [R] regress.
For lag(m), m > 0, the variance estimates are calculated using the Newey-West (1987)
formulation

$$X'\widehat{\Omega}X = X'\widehat{\Omega}_0X + \frac{n}{n-k}\sum_{l=1}^{m}\left(1-\frac{l}{m+1}\right)\sum_{t=l+1}^{n}\widehat{e}_t\,\widehat{e}_{t-l}\,(x_t'x_{t-l} + x_{t-l}'x_t)$$

where $x_t$ is the row of the X matrix observed at time t.
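
As a rough check on the lag(m) formula above, the following do-file sketch rebuilds the variance matrix in Mata for the model of Example 2 and compares it with the e(V) that newey stores. It assumes the data are sorted by time with no gaps or missing values, as the Example 2 output suggests. The names V_newey, nw_reldif, and the Mata variables are illustrative, and the code is not presented as how newey itself is implemented.

```stata
* Sketch only: hand-computed Newey-West variance for usr on idle with lag(3)
use http://www.stata-press.com/data/r13/idle2, clear
tsset time
quietly newey usr idle, lag(3)
matrix V_newey = e(V)

mata:
y = st_data(., "usr")
X = st_data(., "idle"), J(rows(y), 1, 1)    // regressor plus constant
n = rows(X)
k = cols(X)
m = 3
XXinv = invsym(cross(X, X))
e = y - X*(XXinv*cross(X, y))                // OLS residuals
S = cross(X, e:^2, X)                        // White term: sum of e_i^2 x_i'x_i
for (l = 1; l <= m; l++) {
    w = 1 - l/(m + 1)                        // weight 1 - l/(m+1)
    for (t = l + 1; t <= n; t++) {
        S = S + w*e[t]*e[t-l]*(X[t,.]'*X[t-l,.] + X[t-l,.]'*X[t,.])
    }
}
V = XXinv*((n/(n - k))*S)*XXinv
st_numscalar("nw_reldif", mreldif(V, st_matrix("V_newey")))
end

* relative difference between the hand-computed and stored matrices
display scalar(nw_reldif)
```

A value near zero indicates that the hand computation and the stored matrix agree up to rounding.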


Whitney K. Newey (1954- ) earned degrees in economics at Brigham Young University and
MIT. After a period at Princeton, he returned to MIT as a professor in 1990. His interests in
theoretical and applied econometrics include bootstrapping, nonparametric estimation of models,
semiparametric models, and choosing the number of instrumental variables.

Kenneth D. West (1953- ) earned a bachelor's degree in economics and mathematics at Wesleyan
University and then a PhD in economics at MIT. After a period at Princeton, he joined the
University of Wisconsin in 1988. His interests include empirical macroeconomics and time-series econometrics.

References
Hardin, J. W. 1997. sg72: Newey-West standard errors for probit, logit, and poisson models. Stata Technical Bulletin
39: 32-35. Reprinted in Stata Technical Bulletin Reprints, vol. 7, pp. 182-186. College Station, TX: Stata Press.
Newey, W. K., and K. D. West. 1987. A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent
covariance matrix. Econometrica 55: 703-708.
Wang, Q., and N. Wu. 2012. Long-run covariance and its applications in cointegration regression. Stata Journal 12:
515-542.
White, H. L., Jr. 1980. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity.
Econometrica 48: 817-838.

Also see
[TS] newey postestimation Postestimation tools for newey
[TS] arima ARIMA, ARMAX, and other dynamic regression models
[TS] forecast Econometric model forecasting
[TS] tsset Declare data to be time-series data
[R] regress Linear regression
[U] 20 Estimation and postestimation commands

Title
newey postestimation Postestimation tools for newey
Description             Syntax for predict      Menu for predict        Options for predict
Remarks and examples    Also see

Description
The following postestimation commands are available after newey:

    Command            Description
    ------------------------------------------------------------------------------
    contrast           contrasts and ANOVA-style joint tests of estimates
    estat summarize    summary statistics for the estimation sample
    estat vce          variance-covariance matrix of the estimators (VCE)
    estimates          cataloging estimation results
    forecast           dynamic forecasts and simulations
    lincom             point estimates, standard errors, testing, and inference for
                         linear combinations of coefficients
    linktest           link test for model specification
    margins            marginal means, predictive margins, marginal effects, and
                         average marginal effects
    marginsplot        graph the results from margins (profile plots, interaction plots, etc.)
    nlcom              point estimates, standard errors, testing, and inference for
                         nonlinear combinations of coefficients
    predict            predictions, residuals, influence statistics, and other
                         diagnostic measures
    predictnl          point estimates, standard errors, testing, and inference for
                         generalized predictions
    pwcompare          pairwise comparisons of estimates
    test               Wald tests of simple and composite linear hypotheses
    testnl             Wald tests of nonlinear hypotheses
    ------------------------------------------------------------------------------

Syntax for predict

    predict [type] newvar [if] [in] [, statistic]

    statistic        Description
    ------------------------------------------------------------------------------
    Main
      xb             linear prediction; the default
      stdp           standard error of the linear prediction
      residuals      residuals
    ------------------------------------------------------------------------------
    These statistics are available both in and out of sample; type
    predict ... if e(sample) ... if wanted only for the estimation sample.

Menu for predict

    Statistics > Postestimation > Predictions, residuals, etc.

Options for predict




Main

xb, the default, calculates the linear prediction.


stdp calculates the standard error of the linear prediction.
residuals calculates the residuals.

Remarks and examples


Example 1
We use the test command after newey to illustrate the importance of accounting for the presence of
serial correlation in the error term. The dataset contains daily stock returns of three car manufacturers
from January 2, 2003, to December 31, 2010, in the variables toyota, nissan, and honda.
We fit a model for the Nissan stock returns on the Honda and Toyota stock returns, and we use
estat bgodfrey to test for serial correlation of order one:
. use http://www.stata-press.com/data/r13/stocks
(Data from Yahoo! Finance)
. regress nissan honda toyota
(output omitted )
. estat bgodfrey
Breusch-Godfrey LM test for autocorrelation

    lags(p)  |          chi2               df                 Prob > chi2
-------------+-------------------------------------------------------------
       1     |          6.415               1                   0.0113

                        H0: no serial correlation

The result implies that the error term is serially correlated; therefore, we should instead fit the model
with newey. But let's use the outcome from regress to conduct a test for the statistical significance
of a particular linear combination of the two coefficients in the regression:
. test 1.15*honda+toyota = 1
 ( 1)  1.15*honda + toyota = 1
       F(  1,  2012) =    5.52
            Prob > F =    0.0189

We reject the null hypothesis that the linear combination is valid. Let's see if the conclusion
remains the same when we fit the model with newey, obtaining the Newey-West standard errors for
the OLS coefficient estimates.

. newey nissan honda toyota,lag(1)
 (output omitted )
. test 1.15*honda+toyota = 1
 ( 1)  1.15*honda + toyota = 1
       F(  1,  2012) =    2.57
            Prob > F =    0.1088

The conclusion would be the opposite, which illustrates the importance of using the proper estimator
for the standard errors.
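
The two tests can be placed side by side in a short do-file. The sketch below stores the p-value that test leaves in r(p) after each estimator; the scalar names p_ols and p_nw are illustrative assumptions.

```stata
* Sketch only: same hypothesis tested under OLS and Newey-West standard errors
use http://www.stata-press.com/data/r13/stocks, clear

quietly regress nissan honda toyota
quietly test 1.15*honda + toyota = 1
scalar p_ols = r(p)

quietly newey nissan honda toyota, lag(1)
quietly test 1.15*honda + toyota = 1
scalar p_nw = r(p)

display "OLS p-value = " %6.4f p_ols "   Newey-West p-value = " %6.4f p_nw
```

Because the coefficient estimates are identical in the two fits, any difference between the p-values comes entirely from the variance estimator.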

Example 2
We want to produce forecasts based on dynamic regressions for each of the three stocks. We will
treat the stock returns for toyota as a leading indicator for the two other stocks. We also check for
autocorrelation with the Breusch-Godfrey test.
. use http://www.stata-press.com/data/r13/stocks
(Data from Yahoo! Finance)
. regress toyota l(1/2).toyota
(output omitted )
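. estat bgodfrey
Breusch-Godfrey LM test for autocorrelation

    lags(p)  |          chi2               df                 Prob > chi2
-------------+-------------------------------------------------------------
       1     |          4.373               1                   0.0365

                        H0: no serial correlation

. regress nissan l(1/2).nissan l.toyota
 (output omitted )
. estat bgodfrey
Breusch-Godfrey LM test for autocorrelation

    lags(p)  |          chi2               df                 Prob > chi2
-------------+-------------------------------------------------------------
       1     |          0.099               1                   0.7536

                        H0: no serial correlation

. regress honda l(1/2).honda l.toyota
 (output omitted )
. estat bgodfrey
Breusch-Godfrey LM test for autocorrelation

    lags(p)  |          chi2               df                 Prob > chi2
-------------+-------------------------------------------------------------
       1     |          0.923               1                   0.3367

                        H0: no serial correlation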

The first result indicates that we should consider using newey to fit the model for toyota. The
point forecasts would not actually be affected because newey produces the same OLS coefficient
estimates reported by regress. However, if we were interested in obtaining measures of uncertainty
surrounding the point forecasts, we should then use the results from newey for that first equation.
Let's illustrate the use of forecast with newey for the first equation and regress for the two
other equations. We first declare the forecast model:
. forecast create stocksmodel
Forecast model stocksmodel started.

Then we refit the equations and add them to the forecast model:
. quietly newey toyota l(1/2).toyota, lag(1)
. estimates store eq_toyota
. forecast estimates eq_toyota
Added estimation results from newey.
Forecast model stocksmodel now contains 1 endogenous variable.
. quietly regress nissan l(1/2).nissan l.toyota
. estimates store eq_nissan
. forecast estimates eq_nissan
Added estimation results from regress.
Forecast model stocksmodel now contains 2 endogenous variables.
. quietly regress honda l(1/2).honda l.toyota
. estimates store eq_honda
. forecast estimates eq_honda
Added estimation results from regress.
Forecast model stocksmodel now contains 3 endogenous variables.

We use tsappend to add the number of periods for the forecast, and then we obtain the predicted
values with forecast solve:
. tsappend, add(7)
. forecast solve, prefix(stk_)
Computing dynamic forecasts for model stocksmodel.
Starting period:  2016
Ending period:    2022
Forecast prefix:  stk_
2016:  ............
2017:  ...........
2018:  ...........
2019:  ..........
2020:  .........
2021:  ........
2022:  ........
Forecast 3 variables spanning 7 periods.

The graph below shows several interesting results. First, the stock returns of the competitor (toyota)
do not seem to be a leading indicator for the stock returns of the two other companies (otherwise, the
patterns for the movements in nissan and honda would be following the recent past movements in
toyota). You can actually fit the models above for nissan and honda to confirm that the coefficient
estimate for the first lag of toyota is not significant in either of the two equations. Second, immediately
after the second forecasted period, there is basically no variation in the predictions, which indicates
the very short-run predicting influence of past history on the forecasts of the three stock returns.

(graph omitted: "Current and forecasted stock returns"; y axis: Stock returns; x axis: Date,
December 2010 to January 2011; series: Honda, Toyota, and Nissan; a vertical line marks
"Dynamic forecast start at 01Jan2011")

Also see
[TS] newey Regression with Newey-West standard errors
[U] 20 Estimation and postestimation commands

Title
pergram Periodogram
Syntax                  Menu                    Description             Options
Remarks and examples    Methods and formulas    References              Also see

Syntax

    pergram varname [if] [in] [, options]

    options                 Description
    ------------------------------------------------------------------------------
    Main
      generate(newvar)      generate newvar to contain the raw periodogram values

    Plot
      cline_options         affect rendition of the plotted points connected by lines
      marker_options        change look of markers (color, size, etc.)
      marker_label_options  add marker labels; change look or position

    Add plots
      addplot(plot)         add other plots to the generated graph

    Y axis, X axis, Titles, Legend, Overall
      twoway_options        any options other than by() documented in [G-3] twoway_options

      nograph               suppress the graph
    ------------------------------------------------------------------------------

You must tsset your data before using pergram; see [TS] tsset. Also, the time series must be dense
(nonmissing with no gaps in the time variable) in the specified sample.
varname may contain time-series operators; see [U] 11.4.4 Time-series varlists.
nograph does not appear in the dialog box.

Menu

    Statistics > Time series > Graphs > Periodogram

Description
pergram plots the log-standardized periodogram for a dense time series.

Options


Main

generate(newvar) specifies a new variable to contain the raw periodogram values. The generated
graph log-transforms and scales the values by the sample variance and then truncates them to the
[-6, 6] interval before graphing them.

Plot

cline_options affect the rendition of the plotted points connected by lines; see [G-3] cline_options.
marker_options specify the look of markers. This look includes the marker symbol, the marker size,
and its color and outline; see [G-3] marker_options.
marker_label_options specify if and how the markers are to be labeled; see [G-3] marker_label_options.

Add plots

addplot(plot) adds specified plots to the generated graph; see [G-3] addplot_option.

Y axis, X axis, Titles, Legend, Overall

twoway_options are any of the options documented in [G-3] twoway_options, excluding by(). These
include options for titling the graph (see [G-3] title_options) and for saving the graph to disk (see
[G-3] saving_option).
The following option is available with pergram but is not shown in the dialog box:
nograph prevents pergram from constructing a graph.

Remarks and examples


A good discussion of the periodogram is provided in Chatfield (2004), Hamilton (1994), and
Newton (1988). Chatfield is also a good introductory reference for time-series analysis. Another
classic reference is Box, Jenkins, and Reinsel (2008). pergram produces a scatterplot in which
the points of the scatterplot are connected. The points themselves represent the log-standardized
periodogram, and the connections between points represent the (continuous) log-standardized sample
spectral density.
In the following examples, we present the periodograms with an interpretation of the main features
of the plots.

Example 1
We have time-series data consisting of 144 observations on the monthly number of international
airline passengers (in thousands) between 1949 and 1960 (Box, Jenkins, and Reinsel 2008, Series G).
We can graph the raw series and the log periodogram for these data by typing

. use http://www.stata-press.com/data/r13/air2
(TIMESLAB: Airline passengers)
. scatter air time, m(o) c(l)
(graph omitted: Airline Passengers (1949-1960), from 100 to 600, plotted against
Time (in months), from 1950 to 1960)
. pergram air
(graph omitted: "Sample spectral density function", evaluated at the natural frequencies;
y axis: Airline Passengers (1949-1960), Log Periodogram, from -6.00 to 6.00;
x axis: Frequency, from 0.00 to 0.50)

The periodogram highlights the annual cycle together with the harmonics. Notice the peak at a
frequency of about 0.08 cycles per month (cpm). The period is the reciprocal of frequency, and the
reciprocal of 0.08 cpm is approximately 12 months per cycle. The similarity in shape of each group
of 12 observations reveals the annual cycle. The magnitude of the cycle is increasing, resulting in
the peaks in the periodogram at the harmonics of the principal annual cycle.

Example 2
This example uses 215 observations on the annual number of sunspots from 1749 to 1963 (Box
and Jenkins 1976, Series E). The graph of the raw series and the log periodogram for these data are
given as

. use http://www.stata-press.com/data/r13/sunspot
(TIMESLAB: Wolfer sunspot data)
. scatter spot time, m(o) c(l)
(graph omitted: Number of sunspots, from 50 to 200, plotted against Year, from 1750 to 1950)
. pergram spot
(graph omitted: "Sample spectral density function", evaluated at the natural frequencies;
y axis: Number of sunspots, Log Periodogram, from -6.00 to 6.00;
x axis: Frequency, from 0.00 to 0.50)

The periodogram peaks at a frequency of slightly less than 0.10 cycles per year, indicating a 10- to 12-year cycle in sunspot activity.

Example 3
Here we examine the number of trapped Canadian lynx from 1821 through 1934 (Newton 1988,
587). The raw series and the log periodogram are given as

. use http://www.stata-press.com/data/r13/lynx2
(TIMESLAB: Canadian lynx)
. scatter lynx time, m(o) c(l)
(graph omitted: Number of lynx trapped, from 2000 to 8000, plotted against Time, from 50 to 150)
. pergram lynx
(graph omitted: "Sample spectral density function", evaluated at the natural frequencies;
y axis: Number of lynx trapped, Log Periodogram, from -6.00 to 6.00;
x axis: Frequency, from 0.00 to 0.50)

The periodogram indicates that there is a cycle with a duration of about 10 years for these data but
that it is otherwise random.

Example 4
To more clearly highlight what the periodogram depicts, we present the result of analyzing a
time series of the sum of four sinusoids (of different periods). The periodogram should be able to
decompose the time series into four different sinusoids whose periods may be determined from the
plot.

. use http://www.stata-press.com/data/r13/cos4
(TIMESLAB: Sum of 4 Cosines)
. scatter sumfc time, m(o) c(l)
(graph omitted: Sum of 4 cosines, from -20 to 20, plotted against Time, from 50 to 150)
. pergram sumfc, gen(ordinate)
(graph omitted: "Sample spectral density function", evaluated at the natural frequencies;
y axis: Sum of 4 cosines, Log Periodogram, from -6.00 to 6.00;
x axis: Frequency, from 0.00 to 0.50)

The periodogram clearly shows the four contributions to the original time series. From the plot, we
can see that the periods of the summands were 3, 6, 12, and 36, although you can confirm this by
using

. generate double omega = (_n-1)/144
. generate double period = 1/omega
(1 missing value generated)
. list period omega if ordinate> 1e-5 & omega <=.5

     +--------------------+
     | period       omega |
     |--------------------|
  5. |     36   .02777778 |
 13. |     12   .08333333 |
 25. |      6   .16666667 |
 49. |      3   .33333333 |
     +--------------------+

Methods and formulas


We use the notation of Newton (1988) in the following discussion.
A time series of interest is decomposed into a unique set of sinusoids of various frequencies and
amplitudes.
A plot of the sinusoidal amplitudes (ordinates) versus the frequencies for the sinusoidal decomposition of a time series gives us the spectral density of the time series. If we calculate the sinusoidal
amplitudes for a discrete set of "natural" frequencies (1/n, 2/n, ..., q/n), we obtain the periodogram.
Let x(1), ..., x(n) be a time series, and let $\omega_k = (k-1)/n$ denote the natural frequencies for
$k = 1, \ldots, \lfloor n/2 \rfloor + 1$. Define

$$C_k^2 = \frac{1}{n^2}\left|\sum_{t=1}^{n} x(t)\,e^{2\pi i(t-1)\omega_k}\right|^2$$

A plot of $nC_k^2$ versus $\omega_k$ is then called the periodogram.
The sample spectral density is defined for a continuous frequency $\omega$ as

$$\widehat{f}(\omega) = \begin{cases} \dfrac{1}{n}\left|\displaystyle\sum_{t=1}^{n} x(t)\,e^{2\pi i(t-1)\omega}\right|^2 & \text{if } \omega \in [0, .5] \\[2ex] \widehat{f}(1-\omega) & \text{if } \omega \in [.5, 1] \end{cases}$$

The periodogram (and sample spectral density) is symmetric about $\omega = 0.5$. Further standardize
the periodogram such that

$$\frac{1}{n}\sum_{k=2}^{n}\frac{nC_k^2}{\widehat{\sigma}^2} = 1$$

where $\widehat{\sigma}^2$ is the sample variance of the time series so that the average value of the ordinate is one.
Once the amplitudes are standardized, we may then take the natural log of the values and produce
the log periodogram. In doing so, we truncate the graph at $\pm 6$. We drop the word "log" and simply
refer to the "log periodogram" as the "periodogram" in text.
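
To make the definition of the ordinates concrete, the following do-file sketch computes $nC_k^2$ directly from the formula above in Mata for the series used in Example 4 and lists the frequencies and periods of the four largest ordinates. It is a direct, unoptimized evaluation of the sum and is not presented as the algorithm pergram uses. The Mata variable names are illustrative, and whether these values match the scaling of the variable created by generate() is not checked here.

```stata
* Sketch only: direct evaluation of n*C_k^2 at the natural frequencies
use http://www.stata-press.com/data/r13/cos4, clear

mata:
x = st_data(., "sumfc")
n = rows(x)
q = floor(n/2) + 1                   // number of natural frequencies
t = (0::(n - 1))                     // t - 1 for t = 1, ..., n
P = J(q, 1, .)
for (k = 1; k <= q; k++) {
    omega = (k - 1)/n                // natural frequency omega_k
    z = sum(x :* exp(2*pi()*1i*omega*t))
    P[k] = abs(z)^2/n                // n*C_k^2 = |sum|^2/n
}
o = order(-P, 1)                     // indices sorted by decreasing ordinate
top = o[|1 \ 4|]                     // four largest ordinates
(top :- 1):/n, n:/(top :- 1)         // columns: frequency, period
end
```

If the computation is right, the periods in the second column should correspond to those listed in Example 4, possibly in a different order.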


References
Box, G. E. P., and G. M. Jenkins. 1976. Time Series Analysis: Forecasting and Control. Oakland, CA: Holden-Day.
Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. 2008. Time Series Analysis: Forecasting and Control. 4th ed.
Hoboken, NJ: Wiley.
Chatfield, C. 2004. The Analysis of Time Series: An Introduction. 6th ed. Boca Raton, FL: Chapman & Hall/CRC.
Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
Newton, H. J. 1988. TIMESLAB: A Time Series Analysis Laboratory. Belmont, CA: Wadsworth.

Also see
[TS] tsset Declare data to be time-series data
[TS] corrgram Tabulate and graph autocorrelations
[TS] cumsp Cumulative spectral distribution
[TS] wntestb Bartlett's periodogram-based test for white noise

Title
pperron Phillips-Perron unit-root test
Syntax                  Menu                    Description             Options
Remarks and examples    Stored results          Methods and formulas    References
Also see

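Syntax

    pperron varname [if] [in] [, options]

    options          Description
    ------------------------------------------------------------------------------
    Main
      noconstant     suppress constant term
      trend          include trend term in regression
      regress        display regression table
      lags(#)        use # Newey-West lags
    ------------------------------------------------------------------------------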

You must tsset your data before using pperron; see [TS] tsset.
varname may contain time-series operators; see [U] 11.4.4 Time-series varlists.

Menu

    Statistics > Time series > Tests > Phillips-Perron unit-root test

Description
pperron performs the Phillips-Perron (1988) test that a variable has a unit root. The null hypothesis
is that the variable contains a unit root, and the alternative is that the variable was generated by a
stationary process. pperron uses Newey-West (1987) standard errors to account for serial correlation,
whereas the augmented Dickey-Fuller test implemented in dfuller (see [TS] dfuller) uses additional
lags of the first-differenced variable.

Options


Main

noconstant suppresses the constant term (intercept) in the model.


trend specifies that a trend term be included in the associated regression. This option may not be
specified if noconstant is specified.
regress specifies that the associated regression table appear in the output. By default, the regression
table is not produced.
lags(#) specifies the number of Newey-West lags to use in calculating the standard error. The
default is to use $\mathrm{int}\{4(T/100)^{2/9}\}$ lags.
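
As a small worked illustration of the default, the line below evaluates the formula for a series of T = 144 observations, the length of the airline series used in Example 1. Taking T to be the full series length is an assumption made here; T = 143 yields the same integer.

```stata
* Sketch only: default number of Newey-West lags for T = 144
display int(4*(144/100)^(2/9))
```

Here 4(1.44)^(2/9) is about 4.34, so the integer part is 4.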

Remarks and examples


As noted in [TS] dfuller, the DickeyFuller test involves fitting the regression model

yt = + yt1 + t + ut

(1)

by ordinary least squares (OLS), but serial correlation will present a problem. To account for this, the
augmented DickeyFuller tests regression includes lags of the first differences of yt .
The PhillipsPerron test involves fitting (1), and the results are used to calculate the test statistics.
Phillips and Perron (1988) proposed two alternative statistics, which pperron presents. Phillips and
Perrons test statistics can be viewed as DickeyFuller statistics that have been made robust to
serial correlation by using the NeweyWest (1987) heteroskedasticity- and autocorrelation-consistent
covariance matrix estimator.
Hamilton (1994, chap. 17) and [TS] dfuller discuss four different cases into which unit-root tests
can be classified. The Phillips–Perron test applies to cases one, two, and four but not to case three.
Cases one and two assume that the variable has a unit root without drift under the null hypothesis, the
only difference being whether the constant term $\alpha$ is included in regression (1). Case four assumes
that the variable has a random walk, with or without drift, under the null hypothesis. Case three,
which assumes that the variable has a random walk with drift under the null hypothesis, is just a
special case of case four, so the fact that the Phillips–Perron test does not apply is not restrictive.
The table below summarizes the relevant cases:

Case   Process under null hypothesis       Regression restrictions       dfuller option
1      Random walk without drift           $\alpha = 0$, $\delta = 0$    noconstant
2      Random walk without drift           $\delta = 0$                  (default)
4      Random walk with or without drift   (none)                        trend

The critical values for the Phillips–Perron test are the same as those for the augmented Dickey–Fuller
test. See Hamilton (1994, chap. 17) for more information.

Example 1
Here we use the international airline passengers dataset (Box, Jenkins, and Reinsel 2008, Series G).
This dataset has 144 observations on the monthly number of international airline passengers from
1949 through 1960. Because the data exhibit a clear upward trend over time, we will use the trend
option.


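. use http://www.stata-press.com/data/r13/air2
(TIMESLAB: Airline passengers)
. pperron air, lags(4) trend regress
Phillips-Perron test for unit root                 Number of obs   =       143
                                                   Newey-West lags =         4
                               ---------- Interpolated Dickey-Fuller ---------
                  Test         1% Critical       5% Critical      10% Critical
               Statistic           Value             Value             Value
------------------------------------------------------------------------------
 Z(rho)          -46.405           -27.687           -20.872           -17.643
 Z(t)             -5.049            -4.026            -3.444            -3.144
------------------------------------------------------------------------------
MacKinnon approximate p-value for Z(t) = 0.0002

------------------------------------------------------------------------------
         air |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         air |
         L1. |   .7318116   .0578092    12.66   0.000     .6175196    .8461035
      _trend |   .7107559   .1670563     4.25   0.000     .3804767    1.041035
       _cons |   25.95168   7.325951     3.54   0.001     11.46788    40.43547
------------------------------------------------------------------------------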

Just as in the example in [TS] dfuller, we reject the null hypothesis of a unit root at all common
significance levels. The interpolated critical values for $Z_t$ differ slightly from those shown in the
example in [TS] dfuller because the sample sizes are different: with the augmented Dickey–Fuller
regression we lose observations because of the inclusion of lagged difference terms as regressors.

Stored results
pperron stores the following in r():
Scalars
    r(N)       number of observations
    r(lags)    number of lagged differences used
    r(pval)    MacKinnon approximate p-value (not included if noconstant specified)
    r(Zt)      Phillips–Perron $\tau$ test statistic
    r(Zrho)    Phillips–Perron $\rho$ test statistic

Methods and formulas


In the OLS estimation of an AR(1) process with Gaussian errors,

$$y_i = \rho y_{i-1} + \epsilon_i$$

where $\epsilon_i$ are independently and identically distributed as $N(0, \sigma^2)$ and $y_0 = 0$, the OLS estimate
(based on an n-observation time series) $\hat\rho_n$ of the autocorrelation parameter $\rho$ is given by

$$\hat\rho_n = \frac{\sum_{i=1}^{n} y_{i-1} y_i}{\sum_{i=1}^{n} y_i^2}$$


If $|\rho| < 1$, then $\sqrt{n}(\hat\rho_n - \rho) \rightarrow N(0, 1 - \rho^2)$. If this result were valid when $\rho = 1$, then the
resulting distribution would have a variance of zero. When $\rho = 1$, the OLS estimate $\hat\rho$ still converges
to one, though we need to find a nondegenerate distribution so that we can test $H_0\colon \rho = 1$. See
Hamilton (1994, chap. 17).
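The Phillips–Perron test involves fitting the regression

$$y_i = \alpha + \rho y_{i-1} + \epsilon_i$$

where we may exclude the constant or include a trend term. There are two statistics, $Z_\rho$ and $Z_\tau$,
calculated as

$$Z_\rho = n(\hat\rho_n - 1) - \frac{1}{2}\,\frac{n^2 \hat\sigma^2}{s_n^2}\left(\hat\lambda_n^2 - \hat\gamma_{0,n}\right)$$

$$Z_\tau = \sqrt{\frac{\hat\gamma_{0,n}}{\hat\lambda_n^2}}\;\frac{\hat\rho_n - 1}{\hat\sigma} - \frac{1}{2}\left(\hat\lambda_n^2 - \hat\gamma_{0,n}\right)\frac{1}{\hat\lambda_n}\,\frac{n\hat\sigma}{s_n}$$

$$\hat\gamma_{j,n} = \frac{1}{n}\sum_{i=j+1}^{n} \hat u_i \hat u_{i-j}$$

$$\hat\lambda_n^2 = \hat\gamma_{0,n} + 2\sum_{j=1}^{q}\left(1 - \frac{j}{q+1}\right)\hat\gamma_{j,n}$$

$$s_n^2 = \frac{1}{n-k}\sum_{i=1}^{n} \hat u_i^2$$

where $u_i$ is the OLS residual, $k$ is the number of covariates in the regression, $q$ is the number of
Newey–West lags to use in calculating $\hat\lambda_n^2$, and $\hat\sigma$ is the OLS standard error of $\hat\rho$.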
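To make the roles of the individual terms concrete, here is a minimal Python sketch of the two statistics for the regression with a constant and no trend. It is an illustration under stated assumptions, not pperron's implementation: the function name is hypothetical, $k = 2$ is assumed (constant plus lagged level), and the OLS fit uses the closed-form simple-regression formulas.

```python
import math


def phillips_perron(y, q):
    """Return (rho_hat, sigma_hat, Z_rho, Z_tau) for y_i = alpha + rho*y_{i-1} + e_i.

    q is the number of Newey-West lags. Hypothetical helper for illustration only.
    """
    dep, lag = y[1:], y[:-1]            # y_i and y_{i-1}
    n = len(dep)
    k = 2                               # assumed covariates: constant and lagged y
    mean_lag = sum(lag) / n
    mean_dep = sum(dep) / n
    sxx = sum((x - mean_lag) ** 2 for x in lag)
    sxy = sum((x - mean_lag) * (d - mean_dep) for x, d in zip(lag, dep))
    rho = sxy / sxx                     # OLS slope on the lagged level
    alpha = mean_dep - rho * mean_lag   # OLS intercept
    u = [d - alpha - rho * x for x, d in zip(lag, dep)]
    s2 = sum(e * e for e in u) / (n - k)          # s_n^2
    sigma = math.sqrt(s2 / sxx)                   # OLS standard error of rho_hat

    def gamma(j):
        # (1/n) * sum over i = j+1..n of u_i * u_{i-j}, written with 0-based indexing
        return sum(u[i] * u[i - j] for i in range(j, n)) / n

    gamma0 = gamma(0)
    lam2 = gamma0 + 2 * sum((1 - j / (q + 1)) * gamma(j) for j in range(1, q + 1))
    z_rho = n * (rho - 1) - 0.5 * (n ** 2 * sigma ** 2 / s2) * (lam2 - gamma0)
    z_tau = (math.sqrt(gamma0 / lam2) * (rho - 1) / sigma
             - 0.5 * (lam2 - gamma0) * (n * sigma / math.sqrt(s2)) / math.sqrt(lam2))
    return rho, sigma, z_rho, z_tau
```

With q = 0 the long-run variance estimate equals $\hat\gamma_{0,n}$, so the correction terms vanish and the statistics reduce to $n(\hat\rho_n - 1)$ and $(\hat\rho_n - 1)/\hat\sigma$, which is a convenient sanity check on any implementation.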
The critical values, which have the same distribution as the Dickey–Fuller statistic (see Dickey
and Fuller 1979) included in the output, are linearly interpolated from the table of values that appear
in Fuller (1996), and the MacKinnon approximate p-values use the regression surface published in
MacKinnon (1994).


Peter Charles Bonest Phillips (1948– ) was born in Weymouth, England, and earned degrees in
economics at the University of Auckland in New Zealand and the London School of Economics.
After periods at the Universities of Essex and Birmingham, Phillips moved to Yale in 1979. He
also holds appointments at the University of Auckland and the University of York. His main
research interests are in econometric theory, financial econometrics, time-series and panel-data
econometrics, and applied macroeconomics.

Pierre Perron (1959– ) was born in Québec, Canada, and earned degrees at McGill, Queen's, and
Yale in economics. After posts at Princeton and the Université de Montréal, he joined Boston
University in 1997. His research interests include time-series analysis, econometrics, and applied
macroeconomics.


References
Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. 2008. Time Series Analysis: Forecasting and Control. 4th ed.
Hoboken, NJ: Wiley.
Dickey, D. A., and W. A. Fuller. 1979. Distribution of the estimators for autoregressive time series with a unit root.
Journal of the American Statistical Association 74: 427–431.
Fuller, W. A. 1996. Introduction to Statistical Time Series. 2nd ed. New York: Wiley.
Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
MacKinnon, J. G. 1994. Approximate asymptotic distribution functions for unit-root and cointegration tests. Journal
of Business and Economic Statistics 12: 167–176.
Newey, W. K., and K. D. West. 1987. A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent
covariance matrix. Econometrica 55: 703–708.
Phillips, P. C. B., and P. Perron. 1988. Testing for a unit root in time series regression. Biometrika 75: 335–346.

Also see
[TS] tsset Declare data to be time-series data
[TS] dfgls DF-GLS unit-root test
[TS] dfuller Augmented Dickey–Fuller unit-root test
[XT] xtunitroot Panel-data unit-root tests

Title
prais  Prais–Winsten and Cochrane–Orcutt regression

Syntax
Menu
Description
Options
Remarks and examples
Stored results
Methods and formulas
Acknowledgment
References
Also see

Syntax
prais depvar [indepvars] [if] [in] [, options]

options             Description

Model
  rhotype(regress)  base $\rho$ on single-lag OLS of residuals; the default
  rhotype(freg)     base $\rho$ on single-lead OLS of residuals
  rhotype(tscorr)   base $\rho$ on autocorrelation of residuals
  rhotype(dw)       base $\rho$ on autocorrelation based on Durbin–Watson
  rhotype(theil)    base $\rho$ on adjusted autocorrelation
  rhotype(nagar)    base $\rho$ on adjusted Durbin–Watson
  corc              use Cochrane–Orcutt transformation
  ssesearch         search for $\rho$ that minimizes SSE
  twostep           stop after the first iteration
  noconstant        suppress constant term
  hascons           has user-defined constant
  savespace         conserve memory during estimation

SE/Robust
  vce(vcetype)      vcetype may be ols, robust, cluster clustvar, hc2, or hc3

Reporting
  level(#)          set confidence level; default is level(95)
  nodw              do not report the Durbin–Watson statistic
  display_options   control column formats, row spacing, line width, display of omitted
                    variables and base and empty cells, and factor-variable labeling

Optimization
  optimize_options  control the optimization process; seldom used

  coeflegend        display legend instead of statistics

You must tsset your data before using prais; see [TS] tsset.
indepvars may contain factor variables; see [U] 11.4.3 Factor variables.
depvar and indepvars may contain time-series operators; see [U] 11.4.4 Time-series varlists.
by, fp, rolling, and statsby are allowed; see [U] 11.1.10 Prefix commands.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.


Menu
Statistics > Time series > Prais-Winsten regression

Description
prais uses the generalized least-squares method to estimate the parameters in a linear regression
model in which the errors are serially correlated. Specifically, the errors are assumed to follow a
first-order autoregressive process.

Options


Model

rhotype(rhomethod) selects a specific computation for the autocorrelation parameter $\rho$, where
rhomethod can be

    regress   $\rho_{\text{reg}} = \beta$ from the residual regression $\epsilon_t = \beta\epsilon_{t-1}$
    freg      $\rho_{\text{freg}} = \beta$ from the residual regression $\epsilon_t = \beta\epsilon_{t+1}$
    tscorr    $\rho_{\text{tscorr}} = \epsilon'\epsilon_{t-1}/\epsilon'\epsilon$, where $\epsilon$ is the vector of residuals
    dw        $\rho_{\text{dw}} = 1 - \text{dw}/2$, where dw is the Durbin–Watson d statistic
    theil     $\rho_{\text{theil}} = \rho_{\text{tscorr}}(N - k)/N$
    nagar     $\rho_{\text{nagar}} = (\rho_{\text{dw}} N^2 + k^2)/(N^2 - k^2)$

The prais estimator can use any consistent estimate of $\rho$ to transform the equation, and each
of these estimates meets that requirement. The default is regress, which produces the minimum
sum-of-squares solution (ssesearch option) for the Cochrane–Orcutt transformation; none of
these computations will produce the minimum sum-of-squares solution for the full Prais–Winsten
transformation. See Judge et al. (1985) for a discussion of each estimate of $\rho$.
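The six computations can be compared side by side on a single residual vector. The Python sketch below is only an illustration of the formulas listed for rhotype(); the function name is hypothetical, and it assumes that N is the number of residuals and k the number of estimated coefficients in the original regression.

```python
def rho_estimates(e, k):
    """Evaluate the six rhotype() formulas on a list of residuals e.

    Assumptions of this sketch: N = len(e), and k is the number of
    estimated coefficients in the regression that produced e.
    """
    N = len(e)
    cross = sum(e[t] * e[t - 1] for t in range(1, N))   # sum of e_t * e_{t-1}
    ss_all = sum(v * v for v in e)                      # e'e
    ss_lag = sum(v * v for v in e[:-1])                 # regressor e_{t-1}, t = 2..N
    ss_lead = sum(v * v for v in e[1:])                 # regressor e_{t+1}, t = 1..N-1
    dw = sum((e[t] - e[t - 1]) ** 2 for t in range(1, N)) / ss_all
    rho_tscorr = cross / ss_all
    rho_dw = 1 - dw / 2
    return {
        "regress": cross / ss_lag,
        "freg": cross / ss_lead,
        "tscorr": rho_tscorr,
        "dw": rho_dw,
        "theil": rho_tscorr * (N - k) / N,
        "nagar": (rho_dw * N ** 2 + k ** 2) / (N ** 2 - k ** 2),
    }


example = rho_estimates([1.0, -1.0, 1.0, -1.0], k=2)
```

On a short residual vector the six values can differ noticeably, which is one reason the choice matters mainly in small samples.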
corc specifies that the Cochrane–Orcutt transformation be used to estimate the equation. With this
option, the Prais–Winsten transformation of the first observation is not performed, and the first
observation is dropped when estimating the transformed equation; see Methods and formulas below.
ssesearch specifies that a search be performed for the value of $\rho$ that minimizes the sum-of-squared
errors of the transformed equation (Cochrane–Orcutt or Prais–Winsten transformation). The search
method is a combination of quadratic and modified bisection searches using golden sections.
twostep specifies that prais stop on the first iteration after the equation is transformed by $\rho$ (the
two-step efficient estimator). Although iterating these estimators to convergence is customary, they
are efficient at each step.
noconstant; see [R] estimation options.
hascons indicates that a user-defined constant, or a set of variables that in linear combination forms a
constant, has been included in the regression. For some computational concerns, see the discussion
in [R] regress.
savespace specifies that prais attempt to save as much space as possible by retaining only those
variables required for estimation. The original data are restored after estimation. This option is
rarely used and should be used only if there is insufficient space to fit a model without the option.


SE/Robust

vce(vcetype) specifies the estimator for the variance–covariance matrix of the estimator; see
[R] vce option.
vce(ols), the default, uses the standard variance estimator for ordinary least-squares regression.
vce(robust) specifies to use the Huber/White/sandwich estimator.
vce(cluster clustvar) specifies to use the intragroup correlation estimator.
vce(hc2) and vce(hc3) specify an alternative bias correction for the vce(robust) variance
calculation; for more information, see [R] regress. You may specify only one of vce(hc2),
vce(hc3), or vce(robust).
All estimates from prais are conditional on the estimated value of $\rho$. Robust variance estimates
here are robust only to heteroskedasticity and are not generally robust to misspecification of the
functional form or omitted variables. The estimation of the functional form is intertwined with
the estimation of $\rho$, and all estimates are conditional on $\rho$. Thus estimates cannot be robust to
misspecification of functional form. For these reasons, it is probably best to interpret vce(robust)
in the spirit of White's (1980) original paper on estimation of heteroskedastic-consistent covariance
matrices.

Reporting

level(#); see [R] estimation options.


nodw suppresses reporting of the Durbin–Watson statistic.
display_options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel, fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and
nolstretch; see [R] estimation options.

Optimization

 
optimize_options: iterate(#), [no]log, tolerance(#). iterate() specifies the maximum number of iterations. log/nolog specifies whether to show the iteration log. tolerance() specifies
the tolerance for the coefficient vector; tolerance(1e-6) is the default. These options are seldom
used.
The following option is available with prais but is not shown in the dialog box:
coeflegend; see [R] estimation options.

Remarks and examples


prais fits a linear regression of depvar on indepvars that is corrected for first-order serially correlated
residuals by using the Prais–Winsten (1954) transformed regression estimator, the Cochrane–Orcutt
(1949) transformed regression estimator, or a version of the search method suggested by Hildreth
and Lu (1960). Davidson and MacKinnon (1993) provide theoretical details on the three methods
(see pages 333–335 for the latter two and pages 343–351 for Prais–Winsten). See Becketti (2013)
for more examples showing how to use prais.
The most common autocorrelated error process is the first-order autoregressive process. Under this
assumption, the linear regression model can be written as

$$y_t = x_t\beta + u_t$$


where the errors satisfy

$$u_t = \rho u_{t-1} + e_t$$

and the $e_t$ are independently and identically distributed as $N(0, \sigma^2)$. The covariance matrix $\Psi$ of the
error term $u$ can then be written as

$$\Psi = \frac{\sigma^2}{1 - \rho^2}
\begin{bmatrix}
1 & \rho & \rho^2 & \cdots & \rho^{T-1} \\
\rho & 1 & \rho & \cdots & \rho^{T-2} \\
\rho^2 & \rho & 1 & \cdots & \rho^{T-3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\rho^{T-1} & \rho^{T-2} & \rho^{T-3} & \cdots & 1
\end{bmatrix}$$

The Prais–Winsten estimator is a generalized least-squares (GLS) estimator. The Prais–Winsten
method (as described in Judge et al. 1985) is derived from the AR(1) model for the error term described
above. Whereas the Cochrane–Orcutt method uses a lag definition and loses the first observation in
the iterative method, the Prais–Winsten method preserves that first observation. In small samples,
this can be a significant advantage.
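The link between the AR(1) covariance structure and the Prais–Winsten transformation can be verified numerically. The Python sketch below is an illustration only; it assumes the stationary AR(1) covariance $\sigma^2\rho^{|i-j|}/(1-\rho^2)$, and the matrix P and all function names are hypothetical. P scales the first observation by $\sqrt{1-\rho^2}$ and quasi-differences the rest, and the check is that P applied to the covariance matrix gives $\sigma^2$ times the identity.

```python
import math


def ar1_covariance(rho, sigma2, T):
    """T x T covariance matrix of a stationary AR(1) error (assumed form)."""
    scale = sigma2 / (1 - rho ** 2)
    return [[scale * rho ** abs(i - j) for j in range(T)] for i in range(T)]


def prais_winsten_matrix(rho, T):
    """Matrix form of the transformation: sqrt(1 - rho^2) on row 1, quasi-differences after."""
    P = [[0.0] * T for _ in range(T)]
    P[0][0] = math.sqrt(1 - rho ** 2)
    for t in range(1, T):
        P[t][t - 1] = -rho
        P[t][t] = 1.0
    return P


def matmul(A, B):
    """Plain nested-list matrix product."""
    return [[sum(A[i][m] * B[m][j] for m in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


psi = ar1_covariance(0.5, 2.0, 4)
P = prais_winsten_matrix(0.5, 4)
whitened = matmul(matmul(P, psi), transpose(P))   # should be 2.0 times the identity
```

Dropping the first row of P gives the Cochrane–Orcutt version, which whitens the remaining T − 1 observations but discards the information in the first one.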

Technical note
To fit a model with autocorrelated errors, you must specify your data as time series and have (or
create) a variable denoting the time at which an observation was collected. The data for the regression
should be equally spaced in time.

Example 1
Say that we wish to fit a time-series model of usr on idle but are concerned that the residuals
may be serially correlated. We will declare the variable t to represent time by typing
. use http://www.stata-press.com/data/r13/idle
. tsset t
time variable: t, 1 to 30
delta: 1 unit

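We can obtain Cochrane–Orcutt estimates by specifying the corc option:

. prais usr idle, corc
Iteration 0:  rho = 0.0000
Iteration 1:  rho = 0.3518
 (output omitted )
Iteration 13:  rho = 0.5708
Cochrane-Orcutt AR(1) regression -- iterated estimates
      Source |       SS       df       MS              Number of obs =      29
-------------+------------------------------           F(  1,    27) =    6.49
       Model |  40.1309584     1  40.1309584           Prob > F      =  0.0168
    Residual |  166.898474    27  6.18142498           R-squared     =  0.1938
-------------+------------------------------           Adj R-squared =  0.1640
       Total |  207.029433    28  7.39390831           Root MSE      =  2.4862
------------------------------------------------------------------------------
         usr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        idle |  -.1254511   .0492356    -2.55   0.017    -.2264742    -.024428
       _cons |   14.54641   4.272299     3.40   0.002      5.78038    23.31245
-------------+----------------------------------------------------------------
         rho |   .5707918
------------------------------------------------------------------------------
Durbin-Watson statistic (original)    1.295766
Durbin-Watson statistic (transformed) 1.466222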


The fitted model is

$$\text{usr}_t = -0.1255\,\text{idle}_t + 14.55 + u_t \qquad\text{and}\qquad u_t = 0.5708\,u_{t-1} + e_t$$

We can also fit the model with the Prais–Winsten method,
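. prais usr idle
Iteration 0:  rho = 0.0000
Iteration 1:  rho = 0.3518
 (output omitted )
Iteration 14:  rho = 0.5535
Prais-Winsten AR(1) regression -- iterated estimates
      Source |       SS       df       MS              Number of obs =      30
-------------+------------------------------           F(  1,    28) =    7.12
       Model |  43.0076941     1  43.0076941           Prob > F      =  0.0125
    Residual |  169.165739    28  6.04163354           R-squared     =  0.2027
-------------+------------------------------           Adj R-squared =  0.1742
       Total |  212.173433    29  7.31632528           Root MSE      =   2.458
------------------------------------------------------------------------------
         usr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        idle |  -.1356522   .0472195    -2.87   0.008    -.2323769   -.0389275
       _cons |   15.20415   4.160391     3.65   0.001     6.681978    23.72633
-------------+----------------------------------------------------------------
         rho |   .5535476
------------------------------------------------------------------------------
Durbin-Watson statistic (original)    1.295766
Durbin-Watson statistic (transformed) 1.476004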

where the Prais–Winsten fitted model is

$$\text{usr}_t = -.1357\,\text{idle}_t + 15.20 + u_t \qquad\text{and}\qquad u_t = .5535\,u_{t-1} + e_t$$

As the results indicate, for these data there is little difference between the Cochrane–Orcutt and
Prais–Winsten estimators, whereas the OLS estimate of the slope parameter is substantially different.

Example 2
We have data on quarterly sales, in millions of dollars, for 5 years, and we would like to use
this information to model sales for company X. First, we fit a linear model by OLS and obtain the
Durbin–Watson statistic by using estat dwatson; see [R] regress postestimation time series.
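. use http://www.stata-press.com/data/r13/qsales
. regress csales isales
      Source |       SS       df       MS              Number of obs =      20
-------------+------------------------------           F(  1,    18) =14888.15
       Model |  110.256901     1  110.256901           Prob > F      =  0.0000
    Residual |  .133302302    18  .007405683           R-squared     =  0.9988
-------------+------------------------------           Adj R-squared =  0.9987
       Total |  110.390204    19  5.81001072           Root MSE      =  .08606
------------------------------------------------------------------------------
      csales |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      isales |   .1762828   .0014447   122.02   0.000     .1732475    .1793181
       _cons |  -1.454753   .2141461    -6.79   0.000    -1.904657   -1.004849
------------------------------------------------------------------------------
. estat dwatson
Durbin-Watson d-statistic(  2,    20) =  .7347276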


Because the Durbin–Watson statistic is far from 2 (the expected value under the null hypothesis of
no serial correlation) and well below the 5% lower limit of 1.2, we conclude that the disturbances are
serially correlated. (Upper and lower bounds for the d statistic can be found in most econometrics
texts; for example, Harvey [1990]. The bounds have been derived for only a limited combination of
regressors and observations.) To reinforce this conclusion, we use two other tests to test for serial
correlation in the error distribution.
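. estat bgodfrey, lags(1)
Breusch-Godfrey LM test for autocorrelation
---------------------------------------------------------------------------
    lags(p)  |          chi2               df                 Prob > chi2
-------------+-------------------------------------------------------------
       1     |          7.998               1                   0.0047
---------------------------------------------------------------------------
                        H0: no serial correlation

. estat durbinalt
Durbin's alternative test for autocorrelation
---------------------------------------------------------------------------
    lags(p)  |          chi2               df                 Prob > chi2
-------------+-------------------------------------------------------------
       1     |         11.329               1                   0.0008
---------------------------------------------------------------------------
                        H0: no serial correlation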

estat bgodfrey reports the Breusch–Godfrey Lagrange multiplier test statistic, and estat
durbinalt reports Durbin's alternative test statistic. Both tests give a small p-value and thus
reject the null hypothesis of no serial correlation. These two tests are asymptotically equivalent when
testing for an AR(1) process. See [R] regress postestimation time series if you are not familiar with
these two tests.
We correct for autocorrelation with the ssesearch option of prais to search for the value of $\rho$
that minimizes the sum-of-squared residuals of the Cochrane–Orcutt transformed equation. Normally,
the default Prais–Winsten transformation is used with such a small dataset, but the less-efficient
Cochrane–Orcutt transformation allows us to demonstrate an aspect of the estimator's convergence.
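. prais csales isales, corc ssesearch
Iteration 1:  rho = 0.8944 , criterion = -.07298558
Iteration 2:  rho = 0.8944 , criterion = -.07298558
 (output omitted )
Iteration 15:  rho = 0.9588 , criterion = -.07167037
Cochrane-Orcutt AR(1) regression -- SSE search estimates
      Source |       SS       df       MS              Number of obs =      19
-------------+------------------------------           F(  1,    17) =  553.14
       Model |  2.33199178     1  2.33199178           Prob > F      =  0.0000
    Residual |  .071670369    17  .004215904           R-squared     =  0.9702
-------------+------------------------------           Adj R-squared =  0.9684
       Total |  2.40366215    18  .133536786           Root MSE      =  .06493
------------------------------------------------------------------------------
      csales |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      isales |   .1605233   .0068253    23.52   0.000     .1461233    .1749234
       _cons |   1.738946   1.432674     1.21   0.241    -1.283732    4.761624
-------------+----------------------------------------------------------------
         rho |   .9588209
------------------------------------------------------------------------------
Durbin-Watson statistic (original)    0.734728
Durbin-Watson statistic (transformed) 1.724419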


We noted in Options that, with the default computation of $\rho$, the Cochrane–Orcutt method produces
an estimate of $\rho$ that minimizes the sum-of-squared residuals, the same criterion as the ssesearch
option. Given that the two methods produce the same results, why would the search method ever be
preferred? It turns out that the back-and-forth iterations used by Cochrane–Orcutt may have difficulty
converging if the value of $\rho$ is large. Using the same data, the Cochrane–Orcutt iterative procedure
requires more than 350 iterations to converge, and a tighter tolerance must be specified to prevent
premature convergence:
. prais csales isales, corc tol(1e-9) iterate(500)
Iteration 0:  rho = 0.0000
Iteration 1:  rho = 0.6312
Iteration 2:  rho = 0.6866
 (output omitted )
Iteration 377:  rho = 0.9588
Iteration 378:  rho = 0.9588
Iteration 379:  rho = 0.9588
Cochrane-Orcutt AR(1) regression -- iterated estimates
      Source |       SS       df       MS              Number of obs =      19
-------------+------------------------------           F(  1,    17) =  553.14
       Model |  2.33199171     1  2.33199171           Prob > F      =  0.0000
    Residual |  .071670369    17  .004215904           R-squared     =  0.9702
-------------+------------------------------           Adj R-squared =  0.9684
       Total |  2.40366208    18  .133536782           Root MSE      =  .06493
------------------------------------------------------------------------------
      csales |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      isales |   .1605233   .0068253    23.52   0.000     .1461233    .1749234
       _cons |   1.738946   1.432674     1.21   0.241    -1.283732    4.761625
-------------+----------------------------------------------------------------
         rho |   .9588209
------------------------------------------------------------------------------
Durbin-Watson statistic (original)    0.734728
Durbin-Watson statistic (transformed) 1.724419

Once convergence is achieved, the two methods produce identical results.


Stored results
prais stores the following in e():
Scalars
    e(N)              number of observations
    e(N_gaps)         number of gaps
    e(mss)            model sum of squares
    e(df_m)           model degrees of freedom
    e(rss)            residual sum of squares
    e(df_r)           residual degrees of freedom
    e(r2)             $R^2$
    e(r2_a)           adjusted $R^2$
    e(F)              F statistic
    e(rmse)           root mean squared error
    e(ll)             log likelihood
    e(N_clust)        number of clusters
    e(rho)            autocorrelation parameter $\rho$
    e(dw)             Durbin–Watson d statistic for transformed regression
    e(dw_0)           Durbin–Watson d statistic of untransformed regression
    e(rank)           rank of e(V)
    e(tol)            target tolerance
    e(max_ic)         maximum number of iterations
    e(ic)             number of iterations
Macros
    e(cmd)            prais
    e(cmdline)        command as typed
    e(depvar)         name of dependent variable
    e(title)          title in estimation output
    e(clustvar)       name of cluster variable
    e(cons)           noconstant or not reported
    e(method)         twostep, iterated, or SSE search
    e(tranmeth)       corc or prais
    e(rhotype)        method specified in rhotype() option
    e(vce)            vcetype specified in vce()
    e(vcetype)        title used to label Std. Err.
    e(properties)     b V
    e(predict)        program used to implement predict
    e(marginsok)      predictions allowed by margins
    e(asbalanced)     factor variables fvset as asbalanced
    e(asobserved)     factor variables fvset as asobserved
Matrices
    e(b)              coefficient vector
    e(V)              variance–covariance matrix of the estimators
    e(V_modelbased)   model-based variance
Functions
    e(sample)         estimation sample

Methods and formulas


Consider the command prais y x z. The 0th iteration is obtained by estimating a, b, and c
from the standard linear regression:

$$y_t = a x_t + b z_t + c + u_t$$

An estimate of the correlation in the residuals is then obtained. By default, prais uses the auxiliary
regression:

$$u_t = \rho u_{t-1} + e_t$$

This can be changed to any computation noted in the rhotype() option.


Next we apply a Cochrane–Orcutt transformation (1) for observations $t = 2, \ldots, n$

$$y_t - \rho y_{t-1} = a(x_t - \rho x_{t-1}) + b(z_t - \rho z_{t-1}) + c(1 - \rho) + v_t \qquad (1)$$

and the transformation (1′) for $t = 1$

$$\sqrt{1-\rho^2}\,y_1 = a\left(\sqrt{1-\rho^2}\,x_1\right) + b\left(\sqrt{1-\rho^2}\,z_1\right) + c\sqrt{1-\rho^2} + \sqrt{1-\rho^2}\,v_1 \qquad (1')$$

Thus the differences between the Cochrane–Orcutt and the Prais–Winsten methods are that the latter
uses (1′) in addition to (1), whereas the former uses only (1), necessarily decreasing the sample size
by one.
Equations (1) and (1′) are used to transform the data and obtain new estimates of a, b, and c.
When the twostep option is specified, the estimation process stops at this point and reports these
estimates. Under the default behavior of iterating to convergence, this process is repeated until the
change in the estimate of $\rho$ is within a specified tolerance.
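As a small illustration of the two transformations, the Python sketch below applies them to a single series for a given value of $\rho$. It is not the code prais runs; the function name and the corc flag are hypothetical, chosen to mirror the option of the same name. The same function is applied to each variable, including the column of ones that carries the constant.

```python
import math


def transform(series, rho, corc=False):
    """Quasi-difference observations 2..n; unless corc is True, also keep
    the first observation scaled by sqrt(1 - rho^2)."""
    diffs = [series[t] - rho * series[t - 1] for t in range(1, len(series))]
    if corc:
        return diffs                                   # Cochrane-Orcutt: n - 1 observations
    first = math.sqrt(1 - rho ** 2) * series[0]
    return [first] + diffs                             # Prais-Winsten: all n observations


y = [2.0, 3.0, 5.0, 4.0]
const = [1.0] * len(y)

y_pw = transform(y, 0.6)               # Prais-Winsten version of y
y_co = transform(y, 0.6, corc=True)    # Cochrane-Orcutt version of y
c_pw = transform(const, 0.6)           # transformed constant: sqrt(1 - rho^2), then 1 - rho
```

The transformed constant is no longer a column of ones, which is why the regression on the transformed data is fit without an additional intercept.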
The new estimates are used to produce fitted values

$$\hat y_t = \hat a x_t + \hat b z_t + \hat c$$

and then $\rho$ is reestimated using, by default, the regression defined by

$$y_t - \hat y_t = \rho\,(y_{t-1} - \hat y_{t-1}) + u_t \qquad (2)$$

We then reestimate (1) by using the new estimate of $\rho$ and continue to iterate between (1) and (2)
until the estimate of $\rho$ converges.
Convergence is declared after iterate() iterations or when the absolute difference in the estimated
correlation between two iterations is less than tol(); see [R] maximize. Sargan (1964) has shown
that this process will always converge.
Under the ssesearch option, a combined quadratic and bisection search using golden sections
searches for the value of $\rho$ that minimizes the sum-of-squared residuals from the transformed equation.
The transformation may be either the Cochrane–Orcutt (1 only) or the Prais–Winsten (1 and 1′).
All reported statistics are based on the $\rho$-transformed variables, and $\rho$ is assumed to be estimated
without error. See Judge et al. (1985) for details.
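The back-and-forth iteration between the transformed regression and the residual regression can be sketched in a few lines of Python. This is a simplified illustration, not prais itself: it handles one regressor plus a constant, uses the Cochrane–Orcutt transformation only, estimates $\rho$ by the single-lag residual regression, and all function names, the stopping rule details, and the default limits are assumptions of the sketch.

```python
def ols(x, y, z):
    """Least squares of y on x and z with no extra intercept.

    z is the (possibly transformed) constant column. Returns (a, c).
    """
    sxx = sum(v * v for v in x)
    szz = sum(v * v for v in z)
    sxz = sum(p * q for p, q in zip(x, z))
    sxy = sum(p * q for p, q in zip(x, y))
    szy = sum(p * q for p, q in zip(z, y))
    det = sxx * szz - sxz ** 2
    return (sxy * szz - szy * sxz) / det, (szy * sxx - sxy * sxz) / det


def cochrane_orcutt(x, y, tol=1e-6, max_iter=100):
    """Iterate between the transformed regression and the residual regression."""
    n = len(y)
    a, c = ols(x, y, [1.0] * n)          # iteration 0: untransformed regression
    rho = 0.0
    for iteration in range(1, max_iter + 1):
        # residuals on the original scale, then the single-lag residual regression
        u = [y[t] - a * x[t] - c for t in range(n)]
        new_rho = (sum(u[t] * u[t - 1] for t in range(1, n))
                   / sum(u[t - 1] ** 2 for t in range(1, n)))
        converged = abs(new_rho - rho) < tol
        rho = new_rho
        if converged:
            break
        # refit on quasi-differenced data for t = 2..n (first observation dropped)
        xs = [x[t] - rho * x[t - 1] for t in range(1, n)]
        ys = [y[t] - rho * y[t - 1] for t in range(1, n)]
        zs = [1.0 - rho] * (n - 1)
        a, c = ols(xs, ys, zs)
    return a, c, rho, iteration
```

Calling the function with max_iter=1 stops after one pass through the residual regression and one refit, which corresponds in spirit to the twostep option; larger limits let the estimate of $\rho$ settle.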
The Durbin–Watson d statistic reported by prais and estat dwatson is

$$d = \frac{\sum_{j=1}^{n-1} (u_{j+1} - u_j)^2}{\sum_{j=1}^{n} u_j^2}$$

where $u_j$ represents the residual of the $j$th observation.
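The d statistic is simple enough to compute directly from a residual vector. The following Python sketch is an illustration only; the function name is hypothetical.

```python
def durbin_watson(u):
    """Sum of squared successive differences of the residuals over their sum of squares."""
    numerator = sum((u[j + 1] - u[j]) ** 2 for j in range(len(u) - 1))
    denominator = sum(v * v for v in u)
    return numerator / denominator


d_alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])   # strongly negative autocorrelation
d_constant = durbin_watson([1.0, 1.0, 1.0, 1.0])        # residuals that never change sign
```

Residuals that alternate in sign push d above 2, and residuals that move together push it toward 0, which matches the reading of d values far below 2 as evidence of positive serial correlation.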


This command supports the Huber/White/sandwich estimator of the variance and its clustered
version using vce(robust) and vce(cluster clustvar), respectively. See [P] robust, particularly
Introduction and Methods and formulas.


All estimates from prais are conditional on the estimated value of $\rho$. Robust variance estimates here
are robust only to heteroskedasticity and are not generally robust to misspecification of the functional
form or omitted variables. The estimation of the functional form is intertwined with the estimation
of $\rho$, and all estimates are conditional on $\rho$. Thus estimates cannot be robust to misspecification
of functional form. For these reasons, it is probably best to interpret vce(robust) in the spirit of
White's original paper on estimation of heteroskedastic-consistent covariance matrices.

Acknowledgment
We thank Richard Dickens of the Centre for Economic Performance at the London School of
Economics and Political Science for testing and assistance with an early version of this command.


Sigbert Jon Prais (1928–2014) was born in Frankfurt and moved to Britain in 1934 as a refugee.
After earning degrees at the universities of Birmingham and Cambridge and serving in various
posts in research and industry, he settled at the National Institute of Economic and Social
Research. Prais's interests extended widely across economics, including studies of the influence
of education on economic progress.
Christopher Blake Winsten (1923–2005) was born in Welwyn Garden City, England, the son
of the writer Stephen Winsten and the painter and sculptress Clare Blake. He was educated
at the University of Cambridge and worked with the Cowles Commission at the University of
Chicago and at the universities of Oxford, London (Imperial College), and Essex, making many
contributions to economics and statistics, including the Prais–Winsten transformation and joint
authorship of a celebrated monograph on transportation economics.
Donald Cochrane (1917–1983) was an Australian economist and econometrician. He was born
in Melbourne and earned degrees at Melbourne and Cambridge. After wartime service in the
Royal Australian Air Force, he held chairs at Melbourne and Monash, being active also in work
for various international organizations and national committees.

Guy Henderson Orcutt (1917– ) was born in Michigan and earned degrees in physics and
economics at the University of Michigan. He worked at Harvard, the University of Wisconsin,
and Yale. He has contributed to econometrics and economics in several fields, most distinctively
in developing microanalytical models of economic behavior.

References
Becketti, S. 2013. Introduction to Time Series Using Stata. College Station, TX: Stata Press.
Cochrane, D., and G. H. Orcutt. 1949. Application of least squares regression to relationships containing auto-correlated
error terms. Journal of the American Statistical Association 44: 32–61.
Davidson, R., and J. G. MacKinnon. 1993. Estimation and Inference in Econometrics. New York: Oxford University
Press.
Durbin, J., and G. S. Watson. 1950. Testing for serial correlation in least squares regression. I. Biometrika 37:
409–428.
Durbin, J., and G. S. Watson. 1951. Testing for serial correlation in least squares regression. II. Biometrika 38: 159–177.
Hardin, J. W. 1995. sts10: Prais–Winsten regression. Stata Technical Bulletin 25: 26–29. Reprinted in Stata Technical
Bulletin Reprints, vol. 5, pp. 234–237. College Station, TX: Stata Press.
Harvey, A. C. 1990. The Econometric Analysis of Time Series. 2nd ed. Cambridge, MA: MIT Press.
Hildreth, C., and J. Y. Lu. 1960. Demand relations with autocorrelated disturbances. Reprinted in Agricultural
Experiment Station Technical Bulletin, No. 276. East Lansing, MI: Michigan State University Press.
Judge, G. G., W. E. Griffiths, R. C. Hill, H. Lütkepohl, and T.-C. Lee. 1985. The Theory and Practice of Econometrics.
2nd ed. New York: Wiley.
King, M. L., and D. E. A. Giles, ed. 1987. Specification Analysis in the Linear Model: Essays in Honor of Donald
Cochrane. London: Routledge & Kegan Paul.
Kmenta, J. 1997. Elements of Econometrics. 2nd ed. Ann Arbor: University of Michigan Press.
Prais, S. J., and C. B. Winsten. 1954. Trend estimators and serial correlation. Working paper 383, Cowles Commission.
http://cowles.econ.yale.edu/P/ccdp/st/s-0383.pdf.
Sargan, J. D. 1964. Wages and prices in the United Kingdom: A study in econometric methodology. In Econometric
Analysis for National Economic Planning, ed. P. E. Hart, G. Mills, and J. K. Whitaker, 25–64. London: Butterworths.
Theil, H. 1971. Principles of Econometrics. New York: Wiley.
White, H. L., Jr. 1980. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity.
Econometrica 48: 817–838.
Wooldridge, J. M. 2013. Introductory Econometrics: A Modern Approach. 5th ed. Mason, OH: South-Western.
Zellner, A. 1990. Guy H. Orcutt: Contributions to economic statistics. Journal of Economic Behavior and Organization
14: 43–51.

Also see
[TS] prais postestimation Postestimation tools for prais
[TS] tsset Declare data to be time-series data
[TS] arima ARIMA, ARMAX, and other dynamic regression models
[R] regress Linear regression
[R] regress postestimation time series Postestimation tools for regress with time series
[U] 20 Estimation and postestimation commands

Title
prais postestimation Postestimation tools for prais

Description

Syntax for predict

Menu for predict

Options for predict

Also see

Description
The following standard postestimation commands are available after prais:
Command           Description

contrast          contrasts and ANOVA-style joint tests of estimates
estat ic          Akaike's and Schwarz's Bayesian information criteria (AIC and BIC)
estat summarize   summary statistics for the estimation sample
estat vce         variance–covariance matrix of the estimators (VCE)
estimates         cataloging estimation results
forecast          dynamic forecasts and simulations
lincom            point estimates, standard errors, testing, and inference for linear combinations
                  of coefficients
linktest          link test for model specification
margins           marginal means, predictive margins, marginal effects, and average marginal
                  effects
marginsplot       graph the results from margins (profile plots, interaction plots, etc.)
nlcom             point estimates, standard errors, testing, and inference for nonlinear combinations
                  of coefficients
predict           predictions, residuals, influence statistics, and other diagnostic measures
predictnl         point estimates, standard errors, testing, and inference for generalized predictions
pwcompare         pairwise comparisons of estimates
test              Wald tests of simple and composite linear hypotheses
testnl            Wald tests of nonlinear hypotheses

Syntax for predict


predict [type] newvar [if] [in] [, statistic]

statistic     Description

Main
  xb          linear prediction; the default
  stdp        standard error of the linear prediction
  residuals   residuals

These statistics are available both in and out of sample; type predict ... if e(sample) ... if wanted only for
the estimation sample.


Menu for predict


Statistics > Postestimation > Predictions, residuals, etc.

Options for predict




Main

xb, the default, calculates the fitted values, that is, the prediction of $x_j b$ for the specified equation. This is
the linear predictor from the fitted regression model; it does not apply the estimate of $\rho$ to prior
residuals.
stdp calculates the standard error of the prediction for the specified equation, that is, the standard
error of the predicted expected value or mean for the observation's covariate pattern. The standard
error of the prediction is also referred to as the standard error of the fitted value.
As computed for prais, this is strictly the standard error from the variance in the estimates of
the parameters of the linear model and assumes that $\rho$ is estimated without error.
residuals calculates the residuals from the linear prediction.

Also see
[TS] prais Prais-Winsten and Cochrane-Orcutt regression
[U] 20 Estimation and postestimation commands

Title
psdensity Parametric spectral density estimation after arima, arfima, and ucm
Syntax
Remarks and examples

Menu
Methods and formulas

Description
References

Options
Also see

Syntax
psdensity [type] newvarsd newvarf [if] [in] [, options]

where newvarsd is the name of the new variable that will contain the estimated spectral density and
newvarf is the name of the new variable that will contain the frequencies at which the spectral density
estimate is computed.
options        Description
-------------------------------------------------------------------------------
pspectrum      estimate the power spectrum rather than the spectral density
range(a b)     limit the frequency range to [a, b)
cycle(#)       estimate the spectral density from the specified stochastic cycle; only allowed
               after ucm
smemory        estimate the spectral density of the short-memory component of the ARFIMA
               process; only allowed after arfima
-------------------------------------------------------------------------------

Menu
Statistics > Time series > Postestimation > Parametric spectral density

Description
psdensity estimates the spectral density of a stationary process using the parameters of a previously
estimated parametric model.
psdensity works after arfima, arima, and ucm.

Options
pspectrum causes psdensity to estimate the power spectrum rather than the spectral density. The
power spectrum is equal to the spectral density times the variance of the process.
range(a b) limits the frequency range. By default, the spectral density is computed over $[0, \pi)$.
Specifying range(a b) causes the spectral density to be computed over $[a, b)$. We require that
$0 \le a < b < \pi$.
cycle(#) causes psdensity to estimate the spectral density from the specified stochastic cycle after
ucm. By default, the spectral density from the first stochastic cycle is estimated. cycle(#) must
specify an integer that corresponds to a cycle in the model fit by ucm.
smemory causes psdensity to ignore the ARFIMA fractional integration parameter. The spectral
density computed is for the short-memory ARMA component of the model.

Remarks and examples


Remarks are presented under the following headings:
The frequency-domain approach to time series
Some ARMA examples

The frequency-domain approach to time series


A stationary process can be decomposed into random components that occur at the frequencies
$\omega \in [0, \pi]$. The spectral density of a stationary process describes the relative importance of these
random components. psdensity uses the estimated parameters of a parametric model to estimate
the spectral density of a stationary process.
We need some concepts from the frequency-domain approach to time-series analysis to interpret
estimated spectral densities. Here we provide a simple, intuitive explanation. More technical presentations can be found in Priestley (1981), Harvey (1989, 1993), Hamilton (1994), Fuller (1996), and
Wei (2006).
In the time domain, the dependent variable evolves over time because of random shocks. The
autocovariances $\gamma_j$, $j \in \{0, 1, \ldots, \infty\}$, of a covariance-stationary process $y_t$ specify its variance and
dependence structure, and the autocorrelations $\rho_j$, $j \in \{1, 2, \ldots, \infty\}$, provide a scale-free measure
of its dependence structure. The autocorrelation at lag $j$ specifies whether realizations at time $t$ and
realizations at time $t - j$ are positively related, unrelated, or negatively related.
In the frequency domain, the dependent variable is generated by an infinite number of random
components that occur at the frequencies $\omega \in [0, \pi]$. The spectral density specifies the relative
importance of these random components. The area under the spectral density in the interval $(\omega, \omega + d\omega)$
is the fraction of the variance of the process that can be attributed to the random components that
occur at the frequencies in the interval $(\omega, \omega + d\omega)$.
The spectral density and the autocorrelations provide the same information about the dependence
structure, albeit in different domains. The spectral density can be written as a weighted average of
the autocorrelations of yt , and it can be inverted to retrieve the autocorrelations as a function of the
spectral density.
Like autocorrelations, the spectral density is normalized by $\gamma_0$, the variance of $y_t$. Multiplying the
spectral density by $\gamma_0$ yields the power spectrum of $y_t$, which changes with the units of $y_t$.
A peak in the spectral density around frequency $\omega$ implies that the random components around
$\omega$ make an important contribution to the variance of $y_t$.
A random variable primarily generated by low-frequency components will tend to have more runs
above or below its mean than an independent and identically distributed (i.i.d.) random variable, and
its plot may look smoother than the plot of the i.i.d. variable. A random variable primarily generated
by high-frequency components will tend to have fewer runs above or below its mean than an i.i.d.
random variable, and its plot may look more jagged than the plot of the i.i.d. variable.

Technical note
A more formal specification of the spectral density allows us to be more specific about how the
spectral density specifies the relative importance of the random components.
If $y_t$ is a covariance-stationary process with absolutely summable autocovariances, its spectrum is
given by

\[ g_y(\omega) = \frac{1}{2\pi}\gamma_0 + \frac{1}{\pi}\sum_{k=1}^{\infty} \gamma_k \cos(\omega k) \tag{1} \]

where $g_y(\omega)$ is the spectrum of $y_t$ at frequency $\omega$ and $\gamma_k$ is the $k$th autocovariance of $y_t$. Taking
the inverse Fourier transform of each side of (1) yields

\[ \gamma_k = \int_{-\pi}^{\pi} g_y(\omega) e^{i\omega k}\, d\omega \tag{2} \]

where $i$ is the imaginary number $i = \sqrt{-1}$.

Evaluating (2) at $k = 0$ yields

\[ \gamma_0 = \int_{-\pi}^{\pi} g_y(\omega)\, d\omega \]

which means that the variance of $y_t$ can be decomposed in terms of the spectrum $g_y(\omega)$. In particular,
$g_y(\omega)\,d\omega$ is the contribution to the variance of $y_t$ attributable to the random components in the interval
$(\omega, \omega + d\omega)$.
The spectrum depends on the units in which $y_t$ is measured, because it depends on $\gamma_0$. Dividing
both sides of (1) by $\gamma_0$ gives us the scale-free spectral density of $y_t$:

\[ f_y(\omega) = \frac{1}{2\pi} + \frac{1}{\pi}\sum_{k=1}^{\infty} \rho_k \cos(\omega k) \]

By construction,

\[ \int_{-\pi}^{\pi} f_y(\omega)\, d\omega = 1 \]

so $f_y(\omega)\,d\omega$ is the fraction of the variance of $y_t$ attributable to the random components in the interval
$(\omega, \omega + d\omega)$.
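The normalization in the technical note can be checked numerically. The following is a minimal Python sketch, written outside Stata and not part of psdensity: it truncates the cosine series for the spectral density and integrates it with a midpoint rule. The AR(1) autocorrelations $\rho_k = 0.5^k$, the truncation point, and the helper names are assumptions chosen only for illustration.

```python
import math

def density_from_autocorr(omega, rho, n_terms=60):
    """Truncated series 1/(2*pi) + (1/pi) * sum_{k>=1} rho(k) * cos(omega * k)."""
    total = sum(rho(k) * math.cos(omega * k) for k in range(1, n_terms + 1))
    return 1.0 / (2.0 * math.pi) + total / math.pi

def integrate(func, a, b, n=2000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

phi = 0.5                      # hypothetical AR(1) coefficient (assumption)
rho = lambda k: phi ** k       # AR(1) autocorrelations

def f(omega):
    return density_from_autocorr(omega, rho)

# Area over the full range [-pi, pi]: the fraction of variance from all frequencies.
area = integrate(f, -math.pi, math.pi)
# Area over the low-frequency half of the range, |omega| < pi/2.
low_band = integrate(f, -math.pi / 2, math.pi / 2)

print("total area:", round(area, 6))
print("share from |omega| < pi/2:", round(low_band, 4))
```

With a positive coefficient, the low-frequency band should account for more than half of the variance, which is the sense in which the spectral density ranks the random components.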

Some ARMA examples


In this section, we estimate and interpret the spectral densities implied by the estimated ARMA
parameters. The examples illustrate some of the essential relationships between covariance-stationary
processes, the parameters of ARMA models, and the spectral densities implied by the ARMA-model
parameters.
See [TS] ucm for a discussion of unobserved-components models and the stochastic-cycle model
derived by Harvey (1989) for stationary processes. The stochastic-cycle model has a different parameterization of the spectral density, and it tends to produce spectral densities that look more like
probability densities than ARMA models. See Remarks and examples in [TS] ucm for an introduction
to these models, some examples, and some comparisons between the stochastic-cycle model and
ARMA models.


Example 1
Let's consider the changes in the number of manufacturing employees in the United States, which
we plot below.

. use http://www.stata-press.com/data/r13/manemp2
(FRED data: Number of manufacturing employees in U.S.)
. tsline D.manemp, yline(-0.206)

(figure omitted: time-series line plot of "Change in number of mfg. employees, D" against Month,
1950m1 to 2010m1)

We added a horizontal line at the sample mean of -0.0206 to highlight that there appear to be
more runs above or below the mean than we would expect in data generated by an i.i.d. process.
As a first pass at modeling this dependence, we use arima to estimate the parameters of a first-order
autoregressive (AR(1)) model. Formally, the AR(1) model is given by

\[ y_t = \alpha y_{t-1} + \epsilon_t \]
where $y_t$ is the dependent variable, $\alpha$ is the autoregressive coefficient, and $\epsilon_t$ is an i.i.d. error term.
See [TS] arima for an introduction to ARMA modeling and the arima command.

. arima D.manemp, ar(1) noconstant
(setting optimization to BHHH)
Iteration 0:   log likelihood = -870.64844
Iteration 1:   log likelihood = -870.64794
Iteration 2:   log likelihood = -870.64789
Iteration 3:   log likelihood = -870.64787
Iteration 4:   log likelihood = -870.64786
(switching optimization to BFGS)
Iteration 5:   log likelihood = -870.64786
Iteration 6:   log likelihood = -870.64786
ARIMA regression
Sample:  1950m2 - 2011m2                        Number of obs      =       733
                                                Wald chi2(1)       =    730.51
Log likelihood = -870.6479                      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
             |                 OPG
    D.manemp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ARMA         |
          ar |
         L1. |   .5179561   .0191638    27.03   0.000     .4803959    .5555164
-------------+----------------------------------------------------------------
      /sigma |   .7934554   .0080636    98.40   0.000      .777651    .8092598
------------------------------------------------------------------------------
Note: The test of the variance against zero is one sided, and the two-sided
      confidence interval is truncated at zero.

The statistically significant estimate of 0.518 for the autoregressive coefficient indicates that there
is an important amount of positive autocorrelation in this series.
The spectral density of a covariance-stationary process is symmetric around 0. Following convention,
psdensity estimates the spectral density over the interval $[0, \pi)$ at the points given in Methods and
formulas.
Now we use psdensity to estimate the spectral density of the process implied by the estimated
ARMA parameters. We specify the names of two new variables in the call to psdensity. The first
new variable will contain the estimated spectral density. The second new variable will contain the
frequencies at which the spectral density is estimated.

. psdensity psden1 omega
. line psden1 omega

(figure omitted: line plot of "ARMA spectral density" against Frequency)

The above graph is typical of a spectral density of an AR(1) process with a positive coefficient. The
curve is highest at frequency 0, and it tapers off toward zero or a positive asymptote. The estimated
spectral density is telling us that the low-frequency random components are the most important random
components of an AR(1) process with a positive autoregressive coefficient.

The closer $\alpha$ is to 1, the more important are the low-frequency components relative to the
high-frequency components. To illustrate this point, we plot the spectral densities implied by AR(1)
models with $\alpha = 0.1$ and $\alpha = 0.9$.

(figure omitted: two panels, "Spectral Density ($\alpha$=.1)" and "Spectral Density ($\alpha$=.9)", each plotted
against Frequency)

As $\alpha$ gets closer to 1, the plot of the spectral density gets closer to being a spike at frequency 0,
implying that only the lowest-frequency components are important.
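To make the comparison concrete, here is a small Python sketch (an illustration outside Stata, not what psdensity runs). It assumes the standard closed form of the normalized AR(1) spectral density, $f(\omega) = (1 - \alpha^2) / \{2\pi(1 - 2\alpha\cos\omega + \alpha^2)\}$, which follows from the ARMA formula in Methods and formulas with $p = 1$, $q = 0$, and $\gamma_0 = \sigma^2_\epsilon/(1 - \alpha^2)$. The function name is hypothetical.

```python
import math

def ar1_density(omega, alpha):
    """Normalized spectral density of a stationary AR(1) process (|alpha| < 1)."""
    return (1 - alpha ** 2) / (2 * math.pi * (1 - 2 * alpha * math.cos(omega) + alpha ** 2))

# Compare the weight at the lowest and highest frequencies for several coefficients,
# including the estimate of 0.518 reported in example 1.
for alpha in (0.1, 0.518, 0.9):
    at_zero = ar1_density(0.0, alpha)
    at_pi = ar1_density(math.pi, alpha)
    print(f"alpha={alpha}: f(0)={at_zero:.4f}  f(pi)={at_pi:.4f}  f(0)/f(pi)={at_zero / at_pi:.1f}")
```

The ratio f(0)/f(pi) equals $\{(1 + \alpha)/(1 - \alpha)\}^2$ under this formula, so it grows without bound as $\alpha$ approaches 1.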

Example 2
Now let's consider a dataset for which the estimated coefficient from an AR(1) model is negative.
Below we plot the changes in initial claims for unemployment insurance in the United States.

. use http://www.stata-press.com/data/r13/icsa1, clear
. tsline D.icsa, yline(0.08)

(figure omitted: time-series line plot of "Change in initial claims, D" against Date,
01jan1970 to 01jan2010)

The plot looks a little more jagged than we would expect from an i.i.d. process, but it is hard to
tell. Below we estimate the AR(1) coefficient.
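. arima D.icsa, ar(1) noconstant
(setting optimization to BHHH)
Iteration 0:   log likelihood = -9934.0659
Iteration 1:   log likelihood = -9934.0657
Iteration 2:   log likelihood = -9934.0657
ARIMA regression
Sample:  14jan1967 - 19feb2011                  Number of obs      =      2302
                                                Wald chi2(1)       =    666.06
Log likelihood = -9934.066                      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
             |                 OPG
      D.icsa |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ARMA         |
          ar |
         L1. |  -.2756024   .0106789   -25.81   0.000    -.2965326   -.2546722
-------------+----------------------------------------------------------------
      /sigma |   18.10988   .1176556   153.92   0.000     17.87928    18.34048
------------------------------------------------------------------------------
Note: The test of the variance against zero is one sided, and the two-sided
      confidence interval is truncated at zero.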

The estimated coefficient is negative and statistically significant.

The spectral density implied by the estimated parameters is

. psdensity psden2 omega2
. line psden2 omega2

(figure omitted: line plot of "ARMA spectral density" against Frequency)

The above graph is typical of a spectral density of an AR(1) process with a negative coefficient.
The curve is lowest at frequency 0, and it monotonically increases to its highest point, which occurs
when the frequency is $\pi$.

When the coefficient of an AR(1) model is negative, the high-frequency random components are
the most important random components of the process. The closer $\alpha$ is to $-1$, the more important
are the high-frequency components relative to the low-frequency components. To illustrate this point,
we plot the spectral densities implied by AR(1) models with $\alpha = -0.1$ and $\alpha = -0.9$.

(figure omitted: two panels, "Spectral Density ($\alpha$=-.1)" and "Spectral Density ($\alpha$=-.9)", each plotted
against Frequency)

As $\alpha$ gets closer to $-1$, the plot of the spectral density shifts toward becoming a spike at frequency
$\pi$, implying that only the highest-frequency components are important.
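One way to see why the negative-coefficient case mirrors the positive one is the following Python sketch (again an illustration outside Stata). It assumes the standard normalized AR(1) density $f(\omega; \alpha) = (1 - \alpha^2) / \{2\pi(1 - 2\alpha\cos\omega + \alpha^2)\}$ and checks numerically that $f(\omega; -\alpha) = f(\pi - \omega; \alpha)$. The coefficient 0.2756 is simply the magnitude of the estimate in example 2; the function and variable names are hypothetical.

```python
import math

def ar1_density(omega, alpha):
    """Normalized spectral density of a stationary AR(1) process (|alpha| < 1)."""
    return (1 - alpha ** 2) / (2 * math.pi * (1 - 2 * alpha * math.cos(omega) + alpha ** 2))

alpha = 0.2756                                  # magnitude of the example 2 estimate
grid = [math.pi * i / 100 for i in range(101)]  # 101 frequencies from 0 to pi

negative = [ar1_density(w, -alpha) for w in grid]
mirrored = [ar1_density(math.pi - w, alpha) for w in grid]

# Largest pointwise difference between the two curves; should be at rounding level.
max_gap = max(abs(a - b) for a, b in zip(negative, mirrored))
# The negative-coefficient density should rise monotonically from 0 to pi.
increasing = all(x < y for x, y in zip(negative, negative[1:]))

print("max gap:", max_gap)
print("monotonically increasing:", increasing)
```

Flipping the sign of the coefficient reflects the curve about $\omega = \pi/2$, so the spike at frequency 0 for $\alpha$ near 1 becomes a spike at frequency $\pi$ for $\alpha$ near $-1$.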

For examples of psdensity after arfima and ucm, see [TS] arfima and [TS] ucm.

Methods and formulas


Methods and formulas are presented under the following headings:
Introduction
Spectral density after arima or arfima
Spectral density after ucm

Introduction
The spectral density $f(\omega)$ is estimated at the values $\omega \in \{\omega_1, \omega_2, \ldots, \omega_N\}$ using one of the
formulas given below. Given a sample of size $N$, after accounting for any if or in restrictions, the
$N$ values of $\omega$ are given by $\omega_i = \pi(i - 1)/(N - 1)$ for $i \in \{1, 2, \ldots, N\}$.
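The grid is easy to reproduce by hand. The Python sketch below (an illustration outside Stata; the function name is hypothetical) takes the formula above at face value and lists the frequencies for a sample of five observations, which shows why more observations give a finer resolution.

```python
import math

def frequency_grid(n_obs):
    """Frequencies omega_i = pi * (i - 1) / (N - 1) for i = 1, ..., N."""
    if n_obs < 2:
        raise ValueError("at least two observations are needed")
    return [math.pi * (i - 1) / (n_obs - 1) for i in range(1, n_obs + 1)]

grid = frequency_grid(5)
spacing = grid[1] - grid[0]   # distance between adjacent frequencies, pi / (N - 1)
print([round(w, 4) for w in grid])
print("spacing:", round(spacing, 4))
```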
In the rare case in which the dataset in memory has insufficient observations for the desired
resolution of the estimated spectral density, you may use tsappend or set obs (see [TS] tsappend
or [D] obs) to increase the number of observations in the current dataset.
You may use an if restriction or an in restriction to restrict the observations to handle panel data
or to compute the estimates for a subset of the observations.

Spectral density after arima or arfima


Let $\phi_k$ and $\theta_k$ denote the $p$ autoregressive and $q$ moving-average parameters of an ARMA model,
respectively. Box, Jenkins, and Reinsel (2008) show that the spectral density implied by the ARMA
parameters is

\[ f_{\rm ARMA}(\omega; \phi, \theta, \sigma^2_\epsilon, \gamma_0) = \frac{\sigma^2_\epsilon \left|1 + \theta_1 e^{-i\omega} + \theta_2 e^{-i2\omega} + \cdots + \theta_q e^{-iq\omega}\right|^2}{2\pi\gamma_0 \left|1 - \phi_1 e^{-i\omega} - \phi_2 e^{-i2\omega} - \cdots - \phi_p e^{-ip\omega}\right|^2} \]

where $\omega \in [0, \pi]$ and $\sigma^2_\epsilon$ is the variance of the idiosyncratic error and $\gamma_0$ is the variance of the
dependent variable. We estimate $\gamma_0$ using the arima parameter estimates.
The spectral density for the ARFIMA model is

\[ f_{\rm ARFIMA}(\omega; \phi, \theta, d, \sigma^2_\epsilon, \gamma_0) = \left|1 - e^{i\omega}\right|^{-2d} f_{\rm ARMA}(\omega; \phi, \theta, \sigma^2_\epsilon) \]

where $d$, $-1/2 < d < 1/2$, is the fractional integration parameter. The spectral density goes to infinity
as the frequency approaches 0 for $0 < d < 1/2$, and it is zero at frequency 0 for $-1/2 < d < 0$.
The smemory option causes psdensity to perform the estimation with $d = 0$, which is equivalent
to estimating the spectral density of the fractionally differenced series.
The power spectrum omits scaling by $\gamma_0$.
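The ARMA formula can be evaluated directly with complex arithmetic. The Python sketch below is an illustration outside Stata under stated assumptions: the function names are hypothetical, and $\gamma_0$ is obtained by numerically integrating the power spectrum over $[-\pi, \pi]$ rather than by whatever method psdensity uses internally. The inputs are the AR(1) estimates reported in example 1.

```python
import cmath
import math

def arma_power_spectrum(omega, ar, ma, sigma2):
    """Power spectrum: sigma2 * |MA polynomial|^2 / (2*pi * |AR polynomial|^2)."""
    z = cmath.exp(-1j * omega)
    ma_poly = 1 + sum(theta * z ** (k + 1) for k, theta in enumerate(ma))
    ar_poly = 1 - sum(phi * z ** (k + 1) for k, phi in enumerate(ar))
    return sigma2 * abs(ma_poly) ** 2 / (2 * math.pi * abs(ar_poly) ** 2)

def process_variance(ar, ma, sigma2, n=4000):
    """gamma_0 as the integral of the power spectrum over [-pi, pi] (midpoint rule)."""
    h = 2 * math.pi / n
    return h * sum(
        arma_power_spectrum(-math.pi + (i + 0.5) * h, ar, ma, sigma2) for i in range(n)
    )

# Estimates from example 1: AR(1) coefficient and /sigma (the error standard deviation).
ar, ma = [0.5179561], []
sigma2 = 0.7934554 ** 2

gamma0 = process_variance(ar, ma, sigma2)
# Spectral density = power spectrum scaled by gamma_0.
density_at_zero = arma_power_spectrum(0.0, ar, ma, sigma2) / gamma0

print("gamma_0:", round(gamma0, 4))
print("spectral density at frequency 0:", round(density_at_zero, 4))
```

Dropping the division by gamma0 gives the power spectrum, which is the quantity the pspectrum option is described as reporting.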


Spectral density after ucm


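The spectral density of an order-$k$ stochastic cycle with frequency $\lambda$ and damping $\rho$ is (Trimbur 2006)

\[ f(\omega; \rho, \lambda, \sigma^2_\kappa) = \frac{(1 - \rho^2)^{2k-1}}{\sum_{i=0}^{k-1} \binom{k-1}{i}^2 \rho^{2i}} \times \frac{\sum_{j=0}^{k} \sum_{i=0}^{k} (-1)^{j+i} \binom{k}{j} \binom{k}{i} \rho^{j+i} \cos\lambda(j - i) \cos\omega(j - i)}{2\pi \left\{1 + 4\rho^2\cos^2\lambda + \rho^4 - 4\rho(1 + \rho^2)\cos\lambda\cos\omega + 2\rho^2\cos 2\omega\right\}^k} \]

where $\sigma^2_\kappa$ is the variance of the cycle error term.
The variance of the cycle is

\[ \sigma^2_\omega = \sigma^2_\kappa \, \frac{\sum_{i=0}^{k-1} \binom{k-1}{i}^2 \rho^{2i}}{(1 - \rho^2)^{2k-1}} \]

and the power spectrum omits scaling by $\sigma^2_\omega$.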

References
Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. 2008. Time Series Analysis: Forecasting and Control. 4th ed.
Hoboken, NJ: Wiley.
Fuller, W. A. 1996. Introduction to Statistical Time Series. 2nd ed. New York: Wiley.
Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
Harvey, A. C. 1989. Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge: Cambridge
University Press.
Harvey, A. C. 1993. Time Series Models. 2nd ed. Cambridge, MA: MIT Press.
Priestley, M. B. 1981. Spectral Analysis and Time Series. London: Academic Press.
Trimbur, T. M. 2006. Properties of higher order stochastic cycles. Journal of Time Series Analysis 27: 1-17.
Wei, W. W. S. 2006. Time Series Analysis: Univariate and Multivariate Methods. 2nd ed. Boston: Pearson.

Also see
[TS] arfima Autoregressive fractionally integrated moving-average models
[TS] arima ARIMA, ARMAX, and other dynamic regression models
[TS] ucm Unobserved-components model

Title
rolling Rolling-window and recursive estimation

Syntax
Remarks and examples
Also see

Menu
Stored results

Description
Acknowledgment

Options
References

Syntax
rolling [exp_list] [if] [in] [, options] : command

options                   Description
-------------------------------------------------------------------------------
Main
  window(#)               number of consecutive data points in each sample
  recursive               use recursive samples
  rrecursive              use reverse recursive samples

Options
  clear                   replace data in memory with results
  saving(filename, ...)   save results to filename; save statistics in double precision;
                          save results to filename every # replications
  stepsize(#)             number of periods to advance window
  start(time_constant)    period at which rolling is to start
  end(time_constant)      period at which rolling is to end
  keep(varname[, start])  save varname along with results; optionally, use value at
                          left edge of window

Reporting
  nodots                  suppress replication dots
  noisily                 display any output from command
  trace                   trace command's execution

Advanced
  reject(exp)             identify invalid results
-------------------------------------------------------------------------------

window(#) is required.
You must tsset your data before using rolling; see [TS] tsset.
aweights are allowed in command if command accepts aweights; see [U] 11.1.6 weight.

exp_list contains      (name: elist)
                       elist
                       eexp
elist contains         newvar = (exp)
                       (exp)
eexp is                specname
                       [eqno]specname
specname is            _b
                       _b[]
                       _se
                       _se[]
eqno is                ##
                       name

exp is a standard Stata expression; see [U] 13 Functions and expressions.

Distinguish between the brackets [ ] shown in these forms, which are to be typed, and the brackets in the
syntax diagram, which indicate optional arguments.

Menu
Statistics > Time series > Rolling-window and recursive estimation

Description
rolling is a moving sampler that collects statistics from command after executing command on
subsets of the data in memory. Typing
. rolling exp_list, window(50) clear: command
executes command on sample windows of span 50. That is, rolling will first execute command by
using periods 1-50 of the dataset, and then using periods 2-51, 3-52, and so on. rolling can also
perform recursive and reverse recursive analyses, in which the starting or ending period is held fixed
and the window size grows.
command defines the statistical command to be executed. Most Stata commands and user-written
programs can be used with rolling, as long as they follow standard Stata syntax and allow the if
qualifier; see [U] 11 Language syntax. The by prefix cannot be part of command.
exp_list specifies the statistics to be collected from the execution of command. If no expressions
are given, exp_list assumes a default of _b if command stores results in e() and of all the scalars if
command stores results in r() and not in e(). Otherwise, not specifying an expression in exp_list
is an error.

Options


Main

window(#) defines the window size used each time command is executed. The window size refers to
calendar periods, not the number of observations. If there are missing data (for example, because
of weekends), the actual number of observations used by command may be less than window(#).
window(#) is required.
recursive specifies that a recursive analysis be done. The starting period is held fixed, the ending
period advances, and the window size grows.
rrecursive specifies that a reverse recursive analysis be done. Here the ending period is held fixed,
the starting period advances, and the window size shrinks.


Options

clear specifies that Stata replace the data in memory with the collected statistics even though the
current data in memory have not been saved to disk.


saving(filename[, suboptions]) creates a Stata data file (.dta file) consisting of (for each statistic
in exp_list) a variable containing the window replicates.
double specifies that the results for each replication be saved as doubles, meaning 8-byte reals.
By default, they are saved as floats, meaning 4-byte reals.
every(#) specifies that results be written to disk every #th replication. every() should be specified
in conjunction only with saving() when command takes a long time for each replication. This
will allow recovery of partial results should your computer crash. See [P] postfile.
stepsize(#) specifies the number of periods the window is to be advanced each time command is
executed.
start(time_constant) specifies the date on which rolling is to start. start() may be specified
as an integer or as a date literal.
end(time_constant) specifies the date on which rolling is to end. end() may be specified as an
integer or as a date literal.


keep(varname[, start]) specifies a variable to be posted along with the results. The value posted
is the value that corresponds to the right edge of the window. Specifying the start() option
requests that the value corresponding to the left edge of the window be posted instead. This option
is often used to record calendar dates.

Reporting

nodots suppresses display of the replication dot for each window on which command is executed.
By default, one dot character is printed for each window. A red "x" is printed if command returns
with an error or if any value in exp_list is missing.
noisily causes the output of command to be displayed for each window on which command is
executed. This option implies the nodots option.
trace causes a trace of the execution of command to be displayed. This option implies the noisily
and nodots options.

Advanced

reject(exp) identifies an expression that indicates when results should be rejected. When exp is
true, the saved statistics are set to missing values.

Remarks and examples


rolling executes a command on each of a series of windows of observations and stores the
results. rolling can perform what are commonly called rolling regressions, recursive regressions,
and reverse recursive regressions. However, rolling is not limited to just linear regression analysis:
any command that stores results in e() or r() can be used with rolling.
Suppose that you have data collected at 100 consecutive points in time, numbered 1-100, and you
wish to perform a rolling regression with a window size of 20 periods. Typing
. rolling _b, window(20) clear: regress depvar indepvar
causes Stata to regress depvar on indepvar using periods 1-20, store the regression coefficients
(_b), run the regression using periods 2-21, and so on, finishing with a regression using periods
81-100 (the last 20 periods).
The stepsize() option specifies how far ahead the window is moved each time. For example,
if you specify stepsize(2), then command is executed on periods 1-20, and then 3-22, 5-24, etc. By
default, rolling replaces the dataset in memory with the computed statistics unless the saving()
option is specified, in which case the computed statistics are saved in the filename specified. If the
dataset in memory has been changed since it was last saved and you do not specify saving(), you
must use clear.
rolling can also perform recursive and reverse recursive analyses. In a recursive analysis, the
starting date is held fixed, and the window size grows as the ending date is advanced. In a reverse
recursive analysis, the ending date is held fixed, and the window size shrinks as the starting date is
advanced.
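The three sampling schemes differ only in which edge of the window moves. The Python sketch below (an illustration outside Stata; the function and argument names are hypothetical and are not rolling's internals) enumerates the (start, end) periods implied by the descriptions above. How stepsize() interacts with the recursive schemes is an assumption here: the moving edge is simply advanced by that many periods.

```python
def window_spans(first, last, window, stepsize=1, mode="rolling"):
    """List the (start, end) periods used by each replication.

    mode="rolling":    both edges advance, the span stays at `window` periods
    mode="recursive":  start fixed at `first`, end advances, window grows
    mode="rrecursive": end fixed at `last`, start advances, window shrinks
    """
    if window < 1 or window > last - first + 1:
        raise ValueError("window must fit inside the sample")
    spans = []
    if mode == "rolling":
        start = first
        while start + window - 1 <= last:
            spans.append((start, start + window - 1))
            start += stepsize
    elif mode == "recursive":
        end = first + window - 1
        while end <= last:
            spans.append((first, end))
            end += stepsize
    elif mode == "rrecursive":
        start = first
        while last - start + 1 >= window:
            spans.append((start, last))
            start += stepsize
    else:
        raise ValueError("mode must be rolling, recursive, or rrecursive")
    return spans

rolling = window_spans(1, 100, 20)
stepped = window_spans(1, 100, 20, stepsize=2)
recursive = window_spans(1, 100, 20, mode="recursive")
reverse = window_spans(1, 100, 20, mode="rrecursive")

print(rolling[:2], "...", rolling[-1], len(rolling))
print(stepped[:3])
print(recursive[:2], "...", recursive[-1])
print(reverse[:2], "...", reverse[-1])
```

With 100 periods and window(20), each scheme yields 81 replications when the step is one period.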

Example 1
We have data on the daily returns to IBM stock (ibm), the S&P 500 (spx), and short-term interest
rates (irx), and we want to create a series containing the beta of IBM by using the previous 200 trading
days at each date. We will also record the standard errors, so that we can obtain 95% confidence
intervals for the betas. See, for example, Stock and Watson (2011, 118) for more information on
estimating betas. We type
. use http://www.stata-press.com/data/r13/ibm
(Source: Yahoo! Finance)
. tsset t
time variable: t, 1 to 494
delta: 1 unit
. generate ibmadj = ibm - irx
(1 missing value generated)
. generate spxadj = spx - irx
(1 missing value generated)
. rolling _b _se, window(200) saving(betas, replace) keep(date): regress ibmadj
> spxadj
(running regress on estimation sample)
(note: file betas.dta not found)
Rolling replications (295)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50
..................................................   100
..................................................   150
..................................................   200
..................................................   250
.............................................
file betas.dta saved

Our dataset has both a time variable t that runs consecutively and a date variable date that
measures the calendar date and therefore has gaps at weekends and holidays. Had we used the date
variable as our time variable, rolling would have used windows consisting of 200 calendar days
instead of 200 trading days, and each window would not have exactly 200 observations. We used
the keep(date) option so that we could refer to the date variable when working with the results
dataset.

We can list a portion of the dataset created by rolling to see what it contains:
. use betas, clear
(rolling: regress)
. sort date
. list in 1/3, abbrev(10)

       start   end        date   _b_spxadj     _b_cons   _se_spxadj   _se_cons
  1.       1   200   16oct2003    1.043422   -.0181504     .0658531   .0748295
  2.       2   201   17oct2003    1.039024   -.0126876     .0656893    .074609
  3.       3   202   20oct2003    1.038371   -.0235616     .0654591   .0743851

The variables start and end indicate the first and last observations used each time that rolling
called regress, and the date variable contains the calendar date corresponding to the period represented
by end. The remaining variables are the estimated coefficients and standard errors from the regression.
In our example, _b_spxadj contains the estimated betas, and _b_cons contains the estimated alphas.
The variables _se_spxadj and _se_cons have the corresponding standard errors.
Finally, we compute the confidence intervals for the betas and examine how they have changed
over time:
. generate lower = _b_spxadj - 1.96*_se_spxadj
. generate upper = _b_spxadj + 1.96*_se_spxadj

. twoway (line _b_spxadj date) (rline lower upper date) if date>=td(1oct2003),
> ytitle("Beta")

(figure omitted: line plot of _b[spxadj] with the lower/upper band, y axis "Beta", x axis date,
01oct2003 to 01jan2005)

As 2004 progressed, IBM's stock returns were less influenced by returns in the broader market.
Beginning in June of 2004, IBM's beta became significantly different from unity at the 95% confidence
level, as indicated by the fact that the confidence interval does not contain one from then onward.
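The interval arithmetic behind the lower and upper variables is simple enough to verify by hand. The Python sketch below (an illustration outside Stata; the helper name is hypothetical) applies the same normal-approximation formula, estimate plus or minus 1.96 standard errors, to the three rows listed earlier in this example and reports whether each interval contains one.

```python
# (date, _b_spxadj, _se_spxadj) copied from the first three rows of the listing
rows = [
    ("16oct2003", 1.043422, 0.0658531),
    ("17oct2003", 1.039024, 0.0656893),
    ("20oct2003", 1.038371, 0.0654591),
]

def normal_ci(estimate, std_err, z=1.96):
    """Return (lower, upper) for estimate +/- z * std_err."""
    return estimate - z * std_err, estimate + z * std_err

intervals = []
for date, beta, se in rows:
    lower, upper = normal_ci(beta, se)
    intervals.append((date, lower, upper))
    print(date, round(lower, 4), round(upper, 4), "contains 1:", lower <= 1.0 <= upper)
```

For these early windows the interval should cover one, in line with the text's remark that the beta differs significantly from unity only later in the sample.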

In addition to rolling-window analyses, rolling can also perform recursive ones. Suppose again
that you have data collected at 100 consecutive points in time, and now you type
. rolling _b, window(20) recursive clear: regress depvar indepvar

Stata will first regress depvar on indepvar by using observations 1-20, store the coefficients, run
the regression using observations 1-21, observations 1-22, and so on, finishing with a regression
using all 100 observations. Unlike a rolling regression, in which case the number of observations is
held constant and the starting and ending points are shifted, a recursive regression holds the starting
point fixed and increases the number of observations. Recursive analyses are often used in forecasting
situations. As time goes by, more information becomes available that can be used in making forecasts.
See Kmenta (1997, 423-424).

Example 2
Using the same dataset, we type
. use http://www.stata-press.com/data/r13/ibm, clear
(Source: Yahoo! Finance)
. tsset t
time variable: t, 1 to 494
delta: 1 unit
. generate ibmadj = ibm - irx
(1 missing value generated)
. generate spxadj = spx - irx
(1 missing value generated)
. rolling _b _se, recursive window(200) clear: regress ibmadj spxadj
(output omitted )
. list in 1/3, abbrev(10)

       start   end   _b_spxadj     _b_cons   _se_spxadj   _se_cons
  1.       1   200    1.043422   -.0181504     .0658531   .0748295
  2.       1   201    1.039024   -.0126876     .0656893    .074609
  3.       1   202    1.037687    -.016475     .0655896   .0743481

Here the starting period remains fixed and the window grows larger.

In a reverse recursive analysis, the ending date is held fixed, and the window size becomes smaller
as the starting date is advanced. For example, with a dataset that has observations numbered 1-100,
typing
. rolling _b, window(20) rrecursive clear: regress depvar indepvar

creates a dataset in which the first observation has the results based on periods 1-100, the second
observation has the results based on 2-100, the third having 3-100, and so on, up to the last
observation having results based on periods 81-100 (the last 20 observations).

Example 3
Using the data on stock returns, we want to build a model in which we predict today's IBM stock
return on the basis of yesterday's returns on IBM and the S&P 500. That is, letting $i_t$ and $s_t$ denote
the returns to IBM and the S&P 500 on date $t$, we want to fit the regression model

\[ i_t = \beta_0 + \beta_1 i_{t-1} + \beta_2 s_{t-1} + \epsilon_t \]

where $\epsilon_t$ is a regression error term, and then compute

\[ \widehat{i}_{t+1} = \widehat{\beta}_0 + \widehat{\beta}_1 i_t + \widehat{\beta}_2 s_t \]

We will use recursive regression because we suspect that the more data we have to fit the regression
model, the better the model will predict returns. We will use at least 20 periods in fitting the regression.
. use http://www.stata-press.com/data/r13/ibm, clear
(Source: Yahoo! Finance)
. tsset t
time variable: t, 1 to 494
delta: 1 unit

One alternative would be to use rolling with the recursive option to fit the regressions, collect
the coefficients, and then compute the predicted values afterward. However, we will instead write a
short program that computes the forecasts automatically and then use rolling, recursive on that
program. The program must accept an if expression so that rolling can indicate to the program
which observations are to be used. Our program is
program myforecast, rclass
        syntax [if]
        regress ibm L.ibm L.spx `if'
        // Find last time period of estimation sample and
        // make forecast for period just after that
        summ t if e(sample)
        local last = r(max)
        local fcast = _b[_cons] + _b[L.ibm]*ibm[`last'] + ///
                _b[L.spx]*spx[`last']
        return scalar forecast = `fcast'
        // Next period's actual return
        // Will return missing value for final period
        return scalar actual = ibm[`last'+1]
end

Now we call rolling:


. rolling actual=r(actual) forecast=r(forecast), recursive window(20): myforecast
(output omitted)
. corr actual forecast
(obs=474)

             |   actual forecast
-------------+------------------
      actual |   1.0000
    forecast |  -0.0957   1.0000

Our model does not work too well: the correlation between actual returns and our forecasts is
negative.

Stored results
rolling sets no r- or e-class macros. The results from the command used with rolling, depending
on the last window of data used, are available after rolling has finished.

Acknowledgment
We thank Christopher F. Baum of the Department of Economics at Boston College and author of
the Stata Press books An Introduction to Modern Econometrics Using Stata and An Introduction to
Stata Programming for an earlier rolling regression command.


References
Kmenta, J. 1997. Elements of Econometrics. 2nd ed. Ann Arbor: University of Michigan Press.
Stock, J. H., and M. W. Watson. 2011. Introduction to Econometrics. 3rd ed. Boston: Addison–Wesley.

Also see
[D] statsby Collect statistics for a command across a by list
[R] stored results Stored results

Title
sspace State-space models
Syntax
Menu
Description
Options
Remarks and examples
Stored results
Methods and formulas
References
Also see

Syntax
Covariance-form syntax

    sspace state_ceq [state_ceq ... state_ceq] obs_ceq [obs_ceq ... obs_ceq] [if] [in] [, options]

where each state_ceq is of the form

    (statevar [lagged_statevars] [indepvars], state [noerror noconstant])

and each obs_ceq is of the form

    (depvar [statevars] [indepvars] [, noerror noconstant])

Error-form syntax

    sspace state_efeq [state_efeq ... state_efeq] obs_efeq [obs_efeq ... obs_efeq] [if] [in] [, options]

where each state_efeq is of the form

    (statevar [lagged_statevars] [indepvars] [state_errors], state [noconstant])

and each obs_efeq is of the form

    (depvar [statevars] [indepvars] [obs_errors] [, noconstant])

statevar is the name of an unobserved state, not a variable. If there happens to be a variable of the
same name, the variable is ignored and plays no role in the estimation.
lagged_statevars is a list of lagged statevars. Only first lags are allowed.
state_errors is a list of state-equation errors that enter a state equation. Each state error has the form
e.statevar, where statevar is the name of a state in the model.
obs_errors is a list of observation-equation errors that enter an equation for an observed variable.
Each error has the form e.depvar, where depvar is an observed dependent variable in the model.
equation-level options      Description
---------------------------------------------------------------------------
Model
  state                     specifies that the equation is a state equation
  noerror                   specifies that there is no error term in the equation
  noconstant                suppresses the constant term from the equation
---------------------------------------------------------------------------

options                     Description
---------------------------------------------------------------------------
Model
  covstate(covform)         specifies the covariance structure for the errors in the
                              state variables
  covobserved(covform)      specifies the covariance structure for the errors in the
                              observed dependent variables
  constraints(constraints)  apply specified linear constraints

SE/Robust
  vce(vcetype)              vcetype may be oim or robust

Reporting
  level(#)                  set confidence level; default is level(95)
  nocnsreport               do not display constraints
  display_options           control column formats, row spacing, display of omitted
                              variables and base and empty cells, and factor-variable
                              labeling

Maximization
  maximize_options          control the maximization process; seldom used

Advanced
  method(method)            specify the method for calculating the log likelihood;
                              seldom used

  coeflegend                display legend instead of statistics
---------------------------------------------------------------------------

covform                     Description
---------------------------------------------------------------------------
  identity                  identity matrix; the default for error-form syntax
  dscalar                   diagonal scalar matrix
  diagonal                  diagonal matrix; the default for covariance-form syntax
  unstructured              symmetric, positive-definite matrix; not allowed with
                              error-form syntax
---------------------------------------------------------------------------

method                      Description
---------------------------------------------------------------------------
  hybrid                    use the stationary Kalman filter and the De Jong diffuse
                              Kalman filter; the default
  dejong                    use the stationary De Jong Kalman filter and the De Jong
                              diffuse Kalman filter
  kdiffuse                  use the stationary Kalman filter and the nonstationary
                              large-κ diffuse Kalman filter; seldom used
---------------------------------------------------------------------------

You must tsset your data before using sspace; see [TS] tsset.
indepvars may contain factor variables; see [U] 11.4.3 Factor variables.
indepvars and depvar may contain time-series operators; see [U] 11.4.4 Time-series varlists.
by, statsby, and rolling are allowed; see [U] 11.1.10 Prefix commands.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.


Menu
Statistics > Multivariate time series > State-space models

Description
sspace estimates the parameters of linear state-space models by maximum likelihood. Linear
state-space models are very flexible and many linear time-series models can be written as linear
state-space models.
sspace uses two forms of the Kalman filter to recursively obtain conditional means and variances
of both the unobserved states and the measured dependent variables that are used to compute the
likelihood.
The covariance-form syntax and the error-form syntax of sspace reflect the two different forms
in which researchers specify state-space models. Choose the syntax that is easier for you; the two
forms are isomorphic.

Options


Equation-level options

Model

state specifies that the equation is a state equation.


noerror specifies that there is no error term in the equation. noerror may not be specified in the
error-form syntax.
noconstant suppresses the constant term from the equation.

Options


Model

covstate(covform) specifies the covariance structure for the state errors.


covstate(identity) specifies a covariance matrix equal to an identity matrix, and it is the
default for the error-form syntax.
covstate(dscalar) specifies a covariance matrix equal to $\sigma^2_{\text{state}}$ times an identity matrix.
covstate(diagonal) specifies a diagonal covariance matrix, and it is the default for the covariance-form syntax.
covstate(unstructured) specifies a symmetric, positive-definite covariance matrix with parameters for all variances and covariances. covstate(unstructured) may not be specified
with the error-form syntax.
covobserved(covform) specifies the covariance structure for the observation errors.
covobserved(identity) specifies a covariance matrix equal to an identity matrix, and it is the
default for the error-form syntax.
covobserved(dscalar) specifies a covariance matrix equal to $\sigma^2_{\text{observed}}$ times an identity matrix.

covobserved(diagonal) specifies a diagonal covariance matrix, and it is the default for the
covariance-form syntax.

covobserved(unstructured) specifies a symmetric, positive-definite covariance matrix with
parameters for all variances and covariances. covobserved(unstructured) may not be
specified with the error-form syntax.
constraints(constraints); see [R] estimation options.

SE/Robust

vce(vcetype) specifies the estimator for the variance–covariance matrix of the estimator.
vce(oim), the default, causes sspace to use the observed information matrix estimator.
vce(robust) causes sspace to use the Huber/White/sandwich estimator.

Reporting

level(#), nocnsreport; see [R] estimation options.


display_options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel, fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), and sformat(%fmt); see
[R] estimation options.

Maximization

 
maximize_options: difficult, technique(algorithm_spec), iterate(#), [no]log, trace,
gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#),
nrtolerance(#), and from(matname); see [R] maximize for all options except from(), and
see below for information on from(). These options are seldom used.
from(matname) specifies initial values for the maximization process. from(b0) causes sspace
to begin the maximization algorithm with the values in b0. b0 must be a row vector; the number
of columns must equal the number of parameters in the model; and the values in b0 must be
in the same order as the parameters in e(b).

Advanced

method(method) specifies how to compute the log likelihood. This option is seldom used.
method(hybrid), the default, uses the Kalman filter with model-based initial values for the states
when the model is stationary and uses the De Jong (1988, 1991) diffuse Kalman filter when
the model is nonstationary.
method(dejong) uses the Kalman filter with the De Jong (1988) method for estimating the initial
values for the states when the model is stationary and uses the De Jong (1988, 1991) diffuse
Kalman filter when the model is nonstationary.
method(kdiffuse) is a seldom used method that uses the Kalman filter with model-based initial
values for the states when the model is stationary and uses the large-κ diffuse Kalman filter
when the model is nonstationary.
The following option is available with sspace but is not shown in the dialog box:
coeflegend; see [R] estimation options.


Remarks and examples


Remarks are presented under the following headings:
An introduction to state-space models
Some stationary state-space models
Some nonstationary state-space models

An introduction to state-space models


Many linear time-series models can be written as linear state-space models, including vector
autoregressive moving-average (VARMA) models, dynamic-factor (DF) models, and structural time-series (STS) models. The solutions to some stochastic dynamic-programming problems can also be
written in the form of linear state-space models. We can estimate the parameters of a linear state-space model by maximum likelihood (ML). The Kalman filter or a diffuse Kalman filter is used
to write the likelihood function in prediction-error form, assuming normally distributed errors. The
quasimaximum likelihood (QML) estimator, which drops the normality assumption, is consistent
and asymptotically normal when the model is stationary. Chang, Miller, and Park (2009) establish
consistency and asymptotic normality of the QML estimator for a class of nonstationary state-space
models. The QML estimator differs from the ML estimator only in the VCE; specify the vce(robust)
option to obtain the QML estimator.
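
As a minimal sketch of the last point, the commands below fit the AR(1) model from example 1 (discussed later in this entry) twice, once with the default VCE and once with vce(robust). The dataset and model specification are taken from that example; only the vce(robust) option is added. Because the QML and ML estimators differ only in the VCE, the point estimates from the two fits should agree and only the reported standard errors should change.

```stata
* Sketch: ML versus QML standard errors for the AR(1) model of example 1
use http://www.stata-press.com/data/r13/manufac, clear
constraint 1 [D.lncaputil]u = 1

* ML: observed information matrix VCE, vce(oim), is the default
sspace (u L.u, state noconstant) (D.lncaputil u, noerror), constraints(1)

* QML: same point estimates, Huber/White/sandwich standard errors
sspace (u L.u, state noconstant) (D.lncaputil u, noerror), constraints(1) vce(robust)
```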
Hamilton (1994a, 1994b), Harvey (1989), and Brockwell and Davis (1991) provide good introductions to state-space models. Anderson and Moore's (1979) text is a classic reference; they produced
many results used subsequently. Caines (1988) and Hannan and Deistler (1988) provide excellent,
more advanced, treatments.
sspace estimates linear state-space models with time-invariant coefficient matrices, which cover
the models listed above and many others. sspace can estimate parameters from state-space models
of the form
$$z_t = A z_{t-1} + B x_t + C \epsilon_t$$

$$y_t = D z_t + F w_t + G \nu_t$$

where

$z_t$ is an $m \times 1$ vector of unobserved state variables;
$x_t$ is a $k_x \times 1$ vector of exogenous variables;
$\epsilon_t$ is a $q \times 1$ vector of state-error terms, $(q \le m)$;
$y_t$ is an $n \times 1$ vector of observed endogenous variables;
$w_t$ is a $k_w \times 1$ vector of exogenous variables;
$\nu_t$ is an $r \times 1$ vector of observation-error terms, $(r \le n)$; and
$A$, $B$, $C$, $D$, $F$, and $G$ are parameter matrices.


The equations for zt are known as the state equations, and the equations for yt are known as the
observation equations.
The error terms are assumed to be zero mean, normally distributed, serially uncorrelated, and
uncorrelated with each other:

$$\epsilon_t \sim N(0, Q)$$
$$\nu_t \sim N(0, R)$$
$$E[\epsilon_t \epsilon_s'] = 0 \text{ for all } s \neq t$$
$$E[\epsilon_t \nu_s'] = 0 \text{ for all } s \text{ and } t$$

The state-space form is used to derive the log likelihood of the observed endogenous variables
conditional on their own past and any exogenous variables. When the model is stationary, a method
for recursively predicting the current values of the states and the endogenous variables, known as
the Kalman filter, is used to obtain the prediction error form of the log-likelihood function. When
the model is nonstationary, a diffuse Kalman filter is used. How the Kalman filter and the diffuse
Kalman filter initialize their recursive computations depends on the method() option; see Methods
and formulas.
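
For reference, the prediction-error form mentioned above has the following generic shape under the normality assumption. This is the textbook decomposition rather than a transcription of sspace's implementation. The symbols $e_t$ (the one-step prediction error), $\Omega_t$ (its conditional variance), and $T$ (the number of time periods) are notation assumed here; $n$ is the number of observed endogenous variables, as above. See Methods and formulas for the expressions that sspace actually uses.

```latex
\begin{align*}
e_t      &= y_t - E[\, y_t \mid y_{t-1}, \dots, y_1, \text{exogenous variables} \,] \\
\Omega_t &= \operatorname{Var}[\, e_t \mid y_{t-1}, \dots, y_1, \text{exogenous variables} \,] \\
\ln L    &= -\frac{1}{2} \sum_{t=1}^{T}
            \left\{ n \ln(2\pi) + \ln |\Omega_t| + e_t' \, \Omega_t^{-1} e_t \right\}
\end{align*}
```

The filter supplies $e_t$ and $\Omega_t$ period by period, which is why only the initialization of the recursion differs between the stationary and the diffuse cases.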
The linear state-space models with time-invariant coefficient matrices defined above can be specified
in the covariance-form syntax and the error-form syntax. The covariance-form syntax requires that
C and G be selection matrices, but places no restrictions on Q or R. In contrast, the error-form
syntax places no restrictions on C or G, but requires that Q and R be either diagonal, diagonal-scalar,
or identity matrices. Some models are more easily specified in the covariance-form syntax, while
others are more easily specified in the error-form syntax. Choose the syntax that is easiest for your
application.
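
To make the contrast concrete, here is a hedged sketch that writes one model, the AR(1) model of example 1 below, in both syntaxes. The covariance-form command is the one used in example 1. The error-form command is our own translation, patterned on the error-form specification in example 2; the constraint numbers 21 and 22 are arbitrary choices made here to avoid overwriting constraints defined elsewhere in this entry. Under these assumptions the two commands should describe the same model, so their estimates should agree up to numerical differences.

```stata
* Sketch: the same AR(1) model in covariance-form and error-form syntax
use http://www.stata-press.com/data/r13/manufac, clear
constraint 21 [D.lncaputil]u = 1
constraint 22 [u]e.u = 1

* Covariance form: the state equation has an implicit error term, and
* noerror removes the error term from the observation equation
sspace (u L.u, state noconstant) (D.lncaputil u, noerror), constraints(21)

* Error form: the state error e.u is listed explicitly, its coefficient is
* constrained to 1, and covstate(diagonal) frees its variance
sspace (u L.u e.u, state noconstant) (D.lncaputil u), ///
        constraints(21 22) covstate(diagonal)
```

The line continuation `///` assumes the commands are run from a do-file.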

Some stationary state-space models


Example 1: An AR(1) model
Following Hamilton (1994b, 373–374), we can write the first-order autoregressive (AR(1)) model

$$y_t - \mu = \alpha (y_{t-1} - \mu) + \epsilon_t$$

as a state-space model with the observation equation

$$y_t = \mu + u_t$$

and the state equation

$$u_t = \alpha u_{t-1} + \epsilon_t$$

where the unobserved state is $u_t = y_t - \mu$.
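
The equivalence can be checked by substitution. Writing $\mu$ for the process mean, $\alpha$ for the autoregressive coefficient, and $\epsilon_t$ for the error term, the observation and state equations combine as follows:

```latex
\begin{align*}
y_t &= \mu + u_t                                   && \text{(observation equation)} \\
    &= \mu + \alpha u_{t-1} + \epsilon_t           && \text{(state equation)} \\
    &= \mu + \alpha (y_{t-1} - \mu) + \epsilon_t   && \text{(because } u_{t-1} = y_{t-1} - \mu \text{)}
\end{align*}
```

Subtracting $\mu$ from both sides recovers the AR(1) model. The observation equation has a coefficient of 1 on the state and no error term of its own, which is what the constraint and the noerror option impose in the command below.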
Here we fit this model to data on the capacity utilization rate. The variable lncaputil contains
data on the natural log of the capacity utilization rate for the manufacturing sector of the U.S. economy.
We treat the series as first-difference stationary and fit its first-difference to an AR(1) process. Here
we estimate the parameters of the above state-space form of the AR(1) model:


. use http://www.stata-press.com/data/r13/manufac
(St. Louis Fed (FRED) manufacturing data)
. constraint 1 [D.lncaputil]u = 1
. sspace (u L.u, state noconstant) (D.lncaputil u, noerror), constraints(1)
searching for initial values ..........
(setting technique to bhhh)
Iteration 0:   log likelihood =    1505.36
Iteration 1:   log likelihood =  1512.0581
(output omitted )
Refining estimates:
Iteration 0:   log likelihood =    1516.44
Iteration 1:   log likelihood =    1516.44

State-space model
Sample: 1972m2 - 2008m12                        Number of obs   =        443
                                                Wald chi2(1)    =      61.73
Log likelihood =   1516.44                      Prob > chi2     =     0.0000
 ( 1)  [D.lncaputil]u = 1

------------------------------------------------------------------------------
             |                 OIM
   lncaputil |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
u            |
           u |
         L1. |   .3523983   .0448539     7.86   0.000     .2644862    .4403104
-------------+----------------------------------------------------------------
D.lncaputil  |
           u |          1  (constrained)
       _cons |  -.0003558   .0005781    -0.62   0.538     -.001489    .0007773
-------------+----------------------------------------------------------------
      var(u) |   .0000622   4.18e-06    14.88   0.000      .000054    .0000704
------------------------------------------------------------------------------
Note: Tests of variances against zero are one sided, and the two-sided
      confidence intervals are truncated at zero.

The iteration log has three parts: the dots from the search for initial values, the log from finding
the maximum, and the log from a refining step. Here is a description of the logic behind each part:
1. The quality of the initial values affects the speed and robustness of the optimization algorithm.
sspace takes a few iterations in a nonlinear least-squares (NLS) algorithm to find good
initial values and reports a dot for each NLS iteration.
2. This iteration log is the standard method by which Stata reports the search for the maximum
likelihood estimates of the parameters in a nonlinear model.
3. Some of the parameters are transformed in the maximization process that sspace reports in
part 2. After a maximum candidate is found in part 2, sspace looks for a maximum in the
unconstrained space, checks that the Hessian of the log-likelihood function is of full rank,
and reports these iterations as the refining step.
The header in the output describes the estimation sample, reports the log-likelihood function at
the maximum, and gives the results of a Wald test against the null hypothesis that the coefficients
on all the independent variables, state variables, and lagged state variables are zero. In this example,
the null hypothesis that the coefficient on L1.u is zero is rejected at all conventional levels.
The estimation table reports results for the state equations, the observation equations, and the
variance–covariance parameters. The estimated autoregressive coefficient of 0.3524 indicates that there
is persistence in the first-differences of the log of the capacity utilization rate. The estimated mean of the
differenced series is -0.0004, which is smaller in magnitude than its standard error, indicating that
there is no deterministic linear trend in the series.


Typing
. arima D.lncaputil, ar(1) technique(nr)
(output omitted )

produces nearly identical parameter estimates and standard errors for the mean and the autoregressive
parameter. Because sspace estimates the variance of the state error while arima estimates the
standard deviation, calculations are required to obtain the same results. The different parameterization
of the variance parameter can cause small numerical differences.
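
The calculation in question is a squaring of arima's standard-deviation estimate. The sketch below assumes that arima stores this parameter as `_cons` in an equation named `sigma`, so that it can be referenced as `_b[sigma:_cons]`; typing `arima, coeflegend` after estimation shows the name used by your version of Stata. The label `variance` in the nlcom call is an arbitrary name chosen here.

```stata
* Sketch: put arima's error standard deviation on the variance scale of sspace
use http://www.stata-press.com/data/r13/manufac, clear
arima D.lncaputil, ar(1) technique(nr)

* Squaring sigma gives a variance comparable with var(u) reported by sspace
display _b[sigma:_cons]^2

* nlcom applies the delta method, so the variance also gets a standard error
nlcom (variance: _b[sigma:_cons]^2)
```

The displayed value should be close to the var(u) estimate reported by sspace in example 1.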

Technical note
In some situations, the second part of the iteration log terminates but the refining step never
converges. Only when the refining step converges does the maximization algorithm find interpretable
estimates. If the refining step iterates without convergence, the parameters of the specified model are
not identified by the data. (See Rothenberg [1971], Drukker and Wiggins [2004], and Davidson and
MacKinnon [1993, sec. 5.2] for discussions of identification.)

Example 2: An ARMA(1,1) model


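Following Harvey (1993, 95–96), we can write a zero-mean, first-order, autoregressive moving-average (ARMA(1,1)) model

$$y_t = \alpha y_{t-1} + \theta \epsilon_{t-1} + \epsilon_t \tag{1}$$

as a state-space model with state equations

$$\begin{pmatrix} y_t \\ \theta \epsilon_t \end{pmatrix} =
\begin{pmatrix} \alpha & 1 \\ 0 & 0 \end{pmatrix}
\begin{pmatrix} y_{t-1} \\ \theta \epsilon_{t-1} \end{pmatrix} +
\begin{pmatrix} 1 \\ \theta \end{pmatrix} \epsilon_t \tag{2}$$

and observation equation

$$y_t = \begin{pmatrix} 1 & 0 \end{pmatrix}
\begin{pmatrix} y_t \\ \theta \epsilon_t \end{pmatrix} \tag{3}$$

The unobserved states in this model are $u_{1t} = y_t$ and $u_{2t} = \theta \epsilon_t$. We set the process mean to zero
because economic theory and the previous example suggest that we should do so. Below we estimate
the parameters in the state-space model by using the error-form syntax: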


. constraint 2 [u1]L.u2 = 1
. constraint 3 [u1]e.u1 = 1
. constraint 4 [D.lncaputil]u1 = 1
. sspace (u1 L.u1 L.u2 e.u1, state noconstant) (u2 e.u1, state noconstant)
>        (D.lncaputil u1, noconstant), constraints(2/4) covstate(diagonal)
searching for initial values ...........
(setting technique to bhhh)
Iteration 0:   log likelihood =  1506.0947
Iteration 1:   log likelihood =   1514.014
(output omitted )
Refining estimates:
Iteration 0:   log likelihood =   1531.255
Iteration 1:   log likelihood =   1531.255

State-space model
Sample: 1972m2 - 2008m12                        Number of obs   =        443
                                                Wald chi2(2)    =     333.84
Log likelihood =  1531.255                      Prob > chi2     =     0.0000
 ( 1)  [u1]L.u2 = 1
 ( 2)  [u1]e.u1 = 1
 ( 3)  [D.lncaputil]u1 = 1

------------------------------------------------------------------------------
             |                 OIM
   lncaputil |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
u1           |
          u1 |
         L1. |   .8056815   .0522661    15.41   0.000     .7032418    .9081212
             |
          u2 |
         L1. |          1  (constrained)
        e.u1 |          1  (constrained)
-------------+----------------------------------------------------------------
u2           |
        e.u1 |  -.5188453   .0701985    -7.39   0.000    -.6564317   -.3812588
-------------+----------------------------------------------------------------
D.lncaputil  |
          u1 |          1  (constrained)
-------------+----------------------------------------------------------------
     var(u1) |   .0000582   3.91e-06    14.88   0.000     .0000505    .0000659
------------------------------------------------------------------------------
Note: Tests of variances against zero are one sided, and the two-sided
      confidence intervals are truncated at zero.

The command in the above output specifies two state equations, one observation equation, and
two options. The first state equation defines u1t and the second defines u2t according to (2) above.
The observation equation defines the process for D.lncaputil according to the one specified in (3)
above. Several coefficients in (2) and (3) are set to 1, and constraints 2–4 place these restrictions on
the model.
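
To see that (2) and (3) reproduce (1), write out the two rows of the state equations with $u_{1t} = y_t$ and $u_{2t} = \theta\epsilon_t$, where $\alpha$ is the autoregressive coefficient and $\theta$ the moving-average coefficient, and then substitute the second row, lagged one period, into the first:

```latex
\begin{align*}
u_{1t} &= \alpha u_{1,t-1} + u_{2,t-1} + \epsilon_t   && \text{(first row of (2))} \\
u_{2t} &= \theta \epsilon_t                           && \text{(second row of (2))} \\
y_t = u_{1t} &= \alpha y_{t-1} + \theta \epsilon_{t-1} + \epsilon_t
                                                      && \text{(using (3) and } u_{2,t-1} = \theta \epsilon_{t-1} \text{)}
\end{align*}
```

In the command, the unit coefficients on $u_{2,t-1}$ and $\epsilon_t$ in the first row correspond to constraints 2 and 3, the unit coefficient on $u_{1t}$ in the observation equation corresponds to constraint 4, and the free coefficient on e.u1 in equation u2 is $\theta$.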
The estimated coefficient on L.u1 in equation u1, 0.806, is the estimate of $\alpha$ in (2), which is the
autoregressive coefficient in the ARMA model in (1). The estimated coefficient on e.u1 in equation
u2, -0.519, is the estimate of $\theta$, which is the moving-average term in the ARMA model in (1).
This example highlights a difference between the error-form syntax and the covariance-form syntax.
The error-form syntax used in this example includes only explicitly included errors. In contrast, the
covariance-form syntax includes an error term in each equation, unless the noerror option is specified.
The default for covstate() also differs between the error-form syntax and the covariance-form syntax. Because the coefficients on the errors in the error-form syntax are frequently used to
estimate the standard deviation of the errors, covstate(identity) is the default for the error-form syntax. In contrast, unit variances are less common in the covariance-form syntax, for which
covstate(diagonal) is the default. In this example, we specified covstate(diagonal) to estimate
a nonunitary variance for the state.
Typing
. arima D.lncaputil, noconstant ar(1) ma(1) technique(nr)
(output omitted )

produces nearly identical results. As in the AR(1) example above, arima estimates the standard deviation
of the error term, while sspace estimates the variance. Although they are theoretically equivalent,
the different parameterizations give rise to small numerical differences in the other parameters.

Example 3: A VAR(1) model


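The variable lnhours contains data on the log of manufacturing hours, which we treat as first-difference stationary. We have a theory in which the process driving the changes in the log utilization
rate affects the changes in the log of hours, but changes in the log hours do not affect changes in
the log utilization rate. In line with this theory, we estimate the parameters of a lower triangular,
first-order vector autoregressive (VAR(1)) process

$$\begin{pmatrix} \Delta \text{lncaputil}_t \\ \Delta \text{lnhours}_t \end{pmatrix} =
\begin{pmatrix} \alpha_1 & 0 \\ \alpha_2 & \alpha_3 \end{pmatrix}
\begin{pmatrix} \Delta \text{lncaputil}_{t-1} \\ \Delta \text{lnhours}_{t-1} \end{pmatrix} +
\begin{pmatrix} \epsilon_{1t} \\ \epsilon_{2t} \end{pmatrix} \tag{4}$$

where $\Delta y_t = y_t - y_{t-1}$, $\epsilon_t = (\epsilon_{1t}, \epsilon_{2t})'$, and $\text{Var}(\epsilon) = \Sigma$. We can write this VAR(1) process as a
state-space model with state equations

$$\begin{pmatrix} u_{1t} \\ u_{2t} \end{pmatrix} =
\begin{pmatrix} \alpha_1 & 0 \\ \alpha_2 & \alpha_3 \end{pmatrix}
\begin{pmatrix} u_{1(t-1)} \\ u_{2(t-1)} \end{pmatrix} +
\begin{pmatrix} \epsilon_{1t} \\ \epsilon_{2t} \end{pmatrix} \tag{5}$$

with $\text{Var}(\epsilon) = \Sigma$ and observation equations

$$\begin{pmatrix} \Delta \text{lncaputil}_t \\ \Delta \text{lnhours}_t \end{pmatrix} =
\begin{pmatrix} u_{1t} \\ u_{2t} \end{pmatrix}$$

Below we estimate the parameters of the state-space model: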


. constraint 5 [D.lncaputil]u1 = 1
. constraint 6 [D.lnhours]u2 = 1
. sspace (u1 L.u1, state noconstant)
>        (u2 L.u1 L.u2, state noconstant)
>        (D.lncaputil u1, noconstant noerror)
>        (D.lnhours u2, noconstant noerror),
>        constraints(5/6) covstate(unstructured)
searching for initial values ...........
(setting technique to bhhh)
Iteration 0:   log likelihood =  2993.6647
Iteration 1:   log likelihood =  3088.7416
(output omitted )
Refining estimates:
Iteration 0:   log likelihood =  3211.7532
Iteration 1:   log likelihood =  3211.7532

State-space model
Sample: 1972m2 - 2008m12                        Number of obs   =        443
                                                Wald chi2(3)    =     166.87
Log likelihood =  3211.7532                     Prob > chi2     =     0.0000
 ( 1)  [D.lncaputil]u1 = 1
 ( 2)  [D.lnhours]u2 = 1

------------------------------------------------------------------------------
             |                 OIM
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
u1           |
          u1 |
         L1. |    .353257   .0448456     7.88   0.000     .2653612    .4411528
-------------+----------------------------------------------------------------
u2           |
          u1 |
         L1. |   .1286218   .0394742     3.26   0.001     .0512537    .2059899
             |
          u2 |
         L1. |  -.3707083   .0434255    -8.54   0.000    -.4558208   -.2855959
-------------+----------------------------------------------------------------
D.lncaputil  |
          u1 |          1  (constrained)
-------------+----------------------------------------------------------------
D.lnhours    |
          u2 |          1  (constrained)
-------------+----------------------------------------------------------------
     var(u1) |   .0000623   4.19e-06    14.88   0.000     .0000541    .0000705
  cov(u1,u2) |    .000026   2.67e-06     9.75   0.000     .0000208    .0000312
     var(u2) |   .0000386   2.61e-06    14.76   0.000     .0000335    .0000437
------------------------------------------------------------------------------
Note: Tests of variances against zero are one sided, and the two-sided
      confidence intervals are truncated at zero.

Specifying covstate(unstructured) caused sspace to estimate the off-diagonal element of $\Sigma$.
The output indicates that this parameter, cov(u2,u1):_cons, is small but statistically significant.
The estimated coefficient on L.u1 in equation u1, 0.353, is the estimate of $\alpha_1$ in (5). The estimated
coefficient on L.u1 in equation u2, 0.129, is the estimate of $\alpha_2$ in (5). The estimated coefficient on
L.u2 in equation u2, -0.371, is the estimate of $\alpha_3$ in (5).
For the VAR(1) model in (4), the estimated autoregressive coefficient for D.lncaputil is similar to
the corresponding estimate in the univariate results in example 1. The estimated effect of LD.lncaputil
on D.lnhours is 0.129, the estimated autoregressive coefficient of D.lnhours is -0.371, and both
are statistically significant.


These estimates can be compared with those produced by typing


. constraint 101 [D_lncaputil]LD.lnhours = 0
. var D.lncaputil D.lnhours, lags(1) noconstant constraints(101)
(output omitted )
. matrix list e(Sigma)
(output omitted )

The var estimates are not the same as the sspace estimates because the generalized least-squares
estimator implemented in var is only asymptotically equivalent to the ML estimator implemented
in sspace, but the point estimates are similar. The comparison is useful for pedagogical purposes
because the var estimator is relatively simple.
Some problems require constraining a covariance term to zero. If we wanted to constrain
cov(u2,u1):_cons to zero, we could type
. constraint 7 [cov(u2,u1)]_cons = 0
. sspace (u1 L.u1, state noconstant)
>        (u2 L.u1 L.u2, state noconstant)
>        (D.lncaputil u1, noconstant noerror)
>        (D.lnhours u2, noconstant noerror),
>        constraints(5/7) covstate(unstructured)
(output omitted )

Example 4: A VARMA(1,1) model


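We now extend the previous example by modeling D.lncaputil and D.lnhours as a first-order
vector autoregressive moving-average (VARMA(1,1)) process. Building on the previous examples, we
allow the lag of D.lncaputil to affect D.lnhours but we do not allow the lag of D.lnhours
to affect D.lncaputil. Previous univariate analysis revealed that D.lnhours is better
modeled as an autoregressive process than as an ARMA(1,1) process. As a result, we estimate the
parameters of

$$\begin{pmatrix} \Delta \text{lncaputil}_t \\ \Delta \text{lnhours}_t \end{pmatrix} =
\begin{pmatrix} \alpha_1 & 0 \\ \alpha_2 & \alpha_3 \end{pmatrix}
\begin{pmatrix} \Delta \text{lncaputil}_{t-1} \\ \Delta \text{lnhours}_{t-1} \end{pmatrix} +
\begin{pmatrix} \theta_1 & 0 \\ 0 & 0 \end{pmatrix}
\begin{pmatrix} \epsilon_{1(t-1)} \\ \epsilon_{2(t-1)} \end{pmatrix} +
\begin{pmatrix} \epsilon_{1t} \\ \epsilon_{2t} \end{pmatrix}$$

We can write this VARMA(1,1) process as a state-space model with state equations

$$\begin{pmatrix} s_{1t} \\ s_{2t} \\ s_{3t} \end{pmatrix} =
\begin{pmatrix} \alpha_1 & 1 & 0 \\ 0 & 0 & 0 \\ \alpha_2 & 0 & \alpha_3 \end{pmatrix}
\begin{pmatrix} s_{1(t-1)} \\ s_{2(t-1)} \\ s_{3(t-1)} \end{pmatrix} +
\begin{pmatrix} 1 & 0 \\ \theta_1 & 0 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} \epsilon_{1t} \\ \epsilon_{2t} \end{pmatrix}$$

where the states are

$$\begin{pmatrix} s_{1t} \\ s_{2t} \\ s_{3t} \end{pmatrix} =
\begin{pmatrix} \Delta \text{lncaputil}_t \\ \theta_1 \epsilon_{1t} \\ \Delta \text{lnhours}_t \end{pmatrix}$$

and we simplify the problem by assuming that

$$\text{Var} \begin{pmatrix} \epsilon_{1t} \\ \epsilon_{2t} \end{pmatrix} =
\begin{pmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{pmatrix}$$

Below we estimate the parameters of this model by using sspace: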

. constraint 7 [u1]L.u2 = 1
. constraint 8 [u1]e.u1 = 1
. constraint 9 [u3]e.u3 = 1
. constraint 10 [D.lncaputil]u1 = 1
. constraint 11 [D.lnhours]u3 = 1
. sspace (u1 L.u1 L.u2 e.u1, state noconstant)
>        (u2 e.u1, state noconstant)
>        (u3 L.u1 L.u3 e.u3, state noconstant)
>        (D.lncaputil u1, noconstant)
>        (D.lnhours u3, noconstant),
>        constraints(7/11) technique(nr) covstate(diagonal)
searching for initial values ..........
(output omitted )
Refining estimates:
Iteration 0:   log likelihood =  3156.0564
Iteration 1:   log likelihood =  3156.0564

State-space model
Sample: 1972m2 - 2008m12                        Number of obs   =        443
                                                Wald chi2(4)    =     427.55
Log likelihood =  3156.0564                     Prob > chi2     =     0.0000
 ( 1)  [u1]L.u2 = 1
 ( 2)  [u1]e.u1 = 1
 ( 3)  [u3]e.u3 = 1
 ( 4)  [D.lncaputil]u1 = 1
 ( 5)  [D.lnhours]u3 = 1

------------------------------------------------------------------------------
             |                 OIM
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
u1           |
          u1 |
         L1. |   .8058031   .0522493    15.42   0.000     .7033964    .9082098
             |
          u2 |
         L1. |          1  (constrained)
        e.u1 |          1  (constrained)
-------------+----------------------------------------------------------------
u2           |
        e.u1 |   -.518907   .0701848    -7.39   0.000    -.6564667   -.3813474
-------------+----------------------------------------------------------------
u3           |
          u1 |
         L1. |   .1734868   .0405156     4.28   0.000     .0940776     .252896
             |
          u3 |
         L1. |  -.4809376   .0498574    -9.65   0.000    -.5786563   -.3832188
        e.u3 |          1  (constrained)
-------------+----------------------------------------------------------------
D.lncaputil  |
          u1 |          1  (constrained)
-------------+----------------------------------------------------------------
D.lnhours    |
          u3 |          1  (constrained)
-------------+----------------------------------------------------------------
     var(u1) |   .0000582   3.91e-06    14.88   0.000     .0000505    .0000659
     var(u3) |   .0000382   2.56e-06    14.88   0.000     .0000331    .0000432
------------------------------------------------------------------------------
Note: Tests of variances against zero are one sided, and the two-sided
      confidence intervals are truncated at zero.


The estimates of the parameters in the model for D.lncaputil are similar to those in the univariate
model fit in example 2. The estimates of the parameters in the model for D.lnhours indicate that
the lag of D.lncaputil has a positive effect on D.lnhours.

Technical note
The technique(nr) option facilitates convergence in example 4. Fitting state-space models is
notoriously difficult. Convergence problems are common. Four methods for overcoming convergence
problems are 1) selecting an alternate optimization algorithm by using the technique() option,
2) using alternative starting values by specifying the from() option, 3) using starting values obtained
by estimating the parameters of a restricted version of the model of interest, or 4) putting the variables
on the same scale.
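
Methods 1 and 2 can be combined in a few lines. The sketch below uses the VAR(1) specification from example 3 purely as an illustration; the matrix name b0 is an arbitrary choice made here, and whether a restart of this kind helps depends on the model and data. Saving e(b) from a fit that did converge (for instance, a simpler model whose parameter vector has been extended to the right length, as in method 3) gives from() a vector that already has the required order and number of columns.

```stata
* Sketch: restart the optimizer from saved estimates with another technique
use http://www.stata-press.com/data/r13/manufac, clear
constraint 5 [D.lncaputil]u1 = 1
constraint 6 [D.lnhours]u2 = 1

* Step 1: fit the model with the default settings and save the estimates
sspace (u1 L.u1, state noconstant)              ///
       (u2 L.u1 L.u2, state noconstant)         ///
       (D.lncaputil u1, noconstant noerror)     ///
       (D.lnhours u2, noconstant noerror),      ///
       constraints(5/6) covstate(unstructured)
matrix b0 = e(b)

* Step 2: refit, starting from b0 and using Newton-Raphson
sspace (u1 L.u1, state noconstant)              ///
       (u2 L.u1 L.u2, state noconstant)         ///
       (D.lncaputil u1, noconstant noerror)     ///
       (D.lnhours u2, noconstant noerror),      ///
       constraints(5/6) covstate(unstructured)  ///
       from(b0) technique(nr)
```

The line continuation `///` assumes the commands are run from a do-file.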

Example 5: A dynamic-factor model


Stock and Watson (1989, 1991) wrote a simple macroeconomic model as a dynamic-factor model,
estimated the parameters by ML, and extracted an economic indicator. In this example, we estimate
the parameters of a dynamic-factor model. In [TS] sspace postestimation, we extend this example
and extract an economic indicator for the differenced series.
We have data on an industrial-production index, ipman; an aggregate weekly hours index, hours;
and aggregate unemployment, unemp. The variable income is real disposable income divided by 100;
we rescaled real disposable income to avoid convergence problems.
We postulate a latent factor that follows an AR(2) process. Each measured variable is then related
to the current value of that latent variable by a parameter. The state-space form of our model is



  
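$$\begin{pmatrix} f_t \\ f_{t-1} \end{pmatrix} =
\begin{pmatrix} \theta_1 & \theta_2 \\ 1 & 0 \end{pmatrix}
\begin{pmatrix} f_{t-1} \\ f_{t-2} \end{pmatrix} +
\begin{pmatrix} \nu_t \\ 0 \end{pmatrix}$$

$$\begin{pmatrix} \Delta \text{ipman}_t \\ \Delta \text{income}_t \\ \Delta \text{hours}_t \\ \Delta \text{unemp}_t \end{pmatrix} =
\begin{pmatrix} \gamma_1 \\ \gamma_2 \\ \gamma_3 \\ \gamma_4 \end{pmatrix} f_t +
\begin{pmatrix} \epsilon_{1t} \\ \epsilon_{2t} \\ \epsilon_{3t} \\ \epsilon_{4t} \end{pmatrix}$$

where

$$\text{Var} \begin{pmatrix} \epsilon_{1t} \\ \epsilon_{2t} \\ \epsilon_{3t} \\ \epsilon_{4t} \end{pmatrix} =
\begin{pmatrix} \sigma_1^2 & 0 & 0 & 0 \\ 0 & \sigma_2^2 & 0 & 0 \\ 0 & 0 & \sigma_3^2 & 0 \\ 0 & 0 & 0 & \sigma_4^2 \end{pmatrix}$$
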


The parameter estimates are

. use http://www.stata-press.com/data/r13/dfex
(St. Louis Fed (FRED) macro data)
. constraint 12 [lf]L.f = 1
. sspace (f L.f L.lf, state noconstant)
>        (lf L.f, state noconstant noerror)
>        (D.ipman f, noconstant)
>        (D.income f, noconstant)
>        (D.hours f, noconstant)
>        (D.unemp f, noconstant),
>        covstate(identity) constraints(12)
searching for initial values ................
(setting technique to bhhh)
Iteration 0:   log likelihood =  -676.3091
Iteration 1:   log likelihood = -665.61104
(output omitted )
Refining estimates:
Iteration 0:   log likelihood = -662.09507
Iteration 1:   log likelihood = -662.09507

State-space model
Sample: 1972m2 - 2008m11                        Number of obs   =        442
                                                Wald chi2(6)    =     751.95
Log likelihood = -662.09507                     Prob > chi2     =     0.0000
 ( 1)  [lf]L.f = 1

------------------------------------------------------------------------------
             |                 OIM
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
f            |
           f |
         L1. |   .2651932   .0568663     4.66   0.000     .1537372    .3766491
             |
          lf |
         L1. |   .4820398   .0624635     7.72   0.000     .3596136     .604466
-------------+----------------------------------------------------------------
lf           |
           f |
         L1. |          1  (constrained)
-------------+----------------------------------------------------------------
D.ipman      |
           f |   .3502249   .0287389    12.19   0.000     .2938976    .4065522
-------------+----------------------------------------------------------------
D.income     |
           f |   .0746338   .0217319     3.43   0.001     .0320401    .1172276
-------------+----------------------------------------------------------------
D.hours      |
           f |   .2177469   .0186769    11.66   0.000     .1811407     .254353
-------------+----------------------------------------------------------------
D.unemp      |
           f |  -.0676016   .0071022    -9.52   0.000    -.0815217   -.0536816
-------------+----------------------------------------------------------------
 var(D.ipman)|   .1383158   .0167086     8.28   0.000     .1055675    .1710641
var(D.income)|   .2773808   .0188302    14.73   0.000     .2404743    .3142873
 var(D.hours)|   .0911446   .0080847    11.27   0.000     .0752988    .1069903
 var(D.unemp)|   .0237232   .0017932    13.23   0.000     .0202086    .0272378
------------------------------------------------------------------------------
Note: Tests of variances against zero are one sided, and the two-sided
      confidence intervals are truncated at zero.

The output indicates that the unobserved factor is quite persistent and that it is a significant predictor
for each of the observed variables.


These models are frequently used to forecast the dependent variables and to estimate the unobserved
factors. We present some illustrative examples in [TS] sspace postestimation. The dfactor command
estimates the parameters of dynamic-factor models; see [TS] dfactor.
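
As a brief sketch of estimating the unobserved factor, the commands below refit the model from this example and then use predict to obtain a smoothed estimate of the state f. The variable name fac is an arbitrary choice made here; the options of predict after sspace are documented in [TS] sspace postestimation, which should be consulted for their exact behavior.

```stata
* Sketch: smoothed estimate of the latent factor from the example 5 model
use http://www.stata-press.com/data/r13/dfex, clear
constraint 12 [lf]L.f = 1
sspace (f L.f L.lf, state noconstant)       ///
       (lf L.f, state noconstant noerror)   ///
       (D.ipman f, noconstant)              ///
       (D.income f, noconstant)             ///
       (D.hours f, noconstant)              ///
       (D.unemp f, noconstant),             ///
       covstate(identity) constraints(12)

* Predict the state f over the estimation sample by using the smoother
predict fac if e(sample), states smethod(smooth) equation(f)

* Plot the estimated factor against time
tsline fac
```

The line continuation `///` assumes the commands are run from a do-file.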

Some nonstationary state-space models


Example 6: A local-level model
Harvey (1989) advocates the use of STS models. These models parameterize the trends and seasonal
components of a set of time series. The simplest STS model is the local-level model, which is given
by

$$y_t = \mu_t + \epsilon_t$$

where

$$\mu_t = \mu_{t-1} + \eta_t$$

The model is called a local-level model because the level of the series is modeled as a random walk
plus an idiosyncratic noise term. (The model is also known as the random-walk-plus-noise model.)
The local-level model is nonstationary because of the random-walk component. When the variance
of the idiosyncratic disturbance $\epsilon_t$ is zero and the variance of the level disturbance $\eta_t$ is not zero, the
local-level model reduces to a random walk. When the variance of the level disturbance $\eta_t$ is zero
and the variance of the idiosyncratic disturbance $\epsilon_t$ is not zero,

$$\mu_t = \mu_{t-1} = \mu$$

and the local-level model reduces to

$$y_t = \mu + \epsilon_t$$

which is a simple regression with a time-invariant mean. The parameter $\mu$ is not estimated in the
state-space formulation below.
In this example, we fit weekly levels of the Standard and Poors 500 Index to a local-level model.
Because this model is already in state-space form, we fit close by typing

. use http://www.stata-press.com/data/r13/sp500w
. constraint 13 [z]L.z = 1
. constraint 14 [close]z = 1
. sspace (z L.z, state noconstant) (close z, noconstant), constraints(13 14)
searching for initial values ..........
(setting technique to bhhh)
Iteration 0:   log likelihood = -12581.763
Iteration 1:   log likelihood = -12577.727
 (output omitted )
Refining estimates:
Iteration 0:   log likelihood =  -12576.99
Iteration 1:   log likelihood =  -12576.99
State-space model
Sample: 1 - 3093                                Number of obs      =      3093
Log likelihood = -12576.99
 ( 1)  [z]L.z = 1
 ( 2)  [close]z = 1

                              OIM
       close        Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]

z
           z
         L1.            1  (constrained)

close
           z            1  (constrained)

      var(z)     170.3456   7.584909    22.46   0.000     155.4794    185.2117
  var(close)     15.24858   3.392457     4.49   0.000     8.599486    21.89767

Note: Model is not stationary.
Note: Tests of variances against zero are one sided, and the two-sided
      confidence intervals are truncated at zero.

The results indicate that both components have nonzero variances. The output footer informs us
that the model is nonstationary at the estimated parameter values.

Technical note
In the previous example, we estimated the parameters of a nonstationary state-space model. The
model is nonstationary because one of the eigenvalues of the A matrix has unit modulus. That all
the coefficients in the A matrix are fixed is also important. See Lütkepohl (2005, 636–637) for why
the ML estimator for the parameters of a state-space model that is nonstationary because of
eigenvalues with unit moduli from a fixed A matrix is still consistent and asymptotically normal.
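
The unit-modulus condition can be checked directly from the stored results. The following is a minimal sketch, assuming it is run immediately after the sspace command of example 6; the matrix names A, re, and im are illustrative choices, not part of the original example.

```stata
* Hedged sketch: inspect the eigenvalues of the fitted A matrix.
* Copy the estimated (here fully constrained) A matrix out of e().
matrix A = e(A)
* Real and imaginary parts of the eigenvalues of A.
matrix eigenvalues re im = A
matrix list re
matrix list im
* e(stationary) is 1 for a stationary model and 0 otherwise.
display "stationary flag: " e(stationary)
```

For the local-level model, A is the 1 x 1 matrix holding the constrained coefficient on L.z, so the single eigenvalue equals that coefficient.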

Example 7: A local linear-trend model


In another basic STS model, known as the local linear-trend model, both the level and the slope
of a linear time trend are random walks. Here are the state equations and the observation equation
for a local linear-trend model for the level of industrial production contained in variable ipman:

$$\begin{pmatrix} \mu_t \\ \beta_t \end{pmatrix} =
\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} \mu_{t-1} \\ \beta_{t-1} \end{pmatrix} +
\begin{pmatrix} \xi_{1t} \\ \xi_{2t} \end{pmatrix}$$

$$\text{ipman}_t = \mu_t + \epsilon_t$$
The estimated parameters are
. use http://www.stata-press.com/data/r13/dfex
(St. Louis Fed (FRED) macro data)
. constraint 15 [f1]L.f1 = 1
. constraint 16 [f1]L.f2 = 1
. constraint 17 [f2]L.f2 = 1
. constraint 18 [ipman]f1 = 1
. sspace (f1 L.f1 L.f2, state noconstant)
>        (f2 L.f2, state noconstant)
>        (ipman f1, noconstant), constraints(15/18)
searching for initial values ..........
(setting technique to bhhh)
Iteration 0:   log likelihood = -362.93861
Iteration 1:   log likelihood = -362.12048
 (output omitted )
Refining estimates:
Iteration 0:   log likelihood =  -359.1266
Iteration 1:   log likelihood =  -359.1266
State-space model
Sample: 1972m1 - 2008m11                        Number of obs      =       443
Log likelihood = -359.1266
 ( 1)  [f1]L.f1 = 1
 ( 2)  [f1]L.f2 = 1
 ( 3)  [f2]L.f2 = 1
 ( 4)  [ipman]f1 = 1

                              OIM
       ipman        Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]

f1
          f1
         L1.            1  (constrained)
          f2
         L1.            1  (constrained)

f2
          f2
         L1.            1  (constrained)

ipman
          f1            1  (constrained)

     var(f1)     .1473071   .0407156     3.62   0.000      .067506    .2271082
     var(f2)     .0178752   .0065743     2.72   0.003     .0049898    .0307606
  var(ipman)     .0354429   .0148186     2.39   0.008     .0063989    .0644868

Note: Model is not stationary.
Note: Tests of variances against zero are one sided, and the two-sided
      confidence intervals are truncated at zero.

There is little evidence that either of the variance parameters is zero. The fit obtained indicates
that we could now proceed with specification testing and checks to see how well this model forecasts
these data.
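
As one possible starting point for those checks, the sketch below uses only postestimation tools documented in [TS] sspace postestimation. It assumes the local linear-trend model of example 7 has just been fit; the new variable names (ipman_hat, level_sm, sres) are hypothetical.

```stata
* Hedged sketch: first-pass checks after the example 7 fit.
* Information criteria for comparing against alternative specifications.
estat ic
* One-step predictions of ipman (xb is the default statistic).
predict ipman_hat
* Smoothed estimate of the level state, which is equation f1 in this model.
predict level_sm, states equation(f1) smethod(smooth)
* Standardized residuals for reviewing the specification.
predict sres, rstandard
tsline ipman ipman_hat, xtitle("") legend(rows(2))
tsline sres, xtitle("")
```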

Stored results
sspace stores the following in e():
Scalars
    e(N)                   number of observations
    e(k)                   number of parameters
    e(k_aux)               number of auxiliary parameters
    e(k_eq)                number of equations in e(b)
    e(k_dv)                number of dependent variables
    e(k_obser)             number of observation equations
    e(k_state)             number of state equations
    e(k_obser_err)         number of observation-error terms
    e(k_state_err)         number of state-error terms
    e(df_m)                model degrees of freedom
    e(ll)                  log likelihood
    e(chi2)                χ²
    e(p)                   significance
    e(tmin)                minimum time in sample
    e(tmax)                maximum time in sample
    e(stationary)          1 if the estimated parameters indicate a stationary model, 0 otherwise
    e(rank)                rank of VCE
    e(ic)                  number of iterations
    e(rc)                  return code
    e(converged)           1 if converged, 0 otherwise

Macros
    e(cmd)                 sspace
    e(cmdline)             command as typed
    e(depvar)              unoperated names of dependent variables in observation equations
    e(obser_deps)          names of dependent variables in observation equations
    e(state_deps)          names of dependent variables in state equations
    e(covariates)          list of covariates
    e(indeps)              independent variables
    e(tvar)                variable denoting time within groups
    e(eqnames)             names of equations
    e(title)               title in estimation output
    e(tmins)               formatted minimum time
    e(tmaxs)               formatted maximum time
    e(R_structure)         structure of observed-variable-error covariance matrix
    e(Q_structure)         structure of state-error covariance matrix
    e(chi2type)            Wald; type of model χ² test
    e(vce)                 vcetype specified in vce()
    e(vcetype)             title used to label Std. Err.
    e(opt)                 type of optimization
    e(method)              likelihood method
    e(initial_values)      type of initial values
    e(technique)           maximization technique
    e(tech_steps)          iterations taken in maximization technique
    e(datasignature)       the checksum
    e(datasignaturevars)   variables used in calculation of checksum
    e(properties)          b V
    e(estat_cmd)           program used to implement estat
    e(predict)             program used to implement predict
    e(marginsok)           predictions allowed by margins
    e(marginsnotok)        predictions disallowed by margins

Matrices
    e(b)                   parameter vector
    e(Cns)                 constraints matrix
    e(ilog)                iteration log (up to 20 iterations)
    e(gradient)            gradient vector
    e(gamma)               mapping from parameter vector to state-space matrices
    e(A)                   estimated A matrix
    e(B)                   estimated B matrix
    e(C)                   estimated C matrix
    e(D)                   estimated D matrix
    e(F)                   estimated F matrix
    e(G)                   estimated G matrix
    e(chol_R)              Cholesky factor of estimated R matrix
    e(chol_Q)              Cholesky factor of estimated Q matrix
    e(chol_Sz0)            Cholesky factor of initial state covariance matrix
    e(z0)                  initial state vector augmented with a matrix identifying nonstationary components
    e(d)                   additional term in diffuse initial state vector, if nonstationary model
    e(T)                   inner part of quadratic form for initial state covariance in a partially
                             nonstationary model
    e(M)                   outer part of quadratic form for initial state covariance in a partially
                             nonstationary model
    e(V)                   variance–covariance matrix of the estimators
    e(V_modelbased)        model-based variance

Functions
    e(sample)              marks estimation sample
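
The stored results can be inspected after any sspace fit. The lines below are a small sketch, assuming an sspace model is the active estimation result; the matrix names L and Q are illustrative, and rebuilding Q as the product of its factor and the factor's transpose assumes the usual Cholesky convention.

```stata
* Hedged sketch: look at a few of the stored results listed above.
ereturn list
display "log likelihood = " e(ll)
display "converged = " e(converged) ", stationary = " e(stationary)
matrix list e(b)
matrix list e(A)
* Rebuild the state-error covariance matrix from its Cholesky factor.
matrix L = e(chol_Q)
matrix Q = L*L'
matrix list Q
```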

Methods and formulas


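Recall that our notation for linear state-space models with time-invariant coefficient matrices is

$$\mathbf{z}_t = \mathbf{A}\mathbf{z}_{t-1} + \mathbf{B}\mathbf{x}_t + \mathbf{C}\boldsymbol{\epsilon}_t$$

$$\mathbf{y}_t = \mathbf{D}\mathbf{z}_t + \mathbf{F}\mathbf{w}_t + \mathbf{G}\boldsymbol{\nu}_t$$

where

$\mathbf{z}_t$ is an $m \times 1$ vector of unobserved state variables;
$\mathbf{x}_t$ is a $k_x \times 1$ vector of exogenous variables;
$\boldsymbol{\epsilon}_t$ is a $q \times 1$ vector of state-error terms, $(q \le m)$;
$\mathbf{y}_t$ is an $n \times 1$ vector of observed endogenous variables;
$\mathbf{w}_t$ is a $k_w \times 1$ vector of exogenous variables;
$\boldsymbol{\nu}_t$ is an $r \times 1$ vector of observation-error terms, $(r \le n)$; and
$\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$, $\mathbf{D}$, $\mathbf{F}$, and $\mathbf{G}$ are parameter matrices.

The equations for $\mathbf{z}_t$ are known as the state equations, and the equations for $\mathbf{y}_t$ are known as the
observation equations.
The error terms are assumed to be zero mean, normally distributed, serially uncorrelated, and
uncorrelated with each other;

$$\boldsymbol{\epsilon}_t \sim N(\mathbf{0}, \mathbf{Q})$$

$$\boldsymbol{\nu}_t \sim N(\mathbf{0}, \mathbf{R})$$

$$E[\boldsymbol{\epsilon}_t \boldsymbol{\epsilon}_s'] = \mathbf{0} \text{ for all } s \neq t$$

$$E[\boldsymbol{\epsilon}_t \boldsymbol{\nu}_s'] = \mathbf{0} \text{ for all } s \text{ and } t$$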
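
As an illustration of this notation, the local-level model of example 6 can be written as a special case. The mapping below is a sketch: it treats the level disturbance as the state error and the idiosyncratic disturbance as the observation error, and the variance symbols on the last line are labels chosen here rather than notation from the entry.

```latex
% Local-level model in the general notation: m = n = q = r = 1, no exogenous variables.
\begin{aligned}
z_t &= \text{level of the series}, & A &= 1, & B &= 0, & C &= 1 \\
y_t &= \text{close}_t,             & D &= 1, & F &= 0, & G &= 1 \\
Q   &= \operatorname{Var}(\text{level disturbance}), &
R   &= \operatorname{Var}(\text{idiosyncratic disturbance})
\end{aligned}
```

The two constraints in example 6, [z]L.z = 1 and [close]z = 1, fix A and D at 1, which leaves only the two variances to be estimated.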

sspace estimates the parameters of linear state-space models by maximum likelihood. The Kalman
filter is a method for recursively obtaining linear, least-squares forecasts of yt conditional on
past information. These forecasts are used to construct the log likelihood, assuming normality and
stationarity. When the model is nonstationary, a diffuse Kalman filter is used.
Hamilton (1994a; 1994b, 389) shows that the QML estimator, obtained when the normality
assumption is dropped, is consistent and asymptotically normal, although the variance–covariance
matrix of the estimator (VCE) must be estimated by the Huber/White/sandwich estimator. Hamilton's
discussion applies to stationary models, and specifying vce(robust) produces a consistent estimator
of the VCE when the errors are not normal.
Methods for computing the log likelihood differ in how they calculate initial values for the Kalman
filter when the model is stationary, how they compute a diffuse Kalman filter when the model is
nonstationary, and whether terms for initial states are included. sspace offers the method(hybrid),
method(dejong), and method(kdiffuse) options for computing the log likelihood. All three
methods handle both stationary and nonstationary models.
method(hybrid), the default, uses the initial values for the states implied by stationarity to
initialize the Kalman filter when the model is stationary. Hamilton (1994b, 378) discusses this method
of computing initial values for the states and derives a log-likelihood function that does not include
terms for the initial states. When the model is nonstationary, method(hybrid) uses the De Jong
(1988, 1991) diffuse Kalman filter and log-likelihood function, which includes terms for the initial
states.
method(dejong) uses the stationary De Jong (1988) method when the model is stationary and
the De Jong (1988, 1991) diffuse Kalman filter when the model is nonstationary. The stationary
De Jong (1988) method estimates initial values for the Kalman filter as part of the log-likelihood
computation, as in De Jong (1988).
method(kdiffuse) implements the seldom-used large-κ diffuse approximation to the diffuse
Kalman filter when the model is nonstationary and uses initial values for the states implied by
stationarity when the model is stationary. The log likelihood does not include terms for the initial
states in either case. We recommend that you do not use method(kdiffuse) except to replicate
older results computed using this method.
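
The method() option can be compared across fits. The sketch below refits the local-level model of example 6 with the default and with method(dejong), assuming the dataset and constraints from that example; the stored-estimate names hybrid and dejong are illustrative.

```stata
* Hedged sketch: compare log likelihoods across method() choices.
use http://www.stata-press.com/data/r13/sp500w, clear
constraint 13 [z]L.z = 1
constraint 14 [close]z = 1
* Default: method(hybrid).
sspace (z L.z, state noconstant) (close z, noconstant), constraints(13 14)
estimates store hybrid
* Same model with the De Jong method requested explicitly.
sspace (z L.z, state noconstant) (close z, noconstant), constraints(13 14) method(dejong)
estimates store dejong
estimates table hybrid dejong, stats(ll)
```

Because this model is nonstationary, the description above implies that both choices use the De Jong diffuse Kalman filter, so the two fits would be expected to agree up to numerical tolerance.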
De Jong (1988, 1991) and De Jong and Chu-Chun-Lin (1994) derive the log likelihood and a diffuse
Kalman filter for handling nonstationary data. De Jong (1988) replaces the stationarity assumption
with a time-immemorial assumption, which he uses to derive the log-likelihood function, an initial
state vector, and a covariance of the initial state vector when the model is nonstationary. By default,
and when method(hybrid) or method(dejong) is specified, sspace uses the diffuse Kalman filter
given in definition 5 of De Jong and Chu-Chun-Lin (1994). This method uses theorem 3 of De Jong
and Chu-Chun-Lin (1994) to compute the covariance of the initial states. When using this method,
sspace saves the matrices from their theorem 3 in e(), although the names are changed. e(Z) is
their U1 , e(T) is their U2 , e(A) is their T, and e(M) is their M.
See De Jong (1988, 1991) and De Jong and Chu-Chun-Lin (1994) for the details of the De Jong
diffuse Kalman filter.
Practical estimation and inference require that the maximum likelihood estimator be consistent and
normally distributed in large samples. These statistical properties of the maximum likelihood estimator
are well established when the model is stationary; see Caines (1988, chap. 5 and 7), Hamilton (1994b,
388–389), and Hannan and Deistler (1988, chap. 4). When the model is nonstationary, additional
assumptions must hold for the maximum likelihood estimator to be consistent and asymptotically
normal; see Harvey (1989, sec. 3.4), Lütkepohl (2005, 636–637), and Schneider (1988). Chang,
Miller, and Park (2009) show that the ML and the QML estimators are consistent and asymptotically
normal for a class of nonstationary state-space models.

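We now give an intuitive version of the Kalman filter. sspace uses theoretically equivalent, but
numerically more stable, methods. For each time t, the Kalman filter produces the conditional expected
state vector $\mathbf{z}_{t|t}$ and the conditional covariance matrix $\boldsymbol{\Omega}_{t|t}$; both are conditional on information up
to and including time t. Using the model and previous period results, for each t we begin with

$$\begin{aligned}
\mathbf{z}_{t|t-1} &= \mathbf{A}\mathbf{z}_{t-1|t-1} + \mathbf{B}\mathbf{x}_t \\
\boldsymbol{\Omega}_{t|t-1} &= \mathbf{A}\boldsymbol{\Omega}_{t-1|t-1}\mathbf{A}' + \mathbf{C}\mathbf{Q}\mathbf{C}' \\
\mathbf{y}_{t|t-1} &= \mathbf{D}\mathbf{z}_{t|t-1} + \mathbf{F}\mathbf{w}_t
\end{aligned} \tag{6}$$

The residuals and the mean squared error (MSE) matrix of the forecast error are

$$\begin{aligned}
\widetilde{\boldsymbol{\nu}}_{t|t} &= \mathbf{y}_t - \mathbf{y}_{t|t-1} \\
\boldsymbol{\Sigma}_{t|t} &= \mathbf{D}\boldsymbol{\Omega}_{t|t-1}\mathbf{D}' + \mathbf{G}\mathbf{R}\mathbf{G}'
\end{aligned} \tag{7}$$

In the last steps, we update the conditional expected state vector and the conditional covariance
with the time t information:

$$\begin{aligned}
\mathbf{z}_{t|t} &= \mathbf{z}_{t|t-1} + \boldsymbol{\Omega}_{t|t-1}\mathbf{D}'\boldsymbol{\Sigma}_{t|t}^{-1}\widetilde{\boldsymbol{\nu}}_{t|t} \\
\boldsymbol{\Omega}_{t|t} &= \boldsymbol{\Omega}_{t|t-1} - \boldsymbol{\Omega}_{t|t-1}\mathbf{D}'\boldsymbol{\Sigma}_{t|t}^{-1}\mathbf{D}\boldsymbol{\Omega}_{t|t-1}
\end{aligned} \tag{8}$$

Equations (6)–(8) are the Kalman filter. The equations denoted by (6) are the one-step predictions.
The one-step predictions do not use contemporaneous values of $\mathbf{y}_t$; only past values of $\mathbf{y}_t$, past values
of the exogenous $\mathbf{x}_t$, and contemporaneous values of $\mathbf{x}_t$ are used. Equations (7) and (8) form the
update step of the Kalman filter; they incorporate the contemporaneous dependent variable information
into the predicted states.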
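
A scalar case makes the recursion concrete. The numbers below are assumed purely for illustration (they do not come from any dataset in this entry): a single state and a single observed variable with A = C = D = G = 1, no exogenous variables, state-error variance Q = 1, observation-error variance R = 2, starting values of 0 for the state and 1 for its variance, and a first observation of 3. Here Ω denotes the state covariance and Σ the forecast-error MSE.

```latex
% One pass of predict / forecast error / update for a scalar local-level filter.
\begin{aligned}
\text{predict:} \quad & z_{1|0} = z_{0|0} = 0, \qquad
  \Omega_{1|0} = \Omega_{0|0} + Q = 1 + 1 = 2, \qquad
  y_{1|0} = z_{1|0} = 0 \\
\text{forecast error:} \quad & \widetilde{\nu}_{1|1} = y_1 - y_{1|0} = 3, \qquad
  \Sigma_{1|1} = \Omega_{1|0} + R = 2 + 2 = 4 \\
\text{update:} \quad & z_{1|1} = 0 + 2 \cdot \tfrac{1}{4} \cdot 3 = 1.5, \qquad
  \Omega_{1|1} = 2 - 2 \cdot \tfrac{1}{4} \cdot 2 = 1
\end{aligned}
```

The updated state moves halfway toward the observation because the gain, 2/4, weighs the predicted state variance against the total forecast-error variance.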
The Kalman filter requires initial values for the states and a covariance matrix for the initial states
to start off the recursive process. Hamilton (1994b) discusses how to compute initial values for the
Kalman filter assuming stationarity. This method is used by default when the model is stationary. De
Jong (1988) discusses how to estimate initial values by maximum likelihood; this method is used
when method(dejong) is specified.
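Letting $\boldsymbol{\psi}$ be the vector of parameters in the model, Lütkepohl (2005) and Harvey (1989) show
that the log-likelihood function for the parameters of a stationary model is given by

$$\ln L(\boldsymbol{\psi}) = -0.5 \left\{ nT \ln(2\pi) + \sum_{t=1}^{T} \ln\left(|\boldsymbol{\Sigma}_{t|t-1}|\right) + \sum_{t=1}^{T} \mathbf{e}_t' \boldsymbol{\Sigma}_{t|t-1}^{-1} \mathbf{e}_t \right\}$$

where $\mathbf{e}_t = (\mathbf{y}_t - \mathbf{y}_{t|t-1})$ depends on $\boldsymbol{\psi}$ and $\boldsymbol{\Sigma}$ also depends on $\boldsymbol{\psi}$.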
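
For intuition, consider the contribution of one period with a single observed variable (n = 1). The values are assumed for illustration only: a forecast error of 3 and a forecast-error variance of 4.

```latex
% Contribution of one period to the log likelihood, scalar case.
\begin{aligned}
\ln L_1 &= -0.5 \left\{ \ln(2\pi) + \ln(4) + \frac{3^2}{4} \right\} \\
        &\approx -0.5 \left( 1.8379 + 1.3863 + 2.25 \right) \approx -2.7371
\end{aligned}
```

Summing such terms over t = 1, ..., T gives the log likelihood that is maximized over the model parameters.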


The variance–covariance matrix of the estimator (VCE) is estimated by the observed information matrix (OIM) estimator by default. Specifying vce(robust) causes sspace to use the Huber/White/sandwich estimator. Both estimators of the VCE are standard and documented in Hamilton (1994b).
Hamilton (1994b), Hannan and Deistler (1988), and Caines (1988) show that the ML estimator
is consistent and asymptotically normal when the model is stationary. Schneider (1988) establishes
consistency and asymptotic normality when the model is nonstationary because A has some eigenvalues
with modulus 1 and there are no unknown parameters in A.

Not all state-space models are identified, as discussed in Hamilton (1994b) and Lütkepohl (2005).
sspace checks for local identification at the optimum. sspace will not declare convergence unless
the Hessian is full rank. This check for local identifiability is due to Rothenberg (1971).
Specifying method(dejong) causes sspace to maximize the log-likelihood function given in
section 2 (vii) of De Jong (1988). This log-likelihood function includes the initial states as parameters
to be estimated. We use some of the methods in Casals, Sotoca, and Jerez (1999) for computing the
De Jong (1988) log-likelihood function.

References
Anderson, B. D. O., and J. B. Moore. 1979. Optimal Filtering. Englewood Cliffs, NJ: Prentice Hall.
Brockwell, P. J., and R. A. Davis. 1991. Time Series: Theory and Methods. 2nd ed. New York: Springer.
Caines, P. E. 1988. Linear Stochastic Systems. New York: Wiley.
Casals, J., S. Sotoca, and M. Jerez. 1999. A fast and stable method to compute the likelihood of time invariant
state-space models. Economics Letters 65: 329–337.
Chang, Y., J. I. Miller, and J. Y. Park. 2009. Extracting a common stochastic trend: Theory with some applications.
Journal of Econometrics 150: 231–247.
Davidson, R., and J. G. MacKinnon. 1993. Estimation and Inference in Econometrics. New York: Oxford University
Press.
De Jong, P. 1988. The likelihood for a state space model. Biometrika 75: 165–169.
De Jong, P. 1991. The diffuse Kalman filter. Annals of Statistics 19: 1073–1083.
De Jong, P., and S. Chu-Chun-Lin. 1994. Stationary and non-stationary state space models. Journal of Time Series
Analysis 15: 151–166.
Drukker, D. M., and V. L. Wiggins. 2004. Verifying the solution from a nonlinear solver: A case study: Comment.
American Economic Review 94: 397–399.
Hamilton, J. D. 1994a. State-space models. In Vol. 4 of Handbook of Econometrics, ed. R. F. Engle and D. L.
McFadden, 3039–3080. Amsterdam: Elsevier.
Hamilton, J. D. 1994b. Time Series Analysis. Princeton: Princeton University Press.
Hannan, E. J., and M. Deistler. 1988. The Statistical Theory of Linear Systems. New York: Wiley.
Harvey, A. C. 1989. Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge: Cambridge
University Press.
Harvey, A. C. 1993. Time Series Models. 2nd ed. Cambridge, MA: MIT Press.
Lütkepohl, H. 2005. New Introduction to Multiple Time Series Analysis. New York: Springer.
Rothenberg, T. J. 1971. Identification in parametric models. Econometrica 39: 577–591.
Schneider, W. 1988. Analytical uses of Kalman filtering in econometrics: A survey. Statistical Papers 29: 3–33.
Stock, J. H., and M. W. Watson. 1989. New indexes of coincident and leading economic indicators. In NBER
Macroeconomics Annual 1989, ed. O. J. Blanchard and S. Fischer, vol. 4, 351–394. Cambridge, MA: MIT Press.
Stock, J. H., and M. W. Watson. 1991. A probability model of the coincident economic indicators. In Leading
Economic Indicators: New Approaches and Forecasting Records, ed. K. Lahiri and G. H. Moore, 63–89. Cambridge:
Cambridge University Press.

Also see
[TS] sspace postestimation Postestimation tools for sspace
[TS] arima ARIMA, ARMAX, and other dynamic regression models
[TS] dfactor Dynamic-factor models
[TS] tsset Declare data to be time-series data
[TS] ucm Unobserved-components model
[TS] var Vector autoregressive models
[U] 20 Estimation and postestimation commands

Title
sspace postestimation Postestimation tools for sspace
Description
Syntax for predict
Menu for predict
Options for predict
Remarks and examples
Methods and formulas
References
Also see

Description
The following standard postestimation commands are available after sspace:
Command            Description

estat ic           Akaike's and Schwarz's Bayesian information criteria (AIC and BIC)
estat summarize    summary statistics for the estimation sample
estat vce          variance–covariance matrix of the estimators (VCE)
estimates          cataloging estimation results
forecast           dynamic forecasts and simulations
lincom             point estimates, standard errors, testing, and inference for linear combinations
                     of coefficients
lrtest             likelihood-ratio test
nlcom              point estimates, standard errors, testing, and inference for nonlinear combinations
                     of coefficients
predict            predictions, residuals, influence statistics, and other diagnostic measures
predictnl          point estimates, standard errors, testing, and inference for generalized predictions
test               Wald tests of simple and composite linear hypotheses
testnl             Wald tests of nonlinear hypotheses
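
A few of these commands are sketched below. The sketch assumes the dynamic-factor model fit in example 1 of Remarks and examples is the active estimation result, and it assumes the factor loadings can be addressed as [D.ipman]f and so on, following the [eqname]varname pattern used in the constraint commands; the particular hypotheses are chosen only for illustration.

```stata
* Hedged sketch: standard postestimation commands after sspace.
* Information criteria.
estat ic
* Difference between two factor loadings, with standard error and confidence interval.
lincom [D.ipman]f - [D.hours]f
* Wald test that the loading of D.income on the factor is zero.
test [D.income]f = 0
```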

Syntax for predict


predict [type] { stub* | newvarlist } [if] [in] [, statistic options]

statistic                    Description

Main
  xb                         observable variables
  states                     latent state variables
  residuals                  residuals
  rstandard                  standardized residuals

These statistics are available both in and out of sample; type predict ... if e(sample) ... if wanted only for
the estimation sample.

options                      Description

Options
  equation(eqnames)          name(s) of equation(s) for which predictions are to be made
  rmse(stub* | newvarlist)   put estimated root mean squared errors of predicted statistics in new
                               variables
  dynamic(time_constant)     begin dynamic forecast at specified time

Advanced
  smethod(method)            method for predicting unobserved states

method                       Description

  onestep                    predict using past information
  smooth                     predict using all sample information
  filter                     predict using past and contemporaneous information

Menu for predict


Statistics > Postestimation > Predictions, residuals, etc.

Options for predict




Main

xb, states, residuals, and rstandard specify the statistic to be predicted.


xb, the default, calculates the linear predictions of the observed variables.
states calculates the linear predictions of the latent state variables.
residuals calculates the residuals in the equations for observable variables. residuals may not
be specified with dynamic().
rstandard calculates the standardized residuals, which are the residuals normalized to be uncorrelated and to have unit variances. rstandard may not be specified with smethod(filter),
smethod(smooth), or dynamic().

Options

equation(eqnames) specifies the equation(s) for which the predictions are to be calculated. If you
do not specify equation() or stub*, the results are the same as if you had specified the name
of the first equation for the predicted statistic.
You specify a list of equation names, such as equation(income consumption) or equation(factor1 factor2), to identify the equations. Specify names of state equations when
predicting states and names of observable equations in all other cases.
equation() may not be specified with stub*.
rmse(stub* | newvarlist) puts the root mean squared errors of the predicted statistics into the specified
new variables. The root mean squared errors measure the variances due to the disturbances but do
not account for estimation error.

dynamic(time_constant) specifies when predict starts producing dynamic forecasts. The specified
time_constant must be in the scale of the time variable specified in tsset, and the time_constant
must be inside a sample for which observations on the dependent variables are available. For
example, dynamic(tq(2008q4)) causes dynamic predictions to begin in the fourth quarter of
2008, assuming that your time variable is quarterly; see [D] datetime. If the model contains
exogenous variables, they must be present for the whole predicted sample. dynamic() may not
be specified with rstandard, residuals, or smethod(smooth).

Advanced

smethod(method) specifies the method for predicting the unobserved states; smethod(onestep),
smethod(filter), and smethod(smooth) cause different amounts of information on the dependent variables to be used in predicting the states at each time period.
smethod(onestep), the default, causes predict to estimate the states at each time period using
previous information on the dependent variables. The Kalman filter is performed on previous
periods, but only the one-step predictions are made for the current period.
smethod(smooth) causes predict to estimate the states at each time period using all the sample
data by the Kalman smoother. smethod(smooth) may not be specified with rstandard.
smethod(filter) causes predict to estimate the states at each time period using previous and
contemporaneous data by the Kalman filter. The Kalman filter is performed on previous periods
and the current period. smethod(filter) may be specified only with states.
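
The three smethod() choices can be compared side by side. The sketch below assumes a model with a state equation named f and an observation equation named D.ipman (as in the dynamic-factor example) has just been fit; all new variable names are hypothetical.

```stata
* Hedged sketch: state estimates under each smethod(), plus RMSEs for xb.
* One-step (default): uses information through the previous period.
predict f_onestep, states equation(f)
* Filtered: also uses the current period.
predict f_filter, states equation(f) smethod(filter)
* Smoothed: uses the whole sample.
predict f_smooth, states equation(f) smethod(smooth)
* Linear prediction for one observed variable with its root mean squared error.
predict ipman_xb, xb equation(D.ipman) rmse(ipman_rmse)
tsline f_onestep f_filter f_smooth, xtitle("") legend(rows(3))
```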

Remarks and examples


We assume that you have already read [TS] sspace. In this entry, we illustrate some of the features
of predict after using sspace to estimate the parameters of a state-space model.
All the predictions after sspace depend on the unobserved states, which are estimated recursively.
Changing the sample can alter the state estimates, which can change all other predictions.

Example 1: One-step predictions


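In example 5 of [TS] sspace, we estimated the parameters of the dynamic-factor model

$$\begin{pmatrix} f_t \\ f_{t-1} \end{pmatrix} =
\begin{pmatrix} \theta_1 & \theta_2 \\ 1 & 0 \end{pmatrix}
\begin{pmatrix} f_{t-1} \\ f_{t-2} \end{pmatrix} +
\begin{pmatrix} \nu_t \\ 0 \end{pmatrix}$$

$$\begin{pmatrix} \Delta\text{ipman}_t \\ \Delta\text{income}_t \\ \Delta\text{hours}_t \\ \Delta\text{unemp}_t \end{pmatrix} =
\begin{pmatrix} \gamma_1 \\ \gamma_2 \\ \gamma_3 \\ \gamma_4 \end{pmatrix} f_t +
\begin{pmatrix} \epsilon_{1t} \\ \epsilon_{2t} \\ \epsilon_{3t} \\ \epsilon_{4t} \end{pmatrix}$$

where

$$\operatorname{Var}\begin{pmatrix} \epsilon_{1t} \\ \epsilon_{2t} \\ \epsilon_{3t} \\ \epsilon_{4t} \end{pmatrix} =
\begin{pmatrix} \sigma_1^2 & 0 & 0 & 0 \\ 0 & \sigma_2^2 & 0 & 0 \\ 0 & 0 & \sigma_3^2 & 0 \\ 0 & 0 & 0 & \sigma_4^2 \end{pmatrix}$$

by typing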
. use http://www.stata-press.com/data/r13/dfex
(St. Louis Fed (FRED) macro data)
. constraint 1 [lf]L.f = 1
. sspace (f L.f L.lf, state noconstant)
>        (lf L.f, state noconstant noerror)
>        (D.ipman f, noconstant)
>        (D.income f, noconstant)
>        (D.hours f, noconstant)
>        (D.unemp f, noconstant),
>        covstate(identity) constraints(1)
 (output omitted )

Below we obtain the one-step predictions for each of the four dependent variables in the model,
and then we graph the actual and predicted ipman:
. predict dep*
(option xb assumed; fitted values)

. tsline D.ipman dep1, lcolor(gs10) xtitle("") legend(rows(2))

(Figure: time-series line plot, 1970m1 to 2010m1, of "Industrial production; manufacturing (NAICS), D" and "xb prediction, D.ipman, onestep")

The graph shows that the one-step predictions account for only a small part of the swings in the
realized ipman.

Example 2: Out-of-sample, dynamic predictions


We use the estimates from the previous example to make out-of-sample predictions. After using
tsappend to extend the dataset by six periods, we use predict with the dynamic() option and
graph the result.
. tsappend, add(6)
. predict Dipman_f, dynamic(tm(2008m12)) equation(D.ipman)
. tsline D.ipman Dipman_f if month>=tm(2008m1), xtitle("") legend(rows(2))

(Figure: time-series line plot, 2008m1 to 2009m4, of "Industrial production; manufacturing (NAICS), D" and "xb prediction, D.ipman, dynamic(tm(2008m12))")

The model predicts that the changes in industrial production will remain negative for the forecast
horizon, although they increase toward zero.
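
The rmse() option can be used to attach approximate bands to the dynamic forecast. The sketch below assumes the dataset has already been extended with tsappend, add(6) as in this example; the new variable names are hypothetical, and the bands rest on the normality assumption and, as noted under rmse(), do not account for estimation error.

```stata
* Hedged sketch: dynamic forecast with approximate 95% bands.
predict Dipman_f2, dynamic(tm(2008m12)) equation(D.ipman) rmse(Dipman_rmse)
generate Dipman_lo = Dipman_f2 - invnormal(0.975)*Dipman_rmse
generate Dipman_hi = Dipman_f2 + invnormal(0.975)*Dipman_rmse
tsline D.ipman Dipman_f2 Dipman_lo Dipman_hi if month>=tm(2008m1), xtitle("") legend(rows(2))
```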

Example 3: Estimating an unobserved factor


In this example, we want to estimate the unobserved factor instead of predicting a dependent
variable. Specifying smethod(smooth) causes predict to use all sample information in estimating
the states by the Kalman smoother.
Below we estimate the unobserved factor by using the estimation sample, and we graph ipman
and the estimated factor:

. predict fac if e(sample), states smethod(smooth) equation(f)


. tsline D.ipman fac, xtitle("") legend(rows(2))

(Figure: time-series line plot, 1970m1 to 2010m1, of "Industrial production; manufacturing (NAICS), D" and "states, f, smooth")

Example 4: Calculating residuals


The residuals and the standardized residuals are frequently used to review the specification of the
model.
Below we calculate the standardized residuals for each of the series and display them in a combined
graph:
. predict sres1-sres4 if e(sample), rstandard
. tsline sres1, xtitle("") name(sres1)
. tsline sres2, xtitle("") name(sres2)
. tsline sres3, xtitle("") name(sres3)
. tsline sres4, xtitle("") name(sres4)

. graph combine sres1 sres2 sres3 sres4, name(combined)

(Figure: four combined time-series panels, 1970m1 to 2010m1, titled "rstandard, D.ipman, onestep", "rstandard, D.income, onestep", "rstandard, D.hours, onestep", and "rstandard, D.unemp, onestep")

Methods and formulas


Estimating the unobserved states is key to predicting the dependent variables.
By default and with the smethod(onestep) option, predict estimates the states in each period
by applying the Kalman filter to all previous periods and only making the one-step predictions to the
current period. (See Methods and formulas of [TS] sspace for the Kalman filter equations.)
With the smethod(filter) option, predict estimates the states in each period by applying
the Kalman filter on all previous periods and the current period. The computational difference between
smethod(onestep) and smethod(filter) is that smethod(filter) performs the update
step on the current period while smethod(onestep) does not. The statistical difference between
smethod(onestep) and smethod(filter) is that smethod(filter) uses contemporaneous
information on the dependent variables while smethod(onestep) does not.
As noted in [TS] sspace, sspace has both a stationary and a diffuse Kalman filter. predict uses
the same Kalman filter used for estimation.
With the smethod(smooth) option, predict estimates the states in each period using all the
sample information by applying the Kalman smoother. predict uses the Harvey (1989, sec. 3.6.2)
fixed-interval smoother with model-based initial values to estimate the states when the estimated
parameters imply a stationary model. De Jong (1989) provides a computationally efficient method.
Hamilton (1994) discusses the model-based initial values for stationary state-space models. When the
model is nonstationary, the De Jong (1989) diffuse Kalman smoother is used to predict the states.
The smoothed estimates of the states are subsequently used to predict the dependent variables.
The dependent variables are predicted by plugging in the estimated states. The residuals are
calculated as the differences between the predicted and the realized dependent variables. The root
mean squared errors are the square roots of the diagonal elements of the mean squared error matrices
that are computed by the Kalman filter. The standardized residuals are the residuals normalized by
the Cholesky factor of their mean squared error produced by the Kalman filter.
predict uses the Harvey (1989, sec. 3.5) methods to compute the dynamic forecasts and the root
mean squared errors. Let τ be the period at which the dynamic forecasts begin; τ must either be in
the specified sample or be in the period immediately following the specified sample.
The dynamic forecasts depend on the predicted states in the period τ − 1, which predict obtains by
running the Kalman filter or the diffuse Kalman filter on the previous sample observations. The states
in the periods prior to starting the dynamic predictions may be estimated using smethod(onestep)
or smethod(smooth).
Using an if or in qualifier to alter the prediction sample can change the estimate of the unobserved
states in the period prior to beginning the dynamic predictions and hence alter the dynamic predictions.
The initial states are estimated using e(b) and the prediction sample.

References
De Jong, P. 1988. The likelihood for a state space model. Biometrika 75: 165–169.
De Jong, P. 1989. Smoothing and interpolation with the state-space model. Journal of the American Statistical
Association 84: 1085–1088.
De Jong, P. 1991. The diffuse Kalman filter. Annals of Statistics 19: 1073–1083.
Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
Harvey, A. C. 1989. Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge: Cambridge
University Press.
Lütkepohl, H. 2005. New Introduction to Multiple Time Series Analysis. New York: Springer.

Also see
[TS] sspace State-space models
[TS] dfactor Dynamic-factor models
[TS] dfactor postestimation Postestimation tools for dfactor
[U] 20 Estimation and postestimation commands

Title
tsappend Add observations to a time-series dataset
Syntax
Menu
Description
Options
Remarks and examples
Stored results
Also see

Syntax
tsappend , { add(#) | last(date | clock) tsfmt(string) } [options]

options                 Description

  add(#)                add # observations
  last(date | clock)    add observations at date or clock
  tsfmt(string)         use time-series function string with last(date | clock)
  panel(panel_id)       add observations to panel panel_id

Either add(#) is required, or last(date | clock) and tsfmt(string) are required.


You must tsset your data before using tsappend; see [TS] tsset.

Menu
Statistics > Time series > Setup and utilities > Add observations to time-series dataset

Description
tsappend appends observations to a time-series dataset or to a panel dataset. tsappend uses and
updates the information set by tsset.

Options
add(#) specifies the number of observations to add.
last(date | clock) and tsfmt(string) must be specified together and are an alternative to add().
last(date | clock) specifies the date or the date and time of the last observation to add.
tsfmt(string) specifies the name of the Stata time-series function to use in converting the date
specified in last() to an integer. The function names are tc (clock), tC (Clock), td (daily), tw
(weekly), tm (monthly), tq (quarterly), and th (half-yearly).
For clock times, the last time added (if any) will be earlier than the time requested in
last(date | clock) if last() is not a multiple of delta units from the last time in the data.
For instance, you might specify last(17may2007) tsfmt(td), last(2001m1) tsfmt(tm), or
last(17may2007 15:30:00) tsfmt(tc).
panel(panel_id) specifies that observations be added only to panels with the ID specified in panel().

Remarks and examples


Remarks are presented under the following headings:
Introduction
Using tsappend with time-series data
Using tsappend with panel data

Introduction
tsappend adds observations to a time-series dataset or to a panel dataset. You must tsset your
data before using tsappend. tsappend simultaneously removes any gaps from the dataset.
There are two ways to use tsappend: you can specify the add(#) option to request that #
observations be added, or you can specify the last(date | clock) option to request that observations
be appended until the date specified is reached. If you specify last(), you must also specify tsfmt().
tsfmt() specifies the Stata time-series date function that converts the date held in last() to an
integer.
tsappend works with both time-series data and panel data. With panel data, tsappend adds the requested
observations to all the panels, unless the panel() option is also specified.
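As a minimal sketch of the two calling styles, the commands below build a small hypothetical monthly dataset (the variable names t and y and the dates are made up for illustration) and extend it first with add() and then with last() and tsfmt():

```stata
* Hypothetical monthly series: 1999m1 through 1999m6
clear
set obs 6
generate t = tm(1999m1) + _n - 1
format t %tm
generate y = _n
tsset t

* Style 1: add a fixed number of observations (through 1999m9)
tsappend, add(3)
display r(add)

* Style 2: extend the data through a given date (through 1999m12)
tsappend, last(1999m12) tsfmt(tm)
display r(add)

list t y, sep(0)
```

In both cases, only the time variable is filled in for the new observations; y is missing in the added rows.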

Using tsappend with time-series data


tsappend can be useful for appending observations when dynamically predicting a time series.
Consider an example in which tsappend adds the extra observations before dynamically predicting
from an AR(1) regression:
. use http://www.stata-press.com/data/r13/tsappend1
. regress y l.y
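      Source |       SS       df       MS              Number of obs =     479
-------------+------------------------------           F(  1,   477) =  119.29
       Model |  115.349555     1  115.349555           Prob > F      =  0.0000
    Residual |  461.241577   477  .966963473           R-squared     =  0.2001
-------------+------------------------------           Adj R-squared =  0.1984
       Total |  576.591132   478   1.2062576           Root MSE      =  .98334

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           y |
         L1. |   .4493507   .0411417    10.92   0.000     .3685093    .5301921
             |
       _cons |   11.11877   .8314581    13.37   0.000     9.484993    12.75254
------------------------------------------------------------------------------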

. mat b = e(b)
. mat colnames b = L.xb one
. tsset
time variable: t2, 1960m2 to 2000m1
delta: 1 month
. tsappend, add(12)
. tsset
time variable: t2, 1960m2 to 2001m1
delta: 1 month
. predict xb if t2<=tm(2000m2)
(option xb assumed; fitted values)
(12 missing values generated)

. gen one=1
. mat score xb=b if t2>=tm(2000m2), replace

The calls to tsset before and after tsappend were unnecessary. Their output reveals that tsappend
added another year of observations. We then used predict and matrix score to obtain the dynamic
predictions, which allows us to produce the following graph:

. line y xb t2 if t2>=tm(1995m1), ytitle("") xtitle("time")
[Figure: line plot of y and the fitted values xb against time, 1995m1 through 2001m1; vertical axis from 18 to 23]

In the call to tsappend, instead of saying that we wanted to add 12 observations, we could have
specified that we wanted to fill in observations through the first month of 2001:
. use http://www.stata-press.com/data/r13/tsappend1, clear
. tsset
time variable: t2, 1960m2 to 2000m1
delta: 1 month
. tsappend, last(2001m1) tsfmt(tm)
. tsset
time variable: t2, 1960m2 to 2001m1
delta: 1 month

We specified the tm() function in the tsfmt() option. [D] functions contains a list of time-series
functions for translating date literals to integers. Because we have monthly data, and since
[D] functions tells us that we want to use the tm() function, we specified the tsfmt(tm) option.
The following table shows the most common types of time-series data, their formats, the appropriate
translation functions, and the corresponding options for tsappend:
    Description    Format    Function    Option
    ------------------------------------------------
    time           %tc       tc()        tsfmt(tc)
    time           %tC       tC()        tsfmt(tC)
    daily          %td       td()        tsfmt(td)
    weekly         %tw       tw()        tsfmt(tw)
    monthly        %tm       tm()        tsfmt(tm)
    quarterly      %tq       tq()        tsfmt(tq)
    half-yearly    %th       th()        tsfmt(th)
    yearly         %ty       ty()        tsfmt(ty)
    ------------------------------------------------
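Each function in the table maps a date literal to the integer that Stata stores for that period, which is what tsappend needs from tsfmt(). As a quick check that uses only the functions listed above, you can display a few of these integers directly:

```stata
* Monthly dates count months since 1960m1
display tm(2000m1)
display tm(2001m1)

* Quarterly dates count quarters since 1960q1
display tq(2001q1)

* Show the monthly integer with its display format
display %tm tm(2001m1)
```

The first two values, 480 and 492, differ by 12, matching the 12 monthly observations that tsappend, last(2001m1) tsfmt(tm) appended to data ending in 2000m1.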


Using tsappend with panel data


tsappend's actions on panel data are similar to its actions on time-series data, except that tsappend
performs those actions on each time series within the panels.
If the end dates vary over panels, last() and add() will produce different results. add(#) always
adds # observations to each panel. If the data end at different periods before tsappend, add() is
used, the data will still end at different periods after tsappend, add(). In contrast, tsappend,
last() tsfmt() will cause all the panels to end on the specified last date. If the beginning dates
differ across panels, using tsappend, last() tsfmt() to provide a uniform ending date will not
create balanced panels because the number of observations per panel will still differ.
Consider the panel data summarized in the output below:
. use http://www.stata-press.com/data/r13/tsappend3, clear
. xtdescribe
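      id:  1, 2, ..., 3                                      n =          3
      t2:  1998m1, 1998m2, ..., 2000m1                       T =         25
           Delta(t2) = 1 month
           Span(t2)  = 25 periods
           (id*t2 uniquely identifies each observation)

Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                        13      13      13        20        24      24      24

     Freq.  Percent    Cum. |  Pattern
 ---------------------------+---------------------------
        1     33.33   33.33 |  ............1111111111111
        1     33.33   66.67 |  1111.11111111111111111111
        1     33.33  100.00 |  11111111111111111111.....
 ---------------------------+---------------------------
        3    100.00         |  XXXXXXXXXXXXXXXXXXXXXXXXX

. by id: summarize t2

-> id = 1
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          t2 |        13         474     3.89444        468        480

-> id = 2
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          t2 |        20       465.5     5.91608        456        475

-> id = 3
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          t2 |        24    468.3333    7.322786        456        480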

The output from xtdescribe and summarize on these data tells us that one panel starts later
than the other two, that another panel ends before the other two, and that the remaining panel has a gap
in the time variable but otherwise spans the entire time frame.


Now consider the data after a call to tsappend, add(6):


. tsappend, add(6)
. xtdescribe
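      id:  1, 2, ..., 3                                      n =          3
      t2:  1998m1, 1998m2, ..., 2000m7                       T =         31
           Delta(t2) = 1 month
           Span(t2)  = 31 periods
           (id*t2 uniquely identifies each observation)

Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                        19      19      19        26        31      31      31

     Freq.  Percent    Cum. |  Pattern
 ---------------------------+---------------------------------
        1     33.33   33.33 |  ............1111111111111111111
        1     33.33   66.67 |  11111111111111111111111111.....
        1     33.33  100.00 |  1111111111111111111111111111111
 ---------------------------+---------------------------------
        3    100.00         |  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

. by id: summarize t2

-> id = 1
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          t2 |        19         477    5.627314        468        486

-> id = 2
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          t2 |        26       468.5    7.648529        456        481

-> id = 3
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          t2 |        31         471    9.092121        456        486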

This output from xtdescribe and summarize after the call to tsappend shows that the call to
tsappend, add(6) added 6 observations to each panel and filled in the gap in the time variable in
the panel that had one (id = 3). tsappend, add() did not cause a uniform end date over the panels.
The following output illustrates the contrast between tsappend, add() and tsappend, last()
tsfmt() with panel data that end at different dates. The output from xtdescribe and summarize
shows that the call to tsappend, last() tsfmt() filled in the gap in t2 and caused all the panels
to end at the specified end date. The output also shows that the panels remain unbalanced because
one panel has a later entry date than the other two.


. use http://www.stata-press.com/data/r13/tsappend2, clear
. tsappend, last(2000m7) tsfmt(tm)
. xtdescribe
      id:  1, 2, ..., 3                                      n =          3
      t2:  1998m1, 1998m2, ..., 2000m7                       T =         31
           Delta(t2) = 1 month
           Span(t2)  = 31 periods
           (id*t2 uniquely identifies each observation)

Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                        19      19      19        31        31      31      31

     Freq.  Percent    Cum. |  Pattern
 ---------------------------+---------------------------------
        2     66.67   66.67 |  1111111111111111111111111111111
        1     33.33  100.00 |  ............1111111111111111111
 ---------------------------+---------------------------------
        3    100.00         |  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

. by id: summarize t2

-> id = 1
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          t2 |        19         477    5.627314        468        486

-> id = 2
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          t2 |        31         471    9.092121        456        486

-> id = 3
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          t2 |        31         471    9.092121        456        486

Stored results
tsappend stores the following in r():
Scalars
    r(add)    number of observations added

Also see
[TS] tsset Declare data to be time-series data

Title
tsfill Fill in gaps in time variable
Syntax    Menu    Description    Option    Remarks and examples    Also see

Syntax
    tsfill [, full]

You must tsset your data before using tsfill; see [TS] tsset.

Menu
Statistics > Time series > Setup and utilities > Fill in gaps in time variable

Description
tsfill is used after tsset to fill in gaps in time-series data and gaps in panel data with
new observations, which contain missing values. For instance, perhaps observations for timevar =
1, 3, 5, 6, ..., 22 exist. tsfill would create observations for timevar = 2 and timevar = 4 containing
all missing values. There is seldom reason to do this because Stata's time-series operators consider
timevar, not the observation number. Referring to L.gnp to obtain lagged gnp values would correctly
produce a missing value for timevar = 3, even if the data were not filled in. Referring to L2.gnp
would correctly return the value of gnp in the first observation for timevar = 3, even if the data were
not filled in.
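The following sketch illustrates this point with a small hypothetical dataset (the values of gnp are made up): the time-series operators return the same lags whether or not tsfill has been run.

```stata
* Hypothetical series observed at t = 1, 3, 5, 6 only
clear
input t gnp
1 100
3 110
5 120
6 130
end
tsset t

* Lags are computed from the time variable, not the observation number.
* lag1 is missing at t = 3 because t = 2 is absent;
* lag2 at t = 3 equals gnp at t = 1.
generate lag1 = L.gnp
generate lag2 = L2.gnp
list, sep(0)

* Filling the gaps adds rows for t = 2 and t = 4 with missing values
tsfill
list, sep(0)
```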

Option
full is for use with panel data only. With panel data, tsfill by default fills in observations for
each panel according to the minimum and maximum values of timevar for the panel. Thus if the
first panel spanned the times 5–20 and the second panel the times 1–15, after tsfill they would
still span the same periods; observations would be created to fill in any missing times from 5–20
in the first panel and from 1–15 in the second.
If full is specified, observations are created so that both panels span the time 1–20, the overall
minimum and maximum of timevar across panels.
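A minimal sketch of this behavior, using a made-up two-panel dataset whose variable names (id, t) are hypothetical, compares the span of each panel after tsfill and after tsfill, full:

```stata
* Hypothetical panels: id 1 spans t = 5-20, id 2 spans t = 1-15
clear
set obs 31
generate id = cond(_n <= 16, 1, 2)
generate t  = cond(id == 1, _n + 4, _n - 16)

* Create gaps at t = 8 and t = 12 in both panels
drop if inlist(t, 8, 12)
tsset id t

* Default: each panel keeps its own first and last period
tsfill
bysort id: summarize t

* full: every panel spans the overall range, t = 1-20
tsfill, full
bysort id: summarize t
```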

Remarks and examples


Remarks are presented under the following headings:
Using tsfill with time-series data
Using tsfill with panel data
Video example


Using tsfill with time-series data


You have monthly data, with gaps:
. use http://www.stata-press.com/data/r13/tsfillxmpl
. tsset
time variable: mdate, 1995m7 to 1996m3, but with gaps
delta: 1 month
. list mdate income
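     +------------------+
     |   mdate   income |
     |------------------|
  1. |  1995m7     1153 |
  2. |  1995m8     1181 |
  3. | 1995m11     1236 |
  4. | 1995m12     1297 |
  5. |  1996m1     1265 |
     |------------------|
  6. |  1996m3     1282 |
     +------------------+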

You can fill in the gaps by interpolation easily with tsfill and ipolate. tsfill creates the
missing observations:
. tsfill
. list mdate income
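     +------------------+
     |   mdate   income |
     |------------------|
  1. |  1995m7     1153 |
  2. |  1995m8     1181 |
  3. |  1995m9        . |  <- new
  4. | 1995m10        . |  <- new
  5. | 1995m11     1236 |
     |------------------|
  6. | 1995m12     1297 |
  7. |  1996m1     1265 |
  8. |  1996m2        . |  <- new
  9. |  1996m3     1282 |
     +------------------+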

We can now use ipolate (see [D] ipolate) to fill them in:
. ipolate income mdate, gen(ipinc)
. list mdate income ipinc
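     +------------------------------+
     |   mdate   income       ipinc |
     |------------------------------|
  1. |  1995m7     1153        1153 |
  2. |  1995m8     1181        1181 |
  3. |  1995m9        .   1199.3333 |
  4. | 1995m10        .   1217.6667 |
  5. | 1995m11     1236        1236 |
     |------------------------------|
  6. | 1995m12     1297        1297 |
  7. |  1996m1     1265        1265 |
  8. |  1996m2        .      1273.5 |
  9. |  1996m3     1282        1282 |
     +------------------------------+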


Using tsfill with panel data


You have the following panel dataset:
. use http://www.stata-press.com/data/r13/tsfillxmpl2, clear
. tsset
panel variable: edlevel (unbalanced)
time variable: year, 1988 to 1992, but with a gap
delta: 1 unit
. list edlevel year income
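     +-------------------------+
     | edlevel   year   income |
     |-------------------------|
  1. |       1   1988    14500 |
  2. |       1   1989    14750 |
  3. |       1   1990    14950 |
  4. |       1   1991    15100 |
  5. |       2   1989    22100 |
     |-------------------------|
  6. |       2   1990    22200 |
  7. |       2   1992    22800 |
     +-------------------------+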

Just as with nonpanel time-series datasets, you can use tsfill to fill in the gaps:
. tsfill
. list edlevel year income
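     +-------------------------+
     | edlevel   year   income |
     |-------------------------|
  1. |       1   1988    14500 |
  2. |       1   1989    14750 |
  3. |       1   1990    14950 |
  4. |       1   1991    15100 |
  5. |       2   1989    22100 |
     |-------------------------|
  6. |       2   1990    22200 |
  7. |       2   1991        . |  <- new
  8. |       2   1992    22800 |
     +-------------------------+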

You could instead use tsfill to produce fully balanced panels with the full option:
. tsfill, full
. list edlevel year income, sep(0)

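     +-------------------------+
     | edlevel   year   income |
     |-------------------------|
  1. |       1   1988    14500 |
  2. |       1   1989    14750 |
  3. |       1   1990    14950 |
  4. |       1   1991    15100 |
  5. |       1   1992        . |  <- new
  6. |       2   1988        . |  <- new
  7. |       2   1989    22100 |
  8. |       2   1990    22200 |
  9. |       2   1991        . |  <- new
 10. |       2   1992    22800 |
     +-------------------------+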


Video example
Time series, part 1: Formatting dates, tsset, tsreport, and tsfill

Also see
[TS] tsset Declare data to be time-series data
[TS] tsappend Add observations to a time-series dataset


Title
tsfilter Filter a time-series, keeping only selected periodicities
Syntax    Description    Remarks and examples    Methods and formulas    Acknowledgments    References    Also see

Syntax
Filter one variable
    tsfilter filter [type] newvar = varname [if] [in] [, options]

Filter multiple variables, unique names
    tsfilter filter [type] newvarlist = varlist [if] [in] [, options]

Filter multiple variables, common name stub
    tsfilter filter [type] stub* = varlist [if] [in] [, options]
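    filter    Name                     See
    ----------------------------------------------------
    bk        Baxter–King              [TS] tsfilter bk
    bw        Butterworth              [TS] tsfilter bw
    cf        Christiano–Fitzgerald    [TS] tsfilter cf
    hp        Hodrick–Prescott         [TS] tsfilter hp
    ----------------------------------------------------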

You must tsset or xtset your data before using tsfilter; see [TS] tsset and [XT] xtset.
varname and varlist may contain time-series operators; see [U] 11.4.4 Time-series varlists.
options differ across the filters and are documented in each filter's manual entry.

Description
tsfilter separates a time series into trend and cyclical components. The trend component may
contain a deterministic or a stochastic trend. The stationary cyclical component is driven by stochastic
cycles at the specified periods.

Remarks and examples


The time-series filters implemented in tsfilter separate a time series $y_t$ into trend and cyclical
components:
$$y_t = \tau_t + c_t$$
where $\tau_t$ is the trend component and $c_t$ is the cyclical component. $\tau_t$ may be nonstationary; it may
contain a deterministic or a stochastic trend, as discussed below.
The primary objective of the methods implemented in tsfilter is to estimate $c_t$, a stationary
cyclical component that is driven by stochastic cycles within a specified range of periods. The trend
component $\tau_t$ is calculated by the difference $\tau_t = y_t - c_t$.

Although the filters implemented in tsfilter have been widely applied by macroeconomists,
they are general time-series methods and may be of interest to other researchers.
Remarks are presented under the following headings:
An example dataset
A baseline method: Symmetric moving-average (SMA) filters
An overview of filtering in the frequency domain
SMA revisited: The Baxter–King filter
Filtering a random walk: The Christiano–Fitzgerald filter
A one-parameter high-pass filter: The Hodrick–Prescott filter
A two-parameter high-pass filter: The Butterworth filter

An example dataset
Time series are frequently filtered to remove unwanted characteristics, such as trends and seasonal
components, or to estimate components driven by stochastic cycles from a specific range of periods.
Although the filters implemented in tsfilter can be used for both purposes, their primary purpose
is the latter, and we restrict our discussion to that use.
We explain the methods implemented in tsfilter by estimating the business-cycle component
of a macroeconomic variable, because they are frequently used for this purpose. We estimate the
business-cycle component of the natural log of an index of the industrial production of the United
States, which is plotted below.

Example 1: A trending time series

. use http://www.stata-press.com/data/r13/ipq
(Federal Reserve Economic Data, St. Louis Fed)
. tsline ip_ln
[Figure: time-series plot of the log of industrial production, ip_ln, against the quarterly time variable, 1920q1 through 2010q1]

The above graph shows that ip_ln contains a trend component. Time series may contain deterministic
trends or stochastic trends. A polynomial function of time is the most common deterministic
time trend. An integrated process is the most common stochastic trend. An integrated process is a
random variable that must be differenced one or more times to be stationary; see Hamilton (1994) for
a discussion. The different filters implemented in tsfilter allow for different orders of deterministic
time trends or integrated processes.


We now illustrate the four methods implemented in tsfilter, each of which will remove the
trend and estimate the business-cycle component. Burns and Mitchell (1946) defined oscillations in
business data with recurring periods between 1.5 and 8 years to be business-cycle fluctuations; we
use their commonly accepted definition.

A baseline method: Symmetric moving-average (SMA) filters


Symmetric moving-average (SMA) filters form a baseline method for estimating a cyclical component
because of their properties and simplicity. An SMA filter of a time series $y_t$, $t \in \{1, \ldots, T\}$, is the
data transform defined by
$$y_t^* = \sum_{j=-q}^{q} \alpha_j y_{t-j}$$
for each $t \in \{q+1, \ldots, T-q\}$, where $\alpha_{-j} = \alpha_j$ for $j \in \{-q, \ldots, q\}$. Although the original series
has $T$ observations, the filtered series has only $T - 2q$, where $q$ is known as the order of the SMA
filter.
SMA filters with weights that sum to zero remove deterministic and stochastic trends of order 2 or
less, as shown by Fuller (1996) and Baxter and King (1999).

Example 2: A trend-removing SMA filter


This trend-removal property of SMA filters with coefficients that sum to zero may surprise some
readers. For illustration purposes, we filter ip_ln by the filter
$$-0.2\,\text{ip\_ln}_{t-2} - 0.2\,\text{ip\_ln}_{t-1} + 0.8\,\text{ip\_ln}_{t} - 0.2\,\text{ip\_ln}_{t+1} - 0.2\,\text{ip\_ln}_{t+2}$$
and plot the filtered series. We do not even need tsfilter to implement this second-order SMA
filter; we can use generate.
. generate ip_sma = -.2*L2.ip_ln-.2*L.ip_ln+.8*ip_ln-.2*F.ip_ln-.2*F2.ip_ln
(4 missing values generated)

. tsline ip_sma
[Figure: time-series plot of ip_sma against the quarterly time variable, 1920q1 through 2010q1; vertical axis from -.2 to .2]


The filter has removed the trend.
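To see why zero-sum symmetric weights remove a trend, the sketch below applies the same weights to a hypothetical, purely linear series (the intercept 3 and slope 0.5 are arbitrary, and the variable names are our own). Because the weights sum to zero, the constant drops out, and because they are symmetric, the linear term cancels as well:

```stata
* Hypothetical deterministic linear trend
clear
set obs 50
generate t = _n
tsset t
generate double y = 3 + 0.5*t

* Same weights as the SMA filter applied to ip_ln
generate double y_sma = -.2*L2.y - .2*L.y + .8*y - .2*F.y - .2*F2.y
summarize y_sma
```

The filtered values are zero up to floating-point rounding, and the first two and last two observations are missing because the filter has order 2.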

There is no good reason why we chose that particular SMA filter. Baxter and King (1999) derived
a class of SMA filters with coefficients that sum to zero and get as close as possible to keeping only
the specified cyclical component.

An overview of filtering in the frequency domain


We need some concepts from the frequency-domain approach to time-series analysis to motivate
how Baxter and King (1999) defined "as close as possible." These concepts also motivate the other
filters in tsfilter. The intuitive explanation presented here glosses over many technical details
discussed by Priestley (1981), Hamilton (1994), Fuller (1996), and Wei (2006).
As with much time-series analysis, the basic results are for covariance-stationary processes with
additional results handling some nonstationary cases. We present some useful results for
covariance-stationary processes and discuss how to handle nonstationary series below.
The autocovariances $\gamma_j$, $j \in \{0, 1, \ldots, \infty\}$, of a covariance-stationary process $y_t$ specify its
variance and dependence structure. In the frequency-domain approach to time-series analysis, $y_t$ and
the autocovariances are specified in terms of independent stochastic cycles that occur at frequencies
$\omega \in [-\pi, \pi]$. The spectral density function $f_y(\omega)$ specifies the contribution of stochastic cycles at
each frequency $\omega$ relative to the variance of $y_t$, which is denoted by $\sigma_y^2$. The variance and the
autocovariances can be expressed as an integral of the spectral density function. Formally,
$$\gamma_j = \int_{-\pi}^{\pi} e^{i\omega j} f_y(\omega)\, d\omega \qquad (1)$$
where $i$ is the imaginary number $i = \sqrt{-1}$.

Equation (1) can be manipulated to show what fraction of the variance of yt is attributable to
stochastic cycles in a specified range of frequencies. Hamilton (1994, 156) discusses this point in
more detail.
Equation (1) implies that if $f_y(\omega) = 0$ for $\omega \in [\omega_1, \omega_2]$, then stochastic cycles at these frequencies
contribute zero to the variance and autocovariances of $y_t$.
The goal of time-series filters is to transform the original series into a new series $y_t^*$ for which
the spectral density function of the filtered series $f_{y^*}(\omega)$ is zero for unwanted frequencies and equal
to $f_y(\omega)$ for desired frequencies.
A linear filter of $y_t$ can be written as
$$y_t^* = \sum_{j=-\infty}^{\infty} \alpha_j y_{t-j} = \alpha(L) y_t$$
where we let $y_t$ be an infinitely long series as required by some of the results below. To see the
impact of the filter on the components of $y_t$ at each frequency $\omega$, we need an expression for $f_{y^*}(\omega)$
in terms of $f_y(\omega)$ and the filter weights $\alpha_j$. Wei (2006, 282) shows that for each $\omega$,
$$f_{y^*}(\omega) = |\alpha(e^{i\omega})|^2 f_y(\omega) \qquad (2)$$


where $|\alpha(e^{i\omega})|$ is known as the gain of the filter. Equation (2) makes explicit that the squared gain
function $|\alpha(e^{i\omega})|^2$ converts the spectral density of the original series, $f_y(\omega)$, into the spectral density
of the filtered series, $f_{y^*}(\omega)$. In particular, (2) says that for each frequency $\omega$, the spectral density
of the filtered series is the product of the square of the gain of the filter and the spectral density of
the original series.
As we will see in the examples below, the gain function provides a crucial interpretation of what
a filter is doing. We want a filter for which $f_{y^*}(\omega) = 0$ for unwanted frequencies and for which
$f_{y^*}(\omega) = f_y(\omega)$ for desired frequencies. So we seek a filter for which the gain is 0 for unwanted
frequencies and for which the gain is 1 for desired frequencies.
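As an illustration, consider the five-term SMA filter of Example 2, with weights -0.2, -0.2, 0.8, -0.2, and -0.2. Because the weights are symmetric, $\alpha(e^{i\omega})$ is real and equals $0.8 - 0.4\cos\omega - 0.4\cos 2\omega$. The sketch below, in which the variable names omega and gain are our own, evaluates and plots the gain of that filter on a grid of frequencies in $[0, \pi]$:

```stata
* Grid of 201 frequencies from 0 to pi
clear
set obs 201
generate double omega = _pi*(_n-1)/(_N-1)

* Gain of the symmetric filter with weights (-.2, -.2, .8, -.2, -.2)
generate double gain = abs(0.8 - 0.4*cos(omega) - 0.4*cos(2*omega))
line gain omega, ytitle("Gain") xtitle("Frequency")
```

The gain is 0 at $\omega = 0$, which is why the filter removes the trend, but it rises only gradually, peaks at 1.25, and equals 0.8 at $\omega = \pi$, so it is far from a gain that is exactly 0 or 1 at every frequency.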
In practice, we cannot find such an ideal filter exactly, because the constraints an ideal filter places
on filter coefficients cannot be satisfied for time series with only a finite number of observations. The
expansive literature on filters is a result of the trade-offs involved in designing implementable filters
that approximate the ideal filter.
Ideally, filters pass or block the stochastic cycles at specified frequencies by having a gain of
1 or 0. Band-pass filters, such as the Baxter–King (BK) and the Christiano–Fitzgerald (CF) filters,
pass through stochastic cycles in the specified range of frequencies and block all the other stochastic
cycles. High-pass filters, such as the Hodrick–Prescott (HP) and Butterworth filters, only allow the
stochastic cycles at or above a specified frequency to pass through and block the lower-frequency
stochastic cycles. For band-pass filters, let $[\omega_0, \omega_1]$ be the set of desired frequencies with all other
frequencies being undesired. For high-pass filters, let $\omega_0$ be the cutoff frequency with only those
frequencies $\omega \ge \omega_0$ being desired.

SMA revisited: The Baxter–King filter


We now return to the class of SMA filters with coefficients that sum to zero and get as close as
possible to keeping only the specified cyclical component as derived by Baxter and King (1999).
For an infinitely long series, there is an ideal band-pass filter for which the gain function is 1 for
$\omega \in [\omega_0, \omega_1]$ and 0 for all other frequencies. It just so happens that this ideal band-pass filter is an
SMA filter with coefficients that sum to zero. Baxter and King (1999) derive the coefficients of this
ideal band-pass filter and then define the BK filter to be the SMA filter with $2q + 1$ terms that are
as close as possible to those of the ideal filter. There is a trade-off in choosing $q$: larger values of
$q$ cause the gain of the BK filter to be closer to the gain of the ideal filter, but larger values also
increase the number of missing observations in the filtered series.
Although the mathematics of the frequency-domain approach to time-series analysis is in terms of
stochastic cycles at frequencies $\omega \in [-\pi, \pi]$, applied work is generally in terms of periods $p$, where
$p = 2\pi/\omega$. So the options for the tsfilter subcommands are in terms of periods.
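For example, the business-cycle band of 6 to 32 quarters used in the examples in this entry corresponds to the following frequency cutoffs. This is simple arithmetic on $p = 2\pi/\omega$:

```stata
* Lower and upper frequency cutoffs in radians (pi/16 and pi/3)
display 2*_pi/32
display 2*_pi/6

* The same cutoffs as natural frequencies (cycles per period)
display 1/32
display 1/6
```

The longer period (32) gives the lower frequency cutoff, and the shorter period (6) gives the upper one.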

Example 3: A BK estimate of the business-cycle component


Below we use tsfilter bk, which implements the BK filter, to estimate the business-cycle
component composed of stochastic cycles between 6 and 32 periods, and then we graph the estimated
component.

. tsfilter bk ip_bk = ip_ln, minperiod(6) maxperiod(32)
. tsline ip_bk
[Figure: time-series plot of the ip_ln cyclical component from the bk filter against the quarterly time variable, 1920q1 through 2010q1]

The above graph tells us what the estimated business-cycle component looks like, but it presents no
evidence as to how well we have estimated the component. A periodogram is better for this purpose.
A periodogram is an estimator of a transform of the spectral density function; see [TS] pergram for
details. Below we plot the periodogram for the BK estimate of the business-cycle component. pergram
displays the results in natural frequencies, which are the standard frequencies divided by $2\pi$. We use
the xline() option to draw vertical lines at the lower natural-frequency cutoff (1/32 = 0.03125)
and the upper natural-frequency cutoff (1/6 ≈ 0.16667).

. pergram ip_bk, xline(0.03125 0.16667)
[Figure: sample spectral density function (log periodogram, scale from -6.00 to 6.00) of the ip_ln cyclical component from the bk filter, evaluated at the natural frequencies 0.00 to 0.50, with vertical lines at 0.03125 and 0.16667]

If the filter completely removed the stochastic cycles corresponding to the unwanted frequencies,
the periodogram would be a flat line at the minimum value of -6 outside the range identified by
the vertical lines. That the periodogram takes on values greater than -6 outside the specified range
indicates the inability of the BK filter to pass through only stochastic cycles at frequencies inside the
specified band.


We can also evaluate the BK filter by plotting its gain function against the gain function of an
ideal filter. In the output below, we reestimate the business-cycle component to store the gain of the
BK filter for the specified parameters. (The coefficients and the gain of the BK filter are completely
determined by the specified minimum period, the maximum period, and the order of the SMA filter.)
We label the variable bkgain for the graph below.
. drop ip_bk
. tsfilter bk ip_bk = ip_ln, minperiod(6) maxperiod(32) gain(bkgain abk)
. label variable bkgain "BK filter"

Below we generate ideal, the gain function of the ideal band-pass filter at the frequencies f.
Then we plot the gain of the ideal filter and the gain of the BK filter.
. generate f = _pi*(_n-1)/_N
. generate ideal = cond(f<_pi/16, 0, cond(f<_pi/3, 1,0))
. label variable ideal "Ideal filter"

. twoway line ideal f || line bkgain abk
[Figure: gain of the ideal filter and gain of the BK filter plotted against frequency]

The graph reveals that the gain of the BK filter deviates markedly from the square-wave gain of the
ideal filter. Increasing the order of the symmetric moving average via the smaorder() option will cause the gain
of the BK filter to more closely approximate the gain of the ideal filter at the cost of lost observations
in the filtered series.
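One way to see this cost, sketched below under the assumption that the ipq dataset from the examples above is used, is to filter the series with two different orders and count the observations lost at the ends of the sample. The variable names ip_bk12 and ip_bk20 and the orders 12 and 20 are chosen only for illustration:

```stata
use http://www.stata-press.com/data/r13/ipq, clear

* BK filter with two different SMA orders
tsfilter bk ip_bk12 = ip_ln, minperiod(6) maxperiod(32) smaorder(12)
tsfilter bk ip_bk20 = ip_ln, minperiod(6) maxperiod(32) smaorder(20)

* An SMA filter of order q loses q observations at each end of the sample
count if missing(ip_bk12) & !missing(ip_ln)
count if missing(ip_bk20) & !missing(ip_ln)
```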

Filtering a random walk: The Christiano–Fitzgerald filter


Although Baxter and King (1999) minimized the error between the coefficients in their filter and
the ideal band-pass filter, Christiano and Fitzgerald (2003) minimized the mean squared error between
the estimated component and the true component, assuming that the raw series is a random-walk
process. Christiano and Fitzgerald (2003) give three important reasons for using their filter:
1. The true dependence structure of the data affects which filter is optimal.
2. Many economic time series are well approximated by random-walk processes.

3. Their filter does a good job passing through stochastic cycles of desired frequencies and blocking
stochastic cycles from unwanted frequencies on a range of processes that are close to being a
random-walk process.
The CF filter obtains its optimality properties at the cost of an additional parameter that must be
estimated and a loss of robustness. The CF filter is optimal for a random-walk process. If the true
process is a random walk with drift, then the drift term must be estimated and removed; see [TS] tsfilter
cf for details. The CF filter is not symmetric, so it will not remove second-order deterministic or
second-order integrated processes. tsfilter cf also implements another filter that Christiano and
Fitzgerald (2003) derived that is an SMA filter with coefficients that sum to zero. This filter is designed
to be as close as possible to the random-walk optimal filter under the constraint that it be an SMA
filter with coefficients that sum to zero; see [TS] tsfilter cf for details.

Technical note
A random-walk process is a first-order integrated process; it must be differenced once to produce
a stationary process. Formally, a random-walk process is given by $y_t = y_{t-1} + \epsilon_t$, where $\epsilon_t$ is a
zero-mean stationary random variable. A random-walk-plus-drift process is given by $\tilde{y}_t = \mu + \tilde{y}_{t-1} + \epsilon_t$,
where $\epsilon_t$ is a zero-mean stationary random variable.
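The two processes can be simulated in a few lines. The sketch below is only an illustration: it assumes independent standard normal shocks (a special case of a zero-mean stationary variable), an arbitrary drift of 0.1, an arbitrary seed, and made-up variable names:

```stata
clear
set seed 12345
set obs 200
generate t = _n
tsset t
generate double e = rnormal()

* Random walk: running sum of the shocks
generate double y = sum(e)

* Random walk plus drift: running sum of drift plus shocks
generate double ydrift = sum(0.1 + e)

* First differences recover the shocks (plus the drift, for ydrift)
summarize D.y D.ydrift
```

Differencing once returns a stationary series in both cases; the mean of D.ydrift exceeds the mean of D.y by exactly the drift, 0.1, up to rounding.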

Example 4: A CF estimate of the business-cycle component


In this example, we use the CF filter to estimate the business-cycle component, and we plot the
periodogram of the CF estimates. We specify the drift option because ip_ln is well approximated
by a random-walk-plus-drift process.
. tsfilter cf ip_cf = ip_ln, minperiod(6) maxperiod(32) drift

. pergram ip_cf, xline(0.03125 0.16667)
[Figure: sample spectral density function (log periodogram, scale from -6.00 to 6.00) of the ip_ln cyclical component from the cf filter, evaluated at the natural frequencies 0.00 to 0.50, with vertical lines at 0.03125 and 0.16667]

The periodogram of the CF estimates of the business-cycle component indicates that the CF filter
did a better job than the BK filter of passing through only the desired stochastic cycles. Given that
ip_ln is well approximated by a random-walk-plus-drift process, the relative performance of the CF
filter is not surprising.


As with the BK filter, plotting the gain of the CF filter and the gain of the ideal filter gives an
impression of how well the filter isolates the specified components. In the output below, we reestimate
the business-cycle component, using the gain() option to store the gain of the CF filter, and we plot
the gain functions.
. drop ip_cf
. tsfilter cf ip_cf = ip_ln, minperiod(6) maxperiod(32) drift gain(cfgain acf)
. label variable cfgain "CF filter"
. twoway line ideal f || line cfgain acf
[Figure: gain of the ideal filter and gain of the CF filter plotted against frequency]

Comparing this graph with the graph of the BK gain function reveals that the CF filter is closer to
the gain of the ideal filter than is the BK filter. The graph also reveals that the gain of the CF filter
oscillates above and below 1 for desired frequencies.

The choice between the BK and CF filters is a trade-off between robustness and efficiency. The BK filter
handles a broader class of stochastic processes, but the CF filter produces a better estimate of $c_t$ if
$y_t$ is close to a random-walk process or a random-walk-plus-drift process.

A one-parameter high-pass filter: The Hodrick–Prescott filter


Hodrick and Prescott (1997) motivated the Hodrick–Prescott (HP) filter as a trend-removal technique
that could be applied to data that came from a wide class of data-generating processes. In their view,
the technique specified a trend in the data, and the data were filtered by removing the trend. The
smoothness of the trend depends on a parameter $\lambda$. The trend becomes smoother as $\lambda \to \infty$. Hodrick
and Prescott (1997) recommended setting $\lambda$ to 1,600 for quarterly data.
King and Rebelo (1993) showed that removing a trend estimated by the HP filter is equivalent to
a high-pass filter. They derived the gain function of this high-pass filter and showed that the filter
would make integrated processes of order 4 or less stationary, making the HP filter comparable with
the band-pass filters discussed above.


Example 5: An HP estimate of the business-cycle component


We begin by applying the HP high-pass filter to ip_ln and plotting the periodogram of the
estimated business-cycle component. We specify the gain() option because we will use the gain of the
filter in the next example.

. tsfilter hp ip_hp = ip_ln, gain(hpg1600 ahp1600)
. label variable hpg1600 "HP(1600) filter"
. pergram ip_hp, xline(0.03125)

[Graph: log periodogram (sample spectral density function) of the ip_ln cyclical component from the hp filter, plotted against frequency and evaluated at the natural frequencies]

Because the HP filter is a high-pass filter, the high-frequency stochastic cycles corresponding to
those periods below 6 remain in the estimated component. Of more concern is the presence of the
low-frequency stochastic cycles that the filter should remove. We address this issue in the example
below.

Example 6: Choosing the parameters for the HP filter


Hodrick and Prescott (1997) argued that the smoothing parameter should be 1,600 on the basis
of a heuristic argument that specified values for the variance of the cyclical component and the
variance of the second difference of the trend component, both recorded at quarterly frequencies. In
this example, we choose the smoothing parameter to be 677.13, which sets the gain of the filter to
0.5 at the frequency corresponding to 32 periods, as explained in the technical note below. We then
plot the periodogram of the filtered series.

. tsfilter hp ip_hp2 = ip_ln, smooth(677.13) gain(hpg677 ahp677)
. label variable hpg677 "HP(677) filter"
. pergram ip_hp2, xline(0.03125)

[Graph: log periodogram (sample spectral density function) of the ip_ln cyclical component from the hp filter with smooth(677.13), plotted against frequency and evaluated at the natural frequencies]

Although the periodogram looks better than the periodogram with the default smoothing, the HP
filter still did not zero out the low-frequency stochastic cycles as well as the CF filter did. We take
another look at this issue by plotting the gain functions for these filters along with the gain function
from the ideal band-pass filter.

. twoway line ideal f || line hpg677 ahp677

[Graph: gain functions of the ideal filter and the HP(677) filter]

Comparing the gain graphs reveals that the gain of the CF filter is closest to the gain of the ideal
filter. Both the BK and the HP filters allow some low-frequency stochastic cycles to pass through. The
plot also illustrates that the HP filter is a high-pass filter because its gain is 1 for those stochastic
cycles at periods below 6, whereas the other gain functions go to zero.


Technical note
Conventionally, economists have used $\lambda = 1600$, which Hodrick and Prescott (1997) recommended
for quarterly data. Ravn and Uhlig (2002) derived values for $\lambda$ at monthly and annual frequencies that
are rescalings of the conventional $\lambda = 1600$ for quarterly data. These heuristic values are the default
values; see [TS] tsfilter hp for details. In the filter literature, filter parameters are set as functions of
the cutoff frequency; see Pollock (2000, 324), for instance. This method finds the filter parameter
that sets the gain of the filter equal to 1/2 at the cutoff frequency. Applying this method to selecting
$\lambda$ at the cutoff frequency of 32 periods requires solving

$$ \frac{1}{2} = \frac{4\lambda\{1 - \cos(2\pi/32)\}^2}{1 + 4\lambda\{1 - \cos(2\pi/32)\}^2} $$

for $\lambda$, which yields $\lambda \approx 677.13$, the value used in the previous example.
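The arithmetic in this note can be checked independently. The sketch below is written in Python rather than Stata, purely as a numerical check; the function names hp_gain and hp_lambda_for_cutoff are hypothetical, and the gain expression is the one that appears in the equation above.

```python
import math

def hp_gain(omega, lam):
    """Gain of the HP high-pass filter at angular frequency omega,
    using the expression from the technical note."""
    x = 4.0 * lam * (1.0 - math.cos(omega)) ** 2
    return x / (1.0 + x)

def hp_lambda_for_cutoff(period):
    """Smoothing parameter that sets the gain to 1/2 at the cutoff period.
    Setting x / (1 + x) = 1/2 gives x = 1, so lambda = 1 / [4 {1 - cos(2 pi / p)}^2]."""
    omega_c = 2.0 * math.pi / period
    return 1.0 / (4.0 * (1.0 - math.cos(omega_c)) ** 2)

lam = hp_lambda_for_cutoff(32)
cutoff = 2.0 * math.pi / 32
print(round(lam, 2))            # smoothing parameter for a 32-period cutoff
print(hp_gain(cutoff, lam))     # gain at the cutoff with that parameter
print(hp_gain(cutoff, 1600.0))  # gain at the same frequency with lambda = 1600
```

With $\lambda = 1600$ the gain at the 32-period frequency is above 1/2, which is consistent with the default filter letting more of the low-frequency cycles through.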

The gain function of the HP filter is a function of the parameter $\lambda$, and $\lambda$ sets both the location of
the cutoff frequency and the slope of the gain function. The graph below illustrates this dependence
by plotting the gain function of the HP filter for $\lambda$ set to 10, 677.13, and 1,600 along with the gain
function for the ideal band-pass filter with cutoff periods of 32 periods and 6 periods.

[Graph: gain functions of the ideal filter and the HP(10), HP(677), and HP(1600) filters]

A two-parameter high-pass filter: The Butterworth filter


Engineers have used Butterworth filters for a long time because they are "maximally flat". The
gain functions of these filters are as close as possible to being a flat line at 0 for the unwanted periods
and a flat line at 1 for the desired periods; see Butterworth (1930) and Bianchi and Sorrentino (2007,
17–20).
Pollock (2000) showed that Butterworth filters can be derived from some axioms that specify
properties we would like a filter to have. Although the Butterworth and BK filters share the properties
of symmetry and phase neutrality, the coefficients of Butterworth filters do not need to sum to
zero. (Phase-neutral filters do not shift the signal forward or backward in time; see Pollock [1999].)
Although the BK filter relies on the detrending properties of SMA filters with coefficients that sum
to zero, Pollock (2000) shows that Butterworth filters have detrending properties that depend on the
filter's parameters.


tsfilter bw implements the high-pass Butterworth filter using the computational method that
Pollock (2000) derived. This filter has two parameters: the cutoff period and the order of the filter
denoted by m. The cutoff period sets the location where the gain function starts to filter out the
high-period (low-frequency) stochastic cycles, and m sets the slope of the gain function for a given
cutoff period. For a given cutoff period, the slope of the gain function at the cutoff period increases
with m. For a given m, the slope of the gain function at the cutoff period increases with the cutoff
period.
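The two slope statements above can be illustrated numerically. The following Python sketch (not Stata code) assumes the standard gain function of a high-pass Butterworth filter, $\psi(\omega) = [1 + \{\tan(\omega_c/2)/\tan(\omega/2)\}^{2m}]^{-1}$; that form is an assumption here, and [TS] tsfilter bw should be consulted for the expression Stata documents. The function names are hypothetical.

```python
import math

def bw_highpass_gain(omega, cutoff_period, order):
    """Assumed gain of a high-pass Butterworth filter at angular frequency omega."""
    if omega <= 0.0:
        return 0.0  # limit of the gain as omega approaches zero
    omega_c = 2.0 * math.pi / cutoff_period
    ratio = math.tan(omega_c / 2.0) / math.tan(omega / 2.0)
    try:
        return 1.0 / (1.0 + ratio ** (2 * order))
    except OverflowError:
        return 0.0  # ratio ** (2 m) too large to represent: gain is numerically zero

def slope_at_cutoff(cutoff_period, order, h=1e-6):
    """Central-difference estimate of the slope of the gain at the cutoff frequency."""
    omega_c = 2.0 * math.pi / cutoff_period
    upper = bw_highpass_gain(omega_c + h, cutoff_period, order)
    lower = bw_highpass_gain(omega_c - h, cutoff_period, order)
    return (upper - lower) / (2.0 * h)

for period in (6, 32):
    for m in (2, 6, 20):
        print(period, m, round(slope_at_cutoff(period, m), 3))
```

Under the assumed form, differentiating at $\omega = \omega_c$ gives a slope of $m/(2\sin\omega_c)$, which grows with $m$ and, for cutoff periods longer than 4, with the cutoff period.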
We cannot obtain a vertical slope at the cutoff frequency, which is the ideal, because the computation
becomes unstable; see Pollock (2000). The m for which the computation becomes unstable depends
on the cutoff period.
Pollock (2000) and Gomez (1999) argue that the additional flexibility produced by the additional
parameter makes the high-pass Butterworth filter a better filter than the HP filter for estimating the
cyclical components.
Pollock (2000) shows that the high-pass Butterworth filter can estimate the desired components of
the $d$th difference of a $d$th-order integrated process as long as $m \ge d$.

Example 7: A Butterworth filter that removes low-frequency components


Below we use tsfilter bw to remove the components driven by stochastic cycles with periods greater
than 32, using Butterworth filters of order 2 and order 6. We also compute, label, and plot the gain
functions for each filter.
. tsfilter bw ip_bw1 = ip_ln, gain(bwgain1 abw1) maxperiod(32) order(2)
. label variable bwgain1 "BW 2"
. tsfilter bw ip_bw6 = ip_ln, gain(bwgain6 abw6) maxperiod(32) order(6)
. label variable bwgain6 "BW 6"
. twoway line ideal f || line bwgain1 abw1 || line bwgain6 abw6

[Graph: gain functions of the ideal filter and the BW 2 and BW 6 filters]

The graph illustrates that the slope of the gain function increases with the order of the filter.
The graph below provides another perspective by plotting the gain function from the ideal band-pass
filter on a graph with plots of the gain functions from the Butterworth filter of order 6, the CF filter,
and the HP(677) filter.

. twoway line ideal f || line bwgain6 abw6 || line cfgain acf
> || line hpg677 ahp677

[Graph: gain functions of the ideal filter, the BW 6 filter, the CF filter, and the HP(677) filter]

Although the slope of the gain function from the CF filter is closer to being vertical at the cutoff
frequency, the gain function of the Butterworth filter does not oscillate above and below 1 after it first
reaches the value of 1. The flatness of the Butterworth filter below and above the cutoff frequency
is not an accident; it is one of the filter's properties.

Example 8: A Butterworth filter that removes high-frequency components


In the previous example, we used the Butterworth filter of order 6 to remove low-frequency
stochastic cycles, and we saved the results in ip_bw6. The Butterworth filter did not address the
high-frequency stochastic cycles below 6 periods because it is a high-pass filter. We remove those
high-frequency stochastic cycles in this example by keeping the trend produced by refiltering the
previously filtered series.
This example uses a common trick: keeping the trend produced by a high-pass filter turns that
high-pass filter into a low-pass filter. Because we want to remove the high-frequency stochastic cycles
still in the previously filtered series ip_bw6, we need a low-pass filter. So we keep the trend produced
by refiltering the previously filtered series.
In the output below, we apply a Butterworth filter of order 20 to the previously filtered series
ip_bw6. We explain why we used order 20 in the next example. We specify the trend() option to
keep the low-frequency components from these filters. Then we compute and graph the periodogram
for the trend variable.

. tsfilter bw ip_bwu20 = ip_bw6, gain(bwg20 fbw20) maxperiod(6) order(20)
> trend(ip_bwb)
. label variable bwg20 "BW upper filter 20"
. pergram ip_bwb, xline(0.03125 0.16667)

[Graph: log periodogram (sample spectral density function) of the ip_bw6 trend component from the bw filter, plotted against frequency and evaluated at the natural frequencies]

The periodogram reveals that the two-pass process has passed the original series ip_ln through
a band-pass filter. It also reveals that the two-pass process did a reasonable job of filtering out the
stochastic cycles corresponding to the unwanted frequencies.
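To see why the two passes act as a band-pass filter, the following Python sketch (not Stata code) multiplies the gain of the first pass by the gain of the trend kept from the second pass. It assumes the standard high-pass Butterworth gain $[1 + \{\tan(\omega_c/2)/\tan(\omega/2)\}^{2m}]^{-1}$ and assumes that the trend kept from a phase-neutral high-pass filter has gain one minus the high-pass gain; both are assumptions for illustration, and the function names are hypothetical.

```python
import math

def bw_highpass_gain(omega, cutoff_period, order):
    """Assumed gain of a high-pass Butterworth filter at angular frequency omega."""
    if omega <= 0.0:
        return 0.0  # limit of the gain as omega approaches zero
    omega_c = 2.0 * math.pi / cutoff_period
    ratio = math.tan(omega_c / 2.0) / math.tan(omega / 2.0)
    try:
        return 1.0 / (1.0 + ratio ** (2 * order))
    except OverflowError:
        return 0.0  # ratio ** (2 m) too large to represent: gain is numerically zero

def two_pass_gain(omega):
    """Gain of pass 1 (cutoff 32, order 6) times gain of the trend from pass 2 (cutoff 6, order 20)."""
    passes_periods_below_32 = bw_highpass_gain(omega, 32, 6)
    passes_periods_above_6 = 1.0 - bw_highpass_gain(omega, 6, 20)
    return passes_periods_below_32 * passes_periods_above_6

for period in (100, 12, 3):
    print(period, two_pass_gain(2.0 * math.pi / period))
```

Under these assumptions the combined gain is close to 1 inside the band (for example, at 12 periods) and close to 0 on either side of it (for example, at 100 and at 3 periods).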

Example 9: Choosing the order of a Butterworth filter


In the previous example, when the cutoff period was 6, we set the order of the Butterworth filter
to 20. In contrast, in example 7, when the cutoff period was 32, we set the order of the Butterworth
filter to 6. We had to increase the filter order because the slope of the gain function of the Butterworth
filter is increasing with the cutoff period. We needed a larger filter order to get an acceptable slope
at the lower cutoff period.
We illustrate this point in the output below. We apply Butterworth filters of orders 2 and 6 to the
previously filtered series ip_bw6, we compute the gain functions, we label the gain variables, and
then we plot the gain functions from the ideal filter and the Butterworth filters.

. tsfilter bw ip_bwu1 = ip_bw6, gain(bwg1 fbw1) maxperiod(6) order(2)
. label variable bwg1 "BW upper filter 2"
. tsfilter bw ip_bwu6 = ip_bw6, gain(bwg6 fbw6) maxperiod(6) order(6)
. label variable bwg6 "BW upper filter 6"
. twoway line ideal f || line bwg1 fbw1 || line bwg6 fbw6 || line bwg20 fbw20

[Graph: gain functions of the ideal filter and the BW upper filters of order 2, 6, and 20]

Because the cutoff period is 6, the gain functions for m = 2 and m = 6 are much flatter than the
gain functions for m = 2 and m = 6 in example 7 when the cutoff period was 32. The gain function
for m = 20 is reasonably close to vertical, so we used it in example 8. We mentioned above that
for any given cutoff period, the computation eventually becomes unstable for larger values of m. For
instance, when the cutoff period is 32, m = 20 is not numerically feasible.

Example 10: Comparing the Butterworth and CF estimates


As a conclusion, we plot the business-cycle components estimated by the CF filter and by the
two passes of Butterworth filters. The shaded areas identify recessions. The two estimates are close
but the differences could be important. Which estimate is better depends on whether the oscillations
around 1 in the graph of the CF gain function (the second graph of example 7) cause more problems
than the nonvertical slopes at the cutoff periods that occur in the BW6 gain function of that same
graph and the BW upper filter 20 gain function graphed above.

[Graph: business-cycle components estimated by the Butterworth filter and by the CF filter, plotted against the quarterly time variable]

There is a long tradition in economics of using models to estimate components. Instead of comparing
filters by their gain functions, some authors compare filters by finding underlying models for which
the filter parameters are the model parameters. For instance, Harvey and Jaeger (1993), Gomez (1999,
2001), Pollock (2000, 2006), and Harvey and Trimbur (2003) derive models that correspond to the
HP or the Butterworth filter. Some of these references also compare components estimated by filters
with components estimated by making predictions from estimated models. In effect, these references
point out that arima, dfactor, sspace, and ucm (see [TS] arima, [TS] dfactor, [TS] sspace, and
[TS] ucm) implement alternative methods of component estimation.

Methods and formulas


All filters work with both time-series data and panel data when there are many observations on
each panel. When used with panel data, the calculations are performed separately within each panel.
For these filters, the default minimum and maximum periods of oscillation correspond to the
boundaries used by economists (Burns and Mitchell 1946) for business cycles. Burns and Mitchell
defined business cycles as oscillations in business data with recurring periods between 1.5 and 8
years. Their definition continues to be cited by economists investigating correlations between business
cycles.
If $y_t$ is a time series, then the cyclical component is

$$ c_t = B(L)y_t = \sum_{j=-\infty}^{\infty} b_j y_{t-j} $$

where $b_j$ are the coefficients of the impulse–response sequence of some ideal filter. The impulse–response
sequence is the inverse Fourier transform of either a square wave or step function depending
upon whether the filter is a band-pass or high-pass filter, respectively.

In finite sequences, it is necessary to approximate this calculation with a finite impulse–response
sequence $\hat{b}_j$:

$$ \hat{c}_t = \hat{B}_t(L)y_t = \sum_{j=-n_1}^{n_2} \hat{b}_j y_{t-j} $$

The infinite-order impulse–response sequences for the filters implemented in tsfilter are symmetric
and time-invariant.
In the frequency domain, the relationships for the true cyclical component and for its finite
estimate are, respectively,

$$ c(\omega) = B(\omega)y(\omega) $$

and

$$ \hat{c}(\omega) = \hat{B}(\omega)y(\omega) $$

where $B(\omega)$ and $\hat{B}(\omega)$ are the frequency transfer functions of the filters $B$ and $\hat{B}$.
The frequency transfer function for $B(\omega)$ can be expressed in polar form as

$$ B(\omega) = |B(\omega)|\exp\{i\theta(\omega)\} $$

where $|B(\omega)|$ is the filter's gain function and $\theta(\omega)$ is the filter's phase function. The gain function
determines whether the amplitude of the stochastic cycle is increased or decreased at a particular
frequency. The phase function determines how a cycle at a particular frequency is shifted forward or
backward in time.
In this form, it can be shown that the spectrum of the cyclical component, $f_c(\omega)$, is related to the
spectrum of the $y_t$ series by the squared gain:

$$ f_c(\omega) = |B(\omega)|^2 f_y(\omega) $$
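The role of the gain can be illustrated with a small numerical experiment, sketched here in Python rather than Stata. A symmetric filter applied to a pure cosine returns the same cosine multiplied by the filter's transfer function at that frequency, with no shift in time. The three-term moving average and the function names below are hypothetical choices made only for illustration.

```python
import math

def symmetric_filter(series, half_weights):
    """Apply the weights b_0, b_1, ..., b_q symmetrically around each observation.
    The first and last q observations cannot be filtered and are dropped."""
    q = len(half_weights) - 1
    filtered = []
    for t in range(q, len(series) - q):
        value = half_weights[0] * series[t]
        for j in range(1, q + 1):
            value += half_weights[j] * (series[t - j] + series[t + j])
        filtered.append(value)
    return filtered

def symmetric_gain(omega, half_weights):
    """Gain |B(omega)| of a symmetric filter: |b_0 + 2 * sum_j b_j cos(j omega)|."""
    transfer = half_weights[0] + sum(
        2.0 * b * math.cos(j * omega) for j, b in enumerate(half_weights) if j > 0
    )
    return abs(transfer)

half_weights = [1.0 / 3.0, 1.0 / 3.0]   # hypothetical 3-term moving average
omega = 2.0 * math.pi / 12.0            # a cycle with a period of 12 observations
series = [math.cos(omega * t) for t in range(48)]
filtered = symmetric_filter(series, half_weights)
scale = symmetric_gain(omega, half_weights)
print(scale)
```

At this frequency the transfer function of the moving average is positive, so each filtered value equals the gain times the original value at the same date; squaring that factor gives the ratio of the two spectra in the relation above.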
Each of the four filters in tsfilter has an option for returning an estimate of the gain function
together with its associated scaled frequency $a = \omega/\pi$, where $0 \le \omega \le \pi$. These are consistent
estimates of $|B(\omega)|$, the gain from the ideal linear filter.
The band-pass filters implemented in tsfilter, the BK and CF filters, use a square wave as the
ideal transfer function:

$$ B(\omega) = \begin{cases} 1 & \text{if } |\omega| \in [\omega_l, \omega_h] \\ 0 & \text{if } |\omega| \notin [\omega_l, \omega_h] \end{cases} $$

The high-pass filters, the Hodrick–Prescott and Butterworth filters, use a step function as the ideal
transfer function:

$$ B(\omega) = \begin{cases} 1 & \text{if } |\omega| \ge \omega_h \\ 0 & \text{if } |\omega| < \omega_h \end{cases} $$
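The two ideal transfer functions can be written down directly. The Python sketch below (not Stata code) expresses the cutoffs as periods, as the tsfilter options do, and converts them to angular frequencies with $\omega = 2\pi/p$; the function names are hypothetical.

```python
import math

def ideal_bandpass_gain(omega, min_period, max_period):
    """Square wave: 1 inside [omega_l, omega_h], 0 outside."""
    omega_l = 2.0 * math.pi / max_period   # longest period gives the lowest frequency
    omega_h = 2.0 * math.pi / min_period   # shortest period gives the highest frequency
    return 1.0 if omega_l <= abs(omega) <= omega_h else 0.0

def ideal_highpass_gain(omega, max_period):
    """Step function: 1 at or above the cutoff frequency, 0 below it."""
    cutoff = 2.0 * math.pi / max_period
    return 1.0 if abs(omega) >= cutoff else 0.0

for period in (64, 12, 3):
    omega = 2.0 * math.pi / period
    print(period, ideal_bandpass_gain(omega, 6, 32), ideal_highpass_gain(omega, 32))
```

A cycle of 3 periods is rejected by the band-pass filter but passed by the high-pass filter, which is the difference between the two shapes.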

Acknowledgments
We thank Christopher F. Baum of the Department of Economics at Boston College and author of the
Stata Press books An Introduction to Modern Econometrics Using Stata and An Introduction to Stata
Programming for his previous implementations of these filters: Baxter–King (bking), Christiano–Fitzgerald
(cfitzrw), Hodrick–Prescott (hprescott), and Butterworth (butterworth).
We also thank D. S. G. Pollock of the Department of Economics at the University of Leicester,
UK, for his helpful responses to our questions about Butterworth filters and the methods that he has
developed.

References
Baxter, M., and R. G. King. 1999. Measuring business cycles: Approximate band-pass filters for economic time series.
Review of Economics and Statistics 81: 575–593.
Bianchi, G., and R. Sorrentino. 2007. Electronic Filter Simulation and Design. New York: McGraw–Hill.
Burns, A. F., and W. C. Mitchell. 1946. Measuring Business Cycles. New York: National Bureau of Economic
Research.
Butterworth, S. 1930. On the theory of filter amplifiers. Experimental Wireless and the Wireless Engineer 7: 536–541.
Christiano, L. J., and T. J. Fitzgerald. 2003. The band pass filter. International Economic Review 44: 435–465.
Fuller, W. A. 1996. Introduction to Statistical Time Series. 2nd ed. New York: Wiley.
Gomez, V. 1999. Three equivalent methods for filtering finite nonstationary time series. Journal of Business and
Economic Statistics 17: 109–116.
Gomez, V. 2001. The use of Butterworth filters for trend and cycle estimation in economic time series. Journal of Business
and Economic Statistics 19: 365–373.
Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
Harvey, A. C., and A. Jaeger. 1993. Detrending, stylized facts and the business cycle. Journal of Applied Econometrics
8: 231–247.
Harvey, A. C., and T. M. Trimbur. 2003. General model-based filters for extracting cycles and trends in economic
time series. The Review of Economics and Statistics 85: 244–255.
Hodrick, R. J., and E. C. Prescott. 1997. Postwar U.S. business cycles: An empirical investigation. Journal of Money,
Credit, and Banking 29: 1–16.
King, R. G., and S. T. Rebelo. 1993. Low frequency filtering and real business cycles. Journal of Economic Dynamics
and Control 17: 207–231.
Leser, C. E. V. 1961. A simple method of trend construction. Journal of the Royal Statistical Society, Series B 23:
91–107.
Pollock, D. S. G. 1999. A Handbook of Time-Series Analysis, Signal Processing and Dynamics. London: Academic
Press.
Pollock, D. S. G. 2000. Trend estimation and de-trending via rational square-wave filters. Journal of Econometrics 99: 317–334.
Pollock, D. S. G. 2006. Econometric methods of signal extraction. Computational Statistics & Data Analysis 50: 2268–2292.
Priestley, M. B. 1981. Spectral Analysis and Time Series. London: Academic Press.
Ravn, M. O., and H. Uhlig. 2002. On adjusting the Hodrick–Prescott filter for the frequency of observations. Review
of Economics and Statistics 84: 371–376.
Schmidt, T. J. 1994. sts5: Detrending with the Hodrick–Prescott filter. Stata Technical Bulletin 17: 22–24. Reprinted
in Stata Technical Bulletin Reprints, vol. 3, pp. 216–219. College Station, TX: Stata Press.
Wei, W. W. S. 2006. Time Series Analysis: Univariate and Multivariate Methods. 2nd ed. Boston: Pearson.

Also see
[TS] tsset Declare data to be time-series data
[XT] xtset Declare data to be panel data
[TS] tssmooth Smooth and forecast univariate time-series data

Title
tsfilter bk   Baxter–King time-series filter

Syntax
Menu
Description
Options
Remarks and examples
Stored results
Methods and formulas
References
Also see

Syntax
Filter one variable

    tsfilter bk [type] newvar = varname [if] [in] [, options]

Filter multiple variables, unique names

    tsfilter bk [type] newvarlist = varlist [if] [in] [, options]

Filter multiple variables, common name stub

    tsfilter bk [type] stub* = varlist [if] [in] [, options]

options                               Description
Main
  minperiod(#)                        filter out stochastic cycles at periods smaller than #
  maxperiod(#)                        filter out stochastic cycles at periods larger than #
  smaorder(#)                         number of observations in each direction that contribute to
                                        each filtered value
  stationary                          use calculations for a stationary time series
Trend
  trend(newvar | newvarlist | stub*)  save the trend component(s) in new variable(s)
Gain
  gain(gainvar anglevar)              save the gain and angular frequency

You must tsset or xtset your data before using tsfilter; see [TS] tsset and [XT] xtset.
varname and varlist may contain time-series operators; see [U] 11.4.4 Time-series varlists.

Menu
Statistics > Time series > Filters for cyclical components > Baxter-King

Description
tsfilter bk uses the Baxter and King (1999) band-pass filter to separate a time series into trend
and cyclical components. The trend component may contain a deterministic or a stochastic trend. The
stationary cyclical component is driven by stochastic cycles at the specified periods.
See [TS] tsfilter for an introduction to the methods implemented in tsfilter bk.

Options


Main

minperiod(#) filters out stochastic cycles at periods smaller than #, where # must be at least 2
and less than maxperiod(). By default, if the units of the time variable are set to daily, weekly,
monthly, quarterly, or half-yearly, then # is set to the number of periods equivalent to 1.5 years;
yearly data use minperiod(2); otherwise, the default value is minperiod(6).
maxperiod(#) filters out stochastic cycles at periods larger than #, where # must be greater than
minperiod(). By default, if the units of the time variable are set to daily, weekly, monthly,
quarterly, half-yearly, or yearly, then # is set to the number of periods equivalent to 8 years;
otherwise, the default value is maxperiod(32).
smaorder(#) sets the order of the symmetric moving average, denoted by $q$. The order is an integer
that specifies the number of observations in each direction used in calculating the symmetric
moving average estimate of the cyclical component. This number must be an integer greater than
zero and less than $(T - 1)/2$. The estimate for the cyclical component for the $t$th observation, $y_t$,
is based upon the $2q + 1$ values $y_{t-q}, y_{t-q+1}, \ldots, y_t, y_{t+1}, \ldots, y_{t+q}$. By default, if the units
of the time variable are set to daily, weekly, monthly, quarterly, half-yearly, or yearly, then # is
set to the equivalent of 3 years; otherwise, the default value is smaorder(12).
stationary modifies the filter calculations to those appropriate for a stationary series. By default,
the series is assumed nonstationary.

Trend

trend(newvar | newvarlist | stub*) saves the trend component(s) in the new variable(s) specified by
newvar, newvarlist, or stub*.

Gain

gain(gainvar anglevar) saves the gain in gainvar and its associated angular frequency in anglevar.
Gains are calculated at the $N$ angular frequencies that uniformly partition the interval $(0, \pi]$, where
$N$ is the sample size.

Remarks and examples


We assume that you have already read [TS] tsfilter, which provides an introduction to filtering and
the methods implemented in tsfilter bk, more examples using tsfilter bk, and a comparison
of the four filters implemented by tsfilter. In particular, an understanding of gain functions as
presented in [TS] tsfilter is required to understand these remarks.
tsfilter bk uses the Baxter–King (BK) band-pass filter to separate a time series $y_t$ into trend
and cyclical components:

$$ y_t = \tau_t + c_t $$

where $\tau_t$ is the trend component and $c_t$ is the cyclical component. $\tau_t$ may be nonstationary; it may
contain a deterministic or a stochastic trend, as discussed below.
The primary objective is to estimate $c_t$, a stationary cyclical component that is driven by stochastic
cycles within a specified range of periods. The trend component $\tau_t$ is calculated by the difference
$\tau_t = y_t - c_t$.
Although the BK band-pass filter implemented in tsfilter bk has been widely applied by
macroeconomists, it is a general time-series method and may be of interest to other researchers.


Symmetric moving-average (SMA) filters with coefficients that sum to zero remove stochastic and
deterministic trends of first and second order; see Fuller (1996), Baxter and King (1995), and Baxter
and King (1999).
For an infinitely long series, there is an ideal band-pass filter for which the gain function is 1 for
$\omega \in [\omega_0, \omega_1]$ and 0 for all other frequencies; see [TS] tsfilter for an introduction to gain functions.
It just so happens that this ideal band-pass filter is an SMA filter with coefficients that sum to zero.
Baxter and King (1999) derive the coefficients of this ideal band-pass filter and then define the BK
filter to be the SMA filter with $2q + 1$ terms that are as close as possible to those of the ideal filter.
There is a trade-off in choosing $q$: larger values of $q$ cause the gain of the BK filter to be closer to
the gain of the ideal filter, but they also increase the number of missing observations in the filtered
series.
The smaorder() option specifies $q$. The default value of smaorder() is the number of periods
equivalent to 3 years, following the Baxter and King (1999) recommendation.
Although the mathematics of the frequency-domain approach to time-series analysis is in terms of
stochastic cycles at frequencies $\omega \in [-\pi, \pi]$, applied work is generally in terms of periods $p$, where
$p = 2\pi/\omega$. So tsfilter bk has the minperiod() and maxperiod() options to specify the desired
range of stochastic cycles.
Among economists, the BK filter is commonly used for investigating business cycles. Burns and
Mitchell (1946) defined business cycles as stochastic cycles in business data corresponding to periods
between 1.5 and 8 years. The default values for minperiod() and maxperiod() are the Burns–Mitchell
values of 1.5 and 8 years, scaled to the frequency of the dataset. The calculations of
the default values assume that the time variable is formatted as daily, weekly, monthly, quarterly,
half-yearly, or yearly; see [D] format.
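The scaling of the defaults can be sketched as follows, in Python rather than Stata. The sketch applies the rules stated in the Options section (1.5 years for minperiod() with a floor of 2, 8 years for maxperiod(), and 3 years for smaorder()) to the units whose number of periods per year is unambiguous. The function name and the dictionary are hypothetical, and daily and weekly data are left out because the text does not state how those units are converted.

```python
# Periods per year for the units whose conversion is unambiguous.
PERIODS_PER_YEAR = {"monthly": 12, "quarterly": 4, "halfyearly": 2, "yearly": 1}

def bk_default_parameters(unit):
    """Return (minperiod, maxperiod, smaorder) implied by the 1.5-, 8-, and 3-year rules."""
    per_year = PERIODS_PER_YEAR[unit]
    min_period = max(2, int(1.5 * per_year))  # yearly data use minperiod(2)
    max_period = 8 * per_year
    sma_order = 3 * per_year
    return min_period, max_period, sma_order

for unit in PERIODS_PER_YEAR:
    print(unit, bk_default_parameters(unit))
```

For quarterly data this reproduces minperiod(6), maxperiod(32), and smaorder(12).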
For each variable, the band-pass BK filter estimate of $c_t$ is put in the corresponding new variable,
and when the trend() option is specified, the estimate of $\tau_t$ is put in the corresponding new variable.
tsfilter bk automatically detects panel data from the information provided when the dataset was
tsset or xtset. All calculations are done separately on each panel. Missing values at the beginning
and end of the sample are excluded from the sample. The sample may not contain gaps.
Baxter and King (1999) derived their method for nonstationary time series, but they noted that
a small modification makes it applicable to stationary time series. Imposing the condition that the
filter coefficients sum to zero is what makes their method applicable to nonstationary time series;
dropping this condition yields a filter for stationary time series. Specifying the stationary option
causes tsfilter bk to use coefficients calculated without the constraint that they sum to zero.
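The construction described above can be sketched in a few lines of Python (not Stata code). The sketch assumes the standard Baxter and King (1999) recipe: truncate the ideal band-pass coefficients, $b_0 = (\omega_h - \omega_l)/\pi$ and $b_j = \{\sin(j\omega_h) - \sin(j\omega_l)\}/(\pi j)$, at lag $q$ and, for nonstationary series, subtract their average so that the $2q + 1$ weights sum to zero. It is an illustration under that assumption, not tsfilter bk's source code, and the function name is hypothetical.

```python
import math

def bk_weights(min_period, max_period, q, stationary=False):
    """Return the 2q + 1 filter weights b_{-q}, ..., b_0, ..., b_q."""
    omega_l = 2.0 * math.pi / max_period
    omega_h = 2.0 * math.pi / min_period
    # Truncated coefficients of the ideal band-pass filter for lags 0, 1, ..., q.
    ideal = [(omega_h - omega_l) / math.pi]
    for j in range(1, q + 1):
        ideal.append((math.sin(j * omega_h) - math.sin(j * omega_l)) / (math.pi * j))
    # Mirror lags q, ..., 1 in front of lags 0, ..., q to get the symmetric sequence.
    weights = ideal[:0:-1] + ideal
    if not stationary:
        # Impose the sum-to-zero constraint by removing the average weight.
        adjustment = sum(weights) / (2 * q + 1)
        weights = [b - adjustment for b in weights]
    return weights

w12 = bk_weights(6, 32, 12)
w20 = bk_weights(6, 32, 20)
print(len(w12), sum(w12))
print(len(w20), sum(w20))
```

Each filtered value needs $q$ observations on either side, so $2q$ observations are lost in total; moving from $q = 12$ to $q = 20$ costs $2(20 - 12) = 16$ additional observations.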

Example 1: Estimating a business-cycle component


In this and the subsequent examples, we use tsfilter bk to estimate the business-cycle component
of the natural log of real gross domestic product (GDP) of the United States. Our sample of quarterly
data goes from 1952q1 to 2010q4. Below we read in and plot the data.

. use http://www.stata-press.com/data/r13/gdp2
(Federal Reserve Economic Data, St. Louis Fed)
. tsline gdp_ln

[Graph: natural log of real GDP plotted against the quarterly time variable]

The series is nonstationary and is thus a candidate for the BK filter.


Below we use tsfilter bk to filter gdp_ln, and we use pergram (see [TS] pergram) to compute
and to plot the periodogram of the estimated cyclical component.

. tsfilter bk gdp_bk = gdp_ln
. pergram gdp_bk, xline(.03125 .16667)

[Graph: log periodogram (sample spectral density function) of the gdp_ln cyclical component from the bk filter, plotted against frequency and evaluated at the natural frequencies]

Because our sample is of quarterly data, tsfilter bk used the default values of minperiod(6),
maxperiod(32), and smaorder(12). The minimum and maximum periods are the Burns and
Mitchell (1946) business-cycle periods for quarterly data. The default of smaorder(12) was
recommended by Baxter and King (1999) for quarterly data.
In the periodogram, we added vertical lines at the natural frequencies corresponding to the
conventional Burns and Mitchell (1946) values for business-cycle components. pergram displays the
results in natural frequencies, which are the standard frequencies divided by $2\pi$. We use the xline()
option to draw vertical lines at the lower natural-frequency cutoff (1/32 = 0.03125) and the upper
natural-frequency cutoff (1/6 ≈ 0.16667).
If the filter completely removed the stochastic cycles at the unwanted frequencies, the periodogram
would be a flat line at the minimum value of -6 outside the range identified by the vertical lines.
The periodogram reveals that the default value of smaorder(12) did not do a good job of filtering
out the high-periodicity stochastic cycles, because there are too many points above -6.00 to the left
of the left-hand vertical line. It also reveals that the filter did not remove enough low-periodicity
stochastic cycles, because there are too many points above -6.00 to the right of the right-hand vertical
line.
We address these problems in the next example.

Example 2: Changing the order of the filter


In this example, we change the order of the symmetric moving average of the filter via the smaorder()
option so that it will remove more of the unwanted stochastic cycles. As mentioned, larger values of $q$ cause
the gain of the BK filter to be closer to the gain of the ideal filter, but larger values also increase the
number of missing observations in the filtered series.
In the output below, we estimate the business-cycle component and compute the gain functions
when the SMA-order of the filter is 12 and when it is 20. We also generate ideal, the gain function
of the ideal band-pass filter at the frequencies f. Then we plot the gain functions from all three filters.
. tsfilter bk gdp_bk12 = gdp_ln, gain(g12 a12)
. label variable g12 "BK SMA-order 12"
. tsfilter bk gdp_bk20 = gdp_ln, gain(g20 a20) smaorder(20)
. label variable g20 "BK SMA-order 20"
. generate f = _pi*(_n-1)/_N
. generate ideal = cond(f<_pi/16, 0, cond(f<_pi/3, 1,0))
. label variable ideal "Ideal filter"

. twoway line ideal f || line g12 a12 || line g20 a20

[Graph: gain functions of the ideal filter and the BK filters of SMA-order 12 and SMA-order 20]


As discussed in [TS] tsfilter, the gain function of the ideal filter is a square wave with a value of 0
at the unwanted frequencies and a value of 1 at the desired frequencies.
The vertical lines in the gain function of the ideal filter occur at the frequencies $\pi/16$, corresponding
to 32 periods, and at $\pi/3$, corresponding to 6 periods. (Given that $p = 2\pi/\omega$, where $p$ is the period
corresponding to the frequency $\omega$, the frequency is given by $2\pi/p$.)
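The conversions used in these examples are simple enough to check by hand, and the short Python sketch below (not Stata code) does so. It computes the angular frequency $2\pi/p$ used on the horizontal axis of the gain graphs and the natural frequency $1/p$ used by pergram; the function names are hypothetical.

```python
import math

def angular_frequency(period):
    """Angular frequency, in radians per observation, of a cycle with the given period."""
    return 2.0 * math.pi / period

def natural_frequency(period):
    """Natural frequency (angular frequency divided by 2 pi), in cycles per observation."""
    return 1.0 / period

for period in (32, 6):
    print(period, angular_frequency(period), natural_frequency(period))
```

The 32-period and 6-period cutoffs map to $\pi/16$ and $\pi/3$ on the gain graphs and to the xline() values .03125 and .16667 on the periodograms.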
The differences between the gain function of the filter with SMA-order 12 and the gain function
of the ideal band-pass filter are the root of the issues mentioned at the end of example 1. The gain
function of the filter with SMA-order 20 is closer to the gain function of the ideal band-pass filter,
at the cost of 16 more missing values in the filtered series.
Below we compute and graph the periodogram of the series filtered with SMA-order 20.

. pergram gdp_bk20, xline(.03125 .16667)

[Graph omitted: log periodogram (sample spectral density function) of gdp_ln cyclical component from bk filter, plotted against frequency and evaluated at the natural frequencies; vertical axis from -6.00 to 6.00]

The above periodogram indicates that the filter of SMA-order 20 removed more of the stochastic cycles
at the unwanted periodicities than did the filter of SMA-order 12. Whether removing the stochastic
cycles at the unwanted periodicities is worth losing more observations in the filtered series is a
judgment call.

Below we plot the estimated business-cycle component with recessions identified by the shaded
areas.

[Graph omitted: gdp_ln cyclical component from bk filter plotted against the quarterly time variable, 1957q3 to 2005q3, with recessions shaded]

Stored results
tsfilter bk stores the following in r():
Scalars
  r(smaorder)      order of the symmetric moving average
  r(minperiod)     minimum period of stochastic cycles
  r(maxperiod)     maximum period of stochastic cycles
Macros
  r(varlist)       original time-series variables
  r(filterlist)    variables containing estimates of the cyclical components
  r(trendlist)     variables containing estimates of the trend components, if trend() was specified
  r(method)        Baxter-King
  r(stationary)    yes or no, indicating whether the calculations assumed the series was or was not stationary
  r(unit)          units of time variable set using tsset or xtset
Matrices
  r(filter)        (q+1)×1 matrix of filter weights, where q is the order of the symmetric moving average

Methods and formulas


Baxter and King (1999) showed that there is an infinite-order SMA filter with coefficients that sum
to zero that can extract the specified components from a nonstationary time series. The components
are specified in terms of the minimum and maximum periods of the stochastic cycles that drive these
components in the frequency domain. This ideal filter is not feasible, because the constraints imposed
on the filter can only be satisfied using an infinite number of coefficients, so Baxter and King (1999)
derived a finite approximation to this ideal filter.
The infinite-order, ideal band-pass filter obtains the cyclical component with the calculation

$$c_t = \sum_{j=-\infty}^{\infty} b_j y_{t-j}$$

Letting $p_l$ and $p_h$ be the minimum and maximum periods of the stochastic cycles of interest, the
weights $b_j$ in this calculation are given by

$$b_j = \begin{cases} \pi^{-1}(\omega_h - \omega_l) & \text{if } j = 0 \\ (j\pi)^{-1}\{\sin(j\omega_h) - \sin(j\omega_l)\} & \text{if } j \neq 0 \end{cases}$$

where $\omega_l = 2\pi/p_h$ and $\omega_h = 2\pi/p_l$ are the lower and higher cutoff frequencies, respectively.
For the default case of nonstationary time series with finite length, the ideal band-pass filter cannot
be used without modification. Baxter and King (1999) derived modified weights for a finite order
SMA filter with coefficients that sum to zero.
As a result, Baxter and King (1999) estimate ct by

$$c_t = \sum_{j=-q}^{+q} \hat{b}_j y_{t-j}$$

The coefficients $\hat{b}_j$ in this calculation are equal to $\hat{b}_j = b_j - \bar{b}_q$, where $\hat{b}_{-j} = \hat{b}_j$ and $\bar{b}_q$ is the mean
of the ideal coefficients truncated at $\pm q$:

$$\bar{b}_q = (2q+1)^{-1} \sum_{j=-q}^{q} b_j$$

Note that $\sum_{j=-q}^{+q} \hat{b}_j = 0$ and that the first and last $q$ values of the cyclical component cannot be
estimated using this filter.
If the stationary option is set, the BK filter sets the coefficients to the ideal coefficients, that is,
$\hat{b}_j = b_j$. For these weights, $\hat{b}_j = \hat{b}_{-j}$, and although $\sum_{j=-\infty}^{\infty} \hat{b}_j = 0$, for small $q$,
$\sum_{j=-q}^{q} \hat{b}_j \neq 0$.
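
The weight calculation for the default (nonstationary) case can be sketched in Python as follows. This is a minimal illustration and not the code used by tsfilter bk: the function names are hypothetical, only the weights for j = 0, ..., q are stored because the weights are symmetric, and the lower cutoff frequency is taken from the maximum period and the higher cutoff frequency from the minimum period.

```python
import math

def ideal_weights(q, min_period, max_period):
    """Ideal band-pass weights b_0, ..., b_q (b_{-j} = b_j by symmetry)."""
    w_low = 2 * math.pi / max_period    # lower cutoff frequency
    w_high = 2 * math.pi / min_period   # higher cutoff frequency
    weights = [(w_high - w_low) / math.pi]
    for j in range(1, q + 1):
        weights.append((math.sin(j * w_high) - math.sin(j * w_low)) / (j * math.pi))
    return weights

def bk_weights(q, min_period=6, max_period=32):
    """Truncated ideal weights minus their mean over j = -q, ..., q."""
    b = ideal_weights(q, min_period, max_period)
    mean = (b[0] + 2 * sum(b[1:])) / (2 * q + 1)
    return [b_j - mean for b_j in b]

def bk_filter(y, q, min_period=6, max_period=32):
    """Cyclical component; the first and last q entries are None."""
    w = bk_weights(q, min_period, max_period)
    n = len(y)
    out = [None] * n
    for t in range(q, n - q):
        out[t] = w[0] * y[t] + sum(w[j] * (y[t - j] + y[t + j]) for j in range(1, q + 1))
    return out

weights = bk_weights(12)
print(abs(weights[0] + 2 * sum(weights[1:])) < 1e-12)  # do the weights sum to zero?
```

Because the adjusted weights are symmetric and sum to zero, applying bk_filter() to a purely linear series returns values that are zero up to rounding error, which is one way to check an implementation.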

References
Baxter, M., and R. G. King. 1995. Measuring business cycles: Approximate band-pass filters for economic time series.
NBER Working Paper No. 5022, National Bureau of Economic Research. http://www.nber.org/papers/w5022.
Baxter, M., and R. G. King. 1999. Measuring business cycles: Approximate band-pass filters for economic time series.
Review of Economics and Statistics 81: 575-593.
Burns, A. F., and W. C. Mitchell. 1946. Measuring Business Cycles. New York: National Bureau of Economic
Research.
Fuller, W. A. 1996. Introduction to Statistical Time Series. 2nd ed. New York: Wiley.
Pollock, D. S. G. 1999. A Handbook of Time-Series Analysis, Signal Processing and Dynamics. London: Academic
Press.
Pollock, D. S. G. 2006. Econometric methods of signal extraction. Computational Statistics & Data Analysis 50: 2268-2292.

Also see
[TS] tsset - Declare data to be time-series data
[XT] xtset - Declare data to be panel data
[TS] tsfilter - Filter a time-series, keeping only selected periodicities
[D] format - Set variables' output format
[TS] tssmooth - Smooth and forecast univariate time-series data

Title
tsfilter bw - Butterworth time-series filter

Syntax
Menu
Description
Options
Remarks and examples
Stored results
Methods and formulas
References
Also see

Syntax
Filter one variable
    tsfilter bw [type] newvar = varname [if] [in] [, options]

Filter multiple variables, unique names
    tsfilter bw [type] newvarlist = varlist [if] [in] [, options]

Filter multiple variables, common name stub
    tsfilter bw [type] stub* = varlist [if] [in] [, options]

options                           Description
Main
  maxperiod(#)                    filter out stochastic cycles at periods larger than #
  order(#)                        set the order of the filter; default is order(2)
Trend
  trend(newvar|newvarlist|stub*)  save the trend component(s) in new variable(s)
Gain
  gain(gainvar anglevar)          save the gain and angular frequency

You must tsset or xtset your data before using tsfilter; see [TS] tsset and [XT] xtset.
varname and varlist may contain time-series operators; see [U] 11.4.4 Time-series varlists.

Menu
Statistics > Time series > Filters for cyclical components > Butterworth

Description
tsfilter bw uses the Butterworth high-pass filter to separate a time series into trend and cyclical
components. The trend component may contain a deterministic or a stochastic trend. The stationary
cyclical component is driven by stochastic cycles at the specified periods.
See [TS] tsfilter for an introduction to the methods implemented in tsfilter bw.


Options


Main

maxperiod(#) filters out stochastic cycles at periods larger than #, where # must be greater than 2.
By default, if the units of the time variable are set to daily, weekly, monthly, quarterly, half-yearly,
or yearly, then # is set to the number of periods equivalent to 8 years; otherwise, the default value
is maxperiod(32).
order(#) sets the order of the Butterworth filter, which must be an integer. The default is order(2).

Trend

trend(newvar | newvarlist | stub*) saves the trend component(s) in the new variable(s) specified by
newvar, newvarlist, or stub*.

Gain

gain(gainvar anglevar) saves the gain in gainvar and its associated angular frequency in anglevar.
Gains are calculated at the N angular frequencies that uniformly partition the interval $(0, \pi]$, where
N is the sample size.

Remarks and examples


We assume that you have already read [TS] tsfilter, which provides an introduction to filtering and
the methods implemented in tsfilter bw, more examples using tsfilter bw, and a comparison
of the four filters implemented by tsfilter. In particular, an understanding of gain functions as
presented in [TS] tsfilter is required to understand these remarks.
tsfilter bw uses the Butterworth high-pass filter to separate a time series $y_t$ into trend and
cyclical components:

$$y_t = \tau_t + c_t$$

where $\tau_t$ is the trend component and $c_t$ is the cyclical component. $\tau_t$ may be nonstationary; it may
contain a deterministic or a stochastic trend, as discussed below.
The primary objective is to estimate $c_t$, a stationary cyclical component that is driven by stochastic
cycles within a specified range of periods. The trend component $\tau_t$ is calculated by the difference
$\tau_t = y_t - c_t$.
Although the Butterworth high-pass filter implemented in tsfilter bw has been widely applied
by macroeconomists and engineers, it is a general time-series method and may be of interest to other
researchers.
Engineers have used Butterworth filters for a long time because they are maximally flat. The
gain functions of these filters are as close as possible to being a flat line at 0 for the unwanted periods
and a flat line at 1 for the desired periods; see Butterworth (1930) and Bianchi and Sorrentino (2007,
17-20). (See [TS] tsfilter for an introduction to gain functions.)
The high-pass Butterworth filter is a two-parameter filter. The maxperiod() option specifies the
maximum period; the stochastic cycles of all higher periodicities are filtered out. The maxperiod()
option sets the location of the cutoff period in the gain function. The order() option specifies the
order of the filter, which determines the slope of the gain function at the cutoff frequency.
For a given cutoff period, the slope of the gain function at the cutoff period increases with filter
order. For a given filter order, the slope of the gain function at the cutoff period increases with the
cutoff period.
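
Both statements about the slope can be checked numerically. The Python sketch below is an illustration only: it uses the gain function of the Butterworth high-pass filter given in Methods and formulas, the function names are ours, and the slope is taken with respect to angular frequency and approximated by a central finite difference.

```python
import math

def bw_highpass_gain(omega, max_period, order):
    """Gain of the Butterworth high-pass filter at angular frequency omega."""
    cutoff = 2 * math.pi / max_period
    ratio = math.tan(cutoff / 2) / math.tan(omega / 2)
    return 1 / (1 + ratio ** (2 * order))

def slope_at_cutoff(max_period, order, step=1e-6):
    """Central-difference slope of the gain at the cutoff frequency."""
    cutoff = 2 * math.pi / max_period
    upper = bw_highpass_gain(cutoff + step, max_period, order)
    lower = bw_highpass_gain(cutoff - step, max_period, order)
    return (upper - lower) / (2 * step)

for period, order in [(32, 2), (32, 8), (6, 2)]:
    print(period, order, round(slope_at_cutoff(period, order), 3))
```

Comparing the first two rows shows the effect of raising the order at a cutoff period of 32; comparing the first and third rows shows the effect of shortening the cutoff period at order 2.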


We cannot obtain a vertical slope at the cutoff frequency, which is the ideal, because the computation
becomes unstable; see Pollock (2000). The filter order for which the computation becomes unstable
depends on the cutoff period.
Among economists, the high-pass Butterworth filter is commonly used for investigating business
cycles. Burns and Mitchell (1946) defined business cycles as stochastic cycles in business data
corresponding to periods between 1.5 and 8 years. For this reason, the default value for maxperiod()
is the number of periods in 8 years, if the time variable is formatted as daily, weekly, monthly,
quarterly, half-yearly, or yearly; see [D] format. The default value for maxperiod() is 32 for all
other time formats.
For each variable, the high-pass Butterworth filter estimate of $c_t$ is put in the corresponding new
variable, and when the trend() option is specified, the estimate of $\tau_t$ is put in the corresponding
new variable.
tsfilter bw automatically detects panel data from the information provided when the dataset was
tsset or xtset. All calculations are done separately on each panel. Missing values at the beginning
and end of the sample are excluded from the sample. The sample may not contain gaps.

Example 1: Estimating a business-cycle component


In this and the subsequent examples, we use tsfilter bw to estimate the business-cycle component
of the natural log of the real gross domestic product (GDP) of the United States. Our sample of quarterly
data goes from 1952q1 to 2010q4. Below we read in and plot the data.

. use http://www.stata-press.com/data/r13/gdp2
(Federal Reserve Economic Data, St. Louis Fed)
. tsline gdp_ln

[Graph omitted: natural log of real GDP plotted against the quarterly time variable, 1950q1 to 2010q1]

The series is nonstationary. Pollock (2000) shows that the high-pass Butterworth filter can estimate
the components driven by the stochastic cycles at the specified frequencies when the original series
is nonstationary.
Below we use tsfilter bw to filter gdp_ln and use pergram (see [TS] pergram) to compute
and to plot the periodogram of the estimated cyclical component.
. tsfilter bw gdp_bw = gdp_ln
. pergram gdp_bw, xline(.03125 .16667)

[Graph omitted: log periodogram (sample spectral density function) of gdp_ln cyclical component from bw filter, plotted against frequency and evaluated at the natural frequencies; vertical axis from -6.00 to 6.00]

tsfilter bw used the default value of maxperiod(32) because our sample is of quarterly data. In
the periodogram, we added vertical lines at the natural frequencies corresponding to the conventional
Burns and Mitchell (1946) values for business-cycle components. pergram displays the results in
natural frequencies, which are the standard frequencies divided by $2\pi$. We use option xline() to draw
vertical lines at the lower natural-frequency cutoff (1/32 = 0.03125) and the upper natural-frequency
cutoff (1/6 ≈ 0.16667).
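The two xline() values follow directly from the conversion between periods and natural frequencies. The short Python sketch below (the function name is ours) makes the arithmetic explicit for periods of 32 and 6 quarters.

```python
import math

def natural_frequency(period):
    """Cycles per observation: the angular frequency 2*pi/period divided by 2*pi."""
    return (2 * math.pi / period) / (2 * math.pi)

lower = natural_frequency(32)  # 8 years of quarterly data
upper = natural_frequency(6)   # 1.5 years of quarterly data
print(f"{lower:.5f} {upper:.5f}")
```

Rounded to five decimal places, these are the values .03125 and .16667 passed to xline().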
If the filter completely removed the stochastic cycles at the unwanted frequencies, the periodogram
would be a flat line at the minimum value of -6 outside the range identified by the vertical lines.
The periodogram reveals two issues. First, it indicates that the default value of order(2) did not
do a good job of filtering out the high-periodicity stochastic cycles, because there are too many points
above -6.00 to the left of the left-hand vertical line. Second, it reveals the high-pass nature of the
filter, because none of the low-period (high-frequency) stochastic cycles have been filtered out.
We cope with these two issues in the remaining examples.

Example 2: Changing the order of the filter


In this example, we change the order of the filter so that it will remove more of the unwanted
low-frequency stochastic cycles. As previously mentioned, increasing the order of the filter increases
the slope of the gain function at the cutoff period.
For orders 2 and 8, we compute the filtered series, compute the gain functions, and label the gain
variables. We also generate ideal, the gain function of the ideal band-pass filter at the frequencies
f. Then we plot the gain function of the ideal band-pass filter and the gain functions of the high-pass
Butterworth filters of orders 2 and 8.
. tsfilter bw gdp_bw2 = gdp_ln, gain(g1 a1)
. label variable g1 "BW order 2"
. tsfilter bw gdp_bw8 = gdp_ln, gain(g8 a8) order(8)
. label variable g8 "BW order 8"
. generate f = _pi*(_n-1)/_N
. generate ideal = cond(f<_pi/16, 0, cond(f<_pi/3, 1,0))
. label variable ideal "Ideal filter"
. twoway line ideal f || line g1 a1 || line g8 a8

[Graph omitted: gain functions of the ideal filter and of the Butterworth filters of order 2 and order 8]
As discussed in [TS] tsfilter, the gain function of the ideal filter is a square wave with a value of 0
at the frequencies corresponding to unwanted frequencies and a value of 1 at the desired frequencies.
The vertical lines in the gain function of the ideal filter occur at $\pi/16$, corresponding to 32 periods,
and at $\pi/3$, corresponding to 6 periods. (Given that $p = 2\pi/\omega$, where $p$ is the period corresponding
to frequency $\omega$, the frequency is given by $\omega = 2\pi/p$.)
The distance between the gain function of the filter with order 2 and the gain function of the ideal
band-pass filter at $\pi/16$ is the root of the first issue mentioned at the end of example 1. The gain function
of the filter with order 8 is much closer to that of the ideal band-pass filter at $\pi/16$ than is the gain function
of the filter with order 2. That both gain functions are 1 to the right of the vertical line at $\pi/3$ reveals the
high-pass nature of the filter.

Example 3: Removing the high-frequency component


In this example, we use a common trick to resolve the second issue mentioned at the end of
example 1. Keeping the trend produced by a high-pass filter turns that high-pass filter into a low-pass
filter. Because we want to remove the high-frequency stochastic cycles still in the previously filtered
series gdp_bw8, we need to run gdp_bw8 through a low-pass filter. So we keep the trend produced
by refiltering the previously filtered series.
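The reasoning behind this trick can be sketched with gain functions in Python. The sketch rests on two simplifying assumptions of ours: the trend left by a high-pass filter with gain psi has gain 1 - psi, and applying two filters in sequence multiplies their gains. The function names are hypothetical, and the orders and cutoff periods are the ones used in these examples (order 8 at 32 periods, order 15 at 6 periods).

```python
import math

def bw_highpass_gain(omega, max_period, order):
    """Gain of the Butterworth high-pass filter (see Methods and formulas)."""
    cutoff = 2 * math.pi / max_period
    ratio = math.tan(cutoff / 2) / math.tan(omega / 2)
    return 1 / (1 + ratio ** (2 * order))

def bw_lowpass_gain(omega, max_period, order):
    """Gain of the trend kept from the high-pass filter: 1 minus the high-pass gain."""
    return 1 - bw_highpass_gain(omega, max_period, order)

def two_step_gain(omega):
    """High-pass at 32 periods (order 8), then low-pass at 6 periods (order 15)."""
    return bw_highpass_gain(omega, 32, 8) * bw_lowpass_gain(omega, 6, 15)

for period in (64, 12, 3):
    print(period, two_step_gain(2 * math.pi / period))
```

Under these assumptions, the combined gain is close to 0 for a period of 64, close to 1 for a period of 12, and close to 0 for a period of 3, which is the band-pass behavior we are after.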
To determine an order for the filter, we run the filter with order(8), then with order(15), and
then we plot the gain functions along with the gain function of the ideal filter.
. tsfilter bw gdp_bwn8 = gdp_bw8, gain(gc8 ac8) order(8)
> maxperiod(6) trend(gdp_bwc8)
. label variable gc8 "BW order 8"
. tsfilter bw gdp_bwn15 = gdp_bw8, gain(gc15 ac15) order(15)
> maxperiod(6) trend(gdp_bwc15)
. label variable gc15 "BW order 15"
. twoway line ideal f || line gc8 ac8 || line gc15 ac15

[Graph omitted: gain functions of the ideal filter and of the Butterworth filters of order 8 and order 15]

We specified much higher orders for the filter in this example because the cutoff period is 6 instead
of 32. (As previously mentioned, holding the order of the filter constant, the slope of the gain function
at the cutoff period decreases when the period decreases.) The above graph indicates that the filter
with order(15) is reasonably close to the gain function of the ideal filter.
Now we compute and plot the periodogram of the estimated business-cycle component.

. pergram gdp_bwc15, xline(.03125 .16667)

[Graph omitted: log periodogram (sample spectral density function) of gdp_bw8 trend component from bw filter, plotted against frequency and evaluated at the natural frequencies; vertical axis from -6.00 to 6.00]

The graph indicates that the above applications of the Butterworth filter did a reasonable job of
filtering out the high-periodicity stochastic cycles but that the low-periodicity stochastic cycles have
not been completely removed.

Below we plot the estimated business-cycle component with recessions identified by the shaded
areas.

[Graph omitted: gdp_bw8 trend component from bw filter plotted against the quarterly time variable, 1950q1 to 2010q1, with recessions shaded]

Stored results
tsfilter bw stores the following in r():
Scalars
  r(order)         order of the filter
  r(maxperiod)     maximum period of stochastic cycles
Macros
  r(varlist)       original time-series variables
  r(filterlist)    variables containing estimates of the cyclical components
  r(trendlist)     variables containing estimates of the trend components, if trend() was specified
  r(method)        Butterworth
  r(unit)          units of time variable set using tsset or xtset

Methods and formulas


tsfilter bw uses the computational methods described in Pollock (2000) to implement the filter.
Pollock (2000) shows that the gain of the Butterworth high-pass filter is given by

$$\psi(\omega) = \left[ 1 + \left\{ \frac{\tan(\omega_c/2)}{\tan(\omega/2)} \right\}^{2m} \right]^{-1}$$

where $m$ is the order of the filter, $\omega_c = 2\pi/p_h$ is the cutoff frequency, and $p_h$ is the maximum
period.
Here is an outline of the computational procedure that Pollock (2000) derived.
Pollock (2000) showed that the Butterworth filter corresponds to a particular model. Actually, his
model is more general than the Butterworth filter, but tsfilter bw restricts the computations to the
case in which the model corresponds to the Butterworth filter.

The model represents the series to be filtered, $y_t$, in terms of zero mean, covariance stationary,
and independent and identically distributed shocks $\nu_t$ and $\varepsilon_t$:

$$y_t = \frac{(1+L)^m}{(1-L)^m}\,\nu_t + \varepsilon_t$$

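From this model, Pollock (2000) shows that the optimal estimate for the cyclical component is
given by

$$\mathbf{c} = \lambda Q(\Omega_L + \lambda\Omega_H)^{-1} Q'\mathbf{y}$$

where $\text{Var}\{Q'(\mathbf{y} - \mathbf{c})\} = \sigma_\nu^2 \Omega_L$ and $\text{Var}\{Q'\mathbf{c}\} = \sigma_\varepsilon^2 \Omega_H$. Here $\Omega_L$ and $\Omega_H$ are symmetric Toeplitz
matrices with $2m+1$ nonzero diagonal bands and generating functions $(1+z)^m(1+z^{-1})^m$ and
$(1-z)^m(1-z^{-1})^m$, respectively.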
The parameter $\lambda$ in this expression is a function of $p_h$ (the maximum period of stochastic cycles
filtered out) and the order of the filter:

$$\lambda = \{\tan(\pi/p_h)\}^{-2m}$$
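
As a consistency check, and taking $\lambda = \{\tan(\pi/p_h)\}^{-2m}$ with $\omega_c = 2\pi/p_h$, the gain function given earlier can be rewritten in terms of $\lambda$. Because $\tan(\omega_c/2) = \tan(\pi/p_h)$, we have $\{\tan(\omega_c/2)\}^{2m} = \lambda^{-1}$, so

$$\psi(\omega) = \left[ 1 + \frac{1}{\lambda \{\tan(\omega/2)\}^{2m}} \right]^{-1} = \frac{\lambda \{\tan(\omega/2)\}^{2m}}{1 + \lambda \{\tan(\omega/2)\}^{2m}}$$

At the cutoff frequency, $\lambda\{\tan(\omega_c/2)\}^{2m} = 1$, so $\psi(\omega_c) = 1/2$ for every order $m$.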
The matrix $Q'$ in this expression is a function of the coefficients in the polynomial
$(1-L)^d = 1 + \delta_1 L + \cdots + \delta_d L^d$:

$$Q' = \begin{bmatrix}
\delta_d & \cdots & \delta_1 & 1 & 0 & \cdots & 0 \\
0 & \delta_d & \cdots & \delta_1 & 1 & \cdots & 0 \\
\vdots & & \ddots & & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & \delta_d & \cdots & \delta_1 & 1
\end{bmatrix}_{(T-d)\times T}$$

It can be shown that $\Omega_H = Q'Q$ and $\Omega_L = |\Omega_H|$, which simplifies the calculation of the cyclical
component to

$$\mathbf{c} = \lambda Q\{|Q'Q| + \lambda(Q'Q)\}^{-1} Q'\mathbf{y}$$
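
The structure of these matrices can be illustrated with a small Python sketch. It is not the implementation used by tsfilter bw: the function names are hypothetical, the matrices are plain nested lists, and we pick T = 8 and d = 2 only to keep the output readable. The sketch builds the (T-d) x T differencing matrix from the coefficients of the d-th difference polynomial, forms its cross product, and takes elementwise absolute values.

```python
import math

def difference_coefficients(d):
    """Coefficients 1, delta_1, ..., delta_d of the d-th difference polynomial."""
    return [(-1) ** k * math.comb(d, k) for k in range(d + 1)]

def q_transpose(T, d):
    """(T-d) x T matrix whose rows hold delta_d, ..., delta_1, 1, shifted right."""
    coeffs = difference_coefficients(d)[::-1]
    rows = []
    for i in range(T - d):
        row = [0] * T
        row[i:i + d + 1] = coeffs
        rows.append(row)
    return rows

def cross_product(qt):
    """Product of the differencing matrix with its transpose, (T-d) x (T-d)."""
    n = len(qt)
    return [[sum(a * b for a, b in zip(qt[i], qt[j])) for j in range(n)]
            for i in range(n)]

qt = q_transpose(T=8, d=2)
omega_h = cross_product(qt)
omega_l = [[abs(x) for x in row] for row in omega_h]
print(omega_h[2])
print(omega_l[2])
```

With d = 2, an interior row of the first matrix carries the band 1, -4, 6, -4, 1 and the corresponding row of the second carries 1, 4, 6, 4, 1, matching the coefficients of the two generating functions when m = 2.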

References
Bianchi, G., and R. Sorrentino. 2007. Electronic Filter Simulation and Design. New York: McGraw-Hill.
Burns, A. F., and W. C. Mitchell. 1946. Measuring Business Cycles. New York: National Bureau of Economic
Research.
Butterworth, S. 1930. On the theory of filter amplifiers. Experimental Wireless and the Wireless Engineer 7: 536-541.
Pollock, D. S. G. 1999. A Handbook of Time-Series Analysis, Signal Processing and Dynamics. London: Academic
Press.
Pollock, D. S. G. 2000. Trend estimation and de-trending via rational square-wave filters. Journal of Econometrics 99: 317-334.
Pollock, D. S. G. 2006. Econometric methods of signal extraction. Computational Statistics & Data Analysis 50: 2268-2292.

Also see
[TS] tsset - Declare data to be time-series data
[XT] xtset - Declare data to be panel data
[TS] tsfilter - Filter a time-series, keeping only selected periodicities
[D] format - Set variables' output format
[TS] tssmooth - Smooth and forecast univariate time-series data

Title
tsfilter cf - Christiano-Fitzgerald time-series filter

Syntax
Menu
Description
Options
Remarks and examples
Stored results
Methods and formulas
References
Also see

Syntax
Filter one variable
    tsfilter cf [type] newvar = varname [if] [in] [, options]

Filter multiple variables, unique names
    tsfilter cf [type] newvarlist = varlist [if] [in] [, options]

Filter multiple variables, common name stub
    tsfilter cf [type] stub* = varlist [if] [in] [, options]

options                             Description
Main
  minperiod(#)                      filter out stochastic cycles at periods smaller than #
  maxperiod(#)                      filter out stochastic cycles at periods larger than #
  smaorder(#)                       number of observations in each direction that contribute to
                                      each filtered value
  stationary                        use calculations for a stationary time series
  drift                             remove drift from the time series
Trend
  trend(newvar|newvarlist|stub*)    save the trend component(s) in new variable(s)
Gain
  gain(gainvar anglevar)            save the gain and angular frequency

You must tsset or xtset your data before using tsfilter; see [TS] tsset and [XT] xtset.
varname and varlist may contain time-series operators; see [U] 11.4.4 Time-series varlists.

Menu
Statistics > Time series > Filters for cyclical components > Christiano-Fitzgerald

Description
tsfilter cf uses the Christiano and Fitzgerald (2003) band-pass filter to separate a time series
into trend and cyclical components. The trend component may contain a deterministic or a stochastic
trend. The stationary cyclical component is driven by stochastic cycles at the specified periods.
See [TS] tsfilter for an introduction to the methods implemented in tsfilter cf.

Options


Main

minperiod(#) filters out stochastic cycles at periods smaller than #, where # must be at least 2
and less than maxperiod(). By default, if the units of the time variable are set to daily, weekly,
monthly, quarterly, or half-yearly, then # is set to the number of periods equivalent to 1.5 years;
yearly data use minperiod(2); otherwise, the default value is minperiod(6).
maxperiod(#) filters out stochastic cycles at periods larger than #, where # must be greater than
minperiod(). By default, if the units of the time variable are set to daily, weekly, monthly,
quarterly, half-yearly, or yearly, then # is set to the number of periods equivalent to 8 years;
otherwise, the default value is maxperiod(32).
smaorder(#) sets the order of the symmetric moving average, denoted by $q$. By default, smaorder()
is not set, which invokes the asymmetric calculations for the Christiano-Fitzgerald filter. The order
is an integer that specifies the number of observations in each direction used in calculating the
symmetric moving average estimate of the cyclical component. This number must be an integer
greater than zero and less than $(T-1)/2$. The estimate of the cyclical component for the $t$th
observation, $y_t$, is based upon the $2q+1$ values $y_{t-q}, y_{t-q+1}, \ldots, y_t, y_{t+1}, \ldots, y_{t+q}$.
stationary modifies the filter calculations to those appropriate for a stationary series. By default,
the series is assumed nonstationary.
drift removes drift using the approach described in Christiano and Fitzgerald (2003). By default,
drift is not removed.

Trend

trend(newvar | newvarlist | stub*) saves the trend component(s) in the new variable(s) specified by
newvar, newvarlist, or stub*.

Gain

gain(gainvar anglevar) saves the gain in gainvar and its associated angular frequency in anglevar.
Gains are calculated at the N angular frequencies that uniformly partition the interval $(0, \pi]$, where
N is the sample size.

Remarks and examples


We assume that you have already read [TS] tsfilter, which provides an introduction to filtering and
the methods implemented in tsfilter cf, more examples using tsfilter cf, and a comparison
of the four filters implemented by tsfilter. In particular, an understanding of gain functions as
presented in [TS] tsfilter is required to understand these remarks.
tsfilter cf uses the Christiano-Fitzgerald (CF) band-pass filter to separate a time series $y_t$ into
trend and cyclical components:

$$y_t = \tau_t + c_t$$

where $\tau_t$ is the trend component and $c_t$ is the cyclical component. $\tau_t$ may be nonstationary; it may
contain a deterministic or a stochastic trend, as discussed below.
The primary objective is to estimate $c_t$, a stationary cyclical component that is driven by stochastic
cycles at a specified range of periods. The trend component $\tau_t$ is calculated by the difference
$\tau_t = y_t - c_t$.


Although the CF band-pass filter implemented in tsfilter cf has been widely applied by
macroeconomists, it is a general time-series method and may be of interest to other researchers.
As discussed by Christiano and Fitzgerald (2003) and in [TS] tsfilter, if one had an infinitely long
series, one could apply an ideal band-pass filter that perfectly separates out cyclical components driven
by stochastic cycles at the specified periodicities. In finite samples, it is not possible to exactly satisfy
the conditions that a filter must fulfill to perfectly separate out the specified stochastic cycles; the
expansive filter literature reflects the trade-offs involved in choosing a finite-length filter to separate
out the specified stochastic cycles.
Christiano and Fitzgerald (2003) derive a finite-length CF band-pass filter that minimizes the mean
squared error between the filtered series and the series filtered by an ideal band-pass filter that perfectly
separates out components driven by stochastic cycles at the specified periodicities. Christiano and
Fitzgerald (2003) place two important restrictions on the mean squared error problem that their filter
solves. First, the CF filter is restricted to be a linear filter. Second, $y_t$ is assumed to be a random-walk
process; in other words, $y_t = y_{t-1} + \epsilon_t$, where $\epsilon_t$ is independently and identically distributed with
mean zero and finite variance. The CF filter is the best linear predictor of the series filtered by the
ideal band-pass filter when $y_t$ is a random walk.
Christiano and Fitzgerald (2003) make four points in support of the random-walk assumption.
First, the mean squared error problem solved by their filter requires that the process for yt be
specified. Second, they provide a method for removing drift so that their filter handles cases in
which yt is a random walk with drift. Third, many economic time series are well approximated by a
random-walk-plus-drift process. (We add that many time series encountered in applied statistics are
well approximated by a random-walk-plus-drift process.) Fourth, they provide simulation evidence
that their filter performs well when the process generating yt is not a random-walk-plus-drift process
but is close to being a random-walk-plus-drift process.
Comparing the CF filter with the Baxter-King (BK) filter provides some intuition and explains
the smaorder() option in tsfilter cf. As discussed in [TS] tsfilter and Baxter and King (1999),
symmetric moving-average (SMA) filters with coefficients that sum to zero can extract the components
driven by stochastic cycles at specified periodicities when the series to be filtered has a deterministic
or stochastic trend of order 1 or 2.
The coefficients of the finite-length BK filter are as close as possible to the coefficients of an ideal
SMA band-pass filter under the constraints that the BK coefficients are symmetric and sum to zero.
The coefficients of the CF filter are not symmetric, nor do they sum to zero, but the CF filter was
designed to filter out the specified periodicities when $y_t$ has a first-order stochastic trend.
To be robust to second-order trends, Christiano and Fitzgerald (2003) derive a constrained version
of the CF filter. The coefficients of the constrained filter are constrained to be symmetric and to
sum to zero. Subject to these constraints, the coefficients of the constrained CF filter minimize the
mean squared error between the filtered series and the series filtered by an ideal band-pass filter that
perfectly separates out the components. Christiano and Fitzgerald (2003) note that the higher-order
detrending properties of this constrained filter come at the cost of lost efficiency. If the constraints
are binding, the constrained filter cannot predict the series filtered by the ideal filter as well as the
unconstrained filter can.
Specifying the smaorder() option causes tsfilter cf to compute the SMA-constrained CF filter.
The choice between the BK and the CF filters is one between robustness and efficiency. The BK
filter handles a broader class of stochastic processes than does the CF filter, but the CF filter produces
a better estimate of ct if yt is close to a random-walk process or a random-walk-plus-drift process.
Among economists, the CF filter is commonly used for investigating business cycles. Burns and
Mitchell (1946) defined business cycles as stochastic cycles in business data corresponding to periods
between 1.5 and 8 years. The default values for minperiod() and maxperiod() are the Burns-Mitchell
values of 1.5 and 8 years scaled to the frequency of the dataset. The calculations of the default
values assume that the time variable is formatted as daily, weekly, monthly, quarterly, half-yearly, or
yearly; see [D] format.
When yt is assumed to be a random-walk-plus-drift process instead of a random-walk process,
specify the drift option, which removes the linear drift in the series before applying the filter. Drift
is removed by transforming the original series to a new series by using the calculation

$$z_t = y_t - \frac{(t-1)(y_T - y_1)}{T-1}$$

The cyclical component $c_t$ is calculated from the drift-adjusted series $z_t$. The trend component $\tau_t$ is
calculated by $\tau_t = y_t - c_t$.
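The drift adjustment can be sketched in Python as follows. This is an illustration with a function name of our choosing; it applies the transformation to a plain list of values, with the first list element playing the role of the observation at t = 1.

```python
def remove_drift(y):
    """Subtract the line through the first and last observations, anchored at y_1."""
    T = len(y)
    slope = (y[-1] - y[0]) / (T - 1)
    # enumerate() starts at 0, so the index equals t - 1 for t = 1, ..., T
    return [y_t - index * slope for index, y_t in enumerate(y)]

z = remove_drift([1.0, 1.5, 2.5, 3.0, 5.0])
print(z)
```

By construction, the first and last values of the adjusted series both equal the first observation of the original series, so the adjusted series has no net change over the sample.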
By default, the CF filter assumes the series is nonstationary. If the series is stationary, the
stationary option is used to change the calculations to those appropriate for a stationary series.
For each variable, the CF filter estimate of $c_t$ is put in the corresponding new variable, and when
the trend() option is specified, the estimate of $\tau_t$ is put in the corresponding new variable.
tsfilter cf automatically detects panel data from the information provided when the dataset was
tsset or xtset. All calculations are done separately on each panel. Missing values at the beginning
and end of the sample are excluded from the sample. The sample may not contain gaps.

Example 1: Estimating a business-cycle component


In this and the subsequent examples, we use tsfilter cf to estimate the business-cycle component
of the natural log of real gross domestic product (GDP) of the United States. Our sample of quarterly
data goes from 1952q1 to 2010q4. Below we read in and plot the data.
. use http://www.stata-press.com/data/r13/gdp2
(Federal Reserve Economic Data, St. Louis Fed)
. tsline gdp_ln

[Graph omitted: natural log of real GDP plotted against the quarterly time variable, 1950q1 to 2010q1]

The series looks like it might be generated by a random-walk-plus-drift process and is thus a
candidate for the CF filter.

Below we use tsfilter cf to filter gdp_ln, and we use pergram (see [TS] pergram) to compute
and to plot the periodogram of the estimated cyclical component.
. tsfilter cf gdp_cf = gdp_ln
. pergram gdp_cf, xline(.03125 .16667)

[Graph omitted: log periodogram (sample spectral density function) of gdp_ln cyclical component from cf filter, plotted against frequency and evaluated at the natural frequencies; vertical axis from -6.00 to 6.00]

Because our sample is of quarterly data, tsfilter cf used the default values of minperiod(6)
and maxperiod(32). The minimum and maximum periods are the Burns and Mitchell (1946)
business-cycle periods for quarterly data.
In the periodogram, we added vertical lines at the natural frequencies corresponding to the
conventional Burns and Mitchell (1946) values for business-cycle components. pergram displays the
results in natural frequencies, which are the standard frequencies divided by $2\pi$. We use the xline()
option to draw vertical lines at the lower natural-frequency cutoff (1/32 = 0.03125) and the upper
natural-frequency cutoff (1/6 ≈ 0.16667).
If the filter completely removed the stochastic cycles at the unwanted frequencies, the periodogram
would be a flat line at the minimum value of -6 outside the range identified by the vertical lines.
The periodogram reveals that the CF filter did a reasonable job of filtering out the unwanted stochastic
cycles.
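The two xline() values are simply the reciprocals of the maximum and minimum periods. As a quick illustrative check, assuming nothing beyond Stata's display command:

```stata
* natural frequency = 1/period (cycles per time unit)
display %7.5f 1/32      // lower cutoff, from maxperiod(32)
display %7.5f 1/6       // upper cutoff, from minperiod(6)
```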


Below we plot the estimated business-cycle component with recessions identified by the shaded
areas.

[Figure: gdp_ln cyclical component from cf filter against the quarterly time variable, 1950q1 to 2010q1, with recessions shaded; vertical axis from -.04 to .04]

Stored results
tsfilter cf stores the following in r():
Scalars
    r(smaorder)     order of the symmetric moving average, if specified
    r(minperiod)    minimum period of stochastic cycles
    r(maxperiod)    maximum period of stochastic cycles
Macros
    r(varlist)      original time-series variables
    r(filterlist)   variables containing estimates of the cyclical components
    r(trendlist)    variables containing estimates of the trend components, if trend() was specified
    r(method)       Christiano-Fitzgerald
    r(symmetric)    yes or no, indicating whether the symmetric version of the filter was or was not used
    r(drift)        yes or no, indicating whether drift was or was not removed before filtering
    r(stationary)   yes or no, indicating whether the calculations assumed the series was or was not stationary
    r(unit)         units of time variable set using tsset or xtset
Matrices
    r(filter)       $(q+1) \times 1$ matrix of weights $(\hat{b}_0, \hat{b}_1, \ldots, \hat{b}_q)'$, where $q$ is the order of the
                    symmetric moving average, and the weights are the Christiano–Fitzgerald coefficients;
                    only returned when smaorder() is used to set $q$
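A short sketch of how these results might be inspected after filtering. The new variable name gdp_cf_sym and the choice smaorder(8) are assumptions made only for illustration.

```stata
* Sketch: look at what tsfilter cf leaves behind in r()
use http://www.stata-press.com/data/r13/gdp2, clear
tsfilter cf gdp_cf_sym = gdp_ln, smaorder(8)

return list                 // scalars, macros, and matrices currently in r()
display r(minperiod)        // default minimum period for quarterly data
matrix list r(filter)       // weights; present because smaorder() was specified
```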

Methods and formulas


For an infinitely long series, there is an ideal band-pass filter that extracts the cyclical component
by using the calculation

$$c_t = \sum_{j=-\infty}^{\infty} b_j y_{t-j}$$

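If $p_l$ and $p_h$ are the minimum and maximum periods of the stochastic cycles of interest, the weights
$b_j$ in the ideal band-pass filter are given by

$$b_j = \begin{cases} \pi^{-1}(\omega_h - \omega_l) & \text{if } j = 0 \\ (j\pi)^{-1}\{\sin(j\omega_h) - \sin(j\omega_l)\} & \text{if } j \neq 0 \end{cases}$$

where $\omega_l = 2\pi/p_h$ and $\omega_h = 2\pi/p_l$ are the lower and higher cutoff frequencies, respectively; the
maximum period $p_h$ gives the lower cutoff frequency, and the minimum period $p_l$ gives the higher one.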
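To make the weight formula concrete, the first few ideal weights for the quarterly business-cycle band can be computed directly. This is a minimal sketch, not the command's internal code: the local macro names are hypothetical, and it takes the lower cutoff frequency to be the one implied by the 32-period maximum and the higher cutoff to be the one implied by the 6-period minimum, matching the natural-frequency cutoffs 1/32 and 1/6 used in the periodogram.

```stata
* Sketch: ideal band-pass weights b_0, ..., b_4 for periods between 6 and 32
local wl = 2*_pi/32         // lower cutoff frequency (maximum period)
local wh = 2*_pi/6          // higher cutoff frequency (minimum period)

display "b0 = " (`wh' - `wl')/_pi
forvalues j = 1/4 {
    display "b`j' = " (sin(`j'*`wh') - sin(`j'*`wl'))/(`j'*_pi)
}
```

With these cutoffs, b0 reduces to 2/6 - 2/32, the share of frequencies that the band lets through.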
Because our time series has finite length, the ideal band-pass filter cannot be computed exactly.
Christiano and Fitzgerald (2003) derive the finite-length CF band-pass filter that minimizes the mean
squared error between the filtered series and the series filtered by an ideal band-pass filter that
perfectly separates out the components. This filter is not symmetric nor do the coefficients sum to
zero. The formula for calculating the value of cyclical component $c_t$ for $t = 2, 3, \ldots, T-1$ using
the asymmetric version of the CF filter can be expressed as

$$c_t = b_0 y_t + \sum_{j=1}^{T-t-1} b_j y_{t+j} + \tilde{b}_{T-t}\, y_T + \sum_{j=1}^{t-2} b_j y_{t-j} + \tilde{b}_{t-1}\, y_1$$

where $b_0, b_1, \ldots$ are the weights used by the ideal band-pass filter. $\tilde{b}_{T-t}$ and $\tilde{b}_{t-1}$ are linear functions
of the ideal weights used in this calculation. The CF filter uses two different calculations for $\tilde{b}_t$
depending upon whether the series is assumed to be stationary or nonstationary.
For the default nonstationary case with $1 < t < T$, Christiano and Fitzgerald (2003) set $\tilde{b}_{T-t}$ and
$\tilde{b}_{t-1}$ to

$$\tilde{b}_{T-t} = -\frac{1}{2} b_0 - \sum_{j=1}^{T-t-1} b_j \quad \text{and} \quad \tilde{b}_{t-1} = -\frac{1}{2} b_0 - \sum_{j=1}^{t-2} b_j$$

which forces the weights to sum to zero.
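The sum-to-zero property can be verified in one line. As a worked sketch, write $\tilde{b}_{T-t}$ and $\tilde{b}_{t-1}$ for the two modified end weights in the nonstationary case and add every weight applied to the series in the interior calculation:

```latex
\begin{aligned}
b_0 + \sum_{j=1}^{T-t-1} b_j + \tilde{b}_{T-t} + \sum_{j=1}^{t-2} b_j + \tilde{b}_{t-1}
  &= b_0 + \sum_{j=1}^{T-t-1} b_j
     + \Bigl(-\tfrac{1}{2} b_0 - \sum_{j=1}^{T-t-1} b_j\Bigr)
     + \sum_{j=1}^{t-2} b_j
     + \Bigl(-\tfrac{1}{2} b_0 - \sum_{j=1}^{t-2} b_j\Bigr) \\
  &= b_0 - \tfrac{1}{2} b_0 - \tfrac{1}{2} b_0 = 0 .
\end{aligned}
```

Each modified end weight cancels the ideal weights on its side of $y_t$ together with half of $b_0$.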
For the nonstationary case, when $t = 1$ or $t = T$, the two endpoints ($c_1$ and $c_T$) use only one
modified weight, $\tilde{b}_{T-1}$:

$$c_1 = \frac{1}{2} b_0 y_1 + \sum_{j=1}^{T-2} b_j y_{j+1} + \tilde{b}_{T-1}\, y_T$$

and

$$c_T = \frac{1}{2} b_0 y_T + \sum_{j=1}^{T-2} b_j y_{T-j} + \tilde{b}_{T-1}\, y_1$$

When the stationary option is used to invoke the stationary calculations, all weights are set to
the ideal filter weight, that is, $\tilde{b}_j = b_j$.
If the smaorder() option is set, the symmetric version of the CF filter is used. This option specifies
the length of the symmetric moving average denoted by $q$. The symmetric calculations for $c_t$ are
similar to those used by the BK filter:

$$c_t = \hat{b}_q \{L^{-q}(y_t) + L^{q}(y_t)\} + \sum_{j=-q+1}^{q-1} b_j L^{j}(y_t)$$

where, for the default nonstationary calculations, $\hat{b}_q = -(1/2) b_0 - \sum_{j=1}^{q-1} b_j$. If the smaorder()
and stationary options are set, then $\hat{b}_q$ is set equal to the ideal weight $b_q$.


References
Baxter, M., and R. G. King. 1999. Measuring business cycles: Approximate band-pass filters for economic time series.
Review of Economics and Statistics 81: 575–593.
Burns, A. F., and W. C. Mitchell. 1946. Measuring Business Cycles. New York: National Bureau of Economic
Research.
Christiano, L. J., and T. J. Fitzgerald. 2003. The band pass filter. International Economic Review 44: 435–465.
Pollock, D. S. G. 1999. A Handbook of Time-Series Analysis, Signal Processing and Dynamics. London: Academic
Press.
Pollock, D. S. G. 2006. Econometric methods of signal extraction. Computational Statistics & Data Analysis 50: 2268–2292.

Also see
[TS] tsset - Declare data to be time-series data
[XT] xtset - Declare data to be panel data
[TS] tsfilter - Filter a time-series, keeping only selected periodicities
[D] format - Set variables' output format
[TS] tssmooth - Smooth and forecast univariate time-series data

Title
tsfilter hp - Hodrick–Prescott time-series filter

Syntax    Menu    Description    Options    Remarks and examples
Stored results    Methods and formulas    References    Also see
Syntax
Filter one variable
    tsfilter hp [type] newvar = varname [if] [in] [, options]
Filter multiple variables, unique names
    tsfilter hp [type] newvarlist = varlist [if] [in] [, options]
Filter multiple variables, common name stub
    tsfilter hp [type] stub* = varlist [if] [in] [, options]

options                               Description
Main
  smooth(#)                           smoothing parameter for the Hodrick–Prescott filter
Trend
  trend(newvar | newvarlist | stub*)  save the trend component(s) in new variable(s)
Gain
  gain(gainvar anglevar)              save the gain and angular frequency

You must tsset or xtset your data before using tsfilter; see [TS] tsset and [XT] xtset.
varname and varlist may contain time-series operators; see [U] 11.4.4 Time-series varlists.

Menu
Statistics > Time series > Filters for cyclical components > Hodrick-Prescott

Description
tsfilter hp uses the Hodrick–Prescott high-pass filter to separate a time series into trend and
cyclical components. The trend component may contain a deterministic or a stochastic trend. The
smoothing parameter determines the periods of the stochastic cycles that drive the stationary cyclical
component.
See [TS] tsfilter for an introduction to the methods implemented in tsfilter hp.


Options


Main

smooth(#) sets the smoothing parameter for the Hodrick–Prescott filter. By default, if the units of the
time variable are set to daily, weekly, monthly, quarterly, half-yearly, or yearly, then the Ravn–Uhlig
rule is used to set the smoothing parameter; otherwise, the default value is smooth(1600). The
Ravn–Uhlig rule sets # to $1600 p_q^4$, where $p_q$ is the number of periods per quarter. The smoothing
parameter must be greater than 0.

Trend

trend(newvar | newvarlist | stub*) saves the trend component(s) in the new variable(s) specified by
newvar, newvarlist, or stub*.

Gain

gain(gainvar anglevar) saves the gain in gainvar and its associated angular frequency in anglevar.
Gains are calculated at the N angular frequencies that uniformly partition the interval $(0, \pi]$, where
N is the sample size.
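As a hedged illustration of the gain() option, the saved gain can be plotted against its angular frequency. The variable names gdp_hp_g, hp_gain, and hp_angle are hypothetical.

```stata
* Sketch: save the HP gain function and plot it
use http://www.stata-press.com/data/r13/gdp2, clear
tsfilter hp gdp_hp_g = gdp_ln, gain(hp_gain hp_angle)

* gain on the vertical axis, angular frequency in (0, pi] on the horizontal axis
twoway line hp_gain hp_angle
```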

Remarks and examples


We assume that you have already read [TS] tsfilter, which provides an introduction to filtering and
the methods implemented in tsfilter hp, more examples using tsfilter hp, and a comparison
of the four filters implemented by tsfilter. In particular, an understanding of gain functions as
presented in [TS] tsfilter is required to understand these remarks.
tsfilter hp uses the Hodrick–Prescott (HP) high-pass filter to separate a time series $y_t$ into trend
and cyclical components

$$y_t = \tau_t + c_t$$

where $\tau_t$ is the trend component and $c_t$ is the cyclical component. $\tau_t$ may be nonstationary; it may
contain a deterministic or a stochastic trend, as discussed below.
The primary objective is to estimate $c_t$, a stationary cyclical component that is driven by stochastic
cycles at a range of periods. The trend component $\tau_t$ is calculated by the difference $\tau_t = y_t - c_t$.
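The decomposition can be checked numerically by saving both components. This is a sketch only; gdp_cyc, gdp_trend, and resid are hypothetical names.

```stata
* Sketch: trend and cycle should add back up to the original series
use http://www.stata-press.com/data/r13/gdp2, clear
tsfilter hp gdp_cyc = gdp_ln, trend(gdp_trend)

generate double resid = gdp_ln - gdp_trend - gdp_cyc
summarize resid             // expected to be zero up to storage precision
tsline gdp_ln gdp_trend     // the series with its estimated trend
```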
Although the HP high-pass filter implemented in tsfilter hp has been widely applied by
macroeconomists, it is a general time-series method and may be of interest to other researchers.
Hodrick and Prescott (1997) motivated the HP filter as a trend-removal technique that could be
applied to data that came from a wide class of data-generating processes. In their view, the technique
specified a trend in the data and the data was filtered by removing the trend. The smoothness of
the trend depends on a parameter $\lambda$. The trend becomes smoother as $\lambda \to \infty$, and Hodrick and
Prescott (1997) recommended setting $\lambda$ to 1,600 for quarterly data.
King and Rebelo (1993) showed that removing a trend estimated by the HP filter is equivalent to
a high-pass filter. They derived the gain function of this high-pass filter and showed that the filter
would make integrated processes of order 4 or less stationary, making the HP filter comparable to the
other filters implemented in tsfilter.


Example 1: Estimating a business-cycle component


In this and the subsequent examples, we use tsfilter hp to estimate the business-cycle component
of the natural log of real gross domestic product (GDP) of the United States. Our sample of quarterly
data goes from 1952q1 to 2010q4. Below we read in and plot the data.

. use http://www.stata-press.com/data/r13/gdp2
(Federal Reserve Economic Data, St. Louis Fed)
. tsline gdp_ln

[Figure: line plot of the natural log of real GDP against the quarterly time variable, 1950q1 to 2010q1]

The series is nonstationary and is thus a candidate for the HP filter.


Below we use tsfilter hp to filter gdp_ln, and we use pergram (see [TS] pergram) to compute
and to plot the periodogram of the estimated cyclical component.
. tsfilter hp gdp_hp = gdp_ln
. pergram gdp_hp, xline(.03125 .16667)

Because our sample is of quarterly data, tsfilter hp used the default value for the smoothing
parameter of 1,600.
In the periodogram, we added vertical lines at the natural frequencies corresponding to the
conventional Burns and Mitchell (1946) values for business-cycle components of 32 periods and
6 periods. pergram displays the results in natural frequencies, which are the standard frequencies
divided by $2\pi$. We use the xline() option to draw vertical lines at the lower natural-frequency cutoff
(1/32 = 0.03125) and the upper natural-frequency cutoff (1/6 ≈ 0.16667).
If the filter completely removed the stochastic cycles at the unwanted frequencies, the periodogram
would be a flat line at the minimum value of -6 outside the range identified by the vertical lines.

[Figure: sample spectral density function (log periodogram) of the gdp_ln cyclical component from hp filter, evaluated at the natural frequencies; frequency axis from 0.00 to 0.50, vertical axis from -6.00 to 6.00]

The periodogram reveals a high-periodicity issue and a low-periodicity issue. The points above -6.00
to the left of the left-hand vertical line in the periodogram reveal that the filter did not do a good
job of filtering out the high-periodicity stochastic cycles with the default smoothing parameter
of 1,600. That there is no tendency of the points to the right of the right-hand vertical line to be
smoothed toward -6.00 reveals that the HP filter did not remove any of the low-periodicity stochastic
cycles. This result is not surprising, because the HP filter is a high-pass filter.
In the next example, we address the high-periodicity issue. See [TS] tsfilter and [TS] tsfilter bw
for how to turn a high-pass filter into a band-pass filter.

Example 2: Choosing the filter parameters


In the filter literature, filter parameters are set as functions of the cutoff frequency; see Pollock (2000,
324), for instance. This method finds the filter parameter that sets the gain of the filter equal to 1/2
at the cutoff frequency. In a technical note in [TS] tsfilter, we showed that applying this method to
selecting $\lambda$ at the cutoff frequency of 32 periods suggests setting $\lambda \approx 677.13$. In the output below, we
estimate the business-cycle component using this value for the smoothing parameter, and we compute
and plot the periodogram of the estimated business-cycle component.
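The value 677.13 can be recovered from the HP gain function given in Methods and formulas. Setting the gain $4\lambda\{1-\cos(\omega)\}^2 / [1 + 4\lambda\{1-\cos(\omega)\}^2]$ equal to 1/2 and solving gives $\lambda = 1/[4\{1-\cos(\omega)\}^2]$. A one-line sketch of that arithmetic at the 32-period cutoff, $\omega = 2\pi/32$:

```stata
* Sketch: smoothing parameter that puts the HP gain at 1/2 for a 32-period cycle
display %9.2f 1/(4*(1 - cos(2*_pi/32))^2)
```

The same expression with a different period in place of 32 gives the corresponding value for another cutoff.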


. tsfilter hp gdp_hp677 = gdp_ln, smooth(677.13)
. pergram gdp_hp677, xline(.03125 .16667)

[Figure: sample spectral density function (log periodogram) of the gdp_ln cyclical component from hp filter, evaluated at the natural frequencies; frequency axis from 0.00 to 0.50, vertical axis from -6.00 to 6.00]

A comparison of the two periodograms reveals that setting the smoothing parameter to 677.13
removes more of the high-periodicity stochastic cycles than does the default 1,600. In [TS] tsfilter,
we found that the HP filter was not as good at removing the high-periodicity stochastic cycles as
was the Christiano–Fitzgerald filter implemented in tsfilter cf or as was the Butterworth filter
implemented in tsfilter bw.

Below we plot the estimated business-cycle component with recessions identified by the shaded
areas.

[Figure: gdp_ln cyclical component from hp filter against the quarterly time variable, 1950q1 to 2010q1, with recessions shaded; vertical axis from -.04 to .04]

tsfilter hp automatically detects panel data from the information provided when the dataset was
tsset or xtset. All calculations are done separately on each panel. Missing values at the beginning
and end of the sample are excluded from the sample. The sample may not contain gaps.


Stored results
tsfilter hp stores the following in r():
Scalars
    r(smooth)       smoothing parameter
Macros
    r(varlist)      original time-series variables
    r(filterlist)   variables containing estimates of the cyclical components
    r(trendlist)    variables containing estimates of the trend components, if trend() was specified
    r(method)       Hodrick-Prescott
    r(unit)         units of time variable set using tsset or xtset

Methods and formulas


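Formally, the filter is defined as the solution to the following optimization problem for $\tau_t$

$$\min_{\tau_t} \left[ \sum_{t=1}^{T} (y_t - \tau_t)^2 + \lambda \sum_{t=2}^{T-1} \{(\tau_{t+1} - \tau_t) - (\tau_t - \tau_{t-1})\}^2 \right]$$

where the smoothing parameter $\lambda$ is set fixed to a value.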


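If $\lambda = 0$, the solution degenerates to $\tau_t = y_t$, in which case the filter excludes all frequencies,
that is, $c_t = 0$. On the other extreme, as $\lambda \to \infty$, the solution approaches the least-squares fit to the
line $\tau_t = \beta_0 + \beta_1 t$; see Hodrick and Prescott (1997) for a discussion.
For a fixed $\lambda$, it can be shown that the cyclical component $c' = (c_1, c_2, \ldots, c_T)$ is calculated by

$$c = (I_T - M^{-1})\, y$$

where $y$ is the column vector $y' = (y_1, y_2, \ldots, y_T)$, $I_T$ is the $T \times T$ identity matrix, and $M$ is the
$T \times T$ matrix:

$$
M = \begin{pmatrix}
(1+\lambda) & -2\lambda & \lambda & 0 & 0 & 0 & \cdots & 0 \\
-2\lambda & (1+5\lambda) & -4\lambda & \lambda & 0 & 0 & \cdots & 0 \\
\lambda & -4\lambda & (1+6\lambda) & -4\lambda & \lambda & 0 & \cdots & 0 \\
0 & \lambda & -4\lambda & (1+6\lambda) & -4\lambda & \lambda & \cdots & 0 \\
\vdots & \ddots & \ddots & \ddots & \ddots & \ddots & \ddots & \vdots \\
0 & \cdots & \lambda & -4\lambda & (1+6\lambda) & -4\lambda & \lambda & 0 \\
0 & \cdots & 0 & \lambda & -4\lambda & (1+6\lambda) & -4\lambda & \lambda \\
0 & \cdots & 0 & 0 & \lambda & -4\lambda & (1+5\lambda) & -2\lambda \\
0 & \cdots & 0 & 0 & 0 & \lambda & -2\lambda & (1+\lambda)
\end{pmatrix}
$$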

The gain of the HP filter is given by (see King and Rebelo [1993], Maravall and del Rio [2007],
or Harvey and Trimbur [2008])

$$\psi(\omega) = \frac{4\lambda\{1 - \cos(\omega)\}^2}{1 + 4\lambda\{1 - \cos(\omega)\}^2}$$


As discussed in [TS] tsfilter, there are two approaches to selecting $\lambda$. One method, based on
the heuristic argument of Hodrick and Prescott (1997), is used to compute the default values for $\lambda$.
The method sets $\lambda$ to 1,600 for quarterly data and to the rescaled values worked out by Ravn and
Uhlig (2002). The rescaled default values for $\lambda$ are 6.25 for yearly data, 100 for half-yearly data,
129,600 for monthly data, $1600 \times 12^4$ for weekly data, and $1600 \times (365/4)^4$ for daily data.
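These defaults all follow from the Ravn–Uhlig rule of multiplying 1,600 by the fourth power of the number of periods per quarter. A short arithmetic sketch, using the periods per quarter implied by the values in the text:

```stata
* Ravn-Uhlig rule: lambda = 1600 * (periods per quarter)^4
display 1600*(1/4)^4        // yearly data
display 1600*(1/2)^4        // half-yearly data
display 1600*3^4            // monthly data
display 1600*12^4           // weekly data, as stated in the text
display 1600*(365/4)^4      // daily data
```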
The second method for selecting $\lambda$ uses the recommendations of Pollock (2000, 324), who uses
the gain function of the filter to identify a value for $\lambda$.
Additional literature critiques the HP filter by pointing out that the HP filter corresponds to a specific
model. Harvey and Trimbur (2008) show that the cyclical component estimated by the HP filter is
equivalent to one estimated by a particular unobserved-components model. Harvey and Jaeger (1993),
Gómez (1999), Pollock (2000), and Gómez (2001) also show this result and provide interesting
comparisons of estimating $c_t$ by filtering and model-based methods.

References
Burns, A. F., and W. C. Mitchell. 1946. Measuring Business Cycles. New York: National Bureau of Economic
Research.
Gómez, V. 1999. Three equivalent methods for filtering finite nonstationary time series. Journal of Business and
Economic Statistics 17: 109–116.
Gómez, V. 2001. The use of Butterworth filters for trend and cycle estimation in economic time series. Journal of Business
and Economic Statistics 19: 365–373.
Harvey, A. C., and A. Jaeger. 1993. Detrending, stylized facts and the business cycle. Journal of Applied Econometrics
8: 231–247.
Harvey, A. C., and T. M. Trimbur. 2008. Trend estimation and the Hodrick–Prescott filter. Journal of the Japanese
Statistical Society 38: 41–49.
Hodrick, R. J., and E. C. Prescott. 1997. Postwar U.S. business cycles: An empirical investigation. Journal of Money,
Credit, and Banking 29: 1–16.
King, R. G., and S. T. Rebelo. 1993. Low frequency filtering and real business cycles. Journal of Economic Dynamics
and Control 17: 207–231.
Leser, C. E. V. 1961. A simple method of trend construction. Journal of the Royal Statistical Society, Series B 23:
91–107.
Maravall, A., and A. del Rio. 2007. Temporal aggregation, systematic sampling, and the Hodrick–Prescott filter.
Working Paper No. 0728, Banco de España.
http://www.bde.es/webbde/Secciones/Publicaciones/PublicacionesSeriadas/DocumentosTrabajo/07/Fic/dt0728e.pdf.
Pollock, D. S. G. 1999. A Handbook of Time-Series Analysis, Signal Processing and Dynamics. London: Academic
Press.
Pollock, D. S. G. 2000. Trend estimation and de-trending via rational square-wave filters. Journal of Econometrics 99: 317–334.
Pollock, D. S. G. 2006. Econometric methods of signal extraction. Computational Statistics & Data Analysis 50: 2268–2292.
Ravn, M. O., and H. Uhlig. 2002. On adjusting the Hodrick–Prescott filter for the frequency of observations. Review
of Economics and Statistics 84: 371–376.

Also see
[TS] tsset - Declare data to be time-series data
[XT] xtset - Declare data to be panel data
[TS] tsfilter - Filter a time-series, keeping only selected periodicities
[D] format - Set variables' output format
[TS] tssmooth - Smooth and forecast univariate time-series data

Title
tsline - Plot time-series data

Syntax    Menu    Description    Options
Remarks and examples    References    Also see

Syntax
Time-series line plot
    [twoway] tsline varlist [if] [in] [, tsline_options]
Time-series range plot with lines
    [twoway] tsrline y1 y2 [if] [in] [, tsrline_options]
where the time variable is assumed set by tsset (see [TS] tsset), varlist has the interpretation
y1 [y2 ... yk].

tsline_options      Description
Plots
  scatter_options   any of the options documented in [G-2] graph twoway scatter with
                    the exception of marker_options, marker_placement_options,
                    and marker_label_options, which will be ignored if specified
Y axis, Time axis, Titles, Legend, Overall, By
  twoway_options    any options documented in [G-3] twoway_options

tsrline_options     Description
Plots
  rline_options     any of the options documented in [G-2] graph twoway rline
Y axis, Time axis, Titles, Legend, Overall, By
  twoway_options    any options documented in [G-3] twoway_options

Menu
Statistics > Time series > Graphs > Line plots

Description
tsline draws line plots for time-series data.
tsrline draws a range plot with lines for time-series data.

tsline and tsrline are both commands and plottypes as defined in [G-2] graph twoway. Thus
the syntax for tsline is
. graph twoway tsline ...
. twoway tsline ...
. tsline ...

and similarly for tsrline. Being plot types, these commands may be combined with other plot types
in the twoway family, as in,
. twoway (tsrline ...) (tsline ...) (lfit ...) ...

which can equivalently be written
. tsrline ... || tsline ... || lfit ... || ...

Options


Plots

scatter_options are any of the options allowed by the graph twoway scatter command except that
marker_options, marker_placement_options, and marker_label_options will be ignored if specified;
see [G-2] graph twoway scatter.
rline_options are any of the options allowed by the graph twoway rline command; see [G-2] graph
twoway rline.

Y axis, Time axis, Titles, Legend, Overall, By

twoway_options are any of the options documented in [G-3] twoway_options. These include options
for titling the graph (see [G-3] title_options), for saving the graph to disk (see [G-3] saving_option),
and the by() option, which will allow you to simultaneously plot different subsets of the data
(see [G-3] by_option).
Also see the recast() option discussed in [G-3] advanced_options for information on how to
plot spikes, bars, etc., instead of lines.

Remarks and examples


Remarks are presented under the following headings:
Basic examples
Video example

Basic examples
Example 1
We simulated two separate time series (each of 200 observations) and placed them in a Stata
dataset, tsline1.dta. The first series simulates an AR(2) process with $\phi_1 = 0.8$ and $\phi_2 = 0.2$; the
second series simulates an MA(2) process with $\theta_1 = 0.8$ and $\theta_2 = 0.2$. We use tsline to graph
these two series.


. use http://www.stata-press.com/data/r13/tsline1
. tsset lags
        time variable:  lags, 0 to 199
                delta:  1 unit
. tsline ar ma

[Figure: line plot of the two series, Simulated AR(.8,.2) and Simulated MA(.8,.2), against lags, 0 to 200]

Example 2
Suppose that we kept a calorie log for an entire calendar year. At the end of the year, we would
have a dataset (for example, tsline2.dta) that contains the number of calories consumed for 365
days. We could then use tsset to identify the date variable and tsline to plot calories versus time.
Knowing that we tend to eat a little more food on Thanksgiving and Christmas day, we use the
ttick() and ttext() options to point these days out on the time axis.



. use http://www.stata-press.com/data/r13/tsline2
. tsset day
        time variable:  day, 01jan2002 to 31dec2002
                delta:  1 day
. tsline calories, ttick(28nov2002 25dec2002, tpos(in))
> ttext(3470 28nov2002 "thanks" 3470 25dec2002 "x-mas", orient(vert))

[Figure: Calories consumed against Date, 01jan2002 to 01jan2003, with inward tick marks and vertical text labels "thanks" and "x-mas" at the two holidays; vertical axis from 3400 to 4400]

We were uncertain of the exact values we logged, so we also gave a range for each day. Here is
a plot of the summer months.

. tsrline lcalories ucalories if tin(1may2002,31aug2002) || tsline cal ||
> if tin(1may2002,31aug2002), ytitle(Calories)

[Figure: Calorie range and Calories consumed against Date, 01may2002 to 01sep2002; vertical axis titled Calories, from 3300 to 3800]

Options associated with the time axis allow dates (and times) to be specified in place of numeric
date (and time) values. For instance, we used
ttick(28nov2002 25dec2002, tpos(in))

to place tick marks at the specified dates. This works similarly for tlabel, tmlabel, and tmtick.
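For instance, labeled and minor ticks can be requested by date in the same way. This is a small sketch; the particular dates are chosen only for illustration.

```stata
* Sketch: dates, rather than numeric date values, in time-axis options
use http://www.stata-press.com/data/r13/tsline2, clear
tsset day
tsline calories, tlabel(01jan2002 01jul2002 31dec2002) tmtick(01apr2002 01oct2002)
```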


Suppose that we wanted to place vertical lines for the previously mentioned holidays. We could
specify the dates in the tline() option as follows:

. tsline calories, tline(28nov2002 25dec2002)

[Figure: Calories consumed against Date, 01jan2002 to 01jan2003, with vertical lines at 28nov2002 and 25dec2002; vertical axis from 3400 to 4400]

We could also modify the format of the time axis so that only the day in the year is displayed in
the labeled ticks:

. tsline calories, tlabel(, format(%tdmd)) ttitle("Date (2002)")

[Figure: Calories consumed against Date (2002), with time-axis labels Jan1, Apr1, Jul1, Oct1, Jan1; vertical axis from 3400 to 4400]

Video example
Time series, part 2: Line graphs and tin()


References
Cox, N. J. 2009a. Speaking Stata: Graphs for all seasons. Stata Journal 6: 397–419.
Cox, N. J. 2009b. Stata tip 76: Separating seasonal time series. Stata Journal 9: 321–326.
Cox, N. J. 2012. Speaking Stata: Transforming the time axis. Stata Journal 12: 332–341.

Also see
[TS] tsset - Declare data to be time-series data
[G-2] graph twoway - Twoway graphs
[XT] xtline - Panel-data line plots

Title
tsreport - Report time-series aspects of a dataset or estimation sample

Syntax    Menu    Description    Options
Remarks and examples    Stored results    Also see

Syntax
    tsreport [varlist] [if] [in] [, options]

options       Description
Main
  detail      list periods for each gap
  casewise    treat a period as a gap if any of the specified variables are missing
  panel       do not count panel changes as gaps

varlist may contain time-series operators; see [U] 11.4.4 Time-series varlists.

Menu
Statistics > Time series > Setup and utilities > Report time-series aspects of dataset

Description
tsreport reports time gaps in a dataset or in a subset of variables. By default, tsreport reports
periods in which no information is recorded in the dataset; the time variable does not include these
periods. When you specify varlist, tsreport reports periods in which either no information is
recorded in the dataset or the time variable is present, but one or more variables in varlist contain a
missing value.

Options


Main

detail reports the beginning and ending times of each gap.


casewise specifies that a period for which any of the specified variables are missing be counted as
a gap. By default, gaps are reported for each variable individually.
panel specifies that panel changes not be counted as gaps. Whether panel changes are counted as
gaps usually depends on how the calling command handles panels.

Remarks and examples


Remarks are presented under the following headings:
Basic examples
Video example


Basic examples
Time-series commands sometimes require that observations be on a fixed time interval with no
gaps, or the command may not function properly. tsreport provides a tool for reporting the gaps
in a sample.

Example 1: A simple panel-data example


The following monthly panel data have two panels and a missing month (March) in the second
panel:
. use http://www.stata-press.com/data/r13/tsrptxmpl
. list edlevel month income in 1/6, sep(0)

        edlevel     month   income
  1.          1    1998m1      687
  2.          1    1998m2      783
  3.          1    1998m3      790
  4.          2    1998m1     1435
  5.          2    1998m2     1522
  6.          2    1998m4     1532

Invoking tsreport gives us the following report:


. tsreport
Panel variable:  edlevel
 Time variable:  month

Starting period = 1998m1
Ending period   = 1998m4
Observations    =      6
Number of gaps  =      2
(Gap count includes panel changes)

Two gaps are reported in the sample. We know the second panel is missing the month of March, but
where is the second gap? The note at the bottom of the output is telling us something about panel
changes. Let's use the detail option to get more information:
. tsreport, detail
Panel variable:  edlevel
 Time variable:  month

Starting period = 1998m1
Ending period   = 1998m4
Observations    =      6
Number of gaps  =      2
(Gap count includes panel changes)

Gap report
   Obs.    edlevel     Start        End    N. Obs.
      3          1    1998m4          .          .
      6          2    1998m3     1998m3          1

We now see what is happening. tsreport is counting the change from the first panel to the second
panel as a gap. Look at the output from the list command above. The value of month in observation


4 is not one month later than the value of month in observation 3, so tsreport reports a gap. (If
we are programmers writing a procedure that does not account for panels, a change from one panel
to the next represents a break in the time series just as a gap in the data does.) For the second gap,
tsreport indicates that just one observation is missing because we are only missing the month of
March. This gap is between observations 5 and 6 of the data.
In other cases, we may not care about changes in panels and not want them counted as gaps. We
can use the panel option to specify that tsreport should ignore panel changes:
. tsreport, detail panel
Panel variable:  edlevel
 Time variable:  month

Starting period = 1998m1
Ending period   = 1998m4
Observations    =      6
Number of gaps  =      1

Gap report
   Obs.    edlevel     Start        End    N. Obs.
      5               1998m3     1998m3

tsreport now indicates there is just one gap, corresponding to March for the second panel.

Example 2: Variables with missing data


We asked two large hotels in Las Vegas to record the prices they were quoting people who called
to make reservations. Because these prices change frequently in response to promotions and market
conditions, we asked the hotels to record their prices hourly. Unfortunately, the managers did not
consider us a top priority, so we are missing some data. Our dataset looks like this:
. use http://www.stata-press.com/data/r13/hotelprice
. list, sep(0)

                      hour   price1   price2
  1.    13feb2007 08:00:00      140      245
  2.    13feb2007 09:00:00      155      250
  3.    13feb2007 10:00:00        .      250
  4.    13feb2007 11:00:00      155      250
  5.    13feb2007 12:00:00      160      255
  6.    13feb2007 13:00:00        .        .
  7.    13feb2007 14:00:00      165      255
  8.    13feb2007 15:00:00      170      260
  9.    13feb2007 16:00:00      175      265
 10.    13feb2007 17:00:00      180        .
 11.    13feb2007 20:00:00      190      270

First, let's invoke tsreport without specifying price1 or price2. We will specify the detail
option so that we can see the periods corresponding to the gap or gaps reported:



. tsreport, detail
Time variable:  hour

Starting period = 13feb2007 08:00:00
Ending period   = 13feb2007 20:00:00
Observations    =       11
Number of gaps  =        1

Gap report
    Obs.       Start                  End                    N. Obs.
   10   11     13feb2007 18:00:00     13feb2007 19:00:00

One gap is reported, lasting two periods. We have no data corresponding to 6:00 p.m. and 7:00 p.m.
on February 13, 2007.
What about observations 3, 6, and 10? We are missing data on one or both of the price variables for
those observations, but the time variable itself is present for those observations. By default, tsreport
defines gaps as periods in which no information, not even the time variable itself, is recorded.
If we instead want to obtain information about when one or more variables are missing information,
then we specify those variables in our call to tsreport. Here we specify price1, first without the
detail option:
. tsreport price1
Gap summary report

                                                            Number of
Variable    Start                 End                     Obs.    Gaps
price1      13feb2007 08:00:00    13feb2007 20:00:00         9

The output indicates that we have data on price1 from 8:00 a.m. to 8:00 p.m. However, we only
have 9 observations on price1 during that span because we have 3 gaps in the data. Let's specify
the detail option to find out where:
. tsreport price1, detail
     Variable:  price1
Time variable:  hour

Starting period = 13feb2007 08:00:00
Ending period   = 13feb2007 20:00:00
Observations    =        9
Number of gaps  =        3

Gap report
    Obs.       Start                  End                    N. Obs.
    2    4     13feb2007 10:00:00     13feb2007 10:00:00           1
    5    7     13feb2007 13:00:00     13feb2007 13:00:00           1
   10   11     13feb2007 18:00:00     13feb2007 19:00:00           2

The three gaps correspond to observations 3 and 6, for which price1 is missing, as well as the
two-period gap in the evening when not even the time variable is recorded in the dataset.


When you specify multiple variables with tsreport, by default, it summarizes gaps in each
variable separately. Apart from combining the information into one table, typing
. tsreport price1 price2

is almost the same as typing


. tsreport price1
. tsreport price2

The only difference between the two methods is that the former stores results for both variables in
r-class macros for later use, whereas if you were to type the latter two commands in succession,
r-class macros would only contain results for price2.
In many types of analyses, including linear regression, you can only use observations for which
all the variables contain nonmissing data. Similarly, you can have tsreport report as gaps periods
in which any of the specified variables contain missing values. To do that, you use the casewise
option.

Example 3: Casewise analyses


Continuing with our hotel data, we specify both price1 and price2 in the variable list of
tsreport. We request casewise analysis, and we specify the detail option to get information on
each gap tsreport finds.
. tsreport price1 price2, casewise detail
Variables:      price1 and price2
Time variable:  hour

Starting period = 13feb2007 08:00:00
Ending period   = 13feb2007 20:00:00
Observations    =         8
Number of gaps  =         3

Gap report
      Obs.           Start                 End            N. Obs.
    2      4   13feb2007 10:00:00   13feb2007 10:00:00       1
    5      7   13feb2007 13:00:00   13feb2007 13:00:00       1
    9     11   13feb2007 17:00:00   13feb2007 19:00:00       3

The first gap reported by tsreport corresponds to observation 3, when price1 is missing, and the
second gap corresponds to observation 6, when both price1 and price2 are missing. The third gap
spans 3 observations: the 5:00 p.m. observation is missing for price2, and as we discovered earlier,
not even the time variable is present at 6:00 p.m. and 7:00 p.m.

Video example
Time series, part 1: Formatting dates, tsset, tsreport, and tsfill


Stored results
tsreport, when no varlist is specified or when casewise is specified, stores the following in
r():
Scalars
    r(N_gaps)     number of gaps
    r(N_obs)      number of observations
    r(start)      first time in series
    r(end)        last time in series
Macros
    r(tsfmt)      %fmt of time variable
Matrices
    r(table)      matrix containing start and end times of each gap, if detail is specified

tsreport, when a varlist is specified and casewise is not specified, stores the following in r():
Scalars
    r(N_gaps#)    number of gaps for variable #
    r(N_obs#)     number of observations for variable #
    r(start#)     first time in series for variable #
    r(end#)       last time in series for variable #
Macros
    r(tsfmt)      %fmt of time variable
    r(var#)       name of variable #
Matrices
    r(table#)     matrix containing start and end times of each gap for variable #, if detail is specified

When k variables are specified in varlist, # ranges from 1 to k .
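
The stored results listed above can be retrieved right after tsreport runs. The following is a minimal sketch rather than output from the manual; it reuses the hotelprice dataset from the examples and assumes the result names are written with underscores (r(N_gaps), r(N_obs1), and so on), as Stata result names cannot contain spaces.

```stata
use http://www.stata-press.com/data/r13/hotelprice, clear

* No varlist: results are stored without a numeric suffix
tsreport, detail
display "gaps in the time variable: " r(N_gaps)
matrix list r(table)

* varlist without casewise: results carry a suffix (1 = price1, 2 = price2)
tsreport price1 price2
display "`r(var1)': " r(N_obs1) " obs, " r(N_gaps1) " gaps"
display "`r(var2)': " r(N_obs2) " obs, " r(N_gaps2) " gaps"
```

Typing return list after either call shows everything tsreport left behind in r().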

Also see
[TS] tsset Declare data to be time-series data

Title
tsrevar Time-series operator programming command
Syntax
Description
Options
Remarks and examples
Stored results
Also see

Syntax


  
    tsrevar [varlist] [if] [in] [, substitute list]
You must tsset your data before using tsrevar; see [TS] tsset.

Description
tsrevar, substitute takes a varlist that might contain op.varname combinations and substitutes
equivalent temporary variables for the combinations.
tsrevar, list creates no new variables. It returns in r(varlist) the list of base variables
corresponding to varlist.

Options
substitute specifies that tsrevar resolve op.varname combinations by creating temporary variables
as described above. substitute is the default action taken by tsrevar; you do not need to
specify the option.
list specifies that tsrevar return a list of base variable names.

Remarks and examples


tsrevar substitutes temporary variables for any op.varname combinations in a variable list. For
instance, the original varlist might be gnp L.gnp r, and tsrevar, substitute would create
newvar = L.gnp and create the equivalent varlist gnp newvar r. This new varlist could then be
used with commands that do not otherwise support time-series operators, or it could be used in a
program to make execution faster at the expense of using more memory.
tsrevar, substitute might create no new variables, one new variable, or many new variables,
depending on the number of op.varname combinations appearing in varlist. Any new variables created
are temporary. The new, equivalent varlist is returned in r(varlist). The new varlist corresponds
one to one with the original varlist.
tsrevar, list returns in r(varlist) the list of base variable names of varlist with the time-series
operators removed. tsrevar, list creates no new variables. For instance, if the original
varlist were gnp l.gnp l2.gnp r l.cd, then r(varlist) would contain gnp r cd. This is
useful for programmers who might want to create programs to keep only the variables corresponding
to varlist.


Example 1
. use http://www.stata-press.com/data/r13/tsrevarex
. tsrevar l.gnp d.gnp r

creates two temporary variables containing the values for l.gnp and d.gnp. The variable r appears
in the new variable list but does not require a temporary variable.
The resulting variable list is
. display "`r(varlist)'"
__00014P __00014Q r

(Your temporary variable names may be different, but that is of no consequence.)


We can see the results by listing the new variables alongside the original value of gnp.
. list gnp `r(varlist)' in 1/5

       gnp   __00014P   __00014Q     r
  1.   128          .          .   3.2
  2.   135        128          7   3.8
  3.   132        135         -3   2.6
  4.   138        132          6   3.9
  5.   145        138          7   4.2

Temporary variables automatically vanish when the program concludes.


If we had needed only the base variable names, we could have specified
. tsrevar l.gnp d.gnp r, list
. display "`r(varlist)'"
gnp r

The order of the list will probably differ from that of the original list; base variables are listed only
once and are listed in the order that they appear in the dataset.
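
Because the temporary variables vanish when the calling program ends, tsrevar is normally called from inside a program. The sketch below is one way that might look; showlags is a hypothetical program name invented for illustration, and summarize merely stands in for whatever command will receive the expanded varlist.

```stata
* showlags is a hypothetical name used only for this sketch
program define showlags
    version 13
    syntax varlist(ts)

    * replace each op.varname combination with a temporary variable
    tsrevar `varlist'
    local expanded `r(varlist)'

    * pass the equivalent varlist to another command
    summarize `expanded'
end

use http://www.stata-press.com/data/r13/tsrevarex, clear
showlags gnp l.gnp d.gnp r
```

The ts suboption of syntax is what lets the program accept l.gnp and d.gnp in its varlist; the temporary variables are dropped automatically when showlags finishes.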

Technical note
tsrevar, substitute avoids creating duplicate variables. Consider
. tsrevar gnp l.gnp r cd l.cd l.gnp

l.gnp appears twice in the varlist. tsrevar will create only one new variable for l.gnp and use
that new variable twice in the resulting r(varlist). Moreover, tsrevar will even do this across
multiple calls:
. tsrevar gnp l.gnp cd l.cd
. tsrevar cpi l.gnp

l.gnp appears in two separate calls. At the first call, tsrevar creates a temporary variable corresponding to l.gnp. At the second call, tsrevar remembers what it has done and uses that same
temporary variable for l.gnp again.


Stored results
tsrevar stores the following in r():
Macros
    r(varlist)    the modified variable list or list of base variable names

Also see
[P] syntax Parse Stata syntax
[P] unab Unabbreviate variable list
[U] 11 Language syntax
[U] 11.4.4 Time-series varlists
[U] 18 Programming Stata


Title
tsset Declare data to be time-series data
Syntax
Menu
Description
Options
Remarks and examples
Stored results
References
Also see

Syntax
Declare data to be time series
    tsset timevar [, options]
    tsset panelvar timevar [, options]
Display how data are currently tsset
    tsset
Clear time-series settings
    tsset, clear
In the declare syntax, panelvar identifies the panels and timevar identifies the times.
options          Description

Main
  unitoptions    specify units of timevar
Delta
  deltaoption    specify period of timevar

  noquery        suppress summary calculations and output

noquery is not shown in the dialog box.

unitoptions      Description

(default)        timevar's units to be obtained from timevar's display format
clocktime        timevar is %tc: 0 = 1jan1960 00:00:00.000, 1 = 1jan1960 00:00:00.001, ...
daily            timevar is %td: 0 = 1jan1960, 1 = 2jan1960, ...
weekly           timevar is %tw: 0 = 1960w1, 1 = 1960w2, ...
monthly          timevar is %tm: 0 = 1960m1, 1 = 1960m2, ...
quarterly        timevar is %tq: 0 = 1960q1, 1 = 1960q2, ...
halfyearly       timevar is %th: 0 = 1960h1, 1 = 1960h2, ...
yearly           timevar is %ty: 1960 = 1960, 1961 = 1961, ...
generic          timevar is %tg: 0 = ?, 1 = ?, ...

format(%fmt)     specify timevar's format and then apply default rule

In all cases, negative timevar values are allowed.



deltaoption specifies the period between observations in timevar units and may be specified as

deltaoption            Example

delta(#)               delta(1) or delta(2)
delta((exp))           delta((7*24))
delta(# units)         delta(7 days) or delta(15 minutes) or delta(7 days 15 minutes)
delta((exp) units)     delta((2+3) weeks)

Allowed units for %tc and %tC timevars are

    seconds    second    secs    sec
    minutes    minute    mins    min
    hours      hour
    days       day
    weeks      week

and for all other %t timevars, units specified must match the frequency of the data; for example, for
%ty, units must be year or years.

Menu
Statistics > Time series > Setup and utilities > Declare dataset to be time-series data

Description
tsset declares the data in memory to be a time series. tssetting the data is what makes Stata's
time-series operators such as L. and F. (lag and lead) work; the operators are discussed under
Remarks and examples below. Also, before using the other ts commands, you must tsset the data
first. If you save the data after tsset, the data will be remembered to be time series and you will
not have to tsset again.
There are two syntaxes for setting the data:
tsset timevar
tsset panelvar timevar
In the first syntax, tsset timevar, the data are set to be a straight time series.
In the second syntax, tsset panelvar timevar, the data are set to be a collection of time series,
one for each value of panelvar, also known as panel data, cross-sectional time-series data, and xt data.
Such datasets can be analyzed by xt commands as well as ts commands. If you tsset panelvar
timevar, you do not need to xtset panelvar timevar to use the xt commands.
tsset without arguments (that is, typing just tsset) displays how the data are currently tsset and
sorts the data on timevar or panelvar timevar if they are sorted differently from that.
tsset, clear is a rarely used programmer's command to declare that the data are no longer a
time series.


Options


Main

unitoptions clocktime, daily, weekly, monthly, quarterly, halfyearly, yearly, generic,
and format(%fmt) specify the units in which timevar is recorded.
timevar will usually be a %t variable; see [D] datetime. If timevar already has a %t display format
assigned to it, you do not need to specify a unitoption; tsset will obtain the units from the
format. If you have not yet bothered to assign the appropriate %t format, however, you can use the
unitoptions to tell tsset the units. Then tsset will set timevar's display format for you. Thus,
the unitoptions are convenience options; they allow you to skip formatting the time variable. The
following all have the same net result:
Alternative 1        Alternative 2          Alternative 3

format t %td         (t not formatted)      (t not formatted)
tsset t              tsset t, daily         tsset t, format(%td)

timevar is not required to be a %t variable; it can be any variable of your own concocting so
long as it takes on only integer values. In such cases, it is called generic and considered to be
%tg. Specifying the unitoption generic or attaching a special format to timevar, however, is not
necessary because tsset will assume that the variable is generic if it has any numerical format
other than a %t format (or if it has a %tg format).
clear, used in tsset, clear, makes Stata forget that the data ever were tsset. This is a rarely
used programmer's option.

Delta

delta() specifies the period of timevar and is commonly used when timevar is %tc. delta() is
only sometimes used with the other %t formats or with generic time variables.
If delta() is not specified, delta(1) is assumed. This means that at timevar = 5, the previous
time is timevar = 5 - 1 = 4 and the next time would be timevar = 5 + 1 = 6. Lag and lead
operators, for instance, would work this way. This would be assumed regardless of the units of
timevar.
If you specified delta(2), then at timevar = 5, the previous time would be timevar = 5 - 2 = 3
and the next time would be timevar = 5 + 2 = 7. Lag and lead operators would work this way.
In the observation with timevar = 5, L.price would be the value of price in the observation
for which timevar = 3 and F.price would be the value of price in the observation for which
timevar = 7. If you then add an observation with timevar = 4, the operators will still work
appropriately; that is, at timevar = 5, L.price will still have the value of price at timevar = 3.
There are two aspects of timevar: its units and its periodicity. The unitoptions set the units.
delta() sets the periodicity.
We mentioned that delta() is commonly used with %tc timevars because Stata's %tc variables
have units of milliseconds. If delta() is not specified and in some model you refer to L.price,
you will be referring to the value of price 1 ms ago. Few people have data with periodicity
of a millisecond. Perhaps your data are hourly. You could specify delta(3600000). Or you
could specify delta((60*60*1000)), because delta() will allow expressions if you include an
extra pair of parentheses. Or you could specify delta(1 hour). They all mean the same thing:
timevar has periodicity of 3,600,000 ms. In an observation for which timevar = 1,489,572,000,000
(corresponding to 15mar2007 10:00:00), L.price would be the observation for which timevar =
1,489,572,000,000 - 3,600,000 = 1,489,568,400,000 (corresponding to 15mar2007 9:00:00).
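
The three equivalent ways of writing an hourly period can be tried side by side. This is only a sketch: it borrows the tssetxmpl4 dataset and the clock() conversion from Example 6 below, and the final two display lines simply redo the lag arithmetic from the paragraph above.

```stata
use http://www.stata-press.com/data/r13/tssetxmpl4, clear
generate double t = clock(time, "MDY hm")
format t %tc

* three ways of declaring a period of one hour (3,600,000 ms)
tsset t, delta(3600000)
tsset t, delta((60*60*1000))
tsset t, delta(1 hour)

* the lag arithmetic from the text: one hour before 15mar2007 10:00:00
display %20.0gc tc(15mar2007 10:00:00) - 60*60*1000
display %tc     tc(15mar2007 10:00:00) - 60*60*1000
```

The word form delta(1 hour) is usually the easiest to read and avoids miscounting zeros.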


When you tsset the data and specify delta(), tsset verifies that all the observations follow
the specified periodicity. For instance, if you specified delta(2), then timevar could contain any
subset of {..., -4, -2, 0, 2, 4, ...} or it could contain any subset of {..., -3, -1, 1, 3, ...}.
If timevar contained a mix of values, tsset would issue an error message. If you also specify
a panelvar (you type tsset panelvar timevar, delta(2)), the check is made on each panel
independently. One panel might contain timevar values from one set and the next, another, and
that would be fine.
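
A small made-up dataset shows the periodicity check in action. The values of t and price below are invented for illustration; the first tsset should succeed because 1, 3, and 7 all belong to the odd set, and the second should be rejected once an even value is mixed in.

```stata
clear
input t price
1 10
3 12
7 15
end

* all values are odd, so delta(2) is satisfied (with a gap at t = 5)
tsset t, delta(2)
list t price L.price

* mixing in an even value breaks the periodicity
replace t = 4 in 2
capture noisily tsset t, delta(2)
display "return code: " _rc
```

In the first listing, L.price at t = 3 is the price at t = 1, while L.price at t = 7 is missing because there is no observation at t = 5.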
The following option is available with tsset but is not shown in the dialog box:
noquery prevents tsset from performing most of its summary calculations and suppresses output.
With this option, only the following results are posted:
    r(tdelta)       r(tsfmt)
    r(panelvar)     r(unit)
    r(timevar)      r(unit1)

Remarks and examples


Remarks are presented under the following headings:
Overview
Video example

Overview
tsset sets timevar so that Stata's time-series operators are understood in varlists and expressions.
The time-series operators are
Operator    Meaning

L.          lag x_{t-1}
L2.         2-period lag x_{t-2}
...
F.          lead x_{t+1}
F2.         2-period lead x_{t+2}
...
D.          difference x_t - x_{t-1}
D2.         difference of difference x_t - x_{t-1} - (x_{t-1} - x_{t-2}) = x_t - 2x_{t-1} + x_{t-2}
...
S.          "seasonal" difference x_t - x_{t-1}
S2.         lag-2 (seasonal) difference x_t - x_{t-2}
...

Time-series operators may be repeated and combined. L3.gnp refers to the third lag of variable
gnp, as do LLL.gnp, LL2.gnp, and L2L.gnp. LF.gnp is the same as gnp. DS12.gnp refers to the
one-period difference of the 12-period difference. LDS12.gnp refers to the same concept, lagged
once.
D1. = S1., but D2. ≠ S2., D3. ≠ S3., and so on. D2. refers to the difference of the difference.
S2. refers to the two-period difference. If you wanted the difference of the difference of the 12-period
difference of gnp, you would write D2S12.gnp.
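
The identities described above can be checked on a toy series. The sketch below builds an invented variable gnp equal to t squared (the data are not from the manual) and uses assert, which stops with an error if any comparison fails.

```stata
clear
set obs 30
generate t = _n
generate gnp = t^2          // invented data for illustration
tsset t

* repeated and combined lag operators name the same third lag
assert L3.gnp == LL2.gnp
assert L3.gnp == L2L.gnp

* D1. and S1. agree
assert D.gnp == S.gnp

* D2. is the difference of the difference; S2. is the two-period difference
assert D2.gnp == gnp - 2*L.gnp + L2.gnp
assert S2.gnp == gnp - L2.gnp

* count the observations where D2. and S2. disagree
count if D2.gnp != S2.gnp & !missing(D2.gnp)
```

For this series D2.gnp is constant while S2.gnp grows with t, so the two operators disagree wherever both are defined.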


Operators may be typed in uppercase or lowercase. Most users would type d2s12.gnp instead of
D2S12.gnp.
You may type operators however you wish; Stata internally converts operators to their canonical
form. If you typed ld2ls12d.gnp, Stata would present the operated variable as L2D3S12.gnp.
Stata also understands operator(numlist). to mean a set of operated variables. For instance, typing
L(1/3).gnp in a varlist is the same as typing L.gnp L2.gnp L3.gnp. The operators can also be
applied to a list of variables by enclosing the variables in parentheses; for example,
. list year L(1/3).(gnp cpi)

       year    L.gnp   L2.gnp   L3.gnp   L.cpi   L2.cpi   L3.cpi
  1.   1989        .        .        .       .        .        .
  2.   1990   5452.8        .        .     100        .        .
  3.   1991   5764.9   5452.8        .     105      100        .
  4.   1992   5932.4   5764.9   5452.8     108      105      100
       (output omitted)
  8.   1996   7330.1   6892.2   6519.1     122      119      112


In operator#., making # zero returns the variable itself. L0.gnp is gnp. Thus, you can type list
year l(0/3).gnp to mean list year gnp L.gnp L2.gnp L3.gnp.
The parenthetical notation may be used with any operator. Typing D(1/3).gnp would return the
first through third differences.
The parenthetical notation may be used in operator lists with multiple operators, such as
L(0/3)D2S12.gnp.
Operator lists may include up to one set of parentheses, and the parentheses may enclose a numlist;
see [U] 11.1.8 numlist.
Before you can use these time-series operators, however, the dataset must satisfy two requirements:
1. the dataset must be tsset and
2. the dataset must be sorted by timevar or, if it is a cross-sectional time-series dataset, by panelvar
timevar.
tsset handles both requirements. As you use Stata, however, you may later use a command that
re-sorts that data, and if you do, the time-series operators will not work:
. tsset time
(output omitted )
. regress y x l.x
(output omitted )
. (you continue to use Stata and, sometime later:)
. regress y x l.x
not sorted
r(5);

Then typing tsset without arguments will reestablish the sort order:
. tsset
(output omitted )
. regress y x l.x
(output omitted )

Here typing tsset is the same as typing sort time. Had we previously tsset country time,
however, typing tsset would be the same as typing sort country time. You can type the sort
command or type tsset without arguments; it makes no difference.


There are two syntaxes for setting your data:


tsset timevar
tsset panelvar timevar
In both, timevar must contain integer values. If panelvar is specified, it too must contain integer
values, and the dataset is declared to be a cross-section of time series, such as a collection of time
series for different countries.

Example 1: Numeric time variable


You have monthly data on personal income. Variable t records the time of an observation, but
there is nothing special about the name of the variable. There is nothing special about the values of
the variable, either. t is not required to be a %tm variable; perhaps you do not even know what that
means. t is just a numeric variable containing integer values that represent the month, and we will
imagine that t takes on the values 1, 2, ..., 9, although it could just as well be -3, -2, ..., 5,
or 1,023, 1,024, ..., 1,031. What is important is that the values are dense: adjacent months have a
time value that differs by 1.
. use http://www.stata-press.com/data/r13/tssetxmpl
. list t income

       t   income
  1.   1     1153
  2.   2     1181
       (output omitted)
  9.   9     1282

. tsset t
time variable: t, 1 to 9
delta: 1 unit
. regress income l.income
(output omitted )

Example 2: Adjusting the starting date


In the example above, that t started at 1 was not important. As we said, the t variable could
just as well be recorded -3, -2, ..., 5, or 1,023, 1,024, ..., 1,031. What is important is that the
difference in t between observations be delta() when there are no gaps.
Although how time is measured makes no difference, Stata has formats to display time nicely if
it is recorded in certain ways; you can learn about the formats by seeing [D] datetime. Stata likes
time variables in which 1jan1960 is recorded as 0. In our previous example, if t = 1 corresponds to
July 1995, then we could make a variable that fits Stata's preference by typing
. generate newt = tm(1995m7) + t - 1

tm() is the function that returns a month equivalent; tm(1995m7) evaluates to the constant 426,
meaning 426 months after January 1960. We now have variable newt containing



. list t newt income

       t   newt   income
  1.   1    426     1153
  2.   2    427     1181
  3.   3    428     1208
       (output omitted)
  9.   9    434     1282


If we put a %tm format on newt, it will display more cleanly:


. format newt %tm
. list t newt income

       t     newt   income
  1.   1   1995m7     1153
  2.   2   1995m8     1181
  3.   3   1995m9     1208
       (output omitted)
  9.   9   1996m3     1282


We could now tsset newt rather than t:


. tsset newt
        time variable:  newt, 1995m7 to 1996m3
                delta:  1 month
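
The month arithmetic used in this example can be checked interactively with display. This short sketch uses only the tm() function and the %tm format already introduced; the values in the comments are the ones shown in the listings above.

```stata
* months elapsed since 1960m1
display tm(1995m7)                  // 426

* the same number shown with a monthly format
display %tm 426                     // 1995m7

* t = 9 maps to the last month in the example
display %tm tm(1995m7) + 9 - 1      // 1996m3
```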

Technical note
In addition to monthly, Stata understands clock times (to the millisecond level) as well as daily,
weekly, quarterly, half-yearly, and yearly data. See [D] datetime for a description of these capabilities.
Let's reconsider the previous example, but rather than monthly, let's assume the data are daily,
weekly, etc. The only thing to know is that, corresponding to function tm(), there are functions
td(), tw(), tq(), th(), and ty() and that, corresponding to format %tm, there are formats %td,
%tw, %tq, %th, and %ty. Here is what we would have typed had our data been on a different time
scale:

Daily:        if your t variable had t=1 corresponding to 15mar1993
                  . gen newt = td(15mar1993) + t - 1
                  . tsset newt, daily
Weekly:       if your t variable had t=1 corresponding to 1994w1:
                  . gen newt = tw(1994w1) + t - 1
                  . tsset newt, weekly
Monthly:      if your t variable had t=1 corresponding to 2004m7:
                  . gen newt = tm(2004m7) + t - 1
                  . tsset newt, monthly
Quarterly:    if your t variable had t=1 corresponding to 1994q1:
                  . gen newt = tq(1994q1) + t - 1
                  . tsset newt, quarterly
Half-yearly:  if your t variable had t=1 corresponding to 1921h2:
                  . gen newt = th(1921h2) + t - 1
                  . tsset newt, halfyearly
Yearly:       if your t variable had t=1 corresponding to 1842:
                  . gen newt = 1842 + t - 1
                  . tsset newt, yearly



In each example above, we subtracted one from our time variable in constructing the new time
variable newt because we assumed that our starting time value was 1. For the quarterly example, if
our starting time value were 5 and that corresponded to 1994q1, we would type
. generate newt = tq(1994q1) + t - 5

Had our initial time value been t = 742 and that corresponded to 1994q1, we would have typed
. generate newt = tq(1994q1) + t - 742

Example 3: Time-series data but no time variable


Perhaps we have the same time-series data but no time variable:
. use http://www.stata-press.com/data/r13/tssetxmpl2, clear
. list income

       income
  1.     1153
  2.     1181
  3.     1208
  4.     1272
  5.     1236
  6.     1297
  7.     1265
  8.     1230
  9.     1282


Say that we know that the first observation corresponds to July 1995 and continues without gaps. We
can create a monthly time variable and format it by typing
. generate t = tm(1995m7) + _n - 1
. format t %tm

We can now tsset our dataset and list it:


. tsset t
        time variable:  t, 1995m7 to 1996m3
                delta:  1 month
. list t income

            t   income
  1.   1995m7     1153
  2.   1995m8     1181
  3.   1995m9     1208
       (output omitted)
  9.   1996m3     1282



Example 4: Time variable as a string


Your data might include a time variable that is encoded into a string. In the example below
each monthly observation is identified by string variable yrmo containing the month and year of the
observation, sometimes with punctuation between:
. use http://www.stata-press.com/data/r13/tssetxmpl, clear
. list yrmo income

          yrmo   income
  1.    7/1995     1153
  2.    8/1995     1181
  3.    9-1995     1208
  4.   10,1995     1272
  5.   11 1995     1236
  6.   12 1995     1297
  7.    1/1996     1265
  8.    2.1996     1230
  9.   3- 1996     1282


The first step is to convert the string to a numeric representation. Doing so is easy using the monthly()
function; see [D] datetime.
. gen mdate = monthly(yrmo, "MY")
. list yrmo mdate income

          yrmo   mdate   income
  1.    7/1995     426     1153
  2.    8/1995     427     1181
  3.    9-1995     428     1208
       (output omitted)
  9.   3- 1996     434     1282


Our new variable, mdate, contains the number of months from January 1960. Now that we have
numeric variable mdate, we can tsset the data:
. format mdate %tm
. tsset mdate
        time variable:  mdate, 1995m7 to 1996m3
                delta:  1 month

In fact, we can combine the two and type


. tsset mdate, format(%tm)
time variable: mdate, 1995m7 to 1996m3
delta: 1 month

or type
. tsset mdate, monthly
        time variable:  mdate, 1995m7 to 1996m3
                delta:  1 month


In all cases, we obtain


. list yrmo mdate income

          yrmo     mdate   income
  1.    7/1995    1995m7     1153
  2.    8/1995    1995m8     1181
  3.    9-1995    1995m9     1208
  4.   10,1995   1995m10     1272
  5.   11 1995   1995m11     1236
  6.   12 1995   1995m12     1297
  7.    1/1996    1996m1     1265
  8.    2.1996    1996m2     1230
  9.   3- 1996    1996m3     1282


Stata can translate many different date formats, including strings like 12jan2009; January 12, 2009;
12-01-2009; 01/12/2009; 01/12/09; 12jan2009 8:14; 12-01-2009 13:12; 01/12/09 1:12 pm; Wed Jan
31 13:03:25 CST 2009; 1998q1; and more. See [D] datetime.
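
Before converting a whole string variable, it can help to test the conversion function and its mask on a few literal strings. The sketch below does that with monthly() on two of the yrmo values listed above and, as a further assumption-free illustration, with date() on one of the daily strings just mentioned.

```stata
* monthly() with mask "MY" tolerates varied punctuation between month and year
display monthly("7/1995", "MY")               // 426
display %tm monthly("3- 1996", "MY")          // 1996m3

* date() handles daily dates; the mask gives the order of the components
display %td date("January 12, 2009", "MDY")   // 12jan2009
```

If a conversion function returns missing for a string, the mask usually does not match the order of the components in that string.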

Example 5: Time-series data with gaps


Gaps in the time series cause no difficulties:
. use http://www.stata-press.com/data/r13/tssetxmpl3, clear
. list yrmo income

          yrmo   income
  1.    7/1995     1153
  2.    8/1995     1181
  3.   11 1995     1236
  4.   12 1995     1297
  5.    1/1996     1265
  6.   3- 1996     1282


. gen mdate = monthly(yrmo, "MY")


. tsset mdate, monthly
time variable: mdate, 1995m7 to 1996m3, but with gaps
delta: 1 month

Once the dataset has been tsset, we can use the time-series operators. The D operator specifies first
differences:
. list mdate income d.income

         mdate   income   D.income
  1.    1995m7     1153          .
  2.    1995m8     1181         28
  3.   1995m11     1236          .
  4.   1995m12     1297         61
  5.    1996m1     1265        -32
  6.    1996m3     1282          .



We can use the operators in an expression or varlist context; we do not have to create a new variable
to hold D.income. We can use D.income with the list command, with regress or any other Stata
command that allows time-series varlists.

Example 6: Clock times


We have data from a large hotel in Las Vegas that changes the reservation prices for its rooms
hourly. A piece of the data looks like
. use http://www.stata-press.com/data/r13/tssetxmpl4, clear
. list in 1/5

                   time   price
  1.   02.13.2007 08:00     140
  2.   02.13.2007 09:00     155
  3.   02.13.2007 10:00     160
  4.   02.13.2007 11:00     155
  5.   02.13.2007 12:00     160

Variable time is a string variable. The first step in making this dataset a time-series dataset is to
translate the string to a numeric variable:
. generate double t = clock(time, "MDY hm")
. list in 1/5

                   time   price           t
  1.   02.13.2007 08:00     140   1.487e+12
  2.   02.13.2007 09:00     155   1.487e+12
  3.   02.13.2007 10:00     160   1.487e+12
  4.   02.13.2007 11:00     155   1.487e+12
  5.   02.13.2007 12:00     160   1.487e+12

See [D] datetime for an explanation of what is going on here. clock() is the function that converts
strings to datetime (%tc) values. We typed clock(time, "MDY hm") to convert string variable time,
and we told clock() that the values in time were in the order month, day, year, hour, and minute.
We stored new variable t as a double because time values are large, and doing so is required to
prevent rounding. Even so, the resulting values 1.487e+12 look rounded, but that is only because of
the default display format for new variables. We can see the values better if we change the format:
. format t %20.0gc
. list in 1/5

                   time   price                   t
  1.   02.13.2007 08:00     140   1,486,972,800,000
  2.   02.13.2007 09:00     155   1,486,976,400,000
  3.   02.13.2007 10:00     160   1,486,980,000,000
  4.   02.13.2007 11:00     155   1,486,983,600,000
  5.   02.13.2007 12:00     160   1,486,987,200,000


Even better would be to change the format to %tc, Stata's clock-time format:


. format t %tc
. list in 1/5

                   time   price                    t
  1.   02.13.2007 08:00     140   13feb2007 08:00:00
  2.   02.13.2007 09:00     155   13feb2007 09:00:00
  3.   02.13.2007 10:00     160   13feb2007 10:00:00
  4.   02.13.2007 11:00     155   13feb2007 11:00:00
  5.   02.13.2007 12:00     160   13feb2007 12:00:00

We could drop variable time. New variable t contains the same information as time and t is better
because it is a Stata time variable, the most important property of which being that it is numeric
rather than string. We can tsset it. Here, however, we also need to specify the period with tsset's
delta() option. Stata's time variables are numeric, but they record milliseconds since 01jan1960
00:00:00. By default, tsset uses delta(1), and that means the time-series operators would not
work as we want them to work. For instance, L.price would look back only 1 ms (and find nothing).
We want L.price to look back 1 hour (3,600,000 ms):
. tsset t, delta(1 hour)
        time variable:  t, 13feb2007 08:00:00.000 to 13feb2007 14:00:00.000
                delta:  1 hour
. list t price l.price in 1/5

                        t   price   L.price
  1.   13feb2007 08:00:00     140         .
  2.   13feb2007 09:00:00     155       140
  3.   13feb2007 10:00:00     160       155
  4.   13feb2007 11:00:00     155       160
  5.   13feb2007 12:00:00     160       155


Example 7: Clock times must be double


In the previous example, it was of vital importance that when we generated the %tc variable t,
. generate double t = clock(time, "MDY hm")
we generated it as a double. Let's see what would have happened had we forgotten and just typed
generate t = clock(time, "MDY hm"). Let's go back and start with the same original data:
. use http://www.stata-press.com/data/r13/tssetxmpl4, clear
. list in 1/5

                   time   price
  1.   02.13.2007 08:00     140
  2.   02.13.2007 09:00     155
  3.   02.13.2007 10:00     160
  4.   02.13.2007 11:00     155
  5.   02.13.2007 12:00     160


Remember, variable time is a string variable, and we need to translate it to numeric. So we translate,
but this time we forget to make the new variable a double:
. generate t = clock(time, "MDY hm")
. list in 1/5

                   time   price          t
  1.   02.13.2007 08:00     140   1.49e+12
  2.   02.13.2007 09:00     155   1.49e+12
  3.   02.13.2007 10:00     160   1.49e+12
  4.   02.13.2007 11:00     155   1.49e+12
  5.   02.13.2007 12:00     160   1.49e+12

We see the first difference (t now lists as 1.49e+12 rather than 1.487e+12 as it did previously), but
this is nothing that would catch our attention. We would not even know that the value is different.
Let's continue.
We next put a %20.0gc format on t to better see the numerical values. In fact, that is not something
we would usually do in an analysis. We did that in the example to emphasize to you that the t values
were really big numbers. We will repeat the exercise just to be complete, but in real analysis, we
would not bother.
. format t %20.0gc
. list in 1/5

                   time   price                   t
  1.   02.13.2007 08:00     140   1,486,972,780,544
  2.   02.13.2007 09:00     155   1,486,976,450,560
  3.   02.13.2007 10:00     160   1,486,979,989,504
  4.   02.13.2007 11:00     155   1,486,983,659,520
  5.   02.13.2007 12:00     160   1,486,987,198,464

Okay, we see big numbers in t. Let's continue.


Next we put a %tc format on t, and that is something we would usually do, and you should
always do. You should also list a bit of the data, as we did:
. format t %tc
. list in 1/5

                   time   price                    t
  1.   02.13.2007 08:00     140   13feb2007 07:59:40
  2.   02.13.2007 09:00     155   13feb2007 09:00:50
  3.   02.13.2007 10:00     160   13feb2007 09:59:49
  4.   02.13.2007 11:00     155   13feb2007 11:00:59
  5.   02.13.2007 12:00     160   13feb2007 11:59:58

By now, you should see a problem: the translated datetime values are off by up to a minute. That
was caused by rounding. Dates and times should be the same, not approximately the same, and when
you see a difference like this, you should say to yourself, "The translation is off a little. Why is
that?" and then you should think, "Of course, rounding. I bet that I did not create t as a double."


Let us assume, however, that you do not do this. You instead plow ahead:
. tsset t, delta(1 hour)
time values with period less than delta() found
r(451);

And that is what will happen when you forget to create t as a double. The rounding will cause
uneven period, and tsset will complain.
By the way, it is only important that clock times (%tc and %tC variables) be stored as doubles.
The other date values %td, %tw, %tm, %tq, %th, and %ty are small enough that they can safely be
stored as floats, although forgetting and storing them as doubles does no harm.
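The fix can be sketched as follows. This is a minimal illustration, not part of the original example: the input block retypes three of the observations listed above by hand so that the sketch is self-contained, and the translated clock value is stored as a double so that tsset accepts an hourly delta().

```stata
* Retype three of the observations listed above (hand-entered for illustration)
clear
input str16 time price
"02.13.2007 08:00" 140
"02.13.2007 09:00" 155
"02.13.2007 10:00" 160
end

* Store the %tc value as a double so that no milliseconds are lost to rounding
generate double t = clock(time, "MDY hm")
format t %tc

* Consecutive values now differ by exactly 3,600,000 ms (1 hour)
tsset t, delta(1 hour)
list time price t
```

With the double in place, the %tc listing should show times exactly on the hour rather than a second or two off.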

Technical note
Stata provides two clock-time formats, %tc and %tC. %tC provides a clock with leap seconds. Leap
seconds are occasionally inserted to account for randomness of the earth's rotation, which gradually
slows. Unlike the extra day inserted in leap years, the timing of when leap seconds will be inserted
cannot be foretold. The authorities in charge of such matters announce a leap second approximately
6 months before insertion. Leap seconds are inserted at the end of the day, and the leap second is
called 23:59:60 (that is, 11:59:60 pm), which is then followed by the usual 00:00:00 (12:00:00 am).
Most nonastronomers find these leap seconds vexing. The added seconds cause problems because
of their lack of predictability (knowing how many seconds there will be between 01jan2012 and
01jan2013 is not possible) and because there are not necessarily 24 hours in a day. If you use a
leap-second-adjusted clock, most days have 24 hours, but a few have 24 hours and 1 second. You must
look at a table to find out.
From a time-series analysis point of view, the nonconstant day causes the most problems. Let's
say that you have data on blood pressure, taken hourly at 1:00, 2:00, ..., and that you have tsset
your data with delta(1 hour). On most days, L24.bp would be blood pressure at the same time
yesterday. If the previous day had a leap second, however, and your data were recorded using a
leap-second-adjusted clock, there would be no observation L24.bp because 86,400 seconds before
the current reading does not correspond to an on-the-hour time; 86,401 seconds before the current
reading corresponds to yesterday's time. Thus, whenever possible, using Stata's %tc encoding rather
than %tC is better.
When times are recorded by computers using leap-second-adjusted clocks, however, avoiding %tC
is not possible. For performing most time-series analysis, the recommended procedure is to map the
%tC values to %tc and then tsset those. You must ask yourself whether the process you are studying
is based on the clock (the nurse does something at 2 o'clock every day) or the true passage of
time (the emitter spits out an electron every 86,400,000 ms).
When dealing with computer-recorded times, first find out whether the computer (and its
time-recording software) uses a leap-second-adjusted clock. If it does, translate that to a %tC value. Then
use function cofC() to convert to a %tc value and tsset that. If variable T contains the %tC value,
. gen double t = cofC(T)
. format t %tc
. tsset t, delta(...)

Function cofC() moves leap seconds forward: 23:59:60 becomes 00:00:00 of the next day.
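As a small illustration of the two encodings, the sketch below converts one %tc value to %tC with Cofc() and back again with cofC(). The date is arbitrary and the scalar names are ours; the reported difference depends on the leap-second table that your copy of Stata uses, so no output is shown.

```stata
* An arbitrary %tc value (no leap seconds counted)
scalar t_c = tc(01jan2013 00:00:00)

* The same instant on a leap-second-adjusted clock (%tC)
scalar t_C = Cofc(scalar(t_c))

* Number of leap seconds that Stata counts before this instant
display (scalar(t_C) - scalar(t_c))/1000

* Converting back with cofC() should recover the original %tc value
display %tc cofC(scalar(t_C))
```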


Panel data
Example 8: Time-series data for multiple groups
Assume that we have a time series on average annual income and that we have the series for two
groups: individuals who have not completed high school (edlevel = 1) and individuals who have
(edlevel = 2).
. use http://www.stata-press.com/data/r13/tssetxmpl5, clear
. list edlevel year income, sep(0)

       edlevel   year   income
  1.         1   1988    14500
  2.         1   1989    14750
  3.         1   1990    14950
  4.         1   1991    15100
  5.         2   1989    22100
  6.         2   1990    22200
  7.         2   1992    22800

We declare the data to be a panel by typing


. tsset edlevel year, yearly
panel variable: edlevel, (unbalanced)
time variable: year, 1988 to 1992, but with a gap
delta: 1 year

Having tsset the data, we can now use time-series operators. The difference operator, for example,
can be used to list annual changes in income:
. list edlevel year income d.income, sep(0)

       edlevel   year   income   D.income
  1.         1   1988    14500          .
  2.         1   1989    14750        250
  3.         1   1990    14950        200
  4.         1   1991    15100        150
  5.         2   1989    22100          .
  6.         2   1990    22200        100
  7.         2   1992    22800          .

We see that in addition to producing missing values due to missing times, the difference operator
correctly produced a missing value at the start of each panel. Once we have tsset our panel data,
we can use time-series operators and be assured that they will handle missing time periods and panel
changes correctly.
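The same operators can be used when creating variables. The sketch below retypes the seven observations listed above by hand (so that it does not depend on the downloaded file) and stores the annual change in a new variable; the name change is ours.

```stata
* Hand-typed copy of the seven observations listed above
clear
input edlevel year income
1 1988 14500
1 1989 14750
1 1990 14950
1 1991 15100
2 1989 22100
2 1990 22200
2 1992 22800
end

tsset edlevel year, yearly

* D.income is missing at the start of each panel and after the 1991 gap
generate change = D.income
list edlevel year income change, sepby(edlevel)
```

The new variable should match the D.income column shown above, with three missing values.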


Video example
Time series, part 1: Formatting dates, tsset, tsreport, and tsfill

Stored results
tsset stores the following in r():
Scalars
    r(imin)        minimum panel ID
    r(imax)        maximum panel ID
    r(tmin)        minimum time
    r(tmax)        maximum time
    r(tdelta)      delta
Macros
    r(panelvar)    name of panel variable
    r(timevar)     name of time variable
    r(tdeltas)     formatted delta
    r(tmins)       formatted minimum time
    r(tmaxs)       formatted maximum time
    r(tsfmt)       %fmt of time variable
    r(unit)        units of time variable: Clock, clock, daily, weekly, monthly, quarterly,
                   halfyearly, yearly, or generic
    r(unit1)       units of time variable: C, c, d, w, m, q, h, y, or ""
    r(balanced)    unbalanced, weakly balanced, or strongly balanced; a set of panels
                   are strongly balanced if they all have the same time values, otherwise
                   balanced if same number of time values, otherwise unbalanced
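A brief sketch of how these results might be read back after tsset. The four-observation panel and the variable names id, year, and y are hypothetical, made up only so that the example runs on its own.

```stata
* Hypothetical two-panel dataset, two years each
clear
input id year y
1 2001 10
1 2002 12
2 2001 20
2 2002 25
end

quietly tsset id year, yearly

* Macros are read with `r(name)'; scalars with r(name)
display "panel variable: `r(panelvar)'"
display "time variable:  `r(timevar)'"
display "time range:     " r(tmin) " to " r(tmax)
display "balance:        `r(balanced)'"

* Show everything tsset left behind in r()
return list
```

Because display is not an r-class command, the results stored by tsset remain available for the later lines.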

References
Baum, C. F. 2000. sts17: Compacting time series data. Stata Technical Bulletin 57: 44-45. Reprinted in Stata Technical
Bulletin Reprints, vol. 10, pp. 369-370. College Station, TX: Stata Press.
Cox, N. J. 2010. Stata tip 68: Week assumptions. Stata Journal 10: 682-685.
Cox, N. J. 2012. Stata tip 111: More on working with weeks. Stata Journal 12: 565-569.

Also see
[TS] tsfill Fill in gaps in time variable

Title
tssmooth Smooth and forecast univariate time-series data
Syntax

Description

Remarks and examples

References

Also see

Syntax
tssmooth smoother [type] newvar = exp [if] [in] [, ...]

Smoother category                 smoother
Moving average
    with uniform weights          ma
    with specified weights        ma
Recursive
    exponential                   exponential
    double exponential            dexponential
    nonseasonal Holt-Winters      hwinters
    seasonal Holt-Winters         shwinters
Nonlinear filter                  nl

See [TS] tssmooth ma, [TS] tssmooth exponential, [TS] tssmooth dexponential,
[TS] tssmooth hwinters, [TS] tssmooth shwinters, and [TS] tssmooth nl.

Description
tssmooth creates new variable newvar and fills it in by passing the specified expression (usually
a variable name) through the requested smoother.

Remarks and examples


The recursive smoothers may also be used for forecasting univariate time series; indeed, the
Holt-Winters methods are used almost exclusively for this. All can perform dynamic out-of-sample
forecasts, and the smoothing parameters may be chosen to minimize the in-sample sum-of-squared
prediction errors.
The moving-average and nonlinear smoothers are generally used to extract the trend (or signal)
from a time series while omitting the high-frequency or noise components.
All smoothers work both with time-series data and panel data. When used with panel data, the
calculation is performed separately within panel.
Several texts provide good introductions to the methods available in tssmooth. Chatfield (2004)
discusses how these methods fit into time-series analysis in general. Abraham and Ledolter (1983);
Montgomery, Johnson, and Gardiner (1990); Bowerman, O'Connell, and Koehler (2005); and
Chatfield (2001) discuss using these methods for modern time-series forecasting. Becketti (2013) includes
a Stata-centric discussion of these techniques. As he emphasizes, these methods often work as well as
more complicated methods and are easier to explain to lay audiences. Do not dismiss these techniques
as being too simplistic or inferior.
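To give a feel for the common syntax, here is a short sketch that applies three of the smoothers to the same series. It assumes the sales1 dataset used elsewhere in this manual is reachable and already tsset; the new variable names and the parameter values are arbitrary choices of ours.

```stata
use http://www.stata-press.com/data/r13/sales1, clear

* Moving average with uniform weights: 2 lags, the current value, 2 leads
tssmooth ma sm_ma = sales, window(2 1 2)

* Single-exponential smoother with a user-chosen smoothing parameter
tssmooth exponential sm_exp = sales, parms(.4)

* Double-exponential smoother with the same parameter
tssmooth dexponential sm_dexp = sales, parms(.4)

* Compare the first few smoothed values with the original series
list sales sm_ma sm_exp sm_dexp in 1/5
```

Each call follows the same pattern, tssmooth smoother newvar = exp, and differs only in the smoother name and its options.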

References
Abraham, B., and J. Ledolter. 1983. Statistical Methods for Forecasting. New York: Wiley.
Becketti, S. 2013. Introduction to Time Series Using Stata. College Station, TX: Stata Press.
Bowerman, B. L., R. T. O'Connell, and A. B. Koehler. 2005. Forecasting, Time Series, and Regression: An Applied
Approach. 4th ed. Pacific Grove, CA: Brooks/Cole.
Chatfield, C. 2001. Time-Series Forecasting. London: Chapman & Hall/CRC.
Chatfield, C. 2004. The Analysis of Time Series: An Introduction. 6th ed. Boca Raton, FL: Chapman & Hall/CRC.
Chatfield, C., and M. Yar. 1988. Holt-Winters forecasting: Some practical issues. Statistician 37: 129-140.
Holt, C. C. 2004. Forecasting seasonals and trends by exponentially weighted moving averages. International Journal
of Forecasting 20: 5-10.
Montgomery, D. C., L. A. Johnson, and J. S. Gardiner. 1990. Forecasting and Time Series Analysis. 2nd ed. New
York: McGraw-Hill.
Winters, P. R. 1960. Forecasting sales by exponentially weighted moving averages. Management Science 6: 324-342.

Also see
[TS] tsset Declare data to be time-series data
[TS] arima ARIMA, ARMAX, and other dynamic regression models
[TS] sspace State-space models
[TS] tsfilter Filter a time-series, keeping only selected periodicities
[R] smooth Robust nonlinear smoother

Title
tssmooth dexponential Double-exponential smoothing
Syntax
Remarks and examples
Also see

Menu
Stored results

Description
Methods and formulas

Options
References

Syntax
tssmooth dexponential [type] newvar = exp [if] [in] [, options]

options            Description
Main
    replace        replace newvar if it already exists
    parms(#α)      use #α as smoothing parameter
    samp0(#)       use # observations to obtain initial values for recursions
    s0(#1 #2)      use #1 and #2 as initial values for recursions
    forecast(#)    use # periods for the out-of-sample forecast

You must tsset your data before using tssmooth dexponential; see [TS] tsset.
exp may contain time-series operators; see [U] 11.4.4 Time-series varlists.

Menu
Statistics > Time series > Smoothers/univariate forecasters > Double-exponential smoothing

Description
tssmooth dexponential models the trend of a variable whose difference between changes from
the previous values is serially correlated. More precisely, it models a variable whose second difference
follows a low-order, moving-average process.

Options


Main

replace replaces newvar if it already exists.


parms(#α) specifies the parameter α for the double-exponential smoothers; 0 < #α < 1. If
parms(#α) is not specified, the smoothing parameter is chosen to minimize the in-sample
sum-of-squared forecast errors.
samp0(#) and s0(#1 #2 ) are mutually exclusive ways of specifying the initial values for the recursion.
By default, initial values are obtained by fitting a linear regression with a time trend, using the
first half of the observations in the dataset; see Remarks and examples.
samp0(#) specifies that the first # be used in that regression.
s0(#1 #2 ) specifies that #1 #2 be used as initial values.

forecast(#) specifies the number of periods for the out-of-sample prediction; 0 ≤ # ≤ 500. The
default is forecast(0), which is equivalent to not performing an out-of-sample forecast.

Remarks and examples


The double-exponential smoothing procedure is designed for series that can be locally approximated
as

$$\hat{x}_t = m_t + b_t t$$

where $\hat{x}_t$ is the smoothed or predicted value of the series $x$, and the terms $m_t$ and $b_t$ change over time.
Abraham and Ledolter (1983), Bowerman, O'Connell, and Koehler (2005), and Montgomery, Johnson,
and Gardiner (1990) all provide good introductions to double-exponential smoothing. Chatfield (2001,
2004) provides helpful discussions of how double-exponential smoothing relates to modern time-series
methods.
The double-exponential method has been used both as a smoother and as a prediction method.
[TS] tssmooth exponential shows that the single-exponential smoothed series is given by

$$S_t = \alpha x_t + (1 - \alpha) S_{t-1}$$

where $\alpha$ is the smoothing constant and $x_t$ is the original series. The double-exponential smoother is
obtained by smoothing the smoothed series,

$$S_t^{[2]} = \alpha S_t + (1 - \alpha) S_{t-1}^{[2]}$$

Values of $S_0$ and $S_0^{[2]}$ are necessary to begin the process. Per Montgomery, Johnson, and
Gardiner (1990), the default method is to obtain $S_0$ and $S_0^{[2]}$ from a regression of the first $N_{\rm pre}$ values
of $x_t$ on $\tilde{t} = (1, \ldots, N_{\rm pre} - t_0)'$. By default, $N_{\rm pre}$ is equal to one-half the number of observations
in the sample. $N_{\rm pre}$ can be specified using the samp0() option.
The values of $S_0$ and $S_0^{[2]}$ can also be specified using the option s0().

Example 1: Smoothing a locally trending series


Suppose that we had some data on the monthly sales of a book and that we wanted to smooth
this series. The graph below illustrates that this series is locally trending over time, so we would not
want to use single-exponential smoothing.

[Figure: "Monthly book sales"; y axis: Sales, x axis: Time]


The following example illustrates that double-exponential smoothing is simply smoothing the
smoothed series. Because the starting values are treated as time-zero values, we actually lose 2
observations when smoothing the smoothed series.
. use http://www.stata-press.com/data/r13/sales2
. tssmooth exponential double sm1=sales, p(.7) s0(1031)
exponential coefficient  =     0.7000
sum-of-squared residuals =      13923
root mean squared error  =     13.192
. tssmooth exponential double sm2=sm1, p(.7) s0(1031)
exponential coefficient  =     0.7000
sum-of-squared residuals =     7698.6
root mean squared error  =     9.8098
. tssmooth dexponential double sm2b=sales, p(.7) s0(1031 1031)
double-exponential coefficient =     0.7000
sum-of-squared residuals       =     3724.4
root mean squared error        =     6.8231
. generate double sm2c = f2.sm2
(2 missing values generated)
. list sm2b sm2c in 1/10

             sm2b        sm2c
  1.         1031        1031
  2.    1028.3834   1028.3834
  3.    1030.6306   1030.6306
  4.    1017.8182   1017.8182
  5.     1022.938    1022.938
  6.    1026.0752   1026.0752
  7.    1041.8587   1041.8587
  8.    1042.8341   1042.8341
  9.    1035.9571   1035.9571
 10.    1030.6651   1030.6651

The double-exponential method can also be viewed as a forecasting mechanism. The exponential
forecast method is a constrained version of the Holt-Winters method implemented in [TS] tssmooth
hwinters (as discussed by Gardner [1985] and Chatfield [2001]). Chatfield (2001) also notes that the
double-exponential method arises when the underlying model is an ARIMA(0,2,2) with equal roots.
This method produces predictions $\hat{x}_t$ for $t = t_1, \ldots, T +$ forecast(). These predictions are
obtained as a function of the smoothed series and the smoothed-smoothed series. For $t \in [t_0, T]$,

$$\hat{x}_t = \left(2 + \frac{\alpha}{1 - \alpha}\right) S_t - \left(1 + \frac{\alpha}{1 - \alpha}\right) S_t^{[2]}$$

where $S_t$ and $S_t^{[2]}$ are as given above.
The out-of-sample predictions are obtained as a function of the constant term, the linear term of the
smoothed series at the last observation in the sample, and time. The constant term is
$a_T = 2S_T - S_T^{[2]}$, and the linear term is $b_T = \frac{\alpha}{1 - \alpha}(S_T - S_T^{[2]})$. The $\tau$th-step-ahead
out-of-sample prediction (that is, the prediction for $t = T + \tau$) is given by

$$\hat{x}_t = a_T + \tau b_T$$
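One way to see how the in-sample and out-of-sample expressions fit together is to regroup the in-sample formula. In the short derivation below, $a_t$ and $b_t$ denote the constant and linear terms evaluated at period $t$ rather than at $T$; that notation is introduced here for illustration only.

```latex
\begin{aligned}
\hat{x}_t
  &= \left(2 + \frac{\alpha}{1-\alpha}\right) S_t
     - \left(1 + \frac{\alpha}{1-\alpha}\right) S_t^{[2]} \\
  &= \left(2 S_t - S_t^{[2]}\right)
     + \frac{\alpha}{1-\alpha}\left(S_t - S_t^{[2]}\right) \\
  &= a_t + b_t
\end{aligned}
```

So the in-sample prediction has the same form as the out-of-sample prediction with $\tau = 1$, using the constant and linear terms available at period $t$.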


Example 2: Forecasting a locally trending series


Specifying the forecast option puts the double-exponential forecast into the new variable instead
of the double-exponential smoothed series. The code given below uses the smoothed series sm1 and
sm2 that were generated above to illustrate how the double-exponential forecasts are computed.
. tssmooth dexponential double f1=sales, p(.7) s0(1031 1031) forecast(4)
double-exponential coefficient =     0.7000
sum-of-squared residuals       =      20737
root mean squared error        =       16.1
. generate double xhat = (2 + .7/.3) * sm1 - (1 + .7/.3)* f.sm2
(5 missing values generated)
. list xhat f1 in 1/10

             xhat          f1
  1.         1031        1031
  2.         1031        1031
  3.     1023.524    1023.524
  4.    1034.8039   1034.8039
  5.     994.0237    994.0237
  6.    1032.4463   1032.4463
  7.    1031.9015   1031.9015
  8.    1071.1709   1071.1709
  9.    1044.6454   1044.6454
 10.    1023.1855   1023.1855

Example 3: Choosing an optimal parameter to forecast


Generally, when you are forecasting, you do not know the smoothing parameter. tssmooth
dexponential computes the double-exponential forecasts of a series and obtains the optimal smoothing
parameter by finding the smoothing parameter that minimizes the in-sample sum-of-squared forecast
errors.
. tssmooth dexponential f2=sales, forecast(4)
computing optimal double-exponential coefficient (0,1)
optimal double-exponential coefficient =     0.3631
sum-of-squared residuals               =  16075.805
root mean squared error                =  14.175598

The following graph describes the fit that we obtained by applying the double-exponential forecast
method to our sales data. The out-of-sample dynamic predictions are not constant, as in the
single-exponential case.



. line f2 sales t, title("Double exponential forecast with optimal alpha")
> ytitle(Sales) xtitle(time)

[Figure: "Double exponential forecast with optimal alpha"; y axis: Sales, x axis: time;
series plotted: dexpc(0.3631) = sales, and sales]

tssmooth dexponential automatically detects panel data from the information provided when
the dataset was tsset. The starting values are chosen separately for each series. If the smoothing
parameter is chosen to minimize the sum-of-squared prediction errors, the optimization is performed
separately on each panel. The stored results contain the results from the last panel. Missing values at
the beginning of the sample are excluded from the sample. After at least one value has been found,
missing values are filled in using the one-step-ahead predictions from the previous period.

Stored results
tssmooth dexponential stores the following in r():
Scalars
    r(N)           number of observations
    r(alpha)       smoothing parameter
    r(rss)         sum-of-squared errors
    r(rmse)        root mean squared error
    r(N_pre)       number of observations used in calculating starting values, if starting values calculated
    r(s2_0)        initial value for linear term, i.e., $S_0^{[2]}$
    r(s1_0)        initial value for constant term, i.e., $S_0$
    r(linear)      final value of linear term
    r(constant)    final value of constant term
    r(period)      period, if filter is seasonal
Macros
    r(method)      smoothing method
    r(exp)         expression specified
    r(timevar)     time variable specified in tsset
    r(panelvar)    panel variable specified in tsset

Methods and formulas


A truncated description of the specified double-exponential filter is used to label the new variable.
See [D] label for more information on labels.


An untruncated description of the specified double-exponential filter is saved in the characteristic
tssmooth for the new variable. See [P] char for more information on characteristics.
The updating equations for the smoothing and forecasting versions are as given previously.
The starting values for both the smoothing and forecasting versions of double-exponential are
obtained using the same method, which begins with the model

$$x_t = \beta_0 + \beta_1 t$$

where $x_t$ is the series to be smoothed and $t$ is a time variable that has been normalized to equal 1 in
the first period included in the sample. The regression coefficient estimates $\hat\beta_0$ and $\hat\beta_1$ are obtained
via OLS. The sample is determined by the option samp0(). By default, samp0() includes the first
half of the observations. Given the estimates $\hat\beta_0$ and $\hat\beta_1$, the starting values are

$$S_0 = \hat\beta_0 - \{(1 - \alpha)/\alpha\}\hat\beta_1$$
$$S_0^{[2]} = \hat\beta_0 - 2\{(1 - \alpha)/\alpha\}\hat\beta_1$$
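The starting-value formulas can be traced by hand with regress. The sketch below is an illustration under stated assumptions, not the command's internal code: it takes "the first half" to be the first floor(N/2) observations, assumes the data are sorted by time with no gaps, and uses α = 0.7 as in the examples. The names tnorm, s0_1, and s0_2 are ours.

```stata
use http://www.stata-press.com/data/r13/sales2, clear

* Time index normalized to 1 in the first period (assumes no gaps)
generate long tnorm = _n

* OLS of the series on time, using the first half of the observations
local npre = floor(_N/2)
regress sales tnorm in 1/`npre'

* Starting values implied by the two formulas above for alpha = 0.7
local alpha = .7
scalar s0_1 = _b[_cons] - ((1 - `alpha')/`alpha')*_b[tnorm]
scalar s0_2 = _b[_cons] - 2*((1 - `alpha')/`alpha')*_b[tnorm]
display "S0 = " scalar(s0_1) "   S0[2] = " scalar(s0_2)
```

If the assumptions match the command's defaults, these values should be close to r(s1_0) and r(s2_0) after running tssmooth dexponential with parms(.7) and no s0() option.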

References
Abraham, B., and J. Ledolter. 1983. Statistical Methods for Forecasting. New York: Wiley.
Bowerman, B. L., R. T. O'Connell, and A. B. Koehler. 2005. Forecasting, Time Series, and Regression: An Applied
Approach. 4th ed. Pacific Grove, CA: Brooks/Cole.
Chatfield, C. 2001. Time-Series Forecasting. London: Chapman & Hall/CRC.
Chatfield, C. 2004. The Analysis of Time Series: An Introduction. 6th ed. Boca Raton, FL: Chapman & Hall/CRC.
Chatfield, C., and M. Yar. 1988. Holt-Winters forecasting: Some practical issues. Statistician 37: 129-140.
Gardner, E. S., Jr. 1985. Exponential smoothing: The state of the art. Journal of Forecasting 4: 1-28.
Holt, C. C. 2004. Forecasting seasonals and trends by exponentially weighted moving averages. International Journal
of Forecasting 20: 5-10.
Montgomery, D. C., L. A. Johnson, and J. S. Gardiner. 1990. Forecasting and Time Series Analysis. 2nd ed. New
York: McGraw-Hill.
Winters, P. R. 1960. Forecasting sales by exponentially weighted moving averages. Management Science 6: 324-342.

Also see
[TS] tsset Declare data to be time-series data
[TS] tssmooth Smooth and forecast univariate time-series data

Title
tssmooth exponential Single-exponential smoothing
Syntax
Remarks and examples
Also see

Menu
Stored results

Description
Methods and formulas

Options
References

Syntax
tssmooth exponential [type] newvar = exp [if] [in] [, options]

options            Description
Main
    replace        replace newvar if it already exists
    parms(#α)      use #α as smoothing parameter
    samp0(#)       use # observations to obtain initial value for recursion
    s0(#)          use # as initial value for recursion
    forecast(#)    use # periods for the out-of-sample forecast

You must tsset your data before using tssmooth exponential; see [TS] tsset.
exp may contain time-series operators; see [U] 11.4.4 Time-series varlists.

Menu
Statistics > Time series > Smoothers/univariate forecasters > Single-exponential smoothing

Description
tssmooth exponential models the trend of a variable whose change from the previous value
is serially correlated. More precisely, it models a variable whose first difference follows a low-order,
moving-average process.

Options


Main

replace replaces newvar if it already exists.


parms(#α) specifies the parameter α for the exponential smoother; 0 < #α < 1. If parms(#α)
is not specified, the smoothing parameter is chosen to minimize the in-sample sum-of-squared
forecast errors.
samp0(#) and s0(#) are mutually exclusive ways of specifying the initial value for the recursion.
samp0(#) specifies that the initial value be obtained by calculating the mean over the first #
observations of the sample.
s0(#) specifies the initial value to be used.
If neither option is specified, the default is to use the mean calculated over the first half of the
sample.
forecast(#) gives the number of observations for the out-of-sample prediction; 0 ≤ # ≤ 500. The
default value is forecast(0) and is equivalent to not forecasting out of sample.

Remarks and examples


Introduction
Examples
Treatment of missing values

Introduction
Exponential smoothing can be viewed either as an adaptive-forecasting algorithm or, equivalently,
as a geometrically weighted moving-average filter. Exponential smoothing is most appropriate when
used with time-series data that exhibit no linear or higher-order trends but that do exhibit
low-velocity, aperiodic variation in the mean. Abraham and Ledolter (1983), Bowerman, O'Connell, and
Koehler (2005), and Montgomery, Johnson, and Gardiner (1990) all provide good introductions to
single-exponential smoothing. Chatfield (2001, 2004) discusses how single-exponential smoothing
relates to modern time-series methods. For example, simple exponential smoothing produces optimal
forecasts for several underlying models, including ARIMA(0,1,1) and the random-walk-plus-noise
state-space model. (See Chatfield [2001, sec. 4.3.1].)
The exponential filter with smoothing parameter $\alpha$ creates the series $S_t$, where

$$S_t = \alpha X_t + (1 - \alpha) S_{t-1} \qquad \text{for } t = 1, \ldots, T$$

and $S_0$ is the initial value. This is the adaptive forecast-updating form of the exponential smoother.
This implies that

$$S_T = \alpha \sum_{k=0}^{T-1} (1 - \alpha)^k X_{T-k} + (1 - \alpha)^T S_0$$

which is the weighted moving-average representation, with geometrically declining weights. The
choice of the smoothing constant $\alpha$ determines how quickly the smoothed series or forecast will adjust
to changes in the mean of the unfiltered series. For small values of $\alpha$, the response will be slow
because more weight is placed on the previous estimate of the mean of the unfiltered series, whereas
larger values of $\alpha$ will put more emphasis on the most recently observed value of the unfiltered series.
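The recursion can be reproduced by hand. The sketch below is an illustration, not Stata's internal implementation. It assumes, as the listing in example 4 suggests, that tssmooth exponential stores in period t the one-step-ahead prediction for that period (that is, S from the previous period), so that the first observation equals the initial value. The variable names and the choice s0(1031) are ours.

```stata
use http://www.stata-press.com/data/r13/sales1, clear

* Built-in smoother with alpha = 0.4 and initial value S_0 = 1031
tssmooth exponential double sm = sales, parms(.4) s0(1031)

* Hand-rolled recursion: the value stored at t combines the previous
* observation and the previous stored value
* (assumes the data are sorted by time with no gaps, as after tsset)
generate double manual = 1031 in 1
replace manual = .4*sales[_n-1] + .6*manual[_n-1] in 2/l

* The two columns should agree if the assumed convention holds
list t sales sm manual in 1/5
```

Because replace works through the observations in order, each new value of manual can use the value just computed for the previous observation.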

Examples
Example 1: Smoothing a series for specified parameters
Let's consider some examples using sales data. Here we forecast sales for three periods with a
smoothing parameter of 0.4:
. use http://www.stata-press.com/data/r13/sales1
. tssmooth exponential sm1=sales, parms(.4) forecast(3)
exponential coefficient  =     0.4000
sum-of-squared residuals =       8345
root mean squared error  =     12.919

To compare our forecast with the actual data, we graph the series and the forecasted series over
time.



. line sm1 sales t, title("Single exponential forecast")
> ytitle(Sales) xtitle(Time)

[Figure: "Single exponential forecast"; y axis: Sales, x axis: Time;
series plotted: exp parms(0.4000) = sales, and sales]

The graph indicates that our forecasted series may not be adjusting rapidly enough to the changes
in the actual series. The smoothing parameter α controls the rate at which the forecast adjusts.
Smaller values of α adjust the forecasts more slowly. Thus we suspect that our chosen value of 0.4
is too small. One way to investigate this suspicion is to ask tssmooth exponential to choose the
smoothing parameter that minimizes the sum-of-squared forecast errors.
. tssmooth exponential sm2=sales, forecast(3)
computing optimal exponential coefficient (0,1)
optimal exponential coefficient =     0.7815
sum-of-squared residuals        =  6727.7056
root mean squared error         =  11.599746

The output suggests that the value of α = 0.4 is too small. The graph below indicates that the
new forecast tracks the series much more closely than the previous forecast.
. line sm2 sales t, title("Single exponential forecast with optimal alpha")
> ytitle(sales) xtitle(Time)

[Figure: "Single exponential forecast with optimal alpha"; y axis: Sales, x axis: Time;
series plotted: parms(0.7815) = sales, and sales]


We noted above that simple exponential forecasts are optimal for an ARIMA(0,1,1) model. (See
[TS] arima for fitting ARIMA models in Stata.) Chatfield (2001, 90) gives the following useful
derivation that relates the MA coefficient in an ARIMA(0,1,1) model to the smoothing parameter in
single-exponential smoothing. An ARIMA(0,1,1) is given by

$$x_t - x_{t-1} = \epsilon_t + \theta \epsilon_{t-1}$$

where $\epsilon_t$ is an identically and independently distributed white-noise error term. Thus given $\hat\theta$, an
estimate of $\theta$, an optimal one-step prediction of $\hat{x}_{t+1}$ is $\hat{x}_{t+1} = x_t + \hat\theta \epsilon_t$. Because $\epsilon_t$ is not observable,
it can be replaced by

$$\hat\epsilon_t = x_t - \hat{x}_{t-1}$$

yielding

$$\hat{x}_{t+1} = x_t + \hat\theta (x_t - \hat{x}_{t-1})$$

Letting $\hat\alpha = 1 + \hat\theta$ and doing more rearranging implies that

$$\hat{x}_{t+1} = (1 + \hat\theta) x_t - \hat\theta \hat{x}_{t-1}$$
$$\hat{x}_{t+1} = \hat\alpha x_t + (1 - \hat\alpha) \hat{x}_{t-1}$$

Example 2: Comparing ARIMA to exponential smoothing


Let's compare the estimate of the optimal smoothing parameter of 0.7815 with the one we could
obtain using [TS] arima. Below we fit an ARIMA(0,1,1) to the sales data and then recover the estimate
of α. The two estimates of α are quite close, given the large estimated standard error of $\hat\theta$.
. arima sales, arima(0,1,1)
(setting optimization to BHHH)
Iteration 0:   log likelihood = -189.91037
Iteration 1:   log likelihood = -189.62405
Iteration 2:   log likelihood = -189.60468
Iteration 3:   log likelihood = -189.60352
Iteration 4:   log likelihood = -189.60343
(switching optimization to BFGS)
Iteration 5:   log likelihood = -189.60342
ARIMA regression
Sample: 2 - 50                                  Number of obs   =         49
                                                Wald chi2(1)    =       1.41
Log likelihood = -189.6034                      Prob > chi2     =     0.2347

                              OPG
     D.sales       Coef.   Std. Err.       z    P>|z|     [95% Conf. Interval]
sales
       _cons    .5025469   1.382727     0.36   0.716    -2.207548    3.212641
ARMA
          ma
         L1.   -.1986561   .1671699    -1.19   0.235    -.5263031    .1289908
      /sigma    11.58992   1.240607     9.34   0.000     9.158378    14.02147

Note: The test of the variance against zero is one sided, and the two-sided
confidence interval is truncated at zero.
. di 1 + _b[ARMA:L.ma]
.80134387


Example 3: Handling panel data


tssmooth exponential automatically detects panel data. Suppose that we had sales figures for
five companies in long form. Running tssmooth exponential on the variable that contains all five
series puts the smoothed series and the predictions in one variable in long form. When the smoothing
parameter is chosen to minimize the squared prediction error, an optimal value for the smoothing
parameter is chosen separately for each panel.
. use http://www.stata-press.com/data/r13/sales_cert, clear
. tsset
panel variable: id (strongly balanced)
time variable: t, 1 to 100
delta: 1 unit
. tssmooth exponential sm5=sales, forecast(3)
-> id = 1
computing optimal exponential coefficient (0,1)
optimal exponential coefficient =     0.8702
sum-of-squared residuals        =  16070.567
root mean squared error         =  12.676974
-> id = 2
computing optimal exponential coefficient (0,1)
optimal exponential coefficient =     0.7003
sum-of-squared residuals        =  20792.393
root mean squared error         =  14.419568
-> id = 3
computing optimal exponential coefficient (0,1)
optimal exponential coefficient =     0.6927
sum-of-squared residuals        =      21629
root mean squared error         =  14.706801
-> id = 4
computing optimal exponential coefficient (0,1)
optimal exponential coefficient =     0.3866
sum-of-squared residuals        =  22321.334
root mean squared error         =  14.940326
-> id = 5
computing optimal exponential coefficient (0,1)
optimal exponential coefficient =     0.4540
sum-of-squared residuals        =  20714.095
root mean squared error         =  14.392392

tssmooth exponential computed starting values and chose an optimal α for each panel individually.


Treatment of missing values


Missing values in the middle of the data are filled in with the one-step-ahead prediction using the
previous values. Missing values at the beginning or end of the data are treated as if the observations
were not there.
tssmooth exponential treats observations excluded from the sample by if and in just as if
they were missing.

Example 4: Handling missing data in the middle of a sample


Here the 28th observation is missing. The prediction for the 29th observation is repeated in the
new series.
. use http://www.stata-press.com/data/r13/sales1, clear
. tssmooth exponential sm1=sales, parms(.7) forecast(3)
(output omitted )
. generate sales2=sales if t!=28
(4 missing values generated)
. tssmooth exponential sm3=sales2, parms(.7) forecast(3)
exponential coefficient  =     0.7000
sum-of-squared residuals =     6842.4
root mean squared error  =     11.817
. list t sales2 sm3 if t>25 & t<31

         t   sales2       sm3
 26.    26   1011.5    1007.5
 27.    27   1028.3    1010.3
 28.    28        .    1022.9
 29.    29   1028.4    1022.9
 30.    30   1054.8   1026.75

Because the data for t = 28 are missing, the prediction for period 28 has been used in its place.
This implies that the updating equation for period 29 is

$$S_{29} = \alpha S_{28} + (1 - \alpha) S_{28} = S_{28}$$

which explains why the prediction for t = 28 is repeated.
Because this is a single-exponential procedure, the loss of that one observation will not be noticed
several periods later.



. generate diff = sm3-sm1 if t>28
(28 missing values generated)
. list t diff if t>28 & t<39
         t        diff
 29.    29        -3.5
 30.    30   -1.050049
 31.    31   -.3150635
 32.    32   -.0946045
 33.    33   -.0283203
 34.    34   -.0085449
 35.    35   -.0025635
 36.    36   -.0008545
 37.    37   -.0003662
 38.    38   -.0001221
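The pattern in diff can be checked by hand. From period 29 onward both series are updated with the same observed values, so their difference shrinks by a factor of 1 - α = 0.3 each period, starting from the -3.5 listed for t = 29. A quick sketch of that arithmetic:

```stata
* Difference at t = 29 taken from the listing above; alpha = 0.7
forvalues k = 0/4 {
    display "t = " (29 + `k') "   diff = " %9.6f (-3.5*(1 - .7)^`k')
}
```

The computed values, -3.5, -1.05, -0.315, -0.0945, and -0.02835, agree with the listed ones to about four decimal places; the small discrepancies in the listing presumably arise because the smoothed series and diff are stored as floats.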

Example 5: Handling missing data at the beginning and end of a sample


Now consider an example in which there are data missing at the beginning and end of the sample.
. generate sales3=sales if t>2 & t<49
(7 missing values generated)
. tssmooth exponential sm4=sales3, parms(.7) forecast(3)
exponential coefficient  =     0.7000
sum-of-squared residuals =     6215.3
root mean squared error  =     11.624
. list t sales sales3 sm4 if t<5 | t>45

         t    sales   sales3        sm4
  1.     1     1031        .          .
  2.     2   1022.1        .          .
  3.     3   1005.6   1005.6   1016.787
  4.     4     1025     1025   1008.956
 46.    46   1055.2   1055.2     1057.2
 47.    47   1056.8   1056.8     1055.8
 48.    48   1034.5   1034.5     1056.5
 49.    49   1041.1        .     1041.1
 50.    50   1056.1        .     1041.1
 51.    51        .        .     1041.1
 52.    52        .        .     1041.1
 53.    53        .        .     1041.1

The output above illustrates that missing values at the beginning or end of the sample cause the
sample to be truncated. The new series begins with nonmissing data and begins predicting immediately
after it stops.
One period after the actual data concludes, the exponential forecast becomes a constant. After the
actual end of the data, the forecast at period t is substituted for the missing data. This also illustrates
why the forecasted series is a constant.


Stored results
tssmooth exponential stores the following in r():
Scalars
    r(N)          number of observations
    r(alpha)      α smoothing parameter
    r(rss)        sum-of-squared prediction errors
    r(rmse)       root mean squared error
    r(N_pre)      number of observations used in calculating starting values
    r(s1_0)       initial value for S_t
Macros
    r(method)     smoothing method
    r(exp)        expression specified
    r(timevar)    time variable specified in tsset
    r(panelvar)   panel variable specified in tsset

Methods and formulas


The formulas for deriving smoothed series are as given in the text. When the value of $\alpha$ is not
specified, an optimal value is found that minimizes the mean squared forecast error. A method of
bisection is used to find the solution to this optimization problem.
A truncated description of the specified exponential filter is used to label the new variable. See
[D] label for more information about labels.
An untruncated description of the specified exponential filter is saved in the characteristic tssmooth
for the new variable. See [P] char for more information about characteristics.

References
Abraham, B., and J. Ledolter. 1983. Statistical Methods for Forecasting. New York: Wiley.
Bowerman, B. L., R. T. O'Connell, and A. B. Koehler. 2005. Forecasting, Time Series, and Regression: An Applied
Approach. 4th ed. Pacific Grove, CA: Brooks/Cole.
Chatfield, C. 2001. Time-Series Forecasting. London: Chapman & Hall/CRC.
Chatfield, C. 2004. The Analysis of Time Series: An Introduction. 6th ed. Boca Raton, FL: Chapman & Hall/CRC.
Chatfield, C., and M. Yar. 1988. Holt-Winters forecasting: Some practical issues. Statistician 37: 129–140.
Holt, C. C. 2004. Forecasting seasonals and trends by exponentially weighted moving averages. International Journal
of Forecasting 20: 5–10.
Montgomery, D. C., L. A. Johnson, and J. S. Gardiner. 1990. Forecasting and Time Series Analysis. 2nd ed. New
York: McGraw-Hill.
Winters, P. R. 1960. Forecasting sales by exponentially weighted moving averages. Management Science 6: 324–342.

Also see
[TS] tsset Declare data to be time-series data
[TS] tssmooth Smooth and forecast univariate time-series data

Title
tssmooth hwinters Holt-Winters nonseasonal smoothing

    Syntax                  Menu              Description             Options
    Remarks and examples    Stored results    Methods and formulas    Acknowledgment
    References              Also see

Syntax
    tssmooth hwinters [type] newvar = exp [if] [in] [, options]

    options               Description
    Main
      replace             replace newvar if it already exists
      parms(#α #β)        use #α and #β as smoothing parameters
      samp0(#)            use # observations to obtain initial values for recursion
      s0(#cons #lt)       use #cons and #lt as initial values for recursion
      forecast(#)         use # periods for the out-of-sample forecast
    Options
      diff                alternative initial-value specification; see Options
    Maximization
      maximize_options    control the maximization process; seldom used
      from(#α #β)         use #α and #β as starting values for the parameters

You must tsset your data before using tssmooth hwinters; see [TS] tsset.
exp may contain time-series operators; see [U] 11.4.4 Time-series varlists.

Menu
Statistics > Time series > Smoothers/univariate forecasters > Holt-Winters nonseasonal smoothing

Description
tssmooth hwinters is used in smoothing or forecasting a series that can be modeled as a linear
trend in which the intercept and the coefficient on time vary over time.

Options


Main

replace replaces newvar if it already exists.


parms(#α #β), 0 ≤ #α ≤ 1 and 0 ≤ #β ≤ 1, specifies the parameters. If parms() is not specified,
the values are chosen by an iterative process to minimize the in-sample sum-of-squared prediction
errors.
If you experience difficulty converging (many iterations and "not concave" messages), try using
from() to provide better starting values.
samp0(#) and s0(#cons #lt ) specify how the initial values #cons and #lt for the recursion are
obtained.
By default, initial values are obtained by fitting a linear regression with a time trend using the
first half of the observations in the dataset.
samp0(#) specifies that the first # observations be used in that regression.
s0(#cons #lt ) specifies that #cons and #lt be used as initial values.
forecast(#) specifies the number of periods for the out-of-sample prediction; 0 ≤ # ≤ 500. The
default is forecast(0), which is equivalent to not performing an out-of-sample forecast.

Options

diff specifies that the linear term is obtained by averaging the first difference of exp_t and the intercept
is obtained as the difference of exp in the first observation and the mean of D.exp_t.
If the diff option is not specified, a linear regression of exp_t on a constant and t is fit.

Maximization

maximize_options controls the process for solving for the optimal α and β when parms() is not
specified.
maximize_options: nodifficult, technique(algorithm_spec), iterate(#), [no]log, trace,
gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#),
nrtolerance(#), and nonrtolerance; see [R] maximize. These options are seldom used.
from(#α #β), 0 < #α < 1 and 0 < #β < 1, specifies starting values from which the optimal values
of α and β will be obtained. If from() is not specified, from(.5 .5) is used.

Remarks and examples


The Holt-Winters method forecasts series of the form

$$\hat{x}_{t+1} = a_t + b_t t$$

where $\hat{x}_t$ is the forecast of the original series $x_t$, $a_t$ is a mean that drifts over time, and $b_t$ is a
coefficient on time that also drifts. In fact, as Gardner (1985) has noted, the Holt-Winters method
produces optimal forecasts for an ARIMA(0,2,2) model and some local linear models. See [TS] arima
and the references in that entry for ARIMA models, and see Harvey (1989) for a discussion of the
local linear model and its relationship to the Holt-Winters method. Abraham and Ledolter (1983),
Bowerman, O'Connell, and Koehler (2005), and Montgomery, Johnson, and Gardiner (1990) all
provide good introductions to the Holt-Winters method. Chatfield (2001, 2004) provides helpful
discussions of how this method relates to modern time-series analysis.
The Holt-Winters method can be viewed as an extension of double-exponential smoothing with
two parameters, which may be explicitly set or chosen to minimize the in-sample sum-of-squared
forecast errors. In the latter case, as discussed in Methods and formulas, the smoothing parameters
are chosen to minimize the in-sample sum-of-squared forecast errors plus a penalty term that helps
to achieve convergence when one of the parameters is too close to the boundary.

Given the series $x_t$, the smoothing parameters $\alpha$ and $\beta$, and the starting values $a_0$ and $b_0$, the
updating equations are

$$a_t = \alpha x_t + (1 - \alpha)(a_{t-1} + b_{t-1})$$

$$b_t = \beta (a_t - a_{t-1}) + (1 - \beta) b_{t-1}$$

After computing the series of constant and linear terms, $a_t$ and $b_t$, respectively, the $\tau$-step-ahead
prediction of $x_t$ is given by

$$\hat{x}_{t+\tau} = a_t + b_t \tau$$
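
The recursion can be written out directly. The Python sketch below is offered only as an illustration of the updating equations and the step-ahead prediction; it is not the code used by tssmooth hwinters, and the function name holt_winters and its arguments are hypothetical.

```python
def holt_winters(x, alpha, beta, a0, b0, steps=0):
    """Apply the Holt-Winters nonseasonal updating equations (illustrative).

    Returns the in-sample one-step-ahead predictions, the out-of-sample
    forecasts for 1..steps periods ahead, and the final a_T and b_T.
    """
    a, b = a0, b0
    fitted = []
    for xt in x:
        fitted.append(a + b)                       # prediction with tau = 1
        a_new = alpha * xt + (1 - alpha) * (a + b) # update the constant term
        b = beta * (a_new - a) + (1 - beta) * b    # update the linear term
        a = a_new
    forecasts = [a + b * tau for tau in range(1, steps + 1)]
    return fitted, forecasts, a, b


# A series that is exactly linear is tracked without error when the
# starting values lie on the same line.
fitted, forecasts, a_T, b_T = holt_winters(
    [10, 12, 14, 16], alpha=0.7, beta=0.3, a0=8, b0=2, steps=3
)
```

Because the out-of-sample forecasts depend only on the final a_T and b_T, they always fall on a straight line, whatever the in-sample path looked like.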

Example 1: Smoothing a series for specified parameters


Below we show how to use tssmooth hwinters with specified smoothing parameters. This
example also shows that the Holt-Winters method can closely follow a series in which both the mean
and the time coefficient drift over time.
Suppose that we have data on the monthly sales of a book and that we want to forecast this series
with the Holt-Winters method.
. use http://www.stata-press.com/data/r13/bsales
. tssmooth hwinters hw1=sales, parms(.7 .3) forecast(3)
Specified weights:
alpha = 0.7000
beta = 0.3000
sum-of-squared residuals = 2301.046
root mean squared error = 6.192799
. line sales hw1 t, title("Holt-Winters Forecast with alpha=.7 and beta=.3")
> ytitle(Sales) xtitle(Time)

[Figure: Holt-Winters forecast with alpha=.7 and beta=.3. Line plot of sales and
hw parms(0.700 0.300) = sales against Time; y axis: Sales.]

The graph indicates that the forecasts are for linearly decreasing sales. Given $a_T$ and $b_T$, the
out-of-sample predictions are linear functions of time. In this example, the slope appears to be too steep,
probably because of our choice of $\alpha$ and $\beta$.

Example 2: Choosing the initial values


The graph in the previous example illustrates that the starting values for the linear and constant
series can affect the in-sample fit of the predicted series for the first few observations. The previous
example used the default method for obtaining the initial values for the recursion. The output below
illustrates that, for some problems, the difference-based initial values provide a better in-sample fit
for the first few observations. However, the difference-based initial values do not always outperform
the regression-based initial values. Furthermore, as shown in the output below, for series of reasonable
length, the predictions produced are nearly identical.
. tssmooth hwinters hw2=sales, parms(.7 .3) forecast(3) diff
Specified weights:
alpha = 0.7000
beta = 0.3000
sum-of-squared residuals = 2261.173
root mean squared error = 6.13891
. list hw1 hw2 if _n<6 | _n>57

             hw1        hw2
  1.    93.31973   97.80807
  2.    98.40002   98.11447
  3.    100.8845    99.2267
  4.    98.50404   96.78276
  5.    93.62408    92.2452
 58.    116.5771   116.5771
 59.    119.2146   119.2146
 60.    119.2608   119.2608
 61.    111.0299   111.0299
 62.    109.2815   109.2815
 63.    107.5331   107.5331

When the smoothing parameters are chosen to minimize the in-sample sum-of-squared forecast
errors, changing the initial values can affect the choice of the optimal $\alpha$ and $\beta$. When changing the
initial values results in different optimal values for $\alpha$ and $\beta$, the predictions will also differ.

When the Holt-Winters model fits the data well, finding the optimal smoothing parameters
generally proceeds well. When the model fits poorly, finding the $\alpha$ and $\beta$ that minimize the in-sample
sum-of-squared forecast errors can be difficult.

Example 3: Forecasting with optimal parameters


In this example, we forecast the book sales data using the $\alpha$ and $\beta$ that minimize the in-sample
squared forecast errors.



. tssmooth hwinters hw3=sales, forecast(3)
computing optimal weights
Iteration 0:   penalized RSS = -2632.2073  (not concave)
Iteration 1:   penalized RSS = -1982.8431
Iteration 2:   penalized RSS = -1976.4236
Iteration 3:   penalized RSS = -1975.9172
Iteration 4:   penalized RSS = -1975.9036
Iteration 5:   penalized RSS = -1975.9036
Optimal weights:
                             alpha = 0.8209
                              beta = 0.0067
penalized sum-of-squared residuals = 1975.904
          sum-of-squared residuals = 1975.904
           root mean squared error = 5.738617

The following graph contains the data and the forecast using the optimal $\alpha$ and $\beta$. Comparing
this graph with the one above illustrates how different choices of $\alpha$ and $\beta$ can lead to very different
forecasts. Instead of linearly decreasing sales, the new forecast is for linearly increasing sales.
. line sales hw3 t, title("Holt-Winters Forecast with optimal alpha and beta")
> ytitle(Sales) xtitle(Time)

[Figure: Holt-Winters forecast with optimal alpha and beta. Line plot of sales and
hw parms(0.821 0.007) = sales against Time; y axis: Sales.]

Stored results
tssmooth hwinters stores the following in r():
Scalars
    r(N)          number of observations
    r(alpha)      α smoothing parameter
    r(beta)       β smoothing parameter
    r(rss)        sum-of-squared errors
    r(prss)       penalized sum-of-squared errors, if parms() not specified
    r(rmse)       root mean squared error
    r(N_pre)      number of observations used in calculating starting values
    r(s2_0)       initial value for linear term
    r(s1_0)       initial value for constant term
    r(linear)     final value of linear term
    r(constant)   final value of constant term
Macros
    r(method)     smoothing method
    r(exp)        expression specified
    r(timevar)    time variable specified in tsset
    r(panelvar)   panel variable specified in tsset


Methods and formulas


A truncated description of the specified Holt-Winters filter is used to label the new variable. See
[D] label for more information on labels.
An untruncated description of the specified Holt-Winters filter is saved in the characteristic named
tssmooth for the new variable. See [P] char for more information on characteristics.
Given the series, $x_t$; the smoothing parameters, $\alpha$ and $\beta$; and the starting values, $a_0$ and $b_0$, the
updating equations are

$$a_t = \alpha x_t + (1 - \alpha)(a_{t-1} + b_{t-1})$$

$$b_t = \beta (a_t - a_{t-1}) + (1 - \beta) b_{t-1}$$

By default, the initial values are found by fitting a linear regression with a time trend. The time
variable in this regression is normalized to equal one in the first period included in the sample. By
default, one-half of the data is used in this regression, but this sample can be changed using samp0().
$a_0$ is then set to the estimate of the constant, and $b_0$ is set to the estimate of the coefficient on the
time trend. Specifying the diff option sets $b_0$ to the mean of D.x and $a_0$ to $x_1 - b_0$. s0() can also
be used to specify the initial values directly.
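
The two ways of obtaining starting values can be compared with a short Python sketch. It is a rough illustration under stated assumptions rather than the command's actual code: the function name initial_values is hypothetical, one-half of an odd-length series is rounded down, and the diff calculation is assumed to use the same leading observations as the regression.

```python
def initial_values(x, samp0=None, diff=False):
    """Return (a0, b0) from the first samp0 observations of x (illustrative)."""
    n = samp0 if samp0 is not None else len(x) // 2   # assumed rounding rule
    xs = x[:n]
    if diff:
        # b0 is the mean first difference; a0 is x_1 minus b0.
        b0 = (xs[-1] - xs[0]) / (n - 1)
        return xs[0] - b0, b0
    # Least-squares fit of x on a constant and t, with t = 1 in the first period.
    ts = range(1, n + 1)
    t_bar = sum(ts) / n
    x_bar = sum(xs) / n
    b0 = sum((t - t_bar) * (v - x_bar) for t, v in zip(ts, xs)) / sum(
        (t - t_bar) ** 2 for t in ts
    )
    a0 = x_bar - b0 * t_bar
    return a0, b0


series = [3, 5, 7, 9, 11, 13, 15, 17]
reg_start = initial_values(series)              # regression on the first half
diff_start = initial_values(series, diff=True)  # difference-based alternative
```

For an exactly linear series such as this one, both methods give the same starting values; they differ when the early observations are noisy.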
Sometimes, one or both of the optimal parameters may lie on the boundary of [0, 1]. To keep the
estimates inside [0, 1], tssmooth hwinters parameterizes the objective function in terms of their
inverse logits, that is, in terms of $\exp(\alpha)/\{1 + \exp(\alpha)\}$ and $\exp(\beta)/\{1 + \exp(\beta)\}$. When one of
these parameters is actually on the boundary, this can complicate the optimization. For this reason,
tssmooth hwinters optimizes a penalized sum-of-squared forecast errors. Let $\hat{x}_t(\tilde{\alpha}, \tilde{\beta})$ be the
forecast for the series $x_t$, given the choices of $\tilde{\alpha}$ and $\tilde{\beta}$. Then the in-sample penalized sum-of-squared
prediction errors is

$$P = \sum_{t=1}^{T} \left[ \{x_t - \hat{x}_t(\tilde{\alpha}, \tilde{\beta})\}^2 + I_{|f(\tilde{\alpha})|>12}\,(|f(\tilde{\alpha})| - 12)^2 + I_{|f(\tilde{\beta})|>12}\,(|f(\tilde{\beta})| - 12)^2 \right]$$

where $f(x) = \ln\{x(1 - x)\}$. The penalty term is zero unless one of the parameters is close to the
boundary. When one of the parameters is close to the boundary, the penalty term will help to obtain
convergence.
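
To see when the penalty becomes active, the objective can be sketched in Python. This is an illustration only: it takes f(x) = ln{x(1 - x)} and the cutoff of 12 from the text, keeps the penalty inside the sum over t as the formula is written, and uses hypothetical function names (penalty, penalized_rss) that are not part of Stata.

```python
import math


def penalty(p, cutoff=12.0):
    """Penalty for one smoothing parameter p in (0, 1), per the formula above."""
    f = abs(math.log(p * (1.0 - p)))
    return (f - cutoff) ** 2 if f > cutoff else 0.0


def penalized_rss(errors, alpha, beta):
    """Sum over t of the squared forecast error plus both penalty terms."""
    pen = penalty(alpha) + penalty(beta)
    return sum(e * e + pen for e in errors)


# Away from the boundary the penalty vanishes, so the penalized and
# unpenalized sums agree, as in the output of Example 3.
interior = penalized_rss([1.0, 2.0], alpha=0.8209, beta=0.0067)

# Very close to the boundary the penalty is positive.
near_boundary = penalty(1e-7)
```

Under this reading, a parameter must be within roughly exp(-12) of 0 or 1 before the penalty contributes anything.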

Acknowledgment
We thank Nicholas J. Cox of the Department of Geography at Durham University, UK, and coeditor
of the Stata Journal for his helpful comments.

References
Abraham, B., and J. Ledolter. 1983. Statistical Methods for Forecasting. New York: Wiley.
Bowerman, B. L., R. T. O'Connell, and A. B. Koehler. 2005. Forecasting, Time Series, and Regression: An Applied
Approach. 4th ed. Pacific Grove, CA: Brooks/Cole.
Chatfield, C. 2001. Time-Series Forecasting. London: Chapman & Hall/CRC.
Chatfield, C. 2004. The Analysis of Time Series: An Introduction. 6th ed. Boca Raton, FL: Chapman & Hall/CRC.
Chatfield, C., and M. Yar. 1988. Holt-Winters forecasting: Some practical issues. Statistician 37: 129–140.
Gardner, E. S., Jr. 1985. Exponential smoothing: The state of the art. Journal of Forecasting 4: 1–28.
Harvey, A. C. 1989. Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge: Cambridge
University Press.
Holt, C. C. 2004. Forecasting seasonals and trends by exponentially weighted moving averages. International Journal
of Forecasting 20: 5–10.
Montgomery, D. C., L. A. Johnson, and J. S. Gardiner. 1990. Forecasting and Time Series Analysis. 2nd ed. New
York: McGraw-Hill.
Winters, P. R. 1960. Forecasting sales by exponentially weighted moving averages. Management Science 6: 324–342.

Also see
[TS] tsset Declare data to be time-series data
[TS] tssmooth Smooth and forecast univariate time-series data

Title
tssmooth ma Moving-average filter

    Syntax                  Menu              Description             Options
    Remarks and examples    Stored results    Methods and formulas    Reference
    Also see

Syntax
Moving average with uniform weights

    tssmooth ma [type] newvar = exp [if] [in], window(#l [#c [#f]]) [replace]

Moving average with specified weights

    tssmooth ma [type] newvar = exp [if] [in], weights([numlist_l] <#c> [numlist_f]) [replace]

You must tsset your data before using tssmooth ma; see [TS] tsset.
exp may contain time-series operators; see [U] 11.4.4 Time-series varlists.

Menu
Statistics > Time series > Smoothers/univariate forecasters > Moving-average filter

Description
tssmooth ma creates a new series in which each observation is an average of nearby observations
in the original series.
In the first syntax, window() is required and specifies the span of the filter. tssmooth ma constructs
a uniformly weighted moving average of the expression.
In the second syntax, weights() is required and specifies the weights to be used. tssmooth ma
then applies the specified weights to construct a weighted moving average of the expression.

Options
  
window(#l [#c [#f]]) describes the span of the uniformly weighted moving average.
#l specifies the number of lagged terms to be included, 0 ≤ #l ≤ one-half the number of
observations in the sample.
#c is optional and specifies whether to include the current observation in the filter. A 0 indicates
exclusion and 1, inclusion. The current observation is excluded by default.
#f is optional and specifies the number of forward terms to be included, 0 ≤ #f ≤ one-half the
number of observations in the sample.
weights([numlist_l] <#c> [numlist_f]) is required for the weighted moving average and describes
the span of the moving average, as well as the weights to be applied to each term in the average.
The middle term literally is surrounded by < and >, so you might type weights(1/2 <3> 2/1).
numlist_l is optional and specifies the weights to be applied to the lagged terms when computing
the moving average.
#c is required and specifies the weight to be applied to the current term.
numlist_f is optional and specifies the weights to be applied to the forward terms when computing
the moving average.
The number of elements in each numlist is limited to one-half the number of observations in the
sample.
replace replaces newvar if it already exists.

Remarks and examples


Remarks are presented under the following headings:
Overview
Video example

Overview
Moving averages are simple linear filters of the form

$$\hat{x}_t = \frac{\sum_{i=-l}^{f} w_i x_{t+i}}{\sum_{i=-l}^{f} w_i}$$

where
    $\hat{x}_t$ is the moving average
    $x_t$ is the variable or expression to be smoothed
    $w_i$ are the weights being applied to the terms in the filter
    $l$ is the longest lag in the span of the filter
    $f$ is the longest lead in the span of the filter
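
The filter is simple enough to write out by hand. The Python sketch below illustrates the formula only; the function name moving_average and its argument layout are hypothetical, and leaving the first l and last f positions as None is an assumption of the sketch rather than a statement about how tssmooth ma treats the ends of a series.

```python
def moving_average(x, lag_w, cur_w, lead_w):
    """Weighted moving average of x (illustrative).

    lag_w lists the lag weights from the farthest lag to the nearest,
    cur_w is the weight on the current term, and lead_w lists the lead
    weights from the nearest lead to the farthest.
    """
    weights = list(lag_w) + [cur_w] + list(lead_w)
    l, f = len(lag_w), len(lead_w)
    total = sum(weights)
    out = []
    for t in range(len(x)):
        if t - l < 0 or t + f >= len(x):
            out.append(None)            # full span not available here
            continue
        window = x[t - l : t + f + 1]
        out.append(sum(w * v for w, v in zip(weights, window)) / total)
    return out


# Weights 1 2 <3> 2 1, so the divisor is 9.
smoothed = moving_average([2, 4, 9, 4, 2], [1, 2], 3, [2, 1])

# Uniform weights over two lags, the current term, and two leads.
uniform = moving_average([1, 2, 3, 4, 5, 6, 7], [1, 1], 1, [1, 1])
```

With uniform weights the divisor is simply the number of terms in the span, which is 5 here.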

Moving averages are used primarily to reduce noise in time-series data. Using moving averages to
isolate signals is problematic, however, because the moving averages themselves are serially correlated,
even when the underlying data series is not. Still, Chatfield (2004) discusses moving-average filters
and provides several specific moving-average filters for extracting certain trends.

Example 1: A symmetric moving-average filter with uniform weights


Suppose that we have a time series of sales data, and we want to separate the data into two
components: signal and noise. To eliminate the noise, we apply a moving-average filter. In this
example, we use a symmetric moving average with a span of 5. This means that we will average the
first two lagged values, the current value, and the first two forward terms of the series, with each
term in the average receiving a weight of 1.


. use http://www.stata-press.com/data/r13/sales1
. tsset
time variable: t, 1 to 50
delta: 1 unit
. tssmooth ma sm1 = sales, window(2 1 2)
The smoother applied was
(1/5)*[x(t-2) + x(t-1) + 1*x(t) + x(t+1) + x(t+2)]; x(t)= sales

We would like to smooth our series so that there is no autocorrelation in the noise. Below we
compute the noise as the difference between the smoothed series and the series itself. Then we use
ac (see [TS] corrgram) to check for autocorrelation in the noise.
. generate noise = sales-sm1
. ac noise

[Figure: Autocorrelations of noise plotted against Lag, with Bartlett's formula for MA(q) 95%
confidence bands.]

Example 2: A symmetric moving-average filter with nonuniform weights


In the previous example, there is some evidence of negative second-order autocorrelation, possibly
due to the uniform weighting or the length of the filter. We are going to specify a shorter filter in
which the weights decline as the observations get farther away from the current observation.
The weighted moving-average filter requires that we supply the weights to apply to each element
with the weights() option. In specifying the weights, we implicitly specify the span of the filter.
Below we use the filter

$$\hat{x}_t = (1/9)(1x_{t-2} + 2x_{t-1} + 3x_t + 2x_{t+1} + 1x_{t+2})$$

In what follows, 1/2 does not mean one-half; it means the numlist 1 2:
. tssmooth ma sm2 = sales, weights( 1/2 <3> 2/1)
The smoother applied was
(1/9)*[1*x(t-2) + 2*x(t-1) + 3*x(t) + 2*x(t+1) + 1*x(t+2)]; x(t)= sales
. generate noise2 = sales-sm2

We compute the noise and use ac to check for autocorrelation.
. ac noise2

[Figure: Autocorrelations of noise2 plotted against Lag, with Bartlett's formula for MA(q) 95%
confidence bands.]

The graph shows no significant evidence of autocorrelation in the noise from the second filter.

Technical note
tssmooth ma gives any missing observations a coefficient of zero in both the uniformly weighted
and weighted moving-average filters. This simply means that missing values or missing periods are
excluded from the moving average.
Sample restrictions, via if and in, cause the expression smoothed by tssmooth ma to be missing
for the excluded observations. Thus sample restrictions have the same effect as missing values in a
variable that is filtered in the expression. Also, gaps in the data that are longer than the span of the
filter will generate missing values in the filtered series.
Because the first l observations and the last f observations will be outside the span of the filter,
those observations will be set to missing in the moving-average series.

Video example
Time series, part 6: Moving-average smoothers using tssmooth


Stored results
tssmooth ma stores the following in r():
Scalars
    r(N)          number of observations
    r(w0)         weight on the current observation
    r(wlead#)     weight on lead #, if leads are specified
    r(wlag#)      weight on lag #, if lags are specified
Macros
    r(method)     smoothing method
    r(exp)        expression specified
    r(timevar)    time variable specified in tsset
    r(panelvar)   panel variable specified in tsset

Methods and formulas


The formula for moving averages is the same as previously given.
A truncated description of the specified moving-average filter labels the new variable. See [D] label
for more information on labels.
An untruncated description of the specified moving-average filter is saved in the characteristic
tssmooth for the new variable. See [P] char for more information on characteristics.

Reference
Chatfield, C. 2004. The Analysis of Time Series: An Introduction. 6th ed. Boca Raton, FL: Chapman & Hall/CRC.

Also see
[TS] tsset Declare data to be time-series data
[TS] tssmooth Smooth and forecast univariate time-series data

Title
tssmooth nl Nonlinear filter

    Syntax                  Menu              Description             Options
    Remarks and examples    Stored results    Methods and formulas    Also see

Syntax


   


    tssmooth nl [type] newvar = exp [if] [in], smoother(smoother[, twice]) [replace]

where smoother is specified as Sm[Sm[...]] and Sm is one of

    {1|2|3|4|5|6|7|8|9}[R]
    3[R]S[S|R][S|R]...
    E
    H

The numbers specified in smoother represent the span of a running median smoother. For example,
a number 3 specifies that each value be replaced by the median of the point and the two adjacent
data values. The letter H indicates that a Hanning linear smoother, which is a span-3 smoother with
binomial weights, be applied.
The letters E, S, and R are three refinements that can be combined with the running median and
Hanning smoothers. First, the end points of a smooth can be given special treatment. This is specified
by the E operator. Second, smoothing by 3, the span-3 running median, tends to produce flat-topped
hills and valleys. The splitting operator, S, splits these repeated values, applies the end-point operator
to them, and then rejoins the series. Third, it is sometimes useful to repeat an odd-span median
smoother or the splitting operator until the smooth no longer changes. Following a digit or an S with
an R specifies this type of repetition.
Finally, the twice operator specifies that after smoothing, the smoother be reapplied to the resulting
rough, and any recovered signal be added back to the original smooth.


Letters may be specified in lowercase, if preferred. Examples of smoother[, twice] include

    3RSSH    3RSSH,twice    4253H    4253H,twice    43RSR2H,twice
    3rssh    3rssh,twice    4253h    4253h,twice    43rsr2h,twice

You must tsset your data before using tssmooth nl; see [TS] tsset.
exp may contain time-series operators; see [U] 11.4.4 Time-series varlists.

Menu
Statistics > Time series > Smoothers/univariate forecasters > Nonlinear filter

Description
tssmooth nl uses nonlinear smoothers to identify the underlying trend in a series.

Options


Main



smoother(smoother[, twice]) is required; it specifies the nonlinear smoother to be used.
replace replaces newvar if it already exists.

Remarks and examples


tssmooth nl works as a front end to smooth. See [R] smooth for details.
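
Two of the building blocks named under Syntax, the span-3 running median (3) and the Hanning smoother (H), can be illustrated with a short Python sketch. This is not the algorithm in [R] smooth: the function names are hypothetical, the end points are simply left unchanged (the E operator's end-point rule is not implemented), and the binomial weights for a span of 3 are taken to be 1/4, 1/2, 1/4.

```python
from statistics import median


def running_median3(x):
    """Replace each interior value by the median of it and its two neighbors."""
    out = list(x)
    for i in range(1, len(x) - 1):
        out[i] = median(x[i - 1 : i + 2])
    return out


def hanning(x):
    """Span-3 linear smoother with binomial weights 1/4, 1/2, 1/4."""
    out = list(x)
    for i in range(1, len(x) - 1):
        out[i] = 0.25 * x[i - 1] + 0.5 * x[i] + 0.25 * x[i + 1]
    return out


raw = [1, 10, 2, 3, 50, 4]
med = running_median3(raw)   # the spikes at 10 and 50 are removed
han = hanning(med)           # the median smooth is then averaged locally
```

Applying the median first and the Hanning smoother second mirrors the order in which the characters of a smoother such as 3H are read.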

Stored results
tssmooth nl stores the following in r():
Scalars
    r(N)          number of observations
Macros
    r(method)     nl
    r(smoother)   specified smoother
    r(timevar)    time variable specified in tsset
    r(panelvar)   panel variable specified in tsset

Methods and formulas


The methods are documented in [R] smooth.
A truncated description of the specified nonlinear filter labels the new variable. See [D] label for
more information on labels.
An untruncated description of the specified nonlinear filter is saved in the characteristic tssmooth
for the new variable. See [P] char for more information on characteristics.

Also see
[TS] tsset Declare data to be time-series data
[TS] tssmooth Smooth and forecast univariate time-series data

Title
tssmooth shwinters Holt-Winters seasonal smoothing

    Syntax                  Menu              Description             Options
    Remarks and examples    Stored results    Methods and formulas    Acknowledgment
    References              Also see

Syntax
    tssmooth shwinters [type] newvar = exp [if] [in] [, options]

    options               Description
    Main
      replace             replace newvar if it already exists
      parms(#α #β #γ)     use #α, #β, and #γ as smoothing parameters
      samp0(#)            use # observations to obtain initial values for recursion
      s0(#cons #lt)       use #cons and #lt as initial values for recursion
      forecast(#)         use # periods for the out-of-sample forecast
      period(#)           use # for period of the seasonality
      additive            use additive seasonal Holt-Winters method
    Options
      sn0_0(varname)      use initial seasonal values in varname
      sn0_v(newvar)       store estimated initial values for seasonal terms in newvar
      snt_v(newvar)       store final year's estimated seasonal terms in newvar
      normalize           normalize seasonal values
      altstarts           use alternative method for computing the starting values
    Maximization
      maximize_options    control the maximization process; seldom used
      from(#α #β #γ)      use #α, #β, and #γ as starting values for the parameters

You must tsset your data before using tssmooth shwinters; see [TS] tsset.
exp may contain time-series operators; see [U] 11.4.4 Time-series varlists.

Menu
Statistics > Time series > Smoothers/univariate forecasters > Holt-Winters seasonal smoothing

Description
tssmooth shwinters performs the seasonal Holt-Winters method on a user-specified expression,
which is usually just a variable name, and generates a new variable containing the forecasted series.

Options


Main

replace replaces newvar if it already exists.


parms(#α #β #γ), 0 ≤ #α ≤ 1, 0 ≤ #β ≤ 1, and 0 ≤ #γ ≤ 1, specifies the parameters. If
parms() is not specified, the values are chosen by an iterative process to minimize the in-sample
sum-of-squared prediction errors.
If you experience difficulty converging (many iterations and "not concave" messages), try using
from() to provide better starting values.
samp0(#) and s0(#cons #lt) have to do with how the initial values #cons and #lt for the recursion
are obtained.
s0(#cons #lt) specifies the initial values to be used.
samp0(#) specifies that the initial values be obtained using the first # observations of the sample.
This calculation is described under Methods and formulas and depends on whether the altstarts
and additive options are also specified.
If neither option is specified, the first half of the sample is used to obtain initial values.
forecast(#) specifies the number of periods for the out-of-sample prediction; 0 ≤ # ≤ 500. The
default is forecast(0), which is equivalent to not performing an out-of-sample forecast.
period(#) specifies the period of the seasonality. If period() is not specified, the seasonality is
obtained from the tsset options daily, weekly, ..., yearly; see [TS] tsset. If you did not
specify one of those options when you tsset the data, you must specify the period() option.
For instance, if your data are quarterly and you did not specify tsset's quarterly option, you
must now specify period(4).
By default, seasonal values are calculated, but you may specify the initial seasonal values to be
used via the sn0_0(varname) option. The first period() observations of varname are to contain
the initial seasonal values.
additive uses the additive seasonal Holt-Winters method instead of the default multiplicative
seasonal Holt-Winters method.

Options

sn0_0(varname) specifies the initial seasonal values to use. varname must contain a complete year's
worth of seasonal values, beginning with the first observation in the estimation sample. For example,
if you have monthly data, the first 12 observations of varname must contain nonmissing data.
sn0_0() cannot be used with sn0_v().
sn0_v(newvar) stores in newvar the initial seasonal values after they have been estimated. sn0_v()
cannot be used with sn0_0().
snt_v(newvar) stores in newvar the seasonal values for the final year's worth of data.
normalize specifies that the seasonal values be normalized. In the multiplicative model, they are
normalized to sum to one. In the additive model, the seasonal values are normalized to sum to
zero.
altstarts uses an alternative method to compute the starting values for the constant, the linear,
and the seasonal terms. The default and the alternative methods are described in Methods and
formulas. altstarts may not be specified with s0().

Maximization

maximize_options controls the process for solving for the optimal α, β, and γ when the parms()
option is not specified.
maximize_options: nodifficult, technique(algorithm_spec), iterate(#), [no]log, trace,
gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#),
nrtolerance(#), and nonrtolerance; see [R] maximize. These options are seldom used.
from(#α #β #γ), 0 < #α < 1, 0 < #β < 1, and 0 < #γ < 1, specifies starting values from which
the optimal values of α, β, and γ will be obtained. If from() is not specified, from(.5 .5 .5)
is used.

Remarks and examples


Remarks are presented under the following headings:
Introduction
Holt-Winters seasonal multiplicative method
Holt-Winters seasonal additive method

Introduction
The seasonal Holt-Winters methods forecast univariate series that have a seasonal component.
If the amplitude of the seasonal component grows with the series, the Holt-Winters multiplicative
method should be used. If the amplitude of the seasonal component is not growing with the series, the
Holt-Winters additive method should be used. Abraham and Ledolter (1983), Bowerman, O'Connell,
and Koehler (2005), and Montgomery, Johnson, and Gardiner (1990) provide good introductions to the
Holt-Winters methods in recursive univariate forecasting methods. Chatfield (2001, 2004) provides
introductions in the broader context of modern time-series analysis.
Like the other recursive methods in tssmooth, tssmooth shwinters uses the information stored
by tsset to detect panel data. When applied to panel data, each series is smoothed separately, and
the starting values are computed separately for each panel. If the smoothing parameters are chosen
to minimize the in-sample sum-of-squared forecast errors, the optimization is performed separately
on each panel.
When there are missing values at the beginning of the series, the sample begins with the first
nonmissing observation. Missing values after the first nonmissing observation are filled in with
forecasted values.

Holt–Winters seasonal multiplicative method

This method forecasts seasonal time series in which the amplitude of the seasonal component
grows with the series. Chatfield (2001) notes that there are some nonlinear state-space models whose
optimal prediction equations correspond to the multiplicative Holt–Winters method. This procedure
is best applied to data that could be described by

$$x_{t+j} = (\mu_t + \beta j)\, S_{t+j} + \epsilon_{t+j}$$

where $x_t$ is the series, $\mu_t$ is the time-varying mean at time $t$, $\beta$ is a parameter, $S_t$ is the seasonal
component at time $t$, and $\epsilon_t$ is an idiosyncratic error. See Methods and formulas for the updating
equations.


Example 1: Forecasting from the multiplicative model


We have quarterly data on turkey sales by a new producer in the 1990s. The data have a strong
seasonal component and an upward trend. We use the multiplicative Holt–Winters method to forecast
sales for the year 2000. Because we have already tsset our data to the quarterly format, we do not
need to specify the period() option.
. use http://www.stata-press.com/data/r13/turksales
. tssmooth shwinters shw1 = sales, forecast(4)
computing optimal weights
Iteration 0:   penalized RSS = -189.34609  (not concave)
Iteration 1:   penalized RSS = -108.68038  (not concave)
Iteration 2:   penalized RSS = -106.23703
Iteration 3:   penalized RSS = -106.14101
Iteration 4:   penalized RSS = -106.14093
Iteration 5:   penalized RSS = -106.14093
Optimal weights:
                             alpha = 0.1310
                              beta = 0.1428
                             gamma = 0.2999
penalized sum-of-squared residuals = 106.1409
          sum-of-squared residuals = 106.1409
           root mean squared error = 1.628964

The graph below describes the fit and the forecast that was obtained.
. line sales shw1 t, title("Multiplicative Holt-Winters forecast")
> xtitle(Time) ytitle(Sales)

[Figure: "Multiplicative Holt–Winters forecast". y axis: Sales (95 to 115); x axis: Time (1990q1 to 2000q1); plotted series: sales and shw parms(0.131 0.143 0.300) = sales]

Holt–Winters seasonal additive method

This method is similar to the previous one, but the seasonal effect is assumed to be additive rather
than multiplicative. This method forecasts series that can be described by the equation

$$x_{t+j} = (\mu_t + \beta j) + S_{t+j} + \epsilon_{t+j}$$

See Methods and formulas for the updating equations.


Example 2: Forecasting from the additive model


In this example, we fit the data from the previous example to the additive model to forecast sales
in the coming year. We use the snt_v() option to save the last year's seasonal terms in the new
variable seas.
. tssmooth shwinters shwa = sales, forecast(4) snt_v(seas) normalize additive
computing optimal weights
Iteration 0:   penalized RSS = -190.90242  (not concave)
Iteration 1:   penalized RSS =  -108.8357
Iteration 2:   penalized RSS =  -107.9543
Iteration 3:   penalized RSS = -107.66582
Iteration 4:   penalized RSS = -107.66442
Iteration 5:   penalized RSS = -107.66442
Optimal weights:
                             alpha = 0.1219
                              beta = 0.1580
                             gamma = 0.3340
penalized sum-of-squared residuals = 107.6644
          sum-of-squared residuals = 107.6644
           root mean squared error = 1.640613

The output reveals that the multiplicative model has a better in-sample fit, and the graph below
shows that the forecast from the multiplicative model is higher than that of the additive model.
. line shw1 shwa t if t>=tq(2000q1), title("Multiplicative and additive"
> "Holt-Winters forecasts") xtitle("Time") ytitle("Sales") legend(cols(1))

[Figure: "Multiplicative and additive Holt–Winters forecasts". y axis: Sales (108 to 113); x axis: Time (2000q1 to 2001q1); plotted series: shw parms(0.131 0.143 0.300) = sales and shwadd parms(0.122 0.158 0.334) = sales]

To check whether the estimated seasonal components are intuitively sound, we list the last year's
seasonal components.

. list t seas if seas < .

              t         seas
  37.    1999q1   -2.7533393
  38.    1999q2   -.91752566
  39.    1999q3    1.8082417
  40.    1999q4    1.8626233

The output indicates that the signs of the estimated seasonal components agree with our intuition.

Stored results
tssmooth shwinters stores the following in r():
Scalars
    r(N)           number of observations
    r(alpha)       $\alpha$ smoothing parameter
    r(beta)        $\beta$ smoothing parameter
    r(gamma)       $\gamma$ smoothing parameter
    r(prss)        penalized sum-of-squared errors
    r(rss)         sum-of-squared errors
    r(rmse)        root mean squared error
    r(N_pre)       number of seasons used in calculating starting values
    r(s2_0)        initial value for linear term
    r(s1_0)        initial value for constant term
    r(linear)      final value of linear term
    r(constant)    final value of constant term
    r(period)      period, if filter is seasonal
Macros
    r(method)      shwinters, additive or shwinters, multiplicative
    r(normalize)   normalize, if specified
    r(exp)         expression specified
    r(timevar)     time variable specified in tsset
    r(panelvar)    panel variable specified in tsset
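
As a minimal sketch of how these stored results might be used, the commands below refit the multiplicative model from example 1, display everything left in r(), and then reapply the optimal weights through parms() so that no optimization is run the second time. The variable names shw_chk and shw_fix and the local macro names a, b, and g are illustrative choices, not names used elsewhere in this entry.

```stata
* Refit the multiplicative model from example 1 and inspect r().
use http://www.stata-press.com/data/r13/turksales, clear
tssmooth shwinters shw_chk = sales, forecast(4)
return list

* Copy the optimal weights into local macros (names are arbitrary).
local a = r(alpha)
local b = r(beta)
local g = r(gamma)

* Reapply the same weights; with parms() no optimization is performed.
tssmooth shwinters shw_fix = sales, forecast(4) parms(`a' `b' `g')
display "RMSE with fixed weights: " r(rmse)
```

Because parms() requires each weight to lie strictly between 0 and 1, this pattern assumes the optimizer did not stop on the boundary.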

Methods and formulas


A truncated description of the specified seasonal Holt–Winters filter labels the new variable. See
[D] label for more information on labels.
An untruncated description of the specified seasonal Holt–Winters filter is saved in the characteristic
named tssmooth for the new variable. See [P] char for more information on characteristics.
When the parms() option is not specified, the smoothing parameters are chosen to minimize the
in-sample sum of penalized squared-forecast errors. Sometimes, one or more of the three optimal
parameters lies on the boundary of $[0, 1]$. To keep the estimates inside $[0, 1]$, tssmooth shwinters
parameterizes the objective function in terms of their inverse logits, that is, in terms of $\exp(\alpha)/\{1 + \exp(\alpha)\}$,
$\exp(\beta)/\{1 + \exp(\beta)\}$, and $\exp(\gamma)/\{1 + \exp(\gamma)\}$. When one of these parameters is actually on the
boundary, this can complicate the optimization. For this reason, tssmooth shwinters optimizes a
penalized sum-of-squared forecast errors. Let $\hat{x}_t(\tilde\alpha, \tilde\beta, \tilde\gamma)$ be the forecast for the series $x_t$ given the
choices of $\tilde\alpha$, $\tilde\beta$, and $\tilde\gamma$. Then the in-sample penalized sum-of-squared prediction errors is

$$P = \sum_{t=1}^{T} \Big[ \{x_t - \hat{x}_t(\tilde\alpha, \tilde\beta, \tilde\gamma)\}^2 + I_{|f(\tilde\alpha)|>12}\,(|f(\tilde\alpha)| - 12)^2 + I_{|f(\tilde\beta)|>12}\,(|f(\tilde\beta)| - 12)^2 + I_{|f(\tilde\gamma)|>12}\,(|f(\tilde\gamma)| - 12)^2 \Big]$$

where $f(x) = \ln\{x/(1-x)\}$. The penalty term is zero unless one of the parameters is close to the
boundary. When one of the parameters is close to the boundary, the penalty term will help to obtain
convergence.
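
To give a sense of how close to the boundary a weight must be before the penalty becomes active, the following worked calculation takes $f$ to be the logit function with the threshold of 12 that appears in the penalized objective. The test value $x = 10^{-6}$ is an arbitrary illustration.

```latex
% The penalty is active when |f(x)| > 12 with f(x) = ln{x/(1-x)}:
\[
|f(x)| > 12
\iff x < \frac{1}{1+e^{12}} \approx 6.14\times 10^{-6}
\quad\text{or}\quad
x > 1 - \frac{1}{1+e^{12}} \approx 1 - 6.14\times 10^{-6}.
\]
% Illustration with the arbitrary value x = 10^{-6}:
\[
f(10^{-6}) = \ln\frac{10^{-6}}{1-10^{-6}} \approx -13.8155,
\qquad
\bigl(|f(10^{-6})| - 12\bigr)^2 \approx (1.8155)^2 \approx 3.30 .
\]
```

For any weight between roughly 0.00001 and 0.99999 the indicator is zero, so in most applications the penalized and unpenalized sums of squares coincide, as they do in examples 1 and 2.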
Holt–Winters seasonal multiplicative procedure
As with the other recursive methods in tssmooth, there are three aspects to implementing the
Holt–Winters seasonal multiplicative procedure: the forecasting equation, the initial values, and the
updating equations. Unlike in the other methods, the data are now assumed to be seasonal with period
$L$.
Given the estimates $a(t)$, $b(t)$, and $s(t + \tau - L)$, a $\tau$ step-ahead point forecast of $x_t$, denoted by
$\hat{y}_{t+\tau}$, is

$$\hat{y}_{t+\tau} = \{a(t) + b(t)\tau\}\, s(t + \tau - L)$$

Given the smoothing parameters $\alpha$, $\beta$, and $\gamma$, the updating equations are

$$a(t) = \alpha \frac{x_t}{s(t-L)} + (1-\alpha)\{a(t-1) + b(t-1)\}$$

$$b(t) = \beta\{a(t) - a(t-1)\} + (1-\beta)\, b(t-1)$$

and

$$s(t) = \gamma \left\{\frac{x_t}{a(t)}\right\} + (1-\gamma)\, s(t-L)$$

To restrict the seasonal terms to sum to 1 over each year, specify the normalize option.
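
The updating equations can be traced by hand for a single period. The numbers below are hypothetical and chosen only to keep the arithmetic simple: weights $\alpha = 0.2$, $\beta = 0.1$, $\gamma = 0.3$; previous states $a(t-1) = 98$, $b(t-1) = 1$, $s(t-L) = 1.1$; a new observation $x_t = 110$; and an assumed seasonal term $s(t+1-L) = 0.9$ for the following season.

```latex
\begin{align*}
a(t) &= 0.2\,\frac{110}{1.1} + 0.8\,(98 + 1) = 20 + 79.2 = 99.2 \\
b(t) &= 0.1\,(99.2 - 98) + 0.9\,(1) = 0.12 + 0.90 = 1.02 \\
s(t) &= 0.3\,\frac{110}{99.2} + 0.7\,(1.1) \approx 0.3327 + 0.7700 = 1.1027 \\
\hat{y}_{t+1} &= \{a(t) + b(t)\}\, s(t+1-L) = (99.2 + 1.02)(0.9) \approx 90.20
\end{align*}
```

The seasonal term used in the forecast is the one estimated one full period earlier for the season being forecast, not the term that was just updated.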
The updating equations require the $L + 2$ initial values $a(0)$, $b(0)$, $s(1-L)$, $s(2-L)$, $\ldots$, $s(0)$.
Two methods calculate the initial values with the first $m$ years, each of which contains $L$ seasons.
By default, $m$ is set to the number of seasons in half the sample.
The initial value of the trend component, $b(0)$, can be estimated by

$$b(0) = \frac{\bar{x}_m - \bar{x}_1}{(m-1)L}$$

where $\bar{x}_m$ is the average level of $x_t$ in year $m$ and $\bar{x}_1$ is the average level of $x_t$ in the first year.
The initial value for the linear term, $a(0)$, is then calculated as

$$a(0) = \bar{x}_1 - \frac{L}{2}\, b(0)$$

To calculate the initial values for the seasons $1, 2, \ldots, L$, we first calculate the deviation-adjusted
values,

$$S(t) = \frac{x_t}{\bar{x}_i - \left\{\frac{(L+1)}{2} - j\right\} b(0)}$$

where $i$ is the year that corresponds to time $t$, $j$ is the season that corresponds to time $t$, and $\bar{x}_i$ is
the average level of $x_t$ in year $i$.

Next, for each season $l = 1, 2, \ldots, L$, we define $\bar{s}_l$ as the average $S_t$ over the years. That is,

$$\bar{s}_l = \frac{1}{m} \sum_{k=0}^{m-1} S_{l+kL} \qquad \text{for } l = 1, 2, \ldots, L$$

Then the initial seasonal estimates are

$$s_{0l} = \bar{s}_l \left( \frac{L}{\sum_{l=1}^{L} \bar{s}_l} \right) \qquad \text{for } l = 1, 2, \ldots, L$$

and these values are used to fill in $s(1-L)$, $\ldots$, $s(0)$.


If the altstarts option is specified, the starting values are computed based on a regression with
seasonal indicator variables. Specifically, the series $x_t$ is regressed on a time variable normalized
to equal one in the first period in the sample and on a constant. Then $b(0)$ is set to the estimated
coefficient on the time variable, and $a(0)$ is set to the estimated constant term. To calculate the
seasonal starting values, $x_t$ is regressed on a set of $L$ seasonal dummy variables. The $l$th seasonal
starting value is set to $(1/\mu)\hat\delta_l$, where $\mu$ is the mean of $x_t$ and $\hat\delta_l$ is the estimated coefficient on
the $l$th seasonal dummy variable. The sample used in both regressions and the mean computation is
restricted to include the first samp0() years. By default, samp0() includes half the data.

Technical note
If there are missing values in the first few years, a small value of $m$ can cause the starting value
methods for the seasonal terms to fail. Here you should either specify a larger value of $m$ by using
samp0() or directly specify the seasonal starting values by using the sn0_0() option.

Holt–Winters seasonal additive procedure

This procedure is similar to the previous one, except that the data are assumed to be described by

$$x_t = (\beta_0 + \beta_1 t) + s_t + \epsilon_t$$

As in the multiplicative case, there are three smoothing parameters, $\alpha$, $\beta$, and $\gamma$, which can either
be set or chosen to minimize the in-sample sum-of-squared forecast errors.
The updating equations are

$$a(t) = \alpha\{x_t - s(t-L)\} + (1-\alpha)\{a(t-1) + b(t-1)\}$$

$$b(t) = \beta\{a(t) - a(t-1)\} + (1-\beta)\, b(t-1)$$

and

$$s(t) = \gamma\{x_t - a(t)\} + (1-\gamma)\, s(t-L)$$

To restrict the seasonal terms to sum to 0 over each year, specify the normalize option.
A $\tau$-step-ahead forecast, denoted by $\hat{y}_{t+\tau}$, is given by

$$\hat{y}_{t+\tau} = a(t) + b(t)\tau + s(t + \tau - L)$$


As in the multiplicative case, there are two methods for setting the initial values.
The default method is to obtain the initial values for $a(0)$, $b(0)$, $s(1-L)$, $\ldots$, $s(0)$ from the
regression

$$x_t = a(0) + b(0)t + \beta_{s,1-L} D_1 + \beta_{s,2-L} D_2 + \cdots + \beta_{s,0} D_L + e_t$$

where the $D_1, \ldots, D_L$ are dummy variables with

$$D_i = \begin{cases} 1 & \text{if } t \text{ corresponds to season } i \\ 0 & \text{otherwise} \end{cases}$$

When altstarts is specified, an alternative method is used that regresses the $x_t$ series on a time
variable that has been normalized to equal one in the first period in the sample and on a constant
term. $b(0)$ is set to the estimated coefficient on the time variable, and $a(0)$ is set to the estimated
constant term. Then the demeaned series $\tilde{x}_t = x_t - \mu$ is created, where $\mu$ is the mean of the $x_t$.
The $\tilde{x}_t$ are regressed on $L$ seasonal dummy variables. The $l$th seasonal starting value is then set to
$\delta_l$, where $\delta_l$ is the estimated coefficient on the $l$th seasonal dummy variable. The sample in both the
regression and the mean calculation is restricted to include the first samp0() years, where, by default,
samp0() includes half the data.

Acknowledgment
We thank Nicholas J. Cox of the Department of Geography at Durham University, UK, and coeditor
of the Stata Journal for his helpful comments.

References
Abraham, B., and J. Ledolter. 1983. Statistical Methods for Forecasting. New York: Wiley.
Bowerman, B. L., R. T. O'Connell, and A. B. Koehler. 2005. Forecasting, Time Series, and Regression: An Applied
Approach. 4th ed. Pacific Grove, CA: Brooks/Cole.
Chatfield, C. 2001. Time-Series Forecasting. London: Chapman & Hall/CRC.
Chatfield, C. 2004. The Analysis of Time Series: An Introduction. 6th ed. Boca Raton, FL: Chapman & Hall/CRC.
Chatfield, C., and M. Yar. 1988. Holt–Winters forecasting: Some practical issues. Statistician 37: 129–140.
Holt, C. C. 2004. Forecasting seasonals and trends by exponentially weighted moving averages. International Journal
of Forecasting 20: 5–10.
Montgomery, D. C., L. A. Johnson, and J. S. Gardiner. 1990. Forecasting and Time Series Analysis. 2nd ed. New
York: McGraw–Hill.
Winters, P. R. 1960. Forecasting sales by exponentially weighted moving averages. Management Science 6: 324–342.

Also see
[TS] tsset – Declare data to be time-series data
[TS] tssmooth – Smooth and forecast univariate time-series data

Title
ucm – Unobserved-components model

Syntax                  Menu              Description            Options
Remarks and examples    Stored results    Methods and formulas   References
Also see

Syntax
    ucm depvar [indepvars] [if] [in] [, options]

options                        Description

Model
  model(model)                 specify trend and idiosyncratic components
  seasonal(#)                  include a seasonal component with a period of # time units
  cycle(# [, frequency(#f)])   include a cycle component of order # and optionally set initial
                               frequency to #f, 0 < #f < $\pi$; cycle() may be specified up to
                               three times
  constraints(constraints)     apply specified linear constraints
  collinear                    keep collinear variables

SE/Robust
  vce(vcetype)                 vcetype may be oim or robust

Reporting
  level(#)                     set confidence level; default is level(95)
  nocnsreport                  do not display constraints
  display_options              control column formats, row spacing, display of omitted variables
                               and base and empty cells, and factor-variable labeling

Maximization
  maximize_options             control the maximization process

  coeflegend                   display legend instead of statistics

model                          Description

  rwalk                        random-walk model; the default
  none                         no trend or idiosyncratic component
  ntrend                       no trend component but include idiosyncratic component
  dconstant                    deterministic constant with idiosyncratic component
  llevel                       local-level model
  dtrend                       deterministic-trend model with idiosyncratic component
  lldtrend                     local-level model with deterministic trend
  rwdrift                      random-walk-with-drift model
  lltrend                      local-linear-trend model
  strend                       smooth-trend model
  rtrend                       random-trend model


You must tsset your data before using ucm; see [TS] tsset.
indepvars may contain factor variables; see [U] 11.4.3 Factor variables.
indepvars and depvar may contain time-series operators; see [U] 11.4.4 Time-series varlists.
by, fp, rolling, and statsby are allowed; see [U] 11.1.10 Prefix commands.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.

Menu
Statistics > Time series > Unobserved-components model

Description
Unobserved-components models (UCMs) decompose a time series into trend, seasonal, cyclical,
and idiosyncratic components and allow for exogenous variables. ucm estimates the parameters of
UCMs by maximum likelihood.
All the components are optional. The trend component may be first-order deterministic or it may
be first-order or second-order stochastic. The seasonal component is stochastic; the seasonal effects
at each time period sum to a zero-mean finite-variance random variable. The cyclical component is
modeled by the stochastic-cycle model derived by Harvey (1989).

Options


Model

model(model) specifies the trend and idiosyncratic components. The default is model(rwalk). The
available models are listed in Syntax and discussed in detail in Models for the trend and idiosyncratic
components under Remarks and examples below.
seasonal(#) adds a stochastic-seasonal component to the model. # is the period of the season, that
is, the number of time-series observations required for the period to complete.
cycle(#) adds a stochastic-cycle component of order # to the model. The order # must be 1, 2, or
3. Multiple cycles are added by repeating the cycle(#) option with up to three cycles allowed.
cycle(#, frequency(#f)) specifies #f as the initial value for the central-frequency parameter
in the stochastic-cycle component of order #. #f must be in the interval $(0, \pi)$.
constraints(constraints), collinear; see [R] estimation options.

SE/Robust

vce(vcetype) specifies the estimator for the variance–covariance matrix of the estimator.
vce(oim), the default, causes ucm to use the observed information matrix estimator.
vce(robust) causes ucm to use the Huber/White/sandwich estimator.

Reporting

level(#), nocnsreport; see [R] estimation options.


display_options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel,
fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), and sformat(%fmt); see
[R] estimation options.


Maximization

 
maximize_options: difficult, technique(algorithm_spec), iterate(#), [no]log, trace,
gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#),
nrtolerance(#), and from(matname); see [R] maximize for all options except from(), and
see below for information on from().
from(matname) specifies initial values for the maximization process. from(b0) causes ucm to
begin the maximization algorithm with the values in b0. b0 must be a row vector; the number
of columns must equal the number of parameters in the model; and the values in b0 must be
in the same order as the parameters in e(b).
If your model fails to converge, try using the difficult option. Also see the technical note below
example 5.
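
One simple way to obtain a conformable row vector for from() is to copy e(b) from an earlier fit of the same specification. The sketch below does this with the unemployment data used later in this entry. The matrix name b0 is arbitrary, and restarting from a converged solution is shown only to illustrate the mechanics, since it should change the estimates very little.

```stata
* Fit once, then reuse the coefficient vector as starting values.
use http://www.stata-press.com/data/r13/unrate, clear
ucm unrate, cycle(1)

* e(b) is a row vector ordered exactly as ucm expects in from().
matrix b0 = e(b)
matrix list b0

* Refit the same model, starting the optimizer at b0.
ucm unrate, cycle(1) from(b0)
```

In practice the same pattern is more useful when b0 comes from a simpler or better-behaved fit and is edited before being passed to a harder model with the same parameters.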
The following option is available with ucm but is not shown in the dialog box:
coeflegend; see [R] estimation options.

Remarks and examples


Remarks are presented under the following headings:
An introduction to UCMs
A random-walk model example
Frequency-domain concepts used in the stochastic-cycle model
Another random-walk model example
Comparing UCM and ARIMA
A local-level model example
Comparing UCM and ARIMA, revisited
Models for the trend and idiosyncratic components
Seasonal component

An introduction to UCMs
UCMs decompose a time series into trend, seasonal, cyclical, and idiosyncratic components and
allow for exogenous variables. Formally, UCMs can be written as

$$y_t = \tau_t + \gamma_t + \psi_t + \beta x_t + \epsilon_t \qquad (1)$$

where $y_t$ is the dependent variable, $\tau_t$ is the trend component, $\gamma_t$ is the seasonal component, $\psi_t$ is
the cyclical component, $\beta$ is a vector of fixed parameters, $x_t$ is a vector of exogenous variables, and
$\epsilon_t$ is the idiosyncratic component.
By placing restrictions on $\tau_t$ and $\epsilon_t$, Harvey (1989) derived a series of models for the trend and the
idiosyncratic components. These models are briefly described in Syntax and are further discussed in
Models for the trend and idiosyncratic components. To these models, Harvey (1989) added models for
the seasonal and cyclical components, and he also allowed for the presence of exogenous variables.
It is rare that a UCM contains all the allowed components. For instance, the seasonal component
is rarely needed when modeling deseasonalized data.
Harvey (1989) and Durbin and Koopman (2012) show that UCMs can be written as state-space
models that allow the parameters of a UCM to be estimated by maximum likelihood. In fact, ucm
uses sspace (see [TS] sspace) to perform the estimation calculations; see Methods and formulas for
details.


After estimating the parameters, predict can produce in-sample predictions or out-of-sample
forecasts; see [TS] ucm postestimation. After estimating the parameters of a UCM that contains
a cyclical component, estat period converts the estimated central frequency to an estimated
central period and psdensity estimates the spectral density implied by the model; see [TS] ucm
postestimation and the examples below.
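
A brief sketch of this postestimation workflow follows, using the unemployment data from the examples below. The new variable names trend_hat and cycle_hat are illustrative; the trend and cycle statistics requested from predict are the component predictions documented in [TS] ucm postestimation.

```stata
* Fit a random-walk trend plus one stochastic cycle.
use http://www.stata-press.com/data/r13/unrate, clear
ucm unrate, cycle(1)

* Extract the estimated unobserved components.
predict double trend_hat, trend
predict double cycle_hat, cycle

* Compare the series with its estimated trend, then view the cycle alone.
tsline unrate trend_hat
tsline cycle_hat

* Convert the estimated central frequency to a period.
estat period
```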
We illustrate the basic approach of analyzing data with UCMs, and then we discuss the details of
the different trend models in Models for the trend and idiosyncratic components.
Although the methods implemented in ucm have been widely applied by economists, they are general
time-series techniques and may be of interest to researchers from other disciplines. In example 8, we
analyze monthly data on the reported cases of mumps in New York City.

A random-walk model example


Example 1
We begin by plotting monthly data on the U.S. civilian unemployment rate.

. use http://www.stata-press.com/data/r13/unrate
. tsline unrate, name(unrate)

[Figure: time-series line plot. y axis: Civilian Unemployment Rate (4 to 10); x axis: Month (1950m1 to 2010m1)]

This series looks like it might be well approximated by a random-walk model. Formally, a
random-walk model is given by

$$y_t = \mu_t$$
$$\mu_t = \mu_{t-1} + \eta_t$$

The random-walk model is so frequently applied, at least as a starting model, that it is the default model
for ucm. In the output below, we fit the random-walk model to the unemployment data.

. ucm unrate
searching for initial values ..........
(setting technique to bhhh)
Iteration 0:   log likelihood =  84.272992
Iteration 1:   log likelihood =  84.394942
Iteration 2:   log likelihood =  84.400923
Iteration 3:   log likelihood =  84.401282
Iteration 4:   log likelihood =  84.401305
(switching technique to nr)
Iteration 5:   log likelihood =  84.401306
Refining estimates:
Iteration 0:   log likelihood =  84.401306
Iteration 1:   log likelihood =  84.401307
Unobserved-components model
Components: random walk
Sample: 1948m1 - 2011m1                         Number of obs   =        757
Log likelihood =  84.401307

                               OIM
      unrate       Coef.    Std. Err.       z    P>|z|     [95% Conf. Interval]

  var(level)    .0467196     .002403    19.44    0.000     .0420098    .0514294

Note: Model is not stationary.
Note: Tests of variances against zero are one sided, and the two-sided
      confidence intervals are truncated at zero.

The output indicates that the model is nonstationary, as all random-walk models are.
We consider a richer model in the next example.
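
As an informal cross-check on the estimate of var(level), note that under the random-walk model the first difference of the series equals the level disturbance, so the sample variance of the differenced series should be of roughly the same size as the estimated variance. The sketch below computes that sample variance; it is a rough comparison, not the likelihood calculation that ucm performs, and the two numbers need not match exactly.

```stata
* Under y_t = mu_t, mu_t = mu_{t-1} + eta_t, D.unrate equals eta_t.
use http://www.stata-press.com/data/r13/unrate, clear
quietly summarize D.unrate

* Sample variance of the first differences, for comparison with var(level).
display "sample variance of D.unrate = " r(Var)
```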

Example 2
We suspect that there should be a stationary cyclical component that produces serially correlated
shocks around the random-walk trend. Harvey (1989) derived a stochastic-cycle model for these
stationary cyclical components.
The stochastic-cycle model has three parameters: the frequency at which the random components
are centered, a damping factor that parameterizes the dispersion of the random components around
the central frequency, and the variance of the stochastic-cycle process that acts as a scale factor.


Fitting this model to unemployment data yields

. ucm unrate, cycle(1)
searching for initial values ....................
(setting technique to bhhh)
Iteration 0:   log likelihood =  84.273579
Iteration 1:   log likelihood =  87.852115
Iteration 2:   log likelihood =  88.253422
Iteration 3:   log likelihood =  89.191311
Iteration 4:   log likelihood =  94.675898
(switching technique to nr)
Iteration 5:   log likelihood =  98.394691  (not concave)
Iteration 6:   log likelihood =  98.983092
Iteration 7:   log likelihood =  99.983623
Iteration 8:   log likelihood =  104.83121
Iteration 9:   log likelihood =  114.26885
Iteration 10:  log likelihood =   116.4747
Iteration 11:  log likelihood =  118.45875
Iteration 12:  log likelihood =  118.88058
Iteration 13:  log likelihood =  118.88421
Iteration 14:  log likelihood =  118.88421
Refining estimates:
Iteration 0:   log likelihood =  118.88421
Iteration 1:   log likelihood =  118.88421
Unobserved-components model
Components: random walk, order 1 cycle
Sample: 1948m1 - 2011m1                         Number of obs   =        757
                                                Wald chi2(2)    =   26650.81
Log likelihood =  118.88421                     Prob > chi2     =     0.0000

                               OIM
      unrate       Coef.    Std. Err.       z    P>|z|     [95% Conf. Interval]

   frequency    .0933466    .0103609     9.01    0.000     .0730397    .1136535
     damping    .9820003    .0061121   160.66    0.000     .9700207    .9939798

  var(level)    .0143786    .0051392     2.80    0.003      .004306    .0244511
 var(cycle1)    .0270339    .0054343     4.97    0.000     .0163829    .0376848

Note: Model is not stationary.
Note: Tests of variances against zero are one sided, and the two-sided
      confidence intervals are truncated at zero.

The estimated central frequency for the cyclical component is small, implying that the cyclical
component is centered on low-frequency components. The high damping factor indicates that all the
components from this cyclical component are close to the estimated central frequency. The estimated
variance of the stochastic-cycle process is small but significant.
We use estat period to convert the estimate of the central frequency to an estimated central
period.
. estat period

      cycle1       Coef.    Std. Err.     [95% Conf. Interval]

      period    67.31029    7.471004      52.66739    81.95319
   frequency    .0933466    .0103609      .0730397    .1136535
     damping    .9820003    .0061121      .9700207    .9939798

Note: Cycle time unit is monthly.


Because we have monthly data, the estimated central period of 67.31 implies that the cyclical
component is composed of random components that occur around a central periodicity of about 5.61
years. This estimate falls within the conventional Burns and Mitchell (1946) definition of business-cycle
shocks occurring between 1.5 and 8 years.
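
The conversion from central frequency to central period can be reproduced by hand from the point estimates reported above, using the standard relation between a frequency in radians per time unit and its period. The division by 12 reflects the monthly time unit of these data.

```latex
\[
\text{period} = \frac{2\pi}{\text{frequency}}
             = \frac{2\pi}{0.0933466} \approx 67.31 \text{ months},
\qquad
\frac{67.31}{12} \approx 5.61 \text{ years}.
\]
```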
We can convert the estimated parameters of the cyclical component to an estimated spectral
density of the cyclical component, as described by Harvey (1989). The spectral density of the cyclical
component describes the relative importance of the random components at different frequencies; see
Frequency-domain concepts used in the stochastic-cycle model for details. We use psdensity (see
[TS] psdensity) to obtain the spectral density of the cyclical component implied by the estimated
parameters, and we use twoway line (see [G-2] graph twoway line) to plot the estimated spectral
density.

. psdensity sdensity omega
. line sdensity omega

[Figure: line plot of the estimated spectral density. y axis: UCM cycle 1 spectral density (ticks at 2, 4, 6); x axis: Frequency]

The estimated spectral density shows that the cyclical component is composed of random components
that are tightly distributed at the low-frequency peak.

Frequency-domain concepts used in the stochastic-cycle model


The parameters of the stochastic-cycle model are easiest to interpret in the frequency domain. We
now provide a review of the useful concepts from the frequency domain. Crucial to understanding the
stochastic-cycle model is the frequency-domain concept that a stationary process can be decomposed
into random components that occur at the frequencies in the interval $[0, \pi]$.
We need some concepts from the frequency-domain approach to interpret the parameters in the
stochastic-cycle model of the cyclical component. Here we provide a simple, intuitive explanation.
More technical presentations can be found in Priestley (1981), Harvey (1989, 1993), Hamilton (1994),
Fuller (1996), and Wei (2006).
As with much time-series analysis, the basic results are for covariance-stationary processes with
additional results handling some nonstationary cases. We present some useful results for covariance-stationary
processes. These results provide what we need to interpret the stochastic-cycle model for
the stationary cyclical component.

The autocovariances $\gamma_j$, $j \in \{0, 1, \ldots, \infty\}$, of a covariance-stationary process $y_t$ specify its
variance and dependence structure. In the frequency-domain approach to time-series analysis, the
spectral density describes the importance of the random components that occur at frequency $\omega$ relative
to the components that occur at other frequencies.
The frequency-domain approach focuses on the relative contributions of random components that
occur at the frequencies $[0, \pi]$.
The spectral density can be written as a weighted average of the autocorrelations of $y_t$. Like
autocorrelations, the spectral density is normalized by $\gamma_0$, the variance of $y_t$. Multiplying the spectral
density by $\gamma_0$ yields the power-spectrum of $y_t$.
In an independent and identically distributed (i.i.d.) process, the components at all frequencies are
equally important, so the spectral density is a flat line.
In common parlance, we speak of high-frequency noise making a series look more jagged and of
low-frequency components causing smoother plots. More formally, we say that a process composed
primarily of high-frequency components will have fewer runs above or below the mean than an i.i.d.
process and that a process composed primarily of low-frequency components will have more runs
above or below the mean than an i.i.d. process.
To further formalize these ideas, consider the first-order autoregressive (AR(1)) process given by

$$y_t = \phi y_{t-1} + \epsilon_t$$

where $\epsilon_t$ is a zero-mean, covariance-stationary process with finite variance $\sigma^2$, and $|\phi| < 1$ so that
$y_t$ is covariance stationary. The first-order autocorrelation of this AR(1) process is $\phi$.

Below are plots of simulated data when $\phi$ is set to 0, $-0.8$, and 0.8. When $\phi = 0$, the data are i.i.d.
When $\phi = -0.8$, the value today is strongly negatively correlated with the value yesterday, so this case
should be a prototypical high-frequency noise example. When $\phi = 0.8$, the value today is strongly
positively correlated with the value yesterday, so this case should be a prototypical low-frequency
shock example.

[Figure: three panels of simulated AR(1) data, one each for $\phi = 0$, $\phi = -0.8$, and $\phi = 0.8$; x axis: Time]

The plots above confirm our conjectures. The plot when $\phi = -0.8$ contains fewer runs above or
below the mean, and it is more jagged than the i.i.d. plot. The plot when $\phi = 0.8$ contains more runs
above or below the mean, and it is smoother than the i.i.d. plot.
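
Series like those plotted above can be simulated directly. The sketch below is one way to do so and is not the code used to produce the manual's figure: the seed, the sample size of 500, and the variable names e, y_pos, and y_neg are all arbitrary choices. It builds each AR(1) series recursively from the same standard normal shocks and then reports the first-order sample autocorrelations, which should be near 0.8 and -0.8.

```stata
* Simulate i.i.d. shocks and two AR(1) series driven by them.
clear
set seed 2013
set obs 500
generate t = _n
tsset t
generate double e = rnormal()

* phi = 0.8: replace fills observations in order, so L.y_pos is already set.
generate double y_pos = e in 1
replace y_pos = 0.8*L.y_pos + e in 2/l

* phi = -0.8
generate double y_neg = e in 1
replace y_neg = -0.8*L.y_neg + e in 2/l

* First-order sample autocorrelations.
correlate y_pos L.y_pos
correlate y_neg L.y_neg

* Plot the first 100 observations of each series.
tsline y_neg e y_pos in 1/100
```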

Below we plot the spectral densities for the AR(1) model with $\phi = 0$, $\phi = -0.8$, and $\phi = 0.8$.

[Figure: spectral densities of the three AR(1) processes, $\phi = 0$, $\phi = -0.8$, and $\phi = 0.8$. y axis: Spectral density; x axis: Frequency]

The high-frequency components are much more important to the AR(1) process with $\phi = -0.8$ than
to the i.i.d. process with $\phi = 0$. The low-frequency components are much more important to the
AR(1) process with $\phi = 0.8$ than to the i.i.d. process.
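
The shape of these curves can be reproduced from the textbook expression for the AR(1) spectrum, which is proportional to $(1 - \phi^2)/(1 + \phi^2 - 2\phi\cos\omega)$ once it is normalized by the variance of the process. The sketch below plots that expression for the three values of $\phi$. The vertical scale is only up to a constant factor and is not claimed to match the scaling of the figure above or of psdensity.

```stata
* AR(1) spectral shapes on [0, pi], up to a constant scale factor.
* For phi = 0.8 the ratio of the value at 0 to the value at pi is 81.
twoway (function y = (1 - .8^2)/(1 + .8^2 - 2*.8*cos(x)), range(0 3.1416)) ///
       (function y = (1 - .8^2)/(1 + .8^2 + 2*.8*cos(x)), range(0 3.1416)) ///
       (function y = 1, range(0 3.1416)),                                  ///
       legend(order(1 "phi = 0.8" 2 "phi = -0.8" 3 "phi = 0"))             ///
       xtitle("Frequency") ytitle("Spectral density (up to scale)")
```

With $\phi = 0.8$ the mass piles up near frequency 0, with $\phi = -0.8$ it piles up near $\pi$, and with $\phi = 0$ the curve is flat.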

Technical note
Autoregressive moving-average (ARMA) models parameterize the autocorrelation in a time series
by allowing today's value to be a weighted average of past values and a weighted average of past i.i.d.
shocks; see Hamilton (1994), Wei (2006), and [TS] arima for introductions and a Stata implementation.
The intuitive ARMA parameterization has many nice features, including that one can easily rewrite
the ARMA model as a weighted average of past i.i.d. shocks to trace how a shock feeds through the
system.
Although it is easy to obtain the spectral density of an ARMA process, the parameters themselves
provide limited information about the underlying spectral density.
In contrast, the parameters of the stochastic-cycle parameterization of autocorrelation in a time series
directly provide information about the underlying spectral density. The parameter $\omega_0$ is the central
frequency at which the random components are clustered. If $\omega_0$ is small, then the model is centered
on low-frequency components. If $\omega_0$ is close to $\pi$, then the model is centered on high-frequency
components. The parameter $\rho$ is the damping factor that indicates how tightly clustered the random
components are at the central frequency $\omega_0$. If $\rho$ is close to 0, there is no clustering of the random
components. If $\rho$ is close to 1, the random components are tightly distributed at the central frequency
$\omega_0$.
In the graph below, we draw the spectral densities implied by stochastic-cycle models with
four sets of parameters: $\omega_0 = \pi/4$, $\rho = 0.8$; $\omega_0 = \pi/4$, $\rho = 0.9$; $\omega_0 = 4\pi/5$, $\rho = 0.8$; and
$\omega_0 = 4\pi/5$, $\rho = 0.9$. The graph below illustrates that $\omega_0$ is the central frequency at which the other
important random components are distributed. It also illustrates that the damping parameter $\rho$ controls
the dispersion of the important components at the central frequency.

[Figure: spectral densities of the four stochastic-cycle models ($\omega_0 = \pi/4$ or $4\pi/5$, each with $\rho = 0.8$ or $\rho = 0.9$) against frequency, with ticks at $\pi/4$, $\pi/2$, and $3\pi/4$]

Another random-walk model example


Example 3
Now let's reconsider example 2. Although we might be happy with how our model has identified
a stationary cyclical component that we could interpret in business-cycle terms, we suspect that there
should also be a high-frequency cyclical component. It is difficult to estimate the parameters of a UCM
with two or more stochastic-cycle models. Providing starting values for the central frequencies can be
a crucial help to the optimization procedure. Below we estimate a UCM with two cyclical components.
We use the frequency() suboption to provide starting values for the central frequencies; we specified
the values below because we suspect one model will pick up the low-frequency components and the
other will pick up the high-frequency components. We specified the low-frequency model to be order
2 to make it less peaked for any given damping factor. (Trimbur [2006] provides a nice introduction
and some formal results for higher-order stochastic-cycle models.)
. ucm unrate, cycle(1, frequency(2.9)) cycle(2, frequency(.09))
searching for initial values ....................
(setting technique to bhhh)
Iteration 0:   log likelihood =  115.98563
Iteration 1:   log likelihood =  125.04043
Iteration 2:   log likelihood =  127.69387
Iteration 3:   log likelihood =  134.50864
Iteration 4:   log likelihood =  136.91353
(switching technique to nr)
Iteration 5:   log likelihood =   138.5091
Iteration 6:   log likelihood =  146.09273
Iteration 7:   log likelihood =  146.28132
Iteration 8:   log likelihood =  146.28326
Iteration 9:   log likelihood =  146.28326
Refining estimates:
Iteration 0:   log likelihood =  146.28326
Iteration 1:   log likelihood =  146.28326

Unobserved-components model
Components: random walk, 2 cycles of order 1 2

Sample: 1948m1 - 2011m1                         Number of obs     =        757
                                                Wald chi2(4)      =    7681.33
Log likelihood =  146.28326                     Prob > chi2       =     0.0000

------------------------------------------------------------------------------
             |                 OIM
      unrate |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
cycle1       |
   frequency |   2.882382   .0668017    43.15   0.000     2.751453    3.013311
     damping |   .7004295    .125157     5.60   0.000     .4551262    .9457328
-------------+----------------------------------------------------------------
cycle2       |
   frequency |   .0667929   .0206848     3.23   0.001     .0262514    .1073344
     damping |   .9074708   .0142273    63.78   0.000     .8795858    .9353559
-------------+----------------------------------------------------------------
  var(level) |   .0207704   .0039669     5.24   0.000     .0129953    .0285454
 var(cycle1) |   .0027886   .0014363     1.94   0.026            0    .0056037
 var(cycle2) |    .002714    .001028     2.64   0.004     .0006991    .0047289
------------------------------------------------------------------------------
Note: Model is not stationary.
Note: Tests of variances against zero are one sided, and the two-sided
      confidence intervals are truncated at zero.
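
The note about one-sided tests and truncated intervals can be illustrated with the var(cycle1) row. The Python sketch below is only an approximation of the reported quantities under the assumption of a normal reference distribution; the function name is hypothetical, and the code is not Stata's implementation.

```python
from statistics import NormalDist

def variance_row(estimate, std_err, level=0.95):
    """Return z, the one-sided p-value, and the two-sided CI truncated at zero."""
    normal = NormalDist()
    z = estimate / std_err
    # A variance cannot be negative, so only the upper tail is used for the test.
    p_one_sided = 1.0 - normal.cdf(z)
    crit = normal.inv_cdf(0.5 + level / 2.0)
    lower = max(0.0, estimate - crit * std_err)  # truncate the lower bound at zero
    upper = estimate + crit * std_err
    return z, p_one_sided, lower, upper

# Estimate and standard error taken from the var(cycle1) row of the output.
z, p, lower, upper = variance_row(0.0027886, 0.0014363)
print(f"z = {z:.2f}, one-sided p = {p:.3f}, CI = [{lower:.7f}, {upper:.7f}]")
```

The untruncated lower bound would be slightly negative, which is why the output reports 0 there.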

The output provides some support for the existence of a second, high-frequency cycle. The high-frequency components are centered at 2.88, whereas the low-frequency components are centered at
0.067. That the estimated damping factor is 0.70 for the high-frequency cycle whereas the estimated
damping factor for the low-frequency cycle is 0.91 indicates that the high-frequency components are
more diffusely distributed at 2.88 than the low-frequency components are at 0.067.
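
Because the frequencies are measured in radians per time unit, it can help to convert them to periods with period $= 2\pi/\text{frequency}$. The following Python sketch does that for the two point estimates above; the data are monthly, so the periods are in months. The helper name is illustrative, and the estat period postestimation command is the supported way to obtain such conversions in Stata.

```python
import math

def period_from_frequency(freq):
    """Number of time units needed to complete one cycle at the given frequency."""
    return 2.0 * math.pi / freq

# Point estimates of the central frequencies from the output above.
for name, freq in [("cycle1", 2.882382), ("cycle2", 0.0667929)]:
    print(f"{name}: about {period_from_frequency(freq):.1f} months per cycle")
```

Under this conversion, the high-frequency cycle repeats roughly every two months, whereas the low-frequency cycle takes close to eight years.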
We obtain and plot the estimated spectral densities to get another look at these results.
. psdensity sdensity2a omega2a
. psdensity sdensity2b omega2b, cycle(2)

. line sdensity2a sdensity2b omega2a, legend(col(1))

[Figure: UCM cycle 1 spectral density and UCM cycle 2 spectral density against frequency]

The estimated spectral densities indicate that we have found two distinct cyclical components.

It does not matter whether we specify omega2a or omega2b to be the x-axis variable, because
they are equal to each other.

Technical note
That the estimated spectral densities in the previous example do not overlap is important for
parameter identification. Although the parameters are identified in large-sample theory, we have found
it difficult to estimate the parameters of two cyclical components when the spectral densities overlap.
When the spectral densities of two cyclical components overlap, the parameters may not be well
identified and the optimization procedure may not converge.

Comparing UCM and ARIMA


Example 4
This example provides some insight for readers familiar with autoregressive integrated moving-average (ARIMA) models but not with UCMs. If you are not familiar with ARIMA models, you may
wish to skip this example. See [TS] arima for an introduction to ARIMA models in Stata.
UCMs provide an alternative to ARIMA models implemented in [TS] arima. Neither set of models
is nested within the other, but there are some cases in which instructive comparisons can be made.

The random-walk model corresponds to an ARIMA model that is first-order integrated and has
an i.i.d. error term. In other words, the random-walk UCM and the ARIMA(0,1,0) are asymptotically
equivalent. Thus
ucm unrate

and
arima unrate, arima(0,1,0) noconstant

produce asymptotically equivalent results.


The stochastic-cycle model for the stationary cyclical component is an alternative functional form
for stationary processes to stationary autoregressive moving-average (ARMA) models. Which model
is preferred depends on the application and which parameters a researcher wants to interpret. Both
the functional forms and the parameter interpretations differ between the stochastic-cycle model and
the ARMA model. See Trimbur (2006, eq. 25) for some formal comparisons of the two models.
That both models can be used to estimate the stationary cyclical components for the random-walk
model implies that we can compare the results in this case by comparing their estimated spectral
densities. Below we estimate the parameters of an ARIMA(2,1,1) model and plot the estimated spectral
density of the stationary component.

. arima unrate, noconstant arima(2,1,1)
(setting optimization to BHHH)
Iteration 0:   log likelihood =   129.8801
Iteration 1:   log likelihood =  134.61953
Iteration 2:   log likelihood =  137.04909
Iteration 3:   log likelihood =  137.71386
Iteration 4:   log likelihood =  138.25255
(switching optimization to BFGS)
Iteration 5:   log likelihood =  138.51924
Iteration 6:   log likelihood =  138.81638
Iteration 7:   log likelihood =  138.83615
Iteration 8:   log likelihood =   138.8364
Iteration 9:   log likelihood =  138.83642
Iteration 10:  log likelihood =  138.83642

ARIMA regression

Sample: 1948m2 - 2011m1                         Number of obs     =        756
                                                Wald chi2(3)      =     683.34
Log likelihood =   138.8364                     Prob > chi2       =     0.0000

------------------------------------------------------------------------------
             |                 OPG
    D.unrate |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ARMA         |
          ar |
         L1. |   .5398016   .0586304     9.21   0.000     .4248882    .6547151
         L2. |   .2468148   .0359396     6.87   0.000     .1763744    .3172551
             |
          ma |
         L1. |  -.5146506   .0632838    -8.13   0.000    -.6386845   -.3906167
-------------+----------------------------------------------------------------
      /sigma |   .2013332   .0032644    61.68   0.000     .1949351    .2077313
------------------------------------------------------------------------------
Note: The test of the variance against zero is one sided, and the two-sided
      confidence interval is truncated at zero.
. psdensity sdensity_arma omega_arma
. line sdensity_arma omega_arma

[Figure: ARMA spectral density (ticks .2 to .8) against frequency]

The estimated spectral density from the ARIMA(2,1,1) has a similar shape to the plot obtained by
combining the two spectral densities estimated from the stochastic-cycle model in example 3. For
this particular application, the estimated central frequencies of the two cyclical components from the
stochastic-cycle model provide information about the business-cycle component and the high-frequency
component that is not easily obtained from the ARIMA(2,1,1) model. On the other hand, it is easier
to work out the impulse-response function for the ARMA model than for the stochastic-cycle model,
implying that the ARMA model is easier to use when tracing the effect of a shock feeding through
the system.
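
To illustrate why the impulse-response function is easy to obtain from an ARMA model, the Python sketch below applies the usual recursion for the moving-average weights, $\psi_0 = 1$ and $\psi_j = \theta_j + \sum_i \phi_i \psi_{j-i}$, to the point estimates reported above. The function name is illustrative, and the responses refer to the differenced series D.unrate; cumulating them would give the response of the level.

```python
def arma_impulse_response(ar, ma, horizon):
    """Responses at lags 0..horizon to a unit shock in an ARMA model."""
    psi = []
    for j in range(horizon + 1):
        value = 1.0 if j == 0 else 0.0
        if 1 <= j <= len(ma):
            value += ma[j - 1]            # direct moving-average effect
        for i, phi in enumerate(ar, start=1):
            if j - i >= 0:
                value += phi * psi[j - i]  # effect carried forward by the AR terms
        psi.append(value)
    return psi

# AR and MA point estimates from the ARIMA(2,1,1) output above.
responses = arma_impulse_response([0.5398016, 0.2468148], [-0.5146506], horizon=12)
for lag, value in enumerate(responses):
    print(f"lag {lag:2d}: {value: .4f}")
```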

A local-level model example


Example 5
We now consider the weekly series of initial claims for unemployment insurance in the United
States, which is plotted below.
. use http://www.stata-press.com/data/r13/icsa1, clear
. tsline icsa

[Figure: time-series plot of icsa (axis label "Change in initial claims", ticks 200 to 700) against date, 01jan1970 to 01jan2010]

This series looks like it was generated by a random walk with extra noise, so we want to use a
random-walk model that includes an additional random term. This structure causes the model to be
occasionally known as the random-walk-plus-noise model, but it is more commonly known as the
local-level model in the UCM literature.
The local-level model models the trend as a random walk and models the idiosyncratic components
as independent and identically distributed components. Formally, the local-level model specifies the
observed time-series $y_t$, for $t = 1, \ldots, T$, as

$$y_t = \mu_t + \epsilon_t$$
$$\mu_t = \mu_{t-1} + \eta_t$$

where $\epsilon_t \sim$ i.i.d. $N(0, \sigma^2_\epsilon)$ and $\eta_t \sim$ i.i.d. $N(0, \sigma^2_\eta)$ and are mutually independent.
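
A small simulation can make the two variance components concrete. The Python sketch below generates data from a local-level model and recovers both variances from the first differences, using $\mathrm{Var}(\Delta y_t) = \sigma^2_\eta + 2\sigma^2_\epsilon$ and $\mathrm{Cov}(\Delta y_t, \Delta y_{t-1}) = -\sigma^2_\epsilon$. This is a method-of-moments illustration only, not the maximum likelihood estimator that ucm uses; the variances 1 and 2, the seed, and the function names are hypothetical.

```python
import random
import statistics

def simulate_local_level(n, var_level, var_noise, seed=12345):
    """Simulate y_t = level_t + noise_t, where level_t follows a random walk."""
    rng = random.Random(seed)
    level = 0.0
    series = []
    for _ in range(n):
        level += rng.gauss(0.0, var_level ** 0.5)
        series.append(level + rng.gauss(0.0, var_noise ** 0.5))
    return series

def moment_estimates(series):
    """Recover (var_level, var_noise) from autocovariances of the differences."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    mean = statistics.fmean(diffs)
    centered = [d - mean for d in diffs]
    gamma0 = sum(c * c for c in centered) / len(centered)
    gamma1 = sum(a * b for a, b in zip(centered, centered[1:])) / len(centered)
    return gamma0 + 2.0 * gamma1, -gamma1

y = simulate_local_level(50000, var_level=1.0, var_noise=2.0)
var_level_hat, var_noise_hat = moment_estimates(y)
print(f"level variance ~ {var_level_hat:.2f}, noise variance ~ {var_noise_hat:.2f}")
```

When the noise variance is 0, the first-order autocovariance of the differences vanishes and the model collapses to a pure random walk.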

We fit the local-level model in the output below:
. ucm icsa, model(llevel)
searching for initial values ..........
(setting technique to bhhh)
Iteration 0:   log likelihood = -9982.7798
Iteration 1:   log likelihood = -9913.2745
Iteration 2:   log likelihood = -9894.9925
Iteration 3:   log likelihood = -9893.7191
Iteration 4:   log likelihood = -9893.2876
(switching technique to nr)
Iteration 5:   log likelihood = -9893.2614
Iteration 6:   log likelihood = -9893.2469
Iteration 7:   log likelihood = -9893.2469
Refining estimates:
Iteration 0:   log likelihood = -9893.2469
Iteration 1:   log likelihood = -9893.2469

Unobserved-components model
Components: local level

Sample: 07jan1967 - 19feb2011                   Number of obs     =       2303
Log likelihood = -9893.2469

------------------------------------------------------------------------------
             |                 OIM
        icsa |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  var(level) |    116.558   8.806587    13.24   0.000     99.29745    133.8186
   var(icsa) |   124.2715   7.615506    16.32   0.000     109.3454    139.1976
------------------------------------------------------------------------------
Note: Model is not stationary.
Note: Tests of variances against zero are one sided, and the two-sided
      confidence intervals are truncated at zero.
Note: Time units are in 7 days.

The output indicates that both components are statistically significant.

Technical note
The estimation procedure will not always converge when estimating the parameters of the local-level
model. If the series does not vary enough in the random level, modeled by the random walk, and in
the stationary shocks around the random level, the estimation procedure will not converge because it
will be unable to set the variance of one of the two components to 0.
Take another look at the graphs of unrate and icsa. The extra noise around the random level
that can be seen in the graph of icsa allows us to estimate both variances.
A closely related point is that it is difficult to estimate the parameters of a local-level model with
a stochastic-cycle component because the series must have enough variation to identify the variance
of the random-walk component, the variance of the idiosyncratic term, and the parameters of the
stochastic-cycle component. In some cases, series that look like candidates for the local-level model
are best modeled as random-walk models with stochastic-cycle components.
In fact, convergence can be a problem for most of the models in ucm. Convergence problems
occur most often when there is insufficient variation to estimate the variances of the components in
the model; in that case, the optimization routine fails to converge as it attempts to set a variance
equal to 0. This usually shows up in the iteration log when the log likelihood gets stuck at a
particular value and the message (not concave) or (backed up) is displayed repeatedly. When this
happens, use the iterate() option to limit the number of iterations, look to see which of the variances
is being driven to 0, and drop that component from the model. (This technique is a method to obtain
convergence to interpretable estimates, not a model-selection method.)

Example 6
We might suspect that there is some serial correlation in the idiosyncratic shock. Alternatively,
we could include a cyclical component to model the stationary time-dependence in the series. In the
example below, we add a stochastic-cycle model for the stationary cyclical process, but we drop
the idiosyncratic term and use a random-walk model instead of the local-level model. We change
the model because it is difficult to estimate the variance of the idiosyncratic term along with the
parameters of a stationary cyclical component.
. ucm icsa, model(rwalk) cycle(1)
searching for initial values ....................
(setting technique to bhhh)
Iteration 0:   log likelihood = -10008.167
Iteration 1:   log likelihood = -10007.272
Iteration 2:   log likelihood = -10007.206  (backed up)
Iteration 3:   log likelihood =  -10007.17  (backed up)
Iteration 4:   log likelihood = -10007.148  (backed up)
(switching technique to nr)
Iteration 5:   log likelihood = -10007.137  (not concave)
Iteration 6:   log likelihood = -9885.1932  (not concave)
Iteration 7:   log likelihood = -9884.1636
Iteration 8:   log likelihood = -9881.6478
Iteration 9:   log likelihood = -9881.4496
Iteration 10:  log likelihood = -9881.4441
Iteration 11:  log likelihood = -9881.4441
Refining estimates:
Iteration 0:   log likelihood = -9881.4441
Iteration 1:   log likelihood = -9881.4441

Unobserved-components model
Components: random walk, order 1 cycle

Sample: 07jan1967 - 19feb2011                   Number of obs     =       2303
                                                Wald chi2(2)      =      23.04
Log likelihood = -9881.4441                     Prob > chi2       =     0.0000

------------------------------------------------------------------------------
             |                 OIM
        icsa |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   frequency |   1.469633   .3855657     3.81   0.000     .7139385    2.225328
     damping |   .1644576   .0349537     4.71   0.000     .0959495    .2329656
-------------+----------------------------------------------------------------
  var(level) |   97.90982   8.320047    11.77   0.000     81.60282    114.2168
 var(cycle1) |   149.7323   9.980798    15.00   0.000     130.1703    169.2943
------------------------------------------------------------------------------
Note: Model is not stationary.
Note: Tests of variances against zero are one sided, and the two-sided
      confidence intervals are truncated at zero.
Note: Time units are in 7 days.

Although the output indicates that the model fits well, the small estimate of the damping parameter
indicates that the random components will be widely distributed at the central frequency. To get a
better idea of the dispersion of the components, we look at the estimated spectral density of the
stationary cyclical component.

. psdensity sdensity3 omega3
. line sdensity3 omega3

[Figure: UCM cycle 1 spectral density (ticks .145 to .17) against frequency]

The graph shows that the random components that make up the cyclical component are diffusely
distributed at a central frequency.

Comparing UCM and ARIMA, revisited


Example 7
Including lags of the dependent variable is an alternative method for modeling serially correlated
errors. The estimated coefficients on the lags of the dependent variable estimate the coefficients in an
autoregressive model for the stationary cyclical component; see Harvey (1989, 47–48) for a discussion.
Including lags of the dependent variable should be viewed as an alternative to the stochastic-cycle
model for the stationary cyclical component. In this example, we use the large-sample equivalence of
the random-walk model with pth order autoregressive errors and an ARIMA(p, 1, 0) to illustrate this
point.

In the output below, we include 2 lags of the dependent variable in the random-walk UCM.
. ucm icsa L(1/2).icsa, model(rwalk)
searching for initial values ..........
(setting technique to bhhh)
Iteration 0:   log likelihood = -10044.209
Iteration 1:   log likelihood = -9975.8312
Iteration 2:   log likelihood = -9953.5727
Iteration 3:   log likelihood = -9936.7489
Iteration 4:   log likelihood = -9927.2306
(switching technique to nr)
Iteration 5:   log likelihood = -9918.9538
Iteration 6:   log likelihood = -9890.8306
Iteration 7:   log likelihood =  -9889.562
Iteration 8:   log likelihood = -9889.5608
Iteration 9:   log likelihood = -9889.5608
Refining estimates:
Iteration 0:   log likelihood = -9889.5608
Iteration 1:   log likelihood = -9889.5608

Unobserved-components model
Components: random walk

Sample: 21jan1967 - 19feb2011                   Number of obs     =       2301
                                                Wald chi2(2)      =     271.88
Log likelihood = -9889.5608                     Prob > chi2       =     0.0000

------------------------------------------------------------------------------
             |                 OIM
        icsa |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        icsa |
         L1. |  -.3250633   .0205148   -15.85   0.000    -.3652715   -.2848551
         L2. |  -.1794686   .0205246    -8.74   0.000    -.2196961   -.1392411
-------------+----------------------------------------------------------------
  var(level) |   317.6474    9.36691    33.91   0.000     299.2886    336.0062
------------------------------------------------------------------------------
Note: Model is not stationary.
Note: Tests of variances against zero are one sided, and the two-sided
      confidence intervals are truncated at zero.
Note: Time units are in 7 days.

Now we use arima to estimate the parameters of an asymptotically equivalent ARIMA(2,1,0) model.
(We specify the technique(nr) option so that arima will compute the observed information matrix
standard errors that ucm computes.) We use nlcom to compute a point estimate and a standard error
for the variance, which is directly comparable to the one produced by ucm.

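. arima icsa, noconstant arima(2,1,0) technique(nr)
Iteration 0:   log likelihood = -9896.4584
Iteration 1:   log likelihood =  -9896.458

ARIMA regression

Sample: 14jan1967 - 19feb2011                   Number of obs     =       2302
                                                Wald chi2(2)      =     271.95
Log likelihood =  -9896.458                     Prob > chi2       =     0.0000

------------------------------------------------------------------------------
             |                 OIM
      D.icsa |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ARMA         |
          ar |
         L1. |  -.3249383   .0205036   -15.85   0.000    -.3651246    -.284752
         L2. |  -.1793353   .0205088    -8.74   0.000    -.2195317   -.1391388
-------------+----------------------------------------------------------------
      /sigma |   17.81606   .2625695    67.85   0.000     17.30143    18.33068
------------------------------------------------------------------------------
Note: The test of the variance against zero is one sided, and the two-sided
      confidence interval is truncated at zero.
. nlcom _b[sigma:_cons]^2
       _nl_1:  _b[sigma:_cons]^2

------------------------------------------------------------------------------
      D.icsa |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _nl_1 |   317.4119   9.355904    33.93   0.000     299.0746    335.7491
------------------------------------------------------------------------------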

It is no accident that the parameter estimates and the standard errors from the two estimators
are so close. As the sample size grows, the differences in the parameter estimates and the estimated
standard errors will go to 0, because the two estimators are equivalent in large samples.
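
The nlcom step relies on the delta method: for $g(\sigma) = \sigma^2$, the standard error is approximately $|g'(\sigma)| \times \mathrm{se}(\sigma) = 2\sigma \times \mathrm{se}(\sigma)$. The Python sketch below reproduces that arithmetic from the /sigma row of the arima output; the function name is illustrative, and small discrepancies from the reported figures reflect rounding of the displayed inputs.

```python
def variance_from_sigma(sigma, se_sigma):
    """Delta-method point estimate and standard error for sigma squared."""
    variance = sigma ** 2
    se_variance = 2.0 * sigma * se_sigma  # |d(sigma^2)/d(sigma)| * se(sigma)
    return variance, se_variance

# Point estimate and standard error from the /sigma row above.
variance, se_variance = variance_from_sigma(17.81606, 0.2625695)
print(f"variance = {variance:.4f}, std. err. = {se_variance:.6f}")
```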

Models for the trend and idiosyncratic components


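A general model that allows for fixed or stochastic trends in $\tau_t$ is given by

$$\mu_t = \mu_{t-1} + \alpha_{t-1} + \eta_t \qquad (2)$$
$$\alpha_t = \alpha_{t-1} + \xi_t \qquad (3)$$

where $\tau_t = \mu_t$, $\mu_t$ is the level, and $\alpha_t$ is the slope, as in Methods and formulas.
Following Harvey (1989), we define 11 flexible models for $y_t$ that specify both $\tau_t$ and $\epsilon_t$ in (1).
These models place restrictions on the general model specified in (2) and (3) and on $\epsilon_t$ in (1). In
other words, these models jointly specify $\tau_t$ and $\epsilon_t$.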
To any of these models, a cyclical component, a seasonal component, or exogenous variables may
be added.

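Table 1. Models for the trend and idiosyncratic components

Model name                            Syntax option      Model
----------------------------------------------------------------------------------------------
No trend or idiosyncratic component   model(none)
No trend                              model(ntrend)      $y_t = \epsilon_t$
Deterministic constant                model(dconstant)   $y_t = \mu + \epsilon_t$
                                                         $\mu = \mu$
Local level                           model(llevel)      $y_t = \mu_t + \epsilon_t$
                                                         $\mu_t = \mu_{t-1} + \eta_t$
Random walk                           model(rwalk)       $y_t = \mu_t$
                                                         $\mu_t = \mu_{t-1} + \eta_t$
Deterministic trend                   model(dtrend)      $y_t = \mu_t + \epsilon_t$
                                                         $\mu_t = \mu_{t-1} + \alpha$
                                                         $\alpha = \alpha$
Local level with deterministic trend  model(lldtrend)    $y_t = \mu_t + \epsilon_t$
                                                         $\mu_t = \mu_{t-1} + \alpha + \eta_t$
                                                         $\alpha = \alpha$
Random walk with drift                model(rwdrift)     $y_t = \mu_t$
                                                         $\mu_t = \mu_{t-1} + \alpha + \eta_t$
                                                         $\alpha = \alpha$
Local linear trend                    model(lltrend)     $y_t = \mu_t + \epsilon_t$
                                                         $\mu_t = \mu_{t-1} + \alpha_{t-1} + \eta_t$
                                                         $\alpha_t = \alpha_{t-1} + \xi_t$
Smooth trend                          model(strend)      $y_t = \mu_t + \epsilon_t$
                                                         $\mu_t = \mu_{t-1} + \alpha_{t-1}$
                                                         $\alpha_t = \alpha_{t-1} + \xi_t$
Random trend                          model(rtrend)      $y_t = \mu_t$
                                                         $\mu_t = \mu_{t-1} + \alpha_{t-1}$
                                                         $\alpha_t = \alpha_{t-1} + \xi_t$
----------------------------------------------------------------------------------------------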

The majority of the models available in ucm are designed for nonstationary time series. The
deterministic-trend model incorporates a first-order deterministic time-trend in the model. The local-level, random-walk, local-level-with-deterministic-trend, and random-walk-with-drift models are for
modeling series with first-order stochastic trends. A series with a dth-order stochastic trend must be
differenced d times to be stationary. The local-linear-trend, smooth-trend, and random-trend models
are for modeling series with second-order stochastic trends.
The no-trend-or-idiosyncratic-component model is useful for using ucm to model stationary series
with cyclical components or seasonal components and perhaps exogenous variables. The no-trend and
the deterministic-constant models are useful for using ucm to model stationary series with seasonal
components or exogenous variables.

Seasonal component
A seasonal component models cyclical behavior in a time series that occurs at known seasonal
periodicities. A seasonal component is modeled in the time domain; the period of the cycle is specified
as the number of time periods required for the cycle to complete.

Example 8
Let's begin by considering a series that displays a seasonal effect. Below we plot a monthly series
containing the number of new cases of mumps in New York City between January 1928 and December
1972. (See Hipel and McLeod [1994] for the source and further discussion of this dataset.)
. use http://www.stata-press.com/data/r13/mumps, clear
. tsline mumps

[Figure: time-series plot of the number of mumps cases reported in NYC (ticks 500 to 2000) against month, 1930m1 to 1970m1]

The graph reveals recurring spikes at regular intervals, which we suspect to be seasonal effects. The
series may or may not be stationary; the graph evidence is not definitive.
Deterministic seasonal effects are a standard method of incorporating seasonality into a model. In a
model with a constant term, the $s$ deterministic seasonal effects are modeled as $s$ parameters subject to
the constraint that they sum to zero; formally, $\gamma_t + \gamma_{t-1} + \cdots + \gamma_{t-(s-1)} = 0$. A stochastic-seasonal
model is a more flexible alternative that allows the seasonal effects at time $t$ to sum to $\omega_t$, a zero-mean,
finite-variance, i.i.d. random variable; formally, $\gamma_t + \gamma_{t-1} + \cdots + \gamma_{t-(s-1)} = \omega_t$.
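
The difference between the two seasonal specifications can be seen in a short simulation. The Python sketch below generates seasonal effects from the recursion in which each new effect equals minus the sum of the previous $s - 1$ effects plus a shock. With a zero shock variance the pattern repeats exactly every $s$ periods (the deterministic case); with a positive variance the pattern drifts. The starting values, period, and function name are hypothetical.

```python
import random

def simulate_seasonal(initial, n, sd=0.0, seed=1):
    """Extend the seasonal effects in `initial` (s - 1 values, oldest first) by n periods."""
    rng = random.Random(seed)
    effects = list(initial)
    lags = len(initial)  # s - 1 past effects enter the recursion
    for _ in range(n):
        shock = rng.gauss(0.0, sd) if sd > 0 else 0.0
        effects.append(-sum(effects[-lags:]) + shock)
    return effects

# Period s = 4 with three hypothetical starting effects.
deterministic = simulate_seasonal([3.0, -1.0, -4.0], n=9)
stochastic = simulate_seasonal([3.0, -1.0, -4.0], n=9, sd=0.5)
print("deterministic:", deterministic)
print("stochastic:   ", [round(v, 2) for v in stochastic])
```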
In the output below, we model the seasonal effects by a stochastic-seasonal model, we allow for
the series to follow a random walk, and we include a stationary cyclical component.

. ucm mumps, seasonal(12) cycle(1)
searching for initial values ...................
(setting technique to bhhh)
Iteration 0:   log likelihood = -3268.1808
Iteration 1:   log likelihood = -3256.5168
Iteration 2:   log likelihood =  -3254.609
Iteration 3:   log likelihood = -3250.3542
Iteration 4:   log likelihood = -3249.3591
(switching technique to nr)
Iteration 5:   log likelihood = -3248.9226
Iteration 6:   log likelihood = -3248.7178
Iteration 7:   log likelihood = -3248.7138
Iteration 8:   log likelihood = -3248.7138
Refining estimates:
Iteration 0:   log likelihood = -3248.7138
Iteration 1:   log likelihood = -3248.7138

Unobserved-components model
Components: random walk, seasonal(12), order 1 cycle

Sample: 1928m1 - 1972m6                         Number of obs     =        534
                                                Wald chi2(2)      =    2141.69
Log likelihood = -3248.7138                     Prob > chi2       =     0.0000

------------------------------------------------------------------------------
             |                 OIM
       mumps |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   frequency |   .3863607   .0282037    13.70   0.000     .3310824    .4416389
     damping |   .8405622   .0197933    42.47   0.000     .8017681    .8793563
-------------+----------------------------------------------------------------
  var(level) |   221.2131   140.5179     1.57   0.058            0    496.6231
var(seasonal)|   4.151639   4.383442     0.95   0.172            0    12.74303
 var(cycle1) |   12228.17   813.8394    15.03   0.000     10633.08    13823.27
------------------------------------------------------------------------------
Note: Model is not stationary.
Note: Tests of variances against zero are one sided, and the two-sided
      confidence intervals are truncated at zero.

The output indicates that the trend and seasonal variances may not be necessary. When the variance of
the seasonal component is zero, the seasonal component becomes deterministic. Below we estimate the
parameters of a model that includes deterministic seasonal effects and a stationary cyclical component.
. ucm mumps ibn.month, model(none) cycle(1)
searching for initial values .......
(setting technique to bhhh)
Iteration 0:   log likelihood = -3944.7035
Iteration 1:   log likelihood =  -3646.639
Iteration 2:   log likelihood =  -3546.182
Iteration 3:   log likelihood = -3468.1879
Iteration 4:   log likelihood = -3432.8603
(switching technique to nr)
Iteration 5:   log likelihood = -3405.0632
Iteration 6:   log likelihood = -3285.9443
Iteration 7:   log likelihood = -3283.0404
Iteration 8:   log likelihood = -3283.0284
Iteration 9:   log likelihood = -3283.0284
Refining estimates:
Iteration 0:   log likelihood = -3283.0284
Iteration 1:   log likelihood = -3283.0284

Unobserved-components model
Components: order 1 cycle

Sample: 1928m1 - 1972m6                         Number of obs     =        534
                                                Wald chi2(14)     =    3404.29
Log likelihood = -3283.0284                     Prob > chi2       =     0.0000

------------------------------------------------------------------------------
             |                 OIM
       mumps |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
cycle1       |
   frequency |   .3272754   .0262922    12.45   0.000     .2757436    .3788071
     damping |    .844874   .0184994    45.67   0.000     .8086157    .8811322
-------------+----------------------------------------------------------------
mumps        |
       month |
          1  |   480.5095   32.67128    14.71   0.000      416.475     544.544
          2  |   561.9174   32.66999    17.20   0.000     497.8854    625.9494
          3  |   832.8666   32.67696    25.49   0.000     768.8209    896.9122
          4  |   894.0747   32.64568    27.39   0.000     830.0904    958.0591
          5  |   869.6568   32.56282    26.71   0.000     805.8348    933.4787
          6  |   770.1562   32.48587    23.71   0.000     706.4851    833.8274
          7  |    433.839   32.50165    13.35   0.000     370.1369     497.541
          8  |   218.2394   32.56712     6.70   0.000      154.409    282.0698
          9  |    140.686   32.64138     4.31   0.000      76.7101     204.662
         10  |   148.5876   32.69067     4.55   0.000     84.51508    212.6601
         11  |   215.0958   32.70311     6.58   0.000     150.9989    279.1927
         12  |   330.2232   32.68906    10.10   0.000     266.1538    394.2926
-------------+----------------------------------------------------------------
 var(cycle1) |   13031.53   798.2719    16.32   0.000     11466.95    14596.11
------------------------------------------------------------------------------
Note: Tests of variances against zero are one sided, and the two-sided
      confidence intervals are truncated at zero.

The output indicates that each of these components is statistically significant.

Technical note
In a stochastic model for the seasonal component, the seasonal effects sum to the random variable
$\omega_t \sim$ i.i.d. $N(0, \sigma^2_\omega)$:

$$\gamma_t = -\sum_{j=1}^{s-1} \gamma_{t-j} + \omega_t$$

Stored results
Because ucm is estimated using sspace, most of the sspace stored results appear after ucm. Not
all of these results are relevant for ucm; programmers wishing to treat ucm results as sspace results
should see Stored results of [TS] sspace. See Methods and formulas for the state-space representation
of UCMs, and see [TS] sspace for more documentation that relates to all the stored results.

ucm stores the following in e():

Scalars
    e(N)               number of observations
    e(k)               number of parameters
    e(k_aux)           number of auxiliary parameters
    e(k_eq)            number of equations in e(b)
    e(k_dv)            number of dependent variables
    e(k_cycles)        number of stochastic cycles
    e(df_m)            model degrees of freedom
    e(ll)              log likelihood
    e(chi2)            $\chi^2$
    e(p)               significance
    e(tmin)            minimum time in sample
    e(tmax)            maximum time in sample
    e(stationary)      1 if the estimated parameters indicate a stationary model, 0 otherwise
    e(rank)            rank of VCE
    e(ic)              number of iterations
    e(rc)              return code
    e(converged)       1 if converged, 0 otherwise

Macros
    e(cmd)             ucm
    e(cmdline)         command as typed
    e(depvar)          unoperated names of dependent variables in observation equations
    e(covariates)      list of covariates
    e(indeps)          independent variables
    e(tvar)            variable denoting time within groups
    e(eqnames)         names of equations
    e(model)           type of model
    e(title)           title in estimation output
    e(tmins)           formatted minimum time
    e(tmaxs)           formatted maximum time
    e(chi2type)        Wald; type of model $\chi^2$ test
    e(vce)             vcetype specified in vce()
    e(vcetype)         title used to label Std. Err.
    e(opt)             type of optimization
    e(initial_values)  type of initial values
    e(technique)       maximization technique
    e(tech_steps)      iterations taken in maximization technique
    e(properties)      b V
    e(estat_cmd)       program used to implement estat
    e(predict)         program used to implement predict
    e(marginsok)       predictions allowed by margins
    e(marginsnotok)    predictions disallowed by margins

Matrices
    e(b)               parameter vector
    e(Cns)             constraints matrix
    e(ilog)            iteration log (up to 20 iterations)
    e(gradient)        gradient vector
    e(V)               variance-covariance matrix of the estimators
    e(V_modelbased)    model-based variance

Functions
    e(sample)          marks estimation sample

Methods and formulas


Methods and formulas are presented under the following headings:
Introduction
State-space formulation
Cyclical component extensions

Introduction
The general form of UCMs can be expressed as

$$y_t = \tau_t + \gamma_t + \psi_t + \boldsymbol{\beta}\mathbf{x}_t + \epsilon_t$$

where $\tau_t$ is the trend, $\gamma_t$ is the seasonal component, $\psi_t$ is the cycle, $\boldsymbol{\beta}$ is the regression coefficients
for regressors $\mathbf{x}_t$, and $\epsilon_t$ is the idiosyncratic error with variance $\sigma^2_\epsilon$.
We can decompose the trend as

$$\tau_t = \mu_t$$
$$\mu_t = \mu_{t-1} + \alpha_{t-1} + \eta_t$$
$$\alpha_t = \alpha_{t-1} + \xi_t$$

where $\mu_t$ is the local level, $\alpha_t$ is the local slope, and $\eta_t$ and $\xi_t$ are i.i.d. normal errors with mean 0
and variance $\sigma^2_\eta$ and $\sigma^2_\xi$, respectively.
Next consider the seasonal component, $\gamma_t$, with a period of $s$ time units. Ignoring a seasonal
disturbance term, the seasonal effects will sum to zero, $\sum_{j=0}^{s-1} \gamma_{t-j} = 0$. Adding a normal error term,
$\omega_t$, with mean 0 and variance $\sigma^2_\omega$, we express the seasonal component as

$$\gamma_t = -\sum_{j=1}^{s-1} \gamma_{t-j} + \omega_t$$

Finally, the cyclical component, $\psi_t$, is a function of the frequency $\lambda$, in radians, and a unit-less
scaling variable $\rho$, termed the damping effect, $0 < \rho < 1$. We require two equations to express the
cycle:

$$\psi_t = \psi_{t-1}\,\rho\cos\lambda + \tilde\psi_{t-1}\,\rho\sin\lambda + \kappa_t$$
$$\tilde\psi_t = -\psi_{t-1}\,\rho\sin\lambda + \tilde\psi_{t-1}\,\rho\cos\lambda + \tilde\kappa_t$$

where the $\kappa_t$ and $\tilde\kappa_t$ disturbances are normally distributed with mean 0 and variance $\sigma^2_\kappa$.
The disturbance terms $\epsilon_t$, $\eta_t$, $\xi_t$, $\omega_t$, $\kappa_t$, and $\tilde\kappa_t$ are independent.

State-space formulation
ucm is an easy-to-use implementation of the state-space command sspace, with special modifications, where the local linear trend components, seasonal components, and cyclical components are
states of the state-space model. The state-space model can be expressed in matrix form as

$$y_t = \mathbf{D}\mathbf{z}_t + \mathbf{F}\mathbf{x}_t + \epsilon_t$$
$$\mathbf{z}_t = \mathbf{A}\mathbf{z}_{t-1} + \mathbf{C}\boldsymbol{\zeta}_t$$

where $y_t$, $t = 1, \ldots, T$, are the observations and $\mathbf{z}_t$ are the unobserved states. The number of states,
$m$, depends on the model specified. The $k \times 1$ vector $\mathbf{x}_t$ contains the exogenous variables specified
as indepvars, and the $1 \times k$ vector $\mathbf{F}$ contains the regression coefficients to be estimated. $\epsilon_t$ is the
observation equation disturbance, and the $m_0 \times 1$ vector $\boldsymbol{\zeta}_t$ contains the state equation disturbances,
where $m_0 \le m$. Finally, $\mathbf{C}$ is a $m \times m_0$ matrix of zeros and ones. These recursive equations are
evaluated using the diffuse Kalman filter of De Jong (1991).

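Below we give the state-space matrix structures for a local linear trend with a stochastic seasonal
component, with a period of 4 time units, and an order-2 cycle. The state vector, $\mathbf{z}_t$, and its transition
matrix, $\mathbf{A}$, have the structure

$$
\mathbf{A} = \begin{pmatrix}
1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & -1 & -1 & -1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & \rho\cos\lambda & \rho\sin\lambda & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & -\rho\sin\lambda & \rho\cos\lambda & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & \rho\cos\lambda & \rho\sin\lambda \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & -\rho\sin\lambda & \rho\cos\lambda
\end{pmatrix}
\qquad
\mathbf{z}_t = \begin{pmatrix}
\mu_t \\ \alpha_t \\ \gamma_t \\ \gamma_{t-1} \\ \gamma_{t-2} \\ \psi_{t,1} \\ \tilde\psi_{t,1} \\ \psi_{t,2} \\ \tilde\psi_{t,2}
\end{pmatrix}
$$

$$
\mathbf{C} = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}
\qquad
\boldsymbol{\zeta}_t = \begin{pmatrix}
\eta_t \\ \xi_t \\ \omega_t \\ \kappa_t \\ \tilde\kappa_t
\end{pmatrix}
$$

$$\mathbf{D} = (1 \;\; 0 \;\; 1 \;\; 0 \;\; 0 \;\; 1 \;\; 0 \;\; 0 \;\; 0)$$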
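
For readers who want to experiment with this structure, the Python sketch below assembles the transition, disturbance-selection, and observation matrices for a local linear trend, a stochastic seasonal, and a higher-order cycle as nested lists. It is a hedged illustration of the layout described above, not the code that ucm or sspace runs; the function name, the assumed ordering of the states and disturbances, and the parameter values are illustrative.

```python
import math

def ucm_matrices(period, rho, lam, order=2):
    """Build A, C, and D for trend + seasonal(period) + cycle of the given order."""
    n_seas = period - 1                # seasonal states
    m = 2 + n_seas + 2 * order         # total number of states
    m0 = 5                             # level, slope, seasonal, and two cycle shocks
    A = [[0.0] * m for _ in range(m)]
    C = [[0.0] * m0 for _ in range(m)]
    D = [0.0] * m

    # Local linear trend: level and slope.
    A[0][0] = A[0][1] = A[1][1] = 1.0
    C[0][0] = C[1][1] = 1.0
    D[0] = 1.0

    # Stochastic seasonal: current effect is minus the sum of the stored effects.
    s0 = 2
    for j in range(n_seas):
        A[s0][s0 + j] = -1.0
    for j in range(1, n_seas):
        A[s0 + j][s0 + j - 1] = 1.0    # shift older effects down the state vector
    C[s0][2] = 1.0
    D[s0] = 1.0

    # Cycle: damped rotation blocks chained together; shocks enter the last block.
    c0 = s0 + n_seas
    c, s = rho * math.cos(lam), rho * math.sin(lam)
    for j in range(order):
        r = c0 + 2 * j
        A[r][r], A[r][r + 1] = c, s
        A[r + 1][r], A[r + 1][r + 1] = -s, c
        if j < order - 1:
            A[r][r + 2] = 1.0
            A[r + 1][r + 3] = 1.0
    last = c0 + 2 * (order - 1)
    C[last][3] = 1.0
    C[last + 1][4] = 1.0
    D[c0] = 1.0
    return A, C, D

A, C, D = ucm_matrices(period=4, rho=0.9, lam=0.5)
for row in A:
    print(" ".join(f"{value:6.3f}" for value in row))
```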

Cyclical component extensions


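Recall that the stochastic cyclical model is given by

$$\psi_t = \rho(\psi_{t-1}\cos\lambda_c + \tilde\psi_{t-1}\sin\lambda_c) + \kappa_{t,1}$$
$$\tilde\psi_t = \rho(-\psi_{t-1}\sin\lambda_c + \tilde\psi_{t-1}\cos\lambda_c) + \kappa_{t,2}$$

where $\kappa_{t,j} \sim$ i.i.d. $N(0, \sigma^2_\kappa)$ and $0 < \rho < 1$ is a damping effect. The cycle is variance-stationary
when $\rho < 1$ because $\mathrm{Var}(\psi_t) = \sigma^2_\kappa/(1 - \rho^2)$. We will express a UCM with a cyclical component added
to a trend as

$$y_t = \mu_t + \psi_t + \epsilon_t$$

where $\mu_t$ can be any of the trend parameterizations discussed earlier.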
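
The stationary variance of the first-order cycle can be checked by iterating the covariance recursion implied by the two cycle equations, $\mathbf{P} \leftarrow \mathbf{T}\mathbf{P}\mathbf{T}' + \sigma^2_\kappa \mathbf{I}$, where $\mathbf{T}$ is the damped rotation matrix. The Python sketch below does this for hypothetical parameter values and compares the limit with $\sigma^2_\kappa/(1 - \rho^2)$; the function name and iteration count are illustrative.

```python
import math

def stationary_cycle_variance(rho, lam, sigma2=1.0, iterations=2000):
    """Iterate P <- T P T' + sigma2*I for T = rho * rotation(lam); return Var(psi_t)."""
    c, s = rho * math.cos(lam), rho * math.sin(lam)
    p11, p12, p22 = 0.0, 0.0, 0.0  # symmetric 2 x 2 covariance matrix, started at zero
    for _ in range(iterations):
        n11 = c * c * p11 + 2.0 * c * s * p12 + s * s * p22 + sigma2
        n12 = -c * s * p11 + (c * c - s * s) * p12 + c * s * p22
        n22 = s * s * p11 - 2.0 * c * s * p12 + c * c * p22 + sigma2
        p11, p12, p22 = n11, n12, n22
    return p11

rho, lam = 0.9, 0.7
print("iterated variance:  ", stationary_cycle_variance(rho, lam))
print("sigma2/(1 - rho^2): ", 1.0 / (1.0 - rho ** 2))
```

The limit does not depend on the frequency, because the rotation redistributes variance between the two cycle states without changing its total.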
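Higher-order cycles, $k = 2$ or $k = 3$, are defined as

$$\psi_{t,j} = \rho(\psi_{t-1,j}\cos\lambda_c + \tilde\psi_{t-1,j}\sin\lambda_c) + \psi_{t-1,j+1}$$
$$\tilde\psi_{t,j} = \rho(-\psi_{t-1,j}\sin\lambda_c + \tilde\psi_{t-1,j}\cos\lambda_c) + \tilde\psi_{t-1,j+1}$$

for $j < k$, and

$$\psi_{t,k} = \rho(\psi_{t-1,k}\cos\lambda_c + \tilde\psi_{t-1,k}\sin\lambda_c) + \kappa_{t,1}$$
$$\tilde\psi_{t,k} = \rho(-\psi_{t-1,k}\sin\lambda_c + \tilde\psi_{t-1,k}\cos\lambda_c) + \kappa_{t,2}$$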

Harvey and Trimbur (2003) discuss the properties of this model and its state-space formulation.

Andrew Charles Harvey (1947– ) is a British econometrician. After receiving degrees in economics
and statistics from the University of York and the London School of Economics and working
for a period in Kenya, he has worked as a teacher and researcher at the University of Kent,
the London School of Economics, and now the University of Cambridge. Harvey's interests are
centered on time series, especially state-space models, signal extraction, volatility, and changes
in quantiles.

References
Burns, A. F., and W. C. Mitchell. 1946. Measuring Business Cycles. New York: National Bureau of Economic
Research.
De Jong, P. 1991. The diffuse Kalman filter. Annals of Statistics 19: 1073–1083.
Durbin, J., and S. J. Koopman. 2012. Time Series Analysis by State Space Methods. 2nd ed. Oxford: Oxford
University Press.
Fuller, W. A. 1996. Introduction to Statistical Time Series. 2nd ed. New York: Wiley.
Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
Harvey, A. C. 1989. Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge: Cambridge
University Press.
Harvey, A. C. 1993. Time Series Models. 2nd ed. Cambridge, MA: MIT Press.
Harvey, A. C., and T. M. Trimbur. 2003. General model-based filters for extracting cycles and trends in economic
time series. The Review of Economics and Statistics 85: 244–255.
Hipel, K. W., and A. I. McLeod. 1994. Time Series Modelling of Water Resources and Environmental Systems.
Amsterdam: Elsevier.
Priestley, M. B. 1981. Spectral Analysis and Time Series. London: Academic Press.
Trimbur, T. M. 2006. Properties of higher order stochastic cycles. Journal of Time Series Analysis 27: 1–17.
Wei, W. W. S. 2006. Time Series Analysis: Univariate and Multivariate Methods. 2nd ed. Boston: Pearson.

Also see
[TS] ucm postestimation Postestimation tools for ucm
[TS] arima ARIMA, ARMAX, and other dynamic regression models
[TS] sspace State-space models
[TS] tsfilter Filter a time-series, keeping only selected periodicities
[TS] tsset Declare data to be time-series data
[TS] tssmooth Smooth and forecast univariate time-series data
[TS] var Vector autoregressive models
[U] 20 Estimation and postestimation commands

Title
ucm postestimation Postestimation tools for ucm
Description
Syntax for predict
Menu for predict
Options for predict
Syntax for estat period
Menu for estat
Options for estat period
Remarks and examples
Methods and formulas
Also see

Description
The following postestimation commands are of special interest after ucm:
Command            Description
----------------------------------------------------------------------------
estat period       display cycle periods in time units
psdensity          estimate the spectral density
----------------------------------------------------------------------------

The following standard postestimation commands are also available:

Command            Description
----------------------------------------------------------------------------
estat ic           Akaike's and Schwarz's Bayesian information criteria
                     (AIC and BIC)
estat summarize    summary statistics for the estimation sample
estat vce          variance–covariance matrix of the estimators (VCE)
estimates          cataloging estimation results
forecast           dynamic forecasts and simulations
lincom             point estimates, standard errors, testing, and inference
                     for linear combinations of coefficients
lrtest             likelihood-ratio test
nlcom              point estimates, standard errors, testing, and inference
                     for nonlinear combinations of coefficients
predict            predictions, residuals, influence statistics, and other
                     diagnostic measures
predictnl          point estimates, standard errors, testing, and inference
                     for generalized predictions
test               Wald tests of simple and composite linear hypotheses
testnl             Wald tests of nonlinear hypotheses
----------------------------------------------------------------------------

Special-interest postestimation commands


estat period transforms an estimated central frequency to an estimated period after ucm.


Syntax for predict


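predict [type] {stub* | newvarlist} [if] [in] [, statistic options]

statistic                  Description
----------------------------------------------------------------------------
Main
  xb                       linear prediction using exogenous variables
  trend                    trend component
  seasonal                 seasonal component
  cycle                    cyclical component
  residuals                residuals
  rstandard                standardized residuals
----------------------------------------------------------------------------
These statistics are available both in and out of sample; type
predict ... if e(sample) ... if wanted only for the estimation sample.

options                    Description
----------------------------------------------------------------------------
Options
  rmse(stub* | newvarlist) put estimated root mean squared errors of
                             predicted statistics in the new variable
  dynamic(time_constant)   begin dynamic forecast at specified time

Advanced
  smethod(method)          method for predicting unobserved components
----------------------------------------------------------------------------

method                     Description
----------------------------------------------------------------------------
onestep                    predict using past information
smooth                     predict using all sample information
filter                     predict using past and contemporaneous information
----------------------------------------------------------------------------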

Menu for predict


Statistics > Postestimation > Predictions, residuals, etc.

Options for predict




Main

xb, trend, seasonal, cycle, residuals, and rstandard specify the statistic to be predicted.
xb, the default, calculates the linear predictions using the exogenous variables. xb may not be
used with the smethod(filter) option.
trend estimates the unobserved trend component.
seasonal estimates the unobserved seasonal component.
cycle estimates the unobserved cyclical component.


residuals calculates the residuals in the equation for the dependent variable. residuals may
not be specified with dynamic().
rstandard calculates the standardized residuals, which are the residuals normalized to have unit
variances. rstandard may not be specified with the smethod(filter), smethod(smooth),
or dynamic() option.

Options

rmse(stub* | newvarlist) puts the root mean squared errors of the predicted statistic into the specified
new variable. Multiple variables are only required for predicting cycles of a model that has more
than one cycle. The root mean squared errors measure the variances due to the disturbances but
do not account for estimation error. The stub* syntax is for models with multiple cycles, where
you provide the prefix and predict will add a numeric suffix for each predicted cycle.
dynamic(time_constant) specifies when predict should start producing dynamic forecasts. The
specified time_constant must be in the scale of the time variable specified in tsset, and the
time_constant must be inside a sample for which observations on the dependent variable are
available. For example, dynamic(tq(2008q4)) causes dynamic predictions to begin in the fourth
quarter of 2008, assuming that your time variable is quarterly; see [D] datetime. If the model
contains exogenous variables, they must be present for the whole predicted sample. dynamic()
may not be specified with the rstandard, residuals, or smethod(smooth) option.
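The stub* form of rmse() described above can be sketched for a model with more than one cycle. The variable y and the stubs cyc and cyc_rmse are hypothetical names chosen for illustration, and the data are assumed to have been tsset.

```stata
* Hypothetical series y with two stochastic cycles (orders 1 and 2).
ucm y, cycle(1) cycle(2)

* One new variable per cycle is created from each stub; predict appends
* a numeric suffix to the stub for each predicted cycle.
predict cyc*, cycle rmse(cyc_rmse*)
```

Because these root mean squared errors reflect only the disturbance variances, intervals built from them will tend to be somewhat narrower than intervals that also account for estimation error.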

Advanced

smethod(method) specifies the method for predicting the unobserved components. smethod() causes
different amounts of information on the dependent variable to be used in predicting the components
at each time period.
smethod(onestep), the default, causes predict to estimate the components at each time period
using previous information on the dependent variable. The Kalman filter is performed on
previous periods, but only the one-step predictions are made for the current period.
smethod(smooth) causes predict to estimate the components at each time period using all
the sample data by the Kalman smoother. smethod(smooth) may not be specified with the
rstandard option.
smethod(filter) causes predict to estimate the components at each time period using previous
and contemporaneous data by the Kalman filter. The Kalman filter is performed on previous
periods and the current period. smethod(filter) may not be specified with the xb option.
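The three settings of smethod() differ only in how much of the dependent variable's history is used at each date. A minimal sketch, assuming a ucm model with a trend component has just been fit (for instance, the model in Example 1 below) and using hypothetical new variable names:

```stata
* One-step predictions: information through period t-1 (the default)
predict trend_onestep, trend smethod(onestep)

* Filtered estimates: information through period t
predict trend_filter, trend smethod(filter)

* Smoothed estimates: information from the whole sample
predict trend_smooth, trend smethod(smooth)

* Compare the three estimates of the trend
tsline trend_onestep trend_filter trend_smooth
```

The smoothed series is usually the least noisy of the three because each estimate conditions on observations both before and after the period in question.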

Syntax for estat period


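estat period [, options]

options            Description
----------------------------------------------------------------------------
Main
  level(#)         set confidence level; default is level(95)
  cformat(%fmt)    numeric format
----------------------------------------------------------------------------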


Menu for estat


Statistics > Postestimation > Reports and statistics

Options for estat period




Options

level(#) specifies the confidence level, as a percentage, for confidence intervals. The default is
level(95) or as set by set level; see [U] 20.7 Specifying the width of confidence intervals.
cformat(%fmt) sets the display format for the numeric values in the table. The default is cformat(%9.0g).

Remarks and examples


We assume that you have already read [TS] ucm. In this entry, we illustrate some features of
predict after using ucm to estimate the parameters of an unobserved-components model.
All predictions after ucm depend on the unobserved components, which are estimated recursively
using a Kalman filter. Changing the sample can alter the state estimates, which can change all other
predictions.

Example 1
We begin by modeling monthly data on the median duration of unemployment spells in the United
States. We include a stochastic-seasonal component because the data have not been seasonally adjusted.
. use http://www.stata-press.com/data/r13/uduration2
(BLS data, not seasonally adjusted)
. ucm duration, seasonal(12) cycle(1) difficult
searching for initial values ....................
(setting technique to bhhh)
Iteration 0:   log likelihood = -409.79452
Iteration 1:   log likelihood = -403.38288
Iteration 2:   log likelihood = -403.37351  (backed up)
Iteration 3:   log likelihood = -403.36878  (backed up)
Iteration 4:   log likelihood = -403.36759  (backed up)
(switching technique to nr)
Iteration 5:   log likelihood = -403.36699  (backed up)
Iteration 6:   log likelihood = -397.87773  (not concave)
Iteration 7:   log likelihood = -396.44601  (not concave)
Iteration 8:   log likelihood = -394.58451  (not concave)
Iteration 9:   log likelihood = -392.58307  (not concave)
Iteration 10:  log likelihood =  -389.9884  (not concave)
Iteration 11:  log likelihood =   -388.885
Iteration 12:  log likelihood = -388.65318
Iteration 13:  log likelihood = -388.29788
Iteration 14:  log likelihood = -388.26268
Iteration 15:  log likelihood = -388.25677
Iteration 16:  log likelihood = -388.25675
Refining estimates:
Iteration 0:   log likelihood = -388.25675
Iteration 1:   log likelihood = -388.25675

Unobserved-components model
Components: random walk, seasonal(12), order 1 cycle
Sample: 1967m7 - 2008m12                        Number of obs   =        498
                                                Wald chi2(2)    =       7.17
Log likelihood = -388.25675                     Prob > chi2     =     0.0277

------------------------------------------------------------------------------
              |                 OIM
     duration |      Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
--------------+---------------------------------------------------------------
    frequency |   1.641531   .7250323     2.26   0.024    .2204938    3.062568
      damping |   .2671232   .1050168     2.54   0.011    .0612939    .4729524
--------------+---------------------------------------------------------------
   var(level) |   .1262922   .0221428     5.70   0.000    .0828932    .1696912
var(seasonal) |   .0017289   .0009647     1.79   0.037           0    .0036196
  var(cycle1) |   .0641496   .0211839     3.03   0.001    .0226299    .1056693
------------------------------------------------------------------------------
Note: Model is not stationary.
Note: Tests of variances against zero are one sided, and the two-sided
      confidence intervals are truncated at zero.

Below we predict the trend and the seasonal components to get a look at the model fit.
. predict strend, trend
. predict season, seasonal
. tsline duration strend, name(trend) nodraw legend(rows(1))
. tsline season, name(season) yline(0,lwidth(vthin)) nodraw

. graph combine trend season, rows(2)

[Figure: two stacked time-series panels plotted against Month, 1970m1 to 2010m1. Top panel: median duration of unemployment together with "trend, onestep". Bottom panel: "seasonal, onestep".]


The trend tracks the data well. That the seasonal component appears to change over time indicates
that the stochastic-seasonal component might fit better than a deterministic-seasonal component.
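The estimated central frequency from this model can be converted to a period with estat period. The sketch below assumes the ucm results from this example are still in memory; the second command repeats the conversion by hand from the reported frequency estimate, using the relation period = 2*pi/frequency.

```stata
* Convert the estimated central frequency to a period, with a
* confidence interval
estat period

* The same point estimate by hand, using the frequency reported above
display 2*_pi/1.641531
```

With a frequency of about 1.64, the implied period is a little under 4 months.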

Example 2
In this example, we use the model to forecast the median unemployment duration. We use the root
mean squared error of the prediction to compute a confidence interval of our dynamic predictions.
Recall that the root mean squared error accounts for variances due to the disturbances but not due to
the estimation error.

. tsappend, add(12)
. predict duration_f, dynamic(tm(2009m1)) rmse(rmse)
. scalar z = invnormal(0.95)
. generate lbound = duration_f - z*rmse if tm>=tm(2008m12)
(497 missing values generated)
. generate ubound = duration_f + z*rmse if tm>=tm(2008m12)
(497 missing values generated)
. label variable lbound "90% forecast interval"
. twoway (tsline duration duration_f if tm>=tm(2006m1))
>        (tsrline lbound ubound if tm>=tm(2008m12)),
>        ysize(2) xtitle("") legend(cols(1))

[Figure: median duration of unemployment and "xb prediction, duration, dynamic(tm(2009m1))", 2006m1 to 2010m1, with the 90% forecast interval (lbound/ubound) shown from 2008m12 onward.]

The model forecasts a large temporary increase in the median duration of unemployment.

Methods and formulas


For details on the ucm postestimation methods, see [TS] sspace postestimation.
See [TS] psdensity for the methods used to estimate the spectral density.

Also see
[TS] ucm Unobserved-components model
[TS] psdensity Parametric spectral density estimation after arima, arfima, and ucm
[TS] sspace postestimation Postestimation tools for sspace
[U] 20 Estimation and postestimation commands

Title
var intro Introduction to vector autoregressive models

Description

Remarks and examples

References

Also see

Description
Stata has a suite of commands for fitting, forecasting, interpreting, and performing inference
on vector autoregressive (VAR) models and structural vector autoregressive (SVAR) models. The suite
includes several commands for estimating and interpreting impulse–response functions (IRFs),
dynamic-multiplier functions, and forecast-error variance decompositions (FEVDs). The table below describes
the available commands.

Fitting a VAR or SVAR
  var            [TS] var            Fit vector autoregressive models
  svar           [TS] var svar       Fit structural vector autoregressive models
  varbasic       [TS] varbasic       Fit a simple VAR and graph IRFs or FEVDs

Model diagnostics and inference
  varstable      [TS] varstable      Check the stability condition of VAR or SVAR estimates
  varsoc         [TS] varsoc         Obtain lag-order selection statistics for VARs and VECMs
  varwle         [TS] varwle         Obtain Wald lag-exclusion statistics after var or svar
  vargranger     [TS] vargranger     Perform pairwise Granger causality tests after var or svar
  varlmar        [TS] varlmar        Perform LM test for residual autocorrelation after var or svar
  varnorm        [TS] varnorm        Test for normally distributed disturbances after var or svar

Forecasting after fitting a VAR or SVAR
  fcast compute  [TS] fcast compute  Compute dynamic forecasts after var, svar, or vec
  fcast graph    [TS] fcast graph    Graph forecasts after fcast compute

Working with IRFs, dynamic-multiplier functions, and FEVDs
  irf            [TS] irf            Create and analyze IRFs, dynamic-multiplier functions, and FEVDs

This entry provides an overview of vector autoregressions and structural vector autoregressions.
More rigorous treatments can be found in Hamilton (1994), Lütkepohl (2005), and Amisano and
Giannini (1997). Stock and Watson (2001) provide an excellent nonmathematical treatment of vector
autoregressions and their role in macroeconomics. Becketti (2013) provides an excellent introduction
to VAR analysis with an emphasis on how it is done in practice.


Remarks and examples


Remarks are presented under the following headings:
Introduction to VARs
Introduction to SVARs
Short-run SVAR models
Long-run restrictions
IRFs and FEVDs

Introduction to VARs
A VAR is a model in which $K$ variables are specified as linear functions of $p$ of their own lags,
$p$ lags of the other $K - 1$ variables, and possibly additional exogenous variables. Algebraically, a
$p$-order VAR model, written VAR($p$), with exogenous variables $x_t$ is given by

$$
y_t = v + A_1 y_{t-1} + \cdots + A_p y_{t-p} + B_0 x_t + B_1 x_{t-1} + \cdots + B_s x_{t-s} + u_t,
\qquad t \in \{-\infty, \infty\} \tag{1}
$$

where

  $y_t = (y_{1t}, \ldots, y_{Kt})'$ is a $K \times 1$ random vector,
  $A_1$ through $A_p$ are $K \times K$ matrices of parameters,
  $x_t$ is an $M \times 1$ vector of exogenous variables,
  $B_0$ through $B_s$ are $K \times M$ matrices of coefficients,
  $v$ is a $K \times 1$ vector of parameters, and
  $u_t$ is assumed to be white noise; that is,
    $E(u_t) = 0$,
    $E(u_t u_t') = \Sigma$, and
    $E(u_t u_s') = 0$ for $t \neq s$

There are $K^2 p + K\{M(s + 1) + 1\}$ parameters in the equation for $y_t$, and there are
$\{K(K + 1)\}/2$ parameters in the covariance matrix $\Sigma$. One way to reduce the number of parameters
is to specify an incomplete VAR, in which some of the $A$ or $B$ matrices are set to zero. Another way
is to specify linear constraints on some of the coefficients in the VAR.
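To see how quickly the parameter count grows, consider a small worked case. The values $K = 3$, $p = 2$, $M = 1$, and $s = 0$ are chosen purely for illustration. The number of parameters in the equation for $y_t$ is

$$
K^2 p + K\{M(s+1) + 1\} = 3^2 \cdot 2 + 3\{1 \cdot (0+1) + 1\} = 18 + 6 = 24
$$

and the covariance matrix $\Sigma$ adds

$$
\frac{K(K+1)}{2} = \frac{3 \cdot 4}{2} = 6
$$

distinct parameters. Raising the lag order to $p = 4$ with everything else unchanged would increase the first count to $36 + 6 = 42$.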
A VAR can be viewed as the reduced form of a system of dynamic simultaneous equations. Consider
the system

$$
W_0 y_t = a + W_1 y_{t-1} + \cdots + W_p y_{t-p} + \widetilde W_0 x_t + \widetilde W_1 x_{t-1} + \cdots + \widetilde W_s x_{t-s} + e_t \tag{2}
$$

where $a$ is a $K \times 1$ vector of parameters, each $W_i$, $i = 0, \ldots, p$, is a $K \times K$ matrix of parameters,
and $e_t$ is a $K \times 1$ disturbance vector. In the traditional dynamic simultaneous equations approach,
sufficient restrictions are placed on the $W_i$ to obtain identification. Assuming that $W_0$ is nonsingular,
(2) can be rewritten as

$$
\begin{aligned}
y_t = {} & W_0^{-1} a + W_0^{-1} W_1 y_{t-1} + \cdots + W_0^{-1} W_p y_{t-p} \\
& + W_0^{-1} \widetilde W_0 x_t + W_0^{-1} \widetilde W_1 x_{t-1} + \cdots + W_0^{-1} \widetilde W_s x_{t-s} + W_0^{-1} e_t
\end{aligned} \tag{3}
$$

which is a VAR with

  $v = W_0^{-1} a$
  $A_i = W_0^{-1} W_i$
  $B_i = W_0^{-1} \widetilde W_i$
  $u_t = W_0^{-1} e_t$


The cross-equation error variance–covariance matrix $\Sigma$ contains all the information about
contemporaneous correlations in a VAR and may be the VAR's greatest strength and its greatest weakness.
Because no questionable a priori assumptions are imposed, fitting a VAR allows the dataset to speak
for itself. However, without imposing some restrictions on the structure of $\Sigma$, we cannot make a
causal interpretation of the results.
If we make additional technical assumptions, we can derive another representation of the VAR in
(1). If the VAR is stable (see [TS] varstable), we can rewrite $y_t$ as

$$
y_t = \mu + \sum_{i=0}^{\infty} D_i x_{t-i} + \sum_{i=0}^{\infty} \Phi_i u_{t-i} \tag{4}
$$

where $\mu$ is the $K \times 1$ time-invariant mean of the process and $D_i$ and $\Phi_i$ are $K \times M$ and $K \times K$
matrices of parameters, respectively. Equation (4) states that the process by which the variables in
$y_t$ fluctuate about their time-invariant means, $\mu$, is completely determined by the parameters in
$D_i$ and $\Phi_i$ and the (infinite) past history of the exogenous variables $x_t$ and the independent and
identically distributed (i.i.d.) shocks or innovations, $u_{t-1}, u_{t-2}, \ldots$. Equation (4) is known as the
vector moving-average representation of the VAR. The $D_i$ are the dynamic-multiplier functions, or
transfer functions. The moving-average coefficients $\Phi_i$ are also known as the simple IRFs at horizon
$i$. The precise relationships between the VAR parameters and the $D_i$ and $\Phi_i$ are derived in Methods
and formulas of [TS] irf create.
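The simplest case makes the moving-average representation concrete. As a sketch, take a stable VAR(1) with no exogenous variables, $y_t = v + A_1 y_{t-1} + u_t$. Repeated substitution gives

$$
y_t = (I_K - A_1)^{-1} v + \sum_{i=0}^{\infty} A_1^{\,i}\, u_{t-i}
$$

so that in this special case $\mu = (I_K - A_1)^{-1} v$ and $\Phi_i = A_1^{\,i}$. For a scalar example with illustrative values $A_1 = 0.5$ and $v = 1$, the mean is $\mu = 1/(1 - 0.5) = 2$ and the simple IRFs are $\Phi_0 = 1$, $\Phi_1 = 0.5$, $\Phi_2 = 0.25$, and so on, decaying geometrically because the process is stable.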
The joint distribution of $y_t$ is determined by the distributions of $x_t$ and $u_t$ and the parameters
$v$, $B_i$, and $A_i$. Estimating the parameters in a VAR requires that the variables in $y_t$ and $x_t$ be
covariance stationary, meaning that their first two moments exist and are time invariant. If the $y_t$ are
not covariance stationary, but their first differences are, a vector error-correction model (VECM) can
be used. See [TS] vec intro and [TS] vec for more information about those models.
If the $u_t$ form a zero mean, i.i.d. vector process, and $y_t$ and $x_t$ are covariance stationary and are
not correlated with the $u_t$, consistent and efficient estimates of the $B_i$, the $A_i$, and $v$ are obtained
via seemingly unrelated regression, yielding estimators that are asymptotically normally distributed.
When the equations for the variables $y_t$ have the same set of regressors, equation-by-equation OLS
estimates are the conditional maximum likelihood estimates.
Much of the interest in VAR models is focused on the forecasts, IRFs, dynamic-multiplier functions,
and the FEVDs, all of which are functions of the estimated parameters. Estimating these functions is
straightforward, but their asymptotic standard errors are usually obtained by assuming that $u_t$ forms
a zero mean, i.i.d. Gaussian (normal) vector process. Also, some of the specification tests for VARs
have been derived using the likelihood-ratio principle and the stronger Gaussian assumption.
In the absence of contemporaneous exogenous variables, the disturbance variance–covariance
matrix $\Sigma$ contains all the information about contemporaneous correlations among the variables. VARs
are sometimes classified into three types by how they account for this contemporaneous correlation.
(See Stock and Watson [2001] for one derivation of this taxonomy.) A reduced-form VAR, aside
from estimating the variance–covariance matrix of the disturbance, does not try to account for
contemporaneous correlations. In a recursive VAR, the $K$ variables are assumed to form a recursive
dynamic structural equation model in which the first variable is a function of lagged variables, the
second is a function of contemporaneous values of the first variable and lagged values, and so on.
In a structural VAR, the theory you are working with places restrictions on the contemporaneous
correlations that are not necessarily recursive.
Stata has two commands for fitting reduced-form VARs: var and varbasic. var allows for
constraints to be imposed on the coefficients. varbasic allows you to fit a simple VAR quickly
without constraints and graph the IRFs.
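The two commands can be compared on the same data. The sketch below assumes the lutkepohl2 example dataset is available from the Stata Press site under the same r13 path used for the other datasets in this manual; the choice of variables and lag order is only illustrative.

```stata
* Quarterly data on investment, income, and consumption
* (first differences of logs); assumed to be shipped already tsset
use http://www.stata-press.com/data/r13/lutkepohl2, clear

* Reduced-form VAR with lags 1 and 2 of every dependent variable
var dln_inv dln_inc dln_consump, lags(1/2)

* Same model through varbasic; nograph suppresses the IRF graph
varbasic dln_inv dln_inc dln_consump, lags(1/2) nograph
```

var is the more flexible of the two because it accepts constraints() and exog(); varbasic trades that flexibility for a one-line fit-and-graph workflow.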


Because fitting a VAR of the correct order can be important, varsoc offers several methods for
choosing the lag order p of the VAR to fit. After fitting a VAR, and before proceeding with inference,
interpretation, or forecasting, checking that the VAR fits the data is important. varlmar can be used
to check for autocorrelation in the disturbances. varwle performs Wald tests to determine whether
certain lags can be excluded. varnorm tests the null hypothesis that the disturbances are normally
distributed. varstable checks the eigenvalue condition for stability, which is needed to interpret the
IRFs and FEVDs.
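The specification checks described in this paragraph can be strung together as follows. This is a sketch, again assuming the lutkepohl2 example dataset; the maximum lag of 4 and the fitted lag order are arbitrary illustrative choices.

```stata
use http://www.stata-press.com/data/r13/lutkepohl2, clear

* Lag-order selection statistics for lags 0 through 4
varsoc dln_inv dln_inc dln_consump, maxlag(4)

* Fit the VAR at the chosen order (lags 1 and 2 here)
var dln_inv dln_inc dln_consump, lags(1/2)

* LM test for autocorrelation in the disturbances
varlmar, mlag(2)

* Wald lag-exclusion tests
varwle

* Tests of normally distributed disturbances
varnorm

* Eigenvalue stability condition
varstable
```

Each postestimation command uses the most recent var results, so the order of the checks after fitting does not matter.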

Introduction to SVARs
As discussed in [TS] irf create, a problem with VAR analysis is that, because $\Sigma$ is not restricted
to be a diagonal matrix, an increase in an innovation to one variable provides information about the
innovations to other variables. This implies that no causal interpretation of the simple IRFs is possible:
there is no way to determine whether the shock to the first variable caused the shock in the second
variable or vice versa.
However, suppose that we had a matrix $P$ such that $\Sigma = PP'$. We can then show that the variables
in $P^{-1} u_t$ have zero mean and that $E\{P^{-1} u_t (P^{-1} u_t)'\} = I_K$. We could rewrite (4) as

$$
\begin{aligned}
y_t &= \mu + \sum_{s=0}^{\infty} \Phi_s P P^{-1} u_{t-s} \\
    &= \mu + \sum_{s=0}^{\infty} \Theta_s P^{-1} u_{t-s} \\
    &= \mu + \sum_{s=0}^{\infty} \Theta_s w_{t-s}
\end{aligned} \tag{5}
$$

where $\Theta_s = \Phi_s P$ and $w_t = P^{-1} u_t$. If we had such a $P$, the $w_k$ would be mutually orthogonal,
and the $\Theta_s$ would allow the causal interpretation that we seek.
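One matrix that satisfies $\Sigma = PP'$ is the lower-triangular Cholesky factor, which is what a recursive ordering uses. The sketch below works with a made-up 2 x 2 covariance matrix; the numbers are illustrative only and are not taken from any fitted model.

```stata
* Illustrative covariance matrix of the reduced-form innovations
matrix Sigma = (4, 2 \ 2, 5)

* Lower-triangular P with Sigma = P*P'
matrix P = cholesky(Sigma)
matrix list P

* Check that P*P' reproduces Sigma
matrix PPt = P*P'
matrix list PPt
```

For this Sigma, the factor has rows (2, 0) and (1, 2), and multiplying it by its transpose returns the original matrix. Other choices of P also satisfy the same equation, which is why SVAR models need identifying restrictions.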
SVAR models provide a framework for estimation of and inference about a broad class of P
matrices. As described in [TS] irf create, the estimated P matrices can then be used to estimate
structural IRFs and structural FEVDs. There are two types of SVAR models. Short-run SVAR models
identify a P matrix by placing restrictions on the contemporaneous correlations between the variables.
Long-run SVAR models, on the other hand, do so by placing restrictions on the long-term accumulated
effects of the innovations.

Short-run SVAR models


A short-run SVAR model without exogenous variables can be written as

$$
A(I_K - A_1 L - A_2 L^2 - \cdots - A_p L^p)\, y_t = A \epsilon_t = B e_t \tag{6}
$$

where $L$ is the lag operator; $A$, $B$, and $A_1, \ldots, A_p$ are $K \times K$ matrices of parameters; $\epsilon_t$ is a
$K \times 1$ vector of innovations with $\epsilon_t \sim N(0, \Sigma)$ and $E[\epsilon_t \epsilon_s'] = 0_K$ for all $s \neq t$; and $e_t$ is a $K \times 1$
vector of orthogonalized disturbances; that is, $e_t \sim N(0, I_K)$ and $E[e_t e_s'] = 0_K$ for all $s \neq t$.
These transformations of the innovations allow us to analyze the dynamics of the system in terms
of a change to an element of $e_t$. In a short-run SVAR model, we obtain identification by placing
restrictions on $A$ and $B$, which are assumed to be nonsingular.


Equation (6) implies that $P_{sr} = A^{-1} B$, where $P_{sr}$ is the $P$ matrix identified by a particular
short-run SVAR model. The latter equality in (6) implies that

$$
A \epsilon_t \epsilon_t' A' = B e_t e_t' B'
$$

Taking the expectation of both sides yields

$$
\Sigma = P_{sr} P_{sr}'
$$

Assuming that the underlying VAR is stable (see [TS] varstable for a discussion of stability), we
can invert the autoregressive representation of the model in (6) to an infinite-order, moving-average
representation of the form

$$
y_t = \mu + \sum_{s=0}^{\infty} \Theta_s^{sr} e_{t-s} \tag{7}
$$

whereby $y_t$ is expressed in terms of the mutually orthogonal, unit-variance structural innovations $e_t$.
The $\Theta_s^{sr}$ contain the structural IRFs at horizon $s$.
In a short-run SVAR model, the $A$ and $B$ matrices model all the information about contemporaneous
correlations. The $B$ matrix also scales the innovations $u_t$ to have unit variance. This allows the
structural IRFs constructed from (7) to be interpreted as the effect on variable $i$ of a one-time unit
increase in the structural innovation to variable $j$ after $s$ periods.

$P_{sr}$ identifies the structural IRFs by defining a transformation of $\Sigma$, and $P_{sr}$ is identified by
the restrictions placed on the parameters in $A$ and $B$. Because there are only $K(K + 1)/2$ free
parameters in $\Sigma$, only $K(K + 1)/2$ parameters may be estimated in an identified $P_{sr}$. Because there
are $2K^2$ total parameters in $A$ and $B$, the order condition for identification requires that at least
$2K^2 - K(K + 1)/2$ restrictions be placed on those parameters. Just as in the simultaneous-equations
framework, this order condition is necessary but not sufficient. Amisano and Giannini (1997) derive
a method to check that an SVAR model is locally identified near some specified values for A and B.
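The order condition can be checked by counting on a small recursive example. With K = 3 there are 2K^2 = 18 parameters in A and B and K(K + 1)/2 = 6 free parameters in the covariance matrix, so at least 12 restrictions are needed. The sketch below, which assumes the lutkepohl2 example dataset, leaves exactly 6 elements free (marked by missing values) and fixes the other 12.

```stata
use http://www.stata-press.com/data/r13/lutkepohl2, clear

* A: unit diagonal, zeros above it, 3 free elements below it
matrix A = (1, 0, 0 \ ., 1, 0 \ ., ., 1)

* B: diagonal with 3 free elements, zeros elsewhere
matrix B = (., 0, 0 \ 0, ., 0 \ 0, 0, .)

* Short-run SVAR; missing entries are estimated, the rest are constrained
svar dln_inv dln_inc dln_consump, aeq(A) beq(B)
```

Because the number of free elements equals the number of free parameters in the covariance matrix, this recursive specification is exactly identified by the order condition.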
Before moving on to models with long-run constraints, consider these limitations. We cannot place
constraints on the elements of A in terms of the elements of B, or vice versa. This limitation is
imposed by the form of the check for identification derived by Amisano and Giannini (1997). As
noted in Methods and formulas of [TS] var svar, this test requires separate constraint matrices for
the parameters in A and B. Also, we cannot mix short-run and long-run constraints.

Long-run restrictions
A general short-run SVAR has the form

$$
A(I_K - A_1 L - A_2 L^2 - \cdots - A_p L^p)\, y_t = B e_t
$$

To simplify the notation, let $\bar A = (I_K - A_1 L - A_2 L^2 - \cdots - A_p L^p)$. The model is assumed to be
stable (see [TS] varstable), so $\bar A^{-1}$, the matrix of estimated long-run effects of the reduced-form VAR
shocks, is well defined. Constraining $A$ to be an identity matrix allows us to rewrite this equation as

$$
y_t = \bar A^{-1} B e_t
$$

which implies that $\Sigma = BB'$. Thus $C = \bar A^{-1} B$ is the matrix of long-run responses to the
orthogonalized shocks, and

$$
y_t = C e_t
$$


In long-run models, the constraints are placed on the elements of C, and the free parameters are
estimated. These constraints are often exclusion restrictions. For instance, constraining C[1, 2] to be
zero can be interpreted as setting the long-run response of variable 1 to the structural shocks driving
variable 2 to be zero.
Stata's svar command estimates the parameters of structural VARs. See [TS] var svar for more
information and examples.
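A long-run exclusion restriction of the kind just described can be sketched with a two-variable model. The example assumes the lutkepohl2 dataset, and the choice of variables is purely illustrative; as with any iterative estimator, convergence on a given dataset is not guaranteed.

```stata
use http://www.stata-press.com/data/r13/lutkepohl2, clear

* Long-run response matrix: C[1,2] is constrained to zero and the
* remaining three elements (missing values) are estimated
matrix C = (., 0 \ ., .)

* Long-run SVAR
svar dln_inc dln_consump, lreq(C)
```

Here the zero in the first row, second column sets the long-run response of the first variable to the second structural shock to zero.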

IRFs and FEVDs


IRFs describe how the K endogenous variables react over time to a one-time shock to one of the
K disturbances. Because the disturbances may be contemporaneously correlated, these functions do
not explain how variable i reacts to a one-time increase in the innovation to variable j after s periods,
holding everything else constant. To explain this, we must start with orthogonalized innovations so
that the assumption to hold everything else constant is reasonable. Recursive VARs use a Cholesky
decomposition to orthogonalize the disturbances and thereby obtain structurally interpretable IRFs.
Structural VARs use theory to impose sufficient restrictions, which need not be recursive, to decompose
the contemporaneous correlations into orthogonal components.
FEVDs are another tool for interpreting how the orthogonalized innovations affect the K variables
over time. The FEVD from j to i gives the fraction of the s-step forecast-error variance of variable i
that can be attributed to the jth orthogonalized innovation.

Dynamic-multiplier functions describe how the endogenous variables react over time to a unit
change in an exogenous variable. This is a different experiment from that in IRFs and FEVDs because
dynamic-multiplier functions consider a change in an exogenous variable instead of a shock to an
endogenous variable.
irf create estimates IRFs, Cholesky orthogonalized IRFs, dynamic-multiplier functions, and
structural IRFs and their standard errors. It also estimates Cholesky and structural FEVDs. The irf
graph, irf cgraph, irf ograph, irf table, and irf ctable commands graph and tabulate these
estimates. Stata also has several other commands to manage IRF and FEVD results. See [TS] irf for a
description of these commands.
fcast compute computes dynamic forecasts and their standard errors from VARs. fcast graph
graphs the forecasts that are generated using fcast compute.
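A typical sequence after fitting a VAR might look like the following sketch. It assumes the lutkepohl2 example dataset; the names order1, myirf, and the prefix f_ are arbitrary labels chosen for illustration.

```stata
use http://www.stata-press.com/data/r13/lutkepohl2, clear
var dln_inv dln_inc dln_consump, lags(1/2)

* Estimate IRFs and FEVDs out to 10 steps and save them in myirf.irf
irf create order1, step(10) set(myirf, replace)

* Cholesky orthogonalized IRF of consumption to an income shock
irf graph oirf, impulse(dln_inc) response(dln_consump)

* Cholesky FEVD for the same impulse-response pair
irf table fevd, impulse(dln_inc) response(dln_consump)

* Eight-step dynamic forecasts, stored in variables prefixed f_
fcast compute f_, step(8)
fcast graph f_dln_inv f_dln_inc f_dln_consump
```

The orthogonalized results depend on the order in which the variables were listed in var, because the Cholesky decomposition imposes a recursive structure in that order.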
VARs allow researchers to investigate whether one variable is useful in predicting another variable.
A variable x is said to Granger-cause a variable y if, given the past values of y , past values of x are
useful for predicting y . The Stata command vargranger performs Wald tests to investigate Granger
causality between the variables in a VAR.
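Granger causality tests require only a fitted VAR. A minimal sketch, assuming the lutkepohl2 example dataset and an illustrative lag order:

```stata
use http://www.stata-press.com/data/r13/lutkepohl2, clear
var dln_inv dln_inc dln_consump, lags(1/2)

* Wald tests that the lags of each other variable (and of all other
* variables jointly) can be excluded from each equation
vargranger
```

A rejection for a given equation and excluded variable suggests that past values of the excluded variable help predict the dependent variable of that equation, conditional on the other lags in the model.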

References
Amisano, G., and C. Giannini. 1997. Topics in Structural VAR Econometrics. 2nd ed. Heidelberg: Springer.
Becketti, S. 2013. Introduction to Time Series Using Stata. College Station, TX: Stata Press.
Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
Lütkepohl, H. 2005. New Introduction to Multiple Time Series Analysis. New York: Springer.
Stock, J. H., and M. W. Watson. 2001. Vector autoregressions. Journal of Economic Perspectives 15: 101–115.
Watson, M. W. 1994. Vector autoregressions and cointegration. In Vol. 4 of Handbook of Econometrics, ed. R. F.
Engle and D. L. McFadden. Amsterdam: Elsevier.


Also see
[TS] var Vector autoregressive models
[TS] var svar Structural vector autoregressive models
[TS] vec intro Introduction to vector error-correction models
[TS] vec Vector error-correction models
[TS] irf Create and analyze IRFs, dynamic-multiplier functions, and FEVDs

Title
var Vector autoregressive models
Syntax
Menu
Description
Options
Remarks and examples
Stored results
Methods and formulas
Acknowledgment
References
Also see

Syntax
var depvarlist [if] [in] [, options]

options                 Description
----------------------------------------------------------------------------
Model
  noconstant            suppress constant term
  lags(numlist)         use lags numlist in the VAR
  exog(varlist)         use exogenous variables varlist

Model 2
  constraints(numlist)  apply specified linear constraints
  nolog                 suppress SURE iteration log
  iterate(#)            set maximum number of iterations for SURE; default is
                          iterate(1600)
  tolerance(#)          set convergence tolerance of SURE
  noisure               use one-step SURE
  dfk                   make small-sample degrees-of-freedom adjustment
  small                 report small-sample t and F statistics
  nobigf                do not compute parameter vector for coefficients
                          implicitly set to zero

Reporting
  level(#)              set confidence level; default is level(95)
  lutstats              report Lütkepohl lag-order selection statistics
  nocnsreport           do not display constraints
  display_options       control column formats, row spacing, and line width

  coeflegend            display legend instead of statistics
----------------------------------------------------------------------------

You must tsset your data before using var; see [TS] tsset.
depvarlist and varlist may contain time-series operators; see [U] 11.4.4 Time-series varlists.
by, fp, rolling, statsby, and xi are allowed; see [U] 11.1.10 Prefix commands.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.

Menu
Statistics > Multivariate time series > Vector autoregression (VAR)


Description
var fits a multivariate time-series regression of each dependent variable on lags of itself and on
lags of all the other dependent variables. var also fits a variant of vector autoregressive (VAR) models
known as the VARX model, which also includes exogenous variables. See [TS] var intro for a list of
commands that are used in conjunction with var.

Options


Model

noconstant; see [R] estimation options.


lags(numlist) specifies the lags to be included in the model. The default is lags(1 2). This option
takes a numlist and not simply an integer for the maximum lag. For example, lags(2) would
include only the second lag in the model, whereas lags(1/2) would include both the first and
second lags in the model. See [U] 11.1.8 numlist and [U] 11.4.4 Time-series varlists for more
discussion of numlists and lags.
exog(varlist) specifies a list of exogenous variables to be included in the VAR.

Model 2

constraints(numlist); see [R] estimation options.


nolog suppresses the log from the iterated seemingly unrelated regression algorithm. By default, the
iteration log is displayed when the coefficients are estimated through iterated seemingly unrelated
regression. When the constraints() option is not specified, the estimates are obtained via OLS,
and nolog has no effect. For this reason, nolog can be specified only when constraints() is
specified. Similarly, nolog cannot be combined with noisure.
iterate(#) specifies an integer that sets the maximum number of iterations when the estimates
are obtained through iterated seemingly unrelated regression. By default, the limit is 1,600. When
constraints() is not specified, the estimates are obtained using OLS, and iterate() has no
effect. For this reason, iterate() can be specified only when constraints() is specified.
Similarly, iterate() cannot be combined with noisure.
tolerance(#) specifies a number greater than zero and less than 1 for the convergence tolerance of
the iterated seemingly unrelated regression algorithm. By default, the tolerance is 1e-6. When the
constraints() option is not specified, the estimates are obtained using OLS, and tolerance()
has no effect. For this reason, tolerance() can be specified only when constraints() is
specified. Similarly, tolerance() cannot be combined with noisure.
noisure specifies that the estimates in the presence of constraints be obtained through one-step
seemingly unrelated regression. By default, var obtains estimates in the presence of constraints
through iterated seemingly unrelated regression. When constraints() is not specified, the
estimates are obtained using OLS, and noisure has no effect. For this reason, noisure can be
specified only when constraints() is specified.
dfk specifies that a small-sample degrees-of-freedom adjustment be used when estimating $\Sigma$, the error
variance–covariance matrix. Specifically, $1/(T - \overline{m})$ is used instead of the large-sample divisor
$1/T$, where $\overline{m}$ is the average number of parameters in the functional form for $\mathbf{y}_t$ over the $K$
equations.
small causes var to report small-sample t and F statistics instead of the large-sample normal and
chi-squared statistics.

nobigf requests that var not save the estimated parameter vector that incorporates coefficients that
have been implicitly constrained to be zero, such as when some lags have been omitted from a
model. e(bf) is used for computing asymptotic standard errors in the postestimation commands
irf create and fcast compute; see [TS] irf create and [TS] fcast compute. Therefore, specifying
nobigf implies that the asymptotic standard errors will not be available from irf create and
fcast compute. See Fitting models with some lags excluded.
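
The interaction among the Model 2 options can be sketched as follows. This is an illustrative session, not output from the manual: it assumes the lutkepohl2 dataset used in the examples of this entry, and the particular constraint and the values passed to iterate() and tolerance() are chosen only for demonstration.

```stata
* Load the example dataset used in this entry (it is already tsset on qtr)
use http://www.stata-press.com/data/r13/lutkepohl2, clear

* One linear constraint, so var estimates by iterated SURE instead of OLS
constraint 1 [dln_inv]L2.dln_inc = 0

* Cap the iterations, tighten the tolerance, and hide the iteration log
var dln_inv dln_inc dln_consump if qtr<=tq(1978q4), constraints(1) iterate(50) tolerance(1e-8) nolog

* One-step SURE instead; nolog, iterate(), and tolerance() cannot be combined with noisure
var dln_inv dln_inc dln_consump if qtr<=tq(1978q4), constraints(1) noisure
```

Without constraints(), the same model is fit by OLS, and none of these four options applies.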

Reporting

level(#); see [R] estimation options.


lutstats specifies that the Lütkepohl (2005) versions of the lag-order selection statistics be reported.
See Methods and formulas in [TS] varsoc for a discussion of these statistics.
nocnsreport; see [R] estimation options.
display_options: vsquish, cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch;
see [R] estimation options.
The following option is available with var but is not shown in the dialog box:
coeflegend; see [R] estimation options.

Remarks and examples


Remarks are presented under the following headings:
Introduction
Fitting models with some lags excluded
Fitting models with exogenous variables
Fitting models with constraints on the coefficients

Introduction
A VAR is a model in which $K$ variables are specified as linear functions of $p$ of their own lags, $p$
lags of the other $K - 1$ variables, and possibly exogenous variables. A VAR with $p$ lags is usually
denoted a VAR(p). For more information, see [TS] var intro.
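
Written out, the verbal definition above corresponds to the following form. The symbols follow the notation used later in this entry ($\mathbf{v}$ for the intercept vector, $\mathbf{A}_i$ for the lag-coefficient matrices, $\mathbf{B}_0$ for the coefficients on the exogenous variables $\mathbf{x}_t$, and $\mathbf{u}_t$ for the innovations); the two-variable expansion is an illustrative special case with $K = 2$, $p = 1$, and no exogenous variables.

```latex
% General VAR(p) with exogenous variables
\mathbf{y}_t = \mathbf{v} + \mathbf{A}_1 \mathbf{y}_{t-1} + \cdots + \mathbf{A}_p \mathbf{y}_{t-p}
             + \mathbf{B}_0 \mathbf{x}_t + \mathbf{u}_t

% Illustrative special case: K = 2, p = 1, no exogenous variables
\begin{pmatrix} y_{1t} \\ y_{2t} \end{pmatrix}
  = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix}
  + \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}
    \begin{pmatrix} y_{1,t-1} \\ y_{2,t-1} \end{pmatrix}
  + \begin{pmatrix} u_{1t} \\ u_{2t} \end{pmatrix}
```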

Example 1: VAR model


To illustrate the basic usage of var, we replicate the example in Lütkepohl (2005, 77–78). The
data consist of three variables: the first difference of the natural log of investment, dln_inv; the
first difference of the natural log of income, dln_inc; and the first difference of the natural log of
consumption, dln_consump. The dataset contains data through the fourth quarter of 1982, though
Lütkepohl uses only the observations through the fourth quarter of 1978.
. use http://www.stata-press.com/data/r13/lutkepohl2
(Quarterly SA West German macro data, Bil DM, from Lutkepohl 1993 Table E.1)
. tsset
time variable: qtr, 1960q1 to 1982q4
delta: 1 quarter

. var dln_inv dln_inc dln_consump if qtr<=tq(1978q4), lutstats dfk

Vector autoregression

Sample:  1960q4 - 1978q4                           No. of obs      =        73
Log likelihood =   606.307           (lutstats)    AIC             = -24.63163
FPE            =  2.18e-11                         HQIC            = -24.40656
Det(Sigma_ml)  =  1.23e-11                         SBIC            = -24.06686

Equation           Parms      RMSE     R-sq      chi2     P>chi2
----------------------------------------------------------------
dln_inv               7     .046148   0.1286   9.736909   0.1362
dln_inc               7     .011719   0.1142   8.508289   0.2032
dln_consump           7     .009445   0.2513   22.15096   0.0011
----------------------------------------------------------------

------------------------------------------------------------------------------
             |      Coef.  Std. Err.       z    P>|z|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
dln_inv      |
     dln_inv |
         L1. |  -.3196318   .1254564    -2.55   0.011    -.5655218   -.0737419
         L2. |  -.1605508   .1249066    -1.29   0.199    -.4053633    .0842616
             |
     dln_inc |
         L1. |   .1459851   .5456664     0.27   0.789    -.9235013    1.215472
         L2. |   .1146009   .5345709     0.21   0.830    -.9331388    1.162341
             |
 dln_consump |
         L1. |   .9612288   .6643086     1.45   0.148    -.3407922     2.26325
         L2. |   .9344001   .6650949     1.40   0.160     -.369162    2.237962
             |
       _cons |  -.0167221   .0172264    -0.97   0.332    -.0504852    .0170409
-------------+----------------------------------------------------------------
dln_inc      |
     dln_inv |
         L1. |   .0439309   .0318592     1.38   0.168     -.018512    .1063739
         L2. |   .0500302   .0317196     1.58   0.115    -.0121391    .1121995
             |
     dln_inc |
         L1. |  -.1527311   .1385702    -1.10   0.270    -.4243237    .1188615
         L2. |   .0191634   .1357525     0.14   0.888    -.2469067    .2852334
             |
 dln_consump |
         L1. |   .2884992    .168699     1.71   0.087    -.0421448    .6191431
         L2. |     -.0102   .1688987    -0.06   0.952    -.3412354    .3208353
             |
       _cons |   .0157672   .0043746     3.60   0.000     .0071932    .0243412
-------------+----------------------------------------------------------------
dln_consump  |
     dln_inv |
         L1. |   -.002423   .0256763    -0.09   0.925    -.0527476    .0479016
         L2. |   .0338806   .0255638     1.33   0.185    -.0162235    .0839847
             |
     dln_inc |
         L1. |   .2248134   .1116778     2.01   0.044      .005929    .4436978
         L2. |   .3549135   .1094069     3.24   0.001     .1404798    .5693471
             |
 dln_consump |
         L1. |  -.2639695   .1359595    -1.94   0.052    -.5304451    .0025062
         L2. |  -.0222264   .1361204    -0.16   0.870    -.2890175    .2445646
             |
       _cons |   .0129258   .0035256     3.67   0.000     .0060157    .0198358
------------------------------------------------------------------------------

The output has two parts: a header and the standard Stata output table for the coefficients, standard
errors, and confidence intervals. The header contains summary statistics for each equation in the VAR
and statistics used in selecting the lag order of the VAR. Although there are standard formulas for all
the lag-order statistics, Lütkepohl (2005) gives different versions of the three information criteria that
drop the constant term from the likelihood. To obtain the Lütkepohl (2005) versions, we specified
the lutstats option. The formulas for the standard and Lütkepohl versions of these statistics are
given in Methods and formulas of [TS] varsoc.
The dfk option specifies that the small-sample divisor $1/(T - \overline{m})$ be used in estimating $\Sigma$ instead
of the maximum likelihood (ML) divisor $1/T$, where $\overline{m}$ is the average number of parameters included
in each of the $K$ equations. All the lag-order statistics are computed using the ML estimator of $\Sigma$.
Thus, specifying dfk will not change the computed lag-order statistics, but it will change the estimated
variance–covariance matrix. Also, when dfk is specified, a dfk-adjusted log likelihood is computed
and stored in e(ll_dfk).
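
As a worked illustration of the two divisors, take the values reported in the header of Example 1: $T = 73$ observations and 7 parameters in each of the three equations, so $\overline{m} = 7$. The subscripts "dfk" and "ml" are labels introduced here for the two estimators; they are not notation from the manual.

```latex
% Divisors for Example 1 (T = 73, \overline{m} = 7)
\frac{1}{T} = \frac{1}{73}
\qquad\text{versus}\qquad
\frac{1}{T - \overline{m}} = \frac{1}{73 - 7} = \frac{1}{66}

% The dfk estimate is therefore a rescaling of the ML estimate
\widehat{\Sigma}_{\mathrm{dfk}}
  = \frac{T}{T - \overline{m}}\,\widehat{\Sigma}_{\mathrm{ml}}
  = \frac{73}{66}\,\widehat{\Sigma}_{\mathrm{ml}}
  \approx 1.106\,\widehat{\Sigma}_{\mathrm{ml}}
```

The adjustment inflates every element of the estimated error covariance matrix by about 10.6% in this sample, while the lag-order statistics in the header are unchanged.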

The lags() option takes a numlist of lags. To specify a model that includes the first and second
lags, type
. var y1 y2 y3, lags(1/2)

not
. var y1 y2 y3, lags(2)

because the latter specification would fit a model that included only the second lag.

Fitting models with some lags excluded


To fit a model that has only a fourth lag, that is,

$$\mathbf{y}_t = \mathbf{v} + \mathbf{A}_4 \mathbf{y}_{t-4} + \mathbf{u}_t$$

you would specify the lags(4) option. Doing so is equivalent to fitting the more general model

$$\mathbf{y}_t = \mathbf{v} + \mathbf{A}_1 \mathbf{y}_{t-1} + \mathbf{A}_2 \mathbf{y}_{t-2} + \mathbf{A}_3 \mathbf{y}_{t-3} + \mathbf{A}_4 \mathbf{y}_{t-4} + \mathbf{u}_t$$

with $\mathbf{A}_1$, $\mathbf{A}_2$, and $\mathbf{A}_3$ constrained to be 0. When you fit a model with some lags excluded, var
estimates the coefficients included in the specification ($\mathbf{A}_4$ here) and stores these estimates in e(b).
To obtain the asymptotic standard errors for impulse–response functions and other postestimation
statistics, Stata needs the complete set of parameter estimates, including those that are constrained
to be zero; var stores them in e(bf). Because you can specify models for which the full set of
parameter estimates exceeds Stata's limit on the size of matrices, the nobigf option specifies that var
not compute and store e(bf). This means that the asymptotic standard errors of the postestimation
functions cannot be obtained, although bootstrap standard errors are still available. Building e(bf)
can be time consuming, so if you do not need this full matrix, and speed is an issue, use nobigf.
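
A minimal sketch of the difference between e(b) and e(bf), assuming the lutkepohl2 dataset from Example 1. The two-variable model and the choice of lags(4) are for illustration only.

```stata
* Load the example dataset used in this entry
use http://www.stata-press.com/data/r13/lutkepohl2, clear

* Fit a VAR that contains only the fourth lag
var dln_inc dln_consump if qtr<=tq(1978q4), lags(4)

* e(b) holds only the estimated coefficients (fourth lags and constants)
matrix list e(b)

* e(bf) also carries the coefficients on lags 1-3 that are implicitly zero
matrix list e(bf)

* Refit without building e(bf); asymptotic standard errors from
* irf create and fcast compute are then unavailable
var dln_inc dln_consump if qtr<=tq(1978q4), lags(4) nobigf
```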

Fitting models with exogenous variables


Example 2: VAR model with exogenous variables
We use the exog() option to include exogenous variables in a VAR.
. var dln_inc dln_consump if qtr<=tq(1978q4), dfk exog(dln_inv)

Vector autoregression

Sample:  1960q4 - 1978q4                           No. of obs      =        73
Log likelihood =  478.5663                         AIC             = -12.78264
FPE            =  9.64e-09                         HQIC            = -12.63259
Det(Sigma_ml)  =  6.93e-09                         SBIC            = -12.40612

Equation           Parms      RMSE     R-sq      chi2     P>chi2
----------------------------------------------------------------
dln_inc               6     .011917   0.0702   5.059587   0.4087
dln_consump           6     .009197   0.2794   25.97262   0.0001
----------------------------------------------------------------

------------------------------------------------------------------------------
             |      Coef.  Std. Err.       z    P>|z|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
dln_inc      |
     dln_inc |
         L1. |  -.1343345   .1391074    -0.97   0.334    -.4069801    .1383111
         L2. |   .0120331   .1380346     0.09   0.931    -.2585097    .2825759
             |
 dln_consump |
         L1. |   .3235342   .1652769     1.96   0.050    -.0004027     .647471
         L2. |   .0754177   .1648624     0.46   0.647    -.2477066     .398542
             |
     dln_inv |   .0151546   .0302319     0.50   0.616    -.0440987     .074408
       _cons |   .0145136   .0043815     3.31   0.001     .0059259    .0231012
-------------+----------------------------------------------------------------
dln_consump  |
     dln_inc |
         L1. |   .2425719   .1073561     2.26   0.024     .0321578     .452986
         L2. |   .3487949   .1065281     3.27   0.001     .1400036    .5575862
             |
 dln_consump |
         L1. |  -.3119629   .1275524    -2.45   0.014    -.5619611   -.0619648
         L2. |  -.0128502   .1272325    -0.10   0.920    -.2622213    .2365209
             |
     dln_inv |   .0503616   .0233314     2.16   0.031     .0046329    .0960904
       _cons |   .0131013   .0033814     3.87   0.000     .0064738    .0197288
------------------------------------------------------------------------------

All the postestimation commands for analyzing VARs work when exogenous variables are included
in a model, but the asymptotic standard errors for the h-step-ahead forecasts are not available.

Fitting models with constraints on the coefficients


var permits model specifications that include constraints on the coefficients, though var does not
allow for constraints on $\Sigma$. See [TS] var intro and [TS] var svar for ways to constrain $\Sigma$.

Example 3: VAR model with constraints


In the first example, we fit a full VAR(2) to a three-equation model. The coefficients in the equation
for dln_inv were jointly insignificant, as were the coefficients in the equation for dln_inc; and
many individual coefficients were not significantly different from zero. In this example, we constrain
the coefficient on L2.dln_inc in the equation for dln_inv and the coefficient on L2.dln_consump
in the equation for dln_inc to be zero.
. constraint 1 [dln_inv]L2.dln_inc = 0
. constraint 2 [dln_inc]L2.dln_consump = 0
. var dln_inv dln_inc dln_consump if qtr<=tq(1978q4), lutstats dfk
> constraints(1 2)
Estimating VAR coefficients
Iteration 1:   tolerance =  .00737681
Iteration 2:   tolerance =  3.998e-06
Iteration 3:   tolerance =  2.730e-09

Vector autoregression

Sample:  1960q4 - 1978q4                           No. of obs      =        73
Log likelihood =  606.2804           (lutstats)    AIC             = -31.69254
FPE            =  1.77e-14                         HQIC            = -31.46747
Det(Sigma_ml)  =  1.05e-14                         SBIC            = -31.12777

Equation           Parms      RMSE     R-sq      chi2     P>chi2
----------------------------------------------------------------
dln_inv               6     .043895   0.1280   9.842338   0.0798
dln_inc               6     .011143   0.1141   8.584446   0.1268
dln_consump           7     .008981   0.2512   22.86958   0.0008
----------------------------------------------------------------

 ( 1)  [dln_inv]L2.dln_inc = 0
 ( 2)  [dln_inc]L2.dln_consump = 0

------------------------------------------------------------------------------
             |      Coef.  Std. Err.       z    P>|z|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
dln_inv      |
     dln_inv |
         L1. |   -.320713   .1247512    -2.57   0.010    -.5652208   -.0762051
         L2. |  -.1607084    .124261    -1.29   0.196    -.4042555    .0828386
             |
     dln_inc |
         L1. |   .1195448   .5295669     0.23   0.821    -.9183873    1.157477
         L2. |  -2.55e-17   1.18e-16    -0.22   0.829    -2.57e-16    2.06e-16
             |
 dln_consump |
         L1. |   1.009281    .623501     1.62   0.106    -.2127586    2.231321
         L2. |   1.008079   .5713486     1.76   0.078    -.1117438    2.127902
             |
       _cons |  -.0162102    .016893    -0.96   0.337    -.0493199    .0168995
-------------+----------------------------------------------------------------
dln_inc      |
     dln_inv |
         L1. |   .0435712   .0309078     1.41   0.159     -.017007    .1041495
         L2. |   .0496788   .0306455     1.62   0.105    -.0103852    .1097428
             |
     dln_inc |
         L1. |  -.1555119   .1315854    -1.18   0.237    -.4134146    .1023908
         L2. |   .0122353   .1165811     0.10   0.916    -.2162595    .2407301
             |
 dln_consump |
         L1. |     .29286   .1568345     1.87   0.062      -.01453    .6002501
         L2. |   1.78e-19   8.28e-19     0.22   0.829    -1.45e-18    1.80e-18
             |
       _cons |    .015689    .003819     4.11   0.000     .0082039    .0231741
-------------+----------------------------------------------------------------
dln_consump  |
     dln_inv |
         L1. |  -.0026229   .0253538    -0.10   0.918    -.0523154    .0470696
         L2. |   .0337245   .0252113     1.34   0.181    -.0156888    .0831378
             |
     dln_inc |
         L1. |   .2224798   .1094349     2.03   0.042     .0079912    .4369683
         L2. |   .3469758   .1006026     3.45   0.001     .1497984    .5441532
             |
 dln_consump |
         L1. |  -.2600227   .1321622    -1.97   0.049     -.519056   -.0009895
         L2. |  -.0146825   .1117618    -0.13   0.895    -.2337315    .2043666
             |
       _cons |   .0129149    .003376     3.83   0.000     .0062981    .0195317
------------------------------------------------------------------------------

None of the free parameter estimates changed by much. Whereas the coefficients in the equation
for dln_inv are now jointly significant at the 10% level, the coefficients in the equation for dln_inc
remain jointly insignificant.
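
One way to check whether the data support these two zero restrictions before imposing them is a Wald test after the unconstrained fit. The following is a sketch, not part of the manual's example; it assumes the lutkepohl2 dataset and refits the model from Example 1.

```stata
* Load the example dataset used in this entry
use http://www.stata-press.com/data/r13/lutkepohl2, clear

* Unconstrained VAR(2), as in Example 1
var dln_inv dln_inc dln_consump if qtr<=tq(1978q4), lutstats dfk

* Joint Wald test of the two restrictions imposed in Example 3
test ([dln_inv]L2.dln_inc = 0) ([dln_inc]L2.dln_consump = 0)
```

Failing to reject the joint hypothesis is consistent with the constrained estimates differing little from the unconstrained ones.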

Stored results
var stores the following in e():
Scalars
  e(N)            number of observations
  e(N_gaps)       number of gaps in sample
  e(k)            number of parameters
  e(k_eq)         number of equations in e(b)
  e(k_dv)         number of dependent variables
  e(df_eq)        average number of parameters in an equation
  e(df_m)         model degrees of freedom
  e(df_r)         residual degrees of freedom (small only)
  e(ll)           log likelihood
  e(ll_dfk)       dfk adjusted log likelihood (dfk only)
  e(obs_#)        number of observations on equation #
  e(k_#)          number of parameters in equation #
  e(df_m#)        model degrees of freedom for equation #
  e(df_r#)        residual degrees of freedom for equation # (small only)
  e(r2_#)         R-squared for equation #
  e(ll_#)         log likelihood for equation #
  e(chi2_#)       $\chi^2$ for equation #
  e(F_#)          F statistic for equation # (small only)
  e(rmse_#)       root mean squared error for equation #
  e(aic)          Akaike information criterion
  e(hqic)         Hannan–Quinn information criterion
  e(sbic)         Schwarz–Bayesian information criterion
  e(fpe)          final prediction error
  e(mlag)         highest lag in VAR
  e(tmin)         first time period in sample
  e(tmax)         maximum time
  e(detsig)       determinant of e(Sigma)
  e(detsig_ml)    determinant of $\widehat{\Sigma}_{ml}$
  e(rank)         rank of e(V)

Macros
  e(cmd)          var
  e(cmdline)      command as typed
  e(depvar)       names of dependent variables
  e(endog)        names of endogenous variables, if specified
  e(exog)         names of exogenous variables, and their lags, if specified
  e(exogvars)     names of exogenous variables, if specified
  e(eqnames)      names of equations
  e(lags)         lags in model
  e(exlags)       lags of exogenous variables in model, if specified
  e(title)        title in estimation output
  e(nocons)       nocons, if noconstant is specified
  e(constraints)  constraints, if specified
  e(cnslist_var)  list of specified constraints
  e(small)        small, if specified
  e(lutstats)     lutstats, if specified
  e(timevar)      time variable specified in tsset
  e(tsfmt)        format for the current time variable
  e(dfk)          dfk, if specified
  e(properties)   b V
  e(predict)      program used to implement predict
  e(marginsok)    predictions allowed by margins
  e(marginsnotok) predictions disallowed by margins

Matrices
  e(b)            coefficient vector
  e(Cns)          constraints matrix
  e(Sigma)        $\widehat{\Sigma}$ matrix
  e(V)            variance–covariance matrix of the estimators
  e(bf)           constrained coefficient vector
  e(exlagsm)      matrix mapping lags to exogenous variables
  e(G)            Gamma matrix; see Methods and formulas

Functions
  e(sample)       marks estimation sample

Methods and formulas


When there are no constraints placed on the coefficients, the VAR(p) is a seemingly unrelated regression
model with the same explanatory variables in each equation. As discussed in Lütkepohl (2005) and
Greene (2008, 696), performing linear regression on each equation produces the maximum likelihood
estimates of the coefficients. The estimated coefficients can then be used to calculate the residuals,
which in turn are used to estimate the cross-equation error variance–covariance matrix $\Sigma$.
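Per Lütkepohl (2005), we write the VAR(p) with exogenous variables as

$$\mathbf{y}_t = \mathbf{A}\mathbf{Y}_{t-1} + \mathbf{B}_0\mathbf{x}_t + \mathbf{u}_t \tag{5}$$

where

    $\mathbf{y}_t$ is the $K \times 1$ vector of endogenous variables,
    $\mathbf{A}$ is a $K \times Kp$ matrix of coefficients,
    $\mathbf{B}_0$ is a $K \times M$ matrix of coefficients,
    $\mathbf{x}_t$ is the $M \times 1$ vector of exogenous variables,
    $\mathbf{u}_t$ is the $K \times 1$ vector of white noise innovations, and
    $\mathbf{Y}_t$ is the $Kp \times 1$ matrix given by

$$\mathbf{Y}_t = \begin{pmatrix} \mathbf{y}_t \\ \vdots \\ \mathbf{y}_{t-p+1} \end{pmatrix}$$

Although (5) is easier to read, the formulas are much easier to manipulate if it is instead written
as

$$\mathbf{Y} = \mathbf{B}\mathbf{Z} + \mathbf{U}$$

where

$$\begin{aligned}
\mathbf{Y} &= (\mathbf{y}_1, \ldots, \mathbf{y}_T) & &\mathbf{Y} \text{ is } K \times T \\
\mathbf{B} &= (\mathbf{A}, \mathbf{B}_0) & &\mathbf{B} \text{ is } K \times (Kp + M) \\
\mathbf{Z} &= \begin{pmatrix} \mathbf{Y}_0 & \ldots & \mathbf{Y}_{T-1} \\ \mathbf{x}_1 & \ldots & \mathbf{x}_T \end{pmatrix} & &\mathbf{Z} \text{ is } (Kp + M) \times T \\
\mathbf{U} &= (\mathbf{u}_1, \ldots, \mathbf{u}_T) & &\mathbf{U} \text{ is } K \times T
\end{aligned}$$

Intercept terms in the model are included in $\mathbf{x}_t$. If there are no exogenous variables and no intercept
terms in the model, $\mathbf{x}_t$ is empty.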
The coefficients are estimated by iterated seemingly unrelated regression. Because the estimation
is actually performed by reg3, the methods are documented in [R] reg3. See [P] makecns for more
on estimation with constraints.
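Let $\widehat{\mathbf{U}}$ be the matrix of residuals that are obtained via $\mathbf{Y} - \widehat{\mathbf{B}}\mathbf{Z}$, where $\widehat{\mathbf{B}}$ is the matrix of estimated
coefficients. Then the estimator of $\Sigma$ is

$$\widehat{\Sigma} = \frac{1}{\widetilde{T}}\,\widehat{\mathbf{U}}'\widehat{\mathbf{U}}$$

By default, the maximum likelihood divisor of $\widetilde{T} = T$ is used. When dfk is specified, a small-sample
degrees-of-freedom adjustment is used; then, $\widetilde{T} = T - \overline{m}$ where $\overline{m}$ is the average number of parameters
per equation in the functional form for $\mathbf{y}_t$ over the $K$ equations.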
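
The default (maximum likelihood) estimate can be reproduced by hand from the residuals. The following sketch assumes the lutkepohl2 dataset from Example 1; the variable names u1-u3 and the matrix names UU and Sigma_ml are hypothetical names chosen for illustration.

```stata
* Load the example dataset used in this entry
use http://www.stata-press.com/data/r13/lutkepohl2, clear

* Fit the unconstrained VAR(2) without dfk, so the divisor is T
var dln_inv dln_inc dln_consump if qtr<=tq(1978q4)

* One residual series per equation, estimation sample only
predict double u1 if e(sample), residuals equation(#1)
predict double u2 if e(sample), residuals equation(#2)
predict double u3 if e(sample), residuals equation(#3)

* Cross-product matrix of the residuals (no constant column)
matrix accum UU = u1 u2 u3 if e(sample), noconstant

* Divide by the number of observations and compare with e(Sigma)
matrix Sigma_ml = UU / e(N)
matrix list Sigma_ml
matrix list e(Sigma)
```

Refitting with dfk and dividing by e(N) minus e(df_eq) instead should, under the definitions above, correspond to the adjusted estimate.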
small specifies that Wald tests after var be assumed to have F or t distributions instead of
chi-squared or standard normal distributions. The standard errors from each equation are computed
using the degrees of freedom for the equation.
The gamma matrix stored in e(G) referred to in Stored results is the $(Kp + 1) \times (Kp + 1)$
matrix given by

$$\frac{1}{T}\sum_{t=1}^{T} (1, \mathbf{Y}_t')(1, \mathbf{Y}_t')'$$

The formulas for the lag-order selection criteria and the log likelihood are discussed in [TS] varsoc.

Acknowledgment
We thank Christopher F. Baum of the Department of Economics at Boston College and author of
the Stata Press books An Introduction to Modern Econometrics Using Stata and An Introduction to
Stata Programming for his helpful comments.

References
Greene, W. H. 2008. Econometric Analysis. 6th ed. Upper Saddle River, NJ: Prentice Hall.
Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
Lütkepohl, H. 1993. Introduction to Multiple Time Series Analysis. 2nd ed. New York: Springer.
Lütkepohl, H. 2005. New Introduction to Multiple Time Series Analysis. New York: Springer.
Stock, J. H., and M. W. Watson. 2001. Vector autoregressions. Journal of Economic Perspectives 15: 101–115.
Watson, M. W. 1994. Vector autoregressions and cointegration. In Vol. 4 of Handbook of Econometrics, ed. R. F.
Engle and D. L. McFadden. Amsterdam: Elsevier.

Also see
[TS] var postestimation Postestimation tools for var
[TS] tsset Declare data to be time-series data
[TS] dfactor Dynamic-factor models
[TS] forecast Econometric model forecasting
[TS] mgarch Multivariate GARCH models
[TS] sspace State-space models
[TS] var svar Structural vector autoregressive models
[TS] varbasic Fit a simple VAR and graph IRFs or FEVDs
[TS] vec Vector error-correction models
[U] 20 Estimation and postestimation commands

[TS] var intro Introduction to vector autoregressive models

Title
var postestimation Postestimation tools for var
Description
Remarks and examples

Syntax for predict


Methods and formulas

Menu for predict


Also see

Options for predict

Description
The following postestimation commands are of special interest after var:
Command          Description
------------------------------------------------------------------------------
fcast compute    obtain dynamic forecasts
fcast graph      graph dynamic forecasts obtained from fcast compute
irf              create and analyze IRFs and FEVDs
vargranger       Granger causality tests
varlmar          LM test for autocorrelation in residuals
varnorm          test for normally distributed residuals
varsoc           lag-order selection criteria
varstable        check stability condition of estimates
varwle           Wald lag-exclusion statistics
------------------------------------------------------------------------------

The following standard postestimation commands are also available:


Command          Description
------------------------------------------------------------------------------
estat ic         Akaike's and Schwarz's Bayesian information criteria (AIC and BIC)
estat summarize  summary statistics for the estimation sample
estat vce        variance–covariance matrix of the estimators (VCE)
estimates        cataloging estimation results
forecast         dynamic forecasts and simulations
lincom           point estimates, standard errors, testing, and inference for linear
                   combinations of coefficients
lrtest           likelihood-ratio test
margins          marginal means, predictive margins, marginal effects, and average
                   marginal effects
marginsplot      graph the results from margins (profile plots, interaction plots, etc.)
nlcom            point estimates, standard errors, testing, and inference for nonlinear
                   combinations of coefficients
predict          predictions, residuals, influence statistics, and other diagnostic measures
predictnl        point estimates, standard errors, testing, and inference for generalized
                   predictions
test             Wald tests of simple and composite linear hypotheses
testnl           Wald tests of nonlinear hypotheses
------------------------------------------------------------------------------

Syntax for predict


    predict [type] newvar [if] [in] [, statistic equation(eqno | eqname)]

statistic        Description
------------------------------------------------------------------------------
Main
  xb             linear prediction; the default
  stdp           standard error of the linear prediction
  residuals      residuals
------------------------------------------------------------------------------
These statistics are available both in and out of sample; type predict ... if e(sample) ... if wanted only for
the estimation sample.

Menu for predict


Statistics > Postestimation > Predictions, residuals, etc.

Options for predict




Main

xb, the default, calculates the linear prediction for the specified equation.
stdp calculates the standard error of the linear prediction for the specified equation.
residuals calculates the residuals.
equation(eqno | eqname) specifies the equation to which you are referring.
equation() is filled in with one eqno or eqname for options xb, stdp, and residuals. For
example, equation(#1) would mean that the calculation is to be made for the first equation,
equation(#2) would mean the second, and so on. You could also refer to the equation by its name;
thus, equation(income) would refer to the equation named income and equation(hours), to
the equation named hours.
If you do not specify equation(), the results are the same as if you specified equation(#1).
For more information on using predict after multiple-equation estimation commands, see [R] predict.
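
The following sketch shows the three statistics with both styles of equation reference. It assumes the lutkepohl2 dataset and the VAR(2) from the var entry; the new variable names (inv_hat, inv_se, inc_res) are hypothetical.

```stata
* Load the example dataset and fit the VAR(2) from the var entry
use http://www.stata-press.com/data/r13/lutkepohl2, clear
var dln_inv dln_inc dln_consump if qtr<=tq(1978q4)

* Linear prediction for the dln_inv equation, referenced by name
predict double inv_hat, xb equation(dln_inv)

* Standard error of that prediction, referenced by equation number
predict double inv_se, stdp equation(#1)

* Residuals for the second equation, restricted to the estimation sample
predict double inc_res if e(sample), residuals equation(dln_inc)
```

Because the first two calls omit if e(sample), they also fill in out-of-sample quarters for which the required lags are observed.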

Remarks and examples


Remarks are presented under the following headings:
Model selection and inference
Forecasting

Model selection and inference


See the following sections for information on model selection and inference after var.
[TS] irf Create and analyze IRFs, dynamic-multiplier functions, and FEVDs
[TS] vargranger Perform pairwise Granger causality tests after var or svar
[TS] varlmar Perform LM test for residual autocorrelation after var or svar
[TS] varnorm Test for normally distributed disturbances after var or svar
[TS] varsoc Obtain lag-order selection statistics for VARs and VECMs
[TS] varstable Check the stability condition of VAR or SVAR estimates
[TS] varwle Obtain Wald lag-exclusion statistics after var or svar
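
A typical diagnostic pass that strings these commands together might look like the following. This is an illustrative sequence, assuming the lutkepohl2 dataset; the maximum lag of 4 and the autocorrelation order of 2 are arbitrary choices for the sketch.

```stata
* Load the example dataset used in the var entry
use http://www.stata-press.com/data/r13/lutkepohl2, clear

* Preestimation: compare lag orders up to 4
varsoc dln_inv dln_inc dln_consump if qtr<=tq(1978q4), maxlag(4)

* Fit the chosen specification
var dln_inv dln_inc dln_consump if qtr<=tq(1978q4), lags(1/2)

* Postestimation checks on the fitted model
varstable           // eigenvalue stability condition
varlmar, mlag(2)    // LM test for residual autocorrelation up to lag 2
varnorm             // normality of the disturbances
vargranger          // pairwise Granger causality
varwle              // Wald lag-exclusion statistics
```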

Forecasting
Two types of forecasts are available after you fit a VAR(p): a one-step-ahead forecast and a dynamic
h-step-ahead forecast.
The one-step-ahead forecast produces a prediction of the value of an endogenous variable in the
current period by using the estimated coefficients, the past values of the endogenous variables, and any
exogenous variables. If you include contemporaneous values of exogenous variables in your model,
you must have observations on the exogenous variables that are contemporaneous with the period
in which the prediction is being made to compute the prediction. In Stata terms, these one-step-ahead
predictions are just the standard linear predictions available after any estimation command.
Thus predict, xb eq(eqno | eqname) produces one-step-ahead forecasts for the specified equation.
predict, stdp eq(eqno | eqname) produces the standard error of the linear prediction for the
specified equation. The standard error of the forecast includes an estimate of the variability due to
innovations, whereas the standard error of the linear prediction does not.
The dynamic h-step-ahead forecast begins by using the estimated coefficients, the lagged values of
the endogenous variables, and any exogenous variables to predict one step ahead for each endogenous
variable. The one-step-ahead forecasts are then used in place of the not-yet-observed lagged values to
produce two-step-ahead forecasts for each endogenous variable. The process continues for h periods.
Because each step uses the predictions of the previous steps, these forecasts are known as dynamic
forecasts. See the following sections for information on obtaining forecasts after var:
[TS] fcast compute Compute dynamic forecasts after var, svar, or vec
[TS] fcast graph Graph forecasts after fcast compute
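
As a sketch of the dynamic case, the following computes and graphs eight-step-ahead forecasts. It assumes the lutkepohl2 dataset, which extends past the 1978q4 estimation cutoff so that observed values are available for comparison; the prefix f_ and the horizon of 8 are illustrative choices.

```stata
* Load the example dataset and fit the VAR(2) on data through 1978q4
use http://www.stata-press.com/data/r13/lutkepohl2, clear
var dln_inv dln_inc dln_consump if qtr<=tq(1978q4)

* Dynamic forecasts for 8 quarters; new variables are prefixed with f_
fcast compute f_, step(8)

* Plot two of the forecasts with their intervals and the observed series
fcast graph f_dln_inc f_dln_consump, observed
```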

Methods and formulas


Formulas for predict
predict with the xb option provides the one-step-ahead forecast. If exogenous variables are
specified, the forecast is conditional on the exogenous xt variables. Specifying the residuals option
causes predict to calculate the errors of the one-step-ahead forecasts. Specifying the stdp option
causes predict to calculate the standard errors of the one-step-ahead forecasts.

Also see
[TS] var Vector autoregressive models
[U] 20 Estimation and postestimation commands

Title
var svar Structural vector autoregressive models
Syntax
Remarks and examples
References

Menu
Stored results
Also see

Description
Methods and formulas

Options
Acknowledgment

Syntax
Short-run constraints

    svar depvarlist [if] [in], { aconstraints(constraints_a) aeq(matrix_aeq)
        acns(matrix_acns) bconstraints(constraints_b) beq(matrix_beq) bcns(matrix_bcns) }
        [short_run_options]

Long-run constraints

    svar depvarlist [if] [in], { lrconstraints(constraints_lr) lreq(matrix_lreq)
        lrcns(matrix_lrcns) } [long_run_options]

short_run_options               Description
------------------------------------------------------------------------------
Model
  noconstant                    suppress constant term
  aconstraints(constraints_a)   apply previously defined constraints_a to A
  aeq(matrix_aeq)               define and apply to A equality constraint matrix matrix_aeq
  acns(matrix_acns)             define and apply to A cross-parameter constraint matrix
                                  matrix_acns
  bconstraints(constraints_b)   apply previously defined constraints_b to B
  beq(matrix_beq)               define and apply to B equality constraint matrix matrix_beq
  bcns(matrix_bcns)             define and apply to B cross-parameter constraint matrix_bcns
  lags(numlist)                 use lags numlist in the underlying VAR

Model 2
  exog(varlist_exog)            use exogenous variables varlist
  varconstraints(constraints_v) apply constraints_v to underlying VAR
  noislog                       suppress SURE iteration log
  isiterate(#)                  set maximum number of iterations for SURE; default is
                                  isiterate(1600)
  istolerance(#)                set convergence tolerance of SURE
  noisure                       use one-step SURE
  dfk                           make small-sample degrees-of-freedom adjustment
  small                         report small-sample t and F statistics
  noidencheck                   do not check for local identification
  nobigf                        do not compute parameter vector for coefficients implicitly
                                  set to zero

Reporting
  level(#)                      set confidence level; default is level(95)
  full                          show constrained parameters in table
  var                           display underlying var output
  lutstats                      report Lütkepohl lag-order selection statistics
  nocnsreport                   do not display constraints
  display_options               control column formats

Maximization
  maximize_options              control the maximization process; seldom used

  coeflegend                    display legend instead of statistics
------------------------------------------------------------------------------

aconstraints(constraints_a), aeq(matrix_aeq), acns(matrix_acns), bconstraints(constraints_b),
beq(matrix_beq), bcns(matrix_bcns): at least one of these options must be specified.
coeflegend does not appear in the dialog box.

long_run_options                Description
------------------------------------------------------------------------------
Model
  noconstant                    suppress constant term
  lrconstraints(constraints_lr) apply previously defined constraints_lr to C
  lreq(matrix_lreq)             define and apply to C equality constraint matrix matrix_lreq
  lrcns(matrix_lrcns)           define and apply to C cross-parameter constraint matrix
                                  matrix_lrcns
  lags(numlist)                 use lags numlist in the underlying VAR

Model 2
  exog(varlist_exog)            use exogenous variables varlist
  varconstraints(constraints_v) apply constraints_v to underlying VAR
  noislog                       suppress SURE iteration log
  isiterate(#)                  set maximum number of iterations for SURE; default is
                                  isiterate(1600)
  istolerance(#)                set convergence tolerance of SURE
  noisure                       use one-step SURE
  dfk                           make small-sample degrees-of-freedom adjustment
  small                         report small-sample t and F statistics
  noidencheck                   do not check for local identification
  nobigf                        do not compute parameter vector for coefficients implicitly
                                  set to zero

Reporting
  level(#)                      set confidence level; default is level(95)
  full                          show constrained parameters in table
  var                           display underlying var output
  lutstats                      report Lütkepohl lag-order selection statistics
  nocnsreport                   do not display constraints
  display_options               control column formats

Maximization
  maximize_options              control the maximization process; seldom used

  coeflegend                    display legend instead of statistics
------------------------------------------------------------------------------

lrconstraints(constraints_lr), lreq(matrix_lreq), lrcns(matrix_lrcns): at least one of these options must be
specified.
coeflegend does not appear in the dialog box.
You must tsset your data before using svar; see [TS] tsset.
depvarlist and varlistexog may contain time-series operators; see [U] 11.4.4 Time-series varlists.
by, fp, rolling, statsby, and xi are allowed; see [U] 11.1.10 Prefix commands.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.

Menu
Statistics > Multivariate time series > Structural vector autoregression (SVAR)

Description
svar fits a vector autoregressive model subject to short- or long-run constraints you place on
the resulting impulse–response functions (IRFs). Economic theory typically motivates the constraints,
allowing a causal interpretation of the IRFs to be made. See [TS] var intro for a list of commands
that are used in conjunction with svar.

Options


Model

noconstant; see [R] estimation options.


aconstraints(constraintsa ), aeq(matrixaeq ), acns(matrixacns )
bconstraints(constraintsb ), beq(matrixbeq ), bcns(matrixbcns )
These options specify the short-run constraints in an SVAR. To specify a short-run SVAR model,
you must specify at least one of these options. The first list of options specifies constraints on
the parameters of the A matrix; the second list specifies constraints on the parameters of the B
matrix (see Short-run SVAR models). If at least one option is selected from the first list and none
are selected from the second list, svar sets B to the identity matrix. Similarly, if at least one
option is selected from the second list and none are selected from the first list, svar sets A to
the identity matrix.
None of these options may be specified with any of the options that define long-run constraints.
aconstraints(constraints_a) specifies a numlist of previously defined Stata constraints that are
to be applied to A during estimation.
aeq(matrix_aeq) specifies a matrix that defines a set of equality constraints. This matrix must be
square with dimension equal to the number of equations in the underlying VAR. The elements
of this matrix must be missing or real numbers. A missing value in the (i, j) element of this
matrix specifies that the (i, j) element of A is a free parameter. A real number in the (i, j)
element of this matrix constrains the (i, j) element of A to this real number. For example,

$$A = \begin{bmatrix} 1 & 0 \\ . & 1.5 \end{bmatrix}$$

specifies that A[1, 1] = 1, A[1, 2] = 0, A[2, 2] = 1.5, and A[2, 1] is a free parameter.
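As a minimal sketch of how such a matrix might be built, the commands below define the 2 x 2 matrix from the example and pass it to aeq(). The variable names y1 and y2 are hypothetical placeholders for series in a dataset that has already been tsset; they do not come from this entry.

```stata
* aeq() matrix from the example: a missing value (.) marks a free parameter
matrix A = (1, 0 \ ., 1.5)
matrix list A

* Hypothetical two-equation model; y1 and y2 are placeholder variable names
svar y1 y2, aeq(A)
```

Because no option from the second list is given, B would be set to the identity matrix in this sketch.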

acns(matrix_acns) specifies a matrix that defines a set of exclusion or cross-parameter equality
constraints on A. This matrix must be square with dimension equal to the number of equations
in the underlying VAR. Each element of this matrix must be missing, 0, or a positive integer.
A missing value in the (i, j) element of this matrix specifies that no constraint be placed on
this element of A. A zero in the (i, j) element of this matrix constrains the (i, j) element of
A to be zero. Any strictly positive integers must be in two or more elements of this matrix.
A strictly positive integer in the (i, j) element of this matrix constrains the (i, j) element of
A to be equal to all the other elements of A that correspond to elements in this matrix that
contain the same integer. For example, consider the matrix

$$A = \begin{bmatrix} . & 1 \\ 1 & 0 \end{bmatrix}$$

Specifying acns(A) in a two-equation SVAR constrains A[2, 1] = A[1, 2] and A[2, 2] = 0
while leaving A[1, 1] free.
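The same pattern can be written as Stata commands. This is only a sketch: the matrix name Acns and the variables y1 and y2 are hypothetical and stand in for a real tsset dataset.

```stata
* acns() matrix from the example:
*   .  = no constraint, 0 = element constrained to zero,
*   a repeated positive integer = elements constrained to be equal
matrix Acns = (., 1 \ 1, 0)
matrix list Acns

* Hypothetical two-equation model; y1 and y2 are placeholder variable names
svar y1 y2, acns(Acns)
```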
bconstraints(constraints_b) specifies a numlist of previously defined Stata constraints to be
applied to B during estimation.
beq(matrix_beq) specifies a matrix that defines a set of equality constraints. This matrix must be
square with dimension equal to the number of equations in the underlying VAR. The elements
of this matrix must be either missing or real numbers. The syntax of implied constraints is
analogous to the one described in aeq(), except that it applies to B rather than to A.
bcns(matrix_bcns) specifies a matrix that defines a set of exclusion or cross-parameter equality
constraints on B. This matrix must be square with dimension equal to the number of equations
in the underlying VAR. Each element of this matrix must be missing, 0, or a positive integer.
The format of the implied constraints is the same as the one described in the acns() option
above.
lrconstraints(constraints_lr), lreq(matrix_lreq), lrcns(matrix_lrcns)
These options specify the long-run constraints in an SVAR. To specify a long-run SVAR model,
you must specify at least one of these options. The list of options specifies constraints on the
parameters of the long-run C matrix (see Long-run SVAR models for the definition of C). None
of these options may be specified with any of the options that define short-run constraints.
lrconstraints(constraints_lr) specifies a numlist of previously defined Stata constraints to be
applied to C during estimation.
lreq(matrix_lreq) specifies a matrix that defines a set of equality constraints on the elements of C.
This matrix must be square with dimension equal to the number of equations in the underlying
VAR. The elements of this matrix must be either missing or real numbers. The syntax of implied
constraints is analogous to the one described in option aeq(), except that it applies to C.
lrcns(matrix_lrcns) specifies a matrix that defines a set of exclusion or cross-parameter equality
constraints on C. This matrix must be square with dimension equal to the number of equations
in the underlying VAR. Each element of this matrix must be missing, 0, or a positive integer.
The syntax of the implied constraints is the same as the one described for the acns() option
above.
lags(numlist) specifies the lags to be included in the underlying VAR model. The default is
lags(1 2). This option takes a numlist and not simply an integer for the maximum lag. For instance,
lags(2) would include only the second lag in the model, whereas lags(1/2) would include
both the first and second lags in the model. See [U] 11.1.8 numlist and [U] 11.4.4 Time-series
varlists for further discussion of numlists and lags.
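The difference between the two numlists can be seen by fitting the same model twice. The sketch below borrows the dataset and the constraint matrices used in example 1 of this entry; only the lags() option changes between the two fits.

```stata
use http://www.stata-press.com/data/r13/lutkepohl2
matrix A = (1,0,0\.,1,0\.,.,1)
matrix B = (.,0,0\0,.,0\0,0,.)

* First and second lags (equivalent to the default)
svar dln_inv dln_inc dln_consump if qtr<=tq(1978q4), lags(1/2) aeq(A) beq(B)

* Second lag only; the first lag is excluded from the underlying VAR
svar dln_inv dln_inc dln_consump if qtr<=tq(1978q4), lags(2) aeq(A) beq(B)
```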


Model 2

exog(varlist_exog) specifies a list of exogenous variables to be included in the underlying VAR.
varconstraints(constraints_v) specifies a list of constraints to be applied to coefficients in the
underlying VAR. Because svar estimates multiple equations, the constraints must specify the
equation name for all but the first equation.
noislog prevents svar from displaying the iteration log from the iterated seemingly unrelated
regression algorithm. When the varconstraints() option is not specified, the VAR coefficients
are estimated via OLS, a noniterative procedure. As a result, noislog may be specified only with
varconstraints(). Similarly, noislog may not be combined with noisure.
isiterate(#) sets the maximum number of iterations for the iterated seemingly unrelated regression
algorithm. The default limit is 1,600. When the varconstraints() option is not specified, the
VAR coefficients are estimated via OLS, a noniterative procedure. As a result, isiterate() may
be specified only with varconstraints(). Similarly, isiterate() may not be combined with
noisure.
istolerance(#) specifies the convergence tolerance of the iterated seemingly unrelated regression
algorithm. The default tolerance is 1e-6. When the varconstraints() option is not specified,
the VAR coefficients are estimated via OLS, a noniterative procedure. As a result, istolerance()
may be specified only with varconstraints(). Similarly, istolerance() may not be combined
with noisure.
noisure specifies that the VAR coefficients be estimated via one-step seemingly unrelated regression
when varconstraints() is specified. By default, svar estimates the coefficients in the VAR
via iterated seemingly unrelated regression when varconstraints() is specified. When the
varconstraints() option is not specified, the VAR coefficient estimates are obtained via OLS, a
noniterative procedure. As a result, noisure may be specified only with varconstraints().
dfk specifies that a small-sample degrees-of-freedom adjustment be used when estimating $\Sigma$, the
covariance matrix of the VAR disturbances. Specifically, $1/(T - \overline{m})$ is used instead of the
large-sample divisor $1/T$, where $\overline{m}$ is the average number of parameters in the functional
form for $y_t$ over the K equations.
small causes svar to calculate and report small-sample t and F statistics instead of the large-sample
normal and chi-squared statistics.
noidencheck requests that the Amisano and Giannini (1997) check for local identification not be
performed. This check is local to the starting values used. Because of this dependence on the
starting values, you may wish to suppress this check by specifying the noidencheck option.
However, be careful in specifying this option. Models that are not structurally identified can still
converge, thereby producing meaningless results that only appear to have meaning.
nobigf requests that svar not save the estimated parameter vector that incorporates coefficients that
have been implicitly constrained to be zero, such as when some lags have been omitted from a
model. e(bf) is used for computing asymptotic standard errors in the postestimation commands
irf create and fcast compute. Therefore, specifying nobigf implies that the asymptotic
standard errors will not be available from irf create and fcast compute. See Fitting models
with some lags excluded in [TS] var.

Reporting

level(#); see [R] estimation options.


full shows constrained parameters in table.


var specifies that the output from var also be displayed. By default, the underlying VAR is fit
quietly.
lutstats specifies that the Lütkepohl versions of the lag-order selection statistics be reported. See
Methods and formulas in [TS] varsoc for a discussion of these statistics.
nocnsreport; see [R] estimation options.
display_options: cformat(%fmt), pformat(%fmt), and sformat(%fmt); see [R] estimation options.

Maximization

 
maximize_options: difficult, technique(algorithm_spec), iterate(#), [no]log, trace,
gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#),
nrtolerance(#), nonrtolerance, and from(init_specs); see [R] maximize. These options are
seldom used.
The following option is available with svar but is not shown in the dialog box:
coeflegend; see [R] estimation options.

Remarks and examples


Remarks are presented under the following headings:
Introduction
Short-run SVAR models
Long-run SVAR models

Introduction
This entry assumes that you have already read [TS] var intro and [TS] var; if not, please do. Here
we illustrate how to fit SVARs in Stata subject to short-run and long-run restrictions. For more detailed
information on SVARs, see Amisano and Giannini (1997) and Hamilton (1994). For good introductions
to VARs, see Lütkepohl (2005), Hamilton (1994), Stock and Watson (2001), and Becketti (2013).

Short-run SVAR models


A short-run SVAR model without exogenous variables can be written as

$$A(I_K - A_1 L - A_2 L^2 - \cdots - A_p L^p)\,y_t = A\epsilon_t = B e_t$$

where L is the lag operator, A, B, and $A_1, \ldots, A_p$ are $K \times K$ matrices of parameters, $\epsilon_t$ is a
$K \times 1$ vector of innovations with $\epsilon_t \sim N(0, \Sigma)$ and $E[\epsilon_t \epsilon_s'] = 0_K$ for all $s \neq t$, and $e_t$ is a $K \times 1$
vector of orthogonalized disturbances; that is, $e_t \sim N(0, I_K)$ and $E[e_t e_s'] = 0_K$ for all $s \neq t$.
These transformations of the innovations allow us to analyze the dynamics of the system in terms
of a change to an element of et . In a short-run SVAR model, we obtain identification by placing
restrictions on A and B, which are assumed to be nonsingular.
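The link between the restrictions on A and B and the covariance matrix of the innovations can be made explicit with a short derivation. It uses only the definitions above and the nonsingularity of A and B; the symbol $P_{sr}$ follows the notation of [TS] var intro.

```latex
\begin{aligned}
A\epsilon_t = B e_t
  \;&\Longrightarrow\; \epsilon_t = A^{-1} B\, e_t = P_{sr}\, e_t,
  \qquad P_{sr} = A^{-1}B \\
\Sigma = E[\epsilon_t \epsilon_t']
  \;&=\; P_{sr}\, E[e_t e_t']\, P_{sr}'
  \;=\; P_{sr} P_{sr}'
  \;=\; A^{-1} B B' (A^{-1})'
\end{aligned}
```

Because $\Sigma$ is symmetric, it supplies only $K(K+1)/2$ distinct moments, which is why restrictions on A and B are needed before the $2K^2$ elements of the two matrices can be recovered.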

Example 1: Short-run just-identified SVAR model


Following Sims (1980), the Cholesky decomposition is one method of identifying the
impulse–response functions in a VAR; thus, this method corresponds to an SVAR. There are several sets of
constraints on A and B that are easily manipulated back to the Cholesky decomposition, and the
following example illustrates this point.

One way to impose the Cholesky restrictions is to assume an SVAR model of the form

$$\widetilde{A}(I_K - A_1 L - A_2 L^2 - \cdots - A_p L^p)\,y_t = \widetilde{B} e_t$$

where $\widetilde{A}$ is a lower triangular matrix with ones on the diagonal and $\widetilde{B}$ is a diagonal matrix. Because
the P matrix for this model is $P_{sr} = \widetilde{A}^{-1}\widetilde{B}$, its estimate, $\widehat{P}_{sr}$, obtained by plugging in estimates
of $\widetilde{A}$ and $\widetilde{B}$, should equal the Cholesky decomposition of $\widehat{\Sigma}$.
To illustrate, we use the German macroeconomic data discussed in Lütkepohl (2005) and used
in [TS] var. In this example, $y_t$ = (dln_inv, dln_inc, dln_consump), where dln_inv is the
first difference of the log of investment, dln_inc is the first difference of the log of income, and
dln_consump is the first difference of the log of consumption. Because the first difference of the
natural log of a variable can be treated as an approximation of the percentage change in that variable,
we will refer to these variables as percentage changes in inv, inc, and consump, respectively.
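We will impose the Cholesky restrictions on this system by applying equality constraints with the
constraint matrices

$$A = \begin{bmatrix} 1 & 0 & 0 \\ . & 1 & 0 \\ . & . & 1 \end{bmatrix}
\qquad\text{and}\qquad
B = \begin{bmatrix} . & 0 & 0 \\ 0 & . & 0 \\ 0 & 0 & . \end{bmatrix}$$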

With these structural restrictions, we assume that the percentage change in inv is not contemporaneously affected by the percentage changes in either inc or consump. We also assume that the
percentage change of inc is affected by contemporaneous changes in inv but not consump. Finally,
we assume that percentage changes in consump are affected by contemporaneous changes in both
inv and inc.
The following commands fit an SVAR model with these constraints.
. use http://www.stata-press.com/data/r13/lutkepohl2
(Quarterly SA West German macro data, Bil DM, from Lutkepohl 1993 Table E.1)
. matrix A = (1,0,0\.,1,0\.,.,1)
. matrix B = (.,0,0\0,.,0\0,0,.)

. svar dln_inv dln_inc dln_consump if qtr<=tq(1978q4), aeq(A) beq(B)
Estimating short-run parameters
  (output omitted)
Structural vector autoregression
 ( 1)  [a_1_1]_cons = 1
 ( 2)  [a_1_2]_cons = 0
 ( 3)  [a_1_3]_cons = 0
 ( 4)  [a_2_2]_cons = 1
 ( 5)  [a_2_3]_cons = 0
 ( 6)  [a_3_3]_cons = 1
 ( 7)  [b_1_2]_cons = 0
 ( 8)  [b_1_3]_cons = 0
 ( 9)  [b_2_1]_cons = 0
 (10)  [b_2_3]_cons = 0
 (11)  [b_3_1]_cons = 0
 (12)  [b_3_2]_cons = 0
Sample:  1960q4 - 1978q4                        No. of obs      =         73
Exactly identified model                        Log likelihood  =    606.307

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      /a_1_1 |          1  (constrained)
      /a_2_1 |  -.0336288   .0294605    -1.14   0.254    -.0913702    .0241126
      /a_3_1 |  -.0435846   .0194408    -2.24   0.025    -.0816879   -.0054812
      /a_1_2 |          0  (constrained)
      /a_2_2 |          1  (constrained)
      /a_3_2 |   -.424774   .0765548    -5.55   0.000    -.5748187   -.2747293
      /a_1_3 |          0  (constrained)
      /a_2_3 |          0  (constrained)
      /a_3_3 |          1  (constrained)
-------------+----------------------------------------------------------------
      /b_1_1 |   .0438796   .0036315    12.08   0.000      .036762    .0509972
      /b_2_1 |          0  (constrained)
      /b_3_1 |          0  (constrained)
      /b_1_2 |          0  (constrained)
      /b_2_2 |   .0110449   .0009141    12.08   0.000     .0092534    .0128365
      /b_3_2 |          0  (constrained)
      /b_1_3 |          0  (constrained)
      /b_2_3 |          0  (constrained)
      /b_3_3 |   .0072243   .0005979    12.08   0.000     .0060525    .0083962
------------------------------------------------------------------------------


The SVAR output has four parts: an iteration log, a display of the constraints imposed, a header with
sample and SVAR log-likelihood information, and a table displaying the estimates of the parameters
from the A and B matrices. From the output above, we can see that the equality constraint matrices
supplied to svar imposed the intended constraints and that the SVAR header informs us that the model
we fit is just identified. The estimates of a_2_1, a_3_1, and a_3_2 are all negative. Because the
off-diagonal elements of the A matrix contain the negative of the actual contemporaneous effects,
the estimated effects are positive, as expected.

The estimates $\widehat{A}$ and $\widehat{B}$ are stored in e(A) and e(B), respectively, allowing us to compute the
estimated Cholesky decomposition.
. matrix Aest = e(A)
. matrix Best = e(B)
. matrix chol_est = inv(Aest)*Best

. matrix list chol_est
chol_est[3,3]
                 dln_inv      dln_inc  dln_consump
    dln_inv    .04387957            0            0
    dln_inc    .00147562    .01104494            0
dln_consump    .00253928     .0046916    .00722432

svar stores the estimated $\Sigma$ from the underlying var in e(Sigma). The output below illustrates
the computation of the Cholesky decomposition of e(Sigma). It is the same as the output computed
from the SVAR estimates.
. matrix sig_var = e(Sigma)
. matrix chol_var = cholesky(sig_var)
. matrix list chol_var
chol_var[3,3]
                 dln_inv      dln_inc  dln_consump
    dln_inv    .04387957            0            0
    dln_inc    .00147562    .01104494            0
dln_consump    .00253928     .0046916    .00722432

We might now wonder why we bother obtaining parameter estimates via nonlinear estimation if
we can obtain them simply by a transform of the estimates produced by var. When the model is just
identified, as in the previous example, the SVAR parameter estimates can be computed via a transform
of the VAR estimates. However, when the model is overidentified, such is not the case.
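Rather than comparing the two listings by eye, the agreement can be checked numerically. This is a small sketch that assumes the matrices chol_est and chol_var from this example are still in memory; mreldif() returns the largest relative difference between corresponding elements.

```stata
* Assumes chol_est and chol_var were created as in example 1
display mreldif(chol_est, chol_var)
```

A value close to zero would indicate that the SVAR estimates and the Cholesky factor of e(Sigma) agree up to numerical precision.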

Example 2: Short-run overidentified SVAR model


The Cholesky decomposition example above fit a just-identified model. This example considers
an overidentified model. In example 1, the a_2_1 parameter was not significant, which is consistent
with a theory in which changes in our measure of investment affect changes in income only with a
lag. We can impose the restriction that a_2_1 is zero and then test this overidentifying restriction.
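Our A and B matrices are now

$$A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ . & . & 1 \end{bmatrix}
\qquad\text{and}\qquad
B = \begin{bmatrix} . & 0 & 0 \\ 0 & . & 0 \\ 0 & 0 & . \end{bmatrix}$$

The output below contains the commands and results we obtained by fitting this model on the
Lütkepohl data.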
. matrix B = (.,0,0\0,.,0\0,0,.)
. matrix A = (1,0,0\0,1,0\.,.,1)

. svar dln_inv dln_inc dln_consump if qtr<=tq(1978q4), aeq(A) beq(B)
Estimating short-run parameters
  (output omitted)
Structural vector autoregression
 ( 1)  [a_1_1]_cons = 1
 ( 2)  [a_1_2]_cons = 0
 ( 3)  [a_1_3]_cons = 0
 ( 4)  [a_2_1]_cons = 0
 ( 5)  [a_2_2]_cons = 1
 ( 6)  [a_2_3]_cons = 0
 ( 7)  [a_3_3]_cons = 1
 ( 8)  [b_1_2]_cons = 0
 ( 9)  [b_1_3]_cons = 0
 (10)  [b_2_1]_cons = 0
 (11)  [b_2_3]_cons = 0
 (12)  [b_3_1]_cons = 0
 (13)  [b_3_2]_cons = 0
Sample:  1960q4 - 1978q4                        No. of obs      =         73
Overidentified model                            Log likelihood  =   605.6613

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      /a_1_1 |          1  (constrained)
      /a_2_1 |          0  (constrained)
      /a_3_1 |  -.0435911   .0192696    -2.26   0.024    -.0813589   -.0058233
      /a_1_2 |          0  (constrained)
      /a_2_2 |          1  (constrained)
      /a_3_2 |  -.4247741   .0758806    -5.60   0.000    -.5734973   -.2760508
      /a_1_3 |          0  (constrained)
      /a_2_3 |          0  (constrained)
      /a_3_3 |          1  (constrained)
-------------+----------------------------------------------------------------
      /b_1_1 |   .0438796   .0036315    12.08   0.000      .036762    .0509972
      /b_2_1 |          0  (constrained)
      /b_3_1 |          0  (constrained)
      /b_1_2 |          0  (constrained)
      /b_2_2 |   .0111431   .0009222    12.08   0.000     .0093356    .0129506
      /b_3_2 |          0  (constrained)
      /b_1_3 |          0  (constrained)
      /b_2_3 |          0  (constrained)
      /b_3_3 |   .0072243   .0005979    12.08   0.000     .0060525    .0083962
------------------------------------------------------------------------------
LR test of identifying restrictions:     chi2(  1)=    1.292   Prob > chi2 = 0.256

The footer in this example reports a test of the overidentifying restriction. The null hypothesis of this
test is that any overidentifying restrictions are valid. In the case at hand, we cannot reject this null
hypothesis at any of the conventional levels.
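The statistic in the footer can be reproduced from stored results. The sketch below assumes it is run immediately after the overidentified fit above and relies only on e(ll), e(ll_var), e(chi2_oid), and e(oid_df), which are listed under Stored results.

```stata
* Run immediately after the overidentified svar fit in example 2
display "LR statistic = " 2*(e(ll_var) - e(ll))
display "p-value      = " chi2tail(e(oid_df), e(chi2_oid))
```

Using the log likelihoods reported in examples 1 and 2, 2 x (606.307 - 605.6613) = 1.2914, which agrees with the reported chi2(1) of 1.292 up to the rounding of the displayed log likelihoods.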

Example 3: Short-run SVAR model with constraints


svar also allows us to place constraints on the parameters of the underlying VAR. We begin by
looking at the underlying VAR for the SVARs that we have used in the previous examples.

. var dln_inv dln_inc dln_consump if qtr<=tq(1978q4)
Vector autoregression
Sample:  1960q4 - 1978q4                        No. of obs      =         73
Log likelihood =    606.307                     AIC             =  -16.03581
FPE            =   2.18e-11                     HQIC            =  -15.77323
Det(Sigma_ml)  =   1.23e-11                     SBIC            =  -15.37691

Equation           Parms      RMSE     R-sq      chi2     P>chi2
----------------------------------------------------------------
dln_inv               7     .046148   0.1286   10.76961   0.0958
dln_inc               7     .011719   0.1142   9.410683   0.1518
dln_consump           7     .009445   0.2513   24.50031   0.0004
----------------------------------------------------------------

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
dln_inv      |
     dln_inv |
         L1. |  -.3196318   .1192898    -2.68   0.007    -.5534355   -.0858282
         L2. |  -.1605508    .118767    -1.35   0.176      -.39333    .0722283
     dln_inc |
         L1. |   .1459851   .5188451     0.28   0.778    -.8709326    1.162903
         L2. |   .1146009    .508295     0.23   0.822     -.881639    1.110841
 dln_consump |
         L1. |   .9612288   .6316557     1.52   0.128    -.2767936    2.199251
         L2. |   .9344001   .6324034     1.48   0.140    -.3050877    2.173888
       _cons |  -.0167221   .0163796    -1.02   0.307    -.0488257    .0153814
-------------+----------------------------------------------------------------
dln_inc      |
     dln_inv |
         L1. |   .0439309   .0302933     1.45   0.147    -.0154427    .1033046
         L2. |   .0500302   .0301605     1.66   0.097    -.0090833    .1091437
     dln_inc |
         L1. |  -.1527311    .131759    -1.16   0.246    -.4109741    .1055118
         L2. |   .0191634   .1290799     0.15   0.882    -.2338285    .2721552
 dln_consump |
         L1. |   .2884992   .1604069     1.80   0.072    -.0258926    .6028909
         L2. |     -.0102   .1605968    -0.06   0.949    -.3249639    .3045639
       _cons |   .0157672   .0041596     3.79   0.000     .0076146    .0239198
-------------+----------------------------------------------------------------
dln_consump  |
     dln_inv |
         L1. |   -.002423   .0244142    -0.10   0.921     -.050274     .045428
         L2. |   .0338806   .0243072     1.39   0.163    -.0137607    .0815219
     dln_inc |
         L1. |   .2248134   .1061884     2.12   0.034     .0166879    .4329389
         L2. |   .3549135   .1040292     3.41   0.001     .1510199     .558807
 dln_consump |
         L1. |  -.2639695   .1292766    -2.04   0.041     -.517347     -.010592
         L2. |  -.0222264   .1294296    -0.17   0.864    -.2759039      .231451
       _cons |   .0129258   .0033523     3.86   0.000     .0063554    .0194962
------------------------------------------------------------------------------

The equation-level model tests reported in the header indicate that we cannot reject the null
hypotheses that all the coefficients in the first equation are zero, nor can we reject the null that all
the coefficients in the second equation are zero at the 5% significance level. We use a combination of
theory and the p-values from the output above to place some exclusion restrictions on the underlying
VAR(2). Specifically, in the equation for the percentage change of inv, we constrain the coefficients
on L2.dln_inv, L.dln_inc, L2.dln_inc, and L2.dln_consump to be zero. In the equation for
dln_inc, we constrain the coefficients on L2.dln_inv, L2.dln_inc, and L2.dln_consump to be
zero. Finally, in the equation for dln_consump, we constrain L.dln_inv and L2.dln_consump to
be zero. We then refit the SVAR from the previous example.
. constraint 1 [dln_inv]L2.dln_inv = 0
. constraint 2 [dln_inv]L.dln_inc = 0
. constraint 3 [dln_inv]L2.dln_inc = 0
. constraint 4 [dln_inv]L2.dln_consump = 0
. constraint 5 [dln_inc]L2.dln_inv = 0
. constraint 6 [dln_inc]L2.dln_inc = 0
. constraint 7 [dln_inc]L2.dln_consump = 0
. constraint 8 [dln_consump]L.dln_inv = 0
. constraint 9 [dln_consump]L2.dln_consump = 0
. svar dln_inv dln_inc dln_consump if qtr<=tq(1978q4), aeq(A) beq(B)
> varconst(1/9) noislog
Estimating short-run parameters
  (output omitted)
Structural vector autoregression
 ( 1)  [a_1_1]_cons = 1
 ( 2)  [a_1_2]_cons = 0
 ( 3)  [a_1_3]_cons = 0
 ( 4)  [a_2_1]_cons = 0
 ( 5)  [a_2_2]_cons = 1
 ( 6)  [a_2_3]_cons = 0
 ( 7)  [a_3_3]_cons = 1
 ( 8)  [b_1_2]_cons = 0
 ( 9)  [b_1_3]_cons = 0
 (10)  [b_2_1]_cons = 0
 (11)  [b_2_3]_cons = 0
 (12)  [b_3_1]_cons = 0
 (13)  [b_3_2]_cons = 0
Sample:  1960q4 - 1978q4                        No. of obs      =         73
Overidentified model                            Log likelihood  =   601.8591

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      /a_1_1 |          1  (constrained)
      /a_2_1 |          0  (constrained)
      /a_3_1 |  -.0418708   .0187579    -2.23   0.026    -.0786356   -.0051061
      /a_1_2 |          0  (constrained)
      /a_2_2 |          1  (constrained)
      /a_3_2 |  -.4255808   .0745298    -5.71   0.000    -.5716565   -.2795051
      /a_1_3 |          0  (constrained)
      /a_2_3 |          0  (constrained)
      /a_3_3 |          1  (constrained)
-------------+----------------------------------------------------------------
      /b_1_1 |   .0451851   .0037395    12.08   0.000     .0378557    .0525145
      /b_2_1 |          0  (constrained)
      /b_3_1 |          0  (constrained)
      /b_1_2 |          0  (constrained)
      /b_2_2 |   .0113723   .0009412    12.08   0.000     .0095276     .013217
      /b_3_2 |          0  (constrained)
      /b_1_3 |          0  (constrained)
      /b_2_3 |          0  (constrained)
      /b_3_3 |   .0072417   .0005993    12.08   0.000      .006067    .0084164
------------------------------------------------------------------------------
LR test of identifying restrictions:     chi2(  1)=    .8448   Prob > chi2 = 0.358

If we displayed the underlying VAR(2) results by using the var option, we would see that most of
the unconstrained coefficients are now significant at the 10% level and that none of the equation-level
model statistics fail to reject the null hypothesis at the 10% level. The svar output reveals that the
p-value of the overidentification test rose and that the coefficient on a_3_1 is still significant at the
5% level but not at the 1% level.
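To inspect the constrained VAR(2) directly, the var reporting option described under Options can be added to the same command. The fragment below is a sketch for a do-file (it uses /// line continuation) and assumes that the matrices A and B and constraints 1 through 9 from this example are already defined.

```stata
* Assumes matrices A and B and constraints 1-9 are defined as in example 3
svar dln_inv dln_inc dln_consump if qtr<=tq(1978q4), aeq(A) beq(B) ///
    varconstraints(1/9) noislog var
```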

Before moving on to models with long-run constraints, consider these limitations. We cannot place
constraints on the elements of A in terms of the elements of B, or vice versa. This limitation is
imposed by the form of the check for identification derived by Amisano and Giannini (1997). As
noted in Methods and formulas, this test requires separate constraint matrices for the parameters in
A and B. Another limitation is that we cannot mix short-run and long-run constraints.

Long-run SVAR models


As discussed in [TS] var intro, a long-run SVAR has the form

$$y_t = C e_t$$

In long-run models, the constraints are placed on the elements of C, and the free parameters are
estimated. These constraints are often exclusion restrictions. For instance, constraining C[1, 2] to be
zero can be interpreted as setting the long-run response of variable 1 to the structural shocks driving
variable 2 to be zero.
Similar to the short-run model, the $P_{lr}$ matrix such that $P_{lr}P_{lr}' = \Sigma$ identifies the structural
impulse–response functions. $P_{lr} = C$ is identified by the restrictions placed on the parameters in
C. There are $K^2$ parameters in C, and the order condition for identification requires that there be
at least $K^2 - K(K+1)/2$ restrictions placed on those parameters. As in the short-run model, this
order condition is necessary but not sufficient, so the Amisano and Giannini (1997) check for local
identification is performed by default.
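As a quick worked check of the order condition, the minimum number of restrictions simplifies to $K(K-1)/2$. The values of K below are illustrative only.

```latex
K^2 - \frac{K(K+1)}{2} \;=\; \frac{2K^2 - K^2 - K}{2} \;=\; \frac{K(K-1)}{2}
\qquad\Longrightarrow\qquad
\begin{cases}
K = 2: & \text{at least } 1 \text{ restriction} \\
K = 3: & \text{at least } 3 \text{ restrictions}
\end{cases}
```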


Example 4: Long-run SVAR model


Suppose that we have a theory in which unexpected changes to the money supply have no long-run
effects on changes in output and, similarly, that unexpected changes in output have no long-run effects
on changes in the money supply. The C matrix implied by this theory is

$$C = \begin{bmatrix} . & 0 \\ 0 & . \end{bmatrix}$$

. use http://www.stata-press.com/data/r13/m1gdp
. matrix lr = (.,0\0,.)
. svar d.ln_m1 d.ln_gdp, lreq(lr)
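Estimating long-run parameters
  (output omitted)
Structural vector autoregression
 ( 1)  [c_1_2]_cons = 0
 ( 2)  [c_2_1]_cons = 0
Sample:  1959q4 - 2002q2                        No. of obs      =        171
Overidentified model                            Log likelihood  =   1151.614

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      /c_1_1 |   .0301007   .0016277    18.49   0.000     .0269106    .0332909
      /c_2_1 |          0  (constrained)
      /c_1_2 |          0  (constrained)
      /c_2_2 |   .0129691   .0007013    18.49   0.000     .0115946    .0143436
------------------------------------------------------------------------------
LR test of identifying restrictions:     chi2(  1)=    .1368   Prob > chi2 = 0.712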

We have assumed that the underlying VAR has 2 lags; four of the five selection-order criteria
computed by varsoc (see [TS] varsoc) recommended this choice. The test of the overidentifying
restriction provides no indication that the restriction is invalid.
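The lag choice mentioned above can be examined with varsoc before fitting the model. The following sketch uses the same dataset and variables as this example and then writes the two-lag choice out explicitly with lags(1/2), which matches the default.

```stata
use http://www.stata-press.com/data/r13/m1gdp

* Lag-order selection statistics for the two differenced series
varsoc d.ln_m1 d.ln_gdp

* Same long-run model with the lag choice stated explicitly
matrix lr = (.,0\0,.)
svar d.ln_m1 d.ln_gdp, lags(1/2) lreq(lr)
```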


Stored results
svar stores the following in e():
Scalars
  e(N)                number of observations
  e(N_cns)            number of constraints
  e(k_eq)             number of equations in e(b)
  e(k_dv)             number of dependent variables
  e(k_aux)            number of auxiliary parameters
  e(ll)               log likelihood from svar
  e(ll_#)             log likelihood for equation #
  e(N_gaps_var)       number of gaps in the sample
  e(k_var)            number of coefficients in VAR
  e(k_eq_var)         number of equations in underlying VAR
  e(k_dv_var)         number of dependent variables in underlying VAR
  e(df_eq_var)        average number of parameters in an equation
  e(df_m_var)         model degrees of freedom
  e(df_r_var)         if small, residual degrees of freedom
  e(obs_#_var)        number of observations on equation #
  e(k_#_var)          number of coefficients in equation #
  e(df_m#_var)        model degrees of freedom for equation #
  e(df_r#_var)        residual degrees of freedom for equation # (small only)
  e(r2_#_var)         R-squared for equation #
  e(ll_#_var)         log likelihood for equation # VAR
  e(chi2_#_var)       chi-squared statistic for equation #
  e(F_#_var)          F statistic for equation # (small only)
  e(rmse_#_var)       root mean squared error for equation #
  e(mlag_var)         highest lag in VAR
  e(tparms_var)       number of parameters in all equations
  e(aic_var)          Akaike information criterion
  e(hqic_var)         Hannan–Quinn information criterion
  e(sbic_var)         Schwarz–Bayesian information criterion
  e(fpe_var)          final prediction error
  e(ll_var)           log likelihood from var
  e(detsig_var)       determinant of e(Sigma)
  e(detsig_ml_var)    determinant of $\widehat{\Sigma}_{ml}$
  e(tmin)             first time period in the sample
  e(tmax)             maximum time
  e(chi2_oid)         overidentification test
  e(oid_df)           number of overidentifying restrictions
  e(rank)             rank of e(V)
  e(ic_ml)            number of iterations
  e(rc_ml)            return code from ml

Macros
  e(cmd)              svar
  e(cmdline)          command as typed
  e(lrmodel)          long-run model, if specified
  e(lags_var)         lags in model
  e(depvar_var)       names of dependent variables
  e(endog_var)        names of endogenous variables
  e(exog_var)         names of exogenous variables, if specified
  e(nocons_var)       noconstant, if noconstant specified
  e(cns_lr)           long-run constraints
  e(cns_a)            cross-parameter equality constraints on A
  e(cns_b)            cross-parameter equality constraints on B
  e(dfk_var)          alternate divisor (dfk), if specified
  e(eqnames_var)      names of equations
  e(lutstats_var)     lutstats, if specified
  e(constraints_var)  constraints_var, if there are constraints on VAR
  e(small)            small, if specified
  e(tsfmt)            format of timevar
  e(timevar)          name of timevar
  e(title)            title in estimation output
  e(properties)       b V
  e(predict)          program used to implement predict

Matrices
  e(b)                coefficient vector
  e(Cns)              constraints matrix
  e(Sigma)            $\widehat{\Sigma}$ matrix
  e(V)                variance–covariance matrix of the estimators
  e(b_var)            coefficient vector of underlying VAR model
  e(V_var)            VCE of underlying VAR model
  e(bf_var)           full coefficient vector with zeros in dropped lags
  e(G_var)            G matrix stored by var; see [TS] var Methods and formulas
  e(aeq)              aeq(matrix), if specified
  e(acns)             acns(matrix), if specified
  e(beq)              beq(matrix), if specified
  e(bcns)             bcns(matrix), if specified
  e(lreq)             lreq(matrix), if specified
  e(lrcns)            lrcns(matrix), if specified
  e(Cns_var)          constraint matrix from var, if varconstraints() is specified
  e(A)                estimated A matrix, if a short-run model
  e(B)                estimated B matrix
  e(C)                estimated C matrix, if a long-run model
  e(A1)               estimated $\bar{A}$ matrix, if a long-run model

Functions
  e(sample)           marks estimation sample

Methods and formulas


The log-likelihood function for models with short-run constraints is

$$L(A, B) = -\frac{NK}{2}\ln(2\pi) + \frac{N}{2}\ln(|W|^2) - \frac{N}{2}\,\mathrm{tr}(W'W\widehat{\Sigma})$$

where $W = B^{-1}A$.
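The formula can be evaluated by hand from stored results. The sketch below assumes it is run right after a short-run svar fit; the names Ahat, Bhat, Shat, W, WWS, Kdim, Nobs, and llhand are arbitrary and introduced here only for illustration. When dfk is not specified, one would expect the hand-computed value to be close to e(ll), but that agreement is an expectation to verify rather than a documented guarantee.

```stata
* Run after a short-run svar fit
matrix Ahat = e(A)
matrix Bhat = e(B)
matrix Shat = e(Sigma)

* W = B^{-1} A, and the product W'W*Sigma-hat used inside the trace
matrix W   = inv(Bhat)*Ahat
matrix WWS = W'*W*Shat

scalar Kdim = colsof(W)
scalar Nobs = e(N)

* L = -(NK/2) ln(2*pi) + (N/2) ln(|W|^2) - (N/2) tr(W'W Sigma-hat)
scalar llhand = -(Nobs*Kdim/2)*ln(2*_pi) + (Nobs/2)*ln(det(W)^2) - (Nobs/2)*trace(WWS)

display "hand-computed log likelihood = " llhand
display "e(ll)                        = " e(ll)
```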

When there are long-run constraints, because $C = \bar{A}^{-1}B$ and $A = I_K$, $W = B^{-1} = C^{-1}\bar{A}^{-1} =
(\bar{A}C)^{-1}$. Substituting the last term for W in the short-run log likelihood produces the long-run log
likelihood

$$L(C) = -\frac{NK}{2}\ln(2\pi) + \frac{N}{2}\ln(|\widetilde{W}|^2) - \frac{N}{2}\,\mathrm{tr}(\widetilde{W}'\widetilde{W}\widehat{\Sigma})$$

where $\widetilde{W} = (\bar{A}C)^{-1}$.
For both the short-run and the long-run models, the maximization is performed by the scoring
method. See Harvey (1990) for a discussion of this method.
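Based on results from Amisano and Giannini (1997), the score vector for the short-run model is

$$\frac{\partial L(A, B)}{\partial[\mathrm{vec}(A), \mathrm{vec}(B)]} =
N\left[\{\mathrm{vec}(W'^{-1})\}' - \{\mathrm{vec}(W)\}'(\widehat{\Sigma} \otimes I_K)\right] \times
\left[(I_K \otimes B^{-1}),\; -(A'B'^{-1} \otimes B^{-1})\right]$$

and the expected information matrix is

$$I[\mathrm{vec}(A), \mathrm{vec}(B)] = N
\begin{bmatrix} (W^{-1} \otimes B'^{-1}) \\ -(I_K \otimes B'^{-1}) \end{bmatrix}
(I_{K^2} + \oplus)
\left[(W'^{-1} \otimes B^{-1}),\; -(I_K \otimes B^{-1})\right]$$

where $\oplus$ is the commutation matrix defined in Magnus and Neudecker (1999, 46–48).
Using results from Amisano and Giannini (1997), we can derive the score vector and the expected
information matrix for the case with long-run restrictions. The score vector is

$$\frac{\partial L(C)}{\partial\,\mathrm{vec}(C)} =
N\left[\{\mathrm{vec}(W'^{-1})\}' - \{\mathrm{vec}(W)\}'(\widehat{\Sigma} \otimes I_K)\right]
\left[-(\bar{A}'^{-1}C'^{-1} \otimes C^{-1})\right]$$

and the expected information matrix is

$$I[\mathrm{vec}(C)] = N (I_K \otimes C'^{-1})(I_{K^2} + \oplus)(I_K \otimes C^{-1})$$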

Checking for identification


This section describes the methods used to check for identification of models with short-run or
long-run constraints. Both methods depend on the starting values. By default, svar uses starting
values constructed by taking a vector of appropriate dimension and applying the constraints. If there
are m parameters in the model, the jth element of the $1 \times m$ vector is 1 + m/100. svar also allows
the user to provide starting values.
For the short-run case, the model is identified if the matrix

$$V_{sr} = \begin{bmatrix}
N_K & N_K \\
N_K & N_K \\
R_a(W' \otimes B) & 0_{K^2} \\
0_{K^2} & R_b(I_K \otimes B)
\end{bmatrix}$$

has full column rank of $2K^2$, where $N_K = (1/2)(I_{K^2} + \oplus)$, $R_a$ is the constraint matrix for the
parameters in A (that is, $R_a\,\mathrm{vec}(A) = r_a$), and $R_b$ is the constraint matrix for the parameters in B
(that is, $R_b\,\mathrm{vec}(B) = r_b$).

For the long-run case, based on results from the C model in Amisano and Giannini (1997), the
model is identified if the matrix

$$V_{lr} = \begin{bmatrix}
(I \otimes C'^{-1})(2N_K)(I \otimes C^{-1}) \\
R_c
\end{bmatrix}$$

has full column rank of $K^2$, where $R_c$ is the constraint matrix for the parameters in C; that is,
$R_c\,\mathrm{vec}(C) = r_c$.
The test of the overidentifying restrictions is computed as

$$LR = 2(LL_{var} - LL_{svar})$$

where LR is the value of the test statistic against the null hypothesis that the overidentifying restrictions
are valid, $LL_{var}$ is the log likelihood from the underlying VAR(p) model, and $LL_{svar}$ is the log
likelihood from the SVAR model. The test statistic is asymptotically distributed as $\chi^2(q)$, where q is the
number of overidentifying restrictions. Amisano and Giannini (1997, 38–39) emphasize that, because
this test of the validity of the overidentifying restrictions is an omnibus test, it can be interpreted as
a test of the null hypothesis that all the restrictions are valid.
Because constraints might not be independent either by construction or because of the data, the
number of restrictions is not necessarily equal to the number of constraints. The rank of e(V)
gives the number of parameters that were independently estimated after applying the constraints. The
maximum number of parameters that can be estimated in an identified short-run or long-run SVAR is
$K(K+1)/2$. This implies that the number of overidentifying restrictions, $q$, is equal to $K(K+1)/2$
minus the rank of e(V).
The number of overidentifying restrictions is also linked to the order condition for each model. In
a short-run SVAR model, there are $2K^2$ parameters. Because no more than $K(K+1)/2$ parameters
may be estimated, the order condition for a short-run SVAR model is that at least $2K^2 - K(K+1)/2$
restrictions be placed on the model. Similarly, there are $K^2$ parameters in a long-run SVAR model.
Because no more than $K(K+1)/2$ parameters may be estimated, the order condition for a long-run
SVAR model is that at least $K^2 - K(K+1)/2$ restrictions be placed on the model.
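The counts in the order conditions are easy to check by hand. The following sketch evaluates them for a three-variable model; the choice K = 3 and the local macro names are assumptions made for the illustration.

```stata
* Order-condition arithmetic for a K-variable SVAR (K = 3 is illustrative)
local K = 3
local maxparms = `K'*(`K'+1)/2

display "maximum estimable parameters:   " `maxparms'
display "short-run minimum restrictions: " 2*`K'^2 - `maxparms'
display "long-run minimum restrictions:  " `K'^2 - `maxparms'
```

For K = 3, these expressions evaluate to 6, 12, and 3, respectively.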

Acknowledgment
We thank Gianni Amisano of the Dipartimento di Scienze Economiche at the Università degli
Studi di Brescia for his helpful comments.

References
Amisano, G., and C. Giannini. 1997. Topics in Structural VAR Econometrics. 2nd ed. Heidelberg: Springer.
Becketti, S. 2013. Introduction to Time Series Using Stata. College Station, TX: Stata Press.
Christiano, L. J., M. Eichenbaum, and C. L. Evans. 1999. Monetary policy shocks: What have we learned and to
what end? In Handbook of Macroeconomics: Volume 1A, ed. J. B. Taylor and M. Woodford. New York: Elsevier.
Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
Harvey, A. C. 1990. The Econometric Analysis of Time Series. 2nd ed. Cambridge, MA: MIT Press.
Lütkepohl, H. 1993. Introduction to Multiple Time Series Analysis. 2nd ed. New York: Springer.
Lütkepohl, H. 2005. New Introduction to Multiple Time Series Analysis. New York: Springer.
Magnus, J. R., and H. Neudecker. 1999. Matrix Differential Calculus with Applications in Statistics and Econometrics.
Rev. ed. New York: Wiley.


Rothenberg, T. J. 1971. Identification in parametric models. Econometrica 39: 577–591.
Sims, C. A. 1980. Macroeconomics and reality. Econometrica 48: 1–48.
Stock, J. H., and M. W. Watson. 2001. Vector autoregressions. Journal of Economic Perspectives 15: 101–115.
Watson, M. W. 1994. Vector autoregressions and cointegration. In Vol. 4 of Handbook of Econometrics, ed. R. F.
Engle and D. L. McFadden. Amsterdam: Elsevier.

Also see
[TS] var svar postestimation Postestimation tools for svar
[TS] tsset Declare data to be time-series data
[TS] var Vector autoregressive models
[TS] varbasic Fit a simple VAR and graph IRFs or FEVDs
[TS] vec Vector error-correction models
[U] 20 Estimation and postestimation commands

[TS] var intro Introduction to vector autoregressive models

Title
var svar postestimation Postestimation tools for svar
Description
Remarks and examples

Syntax for predict


Also see

Menu for predict

Options for predict

Description
The following postestimation commands are of special interest after svar:

Command          Description
----------------------------------------------------------------------
fcast compute    obtain dynamic forecasts
fcast graph      graph dynamic forecasts obtained from fcast compute
irf              create and analyze IRFs and FEVDs
vargranger       Granger causality tests
varlmar          LM test for autocorrelation in residuals
varnorm          test for normally distributed residuals
varsoc           lag-order selection criteria
varstable        check stability condition of estimates
varwle           Wald lag-exclusion statistics
----------------------------------------------------------------------

The following standard postestimation commands are also available:

Command          Description
----------------------------------------------------------------------
estat ic         Akaike's and Schwarz's Bayesian information criteria (AIC and BIC)
estat summarize  summary statistics for the estimation sample
estat vce        variance-covariance matrix of the estimators (VCE)
estimates        cataloging estimation results
forecast         dynamic forecasts and simulations
lincom           point estimates, standard errors, testing, and inference for linear
                   combinations of coefficients
lrtest           likelihood-ratio test
nlcom            point estimates, standard errors, testing, and inference for nonlinear
                   combinations of coefficients
predict          predictions, residuals, influence statistics, and other diagnostic measures
predictnl        point estimates, standard errors, testing, and inference for generalized
                   predictions
test             Wald tests of simple and composite linear hypotheses
testnl           Wald tests of nonlinear hypotheses
----------------------------------------------------------------------


Syntax for predict

    predict [type] newvar [if] [in] [, statistic equation(eqno | eqname)]

statistic        Description
----------------------------------------------------------------------
Main
  xb             linear prediction; the default
  stdp           standard error of the linear prediction
  residuals      residuals
----------------------------------------------------------------------
These statistics are available both in and out of sample; type predict ... if e(sample) ... if wanted only for
the estimation sample.

Menu for predict

    Statistics > Postestimation > Predictions, residuals, etc.

Options for predict




Main

xb, the default, calculates the linear prediction for the specified equation.
stdp calculates the standard error of the linear prediction for the specified equation.
residuals calculates the residuals.
equation(eqno | eqname) specifies the equation to which you are referring.
equation() is filled in with one eqno or eqname for options xb, stdp, and residuals. For
example, equation(#1) would mean that the calculation is to be made for the first equation,
equation(#2) would mean the second, and so on. You could also refer to the equation by its name;
thus, equation(income) would refer to the equation named income and equation(hours), to
the equation named hours.
If you do not specify equation(), the results are the same as if you specified equation(#1).
For more information on using predict after multiple-equation estimation commands, see [R] predict.
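A minimal sketch of how equation() might be used after svar follows. It assumes the lutkepohl2 dataset used elsewhere in this manual and a simple short-run identification scheme; the new variable names xb_inc and res_inc are hypothetical.

```stata
use http://www.stata-press.com/data/r13/lutkepohl2

* Short-run constraints: A lower triangular with unit diagonal, B diagonal
matrix A = (1,0,0 \ .,1,0 \ .,.,1)
matrix B = (.,0,0 \ 0,.,0 \ 0,0,.)
svar dln_inv dln_inc dln_consump if qtr<=tq(1978q4), aeq(A) beq(B)

* Linear prediction for the dln_inc equation, referenced by name
predict xb_inc, xb equation(dln_inc)

* Residuals for the same equation, referenced by position,
* restricted to the estimation sample
predict res_inc if e(sample), residuals equation(#2)
```

Referring to an equation by name is usually less error-prone than referring to it by number when the order of the dependent variables may change.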

Remarks and examples


Remarks are presented under the following headings:
Model selection and inference
Forecasting

Model selection and inference


See the following sections for information on model selection and inference after svar.
    [TS] irf  Create and analyze IRFs, dynamic-multiplier functions, and FEVDs
    [TS] vargranger  Perform pairwise Granger causality tests after var or svar
    [TS] varlmar  Perform LM test for residual autocorrelation after var or svar
    [TS] varnorm  Test for normally distributed disturbances after var or svar
    [TS] varsoc  Obtain lag-order selection statistics for VARs and VECMs
    [TS] varstable  Check the stability condition of VAR or SVAR estimates
    [TS] varwle  Obtain Wald lag-exclusion statistics after var or svar

Forecasting
See the following sections for information on obtaining forecasts after svar:
[TS] fcast compute Compute dynamic forecasts after var, svar, or vec
[TS] fcast graph Graph forecasts after fcast compute

Also see
[TS] var svar Structural vector autoregressive models
[U] 20 Estimation and postestimation commands


Title
varbasic Fit a simple VAR and graph IRFs or FEVDs
Syntax
Remarks and examples
Also see

Menu
Stored results

Description
Methods and formulas

Options
References

Syntax

    varbasic depvarlist [if] [in] [, options]

options          Description
----------------------------------------------------------------------
Main
  lags(numlist)  use lags numlist in the model; default is lags(1 2)
  irf            produce matrix graph of IRFs
  fevd           produce matrix graph of FEVDs
  nograph        do not produce a graph
  step(#)        set forecast horizon # for estimating the OIRFs, IRFs, and FEVDs;
                   default is step(8)
----------------------------------------------------------------------

You must tsset your data before using varbasic; see [TS] tsset.
depvarlist may contain time-series operators; see [U] 11.4.4 Time-series varlists.
rolling, statsby, and xi are allowed; see [U] 11.1.10 Prefix commands.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.

Menu

    Statistics > Multivariate time series > Basic VAR

Description
varbasic fits a basic vector autoregressive (VAR) model and graphs the impulse-response functions
(IRFs), the orthogonalized impulse-response functions (OIRFs), or the forecast-error variance
decompositions (FEVDs).

Options


Main

lags(numlist) specifies the lags to be included in the model. The default is lags(1 2). This option
takes a numlist and not simply an integer for the maximum lag. For instance, lags(2) would
include only the second lag in the model, whereas lags(1/2) would include both the first and
second lags in the model. See [U] 11.1.8 numlist and [U] 11.4.4 Time-series varlists for more
discussion of numlists and lags.
irf causes varbasic to produce a matrix graph of the IRFs instead of a matrix graph of the OIRFs,
which is produced by default.

fevd causes varbasic to produce a matrix graph of the FEVDs instead of a matrix graph of the
OIRFs, which is produced by default.
nograph specifies that no graph be produced. The IRFs, OIRFs, and FEVDs are still estimated and
saved in the IRF file varbasic.irf.
step(#) specifies the forecast horizon for estimating the IRFs, OIRFs, and FEVDs. The default is eight
periods.
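The options above can be combined as in the following sketch, which assumes the lutkepohl2 dataset used in the examples of this manual. The particular lag lists and the 12-step horizon are illustrative choices only.

```stata
use http://www.stata-press.com/data/r13/lutkepohl2

* lags(2) includes only the second lag; no graph is drawn
varbasic dln_inv dln_inc dln_consump, lags(2) nograph

* lags(1/2) includes the first and second lags (same as the default)
varbasic dln_inv dln_inc dln_consump, lags(1/2) nograph

* Four lags, a 12-step horizon, and a matrix graph of FEVDs
* instead of the default OIRFs
varbasic dln_inv dln_inc dln_consump, lags(1/4) step(12) fevd
```

Each call replaces the IRF file varbasic.irf, so only the results of the last call remain available to the irf commands.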

Remarks and examples


varbasic simplifies fitting simple VARs and graphing the IRFs, the OIRFs, or the FEVDs. See
[TS] var and [TS] var svar for fitting more advanced VAR models and structural vector autoregressive
(SVAR) models. All the postestimation commands discussed in [TS] var postestimation work after
varbasic.
This entry does not discuss the methods for fitting a VAR or the methods surrounding the IRFs,
OIRFs, and FEVDs. See [TS] var and [TS] irf create for more on these methods. This entry illustrates
how to use varbasic to easily obtain results. It also illustrates how varbasic serves as an entry
point to further analysis.

Example 1
We fit a three-variable VAR with two lags to the German macro data used by Lütkepohl (2005).
The three variables are the first difference of the natural log of investment, dln_inv; the first difference
of the natural log of income, dln_inc; and the first difference of the natural log of consumption,
dln_consump. In addition to fitting the VAR, we want to see the OIRFs. Below we use varbasic to
fit a VAR(2) model on the data from the fourth quarter of 1960 through the fourth quarter of 1978.
By default, varbasic produces graphs of the OIRFs.
. use http://www.stata-press.com/data/r13/lutkepohl2
(Quarterly SA West German macro data, Bil DM, from Lutkepohl 1993 Table E.1)
. varbasic dln_inv dln_inc dln_consump if qtr<=tq(1978q4)
Vector autoregression

Sample:  1960q4 - 1978q4                        No. of obs      =         73
Log likelihood =   606.307                       AIC             =  -16.03581
FPE            =  2.18e-11                       HQIC            =  -15.77323
Det(Sigma_ml)  =  1.23e-11                       SBIC            =  -15.37691

Equation           Parms      RMSE     R-sq      chi2     P>chi2
----------------------------------------------------------------
dln_inv               7     .046148   0.1286   10.76961   0.0958
dln_inc               7     .011719   0.1142   9.410683   0.1518
dln_consump           7     .009445   0.2513   24.50031   0.0004
----------------------------------------------------------------

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
dln_inv      |
     dln_inv |
         L1. |  -.3196318   .1192898    -2.68   0.007    -.5534355   -.0858282
         L2. |  -.1605508    .118767    -1.35   0.176      -.39333    .0722283
             |
     dln_inc |
         L1. |   .1459851   .5188451     0.28   0.778    -.8709326    1.162903
         L2. |   .1146009    .508295     0.23   0.822     -.881639    1.110841
             |
 dln_consump |
         L1. |   .9612288   .6316557     1.52   0.128    -.2767936    2.199251
         L2. |   .9344001   .6324034     1.48   0.140    -.3050877    2.173888
             |
       _cons |  -.0167221   .0163796    -1.02   0.307    -.0488257    .0153814
-------------+----------------------------------------------------------------
dln_inc      |
     dln_inv |
         L1. |   .0439309   .0302933     1.45   0.147    -.0154427    .1033046
         L2. |   .0500302   .0301605     1.66   0.097    -.0090833    .1091437
             |
     dln_inc |
         L1. |  -.1527311    .131759    -1.16   0.246    -.4109741    .1055118
         L2. |   .0191634   .1290799     0.15   0.882    -.2338285    .2721552
             |
 dln_consump |
         L1. |   .2884992   .1604069     1.80   0.072    -.0258926    .6028909
         L2. |     -.0102   .1605968    -0.06   0.949    -.3249639    .3045639
             |
       _cons |   .0157672   .0041596     3.79   0.000     .0076146    .0239198
-------------+----------------------------------------------------------------
dln_consump  |
     dln_inv |
         L1. |   -.002423   .0244142    -0.10   0.921     -.050274     .045428
         L2. |   .0338806   .0243072     1.39   0.163    -.0137607    .0815219
             |
     dln_inc |
         L1. |   .2248134   .1061884     2.12   0.034     .0166879    .4329389
         L2. |   .3549135   .1040292     3.41   0.001     .1510199     .558807
             |
 dln_consump |
         L1. |  -.2639695   .1292766    -2.04   0.041     -.517347     -.010592
         L2. |  -.0222264   .1294296    -0.17   0.864    -.2759039      .231451
             |
       _cons |   .0129258   .0033523     3.86   0.000     .0063554    .0194962
------------------------------------------------------------------------------

(Figure: matrix graph of the orthogonalized IRFs with 95% CIs, one panel for each impulse and response pair among dln_inv, dln_inc, and dln_consump, titled "varbasic, impulse, response"; horizontal axis: step. Graphs by irfname, impulse variable, and response variable.)

Because we are also interested in looking at the FEVDs, we can use irf graph to obtain the
graphs. Although the details are available in [TS] irf and [TS] irf graph, the command below produces
what we want after the call to varbasic.
. irf graph fevd, lstep(1)
(Figure: matrix graph of the FEVDs with 95% CIs, showing the fraction of mse due to impulse, one panel for each impulse and response pair among dln_inv, dln_inc, and dln_consump, titled "varbasic, impulse, response"; horizontal axis: step. Graphs by irfname, impulse variable, and response variable.)

Technical note
Stata stores the estimated IRFs, OIRFs, and FEVDs in an IRF file called varbasic.irf in the current
working directory. varbasic replaces any varbasic.irf that already exists. Finally, varbasic
makes varbasic.irf the active IRF file. This means that the graph and table commands irf graph,
irf cgraph, irf ograph, irf table, and irf ctable will all display results that correspond to
the VAR fit by varbasic.
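Because varbasic.irf becomes the active IRF file, the irf commands can be called without naming a file. The sketch below assumes the model from example 1; the particular impulse and response pair is an arbitrary choice for illustration.

```stata
use http://www.stata-press.com/data/r13/lutkepohl2
varbasic dln_inv dln_inc dln_consump if qtr<=tq(1978q4), nograph

* Show which IRF file is active and what it contains
irf set
irf describe

* Tabulate the OIRF and FEVD of dln_consump with respect to a shock to dln_inc,
* using the results stored under the irfname varbasic
irf table oirf fevd, irf(varbasic) impulse(dln_inc) response(dln_consump)
```

Copying varbasic.irf to another filename before rerunning varbasic is one way to keep earlier results from being replaced.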

Stored results
See Stored results in [TS] var.

Methods and formulas


varbasic uses var and irf graph to obtain its results. See [TS] var and [TS] irf graph for a
discussion of how those commands obtain their results.

References
Lütkepohl, H. 1993. Introduction to Multiple Time Series Analysis. 2nd ed. New York: Springer.
Lütkepohl, H. 2005. New Introduction to Multiple Time Series Analysis. New York: Springer.

Also see
[TS] varbasic postestimation Postestimation tools for varbasic
[TS] tsset Declare data to be time-series data
[TS] var Vector autoregressive models
[TS] var svar Structural vector autoregressive models
[U] 20 Estimation and postestimation commands

[TS] var intro Introduction to vector autoregressive models

Title
varbasic postestimation Postestimation tools for varbasic
Description
Remarks and examples

Syntax for predict


Also see

Menu for predict

Options for predict

Description
The following postestimation commands are of special interest after varbasic:

Command          Description
----------------------------------------------------------------------
fcast compute    obtain dynamic forecasts
fcast graph      graph dynamic forecasts obtained from fcast compute
irf              create and analyze IRFs and FEVDs
vargranger       Granger causality tests
varlmar          LM test for autocorrelation in residuals
varnorm          test for normally distributed residuals
varsoc           lag-order selection criteria
varstable        check stability condition of estimates
varwle           Wald lag-exclusion statistics
----------------------------------------------------------------------

The following standard postestimation commands are also available:

Command          Description
----------------------------------------------------------------------
estat ic         Akaike's and Schwarz's Bayesian information criteria (AIC and BIC)
estat summarize  summary statistics for the estimation sample
estat vce        variance-covariance matrix of the estimators (VCE)
estimates        cataloging estimation results
forecast         dynamic forecasts and simulations
lincom           point estimates, standard errors, testing, and inference for linear
                   combinations of coefficients
lrtest           likelihood-ratio test
margins          marginal means, predictive margins, marginal effects, and average
                   marginal effects
marginsplot      graph the results from margins (profile plots, interaction plots, etc.)
nlcom            point estimates, standard errors, testing, and inference for nonlinear
                   combinations of coefficients
predict          predictions, residuals, influence statistics, and other diagnostic measures
predictnl        point estimates, standard errors, testing, and inference for generalized
                   predictions
test             Wald tests of simple and composite linear hypotheses
testnl           Wald tests of nonlinear hypotheses
----------------------------------------------------------------------


Syntax for predict

    predict [type] newvar [if] [in] [, statistic equation(eqno | eqname)]

statistic        Description
----------------------------------------------------------------------
Main
  xb             linear prediction; the default
  stdp           standard error of the linear prediction
  residuals      residuals
----------------------------------------------------------------------
These statistics are available both in and out of sample; type predict ... if e(sample) ... if wanted only for
the estimation sample.

Menu for predict

    Statistics > Postestimation > Predictions, residuals, etc.

Options for predict




Main

xb, the default, calculates the linear prediction for the specified equation.
stdp calculates the standard error of the linear prediction for the specified equation.
residuals calculates the residuals.
equation(eqno | eqname) specifies the equation to which you are referring.
equation() is filled in with one eqno or eqname for the xb, stdp, and residuals options.
For example, equation(#1) would mean that the calculation is to be made for the first equation,
equation(#2) would mean the second, and so on. You could also refer to the equation by its name;
thus, equation(income) would refer to the equation named income and equation(hours), to
the equation named hours.
If you do not specify equation(), the results are the same as if you specified equation(#1).
For more information on using predict after multiple-equation estimation commands, see [R] predict.

Remarks and examples


Example 1
All the postestimation commands discussed in [TS] var postestimation work after varbasic.
Suppose that we are interested in testing the hypothesis that there is no autocorrelation in the VAR
disturbances. Continuing example 1 from [TS] varbasic, we now use varlmar to test this hypothesis.


. use http://www.stata-press.com/data/r13/lutkepohl2
(Quarterly SA West German macro data, Bil DM, from Lutkepohl 1993 Table E.1)
. varbasic dln_inv dln_inc dln_consump if qtr<=tq(1978q4)
(output omitted )
. varlmar
   Lagrange-multiplier test

      lag        chi2    df   Prob > chi2
    --------------------------------------
        1      5.5871     9     0.78043
        2      6.3189     9     0.70763
    --------------------------------------
   H0: no autocorrelation at lag order

Because we cannot reject the null hypothesis of no autocorrelation in the residuals, this test does
not indicate any model misspecification.
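Other VAR diagnostics can be run after varbasic in the same way. The following sketch, which assumes the model from example 1, checks the stability condition and tests the disturbances for normality; the choice of these two diagnostics is only illustrative.

```stata
use http://www.stata-press.com/data/r13/lutkepohl2
varbasic dln_inv dln_inc dln_consump if qtr<=tq(1978q4), nograph

* Eigenvalue stability condition of the fitted VAR
varstable

* Jarque-Bera test for normally distributed disturbances
varnorm, jbera
```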

Also see
[TS] varbasic Fit a simple VAR and graph IRFs or FEVDs
[U] 20 Estimation and postestimation commands

Title
vargranger Perform pairwise Granger causality tests after var or svar

Syntax
Remarks and examples
Also see

Menu
Stored results

Description
Methods and formulas

Options
References

Syntax

    vargranger [, estimates(estname) separator(#)]

vargranger can be used only after var or svar; see [TS] var and [TS] var svar.

Menu

    Statistics > Multivariate time series > VAR diagnostics and tests > Granger causality tests

Description
vargranger performs a set of Granger causality tests for each equation in a VAR, providing a
convenient alternative to test; see [R] test.

Options
estimates(estname) requests that vargranger use the previously obtained set of var or svar
estimates stored as estname. By default, vargranger uses the active results. See [R] estimates
for information on manipulating estimation results.
separator(#) specifies how often separator lines should be drawn between rows. By default, separator
lines appear every K lines, where K is the number of equations in the VAR under analysis. For
example, separator(1) would draw a line between each row, separator(2) between every
other row, and so on. separator(0) specifies that lines not appear in the table.
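A short sketch of how the two options might be used together follows. It assumes the lutkepohl2 dataset used in the examples of this manual; the stored-results name var2 and the three-lag comparison model are hypothetical choices.

```stata
use http://www.stata-press.com/data/r13/lutkepohl2

* Fit a VAR(2) and store its results under a name
var dln_inv dln_inc dln_consump if qtr<=tq(1978q4), dfk small
estimates store var2

* Fit a second model; its results are now the active ones
var dln_inv dln_inc dln_consump if qtr<=tq(1978q4), lags(1/3) dfk small

* Tests for the stored VAR(2), with a separator line after every row
vargranger, estimates(var2) separator(1)

* Tests for the active results (the three-lag model)
vargranger
```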

Remarks and examples


After fitting a VAR, we may want to know whether one variable Granger-causes another
(Granger 1969). A variable x is said to Granger-cause a variable y if, given the past values of y,
past values of x are useful for predicting y. A common method for testing Granger causality is to
regress y on its own lagged values and on lagged values of x and test the null hypothesis that the
estimated coefficients on the lagged values of x are jointly zero. Failure to reject the null hypothesis
is equivalent to failing to reject the hypothesis that x does not Granger-cause y.
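The regression described above can be written out for the two-variable case as follows. The lag length $p$ and the coefficient symbols $\alpha$, $\beta_i$, and $\gamma_i$ are notation assumed here for illustration; they do not appear elsewhere in this entry.

```latex
\[
y_t = \alpha + \sum_{i=1}^{p} \beta_i\, y_{t-i} + \sum_{i=1}^{p} \gamma_i\, x_{t-i} + \epsilon_t
\]
\[
H_0\colon\ \gamma_1 = \gamma_2 = \cdots = \gamma_p = 0
\qquad\text{($x$ does not Granger-cause $y$)}
\]
```

Under this notation, the Wald test of $H_0$ involves $p$ restrictions, one for each lag of $x$.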
For each equation and each endogenous variable that is not the dependent variable in that equation,
vargranger computes and reports Wald tests that the coefficients on all the lags of an endogenous
variable are jointly zero. For each equation in a VAR, vargranger tests the hypotheses that each of
the other endogenous variables does not Granger-cause the dependent variable in that equation.

Because it may be interesting to investigate these types of hypotheses by using the VAR that
underlies an SVAR, vargranger can also produce these tests by using the e() results from an svar.
When vargranger uses svar e() results, the hypotheses concern the underlying var estimates.
See [TS] var and [TS] var svar for information about fitting VARs and SVARs in Stata. See
Lütkepohl (2005), Hamilton (1994), and Amisano and Giannini (1997) for information about Granger
causality and on VARs and SVARs in general.

Example 1: After var


Here we refit the model with German data described in [TS] var and then perform Granger causality
tests with vargranger.
. use http://www.stata-press.com/data/r13/lutkepohl2
(Quarterly SA West German macro data, Bil DM, from Lutkepohl 1993 Table E.1)
. var dln_inv dln_inc dln_consump if qtr<=tq(1978q4), dfk small
(output omitted )
. vargranger
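   Granger causality Wald tests

      Equation        Excluded            F     df   df_r   Prob > F
    -----------------------------------------------------------------
      dln_inv         dln_inc        .04847      2     66     0.9527
      dln_inv         dln_consump    1.5004      2     66     0.2306
      dln_inv         ALL            1.5917      4     66     0.1869
    -----------------------------------------------------------------
      dln_inc         dln_inv        1.7683      2     66     0.1786
      dln_inc         dln_consump    1.7184      2     66     0.1873
      dln_inc         ALL            1.9466      4     66     0.1130
    -----------------------------------------------------------------
      dln_consump     dln_inv        .97147      2     66     0.3839
      dln_consump     dln_inc        6.1465      2     66     0.0036
      dln_consump     ALL            3.7746      4     66     0.0080
    -----------------------------------------------------------------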

Because the estimates() option was not specified, vargranger used the active e() results.
Consider the results of the three tests for the first equation. The first is a Wald test that the coefficients
on the two lags of dln_inc that appear in the equation for dln_inv are jointly zero. The null
hypothesis that dln_inc does not Granger-cause dln_inv cannot be rejected. Similarly, we cannot
reject the null hypothesis that the coefficients on the two lags of dln_consump in the equation for
dln_inv are jointly zero, so we cannot reject the hypothesis that dln_consump does not Granger-cause
dln_inv. The third test is with respect to the null hypothesis that the coefficients on the two
lags of all the other endogenous variables are jointly zero. Because this cannot be rejected, we cannot
reject the null hypothesis that dln_inc and dln_consump, jointly, do not Granger-cause dln_inv.
Because we failed to reject most of these null hypotheses, we might be interested in imposing
some constraints on the coefficients. See [TS] var for more on fitting VAR models with constraints
on the coefficients.

Example 2: Using test instead of vargranger


We could have used test to compute these Wald tests, but vargranger saves a great deal of
typing. Still, seeing how to use test to obtain the results reported by vargranger is useful.

. test [dln_inv]L.dln_inc [dln_inv]L2.dln_inc
 ( 1)  [dln_inv]L.dln_inc = 0
 ( 2)  [dln_inv]L2.dln_inc = 0
       F(  2,    66) =    0.05
            Prob > F =    0.9527
. test [dln_inv]L.dln_consump [dln_inv]L2.dln_consump, accumulate
 ( 1)  [dln_inv]L.dln_inc = 0
 ( 2)  [dln_inv]L2.dln_inc = 0
 ( 3)  [dln_inv]L.dln_consump = 0
 ( 4)  [dln_inv]L2.dln_consump = 0
       F(  4,    66) =    1.59
            Prob > F =    0.1869
. test [dln_inv]L.dln_inv [dln_inv]L2.dln_inv, accumulate
 ( 1)  [dln_inv]L.dln_inc = 0
 ( 2)  [dln_inv]L2.dln_inc = 0
 ( 3)  [dln_inv]L.dln_consump = 0
 ( 4)  [dln_inv]L2.dln_consump = 0
 ( 5)  [dln_inv]L.dln_inv = 0
 ( 6)  [dln_inv]L2.dln_inv = 0
       F(  6,    66) =    1.62
            Prob > F =    0.1547

The first two calls to test show how vargranger obtains its results. The first test reproduces
the first test reported for the dln_inv equation. The second test reproduces the ALL entry for the
first equation. The third test reproduces the standard F statistic for the dln_inv equation, reported
in the header of the var output in the previous example. The standard F statistic also includes the
lags of the dependent variable, as well as any exogenous variables in the equation. This illustrates
that the ALL test performed by vargranger, whose null hypothesis is that the coefficients on all the
lags of all the other endogenous variables are jointly zero for a particular equation, is
not the same as the standard F statistic for that equation.

Example 3: After svar


When vargranger is run on svar estimates, the null hypotheses are with respect to the underlying
var estimates. We run vargranger after using svar to fit an SVAR that has the same underlying
VAR as our model in example 1.


. matrix A = (.,0,0 \ .,.,0 \ .,.,.)
. matrix B = I(3)
. svar dln_inv dln_inc dln_consump if qtr<=tq(1978q4), dfk small aeq(A) beq(B)
(output omitted )
. vargranger
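   Granger causality Wald tests

      Equation        Excluded            F     df   df_r   Prob > F
    -----------------------------------------------------------------
      dln_inv         dln_inc        .04847      2     66     0.9527
      dln_inv         dln_consump    1.5004      2     66     0.2306
      dln_inv         ALL            1.5917      4     66     0.1869
    -----------------------------------------------------------------
      dln_inc         dln_inv        1.7683      2     66     0.1786
      dln_inc         dln_consump    1.7184      2     66     0.1873
      dln_inc         ALL            1.9466      4     66     0.1130
    -----------------------------------------------------------------
      dln_consump     dln_inv        .97147      2     66     0.3839
      dln_consump     dln_inc        6.1465      2     66     0.0036
      dln_consump     ALL            3.7746      4     66     0.0080
    -----------------------------------------------------------------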

As we expected, the vargranger results are identical to those in the first example.

Stored results
vargranger stores the following in r():
Matrices
    r(gstats)    $\chi^2$, df, and p-values (if e(small)=="")
    r(gstats)    F, df, df_r, and p-values (if e(small)!="")

Methods and formulas


vargranger uses test to obtain Wald statistics of the hypothesis that all coefficients on the
lags of variable x are jointly zero in the equation for variable y. vargranger uses the e() results
stored by var or svar to determine whether to calculate and report small-sample F statistics or
large-sample $\chi^2$ statistics.

Clive William John Granger (1934–2009) was born in Swansea, Wales, and earned degrees at the
University of Nottingham in mathematics and statistics. Joining the staff there, he also worked
at Princeton on the spectral analysis of economic time series, before moving in 1973 to the
University of California, San Diego. He was awarded the 2003 Nobel Prize in Economics for
methods of analyzing economic time series with common trends (cointegration). He was knighted
in 2005, thus becoming Sir Clive Granger.


References
Amisano, G., and C. Giannini. 1997. Topics in Structural VAR Econometrics. 2nd ed. Heidelberg: Springer.
Granger, C. W. J. 1969. Investigating causal relations by econometric models and cross-spectral methods. Econometrica
37: 424–438.
Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
Lütkepohl, H. 1993. Introduction to Multiple Time Series Analysis. 2nd ed. New York: Springer.
Lütkepohl, H. 2005. New Introduction to Multiple Time Series Analysis. New York: Springer.
Phillips, P. C. B. 1997. The ET Interview: Professor Clive Granger. Econometric Theory 13: 253–303.

Also see
[TS] var Vector autoregressive models
[TS] var svar Structural vector autoregressive models
[TS] varbasic Fit a simple VAR and graph IRFs or FEVDs
[TS] var intro Introduction to vector autoregressive models

Title
varlmar Perform LM test for residual autocorrelation after var or svar
Syntax
Remarks and examples
Also see

Menu
Stored results

Description
Methods and formulas

Options
References

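Syntax

    varlmar [, options]

options               Description
----------------------------------------------------------------------
mlag(#)               use # for the maximum order of autocorrelation; default is mlag(2)
estimates(estname)    use previously stored results estname; default is to use active results
separator(#)          draw separator line after every # rows
----------------------------------------------------------------------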

varlmar can be used only after var or svar; see [TS] var and [TS] var svar.
You must tsset your data before using varlmar; see [TS] tsset.

Menu

    Statistics > Multivariate time series > VAR diagnostics and tests > LM test for residual autocorrelation

Description
varlmar implements a Lagrange multiplier (LM) test for autocorrelation in the residuals of VAR
models, which was presented in Johansen (1995).

Options
mlag(#) specifies the maximum order of autocorrelation to be tested. The integer specified in mlag()
must be greater than 0; the default is 2.
estimates(estname) requests that varlmar use the previously obtained set of var or svar estimates
stored as estname. By default, varlmar uses the active results. See [R] estimates for information
on manipulating estimation results.
separator(#) specifies how often separator lines should be drawn between rows. By default,
separator lines do not appear. For example, separator(1) would draw a line between each row,
separator(2) between every other row, and so on.

Remarks and examples


Most postestimation analyses of VAR models and SVAR models assume that the disturbances are
not autocorrelated. varlmar implements the LM test for autocorrelation in the residuals of a VAR
model discussed in Johansen (1995, 21–22). The test is performed at lags j = 1, ..., mlag(). For
each j, the null hypothesis of the test is that there is no autocorrelation at lag j.

varlmar uses the estimation results stored by var or svar. By default, varlmar uses the active
estimation results. However, varlmar can use any previously stored var or svar estimation results
specified in the estimates() option.

Example 1: After var


Here we refit the model with German data described in [TS] var and then call varlmar.
. use http://www.stata-press.com/data/r13/lutkepohl2
(Quarterly SA West German macro data, Bil DM, from Lutkepohl 1993 Table E.1)
. var dln_inv dln_inc dln_consump if qtr<=tq(1978q4), dfk
(output omitted )
. varlmar, mlag(5)
   Lagrange-multiplier test

      lag        chi2    df   Prob > chi2
    --------------------------------------
        1      5.5871     9     0.78043
        2      6.3189     9     0.70763
        3      8.4022     9     0.49418
        4     11.8742     9     0.22049
        5      5.2914     9     0.80821
    --------------------------------------
   H0: no autocorrelation at lag order

Because we cannot reject the null hypothesis that there is no autocorrelation in the residuals for
any of the five orders tested, this test gives no hint of model misspecification. Although we fit the
VAR with the dfk option to be consistent with the example in [TS] var, varlmar always uses the ML
estimator of $\Sigma$. The results obtained from varlmar are the same whether or not dfk is specified.

Example 2: After svar


When varlmar is applied to estimation results produced by svar, the sequence of LM tests is
applied to the underlying VAR. See [TS] var svar for a description of how an SVAR model builds on
a VAR. In this example, we fit an SVAR that has an underlying VAR with two lags that is identical to
the one fit in the previous example.
. matrix A = (.,.,0\0,.,0\.,.,.)
. matrix B = I(3)
. svar dln_inv dln_inc dln_consump if qtr<=tq(1978q4), dfk aeq(A) beq(B)
(output omitted )
. varlmar, mlag(5)
   Lagrange-multiplier test

      lag        chi2    df   Prob > chi2
    --------------------------------------
        1      5.5871     9     0.78043
        2      6.3189     9     0.70763
        3      8.4022     9     0.49418
        4     11.8742     9     0.22049
        5      5.2914     9     0.80821
    --------------------------------------
   H0: no autocorrelation at lag order


Because the underlying VAR(2) is the same as the previous example (we assure you that this is
true), the output from varlmar is also the same.
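Because varlmar accepts the estimates() option, the test can also be run on stored results while a different model is active. The sketch below assumes the hypothetical name var_dfk for the stored VAR; everything else is taken from the two examples above.

```stata
* Sketch: run varlmar on stored var results while svar results are active
use http://www.stata-press.com/data/r13/lutkepohl2, clear

quietly var dln_inv dln_inc dln_consump if qtr<=tq(1978q4), dfk
* var_dfk is an illustrative name for the stored results
estimates store var_dfk

matrix A = (.,.,0\0,.,0\.,.,.)
matrix B = I(3)
quietly svar dln_inv dln_inc dln_consump if qtr<=tq(1978q4), dfk aeq(A) beq(B)

* the active results are from svar; test the stored VAR instead
varlmar, mlag(5) estimates(var_dfk)
matrix LM_stored = r(lm)
```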

Stored results
varlmar stores the following in r():
Matrices
    r(lm)          $\chi^2$, df, and p-values

Methods and formulas


The formula for the LM test statistic at lag s is

$$\mathrm{LM}_s = (T - d - 0.5)\,\ln\!\left(\frac{|\widehat{\Sigma}|}{|\widetilde{\Sigma}_s|}\right)$$

where T is the number of observations in the VAR; d is explained below; $\widehat{\Sigma}$ is the maximum likelihood
estimate of $\Sigma$, the variance-covariance matrix of the disturbances from the VAR; and $\widetilde{\Sigma}_s$ is the
maximum likelihood estimate of $\Sigma$ from the following augmented VAR.
If there are K equations in the VAR, we can define $e_t$ to be a $K \times 1$ vector of residuals. After we
create the K new variables e1, e2, ..., eK containing the residuals from the K equations, we can
augment the original VAR with lags of these K new variables. For each lag s, we form an augmented
regression in which the new residual variables are lagged s times. Per the method of Davidson and
MacKinnon (1993, 358), the missing values from these s lags are replaced with zeros. $\widetilde{\Sigma}_s$ is the
maximum likelihood estimate of $\Sigma$ from this augmented VAR, and d is the number of coefficients
estimated in the augmented VAR. See [TS] var for a discussion of the maximum likelihood estimate
of $\Sigma$ in a VAR.
The asymptotic distribution of $\mathrm{LM}_s$ is $\chi^2$ with $K^2$ degrees of freedom.
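As an illustration of the asymptotic distribution, the p-value for one lag can be recomputed from the reported statistic. This is a sketch that copies the lag-1 statistic from the example output and assumes K = 3 equations, as in that example; the scalar names are illustrative.

```stata
* Sketch: p-value of the lag-1 LM statistic from a chi-squared with K^2 df
scalar K   = 3
scalar lm1 = 5.5871
scalar p1  = chi2tail(K^2, lm1)
display %7.5f p1
```

The result should agree with the Prob > chi2 column of the example up to rounding of the reported statistic.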

References
Davidson, R., and J. G. MacKinnon. 1993. Estimation and Inference in Econometrics. New York: Oxford University
Press.
Johansen, S. 1995. Likelihood-Based Inference in Cointegrated Vector Autoregressive Models. Oxford: Oxford University
Press.

Also see
[TS] var Vector autoregressive models
[TS] var svar Structural vector autoregressive models
[TS] varbasic Fit a simple VAR and graph IRFs or FEVDs
[TS] var intro Introduction to vector autoregressive models

Title
varnorm Test for normally distributed disturbances after var or svar
Syntax                   Menu                Description             Options
Remarks and examples     Stored results      Methods and formulas    References
Also see

Syntax
    varnorm [, options]

    options               Description
    jbera                 report Jarque-Bera statistic; default is to report all three statistics
    skewness              report skewness statistic; default is to report all three statistics
    kurtosis              report kurtosis statistic; default is to report all three statistics
    estimates(estname)    use previously stored results estname; default is to use active results
    cholesky              use Cholesky decomposition
    separator(#)          draw separator line after every # rows

varnorm can be used only after var or svar; see [TS] var and [TS] var svar.
You must tsset your data before using varnorm; see [TS] tsset.

Menu
Statistics > Multivariate time series > VAR diagnostics and tests > Test for normally distributed disturbances

Description
varnorm computes and reports a series of statistics against the null hypothesis that the disturbances
in a VAR are normally distributed. For each equation, and for all equations jointly, up to three statistics
may be computed: a skewness statistic, a kurtosis statistic, and the Jarque-Bera statistic. By default,
all three statistics are reported.

Options
jbera requests that the Jarque-Bera statistic and any other explicitly requested statistic be reported.
By default, the Jarque-Bera, skewness, and kurtosis statistics are reported.
skewness requests that the skewness statistic and any other explicitly requested statistic be reported.
By default, the Jarque-Bera, skewness, and kurtosis statistics are reported.
kurtosis requests that the kurtosis statistic and any other explicitly requested statistic be reported.
By default, the Jarque-Bera, skewness, and kurtosis statistics are reported.
estimates(estname) specifies that varnorm use the previously obtained set of var or svar estimates
stored as estname. By default, varnorm uses the active results. See [R] estimates for information
on manipulating estimation results.

cholesky specifies that varnorm use the Cholesky decomposition of the estimated variance-covariance
matrix of the disturbances, $\widehat{\Sigma}$, to orthogonalize the residuals when varnorm is applied to svar
results. By default, when varnorm is applied to svar results, it uses the estimated structural
decomposition $\widehat{A}^{-1}\widehat{B}$ or $\widehat{C}$ to orthogonalize the residuals. When applied to var e() results,
varnorm always uses the Cholesky decomposition of $\widehat{\Sigma}$. For this reason, the cholesky option
may not be specified when using var results.
separator(#) specifies how often separator lines should be drawn between rows. By default,
separator lines do not appear. For example, separator(1) would draw a line between each row,
separator(2) between every other row, and so on.

Remarks and examples


Some of the postestimation statistics for VAR and SVAR assume that the K disturbances have a
K -dimensional multivariate normal distribution. varnorm uses the estimation results produced by
var or svar to produce a series of statistics against the null hypothesis that the K disturbances in
the VAR are normally distributed.
Per the notation in Lütkepohl (2005), call the skewness statistic $\widehat{\lambda}_1$, the kurtosis statistic $\widehat{\lambda}_2$, and
the Jarque-Bera statistic $\widehat{\lambda}_3$. The Jarque-Bera statistic is a combination of the other two statistics.
The single-equation results are from tests against the null hypothesis that the disturbance for that
particular equation is normally distributed. The results for all the equations are from tests against
the null hypothesis that the K disturbances follow a K-dimensional multivariate normal distribution.
Failure to reject the null hypothesis means that the tests provide no evidence of model misspecification.


Example 1: After var


We refit the model with German data described in [TS] var and then call varnorm.
. use http://www.stata-press.com/data/r13/lutkepohl2
(Quarterly SA West German macro data, Bil DM, from Lutkepohl 1993 Table E.1)
. var dln_inv dln_inc dln_consump if qtr<=tq(1978q4), dfk
(output omitted )
. varnorm
   Jarque-Bera test

      Equation                     chi2    df    Prob > chi2
      dln_inv                     2.821     2      0.24397
      dln_inc                     3.450     2      0.17817
      dln_consump                 1.566     2      0.45702
      ALL                         7.838     6      0.25025

   Skewness test

      Equation       Skewness      chi2    df    Prob > chi2
      dln_inv          .11935     0.173     1      0.67718
      dln_inc         -.38316     1.786     1      0.18139
      dln_consump     -.31275     1.190     1      0.27532
      ALL                         3.150     3      0.36913

   Kurtosis test

      Equation       Kurtosis      chi2    df    Prob > chi2
      dln_inv          3.9331     2.648     1      0.10367
      dln_inc          3.7396     1.664     1      0.19710
      dln_consump      2.6484     0.376     1      0.53973
      ALL                         4.688     3      0.19613

   dfk estimator used in computations

In this example, neither the single-equation Jarque-Bera statistics nor the joint Jarque-Bera statistic
comes close to rejecting the null hypothesis.
The skewness and kurtosis results have similar structures.
The Jarque-Bera results use the sum of the skewness and kurtosis statistics. The skewness and
kurtosis results are based on the skewness and kurtosis coefficients, respectively. See Methods and
formulas.
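The additive relationship can be verified from the reported output. The sketch below copies the chi2 values from the skewness and kurtosis tables of this example; the scalar names are illustrative.

```stata
* Sketch: Jarque-Bera chi2 = skewness chi2 + kurtosis chi2 (values from the tables)
scalar jb_inv     = 0.173 + 2.648
scalar jb_inc     = 1.786 + 1.664
scalar jb_consump = 1.190 + 0.376
scalar jb_all     = 3.150 + 4.688
display %6.3f jb_inv
display %6.3f jb_inc
display %6.3f jb_consump
display %6.3f jb_all

* joint p-value: chi-squared with 2K = 6 degrees of freedom
scalar p_all = chi2tail(6, jb_all)
display %7.5f p_all
```

The four sums should match the chi2 column of the Jarque-Bera table, and the joint p-value should agree with the reported one up to rounding.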

Example 2: After svar


The test statistics are computed on the orthogonalized VAR residuals; see Methods and formulas.
When varnorm is applied to var results, varnorm uses a Cholesky decomposition of the estimated
variance-covariance matrix of the disturbances, $\widehat{\Sigma}$, to orthogonalize the residuals.
By default, when varnorm is applied to svar estimation results, it uses the estimated structural
decomposition $\widehat{A}^{-1}\widehat{B}$ or $\widehat{C}$ to orthogonalize the residuals of the underlying VAR. Alternatively, when
varnorm is applied to svar results and the cholesky option is specified, varnorm uses the Cholesky
decomposition of $\widehat{\Sigma}$ to orthogonalize the residuals of the underlying VAR.


We fit an SVAR that is based on an underlying VAR with two lags that is the same as the one
fit in the previous example. We impose a structural decomposition that is the same as the Cholesky
decomposition, as illustrated in [TS] var svar.
. matrix a = (.,0,0\.,.,0\.,.,.)
. matrix b = I(3)
. svar dln_inv dln_inc dln_consump if qtr<=tq(1978q4), dfk aeq(a) beq(b)
(output omitted )
. varnorm
   Jarque-Bera test

      Equation                     chi2    df    Prob > chi2
      dln_inv                     2.821     2      0.24397
      dln_inc                     3.450     2      0.17817
      dln_consump                 1.566     2      0.45702
      ALL                         7.838     6      0.25025

   Skewness test

      Equation       Skewness      chi2    df    Prob > chi2
      dln_inv          .11935     0.173     1      0.67718
      dln_inc         -.38316     1.786     1      0.18139
      dln_consump     -.31275     1.190     1      0.27532
      ALL                         3.150     3      0.36913

   Kurtosis test

      Equation       Kurtosis      chi2    df    Prob > chi2
      dln_inv          3.9331     2.648     1      0.10367
      dln_inc          3.7396     1.664     1      0.19710
      dln_consump      2.6484     0.376     1      0.53973
      ALL                         4.688     3      0.19613

   dfk estimator used in computations

Because the estimated structural decomposition is the same as the Cholesky decomposition, the
varnorm results are the same as those from the previous example.
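With this particular SVAR, the cholesky option should make no practical difference, because the structural decomposition coincides with the Cholesky decomposition. The sketch below compares the two through the stored r(jb) matrix; the names JB_struct and JB_chol are illustrative, and small differences caused by the iterative svar estimation are to be expected.

```stata
* Sketch: varnorm after svar, default decomposition versus cholesky
use http://www.stata-press.com/data/r13/lutkepohl2, clear
matrix a = (.,0,0\.,.,0\.,.,.)
matrix b = I(3)
quietly svar dln_inv dln_inc dln_consump if qtr<=tq(1978q4), dfk aeq(a) beq(b)

quietly varnorm, jbera
matrix JB_struct = r(jb)

quietly varnorm, jbera cholesky
matrix JB_chol = r(jb)

* largest relative difference between the two sets of results
display mreldif(JB_struct, JB_chol)
```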

Technical note
The statistics computed by varnorm depend on $\widehat{\Sigma}$, the estimated variance-covariance matrix of
the disturbances. var uses the maximum likelihood estimator of this matrix by default, but the dfk
option produces an estimator that uses a small-sample correction. Thus specifying dfk in the call to
var or svar will affect the test results produced by varnorm.


Stored results
varnorm stores the following in r():
Macros
    r(dfk)         dfk, if specified

Matrices
    r(kurtosis)    kurtosis test, df, and p-values
    r(skewness)    skewness test, df, and p-values
    r(jb)          Jarque-Bera test, df, and p-values

Methods and formulas


varnorm is based on the derivations found in Lütkepohl (2005, 174-181). Let $\widehat{u}_t$ be the $K \times 1$
vector of residuals from the K equations in a previously fitted VAR or the residuals from the K
equations of the VAR underlying a previously fitted SVAR. Similarly, let $\widehat{\Sigma}$ be the estimated covariance
matrix of the disturbances. (Note that $\widehat{\Sigma}$ depends on whether the dfk option was specified.) The
skewness, kurtosis, and Jarque-Bera statistics must be computed using the orthogonalized residuals.
Because

$$\widehat{\Sigma} = \widehat{P}\widehat{P}'$$

implies that

$$\widehat{P}^{-1}\,\widehat{\Sigma}\,\widehat{P}^{-1\prime} = I_K$$

premultiplying $\widehat{u}_t$ by $\widehat{P}^{-1}$ is one way of performing the orthogonalization. When varnorm is applied
to var results, $\widehat{P}$ is defined to be the Cholesky decomposition of $\widehat{\Sigma}$. When varnorm is applied to
svar results, $\widehat{P}$ is set, by default, to the estimated structural decomposition; that is, $\widehat{P} = \widehat{A}^{-1}\widehat{B}$,
where $\widehat{A}$ and $\widehat{B}$ are the svar estimates of the A and B matrices, or $\widehat{C}$, where $\widehat{C}$ is the long-run
SVAR estimation of C. (See [TS] var svar for more on the origin and estimation of the A and B
matrices.) When varnorm is applied to svar results and the cholesky option is specified, $\widehat{P}$ is set
to the Cholesky decomposition of $\widehat{\Sigma}$.
Define $\widehat{w}_t$ to be the orthogonalized VAR residuals given by

$$\widehat{w}_t = (\widehat{w}_{1t}, \dots, \widehat{w}_{Kt})' = \widehat{P}^{-1}\widehat{u}_t$$

The $K \times 1$ vectors of skewness and kurtosis coefficients are then computed using the orthogonalized
residuals by

$$\widehat{b}_1 = (\widehat{b}_{11}, \dots, \widehat{b}_{K1})'; \qquad \widehat{b}_{k1} = \frac{1}{T}\sum_{t=1}^{T} \widehat{w}_{kt}^{\,3}$$

$$\widehat{b}_2 = (\widehat{b}_{12}, \dots, \widehat{b}_{K2})'; \qquad \widehat{b}_{k2} = \frac{1}{T}\sum_{t=1}^{T} \widehat{w}_{kt}^{\,4}$$

Under the null hypothesis of multivariate Gaussian disturbances,

$$\widehat{\lambda}_1 = \frac{T\,\widehat{b}_1'\,\widehat{b}_1}{6} \;\xrightarrow{d}\; \chi^2(K)$$

$$\widehat{\lambda}_2 = \frac{T\,(\widehat{b}_2 - 3)'(\widehat{b}_2 - 3)}{24} \;\xrightarrow{d}\; \chi^2(K)$$

and

$$\widehat{\lambda}_3 = \widehat{\lambda}_1 + \widehat{\lambda}_2 \;\xrightarrow{d}\; \chi^2(2K)$$

$\widehat{\lambda}_1$ is the skewness statistic, $\widehat{\lambda}_2$ is the kurtosis statistic, and $\widehat{\lambda}_3$ is the Jarque-Bera statistic.

$\widehat{\lambda}_1$, $\widehat{\lambda}_2$, and $\widehat{\lambda}_3$ are for tests of the null hypothesis that the $K \times 1$ vector of disturbances follows
a multivariate normal distribution. The corresponding statistics against the null hypothesis that the
disturbances from the kth equation come from a univariate normal distribution are

$$\widehat{\lambda}_{1k} = \frac{T\,\widehat{b}_{k1}^{\,2}}{6} \;\xrightarrow{d}\; \chi^2(1)$$

$$\widehat{\lambda}_{2k} = \frac{T\,(\widehat{b}_{k2} - 3)^2}{24} \;\xrightarrow{d}\; \chi^2(1)$$

and

$$\widehat{\lambda}_{3k} = \widehat{\lambda}_{1k} + \widehat{\lambda}_{2k} \;\xrightarrow{d}\; \chi^2(2)$$
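The single-equation formulas can be checked against the dln_inv row of Example 1. This is a sketch under one assumption: T = 73, inferred from a two-lag VAR fit to the 1960q4-1978q4 sample rather than stated in the varnorm output. The skewness and kurtosis coefficients are copied from that output, so the results agree with the table only up to rounding.

```stata
* Sketch: single-equation skewness, kurtosis, and Jarque-Bera statistics
* T = 73 is an assumption inferred from the estimation sample
scalar T   = 73
scalar b11 = .11935
scalar b12 = 3.9331

scalar lam1 = T*b11^2/6
scalar lam2 = T*(b12 - 3)^2/24
scalar lam3 = lam1 + lam2

display %6.3f lam1
display %6.3f lam2
display %6.3f lam3

* p-value of the Jarque-Bera statistic: chi-squared with 2 df
scalar p3 = chi2tail(2, lam3)
display %7.5f p3
```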

References
Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
Jarque, C. M., and A. K. Bera. 1987. A test for normality of observations and regression residuals. International
Statistical Review 55: 163-172.
Lütkepohl, H. 1993. Introduction to Multiple Time Series Analysis. 2nd ed. New York: Springer.
Lütkepohl, H. 2005. New Introduction to Multiple Time Series Analysis. New York: Springer.

Also see
[TS] var Vector autoregressive models
[TS] var svar Structural vector autoregressive models
[TS] varbasic Fit a simple VAR and graph IRFs or FEVDs
[TS] var intro Introduction to vector autoregressive models

Title
varsoc Obtain lag-order selection statistics for VARs and VECMs
Syntax                   Menu                    Description       Preestimation options
Postestimation option    Remarks and examples    Stored results    Methods and formulas
References               Also see

Syntax
Preestimation syntax
    varsoc depvarlist [if] [in] [, preestimation_options]

Postestimation syntax
    varsoc [, estimates(estname)]

    preestimation_options       Description
    Main
      maxlag(#)                 set maximum lag order to #; default is maxlag(4)
      exog(varlist)             use varlist as exogenous variables
      constraints(constraints)  apply constraints to exogenous variables
      noconstant                suppress constant term
      lutstats                  use Lütkepohl's version of information criteria
      level(#)                  set confidence level; default is level(95)
      separator(#)              draw separator line after every # rows

You must tsset your data before using varsoc; see [TS] tsset.
by is allowed with the preestimation version of varsoc; see [U] 11.1.10 Prefix commands.

Menu
Preestimation for VARs
    Statistics > Multivariate time series > VAR diagnostics and tests > Lag-order selection statistics (preestimation)

Postestimation for VARs
    Statistics > Multivariate time series > VAR diagnostics and tests > Lag-order selection statistics (postestimation)

Preestimation for VECMs
    Statistics > Multivariate time series > VEC diagnostics and tests > Lag-order selection statistics (preestimation)

Postestimation for VECMs
    Statistics > Multivariate time series > VEC diagnostics and tests > Lag-order selection statistics (postestimation)


Description
varsoc reports the final prediction error (FPE), Akaike's information criterion (AIC), Schwarz's
Bayesian information criterion (SBIC), and the Hannan and Quinn information criterion (HQIC)
lag-order selection statistics for a series of vector autoregressions of order 1, ..., maxlag(). A sequence
of likelihood-ratio test statistics for all the full VARs of order less than or equal to the highest lag
order is also reported. In the postestimation version, the maximum lag and estimation options are
based on the model just fit or the model specified in estimates(estname).
The preestimation version of varsoc can also be used to select the lag order for a vector
error-correction model (VECM). As shown by Nielsen (2001), the lag-order selection statistics discussed
here can be used in the presence of I(1) variables.

Preestimation options


Main

maxlag(#) specifies the maximum lag order for which the statistics are to be obtained.
exog(varlist) specifies exogenous variables to include in the VARs fit by varsoc.
constraints(constraints) specifies a list of constraints on the exogenous variables to be applied.
Do not specify constraints on the lags of the endogenous variables because specifying one would
mean that at least one of the VAR models considered by varsoc will not contain the lag specified
in the constraint. Use var directly to obtain selection-order criteria with constraints on lags of the
endogenous variables.
noconstant suppresses the constant terms from the model. By default, constant terms are included.
lutstats specifies that the Lütkepohl (2005) versions of the information criteria be reported. See
Methods and formulas for a discussion of these statistics.
level(#) specifies the confidence level, as a percentage, that is used to identify the first
likelihood-ratio test that rejects the null hypothesis that the additional parameters from adding a lag are jointly
zero. The default is level(95) or as set by set level; see [U] 20.7 Specifying the width of
confidence intervals.
separator(#) specifies how often separator lines should be drawn between rows. By default,
separator lines do not appear. For example, separator(1) would draw a line between each row,
separator(2) between every other row, and so on.

Postestimation option
estimates(estname) specifies the name of a previously stored set of var or svar estimates.
When no depvarlist is specified, varsoc uses the postestimation syntax and uses the currently
active estimation results or the results specified in estimates(estname). See [R] estimates for
information on manipulating estimation results.

Remarks and examples


Many selection-order statistics have been developed to assist researchers in fitting a VAR of the
correct order. Several of these selection-order statistics appear in the [TS] var output. The varsoc
command computes these statistics over a range of lags p while maintaining a common sample and
option specification.

varsoc can be used as a preestimation or a postestimation command. When it is used as a


preestimation command, a depvarlist is required, and the default maximum lag is 4. When it is used
as a postestimation command, varsoc uses the model specification stored in estname or the previously
fitted model.
varsoc computes four information criteria as well as a sequence of likelihood ratio (LR) tests.
The information criteria include the FPE, AIC, the HQIC, and SBIC.
For a given lag p, the LR test compares a VAR with p lags with one with $p - 1$ lags. The null
hypothesis is that all the coefficients on the pth lags of the endogenous variables are zero. To use this
sequence of LR tests to select a lag order, we start by looking at the results of the test for the model
with the most lags, which is at the bottom of the table. Proceeding up the table, the first test that
rejects the null hypothesis is the lag order selected by this process. See Lütkepohl (2005, 143-144)
for more information on this procedure. An * appears next to the LR statistic indicating the optimal
lag.
For the remaining statistics, the lag with the smallest value is the order selected by that criterion.
An * indicates the optimal lag. Strictly speaking, the FPE is not an information criterion, though
we include it in this discussion because, as with an information criterion, we select the lag length
corresponding to the lowest value; and, naturally, we want to minimize the prediction error. The AIC
measures the discrepancy between the given model and the true model, which, of course, we want
to minimize. Amemiya (1985) provides an intuitive discussion of the arguments in Akaike (1973).
The SBIC and the HQIC can be interpreted similarly to the AIC, though the SBIC and the HQIC have a
theoretical advantage over the AIC and the FPE. As Lütkepohl (2005, 148-152) demonstrates, choosing
p to minimize the SBIC or the HQIC provides consistent estimates of the true lag order, p. In contrast,
minimizing the AIC or the FPE will overestimate the true lag order with positive probability, even
with an infinite sample size.

Example 1: Preestimation
Here we use varsoc as a preestimation command.
. use http://www.stata-press.com/data/r13/lutkepohl2
(Quarterly SA West German macro data, Bil DM, from Lutkepohl 1993 Table E.1)
. varsoc dln_inv dln_inc dln_consump if qtr<=tq(1978q4), lutstats
   Selection-order criteria (lutstats)
   Sample:  1961q2 - 1978q4                          Number of obs = 71

    lag       LL        LR      df      p        FPE         AIC         HQIC        SBIC
      0    564.784                             2.7e-11    -24.423     -24.423*    -24.423*
      1    576.409    23.249     9    0.006    2.5e-11    -24.497     -24.3829    -24.2102
      2    588.859    24.901*    9    0.003    2.3e-11*   -24.5942*   -24.3661    -24.0205
      3    591.237    4.7566     9    0.855    2.7e-11    -24.4076    -24.0655    -23.5472
      4    598.457    14.438     9    0.108    2.9e-11    -24.3575    -23.9012    -23.2102

   Endogenous:  dln_inv dln_inc dln_consump
    Exogenous:  _cons

The sample used begins in 1961q2 because all the VARs are fit to the sample defined by any if or
in conditions and the available data for the maximum lag specified. The default maximum number
of lags is four. Because we specified the lutstats option, the table contains the Lütkepohl (2005)
versions of the information criteria, which differ from the standard definitions in that they drop the
constant term from the log likelihood. In this example, the likelihood-ratio tests selected a model
with two lags. AIC and FPE have also both chosen a model with two lags, whereas SBIC and HQIC
have both selected a model with zero lags.


Example 2: Postestimation
varsoc works as a postestimation command when no dependent variables are specified.
. var dln_inc dln_consump if qtr<=tq(1978q4), lutstats exog(l.dln_inv)
(output omitted )
. varsoc
   Selection-order criteria (lutstats)
   Sample:  1960q4 - 1978q4                          Number of obs = 73

    lag       LL        LR      df      p        FPE         AIC         HQIC        SBIC
      0    460.646                             1.3e-08    -18.2962    -18.2962    -18.2962*
      1    467.606    13.919     4    0.008    1.2e-08    -18.3773    -18.3273    -18.2518
      2    477.087    18.962*    4    0.001    1.0e-08*   -18.5275*   -18.4274*   -18.2764

   Endogenous:  dln_inc dln_consump
    Exogenous:  L.dln_inv _cons

Because we included one lag of dln_inv in our original model, varsoc did likewise with each
model it fit.

Based on the work of Tsay (1984), Paulsen (1984), and Nielsen (2001), these lag-order selection
criteria can be used to determine the lag length of the VAR underlying a VECM. See [TS] vec intro
for an example in which we use varsoc to choose the lag order for a VECM.

Stored results
varsoc stores the following in r():
Scalars
    r(N)             number of observations
    r(tmax)          last time period in sample
    r(tmin)          first time period in sample
    r(mlag)          maximum lag order
    r(N_gaps)        the number of gaps in the sample
Macros
    r(endog)         names of endogenous variables
    r(lutstats)      lutstats, if specified
    r(cns#)          the #th constraint
    r(exog)          names of exogenous variables
    r(rmlutstats)    rmlutstats, if specified
Matrices
    r(stats)         LL, LR, FPE, AIC, HQIC, SBIC, and p-values

Methods and formulas


As shown by Hamilton (1994, 295-296), the log likelihood for a VAR(p) is

$$\mathrm{LL} = \frac{T}{2}\left\{\ln\!\left(|\widehat{\Sigma}^{-1}|\right) - K\ln(2\pi) - K\right\}$$

where T is the number of observations, K is the number of equations, and $\widehat{\Sigma}$ is the maximum
likelihood estimate of $E[u_t u_t']$, where $u_t$ is the $K \times 1$ vector of disturbances. Because

$$\ln\!\left(|\widehat{\Sigma}^{-1}|\right) = -\ln\!\left(|\widehat{\Sigma}|\right)$$

the log likelihood can be rewritten as

$$\mathrm{LL} = -\frac{T}{2}\left\{\ln\!\left(|\widehat{\Sigma}|\right) + K\ln(2\pi) + K\right\}$$

Letting LL(j) be the value of the log likelihood with j lags yields the LR statistic for lag order j as

$$\mathrm{LR}(j) = 2\left\{\mathrm{LL}(j) - \mathrm{LL}(j-1)\right\}$$
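The LR formula can be applied to the log likelihoods reported in Example 1. The sketch below copies the LL values for lags 0 to 2 from that table (so results match the LR column only up to rounding) and assumes K = 3, which gives $K^2 = 9$ restrictions per lag; the scalar names are illustrative.

```stata
* Sketch: LR(j) = 2{LL(j) - LL(j-1)} using LL values from Example 1
scalar LL0 = 564.784
scalar LL1 = 576.409
scalar LL2 = 588.859

scalar LR1 = 2*(LL1 - LL0)
scalar LR2 = 2*(LL2 - LL1)
display %7.3f LR1
display %7.3f LR2

* each additional lag adds K^2 = 9 coefficients
scalar pLR1 = chi2tail(9, LR1)
scalar pLR2 = chi2tail(9, LR2)
display %5.3f pLR1
display %5.3f pLR2
```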

Model-order statistics
The formula for the FPE given in Lütkepohl (2005, 147) is

$$\mathrm{FPE} = |\Sigma_u|\left(\frac{T + Kp + 1}{T - Kp - 1}\right)^{K}$$

This formula, however, assumes that there is a constant in the model and that none of the variables
are dropped because of collinearity. To deal with these problems, the FPE is implemented as

$$\mathrm{FPE} = |\Sigma_u|\left(\frac{T + \overline{m}}{T - \overline{m}}\right)^{K}$$

where $\overline{m}$ is the average number of parameters over the K equations. This implementation accounts
for variables dropped because of collinearity.
By default, the AIC, SBIC, and HQIC are computed according to their standard definitions, which
include the constant term from the log likelihood. That is,

$$\mathrm{AIC} = -2\left(\frac{\mathrm{LL}}{T}\right) + \frac{2t_p}{T}$$

$$\mathrm{SBIC} = -2\left(\frac{\mathrm{LL}}{T}\right) + \frac{\ln(T)}{T}\,t_p$$

$$\mathrm{HQIC} = -2\left(\frac{\mathrm{LL}}{T}\right) + \frac{2\ln\{\ln(T)\}}{T}\,t_p$$

where $t_p$ is the total number of parameters in the model and LL is the log likelihood.


Lutstats
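Lütkepohl (2005) advocates dropping the constant term from the log likelihood because it does
not affect inference. The Lütkepohl versions of the information criteria are

$$\mathrm{AIC} = \ln\!\left(|\Sigma_u|\right) + \frac{2pK^2}{T}$$

$$\mathrm{SBIC} = \ln\!\left(|\Sigma_u|\right) + \frac{\ln(T)}{T}\,pK^2$$

$$\mathrm{HQIC} = \ln\!\left(|\Sigma_u|\right) + \frac{2\ln\{\ln(T)\}}{T}\,pK^2$$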
References
Akaike, H. 1973. Information theory and an extension of the maximum likelihood principle. In Second International
Symposium on Information Theory, ed. B. N. Petrov and F. Csaki, 267-281. Budapest: Akadémiai Kiadó.
Amemiya, T. 1985. Advanced Econometrics. Cambridge, MA: Harvard University Press.
Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
Lütkepohl, H. 1993. Introduction to Multiple Time Series Analysis. 2nd ed. New York: Springer.
Lütkepohl, H. 2005. New Introduction to Multiple Time Series Analysis. New York: Springer.
Nielsen, B. 2001. Order determination in general vector autoregressions. Working paper, Department of Economics,
University of Oxford and Nuffield College. http://ideas.repec.org/p/nuf/econwp/0110.html.
Paulsen, J. 1984. Order determination of multivariate autoregressive time series with unit roots. Journal of Time Series
Analysis 5: 115-127.
Tsay, R. S. 1984. Order selection in nonstationary autoregressive models. Annals of Statistics 12: 1425-1433.

Also see
[TS] var Vector autoregressive models
[TS] var svar Structural vector autoregressive models
[TS] varbasic Fit a simple VAR and graph IRFs or FEVDs
[TS] vec Vector error-correction models
[TS] var intro Introduction to vector autoregressive models
[TS] vec intro Introduction to vector error-correction models

Title
varstable Check the stability condition of VAR or SVAR estimates
Syntax                   Menu                Description             Options
Remarks and examples     Stored results      Methods and formulas    References
Also see

Syntax
    varstable [, options]

    options                   Description
    Main
      estimates(estname)      use previously stored results estname; default is to use active results
      amat(matrix_name)       save the companion matrix as matrix_name
      graph                   graph eigenvalues of the companion matrix
      dlabel                  label eigenvalues with the distance from the unit circle
      modlabel                label eigenvalues with the modulus
      marker_options          change look of markers (color, size, etc.)
      rlopts(cline_options)   affect rendition of reference unit circle
      nogrid                  suppress polar grid circles
      pgrid(...)              specify radii and appearance of polar grid circles; see Options for details
    Add plots
      addplot(plot)           add other plots to the generated graph
    Y axis, X axis, Titles, Legend, Overall
      twoway_options          any options other than by() documented in [G-3] twoway_options

varstable can be used only after var or svar; see [TS] var and [TS] var svar.

Menu
Statistics > Multivariate time series > VAR diagnostics and tests > Check stability condition of VAR estimates

Description
varstable checks the eigenvalue stability condition after estimating the parameters of a vector
autoregression using var or svar.

Options


Main

estimates(estname) requests that varstable use the previously obtained set of var estimates
stored as estname. By default, varstable uses the active estimation results. See [R] estimates
for information on manipulating estimation results.

amat(matrix_name) specifies a valid Stata matrix name by which the companion matrix A can be
saved (see Methods and formulas for the definition of the matrix A). The default is not to save
the A matrix.
graph causes varstable to draw a graph of the eigenvalues of the companion matrix.
dlabel labels each eigenvalue with its distance from the unit circle. dlabel cannot be specified
with modlabel.
modlabel labels the eigenvalues with their moduli. modlabel cannot be specified with dlabel.
marker_options specify the look of markers. This look includes the marker symbol, the marker size,
and its color and outline; see [G-3] marker_options.
rlopts(cline_options) affect the rendition of the reference unit circle; see [G-3] cline_options.
nogrid suppresses the polar grid circles.
pgrid([numlist] [, line_options]) determines the radii and appearance of the polar grid circles.
By default, the graph includes nine polar grid circles with radii 0.1, 0.2, ..., 0.9 that have the grid
line style. The numlist specifies the radii for the polar grid circles. The line_options determine the
appearance of the polar grid circles; see [G-3] line_options. Because the pgrid() option can be
repeated, circles with different radii can have distinct appearances.

Add plots

addplot(plot) adds specified plots to the generated graph. See [G-3] addplot_option.

Y axis, X axis, Titles, Legend, Overall

twoway_options are any of the options documented in [G-3] twoway_options, except by(). These
include options for titling the graph (see [G-3] title_options) and for saving the graph to disk (see
[G-3] saving_option).

Remarks and examples


Inference after var and svar requires that variables be covariance stationary. The variables in $y_t$
are covariance stationary if their first two moments exist and are independent of time. More explicitly,
a variable $y_t$ is covariance stationary if
1. $E[y_t]$ is finite and independent of t.
2. $\mathrm{Var}[y_t]$ is finite and independent of t.
3. $\mathrm{Cov}[y_t, y_s]$ is a finite function of $|t - s|$ but not of t or s alone.
Interpretation of VAR models, however, requires that an even stricter stability condition be met. If a
VAR is stable, it is invertible and has an infinite-order vector moving-average representation. If the
VAR is stable, impulse-response functions and forecast-error variance decompositions have known
interpretations.
Lütkepohl (2005) and Hamilton (1994) both show that if the modulus of each eigenvalue of the
matrix A is strictly less than one, the estimated VAR is stable (see Methods and formulas for the
definition of the matrix A).


Example 1
After fitting a VAR with var, we can use varstable to check the stability condition. Using the
same VAR model that was used in [TS] var, we demonstrate the use of varstable.
. use http://www.stata-press.com/data/r13/lutkepohl2
(Quarterly SA West German macro data, Bil DM, from Lutkepohl 1993 Table E.1)
. var dln_inv dln_inc dln_consump if qtr>=tq(1961q2) & qtr<=tq(1978q4)
(output omitted )
. varstable, graph
   Eigenvalue stability condition

            Eigenvalue                  Modulus
       .5456253                         .545625
      -.3785754 +  .3853982i            .540232
      -.3785754 -  .3853982i            .540232
      -.0643276 +  .4595944i            .464074
      -.0643276 -  .4595944i            .464074
      -.3698058                         .369806

   All the eigenvalues lie inside the unit circle.
   VAR satisfies stability condition.

Because the modulus of each eigenvalue is strictly less than 1, the estimates satisfy the eigenvalue
stability condition.
Specifying the graph option produced a graph of the eigenvalues with the real components on
the x axis and the imaginary components on the y axis. The graph below indicates visually that these
eigenvalues are well inside the unit circle.

[Graph: Roots of the companion matrix. x axis: Real; y axis: Imaginary.]
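The stability check can also be carried out programmatically with the amat() option and the stored r(Modulus) matrix. In the sketch below, the matrix name Acomp and the scalar name maxmod are illustrative choices; the model is the one fit in this example.

```stata
* Sketch: save the companion matrix and find the largest eigenvalue modulus
use http://www.stata-press.com/data/r13/lutkepohl2, clear
quietly var dln_inv dln_inc dln_consump if qtr>=tq(1961q2) & qtr<=tq(1978q4)

* Acomp is an illustrative name for the saved companion matrix
varstable, amat(Acomp)
matrix M = r(Modulus)
matrix list Acomp

* largest modulus; the VAR is stable if this is strictly less than 1
mata: st_numscalar("maxmod", max(st_matrix("M")))
display maxmod
```

With three equations and two lags, the companion matrix is 6 x 6, and the largest modulus should correspond to the first row of the table above.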

Example 2
This example illustrates two other features of the varstable command. First, varstable can
check the stability of the estimates of the VAR underlying an SVAR fit by var svar. Second, varstable
can check the stability of any previously stored var or var svar estimates.


We begin by refitting the previous VAR and storing the results as var1. Because this is the same
VAR that was fit in the previous example, the stability results should be identical.
. var dln_inv dln_inc dln_consump if qtr>=tq(1961q2) & qtr<=tq(1978q4)
(output omitted )
. estimates store var1

Now we use svar to fit an SVAR with a different underlying VAR and check the estimates of that
underlying VAR for stability.
. matrix A = (.,0\.,.)
. matrix B = I(2)
. svar d.ln_inc d.ln_consump, aeq(A) beq(B)
(output omitted )
. varstable
   Eigenvalue stability condition

            Eigenvalue                  Modulus
       .548711                          .548711
      -.2979493 +  .4328013i            .525443
      -.2979493 -  .4328013i            .525443
      -.3570825                         .357082

   All the eigenvalues lie inside the unit circle.
   VAR satisfies stability condition.

The estimates() option allows us to check the stability of the var results stored as var1.
. varstable, est(var1)
   Eigenvalue stability condition

            Eigenvalue                  Modulus
       .5456253                         .545625
      -.3785754 +  .3853982i            .540232
      -.3785754 -  .3853982i            .540232
      -.0643276 +  .4595944i            .464074
      -.0643276 -  .4595944i            .464074
      -.3698058                         .369806

   All the eigenvalues lie inside the unit circle.
   VAR satisfies stability condition.

The results are identical to those obtained in the previous example, confirming that we were
checking the results in var1.

Stored results
varstable stores the following in r():
Matrices
    r(Re)         real part of the eigenvalues of A
    r(Im)         imaginary part of the eigenvalues of A
    r(Modulus)    modulus of the eigenvalues of A
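As a minimal sketch of how these stored matrices might be used (assuming the lutkepohl2 dataset from the examples above; the matrix name M is purely illustrative), the moduli can be copied out of r() and the stability condition checked by hand:

```stata
use http://www.stata-press.com/data/r13/lutkepohl2, clear
var dln_inv dln_inc dln_consump if qtr>=tq(1961q2) & qtr<=tq(1978q4)
varstable
* copy the moduli out of r() before another r-class command overwrites them
matrix M = r(Modulus)
matrix list M
* the VAR is stable if the largest modulus is strictly below 1
mata: max(st_matrix("M"))
```

Copying r(Modulus) into a named matrix right away is a defensive habit, because r() results are replaced by the next r-class command.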


Methods and formulas


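varstable forms the companion matrix

\[
\mathbf{A} =
\begin{pmatrix}
\mathbf{A}_1 & \mathbf{A}_2 & \cdots & \mathbf{A}_{p-1} & \mathbf{A}_p \\
\mathbf{I}   & \mathbf{0}   & \cdots & \mathbf{0}       & \mathbf{0}   \\
\mathbf{0}   & \mathbf{I}   & \cdots & \mathbf{0}       & \mathbf{0}   \\
\vdots       & \vdots       & \ddots & \vdots           & \vdots       \\
\mathbf{0}   & \mathbf{0}   & \cdots & \mathbf{I}       & \mathbf{0}
\end{pmatrix}
\]

and obtains its eigenvalues by using matrix eigenvalues. The modulus of the complex eigenvalue
$r + ci$ is $\sqrt{r^2 + c^2}$. As shown by Lütkepohl (2005) and Hamilton (1994), the VAR is stable if the
modulus of each eigenvalue of $\mathbf{A}$ is strictly less than 1.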
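The construction above can be sketched in Mata for a hypothetical two-variable VAR(2). The coefficient matrices A1 and A2 below are made-up illustrative values, not estimates from this manual, and this is only a sketch of the calculation, not varstable's own code:

```stata
mata:
// hypothetical VAR(2) coefficient matrices (illustrative values only)
A1 = (0.5, 0.1 \ 0.4, 0.5)
A2 = (0, 0 \ 0.25, 0)
K  = rows(A1)
// companion matrix: [A1 A2] on top, [I 0] underneath
A  = (A1, A2 \ I(K), J(K, K, 0))
// complex eigenvalues and their moduli sqrt(r^2 + c^2)
lambda  = eigenvalues(A)
modulus = abs(lambda)
modulus
// stability requires every modulus to be strictly less than 1
max(modulus) < 1
end
```

For a VAR(p) with K variables the companion matrix is Kp x Kp, so there are Kp moduli to inspect.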

References
Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
Lütkepohl, H. 1993. Introduction to Multiple Time Series Analysis. 2nd ed. New York: Springer.
Lütkepohl, H. 2005. New Introduction to Multiple Time Series Analysis. New York: Springer.

Also see
[TS] var Vector autoregressive models
[TS] var svar Structural vector autoregressive models
[TS] varbasic Fit a simple VAR and graph IRFs or FEVDs
[TS] var intro Introduction to vector autoregressive models

Title
varwle Obtain Wald lag-exclusion statistics after var or svar
Syntax    Menu    Description    Options
Remarks and examples    Stored results    Methods and formulas    References
Also see

Syntax
varwle [, estimates(estname) separator(#)]

varwle can be used only after var or svar; see [TS] var and [TS] var svar.

Menu
Statistics > Multivariate time series > VAR diagnostics and tests > Wald lag-exclusion statistics

Description
varwle reports Wald tests of the hypothesis that the coefficients on the endogenous variables at a
given lag are jointly zero, both for each equation and for all equations jointly.

Options
estimates(estname) requests that varwle use the previously obtained set of var or svar estimates
stored as estname. By default, varwle uses the active estimation results. See [R] estimates for
information on manipulating estimation results.
separator(#) specifies how often separator lines should be drawn between rows. By default,
separator lines do not appear. For example, separator(1) would draw a line between each row,
separator(2) between every other row, and so on.
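A brief sketch of both options, assuming the lutkepohl2 dataset used in the examples of this entry. The stored-estimates name var_small and the matrix name F are illustrative:

```stata
use http://www.stata-press.com/data/r13/lutkepohl2, clear
* fit with small-sample statistics and store the results
var dln_inv dln_inc dln_consump if qtr<=tq(1978q4), dfk small
estimates store var_small
* refit without small; these become the active results
var dln_inv dln_inc dln_consump if qtr<=tq(1978q4)
* default: tests use the active estimation results
varwle
* use the stored results instead, with a line between each row
varwle, estimates(var_small) separator(1)
matrix F = r(F)
matrix list F
```

Because var_small was fit with the small option, the second call should report F statistics, whereas the first reports chi-squared statistics.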

Remarks and examples


After fitting a VAR, one hypothesis of interest is that the coefficients on all the endogenous
variables at a given lag are jointly zero. varwle reports Wald tests of this hypothesis for each equation
and for all equations jointly. varwle uses the estimation results from a previously fitted var or svar.
By default, varwle uses the active estimation results, but you may also use a stored set of estimates
by specifying the estimates() option.
If the VAR was fit with the small option, varwle presents small-sample F statistics; otherwise,
varwle presents large-sample chi-squared statistics.


Example 1: After var


We analyze the model with the German data described in [TS] var using varwle.
. use http://www.stata-press.com/data/r13/lutkepohl2
(Quarterly SA West German macro data, Bil DM, from Lutkepohl 1993 Table E.1)
. var dln_inv dln_inc dln_consump if qtr<=tq(1978q4), dfk small
(output omitted )
. varwle
   Equation: dln_inv

       lag          F     df   df_r    Prob > F
         1    2.64902      3     66      0.0560
         2    1.25799      3     66      0.2960

   Equation: dln_inc

       lag          F     df   df_r    Prob > F
         1    2.19276      3     66      0.0971
         2    .907499      3     66      0.4423

   Equation: dln_consump

       lag          F     df   df_r    Prob > F
         1    1.80804      3     66      0.1543
         2    5.57645      3     66      0.0018

   Equation: All

       lag          F     df   df_r    Prob > F
         1    3.78884      9     66      0.0007
         2    2.96811      9     66      0.0050

Because the VAR was fit with the dfk and small options, varwle used the small-sample estimator
$\widehat{\boldsymbol{\Sigma}}$ of the disturbance covariance matrix in constructing the VCE, producing an F statistic.
The first two equations appear to have a different lag structure from that of the third.
In the first two equations, we cannot reject the null
hypothesis that all three endogenous variables have zero coefficients at the second lag. The hypothesis
that all three endogenous variables have zero coefficients at the first lag can be rejected at the 10%
level for both of the first two equations. In contrast, in the third equation, the coefficients on the
second lag of the endogenous variables are jointly significant, but not those on the first lag. However,
we strongly reject the hypothesis that the coefficients on the first lag of the endogenous variables
are zero in all three equations jointly. Similarly, we can also strongly reject the hypothesis that the
coefficients on the second lag of the endogenous variables are zero in all three equations jointly.
If we believe these results strongly enough, we might want to refit the original VAR, placing some
constraints on the coefficients. See [TS] var for details on how to fit VAR models with constraints.


Example 2: After svar


Here we fit a simple SVAR and then run varwle:
. matrix a = (.,0\.,.)
. matrix b = I(2)
. svar dln_inc dln_consump, aeq(a) beq(b)
Estimating short-run parameters
Iteration 0:   log likelihood = -159.21683
Iteration 1:   log likelihood =  490.92264
Iteration 2:   log likelihood =  528.66126
Iteration 3:   log likelihood =  573.96363
Iteration 4:   log likelihood =  578.05136
Iteration 5:   log likelihood =  578.27633
Iteration 6:   log likelihood =  578.27699
Iteration 7:   log likelihood =  578.27699
Structural vector autoregression
 ( 1)  [a_1_2]_cons = 0
 ( 2)  [b_1_1]_cons = 1
 ( 3)  [b_1_2]_cons = 0
 ( 4)  [b_2_1]_cons = 0
 ( 5)  [b_2_2]_cons = 1
Sample:  1960q4 - 1982q4                        No. of obs      =        89
Exactly identified model                        Log likelihood  =   578.277

                  Coef.   Std. Err.       z    P>|z|     [95% Conf. Interval]

     /a_1_1    89.72411   6.725107    13.34   0.000     76.54315    102.9051
     /a_2_1   -64.73622   10.67698    -6.06   0.000    -85.66271   -43.80973
     /a_1_2           0  (constrained)
     /a_2_2    126.2964   9.466318    13.34   0.000     107.7428    144.8501

     /b_1_1           1  (constrained)
     /b_2_1           0  (constrained)
     /b_1_2           0  (constrained)
     /b_2_2           1  (constrained)

The output table from var svar gives information about the estimates of the parameters in the A
and B matrices in the structural VAR. But, as discussed in [TS] var svar, an SVAR model builds on
an underlying VAR. When varwle uses the estimation results produced by svar, it performs Wald
lag-exclusion tests on the underlying VAR model. Next we run varwle on these svar results.



. varwle
   Equation: dln_inc

       lag       chi2     df    Prob > chi2
         1    6.88775      2       0.032
         2   1.873546      2       0.392

   Equation: dln_consump

       lag       chi2     df    Prob > chi2
         1   9.938547      2       0.007
         2   13.89996      2       0.001

   Equation: All

       lag       chi2     df    Prob > chi2
         1   34.54276      4       0.000
         2   19.44093      4       0.001

Now we fit the underlying VAR with two lags and apply varwle to these results.
. var dln_inc dln_consump
(output omitted )
. varwle
   Equation: dln_inc

       lag       chi2     df    Prob > chi2
         1    6.88775      2       0.032
         2   1.873546      2       0.392

   Equation: dln_consump

       lag       chi2     df    Prob > chi2
         1   9.938547      2       0.007
         2   13.89996      2       0.001

   Equation: All

       lag       chi2     df    Prob > chi2
         1   34.54276      4       0.000
         2   19.44093      4       0.001

Because varwle produces the same results in these two cases, we can conclude that when varwle
is applied to svar results, it performs Wald lag-exclusion tests on the underlying VAR.


Stored results
varwle stores the following in r():
Matrices
    if e(small)==""
        r(chi2)    $\chi^2$ test statistics
        r(df)      degrees of freedom
        r(p)       p-values
    if e(small)!=""
        r(F)       F test statistics
        r(df_r)    denominator degrees of freedom
        r(df)      numerator degrees of freedom
        r(p)       p-values

Methods and formulas


varwle uses test to obtain Wald statistics of the hypotheses that all the endogenous variables at
a given lag are jointly zero for each equation and for all equations jointly. Like the test command,
varwle uses estimation results stored by var or var svar to determine whether to calculate and
report small-sample F statistics or large-sample chi-squared statistics.
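Because varwle is built on test, a single lag-exclusion test can be approximated by hand. The sketch below assumes the lutkepohl2 dataset and the VAR from Example 1 and uses the [eqname]coefficient naming that var results follow. It is an illustration of the idea, not varwle's internal code:

```stata
use http://www.stata-press.com/data/r13/lutkepohl2, clear
var dln_inv dln_inc dln_consump if qtr<=tq(1978q4), dfk small
varwle
* lag-1 exclusion in the dln_inv equation only
test [dln_inv]L.dln_inv [dln_inv]L.dln_inc [dln_inv]L.dln_consump
* lag-1 exclusion in all three equations jointly
test [dln_inv]L.dln_inv     [dln_inv]L.dln_inc     [dln_inv]L.dln_consump     ///
     [dln_inc]L.dln_inv     [dln_inc]L.dln_inc     [dln_inc]L.dln_consump     ///
     [dln_consump]L.dln_inv [dln_consump]L.dln_inc [dln_consump]L.dln_consump
return list
```

The statistics from the two test commands can be compared with the lag-1 rows of the dln_inv and All tables that varwle prints.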


Abraham Wald (1902–1950) was born in Cluj, in what is now Romania. He studied mathematics at
the University of Vienna, publishing at first on geometry, but then became interested in economics
and econometrics. He moved to the United States in 1938 and later joined the faculty at Columbia.
His major contributions to statistics include work in decision theory, optimal sequential sampling,
large-sample distributions of likelihood-ratio tests, and nonparametric inference. Wald died in a
plane crash in India.

References
Amisano, G., and C. Giannini. 1997. Topics in Structural VAR Econometrics. 2nd ed. Heidelberg: Springer.
Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
Lütkepohl, H. 1993. Introduction to Multiple Time Series Analysis. 2nd ed. New York: Springer.
Mangel, M., and F. J. Samaniego. 1984. Abraham Wald's work on aircraft survivability. Journal of the American
Statistical Association 79: 259–267.
Wolfowitz, J. 1952. Abraham Wald, 1902–1950. Annals of Mathematical Statistics 23: 1–13 (and other reports in
same issue).

Also see
[TS] var Vector autoregressive models
[TS] var svar Structural vector autoregressive models
[TS] varbasic Fit a simple VAR and graph IRFs or FEVDs
[TS] var intro Introduction to vector autoregressive models

Title
vec intro Introduction to vector error-correction models

Description    Remarks and examples    References    Also see

Description
Stata has a suite of commands for fitting, forecasting, interpreting, and performing inference
on vector error-correction models (VECMs) with cointegrating variables. After fitting a VECM, the
irf commands can be used to obtain impulse–response functions (IRFs) and forecast-error variance
decompositions (FEVDs). The table below describes the available commands.

Fitting a VECM
    vec              [TS] vec              Fit vector error-correction models

Model diagnostics and inference
    vecrank          [TS] vecrank          Estimate the cointegrating rank of a VECM
    veclmar          [TS] veclmar          Perform LM test for residual autocorrelation after vec
    vecnorm          [TS] vecnorm          Test for normally distributed disturbances after vec
    vecstable        [TS] vecstable        Check the stability condition of VECM estimates
    varsoc           [TS] varsoc           Obtain lag-order selection statistics for VARs and VECMs

Forecasting from a VECM
    fcast compute    [TS] fcast compute    Compute dynamic forecasts after var, svar, or vec
    fcast graph      [TS] fcast graph      Graph forecasts after fcast compute

Working with IRFs and FEVDs
    irf              [TS] irf              Create and analyze IRFs and FEVDs

This manual entry provides an overview of the commands for VECMs; provides an introduction
to integration, cointegration, estimation, inference, and interpretation of VECMs; and gives an
example of how to use Stata's vec commands.

Remarks and examples


vec estimates the parameters of cointegrating VECMs. You may specify any of the five trend
specifications in Johansen (1995, sec. 5.7). By default, identification is obtained via the Johansen
normalization, but vec allows you to obtain identification by placing your own constraints on
the parameters of the cointegrating vectors. You may also put more restrictions on the adjustment
coefficients.
vecrank is the command for determining the number of cointegrating equations. vecrank
implements Johansen's multiple trace test procedure, the maximum eigenvalue test, and a method based
on minimizing either of two different information criteria.

Because Nielsen (2001) has shown that the methods implemented in varsoc can be used to choose
the order of the autoregressive process, no separate vec command is needed; you can simply use
varsoc. veclmar tests that the residuals have no serial correlation, and vecnorm tests that they are
normally distributed.
All the irf routines described in [TS] irf are available for estimating, interpreting, and managing
estimated IRFs and FEVDs for VECMs.
Remarks are presented under the following headings:
Introduction to cointegrating VECMs
What is cointegration?
The multivariate VECM specification
Trends in the Johansen VECM framework
VECM estimation in Stata
Selecting the number of lags
Testing for cointegration
Fitting a VECM
Fitting VECMs with Johansen's normalization
Postestimation specification testing
Impulse–response functions for VECMs
Forecasting with VECMs

Introduction to cointegrating VECMs


This section provides a brief introduction to integration, cointegration, and cointegrated vector
error-correction models. For more details about these topics, see Hamilton (1994), Johansen (1995),
Lütkepohl (2005), Watson (1994), and Becketti (2013).
What is cointegration?
Standard regression techniques, such as ordinary least squares (OLS), require that the variables
be covariance stationary. A variable is covariance stationary if its mean and all its autocovariances
are finite and do not change over time. Cointegration analysis provides a framework for estimation,
inference, and interpretation when the variables are not covariance stationary.
Instead of being covariance stationary, many economic time series appear to be first-difference
stationary. This means that the level of a time series is not stationary but its first difference is.
First-difference stationary processes are also known as integrated processes of order 1, or I(1) processes.
Covariance-stationary processes are I(0). In general, a process whose dth difference is stationary is
an integrated process of order d, or I(d).
The canonical example of a first-difference stationary process is the random walk. This is a variable
$x_t$ that can be written as

\[ x_t = x_{t-1} + \epsilon_t \tag{1} \]

where the $\epsilon_t$ are independently and identically distributed (i.i.d.) with mean zero and a finite variance
$\sigma^2$. Taking $x_0 = 0$, although $E[x_t] = 0$ for all $t$, $\mathrm{Var}[x_t] = t\sigma^2$ is not time invariant, so $x_t$ is not covariance
stationary. Because $\Delta x_t = x_t - x_{t-1} = \epsilon_t$ and $\epsilon_t$ is covariance stationary, $x_t$ is first-difference
stationary.
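A small simulation can make the distinction concrete. The sketch below generates one random-walk path with standard normal disturbances and a starting value of zero (the seed, sample size, and variable names are arbitrary choices for illustration) and then applies augmented Dickey–Fuller tests to the level and to the first difference:

```stata
clear
set seed 12345
set obs 200
generate t = _n
tsset t
generate e = rnormal()     // i.i.d. disturbances with mean 0 and variance 1
generate x = sum(e)        // running sum: x_t = x_{t-1} + e_t, with x_0 = 0
tsline x
dfuller x                  // unit-root test on the level
dfuller D.x                // unit-root test on the first difference
```

With a path like this, one would typically expect the unit-root null to survive for the level and to be rejected for the difference, although any single simulated draw can behave differently.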
These concepts are important because, although conventional estimators are well behaved when
applied to covariance-stationary data, they have nonstandard asymptotic distributions and different
rates of convergence when applied to I(1) processes. To illustrate, consider several variants of the
model

\[ y_t = a + b x_t + e_t \tag{2} \]

Throughout the discussion, we maintain the assumption that $E[e_t] = 0$.


If both $y_t$ and $x_t$ are covariance-stationary processes, $e_t$ must also be covariance stationary. As
long as $E[x_t e_t] = 0$, we can consistently estimate the parameters $a$ and $b$ by using OLS. Furthermore,
the distribution of the OLS estimator converges to a normal distribution centered at the true value as
the sample size grows.
If $y_t$ and $x_t$ are independent random walks and $b = 0$, there is no relationship between $y_t$ and
$x_t$, and (2) is called a spurious regression. Granger and Newbold (1974) performed Monte Carlo
experiments and showed that the usual t statistics from OLS regression provide spurious results: given
a large enough dataset, we can almost always reject the null hypothesis of the test that $b = 0$ even
though $b$ is in fact zero. Here the OLS estimator does not converge to any well-defined population
parameter.
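The spurious-regression setup can be reproduced with a short simulation. In this sketch the seed, sample size, and variable names are arbitrary. Two independent random walks are generated and regressed on each other in levels and then in first differences:

```stata
clear
set seed 2013
set obs 500
generate t = _n
tsset t
generate y = sum(rnormal())   // random walk
generate x = sum(rnormal())   // independent random walk
regress y x                   // levels: t statistic on x is unreliable
regress D.y D.x               // differences: both series are I(0)
```

Rerunning with different seeds gives a rough, informal feel for how often the levels regression reports a "significant" slope. It is not a substitute for the Monte Carlo design in Granger and Newbold (1974).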
Phillips (1986) later provided the asymptotic theory that explained the Granger and Newbold (1974)
results. He showed that the random walks $y_t$ and $x_t$ are first-difference stationary processes and that
the OLS estimator does not have its usual asymptotic properties when the variables are first-difference
stationary.
Because $\Delta y_t$ and $\Delta x_t$ are covariance stationary, a simple regression of $\Delta y_t$ on $\Delta x_t$ appears to
be a viable alternative. However, if $y_t$ and $x_t$ cointegrate, as defined below, the simple regression of
$\Delta y_t$ on $\Delta x_t$ is misspecified.
If $y_t$ and $x_t$ are I(1) and $b \neq 0$, $e_t$ could be either I(0) or I(1). Phillips and Durlauf (1986) have
derived the asymptotic theory for the OLS estimator when $e_t$ is I(1), though it has not been widely
used in applied work. More interesting is the case in which $e_t = y_t - a - b x_t$ is I(0). $y_t$ and $x_t$ are
then said to be cointegrated. Two variables are cointegrated if each is an I(1) process but a linear
combination of them is an I(0) process.
It is not possible for $y_t$ to be a random walk and $x_t$ and $e_t$ to be covariance stationary. As
Granger (1981) pointed out, because a random walk cannot be equal to a covariance-stationary
process, the equation does not "balance". An equation balances when the processes on each side
of the equal sign are of the same order of integration. Before attacking any applied problem with
integrated variables, make sure that the equation balances.
An example from Engle and Granger (1987) provides more intuition. Redefine $y_t$ and $x_t$ to be

\[ y_t + \beta x_t = \epsilon_t, \qquad \epsilon_t = \epsilon_{t-1} + \xi_t \tag{3} \]
\[ y_t + \alpha x_t = \nu_t, \qquad \nu_t = \rho\,\nu_{t-1} + \zeta_t, \qquad |\rho| < 1 \tag{4} \]

where $\xi_t$ and $\zeta_t$ are i.i.d. disturbances over time that are correlated with each other. Because $\epsilon_t$ is
I(1), (3) and (4) imply that both $x_t$ and $y_t$ are I(1). The condition that $|\rho| < 1$ implies that $\nu_t$ and
$y_t + \alpha x_t$ are I(0). Thus $y_t$ and $x_t$ cointegrate, and $(1, \alpha)$ is the cointegrating vector.
Using a bit of algebra, we can rewrite (3) and (4) as

\[ \Delta y_t = \beta\delta z_{t-1} + \eta_{1t} \tag{5} \]
\[ \Delta x_t = -\delta z_{t-1} + \eta_{2t} \tag{6} \]

where $\delta = (1-\rho)/(\alpha-\beta)$, $z_t = y_t + \alpha x_t$, and $\eta_{1t}$ and $\eta_{2t}$ are distinct, stationary, linear combinations
of $\xi_t$ and $\zeta_t$. This representation is known as the vector error-correction model (VECM). One can think
of $z_t = 0$ as being the point at which $y_t$ and $x_t$ are in equilibrium. The coefficients on $z_{t-1}$ describe
how $y_t$ and $x_t$ adjust to $z_{t-1}$ being nonzero, or out of equilibrium. $z_t$ is the "error" in the system,
and (5) and (6) describe how the system adjusts or corrects back to the equilibrium. As $\rho$ goes to 1, the
system degenerates into a pair of correlated random walks. The VECM parameterization highlights
this point, because $\delta \to 0$ as $\rho \to 1$.
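The "bit of algebra" can be spelled out as follows. This is a sketch of one route to (5) and (6), writing $\epsilon_t$ and $\nu_t$ for the I(1) and I(0) processes in (3) and (4), $\xi_t$ and $\zeta_t$ for their innovations, and assuming $\alpha \neq \beta$:

```latex
% Subtract (3) from (4) to solve for x_t:
(\alpha - \beta)\, x_t = \nu_t - \epsilon_t
\quad\Longrightarrow\quad
x_t = \frac{\nu_t - \epsilon_t}{\alpha - \beta}

% First-difference, using \Delta\epsilon_t = \xi_t and
% \Delta\nu_t = (\rho - 1)\nu_{t-1} + \zeta_t, with z_{t-1} = \nu_{t-1}:
\Delta x_t
  = \frac{(\rho - 1)\,\nu_{t-1} + \zeta_t - \xi_t}{\alpha - \beta}
  = -\delta z_{t-1} + \underbrace{\frac{\zeta_t - \xi_t}{\alpha - \beta}}_{\eta_{2t}},
\qquad \delta = \frac{1 - \rho}{\alpha - \beta}

% From (3), y_t = \epsilon_t - \beta x_t, so
\Delta y_t = \xi_t - \beta\,\Delta x_t
  = \beta\delta z_{t-1} + \underbrace{\xi_t - \beta\,\frac{\zeta_t - \xi_t}{\alpha - \beta}}_{\eta_{1t}}
```

Both $\eta_{1t}$ and $\eta_{2t}$ are linear combinations of the i.i.d. innovations $\xi_t$ and $\zeta_t$, which is why they are stationary.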


If we knew $\alpha$, we would know $z_t$, and we could work with the stationary system of (5) and (6).
Although knowing $\alpha$ seems silly, we can conduct much of the analysis as if we knew $\alpha$ because
there is an estimator for the cointegrating parameter $\alpha$ that converges to its true value at a faster rate
than the estimator for the adjustment parameters $\beta$ and $\delta$.
The definition of a bivariate cointegrating relation requires simply that there exist a linear combination
of the I(1) variables that is I(0). If $y_t$ and $x_t$ are I(1) and there are two finite real numbers $a \neq 0$
and $b \neq 0$, such that $a y_t + b x_t$ is I(0), then $y_t$ and $x_t$ are cointegrated. Although there are two
parameters, $a$ and $b$, only one will be identifiable because if $a y_t + b x_t$ is I(0), so is $c a y_t + c b x_t$
for any finite, nonzero, real number $c$. Obtaining identification in the bivariate case is relatively
simple. The coefficient on $y_t$ in (4) is unity. This natural construction of the model placed the
necessary identification restriction on the cointegrating vector. As we discuss below, identification in
the multivariate case is more involved.
If $\mathbf{y}_t$ is a $K \times 1$ vector of I(1) variables and there exists a vector $\boldsymbol{\beta}$, such that $\boldsymbol{\beta}\mathbf{y}_t$ is a vector
of I(0) variables, then $\mathbf{y}_t$ is said to be cointegrating of order (1,0) with cointegrating vector $\boldsymbol{\beta}$. We
say that the parameters in $\boldsymbol{\beta}$ are the parameters in the cointegrating equation. For a vector of length
$K$, there may be at most $K - 1$ distinct cointegrating vectors. Engle and Granger (1987) provide a
more general definition of cointegration, but this one is sufficient for our purposes.
The multivariate VECM specification
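In practice, most empirical applications analyze multivariate systems, so the rest of our discussion
focuses on that case. Consider a VAR with $p$ lags

\[ \mathbf{y}_t = \mathbf{v} + \mathbf{A}_1 \mathbf{y}_{t-1} + \mathbf{A}_2 \mathbf{y}_{t-2} + \cdots + \mathbf{A}_p \mathbf{y}_{t-p} + \boldsymbol{\epsilon}_t \tag{7} \]

where $\mathbf{y}_t$ is a $K \times 1$ vector of variables, $\mathbf{v}$ is a $K \times 1$ vector of parameters, $\mathbf{A}_1$–$\mathbf{A}_p$ are $K \times K$
matrices of parameters, and $\boldsymbol{\epsilon}_t$ is a $K \times 1$ vector of disturbances. $\boldsymbol{\epsilon}_t$ has mean $\mathbf{0}$, has covariance
matrix $\boldsymbol{\Sigma}$, and is i.i.d. normal over time. Any VAR(p) can be rewritten as a VECM. Using some algebra,
we can rewrite (7) in VECM form as

\[ \Delta\mathbf{y}_t = \mathbf{v} + \boldsymbol{\Pi}\mathbf{y}_{t-1} + \sum_{i=1}^{p-1} \boldsymbol{\Gamma}_i \Delta\mathbf{y}_{t-i} + \boldsymbol{\epsilon}_t \tag{8} \]

where $\boldsymbol{\Pi} = \sum_{j=1}^{p} \mathbf{A}_j - \mathbf{I}_K$ and $\boldsymbol{\Gamma}_i = -\sum_{j=i+1}^{p} \mathbf{A}_j$. The $\mathbf{v}$ and $\boldsymbol{\epsilon}_t$ in (7) and (8) are identical.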
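For $p = 2$ the rewriting can be done in two lines. This worked case is added as an illustration and uses only the definitions of $\boldsymbol{\Pi}$ and $\boldsymbol{\Gamma}_i$ given for (8):

```latex
% Start from the VAR(2) and subtract y_{t-1} from both sides:
\Delta\mathbf{y}_t
  = \mathbf{v} + (\mathbf{A}_1 - \mathbf{I}_K)\,\mathbf{y}_{t-1}
    + \mathbf{A}_2\,\mathbf{y}_{t-2} + \boldsymbol{\epsilon}_t

% Add and subtract A_2 y_{t-1}:
\Delta\mathbf{y}_t
  = \mathbf{v}
    + \underbrace{(\mathbf{A}_1 + \mathbf{A}_2 - \mathbf{I}_K)}_{\boldsymbol{\Pi}}\,\mathbf{y}_{t-1}
    + \underbrace{(-\mathbf{A}_2)}_{\boldsymbol{\Gamma}_1}\,\Delta\mathbf{y}_{t-1}
    + \boldsymbol{\epsilon}_t
```

The same add-and-subtract step, applied repeatedly, gives the general expressions for $\boldsymbol{\Pi}$ and $\boldsymbol{\Gamma}_i$ when $p > 2$.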

Engle and Granger (1987) show that if the variables $\mathbf{y}_t$ are I(1) the matrix $\boldsymbol{\Pi}$ in (8) has rank
$0 \leq r < K$, where $r$ is the number of linearly independent cointegrating vectors. If the variables
cointegrate, $0 < r < K$ and (8) shows that a VAR in first differences is misspecified because it omits
the lagged level term $\boldsymbol{\Pi}\mathbf{y}_{t-1}$.
Assume that $\boldsymbol{\Pi}$ has reduced rank $0 < r < K$ so that it can be expressed as $\boldsymbol{\Pi} = \boldsymbol{\alpha}\boldsymbol{\beta}'$, where $\boldsymbol{\alpha}$
and $\boldsymbol{\beta}$ are both $K \times r$ matrices of rank $r$. Without further restrictions, the cointegrating vectors are
not identified: the parameters $(\boldsymbol{\alpha}, \boldsymbol{\beta})$ are indistinguishable from the parameters $(\boldsymbol{\alpha}\mathbf{Q}, \boldsymbol{\beta}\mathbf{Q}^{-1\prime})$ for any
$r \times r$ nonsingular matrix $\mathbf{Q}$. Because only the rank of $\boldsymbol{\Pi}$ is identified, the VECM is said to identify
the rank of the cointegrating space, or equivalently, the number of cointegrating vectors. In practice,
the estimation of the parameters of a VECM requires at least $r^2$ identification restrictions. Stata's vec
command can apply the conventional Johansen restrictions discussed below or use constraints that
the user supplies.
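The lack of identification can be verified directly. For any nonsingular $r \times r$ matrix $\mathbf{Q}$, the rotated pair reproduces the same $\boldsymbol{\Pi}$:

```latex
(\boldsymbol{\alpha}\mathbf{Q})\,\bigl(\boldsymbol{\beta}\mathbf{Q}^{-1\prime}\bigr)'
  = \boldsymbol{\alpha}\,\mathbf{Q}\,\mathbf{Q}^{-1}\,\boldsymbol{\beta}'
  = \boldsymbol{\alpha}\boldsymbol{\beta}'
  = \boldsymbol{\Pi}
```

Because $\mathbf{Q}$ has $r^2$ free elements, $r^2$ restrictions are the minimum needed to pin down a single $(\boldsymbol{\alpha}, \boldsymbol{\beta})$ pair.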
The VECM in (8) also nests two important special cases. If the variables in $\mathbf{y}_t$ are I(1) but not
cointegrated, $\boldsymbol{\Pi}$ is a matrix of zeros and thus has rank 0. If all the variables are I(0), $\boldsymbol{\Pi}$ has full rank
$K$.


There are several different frameworks for estimation and inference in cointegrating systems.
Although the methods in Stata are based on the maximum likelihood (ML) methods developed by
Johansen (1988, 1991, 1995), other useful frameworks have been developed by Park and Phillips (1988,
1989); Sims, Stock, and Watson (1990); Stock (1987); and Stock and Watson (1988); among others.
The ML framework developed by Johansen was independently developed by Ahn and Reinsel (1990).
Maddala and Kim (1998) and Watson (1994) survey all of these methods. The cointegration methods
in Stata are based on Johansen's maximum likelihood framework because it has been found to be
particularly useful in several comparative studies, including Gonzalo (1994) and Hubrich, Lütkepohl,
and Saikkonen (2001).
Trends in the Johansen VECM framework
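Deterministic trends in a cointegrating VECM can stem from two distinct sources: the mean of the
cointegrating relationship and the mean of the differenced series. Allowing for a constant and a linear
trend and assuming that there are $r$ cointegrating relations, we can rewrite the VECM in (8) as

\[ \Delta\mathbf{y}_t = \boldsymbol{\alpha}\boldsymbol{\beta}'\mathbf{y}_{t-1} + \sum_{i=1}^{p-1} \boldsymbol{\Gamma}_i \Delta\mathbf{y}_{t-i} + \mathbf{v} + \boldsymbol{\delta} t + \boldsymbol{\epsilon}_t \tag{9} \]

where $\boldsymbol{\delta}$ is a $K \times 1$ vector of parameters. Because (9) models the differences of the data, the constant
implies a linear time trend in the levels, and the time trend $\boldsymbol{\delta} t$ implies a quadratic time trend in the
levels of the data. Often we may want to include a constant or a linear time trend for the differences
without allowing for the higher-order trend that is implied for the levels of the data. VECMs exploit
the properties of the matrix $\boldsymbol{\alpha}$ to achieve this flexibility.
Because $\boldsymbol{\alpha}$ is a $K \times r$ rank matrix, we can rewrite the deterministic components in (9) as

\[ \mathbf{v} = \boldsymbol{\alpha}\boldsymbol{\mu} + \boldsymbol{\gamma} \tag{10a} \]
\[ \boldsymbol{\delta} t = \boldsymbol{\alpha}\boldsymbol{\rho} t + \boldsymbol{\tau} t \tag{10b} \]

where $\boldsymbol{\mu}$ and $\boldsymbol{\rho}$ are $r \times 1$ vectors of parameters and $\boldsymbol{\gamma}$ and $\boldsymbol{\tau}$ are $K \times 1$ vectors of parameters.
$\boldsymbol{\gamma}$ is orthogonal to $\boldsymbol{\alpha}\boldsymbol{\mu}$, and $\boldsymbol{\tau}$ is orthogonal to $\boldsymbol{\alpha}\boldsymbol{\rho}$; that is, $\boldsymbol{\gamma}'\boldsymbol{\alpha}\boldsymbol{\mu} = 0$ and $\boldsymbol{\tau}'\boldsymbol{\alpha}\boldsymbol{\rho} = 0$, allowing us to
rewrite (9) as

\[ \Delta\mathbf{y}_t = \boldsymbol{\alpha}(\boldsymbol{\beta}'\mathbf{y}_{t-1} + \boldsymbol{\mu} + \boldsymbol{\rho} t) + \sum_{i=1}^{p-1} \boldsymbol{\Gamma}_i \Delta\mathbf{y}_{t-i} + \boldsymbol{\gamma} + \boldsymbol{\tau} t + \boldsymbol{\epsilon}_t \tag{11} \]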
Placing restrictions on the trend terms in (11) yields five cases.

CASE 1: Unrestricted trend

If no restrictions are placed on the trend parameters, (11) implies that there are quadratic trends
in the levels of the variables and that the cointegrating equations are stationary around time
trends (trend stationary).

CASE 2: Restricted trend, $\boldsymbol{\tau} = 0$

By setting $\boldsymbol{\tau} = 0$, we assume that the trends in the levels of the data are linear but not quadratic.
This specification allows the cointegrating equations to be trend stationary.

CASE 3: Unrestricted constant, $\boldsymbol{\tau} = 0$ and $\boldsymbol{\rho} = 0$

By setting $\boldsymbol{\tau} = 0$ and $\boldsymbol{\rho} = 0$, we exclude the possibility that the levels of the data have
quadratic trends, and we restrict the cointegrating equations to be stationary around constant
means. Because $\boldsymbol{\gamma}$ is not restricted to zero, this specification still puts a linear time trend in the
levels of the data.

CASE 4: Restricted constant, $\boldsymbol{\tau} = 0$, $\boldsymbol{\rho} = 0$, and $\boldsymbol{\gamma} = 0$

By adding the restriction that $\boldsymbol{\gamma} = 0$, we assume there are no linear time trends in the levels of
the data. This specification allows the cointegrating equations to be stationary around a constant
mean, but it allows no other trends or constant terms.

CASE 5: No trend, $\boldsymbol{\tau} = 0$, $\boldsymbol{\rho} = 0$, $\boldsymbol{\gamma} = 0$, and $\boldsymbol{\mu} = 0$

This specification assumes that there are no nonzero means or trends. It also assumes that the
cointegrating equations are stationary with means of zero and that the differences and the levels
of the data have means of zero.
This flexibility does come at a price. Below we discuss testing procedures for determining the
number of cointegrating equations. The asymptotic distribution of the likelihood-ratio (LR) statistic
for hypotheses about $r$ changes with the trend specification, so we must choose a trend specification
first. A combination of theory and graphical analysis will aid in specifying the trend before
proceeding with the analysis.
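In Stata, the five cases correspond to the values of the trend() option of vecrank and vec. The mapping in the comments below reflects how that option is documented in [TS] vec rather than anything stated in this section, so treat it as a pointer to check there. The txhprice dataset is the one used in the extended example later in this entry:

```stata
use http://www.stata-press.com/data/r13/txhprice, clear
vecrank dallas houston, trend(trend)        // case 1: unrestricted trend
vecrank dallas houston, trend(rtrend)       // case 2: restricted trend
vecrank dallas houston, trend(constant)     // case 3: unrestricted constant (default)
vecrank dallas houston, trend(rconstant)    // case 4: restricted constant
vecrank dallas houston, trend(none)         // case 5: no trend
```

Comparing the five tables shows how both the trace statistics and their critical values move with the trend specification, which is why the specification has to be settled before testing.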

VECM estimation in Stata

We provide an overview of the vec commands in Stata through an extended example. We have
monthly data on the average selling prices of houses in four cities in Texas: Austin, Dallas, Houston,
and San Antonio. In the dataset, these average housing prices are contained in the variables austin,
dallas, houston, and sa. The series begin in January of 1990 and go through December 2003, for
a total of 168 observations. The following graph depicts our data.

[Figure: ln of house prices in austin, dallas, houston, and san antonio, plotted against t (monthly, from 1990m1)]

The plots on the graph indicate that all the series are trending and are potentially I(1) processes. In a
competitive market, the current and past prices contain all the information available, so tomorrow's
price will be a random walk from today's price. Some researchers may opt to use [TS] dfgls to
investigate the presence of a unit root in each series, but the test for cointegration we use includes the
case in which all the variables are stationary, so we defer formal testing until we test for cointegration.
The time trends in the data appear to be approximately linear, so we will specify trend(constant)
when modeling these series, which is the default with vec.
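For readers who do want a preliminary look at each series, a minimal sketch follows. It assumes the txhprice dataset used in the examples below and leaves the dfgls lag selection at its default:

```stata
use http://www.stata-press.com/data/r13/txhprice, clear
* plot the four log price series together
tsline austin dallas houston sa
* DF-GLS unit-root tests, one series at a time
dfgls dallas
dfgls houston
```

These tests are optional here because the cointegration test used below also covers the case in which all the variables are stationary.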
The next graph shows just Dallas' and Houston's data, so we can more carefully examine their
relationship.

[Figure: ln of house prices in dallas and ln of house prices in houston, plotted against t (monthly, from 1990m1)]

Except for the crash at the end of 1991, housing prices in Dallas and Houston appear closely
related. Although average prices in the two cities will differ because of resource variations and other
factors, if the housing markets become too dissimilar, people and businesses will migrate, bringing
the average housing prices back toward each other. We therefore expect the series of average housing
prices in Houston to be cointegrated with the series of average housing prices in Dallas.
Selecting the number of lags
To test for cointegration or fit cointegrating VECMs, we must specify how many lags to include.
Building on the work of Tsay (1984) and Paulsen (1984), Nielsen (2001) has shown that the methods
implemented in varsoc can be used to determine the lag order for a VAR model with I(1) variables.
As can be seen from (9), the order of the corresponding VECM is always one less than the VAR. vec
makes this adjustment automatically, so we will always refer to the order of the underlying VAR. The
output below uses varsoc to determine the lag order of the VAR of the average housing prices in
Dallas and Houston.
. use http://www.stata-press.com/data/r13/txhprice
. varsoc dallas houston
   Selection-order criteria
   Sample:  1990m5 - 2003m12                    Number of obs      =       164

    lag       LL        LR      df     p        FPE        AIC        HQIC       SBIC

      0    299.525                            .000091   -3.62835   -3.61301   -3.59055
      1    577.483   555.92      4   0.000    3.2e-06    -6.9693   -6.92326   -6.85589
      2    590.978   26.991*     4   0.000    2.9e-06*   -7.0851*  -7.00837*  -6.89608*
      3    593.437    4.918      4   0.296    2.9e-06   -7.06631   -6.95888   -6.80168
      4    596.364   5.8532      4   0.210    3.0e-06   -7.05322    -6.9151   -6.71299

   Endogenous:  dallas houston
    Exogenous:  _cons

We will use two lags for this bivariate model because the Hannan–Quinn information criterion (HQIC)
method, Schwarz Bayesian information criterion (SBIC) method, and sequential likelihood-ratio (LR)
test all chose two lags, as indicated by the "*" in the output.
The reader can verify that when all four cities' data are used, the LR test selects three lags, the
HQIC method selects two lags, and the SBIC method selects one lag. We will use three lags in our
four-variable model.
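The four-city check mentioned above can be run with the same command. This is a sketch that uses varsoc's default maximum lag:

```stata
use http://www.stata-press.com/data/r13/txhprice, clear
* lag-order selection statistics for the four-variable VAR
varsoc austin dallas houston sa
```

The starred entries in each column mark the lag chosen by that criterion.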


Testing for cointegration


The tests for cointegration implemented in vecrank are based on Johansen's method. If the log
likelihood of the unconstrained model that includes the cointegrating equations is significantly different
from the log likelihood of the constrained model that does not include the cointegrating equations,
we reject the null hypothesis of no cointegration.
Here we use vecrank to determine the number of cointegrating equations:
. vecrank dallas houston
                       Johansen tests for cointegration
Trend: constant                                         Number of obs =     166
Sample:  1990m3 - 2003m12                                        Lags =       2

                                                         5%
maximum                                      trace    critical
  rank    parms       LL       eigenvalue  statistic    value
    0       6      576.26444       .        46.8252    15.41
    1       9      599.58781    0.24498      0.1785*    3.76
    2      10      599.67706    0.00107

Besides presenting information about the sample size and time span, the header indicates that test
statistics are based on a model with two lags and a constant trend. The body of the table presents test
statistics and their critical values of the null hypotheses of no cointegration (line 1) and one or fewer
cointegrating equations (line 2). The eigenvalue shown on the last line is used to compute the trace
statistic in the line above it. Johansen's testing procedure starts with the test for zero cointegrating
equations (a maximum rank of zero) and then accepts the first null hypothesis that is not rejected.
In the output above, we strongly reject the null hypothesis of no cointegration and fail to reject
the null hypothesis of at most one cointegrating equation. Thus we accept the null hypothesis that
there is one cointegrating equation in the bivariate model.
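The trace statistics in the table can be reproduced from the reported log likelihoods. The arithmetic below is a worked check, not part of vecrank's output. It uses the standard result that the trace statistic for a null rank of $r$ is twice the difference between the log likelihood at full rank $K$ and at rank $r$, which equals $-T\sum_{i=r+1}^{K}\ln(1-\widehat{\lambda}_i)$:

```latex
% H0: r = 0
2\,(599.67706 - 576.26444) = 46.8252

% H0: r <= 1
2\,(599.67706 - 599.58781) = 0.1785

% Same quantity for r = 0 from the eigenvalues (T = 166),
% approximate because the displayed eigenvalues are rounded:
-166\,\bigl[\ln(1 - 0.24498) + \ln(1 - 0.00107)\bigr] \approx 46.83
```

The first value exceeds the 5% critical value of 15.41 and the second falls below 3.76, which is the pattern behind the choice of one cointegrating equation.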
Using all four series and a model with three lags, we find that there are two cointegrating
relationships.
. vecrank austin dallas houston sa, lag(3)
                       Johansen tests for cointegration
Trend: constant                                         Number of obs =     165
Sample:  1990m4 - 2003m12                                        Lags =       3

                                                         5%
maximum                                      trace    critical
  rank    parms       LL       eigenvalue  statistic    value
    0      36      1107.7833       .       101.6070    47.21
    1      43      1137.7484    0.30456     41.6768    29.68
    2      48      1153.6435    0.17524      9.8865*   15.41
    3      51      1158.4191    0.05624      0.3354     3.76
    4      52      1158.5868    0.00203

Fitting a VECM
vec estimates the parameters of cointegrating VECMs. There are four types of parameters of interest:
1. The parameters in the cointegrating equations $\boldsymbol{\beta}$
2. The adjustment coefficients $\boldsymbol{\alpha}$
3. The short-run coefficients
4. Some standard functions of $\boldsymbol{\beta}$ and $\boldsymbol{\alpha}$ that have useful interpretations


Although all four types are discussed in [TS] vec, here we discuss only types 1–3 and how they
appear in the output of vec.
Having determined that there is a cointegrating equation between the Dallas and Houston series,
we now want to estimate the parameters of a bivariate cointegrating VECM for these two series by
using vec.
. vec dallas houston
Vector error-correction model
Sample:  1990m3 - 2003m12                         No. of obs      =        166
                                                  AIC             =  -7.115516
Log likelihood =  599.5878                        HQIC            =   -7.04703
Det(Sigma_ml)  =  2.50e-06                        SBIC            =  -6.946794

Equation           Parms      RMSE     R-sq      chi2     P>chi2
D_dallas              4     .038546   0.1692   32.98959   0.0000
D_houston             4     .045348   0.3737   96.66399   0.0000

                    Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]

D_dallas
        _ce1
         L1.    -.3038799   .0908504    -3.34   0.001    -.4819434   -.1258165

      dallas
         LD.    -.1647304   .0879356    -1.87   0.061     -.337081    .0076202

     houston
         LD.    -.0998368   .0650838    -1.53   0.125    -.2273988    .0277251

       _cons     .0056128   .0030341     1.85   0.064    -.0003339    .0115595

D_houston
        _ce1
         L1.     .5027143   .1068838     4.70   0.000     .2932258    .7122028

      dallas
         LD.    -.0619653   .1034547    -0.60   0.549    -.2647327    .1408022

     houston
         LD.    -.3328437     .07657    -4.35   0.000    -.4829181   -.1827693

       _cons     .0033928   .0035695     0.95   0.342    -.0036034     .010389

Cointegrating equations

Equation           Parms    chi2     P>chi2
_ce1                  1   1640.088   0.0000

Identification:  beta is exactly identified
                 Johansen normalization restriction imposed

        beta        Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]

_ce1
      dallas            1          .        .       .            .           .
     houston    -.8675936   .0214231   -40.50   0.000    -.9095821    -.825605
       _cons    -1.688897          .        .       .            .           .


The header contains information about the sample, the fit of each equation, and overall model
fit statistics. The first estimation table contains the estimates of the short-run parameters, along with
their standard errors, z statistics, and confidence intervals. The two coefficients on L._ce1 are the
parameters in the adjustment matrix $\alpha$ for this model. The second estimation table contains the
estimated parameters of the cointegrating vector for this model, along with their standard errors, z
statistics, and confidence intervals.
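To see how the columns of the second estimation table relate to one another, the z statistic and the 95% confidence interval for [_ce1]houston can be rebuilt from the reported coefficient and standard error. This is only a sketch of the usual normal-based calculation; the scalar names are illustrative and are not results stored by vec.

```stata
* Rebuild z and the 95% confidence interval for [_ce1]houston from the
* reported coefficient and standard error. Scalar names are illustrative.
scalar b_h  = -.8675936
scalar se_h =  .0214231
scalar z_h  = b_h/se_h
scalar lb_h = b_h - invnormal(.975)*se_h
scalar ub_h = b_h + invnormal(.975)*se_h
display "z  = " %7.2f z_h
display "lower bound = " %9.7f lb_h
display "upper bound = " %9.7f ub_h
```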
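Using our previous notation, we have estimated

$$\hat{\alpha} = (-0.304,\ 0.503), \qquad \hat{\beta} = (1,\ -0.868),$$

$$\hat{\Gamma} = \begin{pmatrix} -0.165 & -0.0998 \\ -0.062 & -0.333 \end{pmatrix}, \qquad \text{and} \qquad \hat{v} = (0.0056,\ 0.0034)$$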

Overall, the output indicates that the model fits well. The coefficient on houston in the cointegrating
equation is statistically significant, as are the adjustment parameters. The adjustment parameters in
this bivariate example are easy to interpret, and we can see that the estimates have the correct
signs and imply rapid adjustment toward equilibrium. When the predictions from the cointegrating
equation are positive, dallas is above its equilibrium value because the coefficient on dallas in the
cointegrating equation is positive. The estimate of the coefficient [D_dallas]L._ce1 is -.3. Thus
when the average housing price in Dallas is too high, it quickly falls back toward the Houston level.
The estimated coefficient [D_houston]L._ce1 of .5 implies that when the average housing price in
Dallas is too high, the average price in Houston quickly adjusts toward the Dallas level at the same
time that the Dallas prices are adjusting.
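One way to put a number on "rapid adjustment" is to combine the two adjustment coefficients with the cointegrating vector. The sketch below computes the implied one-month change in the cointegrating relation per unit of lagged disequilibrium, under the simplifying assumption that the short-run terms and the constants are ignored. The scalar names are illustrative; the values are the estimates reported above.

```stata
* Implied one-month change in (dallas + b_houston*houston) per unit of
* lagged _ce1, ignoring the short-run terms and the constants.
* Scalar names are illustrative.
scalar a_dallas  = -.3038799
scalar a_houston =  .5027143
scalar b_houston = -.8675936
scalar speed = a_dallas + b_houston*a_houston
display "change in _ce1 per unit of disequilibrium: " %7.4f speed
```

A value near -0.74 suggests that, under this simplification, roughly three quarters of a deviation from equilibrium is removed within one month.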
Fitting VECMs with Johansen's normalization
As discussed by Johansen (1995), if there are r cointegrating equations, then at least $r^2$ restrictions
are required to identify the free parameters in $\beta$. Johansen proposed a default identification scheme
that has become the conventional method of identifying models in the absence of theoretically justified
restrictions. Johansen's identification scheme is

$$\beta' = (I_r,\ \tilde{\beta}')$$

where $I_r$ is the $r \times r$ identity matrix and $\tilde{\beta}$ is a $(K - r) \times r$ matrix of identified parameters. vec
applies Johansen's normalization by default.
To illustrate, we fit a VECM with two cointegrating equations and three lags on all four series. We
are interested only in the estimates of the parameters in the cointegrating equations, so we can specify
the noetable option to suppress the estimation table for the adjustment and short-run parameters.



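. vec austin dallas houston sa, lags(3) rank(2) noetable
Vector error-correction model
Sample:  1990m4 - 2003m12                          No. of obs      =        165
                                                   AIC             =  -13.40174
Log likelihood =  1153.644                         HQIC            =  -13.03496
Det(Sigma_ml)  =  9.93e-12                         SBIC            =  -12.49819

Cointegrating equations

Equation           Parms    chi2     P>chi2
-------------------------------------------
_ce1                  2   586.3044   0.0000
_ce2                  2   2169.826   0.0000
-------------------------------------------

Identification:  beta is exactly identified
                 Johansen normalization restrictions imposed
------------------------------------------------------------------------------
        beta |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
_ce1         |
      austin |          1          .        .       .            .           .
      dallas |  -1.30e-17          .        .       .            .           .
     houston |  -.2623782   .1893625    -1.39   0.166    -.6335219    .1087655
          sa |  -1.241805    .229643    -5.41   0.000    -1.691897   -.7917128
       _cons |   5.577099          .        .       .            .           .
-------------+----------------------------------------------------------------
_ce2         |
      austin |  -1.41e-18          .        .       .            .           .
      dallas |          1          .        .       .            .           .
     houston |  -1.095652   .0669898   -16.36   0.000     -1.22695   -.9643545
          sa |   .2883986   .0812396     3.55   0.000     .1291718    .4476253
       _cons |  -2.351372          .        .       .            .           .
------------------------------------------------------------------------------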

The Johansen identification scheme has placed four constraints on the parameters in $\beta$:
[_ce1]austin=1, [_ce1]dallas=0, [_ce2]austin=0, and [_ce2]dallas=1. (The computational method
used imposes zero restrictions that are numerical rather than exact. The values -1.30e-17 and
-1.41e-18 are indistinguishable from zero.) We interpret the results of the first equation as
indicating the existence of an equilibrium relationship between the average housing price in Austin
and the average prices of houses in Houston and San Antonio.
The Johansen normalization restricted the coefficient on dallas to be unity in the second
cointegrating equation, but we could instead constrain the coefficient on houston. Both sets of
restrictions define just-identified models, so fitting the model with the latter set of restrictions will
yield the same maximized log likelihood. To impose the alternative set of constraints, we use the
constraint command.
. constraint define 1 [_ce1]austin = 1
. constraint define 2 [_ce1]dallas = 0
. constraint define 3 [_ce2]austin = 0
. constraint define 4 [_ce2]houston = 1


. vec austin dallas houston sa, lags(3) rank(2) noetable bconstraints(1/4)
Iteration 1:     log likelihood =  1148.8745
 (output omitted)
Iteration 25:    log likelihood =  1153.6435
Vector error-correction model
Sample:  1990m4 - 2003m12                          No. of obs      =        165
                                                   AIC             =  -13.40174
Log likelihood =  1153.644                         HQIC            =  -13.03496
Det(Sigma_ml)  =  9.93e-12                         SBIC            =  -12.49819

Cointegrating equations

Equation           Parms    chi2     P>chi2
-------------------------------------------
_ce1                  2   586.3392   0.0000
_ce2                  2   3455.469   0.0000
-------------------------------------------

Identification:  beta is exactly identified
 ( 1)  [_ce1]austin = 1
 ( 2)  [_ce1]dallas = 0
 ( 3)  [_ce2]austin = 0
 ( 4)  [_ce2]houston = 1
------------------------------------------------------------------------------
        beta |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
_ce1         |
      austin |          1          .        .       .            .           .
      dallas |          0  (omitted)
     houston |  -.2623784   .1876727    -1.40   0.162    -.6302102    .1054534
          sa |  -1.241805   .2277537    -5.45   0.000    -1.688194   -.7954157
       _cons |   5.577099          .        .       .            .           .
-------------+----------------------------------------------------------------
_ce2         |
      austin |          0  (omitted)
      dallas |  -.9126985   .0595804   -15.32   0.000    -1.029474   -.7959231
     houston |          1          .        .       .            .           .
          sa |  -.2632209   .0628791    -4.19   0.000    -.3864617   -.1399802
       _cons |   2.146094          .        .       .            .           .
------------------------------------------------------------------------------

Only the estimates of the parameters in the second cointegrating equation have changed, and the
new estimates are simply the old estimates divided by -1.095652 because the new constraints are
just an alternative normalization of the same just-identified model. With the new normalization, we
can interpret the estimates of the parameters in the second cointegrating equation as indicating an
equilibrium relationship between the average house price in Houston and the average prices of houses
in Dallas and San Antonio.
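The renormalization can be checked by hand: dividing each coefficient of the second cointegrating equation under the Johansen normalization by the old coefficient on houston should reproduce the constrained estimates. The sketch below does this with the reported values. The scalar names are illustrative and are not created by vec.

```stata
* Divide the Johansen-normalized _ce2 coefficients by the old houston
* coefficient and compare with the constrained estimates.
* Scalar names are illustrative.
scalar h_old      = -1.095652
scalar dallas_new = 1/h_old
scalar sa_new     = .2883986/h_old
scalar cons_new   = -2.351372/h_old
display "dallas: " %9.7f dallas_new
display "sa:     " %9.7f sa_new
display "_cons:  " %9.7f cons_new
```

Up to rounding in the displayed output, these match the constrained estimates -.9126985, -.2632209, and 2.146094.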
Postestimation specification testing
Inference on the parameters in $\alpha$ depends crucially on the stationarity of the cointegrating equations,
so we should check the specification of the model. As a first check, we can predict the cointegrating
equations and graph them over time.
. predict ce1, ce equ(#1)
. predict ce2, ce equ(#2)

. twoway line ce1 t
[Figure: predicted cointegrating equation ce1 plotted against t, 1990m1 to 2005m1]

. twoway line ce2 t
[Figure: predicted cointegrating equation ce2 plotted against t, 1990m1 to 2005m1]

Although the large shocks apparent in the graph of the levels have clear effects on the predictions
from the cointegrating equations, our only concern is the negative trend in the first cointegrating
equation since the end of 2000. The graph of the levels shows that something put a significant brake
on the growth of housing prices after 2000 and that the growth of housing prices in San Antonio
slowed during 2000 but then recuperated while Austin maintained slower growth. We suspect that
this indicates that the end of the high-tech boom affected Austin more severely than San Antonio.
This difference is what causes the trend in the first cointegrating equation. Although we could try to
account for this effect with a more formal analysis, we will proceed as if the cointegrating equations
are stationary.
We can use vecstable to check whether we have correctly specified the number of cointegrating
equations. As discussed in [TS] vecstable, the companion matrix of a VECM with K endogenous
variables and r cointegrating equations has K - r unit eigenvalues. If the process is stable, the moduli
of the remaining eigenvalues are strictly less than one. Because there is no general distribution
theory for the moduli of the eigenvalues, ascertaining whether the moduli are too close to one can
be difficult.
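. vecstable, graph
   Eigenvalue stability condition
  +----------------------------------------+
  |        Eigenvalue        |   Modulus   |
  |--------------------------+-------------|
  |          1               |      1      |
  |          1               |      1      |
  |  -.6698661               |   .669866   |
  |   .3740191 +  .4475996i  |   .583297   |
  |   .3740191 -  .4475996i  |   .583297   |
  |   -.386377 +   .395972i  |   .553246   |
  |   -.386377 -   .395972i  |   .553246   |
  |    .540117               |   .540117   |
  |  -.0749239 +  .5274203i  |   .532715   |
  |  -.0749239 -  .5274203i  |   .532715   |
  |  -.2023955               |   .202395   |
  |  .09923966               |    .09924   |
  +----------------------------------------+
   The VECM specification imposes 2 unit moduli.

[Figure: Roots of the companion matrix, plotted in the complex plane (Real on the horizontal axis, Imaginary on the vertical axis); the VECM specification imposes 2 unit moduli]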

Because we specified the graph option, vecstable plotted the eigenvalues of the companion
matrix. The graph of the eigenvalues shows that none of the remaining eigenvalues appears close to
the unit circle. The stability check does not indicate that our model is misspecified.
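The Modulus column is the distance of each eigenvalue from the origin, and the number of imposed unit moduli is K - r. As a small worked check, the sketch below recomputes the modulus of the first complex pair and the count of unit moduli for this model with K = 4 and r = 2. The scalar names are illustrative only.

```stata
* Modulus of the eigenvalue pair .3740191 +/- .4475996i, and the number of
* unit moduli implied by K = 4 variables and r = 2 cointegrating equations.
* Scalar names are illustrative.
scalar ev_re  = .3740191
scalar ev_im  = .4475996
scalar ev_mod = sqrt(ev_re^2 + ev_im^2)
scalar n_unit = 4 - 2
display "modulus     = " %8.6f ev_mod
display "unit moduli = " n_unit
```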
Here we use veclmar to test for serial correlation in the residuals.
. veclmar, mlag(4)
   Lagrange-multiplier test
  +--------------------------------------+
  | lag |      chi2    df   Prob > chi2  |
  |-----+--------------------------------|
  |   1 |   56.8757    16     0.00000    |
  |   2 |   31.1970    16     0.01270    |
  |   3 |   30.6818    16     0.01477    |
  |   4 |   14.6493    16     0.55046    |
  +--------------------------------------+
   H0: no autocorrelation at lag order

The results clearly indicate serial correlation in the residuals. The results in Gonzalo (1994) indicate
that underspecifying the number of lags in a VECM can significantly increase the finite-sample bias
in the parameter estimates and lead to serial correlation. For this reason, we refit the model with five
lags instead of three.
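The p-values in the veclmar table are upper-tail chi-squared probabilities with 16 degrees of freedom, which equals K squared for the K = 4 variables in this model. A minimal sketch that recomputes two of them from the reported statistics follows. The scalar names are illustrative.

```stata
* Upper-tail chi-squared probabilities for the LM statistics at lags 1 and 4,
* each with 16 degrees of freedom. Scalar names are illustrative.
scalar p_lag1 = chi2tail(16, 56.8757)
scalar p_lag4 = chi2tail(16, 14.6493)
display "lag 1: " %7.5f p_lag1
display "lag 4: " %7.5f p_lag4
```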
. vec austin dallas houston sa, lags(5) rank(2) noetable bconstraints(1/4)
Iteration 1:     log likelihood =  1200.5402
 (output omitted)
Iteration 20:    log likelihood =  1203.9465
Vector error-correction model
Sample:  1990m6 - 2003m12                          No. of obs      =        163
                                                   AIC             =  -13.79075
Log likelihood =  1203.946                         HQIC            =   -13.1743
Det(Sigma_ml)  =  4.51e-12                         SBIC            =  -12.27235

Cointegrating equations

Equation           Parms    chi2     P>chi2
-------------------------------------------
_ce1                  2   498.4682   0.0000
_ce2                  2   4125.926   0.0000
-------------------------------------------

Identification:  beta is exactly identified
 ( 1)  [_ce1]austin = 1
 ( 2)  [_ce1]dallas = 0
 ( 3)  [_ce2]austin = 0
 ( 4)  [_ce2]houston = 1
------------------------------------------------------------------------------
        beta |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
_ce1         |
      austin |          1          .        .       .            .           .
      dallas |          0  (omitted)
     houston |  -.6525574   .2047061    -3.19   0.001    -1.053774   -.2513407
          sa |  -.6960166   .2494167    -2.79   0.005    -1.184864   -.2071688
       _cons |   3.846275          .        .       .            .           .
-------------+----------------------------------------------------------------
_ce2         |
      austin |          0  (omitted)
      dallas |   -.932048   .0564332   -16.52   0.000    -1.042655   -.8214409
     houston |          1          .        .       .            .           .
          sa |  -.2363915   .0599348    -3.94   0.000    -.3538615   -.1189215
       _cons |   2.065719          .        .       .            .           .
------------------------------------------------------------------------------

Comparing these results with those from the previous model reveals that
1. there is now evidence that the coefficient [_ce1]houston is not equal to zero,
2. the two sets of estimated coefficients for the first cointegrating equation are different, and
3. the two sets of estimated coefficients for the second cointegrating equation are similar.
The assumption that the errors are independently, identically, and normally distributed with zero
mean and finite variance allows us to derive the likelihood function. If the errors do not come from
a normal distribution but are just independently and identically distributed with zero mean and finite
variance, the parameter estimates are still consistent, but they are not efficient.


We use vecnorm to test the null hypothesis that the errors are normally distributed.
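. qui vec austin dallas houston sa, lags(5) rank(2) bconstraints(1/4)
. vecnorm
   Jarque-Bera test
  +--------------------------------------------------------+
  |      Equation |            chi2    df   Prob > chi2    |
  |---------------+----------------------------------------|
  |      D_austin |          74.324     2     0.00000      |
  |      D_dallas |           3.501     2     0.17370      |
  |     D_houston |         245.032     2     0.00000      |
  |          D_sa |           8.426     2     0.01481      |
  |           ALL |         331.283     8     0.00000      |
  +--------------------------------------------------------+
   Skewness test
  +--------------------------------------------------------+
  |      Equation | Skewness    chi2    df   Prob > chi2   |
  |---------------+----------------------------------------|
  |      D_austin |   .60265    9.867    1     0.00168     |
  |      D_dallas |   .09996    0.271    1     0.60236     |
  |     D_houston |  -1.0444   29.635    1     0.00000     |
  |          D_sa |   .38019    3.927    1     0.04752     |
  |           ALL |            43.699    4     0.00000     |
  +--------------------------------------------------------+
   Kurtosis test
  +--------------------------------------------------------+
  |      Equation | Kurtosis    chi2    df   Prob > chi2   |
  |---------------+----------------------------------------|
  |      D_austin |   6.0807   64.458    1     0.00000     |
  |      D_dallas |   3.6896    3.229    1     0.07232     |
  |     D_houston |   8.6316  215.397    1     0.00000     |
  |          D_sa |   3.8139    4.499    1     0.03392     |
  |           ALL |           287.583    4     0.00000     |
  +--------------------------------------------------------+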

The results indicate that we can strongly reject the null hypothesis of normally distributed errors.
Most of the errors are both skewed and kurtotic.
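The three vecnorm tables are linked: for each equation, the Jarque-Bera statistic is the sum of the skewness and kurtosis statistics. The sketch below rebuilds the D_austin row from the reported moments, assuming the usual forms T*skewness^2/6 and T*(kurtosis - 3)^2/24 with T = 163 observations. These formulas are our assumption here, although they do reproduce the reported values. The scalar names are illustrative.

```stata
* Rebuild the D_austin normality statistics from the reported skewness and
* kurtosis, using T = 163 observations. Scalar names are illustrative.
scalar T_obs  = 163
scalar skew_a = .60265
scalar kurt_a = 6.0807
scalar chi_sk = T_obs*skew_a^2/6
scalar chi_ku = T_obs*(kurt_a - 3)^2/24
scalar jb_a   = chi_sk + chi_ku
display "skewness chi2 = " %8.3f chi_sk
display "kurtosis chi2 = " %8.3f chi_ku
display "Jarque-Bera   = " %8.3f jb_a
```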
Impulse-response functions for VECMs
With a model that we now consider acceptably well specified, we can use the irf commands to
estimate and interpret the IRFs. Whereas IRFs from a stationary VAR die out over time, IRFs from a
cointegrating VECM do not always die out. Because each variable in a stationary VAR has a
time-invariant mean and finite, time-invariant variance, the effect of a shock to any one of these variables
must die out so that the variable can revert to its mean. In contrast, the I(1) variables modeled in a
cointegrating VECM are not mean reverting, and the unit moduli in the companion matrix imply that
the effects of some shocks will not die out over time.
These two possibilities gave rise to new terms. When the effect of a shock dies out over time, the
shock is said to be transitory. When the effect of a shock does not die out over time, the shock is
said to be permanent.
Below we use irf create to estimate the IRFs and irf graph to graph two of the orthogonalized
IRFs.

. irf create vec1, set(vecintro, replace) step(24)
(file vecintro.irf created)
(file vecintro.irf now active)
(file vecintro.irf updated)
. irf graph oirf, impulse(austin dallas) response(sa) yline(0)
[Figure: orthogonalized IRFs in two panels, "vec1, austin, sa" and "vec1, dallas, sa", plotted against step; graphs by irfname, impulse variable, and response variable]

The graphs indicate that an orthogonalized shock to the average housing price in Austin has a
permanent effect on the average housing price in San Antonio but that an orthogonalized shock to
the average price of housing in Dallas has a transitory effect. According to this model, unexpected
shocks that are local to the Austin housing market will have a permanent effect on the housing market
in San Antonio, but unexpected shocks that are local to the Dallas housing market will have only a
transitory effect on the housing market in San Antonio.
Forecasting with VECMs
Cointegrating VECMs are also used to produce forecasts of both the first-differenced variables and
the levels of the variables. Comparing the variances of the forecast errors of stationary VARs with
those from a cointegrating VECM reveals a fundamental difference between the two models. Whereas
the variances of the forecast errors for a stationary VAR converge to a constant as the prediction
horizon grows, the variances of the forecast errors for the levels of a cointegrating VECM diverge
with the forecast horizon. (See sec. 6.5 of Lütkepohl [2005] for more about this result.) Because all
the variables in the model for the first differences are stationary, the forecast errors for the dynamic
forecasts of the first differences remain finite. In contrast, the forecast errors for the dynamic forecasts
of the levels diverge to infinity.
We use fcast compute to obtain dynamic forecasts of the levels and fcast graph to graph
these dynamic forecasts, along with their asymptotic confidence intervals.

. tsset
        time variable:  t, 1990m1 to 2003m12
                delta:  1 month
. fcast compute m1_, step(24)
. fcast graph m1_austin m1_dallas m1_houston m1_sa
[Figure: four panels, "Forecast for austin", "Forecast for dallas", "Forecast for houston", and "Forecast for sa", each showing the forecast and its 95% CI from 2004m1 to 2006m1]

As expected, the widths of the confidence intervals grow with the forecast horizon.

References
Ahn, S. K., and G. C. Reinsel. 1990. Estimation for partially nonstationary multivariate autoregressive models. Journal
of the American Statistical Association 85: 813-823.
Becketti, S. 2013. Introduction to Time Series Using Stata. College Station, TX: Stata Press.
Engle, R. F., and C. W. J. Granger. 1987. Co-integration and error correction: Representation, estimation, and testing.
Econometrica 55: 251-276.
Gonzalo, J. 1994. Five alternative methods of estimating long-run equilibrium relationships. Journal of Econometrics
60: 203-233.
Granger, C. W. J. 1981. Some properties of time series data and their use in econometric model specification. Journal
of Econometrics 16: 121-130.
Granger, C. W. J., and P. Newbold. 1974. Spurious regressions in econometrics. Journal of Econometrics 2: 111-120.
Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
Hubrich, K., H. Lütkepohl, and P. Saikkonen. 2001. A review of systems cointegration tests. Econometric Reviews
20: 247-318.
Johansen, S. 1988. Statistical analysis of cointegration vectors. Journal of Economic Dynamics and Control 12:
231-254.
Johansen, S. 1991. Estimation and hypothesis testing of cointegration vectors in Gaussian vector autoregressive models.
Econometrica 59: 1551-1580.
Johansen, S. 1995. Likelihood-Based Inference in Cointegrated Vector Autoregressive Models. Oxford: Oxford University
Press.
Lütkepohl, H. 2005. New Introduction to Multiple Time Series Analysis. New York: Springer.
Maddala, G. S., and I.-M. Kim. 1998. Unit Roots, Cointegration, and Structural Change. Cambridge: Cambridge
University Press.
Nielsen, B. 2001. Order determination in general vector autoregressions. Working paper, Department of Economics,
University of Oxford and Nuffield College. http://ideas.repec.org/p/nuf/econwp/0110.html.
Park, J. Y., and P. C. B. Phillips. 1988. Statistical inference in regressions with integrated processes: Part I. Econometric
Theory 4: 468-497.
Park, J. Y., and P. C. B. Phillips. 1989. Statistical inference in regressions with integrated processes: Part II. Econometric Theory 5: 95-131.
Paulsen, J. 1984. Order determination of multivariate autoregressive time series with unit roots. Journal of Time Series
Analysis 5: 115-127.
Phillips, P. C. B. 1986. Understanding spurious regressions in econometrics. Journal of Econometrics 33: 311-340.
Phillips, P. C. B., and S. N. Durlauf. 1986. Multiple time series regressions with integrated processes. Review of
Economic Studies 53: 473-495.
Sims, C. A., J. H. Stock, and M. W. Watson. 1990. Inference in linear time series models with some unit roots.
Econometrica 58: 113-144.
Stock, J. H. 1987. Asymptotic properties of least squares estimators of cointegrating vectors. Econometrica 55:
1035-1056.
Stock, J. H., and M. W. Watson. 1988. Testing for common trends. Journal of the American Statistical Association
83: 1097-1107.
Tsay, R. S. 1984. Order selection in nonstationary autoregressive models. Annals of Statistics 12: 1425-1433.
Watson, M. W. 1994. Vector autoregressions and cointegration. In Vol. 4 of Handbook of Econometrics, ed. R. F.
Engle and D. L. McFadden. Amsterdam: Elsevier.

Also see
[TS] vec Vector error-correction models
[TS] irf Create and analyze IRFs, dynamic-multiplier functions, and FEVDs

Title
vec Vector error-correction models
Syntax
Remarks and examples
Also see

Menu
Stored results

Description
Methods and formulas

Options
References

Syntax
    vec depvarlist [if] [in] [, options]

options                         Description
-------------------------------------------------------------------------------
Model
  rank(#)                       use # cointegrating equations; default is rank(1)
  lags(#)                       use # for the maximum lag in underlying VAR model
  trend(constant)               include an unrestricted constant in model; the default
  trend(rconstant)              include a restricted constant in model
  trend(trend)                  include a linear trend in the cointegrating equations and a
                                  quadratic trend in the undifferenced data
  trend(rtrend)                 include a restricted trend in model
  trend(none)                   do not include a trend or a constant
  bconstraints(constraints_bc)  place constraints_bc on cointegrating vectors
  aconstraints(constraints_ac)  place constraints_ac on adjustment parameters

Adv. model
  sindicators(varlist_si)       include normalized seasonal indicator variables varlist_si
  noreduce                      do not perform checks and corrections for collinearity among
                                  lags of dependent variables

Reporting
  level(#)                      set confidence level; default is level(95)
  nobtable                      do not report parameters in the cointegrating equations
  noidtest                      do not report the likelihood-ratio test of overidentifying
                                  restrictions
  alpha                         report adjustment parameters in separate table
  pi                            report parameters in $\Pi = \alpha\beta'$
  noptable                      do not report elements of $\Pi$ matrix
  mai                           report parameters in the moving-average impact matrix
  noetable                      do not report adjustment and short-run parameters
  dforce                        force reporting of short-run, beta, and alpha parameters when
                                  the parameters in beta are not identified; advanced option
  nocnsreport                   do not display constraints
  display_options               control column formats, row spacing, and line width

Maximization
  maximize_options              control the maximization process; seldom used

  coeflegend                    display legend instead of statistics
-------------------------------------------------------------------------------

vec does not allow gaps in the data.


You must tsset your data before using vec; see [TS] tsset.
varlist must contain at least two variables and may contain time-series operators; see [U] 11.4.4 Time-series varlists.
by, fp, rolling, statsby, and xi are allowed; see [U] 11.1.10 Prefix commands.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.

Menu
Statistics > Multivariate time series > Vector error-correction model (VECM)

Description
vec fits a type of vector autoregression in which some of the variables are cointegrated by using
Johansen's (1995) maximum likelihood method. Constraints may be placed on the parameters in the
cointegrating equations or on the adjustment terms. See [TS] vec intro for a list of commands that
are used in conjunction with vec.

Options


Model

rank(#) specifies the number of cointegrating equations; rank(1) is the default.


lags(#) specifies the maximum lag to be included in the underlying VAR model. The maximum lag
in a VECM is one smaller than the maximum lag in the corresponding VAR in levels; the number
of lags must be greater than zero but small enough so that the degrees of freedom used up by the
model are fewer than the number of observations. The default is lags(2).
trend(trend_spec) specifies which of Johansen's five trend specifications to include in the model.
These specifications are discussed in Specification of constants and trends below. The default is
trend(constant).
bconstraints(constraints_bc) specifies the constraints to be placed on the parameters of the
cointegrating equations. When no constraints are placed on the adjustment parameters (that is, when
the aconstraints() option is not specified), the default is to place the constraints defined by
Johansen's normalization on the parameters of the cointegrating equations. When constraints are
placed on the adjustment parameters, the default is not to place constraints on the parameters in
the cointegrating equations.
aconstraints(constraints_ac) specifies the constraints to be placed on the adjustment parameters.
By default, no constraints are placed on the adjustment parameters.

Adv. model

sindicators(varlist_si) specifies the normalized seasonal indicator variables to include in the model.
The indicator variables specified in this option must be normalized as discussed in Johansen (1995).
If the indicators are not properly normalized, the estimator of the cointegrating vector does not
converge to the asymptotic distribution derived by Johansen (1995). More details about how these
variables are handled are provided in Methods and formulas. sindicators() cannot be specified
with trend(none) or with trend(rconstant).


noreduce causes vec to skip the checks and corrections for collinearity among the lags of the
dependent variables. By default, vec checks to see whether the current lag specification causes
some of the regressions performed by vec to contain perfectly collinear variables; if so, it reduces
the maximum lag until the perfect collinearity is removed.

Reporting

level(#); see [R] estimation options.


nobtable suppresses the estimation table for the parameters in the cointegrating equations. By default,
vec displays the estimation table for the parameters in the cointegrating equations.
noidtest suppresses the likelihood-ratio test of the overidentifying restrictions, which is reported
by default when the model is overidentified.
alpha displays a separate estimation table for the adjustment parameters, which is not displayed by
default.
pi displays a separate estimation table for the parameters in $\Pi = \alpha\beta'$, which is not displayed by
default.
noptable suppresses the estimation table for the elements of the $\Pi$ matrix, which is displayed by
default when the parameters in the cointegrating equations are not identified.
mai displays a separate estimation table for the parameters in the moving-average impact matrix,
which is not displayed by default.
noetable suppresses the main estimation table that contains information about the estimated adjustment
parameters and the short-run parameters, which is displayed by default.
dforce displays the estimation tables for the short-run parameters and for $\alpha$ and $\beta$ (if the last two are
requested) when the parameters in $\beta$ are not identified. By default, when the specified constraints
do not identify the parameters in the cointegrating equations, estimation tables are displayed only
for $\Pi$ and the MAI.
nocnsreport; see [R] estimation options.

display_options: vsquish, cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch;
see [R] estimation options.

Maximization

maximize_options: iterate(#), nolog, trace, toltrace, tolerance(#), ltolerance(#),
afrom(matrix_a), and bfrom(matrix_b); see [R] maximize.
toltrace displays the relative differences for the log likelihood and the coefficient vector at every
iteration. This option cannot be specified if no constraints are defined or if nolog is specified.
afrom(matrix_a) specifies a $1 \times (K * r)$ row vector with starting values for the adjustment parameters,
where K is the number of endogenous variables and r is the number of cointegrating equations
specified in the rank() option. The starting values should be ordered as they are reported in
e(alpha). This option cannot be specified if no constraints are defined.
bfrom(matrix_b) specifies a $1 \times (m_1 * r)$ row vector with starting values for the parameters of the
cointegrating equations, where $m_1$ is the number of variables in the trend-augmented system and
r is the number of cointegrating equations specified in the rank() option. (See Methods and
formulas for more details about $m_1$.) The starting values should be ordered as they are reported
in e(betavec). As discussed in Methods and formulas, for some trend specifications, e(beta)
contains parameter estimates that are not obtained directly from the optimization algorithm.
bfrom() should specify only starting values for the parameters reported in e(betavec). This
option cannot be specified if no constraints are defined.


The following option is available with vec but is not shown in the dialog box:
coeflegend; see [R] estimation options.

Remarks and examples


Remarks are presented under the following headings:
Introduction
Specification of constants and trends
Collinearity

Introduction
VECMs are used to model the stationary relationships between multiple time series that contain
unit roots. vec implements Johansen's approach for estimating the parameters of a VECM.

[TS] vec intro reviews the basics of integration and cointegration and highlights why we need
special methods for modeling the relationships between processes that contain unit roots. This manual
entry assumes familiarity with the material in [TS] vec intro and provides examples illustrating how to
use the vec command. See Johansen (1995), Hamilton (1994), and Becketti (2013) for more in-depth
introductions to cointegration analysis.

Example 1
This example uses annual data on the average per-capita disposable personal income in the eight
U.S. Bureau of Economic Analysis (BEA) regions of the United States. We use data from 1948-2002
in logarithms. Unit-root tests on these series fail to reject the null hypothesis that per-capita disposable
income in each region contains a unit root. Because capital and labor can move easily between the
different regions of the United States, we would expect that no one series will diverge from all the
remaining series and that cointegrating relationships exist.
Below we graph the natural logs of average disposable income in the New England and the Southeast
regions.

. use http://www.stata-press.com/data/r13/rdinc
. line ln_ne ln_se year
[Figure: ln(new_england) and ln(southeast) plotted against year, 1950 to 2000]

The graph indicates a differential between the two series that shrinks between 1960 and about
1980 and then grows until it stabilizes around 1990. We next estimate the parameters of a bivariate
VECM with one cointegrating relationship.
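. vec ln_ne ln_se
Vector error-correction model
Sample:  1950 - 2002                               No. of obs      =         53
                                                   AIC             =  -11.00462
Log likelihood =  300.6224                         HQIC            =  -10.87595
Det(Sigma_ml)  =  4.06e-08                         SBIC            =  -10.67004

Equation           Parms      RMSE     R-sq      chi2     P>chi2
----------------------------------------------------------------
D_ln_ne               4     .017896   0.9313   664.4668   0.0000
D_ln_se               4     .018723   0.9292   642.7179   0.0000
----------------------------------------------------------------

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
D_ln_ne      |
        _ce1 |
         L1. |  -.4337524   .0721365    -6.01   0.000    -.5751373   -.2923675
             |
       ln_ne |
         LD. |   .7168658   .1889085     3.79   0.000     .3466119     1.08712
             |
       ln_se |
         LD. |  -.6748754   .2117975    -3.19   0.001    -1.089991   -.2597599
             |
       _cons |  -.0019846   .0080291    -0.25   0.805    -.0177214    .0137521
-------------+----------------------------------------------------------------
D_ln_se      |
        _ce1 |
         L1. |  -.3543935   .0754725    -4.70   0.000    -.5023168   -.2064701
             |
       ln_ne |
         LD. |   .3366786   .1976448     1.70   0.088     -.050698    .7240553
             |
       ln_se |
         LD. |  -.1605811   .2215922    -0.72   0.469    -.5948939    .2737317
             |
       _cons |    .002429   .0084004     0.29   0.772    -.0140355    .0188936
------------------------------------------------------------------------------

Cointegrating equations

Equation           Parms    chi2     P>chi2
-------------------------------------------
_ce1                  1   29805.02   0.0000
-------------------------------------------

Identification:  beta is exactly identified
                 Johansen normalization restriction imposed
------------------------------------------------------------------------------
        beta |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
_ce1         |
       ln_ne |          1          .        .       .            .           .
       ln_se |  -.9433708   .0054643  -172.64   0.000    -.9540807   -.9326609
       _cons |  -.8964065          .        .       .            .           .
------------------------------------------------------------------------------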

The default output has three parts. The header provides information about the sample, the model
fit, and the identification of the parameters in the cointegrating equation. The main estimation table
contains the estimates of the short-run parameters, along with their standard errors and confidence
intervals. The second estimation table reports the estimates of the parameters in the cointegrating
equation, along with their standard errors and confidence intervals.
The results indicate strong support for a cointegrating equation such that

        ln_ne - .943 ln_se - .896

should be a stationary series. Identification of the parameters in the cointegrating equation is achieved
by constraining some of them to be fixed, and fixed parameters do not have standard errors. In this
example, the coefficient on ln_ne has been normalized to 1, so its standard error is missing. As
discussed in Methods and formulas, the constant term in the cointegrating equation is not directly
estimated in this trend specification but rather is backed out from other estimates. Not all the elements
of the VCE that correspond to this parameter are readily available, so the standard error for the _cons
parameter is missing.
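
The one freely estimated coefficient in the cointegrating equation can be checked by hand. The following Python sketch is not part of vec; it simply recomputes the z statistic and the 95% confidence interval for ln_se from the rounded coefficient and standard error printed above, so agreement is only up to rounding.

```python
from statistics import NormalDist

# Values copied from the ln_se row of the cointegrating-equation table.
coef = -0.9433708
se = 0.0054643

z = coef / se
crit = NormalDist().inv_cdf(0.975)   # two-sided 95% normal critical value
ci_low = coef - crit * se
ci_high = coef + crit * se

print(f"z = {z:.2f}")
print(f"95% CI = [{ci_low:.7f}, {ci_high:.7f}]")
```

The fixed coefficient on ln_ne and the backed-out _cons have no standard error, so no comparable check is possible for them.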
To get a better idea of how our model fits, we predict the cointegrating equation and graph it over
time:

. predict ce, ce
. line ce year

[Figure: "Predicted cointegrated equation" (y axis, .05 to .25) plotted against year (1950 to 2000)]


Although the predicted cointegrating equation has the right appearance for the time before the
mid-1960s, afterward the predicted cointegrating equation does not look like a stationary series. A
better model would account for the trends in the size of the differential.

As discussed in [TS] vec intro, simply normalizing one of the coefficients to be one is sufficient to
identify the parameters of the single cointegrating vector. When there is more than one cointegrating
equation, more restrictions are required.

Example 2
We have data on monthly unemployment rates in Indiana, Illinois, Kentucky, and Missouri from
January 1978 through December 2003. We suspect that factor mobility will keep the unemployment
rates in equilibrium. The following graph plots the data.

. use http://www.stata-press.com/data/r13/urates, clear
. line missouri indiana kentucky illinois t

[Figure: missouri, indiana, kentucky, and illinois plotted against t (1980m1 to 2005m1)]


The graph shows that although the series do appear to move together, the relationship is not as clear
as in the previous example. There are periods when Indiana has the highest rate and others when
Indiana has the lowest rate. Although the Kentucky rate moves closely with the other series for most
of the sample, there is a period in the mid-1980s when the unemployment rate in Kentucky does not
fall at the same rate as the other series.
We will model the series with two cointegrating equations and no linear or quadratic time trends
in the original series. Because we are focusing on the cointegrating vectors, we use the noetable
option to suppress displaying the short-run estimation table.



. vec missouri indiana kentucky illinois, trend(rconstant) rank(2) lags(4)
> noetable

Vector error-correction model

Sample:  1978m5 - 2003m12                       No. of obs      =        308
                                                AIC             =  -2.306048
Log likelihood =  417.1314                      HQIC            =  -2.005818
Det(Sigma_ml)  =  7.83e-07                      SBIC            =  -1.555184

Cointegrating equations

Equation           Parms    chi2     P>chi2
-------------------------------------------
_ce1                  2   133.3885   0.0000
_ce2                  2   195.6324   0.0000
-------------------------------------------

Identification:  beta is exactly identified

                 Johansen normalization restrictions imposed
------------------------------------------------------------------------------
        beta |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
_ce1         |
    missouri |          1          .        .       .            .           .
     indiana |  -2.52e-18          .        .       .            .           .
    kentucky |   .3493902   .2005537     1.74   0.081    -.0436879    .7424683
    illinois |  -1.135152   .2069063    -5.49   0.000    -1.540681   -.7296235
       _cons |  -.3880707   .4974323    -0.78   0.435     -1.36302    .5868787
-------------+----------------------------------------------------------------
_ce2         |
    missouri |   9.30e-17          .        .       .            .           .
     indiana |          1          .        .       .            .           .
    kentucky |   .2059473   .2718678     0.76   0.449    -.3269038    .7387985
    illinois |   -1.51962   .2804792    -5.42   0.000    -2.069349   -.9698907
       _cons |    2.92857   .6743122     4.34   0.000     1.606942    4.250197
------------------------------------------------------------------------------


Except for the coefficients on kentucky in the two cointegrating equations and the constant
term in the first, all the parameters are significant at the 5% level. We can refit the model with the
Johansen normalization and the overidentifying constraint that the coefficient on kentucky in the
second cointegrating equation is zero.
. constraint define 1 [_ce1]missouri = 1
. constraint define 2 [_ce1]indiana = 0
. constraint define 3 [_ce2]missouri = 0
. constraint define 4 [_ce2]indiana = 1
. constraint define 5 [_ce2]kentucky = 0



. vec missouri indiana kentucky illinois, trend(rconstant) rank(2)
> lags(4) noetable bconstraints(1/5)
Iteration 1:   log likelihood = 416.97177
 (output omitted )
Iteration 20:  log likelihood =  416.9744

Vector error-correction model

Sample:  1978m5 - 2003m12                       No. of obs      =        308
                                                AIC             =  -2.311522
Log likelihood =  416.9744                      HQIC            =  -2.016134
Det(Sigma_ml)  =  7.84e-07                      SBIC            =  -1.572769

Cointegrating equations

Equation           Parms    chi2     P>chi2
-------------------------------------------
_ce1                  2    145.233   0.0000
_ce2                  1   209.9344   0.0000
-------------------------------------------

Identification:  beta is overidentified

 ( 1)  [_ce1]missouri = 1
 ( 2)  [_ce1]indiana = 0
 ( 3)  [_ce2]missouri = 0
 ( 4)  [_ce2]indiana = 1
 ( 5)  [_ce2]kentucky = 0
------------------------------------------------------------------------------
        beta |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
_ce1         |
    missouri |          1          .        .       .            .           .
     indiana |          0  (omitted)
    kentucky |   .2521685   .1649653     1.53   0.126    -.0711576    .5754946
    illinois |  -1.037453   .1734165    -5.98   0.000    -1.377343   -.6975626
       _cons |  -.3891102   .4726968    -0.82   0.410    -1.315579    .5373586
-------------+----------------------------------------------------------------
_ce2         |
    missouri |          0  (omitted)
     indiana |          1          .        .       .            .           .
    kentucky |          0  (omitted)
    illinois |  -1.314265   .0907071   -14.49   0.000    -1.492048   -1.136483
       _cons |   2.937016   .6448924     4.55   0.000      1.67305    4.200982
------------------------------------------------------------------------------
LR test of identifying restrictions: chi2(  1) =  .3139   Prob > chi2 = 0.575


The test of the overidentifying restriction does not reject the null hypothesis that the restriction
is valid, and the p-value on the coefficient on kentucky in the first cointegrating equation indicates
that it is not significant. We will leave the variable in the model and attribute the lack of significance
to whatever caused the kentucky series to temporarily rise above the others from 1985 until 1990,
though we could instead consider removing kentucky from the model.
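
The LR statistic reported at the foot of the constrained output can be approximated from the two log likelihoods printed in this example. The sketch below is a hand check in Python, not something vec runs; because it starts from log likelihoods rounded to four decimals, it agrees with the reported .3139 only to about three decimals.

```python
import math

ll_exact = 417.1314   # log likelihood, exactly identified model
ll_over = 416.9744    # log likelihood with [_ce2]kentucky = 0 imposed
df = 1                # one overidentifying restriction

lr = 2 * (ll_exact - ll_over)
# With 1 degree of freedom, the chi-squared upper-tail probability
# reduces to erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(lr / 2))

print(f"chi2({df}) = {lr:.4f}, Prob > chi2 = {p_value:.3f}")
```

The closed form for the p-value holds only for 1 degree of freedom; a test with more restrictions would need a general chi-squared distribution function.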
Next, we look at the estimates of the adjustment parameters. In the output below, we replay
the previous results. We specify the alpha option so that vec will display an estimation table for
the estimates of the adjustment parameters, and we specify nobtable to suppress the table for the
parameters of the cointegrating equations because we have already looked at those.



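. vec, alpha nobtable noetable

Vector error-correction model

Sample:  1978m5 - 2003m12                       No. of obs      =        308
                                                AIC             =  -2.311522
Log likelihood =  416.9744                      HQIC            =  -2.016134
Det(Sigma_ml)  =  7.84e-07                      SBIC            =  -1.572769

Adjustment parameters

Equation           Parms    chi2     P>chi2
-------------------------------------------
D_missouri            2   19.39607   0.0001
D_indiana             2   6.426086   0.0402
D_kentucky            2   8.524901   0.0141
D_illinois            2   22.32893   0.0000
-------------------------------------------

------------------------------------------------------------------------------
       alpha |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
D_missouri   |
        _ce1 |
         L1. |  -.0683152   .0185763    -3.68   0.000    -.1047242   -.0319063
             |
        _ce2 |
         L1. |   .0405613   .0112417     3.61   0.000      .018528    .0625946
-------------+----------------------------------------------------------------
D_indiana    |
        _ce1 |
         L1. |  -.0342096   .0220955    -1.55   0.122    -.0775159    .0090967
             |
        _ce2 |
         L1. |   .0325804   .0133713     2.44   0.015     .0063732    .0587877
-------------+----------------------------------------------------------------
D_kentucky   |
        _ce1 |
         L1. |  -.0482012   .0231633    -2.08   0.037    -.0936004   -.0028021
             |
        _ce2 |
         L1. |   .0374395   .0140175     2.67   0.008     .0099657    .0649133
-------------+----------------------------------------------------------------
D_illinois   |
        _ce1 |
         L1. |   .0138224   .0227041     0.61   0.543    -.0306768    .0583215
             |
        _ce2 |
         L1. |   .0567664   .0137396     4.13   0.000     .0298373    .0836955
------------------------------------------------------------------------------
LR test of identifying restrictions: chi2(  1) =  .3139   Prob > chi2 = 0.575
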

All the coefficients are significant at the 5% level, except those on Indiana and Illinois in the first
cointegrating equation. From an economic perspective, the issue is whether the unemployment rates
in Indiana and Illinois adjust when the first cointegrating equation is out of equilibrium. We could
impose restrictions on one or both of those parameters and refit the model, or we could just decide
to use the current results.


Technical note
vec can be used to fit models in which the parameters in $\beta$ are not identified, in which case only
the parameters in $\Pi$ and the moving-average impact matrix $C$ are identified. When the parameters in
$\beta$ are not identified, the values of $\widehat\beta$ and $\widehat\alpha$ can vary depending on the starting values. However, the
estimates of $\Pi$ and $C$ are identified and have known asymptotic distributions. This method is valid
because these additional normalization restrictions impose no restriction on $\Pi$ or $C$.

Specification of constants and trends


As discussed in [TS] vec intro, allowing for a constant term and linear time trend allows us to
write the VECM as

$$\Delta y_t = \alpha(\beta' y_{t-1} + \mu + \rho t) + \sum_{i=1}^{p-1} \Gamma_i \Delta y_{t-i} + \gamma + \tau t + \epsilon_t$$


Five different trend specifications are available:


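Option in trend()    Parameter restrictions                                  Johansen (1995) notation
---------------------------------------------------------------------------------------------------
trend                none                                                    $H(r)$
rtrend               $\tau = 0$                                              $H^*(r)$
constant             $\rho = 0$, and $\tau = 0$                              $H_1(r)$
rconstant            $\rho = 0$, $\gamma = 0$ and $\tau = 0$                 $H_1^*(r)$
none                 $\mu = 0$, $\rho = 0$, $\gamma = 0$, and $\tau = 0$     $H_2(r)$
---------------------------------------------------------------------------------------------------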

trend(trend) allows for a linear trend in the cointegrating equations and a quadratic trend in
the undifferenced data. A linear trend in the cointegrating equations implies that the cointegrating
equations are assumed to be trend stationary.
trend(rtrend) defines a restricted trend model that excludes linear trends in the differenced data
but allows for linear trends in the cointegrating equations. As in the previous case, a linear trend in
a cointegrating equation implies that the cointegrating equation is trend stationary.
trend(constant) defines a model with an unrestricted constant. This allows for a linear trend
in the undifferenced data and cointegrating equations that are stationary around a nonzero mean. This
is the default.
trend(rconstant) defines a model with a restricted constant in which there is no linear or
quadratic trend in the undifferenced data. A nonzero $\mu$ allows for the cointegrating equations to be
stationary around nonzero means, which provide the only intercepts for differenced data. Seasonal
indicators are not allowed with this specification.
trend(none) defines a model that does not include a trend or a constant. When there is no trend
or constant, the cointegrating equations are restricted to being stationary with zero means. Also, after
adjusting for the effects of lagged endogenous variables, the differenced data are modeled as having
mean zero. Seasonal indicators are not allowed with this specification.


Technical note
vec uses a switching algorithm developed by Boswijk (1995) to maximize the log-likelihood
function when constraints are placed on the parameters. The starting values affect both the ability of
the algorithm to find a maximum and its speed in finding that maximum. By default, vec uses the
parameter estimates that correspond to Johansen's normalization. Sometimes, other starting values
will cause the algorithm to find a maximum faster.
To specify starting values for the parameters in $\alpha$, we specify a $1 \times (K * r)$ matrix in the afrom()
option. Specifying starting values for the parameters in $\beta$ is slightly more complicated. As explained
in Methods and formulas, specifying trend(constant), trend(rtrend), or trend(trend) causes
some of the estimates of the trend parameters appearing in $\widehat\beta$ to be backed out. The switching
algorithm estimates only the parameters of the cointegrating equations whose estimates are stored in
e(betavec). For this reason, only the parameters stored in e(betavec) can have their initial values
set via bfrom().
The table below describes which trend parameters in the cointegrating equations are estimated by
the switching algorithm for each of the five specifications.
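                        Trend parameters in         Trend parameter estimated
Trend specification     cointegrating equations     via switching algorithm
-----------------------------------------------------------------------------
none                    none                        none
rconstant               _cons                       _cons
constant                _cons                       none
rtrend                  _cons, _trend               _trend
trend                   _cons, _trend               none
-----------------------------------------------------------------------------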

Collinearity
As expected, collinearity among variables causes some parameters to be unidentified numerically.
If vec encounters perfect collinearity among the dependent variables, it exits with an error.
In contrast, if vec encounters perfect collinearity that appears to be due to too many lags in the
model, vec displays a warning message and reduces the maximum lag included in the model in an
effort to find a model with fewer lags in which all the parameters are identified by the data. Specifying
the noreduce option causes vec to skip over these additional checks and corrections for collinearity.
Thus the noreduce option can be used to force the estimation to proceed when not all the parameters
are identified by the data. When some parameters are not identified because of collinearity, the results
cannot be interpreted but can be used to find the source of the collinearity.


Stored results
vec stores the following in e():

Scalars
    e(N)               number of observations
    e(k_rank)          number of unconstrained parameters
    e(k_eq)            number of equations in e(b)
    e(k_dv)            number of dependent variables
    e(k_ce)            number of cointegrating equations
    e(n_lags)          number of lags
    e(df_m)            model degrees of freedom
    e(ll)              log likelihood
    e(chi2_res)        value of test of overidentifying restrictions
    e(df_lr)           degrees of freedom of the test of overidentifying restrictions
    e(beta_iden)       1 if the parameters in $\beta$ are identified and 0 otherwise
    e(beta_icnt)       number of independent restrictions placed on $\beta$
    e(k_#)             number of variables in equation #
    e(df_m#)           model degrees of freedom in equation #
    e(r2_#)            $R^2$ of equation #
    e(chi2_#)          $\chi^2$ statistic for equation #
    e(rmse_#)          RMSE of equation #
    e(aic)             value of AIC
    e(hqic)            value of HQIC
    e(sbic)            value of SBIC
    e(tmin)            minimum time
    e(tmax)            maximum time
    e(detsig_ml)       determinant of the estimated covariance matrix
    e(rank)            rank of e(V)
    e(converge)        1 if the switching algorithm converged, 0 if it did not converge

Macros
    e(cmd)             vec
    e(cmdline)         command as typed
    e(trend)           trend specified
    e(tsfmt)           format of the time variable
    e(tvar)            variable denoting time within groups
    e(endog)           endogenous variables
    e(covariates)      list of covariates
    e(eqnames)         equation names
    e(cenames)         names of cointegrating equations
    e(reduce_opt)      noreduce, if noreduce is specified
    e(reduce_lags)     list of maximum lags to which the model has been reduced
    e(title)           title in estimation output
    e(aconstraints)    constraints placed on $\alpha$
    e(bconstraints)    constraints placed on $\beta$
    e(sindicators)     sindicators, if specified
    e(properties)      b V
    e(predict)         program used to implement predict
    e(marginsok)       predictions allowed by margins
    e(marginsnotok)    predictions disallowed by margins

Matrices
    e(b)               estimates of short-run parameters
    e(V)               VCE of short-run parameter estimates
    e(beta)            estimates of $\beta$
    e(V_beta)          VCE of $\widehat\beta$
    e(betavec)         directly obtained estimates of $\beta$
    e(pi)              estimates of $\widehat\Pi$
    e(V_pi)            VCE of $\widehat\Pi$
    e(alpha)           estimates of $\alpha$
    e(V_alpha)         VCE of $\widehat\alpha$
    e(omega)           estimates of $\widehat\Omega$
    e(mai)             estimates of $C$
    e(V_mai)           VCE of $\widehat C$

Functions
    e(sample)          marks estimation sample

Methods and formulas


Methods and formulas are presented under the following headings:
General specification of the VECM
The log-likelihood function
Unrestricted trend
Restricted trend
Unrestricted constant
Restricted constant
No trend
Estimation with Johansen identification
Estimation with constraints: $\beta$ identified
Estimation with constraints: $\beta$ not identified
Formulas for the information criteria
Formulas for predict

General specification of the VECM


vec estimates the parameters of a VECM that can be written as

$$\Delta y_t = \alpha\beta' y_{t-1} + \sum_{i=1}^{p-1} \Gamma_i \Delta y_{t-i} + v + \delta t + w_1 s_1 + \cdots + w_m s_m + \epsilon_t \tag{1}$$

where

$y_t$ is a $K \times 1$ vector of endogenous variables,
$\alpha$ is a $K \times r$ matrix of parameters,
$\beta$ is a $K \times r$ matrix of parameters,
$\Gamma_1, \ldots, \Gamma_{p-1}$ are $K \times K$ matrices of parameters,
$v$ is a $K \times 1$ vector of parameters,
$\delta$ is a $K \times 1$ vector of trend coefficients,
$t$ is a linear time trend,
$s_1, \ldots, s_m$ are orthogonalized seasonal indicators specified in the sindicators() option, and
$w_1, \ldots, w_m$ are $K \times 1$ vectors of coefficients on the orthogonalized seasonal indicators.


There are two types of deterministic elements in (1): the trend, $v + \delta t$, and the orthogonalized
seasonal terms, $w_1 s_1 + \cdots + w_m s_m$. Johansen (1995, chap. 11) shows that inference about the
number of cointegrating equations is based on nonstandard distributions and that the addition of any
term that generalizes the deterministic specification in (1) changes the asymptotic distributions of the
statistics used for inference on the number of cointegrating equations and the asymptotic distribution
of the ML estimator of the cointegrating equations. In fact, Johansen (1995, 84) notes that including
event indicators causes the statistics used for inference on the number of cointegrating equations to
have asymptotic distributions that must be computed case by case. For this reason, event indicators
may not be specified in the present version of vec.
If seasonal indicators are included in the model, they cannot be collinear with a constant term. If
they are collinear with a constant term, one of the indicator variables is omitted.
As discussed in Specification of constants and trends, we can reparameterize the model as

$$\Delta y_t = \alpha(\beta' y_{t-1} + \mu + \rho t) + \sum_{i=1}^{p-1} \Gamma_i \Delta y_{t-i} + \gamma + \tau t + \epsilon_t \tag{2}$$


The log-likelihood function


We can maximize the log-likelihood function much more easily by writing it in concentrated
form. In fact, as discussed below, in the simple case with the Johansen normalization on $\beta$ and no
constraints on $\alpha$, concentrating the log-likelihood function produces an analytical solution for the
parameter estimates.
To concentrate the log likelihood, rewrite (2) as

$$Z_{0t} = \alpha\widetilde\beta' Z_{1t} + \Psi Z_{2t} + \epsilon_t \tag{3}$$

where $Z_{0t}$ is a $K \times 1$ vector of variables $\Delta y_t$, $\alpha$ is the $K \times r$ matrix of adjustment coefficients, and
$\epsilon_t$ is a $K \times 1$ vector of independently and identically distributed normal vectors with mean 0 and
contemporaneous covariance matrix $\Omega$. $Z_{1t}$, $Z_{2t}$, $\widetilde\beta$, and $\Psi$ depend on the trend specification and are
defined below.
The log-likelihood function for the model in (3) is

$$L = -\frac{1}{2}\Big\{ TK\ln(2\pi) + T\ln(|\Omega|) + \sum_{t=1}^{T} (Z_{0t} - \alpha\widetilde\beta' Z_{1t} - \Psi Z_{2t})'\,\Omega^{-1}\,(Z_{0t} - \alpha\widetilde\beta' Z_{1t} - \Psi Z_{2t}) \Big\} \tag{4}$$

with the constraints that $\alpha$ and $\widetilde\beta$ have rank $r$.
Johansen (1995, chap. 6), building on Anderson (1951), shows how the $\Psi$ parameters can be
expressed as analytic functions of $\alpha$, $\widetilde\beta$, and the data, yielding the concentrated log-likelihood function

$$L_c = -\frac{1}{2}\Big\{ TK\ln(2\pi) + T\ln(|\Omega|) + \sum_{t=1}^{T} (R_{0t} - \alpha\widetilde\beta' R_{1t})'\,\Omega^{-1}\,(R_{0t} - \alpha\widetilde\beta' R_{1t}) \Big\} \tag{5}$$

where

$$M_{ij} = T^{-1}\sum_{t=1}^{T} Z_{it} Z_{jt}', \qquad i, j \in \{0, 1, 2\};$$
$$R_{0t} = Z_{0t} - M_{02} M_{22}^{-1} Z_{2t}; \quad\text{and}$$
$$R_{1t} = Z_{1t} - M_{12} M_{22}^{-1} Z_{2t}.$$

The definitions of $Z_{1t}$, $Z_{2t}$, $\widetilde\beta$, and $\Psi$ change with the trend specifications, although some of their
components stay the same.
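
To make the role of $R_{0t}$ and $R_{1t}$ concrete, consider the simplest possible $Z_{2t}$: a constant. The Python sketch below is only an illustration with hypothetical scalar data; the function name `partial_out_constant` is not part of vec. It shows that with $Z_{2t} = 1$, the formula $R_{0t} = Z_{0t} - M_{02}M_{22}^{-1}Z_{2t}$ reduces to subtracting the sample mean.

```python
def partial_out_constant(z):
    """Residuals from regressing the scalar series z on Z2t = 1.

    With Z2t = 1, M_22 = 1 and M_02 is the sample mean of z, so
    R_t = z_t - M_02 * M_22^{-1} * Z2t is z_t minus its mean.
    """
    T = len(z)
    m_22 = sum(1.0 * 1.0 for _ in z) / T       # M_22 = T^{-1} sum Z2t Z2t'
    m_02 = sum(z_t * 1.0 for z_t in z) / T     # M_02 = T^{-1} sum Z0t Z2t'
    return [z_t - (m_02 / m_22) * 1.0 for z_t in z]

# Hypothetical values of a differenced series (not from the manual's datasets).
z0 = [0.02, -0.01, 0.03, 0.00, 0.01]
r0 = partial_out_constant(z0)
print(r0)
```

In the general case, $Z_{2t}$ also holds lagged differences, trends, and seasonal indicators, and the same formula is the residual from a multivariate least-squares regression.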
Unrestricted trend

When the trend in the VECM is unrestricted, we can define the variables in (3) directly in terms
of the variables in (1):

$Z_{1t} = y_{t-1}$ is $K \times 1$
$Z_{2t} = (\Delta y_{t-1}', \ldots, \Delta y_{t-p+1}', 1, t, s_1, \ldots, s_m)'$ is $\{K(p-1) + 2 + m\} \times 1$;
$\Psi = (\Gamma_1, \ldots, \Gamma_{p-1}, v, \delta, w_1, \ldots, w_m)$ is $K \times \{K(p-1) + 2 + m\}$
$\widetilde\beta = \beta$ is the $K \times r$ matrix composed of the $r$ cointegrating vectors.

In the unrestricted trend specification, $m_1 = K$, $m_2 = K(p-1) + 2 + m$, and there are
$n_{\rm parms} = Kr + Kr + K\{K(p-1) + 2 + m\}$ parameters in (3).
Restricted trend

When there is a restricted trend in the VECM in (2), $\tau = 0$, but the intercept $v = \alpha\mu + \gamma$ is
unrestricted. The VECM with the restricted trend can be written as

$$\Delta y_t = \alpha(\beta', \rho)\begin{pmatrix} y_{t-1} \\ t \end{pmatrix} + \sum_{i=1}^{p-1} \Gamma_i \Delta y_{t-i} + v + w_1 s_1 + \cdots + w_m s_m + \epsilon_t$$

This equation can be written in the form of (3) by defining

$Z_{1t} = (y_{t-1}', t)'$ is $(K+1) \times 1$
$Z_{2t} = (\Delta y_{t-1}', \ldots, \Delta y_{t-p+1}', 1, s_1, \ldots, s_m)'$ is $\{K(p-1) + 1 + m\} \times 1$
$\Psi = (\Gamma_1, \ldots, \Gamma_{p-1}, v, w_1, \ldots, w_m)$ is $K \times \{K(p-1) + 1 + m\}$
$\widetilde\beta = (\beta', \rho)'$ is the $(K+1) \times r$ matrix composed of the $r$ cointegrating vectors and the $r$
trend coefficients

In the restricted trend specification, $m_1 = K + 1$, $m_2 = \{K(p-1) + 1 + m\}$, and there are
$n_{\rm parms} = Kr + (K+1)r + K\{K(p-1) + 1 + m\}$ parameters in (3).
Unrestricted constant

An unrestricted constant in the VECM in (2) is equivalent to setting $\delta = 0$ in (1), which can be
written in the form of (3) by defining

$Z_{1t} = y_{t-1}$ is $K \times 1$
$Z_{2t} = (\Delta y_{t-1}', \ldots, \Delta y_{t-p+1}', 1, s_1, \ldots, s_m)'$ is $\{K(p-1) + 1 + m\} \times 1$;
$\Psi = (\Gamma_1, \ldots, \Gamma_{p-1}, v, w_1, \ldots, w_m)$ is $K \times \{K(p-1) + 1 + m\}$
$\widetilde\beta = \beta$ is the $K \times r$ matrix composed of the $r$ cointegrating vectors

In the unrestricted constant specification, $m_1 = K$, $m_2 = \{K(p-1) + 1 + m\}$, and there are
$n_{\rm parms} = Kr + Kr + K\{K(p-1) + 1 + m\}$ parameters in (3).
Restricted constant

When there is a restricted constant in the VECM in (2), it can be written in the form of (3) by
defining

$Z_{1t} = (y_{t-1}', 1)'$ is $(K+1) \times 1$
$Z_{2t} = (\Delta y_{t-1}', \ldots, \Delta y_{t-p+1}')'$ is $K(p-1) \times 1$
$\Psi = (\Gamma_1, \ldots, \Gamma_{p-1})$ is $K \times K(p-1)$
$\widetilde\beta = (\beta', \mu)'$ is the $(K+1) \times r$ matrix composed of the $r$ cointegrating vectors and the $r$
constants in the cointegrating relations.

In the restricted constant specification, $m_1 = K + 1$, $m_2 = K(p-1)$, and there are $n_{\rm parms} =
Kr + (K+1)r + K\{K(p-1)\}$ parameters in (3).
No trend

When there is no trend in the VECM in (2), it can be written in the form of (3) by defining

$Z_{1t} = y_{t-1}$ is $K \times 1$
$Z_{2t} = (\Delta y_{t-1}', \ldots, \Delta y_{t-p+1}')'$ is $K(p-1) \times 1$
$\Psi = (\Gamma_1, \ldots, \Gamma_{p-1})$ is $K \times K(p-1)$
$\widetilde\beta = \beta$ is the $K \times r$ matrix of $r$ cointegrating vectors

In the no-trend specification, $m_1 = K$, $m_2 = K(p-1)$, and there are $n_{\rm parms} = Kr + Kr +
K\{K(p-1)\}$ parameters in (3).
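
The dimensions for the five trend specifications can be collected in one place. The Python function below is a bookkeeping sketch, not code used by vec; the name `vecm_dims` and its argument names are chosen here for illustration. It returns $m_1$, $m_2$, and the parameter count in (3) using the formulas above.

```python
def vecm_dims(K, r, p, m, trend):
    """Return (m1, m2, nparms) for a trend() specification.

    K: endogenous variables, r: cointegrating equations,
    p: lags in the underlying VAR, m: seasonal indicators.
    """
    if trend in ("rconstant", "none") and m != 0:
        raise ValueError("seasonal indicators are not allowed with this specification")
    extra_z1 = {"trend": 0, "rtrend": 1, "constant": 0, "rconstant": 1, "none": 0}
    extra_z2 = {"trend": 2 + m, "rtrend": 1 + m, "constant": 1 + m,
                "rconstant": 0, "none": 0}
    m1 = K + extra_z1[trend]                 # rows of Z1t
    m2 = K * (p - 1) + extra_z2[trend]       # rows of Z2t
    nparms = K * r + m1 * r + K * m2         # alpha, beta-tilde, and Psi
    return m1, m2, nparms

print(vecm_dims(K=2, r=1, p=2, m=0, trend="constant"))    # (2, 3, 10)
print(vecm_dims(K=4, r=2, p=4, m=0, trend="rconstant"))   # (5, 12, 66)
```

The two calls use the settings of examples 1 and 2, taking p = 2 in example 1 because its output contains only one lagged difference (LD.) per variable.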

Estimation with Johansen identification


Not all the parameters in $\alpha$ and $\widetilde\beta$ are identified. Consider the simple case in which $\widetilde\beta$ is $K \times r$
and let $Q$ be a nonsingular $r \times r$ matrix. Then

$$\alpha\widetilde\beta' = \alpha Q Q^{-1}\widetilde\beta' = \alpha Q(\widetilde\beta Q'^{-1})' = \dot\alpha\dot\beta'$$

Substituting $\dot\alpha\dot\beta'$ into the log likelihood in (5) for $\alpha\widetilde\beta'$ would not change the value of the log
likelihood, so some a priori identification restrictions must be found to identify $\alpha$ and $\widetilde\beta$. As discussed
in Johansen (1995, chap. 5 and 6) and Boswijk (1995), if the restrictions exactly identify or overidentify
$\widetilde\beta$, the estimates of the unconstrained parameters in $\widetilde\beta$ will be superconsistent, meaning that the estimates
of the free parameters in $\widetilde\beta$ will converge at a faster rate than estimates of the short-run parameters
in $\alpha$ and $\Gamma_i$. This allows the distribution of the estimator of the short-run parameters to be derived
conditional on the estimated $\widetilde\beta$.

Johansen (1995, chap. 6) has proposed a normalization method for use when theory does not
provide sufficient a priori restrictions to identify the cointegrating vector. This method has become
widely adopted by researchers. Johansen's identification scheme is

$$\widetilde\beta' = (I_r, \breve\beta') \tag{6}$$

where $I_r$ is the $r \times r$ identity matrix and $\breve\beta$ is a $(m_1 - r) \times r$ matrix of identified parameters.
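
The invariance that makes identification restrictions necessary is easy to see numerically when $r = 1$, because $Q$ is then a nonzero scalar. The following Python sketch uses hypothetical values for $\alpha$ and an unnormalized $\widetilde\beta$ (they are not estimates from the manual) and shows that rescaling to the form in (6) leaves the product $\alpha\widetilde\beta'$ unchanged.

```python
def outer(a, b):
    """Return the matrix a b' for two column vectors given as lists."""
    return [[a_i * b_j for b_j in b] for a_i in a]

# Hypothetical unnormalized values for K = 2, r = 1.
alpha = [-0.2, -0.1]
beta = [2.0, -1.9]

# Choosing Q = beta[0] imposes the normalization beta' = (1, beta_breve').
q = beta[0]
alpha_dot = [a * q for a in alpha]    # alpha Q
beta_dot = [b / q for b in beta]      # beta Q'^{-1}

pi_before = outer(alpha, beta)
pi_after = outer(alpha_dot, beta_dot)
print(beta_dot)
print(pi_before)
print(pi_after)
```

For $r > 1$, the same idea applies with $Q$ equal to the leading $r \times r$ block of the unnormalized vectors, provided that block is nonsingular.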


Johansen's identification method places $r^2$ linearly independent constraints on the parameters in
$\widetilde\beta$, thereby defining an exactly identified model. The total number of freely estimated parameters is
$n_{\rm parms} - r^2 = \{Km_2 + (K + m_1 - r)r\}$, and the degrees of freedom $d$ is calculated as the
integer part of $(n_{\rm parms} - r^2)/K$.
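
As a quick arithmetic check of the count of free parameters, the Python sketch below evaluates both sides of the identity for the settings of the two examples in this entry. The function name `free_parameters` is illustrative, and the values of $m_1$ and $m_2$ are derived from the trend-specification formulas above (taking p = 2 for example 1 and p = 4 for example 2).

```python
def free_parameters(K, r, m1, m2):
    """Free parameters under the Johansen normalization, and d."""
    nparms = K * r + m1 * r + K * m2          # parameters in (3)
    free = nparms - r * r                     # r^2 normalization constraints
    assert free == K * m2 + (K + m1 - r) * r  # the identity stated in the text
    d = free // K                             # integer part of free / K
    return free, d

print(free_parameters(K=2, r=1, m1=2, m2=3))    # example 1: (9, 4)
print(free_parameters(K=4, r=2, m1=5, m2=12))   # example 2: (62, 15)
```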
When only the rank and the Johansen identification restrictions are placed on the model, we can
further manipulate the log likelihood in (5) to obtain analytic formulas for the parameters in $\widetilde\beta$, $\alpha$,
and $\Omega$. For a given value of $\widetilde\beta$, $\alpha$ and $\Omega$ can be found by regressing $R_{0t}$ on $\widetilde\beta' R_{1t}$. This allows a
further simplification of the problem in which

$$\alpha(\widetilde\beta) = S_{01}\widetilde\beta(\widetilde\beta' S_{11}\widetilde\beta)^{-1}$$
$$\Omega(\widetilde\beta) = S_{00} - S_{01}\widetilde\beta(\widetilde\beta' S_{11}\widetilde\beta)^{-1}\widetilde\beta' S_{10}$$
$$S_{ij} = (1/T)\sum_{t=1}^{T} R_{it} R_{jt}', \qquad i, j \in \{0, 1\}$$

Johansen (1995) shows that by inserting these solutions into equation (5), $\widehat\beta$ is given by the $r$
eigenvectors $v_1, \ldots, v_r$ corresponding to the $r$ largest eigenvalues $\lambda_1, \ldots, \lambda_r$ that solve the generalized
eigenvalue problem

$$|\lambda_i S_{11} - S_{10} S_{00}^{-1} S_{01}| = 0 \tag{7}$$

The eigenvectors corresponding to $\lambda_1, \ldots, \lambda_r$ that solve (7) are the unidentified parameter estimates.
To impose the identification restrictions in (6), we normalize the eigenvectors such that

$$\lambda_i S_{11} v_i = S_{10} S_{00}^{-1} S_{01} v_i \tag{8}$$

and

$$v_i' S_{11} v_j = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{otherwise} \end{cases} \tag{9}$$

At the optimum the log-likelihood function with the Johansen identification restrictions can be expressed
in terms of $T$, $K$, $S_{00}$, and the $r$ largest eigenvalues

$$L_c = -\frac{1}{2} T \Big\{ K\ln(2\pi) + K + \ln(|S_{00}|) + \sum_{i=1}^{r} \ln(1 - \widehat\lambda_i) \Big\}$$

where the $\widehat\lambda_i$ are the eigenvalues that solve (7), (8), and (9).
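
In the scalar case, where $R_{0t}$ and $R_{1t}$ are each one-dimensional, the determinant in (7) is an ordinary equation in $\lambda$ and its solution is the squared sample correlation of the two residual series. The Python sketch below uses hypothetical residuals to illustrate this special case; it is not how vec solves the general eigenvalue problem, which requires a matrix eigensolver.

```python
def moment(a, b):
    """S_ij = (1/T) * sum_t a_t * b_t for scalar residual series."""
    return sum(x * y for x, y in zip(a, b)) / len(a)

# Hypothetical residual series R0t and R1t (already partialled out).
r0 = [0.5, -0.3, 0.1, -0.4, 0.1]
r1 = [-1.0, 0.8, -0.2, 0.6, -0.2]

s00, s01, s11 = moment(r0, r0), moment(r0, r1), moment(r1, r1)

lam = s01 * s01 / (s00 * s11)   # solves lam*S11 - S10*S00^{-1}*S01 = 0
v = 1 / s11 ** 0.5              # scaled so that v' S11 v = 1, as in (9)
print(lam, v)
```

Because $\lambda$ is a squared correlation here, it lies between 0 and 1, which is why the term $\ln(1 - \widehat\lambda_i)$ in the maximized log likelihood is well defined.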
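Using the normalized $\widehat\beta$, we can then obtain the estimates

$$\widehat\alpha = S_{01}\widehat\beta(\widehat\beta' S_{11}\widehat\beta)^{-1} \tag{10}$$

and

$$\widehat\Omega = S_{00} - \widehat\alpha\widehat\beta' S_{10}$$

Let $\widehat\beta_y$ be a $K \times r$ matrix that contains the estimates of the parameters in $\beta$ in (1). $\widehat\beta_y$ differs
from $\widehat\beta$ in that any trend parameter estimates are omitted from $\widehat\beta$. We can then use $\widehat\beta_y$ to obtain
predicted values for the $r$ nondemeaned cointegrating equations

$$\widehat{\widetilde E}_t = \widehat\beta_y' y_t$$
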


The $r$ series in $\widehat{\widetilde E}_t$ are called the predicted, nondemeaned cointegrating equations because they still
contain the terms $\mu$ and $\rho$. We want to work with the predicted, demeaned cointegrating equations.
Thus we need estimates of $\mu$ and $\rho$. In the trend(rconstant) specification, the algorithm directly
produces the estimator $\widehat\mu$. Similarly, in the trend(rtrend) specification, the algorithm directly
produces the estimator $\widehat\rho$. In the remaining cases, to back out estimates of $\mu$ and $\rho$, we need estimates
of $v$ and $\delta$, which we can obtain by estimating the parameters of the following VAR:

$$\Delta y_t = \alpha\widehat{\widetilde E}_{t-1} + \sum_{i=1}^{p-1} \Gamma_i \Delta y_{t-i} + v + \delta t + w_1 s_1 + \cdots + w_m s_m + \epsilon_t \tag{11}$$

Depending on the trend specification, we use $\widehat\alpha$ to back out the estimates of

$$\widehat\mu = (\widehat\alpha'\widehat\alpha)^{-1}\widehat\alpha'\widehat v \tag{12}$$

$$\widehat\rho = (\widehat\alpha'\widehat\alpha)^{-1}\widehat\alpha'\widehat\delta \tag{13}$$

if they are not already in $\widehat\beta$ and are included in the trend specification.
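
Equation (12) is a least-squares projection of $\widehat v$ on $\widehat\alpha$, which is a scalar ratio when $r = 1$. The Python sketch below illustrates the calculation; the adjustment coefficients are those reported in example 1, but the intercept vector `v_hat` is hypothetical because the estimates from (11) are not displayed in this entry, so the printed value is not the manual's _cons estimate.

```python
def back_out(alpha_hat, v_hat):
    """(alpha'alpha)^{-1} alpha' v for r = 1, where alpha'alpha is a scalar."""
    aa = sum(a * a for a in alpha_hat)
    av = sum(a * v for a, v in zip(alpha_hat, v_hat))
    return av / aa

alpha_hat = [-0.4337524, -0.3543935]   # adjustment coefficients from example 1
v_hat = [0.39, 0.32]                   # hypothetical intercept estimates from (11)

mu_hat = back_out(alpha_hat, v_hat)
print(mu_hat)
```

If $\widehat v$ were exactly $\widehat\alpha c$ for some constant $c$, the function would return $c$; the same function applied to $\widehat\delta$ gives (13).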
We then augment $\widehat\beta_y$ to

$$\widehat\beta_f' = (\widehat\beta_y', \widehat\mu, \widehat\rho)$$

where the estimates of $\widehat\mu$ and $\widehat\rho$ are either obtained from $\widehat\beta$ or backed out using (12) and (13). We
next use $\widehat\beta_f$ to obtain the $r$ predicted, demeaned cointegrating equations, $\widehat E_t$, via

$$\widehat E_t = \widehat\beta_f'(y_t', 1, t)'$$

We last obtain estimates of all the short-run parameters from the VAR:

$$\Delta y_t = \alpha\widehat E_{t-1} + \sum_{i=1}^{p-1} \Gamma_i \Delta y_{t-i} + \gamma + \tau t + w_1 s_1 + \cdots + w_m s_m + \epsilon_t \tag{14}$$

Because the estimator $\widehat\beta_f$ converges in probability to its true value at a rate faster than $T^{-1/2}$, we
can take our estimated $\widehat E_{t-1}$ as given data in (14). This allows us to estimate the variance-covariance
(VCE) matrix of the estimates of the parameters in (14) by using the standard VAR VCE estimator.
Equation (11) can be used to obtain consistent estimates of all the parameters and of the VCE of all
the parameters, except $v$ and $\delta$. The standard VAR VCE of $\widehat v$ and $\widehat\delta$ is incorrect because these estimates
converge at a faster rate. This is why it is important to use the predicted, demeaned cointegrating
equations, $\widehat E_{t-1}$, when estimating the short-run parameters and trend terms. In keeping with the
cointegration literature, vec makes a small-sample adjustment to the VCE estimator so that the divisor
is $(T - d)$ instead of $T$, where $d$ represents the degrees of freedom of the model. $d$ is calculated as
the integer part of $n_{\rm parms}/K$, where $n_{\rm parms}$ is the total number of freely estimated parameters in
the model.
In the trend(rconstant) specification, the estimation procedure directly estimates $\mu$. For
trend(constant), trend(rtrend), and trend(trend), the estimates of $\mu$ are backed out using (12).
In the trend(rtrend) specification, the estimation procedure directly estimates $\rho$. In the
trend(trend) specification, the estimates of $\rho$ are backed out using (13). Because the elements of
the estimated VCE are readily available only when the estimates are obtained directly, when the trend
parameter estimates are backed out, their elements in the VCE for $\widehat\beta_f$ are missing.


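Under the Johansen identification restrictions, vec obtains $\widehat\beta$, the estimates of the parameters in
the $r \times m_1$ matrix $\widetilde\beta'$ in (5). The VCE of vec($\widehat\beta$) is $rm_1 \times rm_1$. Per Johansen (1995), the asymptotic
distribution of $\widehat\beta$ is mixed Gaussian, and its VCE is consistently estimated by

$$\frac{1}{T - d}\,(I_r \otimes H_J)\,\Big\{ (\widehat\alpha'\widehat\Omega^{-1}\widehat\alpha) \otimes (H_J' S_{11} H_J) \Big\}^{-1} (I_r \otimes H_J)' \tag{15}$$

where $H_J$ is the $m_1 \times (m_1 - r)$ matrix given by $H_J = (0_{r \times (m_1 - r)}', I_{m_1 - r})'$. The VCE reported in
e(V_beta) is the estimated VCE in (15) augmented with missing values to account for any backed-out
estimates of $\mu$ or $\rho$.
The parameter estimates $\widehat\alpha$ can be found either as a function of $\widehat\beta$, using (10) or from the VAR in
(14). The estimated VCE of $\widehat\alpha$ reported in e(V_alpha) is given by

$$\frac{1}{(T - d)}\,\widehat\Omega \otimes \widehat\Sigma_B$$

where $\widehat\Sigma_B = (\widehat\beta' S_{11}\widehat\beta)^{-1}$.
As we would expect, the estimator of $\Pi = \alpha\beta'$ is

$$\widehat\Pi = \widehat\alpha\widehat\beta'$$

and its estimated VCE is given by

$$\frac{1}{(T - d)}\,\widehat\Omega \otimes (\widehat\beta\,\widehat\Sigma_B\,\widehat\beta')$$

The moving-average impact matrix $C$ is estimated by

$$\widehat C = \widehat\beta_\perp(\widehat\alpha_\perp'\widehat\Gamma\widehat\beta_\perp)^{-1}\widehat\alpha_\perp'$$

where $\widehat\beta_\perp$ is the orthogonal complement of $\widehat\beta_y$, $\widehat\alpha_\perp$ is the orthogonal complement of $\widehat\alpha$, and
$\widehat\Gamma = I_K - \sum_{i=1}^{p-1}\widehat\Gamma_i$. The orthogonal complement of a $K \times r$ matrix $Q$ that has rank $r$ is a matrix
$Q_\perp$ of rank $K - r$, such that $Q'Q_\perp = 0$. Although this operation is not uniquely defined, the results
used by vec do not depend on the method of obtaining the orthogonal complement. vec uses the
following method: the orthogonal complement of $Q$ is given by the $r$ eigenvectors with the highest
eigenvalues from the matrix $Q'(Q'Q)^{-1}Q'$.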
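
For $K = 2$ and $r = 1$, an orthogonal complement can be written down directly by rotating the vector 90 degrees, which makes the formula for $\widehat C$ easy to evaluate by hand. The Python sketch below uses that shortcut with hypothetical values for $\widehat\alpha$, $\widehat\beta_y$, and $\widehat\Gamma_1$; it is a stand-in for, not a reproduction of, the eigenvector method that vec uses, and the helper name `perp` is an assumption of this sketch.

```python
def perp(q):
    """An orthogonal complement of a 2 x 1 vector q (90-degree rotation)."""
    return [-q[1], q[0]]

# Hypothetical values for K = 2, r = 1, p = 2 (not estimates from the manual).
alpha = [-0.2, 0.1]
beta_y = [1.0, -1.0]
gamma1 = [[0.3, 0.1],
          [0.0, 0.2]]

# Gamma = I_K - Gamma_1 because there is only one lagged difference.
gamma = [[(1.0 if i == j else 0.0) - gamma1[i][j] for j in range(2)]
         for i in range(2)]

a_perp, b_perp = perp(alpha), perp(beta_y)
gamma_b = [sum(gamma[i][j] * b_perp[j] for j in range(2)) for i in range(2)]
middle = sum(a_perp[i] * gamma_b[i] for i in range(2))   # alpha_perp' Gamma beta_perp
C = [[b_perp[i] * a_perp[j] / middle for j in range(2)] for i in range(2)]
print(C)
```

Rescaling either complement by a nonzero constant changes `middle` by the same factor and leaves `C` unchanged, which is the sense in which the result does not depend on how the complement is obtained. By construction, $\widehat C\widehat\alpha = 0$ and $\widehat\beta_y'\widehat C = 0$.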
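Per Johansen (1995, chap. 13) and Drukker (2004), the VCE of $\widehat C$ is estimated by

$$\frac{T - d}{T}\,\widehat S_q\,\widehat V_{\widehat\nu}\,\widehat S_q' \tag{16}$$

where

$$\widehat S_q = \widehat C \otimes \widehat\xi$$
$$\widehat\xi = \begin{cases} (\widehat\xi_1, \widehat\xi_2) & \text{if } p > 1 \\ \widehat\xi_1 & \text{if } p = 1 \end{cases}$$
$$\widehat\xi_1 = (\widehat C'\widehat\Gamma' - I_K)\bar\alpha$$
$$\bar\alpha = \widehat\alpha(\widehat\alpha'\widehat\alpha)^{-1}$$
$$\widehat\xi_2 = \iota_{p-1} \otimes \widehat C$$

$\iota_{p-1}$ is a $(p-1) \times 1$ vector of ones
$\widehat V_{\widehat\nu}$ is the estimated VCE of $\widehat\nu = (\widehat\alpha, \widehat\Gamma_1, \ldots, \widehat\Gamma_{p-1})$
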


Estimation with constraints: $\beta$ identified

vec can also fit models in which the adjustment parameters are subject to homogeneous linear
constraints and the cointegrating vectors are subject to general linear restrictions. Mathematically,
vec allows for constraints of the form

$$R_\alpha'\,\mathrm{vec}(\alpha) = 0 \tag{17}$$

where $R_\alpha$ is a known $Kr \times n_\alpha$ constraint matrix, and

$$R_{\widetilde\beta}'\,\mathrm{vec}(\widetilde\beta) = b \tag{18}$$

where $R_{\widetilde\beta}$ is a known $m_1 r \times n_\beta$ constraint matrix and $b$ is a known $n_\beta \times 1$ vector of constants.
Although (17) and (18) are intuitive, they can be rewritten in a form to facilitate computation.
Specifically, (17) can be written as

$$\mathrm{vec}(\alpha') = Ga \tag{19}$$

where $G$ is $Kr \times n_\alpha$ and $a$ is $n_\alpha \times 1$. Equation (18) can be rewritten as

$$\mathrm{vec}(\widetilde\beta) = Hb + h_0 \tag{20}$$

where $H$ is a known $n_1 r \times n_\beta$ matrix, $b$ is an $n_\beta \times 1$ matrix of parameters, and $h_0$ is a known
$n_1 r \times 1$ matrix. See [P] makecns for a discussion of the different ways of specifying the constraints.
When constraints are specified via the aconstraints() and bconstraints() options, the
Boswijk (1995) rank method determines whether the parameters in $\widetilde\beta$ are underidentified, exactly
identified, or overidentified.
Boswijk (1995) uses the Rothenberg (1971) method to determine whether the parameters in $\widetilde\beta$ are
identified. Thus the parameters in $\widetilde\beta$ are exactly identified if $\rho_\beta = r^2$, and the parameters in $\widetilde\beta$ are
overidentified if $\rho_\beta > r^2$, where

$$\rho_\beta = \mathrm{rank}\big\{ R_{\widetilde\beta}(I_r \otimes \ddot\beta) \big\}$$

and $\ddot\beta$ is a full-rank matrix with the same dimensions as $\widetilde\beta$. The computed $\rho_\beta$ is stored in
e(beta_icnt).
Similarly, the number of freely estimated parameters in $\alpha$ and $\widetilde\beta$ is given by $\rho_{\rm jacob}$, where

$$\rho_{\rm jacob} = \mathrm{rank}\big\{ (\widehat\alpha \otimes I_{m_1})H,\ (I_K \otimes \widehat\beta)G \big\}$$

Using $\rho_{\rm jacob}$, we can calculate several other parameter counts of interest. In particular, the degrees of
freedom of the overidentifying test are given by $(K + m_1 - r)r - \rho_{\rm jacob}$, and the number of freely
estimated parameters in the model is $n_{\rm parms} = Km_2 + \rho_{\rm jacob}$.
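
These counts can be illustrated with the constrained fit from example 2. In the Python sketch below, the value `rho_jacob = 13` is an assumption inferred from that output (14 free parameters in $\alpha$ and $\widetilde\beta$ under exact identification, less the one overidentifying restriction); vec computes the rank numerically and does not display it. The AIC is then recomputed from the reported log likelihood as a cross-check on the implied parameter count.

```python
K, r, m1, m2 = 4, 2, 5, 12   # example 2: trend(rconstant), lags(4), rank(2)
rho_jacob = 13               # assumed value, inferred from the example output

df_overid = (K + m1 - r) * r - rho_jacob   # degrees of freedom of the LR test
nparms = K * m2 + rho_jacob                # freely estimated parameters

T, ll = 308, 416.9744                      # from the constrained fit
aic = -2 * ll / T + 2 * nparms / T
print(df_overid, nparms, round(aic, 6))
```

The single degree of freedom matches the chi2(1) test reported in the example, and the recomputed AIC agrees with the displayed value of -2.311522 up to rounding of the log likelihood.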
Although the problem of maximizing the log-likelihood function in (4), subject to the constraints in
(17) and (18), could be handled by the algorithms in [R] ml, the switching algorithm of Boswijk (1995)
has proven to be more convergent. For this reason, vec uses the Boswijk (1995) switching algorithm
to perform the optimization.


b0, a
b 0 ), the algorithm iteratively updates the estimates until convergence
b0 ,
Given starting values (b
is achieved, as follows:
bj

b j is constructed from (19) and a


bj
b j is constructed from (20) and b

b j+1 = {H0 (b
b
b 1
b 1
b
0j
b j S11 )H}1 H0 (b
j
j InZ1 )h0 }
j
j S11 ){vec(P) (b
1 0 b 1
b
b
b
b j S11 )vec(P)
b 1
bj+1 = {G(
a
G (j
j j S11 j )G}

b j+1 = S00 S01


bj
b 0j S10 +
b 0j S11
bj

b 0j
bj
bj
b 0j
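The estimated VCE of $\widehat{\beta}$ is given by

$$\frac{1}{(T-d)}\, H\{H'(W \otimes S_{11})H\}^{-1} H'$$

where $W$ is $\widehat{\alpha}'\widehat{\Omega}^{-1}\widehat{\alpha}$. As in the case without constraints, the estimated VCE of $\widehat{\alpha}$ can be obtained
either from the VCE of the short-run parameters, as described below, or via the formula

$$\widehat{V}_{\widehat{\alpha}} = \frac{1}{(T-d)}\, G\left[ G'\{\widehat{\Omega}^{-1} \otimes (\widehat{\beta}' S_{11} \widehat{\beta})\}G \right]^{-1} G'$$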
Boswijk (1995) notes that, as long as the parameters of the cointegrating equations are exactly
identified or overidentified, the constrained ML estimator produces superconsistent estimates of $\widetilde{\beta}$.
This implies that the method of estimating the short-run parameters described above applies in the
presence of constraints, as well, albeit with a caveat: when there are constraints placed on $\alpha$, the
VARs must be estimated subject to these constraints.

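With these estimates and the estimated VCE of the short-run parameter matrix $\widehat{V}_{\widehat{\nu}}$, Drukker (2004)
shows that the estimated VCE for $\widehat{\Pi}$ is given by

$$(\widehat{\beta} \otimes I_K)\,\widehat{V}_{\widehat{\alpha}}\,(\widehat{\beta} \otimes I_K)'$$

Drukker (2004) also shows that the estimated VCE of $\widehat{C}$ can be obtained from (16) with the extension
that $\widehat{V}_{\widehat{\nu}}$ is the estimated VCE of $\widehat{\nu}$ that takes into account any constraints on $\widehat{\alpha}$.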

Estimation with constraints: $\beta$ not identified

When the parameters in $\beta$ are not identified, only the parameters in $\Pi = \alpha\beta'$ and $C$ are identified.
The estimates of $\Pi$ and $C$ would not change if more identification restrictions were imposed to
achieve exact identification. Thus the VCE matrices for $\widehat{\Pi}$ and $\widehat{C}$ can be derived as if the model
exactly identified $\beta$.


Formulas for the information criteria


The AIC, SBIC, and HQIC are calculated according to their standard definitions, which include the
constant term from the log likelihood; that is,

$$\mathrm{AIC} = -2\left(\frac{L}{T}\right) + \frac{2\,\mathrm{nparms}}{T}$$

$$\mathrm{SBIC} = -2\left(\frac{L}{T}\right) + \frac{\ln(T)}{T}\,\mathrm{nparms}$$

$$\mathrm{HQIC} = -2\left(\frac{L}{T}\right) + \frac{2\ln\{\ln(T)\}}{T}\,\mathrm{nparms}$$

where nparms is the total number of parameters in the model and $L$ is the value of the log likelihood
at the optimum.
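The three definitions above can be checked by hand outside Stata. The following is a minimal Python sketch, not part of vec itself; the function name and the input values (a log likelihood of 1252.5055, 47 parameters, and T = 91) are assumptions chosen only for illustration.

```python
import math

def information_criteria(loglik, nparms, T):
    """Return (AIC, SBIC, HQIC) using the definitions given above."""
    fit = -2.0 * loglik / T                      # -2(L/T), shared by all three criteria
    aic = fit + 2.0 * nparms / T
    sbic = fit + math.log(T) / T * nparms
    hqic = fit + 2.0 * math.log(math.log(T)) / T * nparms
    return aic, sbic, hqic

# Illustrative inputs, assumed for this sketch only.
aic, sbic, hqic = information_criteria(loglik=1252.5055, nparms=47, T=91)
print(f"AIC = {aic:.4f}, SBIC = {sbic:.4f}, HQIC = {hqic:.4f}")
```

Because ln(T) exceeds 2 ln{ln(T)}, which in turn exceeds 2, once T is moderately large, the SBIC penalizes each additional parameter most heavily and the AIC least.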

Formulas for predict


xb, residuals and stdp are standard and are documented in [R] predict. ce causes predict to
compute $\widehat{E}_t = \widehat{\beta}_f y_t$ for the requested cointegrating equation.
levels causes predict to compute the predictions for the levels of the data. Let $\widehat{y}_t^{\,d}$ be the
predicted value of $\Delta y_t$. Because the computations are performed for a given equation, $\Delta y_t$ is a scalar.
Using $\widehat{y}_t^{\,d}$, we can predict the level by $\widehat{y}_t = \widehat{y}_t^{\,d} + y_{t-1}$.
Because the residuals from the VECM for the differences and the residuals from the corresponding
VAR in levels are identical, there is no need for an option for predicting the residuals in levels.
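The levels computation is a one-line identity: the predicted difference plus the previous observed level. The Python sketch below illustrates it with made-up numbers; the function name and data are hypothetical and do not reproduce predict's implementation.

```python
def one_step_levels(levels, predicted_diffs):
    """One-step level predictions: predicted difference at t plus observed level at t-1.

    levels[t] is the observed y_t; predicted_diffs[t] is the predicted value of
    the first difference at t (None where no prediction is available).
    """
    preds = [None] * len(levels)
    for t in range(1, len(levels)):
        if predicted_diffs[t] is not None:
            preds[t] = predicted_diffs[t] + levels[t - 1]
    return preds

# Hypothetical data for a single equation.
y = [10.0, 10.5, 10.7]
d_hat = [None, 0.4, 0.3]
level_hat = one_step_levels(y, d_hat)
print(level_hat)
```

The same identity explains why the residuals coincide: the level residual is the observed level minus (predicted difference + previous level), which equals the observed difference minus the predicted difference.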

References
Anderson, T. W. 1951. Estimating linear restrictions on regression coefficients for multivariate normal distributions.
Annals of Mathematical Statistics 22: 327–351.
Becketti, S. 2013. Introduction to Time Series Using Stata. College Station, TX: Stata Press.
Boswijk, H. P. 1995. Identifiability of cointegrated systems. Discussion Paper #95-78, Tinbergen Institute.
http://www1.fee.uva.nl/pp/bin/258fulltext.pdf.
Boswijk, H. P., and J. A. Doornik. 2004. Identifying, estimating and testing restricted cointegrating systems: An
overview. Statistica Neerlandica 58: 440–465.
Drukker, D. M. 2004. Some further results on estimation and inference in the presence of constraints on alpha in a
cointegrating VECM. Working paper, StataCorp.
Engle, R. F., and C. W. J. Granger. 1987. Co-integration and error correction: Representation, estimation, and testing.
Econometrica 55: 251–276.
Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
Johansen, S. 1988. Statistical analysis of cointegration vectors. Journal of Economic Dynamics and Control 12:
231–254.
Johansen, S. 1991. Estimation and hypothesis testing of cointegration vectors in Gaussian vector autoregressive models.
Econometrica 59: 1551–1580.
Johansen, S. 1995. Likelihood-Based Inference in Cointegrated Vector Autoregressive Models. Oxford: Oxford University
Press.
Maddala, G. S., and I.-M. Kim. 1998. Unit Roots, Cointegration, and Structural Change. Cambridge: Cambridge
University Press.
Park, J. Y., and P. C. B. Phillips. 1988. Statistical inference in regressions with integrated processes: Part I. Econometric
Theory 4: 468–497.
Park, J. Y., and P. C. B. Phillips. 1989. Statistical inference in regressions with integrated processes: Part II. Econometric Theory 5: 95–131.
Phillips, P. C. B. 1986. Understanding spurious regressions in econometrics. Journal of Econometrics 33: 311–340.
Phillips, P. C. B., and S. N. Durlauf. 1986. Multiple time series regressions with integrated processes. Review of
Economic Studies 53: 473–495.
Rothenberg, T. J. 1971. Identification in parametric models. Econometrica 39: 577–591.
Sims, C. A., J. H. Stock, and M. W. Watson. 1990. Inference in linear time series models with some unit roots.
Econometrica 58: 113–144.
Stock, J. H. 1987. Asymptotic properties of least squares estimators of cointegrating vectors. Econometrica 55:
1035–1056.
Stock, J. H., and M. W. Watson. 1988. Testing for common trends. Journal of the American Statistical Association
83: 1097–1107.
Watson, M. W. 1994. Vector autoregressions and cointegration. In Vol. 4 of Handbook of Econometrics, ed. R. F.
Engle and D. L. McFadden. Amsterdam: Elsevier.

Also see
[TS] vec postestimation Postestimation tools for vec
[TS] tsset Declare data to be time-series data
[TS] var Vector autoregressive models
[TS] var svar Structural vector autoregressive models
[U] 20 Estimation and postestimation commands

[TS] vec intro Introduction to vector error-correction models

Title
vec postestimation Postestimation tools for vec
Description
Remarks and examples

Syntax for predict


Also see

Menu for predict

Options for predict

Description
The following postestimation commands are of special interest after vec:

    Command            Description
    ------------------------------------------------------------------------
    fcast compute      obtain dynamic forecasts
    fcast graph        graph dynamic forecasts obtained from fcast compute
    irf                create and analyze IRFs and FEVDs
    veclmar            LM test for autocorrelation in residuals
    vecnorm            test for normally distributed residuals
    vecstable          check stability condition of estimates
    ------------------------------------------------------------------------

The following standard postestimation commands are also available:

    Command            Description
    ------------------------------------------------------------------------
    estat ic           Akaike's and Schwarz's Bayesian information criteria (AIC and BIC)
    estat summarize    summary statistics for the estimation sample
    estat vce          variance–covariance matrix of the estimators (VCE)
    estimates          cataloging estimation results
    forecast           dynamic forecasts and simulations
    lincom             point estimates, standard errors, testing, and inference for linear
                       combinations of coefficients
    lrtest             likelihood-ratio test
    margins            marginal means, predictive margins, marginal effects, and average
                       marginal effects
    marginsplot        graph the results from margins (profile plots, interaction plots, etc.)
    nlcom              point estimates, standard errors, testing, and inference for nonlinear
                       combinations of coefficients
    predict            predictions, residuals, influence statistics, and other diagnostic measures
    predictnl          point estimates, standard errors, testing, and inference for generalized
                       predictions
    test               Wald tests of simple and composite linear hypotheses
    testnl             Wald tests of nonlinear hypotheses
    ------------------------------------------------------------------------


Syntax for predict


    predict [type] newvar [if] [in] [, statistic equation(eqno | eqname)]

    statistic            Description
    ------------------------------------------------------------------------
    Main
      xb                 fitted value for the specified equation; the default
      stdp               standard error of the linear prediction
      residuals          residuals
      ce                 the predicted value of specified cointegrating equation
      levels             one-step prediction of the level of the endogenous variable
      usece(varlist_ce)  compute the predictions using previously predicted
                         cointegrating equations
    ------------------------------------------------------------------------
    These statistics are available both in and out of sample; type predict ... if e(sample) ... if wanted only for
    the estimation sample.

Menu for predict


Statistics > Postestimation > Predictions, residuals, etc.

Options for predict




Main

xb, the default, calculates the fitted values for the specified equation. The form of the VECM implies
that these fitted values are the one-step predictions for the first-differenced variables.
stdp calculates the standard error of the linear prediction for the specified equation.
residuals calculates the residuals from the specified equation of the VECM.
ce calculates the predicted value of the specified cointegrating equation.
levels calculates the one-step prediction of the level of the endogenous variable in the requested
equation.
usece(varlist_ce) specifies that previously predicted cointegrating equations saved under the names in
varlist_ce be used to compute the predictions. The number of variables in the varlist_ce must equal
the number of cointegrating equations specified in the model.
equation(eqno | eqname) specifies to which equation you are referring.
equation() is filled in with one eqno or eqname for xb, residuals, stdp, ce, and levels
options. equation(#1) would mean that the calculation is to be made for the first equation,
equation(#2) would mean the second, and so on. You could also refer to the equation by its
name. equation(D_income) would refer to the equation named D_income and equation(_ce1),
to the first cointegrating equation, which is named _ce1 by vec.
If you do not specify equation(), the results are as if you specified equation(#1).
For more information on using predict after multiple-equation estimation commands, see [R] predict.


Remarks and examples


Remarks are presented under the following headings:
Model selection and inference
Forecasting

Model selection and inference


See the following sections for information on model selection and inference after vec.
    [TS] irf        Create and analyze IRFs, dynamic-multiplier functions, and FEVDs
    [TS] varsoc     Obtain lag-order selection statistics for VARs and VECMs
    [TS] veclmar    Perform LM test for residual autocorrelation after vec
    [TS] vecnorm    Test for normally distributed disturbances after vec
    [TS] vecrank    Estimate the cointegrating rank of a VECM
    [TS] vecstable  Check the stability condition of VECM estimates

Forecasting
See the following sections for information on obtaining forecasts after vec:
[TS] fcast compute Compute dynamic forecasts after var, svar, or vec
[TS] fcast graph Graph forecasts after fcast compute

Also see
[TS] vec Vector error-correction models
[U] 20 Estimation and postestimation commands

[TS] vec intro Introduction to vector error-correction models


Title
veclmar Perform LM test for residual autocorrelation after vec
Syntax
Remarks and examples
Also see

Menu
Stored results

Description
Methods and formulas

Options
Reference

Syntax
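    veclmar [, options]

    options              Description
    ------------------------------------------------------------------------
    mlag(#)              use # for the maximum order of autocorrelation; default is mlag(2)
    estimates(estname)   use previously stored results estname; default is to use active results
    separator(#)         draw separator line after every # rows
    ------------------------------------------------------------------------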

veclmar can be used only after vec; see [TS] vec.


You must tsset your data before using veclmar; see [TS] tsset.

Menu
Statistics > Multivariate time series > VEC diagnostics and tests > LM test for residual autocorrelation

Description
veclmar implements a Lagrange multiplier (LM) test for autocorrelation in the residuals of vector
error-correction models (VECMs).

Options
mlag(#) specifies the maximum order of autocorrelation to be tested. The integer specified in mlag()
must be greater than 0; the default is 2.
estimates(estname) requests that veclmar use the previously obtained set of vec estimates stored
as estname. By default, veclmar uses the active results. See [R] estimates for information on
manipulating estimation results.
separator(#) specifies how many rows should appear in the table between separator lines. By
default, separator lines do not appear. For example, separator(1) would draw a line between
each row, separator(2) between every other row, and so on.

Remarks and examples


Estimation, inference, and postestimation analysis of VECMs are predicated on the errors not being
autocorrelated. veclmar implements the LM test for autocorrelation in the residuals of a VECM
discussed in Johansen (1995, 21–22). The test is performed at lags j = 1, ..., mlag(). For each j,
the null hypothesis of the test is that there is no autocorrelation at lag j.

Example 1
We fit a VECM using the regional income data described in [TS] vec and then call veclmar to test
for autocorrelation.
. use http://www.stata-press.com/data/r13/rdinc
. vec ln_ne ln_se
 (output omitted)
. veclmar, mlag(4)

   Lagrange multiplier test

      lag       chi2     df    Prob > chi2
    ----------------------------------------
        1      8.9586     4      0.06214
        2      4.9809     4      0.28926
        3      4.8519     4      0.30284
        4      0.3270     4      0.98801
    ----------------------------------------
   H0: no autocorrelation at lag order

At the 5% level, we cannot reject the null hypothesis that there is no autocorrelation in the residuals
for any of the orders tested. Thus this test finds no evidence of model misspecification.
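The p-values in the table can be reproduced from the chi2 and df columns. The Python sketch below is an independent check rather than part of veclmar; it relies on the closed-form upper-tail probability of a chi-squared distribution with even degrees of freedom, and the helper name is an assumption of this sketch.

```python
import math

def chi2_upper_tail_even_df(x, df):
    """P(X > x) for a chi-squared variable with positive even df (closed form)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):          # sum of half**k / k! for k < df/2
        term *= half / k
        total += term
    return math.exp(-half) * total

# chi2 statistics from the veclmar table above, keyed by lag; df is 4 at every lag.
lm_stats = {1: 8.9586, 2: 4.9809, 3: 4.8519, 4: 0.3270}
df = 4
p_values = {lag: chi2_upper_tail_even_df(stat, df) for lag, stat in lm_stats.items()}
rejected_at_5pct = [lag for lag, p in p_values.items() if p < 0.05]

for lag, p in p_values.items():
    print(lag, round(p, 5))
print("lags rejected at 5%:", rejected_at_5pct)
```

The list of rejected lags is empty, matching the conclusion above. The lag-1 p-value is only slightly above 0.05, so that conclusion is sensitive to the chosen level.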

Stored results
veclmar stores the following in r():
Matrices
    r(lm)        $\chi^2$, df, and p-values

Methods and formulas


Consider a VECM without any trend:

$$\Delta y_t = \alpha\beta' y_{t-1} + \sum_{i=1}^{p-1} \Gamma_i \Delta y_{t-i} + \epsilon_t$$

As discussed in [TS] vec, as long as the parameters in the cointegrating vectors, $\beta$, are exactly
identified or overidentified, the estimates of these parameters are superconsistent. This implies that
the $r \times 1$ vector of estimated cointegrating relations

$$\widehat{E}_t = \widehat{\beta}' y_t \qquad (1)$$

can be used as data with standard estimation and inference methods. When the parameters of the
cointegrating equations are not identified, (1) does not provide consistent estimates of $E_t$; in these
cases, veclmar exits with an error message.

The VECM above can be rewritten as

$$\Delta y_t = \alpha \widehat{E}_{t-1} + \sum_{i=1}^{p-1} \Gamma_i \Delta y_{t-i} + \epsilon_t$$

which is just a VAR with $p - 1$ lags where the endogenous variables have been first-differenced and
is augmented with the exogenous variables $\widehat{E}$. veclmar fits this VAR and then calls varlmar to
compute the LM test for autocorrelation.

The above discussion assumes no trend and implicitly ignores constraints on the parameters in
$\alpha$. As discussed in [TS] vec, the other four trend specifications considered by Johansen (1995, sec. 5.7)
complicate the estimation of the free parameters in $\beta$ but do not alter the basic result that the $\widehat{E}_t$ can
be used as data in the subsequent VAR. Similarly, constraints on the parameters in $\alpha$ imply that the
subsequent VAR must be estimated with these constraints applied, but $\widehat{E}_t$ can still be used as data in
the VAR.

See [TS] varlmar for more information on the Johansen LM test.
See [TS] varlmar for more information on the Johansen LM test.

Reference
Johansen, S. 1995. Likelihood-Based Inference in Cointegrated Vector Autoregressive Models. Oxford: Oxford University
Press.

Also see
[TS] vec Vector error-correction models
[TS] varlmar Perform LM test for residual autocorrelation after var or svar
[TS] vec intro Introduction to vector error-correction models

Title
vecnorm Test for normally distributed disturbances after vec
Syntax
Remarks and examples
Also see

Menu
Stored results

Description
Methods and formulas

Options
References

Syntax
    vecnorm [, options]

    options              Description
    ------------------------------------------------------------------------
    jbera                report Jarque–Bera statistic; default is to report all three statistics
    skewness             report skewness statistic; default is to report all three statistics
    kurtosis             report kurtosis statistic; default is to report all three statistics
    estimates(estname)   use previously stored results estname; default is to use active results
    dfk                  make small-sample adjustment when computing the estimated
                         variance–covariance matrix of the disturbances
    separator(#)         draw separator line after every # rows
    ------------------------------------------------------------------------

vecnorm can be used only after vec; see [TS] vec.

Menu
Statistics > Multivariate time series > VEC diagnostics and tests > Test for normally distributed disturbances

Description
vecnorm computes and reports a series of statistics against the null hypothesis that the disturbances
in a VECM are normally distributed.

Options
jbera requests that the Jarque–Bera statistic and any other explicitly requested statistic be reported.
By default, the Jarque–Bera, skewness, and kurtosis statistics are reported.
skewness requests that the skewness statistic and any other explicitly requested statistic be reported.
By default, the Jarque–Bera, skewness, and kurtosis statistics are reported.
kurtosis requests that the kurtosis statistic and any other explicitly requested statistic be reported.
By default, the Jarque–Bera, skewness, and kurtosis statistics are reported.
estimates(estname) requests that vecnorm use the previously obtained set of vec estimates stored
as estname. By default, vecnorm uses the active results. See [R] estimates for information on
manipulating estimation results.
dfk requests that a small-sample adjustment be made when computing the estimated
variance–covariance matrix of the disturbances.
separator(#) specifies how many rows should appear in the table between separator lines. By
default, separator lines do not appear. For example, separator(1) would draw a line between
each row, separator(2) between every other row, and so on.

Remarks and examples


vecnorm computes a series of test statistics of the null hypothesis that the disturbances in a VECM
are normally distributed. For each equation and all equations jointly, up to three statistics may be
computed: a skewness statistic, a kurtosis statistic, and the Jarque–Bera statistic. By default, all three
statistics are reported; if you specify only one statistic, the others are not reported. The Jarque–Bera
statistic tests skewness and kurtosis jointly. The single-equation results are against the null hypothesis
that the disturbance for that particular equation is normally distributed. The results for all the equations
are against the null that all K disturbances have a K-dimensional multivariate normal distribution.
Failure to reject the null hypothesis means that these tests provide no evidence of model misspecification.
As noted by Johansen (1995, 141), the log likelihood for the VECM is derived assuming the errors
are independently and identically distributed (i.i.d.) normal, though many of the asymptotic properties
can be derived under the weaker assumption that the errors are merely i.i.d. Many researchers still
prefer to test for normality. vecnorm uses the results from vec to produce a series of statistics against
the null hypothesis that the K disturbances in the VECM are normally distributed.

Example 1
This example uses vecnorm to test for normality after estimating the parameters of a VECM using
the regional income data.
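. use http://www.stata-press.com/data/r13/rdinc
. vec ln_ne ln_se
 (output omitted)
. vecnorm

   Jarque-Bera test

       Equation              chi2     df    Prob > chi2
    -----------------------------------------------------
        D_ln_ne              0.094     2      0.95417
        D_ln_se              0.586     2      0.74608
            ALL              0.680     4      0.95381
    -----------------------------------------------------

   Skewness test

       Equation   Skewness   chi2     df    Prob > chi2
    -----------------------------------------------------
        D_ln_ne    .05982    0.032     1      0.85890
        D_ln_se      .243    0.522     1      0.47016
            ALL              0.553     2      0.75835
    -----------------------------------------------------

   Kurtosis test

       Equation   Kurtosis   chi2     df    Prob > chi2
    -----------------------------------------------------
        D_ln_ne    3.1679    0.062     1      0.80302
        D_ln_se    2.8294    0.064     1      0.79992
            ALL              0.126     2      0.93873
    -----------------------------------------------------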

The Jarque–Bera results present test statistics for each equation and for all equations jointly
against the null hypothesis of normality. For the individual equations, the null hypothesis is that the
disturbance term in that equation has a univariate normal distribution. For all equations jointly, the
null hypothesis is that the K disturbances come from a K-dimensional normal distribution. In this
example, the single-equation and overall Jarque–Bera statistics do not reject the null of normality.

The single-equation skewness test statistics are of the null hypotheses that the disturbance term
in each equation has zero skewness, which is the skewness of a normally distributed variable. The
row marked ALL shows the results for a test that the disturbances in all equations jointly have zero
skewness. The skewness results shown above do not suggest nonnormality.
The kurtosis of a normally distributed variable is three, and the kurtosis statistics presented in the
table test the null hypothesis that the disturbance terms have kurtosis consistent with normality. The
results in this example do not reject the null hypothesis.

The statistics computed by vecnorm are based on the estimated variance–covariance matrix of the
disturbances. vec saves the ML estimate of this matrix, which vecnorm uses by default. Specifying
the dfk option instructs vecnorm to make a small-sample adjustment to the variance–covariance
matrix before computing the test statistics.
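The three tables in the example are linked: each Jarque–Bera chi2 is, up to rounding of the displayed values, the sum of the skewness and kurtosis chi2 values in the same row, which is what testing the two jointly suggests. The Python sketch below checks this and recomputes the Jarque–Bera p-values from the displayed statistics. It is an independent illustration, not vecnorm's code, and the helper name is an assumption.

```python
import math

def chi2_upper_tail_even_df(x, df):
    """P(X > x) for a chi-squared variable with positive even df (closed form)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

# Values copied from the vecnorm output above (rounded to three decimals there).
skew_chi2 = {"D_ln_ne": 0.032, "D_ln_se": 0.522, "ALL": 0.553}
kurt_chi2 = {"D_ln_ne": 0.062, "D_ln_se": 0.064, "ALL": 0.126}
jb_chi2 = {"D_ln_ne": 0.094, "D_ln_se": 0.586, "ALL": 0.680}
jb_df = {"D_ln_ne": 2, "D_ln_se": 2, "ALL": 4}

# Row-wise sum of the skewness and kurtosis statistics.
jb_sum = {eq: skew_chi2[eq] + kurt_chi2[eq] for eq in jb_chi2}
# p-values recomputed from the displayed Jarque-Bera statistics.
jb_p = {eq: chi2_upper_tail_even_df(jb_chi2[eq], jb_df[eq]) for eq in jb_chi2}

for eq in jb_chi2:
    print(eq, round(jb_sum[eq], 3), round(jb_p[eq], 4))
```

The recomputed p-values differ from the displayed ones only in the fourth decimal place because the inputs here are the rounded chi2 values.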

Stored results
vecnorm stores the following in r():
Macros
    r(dfk)         dfk, if specified
Matrices
    r(jb)          Jarque–Bera $\chi^2$, df, and p-values
    r(skewness)    skewness $\chi^2$, df, and p-values
    r(kurtosis)    kurtosis $\chi^2$, df, and p-values

Methods and formulas


As discussed in Methods and formulas of [TS] vec, a cointegrating VECM can be rewritten as a
VAR in first differences that includes the predicted cointegrating equations as exogenous variables.
vecnorm computes the tests discussed in [TS] varnorm for the corresponding augmented VAR in first
differences. See Methods and formulas of [TS] veclmar for more information on this approach.
When the parameters of the cointegrating equations are not identified, the consistent estimates
of the cointegrating equations are not available, and, in these cases, vecnorm exits with an error
message.

References
Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
Jarque, C. M., and A. K. Bera. 1987. A test for normality of observations and regression residuals. International
Statistical Review 55: 163–172.
Johansen, S. 1995. Likelihood-Based Inference in Cointegrated Vector Autoregressive Models. Oxford: Oxford University
Press.
Lütkepohl, H. 2005. New Introduction to Multiple Time Series Analysis. New York: Springer.

Also see
[TS] vec Vector error-correction models
[TS] varnorm Test for normally distributed disturbances after var or svar
[TS] vec intro Introduction to vector error-correction models

Title
vecrank Estimate the cointegrating rank of a VECM
Syntax
Remarks and examples
Also see

Menu
Stored results

Description
Methods and formulas

Options
References

Syntax
    vecrank depvarlist [if] [in] [, options]

    options                 Description
    ------------------------------------------------------------------------
    Model
      lags(#)               use # for the maximum lag in underlying VAR model
      trend(constant)       include an unrestricted constant in model; the default
      trend(rconstant)      include a restricted constant in model
      trend(trend)          include a linear trend in the cointegrating equations and a
                            quadratic trend in the undifferenced data
      trend(rtrend)         include a restricted trend in model
      trend(none)           do not include a trend or a constant

    Adv. model
      sindicators(varlist_si)  include normalized seasonal indicator variables varlist_si
      noreduce              do not perform checks and corrections for collinearity among lags
                            of dependent variables

    Reporting
      notrace               do not report the trace statistic
      max                   report maximum-eigenvalue statistic
      ic                    report information criteria
      level99               report 1% critical values instead of 5% critical values
      levela                report both 1% and 5% critical values
    ------------------------------------------------------------------------

You must tsset your data before using vecrank; see [TS] tsset.
depvar may contain time-series operators; see [U] 11.4.4 Time-series varlists.
by, rolling, and statsby are allowed; see [U] 11.1.10 Prefix commands.
vecrank does not allow gaps in the data.

Menu
Statistics > Multivariate time series > Cointegrating rank of a VECM

Description
vecrank produces statistics used to determine the number of cointegrating equations in a vector
error-correction model (VECM).

Options


Model

lags(#) specifies the number of lags in the VAR representation of the model. The VECM will include
one fewer lag of the first differences. The number of lags must be greater than zero but small
enough so that the degrees of freedom used by the model are less than the number of observations.
trend(trend_spec) specifies one of five trend specifications to include in the model. See [TS] vec
intro and [TS] vec for descriptions. The default is trend(constant).

Adv. model

sindicators(varlist_si) specifies normalized seasonal indicator variables to be included in the model.
The indicator variables specified in this option must be normalized as discussed in Johansen (1995,
84). If the indicators are not properly normalized, the likelihood-ratio-based tests for the number
of cointegrating equations do not converge to the asymptotic distributions derived by Johansen.
For details, see Methods and formulas of [TS] vec. sindicators() cannot be specified with
trend(none) or trend(rconstant).
noreduce causes vecrank to skip the checks and corrections for collinearity among the lags of
the dependent variables. By default, vecrank checks whether the current lag specification causes
some of the regressions performed by vecrank to contain perfectly collinear variables and reduces
the maximum lag until the perfect collinearity is removed. See Collinearity in [TS] vec for more
information.

Reporting

notrace requests that the output for the trace statistic not be displayed. The default is to display the
trace statistic.
max requests that the output for the maximum-eigenvalue statistic be displayed. The default is to not
display this output.
ic causes the output for the information criteria to be displayed. The default is to not display this
output.
level99 causes the 1% critical values to be displayed instead of the default 5% critical values.
levela causes both the 1% and the 5% critical values to be displayed.

Remarks and examples


Remarks are presented under the following headings:
Introduction
The trace statistic
The maximum-eigenvalue statistic
Minimizing an information criterion

Introduction
Before estimating the parameters of a VECM, you must choose the number of lags in the
underlying VAR, the trend specification, and the number of cointegrating equations. vecrank offers
several ways of determining the number of cointegrating vectors conditional on a trend specification
and lag order.


vecrank implements three types of methods for determining r, the number of cointegrating
equations in a VECM. The first is Johansen's trace statistic method. The second is his
maximum-eigenvalue statistic method. The third method chooses r to minimize an information criterion.
All three methods are based on Johansen's maximum likelihood (ML) estimator of the parameters
of a cointegrating VECM. The basic VECM is

$$\Delta y_t = \alpha\beta' y_{t-1} + \sum_{i=1}^{p-1} \Gamma_i \Delta y_{t-i} + \epsilon_t$$

where $y$ is a $(K \times 1)$ vector of I(1) variables, $\alpha$ and $\beta$ are $(K \times r)$ parameter matrices with rank
$r < K$, $\Gamma_1, \ldots, \Gamma_{p-1}$ are $(K \times K)$ matrices of parameters, and $\epsilon_t$ is a $(K \times 1)$ vector of normally
distributed errors that is serially uncorrelated but has contemporaneous covariance matrix $\Omega$.
Building on the work of Anderson (1951), Johansen (1995) derives an ML estimator for the
parameters and two likelihood-ratio (LR) tests for inference on r. These LR tests are known as the
trace statistic and the maximum-eigenvalue statistic because the log likelihood can be written as the
log of the determinant of a matrix plus a simple function of the eigenvalues of another matrix.
Let $\lambda_1, \ldots, \lambda_K$ be the $K$ eigenvalues used in computing the log likelihood at the optimum.
Furthermore, assume that these eigenvalues are sorted from the largest $\lambda_1$ to the smallest $\lambda_K$. If there
are $r < K$ cointegrating equations, $\alpha$ and $\beta$ have rank $r$ and the eigenvalues $\lambda_{r+1}, \ldots, \lambda_K$ are zero.

The trace statistic


The null hypothesis of the trace statistic is that there are no more than r cointegrating relations.
Restricting the number of cointegrating equations to be r or less implies that the remaining $K - r$
eigenvalues are zero. Johansen (1995, chap. 11 and 12) derives the distribution of the trace statistic

$$-T \sum_{i=r+1}^{K} \ln(1 - \widehat{\lambda}_i)$$

where $T$ is the number of observations and the $\widehat{\lambda}_i$ are the estimated eigenvalues. For any given value
of r, large values of the trace statistic are evidence against the null hypothesis that there are r or
fewer cointegrating relations in the VECM.
One of the problems in determining the number of cointegrating equations is that the process
involves more than one statistical test. Johansen (1995, chap. 6, 11, and 12) derives a method based
on the trace statistic that has nominal coverage despite evaluating multiple tests. This method can
be interpreted as being an estimator $\widehat{r}$ of the true number of cointegrating equations $r_0$. The method
starts testing at r = 0 and accepts as $\widehat{r}$ the first value of r for which the trace statistic fails to reject
the null.

Example 1
We have quarterly data on the natural logs of aggregate consumption, investment, and GDP in
the United States from the first quarter of 1959 through the fourth quarter of 1982. As discussed in
King et al. (1991), the balanced-growth hypothesis in economics implies that we would expect to
find two cointegrating equations among these three variables. In the output below, we use vecrank
to determine the number of cointegrating equations using Johansen's multiple-trace test method.

. use http://www.stata-press.com/data/r13/balance2
(macro data for VECM/balance study)
. vecrank y i c, lags(5)

                       Johansen tests for cointegration
Trend: constant                                         Number of obs =      91
Sample:  1960q2 - 1982q4                                         Lags =       5
-------------------------------------------------------------------------------
                                                         5%
maximum                                      trace    critical
  rank    parms       LL       eigenvalue  statistic    value
    0      39     1231.1041           .     46.1492    29.68
    1      44     1245.3882     0.26943     17.5810    15.41
    2      47     1252.5055     0.14480      3.3465*    3.76
    3      48     1254.1787     0.03611
-------------------------------------------------------------------------------

The header contains information about the sample, the trend specification, and the number of
lags included in the model. The main table contains a separate row for each possible value of r, the
number of cointegrating equations. When r = 3, all three variables in this model are stationary.
In this example, because the trace statistic at r = 0 of 46.1492 exceeds its critical value of 29.68,
we reject the null hypothesis of no cointegrating equations. Similarly, because the trace statistic at
r = 1 of 17.581 exceeds its critical value of 15.41, we reject the null hypothesis that there are one or
fewer cointegrating equations. In contrast, because the trace statistic at r = 2 of 3.3465 is less than its
critical value of 3.76, we cannot reject the null hypothesis that there are two or fewer cointegrating
equations. Because Johansen's method for estimating r is to accept as $\widehat{r}$ the first r for which the null
hypothesis is not rejected, we accept r = 2 as our estimate of the number of cointegrating equations
between these three variables. The * by the trace statistic at r = 2 indicates that this is the value
of r selected by Johansen's multiple-trace test procedure. The eigenvalue shown in the last line of
output is used to compute the trace statistic in the preceding line.
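The trace statistics in the table can be recovered from the eigenvalue column and the number of observations by using the formula given earlier. The Python sketch below does this for the displayed values; the function name is an assumption, and small discrepancies in the last digits come from the eigenvalues being rounded to five decimals in the output.

```python
import math

def trace_statistics(eigenvalues, T):
    """Trace statistic -T * sum_{i>r} ln(1 - lambda_i) for r = 0, ..., K-1.

    eigenvalues must be sorted from largest to smallest.
    """
    K = len(eigenvalues)
    return [-T * sum(math.log(1.0 - lam) for lam in eigenvalues[r:])
            for r in range(K)]

# Eigenvalues and sample size copied from the vecrank output above.
eigenvalues = [0.26943, 0.14480, 0.03611]
T = 91
trace = trace_statistics(eigenvalues, T)

for r, stat in enumerate(trace):
    print(f"r = {r}: trace statistic = {stat:.4f}")
```

The same numbers also equal twice the difference between the log likelihood at rank 3 and the log likelihood at rank r in the LL column, for example 2 × (1254.1787 − 1231.1041) = 46.1492 at r = 0, as expected for a likelihood-ratio statistic.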

Example 2
In the previous example, we used the default 5% critical values. We can estimate r with 1%
critical values instead by specifying the level99 option.
. vecrank y i c, lags(5) level99

                       Johansen tests for cointegration
Trend: constant                                         Number of obs =      91
Sample:  1960q2 - 1982q4                                         Lags =       5
-------------------------------------------------------------------------------
                                                         1%
maximum                                      trace    critical
  rank    parms       LL       eigenvalue  statistic    value
    0      39     1231.1041           .     46.1492    35.65
    1      44     1245.3882     0.26943     17.5810*   20.04
    2      47     1252.5055     0.14480      3.3465     6.65
    3      48     1254.1787     0.03611
-------------------------------------------------------------------------------

The output indicates that switching from the 5% to the 1% level changes the resulting estimate from
r = 2 to r = 1.
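The sequential procedure behind the starred rows is short enough to write out. The following Python sketch applies it to the trace statistics and critical values reported in Examples 1 and 2; the function name is hypothetical, and the sketch only mirrors the rule described in the text (start at r = 0 and stop at the first null that is not rejected).

```python
def select_rank(trace_stats, critical_values):
    """Return the first r whose trace statistic does not exceed its critical value.

    If every null hypothesis is rejected, return K = len(trace_stats).
    """
    for r, (stat, cv) in enumerate(zip(trace_stats, critical_values)):
        if stat <= cv:          # fail to reject "r or fewer cointegrating equations"
            return r
    return len(trace_stats)

# Values copied from the vecrank output in Examples 1 and 2.
trace_stats = [46.1492, 17.5810, 3.3465]
critical_5pct = [29.68, 15.41, 3.76]
critical_1pct = [35.65, 20.04, 6.65]

rank_5pct = select_rank(trace_stats, critical_5pct)
rank_1pct = select_rank(trace_stats, critical_1pct)
print(rank_5pct, rank_1pct)
```

With these inputs the rule stops at r = 2 for the 5% critical values and at r = 1 for the 1% critical values, the same rows that vecrank marks with an asterisk.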


The maximum-eigenvalue statistic


The alternative hypothesis of the trace statistic is that the number of cointegrating equations is
strictly larger than the number r assumed under the null hypothesis. Instead, we could assume a
given r under the null hypothesis and test this against the alternative that there are r + 1 cointegrating
equations. Johansen (1995, chap. 6, 11, and 12) derives an LR test of the null of r cointegrating
relations against the alternative of r + 1 cointegrating relations. Because the part of the log likelihood
that changes with r is a simple function of the eigenvalues of a (K × K) matrix, this test is known
as the maximum-eigenvalue statistic. This method is used less often than the trace statistic method
because no solution to the multiple-testing problem has yet been found.

Example 3
In the output below, we reexamine the balanced-growth hypothesis. We use the levela option to
obtain both the 5% and 1% critical values, and we use the notrace option to suppress the table of
trace statistics.
. vecrank y i c, lags(5) max levela notrace
Johansen tests for cointegration
Trend: constant                                         Number of obs =      91
Sample: 1960q2 - 1982q4                                          Lags =       5

maximum                                       max     5% critical  1% critical
  rank    parms       LL       eigenvalue  statistic     value        value
    0      39     1231.1041           .     28.5682      20.97        25.52
    1      44     1245.3882     0.26943     14.2346      14.07        18.63
    2      47     1252.5055     0.14480      3.3465       3.76         6.65
    3      48     1254.1787     0.03611

We can reject r = 1 in favor of r = 2 at the 5% level but not at the 1% level. As with the trace
statistic method, whether we choose to specify one or two cointegrating equations in our VECM will
depend on the significance level we use here.
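The comparison can be laid out row by row. This short Python sketch is illustrative only; the variable names are ours, and the numbers are copied from the output above.

```python
max_stats = [28.5682, 14.2346, 3.3465]          # rows r = 0, 1, 2
crit = {"5%": [20.97, 14.07, 3.76],
        "1%": [25.52, 18.63, 6.65]}

# For each level, list the null ranks r that are rejected in favor of r + 1.
rejected = {
    level: [r for r, (s, c) in enumerate(zip(max_stats, values)) if s > c]
    for level, values in crit.items()
}
print(rejected)
```

At the 5% level, the nulls r = 0 and r = 1 are rejected; at the 1% level, only r = 0 is rejected.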

Minimizing an information criterion


Many multiple-testing problems in the time-series literature have been solved by defining an
estimator that minimizes an information criterion with known asymptotic properties. Selecting the lag
length in an autoregressive model is probably the best-known example. Gonzalo and Pitarakis (1998)
and Aznar and Salvador (2002) have shown that this approach can be applied to determining the
number of cointegrating equations in a VECM. As in the lag-length selection problem, choosing the
number of cointegrating equations that minimizes either the Schwarz Bayesian information criterion
(SBIC) or the Hannan and Quinn information criterion (HQIC) provides a consistent estimator of the
number of cointegrating equations.

Example 4
We use these information-criteria methods to estimate the number of cointegrating equations in
our balanced-growth data.


. vecrank y i c, lags(5) ic notrace
Johansen tests for cointegration
Trend: constant                                         Number of obs =      91
Sample: 1960q2 - 1982q4                                          Lags =       5

maximum
  rank    parms       LL       eigenvalue     SBIC        HQIC        AIC
    0      39     1231.1041           .    -25.12401   -25.76596   -26.20009
    1      44     1245.3882     0.26943    -25.19009   -25.91435   -26.40414
    2      47     1252.5055     0.14480    -25.19781*  -25.97144*  -26.49463
    3      48     1254.1787     0.03611    -25.18501   -25.97511   -26.50942

Both the SBIC and the HQIC estimators suggest that there are two cointegrating equations in the
balanced-growth data.
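The criteria in the table can be checked from the LL and parms columns. The sketch below is in Python and is not Stata code. It assumes the usual per-observation forms of the AIC, SBIC, and HQIC (the formulas vecrank uses are stated in [TS] vec, not here); under that assumption, it reproduces the printed values to the displayed precision. It also assumes that the search is restricted to ranks r < K, which matches the starred rows; over all four rows, the printed HQIC at rank 3 is marginally lower than at rank 2.

```python
import math

T = 91                                   # Number of obs
parms = [39, 44, 47, 48]                 # parameters at rank 0..3
ll = [1231.1041, 1245.3882, 1252.5055, 1254.1787]

def criteria(loglik, k, n_obs):
    """Per-observation AIC, SBIC, and HQIC for one candidate rank (assumed forms)."""
    aic = (-2 * loglik + 2 * k) / n_obs
    sbic = (-2 * loglik + math.log(n_obs) * k) / n_obs
    hqic = (-2 * loglik + 2 * math.log(math.log(n_obs)) * k) / n_obs
    return aic, sbic, hqic

table = [criteria(l, k, T) for l, k in zip(ll, parms)]
candidates = range(len(table) - 1)       # assumption: ranks r < K only
sbic_rank = min(candidates, key=lambda r: table[r][1])
hqic_rank = min(candidates, key=lambda r: table[r][2])
print(sbic_rank, hqic_rank)
```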

Stored results
vecrank stores the following in e():
Scalars
    e(N)               number of observations
    e(k_eq)            number of equations in e(b)
    e(k_dv)            number of dependent variables
    e(tmin)            minimum time
    e(tmax)            maximum time
    e(n_lags)          number of lags
    e(k_ce95)          number of cointegrating equations chosen by multiple trace tests with level(95)
    e(k_ce99)          number of cointegrating equations chosen by multiple trace tests with level(99)
    e(k_cesbic)        number of cointegrating equations chosen by minimizing SBIC
    e(k_cehqic)        number of cointegrating equations chosen by minimizing HQIC
Macros
    e(cmd)             vecrank
    e(cmdline)         command as typed
    e(trend)           trend specified
    e(reduced_lags)    list of maximum lags to which the model has been reduced
    e(reduce_opt)      noreduce, if noreduce is specified
    e(tsfmt)           format for current time variable
Matrices
    e(max)             vector of maximum-eigenvalue statistics
    e(trace)           vector of trace statistics
    e(lambda)          vector of eigenvalues
    e(k_rank)          vector of numbers of unconstrained parameters
    e(hqic)            vector of HQIC values
    e(sbic)            vector of SBIC values
    e(aic)             vector of AIC values

Methods and formulas


As shown in Methods and formulas of [TS] vec, given a lag, trend, and seasonal specification
when there are 0 ≤ r ≤ K cointegrating equations, the log likelihood with the Johansen identification
restrictions can be written as

$$
L = -\frac{1}{2}\,T\left[ K\{\ln(2\pi) + 1\} + \ln(|S_{00}|) + \sum_{i=1}^{r} \ln\bigl(1 - \widehat{\lambda}_i\bigr) \right] \tag{1}
$$

where the (K × K) matrix $S_{00}$ and the eigenvalues $\widehat{\lambda}_i$ are defined in Methods and formulas of
[TS] vec.
The trace statistic compares the null hypothesis that there are r or fewer cointegrating relations with
the alternative hypothesis that there are more than r cointegrating equations. Under the alternative
hypothesis, the log likelihood is

$$
L_A = -\frac{1}{2}\,T\left[ K\{\ln(2\pi) + 1\} + \ln(|S_{00}|) + \sum_{i=1}^{K} \ln\bigl(1 - \widehat{\lambda}_i\bigr) \right] \tag{2}
$$

Thus the LR test that compares the unrestricted model in (2) with the restricted model in (1) is given
by

$$
LR_{\text{trace}} = -T \sum_{i=r+1}^{K} \ln\bigl(1 - \widehat{\lambda}_i\bigr)
$$

As discussed by Johansen (1995), the trace statistic has a nonstandard distribution under the null
hypothesis because the null hypothesis places restrictions on the coefficients on $y_{t-1}$, which is
assumed to have K − r random-walk components. vecrank reports the Osterwald-Lenum (1992)
critical values.
The maximum-eigenvalue statistic compares the null model containing r cointegrating relations
with the alternative model that has r + 1 cointegrating relations. Thus using these two values for r
in (1) and a few lines of algebra implies that the LR test of this hypothesis is

$$
LR_{\max} = -T \ln\bigl(1 - \widehat{\lambda}_{r+1}\bigr)
$$

As for the trace statistic, because this test involves restrictions on the coefficients on a vector of
I(1) variables, the test statistic's distribution will be nonstandard. vecrank reports the
Osterwald-Lenum (1992) critical values.
The formulas for the AIC, SBIC, and HQIC are given in Methods and formulas of [TS] vec.
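As a numerical check of the two LR formulas, the following Python sketch (illustrative, not Stata code; the function names are ours) recomputes both statistics from the eigenvalues and sample size reported in the examples. Because the printed eigenvalues are rounded to five decimals, the results agree with the reported statistics only to about three decimal places.

```python
import math

T = 91
eigenvalues = [0.26943, 0.14480, 0.03611]   # lambda_1 .. lambda_K, K = 3

def lr_trace(r, lam, n_obs):
    """-T * sum_{i=r+1}^{K} ln(1 - lambda_i); lam[r:] holds lambda_{r+1}..lambda_K."""
    return -n_obs * sum(math.log(1 - l) for l in lam[r:])

def lr_max(r, lam, n_obs):
    """-T * ln(1 - lambda_{r+1})."""
    return -n_obs * math.log(1 - lam[r])

for r in range(len(eigenvalues)):
    print(r, round(lr_trace(r, eigenvalues, T), 3), round(lr_max(r, eigenvalues, T), 3))
```

The trace statistic at r is the sum of the maximum-eigenvalue statistics at r, r + 1, ..., K − 1, which is why the two coincide in the last row.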

Søren Johansen (1939– ) earned degrees in mathematical statistics at the University of Copenhagen,
where he is now based. In addition to making contributions to mathematical statistics, probability
theory, and medical statistics, he has worked mostly in econometrics, in particular, on the theory
of cointegration.

References
Anderson, T. W. 1951. Estimating linear restrictions on regression coefficients for multivariate normal distributions.
Annals of Mathematical Statistics 22: 327–351.
Aznar, A., and M. Salvador. 2002. Selecting the rank of the cointegration space and the form of the intercept using
an information criterion. Econometric Theory 18: 926–947.
Engle, R. F., and C. W. J. Granger. 1987. Co-integration and error correction: Representation, estimation, and testing.
Econometrica 55: 251–276.
Gonzalo, J., and J.-Y. Pitarakis. 1998. Specification via model selection in vector error correction models. Economics
Letters 60: 321–328.
Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
Hubrich, K., H. Lütkepohl, and P. Saikkonen. 2001. A review of systems cointegration tests. Econometric Reviews
20: 247–318.
Johansen, S. 1988. Statistical analysis of cointegration vectors. Journal of Economic Dynamics and Control 12:
231–254.
Johansen, S. 1991. Estimation and hypothesis testing of cointegration vectors in Gaussian vector autoregressive models.
Econometrica 59: 1551–1580.
Johansen, S. 1995. Likelihood-Based Inference in Cointegrated Vector Autoregressive Models. Oxford: Oxford University
Press.
King, R. G., C. I. Plosser, J. H. Stock, and M. W. Watson. 1991. Stochastic trends and economic fluctuations.
American Economic Review 81: 819–840.
Lütkepohl, H. 2005. New Introduction to Multiple Time Series Analysis. New York: Springer.
Maddala, G. S., and I.-M. Kim. 1998. Unit Roots, Cointegration, and Structural Change. Cambridge: Cambridge
University Press.
Osterwald-Lenum, M. G. 1992. A note with quantiles of the asymptotic distribution of the maximum likelihood
cointegration rank test statistics. Oxford Bulletin of Economics and Statistics 54: 461–472.
Park, J. Y., and P. C. B. Phillips. 1988. Statistical inference in regressions with integrated processes: Part I. Econometric
Theory 4: 468–497.
Park, J. Y., and P. C. B. Phillips. 1989. Statistical inference in regressions with integrated processes: Part II.
Econometric Theory 5: 95–131.
Phillips, P. C. B. 1986. Understanding spurious regressions in econometrics. Journal of Econometrics 33: 311–340.
Phillips, P. C. B., and S. N. Durlauf. 1986. Multiple time series regressions with integrated processes. Review of
Economic Studies 53: 473–495.
Sims, C. A., J. H. Stock, and M. W. Watson. 1990. Inference in linear time series models with some unit roots.
Econometrica 58: 113–144.
Stock, J. H. 1987. Asymptotic properties of least squares estimators of cointegrating vectors. Econometrica 55:
1035–1056.
Stock, J. H., and M. W. Watson. 1988. Testing for common trends. Journal of the American Statistical Association
83: 1097–1107.
Watson, M. W. 1994. Vector autoregressions and cointegration. In Vol. 4 of Handbook of Econometrics, ed. R. F.
Engle and D. L. McFadden. Amsterdam: Elsevier.

Also see
[TS] tsset Declare data to be time-series data
[TS] vec Vector error-correction models
[TS] vec intro Introduction to vector error-correction models

Title
vecstable Check the stability condition of VECM estimates
Syntax                  Menu                    Description             Options
Remarks and examples    Stored results          Methods and formulas    References
Also see

Syntax
    vecstable [, options]

    options                   Description
    Main
      estimates(estname)      use previously stored results estname; default is to use active results
      amat(matrix_name)       save the companion matrix as matrix_name
      graph                   graph eigenvalues of the companion matrix
      dlabel                  label eigenvalues with the distance from the unit circle
      modlabel                label eigenvalues with the modulus
      marker_options          change look of markers (color, size, etc.)
      rlopts(cline_options)   affect rendition of reference unit circle
      nogrid                  suppress polar grid circles
      pgrid([...])            specify radii and appearance of polar grid circles; see Options for details
    Add plots
      addplot(plot)           add other plots to the generated graph
    Y axis, X axis, Titles, Legend, Overall
      twoway_options          any options other than by() documented in [G-3] twoway_options

vecstable can be used only after vec; see [TS] vec.

Menu
Statistics > Multivariate time series > VEC diagnostics and tests > Check stability condition of VEC estimates

Description
vecstable checks the eigenvalue stability condition in a vector error-correction model (VECM)
fit using vec.

Options


Main

estimates(estname) requests that vecstable use the previously obtained set of vec estimates stored
as estname. By default, vecstable uses the active results. See [R] estimates for information on
manipulating estimation results.

amat(matrix_name) specifies a valid Stata matrix name by which the companion matrix can be saved.
The companion matrix is referred to as the A matrix in Lütkepohl (2005) and [TS] varstable. The
default is not to save the companion matrix.
graph causes vecstable to draw a graph of the eigenvalues of the companion matrix.
dlabel labels the eigenvalues with their distances from the unit circle. dlabel cannot be specified
with modlabel.
modlabel labels the eigenvalues with their moduli. modlabel cannot be specified with dlabel.
marker_options specify the look of markers. This look includes the marker symbol, the marker size,
and its color and outline; see [G-3] marker_options.
rlopts(cline_options) affects the rendition of the reference unit circle; see [G-3] cline_options.
nogrid suppresses the polar grid circles.


 



pgrid([numlist] [, line_options]) [pgrid([numlist] [, line_options]) ... pgrid([numlist] [, line_options])]
determines the radii and appearance of the polar grid circles.
By default, the graph includes nine polar grid circles with radii 0.1, 0.2, ..., 0.9 that have the grid
linestyle. The numlist specifies the radii for the polar grid circles. The line_options determine the
appearance of the polar grid circles; see [G-3] line_options. Because the pgrid() option can be
repeated, circles with different radii can have distinct appearances.

Add plots

addplot(plot) adds specified plots to the generated graph; see [G-3] addplot_option.

Y axis, X axis, Titles, Legend, Overall

twoway_options are any of the options documented in [G-3] twoway_options, excluding by(). These
include options for titling the graph (see [G-3] title_options) and for saving the graph to disk (see
[G-3] saving_option).

Remarks and examples


Inference after vec requires that the cointegrating equations be stationary and that the number
of cointegrating equations be correctly specified. Although the methods implemented in vecrank
identify the number of stationary cointegrating equations, they assume that the individual variables are
I(1). vecstable provides indicators of whether the number of cointegrating equations is misspecified
or whether the cointegrating equations, which are assumed to be stationary, are not stationary.
vecstable is analogous to varstable. vecstable uses the coefficient estimates from the
previously fitted VECM to back out estimates of the coefficients of the corresponding VAR and then
computes the eigenvalues of the companion matrix. See [TS] varstable for details about how the
companion matrix is formed and about how to interpret the resulting eigenvalues for
covariance-stationary VAR models.
If a VECM has K endogenous variables and r cointegrating vectors, there will be K − r unit
moduli in the companion matrix. If any of the remaining moduli computed by vecstable are too close
to one, either the cointegrating equations are not stationary or there is another common trend and
the rank() specified in the vec command is too high. Unfortunately, there is no general distribution
theory that allows you to determine whether an estimated root is too close to one for all the cases
that commonly arise in practice.
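To see where the K − r unit moduli come from, consider the simplest case. The Python sketch below is illustrative only and is not how vecstable is implemented: it assumes a hypothetical bivariate VECM with one lag, no deterministic terms, and made-up values for the adjustment coefficients `alpha` and the cointegrating vector `beta`. With a single lag, the implied VAR(1) coefficient matrix I + αβ′ is also the companion matrix.

```python
import cmath

# Hypothetical bivariate VECM with one lag and r = 1:
#   D.y_t = alpha * beta' * y_{t-1} + e_t
alpha = [-0.2, 0.1]     # adjustment coefficients (illustrative values)
beta = [1.0, -1.0]      # cointegrating vector (illustrative values)

# Implied VAR(1) coefficient matrix A = I + alpha * beta'.
A = [[(1.0 if i == j else 0.0) + alpha[i] * beta[j] for j in range(2)]
     for i in range(2)]

# Eigenvalues of a 2 x 2 matrix from its trace and determinant.
tr = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
root = cmath.sqrt(tr * tr - 4 * det)
moduli = sorted((abs((tr + root) / 2), abs((tr - root) / 2)), reverse=True)
print(moduli)
```

One modulus equals 1 (up to floating-point error), matching K − r = 2 − 1 = 1, and the other equals 1 + β′α = 0.7, which lies inside the unit circle because the assumed cointegrating relation is stationary.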

778

vecstable Check the stability condition of VECM estimates

Example 1
In example 1 of [TS] vec, we estimated the parameters of a bivariate VECM of the natural logs
of the average disposable incomes in two of the economic regions created by the U.S. Bureau of
Economic Analysis. In that example, we concluded that the predicted cointegrating equation was
probably not stationary. Here we continue that example by refitting that model and using vecstable
to analyze the eigenvalues of the companion matrix of the corresponding VAR.
. use http://www.stata-press.com/data/r13/rdinc
. vec ln_ne ln_se
(output omitted )
. vecstable
   Eigenvalue stability condition

        Eigenvalue                  Modulus
    1                                  1
    .9477854                        .947785
    .2545357 +  .2312756i           .343914
    .2545357 -  .2312756i           .343914

   The VECM specification imposes a unit modulus.

The output contains a table showing the eigenvalues of the companion matrix and their associated
moduli. The table shows that one of the roots is 1. The table footer reminds us that the specified
VECM imposes one unit modulus on the companion matrix.
The output indicates that there is a real root at about 0.95. Although there is no distribution
theory to measure how close this root is to one, per other discussions in the literature (for example,
Johansen [1995, 137–138]), we conclude that the root of 0.95 supports our earlier analysis, in which
we concluded that the predicted cointegrating equation is probably not stationary.
If we had included the graph option with vecstable, the following graph would have been
displayed:

(Graph omitted: "Roots of the companion matrix", plotted against the unit circle. y axis: Imaginary;
x axis: Real. Note on graph: The VECM specification imposes 1 unit modulus.)

The graph plots the eigenvalues of the companion matrix with the real component on the x axis and
the imaginary component on the y axis. Although the information is the same as in the table, the
graph shows visually how close the root with modulus 0.95 is to the unit circle.
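The Modulus column can be reproduced from the eigenvalues in the table. The Python sketch below is illustrative; it takes the distance from the unit circle to be 1 minus the modulus, which is our assumption about what dlabel displays, and it counts a modulus as a unit modulus when it is within 1e-8 of 1.

```python
eigenvalues = [
    complex(1, 0),
    complex(0.9477854, 0),
    complex(0.2545357, 0.2312756),
    complex(0.2545357, -0.2312756),
]
K, r = 2, 1                      # two variables, one cointegrating equation

moduli = [abs(z) for z in eigenvalues]
distance = [1 - m for m in moduli]      # assumed definition of the distance
n_unit = sum(1 for m in moduli if abs(m - 1) < 1e-8)

for z, m, d in zip(eigenvalues, moduli, distance):
    print(f"{z:.6f}  modulus {m:.6f}  distance {d:.6f}")
print("unit moduli:", n_unit, "expected K - r =", K - r)
```

The root of interest is the second one, which lies only about 0.05 inside the unit circle.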


Stored results
vecstable stores the following in r():
Scalars
    r(unitmod)     number of unit moduli imposed on the companion matrix
Matrices
    r(Re)          real part of the eigenvalues of A
    r(Im)          imaginary part of the eigenvalues of A
    r(Modulus)     moduli of the eigenvalues of A

where A is the companion matrix of the VAR that corresponds to the VECM.

Methods and formulas


vecstable uses the formulas given in Methods and formulas of [TS] irf create to obtain estimates of
the parameters in the corresponding VAR from the vec estimates. With these estimates, the calculations
are identical to those discussed in [TS] varstable. In particular, the derivation of the companion matrix,
A, from the VAR point estimates is given in [TS] varstable.

References
Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
Johansen, S. 1995. Likelihood-Based Inference in Cointegrated Vector Autoregressive Models. Oxford: Oxford University
Press.
Lütkepohl, H. 2005. New Introduction to Multiple Time Series Analysis. New York: Springer.

Also see
[TS] vec Vector error-correction models
[TS] vec intro Introduction to vector error-correction models

Title
wntestb Bartlett's periodogram-based test for white noise
Syntax                  Menu                    Description             Options
Remarks and examples    Stored results          Methods and formulas    Acknowledgment
References              Also see

Syntax
    wntestb varname [if] [in] [, options]

    options                  Description
    Main
      table                  display a table instead of graphical output
      level(#)               set confidence level; default is level(95)
    Plot
      marker_options         change look of markers (color, size, etc.)
      marker_label_options   add marker labels; change look or position
      cline_options          add connecting lines; change look
    Add plots
      addplot(plot)          add other plots to the generated graph
    Y axis, X axis, Titles, Legend, Overall
      twoway_options         any options other than by() documented in [G-3] twoway_options

You must tsset your data before using wntestb; see [TS] tsset. In addition, the time series must be dense
(nonmissing with no gaps in the time variable) in the specified sample.
varname may contain time-series operators; see [U] 11.4.4 Time-series varlists.

Menu
Statistics > Time series > Tests > Bartlett's periodogram-based white-noise test

Description
wntestb performs Bartlett's periodogram-based test for white noise. The result is presented
graphically by default but optionally may be presented as text in a table.

Options


Main

table displays the test results as a table instead of as the default graph.
level(#) specifies the confidence level, as a percentage, for the confidence bands included on the
graph. The default is level(95) or as set by set level; see [U] 20.7 Specifying the width of
confidence intervals.

Plot

marker_options specify the look of markers. This look includes the marker symbol, the marker size,
and its color and outline; see [G-3] marker_options.
marker_label_options specify if and how the markers are to be labeled; see
[G-3] marker_label_options.
cline_options specify if the points are to be connected with lines and the rendition of those lines; see
[G-3] cline_options.

Add plots

addplot(plot) adds specified plots to the generated graph; see [G-3] addplot_option.

Y axis, X axis, Titles, Legend, Overall

twoway_options are any of the options documented in [G-3] twoway_options, excluding by(). These
include options for titling the graph (see [G-3] title_options) and for saving the graph to disk (see
[G-3] saving_option).

Remarks and examples


Bartlett's test is a test of the null hypothesis that the data come from a white-noise process of
uncorrelated random variables having a constant mean and a constant variance.
For a discussion of this test, see Bartlett (1955, 92–94), Newton (1988, 172), or Newton (1996).

Example 1
In this example, we generate two time series and show the graphical and statistical tests that can
be obtained from this command. The first time series is a white-noise process, and the second is a
white-noise process with an embedded deterministic cosine curve.
. drop _all
. set seed 12393
. set obs 100
obs was 0, now 100
. generate x1 = rnormal()
. generate x2 = rnormal() + cos(2*_pi*(_n-1)/10)
. generate time = _n
. tsset time
        time variable:  time, 1 to 100
                delta:  1 unit

We can then submit the white-noise data to the wntestb command by typing



. wntestb x1

(Graph omitted: "Cumulative Periodogram White-Noise Test". y axis: Cumulative periodogram for x1,
0.00 to 1.00; x axis: Frequency, 0.00 to 0.50. Note on graph: Bartlett's (B) statistic = 0.71,
Prob > B = 0.6957.)

We can see in the graph that the values never appear outside the confidence bands. The test statistic
has a p-value of 0.70, so we conclude that the process is not different from white noise. If we had
wanted only the statistic without the plot, we could have used the table option.
Turning our attention to the other series (x2), we type
. wntestb x2

(Graph omitted: "Cumulative Periodogram White-Noise Test". y axis: Cumulative periodogram for x2,
0.00 to 1.00; x axis: Frequency, 0.00 to 0.50. Note on graph: Bartlett's (B) statistic = 1.83,
Prob > B = 0.0024.)

Here the process does appear outside of the bands. In fact, it steps out of the bands at a frequency
of 0.1 (exactly as we synthesized this process). We also have confirmation from the test statistic, at
a p-value of 0.0024, that the process is significantly different from white noise.


Stored results
wntestb stores the following in r():
Scalars
    r(stat)     Bartlett's statistic
    r(p)        probability value

Methods and formulas


If x(1), ..., x(n) is a realization from a white-noise process with variance $\sigma^2$, the spectral
distribution would be given by $F(\omega) = \sigma^2\omega$ for $\omega \in [0, 1]$, and we would expect the cumulative
periodogram (see [TS] cumsp) of the data to be close to the points $S_k = k/q$ for $q = \lfloor n/2 \rfloor + 1$,
$k = 1, \ldots, q$. $\lfloor n/2 \rfloor$ is the greatest integer less than or equal to n/2.
Except for $\omega = 0$ and $\omega = .5$, the random variables $2\widehat{f}(\omega_k)/\sigma^2$ are asymptotically independently
and identically distributed as $\chi^2_2$. Because $\chi^2_2$ is the same as twice a random variable distributed
exponentially with mean 1, the cumulative periodogram has approximately the same distribution as
the ordered values from a uniform (on the unit interval) distribution. Feller (1948) shows that this
results in

$$
\lim_{q \to \infty} \Pr\left( \max_{1 \le k \le q} \sqrt{q}\, \left| U_k - \frac{k}{q} \right| \le a \right)
= \sum_{j=-\infty}^{\infty} (-1)^j e^{-2a^2 j^2} = G(a)
$$

where $U_k$ is the ordered uniform quantile. The Bartlett statistic is computed as

$$
B = \max_{1 \le k \le q} \sqrt{\frac{n}{2}}\, \left| \widehat{F}_k - \frac{k}{q} \right|
$$

where $\widehat{F}_k$ is the cumulative periodogram defined in terms of the sample spectral density $\widehat{f}$ (see
[TS] pergram) as

$$
\widehat{F}_k = \frac{\sum_{j=1}^{k} \widehat{f}(\omega_j)}{\sum_{j=1}^{q} \widehat{f}(\omega_j)}
$$

The associated p-value for the Bartlett statistic and the confidence bands on the graph are computed
as 1 − G(B) using Feller's result.
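The p-value 1 − G(B) is easy to evaluate numerically because the series converges quickly unless B is close to 0. The following Python sketch is illustrative only; the function name `bartlett_pvalue` and the truncation at 100 terms are our choices, not part of wntestb.

```python
import math

def bartlett_pvalue(b, terms=100):
    """1 - G(b), with G(a) = sum_{j=-inf}^{inf} (-1)^j exp(-2 a^2 j^2)."""
    # The j = 0 term of G equals 1 and the sum is symmetric in j, so
    # 1 - G(b) = 2 * sum_{j>=1} (-1)^(j-1) * exp(-2 b^2 j^2).
    tail = sum((-1) ** (j - 1) * math.exp(-2 * b * b * j * j)
               for j in range(1, terms + 1))
    return 2 * tail

print(round(bartlett_pvalue(0.7093), 4))   # B reported for x1
print(round(bartlett_pvalue(1.8323), 4))   # B reported for x2
```

Evaluated at the statistics reported for x1 and x2 in the tabular wntestb output (0.7093 and 1.8323), the sketch gives 0.6957 and 0.0024, the values shown as Prob > B.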

Maurice Stevenson Bartlett (1910–2002) was a British statistician. Apart from a short period
in industry, he spent his career teaching and researching at the universities of Cambridge,
Manchester, London (University College), and Oxford. His many contributions include work on
the statistical analysis of multivariate data (especially factor analysis) and time series and on
stochastic models of population growth, epidemics, and spatial processes.

Acknowledgment
wntestb is based on the wntestf command by H. Joseph Newton (1996) of the Department of
Statistics at Texas A&M University and coeditor of the Stata Journal.


References
Bartlett, M. S. 1955. An Introduction to Stochastic Processes with Special Reference to Methods and Applications.
Cambridge: Cambridge University Press.
Feller, W. 1948. On the Kolmogorov–Smirnov limit theorems for empirical distributions. Annals of Mathematical
Statistics 19: 177–189.
Gani, J. 2002. Professor M. S. Bartlett FRS, 1910–2002. Statistician 51: 399–402.
Newton, H. J. 1988. TIMESLAB: A Time Series Analysis Laboratory. Belmont, CA: Wadsworth.
Newton, H. J. 1996. sts12: A periodogram-based test for white noise. Stata Technical Bulletin 34: 36–39. Reprinted in
Stata Technical Bulletin Reprints, vol. 6, pp. 203–207. College Station, TX: Stata Press.
Olkin, I. 1989. A conversation with Maurice Bartlett. Statistical Science 4: 151–163.

Also see
[TS] tsset Declare data to be time-series data
[TS] corrgram Tabulate and graph autocorrelations
[TS] cumsp Cumulative spectral distribution
[TS] pergram Periodogram
[TS] wntestq Portmanteau (Q) test for white noise

Title
wntestq Portmanteau (Q) test for white noise
Syntax                  Menu                    Description             Option
Remarks and examples    Stored results          Methods and formulas    References
Also see

Syntax
    wntestq varname [if] [in] [, lags(#)]

You must tsset your data before using wntestq; see [TS] tsset. Also the time series must be dense (nonmissing
with no gaps in the time variable) in the specified sample.
varname may contain time-series operators; see [U] 11.4.4 Time-series varlists.

Menu
Statistics > Time series > Tests > Portmanteau white-noise test

Description
wntestq performs the portmanteau (or Q) test for white noise.

Option
lags(#) specifies the number of autocorrelations to calculate. The default is to use min(⌊n/2⌋ − 2, 40),
where ⌊n/2⌋ is the greatest integer less than or equal to n/2.

Remarks and examples


Box and Pierce (1970) developed a portmanteau test of white noise that was refined by Ljung and
Box (1978). See also Diggle (1990, sec. 2.5).

Example 1
In the example shown in [TS] wntestb, we generated two time series. One (x1) was a white-noise
process, and the other (x2) was a white-noise process with an embedded cosine curve. Here we
compare the output of the two tests.
. drop _all
. set seed 12393
. set obs 100
obs was 0, now 100
. generate x1 = rnormal()
. generate x2 = rnormal() + cos(2*_pi*(_n-1)/10)
. generate time = _n
. tsset time
time variable: time, 1 to 100
delta: 1 unit



. wntestb x1, table
Cumulative periodogram white-noise test

   Bartlett's (B) statistic =     0.7093
   Prob > B                 =     0.6957

. wntestq x1
Portmanteau test for white noise

 Portmanteau (Q) statistic =    32.6863
 Prob > chi2(40)           =     0.7875

. wntestb x2, table
Cumulative periodogram white-noise test

   Bartlett's (B) statistic =     1.8323
   Prob > B                 =     0.0024

. wntestq x2
Portmanteau test for white noise

 Portmanteau (Q) statistic =   129.4436
 Prob > chi2(40)           =     0.0000

This example shows that both tests agree. For the first process, the Bartlett and portmanteau tests
result in nonsignificant test statistics: a p-value of 0.6957 for wntestb and one of 0.7875 for wntestq.
For the second process, both tests are significant at the 1% level, with p-values of 0.0024 for wntestb
and 0.0000 (to four decimal places) for wntestq.

Stored results
wntestq stores the following in r():
Scalars
    r(stat)     Q statistic
    r(df)       degrees of freedom
    r(p)        probability value

Methods and formulas


The portmanteau test relies on the fact that if x(1), ..., x(n) is a realization from a white-noise
process, then

$$
Q = n(n + 2) \sum_{j=1}^{m} \frac{1}{n - j}\, \widehat{\rho}^{\,2}(j) \longrightarrow \chi^2_m
$$

where m is the number of autocorrelations calculated (equal to the number of lags specified) and
$\longrightarrow$ indicates convergence in distribution to a $\chi^2$ distribution with m degrees of freedom.
$\widehat{\rho}(j)$ is the estimated autocorrelation for lag j; see [TS] corrgram for details.
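The statistic and its p-value can be sketched in a few lines of Python. This is an illustration, not the wntestq implementation: the function names are ours, the autocorrelations use the usual estimator (lagged cross-products of deviations from the overall mean, divided by the sum of squared deviations; see [TS] corrgram for the definition Stata uses), and the chi-squared tail probability uses the closed form that holds only for even degrees of freedom.

```python
import math

def ljung_box_q(x, m):
    """Q = n(n+2) * sum_{j=1}^{m} rho_hat(j)^2 / (n - j)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    q = 0.0
    for j in range(1, m + 1):
        rho = sum(dev[t] * dev[t + j] for t in range(n - j)) / c0
        q += rho * rho / (n - j)
    return n * (n + 2) * q

def chi2_upper_tail_even(x, df):
    """P(chi2_df > x) for even df: exp(-x/2) * sum_{i<df/2} (x/2)^i / i!."""
    if df % 2:
        raise ValueError("closed form used here needs an even df")
    half = x / 2
    term, total = 1.0, 0.0
    for i in range(df // 2):
        total += term
        term *= half / (i + 1)
    return math.exp(-half) * total

def default_lags(n):
    """Default number of lags: min(floor(n/2) - 2, 40)."""
    return min(n // 2 - 2, 40)

print(default_lags(100))
print(round(chi2_upper_tail_even(32.6863, 40), 4))
```

With n = 100, the default is 40 lags, which is why the example output reports chi2(40); the tail probability at Q = 32.6863 agrees with the reported Prob > chi2(40) of 0.7875.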


References
Box, G. E. P., and D. A. Pierce. 1970. Distribution of residual autocorrelations in autoregressive-integrated moving
average time series models. Journal of the American Statistical Association 65: 1509–1526.
Diggle, P. J. 1990. Time Series: A Biostatistical Introduction. Oxford: Oxford University Press.
Ljung, G. M., and G. E. P. Box. 1978. On a measure of lack of fit in time series models. Biometrika 65: 297–303.
Sperling, R. I., and C. F. Baum. 2001. sts19: Multivariate portmanteau (Q) test for white noise. Stata Technical
Bulletin 60: 39–41. Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp. 373–375. College Station, TX:
Stata Press.

Also see
[TS] tsset Declare data to be time-series data
[TS] corrgram Tabulate and graph autocorrelations
[TS] cumsp Cumulative spectral distribution
[TS] pergram Periodogram
[TS] wntestb Bartlett's periodogram-based test for white noise

Title
xcorr Cross-correlogram for bivariate time series
Syntax                  Menu                    Description             Options
Remarks and examples    Methods and formulas    References              Also see

Syntax
    xcorr varname1 varname2 [if] [in] [, options]

    options                  Description
    Main
      generate(newvar)       create newvar containing cross-correlation values
      table                  display a table instead of graphical output
      noplot                 do not include the character-based plot in tabular output
      lags(#)                include # lags and leads in graph
    Plot
      base(#)                value to drop to; default is 0
      marker_options         change look of markers (color, size, etc.)
      marker_label_options   add marker labels; change look or position
      line_options           change look of dropped lines
    Add plots
      addplot(plot)          add other plots to the generated graph
    Y axis, X axis, Titles, Legend, Overall
      twoway_options         any options other than by() documented in [G-3] twoway_options

You must tsset your data before using xcorr; see [TS] tsset.
varname1 and varname2 may contain time-series operators; see [U] 11.4.4 Time-series varlists.

Menu
Statistics > Time series > Graphs > Cross-correlogram for bivariate time series

Description
xcorr plots the sample cross-correlation function.

Options


Main

generate(newvar) specifies a new variable to contain the cross-correlation values.


table requests that the results be presented as a table rather than the default graph.
noplot requests that the table not include the character-based plot of the cross-correlations.

lags(#) indicates the number of lags and leads to include in the graph. The default is to use
min(⌊n/2⌋ − 2, 20).

Plot

base(#) specifies the value from which the lines should extend. The default is base(0).
marker_options, marker_label_options, and line_options affect the rendition of the plotted
cross-correlations.
marker_options specify the look of markers. This look includes the marker symbol, the marker
size, and its color and outline; see [G-3] marker_options.
marker_label_options specify if and how the markers are to be labeled; see
[G-3] marker_label_options.
line_options specify the look of the dropped lines, including pattern, width, and color; see
[G-3] line_options.

Add plots

addplot(plot) provides a way to add other plots to the generated graph; see [G-3] addplot_option.

Y axis, X axis, Titles, Legend, Overall

twoway_options are any of the options documented in [G-3] twoway_options, excluding by(). These
include options for titling the graph (see [G-3] title_options) and for saving the graph to disk (see
[G-3] saving_option).

Remarks and examples


Example 1
We have a bivariate time series (Box, Jenkins, and Reinsel 2008, Series J) on the input and output
of a gas furnace, where 296 paired observations on the input (gas rate) and output (% CO2 ) were
recorded every 9 seconds. The cross-correlation function is given by
. use http://www.stata-press.com/data/r13/furnace
(TIMESLAB: Gas furnace)
. xcorr input output, xline(5) lags(40)

1.00
0.50
0.00
0.50

40

20

0
Lag

20

40

1.00

1.00

Crosscorrelations of input and output


0.50
0.00
0.50
1.00

Crosscorrelogram


We included a vertical line at lag 5, because there is a well-defined peak at this value. This peak
indicates that the output lags the input by five periods. Further, the fact that the correlations are
negative indicates that as input (coded gas rate) is increased, output (% CO2 ) decreases.
We may obtain the table of cross-correlations and the character-based plot of the cross-correlations
(analogous to the univariate time-series command corrgram) by specifying the table option.
. xcorr input output, table
                 -1       0       1
 LAG      CORR   [Cross-correlation]
 -20   -0.1033
 -19   -0.1027
 -18   -0.0998
 -17   -0.0932
 -16   -0.0832
 -15   -0.0727
 -14   -0.0660
 -13   -0.0662
 -12   -0.0751
 -11   -0.0927
 -10   -0.1180
  -9   -0.1484
  -8   -0.1793
  -7   -0.2059
  -6   -0.2266
  -5   -0.2429
  -4   -0.2604
  -3   -0.2865
  -2   -0.3287
  -1   -0.3936
   0   -0.4845
   1   -0.5985
   2   -0.7251
   3   -0.8429
   4   -0.9246
   5   -0.9503
   6   -0.9146
   7   -0.8294
   8   -0.7166
   9   -0.5998
  10   -0.4952
  11   -0.4107
  12   -0.3479
  13   -0.3049
  14   -0.2779
  15   -0.2632
  16   -0.2548
  17   -0.2463
  18   -0.2332
  19   -0.2135
  20   -0.1869
(character-based plot not reproduced)

Once again, the well-defined peak is apparent in the plot.


Methods and formulas


The cross-covariance function of lag $k$ for time series $x_1$ and $x_2$ is given by

$$\mathrm{Cov}\{x_1(t),\, x_2(t+k)\} = R_{12}(k)$$

This function is not symmetric about lag zero; that is,

$$R_{12}(k) \neq R_{12}(-k)$$

We define the cross-correlation function as

$$\rho_{ij}(k) = \mathrm{Corr}\{x_i(t),\, x_j(t+k)\} = \frac{R_{ij}(k)}{\sqrt{R_{ii}(0)\,R_{jj}(0)}}$$

where $\rho_{11}$ and $\rho_{22}$ are the autocorrelation functions for $x_1$ and $x_2$, respectively. The sequence $\rho_{12}(k)$
is the cross-correlation function and is drawn for lags $k \in (-Q, -Q+1, \ldots, -1, 0, 1, \ldots, Q-1, Q)$.
If $\rho_{12}(k) = 0$ for all lags, $x_1$ and $x_2$ are not cross-correlated.
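The formulas above can be illustrated with a minimal Python sketch (Python rather than Stata, and not a description of how xcorr is implemented). It replaces $R_{ij}(k)$ with sample cross-covariances about the sample means. The function name cross_correlation, the divisor n, and the data are assumptions made for illustration.

```python
from math import sqrt

def cross_correlation(x1, x2, k):
    """Sample analogue of rho_12(k) = R_12(k) / sqrt(R_11(0) * R_22(0)).

    R_12(k) is estimated as (1/n) * sum over t of
    (x1[t] - mean1) * (x2[t + k] - mean2), using only the pairs for which
    both indices are in range. The divisor n is an assumption of this sketch.
    """
    n = len(x1)
    if len(x2) != n or n == 0:
        raise ValueError("series must be nonempty and of equal length")
    m1 = sum(x1) / n
    m2 = sum(x2) / n
    r11 = sum((v - m1) ** 2 for v in x1) / n
    r22 = sum((v - m2) ** 2 for v in x2) / n
    lo = max(0, -k)          # first t with t + k >= 0
    hi = min(n, n - k)       # one past the last t with t + k <= n - 1
    r12 = sum((x1[t] - m1) * (x2[t + k] - m2) for t in range(lo, hi)) / n
    return r12 / sqrt(r11 * r22)

# Hypothetical data: x2 is x1 delayed by two periods.
x1 = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0, 0.0, 1.0]
x2 = [0.0, 0.0] + x1[:-2]
lags = range(-4, 5)
rho = {k: cross_correlation(x1, x2, k) for k in lags}
peak_lag = max(lags, key=lambda k: rho[k])
```

In this made-up example the largest cross-correlation occurs at lag 2, the delay built into x2, and the values at lags 2 and -2 differ, which illustrates the asymmetry about lag zero.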

References
Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. 2008. Time Series Analysis: Forecasting and Control. 4th ed.
Hoboken, NJ: Wiley.
Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
Newton, H. J. 1988. TIMESLAB: A Time Series Analysis Laboratory. Belmont, CA: Wadsworth.

Also see
[TS] tsset Declare data to be time-series data
[TS] corrgram Tabulate and graph autocorrelations
[TS] pergram Periodogram

Glossary
add factor. An add factor is a quantity added to an endogenous variable in a forecast model. Add
factors can be used to incorporate outside information into a model, and they can be used to
produce forecasts under alternative scenarios.
ARCH model. An autoregressive conditional heteroskedasticity (ARCH) model is a regression model
in which the conditional variance is modeled as an autoregressive (AR) process. The ARCH(m)
model is

$$y_t = \mathbf{x}_t\boldsymbol{\beta} + \epsilon_t$$

$$E(\epsilon_t^2 \mid \epsilon_{t-1}^2, \epsilon_{t-2}^2, \ldots) = \alpha_0 + \alpha_1\epsilon_{t-1}^2 + \cdots + \alpha_m\epsilon_{t-m}^2$$

where $\epsilon_t$ is a white-noise error term. The equation for $y_t$ represents the conditional mean of
the process, and the equation for $E(\epsilon_t^2 \mid \epsilon_{t-1}^2, \epsilon_{t-2}^2, \ldots)$ specifies the conditional variance as an
autoregressive function of its past realizations. Although the conditional variance changes over
time, the unconditional variance is time invariant because $y_t$ is a stationary process. Modeling
the conditional variance as an AR process raises the implied unconditional variance, making this
model particularly appealing to researchers modeling fat-tailed data, such as financial data.
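To make the conditional-variance equation concrete, here is a minimal Python sketch (illustrative only; it does not describe how Stata's arch command works). It evaluates the ARCH(m) conditional variance period by period from a given series of errors. The function name arch_conditional_variance and the sample values are hypothetical.

```python
def arch_conditional_variance(eps, alpha0, alphas):
    """Return the ARCH(m) conditional variances for t = m, ..., len(eps) - 1.

    alphas holds (alpha_1, ..., alpha_m); the variance at time t is
    alpha0 + alpha_1 * eps[t-1]**2 + ... + alpha_m * eps[t-m]**2.
    """
    m = len(alphas)
    out = []
    for t in range(m, len(eps)):
        h = alpha0
        for j, a in enumerate(alphas, start=1):
            h += a * eps[t - j] ** 2
        out.append(h)
    return out

# Hypothetical errors and ARCH(1) parameters.
eps = [1.0, -2.0, 0.5, 0.0]
h = arch_conditional_variance(eps, alpha0=0.2, alphas=[0.5])
```

In this example the large error of -2.0 raises the following period's conditional variance well above the others, which is the clustering of volatility that the model is designed to capture.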
ARFIMA model. An autoregressive fractionally integrated moving-average (ARFIMA) model is a
time-series model suitable for use with long-memory processes. ARFIMA models generalize autoregressive
integrated moving-average (ARIMA) models by allowing the differencing parameter to be a real
number in $(-0.5, 0.5)$ instead of requiring it to be an integer.
ARIMA model. An autoregressive integrated moving-average (ARIMA) model is a time-series model
suitable for use with integrated processes. In an ARIMA(p, d, q) model, the data is differenced d
times to obtain a stationary series, and then an ARMA(p, q) model is fit to this differenced data.
ARIMA models that include exogenous explanatory variables are known as ARMAX models.
ARMA model. An autoregressive moving-average (ARMA) model is a time-series model in which
the current period's realization is the sum of an autoregressive (AR) process and a moving-average
(MA) process. An ARMA(p, q) model includes p AR terms and q MA terms. ARMA models with
just a few lags are often able to fit data as well as pure AR or MA models with many more lags.
ARMAX model. An ARMAX model is a time-series model in which the current period's realization is
an ARMA process plus a linear function of a set of exogenous variables. Equivalently, an ARMAX
model is a linear regression model in which the error term is specified to follow an ARMA process.
autocorrelation function. The autocorrelation function (ACF) expresses the correlation between periods
$t$ and $t - k$ of a time series as a function of the time $t$ and the lag $k$. For a stationary time series,
the ACF does not depend on $t$ and is symmetric about $k = 0$, meaning that the correlation between
periods $t$ and $t - k$ is equal to the correlation between periods $t$ and $t + k$.
autoregressive process. An autoregressive process is a time-series model in which the current value
of a variable is a linear function of its own past values and a white-noise error term. A first-order
autoregressive process, denoted as an AR(1) process, is $y_t = \rho y_{t-1} + \epsilon_t$. An AR(p) model contains
p lagged values of the dependent variable.
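As a small illustration of the AR(1) recursion, the following Python sketch generates a path from a given sequence of shocks. The function name ar1_path, the starting value, and the shock values are hypothetical.

```python
def ar1_path(rho, shocks, y_start=0.0):
    """Generate y_t = rho * y_{t-1} + shock_t, starting from y_start."""
    path = []
    prev = y_start
    for e in shocks:
        prev = rho * prev + e
        path.append(prev)
    return path

# A single unit shock followed by zeros (hypothetical values).
path = ar1_path(0.5, [1.0, 0.0, 0.0, 0.0])
```

With a coefficient of 0.5, the effect of the single shock halves each period, so the path decays geometrically toward zero.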
band-pass filter. Time-series filters are designed to pass or block stochastic cycles at specified
frequencies. Band-pass filters, such as those implemented in tsfilter bk and tsfilter cf,
pass through stochastic cycles in the specified range of frequencies and block all other stochastic
cycles.

Cholesky ordering. Cholesky ordering is a method used to orthogonalize the error term in a VAR or
VECM to impose a recursive structure on the dynamic model, so that the resulting impulse–response
functions can be given a causal interpretation. The method is so named because it uses the Cholesky
decomposition of the error-covariance matrix.
Cochrane–Orcutt estimator. This estimator is a linear regression estimator that can be used when the
error term exhibits first-order autocorrelation. An initial estimate of the autocorrelation parameter $\rho$
is obtained from OLS residuals, and then OLS is performed on the transformed data
$\tilde{y}_t = y_t - \rho y_{t-1}$ and $\tilde{\mathbf{x}}_t = \mathbf{x}_t - \rho \mathbf{x}_{t-1}$.
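The two steps named in this entry can be sketched in Python as follows. This is only an illustration: the estimator of $\rho$ shown (a regression of each residual on its own lag, without a constant) is one common choice and is an assumption here, and the function names and data are hypothetical.

```python
def estimate_rho(resid):
    """Estimate first-order autocorrelation by regressing e_t on e_{t-1}.

    This particular estimator is an assumption of the sketch.
    """
    num = sum(resid[t] * resid[t - 1] for t in range(1, len(resid)))
    den = sum(r * r for r in resid[:-1])
    return num / den

def quasi_difference(series, rho):
    """Return series[t] - rho * series[t-1] for t = 1, ..., n - 1."""
    return [series[t] - rho * series[t - 1] for t in range(1, len(series))]

resid = [1.0, 0.5, 0.25, 0.125]          # hypothetical OLS residuals
rho_hat = estimate_rho(resid)
y_tilde = quasi_difference([1.0, 2.0, 3.0, 4.0], 0.5)
```

The transformed series has one fewer observation than the original because the first period has no lagged value.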
cointegrating vector. A cointegrating vector specifies a stationary linear combination of nonstationary
variables. Specifically, if each of the variables $x_1, x_2, \ldots, x_k$ is integrated of order one and there
exists a set of parameters $\beta_1, \beta_2, \ldots, \beta_k$ such that $z_t = \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$ is a stationary
process, the variables $x_1, x_2, \ldots, x_k$ are said to be cointegrated, and the vector $\boldsymbol{\beta}$ is known as a
cointegrating vector.
conditional variance. Although the conditional variance is simply the variance of a conditional
distribution, in time-series analysis the conditional variance is often modeled as an autoregressive
process, giving rise to ARCH models.
correlogram. A correlogram is a table or graph showing the sample autocorrelations or partial
autocorrelations of a time series.
covariance stationarity. A process is covariance stationary if the mean of the process is finite and
independent of $t$, the unconditional variance of the process is finite and independent of $t$, and the
covariance between periods $t$ and $t - s$ is finite and depends on $s$ but not on $t$.
Covariance-stationary processes are also known as weakly stationary processes.
cross-correlation function. The cross-correlation function expresses the correlation between one series
at time $t$ and another series at time $t - k$ as a function of the time $t$ and lag $k$. If both series
are stationary, the function does not depend on $t$. The function is not symmetric about $k = 0$:
$\rho_{12}(k) \neq \rho_{12}(-k)$.
cyclical component. A cyclical component is a part of a time series that is a periodic function of
time. Deterministic functions of time are deterministic cyclical components, and random functions
of time are stochastic cyclical components. For example, fixed seasonal effects are deterministic
cyclical components, and random seasonal effects are stochastic cyclical components.

Random coefficients on time inside of periodic functions form an especially useful class of stochastic
cyclical components; see [TS] ucm.
deterministic trend. A deterministic trend is a deterministic function of time that specifies the long-run
tendency of a time series.
difference operator. The difference operator $\Delta$ denotes the change in the value of a variable
from period $t - 1$ to period $t$. Formally, $\Delta y_t = y_t - y_{t-1}$, and $\Delta^2 y_t = \Delta(y_t - y_{t-1}) =
(y_t - y_{t-1}) - (y_{t-1} - y_{t-2}) = y_t - 2y_{t-1} + y_{t-2}$.
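The identity for the second difference can be checked numerically with a short Python sketch. The function name difference and the data are hypothetical.

```python
def difference(y, order=1):
    """Apply the first-difference operator `order` times."""
    out = list(y)
    for _ in range(order):
        out = [out[t] - out[t - 1] for t in range(1, len(out))]
    return out

y = [1, 4, 9, 16, 25]
d1 = difference(y)        # first differences
d2 = difference(y, 2)     # second differences
# Direct evaluation of y_t - 2*y_{t-1} + y_{t-2}
expanded = [y[t] - 2 * y[t - 1] + y[t - 2] for t in range(2, len(y))]
```

Each application of the operator shortens the series by one observation, and d2 agrees with the expanded form term by term.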
drift. Drift is the constant term in a unit-root process. In

$$y_t = \mu + y_{t-1} + \epsilon_t$$

$\mu$ is the drift when $\epsilon_t$ is a stationary, zero-mean process.
dynamic forecast. A dynamic forecast uses forecast values wherever lagged values of the endogenous
variables appear in the model, allowing one to forecast multiple periods into the future.
dynamic-multiplier function. A dynamic-multiplier function measures the effect of a shock to an
exogenous variable on an endogenous variable. The kth dynamic-multiplier function of variable i
on variable j measures the effect on variable j in period t + k in response to a one-unit shock to
variable i in period t, holding everything else constant.
endogenous variable. An endogenous variable is a regressor that is correlated with the unobservable
error term. Equivalently, an endogenous variable is one whose values are determined by the
equilibrium or outcome of a structural model.
exogenous variable. An exogenous variable is a regressor that is not correlated with any of the
unobservable error terms in the model. Equivalently, an exogenous variable is one whose values
change independently of the other variables in a structural model.
exponential smoothing. Exponential smoothing is a method of smoothing a time series in which the
smoothed value at period $t$ is equal to a fraction $\alpha$ of the series value at time $t$ plus a fraction $1 - \alpha$
of the previous period's smoothed value. The fraction $\alpha$ is known as the smoothing parameter.
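The recursion described in this entry can be sketched in Python as follows. The function name exponential_smooth is hypothetical, and starting the recursion from the first observation is an assumption of the sketch (the entry does not say how the recursion is started).

```python
def exponential_smooth(x, alpha, s0=None):
    """Return s_t = alpha * x_t + (1 - alpha) * s_{t-1} for each t.

    s0 is the smoothed value assumed before the first observation;
    defaulting it to x[0] is an assumption of this sketch.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    prev = x[0] if s0 is None else s0
    out = []
    for v in x:
        prev = alpha * v + (1 - alpha) * prev
        out.append(prev)
    return out

smoothed = exponential_smooth([10.0, 20.0, 30.0], alpha=0.5)
```

A larger smoothing parameter puts more weight on the current observation, so the smoothed series tracks the data more closely.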
forecast-error variance decomposition. Forecast-error variance decompositions measure the fraction
of the error in forecasting variable i after h periods that is attributable to the orthogonalized shocks
to variable j .
forward operator. The forward operator $F$ denotes the value of a variable at time $t + 1$. Formally,
$F y_t = y_{t+1}$, and $F^2 y_t = F y_{t+1} = y_{t+2}$.
frequency-domain analysis. Frequency-domain analysis is analysis of time-series data by considering
its frequency properties. The spectral density and distribution functions are key components of
frequency-domain analysis, so it is often called spectral analysis. In Stata, the cumsp and pergram
commands are used to analyze the sample spectral distribution and density functions, respectively.
psdensity estimates the spectral density or the spectral distribution function after estimating the
parameters of a parametric model using arfima, arima, or ucm.
gain (of a linear filter). The gain of a linear filter scales the spectral density of the unfiltered series
into the spectral density of the filtered series for each frequency. Specifically, at each frequency,
multiplying the spectral density of the unfiltered series by the square of the gain of a linear filter
yields the spectral density of the filtered series. If the gain at a particular frequency is 1, the filtered
and unfiltered spectral densities are the same at that frequency and the corresponding stochastic
cycles are passed through perfectly. If the gain at a particular frequency is 0, the filter removes
all the corresponding stochastic cycles from the unfiltered series.
GARCH model. A generalized autoregressive conditional heteroskedasticity (GARCH) model is a
regression model in which the conditional variance is modeled as an ARMA process. The GARCH(m, k)
model is

$$y_t = \mathbf{x}_t\boldsymbol{\beta} + \epsilon_t$$

$$\sigma_t^2 = \gamma_0 + \gamma_1\epsilon_{t-1}^2 + \cdots + \gamma_m\epsilon_{t-m}^2 + \delta_1\sigma_{t-1}^2 + \cdots + \delta_k\sigma_{t-k}^2$$

where the equation for $y_t$ represents the conditional mean of the process and $\sigma_t^2$ represents the
conditional variance. See [TS] arch or Hamilton (1994, chap. 21) for details on how the conditional
variance equation can be viewed as an ARMA process. GARCH models are often used because the
ARMA specification often allows the conditional variance to be modeled with fewer parameters
than are required by a pure ARCH model. Many extensions to the basic GARCH model exist; see
[TS] arch for those that are implemented in Stata. See also ARCH model.
generalized least-squares estimator. A generalized least-squares (GLS) estimator is used to estimate
the parameters of a regression function when the error term is heteroskedastic or autocorrelated.
In the linear case, GLS is sometimes described as OLS on transformed data because the GLS
estimator can be implemented by applying an appropriate transformation to the dataset and then
using OLS.


Granger causality. The variable x is said to Granger-cause variable y if, given the past values of y,
past values of x are useful for predicting y.
high-pass filter. Time-series filters are designed to pass or block stochastic cycles at specified
frequencies. High-pass filters, such as those implemented in tsfilter bw and tsfilter hp, pass
through stochastic cycles above the cutoff frequency and block all other stochastic cycles.
Holt–Winters smoothing. Holt–Winters smoothing is a set of methods for smoothing time-series data
that assume that the value of a time series at time t can be approximated as the sum of a mean term
that drifts over time and a time trend whose strength also drifts over time. Variations of the basic
method allow for seasonal patterns in data as well.
impulse–response function. An impulse–response function (IRF) measures the effect of a shock to an
endogenous variable on itself or another endogenous variable. The kth impulse–response function
of variable i on variable j measures the effect on variable j in period t + k in response to a
one-unit shock to variable i in period t, holding everything else constant.
independent and identically distributed. A series of observations is independently and identically
distributed (i.i.d.) if each observation is an independent realization from the same underlying
distribution. In some contexts, the definition is relaxed to mean only that the observations are
independent and have identical means and variances; see Davidson and MacKinnon (1993, 42).
integrated process. A nonstationary process is integrated of order d, written I(d), if the process must
be differenced d times to produce a stationary series. An I(1) process $y_t$ is one in which $\Delta y_t$ is
stationary.
Kalman filter. The Kalman filter is a recursive procedure for predicting the state vector in a state-space
model.
lag operator. The lag operator $L$ denotes the value of a variable at time $t - 1$. Formally, $L y_t = y_{t-1}$,
and $L^2 y_t = L y_{t-1} = y_{t-2}$.
linear filter. A linear filter is a sequence of weights used to compute a weighted average of a time
series at each time period. More formally, a linear filter $\alpha(L)$ is

$$\alpha(L) = \alpha_0 + \alpha_1 L + \alpha_2 L^2 + \cdots = \sum_{\tau=0}^{\infty} \alpha_\tau L^\tau$$

where $L$ is the lag operator. Applying the linear filter $\alpha(L)$ to the time series $x_t$ yields a sequence
of weighted averages of $x_t$:

$$\alpha(L)x_t = \sum_{\tau=0}^{\infty} \alpha_\tau L^\tau x_t$$
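A finite version of this weighted average can be sketched in Python. The sketch assumes that only the first few weights are nonzero, which truncates the infinite sum in the definition; the function name apply_linear_filter and the data are hypothetical.

```python
def apply_linear_filter(weights, x):
    """Return sum over j of weights[j] * x[t - j] for each usable t.

    weights[j] is the weight on the j-th lag. Only periods for which all
    required lags exist are returned; truncating the filter to finitely
    many weights is an assumption of this sketch.
    """
    q = len(weights) - 1
    return [
        sum(w * x[t - j] for j, w in enumerate(weights))
        for t in range(q, len(x))
    ]

# Equal weights on the current and previous value (hypothetical example).
filtered = apply_linear_filter([0.5, 0.5], [2.0, 4.0, 6.0, 8.0])
```

With weights of 0.5 on the current value and 0.5 on the first lag, each output is the average of two adjacent observations.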

long-memory process. A long-memory process is a stationary process whose autocorrelations decay
at a slower rate than those of a short-memory process. ARFIMA models are typically used to represent
long-memory processes, and ARMA models are typically used to represent short-memory processes.
moving-average process. A moving-average process is a time-series process in which the current
value of a variable is modeled as a weighted average of current and past realizations of a
white-noise process and, optionally, a time-invariant constant. By convention, the weight on the current
realization of the white-noise process is equal to one, and the weights on the past realizations are
known as the moving-average (MA) coefficients. A first-order moving-average process, denoted as
an MA(1) process, is $y_t = \theta\epsilon_{t-1} + \epsilon_t$.
multivariate GARCH models. Multivariate GARCH models are multivariate time-series models in
which the conditional covariance matrix of the errors depends on its own past and its past shocks.
The acute trade-off between parsimony and flexibility has given rise to a plethora of models; see
[TS] mgarch.
Newey–West covariance matrix. The Newey–West covariance matrix is a member of the class of
heteroskedasticity- and autocorrelation-consistent (HAC) covariance matrix estimators used with
time-series data that produces covariance estimates that are robust to both arbitrary heteroskedasticity
and autocorrelation up to a prespecified lag.
one-step-ahead forecast. See static forecast.
orthogonalized impulse–response function. An orthogonalized impulse–response function (OIRF)
measures the effect of an orthogonalized shock to an endogenous variable on itself or another
endogenous variable. An orthogonalized shock is one that affects one variable at time t but no
other variables. See [TS] irf create for a discussion of the difference between IRFs and OIRFs.
partial autocorrelation function. The partial autocorrelation function (PACF) expresses the correlation
between periods $t$ and $t - k$ of a time series as a function of the time $t$ and lag $k$, after controlling
for the effects of intervening lags. For a stationary time series, the PACF does not depend on $t$.
The PACF is not symmetric about $k = 0$: the partial autocorrelation between $y_t$ and $y_{t-k}$ is not
equal to the partial autocorrelation between $y_t$ and $y_{t+k}$.
periodogram. A periodogram is a graph of the spectral density function of a time series as a function
of frequency. The pergram command first standardizes the amplitude of the density by the sample
variance of the time series, and then plots the logarithm of that standardized density. Peaks in the
periodogram represent cyclical behavior in the data.
phase function. The phase function of a linear filter specifies how the filter changes the relative
importance of the random components at different frequencies in the frequency domain.
portmanteau statistic. The portmanteau, or Q, statistic is used to test for white noise and is calculated
using the first m autocorrelations of the series, where m is chosen by the user. Under the null
hypothesis that the series is a white-noise process, the portmanteau statistic has a $\chi^2$ distribution
with m degrees of freedom.
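This entry does not state the formula for the statistic. As a hedged illustration, the Python sketch below assumes the widely used Ljung–Box form, $Q = n(n+2)\sum_{j=1}^{m} \hat\rho_j^2/(n-j)$, where $\hat\rho_j$ is the lag-$j$ sample autocorrelation. The function names and the data are hypothetical, and the p-value from the $\chi^2$ distribution is not computed.

```python
def autocorrelation(x, j):
    """Lag-j sample autocorrelation about the sample mean."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t + j] - mean) for t in range(n - j))
    den = sum((v - mean) ** 2 for v in x)
    return num / den

def ljung_box_q(x, m):
    """Q statistic from the first m autocorrelations (Ljung-Box form, assumed)."""
    n = len(x)
    return n * (n + 2) * sum(
        autocorrelation(x, j) ** 2 / (n - j) for j in range(1, m + 1)
    )

# A strictly alternating series is far from white noise (hypothetical data).
x = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
q = ljung_box_q(x, m=1)
```

For this alternating series the lag-1 autocorrelation is -0.875, which gives a Q of 8.75; a large value of Q relative to the $\chi^2$ distribution with m degrees of freedom is evidence against white noise.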
Prais–Winsten estimator. A Prais–Winsten estimator is a linear regression estimator that is used
when the error term exhibits first-order autocorrelation; see also Cochrane–Orcutt estimator. Here
the first observation in the dataset is transformed as $\tilde{y}_1 = \sqrt{1 - \rho^2}\, y_1$ and $\tilde{\mathbf{x}}_1 = \sqrt{1 - \rho^2}\, \mathbf{x}_1$,
so that the first observation is not lost. The Prais–Winsten estimator is a generalized least-squares
estimator.
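The treatment of the first observation can be illustrated with a short Python sketch. It takes the autocorrelation parameter as given, scales the first observation as described in this entry, and quasi-differences the remaining observations as in the Cochrane–Orcutt entry. The function name prais_winsten_transform and the data are hypothetical.

```python
from math import sqrt

def prais_winsten_transform(series, rho):
    """Transform a series for first-order autocorrelation, keeping period 1.

    The first value is scaled by sqrt(1 - rho**2); each later value is
    series[t] - rho * series[t-1].
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    first = sqrt(1.0 - rho ** 2) * series[0]
    rest = [series[t] - rho * series[t - 1] for t in range(1, len(series))]
    return [first] + rest

y_tilde = prais_winsten_transform([1.0, 2.0, 3.0, 4.0], 0.6)
```

The transformed series has the same number of observations as the original, in contrast with simple quasi-differencing, which drops the first period.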
priming values. Priming values are the initial, preestimation values used to begin a recursive process.
random walk. A random walk is a time-series process in which the current period's realization is
equal to the previous period's realization plus a white-noise error term: $y_t = y_{t-1} + \epsilon_t$. A random
walk with drift also contains a nonzero time-invariant constant: $y_t = \delta + y_{t-1} + \epsilon_t$. The constant
term $\delta$ is known as the drift parameter. An important property of random-walk processes is that
the best predictor of the value at time $t + 1$ is the value at time $t$ plus the value of the drift
parameter.
recursive regression analysis. A recursive regression analysis involves performing a regression at
time t by using all available observations from some starting time t0 through time t, performing
another regression at time t + 1 by using all observations from time t0 through time t + 1, and
so on. Unlike a rolling regression analysis, the first period used for all regressions is held fixed.
rolling regression analysis. A rolling, or moving window, regression analysis involves performing
regressions for each period by using the most recent m periods' data, where m is known as the
window size. For example, with m = 20, at time t the regression is fit using observations for
times t - 19 through t; at time t + 1 the regression is fit using the observations for times t - 18
through t + 1; and so on.
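The difference between the two schemes lies only in which observations enter each regression, which the following Python sketch makes explicit by listing the first and last observation index of each estimation sample. The function names and sample sizes are hypothetical, and no regression is actually fit.

```python
def rolling_windows(n, m):
    """(first, last) index pairs for windows of the m most recent periods."""
    return [(last - m + 1, last) for last in range(m - 1, n)]

def recursive_windows(n, first_size):
    """(first, last) index pairs with the first period held fixed at 0."""
    return [(0, last) for last in range(first_size - 1, n)]

# Five observations indexed 0..4, with three observations in the first sample.
rolling = rolling_windows(5, 3)
recursive = recursive_windows(5, 3)
```

Each rolling window keeps the same length and moves forward one period at a time, whereas each recursive window keeps its starting point and grows by one period.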


seasonal difference operator. The period-$s$ seasonal difference operator $\Delta_s$ denotes the difference
in the value of a variable at time $t$ and time $t - s$. Formally, $\Delta_s y_t = y_t - y_{t-s}$, and $\Delta_s^2 y_t =
\Delta_s(y_t - y_{t-s}) = (y_t - y_{t-s}) - (y_{t-s} - y_{t-2s}) = y_t - 2y_{t-s} + y_{t-2s}$.
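The expansion of the squared seasonal difference can be checked numerically with a short Python sketch. The function name seasonal_difference and the data are hypothetical.

```python
def seasonal_difference(y, s):
    """Return y[t] - y[t - s] for t = s, ..., len(y) - 1."""
    return [y[t] - y[t - s] for t in range(s, len(y))]

y = [1, 2, 4, 7, 11, 16]
s = 2
d_s = seasonal_difference(y, s)          # one application
d_s2 = seasonal_difference(d_s, s)       # two applications
# Direct evaluation of y_t - 2*y_{t-s} + y_{t-2s}
expanded = [y[t] - 2 * y[t - s] + y[t - 2 * s] for t in range(2 * s, len(y))]
```

Each application of the period-s operator shortens the series by s observations, and d_s2 matches the expanded form.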
serial correlation. Serial correlation refers to regression errors that are correlated over time. If a
regression model does not contain lagged dependent variables as regressors, the OLS estimates
are consistent in the presence of mild serial correlation, but the covariance matrix is incorrect.
When the model includes lagged dependent variables and the residuals are serially correlated, the
OLS estimates are biased and inconsistent. See, for example, Davidson and MacKinnon (1993,
chap. 10) for more information.
serial correlation tests. Because OLS estimates are at least inefficient and potentially biased in the
presence of serial correlation, econometricians have developed many tests to detect it. Popular ones
include the Durbin–Watson (1950, 1951, 1971) test, the Breusch–Pagan (1980) test, and Durbin's
(1970) alternative test. See [R] regress postestimation time series.
smoothing. Smoothing a time series refers to the process of extracting an overall trend in the data.
The motivation behind smoothing is the belief that a time series exhibits a trend component as
well as an irregular component and that the analyst is interested only in the trend component.
Some smoothers also account for seasonal or other cyclical patterns.
spectral analysis. See frequency-domain analysis.
spectral density function. The spectral density function is the derivative of the spectral distribution
function. Intuitively, the spectral density function $f(\omega)$ indicates the amount of variance in a time
series that is attributable to sinusoidal components with frequency $\omega$. See also spectral distribution
function. The spectral density function is sometimes called the spectrum.
spectral distribution function. The (normalized) spectral distribution function $F(\omega)$ of a process
describes the proportion of variance that can be explained by sinusoids with frequencies in the
range $(0, \omega)$, where $0 \le \omega \le \pi$. The spectral distribution and density functions used in
frequency-domain analysis are closely related to the autocorrelation function used in time-domain analysis;
see Chatfield (2004, chap. 6) and Wei (2006, chap. 12).
spectrum. See spectral density function.
state-space model. A state-space model describes the relationship between an observed time series
and an unobservable state vector that represents the state of the world. The measurement equation
expresses the observed series as a function of the state vector, and the transition equation describes
how the unobserved state vector evolves over time. By defining the parameters of the measurement
and transition equations appropriately, one can write a wide variety of time-series models in the
state-space form.
static forecast. A static forecast uses actual values wherever lagged values of the endogenous variables
appear in the model. As a result, static forecasts perform at least as well as dynamic forecasts,
but static forecasts cannot produce forecasts into the future if lags of the endogenous variables
appear in the model.

Because actual values will be missing beyond the last historical time period in the dataset, static
forecasts can only forecast one period into the future (assuming only first lags appear in the model);
for that reason, they are often called one-step-ahead forecasts.
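The distinction between static and dynamic forecasts can be illustrated with a minimal Python sketch. It assumes an AR(1) model without a constant and with a known coefficient, which is a simplification chosen for illustration; the function names and data are hypothetical.

```python
def static_forecasts(y, rho):
    """One-step-ahead forecasts rho * y[t-1], using actual lagged values."""
    return [rho * y[t - 1] for t in range(1, len(y))]

def dynamic_forecasts(y_last, rho, horizon):
    """Forecasts that feed each forecast back in as the next lagged value."""
    out = []
    prev = y_last
    for _ in range(horizon):
        prev = rho * prev
        out.append(prev)
    return out

y = [8.0, 4.0, 3.0, 1.0]                   # hypothetical observed series
static = static_forecasts(y, 0.5)          # needs y[t-1] for every forecast
dynamic = dynamic_forecasts(y[0], 0.5, 3)  # needs only the first observation
```

The two sets of forecasts agree for the first period and then diverge, because the static forecasts keep using observed values while the dynamic forecasts rely on their own earlier output. Only the dynamic version can be extended past the end of the observed data.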
steady-state equilibrium. The steady-state equilibrium is the predicted value of a variable in a dynamic
model, ignoring the effects of past shocks, or, equivalently, the value of a variable, assuming that
the effects of past shocks have fully died out and no longer affect the variable of interest.

stochastic equation. A stochastic equation, in contrast to an identity, is an equation in a forecast model
that includes a random component, most often in the form of an additive error term. Stochastic
equations include parameters that must be estimated from historical data.
stochastic trend. A stochastic trend is a nonstationary random process. Unit-root processes and random
coefficients on time are two common stochastic trends. See [TS] ucm for examples and discussions
of more commonly applied stochastic trends.
strict stationarity. A process is strictly stationary if the joint distribution of $y_1, \ldots, y_k$ is the same
as the joint distribution of $y_{1+\tau}, \ldots, y_{k+\tau}$ for all $k$ and $\tau$. Intuitively, shifting the origin of the
series by $\tau$ units has no effect on the joint distributions.
structural model. In time-series analysis, a structural model is one that describes the relationship
among a set of variables, based on underlying theoretical considerations. Structural models may
contain both endogenous and exogenous variables.
SVAR. A structural vector autoregressive (SVAR) model is a type of VAR in which short- or long-run
constraints are placed on the resulting impulse–response functions. The constraints are usually
motivated by economic theory and therefore allow causal interpretations of the IRFs to be made.
time-domain analysis. Time-domain analysis is analysis of data viewed as a sequence of observations
observed over time. The autocorrelation function, linear regression, ARCH models, and ARIMA
models are common tools used in time-domain analysis.
trend. The trend specifies the long-run behavior in a time series. The trend can be deterministic or
stochastic. Many economic, biological, health, and social time series have long-run tendencies to
increase or decrease. Before the 1980s, most time-series analysis specified the long-run tendencies
as deterministic functions of time. Since the 1980s, the stochastic trends implied by unit-root
processes have become a standard part of the toolkit.
unit-root process. A unit-root process is one that is integrated of order one, meaning that the process
is nonstationary but that first-differencing the process produces a stationary series. The simplest
example of a unit-root process is the random walk. See Hamilton (1994, chap. 15) for a discussion
of when general ARMA processes may contain a unit root.
unit-root tests. Whether a process has a unit root has both important statistical and economic
ramifications, so a variety of tests have been developed to test for one. Among the earliest
tests proposed is the one by Dickey and Fuller (1979), though most researchers now use an
improved variant called the augmented Dickey–Fuller test instead of the original version. Other
common unit-root tests implemented in Stata include the DF-GLS test of Elliott, Rothenberg, and
Stock (1996) and the Phillips–Perron (1988) test. See [TS] dfuller, [TS] dfgls, and [TS] pperron.

Variants of unit-root tests suitable for panel data have also been developed; see [XT] xtunitroot.
VAR. A vector autoregressive (VAR) model is a multivariate regression technique in which each
dependent variable is regressed on lags of itself and on lags of all the other dependent variables
in the model. Occasionally, exogenous variables are also included in the model.
VECM. A vector error-correction model (VECM) is a type of VAR that is used with variables that
are cointegrated. Although first-differencing variables that are integrated of order one makes them
stationary, fitting a VAR to such first-differenced variables results in misspecification error if the
variables are cointegrated. See The multivariate VECM specification in [TS] vec intro for more
on this point.
white noise. A variable $u_t$ represents a white-noise process if the mean of $u_t$ is zero, the variance
of $u_t$ is $\sigma^2$, and the covariance between $u_t$ and $u_s$ is zero for all $s \neq t$.


Yule–Walker equations. The Yule–Walker equations are a set of difference equations that describe the
relationship among the autocovariances and autocorrelations of an autoregressive moving-average
(ARMA) process.

References
Breusch, T. S., and A. R. Pagan. 1980. The Lagrange multiplier test and its applications to model specification in
econometrics. Review of Economic Studies 47: 239–253.
Chatfield, C. 2004. The Analysis of Time Series: An Introduction. 6th ed. Boca Raton, FL: Chapman & Hall/CRC.
Davidson, R., and J. G. MacKinnon. 1993. Estimation and Inference in Econometrics. New York: Oxford University
Press.
Dickey, D. A., and W. A. Fuller. 1979. Distribution of the estimators for autoregressive time series with a unit root.
Journal of the American Statistical Association 74: 427–431.
Durbin, J. 1970. Testing for serial correlation in least-squares regressions when some of the regressors are lagged
dependent variables. Econometrica 38: 410–421.
Durbin, J., and G. S. Watson. 1950. Testing for serial correlation in least squares regression. I. Biometrika 37:
409–428.
Durbin, J., and G. S. Watson. 1951. Testing for serial correlation in least squares regression. II. Biometrika 38: 159–177.
Durbin, J., and G. S. Watson. 1971. Testing for serial correlation in least squares regression. III. Biometrika 58: 1–19.
Elliott, G. R., T. J. Rothenberg, and J. H. Stock. 1996. Efficient tests for an autoregressive unit root. Econometrica
64: 813–836.
Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
Phillips, P. C. B., and P. Perron. 1988. Testing for a unit root in time series regression. Biometrika 75: 335–346.
Wei, W. W. S. 2006. Time Series Analysis: Univariate and Multivariate Methods. 2nd ed. Boston: Pearson.

Subject and author index


This is the subject and author index for the Time-Series
Reference Manual. Readers interested in topics other
than time series should see the combined subject index
(and the combined author index) in the Glossary and
Index.

A
Abraham, B., [TS] tssmooth, [TS] tssmooth
dexponential, [TS] tssmooth exponential,
[TS] tssmooth hwinters, [TS] tssmooth
shwinters
ac command, [TS] corrgram
acplot, estat subcommand, [TS] estat acplot
add factor, [TS] Glossary
add, irf subcommand, [TS] irf add
adjust, forecast subcommand, [TS] forecast adjust
Adkins, L. C., [TS] arch
Ahn, S. K., [TS] vec intro
Aielli, G. P., [TS] mgarch, [TS] mgarch dcc
Akaike, H., [TS] varsoc
alternative scenarios, [TS] forecast, [TS] forecast
adjust, [TS] forecast clear, [TS] forecast
coefvector, [TS] forecast create, [TS] forecast
describe, [TS] forecast drop, [TS] forecast
estimates, [TS] forecast exogenous,
[TS] forecast identity, [TS] forecast list,
[TS] forecast query, [TS] forecast solve
Amemiya, T., [TS] varsoc
Amisano, G., [TS] irf create, [TS] var intro, [TS] var
svar, [TS] vargranger, [TS] varwle
An, S., [TS] arfima
Anderson, B. D. O., [TS] sspace
Anderson, T. W., [TS] vec, [TS] vecrank
Ansley, C. F., [TS] arima
A-PARCH, see asymmetric power autoregressive
conditional heteroskedasticity
AR, see autoregressive
ARCH, see autoregressive conditional heteroskedasticity
arch command, [TS] arch, [TS] arch postestimation
ARFIMA, see autoregressive fractionally integrated
moving-average model
arfima command, [TS] arfima, [TS] arfima
postestimation
ARIMA, see autoregressive integrated moving-average
model
arima command, [TS] arima, [TS] arima
postestimation
ARMA, see autoregressive moving average
ARMAX, see autoregressive moving average with
exogenous inputs
aroots, estat subcommand, [TS] estat aroots
asymmetric power autoregressive conditional
heteroskedasticity, [TS] arch

autocorrelation, [TS] arch, [TS] arfima, [TS] arima,
[TS] corrgram, [TS] dfactor, [TS] estat
acplot, [TS] newey, [TS] prais, [TS] psdensity,
[TS] sspace, [TS] ucm, [TS] var, [TS] varlmar,
[TS] Glossary
autocovariance, [TS] arfima, [TS] arima,
[TS] corrgram, [TS] estat acplot, [TS] psdensity
autoregressive, [TS] arch, [TS] arfima, [TS] arima,
[TS] dfactor, [TS] sspace, [TS] ucm
    conditional heteroskedasticity
        effects, [TS] arch
        model, [TS] arch, [TS] arch postestimation,
        [TS] Glossary, also see multivariate GARCH
    fractionally integrated moving-average model,
    [TS] arfima, [TS] arfima postestimation,
    [TS] estat acplot, [TS] psdensity, [TS] Glossary
    integrated moving-average model, [TS] arima,
    [TS] arima postestimation, [TS] estat acplot,
    [TS] estat aroots, [TS] psdensity, [TS] Glossary
    model, [TS] dfactor, [TS] estat acplot,
    [TS] psdensity, [TS] sspace, [TS] ucm
    moving average, [TS] arch, [TS] arfima,
    [TS] arima, [TS] sspace, [TS] ucm,
    [TS] Glossary
    moving average with exogenous inputs, [TS] arfima,
    [TS] arima, [TS] dfactor, [TS] sspace,
    [TS] ucm, [TS] Glossary
    process, [TS] Glossary
Aznar, A., [TS] vecrank

B
Baillie, R. T., [TS] arfima
band-pass filters, [TS] tsfilter bk, [TS] tsfilter cf,
[TS] Glossary
Bartlett, M. S., [TS] wntestb
Bartlett's
    bands, [TS] corrgram
    periodogram test, [TS] wntestb
Baum, C. F., [TS] arch, [TS] arima, [TS] dfgls,
[TS] rolling, [TS] time series, [TS] tsfilter,
[TS] tsset, [TS] var, [TS] wntestq
Bauwens, L., [TS] mgarch
Baxter–King filter, [TS] tsfilter, [TS] tsfilter bk
Baxter, M., [TS] tsfilter, [TS] tsfilter bk, [TS] tsfilter
cf
Becketti, S., [TS] arch, [TS] arima, [TS] corrgram,
[TS] dfuller, [TS] irf, [TS] prais, [TS] time
series, [TS] tssmooth, [TS] var intro, [TS] var
svar, [TS] vec intro, [TS] vec
Bera, A. K., [TS] arch, [TS] varnorm, [TS] vecnorm
Beran, J., [TS] arfima, [TS] arfima postestimation
Berkes, I., [TS] mgarch
Berndt, E. K., [TS] arch, [TS] arima
Bianchi, G., [TS] tsfilter, [TS] tsfilter bw
bk, tsfilter subcommand, [TS] tsfilter bk
Black, F., [TS] arch
block exogeneity, [TS] vargranger
Bloomfield, P., [TS] arfima

802 Subject and author index


Bollerslev, T., [TS] arch, [TS] arima, [TS] mgarch,
[TS] mgarch ccc, [TS] mgarch dvech
Boswijk, H. P., [TS] vec
Bowerman, B. L., [TS] tssmooth, [TS] tssmooth
dexponential, [TS] tssmooth exponential,
[TS] tssmooth hwinters, [TS] tssmooth
shwinters
Box, G. E. P., [TS] arfima, [TS] arima,
[TS] corrgram, [TS] cumsp, [TS] dfuller,
[TS] estat acplot, [TS] pergram, [TS] pperron,
[TS] psdensity, [TS] wntestq, [TS] xcorr
Breusch, T. S., [TS] Glossary
Brockwell, P. J., [TS] corrgram, [TS] sspace
Broyden, C. G., [TS] forecast solve
Bruno, G. S. F., [TS] forecast
Burns, A. F., [TS] tsfilter, [TS] tsfilter bk, [TS] tsfilter
bw, [TS] tsfilter cf, [TS] tsfilter hp, [TS] ucm
business calendars, [TS] intro
Butterworth filter, [TS] tsfilter, [TS] tsfilter bw
Butterworth, S., [TS] tsfilter, [TS] tsfilter bw
bw, tsfilter subcommand, [TS] tsfilter bw

C
Caines, P. E., [TS] sspace
calendars, [TS] intro
Cameron, A. C., [TS] forecast estimates
Casals, J., [TS] sspace
ccc, mgarch subcommand, [TS] mgarch ccc
cf, tsfilter subcommand, [TS] tsfilter cf
cgraph, irf subcommand, [TS] irf cgraph
Chang, Y., [TS] sspace
Chatfield, C., [TS] arima, [TS] corrgram,
[TS] pergram, [TS] tssmooth, [TS] tssmooth
dexponential, [TS] tssmooth exponential,
[TS] tssmooth hwinters, [TS] tssmooth ma,
[TS] tssmooth shwinters, [TS] Glossary
Cheung, Y.-W., [TS] dfgls
Cholesky ordering, [TS] Glossary
Chou, R. Y., [TS] arch
Christiano–Fitzgerald filter, [TS] tsfilter, [TS] tsfilter cf
Christiano, L. J., [TS] irf create, [TS] tsfilter,
[TS] tsfilter cf, [TS] var svar
Chu-Chun-Lin, S., [TS] sspace
clear, forecast subcommand, [TS] forecast clear
clock time, [TS] tsset
cluster estimator of variance, Prais–Winsten and
Cochrane–Orcutt regression, [TS] prais
Cochrane, D., [TS] prais
Cochrane–Orcutt regression, [TS] prais, [TS] Glossary
coefvector, forecast subcommand, [TS] forecast
coefvector
cointegration, [TS] fcast compute, [TS] fcast graph,
[TS] vec intro, [TS] vec, [TS] veclmar,
[TS] vecnorm, [TS] vecrank, [TS] vecstable,
[TS] Glossary
compute, fcast subcommand, [TS] fcast compute
Comte, F., [TS] mgarch
conditional variance, [TS] arch, [TS] Glossary
constant conditional-correlation model, [TS] mgarch,
[TS] mgarch ccc
constrained estimation
  ARCH, [TS] arch
  ARFIMA, [TS] arfima
  ARIMA and ARMAX, [TS] arima
  dynamic factor model, [TS] dfactor
  GARCH model, [TS] mgarch ccc, [TS] mgarch dcc,
      [TS] mgarch dvech, [TS] mgarch vcc
  state-space model, [TS] sspace
  structural vector autoregressive models, [TS] var svar
  unobserved-components model, [TS] ucm
  vector autoregressive models, [TS] var
  vector error-correction models, [TS] vec
correlogram, [TS] corrgram, [TS] Glossary
corrgram command, [TS] corrgram
covariance stationarity, [TS] Glossary
Cox, N. J., [TS] tsline, [TS] tsset, [TS] tssmooth
hwinters, [TS] tssmooth shwinters
create,
  forecast subcommand, [TS] forecast create
  irf subcommand, [TS] irf create
cross-correlation function, [TS] xcorr, [TS] Glossary
cross-correlogram, [TS] xcorr
ctable, irf subcommand, [TS] irf ctable
cumsp command, [TS] cumsp
cumulative spectral distribution, empirical, [TS] cumsp,
[TS] psdensity
cyclical component, [TS] tsfilter, [TS] ucm,
[TS] Glossary

D
data manipulation, [TS] tsappend, [TS] tsfill,
[TS] tsreport, [TS] tsrevar, [TS] tsset
David, J. S., [TS] arima
Davidson, R., [TS] arch, [TS] arima, [TS] prais,
[TS] sspace, [TS] varlmar, [TS] Glossary
Davis, G., [TS] arima
Davis, R. A., [TS] corrgram, [TS] sspace
dcc, mgarch subcommand, [TS] mgarch dcc
De Jong, P., [TS] dfactor, [TS] sspace, [TS] sspace
postestimation, [TS] ucm
DeGroot, M. H., [TS] arima
Deistler, M., [TS] sspace
del Rio, A., [TS] tsfilter hp
describe,
  forecast subcommand, [TS] forecast describe
  irf subcommand, [TS] irf describe
deterministic trend, [TS] Glossary
dexponential, tssmooth subcommand,
[TS] tssmooth dexponential
dfactor command, [TS] dfactor, [TS] dfactor
postestimation
dfgls command, [TS] dfgls
dfuller command, [TS] dfuller
diagonal vech model, [TS] mgarch, [TS] mgarch dvech
Dickens, R., [TS] prais
Dickey, D. A., [TS] dfgls, [TS] dfuller, [TS] pperron,
[TS] Glossary
Dickey–Fuller test, [TS] dfgls, [TS] dfuller
Diebold, F. X., [TS] arch
difference operator, [TS] Glossary
Diggle, P. J., [TS] arima, [TS] wntestq
Ding, Z., [TS] arch
Doornik, J. A., [TS] arfima, [TS] vec
double-exponential smoothing, [TS] tssmooth
dexponential
drift, [TS] Glossary
drop,
  forecast subcommand, [TS] forecast drop
  irf subcommand, [TS] irf drop
Drukker, D. M., [TS] arfima postestimation,
[TS] sspace, [TS] vec
Duan, N., [TS] forecast estimates
Durbin, J., [TS] prais, [TS] ucm, [TS] Glossary
Durbin–Watson statistic, [TS] prais
Durlauf, S. N., [TS] vec intro, [TS] vec, [TS] vecrank
dvech, mgarch subcommand, [TS] mgarch dvech
dynamic conditional-correlation model, [TS] mgarch,
[TS] mgarch dcc
dynamic factor model, [TS] dfactor, [TS] dfactor
postestimation, also see state-space model
dynamic forecast, [TS] arch, [TS] arfima, [TS] fcast
compute, [TS] fcast graph, [TS] forecast,
[TS] forecast adjust, [TS] forecast clear,
[TS] forecast coefvector, [TS] forecast
create, [TS] forecast describe, [TS] forecast
drop, [TS] forecast estimates, [TS] forecast
exogenous, [TS] forecast identity, [TS] forecast
list, [TS] forecast query, [TS] forecast solve,
[TS] mgarch, [TS] Glossary
dynamic regression model, [TS] arfima, [TS] arima,
[TS] var
dynamic structural simultaneous equations, [TS] var
svar
dynamic-multiplier function, [TS] irf, [TS] irf cgraph,
[TS] irf create, [TS] irf ctable, [TS] irf ograph,
[TS] irf table, [TS] var intro, [TS] Glossary

E
EGARCH, see exponential generalized autoregressive
conditional heteroskedasticity
Eichenbaum, M., [TS] irf create, [TS] var svar
eigenvalue stability condition, [TS] estat aroots,
[TS] varstable, [TS] vecstable
Elliott, G. R., [TS] dfgls, [TS] Glossary
Enders, W., [TS] arch, [TS] arima, [TS] arima
postestimation, [TS] corrgram
endogenous variable, [TS] Glossary
Engle, R. F., [TS] arch, [TS] arima, [TS] dfactor,
[TS] mgarch, [TS] mgarch dcc, [TS] mgarch
dvech, [TS] mgarch vcc, [TS] vec intro,
[TS] vec, [TS] vecrank
estat
  acplot command, [TS] estat acplot
  aroots command, [TS] estat aroots
  period command, [TS] ucm postestimation
estimates, forecast subcommand, [TS] forecast
estimates
Evans, C. L., [TS] irf create, [TS] var svar
exogenous, forecast subcommand, [TS] forecast
exogenous
exogenous variable, [TS] Glossary
exp list, [TS] rolling
exponential generalized autoregressive conditional
heteroskedasticity, [TS] arch
exponential smoothing, [TS] tssmooth, [TS] tssmooth
exponential, [TS] Glossary
exponential, tssmooth subcommand, [TS] tssmooth
exponential

F
factor model, [TS] dfactor
Fair, R. C., [TS] forecast solve
fcast compute command, [TS] fcast compute
fcast graph command, [TS] fcast graph
feasible generalized least squares, [TS] dfgls,
[TS] prais, [TS] var
Feller, W., [TS] wntestb
FEVD, see forecast-error variance decomposition
FGLS, see feasible generalized least squares
filters, [TS] tsfilter, also see smoothers
  Baxter–King, [TS] tsfilter bk
  Butterworth, [TS] tsfilter bw
  Christiano–Fitzgerald, [TS] tsfilter cf
  Hodrick–Prescott, [TS] tsfilter hp
Fiorentini, G., [TS] mgarch
Fitzgerald, T. J., [TS] tsfilter, [TS] tsfilter cf
Flannery, B. P., [TS] arch, [TS] arima
forecast, [TS] forecast
  adjust command, [TS] forecast adjust
  clear command, [TS] forecast clear
  coefvector command, [TS] forecast coefvector
  create command, [TS] forecast create
  describe command, [TS] forecast describe
  drop command, [TS] forecast drop
  estimates command, [TS] forecast estimates
  exogenous command, [TS] forecast exogenous
  identity command, [TS] forecast identity
  list command, [TS] forecast list
  query command, [TS] forecast query
  solve command, [TS] forecast solve
forecast,
  ARCH model, [TS] arch postestimation
  ARFIMA model, [TS] arfima postestimation
  ARIMA model, [TS] arima postestimation
  dynamic-factor model, [TS] dfactor postestimation
  econometric model, [TS] forecast, [TS] forecast adjust,
      [TS] forecast clear, [TS] forecast coefvector,
      [TS] forecast create, [TS] forecast describe,
      [TS] forecast drop, [TS] forecast estimates,
      [TS] forecast exogenous, [TS] forecast identity,
      [TS] forecast list, [TS] forecast query,
      [TS] forecast solve
  MGARCH model, see multivariate GARCH postestimation
  state-space model, [TS] sspace postestimation
  structural vector autoregressive model,
      [TS] var svar postestimation
  unobserved-components model, [TS] ucm postestimation
  vector autoregressive model, [TS] var postestimation
  vector error-correction model, [TS] vec postestimation
forecast-error variance decomposition, [TS] irf,
[TS] irf create, [TS] irf ograph, [TS] irf table,
[TS] var intro, [TS] varbasic, [TS] vec intro,
[TS] Glossary
forecasting, [TS] arch, [TS] arfima, [TS] arima,
[TS] fcast compute, [TS] fcast graph,
[TS] irf create, [TS] mgarch, [TS] tsappend,
[TS] tssmooth, [TS] tssmooth dexponential,
[TS] tssmooth exponential, [TS] tssmooth
hwinters, [TS] tssmooth ma, [TS] tssmooth
shwinters, [TS] ucm, [TS] var intro, [TS] var,
[TS] vec intro, [TS] vec
forward operator, [TS] Glossary
fractionally integrated autoregressive moving-average
model, [TS] estat acplot, [TS] psdensity
freduse command, [TS] arfima postestimation
frequency-domain analysis, [TS] cumsp, [TS] pergram,
[TS] psdensity, [TS] Glossary
Friedman, M., [TS] arima
Fuller, W. A., [TS] dfgls, [TS] dfuller, [TS] pperron,
[TS] psdensity, [TS] tsfilter, [TS] tsfilter bk,
[TS] ucm, [TS] Glossary

G
gain, [TS] tsfilter, [TS] tsfilter bk, [TS] tsfilter bw,
[TS] tsfilter cf, [TS] tsfilter hp, [TS] Glossary
Gani, J., [TS] wntestb
GARCH, see generalized autoregressive conditional
heteroskedasticity
Gardiner, J. S., [TS] tssmooth, [TS] tssmooth
dexponential, [TS] tssmooth exponential,
[TS] tssmooth hwinters, [TS] tssmooth
shwinters
Gardner, E. S., Jr., [TS] tssmooth dexponential,
[TS] tssmooth hwinters
generalized
  autoregressive conditional heteroskedasticity,
      [TS] arch, [TS] Glossary
  least-squares estimator, [TS] prais, [TS] Glossary
Geweke, J., [TS] dfactor
Giannini, C., [TS] irf create, [TS] var intro, [TS] var
svar, [TS] vargranger, [TS] varwle
Giles, D. E. A., [TS] prais
GJR, see threshold autoregressive conditional
heteroskedasticity
Glosten, L. R., [TS] arch
Golub, G. H., [TS] arfima, [TS] arfima postestimation
Gómez, V., [TS] tsfilter, [TS] tsfilter hp
Gonzalo, J., [TS] vec intro, [TS] vecrank
Gouriéroux, C. S., [TS] arima, [TS] mgarch ccc,
[TS] mgarch dcc, [TS] mgarch vcc
Gradshteyn, I. S., [TS] arfima
Granger, C. W. J., [TS] arch, [TS] arfima,
[TS] vargranger, [TS] vec intro, [TS] vec,
[TS] vecrank
Granger causality, [TS] vargranger, [TS] Glossary
graph,
  fcast subcommand, [TS] fcast graph
  irf subcommand, [TS] irf graph
graphs,
  autocorrelations, [TS] corrgram
  correlogram, [TS] corrgram
  cross-correlogram, [TS] xcorr
  cumulative spectral density, [TS] cumsp
  forecasts, [TS] fcast graph
  impulse–response functions, [TS] irf, [TS] irf cgraph,
      [TS] irf graph, [TS] irf ograph
  parametric autocorrelation, [TS] estat acplot
  parametric autocovariance, [TS] estat acplot
  partial correlogram, [TS] corrgram
  periodogram, [TS] pergram
  white-noise test, [TS] wntestb
Greene, W. H., [TS] arch, [TS] arima, [TS] corrgram,
[TS] var
Griffiths, W. E., [TS] arch, [TS] prais

H
Hall, B. H., [TS] arch, [TS] arima
Hall, R. E., [TS] arch, [TS] arima
Hamilton, J. D., [TS] arch, [TS] arfima, [TS] arima,
[TS] corrgram, [TS] dfuller, [TS] estat
aroots, [TS] fcast compute, [TS] forecast
solve, [TS] irf, [TS] irf create, [TS] pergram,
[TS] pperron, [TS] psdensity, [TS] sspace,
[TS] sspace postestimation, [TS] time series,
[TS] tsfilter, [TS] ucm, [TS] var intro, [TS] var,
[TS] var svar, [TS] vargranger, [TS] varnorm,
[TS] varsoc, [TS] varstable, [TS] varwle,
[TS] vec intro, [TS] vec, [TS] vecnorm,
[TS] vecrank, [TS] vecstable, [TS] xcorr,
[TS] Glossary
Hannan, E. J., [TS] sspace
Hardin, J. W., [TS] newey, [TS] prais
Harvey, A. C., [TS] arch, [TS] arima, [TS] prais,
[TS] psdensity, [TS] sspace, [TS] sspace
postestimation, [TS] tsfilter, [TS] tsfilter hp,
[TS] tssmooth hwinters, [TS] ucm, [TS] var
svar
Hassler, U., [TS] irf create
Hauser, M. A., [TS] arfima
Hausman, J. A., [TS] arch, [TS] arima
heteroskedasticity,
  ARCH model, see autoregressive conditional
      heteroskedasticity model
  GARCH model, see generalized autoregressive
      conditional heteroskedasticity
  Newey–West estimator, see Newey–West regression
Higgins, M. L., [TS] arch
high-pass filter, [TS] tsfilter bw, [TS] tsfilter hp,
[TS] Glossary
Hildreth, C., [TS] prais
Hildreth–Lu regression, [TS] prais
Hill, R. C., [TS] arch, [TS] prais
Hipel, K. W., [TS] arima, [TS] ucm
Hodrick–Prescott filter, [TS] tsfilter, [TS] tsfilter hp
Hodrick, R. J., [TS] tsfilter, [TS] tsfilter hp
Holan, S. H., [TS] arima
Holt, C. C., [TS] tssmooth, [TS] tssmooth
dexponential, [TS] tssmooth exponential,
[TS] tssmooth hwinters, [TS] tssmooth
shwinters
Holt–Winters smoothing, [TS] tssmooth, [TS] tssmooth
dexponential, [TS] tssmooth exponential,
[TS] tssmooth hwinters, [TS] tssmooth
shwinters, [TS] Glossary
Horváth, L., [TS] mgarch
Hosking, J. R. M., [TS] arfima
hp, tsfilter subcommand, [TS] tsfilter hp
Huber/White/sandwich estimator of variance, see robust,
Huber/White/sandwich estimator of variance
Hubrich, K., [TS] vec intro, [TS] vecrank
Hurst, H. E., [TS] arfima
hwinters, tssmooth subcommand, [TS] tssmooth
hwinters

I
identity, forecast subcommand, [TS] forecast
identity
impulse–response functions, [TS] irf, [TS] irf add,
[TS] irf cgraph, [TS] irf create, [TS] irf ctable,
[TS] irf describe, [TS] irf drop, [TS] irf graph,
[TS] irf ograph, [TS] irf rename, [TS] irf set,
[TS] irf table, [TS] var intro, [TS] varbasic,
[TS] vec intro, [TS] Glossary
independent and identically distributed, [TS] Glossary
information criterion, [TS] varsoc
innovation accounting, [TS] irf
integrated autoregressive moving-average model,
[TS] estat acplot, [TS] psdensity
integrated process, [TS] Glossary
IRF, see impulse–response functions
irf, [TS] irf
  add command, [TS] irf add
  cgraph command, [TS] irf cgraph
  create command, [TS] irf create
  ctable command, [TS] irf ctable
  describe command, [TS] irf describe
  drop command, [TS] irf drop
  graph command, [TS] irf graph
  ograph command, [TS] irf ograph
  rename command, [TS] irf rename
  set command, [TS] irf set
  table command, [TS] irf table

J
Jaeger, A., [TS] tsfilter, [TS] tsfilter hp
Jagannathan, R., [TS] arch
Jarque, C. M., [TS] varnorm, [TS] vecnorm
Jarque–Bera statistic, [TS] varnorm, [TS] vecnorm
Jeantheau, T., [TS] mgarch
Jenkins, G. M., [TS] arfima, [TS] arima,
[TS] corrgram, [TS] cumsp, [TS] dfuller,
[TS] estat acplot, [TS] pergram, [TS] pperron,
[TS] psdensity, [TS] xcorr
Jerez, M., [TS] sspace
Johansen, S., [TS] irf create, [TS] varlmar, [TS] vec
intro, [TS] vec, [TS] veclmar, [TS] vecnorm,
[TS] vecrank, [TS] vecstable
Johnson, L. A., [TS] tssmooth, [TS] tssmooth
dexponential, [TS] tssmooth exponential,
[TS] tssmooth hwinters, [TS] tssmooth
shwinters
Joyeux, R., [TS] arfima
Judge, G. G., [TS] arch, [TS] prais
Judson, R. A., [TS] forecast

K
Kalman
  filter, [TS] arima, [TS] dfactor, [TS] dfactor postestimation,
      [TS] sspace, [TS] sspace postestimation, [TS] ucm,
      [TS] ucm postestimation, [TS] Glossary
  forecast, [TS] dfactor postestimation,
      [TS] sspace postestimation, [TS] ucm postestimation
  smoothing, [TS] dfactor postestimation,
      [TS] sspace postestimation, [TS] ucm postestimation
Kalman, R. E., [TS] arima
Kilian, L., [TS] forecast solve
Kim, I.-M., [TS] vec intro, [TS] vec, [TS] vecrank
King, M. L., [TS] prais
King, R. G., [TS] tsfilter, [TS] tsfilter bk, [TS] tsfilter
cf, [TS] tsfilter hp, [TS] vecrank
Klein, L. R., [TS] forecast, [TS] forecast adjust,
[TS] forecast describe, [TS] forecast estimates,
[TS] forecast list, [TS] forecast solve
Kmenta, J., [TS] arch, [TS] prais, [TS] rolling
Koehler, A. B., [TS] tssmooth, [TS] tssmooth
dexponential, [TS] tssmooth exponential,
[TS] tssmooth hwinters, [TS] tssmooth
shwinters
Kohn, R. J., [TS] arima
Kokoszka, P., [TS] irf create
Koopman, S. J., [TS] ucm
Kroner, K. F., [TS] arch
kurtosis, [TS] varnorm, [TS] vecnorm

L
lag operator, [TS] Glossary
lag-exclusion statistics, [TS] varwle
lag-order selection statistics, [TS] var intro, [TS] var,
[TS] var svar, [TS] varsoc, [TS] vec intro
Lagrange multiplier test, [TS] varlmar, [TS] veclmar
Lai, K. S., [TS] dfgls
Laurent, S., [TS] mgarch
leap seconds, [TS] tsset
Ledolter, J., [TS] tssmooth, [TS] tssmooth
dexponential, [TS] tssmooth exponential,
[TS] tssmooth hwinters, [TS] tssmooth
shwinters
Lee, T.-C., [TS] arch, [TS] prais
Leser, C. E. V., [TS] tsfilter, [TS] tsfilter hp
Lieberman, O., [TS] mgarch
Lilien, D. M., [TS] arch
Lim, G. C., [TS] arch
linear
  filter, [TS] tsfilter, [TS] tsfilter cf, [TS] tssmooth ma,
      [TS] Glossary
  regression, [TS] newey, [TS] prais
Ling, S., [TS] mgarch
list, forecast subcommand, [TS] forecast list
Ljung, G. M., [TS] wntestq
long-memory process, [TS] arfima, [TS] Glossary
Lu, J. Y., [TS] prais
Lund, R., [TS] arima
Lütkepohl, H., [TS] arch, [TS] dfactor, [TS] fcast
compute, [TS] irf, [TS] irf create, [TS] mgarch
dvech, [TS] prais, [TS] sspace, [TS] sspace
postestimation, [TS] time series, [TS] var
intro, [TS] var, [TS] var svar, [TS] varbasic,
[TS] vargranger, [TS] varnorm, [TS] varsoc,
[TS] varstable, [TS] varwle, [TS] vec intro,
[TS] vecnorm, [TS] vecrank, [TS] vecstable

M
MA, see moving average model
ma, tssmooth subcommand, [TS] tssmooth ma
MacKinnon, J. G., [TS] arch, [TS] arima, [TS] dfuller,
[TS] pperron, [TS] prais, [TS] sspace,
[TS] varlmar, [TS] Glossary
Maddala, G. S., [TS] vec intro, [TS] vec, [TS] vecrank
Magnus, J. R., [TS] var svar
Mandelbrot, B. B., [TS] arch
Mangel, M., [TS] varwle
Maravall, A., [TS] tsfilter hp
McAleer, M., [TS] mgarch
McCullough, B. D., [TS] corrgram
McDowell, A. W., [TS] arima
McLeod, A. I., [TS] arima, [TS] ucm
Meiselman, D., [TS] arima
MGARCH, see multivariate GARCH
mgarch
  ccc command, [TS] mgarch ccc, [TS] mgarch ccc postestimation
  dcc command, [TS] mgarch dcc, [TS] mgarch dcc postestimation
  dvech command, [TS] mgarch dvech,
      [TS] mgarch dvech postestimation
  vcc command, [TS] mgarch vcc, [TS] mgarch vcc postestimation
Miller, J. I., [TS] sspace
Mitchell, W. C., [TS] tsfilter, [TS] tsfilter bk,
[TS] tsfilter bw, [TS] tsfilter cf, [TS] tsfilter hp,
[TS] ucm
Monfort, A., [TS] arima, [TS] mgarch ccc,
[TS] mgarch dcc, [TS] mgarch vcc
Montgomery, D. C., [TS] tssmooth, [TS] tssmooth
dexponential, [TS] tssmooth exponential,
[TS] tssmooth hwinters, [TS] tssmooth
shwinters
Moore, J. B., [TS] sspace
moving average
  model, [TS] arch, [TS] arfima, [TS] arima,
      [TS] sspace, [TS] ucm
  process, [TS] Glossary
  smoother, [TS] tssmooth, [TS] tssmooth ma
multiplicative heteroskedasticity, [TS] arch
multivariate GARCH, [TS] mgarch, [TS] Glossary
  model,
    constant conditional correlation, [TS] mgarch ccc
    diagonal vech, [TS] mgarch dvech
    dynamic conditional correlation, [TS] mgarch dcc
    varying conditional correlation, [TS] mgarch vcc
  postestimation,
    after ccc model, [TS] mgarch ccc postestimation
    after dcc model, [TS] mgarch dcc postestimation
    after dvech model, [TS] mgarch dvech postestimation
    after vcc model, [TS] mgarch vcc postestimation
multivariate time-series estimators,
  dynamic-factor models, [TS] dfactor
  MGARCH models, see multivariate GARCH
  state-space models, [TS] sspace
  structural vector autoregressive models, [TS] var svar
  vector autoregressive models, [TS] var, [TS] varbasic
  vector error-correction models, [TS] vec

N
NARCH, see nonlinear autoregressive conditional
heteroskedasticity
NARCHK, see nonlinear autoregressive conditional
heteroskedasticity with a shift
Nelson, D. B., [TS] arch, [TS] arima, [TS] mgarch
Neudecker, H., [TS] var svar
Newbold, P., [TS] arima, [TS] vec intro
newey command, [TS] newey, [TS] newey
postestimation
Newey, W. K., [TS] newey, [TS] pperron
Newey–West
  covariance matrix, [TS] Glossary
  postestimation, [TS] newey postestimation
  regression, [TS] newey
Newton, H. J., [TS] arima, [TS] corrgram,
[TS] cumsp, [TS] dfuller, [TS] pergram,
[TS] wntestb, [TS] xcorr
Ng, S., [TS] dfgls
Nickell, S. J., [TS] forecast
Nielsen, B., [TS] varsoc, [TS] vec intro
nl, tssmooth subcommand, [TS] tssmooth nl
nonlinear
  autoregressive conditional heteroskedasticity,
      [TS] arch
  autoregressive conditional heteroskedasticity with a
      shift, [TS] arch
  estimation, [TS] arch
  power autoregressive conditional heteroskedasticity,
      [TS] arch
  smoothing, [TS] tssmooth nl
nonstationary time series, [TS] dfgls, [TS] dfuller,
[TS] pperron, [TS] vec intro, [TS] vec
normality test
  after VAR or SVAR, [TS] varnorm
  after VEC, [TS] vecnorm
NPARCH, see nonlinear power autoregressive
conditional heteroskedasticity

O
O'Connell, R. T., [TS] tssmooth, [TS] tssmooth
dexponential, [TS] tssmooth exponential,
[TS] tssmooth hwinters, [TS] tssmooth
shwinters
ograph, irf subcommand, [TS] irf ograph
Olkin, I., [TS] wntestb
one-step-ahead forecast, see static forecast
Ooms, M., [TS] arfima
Orcutt, G. H., [TS] prais
orthogonalized impulse–response function, [TS] irf,
[TS] var intro, [TS] vec intro, [TS] vec,
[TS] Glossary
Osterwald-Lenum, M. G., [TS] vecrank
Owen, A. L., [TS] forecast

P
pac command, [TS] corrgram
Pagan, A. R., [TS] Glossary
Palma, W., [TS] arfima, [TS] arfima postestimation,
[TS] estat acplot
parametric spectral density estimation, [TS] psdensity
PARCH, see power autoregressive conditional
heteroskedasticity
Park, J. Y., [TS] sspace, [TS] vec intro, [TS] vec,
[TS] vecrank
partial autocorrelation function, [TS] corrgram,
[TS] Glossary
Paulsen, J., [TS] varsoc, [TS] vec intro
pergram command, [TS] pergram
period, estat subcommand, [TS] ucm postestimation
periodogram, [TS] pergram, [TS] psdensity,
[TS] Glossary
Perron, P., [TS] dfgls, [TS] pperron, [TS] Glossary
phase function, [TS] Glossary
Phillips, P. C. B., [TS] pperron, [TS] vargranger,
[TS] vec intro, [TS] vec, [TS] vecrank,
[TS] Glossary
Phillips–Perron test, [TS] pperron
Pierce, D. A., [TS] wntestq
Pisati, M., [TS] time series
Pitarakis, J.-Y., [TS] vecrank
Plosser, C. I., [TS] vecrank
Pollock, D. S. G., [TS] tsfilter, [TS] tsfilter bk,
[TS] tsfilter bw, [TS] tsfilter cf, [TS] tsfilter hp
portmanteau statistic, [TS] corrgram, [TS] wntestq,
[TS] Glossary
postestimation command, [TS] estat acplot, [TS] estat
aroots, [TS] fcast compute, [TS] fcast graph,
[TS] irf, [TS] psdensity, [TS] vargranger,
[TS] varlmar, [TS] varnorm, [TS] varsoc,
[TS] varstable, [TS] varwle, [TS] veclmar,
[TS] vecnorm, [TS] vecstable
Powell, M. J. D., [TS] forecast solve
power autoregressive conditional heteroskedasticity,
[TS] arch
pperron command, [TS] pperron
prais command, [TS] prais, [TS] prais postestimation
Prais, S. J., [TS] prais
Prais–Winsten regression, [TS] prais, [TS] prais
postestimation, [TS] Glossary
Prescott, E. C., [TS] tsfilter, [TS] tsfilter hp
Press, W. H., [TS] arch, [TS] arima
Priestley, M. B., [TS] psdensity, [TS] tsfilter, [TS] ucm
priming values, [TS] Glossary
psdensity command, [TS] psdensity

Q
Q statistic, see portmanteau statistic
query, forecast subcommand, [TS] forecast query

R
random walk, [TS] Glossary
Ravn, M. O., [TS] tsfilter, [TS] tsfilter hp
Rebelo, S. T., [TS] tsfilter, [TS] tsfilter hp
recursive estimation, [TS] rolling
recursive regression analysis, [TS] Glossary
Reinsel, G. C., [TS] arfima, [TS] arima,
[TS] corrgram, [TS] cumsp, [TS] dfuller,
[TS] estat acplot, [TS] pergram, [TS] pperron,
[TS] psdensity, [TS] vec intro, [TS] xcorr
rename, irf subcommand, [TS] irf rename
Robins, R. P., [TS] arch
robust, Huber/White/sandwich estimator of variance
  ARCH, [TS] arch
  ARFIMA, [TS] arfima
  ARIMA and ARMAX, [TS] arima
  dynamic-factor model, [TS] dfactor
  GARCH, [TS] arch
  Newey–West regression, [TS] newey
  Prais–Winsten and Cochrane–Orcutt regression,
      [TS] prais
  state-space model, [TS] sspace
  unobserved-components model, [TS] ucm
rolling command, [TS] rolling
rolling regression, [TS] rolling, [TS] Glossary
Rombouts, J. V. K., [TS] mgarch
Room, T., [TS] arima
Rothenberg, T. J., [TS] dfgls, [TS] sspace, [TS] var
svar, [TS] vec, [TS] Glossary
Runkle, D. E., [TS] arch
Ryzhik, I. M., [TS] arfima

S
SAARCH, see simple asymmetric autoregressive
conditional heteroskedasticity
Saikkonen, P., [TS] vec intro, [TS] vecrank
Salvador, M., [TS] vecrank
Samaniego, F. J., [TS] varwle
Sanchez, G., [TS] arima
sandwich/Huber/White estimator of variance, see robust,
Huber/White/sandwich estimator of variance
Sargan, J. D., [TS] prais
Sargent, T. J., [TS] dfactor
scenarios, [TS] forecast, [TS] forecast adjust,
[TS] forecast clear, [TS] forecast coefvector,
[TS] forecast create, [TS] forecast describe,
[TS] forecast drop, [TS] forecast estimates,
[TS] forecast exogenous, [TS] forecast
identity, [TS] forecast list, [TS] forecast query,
[TS] forecast solve
Schmidt, T. J., [TS] tsfilter
Schneider, W., [TS] sspace
Schwert, G. W., [TS] dfgls
seasonal
  ARIMA, [TS] arima
  difference operator, [TS] Glossary
  smoothing, [TS] tssmooth, [TS] tssmooth shwinters
seemingly unrelated regression, [TS] dfactor
selection-order statistics, [TS] varsoc
Sentana, E., [TS] mgarch
Serfling, R. J., [TS] irf create
serial correlation, see autocorrelation
  test, [TS] Glossary
set, irf subcommand, [TS] irf set
Shumway, R. H., [TS] arima
shwinters, tssmooth subcommand, [TS] tssmooth
shwinters
Silvennoinen, A., [TS] mgarch, [TS] mgarch ccc
simple asymmetric autoregressive conditional
heteroskedasticity, [TS] arch
Sims, C. A., [TS] dfactor, [TS] irf create, [TS] var
svar, [TS] vec intro, [TS] vec, [TS] vecrank
simulation, [TS] forecast, [TS] forecast adjust,
[TS] forecast clear, [TS] forecast coefvector,
[TS] forecast create, [TS] forecast describe,
[TS] forecast drop, [TS] forecast estimates,
[TS] forecast exogenous, [TS] forecast
identity, [TS] forecast list, [TS] forecast query,
[TS] forecast solve
skewness, [TS] varnorm
smoothers, [TS] tssmooth, [TS] Glossary
  double exponential, [TS] tssmooth dexponential
  exponential, [TS] tssmooth exponential
  Holt–Winters,
    nonseasonal, [TS] tssmooth hwinters
    seasonal, [TS] tssmooth shwinters
  moving average, [TS] tssmooth ma
  nonlinear, [TS] tssmooth nl
solve, forecast subcommand, [TS] forecast solve
Sorrentino, R., [TS] tsfilter, [TS] tsfilter bw
Sotoca, S., [TS] sspace
Sowell, F., [TS] arfima
spectral
  analysis, [TS] Glossary
  density, [TS] psdensity, [TS] Glossary
  distribution, [TS] cumsp, [TS] pergram,
      [TS] psdensity, [TS] Glossary
spectrum, [TS] psdensity, [TS] Glossary
Sperling, R. I., [TS] arch, [TS] arima, [TS] dfgls,
[TS] wntestq
sspace command, [TS] sspace, [TS] sspace
postestimation
stability, [TS] var intro, [TS] var, [TS] var svar,
[TS] vecstable
  after ARIMA, [TS] estat aroots
  after VAR or SVAR, [TS] varstable
  after VEC, [TS] vec intro, [TS] vec
standard errors, robust,
see robust, Huber/White/sandwich estimator of
variance
state-space model, [TS] sspace, [TS] sspace
postestimation, [TS] Glossary, also see
autoregressive integrated moving-average model,
also see dynamic factor model
static forecast, [TS] forecast, [TS] forecast adjust,
[TS] forecast clear, [TS] forecast coefvector,
[TS] forecast create, [TS] forecast describe,
[TS] forecast drop, [TS] forecast estimates,
[TS] forecast exogenous, [TS] forecast
identity, [TS] forecast list, [TS] forecast query,
[TS] forecast solve, [TS] Glossary
stationary time series, [TS] dfgls, [TS] dfuller,
[TS] pperron, [TS] var intro, [TS] var, [TS] vec
intro, [TS] vec
steady-state equilibrium, [TS] Glossary
stochastic
  equation, [TS] Glossary
  trend, [TS] tsfilter, [TS] ucm, [TS] Glossary
Stock, J. H., [TS] arch, [TS] dfactor, [TS] dfgls,
[TS] irf create, [TS] rolling, [TS] sspace,
[TS] time series, [TS] var intro, [TS] var,
[TS] var svar, [TS] vec intro, [TS] vec,
[TS] vecrank, [TS] Glossary
strict stationarity, [TS] Glossary
structural model, [TS] Glossary
structural time-series model, [TS] psdensity,
[TS] sspace, [TS] ucm, [TS] Glossary
structural vector autoregressive
  model, [TS] var intro, [TS] var svar, [TS] Glossary
  postestimation, [TS] fcast compute, [TS] fcast graph,
      [TS] irf, [TS] irf create, [TS] var svar postestimation,
      [TS] vargranger, [TS] varlmar, [TS] varnorm,
      [TS] varsoc, [TS] varstable, [TS] varwle
SUR, see seemingly unrelated regression
SVAR, see structural vector autoregressive
svar command, [TS] var svar, [TS] var svar
postestimation

T
table, irf subcommand, [TS] irf table
tables, [TS] irf ctable, [TS] irf table
TARCH, see threshold autoregressive conditional
heteroskedasticity
Teräsvirta, T., [TS] mgarch, [TS] mgarch ccc
test,
  Dickey–Fuller, see Dickey–Fuller test
  Granger causality, see Granger causality
  Lagrange multiplier, see Lagrange multiplier test
  normality, see normality test
  Wald, see Wald test
Teukolsky, S. A., [TS] arch, [TS] arima
Theil, H., [TS] prais
threshold autoregressive conditional heteroskedasticity,
[TS] arch
time-domain analysis, [TS] arch, [TS] arfima,
[TS] arima, [TS] Glossary
time-series
  filter, [TS] psdensity, [TS] ucm
  operators, [TS] tsset
time-varying variance, [TS] arch
trend, [TS] Glossary
Trimbur, T. M., [TS] psdensity, [TS] tsfilter,
[TS] tsfilter hp, [TS] ucm
Trivedi, P. K., [TS] forecast estimates
tsappend command, [TS] tsappend
Tsay, R. S., [TS] varsoc, [TS] vec intro
Tse, Y. K., [TS] mgarch, [TS] mgarch vcc
tsfill command, [TS] tsfill
tsfilter, [TS] tsfilter
  bk command, [TS] tsfilter bk
  bw command, [TS] tsfilter bw
  cf command, [TS] tsfilter cf
  hp command, [TS] tsfilter hp
tsline command, [TS] tsline
tsreport command, [TS] tsreport
tsrevar command, [TS] tsrevar
tsrline command, [TS] tsline
tsset command, [TS] tsset
tssmooth, [TS] tssmooth
  dexponential command, [TS] tssmooth dexponential
  exponential command, [TS] tssmooth exponential
  hwinters command, [TS] tssmooth hwinters
  ma command, [TS] tssmooth ma
  nl command, [TS] tssmooth nl
  shwinters command, [TS] tssmooth shwinters
Tsui, A. K. C., [TS] mgarch, [TS] mgarch vcc

U
UCM, see unobserved-components model
ucm command, [TS] ucm, [TS] ucm postestimation
Uhlig, H., [TS] tsfilter, [TS] tsfilter hp
unit-root
  models, [TS] vec intro, [TS] vec
  process, [TS] Glossary
  test, [TS] dfgls, [TS] dfuller, [TS] pperron,
      [TS] Glossary
univariate time series, [TS] arch, [TS] arfima,
[TS] arima, [TS] newey, [TS] prais, [TS] ucm
unobserved-components model, [TS] psdensity
  model, [TS] ucm
  postestimation, [TS] ucm postestimation

V
Van Loan, C. F., [TS] arfima, [TS] arfima
postestimation
VAR, see vector autoregressive
var command, [TS] var, [TS] var postestimation
varbasic command, [TS] varbasic, [TS] varbasic
postestimation
vargranger command, [TS] vargranger
variance, Huber/White/sandwich estimator, see robust,
Huber/White/sandwich estimator of variance
variance decompositions, see forecast-error variance
decomposition
varlmar command, [TS] varlmar
varnorm command, [TS] varnorm
varsoc command, [TS] varsoc
varstable command, [TS] varstable
varwle command, [TS] varwle
varying conditional-correlation model, [TS] mgarch,
[TS] mgarch vcc
vcc, mgarch subcommand, [TS] mgarch vcc
VEC, see vector error-correction model
vec command, [TS] vec, [TS] vec postestimation
veclmar command, [TS] veclmar
VECM, see vector error-correction model
vecnorm command, [TS] vecnorm
vecrank command, [TS] vecrank
vecstable command, [TS] vecstable
vector autoregressive
forecast, [TS] fcast compute, [TS] fcast graph
model, [TS] dfactor, [TS] sspace, [TS] ucm,
[TS] var intro, [TS] var, [TS] var svar,
[TS] varbasic, [TS] Glossary
moving-average model, [TS] dfactor, [TS] sspace,
[TS] ucm
postestimation, [TS] fcast compute,
[TS] fcast graph, [TS] irf, [TS] irf create,
[TS] var postestimation, [TS] vargranger,
[TS] varlmar, [TS] varnorm, [TS] varsoc,
[TS] varstable, [TS] varwle
vector error-correction
model, [TS] vec intro, [TS] vec, [TS] Glossary, also
see multivariate GARCH
postestimation, [TS] fcast compute,
[TS] fcast graph, [TS] irf, [TS] irf create,
[TS] varsoc, [TS] vec postestimation,
[TS] veclmar, [TS] vecnorm, [TS] vecrank,
[TS] vecstable
Vetterling, W. T., [TS] arch, [TS] arima
Vigfusson, R. J., [TS] forecast solve

W
Wald, A., [TS] varwle
Wald test, [TS] vargranger, [TS] varwle
Wang, Q., [TS] arima, [TS] newey
Watson, G. S., [TS] prais, [TS] Glossary
Watson, M. W., [TS] arch, [TS] dfactor, [TS] dfgls,
[TS] irf create, [TS] rolling, [TS] sspace,
[TS] time series, [TS] var intro, [TS] var,
[TS] var svar, [TS] vec intro, [TS] vec,
[TS] vecrank
Wei, W. W. S., [TS] psdensity, [TS] tsfilter, [TS] ucm,
[TS] Glossary
weighted moving average, [TS] tssmooth,
[TS] tssmooth ma
West, K. D., [TS] newey, [TS] pperron
White, H. L., Jr., [TS] newey, [TS] prais
white noise, [TS] wntestb, [TS] wntestq, [TS] Glossary

White/Huber/sandwich estimator of variance, see robust,
Huber/White/sandwich estimator of variance
Wiggins, V. L., [TS] arch, [TS] arima, [TS] sspace
Winsten, C. B., [TS] prais
Winters, P. R., [TS] tssmooth,
[TS] tssmooth dexponential,
[TS] tssmooth exponential,
[TS] tssmooth hwinters,
[TS] tssmooth shwinters
wntestb command, [TS] wntestb
wntestq command, [TS] wntestq
Wolfowitz, J., [TS] varwle
Wooldridge, J. M., [TS] arch, [TS] mgarch,
[TS] mgarch dvech, [TS] prais
Wu, N., [TS] arima, [TS] newey

X
xcorr command, [TS] xcorr

Y
Yar, M., [TS] tssmooth, [TS] tssmooth dexponential,
[TS] tssmooth exponential, [TS] tssmooth hwinters,
[TS] tssmooth shwinters
Yule–Walker equations, [TS] corrgram, [TS] Glossary

Z
Zakoian, J. M., [TS] arch
Zellner, A., [TS] prais
