Time Series Analysis

James D. Hamilton

PRINCETON UNIVERSITY PRESS
PRINCETON, NEW JERSEY

Copyright © 1994 by Princeton University Press
All Rights Reserved

Library of Congress Cataloging-in-Publication Data
Hamilton, James D. (James Douglas)
Time series analysis / James D. Hamilton.

This book has been composed in Times Roman.
Princeton University Press books are printed on acid-free paper and meet the guidelines for permanence and durability of the Committee on Production Guidelines for Book Longevity of the Council on Library Resources.
Printed in the United States of America
10 9 8 7 6 5 4 3 2 1

CONTENTS

PREFACE xiii

1 Difference Equations 1
1.1. First-Order Difference Equations 1
1.2. pth-Order Difference Equations 7
APPENDIX 1.A. Proofs of Chapter 1 Propositions 21
References 24

2 Lag Operators 25
2.1. Introduction 25
2.2. First-Order Difference Equations 27
2.3. Second-Order Difference Equations 29
2.4. pth-Order Difference Equations 33
2.5. Initial Conditions and Unbounded Sequences 36
References 42

3 Stationary ARMA Processes 43
3.1. Expectations, Stationarity, and Ergodicity 43
3.2. White Noise 47
3.3. Moving Average Processes 48
3.4. Autoregressive Processes 53
3.5. Mixed Autoregressive Moving Average Processes 59
3.6. The Autocovariance-Generating Function 61
3.7. Invertibility 64
APPENDIX 3.A. Convergence Results for Infinite-Order Moving Average Processes 69
Exercises 70
References 71

4 Forecasting 72
4.1. Principles of Forecasting 72
4.2. Forecasts Based on an Infinite Number of Observations 77
4.3. Forecasts Based on a Finite Number of Observations
4.4. The Triangular Factorization of a Positive Definite Symmetric Matrix 87
4.5. Updating a Linear Projection 92
4.6. Optimal Forecasts for Gaussian Processes 100
4.7. Sums of ARMA Processes 102
4.8. Wold's Decomposition and the Box-Jenkins Modeling Philosophy 108
APPENDIX 4.A. Parallel Between OLS Regression and Linear Projection 113
APPENDIX 4.B. Triangular Factorization of the Covariance Matrix for an MA(1) Process 114
Exercises 115
References 116

5 Maximum Likelihood Estimation 117
5.1. Introduction 117
5.2. The Likelihood Function for a Gaussian AR(1) Process 118
5.3. The Likelihood Function for a Gaussian AR(p) Process 123
5.4. The Likelihood Function for a Gaussian MA(1) Process 127
5.5. The Likelihood Function for a Gaussian MA(q) Process 130
5.6. The Likelihood Function for a Gaussian ARMA(p, q) Process 132
5.7. Numerical Optimization 133
5.8. Statistical Inference with Maximum Likelihood Estimation 142
5.9. Inequality Constraints 146
APPENDIX 5.A. Proofs of Chapter 5 Propositions 148
Exercises 150
References 150

6 Spectral Analysis 152
6.1. The Population Spectrum 152
6.2. The Sample Periodogram 158
6.3. Estimating the Population Spectrum 163
6.4. Uses of Spectral Analysis 167
Exercises 178
References 178

7 Asymptotic Distribution Theory 180
7.1. Review of Asymptotic Distribution Theory 180
7.2. Limit Theorems for Serially Dependent Observations
APPENDIX 7.A. Proofs of Chapter 7 Propositions 198
Exercises 198
References 199

8 Linear Regression Models 200
8.1. Review of Ordinary Least Squares with Deterministic Regressors and i.i.d. Gaussian Disturbances 200
8.2. Ordinary Least Squares Under More General Conditions 207
8.3. Generalized Least Squares 220
APPENDIX 8.A. Proofs of Chapter 8 Propositions 228

9 Linear Systems of Simultaneous Equations 233
9.1. Simultaneous Equations Bias 233
9.2. Instrumental Variables and Two-Stage Least Squares 238
9.3. Identification 243
9.4. Full-Information Maximum Likelihood Estimation 247
9.5. Estimation Based on the Reduced Form 250
9.6. Overview of Simultaneous Equations Bias 252
APPENDIX 9.A. Proofs of Chapter 9 Propositions 253
Exercises 255
References 256

10 Covariance-Stationary Vector Processes 257
10.1. Introduction to Vector Autoregressions 257
10.2. Autocovariances and Convergence Results for Vector Processes 261
10.3. The Autocovariance-Generating Function for Vector Processes 266
10.4. The Spectrum for Vector Processes 268
10.5. The Sample Mean of a Vector Process 279
APPENDIX 10.A. Proofs of Chapter 10 Propositions 285
Exercises 290
References 290

11 Vector Autoregressions 291
11.1. Maximum Likelihood Estimation and Hypothesis Testing for an Unrestricted Vector Autoregression 291
11.2. Bivariate Granger Causality Tests 302
11.3. Maximum Likelihood Estimation of Restricted Vector Autoregressions 309
11.4. The Impulse-Response Function 318
11.5. Variance Decomposition 323
11.6. Vector Autoregressions and Structural Econometric Models 324
11.7. Standard Errors for Impulse-Response Functions 336
APPENDIX 11.A. Proofs of Chapter 11 Propositions 340
APPENDIX 11.B. Calculation of Analytic Derivatives 344
Exercises 348
References 349

12 Bayesian Analysis 351
12.1. Introduction to Bayesian Analysis 351
12.2. Bayesian Analysis of Vector Autoregressions 360
12.3. Numerical Bayesian Methods 362
APPENDIX 12.A. Proofs of Chapter 12 Propositions 366
Exercises 370
References 370

13 The Kalman Filter 372
13.1. The State-Space Representation of a Dynamic System 372
13.2. Derivation of the Kalman Filter
13.3. Forecasts Based on the State-Space Representation 381
13.4. Maximum Likelihood Estimation of Parameters 385
13.5. The Steady-State Kalman Filter 389
13.6. Smoothing 394
13.7. Statistical Inference with the Kalman Filter 397
13.8. Time-Varying Parameters
APPENDIX 13.A. Proofs of Chapter 13 Propositions 403
Exercises 406
References 407

14 Generalized Method of Moments 409
14.1. Estimation by the Generalized Method of Moments 409
14.2. Examples 415
14.3. Extensions 424
14.4. GMM and Maximum Likelihood Estimation 427
APPENDIX 14.A. Proofs of Chapter 14 Propositions 431

15 Models of Nonstationary Time Series 435
15.1. Introduction 435
15.2. Why Linear Time Trends and Unit Roots? 438
15.3. Comparison of Trend-Stationary and Unit Root Processes 438
15.4. The Meaning of Tests for Unit Roots 444
15.5. Other Approaches to Trended Time Series 447
APPENDIX 15.A. Derivation of Selected Equations for Chapter 15 451
References 452

16 Processes with Deterministic Time Trends 454
16.1. Asymptotic Distribution of OLS Estimates of the Simple Time Trend Model 454
16.2. Hypothesis Testing for the Simple Time Trend Model 461
16.3. Asymptotic Inference for an Autoregressive Process Around a Deterministic Time Trend 463
APPENDIX 16.A. Derivation of Selected Equations for Chapter 16
Exercises 474
References 474

17 Univariate Processes with Unit Roots 475
17.1. Introduction 475
17.2. Brownian Motion 477
17.3. The Functional Central Limit Theorem 479
17.4. Asymptotic Properties of a First-Order Autoregression when the True Coefficient Is Unity 486
17.5. Asymptotic Results for Unit Root Processes with General Serial Correlation 504
17.6. Phillips-Perron Tests for Unit Roots 506
17.7. Asymptotic Properties of a pth-Order Autoregression and the Augmented Dickey-Fuller Tests for Unit Roots 516
17.8. Other Approaches to Testing for Unit Roots 531
17.9. Bayesian Analysis and Unit Roots 532
APPENDIX 17.A. Proofs of Chapter 17 Propositions 534
Exercises 537
References 541

18 Unit Roots in Multivariate Time Series 544
18.1. Asymptotic Results for Nonstationary Vector Processes 544
18.2. Vector Autoregressions Containing Unit Roots 549
18.3. Spurious Regressions 557
APPENDIX 18.A. Proofs of Chapter 18 Propositions 562
Exercises 568
References 569

19 Cointegration 571
19.1. Introduction 571
19.2. Testing the Null Hypothesis of No Cointegration 582
19.3. Testing Hypotheses About the Cointegrating Vector 601
APPENDIX 19.A. Proofs of Chapter 19 Propositions 618

20 Full-Information Maximum Likelihood Analysis of Cointegrated Systems 630
20.1. Canonical Correlation 630
20.2. Maximum Likelihood Estimation 635
20.3. Hypothesis Testing 645
20.4. Overview of Unit Roots—To Difference or Not to Difference? 651
APPENDIX 20.A. Proofs of Chapter 20 Propositions 653
Exercises 655
References

21 Time Series Models of Heteroskedasticity 657
21.1. Autoregressive Conditional Heteroskedasticity (ARCH) 657
21.2. Extensions 665
APPENDIX 21.A. Derivation of Selected Equations for Chapter 21 673
References 674

22 Modeling Time Series with Changes in Regime 677
22.1. Introduction 677
22.2. Markov Chains 678
22.3. Statistical Analysis of i.i.d. Mixture Distributions 685
22.4. Time Series Models of Changes in Regime 690
APPENDIX 22.A. Derivation of Selected Equations for Chapter 22 699
Exercises 702
References 702

A Mathematical Review 704
A.1. Trigonometry 704
A.2. Complex Numbers 708
A.3. Calculus 711
A.4. Matrix Algebra 721
A.5. Probability and Statistics 739
References 750

B Statistical Tables 751

C Answers to Selected Exercises 769

D Greek Letters and Mathematical Symbols Used in the Text 786

AUTHOR INDEX 789
SUBJECT INDEX 792

Preface

Much of economics is concerned with modeling dynamics. There has been an explosion of research in this area in the last decade, as "time series econometrics" has practically come to be synonymous with "empirical macroeconomics."

Several texts provide good coverage of the advances in the economic analysis of dynamic systems, while others summarize the earlier literature on statistical inference for time series data. There seemed a use for a text that could integrate the theoretical and empirical issues as well as incorporate the many advances of the last decade, such as the analysis of vector autoregressions, estimation by generalized method of moments, and statistical inference for nonstationary data. This is the goal of Time Series Analysis.

A principal anticipated use of the book would be as a textbook for a graduate econometrics course in time series analysis. The book aims for maximum flexibility through what might be described as an integrated modular structure. As an example of this, the first three sections of Chapter 13 on the Kalman filter could be covered right after Chapter 4, if desired. Alternatively, Chapter 13 could be skipped altogether without loss of comprehension. Despite this flexibility, state-space ideas are fully integrated into the text beginning with Chapter 1, where a state-space representation is used (without any jargon or formalism) to introduce the key results concerning difference equations. Thus, when the reader encounters the formal development of the state-space framework and the Kalman filter in Chapter 13, the notation and key ideas should already be quite familiar.

Spectral analysis (Chapter 6) is another topic that could be covered at a point of the reader's choosing or skipped altogether. In this case, the integrated modular structure is achieved by the early introduction and use of autocovariance-generating
functions and filters. Wherever possible, results are described in terms of these rather than the spectrum.

Although the book is designed with an econometrics course in time series methods in mind, the book should be useful for several other purposes. It is completely self-contained, starting from basic principles accessible to first-year graduate students and including an extensive math review appendix. Thus the book would be quite suitable for a first-year graduate course in macroeconomics or dynamic methods that has no econometric content. Such a course might use Chapters 1 and 2, Sections 3.1 through 3.5, and Sections 4.1 and 4.2.

Yet another intended use for the book would be in a conventional econometrics course without an explicit time series focus. The popular econometrics texts do not have much discussion of such topics as numerical methods; asymptotic results for serially dependent, heterogeneously distributed observations; estimation of models with distributed lags; autocorrelation- and heteroskedasticity-consistent standard errors; Bayesian analysis; or generalized method of moments. All of these topics receive extensive treatment in Time Series Analysis. Thus, an econometrics course without an explicit focus on time series might make use of Sections 3.1 through 3.5, Chapters 7 through 9, and Chapter 14, and perhaps any of Chapters 5, 11, and 12 as well. Again, the text is self-contained, with a fairly complete discussion of conventional simultaneous equations methods in Chapter 9. Indeed, a very important goal of the text is to develop the parallels between (1) the traditional econometric approach to simultaneous equations and (2) the current popularity of vector autoregressions and generalized method of moments estimation.

Finally, the book attempts to provide a rigorous motivation for the methods and yet still be accessible for researchers with purely applied interests. This is achieved by relegation of many details to mathematical appendixes at the ends of chapters, and by inclusion of numerous examples that illustrate exactly how the theoretical results are used and applied in practice.

The book developed out of my lectures at the University of Virginia. I am grateful first and foremost to my many students over the years whose questions and comments have shaped the course of the manuscript. I also have an enormous debt to numerous colleagues who have kindly offered many helpful suggestions, and would like to thank in particular Donald W. K. Andrews, Stephen R. Blough, John Cochrane, George Davis, Michael Dotsey, Robert Engle, T. Wake Epps, Marjorie Flavin, John Geweke, Eric Ghysels, Carlo Giannini, Clive W. J. Granger, Alastair Hall, Bruce E. Hansen, Kevin Hassett, Tomoo Inoue, Ravi Jagannathan, Kenneth F. Kroner, Rocco Mosconi, Masao Ogaki, Adrian Pagan, Peter C. B. Phillips, Peter Rappoport, Glenn Rudebusch, Raul Susmel, Mark Watson, Kenneth D. West, Halbert White, and Jeffrey M. Wooldridge. I would also like to thank Pok-sang Lam and John Rogers for graciously sharing their data. Thanks also go to Keith Sill and Christopher Stomberg for the ..., to ... Chen for assistance with the statistical tables in Appendix B, and to Richard Mickey for a superb job of copy editing.

James D. Hamilton

Time Series Analysis

1
Difference Equations

1.1. First-Order Difference Equations

This book is concerned with the dynamic consequences of events over time. Let's say we are studying a variable whose value at date t is denoted y_t.
Suppose we ate given a dynamic equation relating the value y takes on at date £10 another variable wand ta the value y tank on in the previens period He Ors + He roel Equation [1.11] sa linear first-order difference equation. A difference equation is ‘an expression relating a variable y, to its previous values. This is a first-order ifference equation because onlv the first lag of the variable (¥...) appears in the ‘equation. Note that it expresses y, as a linear function of y,_» and, ‘An example of [1.11] ie Golfelas (1073) ectimated money demand function forthe United States. Goldfeld’s model related the log ofthe real money holdings of the public (re) tothe fog of aggtegate teat inane (Z, the log ofthe itereat ale 0 ‘bank accounts (7), and the log of the interest rte on commercial paper (7): im, = 027 + OTim.; + 0191, ~ 0.0454 ~ 001%rq [1.1.2] This isa special case of [1.1.1] with y, = m, @ = 0.72, and = 027 + OAL, ~ 0.045 ~ 0.0157 For purposes of analyzing the dynamics of such a system, it simplifies the algebra ali to summarize the effet ofall the input variables (ru and) in terms cof a scalar was here. Tn Chapter 3 the igpt variable w, wil be regarded asa random variable, and the implications of [1.1.1 for th statistical properties of the output series y, will be ‘explored. In preparation for this discussion, itis noceasery Gt to understand the mechanics of difference equations. For the discussion in Chapters 1 and , the values Tor tne mput varie (wa, ---} Wal Simpy De regaroea as sequence ot aeter- sinistc numbers, Our goal isto answer the following question: If a dynamic system is described by [1.1.1], what are the effects on y of changes inthe value of w? Solving a Difference Equation by Recursive Substisution ‘The presumption is that the dynamic equation [1.1.1] goverus the behavior of y forall dates . Thus, for each date we have an equation relating the value of 1 _y for that date to its previous value and the current value of w: 0 ee ya the ny i= dye +m (14) 2 weéntm pas} Fs dart me p14 I we know the starting value of y for date ¢ = 1 and the value of w for ‘dates 1 = 0,1, 2, ..-, then its possible to simulate this dynamic system to find the value of y for any date. For example, if we know the value of y for = 1 andthe value of w for = 0, we can cleat the value of y for ¢ = 0 deatly {rom [1.1.3]. Given this value af yp and the valie af w for f= 1, we can cl the value of y for = 1 from {1.1.4} by. + Hoya +H) + we, Ye OY + Oy + my Given this value of y, and the value of w for ¢ = 2, we cam calculate the value of y for t = 2 from [1.1 5} a= Ont Wa = G1 + bo +m) + ms Yam Oy + Foo + dm, + me Continuing recursively in this fasion, the value that y takes on at date ¢ can be described as a function of its intial value y , and the history af we hetween date Oand date Y= OV + Guy + Som, + bw, + + hwy tm [1.1.7] ‘his procedure is known as solving the difference equation [1.1.1] by recursive substicution, Dynamic Multipliers [Note that [1.1.7] expresses y, 28a linear function of the intial value y_, and ie iitorica values Of W. Tats maxes ic very easy to calculate the effect ot My on. Ye If wy were to change with y_, and wy, Way... w; taken as unaffected, the ‘ftect on y, would be given by Oey, fine (18) [Note that the calculations would be exactly the same if the dynamic simulation were started at date r (aking y,-. 
as given); then y,., could be described as @ 2 Chapter | Difference Equations function of ya and Wi Wiese s Weas eat Po + Bly + Bas + OMe ‘The eflect of w, on ys given by at ogy 1.1.20) ‘Thus the dynamic multiplier [1.1.10] depends only on/, the length of tine sepatating the disturbance tothe input (w,) and the observed value ofthe output (j.,). The tuultplier docs wot depend on ¢, dat i, it dues not depend on the dates of the ‘observations themselves. This is true of any linear difference equation, ‘As an example of calculating a dynamic multiplier, consider again Goldfeld’s ‘money demand specification [1.1.2]. Suppose we want to know what will happen to money demand two quarters from now if current income J, were to increase by ‘one unit today with fate inenme 1, and J, zsmaffecte! aman _ am, mem sca aL ead rom [1.1.2], a one-unit increase in J will increase w, by 0.19 units, meaning that anal, = 0.19. Since @ = 0.72, we calculate armen - et ~ (0-20.49) = 0.098, {ecause 1 tne 1og ot mcome, an increase 11 of V0 units corresponds to a 1% increase in income. An increase in m, of 0.04)-(0.098) = 0.001 corresponds to {10.1% increase in money holdings, Thus the public would be expected to increase its money holdings by afte exe than 0.1% two quarters folowing a 1%h increase in income, Different values of ¢ in [11.1] ean produce a vatiety of dynamic responses of y tow. IO < 6 < 1, the multiplier 2,.,/3w, in [1.110] decays geometrically toward zero, Panel (a) of Figuce 1.1 plots @ as u function of j for @ = 0.8. If =1.< <0, the multiplier dy,,,4m, wil alternate in sign asin panel (b). In this case an increase in w, wil cause) to be higher, y,., tobe lower, j,.. 0 be higher, and soon. Again the absolute value ofthe eect decays eometrcaly toward zero, If > 1, the dynamic multiplier increases exponentially over time as in panel (c) {A given increate in w, has a larger effect the farth @< ~1, the system [1.1.1] exhibits explosive oscillation as in panel (4). ‘This, if [| <1, the sytem ia atable; the consequences of a given change in w, wil eventually die out. If fl > 1, the system is explosive. An interesting pos silly isthe borderline case, @ = 1. In ts eae, the solution (1.1.9} becomes Peay Fon 8K mad Ema Fmt may CLA Hire the output variable y i the sum ofthe historical inputs w. A in w will cause a permanent one-unit increase in y: Bay om forj = 0,1, We might also he interested in the effect of w om the present value of the stream of future realizations of y. For a given stream of future values Yo Jre1+ LLL. First Order Difference Equations 3 $$ amr gc gerne (@) 6 = 08 b) = -08 (eat @es-11 FIGURE 1.1 Dynamic multiplier for first-order difference equation for different values of ¢ (plot of 2y 40m, — #! ac a function ofthe lag j). Yon «+ and a constant interest rate! r > 0, the present value of the stream at time ¢ i given by igs + ty 1.1.12) tert Tee ae 12 Let p denote the discount factor: pau +9). [Note that 0 < P< 1. Then the provent value [1.1.12] can be written as PB Pier 1.1.13] Consider what would happen if there were a one-unit increase in w, with Meas ‘unaffected. The consequences ofthis change for the present value of y are found by differentiating [1.1.13] with respect to w, and then using [1.1.10] 4 Chaper 1 | Difference Equations te evaluate each derivative: Doe! 3 pies = uc - 64), fu.14 provided that Jl <1 iy ie Syoamic murs [110] oF (14, we were asing what woud happen fm, were to inretse by one un wih met menses en Unatected. 
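The mechanics of these multiplier calculations are easy to verify by direct simulation. The following Python sketch is an illustration added here, not part of the original text; the function name and the 5 percent interest rate are arbitrary choices. It simulates the response of y to a one-unit transitory impulse in w for the first-order equation [1.1.1], reproduces the Goldfeld calculation (0.72)^2 x (0.19) ≈ 0.098, and checks the present-value multiplier 1/(1 - φβ) of [1.1.14].

```python
import numpy as np

def impulse_response(phi, horizon):
    """Simulate y_t = phi*y_{t-1} + w_t from rest after a unit impulse w_0 = 1.

    The simulated path equals the dynamic multipliers dy_{t+j}/dw_t = phi**j.
    """
    y = np.zeros(horizon + 1)
    y_prev = 0.0                        # system starts at rest (y_{-1} = 0)
    for t in range(horizon + 1):
        w_t = 1.0 if t == 0 else 0.0    # one-time (transitory) unit impulse
        y[t] = phi * y_prev + w_t
        y_prev = y[t]
    return y

# Goldfeld money demand example: phi = 0.72 and dw_t/dI_t = 0.19, so the
# effect of a unit increase in income on money demand two quarters ahead
# is phi**2 * 0.19, or about 0.098.
multipliers = impulse_response(phi=0.72, horizon=40)
print(multipliers[2] * 0.19)

# Present-value effect of a transitory impulse, equation [1.1.14]:
# sum_j beta**j * phi**j = 1/(1 - phi*beta) for |phi*beta| < 1.
beta = 1 / 1.05                         # discount factor for an illustrative r = 0.05
print(sum(beta**j * m for j, m in enumerate(multipliers)))   # truncated sum
print(1 / (1 - 0.72 * beta))                                 # closed form
```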
We were thus hnding the ete of pus) trasatory change in Pane! (a) of Finure 1. shows the ine path fw associated with ths question, and panel (6) shows the implied path for. Because the Ecler he respon afy to singe impale in, impubereiponsefncton (a) Value of w ge a a a lh. Tie (0) vaue oy FIGURE 1.2 Paths of input vasiable (w,) and output variable (y,) assumed for dynamic multiplier and present-value calculations. 1.1. First Order Difference Equations 5 ‘Sometimes we might instead be interested inthe consequences of a permanent ‘change in w. A permanent change in w means that WW «+ and 4, Would all increase by one unit. as in Figure 1.3. From formula [1.1.10}. the effect on v,, ‘of permanent change in w beginning in period ris given by Woy 4 Mon 4 Beh yg Be ia Bel 4 eh Bish 4g eh = gs 4 gt sw tol ‘When [dl < 1, the limit of this expression as j goes to infinity is sometimes described as the “longrin” effect af w om y: in [Bets Bee By Meller e esas Pe Lam * aes * Sms rel pass) ua - 4) (@) Value of w | lll (&) Value of y FIGURE. 1.3. Paths of input variable (w_) and output variable (y) astumed for long-run effect calculations. 6 Chapter 1 | Difference Equations For example, the long-run income elasticity of money demand inthe system [1.1.2] is given by 019 on = 0.68. A permanent 1% increase in income will eventually ead to « 0.68% increase in ‘money demand. ‘Auvitcr wlaied question wnncins tie cumulative comequences for y of a ‘one-time change in w. Here we consider a transitory disturbance to w asin panel (@) of Figure 1-2, but wish to calculate the sum of the consequences forall future values of y. Another way to think of this is asthe effect on the preseat value of y [1.1.13] with the discount rate = 1. Setting = 1in [1.1.14] shows this cumulative effect to be equal to § Be 2 a - Bam Wa - 6, (1.1.16) provided that || < 1. Note that the cumulative effect on y of a transitory change 1m w (expression [1.1.16)) is the same a8 the long-run effect on y of @ permanent change in w (expression [1.1.15). 12. pik-Order Difference Equations Let us now generalize the dynamic system [1.1] by allowing the value ofy at date #0 depend on p of its own lags along withthe current value ofthe input variable baron + bade +t Syeay + Me (2a) Equation [1.2.1] i linear pth o difference equation. Itis often convenient to rewrite the pth-order difference equation {1.2.1} in tie saint y, 9 a firscorder Gierence equation in a vector g,. Detine tne (p X 1) vector &, by % Yet ee] yo 122) Jeepes The second element off, isthe value y took on at date 1 ~ 1, and $0 on. Define the (p x p) marx # oy Ob by dpa by 1020 Oe 6 Felo 10 Oia 1.23) 000 iit gh 12. pth-Order Difference Equations 7 For example, for p = 4, F refers to the following 4 x 4 matrix; ob by de re[t oo 0 0100 si Oran Forp ~ 1 (he Srstordr difference equation [1.1.1 Fis juste scalar 9. Fialy, define the (p % 1) vetorv, by 0 p24) Consider the following firet-order vector difference equation: tbe bs i 0 0 efor ol Deed Velo gui ‘mis i a system ot p equations. ‘Ihe first equation in this system is identical to equation [1.2.1]. The second equation is simply the identity cog Yas = Yeats ‘owing tothe fact thatthe second element of & isthe same as the first element of & 1. The third equation in [1.2.5] states that y,-. = ys; the pth equation states that ype = ¥, tor system [1.25] is simply an alternative represen stem {1.2.1}. 
The advouiage uf remsiting sie pi ‘order system [1.2.1] in the form of a firs-order system [1.2.5] is that first-order systems ave oflew easier 19 work with than pth-order systems. ‘A dynamic multiplier for (1.2.5] ean be found ia exactly the same way as was sone for the tst-order scalar system of Section 1.1. If we knew the value of the vector § for date ¢ = ~1 and of v for date r = 0, we could find the value of & for date 0 from boa FE: tHe ‘The value of § for date 1 is 6 Fy +1, = FRE +90) + = FE + Pho + Proceeding recursively in this fashion produces a generalization of 1.1.7} Ba POE FP t Po EP Rt By tHe [1.26] 8 Chapter 1 | Difference Equations Writing thie out in terme of the definitions of & and v, ” yo m0 wy Jens ya o 0 yin [= Foy] + Flo of +. Depet 0 ale nial o 0. ee ee einen ‘Ff denote the (1, 1) element of F, /{j the (1, 2) element of F', and so on, Then the first equation of [1.2.71 states that Liye = “ACP bo 4 +166 + Tito 1.28) This describes the value of y at date «as a linear function ot psnitil values of y (10-202 +++ Yop) and the history ofthe input variable w since time O (wo, w:, 1). Note that "whereas only one initial value for y (the value y_,) was needed inthe case of a firstorder difference equation. initial values fo y (the valnac Yow Jory. Yop) are needed in the case of a pth-order difference equation, "The obvious generalization of [1.1.9] is Bay PM HR Bat Bg + + Pea t te an from which ey = FEM + Mor Ho AL + Film, ee ee ‘Thus, for a ptorder difference equation, the dynamic multiplier is given by Rete rp 2a where f4Q denotes the (1, 1) element of Bl. For j = 1, this is simply the (1, 1) clement of F, or the parameter 4. Thus, for any ptb-order system, the effect on Yins of a one unit increase in w; is glven by the coefficient Lelaling y, 40 yo-aia equation [1.2.1]: Mer at = by. Direct multiplication of [12.3] reveals thatthe (1,1) element of Fis (8% + 4). © Bet = + 6, ina pth-order system. For larger values of, an easy way to obtain a numerical value forthe dynamic multiplier ay... isto simulate the svstem. This is done as follows. Set ¥ . = Yor = *** = yup = 0, Hy = 1, and get the value of w forall other dates to 0. ‘Then use [1.2] to ealelate the value of y, for f = 0 (namely, yy — 1). Next substitute this value along with yr, Yo-2 »» » Joop x back into (1.2.] to calculate Yors, nd continue recursively in this fashiou, The value vf y al step ¢ gives the effect of a one-unit change in Wo 00 J ‘Although numerical simulation may be adequate for many circumstances, it is also useful to have a simple analytical characterization of 2,.,/2v, which, we know from [1.2.11] is given by the (1,1) clement of F. This is faily easy to obtain in terms ofthe eigenvalues ofthe matrix F. Recall that the eigenvalues of a matrix F are those aumbers A for which ou pany For example frp ~ 2 te egemalues are he ston oh ie aia a ERR [O78 S]-w- eres a ‘The two eigenvalues of F for a second-order difference equation are thus given by ig bt VERE 1.2.45] For a general pth-order system, the determinant in (1.2.12] is a pth-order poly- ‘nomial in A whose p solutions characterize the p eigenvalues of F. This polynomial turns out to take a very similar form to [1.2.13]. The following result is proved in. Appendix 1.A at the end of thie chapter. srroposuion 1.1; Ine eigenvalues of the matns ¥ defined in equation 1.2.3] are the values of A that satisfy WP 6M = bP = by -6 1.216) Once we know the eigenvalucs it it straightforward to characterize the dy namie behavior of the system. 
First we consider the case when the eigenvalues of, F ore distinc; for example, we requise tat Ay and Ay in (1.2.18) and (1.2.13) be ifferent numbers 10 Chapter 1 | Difference Equations General Solution of a pth Order Difference Equation with Distinct Eigenvalues cantata Recall that ifthe eigenvalues of a (p p) matrix F are distinct, there exists ‘a wuusingulat (p x p) matsixT such that F tat 207] and zeros elsewhere: iycuvaives ut F along te principal clagonal a 0 Os 0 Om 0 Oo (.218) Plin [1210p very easy, For example, fom [217] we aa mote Po F = TAT x TAT"! ST KAX (EM) XA XT TKAXLXAXT = TAT, ‘The diagonal structure of A implies that A*isals a diagonal matrix whose elements ar 0 0 ° HOARD 0 00 0 a ‘More generally, we can characterize Fin terms of the eigenvalues of F a8 Bi = TATA! x TAT! x +++ x TATA Tes TAX (TOT) XA TT) x ATO, which simplifies to P= TAT 1.219] where AL 0 0 ° wale %e 0 GOO ca *See equation {A 4.24] nth Mathematical Review (Appendix A) atthe endo the book, 1.2, pth-Order Difference Eauations 11 Let fy denote the row i, column j element of T and let denote the row i, column. {Jelement of T-'. Equation [1.2.19] written out explicitly becomes rs ec rr pal eo LO Oe off mo aw aa aio tone ce tae puis e tae tM ta oo A] PB BE a a ie a aa ee BM adh tek Let ata from which the (1,1) element of Fis given by AP = [eal AG + lea AL + + AS FR = GM teak tt oh (1.2.29) whese = (ee) (122) Note thatthe sum of the terms has the following interpretation: st) + leat] ++ + Uae], (1.2.23) st the (p x p) identity atgtintg which i the (1,1) clement of TT Since 7-7! Sunt, (22 2] tpies thatthe c teas sum 0 way tabbed hax) Subeatng (1.2.20) into (1.2.11 gives the form ofthe dynamic mutper fora ponder dference equation: ee ees [1.2.24 Equation [1.2.24] characterizes the dynamic multiplier as a weighted average of each of hep eigenvalues cesed to the th power. “The flowing res provides acid frm expression for the cones eee Propostion 1.2: If the eigenvalues (gs) ofthe matric 1.23) are hc then ie magnate i [1.2] Zan De wen q-Z (1.225) flaw To summarize, the pth-order difference equation (1.2.1) implies that oes Lr + IE ena Ho + 1G (1.2.26) Fees Mensa F PaMrajea tO Bairro 12 Chapter 1 | Difference Equations The dynamic mliplier Fel my 227 is givon by the (1,1) element of F: ao 12. (1.2.28) ‘A closed-form expression for y can be obtained by finding the eigenvalues of F, orn vaca ofA stsping 1-216). Denoning tnesep vals by (edn = Ap} and assuming them to be dite, the dynamic multiplier is given By We eM + edb +b ol 1.229] where (cy, i. -- 5 6) isa set of constants summing to unity given by expression. 1.2.23) For a first-order system (p = 1), this rule would have ut solve [1.2.16], Am by 1.2.30) ‘According to (1.2.29), the dynamic multiplier is given by 8 Ft = eal 1.2.31] From {1.2.23}, ¢, = 1. Substituting this and (1.2.30) into [1.2.31] gives Mer as Pein ‘or the same result found in Sostion 1.1 For higher-order systems, [1.2.29] allows a vatiety of more complicated dy. ans, Suppose Got hat all Uy eigenvalues of F (oF solutions to [1.2.10)) are real, This would be the case, for example, if p = 2 and 4 + 46, > 0 in the solutions (1.2.14) and [1.2.13] forthe second-order system. 
If, furthermore, all of the eigenvalues are less than 1 in absolute value, then the system is stable, dynamics are represented as a weighted average of decaying exponent caving exponential cxillating in order difference equation: Ye = O6Ypa4 + O22 + Me From equations [1.2.14] and [1.2.15], the eigenvalues uf this systcin ace gives by 0.6 + ORFF HOD ay = VOTE TO ogy 0.6 = VORF FRO aie 5 = 024. From [1.2.25], we have 6, = AQ, = A) = 0.778 6 = Avy ~ Ay) = 0.222. ‘The dynamic multiplier for thie aystem, Sets eat + oa, 12. pth-Order Dt erence Equations 13 is plotted as a function of jin panel (a) of Figure 1.4. Note that as / becomes Tanger, the pattern is dominated by the larger eigenvalue (A,), approximating a simple geometric decay at rate 2. Ifthe eigenvalues (the solutions to (1.2.16) are real but atleast one is greater ‘than unity in absolute vale, the system is explosive. If 4, denotes the eigenvalue ‘that i largest in absolute value, the dynamic multiplier is eventually dominated by an exponential function of that eigenvalue: aia fam ‘Other interesting posites aris if some ofthe eigenvalues are complex. Whencver this i the ene, they appear az complex conjugates. For example, if p= 2and $] + 46, <0, ten the solutions Ay and A, in (12.14) and (1.2.15 are Comples eonjugates: Suppose that A and Ay ae complex conjugates, written oe As at bi (1.2.32) a, =a ~ bi 1.2.33) For the p = 2case of [1.2.14] and [12.15], we would have a= 42 (12.34) b= GRE (1.235) (Gur goal ic to characterize the contribution to the dynamic multiplier Ai ‘when Ay is complex number asin [1.2.32]. Recall that to raise a complex number to-a power, we rewrite [1.2.32] in polar coordinate form: A, = Relos() + isin(@n. 11.2361 ns: sin(@) = BIR. [Note that R is equal to the modulus ofthe complex number A, ‘The eigenvalue Ay in [1.2.36] ean be waitten as" ao Rie sad so A = Bile] = Rifcos() + fia) 2.37) ‘Analogously, if isthe complex conjugate ofA, then Ay = Rlcos() ~ isin(®), which can be written’ Taos A = Rife] = Refcos(#) ~ sin( OH). (1.2.38) rte sumer plot ain Fig 4, he Agia ane ppon aay ts aameril slain ofthe ester ‘Sec equation [4.3.25] in the Mathratcal Review (Append A) athe. *See eqntio [A328 14 Chapter 1 | Difference Equations ee 83 i | cae ve an - (@) & = 066-02 ° ij He, Enea wo es FIGURE 1.4 Dynamic multiplies for sevonl-ueder differeuve equation fr ditfer- ent values of , and gy (plot of a, ,/aw, as a function of the lag). ‘Substituting 1.2.37] and (1.2.38) into [1.2.25] gives the contribution of the complex conjugates to the dynamic multiplies 3,5 fe, + ex} Recos(@) + Hey ~ ed Rsin(. ‘The appearance of the imaginary number i in (1.2.39] may seem a litle Awoubling. After al, this calculation was intended to give the effect of a change in the real-valued variable w,on the real-valued variable y,,, as predicted by the real- Vaned system [1.2.1], and it would be odd indeed if the correct answer involved the imaginary number i! Fortunately, it tums out from 1.2.25] that if Ax and A, axe complex conjugates, then c, and ¢, are complex conjugates; that, they can 1.2. 
pth-Order Difference Equations 15 be written as: sat Bi Gna pt for some real numbers « and , Substituting thece exprossions into (1.2.38) yielde (a + A) + (a A) Ricos( gH + ila + Ai) ~ (a ~ BR) (2a Ricos(o)) + F200 -R’sin(@) 2aR cos) ~ 26R'sin(), eM +e hich is strety ral ‘Thus, when some of the eigenvalues are complex, they contribute terms proportional to K’ cot() and K’sin(d)) to the dyaamic multiplier 3, .,dw,, Note that if R = 1—that is, ifthe complex eigenvalues have unit modulus—the mul- tiplers are periodic sine and cosine functions of j. A given increase in w, increases Yes, for some ranges of jand decreases y,,, aver other ranges, with the impulse sever dying ‘o.If the cor axe leis th 1 in modi atthe rate Rif the complex eigenvalues are greater than I in modulus (R > 1), the amplitude of the sinusoids explodes at the rate FY For an example of dynamic behavior characterized by decaying sinusoids, consider the second-order system YH O54 ~ O8).-2 + Me The cigenvalues for this system are given from [1.2.14] and (1.2.15) = 0.25 + 0.861 os - vos? = 408) 2 25 ~ 0.86%, with modulus R- VOBYlF WR = 09. Since R <1, the dynamic multiplier follows a patter of damped oscillation plotted 1m panel (0) of Figure 4.4. ine Tequency* of cnese osciiauons is given oy the parameter 6 in [1.2.39], which was defined implicitly by cos(8) = aR = (0.25)(0.9) = 0.28 on 1D, ‘The cyl asociated with the dynamie lip fenton [1.2.9] ths hve a riod of if Ie _ QB.1459) Cima’) that is, the peaks inthe pattern in panel (b) of Figure 1.¢ appear about five periods apart. = 495 “See econ A.1 ofthe Mathemata! Review (Appedix A) at he end ofthe book fra session ofthe equency nad period of «snd function 16 Chapter 1 | Difference Equations Solution of a Second-Order Difference Equation with Distinct Eigenvalues ‘he second-order ditterence equation (p = 2) comes up sufficiently often that i s useful to summarize the properties of the solution asa general fnetion of 6; and dy, which we now do.” ‘The eigenvalues A, and Az in [1.2 14] and [1.2.15] are complex whenever 88+ 4d <0, ot whenever (6, dy) is below the parabola indicated in Figure 1. For the case of complex eigenvalues, the modulus R satisfies Rea a+b, or, from tata 2.35), = DY ~ ($F + 4) = bu, ‘Thus, a system with complex eigenvalues is explasive whenever 4 <° —1, Alen when the eigenvalues are complex, the frequency of oscillation is given by = cos MaiR) = cos-\[ds(2 V=F)), where “os "(4)" denotes tne inverse ofthe cosine function, or the radian measure WN A LLIS ee, fares corre Tl ANI iN FIGURE 1.$ Summary of dynamics for a second-order difference equation. "is scsion closely flows Surge (1987, p. 18-0), 1.2. puh-Order Difference Equations 17 For the case of real eigenvalues, the arithmetically larger eigenvalue (A,) will be greater than unity whenever B+ VER 2 VETTE >2- 6 ‘Assuming that A; 18 real, the let side of this expression is a positive number and the inequality would be satisfied for any value of éy > 2. If on the other hand, 4, <2, we can square both sides to conclude that A, will exceed unity whenever OF + Ady > 4 4b, + OF fel ‘Ths, inthe rel region, 2, will he geo a. lies northeast of the line y = 1 ~ ¢, in Figure 1.5. Similarly, with real eigenvalues, the arithmetizally smaller eigenvalue (4;) wil be less that! —7 whenever OE + db > 4+ 44, + 62 O21 +o ‘Ths, in the real region, Az will be les then —1 if citer $y < ~2.ur (by 6) les to the northwest of the line 6, = 1+ 4, in Figure 1.5. 
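The damped oscillation just described can be reproduced numerically. The following Python fragment is an added illustration, not part of the original text: for the example y_t = 0.5 y_{t-1} - 0.8 y_{t-2} + w_t it computes the complex conjugate eigenvalues, their modulus R and frequency θ, and the dynamic multipliers c1 λ1^j + c2 λ2^j of [1.2.29], which come out real and trace the pattern in panel (b) of Figure 1.4.

```python
import numpy as np

phi1, phi2 = 0.5, -0.8                  # the damped-oscillation example in the text

# Complex conjugate eigenvalues from [1.2.14] and [1.2.15]
disc = phi1**2 + 4 * phi2               # negative here, so the eigenvalues are complex
lam1 = (phi1 + np.sqrt(complex(disc))) / 2
lam2 = (phi1 - np.sqrt(complex(disc))) / 2

R = abs(lam1)                           # modulus: sqrt(-phi2), about 0.89
theta = np.angle(lam1)                  # frequency of oscillation in radians
print(R, theta, 2 * np.pi / theta)      # period of roughly five periods

# Dynamic multipliers c1*lam1**j + c2*lam2**j of [1.2.29]; c1 and c2 are the
# complex conjugate weights from [1.2.25], so each multiplier is real.
c1 = lam1 / (lam1 - lam2)
c2 = -lam2 / (lam1 - lam2)
multipliers = [(c1 * lam1**j + c2 * lam2**j).real for j in range(13)]
print(np.round(multipliers, 3))         # damped oscillation as in Figure 1.4(b)
```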
‘The system is thus stable whenever (g, ¢) lies within the tnangular region of Figure 15. General Solution of a pth-Order Difference Equation with Repeuted Eigenvalues In the more general case of a difference equation for which F has repeated cigenvalues and s

BE Bs, the result will be exactly the same as if we had applied the lag operator fret and then the multiplication operator: RR Bt 26 Chapter 2 | Lag Operators ‘Thus the lag operator and maliplication operator ate commutative: Lp) = BL, Similarly, if we first add two series and then apply the lag operator tothe result, Gu Me iad F Hine the result isthe same a if we had applied the lg operator before adding (6 WF CBs Wins) Pins + Mra “Ths, the lag operator is distributive over the addition operator: Ls, + w) = Le, + Lw, We thus se that the lag operator follows exactly the same algebraic rues as the multiplication operator For this reaton, iis tempting t we the expression “multiply y, by L” rather than “operate on (yJe.-u By L." Although the latter xpreon i tachnialy mors core this tex wil ten we the former shorthand ‘expression to feitate the exporon, Fave wi te even dened in cus wound ypsatrs, we ae fee to use the standard commutative, sasociative, and dstibuive algebraic avs for Imulpiaton and sdation to expres tn compound operitor in an atematve form. For example, the proces defined by y= (a+ BLL, is exactly the same as Ye (QL + DLA, ax + ta To uke anoxher exampie, (= AE = Ay = A = AL + LD (TF AIL + Aa, 245) Ht Adan + (ua, ‘An expression such as (al. + bl’) is referred to asa polynomial inthe lag ‘opersor. It's algebraically similar toa simple polynomial (az + bz*) where z is 2 scalar. The ditference is that the simple polynomial (ez + 6:2) refers to a fan operator that would be applied to one time series {x)7.-» to produce a new is just a series of constants, * for allt ‘then the lag operator applied to x, produces the same series of constants b= my ee. (aL + BL? + Ly (+ Bene 2.1.6 2.2. First-Order Difference Equations Let us now return tothe first-order difference equation analyzed in Section 1.1: Ye Os + Me p24) 2.2, First-Order Difference Equations 27 Equation [2.2.1] can be rewritten using the lag operator [2.1.3] 28 w= OLY + ‘This equation, in turn, cam be rearranged using standard algebra, a by = mn (1 6b), = p22} [Next consider “multiplying” both sides of [2.2.2] by the following operator: (1+ OL + gL + Lt FOLD. 223) ‘The result would be Wt OL + OL? + OE + + HNC ~ aL), HLF OL + FLEE + + OLDW, Expanding out the compound operator onthe lft side of [2.2.4] results in (14 OL + LEE OL +--+ PLN 61) Abs BS BEL gTD ~ (4 L FOL EOL +--+ GLIOL (4 0b + OLE + gL e+ gL) ~@LESL EOL + FOL 4 gL) ae), Substioting [2.2.5] into [2.2.4] yields CO Lt Ont OL FO He oL IM, E20 Weiting 2.2.6] out explicitly wing (21.4) produces = OH ge p24 225) We Om ug AM gee ba HPV tHe + OMe + PM + Mag too + dy (22.7) ‘Notice dhat equation (2.2.7] is identical to equation [1.1.7]. Applying the ‘operator (2.2.3) is performing exactly the same set of recursive substitutions that ‘were employed in the previous chapter to arrive at [1.1.7] It's interesting to reflect on the mature ofthe operator [2 23] as 1 hecomes large. We saw in [2.2.5] that (+ OL + GLE + LP Ho + GLIA - OL = oY That is, (Lt 6h + GL? + BL + +++ + PLIL — Ly dtters trom y, by the term *ty_,. If |g] <1 and if y_, isa finite number, this residual fly. ‘will become negligible as ¢ becomes large CELE PES OE H+ GEN Oty, forsiange AA sequence {y)}*._. i etd to be Bounded if there exits a finite number F such that bd 1 will he diconened in Section 2.5, Consider next a second-order difference equation: Je= Ora + tadoa + We 23. 
23, Second-Order Difference Equations 29 Rewriting this in lag operator form produces (= bb ~ daL%), (232) ‘The left side of [2.3.2] contains a second-order polynomial in the lag operator LL. Suppose we factor this polynomial, that is. find numbers A, and A, such that (1 dub @aL3)= (1 AL) ~ Ag) = (I= Bayt AIL + AAL?). (2.3.3) ‘This is just the operation in [2.1.5] in reverse. Given values for g, and gz, we seek. numbers Ay and Ay wilh the properties that Ata os and Ads =~ For example, ify = 06 and dy = ~0.08, then we should choose A, = 0.4 and ae i = BSEKA ~ .2L). 4 It's easy enough to sce that these values of Ay and 4 Work fos thio nuuetival example, but how are A, and 4; found in general? The task isto choose Ay and Ay so as to make sure that the operator on the ght side of [2.3.8] Mdentical to that fn the left side. This will be true whenever the following represent the identical functions of 2: = 62 - a2) (aay = Az). the point of doing s0? With [23 5}, we can now ask, For what values oft the right side ot (2.3.] equal to 2er0? The answer i, if either 2 = Ay? or 2 = Az, then the right side of [23.5] would be zero. It would not have made sense to ask an analogous question of [2.3.3] —L denotes a particular operator, not a number, and L = A;*is not a sensible statement, ‘Why should we care thatthe right side of [2.35] is zero if = Ay orf = AZ!7 Recall that the gosl wae to choote A, aad A, 20 that the two sides of [2.3.5] represented the identical polynomial in z. This means that for any particular value 4 Gie ty fauniivus anune produce de stime nunber, i we find a vaiue of z mat Sets the right side to zero, that same value of z must set the left side to zero at well, But the values of z that set the left side to zero, 92-02) 20, 22.0) ae given by the quadratic formule Ao VETER, ee pan b+ VEER a pan Setting < = £y oF 22 uiakes the left side of [2.3.5] zeru, while z= Ay * or A" sets the right side of [2.3.5] to zero, Thus apen 39} Ase 23.19) 30 Chapter2 | Leg Operators Returning to the numerical example [2.3.4] in which 4 = 0,6 and 6 wwe would calculate 08, 00 = VORF= HOB - 200.08) 0.6 + V(0.6F — 10.08) irra cca = 50, “ 2.5) = 04 (5.0) = 0.2, 5 was found in (2.3.4) ‘When $3 + 4d, < 0, the values 2, and 2, are complex conjugates, and their reciprocals A; and A; can be found by first writing the complex number in polar coordinate form. Specifically, write neat bl Ke oos(0) + en(9)] = Re. Then Rover = RV |cos(8) ~ tsin(6)). ‘Actually, there is a more direct method for calculating the values of Ay and ‘A, rom @ and g, Divide both sides of (2.3.5) by 2 and define Ato be the variable 2~*: dent (23.12} Substituting [2.3.12] ito [2.3.11] produces (8 = GA ~ $3) = = ANE ~ Ad. 23.13] Again, [2.3.13] must hold forall values of Ain order for the two ti to represent the same polynomial. The values of A that set the are A= Ayand A = Ay. These same values wus se ie ffs of (2 aswell: of [2.3.5] er erg iad, (23.14) ‘Thus, to calculate the values of A and A, tha factor the polynomial in [2.3.3], we can find the roots of (2.3.14) directly from the quadcatic formule: b+ VERE he paas i (23.16) For the example of 2.3.4], we would thus calculate 4 x SL OOH |, 06 = VORP ATH a = 02 23, Second-Order Difference Equations 31 It is instructive to compare these results with those in Chapter 1, There the ‘dynamics of the second-order difference equation [2.5.1] were summarized by calculating the eigenvalues of the matrix F given by Fe {ft $}: ea.) 
The eigenvalues of F were seen to be the two values of A that satisfy equation 2.8 08 = ad ~ #) = 0. ‘But tis isthe same calculation asin [23.14]. This finding is summarized in the following proposition Proposition 2.1: Factoring the polynomial (L~ dul ~ gab) ws (1 ~ 6b ~ 614 = 0 ~a,D0 ~ a1) a3] 4 the same calcadation as finding the eigenvlues ofthe maiix F in [23.17]. The eigenvaiies nad A of are te same as the parameters hy aid by is |23.18, and are given by equations [2.3.15] and [2.3.16 The correspondence herween calculating the eigenvalues of a matix and factoring a polynomial inthe lag operator is very instructive. However, it introduces ‘one minor source of pomsible semantic confusion about which we have tobe careful Recall from Chapter 1 that the system [2.31] i stable if both hy and Ay are less then 1 in modulus and explosive if either A, or Ay is greater than 1 in modulus. Sometimes this is described as the requirement thatthe roots of (8 @A- 6) =0 3.19] lie mside the unit circle. The possible confusion is that i is often convenient to work directly withthe polynomial in the form in which it sppeare in (7 37) (1 = bir ~ 6:24) = 0, 23.29) ‘whose roots, we ave seen, ae the reciprocals of those of [2.3.19]. Thus, we could say with equal acruracy that “the difference equation (2.21] s stable whenever inside the unit circle” or that “the diference equation ts 3.29} ic ume tie unit cic. The two Statements mean exactly the same thing. Some scholars refer simply tothe “roots of the diference equation (2.4.1]," though ths raises the possibility of confusion between {2.3.19] and [23.20]. This book will ollow the convention of usine the term “eigenvalues” to refer to the roots of [2.3.19]. Wherever the term “roots” i used, we will indicate explicitly the equation whose roots are heing described. From here on inthis section, its assumed thatthe second-order difference ‘austin is stable, with the eigenvalues 4, and A, distinct and both inside the unit citle. Where this isthe cas, the inverses (1 ADy = 1+ AIL + ARLE sa + (Dy = AL ARE? sa + ae well defined for hounsded sequences Write [2 42] in factored form: (1 ADV - Ady, = fand operate on both sides by (I= AL)-*(1 ~ AyL)=* Y= 0 ~ ALO = aL), 23.21] 32 Chapter 2 | Lag Operators Following Sargent (1987. p. 180). when dy # As, we can ae the foloing pera afm a . ee eee ME gee (2.3.29) Notice that his simply another way of writing the operator in 2.3.21} a ae pa = AL) = AG = 0} (= AD (= AD) ‘Thus, (2.321) can be written as y= ay, [gh Mea ae U1 ak #08 6 a hw u-& Jem fen cde Leaks + hdmi + [ol + ella + LeAE + cabling tor [23.23] where 6. Auth, = 4) 23.24) la, = 22) [pas From [2.323] the dynamic multiplier canbe read off dredy a, Fel m cal + ol, ihe sae seauitattives tin equations (4.2.25) anc 7a pih-Order Difference Equations ‘These techniques generalize in a straightforward way to a pth-order difference equation of the form Dem bv tated HF preg # me 4a] Write [94 1a terme af lag apertnne (1 - @L - @L? - »L?ly, = Wy [2.4.2] Factor the operator on the left side of [2.4.2] as (= b= byl? + 8pLP) = = ALICE Ab) AgL). (24.3) “This the same as finding the values of (Ay, Ay,» Ap sh thatthe following polynomials are the same fra : (1 = Oar = dat = 5 = dy2t) = = Aa) = ks) = a2) 24. oth-Order Difference Eauations 33 ‘As inthe second-order system, we multiply both sides of this equation by 2-* and. ‘dete A# 2°: (raat gat tek 4) O-Ma= ay a-ay. Ped Cary, sein A = A fori = 1,2, or p eases the right sid of 2.44 0 equal ao. 
Th th vlc (hy Ay) mast temas tha 5th el Sie of expression [lu 0 zero s wal BOM MAA bk ge RAS) ‘This expression again is identical to that given in Proposition 1.1, terized the eigenvalues (Ay... .,)of the matrix F defined in eq ‘Thus, Proposition 2.1 readily generalizes. Proposition 2.2: Factoring a pth-order polynomial in the lag operator. (= yb = dL? ~ +++ = byt) =U = ALY = AL) = AL), 4s the same calculation as finding the eigenvalues of the matrix F defined in (1.2.3). The eigenvalues (Ass Bay» «5 Ay) of F are the sume us the parameters (Ay hays» 2,) in [2.43] and are given by the solutions to equation [2.4.3] ‘The difference equation [2.4.1]is stable ifthe eigenvalues (the roots of[2.4.5}) lie inside the unit circle, or equivalently if the roots of 1m de = Get zr 0 R69] ‘Assuming that the eigenvalues are inside the unit cscle and that we are sesticting ouselves to considering bounded sequences, the inverses (I= AyL) (1 AiL)-*,..., (1 = A,L)°* all exist, permitting the difference equation (2 DG = Ab) = Agbdys = my tobe writen as rata ee sscned with he epeator on the Wight ie of [2.47 ea again be oe ‘panded with partial fractions 1 Oat) ual acu iaiagin nag ia Tm * Tm a 55 Following Sorgen (1027, pp 109-02), the vale of, 2, that mabe [2.4.8] ‘ue can be found by multiplying both sides by (1 — Ayz\(L ~ Aye) == (1 = Aya) Tm ox(1 ~ Age)(t ~ As) + (1 = Aya) ell = ABIL = Ase) = Aya) + ay) $6 (k= AM = Ae) = Apa) Equation [2.4.9] hes to hold forall vlues of z. Since i is a (p ~ I)th-order polynomial, if (cy, Ca, «» »» ¢g) are chosen so that [2.4.9} holds for p particular 34 Chapter 2 | Lag Operators Alistnet values of 2, hen [2 49] must hold forall 7 Toe atz = Az" requires that (1 = AREAL = AAP (LAAT) me that [2.40] holds a SS = A (24.10) For [2.49] to hold for z = AF, As... Az requires fui ag i OE AN = AT Oe =) aus ast =. 24.23) GANA.) = od [Note again that these are identical to expression [1.2.25] in Chapter 1, Recall from ‘the discussion there that cy + ¢ + «+ + 6 = 1 To conclude, [2.4.7] can be written nn ee "OAD" T=” iD" meh ALE ALE ALLE wets ht Mal + AIL? + AIL? + Fe GOAL EARL? +L? Ho Ow, Je le eto + eli + [eh + he to + eA Mins FlGAE + GAR +b Gn 24.13) SHGAT tea tot Gang te where (ey ci +169) are given by equitions [24.10] through [24.12]. Again, the dynamic muller canbe read directly off 2.4.13} Bed a foal + Gah too + GA 241g sprog the sl fom Chapt “Tere is very convenient way to calculate the effect of w onthe present value of wing the lag operator representation, Write [2613] a5, = Gas + Yatra tama + atin to (24.13) [eM amet oa 4.16} [Next rewite [2.4.15] in lag operator notation as y= WL) ‘where Y(L) denotes an infinit-order polynomial inthe lag operator: WL) = Uy + ale + oat + GL? + 24.17) 24. oth-Order Difference Eauations 35 Notice that ¥ is the dynamic multiplier [2.4.14]. The effect of w, on the present value of y is given by rom tar ew a nee (2) = Uo + the + Hz? + ert, it appear that he uli (2.4.18 simpy his pomomilevauated atx = 23, Pm aaa [24.18] HB) He RE EE A [2.49] ‘But comparing [24.17] with [2.4.7], itis apparent that WL) = [1 = AL) = Aa) = ALDI, 1S from [2.4.3] this means thet WL) = [= dL = dl? = = te We conclude that O2) = [= diz = dae = 82h foray vale of, n pats, Gee ae amt ee [2.4.20] Sbuitting (2420) int 2.4.19} evens Ua 2D BM. 1 Sapa ee AI reproducing tae ciaim in Proposition 1.3. Again, the iong-run muitupuer ootains a the special case of [2.4.21] with B = 1: tim [Meg Bon, Moar pe Law, * Owes Bre a-ha e 2.5. 
Initial Conditions and Unbounded Sequences Section 1.2 analyzed the following problem. Given a pth-order difference equation Nem Der ayer tT pnp t Me 5.1] Piitial values of y, ee a (252) and a sequence of values for the input variable w, {Wop Wiss es Ms (253) 36 Chapter 2 | Lax Operators we sought to calculate the sequence of values for the output variable ¥ AYor Yao s+ ‘Certsinly there are systems where the question is posed in precisely this form. We ‘may know the equation of motion forthe system [25.1] and its eurent state [25.2] ‘and wish to characterize the values that (Yo )1,.. - /} might take on for different specifications of two, mi,» We However, there are many examples in economics and finance in which a theory species Just the equation ot moton {2.9.1} and a sequence of diving variables [2.5.3] Clearly, these two piece of information alone are insufficient to . ‘Ths, (PJP _ ie tohe a hannded convene, then we cen tal a5 T © t0 conclude aS [pa] 2 a) hich i referred to asthe “market fundamentals” solution of [2.5.5] for the general ‘case of time-varying dividends. Notice that [2.5.14] produces (2.5.8) asa special ‘ase when D, = D for all Describing the value of a variable at time ¢ as a function of futur realizations foresight model of stock prices. However, an analogous set of operations turas out to be appropriate in a systeut sinilar w [2.5.4] in which expected returns are constant." In such systems [2.5.14] generalizes to -3 IF 1 "22. *See Sarge (987) and Wniemen (1983) fr quan invliag expectations reduction to he menpuaton of eterence 2.5. Initial Conditions and Unbounded Sequences 39 ‘where E, denotes an expectation of an unknown future quantity based on infor- mation available to investors at date Expression [2.5.14] determines the particular value for the inital price P> that is consistent with the boundedness condition (2.5.9). Setting ¢ = Oin 2.5.14] and substituting into (2 5.6] produces raves todo [ Se ee eco = OD, = 9D, = = DD - Dy ee eee ny sein die ini eoatvins Fy ty vi itholds for alle. Choosing P, equal to any other value would cause the consequences ‘of each period's dividends to accumulate over time 50 as to lead to a violation of [25.9] eventually. 1 is useful to discuss these same calculations from the perspective of lag operators. In Section 2.2 the recursive substitution backward that led fram [2.5 8] to [2.5.6] was represented by writing [25.5] in terms of lag operators as (1-0 + NLP .s = ~D, 25.13) and multiplying both sides of [2.5.15] by the following operator: (4 Q+Qbe ene ene, Rsi6, If(1 + 7) were less than unity, it would be natural to consider the limit of (2.5.16) U-G+nua1+0snnsaraes this operator is not defined. In this case, a lag operator representation can be sought forthe recursive substitution forwatd tha led from [2.5.3] to (2.5.13). This is accomplished using the inverse ofthe lax operstor, Lo = Wass which extends result [2.14] to negative values of k. Note that Lis indeed the inverse of the operator L: L-Y(Lw) = LW In general, LH = U4, With L? defined asthe identity operator 1, wy 40° Chapter 2 | Lag Operators Now consider mltpving [25.15] by UFC E NEE EI ERE dey OL- ey x [d+ ney BS ‘oobain BRM ea aE nn x= 0+ LP, Sse preted yet £ ET ME I OND oS 1 1], HOO = li ‘ pe + IF 1 ‘J Duar a so + [Aa] ee FLT one hich identical [25.13] within 2513] ceplaced with ¢ +1 ‘When r > 0 and {P,}7. _. is a bounded sequence, the left side of the preceding equation wil approach P.,; aT becomes lage. Tm, when > and (Ps Std YD. 
ate bounded toquences, the lint ofthe operator in [2317] exsts 2d could bo vowed atthe lnteree of the opersor on theif side of 2.513) M14 ALP = -0 + yb KLE GFN LE HNL S Applying his limiting operator to [2.5.15] amounts to solving te difference equa- tion forward as in (2.5.1d] and selecting the marker fundamentals salution among the set of possible tine paths for (Pr, ~~ given «particular time path for dividends [Diifa = “Ths, given « first-order difference equation ofthe form (= 61)y,= We (2.18) Sargent’s (1987) advice was to solve the equation “backward” when [6 <1 by ‘multilving by Qi - ¢L)" aUs ere eset] BS19) and to solve the equation “forward” when | > 1 by multiplying by 614-1 en (25.20) HOLL GL AL GLI Defining the inverse of [1 ~ 6] in this way amounts to selecting an operator OE] wii re propenion thas (1 - gl} © at] - 1 (the idenity operator) and that, when it is applied to a bounded sequence fw) [1 ~ ol}, the result is another bounded sequence. ‘The conclusion from this discussion is that in applying an operator suc as [1 ~ $1)", we are implicitly imposing a boundedness assumption that rules out 25. Initial Conditions and Unbounded Sequences 41 phenomena such as the speculative bubbles of equation (2.5.7] a priori. Where that is our intention, so much the better though we should not apply the rules [2.5.19] of 2.5.20] without some reflection on their economic content. Chapter 2 References Sargent, Thomas J. 1987. Macroeconomic Theor), 24 ed, Boston: Academic Pres. ‘Whiteman, Charles H, 1983. Linear Raional Expectations Models: A User's Gulde. Min- neapolis: University of Minnesota Press. 42 Chapter 2 | Lag Operators Stationary ARMA Pracesses “This chapter introduces univariate ARMA processes, which provide avery useful class ofmodelsfordeserbing te dynamics ot an individual ue series, Ibe chapter bens with definitions of some of the key concepts used in time series analsis Sectons3.2 through 3.5 then investigate the properties of various ARMA procestes. Section 3.6 introduces the autocovariance-generating function, which eal for analyzing the consequences of combining diferent time series and for an under- ofthe population spectrum. The chapter concludes witha discussion of invertlity (Section 3.7), which can be important for selecting the ARMA rep- reseatation of an observed tne seties thats appropslate given teases be made cof the model 31. Expectations, Stationarity, and Ergodicity Expectations and Stochastic Processes ‘Suppose we have observed a sample of size T of some random variable ¥; Dis Yas + + Yeh Bata] For example, consider a collection of T independent and identically distributed fetes eh B12 with ~ NO, 2°). This is referred to as a sample of size T from a Gaussian white noite proves ‘The observed sample [3.1.1] represents T particular numbers, but thie sot of T numbers is only one possible outcome of the underlying stochastic process that yeucrated ti data, indeed, even if we were wo imagine naving observed the process for an infinite period of time, arriving atthe sequence Ulta es Yeas Yoo Yaa Yas = 6 Ire Prete ren the infinite sequence {y}%. - would stil be viewed as a single realization from a fime teries process For example, we might set one computer to work generating an infinite sequence of i.i.d. M(0, 0%) variates, (ef). ., and a second computer generating a separate sequence, {ein = We would then view thee ws (wo independent realizations of a Gaussian white noise process. 43 Imagine a battery of J such computers generating sequences {yf}. {Ph + {y}7. 
my and consider selecting the observation associated with date # from each sequence 19. Py) ‘This would be deteribed as « eample of J realizstions of the random variable ¥, This random variable has some density, denoted fy(y,), whichis called the un- tonulitnal density uf Y. For exaupie, for die Gaussian waite novse process, this Gensity is given by ‘The expectation of the th observation of a time sevies refers to the mean of, ‘this probability ietribntion, provided it exit: E(Y) = fafa) dy G13] ‘We might view this as the probability limit of the ensemble average: CY) ~ pli (un 3 ve pag = Tepresents the sum of a constant y plus a Gaussian white For example, if) ea pas} then ite mean ie BU) = e+ Ele) B14 then its mean ie EY) = Be Bats} Sometimes for emphasis the expectation E(Y,) Is called the unconditional ‘mean of ¥., The unconditional mean is denoted 4, EY) = by Note that this notation allows the general possibility thatthe mean can be a function of the date of the observation t. For the process [3.1.7] involving the time trend, the mean [3.1.8} sa function af time, whereat forthe constant phis Gaussian white noise, the mean [3.1.6] is not a function of time. "The variance ofthe random variable ¥, (denoted 7) is similerly defined as ae E0= w= [7 0, - wa flo) a B13) 44° Chapter 3 | Stationary ARMA Processes For example, for the process [3.1.7], the variance is Yu = E(Y, ~ Bi = Ee) = 0, Autocovariance Given a particular realization such as {y}}%._» on a time series process, ‘consider constructing a vector x2 associated with date . This vector consists of the [j + 1} most recent observations on y as of date ¢ for that realization: a 2, a, We think of each realization {y7~. a8 generating one particular value of the vector x. and want 10 calculate the probability dstriburion ofthis vector w? across realizations i, This distribution is called the joint distribution of (Y,, Yj. ~- +s Y,_.). From this distibution we can calculate the th autocovariance of ¥, (denoted » me SL Lo mos ~ mea % Frame Ia Norse) 4, Bran ny (BAO) EG = Hing — ted [Note that [5:1.10} has the form of «covariance between two variables X and ¥: Cov 1) = LA ped py, Thus [2.1.10] coud he described th covariance of Y, with its wn lagged v tence, the tem “autoovarane,” Notice further trom [310] tat te Olt at touovsiance efor te valance of Ys sotispated bythe notation fn.) “The atocovaiance yy canbe viewed athe (1.] + I}eleweat ofthe vaiace. covasinos mati ofthe esto n. For this rea, the nutacvetianecs ae de> seribed asthe second moments of the proces for Y “Agumnit may be belpul fo think ofthe th autocovarane asthe probability limit ofan ensemble averare n> Blin any (HP ~ ad (88, ~ aod Baa] As an example of calculating autocovariances, note that for the process in [3.1.5] the autocovariances are all zero for j + 0: (¥, — w)(Wnj H) = Eee 0 forj #0, % Stationarity Tf neither the mean j, nor the autocovariances 1, depend on the date ¢, then the process for ¥, is said to be covariance-staionary Or weakly stationary: EY) = for allt EY, = wing ~ w) = 9 for all cand any j BLL. Expectations, Stationarity, and Ergodiciy 4S For example, the proces in (3.1.5] is covariance-stationary: EY) = ot tory 0 EO, = why = = Oya (PTE By contrast, the process of [3.1.7 isnot covarance-stationary, because its mean, Bi 16a fanchon of time. Notive that if a process is covariance stationary the covariance between ¥. and ¥,., depends only on j, the length of time separating the observations, and not om i the date of the observation. 
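This contrast between [3.1.5] and [3.1.7] can be seen directly in a simulation of the ensemble. The following sketch (Python, with the illustrative values μ = 2, β = 0.1, and σ = 1, which are not taken from the text) generates I realizations of each process and compares the ensemble average at two different dates with the population means [3.1.6] and [3.1.8].

```python
import numpy as np

rng = np.random.default_rng(0)

I, T = 50_000, 50                    # number of realizations, length of each realization
mu, beta, sigma = 2.0, 0.1, 1.0      # illustrative parameter values, not from the text

eps = sigma * rng.standard_normal((I, T))   # i.i.d. N(0, sigma^2) draws

y_const = mu + eps                           # process [3.1.5]: Y_t = mu + eps_t
t = np.arange(1, T + 1)
y_trend = beta * t + eps                     # process [3.1.7]: Y_t = beta*t + eps_t

for date in (9, 39):                         # ensemble averages at dates t = 10 and t = 40
    print("t =", date + 1,
          " constant-mean process:", round(y_const[:, date].mean(), 3),   # close to mu at every date
          " trend process:", round(y_trend[:, date].mean(), 3))           # close to beta*t, changes with t

# ensemble estimate of the first autocovariance of [3.1.5] at date t = 30
date, j = 29, 1
gamma_1 = np.mean((y_const[:, date] - mu) * (y_const[:, date - j] - mu))
print("gamma_1 for the constant-mean process:", round(gamma_1, 3))        # close to zero
```

The ensemble average of the constant-plus-noise process is roughly the same at both dates, while that of the trend process shifts with t, which is what makes the latter fail the covariance-stationarity requirement.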
1 follows that for 4 covariancestatonary process, y, and 7, would represent the same magnitude. To see thi, recall the definition y= EO = WY, ~ W). B17 Ifthe process is covariance-stationary, then this magnitude is the same for any value of we might have chosen; for example, we can replace f with ¢ + j Ea = mi Egy MY Ee) HE) = BO Ny — wD [Rut eferring agsinto the definition [3.1 12] thislast expression is just the definition of y., Thus, for any covariance-stationary process, ay ry forall integers 6.1.13] A citterent concept is that of sir stationary. process isa to be strictly stationary if, for any values of js» » othe joint distribution of (¥, Ye Yeoyy +s Yo.) depends only oa the intervals Separating the dates (jj... 1.'and not on the date itself (0. Notice that if a process strictly stationary with finite second moments, then it must be covariance-stationary-if the densities over which we ace integrating in [31.3] and [3.1.10] do not depend on time, then the moments, and 7 will ot depend on time. However, its posible to image & proces that is covartaxe-stationary Du wa sity sathnaty the wea aid ir {ocovarianes could not be functions of time, but peshaps highec moments sich a5 Er?) ae. In this text the term “stationary” by itself is taken to mean “covariance- stationary.” ‘A process {Y.} is said to be Gautsian if the joint dent Frstoncatoalto Vestn + Yreid {is Gaussian for any jy jay «-»» jy Since the mean and variance are all that are needed to parameterize a multvarate Gaussian dstibution completely, a covariance stationary Gaussian proces i strictly stationary. Ergodicity ‘We have viewed expectations of atime series in terms of ensemble averages such a8 [31.4] and [3.111] ‘These definitions may seem a bit contrived, since usually all one has available isa single realization of size T from the proces, which we earlier denoted (y10, »{, 72}. Prom these observations we would wal- culate the sample mean F. This, of course, i not an ensemble average but rather 2 time average! FoQTy dy paag 46 Chapter 3 | Stationary ARMA Processes ‘Whether time averages such s(3.1.14Jeventually converge tothe ensemble concept 2E(1) tor a tanonary process has 10 do wath ergodici. A covaniance-statonary process is said to be ergodic for the mean if [3.1.14] converges in probability 10 E(Y) as T—> =. A proces wil be ergodic for the mean provided thatthe auto- covariance x, gos to 210 sutfiienly quickly as j becomes large. In Chapter 7 we willse that ifthe autocovaiances fora covariance-stationary process satisty Shce, B15] then {Y) is ergodic for the mean, Similarly, a covariance-stationary process is said to be ergodic for second moments if twee = 91 30% wn, {or all. Sufficient conditions tor second-moment ergodicity wil be presented in Chapter 7. Inthe special case where {Ys a stationary Gaussian process, condition [3.1.15] is suficient to ensure ergodicity forall moments. For many applications, stationarity and ergodicity tum out to amount to the same requirements, For purposes of clarifying the concepts of stationarity and ergodicity, however, it may be helpful to consider an example of a process that it stationary but not ergodic. Suppose the mean for the ith realization {yPifs = is generated from a N(Q, »”) distribution, Say i= uO +e, B.L16) Here {e) is a Gaussian white noise process with mean zero and variance o? that ss imgependent ot 0”. Notice that ba = E(u) + Ele) ~ 0 Abo, ya = EQ + oP = to? sad ye EW! + Mul +e) =a forj #0. ‘Thus the proces of [3.1.16 is covariancetatonary. 
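The moments just computed for [3.1.16] can be confirmed by simulating the ensemble itself. The following sketch (Python, with the illustrative values λ = 2 and σ = 1, not taken from the text) draws a separate μ^(i) for each realization and checks that the ensemble mean, variance, and jth autocovariance are approximately 0, σ^2 + λ^2, and λ^2, respectively.

```python
import numpy as np

rng = np.random.default_rng(1)

I, T = 100_000, 30
lam, sigma = 2.0, 1.0                         # illustrative values of lambda and sigma

mu_i = lam * rng.standard_normal((I, 1))      # one mu^(i) ~ N(0, lambda^2) per realization
eps = sigma * rng.standard_normal((I, T))
y = mu_i + eps                                # process [3.1.16]: Y_t^(i) = mu^(i) + eps_t

date, j = 20, 3
print("ensemble mean     :", round(y[:, date].mean(), 3))                     # close to 0
print("ensemble variance :", round(y[:, date].var(), 3))                      # close to sigma^2 + lambda^2 = 5
print("ensemble gamma_j  :", round(np.mean(y[:, date] * y[:, date - j]), 3))  # close to lambda^2 = 4 for j != 0
```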
It doesnot sats the suficent conatin 119] 1 ergoatey tor te mean, nowever, and maeed, etme average wn 3 vp = a Fw +g = a +O Ee converges to rather than to zero, the mean of ¥,. 32. White Noise “The hacic hiding lark for al the processes considered in this chapter isa sequence {JZ whose elements have mean 2ero and variance o*, E(e) = 0 Baa Eee) = 0%, 22) and for which the «'s are uncorrelated actos ime: "Ohan “ergoiy” ud ia more gener snes Hencan (1970, p, 201-2, 4 Andarion tnd Moore (197, p39) 3.2, Whive Noise 47 E(ee) =0 fore és. 23] [A process satistying 3.2.1] through [3.2.3] is described as a white noise process. ‘We shall on occasion wish o replace [3.2.3] with the slightly stronger condition that the &'s are independent across time: rv te independent for # 32.4] Notice that [3.2.4] implies [3.2.3] but [3.2.3] does not imply [3.2.4]. A process satistving [3.2.1] through (3.2.41 is called an indevendent white noise process. Finally if [3.2.1] through (3.2.4) hold along with a~MO,e), Baa} thea we have the Gaussian white noe process 33. Moving Average Processes The First-Order Moving Average Process Let e) be white noise as in [3.2.1] through [3.2.3], and consider the process Ya wt et Oh, B31] where 4 and 6 could be any constants. This time series i called a first-order moving ‘average process, denoted MA(1). The term “moving average” comes from the fact that Y, is constructed from a weighted sum, akin to an average, of the two most recent values of « ‘The expectation of ¥, is given by BY) = B+ 6+ bea) = a + Ble) + OE) = 4 332] We used the symbol forthe constant term in [3.3.1] in ancepation of the resuit, that this constant term turns out tobe the mean of th process. “The variance of ¥, is FO, — wy = Eat Or? BUG + 28et,1 + Fe) B33) #404 Pot (1+ ee ‘The first autocovatiance is BUY, = w(K Ele, + B,-Meins + 86-2) Elen + ORs + Oeeia + Pert) 33.4] = 0+ oo +040, Higher qutocovariances are all 2er0: POY, = a, ;- forj>1 (335) Fe, + fe, Mes je; Since the mean and autocovariances are not functions of time, an MA(1) process is covariance-stationary regardless of the value of 8, Furthermore, [3.1.15]is early satisfied: Sina a + eer + [or The, iff) all moments. asian white noice, then the MA(I) process [3.3.1] is ergadie for 48 Chapter 3 | Stationary ARMA Proceses ‘The jth autocorrelation ofa covariance-stationary process (denoted p) is de tuned as ts th autocovariance divided by the variance: 8 1 r0 bag ‘Again the terminology arises from the fact that pi the coseelation hetween ¥, and ¥,-, Cont, Yo) = eam mats = ie = on Since p, ica coreelation, [p< 1 forall), hy the Cauchy-Schwar7 inequality, Natice also that the Oth autocorrelation pis equal to unity for any covariance-stationary process by definition From (3.3.3) and (3.3.4), the frst autocorrelation for an MA() process is given by, bo? 6 MTs ee Os Fy ba Higher autocorcelations are al zero, "Thc aulucoiselation p, can be plotted as a fanstivn of jas ia Figuse 3.1 Pawel (2) shows the autocorrelation function for white noise, while panel (b) gives the autocorrelation function for the MA(L) process: ¥, ey 0.88. 
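The autocorrelation function in panel (b) is easy to reproduce numerically. The following sketch (Python; the sample size is an arbitrary choice) simulates the MA(1) process Y_t = ε_t + 0.8ε_{t−1} and compares sample autocorrelations with the population values ρ_1 = θ/(1 + θ^2) and ρ_j = 0 for j > 1.

```python
import numpy as np

rng = np.random.default_rng(2)

theta, T = 0.8, 200_000            # theta as in panel (b); T is an arbitrary sample size
eps = rng.standard_normal(T + 1)
y = eps[1:] + theta * eps[:-1]     # Y_t = eps_t + theta * eps_{t-1}, with mu = 0

def sample_autocorr(x, j):
    """Sample counterpart of rho_j = gamma_j / gamma_0."""
    x = x - x.mean()
    return np.dot(x[j:], x[:len(x) - j]) / np.dot(x, x)

print("population rho_1: %.4f" % (theta / (1 + theta**2)))   # 0.4878
for j in (1, 2, 3):
    print("sample rho_%d    : %.4f" % (j, sample_autocorr(y, j)))
```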
For different specifications of @ we would obtain different values for the frst suceoraaion pi [3.3.7 Pstve ae of # ince poke eoaraltion by a larger-than-average value for ¥,,, jus a8 a smaller-than-average Y, may well be followed by a smaller-than-averaye ¥,, By contrast, negative values of # imply negative autocorrelation—a large ¥, might be expected to be followed by a small value for ¥.ys ‘The values for p, implied by different specifications of @ are plotted in Figure 3.2. Notice thatthe largest posible value for pis 0.5; this occurs if @ = 1. The tmallest value for p, ie ~0.5, which oceureif @ = ~1. For any value of p, between 0.5 and 0.5, there are two different values of 6 that could produce that auto- qu es) 6 Te uaF” Fs wer] aT For example, the processes Vee 40504 and Ya + 2 would have the same autocorrelation function: Pea Ts} We will have mare to say about the relation hetween two MA(1) procesces that share the same autocorrelation function in Section 3.7. 3.3. Moving Average Processes 49 iL A (8) White noise: ¥, = 5, (©) MA(): ¥, (@) ARG: ¥, = O8Y.4 + 6, 1 ‘ ae FIGURE 2.1 Autocorrelation functions for serorted ARMA processes, The qth-Order Moving Average Process A qihvorder moving average procest, denoted MA(g), is characterized by Vie went Be rth at ae, pag whete (6) satisfies 3.21} through [3.2.3] and (6... 8) ould be any real Tumbers. The mean of (3.3.8) is again given by 4: EC) = w+ Ele) + 8: EC6i-1) + By Ele-2) + + FEC, ‘The variance of an MA(g) process is Wm E(w = Eley + Ose t treat + On) 33] $0 Chapter 3 | Stationary ARMA Processes a FIGURE 3.2 The first autocorrelation (p)) for an MA(1) process possible for different values of & Since the e's ae uncorrelated, the vaslance (3.3.9) i? A 4 Oot + do +--+ ote LF + BSF oNO% [3.3.10] Forj=1,2,...44 ym Ele, + Os + Bana t+ Oe) H (eiay * Oates + Batejea t+ Ogtinjoadd aa] = Flee? j+ 0.2, 48 ‘Terms involving e's at different dates have been dropped because their product has expectation zero, and ty i detined to De unity. For / > q, there are no e's With ‘common dates in the definition of »,, and so the expectation is zero, Thus, 18, + ai + Gah es + 88, ho? forf=1,2--.49 [3.3.12] For example, for an MA(2) process, 1 OF + oi oe 0, + Ao? v= [eho memes O, For any values of (@, #5 --- » 89s the MA(g) process is thus eovariauce tationary. Condition [3.1.15] i satistied, so for Gaussian e, the MA(q) process is also ergodic for all moments. The autocorrelation function is zero after q lags, 85 in panel (c) of Figure 3.1. ‘The Tnfinite-Order Moving Average Process ‘The MA(g) process can be written "See equation (AS18in Appendix Aa teen of he book x, 3.3. Moving Average Processes 51 with 6 = 1. Consider the process that results as q > 2: vient Sie, Feet et hea tees. 3.3.3) ‘This could be described as an MA(=) process, To preserve notational flexibility later, we will use ’s forthe coefficients ofan infnite-order moving average process and @'s forthe coefficients ofa finite-order moving average process Appendix 3.A to this chapter shows that the infinite sequence in [3.3.13] Zuce, 3.14) Itis often convenient to work with a slightly stronger condition than [3.3.14} Sic p35) A sequence of numbers (vf satisfying [3.3.14 is said to he square summahle whereas a sequence satisfying [3.3.15] is said to be absolutely summable. Absolute summabilty implies eqvare-simmabiiy, but the converse does not hold there Aare examples of square-summable sequences that are not absolutely summable (again, 2ee Appendix 2.A). 
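As a numerical check on [3.3.12], the following sketch (Python, with the illustrative coefficients θ_1 = 0.6 and θ_2 = 0.3, not taken from the text) computes the population autocovariances of an MA(2) process from that formula and compares them with sample autocovariances from a long simulated realization.

```python
import numpy as np

rng = np.random.default_rng(3)

theta = np.array([1.0, 0.6, 0.3])   # (theta_0, theta_1, theta_2) with theta_0 = 1; illustrative values
sigma2, T = 1.0, 500_000
q = len(theta) - 1

def gamma_pop(j):
    """Population autocovariance from [3.3.12]; zero for j > q."""
    if j > q:
        return 0.0
    return sigma2 * np.dot(theta[j:], theta[:q + 1 - j])

eps = np.sqrt(sigma2) * rng.standard_normal(T + q)
y = sum(theta[k] * eps[q - k:T + q - k] for k in range(q + 1))   # Y_t = sum_k theta_k * eps_{t-k}

y_dm = y - y.mean()
for j in range(4):
    gamma_hat = np.dot(y_dm[j:], y_dm[:T - j]) / T
    print("j = %d   population %.4f   sample %.4f" % (j, gamma_pop(j), gamma_hat))
```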
‘The mean and autocovariances of an MA(=) process with absolutely sum- rable coefficients can be calculated from a simple extrapolation of the results for an MA(Q) process? E(Y) = im BG + Yor + Yea + tates +8 + Hien) [33.16] oH w= bun wt tore? 33.7) = lin B+ oe Be + viet y= EO = sw) a = ONY + Yicith + Veesds + dha +99, Moreover, an MA(=) process with absolutely summable coefficients has absolutely summable auiwcovatiances, Bpic= 3.9) ‘ence, an MA(=) process saustying (3.3.15) is ergodic for the mean (see Appendix 3.A). If the e's are Gaussian, then the process is ergodic forall moments. °Atolue sumably of) and existence of he second moment Ee) af sulient conditions ‘yon ines ener or meg ata suman Specie EE} soaenee ondom aia sh tat Caan Sane {Sn} = Som see Rao 973... 32 Chapter 3 | Stationary ARMA Processes 34, Autoregressive Processes The First-Order Autoregressive Process A fintorder auoregression, denoted AR(1), satisfies the following difference equation: Yine+ Year+ ey (34.1) ‘Again, {e} is white noise sequence satisfying [3.2.1] through [3.2.3]. Notice that [94.17 takes the form of the first order diference equation [1.11] or (2.2.3) in Which the input variable w,is given by w, = ¢ + «,. We know from the analysis fof fustorder difference equations that if] = 1, the consequences of the e's for ¥ accumulate rather than die out overtime. Its thus perhaps not surprising that when 1 = 1, there does not exist a covariance-stationary process for ¥, with finite variance that satisies (3.4.1. In the case when [4] <1, there is @ covariance stationary process for ¥, satisfying [3.4.1] Itis given by the stable solution to [3.4.1] er altdlete N+ ere dt Octet = [oO] tet bent Peat Pest “This can be viewed as an MLA(®) process as in [3.3.13] with y, given by 4/. When. |4l <1, condition [3.3.15] i eatiafed [a2 Zusi= Ze, which equals 1/(1 ~ [g)) provided that || < 1. The remainder of this discussion Of frst order autoregressive processes assumes that |) 1. This ensures thatthe ‘MA(@) representation exists and can be manipulated in the obvious way, and that Ae ARC) process is cigonic ft the auea “Taking expectations of [3.4.2], we see that E(Y) = [oll — A] +04 040+, s0 that the mean of a stationary AR(1) process is waell~ 9), 3.43} The variance is wo Ea Ele 4 be + Feat Pat F 3.4.41 SOG tae pot =o1-#), FO = 0% =) Elect bene + Baa beet Bees taper FO apa tT Uyt Oia t Oona 1 (34.5) [e+ oF git edo? OL 4 go? = [60 eo 3.4, Autoregressive Processes 53 It follows from [3.4.4] and [3.4.5] that the autocorrelation function, Pps Wy = oe B46) follows a pattern af geometric decay as in panel (4) of Figure 3.1. Indeed, the autocorrelation function [3.4.6] for a stationary AR(1) process is identical to the ‘dynamic multiplier or impulse-response function [1.1.10}; the effect of a one-unit increase in ,0n Y,,, is equal to the correlation hetween Y, and ¥,.,. A positive value of 4 like @ postive value of @ for an BA(1) process, implies positive cor- ‘but postive second-order autocorrelation, as in panel (e) of Figure 3.1. Figure 3.3 shows the effect on the appearance ot the time series (of varying the parameter g. The panels show realizations of the process in [3.4.1] with ¢ = and «, ~ N(0, 1) for different values of the autoregressive parameter g. Panel (8) displays white noise (6 = 0). A series with no autocorrelation looks choppy and palternless tothe eye; the value of one observation gives no information about the value of the next aheervation. 
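As a numerical complement to these plots, the following sketch (Python, with the illustrative values c = 1, φ = 0.8, and σ = 1, which are not the values used in Figure 3.3) simulates a long realization of [3.4.1] and compares sample moments with the population values μ = c/(1 − φ), γ_0 = σ^2/(1 − φ^2), and ρ_j = φ^j.

```python
import numpy as np

rng = np.random.default_rng(4)

c, phi, sigma, T = 1.0, 0.8, 1.0, 500_000    # illustrative parameter values
eps = sigma * rng.standard_normal(T)

y = np.empty(T)
y[0] = c / (1 - phi)                         # start the recursion at the unconditional mean
for t in range(1, T):
    y[t] = c + phi * y[t - 1] + eps[t]       # Y_t = c + phi * Y_{t-1} + eps_t

print("mean     : sample %.3f   population %.3f" % (y.mean(), c / (1 - phi)))
print("variance : sample %.3f   population %.3f" % (y.var(), sigma**2 / (1 - phi**2)))

y_dm = y - y.mean()
for j in (1, 2, 5):
    rho_hat = np.dot(y_dm[j:], y_dm[:T - j]) / np.dot(y_dm, y_dm)
    print("rho_%d    : sample %.3f   population %.3f" % (j, rho_hat, phi**j))
```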
For = 0.5 (panel (b), the serie seems smoother, with observations above or below the mean often appearing in clusters of modest longed; strong shocks take considerable time to die out ‘The moments for a stauonary AK(1) were derived above by viewing it a6 an ‘MA(e) process. A second way to arrive atthe same results is to assume that the process is covariance-stationary and calculate the moments directly fom the dif- ference equation [3.4.1]. Taking expectations of both sides of (3.41) re aeats vats Ue qulie yu B(Y) = 0 + bE...) + Ble). 47) ‘Assuming tat the proces is covariancestationary, BUY) = E23) = 8.43] Substituting [8.4.8] int (3.47), wect gute nee - 4) bas] reproducing the earlier result [3.4.3 Notice that formula [3.4.9] is clearly not generating a sensible statement if la] = 1. Far example, if > and > 1, then ¥, in [24] is equal to a postive constant plus a positive number times its lagged value plus a mean-zero random variable. Vet [3.49] scems to assert that ¥; would be ucgative ou avetage fo sch 1 process! The reason that formula [3.4.9] is not valid when |6| = 1 is that we assumed in [3.4.8] chat ¥,[s covarlance-stationary, at assumption which is not correct when |g] = 1 ‘To find the second moments of Y, in an analogous manner, use [3.4.3] 10 rewrite 134.11 as Y= wl ~ 8) + Yate =H) = Oar + 3.4.19) [Now square both sides of [3.4.10] and take expectations: EQ ~ WF = OK ~ w+ 26E (Fer ~ wed + E(@). BA.) $4 Chapter 3 | Stationary ARMA Proceses : el eeeaea (2) @ = 0 (wnite noise) (b) 6 = 05 fe) #=09 FIGURE3.3 Realizations of an AR(1) process, ¥, = 6¥-1 + ¢, for alternative values of §. 3.4, Autoregressive Processes 55 Recall from [3.4.2] that (¥,4 ~ n) i lines fanetion of, 6-2 Char a) = at been + Bes + But ¢, is uncorrelated with 6-15 m5 «<4 $0 6; must be uncorrelated with (nn = 1). Ths the mide fern om the sight side of [8431] i ree BU(h-s ~ wel 4.22] ‘Again, assuming covariace-stationarity, we have BUY, = WP = B01 - aF = B43] ‘Substituting [3.4.13] and [3.4.12] into [3.4.11], w= owt O+o? ne = oll 6, reproducing [3.4.4] Similarly, we could multiply [34.10] by (Y,.j ~ a) and take expectations EU — w-) ~ (B.4.14) = PEK — aM) — HN) + BleA¥-, ~ wh = p) will be a linear funetion of €y-j, joys joa = 22 Which, for > 0, will be uncurcelated with «Thus, for J> 0, the last tet oi te right side in [34.14] is zero, Notice, moreover, that the expression appearing in the fist term on the right side of [3.4.14), EQ ity — But the term (¥, is the autocovarance of observation on ¥ separted by j= 1 periods EM = Kn = w= Hr “Ths, for j> 0, [34.14 becomes 4 on (4.5) Equation [3.415] takes the form ofa fist-order itference equation Y= Or + Hn in which the autocovaiance 7 takes the place ofthe variable y adn wish the subscript (whic indexes the order ofthe autocovriance) replaces (which indexes lume), The input w; an [3.4.15] 18 identically equal t0zer0. Its easy to see thatthe difference equation [3.4.15] has the solution, b%. solution to a first-order difference equation with autoregressive parameter g, an initial value of unity, and no subsequent shocks. The Second-Order Autoregressive Process ‘A second-order aucregression, denoted AR(2) Ym et b¥nr + dVin + te 4.16) satisfies 56 Chapter 3 | Stationary ARMA Processes or, in lag operator notation, (-6L- OLY = c+ 6, Baa The difference equation [3.4.16] is stable provided that the roots of (= be ~ bz) #0 (3.4.18) lic ouside the unit circle. 
When this coniiou is satisfied, the AR (2) process turns ‘out to be covariance-stationary, and the inverse of the autoregressive operator in [.6.1/J15 gwen oy HL) = = Oh aL! — oy + ab + al? + LM tee. [3.4.19] Recalling [1.7.44], the value of yj can he found from the (1, 1) element of the matrix F raised to the jih power, as in expression [1.2.28]. Where the roots of [3.4.18] are distinct, a closed form expretion for gi given by [1.2.29] and [1.2.25]. Exercise 3.3at the end ofthis chapter discusses alternative algorithms for calculating by i Multiplying both sides of [3.4.17] by ¥(L) gives Y= ube + we (34.20) Iti straightforward to show that whe = 1 6 ~ 6) 42y and Sic paz the reader is invited to prove these claims in Exercises 3.4 and 3.5. Since (3.4.20) {an absolutely summable MA(=) proces, its man is given by the aust tei = ol = 4, = 80) £3.423] ‘An alternative method for calculating the mean is to assume that the process is covariance-stationary and take expectations of [3.4.16] directly: BQ) = 6 + HEM) + OE) + Ble), wed bet he 40, reproducing [3.4.23 To find second moments, write [3.4.16] as Yew OO) Tart Oa HH, (Y= u) = 40%, =) + ay, pang Multiplying both sides of [3.4.24] by (¥,_, ~ u) and taking expectations produces Ha b4-1+ by forj = 1,2, (3.4.25) ‘Thus, the autocovariances follow the same second-order difference equation as does the procest for Y, with the difference equation for 7 indexed by the Ing j. ‘The autocovariances therefore behave just as the solutions to the second-order difference equation analyzed iu Section 1.2. An ARQ) process is covariance- stationary provided that $, and ¢, le within the triangular region of Figure 1.5. 3.4, Autoreeressive Processes 7 ‘When @, and & lie within the triangular region but above the parabola in that figure, the autocovariance funetion 7, is the sum of two decaying exponential functions of j When ¢h, and ¢ fall within the triangular region but below the parabola, 7 is a damped sinusoidal function, ‘The autocorrelations arc found by dividing both sides of [3.4.25] by 1 = bas + b-2 forj = 1,2. 429 In particular, setting j = 1 produces = bs + bop, p= ull ~ 63) 4.271 Forj = 2, P= bits + te [423] ‘The variance of a covariance-stationary second-order autoregression can be found by multiplying both sides of [3.4.24] by (¥, ~ ) and taking expectations ECE — WP = Oy EOi-2 ~ MIO ~ A) + PECs - LN - #) + FEM, — 1) w= bin + dane + 0? 34.29] ‘The last term (0°) in [3.429] comes from noticing that (6) ~ H) Ele Pinr ~ a) + Ona — w) + = 4:04 4:0 4 08 Equation [3.4.29] can be written Y= dipite + drone + 0% [3.4.30] Substituting 3.4.27] and [3.4.28] into [3.4.30] gives ay relyee | O bo? + ONG 6? - FT The pth-Order Autoregressive Process A pih-order aucoresression. denoted AR(.). satisfies Yet Yr t ihr t+ Yop te BABI Provided that the roots of Lm de ~ de? — = bz = 0 3.4.32] all lie outside the unit circle, itis straightforward to verify that a covenance- stationary representation ofthe form Kant ve, 3.4.33] 58° Chapter 3 | Stationary ARMA Processes exics where KL) = 1 ~ &\L ~ 6b? = Ary and S70 yi <<. Assuming that the stationarity condition is satisfied, one way to find the mean Is to take expectations of [3.4.31] dat det + Ot, w= dll ~ 4) ~ dy - +++ = 4), 134.341 Using [34.34], equation [3.4.31] can be written Te B= Hy — HM) + OLY =H) + + tollinp — He, BAAS ‘Autocovariances are found by multiplying both sides of [34.35] by (Y,, ~ ) and taking expectations | Using the fat that y-, = yy the system of equations in [3.4.36] for j = 0. 
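Both the stationarity condition and the recursion [3.4.25] are easy to check numerically. The following sketch (Python, with the illustrative values φ_1 = 0.5 and φ_2 = 0.3, chosen only for the example) verifies that the roots of 1 − φ_1 z − φ_2 z^2 = 0 lie outside the unit circle and that sample autocovariances from a simulated realization approximately satisfy γ_j = φ_1 γ_{j−1} + φ_2 γ_{j−2}.

```python
import numpy as np

rng = np.random.default_rng(5)

phi1, phi2, sigma, T = 0.5, 0.3, 1.0, 500_000   # illustrative parameter values

# stationarity: roots of 1 - phi1*z - phi2*z^2 = 0 must lie outside the unit circle
roots = np.roots([-phi2, -phi1, 1.0])           # coefficients of -phi2*z^2 - phi1*z + 1
print("roots:", roots, " all outside unit circle:", bool(np.all(np.abs(roots) > 1)))

# simulate the AR(2) (c = 0, so mu = 0) and form sample autocovariances
eps = sigma * rng.standard_normal(T)
y = np.zeros(T)
for t in range(2, T):
    y[t] = phi1 * y[t - 1] + phi2 * y[t - 2] + eps[t]

y_dm = y - y.mean()
gamma = np.array([np.dot(y_dm[j:], y_dm[:T - j]) / T for j in range(6)])

# check the recursion [3.4.25] for j = 2, 3, 4, 5
for j in range(2, 6):
    print("j = %d   gamma_j %.4f   phi1*gamma_(j-1) + phi2*gamma_(j-2) %.4f"
          % (j, gamma[j], phi1 * gamma[j - 1] + phi2 * gamma[j - 2]))
```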
1 2 can De solved fr 9» ++ Ye 28 functions Of oF, by by vo dye He am be shove thatthe (p % 1) veetot Gn ¥en by the fe p elements ofthe fst column of the (p? x pi) matss > ~(F @ P)]~ where Fis the (p x p) matrix defined in equation 1.23] and ) ndieates the Kronecker product. IAT ter tH OY» forf = 1,2. 4. + dome to ty het orf 0 Late "Yo Prodascs the Vues Walker equations: P= bins + dom ato Ay, fori 13.... (ADT Thus, the autocovariances and autocorrelations follow the same pth-orter iterence equation as does the process itself [3.4.31] For distnet roots, theit solutions take the form WHBM ted t + gal, 3.438) where the eigenvalues (A, Ay) are the solutions to WON = NF 4, =, 35. Mixed Autoregressive Moving Average Processes ‘An ARMA(p, ) proces includes both auoceyessve and moving average terms: YR CH AY tat + Vogt a) Mer Bu + Oa HF Bg, of in lag operetor form, 1 L ~ dy? =~ 4,L9y, ge od Provided thatthe roots of La gr = de? - Bsa ‘Te earl be inv tn pen hie a Be Iie outside the unit circle, both sides of [25 2] can he divided hy (1 — Al — al? == dL?) to obta Y= w+ w(L)e were Gears alte s+ 90H GOL + ALE + HLA WO) (gL ob = 40) Bice B= ML — br - b2-- ++ = b)- ‘Thus, stationarity of an ARMA process depends eatiely on the autoregresive parameters (Ors hr +d) and not on she unvng average parameter (yn 6.) ‘It's often convenient to write the ARMA process [8.51]in terms of deviations from the mean: Y= w= b(hian — a) + (Kear ~ psy $y — MDH Et Eran + Baia +o + Ogfrne ‘Autocovariances are found by multiplying both sides of [3.5.4] by (¥,-) — a) and taking expectations. For j > g, the resulting equations take the form yr bint Hat + Oy» forjmat lar? Bs3] “Thus, after q lags the autocovariance function 9, (and the autocorrelation function parameters ‘Note that [3.5.5] doesnot hold for j= 4, owing to correlation between Ot and ¥,-,. Hence, an ARMA(p, q) process will have more complicated autocovar: jances for lags 1 through q than Would the corresponding AX(p) process. For J > @ with distinet autoregressive roots, the autocovariances will be given by aya BAL + RAL + A, B56) This takes the same form as the autocovariances for an AR(p) process [3.4.38], sivaugit because tie ital wuaitious Cro ries ++» r) iff for the AHL [AR processes, the parameters hy in [3.5.6] Will not be the same asthe parameters Bein (3.4.38) There is a potential for redundant parameterization with ARMA processes. Consider, for example, a simple white noise process, nae bs7 Suppose both sides of [35.7] are multiplied by (1 — pL): (= at, = = ale Sal Clearly. if [3.5.7] is a valid representation, then so is [3.5.8] for any value ofp. ‘Thus, [3.5.8] might be described as an ARMA(I, 1) process, with , = p and 8, = =p Tris important to avoid sich 4 parameterization. Since any value of o in [3.5.8] describes the data equally well, we will obviously get into trouble trying to estimate the parameter in [3.5.8] by maximum likelihood. Moreover, theo- retical manipulations based on a representation such as [3.5.8] may overlook key cancellations. If we are using an ARMA(L, 1) model in which 4, i lose to ~ 2h, then the data might better be modeled as simple white noise. 60 hanter 4 | Stationary ARMA Processes A related overparameterization can aise with an ARMA(p, q) model. Con- sider factoring te lag polyuonial operators i [3.5.2] as in (2.4.3) (AINA) = aE, = 0) (= mB ~ mL) = nL: We assume that [| < 1 for all, so that the process is covarance-stationary. If Bs9] the autoregressive operator (1 —"4,L— gal? 
— 4,L*) and the moving average operator (1+ @L + &L* +--+ &L8) have any roots in common, 233, Ay ~ yj for some F aut j, tien Gat sides of [3.3.9] can be divided b} a iD) int ” fla-an.-m=fha- ans. (= Ot = 6FLF = LPM, ~ ) ADL A OND ee yg, 35.10) where (1 Oh - oP = gd (= ALL = Ab) = Aa gDIL ~ Asl) = aL) (1+ ote + OL 4 + og Le) FLA mB = B= LI = aad = QD) The stationary ARMA(p, q) process satisfying [3.5.2] is clearly identical to the sauonary AKMA(p — 1, g ~ 1) process satistying [3.5.10] 3.6. The Autocovariance-Generating Function For each of the covariance-stationary processes for ¥, considered s0 fa, we c culated the sequence of autocoveriances {y).. If this sequence is absolutely summable, then one way of summasizing the sulocovariauces is through # scalar valued function called the autocovariance-generating function te ve ‘This function is constructed by taking the jth autocovatiauce an sauplyng it by some number z raised to the jth power, and then summing over all the possible values of j. The argument ofthis function (z) is taken to be a complex scalar. (Of particular interest as an argument for the autocovariance-generating. func tion is any value of z that lie on the complex unit circle, = eosin) ~ ssimuy = & \V=T and w isthe radian angle that 2 make withthe real axis. Ifthe ‘autocovariance-generating function is evaluated at z = e-™ and divided by 2m, 6 cesultig Faction of wy where sv(a) = eave) is called the population spectrum of Y. The population spectrum will be discussed 4.6, The Autocovariance-Generaing Function 61 in deal ia Copter 6. Tee I wl be shown that fora proces wth abso ia dete aulmoranaces, te TUN yo ea canbe aed ta eae Siu the autocovariances, This means that fo diferent proseses share the at sNcovtaneeyeetang anton, then he Wo proses exh he ee Teal sequence of stocoarnce “Ata example of casting an autoovariance generating fenton, consider the MALI) proc From equations [2.3 3} tf 33), eautnonaianen generating function is lz) = [0o5]e-! + [CL + Pore? + [Borz= oF fart + (1+ #) + Oe [Notice that ts expression coud alternatvely be wenten sels) = o% | ONL 1 A psa) The form of expression [3.62] sugzests that forthe MAC@) proces, Yom wt (LOL + OLE + + OLE, the autocovarance-geneatng funtion might be ealeulated as fre) = HL + 02 + PH + OED 3.63] w+ en +e 1 OA) “This conjecture can be verified by caring aut the multiplication in [2.6.3] and collecting tems by powers of (14 0 + Bet HOE) K (LH OEE BEE OEE) ZCODET Oya + ODEO? + gen +O + BED Fee EO H OM ht Oe! B64) HOAs ee Hope $+ OB, + Bate oy ade (OETA ‘Comparison of [3.6.4] wih [3.3.10] ue [3.3.12] outta that the wef in 3.6.3 is indeed the jth autocovariance. This method for finding gy(z) extends to the MA(=) case, If Y= w+ UDe [3.6.5] OL) = vo + al + Ll? + B66} and Juice, Bon then 2) = aPutadie-9, Beal For example, the stationary AR(1) process can be writen as Y- we Len which is in the form of [3.6.5] with YL) = W('~ gL). The autocovariance: jenerating function for an A2(1) process could therefore be calculated from a) ~ = gi = 42°) ae 62 Chapter3 | Stationary ARMA Proceses ‘To verify this claim directly, expand out the terms in [3.6.91 e Tear gry 70 + art en from which the coefficient om 2! G+ GbE BPE A) = aM — 4A This indeed yields the jth autocovariance as earlier calculated in equation (3.4.5) ie autocovanance-generatng function fora stationary ARMA(p. 
4 proces can be writen OL +e + Oe 4 + OZ + 8, forte te) (0) = HAE + Os Bat 4 0-9 OO Ta batt = ag B= ETE BE) (2.6.10 Flew Sometimes the data are fiterd, or trated in « pauticules way before they are analyzed, and we would lke to summarize the effets of this treatment on the sutocovariances. This calculation iy particularly simple using the autocovariance. kenerating function, For example, suppose that the original ata Y, weve generated fiom an MLA(1) process, Tete + abe, B61) with autocavariance gencrating function given by [3.0.2]. Let's say that the data as actually analyzed, X,, represent the change in Y, over its value the previous petiod, Kev red - by, 6.2] ‘Sabsttuting [3.6.11] ito [3.6.12], the observed uta cam be characterized as the following MA(2) process, X,=(1- Ll + 6L)e,= (1+ (0- NL - 613} 14604617, [36.13] with a, = (@ — 1) and # = ~8, The astocovariance-generating function ofthe hseved date, canbe enous by dest apples of [0 aeG) = OAL OE + ONLY OE FE). [R014] eee factored form of the first line of (3.6.13), a (1+ 02+ 42%) = (1 21 + 62), in which case [3.6.14] could be written Bx(2) = o°(1 ~ 2)(L + 02) = 2-0 + 2-2) = = 90 = 29 -gy(2), Of course. [3.6.14] and [3.6.15] represent the identical function of z, and which way we choose to write iti simply a matter of convenience, Applying the filter 6.15] 3.6. The Autocovariance-Generating Function 63 (1 — L) to ¥; thus results in multiplying its autocovariance-generating function by G20 - 25 ‘This principle readily generalizes. Suppose that the original data series {Y.} satisfies [3.65] through [3.6.7]. Let's say the data are filtered according to m= mum, ipsa ae m= Sav DM 1. In other words, for any invertible MA(1) reprerentation [2.7.1], we have found a nonimvertible MAC) representation [3.7.4] withthe same fist and second moments as the invertible fepreseulation, Conversely, given ay suninvertible representation with [61> 1y there exist an invertible representation with 9 = (1/8) that has the same Gr and second moments asthe noninvetible representation. Inthe borderline case where 21, there is only one representation of the process, and itis noniavertibe [Not only do the invertible and noninvertble epresentations share the Same moments, ther mprecontation [09 1) ne 47 Al could be description of any given MA(1) process! Suppose a computer generated an infinite sequence of Ps according to [0.74] with @> 1. Than we haw fo fax thal the data were generated from an MA(1) process expressed ia terms ofa noniavertble tepresentaton. In what sease cou these same data be atosate with a ivertDle MAQ) representation? ‘ote rom [2.2.8] tht Cb ME fh (OU a ne + (One + OPE + 37. Inveribiliy 68 ‘Imagine calculating @ series {c,}7._~ defined by (Lt LHR, w) HOW aw) a= Oy wy tee, BTA) where 0 = (1/8) is the moving average parameter atociated with the invertible ‘MA(1) representation that shares the same moments as [3.7.4]. Note that since lol < 1, this produces a well-defined, mean squave cauvergent series (6) Furthermore, the sequence {¢} so generated is white noise. The simplest way to very iis sto calculate tne autocovariance-generating function of e, and confirm that the coefficient on 2! (the jth autocovariance) is equal to zero for any j + 0. From [3.7.8] and (3.6.17), the autocovariance-generating function for e, is given bv Bz) = (1 + 6)" + O2-*)~'g (2). G.7.9] Substiuting (3.7.5] into [3.7.5], BA) = (1+ ML + BE NEPBVL + ONL + ID gy any = 66, nai wliete the iat equality follows from the fact nat * = @. 
Since the autocovariance- generating function i a constant, it follows that ¢, is a white noise process with atiance #3", “Multiplying both sides of (3.7.8) by (1 + @L). F-wen (Lt ole, is a perfectly valid invertible MA(1) representation of data thet were actully ‘enetates from te noniverue fepresentation (3.74 The converse proposition is also true—suppose that the data were really senerated from [37.1] with] <1, an inverbe representation. Then thete exists { noninvertbe representation with # = 1/8 that decries theve data ith ema ‘aliity. To characterize this noninveribe representation, consider the operator ‘rnposed in [28.20] a the appropriate inverse of I+ BL): OL = ONL + GOL - YP 4 OL ~ OL + PL“? OL 4} Deine é, tobe the series that results fom applying this operator to (Y, ~ 1), Bem Nes ~ Hw) ~ Yow) + OM gs = a), BL] noting that this series converges for [|< 1. Again ths series is white noise: BAe) = (02 ~ 00? 4? = OD x (0a ~ 02! + 02 — BE ++ oX + on) + a2») = Got The coeiciet on 2/5210 for « 0, 0, is white noise as claimed, Furthermore, by conettion, Y-w= (1+ OL, so that we have found a noninvertible MA(1) representation of data that were ‘actuslly generated by the invertible MA(I) representation [3.7.1] Either the invertible or the noniavertble representation could characterize any given data equally well, though there isa practical reason for preferring the 66 Chapter 3 | Stationary ARMA Processes invertible representation. To find the value of ¢ for date ¢ associated with the invertible representation as in [37.8], we need vo know current ana past values of ¥. By contrast, to find the value of é for date 1 associated with the noninvertible representation a in [3.7.11], we need to use all of the future values of Y! If the intention is to calculate the current value of ¢, using real-world data, it will be feasible only to work with the invertible representation. Also, 28 will be noted in Chapters 4 and 5, some conveniant algorithms for etimating parameters and fore casting are valid only ifthe invertible representation is used. called the fundamental Imovation for Y,. For the borderline case when |8) = 1, the process is noninvertible, bot the innovation ¢, for such a process wil stil be escribed as the fundamental innovation for ¥, Tnvertibility for the MA(q) Process Consider now the MA(a) process. CW dF eb salts + etn, 7.17 forte seed =( Shea Provided that the roots of (14 a2 + O22 ++ + 02) = 0 (3.7.13) lie outside the unit ctce, [3.7.12] can be writen as an AR(~) simply by inverting the Ma operator, (4 mL + mbt tL FIM) Oe where (ot mt + malt me ye LE LEN Where this is ne ease, tne MA(g) representation (3./.1d] 8 invertible, Factor the moving average operator as (LHL + OL 46+ 46,18) = (L-ADL—AD) (= A,D. (3.7.18) If \A{ < 1 forall, then the roots of [3.7.13] are all outside the unit cicle and the representation [3.7.12] is invertible. If instead some of the A, are outside (but not (on) the unit citele, Hansen and Sargent (1981, p. 102) suggested the following procedure for finding an invertible representation, The autocovariance-generating ar) ( Asy GA {= Az = Aye) AE. Order the 2's 80 that Ay, Ass Ay) ate inside the unit egce and (Aye, dy sy Ag) ate outside the unit circle” Suppose o in (3.715) i replaced by’ Assis: +23; since complex A, appear as conjugate pairs, this is @ positive real number. Suppose further that (Ayo) Apne eplaced with their B75] 3.7. Invertbiliy 67 reciprocals, (Azih, Azdas««, Az"). 
The resulting function would be we alfa -salf tho avon {fa -aro}{ fa - ara} an ry ectanee ae eee o [Be aa}f BL toone {fla -ae9]f Tone - offja ~ ao}f Hae - 9} «f U off ie aa}{fa - azn]. which is identical to [3.7.15) ‘The implication is as follows. Suppose a noninvertible representation for an MA(a) process is written in the form fla-arall Has-nl J J om Ta aba, Brg where IPStor 42,0000 IAI>1 frien tteat2...0 and fortes iw otnermse. Ete) Then the invertible representation is given by Yene {i a- aw} ta = wy}se (e737) where Elen) = [Preaern eee ‘Then [3.7.16] and [3.7.17] have the identical astocovarionce generating function, though only [3.7.17] satisfies the invertibilty condition, From the structure uf the preceding argument, it is clear thet there are @ number of alternative MA(a) representations ofthe data, associated with all the possible “lips” between A,and A-*. Only one ofthese has ll ofthe A, on or inside the unit circle. The innovations associated with this representation are said to be the fundamental innovations for Y, 68 Chapter 3 | Stationary ARMA Processes APPENDIX 3.A. Convergence Results for Infinite-Order Moving ‘Average Processes “This appendix proves the statements made inthe text about convergence forthe MA(=) proces 343) "ast we show that absolte sumably ofthe moving average coefficients quae sammabiliy.Suppote that) x abelsely summatie. ‘Then there exes an N © 2 uch fhatjgl ct forall) = ¥, implying of < for al) = N. Then Sa-Su+Se< tis fnite since Nisfoite, and 37 y|Ois finite since {gps absolutely summable (Gps ertablising tha [3.3.13] imple (3.3.18) at la ‘Next we show that square summabilty does no imply absolute sumenaility. For a5, example of series tat i iquare-summabl Dut noe absolutly sumeable, conser @ = 2, Notice that ij > 1s forall > j, meaning that Ujtee] ay> [ae and 20 wy> fama bt ing h + 1) ~ lag) = gt + 1). hich diverges too a8 N + =. Hence ify isnot absolutely summuble, Ii, however, Sguaresummable, since lj? < his for all' 0, tee oxi 8 lle eget N such festy ager tS, tn words. one we have summed terms, clelatng the sum out to 2 larger number M doesnt change the total by any ore than an arbitrary seal aber. or a tochasie proce such a8 3.3.5), the compatable question is whether fn gry converges in mean aquare to some random varable Yas T=, inthis ose the Clichy enterom states tht Sze de; converges if and ony Wf for any > 0, there ht stabi Teg steer such that fran ateger > N Sve iu] <. pay In words, once 1 terms have been summed, the difference between that sum and the one ‘Oblsined from summing to M is random variable whose mean and variance are both atbitrany cove to 2200. 34 Camuersence Results far 1 fi W.0) der Mavine Averace Processes 69 Now, the left de of (3.4.1) k simply Eldon + War atiwes 400+ Wnticwal = (Het Wo tot Pha fa ~[Se-Sa]-« Buti. of converges a equ by (3.3.14, then by the Cau tron he ah ie AUER a Tint he mate ae ama x dcr by chive of evitahl large N Th he finite Serie in (33.13] converges In mean square provided tat [3.3.14] i satisied Final. we show tht absolute sumably of the moving average coeficients ilies ‘thatthe process ergode forthe mean. Waite [3.3.18] as ee d= oS, oan ‘A key propery of the abalute value operator that a+b + els fal + [bl + Ie Hence bho 3 Wahl Ved: Ind =o SS Moatl = 0° SB al Wal = oF 3 But ere exit a8 Mw uch hat 3 i < Me and therefore Ba < Mork Oo, 2. meaning that, Zico Sin mcone ce Hence (3.115) holds and the process is ergodic forthe mean Chapter 5 Exercises 3.1. 
Is the following MA(2) process covariance-stationary?

Y_t = (1 + 2.4L + 0.8L^2)ε_t,

E(ε_t ε_τ) = 1 for t = τ and 0 otherwise.

If so, calculate its autocovariances.

3.2. Is the following AR(2) process covariance-stationary?

(1 − 1.1L + 0.18L^2)Y_t = ε_t.

If so, calculate its autocovariances.

3.3. A covariance-stationary AR(p) process,

(1 − φ_1 L − φ_2 L^2 − ··· − φ_p L^p)(Y_t − μ) = ε_t,

has an MA(∞) representation given by (Y_t − μ) = ψ(L)ε_t, where

ψ(L) = 1/(1 − φ_1 L − φ_2 L^2 − ··· − φ_p L^p),

that is, (1 − φ_1 L − φ_2 L^2 − ··· − φ_p L^p)(ψ_0 + ψ_1 L + ψ_2 L^2 + ···) = 1. In order for this equation to be true, the implied coefficient on L^0 must equal unity and the coefficients on L^1, L^2, ... must equal zero. Show that these conditions imply a recursive algorithm for generating the MA(∞) weights ψ_1, ψ_2, ..., and that this recursion is algebraically equivalent to setting ψ_j equal to the (1, 1) element of the matrix F raised to the jth power, as in equation [1.2.28].

3.4. Derive [3.4.21].

3.5. Verify [3.4.22].

3.6. Suggest a recursive algorithm for calculating the AR(∞) weights, η_1, η_2, ..., in

(1 + η_1 L + η_2 L^2 + ···)(Y_t − μ) = ε_t,

associated with an invertible MA(q) process,

Y_t − μ = (1 + θ_1 L + θ_2 L^2 + ··· + θ_q L^q)ε_t.

Give a closed-form expression for η_j as a function of the roots of

(1 + θ_1 z + θ_2 z^2 + ··· + θ_q z^q) = 0,

assuming that these roots are all distinct.

3.7. Repeat Exercise 3.6 for a noninvertible MA(q) process. (HINT: Recall equation [3.7.17].)

3.8. Show that the MA(2) process in Exercise 3.1 is not invertible. Find the invertible representation for the process. Calculate the autocovariances of the invertible representation using equation [3.3.12] and verify that these are the same as those obtained in Exercise 3.1.

Chapter 3 References

Anderson, Brian D. O., and John B. Moore. 1979. Optimal Filtering. Englewood Cliffs, N.J.: Prentice-Hall.
Hannan, E. J. 1970. Multiple Time Series. New York: Wiley.
Hansen, Lars P., and Thomas J. Sargent. 1981. "Formulating and Estimating Dynamic Linear Rational Expectations Models." In Robert E. Lucas, Jr., and Thomas J. Sargent, eds., Rational Expectations and Econometric Practice. Minneapolis: University of Minnesota Press.
Rao, C. Radhakrishna. 1973. Linear Statistical Inference and Its Applications, 2d ed. New York: Wiley.
Sargent, Thomas J. 1987. Macroeconomic Theory, 2d ed. Boston: Academic Press.

4
Forecasting

This chapter discusses how to forecast time series. Section 4.1 reviews the theory of forecasting and introduces the idea of a linear projection, which is a forecast formed from a linear function of past observations. Section 4.2 describes the forecasts one would use for ARMA models if an infinite number of past observations were available. These results are useful in theoretical manipulations and in understanding the formulas in Section 4.3 for approximate optimal forecasts when only a finite number of observations are available.

Section 4.4 describes how to achieve a triangular factorization and Cholesky factorization of a variance-covariance matrix. These results are used in that section to calculate exact optimal forecasts based on a finite number of observations. They will also be used in Chapter 11 to interpret vector autoregressions, in Chapter 13 to derive the Kalman filter, and in a number of other theoretical calculations and numerical methods appearing throughout the text. The triangular factorization is used to derive a formula for updating a forecast in Section 4.5 and to establish in Section 4.6 that for Gaussian processes the linear projection is better than any nonlinear forecast.

Section 4.7 analyzes what kind of process results when two different ARMA processes are added together. Section 4.8 states Wold's decomposition, which provides a basis for using an MA(∞) representation to characterize the linear forecast rule for any covariance-stationary process. The section also describes a popular empirical approach for finding a reasonable approximation to this representation that was developed by Box and Jenkins (1976).
4.1. Principles of Forecasting

Forecasts Based on Conditional Expectation

Suppose we are interested in forecasting the value of a variable Y_{t+1} based on a set of variables X_t observed at date t. For example, we might want to forecast Y_{t+1} based on its m most recent values. In this case, X_t would consist of a constant plus Y_t, Y_{t-1}, ..., Y_{t-m+1}.

Let Y*_{t+1|t} denote a forecast of Y_{t+1} based on X_t. To evaluate the usefulness of this forecast, we need to specify a loss function, or a summary of how concerned we are if our forecast is off by a particular amount. Very convenient results are obtained from assuming a quadratic loss function. A quadratic loss function means choosing the forecast Y*_{t+1|t} so as to minimize

E(Y_{t+1} − Y*_{t+1|t})^2.    [4.1.1]

Expression [4.1.1] is known as the mean squared error associated with the forecast Y*_{t+1|t}, denoted

MSE(Y*_{t+1|t}) ≡ E(Y_{t+1} − Y*_{t+1|t})^2.

The forecast with the smallest mean squared error turns out to be the expectation of Y_{t+1} conditional on X_t:

Y*_{t+1|t} = E(Y_{t+1}|X_t).    [4.1.2]

To verify this claim, consider basing the forecast on any function g(·) of X_t:

Y*_{t+1|t} = g(X_t).    [4.1.3]

For this candidate forecasting rule, the MSE would be

E[Y_{t+1} − g(X_t)]^2
  = E[Y_{t+1} − E(Y_{t+1}|X_t) + E(Y_{t+1}|X_t) − g(X_t)]^2
  = E[Y_{t+1} − E(Y_{t+1}|X_t)]^2
    + 2E{[Y_{t+1} − E(Y_{t+1}|X_t)][E(Y_{t+1}|X_t) − g(X_t)]}    [4.1.4]
    + E[E(Y_{t+1}|X_t) − g(X_t)]^2.

Write the middle term on the right side of [4.1.4] as

2E[η_{t+1}],    [4.1.5]

where

η_{t+1} ≡ [Y_{t+1} − E(Y_{t+1}|X_t)]·[E(Y_{t+1}|X_t) − g(X_t)].

Consider first the expectation of η_{t+1} conditional on X_t. Conditional on X_t, the terms E(Y_{t+1}|X_t) and g(X_t) are known constants and can be factored out of this expectation:

E[η_{t+1}|X_t] = [E(Y_{t+1}|X_t) − g(X_t)]·E{[Y_{t+1} − E(Y_{t+1}|X_t)]|X_t}
             = [E(Y_{t+1}|X_t) − g(X_t)]·[E(Y_{t+1}|X_t) − E(Y_{t+1}|X_t)]
             = 0.

By a straightforward application of the law of iterated expectations, it follows that

E[η_{t+1}] = E_X{E[η_{t+1}|X_t]} = 0.

Substituting this back into [4.1.4] gives

E[Y_{t+1} − g(X_t)]^2 = E[Y_{t+1} − E(Y_{t+1}|X_t)]^2 + E[E(Y_{t+1}|X_t) − g(X_t)]^2.    [4.1.6]

The second term on the right side of [4.1.6] cannot be made smaller than zero, and the first term does not depend on g(X_t). The function g(X_t) that makes the mean squared error [4.1.6] as small as possible is the function that sets the second term in [4.1.6] to zero:

E(Y_{t+1}|X_t) = g(X_t).    [4.1.7]

Thus the forecast g(X_t) that minimizes the mean squared error is the conditional expectation E(Y_{t+1}|X_t), as claimed. The MSE of this optimal forecast is

E[Y_{t+1} − g(X_t)]^2 = E[Y_{t+1} − E(Y_{t+1}|X_t)]^2.    [4.1.8]

¹The conditional expectation E(Y_{t+1}|X_t) represents the conditional population moment of the random variable Y_{t+1} given the observed variable X_t; if the random variable η_{t+1} is taken to equal Y_{t+1} − E(Y_{t+1}|X_t), then E(η_{t+1}|X_t) = 0, which does not depend on X_t.

Forecasts Based on Linear Projection

We now restrict the class of forecasts considered by requiring the forecast Y*_{t+1|t} to be a linear function of X_t:

Y*_{t+1|t} = α'X_t.    [4.1.9]

Suppose we were to find a value for α such that the forecast error (Y_{t+1} − α'X_t) is uncorrelated with X_t:

E[(Y_{t+1} − α'X_t)X_t'] = 0'.    [4.1.10]

If [4.1.10] holds, then the forecast α'X_t is called the linear projection of Y_{t+1} on X_t.

The linear projection turns out to produce the smallest mean squared error among the class of linear forecasting rules. The proof of this claim closely parallels the demonstration of the optimality of the conditional expectation among the set of all possible forecasts. Let g'X_t denote any arbitrary linear forecasting rule. Note that its MSE is

E[Y_{t+1} − g'X_t]^2 = E[Y_{t+1} − α'X_t + α'X_t − g'X_t]^2
  = E[Y_{t+1} − α'X_t]^2
    + 2E{[Y_{t+1} − α'X_t][α'X_t − g'X_t]}    [4.1.11]
    + E[α'X_t − g'X_t]^2.

As in the case of [4.1.4], the middle term on the right side of [4.1.11] is zero:

E{[Y_{t+1} − α'X_t][α'X_t − g'X_t]} = E{[Y_{t+1} − α'X_t]X_t'}[α − g] = 0'[α − g] = 0,

by virtue of [4.1.10].
Thus [6.1.11] simpliis to Elon 8X P= Ear eX + Blak, eX [4.1.12] ‘ne optimal unear forecast g'X, isthe value that sets the second term in [41.12] equal to zer eX, where a’X, satisfies (4.1.10) For a'X, satis¥ing [41.10], we will se the notation POX) = aX, or sometimes simply Hoop = © Xe, to indicate the lnoar projection of ¥,., 0m X, Notice that MSE(PCY, JK] = MSELE(Y,. IX). since the conditional expectation offers the best possible forecast For most applications a constant term wil De included in the projection. We wil se the symbol Eto india ines projection on a vector of random variates 1X, along wth a constant term: LCE AK) = PO LX). Properties of Linear Projection It is straightforward to use [4.1.10] to caleulate the projection coefficient in terms of the moments of ¥,., and X.; E(axX) EOX!), 74 Chapter 4 | Forecasine at EXE XN, (41.13) sssuming that E(X,X;) is a nonsingular matrix, When E(X,X;) is singular, the coefficient vector as ot uniquely determined by [4.1.10], though the produet of {ths vector withthe explanatory variables, a'X,, is uniquely determined by [4.1.10] “The MSE associated with linear projection i given by EvYoas = aK) = E(Yias)®~ 2E@K Yas) + ECQXX Substituting [4.1.13] into [4.1.14] produces BUYoe1 ~ @X)) = BY osF ~ ZEW KERR EOYs) + EQ a XNECLXD]* (ass) ¥ EXEC KI EO Fei) = E(Via)? ~ EM X MERKEL d Notice that if X, includes & constant term, then the projection of (2Y,..+5) fon 8, (were @ and b afe ceterminisic constants is equai to Aleve + IK] = ePID +b ‘To see this, observe that a-PLY, |X.) + hi a linear function of X, Moreover, the forecast error, (0%, +B) [OPC AK) +6] = alFees ~ PH aK) 4s uncorrelated with X,, a5 required ofa near projection, (4.1.14) Linear Projection and Ordinary Least Squares Regression Linear projection is closely related to ordinary least squares regression. This subsection discusses the relationship between the two concepts ‘A linear cegression model relates an observation an yo: 10 8 x +, (4.1.16) Given a sample of T observations on y and x, the sample sum of squared residuals is defined as Neo Zon em (42.27) ‘The value of that minimizes [4.1.17], denoted b, is the ordinary least squares (OLS) estimate of B. The formula for b turns out t0 be that some inet combinston eX fs egal 020 forall renliations For example, X, coos of (oe anne vate, de sewed vr mat be a sexed ven of te ty ~ 6 yn One Ci ing Gop tun ates Som ak yom mate at pono Yaxeontt, whore Xp investor soting th nonreGundas seen of, horas projection ‘KP canbe wont celted om 6113} wh Xn [613] replaced by XP-Aay Unear om ‘Shatoe fhe ori carbs a, etjing 10] erotic tome rds able hat 2%, = aX} forall aloes of econsten it (4110) 4.1, Principles of Forecasting 78 which equivalently can be writen » [wn » xxi] [on3xr] ($119) Comparing the OLS coefficient estimate bin equation 6.1.19} with the near projection coefficient a in equation [1.1], we se that b constructed fom the Populetion moments E(XK!) and ECKY,,). Thus OLS regression iss summary Of the pareuar sample Observations (34%, 1 %G) AN Oy Jueves Pres ‘whereas linear projection isa summary ofthe population characteristics of the Sochastic proces (Xs Yak ‘Although linear projection describes population moments and ordinary least squares describes sample moments, there is formal matheratieal sense in which therun operations arethe ime Appendix 4.A tothis chapter dscustes this parallel and shows bow the formulas for an OLS tegression canbe viewed asa special ase [Notice that if the stochastic process {X,,¥,,.) 
is covariance-stationary and ergocic or second moments, then the sample moments wil converge tO the pop- lation moments asthe sample sie T goes to infin (U EXX/5 EHX) wr) Ext ERY) implying bia. (4.1.20) Thus OLS regression of y,a: on x, yields a consistent estimate of the linear projection coefficient. Note that this result requires only that the process be ergodic for seennd moments Ry contrast. , the forecast in [4.24] converges in mean squsie to w, the unconditional mean, The MSE [4.2.6] likewise converges to o? Fc, whichis the unconditional vatiance of the MA(=) process (42-1) ‘A compact lag operator expression for the forecast in [4.7.4 i enmatimas usec. Consider taking the polynomial #(L) and dividing by Lé ae Be ree gumr sgt bake ee Fel! + teyal? + ‘Tne annitilaion operator* (indicated by [/].) replaces negative powers of L by 2210; for example, FO) gees wage [| a et daa i823] ‘Comparing [4.2.8] with 4.2.4], the optimal forecast could be written in lg operator notation as (82.9) "This scion of forecating bated onthe anaiitaion operator sar tht in Sarge (198 78 Chapter 4 | Forecasting Forecasting Based on Lugged Y's ‘The previous forecasts were based on the sasumption that «, i observed directly. Inthe usual forecasting situation, we actuslly have observations on lagged ¥'s, aot lagged e's, Suppose thatthe proces [42.1] has an AR(=) representation given by ALI = m= en (4.2.10) where m(L) = 2fonjL!, m = 1 and 27.0ln| <2. Suppose further that the A polynomial n(L) and the MA polynomial (L) are related by aL) = WO. (42a) A covariance stationary AR(p) model ofthe form (UL = OEP = 9+ = LH) = bn (42.2) cof, more compactly, OLN, - 0) =. clearly satistes these requirements, with n(L) = (L) and y(L) = (6(L)]-?. An MA(q) process Vinee Qt eb sles e+ athe, (42.3) ¥,- m= aL)e, {is also ofthis form, with g(L) = 0(L) and n(L) = [0(L)]-!, provided thet [4.2.13] fx haeed on the invertihle repeecentation With » naninvertnia MAC). the mate rust first be flipped as described in Section 3.7 before applying the formulas given inthis ecction. An ARBA(p, q) also satisfies (4.2.10) and [42.13] with e(L) = (L)¢(L), provided thet the autoregressive operator XL) saistes the stationarity condition (000 of 6(2) = Ofie uutsde the unit eee) and tat che moving average operator @(L) satisfies the invertbility condition (oots of a(2) = O lie outside the suit ciel) ‘Where the restrictions associated with [4.2.10] and [4.2.11] are satisfied, ob- servations on {¥,, ¥;-1, - } will be suficient to construct {e,, 6, ..)- For Prample, for on 4R(i} prncese [7 10) wal he (= 6b - w=, (4.2.14) Thus, given ¢ and 1 and observation of Y, and Y,.., the value of 4, can be eorstiucied from (HAM) For an MA(1) process written in invertible form, [4.2.10] would be (14 aL)", ~ n) = 6 Given an infinite number of observations on Y, we could construct from hw) = Oa) + Oa =) = Osa) + Under these condition, [4.2.10] can be substituted into [4.2.9] to obtain the forecast of Y,,, 88 @ function of lagged Y's: Elid Yow Dat [2] LNY, - Hh [4.2.15] 42. areca Red an an Infinite Number af Ohservations 79 or, using [4.2.11], Attadieten d= ns [2] Sion. wate Equation [42.16] is known as the Wiener-Kolmogorov prediction formula. Several ‘examples of using this forecasting ule follow. Forecast ing an AR(Z) Provess For the covatiance-stationary AR(1) process [4.2.14], we have 2) == ote Ls AL Le (420 fw] tel. 
ore Substituting [4.2.18] into [6.2.16] yields the optimal linear s-period-shead forecast Tor a stationary ARG) provess we Ae on =Ht eH ‘me torecast decays geometrically irom (7 — p) ww p29 te forenes ‘increases, From [4.2.17], the moving average weight 4 is given by @!, so from [4.2.6], the mean squared s-period-ahead forecast etror 1s Ut ote otter + Mer [Notice that this grows with s and asymptotically approaches 03/(1 ~ 4%), the ‘unconditional variance of Y. Yd Yaw (4219) Forecasting an AR(p) Process [Next consider forecasting the stationary AR(p) process [42.12]. The Wiene Kolmogorov formula in [6.2.10] essentally expresses the value of (¥;.5 — 4) terms of initial values ((¥, — 1), (¥,-y — wand subsequent values of (6,1. Byars +» Ge) and then drops the terms involving future e's. An expression of thie form was provided by equation (1.2.26) which described the value ofa variable subject to a pth-order difference equation in terms of initial conditions and sub- Fron — 2 FRM = a) + FAM a AM + bien + Uataasan + Wabienaa +07 + Prcabints (4.2.20) where y= F8. (6221) 80 Chanter d | Foor tie Recall that f{? denotes the (1,1) element of Bi, fl denotes the (1, 2) element af F’, and $0 on, whete F isthe following (p Xp) matrix: ht by ben Oy 100 O0 re|0 10 00 Loo 0 i od Ihe opal sperod-ahed forecasts thus Fong — 4 R08 = 0) Hier 0 + 42.22 $f Uras - bie [Notice that for any forecast horizon «the optimal forecetis «constant pus linear fonction of {¥», Yi) ++ Yinpsih The associated forecast error is Bins ~ Fane = tess tities 7 Oita TT ate i ‘The casiest way to calealate the forceast in [4.2.22] ie through a simple re cursion, This recursion can be deduced independently from a principle known as lun of terated projections, which will be proved formally in Section .5, Suppoxe ‘hat at date we wanted to make a one-period-ahead forecast of Y,... The optimal forecast is clearly Pisay =H) = = WY + A =H) + (Y. w) Consider next a two-period ahead forecast. Suppose that at date ¢ + L we were {0 make a one-period-ahead forecast of Ys,3- Replacing ¢ with ¢ + 1 in [4.2.24] ives the optimal forecast as Brcsyen ~ mw) = Ox(Yeer — a) + dal, — w+ +O Yrcps2 ~ H) ‘The law of iterated projections asserts that if this date ¢ + 1 forecast of ¥,, is projected on date ¢ information. the result i the date ¢ forecast of ¥.... At date [4.2.24] (4.2.25) 1 the vals Yo, Yooay «+ + Yeopoz in [4.2.25] are known, Thos, (Brsae — WY = di Boay ~ #) + a0, — Hw) + (6.2.26) VO Qieper He ‘Substituting [4.2.24] into [4.2.26] then yields the two-period-ahead forecast for an AR) proce Bray — w) = OL6K w+ bees = wt + Ope — HD) mM) ea WT (GE + BMY, — w) + (bby + BMY, — wd + + Gibps + 6) Mapas — H) + br8,Hiag es — BD “The s-period-ahead forecasts of an AR(p) process can be obtained by iterating Posy = 1) = A Bereae 0) + AePerreae ~ a) +> 2.2 + OFrrom =) eae 42 F react Base an.an Ini tte Number a Observations forj = 1,2... 8 where fun ¥e forse Forecasting an MA(I) Process [Next consider an invertible MA(1) representation, = a aye, (array with [| <1. Replacing U(L) in the Wiener-Kolmogoroy formula [4.2.16] with (1+ OL) gives 4) (42.25) ‘To forecast an MA(1) process one period into the future (s = 1), ficeiac iia (haya and 50 Bom et eae» aa = wt AY, w) ~ OK — w) + OK 2 WY) * Ieis sometimes useful to write [42.2] a8 ea 8) and view «a8 the outcome ofan infinite recursion, 8 = =) = Os (6231) ‘The one-perod-ahend forecast [42.30] could then be writen as cea 92.3 quesion in iu faut au cami viatavicsicativn uf oy deduced simple rearrangement of [6.2.28]. 
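Returning for a moment to the autoregressive case, the recursion in [4.2.27] is straightforward to implement. The following sketch (Python, with hypothetical parameter values and observations) iterates the one-step-ahead rule forward and, for an AR(1), confirms that the result matches the closed form μ + φ^s(Y_t − μ) in [4.2.19].

```python
def ar_forecast(phi, mu, y_recent, s):
    """s-period-ahead forecasts of an AR(p) by iterating the one-step rule as in [4.2.27].

    phi      : (phi_1, ..., phi_p)
    y_recent : (Y_t, Y_{t-1}, ..., Y_{t-p+1}), most recent observation first
    returns  : [Yhat_{t+1|t}, ..., Yhat_{t+s|t}]
    """
    state = [y - mu for y in y_recent]           # deviations from the mean
    forecasts = []
    for _ in range(s):
        dev = sum(p * x for p, x in zip(phi, state))
        forecasts.append(mu + dev)
        state = [dev] + state[:-1]               # the forecast replaces the oldest entry
    return forecasts

# hypothetical AR(2): mu = 2, (phi_1, phi_2) = (0.5, 0.3), last two observations 3.0 and 2.5
print(ar_forecast((0.5, 0.3), 2.0, (3.0, 2.5), 4))

# AR(1) check against the closed form [4.2.19]: Yhat_{t+s|t} = mu + phi^s * (Y_t - mu)
mu, phi, y_t = 2.0, 0.8, 3.0
recursive = ar_forecast((phi,), mu, (y_t,), 3)
closed = [mu + phi**s * (y_t - mu) for s in (1, 2, 3)]
print(recursive, closed)   # the two lists agree
```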
The “hat" notation (é) i introduced at this point {in anticipation of the approximations to «, that will be introduced in the folowing section and substituted into [4.2.31] and [4.2.32] ‘To forecast an MA(I) process for s = 2,3, periods into the future, ["). andl co, from [4.9.29] 0 fors = 2,3,..65 Fut e fors=2,3, (42.33) Forecasting an MA(q) Process For an invertible MA(q) process, (= w= Ut aL + OL + Le 82 Chapter 4 | Forecasting the forecast [42.16] hecames bn [evens ou +a} ap = wy [LEO A BLES + Oabt . (42.34 x 1 THOL+ ODS Fare # Now 14 OL + OL? 4 +--+ LE T : - tt FOL + Malet FALE fore 1.2...69 0 fors— gt tg +2, ‘Thus, for horizons of s = 1,2... «gy the forecast is given by ein tcal si miee lala atten Where é, can be characterized by the recursion BL Wm Oba = taba = Rabies (42.36) A forecast farther than q petiods into the future is simply the unconditional For an ARMA(1, 1) process (= 6%, - =O +a that is stationary (|| < 1) and invertible (4 < 1), fant Aton ] i =k TFL Tenn =. (42.37) (aiseel. ( oL+ gL? OL + OL + Lt + ]. v o (e239 rr erE sees 91am ELE ges Testes ars eet) oreo Toa ‘Substituting [4.2.38] into [4.2.37] gives oy [tee] t= ok fay [HAM] Lhe, ay bee (62.39) = SE oy, - 42 are ate Rated om am Infinite Numhe nf Oheerintnne 8% Note that for s = 2. 3... . «the forecast [4.2.39] obeys the recursion (Pasay — #) = OH Posse = #): ‘Thus, beyond one period, the forecast decays geometrically atthe rate ¢ towar ‘the unconditional mean 4. The one-period-ahead forecast (¢ = 1) Is given by @ i Le Hh = 0. 62.40) Fry = e+ ‘This can equivalently be written aay =u) = SEEDER 8D) yy = ey, a + 08, (62.411 T+ OL where (42.49) ‘Forecasting an ARMA(p, q) Process Finally. consider forecasting a stationary and invertible ARMA(0. 0) proces (1 6E = byl == BLN =H) = (4 HL + LEH + OL The natural geneatctions of (4.2.1) and 6.2.42] are Broa 8) = B(,= w+ ahaa =) 4 $Oyleeps — HY EE Bear Ho Orga EPA) wich {@) generated recursively from 48 Sh 24g “The eperiod-ahead forecasts would be Canam) (42.45) SBresanp = M+ O(Precens = WE + OErecree =H ce] FB EO tt lie OER L Bg bPrssap — H) + Oa(Frosnay ~ M) +6 + OfPresape — were 1 grester than the moving average order. follow a ptborder difference equation governed solely by the autoregressive parameter 84 Chapter 4 | Forecasing 4.3. Forecasts Based on a Finite Number of Observations ‘The formulas in the preceding section assumed that we had an infinite number of past observations on Y, {Y,, ¥;-r, .}, and knew with certainty population pe- rameters euch 26 1, 4, and @."This tection continucs to assume that population Parameters are known with certainty, but develops forecasts based ona finite numer of viver vations {ip Frags << Fromesh For forecasting an AR(p) process, an optimal s-period-ahead linear forecast based on an infinite number of observations {¥, Ym ) ia fact makes use of ‘only the p most recent values {¥,. Ys.y.-... Yropey) For an MA or ARMA process, however, we would in principle vequire all ofthe historical values of ¥ in lrder to implement the formulas af the One approach to forecasting based on s finite number of observations ie to act as if presample e's were all equal to zero. The idea is thus to use the approx- BO Me Year) aucun inet at spam = Oy secutsion [2-36] cat be started by setting hem qi 20 (43.2) and then iterating on [4.2.36] t0 generate és. lations produce fomes— H)~ Bibra ma2 ~ ibs mers ‘The resulting values for (By és) --.y Bg) afe then substituted ively into [4.2.35] wo produce the forecast (4.3.1). 
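A small coded example may help fix the MA(q) forecasting rule. The sketch below is not from the text; it assumes NumPy, uses illustrative parameter values, and sets the presample innovations to zero (the approximation discussed in the next section) before applying the finite-horizon forecast formula: for horizons beyond q the forecast is simply the unconditional mean.

```python
import numpy as np

def ma_forecast(y, mu, theta, s):
    """s-period-ahead forecast of an invertible MA(q) process, using the
    recursion for the fitted innovations,
    eps_hat_t = (y_t - mu) - theta_1*eps_hat_{t-1} - ... - theta_q*eps_hat_{t-q},
    with presample innovations set to zero."""
    q = len(theta)
    eps_hat = np.zeros(len(y) + q)            # q leading zeros = presample values
    for t in range(len(y)):
        lagged = eps_hat[t:t + q][::-1]       # eps_hat_{t-1}, ..., eps_hat_{t-q}
        eps_hat[t + q] = (y[t] - mu) - np.dot(theta, lagged)
    if s > q:                                  # beyond q periods: unconditional mean
        return mu
    # mu + theta_s*eps_hat_t + theta_{s+1}*eps_hat_{t-1} + ... + theta_q*eps_hat_{t-q+s}
    recent = eps_hat[::-1][:q - s + 1]        # eps_hat_t, eps_hat_{t-1}, ...
    return mu + np.dot(theta[s - 1:], recent)

rng = np.random.default_rng(1)
e = rng.normal(size=300)
y = 10 + e[2:] + 0.6 * e[1:-1] + 0.3 * e[:-2]   # an MA(2) with mu = 10
print([ma_forecast(y, 10.0, np.array([0.6, 0.3]), s) for s in (1, 2, 3)])
```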
or example, tors =q = 1, the forecast would be Broay™ w+ OHH) PU a = 8) aa + PY aw) (1 me = Hs Which is to be used as an spproximation tothe AR) forecast, BA AY, BP arB) + Oa = 0) — (43.4) For m large and | small this clearly gives an excellent approximation. For |e closer to unity, the approximation may be poorer. Note thet if the moving average operator is noninvertible, the forecast [4.3.1] is inappropriate and should not be wsed, 4.3. Forecasts Based on a Finite Number of Observations 88 Exact Finite Sample Forecasts ‘Au alteruaive approach is (o calculate the exact projection of Ye om it m most recent values. Let ‘We thus seek @ linear forecast ofthe form AK, = al + alY, + AMY FAK mer — [435] ‘The coefficient relating Y,,, to Y, in a projection of ¥,.3 of the m most recent woluee af Vis donated af in {44 5) hie will im general he ifinent fram the coefficient relating ¥;,, to ¥, i a projection of ¥,., om the m + 1 most recent Nalucs of Yt tae woeftsleat would be denoted af"*™, TE, is covariance stationary, then E(Y,Y,-.) = 1, + w?. Setting X, = (1, Y, Yeas Yromes i 13) implies ew af) af ao alt) le Cnt) td Om + HL fai » » 7 At eee inn ea [4.3.6] Bet yea new ‘When a constant term is included ia X, itis more convenient to express vasiabies in deviations fiom the mean. Then we could calculate the rojesiion of og — H) 00%, = (08, ~ 1), Wons = Weer Homes — HI]! Foray mm aS(Y, = a) + OSH = a) + (43.71 + ames =H) For this definition of X, the coefficients can be calculated directly from [4.1.13] to gl teary ea We will demonstrate in Section 4.5 that the coefficients (a!™.a!", n) matrix O bas a auique sepiesentation of the form = Apa’, 44. where A is a lower triangular matrix with 1s along the principal diagonal, LimOnnO i ea) mm 1 0 ° avant oF, nag yy and D is a diagonal matrix, dy 0 0 ° 0 da 0 0 i ida of, 00 0 4, where d > 0 for alli, This is known as the triangular factorization of ©. 44. Factorization of a Positive Definite Symmetric Matric 87 To see how the triangular factorization can be caleulated, consider My My My --- Day My My => Oy Baits tee Og 42] My Da Dy + Oe ‘We assume that is positive definite, meaning that x'9%x > 0 for any nonzero (rx 1) vecor x, We tao assume tat Qs syeimeiic, so that 2 = Dy ‘The matrix can be transformed into a matrix wih er the 2, 1) poston by multiplying the ist row of © by Nf and subtracting the resulting row from the second, A zero can be put ia the (3,1) postion by multiplying the Gist row by 04,05? and subtracting the resulting row from the third. We proceed inthis, fashion down the fist column Thiet of operations can he summarized x pre- multiplying 0 by the following matrix: Tao NOHNE IO MyM 10 0 =O05' 01 0 (aa) =O,05' 00 1 ‘This matrix always exists, provided that Q, + 0. This is ensured in the present definite, ee, must be greater than zero. ‘Wiren $1 is premultiplied by Baad postnuliplied by Bj the keslt iy arena We nest proceed in exactly the same way withthe second columa of H. The approach now will be to multiply the second row of H by hygh* and subtract the result from the third row. Similarly, we multiply the second row of H by hh! ‘and subtract the result from the fourth row, and so on down through the second 88. Chapter | Forecasting column of H. These operations can be represented as premultplying H hy the following matrix: ee ee oa) 0 1 oo Eye] 0 what 1s 0 [44.61 LY whats! Oo 1] ‘This ustix always exists provided that hy, # O. But hz can be caleulated as hg = eflfes, where ef = [0 1 0+ 0]. 
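The exact finite-sample projection coefficients described here are simply the solution of a system of linear moment equations. As an illustration (not from the text; NumPy assumed, function name and parameter values hypothetical), the following sketch solves that system for an MA(1) and shows the leading projection coefficient approaching the moving average parameter as the number of conditioning observations m grows.

```python
import numpy as np

def exact_projection_coeffs(acov, m, s=1):
    """Coefficients of the exact linear projection of (Y_{t+s} - mu) on the
    m most recent deviations (Y_t - mu), ..., (Y_{t-m+1} - mu), obtained by
    solving the moment equations.  `acov[j]` is the jth autocovariance;
    autocovariances beyond len(acov)-1 are treated as zero."""
    def gamma(j):
        j = abs(j)
        return acov[j] if j < len(acov) else 0.0
    G = np.array([[gamma(i - j) for j in range(m)] for i in range(m)])
    g = np.array([gamma(s + i) for i in range(m)])
    return np.linalg.solve(G, g)

# MA(1) with theta = 0.8 and sigma^2 = 1: gamma_0 = 1 + theta^2, gamma_1 = theta.
theta = 0.8
acov = np.array([1 + theta**2, theta])
for m in (1, 5, 20):
    a = exact_projection_coeffs(acov, m)
    print(m, a[:3])   # the leading coefficient approaches theta as m grows
```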
Moreover, H = E,QE}, where 0 is positive definite and Eis piven by [4.4.3], Since E, is lower triangular, its deter- inant is the product of terms along the principal diagonal, which are all unity, ‘Thus Eis nonsingular, meaning that H = E,QQE; is positive definite and $0 finn = ‘He, must be strictly positive. Thus the matriv in [4.4.6] can always be calculated If His premultiplied by the matrix in [4.4.6] and postmltipied by the trans- where o 0 hag ~ ahi hay OO y= habit => hy ~ hath | Again, since H is positive definite and since Ky is nonsingular, K is postive and in particular fy is positive, Proceeding through each ot the columns with the same approach, we see that for any positive definite symmetric matrix © there exist matrices E;, By,» , Ey such that Byoy EE OEE ++ By, = Dy (84.7) 0, o oo. ° 0 M,- MyM, 0 ° ° 0 ag — Fabian 0 : Lo a o With all the diagonal entries of D strictly postive. The matries Ey and Ey in [4.4.7] are given by [4.4.3] and [4.4.6] In genera, Eis a matrix with nonzero values in the jth column below the principal diagonal, Te slong the principal diagonal, and zeros everywhere else "Thus each B, is lower triangular with unit determinant. Hence Ej" exists, and the following matrix exists: A= En Bey. [4.4.8] 44. Factorization of a Positive Definite Symmetric Matrie 89 If [4.4.7] is premuliplied by A aud postuulliplied by A", the result is a = ADA. B49} [Recall that E, represents the operation of multiplying the frst row of © by certain numbers and subtracting the results from each ofthe subsequent rows. Is fnverse E;" undoes this operation, which would be achieved by multiplying the first row by these same numbers and adding the results to the subsequent rows. Thus Laoag nog tia te =| mudi OV 0 (44.0) M05! 0 0 + I as may be verified directly by multiplying (4.4.3) by [44.10] to obtain the identity ratris. Siar, laud 50 on, Because of this special structure, the series of multiplications in [1.4.8] turns out to be trivial to carry out: 1 0 0 ° GeGataea aii nO) Aa | Ont Aohzl 1 ° [san] A hahat hast + 1] ‘That is, the jth columa of A is just the ith column of E>. ‘We should emphasize that the simplicity of carrying out these matrix malt plications is due not just ta the special structure of the Ej-* matrices but also t0 the order in which they are multiplied. For example, A“? = E,E,-2 + Ey ‘cannot be ealoulated simply by wsing the jth columa of E, for the jth columa of rey ‘yuice tne manrix A in [3.6.1] is lower wiangutar wit diagonal, expression [4.4.9] isthe triangular factorization of 0. For illustration, the triangular factorization © = ADA’ of a (2 X 2) matrix ng ie painipat [es f)-[ogos 4] i aa while that of a (2 ¥ 3) matrix ie My M2 Oy FE My On Oy} = | Oni? 1 0 My, Mm 3} | OnO5 habe! 1 fe eS Te ee, [sa tenll? be where haz = (Az — MeiMF'My), has = (Oxy — MyyM7M),), and Ay ~ hye ~ (Ms - 0,070). 44.3] Triqueness of the Triangular Factorization ‘We next establish that the triangular factorization is unique. Suppose that shore A, and Ay ae ath lover triangular with I lone the picial diagonal and Dy and D, ate both diagonal with postive entries along the principal diagonal Pca the masons have taverses, Premoliplying [1] by Dz¥4; and post ‘multiplying by (As]-" yields Ai(Ag]“? = Dz*Ar*AD, (a4a3] Since Aj upper angular with ts along the principal diagonal, (A! must likewise be upper tiangular with 1s along the principal diagonal. Since Ais also oft form, te left sige of [4413] upper triangular with 15 along te principal diagonal. 
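The row-and-column elimination just described is straightforward to code. The following sketch (not part of the original text; NumPy assumed, the matrix entries are arbitrary illustrative numbers) produces the unit lower triangular A and diagonal D from a symmetric positive definite matrix and confirms that Omega = A D A'; a production implementation would normally call a library routine instead.

```python
import numpy as np

def triangular_factorization(Omega):
    """Triangular factorization Omega = A D A' of a symmetric positive definite
    matrix by the row-and-column operations described in the text: A is unit
    lower triangular and D is diagonal with positive entries."""
    H = np.array(Omega, dtype=float)
    n = H.shape[0]
    A = np.eye(n)
    for j in range(n - 1):
        for i in range(j + 1, n):
            A[i, j] = H[i, j] / H[j, j]          # multiplier used to zero out H[i, j]
            H[i, :] = H[i, :] - A[i, j] * H[j, :]
            H[:, i] = H[:, i] - A[i, j] * H[:, j]
    return A, np.diag(np.diag(H))

Omega = np.array([[4.0, 2.0, 2.0],
                  [2.0, 5.0, 3.0],
                  [2.0, 3.0, 6.0]])
A, D = triangular_factorization(Omega)
print(A)
print(np.diag(D))
print(np.allclose(A @ D @ A.T, Omega))   # True
```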
By similar reasoning, the right ie of [44.15] must be lower angular. ‘Toe only way an upper triangular marx an equal a tower triangular matnx si Mi the oftzgnnal terme ae ern, Morenver, since the diagonal entis On the ie side of [8.413] areal unity, this matrix mast be te deity mati AilAal be Postoultplcation by Az establishes that Aj — Aj. Peemultiplying [4.4.14] by A~* and postenultiplying by (A'J-! then yields D, = Ds, The Cholesky Factorization [A closely related tactonzation of a symmetne postive definite matrix © is obtained as follows. Define D'to be the (n x 2) diagonal matrix whose diagonal entries are the square roots ofthe corresponding elements of the matrix Din the triangular factorization: [va 0 0 oui] 0 Viz 0 ° pe=| 0 0 Vis oO o 0 0 Vin Since the matrix D is unique and has stritly positive diagonal entries, the matrix 1° exists and is unique, Thea the triangular factorization can be written 0. = AD“DA! = ADYAAD!Y’ efinite Sommetric Mire OW a-p (Haug) were P= apie 10 0 ° 0 ay 1 0 ° ° ete Vis 0 F a 0 al Va 0 ody Vb =| eave vd aaVEy taVdn toVds +++ Vow Expression [4.4.16] i known as the Cholesky factorization of ©. Note that P like A, is lower liangular, though whereas A has 1s along the principal diagonal, the Cholesky factor has the square roots of the elements of D along the principal ‘diagonal 4. Updating a Linear Projection Triangular Factorization of a Second-Moment Matrix and Linear Projection Let ¥ = (Ys Yar » Ya) Be am (n x 1) vector of random variables whose second-moueat maltix is given by 0 = EY’). (5.11 Let Q = ADA" be the triangular factorization of ©, and define any, [452] ‘ihe secone-moment matnx of these tanstormed vanabies is gven by ERY) = EVV) = AHEORY IIA f4sa] Substituting [4.5.1] into [4.5.3], the second-moment matrix of ¥ is seen to be diagonal ‘Thus the P's form a series of random variables that are uncorrelated with fone another.‘ To see the implication of ths, premultiply [4.5.2] by A: A=y. 1456) “We wit we "Ya Yar uncorelatd to meen “E(HY) ~ 0." The terminology wil be secet ‘EY and Y, have ao meats ori constant erm added inthe Lac projection 92 chaper4 | Forcaing Expression (4.4.11] cam be used to write out (4.5.6] explicitly as Te ouiig oR] fe eee tele 405° eds! olfsjelel usm A054 gah Rygkit oo 1 , An nstequanon a [4.9.7 sates that ar (4338) So the frst elements of the vectors ¥ and ¥ represent the same random variable ‘The second equation in [4.5.7] asserts that Dy Nii'¥, + Ya = Yor or, using (4.5.8), where we have defined a = 2,05 implies ‘The fact that ¥, is uncorrelated with Y BUR) = £12 - ar yr = 0 fs.2.10) But, recalling (4.1.10), the value of that eats [45.10] is defined asthe cost ficient ofthe linear projection of ¥; on Y,. Thus the triangular factorization of en be used to infer that the cocficient of a linear projection of Ys on Ys elven bya = M,)M;', confirming the earlier result (4.1.13). In general, the row i, columa Yemry of ith we 8 ae coetnent om er Pojecton of Yon Since ¥, has the interpretation as the residual from a projection of ¥3 00 ¥;, from [4.5.5] ds gives the MSE of this projection EP) = dy = My — MNF!M,3 This confirms the formula for the MSE ofa linear projection derived earlier (equa tion (4.1.15). “The third equation in [45.7] states that My NGF, + hyhgth, + Py = ‘Substituting in from [4.5.8] and [4.5.9] and rearranging, Ps — Ys ~ OOF, — haahaa'(Fa ~ Oyy"F)). (4.11) Thus 5 tthe residual from subtracting a particular linear combination of ¥, and ¥; from ¥,. 
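The relation between the triangular factorization and the Cholesky factorization, P = A D^(1/2), can be checked numerically. The sketch below is illustrative only (NumPy's built-in Cholesky routine assumed; the matrix is an arbitrary positive definite example): starting from the Cholesky factor, dividing each column by its diagonal entry recovers the unit lower triangular A, and the squared diagonal entries recover D.

```python
import numpy as np

Omega = np.array([[4.0, 2.0, 2.0],
                  [2.0, 5.0, 3.0],
                  [2.0, 3.0, 6.0]])
P = np.linalg.cholesky(Omega)          # lower triangular, P @ P.T = Omega
d_sqrt = np.diag(P)                    # square roots of the entries of D
A = P / d_sqrt                         # divide each column by its diagonal entry
D = np.diag(d_sqrt**2)
print(A)                               # unit lower triangular
print(np.allclose(A @ D @ A.T, Omega)) # True: the triangular factorization
print(np.allclose(A @ np.sqrt(D), P))  # True: P = A D^(1/2)
```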
From [4.5], this residual uncorrelated with either ¥, oY BLY, ~ O5O¥, — hese ~ MeO, = 0 for] us cis residual is uncoreatea vata ever ¥, oF Y>, meaning that ¥, has the interpretation asthe residual from a linear projection of Yon Y, and Y,. According to [45.11], the linear projection is given by PORT) ~ OOF, + hahaa ~ Aah?) [4.9.12] ‘The MSF of the linear projection is the variance of Y, which from [4.5] is given by dy: or 2 ELYs ~ PCYIYa,¥)P 33 — ih [45.3] 4.5. Updating a Linear Projection 93 Expression [4.5.12] gives a convenient formula for updating a linear projec- tion, Suppore we aic interested in forecasting the value of Ty. Let Y be some initial information on which this forecast might be based. A forecast of Y, on the basis of ¥; alone takes the form POLED = Mn 05'¥, Lat Yq represent some new information with which we could update this forecast. I we were asked to guess the magnitude ofthis second variable on the basis of ¥, POVAY,) = 4,05°Y, Equation [45.12] states that PUYAYn¥,) = POY) + hahah, - POLY) (4.5.14) thus optimally update the initial forecast P(Y|¥,) by adding to it a multiple (isdhz! ofthe unanticipated component of the aew information (¥, — PCY} ‘This multiple (hihi) can also be interpreted asthe coefficient on Ya in a linear projection uf ¥; om 73 and "To understand the nature ofthe multiplier (tyhi"), define the (n x 1) vector Yq) by 4) = By, (45.3) where R, isthe mate given in (44.3). Notice thatthe second-moment matrix of YC) is given by E(((H(1))} = EXE, YY'E}} = E,OK;. ‘But fom [8.4.4] this is just the matrix H. Thus H has the interpretation asthe second-moment matrix of Y(1), Substituting [4.4.3] into [4.5.15]. % Ya ~ MyMie¥, fay = | M05% O05, ‘The frst element of ¥(1) is thus just ¥ itself, while the ith element of ¥(1) for i= 2,3,..- 41s the tesidval frou ¢ projection of Yon ¥4. The matin His ‘thus the second-moment matrix of the residuals from projections of each of the vanables on ¥;. In particular, fy the M92 trom a projection ot ¥; 00 Y,: fn - El¥ POAY)P, tite athe expected pronto this ane wih the errr fram a prajetion of Yoon fog = EilYs ~ POHIY ONY ~ PCHLYD: ‘us equation [4.3.14] states that a near projection can be updated using the following formals POAIaY) = POM) + (EL - POON, ~ PVA (45.1) XELYa ~ POY * 4 - POND 94 Chapter 4 | Forecasting For example, suppose that Y; isa constant term, so that P(Y¥;) i just as the mean of ¥,, wale POY|Y,) = jy, Equation (2.3.10) then states that POG Ia) = ty + Covers, 1a) Vara)" MO ~ Hn) ‘The MSE associated with this updated linear projection can also be calculated from the triangular factorization. From [4.5.5] the MSE from a linear projection of ¥; on ¥, and ¥, can be cakulated frou BLY, ~ YAY YOE 2 hgh hs In general, for i> 2, the coefficient on Y; in a linear projection of ¥, on Ys nd ¥, given by the ith lement of the second column of the matrix A. For aay i> j, the coefficients on ¥, in a linear projection of ¥,on ¥y ¥,-1, 1. Yrs [Bien by the row é, column j clement of A. The magnitude dj gives the MSE for linear projection of Y,on Yiu, Yi-ay ++» + Ys Ape! lication: Exact Finite-Sample Forecasts for an MA(1) ‘As an example of applying these results, suppose that ¥, follows an MA(I) proces: Yen tet Oy where e, is a white noise process with variance 07 and @ is unrestricted. 
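The updating formula can be verified directly from a second-moment matrix. In the sketch below (not from the text; NumPy assumed, the matrix is a hypothetical example with zero means), the projection of Y3 on (Y1, Y2) is computed two ways: directly from the normal equations, and by updating the projection on Y1 with a multiple of the unanticipated part of Y2; the two sets of coefficients, and the two mean squared errors, agree.

```python
import numpy as np

# A hypothetical second-moment matrix for (Y1, Y2, Y3), zero means assumed
Omega = np.array([[2.0, 0.8, 0.5],
                  [0.8, 1.5, 0.9],
                  [0.5, 0.9, 1.2]])

# Direct projection of Y3 on (Y1, Y2)
b_direct = np.linalg.solve(Omega[:2, :2], Omega[:2, 2])

# Updating route: project Y3 and Y2 on Y1 alone, then add a multiple of the
# surprise in Y2, the multiplier being h32/h22.
a31 = Omega[2, 0] / Omega[0, 0]            # coefficient in P(Y3|Y1)
a21 = Omega[1, 0] / Omega[0, 0]            # coefficient in P(Y2|Y1)
h22 = Omega[1, 1] - a21 * Omega[0, 1]      # MSE of P(Y2|Y1)
h32 = Omega[2, 1] - a31 * Omega[0, 1]      # E[(Y3 - P(Y3|Y1))(Y2 - P(Y2|Y1))]
# P(Y3|Y2,Y1) = a31*Y1 + (h32/h22)*(Y2 - a21*Y1); collect coefficients on (Y1, Y2)
b_update = np.array([a31 - (h32 / h22) * a21, h32 / h22])
print(np.allclose(b_direct, b_update))     # True

# MSE of the updated projection
d33 = Omega[2, 2] - a31 * Omega[0, 2] - h32**2 / h22
print(d33, Omega[2, 2] - Omega[2, :2] @ b_direct)   # the two agree
```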
Suppose ve want to forecast the value of ¥, on the hasi ofthe previews m ~ 1 values (Y,, Vago y Fant): Let Yel -») W-a) o anw) wh and let © denote the (n x 7) vaniance-covariance matrix of Y: ie Ca} oa csc tn la = EY" 0 8 LHR 0 (45.7) ey ‘Appendix 4 fo hs chapter shows thatthe wangula factorization of fs Ae (15.1) aa 0 Oo. o ol 6 te 1 te ° ° ac ear ° ° Treee | , Ee eos 1+ ees 45. Updating a Linear Projection 95 D f4s.9] o o 0 ° 2 Lhesere v en ararencerana a a ‘To use the triangular factorization to calculate exact finite-cample forecast, recall that ¥, the ith element of ¥ = A-1Y, has the interpretation as the residual from a linear projection of ¥, on a constant and ite previous values: Bem Yim BOY 0 Yoon VO. ‘The system of equations AY = ¥ can be weitten out explicitly as h=N-w R+he nme Troree ° Te oa + 6) +h ites a G+ es + way De rs errr oe al Solving the last equation for YF, Yq ~ BUY yn Yoon + -6Yi) = Yo ALB ee geenal ” Tae ae Don ~ EY Vaan Yh implying EWM y-a Yan YK) = (45.20) Lt 4 oe + remyy heen = BM alana, Penns Yall “The MSF of this forecast is given by d, Lees o ys om MSELEU Me wYare Wie gegen (6521) 10 intersting to aote th bebavor of his optimal forecast asthe aumber ot observations (n) becomes lare, Fist, sunoos tha the moving aver sentation is inveruble (ol 1) In this case, a8 1, the coeteat [432] tends to @ As e+ ot + oma 1+ eos oD oo while the MSE [4.5.21] tends to 0°, the variance of the fundamental innovation ‘Thus the optimal forecast fora finite aumber of observations [4.5 20] eventually tends toward the forecast rule used for an infinite number of observations [4.2.32]. 96 Chapter 4 | Forecasting Alternatively, the calculation that produced (4.5.20) are equally valid for a ‘noninvertible representation with jy] > 1. Im ths cage the coetticient in [4.2.20} tends toward 0”? OL + Oe ots + ety ft — ey ~ 0 TsO see eT ae" — 9°) =o ‘Tous, the coetfcient in [4.5.20] tends to 8-1 in this case, which is the moving average coefficient associated withthe invertible representation. The MSE s.5.21] tends to 0°62: ol 8 op, wich wil be reconize tom (3771 the variance of the innovation asociated ‘with the fundamental representation. ‘This observation explaine the vse of the expression “fundamental inthis context. The fundamental innovation f, has the property that ¥,~ BUM Year Vion) a [45.22] as m+ = wnere "denotes mean square convergence. Thus when [él > 1, she coefceat@ in the approximation i [3.3.3] should be replaced by @"™. When this 55 done, expression [23.3] will approach the correc forecast asm > It's ao instructive to consider the borderline case § = 1. The optimal Grite- sample forecast for an MA(1) process wth 8 = 1 is seen from [¢.520] tobe given by EOEV aaa Vacae AY) = w+ SHY BlaaPonaeFocoe YI whic, alter recursive sbstiuion, becomes EO aos. Yoctss- sD) wet w-S2qa- 6528] roa wae PL = wd) The MSE of this forevast is given by [45.21} OX + tin oF “The the variance of the foreeastersor again tends toward that af 4, Hence the innovation is again fundamental for this case in the sense of [4.5.22]. Note the contrast between the optimal forecast [4.5.29] and a forecast based on a naive application of [4.3.3], + Ween =m) ~ (Kuan =) + Wan = me OO ~ “The approximation [43.3] was derived under the assumption thatthe moving average representation was invertible, and the borderine case 9 = 1 isnot invertible. For this (45.24) 45. Updating a Linear Projection 97 reason [45.24] does not converge tothe optimal forecast [4.5.23] as m grows large. 
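The exact finite-sample MA(1) forecast can also be coded as a simple recursion in which each residual is the current deviation less a fraction of the previous residual, the fraction being a ratio of partial geometric sums in theta squared. The sketch below is not from the text and should be read as an illustration under that reading of the formulas; NumPy is assumed and the parameter values are arbitrary. A brute-force projection computed from the MA(1) autocovariances serves as an internal check.

```python
import numpy as np

def exact_ma1_forecast(y, mu, theta, sigma2=1.0):
    """Exact one-step-ahead forecast of an MA(1) process from n observations,
    using the recursive residuals implied by the triangular factorization of
    its covariance matrix.  Returns (forecast, MSE)."""
    n = len(y)
    eps_hat = y[0] - mu
    cum = 1.0 + theta**2                 # 1 + theta^2 + ... + theta^(2i), i = 1
    for i in range(1, n):
        coef = theta * (cum - theta**(2 * i)) / cum
        eps_hat = (y[i] - mu) - coef * eps_hat
        cum += theta**(2 * (i + 1))
    forecast = mu + theta * (cum - theta**(2 * n)) / cum * eps_hat
    mse = sigma2 * (cum + theta**(2 * (n + 1))) / cum
    return forecast, mse

# Cross-check against a brute-force projection using the MA(1) autocovariances
rng = np.random.default_rng(2)
theta, mu, n = 0.8, 5.0, 6
e = rng.normal(size=n + 1)
y = mu + e[1:] + theta * e[:-1]
fc, mse = exact_ma1_forecast(y, mu, theta)

gam = lambda j: (1 + theta**2) if j == 0 else (theta if abs(j) == 1 else 0.0)
G = np.array([[gam(i - j) for j in range(n)] for i in range(n)])
g = np.array([gam(1)] + [0.0] * (n - 1))              # cov(Y_{n+1}, Y_n), ...
alpha = np.linalg.solve(G, g)
fc_direct = mu + alpha @ (y[::-1] - mu)
print(np.isclose(fc, fc_direct), np.isclose(mse, gam(0) - alpha @ g))
```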
When @ = 1, ¥, = + e+ 6-1 and [4.5.24] can be waiten a5, He Gyan rt) = non t Fs) T (Gans t+ Ean) 1 + 6) = wt ee +e ‘The difference between this and ¥, the value being forecast, is ¢, — (—1)"ea, which has MSE 20? for all n. Thus, whereas [4.5.23] converges t0 the optimal forecast as n+, [4 5 2] des nat Block Triangular Factorization Suppose we have observations on two sets of variables. The fret set of var- bless collected in an (n X 1) vector ¥, and the second st in an (ny X 1) vector Ys. Their socond-moment matrix ean be written in partitioned form as feo eon) _ fin ane ol ren roe)" (man) where isan (ms ma) mate, Qo is an (is % mo) matrix, and the (ns mo atic the transpose of the (ty Xm) matrix Dy ‘We can pt zeros inthe lower let (ne Xm) Bork of £ hy premoliplying by the following matric If 9 is premultipied by B, and portmultiptied by B, the recut is (ow CES SHE (45.25) _ [in ° “(6 on - 040504] Define i Lae Bat Lai n] 11 [45.25] is premultipied by A and postmultipied by A’, the result is Oy My 1, 0 [os 0 | a [outs wl 5 [Ms vite Citic a Lo o,-a,0;%0,] Lo 1, | 8° = WDM. ‘iss similar to the triangular factorization {2 = ADA’, except that Dis a block- iagonal matrix rather then a truly diagonal matrix ma [t D2 on io, 98 Chapter 4 | Forecasting As in the ear case, D canbe interpreted as the second-moment matrix of the vector f= A", K)_fom ody ¥J ~ |-asos: 1.) Ly. thatis, fy = ¥, and ¥, ~ ¥, ~ 0 051%, The th clement of fis given by Va minus ‘a linear combination of the elements of ¥,. The block-dagonality of B inpics vie die produc of any element of ¥, win any element ot Y, nas expectation zero. Thus 2,0; gives the matrix of coefficients associated with the linear projection of the vector Y, on the vector Ys. PAIN) = MAA, asz7) 1 claimed in [4.1.23]. The MSE macrx assoclated with this linear projection is Biv ~ Polya: ~ Porgy)? = Edt) sc fas 7 = Oy ~ 940510, as claimed in (6.1.24) ‘The calculations for a (3 © 3) matrix similarly extend to 9 (3 3) block ratix without complications. Let Yy, Ys, and Ys be (1: © 2, (ts % 1), ad (ty © 1) vectors. A block-trongularfctoietion of thei sccond-momeat matin is obtained from a simple generalization of equation [4.4.13]. 1% % O41 TL. 0 07 My Oy |= 9.05! 1, ‘| lan My Mss, 25,2," Myglty,! A, [65.29] ae ° 1, DFM, OFM) x] 0 He ° Ciianianens Grin Oiti Roy i RaeSe'eA | Oia Oma Ia (Udag ~ Mab !E2p), Hay = (Why ~ ty Kt), and Hay = Hy = (Oey = 94050) ‘This allows us to generalize the earlier result [4.5.12] on updating a linear projection The optimal forecact of Y, conditional on ¥, and ¥, can be read off the last block row of A: POGI¥RY)) = O05 % + HoHs'(¥2 ~ O05'Y,) = POY) + Haltaly, — Poy, AS) where Hy = BX, ~ PORN: ~ POLY? My — BAX — PAN ITIN2 ~ PevaNOT. ‘The MSE of this forecast isthe matrix generalization of [4.5.13], E((¥, — PONY, YaII¥s ~ POI Va¥)I} = Hos — Hala", [4.5.31] 45. Updating a Linear Projection 99 where Hy = E((¥s — POIs ~ Posy Law of Iterated Projections ‘Another useful result, the law of iterated projections, can be inferred im- mediately from {4.5.30}. What happens if the projection P(Y,Y.,¥,) is itselt fo the simple pojecon of ¥ 00 Yi. PrpCHys ¥OINd Poly (652) ‘Yo venty ths claim, we aeed to show that te eiterence petween P(Y3)¥n¥,) and P(W.j¥,) is uacorelated with Y, But rom [&.5.30), this ference i given by Poa.) ~ PonN) Hatin'(2 ~ FORD), . Optimal Forecasts for Gaussian Processes “The forecasting rules developed ia this chapter ae optimal within the cast of linear functions of the variables on which the forecast i based. 
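The block versions of these formulas are matrix one-liners. The following illustrative sketch (not from the text; NumPy assumed, second-moment matrix hypothetical, zero means) computes the coefficient matrix Omega21 Omega11^(-1) and the MSE matrix of a block projection, and then checks the law of iterated projections numerically: projecting the projection of Y3 on (Y1, Y2) back onto Y1 alone recovers the projection of Y3 on Y1.

```python
import numpy as np

def block_projection(Omega, n1):
    """Linear projection of the second block Y2 on the first block Y1 when
    E[Y Y'] = Omega, Y = (Y1', Y2')', and Y1 has length n1 (zero means).
    Returns the coefficient matrix Omega21 Omega11^{-1} and the MSE matrix."""
    O11, O21, O22 = Omega[:n1, :n1], Omega[n1:, :n1], Omega[n1:, n1:]
    coef = O21 @ np.linalg.inv(O11)
    mse = O22 - coef @ O21.T            # Omega22 - Omega21 Omega11^{-1} Omega12
    return coef, mse

Omega = np.array([[2.0, 0.8, 0.5],
                  [0.8, 1.5, 0.9],
                  [0.5, 0.9, 1.2]])
coef, mse = block_projection(Omega, 2)
print(coef, mse)                        # projection of Y3 on (Y1, Y2) and its MSE

# Law of iterated projections: projecting P(Y3|Y1,Y2) on Y1 alone recovers P(Y3|Y1)
b = np.linalg.solve(Omega[:2, :2], Omega[:2, 2])     # coefficients on (Y1, Y2)
lhs = (b[0] * Omega[0, 0] + b[1] * Omega[1, 0]) / Omega[0, 0]
print(np.isclose(lhs, Omega[2, 0] / Omega[0, 0]))    # True
```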
For Gaussian processes, ‘we can make the stronger claim that as long as @ constant term is included among. ‘ut to have linear form and thus is given by the linear projection “ro vey ths, tet be a (m % 1) vector with eal yy and an (mg % 1) ‘vector with mean wa, where the variance covariance matrit i given by [ee 1 gts = my oe = wham] Ou. Oo) E(¥2 — wa)(¥) — wa) ECW: — was — pa)’ nu Mn, ICY, and ¥, are Gaussian, then the joint probability density ic pF ee funtw = arama, afl (864 x exo{ Zl ~ a Os ~ sorte Be] [= wT] The aver fis ey found by inverting (45254 ~Hi"thn | Oa ° | me ET oe-ocece.] #6 ° 7" Ih, Likewise, the determinant of € canbe found by taking the determinant of 4526) 0] = (Al -- . 100. Chapter | Forecasting Tt Xie lower triangular mate. te determinant i therefore given by the product of terms along the principal diagonal, all of which are unity. Hence [Al = 1 and | ~ [os 2, 9, My Mr ° = M0510, [4.63] 10) ° [Mey ~ O05; Substituting [4.6.2) and [4.6.3] into [4.6.1], the joint density can be written faye %) 1 ae as Tae IOI He = 95.807°0h xexo{-J tos wy os aon ['y 8] PP og etosna) Leesan adE 2) SH! Hyg — Dl x evo -fl0. 44)" Os - my] [464] ef 98 Gi a) [u-#]]} LO (Wy Ualirttay J Ly = mJ) 1 2 Ag ~ 0,42 = Ga OI Ma ~ Mag) x on{-F.0s = wy M51, - H) = $02 = my = 95.05'0.3)-45 ~ »} where me yo + Oaths ~ we 05) ‘The conditional deat of Ys given Yi found by dividing the joint density (464 by the marginal dens: fut) = arial en -J.65 = wy ORM — »} ye yin roan orm as JM wow J 1 upp Tua EAE ty EE the pina agonal, Wate, ~ 04.0; 35M My". Tea O = MIM", where fs] o-B OM, & ‘Thus 1 hs the sae determiant a J. Beene J upper tangle, ts determinate product ‘oftems along th picpl diagonal, o | = |) Heaee [|= (J - |, ~ 005. 46. Oniimal Forecasts for Gaussian Processes 101 “The result af thie division is Sean(riod) ~ Suter 1 el Ln - myH“'G, - apa en[ 30 ~ my »). Wen, - 0,0;50, beat In other words, YAY, ~ M(m, H) ~¥(bms + 9505701 ~ wd [a ~ Ma05'm,3). 7 ‘We saw in Section 4.1 that te optima unrestricted forecast is given oy te conditional expectation. For a Gaussian proces, the optimal forecast is thus E(YAY:) = ta + On Oi" — 1). On the other hand, for any distribution, the linear projection of the vector ¥3 on A vector Y, and a constant term is given by EQN) = wa + MOR, ~ Hh). Hence, for a Gaussian process, the linear projection gives the unrestricted optimal "7. Sums of ARMA Processes ‘This section explores the nature of series that result from adding two different ARMA processes together, beginning with an instructive example. Tim af an MA() Procese Plus White Nov Suppose that a series X, follows a zero-mean MA(1) process Xe ut Bs, (74) where u, is white noise jo tory Fuad = (Foe The sutocoveriances of X 6 thus (1+ 803 forj= 0 EAA) = 4 008 torj= 2 1372 ° ashersice Let indicate a separate white noice esis: {et fori=0 102 Chapier¢ | Forecasting Suppose, furthermare, that v and u are uncorrelated at all leads and lags E(u.) =0 forall}, implying E(Ky,.) = 0 forall (47.4) Let an observed series ¥, represent the sum of the MA(1) and the white noise process Xen Fy + Biya + Ye ‘The question now posed is, Waat are the ume series properties ot ¥? ‘Clearly, ¥,has mean zero, and its autocovariances can be deduced from [4.7.2] through [47.5] 75] EUV.) = ECR, + VQ +H) (1+ ft +08 fori = 0 (5.7. { forj = 21 3 oben “Thus, the sum X; + ¥ je covariance stationary, and its autocoveriances are Eero beyond one lag, as are those for an MA(1). 
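The factorization of the joint Gaussian density into a marginal density for Y1 and a conditional density for Y2 given Y1 can be confirmed numerically. The sketch below is illustrative only; it assumes NumPy and SciPy are available (scipy.stats.multivariate_normal is used for the density evaluations), and the means, covariance matrix, and evaluation point are hypothetical.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Partitioned Gaussian: Y1 has two elements, Y2 is a scalar.
mu1, mu2 = np.array([1.0, 2.0]), np.array([3.0])
Omega = np.array([[2.0, 0.8, 0.5],
                  [0.8, 1.5, 0.9],
                  [0.5, 0.9, 1.2]])
O11, O12 = Omega[:2, :2], Omega[:2, 2:]
O21, O22 = Omega[2:, :2], Omega[2:, 2:]

y1, y2 = np.array([1.4, 1.7]), np.array([3.3])

# Conditional distribution of Y2 given Y1 = y1
m = mu2 + O21 @ np.linalg.solve(O11, y1 - mu1)          # conditional mean
H = O22 - O21 @ np.linalg.solve(O11, O12)               # conditional variance

# Check the factorization f(y1, y2) = f(y1) * f(y2 | y1)
joint = multivariate_normal(np.concatenate([mu1, mu2]), Omega).pdf(np.concatenate([y1, y2]))
marg = multivariate_normal(mu1, O11).pdf(y1)
cond = multivariate_normal(m, H).pdf(y2)
print(np.isclose(joint, marg * cond))                    # True
```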
We might naturally then ask whether ‘here exis a oro-meas MA(I) representa with soma (Mise 0 otherwise, ‘whose autocovariances match those implied by [4.7.6]. The autocovariances of [4.7.7] would be given by (Q4 8)? forj=0 Fen.) = jo forj= 21 ° otherwize set to be vonsstent with [4.7.6], i mould have Co be de ease that (1+ oe? = (1+ Hot + oF 78) and 40% = B02. 79] Equation [4.7.9] can be solved for 0%, o* = boi/0, 147.101 and then substituted into [4.7.8] to deduce (1 + 0604/9) = (1 + 8 e3 + oF (1+ 68 = [0 +) + (BaD 80 — [0 + 89) + (otoD + 8 = 0. (era) 4.7. Sums of ARMA Processes 103 For given values of 8, 02, and 02, two values of 6 that satisty [4.7.11] can be found from the quadratic formula: (1 + 8) + (odo VIF OTF CODE = ° x (47.22) If 3 were equal to zero, the quadratic equation in [4.7.11] would just be BO (1+ 0+ 8 = H- HO- B= 0, [4.7.13] ‘whose solutions are # = 6 and é = 8~*, the moving average parameter for X, from the invertible and aoninvertible representations, retpectively, Figure 4.1 graphe ‘equations [4.7.11] and [4.7.13] as functions of 6 assuming postive autocorrelation for X, (6 > 0) For 0 > 0 aad of > 0, equation [4.7.13] is everywhere lower that [4.7.13] by the amount (03/¢2)6, implying that [47.11] has two real solutions for 6, aa invertible solution ¥* saustying 0< [orl ) HH Oy + Og = Oy Hoo) ‘The series e, defined in [4.7.16] isa distributed lag on past values of w and v, so it might seem to possess a rch autocortelation structure. In fact, it turas out to be (47.16) era FIGURE 4.1 Graphs of equations [4.7.13] and [47.11] 104 Chapter 4 | Forecasting white noise! To see this, note from [4.7.6] that the autocovariance-generating function of ¥ ean be written a) so thatthe uutocovaris 1+ axjoi{t + 824) + of, (oat co generating function of ¢, = (1+ 6°L) *Y,18 (1+ anoitt + 8e-¥) + ot C+ eat ey” ‘But 0* and o*? were chosen so as to make the autocovariance-generating function of (1 + 6*LJe, namely, 82) = (67.18) Ut ory ht side of 4.7.17]. Thus, [4.7.18] simply equal +o), deatival to the Bee) = 0, To summarize, adding an MA(1) process toa white noise series with which itis uncorelated ot ll ead and lays proces a ew MA() process chaasiesized by [4.79] Note thatthe series i [6.7.10] could not be forecast as a tnear function of lagged cor of lagged ¥. Clearly, could be forecast, however, on the bass of lagged u or lagged v. The histories {u} and {v} contain more information than {e} cor {¥.. The optimal forecat of ¥,.,0n the basis of (¥, ¥,-w.--) wold he ROW. Yo oe, with associated mean squared error o*?. By contrast, the optimal linear forecast fof Yip, 0m the Basis OF {Uy toss + + 4 Yn Year +} Would be EC esalte teas ++ Ve Vents +) By with associated mean squared error o3 + o2. Recalling from [4.7.14] that |6*| < |a), it appears from [4.7.9] that (8°4)o"2 < S203, meaning from [4.7.8] that o# > 2 + 2. In other words, past values of ¥ contain less information than past values ‘This example can be useful for thinking about the consequences of differing Information sets. Oue can always ake a seusible forevast on the basis of what fone knows, {¥,,¥;- ---}, though usually there is other information that could have Nelpea more. Aa important feature of such settings i8 that even though, uu, and ¥, are all white noise, there are complicated correlations between these white noise series. 
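Solving for the implied MA(1) parameters of the sum is a one-line application of the quadratic formula. The sketch below (not part of the original text; NumPy assumed, parameter values illustrative, and delta taken positive for simplicity) picks the invertible root and verifies that the implied autocovariances reproduce those of the sum.

```python
import numpy as np

def ma1_plus_noise(delta, sig2_u, sig2_v):
    """Invertible MA(1) parameters (theta, sigma^2) for the sum
    Y_t = u_t + delta*u_{t-1} + v_t of an MA(1) and independent white noise,
    found by matching autocovariances (delta > 0 assumed for simplicity)."""
    a = delta * sig2_u
    b = (1 + delta**2) * sig2_u + sig2_v
    theta = (b - np.sqrt(b**2 - 4 * a**2)) / (2 * a)   # root inside the unit circle
    sigma2 = delta * sig2_u / theta
    return theta, sigma2

theta, sigma2 = ma1_plus_noise(delta=0.6, sig2_u=1.0, sig2_v=0.5)
print(theta, sigma2)
# The implied autocovariances reproduce those of the sum:
print(np.isclose((1 + theta**2) * sigma2, (1 + 0.6**2) * 1.0 + 0.5))  # gamma_0
print(np.isclose(theta * sigma2, 0.6 * 1.0))                          # gamma_1
```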
‘Another point worth noting is that all that can be estimated on the basis of (¥)ate the two parameters 6° and a, whereas the true “structural” model 4.7.5] ate unidenified inthe sense ia which econometicians use this term —there exists 4 family 0f alternative configurations of 8, 22, and o3 with [aj < 1 that would produce the identical value for the liketibood function ofthe observed data {Y}. “The procesies that were added together for this example both had mesn 2270. ‘Adding constant terms to the procestes will not change the results in ay interesting ‘way—if Xs an MA(D) process with mean andi is white noise plus a constant dr thea 2, +, will be an MA(2) process with mean given by poy + joy Thus, ‘nothing is lost by restricting the subsequent discussion to sums of zero-mean presses. 47. Sums of ARMA Processes 105 Taidng Two Moving Average Processes Suppose neat that X, is a zero-mean MA(q,) proces: X= (1+ hb + Lt + + BLO) = ALU, win ot Fu.) = {2 Let W, be a zero-mean MA(q,) process: Wyn (14 mL gl? + oo + mgt an 2 forj=0 QW ofthe same basic structure. Assume that X and lags ECW.) <0 forall autocovariances 7, Ware uncorrelated wilh each other at all leads and suppose we observe Y= X+W Define ¢ to be the larger of gy oF gs: qm max{q,, a3) ‘Then the jta autocovariance of ¥ is given by EQN.) — BR + WOR) + Head ‘Thus the autocovariances are zero oeyons q lags, Suggesting taat ¥, migat be represented as an MA(a) process ‘What more would we nced to show to be fully convinced that ¥, is indeed an MA(g) process? This question can be posed in terms of autocovariance-gen- erating functions, Since Wet tm 4 follows at Dante = Stet Sores But these are jst the definitions ofthe respective autocovariance generating fune srl) = ax(0) | suo) (6709) Equation [4.7.19] i 2 quite general result—if ane adds tagether two eavariance- stationary processes that are uncorrelated with each other at all leads and lags, the 106 Choper¢ | Forecaxing autocovariance-generating function of the sum is the sum ofthe autocovariance- generating functions ofthe indvidual sens. EY, sto be expressed as an MA(a) process, Yom (14 OL + GL? + + LDR, = OLDE, with je tory =0 Feed = {eile then its autocovariance-generatng function would be Bele) = 6(2)6(2"1)0%, ‘The question is thus whether there always exist values of (0,8, «yy 22) such that [47.10] ie eat 8(2)6(2"*o? = 8(2)8(2 "Nod + x(2)K(2 03 (4.7.20) Te turns out that there do. Thus, the conjecture turns out to be correct that if two roving average provenses diet aie wiwunclaied with caus vies at aif Feady aut lags are added together, the result is anew moving average process whose order ss the larger of the order of the orginal two senses: MACg) + MACas) = MA(maxlas, 43) at] A proof of this assertion, along with a constructive algorithm for achieving the factorization in [4.7.20], will be provided in Chapter 13. Suppose now that X, and W, are two AR(1) processes: (x, = 4, (4.7.22) (= pL), = ¥ (4.7.23) where u, and v, are each white noise with u, uncorrelated with v, forall and +. ‘Again suppose that we observe YaX +, ‘and want to forecast ¥,., on the bass of its own lagged values. If, by chance, ¥ and W share the same autoregressive parameter, or reo, ten [47.2] coud simply be added directly to [4.7.23] to deduce (= alk, + (0 2l)W, =u, +, or But the sum u, + vis white noise (asa special case of result (47.210). 
meaning that Y, has an AR(1) representation = Y= 5 In the more likely caze thatthe autoregres ferent, thea [4.7.22] can be multiplied by (1 ~ pL): (1 = pl(t ~ #0), = (= plus (4.7.24 parameters w and p are dif. 47. Sums of ARMA Processes 107 and similarly, [4.7.23] could be multiplied by (1 — 1): (1 = wL)(L ~ pLyW, = (1 ~ mL), (4.7.25), ‘Adding [4.7.24] to [4.7.25] produces (1 ~ phyQd ~ aby(X, + W) = = ply + = aE)y, [4.7.26] From [4.7.21], the right sde of [47.26] has an MA(1) representation. Thus, we could write (= OL ~ GLY, = (1 + O)e,, where (db ol) - ply — wt) and (1 + Ob)e, = (1 - plu, + (1 - aly, In other words, AR(1) + AR() = ARMAQ, 1) [s727) ‘In general, adding an AR(p,) process HO), = tan AR(D») process with which i is uncorrelated at al leads and lags AL), = Ys produces an ARMA(p, + ps, max{p,, p)) process, DY, = KLE, wheie HL) ~ mL)olL) and HL)es = aL uct w(L)y, q . Wold’s Decomposition and the Box-Jenkins (odeling Philosophy Wold’s Decomposit All of the covariance-stationary processes considered in Chapter 3 can be written inthe form nm Yen S bun (8.1) Where , is the white noise error one would make in forecasting Y, as a linear Tunction of lagged Y and where 27-99} <® with Up = 1. ‘One might think that we were able to write all these processes in the form of [4.8.1] because the discussion was restricted to convenient class of models, ‘However. the following result estahliches that the representation [4.8.1] i in fact fundamental for any covariance-stationary time series. 108 Chapter 4 | Forecasting Proposition 4.1: (Wold’s decomposition). Any zero-mean covariance:stationary process ¥,ean he represented inthe form ie Soon tena Bey Bee enone a a a (4.8.31 The value of x, is uncorrelated with ¢,., for any j, though x, can be predicted ‘arburany well jrom a tnear funcnon of past values of ¥: w= BG Mon Kear) “The cerm x; sealed dhe lnerly deterministic component of Y., while Zj=0 Vf) is called the linearly indeterministic component. If x, = 0, then the process is called ‘purely linearly indeterminisi ‘This proposition was fst proved by Wold (1938).* The proposition relies on stable second moments of Y but makes no use of higher moments. It thus describes ‘only optimal Hinear forecasts of ¥. Finding the Wold representation in principle requires fitting an infinite num- ber of paratnctets (ty Yay - ) 0 die data, With « finite uunber of observations ‘on(Y, Ys,. - » ¥z), this will never be possible. As practical matter, we therefore need to make some additional assumptions about the nature of (Ji, Ba, -- -)- A typical assumption in Chapter 3 was that Y(L) can be expressed as the ratio of two Snite-order polynomials: Sy ME) APOE Et ee BWU gayi mane alee a “Anather approach, based on the presume * of the population spec- trim, wil Be explored im Chapter, (484) ‘smoothne The Box-Jenkins Modeling Philosophy Many forecasters ate persuaded ofthe benefits of parsimony. oF using as few Parameters as possible. Box and Jenkins (1976) been influential advocates of Uhis ew, They noted tat in pactic, analyse end op replacing the true operators %(L) and 6(L) with estimates &(L) and 6(L) based on the dat. The more param- tiers to eximate, the more room there to go wroug. 
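The ARMA(1, 1) representation of the sum of two independent AR(1) processes can be constructed explicitly: the autoregressive polynomial is the product of the two AR(1) polynomials, and the moving average parameter is found by matching the autocovariances of the composite error. The sketch below is illustrative only (NumPy assumed, parameter values hypothetical).

```python
import numpy as np

# Sum of two independent AR(1) processes, (1 - pi*L)X = u and (1 - rho*L)W = v:
# Y = X + W satisfies (1 - pi*L)(1 - rho*L)Y = (1 - rho*L)u + (1 - pi*L)v,
# and the right-hand side is an MA(1).
pi_, rho = 0.9, 0.4
sig2_u, sig2_v = 1.0, 2.0

ar = np.convolve([1.0, -pi_], [1.0, -rho])        # (1, -(pi+rho), pi*rho)

# Autocovariances of the composite error (1 - rho*L)u_t + (1 - pi*L)v_t
g0 = (1 + rho**2) * sig2_u + (1 + pi_**2) * sig2_v
g1 = -rho * sig2_u - pi_ * sig2_v
d = np.sqrt(g0**2 - 4 * g1**2)
theta = min(((g0 + d) / (2 * g1), (g0 - d) / (2 * g1)), key=abs)  # invertible root
sigma2 = g1 / theta
print(ar)                 # AR(2) lag polynomial coefficients
print(theta, sigma2)      # MA(1) coefficient and innovation variance
```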
‘Ahough complcsted modes can tack the data very well over the historical period for which parameters sre esimate, they fen perform peoly When ised fr ‘out-of-sample forecasting, For example, the 1960s saw the development of a number aflarge macroeconomic models purporting to deseibe the econo sing tadeeds ofmacroeconomic variables and eouations Part ofthe disionment with sch efforts was the discovery that univariate ARMA models with smal values ofp or ¢ often produced better forecaets than the big models (ee for example Neloa, 172). Ae te shall seein later chapter, large se lone was hardly the only Habit ofthese leigestale masoevonwueue models Evea so, the dain tha ples move provide more robust forecasts has a great many beevers across deine. ‘ee Sargent (987, pp. 285-90) for sc st ofthe intaiton bend tires. "For mote recent pesmi evidence about arent are eae model, ee Astley (58), 48. Wolé's Decomposition and the Boxlenkins Modeling Philosophy 109 ‘The approach to forecasting advocated by Box and Jenkins can be broken down into four steps: (0) Transform the date, if accesary,s0 thatthe assumption of covariance: stationarity is a reasonable one. (2) Make aa initial guess of small values for p and q for an ARMA(p, g) model that might describe the wansformed serie. (@) Estimate the parameters in 4(L) and 0(L). (4) Perform diagnostic analysis to confirm thatthe model is indeed consistent with the observed features ofthe data. ‘The first step, selecting a suitable transformation of the data, is discussed in Chapter 15. For now we merely remark that for economic series that grow over time, many researchers use the change in the natural logarithm of the raw data For example, if X, isthe level of real GNP in year ¢, then Y¥, = log X, ~ tor X,-1 148.51 might be the variable that an ARMA model purports to describe. ‘The third and fourth steps, esumation an diagnostic testing, wil be discussed in Chapters 5 and 14. Analysis of seasonal dynamics can also be an important part of step 2 of the procedure; this is briefly discussed in Section 6.4. The remainder of this section is devoted to an exposition of the second step in the Box-Jenkins procedure on aonseasonal deta, namely, selecting candidate values for p and q.* ‘An important part of this selection procedure isto form an estimate 6 ofthe population autocorrelation g,, Recall that p, was defined as A= Hm where y= EC, ~ a... = a). A natural estimate of the population autocorrelation p, is provided by the corresponding sample moments: > ito where BF Wry — I) for} ~ 0,152, 1 486) = T. a Ce I= Td (687) Tm Note that even though only T — j observations are used to construct 4, the enominator in [4.8.6] is Prather thau Tj. Thus, for lage j, expression (4.8.6) shrinks the estimates toward zero, a8 indeed the population autocovariances go 0 ‘ero a5 j=, assuming covanancestationanty. Also, the full sample of abser- ions is used to construct F. ox aod Jes refer oi sep identicaton” of the sppropriae mael, We woid Bot and ‘Jean's terminology, brane “Meaieation” basa que diletent mening for cconometicas 110 Chapter ¢ | Forecasting Recall that if the data really follow an MA(a) process, then p, will be zero tor > g. By contrast, i the data follow an AK(p) proces, then p will gradually decay toward zero as a mixture of exponentials or damped sinusoids. One guide for distinguishing between MA and AR representations, then, would be the decay properties of 9, Often, we are interested in 2 quick assessment af whether forj = q + 1,q +2,.... 
Ifthe data were really generated by a Gaussian MA(q) process, then the variance of the estimate could be approximated by? vagym ei 2S ct} trrmartar tie. (8s) “Thus, in particular, if we suspect thatthe data were generted by Gaussian white nusey den f for say j # O should ie between =2/VT about 95% of We time. ‘a gencal, if there is autocorrelation in the proces that generated the orgial data (¥), then ihe estimate f wil be correlated with f for # j.° Thus paterns inthe estimated f, may repeesat sampling error rather than pateas in the rue p, Partial Autocorrelation ‘Another useful measure isthe partial autocorrelation. The mth population partial autocorrelation (denoted af) is defined as the last oneficent in a linear projection of Y on its m most recent values (equation [4.3.7]: Pragya CY, — a) + a8 (ins — M) +20 + a Yeas ~ ‘We saw i equation [4.3.8] hat the vector a can be calculated from ay n maa ” a Ym-1 Ym=a fl ‘Recall that ifthe data were rally generated by an AR(p) process, only the p most recent values of ¥ would be useful for forecasting. In this case, the projection coefficients on Ys more than p periods inthe past are equal to zero: form=p + iprd By contrat, ifthe data really were goucrated by on MA(g) process with ¢ = 1, then the partial autocorrelation a{~? asymptotically approsches zero instead of ‘ulkng off abruptly. ‘A natural estimate of the mth partial autocorrelation isthe last coefficient in an OLS regression of y on a constant and its m most recent values: py ata na gg Yor er an AR(p) process, then the Semple estimate (4) would have a variance eround ‘he tue value 0) thet could be approximated Uy"? Var(Sig)) = form — p+ tip +2, See Box and Jenkins (1516, p38) "Again. ee Box ad Jenks (57. p38). "Box ad Jeans (196, p 5). 48. Wold's Decomposition and the Box-Jenkins Modeling Philosophy 111 Moreover. ifthe data were really generated by an AR(p) process, then 6? and i? would be asymptotically independent fr i,j > p. Example 4.1 We lutate the Box-Jenkins approach with seasonally adjusted quarterly data fon US, real GNP from 1947 through 1988. The raw data (x) were converted {0 tog changes (y) in (48.5). Panel (6) of Figure 4.2 plots the sample futocorrelations of y (f for j = 0, 1,--- 20), while panel (b) displays the Sample para! autocoreiaons (a for m =", 1, + 20). Niweiyrine percent confidence bands (22/7) ae potted on bath panels; for panel (a), these are appropriate under the null hypothesis that she data ae really white ‘aise, herent for panel (D) these are appropriate if the data are really gen ‘rated by an AR(p) process for p Tess than m. TE ° we . (2) Sample autocorrelations in Loe td * (b) Sample partial autocorrelations FIGURE 4.2 Sample autocorrelations and partial autocorrelations for U.S. quar- tery ceal GNP growth, 1947-110 1980:1V. Ninety Sve percent confidence interval ae ploted s = 2/7. 112 Chapter 4 | Forecasting ‘The first two autocorrelations appear nonzero, suggesting that q@ = 2 would be needed to describe these data as coming from a moving average process. On the other hand. the pattera of autocorrelation appears consistent ‘withthe simple geometric decay of an AR(1) process, ane ‘with g & 0.4. The partial autocorrelation could also be viewed as dying out ater one lag, also consistent wath the AKL) nypotness. 1nus, one's intial ‘guess for a parsimonious model might be that GNP growth follows an AR(1) process, with MA(2) as another possiblity to be considered. APPENDIX 4.A. 
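Sample autocorrelations and partial autocorrelations of the kind plotted in Example 4.1 are simple to compute. The sketch below is not from the text; it assumes NumPy, simulates an AR(1) purely for illustration, estimates the autocorrelations with the full-sample mean and a divisor of T, obtains each partial autocorrelation as the last OLS coefficient in a regression on a constant and m lags, and reports the rough two-standard-error band 2/sqrt(T).

```python
import numpy as np

def sample_acf(y, nlags):
    """Sample autocorrelations, using the full-sample mean and dividing each
    autocovariance by T."""
    y = np.asarray(y, dtype=float)
    T, ybar = len(y), np.mean(y)
    dev = y - ybar
    gamma = np.array([np.sum(dev[j:] * dev[:T - j]) / T for j in range(nlags + 1)])
    return gamma / gamma[0]

def sample_pacf(y, nlags):
    """Sample partial autocorrelations: the last OLS coefficient from a
    regression of y_t on a constant and its m most recent lags, m = 1..nlags."""
    y = np.asarray(y, dtype=float)
    out = []
    for m in range(1, nlags + 1):
        X = np.column_stack([np.ones(len(y) - m)] +
                            [y[m - i:len(y) - i] for i in range(1, m + 1)])
        coef, *_ = np.linalg.lstsq(X, y[m:], rcond=None)
        out.append(coef[-1])
    return np.array(out)

rng = np.random.default_rng(7)
y = np.zeros(400)
for t in range(1, len(y)):                      # an AR(1) with phi = 0.4
    y[t] = 0.4 * y[t - 1] + rng.normal()
band = 2 / np.sqrt(len(y))                      # rough 95% band under the null
print(sample_acf(y, 5), sample_pacf(y, 5), band)
```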
Parallel Between OLS Regression and Linear Projection ‘This appendix discusses the parallel between ordinary least squares regression and linear ‘onstueted 0 as 0 have population moment identical to the sample moment of pertculat ‘mpl. Say that in some parila sample on which we intend to performs OLS we have ‘observed T particular values forthe explanatory vector, denoted ym. Xp. Consider sn arial lhceetctned randory variable € hate take on only re theese Tralues, each with probably (IIT) Ag - a) UT Pig = x) = UT niga ays ur “Thon & i an aiialy cost random wre whe popuin probaly Ahi Baton i gen bythe exp! dstrbuion function of The population meat of the random variable Bis Be) = Dxpe= ahd “Thus he populiton mean of quale aserved simple met ofthetrrandom variable The popiton second moment of € ECE) = FE ex. BAW) ‘hich the ample second moment of (= 7): Sin canemitany concn a eri ia vara that cant ono of the gaerete aes Pm <->» Frei). Spee tat the Jot tbaton of «and Es gen PIE= x. = Youd UT fort 1,2,...,7. Thea Fite) = 2S are taal ES ean neo So Ew e'gy £3 Om. a0! (As) “Thi alpstricaly te same problem as cocing B10 8 to miinize (41.17, Ths, inayat sures regreson (oon B 0 to minimise (6117) can be ned as 1 Spedial case of linear projection (choosing a0 as to minimize (4.A.3)). The value of @ Appendix 4.A. Parallel Benween OLS Regression and Linear Projecion 113 that minimizes (4.A.3}can be found from substituting the expressions forthe population ‘moments uf te arts endo varables equations ¢.A.1] and [%.A2) iat the formula {ra linear projection (equation (4.1.1) Jes ax] [£3 xa). “Thus the formula forthe OLS estimate i [41 18] can he abtined nem special cacao the fora forthe linear projection coefficient ain (61.15) Tecaute linear nrnictions and OLS regressions share the same mathematical struc tue, statements about one have a parallel in the other. Tis can be a useful device for remembering results orconfsming algebra. For example, the statement about population moments, 1g weaenr2tes) = [33 BOP) = Vac) + 1E0DP. waa tas the sample analog ignite PLT Ty Ont OF (a5) wit F = ar "Aza seond example, suppose that we exits a sere of OLS regesions, with ‘the depenent variable forthe fh repesion and x» (k % 1) vecor of explaatry Ymabssommon oeoch epeson, Lal, (jn Yor" ss Ya an Wate te Fepesion model st yas +a, for 1 an (n x k) matrix of regression coefficients. Then the sample variance-covariance matrix ofthe OLS residuals can be inferred from (4.1.25) 1S ae [LS yy] [1 E FRM Re] PS 2 Bx aad the sow of Ba ven by «-((hiee] Bo) whcte B. Triunguur Fucivrigaiion Othe Covariance Mairi Yor an MACH) Process Ths appendix eabishes thatthe anlar actretion of in S.17] gen by [65.18] sad (4240) "The magnitude o* is simpy a constant term that wil end up mulipiyng every tem fn the D atts Recognizing ths, we can inital sve the fadorization suring that = 1 and then muleply the resulting D matrix by oto obtain the result for the general fase. The (1,1) element of D (ignoring the factor 0° is given bythe (1,1) clement of O: dd = (1+ @). 
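The parallel between OLS and linear projection can be seen in one short computation: the OLS coefficient and the projection coefficient of the artificial discrete random variable are both the same ratio of moment matrices, since the 1/T factors cancel. The sketch below is illustrative only (NumPy assumed, simulated data).

```python
import numpy as np

# The OLS coefficient equals the linear-projection coefficient of an artificial
# random variable that puts probability 1/T on each observed (y_t, x_t):
# both equal [sum x x']^{-1} [sum x y], the 1/T factors cancelling.
rng = np.random.default_rng(8)
T = 100
X = np.column_stack([np.ones(T), rng.normal(size=T)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=T)

b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
b_proj = np.linalg.solve((X.T @ X) / T, (X.T @ y) / T)   # "population" moments of
                                                         # the artificial variable
print(np.allclose(b_ols, b_proj))                        # True
```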
To puta zero inthe (2,1) position off, we multiply the frst row of by oi(h + @) and subuact the result from the second; hence, ny = 6 + #), This operation changes the (2,2) element of to yt Ute isonet See ema ee T+¢ ‘Toputazeroin the (3,2) element off, the second row ofthe new matrix must be multiplied by Bid and then ebracted from the third row; hence, pages ac Se ee eee 114 Oheper¢ | Forecasting ‘This changes the (3,3) element nn (+ mas e+ oy - oC + 0 - Tose GFP + Feds eH) MUL e) 1+ e+e leeemse Tso I general, for the ah row, dy peer e OTE OS a Br ‘To puta zero in the ((+ 1, position, multiply by oid, Ls THOT Oe ae and subtract from the (+1) row, producing outs errs (ee ee eee Ee ee ee ee rs ee See ee erty ase Teer et te Lee eee s geen Chapter 4 Exercises 4:1 Use formula [4.3.6 9 show that for 3 covariances pce, the projection of ¥..1 0m a constant and ¥, a aa wy cae a Meal¥) = (= ade + AY, @ "Sow Sissons ths erodes equation [219] for {8 Show tat forthe AAC) proven the erodes eqn [1520] for n= (©) Show that or an ARC) procs the imped forcast a + ut = aK = sth enor noid with hs forecast ote wih ¥/ Init conelated with 2 42. Verify equation [4.3.3] 43. Fad he wing forint owing mat: fauaaay “2 6 -4 aati 44, Can the coefficient on Y; from a linear projection of ¥. on ¥%. Ys. and ¥; be found from the (4,2) element ofthe mat A fom he tanguarfsctorzation of = BY)? 45. Suppose that X, follows an AR(p) procees and v, ig white nose proces that ie imcorrlated with X,., for all. Show thatthe sum Hn Xa wy, follows an ARMA(p, p) proces. Chapter 4 Exercises 115 4.5. Generalize Exercise 45 to deduce that if one ads together an AR(P) process with fin MA(g) process and if dese wo processes ae uncorelate wit each her at all eats find lps: then the reul isan ARMA(p, p + 9) process Chapter + References ‘Ashley, Richard. 1988, “On the Relative Worth of Recent Macroeconomic Forecast.” Iaivratinal Ioana nf Favecsting 463-76 Box, George E.P., nd Gwilym M. Jenkins. 1976. Tine Series Analyst: Forecasting and Contr. reo San France Hoiden-Dag ‘Nelson, Charlee R. 1972. “The Prediction Performance ofthe F..B.-M.LT.~PENN Model of the 118 Eenmomy ” American Feomomie Review 6-010 =17 Sargent, Thomas J. 1967. Macroeconomic Theory, 24 ed. Boston: Academic Pres Wold, Herman. 1938 (2 ed. 1954). A Study in the Analysis of Stationary Tome Seri Uppsala, Sweden: Almgvst and Wiksell 116 Chapter ¢ | Forecasting Maximum Likelihood Estimation Tniroduction Vm et AM BM OM iey 64 Meier fot] 4 ea bee Oe, With e, white noise: ea] Bee) = {7 [5.1.3] ‘ne prenouscnapesassimed tna the population parameter (Gy wm ell = 6) and variance BOY — a? = 090 ~ 99) Since {e}" ie Ganesian, ¥, i alen Ganesan Hance, tha density af the fret observation takes the form Peis D = fx Oni & 04) i 1 oof Mp et wn") 2) VinVeml — oF I - 8) [Next consider the istributon of the second observation Ya conditional on observing Yen ys: From (52:1) Wace ote 15231 Conditioning on ¥, = y, means treating the random variable Y, as if it were the ‘dexerminisuc constant, For this ease, (3.2.3) gives ¥ 25 the constant (¢ + 4) plus the M(0, 2°) variable e,, Hence, (FY, = 7) ~ Mle + 47 0, aa Srarfoalysi 9) ~ aoe gt p24] me jot ceasty a otseebons 1 and 2 then ut the prod of [52.4] and (5.22) Fayre 14% = fear 0alrsi )fr,(ri 0)- Sint th dsrinton of the hd observation condonlon theft wos Famer (ods. 
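The prediction-error decomposition being built up here, the unconditional density of the first observation times the conditional densities of the remaining observations, translates directly into code. The sketch below is not from the text; it assumes NumPy, uses illustrative parameter values, and evaluates the exact Gaussian AR(1) log likelihood in that form.

```python
import numpy as np

def ar1_exact_loglik(y, c, phi, sigma2):
    """Exact Gaussian AR(1) log likelihood via the prediction-error
    decomposition: the N(mu, sigma^2/(1-phi^2)) density of y_1 plus the
    conditional N(c + phi*y_{t-1}, sigma^2) densities of y_2, ..., y_T."""
    mu = c / (1 - phi)
    var1 = sigma2 / (1 - phi**2)                  # unconditional variance of y_1
    ll = -0.5 * (np.log(2 * np.pi * var1) + (y[0] - mu)**2 / var1)
    resid = y[1:] - c - phi * y[:-1]              # one-step prediction errors
    ll += np.sum(-0.5 * (np.log(2 * np.pi * sigma2) + resid**2 / sigma2))
    return ll

rng = np.random.default_rng(3)
c, phi, sigma2, T = 1.0, 0.6, 0.8, 200
y = np.empty(T)
y[0] = c / (1 - phi) + rng.normal(scale=np.sqrt(sigma2 / (1 - phi**2)))
for t in range(1, T):
    y[t] = c + phi * y[t - 1] + rng.normal(scale=np.sqrt(sigma2))
print(ar1_exact_loglik(y, c, phi, sigma2))
```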
8) = [eae] meaning Vana | from which fa veulee Yo 16 8) = Rosle Ws Of we YO In genera, the values of ¥;, Ya, » Yen mater for Y, only through the value of Y,-y, and the density of sbservation conditonal onthe preceding ~ 1 observation siven by Frbtte-seo events Yeas + 85) = fety, yen ®) isla ate Vinai | a 118 Chapter § | Maximum Likelihood Estimation [525] ‘The joint density of the first observations is then Prrrovee Vo Innis 99) Fear 8 Oates iow Years at ‘The liketitood of the complete sample can thus be calculated as Pentre Oe Seow HO) = fx OT fray, Oe (5.2.7) ‘The log likelihood function (denoted $£(8)) can be found by taking logs of [5.2.7]: LC) ~ og F055) + B08 fray, 9) B25 Clearly, the value of @ that maximizes [5.2.8] is identical to the value that ‘maximizes (5.2.7]. However, Section 5.8 presents @ number of useful results that can be calculated as a by-product of the maximizatic likelihood function [5.2.7]. Substituting (5.2.2] aud 5.2.5] into [5.2.6], the fog likeiwod for a sample of size T from a Gaussian AR(1) process is seen to be £@) = ~Hog(2n) ~ } log{o7(1 - 4%)] aM as EP _ ((r 192} 1922) 5.29] = (C7 = ryn} ont ~ ¥ [Oe= = 9.20] 2 Lanaaaenaity ‘An Alternative Expression for the Likelihood Function A diferent description ofthe likelihood function for a sample of size T from ‘Gaussian AR(1) process is sometimes useful, Collect the fll set of observations Ina (7% 1) vector, gly OH rescore) This vector could be viewed asa single realization from a T-dimensional Gaussian Aiatrihation The mean af this (T ¥-1) vector is eed) [w a 9 | | (sam Leta Lad ee ee E(X) = wy Where w denotes the (T° X 1) vector on the right side of [5.2.10]. The variance- covariance matrix of ¥ is given by ELY - wy = wy] = 0, (5.2.11) 5.2. The Likelihood Function for a Gaussian AR(I) Process 119 where B= WF BO = Wa a) oo EK we ~ a) Ew) EU wo BO a- : E(t, wMY,— a) Elke ~ a #) > EU, ~ 0 (52.13) ‘The elements of this matrix correspond to autocovariances of ¥. Recall that the {ih autacovariance for an AR(1) pracess it piven by EY, ~ Ws ~ #) = 0°40 ~ 64) (52.13) Hence, [5.2.12] can be writen as Ms ov, [52.14] where ca agent a iu tcl tale et ’ bas] ght grt gr 1 Viewing the observed sample y as 2 single draw from 2 N(q, 0) distribution, the sample likelihood could be written down immediately from the formula for the ‘multivariate Gaussian density $u(95 8) = (2n)-™* [O49 expl Ky ~ wy'A-Hy — wy], (52.16) ‘wth Log tkeubood £0) = (=70) tog(m) + Hogi = Hy = 'A-Wy = Ww) {5217} Evidently, 5.2.17] and [§-2.] must represent the identical function of (3,35. - yz). To verify that this is indeed the case, define ang -¢ eth 0 0 L | Ouest i (52.8) cae oem It is straightforward to show that! LL=v-, (52.19) "By cret muliptiation, one exces VTE ITF VISE «one 0 =e) ot) ore ° 0 a raw, ° ° ° ale a prcaiyiag tis by Lyles te (FT) Meaty sae Ths, LV = fy conning i219) 120 Chapter $ | Maximum Likelihood Estimation plying from (5.2.14) that 07 = LL, [5.2.20] Substituting [5.2.20] into [5.2.17] results in £00) = (~T2)log(2n) + Hoglo*L'L| ~ Ky ~ w)'o-AL'LAy ~ pw). [5.2.21] Define the (IX 1) vector y ta be 0 oe 0 Oy Ran “#1 05 6 Ol ye eo ee 0 Ol] -a 0 0 oe 6 tty td 520 Fo vIzem- 7 Or > w) ~ On =H) =} O-H)- 0 - a) Or) ~ 60r-1~ Subsituing u = 61 ~ 6), thi becomes [=F b.- 0-01] nae oH noe bn roe Ors The let term in [5.2.21 can thus be writen 1G — wey ~ w) = WQUIS'S uot ~ gin ~ ok ~ oF 152.231 + (Qe) YO. 
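The claim that the prediction-error expression and the multivariate-normal expression for the AR(1) log likelihood are the same function can be checked numerically for any sample. The sketch below is illustrative only (NumPy assumed, arbitrary parameter values and data); it builds the covariance matrix from the AR(1) autocovariances and compares the two evaluations.

```python
import numpy as np

# Numerical check that the two expressions for the Gaussian AR(1) log
# likelihood, the prediction-error form and the multivariate-normal form with
# Omega = sigma^2 * V and V_ij = phi^|i-j| / (1 - phi^2), coincide.
rng = np.random.default_rng(4)
c, phi, sigma2, T = 0.5, 0.7, 1.3, 50
mu = c / (1 - phi)
y = mu + rng.normal(size=T)                       # any numbers will do for the check

# Prediction-error form
ll1 = -0.5 * (np.log(2 * np.pi * sigma2 / (1 - phi**2))
              + (y[0] - mu)**2 / (sigma2 / (1 - phi**2)))
resid = y[1:] - c - phi * y[:-1]
ll1 += np.sum(-0.5 * (np.log(2 * np.pi * sigma2) + resid**2 / sigma2))

# Multivariate-normal form
idx = np.arange(T)
V = phi**np.abs(idx[:, None] - idx[None, :]) / (1 - phi**2)
Omega = sigma2 * V
dev = y - mu
_, logdet = np.linalg.slogdet(Omega)
ll2 = -0.5 * (T * np.log(2 * np.pi) + logdet + dev @ np.linalg.solve(Omega, dev))
print(np.isclose(ll1, ll2))                       # True
```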
- € - dy-sF “The middle term in [52.21 ie smarty Hloglo*L'L] = Hoglo=* «LL log o + 4 log (52.24 = (72) tog 0° + tog, ‘matical Review (Appendix A) tthe end ofthe book. Moreover, since Li lower Uiangulay, is determinant fs given by the product ofthe terms slong the principal diagonal: L] = VT 9. Thus, [5.2.24] states that Hlogle2UU = (72) log o? + Hog(t ~ 47). (52.25) Substituting [52.23] and [5.2.2] into [5.2.21] reproduces (5.29). Thus, equations (5.2.17) and [5.2.9] are just two different expressions for the same magaltude, as claimed. Either expression accurately describes the log likelihood function, 5.2. The Likelihood Function for a Gaussian ARID) Process 124 Expression [5.2.17] requites inverting (7 ) matrix, whereas [5.2.0] dors ‘not. Thus, expression [5.2.9] is clearly to be preferred for computations. It avoids investing « (7 % 7) wats by writing ¥, ae the eum of a forecast (c | $¥,_3) and a forecast error («). The forecast error is independent from previous observations by construction, so the tog of Its density is simply added to the log lketihood of the preceding observations. This approach is known as a predicton-error decom- position ofthe likelihood function Fxact Maximum Likelihood Estimates for the Gaussian AR(L) Process ‘The MLE 0 is the value for which (52.9] is maximized. Ia principle, this requires diferentating [5.2.9] and seting the result equal to zero. In practice. When aa attempt is made to cary this out, the result is a sjstem of aoalinear equations in @ and (J, Jo)... Jy) for which there is n0 simple tolution for O in terms of (), Yo 9p» Maximization of [5.2.9] thus requires iterative or u- meric proveduics desoed in Sexson Conditional Maximum Likelihood Estimates ‘An alternative to numerical maximization of the exact likelihood function is 1 regard the value of y, 25 deterministic and maximize the Hketiuood conditioned ‘on the first observation, ae mel . 59981 the objective thea being Wo maximize LOB Frprrerontsin( Vie Yenur IB) = [F192] osm) ~ (CF — 1)2} tog(o*) (5.2.27) 5 dy" [5.2.23] sich ic achive yan onnary leat squares (OLS) regretson ay,an a eantant tad its own ngged valu, The codiionel mimum Uetinood estates fc and @ are therefore given by Pati ea gui osu lel" Le wel Lo-wl where 3 denotes summation over ¢ = 2,3,... 5 T- "The conditional maximum likclihood catimate of the innovation variance found by differentiating [5.2.27] with respect to 0° and setting the result equal to 122 Chaoter S| Maximum Likelihood Estimation 3 [& = dy, | 3 T=1 In other words, the conditional MLE isthe average squared residual frou the OLS regression (5.2.28). Tu cousas to exact menimum lkeliiood estimates, the conditional maximum lelihood estimates are thus tvalto compute. Moreover, ifthe sample size Tis sulicently are, the frst observation makes a negligible contribution tothe total Iikehood. The exact MLE and conditional MLE turn outta have the came lage ‘sample distribution, provided that |$| < 1. And when |6| > 1, the conditional MLE Continues ta provide consistent estimates, whereas svimization of [8.29] docs not. This is because (5.2 is derived from {5.2.2], which docs not accurately deseibe the density of ¥; when [al > 1. or these feasous, in most applications te parameters of an autoregression are estimated by OLS (conditional maximum 1ethood) rather than exact maximum lixelinood. 5.3. The Likelihood Function for a Gaussian AR(p) Process ‘This section discusses a Gaustian AR(p) proscat, Yact Nt Mert ot OY te (53.11 'N(, o%). 
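Since the conditional maximum likelihood estimates for the AR(1) case are just OLS quantities, they can be computed in a few lines. The sketch below is not from the text; it assumes NumPy, simulates an AR(1) with illustrative parameters, and recovers c, phi, and sigma^2 as the OLS coefficients and average squared residual.

```python
import numpy as np

def ar1_conditional_mle(y):
    """Conditional (on y_1) maximum likelihood estimates of a Gaussian AR(1):
    c and phi from an OLS regression of y_t on a constant and y_{t-1},
    and sigma^2 as the average squared residual."""
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    coef, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    resid = y[1:] - X @ coef
    return coef[0], coef[1], np.mean(resid**2)

rng = np.random.default_rng(5)
c, phi, sigma = 1.0, 0.6, 1.0
y = np.zeros(2000)
y[0] = c / (1 - phi)
for t in range(1, len(y)):
    y[t] = c + phi * y[t - 1] + sigma * rng.normal()
print(ar1_conditional_mle(y))      # close to (1.0, 0.6, 1.0)
```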
5.3. The Likelihood Function for a Gaussian AR(p) Process

This section discusses a Gaussian AR(p) process,

$$Y_t=c+\phi_1 Y_{t-1}+\phi_2 Y_{t-2}+\cdots+\phi_p Y_{t-p}+\varepsilon_t, \qquad[5.3.1]$$

with $\varepsilon_t\sim$ i.i.d. $N(0,\sigma^2)$. In this case, the vector of population parameters to be estimated is $\boldsymbol{\theta}\equiv(c,\phi_1,\phi_2,\ldots,\phi_p,\sigma^2)'$.

Evaluating the Likelihood Function

A combination of the two methods described for the AR(1) case is used to calculate the likelihood function for a sample of size $T$ from an AR(p) process. The first $p$ observations in the sample $(y_1,y_2,\ldots,y_p)$ are collected in a $(p\times 1)$ vector $\mathbf{y}_p$, which is viewed as the realization of a $p$-dimensional Gaussian variable. The mean of this vector is $\boldsymbol{\mu}_p$, which denotes a $(p\times 1)$ vector each of whose elements is given by

$$\mu=c/(1-\phi_1-\phi_2-\cdots-\phi_p). \qquad[5.3.2]$$

Let $\sigma^2\mathbf{V}_p$ denote the $(p\times p)$ variance-covariance matrix of $(Y_1,Y_2,\ldots,Y_p)$:

$$\sigma^2\mathbf{V}_p=\begin{bmatrix}E(Y_1-\mu)^2 & E(Y_1-\mu)(Y_2-\mu) & \cdots & E(Y_1-\mu)(Y_p-\mu)\\ E(Y_2-\mu)(Y_1-\mu) & E(Y_2-\mu)^2 & \cdots & E(Y_2-\mu)(Y_p-\mu)\\ \vdots & \vdots & & \vdots\\ E(Y_p-\mu)(Y_1-\mu) & E(Y_p-\mu)(Y_2-\mu) & \cdots & E(Y_p-\mu)^2\end{bmatrix}. \qquad[5.3.3]$$

For example, for a first-order autoregression ($p=1$), $\mathbf{V}_p$ is the scalar $1/(1-\phi^2)$. For a general $p$th-order autoregression,

$$\sigma^2\mathbf{V}_p=\begin{bmatrix}\gamma_0 & \gamma_1 & \gamma_2 & \cdots & \gamma_{p-1}\\ \gamma_1 & \gamma_0 & \gamma_1 & \cdots & \gamma_{p-2}\\ \vdots & \vdots & \vdots & & \vdots\\ \gamma_{p-1} & \gamma_{p-2} & \gamma_{p-3} & \cdots & \gamma_0\end{bmatrix},$$

where $\gamma_j$, the $j$th autocovariance for an AR(p) process, can be calculated using the methods in Chapter 3. The density of the first $p$ observations is then that of a $N(\boldsymbol{\mu}_p,\sigma^2\mathbf{V}_p)$ variable:

$$f_{Y_p,\ldots,Y_1}(y_p,\ldots,y_1;\boldsymbol{\theta})=(2\pi)^{-p/2}\,|\sigma^{-2}\mathbf{V}_p^{-1}|^{1/2}\exp\!\left[-\frac{1}{2\sigma^2}(\mathbf{y}_p-\boldsymbol{\mu}_p)'\mathbf{V}_p^{-1}(\mathbf{y}_p-\boldsymbol{\mu}_p)\right]=(2\pi)^{-p/2}(\sigma^2)^{-p/2}\,|\mathbf{V}_p^{-1}|^{1/2}\exp\!\left[-\frac{1}{2\sigma^2}(\mathbf{y}_p-\boldsymbol{\mu}_p)'\mathbf{V}_p^{-1}(\mathbf{y}_p-\boldsymbol{\mu}_p)\right], \qquad[5.3.4]$$

where use has been made of result [A.4.8].

For the remaining observations in the sample, $(y_{p+1},y_{p+2},\ldots,y_T)$, the prediction-error decomposition can be used. Conditional on the first $t-1$ observations, the $t$th observation is Gaussian with mean

$$c+\phi_1 y_{t-1}+\phi_2 y_{t-2}+\cdots+\phi_p y_{t-p}$$

and variance $\sigma^2$. Only the $p$ most recent observations matter for this distribution. Hence, for $t>p$,

$$f_{Y_t\mid Y_{t-1},\ldots,Y_1}(y_t\mid y_{t-1},\ldots,y_1;\boldsymbol{\theta})=f_{Y_t\mid Y_{t-1},\ldots,Y_{t-p}}(y_t\mid y_{t-1},\ldots,y_{t-p};\boldsymbol{\theta})=\frac{1}{\sqrt{2\pi\sigma^2}}\exp\!\left[\frac{-(y_t-c-\phi_1 y_{t-1}-\phi_2 y_{t-2}-\cdots-\phi_p y_{t-p})^2}{2\sigma^2}\right]. \qquad[5.3.5]$$

The likelihood function for the complete sample is then

$$f_{Y_T,Y_{T-1},\ldots,Y_1}(y_T,y_{T-1},\ldots,y_1;\boldsymbol{\theta})=f_{Y_p,\ldots,Y_1}(y_p,\ldots,y_1;\boldsymbol{\theta})\prod_{t=p+1}^{T}f_{Y_t\mid Y_{t-1},\ldots,Y_{t-p}}(y_t\mid y_{t-1},\ldots,y_{t-p};\boldsymbol{\theta}),$$

and the log likelihood is therefore

$$\mathcal{L}(\boldsymbol{\theta})=\log f_{Y_T,\ldots,Y_1}(y_T,\ldots,y_1;\boldsymbol{\theta})=-\frac{p}{2}\log(2\pi)-\frac{p}{2}\log(\sigma^2)+\frac{1}{2}\log|\mathbf{V}_p^{-1}|-\frac{1}{2\sigma^2}(\mathbf{y}_p-\boldsymbol{\mu}_p)'\mathbf{V}_p^{-1}(\mathbf{y}_p-\boldsymbol{\mu}_p)-\frac{T-p}{2}\log(2\pi)-\frac{T-p}{2}\log(\sigma^2)-\sum_{t=p+1}^{T}\frac{(y_t-c-\phi_1 y_{t-1}-\phi_2 y_{t-2}-\cdots-\phi_p y_{t-p})^2}{2\sigma^2}. \qquad[5.3.6]$$

Evaluation of [5.3.6] requires inverting the $(p\times p)$ matrix $\mathbf{V}_p$. Denote the row $i$, column $j$ element of $\mathbf{V}_p^{-1}$ by $v^{ij}(p)$. Galbraith and Galbraith (1974, equation 16, p. 70) showed that, for $i\le j$,

$$v^{ij}(p)=\sum_{k=0}^{i-1}\phi_k\phi_{k+j-i}-\sum_{k=p+1-j}^{p+i-j}\phi_k\phi_{k+j-i}, \qquad[5.3.7]$$

where $\phi_0\equiv-1$. Values of $v^{ij}(p)$ for $i>j$ can be inferred from the fact that $\mathbf{V}_p^{-1}$ is symmetric ($v^{ij}(p)=v^{ji}(p)$).

For example, for an AR(1) process, $\mathbf{V}_p^{-1}$ is a scalar whose value is found by taking $i=j=p=1$:

$$v^{11}(1)=\phi_0^2-\phi_1^2=1-\phi^2.$$

Thus $\sigma^2\mathbf{V}_1=\sigma^2/(1-\phi^2)$, which indeed reproduces the formula for the variance of an AR(1) process. For $p=2$, equation [5.3.7] implies

$$\mathbf{V}_2^{-1}=\begin{bmatrix}(1-\phi_2^2) & -\phi_1(1+\phi_2)\\ -\phi_1(1+\phi_2) & (1-\phi_2^2)\end{bmatrix},$$

from which one readily calculates

$$|\mathbf{V}_2^{-1}|=(1-\phi_2^2)^2-\phi_1^2(1+\phi_2)^2=(1+\phi_2)^2[(1-\phi_2)^2-\phi_1^2]$$

and

$$(\mathbf{y}_p-\boldsymbol{\mu}_p)'\mathbf{V}_p^{-1}(\mathbf{y}_p-\boldsymbol{\mu}_p)=(1-\phi_2^2)(y_1-\mu)^2-2\phi_1(1+\phi_2)(y_1-\mu)(y_2-\mu)+(1-\phi_2^2)(y_2-\mu)^2.$$

The exact log likelihood for a Gaussian AR(2) process is thus given by

$$\mathcal{L}(\boldsymbol{\theta})=-\frac{T}{2}\log(2\pi)-\frac{T}{2}\log(\sigma^2)+\frac{1}{2}\log\{(1+\phi_2)^2[(1-\phi_2)^2-\phi_1^2]\}-\frac{1}{2\sigma^2}\{(1-\phi_2^2)(y_1-\mu)^2-2\phi_1(1+\phi_2)(y_1-\mu)(y_2-\mu)+(1-\phi_2^2)(y_2-\mu)^2\}-\sum_{t=3}^{T}\frac{(y_t-c-\phi_1 y_{t-1}-\phi_2 y_{t-2})^2}{2\sigma^2}, \qquad[5.3.8]$$

where $\mu=c/(1-\phi_1-\phi_2)$.
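The following sketch (not part of the original text) shows how [5.3.7] and [5.3.6] fit together numerically: the first function builds $\mathbf{V}_p^{-1}$ from the Galbraith and Galbraith formula, and the second evaluates the exact AR(p) log likelihood. NumPy is assumed, the function names are illustrative, and the autoregression is assumed stationary so that $\mu$ and $\mathbf{V}_p$ exist.

```python
import numpy as np

def Vp_inverse(phi):
    """(p x p) matrix V_p^{-1} built element by element from formula [5.3.7]."""
    p = len(phi)
    f = np.concatenate(([-1.0], np.asarray(phi, dtype=float)))   # f[0] = phi_0 = -1
    V_inv = np.zeros((p, p))
    for i in range(1, p + 1):
        for j in range(i, p + 1):                                # i <= j; fill by symmetry
            v = sum(f[k] * f[k + j - i] for k in range(0, i))
            v -= sum(f[k] * f[k + j - i] for k in range(p + 1 - j, p + i - j + 1))
            V_inv[i - 1, j - 1] = V_inv[j - 1, i - 1] = v
    return V_inv

def exact_arp_loglik(y, c, phi, sigma2):
    """Exact log likelihood [5.3.6] of a Gaussian AR(p) sample."""
    y = np.asarray(y, dtype=float)
    phi = np.asarray(phi, dtype=float)
    p, T = phi.size, y.size
    mu = c / (1.0 - phi.sum())                                   # equation [5.3.2]
    V_inv = Vp_inverse(phi)
    dev = y[:p] - mu
    # joint density of the first p observations, equation [5.3.4]
    ll = (-p / 2) * np.log(2 * np.pi) - (p / 2) * np.log(sigma2) \
         + 0.5 * np.log(np.linalg.det(V_inv)) - dev @ V_inv @ dev / (2 * sigma2)
    # prediction-error contributions of observations p+1, ..., T
    resid = y[p:] - c - sum(phi[j] * y[p - 1 - j: T - 1 - j] for j in range(p))
    ll += (-(T - p) / 2) * np.log(2 * np.pi) - ((T - p) / 2) * np.log(sigma2) \
          - np.sum(resid**2) / (2 * sigma2)
    return ll
```

For $p=2$ the matrix returned by `Vp_inverse` reproduces the $\mathbf{V}_2^{-1}$ displayed above, which provides a quick check on the indexing.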
Conditional Maximum Likelihood Estimates

Maximization of the exact log likelihood for an AR(p) process [5.3.6] must be accomplished numerically. In contrast, the log of the likelihood conditional on the first $p$ observations assumes the simple form

$$\mathcal{L}(\boldsymbol{\theta})=\log f_{Y_T,\ldots,Y_{p+1}\mid Y_p,\ldots,Y_1}(y_T,\ldots,y_{p+1}\mid y_p,\ldots,y_1;\boldsymbol{\theta})=-\frac{T-p}{2}\log(2\pi)-\frac{T-p}{2}\log(\sigma^2)-\sum_{t=p+1}^{T}\frac{(y_t-c-\phi_1 y_{t-1}-\phi_2 y_{t-2}-\cdots-\phi_p y_{t-p})^2}{2\sigma^2}. \qquad[5.3.9]$$

The values of $c,\phi_1,\phi_2,\ldots,\phi_p$ that maximize [5.3.9] are the same as those that minimize

$$\sum_{t=p+1}^{T}(y_t-c-\phi_1 y_{t-1}-\phi_2 y_{t-2}-\cdots-\phi_p y_{t-p})^2. \qquad[5.3.10]$$

Thus, the conditional maximum likelihood estimates of these parameters can be obtained from an OLS regression of $y_t$ on a constant and $p$ of its own lagged values. The conditional maximum likelihood estimate of $\sigma^2$ turns out to be the average squared residual from this regression:

$$\hat{\sigma}^2=\frac{1}{T-p}\sum_{t=p+1}^{T}(y_t-\hat{c}-\hat{\phi}_1 y_{t-1}-\hat{\phi}_2 y_{t-2}-\cdots-\hat{\phi}_p y_{t-p})^2.$$

The exact maximum likelihood estimates and the conditional maximum likelihood estimates again have the same large-sample distribution.

Maximum Likelihood Estimation for Non-Gaussian Time Series

We noted in Chapter 4 that an OLS regression of a variable on a constant and $p$ of its lags would yield a consistent estimate of the coefficients of the linear projection,

$$\hat{E}(Y_t\mid Y_{t-1},Y_{t-2},\ldots,Y_{t-p}),$$

provided that the process is ergodic for second moments. This OLS regression also maximizes the Gaussian conditional log likelihood [5.3.9]. Thus, even if the process is non-Gaussian, if we mistakenly form a Gaussian log likelihood function and maximize it, the resulting estimates $(\hat{c},\hat{\phi}_1,\hat{\phi}_2,\ldots,\hat{\phi}_p)$ will provide consistent estimates of the population parameters in [5.3.1].

An estimate that maximizes a misspecified likelihood function (for example, an MLE calculated under the assumption of a Gaussian process when the true data are non-Gaussian) is known as a quasi-maximum likelihood estimate. Sometimes, as turns out to be the case here, quasi-maximum likelihood estimation provides consistent estimates of the population parameters of interest. However, standard errors for the estimated coefficients that are calculated under the Gaussianity assumption need not be correct if the true data are non-Gaussian. (These points were first raised by White, 1982, and are discussed further in Sections 5.8 and 14.4.)

Alternatively, if the raw data are non-Gaussian, sometimes a simple transformation such as taking logs will produce a Gaussian time series. For a positive random variable $Y_t$, Box and Cox (1964) proposed the general class of transformations

$$Y_t^{(\lambda)}=\begin{cases}(Y_t^{\lambda}-1)/\lambda & \text{for }\lambda\neq 0\\ \log Y_t & \text{for }\lambda=0.\end{cases}$$

One approach is to pick a particular value of $\lambda$ and maximize the likelihood for $Y_t^{(\lambda)}$ under the assumption that $Y_t^{(\lambda)}$ is a Gaussian ARMA process (a grid-search sketch of this idea appears at the end of this section). The value of $\lambda$ that is associated with the highest value of the maximized likelihood is taken as the best transformation. However, Nelson and Granger (1979) reported discouraging results from this method in practice.

Li and McLeod (1988) and Janacek and Swift (1990) described approaches to maximum likelihood estimation for some non-Gaussian ARMA models. Martin (1981) discussed robust time series estimation for contaminated data.
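As an illustration of the transformation search just described (not part of the original text), the sketch below profiles the Box-Cox parameter $\lambda$ over a grid, using the conditional Gaussian AR(1) likelihood of Section 5.2 as the working model; that choice of model and the function names are assumptions made purely for illustration. The term $(\lambda-1)\sum\log y_t$ is the Jacobian of the transformation, a standard detail of Box and Cox's procedure that puts the likelihoods for different values of $\lambda$ on the scale of the original data so that they can be compared.

```python
import numpy as np

def boxcox(y, lam):
    """Box-Cox transform: (y^lam - 1)/lam for lam != 0, log(y) for lam = 0."""
    y = np.asarray(y, dtype=float)
    return np.log(y) if lam == 0 else (y**lam - 1.0) / lam

def profile_loglik(y, lam):
    """Maximized conditional Gaussian AR(1) log likelihood of the transformed series,
    plus the Jacobian term (lam - 1) * sum(log y) for comparability across lam."""
    y = np.asarray(y, dtype=float)
    z = boxcox(y, lam)
    # conditional (OLS) estimates of the AR(1) working model, as in Section 5.2
    X = np.column_stack([np.ones(z.size - 1), z[:-1]])
    c_hat, phi_hat = np.linalg.lstsq(X, z[1:], rcond=None)[0]
    s2_hat = np.mean((z[1:] - c_hat - phi_hat * z[:-1]) ** 2)
    T = z.size
    # conditional log likelihood [5.2.27] evaluated at the OLS estimates
    ll = -((T - 1) / 2) * (np.log(2 * np.pi) + np.log(s2_hat) + 1.0)
    return ll + (lam - 1.0) * np.sum(np.log(y))

# grid search for the transformation with the highest profiled likelihood:
# lam_grid = np.linspace(-1.0, 2.0, 31)
# best_lam = max(lam_grid, key=lambda lam: profile_loglik(y, lam))
```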
5.4. The Likelihood Function for a Gaussian MA(1) Process

Conditional Likelihood Function

Calculation of the likelihood function for an autoregression turned out to be much simpler if we conditioned on initial values for the $Y$'s. Similarly, calculation of the likelihood function for a moving average process is simpler if we condition on initial values for the $\varepsilon$'s. Consider the Gaussian MA(1) process

$$Y_t=\mu+\varepsilon_t+\theta\varepsilon_{t-1}, \qquad[5.4.1]$$

with $\varepsilon_t\sim$ i.i.d. $N(0,\sigma^2)$. Let $\boldsymbol{\theta}=(\mu,\theta,\sigma^2)'$ denote the population parameters to be estimated. If the value of $\varepsilon_{t-1}$ were known with certainty, then

$$(Y_t\mid\varepsilon_{t-1})\sim N(\mu+\theta\varepsilon_{t-1},\sigma^2),$$

or

$$f_{Y_t\mid\varepsilon_{t-1}}(y_t\mid\varepsilon_{t-1};\boldsymbol{\theta})=\frac{1}{\sqrt{2\pi\sigma^2}}\exp\!\left[\frac{-(y_t-\mu-\theta\varepsilon_{t-1})^2}{2\sigma^2}\right]. \qquad[5.4.2]$$

Suppose that we knew for certain that $\varepsilon_0=0$. Then

$$(Y_1\mid\varepsilon_0=0)\sim N(\mu,\sigma^2).$$

Moreover, given observation of $y_1$, the value of $\varepsilon_1$ is then known with certainty as well,

$$\varepsilon_1=y_1-\mu,$$

allowing application of [5.4.2] again:

$$f_{Y_2\mid Y_1,\varepsilon_0=0}(y_2\mid y_1,\varepsilon_0=0;\boldsymbol{\theta})=\frac{1}{\sqrt{2\pi\sigma^2}}\exp\!\left[\frac{-(y_2-\mu-\theta\varepsilon_1)^2}{2\sigma^2}\right].$$

Since $\varepsilon_1$ is known with certainty, $\varepsilon_2$ can be calculated from

$$\varepsilon_2=y_2-\mu-\theta\varepsilon_1.$$

Proceeding in this fashion, it is clear that given knowledge that $\varepsilon_0=0$, the full sequence $\{\varepsilon_1,\varepsilon_2,\ldots,\varepsilon_T\}$ can be calculated from $\{y_1,y_2,\ldots,y_T\}$ by iterating on

$$\varepsilon_t=y_t-\mu-\theta\varepsilon_{t-1} \qquad[5.4.3]$$

for $t=1,2,\ldots,T$, starting from $\varepsilon_0=0$. The conditional density of the $t$th observation can then be calculated from [5.4.2] as

$$f_{Y_t\mid Y_{t-1},\ldots,Y_1,\varepsilon_0=0}(y_t\mid y_{t-1},\ldots,y_1,\varepsilon_0=0;\boldsymbol{\theta})=f_{Y_t\mid\varepsilon_{t-1}}(y_t\mid\varepsilon_{t-1};\boldsymbol{\theta})=\frac{1}{\sqrt{2\pi\sigma^2}}\exp\!\left[\frac{-\varepsilon_t^2}{2\sigma^2}\right]. \qquad[5.4.4]$$

The sample likelihood would then be the product of these individual densities:

$$f_{Y_T,Y_{T-1},\ldots,Y_1\mid\varepsilon_0=0}(y_T,y_{T-1},\ldots,y_1\mid\varepsilon_0=0;\boldsymbol{\theta})=f_{Y_1\mid\varepsilon_0=0}(y_1\mid\varepsilon_0=0;\boldsymbol{\theta})\prod_{t=2}^{T}f_{Y_t\mid Y_{t-1},\ldots,Y_1,\varepsilon_0=0}(y_t\mid y_{t-1},\ldots,y_1,\varepsilon_0=0;\boldsymbol{\theta}).$$

The conditional log likelihood is

$$\mathcal{L}(\boldsymbol{\theta})=\log f_{Y_T,Y_{T-1},\ldots,Y_1\mid\varepsilon_0=0}(y_T,y_{T-1},\ldots,y_1\mid\varepsilon_0=0;\boldsymbol{\theta})=-\frac{T}{2}\log(2\pi)-\frac{T}{2}\log(\sigma^2)-\sum_{t=1}^{T}\frac{\varepsilon_t^2}{2\sigma^2}. \qquad[5.4.5]$$

For a particular numerical value of $\boldsymbol{\theta}$, we thus calculate the sequence of $\varepsilon$'s implied by the data from [5.4.3]. The conditional log likelihood [5.4.5] is then a function of the sum of squares of these $\varepsilon$'s. Although it is simple to program this iteration by computer, the log likelihood is a fairly complicated nonlinear function of $\mu$ and $\theta$, so that an analytical expression for the maximum likelihood estimates of $\mu$ and $\theta$ is not readily calculated. Hence, even the conditional maximum likelihood estimates for an MA(1) process must be found by numerical optimization.

Iteration on [5.4.3] from an arbitrary starting value of $\varepsilon_0$ will result in

$$\varepsilon_T=(y_T-\mu)-\theta(y_{T-1}-\mu)+\theta^2(y_{T-2}-\mu)-\cdots+(-1)^{T-1}\theta^{T-1}(y_1-\mu)+(-1)^{T}\theta^{T}\varepsilon_0.$$

If $|\theta|$ is substantially less than unity, the effect of imposing $\varepsilon_0=0$ will quickly die out and the conditional likelihood [5.4.5] will give a good approximation to the unconditional likelihood for a reasonably large sample size. By contrast, if $|\theta|>1$, the consequences of imposing $\varepsilon_0=0$ accumulate over time. The conditional approach is not reasonable in such a case. If numerical optimization of [5.4.5] results in a value of $\theta$ that exceeds 1 in absolute value, the results must be discarded. The numerical optimization should be attempted again with the reciprocal of $\hat{\theta}$ used as a starting value for the numerical search procedure.
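Here is a minimal sketch (not part of the original text) of the conditional calculation just described: iterate on [5.4.3] with $\varepsilon_0=0$ and plug the resulting residuals into [5.4.5]. NumPy is assumed and the function name is illustrative; in practice this function would be handed to a numerical optimizer (for example, by minimizing its negative with `scipy.optimize.minimize`) to obtain the conditional MLEs of $\mu$, $\theta$, and $\sigma^2$.

```python
import numpy as np

def conditional_ma1_loglik(y, mu, theta, sigma2):
    """Conditional log likelihood [5.4.5] of a Gaussian MA(1), imposing eps_0 = 0."""
    y = np.asarray(y, dtype=float)
    T = y.size
    eps = np.empty(T)
    eps_prev = 0.0                              # the assumption eps_0 = 0
    for t in range(T):
        eps[t] = y[t] - mu - theta * eps_prev   # iterate on [5.4.3]
        eps_prev = eps[t]
    return -(T / 2) * (np.log(2 * np.pi) + np.log(sigma2)) - np.sum(eps**2) / (2 * sigma2)
```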
Exact Likelihood Function

Two convenient algorithms are available for calculating the exact likelihood function for a Gaussian MA(1) process. One approach is to use the Kalman filter discussed in Chapter 13. A second approach uses the triangular factorization of the variance-covariance matrix. The second approach is described here.

As in Section 5.2, the observations on $Y$ can be collected in a $(T\times 1)$ vector $\mathbf{y}\equiv(y_1,y_2,\ldots,y_T)'$ with mean $\boldsymbol{\mu}\equiv(\mu,\mu,\ldots,\mu)'$ and $(T\times T)$ variance-covariance matrix

$$\boldsymbol{\Omega}=E[(\mathbf{Y}-\boldsymbol{\mu})(\mathbf{Y}-\boldsymbol{\mu})'].$$

The variance-covariance matrix for $T$ consecutive draws from an MA(1) process is

$$\boldsymbol{\Omega}=\sigma^2\begin{bmatrix}(1+\theta^2) & \theta & 0 & \cdots & 0\\ \theta & (1+\theta^2) & \theta & \cdots & 0\\ 0 & \theta & (1+\theta^2) & \cdots & 0\\ \vdots & \vdots & \vdots & & \vdots\\ 0 & 0 & 0 & \cdots & (1+\theta^2)\end{bmatrix}.$$

The likelihood function is then

$$f_{\mathbf{Y}}(\mathbf{y};\boldsymbol{\theta})=(2\pi)^{-T/2}\,|\boldsymbol{\Omega}|^{-1/2}\exp[-\tfrac{1}{2}(\mathbf{y}-\boldsymbol{\mu})'\boldsymbol{\Omega}^{-1}(\mathbf{y}-\boldsymbol{\mu})]. \qquad[5.4.6]$$

A prediction-error decomposition of the likelihood is provided by the triangular factorization of $\boldsymbol{\Omega}$,

$$\boldsymbol{\Omega}=\mathbf{A}\mathbf{D}\mathbf{A}', \qquad[5.4.7]$$

where $\mathbf{A}$ is the lower triangular matrix with 1s along the principal diagonal and $\mathbf{D}$ is the diagonal matrix obtained from the triangular factorization described in Section 4.4; their form for the MA(1) covariance matrix is derived in Appendix 4.B. Substituting [5.4.7] into [5.4.6] gives

$$f_{\mathbf{Y}}(\mathbf{y};\boldsymbol{\theta})=(2\pi)^{-T/2}\,|\mathbf{A}\mathbf{D}\mathbf{A}'|^{-1/2}\times\exp[-\tfrac{1}{2}(\mathbf{y}-\boldsymbol{\mu})'[\mathbf{A}']^{-1}\mathbf{D}^{-1}\mathbf{A}^{-1}(\mathbf{y}-\boldsymbol{\mu})]. \qquad[5.4.8]$$

But $\mathbf{A}$ is lower triangular with 1s along the principal diagonal. Hence,

$$|\mathbf{A}\mathbf{D}\mathbf{A}'|=|\mathbf{A}|\cdot|\mathbf{D}|\cdot|\mathbf{A}'|=|\mathbf{D}|.$$

Further defining

$$\tilde{\mathbf{y}}\equiv\mathbf{A}^{-1}(\mathbf{y}-\boldsymbol{\mu}), \qquad[5.4.9]$$

the likelihood [5.4.8] can be written

$$f_{\mathbf{Y}}(\mathbf{y};\boldsymbol{\theta})=(2\pi)^{-T/2}\,|\mathbf{D}|^{-1/2}\exp[-\tfrac{1}{2}\tilde{\mathbf{y}}'\mathbf{D}^{-1}\tilde{\mathbf{y}}]. \qquad[5.4.10]$$

Notice that [5.4.9] implies $\mathbf{A}\tilde{\mathbf{y}}=\mathbf{y}-\boldsymbol{\mu}$. Given the form of $\mathbf{A}$ for the MA(1) process, the elements of $\tilde{\mathbf{y}}$ satisfy

$$\tilde{y}_t=y_t-\mu-\frac{\theta[1+\theta^2+\theta^4+\cdots+\theta^{2(t-2)}]}{1+\theta^2+\theta^4+\cdots+\theta^{2(t-1)}}\,\tilde{y}_{t-1}. \qquad[5.4.11]$$

The vector $\tilde{\mathbf{y}}$ can thus be calculated by iterating on [5.4.11] for $t=2,3,\ldots,T$, starting from $\tilde{y}_1=y_1-\mu$. The variable $\tilde{y}_t$ has the interpretation as the residual from a linear projection of $y_t$ on a constant and $y_{t-1},y_{t-2},\ldots,y_1$, while the $t$th diagonal element of $\mathbf{D}$ gives the MSE of this linear projection:

$$d_{tt}=E(\tilde{y}_t^2)=\sigma^2\,\frac{1+\theta^2+\theta^4+\cdots+\theta^{2t}}{1+\theta^2+\theta^4+\cdots+\theta^{2(t-1)}}. \qquad[5.4.12]$$

Since $\mathbf{D}$ is diagonal, its determinant is the product of the terms along the principal diagonal,

$$|\mathbf{D}|=\prod_{t=1}^{T}d_{tt}, \qquad[5.4.13]$$

while the inverse of $\mathbf{D}$ is obtained by taking reciprocals of the terms along the principal diagonal. Hence,

$$\tilde{\mathbf{y}}'\mathbf{D}^{-1}\tilde{\mathbf{y}}=\sum_{t=1}^{T}\frac{\tilde{y}_t^2}{d_{tt}}. \qquad[5.4.14]$$

Substituting [5.4.13] and [5.4.14] into [5.4.10], the likelihood function is

$$f_{\mathbf{Y}}(\mathbf{y};\boldsymbol{\theta})=(2\pi)^{-T/2}\left[\prod_{t=1}^{T}d_{tt}\right]^{-1/2}\exp\!\left[-\sum_{t=1}^{T}\frac{\tilde{y}_t^2}{2d_{tt}}\right], \qquad[5.4.15]$$

and the log likelihood is

$$\mathcal{L}(\boldsymbol{\theta})=\log f_{\mathbf{Y}}(\mathbf{y};\boldsymbol{\theta})=-\frac{T}{2}\log(2\pi)-\frac{1}{2}\sum_{t=1}^{T}\log d_{tt}-\frac{1}{2}\sum_{t=1}^{T}\frac{\tilde{y}_t^2}{d_{tt}}. \qquad[5.4.16]$$

Given numerical values for $\mu$, $\theta$, and $\sigma^2$, the sequence $\tilde{y}_t$ is calculated by iterating on [5.4.11] starting with $\tilde{y}_1=y_1-\mu$, while $d_{tt}$ is given by [5.4.12].

In contrast to the conditional log likelihood function [5.4.5], expression [5.4.16] will be valid regardless of whether $\theta$ is associated with an invertible MA(1) representation. The value of [5.4.16] at $\theta=\tilde{\theta}$, $\sigma^2=\tilde{\sigma}^2$ will be identical to its value at $\theta=1/\tilde{\theta}$, $\sigma^2=\tilde{\theta}^2\tilde{\sigma}^2$; see Exercise 5.1.
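To close the section, here is a sketch (not part of the original text) of the exact MA(1) log likelihood [5.4.16] computed through the prediction-error recursion [5.4.11] and the variances [5.4.12]. NumPy is assumed and the function name is illustrative.

```python
import numpy as np

def exact_ma1_loglik(y, mu, theta, sigma2):
    """Exact log likelihood [5.4.16] of a Gaussian MA(1), valid even if |theta| > 1."""
    y = np.asarray(y, dtype=float)
    T = y.size
    ll = -(T / 2) * np.log(2 * np.pi)
    ytilde = y[0] - mu            # prediction error for t = 1
    s_prev = 1.0                  # 1 + theta^2 + ... + theta^{2(t-1)} evaluated at t = 1
    for t in range(1, T + 1):
        s = s_prev + theta ** (2 * t)          # 1 + theta^2 + ... + theta^{2t}
        d = sigma2 * s / s_prev                # d_tt from [5.4.12]
        ll += -0.5 * np.log(d) - ytilde**2 / (2 * d)
        if t < T:
            # recursion [5.4.11] gives the next prediction error
            ytilde = y[t] - mu - (theta * s_prev / s) * ytilde
            s_prev = s
    return ll
```

A quick numerical check of the invertibility remark above: for any sample, `exact_ma1_loglik(y, mu, th, s2)` and `exact_ma1_loglik(y, mu, 1/th, th**2 * s2)` should agree up to rounding error, consistent with Exercise 5.1.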
