
Basic time series concepts: ARMA and ARIMA

Univariate Time Series Models

In Univariate Time Series models we attempt to predict a variable using only information
contained in its past values. (i.e. let the data speak for themselves)

Stochastic Process: a sequence of random variables Y1, Y2, ..., YT. The observed values of a time series are considered a realization of the stochastic process (analogy between population and sample).

A Strictly Stationary Process


A strictly stationary process is one where the joint distribution of Y1, Y2, ..., YT is the same as that of Y1-k, Y2-k, ..., YT-k, i.e. its properties remain invariant to a time displacement. All the moments are time invariant.

A Weakly Stationary Process


If a series satisfies the next three equations, it is said to be weakly or covariance stationary:
1. E(yt) = μ, t = 1, 2, ...
2. E[(yt - μ)²] = σ² < ∞
3. E[(yt1 - μ)(yt2 - μ)] = γ(t2-t1), ∀ t1, t2

A stationary series with zero mean [figure]
Non-stationarity due to changing mean [figure]
Non-stationarity due to changing variance [figure]

Non-stationarity in autocorrelations as well as in variance:
A driftless random walk Xt = Xt-1 + N(0,9) [figure: simulated series, 1000 observations]

Non-stationarity in autocorrelations as well as in mean and variance:
A random walk with drift Xt = 0.2 + Xt-1 + N(0,9) [figure: simulated series, 1000 observations]

Non-stationarity due to mean and variance: real data [figure]
Source: Mukherjee et al. (1998), Econometrics and Data Analysis for Developing Countries.
A log transformation can be used to remove non-stationarity in variance.

Why is stationarity required?

For a stochastic process Y1, Y2,..YT we need to estimate:


T means E(Y1), E(Y2), . . .E(YT)
T variances V(Y1), V(Y2), . . .V(YT)
T(T-1)/2 covariances Cov(Yi,Yj), i<j
In all, 2T + T(T-1)/2 = T(T+3)/2 parameters
We only have T time series observations

Some simplifying assumptions are needed to reduce the number of parameters to be estimated.
A simplification comes from the stationarity assumption.
10

Univariate Time Series Models (contd)

So if the process is covariance stationary, all the variances are the same and all the covariances depend only on the difference between t1 and t2. The moments
E[(yt - E(yt))(yt+s - E(yt+s))] = γs, s = 0, 1, 2, ...
are known as the covariance function.
The covariances, γs, are known as autocovariances.
However, the value of the autocovariances depends on the units of measurement of yt.
It is thus more convenient to use the autocorrelations, which are the autocovariances normalised by dividing by the variance:
τs = γs / γ0, s = 0, 1, 2, ...

If we plot τs against s = 0, 1, 2, ... then we obtain the autocorrelation function or correlogram.
11

A White Noise Process

A white noise process is one with (virtually) no discernible structure. A definition of a white noise process is:
E(yt) = μ
Var(yt) = σ²
γt-r = σ² if t = r, and 0 otherwise
Thus the autocorrelation function is zero for s ≥ 1.
The sample autocorrelation coefficients are approximately distributed as N(0, 1/T), where T = sample size.

We can use this to test whether any autocorrelation coefficient is significantly different from zero and to construct a confidence interval.

For example, a 95% confidence interval would be given by ±1.96 × 1/√T. If the sample autocorrelation coefficient, τ̂s, falls outside this region for any value of s, then we reject the null hypothesis that the true value of the coefficient at lag s is zero.
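As a quick illustration (a minimal sketch in Python with numpy and statsmodels, which are not part of these slides; the series is simulated white noise purely for demonstration), the sample ACF and the ±1.96/√T bands can be computed as follows:

import numpy as np
from statsmodels.tsa.stattools import acf

# Simulated white noise purely for illustration; replace 'y' with any observed series.
rng = np.random.default_rng(0)
y = rng.normal(size=200)          # T = 200 observations
T = len(y)

tau_hat = acf(y, nlags=10)        # sample autocorrelations, tau_hat[0] = 1 at lag 0
band = 1.96 / np.sqrt(T)          # 95% band under the null of no autocorrelation

for s in range(1, 11):
    verdict = "significant" if abs(tau_hat[s]) > band else "not significant"
    print(f"lag {s:2d}: tau_hat = {tau_hat[s]:+.3f} ({verdict})")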

12

Joint Hypothesis Tests

We can also test the joint hypothesis that the first m autocorrelation coefficients are simultaneously equal to zero using the Q-statistic developed by Box and Pierce:
Q = T Σ(k=1..m) τ̂k²
where T = sample size, m = maximum lag length.

The Q-statistic is asymptotically distributed as χ²(m).
However, the Box-Pierce test has poor small sample properties, so a variant has been developed, called the Ljung-Box statistic:
Q* = T(T+2) Σ(k=1..m) τ̂k² / (T-k) ~ χ²(m)

This statistic is very useful as a portmanteau (general) test of linear dependence in time series.
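A hedged sketch of how both statistics can be obtained in Python with statsmodels (the series y is again simulated only for illustration; acorr_ljungbox is statsmodels' implementation of these tests):

import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox

y = np.random.default_rng(0).normal(size=200)        # any series of interest
result = acorr_ljungbox(y, lags=[10], boxpierce=True)
print(result)   # columns: lb_stat, lb_pvalue (Ljung-Box) and bp_stat, bp_pvalue (Box-Pierce)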
13

An ACF Example

Question:
Suppose that a researcher has estimated the first 5 autocorrelation coefficients using a series of length 100 observations, and found them to be (from lag 1 to 5): 0.207, -0.013, 0.086, 0.005, -0.022.
Test each of the individual coefficients for significance, and use both the Box-Pierce and Ljung-Box tests to establish whether they are jointly significant.

Solution:
A coefficient would be significant if it lies outside (-0.196, +0.196) at the 5% level, so only the first autocorrelation coefficient is significant.
Q = 5.09 and Q* = 5.26
Compared with a tabulated χ²(5) = 11.1 at the 5% level, the 5 coefficients are jointly insignificant [p-value = P(χ²(5) > 5.09) ≈ 0.40].
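The arithmetic can be checked directly (a small Python sketch using the figures given in the question; scipy is assumed for the p-value):

import numpy as np
from scipy.stats import chi2

T = 100
tau = np.array([0.207, -0.013, 0.086, 0.005, -0.022])
k = np.arange(1, 6)

Q      = T * np.sum(tau**2)                          # Box-Pierce statistic
Q_star = T * (T + 2) * np.sum(tau**2 / (T - k))      # Ljung-Box statistic

print(round(Q, 2), round(Q_star, 2))                 # approx. 5.09 and 5.26
print(round(chi2.sf(Q, df=5), 2))                    # p-value approx. 0.40, well above 0.05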
14

Moving Average Processes

Some economic hypotheses lead to a moving average time series structure. Changes in the price of a stock from one day to the next behave as a series of uncorrelated random variables with zero mean and constant variance,
i.e. yt = Pt - Pt-1 = ut, t = 1, 2, ..., T
[ut is an uncorrelated random variable]
The random component ut reflects unexpected news, e.g. new information about the financial health of a corporation, the popularity of a product suddenly rising or falling (due to reports of desirable or undesirable effects), the emergence of new competitors, the revelation of a management scandal, etc.
But suppose that the full impact of any unexpected news is not completely absorbed by the market in one day. Then the price change the next day might be
yt+1 = ut+1 + θut
where ut+1 is the effect of new information received during day t+1 and θut reflects the continuing assessment of day t news.
The equation above is a moving average process. The value of the economic variable, yt+1, is a weighted combination of current and past period random disturbances.
15

Moving Average Processes

Let ut (t = 1, 2, 3, ...) be a sequence of independently and identically distributed (iid) random variables with E(ut) = 0 and Var(ut) = σ². Then
yt = μ + ut + θ1ut-1 + θ2ut-2 + ... + θqut-q
is a qth order moving average model, MA(q).

Its properties are:
E(yt) = μ;  Var(yt) = γ0 = (1 + θ1² + θ2² + ... + θq²)σ²
Covariances:
γs = (θs + θs+1θ1 + θs+2θ2 + ... + θqθq-s)σ²  for s = 1, 2, ..., q
γs = 0  for s > q

16

Example of an MA Problem

1. Consider the following MA(2) process:
Xt = ut + θ1ut-1 + θ2ut-2
where ut is a zero mean white noise process with variance σ².
(i) Calculate the mean and variance of Xt.
(ii) Derive the autocorrelation function for this process (i.e. express the autocorrelations τ1, τ2, ... as functions of the parameters θ1 and θ2).
(iii) If θ1 = -0.5 and θ2 = 0.25, sketch the acf of Xt.

17

Solution
(i) If E(ut) = 0, then E(ut-i) = 0 ∀ i.
So
E(Xt) = E(ut + θ1ut-1 + θ2ut-2) = E(ut) + θ1E(ut-1) + θ2E(ut-2) = 0  (why?)

Var(Xt) = E[Xt - E(Xt)][Xt - E(Xt)]
but E(Xt) = 0, so
Var(Xt) = E[(Xt)(Xt)]
= E[(ut + θ1ut-1 + θ2ut-2)(ut + θ1ut-1 + θ2ut-2)]
= E[ut² + θ1²ut-1² + θ2²ut-2² + cross-products]

But E[cross-products] = 0 since Cov(ut, ut-s) = 0 for s ≠ 0.  (why?)
18

Solution (contd)
So Var(Xt) = γ0 = E[ut² + θ1²ut-1² + θ2²ut-2²]
= σ² + θ1²σ² + θ2²σ²  (why?)
= (1 + θ1² + θ2²)σ²

(ii) The acf of Xt.


γ1 = E[Xt - E(Xt)][Xt-1 - E(Xt-1)]  (first order autocovariance)
= E[Xt][Xt-1]
= E[(ut + θ1ut-1 + θ2ut-2)(ut-1 + θ1ut-2 + θ2ut-3)]
= E[θ1ut-1² + θ1θ2ut-2²]
= θ1σ² + θ1θ2σ²
= (θ1 + θ1θ2)σ²

19

Solution (contd)
γ2 = E[Xt - E(Xt)][Xt-2 - E(Xt-2)]  (second order autocovariance)
= E[Xt][Xt-2]
= E[(ut + θ1ut-1 + θ2ut-2)(ut-2 + θ1ut-3 + θ2ut-4)]
= E[θ2ut-2²]
= θ2σ²

γ3 = E[Xt - E(Xt)][Xt-3 - E(Xt-3)]
= E[Xt][Xt-3]
= E[(ut + θ1ut-1 + θ2ut-2)(ut-3 + θ1ut-4 + θ2ut-5)]
= 0

So γs = 0 for s > 2.
20

Solution (contd)
We have the autocovariances; now calculate the autocorrelations:
τ0 = γ0/γ0 = 1
τ1 = γ1/γ0 = (θ1 + θ1θ2)σ² / [(1 + θ1² + θ2²)σ²] = (θ1 + θ1θ2) / (1 + θ1² + θ2²)
τ2 = γ2/γ0 = θ2σ² / [(1 + θ1² + θ2²)σ²] = θ2 / (1 + θ1² + θ2²)
τ3 = γ3/γ0 = 0
τs = γs/γ0 = 0  ∀ s > 2

(iii) For θ1 = -0.5 and θ2 = 0.25, substituting these into the formulae above gives τ1 = -0.476, τ2 = 0.190.
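These values are easy to verify numerically (a minimal Python sketch; statsmodels' ArmaProcess is used only to simulate a long MA(2) sample and is not part of the slides):

import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.tsa.stattools import acf

theta1, theta2 = -0.5, 0.25
denom = 1 + theta1**2 + theta2**2
print(round((theta1 + theta1 * theta2) / denom, 3))   # tau1 = -0.476
print(round(theta2 / denom, 3))                       # tau2 =  0.190

# Cross-check: sample ACF of a long simulated MA(2) series with these parameters.
y = ArmaProcess(ar=[1], ma=[1, theta1, theta2]).generate_sample(nsample=200_000)
print(np.round(acf(y, nlags=3), 3))                   # close to [1, -0.476, 0.190, 0]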
21

ACF Plot
Thus the ACF plot will appear as follows:
[Figure: ACF plot of the MA(2) process against lag s, with τ0 = 1, τ1 = -0.476, τ2 = 0.190 and τs = 0 for s > 2]
22

Autoregressive Processes
Economic activity takes time to slow down and speed up: there is a built-in inertia in economic series. A simple process that captures this inertia is the first order autoregressive process
yt = μ + φ1yt-1 + ut
where μ is an intercept parameter and it is assumed that -1 < φ1 < 1;
ut is an uncorrelated random error with mean zero and variance σ².
yt is seen to comprise two parts (in addition to the intercept):
i. φ1yt-1, a carry-over component depending on last period's value of y;
ii. ut, a new shock to the level of the economic variable in the current period.

23

Autoregressive Processes

An autoregressive model of order p, an AR(p), can be expressed as
yt = μ + φ1yt-1 + φ2yt-2 + ... + φpyt-p + ut

Or, using the lag operator notation:
Lyt = yt-1
L^i yt = yt-i

yt = μ + Σ(i=1..p) φi yt-i + ut
or yt = μ + Σ(i=1..p) φi L^i yt + ut
or φ(L)yt = μ + ut
where φ(L) = 1 - (φ1L + φ2L² + ... + φpL^p).
24

The Stationary Condition for an AR Model

The condition for stationarity of a general AR(p) model is that the roots of the lag polynomial
1 - φ1L - φ2L² - ... - φpL^p = 0
all lie outside the unit circle, i.e. have absolute value greater than one.

A stationary AR(p) model is required for it to have an MA(∞) representation.

Example 1: Is yt = yt-1 + ut stationary?
The characteristic root is 1, so it is a unit root process (and hence non-stationary).
(simulation exercise, acf, pacf)

Example 2: Is yt = 1.2yt-1 - 0.32yt-2 + ut stationary?
The characteristic polynomial is
1 - 1.2L + 0.32L² = 0, i.e. 0.32L² - 1.2L + 1 = 0
The characteristic roots are 2.5 and 1.25, both outside the unit circle, so the process is stationary.
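A quick numerical check of Example 2 (a small Python sketch; numpy's root finder simply stands in for solving the quadratic by hand):

import numpy as np

# Roots of the lag polynomial 1 - 1.2L + 0.32L^2 = 0 for yt = 1.2yt-1 - 0.32yt-2 + ut.
# numpy.roots takes coefficients from the highest power of L downwards.
roots = np.roots([0.32, -1.2, 1.0])
print(roots)                             # [2.5, 1.25]
print(all(abs(r) > 1 for r in roots))    # True -> both roots outside the unit circle -> stationary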

25

Wold's Decomposition Theorem

States that any stationary series can be decomposed into the sum of two unrelated processes, a purely deterministic part and a purely stochastic part, which will be an MA(∞).

For the AR(p) model φ(L)yt = ut (ignoring the intercept), the Wold decomposition is
yt = ψ(L)ut
where
ψ(L) = (1 - φ1L - φ2L² - ... - φpL^p)^(-1)

26

Sample AR Problem

Consider the following simple AR(1) model
yt = μ + φ1yt-1 + ut
(i) Calculate the (unconditional) mean of yt.
For the remainder of the question, set μ = 0 for simplicity.
(ii) Calculate the (unconditional) variance of yt.
(iii) Derive the autocorrelation function for yt.

27

Solution

(i) Unconditional mean: assume that -1 < φ1 < 1 so that the AR(1) process is stationary.
Stationarity implies that the mean and variance are the same for all yt, t = 1, 2, ...
E(yt) = E(μ + φ1yt-1)
= μ + φ1E(yt)
so E(yt) - φ1E(yt) = μ
E(yt)(1 - φ1) = μ
E(yt) = μ / (1 - φ1)
28

Solution (contd)

(ii) Calculating the variance of yt:
From Wold's decomposition theorem:
yt(1 - φ1L) = ut
yt = (1 - φ1L)^(-1) ut
yt = (1 + φ1L + φ1²L² + ...) ut
yt = ut + φ1ut-1 + φ1²ut-2 + ...

Var(yt) = E[yt - E(yt)][yt - E(yt)]
but E(yt) = 0, since we are setting μ = 0.
Var(yt) = E[(yt)(yt)]

29

Solution (contd)
Var(yt) = E[(ut + φ1ut-1 + φ1²ut-2 + ...)(ut + φ1ut-1 + φ1²ut-2 + ...)]
= E[ut² + φ1²ut-1² + φ1⁴ut-2² + ... + cross-products]
= σu² + φ1²σu² + φ1⁴σu² + ...
= σu²(1 + φ1² + φ1⁴ + ...)
= σu² / (1 - φ1²)

30

Solution (contd)

(iii) Turning now to calculating the acf, first calculate the autocovariances:
γ1 = Cov(yt, yt-1) = E[yt - E(yt)][yt-1 - E(yt-1)]
Since μ has been set to zero, E(yt) = 0 and E(yt-1) = 0, so
γ1 = E[yt yt-1]
γ1 = E[(ut + φ1ut-1 + φ1²ut-2 + ...)(ut-1 + φ1ut-2 + φ1²ut-3 + ...)]
= E[φ1ut-1² + φ1³ut-2² + ... + cross-products]
= φ1σ² + φ1³σ² + φ1⁵σ² + ...
= φ1σ² / (1 - φ1²)
(make a bivariate table for understanding the product of the brackets)
31

Solution (contd)
For the second autocovariance,
γ2 = Cov(yt, yt-2) = E[yt - E(yt)][yt-2 - E(yt-2)]
Using the same rules as applied above for the lag 1 covariance,
γ2 = E[yt yt-2]
= E[(ut + φ1ut-1 + φ1²ut-2 + ...)(ut-2 + φ1ut-3 + φ1²ut-4 + ...)]
= E[φ1²ut-2² + φ1⁴ut-3² + ... + cross-products]
= φ1²σ² + φ1⁴σ² + ...
= φ1²σ²(1 + φ1² + φ1⁴ + ...)
= φ1²σ² / (1 - φ1²)
32

Solution (contd)

If these steps were repeated for γ3, the following expression would be obtained:
γ3 = φ1³σ² / (1 - φ1²)
and for any lag s, the autocovariance is given by
γs = φ1^s σ² / (1 - φ1²)

The acf can now be obtained by dividing the covariances by the variance:
33

Solution (contd)
τ0 = γ0/γ0 = 1
τ1 = γ1/γ0 = [φ1σ²/(1 - φ1²)] / [σ²/(1 - φ1²)] = φ1
τ2 = γ2/γ0 = [φ1²σ²/(1 - φ1²)] / [σ²/(1 - φ1²)] = φ1²
τ3 = φ1³
...
τs = φ1^s
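A minimal simulation check of this result (Python sketch; the parameter value is illustrative):

import numpy as np
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(1)
phi, T = 0.5, 100_000
y = np.zeros(T)
for t in range(1, T):                      # simulate yt = phi*yt-1 + ut
    y[t] = phi * y[t - 1] + rng.normal()

print(np.round(acf(y, nlags=5), 3))        # sample ACF
print(np.round(phi ** np.arange(6), 3))    # theoretical tau_s = phi**s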
34

The Partial Autocorrelation Function (denoted τkk)

Measures the correlation between an observation k periods ago and the current observation, after controlling for observations at intermediate lags (i.e. all lags < k).

So τkk measures the correlation between yt and yt-k after removing the effects of yt-k+1, yt-k+2, ..., yt-1.

At lag 1, the acf = pacf always.

At lag 2, τ22 = (τ2 - τ1²) / (1 - τ1²)

For lags 3 and above, the formulae are more complex.

35

The Partial Autocorrelation Function (denoted τkk) (contd)

The pacf is useful for telling the difference between an AR process and an ARMA process.

In the case of an AR(p), there are direct connections between yt and yt-s only for s ≤ p.

So for an AR(p), the theoretical pacf will be zero after lag p.

In the case of an MA(q), this can be written as an AR(∞), so there are direct connections between yt and all its previous values.

For an MA(q), the theoretical pacf will be geometrically declining.


36

ARMA Processes

By combining the AR(p) and MA(q) models, we can obtain an ARMA(p, q) model:
φ(L)yt = μ + θ(L)ut
where φ(L) = 1 - φ1L - φ2L² - ... - φpL^p
and θ(L) = 1 + θ1L + θ2L² + ... + θqL^q
or
yt = μ + φ1yt-1 + φ2yt-2 + ... + φpyt-p + θ1ut-1 + θ2ut-2 + ... + θqut-q + ut
with E(ut) = 0; E(ut²) = σ²; E(ut us) = 0, t ≠ s

37

The Invertibility Condition

Similar to the stationarity condition, we typically require the MA(q) part of the model to have roots of θ(z) = 0 greater than one in absolute value.
An invertible MA(q) process can be expressed as an infinite order AR process.
The mean of an ARMA series is given by
E(yt) = μ / (1 - φ1 - φ2 - ... - φp)

The autocorrelation function for an ARMA process will display combinations of behaviour derived from the AR and MA parts, but for lags beyond q, the acf will simply be identical to that of the individual AR(p) model.

38

Summary of the Behaviour of the acf for AR and MA Processes

An autoregressive process has


a geometrically decaying acf
number of spikes of pacf = AR order
A moving average process has
Number of spikes of acf = MA order
a geometrically decaying pacf

39

Summary of the Behaviour of the ACF and PACF

40

Can you identify the appropriate ARIMA model from this PACF?

41

Does a first or a second difference need to be performed?

42

Some sample acf and pacf plots for standard processes
The acf and pacf are not produced analytically from the relevant formulae for a model of that
type, but rather are estimated using 100,000 simulated observations with disturbances drawn
from a normal distribution.
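Before looking at the plots, here is a hedged sketch of how such simulated ACF/PACF plots can be produced (Python with statsmodels and matplotlib; the slides do not specify the software actually used, so this is an equivalent, not the original procedure):

import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Example: MA(1) with theta = 0.5, i.e. yt = 0.5ut-1 + ut.
# ArmaProcess takes the full lag polynomials: ar = [1], ma = [1, 0.5].
proc = ArmaProcess(ar=[1], ma=[1, 0.5])
y = proc.generate_sample(nsample=100_000)     # 100,000 simulated observations

fig, axes = plt.subplots(2, 1, figsize=(8, 6))
plot_acf(y, lags=10, ax=axes[0], zero=False)
plot_pacf(y, lags=10, ax=axes[1], zero=False)
plt.show()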
ACF and PACF for an MA(1) Model: yt = 0.5ut-1 + ut
[Figure: sample ACF and PACF, lags 1 to 10]

43

ACF and PACF for an MA(2) Model: yt = 0.5ut-1 - 0.25ut-2 + ut
[Figure: sample ACF and PACF, lags 1 to 10]

44

ACF and PACF for a slowly decaying AR(1) Model: yt = 0.9yt-1 + ut
[Figure: sample ACF and PACF, lags 1 to 10]

45

ACF and PACF for a more rapidly decaying AR(1) Model: yt = 0.5yt-1 + ut
[Figure: sample ACF and PACF, lags 1 to 10]

46

ACF and PACF for a more rapidly decaying AR(1) Model with Negative Coefficient: yt = -0.5yt-1 + ut
[Figure: sample ACF and PACF, lags 1 to 10]

47

ACF and PACF for a Non-stationary Model (i.e. a unit coefficient): yt = yt-1 + ut
[Figure: sample ACF and PACF, lags 1 to 10]

48

ACF and PACF for an ARMA(1,1) Model: yt = 0.5yt-1 + 0.5ut-1 + ut
[Figure: sample ACF and PACF, lags 1 to 10]

49

Building ARMA Models


- The Box-Jenkins Approach

Box and Jenkins (1970) were the first to approach the task of estimating an
ARMA model in a systematic manner. There are 3 steps to their approach:
1. Identification
2. Estimation
3. Model diagnostic checking

Step 1:
- Involves determining the order of the model.
- Use of graphical procedures
- A better procedure is now available

50

Building ARMA Models


- The Box-Jenkins Approach (contd)
Step 2:
- Estimation of the parameters
- AR models can be estimated by least squares, while MA and mixed (ARMA/ARIMA) models involve parameters that enter non-linearly and are estimated iteratively by maximum likelihood.
Step 3:
- Model checking
Box and Jenkins suggest 2 methods:
- deliberate overfitting
- residual diagnostics
51

Estimation of ARIMA models

Consider the MA(1) model yt = μ + ut + θ1ut-1.
Box and Jenkins suggest a grid search procedure:
Estimate μ̂ = x̄, and obtain a starting value for θ1 by equating the first sample and population autocorrelations, r1 = θ1/(1 + θ1²). Using these as starting values and assuming u0 = 0, compute the residuals by recursive substitution as follows:
û1 = X1 - μ̂
ût = (Xt - μ̂) - θ1ût-1,  t ≥ 2
Compute the error sum of squares Σût² for each set of parameter values in a suitable range (grid). Point estimates of the parameters are obtained where the error sum of squares is minimized.
If the ut are assumed normally distributed, the maximum likelihood estimates are the same as LS. Formulae for the asymptotic variances of the ML estimators can be applied to compute standard errors and confidence intervals. More complex models can be estimated similarly.
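A minimal sketch of this grid-search (conditional sum of squares) idea in Python; the simulated series, the grid spacing, and the assumption u0 = 0 are illustrative rather than the exact Box-Jenkins recipe:

import numpy as np

def css(y, mu, theta):
    """Conditional sum of squares for an MA(1), assuming u0 = 0."""
    u = np.zeros(len(y))
    u[0] = y[0] - mu
    for t in range(1, len(y)):
        u[t] = y[t] - mu - theta * u[t - 1]
    return np.sum(u**2)

# Simulate an MA(1) with mu = 1.0 and theta = 0.6 purely for illustration.
rng = np.random.default_rng(2)
e = rng.normal(size=500)
y = 1.0 + e[1:] + 0.6 * e[:-1]

mu_hat = y.mean()                                  # E(y) = mu for an MA(1)
grid = np.arange(-0.95, 0.96, 0.01)
theta_hat = min(grid, key=lambda th: css(y, mu_hat, th))
print(round(theta_hat, 2))                         # should be close to 0.6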
52

Some More Recent Developments in


ARMA Modelling

Identification would typically not be done using acfs.

We want to form a parsimonious model.

Reasons:
- variance of estimators is inversely proportional to the number of degrees of
freedom.
- profligate models might be inclined to fit sample-specific features of the data

This gives motivation for using information criteria, which embody 2 factors
- a term which is a function of the RSS
- some penalty for adding extra parameters

The object is to choose the number of parameters which minimises the


information criterion.

53

Information Criteria for Model Selection

The information criteria vary according to how stiff the penalty term is.
The three most popular criteria are Akaike's (1974) information criterion (AIC), Schwarz's (1978) Bayesian information criterion (SBIC), and the Hannan-Quinn criterion (HQIC).
AIC = ln(σ̂²) + 2k/T
SBIC = ln(σ̂²) + (k/T) ln T
HQIC = ln(σ̂²) + (2k/T) ln(ln(T))
where k = p + q + 1, T = sample size. So we minimize the IC subject to p ≤ p̄, q ≤ q̄.
SBIC embodies a stiffer penalty term than AIC.
Which IC should be preferred if they suggest different model orders?
SBIC is strongly consistent (but inefficient).
AIC is not consistent, and will typically pick bigger models.
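A hedged sketch of IC-based order selection in Python with statsmodels (the simulated series and the order grid are illustrative; statsmodels computes AIC/BIC from the log-likelihood rather than from ln(σ̂²), so the scale differs from the formulae above, but the ranking logic is the same):

import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.arima_process import ArmaProcess

# Simulated ARMA(1,1) data, phi = 0.5, theta = 0.3, purely for illustration.
y = ArmaProcess(ar=[1, -0.5], ma=[1, 0.3]).generate_sample(nsample=500)

results = {}
for p in range(3):
    for q in range(3):
        fit = ARIMA(y, order=(p, 0, q)).fit()
        results[(p, q)] = (fit.aic, fit.bic)

print("best (p, q) by AIC:", min(results, key=lambda k: results[k][0]))
print("best (p, q) by BIC:", min(results, key=lambda k: results[k][1]))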
54

ARIMA Models

The Box-Jenkins approach assumes that the variable to be modelled is stationary.
ARIMA is distinct from ARMA models. The I stands for integrated.

An integrated autoregressive process is one with a characteristic root


on the unit circle. (i.e. a non-stationary process)

Typically researchers difference the variable as necessary and then


build an ARMA model on those differenced variables.

An ARMA(p,q) model in the variable differenced d times is equivalent


to an ARIMA(p,d,q) model on the original data.
55

Forecasting in Econometrics

Forecasting = prediction.
It is an important test of the adequacy of a model, e.g.:
Forecasting tomorrow's return on a particular share
Forecasting the price of a house given its characteristics
Forecasting the riskiness of a portfolio over the next year
Forecasting the volatility of bond returns

We can distinguish two approaches:


- Econometric (structural) forecasting
- Time series forecasting

The distinction between the two types is somewhat blurred (e.g, VARs).

56

In-Sample Versus Out-of-Sample

Expect the forecast of the model to be good in-sample.

Say we have some data, e.g. monthly KSE-100 index returns for 120 months: 1990M1 - 1999M12. We could use all of it to build the model, or keep some observations back:

A good test of the model since we have not used the information from
1999M1 onwards when we estimated the model parameters.

57

How to produce forecasts

Multi-step ahead versus single-step ahead forecasts

Recursive versus rolling windows

To understand how to construct forecasts, we need the idea of conditional


expectations: E(yt+1 | Ωt)

We cannot forecast a white noise process: E(ut+s | Ωt) = 0 ∀ s > 0.

The two simplest forecasting methods:
1. Assume no change: f(yt+s) = yt
2. Forecasts are the long-term average: f(yt+s) = ȳ
58

Models for Forecasting (contd)

Time Series Models


The current value of a series, yt, is modelled as a function only of its previous
values and the current value of an error term (and possibly previous values of
the error term).

Models include:
simple unweighted averages
exponentially weighted averages
ARIMA models
Non-linear models e.g. threshold models, GARCH, etc.
59

Forecasting with ARMA Models

The forecasting model typically used is of the form:
ft,s = μ + Σ(i=1..p) φi ft,s-i + Σ(j=1..q) θj ut+s-j
where ft,s = yt+s for s ≤ 0; ut+s = 0 for s > 0; and ut+s = ut+s for s ≤ 0.

60

Forecasting with MA Models

An MA(q) only has a memory of q periods.
e.g. say we have estimated an MA(3) model:
yt = μ + θ1ut-1 + θ2ut-2 + θ3ut-3 + ut
yt+1 = μ + θ1ut + θ2ut-1 + θ3ut-2 + ut+1
yt+2 = μ + θ1ut+1 + θ2ut + θ3ut-1 + ut+2
yt+3 = μ + θ1ut+2 + θ2ut+1 + θ3ut + ut+3

We are at time t and we want to forecast 1, 2, ..., s steps ahead.
We know yt, yt-1, ..., and ut, ut-1, ...


61

Forecasting with MA Models (contd)

ft,1 = E(yt+1 | Ωt)
= E(μ + θ1ut + θ2ut-1 + θ3ut-2 + ut+1)
= μ + θ1ut + θ2ut-1 + θ3ut-2

ft,2 = E(yt+2 | Ωt)
= E(μ + θ1ut+1 + θ2ut + θ3ut-1 + ut+2)
= μ + θ2ut + θ3ut-1

ft,3 = E(yt+3 | Ωt)
= E(μ + θ1ut+2 + θ2ut+1 + θ3ut + ut+3)
= μ + θ3ut

ft,4 = E(yt+4 | Ωt) = μ

ft,s = E(yt+s | Ωt) = μ,  ∀ s ≥ 4
62

Forecasting with AR Models

Say we have estimated an AR(2) model:
yt = μ + φ1yt-1 + φ2yt-2 + ut
yt+1 = μ + φ1yt + φ2yt-1 + ut+1
yt+2 = μ + φ1yt+1 + φ2yt + ut+2
yt+3 = μ + φ1yt+2 + φ2yt+1 + ut+3

ft,1 = E(yt+1 | Ωt) = E(μ + φ1yt + φ2yt-1 + ut+1)
= μ + φ1E(yt) + φ2E(yt-1)
= μ + φ1yt + φ2yt-1
ft,2 = E(yt+2 | Ωt) = E(μ + φ1yt+1 + φ2yt + ut+2)
= μ + φ1E(yt+1) + φ2E(yt)
= μ + φ1ft,1 + φ2yt
63

Forecasting with AR Models (contd)


ft,3 = E(yt+3 | Ωt) = E(μ + φ1yt+2 + φ2yt+1 + ut+3)
= μ + φ1E(yt+2) + φ2E(yt+1)
= μ + φ1ft,2 + φ2ft,1

We can see immediately that
ft,4 = μ + φ1ft,3 + φ2ft,2, etc., so
ft,s = μ + φ1ft,s-1 + φ2ft,s-2

We can easily generate ARMA(p, q) forecasts in the same way.
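This recursion is straightforward to code (a minimal Python sketch; the parameter and starting values are purely illustrative):

def ar2_forecasts(y_t, y_tm1, mu, phi1, phi2, steps):
    """Recursive AR(2) forecasts f_{t,s} = mu + phi1*f_{t,s-1} + phi2*f_{t,s-2},
    seeded with the last two observed values."""
    f = [y_tm1, y_t]
    for _ in range(steps):
        f.append(mu + phi1 * f[-1] + phi2 * f[-2])
    return f[2:]

# Example with illustrative values: mu = 0.1, phi1 = 0.5, phi2 = 0.2, yt = 1.3, yt-1 = 0.9.
print(ar2_forecasts(y_t=1.3, y_tm1=0.9, mu=0.1, phi1=0.5, phi2=0.2, steps=4))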


64

How can we test whether a forecast is accurate or not?


For example, say we predict that tomorrow's return on the FTSE will be 0.2, but the outcome is actually -0.4. Is this accurate? Define ft,s as the forecast made at time t for s steps ahead (i.e. the forecast made for time t+s), and yt+s as the realised value of y at time t+s.
Some of the most popular criteria for assessing the accuracy of time series forecasting techniques are:

MSE = (1/N) Σ(t=1..N) (yt+s - ft,s)²

MAE is given by
MAE = (1/N) Σ(t=1..N) |yt+s - ft,s|

Mean absolute percentage error:
MAPE = (100/N) Σ(t=1..N) |(yt+s - ft,s) / yt+s|
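These measures are easy to compute directly (a Python sketch; the actual and forecast values used here are the 1995-1999 GDP hold-out figures that appear in the Pakistan GDP example later in these slides, so the RMSE should come out close to the value reported there):

import numpy as np

actual   = np.array([534861.0, 570157.0, 579865.0, 600125.0, 625223.0])
forecast = np.array([536938.6, 569718.6, 584971.0, 615367.1, 648580.1])   # ARIMA(0,1,4)

err  = actual - forecast
mse  = np.mean(err**2)
mae  = np.mean(np.abs(err))
mape = 100 * np.mean(np.abs(err / actual))
rmse = np.sqrt(mse)
print(round(rmse, 2), round(mae, 2), round(mape, 2))   # RMSE approx. 12715.8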

65

Box-Jenkins Methodology Summarized

66

Illustrations of Box-Jenkins
methodology-I (Pak GDP forecasting)
Pakistan's real GDP at 1980-81 factor cost (Rs million)

Year   GDP       Year   GDP       Year   GDP
1961   82085     1975   180404    1989   403948
1962   86693     1976   186479    1990   422484
1963   92737     1977   191717    1991   446005
1964   98902     1978   206746    1992   480413
1965   108259    1979   218258    1993   487782
1966   115517    1980   233345    1994   509091
1967   119831    1981   247831    1995   534861
1968   128097    1982   266572    1996   570157
1969   135972    1983   284667    1997   579865
1970   148343    1984   295977    1998   600125
1971   149900    1985   321751    1999   625223
1972   153018    1986   342224
1973   163262    1987   362110
1974   174712    1988   385416

[Figure: time series plot of GDP, 1961-1999] An upward non-linear trend with some evidence of increasing variability.

67

Pakistan GDP forecasting


Stationarity and Identification
First difference of GDP still seems to have some trend, with high variability near the end of the sample. First difference of log GDP appears to be relatively trendless.
First difference of GDP [figure]: Var(d(GDP)) = 71614669
First difference of log(GDP) [figure]: Var(d(logGDP)) = 0.00039

68

Pakistan's GDP forecasting

Stationarity and Identification
Over-differencing needs to be avoided.
Second differences also appear to be stationary, with some outliers.
Second difference of GDP [figure]: Var(d(gdp),2) = 72503121
Second difference of log GDP [figure]: Var(d(log(gdp),2) = 0.00074

69

Stationarity and Identification

The GDP series appears to have very slowly decaying autocorrelations and a single spike at lag 1, possibly indicating that GDP is a random walk. First-differenced GDP has many significant autocorrelations, which can also be seen from the Ljung-Box stats and p-values.

70

Stationarity and Identification

The log of GDP has the same autocorrelation structure as GDP. The first difference of log(GDP) looks like white noise. Also look at the Q-stats and p-values.

71

Stationarity and Identification


Second differencing seems to be unnecessary, so we work with the first difference of log(GDP), i.e. d = 1. The ACF and PACF do not show any nice-looking theoretical pattern.

72

Stationarity and Identification

We will consider fitting several ARIMA(p,1,q) models:

ARIMA (p,d,q)    AIC      BIC
ARIMA (1,1,0)   -4.879   -4.792
ARIMA (4,1,0)   -4.932   -4.708
ARIMA (0,1,1)   -4.910   -4.824
ARIMA (0,1,4)   -5.370   -5.284
ARIMA (4,1,4)   -5.309   -5.174
ARIMA (5,1,5)   -5.249   -5.113
ARIMA (1,1,4)   -5.333   -5.202

ARIMA(0,1,4) is identified as the best model using the two model selection criteria. The smaller the value of the selection criterion, the better the in-sample fit.
73

Estimation of the models

Estimation output of the two best fitting models. For ARIMA(0,1,4) the fitted model is

(1 - L)yt = (1 + θ1L + θ2L² + θ3L³ + θ4L⁴)εt
(1 - L)yt = (1 - 0.104L + 0.165L² - 0.201L³ + 0.913L⁴)εt
74

Model Diagnostics
We look at the correlogram of the estimated model.
The residuals appear to be white noise. The p-values of the Q-stats of ARIMA(0,1,4) are smaller.

75

Forecasting: In-sample Estimation

To compare the out-of-sample performance of the competing forecasting models, we hold out the last few observations. In this case the out-of-sample performance will be compared using a 5-year hold-out sample, 1995-1999.
Re-estimate the model using the sample 1961-1994.
[Figure: observed GDP and fitted GDP from the ARIMA(0,1,4) model, 1961-1999. The ARIMA(0,1,4) model shows some underestimation near the end of the sample.]

76

Forecasting: In-sample Estimation

Similar underestimation is observed for the ARIMA(1,1,4) model.
We will select the forecasting model using out-of-sample accuracy measures, e.g. RMSE or MAPE, which EViews reports under the Forecasting tab.
RMSE = sqrt[(1/h) Σ (Y - Ŷ)²], where h is the number of hold-out observations.
[Figure: observed GDP and fitted GDP from the ARIMA(1,1,4) model, 1961-1999.]

77

Out-of-sample forecast evaluation

Using the two competing models, the forecasts are generated as follows:

Year    Observed    ARIMA(0,1,4)    ARIMA(1,1,4)
1995    534861.0    536938.6        539376.9
1996    570157.0    569718.6        570955.6
1997    579865.0    584971.0        587198.2
1998    600125.0    615367.1        61828.2
1999    625223.0    648580.1        652246.0
RMSE                12715.77        15064.7
Note: the static forecast option for dynamic models (e.g. ARIMA) in EViews uses actual values of the lagged dependent variable, while the dynamic forecast option uses previously forecasted values of the lagged dependent variable.
ARIMA(0,1,4) generates better forecasts, as seen from the smaller value of the RMSE.
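A hedged sketch of the same hold-out exercise in Python/statsmodels (this is not the EViews output reproduced above, so estimates and forecasts will differ somewhat; the GDP figures are taken from the table earlier in these slides):

import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

gdp = pd.Series(
    [82085, 86693, 92737, 98902, 108259, 115517, 119831, 128097, 135972, 148343,
     149900, 153018, 163262, 174712, 180404, 186479, 191717, 206746, 218258, 233345,
     247831, 266572, 284667, 295977, 321751, 342224, 362110, 385416, 403948, 422484,
     446005, 480413, 487782, 509091, 534861, 570157, 579865, 600125, 625223],
    index=range(1961, 2000), dtype=float)

train, hold_out = np.log(gdp.loc[:1994]), gdp.loc[1995:]

fit = ARIMA(train, order=(0, 1, 4)).fit()      # ARIMA(0,1,4) on log GDP, 1961-1994
fc = np.exp(fit.forecast(steps=5))             # dynamic 5-step-ahead forecasts, back in levels
rmse = np.sqrt(np.mean((hold_out.values - fc.values) ** 2))
print(fc.round(1))
print(round(rmse, 1))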

78

Box-Jenkins Method: Application II

(Airline Passenger Data)
The given data on the number of airline passengers have been analysed by several authors, including Box and Jenkins.

79

Airline Passenger Data


Stationarity and Identification
The time series plot indicates an upward trend with seasonality and increasing variability. A log transformation seems to stabilize the variance. Seasonality has to be modeled.
[Figure: number of airline passengers in thousands (left) and LOG(PASSENGERS) (right), 1949-1961]
80

Airline Passenger Data


Stationarity and Identification
First differencing eliminates the trend, but seasonality is still evident. A seasonal difference Yt - Yt-12 is also needed after the first difference. This is done in EViews as d(log(Yt),1,12). Both trend and seasonality then appear to be removed.
[Figure: D(LOG(PASSENGERS)) (left) and D(LOG(PASSENGERS),1,12) (right), 1949-1961]

81

Airline Passenger Data


Stationarity and Identification

Let's have a look at the ACF and PACF. The ACF and PACF of d(log(Yt),1,12) indicate some significant values at lags 1 and 12. We will do further work on d(log(Yt),1,12).

82

Airline Passenger Data: Identification


We will choose a suitable model using the AIC and BIC criteria, with a seasonal moving average SMA(12) or seasonal autoregressive SAR(12) term included. Both the AIC and BIC criteria point towards a mixed ARIMA(1,1,1) model with a seasonal moving average term of order 12.

Models                 AIC      BIC
MA(1) SMA(12)         -3.754   -3.689
AR(1) SMA(12)         -3.744   -3.678
AR(1) SAR(12)         -3.655   -3.585
MA(1) SAR(12)         -3.677   -3.609
AR(1) MA(1) SAR(12)   -3.656   -3.562
AR(1) MA(1) SMA(12)   -3.779   -3.691

83

Airline Passenger Data: Estimation


All the coefficients in the estimated model AR(1) MA(1) SMA(12) are significant. The estimated model in compact form is
(1 - φ1L)yt = (1 + θL)(1 + θwL¹²)εt, where yt = d(log(passengers),1,12)
with estimates
(1 - 0.661L)yt = (1 - 0.957L)(1 - 0.867L¹²)εt
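A hedged sketch of fitting this kind of specification in Python with statsmodels' SARIMAX (the file name 'airline.csv' is hypothetical, and the estimates need not match the EViews output above):

import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# 'airline.csv' is a hypothetical file holding the monthly passenger counts.
passengers = pd.read_csv("airline.csv", index_col=0, parse_dates=True).squeeze()

# ARIMA(1,1,1) x (0,1,1)_12 on log(passengers): regular and seasonal differences,
# AR(1), MA(1), and a seasonal MA term at lag 12 - the specification chosen above.
model = SARIMAX(np.log(passengers), order=(1, 1, 1), seasonal_order=(0, 1, 1, 12))
res = model.fit(disp=False)
print(res.summary())
print(np.exp(res.forecast(steps=12)).round(1))   # 12-month-ahead forecasts in levels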

84

Airline Passenger Data: Diagnostic Checks

After estimating the above model, use the EViews command ident resid. The residuals appear to be white noise.

85

Airline Passenger Data: Forecasting

Here is the graph of observed and fitted passengers. The forecasts are given in the table below:

Month      Forecast
1961.01    442.3
1961.02    429.45
1961.03    490.40
1961.04    484.82
1961.05    490.93
1961.06    560.17
1961.07    629.91
1961.08    626.91
1961.09    539.16
1961.10    474.11
1961.11    412.15
1961.12    462.14

[Figure: observed values of the number of passengers (PASSENGERS) and forecast for 1961 (PASSENGERSF), 1949-1961]

86
