
AUTOREGRESSIVE INTEGRATED MOVING AVERAGE (ARIMA)

Chirag Jain

CONTENTS

What is ARIMA?

Types of ARIMA models

When to use ARIMA?

How to apply ARIMA?

Special cases of ARIMA

Example: Tractor Sales

WHAT IS ARIMA?

A statistical analysis model that uses time series data to predict
future trends.

ARIMA models are an adaptation of discrete-time filtering methods
developed in the 1930s and 1940s by electrical engineers (Norbert
Wiener et al.). Statisticians George Box and Gwilym Jenkins developed
systematic methods for applying them to business and economic data in
the 1970s (hence the name Box-Jenkins models).

It is a form of regression analysis that seeks to predict future
movements along a seemingly random walk by examining the differences
between values in the series instead of the actual data values.

WHAT IS ARIMA CONTD.

A series that needs to be differenced to be made stationary is an
integrated (I) series.

Lags of the stationarized (differenced) series are called
autoregressive (AR) terms.

Lags of the forecast errors are called moving average (MA) terms.

TYPES OF ARIMA MODELS:

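Non-Seasonal ARIMA model:

ARIMA(p, d, q), where p is the number of autoregressive terms, d is
the number of differences needed for stationarity, and q is the number
of moving average terms.

Seasonal ARIMA model:

Seasonality in a time series is a regular pattern of changes that
repeats over S time periods, where S defines the number of time
periods until the pattern repeats again.

The seasonal ARIMA model incorporates both non-seasonal and seasonal
factors in a multiplicative model:
ARIMA(p, d, q)(P, D, Q)S, where P, D and Q are the seasonal
counterparts of p, d and q.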
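The notation above maps directly onto the arguments of base R's arima(). The sketch below is only an illustration: it uses the built-in AirPassengers series as a stand-in (an assumption, it is not the data used later in this deck) and picks the orders by hand.

```r
# Stand-in monthly series from base R (assumption: not the deck's data).
y <- log10(AirPassengers)

# ARIMA(p, d, q)(P, D, Q)S with p = 0, d = 1, q = 1, P = 0, D = 1, Q = 1, S = 12
fit <- arima(y,
             order    = c(0, 1, 1),                             # (p, d, q)
             seasonal = list(order = c(0, 1, 1), period = 12))  # (P, D, Q) and S

# fit$arma stores the specification as: p, q, P, Q, S, d, D
print(fit$arma)
print(coef(fit))  # one non-seasonal MA term (ma1) and one seasonal MA term (sma1)
```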

WHEN TO USE ARIMA?

Its main application is in the area of short term forecasting, and it
requires roughly 40 or more historical data points.

It works best when your data exhibits a stable or consistent pattern
over time with a minimum number of outliers.

ARIMA is usually superior to exponential smoothing techniques when the
data is reasonably long and the correlation between past observations
is stable.

If you have fewer than about 40 data points, you should consider some
method other than ARIMA.

HOW TO APPLY ARIMA?

The first step in applying ARIMA methodology is to check for
stationarity. "Stationarity" implies that the series remains at a
fairly constant level over time.

If a trend exists, as in most economic or business applications, then
your data is NOT stationary. The data should also show a constant
variance in its fluctuations over time.

Non-constant variance is easily seen in a series that is heavily
seasonal and growing at an increasing rate. In such a case, the ups
and downs in the seasonality become more dramatic over time.

Without these stationarity conditions being met, many of the
calculations associated with the process cannot be computed.
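One informal way to check the two conditions above is to compare the mean and variance of the first and second halves of a series. The sketch below does this on simulated data; the helper name half_summary and both simulated series are illustrative assumptions, and this is a rough screen rather than a formal stationarity test.

```r
set.seed(1)
n <- 200
stationary <- rnorm(n)                  # constant level, constant variance
trending   <- 0.05 * (1:n) + rnorm(n)   # level drifts upward over time

# Hypothetical helper: mean and variance of each half of the series
half_summary <- function(x) {
  h      <- length(x) %/% 2
  first  <- x[1:h]
  second <- x[(h + 1):length(x)]
  c(mean_first = mean(first), mean_second = mean(second),
    var_first  = var(first),  var_second  = var(second))
}

s_stat  <- half_summary(stationary)
s_trend <- half_summary(trending)
print(round(s_stat, 2))    # the two means should be close
print(round(s_trend, 2))   # the second-half mean should be clearly higher
```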

HOW TO APPLY ARIMA? CONTD.

Differencing is an excellent way of transforming a nonstationary
series to a stationary one.

This is done by subtracting the observation in the previous period
from the observation in the current period, i.e. X(t) - X(t-1).

If this transformation is done only once to a series, you say that the
data has been "first differenced".

This process essentially eliminates the trend if your series is
growing at a fairly constant rate.

If it is growing at an increasing rate, you can apply the same
procedure and difference the data again. Your data would then be
"second differenced".
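A small sketch of first and second differencing with base R's diff(). The two toy series are assumptions chosen so the effect is easy to see: one grows at a constant rate, the other at an increasing rate.

```r
time_index <- 1:10
linear     <- 5 + 2 * time_index   # grows at a constant rate
quadratic  <- time_index^2         # grows at an increasing rate

d1_linear <- diff(linear)                      # X(t) - X(t-1): trend removed
d1_quad   <- diff(quadratic)                   # still trending after one difference
d2_quad   <- diff(quadratic, differences = 2)  # second difference: trend removed

print(d1_linear)
print(d1_quad)
print(d2_quad)
```

Each round of differencing shortens the series by one observation, which is worth keeping in mind when the series is short.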

HOW TO APPLY ARIMA? CONTD.

"Autocorrelations" are numerical values that measure how strongly
data values a specified number of periods apart are correlated with
each other over time. The number of periods apart is usually called
the "lag".

For example, an autocorrelation at lag 1 measures how values 1 period
apart are correlated with one another throughout the series. An
autocorrelation at lag 2 measures how values 2 periods apart are
correlated throughout the series.

Autocorrelations may range from +1 to -1.

These measures are most often evaluated through graphical plots called
"correlograms".

A correlogram plots the autocorrelation values for a given series at
different lags. This is referred to as the "autocorrelation function"
and is very important in the ARIMA method.
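The sketch below computes autocorrelations at lags 1 to 5 twice: once with base R's acf() and once by hand, to show what the function measures. The simulated series and the helper lag_autocorr are illustrative assumptions.

```r
set.seed(42)
# Simulated series with strong positive dependence on the previous value
x <- as.numeric(arima.sim(model = list(ar = 0.8), n = 300))

# Hypothetical helper: sample autocorrelation at lag k
lag_autocorr <- function(x, k) {
  n <- length(x)
  m <- mean(x)
  sum((x[(k + 1):n] - m) * (x[1:(n - k)] - m)) / sum((x - m)^2)
}

r      <- acf(x, lag.max = 5, plot = FALSE)   # element 1 of r$acf is lag 0
manual <- sapply(1:5, function(k) lag_autocorr(x, k))
print(round(cbind(lag = 1:5, acf = r$acf[2:6], manual = manual), 3))
```

Calling acf(x) with the default plot = TRUE draws the correlogram itself.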

HOW TO APPLY ARIMA? CONTD.

ARIMA methodology attempts to describe the movements in a stationary
time series as a function of what are called "autoregressive and
moving average" parameters. These are referred to as AR parameters
(autoregressive) and MA parameters (moving average).

First, an AR model with only 1 parameter may be written as:
X(t) = A(1) * X(t-1) + E(t)
where X(t) = time series under investigation
A(1) = the autoregressive parameter of order 1
X(t-1) = the time series lagged 1 period
E(t) = the error term of the model

The series may also depend on more than one past value. An AR model
with 2 parameters may be written as:
X(t) = A(1) * X(t-1) + A(2) * X(t-2) + E(t)
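To make the one-parameter AR equation concrete, the sketch below generates a series from X(t) = A(1) * X(t-1) + E(t) and then estimates A(1) with base R's arima(). The value A(1) = 0.7, the sample size and the seed are arbitrary assumptions.

```r
set.seed(123)
n  <- 500
A1 <- 0.7          # assumed "true" autoregressive parameter
E  <- rnorm(n)     # error term E(t)
X  <- numeric(n)
X[1] <- E[1]
for (i in 2:n) {
  X[i] <- A1 * X[i - 1] + E[i]   # X(t) = A(1) * X(t-1) + E(t)
}

# Fit an AR model with 1 parameter and no constant
fit    <- arima(X, order = c(1, 0, 0), include.mean = FALSE)
A1_hat <- coef(fit)[["ar1"]]
print(A1_hat)      # should land near 0.7
```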

HOW TO APPLY ARIMA? CONTD.

A second type of Box-Jenkins model is called a "moving average" model.

Moving average parameters relate what happens in period t only to the
random errors that occurred in past time periods, i.e. E(t-1), E(t-2),
etc., rather than to X(t-1), X(t-2), X(t-3) as in the autoregressive
approaches.

A moving average model with one MA term may be written as follows:
X(t) = -B(1) * E(t-1) + E(t)
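A sketch of the one-term MA model, with one caveat about sign conventions: base R's arima() writes the MA part with a plus sign, X(t) = E(t) + theta * E(t-1), so the coefficient it reports corresponds to -B(1) in the equation above. The value B(1) = 0.6 is an arbitrary assumption.

```r
set.seed(321)
n  <- 1000
B1 <- 0.6                             # assumed "true" MA parameter
E  <- rnorm(n + 1)
X  <- E[2:(n + 1)] - B1 * E[1:n]      # X(t) = -B(1) * E(t-1) + E(t)

fit     <- arima(X, order = c(0, 0, 1), include.mean = FALSE)
theta_R <- coef(fit)[["ma1"]]         # R's convention: X(t) = E(t) + theta * E(t-1)
print(c(R_ma1 = theta_R, implied_B1 = -theta_R))   # implied B(1) should be near 0.6
```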

HOW TO APPLY ARIMA? CONTD.


Forecast for y at time t
= constant + weighted sum of the last p values of y +
weighted sum of the last q forecast errors

$$\hat{y}_t = \mu + \phi_1 y_{t-1} + \dots + \phi_p y_{t-p} - \theta_1 e_{t-1} - \dots - \theta_q e_{t-q}$$

Here y is the series after it has been differenced d times, \mu is the
constant, the \phi's are the AR weights and the \theta's are the MA
weights.

The lagged values of y that appear in the equation are called
autoregressive (AR) terms, and the lagged values of the forecast
errors are called moving-average (MA) terms.

The resulting model is called an ARIMA(p,d,q) model if the constant is
assumed to be zero, and it is an ARIMA(p,d,q)+constant model if the
constant is not zero.
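The forecast equation can be evaluated by hand for a small case. The function arma_forecast below is a hypothetical helper written for this illustration, and all numbers are made up; it uses the sign convention of this slide, in which the MA terms are subtracted.

```r
# Hypothetical helper: one-step forecast
#   constant + weighted sum of last p values - weighted sum of last q errors
# y_recent[1] is y(t-1), y_recent[2] is y(t-2), ...; e_recent is ordered the same way.
arma_forecast <- function(mu, phi, theta, y_recent, e_recent) {
  ar_part <- sum(phi   * y_recent[seq_along(phi)])
  ma_part <- sum(theta * e_recent[seq_along(theta)])
  mu + ar_part - ma_part
}

# p = 2, q = 1 with made-up values
f <- arma_forecast(mu = 1, phi = c(0.5, 0.2), theta = 0.4,
                   y_recent = c(10, 8), e_recent = 0.5)
print(f)   # 1 + (0.5 * 10 + 0.2 * 8) - 0.4 * 0.5 = 7.4
```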

SPECIAL CASES OF ARIMA MODEL

ARIMA(1,0,0) = first-order autoregressive model:

$$\hat{Y}_t = \mu + \phi_1 Y_{t-1}$$
which is Y regressed on itself lagged by one period.

ARIMA(2,0,0) = second-order autoregressive model:

$$\hat{Y}_t = \mu + \phi_1 Y_{t-1} + \phi_2 Y_{t-2}$$

ARIMA(0,1,0) = random walk:

$$\hat{Y}_t = \mu + Y_{t-1}$$
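The random walk case is easy to verify numerically: with no constant, the one-step forecast is simply the last observed value. The sketch below uses a simulated series (an assumption) and base R's arima(), which omits the constant whenever d > 0, so it corresponds to the equation above with the constant set to zero.

```r
set.seed(7)
Y <- cumsum(rnorm(100))              # simulated random walk

fit <- arima(Y, order = c(0, 1, 0))  # ARIMA(0,1,0), no constant
next_forecast <- as.numeric(predict(fit, n.ahead = 1)$pred)

print(c(last_value = Y[100], forecast = next_forecast))
```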

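SPECIAL CASES OF ARIMA MODEL CONTD.

ARIMA(1,1,0) = differenced first-order autoregressive model:

$$\hat{Y}_t = \mu + Y_{t-1} + \phi_1 (Y_{t-1} - Y_{t-2})$$

ARIMA(0,1,1) with constant = simple exponential smoothing with growth:

$$\hat{Y}_t = \mu + Y_{t-1} - \theta_1 e_{t-1}$$
With the constant set to zero, this reduces to simple exponential
smoothing.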
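The link between ARIMA(0,1,1) and exponential smoothing follows from a short substitution. As a sketch, write the previous forecast error as $e_{t-1} = Y_{t-1} - \hat{Y}_{t-1}$ and define $\alpha = 1 - \theta_1$ (the symbol $\alpha$ is introduced here only for illustration):

```latex
\begin{aligned}
\hat{Y}_t &= \mu + Y_{t-1} - \theta_1 e_{t-1} \\
          &= \mu + Y_{t-1} - \theta_1 \left( Y_{t-1} - \hat{Y}_{t-1} \right) \\
          &= \mu + (1 - \theta_1)\, Y_{t-1} + \theta_1\, \hat{Y}_{t-1} \\
          &= \mu + \alpha\, Y_{t-1} + (1 - \alpha)\, \hat{Y}_{t-1}
\end{aligned}
```

With $\mu = 0$ the last line is the usual exponential smoothing update with smoothing constant $\alpha$; a nonzero $\mu$ adds a constant drift to each forecast.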

EG: TRACTOR SALES

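Plot tractor sales data as a time series

# Replace [location of data] with the path to the CSV file
data <- read.csv("[location of data]")
# Column 2 is read as a monthly series starting in January 2003
data <- ts(data[, 2], start = c(2003, 1), frequency = 12)
plot(data, xlab = "Years", ylab = "Tractor Sales")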

EG: TRACTOR SALES CONTD.

Difference the data to make it stationary on mean (remove trend)
plot(diff(data), ylab = "Differenced Tractor Sales")

EG: TRACTOR SALES CONTD.

Log transform the data to make it stationary on variance
plot(log10(data), ylab = "Log (Tractor Sales)")

EG: TRACTOR SALES CONTD.

Difference the log-transformed data to make it stationary on both
mean and variance
plot(diff(log10(data)), ylab = "Differenced Log (Tractor Sales)")

EG: TRACTOR SALES CONTD.

Plot ACF and PACF to identify potential AR and MA terms
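This slide does not show the code for the plots. A minimal sketch with base R's acf() and pacf() could look like the following; it uses the built-in AirPassengers series as a stand-in (an assumption) so that it runs without the tractor sales file. With the deck's data, diff(log10(data)) would take the place of y.

```r
# Stand-in for the differenced, log-transformed sales series (assumption)
y <- diff(log10(AirPassengers))

op <- par(mfrow = c(1, 2))                          # two plots side by side
acf_vals  <- acf(y,  lag.max = 36, main = "ACF")    # spikes suggest MA terms
pacf_vals <- pacf(y, lag.max = 36, main = "PACF")   # spikes suggest AR terms
par(op)                                             # restore plot settings
```

Spikes that recur at multiples of the seasonal period (12 for monthly data) point towards seasonal terms.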

EG: TRACTOR SALES CONTD.

Identification of best fit ARIMA model

library(forecast)  # provides auto.arima()
ARIMAfit <- auto.arima(log10(data), approximation = FALSE, trace = FALSE)
summary(ARIMAfit)  # print coefficients and fit statistics

Best fit model: ARIMA(0,1,1)(0,1,1)[12]
As expected, our model has an I (integrated) component equal to 1,
which represents differencing of order 1. There is additional seasonal
differencing at lag 12 in the best fit model. Moreover, the best fit
model has a non-seasonal MA term of order 1 and a seasonal MA term of
order 1 at lag 12.

EG: TRACTOR SALES CONTD.

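Forecast sales using the best fit ARIMA model

pred <- predict(ARIMAfit, n.ahead = 36)  # 36 months ahead, on the log10 scale
pred
plot(data, type = "l", xlim = c(2004, 2018), ylim = c(1, 1600),
     xlab = "Year", ylab = "Tractor Sales")
# Back-transform from log10 to the original sales scale
lines(10^(pred$pred), col = "blue")
# Approximate band: forecast plus or minus 2 standard errors
lines(10^(pred$pred + 2 * pred$se), col = "orange")
lines(10^(pred$pred - 2 * pred$se), col = "orange")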

Plot ACF and PACF for residuals of ARIMA model to ensure no more
information is left for extraction
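The residual check is not shown in code on this slide. One possible sketch is given below, again using AirPassengers as a stand-in and fitting the same model order with base R's arima() (both are assumptions made so the block runs on its own). With the deck's objects, residuals(ARIMAfit) would take the place of res. The Ljung-Box test is an addition that is not in the original slides.

```r
# Stand-in fit with the same orders as the best fit model (assumption)
y   <- log10(AirPassengers)
fit <- arima(y, order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))
res <- residuals(fit)

op <- par(mfrow = c(1, 2))
acf(res,  lag.max = 36, main = "Residual ACF")
pacf(res, lag.max = 36, main = "Residual PACF")
par(op)

# Optional numeric check: Ljung-Box test on the residuals,
# with fitdf = 2 for the two estimated MA coefficients
lb <- Box.test(res, lag = 24, type = "Ljung-Box", fitdf = 2)
print(lb)
```

If the model has captured the structure in the data, few residual spikes should fall outside the dashed confidence bounds.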

Thank You
