
TIME SERIES MODELS Introduction

Time series are data in which the response variable depends on time as a predictor
variable. Trends in time series are upward or downward movements that characterize
the time series over a given period of time, and the simplest characterization of a
trend (T) is a linear trend, so that

Tt = β0 + β1 t.
Commonly, time series show seasonal effects which are periodic patterns repeated
on some regular basis. An individual pattern can stretch over periods as long as a
year with variations on a quarterly or monthly basis or periods as short as a day with
variations over the shift from daylight hours to night–time hours.

The seasonal effects can be modelled by using dummy or indicator variables. If the
value of the response variable at time t is yt, the trend at time t is Tt and the seasonal
effect at time t is denoted by St, the appropriate model is

yt = Tt + St + ε

where, if there are k seasons (quarters, months, days of the week), St is given by

St = βS1 IS1 + βS2 IS2 + βS3 IS3 + . . . + βS(k−1) IS(k−1)

where the indicator variables are defined as follows.

IS1 = 1 if the time period is season 1, 0 otherwise
IS2 = 1 if the time period is season 2, 0 otherwise
...
IS(k−1) = 1 if the time period is season k−1, 0 otherwise

The indicator variables ensure that the seasonal parameter for, say, season 3 is added
to the trend in every time period that falls in season 3, since for such a period

yt = Tt + St + ε
   = Tt + βS1 IS1 + βS2 IS2 + βS3 IS3 + . . . + βS(k−1) IS(k−1) + ε
   = Tt + βS1 (0) + βS2 (0) + βS3 (1) + . . . + βS(k−1) (0) + ε
   = Tt + βS3 + ε.
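
As a concrete illustration, the sketch below (Python with numpy, which is assumed to be
available; the function name season_indicators is purely illustrative) builds the k − 1
indicator columns from a vector of 1-based time indices, assuming the observations run
through the seasons in order 1, 2, . . . , k, 1, 2, . . .

import numpy as np

def season_indicators(t, k):
    # Return a (len(t), k-1) matrix of seasonal dummy variables.
    # Season j gets a 1 in column j; season k is the all-zero row.
    t = np.asarray(t)
    season = (t - 1) % k + 1          # season number 1..k for each time point
    return np.column_stack([(season == j).astype(float) for j in range(1, k)])

# example: eight quarterly observations, k = 4 seasons
print(season_indicators(np.arange(1, 9), 4))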

TIME SERIES MODELS Example

The table below gives the quarterly bottled gas bills for a farming enterprise over a
period of ten years.

Year   Quarter 1   Quarter 2   Quarter 3   Quarter 4
  1      344.39      246.63      131.53      288.87
  2      313.45      189.76      179.10      221.10
  3      246.84      209.00       51.21      133.89
  4      277.01      197.98       50.68      218.08
  5      365.10      207.51       54.63      214.09
  6      267.00      230.28      230.32      426.41
  7      467.06      306.03      253.23      279.46
  8      336.56      196.67      152.15      319.67
  9      440.00      315.04      216.42      339.78
 10      434.66      399.66      330.80      539.78

A graph of the bills against time suggests an upward trend, but the exact nature of
the trend is difficult to distinguish because of the large seasonal fluctuations. If
the average yearly bill is plotted against the year, it appears that the trend is
quadratic, and so the trend is modelled as

Tt = β0 + β1 t + β2 t²

and the seasonal effect is taken into account by using three indicator variables

I1 = 1 if the time period is quarter 1, 0 otherwise
I2 = 1 if the time period is quarter 2, 0 otherwise
I3 = 1 if the time period is quarter 3, 0 otherwise

to give the regression model

yt = β0 + β1 t + β2 t² + β3 I1 + β4 I2 + β5 I3 + ε.
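
As a sketch of how such a fit can be reproduced outside Minitab, the following Python code
(numpy assumed available; variable names are illustrative) builds the design matrix from the
forty quarterly bills in the table, read row by row, and solves the least squares problem
directly:

import numpy as np

# quarterly gas bills, year by year (quarters 1-4 within each year)
y = np.array([
    344.39, 246.63, 131.53, 288.87,
    313.45, 189.76, 179.10, 221.10,
    246.84, 209.00,  51.21, 133.89,
    277.01, 197.98,  50.68, 218.08,
    365.10, 207.51,  54.63, 214.09,
    267.00, 230.28, 230.32, 426.41,
    467.06, 306.03, 253.23, 279.46,
    336.56, 196.67, 152.15, 319.67,
    440.00, 315.04, 216.42, 339.78,
    434.66, 399.66, 330.80, 539.78,
])
t = np.arange(1, 41)
quarter = (t - 1) % 4 + 1

# design matrix: intercept, t, t^2 and the three quarterly indicators
X = np.column_stack([
    np.ones_like(t, dtype=float), t, t**2,
    (quarter == 1).astype(float),
    (quarter == 2).astype(float),
    (quarter == 3).astype(float),
])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta
print(np.round(beta, 3))   # should agree closely with the Minitab output that follows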


Fitting this regression model using Minitab gives the following.

The regression equation is


y = 277 - 7.46 t + 0.301 t^2 + 65.8 I1 - 37.9 I2 - 128 I3

Predictor Coef SE Coef T P


Constant 276.64 35.05 7.89 0.000
t -7.458 3.396 -2.20 0.035
t^2 0.30123 0.08030 3.75 0.001
I1 65.77 27.16 2.42 0.021
I2 -37.87 27.10 -1.40 0.171
I3 -127.61 27.06 -4.72 0.000

S = 60.4726 R-Sq = 74.4% R-Sq(adj) = 70.7%

Analysis of Variance

Source DF SS MS F P
Regression 5 361967 72393 19.80 0.000
Residual Error 34 124336 3657
Total 39 486303

A plot of the residuals against time shows that the signs of the residuals have the
following pattern

+++++-+--+---+--++---++++++-----+----+++

with positive residuals tending to be followed by positive residuals and negative
residuals tending to be followed by negative residuals. This suggests that the residuals
show positive autocorrelation, and the Durbin–Watson statistic of 0.839689 confirms
this, since it falls below the lower critical value dL,0.05 = 1.23 (with dU,0.05 = 1.79).
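
The Durbin–Watson statistic can be computed directly from the saved residuals; a minimal
sketch (Python with numpy, using the residuals array from the fit sketched earlier):

import numpy as np

def durbin_watson(e):
    # sum of squared successive differences of the residuals
    # divided by the residual sum of squares
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# values well below 2, and in particular below d_L, indicate positive
# autocorrelation; for the gas-bill residuals this is about 0.84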

A problem with using a regression model that has autocorrelated residuals is that the
least squares procedure often produces estimates of the standard errors of the
coefficients that are too small, and so can declare variables significant when they
are not useful in the model. Also, if autocorrelated residuals are ignored, prediction
intervals tend to be wider than necessary and so less precise.

TIME SERIES MODELS Modelling Autocorrelation

When the residuals are autocorrelated, they can be modelled to solve the problem of
autocorrelation. An autocorrelation structure that is often used is the first-order
autoregressive process, which considers the model

εt = ρεt−1 + Ut

where the residual at time t is related to the residual at time t − 1, the Ut each have
zero mean and satisfy the assumptions of the usual regression model, and ρ is defined
as the correlation coefficient between εt and εt−1. If ρ is positive the residuals are
positively correlated and if ρ is negative the residuals are negatively correlated.
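
To see what such an error structure looks like, the short simulation below (Python with
numpy; ρ = 0.6 and the noise scale are arbitrary illustrative values) generates a
first-order autoregressive series:

import numpy as np

rng = np.random.default_rng(0)
rho, n = 0.6, 40
U = rng.normal(scale=50.0, size=n)     # independent disturbances U_t
eps = np.zeros(n)
for i in range(1, n):
    eps[i] = rho * eps[i - 1] + U[i]   # epsilon_t = rho * epsilon_{t-1} + U_t

# with rho > 0 the signs of eps persist in runs, much like the pattern
# of residual signs seen in the gas-bill example
print(''.join('+' if e > 0 else '-' for e in eps))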

For the general model

yt = β0 + β1 xt1 + β2 xt2 + . . . + βp xtp + εt

with
εt = ρεt−1 + Ut
we have

ρyt−1 = ρβ0 + ρβ1 xt−1,1 + ρβ2 xt−1,2 + . . . + ρβp xt−1,p + ρεt−1

and for t = 2, 3, . . . , n by subtraction we have

yt − ρyt−1 = β0 (1 − ρ) + β1 (xt1 − ρxt−1,1 ) + β2 (xt2 − ρxt−1,2 )
             + . . . + βp (xtp − ρxt−1,p ) + [εt − ρεt−1 ]


We are not able to compute yt − ρyt−1 for t = 1, but if both sides of the regression
equation for y1 are multiplied by √(1 − ρ²) to give

√(1 − ρ²) y1 = √(1 − ρ²) β0 + β1 (√(1 − ρ²) x11 ) + β2 (√(1 − ρ²) x12 )
               + . . . + βp (√(1 − ρ²) x1p ) + √(1 − ρ²) ε1

then this equation, along with the equations for t = 2, 3, . . . , n, satisfies the
regression assumptions.

In order to transform the data we need to estimate ρ, but we have to estimate ρ from
the untransformed data, and so an iterative procedure is suggested. This procedure,
known as the Cochrane–Orcutt procedure, is applied as follows.


The first step is to fit the untransformed model and save the residuals e1 , e2 , . . . , en .

The second step is to fit the regression model

et = ρet−1 + Ut

using these residuals to obtain the following output

The regression equation is


e(t) = 0.584 e(t-1)

Predictor Coef SE Coef T P


Noconstant
e(t-1) 0.5841 0.1369 4.27 0.000

S = 47.0133

Analysis of Variance

Source DF SS MS F P
Regression 1 40263 40263 18.22 0.000
Residual Error 38 83989 2210
Total 39 124252

and so the estimate of ρ is 0.5841.
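
The same estimate can be obtained directly, since the no-intercept least squares slope of
e_t on e_{t−1} is Σ e_t e_{t−1} / Σ e_{t−1}²; a sketch (Python with numpy, with residuals
assumed to hold e1, . . . , en from the first fit):

import numpy as np

def estimate_rho(e):
    # no-intercept least squares slope of e_t on e_{t-1}
    e = np.asarray(e, dtype=float)
    return np.sum(e[1:] * e[:-1]) / np.sum(e[:-1] ** 2)

# for the gas-bill residuals this gives approximately 0.584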

The third step is to perform the transformations suggested earlier using r, the estimate
of ρ, so that the vector of responses becomes

        [ √(1 − r²) y1 ]
        [  y2 − r y1   ]
    Y = [  y3 − r y2   ]
        [      ...     ]
        [ yn − r yn−1  ]


the design matrix becomes

        [ √(1 − r²)   √(1 − r²) x11    . . .   √(1 − r²) x1i    . . .   √(1 − r²) x1p  ]
        [   1 − r     x21 − r x11      . . .   x2i − r x1i      . . .   x2p − r x1p    ]
    X = [   1 − r     x31 − r x21      . . .   x3i − r x2i      . . .   x3p − r x2p    ]
        [     ..          ..                       ..                       ..         ]
        [   1 − r     xn1 − r xn−1,1   . . .   xni − r xn−1,i   . . .   xnp − r xn−1,p ]

and the new least squares estimates of the coefficients are obtained as

β̂ = (X ′ X)−1 X ′ Y .

The residuals using these new estimates of the coefficients are calculated and these
new residuals are used to obtain a revised estimate of ρ, as in the second step of the
Cochrane–Orcutt procedure. This new estimate of ρ is used to compute a new set
of transformed data, as in the third step of the Cochrane–Orcutt procedure, and this
new set of transformed data is used to obtain a new set of revised estimates of the
coefficients. The iterative procedure finishes when the estimates change only by a
small amount between iterations.
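
A sketch of the whole iteration (Python with numpy; the helper estimate_rho is repeated so
the sketch is self-contained, and the tolerance and iteration cap are arbitrary choices):

import numpy as np

def estimate_rho(e):
    # no-intercept least squares slope of e_t on e_{t-1}
    return np.sum(e[1:] * e[:-1]) / np.sum(e[:-1] ** 2)

def cochrane_orcutt(X, y, tol=1e-4, max_iter=20):
    # fit the untransformed model and estimate rho from its residuals
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rho = estimate_rho(y - X @ beta)
    for _ in range(max_iter):
        # transform: first row scaled by sqrt(1 - r^2), the rest quasi-differenced
        c = np.sqrt(1.0 - rho ** 2)
        Xt = np.vstack([c * X[0], X[1:] - rho * X[:-1]])
        yt = np.concatenate(([c * y[0]], y[1:] - rho * y[:-1]))
        beta, *_ = np.linalg.lstsq(Xt, yt, rcond=None)
        # revised estimate of rho from the residuals of the untransformed model
        new_rho = estimate_rho(y - X @ beta)
        if abs(new_rho - rho) < tol:
            rho = new_rho
            break
        rho = new_rho
    return beta, rho

# e.g. beta_hat, r = cochrane_orcutt(X, y) with X and y from the earlier sketch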

Predictions of future values of the response, along with corresponding prediction
intervals, use the transformed model. If a prediction is being made τ time points
ahead,

ŷn+τ = µ̂n+τ + ε̂n+τ
     = µ̂n+τ + r ε̂n+τ−1 + Ûn+τ
     = β̂0 + β̂1 xn+τ,1 + β̂2 xn+τ,2 + . . . + β̂p xn+τ,p + r ε̂n+τ−1

as Un+τ is predicted to be zero.

If prediction is being made one time period ahead (τ = 1), as yn has been observed
we have that

ε̂n+τ −1 = ε̂n
= yn − µ̂n
= yn − [β̂0 + β̂1 xn1 + β̂2 xn2 + . . . + β̂p xnp ].


If prediction is being made more than one time period ahead (τ > 1), as yn+τ −1 has
not been observed we have that
ε̂n+τ −1 = ŷn+τ −1 − µ̂n+τ −1
= ŷn+τ −1 − [β̂0 + β̂1 xn+τ −1,1 + β̂2 xn+τ −1,2 + . . . + β̂p xn+τ −1,p ].

An approximate 100(1 − α)% prediction interval for yn+1 is

ŷn+1 ± tn−p;α/2 × s

and for yn+τ for τ ≥ 2 is

ŷn+τ ± tn−p;α/2 × s √(1 + r² + . . . + r^(2(τ−1)))

where s is the standard error for the transformed data and r is the final estimate of ρ.
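
The prediction recursion and these approximate intervals can be sketched as follows
(Python with numpy; scipy is assumed to be available for the t quantile, and beta, rho
and s are taken from the Cochrane–Orcutt fit, with new_rows supplying the regressor rows
x_{n+1}, x_{n+2}, . . . built in the same way as the columns of X):

import numpy as np
from scipy import stats

def forecast(X, y, beta, rho, s, new_rows, alpha=0.05):
    # forecast tau = 1, 2, ... steps ahead with approximate prediction intervals
    n, p = X.shape
    tval = stats.t.ppf(1 - alpha / 2, df=n - p)
    eps = y[-1] - X[-1] @ beta              # last observed residual, y_n - mu_n
    out = []
    for tau, x in enumerate(new_rows, start=1):
        mu = x @ beta
        yhat = mu + rho * eps               # point forecast mu + r * eps
        width = tval * s * np.sqrt(np.sum(rho ** (2 * np.arange(tau))))
        out.append((yhat, yhat - width, yhat + width))
        eps = yhat - mu                     # predicted residual for the next step
    return out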

For the numerical example being considered, the estimates of the coefficients using
the transformed data are given by

The regression equation is


ty = 284 constant - 9.19 tt + 0.352 tt^2 + 70.1 tI1 - 35.5 tI2 - 127 tI3

Predictor Coef SE Coef T P


Noconstant
constant 283.98 51.91 5.47 0.000
tt -9.187 5.683 -1.62 0.115
tt^2 0.3522 0.1338 2.63 0.013
tI1 70.06 17.27 4.06 0.000
tI2 -35.49 19.30 -1.84 0.075
tI3 -126.56 16.78 -7.54 0.000

S = 49.5102

Analysis of Variance

Source DF SS MS F P
Regression 6 886772 147795 60.29 0.000
Residual Error 34 83343 2451
Total 40 970115


with the standard error for the transformed model being 49.5102. In this case
the estimates of the coefficients do not differ greatly from those obtained using the
untransformed data and so one iteration of the process is sufficient.

Estimates of the quarterly bottled gas bills and appropriate prediction intervals for
the four quarters of the following year are obtained using the above results. For
example, for the first quarter in the following year

ŷ41 = µ̂41 + r ε̂40
    = [283.98 − 9.187(41) + 0.3522(41)² + 70.06(1) − 35.49(0) − 126.56(0)]
      + 0.5841(59.76)
    = 604.33

where

ε̂40 = 539.78 − [283.98 − 9.187(40) + 0.3522(40)² + 70.06(0) − 35.49(0) − 126.56(0)]
    = 539.78 − 480.02
    = 59.76

The prediction interval is 604.33 ± 2.034(49.51) or (503.63, 705.03).

For the fourth quarter in the following year

ŷ44 = µ̂44 + r ε̂43
    = [283.98 − 9.187(44) + 0.3522(44)² + 70.06(0) − 35.49(0) − 126.56(0)]
      + 0.5841(11.91)
    = 568.57

where ε̂43 = 425.50 − 413.59 = 11.91.

The prediction interval is

ŷ44 ± t34;0.025 s √(1 + r² + r⁴ + r⁶)
    = 568.57 ± 2.034(49.5102) √(1 + (0.5841)² + (0.5841)⁴ + (0.5841)⁶)
    = 568.57 ± 123.22
    = (445.35, 691.79)
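
These figures can be checked with a few lines of arithmetic (Python with numpy; the
coefficients are those of the transformed fit above, 2.034 is t34;0.025, and 425.50 is
the chained forecast ŷ43 quoted in the text):

import numpy as np

b0, b1, b2 = 283.98, -9.187, 0.3522
r, s, tval = 0.5841, 49.5102, 2.034

mu43 = b0 + b1 * 43 + b2 * 43**2 - 126.56   # quarter 3 of year 11 (I3 = 1)
mu44 = b0 + b1 * 44 + b2 * 44**2            # quarter 4 of year 11 (all indicators 0)
eps43 = 425.50 - mu43                       # about 11.91
y44 = mu44 + r * eps43                      # about 568.6
half = tval * s * np.sqrt(1 + r**2 + r**4 + r**6)
print(round(y44, 2), round(y44 - half, 2), round(y44 + half, 2))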
