
Lecture Notes 2015-2016

Advanced Econometrics I
Chapter 6

Francisco Blasques

These lecture notes contain the material covered in the master course
Advanced Econometrics I. Further study material can be found in the
lecture slides and the many references cited throughout the text.

Contents
6 Asymptotic normality
6.1 Statistical inference
6.2 The classical asymptotic normality theorem
6.3 Approximate Statistical Inference
6.4 Asymptotic Normality: Well-Behaved Functions
6.4.1 Stochastic Equicontinuity
6.4.2 Misspecification and a CLT for Lp-Approximable Processes
6.4.3 Score Normality for Time-Varying Parameter Models
6.5 Exercises

6 Asymptotic normality
Some more reading material:
1. Newey and McFadden (1994), Large sample estimation and hypothesis testing. Chapter 36 of the Handbook of Econometrics. Sections 3 and 4.
2. Domowitz and White (1982), Misspecified Models with Dependent Observations. Journal of Econometrics, 20.
3. White (1996), Estimation, Inference and Specification Analysis. Chapters 6, 8 and 11.
4. van der Vaart (1998), Asymptotic Statistics. Sections 5.3-5.6.
5. Pötscher and Prucha (1997), Dynamic Nonlinear Econometric Models: Asymptotic Theory. Chapters 8, 10 and 11.
6. Davidson (1994), Stochastic Limit Theory. Chapters 15, 23 and 24.

6.1 Statistical inference

In the previous chapter, we discussed the consistency of extremum estimators in general, and M-estimators in particular. However, the consistency of an estimator is, by itself, a very weak property. Too weak, in fact, to be useful in practice.

Suppose, for example, that you have a consistent estimator $\hat\theta_T$ and obtain a point estimate $\hat\theta_T = 2.5$ for some unknown parameter $\theta_0$. What can you conclude from this point estimate? Is the true parameter $\theta_0$ close to the value 2.5? Could it be that $\theta_0 > 10$? Or $\theta_0 < 0$?

The consistency of the estimator $\hat\theta_T$ ensures that the estimator converges to $\theta_0$, i.e. that $\hat\theta_T \overset{p}{\to} \theta_0$ as $T \to \infty$. However, it does not tell us if the point estimate of 2.5 is close to $\theta_0$ or not. In fact, we cannot give a satisfactory answer to any of the questions above. In order to find answers to these questions we need to have an idea of what the distribution of the estimator $\hat\theta_T$ is.

Suppose that $\hat\theta_T$ is known to follow a normal distribution with variance 0.7 and a mean that is centered at the true unknown parameter $\theta_0$. Then we can use the fact that $\hat\theta_T \sim N(\theta_0, 0.7)$ to answer the questions above. Take the following examples:

1. Is it reasonable to suppose that $\theta_0 = 3$?
Well, if indeed $\theta_0 = 3$, then $\hat\theta_T \sim N(3, 0.7)$, and hence the probability of obtaining an estimate of $\hat\theta_T = 2.5$ (or something worse, $\hat\theta_T \leq 2.5$) is $P(\hat\theta_T \leq 2.5) \approx 0.21$. This is a rather big probability. Hence, $\theta_0 = 3$ seems like a reasonable possibility. There is no great reason to reject the hypothesis that $\theta_0 = 3$.

2. Could it be that $\theta_0 = 6$?
Well, if $\theta_0 = 6$, then $\hat\theta_T \sim N(6, 0.7)$, and hence the probability of obtaining an estimate of $\hat\theta_T = 2.5$ (or something worse) is $P(\hat\theta_T \leq 2.5) \approx 0.000001$. This is such a small probability! It is extremely unlikely that we would ever obtain the estimate $\hat\theta_T = 2.5$ if the true parameter were $\theta_0 = 6$. Hence, $\theta_0 = 6$ seems like a very unreasonable possibility. We should probably reject the hypothesis that $\theta_0 = 6$.

3. How about $\theta_0 < 0$?
If $\theta_0 = 0$, then $\hat\theta_T \sim N(0, 0.7)$ and hence the probability of obtaining an estimate of $\hat\theta_T = 2.5$ (or something worse) is $P(\hat\theta_T \geq 2.5) \approx 0.0014$. This is quite a small probability. Obtaining an estimate of $\hat\theta_T = 2.5$ if $\theta_0 = 0$ is quite unlikely. On the other hand, it could happen! Should we reject the hypothesis that $\theta_0 = 0$?
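
The kind of tail probability used in these examples is straightforward to compute numerically. Below is a minimal sketch for the third case, assuming (as above) that $\hat\theta_T \sim N(\theta_0, 0.7)$ and that scipy is available; the same one-liner delivers the probabilities in the other two cases by changing the mean and the direction of the tail.

```python
# Tail probability for case 3: theta_hat ~ N(0, 0.7), i.e. variance 0.7.
# A minimal sketch; scipy is assumed to be available.
from scipy.stats import norm

sd = 0.7 ** 0.5                           # standard deviation = sqrt(variance)
p3 = norm.sf(2.5, loc=0.0, scale=sd)      # P(theta_hat >= 2.5 | theta_0 = 0)
print(round(p3, 4))                       # ~ 0.0014, as reported above
```
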
The examples above show that knowing the distribution of an estimator is a crucial
ingredient for interpreting parameter estimates and conducting statistical inference.
In introductory econometrics you have learned how to derive the distribution of simple estimators. When the data is iid, and the estimator is very simple, then we can derive the exact finite-sample distribution of the estimator. For example, you may recall that if we have a sample of iid data $x_1, \ldots, x_T$ with Gaussian distribution $x_t \sim N(\mu, \sigma^2)$, then it is easy to show that the sample average $\hat\theta_T = \frac{1}{T}\sum_{t=1}^T x_t$ (which is the ML estimator of the mean) satisfies
$$\hat\theta_T \sim N\Big(\mu, \frac{\sigma^2}{T}\Big).$$

In many cases, even if the distribution of the data is not known, we still know that the normal distribution above is approximately correct by applying a central limit theorem. Indeed, as we have seen in Chapter 2, we can use central limit theorems to derive an approximate distribution for estimators that are analytically tractable. There, we established the asymptotic normality of the least squares estimator in the linear regression $y_t = \beta x_t + \varepsilon_t$, which is given by
$$\hat\beta_T = \frac{\sum_{t=1}^T y_t x_t}{\sum_{t=1}^T x_t^2},$$
as well as the ML estimator for the linear Gaussian AR(1) $x_t = \alpha x_{t-1} + \varepsilon_t$, which is given by
$$\hat\alpha_T = \frac{\sum_{t=2}^T x_t x_{t-1}}{\sum_{t=2}^T x_{t-1}^2}.$$
Establishing the asymptotic normality of these estimators is important for conducting inference. In particular, if we can show that the estimator is asymptotically normal, then we can use the asymptotic distribution as an approximate distribution in finite samples.

Definition 1 An estimator $\hat\theta_T$ is said to be asymptotically normal for a parameter $\theta_0$ if and only if
$$\sqrt{T}\,(\hat\theta_T - \theta_0) \overset{d}{\to} N(0, V) \quad \text{as } T \to \infty.$$
$V$ is called the asymptotic variance of $\hat\theta_T$.

Note that an asymptotically normal estimator is such that
$$\sqrt{T}\,(\hat\theta_T - \theta_0) \overset{approx}{\sim} N(0, V),$$
where $\overset{approx}{\sim}$ denotes an approximate distribution. As a result, it follows naturally that
$$(\hat\theta_T - \theta_0) \overset{approx}{\sim} N(0, V/T)$$
and hence that
$$\hat\theta_T \overset{approx}{\sim} N(\theta_0, V/T).$$
This means that the estimator $\hat\theta_T$ has a distribution that is approximately normal, centered at $\theta_0$, and with variance $V/T$ that vanishes to zero as the sample size $T$ diverges to infinity.

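
This approximate distribution is also what justifies the familiar approximate confidence intervals of the form $\hat\theta_T \pm 1.96\sqrt{V/T}$. Below is a minimal sketch with purely illustrative numbers; in practice $V$ must be replaced by an estimate, as discussed in Section 6.3.

```python
# Approximate 95% confidence interval based on theta_hat ~ N(theta_0, V/T).
# All numbers are purely illustrative.
import math

theta_hat = 2.5      # point estimate (hypothetical)
V_hat = 70.0         # estimate of the asymptotic variance V (hypothetical)
T = 100              # sample size (hypothetical)

se = math.sqrt(V_hat / T)                            # approximate standard error
ci = (theta_hat - 1.96 * se, theta_hat + 1.96 * se)  # approximate 95% interval
print(se, ci)
```
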
In what follows we will attempt to derive an approximate distribution for estimators
in complex nonlinear dynamic settings. We thus consider the more general case where
the estimator is potentially intractable.

6.2 The classical asymptotic normality theorem

Consider the extremum estimator $\hat\theta_T$ defined as
$$\hat\theta_T \in \arg\max_{\theta \in \Theta} Q_T(x^T, \theta).$$
When the criterion function $Q_T$ is twice continuously differentiable in $\theta$, the asymptotic normality of the estimator $\hat\theta_T$ can be easily obtained by applying a mean value theorem.¹ In particular, note that, by construction, the extremum estimator always satisfies
$$\nabla Q_T(x^T, \hat\theta_T) = 0,$$
where $\nabla Q_T(x^T, \hat\theta_T)$ denotes the derivative of the criterion function $Q_T(x^T, \cdot) : \Theta \to \mathbb{R}$ with respect to the parameter $\theta$, evaluated at $\hat\theta_T$,
$$\nabla Q_T(x^T, \hat\theta_T) = \frac{\partial Q_T(x^T, \theta)}{\partial \theta}\bigg|_{\theta = \hat\theta_T}.$$

¹ Recall that the mean value theorem states that if $f : A \to \mathbb{R}$ is continuously differentiable on a convex subset of $A$, then for any two points $(a, b) \in A \times A$ there exists a point $c \in A$ between $a$ and $b$ such that $f(a) = f(b) + f'(c)(a - b)$, where $f'$ denotes the derivative of $f$. The mean value theorem is like a Taylor expansion without remainder term; in fact, it is sometimes called an exact Taylor expansion. The remainder term disappears because the derivative is evaluated at some point $c$ between $a$ and $b$ instead of being evaluated at $b$ as in the Taylor expansion!

Second, note that we can use the mean value theorem to write $\nabla Q_T(x^T, \hat\theta_T)$ as
$$\nabla Q_T(x^T, \hat\theta_T) = \nabla Q_T(x^T, \theta_0) + \nabla^2 Q_T(x^T, \bar\theta_T)\,(\hat\theta_T - \theta_0), \tag{1}$$
where $\nabla^2 Q_T(x^T, \bar\theta_T)$ denotes the second derivative of the criterion function $Q_T(x^T, \cdot) : \Theta \to \mathbb{R}$ with respect to the parameter $\theta$, evaluated at some point $\bar\theta_T$ between $\hat\theta_T$ and $\theta_0$,
$$\nabla^2 Q_T(x^T, \bar\theta_T) = \frac{\partial^2 Q_T(x^T, \theta)}{\partial \theta \, \partial \theta'}\bigg|_{\theta = \bar\theta_T}.$$
Finally, since $\nabla Q_T(x^T, \hat\theta_T) = 0$, the expression in equation (1) above can be written as
$$0 = \nabla Q_T(x^T, \theta_0) + \nabla^2 Q_T(x^T, \bar\theta_T)\,(\hat\theta_T - \theta_0),$$
and this implies naturally that
$$(\hat\theta_T - \theta_0) = -\big[\nabla^2 Q_T(x^T, \bar\theta_T)\big]^{-1} \nabla Q_T(x^T, \theta_0).$$
Multiplying both sides by $\sqrt{T}$ yields the following very useful equality:
$$\sqrt{T}\,(\hat\theta_T - \theta_0) = -\big[\nabla^2 Q_T(x^T, \bar\theta_T)\big]^{-1} \sqrt{T}\,\nabla Q_T(x^T, \theta_0).$$

This equality is useful because it immediately suggests that we can obtain the asymptotic normality of the estimator $\hat\theta_T$ as long as we can show that
$$\sqrt{T}\,\nabla Q_T(x^T, \theta_0) \overset{d}{\to} N(0, \Sigma) \quad \text{as } T \to \infty$$
and
$$\big[\nabla^2 Q_T(x^T, \bar\theta_T)\big]^{-1} \overset{p}{\to} \big[\nabla^2 Q_\infty(\theta_0)\big]^{-1} \quad \text{as } T \to \infty. \tag{2}$$
Note that the second derivative of the criterion function $\nabla^2 Q_T(x^T, \bar\theta_T)$ is evaluated at some point $\bar\theta_T$ between $\hat\theta_T$ and $\theta_0$. If the estimator $\hat\theta_T$ is consistent for $\theta_0$, then we obtain the convergence in (2) as long as $\nabla^2 Q_T$ converges uniformly over $\Theta$ to $\nabla^2 Q_\infty$.
Theorem 1 Let $\bar\theta_T \overset{p}{\to} \theta_0$ as $T \to \infty$ and
$$\sup_{\theta \in \Theta}\, \big\| \nabla^2 Q_T(x^T, \theta) - \nabla^2 Q_\infty(\theta) \big\| \overset{p}{\to} 0 \quad \text{as } T \to \infty.$$
Then
$$\big[\nabla^2 Q_T(x^T, \bar\theta_T)\big]^{-1} \overset{p}{\to} \big[\nabla^2 Q_\infty(\theta_0)\big]^{-1} \quad \text{as } T \to \infty.$$

Theorem 2 below states the classical asymptotic normality conditions for extremum estimators. This remarkable theorem is the end result of decades of work that dates back to Doob (1934). It is, by all accounts, an impressive and beautiful theorem!

Theorem 2 (Asymptotic normality) Let $\hat\theta_T$ be a consistent extremum estimator for a parameter $\theta_0$ that lies in the interior of a compact parameter space $\Theta$. Suppose further that
1. The scaled criterion derivative is asymptotically normal at $\theta_0$:
$$\sqrt{T}\,\nabla Q_T(x^T, \theta_0) \overset{d}{\to} N(0, \Sigma) \quad \text{as } T \to \infty.$$
2. The second derivative of the criterion converges uniformly:
$$\sup_{\theta \in \Theta}\, \big\| \nabla^2 Q_T(x^T, \theta) - \nabla^2 Q_\infty(\theta) \big\| \overset{p}{\to} 0 \quad \text{as } T \to \infty.$$
3. The second derivative of the limit criterion $\nabla^2 Q_\infty(\theta_0)$ is invertible.
Then we have
$$\sqrt{T}\,(\hat\theta_T - \theta_0) \overset{d}{\to} N\big(0,\, A\,\Sigma\,A^\top\big) \quad \text{as } T \to \infty,$$
where $A = \big[\nabla^2 Q_\infty(\theta_0)\big]^{-1}$.


The conditions of the theorem above are quite simple. Let us look at a few examples; in particular, the usual suspects: the maximum likelihood and least squares estimators.

Example: (Maximum likelihood estimator) The ML estimator is given by
$$\hat\theta_T \in \arg\max_{\theta \in \Theta} L_T(x^T, \theta).$$
By Theorem 2 above, if the ML estimator $\hat\theta_T$ is consistent for $\theta_0$, then it is also asymptotically normal as long as the derivative of the log likelihood function $\nabla L_T$ (known as the score) converges in distribution to a normal centered at 0,
$$\sqrt{T}\,\nabla L_T(x^T, \theta_0) \overset{d}{\to} N(0, \Sigma) \quad \text{as } T \to \infty,$$
and the second derivative of the log likelihood function converges uniformly to an invertible limit $\nabla^2 L_\infty$,
$$\sup_{\theta \in \Theta}\, \big\| \nabla^2 L_T(x^T, \theta) - \nabla^2 L_\infty(\theta) \big\| \overset{p}{\to} 0 \quad \text{as } T \to \infty.$$

Example: (Least-squares estimator) The least-squares estimator is given by
$$\hat\theta_T \in \arg\min_{\theta \in \Theta} U_T(\theta),$$
where $U_T(\theta)$ is the sum of squared residuals. By Theorem 2 above, if the least-squares estimator $\hat\theta_T$ is consistent for $\theta_0$, then it is also asymptotically normal as long as the derivative of the least squares criterion converges in distribution to a normal at $\theta_0$,
$$\sqrt{T}\,\nabla U_T(x^T, \theta_0) \overset{d}{\to} N(0, \Sigma) \quad \text{as } T \to \infty,$$
and the second derivative of the least squares criterion converges uniformly to an invertible limit $\nabla^2 U_\infty$,
$$\sup_{\theta \in \Theta}\, \big\| \nabla^2 U_T(x^T, \theta) - \nabla^2 U_\infty(\theta) \big\| \overset{p}{\to} 0 \quad \text{as } T \to \infty.$$

The assumption that $\theta_0$ lies in the interior of $\Theta$ means simply that $\theta_0$ cannot be a point on the boundary of $\Theta$. For example, if $\Theta = [1, 50]$, then $\theta_0$ can be any point between 1 and 50, but we cannot have $\theta_0 = 1$ or $\theta_0 = 50$, as these are boundary points.

The invertibility requirement (condition 3) in Theorem 2 ensures identification of the parameter $\theta_0$. In particular, when $\theta_0$ is well identified, then the criterion $Q_\infty$ has strong curvature around $\theta_0$, and hence $\nabla^2 Q_\infty$ is non-singular and invertible. If $Q_\infty$ is flat (has no curvature) at $\theta_0$, then $\nabla^2 Q_\infty$ is singular and hence not invertible.

The uniform convergence of the second derivative (condition 2 in Theorem 2) can be obtained using the techniques introduced in Chapters 4 and 5. In particular, on a compact parameter space $\Theta$,
$$\sup_{\theta \in \Theta}\, \big\| \nabla^2 Q_T(x^T, \theta) - \nabla^2 Q_\infty(\theta) \big\| \overset{p}{\to} 0 \quad \text{as } T \to \infty$$
follows easily from the pointwise convergence of $\nabla^2 Q_T$ to $\nabla^2 Q_\infty$,
$$\nabla^2 Q_T(x^T, \theta) \overset{p}{\to} \nabla^2 Q_\infty(\theta) \quad \text{as } T \to \infty \ \text{ for every } \theta \in \Theta,$$
and the stochastic equicontinuity of $\{\nabla^2 Q_T\}$, which is ensured by
$$\sup_T\, \mathbb{E} \sup_{\theta \in \Theta}\, \big\| \nabla^3 Q_T(x^T, \theta) \big\| < \infty.$$

Finally, condition 1 in Theorem 2 is also easy to verify for M-estimators or Z-estimators. In the case of an M-estimator,
$$Q_T(x^T, \theta) = \frac{1}{T}\sum_{t=2}^T q(x_t, x_{t-1}, \theta),$$
the convergence in distribution of the scaled criterion derivative $\sqrt{T}\,\nabla Q_T(x^T, \theta_0)$ to a normal $N(0, \Sigma)$ distribution (condition 1 in Theorem 2) will typically be obtained by applying a central limit theorem to $\nabla q(x_t, x_{t-1}, \theta_0)$, since
$$\sqrt{T}\,\nabla Q_T(x^T, \theta_0) = \sqrt{T}\,\frac{1}{T}\sum_{t=2}^T \nabla q(x_t, x_{t-1}, \theta_0) = \sqrt{T}\left(\frac{1}{T}\sum_{t=2}^T \Big[\nabla q(x_t, x_{t-1}, \theta_0) - \mathbb{E}\nabla q(x_t, x_{t-1}, \theta_0)\Big]\right).$$
The second equality above is valid because, by construction, we have
$$\mathbb{E}\nabla q(x_t, x_{t-1}, \theta_0) = 0,$$
since $\theta_0$ is the unique maximizer of the limit criterion $Q_\infty(\theta) = \mathbb{E} q(x_t, x_{t-1}, \theta)$ in the interior of $\Theta$, and hence $Q_\infty$ has zero derivative at $\theta_0$.

Theorem 3 re-states the classical conditions for asymptotic normality for the special case of M-estimators.

Theorem 3 (Asymptotic normality) Let $\hat\theta_T$ be a consistent M-estimator for a parameter $\theta_0$ lying in the interior of a compact parameter space $\Theta$. Suppose further that
1. The scaled criterion derivative is asymptotically normal at $\theta_0$:
$$\sqrt{T}\,\frac{1}{T}\sum_{t=2}^T \nabla q(x_t, x_{t-1}, \theta_0) \overset{d}{\to} N(0, \Sigma) \quad \text{as } T \to \infty.$$
2. The second derivative of the criterion converges uniformly:
$$\sup_{\theta \in \Theta}\, \bigg\| \frac{1}{T}\sum_{t=2}^T \nabla^2 q(x_t, x_{t-1}, \theta) - \mathbb{E}\nabla^2 q(x_t, x_{t-1}, \theta) \bigg\| \overset{p}{\to} 0 \quad \text{as } T \to \infty.$$
3. The second derivative of the limit criterion $\mathbb{E}\nabla^2 q(x_t, x_{t-1}, \theta_0)$ is invertible.
Then we have
$$\sqrt{T}\,(\hat\theta_T - \theta_0) \overset{d}{\to} N\big(0,\, A\,\Sigma\,A^\top\big) \quad \text{as } T \to \infty,$$
where $A = \big[\mathbb{E}\nabla^2 q(x_t, x_{t-1}, \theta_0)\big]^{-1}$.

Example: (Maximum likelihood estimator of NLAR model) Let observed data $\{x_t\}_{t=1}^T$ be a subset of an SE sequence $\{x_t\}_{t\in\mathbb{Z}}$ and suppose that we want to estimate the following NLAR model
$$x_t = g(x_{t-1}, \theta) + \varepsilon_t, \quad t \in \mathbb{Z},$$
with iid innovations $\{\varepsilon_t\}$ with distribution $\varepsilon_t \sim f_\varepsilon$. Suppose that the ML estimator, given by
$$\hat\theta_T \in \arg\max_{\theta \in \Theta} \frac{1}{T}\sum_{t=2}^T \ell(x_t, x_{t-1}, \theta), \quad \text{where } \ell(x_t, x_{t-1}, \theta) := \log f_\varepsilon\big(x_t - g(x_{t-1}, \theta)\big),$$
is consistent for a parameter $\theta_0$ that lies in the interior of $\Theta$. To ensure the asymptotic normality of our ML estimator we just have to verify the conditions of Theorem 3 above. Suppose that $f_\varepsilon$ is continuously differentiable. Then, since the score function $\nabla\ell$ is continuous, we have that
$$\big\{\nabla\ell(x_t, x_{t-1}, \theta_0)\big\}_{t\in\mathbb{Z}}$$
is also strictly stationary and ergodic by Krengel's theorem. As a result, if the second moment of the score sequence is bounded at $\theta_0$,
$$\mathbb{E}\big\|\nabla\ell(x_t, x_{t-1}, \theta_0)\big\|^2 < \infty,$$
then by the central limit theorem for SE sequences (see Section 4.3.2) we have²
$$\sqrt{T}\,\frac{1}{T}\sum_{t=2}^T \nabla\ell(x_t, x_{t-1}, \theta_0) \overset{d}{\to} N(0, \Sigma) \quad \text{as } T \to \infty \ \text{ for some covariance matrix } \Sigma.$$

² Note that $\mathbb{E}\nabla\ell(x_t, x_{t-1}, \theta_0) = \nabla\mathbb{E}\ell(x_t, x_{t-1}, \theta_0) = 0$ since $\mathbb{E}\ell(x_t, x_{t-1}, \theta)$ is maximized at $\theta = \theta_0$.

Furthermore, if $\ell$ is twice continuously differentiable, then
$$\big\{\nabla^2\ell(x_t, x_{t-1}, \theta)\big\}_{t\in\mathbb{Z}}$$
is SE by Krengel's theorem. If the second derivative has a bounded first moment,
$$\mathbb{E}\big\|\nabla^2\ell(x_t, x_{t-1}, \theta)\big\| < \infty \quad \text{for every } \theta \in \Theta,$$
then application of a law of large numbers for every $\theta \in \Theta$ yields
$$\frac{1}{T}\sum_{t=2}^T \nabla^2\ell(x_t, x_{t-1}, \theta) \overset{p}{\to} \mathbb{E}\nabla^2\ell(x_t, x_{t-1}, \theta) \quad \text{as } T \to \infty \ \text{ for every } \theta \in \Theta.$$
Finally, if $\ell$ is three times continuously differentiable and the third derivative has a bounded first moment,
$$\mathbb{E}\sup_{\theta \in \Theta}\big\|\nabla^3\ell(x_t, x_{t-1}, \theta)\big\| < \infty,$$
then $\big\{\nabla^2\ell(x_t, x_{t-1}, \theta)\big\}_{t\in\mathbb{Z}}$ is stochastically equicontinuous on $\Theta$ and we obtain the uniform convergence of the second derivative
$$\sup_{\theta \in \Theta}\, \bigg\| \frac{1}{T}\sum_{t=2}^T \nabla^2\ell(x_t, x_{t-1}, \theta) - \mathbb{E}\nabla^2\ell(x_t, x_{t-1}, \theta) \bigg\| \overset{p}{\to} 0 \quad \text{as } T \to \infty.$$
All the conditions of Theorem 3 are thus verified and the estimator $\hat\theta_T$ is asymptotically normal.

Example: (Maximum likelihood estimator of Gaussian AR(1) model) Let the observed data $\{x_t\}_{t=1}^T$ be a subset of an SE sequence $\{x_t\}_{t\in\mathbb{Z}}$ with bounded fourth moment $\mathbb{E}|x_t|^4 < \infty$. Consider the Gaussian AR(1) model with $N(0, 1)$ innovations,
$$x_t = \alpha x_{t-1} + \varepsilon_t \quad \text{where } \{\varepsilon_t\}_{t\in\mathbb{Z}} \sim NID(0, 1).$$
Since $x_t \,|\, x_{t-1} \sim N(\alpha x_{t-1}, 1)$, the log likelihood function is given by
$$L_T(x^T, \alpha) = \frac{1}{T}\sum_{t=2}^T \Big[ -\frac{1}{2}\log 2\pi - \frac{1}{2}(x_t - \alpha x_{t-1})^2 \Big].$$
Suppose that the ML estimator $\hat\alpha_T$ is consistent for the parameter $\alpha_0$ lying in the interior of a compact parameter space. Then it is easy to see that $\hat\alpha_T$ is also asymptotically normal. First, at $\alpha_0$, the score is given by
$$\nabla L_T(x^T, \alpha_0) = \frac{1}{T}\sum_{t=2}^T (x_t - \alpha_0 x_{t-1})\,x_{t-1}.$$
Since $\{(x_t - \alpha_0 x_{t-1})\,x_{t-1}\}$ is SE (by Krengel's theorem) and
$$\mathbb{E}\big|(x_t - \alpha_0 x_{t-1})\,x_{t-1}\big|^2 \leq c\,\mathbb{E}|x_t x_{t-1}|^2 + c\,|\alpha_0|^2\,\mathbb{E}|x_{t-1}|^4 < \infty \quad (c_r\text{-inequality}),$$
we conclude by application of the central limit theorem for SE sequences that
$$\sqrt{T}\,\frac{1}{T}\sum_{t=2}^T (x_t - \alpha_0 x_{t-1})\,x_{t-1} \overset{d}{\to} N(0, \Sigma) \quad \text{as } T \to \infty.$$
Second, we note that the second derivative is given by
$$\nabla^2 L_T(x^T, \alpha) = -\frac{1}{T}\sum_{t=2}^T x_{t-1}^2.$$
Since $\{-x_{t-1}^2\}$ is SE and satisfies
$$\mathbb{E}\big|{-x_{t-1}^2}\big| = \mathbb{E}|x_{t-1}|^2 < \infty,$$
it follows by a pointwise law of large numbers that
$$-\frac{1}{T}\sum_{t=2}^T x_{t-1}^2 \overset{p}{\to} -\mathbb{E}x_{t-1}^2 \quad \text{as } T \to \infty \ \text{ for every } \alpha.$$
Finally, since the third derivative of the criterion function with respect to $\alpha$ is just zero, the moment bound $\mathbb{E}\big|\nabla^3 L_T(x^T, \alpha)\big| < \infty$ holds trivially and we conclude that $\{\nabla^2 L_T(x^T, \alpha)\}$ is stochastically equicontinuous. This means naturally that $\nabla^2 L_T$ converges uniformly over the compact parameter space,
$$\sup_{\alpha}\, \bigg| -\frac{1}{T}\sum_{t=2}^T x_{t-1}^2 + \mathbb{E}x_{t-1}^2 \bigg| \overset{p}{\to} 0 \quad \text{as } T \to \infty.$$
As a result, all the conditions of Theorem 3 are satisfied and we conclude that $\hat\alpha_T$ is asymptotically normal.

In the following example we use the property of well behaved functions to obtain our
results much more quickly!
Example: (Least-squares estimator of NLAR(1) model) Let the observed data $\{x_t\}_{t=1}^T$ be a subset of an SE sequence $\{x_t\}_{t\in\mathbb{Z}}$ with bounded fourth moment $\mathbb{E}|x_t|^4 < \infty$. Consider the following NLAR(1) model
$$x_t = \alpha + \beta \tanh(x_{t-1}) + \varepsilon_t$$
with $\theta = (\alpha, \beta)$ and parameter space $\Theta = [-5, 5] \times [-2, 2]$. The criterion function of the LS estimator takes the form
$$Q_T(x^T, \theta) = \frac{1}{T}\sum_{t=2}^T q(x_t, x_{t-1}, \theta) \quad \text{with} \quad q(x_t, x_{t-1}, \theta) = \big(x_t - \alpha - \beta \tanh(x_{t-1})\big)^2.$$
If the least squares estimator $\hat\theta_T$ is consistent for $\theta_0$ in the interior of $\Theta$, then asymptotic normality follows easily. In particular, since $q$ is three times continuously differentiable, all derivative processes are SE, i.e.
$$\{\nabla q(x_t, x_{t-1}, \theta)\}, \quad \{\nabla^2 q(x_t, x_{t-1}, \theta)\} \quad \text{and} \quad \{\nabla^3 q(x_t, x_{t-1}, \theta)\} \quad \text{are SE}.$$
Furthermore, since $q(x_t, x_{t-1}, \theta)$ is well behaved and has bounded moments of second order,
$$\mathbb{E}|q(x_t, x_{t-1}, \theta)|^2 = \mathbb{E}\big|\big(x_t - \alpha - \beta\tanh(x_{t-1})\big)^2\big|^2 = \mathbb{E}\big|x_t - \alpha - \beta\tanh(x_{t-1})\big|^4$$
$$\leq c\,\mathbb{E}|x_t|^4 + c\,|\alpha|^4 + c\,|\beta|^4\,\mathbb{E}|\tanh(x_{t-1})|^4 \quad (c_r\text{-inequality})$$
$$\leq c\,\mathbb{E}|x_t|^4 + c\,|\alpha|^4 + c\,|\beta|^4 < \infty \quad (|\tanh(z)| < 1 \ \forall z),$$
we conclude immediately that the derivatives $\nabla q(x_t, x_{t-1}, \theta)$, $\nabla^2 q(x_t, x_{t-1}, \theta)$ and $\nabla^3 q(x_t, x_{t-1}, \theta)$ also have two bounded moments,
$$\mathbb{E}\|\nabla q(x_t, x_{t-1}, \theta)\|^2 < \infty, \quad \mathbb{E}\|\nabla^2 q(x_t, x_{t-1}, \theta)\|^2 < \infty \quad \text{and} \quad \mathbb{E}\|\nabla^3 q(x_t, x_{t-1}, \theta)\|^2 < \infty.$$
The bounded moment $\mathbb{E}\|\nabla q(x_t, x_{t-1}, \theta)\|^2 < \infty$ gives us the asymptotic normality of the score,
$$\sqrt{T}\,\frac{1}{T}\sum_{t=2}^T \nabla q(x_t, x_{t-1}, \theta_0) \overset{d}{\to} N(0, \Sigma) \quad \text{as } T \to \infty.$$
The bounded moment $\mathbb{E}\|\nabla^2 q(x_t, x_{t-1}, \theta)\| < \infty$ gives us a pointwise law of large numbers for the second derivative. The bounded moment $\mathbb{E}\|\nabla^3 q(x_t, x_{t-1}, \theta)\| < \infty$ gives us stochastic equicontinuity of $\{\nabla^2 q(x_t, x_{t-1}, \theta)\}$ and hence the uniform convergence
$$\sup_{\theta \in \Theta}\, \bigg\| \frac{1}{T}\sum_{t=2}^T \nabla^2 q(x_t, x_{t-1}, \theta) - \mathbb{E}\nabla^2 q(x_t, x_{t-1}, \theta) \bigg\| \overset{p}{\to} 0 \quad \text{as } T \to \infty.$$
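
For concreteness, here is a minimal numerical sketch of this nonlinear least-squares problem, assuming numpy and scipy are available; the data-generating values are purely illustrative and not part of the example above.

```python
# Nonlinear least squares for x_t = alpha + beta * tanh(x_{t-1}) + eps_t.
# Minimal sketch; numpy and scipy assumed available, values illustrative.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
alpha0, beta0, T = 0.3, 0.8, 1000

x = np.zeros(T)
for t in range(1, T):
    x[t] = alpha0 + beta0 * np.tanh(x[t - 1]) + rng.standard_normal()

def Q_T(theta):
    a, b = theta
    resid = x[1:] - a - b * np.tanh(x[:-1])
    return np.mean(resid ** 2)            # (1/T) * sum of squared residuals

theta_hat = minimize(Q_T, x0=np.array([0.0, 0.0]), method="BFGS").x
print(theta_hat)                          # should be close to (0.3, 0.8)
```
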

6.3 Approximate Statistical Inference

In the previous section we have learned that, under appropriate conditions, an extremum estimator $\hat\theta_T$ that is consistent is also asymptotically normal,
$$\sqrt{T}\,(\hat\theta_T - \theta_0) \overset{d}{\to} N\big(0,\, A\,\Sigma\,A^\top\big) \quad \text{as } T \to \infty.$$
From an inferential perspective, we use the asymptotic distribution as an approximate distribution for the estimator $\hat\theta_T$. It would certainly be better to know the exact distribution of the estimator, but in these complicated dynamic nonlinear settings it is simply impossible to derive the exact distribution of the estimator. If we take the asymptotic distribution as an approximate distribution, we essentially have
$$\sqrt{T}\,(\hat\theta_T - \theta_0) \overset{approx}{\sim} N\big(0,\, A\,\Sigma\,A^\top\big),$$
where $\overset{approx}{\sim}$ denotes an approximate distribution. This means, naturally, that
$$(\hat\theta_T - \theta_0) \overset{approx}{\sim} N\big(0,\, \tfrac{1}{T} A\,\Sigma\,A^\top\big)$$
and hence that
$$\hat\theta_T \overset{approx}{\sim} N\big(\theta_0,\, \tfrac{1}{T} A\,\Sigma\,A^\top\big).$$
The above expression tells us that the extremum estimator is approximately centered at the true parameter $\theta_0$ and is approximately normally distributed with variance-covariance matrix $\frac{1}{T} A\,\Sigma\,A^\top$ that vanishes to zero as $T \to \infty$. The vanishing variance is natural for a consistent estimator since we expect it to converge to $\theta_0$ asymptotically.

For simplicity, consider the case of a scalar parameter $\theta$. Then the asymptotic distribution of $\hat\theta_T$ is given by
$$\sqrt{T}\,(\hat\theta_T - \theta_0) \overset{d}{\to} N\big(0,\, A^2\Sigma\big) \quad \text{as } T \to \infty,$$
where $A$ and $\Sigma$ are just scalars. Recall that $\Sigma$ is the asymptotic variance of $\sqrt{T}\,\nabla Q_T(\theta_0)$ and $A = 1/\nabla^2 Q_\infty(\theta_0)$. This means that the larger the variance $\Sigma$ of the criterion derivative is, the larger the asymptotic variance $A^2\Sigma$ of $\hat\theta_T$ becomes. On the other hand, the stronger the curvature of $Q_\infty$ at $\theta_0$, the smaller $A = 1/\nabla^2 Q_\infty(\theta_0)$ becomes, and hence the smaller the asymptotic variance of $\hat\theta_T$ becomes. This happens because a log likelihood with strong curvature at $\theta_0$ is a log likelihood that identifies the parameter well. On the contrary, if the curvature of the limit criterion is weak, then the parameter $\theta_0$ is not well identified and the asymptotic variance of $\hat\theta_T$ becomes large. This approximate distribution of the estimator $\hat\theta_T$ finally allows us to conduct inference on parameters.

Example: (Logistic SESTAR model) Consider the following estimation results for a logistic SESTAR
$$x_t = g(x_{t-1}; \theta)\,x_{t-1} + \varepsilon_t, \qquad g(x_{t-1}; \theta) := \alpha + \frac{\beta}{1 + \exp(x_{t-1})}, \qquad t \in \mathbb{Z}.$$

Parameter | Estimate | Std Error
$\alpha$  |  0.864   |  0.132
$\beta$   |  0.127   |  0.045

We can test the existence of nonlinear dynamics by testing if $\beta = 0$. Indeed, under the null hypothesis $H_0 : \beta = 0$, the dynamics are given by a linear AR(1) model. Under the null hypothesis, the estimator $\hat\beta_T$ has an approximate normal distribution given by $\hat\beta_T \sim N(0, 0.045^2)$. The probability of obtaining an estimate of $\hat\beta_T = 0.127$ (or larger in absolute value) if indeed $\beta = 0$ is rather small, $p \approx 0.0048$. Maybe we should reject the null hypothesis. There is strong evidence that the dynamics of the data are nonlinear.

Example: (GARCH model) Consider the following estimation results for the GARCH model
$$x_t = \sigma_t \varepsilon_t, \quad \{\varepsilon_t\}_{t\in\mathbb{Z}} \sim NID(0, 1), \qquad \sigma_t^2 = \omega + \alpha x_{t-1}^2 + \beta \sigma_{t-1}^2.$$

Parameter | Estimate | Std Error
$\omega$  |  0.023   |  0.001
$\alpha$  |  0.097   |  0.057
$\beta$   |  0.872   |  0.132

We can test the existence of a time-varying volatility $\{\sigma_t^2\}$ driven by lagged returns $x_{t-1}^2$ by testing if $\alpha = 0$. Indeed, under the null hypothesis $H_0 : \alpha = 0$, the volatility simply converges to the constant $\omega/(1 - \beta)$ and does not respond to changes in $x_{t-1}^2$. Under the null hypothesis that $\alpha = 0$, the estimator $\hat\alpha_T$ has an approximate normal distribution given by $\hat\alpha_T \sim N(0, 0.057^2)$. The probability of obtaining an estimate of $\hat\alpha_T = 0.097$ (or larger in absolute value) if indeed $\alpha = 0$ is $p \approx 0.088$. This is a probability of almost 9%. Should we reject this null hypothesis? This is surely a matter of debate!

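
Both p-values above follow from the approximate normal null distribution of the estimator, using the reported standard errors. A minimal sketch of the (two-sided) calculation, assuming scipy is available:

```python
# Two-sided p-values for H0: parameter = 0, using the reported standard errors.
# Minimal sketch; scipy assumed available.
from scipy.stats import norm

p_sestar = 2 * norm.sf(0.127 / 0.045)   # logistic SESTAR: ~ 0.0048
p_garch = 2 * norm.sf(0.097 / 0.057)    # GARCH:           ~ 0.089
print(p_sestar, p_garch)
```
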
In the examples above, the asymptotic variance of the estimator was obtained by estimating $\Sigma$ and $A$. Under the null hypothesis $H_0 : \theta_0 = \theta^\ast$ (for some hypothesized value $\theta^\ast$), Theorem 3 tells us that $\Sigma$ is the asymptotic variance of the standardized derivative of the criterion function,
$$\sqrt{T}\,\frac{1}{T}\sum_{t=2}^T \nabla q(x_t, x_{t-1}, \theta_0) \overset{d}{\to} N(0, \Sigma) \quad \text{as } T \to \infty.$$
When the model is well specified, then $\{\nabla q(x_t, x_{t-1}, \theta_0)\}$ is uncorrelated and hence we can estimate $\Sigma$ using
$$\hat\Sigma_T = \frac{1}{T}\sum_{t=2}^T \nabla q(x_t, x_{t-1}, \theta_0)^2.$$
If the model is mis-specified, then $\{\nabla q(x_t, x_{t-1}, \theta_0)\}$ might be correlated, and as a result we must use a robust estimator $\hat\Sigma_T$. See Section 7.4 for further details.

Under the null hypothesis $H_0 : \theta_0 = \theta^\ast$, Theorem 3 also tells us that $A$ is given by
$$A = \big[\mathbb{E}\nabla^2 q(x_t, x_{t-1}, \theta_0)\big]^{-1}.$$
As a result, we can estimate $A$ using the sample average
$$\hat A_T = \bigg[\frac{1}{T}\sum_{t=2}^T \nabla^2 q(x_t, x_{t-1}, \theta_0)\bigg]^{-1}.$$
We thus obtain a final estimated distribution for the estimator given by
$$\hat\theta_T \overset{approx}{\sim} N\big(\theta_0,\, \hat A_T \hat\Sigma_T \hat A_T^\top / T\big).$$
Please read Section 7.4 for further information.

Note: Software packages typically give estimates of the asymptotic variance that are based on the assumption of correct specification! As a serious econometrician, you surely recognize the great limitations of this approach!
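
To make the estimation of $A$ and $\Sigma$ concrete, the sketch below computes the sandwich variance $\hat A_T \hat\Sigma_T \hat A_T^\top / T$ for the scalar AR(1) least-squares criterion $q(x_t, x_{t-1}, \theta) = (x_t - \theta x_{t-1})^2$, using the analytical per-observation derivatives. This is only an illustration under the assumption of correct specification; the robust (mis-specification-consistent) version replaces $\hat\Sigma_T$ by a HAC-type estimator as discussed in Section 7.4. Function and variable names are illustrative, and numpy is assumed available.

```python
# Sandwich variance A_hat * Sigma_hat * A_hat / T for the scalar AR(1)
# least-squares criterion q(x_t, x_{t-1}, theta) = (x_t - theta * x_{t-1})^2.
# Minimal sketch; numpy assumed available, names illustrative.
import numpy as np

def sandwich_variance(x, theta_hat):
    x_lag, x_cur = x[:-1], x[1:]
    scores = -2.0 * (x_cur - theta_hat * x_lag) * x_lag   # per-observation d q / d theta
    hessians = 2.0 * x_lag ** 2                           # per-observation d^2 q / d theta^2
    Sigma_hat = np.mean(scores ** 2)      # valid if the scores are uncorrelated
    A_hat = 1.0 / np.mean(hessians)       # inverse of the average Hessian
    return A_hat * Sigma_hat * A_hat / x_cur.size

# Usage on simulated AR(1) data:
rng = np.random.default_rng(2)
x = np.zeros(1000)
for t in range(1, 1000):
    x[t] = 0.5 * x[t - 1] + rng.standard_normal()
theta_hat = np.sum(x[1:] * x[:-1]) / np.sum(x[:-1] ** 2)
print(theta_hat, np.sqrt(sandwich_variance(x, theta_hat)))   # estimate and std error
```
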

6.4 Asymptotic Normality: Well-Behaved Functions

6.4.1 Stochastic Equicontinuity

In Chapter 5 we have seen that the property of stochastic equicontinuity can be easily established when functions are well behaved. In particular, we noted that the uniform moment bound on the derivative
$$\mathbb{E}\sup_{\theta \in \Theta}\|\nabla q(x_t, x_{t-1}, \theta)\| < \infty$$
is implied by a moment bound on the well behaved function of order 1 (see Chapter 4),
$$\mathbb{E}|q(x_t, x_{t-1}, \theta)| < \infty \quad \text{for some } \theta \in \Theta.$$
We further noted that, for time-varying parameter models, the uniform moment bound on the derivative
$$\mathbb{E}\sup_{\theta \in \Theta}\|\nabla q(x_t, \hat f_t(\theta, \hat f_1), \theta)\| < \infty$$
is implied by a moment bound on the well behaved function (see Chapter 4),
$$\mathbb{E}|q(x_t, \hat f_t(\theta, \hat f_1), \theta)|^2 < \infty \quad \text{for some } \theta \in \Theta,$$
when both the criterion function $q$ and the filtering function are well behaved of second order in their arguments. Here $\hat f_t(\theta, \hat f_1)$ denotes the filtered time-varying parameter started at the initialization $\hat f_1$. These results simplified considerably the proofs of consistency, since stochastic equicontinuity is a crucial ingredient for ensuring the uniform convergence of criterion functions. As we shall now see, asymptotic normality proofs are also easier for well behaved criterion functions. In particular, we can use bounded moments of $q(x_t, x_{t-1}, \theta)$ to establish:
1. Bounded moments for $\nabla q(x_t, x_{t-1}, \theta)$, which are typically needed to apply a central limit theorem and obtain the asymptotic normality of the criterion's first derivative.
2. Bounded moments for $\nabla^2 q(x_t, x_{t-1}, \theta)$, which are important to establish a law of large numbers for the pointwise convergence of the criterion's second derivative.
3. Bounded moments for $\nabla^3 q(x_t, x_{t-1}, \theta)$, which play a role in establishing the stochastic equicontinuity of the criterion's second derivative and its uniform convergence.

Theorem 4 (Simple Moments for Asymptotic Normality) Let $q(x_t, x_{t-1}, \theta)$ be three times continuously differentiable and well behaved of order 2, with two derivatives $\nabla q$ and $\nabla^2 q$ that are well behaved of order 1 in $\theta$. Then having
$$\mathbb{E}|q(x_t, x_{t-1}, \theta)|^2 < \infty \quad \text{for some } \theta \in \Theta$$
implies that
$$\mathbb{E}\|\nabla q(x_t, x_{t-1}, \theta_0)\|^2 < \infty \quad \text{and} \quad \mathbb{E}\sup_{\theta \in \Theta}\|\nabla^i q(x_t, x_{t-1}, \theta)\| < \infty, \quad i = 2, 3.$$

The result above follows naturally by noting that if $q(x_t, x_{t-1}, \theta)$ and its derivatives are well behaved, then the moments are transferred to higher-order derivatives uniformly in $\theta$. In essence, the two bounded moments stretch to any derivative uniformly on the parameter space.

Theorem 5 (Simple Moments for Asymptotic Normality in Time-Varying Parameter Models) Let the function $q(x_t, \hat f_t(\theta, \hat f_1), \theta)$ be three times continuously differentiable and well behaved of order 4 in both $\hat f_t(\theta, \hat f_1)$ and $\theta$, with two derivatives $\nabla q$ and $\nabla^2 q$ that are well behaved of order 2 in both $\hat f_t(\theta, \hat f_1)$ and $\theta$. Furthermore, let $\hat f_t(\theta, \hat f_1)$ be a time-varying parameter
$$\hat f_{t+1} = \phi(\hat f_t, x_t, \theta)$$
with a three-times continuously differentiable updating function $\phi$ that is well-behaved of order 4 in both $\hat f_t(\theta, \hat f_1)$ and $\theta$, and that satisfies the following conditions for $\theta \in \Theta$:
$$\mathbb{E}|\hat f_t(\theta)|^4 < \infty, \qquad \mathbb{E}|\phi(\hat f_1, x_t, \theta)|^4 < \infty \qquad \text{and} \qquad \mathbb{E}\sup_{f}\bigg\|\frac{\partial \phi(f, x_t, \theta)}{\partial f}\bigg\|^4 < 1.$$
Then having
$$\mathbb{E}|q(x_t, \hat f_t(\theta, \hat f_1), \theta)|^4 < \infty \quad \text{for some } \theta \in \Theta$$
implies that
$$\mathbb{E}\big\|\nabla q(x_t, \hat f_t(\theta_0, \hat f_1), \theta_0)\big\|^2 < \infty \quad \text{and} \quad \mathbb{E}\sup_{\theta \in \Theta}\big\|\nabla^i q(x_t, \hat f_t(\theta, \hat f_1), \theta)\big\| < \infty, \quad i = 2, 3.$$

Note that the moment bound of Theorem 5 is more restrictive than the moment bound of Theorem 4. In particular, Theorem 4 only required the second moment of $q(x_t, x_{t-1}, \theta)$ to be bounded, whereas Theorem 5 requires a bounded fourth moment.

6.4.2 Misspecification and a CLT for $L_p$-Approximable Processes

The usefulness of well-behaved functions does not end with the easy moment conditions required for stochastic equicontinuity. One further simplification can be achieved with well-behaved functions: a CLT is always available for the criterion derivative, even if the model is mis-specified!

As noted before, the classical theorem of asymptotic normality for M-estimators makes use of the asymptotic normality of the criterion derivative
$$\sqrt{T}\,\frac{1}{T}\sum_{t=2}^T \nabla q(x_t, x_{t-1}, \theta_0) \overset{d}{\to} N(0, \Sigma) \quad \text{as } T \to \infty.$$
When the model is well specified, then $\{\nabla q(x_t, x_{t-1}, \theta_0)\}$ is a martingale difference sequence (essentially an uncorrelated sequence with mean zero). As a result, if it can be shown that $\{\nabla q(x_t, x_{t-1}, \theta_0)\}$ is also SE, then we can apply the central limit theorem for SE martingales stated in Chapter 4.

Unfortunately, when the model is mis-specified, then $\{\nabla q(x_t, x_{t-1}, \theta_0)\}$ typically fails to be a martingale difference sequence. This prevents us from applying the central limit theorem for SE martingales. In general, it might even prevent us from applying any central limit theorem. Fortunately, if $q(x_t, x_{t-1}, \theta_0)$ is well behaved with well behaved derivative, then $\{\nabla q(x_t, x_{t-1}, \theta_0)\}$ is $L_p$-approximable by a mixingale. You do not have to know exactly what this means (feel free to read Pötscher and Prucha (1997) for details). It is sufficient to know that if $\{\nabla q(x_t, x_{t-1}, \theta_0)\}$ is $L_p$-approximable by a mixingale, then the process can exhibit some temporal dependence, as long as this dependence vanishes sufficiently fast.

Theorem 6 Let $\hat\theta_T$ be an M-estimator defined as
$$\hat\theta_T \in \arg\max_{\theta \in \Theta} \frac{1}{T}\sum_{t=2}^T q(x_t, x_{t-1}, \theta).$$
Suppose that $\theta_0$ is the unique maximizer of the limit criterion
$$Q_\infty(\theta) = \mathbb{E} q(x_t, x_{t-1}, \theta)$$
and let $q(x_t, x_{t-1}, \theta)$ be well behaved of order 1 and continuously differentiable with one bounded moment. If the model is mis-specified, then the derivative $\{\nabla q(x_t, x_{t-1}, \theta_0)\}$ is $L_p$-approximable by a mixingale sequence.

The theorem above is crucial as it allows us to obtain a central limit theorem for SE mixingale sequences. This theorem is shown below.

Theorem 7 (Central Limit Theorem) Let $\{z_t\}_{t\in\mathbb{Z}}$ be a strictly stationary and ergodic sequence that is $L_p$-approximable by a mixingale sequence with $\mathbb{E}(z_1) = 0$ and $\mathrm{Var}(z_1) = \sigma^2 < \infty$. Then
$$\sqrt{T}\,\frac{1}{T}\sum_{t=2}^T z_t \overset{d}{\to} N(0, \bar\sigma^2) \quad \text{as } T \to \infty \quad \text{for some } \bar\sigma^2 < \infty.$$

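
The long-run variance appearing in such CLTs is no longer $\mathrm{Var}(z_1)$ once the $z_t$ are dependent, and it is typically estimated with a kernel (Newey-West type) estimator. The sketch below is one common way to do this; it is an illustration only (numpy assumed available), and robust variance estimation is discussed further in Section 7.4.

```python
# Newey-West (Bartlett kernel) estimate of the long-run variance of a
# mean-zero dependent sequence z_1, ..., z_T.  Minimal sketch; numpy assumed.
import numpy as np

def long_run_variance(z, n_lags):
    z = np.asarray(z) - np.mean(z)        # demean, in case the sample mean is not exactly 0
    T = z.size
    lrv = np.sum(z ** 2) / T              # lag-0 autocovariance
    for k in range(1, n_lags + 1):
        gamma_k = np.sum(z[k:] * z[:-k]) / T                # lag-k autocovariance
        lrv += 2.0 * (1.0 - k / (n_lags + 1.0)) * gamma_k   # Bartlett weight
    return lrv

# Usage on an AR(1)-type dependent sequence:
rng = np.random.default_rng(3)
z = np.zeros(5000)
for t in range(1, 5000):
    z[t] = 0.4 * z[t - 1] + rng.standard_normal()
print(long_run_variance(z, n_lags=20))    # roughly 1 / (1 - 0.4)^2 ~ 2.8
```
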
6.4.3 Score Normality for Time-Varying Parameter Models

We end this chapter with one last advantage of well-behaved functions; namely, that well-behaved functions can help us derive asymptotic normality results for time-varying parameter models.

The problem we face is simple: in time-varying parameter models we cannot apply a CLT to the criterion derivative because it depends on the filtered parameter, which is not SE. Indeed, in time-varying parameter models the first derivative of the criterion function of an M-estimator takes the form
$$\nabla Q_T(x^T, \theta) = \frac{1}{T}\sum_{t=2}^T \nabla q(x_t, \hat f_t(\theta, \hat f_1), \theta),$$
where $\hat f_t(\theta, \hat f_1)$ denotes the filtered time-varying parameter at time $t$. We learned in Chapter 4 that the filtered parameter cannot be SE, regardless of its initialization!

As we already know, the asymptotic normality of the M-estimator $\hat\theta_T$ that maximizes $Q_T(x^T, \theta)$ can be obtained by establishing first the asymptotic normality of the first derivative of the criterion function at $\theta_0$. However, we face an obvious question: how can we apply a CLT to the criterion derivative if the SE property is not satisfied?

Luckily, the solution is simple. If the filter $\{\hat f_t(\theta_0, \hat f_1)\}_{t\in\mathbb{N}}$ satisfies Bougerol's theorem, then we can:
1. substitute the limit SE filter $\{\tilde f_t(\theta_0)\}_{t\in\mathbb{Z}}$ for the filter $\{\hat f_t(\theta_0, \hat f_1)\}_{t\in\mathbb{N}}$ in the criterion derivative;
2. apply the CLT using the limit SE filter;
3. argue that if the criterion derivative is well behaved of order 1, then the error incurred by replacing $\{\hat f_t(\theta_0, \hat f_1)\}_{t\in\mathbb{N}}$ with $\{\tilde f_t(\theta_0)\}_{t\in\mathbb{Z}}$ is asymptotically negligible.

This three-step argument for establishing asymptotic normality is summarized in Theorem 8 below. Note that this theorem makes use of assumptions that were already used for establishing stochastic equicontinuity in Theorem 5 above. As such, Theorem 8 does not impose any real additional restrictions besides the asymptotic normality of the score under the limit filter. Namely, note that the assumption that $\nabla q$ is well behaved of order 1 is already implied by the assumption of Theorem 5 that $\nabla q$ is well behaved of order 2. Similarly, the moment bound $\mathbb{E}\|\nabla q(x_t, \hat f_t(\theta_0, \hat f_1), \theta_0)\| < \infty$ is implied by the conditions of Theorem 5, which ensure $\mathbb{E}\|\nabla q(x_t, \hat f_t(\theta_0, \hat f_1), \theta_0)\|^2 < \infty$.

Theorem 8 Let $q(x_t, \hat f_t(\theta, \hat f_1), \theta)$ be continuously differentiable and well behaved of order 1 in both $\hat f_t(\theta, \hat f_1)$ and $\theta$. Suppose that $\mathbb{E}\|\nabla q(x_t, \hat f_t(\theta_0, \hat f_1), \theta_0)\| < \infty$ and that a CLT applies under the limit filter,
$$\sqrt{T}\,\bigg[\frac{1}{T}\sum_{t=2}^T \nabla q(x_t, \tilde f_t(\theta_0), \theta_0) - \mathbb{E}\nabla q(x_t, \tilde f_t(\theta_0), \theta_0)\bigg] \overset{d}{\to} N(0, V) \quad \text{as } T \to \infty.$$
Then the following also holds:
$$\sqrt{T}\,\bigg[\frac{1}{T}\sum_{t=2}^T \nabla q(x_t, \hat f_t(\theta_0, \hat f_1), \theta_0) - \mathbb{E}\nabla q(x_t, \tilde f_t(\theta_0), \theta_0)\bigg] \overset{d}{\to} N(0, V) \quad \text{as } T \to \infty.$$

The proof of Theorem 8 is quite simple and intuitive. First, add and subtract the criterion derivative under the limit filter, and note that
$$\sqrt{T}\,\frac{1}{T}\sum_{t=2}^T \nabla q(x_t, \hat f_t(\theta, \hat f_1), \theta)
= \sqrt{T}\,\frac{1}{T}\sum_{t=2}^T \Big[\nabla q(x_t, \hat f_t(\theta, \hat f_1), \theta) - \nabla q(x_t, \tilde f_t(\theta), \theta) + \nabla q(x_t, \tilde f_t(\theta), \theta)\Big]$$
$$= \sqrt{T}\,\frac{1}{T}\sum_{t=2}^T \Big[\nabla q(x_t, \hat f_t(\theta, \hat f_1), \theta) - \nabla q(x_t, \tilde f_t(\theta), \theta)\Big]
+ \sqrt{T}\,\bigg[\frac{1}{T}\sum_{t=2}^T \nabla q(x_t, \tilde f_t(\theta), \theta) - \mathbb{E}\nabla q(x_t, \tilde f_t(\theta), \theta)\bigg],$$
where the last step is valid because $\mathbb{E}\nabla q(x_t, \tilde f_t(\theta_0), \theta_0) = 0$. Clearly, since a CLT holds for the last term by assumption,
$$\sqrt{T}\,\bigg[\frac{1}{T}\sum_{t=2}^T \nabla q(x_t, \tilde f_t(\theta_0), \theta_0) - \mathbb{E}\nabla q(x_t, \tilde f_t(\theta_0), \theta_0)\bigg] \overset{d}{\to} N(0, V) \quad \text{as } T \to \infty,$$
the desired result is obtained by showing that the error term vanishes asymptotically,
$$\sqrt{T}\,\frac{1}{T}\sum_{t=2}^T \Big[\nabla q(x_t, \hat f_t(\theta, \hat f_1), \theta) - \nabla q(x_t, \tilde f_t(\theta), \theta)\Big] \overset{p}{\to} 0 \quad \text{as } T \to \infty.$$
This is easily achieved by noting that, since $\nabla q$ is continuously differentiable, it follows by the mean value theorem that
$$\sqrt{T}\,\Big\|\nabla q(x_t, \hat f_t(\theta, \hat f_1), \theta) - \nabla q(x_t, \tilde f_t(\theta), \theta)\Big\| \leq \Big\|\nabla^2 q(x_t, \bar f_t, \theta)\Big\|\,\sqrt{T}\,\Big\|\hat f_t(\theta, \hat f_1) - \tilde f_t(\theta)\Big\|,$$
where $\bar f_t$ lies between $\hat f_t(\theta, \hat f_1)$ and $\tilde f_t(\theta)$. Now, since $\mathbb{E}\|\nabla q(x_t, \hat f_t(\theta_0, \hat f_1), \theta_0)\| < \infty$ and $\nabla q$ is well-behaved of order 1, it follows immediately that $\|\nabla^2 q(x_t, \bar f_t, \theta)\|$ is bounded in probability. Furthermore, since Bougerol's theorem applies, we have that $\|\hat f_t(\theta, \hat f_1) - \tilde f_t(\theta)\| \overset{e.a.s.}{\to} 0$ as $t \to \infty$. Taken together, these two ingredients imply that
$$\Big\|\nabla^2 q(x_t, \bar f_t, \theta)\Big\|\,\sqrt{T}\,\Big\|\hat f_t(\theta, \hat f_1) - \tilde f_t(\theta)\Big\| \overset{p}{\to} 0;$$
see Straumann and Mikosch (2006) for a proof of this statement.
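
The key ingredient $\|\hat f_t(\theta, \hat f_1) - \tilde f_t(\theta)\| \to 0$ can be visualized by running the same filter recursion from two different initializations on the same data: under a contraction condition, the two paths merge exponentially fast. Below is a minimal sketch for a GARCH-type volatility recursion, with purely illustrative parameter values and numpy assumed available.

```python
# Two runs of the same volatility filter, started from different initial values
# and applied to the same data: the difference dies out geometrically.
# Minimal sketch; numpy assumed available, parameter values illustrative.
import numpy as np

rng = np.random.default_rng(4)
omega, alpha, beta, T = 0.05, 0.10, 0.85, 200

x = rng.standard_normal(T)        # some observed data (illustrative)
f_a, f_b = 0.1, 5.0               # two different initializations of the filter
gap = []
for t in range(T):
    gap.append(abs(f_a - f_b))
    f_a = omega + alpha * x[t] ** 2 + beta * f_a
    f_b = omega + alpha * x[t] ** 2 + beta * f_b

print(gap[:3], gap[-3:])          # the gap shrinks like beta**t
```
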

6.5 Exercises

1. Let the sample of data $\{x_t\}_{t=1}^T$ be a subset of an SE time series $\{x_t\}_{t\in\mathbb{Z}}$ with four bounded moments $\mathbb{E}|x_t|^4 < \infty$. Suppose the least squares estimator $\hat\theta_T$ is consistent for some $\theta_0$ in the interior of $\Theta$. Give sufficient conditions for the asymptotic normality of $\hat\theta_T$ in the following regressions:

(a) Fat-tailed sigmoid AR(1):
$$x_t = \alpha + \beta \cos(x_{t-1}) + \varepsilon_t, \quad \{\varepsilon_t\}_{t\in\mathbb{Z}} \sim TID(7).$$

(b) Logistic SESTAR:
$$x_t = g(x_{t-1}; \theta)\,x_{t-1} + \varepsilon_t, \quad \{\varepsilon_t\}_{t\in\mathbb{Z}} \sim NID(0, 1), \qquad g(x_{t-1}; \theta) := \frac{\beta}{1 + \exp(x_{t-1})}, \quad t \in \mathbb{Z}.$$

(c) Exponential SESTAR:
$$x_t = g(x_{t-1}; \theta)\,x_{t-1} + \varepsilon_t, \quad \{\varepsilon_t\}_{t\in\mathbb{Z}} \sim NID(0, 2), \qquad g(x_{t-1}; \theta) := \alpha + \frac{\beta}{1 + \exp\big(\delta + \gamma x_{t-1}^2\big)}, \quad t \in \mathbb{Z}.$$

2. Let the sample of data $\{x_t\}_{t=1}^T$ be a subset of an SE time series $\{x_t\}_{t\in\mathbb{Z}}$ with 10 bounded moments $\mathbb{E}|x_t|^{10} < \infty$. Suppose the maximum likelihood estimator $\hat\theta_T$ is consistent for some $\theta_0$ in the interior of $\Theta$. Give sufficient conditions for the asymptotic normality of $\hat\theta_T$ in the following regressions:

(a) Fat-tailed sigmoid AR(1):
$$x_t = \alpha + \beta \cos(x_{t-1}) + \varepsilon_t, \quad \{\varepsilon_t\}_{t\in\mathbb{Z}} \sim TID(\lambda).$$

(b) Logistic SESTAR:
$$x_t = g(x_{t-1}; \theta)\,x_{t-1} + \varepsilon_t, \quad \{\varepsilon_t\}_{t\in\mathbb{Z}} \sim NID(0, \sigma^2), \qquad g(x_{t-1}; \theta) := \frac{\beta}{1 + \exp(x_{t-1})}, \quad t \in \mathbb{Z}.$$

(c) Exponential SESTAR:
$$x_t = g(x_{t-1}; \theta)\,x_{t-1} + \varepsilon_t, \quad \{\varepsilon_t\}_{t\in\mathbb{Z}} \sim NID(0, 2), \qquad g(x_{t-1}; \theta) := \alpha + \frac{\beta}{1 + \exp\big(\delta + \gamma x_{t-1}^2\big)}, \quad t \in \mathbb{Z}.$$

(d) Gaussian observation-driven local-level model:
$$x_t = f_t + \varepsilon_t, \quad \{\varepsilon_t\} \sim NID(0, \sigma^2), \qquad f_t = \omega + \alpha(x_{t-1} - f_{t-1}) + \beta f_{t-1}.$$

(e) GARCH:
$$x_t = \sigma_t \varepsilon_t, \quad \{\varepsilon_t\}_{t\in\mathbb{Z}} \sim NID(0, 1), \qquad \sigma_t^2 = \omega + \alpha x_{t-1}^2 + \beta \sigma_{t-1}^2.$$

(f) Robust GARCH:
$$x_t = \sigma_t \varepsilon_t, \quad \{\varepsilon_t\} \sim TID(7), \qquad \sigma_t^2 = \omega + \alpha \tanh(x_{t-1}^2) + \beta \sigma_{t-1}^2.$$

(g) NGARCH:
$$x_t = \sigma_t \varepsilon_t, \quad \{\varepsilon_t\} \sim NID(0, 1), \qquad \sigma_t^2 = \omega + \alpha (x_{t-1} - \delta \sigma_{t-1})^2 + \beta \sigma_{t-1}^2.$$

(h) QGARCH:
$$x_t = \sigma_t \varepsilon_t, \quad \{\varepsilon_t\} \sim NID(0, 1), \qquad \sigma_t^2 = \omega + \alpha x_{t-1}^2 + \gamma x_{t-1} + \beta \sigma_{t-1}^2.$$
