Advanced Econometrics I
Chapter 6
Francisco Blasques
These lecture notes contain the material covered in the master course
Advanced Econometrics I. Further study material can be found in the
lecture slides and the many references cited throughout the text.
Contents
6 Asymptotic normality
6.1 Statistical inference
6.2 The classical asymptotic normality theorem
6.3 Approximate Statistical Inference
6.4 Asymptotic Normality: Well-Behaved Functions
6.4.1 Stochastic Equicontinuity
6.4.2 Misspecification and a CLT for $L_p$-Approximable Processes
6.4.3 Score Normality for Time-Varying Parameter Models
6.5 Exercises
6 Asymptotic normality
Some more reading material:
1. Newey and McFadden (1994), Large Sample Estimation and Hypothesis Testing. Chapter 36 of Handbook of Econometrics, Vol. 4. Sections 3 and 4.
2. Domowitz and White (1982), Misspecified Models with Dependent Observations. Journal of Econometrics, Vol. 20.
3. White (1996), Estimation, Inference and Specification Analysis. Chapters 6, 8 and 11.
4. van der Vaart (1998), Asymptotic Statistics. Chapters 5.3-5.6.
5. Pötscher and Prucha (1997), Dynamic Nonlinear Economic Models: Asymptotic Theory. Chapters 8, 10 and 11.
6. Davidson (1994), Stochastic Limit Theory. Chapters 15, 23 and 24.
6.1 Statistical inference
In order to conduct statistical inference on the parameters of our models, we need to know the distribution of the estimator $\hat\theta_T$. In some special cases this distribution is known exactly; for example, it may be normal,
$$\hat\theta_T \sim N\big(\theta_0\,,\,V/T\big).$$
In many cases, even if the distribution of the data is not known, we still know that the normal distribution above is approximately correct by applying a central limit theorem. Indeed, as we have seen in Chapter 2, we can use central limit theorems to derive an approximate distribution for estimators that are analytically tractable. There, we established the asymptotic normality of the least squares estimator in the linear regression $y_t = \beta x_t + \varepsilon_t$, which is given by
$$\hat\beta_T = \frac{\sum_{t=1}^T y_t x_t}{\sum_{t=1}^T x_t^2}\,,$$
as well as the ML estimator for the linear Gaussian AR(1) model $x_t = \alpha x_{t-1} + \varepsilon_t$, which is given by
$$\hat\alpha_T = \frac{\sum_{t=2}^T x_t x_{t-1}}{\sum_{t=2}^T x_{t-1}^2}\,.$$
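As a quick numerical check of these two closed-form estimators, here is a minimal sketch in Python (the true values 0.5 and 0.7 and the sample size are arbitrary choices for illustration, not taken from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000

# Linear regression y_t = beta * x_t + eps_t with beta = 0.5
x = rng.normal(size=T)
y = 0.5 * x + rng.normal(size=T)
beta_hat = np.sum(y * x) / np.sum(x ** 2)

# Gaussian AR(1) z_t = alpha * z_{t-1} + eps_t with alpha = 0.7
z = np.zeros(T)
for t in range(1, T):
    z[t] = 0.7 * z[t - 1] + rng.normal()
alpha_hat = np.sum(z[1:] * z[:-1]) / np.sum(z[:-1] ** 2)

print(beta_hat, alpha_hat)   # both close to the true values 0.5 and 0.7
```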
Both estimators are asymptotically normal in the sense that
$$\sqrt{T}\,(\hat\theta_T - \theta_0) \xrightarrow{d} N(0, V) \quad\text{as } T\to\infty,$$
where $V$ is called the asymptotic variance of $\hat\theta_T$. Note that an asymptotically normal estimator is such that
$$\sqrt{T}\,(\hat\theta_T - \theta_0) \overset{approx}{\sim} N(0, V),$$
which implies that
$$\hat\theta_T - \theta_0 \overset{approx}{\sim} N(0, V/T) \quad\text{and hence that}\quad \hat\theta_T \overset{approx}{\sim} N(\theta_0, V/T).$$
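For instance, with $V = 1$ and $T = 100$, the approximation says that $\hat\theta_T$ fluctuates around $\theta_0$ with standard deviation $\sqrt{V/T} = 0.1$; quadrupling the sample size to $T = 400$ halves this standard deviation to $0.05$.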
6.2 The classical asymptotic normality theorem

Consider the extremum estimator $\hat\theta_T$ defined as
$$\hat\theta_T \in \arg\max_{\theta\in\Theta} Q_T(x_T, \theta).$$
First, note that since $\hat\theta_T$ maximizes the criterion $Q_T$, it satisfies the first-order condition $\nabla Q_T(x_T, \hat\theta_T) = 0$.
Recall that the mean value theorem states that if $f : A \to \mathbb{R}$ is continuously differentiable on a convex set $A$, then for any two points $(a, b) \in A \times A$ there exists a point $c \in A$ between $a$ and $b$ such that $f(a) = f(b) + f'(c)(a - b)$, where $f'$ denotes the derivative of $f$. The mean value theorem is like a Taylor expansion without remainder term. In fact, it is sometimes called an exact Taylor expansion. The remainder term disappears because the derivative is evaluated at some point $c$ between $a$ and $b$ instead of at $b$ as in the Taylor expansion!
Second, note that we can use the mean value theorem to write $\nabla Q_T(x_T, \hat\theta_T)$ as
$$\nabla Q_T(x_T, \hat\theta_T) = \nabla Q_T(x_T, \theta_0) + \nabla^2 Q_T(x_T, \bar\theta_T)\,(\hat\theta_T - \theta_0) \qquad (1)$$
where $\nabla^2 Q_T(x_T, \bar\theta_T)$ denotes the second derivative of the criterion function $Q_T(x_T, \cdot) : \Theta \to \mathbb{R}$ with respect to the parameter $\theta$, evaluated at some point $\bar\theta_T$ between $\hat\theta_T$ and $\theta_0$,
$$\nabla^2 Q_T(x_T, \bar\theta_T) = \frac{\partial^2 Q_T(x_T, \theta)}{\partial\theta\,\partial\theta^\top}\bigg|_{\theta = \bar\theta_T}.$$
Finally, since $\nabla Q_T(x_T, \hat\theta_T) = 0$, the expression in equation (1) above can be written as
$$0 = \nabla Q_T(x_T, \theta_0) + \nabla^2 Q_T(x_T, \bar\theta_T)\,(\hat\theta_T - \theta_0)$$
and this implies naturally that
$$(\hat\theta_T - \theta_0) = -\big(\nabla^2 Q_T(x_T, \bar\theta_T)\big)^{-1}\,\nabla Q_T(x_T, \theta_0).$$
Multiplying both sides by $\sqrt{T}$ we obtain
$$\sqrt{T}\,(\hat\theta_T - \theta_0) = -\big(\nabla^2 Q_T(x_T, \bar\theta_T)\big)^{-1}\,\sqrt{T}\,\nabla Q_T(x_T, \theta_0).$$
This equality is useful because it immediately suggests that we can obtain the asymptotic normality of the estimator $\hat\theta_T$ as long as we can show that
$$\sqrt{T}\,\nabla Q_T(x_T, \theta_0) \xrightarrow{d} N(0, \Sigma) \quad\text{as } T\to\infty$$
and
$$\big(\nabla^2 Q_T(x_T, \bar\theta_T)\big)^{-1} \xrightarrow{p} \big(\nabla^2 Q_\infty(\theta_0)\big)^{-1} \quad\text{as } T\to\infty. \qquad (2)$$
Note that the second derivative of the criterion function $\nabla^2 Q_T(x_T, \bar\theta_T)$ is evaluated at some point $\bar\theta_T$ between $\hat\theta_T$ and $\theta_0$. If the estimator $\hat\theta_T$ is consistent for $\theta_0$, then we obtain the convergence in (2) as long as $\nabla^2 Q_T$ converges uniformly over $\Theta$ to $\nabla^2 Q_\infty$,
$$\sup_{\theta\in\Theta}\big\|\nabla^2 Q_T(x_T, \theta) - \nabla^2 Q_\infty(\theta)\big\| \xrightarrow{p} 0 \quad\text{as } T\to\infty.$$
Then
$$\big(\nabla^2 Q_T(x_T, \bar\theta_T)\big)^{-1} \xrightarrow{p} \big(\nabla^2 Q_\infty(\theta_0)\big)^{-1} \quad\text{as } T\to\infty.$$
Theorem 2 below states the classical asymptotic normality conditions for extremum estimators. This remarkable theorem is the end result of decades of work that dates back to Doob (1934). It is, by all accounts, an impressive and beautiful theorem!
Theorem 2 (Asymptotic normality) Let $\hat\theta_T$ be a consistent extremum estimator for a parameter $\theta_0$ that lies in the interior of a compact parameter space $\Theta$. Suppose further that:

1. The scaled criterion derivative is asymptotically normal at $\theta_0$,
$$\sqrt{T}\,\nabla Q_T(x_T, \theta_0) \xrightarrow{d} N(0, \Sigma) \quad\text{as } T\to\infty;$$
2. The second derivative converges uniformly, $\sup_{\theta\in\Theta}\|\nabla^2 Q_T(x_T, \theta) - \nabla^2 Q_\infty(\theta)\| \xrightarrow{p} 0$ as $T\to\infty$;
3. The limit $\nabla^2 Q_\infty(\theta_0)$ is invertible.

Then
$$\sqrt{T}\,(\hat\theta_T - \theta_0) \xrightarrow{d} N\big(0\,,\,A\Sigma A^\top\big) \quad\text{as } T\to\infty,$$
where $A = -\big(\nabla^2 Q_\infty(\theta_0)\big)^{-1}$.
The conditions of the theorem above are quite simple. Let us look at a few examples.
In particular, let us look at the usual suspects: the maximum likelihood and least
squares estimators.
Example: (Maximum likelihood estimator) The ML estimator is given by
$$\hat\theta_T \in \arg\max_{\theta\in\Theta} L_T(x_T, \theta).$$
By Theorem 2, $\hat\theta_T$ is asymptotically normal if it is consistent, if the score is asymptotically normal at $\theta_0$,
$$\sqrt{T}\,\nabla L_T(x_T, \theta_0) \xrightarrow{d} N(0, \Sigma) \quad\text{as } T\to\infty,$$
and if the second derivative of the log likelihood function converges uniformly to an invertible limit $\nabla^2 L_\infty$,
$$\sup_{\theta\in\Theta}\big\|\nabla^2 L_T(x_T, \theta) - \nabla^2 L_\infty(\theta)\big\| \xrightarrow{p} 0 \quad\text{as } T\to\infty.$$
Example: (Least squares estimator) Similarly, the LS estimator is asymptotically normal if it is consistent, if the scaled derivative of the least squares criterion is asymptotically normal at $\theta_0$,
$$\sqrt{T}\,\nabla U_T(x_T, \theta_0) \xrightarrow{d} N(0, \Sigma) \quad\text{as } T\to\infty,$$
and if the second derivative of the least squares criterion converges uniformly to an invertible limit $\nabla^2 U_\infty$,
$$\sup_{\theta\in\Theta}\big\|\nabla^2 U_T(x_T, \theta) - \nabla^2 U_\infty(\theta)\big\| \xrightarrow{p} 0 \quad\text{as } T\to\infty.$$
The assumption that $\theta_0$ lies in the interior of $\Theta$ means simply that $\theta_0$ cannot be a point on the boundary of $\Theta$. For example, if $\Theta = [1, 50]$, then $\theta_0$ can be any point between 1 and 50, but we cannot have $\theta_0 = 1$ or $\theta_0 = 50$, as these are boundary points.
The invertibility requirement (condition 3) in Theorem 2 ensures identification of the parameter $\theta_0$. In particular, when $\theta_0$ is well identified, the criterion $Q_\infty$ has strong curvature around $\theta_0$, and hence $\nabla^2 Q_\infty(\theta_0)$ is non-singular and invertible. If $Q_\infty$ is flat (has no curvature) at $\theta_0$, then $\nabla^2 Q_\infty(\theta_0)$ is singular and hence not invertible.
The uniform convergence of the second derivative (condition 2 in Theorem 2) can be obtained using the techniques introduced in Chapters 4 and 5. In particular, on a compact parameter space $\Theta$, the uniform convergence
$$\sup_{\theta\in\Theta}\big\|\nabla^2 Q_T(x_T, \theta) - \nabla^2 Q_\infty(\theta)\big\| \xrightarrow{p} 0 \quad\text{as } T\to\infty$$
follows from the pointwise convergence
$$\nabla^2 Q_T(x_T, \theta) \xrightarrow{p} \nabla^2 Q_\infty(\theta) \quad\text{as } T\to\infty \ \text{ for every } \theta\in\Theta,$$
combined with the stochastic equicontinuity of the second derivative.
Finally, condition 1 in Theorem 2 is also easy to verify for M-estimators or Z-estimators. In the case of an M-estimator with criterion
$$Q_T(x_T, \theta) = \frac{1}{T}\sum_{t=2}^T q(x_t, x_{t-1}, \theta),$$
the convergence in distribution of the scaled criterion derivative $\sqrt{T}\,\nabla Q_T(x_T, \theta_0)$ to a normal $N(0, \Sigma)$ distribution (condition 1 in Theorem 2) will typically be obtained by applying a central limit theorem to $\{\nabla q(x_t, x_{t-1}, \theta_0)\}$ since
$$\sqrt{T}\,\nabla Q_T(x_T, \theta_0) = \sqrt{T}\,\frac{1}{T}\sum_{t=2}^T \nabla q(x_t, x_{t-1}, \theta_0) = \sqrt{T}\left(\frac{1}{T}\sum_{t=2}^T \nabla q(x_t, x_{t-1}, \theta_0) - \mathbb{E}\,\nabla q(x_t, x_{t-1}, \theta_0)\right),$$
where the last equality holds because $\mathbb{E}\,\nabla q(x_t, x_{t-1}, \theta_0) = 0$.
If such a CLT applies, condition 1 holds, and under the remaining conditions of Theorem 2 we obtain
$$\sqrt{T}\,(\hat\theta_T - \theta_0) \xrightarrow{d} N\big(0\,,\,A\Sigma A^\top\big) \quad\text{as } T\to\infty.$$

Example: (ML estimator of a nonlinear AR(1) model) Let the observed data $\{x_t\}_{t=1}^T$ be a subset of an SE sequence $\{x_t\}_{t\in\mathbb{Z}}$ generated by
$$x_t = g(x_{t-1}, \theta_0) + \varepsilon_t\,, \quad t\in\mathbb{Z},$$
with iid innovations $\{\varepsilon_t\}$ with density $\varepsilon_t \sim f(\cdot)$. Suppose that the ML estimator given by
$$\hat\theta_T \in \arg\max_{\theta\in\Theta} \frac{1}{T}\sum_{t=2}^T \ell(x_t, x_{t-1}, \theta)\,, \quad\text{where } \ell(x_t, x_{t-1}, \theta) := \log f\big(x_t - g(x_{t-1}, \theta)\big),$$
is consistent for a parameter $\theta_0$ that lies in the interior of $\Theta$. To ensure the asymptotic normality of our ML estimator we just have to verify the conditions of Theorem 2 above. Suppose that $f$ is continuously differentiable. Then, since the score function $\nabla\ell$ is continuous, we have that
$$\big\{\nabla\ell(x_t, x_{t-1}, \theta_0)\big\}_{t\in\mathbb{Z}}$$
is also strictly stationary and ergodic by Krengel's theorem. As a result, if the second moment of the score sequence is bounded at $\theta_0$,
$$\mathbb{E}\big\|\nabla\ell(x_t, x_{t-1}, \theta_0)\big\|^2 < \infty,$$
then by the central limit theorem for SE sequences (see Section 4.3.2), we have²
$$\sqrt{T}\,\frac{1}{T}\sum_{t=2}^T \nabla\ell(x_t, x_{t-1}, \theta_0) \xrightarrow{d} N(0, \Sigma) \quad\text{as } T\to\infty.$$
Finally, if $\ell$ is three times continuously differentiable and the third derivative has a bounded first moment,
$$\mathbb{E}\sup_{\theta\in\Theta}\big\|\nabla^3\ell(x_t, x_{t-1}, \theta)\big\| < \infty,$$
then $\{\nabla^2\ell(x_t, x_{t-1}, \theta)\}_{t\in\mathbb{Z}}$ is stochastically equicontinuous on $\Theta$ and we obtain the uniform convergence of the second derivative,
$$\sup_{\theta\in\Theta}\Big\|\frac{1}{T}\sum_{t=2}^T \nabla^2\ell(x_t, x_{t-1}, \theta) - \mathbb{E}\,\nabla^2\ell(x_t, x_{t-1}, \theta)\Big\| \xrightarrow{p} 0 \quad\text{as } T\to\infty.$$

²Note that $\mathbb{E}\,\nabla\ell(x_t, x_{t-1}, \theta_0) = \nabla\,\mathbb{E}\,\ell(x_t, x_{t-1}, \theta_0) = 0$ since $\mathbb{E}\,\ell(x_t, x_{t-1}, \theta)$ is maximized at $\theta = \theta_0$.
All the conditions of Theorem 2 are thus verified and the estimator $\hat\theta_T$ is asymptotically normal.
Example: (Maximum likelihood estimator of the Gaussian AR(1) model) Let the observed data $\{x_t\}_{t=1}^T$ be a subset of an SE sequence $\{x_t\}_{t\in\mathbb{Z}}$ with bounded fourth moment $\mathbb{E}|x_t|^4 < \infty$. Consider the Gaussian AR(1) model with $N(0,1)$ innovations,
$$x_t = \alpha x_{t-1} + \varepsilon_t \quad\text{where}\quad \{\varepsilon_t\}_{t\in\mathbb{Z}} \sim NID(0, 1).$$
Since $x_t\,|\,x_{t-1} \sim N(\alpha x_{t-1}, 1)$, the log likelihood function is given by
$$L_T(x_T, \alpha) = \frac{1}{T}\sum_{t=2}^T \Big(-\frac{1}{2}\log 2\pi - \frac{1}{2}(x_t - \alpha x_{t-1})^2\Big).$$
Suppose that the ML estimator $\hat\alpha_T$ is consistent for the parameter $\alpha_0$ lying in the interior of a compact parameter space. Then it is easy to see that $\hat\alpha_T$ is also asymptotically normal. First, at $\alpha_0$, the score is given by
$$\nabla L_T(x_T, \alpha_0) = \frac{1}{T}\sum_{t=2}^T (x_t - \alpha_0 x_{t-1})\,x_{t-1}.$$
The score sequence $\{(x_t - \alpha_0 x_{t-1})\,x_{t-1}\}_{t\in\mathbb{Z}}$ is SE with two bounded moments, since
$$\mathbb{E}\big|(x_t - \alpha_0 x_{t-1})\,x_{t-1}\big|^2 < \infty \qquad (c_r\text{-inequality and } \mathbb{E}|x_t|^4 < \infty),$$
and we conclude by application of the central limit theorem for SE sequences that
$$\sqrt{T}\,\frac{1}{T}\sum_{t=2}^T (x_t - \alpha_0 x_{t-1})\,x_{t-1} \xrightarrow{d} N(0, \Sigma) \quad\text{as } T\to\infty.$$
Second, since $\nabla^2 L_T(x_T, \alpha) = -\frac{1}{T}\sum_{t=2}^T x_{t-1}^2$, the law of large numbers for SE sequences gives the pointwise convergence $\nabla^2 L_T(x_T, \alpha) \xrightarrow{p} -\mathbb{E}\,x_{t-1}^2$ as $T\to\infty$ for every $\alpha$.
Finally, since the third derivative of the criterion function with respect to $\alpha$ is just zero, the moment bound $\mathbb{E}|\nabla^3 L_T(x_T, \alpha)| < \infty$ holds trivially and we conclude that $\{\nabla^2 L_T(x_T, \alpha)\}$ is stochastically equicontinuous. This means naturally that $\nabla^2 L_T$ converges uniformly over the compact parameter space,
$$\sup_{\alpha}\Big|-\frac{1}{T}\sum_{t=2}^T x_{t-1}^2 + \mathbb{E}\,x_{t-1}^2\Big| \xrightarrow{p} 0 \quad\text{as } T\to\infty.$$
As a result, all the conditions of Theorem 2 are satisfied and we conclude that $\hat\alpha_T$ is asymptotically normal.
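A small Monte Carlo experiment makes this conclusion tangible. The sketch below (sample size, replication count and $\alpha_0 = 0.7$ are arbitrary choices) simulates the Gaussian AR(1) model and checks that $\sqrt{T}(\hat\alpha_T - \alpha_0)$ has approximately the variance $1 - \alpha_0^2$ that the asymptotic theory predicts for this model:

```python
import numpy as np

rng = np.random.default_rng(1)
T, R, alpha0 = 500, 2000, 0.7

stats = np.empty(R)
for r in range(R):
    x = np.zeros(T)
    for t in range(1, T):
        x[t] = alpha0 * x[t - 1] + rng.normal()
    alpha_hat = np.sum(x[1:] * x[:-1]) / np.sum(x[:-1] ** 2)
    stats[r] = np.sqrt(T) * (alpha_hat - alpha0)

# Asymptotic theory for this model predicts variance 1 - alpha0**2 = 0.51
print(np.var(stats))
```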
In the following example we use the property of well-behaved functions to obtain our results much more quickly!
Example: (Least-squares estimator of an NLAR(1) model) Let the observed data $\{x_t\}_{t=1}^T$ be a subset of an SE sequence $\{x_t\}_{t\in\mathbb{Z}}$ with bounded fourth moment $\mathbb{E}|x_t|^4 < \infty$. Consider the following NLAR(1) model
$$x_t = \alpha + \beta\tanh(x_{t-1}) + \varepsilon_t$$
with $\theta = (\alpha, \beta)$ and parameter space $\Theta = [-5, 5]\times[-2, 2]$. The criterion function of the LS estimator takes the form
$$Q_T(x_T, \theta) = \frac{1}{T}\sum_{t=2}^T q(x_t, x_{t-1}, \theta) \quad\text{with}\quad q(x_t, x_{t-1}, \theta) = -\big(x_t - \alpha - \beta\tanh(x_{t-1})\big)^2.$$
If the least squares estimator $\hat\theta_T$ is consistent for $\theta_0$ in the interior of $\Theta$, then asymptotic normality follows easily. In particular, since $q$ is three times continuously differentiable, all derivative processes
$$\{\nabla q(x_t, x_{t-1}, \theta)\}\,, \quad \{\nabla^2 q(x_t, x_{t-1}, \theta)\} \quad\text{and}\quad \{\nabla^3 q(x_t, x_{t-1}, \theta)\}$$
are SE.
Furthermore, since $q(x_t, x_{t-1}, \theta)$ is well behaved and has bounded moments of second order,
$$\mathbb{E}\big|q(x_t, x_{t-1}, \theta)\big|^2 = \mathbb{E}\big|x_t - \alpha - \beta\tanh(x_{t-1})\big|^4 < \infty \qquad (c_r\text{-inequality and } |\tanh(z)| < 1 \ \forall\, z),$$
we conclude immediately that the derivatives $\nabla q(x_t, x_{t-1}, \theta)$, $\nabla^2 q(x_t, x_{t-1}, \theta)$ and $\nabla^3 q(x_t, x_{t-1}, \theta)$ also have two bounded moments:
$$\mathbb{E}\|\nabla q(x_t, x_{t-1}, \theta)\|^2 < \infty\,, \quad \mathbb{E}\|\nabla^2 q(x_t, x_{t-1}, \theta)\|^2 < \infty \quad\text{and}\quad \mathbb{E}\|\nabla^3 q(x_t, x_{t-1}, \theta)\|^2 < \infty.$$
The bounded moment $\mathbb{E}\|\nabla q(x_t, x_{t-1}, \theta)\|^2 < \infty$ gives us the asymptotic normality of the score,
$$\sqrt{T}\,\frac{1}{T}\sum_{t=2}^T \nabla q(x_t, x_{t-1}, \theta_0) \xrightarrow{d} N(0, \Sigma) \quad\text{as } T\to\infty.$$
The bounded moment $\mathbb{E}\|\nabla^2 q(x_t, x_{t-1}, \theta)\| < \infty$ gives us a pointwise law of large numbers for the second derivative. The bounded moment $\mathbb{E}\|\nabla^3 q(x_t, x_{t-1}, \theta)\| < \infty$ gives us stochastic equicontinuity of $\{\nabla^2 q(x_t, x_{t-1}, \theta)\}$ and hence the uniform convergence
$$\sup_{\theta\in\Theta}\Big\|\frac{1}{T}\sum_{t=2}^T \nabla^2 q(x_t, x_{t-1}, \theta) - \mathbb{E}\,\nabla^2 q(x_t, x_{t-1}, \theta)\Big\| \xrightarrow{p} 0 \quad\text{as } T\to\infty.$$
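Since the NLAR(1) model is linear in $\theta = (\alpha, \beta)$ once $\tanh(x_{t-1})$ is treated as a regressor, the LS estimator can be computed by ordinary least squares. A minimal simulation sketch (the true values 0.3 and 0.8 are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
T, a0, b0 = 2000, 0.3, 0.8

x = np.zeros(T)
for t in range(1, T):
    x[t] = a0 + b0 * np.tanh(x[t - 1]) + rng.normal()

# LS: regress x_t on a constant and tanh(x_{t-1})
X = np.column_stack([np.ones(T - 1), np.tanh(x[:-1])])
theta_hat, *_ = np.linalg.lstsq(X, x[1:], rcond=None)
print(theta_hat)   # close to (0.3, 0.8)
```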
6.3 Approximate Statistical Inference
In the previous section we have learned that, under appropriate conditions, an extremum estimator $\hat\theta_T$ that is consistent is also asymptotically normal,
$$\sqrt{T}\,(\hat\theta_T - \theta_0) \xrightarrow{d} N\big(0\,,\,A\Sigma A^\top\big) \quad\text{as } T\to\infty.$$
From an inferential perspective, we use the asymptotic distribution as an approximate distribution for the estimator $\hat\theta_T$. It would certainly be better to know the exact distribution of the estimator, but in these complicated dynamic nonlinear settings it is simply impossible to derive. If we take the asymptotic distribution as an approximate distribution, we essentially have
$$\sqrt{T}\,(\hat\theta_T - \theta_0) \overset{approx}{\sim} N\big(0\,,\,A\Sigma A^\top\big),$$
which implies that
$$\hat\theta_T - \theta_0 \overset{approx}{\sim} N\Big(0\,,\,\frac{1}{T}A\Sigma A^\top\Big)$$
and hence that
$$\hat\theta_T \overset{approx}{\sim} N\Big(\theta_0\,,\,\frac{1}{T}A\Sigma A^\top\Big).$$
The above expression tells us that the extremum estimator is approximately centered at the true parameter $\theta_0$ and is approximately normally distributed with variance-covariance matrix $\frac{1}{T}A\Sigma A^\top$ that vanishes as $T\to\infty$. The vanishing variance is natural for a consistent estimator since we expect it to converge to $\theta_0$ asymptotically.
For simplicity, consider the case of a scalar parameter $\theta$. Then the asymptotic distribution of $\hat\theta_T$ is given by
$$\sqrt{T}\,(\hat\theta_T - \theta_0) \xrightarrow{d} N\big(0\,,\,a^2\sigma^2\big) \quad\text{as } T\to\infty,$$
where $a$ and $\sigma^2$ are just scalars. Recall that $\sigma^2$ is the asymptotic variance of $\sqrt{T}\,\nabla Q_T(\theta_0)$ and $a = -1/\nabla^2 Q_\infty(\theta_0)$. This means that the larger the variance $\sigma^2$ of the criterion derivative is, the larger the asymptotic variance $a^2\sigma^2$ of $\hat\theta_T$ becomes. On the other hand, the stronger the curvature of $Q_\infty$ at $\theta_0$, the smaller $a = -1/\nabla^2 Q_\infty(\theta_0)$ becomes, and hence the smaller becomes the asymptotic variance of $\hat\theta_T$. This happens because a log likelihood with strong curvature at $\theta_0$ is a log likelihood that identifies the parameter well. On the contrary, if the curvature of the limit criterion is weak, then the parameter $\theta_0$ is not well identified and the asymptotic variance of $\hat\theta_T$ becomes large. This approximate distribution of the estimator $\hat\theta_T$ allows us to finally conduct inference on parameters.
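For a concrete illustration of these quantities, consider iid data and the M-estimator with criterion $q(x_t, \theta) = -(x_t - \theta)^2$, whose maximizer is the sample mean $\hat\theta_T = \bar x_T$. Here $\nabla Q_T(\theta_0) = \frac{2}{T}\sum_t (x_t - \theta_0)$, so $\sigma^2 = \mathrm{Var}\big(\sqrt{T}\,\nabla Q_T(\theta_0)\big) = 4\,\mathrm{Var}(x_t)$, while $\nabla^2 Q_\infty(\theta_0) = -2$ gives $a = -1/\nabla^2 Q_\infty(\theta_0) = 1/2$. The asymptotic variance is therefore $a^2\sigma^2 = \mathrm{Var}(x_t)$, and we recover the familiar CLT for the sample mean, $\sqrt{T}\,(\bar x_T - \theta_0) \xrightarrow{d} N\big(0, \mathrm{Var}(x_t)\big)$.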
Example: (Logistic SESTAR model) Consider the following estimation results for a logistic SESTAR model
$$x_t = g(x_{t-1}; \theta)\,x_{t-1} + \varepsilon_t\,, \quad t\in\mathbb{Z}, \qquad g(x_{t-1}; \theta) := \alpha + \frac{\beta}{1 + \exp(x_{t-1})}\,.$$

Parameter    Estimate    Std Error
α            0.864       0.132
β            0.127       0.045
Example: (GARCH model) Consider next the following estimation results for a GARCH(1,1) model $x_t = \sigma_t\varepsilon_t$ with $\sigma_t^2 = \omega + \alpha x_{t-1}^2 + \beta\sigma_{t-1}^2$.

Parameter    Estimate    Std Error
ω            0.023       0.001
α            0.097       0.057
β            0.872       0.132
We can test for the existence of a time-varying volatility $\{\sigma_t^2\}$ driven by lagged returns $x_{t-1}^2$ by testing if $\alpha = 0$. Indeed, under the null hypothesis $H_0 : \alpha = 0$, the volatility simply converges to the constant $\omega/(1-\beta)$ and does not respond to changes in $x_{t-1}^2$. Under this null hypothesis, the estimator $\hat\alpha_T$ has an approximate normal distribution given by $\hat\alpha_T \overset{approx}{\sim} N(0, 0.057^2)$. The probability of obtaining an estimate of $\hat\alpha_T = 0.097$ can then be evaluated under this approximate null distribution.
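In code, the corresponding two-sided z-test is a one-liner; the sketch below uses only the estimate and standard error reported in the table above:

```python
from math import erf, sqrt

alpha_hat, se = 0.097, 0.057                 # estimate and std error from the table
z = alpha_hat / se                            # z-statistic under H0: alpha = 0
p = 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))    # two-sided p-value
print(z, p)   # z is about 1.70, p about 0.089: not rejected at the 5% level
```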
When the model is well specified, the score sequence $\{\nabla q(x_t, x_{t-1}, \theta_0)\}$ is uncorrelated and hence we can estimate $\Sigma$ using the outer product of scores,
$$\hat\Sigma_T = \frac{1}{T}\sum_{t=2}^T \nabla q(x_t, x_{t-1}, \hat\theta_T)\,\nabla q(x_t, x_{t-1}, \hat\theta_T)^\top,$$
and estimate $A$ using the sample second derivative,
$$\hat A_T = -\Big(\frac{1}{T}\sum_{t=2}^T \nabla^2 q(x_t, x_{t-1}, \hat\theta_T)\Big)^{-1}.$$
As a result, we obtain a final estimated distribution for the estimator given by
$$\hat\theta_T \overset{approx}{\sim} N\big(\theta_0\,,\,\hat A_T\hat\Sigma_T\hat A_T^\top/T\big).$$
Please read Section 7.4 for further information.
Note: Software packages typically give estimates of the asymptotic variance that are
based on the assumption of correct specification! As a serious econometrician, you
surely recognize the great limitations of this approach!
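To make the distinction concrete, here is a minimal sketch of the misspecification-robust (sandwich) variance for the Gaussian AR(1) ML estimator of this chapter; the function name and the assumption that the data arrive as a one-dimensional numpy array are mine, not the text's:

```python
import numpy as np

def ar1_sandwich_se(x):
    """Sandwich (robust) standard error for the AR(1) estimator alpha_hat."""
    T = len(x)
    alpha_hat = np.sum(x[1:] * x[:-1]) / np.sum(x[:-1] ** 2)
    score = (x[1:] - alpha_hat * x[:-1]) * x[:-1]   # per-observation score
    Sigma_hat = np.mean(score ** 2)                 # outer product of scores
    A_hat = 1.0 / np.mean(x[:-1] ** 2)              # minus inverse second derivative
    avar = A_hat * Sigma_hat * A_hat                # sandwich A Sigma A
    return alpha_hat, np.sqrt(avar / T)
```

Under correct specification the sandwich collapses to the usual hessian-based variance; under misspecification the two can differ, which is exactly the limitation flagged in the note above.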
6.4 Asymptotic Normality: Well-Behaved Functions

6.4.1 Stochastic Equicontinuity
In Chapter 5 we have seen that the property of stochastic equicontinuity can be easily established when functions are well behaved. In particular, we noted that the uniform moment bound on the derivative
$$\mathbb{E}\sup_{\theta\in\Theta}\big\|\nabla q(x_t, x_{t-1}, \theta)\big\| < \infty$$
is implied by a moment bound on the well-behaved function of order 1 (see Chapter 4),
$$\mathbb{E}\big|q(x_t, x_{t-1}, \theta)\big| < \infty \quad\text{for some } \theta\in\Theta.$$
We further noted that, for time-varying parameter models, the uniform moment bound on the derivative
$$\mathbb{E}\sup_{\theta\in\Theta}\big\|\nabla q\big(x_t, f_t(\theta, \bar f_1), \theta\big)\big\| < \infty$$
holds when both the criterion function $q$ and the filtering function are well behaved of second order in their arguments. These results simplified considerably the proofs of consistency, since stochastic equicontinuity is a crucial ingredient for ensuring the uniform convergence of criterion functions. As we shall now see, asymptotic normality proofs are also easier for well-behaved criterion functions. In particular, we can use bounded moments of $q(x_t, x_{t-1}, \theta)$ to establish:
1. Bounded moments for $\nabla q(x_t, x_{t-1}, \theta)$, which are typically needed to apply a central limit theorem and obtain the asymptotic normality of the criterion's first derivative.

2. Bounded moments for $\nabla^2 q(x_t, x_{t-1}, \theta)$, which are important to establish a law of large numbers for the pointwise convergence of the criterion's second derivative.

3. Bounded moments for $\nabla^3 q(x_t, x_{t-1}, \theta)$, which play a role in establishing the stochastic equicontinuity of the criterion's second derivative and hence its uniform convergence.
Theorem 4 (Simple Moments for Asymptotic Normality) Let $q(x_t, x_{t-1}, \theta)$ be three times continuously differentiable and well behaved of order 2, with two derivatives $\nabla q$ and $\nabla^2 q$ that are well behaved of order 1 in $\theta$. Then having
$$\mathbb{E}\big|q(x_t, x_{t-1}, \theta)\big|^2 < \infty \quad\text{for some } \theta\in\Theta$$
implies that
$$\mathbb{E}\big\|\nabla q(x_t, x_{t-1}, \theta_0)\big\|^2 < \infty \quad\text{and}\quad \mathbb{E}\sup_{\theta\in\Theta}\big\|\nabla^i q(x_t, x_{t-1}, \theta)\big\| < \infty \ \text{ for } i = 2, 3.$$
The result above follows naturally by noting that if $q(x_t, x_{t-1}, \theta)$ and its derivatives are well behaved, then the moments are transferred to higher-order derivatives uniformly in $\theta$. In essence, the two bounded moments stretch to every derivative uniformly on the parameter space.
Theorem 5 (Simple Moments for Asymptotic Normality in Time-Varying Parameter Models) Let the function $q(x_t, f_t(\theta, \bar f_1), \theta)$ be three times continuously differentiable and well behaved of order 4 in both $f_t(\theta, \bar f_1)$ and $\theta$, with two derivatives $\nabla q$ and $\nabla^2 q$ that are well behaved of order 2 in both $f_t(\theta, \bar f_1)$ and $\theta$. Furthermore, let $f_t(\theta, \bar f_1)$ be a time-varying parameter with updating equation
$$f_{t+1} = \phi(f_t, x_t, \theta),$$
where $\phi$ is three times continuously differentiable and well behaved of order 4 in both $f_t(\theta, \bar f_1)$ and $\theta$, and satisfies the following conditions for some $\theta\in\Theta$:
$$\mathbb{E}\big|f_t(\theta)\big|^4 < \infty\,, \quad \mathbb{E}\big|\phi(\bar f_1, x_t, \theta)\big|^4 < \infty \quad\text{and}\quad \mathbb{E}\sup_{f}\Big|\frac{\partial\phi(f, x_t, \theta)}{\partial f}\Big|^4 < 1.$$
Then having
$$\mathbb{E}\big|q(x_t, f_t(\theta, \bar f_1), \theta)\big|^4 < \infty \quad\text{for some } \theta\in\Theta$$
implies that
$$\mathbb{E}\big\|\nabla q(x_t, f_t(\theta_0, \bar f_1), \theta_0)\big\|^2 < \infty \quad\text{and}\quad \mathbb{E}\sup_{\theta\in\Theta}\big\|\nabla^i q(x_t, f_t(\theta, \bar f_1), \theta)\big\| < \infty \ \text{ for } i = 2, 3.$$
Note that the moment bound of Theorem 5 is more restrictive than the moment bound of Theorem 4. In particular, Theorem 4 requires only the second moment of $q(x_t, x_{t-1}, \theta)$ to be bounded, whereas Theorem 5 requires a bounded fourth moment.
6.4.2 Misspecification and a CLT for $L_p$-Approximable Processes
The usefulness of well-behaved functions does not end with the easy moment conditions required for stochastic equicontinuity. One further simplification can be achieved with well-behaved functions: a CLT is always available for the criterion derivative, even if the model is misspecified!
As noted before, the classical theorem of asymptotic normality for M-estimators makes use of the asymptotic normality of the criterion derivative,
$$\sqrt{T}\,\frac{1}{T}\sum_{t=2}^T \nabla q(x_t, x_{t-1}, \theta_0) \xrightarrow{d} N(0, \Sigma) \quad\text{as } T\to\infty.$$
When the model is well specified, $\{\nabla q(x_t, x_{t-1}, \theta_0)\}$ is a martingale difference sequence (essentially an uncorrelated sequence with mean zero). As a result, if it can be shown that $\{\nabla q(x_t, x_{t-1}, \theta_0)\}$ is also SE, then we can apply the central limit theorem for SE martingales stated in Chapter 4.

Unfortunately, when the model is misspecified, $\{\nabla q(x_t, x_{t-1}, \theta_0)\}$ typically fails to be a martingale difference sequence. This prevents us from applying the central limit theorem for SE martingales. In general, it might even prevent us from applying any central limit theorem. Fortunately, if $q(x_t, x_{t-1}, \theta_0)$ is well behaved with a well-behaved derivative, then $\{\nabla q(x_t, x_{t-1}, \theta_0)\}$ is $L_p$-approximable by a mixingale. You do not have to know exactly what this means (feel free to read Pötscher and Prucha (1997) for details). It is sufficient to know that if $\{\nabla q(x_t, x_{t-1}, \theta_0)\}$ is $L_p$-approximable by a mixingale, then the process can exhibit some temporal dependence, as long as this dependence vanishes sufficiently fast.
Theorem 6 Let $\hat\theta_T$ be an M-estimator defined as
$$\hat\theta_T \in \arg\max_{\theta\in\Theta} \frac{1}{T}\sum_{t=2}^T q(x_t, x_{t-1}, \theta).$$
6.4.3 Score Normality for Time-Varying Parameter Models

We end this chapter with one last advantage of well-behaved functions: namely, that well-behaved functions can help us derive asymptotic normality results for time-varying parameter models.
The problem we face is simple: in time-varying parameter models we cannot apply a CLT to the criterion derivative because it depends on the filtered parameter, which is not SE. Indeed, in time-varying parameter models, the first derivative of the criterion function of an M-estimator takes the form
$$\nabla Q_T(x_T, \theta) = \frac{1}{T}\sum_{t=2}^T \nabla q\big(x_t, f_t(\theta, \bar f_1), \theta\big).$$
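The initialization issue is easy to visualize. A minimal sketch with a GARCH-type updating function (all parameter values are arbitrary choices) shows that two filtered paths started from different initializations $\bar f_1$ merge at a geometric rate, which is precisely the property exploited in the proof below:

```python
import numpy as np

rng = np.random.default_rng(3)
T, omega, alpha, beta = 200, 0.1, 0.1, 0.8

x = rng.normal(size=T)            # placeholder data feeding the recursion
f_a, f_b = np.empty(T), np.empty(T)
f_a[0], f_b[0] = 0.5, 5.0         # two different initializations f_bar_1

for t in range(T - 1):
    f_a[t + 1] = omega + alpha * x[t] ** 2 + beta * f_a[t]
    f_b[t + 1] = omega + alpha * x[t] ** 2 + beta * f_b[t]

# The gap decays like beta**t, so the initialization washes out and the
# filtered sequence behaves asymptotically like the SE limit filter.
print(np.abs(f_a - f_b)[[0, 50, 100, 150]])
```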
The proof of Theorem 8 is quite simple and intuitive. First, add and subtract the criterion derivative evaluated at the limit filter $f_t(\theta)$, and note that
$$\sqrt{T}\,\frac{1}{T}\sum_{t=2}^T \nabla q\big(x_t, f_t(\theta, \bar f_1), \theta\big) = \sqrt{T}\,\frac{1}{T}\sum_{t=2}^T \Big[\nabla q\big(x_t, f_t(\theta, \bar f_1), \theta\big) - \nabla q\big(x_t, f_t(\theta), \theta\big) + \nabla q\big(x_t, f_t(\theta), \theta\big)\Big]$$
$$= \frac{1}{\sqrt{T}}\sum_{t=2}^T \Big[\nabla q\big(x_t, f_t(\theta, \bar f_1), \theta\big) - \nabla q\big(x_t, f_t(\theta), \theta\big)\Big] + \sqrt{T}\,\Big[\frac{1}{T}\sum_{t=2}^T \nabla q\big(x_t, f_t(\theta), \theta\big) - \mathbb{E}\,\nabla q\big(x_t, f_t(\theta), \theta\big)\Big],$$
where the last equality uses the fact that the expected score is zero at the true parameter.
The desired result is then obtained by showing that the first term (the filtering error) vanishes asymptotically,
$$\frac{1}{\sqrt{T}}\sum_{t=2}^T \Big[\nabla q\big(x_t, f_t(\theta, \bar f_1), \theta\big) - \nabla q\big(x_t, f_t(\theta), \theta\big)\Big] \xrightarrow{p} 0 \quad\text{as } T\to\infty,$$
and applying a CLT for SE sequences to the second term, which depends only on the SE limit filter $\{f_t(\theta)\}$.
6.5 Exercises
1. Let the sample of data $\{x_t\}_{t=1}^T$ be a subset of an SE time series $\{x_t\}_{t\in\mathbb{Z}}$ with four bounded moments, $\mathbb{E}|x_t|^4 < \infty$. Suppose the least squares estimator $\hat\theta_T$ is consistent for some $\theta_0$ in the interior of $\Theta$. Give sufficient conditions for the asymptotic normality of $\hat\theta_T$ in the following regressions:
(a) Logistic SESTAR:
$$x_t = g(x_{t-1}; \theta)\,x_{t-1} + \varepsilon_t\,, \quad \{\varepsilon_t\}_{t\in\mathbb{Z}} \sim TID(7)\,, \qquad g(x_{t-1}; \theta) := \frac{\beta}{1 + \exp(\lambda x_{t-1})}\,, \quad t\in\mathbb{Z}.$$

(b) Logistic SESTAR:
$$x_t = g(x_{t-1}; \theta)\,x_{t-1} + \varepsilon_t\,, \quad \{\varepsilon_t\}_{t\in\mathbb{Z}} \sim TID(\nu)\,, \qquad g(x_{t-1}; \theta) := \alpha + \frac{\beta}{1 + \exp(\lambda + \delta x_{t-1}^2)}\,, \quad t\in\mathbb{Z}.$$
2. Let the sample of data $\{x_t\}_{t=1}^T$ be a subset of an SE time series $\{x_t\}_{t\in\mathbb{Z}}$ with 10 bounded moments, $\mathbb{E}|x_t|^{10} < \infty$. Suppose the maximum likelihood estimator $\hat\theta_T$ is consistent for some $\theta_0$ in the interior of $\Theta$. Give sufficient conditions for the asymptotic normality of $\hat\theta_T$ in the following models:

(a) Logistic SESTAR:
$$x_t = g(x_{t-1}; \theta)\,x_{t-1} + \varepsilon_t\,, \quad \{\varepsilon_t\}_{t\in\mathbb{Z}} \sim NID(0, \sigma^2)\,, \qquad g(x_{t-1}; \theta) := \frac{\beta}{1 + \exp(\lambda x_{t-1})}\,, \quad t\in\mathbb{Z}.$$

(b) Logistic SESTAR:
$$x_t = g(x_{t-1}; \theta)\,x_{t-1} + \varepsilon_t\,, \quad \{\varepsilon_t\}_{t\in\mathbb{Z}} \sim NID(0, \sigma^2)\,, \qquad g(x_{t-1}; \theta) := \alpha + \frac{\beta}{1 + \exp(\lambda + \delta x_{t-1}^2)}\,, \quad t\in\mathbb{Z}.$$
(d) Time-varying mean:
$$x_t = \mu_t + \varepsilon_t\,, \quad \{\varepsilon_t\} \sim NID(0, \sigma^2)\,, \qquad \mu_t = \omega + \alpha(x_{t-1} - \mu_{t-1}) + \mu_{t-1}\,.$$
(e) GARCH:
$$x_t = \sigma_t\varepsilon_t\,, \quad \{\varepsilon_t\}_{t\in\mathbb{Z}} \sim NID(0, 1)\,, \qquad \sigma_t^2 = \omega + \alpha x_{t-1}^2 + \beta\sigma_{t-1}^2\,.$$
(f)
$$x_t = \sigma_t\varepsilon_t\,, \quad \{\varepsilon_t\} \sim TID(7)\,, \qquad \sigma_t^2 = \omega + \alpha\tanh(x_{t-1}^2) + \beta\sigma_{t-1}^2\,.$$
(g) NGARCH:
$$x_t = \sigma_t\varepsilon_t\,, \quad \{\varepsilon_t\} \sim NID(0, 1)\,, \qquad \sigma_t^2 = \omega + \alpha(x_{t-1} - \delta\sigma_{t-1})^2 + \beta\sigma_{t-1}^2\,.$$
(h) QGARCH:
$$x_t = \sigma_t\varepsilon_t\,, \quad \{\varepsilon_t\} \sim NID(0, 1)\,, \qquad \sigma_t^2 = \omega + \alpha x_{t-1}^2 + \delta x_{t-1} + \beta\sigma_{t-1}^2\,.$$