You are on page 1of 44

Nov. 22, 2003, revised Dec.

27, 2003

Hayashi Econometrics

Solution to Chapter 1 Analytical Exercises


1. (Reproducing the answer on p. 84 of the book) (yX ) (y X ) = [(y Xb) + X(b )] [(y Xb) + X(b )] (by the add-and-subtract strategy) = [(y Xb) + (b ) X ][(y Xb) + X(b )] = (y Xb) (y Xb) + (b ) X (y Xb) + (y Xb) X(b ) + (b ) X X(b ) = (y Xb) (y Xb) + 2(b ) X (y Xb) + (b ) X X(b ) (since (b ) X (y Xb) = (y Xb) X(b )) = (y Xb) (y Xb) + (b ) X X(b ) (since X (y Xb) = 0 by the normal equations) (y Xb) (y Xb)
n

(since (b ) X X(b ) = z z =
i=1

2 zi 0 where z X(b )).

2. (a), (b). If X is an n K matrix of full column rank, then X X is symmetric and invertible. It is very straightforward to show (and indeed youve been asked to show in the text) that MX In X(X X)1 X is symmetric and idempotent and that MX X = 0. In this question, set X = 1 (vector of ones). (c) M1 y = [In 1(1 1)1 1 ]y 1 = y 11 y (since 1 1 = n) n n 1 =y 1 y i = y 1 y n i=1 (d) Replace y by X in (c). 3. Special case of the solution to the next exercise. 4. From the normal equations (1.2.3) of the text, we obtain (a) X1 X2 . [X1 . . X2 ] b1 b2 = X1 X2 y.

Using the rules of multiplication of partitioned matrices, it is straightforward to derive () and () from the above. 1

(b) By premultiplying both sides of () in the question by X1 (X1 X1 )1 , we obtain X1 (X1 X1 )1 X1 X1 b1 = X1 (X1 X1 )1 X1 X2 b2 + X1 (X1 X1 )1 X1 y X1 b1 = P1 X2 b2 + P1 y Substitution of this into () yields X2 (P1 X2 b2 + P1 y) + X2 X2 b2 = X2 y X2 (I P1 )X2 b2 = X2 (I P1 )y X2 M1 X2 b2 = X2 M1 y X2 M1 M1 X2 b2 = X2 M1 M1 y Therefore, b2 = (X2 X2 )1 X2 y (The matrix X2 X2 is invertible because X2 is of full column rank. To see that X2 is of full column rank, suppose not. Then there exists a non-zero vector c such that X2 c = 0. But d X2 c = X2 c X1 d where d (X1 X1 )1 X1 X2 c. That is, X = 0 for . This is c . a contradiction because X = [X1 . . X2 ] is of full column rank and = 0.) (c) By premultiplying both sides of y = X1 b1 + X2 b2 + e by M1 , we obtain M1 y = M1 X1 b1 + M1 X2 b2 + M1 e. Since M1 X1 = 0 and y M1 y, the above equation can be rewritten as y = M1 X2 b2 + M1 e = X2 b2 + M1 e. M1 e = e because M1 e = (I P1 )e = e P1 e = e X1 (X1 X1 )1 X1 e =e (d) From (b), we have b2 = (X2 X2 )1 X2 y = (X2 X2 )1 X2 M1 M1 y = (X2 X2 )1 X2 y. Therefore, b2 is the OLS coecient estimator for the regression y on X2 . The residual vector from the regression is y X2 b2 = (y y) + (y X2 b2 ) = (y M1 y) + (y X2 b2 ) = (y M1 y) + e (by (c)) = P 1 y + e. 2 (since X1 e = 0 by normal equations). X2 X2 b2 = X2 y. (since M1 is symmetric & idempotent)

This does not equal e because P1 y is not necessarily zero. The SSR from the regression of y on X2 can be written as (y X2 b2 ) (y X2 b2 ) = (P1 y + e) (P1 y + e) = ( P1 y ) ( P1 y ) + e e This does not equal e e if P1 y is not zero. (e) From (c), y = X2 b2 + e. So y y = (X2 b2 + e) (X2 b2 + e) = b2 X2 X2 b2 + e e (since X2 e = 0). (since P1 e = X1 (X1 X1 )1 X1 e = 0).

Since b2 = (X2 X2 )1 X2 y, we have b2 X2 X2 b2 = y X2 (X2 M1 X2 )1 X2 y. (f) (i) Let b1 be the OLS coecient estimator for the regression of y on X1 . Then b1 = (X1 X1 )1 X1 y = (X1 X1 )1 X1 M1 y = (X1 X1 )1 (M1 X1 ) y =0 (since M1 X1 = 0).

So SSR1 = (y X1 b1 ) (y X1 b1 ) = y y. (ii) Since the residual vector from the regression of y on X2 equals e by (c), SSR2 = e e. (iii) From the Frisch-Waugh Theorem, the residuals from the regression of y on X1 and X2 equal those from the regression of M1 y (= y) on M1 X2 (= X2 ). So SSR3 = e e. 5. (a) The hint is as good as the answer. (b) Let y X , the residuals from the restricted regression. By using the add-and-subtract strategy, we obtain y X = (y Xb) + X(b ). So SSRR = [(y Xb) + X(b )] [(y Xb) + X(b )] = (y Xb) (y Xb) + (b ) X X(b ) But SSRU = (y Xb) (y Xb), so SSRR SSRU = (b ) X X(b ) = (Rb r) [R(X X)1 R ]1 (Rb r) = R(X X) = X(X X) = P. (c) The F -ratio is dened as F (Rb r) [R(X X)1 R ]1 (Rb r)/r s2 3 (where r = #r) (1.4.9)
1

(since X (y Xb) = 0).

(using the expresion for from (a))

(using the expresion for from (a)) (by the rst order conditions that X (y X ) = R )

Since (Rb r) [R(X X)1 R ]1 (Rb r) = SSRR SSRU as shown above, the F -ratio can be rewritten as F = (SSRR SSRU )/r s2 (SSRR SSRU )/r = e e/(n K ) (SSRR SSRU )/r = SSRU /(n K )

Therefore, (1.4.9)=(1.4.11). 6. (a) Unrestricted model: y = X + , where y1 1 x12 . . . . y = . X = . . , . .


(N 1)

yn

(N K )

xn2

. . . x1K . .. . . . , . . . xnK

1 . = . . . (K 1) n

Restricted model: y = X + , 0 0 R = . . ((K 1)K ) .

R = r, where 1 0 ... 0 0 1 ... 0 , . .. . . . 0 0 1

0 . r = . . . ((K 1)1) 0

Obviously, the restricted OLS estimator of is y y 0 y = . . So X = . . . . (K 1) . 0 y

= 1 y.

(You can use the formula for the unrestricted OLS derived in the previous exercise, = b (X X)1 R [R(X X)1 R ]1 (Rb r), to verify this.) If SSRU and SSRR are the minimized sums of squared residuals from the unrestricted and restricted models, they are calculated as
n

SSRR = (y X ) (y X ) =
i=1

(yi y )2
n

SSRU = (y Xb) (y Xb) = e e =


i=1

e2 i

Therefore,
n n

SSRR SSRU =
i=1

(yi y )2
i=1

e2 i.

(A)

On the other hand, (b ) (X X)(b ) = (Xb X ) (Xb X )


n

=
i=1

(yi y )2 .

Since SSRR SSRU = (b ) (X X)(b ) (as shown in Exercise 5(b)),


n n n

(yi y )2
i=1 i=1

e2 i =
i=1

(yi y )2 .

(B)

(b) F = (SSRR SSRU )/(K 1) n 2 i=1 ei /(n K ) (


n i=1 (yi n

(by Exercise 5(c))

y )2 i=1 e2 i )/(K 1) n 2 /(n K ) e i=1 i 1)

(by equation (A) above)

P (y P b y) /y()K1) P e(y/( nK ) P (y y)
n i=1 i n i=1 2 i 2 n 2 i=1 i n i i=1 2

n 2 i=1 (yi y ) /(K n 2 i=1 ei /(n K )

(by equation (B) above)


n

(by dividing both numerator & denominator by


i=1

(yi y )2 )

R2 /(K 1) (1 R2 )/(n K )

(by the denition or R2 ).

7. (Reproducing the answer on pp. 84-85 of the book) (a) GLS = A where A (X V1 X)1 X V1 and b GLS = B where B (X X)1 X (X V1 X)1 X V1 . So Cov( GLS , b GLS ) = Cov(A, B) = A Var()B = 2 AVB . It is straightforward to show that AVB = 0. (b) For the choice of H indicated in the hint,
1 Var( ) Var( GLS ) = CVq C.

If C = 0, then there exists a nonzero vector z such that C z v = 0. For such z,


1 z [Var( ) Var( GLS )]z = v Vq v<0

(since Vq is positive denite),

which is a contradiction because GLS is ecient.

Nov. 25, 2003, Revised February 23, 2010

Hayashi Econometrics

Solution to Chapter 2 Analytical Exercises


1. For any > 0, Prob(|zn | > ) = So, plim zn = 0. On the other hand, E(zn ) = which means that limn E(zn ) = . 2. As shown in the hint, (z n )2 = (z n E(z n ))2 + 2(z n E(z n ))(E(z n ) ) + (E(z n ) )2 . Take the expectation of both sides to obtain E[(z n )2 ] = E[(z n E(z n ))2 ] + 2 E[z n E(z n )](E(z n ) ) + (E(z n ) )2 = Var(z n ) + (E(z n ) )2 (because E[z n E(z n )] = E(z n ) E(z n ) = 0). 1 n1 0 + n2 = n, n n 1 0 as n . n

Take the limit as n of both sides to obtain


n

lim E[(z n )2 ] = lim Var(z n ) + lim (E(z n ) )2


n n

=0

(because lim E(z n ) = , lim Var(z n ) = 0).


n n

Therefore, zn m.s. . By Lemma 2.2(a), this implies zn p . 3. (a) Since an i.i.d. process is ergodic stationary, Assumption 2.2 is implied by Assumption 2.2 . Assumptions 2.1 and 2.2 imply that gi xi i is i.i.d. Since an i.i.d. process with mean zero is mds (martingale dierences), Assumption 2.5 is implied by Assumptions 2.2 and 2.5 . (b) Rewrite the OLS estimator as
1 b = (X X)1 X = S xx g.

(A)

Since by Assumption 2.2 {xi } is i.i.d., {xi xi } is i.i.d. So by Kolmogorovs Second Strong LLN, we obtain Sxx xx
p

The convergence is actually almost surely, but almost sure convergence implies convergence in probability. Since xx is invertible by Assumption 2.4, by Lemma 2.3(a) we get
1 1 S xx xx . p

Similarly, under Assumption 2.1 and 2.2 {gi } is i.i.d. By Kolmogorovs Second Strong LLN, we obtain g E(gi ),
p

which is zero by Assumption 2.3. So by Lemma 2.3(a),


1 1 S xx g xx 0 = 0. p

Therefore, plimn (b ) = 0 which implies that the OLS estimator b is consistent. Next, we prove that the OLS estimator b is asymptotically normal. Rewrite equation(A) above as 1 n(b ) = S ng . xx As already observed, {gi } is i.i.d. with E(gi ) = 0. The variance of gi equals E(gi gi ) = S since E(gi ) = 0 by Assumption 2.3. So by the Lindeberg-Levy CLT, ng N (0, S).
d 1 1 Furthermore, as already noted, S xx p xx . Thus by Lemma 2.4(c), 1 1 n(b ) N (0, xx S xx ). d

4. The hint is as good as the answer. 5. As shown in the solution to Chapter 1 Analytical Exercise 5, SSRR SSRU can be written as SSRR SSRU = (Rb r) [R(X X)1 R ]1 (Rb r). Using the restrictions of the null hypothesis, Rb r = R(b ) = R(X X)1 X
1 = RS xx g

(since b = (X X)1 X ) 1 n
n

(where g

xi i .).
i=1

1 1 Also [R(X X)1 R]1 = n [RS . So xx R] 1 1 1 1 R S SSRR SSRU = ( n g) S xx ( n g). xx R (R Sxx R )

Thus SSRR SSRU 1 2 1 1 1 = ( n g) S R S xx R (s R Sxx R ) xx ( n g) 2 s 1 = zn A n zn , where 1 2 1 zn R S xx ( n g), An s R Sxx R .

By Assumption 2.2, plim Sxx = xx . By Assumption 2.5, Lemma 2.4(c), we have: 1 1 zn N (0, R xx Sxx R ).
d

ng d N (0, S). So by

But, as shown in (2.6.4), S = 2 xx under conditional homoekedasticity (Assumption 2.7). So the expression for the variance of the limiting distribution above becomes
1 1 1 2 R xx Sxx R = Rxx R A.

Thus we have shown: zn z, z N (0, A).


d

As already observed, Sxx p xx . By Assumption 2.7, 2 = E(2 i ). So by Proposition 2.2, s2 p 2 . Thus by Lemma 2.3(a) (the Continuous Mapping Theorem), An p A. Therefore, by Lemma 2.4(d), 1 1 zn A z. n zn z A
d

But since Var(z) = A, the distribution of z A

z is chi-squared with #z degrees of freedom.

6. For simplicity, we assumed in Section 2.8 that {yi , xi } is i.i.d. Collecting all the assumptions made in Section 2.8, (i) (linearity) yi = xi + i . (ii) (random sample) {yi , xi } is i.i.d. (iii) (rank condition) E(xi xi ) is non-singular. (iv) E(2 i xi xi ) is non-singular. (v) (stronger version of orthogonality) E(i |xi ) = 0 (see (2.8.5)). (vi) (parameterized conditional heteroskedasticity) E(2 i |xi ) = zi . These conditions together are stronger than Assumptions 2.1-2.5. (a) We wish to verify Assumptions 2.1-2.3 for the regression equation (2.8.8). Clearly, Assumption 2.1 about the regression equation (2.8.8) is satised by (i) about the original regression. Assumption 2.2 about (2.8.8) (that {2 i , xi } is ergodic stationary) is satised by (i) and (ii). To see that Assumption 2.3 about (2.8.8) (that E(zi i ) = 0) is satised, note rst that E(i |xi ) = 0 by construction. Since zi is a function of xi , we have E(i |zi ) = 0 by the Law of Iterated Expectation. Therefore, Assumption 2.3 is satised. The additional assumption needed for (2.8.8) is Assumption 2.4 that E(zi zi ) be nonsingular. With Assumptions 2.1-2.4 satised for (2.8.8), the OLS estimator is consistent by Proposition 2.1(a) applied to (2.8.8). (b) Note that = ( ) ( ) and use the hint. (c) Regarding the rst term of (), by Kolmogorovs LLN, the sample mean in that term converges in probability to E(xi i zi ) provided this population mean exists. But E(xi i zi ) = E[zi xi E(i |zi )]. By (v) (that E(i |xi ) = 0) and the Law of Iterated Expectations, E(i |zi ) = 0. Thus E(xi i zi ) = 0. Furthermore, plim(b ) = 0 since b is consistent when Assumptions 2.1-2.4 (which are implied by Assumptions (i)-(vi) above) are satised for the original regression. Therefore, the rst term of () converges in probability to zero. Regarding the second term of (), the sample mean in that term converges in probability to E(x2 i zi ) provided this population mean exists. Then the second term converges in probability to zero because plim(b ) = 0.

(d) Multiplying both sides of () by n( ) = = 1 n


n

n,
n

1 n

zi zi
i=1 1

1 n

zi vi
i=1 n

zi zi
i=1

1 2 n(b ) n

xi i zi +
i=1

n(b ) (b )

1 n

x2 i zi .
i=1

Under Assumptions 2.1-2.5 for the original regression (which are implied by Assumptions (i)-(vi) above), n(b ) converges in distribution to a random variable. As shown in n 1 (c), n i=1 xi i zi p 0. So by Lemma 2.4(b) the rst term in the brackets vanishes n 1 2 (converges to zero in probability). As shown in (c), (b ) n i=1 xi zi vanishes provided 2 E(xi zi ) exists and is nite. So by Lemma 2.4(b) the second term, too, vanishes. Therefore, n( ) vanishes, provided that E(zi zi ) is non-singular. 7. This exercise is about the model in Section 2.8, so we continue to maintain Assumptions (i)(vi) listed in the solution to the previous exercise. Given the hint, the only thing to show is 1 1 1 that the LHS of () equals xx S xx , or more specically, that plim n X VX = S. Write S as S = E(2 i xi xi ) = E[E(2 i |xi )xi xi ] = E(zi xi xi ) (since E(2 i |xi ) = zi by (vi)).

Since xi is i.i.d. by (ii) and since zi is a function of xi , zi xi xi is i.i.d. So its sample mean converges in probability to its population mean E(zi xi xi ), which equals S. The sample mean can be written as 1 n = =
n

zi xi xi
i=1

1 n

vi xi xi
i=1

(by the denition of vi , where vi is the i-th diagonal element of V)

1 X VX. n

8. See the hint. 9. (a) E(gt |gt1 , gt2 , . . . , g2 ) = E[E(gt |t1 , t2 , . . . , 1 )|gt1 , gt2 , . . . , g2 ] (by the Law of Iterated Expectations) = E[E(t t1 |t1 , t2 , . . . , 1 )|gt1 , gt2 , . . . , g2 ] = E[t1 E(t |t1 , t2 , . . . , 1 )|gt1 , gt2 , . . . , g2 ] =0 (since E(t |t1 , t2 , . . . , 1 ) = 0). (by the linearity of conditional expectations)

(b)
2 2 E(gt ) = E(2 t t1 ) 2 = E[E(2 t t1 |t1 , t2 , . . . , 1 )]

(by the Law of Total Expectations) of conditional expectations)

= =

2 E[E(2 (by the linearity t |t1 , t2 , . . . , 1 )t1 ] 2 2 2 E( t1 ) (since E(t |t1 , t2 , . . . , 1 ) = 2 )

= 2 E(2 t1 ). But
2 2 2 E(2 t1 ) = E[E(t1 |t2 , t3 , . . . , 1 )] = E( ) = .

(c) If {t } is ergodic stationary, then {t t1 } is ergodic stationary (see, e.g., Remark 5.3 on p. 488 of S. Karlin and H. Taylor, A First Course in Stochastic Processes, 2nd. ed., Academic Press, 1975, which states that For any function , the sequence Yn = (Xn , Xn+1 , . . . ) generates an ergodic stationary process whenever {Xn } is ergodic Thus the stationary.) n 1 Billingsley CLT (see p. 106 of the text) is applicable to n1 = n n t=j +1 gt .
2 in probability to E(2 (d) Since 2 t is ergodic stationary, 0 converges t ) = . As shown in (c), 1 4 n1 d N (0, ). So by Lemma 2.4(c) n 0 d N (0, 1).

10. (a) Clearly, E(yt ) = 0 for all t = 1, 2, . . . . 2 2 2 ) + 2 (1 + 1 ( + ) 2 1 1 2 Cov(yt , ytj ) = 2 2 0 So neither E(yt ) nor Cov(yt , ytj ) depends on t. (b) E(yt |ytj , ytj 1 , . . . , y0 , y1 ) = E(yt |tj , tj 1 , . . . , 0 , 1 ) (as noted in the hint) = E(t + 1 t1 + 2 t2 |tj , tj 1 , . . . , 0 , 1 ) t + 1 t1 + 2 t2 for j = 0, for j = 1, 1 t1 + 2 t2 = for j = 2, 2 t2 0 for j > 2, which gives the desired result. for for for for j j j j =0 = 1, = 2, > 2,

(c) 1 Var( n y ) = [Cov(y1 , y1 + + yn ) + + Cov(yn , y1 + + yn )] n = 1 [(0 + 1 + + n2 + n1 ) + (1 + 0 + 1 + + n2 ) n + + (n1 + n2 + + 1 + 0 )] 1 [n0 + 2(n 1)1 + + 2(n j )j + + 2n1 ] n
n1

= 0 + 2
j =1

j j . n

(This is just reproducing (6.5.2) of the book.) Since j = 0 for j > 2, one obtains the desired result. (d) To use Lemma 2.1, one sets zn = ny . However, Lemma 2.1, as stated in the book, inadvertently misses the required condition that there exist an M > 0 such that E(|zn |s+ ) < M for all n for some > 0. Provided this technical condition is satised, the variance of the limiting distribution of ny is the limit of Var( ny ), which is 0 + 2(1 + 2 ). 11. (a) In the auxiliary regression, the vector of the dependent variable is e and the matrix of . regressors is [X . . E]. Using the OLS formula, = B1
1 nX 1 nE

e e

.
1 nE

X e = 0 by the normal equations for the original regression. The j -th element of 1 1 (ej +1 e1 + + en enj ) = et etj . n n t=j +1
n

e is

which equals j dened in (2.10.9). n 1 1 (b) The j -th column of n X E is n t=j +1 xt etj (which, incidentally, equals j dened on p. 147 of the book). Rewrite it as follows. 1 xt etj n t=j +1 1 = xt (tj xtj (b )) n t=j +1 n n 1 1 = xt tj xt xtj (b ) n t=j +1 n t=j +1
1 The last term vanishes because b is consistent for . Thus n t=j +1 xt etj converges in probability to E(xt tj ). 1 E E is, for i j , The (i, j ) element of the symmetric matrix n n n n

1 1 (e1+ij e1 + + enj eni ) = et et(ij ) . n n t=1+ij 6

nj

Using the relation et = t xt (b ), this can be rewritten as 1 1 t t(ij ) (xt t(ij ) + xt(ij ) t ) (b ) n t=1+ij n t=1+ij (b ) 1 xt xt(ij ) (b ). n t=1+ij
nj nj nj

The type of argument that is by now routine (similar to the one used on p. 145 for (2.10.10)) shows that this expression converges in probability to ij , which is 2 for i = j and zero for i = j . (c) As shown in (b), plim B = B. Since xx is non-singular, B is non-singular. So B1 converges in probability to B1 . Also, using an argument similar to the one used in (b) 1 E E = Ip , we can show that plim = 0. Thus the formula in (a) for showing that plim n shows that converges in probability to zero.
1 (d) (The hint should have been: n E e = . Show that SSR n

1 ne

0 . The SSR from

the auxiliary regression can be written as . . 1 1 SSR = (e [X . . E]) (e [X . . E]) n n . 1 = ( e [X . . E]) e (by the normal equation for the auxiliary regression) n . 1 1 . E] e = e e [X . n n = 1 ee n 1 ee n
1 nX 1 nE

e e (since X e = 0 and 1 E e = ). n

1 e e = 2 . As shown in (c), plim = 0 and plim = 0. By Proposition 2.2, we have plim n 2 Hence SSR/n (and therefore SSR/(n K p)) converges to in probability.

(e) Let R
(pK )

. . .

Ip

. , V [X . . E].

The F -ratio is for the hypothesis that R = 0. The F -ratio can be written as F = (R) R(V V)1 R (R)/p . SSR/(n K p)
1

()

Using the expression for in (a) above, R can be written as 0 . ( K 1) . R = 0 . Ip B1 (pK )


(p1)

=
(pK )

. . .

Ip

(K K ) B21
(pK )

B11

(K p) 22 (pp)

B12 B

(K 1)

()

(p1)

= B22 . Also, R(V V)


1

R in the expression for F can be written as 1 R B 1 R n 0 (since . . . Ip 1 V V = B) n B11 B12 B

R(V V)1 R =

(K p)

1 = n =

(pK )

(K K ) B21
(pK )

(K p) 22 (pp)

Ip

1 22 B . ( ) n Substitution of ( ) and () into () produces the desired result. (f) Just apply the formula for partitioned inverses. (g) Since n n / 2 p 0 and p , it should be clear that the modied Box-Pierce Q (= n (Ip )1 ) is asymptotically equivalent to n (Ip )1 / 4 . Regarding the pF statistic given in (e) above, consider the expression for B22 given in (f) above. Since 1 the j -th element of n X E is j dened right below (2.10.19) on p. 147, we have s2 = so B22 = 1 1 1 E X S XE , xx n n 1 E E s2 n
1

. )1 , and pF is asymptoti-

1 As shown in (b), n E E p 2 Ip . Therefore, B22 p cally equivalent to n (Ip )1 / 4 .

1 2 (Ip

12. The hints are almost as good as the answer. Here, we give solutions to (b) and (c) only. (b) We only prove the rst convergence result. 1 n r xt xt = n t=1
r

1 r

xt xt
t=1

1 r

xt xt
t=1

The term in parentheses converges in probability to xx as n (and hence r) goes to innity. (c) We only prove the rst convergence result. 1 n
r

xt t =
t=1

r n

1 r

xt t
t=1

1 r

xt t
t=1

The term in parentheses converges in distribution to N (0, 2 xx ) as n (and hence r) goes to innity. So the whole expression converges in distribution to N (0, 2 xx ).

December 27, 2003

Hayashi Econometrics

Solution to Chapter 3 Analytical Exercises


1. If A is symmetric and idempotent, then A = A and AA = A. So x Ax = x AAx = x A Ax = z z 0 where z Ax. 2. (a) By assumption, {xi , i } is jointly stationary and ergodic, so by ergodic theorem the rst 2 term of () converges almost surely to E(x2 i i ) which exists and is nite by Assumption 3.5. (b) zi x2 i i is the product of xi i and xi zi . By using the Cauchy-Schwarts inequality, we obtain E(|xi i xi zi |)
2 2 2 E(x2 i i ) E(xi zi ).

2 2 2 E(x2 i i ) exists and is nite by Assumption 3.5 and E(xi zi ) exists and is nite by Assumption 3.6. Therefore, E(|xi zi xi i |) is nite. Hence, E(xi zi xi i ) exists and is nite. (c) By ergodic stationarity the sample average of zi x2 i i converges in probability to some nite number. Because is consistent for by Proposition 3.1, converges to 0 in probability. Therefore, the second term of () converges to zero in probability. 2 2 xi converges in prob(d) By ergodic stationarity and Assumption 3.6 the sample average of zi ability to some nite number. As mentioned in (c) converges to 0 in probability. Therefore, the last term of () vanishes.

3. (a) Q = = = = = xz S1 xz xz Wxz (xz WSWxz )1 xz Wxz xz C Cxz xz Wxz (xz WC1 C H H xz Wxz (G G) xz Wxz H H H G(G G)1 G H H [IK G(G G)1 G ]H H MG H. IK G(G(G G)1 ) IK G((G G)1 G ) IK G(G G)1 G MG .
1 1

Wxz )1 xz Wxz

(b) First, we show that MG is symmetric and idempotent. MG = = = = MG MG = = =

IK IK G(G G)1 G IK IK G(G G)1 G + G(G G)1 G G(G G)1 G IK G(G G)1 G MG .

Thus, MG is symmetric and idempotent. For any L-dimensional vector x, x Qx = = x H MG Hx z MG z (where z Hx) 0 (since MG is positive semidenite).

Therefore, Q is positive semidenite. 1

4. (the answer on p. 254 of the book simplied) If W is as dened in the hint, then WSW = W and xz Wxz = zz A1 zz .

So (3.5.1) reduces to the asymptotic variance of the OLS estimator. By (3.5.11), it is no smaller than (xz S1 xz )1 , which is the asymptotic variance of the ecient GMM estimator. 5. (a) From the expression for (S1 ) (given in (3.5.12)) and the expression for gn ( ) (given in (3.4.2)), it is easy to show that gn ( (S1 )) = Bsxy . But Bsxy = Bg because Bsxy = (IK Sxz (Sxz S1 Sxz )1 Sxz S1 )sxy = (IK Sxz (Sxz S1 Sxz )1 Sxz S1 )(Sxz + g) = (Sxz Sxz ) + Bg = Bg. (b) Since S1 = C C, we obtain B S1 B = B C CB = (CB) (CB). But CB = = C(IK Sxz (Sxz S1 Sxz )1 Sxz S1 ) C CSxz (Sxz C CSxz )1 Sxz C C (where A CSxz ) (since yi = zi + i ) = (Sxz Sxz (Sxz S1 Sxz )1 Sxz S1 Sxz ) + (IK Sxz (Sxz S1 Sxz )1 Sxz S1 )g

= C A(A A)1 A C = [IK A(A A)1 A ]C MC.

So B S1 B = (MC) (MC) = C M MC. It should be routine to show that M is symmetric and idempotent. Thus B S1 B = C MC. The rank of M equals its trace, which is trace(M) = trace(IK A(A A)1 A ) = trace(IK ) trace(A(A A)1 A ) = = = trace(IK ) trace(A A(A A)1 ) K trace(IL ) K L.

(c) As dened in (b), C C = S1 . Let D be such that D D = S1 . The choice of C and D is not unique, but it would be possible to choose C so that plim C = D. Now, v n(Cg) = C( n g). By using the Ergodic Stationary Martingale Dierences CLT, we obtain n g d N (0, S). So v = C( n g) N (0, Avar(v))
d

where Avar(v) = = = = 2 DSD D(D D)1 D DD1 D1 D IK .

(d) J ( (S1 ), S1 ) = = = = = n gn ( (S1 )) S1 gn ( (S1 )) n (Bg) S1 (Bg) ng BS


1

(by (a))

Bg

n g C MCg (by (b)) v Mv (since v nCg).

Since v d N (0, IK ) and M is idempotent, v Mv is asymptotically chi-squared with degrees of freedom equaling the rank of M = K L. 6. From Exercise 5, J = ng B S1 Bg. Also from Exercise 5, Bg = Bsxy . 7. For the most parts, the hints are nearly the answer. Here, we provide answers to (d), (f), (g), (i), and (j). (d) As shown in (c), J1 = v1 M1 v1 . It suces to prove that v1 = C1 F C1 v. v 1 nC1 g 1 = nC1 F g = nC1 F C1 Cg = C1 F C1 nCg = C1 F C1 v (since v nCg). (f) Use the hint to show that A D = 0 if A1 M1 = 0. It should be easy to show that A1 M1 = 0 from the denition of M1 . (g) By the denition of M in Exercise 5, MD = D A(A A)1 A D. So MD = D since A D = 0 as shown in the previous part. Since both M and D are symmetric, DM = D M = (MD) = D = D. As shown in part (e), D is idempotent. Also, M is idempotent as shown in Exercise 5. So (M D)2 = M2 DM MD + D2 = M D. As shown in Exercise 5, the trace of M is K L. As shown in (e), the trace of D is K1 L. So the trace of M D is K K1 . The rank of a symmetric and idempotent matrix is its trace. (i) It has been shown in Exercise 6 that g C MCg = sxy C MCsxy since C MC = B S1 B. Here, we show that g C DCg = sxy C DCsxy . g C DCg = g FC1 M1 C1 F g = g FB1 (S11 )1 B1 F g = g1 B1 (S11 )1 B1 g1 (C DC = FC1 M1 C1 F by the denition of D in (d)) (since C1 M1 C1 = B1 (S11 )1 B1 from (a)) (since g1 = F g).

From the denition of B1 and the fact that sx1 y = Sx1 z + g1 , it follows that B1 g1 = B1 sx1 y . So g1 B1 (S11 )1 B1 g1 = sx1 y B1 (S11 )1 B1 sx1 y = sxy FB1 (S11 )1 B1 F sxy = sxy FC1 M1 C1 F sxy = sxy C DCsxy . 3 (since sx1 y = F sxy ) (since B1 (S11 )1 B1 = C1 M1 C1 from (a))

(j) M D is positive semi-denite because it is symmetric and idempotent. 8. (a) Solve the rst-order conditions in the hint for to obtain = (W ) 1 (S WSxz )1 R . 2n xz

Substitute this into the constraint R = r to obtain the expression for in the question. Then substitute this expression for into the above equation to obtain the expression for in the question. (b) The hint is almost the answer. (c) What needs to be shown is that n( (W) ) (Sxz WSxz )( (W) ) equals the Wald statistic. But this is immediate from substitution of the expression for in (a). 9. (a) By applying (3.4.11), we obtain n( 1 ) (Sxz W1 Sxz )1 Sxz W1 = ng . n( 1 ) (Sxz W2 Sxz )1 Sxz W2 By using Billingsley CLT, we have ng N (0, S).
d

Also, we have (Sxz W1 Sxz )1 Sxz W1 (Sxz W2 Sxz )1 Sxz W2 Therefore, by Lemma 2.4(c), n( 1 ) n( 1 ) d = (b) nq can be rewritten as nq = n( 1 2 ) = n( 1 ) n( 2 ) = 1 Therefore, we obtain nq N (0, Avar(q)).
d

1 Q 1 xz W1 . 1 Q2 xz W2

N N

0, 0,

1 . Q 1 1 xz W1 S (W Q1 . . W2 xz Q 1 xz 1 1 2 ) Q2 xz W2

A11 A21

A12 A22

n( 1 ) . n( 2 )

where Avar(q) = 1 1 A11 A21 A12 A22 1 = A11 + A22 A12 A21 . 1

(c) Since W2 = S1 , Q2 , A12 , A21 , and A22 can be rewritten as follows: Q2 = = = = = = A21 = = A22 xz W2 xz xz S1 xz ,
1 1 1 Q xz Q 1 xz W1 S S 2 1 1 Q 1 (xz W1 xz )Q2 1 1 Q 1 Q1 Q2 1 Q 2 , 1 1 1 Q SW1 xz Q 2 xz S 1 1 Q 2 ,

A12

= (xz S1 xz )1 xz S1 SS1 xz (xz S1 xz )1 = (xz S1 xz )1 =


1 Q 2 .

Substitution of these into the expression for Avar(q) in (b), we obtain Avar(q) = = = 10. (a) xz E(xi zi ) = = = (b) From the denition of , = 1 n
n 1 1 A11 Q 2

A11 (xz S1 xz )1 Avar( (W1 )) Avar( (S1 )).

E(xi (xi + vi )) E(x2 i ) + E(xi vi ) 2 (by assumptions (2), (3), and (4)). x = 0

xi zi
i=1

1 n

n 1 xi i = s xz i=1

1 n

xi i .
i=1

We have xi zi = xi (xi + vi ) = x2 i + xi vi , which, being a function of (xi , i ), is ergodic stationary by assumption (1). So by the Ergodic theorem, sxz p xz . Since xz = 0 by 1 1 (a), we have s xz p xz . By assumption (2), E(xi i ) = 0. So by assumption (1), we have n 1 i=1 xi i p 0. Thus p 0. n (c) sxz = = p = 1 n 1 n
n

xi z i
i=1 n

(x2 i + xi vi )
i=1 n

1 1 nn

x2 i +
i=1

1 n

xi vi
i=1

1 (since = ) n

0 E(x2 i ) + E(xi vi ) 0 5

(d) 1 nsxz = n
n

x2 i
i=1

1 + n

xi vi .
i=1

By assumption (1) and the Ergodic Theorem, the rst term of RHS converges in probability 2 to E(x2 i ) = x > 0. Assumption (2) and the Martingale Dierences CLT imply that 1 n
n

xi vi a N (0, s22 ).
i=1 d

Therefore, by Lemma 2.4(a), we obtain 2 nsxz x + a.


d

(e) can be rewritten as = ( nsxz )1 ng 1 . From assumption (2) and the Martingale Dierences CLT, we obtain ng 1 b N (0, s11 ).
d

where s11 is the (1, 1) element of S. By using the result of (d) and Lemma 2.3(b),
2 (x + a)1 b. d

(a, b) are jointly normal because the joint distribution is the limiting distribution of ng = ng 1
1 n( n n i=1

xi vi )

2 (f) Because converges in distribution to (x + a)1 b which is not zero, the answer is No.

January 8, 2004, answer to 3(c)(i) simplied, February 23, 2004

Hayashi Econometrics

Solution to Chapter 4 Analytical Exercises


1 1 1. It should be easy to show that Amh = n Zm PZh and that cmh = n Zm Pyh . Going back to the formula (4.5.12) on p. 278 of the book, the rst matrix on the RHS (the matrix to be inverted) is a partitioned matrix whose (m, h) block is Amh . It should be easy to see

that it equals
1 nZ

1 n [Z

P)Z]. Similarly, the second matrix on the RHS of (4.5.12) equals

P) y .

2. The sprinkled hints are as good as the answer. 3. (b) (amplication of the answer given on p. 320) In this part only, for notational brevity, let zi be a m Lm 1 stacked vector collecting (zi1 , . . . , ziM ). E(im | Z) = E(im | z1 , z2 , . . . , zn ) (since Z collects zi s) = E(im | zi ) (since (im , zi ) is independent of zj (j = i)) =0 (by the strengthened orthogonality conditions).

The (i, j ) element of the n n matrix E(m h | Z) is E(im jh | Z). E(im jh | Z) = E(im jh | z1 , z2 , . . . , zn ) = E(im jh | zi , zj ) For j = i, this becomes E(im jh | zi , zj ) = E [E(im jh | zi , zj , jh ) | zi , zj ] (since (im , zi , jh , zj ) is independent of zk (k = i, j )).

(by the Law of Iterated Expectations)

= E [jh E(im | zi , zj , jh ) | zi , zj ] (by linearity of conditional expectations) = E [jh E(im | zi ) | zi , zj ] (since (im , zi ) is independent of (jh , zj )) =0 (since E(im | zi ) = 0). For j = i, E(im jh | Z) = E(im ih | Z) = E(im ih | zi ). Since xim = xi and xi is the union of (zi1 , . . . , ziM ) in the SUR model, the conditional homoskedasticity assumption, Assumption 4.7, states that E(im ih | zi ) = E(im ih | xi ) = mh . (c) (i) We need to show that Assumptions 4.1-4.5, 4.7 and (4.5.18) together imply Assumptions 1.1-1.3 and (1.6.1). Assumption 1.1 (linearity) is obviously satised. Assumption 1.2 (strict exogeneity) and (1.6.1) have been veried in 3(b). That leaves Assumption 1.3 (the rank condition that Z (dened in Analytical Exercise 1) be of full column rank). Since Z is block diagonal, it suces to show that Zm is of full column rank for m = 1, 2, . . . , M . The proof goes as follows. By Assumption 4.5, 1

S is non-singular. By Assumption 4.7 and the condition (implied by (4.5.18)) that the set of instruments be common across equations, we have S = E(xi xi ) (as 1 in (4.5.9)). So the square matrix E(xi xi ) is non-singular. Since n X X (where X is the n K data matrix, as dened in Analytical Exercise 1) converges almost surely to E(xi xi ), the n K data matrix X is of full column rank for suciently large n. Since Zm consists of columns selected from the columns of X, Zm is of full column rank as well. (ii) The hint is the answer. (iii) The unbiasedness of SUR follows from (i), (ii), and Proposition 1.7(a). (iv) Avar( SUR ) is (4.5.15) where Amh is given by (4.5.16 ) on p. 280. The hint shows that it equals the plim of n Var( SUR | Z). (d) For the most part, the answer is a straightforward modication of the answer to (c). The only part that is not so straightforward is to show in part (i) that the M n L matrix Z is of full column rank. Let Dm be the Dm matrix introduced in the answer to (c), so zim = Dm xi and Zm = XDm . Since the dimension of xi is K and that of zim is L, the M matrix Dm is K L. The m=1 Km L matrix xz in Assumption 4.4 can be written as D1 . xz = [IM E(xi xi )]D where D . . .
(KM L) (KM L)

DM

Since xz is of full column rank by Assumption 4.4 and since E(xi xi ) is non-singular, D is of full column rank. So Z = (IM X)D is of full column rank if X is of full column rank. X is of full column rank for suciently large n if E(xi xi ) is non-singular. 4. (a) Assumptions 4.1-4.5 imply that the Avar of the ecient multiple-equation GMM estimator is (xz S1 xz )1 . Assumption 4.2 implies that the plim of Sxz is xz . Under Assumptions 4.1, 4.2, and 4.6, the plim of S is S. (b) The claim to be shown is just a restatement of Propositions 3.4 and 3.5. (c) Use (A9) and (A6) of the books Appendix A. Sxz and W are block diagonal, so WSxz (Sxz WSxz )1 is block diagonal. (d) If the same residuals are used in both the ecient equation-by-equation GMM and the ecient multiple-equation GMM, then the S in () and the S in (Sxz S1 Sxz )1 are numerically the same. The rest follows from the inequality in the question and the hint. (e) Yes. (f) The hint is the answer. 5. (a) For the LW69 equation, the instruments (1, MED) are 2 in number while the number of the regressors is 3. So the order condition is not satised for the equation. (b) (reproducing the answer on pp. 320-321) 1 E(S69) E(IQ) E(LW69) 0 1 E(S80) E(IQ) E(LW80) E(MED) E(S69 MED) E(IQ MED) 1 = E(LW69 MED) . 2 E(MED) E(S80 MED) E(IQ MED) E(LW80 MED) The condition for the system to be identied is that the 4 3 coecient matrix is of full column rank. 2

(c) (reproducing the answer on p. 321) If IQ and MED are uncorrelated, then E(IQ MED) = E(IQ) E(MED) and the third column of the coecient matrix is E(IQ) times the rst column. So the matrix cannot be of full column rank. 6. (reproducing the answer on p. 321) im = yim zim m = im zim ( m m ). So 1 n where (1) = 1 n
n n

[im zim ( m m )][ih zih ( h h )] = (1) + (2) + (3) + (4),


i=1

im ih ,
i=1

(2) = ( m m ) (3) = ( h h ) (4) = ( m m ) 1 n

1 n 1 n

zim ih ,
i=1 n

zih im ,
i=1

zim zih ( h h ).
i=1

As usual, under Assumption 4.1 and 4.2, (1) p mh ( E(im ih )). For (4), by Assumption 4.2 and the assumption that E(zim zih ) is nite, zim zih converges in probability to a (nite) matrix. So (4) p 0. Regarding (2), by Cauchy-Schwartz, E(|zimj ih |)
2 ) E(2 ), E(zimj ih 1 n i

where zimj is the j -th element of zim . So E(zim ih ) is nite and (2) p 0 because m m p 0. Similarly, (3) p 0. 7. (a) Let B, Sxz , and W be as dened in the hint. Also let 1 n i=1 xi yi1 n . . sxy = . . (M K 1) n 1 i=1 xi yiM n Then 3SLS = Sxz WSxz = (I B )( =
1 1

Sxz Wsxy
1

1 S xx )(I B) 1

(I B )(

1 S xx )sxy

1 B S xx B

1 B S xx sxy

1 1 = (B S xx B)

1 B S xx sxy

1 1 1 = IM (B S B S xx B) xx sxy n 1 1 1 1 (B S B S xx B) xx n i=1 xi yi1 . . = , . 1 1 1 (B S B S xx B) xx 1 n n i=1

xi yiM

which is a stacked vector of 2SLS estimators. (b) The hint is the answer. 8. (a) The ecient multiple-equation GMM estimator is Sxz S1 Sxz
1

Sxz S1 sxy ,

where Sxz and sxy are as dened in (4.2.2) on p. 266 and S1 is a consistent estimator of S. Since xim = zim here, Sxz is square. So the above formula becomes
1 S xz S Sxz 1 1 Sxz S1 sxy = S xz sxy ,

which is a stacked vector of OLS estimators. (b) The SUR is ecient multiple-equation GMM under conditional homoskedasticity when the set of orthogonality conditions is E(zim ih ) = 0 for all m, h. The OLS estimator derived above is (trivially) ecient multiple-equation GMM under conditional homoskedasticity when the set of orthogonality conditions is E(zim im ) = 0 for all m. Since the sets of orthogonality conditions dier, the ecient GMM estimators dier. 9. The hint is the answer (to derive the formula in (b) of the hint, use the SUR formula you derived in Analytical Exercise 2(b)).
1 10. (a) Avar( 1,2SLS ) = 11 A 11 .

(b) Avar( 1,3SLS ) equals G1 . The hint shows that G =

1 11 A11 .

11. Because there are as many orthogonality conditions as there are coecients to be estimated, it is possible to choose so that gn ( ) dened in the hint is a zero vector. Solving 1 n
n

zi1 yi1 + +
i=1

1 n

ziM yiM
i=1

1 n

zi 1 zi 1 + +
i=1

1 n

ziM ziM = 0
i=1

for , we obtain = 1 n
n

zi 1 zi 1 + +
i=1

1 n

ziM ziM
i=1

1 n

zi1 yi1 + +
i=1

1 n

ziM yiM ,
i=1

which is none other than the pooled OLS estimator.

January 9, 2004

Hayashi Econometrics

Solution to Chapter 5 Analytical Exercises


1. (a) Let (a , b ) be the OLS estimate of ( , ) . Dene MD as in equation (4) of the hint. By the Frisch-Waugh theorem, b is the OLS coecient estimate in the regression of MD y on MD F. The proof is complete if we can show the claim that y = MD y and F = MD F, where y and F are dened in (5.2.2) and (5.2.3). This is because the xed-eects estimator can be written as (F F)1 F y (see (5.2.4)). But the above claim follows immediately if we 1 can show that MD = In Q, where Q IM M 1M 1M , the annihilator associated with 1M . MD = IM n (In 1M ) [(In 1M ) (In 1M )] = IM n (In 1M ) [(In 1M 1M )]
1 1 1

(In 1M )

(In 1M )

= IM n (In 1M ) [(In M )] (In 1M ) 1 = IM n (In 1M )(In )(In 1M ) M 1 = IM n (In 1M 1M ) M 1 = (In IM ) (In 1M 1M ) M 1 = (In (IM 1M 1M )) M = In Q. (b) As indicated in the hint to (a), we have a = (D D)1 (D y D Fb). It should be straightforward to show that 1M F1 b 1M y 1 . . . D D = M In , D y = . . . , D Fb = . 1M yn 1M Fn b Therefore,
1 M (1M y1 1 M (1M yn

a=

1M F1 b) . . . . 1M Fn b)

The desired result follows from this because b equals the xed-eects estimator FE and fi1 M . 1M yi = (yi1 + + yiM ) and 1M Fn b = 1M . b = fim b. . fiM
m=1

(c) What needs to be shown is that (3) and conditions (i)-(iv) listed in the question together imply Assumptions 1.1-1.4. Assumption 1.1 (linearity) is none other than (3). Assumption 1.3 is a restatement of (iv). This leaves Assumptions 1.2 (strict exogeneity) and Assumption 1.4 (spherical error term) to be veried. The following is an amplication of the answer to 1.(c) on p. 363. E( i | W) = E( i | F) (since D is a matrix of constants) = E( i | F1 , . . . , Fn ) = E( i | Fi ) (since ( i , Fi ) is indep. of Fj for j = i) by (i) =0 (by (ii)). Therefore, the regressors are strictly exogenous (Assumption 1.2). Also, E( i i | W) = E( i i | F) = E( i i | Fi )
2 = IM

(by the spherical error assumption (iii)).

For i = j , E( i j | W) = E( i j | F) = E( i j | F1 , . . . , Fn ) = E( i j | Fi , Fj ) (since ( i , Fi , j , Fj ) is indep. of Fk for k = i, j by (i)) = E[E( i j | Fi , Fj , i ) | Fi , Fj ] = E[ i E( j | Fi , Fj , i ) | Fi , Fj ] = E[ i E( j | Fj ) | Fi , Fj ] =0 (since ( j , Fj ) is independent of ( i , Fi ) by (i)) (since E( j | Fj ) by (ii)).

2 So E( | W) = IM n (Assumption 1.4). Since the assumptions of the classical regression model are satised, Propositions 1.1 holds for the OLS estimator (a, b). The estimator is unbiased and the Gauss-Markov theorem holds. As shown in Analytical Exercise 4.(f) in Chapter 1, the residual vector from the original regression (3) (which is to regress y on D and F) is numerically the same as the residual vector from the regression of y (= MD y) on F (= MD F)). So the two SSR s are the same.

2. (a) It is evident that C 1M = 0 if C is what is referred to in the question as the matrix of rst dierences. Next, to see that C 1M = 0 if C is an M (M 1) matrix created by dropping one column from Q, rst note that by construction of Q, we have: Q 1M =
(M M ) (M 1)

0 ,

which is a set of M equations. Drop one row from Q and call it C and drop the corresponding element from the 0 vector on the RHS. Then
((M 1)M )

1M =

((M 1)1)

(b) By multiplying both sides of (5.1.1 ) on p. 329 by C , we eliminate 1M bi and 1M i . 2

(c) Below we verify the ve conditions. The random sample condition is immediate from (5.1.2). Regarding the orthogonality conditions, as mentioned in the hint, (5.1.8b) can be written as E( i xi ) = 0. This implies the orthogonality conditions because E( i xi ) = E[(C IK )( i xi )] = (C IK ) E( i xi ). As shown on pp. 363-364, the identication condition to be veried is equivalent to (5.1.15) (that E(QFi xi ) be of full column rank). Since i = 1M i + i , we have i C i = C i . So i i = C i i C and E( i i | xi ) = E(C i i C | xi ) = C E(i i | xi )C = C C. (The last equality is by (5.1.5).) By the denition of gi , we have: gi gi = i i xi xi . But as just shown above, i i = C i i C. So gi gi = C i i C xi xi = (C IK )(i i xi xi )(C IK ). Thus E(gi gi ) = (C IK ) E[(i i xi xi )](C IK ) = (C IK ) E(gi gi )(C IK ) (since gi i xi ).

Since E(gi gi ) is non-singular by (5.1.6) and since C is of full column rank, E(gi gi ) is non-singular. (d) Since Fi C Fi , we can rewrite Sxz and sxy as Sxz = (C IK ) So Sxz WSxz = = 1 n 1 n 1 n
n

1 n

Fi xi , sxy = (C IK )
i=1

1 n

yi xi .
i=1

Fi xi (C IK ) (C C)1
i=1 n

1 n
n

xi xi
i=1 1

(C IK )
n

1 n

Fi xi
i=1

Fi xi
i=1 n

C(C C)1 C Q 1 n
n

1 n

xi xi
i=1

1 n

Fi xi
i=1

Fi xi
i=1

xi xi
i=1

1 n

Fi xi
i=1

(since C(C C)1 C = Q, as mentioned in the hint). Similarly, Sxz Wsxy = 1 n


n

Fi xi
i=1

1 n

xi xi
i=1

1 n

yi xi .
i=1

Noting that fim is the m-th row of Fi and writing out the Kronecker products in full, we obtain
M M

Sxz WSxz =
m=1 h=1 M M

qmh

1 n 1 n

fim xi
i=1 n

1 n 1 n

xi xi
i=1 n

1 n 1 n

xi fih
i=1 n

Sxz Wsxy =
m=1 h=1

qmh

fim xi
i=1

xi xi
i=1

xi yih
i=1

where qmh is the (m, h) element of Q. (This is just (4.6.6) with xim = xi , zim = fim ,
1 W = Q n i=1 xi xi xi dissappears. So n 1

.) Since xi includes all the elements of Fi , as noted in the hint,


M n n M M

Sxz WSxz =
m=1 h=1 M M

qmh

1 n 1 n

fim fih =
i=1 n

1 n 1 n

qmh fim fih ,


i=1 m=1 h=1 n M M

Sxz Wsxy =
m=1 h=1

qmh

fim yih =
i=1

qmh fim yih .


i=1 m=1 h=1

Using the beautifying formula (4.6.16b), this expression can be simplied as Sxz WSxz = Sxz Wsxy =
1

1 n 1 n

Fi QFi ,
i=1 n

Fi Qyi .
i=1

So Sxz WSxz

Sxz Wsxy is the xed-eects estimator.

(e) The previous part shows that the xed-eects estimator is not ecient because the W in (10) does not satisfy the eciency condition that plim W = S1 . Under conditional homoskedasticity, S = E( i i ) E(xi xi ). Thus, with being a consistent estimator of E( i i ), the ecient GMM estimator is given by setting W=
1

1 n

xi xi
i=1

This is none other than the random-eects estimator applied to the system of M 1 equations (9). By setting Zi = Fi , = , yi = yi in (4.6.8 ) and (4.6.9 ) on p. 293, we obtain (12) and (13) in the question. It is shown on pp. 292-293 that these beautied formulas are numerically equivalent versions of (4.6.8) and (4.6.9). By Proposition 4.7, the random-eects estimator (4.6.8) is consistent and asymptotically normal and the asymptotic variance is given by (4.6.9). As noted on p. 324, it should be routine to show that those conditions veried in (c) above are sucient for the hypothesis of Proposition 4.7. In particular, the xz referred to in Assumption 4.4 can be written as E(Fi xi ). In (c), weve veried that this matrix is of full column rank. (f) Proposition 4.1, which is about the estimation of error cross moments for the multipleequation model of Section 4.1, can easily be adapted to the common-coecient model of Section 4.6. Besides linearity, the required assumptions are (i) that the coecient estimate 4

(here FE ) used for calculating the residual vector be consistent and (ii) that the cross moment between the vector of regressors from one equation (a row from Fi ) and those from another (another row from Fi ) exist and be nite. As seen in (d), the xed-eects estimator FE is a GMM estimator. So it is consistent. As noted in (c), E(xi xi ) is non-singular. Since xi contains all the elements of Fi , the cross moment assumption is satised. (g) As noted in (e), the assumptions of Proposition 4.7 holds for the present model in question. It has been veried in (f) that dened in (14) is consistent. Therefore, Proposition 4.7(c) holds for the present model. 2 C C (the last equality is by (h) Since i C i , we have E( i i ) = E(C i i C) = 2 (15)). By setting = C C in the expression for W in the answer to (e) (thus setting
2 W = CC 1 n n i=1

xi xi
1

), the estimator can be written as a GMM estimator

(Sxz WSxz )

Sxz Wsxy . Clearly, it is numerically equal to the GMM estimator with


1 n n i=1

W = C C

xi xi

, which, as was veried in (d), is the xed-eects estimator.

(i) Evidently, replacing C by B CA in (11) does not change Q. So the xed-eects estimator is invariant to the choice of C. To see that the numerical values of (12) and (13) i B Fi and y i B yi . That is, the original M are invariant to the choice of C, let F equations (5.1.1 ) are transformed into M 1 equations by B = CA, not by C. Then i = A Fi and y is the estimated error cross moment matrix when (14) is i = A yi . If F i replacing Fi , then we have: = A A. So i replacing yi and F used with y 1 F i = F A(A A)1 A Fi = F AA1 1 (A )1 A Fi = F 1 Fi . F i i i i
1 1 y i = Fi yi . Similarly, F i

3. From (5.1.1 ), vi = C (yi Fi ) = C i . So E(vi vi ) = E(C i i C) = C E( i i )C = 2 C C. By the hint, plim SSR 2 2 2 = trace (C C)1 C C = trace[IM 1 ] = (M 1). n

4. (a) bi is absent from the system of M equations (or bi is a zero vector). y i1 y i0 . . yi = . . , Fi = . . . yiM yi,M 1 (b) Recursive substitution (starting with a substitution of the rst equation of the system into the second) yields the equation in the hint. Multiply both sides of the equation by ih and take expectations to obtain E(yim ih ) = E(im ih ) + E(i,m1 ih ) + + m1 E(i1 ih ) 1 m + E(i ih ) + m E(yi0 ih ) 1 = E(im ih ) + E(i,m1 ih ) + + m1 E(i1 ih ) (since E(i ih ) = 0 and E(yi0 ih ) = 0) =
2 mh 0

if h = 1, 2, . . . , m, if h = m + 1, m + 2, . . . .

2 (c) That E(yim ih ) = mh for m h is shown in (b). Noting that Fi here is a vector, not a matrix, we have:

E(Fi Q i ) = E[trace(Fi Q i )] = E[trace( i Fi Q)] = trace[E( i Fi )Q] = trace[E( i Fi )(IM = trace[E( i Fi )] = trace[E( i Fi )] By the results shown in (b), E( i Fi ) can be 0 1 0 0 . . . 2 . . E( i Fi ) = . 0 0 0 1 11 )] M

1 trace[E( i Fi )11 ] M 1 1 E( i Fi )1. M

written as 1 .. . 2 .. . 0 M 2 M 3 . . . . 1 0 1 0

So, in the above expression for E(Fi Q i ), trace[E( i Fi )] = 0 and 1 E( i Fi )1 = sum of the elements of E( i Fi ) = sum of the rst row + + sum of the last row
2 =

1 M 1 1 M 2 1 + + + 1 1 1 M 1 M + M . (1 )2

2 =

(d) (5.2.6) is violated because E(fim ih ) = E(yi,m1 ih ) = 0 for h m 1. 5. (a) The hint shows that E(Fi Fi ) = E(QFi xi ) IM E(xi xi )
1

E(QFi xi ).

By (5.1.15), E(QFi xi ) is of full column rank. So the matrix product above is nonsingular. (b) By (5.1.5) and (5.1.6 ), E(i i ) is non-singular. (c) By the same sort of argument used in (a) and (b) and noting that Fi C Fi , we have E(Fi 1 Fi ) = E(C Fi xi ) 1 E(xi xi )
1

E(C Fi xi ).

Weve veried in 2(c) that E(C Fi xi ) is of full column rank.

6. This question presumes that fi1 . . xi = . and fim = Am xi . fiM bi (a) The m-th row of Fi is fim and fim = xi Am . (b) The rank condition (5.1.15) is that E(Fi xi ) be of full column rank (where Fi QFi ). By the hint, E(Fi xi ) = [IM E(xi xi )](Q IK )A. Since E(xi xi ) is non-singular, IM E(xi xi ) is non-singular. Multiplication by a non-singular matrix does not alter rank. 7. The hint is the answer.

September 10, 2004

Hayashi Econometrics

Solution to Chapter 6 Analytical Exercises


1. The hint is the answer. 2. (a) Let n
n j =0 2 j . Then m 2

E[(yt,m yt,n )2 ] = E
j =n+1 m

j tj
2 j j =n+1

= 2

(since {t } is white noise)

= 2 |m n |. Since {j } is absolutely summable (and hence square summable), {n } converges. So |m n | as m, n . Therefore, E[(yt,m yt,n )2 ] 0 as m, n , which means {yt,n } converges in mean square in n by (i). (b) Since yt,n m.s. yt as shown in (a), E(yt ) = lim E(yt,n ) by (ii). But E(yt,n ) = 0.
n

(c) Since yt,n m.s. yt and ytj,n m.s. ytj as n , E[(yt )(ytj )] = lim E[(yt,n )(ytj ,n )].
n

(d) (reproducing the answer on pp. 441-442 of the book) Since {j } is absolutely summable, j 0 as j . So for any j , there exists an A > 0 such that |j +k | A for all j, k . So |j +k k | A|k |. Since {k } (and hence {Ak }) is absolutely summable, so is {j +k k } (k = 0, 1, 2, . . .) for any given j . Thus by (i),

|j | =

2 k=0

j +k k

2 k=0

|j +k k | =

2 k=0

|j +k | |k | < .

Now set ajk in (ii) to |j +k | |k |. Then


|ajk | =
j =0 j =0

|k | |j +k | |k |
j =0

|j | < .

Let M

|j | and sk |k |
j =0 j =0

|j +k |.

Then {sk } is summable because |sk | |k | M and {k } is absolutely summable. Therefore, by (ii),

|j +k | |k | < .
j =0 k=0

This and the rst inequality above mean that {j } is absolutely summable. 1

3. (a) j = Cov(yt,n , ytj,n ) = Cov(h0 xt + h1 xt1 + + hn xtn , h0 xtj + h1 xtj 1 + + hn xtj n )


n n

=
k=0 =0 n n

hk h Cov(xtk , xtj )
x hk h j + k=0 =0 k .

(b) Since {hj } is absolutely summable, we have yt,n m.s. yt as n by Proposition 6.2(a). Then, using the facts (i) and (ii) displayed in Analytical Exercise 2, we can show:
n n x hk h j + k=0 =0 k

= Cov(yt,n , ytj,n )

= E(yt,n ytj,n ) E(yt,n ) E(ytj,n ) E(yt ytj ) E(yt ) E(ytj ) = Cov(yt , ytj ) as n . That is, result.
n k=0 n =0 x hk h j + k

converges as n , which is the desired

4. (a) (8) solves the dierence equation yj 1 yj 1 2 yj 2 = 0 because yj 1 yj 1 2 yj 2


j j j +1 j +1 j +2 j +2 = (c10 + c20 2 ) 2 (c10 + c20 ) 1 + c20 2 ) 1 (c10 1 1 2 j j 2 2 = c10 1 (1 1 1 2 1 ) + c20 1 (1 1 2 2 2 )

=0

(since 1 and 2 are the roots of 1 1 z 2 z 2 = 0).

Writing down (8) for j = 0, 1 gives


1 1 y0 = c10 + c20 , y1 = c10 1 + c20 2 .

Solve this for (c10 , c20 ) given (y0 , y1 , 1 , 2 ). (b) This should be easy. (c) For j J , we have j n j < bj . Dene B as B max Then, by construction, jnj or j n j B bj bj for j = 0, 1, .., J 1. Choose A so that A > 1 and A > B . Then j n j < bj < A bj for j J and j n j B bj < A bj for all j = 0, 1, . . . , J 1. B (d) The hint is the answer. 5. (a) Multiply both sides of (6.2.1 ) by ytj and take the expectation of both sides to derive the desired result. (b) The result follows immediately from the MA representation ytj = tj + tj 1 + 2 tj 2 + . 2 (J 1)n J 1 2n j 3n 3 , 2 , 3 ,..., b b b bJ 1 .

(c) Immediate from (a) and (b). (d) Set j = 1 in (10) to obtain 1 0 = 0. Combine this with (9) to solve for (0 , 1 ): 0 = 2 2 , 1 = . 2 1 1 2

Then use (10) as the rst-order dierence equation for j = 2, 3, . . . in j with the initial 2 2 j condition 1 = 1 2 . This gives: j = 12 , verifying (6.2.5). 6. (a) Should be obvious. (b) By the denition of mean-square convergence, what needs to be shown is that E[(xt xt,n )2 ] 0 as n . E[(xt xt,n )2 ] = E[(n xtn )2 ] = 2n E(x2 tn ) 0 (c) Should be obvious. 7. (d) By the hint, what needs to be shown is that (F)n tn m.s. 0. Let zn (F)n tn . Contrary to the suggestion of the hint, which is to show the mean-square convergence of the components of zn , here we show an equivalent claim (see Review Question 2 to Section 2.1) that lim E(zn zn ) = 0.
n

(since xt = xt,n + n xtn )

(since || < 1 and E(x2 tn ) < ).

zn zn = trace(zn zn ) = trace[ tn [(F)n ] [(F)n ] tn ] = trace{ tn tn [(F)n ] [(F)n ]} Since the trace and the expectations operator can be interchanged, E(zn zn ) = trace{E( tn tn )[(F)n ] [(F)n ]}. Since t is covariance-stationary, we have E( tn tn ) = V (the autocovariance matrix). Since all the roots of the characteristic equation are less than one in absolute value, Fn = T()n T1 converges to a zero matrix. We can therefore conclude that E(zn zn ) 0. (e) n is the (1,1) element of T()n T1 . 8. (a) 1 t c c + t E(y0 ) , 1 1 1 2t 2 2 Var(yt ) = + 2t Var(y0 ) , 2 1 1 2 1 2(tj ) 2 2 Cov(yt , ytj ) = j + 2(tj ) Var(y0 ) j . 2 1 1 2 E(yt ) = (b) This should be easy to verify given the above formulas. 9. (a) The hint is the answer.
2 (b) Since j 0, the result proved in (a) implies that n j =1 |j | 0. Also, 0 /n 0. So by the inequality for Var(y ) shown in the question, Var(y ) 0. n

10. (a) By the hint,


n N n n n

j aj
j =1 j =1 k=j

ak +
j =N +1 k=j

ak < N M + (n N ) . 2

So 1 n

j aj <
j =1

NM nN NM + < + . n n 2 n 2

By taking n large enough, N M/n can be made less than /2. (b) From (6.5.2),
n1

Var( n y ) = 0 + 2
j =1

n1 n1 2 j j = 0 + 2 j j j . 1 n n j =1 j =1

The term in brackets converges to j = j if {j } is summable. (a) has shown that the last term converges to zero if {j } is summable.

September 14, 2004

Hayashi Econometrics

Solution to Chapter 7 Analytical Exercises


1. (a) Since a(w) = 1 f (y |x; ) = f (y |x; 0 ), we have Prob[a(w) = 1] = Prob[f (y |x; ) = f (y |x; 0 )]. But Prob[f (y |x; ) = f (y |x; 0 )] > 0 by hypothesis. (b) Set c(x) = log(x) in Jensens inequality. a(w) is non-constant by (a). (c) By the hint, E[a(w)|x] = 1. By the Law of Total Expectation, E[a(w)] = 1. (d) By combining (b) and (c), E[log(a(w))] < log(1) = 0. But log(a(w)) = log f (y |x; ) log f (y |x; 0 ). 2. (a) (The answer on p. 505 is reproduced here.) Since f (y | x; ) is a hypothetical density, its integral is unity: f (y | x; )dy = 1. (1)

This is an identity, valid for any . Dierentiating both sides of this identity with respect to , we obtain f (y | x; )dy = 0 . (2) (p1) If the order of dierentiation and integration can be interchanged, then f (y | x; )dy = f (y | x; )dy.
f (y

(3)

But by the denition of the score, s(w; )f (y | x; ) = into (3), we obtain s(w; )f (y | x; )dy =

| x; ). Substituting this (4)

(p1)

0 .

This holds for any , in particular, for 0 . Setting = 0 , we obtain s(w; 0 )f (y | x; 0 )dy = E[s(w; 0 ) | x] = 0 . (5)

(p1)

Then, by the Law of Total Expectations, we obtain the desired result. (b) By the hint, H(w; )f (y |x; )dy + s(w; ) s(w; ) f (y | x; )dy = 0 .

(pp)

The desired result follows by setting = 0 . 3. (a) For the linear regression model with ( , 2 ) , the objective function is the average log likelihood: n 1 1 1 1 2 Qn ( ) = log(2 ) log( ) 2 (yt xt )2 . 2 2 2 n t=1

To obtain the concentrated average log likelihood, take the partial derivative with respect to 2 and set it equal to 0, which yields 2 = 1 n
n

(yt xt )2
t=1

1 SSR( ). n

Substituting this into the average log likelihood, we obtain the concentrated average log likelihood (concentrated with respect to 2 ): Qn ( , 1 1 1 1 1 SSR( )) = log(2 ) log( SSR( )). n 2 2 2 n

The unconstrained ML estimator ( , 2 ) of 0 is obtained by maximizing this concentrated 1 SSR( ). average log likelihood with respect to , which yields , and then setting 2 = n The constrained ML estimator, ( , 2 ), is obtained from doing the same subject to the constraint R = c. But, as clear from the expression for the concentrated average log likelihood shown above, maximizing the concentrated average log likelihood is equivalent to minimizing the sum of squared residuals SSR( ). (b) Just substitute 2 = likelihood above.
1 n SSR( )

and 2 =

1 n SSR( )

into the concentrated average log

2 (c) As explained in the hint, both 2 and 2 are consistent for 0 . Reproducing (part of) (7.3.18) of Example 7.10, 1 0 2 E(xt xt ) 0 . (7.3.18) E H(wt ; 0 ) = 1 0 4 2
0

Clearly, both and are consistent for E H(wt ; 0 ) because both 2 and 2 are n 1 2 consistent for 0 and n t=1 xt xt is consistent for E(xt xt ). (d) The a( ) and A( ) in Table 7.2 for the present case are a( ) = R c, Also, observe that
1 1 Qn ( ) 1 t=1 xt (yt xt ) 2 n X (y X ) = = n 1 1 1 2 0 SSR R 22 + 24 n t=1 (yt xt ) n

A( ) =

(r K )

. . .

(r 1)

and 1 1 1 1 1 1 1 1 Qn ( ) = log(2 ) log( SSRU ), Qn ( ) = log(2 ) log( SSRR ). 2 2 2 n 2 2 2 n Substitute these expressions and the expression for and given in the question into the Table 7.2 formulas, and just do the matrix algebra. (e) The hint is the answer.
SSRR 1 (f) Let x SSR . Then x 1 and W/n = x 1, LR/n = log(x), and LM/n = 1 x . Draw U the graph of these three functions of x with x in the horizontal axis. Observe that their values at x = 1 are all 0 and the slopes at x = 1 are all one. Also observe that for x > 1, 1 x 1 > log(x) > 1 x .

September 22, 2004

Hayashi Econometrics

Solution to Chapter 8 Analytical Exercises


1. From the hint,
n n n

(yt xt )(yt xt ) =
t=1 t=1 n

vt vt + ( )
t=1 n

xt xt ( ).

But ( )

xt xt ( ) =
t=1 t=1

) xt xt ( )

is positive semi-denite. 2. Since yt = 0 xt + vt , we have yt xt = vt + (0 ) xt . So E[(yt xt )(yt xt ) ] = E[(vt + (0 ) xt )(vt + (0 ) xt ) ] = E(vt vt ) + E[vt xt (0 )] + E[(0 ) xt vt ] + (0 ) E(xt xt )(0 ) = E(vt vt ) + (0 ) E(xt xt )(0 ) So () 0 + (0 ) E(xt xt )(0 ) almost surely. By the matrix algebra result cited in the previous question, |0 + (0 ) E(xt xt )(0 )| |0 | > 0. So for suciently large n, () is positive denite. . . xt Cm from left by xt to obtain 3. (a) Multiply both sides of ztm = yt Sm . . xt ztm = xt yt Sm . . xt xt Cm . (since E(xt vt ) = 0).

()

Do the same to the reduced form yt = xt 0 + vt to obtain xt yt = xt xt 0 + xt vt . Substitute this into () to obtain . . . . xt ztm = xt xt 0 Sm . . xt xt Cm + xt vt . . 0 = xt xt 0 Sm . . Cm + xt vt . .0 . Take the expected value of both sides and use the fact that E(xt vt ) = 0 to obtain the desired result. (b) Use the reduced form yt = 0 xt + vt to derive yt + 1 Bxt = vt + (0 + 1 B)xt as in the hint. So (yt + 1 Bxt )(yt + 1 Bxt ) = [vt + (0 + 1 B)xt ][vt + (0 + 1 B)xt ] = vt vt + (0 + 1 B)xt vt + vt xt (0 + 1 B) + (0 + 1 B)xt xt (0 + 1 B) . 1

Taking the expected value and noting that E(xt vt ) = 0, we obtain E[(yt + 1 Bxt )(yt + 1 Bxt ) ] = E(vt vt ) + (0 + 1 B) E(xt xt )(0 + 1 B) . Since {yt , xt } is i.i.d., the probability limit of ( ) is given by this expectation. In this 1 1 1 expression, E(vt vt ) equals 0 0 (0 ) because by denition vt 0 t and 0 E(t t ). (c) What needs to be proved is that plim |( )| is minimized only if 0 + B = 0. Let A 1 1 1 B) E(xt xt )(0 + 0 0 (0 ) be the rst term on the RHS of (7) and let D (0 + 1 1 B) be the second term. Since 0 is positive denite and 0 is non-singular, A is positive denite. Since E(xt xt ) is positive denite, D is positive semi-denite. Then use the following the matrix inequality (which is slightly dierent from the one mentioned in Analytical Exercise 1 on p. 552): (Theorem 22 on p. 21 of Matrix Dierential Calculus with Applications in Statistics and Econometrics by Jan R. Magnus and Heinz Neudecker, Wiley, 1988) Let A be positive denite and B positive semi-denite. Then |A + B| |A| with equality if and only if B = 0. Hence
1 1 plim |( )| = |A + D| |A| = | 0 0 (0 ) |.

with equality |A + D| = |A| only if D = 0. Since E(xt xt ) is positive denite, D (0 + 1 B) E(xt xt )(0 + 1 B) is a zero matrix only if 0 + 1 B = 0, which holds if and only if 0 + B = 0 since is non-singular (the parameter space is such that is non-singular). (d) For m = 1, the LHS of (8) is m = The RHS is em m Sm
(d) For m = 1, the LHS of (8) is \delta_1, the coefficient vector of the first structural equation. The RHS is the product of e_1', [\Gamma_0 \vdots B_0], and the selection matrices S_1 and C_1, the blocks having the dimensions indicated in the question (M_1 x M, M_1 x K, K_1 x M, and K_1 x K, with the product 1 x (M_1 + K_1)). Writing out these matrices for the example system and carrying out the matrix multiplication, the RHS reduces to \delta_1, which verifies (8) for m = 1.
(e) Since \delta_m determines the m-th row of [ \Gamma \vdots B ] through (8), the m-th row of the LHS of (9) equals

    e_m' [ \Gamma \vdots B ] [ \Pi_0' ; I_K ]
      = ( e_m' [ I_M \vdots 0 ] - \delta_m' [ S_m'  0 ; 0  C_m' ] ) [ \Pi_0' ; I_K ]    (by (8))
      = e_m' \Pi_0' - \delta_m' [ S_m'\Pi_0' ; C_m' ]
      = \pi_{0m}' - \delta_m' [ \Pi_0 S_m \vdots C_m ]'    (by the definition of \pi_{0m}),

where ";" denotes vertical stacking of blocks.

(f) By definition (see (8.5.10)), \Gamma_0\Pi_0' + B_0 = 0. The same argument given in (e), with \delta_m replaced by \delta_{0m}, then shows that \delta_{0m} is a solution to (10). Rewrite (10) by taking the transpose:

    A x = y,   with A \equiv [ \Pi_0 S_m \vdots C_m ], x \equiv \delta_m, y \equiv \pi_{0m}.    (10')

A necessary and sufficient condition that \delta_{0m} is the only solution to (10') is that the coefficient matrix in (10'), which is K x L_m (where L_m \equiv M_m + K_m), be of full column rank (that is, the rank of the matrix equals the number of columns, which is L_m). We have shown in (a) that this condition is equivalent to the rank condition for identification of the m-th equation.

(g) The hint is the answer.
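The full-column-rank condition on [\Pi_0 S_m \vdots C_m] is straightforward to check numerically for any candidate system. The Python sketch below uses a small made-up \Pi_0, S_m, and C_m (all hypothetical, chosen only to illustrate the check) and verifies whether the K x L_m matrix has rank L_m:

    import numpy as np

    # Hypothetical example: K = 4 predetermined variables, M = 3 endogenous variables;
    # equation m includes M_m = 1 endogenous regressor and K_m = 2 exogenous regressors.
    Pi0 = np.array([[0.5, 0.2, 0.0],
                    [0.1, 0.7, 0.3],
                    [0.0, 0.4, 0.6],
                    [0.9, 0.0, 0.2]])        # K x M reduced-form coefficients (made up)
    S_m = np.array([[0.0], [1.0], [0.0]])    # M x M_m selection matrix: picks y_{t2}
    C_m = np.eye(4)[:, :2]                   # K x K_m selection matrix: picks x_{t1}, x_{t2}

    F_m = np.hstack([Pi0 @ S_m, C_m])        # K x L_m, with L_m = M_m + K_m
    L_m = F_m.shape[1]
    print(np.linalg.matrix_rank(F_m) == L_m) # True: rank condition satisfied for this Pi0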
4. In this part, we let F_m stand for the K x L_m matrix [ \Pi_0 S_m \vdots C_m ]. Since x_{tK} does not appear in the system, the last row of \Pi_0 is a vector of zeros and the last row of C_m is a vector of zeros. So the last row of F_m is a vector of zeros:

    F_m = [ \tilde{F}_m ; 0' ],   with \tilde{F}_m of dimension (K-1) x L_m and 0' of dimension 1 x L_m.

Dropping x_{tK} from the list of instruments means dropping the last row of F_m, which does not alter the full column rank condition. The asymptotic variance of the FIML estimator is given in (4.5.15) with (4.5.16) on p. 278. Using (6) on (4.5.16), we obtain

    A_{mh} = F_m' E(x_t x_t') F_h
           = [ \tilde{F}_m' \vdots 0 ] [ E(\tilde{x}_t\tilde{x}_t')  E(\tilde{x}_t x_{tK}) ; E(x_{tK}\tilde{x}_t')  E(x_{tK}^2) ] [ \tilde{F}_h ; 0' ]
           = \tilde{F}_m' E(\tilde{x}_t\tilde{x}_t') \tilde{F}_h,

where \tilde{x}_t is x_t with its last element x_{tK} removed. This shows that the asymptotic variance is unchanged when x_{tK} is dropped.
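A quick numerical check of this invariance (an illustration with made-up matrices, not a derivation): when the last rows of F_m and F_h are zero, the quadratic form F_m' E(x_t x_t') F_h does not involve the last row and column of E(x_t x_t').

    import numpy as np

    rng = np.random.default_rng(1)

    K, L = 4, 3
    Exx = rng.standard_normal((K, K))
    Exx = Exx @ Exx.T + K * np.eye(K)       # stand-in for E(x_t x_t'), positive definite

    Fm = np.vstack([rng.standard_normal((K - 1, L)), np.zeros((1, L))])  # last row zero
    Fh = np.vstack([rng.standard_normal((K - 1, L)), np.zeros((1, L))])

    A_full    = Fm.T @ Exx @ Fh                          # uses all of E(x_t x_t')
    A_dropped = Fm[:-1].T @ Exx[:-1, :-1] @ Fh[:-1]      # drops x_{tK} everywhere

    print(np.allclose(A_full, A_dropped))                # True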

September 16, 2004

Hayashi Econometrics

Solution to Chapter 9 Analytical Exercises


1. From the hint, we have

    (1/T)\sum_{t=1}^T \Delta\xi_t\, \xi_{t-1}
      = (1/2)(\xi_T/\sqrt{T})^2 - (1/2)(\xi_0/\sqrt{T})^2 - (1/(2T))\sum_{t=1}^T (\Delta\xi_t)^2.    (*)

Consider the second term on the RHS of (*). Since E(\xi_0/\sqrt{T}) \to 0 and Var(\xi_0/\sqrt{T}) \to 0, \xi_0/\sqrt{T} converges in mean square (by Chebychev's LLN), and hence in probability, to 0. So the second term vanishes (converges in probability to zero); this can actually be shown directly from the definition of convergence in probability. Next, consider the expression \xi_T/\sqrt{T} in the first term on the RHS of (*). It can be written as

    \xi_T/\sqrt{T} = (1/\sqrt{T})(\xi_0 + \Delta\xi_1 + ... + \Delta\xi_T)
                   = \xi_0/\sqrt{T} + (1/\sqrt{T})\sum_{t=1}^T \Delta\xi_t.

As just seen, \xi_0/\sqrt{T} vanishes. Since \Delta\xi_t is I(0) satisfying (9.2.1)-(9.2.3), the hypothesis of Proposition 6.9 is satisfied (in particular, the absolute summability in the hypothesis of the Proposition is satisfied because it is implied by the one-summability (9.2.3a)). So

    (1/\sqrt{T})\sum_{t=1}^T \Delta\xi_t \to_d \lambda X,   X ~ N(0, 1),

where \lambda^2 is the long-run variance of \Delta\xi_t. Regarding the third term on the RHS of (*), since \Delta\xi_t is ergodic stationary, (1/(2T))\sum_{t=1}^T (\Delta\xi_t)^2 converges in probability to (1/2)\gamma_0. Finally, by Lemma 2.4(a) we conclude that the RHS of (*) converges in distribution to (1/2)\lambda^2 X^2 - (1/2)\gamma_0.
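The limit (1/2)(\lambda^2 X^2 - \gamma_0) can be illustrated by simulation. The Python sketch below is only an illustration; the i.i.d. N(0,1) increments are an assumption of the sketch that makes \lambda^2 = \gamma_0 = 1, so the limit is (1/2)(X^2 - 1) with mean 0 and variance 1/2.

    import numpy as np

    rng = np.random.default_rng(0)
    T, n_rep = 2000, 5000

    eps = rng.standard_normal((n_rep, T))                      # Delta xi_t, i.i.d. N(0,1)
    xi = np.cumsum(eps, axis=1)                                # xi_t with xi_0 = 0
    xi_lag = np.hstack([np.zeros((n_rep, 1)), xi[:, :-1]])     # xi_{t-1}

    stat = (eps * xi_lag).sum(axis=1) / T                      # (1/T) sum Delta xi_t * xi_{t-1}

    # The limit (1/2)(X^2 - 1) has mean 0 and variance 1/2.
    print(stat.mean(), stat.var())                             # close to 0 and 0.5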
2. (a) The hint is the answer.

(b) From (a),

    T(\hat{\rho} - 1) = [ (1/T)\sum_{t=1}^T \Delta y_t\, y_{t-1} ] / [ (1/T^2)\sum_{t=1}^T (y_{t-1})^2 ].

Apply Proposition 9.2(d) to the numerator and Proposition 9.2(c) to the denominator.

(c) Since {y_t} is a random walk, \lambda^2 = \gamma_0. Just set \lambda^2 = \gamma_0 in (4) of the question.

(d) First, a proof that the sample mean of the residuals, \bar{e} \equiv (1/T)\sum_{t=1}^T (y_t - \hat{\rho} y_{t-1}), converges in probability to 0. By the algebra of OLS,

    \bar{e} = (1/T)\sum_{t=1}^T (y_t - \hat{\rho} y_{t-1})
            = (1/T)\sum_{t=1}^T (\Delta y_t - (\hat{\rho} - 1) y_{t-1})
            = (1/T)\sum_{t=1}^T \Delta y_t - (\hat{\rho} - 1)\,(1/T)\sum_{t=1}^T y_{t-1}
            = (1/T)\sum_{t=1}^T \Delta y_t - (1/\sqrt{T})\,[T(\hat{\rho} - 1)]\,(1/(T\sqrt{T}))\sum_{t=1}^T y_{t-1}.

The first term after the last equality, (1/T)\sum_{t=1}^T \Delta y_t, vanishes (converges to zero in probability) because \Delta y_t is ergodic stationary and E(\Delta y_t) = 0. To show that the second term after the last equality vanishes, we first note that (1/\sqrt{T})\,T(\hat{\rho} - 1) vanishes because T(\hat{\rho} - 1) converges to a random variable by (b). By (6) in the hint, (1/(T\sqrt{T}))\sum_{t=1}^T y_{t-1} converges to a random variable. Therefore, by Lemma 2.4(b), the whole second term vanishes.

Now turn to s^2. From the hint,

    s^2 = (1/(T-1))\sum_{t=1}^T (\Delta y_t - \bar{e})^2
          - 2\,[T(\hat{\rho} - 1)]\,(1/(T-1))\,(1/T)\sum_{t=1}^T (\Delta y_t - \bar{e})\, y_{t-1}
          + [T(\hat{\rho} - 1)]^2\,(1/(T-1))\,(1/T^2)\sum_{t=1}^T (y_{t-1})^2.    (**)

Since \bar{e} \to_p 0, it should be easy to show that the first term on the RHS of (**) converges to \gamma_0 in probability. Regarding the second term, rewrite it as

    -2\,[T(\hat{\rho} - 1)]\,(1/(T-1))\,(1/T)\sum_{t=1}^T \Delta y_t\, y_{t-1}
    + 2\,[T(\hat{\rho} - 1)]\,(1/(T-1))\,\sqrt{T}\,\bar{e}\,(1/(T\sqrt{T}))\sum_{t=1}^T y_{t-1}.    (***)

By Proposition 9.2(b), (1/T)\sum_{t=1}^T \Delta y_t\, y_{t-1} converges to a random variable. So does T(\hat{\rho} - 1). Hence the first term of (***) vanishes. Turning to the second term of (***), (6) in the question means that (1/(T\sqrt{T}))\sum_{t=1}^T y_{t-1} converges to a random variable. It should now be routine to show that the whole second term of (***) vanishes. A similar argument, this time utilizing Proposition 9.2(a), shows that the third term of (**) vanishes.

(e) By (7) in the hint and (3), a little algebra yields

    t = (\hat{\rho} - 1) / [ s / \sqrt{\sum_{t=1}^T (y_{t-1})^2} ]
      = [ (1/T)\sum_{t=1}^T \Delta y_t\, y_{t-1} ] / [ s \sqrt{(1/T^2)\sum_{t=1}^T (y_{t-1})^2} ].

Use Proposition 9.2(c) and (d) with \lambda^2 = \gamma_0 = \sigma^2 and the fact that s is consistent for \sigma to complete the proof.
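For intuition, the ratio in (b), and hence the distribution of T(\hat{\rho} - 1) in the random-walk case, can be simulated directly. The Python sketch below is purely illustrative; the driftless random walk with i.i.d. N(0,1) increments is an assumed design.

    import numpy as np

    rng = np.random.default_rng(0)
    T, n_rep = 1000, 5000

    eps = rng.standard_normal((n_rep, T))                      # Delta y_t = eps_t
    y = np.cumsum(eps, axis=1)                                 # y_t with y_0 = 0
    y_lag = np.hstack([np.zeros((n_rep, 1)), y[:, :-1]])       # y_{t-1}

    num = (eps * y_lag).sum(axis=1) / T                        # (1/T) sum Delta y_t * y_{t-1}
    den = (y_lag ** 2).sum(axis=1) / T ** 2                    # (1/T^2) sum y_{t-1}^2
    T_rho = num / den                                          # T(rho_hat - 1), no intercept

    # The distribution is skewed to the left; the tabulated 5% quantile is roughly -8.
    print(np.quantile(T_rho, 0.05))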
3. (a) The hint is the answer.

(b) From (a), we have

    T(\hat{\rho}^{\mu} - 1) = [ (1/T)\sum_{t=1}^T \Delta y_t\, y_{t-1}^{\mu} ] / [ (1/T^2)\sum_{t=1}^T (y_{t-1}^{\mu})^2 ].

Let \xi_t and \xi_{t-1}^{\mu} be as defined in the hint. Then y_t = \alpha + \xi_t and \Delta y_t = \Delta\xi_t. By construction, \sum_{t=1}^T \xi_{t-1}^{\mu} = 0. So

    T(\hat{\rho}^{\mu} - 1) = [ (1/T)\sum_{t=1}^T \Delta\xi_t\, \xi_{t-1}^{\mu} ] / [ (1/T^2)\sum_{t=1}^T (\xi_{t-1}^{\mu})^2 ].

Since {\xi_t} is driftless I(1), Proposition 9.2(e) and (f) can be used here.

(c) Just observe that \lambda^2 = \gamma_0 if {y_t} is a random walk with or without drift.
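The corresponding statistic for the regression with an intercept can be simulated the same way. Again this is only an illustrative sketch; the level alpha = 5 and the i.i.d. N(0,1) increments are assumptions of the sketch, not of the exercise.

    import numpy as np

    rng = np.random.default_rng(0)
    T, n_rep = 1000, 5000

    eps = rng.standard_normal((n_rep, T))
    y = 5.0 + np.cumsum(eps, axis=1)                           # random walk around the level 5
    y_lag = np.hstack([5.0 * np.ones((n_rep, 1)), y[:, :-1]])  # y_{t-1} with y_0 = 5
    dy = y - y_lag                                             # Delta y_t

    y_mu = y_lag - y_lag.mean(axis=1, keepdims=True)           # demeaned y_{t-1}
    T_rho_mu = (dy * y_mu).sum(axis=1) / (y_mu ** 2).sum(axis=1) * T

    # Empirical 5% quantile of T(rho_mu_hat - 1); the tabulated value is roughly -14.
    print(np.quantile(T_rho_mu, 0.05))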

4. From the hint,

    (1/T)\sum_{t=1}^T y_{t-1}\varepsilon_t
      = \psi(1)\,(1/T)\sum_{t=1}^T w_{t-1}\varepsilon_t
        + (1/T)\sum_{t=1}^T \eta_{t-1}\varepsilon_t
        + (y_0 - \eta_0)\,(1/T)\sum_{t=1}^T \varepsilon_t.    (*)

Consider first the second term on the RHS of (*). Since \eta_{t-1}, which is a function of (\varepsilon_{t-1}, \varepsilon_{t-2}, ...), is independent of \varepsilon_t, we have E(\eta_{t-1}\varepsilon_t) = E(\eta_{t-1}) E(\varepsilon_t) = 0. Then by the ergodic theorem this second term vanishes. Regarding the third term of (*), (1/T)\sum_{t=1}^T \varepsilon_t \to_p 0. So the whole third term vanishes. Lastly, consider the first term on the RHS of (*). Since {w_t} is a random walk and \Delta w_t = \varepsilon_t, Proposition 9.2(b) with \lambda^2 = \gamma_0 = \sigma^2 implies

    (1/T)\sum_{t=1}^T w_{t-1}\varepsilon_t \to_d (\sigma^2/2)\,[W(1)^2 - 1].
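The limit of this leading term can again be checked by simulation. The sketch below is only an illustration; the i.i.d. N(0, sigma^2) innovations and the particular sigma are assumptions of the sketch. The limit (sigma^2/2)(W(1)^2 - 1) is (sigma^2/2)(chi2_1 - 1), with mean 0 and variance sigma^4/2.

    import numpy as np

    rng = np.random.default_rng(0)
    T, n_rep, sigma = 2000, 5000, 1.5

    eps = sigma * rng.standard_normal((n_rep, T))
    w = np.cumsum(eps, axis=1)                                 # random walk, Delta w_t = eps_t
    w_lag = np.hstack([np.zeros((n_rep, 1)), w[:, :-1]])       # w_{t-1}

    stat = (w_lag * eps).sum(axis=1) / T                       # (1/T) sum w_{t-1} eps_t

    print(stat.mean(), stat.var(), sigma ** 4 / 2)             # mean near 0, variance near sigma^4/2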
5. Comparing Propositions 9.6 and 9.7, the null is the same (that {\Delta y_t} is zero-mean stationary AR(p), \phi(L)\Delta y_t = \varepsilon_t, whose MA representation is \Delta y_t = \psi(L)\varepsilon_t with \psi(L) \equiv \phi(L)^{-1}), but the augmented autoregression in Proposition 9.7 has an intercept. The proof of Proposition 9.7 (for p = 1) makes appropriate changes on the argument developed on pp. 587-590. Let b and \beta be as defined in the hint. The A_T and c_T for the present case are

    A_T = [ (1/T^2)\sum_{t=1}^T (y_{t-1}^{\mu})^2                           (1/(T\sqrt{T}))\sum_{t=1}^T (\Delta y_{t-1})^{\mu}\, y_{t-1} ]
          [ (1/(T\sqrt{T}))\sum_{t=1}^T (\Delta y_{t-1})^{\mu}\, y_{t-1}    (1/T)\sum_{t=1}^T [(\Delta y_{t-1})^{\mu}]^2                  ],

    c_T = [ (1/T)\sum_{t=1}^T y_{t-1}^{\mu}\,\varepsilon_t ]
          [ (1/\sqrt{T})\sum_{t=1}^T (\Delta y_{t-1})^{\mu}\,\varepsilon_t ],

where (\Delta y_{t-1})^{\mu} is the residual from the regression of \Delta y_{t-1} on a constant for t = 1, 2, ..., T (that is, \Delta y_{t-1} less its sample mean), and y_{t-1}^{\mu} is defined analogously.

(1,1) element of A_T: Since {y_t} is driftless I(1) under the null, Proposition 9.2(c) can be used to claim that

    (1/T^2)\sum_{t=1}^T (y_{t-1}^{\mu})^2 \to_d \lambda^2 \int_0^1 [W^{\mu}(r)]^2\, dr,

where \lambda^2 = \sigma^2[\psi(1)]^2 with \sigma^2 \equiv Var(\varepsilon_t).
(2,2) element of A_T: Since (\Delta y_{t-1})^{\mu} = \Delta y_{t-1} - (1/T)\sum_{t=1}^T \Delta y_{t-1}, this element can be written as

    (1/T)\sum_{t=1}^T [(\Delta y_{t-1})^{\mu}]^2
      = (1/T)\sum_{t=1}^T (\Delta y_{t-1})^2 - [ (1/T)\sum_{t=1}^T \Delta y_{t-1} ]^2.

Since E(\Delta y_{t-1}) = 0 and E[(\Delta y_{t-1})^2] = \gamma_0 (the variance of \Delta y_t), this expression converges in probability to \gamma_0.

Off-diagonal elements of A_T: each equals

    (1/(T\sqrt{T}))\sum_{t=1}^T (\Delta y_{t-1})^{\mu}\, y_{t-1}
      = (1/\sqrt{T}) [ (1/T)\sum_{t=1}^T \Delta y_{t-1}\, y_{t-1} ]
        - [ (1/T)\sum_{t=1}^T \Delta y_{t-1} ] (1/(T\sqrt{T}))\sum_{t=1}^T y_{t-1}.

The term in the square brackets is (9.4.14), which is shown to converge to a random variable (Review Question 3 of Section 9.4). The next term, (1/(T\sqrt{T}))\sum_{t=1}^T y_{t-1}, converges to a random variable by (6) assumed in Analytical Exercise 2(d). The last term, (1/T)\sum_{t=1}^T \Delta y_{t-1}, converges to zero in probability. Therefore, the off-diagonal elements vanish.

Taken together, we have shown that A_T is asymptotically diagonal:

    A_T \to_d [ \lambda^2 \int_0^1 [W^{\mu}(r)]^2\, dr    0        ]
              [ 0                                         \gamma_0 ],

so

    (A_T)^{-1} \to_d [ ( \lambda^2 \int_0^1 [W^{\mu}(r)]^2\, dr )^{-1}    0              ]
                     [ 0                                                  \gamma_0^{-1}  ].
Now turn to c_T.

1st element of c_T: Recall that y_{t-1}^{\mu} \equiv y_{t-1} - (1/T)\sum_{t=1}^T y_{t-1}. Combine this with the BN decomposition y_{t-1} = \psi(1)w_{t-1} + \eta_{t-1} + (y_0 - \eta_0) with w_{t-1} \equiv \varepsilon_1 + ... + \varepsilon_{t-1} to obtain

    (1/T)\sum_{t=1}^T y_{t-1}^{\mu}\,\varepsilon_t
      = \psi(1)\,(1/T)\sum_{t=1}^T w_{t-1}^{\mu}\,\varepsilon_t + (1/T)\sum_{t=1}^T \eta_{t-1}^{\mu}\,\varepsilon_t,

where w_{t-1}^{\mu} \equiv w_{t-1} - (1/T)\sum_{t=1}^T w_{t-1} and \eta_{t-1}^{\mu} is defined similarly. Since \eta_{t-1} is independent of \varepsilon_t, the second term on the RHS vanishes. Noting that \Delta w_t = \varepsilon_t and applying Proposition 9.2(d) to the random walk {w_t}, we obtain

    (1/T)\sum_{t=1}^T w_{t-1}^{\mu}\,\varepsilon_t \to_d (\sigma^2/2)\bigl( [W^{\mu}(1)]^2 - [W^{\mu}(0)]^2 - 1 \bigr).

Therefore, the 1st element of c_T converges in distribution to

    c_1 \equiv \sigma^2\psi(1)\,(1/2)\bigl( [W^{\mu}(1)]^2 - [W^{\mu}(0)]^2 - 1 \bigr).

2nd element of c_T: Using the definition (\Delta y_{t-1})^{\mu} \equiv \Delta y_{t-1} - (1/T)\sum_{t=1}^T \Delta y_{t-1}, it should be easy to show that it converges in distribution to

    c_2 ~ N(0, \gamma_0 \sigma^2).

Using the results derived so far, the modification to be made on (9.4.20) and (9.4.21) on p. 590 for the present case, where the augmented autoregression has an intercept, is

    T(\hat{\rho}^{\mu} - 1) \to_d \frac{\sigma^2\psi(1)}{\lambda^2}\,
        \frac{(1/2)\bigl( [W^{\mu}(1)]^2 - [W^{\mu}(0)]^2 - 1 \bigr)}{\int_0^1 [W^{\mu}(r)]^2\, dr}
    \quad\text{or}\quad
    \frac{\lambda^2}{\sigma^2\psi(1)}\, T(\hat{\rho}^{\mu} - 1) \to_d DF_{\rho}^{\mu},
    \qquad
    \sqrt{T}(\hat{\zeta}_1 - \zeta_1) \to_d N(0, \sigma^2/\gamma_0).

Repeating exactly the same argument that is given in the subsection entitled Deriving Test Statistics on p. 590, we can claim that \lambda^2/(\sigma^2\psi(1)) is consistently estimated by 1/(1 - \hat{\zeta}_1). This completes the proof of claim (9.4.34) of Proposition 9.7.

6. (a) The hint is the answer.

(b) The proof should be straightforward.

7. The one-line proof displayed in the hint is (with i replaced by k to avoid confusion)

    \sum_{j=0}^{\infty} |\alpha_j|
      = \sum_{j=0}^{\infty} \Bigl| \sum_{k=j+1}^{\infty} \psi_k \Bigr|
      \leq \sum_{j=0}^{\infty} \sum_{k=j+1}^{\infty} |\psi_k|
      = \sum_{k=0}^{\infty} k\,|\psi_k| < \infty,    (*)
where {\psi_k} (k = 0, 1, 2, ...) is one-summable as assumed in (9.2.3a). We now justify each of the equalities and inequalities. For this purpose, we reproduce here the facts from calculus shown on pp. 429-430:

(i) If {a_k} is absolutely summable, then {a_k} is summable (i.e., -\infty < \sum_{k=0}^{\infty} a_k < \infty) and

    \Bigl| \sum_{k=0}^{\infty} a_k \Bigr| \leq \sum_{k=0}^{\infty} |a_k|.

(ii) Consider a sequence with two subscripts, {a_{jk}} (j, k = 0, 1, 2, ...). Suppose \sum_{j=0}^{\infty} |a_{jk}| < \infty for each k and let s_k \equiv \sum_{j=0}^{\infty} |a_{jk}|. Suppose {s_k} is summable. Then

    \Bigl| \sum_{j=0}^{\infty} \sum_{k=0}^{\infty} a_{jk} \Bigr| < \infty
    \quad\text{and}\quad
    \sum_{j=0}^{\infty} \sum_{k=0}^{\infty} a_{jk} = \sum_{k=0}^{\infty} \sum_{j=0}^{\infty} a_{jk} < \infty.

Since {\psi_k} is one-summable, it is absolutely summable. Let

    a_k = \psi_k  if k \geq j + 1,   a_k = 0  otherwise.

Then {a_k} is absolutely summable because {\psi_k} is absolutely summable. So by (i) above, we have

    \Bigl| \sum_{k=j+1}^{\infty} \psi_k \Bigr|
      = \Bigl| \sum_{k=0}^{\infty} a_k \Bigr|
      \leq \sum_{k=0}^{\infty} |a_k|
      = \sum_{k=j+1}^{\infty} |\psi_k|.

Summing over j = 0, 1, 2, ..., n, we obtain

    \sum_{j=0}^{n} \Bigl| \sum_{k=j+1}^{\infty} \psi_k \Bigr| \leq \sum_{j=0}^{n} \sum_{k=j+1}^{\infty} |\psi_k|.

If the limit as n \to \infty of the RHS exists and is finite, then the limit of the LHS exists and is finite (this follows from the fact that if {x_n} is non-decreasing in n and if x_n \leq A < \infty for all n, then the limit of x_n exists and is finite; set x_n \equiv \sum_{j=0}^{n} |\sum_{k=j+1}^{\infty} \psi_k|). Thus, provided that \sum_{j=0}^{\infty} \sum_{k=j+1}^{\infty} |\psi_k| is well-defined, we have

    \sum_{j=0}^{\infty} \Bigl| \sum_{k=j+1}^{\infty} \psi_k \Bigr| \leq \sum_{j=0}^{\infty} \sum_{k=j+1}^{\infty} |\psi_k|.

We now show that \sum_{j=0}^{\infty} \sum_{k=j+1}^{\infty} |\psi_k| is well-defined. In (ii), set a_{jk} as

    a_{jk} = |\psi_k|  if k \geq j + 1,   a_{jk} = 0  otherwise.

Then \sum_{j=0}^{\infty} |a_{jk}| = k\,|\psi_k| < \infty for each k and s_k = k\,|\psi_k|. By the one-summability of {\psi_k}, {s_k} is summable. So the conditions in (ii) are satisfied for this choice of a_{jk}. We therefore conclude that

    \sum_{j=0}^{\infty} \sum_{k=j+1}^{\infty} |\psi_k|
      = \sum_{j=0}^{\infty} \sum_{k=0}^{\infty} a_{jk}
      = \sum_{k=0}^{\infty} \sum_{j=0}^{\infty} a_{jk}
      = \sum_{k=0}^{\infty} k\,|\psi_k| < \infty.

This completes the proof.
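As a concrete check of the inequality chain (*), one can compare \sum_j |\sum_{k>j} \psi_k| with \sum_k k|\psi_k| for a one-summable sequence. The geometric choice of {\psi_k} below is an assumption made purely for this illustrative sketch:

    import numpy as np

    # Hypothetical one-summable sequence: psi_k = (-0.8)**k, so sum_k k*|psi_k| is finite.
    K = 200                                   # truncation point for the numerical check
    k = np.arange(K)
    psi = (-0.8) ** k

    alpha = np.array([psi[j + 1:].sum() for j in range(K)])   # alpha_j = sum_{k > j} psi_k
    lhs = np.abs(alpha).sum()                                 # sum_j |alpha_j|
    rhs = (k * np.abs(psi)).sum()                             # sum_k k*|psi_k|

    print(lhs, "<=", rhs)                     # absolute summability of {alpha_j}, as in (*)
    assert lhs <= rhs + 1e-12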
