
APPLIED ECONOMETRICS

ASSIGNMENT 1
WONG SHAO YUN CHARIS
LILI LAURINA EISENRING


Question 1
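Consider a simple linear regression: $y_i = \beta_0 + \beta_1 x_i + u_i$, where $u_i$ is the error term. The sample regression function is $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$, where the respective estimators of the population parameters $\beta_0$ and $\beta_1$ are $\hat{\beta}_0$ and $\hat{\beta}_1$.
Assume the Gauss-Markov assumptions are satisfied. This implies that $\hat{\beta}_0$ and $\hat{\beta}_1$ are unbiased estimators, i.e. $E(\hat{\beta}_0) = \beta_0$ and $E(\hat{\beta}_1) = \beta_1$. This also implies that the error terms are homoskedastic, i.e.
$Var(u_i \mid x_i) = \sigma^2$   (P1)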
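Let $\hat{\sigma}^2 = \frac{1}{n-2}\sum_{i=1}^{n}\hat{u}_i^2$, where $\hat{u}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i$ are the OLS residuals, and let $SST_x = \sum_{i=1}^{n}(x_i - \bar{x})^2$.

To prove that $E(\hat{\sigma}^2) = \sigma^2$, the following equations are needed (all expectations are conditional on $x_1, \dots, x_n$):

E1: $y_i - \bar{y} = \beta_1(x_i - \bar{x}) + (u_i - \bar{u})$

E2: $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}$

E3: $\sum_{i=1}^{n}(x_i - \bar{x})(u_i - \bar{u}) = \sum_{i=1}^{n}(x_i - \bar{x})u_i$, since $\sum_{i=1}^{n}(x_i - \bar{x}) = 0$

E4: $Var(\hat{\beta}_1) = E[(\hat{\beta}_1 - \beta_1)^2] = \sigma^2 / SST_x$

E5: $\hat{\beta}_1 - \beta_1 = \sum_{i=1}^{n}(x_i - \bar{x})u_i / SST_x$

E6: $\sum_{i=1}^{n}(u_i - \bar{u})^2 = \sum_{i=1}^{n}u_i^2 - n\bar{u}^2$

E7: $E[\bar{u}^2] = Var(\bar{u}) = \sigma^2 / n$ (here we assume that the error terms are serially uncorrelated with each other)

$\sum_{i=1}^{n}\hat{u}_i^2 = \sum_{i=1}^{n}(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$
$= \sum_{i=1}^{n}[(y_i - \bar{y}) - \hat{\beta}_1(x_i - \bar{x})]^2$   (Substituting E2 inside)
$= \sum_{i=1}^{n}[(u_i - \bar{u}) - (\hat{\beta}_1 - \beta_1)(x_i - \bar{x})]^2$   (Substituting E1 inside)
$= \sum_{i=1}^{n}(u_i - \bar{u})^2 - 2(\hat{\beta}_1 - \beta_1)\sum_{i=1}^{n}(x_i - \bar{x})(u_i - \bar{u}) + (\hat{\beta}_1 - \beta_1)^2 SST_x$
$= \sum_{i=1}^{n}(u_i - \bar{u})^2 - 2(\hat{\beta}_1 - \beta_1)^2 SST_x + (\hat{\beta}_1 - \beta_1)^2 SST_x$   (Using E3 and E5)
$= \sum_{i=1}^{n}u_i^2 - n\bar{u}^2 - (\hat{\beta}_1 - \beta_1)^2 SST_x$   (Using E6)

$E\left(\sum_{i=1}^{n}\hat{u}_i^2\right) = \sum_{i=1}^{n}E(u_i^2) - n E(\bar{u}^2) - SST_x \cdot E[(\hat{\beta}_1 - \beta_1)^2]$
$= n\sigma^2 - n \cdot \frac{\sigma^2}{n} - SST_x \cdot \frac{\sigma^2}{SST_x} = (n-2)\sigma^2$   (Using P1, E4 and E7)

Therefore $E(\hat{\sigma}^2) = \frac{1}{n-2}E\left(\sum_{i=1}^{n}\hat{u}_i^2\right) = \sigma^2$, i.e. $E(\hat{\sigma}^2) = \sigma^2$.

Hence, $\hat{\sigma}^2$ is an unbiased estimator of $\sigma^2$.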
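As a numerical sanity check of this unbiasedness result, the sketch below assumes the estimator in question is $\hat{\sigma}^2 = SSR/(n-2)$ and simulates a simple regression many times with a fixed regressor and i.i.d. normal errors (so P1 and the no-serial-correlation assumption hold). The parameter values ($\beta_0 = 1$, $\beta_1 = 0.5$, $\sigma = 2$, $n = 10$) are arbitrary illustrative choices, not taken from the assignment. The average of $SSR/(n-2)$ should be close to $\sigma^2 = 4$, while dividing by $n$ instead is biased downward by the factor $(n-2)/n$.

```python
import random


def simulate_sigma2_hat(n=10, beta0=1.0, beta1=0.5, sigma=2.0, reps=20000, seed=1):
    """Monte Carlo averages of SSR/(n-2) and SSR/n in simple OLS.

    The regressor x is drawn once and held fixed, so the averages approximate
    expectations conditional on x. Errors are i.i.d. N(0, sigma^2), which
    satisfies homoskedasticity (P1) and no serial correlation (E7).
    """
    rng = random.Random(seed)
    x = [rng.uniform(0.0, 10.0) for _ in range(n)]
    xbar = sum(x) / n
    sst_x = sum((xi - xbar) ** 2 for xi in x)

    total_unbiased = 0.0
    total_naive = 0.0
    for _ in range(reps):
        y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
        ybar = sum(y) / n
        b1 = sum((xi - xbar) * yi for xi, yi in zip(x, y)) / sst_x  # OLS slope
        b0 = ybar - b1 * xbar                                      # OLS intercept
        ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
        total_unbiased += ssr / (n - 2)
        total_naive += ssr / n
    return total_unbiased / reps, total_naive / reps


unbiased, naive = simulate_sigma2_hat()
print(f"average SSR/(n-2) = {unbiased:.3f}   (true sigma^2 = 4)")
print(f"average SSR/n     = {naive:.3f}   (theory: 4 * (n-2)/n = 3.2)")
```

The two degrees of freedom lost correspond to the two terms subtracted in the expectation step, one from estimating $\beta_0$ (through $\bar{u}$) and one from estimating $\beta_1$.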






Question 2

(a) Yes. The p-value for $\hat{\beta}_{expersq}$ is 0.000 (t = -5.94), which is smaller than the 1% significance level; thus we reject the null hypothesis that $\beta_{expersq} = 0$. Thus, $exper^2$ is statistically significant at the 1% significance level.

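(b) $\%\Delta \widehat{wage} \approx 100(\hat{\beta}_{exper} + 2\hat{\beta}_{expersq}\, exper)\,\Delta exper$
Given other things being equal, using the above approximation, the approximate return to the fifth year of experience (moving from exper = 4 to exper = 5, so $\Delta exper = 1$) is:
100*(.0328542 + 2*(-.0006606)*4)*1 = 100(0.0275694) ≈ 2.76%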
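A minimal sketch of the quadratic-experience calculation, assuming "the fifth year" means the one-year move from exper = 4 to exper = 5. It evaluates the approximation $100(\hat{\beta}_{exper} + 2\hat{\beta}_{expersq}\, exper)\Delta exper$ and, for comparison, the exact percentage change implied by the fitted log-wage quadratic. The coefficients are copied from the unrestricted Stata output; the function names are hypothetical helpers.

```python
import math

# Coefficients from the unrestricted regression: reg lwage educ tenure exper expersq
B_EXPER = 0.0328542
B_EXPERSQ = -0.0006606


def approx_pct_return(exper, d_exper=1.0):
    """Approximate % change in wage: 100*(b_exper + 2*b_expersq*exper)*d_exper."""
    return 100 * (B_EXPER + 2 * B_EXPERSQ * exper) * d_exper


def exact_pct_return(exper_from, exper_to):
    """Exact % change in wage implied by the fitted quadratic in log(wage)."""
    d_log = (B_EXPER * (exper_to - exper_from)
             + B_EXPERSQ * (exper_to ** 2 - exper_from ** 2))
    return 100 * (math.exp(d_log) - 1)


print(f"approximate return to the fifth year: {approx_pct_return(4):.2f}%")
print(f"exact change from exper = 4 to 5:     {exact_pct_return(4, 5):.2f}%")
```

Because $\hat{\beta}_{expersq} < 0$, the marginal return shrinks as exper grows, so the answer depends on the experience level at which the derivative is evaluated.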

(c) If education increases by 2 years, wage on average increases by approximately 100*(.0853489*2) = 17.07% ≈ 17.1%, holding other factors constant.
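The 17.1% figure uses the log approximation $\%\Delta wage \approx 100 \cdot \Delta\log(wage)$. A short sketch comparing it with the exact percentage change $100(\exp(\hat{\beta}_{educ}\,\Delta educ) - 1)$, using the educ coefficient from the unrestricted Stata output:

```python
import math

B_EDUC = 0.0853489  # coefficient on educ in: reg lwage educ tenure exper expersq
d_educ = 2          # two additional years of education

approx = 100 * B_EDUC * d_educ                 # 100 * change in log(wage)
exact = 100 * (math.exp(B_EDUC * d_educ) - 1)  # exact % change implied by the log model
print(f"approximate: {approx:.2f}%   exact: {exact:.2f}%")
```

The gap between the two (roughly 17.1% versus 18.6%) grows with the size of the change, which is why the approximation is usually reserved for small changes.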

(d) F(4, 521) is the F-statistic to test $H_0: \beta_{educ} = \beta_{tenure} = \beta_{exper} = \beta_{expersq} = 0$ against $H_1$: $H_0$ is not true.
Since the p-value = 0.0000, F(4, 521) = 73.09 is larger than the 5% critical value of the F-distribution with q = number of restrictions = 4 and degrees of freedom = n - k - 1 = 526 - 4 - 1 = 521. Thus, we reject $H_0: \beta_{educ} = \beta_{tenure} = \beta_{exper} = \beta_{expersq} = 0$ at the 5% significance level. Therefore, one or more of these variables (education, tenure, experience and $exper^2$) are important regressors in explaining wage.
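Both F statistics in (d) and (e) can be reproduced from the sums of squares in the Stata output with $F = \frac{(SSR_R - SSR_{UR})/q}{SSR_{UR}/(n-k-1)}$. For the overall test in (d), the restricted model contains only an intercept, so its SSR equals the total sum of squares. A minimal sketch (the function name is a hypothetical helper):

```python
def f_from_ssr(ssr_r, ssr_ur, q, df_ur):
    """Exclusion-restriction F statistic: [(SSR_r - SSR_ur)/q] / [SSR_ur/df_ur]."""
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)


SST = 148.329751       # Total SS (intercept-only model), from the Stata output
SSR_UR = 95.0110462    # Residual SS: reg lwage educ tenure exper expersq
SSR_R_E = 102.567109   # Residual SS: reg lwage educ tenure (exper, expersq dropped)

f_overall = f_from_ssr(SST, SSR_UR, q=4, df_ur=521)  # part (d)
f_exper = f_from_ssr(SSR_R_E, SSR_UR, q=2, df_ur=521)  # part (e)
print(f"F(4, 521) = {f_overall:.2f}")
print(f"F(2, 521) = {f_exper:.2f}")
```

Both values match Stata's reported F(4, 521) = 73.09 and the `test exper expersq` result F(2, 521) = 20.72.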

(e) Test $H_0: \beta_{exper} = \beta_{expersq} = 0$ vs $H_1$: $H_0$ is not true.
q = number of restrictions = 2 and degrees of freedom = n - k - 1 = 526 - 4 - 1 = 521.
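Unrestricted (UR) Model: $\log(wage) = \beta_0 + \beta_{educ}\, educ + \beta_{tenure}\, tenure + \beta_{exper}\, exper + \beta_{expersq}\, expersq + u$
Restricted (R) Model: $\log(wage) = \beta_0 + \beta_{educ}\, educ + \beta_{tenure}\, tenure + u$

Results for Unrestricted model:

. reg lwage educ tenure exper expersq

      Source |       SS       df       MS              Number of obs =     526
-------------+------------------------------           F(  4,   521) =   73.09
       Model |  53.3187052     4  13.3296763           Prob > F      =  0.0000
    Residual |  95.0110462   521  .182362853           R-squared     =  0.3595
-------------+------------------------------           Adj R-squared =  0.3545
       Total |  148.329751   525   .28253286           Root MSE      =  .42704

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0853489   .0071885    11.87   0.000      .071227    .0994709
      tenure |   .0208413   .0030037     6.94   0.000     .0149404    .0267422
       exper |   .0328542   .0051135     6.42   0.000     .0228085    .0428999
     expersq |  -.0006606   .0001111    -5.94   0.000    -.0008789   -.0004423
       _cons |   .1983445   .1019556     1.95   0.052    -.0019501    .3986392
------------------------------------------------------------------------------

Results for Restricted model:

. reg lwage educ tenure

      Source |       SS       df       MS              Number of obs =     526
-------------+------------------------------           F(  2,   523) =  116.67
       Model |  45.7626421     2   22.881321           Prob > F      =  0.0000
    Residual |  102.567109   523   .19611302           R-squared     =  0.3085
-------------+------------------------------           Adj R-squared =  0.3059
       Total |  148.329751   525   .28253286           Root MSE      =  .44285

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0865276   .0069909    12.38   0.000     .0727939    .1002613
      tenure |   .0258143   .0026795     9.63   0.000     .0205504    .0310782
       _cons |   .4044739   .0916956     4.41   0.000      .224337    .5846109
------------------------------------------------------------------------------

$F = \frac{(SSR_R - SSR_{UR})/q}{SSR_{UR}/(n-k-1)} = \frac{(102.567109 - 95.0110462)/2}{95.0110462/521} \approx 20.72$

. test exper expersq

 ( 1)  exper = 0
 ( 2)  expersq = 0

       F(  2,   521) =   20.72
            Prob > F =    0.0000

The p-value = 0.0000 is less than the 5% significance level, implying F = 20.72 > critical F-value at the 5% significance level. Hence, we reject $H_0: \beta_{exper} = \beta_{expersq} = 0$ at the 5% significance level. Therefore, it is quite likely that experience and $(experience)^2$ are jointly important in explaining wage.

Question 3
(a) As the model $voteA = \beta_0 + \beta_1 \log(expendA) + \beta_2 \log(expendB) + \beta_3 prtystrA + u$ is a level-log model, the interpretation of $\beta_1$ would be: $\Delta voteA \approx (\beta_1/100)(\%\Delta expendA)$.
Therefore, holding other factors constant, a 1% increase in Candidate A's campaign expenditure on average increases the percentage of the vote received by Candidate A by $\beta_1/100$ percentage points (with the estimate $\hat{\beta}_1 = 6.083316$ reported in part (c), about 0.061 percentage points).

(b) The null hypothesis is that a 1% increase in A's expenditures is offset by a 1% increase in B's expenditure. This is equivalent to $H_0: \beta_1 = -\beta_2$, i.e. $H_0: \beta_1 + \beta_2 = 0$.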
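(c) Results for given model:

. reg voteA lexpendA lexpendB prtystrA

      Source |       SS       df       MS              Number of obs =     173
-------------+------------------------------           F(  3,   169) =  215.23
       Model |  38405.1096     3  12801.7032           Prob > F      =  0.0000
    Residual |  10052.1389   169   59.480112           R-squared     =  0.7926
-------------+------------------------------           Adj R-squared =  0.7889
       Total |  48457.2486   172  281.728189           Root MSE      =  7.7123

------------------------------------------------------------------------------
       voteA |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lexpendA |   6.083316     .38215    15.92   0.000     5.328914    6.837719
    lexpendB |  -6.615417   .3788203   -17.46   0.000    -7.363246   -5.867588
    prtystrA |   .1519574   .0620181     2.45   0.015     .0295274    .2743873
       _cons |   45.07893   3.926305    11.48   0.000     37.32801    52.82985
------------------------------------------------------------------------------

From the reported t-ratios for $\hat{\beta}_1$ (t = 15.92) and $\hat{\beta}_2$ (t = -17.46), both $\beta_1$ and $\beta_2$ are statistically significant at the 5% significance level. Hence, A's and B's expenditures are significant explanatory variables in affecting the outcome.
No, we cannot use these results to test the hypothesis in part (b). This is because the t-ratios in the report test the null hypotheses $H_0: \beta_1 = 0$ and $H_0: \beta_2 = 0$ separately, and not the null hypothesis in part (b).
Furthermore, the t-statistic to test the null hypothesis in part (b) is calculated as follows:
$t = \frac{\hat{\beta}_1 + \hat{\beta}_2}{se(\hat{\beta}_1 + \hat{\beta}_2)}$, where $se(\hat{\beta}_1 + \hat{\beta}_2) = \sqrt{[se(\hat{\beta}_1)]^2 + [se(\hat{\beta}_2)]^2 + 2\,Cov(\hat{\beta}_1, \hat{\beta}_2)}$
However, $Cov(\hat{\beta}_1, \hat{\beta}_2)$ is not obtained in the report; thus it is difficult to calculate the above t-statistic using results in the report.

(d) Taking $H_0: \beta_1 + \beta_2 = 0$ from part (b) into account, let $\theta = \beta_1 + \beta_2$, so that $\beta_2 = \theta - \beta_1$. We transform the given model as follows:
$voteA = \beta_0 + \beta_1 \log(expendA) + (\theta - \beta_1)\log(expendB) + \beta_3 prtystrA + u$
$= \beta_0 + \beta_1[\log(expendA) - \log(expendB)] + \theta \log(expendB) + \beta_3 prtystrA + u$
Thus, to test $H_0: \theta = 0$ vs. $H_1: \theta \neq 0$ from part (b), we can use the t-test for $\theta$ (the coefficient on log(expendB)) in the transformed model.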
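Although $Cov(\hat{\beta}_1, \hat{\beta}_2)$ is not reported, the transformed regression can be checked against the given one. With $\theta = \beta_1 + \beta_2$ (the coefficient on lexpendB in `reg voteA lexpendB lexpendA_B prtystrA`), $\hat{\theta}$ should equal $\hat{\beta}_1 + \hat{\beta}_2$, and its standard error implies a covariance through $se(\hat{\beta}_1 + \hat{\beta}_2)^2 = se(\hat{\beta}_1)^2 + se(\hat{\beta}_2)^2 + 2\,Cov(\hat{\beta}_1, \hat{\beta}_2)$. A small sketch; the inputs are the rounded Stata estimates, so the implied covariance is only approximate.

```python
# Given model: reg voteA lexpendA lexpendB prtystrA
b1, se1 = 6.083316, 0.38215      # lexpendA
b2, se2 = -6.615417, 0.3788203   # lexpendB
# Transformed model: reg voteA lexpendB lexpendA_B prtystrA
theta_hat, se_theta = -0.532101, 0.5330858  # coefficient on lexpendB

# The reparameterization reproduces theta_hat = b1 + b2
assert abs((b1 + b2) - theta_hat) < 1e-6

# Back out the covariance implied by se(b1 + b2)
implied_cov = (se_theta ** 2 - se1 ** 2 - se2 ** 2) / 2
t_stat = theta_hat / se_theta
print(f"implied Cov(b1, b2) = {implied_cov:.6f}")
print(f"t = {t_stat:.2f}")
```

The implied covariance is small and negative, and the resulting t-statistic matches the t = -1.00 reported for lexpendB in the transformed regression.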
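Results for transformed model: (Note: lexpendA_B = lexpendA - lexpendB)

. reg voteA lexpendB lexpendA_B prtystrA

      Source |       SS       df       MS              Number of obs =     173
-------------+------------------------------           F(  3,   169) =  215.23
       Model |  38405.1097     3  12801.7032           Prob > F      =  0.0000
    Residual |  10052.1388   169  59.4801115           R-squared     =  0.7926
-------------+------------------------------           Adj R-squared =  0.7889
       Total |  48457.2486   172  281.728189           Root MSE      =  7.7123

------------------------------------------------------------------------------
       voteA |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lexpendB |   -.532101   .5330858    -1.00   0.320    -1.584466    .5202639
  lexpendA_B |   6.083316     .38215    15.92   0.000     5.328914    6.837719
    prtystrA |   .1519574   .0620181     2.45   0.015     .0295274    .2743873
       _cons |   45.07893   3.926305    11.48   0.000     37.32801    52.82985
------------------------------------------------------------------------------

The absolute value of the t-ratio for the coefficient of lexpendB (|t| = 1.00, p-value = 0.320) is less than 2. Thus, we fail to reject $H_0: \theta = 0$ at the 5% significance level. Equivalently, we fail to reject $H_0: \beta_1 = -\beta_2$ at the 5% significance level. Hence, the data are consistent with the hypothesis that a 1% increase in A's expenditures is offset by a 1% increase in B's expenditure.

(e) Testing the homoskedasticity assumption (Breusch-Pagan test) is equivalent to testing $H_0: \delta_1 = \delta_2 = \delta_3 = 0$ against $H_1$: $H_0$ is not true, where $\delta_1, \delta_2, \delta_3$ are slope parameters in the following auxiliary regression:
$\hat{u}^2 = \delta_0 + \delta_1 \log(expendA) + \delta_2 \log(expendB) + \delta_3 prtystrA + error$ (in Stata, u2 holds the squared OLS residuals $\hat{u}^2$).

Results from auxiliary regression:

. reg u2 lexpendA lexpendB prtystrA

      Source |       SS       df       MS              Number of obs =     173
-------------+------------------------------           F(  3,   169) =    3.25
       Model |  79798.8863     3  26599.6288           Prob > F      =  0.0233
    Residual |  1383202.77   169  8184.63179           R-squared     =  0.0545
-------------+------------------------------           Adj R-squared =  0.0378
       Total |  1463001.66   172   8505.8236           Root MSE      =  90.469

------------------------------------------------------------------------------
          u2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lexpendA |  -12.01325   4.482781    -2.68   0.008    -20.86271   -3.163787
    lexpendB |  -3.011196   4.443722    -0.68   0.499    -11.78355    5.761159
    prtystrA |  -.5260204   .7274986    -0.72   0.471    -1.962176     .910135
       _cons |   159.5399   46.05722     3.46   0.001      68.6183    250.4615
------------------------------------------------------------------------------

$LM = n \cdot R^2_{\hat{u}^2} = 173 \times 0.0545 = 9.4285 > 7.815$, the 5% critical value of the $\chi^2_3$ distribution.

Hence, we reject the null hypothesis and conclude that the model contains heteroskedastic errors (i.e. the homoskedasticity assumption is not met).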
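The LM statistic and its p-value can be checked without statistical tables, because a $\chi^2$ variable with 3 degrees of freedom has the closed-form upper tail $P(X > x) = \mathrm{erfc}(\sqrt{x/2}) + \sqrt{2x/\pi}\,e^{-x/2}$. A minimal sketch using only the Python standard library (the helper name is hypothetical); the R-squared of 0.0545 is the rounded value from the auxiliary regression.

```python
import math


def chi2_sf_3df(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


n = 173             # Number of obs in the auxiliary regression
r2_uhatsq = 0.0545  # R-squared from: reg u2 lexpendA lexpendB prtystrA
lm = n * r2_uhatsq  # LM version of the Breusch-Pagan statistic

print(f"LM = {lm:.4f}, p-value = {chi2_sf_3df(lm):.4f}")
print(f"P(chi2_3 > 7.815) = {chi2_sf_3df(7.815):.4f}")  # about 0.05: the 5% critical value
```

The LM p-value (about 0.024) is close to the F-test p-value of 0.0233 reported by Stata for the same auxiliary regression, so both versions of the test reject homoskedasticity at the 5% level.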
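(f) To test for nonlinearities in the original model (RESET), we test $H_0: \delta_1 = \delta_2 = 0$, where $\delta_1$ and $\delta_2$ are slope parameters in the following regression:
$voteA = \beta_0 + \beta_1 \log(expendA) + \beta_2 \log(expendB) + \beta_3 prtystrA + \delta_1 \hat{y}^2 + \delta_2 \hat{y}^3 + error$
(yhat2 and yhat3 denote $\hat{y}^2$ and $\hat{y}^3$, the squared and cubed fitted values from the original model.)

The results of the above regression:

. reg voteA lexpendA lexpendB prtystrA yhat2 yhat3

      Source |       SS       df       MS              Number of obs =     173
-------------+------------------------------           F(  5,   167) =  187.76
       Model |  41139.1796     5  8227.83592           Prob > F      =  0.0000
    Residual |  7318.06896   167  43.8207722           R-squared     =  0.8490
-------------+------------------------------           Adj R-squared =  0.8445
       Total |  48457.2486   172  281.728189           Root MSE      =  6.6197

------------------------------------------------------------------------------
       voteA |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lexpendA |  -15.66736   2.972818    -5.27   0.000    -21.53651   -9.798213
    lexpendB |   16.34374   3.379216     4.84   0.000      9.67225    23.01523
    prtystrA |  -.3593402   .0921216    -3.90   0.000    -.5412132   -.1774672
       yhat2 |   .0836951   .0107371     7.79   0.000     .0624972    .1048931
       yhat3 |  -.0005976   .0000759    -7.88   0.000    -.0007474   -.0004478
       _cons |  -68.73712   16.28095    -4.22   0.000    -100.8801   -36.59411
------------------------------------------------------------------------------

. test yhat2 yhat3

 ( 1)  yhat2 = 0
 ( 2)  yhat3 = 0

       F(  2,   167) =   31.20
            Prob > F =    0.0000

Hence, since F(2, 167) = 31.20 with p-value = 0.0000, we reject $H_0: \delta_1 = \delta_2 = 0$ at the 5% significance level. Thus, the original model is suspected to omit nonlinear terms.

Question 4
(a) Suppose the true regression model is $y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + u_i$, but the researcher ran a wrong regression model: $y_i = \beta_0 + \beta_1 x_{1i} + v_i$, where $v_i = \beta_2 x_{2i} + u_i$. To obtain $\tilde{\beta}_0$ and $\tilde{\beta}_1$, one has to minimize $\sum_{i=1}^{n}(y_i - \tilde{\beta}_0 - \tilde{\beta}_1 x_{1i})^2$ with respect to $\tilde{\beta}_0$ and $\tilde{\beta}_1$. Thus, after differentiating with respect to $\tilde{\beta}_0$ and $\tilde{\beta}_1$, the first order conditions are:
$\sum_{i=1}^{n}(y_i - \tilde{\beta}_0 - \tilde{\beta}_1 x_{1i}) = 0 \;\Rightarrow\; \tilde{\beta}_0 = \bar{y} - \tilde{\beta}_1 \bar{x}_1$   -- (1)
$\sum_{i=1}^{n}x_{1i}(y_i - \tilde{\beta}_0 - \tilde{\beta}_1 x_{1i}) = 0$   -- (2)
Substitute (1) into (2):
$\sum_{i=1}^{n}x_{1i}\left[(y_i - \bar{y}) - \tilde{\beta}_1(x_{1i} - \bar{x}_1)\right] = 0$
$\Rightarrow \tilde{\beta}_1 = \frac{\sum_{i=1}^{n}x_{1i}(y_i - \bar{y})}{\sum_{i=1}^{n}x_{1i}(x_{1i} - \bar{x}_1)} = \frac{\sum_{i=1}^{n}(x_{1i} - \bar{x}_1)y_i}{\sum_{i=1}^{n}(x_{1i} - \bar{x}_1)^2}$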
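(b) Assume that $x_1$ and $x_2$ are not correlated with each other. This implies that the error term $v$, which contains $x_2$, is uncorrelated with $x_1$, implying that the zero conditional mean assumption holds, i.e. $E(v \mid x_1) = 0$. Therefore, this results in $\tilde{\beta}_1$ being an unbiased estimator for $\beta_1$ (shown below).

Since $y_i = \beta_0 + \beta_1 x_{1i} + v_i$, the expression from part (a) gives $\tilde{\beta}_1 = \beta_1 + \frac{\sum_{i=1}^{n}(x_{1i} - \bar{x}_1)v_i}{\sum_{i=1}^{n}(x_{1i} - \bar{x}_1)^2}$, so
$E(\tilde{\beta}_1 \mid x_1) = \beta_1 + \frac{\sum_{i=1}^{n}(x_{1i} - \bar{x}_1)E(v_i \mid x_1)}{\sum_{i=1}^{n}(x_{1i} - \bar{x}_1)^2}$
$= \beta_1$   (This is because the zero conditional mean assumption holds)

However, if the zero conditional mean assumption does not hold, this would imply that the error term $v$ is correlated with $x_1$, i.e. $E(v \mid x_1) \neq 0$. Then
$E(\tilde{\beta}_1 \mid x_1) = \beta_1 + \frac{\sum_{i=1}^{n}(x_{1i} - \bar{x}_1)E(v_i \mid x_1)}{\sum_{i=1}^{n}(x_{1i} - \bar{x}_1)^2} \neq \beta_1$ in general.

Hence, $\tilde{\beta}_1$ would be a biased estimator for $\beta_1$ (shown above).
Therefore, the assumption that makes $\tilde{\beta}_1$ unbiased is the zero conditional mean assumption, i.e. $E(v \mid x_1) = 0$.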

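(c) Assume $x_1$ and $x_2$ are positively correlated and $\beta_2$ has a positive theoretical sign.
Positive correlation between $x_1$ and $x_2$ implies that $x_1$ and $v$ are positively correlated, i.e.
$Cov(x_1, v) = Cov(x_1, \beta_2 x_2 + u) = \beta_2 Cov(x_1, x_2) + Cov(x_1, u) = \beta_2 Cov(x_1, x_2) > 0$
(This is because under the true regression model, the zero conditional mean assumption is satisfied, i.e. $E(u \mid x_1, x_2) = 0$, so $Cov(x_1, u) = 0$.)

Therefore, since $Cov(x_1, v) > 0$, this violates the zero conditional mean assumption. As a result, $\tilde{\beta}_1$ would be a biased estimator for $\beta_1$ (as shown in part (b)).

$plim(\tilde{\beta}_1) = \beta_1 + \frac{Cov(x_1, v)}{Var(x_1)}$   (By the law of large numbers under large sample size n)
$= \beta_1 + \beta_2 \frac{Cov(x_1, x_2)}{Var(x_1)} > \beta_1$   (As $\beta_2 > 0$ and $Cov(x_1, x_2) > 0$)

Hence, even in large samples, $\tilde{\beta}_1$ overestimates the true $\beta_1$. Thus, this would imply that $E(\tilde{\beta}_1) > \beta_1$. Therefore, $\tilde{\beta}_1$ is biased and on average it overestimates the true $\beta_1$.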
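To illustrate parts (b) and (c) numerically, the sketch below simulates a hypothetical data-generating process (the values $\beta_1 = 1$, $\beta_2 = 2$, the sample size and the correlations are arbitrary illustrative choices, not from the assignment) and averages the short-regression slope $\tilde{\beta}_1$ from regressing $y$ on $x_1$ alone. With $Corr(x_1, x_2) = 0$ the average should be close to $\beta_1$; with positive correlation it should be close to $\beta_1 + \beta_2\, Cov(x_1, x_2)/Var(x_1)$, i.e. biased upward.

```python
import random


def average_short_slope(rho, beta1=1.0, beta2=2.0, n=500, reps=200, seed=0):
    """Average OLS slope from regressing y on x1 alone when x2 is omitted.

    Hypothetical DGP (illustration only): y = 1 + beta1*x1 + beta2*x2 + u,
    x1 and x2 standard normal with Corr(x1, x2) = rho, u ~ N(0, 1)
    independent of (x1, x2).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # x2 = rho*x1 + noise gives Var(x2) = 1 and Corr(x1, x2) = rho
        x2 = [rho * a + (1 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0) for a in x1]
        y = [1 + beta1 * a + beta2 * b + rng.gauss(0.0, 1.0) for a, b in zip(x1, x2)]
        x1bar = sum(x1) / n
        sst1 = sum((a - x1bar) ** 2 for a in x1)
        total += sum((a - x1bar) * yi for a, yi in zip(x1, y)) / sst1  # short-regression slope
    return total / reps


uncorrelated = average_short_slope(rho=0.0)  # case (b): close to beta1 = 1
positive = average_short_slope(rho=0.5)      # case (c): close to beta1 + beta2*0.5 = 2
print(f"rho = 0.0: {uncorrelated:.3f}")
print(f"rho = 0.5: {positive:.3f}")
```

Because both $x_1$ and $x_2$ have unit variance here, $Cov(x_1, x_2)/Var(x_1) = \rho$, so the simulated upward bias should be roughly $\beta_2 \rho = 1$.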


Question 5
(a) Mean of prpblck = 0.113 = 11.3%; Standard deviation of prpblck = 0.1824165.
Mean of income = 47053.78; Standard deviation of income = 13179.29.
prpblck is measured as a proportion (a fraction between 0 and 1), and income is measured in dollars.






(b) Results:

Results in equation form:

(0.018992) (0.026001) (0.000000362)
n = 401, $R^2$ = 0.0681

Interpret the coefficient on prpblck: As the proportion of blacks increases by 0.1 (i.e. 10 percentage points), the price of soda on average increases by about 0.012 dollars (1.2 cents), ceteris paribus.

The coefficient on prpblck is not economically large: even a 10 percentage point increase in prpblck raises the predicted price of soda by only about 1.2 cents.

(c) Results:

Interpret the coefficient on prpblck: As the proportion of blacks increases by 0.01 (i.e. 1 percentage point), the price of soda on average increases by 0.1215%, ceteris paribus.
If prpblck increases by 0.2 (20 percentage points), the price of soda on average increases by 2.43%, ceteris paribus.




(d) Results of model in part (c) after including prppov:

Intuition: The higher the proportion in poverty (i.e. higher prppov), the lower the demand for soda, because fewer people are able to afford soda. By the theory of demand, a fall in demand would, other things equal, lower the price of soda (psoda), which suggests a negative relationship between prppov and psoda.

Since the coefficient of prppov is positive (as shown in the above report), it is not what this demand-based intuition predicts.

(e) "Because log(income) and prppov are so highly correlated, they have no business being in the same regression."

The above statement is false. Under the classical linear model assumptions, there should be no perfect collinearity, i.e. no exact linear relationship between log(income) and prppov. This assumption allows the explanatory variables to be correlated, just not perfectly. Since the correlation between log(income) and prppov (shown below) shows that these 2 variables are not perfectly correlated, log(income) and prppov can be in the same regression.




(f) (i) Test

Hence, based on the p-value, we reject $H_0$ at the 5% significance level. We conclude that one or more of these variables (prpblck, log(income) and prppov) are important in explaining the price of soda.

(ii) Test

Hence, based on the p-value, we fail to reject $H_0$ at the 5% significance level.

(iii) Test

Hence, we reject $H_0$ at the 5% significance level.
