
Econometrics Exercise Sheet 6

Download the two datasets jobtrain.dta and mlda.dta from the Moodle e-learning platform
(folder: Exercise 6) to your computer, and save them on the X: drive.
Open Stata, go to the do-file editor and set up a new do-file called econometrics ex6.
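
A possible opening for the do-file is sketched below. It assumes that both .dta files were saved directly in the root of the X: drive; the log file name is an illustrative choice and not prescribed by the exercise.

```stata
* Header of the do-file (a sketch; adjust the path if the files are in a subfolder)
clear all
set more off
cd "X:/"                                        // assumed location of the datasets
capture log close                               // close any log left open earlier
log using "econometrics_ex6.log", replace text  // log file name is an assumption
```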

Difference-in-difference regression

1. Load the data jobtrain.dta. The dataset contains information on real earnings of workers in
the years 1974 and 1976. Some of the workers participated in a job training program in
1975 (participation in the training program is indicated by the dummy variable train). We are
interested in the effect of the job training program on real earnings and want to measure it using
a difference-in-difference approach.
2. Consider the following regression (year76_t is a dummy equal to one in year 1976 and zero otherwise):
$$\text{realearnings}_{it} = \beta_0 + \beta_1\,\text{year76}_t + \beta_2\,\text{train}_i + \beta_3\,[\text{train}_i \times \text{year76}_t] + \varepsilon_{it} \tag{1}$$

Which coefficient reflects the difference-in-difference effect? How do you interpret it?
3. Use the regression above to estimate the effect of the job training program on real earnings. Interpret
the effect of the training program quantitatively and comment on its statistical significance.
4. Use the commands margins and marginsplot to compare the real earnings in both groups before
and after treatment.
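
Steps 1 to 4 could be sketched in Stata roughly as follows. This is only a sketch: the variable names realearnings and year76 are taken from equation (1) and are assumed to match the names in jobtrain.dta (only train is named explicitly in the text), so check them with describe before running the regression.

```stata
* Difference-in-difference sketch; variable names other than train are assumptions
use "jobtrain.dta", clear
describe                        // verify the actual variable names first

* Factor-variable notation adds both main effects and the interaction of equation (1).
* The row labelled train#year76 (1 1) in the output is the interaction coefficient.
regress realearnings i.train##i.year76

* Predicted mean earnings for each group in each year
margins train#year76

* Plot the predicted means with the year on the horizontal axis
marginsplot, xdimension(year76)
```

Depending on how the data are organised, robust or clustered standard errors (the vce() option of regress) may be more appropriate than the default ones.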

Sharp regression discontinuity


1. Load the data mlda.dta. The data gives information on death rates (measured as deaths per
100,000 persons per year) by month of age (defined as 30-day intervals). We are interested in
the effect of legal access to alcohol (Minimum Legal Drinking Age, MLDA) on death rates using
a regression discontinuity design.
2. Explain the idea behind regression discontinuity design considering treatment status, D_a, as a
deterministic function of a, the running variable:
$$D_a = \begin{cases} 1, & \text{if } a \geq 21 \\ 0, & \text{if } a < 21 \end{cases} \tag{2}$$
Why is the running variable central to the RD story? What does the attribute "sharp" mean?
3. Run the following simple regression of death rates on age (centered):
$$\bar{M}_a = \alpha + \rho D_a + \gamma (a - a_0) + \varepsilon_a \tag{3}$$

Interpret the estimates and explain whether they can be given a causal interpretation. Is there
omitted variable bias?
4. Plot the estimation results of the relationship between the running variable and the outcome and
interpret the graph.
5. Explain why RD tools do not necessarily produce reliable causal estimates when nonlinear trends
are neglected.
6. Modify the simple regression above by (a) including a quadratic term of the running variable:
$$\bar{M}_a = \alpha + \rho D_a + \gamma_1 (a - a_0) + \gamma_2 (a - a_0)^2 + \varepsilon_a \tag{4}$$

and (b) including interaction terms, respectively:
$$\bar{M}_a = \alpha + \rho D_a + \gamma_1 (a - a_0) + \gamma_2 (a - a_0)^2 + \delta_1 [(a - a_0) D_a] + \delta_2 [(a - a_0)^2 D_a] + \varepsilon_a \tag{5}$$

Graph all three models and explain which one has the best fit. Is there a formal test for model
fit? What is the difference between specifications (a) and (b)?
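
The three specifications could be estimated and graphed along the following lines. This is a sketch under assumptions: the variable names age (age in years) and deathrate are hypothetical placeholders, since the text does not give the names used in mlda.dta, and the cutoff a_0 = 21 is taken from equation (2). Replace the placeholders with the names shown by describe.

```stata
* Sharp RD sketch; age and deathrate are hypothetical variable names
use "mlda.dta", clear
describe                                    // look up the actual variable names

generate age_c  = age - 21                  // centred running variable (a - a0)
generate D      = (age >= 21) if !missing(age)   // treatment dummy D_a
generate age_c2 = age_c^2

* Linear specification, equation (3)
regress deathrate D age_c
predict fit_lin                             // fitted values (xb is the default)

* Quadratic specification, equation (4)
regress deathrate D age_c age_c2
predict fit_quad

* Quadratic specification with interaction terms, equation (5)
generate age_cD  = age_c*D
generate age_c2D = age_c2*D
regress deathrate D age_c age_c2 age_cD age_c2D
predict fit_int
test age_cD age_c2D                         // joint F test of the two interaction terms
display e(r2_a)                             // adjusted R-squared of the last model

* Data and the three fitted curves, with a vertical line at the cutoff
twoway (scatter deathrate age) ///
       (line fit_lin age, sort) ///
       (line fit_quad age, sort) ///
       (line fit_int age, sort), ///
       xline(21) ///
       legend(order(1 "Data" 2 "Linear (3)" 3 "Quadratic (4)" 4 "Interacted (5)"))
```

The joint test compares specification (5) with the nested specification (4); the t statistic on age_c2 in the second regression plays the same role for (4) against (3).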
