HONOURS ECONOMETRICS
TUTORIAL 7
MORE ON SPECIFICATION AND DATA ISSUES
06 APRIL 2011
ECO4016F

Part A: Problems
Answer any three questions in this section.
1. (a) If the true model is
\[ Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i \]
but you fit
\[ Y_i = \alpha_1 + \alpha_2 X_{2i} + v_i \]
what model specification error will you have committed? What will be the properties of \(\hat{\alpha}_1\) and \(\hat{\alpha}_2\) with respect to bias if \(X_2\) and \(X_3\) are uncorrelated?
(b) What are the differences between an outlier, an observation with high leverage and an influential observation?

2. (a) For a two-variable regression model \(Y_i = \beta_1 + \beta_2 X_i + u_i\), show that when there are errors of measurement in \(X\) (rather than \(Y\)) the explanatory variable and the error term are correlated.
(b) If the true model is
\[ Y_i = \alpha_1 + \alpha_2 X_{2i} + v_i \]
but you fit
\[ Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i \]
what will be the properties of \(\hat{\beta}_1\), \(\hat{\beta}_2\) and \(\hat{\beta}_3\) with respect to bias? What is the value of \(E(\hat{\beta}_3)\)?

3. Let maths denote the percentage of students at a Western Cape high school receiving a passing score on a standardised maths test. We are interested in estimating the effect of per-student spending (expend) on maths performance. A simple model is
\[ \mathit{maths} = \beta_0 + \beta_1 \log(\mathit{expend}) + \beta_2 \log(\mathit{enroll}) + \beta_3 \mathit{poverty} + u, \]
where enroll is student enrollment (to reflect school size) and poverty is the percentage of students living in poverty.
(a) The variable bobs4good is the percentage of students eligible to receive school shoes from Bobby Skinstad's bobsforgood foundation (http://bobsforgood.co.za). Why is this a sensible proxy for poverty?
(b) The table that follows contains OLS estimates, with and without bobs4good as an explanatory variable.

Dependent Variable: maths

Independent Variables        (1)          (2)
log(expend)                 11.13         7.75
                            (3.30)       (3.04)
log(enroll)                  .022        -1.26
                            (.615)       (.58)
bobs4good                     -          -.324
                                         (.036)
intercept                  -69.24       -23.14
                           (26.72)      (24.99)
Observations                 428          428
R-squared                   .0297        .1893

Explain why the effect of expenditures on maths is lower in column (2) than in column (1). Is the effect in column (2) still statistically greater than zero?
(c) Does it appear that pass rates are lower at larger schools, other factors being equal? Explain.

4. We are interested in estimating a model relating number of campus crimes to student enrollment for a sample of universities in 2006. The sample we have is not a random sample of universities in South Africa, because many universities did not report campus crimes in 2006. Do you think that a university's failure to report crimes can be viewed as exogenous sample selection? Explain.

5. The following equation explains weekly hours of television viewing by a child in terms of the child's age, mother's education, father's education, and number of siblings:
\[ \mathit{tvhours}^* = \beta_0 + \beta_1 \mathit{age} + \beta_2 \mathit{age}^2 + \beta_3 \mathit{motheduc} + \beta_4 \mathit{fatheduc} + \beta_5 \mathit{sibs} + u. \]
We are worried that \(\mathit{tvhours}^*\) is measured with error in our survey. Let tvhours denote the
reported hours of television viewing per week.
(a) What do the classical errors-in-variables (CEV) assumptions require in this application?
(b) Do you think the CEV assumptions are likely to hold? Explain.

Part B: Computer Exercises
We'll go over Questions 2, 3 and 4 in the tutorial. Questions 1 and 5 are homework.

1. Use the data set WAGE2.dta for this exercise. The dataset contains information on monthly earnings, education, several demographic variables, and IQ scores for 935 men in 1980.
(a) Apply RESET to the model
\[ \log(\mathit{wage}) = \beta_0 + \beta_1 \mathit{educ} + \beta_2 \mathit{exper} + \beta_3 \mathit{tenure} + u \]
Is there evidence of functional form misspecification in the model?
(b) Use the Davidson-MacKinnon test to test the model
\[ \log(\mathit{wage}) = \beta_0 + \beta_1 \mathit{educ} + \beta_2 \mathit{exper} + \beta_3 \mathit{tenure} + u \tag{1} \]
against the model
\[ \log(\mathit{wage}) = \beta_0 + \beta_1 \log(\mathit{educ}) + \beta_2 \log(\mathit{exper}) + \beta_3 \log(\mathit{tenure}) + u \tag{2} \]
(c) Now estimate the following model
\[ \log(\mathit{wage}) = \beta_0 + \beta_1 \mathit{educ} + \beta_2 \mathit{exper} + \beta_3 \mathit{tenure} + \beta_4 \mathit{married} + \beta_5 \mathit{south} + \beta_6 \mathit{urban} + \beta_7 \mathit{black} + \beta_8 \mathit{IQ} + u \]
where IQ controls for omitted ability bias.
(d) Now use the variable KWW (the knowledge of the world of work test score) as a proxy for ability in place of IQ. What is the estimated return to education in this case?
(e) Now use IQ and KWW together as proxy variables. What happens to the estimated return to education?
(f) In part (e), are IQ and KWW individually significant? Are they jointly significant?

2. You need to use two datasets for this exercise, JTRAIN2.dta and JTRAIN3.dta. The former is an outcome of a job training experiment. The file JTRAIN3.dta contains observational data, where individuals largely determine whether they participate in job training. The datasets cover the same time period.
(a) In the dataset JTRAIN2.dta, what fraction of the men received job training? What is the fraction in JTRAIN3.dta? Why do you think there is such a big difference?
(b) Using JTRAIN2.dta, run a simple regression of re78 on train.
What is the estimated effect of participating in job training on real earnings?
(c) Now add as controls to the regression in part (b) the variables re74, re75, educ, age, black, and hisp. Does the estimated effect of job training on re78 change much? How come? (Hint: Remember that these are experimental data.)
(d) Do the regressions in parts (b) and (c) using the data in JTRAIN3.dta, reporting only the estimated coefficients on train, along with their t-statistics. What is the effect now of controlling for the extra factors, and why?
(e) Define avgre = (re74 + re75)/2. Find the sample averages, standard deviations, and minimum and maximum values in the two datasets. Are these datasets representative of the same populations in 1978?
(f) Almost 96% of the men in the dataset JTRAIN2.dta have avgre less than $10,000. Using only these men, run the regression of re78 on train, re74, re75, educ, age, black, and hisp and report the training estimate and its t statistic. Run the same regression for JTRAIN3.dta, using only men with avgre ≤ 10. For the subsample of low-income men, how do the estimated training effects compare across experimental and nonexperimental data sets?
(g) Now use each data set to run the simple regression of re78 on train, but only for men who were unemployed in 1974 and 1975. How do the training estimates compare now?
(h) Using your findings from the previous regressions, discuss the potential importance of having comparable populations underlying comparisons of experimental and nonexperimental estimates.

3. Use the state-level data on murder rates and executions in MURDER.dta for the following questions. The variable mrdrte is the murder rate, that is, the number of murders per 100,000 people. The variable exec is the total number of prisoners executed for the current and prior two years; unem is the state unemployment rate. Use the data for the year 1993 for this question, although you will need to first obtain the lagged murder rate, say \(\mathit{mrdrte}_{-1}\).
(a) Run the regression of mrdrte on exec, unem. What are the coefficient and t statistic on exec?
(b) How many executions are reported for Texas during 1993? (Actually, this is the sum of executions for the current and past two years.) How does this compare with the other states? Add a dummy variable for Texas to the regression in part (a). Is its t statistic unusually large? From this, does it appear Texas is an outlier?
(c) To the regression in part (a) add the lagged murder rate. What happens to \(\hat{\beta}_{exec}\) and its statistical significance?
(d) For the regression in part (c), does it appear Texas is an outlier? What is the effect on \(\hat{\beta}_{exec}\) from dropping Texas from the regression?

4. Use the dataset JTRAIN.dta for Michigan manufacturing firms.
(a) Consider the simple regression model
\[ \log(\mathit{scrap}) = \beta_0 + \beta_1 \mathit{grant} + u, \]
where scrap is the firm scrap rate and grant is a dummy variable indicating whether a firm received a job training grant. Can you think of some reasons why the unobserved factors in u might be correlated with grant?
(b) Estimate the simple regression model using the data for 1988. (You should have 54 observations.) Does receiving a job training grant significantly lower a firm's scrap rate?
(c) Now add as an explanatory variable \(\log(\mathit{scrap}_{87})\). How does this change the estimated effect of grant? Interpret the coefficient on grant. Is it statistically significant at the 5% level against the one-sided alternative \(H_1: \beta_{grant} < 0\)?
(d) Test the null hypothesis that the parameter on \(\log(\mathit{scrap}_{87})\) is one against the two-sided alternative. Report the p-value for the test.
(e) Repeat parts (c) and (d), using heteroskedasticity-robust standard errors, and briefly discuss any notable differences.

5. Use the file CHICKEN.dta to study the demand for chicken in the US, 1960-1982.
This file contains data for the following variables:
\(Y\) = per capita consumption of chickens, in kg
\(X_2\) = real disposable income per capita, in $
\(X_3\) = real retail price of chicken per kg, in cents
\(X_4\) = real retail price of pork per kg, in cents
\(X_5\) = real retail price of beef per kg, in cents
\(X_6\) = composite real price of chicken substitutes per kg, in cents (which is a weighted average of the real retail prices per kg of pork and beef, the weights being the relative consumptions of beef and pork in total beef and pork consumption).
Now consider the following demand functions:
\[ \ln Y_t = \alpha_1 + \alpha_2 \ln X_{2t} + \alpha_3 \ln X_{3t} + u_t \tag{1} \]
\[ \ln Y_t = \gamma_1 + \gamma_2 \ln X_{2t} + \gamma_3 \ln X_{3t} + \gamma_4 \ln X_{4t} + u_t \tag{2} \]
\[ \ln Y_t = \lambda_1 + \lambda_2 \ln X_{2t} + \lambda_3 \ln X_{3t} + \lambda_5 \ln X_{5t} + u_t \tag{3} \]
\[ \ln Y_t = \theta_1 + \theta_2 \ln X_{2t} + \theta_3 \ln X_{3t} + \theta_4 \ln X_{4t} + \theta_5 \ln X_{5t} + u_t \tag{4} \]
\[ \ln Y_t = \beta_1 + \beta_2 \ln X_{2t} + \beta_3 \ln X_{3t} + \beta_6 \ln X_{6t} + u_t \tag{5} \]
From microeconomic theory it is known that the demand for a commodity generally depends on the real income of the consumer, the real price of the commodity, and the real prices of competing and complementary commodities. In view of these considerations, answer the following questions.
(a) Which demand function among the ones given here would you choose, and why? What is the difference between specifications (2) and (4)? What problems do you foresee if you adopt specification (4)?
(b) Since specification (5) includes the composite price of beef and pork, would you prefer the demand function (5) to function (4)? Why? Are pork and/or beef competing or substitute products to chicken? How do you know?
(c) Assume function (5) is the correct demand function. Estimate the parameters of this model, obtain their standard errors, and \(R^2\), adjusted \(R^2\). Interpret your results. Now suppose you run the incorrect model (2). Assess the consequences of this misspecification by considering the values of \(\gamma_2\) and \(\gamma_3\) in relation to \(\beta_2\) and \(\beta_3\) respectively.
(d) Assume now that model (1) is the true demand function. If we now estimate model (5), what type of specification error is committed in this instance? What are the theoretical consequences of this type of specification error? Illustrate with the data at hand.
(e) Are models (2) and (3) nested models? Motivate. How do you decide which model to adopt between the two of them? Which one is preferable?
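
The mechanics of fitting two of the log-log demand functions in question 5 and comparing them can be sketched as follows. This is only an illustrative sketch in Python, not the tutorial's intended Stata workflow. It runs on synthetic data (not the values in CHICKEN.dta), the helper names `ols` and `nested_f` are hypothetical, and the data-generating process is an assumption chosen so that model (1) is nested in model (5).

```python
import math
import random


def ols(X, y):
    """Return OLS coefficients and the sum of squared residuals.

    X is a list of rows, each already including the constant 1.0.
    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination.
    """
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        # Partial pivoting for numerical stability.
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    beta = [A[i][k] for i in range(k)]
    ssr = sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2
              for r, yi in zip(X, y))
    return beta, ssr


def nested_f(X_small, X_big, y):
    """F statistic for the q extra regressors in X_big relative to X_small."""
    n, k_big = len(y), len(X_big[0])
    q = k_big - len(X_small[0])
    _, ssr_r = ols(X_small, y)  # restricted model
    _, ssr_u = ols(X_big, y)    # unrestricted model
    return ((ssr_r - ssr_u) / q) / (ssr_u / (n - k_big))


# Synthetic data only: 23 annual observations, mimicking 1960-1982.
random.seed(0)
n = 23
ln_x2 = [math.log(400 + 40 * t + random.uniform(-20, 20)) for t in range(n)]
ln_x3 = [math.log(random.uniform(30, 60)) for _ in range(n)]
ln_x6 = [math.log(random.uniform(50, 120)) for _ in range(n)]
# Assumed data-generating process: X6 plays no role in demand here.
ln_y = [2.0 + 0.45 * a - 0.37 * b + random.gauss(0, 0.03)
        for a, b in zip(ln_x2, ln_x3)]

X1 = [[1.0, a, b] for a, b in zip(ln_x2, ln_x3)]                # model (1)
X5 = [[1.0, a, b, c] for a, b, c in zip(ln_x2, ln_x3, ln_x6)]   # model (5)
beta1, ssr1 = ols(X1, ln_y)
beta5, ssr5 = ols(X5, ln_y)
f_stat = nested_f(X1, X5, ln_y)
print("model (1):", [round(b, 3) for b in beta1])
print("model (5):", [round(b, 3) for b in beta5])
print("F for adding ln X6:", round(f_stat, 3))
```

With the real data, the same comparison would use the logged columns of CHICKEN.dta in place of the simulated lists. The F statistic only applies to nested pairs such as (1) within (5) or (2) within (4).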