
Panel Data Regression

Vid Adrison
Outline
Structure of Data
Structure of error in panel data
Strict exogeneity assumption
Estimation techniques in panel data under
strict exogeneity assumption
Estimation technique when strict
exogeneity assumption is violated

Structure of Data
Time Series: a single individual, many time observations. Ex: West
Java rice production, 1980-2007
Cross Section: many individuals, a single time observation. Ex:
Indonesian rice production by province, in 2007
Panel/Longitudinal: many individuals, multiple time observations.
Ex: Indonesian rice production by province, 1980-2008
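
In Stata, a panel is declared by identifying the cross-sectional and time
dimensions. A minimal sketch, assuming a hypothetical dataset in which the
variables province and year identify the panel:

. use rice_panel.dta, clear        // hypothetical provincial rice production data
. xtset province year              // declare province as the panel id, year as time
. xtdescribe                       // summarize the panel structure (units and periods)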

Error structure in panel data


The model: $Y_{it} = X_{it}\beta + c_i + u_{it}$

c_i is usually called the unobserved component, latent variable,
unobserved heterogeneity, individual effect, or individual heterogeneity
c_i is assumed to be constant over time and to vary across individuals.
For instance: ability in a wage equation. Ability is unobserved by the
econometrician, but definitely affects an individual's wage
The analysis of panel data is centered on the assumption about c_i
(i.e., whether or not c_i is correlated with the explanatory variables)
If c_i is correlated with one or more explanatory variables, then Fixed
Effects is the appropriate technique
If c_i is uncorrelated with every explanatory variable, then Random
Effects is the appropriate technique
Strict Exogeneity Assumption
If stated in terms of the unobserved effect:

$E(Y_{it} \mid x_{i1}, x_{i2}, \ldots, x_{iT}, c_i) = E(Y_{it} \mid x_{it}, c_i) = x_{it}\beta + c_i$

Once x_it and c_i are controlled for, there are no other
variables affecting the value of Y_it
If stated in terms of the idiosyncratic error:

$E(u_{it} \mid x_{i1}, x_{i2}, \ldots, x_{iT}, c_i) = 0$

which implies

$E(x_{is}' u_{it}) = 0, \quad s, t = 1, \ldots, T$

This assumption is much stronger than contemporaneous exogeneity
(E(x_it' u_it) = 0), because it rules out correlation between the
idiosyncratic error and the covariates in every time period
Standard Fixed Effects and Random Effects regressions are only
valid when the strict exogeneity assumption is satisfied
Strict Exogeneity Assumption
Examples of Strict Exogeneity Assumption
violation
Example 1: consider the wage equation

$\log(wage_{it}) = X_{it}\beta + \delta\, training_{it} + c_i + u_{it}$

If an individual's decision to participate in training is
influenced by past shocks to his/her wage, or if the program
administrator chooses individuals with low u_it to participate
in the training at t+1, then the strict exogeneity assumption
might not be satisfied
Strict Exogeneity Assumption
Example 2: consider a dynamic wage equation

$\log(wage_{it}) = X_{it}\beta + \delta\, wage_{it-1} + c_i + u_{it}$

In this model, an individual's wage depends on his/her wage in
the past. Recall that the fundamental assumption in panel data
is E(x_is' u_it) = 0 for s, t = 1, ..., T.
A shock to the wage at time t enters u_it, and since the lagged
wage is included as an explanatory variable at t+1, E(x_is' u_it)
will not be equal to zero. Thus, any model with a lagged
dependent variable will not satisfy the strict exogeneity
assumption, and therefore standard Random Effects or standard
Fixed Effects will not be appropriate
Estimation Techniques under the Strict
Exogeneity Assumption
A. FIXED EFFECTS: if the unobserved heterogeneity (c_i) is arbitrarily
correlated with the observed characteristics
Example:
A firm's decision to evade taxes depends on unobservable characteristics (i.e.,
the manager's preference to cheat). However, the decision may also be related
to observed characteristics, such as asset size and cash flow (big firms have
higher incentives to evade taxes, and are more able to pay fines if the
evasion is detected)
There are two estimation techniques under Fixed Effects (see the Stata
sketch after this list)
Between Estimator
Estimates the parameters using cross-sectional information
Regresses the over-time average of Y on the over-time average of X for each individual
How does average Y differ between Mary and Joe if their average X differs by one unit?
Within Estimator
Estimates the parameters using the time-series information of each individual
Calculated by regressing each variable's deviation from its over-time
average, which gets rid of the time-constant unobservable:

$Y_{it} - \bar{Y}_i = (X_{it} - \bar{X}_i)\beta + (u_{it} - \bar{u}_i)$

What is the expected change in Joe's Y if his X increases by one unit?
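
A minimal Stata sketch of the two estimators; the variables mrdrte, exec,
unem and the group variable id follow the regression output shown later,
while the time variable name (year) is an assumption:

. xtset id year                    // declare the panel structure
. xtreg mrdrte exec unem, be       // between estimator: regression on individual averages
. xtreg mrdrte exec unem, fe       // within (fixed effects) estimator: deviations from individual means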
Estimation Techniques under the
Strict Exogeneity Assumption
If a variable is constant over time for each individual, its
parameter cannot be estimated by the within transformation.
Example: We want to see what factors cause the economic growth
of a city. In the specification, we include a dummy variable to
indicate the location of a city, i.e., whether it is located near
the sea. Since the value of the location dummy is constant over
time, its within-difference is zero, just like the difference of
the unobserved heterogeneity. Thus, we cannot distinguish the
effect of a time-constant observable from the time-constant
unobservable



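A minimal Stata sketch of this situation, with hypothetical variables
growth, invest, and a time-invariant dummy coastal (1 if the city is
near the sea):

. xtset city year
. xtreg growth invest coastal, fe
* xtreg omits coastal: a time-invariant regressor is collinear with the
* city fixed effects, so its coefficient cannot be separated from c_i
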
Estimation Techniques under the
Strict Exogeneity Assumption
B. RANDOM EFFECTS: if the unobserved heterogeneity (c_i) is
uncorrelated with the explanatory variables
We assume that the constants (the unobserved heterogeneity) are
randomly distributed across the cross-sectional units
It would be appropriate if we believe that the sampled cross-sectional
units were drawn from a large population
The estimation is conducted by FGLS (feasible generalized least squares)
The Random Effects parameter estimate is a weighted average of the
Between and Within Estimators:
$\hat{\beta}_{RE} = \left(\sum_{i=1}^{n} X_i'\hat{\Omega}^{-1}X_i\right)^{-1}\left(\sum_{i=1}^{n} X_i'\hat{\Omega}^{-1}Y_i\right)$

where

$\hat{\Omega}^{-1/2} = \frac{1}{\hat{\sigma}_u}\left[I_T - \frac{\hat{\theta}}{T}\,\iota_T\iota_T'\right], \qquad \hat{\theta} = 1 - \frac{\hat{\sigma}_u}{\sqrt{\hat{\sigma}_u^2 + T\hat{\sigma}_c^2}}$

with $\hat{\sigma}_u^2$ the variance of the idiosyncratic error $u_{it}$ and
$\hat{\sigma}_c^2$ the variance of the individual effect $c_i$
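
A minimal Stata sketch of the RE (FGLS) estimation, using the murder-rate
variables from the regression output shown later; the theta option reports
the estimated quasi-demeaning parameter:

. xtreg mrdrte exec unem, re theta

A theta close to 1 means RE is close to the within (FE) estimator, while a
theta close to 0 means RE is close to pooled OLS.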

Estimation Techniques under the Strict
Exogeneity Assumption
If we do not have a time-constant variable, which
method is appropriate?
Use the Hausman Test, which basically tests whether there is a
systematic difference between the two specifications (a Stata
sketch follows this list)
For instance: Specification RE uses Random Effects, and
Specification FE uses Fixed Effects
Ho: there is no systematic difference between specifications
RE and FE
Ha: there is a systematic difference between specifications RE
and FE
Specification FE is consistent under both Ho and Ha
Specification RE is inconsistent under Ha, but efficient under Ho

Estimation Technique when the Strict
Exogeneity Assumption is Violated
General steps:
Use a transformation to eliminate the unobserved
heterogeneity
Choose an instrument for endogenous variables in
the transformed equation
Estimate using pooled 2SLS

$Y_{it} = X_{it}\beta + \delta Y_{it-1} + c_i + u_{it}$

First-differencing eliminates the unobserved heterogeneity:

$Y_{it} - Y_{it-1} = (X_{it} - X_{it-1})\beta + \delta(Y_{it-1} - Y_{it-2}) + (c_i - c_i) + (u_{it} - u_{it-1})$

$\Delta Y_{it} = \Delta X_{it}\beta + \delta\,\Delta Y_{it-1} + \Delta u_{it}$

Use $Y_{it-2}$ as an instrument for $\Delta Y_{it-1}$
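
A minimal Stata sketch of this pooled 2SLS on first differences (an
Anderson-Hsiao-type estimator), using hypothetical variables y and x on a
declared panel; D. and L2. are Stata's difference and second-lag operators:

. xtset id year
. ivregress 2sls D.y D.x (DL.y = L2.y), vce(cluster id)

Clustering by id allows for the serial correlation that first-differencing
induces in the differenced error; the related Arellano-Bond GMM estimator is
available in Stata as xtabond.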
Regression Results

. reg mrdrte exec unem

      Source |       SS       df       MS              Number of obs =     153
             |                                         F(  2,   150) =    4.98
       Model |  799.796283     2  399.898141           Prob > F      =  0.0081
    Residual |  12045.5418   150  80.3036122           R-squared     =  0.0623
             |                                         Adj R-squared =  0.0498
       Total |  12845.3381   152  84.5088034           Root MSE      =  8.9612

      mrdrte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        exec |   .1650227   .1938679     0.85   0.396    -.2180419    .5480872
        unem |   1.258905   .4373612     2.88   0.005      .394721     2.12309
       _cons |    .348119    2.68724     0.13   0.897    -4.961612     5.65785

. xtreg mrdrte exec unem

Random-effects GLS regression                   Number of obs      =       153
Group variable: id                              Number of groups   =        51

R-sq:  within  = 0.0015                         Obs per group: min =         3
       between = 0.0732                                        avg =       3.0
       overall = 0.0433                                        max =         3

Random effects u_i ~ Gaussian                   Wald chi2(2)       =      0.90
corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.6369

      mrdrte |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        exec |  -.0351956   .1619968    -0.22   0.828    -.3527036    .2823124
        unem |   .2560543   .2708762     0.95   0.345    -.2748532    .7869619
       _cons |   6.584371   2.001338     3.29   0.001     2.661819    10.50692
     sigma_u |  8.1923983
     sigma_e |   3.612922
         rho |  .83717807   (fraction of variance due to u_i)

. xttest0

Breusch and Pagan Lagrangian multiplier test for random effects

        mrdrte[id,t] = Xb + u[id] + e[id,t]

        Estimated results:
                 |       Var     sd = sqrt(Var)
          mrdrte |   84.5088        9.192867
               e |   13.05321       3.612922
               u |   67.11539       8.192398

        Test:   Var(u) = 0
                            chi2(1) =    98.47
                        Prob > chi2 =   0.0000
Specification Test
What method should we choose?
It depends on the existence of unobserved heterogeneity
If the unobserved heterogeneity is significant, then it depends on
whether or not it is correlated with the observed characteristics
We can employ the Hausman Test, which basically tests RE
against FE. In the Hausman Test, a statistically significant difference
is interpreted as evidence against the random effects
However, there are two caveats:
1. Correlation between the observed characteristics and the idiosyncratic
error causes both RE and FE to be inconsistent
2. The Hausman Test is conducted under two assumptions: the unobserved
characteristic is uncorrelated with the observed characteristics, and it
is normally distributed. If it is not normally distributed, the
Hausman Test does not have systematic power against this
condition.

Specification Test
Under the null, OLS, FE and RE are all consistent. If the null is
rejected, RE is inconsistent.
However, there are cases where the difference between the FE
and RE coefficients is small but statistically significant.
On the other hand, there are cases where the RE and FE
coefficients differ greatly, but we cannot reject the null due to
large standard errors.
In these circumstances, a typical response is to choose the RE
specification. However, this comes at the cost of an increased
Type II error (failing to reject the null when it is false)
. xtreg mrdrte exec unem, fe

Fixed-effects (within) regression               Number of obs      =       153
Group variable: id                              Number of groups   =        51

R-sq:  within  = 0.0047                         Obs per group: min =         3
       between = 0.0007                                        avg =       3.0
       overall = 0.0002                                        max =         3

                                                F(2,100)           =      0.24
corr(u_i, Xb)  = -0.0635                        Prob > F           =    0.7909

      mrdrte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        exec |  -.1140743   .1800836    -0.63   0.528    -.4713551    .2432065
        unem |    .095914   .2800721     0.34   0.733    -.4597411    .6515692
       _cons |   7.637844   1.684436     4.53   0.000     4.295971    10.97972
     sigma_u |   8.788124
     sigma_e |   3.612922
         rho |  .85542114   (fraction of variance due to u_i)

F test that all u_i=0:     F(50, 100) =    16.46             Prob > F = 0.0000

. est store fe

. xtreg mrdrte exec unem

Random-effects GLS regression                   Number of obs      =       153
Group variable: id                              Number of groups   =        51

R-sq:  within  = 0.0015                         Obs per group: min =         3
       between = 0.0732                                        avg =       3.0
       overall = 0.0433                                        max =         3

Random effects u_i ~ Gaussian                   Wald chi2(2)       =      0.90
corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.6369

      mrdrte |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        exec |  -.0351956   .1619968    -0.22   0.828    -.3527036    .2823124
        unem |   .2560543   .2708762     0.95   0.345    -.2748532    .7869619
       _cons |   6.584371   2.001338     3.29   0.001     2.661819    10.50692
     sigma_u |  8.1923983
     sigma_e |   3.612922
         rho |  .83717807   (fraction of variance due to u_i)

. est store re

. hausman fe re

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |       fe           re         Difference          S.E.
        exec |   -.1140743    -.0351956       -.0788787        .0786584
        unem |     .095914     .2560543       -.1601403        .0711792

             b = consistent under Ho and Ha; obtained from xtreg
  B = inconsistent under Ha, efficient under Ho; obtained from xtreg

    Test:  Ho:  difference in coefficients not systematic

                  chi2(2) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =        6.79
                Prob>chi2 =      0.0336
Mundlak's Approach
Even with the Hausman test available, choosing between the
FE and RE specifications poses a dilemma.
FE is robust to correlation between the unobserved heterogeneity
and the explanatory variables. However, we cannot use time-invariant
regressors.
RE, on the other hand, can use time-invariant regressors, but the
assumption of zero correlation between the unobserved
heterogeneity and the explanatory variables is unlikely to hold
Mundlak (1978) proposes a modification of RE that at least
partially overcomes this deficit.
The trick is to include additional variables, the time averages of
the time-varying regressors, in the regression:

$y_{it} = X_{it}\beta + \bar{X}_i\gamma + c_i + u_{it}$
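
A minimal Stata sketch of the Mundlak device, reusing the murder-rate
variables from the earlier output; the names exec_bar and unem_bar are
hypothetical:

. bysort id: egen exec_bar = mean(exec)    // individual time averages of the
. bysort id: egen unem_bar = mean(unem)    // time-varying regressors
. xtreg mrdrte exec unem exec_bar unem_bar, re
. test exec_bar unem_bar                   // joint test: rejection suggests c_i is
                                           // correlated with the regressors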
