
The Linear Regression Model

Linear Regression with one Regressor

• The regression describes the unknown effect of changing one variable X on another variable Y

• example: an agricultural researcher is interested in the effect of fertilizer on yield, holding other factors fixed

  β_fert = change in yield / change in fertilizer = ∆Yield / ∆Fertilizer

• Interpretation
  β_fert: change in Yield that results from a change in fertilizer
  ∆Yield = β_fert ⋅ ∆Fert; e.g. if β_fert = 0.3 and ∆Fert = 2, then ∆Yield = 0.3 ⋅ 2 = 0.6
• This effect is described by the regression line (population)
  Yi = β0 + β1 Xi + ui,  i = 1, 2, ..., n

• Yi … dependent variable
• Xi … independent variable
• β0 + β1 Xi … population regression line
• β0 … intercept of the population regression line
• β1 … slope of the population regression line
• ui … error term: deviation of the observed data from the regression line
• the slope β1 is the expected change in Yi associated with a 1-unit change in Xi.
• the intercept β0 determines the level of the regression line
• Intercept and slope are called coefficients or parameters
• ui shows the difference between Yi and the population regression function. The error term contains all other factors besides Xi that determine the value of the dependent variable Yi for a specific observation. The error term is a stochastic component, i.e. a random variable.

Sources of the error term:

• not all variables influencing Y can be included
• only the most important variables are used
• indeterminacy in nature
• Yi = β0 + β1 Xi + ui,  i = 1, 2, ..., n … population regression function
  - descriptive statistics
  - problem: the population (data generating process) is mostly unknown
• draw a sample – sample regression function
  - Yi = b0 + b1 Xi + ei
  - b0, b1 and ei are random variables
  - inferential statistics
  - econometricians try to estimate the population regression function on the basis of samples
OLS (Ordinary Least Squares) Estimator
• in a practical situation, the intercept β0 and slope β1 of the
population regression line are unknown
• the unknown slope and intercept have to be estimated by OLS
estimator
• the OLS estimator chooses the regression coefficients so that the estimated regression line is as close as possible to the observed data, where closeness is measured by the sum of squared mistakes made in predicting Y given X (min ∑i ei²)
• b0 and b1 are estimators of β0 and β1
• regression line based on these estimators: b0+ b1X
• the value of predicted Yi using this line: b0+ b1 Xi
• the sum of squared prediction mistakes over n observations:

  ∑i (Yi − b0 − b1 Xi)²   (sum over i = 1, …, n)
Derivation of the OLS estimators:
• minimize the squared mistakes of the predicted values

  min(b0, b1) ∑i (Yi − Ŷi)² = min(b0, b1) ∑i (Yi − b0 − b1 Xi)²

• ei = Yi − Ŷi … residual: difference between the observed value Yi and the predicted value Ŷi

• first take the partial derivatives with respect to b0 and b1

  ∂(∑ ei²)/∂b0 = −2 ∑i (Yi − b0 − b1 Xi) = −2 ∑i ei = 0

  ∂(∑ ei²)/∂b1 = −2 ∑i Xi (Yi − b0 − b1 Xi) = −2 ∑i Xi ei = 0
  b1 = ∑i (Xi − X̄)(Yi − Ȳ) / ∑i (Xi − X̄)² = ∑i (Xi − X̄) Yi / ∑i (Xi − X̄)² = Cov(Y, X) / Var(X)

  b0 = Ȳ − b1 X̄

  ei = Yi − b0 − b1 Xi
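As a quick illustration, a minimal Python/NumPy sketch of these closed-form formulas (not part of the original notes; the data in x and y are made-up placeholders):

```python
import numpy as np

# hypothetical sample data (placeholders for any paired observations)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9])

x_bar, y_bar = x.mean(), y.mean()

# slope: sum of cross-deviations divided by the squared deviations of X
b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)

# intercept: forces the regression line through the point of means
b0 = y_bar - b1 * x_bar

# residuals
e = y - (b0 + b1 * x)
print(b0, b1)
```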
Results of OLS estimation
• the sum of the residuals, and therefore their mean, equals zero
  ∑i ei = 0
  - holds only for regressions with an intercept
  - holds only for the estimated residuals
  - Proof: follows directly from the first-order condition for b0: −2 ∑i ei = 0

• the mean of the fitted Ŷ equals the mean of the observed Y

  (1/n) ∑i Ŷi = Ȳ
  - Proof: Ŷi = Yi − ei, so ∑i Ŷi = ∑i Yi − ∑i ei = ∑i Yi

• the sum of the products of the estimated residuals and the independent variable equals zero: the estimated residuals do not correlate with the independent variable

  ∑i Xi ei = 0
  - Proof: follows directly from the first-order condition for b1: −2 ∑i Xi ei = 0
• the covariance of the fitted Ŷ and the estimated residuals of an OLS estimation with intercept always equals zero

  Cov(Ŷ, e) = 0
  - Proof: ∑i (Ŷi − Ȳ) ei = ∑i Ŷi ei = b0 ∑i ei + b1 ∑i Xi ei = 0
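A short numerical check of these four properties, as a sketch (assuming the hypothetical x, y, b0, b1 and e from the previous snippet are still in scope):

```python
# numerical check of the OLS residual properties (up to floating-point error)
y_hat = b0 + b1 * x

print(np.isclose(e.sum(), 0.0))                            # sum of residuals is zero
print(np.isclose(y_hat.mean(), y.mean()))                  # mean of fitted Y equals mean of observed Y
print(np.isclose(np.sum(x * e), 0.0))                      # residuals are orthogonal to X
print(np.isclose(np.cov(y_hat, e, bias=True)[0, 1], 0.0))  # Cov(fitted Y, residuals) = 0
```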
Sampling distribution

• repeated sampling
• b0, b1 … random variables, whereas β0 and β1 are fixed values – sampling distribution, expected value, variance
• ei depends on the values of b0 and b1
• ei … random variable – Y is a random variable as well
• the fitted Y is an estimate of the conditional expected value of Y (on average)

• Assumptions about the population (data generating process)

  Yi = β0 + β1 Xi + ui
  E(ui) = 0
  Var(ui) = σ²
  ui ~ iid(0, σ²)
• Properties of the sampling distribution (see the simulation sketch below):
  - the average of the estimates is close to the population value – law of large numbers: the average of the results obtained from a large number of trials should be close to the expected value, and tends to become closer as more trials are performed
  - the distribution of the estimates is bell-shaped – central limit theorem: the mean of a sufficiently large number of independent random variables, each with finite mean and variance, is approximately normally distributed
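A Monte Carlo sketch of these two properties (illustration only; the "true" values beta0 = 1.0, beta1 = 0.5 and sigma = 2.0 are made up for the simulation):

```python
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1, sigma, n_obs = 1.0, 0.5, 2.0, 100
x_sim = rng.uniform(0, 10, size=n_obs)          # regressors held fixed across samples

slope_draws = []
for _ in range(5000):                           # repeated sampling
    u = rng.normal(0, sigma, size=n_obs)        # new error draws in each sample
    y_sim = beta0 + beta1 * x_sim + u
    slope = np.sum((x_sim - x_sim.mean()) * (y_sim - y_sim.mean())) / np.sum((x_sim - x_sim.mean()) ** 2)
    slope_draws.append(slope)

slope_draws = np.array(slope_draws)
print(slope_draws.mean())   # close to beta1: the average of the estimates hits the population value
print(slope_draws.std())    # spread of the bell-shaped sampling distribution of b1
```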
Measures of fit: R² and standard error
The regression R² and the standard error of the regression measure how well the OLS regression line fits the data.

Coefficient of determination: R²
• R² is the fraction of the sample variance of Yi that is explained by Xi:

  R² = sample variance of Ŷi / sample variance of Yi = ESS / TSS
• explained sum of squares: ESS = ∑i (Ŷi − Ȳ)²
• total sum of squares: TSS = ∑i (Yi − Ȳ)²
• sum of squared residuals: SSR = ∑i ei²
• for OLS with an intercept, TSS = ESS + SSR, so R² = ESS/TSS = 1 − SSR/TSS

• if b1 = 0, then Xi explains none of the variation of Yi, and the predicted value of Yi based on the regression is just the sample average of Yi:
  ESS = 0, SSR = TSS, thus R² = 0
• if Xi explains all of the variation of Yi, then Ŷi = Yi, every residual is zero and
  SSR = 0, ESS = TSS, thus R² = 1
Standard error
• The standard error of the regression (SER) is an estimator of the standard deviation of the regression error ui.
• The SER is therefore a measure of the spread of the observations around the regression line, measured in the units of the dependent variable.

  s² = ∑i ei² / (n − 2) = SSR / (n − 2),  SER = s

• example:
  R² = 0.043
  SER = 23.2 – Xi explains only a small part of the variation in Yi; relevant factors might not be included
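A small sketch of how R² and the SER can be computed from the residuals (continuing with the hypothetical x, y, b0, b1 and e from the earlier snippets):

```python
# measures of fit for the fitted line (assumes x, y, b0, b1, e from above)
n = len(y)
y_hat = b0 + b1 * x

tss = np.sum((y - y.mean()) ** 2)        # total sum of squares
ess = np.sum((y_hat - y.mean()) ** 2)    # explained sum of squares
ssr = np.sum(e ** 2)                     # sum of squared residuals

r_squared = ess / tss                    # equivalently 1 - ssr / tss
ser = np.sqrt(ssr / (n - 2))             # standard error of the regression
print(r_squared, ser)
```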
Assumptions of OLS Estimators
The concept of the sampling distribution allows us to define the properties of an estimating function very precisely. We want the estimated parameters to be very close to the population values on average and to have a very small variance.
We want them to be:
- unbiased (in repeated sampling, even with small samples)
- efficient
- and consistent, in contrast to other linear estimating functions
GAUSS-MARKOV THEOREM: under certain assumptions, the OLS estimator has the smallest variance of all linear and unbiased estimating functions – BLUE (Best Linear Unbiased Estimator). Under the Gauss-Markov conditions, the OLS estimator is the best linear unbiased estimator of the regression coefficients conditional on the values of the regressors.
Unbiasedness

Assumption: the X are fixed in repeated sampling

• Unbiasedness of b0, b1

  b1 = ∑i (Xi − X̄) Yi / ∑i (Xi − X̄)²

  - b1 is a linear combination of the Yi
  - X and u are stochastically independent – exogeneity
  - E(ui) = 0
• Variance and covariance of b0, b1
  - Assumption: ui ~ iid(0, σ²)

  Var(b1) = σ² / ∑i (Xi − X̄)²

  The variance of b1 is smaller,
  - the smaller the variance σ² of the population
  - the larger the variation of X, ∑i (Xi − X̄)²
  - the larger the sample size

  Var(b0) = σ² ∑i Xi² / (n ∑i (Xi − X̄)²)

  Cov(b0, b1) = −X̄ σ² / (n ∑i (Xi − X̄)²)

• Unbiased estimator s²
  - Assumptions: E(ui²) = σ² … variance of ui is constant
    E(ui uj) = 0 … no correlation between error terms

  E(s²) = E(∑i ei²) / (n − 2) = (n − 2)σ² / (n − 2) = σ²

  s² = ∑i ei² / (n − 2) … square of the standard error of the regression
• Standard errors of b0, b1:

  s²b1 = s² / ∑i (Xi − X̄)²

  s²b0 = s² ∑i Xi² / (n ∑i (Xi − X̄)²)
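A minimal sketch of these standard-error formulas in code (again assuming the hypothetical x, e and n from the earlier snippets):

```python
# standard errors of the OLS coefficients (assumes x, e, n from above)
s_squared = np.sum(e ** 2) / (n - 2)                     # unbiased estimator of sigma^2
sxx = np.sum((x - x.mean()) ** 2)                        # variation of X

se_b1 = np.sqrt(s_squared / sxx)                         # standard error of the slope
se_b0 = np.sqrt(s_squared * np.sum(x ** 2) / (n * sxx))  # standard error of the intercept
print(se_b0, se_b1)
```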
• Assumption 1
linear regression, use of relevant variables

Yi = β0 + β1 Xi + ui,  i = 1, 2, ..., n

• Assumption 2

E (ui ) = 0

• Assumption 3
  Homoskedasticity: σ² is constant for all ui
  Var(ui) ≡ E[ui − E(ui)]² = E(ui²) = σ²
  Heteroskedasticity: the variance of the regression error ui, conditional on the regressors, is not constant.
  example: the spread of income differs between men and women
• Assumption 4
  the error terms of the population do not correlate: no autocorrelation

  Cov(ui, uj) ≡ E(ui uj) = 0 for i ≠ j

  Assumptions 2-4: ui ~ iid(0, σ²)

• Assumption 5
  X are fixed in repeated samples

  E(ui | Xi) = E(ui) = 0,  Cov(ui, Xi) = 0

  (Figures: illustrations of heteroskedasticity and autocorrelation)
• Assumption 6
  the independent variables do not show perfect multicollinearity (= no regressor is an exact linear combination of the other regressors)
• Assumption 7
  the sample variance of Xi, var(Xi), is a positive and finite number
• Assumption 8
  n > k (more observations than parameters)
Proof of efficiency of the OLS estimators
(= Gauss-Markov theorem)

(Proof sketch: write an arbitrary linear unbiased estimator of β1 as ∑i ci Yi, impose the unbiasedness restrictions on the weights ci, and show that its variance is minimized exactly when the ci equal the OLS weights.)
Consistency

• asymptotic property:
  - not a property of small samples
  - the estimator gets arbitrarily close to the population parameter as the sample size increases

  lim(n→∞) P[ |Θ̂n − Θ| < δ ] = 1 for every δ > 0

  plim Θ̂n = Θ
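A sketch of consistency by simulation (illustration only; the "true" values beta0 = 1.0, beta1 = 0.5 and sigma = 2.0 are made up):

```python
# the OLS slope estimate approaches the true slope as the sample size grows
import numpy as np

rng = np.random.default_rng(1)
beta0, beta1, sigma = 1.0, 0.5, 2.0

for n_obs in (10, 100, 1_000, 10_000):
    x_c = rng.uniform(0, 10, size=n_obs)
    y_c = beta0 + beta1 * x_c + rng.normal(0, sigma, size=n_obs)
    slope = np.sum((x_c - x_c.mean()) * (y_c - y_c.mean())) / np.sum((x_c - x_c.mean()) ** 2)
    print(n_obs, slope)   # the estimates get closer to beta1 = 0.5 as n increases
```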
Hypothesis Tests and Confidence Intervals

Assumptions: Gauss-Markov conditions, normal distribution of ui, b0 and b1

• build a standardized test statistic: we need the expected value and the standard error of the estimated parameter
• because the unknown σ² is replaced by the estimate s², the standardized coefficient follows a t-distribution rather than the normal distribution

  (b1 − β1) / sb1 ~ t(n−2)
  (b0 − β0) / sb0 ~ t(n−2)
• Confidence interval
  the confidence interval is narrower,
  • the larger α
  • the smaller s² and σ²
  • the larger n
• Hypothesis:
  H0: βk = βk⁰,  H1: βk ≠ βk⁰
• t-statistic:
  t = (bk − βk⁰) / sbk ~ t(n−2)
• Confidence interval:
  [bk − tc(α/2) · sbk ; bk + tc(α/2) · sbk], where tc(α/2) is the critical value of the t(n−2) distribution
• p-value: reported by the statistics software (see the sketch below)
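As an illustration only (not from the original notes), a sketch of how the t-statistic, confidence interval and p-value for H0: β1 = 0 could be computed, assuming b1, se_b1 and n from the earlier snippets:

```python
from scipy import stats

# two-sided test of H0: beta_1 = 0 (assumes b1, se_b1, n from above)
beta1_null = 0.0
t_stat = (b1 - beta1_null) / se_b1

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)            # critical value of the t(n-2) distribution
conf_int = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)    # (1 - alpha) confidence interval for beta_1

p_value = 2 * (1 - stats.t.cdf(abs(t_stat), df=n - 2))   # two-sided p-value
print(t_stat, conf_int, p_value)
```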
Example: 5 observations

  X     Y
  1.2   2.6
  3.0   1.6
  4.5   4.0
  5.8   3.0
  7.2   4.9

Compute:
• point estimator b0, b1
• sb0, sb1
• t-statistic
• R², r
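A self-contained sketch of how these quantities could be computed for the five observations above (illustration, not part of the original notes; the numerical results are left for the reader to verify):

```python
import numpy as np

# the five observations from the example
x = np.array([1.2, 3.0, 4.5, 5.8, 7.2])
y = np.array([2.6, 1.6, 4.0, 3.0, 4.9])
n = len(y)

# point estimators b1 and b0
sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b0 = y.mean() - b1 * x.mean()

# residuals and estimated error variance
e = y - (b0 + b1 * x)
s_squared = np.sum(e ** 2) / (n - 2)

# standard errors of b0 and b1
se_b1 = np.sqrt(s_squared / sxx)
se_b0 = np.sqrt(s_squared * np.sum(x ** 2) / (n * sxx))

# t-statistics for H0: coefficient = 0
t_b0, t_b1 = b0 / se_b0, b1 / se_b1

# R-squared and correlation coefficient r
r_squared = 1 - np.sum(e ** 2) / np.sum((y - y.mean()) ** 2)
r = np.corrcoef(x, y)[0, 1]

print(b0, b1, se_b0, se_b1, t_b0, t_b1, r_squared, r)
```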
