
The Linear Regression Model

Linear Regression with one Regressor

• The regression describes the unknown effect of changing one variable X on another variable Y

• example: an agricultural researcher is interested in the effect of fertilizer on yield, holding other factors fixed

  β_fert = change in yield / change in fertilizer = ∆Yield / ∆Fertilizer

• Interpretation
  β_fert: change in Yield that results from a change in fertilizer
  ∆Yield = β_fert ⋅ ∆Fert; e.g. if β_fert = 0.3 and ∆Fert = 2, then ∆Yield = 0.3 ⋅ 2 = 0.6
• This effect is described by the regression line (population)
  Yi = β0 + β1 Xi + ui,  i = 1, 2, ..., n

• Yi … dependent variable
• Xi … independent variable
• β0 + β1 Xi … population regression line
• β0 … intercept of the population regression line
• β1 … slope of the population regression line
• ui … error term: deviation of the observed data from the regression line
• the slope β1 is the expected change in Yi associated with a 1-unit change in Xi.
• the intercept β0 determines the level of the regression line
• Intercept and slope are called coefficients or parameters
• ui shows the difference between Yi and the population regression function. The error term contains all other factors besides Xi that determine the value of the dependent variable Yi for a specific observation. The error term is a stochastic component, i.e. a random variable.

Sources of the error term:

• not all variables influencing Y can be included
• only the most important variables are used
• indeterminacy in nature
• Yi = β0 + β1 Xi + ui,  i = 1, 2, ..., n … population regression function
  - descriptive statistics
  - problem: the population (data generating process) is mostly unknown
• draw a sample – sample regression function
  - Yi = b0 + b1 Xi + ei
  - b0, b1 and ei are random variables
  - inferential statistics
  - econometricians try to estimate the population regression function on the basis of samples
OLS (Ordinary Least Squares) Estimator
• in a practical situation, the intercept β0 and slope β1 of the
population regression line are unknown
• the unknown slope and intercept have to be estimated by OLS
estimator
• the OLS estimator chooses the regression coefficients so that the estimated regression line is as close as possible to the observed data, where closeness is measured by the sum of squared mistakes made in predicting Y given X (min ∑i ei²)
• b0 and b1 are estimators of β0 and β1
• regression line based on these estimators: b0+ b1X
• the value of predicted Yi using this line: b0+ b1 Xi
• the sum of squared prediction mistakes over n observations:

  ∑i (Yi − b0 − b1 Xi)²   (sum over i = 1, …, n)
Derivation of the OLS estimators:
• minimize the squared mistakes of the predicted values

  min(b0, b1) ∑i (Yi − Ŷi)² = min(b0, b1) ∑i (Yi − b0 − b1 Xi)²

• ei = Yi − Ŷi … residual: difference between the observed value Yi and the predicted value Ŷi

• first take the partial derivatives with respect to b0 and b1

  ∂(∑ ei²)/∂b0 = −2 ∑i (Yi − b0 − b1 Xi) = −2 ∑i ei = 0

  ∂(∑ ei²)/∂b1 = −2 ∑i Xi (Yi − b0 − b1 Xi) = −2 ∑i Xi ei = 0
  b1 = ∑i (Xi − X̄)(Yi − Ȳ) / ∑i (Xi − X̄)² = ∑i (Xi − X̄) Yi / ∑i (Xi − X̄)² = Cov(Y, X) / Var(X)

  b0 = Ȳ − b1 X̄

  ei = Yi − b0 − b1 Xi
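As a quick illustration, a minimal Python/NumPy sketch of these closed-form formulas (not part of the original notes; the data in x and y are made-up placeholders):

```python
import numpy as np

# hypothetical sample data (placeholders for any paired observations)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9])

x_bar, y_bar = x.mean(), y.mean()

# slope: sum of cross-deviations divided by the squared deviations of X
b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)

# intercept: forces the regression line through the point of means
b0 = y_bar - b1 * x_bar

# residuals
e = y - (b0 + b1 * x)
print(b0, b1)
```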
Results of OLS estimation
• the sum of the residuals, and therefore their mean, equals zero
  ∑i ei = 0
  - holds only for regressions with an intercept
  - holds only for the estimated residuals
  - Proof: follows directly from the first-order condition for b0: −2 ∑i ei = 0

• the mean of the fitted Ŷ equals the mean of the observed Y

  (1/n) ∑i Ŷi = Ȳ
  - Proof: Ŷi = Yi − ei, so ∑i Ŷi = ∑i Yi − ∑i ei = ∑i Yi

• the sum of the products of the estimated residuals and the independent variable equals zero: the estimated residuals do not correlate with the independent variable

  ∑i Xi ei = 0
  - Proof: follows directly from the first-order condition for b1: −2 ∑i Xi ei = 0
• the covariance of the fitted Ŷ and the estimated residuals of an OLS estimation with intercept always equals zero

  Cov(Ŷ, e) = 0
  - Proof: ∑i (Ŷi − Ȳ) ei = ∑i Ŷi ei = b0 ∑i ei + b1 ∑i Xi ei = 0
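A short numerical check of these four properties, as a sketch (assuming the hypothetical x, y, b0, b1 and e from the previous snippet are still in scope):

```python
# numerical check of the OLS residual properties (up to floating-point error)
y_hat = b0 + b1 * x

print(np.isclose(e.sum(), 0.0))                            # sum of residuals is zero
print(np.isclose(y_hat.mean(), y.mean()))                  # mean of fitted Y equals mean of observed Y
print(np.isclose(np.sum(x * e), 0.0))                      # residuals are orthogonal to X
print(np.isclose(np.cov(y_hat, e, bias=True)[0, 1], 0.0))  # Cov(fitted Y, residuals) = 0
```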
Sampling distribution

• repeated sampling
• b0, b1 … random variables, whereas β0 and β1 are fixed values – sampling distribution, expected value, variance
• ei depends on the values of b0 and b1
• ei … random variable – Y is a random variable as well
• the fitted Y is an estimate of the conditional expected value of Y (on average)

• Assumptions about the population (data generating process)

  Yi = β0 + β1 Xi + ui
  E(ui) = 0
  Var(ui) = σ²
  ui ~ iid(0, σ²)
• Properties of the sampling distribution (see the simulation sketch below):
  - the average of the estimates is close to the population value – law of large numbers: the average of the results obtained from a large number of trials should be close to the expected value, and tends to become closer as more trials are performed
  - the distribution of the estimates is bell-shaped – central limit theorem: the mean of a sufficiently large number of independent random variables, each with finite mean and variance, is approximately normally distributed
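A Monte Carlo sketch of these two properties (illustration only; the "true" values beta0 = 1.0, beta1 = 0.5 and sigma = 2.0 are made up for the simulation):

```python
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1, sigma, n_obs = 1.0, 0.5, 2.0, 100
x_sim = rng.uniform(0, 10, size=n_obs)          # regressors held fixed across samples

slope_draws = []
for _ in range(5000):                           # repeated sampling
    u = rng.normal(0, sigma, size=n_obs)        # new error draws in each sample
    y_sim = beta0 + beta1 * x_sim + u
    slope = np.sum((x_sim - x_sim.mean()) * (y_sim - y_sim.mean())) / np.sum((x_sim - x_sim.mean()) ** 2)
    slope_draws.append(slope)

slope_draws = np.array(slope_draws)
print(slope_draws.mean())   # close to beta1: the average of the estimates hits the population value
print(slope_draws.std())    # spread of the bell-shaped sampling distribution of b1
```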
Measures of fit: R² and standard error
The regression R² and the standard error of the regression measure how well the OLS regression line fits the data.

Coefficient of determination: R²
• R² is the fraction of the sample variance of Yi that is explained by Xi:

  R² = sample variance of Ŷi / sample variance of Yi = ESS / TSS
• explained sum of squares: ESS = ∑i (Ŷi − Ȳ)²
• total sum of squares: TSS = ∑i (Yi − Ȳ)²
• sum of squared residuals: SSR = ∑i ei²
• for OLS with an intercept, TSS = ESS + SSR, so R² = ESS/TSS = 1 − SSR/TSS

• if b1 = 0, then Xi explains none of the variation of Yi, and the predicted value of Yi based on the regression is just the sample average of Yi:
  ESS = 0, SSR = TSS, thus R² = 0
• if Xi explains all of the variation of Yi, then Ŷi = Yi, every residual is zero and
  SSR = 0, ESS = TSS, thus R² = 1
Standard error
• The standard error of the regression (SER) is an estimator of the standard deviation of the regression error ui.
• The SER is therefore a measure of the spread of the observations around the regression line, measured in the units of the dependent variable.

  s² = ∑i ei² / (n − 2) = SSR / (n − 2),  SER = s

• example:
  R² = 0.043
  SER = 23.2 – Xi explains only a small part of the variation in Yi; relevant factors might not be included
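A small sketch of how R² and the SER can be computed from the residuals (continuing with the hypothetical x, y, b0, b1 and e from the earlier snippets):

```python
# measures of fit for the fitted line (assumes x, y, b0, b1, e from above)
n = len(y)
y_hat = b0 + b1 * x

tss = np.sum((y - y.mean()) ** 2)        # total sum of squares
ess = np.sum((y_hat - y.mean()) ** 2)    # explained sum of squares
ssr = np.sum(e ** 2)                     # sum of squared residuals

r_squared = ess / tss                    # equivalently 1 - ssr / tss
ser = np.sqrt(ssr / (n - 2))             # standard error of the regression
print(r_squared, ser)
```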
Assumptions of OLS Estimators
The concept of the sampling distribution allows us to define the properties of an estimating function very precisely. We want the estimated parameters to be very close to the population values on average and to have a very small variance.
We want them to be:
- unbiased (in repeated sampling, even with small samples)
- efficient
- and consistent, in contrast to other linear estimating functions
GAUSS-MARKOV THEOREM: under certain assumptions, the OLS estimator has the smallest variance of all linear and unbiased estimating functions – BLUE (Best Linear Unbiased Estimator). Under the Gauss-Markov conditions, the OLS estimator is the best linear unbiased estimator of the regression coefficients conditional on the values of the regressors.
Unbiasedness

Assumption: the X are fixed in repeated sampling

• Unbiasedness of b0, b1

  b1 = ∑i (Xi − X̄) Yi / ∑i (Xi − X̄)²

  - b1 is a linear combination of the Yi
  - X and u are stochastically independent – exogeneity
  - E(ui) = 0
• Variance and covariance of b0, b1
  - Assumption: ui ~ iid(0, σ²)

  Var(b1) = σ² / ∑i (Xi − X̄)²

  The variance of b1 is smaller,
  - the smaller the variance σ² of the population
  - the larger the variation of X, ∑i (Xi − X̄)²
  - the larger the sample size

  Var(b0) = σ² ∑i Xi² / (n ∑i (Xi − X̄)²)

  Cov(b0, b1) = −X̄ σ² / (n ∑i (Xi − X̄)²)

• Unbiased estimator s²
  - Assumptions: E(ui²) = σ² … variance of ui is constant
    E(ui uj) = 0 … no correlation between error terms

  E(s²) = E(∑i ei²) / (n − 2) = (n − 2)σ² / (n − 2) = σ²

  s² = ∑i ei² / (n − 2) … square of the standard error of the regression
• Standard errors of b0, b1:

  s²b1 = s² / ∑i (Xi − X̄)²

  s²b0 = s² ∑i Xi² / (n ∑i (Xi − X̄)²)
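A minimal sketch of these standard-error formulas in code (again assuming the hypothetical x, e and n from the earlier snippets):

```python
# standard errors of the OLS coefficients (assumes x, e, n from above)
s_squared = np.sum(e ** 2) / (n - 2)                     # unbiased estimator of sigma^2
sxx = np.sum((x - x.mean()) ** 2)                        # variation of X

se_b1 = np.sqrt(s_squared / sxx)                         # standard error of the slope
se_b0 = np.sqrt(s_squared * np.sum(x ** 2) / (n * sxx))  # standard error of the intercept
print(se_b0, se_b1)
```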
• Assumption 1
linear regression, use of relevant variables

Yi = β0 + β1 Xi + ui,  i = 1, 2, ..., n

• Assumption 2

E (ui ) = 0

• Assumption 3
  Homoskedasticity: σ² is constant for all ui
  Var(ui) ≡ E[ui − E(ui)]² = E(ui²) = σ²
  Heteroskedasticity: the variance of the regression error ui, conditional on the regressors, is not constant.
  example: the spread of income differs between men and women
• Assumption 4
  the error terms of the population do not correlate: no autocorrelation

  Cov(ui, uj) ≡ E(ui uj) = 0 for i ≠ j

  Assumptions 2-4: ui ~ iid(0, σ²)

• Assumption 5
  X are fixed in repeated samples

  E(ui | Xi) = E(ui) = 0,  Cov(ui, Xi) = 0

  (Figures: illustrations of heteroskedasticity and autocorrelation)
• Assumption 6
  the independent variables do not show perfect multicollinearity (= no regressor is an exact linear combination of the other regressors)
• Assumption 7
  the sample variance of Xi, var(Xi), is a positive and finite number
• Assumption 8
  n > k (more observations than parameters)
Proof of efficiency of the OLS estimators
(= Gauss-Markov theorem)

(Proof sketch: write an arbitrary linear unbiased estimator of β1 as ∑i ci Yi, impose the unbiasedness restrictions on the weights ci, and show that its variance is minimized exactly when the ci equal the OLS weights.)
Consistency

• asymptotic property:
  - not a property of small samples
  - the estimator gets arbitrarily close to the population parameter as the sample size increases

  lim(n→∞) P[ |Θ̂n − Θ| < δ ] = 1 for every δ > 0

  plim Θ̂n = Θ
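A sketch of consistency by simulation (illustration only; the "true" values beta0 = 1.0, beta1 = 0.5 and sigma = 2.0 are made up):

```python
# the OLS slope estimate approaches the true slope as the sample size grows
import numpy as np

rng = np.random.default_rng(1)
beta0, beta1, sigma = 1.0, 0.5, 2.0

for n_obs in (10, 100, 1_000, 10_000):
    x_c = rng.uniform(0, 10, size=n_obs)
    y_c = beta0 + beta1 * x_c + rng.normal(0, sigma, size=n_obs)
    slope = np.sum((x_c - x_c.mean()) * (y_c - y_c.mean())) / np.sum((x_c - x_c.mean()) ** 2)
    print(n_obs, slope)   # the estimates get closer to beta1 = 0.5 as n increases
```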
Hypothesis Tests and Confidence Intervals

Assumptions: Gauss-Markov conditions, normal distribution of ui, b0 and b1

• build a standardized test statistic: we need the expected value and the standard error of the estimated parameter
• because the unknown σ² is replaced by the estimate s², the standardized coefficient follows a t-distribution rather than the normal distribution

  (b1 − β1) / sb1 ~ t(n−2)
  (b0 − β0) / sb0 ~ t(n−2)
• Confidence interval
  the confidence interval is narrower,
  • the larger α
  • the smaller s² and σ²
  • the larger n
• Hypothesis:
  H0: βk = βk⁰,  H1: βk ≠ βk⁰
• t-statistic:
  t = (bk − βk⁰) / sbk ~ t(n−2)
• Confidence interval:
  [bk − tc(α/2) · sbk ; bk + tc(α/2) · sbk], where tc(α/2) is the critical value of the t(n−2) distribution
• p-value: reported by the statistics software (see the sketch below)
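As an illustration only (not from the original notes), a sketch of how the t-statistic, confidence interval and p-value for H0: β1 = 0 could be computed, assuming b1, se_b1 and n from the earlier snippets:

```python
from scipy import stats

# two-sided test of H0: beta_1 = 0 (assumes b1, se_b1, n from above)
beta1_null = 0.0
t_stat = (b1 - beta1_null) / se_b1

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)            # critical value of the t(n-2) distribution
conf_int = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)    # (1 - alpha) confidence interval for beta_1

p_value = 2 * (1 - stats.t.cdf(abs(t_stat), df=n - 2))   # two-sided p-value
print(t_stat, conf_int, p_value)
```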
Example: 5 observations

  X     Y
  1.2   2.6
  3.0   1.6
  4.5   4.0
  5.8   3.0
  7.2   4.9

Compute:
• point estimator b0, b1
• sb0, sb1
• t-statistic
• R², r
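A self-contained sketch of how these quantities could be computed for the five observations above (illustration, not part of the original notes; the numerical results are left for the reader to verify):

```python
import numpy as np

# the five observations from the example
x = np.array([1.2, 3.0, 4.5, 5.8, 7.2])
y = np.array([2.6, 1.6, 4.0, 3.0, 4.9])
n = len(y)

# point estimators b1 and b0
sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b0 = y.mean() - b1 * x.mean()

# residuals and estimated error variance
e = y - (b0 + b1 * x)
s_squared = np.sum(e ** 2) / (n - 2)

# standard errors of b0 and b1
se_b1 = np.sqrt(s_squared / sxx)
se_b0 = np.sqrt(s_squared * np.sum(x ** 2) / (n * sxx))

# t-statistics for H0: coefficient = 0
t_b0, t_b1 = b0 / se_b0, b1 / se_b1

# R-squared and correlation coefficient r
r_squared = 1 - np.sum(e ** 2) / np.sum((y - y.mean()) ** 2)
r = np.corrcoef(x, y)[0, 1]

print(b0, b1, se_b0, se_b1, t_b0, t_b1, r_squared, r)
```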
