
CARLETON UNIVERSITY

OTTAWA CANADA
ECONOMETRICS 5701
LECTURE 1
Instructor: Marcel Voia

1 BASIC DISTRIBUTION THEORY


References: Wooldridge (2002), Chapter 3; Greene, Chapters 3 and 4

1.1 Review of the relationships between basic distributions


1.1.1 The Chi Squared Distribution
If z ∼ N(0, 1) and x = z² then x is said to have a chi-square distribution with
one degree of freedom, denoted

x = z² ∼ χ²(1)

The mean of a chi square is 1 and the variance is 2. More generally,


If x1, x2, ..., xn are n independent χ²(1) variables, then

p = x1 + x2 + ... + xn ∼ χ²(n)

The mean of p is n and the variance is 2n.
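
These moments are easy to check by simulation. Below is a minimal GAUSS sketch, added for illustration only; the variable names are arbitrary and stdc is assumed to be the usual GAUSS command for column standard deviations.

n=100000;
z=rndn(n,1); @standard normal draws@
x=z.*z; @chi-square(1) by definition@
meanc(x); @should be close to 1@
stdc(x).*stdc(x); @should be close to 2@
z1=rndn(n,1); z2=rndn(n,1); z3=rndn(n,1);
p=z1.*z1+z2.*z2+z3.*z3; @sum of three independent chi-square(1) variables@
meanc(p); @should be close to 3@
stdc(p).*stdc(p); @should be close to 6@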

1.1.2 Student’s t Distribution


If z is an N(0, 1) variable and x is a χ²(n) variable independent of z, then the ratio

t(n) = z / √(x/n)

has a t or Student's t distribution with n degrees of freedom. The t distribution
has the same shape as the normal distribution but has thicker tails.
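
The thicker tails can be seen by simulation. The following is a minimal GAUSS sketch (illustrative only; df and the other names are arbitrary choices) that builds t(5) draws from the ratio above and compares tail frequencies with the normal.

n=100000; df=5;
z=rndn(n,1);
x=zeros(n,1);
j=1;
do while j<=df; @build a chi-square(df) as a sum of df squared normals@
x=x+rndn(n,1).*rndn(n,1);
j=j+1;
endo;
t=z./sqrt(x/df); @t(df) = z / sqrt(chi-square(df)/df)@
meanc(abs(t).>2); @tail frequency of the t, roughly 0.10 for df=5@
meanc(abs(z).>2); @tail frequency of the normal, roughly 0.046@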

1.1.3 The F Distribution


If x1 and x2 are two independent chi-square variables with degrees of freedom
n1 and n2 respectively, then the ratio

F = (x1/n1) / (x2/n2) ∼ F(n1, n2)

has an F distribution.
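
The same construction gives F draws. A minimal GAUSS sketch (illustrative; the names are arbitrary):

n=100000; n1=3; n2=8;
x1=zeros(n,1);
x2=zeros(n,1);
j=1;
do while j<=n1; @chi-square(n1) as a sum of squared normals@
x1=x1+rndn(n,1).*rndn(n,1);
j=j+1;
endo;
j=1;
do while j<=n2; @chi-square(n2) as a sum of squared normals@
x2=x2+rndn(n,1).*rndn(n,1);
j=j+1;
endo;
f=(x1/n1)./(x2/n2); @F(n1,n2) ratio@
meanc(f); @should be close to n2/(n2-2) = 1.33@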

1.1.4 The Gamma and Exponential Distributions
The gamma distribution is used in studies of income distribution and production functions:

f(x) = [λ^P / Γ(P)] e^{−λx} x^{P−1},   x ≥ 0, λ > 0, P > 0

E(x) = P/λ,   Var(x) = P/λ².

When P = 1 the gamma reduces to the exponential distribution.
When λ = 1/2 and P = n/2 it reduces to χ²(n).
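
For the exponential case (P = 1), draws can be generated by inverting the CDF. A minimal GAUSS sketch (illustrative; rndu is assumed to be the standard GAUSS uniform random number command, and lam is an arbitrary rate):

n=100000; lam=2;
u=rndu(n,1); @uniform(0,1) draws@
x=-ln(u)/lam; @exponential (gamma with P=1) draws with rate lam, by inverting the CDF@
meanc(x); @should be close to P/lam = 0.5@
stdc(x).*stdc(x); @should be close to P/lam^2 = 0.25@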

1.1.5 The Beta Distribution

Used for variables constrained between 0 and c > 0:

f(x) = [Γ(α + β) / (Γ(α)Γ(β))] (x/c)^{α−1} (1 − x/c)^{β−1} (1/c),

E(x) = cα / (α + β),

Var(x) = c²αβ / [(α + β + 1)(α + β)²].

The beta is very flexible in shape; if α = β, then f(x) is symmetric.

It is used in studies of labour force participation rates.

1.1.6 The Logistic Distribution

A thicker-tailed symmetric distribution, with CDF

F(x) = Λ(x) = 1 / (1 + e^{−x}),

density

f(x) = Λ(x)(1 − Λ(x)),

E(x) = 0,   Var(x) = π²/3.
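
Logistic draws can likewise be generated by inverting Λ(x). A minimal GAUSS sketch (illustrative names only):

n=100000;
u=rndu(n,1);
x=ln(u./(1-u)); @logistic draws via the inverse CDF@
meanc(x); @should be close to 0@
stdc(x).*stdc(x); @should be close to pi^2/3 = 3.29@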

1.2 Limiting and Asymptotic Properties


Recall the ideas of the bias and efficiency of estimators; these are finite-sample
concepts. We are often interested in the properties of estimators as the sample
size increases. This may be for two reasons:
First, we often find that we need to work with statistics (mainly test statistics
such as the t and F) whose behaviour is not independent of the sample size. If this
is the case we need to be able to consider how the estimator behaves as the sample
changes, most usually when it gets larger.

Second, in many cases we may not be able to say anything precise about
the properties of an estimator in a small sample but we may be able to derive
(approximate) results about the estimator as the sample size gets large.
This section describes some of the tools from statistical analysis.

1.2.1 Pointwise Convergence


A sequence of functions {hn(x)} converges pointwise to a limit function h(x),
written hn(x) → h(x), if for each x

lim_{n→∞} hn(x) = h(x).

1.2.2 Uniform Convergence


A sequence of functions {hn(x)} is said to converge uniformly to a limit function
h(x) if

lim_{n→∞} sup_x |hn(x) − h(x)| = 0.

1.2.3 Convergence in Probability

A random variable xn is said to converge in probability to x if

lim_{n→∞} xn = p lim xn = x,

where x is some fixed (non-random) value; this is read as “the probability limit of xn is x”.

The probability limit can equivalently be written as

lim_{n→∞} Pr(x − ε ≤ xn ≤ x + ε) = 1, for every ε > 0,

which reads: however small a band ε we draw around x, the probability that xn falls
inside that band goes to one as the sample size grows.
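
This can be visualized by simulation: for the sample mean of N(10, 9) draws, the estimated probability of falling outside a fixed band around µ = 10 shrinks as n grows. A minimal GAUSS sketch (illustrative; eps and reps are arbitrary choices):

mu=10; eps=0.1; reps=500;
k=1;
do while k<=3; @n = 100, 1000, 10000@
n=100*10^(k-1);
hits=0;
r=1;
do while r<=reps;
x=mu+3*rndn(n,1); @a sample of size n from N(10,9)@
hits=hits+(abs(meanc(x)-mu)>eps);
r=r+1;
endo;
hits/reps; @estimate of Pr(|xbar - mu| > eps): shrinks toward 0 as n grows@
k=k+1;
endo;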

1.2.4 Convergence in Distribution and Limiting Distributions

The limiting distribution of a random variable xn is denoted F(x), and we say
that the random variable xn converges in distribution to x, denoted xn →d x.
The limiting mean and limiting variance are simply the mean and variance of
the limiting distribution, F(x). A key result is:

Theorem 1 The Slutsky Theorem
If g(x) is a continuous function of x then

p lim g(xn) = g(p lim xn),

which states that “the limit of the function is the function of the limit”.

Definition 2 Jensen’s Inequality:
In general, E[g(x)] ≠ g[E(x)]. More specifically, when g(x) is a convex function,
E[g(x)] ≥ g[E(x)]. Only in the case where g(·) is a linear function does the equality hold.

The general case: if y = αx^β then E(y) = E(αx^β) = αE(x^β) ≠ α[E(x)]^β.
However, in the linear case: if y = α + βx then E(y) = E(α + βx) = α + βE(x),
another reason why linear models are easy to deal with!
The Slutsky Theorem applies to random vectors (matrices) as well as to random
scalars. For example,

if p lim Wn = Ω then p lim Wn⁻¹ = Ω⁻¹.
The key results are therefore as follows:

• xn →p x ⇒ xn →d x.

• If xn →d x and g(·) is a continuous function, then the Slutsky Theorem says
g(xn) →d g(x).

• (Delta Method) If √n(θ̂ − θ0) →d N(0, Σ), then

√n(g(θ̂) − g(θ0)) →d N(0, [∂g(θ)/∂θ′|_{θ0}] Σ [∂g(θ)/∂θ′|_{θ0}]′)

(a numerical check of this result is sketched just after this list).

• If xn →d x and yn →p α, where α is a constant, then
(i) xn + yn →d x + α
(ii) xn·yn →d αx
(iii) xn/yn →d x/α, provided α ≠ 0.
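
The delta method can be checked numerically. A minimal GAUSS sketch (illustrative; it takes g(x) = x², so the asymptotic variance should be (2µ)²σ²):

reps=2000; n=2000; mu=2; sig=1;
d=zeros(reps,1);
r=1;
do while r<=reps;
xb=meanc(mu+sig*rndn(n,1)); @sample mean of a N(mu, sig^2) sample@
d[r,1]=sqrt(n)*(xb*xb-mu*mu); @sqrt(n)(g(xbar)-g(mu)) with g(x)=x^2@
r=r+1;
endo;
meanc(d); @close to 0@
stdc(d).*stdc(d); @close to (2*mu)^2*sig^2 = 16@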
1.2.5 Orders of Magnitude
Let {an} be a sequence of real numbers, {Xn} a sequence of random variables,
and {gn} a sequence of positive real numbers. Then,

1. an is of smaller order (in magnitude) than gn, denoted an = o(gn), if
lim_{n→∞} an/gn = 0.

2. an is at most of order (in magnitude) gn, denoted an = O(gn), if there
exists a real number B such that |an|/gn ≤ B for all n.

3. Xn is of smaller order (in probability) than gn, denoted Xn = o_p(gn), if
Xn/gn →p 0.

4. Xn is at most of order (in probability) gn, denoted Xn = O_p(gn), if there
exists a nonstochastic sequence {cn} such that cn = O(1) and

Xn/gn − cn →p 0.
1.2.6 Stabilizing Transformations
Finally, it is often the case that the limiting distribution, F(x), of a random
variable is a point (often zero). There is very little information in this point, and
we often want to analyze the properties of a random variable before it collapses
to a spike. This can be achieved using a stabilizing transformation.
Consider the following statistic:

x̄ ∼ (µ, σ²/n),

which says that x̄ has a mean of µ and a variance of σ²/n. We know that

p lim x̄ = µ,

but as the sample size increases the distribution collapses to a point. This is
known as a degenerate distribution.
Imagine we had another statistic with a distribution

x̄′ ∼ (µ, θσ²/n).

This has the same property that p lim x̄′ = µ, and even though it has a different
variance it too will have the same degenerate distribution. As the sample size
gets large enough we cannot distinguish between the two distributions. In other
words, different distributions may be indistinguishable in the limit.
However, we may be able to define a transformation such that

z = h(x̄) →d f(z)  and  z′ = h(x̄′) →d f(z′),

where f(z) is a well-defined limiting distribution. We now have a basis for
comparison. This is a so-called stabilizing transformation.
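
A minimal GAUSS sketch (illustrative names, not part of the notes) shows the contrast: the spread of x̄ collapses as n grows, while the spread of the stabilized statistic √n(x̄ − µ) stays roughly constant.

mu=10; sig=3; reps=1000;
k=1;
do while k<=3; @n = 100, 1000, 10000@
n=100*10^(k-1);
xb=zeros(reps,1);
r=1;
do while r<=reps;
xb[r,1]=meanc(mu+sig*rndn(n,1)); @sample mean of a N(10,9) sample@
r=r+1;
endo;
stdc(xb); @spread of xbar: collapses toward 0 as n grows@
stdc(sqrt(n)*(xb-mu)); @spread of sqrt(n)(xbar - mu): stays close to sig = 3@
k=k+1;
endo;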
This allows us to introduce our next desirable property of an estimator:

1.3 Consistency
An estimator is said to be consistent if its probability limit is equal to the true
population parameter. In other words,

p lim θ̂ = θ.

Thus if, as the sample size increases, the bias and the variance decline, and if
they both converge to zero as the sample becomes infinite, then the estimator is
consistent.
Note: an unbiased estimator whose variance goes to zero as n → ∞ is consistent,
but not all consistent estimators are unbiased. An estimator may be biased in a
finite sample (i.e. E(θ̂) ≠ θ) but consistent (p lim θ̂ = θ). There is a large class
of estimators with this property, the most important being IV and GMM estimators.
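
A standard simple example is the variance estimator that divides by n rather than n − 1: it is biased in small samples but consistent. A minimal GAUSS sketch (illustrative names, added here for intuition):

mu=10; sig2=9; n=10; reps=5000;
s=zeros(reps,1);
r=1;
do while r<=reps;
x=mu+sqrt(sig2)*rndn(n,1);
s[r,1]=meanc((x-meanc(x)).*(x-meanc(x))); @(1/n) times the sum of (xi - xbar)^2@
r=r+1;
endo;
meanc(s); @close to ((n-1)/n)*sig2 = 8.1: biased for n = 10@
@rerunning with n = 10000 gives a value close to sig2 = 9: the bias vanishes@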

1.4 Using Distribution Theory: The Sampling Distribution of the Sample Mean
Imagine drawing a random sample of n observations from a population and
computing a statistic, for example the sample mean. If we drew another
sample we would, obviously, get another value for the statistic. The sample
mean is thus a random variable.
We are interested in deriving the sampling distribution of this sample mean
in cases where the variable can assume any value and can come from any kind
of distribution.

Proposition 3 If X1, ..., Xn are a random sample and can be regarded as identically
and independently distributed draws from a population with mean µ and variance σ²,
then whatever the form of the distribution of X, the sampling distribution of the
random variable X̄ will have mean µ and variance σ²/n.

Proof. We define the sample mean as

X̄ = (1/n) Σ_{i=1}^{n} Xi = (1/n)(X1 + X2 + ... + Xn)

where X1, X2, ..., Xn are n variables drawn from the same population. It is assumed
that the Xi are identically and independently distributed. Since n is constant,

E(X̄) = E[(1/n) Σ_{i=1}^{n} Xi] = (1/n) E(X1 + X2 + ... + Xn).

We know from Jensen’s Inequality that in the case of a linear function the
expectation of a sum is equal to the sum of the expectations, and that the mean
of each Xi is, by definition, µ:

E(X̄) = (1/n) E(X1 + X2 + ... + Xn) = (1/n)(µ + µ + ... + µ) = nµ/n = µ.
Thus the mean of the sampling distribution of X̄ is equal to the population
mean. The variance of the sample mean is given by

σ²_X̄ = Var(X̄) = Var[(1/n) Σ_{i=1}^{n} Xi] = (1/n²) Var(Σ_{i=1}^{n} Xi).

Since the variables are independent, their covariances are zero, so we know
that Var(Σ_{i=1}^{n} Xi) = Σ_{i=1}^{n} Var(Xi), and thus

σ²_X̄ = (1/n²)[Var(X1) + Var(X2) + ... + Var(Xn)] = (1/n²)(σ² + σ² + ... + σ²) = nσ²/n² = σ²/n.
To summarize: if X ∼ (µ, σ²) then X̄ ∼ (µ, σ²/n). Thus the sample mean is
an unbiased estimator of the population mean, and as the sample size increases
the distribution of X̄ converges to a point. Taken together this implies that X̄
is a consistent estimator.

1.5 The link with estimation theory
Theorem 4 The Central Limit Theorem
If x1, ..., xn are a random sample from any probability distribution with finite
mean µ and variance σ², then

√n(x̄ − µ) →d N(0, σ²),

which states that the limiting distribution of the (centred and scaled) sample mean is normal.

There are a number of representations of this expression. For example, if we
standardize the random variable then the limiting distribution is given as

√n(x̄ − µ)/σ →d N(0, 1).

If each random variable has a common mean µ but differing variances σi², then
the limiting distribution is given as

√n(x̄ − µ) →d N(0, σ̄²),

where σ̄² is the limit of the average of the variances.

Finally, the central limit theorem also applies in a multivariate setting.

Theorem 5 Lindeberg-Levy Central Limit Theorem

If x1, ..., xn are a random sample from any multivariate probability distribution
with finite mean vector µ and finite positive definite covariance matrix Q, then

√n(x̄ − µ) →d N(0, Q),

which states that the limiting distribution of the (centred and scaled) vector of sample means is normal.

Central limit theorems give us an indication of the properties of the limiting
distribution of the sample mean. The final key theorem is

Theorem 6 Asymptotic Distribution of the Sample Mean

If

√n(x̄ − µ)/σ →d N(0, 1)

then, asymptotically,

x̄ ∼ N(µ, σ²/n),

which states that the mean of the random variable x is asymptotically (i.e. in
very large samples) distributed normally with a mean of µ (the population mean)
and a variance σ²/n (which goes to zero as the sample size goes to infinity).

The central limit theorem is absolutely crucial to econometrics, as it allows
us to base our inference on the properties of the sample mean on the assumption
that its distribution can be approximated by a normal distribution, whatever the
true (but unknown) population distribution.
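
The theorem can be illustrated by simulation: standardized sample means from a markedly non-normal distribution behave like standard normal draws. A minimal GAUSS sketch (illustrative; it uses exponential(1) data, for which µ = σ = 1):

reps=5000; n=200;
z=zeros(reps,1);
r=1;
do while r<=reps;
x=-ln(rndu(n,1)); @an exponential(1) sample: mean 1, variance 1, clearly non-normal@
z[r,1]=sqrt(n)*(meanc(x)-1); @sqrt(n)(xbar - mu)@
r=r+1;
endo;
meanc(z); @close to 0@
stdc(z); @close to sigma = 1@
meanc(abs(z).>1.96); @close to 0.05, the standard normal tail probability@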

Free Gauss download at:


Open “ftp://ftp.aptech.com/”. You can download the manuals by double clicking
on “Manuals”, or download Gauss Light by double clicking on “GAUSS_Light”,
which takes you to the following page:
“ftp://ftp.aptech.com/GAUSS_Light/”. Download the zip files from there.
Free Ox download at:
Open “http://www.doornik.com/download.html” and double click on “Download
Ox Console (all platforms)”. Fill out the web page that opens.

If you are working in Gauss, you can open a “txt” file as an editor file. Write
your code in that file and save it under a short name (e.g. ass1.txt).
If you are working in Ox and you are programming using Gauss code, save
your editor file with the extension “.src” (e.g. ass1.src).

Test your package by doing a small program:


eg:
n=1000;
z=rndn(n,1); @generate standard normal@
x=10+3*z; @generate N(10,9)@
meanc(x); @check the mean of x@
x1=zeros(10,1); @generate sample 1 of x@
i=1;
do while i<=10; @do a loop to fill out sample 1@
x1[i,1]=x[100*i,1];
i=i+1;
endo;
meanc(x1); @check the mean of x1@
x2=zeros(10,1); @generate sample 2 of x@
i=1;
do while i<=10; @do a loop to fill out sample 2@
x2[i,1]=x[50*i,1];
i=i+1;
endo;
meanc(x2); @check the mean of x2@

x3=zeros(10,1); @generate sample 3 of x@


i=1;
do while i<=10; @do a loop to fill out sample 3@
x3[i,1]=x[80*i,1];
i=i+1;

endo;
meanc(x3); @check the mean of x3@

@compute the mean of the sample means and compare with the population
mean meanc(x)@
x_average=(meanc(x1)+meanc(x2)+meanc(x3))/3;
