
CARLETON UNIVERSITY

OTTAWA CANADA
ECONOMETRICS 5701
LECTURE 1
Instructor: Marcel Voia

1 BASIC DISTRIBUTION THEORY


References: Wooldridge (2002), Chapter 3; Greene, Chapters 3 and 4

1.1 Review of the relationships between basic distributions


1.1.1 The Chi Squared Distribution
If z ∼ N(0, 1) and x = z² then x is said to have a chi-square distribution with
one degree of freedom, denoted

x = z² ∼ χ²(1)

The mean of a chi square is 1 and the variance is 2. More generally,


If x1, x2, ..., xn are n independent χ²(1) variables, then

p = x1 + x2 + ... + xn ∼ χ²(n)

The mean of p is n and the variance is 2n.
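
These moments are easy to check by simulation. Below is a minimal GAUSS sketch, added for illustration only; the variable names are arbitrary and stdc is assumed to be the usual GAUSS command for column standard deviations.

n=100000;
z=rndn(n,1); @standard normal draws@
x=z.*z; @chi-square(1) by definition@
meanc(x); @should be close to 1@
stdc(x).*stdc(x); @should be close to 2@
z1=rndn(n,1); z2=rndn(n,1); z3=rndn(n,1);
p=z1.*z1+z2.*z2+z3.*z3; @sum of three independent chi-square(1) variables@
meanc(p); @should be close to 3@
stdc(p).*stdc(p); @should be close to 6@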

1.1.2 Student’s t Distribution


If z is an N(0, 1) variable and x is a χ²(n) variable independent of z, then the ratio

t(n) = z / √(x/n)

has a t or Student's t distribution with n degrees of freedom. The t distribution
has the same shape as the normal distribution but has thicker tails.
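
The thicker tails can be seen by simulation. The following is a minimal GAUSS sketch (illustrative only; df and the other names are arbitrary choices) that builds t(5) draws from the ratio above and compares tail frequencies with the normal.

n=100000; df=5;
z=rndn(n,1);
x=zeros(n,1);
j=1;
do while j<=df; @build a chi-square(df) as a sum of df squared normals@
x=x+rndn(n,1).*rndn(n,1);
j=j+1;
endo;
t=z./sqrt(x/df); @t(df) = z / sqrt(chi-square(df)/df)@
meanc(abs(t).>2); @tail frequency of the t, roughly 0.10 for df=5@
meanc(abs(z).>2); @tail frequency of the normal, roughly 0.046@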

1.1.3 The F Distribution


If x1 and x2 are two independent chi-square variables with degrees of freedom
n1 and n2 respectively, then the ratio

F = (x1/n1) / (x2/n2) ∼ F(n1, n2)

has an F distribution.
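
The same construction gives F draws. A minimal GAUSS sketch (illustrative; the names are arbitrary):

n=100000; n1=3; n2=8;
x1=zeros(n,1);
x2=zeros(n,1);
j=1;
do while j<=n1; @chi-square(n1) as a sum of squared normals@
x1=x1+rndn(n,1).*rndn(n,1);
j=j+1;
endo;
j=1;
do while j<=n2; @chi-square(n2) as a sum of squared normals@
x2=x2+rndn(n,1).*rndn(n,1);
j=j+1;
endo;
f=(x1/n1)./(x2/n2); @F(n1,n2) ratio@
meanc(f); @should be close to n2/(n2-2) = 1.33@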

1.1.4 The Gamma and Exponential Distributions
The gamma distribution is used in studies of income distribution and production functions:

f(x) = [λ^P / Γ(P)] e^{−λx} x^{P−1},   x ≥ 0, λ > 0, P > 0

E(x) = P/λ,   Var(x) = P/λ².

When P = 1 the gamma reduces to the exponential distribution.
When λ = 1/2 and P = n/2 it reduces to χ²(n).
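
For the exponential case (P = 1), draws can be generated by inverting the CDF. A minimal GAUSS sketch (illustrative; rndu is assumed to be the standard GAUSS uniform random number command, and lam is an arbitrary rate):

n=100000; lam=2;
u=rndu(n,1); @uniform(0,1) draws@
x=-ln(u)/lam; @exponential (gamma with P=1) draws with rate lam, by inverting the CDF@
meanc(x); @should be close to P/lam = 0.5@
stdc(x).*stdc(x); @should be close to P/lam^2 = 0.25@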

1.1.5 The Beta Distribution

Used for variables constrained between 0 and c > 0:

f(x) = [Γ(α + β) / (Γ(α)Γ(β))] (x/c)^{α−1} (1 − x/c)^{β−1} (1/c),

E(x) = cα / (α + β),

Var(x) = c²αβ / [(α + β + 1)(α + β)²].

The beta is very flexible in shape; if α = β, then f(x) is symmetric.

It is used in studies of labour force participation rates.

1.1.6 The Logistic Distribution

A thicker-tailed symmetric distribution, with CDF

F(x) = Λ(x) = 1 / (1 + e^{−x}),

density

f(x) = Λ(x)(1 − Λ(x)),

E(x) = 0,   Var(x) = π²/3.
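
Logistic draws can likewise be generated by inverting Λ(x). A minimal GAUSS sketch (illustrative names only):

n=100000;
u=rndu(n,1);
x=ln(u./(1-u)); @logistic draws via the inverse CDF@
meanc(x); @should be close to 0@
stdc(x).*stdc(x); @should be close to pi^2/3 = 3.29@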

1.2 Limiting and Asymptotic Properties


Recall the ideas of the bias and efficiency of estimators; these are finite-sample
concepts. We are often interested in the properties of estimators as the sample
size increases. This may be for two reasons:
First, we often find that we need to work with statistics (mainly test statistics
such as the t and F) whose behaviour is not independent of the sample size. If this
is the case we need to be able to consider how the estimator behaves as the sample
changes, most usually when it gets larger.

Second, in many cases we may not be able to say anything precise about
the properties of an estimator in a small sample but we may be able to derive
(approximate) results about the estimator as the sample size gets large.
This section describes some of the tools from statistical analysis.

1.2.1 Pointwise Convergence


A sequence of functions {hn(x)} converges pointwise to a limit function h(x),
written hn(x) → h(x), if for each x

lim_{n→∞} hn(x) = h(x).

1.2.2 Uniform Convergence


A sequence of functions {hn(x)} is said to converge uniformly to a limit function
h(x) if

lim_{n→∞} sup_x |hn(x) − h(x)| = 0.

1.2.3 Convergence in Probability

A random variable xn is said to converge in probability to x if

lim_{n→∞} xn = p lim xn = x,

where x is some fixed (non-random) value; this is read as “the probability limit of xn is x”.

The probability limit can equivalently be written as

lim_{n→∞} Pr(x − ε ≤ xn ≤ x + ε) = 1, for every ε > 0,

which reads: however small a band ε we draw around x, the probability that xn falls
inside that band goes to one as the sample size grows.
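
This can be visualized by simulation: for the sample mean of N(10, 9) draws, the estimated probability of falling outside a fixed band around µ = 10 shrinks as n grows. A minimal GAUSS sketch (illustrative; eps and reps are arbitrary choices):

mu=10; eps=0.1; reps=500;
k=1;
do while k<=3; @n = 100, 1000, 10000@
n=100*10^(k-1);
hits=0;
r=1;
do while r<=reps;
x=mu+3*rndn(n,1); @a sample of size n from N(10,9)@
hits=hits+(abs(meanc(x)-mu)>eps);
r=r+1;
endo;
hits/reps; @estimate of Pr(|xbar - mu| > eps): shrinks toward 0 as n grows@
k=k+1;
endo;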

1.2.4 Convergence in Distribution and Limiting Distributions

The limiting distribution of a random variable xn is denoted F(x), and we say
that the random variable xn converges in distribution to x, denoted xn →d x.
The limiting mean and limiting variance are simply the mean and variance of
the limiting distribution, F(x). A key result is:

Theorem 1 The Slutsky Theorem
If g(x) is a continuous function of x then

p lim g(xn) = g(p lim xn),

which states that “the limit of the function is the function of the limit”.

Definition 2 Jensen’s Inequality:
In general, E[g(x)] ≠ g[E(x)]. More specifically, when g(x) is a convex function,
E[g(x)] ≥ g[E(x)]. Only in the case where g(·) is a linear function does the equality hold.

The general case: if y = αx^β then E(y) = E(αx^β) = αE(x^β) ≠ α[E(x)]^β.
However, in the linear case: if y = α + βx then E(y) = E(α + βx) = α + βE(x),
another reason why linear models are easy to deal with!
The Slutsky Theorem applies to random vectors (matrices) as well as to random
scalars. For example,

if p lim Wn = Ω then p lim Wn⁻¹ = Ω⁻¹.
The key results are therefore as follows:

• xn →p x ⇒ xn →d x.

• If xn →d x and g(·) is a continuous function, then the Slutsky Theorem says
g(xn) →d g(x).

• (Delta Method) If √n(θ̂ − θ0) →d N(0, Σ), then

√n(g(θ̂) − g(θ0)) →d N(0, [∂g(θ)/∂θ′|_{θ0}] Σ [∂g(θ)/∂θ′|_{θ0}]′)

(a numerical check of this result is sketched just after this list).

• If xn →d x and yn →p α, where α is a constant, then
(i) xn + yn →d x + α
(ii) xn·yn →d αx
(iii) xn/yn →d x/α, provided α ≠ 0.
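
The delta method can be checked numerically. A minimal GAUSS sketch (illustrative; it takes g(x) = x², so the asymptotic variance should be (2µ)²σ²):

reps=2000; n=2000; mu=2; sig=1;
d=zeros(reps,1);
r=1;
do while r<=reps;
xb=meanc(mu+sig*rndn(n,1)); @sample mean of a N(mu, sig^2) sample@
d[r,1]=sqrt(n)*(xb*xb-mu*mu); @sqrt(n)(g(xbar)-g(mu)) with g(x)=x^2@
r=r+1;
endo;
meanc(d); @close to 0@
stdc(d).*stdc(d); @close to (2*mu)^2*sig^2 = 16@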
1.2.5 Orders of Magnitude
Let {an} be a sequence of real numbers, {Xn} a sequence of random variables,
and {gn} a sequence of positive real numbers. Then,

1. an is of smaller order (in magnitude) than gn, denoted an = o(gn), if
lim_{n→∞} an/gn = 0.

2. an is at most of order (in magnitude) gn, denoted an = O(gn), if there
exists a real number B such that |an|/gn ≤ B for all n.

3. Xn is of smaller order (in probability) than gn, denoted Xn = o_p(gn), if
Xn/gn →p 0.

4. Xn is at most of order (in probability) gn, denoted Xn = O_p(gn), if there
exists a nonstochastic sequence {cn} such that cn = O(1) and

Xn/gn − cn →p 0.
1.2.6 Stabilizing Transformations
Finally, it is often the case that the limiting distribution, F(x), of a random
variable is a point (often zero). There is very little information in this point, and
we often want to analyze the properties of a random variable before it collapses
to a spike. This can be achieved using a stabilizing transformation.
Consider the following statistic:

x̄ ∼ (µ, σ²/n),

which says that x̄ has a mean of µ and a variance of σ²/n. We know that

p lim x̄ = µ,

but as the sample size increases the distribution collapses to a point. This is
known as a degenerate distribution.
Imagine we had another statistic with a distribution

x̄′ ∼ (µ, θσ²/n).

This has the same property that p lim x̄′ = µ, and even though it has a different
variance it too will have the same degenerate distribution. As the sample size
gets large enough we cannot distinguish between the two distributions. In other
words, different distributions may be indistinguishable in the limit.
However, we may be able to define a transformation such that

z = h(x̄) →d f(z)  and  z′ = h(x̄′) →d f(z′),

where f(z) is a well-defined limiting distribution. We now have a basis for
comparison. This is a so-called stabilizing transformation.
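
A minimal GAUSS sketch (illustrative names, not part of the notes) shows the contrast: the spread of x̄ collapses as n grows, while the spread of the stabilized statistic √n(x̄ − µ) stays roughly constant.

mu=10; sig=3; reps=1000;
k=1;
do while k<=3; @n = 100, 1000, 10000@
n=100*10^(k-1);
xb=zeros(reps,1);
r=1;
do while r<=reps;
xb[r,1]=meanc(mu+sig*rndn(n,1)); @sample mean of a N(10,9) sample@
r=r+1;
endo;
stdc(xb); @spread of xbar: collapses toward 0 as n grows@
stdc(sqrt(n)*(xb-mu)); @spread of sqrt(n)(xbar - mu): stays close to sig = 3@
k=k+1;
endo;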
This allows us to introduce our next desirable property of an estimator:

1.3 Consistency
An estimator is said to be consistent if its probability limit is equal to the true
population parameter. In other words,

p lim θ̂ = θ.

Thus if, as the sample size increases, the bias and the variance decline, and if
they both converge to zero as the sample becomes infinite, then the estimator is
consistent.
Note: an unbiased estimator whose variance goes to zero as n → ∞ is consistent,
but not all consistent estimators are unbiased. An estimator may be biased in a
finite sample (i.e. E(θ̂) ≠ θ) but consistent (p lim θ̂ = θ). There is a large class
of estimators with this property, the most important being IV and GMM estimators.
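
A standard simple example is the variance estimator that divides by n rather than n − 1: it is biased in small samples but consistent. A minimal GAUSS sketch (illustrative names, added here for intuition):

mu=10; sig2=9; n=10; reps=5000;
s=zeros(reps,1);
r=1;
do while r<=reps;
x=mu+sqrt(sig2)*rndn(n,1);
s[r,1]=meanc((x-meanc(x)).*(x-meanc(x))); @(1/n) times the sum of (xi - xbar)^2@
r=r+1;
endo;
meanc(s); @close to ((n-1)/n)*sig2 = 8.1: biased for n = 10@
@rerunning with n = 10000 gives a value close to sig2 = 9: the bias vanishes@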

1.4 Using Distribution Theory: The Sampling Distribution of the Sample Mean
Imagine drawing a random sample of n observations from a population and
computing a statistic, for example the sample mean. If we drew another
sample we would, obviously, get another value for the statistic. The sample
mean is thus a random variable.
We are interested in deriving the sampling distribution of this sample mean
in cases where the variable can assume any value and can come from any kind
of distribution.

Proposition 3 If X1, ..., Xn are a random sample and can be regarded as identically
and independently distributed draws from a population with mean µ and variance σ²,
then whatever the form of the distribution of X, the sampling distribution of the
random variable X̄ will have mean µ and variance σ²/n.

Proof. We define the sample mean as

X̄ = (1/n) Σ_{i=1}^{n} Xi = (1/n)(X1 + X2 + ... + Xn)

where X1, X2, ..., Xn are n variables drawn from the same population. It is assumed
that the Xi are identically and independently distributed. Since n is constant,

E(X̄) = E[(1/n) Σ_{i=1}^{n} Xi] = (1/n) E(X1 + X2 + ... + Xn).

We know from Jensen’s Inequality that in the case of a linear function the
expectation of a sum is equal to the sum of the expectations, and that the mean
of each Xi is, by definition, µ:

E(X̄) = (1/n) E(X1 + X2 + ... + Xn) = (1/n)(µ + µ + ... + µ) = nµ/n = µ.
Thus the mean of the sampling distribution of X̄ is equal to the population
mean. The variance of the sample mean is given by

σ²_X̄ = Var(X̄) = Var[(1/n) Σ_{i=1}^{n} Xi] = (1/n²) Var(Σ_{i=1}^{n} Xi).

Since the variables are independent, their covariances are zero, so we know
that Var(Σ_{i=1}^{n} Xi) = Σ_{i=1}^{n} Var(Xi), and thus

σ²_X̄ = (1/n²)[Var(X1) + Var(X2) + ... + Var(Xn)] = (1/n²)(σ² + σ² + ... + σ²) = nσ²/n² = σ²/n.
To summarize: if X ∼ (µ, σ²) then X̄ ∼ (µ, σ²/n). Thus the sample mean is
an unbiased estimator of the population mean, and as the sample size increases
the distribution of X̄ converges to a point. Taken together this implies that X̄
is a consistent estimator.

1.5 The link with estimation theory
Theorem 4 The Central Limit Theorem
If x1, ..., xn are a random sample from any probability distribution with finite
mean µ and variance σ², then

√n(x̄ − µ) →d N(0, σ²),

which states that the limiting distribution of the (centred and scaled) sample mean is normal.

There are a number of representations of this expression. For example, if we
standardize the random variable then the limiting distribution is given as

√n(x̄ − µ)/σ →d N(0, 1).

If each random variable has a common mean µ but differing variances σi², then
the limiting distribution is given as

√n(x̄ − µ) →d N(0, σ̄²),

where σ̄² is the limit of the average of the variances.

Finally, the central limit theorem also applies in a multivariate setting.

Theorem 5 Lindeberg-Levy Central Limit Theorem

If x1, ..., xn are a random sample from any multivariate probability distribution
with finite mean vector µ and finite positive definite covariance matrix Q, then

√n(x̄ − µ) →d N(0, Q),

which states that the limiting distribution of the (centred and scaled) vector of sample means is normal.

Central limit theorems give us an indication of the properties of the limiting
distribution of the sample mean. The final key theorem is

Theorem 6 Asymptotic Distribution of the Sample Mean

If

√n(x̄ − µ)/σ →d N(0, 1)

then, asymptotically,

x̄ ∼ N(µ, σ²/n),

which states that the mean of the random variable x is asymptotically (i.e. in
very large samples) distributed normally with a mean of µ (the population mean)
and a variance σ²/n (which goes to zero as the sample size goes to infinity).

The central limit theorem is absolutely crucial to econometrics, as it allows
us to base our inference on the properties of the sample mean on the assumption
that its distribution can be approximated by a normal distribution, whatever the
true (but unknown) population distribution.
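
The theorem can be illustrated by simulation: standardized sample means from a markedly non-normal distribution behave like standard normal draws. A minimal GAUSS sketch (illustrative; it uses exponential(1) data, for which µ = σ = 1):

reps=5000; n=200;
z=zeros(reps,1);
r=1;
do while r<=reps;
x=-ln(rndu(n,1)); @an exponential(1) sample: mean 1, variance 1, clearly non-normal@
z[r,1]=sqrt(n)*(meanc(x)-1); @sqrt(n)(xbar - mu)@
r=r+1;
endo;
meanc(z); @close to 0@
stdc(z); @close to sigma = 1@
meanc(abs(z).>1.96); @close to 0.05, the standard normal tail probability@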

Free Gauss download at:


Open “ftp://ftp.aptech.com/”. You can download the manuals by double clicking
on “Manuals”, or download Gauss Light by double clicking on “GAUSS_Light”,
which takes you to the following page:
“ftp://ftp.aptech.com/GAUSS_Light/”. Download the zip files from there.
Free Ox download at:
Open “http://www.doornik.com/download.html” and double click on “Download
Ox Console (all platforms)”. Fill out the web page that opens.

If you are working in Gauss, you can open a “txt” file as an editor file. Write
your code in that file and save it under a short name (e.g. ass1.txt).
If you are working in Ox and you are programming using Gauss code, save
your editor file with the extension “.src” (e.g. ass1.src).

Test your package by doing a small program:


eg:
n=1000;
z=rndn(n,1); @generate standard normal@
x=10+3*z; @generate N(10,9)@
meanc(x); @check the mean of x@
x1=zeros(10,1); @generate sample 1 of x@
i=1;
do while i<=10; @do a loop to fill out sample 1@
x1[i,1]=x[100*i,1];
i=i+1;
endo;
meanc(x1); @check the mean of x1@
x2=zeros(10,1); @generate sample 2 of x@
i=1;
do while i<=10; @do a loop to fill out sample 2@
x2[i,1]=x[50*i,1];
i=i+1;
endo;
meanc(x2); @check the mean of x2@

x3=zeros(10,1); @generate sample 3 of x@


i=1;
do while i<=10; @do a loop to fill out sample 3@
x3[i,1]=x[80*i,1];
i=i+1;

endo;
meanc(x3); @check the mean of x3@

@compute the mean of the sample means and compare with the population
mean meanc(x)@
x_average=(meanc(x1)+meanc(x2)+meanc(x3))/3;
