
Business School, Brunel University

MSc. EC5501/5509 Modelling Financial Decisions and Markets / Introduction to Quantitative Methods
Prof. Menelaos Karanasos (Room SS269, Tel. 01895265284)
Lecture Notes 1
1. Random Variables and Probability Distributions
Consider a random experiment with a sample space $\Omega$. A function $X(\cdot)$, which assigns to each element $\omega \in \Omega$ one and only one real number, is called a random variable.
Example: We toss a coin twice. The sample space is

$\Omega = \{HH, HT, TH, TT\},$

and the probabilities of its events are

$P(HH) = 1/4, \quad P(HT) = 1/4, \quad P(TH) = 1/4, \quad P(TT) = 1/4.$

We now define the function $X$ = number of heads. In this case the outcome set, i.e. the space of the random variable $X$, is

$S = \{0, 1, 2\}.$

Thus we can write

$P(X = 0) = \tfrac{1}{4}, \quad P(X = 1) = \tfrac{2}{4}, \quad P(X = 2) = \tfrac{1}{4}.$
Let $X$ be a random variable. The function $F(\cdot)$ defined by

$F(x) = P(X \le x), \quad \text{for all } x \in \mathbb{R},$

is called the Distribution Function (D.F.) of $X$ and satisfies the following properties:

(i) $F(x)$ is non-decreasing,

(ii) $F(-\infty) = \lim_{x \to -\infty} F(x) = 0$ and $F(+\infty) = \lim_{x \to +\infty} F(x) = 1$,

(iii) $F(x)$ is continuous from the right.

$F(x)$ is also called the Cumulative Distribution Function of $X$; it is a distribution function inasmuch as it tells us how the values of the random variable are distributed, and it is a cumulative distribution function since it gives the distribution of values in cumulative form. The counterdomain of $F(x)$ is the interval $[0, 1]$.
For the coin-tossing example above we have

$F(0) = P(X \le 0) = \tfrac{1}{4},$

$F(1) = P(X \le 1) = \tfrac{3}{4},$

$F(2) = P(X \le 2) = 1.$
The Distribution Function describes the distribution of values of the random
variable. For two distinct classes of random variables, the distribution of values
can be described more simply by using density functions. These two classes are:
1.1. Discrete Random Variable
A random variable (r.v.) $X$ will be defined to be discrete if the range of $X$ is countable. If $X$ is discrete then $F(x)$ will be defined to be discrete.
If $X$ is a discrete random variable with distinct values $x_1, x_2, \dots, x_n$, then the function $f(\cdot)$ defined by

$f(x) = \begin{cases} P(X = x_j) & \text{if } x = x_j, \ j = 1, 2, \dots, n \\ 0 & \text{if } x \ne x_j \end{cases}$

is called the probability density function (p.d.f.) of the discrete $X$. Note that $f(\cdot)$ is a non-negative function.
The values of a discrete r.v. are often called mass points, and $f(x_j)$ denotes the mass associated with the mass point $x_j$. The distribution function of a discrete random variable has steps at the mass points; at the mass point $x_j$, $F(\cdot)$ has a step of size $f(x_j)$, and $F(\cdot)$ is flat between mass points.
For the coin-tossing example above we have

$f(0) = P(X = 0) = \tfrac{1}{4},$

$f(1) = P(X = 1) = \tfrac{1}{2},$

$f(2) = P(X = 2) = \tfrac{1}{4}.$
For a discrete random variable we have that

$F(x) = \sum_{u \le x} f(u), \quad \text{and} \quad \sum_{x} f(x) = 1.$
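As an illustration (not part of the original notes), here is a minimal Python sketch that encodes the p.d.f. of the coin-tossing example and builds the step-function D.F. from it; the names `pdf` and `F` are our own:

```python
# Two tosses of a fair coin: X = number of heads.
pdf = {0: 1/4, 1: 1/2, 2: 1/4}   # mass at each mass point

def F(x):
    """D.F. F(x) = P(X <= x): add up the mass at all mass points u <= x."""
    return sum(p for u, p in pdf.items() if u <= x)

assert abs(sum(pdf.values()) - 1) < 1e-12    # total mass is 1
print(F(-1), F(0), F(0.5), F(1), F(2))       # 0 0.25 0.25 0.75 1.0
```

Note that $F(0.5) = F(0)$: the D.F. is flat between the mass points, as stated above.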
1.2. Continuous Random Variable
A random variable $X$ is called continuous if there exists a function $f(\cdot)$ such that

$F(x) = \int_{-\infty}^{x} f(u)\, du, \quad \text{for every real number } x.$

If $X$ is a continuous r.v. then $F(\cdot)$ is defined to be continuous. The function $f(\cdot)$ is called the probability density function of the continuous random variable $X$. In this case we have

$\int_{-\infty}^{\infty} f(x)\, dx = 1.$

Note that $f(\cdot)$ can be obtained by differentiation, i.e.

$f(x) = \frac{dF(x)}{dx}$

for those points $x$ for which $F(x)$ is differentiable.
Caution:
The notations for the density function of discrete and continuous random variables are the same, yet they have different interpretations.
For discrete r.v.s $f(x)$ denotes probabilities, i.e.

$f(x) = P(X = x).$

For continuous r.v.s $f(x)$ is the derivative of the distribution function, whereas the probability that the r.v. will take a particular value is zero:

$f(x) = \frac{dF(x)}{dx}, \qquad P(X = x) = \int_{x}^{x} f(u)\, du = 0.$

In addition, for a continuous r.v. we can write

$P(a < X < b) = P(a \le X \le b) = \int_{a}^{b} f(x)\, dx = F(b) - F(a).$
2. Numerical Characteristics of Random Variables
2.1. (Mathematical) Expectation of a r.v.
Let $X$ be a random variable. The (mathematical) expectation of $X$, $E(X)$, or the mean of $X$, $\mu$, is defined by

(i) $\mu = E(X) = \sum_{j} x_j f(x_j)$, if $X$ is discrete with mass points $x_1, x_2, \dots$

(ii) $\mu = E(X) = \int_{-\infty}^{\infty} x f(x)\, dx$, if $X$ is continuous with p.d.f. $f(x)$.

So $E(X)$ is an average of the values that the random variable takes on, where each value is weighted by the probability that the r.v. is equal to that value; the expectation of a r.v. $X$ is the centre of gravity of the unit mass that is determined by the density function of $X$. Thus the mean of $X$ is a measure of where the values of the random variable $X$ are centred.

It should be noted that the expectation or the expected value of $X$ is not necessarily what you expect. For example, the expectation of a discrete r.v. is not necessarily one of the possible values of $X$, in which case you would not expect to get the expected value.
2.1.1. Example
You roll a (fair) die and you receive as many sterling pounds as the number of dots that appear on the die. In this case we have that

$E(X) = 1\left(\tfrac{1}{6}\right) + 2\left(\tfrac{1}{6}\right) + 3\left(\tfrac{1}{6}\right) + 4\left(\tfrac{1}{6}\right) + 5\left(\tfrac{1}{6}\right) + 6\left(\tfrac{1}{6}\right) = 3.5.$

The above does not imply that if you roll a die you can win 3.5 sterling pounds, but if you play the game $n$ times, with $n$ sufficiently large, you expect to win $3.5n$ sterling pounds in total.
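The long-run interpretation is easy to see by simulation; the following sketch (our own illustration, using numpy) averages the winnings over many plays:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
rolls = rng.integers(1, 7, size=n)   # fair die: faces 1..6

print(rolls.mean())   # ~3.5, the average winnings per play
print(rolls.sum())    # total winnings, ~3.5 * n
```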
2.1.2. Example
You are presented with two choices:

(i) Toss a coin, with the possibility to win $100 if heads or lose $50 if tails.

(ii) Receive $25 with certainty.

Note that the expectation of choice (i) is 100(1/2) - 50(1/2) = $25.
The above game is a fair game because the mean of the risky choice (i) equals the certain alternative (ii).
2.2. Expectation of a function of a r.v.
Let $X$ be a random variable and $g(\cdot)$ be a function with both domain and counterdomain the real line. The expectation of the function $g(\cdot)$ of the r.v. $X$, denoted by $E[g(X)]$, is defined by

(i) $E[g(X)] = \sum_{j} g(x_j) f(x_j)$, for a discrete r.v.

(ii) $E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\, dx$, for a continuous r.v.
2.3. Properties of expected value
(i) $E(c) = c$,

(ii) $E(cX) = cE(X)$,

where $c$ and $X$ denote a constant and a r.v., respectively.

(iii) $E(X_1 + X_2 + \dots + X_n) = E(X_1) + E(X_2) + \dots + E(X_n)$,

(iv) $E[c_1 g_1(X) + c_2 g_2(X) + \dots + c_n g_n(X)] = c_1 E[g_1(X)] + c_2 E[g_2(X)] + \dots + c_n E[g_n(X)]$,

where $c_1, c_2, \dots, c_n$ are constants, $X_1, X_2, \dots, X_n$ are r.v.s, and $g_i(X)$, $i = 1, 2, \dots, n$, are functions of a r.v. $X$.
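As a sanity check on these properties (our own illustration, not from the notes), the sketch below verifies property (iv) numerically for the fair-die p.d.f.:

```python
import numpy as np

x = np.arange(1, 7)      # mass points of a fair die
f = np.full(6, 1/6)      # p.d.f.

E = lambda g: np.sum(g(x) * f)   # E[g(X)] = sum_j g(x_j) f(x_j)

# Property (iv) with c1 = 2, g1(x) = x and c2 = 3, g2(x) = x**2:
lhs = E(lambda v: 2*v + 3*v**2)
rhs = 2*E(lambda v: v) + 3*E(lambda v: v**2)
print(np.isclose(lhs, rhs))      # True
```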
2.4. Variance of a r.v.
The variance of a random variable $X$ is denoted by $\sigma^2$ or $V(X)$, and is given by

(i) $\sigma^2 = V(X) = \sum_{j} (x_j - \mu)^2 f(x_j)$, when $X$ is a discrete r.v.,

(ii) $\sigma^2 = V(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\, dx$, when $X$ is a continuous r.v.

The standard deviation of $X$ is defined as $\sigma = \sqrt{V(X)}$. Note that the variance of a r.v. is non-negative.
The mean of a random variable $X$ is a measure of central location of the density of $X$. On the other hand, the variance of a r.v. $X$ is a measure of the spread or dispersion of the density of $X$.
Remark: Let $g(X) = (X - \mu)^2$. Then we can write that

$E[g(X)] = E\left[(X - \mu)^2\right] = V(X).$

Thus, the variance of a discrete or continuous r.v. $X$ can be defined as

$V(X) = E\left[(X - \mu)^2\right] \Rightarrow V(X) = E\left(X^2\right) - [E(X)]^2.$
Chebyshev's inequality:

$P(\mu - k\sigma < X < \mu + k\sigma) \ge 1 - \frac{1}{k^2}, \quad \text{for every } k > 0.$

When $k = 2$, we get that

$P(\mu - 2\sigma < X < \mu + 2\sigma) \ge \frac{3}{4},$

i.e. at least three fourths of the mass of any r.v. $X$ falls within two standard deviations of its mean.
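Chebyshev's bound holds for any distribution; the sketch below (our own illustration) estimates the two-standard-deviation probability for a skewed r.v. and compares it with the guaranteed 3/4:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)   # a skewed r.v.

mu, sigma = x.mean(), x.std()
inside = np.mean((mu - 2*sigma < x) & (x < mu + 2*sigma))
print(inside)   # ~0.95 here; Chebyshev guarantees only >= 0.75
```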
2.5. Properties of Variance
(i) $V(c) = 0$, where $c$ is a constant,

(ii) $V(cX) = c^2 V(X)$, where $c$ is a constant,

(iii) $V(X \pm Y) = V(X) + V(Y)$, where $X$ and $Y$ are independent r.v.s.
2.6. Example
Consider a random experiment where there are only two possible outcomes; the conventional practice is to call them success and failure. The r.v. $X$ will take the value 1, if the outcome is a success, with probability $p$; $X$ will take the value 0, if the outcome is a failure, with probability $1 - p$. The r.v. $X$ is called a Bernoulli random variable and the random experiment is called a Bernoulli trial.
The probability density function of $X$ takes the following form:

$f(x) = \begin{cases} p^x (1-p)^{1-x}, & x = 0, 1 \\ 0, & \text{otherwise} \end{cases}$

Therefore, we have that

$\mu = E(X) = \sum_{j} x_j f(x_j) = 1(p) + 0(1-p) = p,$

$E\left(X^2\right) = \sum_{j} x_j^2 f(x_j) = 1^2(p) + 0^2(1-p) = p,$

$\sigma^2 = V(X) = E\left(X^2\right) - E(X)^2 = p - p^2 = p(1-p), \quad \text{or}$

$V(X) = \sum_{j} (x_j - \mu)^2 f(x_j) = (1-p)^2 p + (0-p)^2 (1-p) = (1-p)p(1-p+p) = p(1-p).$

An example of a Bernoulli trial is a toss of a coin.
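A short simulation (our own illustration) confirms the Bernoulli moments for, say, $p = 0.3$:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.3
x = rng.binomial(1, p, size=1_000_000)   # Bernoulli(p) draws

print(x.mean())   # ~0.30 = p
print(x.var())    # ~0.21 = p * (1 - p)
```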
2.7. Example
Let $X$ be a continuous r.v. with the following p.d.f.

$f(x) = \begin{cases} \frac{1}{2}x, & 0 \le x \le 2 \\ 0, & \text{otherwise} \end{cases}$

The mean and the variance of $X$ are given by

$\mu = E(X) = \int_{0}^{2} x f(x)\, dx = \int_{0}^{2} \tfrac{1}{2}x^2\, dx = \left[\tfrac{1}{6}x^3\right]_{0}^{2} = \tfrac{4}{3},$

$E\left(X^2\right) = \int_{0}^{2} x^2 f(x)\, dx = \int_{0}^{2} \tfrac{1}{2}x^3\, dx = \left[\tfrac{1}{8}x^4\right]_{0}^{2} = 2,$

$\sigma^2 = V(X) = E\left(X^2\right) - E(X)^2 = 2 - \tfrac{16}{9} = \tfrac{2}{9}, \quad \text{and} \quad \sigma = \tfrac{\sqrt{2}}{3}.$

The distribution function of $X$ is given (for $0 \le x \le 2$) by

$F(x) = \int_{-\infty}^{x} f(u)\, du = \int_{0}^{x} \tfrac{1}{2}u\, du = \left[\tfrac{1}{4}u^2\right]_{0}^{x} = \tfrac{1}{4}x^2.$
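These integrals are easy to verify numerically; a minimal sketch (our own illustration, using scipy):

```python
from scipy.integrate import quad

f = lambda x: 0.5 * x                          # p.d.f. on [0, 2]

mu, _  = quad(lambda x: x * f(x), 0, 2)        # E(X)   = 4/3
ex2, _ = quad(lambda x: x**2 * f(x), 0, 2)     # E(X^2) = 2
print(mu, ex2 - mu**2)                         # 1.333..., 0.222... (= 2/9)

F = lambda x: quad(f, 0, x)[0]                 # F(x) = x^2 / 4 on [0, 2]
print(F(1.0))                                  # 0.25
```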
2.8. Example
Let $X$ be a continuous r.v. with p.d.f. constant on an interval and 0 elsewhere, i.e.

$f(x) = \begin{cases} k, & a < x < b \\ 0, & \text{elsewhere} \end{cases}$

Such a random variable is said to be uniformly distributed on the interval $[a, b]$. Note that $k$ is a function of $a$ and $b$:

$\int_{-\infty}^{\infty} f(x)\, dx = 1 \Rightarrow \int_{a}^{b} k\, dx = [kx]_{a}^{b} = k(b-a) = 1 \Rightarrow k = \frac{1}{b-a}.$

The expectation and the variance of $X$ are given by

$\mu = E(X) = \int_{-\infty}^{\infty} x f(x)\, dx = \int_{a}^{b} \frac{x}{b-a}\, dx = \frac{a+b}{2},$

$E\left(X^2\right) = \int_{a}^{b} \frac{x^2}{b-a}\, dx = \frac{b^3 - a^3}{3(b-a)},$

$\sigma^2 = V(X) = E\left(X^2\right) - E(X)^2 = \frac{(b-a)^2}{12}, \quad \text{and} \quad \sigma = \frac{b-a}{\sqrt{12}}.$
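Again, a quick numerical check (our own illustration) for, say, $a = 1$ and $b = 5$:

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = 1.0, 5.0
x = rng.uniform(a, b, size=1_000_000)

print(x.mean(), (a + b) / 2)        # ~3.0 vs 3.0
print(x.var(), (b - a)**2 / 12)     # ~1.33 vs 1.333...
```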
3. The Normal Distribution
We say that $X$ is a normal random variable, or simply that $X$ is normally distributed, with parameters $\mu$ and $\sigma^2$, if the density function of $X$ is given by

$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(x - \mu)^2}{2\sigma^2}\right\}, \quad -\infty < x < \infty.$
A normal r.v. is a continuous one. The parameters $\mu$ and $\sigma^2$ represent the mean and variance of $X$, respectively. The above p.d.f. is a bell-shaped curve that is symmetric about $\mu$. Alternatively, we can write

$X \sim N\left(\mu, \sigma^2\right),$

which reads as "$X$ follows the normal with mean $\mu$ and variance $\sigma^2$".
The normal distribution was introduced by the French mathematician Abraham de Moivre, back in the 18th century, and was used by him to approximate probabilities associated with binomial random variables when the binomial parameter $n$ is large. This result was later extended by Laplace and others and is now encompassed in a probability theorem known as the Central Limit Theorem (C.L.T.). The C.L.T. gives a theoretical base to the often noted empirical observation that many random phenomena obey, at least approximately, a normal probability distribution.
By the beginning of the 19th century, the work of Gauss on the theory of errors placed the normal distribution at the centre of probability theory.
An important fact about normal random variables is that if $X$ is normally distributed with parameters $\mu$ and $\sigma^2$, then a linear transformation of $X$, $Y = \alpha X + \beta$, is normally distributed with parameters $\alpha\mu + \beta$ and $\alpha^2\sigma^2$, i.e.

$\left. \begin{array}{l} X \sim N(\mu, \sigma^2) \\ Y = \alpha X + \beta \end{array} \right\} \Rightarrow Y \sim N\left(\alpha\mu + \beta,\ \alpha^2\sigma^2\right),$

since $E(Y) = E(\alpha X + \beta) = \alpha E(X) + \beta = \alpha\mu + \beta,$
and $V(Y) = V(\alpha X + \beta) = \alpha^2 V(X) + 0 = \alpha^2\sigma^2.$
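The sketch below (our own illustration) checks this numerically for one choice of $\alpha$ and $\beta$:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 2.0, 3.0
alpha, beta = 0.5, 1.0

y = alpha * rng.normal(mu, sigma, size=1_000_000) + beta

print(y.mean())   # ~2.00 = alpha * mu + beta
print(y.var())    # ~2.25 = alpha**2 * sigma**2
```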
An important implication of the preceding result is that

$\left. \begin{array}{l} \text{if } X \sim N(\mu, \sigma^2) \\ \text{and } Z = \frac{X - \mu}{\sigma} \end{array} \right\} \text{ then } Z \sim N(0, 1),$

since $E(Z) = E\left(\frac{X}{\sigma}\right) - E\left(\frac{\mu}{\sigma}\right) = \frac{\mu}{\sigma} - \frac{\mu}{\sigma} = 0,$

and $V(Z) = V\left(\frac{X}{\sigma}\right) + V\left(\frac{\mu}{\sigma}\right) = \frac{V(X)}{\sigma^2} + 0 = \frac{\sigma^2}{\sigma^2} = 1.$
Such a random variable $Z$ is said to have the standard, or unit, normal distribution. It is traditional to denote the distribution function of a standard normal r.v. by $\Phi(x)$. From the symmetry of the standard normal distribution it follows that

$\Phi(-x) = 1 - \Phi(x),$

in other words that

$P(Z \le -x) = P(Z > x).$

When $X \sim N(\mu, \sigma^2)$, then the distribution function of $X$, $F(x)$, can be expressed as:

$F(a) = P(X \le a) = P\left(\frac{X - \mu}{\sigma} \le \frac{a - \mu}{\sigma}\right) = P\left(Z \le \frac{a - \mu}{\sigma}\right) = \Phi\left(\frac{a - \mu}{\sigma}\right).$
3.1. Example
Let $X \sim N(3, 9)$. Find (i) $P(2 < X < 5)$, (ii) $P(X > 0)$, (iii) $P(|X - 3| > 6)$.

(i) $P(2 < X < 5) = P\left(\frac{2-3}{3} < Z < \frac{5-3}{3}\right) = P\left(-\tfrac{1}{3} < Z < \tfrac{2}{3}\right) = 0.1293 + 0.2486 = 0.3779.$

(ii) $P(X > 0) = P\left(Z > \frac{0-3}{3}\right) = P(Z > -1) = P(Z \le 1) = P(Z \le 0) + P(0 < Z \le 1) = 0.5 + 0.3413 = 0.8413.$

(iii) $P(|X - 3| > 6) = P(X > 9) + P(X < -3) = P\left(Z > \frac{9-3}{3}\right) + P\left(Z < \frac{-3-3}{3}\right) = P(Z > 2) + P(Z < -2) = 2(0.0228) \approx 0.05.$
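These tail areas can be checked against a statistical library; a minimal sketch (our own illustration, using scipy.stats):

```python
from scipy.stats import norm

X = norm(loc=3, scale=3)       # N(3, 9): scale is the standard deviation

print(X.cdf(5) - X.cdf(2))     # (i)   ~0.3780
print(X.sf(0))                 # (ii)  ~0.8413  (sf = 1 - cdf)
print(X.cdf(-3) + X.sf(9))     # (iii) ~0.0455
```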
4. Functions of Normally Distributed Random Variables: A Summary
(1) If $x_i \sim N(\mu_i, \sigma_i^2)$, $i = 1, \dots, n$, are independent r.v.s, then

$\sum_{i=1}^{n} x_i \sim N\left(\sum_{i=1}^{n} \mu_i,\ \sum_{i=1}^{n} \sigma_i^2\right)$  [normal]

(2) If $x_i \sim N(0, 1)$, $i = 1, \dots, n$, are independent r.v.s, then

$\sum_{i=1}^{n} x_i^2 \sim \chi^2(n)$  [chi-square with $n$ degrees of freedom]

(3) If $x_1 \sim N(0, 1)$, $x_2 \sim \chi^2(n)$, and $x_1$ and $x_2$ are independent r.v.s, then

$\frac{x_1}{\sqrt{x_2/n}} \sim t(n)$  [Student's t with $n$ degrees of freedom]

(4) If $x_1 \sim \chi^2(n_1)$, $x_2 \sim \chi^2(n_2)$, and $x_1$ and $x_2$ are independent r.v.s, then

$\frac{x_1/n_1}{x_2/n_2} \sim F(n_1, n_2)$  [Fisher's F with $n_1$ and $n_2$ degrees of freedom]
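As an illustration of result (2) (our own sketch, not from the notes), the code below builds chi-square draws from standard normals and compares the sample mean and variance with the theoretical values $n$ and $2n$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 5, 200_000

z = rng.standard_normal((reps, n))
chi2 = (z**2).sum(axis=1)     # each row: sum of n squared N(0,1) draws

print(chi2.mean())   # ~5,  since E[chi-square(n)] = n
print(chi2.var())    # ~10, since V[chi-square(n)] = 2n
```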
5. Random Vectors and their Distributions: a Summary
There are many observable phenomena where the outcome comes in the form of
several quantitative attributes. For example, data on personal income might be
related to social class, type of occupation, age class, etc. In order to be able to
model such real phenomena we need to extend the above framework for a single
r.v. to one for multidimensional r.v.s or random vectors.
For expositional purposes we shall restrict attention to the two-dimensional (bivariate) case, which is adequate for a proper understanding of the concepts involved.
Consider the random experiment of tossing a fair coin twice (the sample space is given in Section 1). Define the function $X(\cdot)$ to be the number of heads, and $Y(\cdot)$ to be the number of tails. Both of these functions map $\Omega$ into the real line $\mathbb{R}$. The bivariate random variable (or two-dimensional random vector) $(X, Y)$ can be considered as a function which assigns to each element of $\Omega$ a pair of ordered numbers $(x, y)$, i.e. $(X(\cdot), Y(\cdot)) : \Omega \to \mathbb{R}^2$.
If both the r.v.s $X$ and $Y$ are discrete, then $(X, Y)$ is called a discrete bivariate r.v. If both the r.v.s $X$ and $Y$ are continuous, then $(X, Y)$ is called a continuous bivariate r.v.
The Distribution Function of the vector of random variables is called the joint distribution function of the r.v.s:

$F(x, y) = \Pr(X \le x, Y \le y).$

The Probability Density Function (p.d.f.) of the random vector is called the joint probability density function of the r.v.s:

$f(x, y) = \Pr(X = x, Y = y)$  (discrete case);

$f(x, y) = \frac{\partial^2 F(x, y)}{\partial x\, \partial y}$  (continuous case).
The joint probability density function $f(x, y)$ of the above example is:

         Y = 0   Y = 1   Y = 2
X = 0      0       0      1/4
X = 1      0      1/2      0
X = 2     1/4      0       0

For instance, $\Pr(X = 0, Y = 2) = 1/4$: no heads means two tails.
A marginal probability density function is defined with respect to an individual random variable. Knowing the joint p.d.f., we can obtain the marginal p.d.f. of one r.v. by summing or integrating out the other r.v.:

$f_x(x) = \begin{cases} \sum_{y} f(x, y), & \text{discrete case} \\ \int_{y} f(x, y)\, dy, & \text{continuous case} \end{cases}$

$f_y(y) = \begin{cases} \sum_{x} f(x, y), & \text{discrete case} \\ \int_{x} f(x, y)\, dx, & \text{continuous case} \end{cases}$
Two random variables are statistically independent if and only if their joint density is the product of the marginal densities:

$f(x, y) = f_x(x) f_y(y) \iff X \text{ and } Y \text{ are independent}.$
The covariance ($\sigma_{xy}$) provides a measure of the linear relationship between the two random variables:

$\sigma_{xy} = \operatorname{Cov}(X, Y) = E\left[(X - \mu_x)(Y - \mu_y)\right] = E(XY) - \mu_x \mu_y, \quad \text{where } \mu_x = E(X), \ \mu_y = E(Y).$

If $X$ and $Y$ are independent r.v.s, then it holds that $E(XY) = E(X)E(Y)$, i.e. $\operatorname{Cov}(X, Y) = 0$. In other words independence implies linear independence; the converse is not true.

Properties of the Covariance operator:

$\operatorname{Cov}(c, X) = 0,$

$\operatorname{Cov}(X, X) = \operatorname{Var}(X),$

$\operatorname{Cov}(aX, bY) = ab \operatorname{Cov}(X, Y),$

$\operatorname{Cov}(aX + bY, cX + dY) = ac \operatorname{Var}(X) + bd \operatorname{Var}(Y) + (ad + bc) \operatorname{Cov}(X, Y),$

where $a, b, c, d$ are constants.
A standardised form of covariance is the correlation coefficient $\rho_{xy}$:

$\rho_{xy} = \frac{\sigma_{xy}}{\sqrt{\sigma_x^2 \sigma_y^2}}, \quad \text{where } \sigma_x^2 = \operatorname{Var}(X), \ \sigma_y^2 = \operatorname{Var}(Y).$

Note that $-1 \le \rho_{xy} \le 1$. If $X$ and $Y$ are independent r.v.s, then they are also uncorrelated. Note that in this case we have that $\operatorname{Var}(X \pm Y) = \operatorname{Var}(X) + \operatorname{Var}(Y)$. If $\rho_{xy} = 1$ we say that $X$ and $Y$ are perfectly positively correlated; if $\rho_{xy} = -1$ we say that $X$ and $Y$ are perfectly negatively correlated.
Conditioning and the use of conditional distributions play a pivotal role in econometric modelling. In a bivariate distribution, there is a conditional distribution over $Y$ for each value of $X$. We define the conditional probability density function of $Y$ given $X$ as

$f(y \mid x) = \frac{f(x, y)}{f_x(x)}.$

Similarly to its unconditional density, the conditional density of $Y$ on $X$ can be numerically characterised by its conditional expectation, $E(Y \mid X)$, and its conditional variance, $\operatorname{Var}(Y \mid X)$. If $X$ and $Y$ are independent, $f(y \mid x) = f_y(y)$. Furthermore, the definition of conditional densities implies the important result

$f(x, y) = f(y \mid x) f_x(x).$
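To tie these bivariate concepts together, the sketch below (our own illustration, using numpy) computes the marginals, the covariance, and a conditional distribution from the joint table of the coin-tossing example:

```python
import numpy as np

# Joint p.d.f. of (X, Y) for two coin tosses: rows X = 0,1,2; cols Y = 0,1,2.
f = np.array([[0,   0,   1/4],
              [0,   1/2, 0  ],
              [1/4, 0,   0  ]])
vals = np.array([0, 1, 2])

fx = f.sum(axis=1)            # marginal of X: [1/4, 1/2, 1/4]
fy = f.sum(axis=0)            # marginal of Y: [1/4, 1/2, 1/4]

mx, my = vals @ fx, vals @ fy # E(X) = E(Y) = 1
exy = vals @ f @ vals         # E(XY) = sum over x, y of x * y * f(x, y) = 1/2
print(exy - mx * my)          # Cov(X, Y) = -1/2, since Y = 2 - X

print(f[0] / fx[0])           # f(y | X = 0) = [0, 0, 1]: given no heads, Y = 2
```

Note the negative covariance: with two tosses, more heads necessarily means fewer tails.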