
STAT272 2013
Probability

Topic 3: Continuous Random Variables

Continuous random variables

We've seen examples of discrete random variables which take on an infinite but countable number of values. There are rvs which take on an uncountable number of values. Since random variables of this type can have a continuum of possible values, they are called continuous random variables. Continuous random variables can take any value over some range.

Two examples:

- the time that a train arrives at a specified stop;
- the lifetime of a transistor.

It is possible for a rv to be a mixture of a continuous and discrete rv. Examples:

- Amount of rainfall in a day. On a dry day, the amount of rainfall is exactly zero; on a wet day the amount of rainfall is continuous on (0, ∞).
- The length of time it takes for a customer to commence service in a single-server queue with continuously distributed service times. There is a non-zero probability that the queue is empty and therefore a strictly positive probability that the time until commencement of service is zero. However, if the queue is non-empty, the time until commencement of service is continuous on (0, ∞).

A random variable X is said to be continuous if it takes on an uncountably infinite number of values, and if there is a function f_X(x), called the probability density function, or pdf, such that:

1. f_X(x) ≥ 0 for all x ∈ R;
2. ∫_{-∞}^{∞} f_X(x) dx = 1;
3. P(a < X ≤ b) = ∫_a^b f_X(x) dx.

Probabilities are represented by areas under the pdf.

[Figure: a pdf curve f_X(x), with the area under the curve between x = 1 and x = 2 shaded to represent P(1 < X < 2).]

Important: The height of f_X(x) is never interpreted as P(X = x) for a continuous rv. Note that this is not true for discrete distributions.

The Uniform Distribution U(0, 1)

f_X(x) = 1,  0 ≤ x ≤ 1
       = 0,  otherwise.

Every value between 0 and 1 is equally likely.

[Figure: the U(0,1) pdf, equal to 1 on (0, 1) and 0 elsewhere.]

If X is uniformly distributed on [0, 1], then, if a ≥ 0 and b ≤ 1,

P(a ≤ X ≤ b) = P(a < X < b) = ∫_a^b 1 dx = [x]_a^b = b - a.

[Figure: the U(0,1) pdf with the area between a and b shaded; the shaded area is b - a.]

From above, for any a,

P(X = a) = lim_{b→a⁺} P(a ≤ X ≤ b) = lim_{b→a⁺} (b - a) = 0.

This will be true for any continuous rv, i.e. if X is a continuous rv, then P(X = x) = 0 for all x.

Cumulative Distribution Function (cdf)

For any rv X, continuous, discrete, or neither, let F_X be defined by

F_X(x) = P(X ≤ x).

F_X(x) is called the distribution function of X, or the cumulative distribution function (cdf) of X.

e.g. For the uniform random variable above:

When b < 0, P(X ≤ b) = 0.

When 0 < b < 1,

P(X ≤ b) = ∫_0^b 1 dx = [x]_0^b = b.

When b > 1, P(X ≤ b) = 1.

Therefore we can write the cdf as

F_X(b) = P(X ≤ b) = 0,  b ≤ 0
                  = b,  0 < b < 1
                  = 1,  b ≥ 1.
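As a sanity check, the piecewise cdf above translates directly into code. A minimal sketch (the function name is ours, not from the notes):

```python
def uniform01_cdf(b):
    """cdf of X ~ U(0,1): P(X <= b), matching the piecewise form above."""
    if b <= 0:
        return 0.0
    if b >= 1:
        return 1.0
    return b  # on (0, 1) the cdf is just b

# One value from each of the three regimes on the slide:
print(uniform01_cdf(-0.5), uniform01_cdf(0.3), uniform01_cdf(2.0))  # 0.0 0.3 1.0
```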

Properties of a cdf

The domain of the cdf is R. Note that F_X(x) needs to be specified for all values of x, and not just the values of x for which x is a possible value of X.

F_X(-∞) = lim_{x→-∞} P(X ≤ x) = 0.

F_X(∞) = lim_{x→∞} P(X ≤ x) = 1.

Let x₂ ≥ x₁. Then F_X(x₂) ≥ F_X(x₁).

Proof:

F_X(x₂) = P(X ≤ x₂) = P({X ≤ x₁} ∪ {x₁ < X ≤ x₂}).

Since (-∞, x₁] and (x₁, x₂] are mutually exclusive,

F_X(x₂) = P(X ≤ x₁) + P(x₁ < X ≤ x₂) ≥ P(X ≤ x₁),

since all probabilities are non-negative. Thus

F_X(x₂) ≥ F_X(x₁).

If X has lowest possible value ℓ and greatest possible value u, then F_X(x) = 0 for x < ℓ and F_X(u) = 1.

Example of shapes of cdfs

The picture below is of the cdf of a discrete rv. Since X is discrete, F_X(x) is a step-function.

[Figure: step-function cdf of a discrete rv.]

The picture below is of the cdf of a continuous rv. Since X is continuous, F_X(x) is continuous.

[Figure: continuous cdf of a continuous rv.]

Probability Density Functions (pdf)

The probability of observing a specific value x of a continuous random variable X is zero. It is thus the cdf, F_X(x), which is used to define probabilities.

For a continuous rv X,

F_X(x) = ∫_{-∞}^x f_X(u) du.

Note that the dummy in the integral is u, not x.

The fundamental theorem of calculus gives us the relationship

d/dx F_X(x) = f_X(x),

i.e. the probability density function f_X is the derivative of the cdf F_X.

When X is discrete, there is also a relationship between f_X and F_X. For each x,

f_X(x) = F_X(x) - lim_{y→x⁻} F_X(y).

The Uniform Distribution U(a, b)

Suppose that a random variable X is equally likely to lie in any small sub-interval of (a, b), and that it cannot lie outside this interval. We then say that X ~ U(a, b), and have

f_X(x) = 1/(b - a),  a ≤ x ≤ b
       = 0,          otherwise.

Thus, if x ∈ (a, b), we have

F_X(x) = ∫_{-∞}^x f_X(u) du = ∫_{-∞}^a 0 du + ∫_a^x 1/(b - a) du
       = 0 + (1/(b - a)) [u]_{u=a}^x
       = (x - a)/(b - a).

Obviously, we have F_X(x) = 0 if x ≤ a and F_X(x) = 1 if x ≥ b. Hence the proper specification of F_X is

F_X(x) = 0,                x < a
       = (x - a)/(b - a),  a ≤ x ≤ b
       = 1,                x > b.

Checks:

F_X(a⁺) = (a - a)/(b - a) = 0  (lower limit from above)
F_X(b⁻) = (b - a)/(b - a) = 1  (upper limit from below)

Note: The cdf is continuous, but it is not differentiable at a or b.
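The U(a, b) cdf and the endpoint checks above can be verified numerically. A small sketch (the illustrative values a = 2, b = 5 are our choice):

```python
def uniform_cdf(x, a, b):
    """cdf of X ~ U(a, b), as specified above."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

a, b = 2.0, 5.0
eps = 1e-9
# Endpoint checks from the slide: F(a+) is near 0, F(b-) is near 1
print(uniform_cdf(a + eps, a, b), uniform_cdf(b - eps, a, b))
print(uniform_cdf(3.5, a, b))  # midpoint: 0.5
```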

The Triangular Distribution (Symmetric)

If two independent (this will be defined later) rvs are both U(a, b), then their sum is triangular on (2a, 2b). We shall prove this later in more generality. The pdf of the sum of two independent rvs is known as the convolution of the pdfs of the two rvs.

The pdf is given in the picture below, where c = 2a and d = 2b.

[Figure: symmetric triangular pdf on (c, d) = (2a, 2b).]

For the case where a = 0 and b = 1, we have

f_X(x) = x,      0 < x ≤ 1
       = 2 - x,  1 < x ≤ 2
       = 0,      otherwise.

The cdf F_X(x) is a little complicated. When x ∈ (0, 1),

F_X(x) = ∫_0^x u du = [u²/2]_0^x = x²/2.

while, when x ∈ (1, 2),

F_X(x) = P(X ≤ 1) + ∫_1^x (2 - u) du
       = F_X(1) + [2u - u²/2]_1^x
       = 1/2 + (2x - x²/2) - (2 - 1/2)
       = 2x - x²/2 - 1
       = 1 - (2 - x)²/2.

Thus

F_X(x) = 0,                x ≤ 0
       = x²/2,             0 < x ≤ 1
       = 1 - (2 - x)²/2,   1 < x ≤ 2
       = 1,                x > 2.

Note the symmetry.

Checks:

F_X(0⁺) = 0²/2 = 0
F_X(1⁻) = 1²/2 = 1/2
F_X(1⁺) = 1 - (2 - 1)²/2 = 1/2
F_X(2⁻) = 1 - (2 - 2)²/2 = 1
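The triangular cdf above is easy to check in code, both against the algebra and by simulating the sum of two independent U(0,1) variables. A sketch (the sample size and seed are arbitrary choices):

```python
import random

def tri_cdf(x):
    """cdf of the sum of two independent U(0,1) rvs (the a = 0, b = 1 case above)."""
    if x <= 0:
        return 0.0
    if x <= 1:
        return x * x / 2
    if x <= 2:
        return 1 - (2 - x) ** 2 / 2
    return 1.0

# Continuity and endpoint checks from the slide
assert tri_cdf(1) == 0.5 and tri_cdf(2) == 1.0

# Monte Carlo check: simulate U(0,1) + U(0,1)
random.seed(0)
n = 100_000
hits = sum((random.random() + random.random()) <= 1.5 for _ in range(n))
print(hits / n, tri_cdf(1.5))  # both should be near 0.875
```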

The Cauchy Distribution

What value of k makes k/(1 + x²) a valid pdf on R?

First, does the function satisfy the non-negativity condition?



Obviously (provided k > 0). What about its integral?

∫_{-∞}^∞ k/(1 + x²) dx = k [tan⁻¹ x]_{-∞}^∞ = k [π/2 - (-π/2)] = kπ.

The function is thus a pdf if kπ = 1, i.e. if k = 1/π. The pdf of a Cauchy random variable is thus

f_X(x) = 1/(π(1 + x²)).

What is the cdf?

For each x,

F_X(x) = ∫_{-∞}^x f(u) du = ∫_{-∞}^x 1/(π(1 + u²)) du
       = (1/π) [tan⁻¹ u]_{-∞}^x
       = (1/π) [tan⁻¹ x - (-π/2)]
       = 1/2 + (1/π) tan⁻¹ x.

Checks:

F_X(-∞) = 1/2 + (1/π) tan⁻¹(-∞) = 1/2 + (1/π)(-π/2) = 0,

F_X(∞) = 1/2 + (1/π) tan⁻¹(∞) = 1/2 + (1/π)(π/2) = 1,

F_X(0) = 1/2 + (1/π) tan⁻¹(0) = 1/2 + 0 = 1/2.
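The Cauchy cdf and its checks can be reproduced with the standard library. A minimal sketch:

```python
import math

def cauchy_cdf(x):
    """F_X(x) = 1/2 + (1/pi) * arctan(x), as derived above."""
    return 0.5 + math.atan(x) / math.pi

# The checks from the slide: F(0) = 1/2 and the tail limits
print(cauchy_cdf(0.0))    # 0.5
print(cauchy_cdf(-1e12))  # close to 0
print(cauchy_cdf(1e12))   # close to 1
```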

The Exponential Distribution

Let V be the time between two successive events in a Poisson process {X(t)} with parameter λ.

As the numbers of events in non-overlapping intervals are independent, this will have the same distribution as the time it takes a single event to occur, given that X(0) = 0.

Consider the event {V > v}. This is the same as the event {X(v) = 0}. To see this, note that if X(v) = 0, then X(t) = 0 for all t ≤ v, and so V > v. Conversely, if V > v, then the time at which the next event occurs is greater than v, and so X(v) = 0.


Now

P(V > v) = P(X(v) = 0 | X(0) = 0) = e^{-λv} (λv)⁰ / 0! = e^{-λv}.

Hence, for v > 0,

F_V(v) = P(V ≤ v) = 1 - e^{-λv}.

Thus

f_V(v) = d/dv (1 - e^{-λv}) = λ e^{-λv}.

The pdf

f_V(v) = λ e^{-λv},  v > 0
       = 0,          v ≤ 0

is known as the exponential with parameter λ.
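A small numerical sketch of the exponential distribution: the pdf λe^{-λv} should match the derivative of the cdf 1 - e^{-λv} (the rate λ = 1.5 is an arbitrary illustration):

```python
import math

lam = 1.5  # an arbitrary rate, for illustration

def exp_pdf(v):
    return lam * math.exp(-lam * v) if v > 0 else 0.0

def exp_cdf(v):
    return 1 - math.exp(-lam * v) if v > 0 else 0.0

# The pdf should be the derivative of the cdf (central difference)
v, h = 0.8, 1e-6
numeric = (exp_cdf(v + h) - exp_cdf(v - h)) / (2 * h)
print(numeric, exp_pdf(v))  # should agree to several decimals
```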

Some examples of quantities that have been modelled as exponentially distributed random variables:

- The time it takes for someone to pick up a ringing phone;
- The amount of time until two vehicles meet at a one-lane bridge;
- The amount of time (from now) until an earthquake occurs.

Lack of Memory or Memoryless Property

Let X be distributed exponentially with parameter λ. Consider the conditional probability

P(X > a + b | X > a).

This is the probability that X is at least a + b, given that we know X is at least a. [Note: a and b must both be ≥ 0.]

By definition,

P(X > a + b | X > a) = P[{X > a + b} ∩ {X > a}] / P(X > a)
                     = P(X > a + b) / P(X > a),

by the rule of Conditional Probability and noting that the event {X > a + b} is a subset of the event {X > a}. Thus

P(X > a + b | X > a) = [1 - F_X(a + b)] / [1 - F_X(a)]
                     = [1 - P(X ≤ a + b)] / [1 - P(X ≤ a)]
                     = [1 - (1 - e^{-λ(a+b)})] / [1 - (1 - e^{-λa})]
                     = e^{-λ(a+b)} / e^{-λa}
                     = e^{-λb}
                     = 1 - (1 - e^{-λb})
                     = 1 - F_X(b)
                     = P(X > b).

Thus

P(X > a + b | X > a) = P(X > b),

provided a and b are both non-negative.

How do we interpret this? Suppose we have modelled the life of a lightbulb as being exponentially distributed with parameter 1 (year). Suppose the lightbulb has been working for 1 year. The above shows that the remaining life is also exponentially distributed with parameter 1 (year). We say that the exponential distribution has no memory. This should indicate that the exponential model is probably not a very good model for the life of a lightbulb.
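The memoryless property is easy to confirm numerically: the conditional survival probability P(X > a + b | X > a) equals P(X > b) exactly. A sketch (the values of λ, a and b are arbitrary):

```python
import math

lam = 2.0
survival = lambda t: math.exp(-lam * t)  # P(X > t) for X ~ exponential(lam)

a, b = 0.7, 1.3
conditional = survival(a + b) / survival(a)  # P(X > a+b | X > a)
print(conditional, survival(b))  # identical: the memoryless property
```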

The Erlangian Distribution

Now consider the total time until the second event occurs in a Poisson process. Let W be the time until the second event occurs in a Poisson process {X(t)} with parameter λ, when X(0) = 0. Now

P(W > w) = P(2nd event occurs at a time > w) = P(X(w) = 0 or 1).

Thus

1 - F_W(w) = P(0 events in (0, w)) + P(1 event in (0, w))
           = e^{-λw} (λw)⁰ / 0! + e^{-λw} (λw)¹ / 1!
           = e^{-λw} (1 + λw),

and so

F_W(w) = 1 - e^{-λw} (1 + λw),  w > 0
       = 0,                     w ≤ 0.

The pdf of W is found by differentiating:

f_W(w) = d/dw F_W(w)
       = λ e^{-λw} (1 + λw) - λ e^{-λw}
       = λ² w e^{-λw},  w > 0.

Thus

f_W(w) = λ² w e^{-λw},  w > 0
       = 0,             w ≤ 0.

Generalisation: What is the pdf of the time until the kth event in a Poisson process with parameter λ?
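For the k = 2 case just derived, the pdf λ²we^{-λw} should agree with a numerical derivative of F_W(w) = 1 - e^{-λw}(1 + λw). A sketch (λ = 2 is an arbitrary choice):

```python
import math

lam = 2.0

def erlang2_cdf(w):
    """cdf of the time to the 2nd event, as derived above."""
    return 1 - math.exp(-lam * w) * (1 + lam * w) if w > 0 else 0.0

def erlang2_pdf(w):
    """pdf lam^2 * w * exp(-lam * w), as derived above."""
    return lam**2 * w * math.exp(-lam * w) if w > 0 else 0.0

w, h = 1.2, 1e-6
numeric = (erlang2_cdf(w + h) - erlang2_cdf(w - h)) / (2 * h)
print(numeric, erlang2_pdf(w))  # should agree to several decimals
```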

The Gamma Function

The gamma function is defined by

Γ(α) = ∫_0^∞ x^{α-1} e^{-x} dx,

where the condition α > 0 is needed for the integral to be finite.

Properties:

Γ(1) = ∫_0^∞ e^{-x} dx = [-e^{-x}]_0^∞ = (0) - (-1) = 1.

Now

Γ(2) = ∫_0^∞ x e^{-x} dx = ∫_0^∞ x (-d e^{-x})
     = [-x e^{-x}]_0^∞ + ∫_0^∞ e^{-x} dx
     = [(0) - (0)] + Γ(1).

Thus Γ(2) = 1.

More generally, when α > 0,

Γ(α + 1) = α Γ(α).

Proof:

Γ(α + 1) = ∫_0^∞ x^α e^{-x} dx = ∫_0^∞ x^α (-d e^{-x})
         = [-x^α e^{-x}]_0^∞ + α ∫_0^∞ x^{α-1} e^{-x} dx
         = 0 + α Γ(α).

Note that for integral α,

Γ(α) = (α - 1) Γ(α - 1)
     = (α - 1)(α - 2) Γ(α - 2)
     = (α - 1)(α - 2) ... (1) Γ(1)
     = (α - 1)!.

If α > 0, but α is not integral, then the function is still defined. In fact, as long as we know the values of Γ(x) for x ∈ (0, 1), we can calculate all values of Γ(x). For example,

Γ(7/2) = (5/2) Γ(5/2)
       = (5/2)(3/2) Γ(3/2)
       = (5/2)(3/2)(1/2) Γ(1/2)
       = (15/8) Γ(1/2).

It can be shown that Γ(1/2) = √π.
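Python's standard library exposes the gamma function as math.gamma, so the recursion, the factorial identity and Γ(1/2) = √π can all be checked directly:

```python
import math

# Recursion: Gamma(a + 1) = a * Gamma(a), for an arbitrary a > 0
a = 2.7
print(math.gamma(a + 1), a * math.gamma(a))

# Integer case: Gamma(n) = (n - 1)!
print(math.gamma(6), math.factorial(5))

# Gamma(1/2) = sqrt(pi), hence Gamma(7/2) = (15/8) * sqrt(pi)
print(math.gamma(0.5), math.sqrt(math.pi))
print(math.gamma(3.5), 15 / 8 * math.sqrt(math.pi))
```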

Let W = W_k be the time until the occurrence of the kth event in a Poisson process with parameter λ. Then, for k = 1, 2, 3, ...,

1 - F_W(w) = P(W > w)
           = P(0 or 1 or ... or (k - 1) events occur in (0, w))
           = Σ_{r=0}^{k-1} e^{-λw} (λw)^r / r!.

(Note that the kth event has to occur at a time > w.) Hence

F_W(w) = 1 - e^{-λw} Σ_{r=0}^{k-1} (λw)^r / r!,  w > 0
       = 0,                                      w ≤ 0.

and

f_W(w) = d/dw [1 - e^{-λw} Σ_{r=0}^{k-1} (λw)^r / r!]
       = λ e^{-λw} Σ_{r=0}^{k-1} (λw)^r / r! - e^{-λw} Σ_{r=1}^{k-1} λ r (λw)^{r-1} / r!
       = λ e^{-λw} [Σ_{r=0}^{k-1} (λw)^r / r! - Σ_{s=0}^{k-2} (λw)^s / s!]
       = λ e^{-λw} (λw)^{k-1} / (k - 1)!
       = λ^k w^{k-1} e^{-λw} / (k - 1)!.

Special cases:

1. k = 1. W is exponential with parameter λ. As expected,

f_W(w) = λ e^{-λw},  w > 0
       = 0,          w ≤ 0.

2. k integral. W is said to have the Erlangian distribution. The rv W can be thought of as the sum of k independent and identically distributed exponential rvs, each with parameter λ. This has resulted in the Erlangian distribution being used in instances where the lack of memory of the exponential prevents it from being used.

f_W(w) = λ^k w^{k-1} e^{-λw} / (k - 1)!,  w > 0
       = 0,                               w ≤ 0.

3. Although we have assumed that k is an integer in the above, we know that when α > 0, the function

f(w) = λ^α w^{α-1} e^{-λw} / Γ(α),  w > 0
     = 0,                           w ≤ 0

is a pdf, since, using the substitution x = λw,

∫_0^∞ λ^α w^{α-1} e^{-λw} / Γ(α) dw = ∫_0^∞ x^{α-1} e^{-x} / Γ(α) dx = 1.

4. The parameter α is also known as the shape parameter. It is usual to put β = 1/λ; β is called the scale parameter. The pdf is then known as the Gamma distribution with parameters α and β:

f_X(x) = x^{α-1} e^{-x/β} / (Γ(α) β^α).

This is denoted as X ~ G(α, β). Some shapes of the Gamma pdf are shown below.

[Figure: Gamma pdfs f_X(x) for (α = 1, β = 5), (α = 2, β = 5) and (α = 3, β = 5).]

The Beta Distribution

This is a two-parameter density function defined over the closed interval [0, 1]. As such, it is often used as a model for proportions, such as the proportion of impurities in a chemical product or the proportion of time that a machine is in a state of being repaired, etc.

The beta function

The beta function is defined by

B(α, β) = ∫_0^1 x^{α-1} (1 - x)^{β-1} dx.

For the existence of the integral, we need α > 0 and β > 0.

We'll now show that

B(α, β) = Γ(α) Γ(β) / Γ(α + β).

By definition,

Γ(α) Γ(β) = ∫_0^∞ e^{-x} x^{α-1} dx ∫_0^∞ e^{-y} y^{β-1} dy
          = ∫_0^∞ ∫_0^∞ e^{-x} x^{α-1} e^{-y} y^{β-1} dy dx.

This is the volume between the surface z = e^{-x} x^{α-1} e^{-y} y^{β-1} and the x-y plane.

Let's make the change of variable u = x + y in the inner integral. Note that dy = du. The above integral is then

∫_0^∞ ∫_x^∞ e^{-u} x^{α-1} (u - x)^{β-1} du dx.

Changing the order of integration, the integral becomes

∫_0^∞ ∫_0^u e^{-u} x^{α-1} (u - x)^{β-1} dx du.

Now put x = uz. We get dx = u dz and the integral becomes

∫_0^∞ ∫_0^1 e^{-u} (uz)^{α-1} (u - uz)^{β-1} u dz du
  = ∫_0^∞ u^{α+β-1} e^{-u} [∫_0^1 z^{α-1} (1 - z)^{β-1} dz] du
  = Γ(α + β) B(α, β).

Hence

Γ(α) Γ(β) = Γ(α + β) B(α, β)

and

B(α, β) = Γ(α) Γ(β) / Γ(α + β).
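The identity B(α, β) = Γ(α)Γ(β)/Γ(α + β) can be checked numerically by comparing a midpoint-rule approximation of the beta integral with math.gamma. A sketch (the parameter values and grid size are arbitrary):

```python
import math

def beta_numeric(a, b, n=100_000):
    """Midpoint-rule approximation of B(a, b) = integral_0^1 x^(a-1) (1-x)^(b-1) dx."""
    h = 1.0 / n
    return sum(((i + 0.5) * h) ** (a - 1) * (1 - (i + 0.5) * h) ** (b - 1)
               for i in range(n)) * h

a, b = 2.5, 3.0
via_gamma = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
print(beta_numeric(a, b), via_gamma)  # the two values should agree closely
```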

For the special case where α = β = 1/2, we obtain, since Γ(1) = ∫_0^∞ e^{-x} dx = 1,

{Γ(1/2)}² = Γ(1) B(1/2, 1/2) = B(1/2, 1/2).

Now

B(1/2, 1/2) = ∫_0^1 x^{-1/2} (1 - x)^{-1/2} dx.

Put x = sin²θ. Then dx = 2 sin θ cos θ dθ and so

B(1/2, 1/2) = ∫_0^{π/2} (1 / (sin θ cos θ)) 2 sin θ cos θ dθ = 2 ∫_0^{π/2} dθ = π.

We thus have

{Γ(1/2)}² = π,

and so

Γ(1/2) = ∫_0^∞ e^{-x} / √x dx = √π.

We'll need this soon.

The Beta Distribution

The random variable X is said to follow the beta distribution with parameters α and β (α > 0 and β > 0) if

f_X(x) = x^{α-1} (1 - x)^{β-1} / B(α, β),  x ∈ (0, 1)
       = 0,                                elsewhere,

i.e.

f_X(x) = Γ(α + β) x^{α-1} (1 - x)^{β-1} / (Γ(α) Γ(β)),  x ∈ (0, 1)
       = 0,                                              elsewhere.

The graphs of a few beta pdfs are shown below.

[Figure: beta pdfs for several (α, β) pairs.]

The standard uniform distribution (on the range 0 to 1) is just the beta with parameters α = 1 and β = 1.

The Normal Distribution

The normal or Gaussian distribution is perhaps the most widely used of all the continuous probability distributions. The shape of the pdf is the familiar bell-shaped curve. The two parameters, μ and σ², completely determine the shape and location of the normal pdf:

f(x) = (1/(σ√(2π))) exp{-(1/2)((x - μ)/σ)²},  -∞ < x < ∞.

Why is this a pdf? From above, putting y = (x - μ)/σ, we have

∫_{-∞}^∞ (1/(σ√(2π))) exp{-(1/2)((x - μ)/σ)²} dx = (1/√(2π)) ∫_{-∞}^∞ exp(-y²/2) dy
  = (2/√(2π)) ∫_0^∞ exp(-y²/2) dy.

Now put u = y²/2. Then y = √(2u), dy = du/√(2u), and so the above is

(2/√(2π)) ∫_0^∞ (1/√(2u)) exp(-u) du = (2/√(2π)) (1/√2) Γ(1/2)
  = (2/√(2π)) (√π/√2)
  = (2√π)/(2√π)
  = 1.

Thus f(x) is a pdf. When X has this pdf, we write X ~ N(μ, σ²).

The standard normal has μ = 0 and σ² = 1, and the notation Z is usually reserved to denote a rv having this special distribution. We shall interpret the parameters μ and σ² later.

1. Pdf of Z ~ N(0, 1)

[Figure: standard normal pdf.]

2. Pdf of X ~ N(μ, σ²)

[Figure: normal pdf centred at μ.]

Hence

f(x) = (1/(σ√(2π))) e^{-(1/2)((x - μ)/σ)²},  -∞ < x < ∞,

is a valid probability density function.

Checks:

1. f is obviously non-negative;
2. We've shown its integral is 1.

We can obtain a random variable X which has the N(μ, σ²) distribution from Z which has the N(0, 1) distribution via the equations

Z = (X - μ)/σ  and  X = μ + σZ.

They have the same shapes but different scales and locations.


Cumulative Distribution Function

The normal cdf cannot be written down in a nice closed form. But neither can the log, exp, sin and cos functions.

The normal cdf is important enough to be tabulated. It is also very easy to calculate it using the well-known function erf, which is called the error function.

If

X ~ N(μ, σ²),

then the transformation

Z = (X - μ)/σ ~ N(0, 1)

can be used to determine areas under the curve for any normal distribution.

The cdf of a standard normal random variable is commonly written as Φ(z), while the probability density function is written as φ(z). Thus

Φ(z) = ∫_{-∞}^z φ(u) du,

where

φ(z) = (2π)^{-1/2} e^{-z²/2}.

Note that Φ(-z) = 1 - Φ(z), since φ(z) is symmetric about z = 0.

Some tables give the area between 0 and x (x > 0), some the area between -∞ and x, and some the area between x and ∞. All areas can be evaluated from any of these sets of tables.
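As the slide notes, Φ is easy to compute via erf: Φ(z) = (1 + erf(z/√2))/2. A minimal sketch using the standard library:

```python
import math

def Phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

print(Phi(0))                  # 0.5
print(Phi(1.96))               # close to 0.975
print(Phi(-1.5) + Phi(1.5))    # 1.0: the symmetry Phi(-z) = 1 - Phi(z)
```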

The Normal Approximation to the Binomial distribution

The Poisson distribution can be used to approximate the Binomial when n is large and p is very small. However, when p is medium and n large, the binomial is better approximated by the normal distribution. We shall prove this later on.

If

Y ~ Bin(n, p),

then we may approximate Y's distribution with that of a normal rv with

μ = np,  σ² = np(1 - p).

The interpretation of μ and σ² will come later.

Thus we may approximate P(Y ≤ y) by

Φ((y - np) / √(np(1 - p))).

Continuity Correction

Since the binomial is discrete and the normal is continuous, an additional factor must be included to account for this.

The B(n, p) probability P(X = x) is approximated by the area under the N(np, np(1 - p)) curve, between x - 1/2 and x + 1/2.

[Figure: binomial probabilities P(X = x) for x = 11, ..., 18 shown as bars, with the approximating normal curve overlaid; the bar at x corresponds to the area between x - 1/2 and x + 1/2.]

As an example, consider the probability of getting between 70 and 72 heads (inclusive) when tossing a biased coin 100 times, where the probability of a head appearing on any one toss is 0.6.

Let Y be the number of heads in 100 tosses of the coin. Then

Y ~ Bin(100, 0.6).

Now,

P(70 ≤ Y ≤ 72) = C(100, 70)(0.6)^70(0.4)^30 + C(100, 71)(0.6)^71(0.4)^29 + C(100, 72)(0.6)^72(0.4)^28
              ≈ 0.0201824,

where C(n, r) denotes the binomial coefficient.

Using the normal approximation, we have

μ = np = 100(0.6) = 60,
σ² = np(1 - p) = 100(0.6)(0.4) = 24.

Without Continuity Correction: the area under the curve between 70 and 72 is

P(70 ≤ Y ≤ 72) ≈ P((70 - 60)/√24 ≤ Z ≤ (72 - 60)/√24)
              = P(2.04 ≤ Z ≤ 2.45)
              = 0.0136.

The relative error is 33%.

With Continuity Correction: the area under the curve between 70 - 1/2 and 72 + 1/2 is

P(70 ≤ Y ≤ 72) ≈ P(70 - 1/2 ≤ Y ≤ 72 + 1/2)
              = P((69.5 - 60)/√24 ≤ Z ≤ (72.5 - 60)/√24)
              ≈ P(1.939 ≤ Z ≤ 2.552)
              = 0.9946 - 0.9737 = 0.0209,

which is a much better approximation. The relative error is now only 3.1%.

What is the probability of obtaining fewer than 50 heads?

The exact answer is

P(Y < 50) = P(Y ≤ 49) = Σ_{r=0}^{49} C(100, r)(0.6)^r(0.4)^{100-r}.

The Minitab CDF calculator gives 0.0168.

Using the normal approximation with continuity correction,

P(Y < 50) = P(Y ≤ 49) ≈ P(Z ≤ (49.5 - 60)/√24)
         = P(Z < -2.14)
         ≈ 0.0162,

for which the relative error is 3.6%.

Without continuity correction, we obtain

P(Y < 50) ≈ P(Z < (50 - 60)/√24) = P(Z < -2.04) ≈ 0.0207,

for which the relative error is 23%.

Some books call the normal approximation to the binomial the De Moivre-Laplace Limit Theorem.

This approximation is quite good for values of n (and p) satisfying

np(1 - p) ≥ 10.

There are no rules, in general, for how big a sample size should be before the approximation is arbitrarily accurate. The n = 30 rule you may have seen before is just a rule of thumb.
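The whole worked example above can be reproduced with the standard library: the exact binomial sum, and the normal approximation with and without the continuity correction. A sketch:

```python
import math

def Phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, p = 100, 0.6
mu, sd = n * p, math.sqrt(n * p * (1 - p))  # 60 and sqrt(24)

# Exact P(70 <= Y <= 72) for Y ~ Bin(100, 0.6)
exact = sum(math.comb(n, r) * p**r * (1 - p)**(n - r) for r in range(70, 73))

# Normal approximation, with and without the continuity correction
with_cc = Phi((72.5 - mu) / sd) - Phi((69.5 - mu) / sd)
without_cc = Phi((72 - mu) / sd) - Phi((70 - mu) / sd)

print(round(exact, 7), round(with_cc, 4), round(without_cc, 4))
```

The continuity-corrected value lands much closer to the exact sum, exactly as the slides report.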
