
Probability & Statistics

Dr. Suresh Kumar, Department of Mathematics, BITS-Pilani, Pilani Campus


Note: Some concepts of Probability & Statistics are briefly described here just to help the students. Therefore, the following study material is expected to be useful but not exhaustive for the Probability & Statistics course. For detailed study, the students are advised to attend the lecture/tutorial classes regularly and consult the textbook prescribed in the handout of the course.

Why do we need statistics ?

Consider the ideal or perfect gas law that connects the pressure (P), volume (V) and temperature (T) of a gas, and is given by

P V = R T,

where R is the gas constant and is fixed for a given gas. So, given any two of the three variables P, V and T of the gas, we can determine the third one. Such an approach is deterministic. However, gases may not follow the ideal gas law at extreme temperatures. Similarly, there are gases which do not follow the perfect gas law even at moderate temperatures. The point is that a given physical phenomenon need not follow some deterministic formula or model. In such cases, where the deterministic approach fails, we collect data related to the physical process and analyse them using statistical methods. These methods enable us to understand the nature of the physical process to a certain degree of certainty.

Classification of Statistics

Statistics is mainly classified into three categories:

2.1 Descriptive Statistics

It refers to data analysis in cases where the data set is of manageable size and can be analysed analytically or graphically. It is the statistics that you studied in your school classes. Recall histograms, pie charts etc.

2.2 Inferential Statistics

It is applied where the entire data set (population) cannot be analysed at one go or as a whole. So we draw a sample (a small or manageable portion) from the population. Then we analyse the sample for the characteristic of interest and try to infer the same about the population. For example, when you cook rice, you take out a few grains and crush them to see whether the rice is properly cooked. Similarly, survey polls prior to voting in elections, TRP ratings of TV channel shows etc. are based on samples and therefore belong to inferential statistics.

2.3 Model Building

Constructing a physical law or model or formula based on observational data pertains to model building. Such an approach is called empirical. For example, Kepler's laws of planetary motion belong to this approach.
Statistics is fundamentally based on the theory of probability. So first we discuss the theory of probability, and then we shall move to the statistical methods.

Theory of Probability

Depending on the nature of event, the probability is usually assigned/calculated in the following three ways.


3.1 Personal Approach

An oil spill has occurred from a ship carrying oil near a sea beach. A scientist is asked to find the probability that
this oil spill can be contained before it causes widespread damage to the beach. The scientist assigns probability
to this event considering the factors such as the amount of oil spilled, wind direction, distance of sea beach from
the spot of oil spill etc. Naturally such a probability may not be accurate since it depends on the expertise of the
scientist as well as the information available to the scientist. Similarly, the percentage of crops destroyed due to
heavy rains in a particular area as estimated by an agriculture scientist belongs to the personal approach.

3.2 Relative Frequency Approach

An electrical engineer employed at a power house observes that on 80 days out of 100, the peak demand for power supply occurs between 6 PM and 7 PM. One can immediately conclude that on any other day there is an 80% chance of peak power demand between 6 PM and 7 PM.
The probability assigned to an event (such as in the above example) after repeated experimentation and observation belongs to the relative frequency approach.

3.3 Classical Approach

First we give some definitions related to this approach.


Random Experiment
An experiment whose outcome or result is random, that is, is not known before the experiment, is called random
experiment. eg. Tossing a fair coin is a random experiment.
Sample Space
Set of all possible outcomes is called sample space of the random experiment and is usually denoted by S. eg. In
the toss of a fair coin, S = {H, T }.
Events
Any subset of the sample space is called an event. eg. If S = {H, T}, then the sets ∅, {H}, {T} and {H, T} are all events. The event ∅ is called the impossible event as it does not happen. The event {H, T} is called the sure event as we certainly get either head or tail in the toss of a fair coin.
Elementary and Compound Events
Singleton subsets of sample space S are called elementary events. The subsets of S containing more than one
element are known as compound events. eg. The singleton sets {H} and {T } are called elementary events while
{H, T } is a compound event.
Equally Likely Events
The elementary events of a sample space are said to be equally likely if each one of them has same chance of
occurring. eg. The elementary events {H} and {T } in the sample space of the toss of a fair coin are equally likely
because both have same chance of occurring.
Mutually Exclusive and Exhaustive Events
Two events are said to be mutually exclusive if the happening of one event precludes the happening of the other. eg. The events {H} and {T} in the sample space of the toss of a fair coin are mutually exclusive because both cannot occur together. Similarly, more than two events, say A1, A2, ...., An, are mutually exclusive if no two of these can occur together, that is, Ai ∩ Aj = ∅ for i ≠ j, where i, j ∈ {1, 2, ..., n}. Further, mutually exclusive events in a sample space are exhaustive if their union is equal to the sample space. eg. The events {H} and {T} in the sample space of the toss of a fair coin are mutually exclusive and exhaustive.


Combination of Events
If A and B are any two events in a sample space S, then the event A ∪ B implies either A or B or both; A ∩ B implies both A and B; A − B implies A but not B; A′ implies not A, that is, A′ = S − A.
eg. Let S be the sample space in a roll of a fair die. Then S = {1, 2, 3, 4, 5, 6}. Let A be the event of getting an even number and B be the event of getting a number greater than 3. Then A = {2, 4, 6} and B = {4, 5, 6}. So A ∪ B = {2, 4, 5, 6}, A ∩ B = {4}, A − B = {2} and A′ = {1, 3, 5}.
Classical Formula of Probability
Let S be the sample space of a random experiment, where all the possible outcomes are equally likely. If A is any event in S, then the probability of A, denoted by P[A], is defined as

P[A] = (Number of elements in A)/(Number of elements in S) = n(A)/n(S).

eg. If S is the sample space for a toss of two fair coins, then S = {HH, HT, TH, TT}. The coins being fair, here all the four outcomes are equally likely. Let A be the event of getting two heads. Then A = {HH}, and therefore P[A] = 1/4.
The classical approach is applicable in the cases (such as the above example) where it is reasonable to assume
that all possible outcomes are equally likely. The probability assigned to an event through classical approach is the
accurate probability.
Axioms of Probability
The classical formula of probability as discussed above suggests the following:
(i) For any event A, 0 ≤ P[A] ≤ 1.
(ii) P[∅] = 0 and P[S] = 1.
(iii) If A and B are mutually exclusive events, then P[A ∪ B] = P[A] + P[B].
These are known as the axioms of the theory of probability.
Deductions from Classical Formula
One may easily deduce the following from the classical formula:
(i) If A and B are any two events, then P[A ∪ B] = P[A] + P[B] − P[A ∩ B]. This is called the law of addition of probabilities.
(ii) P[A′] = 1 − P[A]. It follows from the fact that A and A′ are mutually exclusive and A ∪ A′ = S with P[S] = 1.
(iii) If A is a subset of B, then P[A] ≤ P[B].
Ex. From a pack of well-shuffled cards, one card is drawn. Find the probability that the card is either a king or an ace. [Ans. 4/52 + 4/52 = 2/13]
Ex. Two dice are tossed once. Find the probability of getting an even number on the first die or a total of 8. [Ans. 18/36 + 5/36 − 3/36 = 5/9]
Conditional Probability and Independent Events
Suppose a bag contains 10 Blue, 15 Yellow and 20 Green balls where all balls are identical except for the color. Let
A be the event of drawing a Blue ball from the bag. Then P [A] = 10/45 = 2/9. Now suppose we are told after
the ball has been drawn that the ball drawn is not Green. Because of this extra information/condition, we need to
change the value of P [A]. Now, since the ball is not Green, the total number of balls can be considered as 25 only.
Hence, P [A] = 10/25 = 2/5.
The extra information given in the above example can be considered as another event. Thus, if after the
experiment has been conducted we are told that a particular event has occurred, then we need to revise the value
of the probability of the previous event(s) accordingly. In other words, we find the probability of an event A under
the condition that an event B has occurred. We call this changed probability of A as the conditional probability
of A when B has occurred. We denote this conditional probability by P [A/B]. Mathematically, it is given by
P[A/B] = n(A ∩ B)/n(B) = (n(A ∩ B)/n(S)) / (n(B)/n(S)) = P[A ∩ B]/P[B].


The formula of conditional probability rewritten in the form

P[A ∩ B] = P[A/B] P[B]

is known as the multiplication law of probabilities.
Two events A and B are said to be independent if the occurrence or non-occurrence of A does not affect the occurrence or non-occurrence of the other. Thus, if A and B are independent, then P[A/B] = P[A]. So P[A ∩ B] = P[A]P[B] is the mathematical condition for the independence of the events A and B.
Ex. A die is thrown twice and the sum of the numbers appearing is noted to be 8. What is the probability that the number 5 has appeared at least once? [Ans. P[A] = 11/36, P[B] = 5/36, P[A ∩ B] = 2/36, P[A/B] = 2/5.]
Ex. Two cards are drawn one after the other from a pack of well-shuffled 52 cards. Find the probability that both are spade cards if the first card is not replaced. [Ans. (13/52)(12/51) = 1/17]
Ex. A problem is given to three students in a class. The probabilities of the solution from the three students are 0.5, 0.7 and 0.8 respectively. What is the probability that the problem will be solved? [Ans. 1 − (1 − 0.5)(1 − 0.7)(1 − 0.8) = 0.97]
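These answers can be double-checked by brute-force enumeration of the sample space. Below is a minimal Python sketch (not part of the original notes; the variable names are illustrative) for the two-dice conditional probability example above.

    from itertools import product

    S = list(product(range(1, 7), repeat=2))      # 36 equally likely outcomes of two throws
    B = [s for s in S if sum(s) == 8]             # event B: the sum of the numbers is 8
    A_and_B = [s for s in B if 5 in s]            # within B, the outcomes where 5 appears at least once

    print(len(B) / 36)              # P[B] = 5/36
    print(len(A_and_B) / len(B))    # P[A/B] = 2/5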
Theorem of Total Probability
Let B1, B2, ...., Bn be exhaustive and mutually exclusive events in a sample space S of a random experiment, each with non-zero probability. Let A be any event in S. Then

P[A] = Σ_{i=1}^{n} P[Bi] P[A/Bi].

Proof: Since B1, B2, ...., Bn are exhaustive and mutually exclusive events in the sample space S, we have S = B1 ∪ B2 ∪ ... ∪ Bn. It follows that
A = A ∩ S = (A ∩ B1) ∪ (A ∩ B2) ∪ ... ∪ (A ∩ Bn).
Now B1, B2, ...., Bn are mutually exclusive events. Therefore, A ∩ B1, A ∩ B2, ...., A ∩ Bn are mutually exclusive events. So we have

P[A] = P[A ∩ B1] + P[A ∩ B2] + ... + P[A ∩ Bn] = Σ_{i=1}^{n} P[A ∩ Bi] = Σ_{i=1}^{n} P[Bi] P[A/Bi].

Bayes' Theorem
Let B1, B2, ...., Bn be exhaustive and mutually exclusive events in a sample space S of a random experiment, each with non-zero probability. Let A be any event in S with P[A] ≠ 0. Then

P[Bi/A] = P[Bi] P[A/Bi] / Σ_{i=1}^{n} P[Bi] P[A/Bi].

Proof: From conditional probability, we have

P[Bi/A] = P[A ∩ Bi]/P[A] = P[Bi] P[A/Bi]/P[A].

So the desired result follows from the theorem of total probability.


Note: In the above theorem, P[Bi] is known before the experiment and is called the a priori probability. P[Bi/A] is known as the a posteriori probability. It gives the probability of the occurrence of the event Bi with respect to the occurrence of the event A.
Ex. Four units of a bulb making factory respectively produce 3%, 2%, 1% and 0.5% defective bulbs. A bulb selected
at random from the entire output is found defective. Find the probability that it is produced by the fourth unit of
the factory. [Ans. 1/13]
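The quoted answer 1/13 follows if the four units are assumed to contribute equal shares of the factory's output (an assumption implicit in the example). A minimal Python sketch of the Bayes computation under that assumption:

    defect_rates = [0.03, 0.02, 0.01, 0.005]   # P[defective / unit i]
    priors = [0.25, 0.25, 0.25, 0.25]          # assumed equal production shares

    total = sum(p * d for p, d in zip(priors, defect_rates))   # theorem of total probability
    print(priors[3] * defect_rates[3] / total)                 # Bayes: 0.0769... = 1/13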


Discrete Random Variable, Density and Cumulative Distribution Functions

If a variable X takes real values x corresponding to each outcome of a random experiment, it is called a random variable. The random variable is said to be discrete if it assumes finitely many or countably infinite real values. The behaviour of a random variable is studied in terms of its probabilities. Suppose a random variable X takes real values x with probabilities P[X = x]. Then a function f defined by f(x) = P[X = x] is called the density function of X provided f(x) ≥ 0 for all x and Σ_x f(x) = 1. Further, the function F defined by F(x) = Σ_{X ≤ x} f(x) is called the cumulative distribution function of X.


Ex. Consider the random experiment of tossing two fair coins. Then the sample space is S = {HH, HT, TH, TT}. Let X denote the number of heads. Then X is a random variable and it takes the values 0, 1 and 2 since X(HH) = 2, X(HT) = 1, X(TH) = 1 and X(TT) = 0. Here the random variable assumes only three values and hence is discrete. We find that P[X = 0] = 1/4, P[X = 1] = 1/2 and P[X = 2] = 1/4. It is easy to see that the function f given by

X = x:              0      1      2
f(x) = P[X = x]:   1/4    1/2    1/4

is the density function of X. It gives the probability distribution of X.

The cumulative distribution function F of X is given by

X = x:              0      1      2
F(x) = P[X ≤ x]:   1/4    3/4     1

Ex. A fair coin is tossed again and again till a head appears. If X denotes the number of tosses in this experiment, then X = 1, 2, 3, ........ since the head can appear in the first toss, second toss, third toss and so on. So here the discrete random variable X assumes countably infinite values. The function f given by

X = x:              1       2        3      ...
f(x) = P[X = x]:   1/2    (1/2)²   (1/2)³   ...

or f(x) = (1/2)^x, x = 1, 2, 3, ........, is the density function of X since f(x) ≥ 0 for all x and

Σ_{x=1}^{∞} f(x) = Σ_{x=1}^{∞} (1/2)^x = (1/2)/(1 − 1/2) = 1 (the sum of the infinite G.P. a + ar + ar² + .... is a/(1 − r)).

The cumulative distribution function F of X is given by

F(x) = Σ_{X ≤ x} f(x) = (1/2)(1 − (1/2)^x)/(1 − 1/2) = 1 − (1/2)^x, where x = 1, 2, 3, .........

5 Expectation, Variance, Standard Deviation and Moments

5.1 Expectation

Let X be a random variable with density function f. Then the expectation of X, denoted by E(X), is defined as

E(X) = Σ_x x f(x).

More generally, if H(X) is a function of the random variable X, then we define

E(H(X)) = Σ_x H(x) f(x).


Ex. Let X denote the number of heads in a toss of two fair coins. Then X assumes the values 0, 1 and 2 with probabilities 1/4, 1/2 and 1/4 respectively. So E(X) = 0·(1/4) + 1·(1/2) + 2·(1/4) = 1.
Note: (i) The expectation E(X) of the random variable X is its theoretical average. In a statistical setting, the average value, mean value¹ and expected value are synonyms. The mean value is denoted by μ. So E(X) = μ.
(ii) If X is a random variable and c is a constant, then it is easy to verify that E(c) = c and E(cX) = cE(X). Also, E(X + Y) = E(X) + E(Y), where Y is another random variable.
(iii) The expected or mean value of the random variable X is a measure of the location of the center of the values of X.

5.2 Variance

Let X and Y be two random variables assuming the values X = 1, 9 and Y = 4, 6. We observe that both the variables have the same mean value, μX = μY = 5. However, we see that the values of X are far away from the mean or central value 5 in comparison to the values of Y. Thus, the mean value of a random variable does not account for its variability. In this regard, we define a new parameter known as variance. It is defined as follows.
If X is a random variable with mean μ, then its variance, denoted by Var(X), is defined as the expectation of (X − μ)². So we have

Var(X) = E[(X − μ)²] = E(X²) + μ² − 2μE(X) = E(X²) + E(X)² − 2E(X)E(X) = E(X²) − E(X)².

Ex. Let X denote the number of heads in a toss of two fair coins. Then X assumes the values 0, 1 and 2 with probabilities 1/4, 1/2 and 1/4 respectively. So
E(X) = 0·(1/4) + 1·(1/2) + 2·(1/4) = 1,
E(X²) = (0)²·(1/4) + (1)²·(1/2) + (2)²·(1/4) = 3/2,
Var(X) = 3/2 − 1 = 1/2.
Note: (i) The variance Var(X) of the random variable X is also denoted by σ². So Var(X) = σ².
(ii) If X is a random variable and c is a constant, then it is easy to verify that Var(c) = 0 and Var(cX) = c² Var(X). Also, Var(X + Y) = Var(X) + Var(Y), where X and Y are independent² random variables.

5.3 Standard Deviation

The variance of a random variable, by definition, is the mean of the squares of the differences of the values of the random variable from the mean value. So variance carries squared units of the original data, and hence is a number often without a direct physical meaning. To overcome this problem, a second measure of variability is employed, known as the standard deviation, defined as follows.
Let X be a random variable with variance σ². Then the standard deviation of X, denoted by σ, is the non-negative square root of Var(X), that is,

σ = √Var(X).

Note: A large standard deviation implies that the random variable X is rather inconsistent and somewhat hard to predict. On the other hand, a small standard deviation is an indication of consistency and stability.
¹From your high school mathematics, you know that if we have n distinct values x1, x2, ...., xn with frequencies f1, f2, ...., fn respectively and Σ_{i=1}^{n} fi = N, then the mean value is

μ = (1/N) Σ_{i=1}^{n} fi xi = Σ_{i=1}^{n} (fi/N) xi = Σ_{i=1}^{n} f(xi) xi,

where f(xi) = fi/N is the probability of occurrence of xi in the given data set. Obviously, the final expression for μ is the expectation of a random variable X assuming the values xi with probabilities f(xi).
²Independent random variables will be discussed later on.


5.4 Moments

Let X be a random variable and k be any positive integer. Then E(X^k) defines the kth ordinary moment of X. Obviously, E(X) = μ is the first ordinary moment, E(X²) is the second ordinary moment and so on. Further, the ordinary moments can be obtained from the function E(e^{tX}), for the ordinary moments E(X^k) are the coefficients of t^k/k! in the expansion

E(e^{tX}) = 1 + t E(X) + (t²/2!) E(X²) + ............

Also, we observe that

E(X^k) = [d^k/dt^k E(e^{tX})]_{t=0}.

Thus, the function E(e^{tX}) generates all the ordinary moments. That is why it is known as the moment generating function and is denoted by mX(t). Thus, mX(t) = E(e^{tX}).

Geometric Distribution

Suppose a random experiment consists of a series of independent trials to obtain a success, where each trial results in one of two outcomes, namely success (s) and failure (f), which have constant probabilities p and 1 − p respectively in each trial. Then the sample space of the random experiment is S = {s, fs, ffs, ..........}. If X denotes the number of trials in the experiment, then X is a discrete random variable with countably infinite values given by X = 1, 2, 3, .......... The trials being independent, we have P[X = 1] = P[s] = p, P[X = 2] = P[fs] = P[f]P[s] = (1 − p)p, P[X = 3] = P[ffs] = P[f]P[f]P[s] = (1 − p)²p, ........... Consequently, the density function of X is given by

f(x) = (1 − p)^{x−1} p, x = 1, 2, 3, ......

The random variable X with this density function is called a geometric³ random variable. Given the value of the parameter p, the probability distribution of the geometric random variable X is uniquely described.
For the geometric random variable X, we have (please try the proofs)
(i) mX(t) = p e^t/(1 − q e^t), where q = 1 − p and t < −ln q,
(ii) E(X) = 1/p, E(X²) = (1 + q)/p²,
(iii) Var(X) = q/p².

Ex. A fair coin is tossed again and again till a head appears. If X denotes the number of tosses in this experiment, then X is a geometric random variable with the density function f(x) = (1/2)^x, x = 1, 2, 3, ......... Here p = 1/2.
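The stated mean and variance of the geometric distribution can be checked numerically. The following Python sketch (illustrative only) truncates the infinite sums at x = 200, which is more than enough for p = 1/2.

    p, q = 0.5, 0.5
    f = lambda x: q ** (x - 1) * p                  # geometric density f(x) = (1-p)^(x-1) p

    xs = range(1, 201)
    print(sum(f(x) for x in xs))                    # ~1, so f is a density
    print(sum(x * f(x) for x in xs))                # ~2 = 1/p
    print(sum(x * x * f(x) for x in xs) - 2 ** 2)   # ~2 = q/p^2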

Binomial Distribution

Suppose a random experiment consisting of a finite number n of independent trials is performed, where each trial results in one of two outcomes, namely success (s) and failure (f), which have constant probabilities p and 1 − p respectively in each trial. Let X denote the number of successes in the n trials. Then X is a discrete random variable with values X = 0, 1, 2, ...., n. Now, corresponding to X = 0, there is only one point in the sample space, namely fff....f (where f repeats n times), with probability (1 − p)^n. Therefore, P[X = 0] = (1 − p)^n. Next, corresponding to X = 1 there are ⁿC₁ points sff...f, fsf...f, ffs...f, ...., fff...s (where s appears once and f repeats n − 1 times) in the sample space, each with probability (1 − p)^{n−1} p. Therefore, P[X = 1] = ⁿC₁ (1 − p)^{n−1} p. Likewise, P[X = 2] = ⁿC₂ (1 − p)^{n−2} p², ........, P[X = n] = p^n. Consequently, the density function of X is given by

f(x) = ⁿCₓ (1 − p)^{n−x} p^x, x = 0, 1, 2, 3, ......, n.

The random variable X with this density function is called a binomial⁴ random variable. Once the values of the parameters n and p are given/determined, the density function uniquely describes the binomial distribution of X.
³The name geometric is because the probabilities p, (1 − p)p, (1 − p)²p, .... in succession constitute a geometric progression.
⁴The name binomial is because the probabilities (1 − p)^n, ⁿC₁(1 − p)^{n−1}p, ....., p^n in succession are the terms in the binomial expansion of ((1 − p) + p)^n.


For the binomial random variable X, we have (please try the proofs)
(i) mX(t) = (q + p e^t)^n, where q = 1 − p,
(ii) E(X) = np,
(iii) Var(X) = npq.
Ex. Suppose a die is tossed 5 times. What is the probability of getting exactly 2 fours? (Here n = 5, p = 1/6, x = 2, and therefore P[X = 2] = ⁵C₂ (1 − 1/6)^{5−2} (1/6)² = 0.161.)
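The same number is obtained directly from the binomial density; a minimal Python sketch (not from the text):

    from math import comb

    n, p, x = 5, 1/6, 2
    prob = comb(n, x) * (1 - p) ** (n - x) * p ** x   # nCx (1-p)^(n-x) p^x
    print(round(prob, 3))                              # 0.161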

Hypergeometric Distribution

Suppose a random experiment consists of choosing n objects without replacement from a lot of N objects, given that r objects in the lot of N objects possess a trait of our interest. Let X denote the number of objects possessing the trait in the selected sample of size n. Then X is a discrete random variable and assumes values in the range max[0, n − (N − r)] ≤ x ≤ min(n, r). Further, X = x implies that there are x objects possessing the trait in the selected sample of size n, which should come from the r objects possessing the trait. On the other hand, the remaining n − x objects in the selected sample are without the trait. So these should come from the N − r objects without the trait available in the entire lot of N objects. It follows that the number of ways to select n objects, where x objects with the trait are to be chosen from r objects and n − x objects without the trait are to be chosen from N − r objects, is ʳCₓ · ᴺ⁻ʳCₙ₋ₓ. Also, the number of ways to select n objects from the lot of N objects is ᴺCₙ. Therefore,

P[X = x] = ʳCₓ · ᴺ⁻ʳCₙ₋ₓ / ᴺCₙ.

The random variable X with the density function f(x) = ʳCₓ · ᴺ⁻ʳCₙ₋ₓ / ᴺCₙ, where max[0, n − (N − r)] ≤ x ≤ min(n, r), is called a hypergeometric random variable. The hypergeometric distribution is characterized by the three parameters N, r and n.
For the hypergeometric random variable X, it can be shown that E(X) = n(r/N) and Var(X) = n (r/N) ((N − r)/N) ((N − n)/(N − 1)).
The hypergeometric probabilities can be approximated satisfactorily by the binomial distribution provided the sampling fraction n/N is small (commonly taken as n/N ≤ 0.05).
Ex. Suppose we randomly select 5 cards without replacement from a deck of 52 playing cards. What is the
probability of getting exactly 2 red cards ? (Here N = 52, r = 26, n = 5, x = 2, and therefore P [X = 2] = 0.3251.)
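The hypergeometric probability in this example can be computed with binomial coefficients; a short Python sketch (illustrative):

    from math import comb

    N, r, n, x = 52, 26, 5, 2
    prob = comb(r, x) * comb(N - r, n - x) / comb(N, n)   # rCx * (N-r)C(n-x) / NCn
    print(round(prob, 4))                                  # 0.3251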

Poisson Distribution

Observing discrete occurrences of an event in a continuous region or interval⁵ is called a Poisson process or Poisson experiment. For example, observing the white blood cells in a sample of blood, observing the number of BITS-Pilani students placed with more than one crore package in five years, etc. are Poisson experiments.
Let λ denote the number of occurrences of the event of interest per unit measurement of the region or interval. Then the expected number of occurrences in a given region or interval of size s is k = λs. If X denotes the number of occurrences of the event in the region or interval of size s, then X is called a Poisson random variable. Its probability density function can be proved to be

f(x) = e^{−k} k^x / x!, x = 0, 1, 2, ....

We see that the Poisson distribution is characterized by the single parameter k, while a Poisson process or experiment is characterized by the parameter λ.
It can be shown that

mX(t) = e^{k(e^t − 1)}, E(X) = k = Var(X).

⁵Note that the specified region could take many forms. For instance, it could be a length, an area, a volume, a period of time, etc.


Ex. A healthy person is expected to have 6000 white blood cells per ml of blood. A person is tested for white blood cell count by collecting a blood sample of size 0.001 ml. Find the probability that the collected blood sample will carry exactly 3 white blood cells. (Here λ = 6000, s = 0.001, k = λs = 6 and x = 3, and therefore P[X = 3] = e^{−6} 6³/3!.)
Ex. In the last 5 years, 10 students of BITS-Pilani were placed with a package of more than one crore. Find the probability that exactly 7 students will be placed with a package of more than one crore in the next 3 years. (Here λ = 10/5 = 2, s = 3, k = λs = 6 and x = 7, and therefore P[X = 7] = e^{−6} 6⁷/7!.)
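Both examples use the same Poisson density with k = 6; a minimal Python sketch (illustrative) evaluates it:

    from math import exp, factorial

    def poisson(x, k):
        # Poisson density f(x) = e^(-k) k^x / x!
        return exp(-k) * k ** x / factorial(x)

    print(poisson(3, 6))   # blood-cell example: k = 6000 * 0.001 = 6, x = 3 -> ~0.0892
    print(poisson(7, 6))   # placement example:  k = 2 * 3 = 6, x = 7        -> ~0.1377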

10 Uniform Distribution

A random variable X is said to follow a uniform distribution if it assumes a finite number of values all with the same chance of occurrence, that is, with equal probabilities. For instance, if the random variable X assumes n values x1, x2, ...., xn with equal probabilities P[X = xi] = 1/n, then it is a uniform random variable with density function given by

f(x) = 1/n, x = x1, x2, ...., xn.

The moment generating function, mean and variance of the uniform random variable respectively read as

mX(t) = (1/n) Σ_{i=1}^{n} e^{t xi},  μ = (1/n) Σ_{i=1}^{n} xi,  σ² = (1/n) Σ_{i=1}^{n} xi² − ((1/n) Σ_{i=1}^{n} xi)².

Ex. Suppose a fair die is thrown once. Let X denote the number appearing on the die. Then X is a discrete random variable assuming the values 1, 2, 3, 4, 5, 6. Also, P[X = 1] = P[X = 2] = P[X = 3] = P[X = 4] = P[X = 5] = P[X = 6] = 1/6. Thus, X is a uniform random variable.

11 Continuous Random Variable

A continuous random variable is a variable X that takes all values x in an interval or intervals of real numbers, and the probability that it takes any particular value is 0. For example, if X denotes the time of peak power demand in a power house, then it is a continuous random variable because the peak power demand happens over a continuous period of time, no matter how small or big it is. In other words, it does not happen at an instant of time or at a particular value of the time variable.

A function f is called the density function of a continuous random variable X provided f(x) ≥ 0 for all x, ∫_{−∞}^{∞} f(x) dx = 1 and P[a ≤ X ≤ b] = ∫_a^b f(x) dx. Further, the function F defined by F(x) = ∫_{−∞}^{x} f(t) dt is called the cumulative distribution function (cdf) of X.

The expectation of a random variable X having density f is defined as

E(X) = ∫_{−∞}^{∞} x f(x) dx.

In general, the expectation of H(X), a function of X, is defined as

E(H(X)) = ∫_{−∞}^{∞} H(x) f(x) dx.

The moment generating function of X is defined as

mX(t) = E(e^{tX}) = ∫_{−∞}^{∞} e^{tx} f(x) dx.

The kth ordinary moment E(X^k), mean (μ) and variance (σ²) of X are, respectively, given by

E(X^k) = ∫_{−∞}^{∞} x^k f(x) dx = [d^k/dt^k mX(t)]_{t=0},

μ = E(X) = ∫_{−∞}^{∞} x f(x) dx,

σ² = E(X²) − E(X)² = ∫_{−∞}^{∞} x² f(x) dx − (∫_{−∞}^{∞} x f(x) dx)².

Remarks (i) The condition f(x) ≥ 0 implies that the graph of y = f(x) lies on or above the x-axis.
(ii) The condition ∫_{−∞}^{∞} f(x) dx = 1 graphically implies that the total area under the curve y = f(x) is 1. Therefore, P[a ≤ X ≤ b] = ∫_a^b f(x) dx is the area under the curve y = f(x) from x = a to x = b. Also,

P[a ≤ X ≤ b] = ∫_{−∞}^{b} f(x) dx − ∫_{−∞}^{a} f(x) dx = F(b) − F(a).

(iii) P[a ≤ X ≤ b] = P[a < X ≤ b] = P[a < X < b] since P[X = a] = 0, P[X = b] = 0.
(iv) F′(x) = f(x) provided the differentiation is permissible.
Ex. Verify whether the function

f(x) = 12.5x − 1.25 for 0.1 ≤ x ≤ 0.5, and f(x) = 0 elsewhere,

is a density function of X. If so, find F(x), P[0.2 ≤ X ≤ 0.3], μ and σ².

Sol. Please try the detailed calculations yourself. You will find ∫_{−∞}^{∞} f(x) dx = 1. So f is a density function. Further,

F(x) = 0 for x < 0.1,  F(x) = 6.25x² − 1.25x + 0.0625 for 0.1 ≤ x ≤ 0.5,  F(x) = 1 for x > 0.5,

P[0.2 ≤ X ≤ 0.3] = 0.1875, μ = 0.3667, σ² = 0.0089.
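The claimed values can be verified numerically. The Python sketch below (illustrative; it uses a simple midpoint rule rather than exact integration) assumes the density f(x) = 12.5x − 1.25 on [0.1, 0.5].

    def f(x):
        return 12.5 * x - 1.25 if 0.1 <= x <= 0.5 else 0.0

    def integrate(g, a, b, n=100000):
        # midpoint rule approximation of the integral of g over [a, b]
        h = (b - a) / n
        return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

    print(integrate(f, 0.1, 0.5))                               # ~1.0, so f is a density
    print(integrate(f, 0.2, 0.3))                               # ~0.1875
    mu = integrate(lambda x: x * f(x), 0.1, 0.5)                # ~0.3667
    print(mu)
    print(integrate(lambda x: x * x * f(x), 0.1, 0.5) - mu**2)  # ~0.0089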

12 Uniform or Rectangular Distribution

A continuous random variable X is said to have a uniform distribution if its density function reads as

f(x) = 1/(b − a) for a < x < b, and f(x) = 0 elsewhere.

In this case, the area under the curve is in the form of a rectangle. That is why the name rectangular is there. You may easily derive the following for the uniform distribution.

F(x) = 0 for x ≤ a,  F(x) = (x − a)/(b − a) for a < x < b,  F(x) = 1 for x ≥ b,

mX(t) = (e^{bt} − e^{at}) / ((b − a)t),

μ = (b + a)/2,  σ² = (b − a)²/12.


13 Gamma Distribution

A continuous random variable X is said to have a gamma distribution with parameters α and β if its density function reads as

f(x) = 1/(Γ(α) β^α) x^{α−1} e^{−x/β}, x > 0, α > 0, β > 0,

where Γ(α) = ∫_0^∞ e^{−x} x^{α−1} dx is the gamma function.⁶
The moment generating function, mean and variance of the gamma random variable can be derived as

mX(t) = (1 − βt)^{−α}  (t < 1/β),

μ = αβ,  σ² = αβ².

Note: The special case of the gamma distribution with α = 1 is called the exponential distribution. Therefore, the density function of the exponential distribution reads as

f(x) = (1/β) e^{−x/β}, x > 0, β > 0.

On the other hand, the special case of the gamma distribution with β = 2 and α = γ/2, γ being some positive integer, is named the Chi-squared (χ²) distribution. Its density function is

f(x) = 1/(Γ(γ/2) 2^{γ/2}) x^{γ/2 − 1} e^{−x/2}, x > 0.

The χ² distribution arises so often in practice that extensive tables for its cumulative distribution function have been derived.

14 Normal Distribution

A continuous random variable X is said to follow the Normal distribution⁷ with parameters μ and σ if its density function is given by

f(x) = 1/(σ√(2π)) e^{−(1/2)((x − μ)/σ)²}, −∞ < x < ∞, −∞ < μ < ∞, σ > 0.

For the normal random variable X, one can verify the following:

∫_{−∞}^{∞} f(x) dx = 1,  mX(t) = e^{μt + σ²t²/2},  Mean = μ,  Variance = σ².

This shows that the two parameters μ and σ in the density function of the normal random variable X are its mean and standard deviation, respectively.
Note: If X is a normal random variable with mean μ and variance σ², then we write X ~ N(μ, σ²).

⁶One should remember that Γ(1) = 1, Γ(α) = (α − 1)Γ(α − 1), Γ(1/2) = √π and Γ(α) = (α − 1)! when α is a positive integer.
⁷The Normal distribution was first described by De Moivre in 1733 as the limiting case of the Binomial distribution when the number of trials is infinite. This discovery did not get much attention. Around fifty years later, Laplace and Gauss rediscovered the normal distribution while dealing with astronomical data. They found that the errors in astronomical measurements are well described by the normal distribution. The normal distribution is also known as the Gaussian distribution.


Standard Normal Distribution


Let Z = (X − μ)/σ. Then E(Z) = 0 and Var(Z) = 1. We call Z the standard normal variate and we write Z ~ N(0, 1). Its density function reads as

φ(z) = 1/√(2π) e^{−z²/2}, −∞ < z < ∞.

The corresponding cumulative distribution function is given by

Φ(z) = ∫_{−∞}^{z} φ(z) dz = 1/√(2π) ∫_{−∞}^{z} e^{−z²/2} dz.

The normal probability curve is symmetric about the line X = μ or Z = 0. Therefore, we have
P[X < μ] = P[X > μ] = 0.5, P[−a < Z < 0] = P[0 < Z < a].
The probabilities of the standard normal variable Z in the probability table of the normal distribution are given in terms of the cumulative distribution function Φ(z) = F(z) = P[Z ≤ z] (see Table 5 on page 697 in the text book). So we have
P[a < Z < b] = P[Z < b] − P[Z < a] = F(b) − F(a).
From the normal table, it can be found that
P[|X − μ| < σ] = P[μ − σ < X < μ + σ] = P[−1 < Z < 1] = F(1) − F(−1) = 0.8413 − 0.1587 = 0.6826.
This shows that there is approximately 68% probability that the normal variable X lies in the interval (μ − σ, μ + σ). We call this interval the 1σ confidence interval of X. Similarly, the probabilities of X in the 2σ and 3σ confidence intervals are respectively given by
P[|X − μ| < 2σ] = P[μ − 2σ < X < μ + 2σ] = P[−2 < Z < 2] = 0.9544,
P[|X − μ| < 3σ] = P[μ − 3σ < X < μ + 3σ] = P[−3 < Z < 3] = 0.9973.
Ex. A random variable X is normally distributed with mean 9 and standard deviation 3. Find P[X ≥ 15], P[X ≤ 15] and P[0 ≤ X ≤ 9].

Sol. We have Z = (X − 9)/3.

P[X ≥ 15] = P[Z ≥ 2] = 1 − F(2) = 1 − 0.9772 = 0.0228.

P[X ≤ 15] = 1 − 0.0228 = 0.9772.

P[0 ≤ X ≤ 9] = P[−3 ≤ Z ≤ 0] = F(0) − F(−3) = 0.5 − 0.0013 = 0.4987.

Ex. In a normal distribution, 12% of the items are under 30 and 85% are under 60. Find the mean and standard deviation of the distribution.

Sol. Let μ be the mean and σ be the standard deviation of the distribution. Given that P[X < 30] = 0.12 and P[X < 60] = 0.85. Let z1 and z2 be the values of the standard normal variable Z corresponding to X = 30 and X = 60 respectively, so that P[Z < z1] = 0.12 and P[Z < z2] = 0.85. From the normal table, we find z1 ≈ −1.17 and z2 ≈ 1.04 since F(−1.17) = 0.121 and F(1.04) = 0.8508.
Finally, solving the equations (30 − μ)/σ = −1.17 and (60 − μ)/σ = 1.04, we find μ = 45.93 and σ = 13.56.
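Instead of the normal table, the cdf F(z) can be evaluated with the error function, F(z) = (1/2)(1 + erf(z/√2)). A minimal Python sketch (illustrative) reproduces the first example:

    from math import erf, sqrt

    def F(z):
        # standard normal cumulative distribution function
        return 0.5 * (1 + erf(z / sqrt(2)))

    mu, sigma = 9, 3
    print(1 - F((15 - mu) / sigma))                    # P[X >= 15] = 0.0228
    print(F((15 - mu) / sigma))                        # P[X <= 15] = 0.9772
    print(F((9 - mu) / sigma) - F((0 - mu) / sigma))   # P[0 <= X <= 9] = 0.4987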

Note: If X is a normal random variable with mean μ and variance σ², then P[|X − μ| < kσ] = P[|Z| < k] = F(k) − F(−k). However, if X is not a normal random variable, then a rule of thumb for the required probability is given by Chebyshev's inequality.


Chebyshev's Inequality
If X is a random variable with mean μ and variance σ², then

P[|X − μ| < kσ] ≥ 1 − 1/k².

Note that Chebyshev's inequality does not yield the exact probability of X lying in the interval (μ − kσ, μ + kσ); rather it gives the minimum probability for the same. However, in the case of a normal random variable, the probability obtained from the normal table is exact. For example, consider the 2σ interval (μ − 2σ, μ + 2σ) for X. Then Chebyshev's inequality gives P[|X − μ| < 2σ] ≥ 1 − 1/4 = 0.75. In case X is a normal variable, we get the exact probability P[|X − μ| < 2σ] = 0.9544. However, the advantage of Chebyshev's inequality is that it applies to any random variable of known mean and variance.

Approximation of Binomial distribution by Normal distribution

If X is a Binomial random variable with parameters n and p, then X approximately follows a normal distribution with mean np and variance np(1 − p) provided n is large. Here the word large is quite vague. In the strict mathematical sense, large n means n → ∞. However, for most practical purposes, the approximation is acceptable if the values of n and p are such that either p ≤ 0.5 and np > 5, or p > 0.5 and n(1 − p) > 5.

MORE CONCEPTS SHALL BE ADDED SOON.


Cheers!


Chapter 5
So far we have studied a single random variable, either discrete or continuous. Such random variables are called univariate. Problems do arise where we need to study two random variables simultaneously. For example, we may wish to study the heights and weights of a group of students up to the age of 20 years. Typical questions to ask are, "What is the average height of students of age less than or equal to 18 years?" or, "Is the height independent of weight?". To answer such questions, we need to study what are called two-dimensional or bivariate random variables.

Discrete Bivariate Random Variable


Let X and Y be two discrete random variables. Then the ordered pair (X, Y ) is called a two dimensional or bivariate
discrete random variable.

Joint density function


A function fXY such that

fXY(x, y) ≥ 0,  fXY(x, y) = P[X = x, Y = y],  Σ_x Σ_y fXY(x, y) = 1

is called the joint density function of (X, Y).

Distribution function
The distribution function of (X, Y) is given by

F(x, y) = Σ_{X ≤ x} Σ_{Y ≤ y} fXY(x, y).

Marginal density functions


The marginal density of X, denoted by fX, is defined as

fX(x) = Σ_y fXY(x, y).

Similarly, the marginal density of Y, denoted by fY, is defined as

fY(y) = Σ_x fXY(x, y).

Independent random variables


The discrete random variables X and Y are said to be independent if and only if
fXY (x, y) = fX (x)fY (y) for all (x, y).


Expectation
The expectation or mean of X is defined as

E[X] = Σ_x Σ_y x fXY(x, y) = μX.

In general, the expectation of a function of X and Y, say H(X, Y), is defined as

E[H(X, Y)] = Σ_x Σ_y H(x, y) fXY(x, y).

Covariance
If μX and μY are the means of X and Y respectively, then the covariance of X and Y, denoted by Cov(X, Y), is defined as

Cov(X, Y) = E[(X − μX)(Y − μY)] = E[XY] − E[X]E[Y].
Ex. In an automobile plant, two tasks are performed by robots: the welding of two joints and the tightening of three bolts. Let X denote the number of defective welds and Y denote the number of improperly tightened bolts produced per car. The probabilities of (X, Y) are given in the following table.

X/Y        0        1        2        3      fX(x)
0        0.840    0.030    0.020    0.010    0.90
1        0.060    0.010    0.008    0.002    0.08
2        0.010    0.005    0.004    0.001    0.02
fY(y)    0.91     0.045    0.032    0.013    1

(i) Is it a density function?


(ii) Find the probability that there would be exactly one error made by the robots.
(iii) Find the probability that there would be no improperly tightened bolts.
(iv) Are the variables X and Y independent?
(v) Find Cov(X, Y ).
Sol. (i) We have

Σ_{x=0}^{2} Σ_{y=0}^{3} fXY(x, y) = fXY(0, 0) + fXY(0, 1) + fXY(0, 2) + fXY(0, 3) + fXY(1, 0) + fXY(1, 1) + fXY(1, 2) + fXY(1, 3) + fXY(2, 0) + fXY(2, 1) + fXY(2, 2) + fXY(2, 3)
= 0.84 + 0.03 + 0.02 + 0.01 + 0.06 + 0.01 + 0.008 + 0.002 + 0.01 + 0.005 + 0.004 + 0.001
= 1.

This shows that fXY is a density function.


(ii) The probability that there would be exactly one error made by the robots is given by
P[X = 1, Y = 0] + P[X = 0, Y = 1] = fXY(1, 0) + fXY(0, 1) = 0.06 + 0.03 = 0.09.
(iii) The probability that there would be no improperly tightened bolts reads as

P[Y = 0] = Σ_{x=0}^{2} fXY(x, 0) = fXY(0, 0) + fXY(1, 0) + fXY(2, 0) = 0.84 + 0.06 + 0.01 = 0.91.

It is the marginal density fY(y) of Y at y = 0, that is, fY(0) = 0.91.


(iv) From the given table, we notice that fXY(0, 0) = 0.84, fX(0) = 0.9 and fY(0) = 0.91. So we have
fX(0) fY(0) = 0.819 ≠ fXY(0, 0).
This shows that X and Y are not independent.
(v) We find

E[X] = Σ_{x=0}^{2} Σ_{y=0}^{3} x fXY(x, y) = 0.12,

E[Y] = Σ_{x=0}^{2} Σ_{y=0}^{3} y fXY(x, y) = 0.148,

E[XY] = Σ_{x=0}^{2} Σ_{y=0}^{3} x y fXY(x, y) = 0.064.

Hence, Cov(X, Y) = E[XY] − E[X]E[Y] = 0.046.
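The marginal and joint expectations in this example are short finite sums, so they are easy to verify in Python (a sketch; the dictionary below just encodes the table):

    f = {(0, 0): 0.84, (0, 1): 0.03,  (0, 2): 0.02,  (0, 3): 0.01,
         (1, 0): 0.06, (1, 1): 0.01,  (1, 2): 0.008, (1, 3): 0.002,
         (2, 0): 0.01, (2, 1): 0.005, (2, 2): 0.004, (2, 3): 0.001}

    EX  = sum(x * p for (x, y), p in f.items())        # 0.12
    EY  = sum(y * p for (x, y), p in f.items())        # 0.148
    EXY = sum(x * y * p for (x, y), p in f.items())    # 0.064
    print(EX, EY, EXY, EXY - EX * EY)                  # covariance ~ 0.046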

Continuous Bivariate Random Variable


Let X and Y be two continuous random variables. Then the ordered pair (X, Y ) is called a two dimensional or
bivariate continuous random variable.

Joint density function


A function fXY such that

fXY(x, y) ≥ 0,  P[a ≤ X ≤ b, c ≤ Y ≤ d] = ∫_c^d ∫_a^b fXY(x, y) dx dy,  ∫_{−∞}^{∞} ∫_{−∞}^{∞} fXY(x, y) dx dy = 1,

for all real a, b, c and d, is called the joint density function of (X, Y).

Distribution function
The distribution function of (X, Y) is given by

F(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} fXY(x, y) dy dx.

Marginal density functions

The marginal density of X, denoted by fX, is defined as

fX(x) = ∫_{−∞}^{∞} fXY(x, y) dy.

Similarly, the marginal density of Y, denoted by fY, is defined as

fY(y) = ∫_{−∞}^{∞} fXY(x, y) dx.

Independent random variables


The continuous random variables X and Y are said to be independent if and only if
fXY (x, y) = fX (x)fY (y).


Expectation
The expectation or mean of X is defined as

E[X] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x fXY(x, y) dx dy = μX.

In general, the expectation of a function of X and Y, say H(X, Y), is defined as

E[H(X, Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} H(x, y) fXY(x, y) dx dy.

Covariance
If μX and μY are the means of X and Y respectively, then the covariance of X and Y, denoted by Cov(X, Y), is defined as

Cov(X, Y) = E[(X − μX)(Y − μY)] = E[XY] − E[X]E[Y].
Ex. Let X denote a person's blood calcium level and Y the blood cholesterol level. The joint density function of (X, Y) is

fXY(x, y) = k, 8.5 ≤ x ≤ 10.5, 120 ≤ y ≤ 240.
(i) Find the value of k.
(ii) Find the marginal densities of X and Y .
(iii) Find the probability that a healthy person has a cholesterol level between 150 to 200.
(iv) Are the variables X and Y independent?
(v) Find Cov(X, Y ).
Sol. (i) fXY(x, y) being a joint density function, we have

1 = ∫_{−∞}^{∞} ∫_{−∞}^{∞} fXY(x, y) dx dy = ∫_{120}^{240} ∫_{8.5}^{10.5} k dx dy = 240k.

So k = 1/240 and fXY(x, y) = 1/240.

(ii) The marginal density of X is

fX(x) = ∫_{−∞}^{∞} fXY(x, y) dy = ∫_{120}^{240} (1/240) dy = 1/2, 8.5 ≤ x ≤ 10.5.

Similarly, the marginal density of Y is

fY(y) = ∫_{−∞}^{∞} fXY(x, y) dx = ∫_{8.5}^{10.5} (1/240) dx = 1/120, 120 ≤ y ≤ 240.

(iii) The probability that a healthy person has a cholesterol level between 150 and 200 is

P[150 ≤ Y ≤ 200] = ∫_{150}^{200} fY(y) dy = 5/12.

(iv) We have

fX(x) fY(y) = (1/2)(1/120) = 1/240 = fXY(x, y).

This shows that X and Y are independent.


(v) We find

E[X] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x fXY(x, y) dx dy = ∫_{120}^{240} ∫_{8.5}^{10.5} (x/240) dx dy = 9.5,

E[Y] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} y fXY(x, y) dx dy = ∫_{120}^{240} ∫_{8.5}^{10.5} (y/240) dx dy = 180,

E[XY] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x y fXY(x, y) dx dy = ∫_{120}^{240} ∫_{8.5}^{10.5} (xy/240) dx dy = 1710.

Hence, Cov(X, Y) = E[XY] − E[X]E[Y] = 1710 − 9.5 × 180 = 0.
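A quick numerical check of these double integrals (a Python sketch using a crude midpoint rule on a grid; illustrative only):

    k = 1 / 240   # the joint density found in part (i)

    def double_integral(g, ax, bx, ay, by, n=400):
        # midpoint rule over the rectangle [ax, bx] x [ay, by]
        hx, hy = (bx - ax) / n, (by - ay) / n
        return sum(g(ax + (i + 0.5) * hx, ay + (j + 0.5) * hy)
                   for i in range(n) for j in range(n)) * hx * hy

    EX  = double_integral(lambda x, y: x * k, 8.5, 10.5, 120, 240)       # ~9.5
    EY  = double_integral(lambda x, y: y * k, 8.5, 10.5, 120, 240)       # ~180
    EXY = double_integral(lambda x, y: x * y * k, 8.5, 10.5, 120, 240)   # ~1710
    print(EXY - EX * EY)                                                 # covariance ~0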


Theorem: If X and Y are two independent random variables with joint density fXY, then show that E[XY] = E[X]E[Y], that is, Cov(X, Y) = 0.
Proof. We have

E[XY] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x y fXY(x, y) dx dy
      = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x y fX(x) fY(y) dx dy   (since fXY(x, y) = fX(x)fY(y), as X and Y are given independent)
      = ∫_{−∞}^{∞} y fY(y) (∫_{−∞}^{∞} x fX(x) dx) dy
      = ∫_{−∞}^{∞} y fY(y) E[X] dy
      = E[X] ∫_{−∞}^{∞} y fY(y) dy
      = E[X]E[Y].

Note. The converse of the above result need not be true, that is, if E[XY] = E[X]E[Y], then X and Y need not be independent. For instance, see the following table for the joint density function of a two dimensional discrete random variable (X, Y).

X/Y       −2      −1       1       2      fX(x)
1          0      1/4     1/4      0       1/2
4         1/4      0       0      1/4      1/2
fY(y)     1/4     1/4     1/4     1/4       1

We find that E[X] = 5/2, E[Y] = 0 and E[XY] = 0. So E[XY] = E[X]E[Y]. Next, we see that fX(1) = 1/2, fY(−1) = 1/4 and fXY(1, −1) = 1/4. So fX(1)fY(−1) ≠ fXY(1, −1), and hence X and Y are not independent.
In fact, we can easily observe the dependency X = Y². Thus, the covariance between X and Y gives only a rough indication of any association that may exist between X and Y. Also, it does not describe the type or strength of the association. The linear relationship between X and Y can be predicted by using a measure known as the Pearson coefficient of correlation.

Pearson coefficient of correlation

If X and Y are two random variables with means μX, μY and variances σX², σY², then the correlation between X and Y is given by

ρXY = Cov(X, Y) / (σX σY).

It can be proved that ρXY lies in the range [−1, 1]. Further, |ρXY| = 1 if and only if Y = β0 + β1X for some real numbers β0 and β1 ≠ 0.
Note that if ρXY = 0, we say that X and Y are uncorrelated. It does not imply that X and Y are unrelated. Of course, the relationship, if it exists, would not be linear.

In the robots example, σX² = 0.146, σY² = 0.268, Cov(X, Y) = 0.046 and therefore ρXY = 0.23.

Conditional densities and regression


Let (X, Y) be a two dimensional random variable with joint density fXY and marginal densities fX and fY. Then the conditional density of X given Y = y, denoted by fX/y, is defined as

fX/y = fXY(x, y)/fY(y),  (fY(y) > 0).

Similarly, the conditional density of Y given X = x, denoted by fY/x, is defined as

fY/x = fXY(x, y)/fX(x),  (fX(x) > 0).

The mean value of X given Y = y, denoted by μX/y, is given by

μX/y = E[X/Y = y] = ∫_{−∞}^{∞} x fX/y dx.

The graph of μX/y versus y is called the regression curve of X on Y. Similarly, the graph of

μY/x = E[Y/X = x] = ∫_{−∞}^{∞} y fY/x dy,

versus x is called the regression curve of Y on X.


Ex. The joint density function of (X, Y) is

fXY(x, y) = c/x, 27 ≤ y ≤ x ≤ 33.

(i) Find the value of c.
(ii) Find the marginal densities and hence check the independence of X and Y.
(iii) Evaluate P[X ≤ 30, Y ≤ 28].
(iv) Find the conditional densities fX/y and fY/x. Evaluate P[X > 32 | y = 30] and μX/y=30.
(v) Find the curves of regression of X on Y and Y on X.
Sol. (i) To find c, we use ∫_{27}^{33} ∫_{y}^{33} fXY(x, y) dx dy = 1, and we get c = 1/(6 − 27 ln(33/27)).

(ii) fX(x) = ∫_{27}^{x} (c/x) dy = c(1 − 27/x), 27 ≤ x ≤ 33,
fY(y) = ∫_{y}^{33} (c/x) dx = c(ln 33 − ln y), 27 ≤ y ≤ 33.
We observe that fXY(x, y) = c/x ≠ fX(x) fY(y). So X and Y are not independent.

(iii) P[X ≤ 30, Y ≤ 28] = ∫_{27}^{28} ∫_{y}^{30} (c/x) dx dy = 0.15.

(iv) We have

fX/y = fXY(x, y)/fY(y) = 1/(x(ln 33 − ln y)), y ≤ x ≤ 33,

fY/x = fXY(x, y)/fX(x) = 1/(x − 27), 27 ≤ y ≤ x.

P[X > 32 | y = 30] = ∫_{32}^{33} fX/y=30 dx = ∫_{32}^{33} 1/(x(ln 33 − ln 30)) dx = 0.32.

μX/y=30 = ∫_{30}^{33} x fX/y=30 dx = ∫_{30}^{33} 1/(ln 33 − ln 30) dx = 31.48.


(v) The curve of regression of X on Y is

μX/y = ∫_{y}^{33} x fX/y dx = ∫_{y}^{33} 1/(ln 33 − ln y) dx = (33 − y)/(ln 33 − ln y).

The curve of regression of Y on X is

μY/x = ∫_{27}^{x} y fY/x dy = ∫_{27}^{x} y/(x − 27) dy = (x + 27)/2.

Chapter 6
Inferential statistics is essentially based on random sampling from the population. So it is important to understand the meaning of a random sample.

Random Sample
A random sample of size n from the distribution of X is a collection of n independent random variables, each with the same distribution as that of X.
It may be noted that the term random sample is used in three different but closely related ways in applied statistics. It may refer to the objects selected for study, or to the random variables associated with the selected objects, or to the numerical values assumed by the associated random variables, as illustrated in the following example.
Suppose we wish to find the mean effective life of lithium batteries used in a particular model of pocket calculator so that a limited warranty can be placed on the product. For this purpose, we randomly choose n batteries from the population of batteries. Here, prior to the actual selection of the batteries, the life span Xi (i = 1, 2, ..., n) of the ith battery is a random variable. It has the same distribution as X, the life span of batteries in the population. The random variables Xi are independent in the sense that the value assumed by one has no effect on the value assumed by any other variable. Thus, the random variables X1, X2, ......, Xn constitute a random sample. For the selected sample of n batteries, the random variables X1, X2, ......, Xn shall assume n real values x1, x2, ......, xn.
In the above example, the selected n batteries, the associated n random variables X1, X2, ......, Xn and the values x1, x2, ......, xn assumed by the n random variables all refer to the random sample in the context under consideration.

Statistics
A statistic is a random variable whose numerical value can be determined from the random sample. In other words,
a statistic is a random variable that is a function of the variables X1 , X2 , ......, Xn in the random sample. Some
statistics are described in the following.

Sample Mean
Let X1, X2, ......., Xn be a random sample from the distribution of X. Then the statistic X̄ = Σ_{i=1}^{n} Xi/n is called the sample mean. So X̄ is the mean of the sample X1, X2, ......., Xn.

Sample Median
Let x1, x2, ......., xn be a random sample of observations arranged in order from the smallest to the largest. The sample median is the middle observation if n is odd; otherwise it is the average of the two middle observations.


Sample Variance
Let X1, X2, ......., Xn be a random sample of size n from the distribution of X. Then the statistic

S² = Σ_{i=1}^{n} (Xi − X̄)²/(n − 1)

is called the sample variance. The statistic S = √S² is called the sample standard deviation.

Important Remark: It can be shown that the statistic Σ_{i=1}^{n} (Xi − X̄)²/n tends, on the average, to underestimate σ², the population variance. To improve the situation, Σ_{i=1}^{n} (Xi − X̄)² is divided by n − 1 in place of n. In this way, S² is unbiased for σ², that is, centred at the right spot. In case X1, X2, ......., Xn constitute the entire population, then

S² = σ² = Σ_{i=1}^{n} (Xi − X̄)²/n.

A computational formula for the sample variance is

S² = [n Σ_{i=1}^{n} Xi² − (Σ_{i=1}^{n} Xi)²] / (n(n − 1)).

Sample Range
The sample range is defined as the difference between the largest and the smallest observations.
Ex. A random sample of 9 observations is given as follows:
390 400 406 410 450 395 401 408 415
Find the sample mean, median, variance, standard deviation and range. (Ans. Sample mean = 408.3, Median = 406, Variance = 303.25, Standard deviation = 17.4, Range = 60.)

Chapter 7
Unbiased point estimator
A statistic θ̂ is an unbiased estimator for a parameter θ if and only if E[θ̂] = θ. For example, if X1, X2, ......., Xn is a random sample of size n from a distribution with mean μ, then the sample mean X̄ is an unbiased estimator for μ. For,
E[X̄] = E[(X1 + X2 + ... + Xn)/n] = (E[X1] + E[X2] + ... + E[Xn])/n = (μ + μ + ... + μ)/n = (nμ)/n = μ,
since X1, X2, ......., Xn constitute a random sample from the distribution having mean μ, so each of the random variables Xi has mean μ.
It is desirable that an unbiased estimator has a small variance for large sample sizes.

Standard error of mean


Let X̄ denote the sample mean of a sample of size n drawn from a distribution with standard deviation σ. Then the standard deviation of X̄ can be proved to be σ/√n and is called the standard error of the mean.


Unbiased estimator for variance

Let S² be the sample variance based on a random sample of size n from a distribution with mean μ and variance σ². Then it can be proved that

S² = Σ_{i=1}^{n} (Xi − X̄)²/(n − 1)

is an unbiased estimator for the population variance σ². Also, it can be shown that S is not unbiased for σ. This emphasizes the fact that unbiasedness is desirable but not essential in an estimator.

Method of moments for finding point estimators

In many cases, the moments involve the parameter θ to be estimated. We can often obtain a reasonable estimator for θ by replacing the theoretical moments by their estimates based on the drawn sample and solving the resulting equations for the estimator θ̂, as illustrated in the following example.
Ex. A forester plants 5 rows of pine seedlings with 20 pine seedlings in each row. Let X denote the number of seedlings per row that survive the first winter. Then X follows a binomial distribution with n = 20 and unknown p. Find an estimate of p given that X1 = 18, X2 = 17, X3 = 15, X4 = 19, X5 = 20.
Sol. We have E[X] = np = 20p. So X̄ = 20p̂. It follows that

Σ_{i=1}^{5} Xi/5 = 20p̂ or 17.8 = 20p̂.

Finally, we get p̂ = 0.89, the estimate for p.
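The method-of-moments calculation above amounts to equating the theoretical mean 20p to the sample mean; a two-line Python sketch (illustrative):

    sample = [18, 17, 15, 19, 20]
    xbar = sum(sample) / len(sample)   # 17.8
    print(xbar / 20)                   # estimate of p = 0.89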


Note: If there are two parameters to be estimated, then we need to find two equations involving the parameters and the moments.

Method of maximum likelihood

Obtain a random sample X1, X2, ......., Xn from the distribution of a random variable X with density f and associated parameter θ. Define a function L(θ) given by

L(θ) = Π_{i=1}^{n} f(xi),

known as the likelihood function for the sample. Find the expression for θ that maximizes the likelihood function. Note that the likelihood function gives the probability of getting the sample X1, X2, ......., Xn from the distribution of the random variable X. So we find the value of θ that maximizes the value of the likelihood function. This value of θ serves as an estimate for the parameter θ.
Ex. Let X1, X2, ......., Xn be a random sample from a normal distribution with mean μ and variance σ². The density for X is

f(x) = 1/(σ√(2π)) e^{−(1/2)((x − μ)/σ)²}.

Therefore, the likelihood function reads as

L(μ, σ) = Π_{i=1}^{n} 1/(σ√(2π)) e^{−(1/2)((xi − μ)/σ)²} = (1/(σ√(2π)))^n e^{−(1/(2σ²)) Σ_{i=1}^{n} (xi − μ)²},

so that

ln L(μ, σ) = −n ln √(2π) − n ln σ − (1/(2σ²)) Σ_{i=1}^{n} (xi − μ)².

Putting the partial derivatives of ln L(μ, σ) equal to 0, we find

μ = (1/n) Σ_{i=1}^{n} xi,  σ² = (1/n) Σ_{i=1}^{n} (xi − μ)².

Thus, the maximum likelihood estimators for the parameters μ and σ² are

μ̂ = X̄ and σ̂² = (1/n) Σ_{i=1}^{n} (xi − X̄)².

Note: The estimator obtained from the method of moments often agrees with the one obtained from the method
of maximum likelihood. If it does not happen in some case, then the maximum likelihood estimator is preferred.
Theorem: Let X1 and X2 be independent random variables with mgf mX1 (t) and mX2 (t) respectively. Let
Y = X1 + X2 . Then the mgf of Y is given by
mY (t) = mX1 (t)mX2 (t).
Proof: We have
mY (t) = E[etY ] = E[etX1 +tX2 ] = E[etX1 ]E[etX2 ] = mX1 (t)mX2 (t),
since etX1 and etX2 are independent as X1 and X2 are independent.
Ex. The mgf of a normal random variable with mean μ and variance σ² is mX(t) = e^{μt + σ²t²/2}. Let X1, X2, ......., Xn be independent normal variables with means μ1, μ2, ......., μn and variances σ1², σ2², ......., σn², respectively. Let Y = X1 + X2 + ..... + Xn. Then we have

mY(t) = Π_{i=1}^{n} mXi(t) = e^{(Σ_{i=1}^{n} μi) t + (1/2)(Σ_{i=1}^{n} σi²) t²}.
Theorem: Let X be a random variable with mgf mX(t), and let Y = α + βX. Then the mgf of Y is given by

mY(t) = e^{αt} mX(βt).

Proof: We have
mY(t) = E[e^{tY}] = E[e^{αt + βtX}] = E[e^{αt} e^{βtX}] = e^{αt} E[e^{βtX}] = e^{αt} mX(βt).

Theorem: Let X1, X2, ......., Xn be a random sample of size n from a normal distribution with mean μ and variance σ². Then X̄ is normally distributed with mean μ and variance σ²/n.
Proof: We know that

mX(t) = e^{μt + σ²t²/2}.

So, by the previous theorem, we have

m_{Xi/n}(t) = mX(t/n) = e^{(μ/n)t + (1/2)(σ²/n²)t²}.

It follows that

m_{X̄}(t) = m_{(X1 + X2 + ..... + Xn)/n}(t) = m_{X1/n}(t) m_{X2/n}(t) .... m_{Xn/n}(t) = e^{(μ/n + .... + μ/n)t + (1/2)(σ²/n² + .... + σ²/n²)t²} = e^{μt + (1/2)(σ²/n)t²}.

Thus, X̄ is normally distributed with mean μ and variance σ²/n.

Confidence interval
A 100(1 − α)% confidence interval for a parameter θ is a random interval [L1, L2] such that P[L1 ≤ θ ≤ L2] = 1 − α, regardless of the value of θ.

Confidence interval for μ of normal distribution with known σ

Let X1, X2, ......., Xn be a random sample of size n from a normal distribution with unknown mean μ and known variance σ². Then X̄ is normally distributed with mean μ and variance σ²/n. Therefore, Z = (X̄ − μ)/(σ/√n) follows a standard normal distribution. We utilize this fact to find confidence intervals for the unknown μ. Let us find a 95% confidence interval for μ. From the normal probability distribution table, we have

P[−1.96 ≤ Z ≤ 1.96] = F(1.96) − F(−1.96) = 0.95,

or P[−1.96 ≤ (X̄ − μ)/(σ/√n) ≤ 1.96] = 0.95,

or P[X̄ − 1.96σ/√n ≤ μ ≤ X̄ + 1.96σ/√n] = 0.95.

Thus, the 95% confidence interval for μ is [L1, L2] = [X̄ − 1.96σ/√n, X̄ + 1.96σ/√n].

In general, the 100(1 − α)% confidence interval for μ is [L1, L2] = [X̄ − z_{α/2} σ/√n, X̄ + z_{α/2} σ/√n]. Here z_{α/2} is the value of Z = (X̄ − μ)/(σ/√n) such that P[Z > z_{α/2}] = P[Z < −z_{α/2}] = α/2. Obviously, P[−z_{α/2} ≤ Z ≤ z_{α/2}] = 1 − α.

Note: If the sample is drawn from a non-normal distribution, then the central limit theorem stated below helps us in getting the confidence intervals for μ.
Ex. Find the 95% confidence interval for the mean μ of a population given the sample

 8.0   13.6   13.2   13.6
12.5   14.2   14.9   14.5
13.4    8.6   11.5   16.0
14.2   19.0   17.9   17.0

and population variance σ² = 9.

Sol. Here n = 16, X̄ = 13.88 and σ = 3. So the 95% confidence limits are given by

L1 = X̄ − 1.96σ/√n = 13.88 − 1.96(3/4) = 12.41,

L2 = X̄ + 1.96σ/√n = 13.88 + 1.96(3/4) = 15.35.
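A minimal Python sketch (illustrative) of the same interval, using the sample above and the known σ = 3:

    from math import sqrt

    data = [8.0, 13.6, 13.2, 13.6, 12.5, 14.2, 14.9, 14.5,
            13.4, 8.6, 11.5, 16.0, 14.2, 19.0, 17.9, 17.0]
    n, sigma, z = len(data), 3, 1.96
    xbar = sum(data) / n                            # 13.88
    half_width = z * sigma / sqrt(n)                # 1.47
    print(xbar - half_width, xbar + half_width)     # (12.41, 15.35)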

Central limit theorem

Let X1, X2, ......., Xn be a random sample of size n from a distribution with mean μ and variance σ². Then, for large n, X̄ is approximately normal with mean μ and variance σ²/n. Furthermore, for large n, Z = (X̄ − μ)/(σ/√n) is approximately normal.
Empirical studies have shown that the above theorem holds for n ≥ 25.

Chapter 8
We have seen how to estimate both mean and variance of a distribution via point estimation. We have also seen
how to construct a confidence interval for the mean of a normal distribution when its variance is assumed to be
known. Unfortunately, in most of the statistical studies, the assumption that 2 is known is unrealistic. If it is
necessary to estimate the mean of a distribution, then its variance is usually unknown. In what follows we shall
learn how to make inferences on the mean and variance when both of these parameters are unknown.


Confidence interval for σ² of normal distribution with unknown μ

Let X1, X2, ......., Xn be a random sample of size n from a normal distribution with mean μ and variance σ². Then it can be proved that the random variable

(n − 1)S²/σ² = Σ_{i=1}^{n} (Xi − X̄)²/σ²

has a chi-squared distribution¹ with n − 1 degrees of freedom.

Denoting X²_{n−1} = (n − 1)S²/σ², let us find a 95% confidence interval for σ². Let χ²_{0.025} and χ²_{0.975} denote the values of X²_{n−1} such that P[X²_{n−1} ≥ χ²_{0.025}] = 0.025 and P[X²_{n−1} ≥ χ²_{0.975}] = 0.975. Obviously, we have

P[χ²_{0.975} ≤ X²_{n−1} ≤ χ²_{0.025}] = 0.95,

or P[χ²_{0.975} ≤ (n − 1)S²/σ² ≤ χ²_{0.025}] = 0.95,

or P[(n − 1)S²/χ²_{0.025} ≤ σ² ≤ (n − 1)S²/χ²_{0.975}] = 0.95.

Thus, the 95% confidence interval for σ² is [L1, L2] = [(n − 1)S²/χ²_{0.025}, (n − 1)S²/χ²_{0.975}].
In general, the 100(1 − α)% confidence interval for σ² is [L1, L2] = [(n − 1)S²/χ²_{α/2}, (n − 1)S²/χ²_{1−α/2}].
Ex. Find the 95% confidence interval for σ² of a normal population based on the following sample:

3.4   3.0   1.4   3.5   4.2
3.6   3.1   2.0   2.5   1.5
4.0   4.1   3.1   1.7   3.0
0.4   1.4   1.8   5.1   3.9
2.0   2.5   1.6   0.7   3.0

Sol. Here n = 25 and S² = 1.408. From the χ² probability distribution table, for 24 degrees of freedom, we have
χ²_{0.025} = 39.4 and χ²_{0.975} = 12.4. So the 95% confidence limits are given by
L1 = (n − 1)S²/χ²_{0.025} = 24(1.408)/39.4 = 0.858,
L2 = (n − 1)S²/χ²_{0.975} = 24(1.408)/12.4 = 2.725.
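A minimal numerical check of this example is sketched below, assuming SciPy's chi-square quantile function; note that np.var(..., ddof=1) gives the sample variance S².

```python
import numpy as np
from scipy.stats import chi2

data = [3.4, 3.0, 1.4, 3.5, 4.2, 3.6, 3.1, 2.0, 2.5, 1.5,
        4.0, 4.1, 3.1, 1.7, 3.0, 0.4, 1.4, 1.8, 5.1, 3.9,
        2.0, 2.5, 1.6, 0.7, 3.0]
alpha = 0.05
n = len(data)
S2 = np.var(data, ddof=1)                        # sample variance, about 1.41

chi2_upper = chi2.ppf(1 - alpha / 2, df=n - 1)   # chi^2_{0.025} = 39.4 for 24 d.o.f.
chi2_lower = chi2.ppf(alpha / 2, df=n - 1)       # chi^2_{0.975} = 12.4 for 24 d.o.f.

L1 = (n - 1) * S2 / chi2_upper                   # about 0.86
L2 = (n - 1) * S2 / chi2_lower                   # about 2.73
print(f"95% CI for sigma^2: [{L1:.3f}, {L2:.3f}]")
```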
¹A random variable X is said to follow a chi-square distribution with ν degrees of freedom if its density function is given by

f(x) = (1/(Γ(ν/2) 2^{ν/2})) x^{ν/2 − 1} e^{−x/2},  x > 0.

We denote the chi-square random variable with n degrees of freedom by X²_n.
If X is a normal random variable with mean μ and variance σ², then it can be proved that the square of the standard normal
variable, that is, Z² = ((X − μ)/σ)², follows a chi-square distribution with one degree of freedom. Also, it can be proved that the sum
of independent chi-square random variables is also a chi-square random variable with degrees of freedom equal to the sum of the degrees
of freedom of all the independent random variables. It follows that if X1, X2, ......., Xn is a random sample of size n from a normal
distribution with mean μ and variance σ², then Σ_{i=1}^{n} ((Xi − μ)/σ)² is a chi-square random variable with n degrees of freedom.


Confidence interval for μ of a normal distribution with unknown σ²

Let X1, X2, ......., Xn be a random sample of size n from a normal distribution with mean μ and variance σ². Then
it can be proved that the random variable

(X̄ − μ)/(S/√n)

follows a T distribution² with n − 1 degrees of freedom.
Denoting T_{n−1} = (X̄ − μ)/(S/√n), let us find the 95% confidence interval for μ. Let t_{0.025} and −t_{0.025} denote the values of
T_{n−1} such that P[T_{n−1} ≥ t_{0.025}] = 0.025 = P[T_{n−1} ≤ −t_{0.025}]. Obviously, we have

P[−t_{0.025} ≤ T_{n−1} ≤ t_{0.025}] = 0.95,

or P[−t_{0.025} ≤ (X̄ − μ)/(S/√n) ≤ t_{0.025}] = 0.95, (because of the symmetry of the T distribution, the lower critical value is −t_{0.025})

or P[X̄ − t_{0.025}S/√n ≤ μ ≤ X̄ + t_{0.025}S/√n] = 0.95.

Thus, the 95% confidence interval for μ is [L1, L2] = [X̄ − t_{0.025}S/√n, X̄ + t_{0.025}S/√n].

In general, the 100(1 − α)% confidence interval for μ is [L1, L2] = [X̄ − t_{α/2}S/√n, X̄ + t_{α/2}S/√n].

Ex. Find the 95% confidence interval for μ of a normal population based on the following sample:

52.7   62.2   45.3   52.4
43.9   56.5   63.4   38.6
41.7   33.4   53.9   46.1
71.5   61.8   65.5   44.4
47.6   54.3   66.6   60.7
55.1   50.0   70.0   56.4

Sol. Here n = 24, X̄ = 53.92 and S = 10.07. From the T probability distribution table, for 23 degrees of freedom,
we have t_{0.025} = 2.069. So the 95% confidence limits are given by

L1 = X̄ − t_{0.025}S/√n = 53.92 − 2.069(10.07)/√24 = 49.67,

L2 = X̄ + t_{0.025}S/√n = 53.92 + 2.069(10.07)/√24 = 58.17.
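The same computation can be sketched in a few lines, assuming SciPy; scipy.stats.t.interval could also be used, but the explicit form below mirrors the formula above.

```python
import numpy as np
from scipy.stats import t

data = [52.7, 62.2, 45.3, 52.4, 43.9, 56.5, 63.4, 38.6,
        41.7, 33.4, 53.9, 46.1, 71.5, 61.8, 65.5, 44.4,
        47.6, 54.3, 66.6, 60.7, 55.1, 50.0, 70.0, 56.4]
alpha = 0.05
n = len(data)
x_bar = np.mean(data)                     # about 53.92
S = np.std(data, ddof=1)                  # sample standard deviation, about 10.07
t_crit = t.ppf(1 - alpha / 2, df=n - 1)   # t_{0.025} = 2.069 for 23 d.o.f.

L1 = x_bar - t_crit * S / np.sqrt(n)      # about 49.67
L2 = x_bar + t_crit * S / np.sqrt(n)      # about 58.17
print(f"95% CI for mu: [{L1:.2f}, {L2:.2f}]")
```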


²If Z is a standard normal variable and X²_ν is an independent chi-squared random variable with ν degrees of freedom, then the
random variable T = Z/√(X²_ν/ν) is said to follow a T distribution with ν degrees of freedom.
The density function of a T random variable reads as

f(t) = [Γ((ν + 1)/2) / (√(νπ) Γ(ν/2))] (1 + t²/ν)^{−(ν+1)/2},  −∞ < t < ∞.

The graph of this density function is symmetric about the line t = 0 and tends to the standard normal curve as the number of degrees
of freedom increases.


Hypothesis testing
In the theory of hypothesis testing, the experimenter/researcher proposes a hypothesis on a population parameter
θ. The hypothesis proposed by the experimenter/researcher is known as the alternative or research hypothesis and is
denoted by H1. The negation of H1 is called the null hypothesis and is denoted by H0. While testing a hypothesis on a
population parameter θ, the statement of equality θ = θ0 (where θ0 is known as the null value of θ) is always included in H0.
Further, H1 being the research hypothesis, it is expected that the evidence leads us to reject H0 and thereby to
accept H1.
Ex. Highway engineers think that the reflective highway signs do not perform properly because more than 50%
of the automobiles on the road have misaimed headlights. If this contention is supported statistically, a tougher
inspection program will be put into operation. Let p denote the proportion of automobiles with misaimed headlights.
Since the engineers wish to support p > 0.5, the research hypothesis H1 and the null hypothesis H0 are
H1 : p > 0.5
H0 : p ≤ 0.5
Note that p = 0.5, the null value of p, is included in H0.

Errors in Hypothesis testing


In hypothesis testing, we consider a test statistic whose values that lead to the rejection of H0 are set before the
experiment is conducted. These values of the test statistic constitute the critical or rejection region of the test.
Type I Error: The probability that the observed value of the test statistic will fall in the critical region when
θ = θ0 is called the size of the test or level of significance of the test, and is denoted by α. If this occurs, a Type I
error is committed. Thus, α is the probability of committing a Type I error. It is possible that the observed value of
the test statistic falls into the H1 region even though H0 is true and should not be rejected. The probability of
getting the test statistic value into the H1 region is, therefore, the probability of a Type I error.
Ex. A random sample of 20 cars is selected and the headlights are tested. Let us design a test so that α, the
probability of rejecting H0 when p = 0.5, is about 0.05. The test statistic that we shall use is X, the number of
cars in the sample with misaimed headlights. Then X follows a binomial distribution with n = 20, p = 0.5 and
E[X] = np = 10. So on an average 10 of every 20 cars tested are expected to have misaimed headlights. So logically
we should reject H0 if the observed value of X is somewhat greater than 10. Also, from the binomial probability
distribution table, we observe that P[X ≤ 13 : p = 0.5] = 0.9423. Therefore,
P[X ≥ 14 : p = 0.5] = 1 − P[X ≤ 13 : p = 0.5] = 1 − 0.9423 = 0.0577.
Let us choose α = 0.0577. It implies that we agree to reject H0 in favor of H1 if the observed value of X ≥ 14. In
this way, we have split the values of X into two sets C = {14, 15, ......, 20} and C̄ = {0, 1, 2, ......, 13}. Here C is the
critical or rejection region of the test. If the observed value of X lies in C, we reject H0 and conclude that the majority
of cars have misaimed headlights.
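The cut-off X ≥ 14 and the resulting α can be verified directly from the binomial distribution; a short sketch assuming SciPy follows.

```python
from scipy.stats import binom

n, p0 = 20, 0.5
# alpha = P[X >= 14 | p = 0.5] = 1 - P[X <= 13 | p = 0.5]
alpha = 1 - binom.cdf(13, n, p0)
print(round(alpha, 4))   # about 0.0577, so C = {14, 15, ..., 20}
```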
Type II Error: It is possible that the observed value of the test statistic falls into the H0 region even though H0
is not true and should be rejected. So we fail to reject H0 even though it is not true. If this occurs, a Type II error
is committed. The probability of committing a Type II error is denoted by β. The value of β is harder to predict
directly. It is usually calculated subject to some given value of the alternative. It is obviously the probability of getting
the test statistic value in the H0 region subject to the given alternative.
Note: When we fail to reject H0, we say that the evidence from the observed sample is not enough to support H1.
Ex. Suppose that, unknown to the researcher, the true proportion of the cars with misaimed headlights is p = 0.7.
What is the probability that the α = 0.0577 test, as designed in the previous example, is unable to detect this
situation? For this, we calculate
β = P[X ≤ 13 : p = 0.7] = 0.392.


Power: Suppose a researcher puts a great deal of time, effort and money into designing and carrying out an experiment to gather evidence to support a research theory. Therefore, the researcher would like the probability
of rejecting the null hypothesis when the research theory is true to be high. This probability is called the power of the test.
Note that both the probabilities, power and β, are calculated under the assumption that the research theory is
true. The researcher will either fail to reject the null hypothesis with probability β or will reject the null hypothesis
with probability power. Thus

β + power = 1, or power = 1 − β.

In the previous example, we found β = 0.392 under the assumption that the research theory is true (p = 0.7).
Therefore, power = 1 − 0.392 = 0.608.
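For the same test, β and the power at the alternative p = 0.7 can be computed as follows (a sketch assuming SciPy; the value 13 is the upper end of the non-rejection region found earlier).

```python
from scipy.stats import binom

n = 20
# Type II error: fail to reject H0 (observe X <= 13) when in fact p = 0.7
beta = binom.cdf(13, n, 0.7)
power = 1 - beta
print(round(beta, 3), round(power, 3))   # about 0.392 and 0.608
```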

Significance testing
Suppose we want to test
H0 : p ≤ 0.1
H1 : p > 0.1
based on a sample of size 20. Let the test statistic be X, the number of successes observed in 20 trials. If
p = 0.1, the null value of p, then X follows a binomial distribution with mean E[X] = 20(0.1) = 2. So values of
X somewhat greater than 2 will lead to the rejection of the null hypothesis. Suppose we want α to be very small, say
0.0001. From the binomial probability distribution table, we have
P[X ≥ 9 : p = 0.1] = 1 − P[X ≤ 8 : p = 0.1] = 1 − 0.9999 = 0.0001.
So the critical region of the test is C = {9, 10, ......., 20}. Now suppose we conduct the test and observe 8 successes.
It does not fall into C. So via our rigid rule of hypothesis testing we are unable to reject H0. However, a little
thought should make us a bit uneasy with this decision. We find
P[X ≥ 8 : p = 0.1] = 1 − P[X ≤ 7 : p = 0.1] = 1 − 0.9996 = 0.0004.
It means we are willing to tolerate 1 chance in 10000 of making a Type I error, but we declare 4 chances
in 10000 of making such an error too large to risk. There is so little difference between these probabilities that it
seems a bit silly to insist on our original cut-off value of 9.
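The two tail probabilities compared above can be obtained directly (a sketch assuming SciPy):

```python
from scipy.stats import binom

n, p0 = 20, 0.1
print(1 - binom.cdf(8, n, p0))   # P[X >= 9 | p = 0.1], about 0.0001
print(1 - binom.cdf(7, n, p0))   # P[X >= 8 | p = 0.1], about 0.0004
```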
Such a problem can be avoided by adopting a technique known as significance testing, where we do not preset α
and hence do not specify a rigid critical region. Rather, we evaluate the test statistic and then determine the
probability of observing a value of the test statistic at least as extreme as the value noted, under the assumption
θ = θ0. This probability is known as the critical level or descriptive level of significance or P value of the test. We
reject H0 if we consider this P value to be small. In case an α level has been preset to ensure that a traditional or
industry maximum acceptable level is met, we compare the P value with the preset α value. If P ≤ α, then we can
reject the null hypothesis at least at the stated level of significance.
Ex. Automotive engineers are using more and more aluminium in manufacturing automobiles in the hope of
reducing the cost and improving the petrol mileage. For a particular model, the mileage on the highway has a mean of 26
kmpl with a standard deviation of 5 kmpl. It is hoped that a new design manufactured by using more aluminium
will increase the mean petrol mileage on the highway while maintaining the standard deviation of 5 kmpl. So we test the
hypothesis
H0 : μ ≤ 26
H1 : μ > 26 (the new design increases the petrol mileage on the highway)
Suppose 36 vehicles with the new design are tested on the highway and the mean petrol mileage is found to be 28.04
kmpl. Here, n = 36 and the sample mean is X̄ = 28.04. We choose X̄ as the test statistic since X̄ is an unbiased
estimator for the population mean μ. We know X̄ is approximately normally distributed with mean μ = 26 and
standard deviation σ_X̄ = σ/√n = 5/√36 = 5/6. Therefore Z = (X̄ − μ)/(σ/√n) = (X̄ − 26)/(5/6) is the corresponding standard normal variate.
So the P value of the test is given by
P[X̄ ≥ 28.04 : μ = 26] = P[Z ≥ 2.45] = 1 − P[Z ≤ 2.45] = 1 − 0.9929 = 0.0071.
There are two explanations of this very small probability. First, the null hypothesis H0 is true and we have
observed a very rare sample that by chance has a large mean. Second, the new design with more aluminium has,
in fact, resulted in a higher mean petrol mileage. We prefer the second explanation as it supports our research
hypothesis H1. That is, we shall reject H0 and report that the P value of our test is 0.0071. In case there is some
prestated level of significance, say α = 0.05, we can safely reject H0 at this level of significance since the P value
0.0071 of our test is less than α = 0.05.
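The P value reported above follows from the standard normal distribution; a minimal sketch assuming SciPy is:

```python
from scipy.stats import norm

x_bar, mu0, sigma, n = 28.04, 26, 5, 36
z = (x_bar - mu0) / (sigma / n ** 0.5)   # observed value, about 2.45
p_value = 1 - norm.cdf(z)                # right-tailed P value, about 0.0071
print(round(z, 2), round(p_value, 4))
```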
Note: Significance testing is a widely used concept as it is more appealing than hypothesis testing. For a right-
tailed test (H1 : θ > θ0), the P value is the area under the probability curve to the right of the observed value of the
test statistic, while for a left-tailed test (H1 : θ < θ0), it is the area to the left. For a two-tailed test (H1 : θ ≠ θ0),
where the distribution is symmetric, as it is for the Z or T statistic, it is logical to double the apparent one-tailed P
value. If the distribution is not symmetric, as it is for the chi-squared statistic, then presumably the two-tailed P value
is nearly double the one-tailed value.

Hypothesis and significance tests on the mean


There are three forms for the tests of hypotheses on the mean μ of a distribution.

I (Right-tailed test):   H0 : μ ≤ μ0,   H1 : μ > μ0
II (Left-tailed test):   H0 : μ ≥ μ0,   H1 : μ < μ0
III (Two-tailed test):   H0 : μ = μ0,   H1 : μ ≠ μ0

Tests of hypothesis on μ are actually conducted by testing H0 : μ = μ0 against one of the alternatives μ > μ0,
μ < μ0 and μ ≠ μ0. In particular, the values of the test statistic that lead us to reject μ0 and to conclude that
μ > μ0 will also lead us to reject any value less than μ0. Similarly, the values of the test statistic that lead us to
reject μ0 and to conclude that μ < μ0 will also lead us to reject any value greater than μ0. For this reason many
statisticians prefer to write the above three tests as

I (Right-tailed test):   H0 : μ = μ0,   H1 : μ > μ0
II (Left-tailed test):   H0 : μ = μ0,   H1 : μ < μ0
III (Two-tailed test):   H0 : μ = μ0,   H1 : μ ≠ μ0

This emphasizes the fact that while performing a hypothesis test on μ, α is computed assuming that μ = μ0.
Similarly, while performing a significance test on μ, the P value is computed under the assumption μ = μ0.
Ex. The maximum acceptable level for exposure to microwave radiation in Mumbai is an average of 10 microwatts
per square centimeter. It is feared that a large television transmitter may be polluting the air nearby by pushing
the level of microwave radiation above the safe limit. So we want to test
H0 : μ ≤ 10
H1 : μ > 10 (unsafe)
Obviously, a right-tailed test is applicable here. Suppose a sample of 25 readings is to be obtained. Then our test
statistic (X̄ − 10)/(S/√25) follows a T_24 distribution when μ = 10. Let us preset α. If we make a Type I error, we
shall shut down the transmitter unnecessarily. On the other hand, if we make a Type II error, we shall fail to detect a
potential health hazard. We want α small but not so small as to force β to be very large. Let us choose α = 0.1. From the
T distribution probability table, we find that the critical point of the test is 1.318. Suppose the sample of 25 readings
gives X̄ = 10.3 and S = 2. So the observed value of the test statistic is (X̄ − 10)/(S/√25) = (10.3 − 10)/(2/5) = 0.75,
which is less than the critical value 1.318. Therefore, we are unable to reject H0 and conclude that the observed
data do not support the contention that the transmitter is forcing the average microwave level above the safe
limit.


Now, let us find the P value of the test, that is, P[T_24 ≥ 0.75]. From the T distribution probability table, we
find that P[T_24 > 0.685] = 1 − P[T_24 ≤ 0.685] = 1 − 0.75 = 0.25. Also, P[T_24 > 1.318] = 1 − P[T_24 ≤ 1.318] =
1 − 0.9 = 0.1. Next, the observed value of the test statistic is 0.75, which lies between 0.685 and 1.318. It follows
that the P value of the test, given by P[T_24 ≥ 0.75], is greater than 0.1 but less than 0.25. Since the P value of
the test is greater than the preset α value 0.1, we are unable to reject H0 in favor of H1 at the stated level of
significance.
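The observed statistic and its P value can also be obtained numerically (a sketch assuming SciPy; the exact P value it returns lies in the interval deduced from the table).

```python
from scipy.stats import t

x_bar, mu0, S, n = 10.3, 10, 2, 25
t_obs = (x_bar - mu0) / (S / n ** 0.5)   # observed value, 0.75
p_value = 1 - t.cdf(t_obs, df=n - 1)     # right-tailed P value for T_24
print(t_obs, round(p_value, 3))          # about 0.23, i.e. between 0.1 and 0.25
```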
Ex. See example 8.5.5 from the textbook for a two-tailed test on the mean.

Chapter 9 (9.1, 9.2)


Estimator for proportions
Consider a population of interest where a particular trait is being studied, and each member of the population can
be classified as either having or failing to have the trait. Let p be the proportion of the population with the trait. In
order to find a logical estimator for p, we select a random sample X1, X2, ......., Xn of size n from the population,
where Xi = 1 if the ith member of the sample has the trait and Xi = 0 otherwise. Then X = X1 + X2 + ..... + Xn
is equal to the number in the sample with the trait. Therefore, the sample proportion with the trait is given by

p̂ = X/n.

It serves as an estimator for the population proportion p. Note that p̂ = X̄, that is, p̂ is the mean of the selected
random sample. Therefore, by the Central Limit Theorem, p̂ is approximately normally distributed with the same mean
as each Xi and variance equal to Var(Xi)/n. Now, the density of Xi is given by

xi     :   1       0
f(xi)  :   p     1 − p

So the mean of Xi is E[Xi] = p and Var(Xi) = E[Xi²] − (E[Xi])² = p − p² = p(1 − p). Hence, p̂ is approximately normal
with mean p and variance p(1 − p)/n. It implies that Z = (p̂ − p)/√(p(1 − p)/n) is a standard normal variable, and
the 100(1 − α)% confidence interval on p is [L1, L2], where

L1 = p̂ − z_{α/2}√(p(1 − p)/n)
L2 = p̂ + z_{α/2}√(p(1 − p)/n)

Note that we do not know p. So we replace p by its unbiased estimator p̂. So we have

L1 = p̂ − z_{α/2}√(p̂(1 − p̂)/n)
L2 = p̂ + z_{α/2}√(p̂(1 − p̂)/n)

Ex. In a randomly selected sample of 100 bulbs from the output of a factory, 91 bulbs are found to be working fine
without any defect. Find the 95% confidence interval on the population proportion of non-defective bulbs.
Sol. Here, n = 100 and p̂ = 91/100 = 0.91. Also, from the normal table, z_{0.05/2} = 1.96. So the 95% confidence limits
on the population proportion of non-defective bulbs are

L1 = p̂ − z_{α/2}√(p̂(1 − p̂)/n) = 0.91 − 1.96√(0.91(1 − 0.91)/100) = 0.91 − 0.056 = 0.854
L2 = p̂ + z_{α/2}√(p̂(1 − p̂)/n) = 0.91 + 1.96√(0.91(1 − 0.91)/100) = 0.91 + 0.056 = 0.966

So with 95% confidence, we expect the proportion of non-defective bulbs produced by the factory to be between 85.4% and 96.6%.
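A quick numerical check of these limits (a sketch assuming SciPy):

```python
from scipy.stats import norm

n, x = 100, 91
p_hat = x / n                                        # 0.91
z = norm.ppf(0.975)                                  # z_{0.025} = 1.96
half_width = z * (p_hat * (1 - p_hat) / n) ** 0.5    # about 0.056
print(round(p_hat - half_width, 3),                  # about 0.854
      round(p_hat + half_width, 3))                  # about 0.966
```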


Sample size for estimating p


A sample-based experiment may yield a lengthy confidence interval on p which is virtually useless. The 100(1 − α)%
confidence interval [L1, L2] = [p̂ − z_{α/2}√(p̂(1 − p̂)/n), p̂ + z_{α/2}√(p̂(1 − p̂)/n)] on p tells us that we are 100(1 − α)%
sure that p lies in this interval, and consequently p̂ differs from p at most by d = z_{α/2}√(p̂(1 − p̂)/n). This in turn
implies that the sample size n for the confidence interval of desired length 2d is given by

n = z²_{α/2} p̂(1 − p̂) / d²

Note that this formula can be used if a prior estimate p̂ for p is available. Otherwise, we use the formula

n = z²_{α/2} / (4d²)

since it can be shown that p(1 − p) will never exceed 1/4.


Ex. In the previous example where p̂ = 0.91, if we desire the length of the confidence interval to be 0.02, that is,
d = 0.01, then the number of bulbs in the sample should be

n = (1.96)²(0.91)(1 − 0.91)/(0.01)² = 3147.

In case the prior estimate p̂ for p is not available, then for the 95% confidence interval with length 0.02, we
need to select a sample of size

n = (1.96)²/(4(0.01)²) = 9604.
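Both sample-size formulas are easy to evaluate; the sketch below (assuming SciPy, and using math.ceil since the sample size must be an integer) reproduces the two values above.

```python
import math
from scipy.stats import norm

z = norm.ppf(0.975)        # about 1.96 for a 95% confidence interval
d = 0.01                   # half-length of the desired interval

# With a prior estimate p_hat = 0.91
p_hat = 0.91
n_with_prior = math.ceil(z**2 * p_hat * (1 - p_hat) / d**2)   # about 3147

# Without any prior estimate, use p(1 - p) <= 1/4
n_worst_case = math.ceil(z**2 / (4 * d**2))                   # about 9604
print(n_with_prior, n_worst_case)
```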

Testing hypothesis on proportion


Let the null value of the population proportion p be p0. Then for testing the hypothesis H0 : p = p0 against one of
the alternatives H1 : p > p0, H1 : p < p0 and H1 : p ≠ p0, we use the test statistic (p̂ − p0)/√(p0(1 − p0)/n). This
statistic is logical since it compares the unbiased point estimator p̂ for p to the null value p0. Furthermore, by the
Central Limit Theorem, this statistic follows the standard normal distribution when p = p0.
Ex. The majority of faults on transmission lines are the result of external influences and are usually transitory. It
is thought that more than 70% of the faults are caused by lightning. To gain evidence to support this contention,
we test
H0 : p = 0.7
H1 : p > 0.7
Data gathered over a year-long period show that 151 of 200 faults were due to lightning. So the observed
value of the test statistic is

(p̂ − p0)/√(p0(1 − p0)/n) = (151/200 − 0.7)/√(0.7(1 − 0.7)/200) = 1.697

From the normal table, we see that

P[Z ≥ 1.69] = 0.0455,   P[Z ≥ 1.70] = 0.0446.

It implies that the P value of our test lies between 0.0446 and 0.0455. Considering this small P value, we reject H0
and conclude that p > 0.7.
Note. In the above example, the hypothesis test on p relies on the sample size being large enough for the normal
approximation to the binomial distribution to hold. In fact, the criteria np0 = 200(0.7) = 140 > 5 and
n(1 − p0) = 200(1 − 0.7) = 60 > 5 are met, so the binomial distribution is well approximated by the normal distribution.
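The observed statistic and the corresponding P value for this test can be computed as follows (a sketch assuming SciPy):

```python
from scipy.stats import norm

n, x, p0 = 200, 151, 0.7
p_hat = x / n                                    # 0.755
z = (p_hat - p0) / (p0 * (1 - p0) / n) ** 0.5    # about 1.697
p_value = 1 - norm.cdf(z)                        # right-tailed P value
print(round(z, 3), round(p_value, 4))            # P value about 0.045
```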


MORE CONCEPTS SHALL BE ADDED SOON.


Cheers!
