RELATED DISTRIBUTIONS
SUBMITTED BY
OSAMA BIN AJAZ
(std_18154@iobm.edu.pk)
CONTENTS

Abstract
Bernoulli distribution
Binomial distribution
Multinomial distribution
Beta binomial distribution
Correlated binomial distribution
Neyman C(α) test
References
ABSTRACT
R. E. Tarone, of the National Cancer Institute, Bethesda, Maryland, derives tests for the goodness of fit of the binomial distribution using the C(α) procedure of Neyman (1959); these tests are asymptotically optimal against the generalized binomial alternatives proposed by Altham (1978) and Kupper & Haseman (1978). Before coming to the article, I explain the binomial and related distributions. I have reproduced key parts of the article; anyone interested in the details of the article is advised to see the references on the last page of the report.
Bernoulli distribution
A random variable X is defined to have a Bernoulli distribution if the discrete density function of X is given by

f(x) = p^x (1 − p)^(1−x)  for x = 0, 1
f(x) = 0                  otherwise,

where the parameter p satisfies 0 ≤ p ≤ 1 and q = 1 − p.

If X has a Bernoulli distribution, then

E[X] = p,
var[X] = pq,
M_X(t) = pe^t + q.
Proof

E[X] = Σ_{x=0}^{1} x p^x (1 − p)^(1−x) = 0·q + 1·p = p.

M_X(t) = E[e^(tX)] = Σ_{x=0}^{1} e^(tx) p^x (1 − p)^(1−x) = q + pe^t.
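As a quick numerical check (a sketch, not part of the original report), the moments above can be verified directly from the two-point density; the value p = 0.2 is an arbitrary illustrative choice:

```python
# Verify E[X] = p, var[X] = pq and M_X(t) = p*e^t + q for a Bernoulli(p).
import math

p = 0.2            # illustrative parameter choice
q = 1 - p
pmf = {0: q, 1: p}  # f(x) = p^x (1-p)^(1-x) for x in {0, 1}

mean = sum(x * f for x, f in pmf.items())
var = sum((x - mean) ** 2 * f for x, f in pmf.items())
t = 0.7
mgf = sum(math.exp(t * x) * f for x, f in pmf.items())

assert math.isclose(mean, p)
assert math.isclose(var, p * q)
assert math.isclose(mgf, p * math.exp(t) + q)
```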
Example 1: Out of millions of instant lottery tickets, suppose that 20% are winners. If five such tickets are purchased, (0, 0, 0, 1, 0) is a possible observed sequence in which the fourth ticket is a winner and the other four are losers. Assuming independence among winning and losing tickets, the probability of this outcome is (0.8)(0.8)(0.8)(0.2)(0.8) = (0.2)(0.8)^4. [5]
Binomial distribution

In a sequence of Bernoulli trials, we are often interested in the total number of successes and not in the order of their occurrence. If we let the random variable X equal the number of observed successes in n Bernoulli trials, the possible values of X are 0, 1, 2, . . ., n. If x successes occur, where x = 0, 1, 2, . . ., n, then n − x failures occur. The number of ways of selecting the x positions for the x successes in the n trials is

(n choose x) = n! / (x!(n − x)!).

Since the trials are independent and since the probabilities of success and failure on each trial are, respectively, p and q = 1 − p, the probability of each of these ways is p^x (1 − p)^(n−x). Thus f(x), the p.m.f. of X, is the sum of the probabilities of these (n choose x) mutually exclusive events; that is,

f(x) = (n choose x) p^x (1 − p)^(n−x)  for x = 0, 1, 2, . . ., n.
By the binomial theorem,

(q + p)^n = Σ_{x=0}^{n} (n choose x) q^(n−x) p^x = q^n + n q^(n−1) p + (n choose 2) q^(n−2) p^2 + · · · + p^n,

so that

Σ_{x=0}^{n} b(x; n, p) = (q + p)^n = 1.
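The fact that the binomial probabilities sum to one can be checked numerically (a small sketch; n = 10 and p = 0.3 are arbitrary illustrative values):

```python
# b(x; n, p) = C(n, x) p^x q^(n-x) summed over x = 0..n equals 1,
# as guaranteed by the binomial theorem.
import math

n, p = 10, 0.3   # illustrative values
q = 1 - p
pmf = [math.comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]
assert math.isclose(sum(pmf), 1.0)
```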
For example, with n = 4 trials and success probability p = 1/6, the probabilities are

b(x; 4, 1/6) = (4 choose x)(1/6)^x(5/6)^(4−x),  x = 0, 1, . . ., 4.
Proof

M_X(t) = E[e^(tX)] = Σ_{x=0}^{n} e^(tx) (n choose x) p^x q^(n−x) = Σ_{x=0}^{n} (n choose x) (pe^t)^x q^(n−x) = (pe^t + q)^n.
Example: if X has the moment generating function

M_X(t) = (2/3 + (1/3)e^t)^5,

then X has a binomial distribution with n = 5 and p = 1/3; that is, the pmf of X is

f(x) = (5 choose x)(1/3)^x(2/3)^(5−x)  for x = 0, 1, . . ., 5.
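Identifying n and p from an mgf in this way can be confirmed numerically (a sketch, not from the report): the mgf computed from the pmf with n = 5, p = 1/3 matches the closed form at arbitrary values of t.

```python
# Compare M_X(t) computed from the binomial pmf with (q + p*e^t)^n.
import math

n, p = 5, 1 / 3
q = 1 - p
for t in (-1.0, 0.0, 0.5, 1.3):
    from_pmf = sum(
        math.exp(t * x) * math.comb(n, x) * p**x * q**(n - x)
        for x in range(n + 1)
    )
    closed_form = (q + p * math.exp(t)) ** n
    assert math.isclose(from_pmf, closed_form)
```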
Now, for every fixed ε > 0, the right-hand member of the preceding inequality is close to zero for sufficiently large n. That is, P(|X/n − p| ≥ ε) → 0 as n → ∞. Since this is true for every fixed ε > 0, we see, in a certain sense, that the relative frequency of success is, for large values of n, close to the probability p of success. [3]
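This convergence of the relative frequency can be illustrated by simulation (a sketch; p = 0.25 and the sample sizes are arbitrary choices, with a fixed seed for reproducibility):

```python
# The relative frequency X/n of successes approaches p as n grows.
import random

random.seed(1)   # fixed seed so the run is reproducible
p = 0.25
diffs = {}
for n in (100, 10_000, 1_000_000):
    x = sum(random.random() < p for _ in range(n))
    diffs[n] = abs(x / n - p)
# diffs[n] shrinks roughly like sqrt(p*q/n) as n increases.
```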
Example 5: Let the independent random variables X1, X2, X3 have the same cdf F(x). Let Y be the middle value of X1, X2, X3. To determine the cdf of Y, say FY(y) = P(Y ≤ y), we note that Y ≤ y if and only if at least two of the random variables X1, X2, X3 are less than or equal to y. Let us say that the ith trial is a success if Xi ≤ y, i = 1, 2, 3; here each trial has probability of success F(y). In this terminology, FY(y) = P(Y ≤ y) is then the probability of at least two successes in three independent trials. Thus

FY(y) = (3 choose 2)[F(y)]^2[1 − F(y)] + [F(y)]^3.

If F(x) is a continuous cdf so that the pdf of X is F′(x) = f(x), then the pdf of Y is

fY(y) = F′Y(y) = 6[F(y)][1 − F(y)]f(y). [4]
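The cdf of the middle value can be checked by simulation (a sketch under the assumption F is the Uniform(0,1) cdf, so F(y) = y; seed fixed for reproducibility):

```python
# For three iid Uniform(0,1) variables, F_Y(y) = 3F(y)^2 - 2F(y)^3,
# which equals C(3,2) F^2 (1-F) + F^3 from the text.
import random

random.seed(7)
y = 0.3
F = y                          # Uniform(0,1) cdf: F(y) = y
exact = 3 * F**2 - 2 * F**3    # expanded form of the cdf above

trials = 200_000
hits = 0
for _ in range(trials):
    sample = sorted(random.random() for _ in range(3))
    if sample[1] <= y:         # middle value of the three is <= y
        hits += 1
assert abs(hits / trials - exact) < 0.01
```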
MULTINOMIAL DISTRIBUTION
Recall that in order for an experiment to be binomial, two outcomes are required for each trial. But if each trial in an experiment has more than two outcomes, a distribution called the multinomial distribution must be used. For example, a survey might require the responses of "approve," "disapprove," or "no opinion." In another situation, a person may have a choice of one of five activities for Friday night, such as a movie, dinner, baseball game, play, or party. Since these situations have more than two possible outcomes for each trial, the binomial distribution cannot be used to compute probabilities.
If X consists of events E1, E2, E3, . . ., Ek, which have corresponding probabilities p1, p2, p3, . . ., pk of occurring, and X1 is the number of times E1 will occur, X2 is the number of times E2 will occur, X3 is the number of times E3 will occur, etc., then the probability that X will occur is

P(X) = n! / (X1! X2! X3! · · · Xk!) · p1^X1 · p2^X2 · · · pk^Xk,

where X1 + X2 + · · · + Xk = n and p1 + p2 + · · · + pk = 1.
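The multinomial formula above can be sketched in code; the counts and probabilities below are illustrative assumptions (e.g. a six-person survey with outcomes approve / disapprove / no opinion), not data from the report:

```python
# Multinomial probability: n! / (X1!...Xk!) * p1^X1 * ... * pk^Xk.
import math

def multinomial_pmf(counts, probs):
    n = sum(counts)
    coef = math.factorial(n)
    for c in counts:
        coef //= math.factorial(c)   # multinomial coefficient
    prob = float(coef)
    for c, pk in zip(counts, probs):
        prob *= pk ** c
    return prob

# Illustrative: 3 approve, 2 disapprove, 1 no opinion.
prob = multinomial_pmf([3, 2, 1], [0.5, 0.3, 0.2])
```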
Beta binomial distribution

A random variable X has a beta binomial distribution if its density is

f(x) = (n choose x) · B(α + x, β + n − x) / B(α, β) · I_{0, 1, . . ., n}(x),

where B(·, ·) is the beta function and α, β > 0. Its mean is

nα / (α + β)

and variance

nαβ(n + α + β) / [(α + β)^2 (α + β + 1)].

If α = β = 1, then the beta binomial distribution reduces to a discrete uniform distribution over the integers 0, 1, . . ., n. [2]
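A small sketch of the beta-binomial pmf (using log-gamma for the beta function; n, α, β values are arbitrary) confirms both the uniform special case and the mean formula above:

```python
# Beta-binomial pmf: f(x) = C(n, x) B(a + x, b + n - x) / B(a, b).
import math

def beta_fn(a, b):
    # B(a, b) computed via log-gamma for numerical stability
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def beta_binomial_pmf(x, n, a, b):
    return math.comb(n, x) * beta_fn(a + x, b + n - x) / beta_fn(a, b)

n = 6
# a = b = 1 gives the discrete uniform on {0, ..., n}.
uniform = [beta_binomial_pmf(x, n, 1, 1) for x in range(n + 1)]
assert all(math.isclose(u, 1 / (n + 1)) for u in uniform)

# Mean matches n*a/(a+b) for general a, b (illustrative values).
a, b = 2.0, 5.0
mean = sum(x * beta_binomial_pmf(x, n, a, b) for x in range(n + 1))
assert math.isclose(mean, n * a / (a + b))
```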
Correlated binomial distribution

The correlated binomial (CB) distribution of Kupper & Haseman (1978) has density

P(X = x) = (n choose x) p^x q^(n−x) [1 + (θ / (2p^2 q^2)) {(x − np)^2 + x(2p − 1) − np^2}],  x = 0, 1, . . ., n,

where p is the probability that the fetus is abnormal. Note that for the above equation to be a valid probability distribution, a data-dependent bound for the parameters has to be imposed; see Kupper and Haseman (1978). It can be shown that the expectation and variance of the correlated binomial distribution are np and np(1 − p) + n(n − 1)θ, respectively. Thus, the correlated binomial distribution is a generalization of the binomial distribution: the CB distribution becomes the binomial distribution when θ = 0. Altham (1978) derived a further two-parameter generalized binomial distribution, namely, the multiplicative generalized binomial (MB) distribution.
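The CB moments quoted above can be checked numerically (a sketch; n, p and θ are illustrative values chosen small enough that every pmf term stays non-negative):

```python
# Correlated binomial pmf and its moments: sums to 1, mean np,
# variance np(1-p) + n(n-1)*theta.
import math

def cb_pmf(x, n, p, theta):
    q = 1 - p
    binom = math.comb(n, x) * p**x * q**(n - x)
    g = (x - n * p) ** 2 + x * (2 * p - 1) - n * p**2
    return binom * (1 + theta * g / (2 * p**2 * q**2))

n, p, theta = 8, 0.3, 0.01    # illustrative values
pmf = [cb_pmf(x, n, p, theta) for x in range(n + 1)]
assert math.isclose(sum(pmf), 1.0)              # a valid distribution
mean = sum(x * f for x, f in enumerate(pmf))
var = sum((x - mean) ** 2 * f for x, f in enumerate(pmf))
assert math.isclose(mean, n * p)
assert math.isclose(var, n * p * (1 - p) + n * (n - 1) * theta)
```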
P(X = x) = (n choose x) p^x q^(n−x) a^(x(n−x)) / F(n),  x = 0, 1, 2, . . ., n,  a ≥ 0,  0 ≤ p ≤ 1,

where F(n) is the normalizing constant.
Neyman C(α) test

The C(α) score statistic for testing Ho: θ = 0 in the correlated binomial model is

S(p) = (1 / (2p^2 q^2)) Σ_{i=1}^{N} {(x_i − n_i p)^2 + x_i(2p − 1) − n_i p^2}.  (2)
11
Under the null hypothesis, the x_i are independent binomial random variables, and hence it follows from (2) that E{S(p)} = 0. Neyman (1959) has shown that when E{S(p)} = 0 the null hypothesis Ho: θ = 0 can be tested using the statistic S(p̂), where p̂ is a root-n consistent estimator of p (Moran, 1970). Substituting the consistent estimator

p̂ = Σ x_i / Σ n_i,

we find that the C(α) test statistic is given by

S(p̂) = (1 / (2p̂^2 q̂^2)) {Σ_i (x_i − n_i p̂)^2 − p̂ q̂ Σ_i n_i}.
Since E{S(p)} = 0, the variance of S(p̂) is given by E{S^2(p)}, where the expectation is taken under Ho: θ = 0. From (3) it follows that

E{S^2(p)} = Σ n_i(n_i − 1) / (2p^2 q^2).

Substituting p̂ for p and standardizing S(p̂) by its estimated standard deviation gives

Z = {Σ_i (x_i − n_i p̂)^2 − p̂ q̂ Σ_i n_i} / [p̂ q̂ {2 Σ_i n_i(n_i − 1)}^(1/2)].

Under the null hypothesis Ho: θ = 0, the statistic Z will have an asymptotic standard normal distribution.
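The Z statistic can be sketched in code as follows; the litters/counts below are made-up illustrative data, not taken from the article:

```python
# C(alpha) Z statistic for binomial goodness of fit:
# Z = (sum (x_i - n_i*p)^2 - p*q*sum n_i) / (p*q*sqrt(2*sum n_i(n_i-1))),
# with p estimated by sum(x)/sum(n).
import math

def c_alpha_z(x, n):
    """x[i] successes out of n[i] trials, i = 1..N."""
    p_hat = sum(x) / sum(n)
    q_hat = 1 - p_hat
    s = sum((xi - ni * p_hat) ** 2 for xi, ni in zip(x, n))
    num = s - p_hat * q_hat * sum(n)
    den = p_hat * q_hat * math.sqrt(2 * sum(ni * (ni - 1) for ni in n))
    return num / den

# Illustrative (made-up) data: 8 groups of Bernoulli trials.
x = [2, 0, 1, 3, 1, 0, 2, 1]
n = [10, 10, 12, 12, 10, 8, 10, 10]
z = c_alpha_z(x, n)
```

A large |Z| indicates extra-binomial variation across the groups.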
The multiplicative generalization of the binomial distribution provides an alternative for which the correlated binomial C(α) test is not asymptotically optimal. The log likelihood function for the multiplicative generalization of the binomial model is

Σ_i {x_i log p + (n_i − x_i) log q + x_i(n_i − x_i) log a − log F(n_i)}.

The C(α) test for Ho: a = 1 is based on the statistic R = Σ_i x_i(n_i − x_i). Note that unlike the correlated binomial C(α) statistic, R is not equivalent to the variance test statistic in the case n_i = n for all i. The corresponding standardized statistic X²m will have an asymptotic chi-squared distribution with one degree of freedom. The test based on X²m is asymptotically optimal against alternatives given by the multiplicative generalization of the binomial model.
                              Binomial Probabilities
             P = 0.10             P = 0.25             P = 0.50
Nominal
level:   0.01   0.05   0.10   0.01   0.05   0.10   0.01   0.05   0.10

X²c      0.007  0.019  0.048  0.013  0.035  0.073  0.009  0.034  0.077
X²m      0.010  0.043  0.100  0.012  0.037  0.085  0.009  0.031  0.075
X²v      0.003  0.042  0.082  0.012  0.042  0.097  0.007  0.049  0.108
REFERENCES
1. Alexander M. Mood, Franklin A. Graybill and Duane C. Boes, Introduction to the Theory of Statistics, third edition, McGraw-Hill Series in Probability and Statistics.