
Probability Review

MIE1605 Lecture 1

September 19, 2017


Course Overview

S YLLABUS

 Meeting Times: Tuesdays 09:00-12:00 PM


 Office Hours:
◦ Monday: 04:30-06:00 PM
◦ By Appointment
 Course HomePage:
◦ blackboard
◦ I will post outlines of lecture slides (if any) there before class.
◦ Homework assignments and grades will be posted there
 Textbook: Adventures in Stochastic Processes, S. Resnick
Probability and Random Processes, G. Grimmett, D. Stirzaker
 Grading:
◦ 5× Homework 20%
◦ 1× Midterm 40%
◦ 1× Final 40%
M Cevik MIE1605 - Probability Review 2 / 96
Course Overview

T ENTATIVE S CHEDULE

Dates Topic
12/09 - 19/09 Probability Review
26/09 - 10/10 Discrete Time Markov Chains
17/10 - 24/10 Poisson Processes
30/10 Midterm Exam, 5.15 - 7.30 PM
31/10 - 07/11 Continuous Time Markov Chains
14/11 - 21/11 Renewal Processes
28/11 - 05/12 Brownian Motion and Martingales
12/12 Final Exam, 9.00 - 11.30 PM

M Cevik MIE1605 - Probability Review 3 / 96


Course Overview

O UTLINE
1 Course Overview
2 Probability Basics
Introduction
Random variables
Sum of Independent RVs
Functions of RVs
3 Limit theorems
4 Generating Functions
Moment Generating Functions
Probability Generating Functions
5 Random Sums
6 Simple Branching Process
7 Simple Random Walk

M Cevik MIE1605 - Probability Review 4 / 96


Probability Basics Introduction

P ROBABILITY S PACE

 A measure on a set is a systematic way to assign a number to each suitable subset of that set.

 Let X be a set and Σ a σ-field over X. A function µ from Σ to the


extended real number line is called a measure if it satisfies the
following properties:
(1) Non-negativity : For all E ∈ Σ: µ(E) ≥ 0.
(2) Null empty set: µ(∅) = 0.
(3) Countable additivity: For all countable collections {Ei}_{i=1}^∞ of
pairwise disjoint sets in Σ, µ(∪_{k=1}^∞ Ek) = Σ_{k=1}^∞ µ(Ek)

 A triple (X, Σ, µ) is called a measure space.

M Cevik MIE1605 - Probability Review 5 / 96


Probability Basics Introduction

P ROBABILITY S PACE

 Probability space is a measure space with a probability measure


and a probability measure is a measure with total measure one,
i.e., µ(X) = 1.

 A probability space consists of three parts:


(1) A sample space, Ω , which is the set of all possible outcomes.
(2) A set of events, F, where each event is a set containing zero
or more outcomes.
(3) A probability measure, P, specifying the assignment of
probabilities to the events.

M Cevik MIE1605 - Probability Review 6 / 96


Probability Basics Introduction

P ROBABILITY S PACE

 Sample Space Ω : Set of all possible outcomes ωi of an


experiment.

Ex: Toss a coin until a head appears.


ωi = T . . . TH
Ω = {H, TH, TTH, TTTH . . .} = {ω1 , ω2 , . . .}

⇒ An event is a set of outcomes of the experiment.

M Cevik MIE1605 - Probability Review 7 / 96


Probability Basics Introduction

P ROBABILITY S PACE

 Set of events, F: A collection of subsets of Ω is called a σ-field


(denoted by F ⊆ 2^Ω) if
◦ ∅ = Ω̄ ∈ F, Ω ∈ F
◦ if A1, A2, . . . ∈ F, then ∪_{i=1}^∞ Ai ∈ F ⇒ closed under countable union
◦ if A ∈ F, then Ā = Ω\A ∈ F ⇒ closed under complements

Ex: F = {∅, {1, 2, 3, 4}, {1, 2}, {3, 4}, {1, 3}, {2, 4}} is not a σ-field,
because, {1, 2} ∪ {1, 3} = {1, 2, 3} is not in F.

 Probability measure, P: A function P : F 7→ [0, 1] such that


(a) P(∅) = 0, P(Ω) = 1
(b) if A1, A2, . . . ∈ F, where Ai ∩ Aj = ∅, ∀i ≠ j,
then P(∪_{i=1}^∞ Ai) = Σ_{i=1}^∞ P(Ai)

M Cevik MIE1605 - Probability Review 8 / 96


Probability Basics Introduction

R ANDOM VARIABLES

 A function X : Ω → T is called a (T-valued) random variable (RV)


(e.g., T = R). Let X be a (discrete) RV whose range is {0, 1, 2, . . .}.
Then,
P(X = k) = pk , k = 0, 1, 2, . . .

If X is a proper RV, then Σ_{k=0}^∞ pk = 1, pk ≥ 0 ∀k

 Examples:
◦ Discrete RV
◦ Continuous RV
◦ Mixed RV

M Cevik MIE1605 - Probability Review 9 / 96


Probability Basics Introduction

D ISTRIBUTION F UNCTIONS

 Probability density function (pdf): A continuous RV X has density


fX , where fX is a non-negative Lebesgue-integrable function if
P[a ≤ X ≤ b] = ∫_a^b fX(x) dx

 Probability mass function (pmf): A discrete RV X : S → A has a


pmf fX : A → [0, 1] defined as
fX (x) = P(X = x) = P({s ∈ S : X(s) = x})

 Cumulative distribution function (cdf):


FX (x) = P(X ≤ x), P(a < X ≤ b) = FX (b) − FX (a)
◦ FX(x) = ∫_{−∞}^x fX(t) dt
◦ F(x) = Σ_{xi ≤ x} P(X = xi) = Σ_{xi ≤ x} p(xi)
M Cevik MIE1605 - Probability Review 10 / 96
Probability Basics Introduction

BAYES ’ T HEOREM

 P(A|B) = P(A ∩ B)/P(B) = P(B|A)P(A)/P(B)

 Let Ai ’s constitute a partition of the sample space S.


P(Ai|B) = P(B|Ai)P(Ai) / Σ_j P(B|Aj)P(Aj), where P(B) = Σ_j P(B|Aj)P(Aj)

 Example:
In a city, 51% of the adults are males. Also, 9.5% of males smoke
cigars, whereas 1.7% of females smoke cigars. If a randomly
selected adult smokes cigars, what is the probability that the selected
subject is a male?
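
A quick numeric check of this example (treating the quoted percentages as exact), written as a minimal Python sketch:

    # Bayes' theorem with P(M) = 0.51, P(C|M) = 0.095, P(C|F) = 0.017
    p_male = 0.51
    p_cigar_given_male = 0.095
    p_cigar_given_female = 0.017

    p_cigar = p_cigar_given_male * p_male + p_cigar_given_female * (1 - p_male)
    p_male_given_cigar = p_cigar_given_male * p_male / p_cigar
    print(p_male_given_cigar)   # roughly 0.85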

M Cevik MIE1605 - Probability Review 11 / 96


Probability Basics Introduction

J OINT DISTRIBUTION

 Joint pmf: p(x, y) = P(X = x, Y = y)


Joint pdf: f (x, y), x ∈ A, y ∈ B

 Single RV X: Cumulative distribution function


FX (x) = P(X ≤ x)

 Two RV’s X, Y: Joint cdf


F(x, y) = P(X ≤ x, Y ≤ y)
⇒ Similar definition for more than two RVs.

 Continuous RV’s: Joint cdf


P(X ∈ A, Y ∈ B) = ∫_B ∫_A f(x, y) dx dy

M Cevik MIE1605 - Probability Review 12 / 96


Probability Basics Introduction

M ARGINAL DISTRIBUTION

 Relation between joint and individual RV densities:


P(X ∈ A) = P(X ∈ A, Y ∈ R) = ∫_A ( ∫_{−∞}^∞ f(x, y) dy ) dx
⇒ fX(x) = ∫_{−∞}^∞ f(x, y) dy

 Similar for a discrete RV, marginal mass function


pX(x) = Σ_y p(x, y),  pY(y) = Σ_x p(x, y)

M Cevik MIE1605 - Probability Review 13 / 96


Probability Basics Introduction

M ARGINAL DISTRIBUTION

 Ex: Consider following joint pdf


f(x, y) = e^{−(x+y)},  if x ≥ 0, y ≥ 0
          0,            otherwise

◦ This is a joint density (it integrates to 1)

◦ Calculation of the marginals:

◦ X and Y are exponentially distributed with mean 1

M Cevik MIE1605 - Probability Review 14 / 96


Probability Basics Introduction

C ONDITIONAL P ROBABILITY D ISTRIBUTIONS

 Conditional probability mass function:

pX|Y(x|y) = P(X = x|Y = y) = P(X = x, Y = y)/P(Y = y) = p(x, y)/pY(y)

 Conditional probability density function:

fX|Y(x|y) = f(x, y)/fY(y)

 Ex: Suppose the joint pmf of X and Y is given by


p(1, 1) = 0.5, p(1, 2) = 0.1, p(2, 1) = 0.1, p(2, 2) = 0.3.
Find the pmf of X given Y = 1.
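
The answer can be read off directly from the definition above; a minimal Python check:

    # Conditional pmf of X given Y = 1 for the joint pmf in this example
    p = {(1, 1): 0.5, (1, 2): 0.1, (2, 1): 0.1, (2, 2): 0.3}
    p_y1 = p[(1, 1)] + p[(2, 1)]                    # P(Y = 1) = 0.6
    cond = {x: p[(x, 1)] / p_y1 for x in (1, 2)}    # {1: 5/6, 2: 1/6}
    print(cond)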

M Cevik MIE1605 - Probability Review 15 / 96


Probability Basics Introduction

I NDEPENDENCE

 P(A|B) = P(A) ⇒ A and B (events) are independent, A ⊥ B.

 If A ⊥ B ⇒ A ⊥ B̄

 RVs X and Y are independent if one of the following holds:


◦ FX,Y (x, y) = FX (x)FY (y)
◦ fX,Y (x, y) = fX (x)fY (y)

 Ex1: A coin is tossed n times: Let X be # of heads, Y be # of tails.


X and Y are dependent, since X + Y = n

 Ex2: X, Y having joint density exp[−(x + y)] are independent

M Cevik MIE1605 - Probability Review 16 / 96


Probability Basics Introduction

I NDEPENDENCE

Ex: Discrete RV’s that are not independent:


 X = sum of 2 flips
Y = difference of 2 flips

 Joint mass function


Outcome X Y Prob
00 0 0 0.25
01 1 -1 0.25
10 1 1 0.25
11 2 0 0.25

p(0, 0) = 0.25, pX (0) = 0.25, pY (0) = 0.5


p(2, 1) = 0, pX (2) = 0.25, pY (1) = 0.25

M Cevik MIE1605 - Probability Review 17 / 96


Probability Basics Introduction

C ONDITIONAL I NDEPENDENCE

 R and B are conditionally independent given Y


⇐⇒ P(R ∩ B | Y) = P(R | Y)P(B | Y)
or equivalently ⇐⇒ P(R | B ∩ Y) = P(R | Y)

 R and B being independent does not imply that R and B are
conditionally independent. Likewise, conditional independence of
R and B does not imply that they are independent.

 Ex: Consider a coin which can be fair or biased (favoring H):


◦ A: First coin toss is H
◦ B: Second coin toss is H
◦ C: Coin is biased
⇒ A and B are dependent.
⇒ A and B are conditionally independent given C.
M Cevik MIE1605 - Probability Review 18 / 96
Probability Basics Introduction

BASIC PROBABILITY EXAMPLE

 P(A) =?, P(B) =?, P(C) =?

 P(A ∩ C) =?, P(B ∩ C) =?

 P(A ∩ B ∩ C) =?

 P(A|C) =?, P(B|C) =?

 P(A|B ∩ C) =?, P(A ∩ B|C) =?

M Cevik MIE1605 - Probability Review 19 / 96


Probability Basics Introduction

M EAN (E XPECTATION )

 For RV X, the mean (or expectation) is defined to be


◦ E(X) = Σ_x x p(x),  E[g(X)] = Σ_x g(x) p(x)  (discrete RV)
◦ E(X) = ∫_R x f(x) dx,  E[g(X)] = ∫_R g(x) f(x) dx  (continuous RV)

 Joint distributions:
E[aX] = ∫_R ∫_R a x f(x, y) dy dx = a ∫_R x fX(x) dx = aE[X]

E(aX + bY) = ∫_R ∫_R (ax + by) f(x, y) dy dx = aE[X] + bE[Y]

E[Σ_{i=1}^n Xi] = Σ_{i=1}^n E[Xi],  the Xi's need not be independent.

M Cevik MIE1605 - Probability Review 20 / 96


Probability Basics Introduction

M EAN (E XPECTATION )

 Moments of a RV:
The rth moment of X is E[X^r]
The rth central moment of X is E[(X − E[X])^r]

 For non-negative integer RVs, an alternative way of computing


expectation:

◦ Discrete RV: E[X] = Σ_{k=0}^∞ P(X > k)
◦ Continuous RV: E[X] = ∫_0^∞ P(X > k) dk

M Cevik MIE1605 - Probability Review 21 / 96


Probability Basics Introduction

C ONDITIONAL E XPECTATION

 Conditional expectation of X given Y = y


◦ E[X|Y = y] = Σ_x x pX|Y(x|y)  (discrete RV)
◦ E[X|Y = y] = ∫ x fX|Y(x|y) dx  (continuous RV)

 Computing expectation by conditioning (continuous RV)


E[X] = ∫_{−∞}^∞ E[X|Y = y] fY(y) dy
     = ∫_{−∞}^∞ ∫_{−∞}^∞ x fX|Y(x|y) fY(y) dx dy

 Chain expansion: E[X] = EY [EX|Y (X|Y)]


M Cevik MIE1605 - Probability Review 22 / 96
Probability Basics Introduction

C ONDITIONAL E XPECTATION

 Ex: Suppose you arrive at a post office having two clerks at a


moment when both are busy, but there is no one else waiting in
line. You will enter service when either clerk becomes free. If
service times for clerk i (for i = 1, 2) are exponential with rate λi ,
find E[T], where T is the total amount of time that you will spend in
the post office (including the time you spend in service). Note that
the two servers are not identical.
P.S. X ∼ Expo(λ) ⇒ fX (x) = λe−λx , x ≥ 0, FX (x) = 1 − e−λx , x ≥ 0

Solution:
E[T] = E[T|R1 < R2]P(R1 < R2) + E[T|R2 < R1]P(R2 < R1) = · · · = 3/(λ1 + λ2)
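
A Monte Carlo sanity check of E[T] = 3/(λ1 + λ2); the rates below are illustrative only:

    import random

    lam1, lam2, n = 1.0, 2.0, 200_000
    total = 0.0
    for _ in range(n):
        r1 = random.expovariate(lam1)    # remaining service of clerk 1 (memoryless)
        r2 = random.expovariate(lam2)    # remaining service of clerk 2 (memoryless)
        if r1 < r2:
            total += r1 + random.expovariate(lam1)   # wait, then your own service at clerk 1
        else:
            total += r2 + random.expovariate(lam2)   # wait, then your own service at clerk 2
    print(total / n, 3 / (lam1 + lam2))  # both approximately 1.0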

M Cevik MIE1605 - Probability Review 23 / 96


Probability Basics Introduction

C OVARIANCE

 Measure of how much two RV change together

 Covariance of RV’s X and Y


Cov(X, Y) = E[(X − E[X])(Y − E[Y])]
expanding the product yields
Cov(X, Y) = ... = E[XY] − E[X]E[Y]

 Cov(X, X) is the variance of X,


Var(X) = E[X 2 ] − (E[X])2

 If X and Y are independent, then


E[XY] = ∫_R ∫_R x y f(x, y) dy dx = ∫_R ∫_R x y fX(x) fY(y) dy dx = E[X]E[Y]
⇒ Cov(X, Y) = 0.
M Cevik MIE1605 - Probability Review 24 / 96
Probability Basics Introduction

C OVARIANCE PROPERTIES

 Cov(cX, Y) = cCov(X, Y)

 Var(cX) = c2 Var(X)

 Cov(X, Y + Z) = Cov(X, Y) + Cov(X, Z)


 Cov(Σ_i Xi, Σ_j Yj) = Σ_i Σ_j Cov(Xi, Yj)

 Var(Σ_{i=1}^n Xi) = Σ_{i=1}^n Var(Xi) + 2 Σ_{i=1}^n Σ_{j<i} Cov(Xi, Xj)

 Var(Σ_{i=1}^n Xi) = Σ_{i=1}^n Var(Xi) → if the Xi's are independent

M Cevik MIE1605 - Probability Review 25 / 96


Probability Basics Introduction

C OVARIANCE PROPERTIES

 Ex: Flip a fair coin 3 times. Let X be the number of heads in the
first 2 flips and let Y be the number of heads on the last 2 flips (so
there is overlap on the middle flip). Compute Cov(X, Y).
Solution: 1/4
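
The 1/4 can be verified by enumerating the eight equally likely outcomes; a small Python sketch:

    from itertools import product

    outcomes = list(product([0, 1], repeat=3))        # 1 = head, three fair flips
    ex  = sum(a + b for a, b, c in outcomes) / 8      # E[X], heads in flips 1-2
    ey  = sum(b + c for a, b, c in outcomes) / 8      # E[Y], heads in flips 2-3
    exy = sum((a + b) * (b + c) for a, b, c in outcomes) / 8
    print(exy - ex * ey)                              # 0.25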

M Cevik MIE1605 - Probability Review 26 / 96


Probability Basics Introduction

S TANDARD DEVIATION / CV
 Standard deviation of the RV X is σ(X) = √Var(X)
⇒ Standard deviation and variance are measures of dispersion
about the mean

 The coefficient of variation (CV) of the RV X is σ(X)/E[X]


⇒ This is a way of normalizing the description of dispersion (it
removes a scale factor)
⇒ Not always well-defined (what if E[X] = 0?)

M Cevik MIE1605 - Probability Review 27 / 96


Probability Basics Introduction

C ORRELATION

 Correlation of RV’s X and Y


ρ(X, Y) = Cov(X, Y) / (σ(X)σ(Y))

⇒ We always have −1 ≤ ρ(X, Y) ≤ 1

 Correlation has the sign of the covariance, but it is dimensionless.

 Ex: A box contains red, white and black balls. We draw balls from
the box n times where at each draw we note ball color and then
replace it to the box. Let X1 and X2 be the number of red balls and
white balls drawn, respectively. Find ρ(X1 , X2 ).
Solution: ρ(X1, X2) = −n p1 p2 / ( √(n p1(1 − p1)) √(n p2(1 − p2)) ), where p1 and p2 are the
probabilities of drawing a red and a white ball on a single draw.

M Cevik MIE1605 - Probability Review 28 / 96


Probability Basics Introduction

C ORRELATION AND INDEPENDENCE

 If X and Y are independent ⇒ Cov(X, Y) = 0 ⇒ ρ(X, Y) = 0. Then,


X, Y are uncorrelated.

 Beware!
If X and Y are uncorrelated, this does not imply they are
independent

⇒ Exception: Joint normally distributed random variables

 Ex:
◦ X = sum of 2 coin flips, and Y = difference of the 2 flips

◦ E[XY] = 0 = E[Y], but E[X] = 1

◦ So, Cov(X, Y) = E[XY] − E[X]E[Y] = 0

◦ But we know X and Y are not independent


M Cevik MIE1605 - Probability Review 29 / 96
Probability Basics Random variables

B INOMIAL RV

 Binomial RV’s are used to describe number of successes in n


Bernoulli trials. Let X1 , X2 , . . . Xn be independent and identically
distributed (iid) RV with
Xi = 1 with prob. p, and 0, with prob. (1 − p).
⇒ X1 , X2 , . . . Xn are called Bernoulli trials
⇒ Σ_{i=1}^n Xi ∼ Bin(n, p).

 Let X ∼ Bin(n, p). Then,

◦ P(X = k) = (n choose k) p^k (1 − p)^{n−k},  k = 0, 1, 2, . . . , n
◦ E[X] = np,  Var(X) = Var(Σ_{i=1}^n Xi) = . . . = np(1 − p)

M Cevik MIE1605 - Probability Review 30 / 96


Probability Basics Random variables

G EOMETRIC RV

 Number of failures until first success, i.e., X represents number of


failures in successive Bernoulli trials until first success occurs.
◦ P(X = k) = (1 − p)^k p,  k = 0, 1, 2 . . .
◦ P(X ≤ k) = 1 − P(X > k) = 1 − (1 − p)^{k+1},  k = 0, 1, 2 . . .
◦ E[X] = (1 − p)/p,  Var[X] = (1 − p)/p^2

 Alternatively, X may represent number of trials until first success.


◦ P(X = k) = (1 − p)^{k−1} p,  k = 1, 2 . . .
◦ E[X] = 1/p,  Var[X] = (1 − p)/p^2

◦ Assume X ∼ Geom(p1) and Y ∼ Geom(p2) are independent Geometric RVs
   – P(X < Y)?
   – What is the distribution of min(X, Y)?
M Cevik MIE1605 - Probability Review 31 / 96
Probability Basics Random variables

N EGATIVE B INOMIAL RV

 Number of trials until rth success:


⇒ Nr basically corresponds to rth inter-arrival times.
◦ P(rth success on nth trial)
   = P(r − 1 successes in n − 1 trials) P(success on nth trial)
   = (n − 1 choose r − 1) p^{r−1} (1 − p)^{n−r} p,  n = r, r + 1, . . .
◦ E[Nr] = r/p,  Var(Nr) = r(1 − p)/p^2

 Number of failures before rth success: Kr


◦ P(k failures before rth success)
   = P(k failures in k + r − 1 trials) P(success on (k + r)th trial)
   = (k + r − 1 choose k) p^{r−1} (1 − p)^k p,  k = 0, 1, 2, . . .
◦ E[Kr] = r/p − r,  Var(Kr) = r(1 − p)/p^2
M Cevik MIE1605 - Probability Review 32 / 96
Probability Basics Random variables

P OISSON RV

 Used to model number of occurrences or events over a time


interval.
◦ P(X = k) = e^{−λ} λ^k / k!,  k = 0, 1, 2, . . .
◦ E[X] = λ

◦ Var(X) = λ

 Poisson RV might be used to approximate a binomial RV when n


is large and p is small.

M Cevik MIE1605 - Probability Review 33 / 96


Probability Basics Random variables

U NIFORM RV ( DISCRETE )

 Parameters a, b ∈ Z, b ≥ a.

 pmf: P(X = k) = 1/(b − a + 1),  k ∈ {a, a + 1, . . . , b − 1, b}

 cdf: F(k; a, b) = (⌊k⌋ − a + 1)/(b − a + 1)

 E[X] = (a + b)/2,  Var(X) = ((b − a + 1)^2 − 1)/12
 Ex: Roll a die. E[X] = 7/2, Var(X) = 35/12.

M Cevik MIE1605 - Probability Review 34 / 96


Probability Basics Random variables

N ORMAL (G AUSSIAN ) RV

 f(x) = (1/√(2πσ^2)) exp{ −(x − µ)^2 / (2σ^2) },  x ∈ R

 F(x) = (1/2) [ 1 + erf( (x − µ)/(σ√2) ) ],  x ∈ R

 If X ∼ N(µ, σ^2), then Y = αX + β ∼ N(αµ + β, α^2 σ^2)

 Standard Normal Distribution: Z ∼ N(0, 1) → µ = 0, σ = 1
⇒ If X ∼ N(µ, σ^2), then X = µ + σZ

 pdf of the standard normal distribution: φ(x) = (1/√(2π)) e^{−x^2/2}
⇒ pdf of X ∼ N(µ, σ^2): fX(x) = (1/σ) φ((x − µ)/σ)

 cdf of the standard normal distribution: Φ(x) = (1/√(2π)) ∫_{−∞}^x e^{−t^2/2} dt
⇒ cdf of X ∼ N(µ, σ^2): FX(x) = Φ((x − µ)/σ)
M Cevik MIE1605 - Probability Review 35 / 96
Probability Basics Random variables

E XPONENTIAL RV

 f(x) = λe^{−λx},  0 ≤ x < ∞

 F(x) = 1 − e^{−λx},  0 ≤ x < ∞

 E[X] = 1/λ,  Var(X) = 1/λ^2

 Assume X ∼ Expo(λ1) and Y ∼ Expo(λ2) are independent:

P(X < Y) = ∫_0^∞ P(X < k | Y = k) fY(k) dk
         = ∫_0^∞ (1 − e^{−λ1 k}) λ2 e^{−λ2 k} dk = λ1/(λ1 + λ2)

P(min(X, Y) ≤ k) = 1 − P(min(X, Y) > k)


= 1 − P(X > k)P(Y > k) = 1 − e−(λ1 +λ2 )k
⇒ min(X, Y) ∼ Expo(λ1 + λ2 )
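
Both facts are easy to confirm by simulation; a minimal sketch with illustrative rates:

    import random

    lam1, lam2, n = 1.0, 3.0, 200_000
    xs = [random.expovariate(lam1) for _ in range(n)]
    ys = [random.expovariate(lam2) for _ in range(n)]
    print(sum(x < y for x, y in zip(xs, ys)) / n, lam1 / (lam1 + lam2))   # ~0.25 each
    mins = [min(x, y) for x, y in zip(xs, ys)]
    print(sum(mins) / n, 1 / (lam1 + lam2))   # sample mean of min vs. 1/(lam1 + lam2)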
M Cevik MIE1605 - Probability Review 36 / 96
Probability Basics Random variables

E XPONENTIAL RV

 Ex: Two individuals, A, and B, both require kidney transplants. If


she does not receive a new kidney, then A will die after an
exponential time with rate µA , and B after an exponential time with
rate µB . New kidneys arrive in accordance with a Poisson process
having rate λ. It has been decided that the first kidney will go to A
(or to B if B is alive and A is not at that time) and the next one to B
(if still living).
(a) What is the probability that A obtains a new kidney?
(b) What is the probability that B obtains a new kidney?
Soln:
(a) λ/(λ + µA )
(b) λ(µA + λ) / ((µB + λ)(λ + µA + µB))

M Cevik MIE1605 - Probability Review 37 / 96


Probability Basics Random variables

E RLANG RV

 f(x; k, λ) = λ^k x^{k−1} e^{−λx} / (k − 1)!,  0 ≤ x < ∞

 F(x) = 1 − Σ_{n=0}^{k−1} (1/n!) e^{−λx} (λx)^n,  0 ≤ x < ∞

 E[X] = k/λ,  Var(X) = k/λ^2

 If X ∼ Erlang(k, λ) ⇒ aX ∼ Erlang(k, λ/a)
 If X ∼ Erlang(k1, λ) and Y ∼ Erlang(k2, λ) are independent
⇒ X + Y ∼ Erlang(k1 + k2, λ)

M Cevik MIE1605 - Probability Review 38 / 96


Probability Basics Random variables

G AMMA RV

 Gamma(α, β), α > 0 : shape, β > 0 : rate

 f(x; α, β) = (β^α / Γ(α)) x^{α−1} e^{−βx},  0 < x < ∞

 F(x) = (1/Γ(α)) ∫_0^{βx} t^{α−1} e^{−t} dt,  0 < x < ∞

 Gamma Function: Γ (n) = (n − 1)! if n is integer


Γ(α) = ∫_0^∞ t^{α−1} e^{−t} dt

 E[X] = α/β,  Var(X) = α/β^2

M Cevik MIE1605 - Probability Review 39 / 96


Probability Basics Random variables

G AMMA RV

 If Xi ∼ Gamma(αi, β), i = 1, 2, . . . , N, are independent
⇒ Σ_{i=1}^N Xi ∼ Gamma(Σ_{i=1}^N αi, β)

 If X ∼ Gamma(α, β) ⇒ cX ∼ Gamma(α, β/c)

 If X ∼ Gamma(1, β) ⇒ X ∼ Expo(β)

 If α is integer, gamma distribution is equivalent to Erlang


distribution. That is, if X ∼ Γ (α, λ) ⇒ X ∼ Erlang(α, λ)

M Cevik MIE1605 - Probability Review 40 / 96


Probability Basics Random variables

B ETA RV

 Beta(α, β), α > 0 : shape, β > 0 : shape

 f(x; α, β) = x^{α−1} (1 − x)^{β−1} / B(α, β),  0 ≤ x ≤ 1

 F(x) = B(x; α, β) / B(α, β),  0 ≤ x ≤ 1

 Beta functions: B(a, b) = ∫_0^1 t^{a−1} (1 − t)^{b−1} dt,  B(x; a, b) = ∫_0^x t^{a−1} (1 − t)^{b−1} dt

 E[X] = α/(α + β),  Var(X) = αβ / ((α + β)^2 (α + β + 1))

 If X ∼ Gamma(α, θ) and Y ∼ Gamma(β, θ) are independent
⇒ X/(X + Y) has a beta distribution with params. α and β.

 Beta(1, 1) = Unif (0, 1)

M Cevik MIE1605 - Probability Review 41 / 96


Probability Basics Random variables

E XAMPLE : POSTERIOR DISTRIBUTION

 Posterior probability is the probability of the parameters θ given


the evidence X.
p(θ|X) = p(θ) p(X|θ) / p(X)

p(θ|X): posterior probability
p(X|θ): likelihood function
p(θ): prior probability
p(X) = ∫ p(X|θ) p(θ) dθ: normalization constant

M Cevik MIE1605 - Probability Review 42 / 96


Probability Basics Random variables

E XAMPLE : POSTERIOR DISTRIBUTION

 Ex: We have the following prior distribution: P(λ = 0.01) =


0.3, P(λ = 0.03) = 0.2, P(λ = 0.05) = 0.4, P(λ = 0.1) = 0.1.
Assume we observed 3 failures in 100 time units. Determine the
posterior distribution of λ given the evidence using a Poisson
likelihood function.
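
A minimal Python sketch of the required calculation (posterior ∝ prior × Poisson likelihood with mean 100λ):

    from math import exp, factorial

    prior = {0.01: 0.3, 0.03: 0.2, 0.05: 0.4, 0.10: 0.1}
    k, t = 3, 100                                   # 3 failures in 100 time units

    def likelihood(lam):
        m = lam * t                                 # expected number of failures
        return exp(-m) * m**k / factorial(k)

    unnorm = {lam: p * likelihood(lam) for lam, p in prior.items()}
    z = sum(unnorm.values())
    posterior = {lam: v / z for lam, v in unnorm.items()}
    print(posterior)                                # mass concentrates on 0.03 and 0.05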

M Cevik MIE1605 - Probability Review 43 / 96


Probability Basics Random variables

E XAMPLE : POSTERIOR DISTRIBUTION

 Ex: Let P have a beta distribution with parameters a and b,


f(p) = (Γ(a + b) / (Γ(a)Γ(b))) p^{a−1} (1 − p)^{b−1},  0 < p < 1. If we observe evidence of k
failures in n trials, show that the posterior distribution is
P ∼ Beta(a + k, b + n − k).

M Cevik MIE1605 - Probability Review 44 / 96


Probability Basics Random variables

U NIFORM RV ( CONTINUOUS )

 U(a, b), −∞ < a < b < ∞

 pdf:
f(x) = 1/(b − a),  if x ∈ [a, b]
       0,          otw

 cdf:
F(x) = 0,                if x < a
       (x − a)/(b − a),  if x ∈ [a, b]
       1,                if x ≥ b

 E[X] = (a + b)/2,  Var(X) = (b − a)^2/12

 If X ∼ U(0, 1), then Y = X^n ∼ Beta(1/n, 1)

M Cevik MIE1605 - Probability Review 45 / 96


Probability Basics Random variables

M EMORYLESS P ROPERTY

P(X > m + n | X ≥ n) = P(X > m + n, X ≥ n) / P(X ≥ n)
                     = P(X > m + n) / P(X ≥ n) = P(X > m)

 Discrete Memorylessness: Geometric distribution.

 Continuous Memorylessness: Exponential distribution.

M Cevik MIE1605 - Probability Review 46 / 96


Probability Basics Sum of Independent RVs

S UM OF I NDEPENDENT D ISCRETE RV S

 X1 ⊥ X2: we are interested in the distribution of X1 + X2 or, in general, if X1, . . . , Xn are
independent, what is the distribution of Σ_{i=1}^n Xi?

 Let X and Y be non-negative integer valued RVs, X ⊥ Y.


◦ P(X = i) = ai , P(Y = i) = bi , i = 0, 1, 2, . . .
◦ {X + Y = n} = ∪_{i=0}^n {X = i, Y = n − i}
◦ For i ≠ j, {X = i, Y = n − i} and {X = j, Y = n − j} are mutually exclusive
(or disjoint), they cannot happen at the same time.
– P(X + Y = n) = Σ_{i=0}^n P(X = i, Y = n − i)
⇒ Cn = P(X + Y = n) = Σ_{i=0}^n a_i b_{n−i}
– P(X + Y + Z = k) = Σ_{i=0}^k P(X + Y = i) P(Z = k − i),
   where P(X + Y = i) = c_i, P(Z = k − i) = d_{k−i}

M Cevik MIE1605 - Probability Review 47 / 96


Probability Basics Sum of Independent RVs

S UM OF I NDEPENDENT D ISCRETE RV S

 Result: Let Z = X + Y where X, Y are nonnegative RVs (X, Y need


not be independent).
⇒ fZ(z) = Σ_x fX(x) fY|X(z − x|x) = Σ_y fY(y) fX|Y(z − y|y)

M Cevik MIE1605 - Probability Review 48 / 96


Probability Basics Sum of Independent RVs

S UM OF I NDEPENDENT D ISCRETE RV S

 Ex: Let X ∼ Poisson(λ), Y ∼ Poisson(µ). If X and Y are indep.,


what’s the distribution of X + Y?

Soln:
X + Y ∼ Poisson(λ + µ)

M Cevik MIE1605 - Probability Review 49 / 96


Probability Basics Sum of Independent RVs

S UM OF I NDEPENDENT D ISCRETE RV S

 Ex: If X ∼ Binom(n, p) and Y ∼ Binom(m, p) are independent,
then what is the distribution of X + Y?

Soln:
X + Y ∼ Binom(n + m, p)

M Cevik MIE1605 - Probability Review 50 / 96


Probability Basics Sum of Independent RVs

S UM OF I NDEPENDENT C ONTINUOUS RV S
 P(X + Y ≤ a) = ∫_0^∞ P(X ≤ a − y) fY(y) dy : CDF of X + Y

 fX+Y(a) = (d/da) FX+Y(a) = ∫_0^∞ fX(a − y) fY(y) dy : pdf of X + Y

M Cevik MIE1605 - Probability Review 51 / 96


Probability Basics Sum of Independent RVs

S UM OF I NDEPENDENT C ONTINUOUS RV S

 Ex: Let Z = U1 + U2 (i.e., sum of two standard uniform RVs). Find


fZ (z).

Soln:

fZ(z) = z,      if 0 ≤ z ≤ 1
        2 − z,  if 1 < z < 2
        0,      otw

⇒ pdf of a triangular RV
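
A quick empirical check of this density (the reference values below follow from integrating the pdf):

    import random

    n = 200_000
    zs = [random.random() + random.random() for _ in range(n)]
    # From the triangular pdf above: F(1) = 0.5 and F(0.5) = 0.5**2 / 2 = 0.125
    print(sum(z <= 1.0 for z in zs) / n, sum(z <= 0.5 for z in zs) / n)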

M Cevik MIE1605 - Probability Review 52 / 96


Probability Basics Functions of RVs

F UNCTIONS OF R ANDOM VARIABLES

 Question: If X1 and X2 have joint density function f , and g, h are


functions mapping R2 to R, then what is the joint density function of
the pair Y1 = g(X1 , X2 ), Y2 = h(X1 , X2 )?

 Theorem: (Method of direct transformation) Let X be a continuous


RV with pdf fX and support I, where I = [a, b]. Let g : I → R be a
continuous monotonic function with inverse function h : J → I,
where J = g(I). Let Y = g(X). Then the pdf fY of Y satisfies
fY(y) = fX(h(y)) · |h′(y)|,  if y ∈ J
        0,                   otw.

M Cevik MIE1605 - Probability Review 53 / 96


Probability Basics Functions of RVs

F UNCTIONS OF R ANDOM VARIABLES

 Ex: Suppose X has the density


fX(x) = θ / x^{θ+1},  x > 1, θ > 0.
Find the density of Y = ln(X).
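
Applying the theorem with h(y) = e^y gives fY(y) = fX(e^y) e^y = θ e^{−θy}, y > 0, i.e. Y ∼ Expo(θ). A simulation sketch (θ chosen only for illustration; X is sampled by inverting its cdf):

    import random
    from math import log

    theta, n = 2.0, 200_000
    # inverse-cdf sampling: X = V**(-1/theta) with V ~ Unif(0, 1]
    xs = [(1.0 - random.random()) ** (-1.0 / theta) for _ in range(n)]
    ys = [log(x) for x in xs]
    print(sum(ys) / n, 1 / theta)     # sample mean of Y vs. the Expo(theta) mean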

M Cevik MIE1605 - Probability Review 54 / 96


Probability Basics Functions of RVs

F UNCTIONS OF R ANDOM VARIABLES

 Corollary (Two RV case): Suppose X1 , X2 have joint pdf


fX1 ,X2 (x1 , x2 ) with support A = {(x1 , x2 ) : f (x1 , x2 ) > 0}. We are
interested in RVs Y1 = g1 (X1 , X2 ) and Y2 = g2 (X1 , X2 ). The
transformation y1 = g1 (x1 , x2 ), y2 = g2 (x1 , x2 ) is a one-to-one
transformation of A onto B. The inverse transformation is
x1 = g1^{−1}(y1, y2), x2 = g2^{−1}(y1, y2).

The determinant of the Jacobian of this inverse transformation is determined as:

J = | ∂x1/∂y1  ∂x2/∂y1 |
    | ∂x1/∂y2  ∂x2/∂y2 | = (∂x1/∂y1)(∂x2/∂y2) − (∂x2/∂y1)(∂x1/∂y2)

The joint pdf of Y1 = g1 (X1 , X2 ) and Y2 = g2 (X1 , X2 ) is


fY1,Y2(y1, y2) = |J| fX1,X2(g1^{−1}(y1, y2), g2^{−1}(y1, y2)).
If (y1 , y2 ) is not in the range of g, fY1 ,Y2 (y1 , y2 ) = 0.
M Cevik MIE1605 - Probability Review 55 / 96
Probability Basics Functions of RVs

F UNCTIONS OF R ANDOM VARIABLES

 Ex: If X and Y have joint density function f, find the density function
of U = XY.

M Cevik MIE1605 - Probability Review 56 / 96


Probability Basics Functions of RVs

F UNCTIONS OF R ANDOM VARIABLES

 Ex: Let X1 , X2 be indep. expo. RVs, with param λ. Find joint


density of Y1 = X1 + X2 , Y2 = X1 /X2 . Show they are indep.

M Cevik MIE1605 - Probability Review 57 / 96


Limit theorems

M ODES OF CONVERGENCE

 Definition: A sequence of real numbers {α1 , α2 , . . .} is said to


converge to a real number α if, for any ε > 0 there exists an
integer N such that for all n > N
|αn − α| < ε
⇒ We express the convergence as αn → α as n → ∞ or as
limn→∞ αn = α.

 Ex: Let αn = 1 − 1/n. For any ε > 0 there exists an integer N such
that for all n > N: |αn − 1| < ε, so lim_{n→∞} αn = 1. (E.g., consider
N = ⌈1/ε⌉)

M Cevik MIE1605 - Probability Review 58 / 96


Limit theorems

C ONVERGENCE IN P ROBABILITY

 Definition: A sequence of RVs {X1 , X2 , . . .} is said to converge in


probability (or weakly) to a real number µ if for any ε > 0 and γ > 0
there exists an integer N such that for all n > N:
P(|Xn − µ| < ε) > 1 − γ.
⇒ We express the convergence as Xn →_p µ or Xn − µ →_p 0 as
n → ∞.

 Alternative way to express the same


Definition: A sequence of RVs {X1, X2, . . .} is said to converge in
probability (or weakly) to a real number µ if for any ε > 0:
lim_{n→∞} P(|Xn − µ| < ε) = 1

M Cevik MIE1605 - Probability Review 59 / 96


Limit theorems

A LMOST S URE C ONVERGENCE


 Definition: A sequence of RVs {X1 , X2 , . . .} is said to converge
almost surely (or strongly) to a real number µ if for any ε > 0:
lim_{N→∞} P(sup_{n>N} |Xn − µ| < ε) = 1.
⇒ We express the convergence as Xn − µ →_a.s. 0 as n → ∞.

 Alternative way to express the same


Definition: A sequence of RVs {X1 , X2 , . . .} is said to converge
almost surely (or strongly) to a real number µ if
P(lim_{n→∞} Xn = µ) = 1.

 Alternative terminology:
Xn → X almost everywhere: Xn →_a.e. X
Xn → X with probability 1: Xn →_w.p.1 X
M Cevik MIE1605 - Probability Review 60 / 96
Limit theorems

C ONVERGENCE IN D ISTRIBUTION

 Definition: Consider a sequence of RVs X1 , X2 , . . . and a


corresponding sequence of cdfs, FX1 , FX2 , . . . so that for
n = 1, 2, . . ., FXn(x) = P(Xn ≤ x). Suppose that there exists a cdf FX
such that for all x at which FX is continuous,
lim_{n→∞} FXn(x) = FX(x).
Then {Xn} converges in distribution to a RV X with cdf FX,
denoted
Xn →_d X
and FX is the limiting distribution.

M Cevik MIE1605 - Probability Review 61 / 96


Limit theorems

C ONVERGENCE IN rth MEAN

 Definition: The sequence of RVs X1 , . . . , Xn converges in rth


mean to a RV X (or a real number µ), denoted Xn →_r X, if
lim_{n→∞} E[|Xn − X|^r] = 0.
 If lim_{n→∞} E[(Xn − X)^2] = 0, then we write Xn →_{r=2} X.
That is, {Xn } converges to X in mean-square or in quadratic mean.

 Theorem: For r1 > r2 ≥ 1


Xn →_{r=r1} X ⇒ Xn →_{r=r2} X.

M Cevik MIE1605 - Probability Review 62 / 96


Limit theorems

R ELATING THE MODES OF CONVERGENCE

 Theorem: For a sequence of RVs X1 , . . . , Xn , following


relationships hold:
Xn →_a.s. X ⇒ Xn →_p X;   Xn →_r X (r ≥ 1) ⇒ Xn →_p X;   Xn →_p X ⇒ Xn →_d X

No other relationships hold in general.

M Cevik MIE1605 - Probability Review 63 / 96


Limit theorems

M ARKOV ’ S INEQUALITY

 If X ≥ 0, then for any t > 0
E[X] = ∫_R x f(x) dx
     = ∫_{x≥t} x f(x) dx + ∫_{0≤x<t} x f(x) dx
     ≥ ∫_{x≥t} x f(x) dx
     ≥ ∫_{x≥t} t f(x) dx = t P(X ≥ t)

⇒ If t > 0, P(X ≥ t) ≤ E[X]/t

 Scaling Markov's inequality: For t > 0
P(X ≥ tE[X]) ≤ (tE[X])^{−1} E[X] = 1/t
M Cevik MIE1605 - Probability Review 64 / 96
Limit theorems

C HEBYSHEV ’ S INEQUALITY

 Apply Markov's inequality to the RV Y = (X − E[X])^2 ≥ 0
P((X − E[X])^2 ≥ ε^2) ≤ Var(X)/ε^2
Remember, Var(X) = E[(X − E[X])^2]

 We can write this as
P(|X − E[X]| ≥ ε) ≤ Var(X)/ε^2

These just require mean and variance, with no other assumptions.

 One-sided Chebyshev’s inequality:


P(X ≥ E[X] + t) ≤ σ^2/(σ^2 + t^2)
M Cevik MIE1605 - Probability Review 65 / 96
Limit theorems

C HEBYSHEV ’ S INEQUALITY

 Ex: Let X be an arbitrary RV with unknown distribution but with


known range, e.g., 10 ≤ X ≤ 30. For random samples of size
1000, give a lower bound for P(|X̄ − E[X]| ≤ 1).
Soln: P(|X̄ − E[X]| ≤ 1) ≥ 0.9
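
The 0.9 follows from bounding the variance of a RV confined to [10, 30]; a sketch of the arithmetic (using Var(X) ≤ (b − a)^2/4 for a RV bounded in [a, b]):

    a, b, n, eps = 10, 30, 1000, 1
    var_bound = (b - a) ** 2 / 4          # Var(X) <= 100
    var_mean = var_bound / n              # Var of the sample mean <= 0.1
    print(1 - var_mean / eps**2)          # Chebyshev lower bound: 0.9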

M Cevik MIE1605 - Probability Review 66 / 96


Limit theorems

C HEBYSHEV ’ S INEQUALITY

 Ex: Suppose that X is a RV with mean 10 and variance 15. What


can we say about P(5 < X < 15)?
Soln: P(5 < X < 15) ≥ 2/5

M Cevik MIE1605 - Probability Review 67 / 96


Limit theorems

R EVIEW OF IMPORTANT THEOREMS (1)

Strong law of large numbers (SLLN)


Let X1 , X2 , X3 , ... be a sequence of independent and identically
distributed (i.i.d) RV’s, with E[Xi ] = µ. Then, with probability 1,
X̄n := (1/n) Σ_{i=1}^n Xi → µ as n → ∞

 The LLN is the basis for estimation via simulation
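
For instance, a Monte Carlo estimate of E[X] for X ∼ Unif(0, 1) stabilizes around 0.5 as n grows; a minimal sketch:

    import random

    for n in (10, 1_000, 100_000):
        xs = [random.random() for _ in range(n)]
        print(n, sum(xs) / n)     # sample mean approaches E[X] = 0.5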

M Cevik MIE1605 - Probability Review 68 / 96


Limit theorems

R EVIEW OF IMPORTANT THEOREMS (2)

Central Limit Theorem (CLT)


Let X1 , X2 , X3 , ... be a sequence of i.i.d RVs, with mean µ and
variance σ 2 . Then,
Zn := (X̄n − µ) / (σ/√n) →_d N(0, 1) as n → ∞
That is,
P(Zn ≤ z) → Φ(z) as n → ∞, ∀z

 Remarks:
◦ If n is large, then X̄n ≈ Nor(µ, σ 2 /n)
◦ Xi ’s need not be normally distributed
◦ Usually n ≥ 30 for better approximations (fewer observations
needed when Xi ’s are from symmetric distribution)
M Cevik MIE1605 - Probability Review 69 / 96
Limit theorems

L IMIT THEOREMS

 Ex: Use Chebyshev's inequality to prove the weak law of large
numbers. Namely, if X1, X2, . . . are iid with mean µ and variance σ^2,
then, for any ε > 0,

P( | (X1 + X2 + . . . + Xn)/n − µ | > ε ) → 0 as n → ∞

M Cevik MIE1605 - Probability Review 70 / 96


Limit theorems

L IMIT THEOREMS

 Ex: Let Xi, i = 1, 2, . . . , 10 be independent RVs, each being
uniformly distributed over (0, 1). Estimate P(Σ_{i=1}^{10} Xi > 7).

Soln: P(Σ_{i=1}^{10} Xi > 7) ≈ 1 − Φ(2.2) = 0.0139

M Cevik MIE1605 - Probability Review 71 / 96


Limit theorems

L IMIT THEOREMS

 Ex: (Normal approximation to the binomial) The Blue Jays play


100 independent baseball games, each of which they have
probability 0.8 of winning. What’s the probability that they win at
least 90?
Soln: P(Y ≥ 90) ≈ 1 − Φ((89.5 − 80)/4) = 1 − Φ(2.375) ≈ 0.0088 (with a continuity correction).
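
The approximation can be compared with the exact binomial tail; a minimal sketch (the continuity-corrected z-score is an assumption about the intended calculation):

    from math import comb, erf, sqrt

    n, p, k = 100, 0.8, 90
    exact = sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

    def phi(z):                                   # standard normal cdf
        return 0.5 * (1 + erf(z / sqrt(2)))

    mu, sigma = n * p, sqrt(n * p * (1 - p))
    approx = 1 - phi((k - 0.5 - mu) / sigma)      # with continuity correction
    print(exact, approx)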

M Cevik MIE1605 - Probability Review 72 / 96


Generating Functions

G ENERATING F UNCTIONS
 Fourier Transform: E[e^{iXt}] = ∫_{x∈R} e^{ixt} fX(x) dx

 Laplace Transform: E[e^{−Xt}] = ∫_{x∈R} e^{−xt} fX(x) dx

 Moment Generating Functions: E[e^{Xt}] = ∫_{x∈R} e^{xt} fX(x) dx

 Probability Generating Functions: E[s^X] = ∫_{x∈R} s^x fX(x) dx

M Cevik MIE1605 - Probability Review 73 / 96


Generating Functions Moment Generating Functions

M OMENT G ENERATING F UNCTIONS


 E[e^{Xt}] = ∫ e^{xt} fX(x) dx = φX(t)

 φX(0) = 1

 ∂φX(t)/∂t = E[(∂/∂t) e^{Xt}] = E[X e^{Xt}]
⇒ φ′X(0) = E[X], φ″X(0) = E[X^2], etc.

 Sum of RVs:
φ_{Σi Xi}(t) = E[e^{(Σi Xi) t}] = E[Πi e^{Xi t}].
If the Xi's are independent ⇒ E[Πi e^{Xi t}] = Πi E[e^{Xi t}] = Πi φXi(t)

M Cevik MIE1605 - Probability Review 74 / 96


Generating Functions Moment Generating Functions

M OMENT G ENERATING F UNCTIONS

 Bernoulli Distribution:
φX(t) = E[e^{Xt}] = pe^t + (1 − p)

 Binomial Distribution:
φX(t) = E[e^{Xt}] = E[e^{(Σi Xi) t}] = (E[e^{Xi t}])^n
      = (pe^t + (1 − p))^n → since the Xi's are iid Bernoulli RVs.
Note that φX(t) gives a hint about the distribution of a RV. If we
recognize something like (pe^t + (1 − p))^n, then we can say that it's
a Binomial(n, p) RV.

M Cevik MIE1605 - Probability Review 75 / 96


Generating Functions Moment Generating Functions

M OMENT G ENERATING F UNCTIONS

 Geometric Distribution (trials):



φX(t) = E[e^{Xt}] = Σ_{x=1}^∞ e^{xt} p(1 − p)^{x−1} = pe^t / (1 − e^t(1 − p))

 Negative Binomial Distribution: # of trials until rth success.

φNr(t) = [φN(t)]^r = ( pe^t / (1 − e^t(1 − p)) )^r

 Exponential Distribution:
φX(t) = E[e^{Xt}] = λ/(λ − t)

M Cevik MIE1605 - Probability Review 76 / 96


Generating Functions Probability Generating Functions

P ROBABILITY G ENERATING F UNCTIONS



 E[s^X] = Σ_{k=0}^∞ s^k P(X = k) = P(s)

 P(1) = 1 → quick way of verifying that P(s) is a g.f.

 Ex: Poisson RV
P(s) = Σ_{k=0}^∞ s^k e^{−λ} λ^k / k! = e^{−λ} Σ_{k=0}^∞ (λs)^k / k! = e^{λ(s−1)}

 Ex: Geometric RV
P(s) = Σ_{k=0}^∞ s^k (1 − p)^k p = p / (1 − s(1 − p)),  (s < 1/(1 − p))

M Cevik MIE1605 - Probability Review 77 / 96


Generating Functions Probability Generating Functions

P ROBABILITY G ENERATING F UNCTIONS

 Let P(s) be the g.f. of a mystery RV X with an unknown distribution


P(0) = 0^0 p0 + 0^1 p1 + 0^2 p2 + . . . = p0

P′(s) = Σ_{k=0}^∞ k s^{k−1} pk = Σ_{k=1}^∞ k s^{k−1} pk ⇒ P′(0) = p1

P″(s) = Σ_{k=2}^∞ k(k − 1) s^{k−2} pk ⇒ P″(0) = 2p2

P^(n)(s) = Σ_{k=n}^∞ k(k − 1) . . . (k − n + 1) s^{k−n} pk ⇒ P^(n)(0) = n! pn

Then, pk = P^(k)(0) / k!,  k = 0, 1, 2, . . .

M Cevik MIE1605 - Probability Review 78 / 96


Generating Functions Probability Generating Functions

P ROBABILITY G ENERATING F UNCTIONS

 Let P(s) be the g.f. of a mystery RV X with an unknown distribution


P′(s) = Σ_{k=1}^∞ k s^{k−1} pk ⇒ P′(1) = Σ_{k=0}^∞ k pk = E[X]

P″(s) = Σ_{k=2}^∞ k(k − 1) s^{k−2} pk
⇒ P″(1) = Σ_{k=0}^∞ k(k − 1) pk = E[X(X − 1)] = E[X^2] − E[X]

P^(n)(1) = E[ X(X − 1) . . . (X − n + 1) ]
Var(X) = E[X^2] − (E[X])^2 = P″(1) + P′(1) − (P′(1))^2

M Cevik MIE1605 - Probability Review 79 / 96


Generating Functions Probability Generating Functions

P ROBABILITY G ENERATING F UNCTIONS

 Ex:
◦ P(s) = p/(1 − qs)
◦ P′(s) = qp/(1 − qs)^2
⇒ P′(0) = (1 − p)p = P(X = 1)
⇒ P′(1) = q/p = E[X]
⇒ P″(s) = (2(1 − qs)q^2 p)/(1 − qs)^4 ⇒ P″(0) = 2q^2 p
Then, P(X = 2) = (1/2!) P″(0) = (1 − p)^2 p

⇒ Observation: There is one-to-one correspondence between the


g.f. of a RV and its probability distribution. We can recover a
probability distribution and all moments of a RV from g.f.

M Cevik MIE1605 - Probability Review 80 / 96


Generating Functions Probability Generating Functions

PGF FOR THE S UMS

 Let X1 , X2 , . . . , Xn be independent non-negative integer valued RVs


where Xi has g.f. PXi (s). We are interested in g.f. of
X1 + X2 + . . . + Xn
P_{X1+X2+...+Xn}(s) = E[s^{X1+X2+...+Xn}]
                    = E[s^{X1} s^{X2} . . . s^{Xn}] = E[s^{X1}] E[s^{X2}] . . . E[s^{Xn}]
                    = Π_{i=1}^n PXi(s)

M Cevik MIE1605 - Probability Review 81 / 96


Generating Functions Probability Generating Functions

PGF FOR THE S UMS

 Ex: X1 ∼ Poisson(λ1 ), X2 ∼ Poisson(λ2 ), X1 ⊥ X2 .


P_{X1+X2}(s) = e^{−λ1(1−s)} e^{−λ2(1−s)} = e^{−(λ1+λ2)(1−s)}
⇒ X1 + X2 ∼ Poisson(λ1 + λ2)

 Ex: Let X1, X2, . . . , Xn be iid Bernoulli RVs with
P(Xi = 1) = p = 1 − P(Xi = 0).
◦ PXi(s) = s^0 q + s^1 p = q + sp
◦ P_{X1+X2+...+Xn}(s) = Π_{i=1}^n (q + sp) = (q + sp)^n
◦ X1 + X2 + . . . + Xn ∼ Binom(n, p)
 Note that if X1, X2, . . . , Xn are iid RVs, then
PXi(s) = P(s)
P_{X1+X2+...+Xn}(s) = (P(s))^n
M Cevik MIE1605 - Probability Review 82 / 96
Random Sums

R ANDOM S UMS
 Let X1 , X2 , X3 , . . . be iid (non-negative integer valued) RVs with
P(Xi = k) = pk , k = 0, 1, 2...
Let N be a non-negative integer valued RV which is independent
of {X1 , X2 , . . .} where P(N = k) = αk , k ≥ 0. Define
◦ S0 = 0
◦ Sn = Σ_{i=1}^n Xi,  n = 1, 2, . . .

⇒ A random sum is SN = Σ_{i=1}^N Xi.

 P(SN = j) = Σ_{k=0}^∞ P(SN = j, N = k) = Σ_{k=0}^∞ P(SN = j|N = k) P(N = k)
           = Σ_{k=0}^∞ P(Sk = j) P(N = k)
 Ex: Consider people who are coming to a mall.
N : # of people coming. Xi : money spent by ith customer.
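
A simulation sketch of this setup with illustrative choices (N ∼ Poisson(20) customers, spending Xi ∼ Expo with mean 50), checking E[SN] = E[N]E[X1] from the next slide:

    import random

    def poisson(lam):
        # count arrivals of a rate-1 Poisson process on [0, lam]
        t, k = 0.0, 0
        while True:
            t += random.expovariate(1.0)
            if t > lam:
                return k
            k += 1

    trials, total = 20_000, 0.0
    for _ in range(trials):
        n = poisson(20)                                            # number of customers
        total += sum(random.expovariate(1 / 50) for _ in range(n)) # total spending
    print(total / trials)                                          # close to 20 * 50 = 1000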
M Cevik MIE1605 - Probability Review 83 / 96
Random Sums

R ANDOM S UMS

 E[SN] = E[N]E[X1] ≠ Σ_{i=1}^{E[N]} E[Xi] → E[N] may not be an integer!

 Computing E[SN ] by conditioning on N:



◦ E[SN] = EN[ E_{SN}[SN | N] ] = Σ_{k=0}^∞ E[SN | N = k] P(N = k)
        = Σ_{k=0}^∞ E[Σ_{i=1}^k Xi] αk = Σ_{k=0}^∞ k E[X1] αk = E[X1]E[N]

M Cevik MIE1605 - Probability Review 84 / 96


Random Sums

PGF FOR R ANDOM S UMS



 P_{SN}(s) = Σ_{j=0}^∞ s^j P(SN = j) = Σ_{j=0}^∞ s^j Σ_{k=0}^∞ P(Sk = j) αk
           = Σ_{k=0}^∞ αk Σ_{j=0}^∞ s^j P(Sk = j) = Σ_{k=0}^∞ αk (PX1(s))^k

⇒ PSN (s) = PN (PX1 (s))

→ Sk = X1 + X2 + . . . + Xk , k is known (g.f. for the sums)

M Cevik MIE1605 - Probability Review 85 / 96


Random Sums

PGF FOR R ANDOM S UMS


 Computing E[SN ] by using g.f.
P_{SN}(s) = PN(PX1(s)) ⇒ ∂P_{SN}(s)/∂s |_{s=1} = E[SN]
∂P_{SN}(s)/∂s = P′X1(s) · P′N(PX1(s)).
P′X1(s)|_{s=1} = E[X1],
PX1(1) = 1 ⇒ P′N(PX1(1)) = P′N(1) = E[N]
⇒ E[SN] = E[X1]E[N]

 Ex: Let P(X1 = 1) = p = 1 − P(X1 = 0) and N ∼ Poisson(λ).


Calculate PSN (s).
◦ PSN (s) = PN (PX1 (s)) = PN (ps + q)
= e−λ(1−(ps+q)) = e−λp(1−s)
⇒ SN ∼ Poisson(λp)
M Cevik MIE1605 - Probability Review 86 / 96
Simple Branching Process

S IMPLE B RANCHING P ROCESS

 We have a pmf {pk} on the non-negative integers and a “progenitor”
who forms generation zero. The progenitor splits into k offspring
with probability pk, and the offspring constitute the first generation.

 Each member of the first generation splits into a random
number of offspring, again with the same mass function {pk}.
 This process continues until extinction, if it occurs.
 An example could be family generations or, in queueing, a branch
may be the number of arrivals while the 1st job is being processed.
M Cevik MIE1605 - Probability Review 87 / 96
Simple Branching Process

S IMPLE B RANCHING P ROCESS

 Let {Zn,j , n ≥ 1, j ≥ 1} be iid RVs with pmf {pk , k = 0, 1, 2, . . .}.


Zn,j corresponds to the number of members of the nth generation
that are offspring of the jth member of the (n − 1)st generation.

 Let {Zn , n ≥ 0} be a branching process with


Z0 = 1 → generation zero
Z1 = Z1,1
Z2 = Z2,1 + Z2,2 + . . . + Z2,Z1
Zn = Zn,1 + Zn,2 + . . . + Zn,Z_{n−1} = Σ_{j=1}^{Z_{n−1}} Zn,j

⇒ This is a random sum of RVs

M Cevik MIE1605 - Probability Review 88 / 96


Simple Branching Process

S IMPLE B RANCHING P ROCESS

 Note that Zn−1 and {Zn,j , j = 1, 2, . . .} are independent. We can


use the generating functions to determine pmf of Zn .
P_{Zn}(s) = E[s^{Zn}]
Let P(s) = Σ_{k=0}^∞ s^k pk, and E[s^{Z1}] = P1(s).

P0(s) = s → s^1 (gen. zero)


Pn (s) = Pn−1 (P(s)) → from rand. sum of RVs
P2 (s) = P(P(s))
P3 (s) = P(P2 (s)) = P(P(P(s)))
⇒ Pn (s) = P(Pn−1 (s))

M Cevik MIE1605 - Probability Review 89 / 96


Simple Branching Process

S IMPLE B RANCHING P ROCESS

 Ex: Let P{Z1,1 = 0} = 1 − p = q, P{Z1,1 = 1} = p, 0 < p < 1.

P(s) = P1(s) = q + ps

P2(s) = P(P(s)) = q + p(q + ps) = q + pq + p^2 s

P_{n+1}(s) = q + pq + p^2 q + . . . + p^n q + p^{n+1} s = E[s^{Z_{n+1}}]

P_{n+1}(s) = (q + pq + p^2 q + . . . + p^n q) s^0 + p^{n+1} s^1 = Σ_{k=0}^∞ s^k pk

P(Z_{n+1} = 0) = q + pq + p^2 q + . . . + p^n q

P(Z_{n+1} = 1) = p^{n+1}

P(Z_{n+1} = 2) = 0

lim_{n→∞} P(Z_{n+1} = 0) = q Σ_{i=0}^∞ p^i = q/(1 − p) = 1 ⇒ this family will go extinct!

M Cevik MIE1605 - Probability Review 90 / 96


Simple Branching Process

S IMPLE B RANCHING P ROCESS

 If E[Z1,1 ] = p < 1, there will certainly be extinction.


⇒ It’s the measure of extinction.

 Define mn = E[Zn]. Let m1 = m = E[Z1] = Σ_{k=0}^∞ k pk

P2′(s) = P′(P(s)) P′(s) → chain rule for derivatives
P2′(1) = P′(1) P′(1) = m^2
P3′(s) = P′(P2(s)) P2′(s) ⇒ P3′(1) = m^3
⇒ Pn′(1) = mn = m^n

M Cevik MIE1605 - Probability Review 91 / 96


Simple Branching Process

S IMPLE B RANCHING P ROCESS


 Our purpose is to be able to characterize the probability of
extinction of a branching process.
πn = P(Zn = 0) = Pn (0)
π = P{extinction}
πn = P{the extinction occurs on generation n or before}
{extinction} = ∪_{n=1}^∞ {Zn = 0}, and {Zn = 0} ⊂ {Z_{n+1} = 0}

⇒ All the outcomes that lead to {Zn = 0} imply that {Z_{n+1} = 0}

π = P{∪_{k=1}^∞ {Zk = 0}} = P( lim_{n→∞} ∪_{k=1}^n {Zk = 0} )
  = lim_{n→∞} P(Zn = 0) = lim_{n→∞} πn
Note that there are trivial cases such as:
 if p0 = 0 ⇒ π = 0  if p0 = 1 ⇒ π = 1
M Cevik MIE1605 - Probability Review 92 / 96
Simple Branching Process

S IMPLE B RANCHING P ROCESS

Theorem
Suppose 0 < p0 < 1.
◦ If m = E[Z1] ≤ 1, then π = 1.
◦ If m > 1, then π < 1 is the smallest non-negative solution of
s = P(s).

M Cevik MIE1605 - Probability Review 93 / 96


Simple Branching Process

S IMPLE B RANCHING P ROCESS


 Ex: Consider an operator of a sales booth at a computer show
that takes orders. Each order takes three minutes to fill. While
each order is being filled, there is probability pj that j more
customers will arrive and join the line. Assume
p0 = 0.2, p1 = 0.2, p2 = 0.6. The operator cannot take a break
until a service is completed and no one is waiting in line to order.
If present conditions persist, what is the probability that the
operator will ever take a break?
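
Assuming the intended offspring pmf is p0 = 0.2, p1 = 0.2, p2 = 0.6 (so m = 1.4 > 1), the break probability is the smallest non-negative root of s = P(s); a minimal fixed-point sketch:

    # extinction ("break") probability: iterate s <- P(s) starting from 0
    p = {0: 0.2, 1: 0.2, 2: 0.6}

    def P(s):
        return sum(pk * s**k for k, pk in p.items())

    s = 0.0
    for _ in range(200):
        s = P(s)
    print(s)          # about 1/3 (the other root of s = P(s) is 1)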

M Cevik MIE1605 - Probability Review 94 / 96


Simple Random Walk

S IMPLE R ANDOM WALK

 Let {Xn , n ≥ 1} be iid RVs with possible values {−1, 1} where


P(X1 = 1) = 1 − P(X1 = −1) = p.

 The Random Walk Process:


{Sn , n ≥ 0} with S0 = 0.
Sn = X1 + X2 + . . . + Xn = Sn−1 + Xn .

 Define N = min{n : Sn = 1}, S0 = 0.


→ First passage time or hitting time.
N̂ = min{n : Sn = 0}, S0 = −1
→ Starting point shifted to level −1.
N and N̂ are identically distributed RVs.
Also, N = 1 ⇐⇒ X1 = 1, which happens w.p. p
M Cevik MIE1605 - Probability Review 95 / 96
Simple Random Walk

S IMPLE R ANDOM WALK

φn = P(N = n), n ≥ 0
φ0 = 0
φ1 = p
φn = q Σ_{j=1}^{n−2} φj φ_{n−j−1},  n ≥ 2

Φ(s) = Σ_{n=0}^∞ s^n φn
E[N] =?

M Cevik MIE1605 - Probability Review 96 / 96
