8.3 Maximum Likelihood Estimation
Prof. Tesler, Math 283, April 12, 2011
Estimating parameters
Let $Y$ be a random variable with a distribution of known type but unknown parameter value $\theta$.

Examples: Bernoulli or geometric with unknown $p$; Poisson with unknown mean $\lambda$.

Write the pdf of $Y$ as $P_Y(y;\theta)$ to emphasize that there is a parameter $\theta$. Do $n$ independent trials to get data $y_1, y_2, y_3, \dots, y_n$. The joint pdf is
\[
P_{Y_1,\dots,Y_n}(y_1,\dots,y_n;\theta) = P_Y(y_1;\theta)\cdots P_Y(y_n;\theta).
\]
Goal: use the data to estimate $\theta$.
Likelihood function
Previously, we knew the parameter $\theta$ and regarded the $y$'s as unknowns (occurring with certain probabilities). Define the likelihood of $\theta$ given data $y_1,\dots,y_n$ to be
\[
L(\theta; y_1,\dots,y_n) = P_{Y_1,\dots,Y_n}(y_1,\dots,y_n;\theta) = P_Y(y_1;\theta)\cdots P_Y(y_n;\theta).
\]
It's exactly the same formula as the joint pdf; the difference is the interpretation. Now we consider the data $y_1,\dots,y_n$ to be given and $\theta$ to be an unknown.
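To make the change of interpretation concrete, here is a minimal Python sketch (an added illustration, not from the slides) that fixes a small made-up data set and evaluates the likelihood at several candidate parameter values, anticipating the Poisson example on the next slide. The data values and candidate $\lambda$'s are hypothetical.

```python
# A minimal sketch: the Poisson likelihood L(lam; y_1,...,y_n) for fixed
# data, viewed as a function of the candidate parameter value lam.
import math

def poisson_pdf(y, lam):
    """P(Y = y) for a Poisson random variable with mean lam."""
    return lam**y * math.exp(-lam) / math.factorial(y)

def likelihood(lam, data):
    """L(lam; y_1,...,y_n): the joint pdf with the data held fixed."""
    L = 1.0
    for y in data:
        L *= poisson_pdf(y, lam)
    return L

data = [3, 1, 4, 1, 5]              # hypothetical observed values
for lam in [1.0, 2.0, 2.8, 4.0]:    # hypothetical candidate parameters
    print(f"lam = {lam}: L = {likelihood(lam, data):.6g}")
# The likelihood peaks near the sample mean 2.8, previewing the MLE below.
```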
Example: Poisson distribution

Suppose $Y$ is Poisson with unknown mean $\lambda > 0$. The likelihood of the data $y_1,\dots,y_n$ is
\[
L(\lambda; y_1,\dots,y_n) = \prod_{i=1}^n \frac{\lambda^{y_i} e^{-\lambda}}{y_i!} = \frac{\lambda^{y_1+\cdots+y_n}\, e^{-n\lambda}}{y_1!\cdots y_n!}.
\]
The log likelihood is maximized at the same $\lambda$ and is easier to use:
\[
\ln L(\lambda; y_1,\dots,y_n) = -n\lambda + (y_1+\cdots+y_n)\ln\lambda - \ln(y_1!\cdots y_n!).
\]
Critical point: solve $d(\ln L)/d\lambda = 0$:
\[
\frac{d(\ln L)}{d\lambda} = -n + \frac{y_1+\cdots+y_n}{\lambda} = 0,
\]
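As a numerical cross-check, the following sketch (an added illustration, not part of the slides) minimizes the negative log likelihood with scipy and compares the result to the closed-form answer derived next. The data set is the same hypothetical one as above.

```python
# A minimal sketch: maximize the Poisson log likelihood numerically and
# check it agrees with lam_hat = (y_1 + ... + y_n)/n.
import math
from scipy.optimize import minimize_scalar

data = [3, 1, 4, 1, 5]   # hypothetical observed values

def neg_log_likelihood(lam):
    # ln L = -n*lam + (sum y_i) ln lam - ln(y_1! ... y_n!); negate to minimize.
    n, s = len(data), sum(data)
    const = sum(math.lgamma(y + 1) for y in data)   # ln(y_1! ... y_n!)
    return -(-n * lam + s * math.log(lam) - const)

res = minimize_scalar(neg_log_likelihood, bounds=(1e-9, 100), method="bounded")
print(res.x, sum(data) / len(data))   # both approximately 2.8
```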
so
\[
y_1 + \cdots + y_n = n\lambda
\qquad\text{and}\qquad
\hat\lambda = \frac{y_1+\cdots+y_n}{n} = \bar{y}.
\]
Check that the second derivative is negative:
\[
\frac{d^2(\ln L)}{d\lambda^2} = -\frac{y_1+\cdots+y_n}{\lambda^2} \le 0
\]
since $y_1+\cdots+y_n \ge 0$, and $\frac{d^2(\ln L)}{d\lambda^2} < 0$ if $\lambda > 0$. So it's a max unless $y_1 = \cdots = y_n = 0$.

Boundaries for the range $\lambda \ge 0$: we must check $\lambda \to 0^+$ and $\lambda \to \infty$. Both send $\ln L \to -\infty$, so the $\hat\lambda$ identified above gives the max.
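For readers who want to double-check the calculus, here is a short sympy sketch (an added illustration, not from the slides) that recovers the critical point and the sign of the second derivative symbolically. The symbol `s` stands for $y_1+\cdots+y_n$ and is declared positive, matching the non-degenerate case.

```python
# A minimal sketch: verify the critical point and second derivative of the
# Poisson log likelihood symbolically.
import sympy as sp

lam, n, s = sp.symbols("lam n s", positive=True)   # s = y_1 + ... + y_n
logL = -n * lam + s * sp.log(lam)                  # constant term dropped

crit = sp.solve(sp.diff(logL, lam), lam)
print(crit)                     # [s/n], i.e. lam_hat = (y_1 + ... + y_n)/n
print(sp.diff(logL, lam, 2))    # -s/lam**2, negative for lam > 0: a maximum
```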
Desirable properties of estimators

The estimate $\hat\theta$ should be narrowly distributed around the correct value of $\theta$. Increasing $n$ should improve the estimate. The distribution of $\hat\theta$ should be known. The MLE often has these properties, as the sketch below suggests.
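A quick Monte Carlo sketch (an added illustration; the parameter value, sample sizes, and seed are arbitrary choices) showing the first two properties for the Poisson MLE: its spread around $\lambda$ shrinks as $n$ grows.

```python
# A minimal sketch: the spread of lam_hat shrinks as n grows,
# so larger samples give better estimates.
import numpy as np

rng = np.random.default_rng(2)
lam_true, trials = 2.8, 100_000
for n in [5, 50, 500]:
    mle = rng.poisson(lam_true, size=(trials, n)).mean(axis=1)
    print(n, mle.std())   # standard deviation shrinks roughly like 1/sqrt(n)
```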
Bias
Suppose $Y$ is Poisson with secret parameter $\lambda$. The Poisson MLE from data is
\[
\hat\lambda = \frac{Y_1+\cdots+Y_n}{n}.
\]
If many MLEs are computed from independent data sets, the average tends to
\[
E(\hat\lambda) = E\!\left(\frac{Y_1+\cdots+Y_n}{n}\right) = \frac{E(Y_1)+\cdots+E(Y_n)}{n} = \frac{\lambda+\cdots+\lambda}{n} = \frac{n\lambda}{n} = \lambda.
\]
Since $E(\hat\lambda) = \lambda$, we say $\hat\lambda$ is an unbiased estimator of $\lambda$. If the formula were different such that we had $E(\hat\lambda) \ne \lambda$, we would say $\hat\lambda$ is a biased estimator of $\lambda$. E.g., $\hat\lambda = 2Y_1$ has $E(\hat\lambda) = 2\lambda$, so it's biased (unless $\lambda = 0$).
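The averaging over many data sets can itself be checked by simulation. This sketch (an added illustration; $\lambda$, $n$, and the seed are arbitrary choices) estimates the expected value of both the MLE and the biased estimator $2Y_1$ by Monte Carlo.

```python
# A minimal sketch: estimate E(lam_hat) over many independent data sets to
# check that the MLE is unbiased while 2*Y_1 is not (unless lam = 0).
import numpy as np

rng = np.random.default_rng(0)
lam_true, n, trials = 2.8, 10, 100_000

samples = rng.poisson(lam_true, size=(trials, n))
mle = samples.mean(axis=1)      # lam_hat = (Y_1 + ... + Y_n)/n per data set
biased = 2 * samples[:, 0]      # the estimator 2*Y_1

print(mle.mean())      # approximately lam_true = 2.8 (unbiased)
print(biased.mean())   # approximately 2*lam_true = 5.6 (biased)
```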
Efficiency

The variance of the MLE is
\[
\mathrm{Var}(\hat\lambda) = \mathrm{Var}\!\left(\frac{Y_1+\cdots+Y_n}{n}\right) = \frac{\mathrm{Var}(Y_1)+\cdots+\mathrm{Var}(Y_n)}{n^2} = \frac{n\,\mathrm{Var}(Y_1)}{n^2} = \frac{\mathrm{Var}(Y_1)}{n} = \frac{\lambda}{n}.
\]
Increasing $n$ makes the variance smaller ($\hat\lambda$ is more efficient).

Here's a second estimator: use $Y_1, Y_2$ and discard $Y_3,\dots,Y_n$:
\[
\hat\lambda' = \frac{Y_1+2Y_2}{3},
\qquad
E(\hat\lambda') = \frac{\lambda+2\lambda}{3} = \lambda,
\]
so $\hat\lambda'$ is unbiased. But
\[
\mathrm{Var}(\hat\lambda') = \frac{\mathrm{Var}(Y_1)+4\,\mathrm{Var}(Y_2)}{9} = \frac{\lambda+4\lambda}{9} = \frac{5\lambda}{9},
\]
so it has higher variance (it is less efficient) than the MLE.
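A final simulation sketch (again an added illustration with arbitrary $\lambda$, $n$, and seed) confirms both variance formulas side by side.

```python
# A minimal sketch: compare the Monte Carlo variances of the two unbiased
# estimators with the closed forms lam/n (MLE) and 5*lam/9 ((Y_1+2*Y_2)/3).
import numpy as np

rng = np.random.default_rng(1)
lam_true, n, trials = 2.8, 10, 100_000

samples = rng.poisson(lam_true, size=(trials, n))
mle = samples.mean(axis=1)
alt = (samples[:, 0] + 2 * samples[:, 1]) / 3

print(mle.var(), lam_true / n)        # both approximately 0.28
print(alt.var(), 5 * lam_true / 9)    # both approximately 1.56
```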