Review: maximum likelihood estimation
- Have a model with unknown parameter $\theta$: $X \sim p_X(x; \theta)$
- Pick the $\theta$ that makes the data most likely: $\hat\theta = \arg\max_\theta p_X(x; \theta)$
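As a concrete numerical illustration of the review above (a hypothetical example, not from the slides): for i.i.d. Bernoulli($\theta$) flips with $k$ heads out of $n$, the likelihood $\theta^k (1-\theta)^{n-k}$ is maximized by the observed frequency $k/n$.

```python
# Hypothetical example: maximum likelihood estimation for i.i.d.
# Bernoulli(theta) coin flips.  The likelihood theta^k (1-theta)^(n-k)
# is maximized at theta_hat = k / n, the observed frequency of heads.

def bernoulli_mle(flips):
    """ML estimate of theta from 0/1 data: the sample mean."""
    return sum(flips) / len(flips)

data = [1, 0, 1, 1, 0, 1, 1, 1]    # 6 heads in 8 flips (made-up data)
theta_hat = bernoulli_mle(data)
print(theta_hat)                    # 0.75
```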
Outline
- Review: maximum likelihood estimation; confidence intervals
- Linear regression
- Binary hypothesis testing: types of error; likelihood ratio test (LRT)

Linear regression

Linear regression can be studied in the context of various probabilistic frameworks, which provide perspective and a mechanism for quantitative analysis. We first consider the case of only two variables, and then generalize. We wish to model the relation between two variables of interest, $x$ and $y$ (e.g., years of education and income), based on a collection of data pairs $(x_i, y_i)$, $i = 1, \ldots, n$. For example, $x_i$ could be the years of education and $y_i$ the annual income of the $i$th person in the sample. Often a two-dimensional plot of these samples indicates a systematic, approximately linear relation between $x_i$ and $y_i$. Then, it is natural to attempt to build a linear model of the form

$$y \approx \theta_0 + \theta_1 x,$$

where $\theta_0$ and $\theta_1$ are unknown parameters to be estimated. In particular, given some estimates $\hat\theta_0$ and $\hat\theta_1$ of the resulting parameters, the value $y_i$ corresponding to $x_i$, as predicted by the model, is

$$\hat y_i = \hat\theta_0 + \hat\theta_1 x_i.$$

Generally, $\hat y_i$ will be different from the given value $y_i$, and the corresponding difference

$$\tilde y_i = y_i - \hat y_i$$

is called the $i$th residual. A choice of estimates that results in small residuals is considered to provide a good fit to the data. With this motivation, the linear regression approach chooses the parameter estimates $\hat\theta_0$ and $\hat\theta_1$ that minimize the sum of the squared residuals,

$$\sum_{i=1}^n (y_i - \hat y_i)^2 = \sum_{i=1}^n (y_i - \theta_0 - \theta_1 x_i)^2.$$
Review: Bayesian inference and confidence intervals
- Bayesian posterior: $p_{\Theta \mid X}(\theta \mid x) = \dfrac{p_{X \mid \Theta}(x \mid \theta)\, p_\Theta(\theta)}{p_X(x)}$
- A $1 - \alpha$ confidence interval $[\hat\Theta_n^-, \hat\Theta_n^+]$ satisfies
  $$P(\hat\Theta_n^- \le \theta \le \hat\Theta_n^+) \ge 1 - \alpha$$
- CLT-based interval around the sample-mean estimator $\hat\Theta_n$, with known $\sigma$:
  $$P\left(\hat\Theta_n - \frac{z\sigma}{\sqrt{n}} \le \theta \le \hat\Theta_n + \frac{z\sigma}{\sqrt{n}}\right) \approx 1 - \alpha$$
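A minimal numerical sketch of the CLT-based confidence interval, on made-up data; $z = 1.96$ corresponds to $\alpha = 0.05$:

```python
import math

# Sketch: 95% confidence interval for the mean theta of i.i.d. samples
# with known standard deviation sigma, via the CLT approximation
# [Theta_n - z*sigma/sqrt(n), Theta_n + z*sigma/sqrt(n)].
def confidence_interval(samples, sigma, z=1.96):
    n = len(samples)
    theta_n = sum(samples) / n          # sample-mean estimator
    half = z * sigma / math.sqrt(n)     # half-width of the interval
    return theta_n - half, theta_n + half

# Hypothetical measurements with known sigma = 0.2:
lo, hi = confidence_interval([2.1, 1.9, 2.3, 2.0], sigma=0.2)
print(lo, hi)    # roughly (2.075 - 0.196, 2.075 + 0.196)
```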
Linear regression

Model: $y \approx \theta_0 + \theta_1 x$

Formulation:
$$\min_{\theta_0, \theta_1} \sum_{i=1}^n (y_i - \theta_0 - \theta_1 x_i)^2$$

Figure 9.5: Illustration of a set of data pairs $(x_i, y_i)$, and a linear model $y = \hat\theta_0 + \hat\theta_1 x$, obtained by minimizing the sum of the squares of the residuals $y_i - \hat\theta_0 - \hat\theta_1 x_i$.

Solution:
$$\hat\theta_1 = \frac{\sum_{i=1}^n (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^n (x_i - \bar x)^2}, \qquad \hat\theta_0 = \bar y - \hat\theta_1 \bar x,$$
where
$$\bar x = \frac{x_1 + \cdots + x_n}{n}, \qquad \bar y = \frac{y_1 + \cdots + y_n}{n}.$$
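A minimal Python sketch of the least-squares solution formulas above; the data values are made up for illustration.

```python
# Least-squares estimates for y ~ theta0 + theta1 * x, using
# theta1_hat = sum (xi - x_bar)(yi - y_bar) / sum (xi - x_bar)^2
# theta0_hat = y_bar - theta1_hat * x_bar
def fit_linear(xs, ys):
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    theta1 = num / den
    theta0 = y_bar - theta1 * x_bar
    return theta0, theta1

# Made-up data lying exactly on y = 1 + 2x, so the residuals are zero:
theta0, theta1 = fit_linear([0, 1, 2, 3], [1, 3, 5, 7])
print(theta0, theta1)    # 1.0 2.0
```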
One interpretation: $Y_i = \theta_0 + \theta_1 x_i + W_i$, with $W_i \sim N(0, \sigma^2)$, i.i.d. The likelihood function $f_{X,Y}(x, y; \theta)$ is of the form
$$c \cdot \exp\left\{ -\sum_{i=1}^n \frac{(y_i - \theta_0 - \theta_1 x_i)^2}{2\sigma^2} \right\},$$
so maximizing the likelihood over $\theta$ is the same as minimizing the sum of the squared residuals.
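The equivalence suggested by the likelihood above (maximizing the Gaussian likelihood picks the same $\theta$ as minimizing the sum of squared residuals) can be checked numerically; the data and parameter grid here are made up:

```python
# Made-up data; under i.i.d. N(0, sigma^2) noise, the likelihood is
# c * exp{-SSR(theta) / (2 sigma^2)}, a decreasing function of the
# sum of squared residuals SSR, so both criteria pick the same theta.
xs, ys = [0.0, 1.0, 2.0], [0.1, 2.1, 3.9]
sigma = 0.5

def ssr(t0, t1):
    return sum((y - t0 - t1 * x) ** 2 for x, y in zip(xs, ys))

def log_likelihood(t0, t1):
    # log of the exponential factor; the constant c drops out
    return -ssr(t0, t1) / (2 * sigma ** 2)

grid = [(a / 10, b / 10) for a in range(-10, 11) for b in range(0, 31)]
best_ls = min(grid, key=lambda t: ssr(*t))
best_ml = max(grid, key=lambda t: log_likelihood(*t))
print(best_ls == best_ml)    # True
```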
Interpretation of the form of the solution
- Assume a model $Y = \theta_0 + \theta_1 X + W$, with $W$ independent of $X$ and with zero mean
- Check that $\theta_1 = \dfrac{\mathrm{cov}(X, Y)}{\mathrm{var}(X)}$ and $\theta_0 = \mathbf{E}[Y] - \theta_1 \mathbf{E}[X]$
- The solution formula for $\hat\theta_1$ uses the natural (sample) estimates of the covariance and the variance
The world of regression (ctd.)
- In practice, one also reports:
  - Confidence intervals for the $\theta_i$
  - Standard error (estimate of $\sigma$)
  - $R^2$, a measure of explanatory power
- Some common concerns:
  - Heteroskedasticity
  - Multicollinearity
  - Sometimes misused to conclude causal relations
  - etc.
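As a small illustration of one of the reported quantities, a sketch of $R^2$, the fraction of the variance of $y$ explained by the fitted values (the data here are made up):

```python
# R^2 = 1 - SS_res / SS_tot, where SS_res is the sum of squared
# residuals and SS_tot is the total sum of squares around the mean.
def r_squared(ys, y_hats):
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - yh) ** 2 for y, yh in zip(ys, y_hats))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

print(r_squared([1, 2, 3, 4], [1, 2, 3, 4]))              # 1.0 (perfect fit)
print(r_squared([1, 2, 3, 4], [2.5, 2.5, 2.5, 2.5]))      # 0.0 (no better than the mean)
```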
Multiple linear regression
- Model: $y \approx \theta_0 + \theta_1 x + \theta_2 x' + \theta_3 x''$
- Formulation:
$$\min_{\theta_0, \theta_1, \theta_2, \theta_3} \sum_{i=1}^n (y_i - \theta_0 - \theta_1 x_i - \theta_2 x_i' - \theta_3 x_i'')^2$$
Choosing the right variables
- Model: $y \approx \theta_0 + \theta_1 h(x)$, e.g., $y \approx \theta_0 + \theta_1 x^2$
- Work with the data points $(h(x_i), y_i)$
- Formulation:
$$\min_{\theta_0, \theta_1} \sum_{i=1}^n (y_i - \theta_0 - \theta_1 h(x_i))^2$$
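A sketch of the variable-substitution idea above: to fit $y \approx \theta_0 + \theta_1 x^2$, run ordinary least squares on the transformed pairs $(h(x_i), y_i)$ with $h(x) = x^2$. The data are made up so that the fit is exact.

```python
# Ordinary least squares on transformed inputs z_i = h(x_i).
def fit_linear(zs, ys):
    n = len(zs)
    z_bar, y_bar = sum(zs) / n, sum(ys) / n
    theta1 = (sum((z - z_bar) * (y - y_bar) for z, y in zip(zs, ys))
              / sum((z - z_bar) ** 2 for z in zs))
    return y_bar - theta1 * z_bar, theta1

xs = [0, 1, 2, 3]
ys = [1, 4, 13, 28]                  # exactly y = 1 + 3 x^2 (made-up data)
theta0, theta1 = fit_linear([x ** 2 for x in xs], ys)
print(theta0, theta1)                # 1.0 3.0
```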
Binary hypothesis testing
- Binary $\theta$; new terminology:
  - null hypothesis $H_0$: $X \sim p_X(x; H_0)$ [or $f_X(x; H_0)$]
  - alternative hypothesis $H_1$: $X \sim p_X(x; H_1)$ [or $f_X(x; H_1)$]
- Partition the space of possible data vectors
  - Rejection region $R$: reject $H_0$ iff data $\in R$
- Types of errors:
  - Type I (false rejection, false alarm): $H_0$ true, but rejected; $\alpha(R) = P(X \in R;\ H_0)$
  - Type II (false acceptance, missed detection): $H_0$ false, but accepted; $\beta(R) = P(X \notin R;\ H_1)$

Likelihood ratio test (LRT)
- Bayesian case (MAP rule): choose $H_1$ if
  $$P(H_1 \mid X = x) > P(H_0 \mid X = x)$$
  or
  $$\frac{P(X = x \mid H_1)\, P(H_1)}{P(X = x)} > \frac{P(X = x \mid H_0)\, P(H_0)}{P(X = x)}$$
  or
  $$\frac{P(X = x \mid H_1)}{P(X = x \mid H_0)} > \frac{P(H_0)}{P(H_1)}$$
  (likelihood ratio test)
- Nonbayesian version: choose $H_1$ if
  $$\frac{p_X(x; H_1)}{p_X(x; H_0)} > \xi$$
- The threshold $\xi$ trades off the two types of error; choose $\xi$ so that $P(\text{reject } H_0;\ H_0) = \alpha$ (e.g., $\alpha = 0.05$)
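A hypothetical numerical example of threshold selection: let $X$ be the number of heads in $n = 10$ flips, with $H_0$: $p = 0.5$ versus $H_1$: $p = 0.8$. The likelihood ratio is increasing in $x$, so the LRT rejects $H_0$ when $x$ exceeds a cutoff, chosen so that the false-alarm probability is at most $\alpha = 0.05$.

```python
import math

n, p0, alpha = 10, 0.5, 0.05

def pmf(x, p):
    # Binomial probability P(X = x) with parameters n, p.
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

# Smallest cutoff c with false-alarm probability P(X >= c; H0) <= alpha;
# the rejection region is then R = {x : x >= c}.
c = next(c for c in range(n + 1)
         if sum(pmf(x, p0) for x in range(c, n + 1)) <= alpha)
print(c)    # 9: P(X >= 9; H0) = 11/1024 <= 0.05, but P(X >= 8; H0) = 56/1024 > 0.05
```

Lowering $\alpha$ pushes the cutoff up, reducing false alarms at the price of more missed detections, which is exactly the trade-off described above.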
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.