
LECTURE 24  Reference: Section 9.3

Course Evaluations (until 12/16)
http://web.mit.edu/subjectevaluation

Review: Maximum likelihood estimation
- Have a model with unknown parameter: X ~ pX(x; θ)
- Pick the θ̂ that makes the data most likely:

      max over θ of pX(x; θ)
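As a concrete sketch of this recipe (the Bernoulli data below are invented for illustration), one can maximize the likelihood numerically over a grid of candidate θ values; for n coin flips with k heads, the maximizer is k/n:

```python
import numpy as np

# Hypothetical illustration: ML estimation of a Bernoulli parameter theta.
# Likelihood: p_X(x; theta) = theta^k * (1 - theta)^(n - k),
# where k is the number of ones. Analytically, the MLE is k/n.
x = np.array([1, 0, 1, 1, 0, 1, 1, 0, 1, 1])  # assumed sample
k, n = x.sum(), len(x)

thetas = np.linspace(0.001, 0.999, 999)            # candidate parameter values
log_lik = k * np.log(thetas) + (n - k) * np.log(1 - thetas)
theta_hat = thetas[np.argmax(log_lik)]             # theta maximizing the likelihood

print(theta_hat)   # -> 0.7, i.e., k/n
```

Working with the log-likelihood avoids underflow and leaves the maximizer unchanged, since log is monotone.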

Outline
- Review
  - Maximum likelihood estimation
  - Confidence intervals
- Linear regression
- Binary hypothesis testing
  - Types of error
  - Likelihood ratio test (LRT)

... in the context of various probabilistic frameworks, which provide perspective and a mechanism for quantitative analysis. We first consider the case of only two variables, and then generalize. We wish to model the relation between two variables of interest, x and y (e.g., years of education and income), based on a collection of data pairs (xi, yi), i = 1, ..., n. For example, xi could be the years of education and yi the annual income of the ith person in the sample. Often a two-dimensional plot of these samples indicates a systematic, approximately linear relation between xi and yi. Then, it is natural to attempt to build a linear model of the form

    y ≈ θ0 + θ1 x,

where θ0 and θ1 are unknown parameters to be estimated. In particular, given some estimates θ̂0 and θ̂1 of the parameters, the value yi corresponding to xi, as predicted by the model, is

    ŷi = θ̂0 + θ̂1 xi.

Generally, ŷi will be different from the given value yi, and the corresponding difference

    ỹi = yi − ŷi,

is called the ith residual. A choice of estimates that results in small residuals is considered to provide a good fit to the data. With this motivation, the linear regression approach chooses the parameter estimates θ̂0 and θ̂1 that minimize the sum of the squared residuals,

    ∑_{i=1}^n (yi − ŷi)² = ∑_{i=1}^n (yi − θ0 − θ1 xi)²,

Compare to Bayesian MAP estimation:

    max over θ of p_Θ|X(θ | x),  or equivalently  max over θ of  p_X|Θ(x | θ) p_Θ(θ) / p_X(x)

Sample mean estimate of θ = E[X]:

    Θ̂n = (X1 + ··· + Xn)/n

1 − α confidence interval:

    P(Θ̂n⁻ ≤ θ ≤ Θ̂n⁺) ≥ 1 − α

Confidence interval for the sample mean: let z be s.t. Φ(z) = 1 − α/2; then

    P(Θ̂n − zσ/√n ≤ θ ≤ Θ̂n + zσ/√n) ≈ 1 − α
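A minimal numerical sketch of this construction, assuming a known σ and invented normal data (the mean, σ, n, and seed below are arbitrary choices):

```python
import numpy as np
from math import sqrt
from statistics import NormalDist

# Assumed setup: i.i.d. samples with known sigma; CI from the normal (CLT) approximation.
np.random.seed(0)
sigma = 2.0
n = 100
x = np.random.normal(loc=5.0, scale=sigma, size=n)   # synthetic data, true theta = 5.0

theta_hat = x.mean()                                  # sample mean estimator
alpha = 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)               # Phi(z) = 1 - alpha/2, so z ~ 1.96
lo = theta_hat - z * sigma / sqrt(n)
hi = theta_hat + z * sigma / sqrt(n)
print(lo, hi)   # interval of half-width z*sigma/sqrt(n) ~ 0.392 around the sample mean
```

When σ is unknown, one would substitute an estimate of σ (and, for small n, t-distribution quantiles in place of z).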

over all θ0 and θ1; see Fig. 9.5 for an illustration.

Regression

[Figure 9.5: Illustration of a set of data pairs (xi, yi), and a linear model y = θ̂0 + θ̂1 x, obtained by minimizing over θ0, θ1 the sum of the squares of the residuals yi − θ0 − θ1 xi. The residual for each point is the vertical distance yi − (θ̂0 + θ̂1 xi).]

Linear regression
- Model: y ≈ θ0 + θ1 x
- Data: (x1, y1), (x2, y2), ..., (xn, yn)
- Formulation:

      min over θ0, θ1 of  ∑_{i=1}^n (yi − θ0 − θ1 xi)²     (*)

- Solution (set derivatives to zero):

      x̄ = (x1 + ··· + xn)/n,   ȳ = (y1 + ··· + yn)/n

      θ̂1 = ∑_{i=1}^n (xi − x̄)(yi − ȳ) / ∑_{i=1}^n (xi − x̄)²

      θ̂0 = ȳ − θ̂1 x̄
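The closed-form solution above can be checked on a toy example (the data here are invented and exactly linear, so the formulas recover the true coefficients):

```python
import numpy as np

# Minimal sketch of the least-squares formulas from the slide,
# on made-up, noise-free data generated from y = 2 + 3x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 + 3.0 * x                      # assumed exact linear data

xbar, ybar = x.mean(), y.mean()
theta1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
theta0 = ybar - theta1 * xbar
print(theta0, theta1)   # -> 2.0 3.0
```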

One interpretation: Yi = θ0 + θ1 xi + Wi, with Wi ~ N(0, σ²), i.i.d.
The likelihood function f_{X,Y}(x, y; θ) is:

    c · exp{ −(1/(2σ²)) ∑_{i=1}^n (yi − θ0 − θ1 xi)² }

Take logs: maximizing the likelihood is the same as (*).
Least squares "pretends" the Wi are i.i.d. normal.

Interpretation of the form of the solution:
- Assume a model Y = θ0 + θ1 X + W, with W independent of X and zero mean.
- Check that

      θ1 = cov(X, Y)/var(X) = E[(X − E[X])(Y − E[Y])] / E[(X − E[X])²]

- The formula for θ̂1 uses natural estimates of the variance and covariance.
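As a quick numerical check of this interpretation (a sketch on synthetic data; the coefficients, seed, and sample size are arbitrary), the least-squares slope coincides exactly with the sample covariance divided by the sample variance:

```python
import numpy as np

# Synthetic data from Y = theta0 + theta1*X + W, with W zero-mean, independent of X.
rng = np.random.default_rng(1)
x = rng.normal(size=200)
w = rng.normal(size=200)
y = 1.5 + 0.8 * x + w

# Least-squares slope from the slide's formula...
theta1_ls = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
# ...equals the natural estimate cov(X, Y)/var(X) (same 1/n normalization).
theta1_cov = np.cov(x, y, ddof=0)[0, 1] / np.var(x)

print(abs(theta1_ls - theta1_cov) < 1e-12)   # -> True
```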

The world of linear regression

Multiple linear regression:
- data: (xi, x′i, x″i, yi), i = 1, ..., n
- model: y ≈ θ0 + θ x + θ′ x′ + θ″ x″
- formulation:

      min over θ, θ′, θ″ of  ∑_{i=1}^n (yi − θ0 − θ xi − θ′ x′i − θ″ x″i)²

The world of regression (ctd.)
- In practice, one also reports:
  - Confidence intervals for the θi
  - Standard error (estimate of σ)
  - R², a measure of explanatory power
- Some common concerns:
  - Heteroskedasticity
  - Multicollinearity
  - Sometimes misused to conclude causal relations
  - etc.

Choosing the right variables
- model: y ≈ θ0 + θ1 h(x); e.g., y ≈ θ0 + θ1 x²
- work with data points (h(xi), yi)
- formulation:

      min over θ0, θ1 of  ∑_{i=1}^n (yi − θ0 − θ1 h(xi))²
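A sketch of this transformation trick, using h(x) = x² on invented, exactly quadratic data: the same least-squares formulas are applied to the transformed points (h(xi), yi):

```python
import numpy as np

# Fit y ~ theta0 + theta1 * h(x) with h(x) = x^2 by ordinary least squares
# on the transformed regressor. Data are made up and exactly quadratic.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = 1.0 + 4.0 * x**2                # assumed data: y = 1 + 4 x^2

h = x**2                            # transformed regressor h(x)
hbar, ybar = h.mean(), y.mean()
theta1 = np.sum((h - hbar) * (y - ybar)) / np.sum((h - hbar) ** 2)
theta0 = ybar - theta1 * hbar
print(theta0, theta1)   # -> 1.0 4.0
```

The model is still linear in the parameters θ0, θ1, which is why the linear regression machinery applies unchanged.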

Binary hypothesis testing
- Binary θ; new terminology:
  - null hypothesis H0:        X ~ pX(x; H0)  [or fX(x; H0)]
  - alternative hypothesis H1: X ~ pX(x; H1)  [or fX(x; H1)]
- Partition the space of possible data vectors:
  - Rejection region R: reject H0 iff data ∈ R
- Types of errors:
  - Type I (false rejection, false alarm): H0 true, but rejected

        α(R) = P(X ∈ R; H0)

  - Type II (false acceptance, missed detection): H0 false, but accepted

        β(R) = P(X ∉ R; H1)

Likelihood ratio test (LRT)
- Bayesian case (MAP rule): choose H1 if

      P(H1 | X = x) > P(H0 | X = x)

  or

      P(X = x | H1) P(H1) / P(X = x) > P(X = x | H0) P(H0) / P(X = x)

  or

      P(X = x | H1) / P(X = x | H0) > P(H0) / P(H1)

- Nonbayesian version (likelihood ratio test): choose H1 if

      P(X = x; H1) / P(X = x; H0) > ξ   (discrete case)

      fX(x; H1) / fX(x; H0) > ξ         (continuous case)

- The threshold ξ trades off the two types of error
- Choose ξ so that P(reject H0; H0) = α   (e.g., α = 0.05)
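A sketch of an LRT for a single observation, under assumed Gaussian hypotheses H0: X ~ N(0, 1) and H1: X ~ N(1, 1); here ξ is set so that the false-alarm probability equals α = 0.05:

```python
import math
from statistics import NormalDist

# For these two unit-variance normals, the likelihood ratio
# f(x; H1)/f(x; H0) = exp(x - 1/2) is increasing in x, so
# "ratio > xi" is equivalent to "x > c" for some cutoff c.
alpha = 0.05
c = NormalDist(0, 1).inv_cdf(1 - alpha)   # P(X > c; H0) = alpha, c ~ 1.645
xi = math.exp(c - 0.5)                    # equivalent threshold on the ratio

def lrt_choose_h1(x: float) -> bool:
    """Choose H1 iff the likelihood ratio exceeds the threshold xi."""
    ratio = NormalDist(1, 1).pdf(x) / NormalDist(0, 1).pdf(x)
    return ratio > xi

print(lrt_choose_h1(2.0), lrt_choose_h1(0.0))   # -> True False
```

The monotonicity of the ratio is what lets the threshold on the ratio be translated into a threshold on the data itself, which is how α is controlled in practice.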

MIT OpenCourseWare http://ocw.mit.edu

6.041 / 6.431 Probabilistic Systems Analysis and Applied Probability


Fall 2010

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
