
LECTURE 24  Reference: Section 9.3

Course Evaluations (until 12/16)
http://web.mit.edu/subjectevaluation

Review: Maximum likelihood estimation
- Have a model with unknown parameter: X ~ pX(x; θ)
- Pick the θ̂ that makes the data most likely:

      max over θ of pX(x; θ)
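As a concrete sketch of this recipe (the Bernoulli data below are invented for illustration), one can maximize the likelihood numerically over a grid of candidate θ values; for n coin flips with k heads, the maximizer is k/n:

```python
import numpy as np

# Hypothetical illustration: ML estimation of a Bernoulli parameter theta.
# Likelihood: p_X(x; theta) = theta^k * (1 - theta)^(n - k),
# where k is the number of ones. Analytically, the MLE is k/n.
x = np.array([1, 0, 1, 1, 0, 1, 1, 0, 1, 1])  # assumed sample
k, n = x.sum(), len(x)

thetas = np.linspace(0.001, 0.999, 999)            # candidate parameter values
log_lik = k * np.log(thetas) + (n - k) * np.log(1 - thetas)
theta_hat = thetas[np.argmax(log_lik)]             # theta maximizing the likelihood

print(theta_hat)   # -> 0.7, i.e., k/n
```

Working with the log-likelihood avoids underflow and leaves the maximizer unchanged, since log is monotone.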

Outline
- Review
  - Maximum likelihood estimation
  - Confidence intervals
- Linear regression
- Binary hypothesis testing
  - Types of error
  - Likelihood ratio test (LRT)

... in the context of various probabilistic frameworks, which provide perspective and a mechanism for quantitative analysis. We first consider the case of only two variables, and then generalize. We wish to model the relation between two variables of interest, x and y (e.g., years of education and income), based on a collection of data pairs (xi, yi), i = 1, ..., n. For example, xi could be the years of education and yi the annual income of the ith person in the sample. Often a two-dimensional plot of these samples indicates a systematic, approximately linear relation between xi and yi. Then, it is natural to attempt to build a linear model of the form

    y ≈ θ0 + θ1 x,

where θ0 and θ1 are unknown parameters to be estimated. In particular, given some estimates θ̂0 and θ̂1 of the parameters, the value yi corresponding to xi, as predicted by the model, is

    ŷi = θ̂0 + θ̂1 xi.

Generally, ŷi will be different from the given value yi, and the corresponding difference

    ỹi = yi − ŷi,

is called the ith residual. A choice of estimates that results in small residuals is considered to provide a good fit to the data. With this motivation, the linear regression approach chooses the parameter estimates θ̂0 and θ̂1 that minimize the sum of the squared residuals,

    ∑_{i=1}^n (yi − ŷi)² = ∑_{i=1}^n (yi − θ0 − θ1 xi)²,

Compare to Bayesian MAP estimation:

    max over θ of p_Θ|X(θ | x),  or equivalently  max over θ of  p_X|Θ(x | θ) p_Θ(θ) / p_X(x)

Sample mean estimate of θ = E[X]:

    Θ̂n = (X1 + ··· + Xn)/n

1 − α confidence interval:

    P(Θ̂n⁻ ≤ θ ≤ Θ̂n⁺) ≥ 1 − α

Confidence interval for the sample mean: let z be s.t. Φ(z) = 1 − α/2; then

    P(Θ̂n − zσ/√n ≤ θ ≤ Θ̂n + zσ/√n) ≈ 1 − α
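A minimal numerical sketch of this construction, assuming a known σ and invented normal data (the mean, σ, n, and seed below are arbitrary choices):

```python
import numpy as np
from math import sqrt
from statistics import NormalDist

# Assumed setup: i.i.d. samples with known sigma; CI from the normal (CLT) approximation.
np.random.seed(0)
sigma = 2.0
n = 100
x = np.random.normal(loc=5.0, scale=sigma, size=n)   # synthetic data, true theta = 5.0

theta_hat = x.mean()                                  # sample mean estimator
alpha = 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)               # Phi(z) = 1 - alpha/2, so z ~ 1.96
lo = theta_hat - z * sigma / sqrt(n)
hi = theta_hat + z * sigma / sqrt(n)
print(lo, hi)   # interval of half-width z*sigma/sqrt(n) ~ 0.392 around the sample mean
```

When σ is unknown, one would substitute an estimate of σ (and, for small n, t-distribution quantiles in place of z).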

over all θ0 and θ1; see Fig. 9.5 for an illustration.

Regression

[Figure 9.5: Illustration of a set of data pairs (xi, yi), and a linear model y = θ̂0 + θ̂1 x, obtained by minimizing over θ0, θ1 the sum of the squares of the residuals yi − θ0 − θ1 xi. The residual for each point is the vertical distance yi − (θ̂0 + θ̂1 xi).]

Linear regression
- Model: y ≈ θ0 + θ1 x
- Data: (x1, y1), (x2, y2), ..., (xn, yn)
- Formulation:

      min over θ0, θ1 of  ∑_{i=1}^n (yi − θ0 − θ1 xi)²     (*)

- Solution (set derivatives to zero):

      x̄ = (x1 + ··· + xn)/n,   ȳ = (y1 + ··· + yn)/n

      θ̂1 = ∑_{i=1}^n (xi − x̄)(yi − ȳ) / ∑_{i=1}^n (xi − x̄)²

      θ̂0 = ȳ − θ̂1 x̄
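The closed-form solution above can be checked on a toy example (the data here are invented and exactly linear, so the formulas recover the true coefficients):

```python
import numpy as np

# Minimal sketch of the least-squares formulas from the slide,
# on made-up, noise-free data generated from y = 2 + 3x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 + 3.0 * x                      # assumed exact linear data

xbar, ybar = x.mean(), y.mean()
theta1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
theta0 = ybar - theta1 * xbar
print(theta0, theta1)   # -> 2.0 3.0
```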

One interpretation: Yi = θ0 + θ1 xi + Wi, with Wi ~ N(0, σ²), i.i.d.
The likelihood function f_{X,Y}(x, y; θ) is:

    c · exp{ −(1/(2σ²)) ∑_{i=1}^n (yi − θ0 − θ1 xi)² }

Take logs: maximizing the likelihood is the same as (*).
Least squares "pretends" the Wi are i.i.d. normal.

Interpretation of the form of the solution:
- Assume a model Y = θ0 + θ1 X + W, with W independent of X and zero mean.
- Check that

      θ1 = cov(X, Y)/var(X) = E[(X − E[X])(Y − E[Y])] / E[(X − E[X])²]

- The formula for θ̂1 uses natural estimates of the variance and covariance.
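As a quick numerical check of this interpretation (a sketch on synthetic data; the coefficients, seed, and sample size are arbitrary), the least-squares slope coincides exactly with the sample covariance divided by the sample variance:

```python
import numpy as np

# Synthetic data from Y = theta0 + theta1*X + W, with W zero-mean, independent of X.
rng = np.random.default_rng(1)
x = rng.normal(size=200)
w = rng.normal(size=200)
y = 1.5 + 0.8 * x + w

# Least-squares slope from the slide's formula...
theta1_ls = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
# ...equals the natural estimate cov(X, Y)/var(X) (same 1/n normalization).
theta1_cov = np.cov(x, y, ddof=0)[0, 1] / np.var(x)

print(abs(theta1_ls - theta1_cov) < 1e-12)   # -> True
```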

The world of linear regression

Multiple linear regression:
- data: (xi, x′i, x″i, yi), i = 1, ..., n
- model: y ≈ θ0 + θ x + θ′ x′ + θ″ x″
- formulation:

      min over θ, θ′, θ″ of  ∑_{i=1}^n (yi − θ0 − θ xi − θ′ x′i − θ″ x″i)²

The world of regression (ctd.)
- In practice, one also reports:
  - Confidence intervals for the θi
  - Standard error (estimate of σ)
  - R², a measure of explanatory power
- Some common concerns:
  - Heteroskedasticity
  - Multicollinearity
  - Sometimes misused to conclude causal relations
  - etc.

Choosing the right variables
- model: y ≈ θ0 + θ1 h(x); e.g., y ≈ θ0 + θ1 x²
- work with data points (h(xi), yi)
- formulation:

      min over θ0, θ1 of  ∑_{i=1}^n (yi − θ0 − θ1 h(xi))²
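A sketch of this transformation trick, using h(x) = x² on invented, exactly quadratic data: the same least-squares formulas are applied to the transformed points (h(xi), yi):

```python
import numpy as np

# Fit y ~ theta0 + theta1 * h(x) with h(x) = x^2 by ordinary least squares
# on the transformed regressor. Data are made up and exactly quadratic.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = 1.0 + 4.0 * x**2                # assumed data: y = 1 + 4 x^2

h = x**2                            # transformed regressor h(x)
hbar, ybar = h.mean(), y.mean()
theta1 = np.sum((h - hbar) * (y - ybar)) / np.sum((h - hbar) ** 2)
theta0 = ybar - theta1 * hbar
print(theta0, theta1)   # -> 1.0 4.0
```

The model is still linear in the parameters θ0, θ1, which is why the linear regression machinery applies unchanged.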

Binary hypothesis testing
- Binary θ; new terminology:
  - null hypothesis H0:        X ~ pX(x; H0)  [or fX(x; H0)]
  - alternative hypothesis H1: X ~ pX(x; H1)  [or fX(x; H1)]
- Partition the space of possible data vectors:
  - Rejection region R: reject H0 iff data ∈ R
- Types of errors:
  - Type I (false rejection, false alarm): H0 true, but rejected

        α(R) = P(X ∈ R; H0)

  - Type II (false acceptance, missed detection): H0 false, but accepted

        β(R) = P(X ∉ R; H1)

Likelihood ratio test (LRT)
- Bayesian case (MAP rule): choose H1 if

      P(H1 | X = x) > P(H0 | X = x)

  or

      P(X = x | H1) P(H1) / P(X = x) > P(X = x | H0) P(H0) / P(X = x)

  or

      P(X = x | H1) / P(X = x | H0) > P(H0) / P(H1)

- Nonbayesian version (likelihood ratio test): choose H1 if

      P(X = x; H1) / P(X = x; H0) > ξ   (discrete case)

      fX(x; H1) / fX(x; H0) > ξ         (continuous case)

- The threshold ξ trades off the two types of error
- Choose ξ so that P(reject H0; H0) = α   (e.g., α = 0.05)
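A sketch of an LRT for a single observation, under assumed Gaussian hypotheses H0: X ~ N(0, 1) and H1: X ~ N(1, 1); here ξ is set so that the false-alarm probability equals α = 0.05:

```python
import math
from statistics import NormalDist

# For these two unit-variance normals, the likelihood ratio
# f(x; H1)/f(x; H0) = exp(x - 1/2) is increasing in x, so
# "ratio > xi" is equivalent to "x > c" for some cutoff c.
alpha = 0.05
c = NormalDist(0, 1).inv_cdf(1 - alpha)   # P(X > c; H0) = alpha, c ~ 1.645
xi = math.exp(c - 0.5)                    # equivalent threshold on the ratio

def lrt_choose_h1(x: float) -> bool:
    """Choose H1 iff the likelihood ratio exceeds the threshold xi."""
    ratio = NormalDist(1, 1).pdf(x) / NormalDist(0, 1).pdf(x)
    return ratio > xi

print(lrt_choose_h1(2.0), lrt_choose_h1(0.0))   # -> True False
```

The monotonicity of the ratio is what lets the threshold on the ratio be translated into a threshold on the data itself, which is how α is controlled in practice.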

MIT OpenCourseWare http://ocw.mit.edu

6.041 / 6.431 Probabilistic Systems Analysis and Applied Probability


Fall 2010

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
