6. Linear regression
Goals
Consider two random variables X and Y. For example,
1. X is the amount of $ spent on Facebook ads and Y is the total conversion rate
2. X is the age of the person and Y is the number of clicks
Given two random variables (X, Y), we can ask how the distribution of Y depends on the value of X, and how well Y can be predicted from X.
Conversions vs. amount spent
Clicks vs. age
Modeling assumptions
The full dependence of Y on X is described by the conditional density of Y given X = x:
$$h(y \mid x) = \frac{f(x, y)}{f_X(x)},$$
where $f$ is the joint density of $(X, Y)$ and $f_X$ the marginal density of $X$.
Partial modeling
We can also describe the conditional distribution only partially, e.g., using:
- The conditional expectation of Y:
  $$x \mapsto \mathbb{E}[Y \mid X = x] = \int y \, h(y \mid x) \, dy,$$
  which is called the regression function of Y on X.
- Other possibilities:
  - The conditional median: m = m(x) such that
    $$\int_{-\infty}^{m} h(y \mid x) \, dy = \frac{1}{2}.$$
  - Conditional quantiles
  - Conditional variance (not informative about location)
Conditional expectation and standard deviation
Conditional expectation
Conditional density and conditional quantiles
Conditional distribution: boxplots
Linear regression
f(x) = a + bx
Nonparametric regression
Linear regression
Probabilistic analysis
- Let X and Y be two real r.v. (not necessarily independent) with two moments and such that var(X) > 0.
- The pair $(a^*, b^*)$ minimizing $\mathbb{E}[(Y - a - bX)^2]$ is given by
$$b^* = \frac{\mathrm{cov}(X, Y)}{\mathrm{var}(X)}, \qquad a^* = \mathbb{E}[Y] - b^*\,\mathbb{E}[X].$$
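A brief sketch of where these expressions come from, by setting the partial derivatives of the population squared error to zero:

```latex
% Minimize R(a, b) = E[(Y - a - bX)^2] over (a, b).
\[
\frac{\partial R}{\partial a} = -2\,\mathbb{E}[Y - a - bX] = 0
\;\Rightarrow\; a = \mathbb{E}[Y] - b\,\mathbb{E}[X],
\]
\[
\frac{\partial R}{\partial b} = -2\,\mathbb{E}[X(Y - a - bX)] = 0
\;\Rightarrow\; \mathbb{E}[XY] - a\,\mathbb{E}[X] - b\,\mathbb{E}[X^2] = 0.
\]
% Substituting the first equation into the second:
\[
b^* = \frac{\mathbb{E}[XY] - \mathbb{E}[X]\,\mathbb{E}[Y]}{\mathbb{E}[X^2] - \mathbb{E}[X]^2}
    = \frac{\mathrm{cov}(X, Y)}{\mathrm{var}(X)}.
\]
```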
Noise
Clearly the points are not exactly on the line $x \mapsto a^* + b^* x$ if $\mathrm{var}(Y \mid X = x) > 0$. The random variable $\varepsilon = Y - (a^* + b^* X)$ is called the noise; it satisfies
$$Y = a^* + b^* X + \varepsilon,$$
with
- $\mathbb{E}[\varepsilon] = 0$ and
- $\mathrm{cov}(X, \varepsilon) = 0$.
Statistical problem
In practice, $a^*, b^*$ need to be estimated from data.
Statistical problem
We observe $(X_1, Y_1), \dots, (X_n, Y_n)$ satisfying
$$Y_i = a^* + b^* X_i + \varepsilon_i, \qquad i = 1, \dots, n.$$
Least squares

Definition
The least squares estimator (LSE) of $(a, b)$ is the minimizer of the sum of squared errors:
$$\sum_{i=1}^{n} (Y_i - a - bX_i)^2.$$
It is given by
$$\hat b = \frac{\overline{XY} - \bar X\,\bar Y}{\overline{X^2} - \bar X^2}, \qquad \hat a = \bar Y - \hat b\,\bar X.$$
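As a quick numerical check, the closed-form expressions for $\hat a$ and $\hat b$ can be evaluated directly with NumPy (a sketch on synthetic data; the true values $a^* = 1$, $b^* = 2$ and the sample size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data from the model Y = a* + b*X + noise
a_star, b_star = 1.0, 2.0
X = rng.uniform(0, 10, size=200)
Y = a_star + b_star * X + rng.normal(0, 1, size=200)

# LSE formulas: b_hat = (mean(XY) - mean(X) mean(Y)) / (mean(X^2) - mean(X)^2)
b_hat = (np.mean(X * Y) - X.mean() * Y.mean()) / (np.mean(X**2) - X.mean() ** 2)
a_hat = Y.mean() - b_hat * X.mean()

print(a_hat, b_hat)
```

The estimates agree with a standard least-squares fit (e.g. `np.polyfit(X, Y, 1)`) and should land near the true values for moderate sample sizes.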
Residuals
The residuals are $\hat\varepsilon_i = Y_i - (\hat a + \hat b X_i)$, $i = 1, \dots, n$.
Multivariate regression
$$Y_i = X_i^\top \beta^* + \varepsilon_i, \qquad i = 1, \dots, n.$$
- $\beta^* = (a^*, (b^*)^\top)^\top$; its first coordinate $\beta^*_1$ ($= a^*$) is called the intercept.
- In matrix form: $Y = X\beta^* + \varepsilon$, with $\beta^*$ unknown.
The LSE is
$$\hat\beta = \mathop{\mathrm{argmin}}_{\beta \in \mathbb{R}^p} \|Y - X\beta\|_2^2.$$
Closed form solution
Assuming $X^\top X$ is invertible (i.e., $\mathrm{rank}(X) = p$),
$$\hat\beta = (X^\top X)^{-1} X^\top Y.$$
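In NumPy the closed form can be sketched as follows; the data and the true coefficient vector are illustrative, and `np.linalg.lstsq` is shown alongside as the numerically preferred route:

```python
import numpy as np

rng = np.random.default_rng(1)

n, p = 100, 3
# Design matrix with an intercept column (first column of ones)
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_star = np.array([1.0, -2.0, 0.5])
Y = X @ beta_star + rng.normal(0, 0.1, size=n)

# Closed form: beta_hat = (X^T X)^{-1} X^T Y, solved without forming the inverse
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Equivalent, more stable in practice (QR/SVD-based):
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(beta_hat)
```

Solving the normal equations with `np.linalg.solve` avoids explicitly inverting $X^\top X$, which is both cheaper and better conditioned.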
Statistical inference
Assume the noise is Gaussian and homoscedastic:
$$\varepsilon \sim \mathcal{N}_n(0, \sigma^2 I_n).$$
Properties of LSE
- LSE = MLE (under the Gaussian noise assumption).
- Distribution of $\hat\beta$: $\hat\beta \sim \mathcal{N}_p\big(\beta^*, \sigma^2 (X^\top X)^{-1}\big)$.
- Quadratic risk of $\hat\beta$: $\mathbb{E}\big[\|\hat\beta - \beta^*\|_2^2\big] = \sigma^2\,\mathrm{tr}\big((X^\top X)^{-1}\big)$.
- Prediction error: $\mathbb{E}\big[\|Y - X\hat\beta\|_2^2\big] = \sigma^2 (n - p)$.
- Unbiased estimator of $\sigma^2$: $\hat\sigma^2 = \|Y - X\hat\beta\|_2^2 / (n - p)$.

Theorem
- $(n - p)\,\hat\sigma^2 / \sigma^2 \sim \chi^2_{n-p}$.
- $\hat\beta \perp\!\!\perp \hat\sigma^2$.
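A small Monte Carlo check of the identity $\mathbb{E}[\|Y - X\hat\beta\|_2^2] = \sigma^2(n - p)$, and hence of the unbiasedness of $\hat\sigma^2$ (a sketch; the sample size, number of replications, and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)

n, p, sigma2 = 50, 3, 1.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_star = np.array([1.0, 2.0, -1.0])

reps = 2000
sigma2_hats = []
for _ in range(reps):
    Y = X @ beta_star + rng.normal(0, np.sqrt(sigma2), size=n)
    beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
    rss = np.sum((Y - X @ beta_hat) ** 2)   # ||Y - X beta_hat||^2
    sigma2_hats.append(rss / (n - p))       # unbiased estimator of sigma^2

mean_hat = np.mean(sigma2_hats)
print(mean_hat)  # should be close to sigma2 = 1.0
```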
Significance tests
- Test whether the j-th explanatory variable is significant in the linear regression ($1 \le j \le p$).
- $H_0: \beta_j = 0$ vs. $H_1: \beta_j \ne 0$.
- Let $\gamma_j = \big[(X^\top X)^{-1}\big]_{jj}$ and define
  $$T_n^{(j)} = \frac{\hat\beta_j}{\sqrt{\hat\sigma^2\,\gamma_j}}.$$
- Under $H_0$, $T_n^{(j)} \sim t_{n-p}$, so the rejection region at level $\alpha$ is
  $$R_{j,\alpha} = \big\{ |T_n^{(j)}| > q_{\alpha/2}(t_{n-p}) \big\},$$
  where $q_{\alpha/2}(t_{n-p})$ is the $(1 - \alpha/2)$-quantile of $t_{n-p}$.
- We can also compute p-values.
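The test statistic, p-values, and rejection rule can be sketched with NumPy and SciPy, where `scipy.stats.t` supplies the $t_{n-p}$ quantiles (synthetic data; the coefficient values are illustrative, with the second one truly zero):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_star = np.array([1.0, 0.0, 2.0])   # second coefficient truly zero
Y = X @ beta_star + rng.normal(size=n)

beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
sigma2_hat = np.sum((Y - X @ beta_hat) ** 2) / (n - p)
gamma = np.diag(np.linalg.inv(X.T @ X))  # gamma_j = [(X^T X)^{-1}]_{jj}

# t-statistic and two-sided p-value for each coefficient
T = beta_hat / np.sqrt(sigma2_hat * gamma)
p_values = 2 * stats.t.sf(np.abs(T), df=n - p)

alpha = 0.05
reject = np.abs(T) > stats.t.ppf(1 - alpha / 2, df=n - p)
print(p_values, reject)
```

Coefficients far from zero relative to their standard error $\sqrt{\hat\sigma^2\,\gamma_j}$ produce large $|T_n^{(j)}|$ and small p-values.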
Bonferroni's test
- $H_0: \beta_j = 0 \ \forall j \in S$ vs. $H_1: \exists j \in S,\ \beta_j \ne 0$, where $S \subseteq \{1, \dots, p\}$.
- Bonferroni correction: with $K = |S|$, reject $H_0$ at level $\alpha$ if $|T_n^{(j)}| > q_{\alpha/(2K)}(t_{n-p})$ for some $j \in S$.
Remarks