2 STA 9760 Big Data Tech | Spring 2018 | Junyi Zhang 3/23/2018
Regression w/ an intercept and a slope
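Simple Linear Regression Model:
$$Y_{ij} = \alpha_i + \beta_i \times X_{ij} + \epsilon_{ij},$$
where $Y_{ij}$ is the GrossPay observed for the $j$-th employee with the $i$-th JobTitle, such as Account II;
$X_{ij}$ is the AnnualSalary for the $j$-th employee with the $i$-th JobTitle; the $\epsilon_{ij}$'s are independently and identically distributed.
Least-squares solution (sums run over the employees with a given JobTitle, with $x = X_{ij}$, $y = Y_{ij}$ and $n = \sum 1$):
$$\begin{pmatrix} \hat{\alpha}_i \\ \hat{\beta}_i \end{pmatrix} = \begin{pmatrix} \sum 1 & \sum x \\ \sum x & \sum x^2 \end{pmatrix}^{-1} \times \begin{pmatrix} \sum y \\ \sum xy \end{pmatrix},$$
$$\hat{\sigma}^2 = MSE = \frac{1}{n-2}\left[ \sum y^2 - \begin{pmatrix} \sum y \\ \sum xy \end{pmatrix}^{\top} \begin{pmatrix} \sum 1 & \sum x \\ \sum x & \sum x^2 \end{pmatrix}^{-1} \begin{pmatrix} \sum y \\ \sum xy \end{pmatrix} \right]$$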
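The least-squares solution needs only six running sums, so it can be computed without keeping the raw records. A minimal sketch, assuming the sums are already available; the function name `fit_from_sums` and the toy data are illustrative, not taken from the course code:

```python
def fit_from_sums(n, sx, sxx, sy, sxy, syy):
    """Solve the 2x2 normal equations from running sums.

    n = sum of 1, sx = sum of x, sxx = sum of x^2,
    sy = sum of y, sxy = sum of x*y, syy = sum of y^2.
    Returns (alpha, beta, mse).
    """
    det = n * sxx - sx * sx  # determinant of [[n, sx], [sx, sxx]]
    if n < 3 or det == 0:
        raise ValueError("need at least 3 observations and non-constant x")
    # Closed-form inverse of the 2x2 matrix applied to (sy, sxy)
    alpha = (sxx * sy - sx * sxy) / det
    beta = (n * sxy - sx * sy) / det
    # SSE = sum(y^2) - (sy, sxy) . (alpha, beta); MSE divides by n - 2
    sse = syy - (alpha * sy + beta * sxy)
    return alpha, beta, sse / (n - 2)


# Toy data (hypothetical, not from the salary file)
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 2.0, 4.0]
sums = (
    len(xs),
    sum(xs),
    sum(x * x for x in xs),
    sum(ys),
    sum(x * y for x, y in zip(xs, ys)),
    sum(y * y for y in ys),
)
alpha, beta, mse = fit_from_sums(*sums)
print(alpha, beta, mse)  # approximately 1.3 0.8 0.9
```

With salary-sized values, the subtraction in `sse` can lose precision; centering x and y first is one common remedy, at the cost of an extra pass.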
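Simple Linear Regression Model:
$$Y_{ij} = \alpha_i + \beta_i \times X_{ij} + \epsilon_{ij}$$
Least-squares solution, written as sums of per-observation terms (with $x = X_{ij}$, $y = Y_{ij}$ and $n = \sum 1$):
$$\begin{pmatrix} \hat{\alpha}_i \\ \hat{\beta}_i \end{pmatrix} = \left[ \sum \begin{pmatrix} 1 & x \\ x & x^2 \end{pmatrix} \right]^{-1} \times \sum \begin{pmatrix} y \\ x \times y \end{pmatrix},$$
$$\hat{\sigma}^2 = MSE = \frac{1}{n-2}\left\{ \sum y^2 - \left[ \sum \begin{pmatrix} y \\ xy \end{pmatrix} \right]^{\top} \times \left[ \sum \begin{pmatrix} 1 & x \\ x & x^2 \end{pmatrix} \right]^{-1} \times \sum \begin{pmatrix} y \\ xy \end{pmatrix} \right\}$$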
The variances of the intercept and slope estimates are the diagonal elements of the
following matrix:
$$\hat{\sigma}^2 \times \left[ \sum \begin{pmatrix} 1 & x \\ x & x^2 \end{pmatrix} \right]^{-1}$$
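Because the matrix is 2x2, its diagonal after inversion has a closed form. A short sketch under that observation; the function name `coef_variances` and the example numbers are hypothetical:

```python
import math


def coef_variances(n, sx, sxx, mse):
    """Diagonal of mse * inverse([[n, sx], [sx, sxx]]).

    Returns (var_alpha, var_beta): the variances of the
    intercept and slope estimates.
    """
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("x is constant; the matrix is singular")
    # inverse = (1/det) * [[sxx, -sx], [-sx, n]]
    var_alpha = mse * sxx / det
    var_beta = mse * n / det
    return var_alpha, var_beta


# Hypothetical sums: n = 4, sum x = 6, sum x^2 = 14, MSE = 0.9
var_a, var_b = coef_variances(4, 6, 14, 0.9)
se_a, se_b = math.sqrt(var_a), math.sqrt(var_b)  # standard errors
print(var_a, var_b, se_a, se_b)
```

The square roots of the two variances are the standard errors usually reported next to the estimates.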
salary_grosspay_regr_mapper2.py
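A minimal sketch of a streaming-style mapper for this regression, not the actual salary_grosspay_regr_mapper2.py. Assumptions: input is CSV, the column positions in `JOBTITLE_COL`, `SALARY_COL` and `GROSSPAY_COL` are hypothetical, and each record is emitted as `JobTitle<TAB>1,x,x^2,y,x*y,y^2` so a reducer only has to add the fields up.

```python
import csv
import sys

# Hypothetical column positions; adjust to the actual file layout.
JOBTITLE_COL, SALARY_COL, GROSSPAY_COL = 1, 5, 6


def to_number(text):
    """Parse '$52,000.00'-style text; raises ValueError if not numeric."""
    return float(text.replace("$", "").replace(",", "").strip())


def map_lines(lines):
    """Yield 'JobTitle<TAB>1,x,x^2,y,x*y,y^2' for each parsable record."""
    for row in csv.reader(lines):
        try:
            title = row[JOBTITLE_COL].strip()
            x = to_number(row[SALARY_COL])    # AnnualSalary
            y = to_number(row[GROSSPAY_COL])  # GrossPay
        except (IndexError, ValueError):
            continue  # header line, short row, or missing value
        if not title:
            continue
        terms = (1, x, x * x, y, x * y, y * y)
        yield title + "\t" + ",".join(repr(t) for t in terms)


if __name__ == "__main__":
    for out in map_lines(sys.stdin):
        print(out)
```

Emitting the six terms per record keeps the mapper stateless; summing inside the mapper per JobTitle would cut the output size but is left out of this sketch.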
salary_grosspay_regr_reducer2.py
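A minimal sketch of a matching reducer, not the actual salary_grosspay_regr_reducer2.py. Assumptions: each input line has the form `JobTitle<TAB>n,sum_x,sum_x2,sum_y,sum_xy,sum_y2` (a hypothetical layout) and the lines arrive sorted by key, as a streaming framework's shuffle step or a local `sort` would deliver them.

```python
import sys
from itertools import groupby


def parse(line):
    """Split 'key<TAB>v1,v2,...' into (key, [floats])."""
    key, _, value = line.rstrip("\n").partition("\t")
    return key, [float(v) for v in value.split(",")]


def reduce_lines(lines):
    """Yield (job_title, n, alpha, beta, mse) per key.

    Input must be sorted by key. Keys with fewer than 3 records
    or a constant x are skipped, since MSE is undefined there.
    """
    records = (parse(line) for line in lines if line.strip())
    for key, group in groupby(records, key=lambda kv: kv[0]):
        # Add the six fields column by column
        n, sx, sxx, sy, sxy, syy = (
            sum(col) for col in zip(*(vals for _, vals in group))
        )
        det = n * sxx - sx * sx
        if n < 3 or det == 0:
            continue
        alpha = (sxx * sy - sx * sxy) / det
        beta = (n * sxy - sx * sy) / det
        mse = (syy - (alpha * sy + beta * sxy)) / (n - 2)
        yield key, int(n), alpha, beta, mse


if __name__ == "__main__":
    for key, n, alpha, beta, mse in reduce_lines(sys.stdin):
        print("%s\t%d\t%r\t%r\t%r" % (key, n, alpha, beta, mse))
```

Because the fields are plain sums, the same function also works if an upstream step has already pre-aggregated several records into one line.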
Run salary_grosspay_regr_mapper2.py
Run salary_grosspay_regr_reducer2.py
Results after running the reducer: