Submitted By:
Vasantada Srikanth
Section: A
Question 1
A) Although the R2 value is low (4.33%), it does not mean that the regression is wrong.
The p-value of the regression is 0.0002445, which is less than 0.05, and the F value of
4.839 exceeds the 5% critical F value (about 2.2 for 5 and 534 degrees of freedom). So
the regression is statistically significant.
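As a rough check, the F statistic can be recomputed from R2 and compared with the 5% critical value. The sketch below assumes k = 5 predictors (GRI, SAT, MBA, AGE, TEN) and n = 540 observations; the sample size is not stated in this answer and is inferred from the 536 residual degrees of freedom reported in Question 11.

```r
# Assumed inputs: R-squared is from the answer; n and k are inferred, not given.
r2 <- 0.0433   # reported R-squared (4.33%)
k  <- 5        # predictors: GRI, SAT, MBA, AGE, TEN
n  <- 540      # assumed sample size (536 residual df + 4 parameters in Question 11)

df1 <- k
df2 <- n - k - 1

f_stat <- (r2 / df1) / ((1 - r2) / df2)              # F statistic implied by R-squared
f_crit <- qf(0.95, df1, df2)                         # 5% critical value
p_val  <- pf(f_stat, df1, df2, lower.tail = FALSE)   # overall p-value

cat("F =", round(f_stat, 3),
    " critical F =", round(f_crit, 3),
    " p =", signif(p_val, 3), "\n")
```

The recomputed F differs slightly from 4.839 only because R2 is rounded to 4.33%.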
B) When two variables in a regression model are highly correlated (above 90%),
one can be linearly predicted from the other with a substantial degree of accuracy.
In this situation the coefficient estimates of the multiple regression may change
erratically in response to small changes in the model or the data. The regression
can still have a high R-squared value, but the individual coefficient estimates
are unreliable. This is called the multicollinearity problem.
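The effect can be illustrated with a small simulation. This is only a sketch on made-up data (not the case data): x2 is built as a near copy of x1, and the variance inflation factor and the standard error of the x1 coefficient are compared with and without x2 in the model.

```r
set.seed(1)                          # simulated data, not the case data
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)        # x2 is almost a copy of x1
y  <- 1 + 2 * x1 + rnorm(n)          # only x1 truly drives y

r_x <- cor(x1, x2)                   # correlation between the two predictors
vif <- 1 / (1 - r_x^2)               # variance inflation factor (two-predictor case)

# Standard error of the x1 coefficient with and without the collinear x2
se_both <- summary(lm(y ~ x1 + x2))$coefficients["x1", "Std. Error"]
se_one  <- summary(lm(y ~ x1))$coefficients["x1", "Std. Error"]

cat("correlation =", round(r_x, 3), " VIF =", round(vif, 1), "\n")
cat("SE of x1: alone =", round(se_one, 3), " with x2 =", round(se_both, 3), "\n")
```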
C) In the Residuals vs Fitted Values plot, we can observe that there is no
specific pattern. With no pattern in the residuals, we can say that the relationship
is linear. So, the linearity assumption is valid.
The plot also shows that the variance of the error terms is not increasing or
decreasing with the fitted value of the response. So, the homoscedasticity
assumption is valid.
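A plot of this kind can be produced as sketched below. The data here are simulated for illustration (the case data set is not reproduced in this answer), so the variable names x and y are placeholders.

```r
set.seed(2)                                 # simulated data, not the case data
x   <- runif(300, 0, 10)
y   <- 1 + 0.5 * x + rnorm(300, sd = 2)     # linear mean, constant error variance
fit <- lm(y ~ x)

# Residuals against fitted values: look for curvature (non-linearity)
# or a funnel shape (non-constant variance)
plot(fitted(fit), resid(fit),
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs Fitted")
abline(h = 0, lty = 2)                      # reference line at zero
```

For a model already fitted with lm, plot(fit, which = 1) draws the same diagnostic.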
Question 2:
Regression Equation:
RET = -2.642159 - 2.110461 GRI + 0.005735 SAT - 0.180647 MBA - 0.068893 AGE
- 0.118722 TEN
From case
A) If Bob had attended Princeton instead of Ohio, then both managers' SAT scores would be 1355.
Regression Equation:
RET = -2.642159 - 2.110461 GRI + 0.005735 SAT - 0.180647 MBA - 0.068893 AGE
- 0.118722 TEN
From case
We can observe that the return on his current fund increases because the coefficient
of SAT is positive. But it is still less than Rockefeller's.
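The substitution into the regression equation above can be sketched as follows. The manager's characteristics used here are hypothetical, since the values from the case are not reproduced in this answer, and the AGE and TEN coefficients are taken as negative (an assumption consistent with Question 6, which describes the omitted coefficients as negative and AGE as becoming more negative).

```r
# Coefficients from the regression equation; the signs of AGE and TEN are assumed negative.
b <- c(Intercept = -2.642159, GRI = -2.110461, SAT = 0.005735,
       MBA = -0.180647, AGE = -0.068893, TEN = -0.118722)

# Predicted return for one manager
predict_ret <- function(gri, sat, mba, age, ten) {
  unname(b["Intercept"] + b["GRI"] * gri + b["SAT"] * sat +
         b["MBA"] * mba + b["AGE"] * age + b["TEN"] * ten)
}

# Hypothetical manager: only SAT changes, from a made-up 1200 to 1355
before <- predict_ret(gri = 0, sat = 1200, mba = 1, age = 40, ten = 5)
after  <- predict_ret(gri = 0, sat = 1355, mba = 1, age = 40, ten = 5)
gain   <- after - before          # equals 0.005735 * (1355 - 1200)

cat("predicted RET before =", round(before, 3),
    " after =", round(after, 3),
    " gain =", round(gain, 3), "\n")
```

Because the model is linear, the gain depends only on the SAT difference and not on the other characteristics.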
Regression Equation:
RET = -2.642159 - 2.110461 GRI + 0.005735 SAT - 0.180647 MBA - 0.068893 AGE
- 0.118722 TEN
From case
From the analysis, we can say that if he were to manage a growth fund instead of an
income fund, he would achieve at least a 1% higher return.
Question 4:
Call:
lm(formula = RET ~ MBA, data = myData)

Residuals:
    Min      1Q  Median      3Q     Max
-33.698  -4.436  -0.409   4.210  36.697

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.7238     0.5943  -1.218    0.224
MBA           0.3349     0.7545   0.444    0.657
The regression is run with MBA as the only independent variable. TEN, AGE, GRI and SAT
are left out of the model, so they are not held constant.
Since the coefficient of MBA is positive (0.3349), the estimate suggests that a manager with
an MBA outperforms one without. But the p-value (0.657) is well above 0.05 and R-squared is
also very low, which calls into question the validity of this conclusion.
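The t value and p-value in the output can be reproduced from the estimate and its standard error. This sketch assumes 538 residual degrees of freedom, that is n = 540 observations (inferred from Question 11, not stated here) minus two estimated parameters.

```r
est <- 0.3349    # MBA coefficient from the output above
se  <- 0.7545    # its standard error
df  <- 538       # assumed: n = 540 observations minus 2 estimated parameters

t_val <- est / se                     # t statistic for H0: coefficient = 0
p_val <- 2 * pt(-abs(t_val), df)      # two-sided p-value

cat("t =", round(t_val, 3), " p =", round(p_val, 3), "\n")
```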
B) From the analysis, we can observe that the coefficient of MBA is positive. If managers
without an MBA earned higher returns, the coefficient of MBA would be negative. Since that
is not the case, the estimate points to higher returns for a manager with an MBA.
Question 5:
= 100 - 10.006
= 89.994%
B) Since the coefficient of AGE is negative, returns are negatively related to age. As age
increases, managers can commit more errors, because they can be forgiven for one
or more errors. So this will make returns more negative. So, survivorship bias will
exacerbate the effect seen in Part A.
Question 6:
A) From Table 1, the p-values of MBA and TEN are both more than 15%. So, at a significance
level of 15%, both variables are removed and the regression is rerun.
Call:
lm(formula = RET ~ GRI + SAT + AGE, data = myData)

Residuals:
    Min      1Q  Median      3Q     Max
-34.199  -4.403  -0.348   4.074  35.142

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.583922   3.340350  -0.774  0.43954
GRI         -2.111005   0.738580  -2.858  0.00443 **
SAT          0.006242   0.002593   2.407  0.01642 *
AGE         -0.095959   0.036555  -2.625  0.00891 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual vs. Fitted value graph shows that both linearity and homoscedasticity
assumptions are valid.
The coefficient of AGE becomes more negative (from -0.068893 to -0.095959) because we
are removing the MBA and TEN variables, which had negative coefficients. AGE is positively
related to TEN (see Question 10), so once TEN is dropped, the coefficient of AGE absorbs
part of its negative effect.
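The mechanism can be sketched with a simulation. The data below are made up (not the case data): tenure rises with age, both have negative true effects on returns, and dropping tenure pushes the age coefficient further below zero by exactly the amount the omitted-variable identity predicts.

```r
set.seed(42)                                     # simulated data, not the case data
n   <- 540
age <- rnorm(n, mean = 45, sd = 9)
ten <- 5 + 0.3 * (age - 45) + rnorm(n, sd = 2)   # tenure rises with age
ret <- -0.07 * age - 0.12 * ten + rnorm(n, sd = 0.5)   # both true effects negative

full  <- coef(lm(ret ~ age + ten))   # model with tenure
short <- coef(lm(ret ~ age))         # model with tenure dropped
delta <- coef(lm(ten ~ age))["age"]  # slope of tenure on age

# Omitted-variable identity: short AGE slope = full AGE slope + TEN slope * delta
implied <- full["age"] + full["ten"] * delta

cat("AGE slope, full model  =", round(full["age"], 4), "\n")
cat("AGE slope, TEN dropped =", round(short["age"], 4), "\n")
cat("implied by identity    =", round(implied, 4), "\n")
```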
Question 7:
Call:
lm(formula = RET ~ GRI, data = myData)

Residuals:
    Min      1Q  Median      3Q     Max
-34.818  -4.489  -0.043   4.265  35.578

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.3959     0.4664   0.849  0.39634
GRI          -2.3119     0.7427  -3.113  0.00195 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Regression Equation:
RET = 0.3959 - 2.3119 GRI
Based on the graph of Residuals vs Fitted Values, both the linearity and
homoscedasticity assumptions are valid.
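With a single indicator as the regressor, the regression equation above gives just two predicted returns, one per fund type. The sketch assumes GRI is a 0/1 indicator for fund type; which fund type is coded as 1 is not stated in this answer.

```r
b0 <- 0.3959     # intercept
b1 <- -2.3119    # GRI coefficient

ret_gri0 <- b0 + b1 * 0   # predicted return when GRI = 0
ret_gri1 <- b0 + b1 * 1   # predicted return when GRI = 1
gap      <- ret_gri1 - ret_gri0   # difference between the two groups

cat("RET (GRI = 0) =", ret_gri0,
    " RET (GRI = 1) =", ret_gri1,
    " gap =", gap, "\n")
```

The gap between the two groups equals the GRI coefficient, which is why its t-test is also a test of the difference in mean returns.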
Question 8:
Call:
lm(formula = RET ~ SAT + GRI + TEN, data = myData)

Residuals:
    Min      1Q  Median      3Q     Max
-34.180  -4.482  -0.298   4.223  35.667

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.863173   3.065520  -1.586  0.11324
SAT          0.005175   0.002614   1.979  0.04829 *
GRI         -2.209624   0.736785  -2.999  0.00283 **
TEN         -0.183201   0.073413  -2.495  0.01288 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Regression Line:
RET = -4.863173 + 0.005175 SAT - 2.209624 GRI - 0.183201 TEN
Substituting we get
B) A larger sample will reduce the standard error, so the t-statistic will increase and my
confidence that RET > 0 will increase.
C) A larger sample will reduce the standard error, so the t-statistic will increase and my
confidence that RET > 1.5% will increase.
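The effect of sample size on the standard error can be sketched numerically. All numbers below are hypothetical and chosen only for illustration, and the estimate itself is assumed to stay the same when the sample grows.

```r
# Hypothetical numbers for illustration; none are taken from the case.
est    <- 2.0     # estimated mean return (%)
sd_res <- 8.4     # residual standard deviation (%)
n_old  <- 100
n_new  <- 400

se_old <- sd_res / sqrt(n_old)
se_new <- sd_res / sqrt(n_new)       # four times the data halves the standard error

t_old_0  <- est / se_old             # test of RET > 0
t_new_0  <- est / se_new
t_old_15 <- (est - 1.5) / se_old     # test of RET > 1.5%
t_new_15 <- (est - 1.5) / se_new

cat("SE:", se_old, "->", se_new, "\n")
cat("t for RET > 0:", round(t_old_0, 2), "->", round(t_new_0, 2), "\n")
cat("t for RET > 1.5%:", round(t_old_15, 2), "->", round(t_new_15, 2), "\n")
```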
Question 10:
Regressing AGE on GRI, SAT, MBA and TEN, we get the following results:
Call:
lm(formula = AGE ~ GRI + SAT + MBA + TEN, data = myData)

Residuals:
     Min       1Q   Median       3Q      Max
-16.2663  -7.1134  -0.1337   5.9529  25.7561

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 32.186084   3.167679  10.161  < 2e-16 ***
GRI          1.424150   0.761391   1.870  0.06197 .
SAT          0.007759   0.002729   2.843  0.00464 **
MBA         -1.879255   0.778033  -2.415  0.01605 *
TEN          0.942057   0.076118  12.376  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Here, although the GRI coefficient is positive, its p-value is 6.197%, which is more than 5%.
So we cannot say that the coefficient differs from zero at the 5 percent level of significance.
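The same conclusion can be read from a 95% confidence interval for the GRI coefficient. This sketch assumes 535 residual degrees of freedom, that is n = 540 observations (inferred from Question 11) minus five estimated parameters.

```r
est <- 1.424150   # GRI coefficient from the output above
se  <- 0.761391   # its standard error
df  <- 535        # assumed: n = 540 observations minus 5 estimated parameters

t_crit <- qt(0.975, df)                        # two-sided 5% critical value
ci <- c(lower = est - t_crit * se,
        upper = est + t_crit * se)             # 95% confidence interval

print(round(ci, 4))
```

The interval includes zero, which matches a two-sided p-value above 5%.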
Question 11:
Taking fund type (GRI) and tenure (TEN) out of the equation, we get the following regression results:
Call:
lm(formula = RET ~ SAT + MBA + AGE, data = myData)

Residuals:
    Min      1Q  Median      3Q     Max
-33.469  -4.666  -0.145   3.858  35.757

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.152470   3.364309  -0.937  0.34916
SAT          0.006528   0.002646   2.468  0.01392 *
MBA         -0.216997   0.761813  -0.285  0.77587
AGE         -0.106347   0.037002  -2.874  0.00421 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.418 on 536 degrees of freedom
Multiple R-squared: 0.02516, Adjusted R-squared: 0.01971
F-statistic: 4.612 on 3 and 536 DF, p-value: 0.003385
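The summary lines of this output are enough to recover the sample size and to check the adjusted R-squared. The sketch uses only the figures printed above; small differences come from rounding in the printed R-squared.

```r
r2       <- 0.02516   # Multiple R-squared
df_resid <- 536       # residual degrees of freedom
k        <- 3         # predictors: SAT, MBA, AGE

n      <- df_resid + k + 1                        # sample size implied by the output
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid       # adjusted R-squared

cat("n =", n, " adjusted R-squared =", round(adj_r2, 5), "\n")
```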
Regression Equation:
RET = -3.152470 + 0.006528 SAT - 0.216997 MBA - 0.106347 AGE
From case