EMBA
Session: 2010-2012
Managerial Report
1. Simple linear regression models
1.1. Annual amount charged as a function of annual income
Scatter plot
Since the p-value = 9.01 × 10^-7 < 5%, the linear relationship is accepted (meaning that there is
sufficient evidence to affirm at a 95% confidence level that the Annual Income affects the Annual Amount
Charged).
• Measures of errors
○ R-square
R² = 0.3981. Though acceptable, the linear model accounts for only 39.81% of the variation in the
Annual Amount Charged.
○ Standard error
S=731.731. This is a relatively high standard error as it represents about 18% of the mean Annual
Amount Charged.
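As a rough illustration of how these two measures are obtained, the sketch below fits a simple least-squares line and computes R² = 1 - SSE/SST and the standard error S = sqrt(SSE / (n - 2)). The function name `simple_linear_fit` and the five data points are hypothetical and are not the survey data behind this report.

```python
import math

def simple_linear_fit(x, y):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1, r_squared, std_error)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # slope
    b0 = mean_y - b1 * mean_x           # intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained variation
    sst = sum((yi - mean_y) ** 2 for yi in y)                      # total variation
    r_squared = 1 - sse / sst
    std_error = math.sqrt(sse / (n - 2))  # n - 2 degrees of freedom for one predictor
    return b0, b1, r_squared, std_error

# Hypothetical toy data (NOT the data used in this report).
income = [20, 30, 40, 50, 60]
charged = [2500, 3100, 3300, 4200, 4400]

b0, b1, r2, s = simple_linear_fit(income, charged)
mean_charged = sum(charged) / len(charged)
print(f"b0={b0:.1f}, b1={b1:.2f}, R2={r2:.4f}, S={s:.2f}, S/mean={s / mean_charged:.1%}")
```

The last printed figure expresses the standard error as a share of the mean of the dependent variable, which is the comparison used in this report to judge whether S is high.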
• Graphical analysis of residuals
○ Linearity, independence and equal variance assumptions could be checked graphically on the plot
of residuals:
○ Normality assumption could be checked visually through the normal probability plot of residuals:
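For readers without the plot at hand, the coordinates behind a normal probability plot can be sketched as follows. This is a minimal illustration, assuming the plotting positions (i - 0.5) / n (one common convention among several) and a hypothetical list of residuals. If the residuals are roughly normal, the printed pairs fall close to a straight line.

```python
from statistics import NormalDist

def normal_probability_points(residuals):
    """Pair each sorted residual with its theoretical standard-normal quantile."""
    n = len(residuals)
    ordered = sorted(residuals)
    # Plotting positions (i - 0.5) / n are an assumption; other conventions exist.
    quantiles = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, ordered))

# Hypothetical residuals, for illustration only (NOT taken from this report).
residuals = [-20.0, 90.0, -200.0, 210.0, -80.0]

points = normal_probability_points(residuals)
for z, r in points:
    print(f"{z:+.3f}  {r:+.1f}")
```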
AUI
Validity of the model
• Inference about the slope
Since the p-value = 2.86 × 10^-10 < 5%, the linear relationship is accepted (meaning that
there is sufficient evidence to state at a 5% significance level that Household Size affects the Annual
Amount Charged).
• Measures of errors
○ R-square
R² = 0.5667.
The linear model accounts for 56.67% of the variation in the Annual Amount Charged. Household Size
is a better predictor of the Annual Amount Charged than Annual Income.
○ Standard error
S=620.793. Though lower than the standard error of the previous model, this standard error is still
relatively high as it represents about 15% of the mean Annual Amount Charged.
• Graphical analysis of residuals
○ Linearity, independence and equal variance assumptions could be checked graphically on the
residuals plot:
○ Normality assumption could also be checked through the normal probability plot of residuals:
Intercept 3.28664E-08
The linear relationship is thus accepted (meaning that there is sufficient evidence to state at a 95%
confidence level that Annual Income and Household Size collectively affect the Annual Amount
Charged).
• Measures of errors
R-square 0.825561086
The adjusted R-square of the fit is relatively high (> 80%) and is notably higher than any of the R-squares
obtained for the previous simple regression models.
In combination, the Annual Income and the Household Size explain 81.81% of the variation in the
Annual Amount Charged, taking into account the number of variables and the sample size. Thus, when
combined, these two independent variables bring more explanatory power than when taken in
isolation.
The standard error of the model, though it has decreased, is still high (about 10% of the mean).
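The 81.81% figure can be reproduced from the reported R-square with the usual adjustment formula, adjusted R² = 1 - (1 - R²)(n - 1) / (n - k - 1). The sketch below assumes n = 50 observations and k = 2 independent variables, which are the values used in this report's degrees-of-freedom calculations. The function name is illustrative.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjust R-square for sample size n and number of predictors k."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

r2 = 0.825561086  # R-square reported for the multiple regression model
n, k = 50, 2      # sample size and number of independent variables in this report

adj = adjusted_r_squared(r2, n, k)
print(f"Adjusted R-square: {adj:.4f}")
```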
• Graphical analysis of residuals
○ Linearity, independence and equal variance assumptions could be checked graphically on the
residuals plot:
○ Normality assumption could be checked through the normal probability plot of residuals:
F = MSR / MSE = 111.21
• Significance level:
α = 5%
• Degrees of freedom:
df1 = k = 2
df2 = n - k - 1 = 50 - 2 - 1 = 47
• F-test critical value:
FCV = FINV(5%, 2, 47) = 3.195
• Conclusion - Since the F-test statistic is in the rejection region (being far greater than the F critical
value), we reject the null hypothesis and conclude that there is enough statistical evidence to state at a 95%
confidence level that at least one independent variable affects the Annual Amount Charged.
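As a cross-check, the F-test statistic can be recovered from the reported R-square, since F = MSR / MSE = (R² / k) / ((1 - R²) / (n - k - 1)). The sketch below is an illustration, not the spreadsheet calculation used in the report. It takes the critical value 3.195 from the report rather than recomputing it.

```python
def f_statistic(r_squared, n, k):
    """Overall F statistic expressed through R-square, sample size n and k predictors."""
    return (r_squared / k) / ((1 - r_squared) / (n - k - 1))

r2 = 0.825561086    # R-square reported for the multiple regression model
n, k = 50, 2        # sample size and number of independent variables
f_critical = 3.195  # critical value at alpha = 5% with (2, 47) degrees of freedom, as reported

f_stat = f_statistic(r2, n, k)
reject_h0 = f_stat > f_critical
print(f"F = {f_stat:.2f}, reject H0: {reject_h0}")
```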
Individual significance of the independent variables
• T-test for the annual income variable
○ H0: “β1 = 0” (Annual Income does not affect the Annual Amount Charged)
○ H1: “β1 ≠ 0” (Annual Income affects the Annual Amount Charged).
○ T-test statistic:
T = b1 / Sb1 = 33.133 / 3.96 = 8.350
○ Significance level:
α = 5%
○ Degrees of freedom:
df = n - k - 1 = 50 - 2 - 1 = 47
○ T-test critical value:
TCV = TINV(5%, 47) = 2.01
○ Conclusion - Since the T-test statistic is in the rejection region (being greater than the T critical
value), we reject the null hypothesis and conclude that there is enough statistical evidence to state at a
95% confidence level that the Annual Income affects the Annual Amount Charged.
○ T-test statistic:
T = b2 / Sb2 = 356.295 / 33.20 = 10.73
○ Significance level:
α = 5%
○ Degrees of freedom:
df = n - k - 1 = 50 - 2 - 1 = 47
○ T-test critical value:
TCV = TINV(5%, 47) = 2.01
○ Conclusion - Since the T-test statistic is in the rejection region (being greater than the T critical
value), we reject the null hypothesis and conclude that there is enough statistical evidence to state at a
95% confidence level that the Household Size affects the Annual Amount Charged.
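The decision rule for an individual coefficient can be sketched in a few lines. The example uses the Household Size figures reported above, reading the coefficient as 356.295 and its standard error as 33.20, together with the reported two-tailed critical value of 2.01. The 95% confidence interval for the slope is an addition for illustration, not a figure taken from the report.

```python
def t_statistic(coefficient, std_error):
    """T statistic for H0: coefficient = 0."""
    return coefficient / std_error

b2, s_b2 = 356.295, 33.20  # Household Size coefficient and its standard error, as reported
t_critical = 2.01          # two-tailed critical value at alpha = 5%, df = 47, as reported

t_stat = t_statistic(b2, s_b2)
reject_h0 = abs(t_stat) > t_critical

# 95% confidence interval for the slope: b2 +/- t_critical * s_b2
margin = t_critical * s_b2
ci = (b2 - margin, b2 + margin)

print(f"T = {t_stat:.2f}, reject H0: {reject_h0}")
print(f"95% CI for b2: ({ci[0]:.1f}, {ci[1]:.1f})")
```

An interval that excludes zero leads to the same conclusion as the T-test: the coefficient is significantly different from zero at the 5% level.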
X3 = X1 × X2
With:
X1: Annual Income.
Y = b0 + b1·X1 + b2·X2 + b3·X3
The p-value for the slope test corresponding to the third variable is high (above the 5% significance level). We thus
fail to reject the null hypothesis and conclude that there is not enough statistical evidence, at a 5% significance
level, of an interaction between the two explanatory variables.
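To make the construction of the interaction model concrete, the sketch below builds the product term X3 = X1 × X2 and estimates b0 to b3 by ordinary least squares through the normal equations. It is a minimal pure-Python illustration: the helper names are hypothetical, and the data are generated from an assumed model without interaction (Y = 1000 + 30·X1 + 350·X2), not from the survey behind this report. A full analysis would also need the standard error of b3 to run its T-test, which this sketch does not compute.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_with_interaction(x1, x2, y):
    """Least-squares estimates of Y = b0 + b1*X1 + b2*X2 + b3*(X1*X2)."""
    rows = [[1.0, a, b, a * b] for a, b in zip(x1, x2)]  # last column is X3 = X1 * X2
    p = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)

# Hypothetical data generated WITHOUT interaction (NOT the data used in this report).
x1 = [20, 30, 40, 50, 60, 25, 45, 55]  # e.g. annual income
x2 = [1, 3, 2, 5, 4, 6, 2, 3]          # e.g. household size
y = [1000 + 30 * a + 350 * b for a, b in zip(x1, x2)]

b0, b1, b2, b3 = fit_with_interaction(x1, x2, y)
print(f"b0={b0:.2f}, b1={b1:.2f}, b2={b2:.2f}, b3={b3:.4f}")
```

Because the toy data contain no interaction, the estimated b3 is essentially zero. With real data, b3 would be judged through its p-value, as done above.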
(1) Cash flow – This variable would help capture the common situation of people who have a high income
but limited access to it, since most of it goes to paying bills and loans. This variable could be more
difficult to gather, but it would likely add predictive power to the model and should be largely
independent of the variables already in the model.
(2) Percentage of females in household – This variable would help refine the information about the internal
structure of households. It is expected to add to the predictive power of the model, based on the common belief
that females like shopping (particularly by card) more than males. This variable is easy to collect.
(3) Purchase preference: cash or credit – People in certain households may have a greater propensity to
spend by cash than by credit card. This dummy variable could add some predictive power to the model,
although it might overlap (be correlated) with variable (2).