Advanced Quantitative Methods & Statistics: Managerial Report

AUI
EMBA
Session: 2010-2012
Advanced Quantitative Methods & Statistics

Case Study 2
Rachid ZAÏR
zair@one.ma
Managerial Report
1. Simple linear regression models
1.1. Annual amount charged as a function of annual income
➢ Scatter plot
➢ Simple regression equation

Annual Amount Charged = 2203.999 + 40.479 × Annual Income ($1000s)
➢ Validity of the model

• Inference about the slope
Since the p-value (9.01 × 10⁻⁷) is below 5%, the linear relationship is accepted: there is sufficient evidence to affirm at the 95% confidence level that Annual Income affects the Annual Amount Charged.
• Measures of fit
○ R-square
R² = 0.3981. Though acceptable, the linear model accounts for only 39.81% of the variation in the Annual Amount Charged.
○ Standard error
S = 731.731. This is a relatively high standard error, representing about 18% of the mean Annual Amount Charged.
• Graphical analysis of residuals
○ The linearity, independence and equal-variance assumptions can be checked graphically on the plot of residuals.
○ The normality assumption can be checked visually through the normal probability plot of residuals.
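The figures above (slope, R², S and the slope test) all come from a least-squares fit. The following is a minimal sketch of what that regression output computes. It is not the author's actual tool. Because the survey data are not reproduced in this report, the usage line runs on small made-up numbers that are for illustration only.

```python
import math

def simple_ols(x, y):
    """Least-squares fit of y = b0 + b1*x with the fit statistics discussed above."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    b1 = sxy / sxx                      # slope
    b0 = y_bar - b1 * x_bar             # intercept
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    r2 = 1 - sse / sst                  # share of the variation in y explained by x
    s = math.sqrt(sse / (n - 2))        # standard error of the estimate
    s_b1 = s / math.sqrt(sxx)           # standard error of the slope
    # A perfect fit has s_b1 = 0, so the t statistic is reported as infinite
    t_b1 = b1 / s_b1 if s_b1 > 0 else math.copysign(math.inf, b1)
    return {
        "b0": b0, "b1": b1, "r2": r2, "s": s,
        "s_pct_of_mean": 100 * s / y_bar,  # S relative to the mean of y, as in the report
        "t_b1": t_b1, "residuals": residuals,
    }

# Made-up numbers, for illustration only (not the survey data)
fit = simple_ols([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(f"b0={fit['b0']:.3f}  b1={fit['b1']:.3f}  R2={fit['r2']:.4f}  S={fit['s']:.3f}  t={fit['t_b1']:.2f}")
```

The residuals list returned here is the input for the residual plot and the normal probability plot mentioned above.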
1.2. Annual amount charged as a function of household size

➢ Scatter plot
➢ Simple regression equation

Annual Amount Charged = 2581.941 + 404.128 × Household Size
➢ Validity of the model
• Inference about the slope
Since the p-value (2.86 × 10⁻¹⁰) is below 5%, the linear relationship is accepted: there is sufficient evidence to state at the 5% significance level that Household Size affects the Annual Amount Charged.
• Measures of fit
○ R-square
R² = 0.5667. The linear model accounts for 56.67% of the variation in the Annual Amount Charged, so Household Size is a better predictor of the Annual Amount Charged than Annual Income.
○ Standard error
S = 620.793. Though lower than the standard error of the previous model, this standard error is still relatively high, representing about 15% of the mean Annual Amount Charged.
• Graphical analysis of residuals
○ The linearity, independence and equal-variance assumptions can be checked graphically on the residual plot.
○ The normality assumption can also be checked through the normal probability plot of residuals.
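In a simple regression, the slope t-test and R-square carry the same information. The identity t² = R²(n − 2)/(1 − R²) links them. The sketch below uses this identity to recover the magnitude of the slope t-statistic for both simple models from the reported R-square values and n = 50. The cut-off of 2.01 is the approximate two-tailed 5% critical value used elsewhere in the report. With 48 degrees of freedom instead of 47, it still rounds to 2.01.

```python
import math

N = 50             # sample size used throughout the report
T_CRITICAL = 2.01  # approximate two-tailed 5% critical value (df = n - 2 = 48 here)

def slope_t_from_r2(r2, n):
    """Magnitude of the slope t-statistic of a simple regression: t^2 = R^2 (n - 2) / (1 - R^2)."""
    return math.sqrt(r2 * (n - 2) / (1 - r2))

for name, r2 in [("Annual Income", 0.3981), ("Household Size", 0.5667)]:
    t = slope_t_from_r2(r2, N)
    # Only |t| is recovered; both fitted slopes are positive in the report
    print(f"{name}: |t| = {t:.2f}, significant at 5%: {t > T_CRITICAL}")
```

Both statistics land far above 2.01. This is consistent with the very small p-values reported for the two slopes.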
2. Multiple linear regression model

➢ Multiple regression equation
Annual Amount Charged = 1304.904 + 33.133 × Annual Income ($1000s) + 356.295 × Household Size

➢ Validity of the model
• Inference about the slopes
The p-values for the intercept and for the slopes of both independent variables (Annual Income and Household Size) are all well below the significance level, as this excerpt shows:
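The fitted equation can be applied directly to a household profile. The sketch below simply encodes the equation above. The household values in the usage line ($40,000 income, 3 members) are hypothetical. A prediction is only meaningful within the range of the sample data, which is not reproduced here.

```python
def predict_amount_charged(income_thousands, household_size):
    """Annual amount charged ($) from the fitted multiple regression equation above."""
    return 1304.904 + 33.133 * income_thousands + 356.295 * household_size

# Hypothetical household: $40,000 income, 3 members
print(round(predict_amount_charged(40, 3), 3))  # 3699.109
```

If household size is held fixed, each extra $1,000 of income adds about $33.13 to the prediction. If income is held fixed, each extra household member adds about $356.30.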
                    p-value
Intercept           3.28664E-08
Income ($1000s)     7.68206E-11
Household Size      3.12342E-14
The linear relationship is thus accepted: there is sufficient evidence to state at the 95% confidence level that Annual Income and Household Size each affect the Annual Amount Charged.
R-square            0.825561086
Adjusted R-square   0.818138154
Standard error      398.0910071
The adjusted R-square of the fit is relatively high (above 80%) and notably higher than either of the R-square values obtained for the two simple regression models.
In combination, Annual Income and Household Size explain 81.81% of the variation in the Annual Amount Charged, after adjusting for the number of variables and the sample size. Taken together, these two independent variables therefore carry more explanatory power than either does in isolation.
The standard error of the model, though lower, is still high (about 10% of the mean).
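The adjusted R-square quoted above follows from the plain R-square via R²adj = 1 − (1 − R²)(n − 1)/(n − k − 1). The quick check below uses the reported R² = 0.825561086, n = 50 and k = 2. It is a sketch of the standard formula, not a reproduction of the software used in the report.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-square: penalizes R-square for the number of predictors k given n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

adj = adjusted_r2(0.825561086, n=50, k=2)
print(f"{adj:.6f}")  # 0.818138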
• Graphical analysis of residuals
○ The linearity, independence and equal-variance assumptions can be checked graphically on the residual plot.
○ The normality assumption can be checked through the normal probability plot of residuals.
3. Testing the existence of a linear relationship in the multiple regression model

➢ F-test for the overall significance of the model
• H0: “β1 = β2 = 0” (no linear relationship)
• H1: “At least one βi ≠ 0” (at least one independent variable affects the Annual Amount Charged).
• F-test statistic:
$F = \frac{MSR}{MSE} = 111.21$
• Significance level:
α = 5%
• Degrees of freedom:
$df_1 = k = 2$
$df_2 = n - k - 1 = 50 - 2 - 1 = 47$
• F-test critical value:
$F_{CV} = \mathrm{FINV}(5\%, 2, 47) = 3.195$
• Conclusion: since the F-test statistic lies in the rejection region (it is far greater than the F critical value), we reject the null hypothesis. There is enough statistical evidence to state at the 95% confidence level that at least one independent variable affects the Annual Amount Charged.
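Dividing MSR and MSE by the total sum of squares lets the overall F statistic be written in terms of R-square: F = (R²/k) / ((1 − R²)/(n − k − 1)). The sketch below recomputes F from the reported R-square and compares it with the critical value 3.195 quoted above. It assumes the same n = 50 and k = 2.

```python
def overall_f(r2, n, k):
    """Overall F statistic (MSR / MSE) expressed through R-square."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

F_CRITICAL = 3.195  # FINV(5%, 2, 47) as quoted in the report
f_stat = overall_f(0.825561086, n=50, k=2)
reject_h0 = f_stat > F_CRITICAL  # right-tailed test: reject H0 when F exceeds the critical value
print(f"F = {f_stat:.2f}, reject H0: {reject_h0}")
```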
➢ Individual significance of the independent variables
• T-test for the annual income variable
○ H0: “β1 = 0” (Annual Income does not affect the Annual Amount Charged)
○ H1: “β1 ≠ 0” (Annual Income affects the Annual Amount Charged).
○ T-test statistic:
$T = \frac{b_1}{S_{b_1}} = \frac{33.133}{3.97} \approx 8.35$
○ Significance level:
α = 5%
○ Degrees of freedom:
$df = n - k - 1 = 50 - 2 - 1 = 47$
○ T-test critical value:
$T_{CV} = \mathrm{TINV}(5\%, 47) = 2.01$
○ Conclusion - Since the T-test statistic is in the rejection region (being greater than the T critical
value), we reject the null hypothesis and conclude that there is enough statistical evidence to state with
95% confidence level that the Annual Income affects the Annual Amount Charged.
• T-test for the household size variable

○ H0: “β2 = 0” (Household Size does not affect the Annual Amount Charged)
○ H1: “β2 ≠ 0” (Household Size affects the Annual Amount Charged).
○ T-test statistic:
$T = \frac{b_2}{S_{b_2}} = \frac{356.295}{33.20} \approx 10.73$
○ Significance level:
α = 5%
○ Degrees of freedom:
$df = n - k - 1 = 50 - 2 - 1 = 47$
○ T-test critical value:
$T_{CV} = \mathrm{TINV}(5\%, 47) = 2.01$
○ Conclusion - Since the T-test statistic is in the rejection region (being greater than the T critical
value), we reject the null hypothesis and conclude that there is enough statistical evidence to state with
95% confidence level that the Household Size affects the Annual Amount Charged.
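The two slope tests above follow the same recipe: divide the coefficient by its standard error and compare |T| with the two-tailed critical value. The sketch below encodes that recipe. For Annual Income, it uses a standard error of about 3.97, which is the value consistent with the reported T = 8.35 and b1 = 33.133. This is an inference from those two figures, not a value read from the original output.

```python
T_CRITICAL = 2.01  # TINV(5%, 47) as quoted in the report (two-tailed)

def slope_t_test(b, s_b, t_critical=T_CRITICAL):
    """Two-tailed t-test of H0: beta = 0. Returns the t statistic and whether H0 is rejected."""
    t = b / s_b
    return t, abs(t) > t_critical

tests = {
    "Annual Income": slope_t_test(33.133, 3.97),    # S_b1 of about 3.97, consistent with T = 8.35
    "Household Size": slope_t_test(356.295, 33.20),
}
for name, (t, reject) in tests.items():
    print(f"{name}: T = {t:.2f}, reject H0: {reject}")
```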
4. Checking for the existence of interaction between the two explanatory variables
To check for interaction between the two independent variables, we fit a multiple regression model that includes a third variable constructed as follows:
$X_3 = X_1 \times X_2$
With:
X1: Annual Income.
X2: Household Size.
The new model is written as follows:
$Y = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3$
T-test null and alternative hypotheses:

• H0: “β3 = 0” (There is no interaction between Annual Income and Household Size).
• H1: “β3 ≠ 0” (There is interaction between Annual Income and Household Size).
Slope test results for the new model:
                                       Coefficient      P-value
Intercept                              1301.482189      0.00466269
X1: Income ($1000s)                    33.2175634       0.002577635
X2: Household Size                     357.281731       0.003762056
X3: Annual Income × Household Size     -0.023698297     0.993022952
The p-value of the slope test for the third variable (0.993) is very high, so we fail to reject the null hypothesis. At the 5% significance level, there is not enough statistical evidence to conclude that the two explanatory variables interact.
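The sketch below shows how the interaction model uses X3. The regressor row gains the product X1 × X2, and the prediction adds b3 times that product. The coefficients are copied from the slope-test table above. The household ($40,000 income, 3 members) is hypothetical and is used only to show how little the fitted interaction term moves the prediction.

```python
# Coefficients of the interaction model, copied from the slope-test table above
B0, B1, B2, B3 = 1301.482189, 33.2175634, 357.281731, -0.023698297

def design_row(income_thousands, household_size):
    """Regressors for one household: intercept, X1, X2 and the interaction X3 = X1 * X2."""
    return [1.0, income_thousands, household_size, income_thousands * household_size]

def predict_with_interaction(income_thousands, household_size):
    """Y = b0 + b1*X1 + b2*X2 + b3*X3 evaluated for one household."""
    row = design_row(income_thousands, household_size)
    return sum(b * x for b, x in zip((B0, B1, B2, B3), row))

# Hypothetical household: $40,000 income, 3 members, so X3 = 120
row = design_row(40, 3)
print(f"X3 contribution: {B3 * row[3]:.2f} $")  # -2.84 $
print(f"prediction: {predict_with_interaction(40, 3):.2f} $")
```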
5. Need for additional explanatory variables

Across the three linear models constructed above, the coefficient of determination kept increasing while the standard error kept decreasing, showing a steady improvement in the quality of fit. However, the standard error of the third (multiple regression) model remained high, at about 10% of the mean of the predicted variable. This suggests searching for additional independent variables to include in the model.
Any suggested predictor variables need to be relevant and independent of those already in the model. This both improves the overall quality of the fit and avoids multicollinearity. Here are some variables I can think of:
(1) Cash flow – This variable would help capture the common situation of people who have a high income but limited access to it, since much of it goes to paying bills and loans. It could be harder to collect, but it should add predictive power to the model while remaining fairly independent of the variables already included.
(2) Percentage of females in household – This variable would refine the information about the internal structure of households. It could add predictive power if the common belief that women shop (particularly by card) more than men holds. This variable is easy to collect.
(3) Purchase preference: cash or credit – People in some households may be more inclined to pay in cash than by credit card. This dummy variable could add some predictive power to the model, although it might overlap (be correlated) with variable (2).
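To check whether a candidate predictor overlaps with another, one standard screen is the variance inflation factor (VIF). This diagnostic is not used in the report. With exactly two predictors, VIF = 1/(1 − r²), where r is their correlation. The sketch below encodes the cash/credit preference as a 0/1 dummy and computes its correlation and VIF with the share of females. All six households are invented for illustration.

```python
import math
from statistics import mean

def pearson_r(a, b):
    """Sample Pearson correlation between two equal-length sequences."""
    a_bar, b_bar = mean(a), mean(b)
    cov = sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))
    var_a = sum((x - a_bar) ** 2 for x in a)
    var_b = sum((y - b_bar) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def vif_pair(a, b):
    """Variance inflation factor for a model with exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(a, b)
    return math.inf if abs(r) == 1 else 1 / (1 - r ** 2)

# Hypothetical data for six households, invented for illustration only
pct_female = [0.50, 0.25, 0.67, 0.00, 0.50, 0.75]
preference = ["credit", "cash", "credit", "cash", "cash", "credit"]
prefers_credit = [1 if p == "credit" else 0 for p in preference]  # dummy variable for (3)

print(f"r = {pearson_r(pct_female, prefers_credit):.2f}, VIF = {vif_pair(pct_female, prefers_credit):.2f}")
```

A commonly quoted rule of thumb treats a VIF above 5 (or, more leniently, 10) as a sign of troublesome multicollinearity. With more than two predictors, the VIF is computed instead from the R-square of regressing each predictor on all the others.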
