Krithik Jain
Descriptives
                                                            Statistic     Std. Error
Income            Mean                                      43.48         2.058
                  99% Confidence Interval   Lower Bound     37.97
                  for Mean                  Upper Bound     48.99
                  5% Trimmed Mean                           43.42
                  Median                                    42.00
                  Variance                                  211.724
                  Std. Deviation                            14.551
                  Minimum                                   21
                  Maximum                                   67
                  Range                                     46
                  Interquartile Range                       25
                  Skewness                                  .096          .337
                  Kurtosis                                  -1.248        .662
Household Size    Mean                                      3.42          .246
                  99% Confidence Interval   Lower Bound     2.76
                  for Mean                  Upper Bound     4.08
                  5% Trimmed Mean                           3.36
                  Median                                    3.00
                  Variance                                  3.024
                  Std. Deviation                            1.739
                  Minimum                                   1
                  Maximum                                   7
                  Range                                     6
                  Interquartile Range                       3
                  Skewness                                  .528          .337
                  Kurtosis                                  -.723         .662
Amount Charged    Mean                                      3964.06       132.016
                  99% Confidence Interval   Lower Bound     3610.26
                  for Mean                  Upper Bound     4317.86
                  5% Trimmed Mean                           3971.48
                  Median                                    4090.00
                  Variance                                  871411.200
                  Std. Deviation                            933.494
                  Minimum                                   1864
                  Maximum                                   5678
                  Range                                     3814
                  Interquartile Range                       1638
                  Skewness                                  -.130         .337
                  Kurtosis                                  -.742         .662
Findings:
Income (recorded in thousands of dollars):
The average income is $43,480.
The median is $42,000.
The variance is 211.724 (in squared thousands of dollars, so it is not a dollar amount).
The standard deviation is $14,551.
The minimum income is $21,000 and the maximum income is $67,000.
The range is $46,000.
The IQR is $25,000.
Household Size:
The average household size is 3.42.
The median is 3.00.
The variance is 3.024.
The standard deviation is 1.739.
The minimum household size is 1 and the maximum household size is 7.
The range for household sizes is 6.
The IQR is 3.
Amount Charged:
The average amount charged is $3,964.06.
The median is $4,090.
The variance is 871,411.200 (in squared dollars).
The standard deviation is $933.49.
The minimum amount charged is $1,864 and the maximum amount charged is $5,678.
The range for amount charged is $3,814.
The IQR is $1,638.
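The standard errors and 99% confidence intervals in the Descriptives table can be cross-checked from the reported means and standard deviations. The sketch below is only an illustration: it assumes n = 50 (the sample size stated in the conclusions) and hardcodes an approximate t critical value of 2.680 for 49 degrees of freedom, taken from a t table rather than from the SPSS output. The function names are hypothetical.

```python
import math

N = 50             # sample size stated in the report
T_CRIT_99 = 2.680  # ASSUMPTION: approximate two-sided 99% t critical value, df = 49

# (mean, standard deviation) as reported in the Descriptives table
reported = {
    "Income": (43.48, 14.551),
    "Household Size": (3.42, 1.739),
    "Amount Charged": (3964.06, 933.494),
}


def standard_error(sd, n=N):
    """Standard error of the mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)


def confidence_interval(mean, sd, n=N, t_crit=T_CRIT_99):
    """Two-sided confidence interval for the mean: mean +/- t * SE."""
    margin = t_crit * standard_error(sd, n)
    return mean - margin, mean + margin


for name, (mean, sd) in reported.items():
    low, high = confidence_interval(mean, sd)
    print(f"{name}: SE = {standard_error(sd):.3f}, 99% CI = ({low:.2f}, {high:.2f})")
```

Up to rounding, the results should agree with the Std. Error, Lower Bound and Upper Bound entries in the table.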
Percentiles
                                       Percentiles
                                       5         10        25        50        75        90        95
Weighted Average   Income              21.55     23.20     30.00     42.00     55.00     64.90     66.45
(Definition 1)     Household Size      1.00      1.10      2.00      3.00      5.00      6.00      7.00
                   Amount Charged      2463.95   2597.80   3109.25   4090.00   4747.50   5285.80   5461.35
Tukey's Hinges     Income                                  30.00     42.00     55.00
                   Household Size                          2.00      3.00      5.00
                   Amount Charged                          3121.00   4090.00   4742.00
Income Stem-and-Leaf Plot
Frequency    Stem &  Leaf
     5.00       2 .  11223
     5.00       2 .  56779
     7.00       3 .  0001234
     5.00       3 .  57799
     5.00       4 .  01224
     3.00       4 .  688
     7.00       5 .  0012444
     3.00       5 .  555
     5.00       6 .  12234
     5.00       6 .  56677
Stem width: 10
Each leaf: 1 case(s)
Income: Q1 = $30,000, Q2 (median) = $42,000, Q3 = $55,000, min = $21,000, max = $67,000
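The five-number summary for Income can be used to check for outliers. The sketch below applies the common 1.5 x IQR rule (Tukey's fences). That rule is an assumption here, since the report does not say which outlier rule it used, and the function name is hypothetical.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return the (lower, upper) outlier fences: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Income values in thousands of dollars, taken from the five-number summary
q1, q3 = 30.0, 55.0
minimum, maximum = 21.0, 67.0

lower, upper = tukey_fences(q1, q3)
# If even the extremes are inside the fences, no observation is flagged
has_outliers = minimum < lower or maximum > upper

print(f"IQR = {q3 - q1}, fences = ({lower}, {upper}), outliers present: {has_outliers}")
```

Because the minimum and maximum both fall inside the fences, this rule flags no income values as outliers.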
Objective 2a: SLR with income and amount charged
Develop estimated regression equations, first using annual income as the independent variable and then using household size as
the independent variable.
Model Summary
Model   R        R Square   Adjusted R Square   Std. Error of the Estimate
1       .631a    .398       .386                731.713

ANOVAa
Model            Sum of Squares   df   Mean Square      F        Sig.
1  Regression    16999744.786     1    16999744.786     31.751   .000b
   Residual      25699404.034     48   535404.251
   Total         42699148.820     49

Coefficientsa
                  Unstandardized Coefficients   Standardized                     95.0% Confidence Interval for B
                                                Coefficients
Model             B              Std. Error     Beta           t        Sig.     Lower Bound    Upper Bound
1  (Constant)     Bo=2204.000    329.049                       6.698    .000     1542.402       2865.597
   Income=X       B1=40.480      7.184          .631           5.635    .000     26.036         54.924
Discuss Findings:
(1) R-squared = .398 < .70, so the ESLRE y-hat = 2,204 + 40.480x does not provide a good fit for the population.
39.8% of the variability in Y (Amount Charged) can be explained by Income.
(2) ESLRE: y-hat = 2,204 + 40.480x
(3) Is the SLR model significant?
T-test:
H0: β1 = 0 (β1 is NOT significant) = (Income is NOT significant) = (Income is NOT linearly related to Amount Charged)
Ha: β1 ≠ 0 (β1 is significant) = (Income is significant) = (Income is linearly related to Amount Charged)
Using the p-value approach, reject the null hypothesis because p-value = .000 < α = 0.05 (β1 is significant).
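The R-squared, F and t values for the Income model can be reproduced from the sums of squares and coefficient estimates in the SPSS tables. The following is a minimal sketch using only the reported numbers; the variable names are chosen for illustration.

```python
# Values reported in the ANOVA and Coefficients tables for the Income model
ss_regression = 16999744.786
ss_residual = 25699404.034
df_regression, df_residual = 1, 48
slope, slope_se = 40.480, 7.184

ss_total = ss_regression + ss_residual

# R-squared: share of total variability explained by the regression
r_squared = ss_regression / ss_total

# F statistic: mean square regression over mean square residual
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual

# t statistic for the slope: estimate divided by its standard error
t_stat = slope / slope_se

print(f"R-squared = {r_squared:.3f}")
print(f"F = {f_stat:.3f}")
print(f"t = {t_stat:.3f}, t squared = {t_stat ** 2:.2f}")
```

With a single independent variable, the squared t statistic for the slope equals the F statistic up to rounding, which is why the F-test and the t-test lead to the same decision in simple linear regression.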
Objective 2b)SLR with Household size and amount charged
Which variable is the better predictor of the annual credit card charges? Discuss your findings.
Model Summary
Model   R        R Square   Adjusted R Square   Std. Error of the Estimate
1       .753a    .567       .558                620.793

ANOVAa
Model            Sum of Squares   df   Mean Square      F        Sig.
1  Regression    24200717.481     1    24200717.481     62.796   .000b
   Residual      18498431.339     48   385383.986
   Total         42699148.820     49

Coefficientsa
                  Unstandardized Coefficients   Standardized                     95.0% Confidence Interval for B
                                                Coefficients
Model             B              Std. Error     Beta           t        Sig.     Lower Bound    Upper Bound
1  (Constant)     Bo=2581.941    195.263                       13.223   .000     2189.339       2974.543
   HouseholdSize  B1=404.128     50.998         .753           7.924    .000     301.590        506.666
Finding:
R-squared = .567: 56.7% of the variability in Y (Amount Charged) is explained by Household Size = X. Because .567 < .70, the
estimated regression line does not provide a good fit for the population data.
H0: β1 = 0 (β1 is NOT significant) = (Household Size is NOT significant) = (Household Size is NOT linearly related to Amount Charged)
Ha: β1 ≠ 0 (β1 is significant) = (Household Size is significant) = (Household Size is linearly related to Amount Charged)
Using the p-value method, reject the null hypothesis because p-value = .000 < α = 0.05 (β1 is significant).
We compared the R-squared for Income & Amount Charged with the R-squared for Household Size & Amount Charged to decide
which independent variable is better.
Our conclusion is that Household Size is the better predictor of credit card charges because the Household Size R-squared (.567)
is greater than the Income R-squared (.398).
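The comparison of the two simple regressions can be sketched from the residual sums of squares in the two ANOVA tables. Both models share the same total sum of squares and residual degrees of freedom, so a smaller residual sum of squares means both a larger R-squared and a smaller standard error of the estimate. The helper name `fit_summary` is hypothetical.

```python
import math

SS_TOTAL = 42699148.820  # total sum of squares, identical in both ANOVA tables
DF_RESIDUAL = 48         # residual degrees of freedom for each one-variable model

# Residual sums of squares reported for each simple regression
ss_residual = {
    "Income": 25699404.034,
    "Household Size": 18498431.339,
}


def fit_summary(ss_res, ss_total=SS_TOTAL, df_res=DF_RESIDUAL):
    """Return (R-squared, standard error of the estimate) for one model."""
    r_squared = 1 - ss_res / ss_total
    std_error_estimate = math.sqrt(ss_res / df_res)
    return r_squared, std_error_estimate


summaries = {name: fit_summary(ss) for name, ss in ss_residual.items()}
better = max(summaries, key=lambda name: summaries[name][0])

for name, (r2, see) in summaries.items():
    print(f"{name}: R-squared = {r2:.3f}, std. error of estimate = {see:.3f}")
print(f"Better single predictor by R-squared: {better}")
```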
Objective 3
Develop an estimated regression equation with annual income and household size as the independent variables. Discuss your
findings.
Model Summary
Model   R        R Square   Adjusted R Square   Std. Error of the Estimate
1       .909a    .826       .818                398.091

ANOVAa
Model            Sum of Squares   df   Mean Square      F         Sig.
1  Regression    35250755.672     2    17625377.836     111.218   .000b
   Residual      7448393.148      47   158476.450
   Total         42699148.820     49

Coefficientsa
                     Unstandardized Coefficients   Standardized                     95.0% Confidence Interval for B
                                                   Coefficients
Model                B              Std. Error     Beta           t        Sig.     Lower Bound    Upper Bound
1  (Constant)        Bo=1304.905    197.655                       6.602    .000     907.275        1702.535
   Income=X1         B1=33.133      3.968          .516           8.350    .000     25.151         41.115
   HouseholdSize=X2  B2=356.296     33.201         .664           10.732   .000     289.504        423.087
1) R-squared = .826: 82.6% of the variability in Y (Amount Charged) is explained by Income and Household Size combined.
Because .826 > .70, the y-hat equation is a good fit for the population data.
2) EMLRE: y-hat = 1304.905 + 33.133X1 + 356.296X2 (X1 = Income and X2 = Household Size)
3) Is the model significant? Use the p-value method and do both the F-test and the two T-tests.
F-test: Using the p-value method, because p-value = .000 < α = .05, we reject the null hypothesis. The model is significant, so we may do the two T-tests.
T-test 1: Using the p-value method, reject the null hypothesis because p-value = .000 < α = .05 (β1 is significant, so Income is significant).
T-test 2:
Ha: β2 ≠ 0 (β2 is significant) = (Household Size is significant) = (Household Size is linearly related to Amount Charged)
Using the p-value method, reject the null hypothesis because p-value = .000 < α = .05 (β2 is significant).
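For the two-variable model, R-squared, adjusted R-squared and the overall F statistic can be recomputed from the ANOVA table. The sketch below assumes n = 50 observations and k = 2 independent variables, which matches the reported degrees of freedom (2 and 47); the variable names are illustrative.

```python
n, k = 50, 2  # observations and independent variables (df: k = 2, n - k - 1 = 47)

# Sums of squares reported in the ANOVA table for the two-variable model
ss_regression = 35250755.672
ss_residual = 7448393.148
ss_total = ss_regression + ss_residual

r_squared = ss_regression / ss_total

# Adjusted R-squared penalizes R-squared for the number of predictors
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# Overall F statistic: mean square regression over mean square residual
f_stat = (ss_regression / k) / (ss_residual / (n - k - 1))

print(f"R-squared = {r_squared:.3f}")
print(f"Adjusted R-squared = {adjusted_r_squared:.3f}")
print(f"F = {f_stat:.3f}")
```

Adjusted R-squared is the fairer number to compare against the single-variable models, because plain R-squared never decreases when a variable is added.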
Objective 4:
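Using the estimated MLR equation, replace Income with 40 (that is, $40,000, since Income is in thousands of dollars) and Household Size with 3.
y-hat = 1304.905 + 33.133 * (40) + 356.296 * (3) = 1304.905 + 1325.320 + 1068.888 = 3699.113
The predicted annual amount charged is about $3,699.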
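The substitution can also be checked with a short function built from the estimated coefficients. This is a sketch only: the function name is hypothetical, and Income is assumed to be entered in thousands of dollars, as in the data set.

```python
def predict_amount_charged(income_thousands, household_size):
    """Point prediction from the estimated MLR equation in Objective 3."""
    intercept = 1304.905
    b_income = 33.133      # change in charges per additional $1,000 of income
    b_household = 356.296  # change in charges per additional household member
    return intercept + b_income * income_thousands + b_household * household_size


y_hat = predict_amount_charged(40, 3)
print(f"Predicted annual amount charged: ${y_hat:,.2f}")
```

The same function can be reused for other income and household-size combinations, ideally within the observed ranges (income 21 to 67, household size 1 to 7), since predictions outside those ranges are extrapolations.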
Objective 5:
Discuss the need for other independent variables that could be added to the model. What additional variables might be helpful?
Other possible independent variables (IVs) and how they might affect credit card usage:
Age: Older people tend to use credit cards less, while younger people tend to use them more often.
Location: Proximity to shopping malls, and whether the consumer lives in a city or urban area.
F: Conclusions and Recommendations
Conclusion:
Yes, the findings make sense. We are 99% confident that the mean consumer income is between $37,970 and $48,990, and the
sample mean income was $43,480. Thus, firms can target people with incomes between $37,970 and $48,990 to maximize the
number of customers.
Moreover, the mean household size is 3.42 people, and we are 99% confident that the mean household size is between 2.76 and
4.08, or roughly 3 to 4 people. This means that firms should market to households of 3 to 4 people.
Lastly, for amount charged, the 99% confidence interval for the mean is $3,610.26 to $4,317.86, and the sample mean was
$3,964.06. Hence, the credit card company should market to people who charge within that range to increase revenue and profits.
We can conclude that the IV Income has a significant linear relationship with Amount Charged, even though the model does not
fit the population data well (R-squared of .398 < .70).
The IV Household Size also has a significant linear relationship with Amount Charged, since we reject the null hypothesis, but its
R-squared of .567 < .70 means that it is not a good fit either.
Lastly, these values were calculated from a sample of 50 consumers. Even though the sample is large, a bigger one would be
better. Moreover, because this is a sample, findings in the real world may differ depending on each consumer and his or her
preferences. We should also have more independent variables in the dataset, because that would allow us to get better results
using SPSS and basic statistical calculations.