Consumer Credit Card Usage Analysis

Krithik Jain
Business Statistics MGSC 2301-07
Professor Dimitrios Fotiadis

Introduction, Analysis and Methods
A. This report was written by Krithik Jain

B. The intended audience for this report is Company KAJ
C. The objectives of the project:
1. Use methods of descriptive statistics to summarize the data. Comment on the findings.
2. Develop estimated regression equations, first using annual income as the independent variable and then using
household size as the independent variable. Which variable is the better predictor of the annual credit card charges?
Discuss your findings.
3. Develop an estimated regression equation with annual income and household size as the independent variables.
Discuss your findings.
4. What is the predicted annual credit card charge for a three-person household with an annual income of $40,000?
5. Discuss the need for other independent variables that could be added to the model. What additional variables might be
helpful?
D. Consumer Research Inc. is an independent agency that conducts research on consumer attitudes and behaviours for a variety
of firms. In one study, a client asked for an investigation of consumer characteristics that can be used to predict the amount
charged by credit card users. Data were collected on annual income, household size, and annual credit card charges for a
sample of 50 customers. The data at the end of this document are contained in the file consumer (chapter 15 of
the book Statistics for Business and Economics, 11e, by Anderson, Sweeney and Williams) and are also provided on Bb.
E. The major findings
F. Conclusions and Recommendations. Discuss the limitations of your study, what questions remain unanswered, and make a
suggestion to find the answer for unanswered issues in the project.
Descriptive Statistics
                      N   Minimum   Maximum      Mean   Std. Deviation
Income               50        21        67     43.48           14.551
Household Size       50         1         7      3.42            1.739
Amount Charged       50      1864      5678   3964.06          933.494
Valid N (listwise)   50
Case Processing Summary

                                          Cases
                         Valid           Missing            Total
                      N   Percent      N   Percent      N   Percent
Income               50    100.0%      0      0.0%     50    100.0%
Household Size       50    100.0%      0      0.0%     50    100.0%
Amount Charged       50    100.0%      0      0.0%     50    100.0%
Descriptives
Statistic Std. Error
Income Mean 43.48 2.058
99% Confidence Interval for Mean: Lower Bound 37.97
99% Confidence Interval for Mean: Upper Bound 48.99
5% Trimmed Mean 43.42
Median 42.00
Variance 211.724
Std. Deviation 14.551
Minimum 21
Maximum 67
Range 46
Interquartile Range 25
Skewness .096 .337
Kurtosis -1.248 .662
Household Size Mean 3.42 .246
Median 3.00
Variance 3.024
Minimum 1
Maximum 7
Range 6
Skewness .528 .337
Kurtosis -.723 .662
Amount Charged Mean 3964.06 132.016
Median 4090.00
Variance 871411.200
Minimum 1864
Maximum 5678
Range 3814
Skewness -.130 .337
Kurtosis -.742 .662
Findings:
Income (recorded in $1000s):
 The average income is $43,480.
 The median is $42,000.
 The variance is 211.724 (in squared thousands of dollars, so it is not a dollar amount).
 The standard deviation is $14,551.
 The minimum income is $21,000 and the maximum income is $67,000.
 The range is $46,000.
 The IQR is $25,000.
Household Size:
 The average household size is 3.42.
 The median is 3.00.
 The variance is 3.02.
 The standard deviation is 1.739.
 The minimum household size is 1 and the maximum household size is 7.
 The range for household sizes is 6.
 The IQR is 3.
Amount Charged:
 The average amount charged is $3,964.06.
 The median is $4,090.
 The variance is 871,411.20 (in squared dollars).
 The standard deviation is $933.49.
 The minimum amount charged is $1,864 and the maximum amount charged is $5,678.
 The range for amount charged is $3,814.
 The IQR is $1,638.25 (Q3 - Q1 = $4,747.50 - $3,109.25).
Percentiles
                                                            Percentiles
                                           5       10       25       50       75       90       95
Weighted Average   Income              21.55    23.20    30.00    42.00    55.00    64.90    66.45
(Definition 1)     Household Size       1.00     1.10     2.00     3.00     5.00     6.00     7.00
                   Amount Charged    2463.95  2597.80  3109.25  4090.00  4747.50  5285.80  5461.35
Tukey's Hinges     Income                                30.00    42.00    55.00
                   Household Size                         2.00     3.00     5.00
                   Amount Charged                      3121.00  4090.00  4742.00
Income Stem-and-Leaf Plot
Frequency Stem & Leaf
5.00 2 . 11223
5.00 2 . 56779
7.00 3 . 0001234
5.00 3 . 57799
5.00 4 . 01224
3.00 4 . 688
7.00 5 . 0012444
3.00 5 . 555
5.00 6 . 12234
5.00 6 . 56677
Stem width: 10
Each leaf: 1 case(s)
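The stem-and-leaf plot lists every income value, so the income statistics reported by SPSS can be cross-checked from it. The following is a minimal Python sketch, not part of the original SPSS workflow; the helper name `expand` is hypothetical, and the rows are copied from the plot above.

```python
import statistics

# Stem-and-leaf rows copied from the Income plot: (stem, leaves).
# Stem width is 10, so stem 2 with leaf 1 is the value 21 (that is, $21,000).
STEM_AND_LEAF = [
    (2, "11223"), (2, "56779"),
    (3, "0001234"), (3, "57799"),
    (4, "01224"), (4, "688"),
    (5, "0012444"), (5, "555"),
    (6, "12234"), (6, "56677"),
]


def expand(rows, stem_width=10):
    """Turn (stem, leaves) rows back into the list of raw values."""
    return [stem * stem_width + int(leaf) for stem, leaves in rows for leaf in leaves]


incomes = expand(STEM_AND_LEAF)

n = len(incomes)
mean = statistics.mean(incomes)
median = statistics.median(incomes)
variance = statistics.variance(incomes)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(incomes)      # sample standard deviation

print(f"n = {n}, mean = {mean:.2f}, median = {median}")
print(f"variance = {variance:.3f}, std. deviation = {std_dev:.3f}")
```

The results can be compared line by line with the Income block of the Descriptives table.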
Income: Q1 = $30,000, Q2 = $42,000, Q3 = $55,000, min = $21,000, max = $67,000
Household Size: Q1 = 2, Q2 = 3, Q3 = 5, min = 1, max = 7
Amount Charged: Q1 = $3,109.25, Q2 = $4,090, Q3 = $4,747.50, min = $1,864, max = $5,678
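The range and IQR in the findings follow directly from these quartiles: range = max - min and IQR = Q3 - Q1. A short Python sketch of that arithmetic, using the weighted-average percentiles from the SPSS output (the dictionary layout and the function name `spread` are illustrative choices, not part of the original analysis):

```python
# Summary values taken from the SPSS percentiles and descriptives output.
# Income is in $1000s; Amount Charged is in dollars.
summaries = {
    "Income":         {"min": 21,   "q1": 30.00,   "q3": 55.00,   "max": 67},
    "Household Size": {"min": 1,    "q1": 2.00,    "q3": 5.00,    "max": 7},
    "Amount Charged": {"min": 1864, "q1": 3109.25, "q3": 4747.50, "max": 5678},
}


def spread(summary):
    """Return (range, interquartile range) for one variable."""
    return summary["max"] - summary["min"], summary["q3"] - summary["q1"]


spreads = {name: spread(s) for name, s in summaries.items()}
for name, (value_range, iqr) in spreads.items():
    print(f"{name}: range = {value_range}, IQR = {iqr}")
```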
Objective 2a: SLR with income and amount charged
Develop estimated regression equations, first using annual income as the independent variable and then using household size as
the independent variable.
Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .631a   .398       .386                731.713
ANOVAa
Model          Sum of Squares   df   Mean Square     F        Sig.
1 Regression   16999744.786     1    16999744.786    31.751   .000b
  Residual     25699404.034     48   535404.251
  Total        42699148.820     49
Coefficientsa
Model          B (Unstandardized)   Std. Error   Beta (Standardized)   t       Sig.   95.0% CI for B: Lower Bound   Upper Bound
1 (Constant)   Bo=2204.000          329.049                            6.698   .000   1542.402                      2865.597
  Income=X     B1=40.480            7.184        .631                  5.635   .000   26.036                        54.924
Discuss Findings:
(1) R-squared = .398 < .70, so the ESLRE y-hat = 2,204 + 40.480x does not provide a good fit to the data.
Only 39.8% of the variability in Y (Amount Charged) can be explained by Income.
(2) ESLRE: y-hat = 2,204 + 40.480x, where x is annual income in $1000s.
(3) Is the SLR model significant?
t test:
Ho: β1 = 0 (β1 is NOT significant) = (Income is NOT significant) = (Income is NOT linearly related to Amount Charged)
Ha: β1 ≠ 0 (β1 is significant) = (Income is significant) = (Income is linearly related to Amount Charged)
Using the p-value approach, reject the null hypothesis because p-value = .000 < .05 = α (β1 is significant).
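The R-squared, F and t values in the SPSS output are linked by simple formulas: R-squared = SSR / SST, F = MSR / MSE, and t = B1 / (Std. Error of B1). In simple linear regression F also equals t squared. The Python sketch below recomputes them from the printed table values as a check; it is an illustration added for this report, not SPSS output, and small differences come from rounding in the tables.

```python
# Values copied from the ANOVA and Coefficients tables for the income model.
ssr = 16999744.786  # regression sum of squares
sse = 25699404.034  # residual sum of squares
sst = 42699148.820  # total sum of squares
df_regression, df_residual = 1, 48
b1, se_b1 = 40.480, 7.184  # slope for Income and its standard error

r_squared = ssr / sst                                  # share of variability explained
f_stat = (ssr / df_regression) / (sse / df_residual)   # F = MSR / MSE
t_stat = b1 / se_b1                                    # t statistic for the slope

print(f"R-squared = {r_squared:.3f}")
print(f"F = {f_stat:.3f}, t = {t_stat:.3f}, t squared = {t_stat ** 2:.3f}")
```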
Objective 2b: SLR with household size and amount charged
Which variable is the better predictor of the annual credit card charges? Discuss your findings.
Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .753a   .567       .558                620.793
ANOVAa
Model          Sum of Squares   df   Mean Square     F        Sig.
1 Regression   24200717.481     1    24200717.481    62.796   .000b
  Residual     18498431.339     48   385383.986
  Total        42699148.820     49
Coefficientsa
Model             B (Unstandardized)   Std. Error   Beta (Standardized)   t        Sig.   95.0% CI for B: Lower Bound   Upper Bound
1 (Constant)      Bo=2581.941          195.263                            13.223   .000   2189.339                      2974.543
  HouseholdSize   B1=404.128           50.998       .753                  7.924    .000   301.590                       506.666
Finding:
R-squared = .567: 56.7% of the variability in Y (Amount Charged) is explained by household size (X). Because .567 < .70, the
estimated regression equation does not provide a good fit to the data.
y-hat = 2581.941 + 404.128X (X = Household Size)
The test we used:
Ho: β1 = 0 (β1 is NOT significant) = (Household Size is NOT significant) = (Household Size is NOT linearly related to Amount Charged)
Ha: β1 ≠ 0 (β1 is significant) = (Household Size is significant) = (Household Size is linearly related to Amount Charged)
Using the p-value method, reject the null because p-value = .000 < .05 = α (β1 is significant).
We compared the R-square for Income & Amount Charged with the R-square for Household Size & Amount Charged to decide
which independent variable is better.
Our conclusion is that household size is the better predictor of credit card charges because the household size R-square (.567)
is greater than the income R-square (.398).
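The comparison can be reproduced from the two ANOVA tables, since both models share the same total sum of squares. This is a small Python sketch of that comparison, with the sums of squares copied from the SPSS output; the variable names are illustrative.

```python
# Total sum of squares is the same for both simple regressions (same Y, same sample).
sst = 42699148.820

# Regression sum of squares for each single-predictor model.
ssr_by_predictor = {
    "Income": 16999744.786,
    "Household Size": 24200717.481,
}

r_squared = {name: ssr / sst for name, ssr in ssr_by_predictor.items()}
best = max(r_squared, key=r_squared.get)  # predictor with the larger R-squared

for name, value in r_squared.items():
    print(f"{name}: R-squared = {value:.3f}")
print(f"Better single predictor: {best}")
```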
Objective 3
Develop an estimated regression equation with annual income and household size as the independent variables. Discuss your
findings.
Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .909a   .826       .818                398.091
ANOVAa
Model          Sum of Squares   df   Mean Square     F         Sig.
1 Regression   35250755.672     2    17625377.836    111.218   .000b
  Residual     7448393.148      47   158476.450
  Total        42699148.820     49
Coefficientsa
Model                B (Unstandardized)   Std. Error   Beta (Standardized)   t        Sig.   95.0% CI for B: Lower Bound   Upper Bound
1 (Constant)         Bo=1304.905          197.655                            6.602    .000   907.275                       1702.535
  Income=X1          B1=33.133            3.968        .516                  8.350    .000   25.151                        41.115
  HouseholdSize=X2   B2=356.296           33.201       .664                  10.732   .000   289.504                       423.087
1) R-squared = .826: 82.6% of the variability in Y (Amount Charged) is explained by income and household size combined.
Because .826 > .70, the estimated regression equation is a good fit to the data.
2) EMLRE: y-hat = 1304.905 + 33.133X1 + 356.296X2 (X1 = Income in $1000s and X2 = Household Size)
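With two independent variables, the adjusted R-square in the Model Summary is the fairer figure to compare against the single-predictor models, because it penalizes extra predictors: adjusted R-square = 1 - (1 - R-square)(n - 1) / (n - k - 1). A minimal Python sketch of this standard formula, applied to the sums of squares from the ANOVA table (the function name is illustrative):

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R-squared for a model with k predictors fitted on n observations."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


n = 50                # sample size
ssr = 35250755.672    # regression sum of squares, two-predictor model
sst = 42699148.820    # total sum of squares

r_squared_mlr = ssr / sst
adj_r_squared_mlr = adjusted_r_squared(r_squared_mlr, n, k=2)

print(f"R-squared = {r_squared_mlr:.3f}, adjusted R-squared = {adj_r_squared_mlr:.3f}")
```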
Describe the Findings:
1) Explain R-square
2) Estimated MLR equation= y-hat=1304.905+33.133Income+356.296HouseholdSize
3) Is the model significant? Use the p-value method and do both the F test and the two t tests.
F test:
Ho: β1 = β2 = 0 (Model is NOT significant)
Ha: At least one of β1, β2 is ≠ 0 (Model is significant)
Using the p-value method: because p-value = .000 < .05 = α, we reject the null (Model is significant).
Because we rejected the null in the F test, we can now do the k = 2 t tests.

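t test 1:
Ho: β1 = 0 (β1 is NOT significant) = (Income is NOT significant) = (Income is NOT linearly related to Amount Charged)
Ha: β1 ≠ 0 (β1 is significant) = (Income is significant) = (Income is linearly related to Amount Charged)
Using the p-value method, reject the null because p-value = .000 < .05 = α (β1 is significant).
t test 2:
Ho: β2 = 0 (β2 is NOT significant) = (Household Size is NOT significant) = (Household Size is NOT linearly related to Amount Charged)
Ha: β2 ≠ 0 (β2 is significant) = (Household Size is significant) = (Household Size is linearly related to Amount Charged)
Using the p-value method, reject the null because p-value = .000 < .05 = α (β2 is significant).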
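The two t tests can also be checked with the critical-value approach, and the same numbers give the 95% confidence intervals printed in the Coefficients table. The Python sketch below assumes a critical value of t = 2.0117 for α/2 = .025 with 47 degrees of freedom, taken from a t table rather than from the SPSS output; everything else is copied from the Coefficients table.

```python
# Assumed table value: t for alpha/2 = .025 with n - k - 1 = 47 degrees of freedom.
T_CRITICAL = 2.0117

# (coefficient, standard error) from the SPSS Coefficients table.
coefficients = {
    "Income": (33.133, 3.968),
    "Household Size": (356.296, 33.201),
}

results = {}
for name, (b, se) in coefficients.items():
    t_stat = b / se
    margin = T_CRITICAL * se
    results[name] = {
        "t": t_stat,
        "ci": (b - margin, b + margin),            # 95% confidence interval for B
        "significant": abs(t_stat) > T_CRITICAL,   # reject Ho when |t| exceeds the critical value
    }

for name, r in results.items():
    low, high = r["ci"]
    print(f"{name}: t = {r['t']:.3f}, 95% CI = ({low:.3f}, {high:.3f}), "
          f"significant = {r['significant']}")
```

Neither interval contains 0, which agrees with rejecting the null hypothesis for both coefficients.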
Objective 4:
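Using the estimated MLR equation, replace Income with 40 (income is recorded in $1000s) and Household Size with 3.
y-hat = 1304.905 + 33.133 * (40) + 356.296 * (3) = 1304.905 + 1325.320 + 1068.888 = 3699.113
The predicted annual credit card charge for a three-person household with an annual income of $40,000 is about $3,699.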
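The same substitution can be written as a small function so that other households can be predicted as well. This is a Python sketch using the coefficients from the Objective 3 output; the function name `predict_annual_charge` is hypothetical, and income must be entered in $1000s to match how the data were recorded.

```python
# Coefficients of the estimated multiple regression equation (Objective 3).
B0, B1, B2 = 1304.905, 33.133, 356.296


def predict_annual_charge(income_thousands, household_size):
    """Predicted annual credit card charge in dollars; income is given in $1000s."""
    return B0 + B1 * income_thousands + B2 * household_size


prediction = predict_annual_charge(income_thousands=40, household_size=3)
print(f"Predicted annual charge: ${prediction:,.2f}")
```

Predictions are only reasonable inside the range of the sample, that is, incomes of $21,000 to $67,000 and household sizes of 1 to 7.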
Objective 5:
Discuss the need for other independent variables that could be added to the model. What additional variables might be helpful?
SPSS output: There is no SPSS output for objective 5
Other possible independent variables (IVs): Mention other IVs and briefly explain how they affect credit card usage.
Age: Older people tend to use credit cards less, while younger people tend to use them more often.
Location: Proximity to shopping malls, and whether the household is in a city or other urban area, could also affect how much is
charged.
F: Conclusions and Recommendations
Conclusion:
Yes, the findings make sense. The mean income for this data was $43,480, and we are 99% confident that the mean income of the
customer population is between $37,970 and $48,990. This interval describes the mean, not the range that contains 99% of
consumers. Firms that want to reach the typical customer can therefore target people with incomes of about $37,970 to $48,990.
Moreover, the mean household size is 3.42 people, and the 99% confidence interval for the mean household size is 2.76 to 4.08,
or roughly 3 to 4 people. This means that firms should market to households of 3 to 4 people.
Lastly, the mean amount charged was $3,964.06, and the 99% confidence interval for the mean amount charged is $3,610.26 to
$4,317.86. Hence, the credit card company should market to people whose annual charges fall in that range to increase revenue
and profits.
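The 99% intervals quoted above are confidence intervals for the mean, computed as mean ± t × (standard error of the mean). The Python sketch below reproduces them from the Descriptives table; the critical value t = 2.680 (α/2 = .005, 49 degrees of freedom) is an assumed table value, not part of the SPSS output.

```python
# Assumed table value: t for alpha/2 = .005 with n - 1 = 49 degrees of freedom.
T_CRITICAL_99 = 2.680

# (sample mean, standard error of the mean) from the SPSS Descriptives table.
sample_stats = {
    "Income ($1000s)": (43.48, 2.058),
    "Household Size": (3.42, 0.246),
    "Amount Charged ($)": (3964.06, 132.016),
}


def confidence_interval(mean, std_error, t_critical=T_CRITICAL_99):
    """Return (lower, upper) bounds of the confidence interval for the mean."""
    margin = t_critical * std_error
    return mean - margin, mean + margin


intervals = {name: confidence_interval(m, se) for name, (m, se) in sample_stats.items()}
for name, (low, high) in intervals.items():
    print(f"{name}: 99% CI for the mean = ({low:.2f}, {high:.2f})")
```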
I would like to find out:

1. When the greatest number of consumers apply for credit cards, so the company can focus on marketing in that specific
period.
2. Furthermore, I would like to find out where the consumers used most of their credit cards. This way the credit card
company can have more cashback deals and offers for those places where consumers actually go to shop. This will
encourage more and more consumers to start getting this credit card.
3. Consumer age and how credit card usage varies with it. This information would make the marketing process more efficient
and would help the company understand how demographics affect demand for banking services in general.
4. Location statistics, as mentioned above. These would be useful for target marketing and market segmentation, and would
make the data easier to analyze.
We can conclude that income (IV) has a significant linear relationship with the amount charged, even though the simple regression
on income does not fit the data well: R-square of 0.398 < 0.7.
Household size (IV) also has a significant linear relationship with the amount charged, but its R-square of 0.567 < 0.7 means
that it is not a good fit on its own either. In both cases we reject the null hypothesis.
Lastly, these values were calculated from a sample of 50 consumers. Even though this is a large sample, a bigger one would give
more precise estimates. Moreover, because it is only a sample, findings in the real world can differ depending on each consumer
and his or her preferences. We should also have more independent variables in the dataset, because that would allow us to get
better results using SPSS and basic statistical calculations.
