
Module 6 Discriminant Analysis

Example 1

Surrogate advertising is advertising which embeds a brand or product message inside an advertisement which is ostensibly for another brand or product.

Examples

Demand immediate ROI vs. patience for ROI

Brand loyal vs. not brand loyal

Heavy Users vs. Light Users

Purchasers vs. Non-purchasers

Good Credit Risk vs. Poor Credit Risk

Member vs. Non-Member

Risk averse investors vs. risk-taking investors

What is DA
DA is a multivariate technique which analyses the reasons for the a-priori existence of well-defined and distinct groups of individuals, industries, etc.

A non-metric (categorical) dependent variable is predicted by several metric (interval or ratio scaled) predictor variables.

Example 2

The loan department of a bank is interested in analysing (if possible) why customers are of two types:
- those who pay their EMIs in time (good customers)
- those who do not (defaulters)

Candidate predictor variables:
1. type of loan (X)
2. amount of loan
3. interest rate
4. no. of EMIs
5. tenure of account
6. amount in savings account
7. profession (X)
8. age
9. income bracket (X) / income

X = nominal scale data (categorical variable)
unmarked = ratio or interval scale data (discrete or continuous variable)

Example 3

Some predictor variables to help in differentiating between the two types of viewers:
(1) like Bond movies
(2) do not like Bond movies

Variable                                  Scale
No. of Bond movies watched                ratio
No. of other Bond-hero movies watched     ratio
007 fan or not                            nominal
Thrill in watching action movies          interval
Enjoyment in fantasy stories              interval
Reality of stunts                         interval
Gender                                    nominal
Age (in years)                            ratio
Age group                                 nominal
Occupation                                nominal

Variables on a nominal scale (marked X on the slide) do not qualify as metric predictors for DA.

Example 3

Comment on the data collected by a researcher for DA.

Dependent variable:
Group 1 employees: "new premises for our office is essential"
Group 2 employees: "this old premise is good enough for now"

Predictor variables:

Employee   Group   Age      Tenure of   Need for      Infrastructure   Designation
name               group    service     change        matters a lot
                            (years)     (rate 1 to 7) (rate 1 to 4)
Selone             young    2.5                                        programmer
Elishia            midage   4.7                                        manager
Bala       1       senior   12          6             3                officer

Caution

Categorical version            Metric version                              Type
age group                      age (whole years)                           Discrete
income bracket                 income (monthly, Rs. 000)                   Continuous
marital status                 tenure of marriage (yrs and fraction)       Continuous
children: yes or no?           number of children                          Discrete
brand: loyal or not loyal?     frequency of purchase of brand (yearly)     Discrete

Technique   Dependent variable     Predictor (independent, explanatory) variables
RA          interval, ratio        interval, ratio, categorical/nominal
DA          categorical/nominal    interval, ratio

Caution: Usually the independent variables used in RA and DA are quite highly correlated among themselves (the problem of multicollinearity).
- this makes explaining the dependent variable difficult
- use PCA to find uncorrelated principal components
- use these PCs as new independent variables in RA and DA
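Before deciding whether PCA is needed, it can help to look at the pairwise correlations among the predictors. The sketch below is only an illustration of that screening step (it is not PCA itself): it borrows the 14 respondents of the holiday case study in Example 4 of this module, and the helper name `pearson` is hypothetical.

```python
from math import sqrt

# Predictor values of the 14 respondents in Example 4 (holiday case study).
X1 = [4, 5, 3, 1, 2, 4, 3, 2, 2, 4, 2, 4, 3, 5]        # family size
X2 = [10, 8, 7, 5, 8, 12, 7, 8, 6, 9, 12, 15, 10, 8]   # income, Rs. lakhs
X3 = [3, 4, 2, 2, 6, 5, 8, 6, 7, 6, 9, 10, 4, 5]       # avg. age of children

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    s_aa = sum((x - mean_a) ** 2 for x in a)
    s_bb = sum((y - mean_b) ** 2 for y in b)
    return s_ab / sqrt(s_aa * s_bb)

predictors = {"X1": X1, "X2": X2, "X3": X3}
names = list(predictors)
corr = {(p, q): pearson(predictors[p], predictors[q]) for p in names for q in names}

# Print the correlation matrix row by row.
for p in names:
    print(p, [round(corr[(p, q)], 3) for q in names])
```

Off-diagonal values close to +1 or -1 would signal the multicollinearity problem described above; values near 0 suggest the predictors can be used as they are.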

Working of DA

Rather than relying on each predictor variable as a separate measure of understanding the difference between the 2 groups, we want a combination of the predictor variables. Every predictor variable may not be equally important for the purpose of distinguishing between the 2 groups. So, we take a weighted combination of the predictor variables: the Discriminant function.
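In symbols, such a weighted combination can be written as below. The notation is an assumption made here for illustration (weights w_1 ... w_k for k predictors X_1 ... X_k); the module itself only shows the fitted numeric form.

```latex
D = w_1 X_1 + w_2 X_2 + \dots + w_k X_k
```

A predictor that contributes little to separating the two groups would be expected to receive a weight close to zero.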

Example 4 Case Study: Why not a holiday?

Dependent variable (categorical):
Some say "Yes, a good annual holiday is a must"
Others say "Holiday is not essential"

Predictor variables (ratio or interval scaled):
Family size (X1)
Annual household income, Rs. lakhs (X2)
Average age (in years) of children (X3)

Data: sample size

Group      Total   Analysis sample   Validation (hold-out) sample
Yes (1)    27      20                7
No (2)     34      26                8

Data

Group   Family size (X1)   Annual household income, Rs. lakhs (X2)   Average age of children (X3)
Yes     4                  10                                        3
Yes     5                  8                                         4
Yes     3                  7                                         2
No      1                  5                                         2
No      2                  8                                         6
Yes     4                  12                                        5
Yes     3                  7                                         8
No      2                  8                                         6
No      2                  6                                         7
No      4                  9                                         6
No      2                  12                                        9
Yes     4                  15                                        10
No      3                  10                                        4
Yes     5                  8                                         5

Objectives of DA

1. For every predictor variable, determine if significant differences exist between the two groups. (Univariate ANOVA)
2. Test the equality of population covariance matrices of the 2 groups. (Box's M test)
3. Determine if the predictor variables truly help in understanding the difference between the 2 groups. (Canonical correlation)
4. Test the equality of population mean vectors of Discriminant scores of the 2 groups. (Wilks' test)
5. Identify the relative importance of each of the predictor variables in predicting group membership. (Structure matrix)
6. Find the accuracy of the prediction of group membership using Discriminant scores. (Classification matrix, hit ratio)
7. Enable future unidentified individuals to be classified to their correct group (classification) with a better than chance accuracy.

Univariate ANOVA

For each predictor variable separately: Test of Equality of Group population Means

H0: μ1 = μ2
H1: μ1 ≠ μ2

Predictor                  Wilks' lambda   F        df1   df2   Sig.    Decision
Members in family          0.478           13.091   1     12    0.004   < 0.05, reject H0
Annual household income    0.938           0.797    1     12    0.390   > 0.05, accept H0
Average age of children    0.992           0.100    1     12    0.757   > 0.05, accept H0
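The Wilks' lambda and F columns of this table can be recomputed by hand from the 14 respondents of Example 4. The sketch below assumes the usual one-way ANOVA decomposition (for a single predictor, Wilks' lambda is the within-group sum of squares divided by the total sum of squares); the function name `two_group_anova` is hypothetical, and the Sig. column is not reproduced because it needs the F distribution.

```python
# Group labels and predictor values of the 14 respondents in Example 4.
group = list("YYYNNYYNNNNYNY")
X = {
    "Members in family (X1)": [4, 5, 3, 1, 2, 4, 3, 2, 2, 4, 2, 4, 3, 5],
    "Annual household income (X2)": [10, 8, 7, 5, 8, 12, 7, 8, 6, 9, 12, 15, 10, 8],
    "Average age of children (X3)": [3, 4, 2, 2, 6, 5, 8, 6, 7, 6, 9, 10, 4, 5],
}

def two_group_anova(values, labels):
    """Return (Wilks' lambda, F, df1, df2) of a one-way ANOVA on one predictor."""
    n = len(values)
    groups = sorted(set(labels))
    grand_mean = sum(values) / n
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_within = 0.0
    for g in groups:
        members = [v for v, lab in zip(values, labels) if lab == g]
        g_mean = sum(members) / len(members)
        ss_within += sum((v - g_mean) ** 2 for v in members)
    ss_between = ss_total - ss_within
    df1 = len(groups) - 1      # number of groups minus 1
    df2 = n - len(groups)      # respondents minus number of groups
    wilks = ss_within / ss_total
    f_stat = (ss_between / df1) / (ss_within / df2)
    return wilks, f_stat, df1, df2

results = {name: two_group_anova(vals, group) for name, vals in X.items()}
for name, (wilks, f_stat, df1, df2) in results.items():
    print(f"{name}: lambda={wilks:.3f}, F={f_stat:.3f}, df=({df1}, {df2})")
```

Only family size shows a large F here, which matches the decision column: it is the one predictor whose group means differ significantly.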

Box's M Test

Tests equality of population covariance matrices

H0: Σ1 = Σ2
H1: Σ1 ≠ Σ2

Box's M      1.540
F Approx.    0.186
df1          6
df2          1043.321
Sig.         0.981

Sig. 0.981 >> 0.05 (level of significance): accept H0. The population covariance matrices of the two groups are not significantly different, so one of the basic assumptions of DA is valid.

Summary of Canonical Discriminant Functions

Function   Eigenvalue   % of variance   Cumulative %   Canonical correlation
1          1.116        100.0           100.0          0.726

The canonical correlation coefficient measures the relation between 2 sets of variables. Here it is the correlation between the dependent variable (2 groups) and the set of predictor variables (X1, X2 and X3).

A value above 0.60 (60%) implies that the discrimination is good.

The canonical correlation coefficient indicates whether the predictor variables help in understanding the difference between the 2 groups. Its square is the proportion of variation in the dependent variable that is explained or accounted for by the Discriminant model.
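The reported canonical correlation can be checked against the reported eigenvalue. The sketch below assumes the standard relation for a single discriminant function, R = sqrt(eigenvalue / (1 + eigenvalue)); the variable names are illustrative.

```python
from math import sqrt

eigenvalue = 1.116  # from the Summary of Canonical Discriminant Functions

# Canonical correlation of a single discriminant function.
canonical_corr = sqrt(eigenvalue / (1 + eigenvalue))

# Squared canonical correlation: share of variation in the dependent
# variable accounted for by the discriminant model.
explained_share = canonical_corr ** 2

print(round(canonical_corr, 3))   # 0.726
print(round(explained_share, 3))  # 0.527
```

So a canonical correlation of 0.726 corresponds to roughly 53% of the variation between the groups being accounted for by the model.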

Wilks' test

Tests equality of population mean vectors of Discriminant scores of the 2 groups

H0: μ1 = μ2
H1: μ1 ≠ μ2
(μ1, μ2: population mean vectors of the D-scores of the two groups)

Wilks' lambda   Chi-square   df   Sig.
0.473           7.871        3    0.049

Sig. 0.049 < 0.05: reject H0. The population mean vectors of the D-scores of the two groups are significantly different.
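The figures of the Wilks' test can be approximately retraced from the eigenvalue 1.116 reported for the discriminant function. The sketch below is an illustration under stated assumptions: Wilks' lambda = 1 / (1 + eigenvalue) for a single function, Bartlett's chi-square approximation with n = 14 respondents, p = 3 predictors and g = 2 groups, and a closed-form tail probability that holds only for 3 degrees of freedom. Small differences from the table come from the rounded eigenvalue.

```python
from math import erfc, exp, log, pi, sqrt

eigenvalue = 1.116   # reported eigenvalue of the discriminant function
n, p, g = 14, 3, 2   # respondents, predictors, groups

wilks_lambda = 1 / (1 + eigenvalue)

# Bartlett's chi-square approximation for Wilks' lambda.
chi_square = -(n - 1 - (p + g) / 2) * log(wilks_lambda)
df = p * (g - 1)

# Upper-tail probability of a chi-square variable; this closed form is
# valid for exactly 3 degrees of freedom (which is the case here).
sig = erfc(sqrt(chi_square / 2)) + sqrt(2 * chi_square / pi) * exp(-chi_square / 2)

print(round(wilks_lambda, 3), round(chi_square, 2), df, round(sig, 3))
```

Because the significance value falls just below 0.05, H0 is rejected, but only narrowly.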

Standardized Canonical Discriminant Function Coefficients

Members in family (X1)           0.983
Annual household income (X2)     0.054
Average age of children (X3)    -0.174

Discriminant Function = 0.983 X1 + 0.054 X2 - 0.174 X3

The Discriminant Score (D-score) of a respondent can be found by substituting his/her answers (values of X1, X2, X3) from the data sheet in the above Discriminant Function.
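As a small illustration of this substitution, the sketch below plugs in the answers of the respondent discussed later in this module (X1 = 3, X2 = 7, X3 = 8). The function name `d_score` is hypothetical. The Discr. Score column in the module's tables does not equal this direct substitution; it presumably comes from software output based on unstandardized coefficients with a constant term.

```python
# Standardized canonical discriminant function coefficients from the module.
COEFFICIENTS = {"X1": 0.983, "X2": 0.054, "X3": -0.174}

def d_score(x1, x2, x3):
    """Weighted combination of the three predictors."""
    return (COEFFICIENTS["X1"] * x1
            + COEFFICIENTS["X2"] * x2
            + COEFFICIENTS["X3"] * x3)

# Respondent with 3 family members, income of Rs. 7 lakhs, children aged 8 on average.
score = d_score(3, 7, 8)
print(round(score, 3))  # 1.935
```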

Group   Members in family (X1)   Annual household income (X2)   Av. age of children (X3)   Discr. score
Y       4                        10                             3                          1.14429
Y       5                        8                              4                          2.14431
Y       3                        7                              2                          0.04437
N       1                        5                              2                          -2.21345
N       2                        8                              6                          -1.32038
Y       4                        12                             5                          1.04636
Y       3                        7                              8                          -0.36933
N       2                        8                              6                          -1.32038
N       2                        6                              7                          -1.42929
N       4                        9                              6                          0.91746
N       2                        12                             9                          -1.4473
Y       4                        15                             10                         0.76156
N       3                        10                             4                          -0.03358
Y       5                        8                              5                          2.07536

Structure Matrix

Members in family (X1)           0.989
Annual household income (X2)     0.244
Average age of children (X3)    -0.087

r (D-score, X1) = 0.989
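The structure matrix entries can be retraced from the listed D-scores. The sketch below assumes they are pooled within-groups correlations, i.e. correlations between a predictor and the D-score after each group's own mean has been removed from both; the function name `pooled_within_corr` is hypothetical.

```python
from math import sqrt

group = list("YYYNNYYNNNNYNY")
predictors = {
    "X1": [4, 5, 3, 1, 2, 4, 3, 2, 2, 4, 2, 4, 3, 5],
    "X2": [10, 8, 7, 5, 8, 12, 7, 8, 6, 9, 12, 15, 10, 8],
    "X3": [3, 4, 2, 2, 6, 5, 8, 6, 7, 6, 9, 10, 4, 5],
}
d_scores = [1.14429, 2.14431, 0.04437, -2.21345, -1.32038, 1.04636, -0.36933,
            -1.32038, -1.42929, 0.91746, -1.4473, 0.76156, -0.03358, 2.07536]

def pooled_within_corr(x, y, labels):
    """Correlation of x and y computed from deviations about each group's mean."""
    s_xy = s_xx = s_yy = 0.0
    for g in set(labels):
        xs = [a for a, lab in zip(x, labels) if lab == g]
        ys = [b for b, lab in zip(y, labels) if lab == g]
        mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
        s_xy += sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
        s_xx += sum((a - mean_x) ** 2 for a in xs)
        s_yy += sum((b - mean_y) ** 2 for b in ys)
    return s_xy / sqrt(s_xx * s_yy)

structure = {name: pooled_within_corr(vals, d_scores, group)
             for name, vals in predictors.items()}
for name, r in structure.items():
    print(name, round(r, 3))
```

Under this assumption the three correlations agree with the structure matrix values 0.989, 0.244 and -0.087 to within rounding, with family size (X1) clearly dominating.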

[Figure: D-scores of Yes respondents and of No respondents plotted along the discriminant axis, with the Yes group centroid and the No group centroid marked]

Members in family (X1) = 3
Annual household income (X2) = 7
Average age of children (X3) = 8

His D-score = 0.983 X1 + 0.054 X2 - 0.174 X3 = 1.935

He actually belonged to Group 1 (Yes). His D-score predicts him to be a Group 2 person (No). So, wrong prediction of group membership.

Group   Members in family (X1)   Annual household income (X2)   Av. age of children (X3)   Discr. score   Prediction
Y       4                        10                             3                          1.14429        Y
Y       5                        8                              4                          2.14431        Y
Y       3                        7                              2                          0.04437        Y
N       1                        5                              2                          -2.21345       N
N       2                        8                              6                          -1.32038       N
Y       4                        12                             5                          1.04636        Y
Y       3                        7                              8                          -0.36933       N
N       2                        8                              6                          -1.32038       N
N       2                        6                              7                          -1.42929       N
N       4                        9                              6                          0.91746        Y
N       2                        12                             9                          -1.4473        N
Y       4                        15                             10                         0.76156        Y
N       3                        10                             4                          -0.03358       N
Y       5                        8                              5                          2.07536        Y
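The Prediction column can be retraced from the D-scores with a simple cutting score. The sketch below assumes the cut-off is the midpoint between the two group centroids (a common choice when the groups are of equal size); the module does not state which rule the software applied, so treat this as an illustration.

```python
actual = list("YYYNNYYNNNNYNY")
d_scores = [1.14429, 2.14431, 0.04437, -2.21345, -1.32038, 1.04636, -0.36933,
            -1.32038, -1.42929, 0.91746, -1.4473, 0.76156, -0.03358, 2.07536]

def centroid(label):
    """Mean D-score of the respondents who actually belong to `label`."""
    scores = [d for d, a in zip(d_scores, actual) if a == label]
    return sum(scores) / len(scores)

centroid_yes = centroid("Y")
centroid_no = centroid("N")

# Assumed cutting score: midpoint of the two centroids.
cutoff = (centroid_yes + centroid_no) / 2

predicted = ["Y" if d > cutoff else "N" for d in d_scores]
misclassified = [i + 1 for i, (a, p) in enumerate(zip(actual, predicted)) if a != p]

print(round(centroid_yes, 3), round(centroid_no, 3))
print("".join(predicted))
print(misclassified)  # [7, 10]
```

Respondents 7 and 10 are the two misclassified cases: respondent 7 is the Yes respondent with X1 = 3, X2 = 7, X3 = 8 discussed above.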

Classification matrix

                          Predicted Group Membership
Original   Group          .00 (No)   1.00 (Yes)   Total
Count      .00 (No)       6          1            7
           1.00 (Yes)     1          6            7
%          .00 (No)       85.7       14.3         100.0
           1.00 (Yes)     14.3       85.7         100.0

Hit ratio = (85.7 + 85.7) / 2 = 85.7%

Predictive power of DA is good.
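The hit ratio follows directly from the counts in the classification matrix. The sketch below recomputes it and, as an assumed benchmark for "better than chance", also computes the proportional chance criterion (sum of squared group proportions); that benchmark is a common convention, not something stated in the module.

```python
# Counts from the classification matrix: outer key = original group,
# inner key = predicted group.
counts = {
    "No": {"No": 6, "Yes": 1},
    "Yes": {"No": 1, "Yes": 6},
}

total = sum(sum(row.values()) for row in counts.values())
correct = sum(counts[g][g] for g in counts)

hit_ratio = 100 * correct / total
per_group = {g: 100 * counts[g][g] / sum(counts[g].values()) for g in counts}

# Assumed benchmark: accuracy expected from guessing in proportion to group sizes.
chance = 100 * sum((sum(row.values()) / total) ** 2 for row in counts.values())

print(round(hit_ratio, 1))                           # 85.7
print({g: round(v, 1) for g, v in per_group.items()})
print(round(chance, 1))                              # 50.0
```

With two equal-sized groups the chance level is 50%, so a hit ratio of 85.7% is well above it.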

Example 5
In a 2 group DA of education loan from banks, the following results were obtained.
(a) Will you use all the variables for the DA? Which are the variables you would consider for continuing with the DA?
(b) Make the structure matrix and hence identify the variable that contributes most to the intergroup difference.
(c) How good is the predictive power of the discriminant function?

Univariate ANOVA

Variables                                     p-value
Average age of children                       0.078
Number of children for higher studies         0.003
Number of household with higher education     0.061
Monthly income                                0.005
Annual savings                                0.048

Correlation coefficient between the D scores and individual variables

Variables                                     Correlation coefficient
Average age of children                       0.21
Number of children for higher studies         0.76
Number of household with higher education     -0.04
Monthly income                                0.62
Annual savings                                0.54

                      Take loan                           Not take loan
                      Sample size   Correct prediction    Sample size   Correct prediction
Analysis sample       40            35                    60            56
Validation sample     15            10                    20            14
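One possible starting point for part (c) is to turn the counts of correct predictions into hit ratios for each sample. The sketch below is only an arithmetic aid; the dictionary layout and the function name `hit_ratio` are illustrative choices.

```python
# (sample size, correct predictions) per group, as given in the example.
samples = {
    "Analysis sample": {"Take loan": (40, 35), "Not take loan": (60, 56)},
    "Validation sample": {"Take loan": (15, 10), "Not take loan": (20, 14)},
}

def hit_ratio(groups):
    """Percentage of respondents whose group was predicted correctly."""
    total = sum(size for size, _ in groups.values())
    correct = sum(hits for _, hits in groups.values())
    return 100 * correct / total

hit_ratios = {name: hit_ratio(groups) for name, groups in samples.items()}
for name, value in hit_ratios.items():
    print(name, round(value, 1))
```

Comparing the two percentages shows how well the accuracy seen in the analysis sample carries over to respondents that were not used to build the function.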

Example 6

Case 2: Fear of Mutual funds

Customers' answers:
1) Oh yes! I will invest in MF.
2) Never! I am too scared, may be!

Variables                        Scale used                    Structure matrix   Discriminant coefficients
Feeling of insecurity            Interval scale 1 to 7         0.822              0.743
Too risky                        (1 not agree, 7 disagree)     0.541              0.096
Satisfied with medium returns                                  0.346              0.233
Annual household savings         Amount in Rs. lakhs           0.213              0.469
No. of dependents                Actual number                 0.164              0.209

1) What scale has been used for the last two variables?
2) How are the values in the 3rd column computed? What do they convey to you?
3) Find the discriminant score of a person with answers to the variables as: 1, 6, 2, Rs. 4.5 lakhs, 3.
4) What is the following table called?
5) Complete it and thus comment on the goodness of the DA done.

                                Predicted Group Membership
Answer   No. of respondents     Yes     No
Yes      45                     27
No       45                     12

Example 7 (SIP suggestion)

For a fast-food outlet several DAs need to be done:
- for employees
- for customers

Employees

Grouping/classification variables (nominal scaled data / categorical variables)
1) Intention to search another job: yes or no
2) Work type: part-time or full-time
3) Gender
4) Age group: young or mid-aged
5) Performance: good or bad

Relationship variables (interval scale 1 to 7; 1 completely disagree, 7 completely agree)
Loyalty: I have a sense of loyalty to Samouel's restaurant.
Effort: I am willing to put in a great deal of effort beyond that expected to help Samouel's restaurant to be successful.
Proud: I am proud to tell others that I work for Samouel's restaurant.

Work environment variables (interval scale 1 to 7; 1 completely disagree, 7 completely agree)
X1   I am paid fairly for the work I do.
X2   I am doing the kind of work I want.
X3   My supervisor gives credit and praise for work well done.
X4   There is a lot of cooperation among the members of my work group.
X5   My job allows me to learn new skills.
X6   My supervisor recognizes my potential.
X7   My work gives me a sense of accomplishment.
X8   My immediate work group functions as a team.
X9   My pay reflects the effort I put into doing my work.
X10  My supervisor is friendly and helpful.
X11  The members of my work group have the skills and/or training to do their job well.
X12  The benefits I receive are reasonable.

Customer

Restaurant Perceptions
X1   Excellent Food Quality
X2   Attractive Interior
X3   Generous Portions
X4   Excellent Food Taste
X5   Good Value for the Money
X6   Friendly Employees
X7   Appears Clean & Neat
X8   Fun Place to Go
X9   Wide Variety of Menu Items
X10  Reasonable Prices
X11  Courteous Employees
X12  Competent Employees

Selection Factor Rankings
X13  Food Quality
X14  Atmosphere
X15  Prices
X16  Employees

Relationship Variables
X17  Satisfaction
X18  Likely to Return in Future
X19  Recommend to Friend
X20  Frequency of Patronage
X21  Length of Time a Customer

Grouping/classification variables
X22  Gender
X23  Which AD Viewed (no. 1, 2 or 3)
X24  AD Rating: good or bad
X25  Viewed Ads: yes or no
