
Discriminant Analysis

A Bose
BIMTECH

November 2009


What is Discriminant Analysis

Table of Contents

1. What is Discriminant Analysis
2. Discriminant Analysis
3. Calculations for Discriminant Analysis
4. Analysis of DA Calculations
5. Hypothesis Testing of equality of factor effects


Discriminant analysis undertakes the same task as Multiple Linear Regression (MLR): predicting an outcome. However, MLR is limited to cases where the dependent variable (Y) is measured at interval level, so that the regression equation, a weighted combination of the X values, produces an estimated mean numerical Y value for given values of the predictors. But many interesting variables are categorical, such as:

1. political party voting intention
2. migrant / non-migrant status
3. making a profit or not
4. holding a particular credit card
5. owning, renting or paying a mortgage for a house
6. employed / unemployed
7. satisfied versus dissatisfied employees
8. which customers are likely to buy a product or not buy
9. whether a person is a credit risk

Discriminant Analysis is used when

1. the dependent variable (DV) is categorical and the predictor independent variables (IVs) are at interval level, such as age, income, perceptions, and years of education
2. there are more than two DV categories, unlike binary logistic regression, which is limited to a dichotomous dependent variable.


Discriminant Analysis


Discriminant Analysis involves determining a linear equation, like a regression equation, that predicts which group a case belongs to. The form of the equation or function is:

$$D = v_1 X_1 + v_2 X_2 + v_3 X_3 + \cdots + v_k X_k + c$$

where
D = the discriminant function
v = the discriminant coefficient vector (weights vector), with elements $v_1, \ldots, v_k$
X = the vector of the respondent's scores on the variables, with elements $X_1, \ldots, X_k$
k = the number of predictor variables
c = a constant


A discriminant score
This is a weighted linear combination (sum) of the discriminating variables.

Assumptions of Discriminant Analysis
The underlying assumptions of DA are:

1. the observations are a random sample
2. each predictor variable is normally distributed


An example of Discriminant Analysis (from Panneerselvam, p. 424)
The Director of a management school wants to carry out a discriminant analysis of the effect of two factors, namely the yearly spending (in Rs. lakhs) on infrastructure of the school ($X_1$) and the yearly spending on interface events of the school ($X_2$), on the grading of the school by an inspection team. Based on the data, the committee has awarded one of two grades (Below or Above) for each year, as shown in the data table below.

Exercise - Worked out

1. Design the discriminant function, $Y = aX_1 + bX_2$.
2. Compute the discriminant ratio, K, and identify the variable which is more important in relation to the other variable.
3. Validate the discriminant function using the given data by forming groups based on the critical discriminant score.
4. Test whether the group means are equal in importance at a significance level of 0.05.

The hypotheses for this example are:
H0: The group means are equal in importance
H1: The group means are not equal in importance

Design of the discriminant function, $Y = aX_1 + bX_2$

Year   Grade   Infrastructure ($X_1$)   Interface Events ($X_2$)
1      Below   3                        4
2      Below   4                        5
3      Above   10                       7
4      Below   5                        4
5      Below   6                        6
6      Above   11                       4
7      Below   7                        4
8      Above   12                       5
9      Below   8                        7
10     Below   9                        5
11     Above   13                       6
12     Above   14                       8

Table for Group G1, Grade = Below

Year       Infrastructure ($X_1$)   Interface events ($X_2$)
1          3                        4
2          4                        5
4          5                        4
5          6                        6
7          7                        4
9          8                        7
10         9                        5
G1 Total   42                       35
G1 Mean    6                        5

Table for Group G2, Grade = Above

Year       Infrastructure ($X_1$)   Interface events ($X_2$)
3          10                       7
6          11                       4
8          12                       5
11         13                       6
12         14                       8
G2 Total   60                       30
G2 Mean    12                       6

Grand Mean: $\bar{X}_1 = 8.5$, $\bar{X}_2 = 5.417$


Calculations for Discriminant Analysis


Calculation of necessary results to solve Normal Equations

Group   Year             Standard   $X_1$   $X_2$   $X_1^2$   $X_2^2$   $X_1 X_2$
G1      1                Below      3       4       9         16        12
        2                Below      4       5       16        25        20
        4                Below      5       4       25        16        20
        5                Below      6       6       36        36        36
        7                Below      7       4       49        16        28
        9                Below      8       7       64        49        56
        10               Below      9       5       81        25        45
        Total of Below              42      35      280       183       217
        Mean of Below               6       5
G2      3                Above      10      7       100       49        70
        6                Above      11      4       121       16        44
        8                Above      12      5       144       25        60
        11               Above      13      6       169       36        78
        12               Above      14      8       196       64        112
        Total of Above              60      30      730       190       364
        Mean of Above               12      6
        Grand Mean                  8.5     5.417


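Sum of squares

Sum of squares                                                                         Below   Above   Total
$\sum (X_1 - \bar{X}_1)^2 = \sum X_1^2 - n\bar{X}_1^2$                                 28      10      38
$\sum (X_2 - \bar{X}_2)^2 = \sum X_2^2 - n\bar{X}_2^2$                                 8       10      18
$\sum (X_1 - \bar{X}_1)(X_2 - \bar{X}_2) = \sum X_1 X_2 - n\bar{X}_1\bar{X}_2$         7       4       11

Each Below and Above entry is computed within its own group, using that group's n and group means; for example, for Below, $280 - 7 \times 6^2 = 28$. The Total column is the sum of the two group entries.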
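The sums of squares and cross-products in this table can be checked with a short script. This is only a sketch in Python (not part of the original slides); the function name `within_group_sums` and the variable names are illustrative choices, and the data are the $(X_1, X_2)$ pairs of the worked example.

```python
# (X1, X2) pairs for each group, taken from the worked example.
below = [(3, 4), (4, 5), (5, 4), (6, 6), (7, 4), (8, 7), (9, 5)]
above = [(10, 7), (11, 4), (12, 5), (13, 6), (14, 8)]


def within_group_sums(rows):
    """Return the corrected sums (S11, S22, S12) for one group.

    S11 = sum(X1^2) - n * mean(X1)^2
    S22 = sum(X2^2) - n * mean(X2)^2
    S12 = sum(X1*X2) - n * mean(X1) * mean(X2)
    """
    n = len(rows)
    mean1 = sum(x1 for x1, _ in rows) / n
    mean2 = sum(x2 for _, x2 in rows) / n
    s11 = sum(x1 * x1 for x1, _ in rows) - n * mean1 ** 2
    s22 = sum(x2 * x2 for _, x2 in rows) - n * mean2 ** 2
    s12 = sum(x1 * x2 for x1, x2 in rows) - n * mean1 * mean2
    return s11, s22, s12


below_sums = within_group_sums(below)
above_sums = within_group_sums(above)
# The "Total" column is the element-wise sum of the two groups.
total_sums = tuple(b + a for b, a in zip(below_sums, above_sums))

print("Below:", below_sums)
print("Above:", above_sums)
print("Total:", total_sums)
```

The three totals are the coefficients that enter the normal equations.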


The normal equations are

$$a \sum (X_1 - \bar{X}_1)^2 + b \sum (X_1 - \bar{X}_1)(X_2 - \bar{X}_2) = \bar{X}_{1(G2)} - \bar{X}_{1(G1)}$$

$$a \sum (X_1 - \bar{X}_1)(X_2 - \bar{X}_2) + b \sum (X_2 - \bar{X}_2)^2 = \bar{X}_{2(G2)} - \bar{X}_{2(G1)}$$

Substituting the totals from the sum of squares table (38, 18 and 11), we have

$$38a + 11b = 12 - 6 = 6$$

$$11a + 18b = 6 - 5 = 1$$

From these simultaneous equations, a = 0.17229 and b = -0.04973. Hence, the discriminant function is:

$$Y = 0.17229 X_1 - 0.04973 X_2$$
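The two simultaneous equations can be solved by hand or, as a check, with Cramer's rule. The sketch below is an illustration only; `solve_2x2` is a hypothetical helper name, not something from the source.

```python
def solve_2x2(a11, a12, a21, a22, r1, r2):
    """Solve a11*a + a12*b = r1 and a21*a + a22*b = r2 by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("The normal equations are singular")
    a = (r1 * a22 - a12 * r2) / det
    b = (a11 * r2 - a21 * r1) / det
    return a, b


# Coefficients are the totals 38, 11, 18 from the sum of squares table;
# right-hand sides are the differences of group means (Above minus Below).
a, b = solve_2x2(38, 11, 11, 18, 12 - 6, 6 - 5)

print(round(a, 5), round(b, 5))
```

Here the determinant is 38 x 18 - 11 x 11 = 563, so a = 97/563 and b = -28/563, which round to the values quoted above.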

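The mean discriminant scores are

$$\bar{Y}_{\text{Below}} = 0.17229\bar{X}_1 - 0.04973\bar{X}_2 = 0.17229 \times 6 - 0.04973 \times 5 = 0.78509$$

$$\bar{Y}_{\text{Above}} = 0.17229\bar{X}_1 - 0.04973\bar{X}_2 = 0.17229 \times 12 - 0.04973 \times 6 = 1.7691$$

$$\bar{Y}_{\text{Grand mean}} = 0.17229\bar{X}_1 - 0.04973\bar{X}_2 = 0.17229 \times 8.5 - 0.04973 \times 5.417 = 1.19509$$

The last value, the discriminant score at the grand mean, is known as the Critical Discriminant Score.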


Discriminant function: $Y = 0.17229 X_1 - 0.04973 X_2$

Below (Group G1)

Data set (j)         Year   Discriminant score ($S_{1j}$)   $(S_{1j} - \bar{S}_1)^2$
1                    1      0.31795                         0.218220
2                    2      0.44051                         0.118735
3                    4      0.66253                         0.015021
4                    5      0.73536                         0.002473
5                    7      1.00711                         0.049293
6                    9      1.03021                         0.060084
7                    10     1.30196                         0.267155
Total                       5.49563                         0.730981
Mean ($\bar{S}_1$)          0.78509

Above (Group G2)

Data set (j)         Year   Discriminant score ($S_{2j}$)   $(S_{2j} - \bar{S}_2)^2$
1                    3      1.37479                         0.155480
2                    6      1.69627                         0.005304
3                    8      1.81883                         0.002473
4                    11     1.94139                         0.029684
5                    12     2.01422                         0.060084
Total                       8.84550                         0.253025
Mean ($\bar{S}_2$)          1.7691

Grand total of discriminant scores = 14.34113
Grand mean of discriminant scores ($\bar{S}$) = 1.195094
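The individual discriminant scores can be reproduced by applying the discriminant function to each year's data. The following Python sketch assumes the rounded coefficients 0.17229 and -0.04973 quoted in the example; the names `score` and `group_scores` are illustrative.

```python
# Rounded coefficients of the discriminant function from the example.
A, B = 0.17229, -0.04973

# Year -> (X1, X2), split by the grade awarded in the example.
below = {1: (3, 4), 2: (4, 5), 4: (5, 4), 5: (6, 6),
         7: (7, 4), 9: (8, 7), 10: (9, 5)}
above = {3: (10, 7), 6: (11, 4), 8: (12, 5), 11: (13, 6), 12: (14, 8)}


def score(x1, x2):
    """Discriminant score Y = A*X1 + B*X2 for one data set."""
    return A * x1 + B * x2


def group_scores(group):
    """Map each year in a group to its discriminant score."""
    return {year: score(x1, x2) for year, (x1, x2) in group.items()}


below_scores = group_scores(below)
above_scores = group_scores(above)
below_mean = sum(below_scores.values()) / len(below_scores)
above_mean = sum(above_scores.values()) / len(above_scores)

for year, s in sorted({**below_scores, **above_scores}.items()):
    print(year, round(s, 5))
print("Mean (Below):", round(below_mean, 5))
print("Mean (Above):", round(above_mean, 5))
```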


Analysis of DA Calculations


The variability between groups ($V_{BG}$): sum of squares between groups

$$V_{BG} = n_1(\bar{S}_1 - \bar{S})^2 + n_2(\bar{S}_2 - \bar{S})^2 = 7(0.78509 - 1.195094)^2 + 5(1.7691 - 1.195094)^2 = 2.824137$$

The variability within groups ($V_{WG}$): sum of squares within groups

$$V_{WG} = \sum_{j=1}^{7}(S_{1j} - \bar{S}_1)^2 + \sum_{j=1}^{5}(S_{2j} - \bar{S}_2)^2 = 0.730981 + 0.253025 = 0.984006$$

The discriminant ratio, K

$$K = \frac{V_{BG}}{V_{WG}} = \frac{2.824137}{0.984006} = 2.87$$
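As a check on the arithmetic, the between-group and within-group variability can be recomputed from the tabulated discriminant scores. This is a sketch only, using the scores as rounded in the example; `mean` is a small helper defined here for illustration.

```python
# Discriminant scores as tabulated in the example (rounded to 5 decimals).
below_scores = [0.31795, 0.44051, 0.66253, 0.73536, 1.00711, 1.03021, 1.30196]
above_scores = [1.37479, 1.69627, 1.81883, 1.94139, 2.01422]
groups = (below_scores, above_scores)


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


grand_mean = mean(below_scores + above_scores)

# Between groups: group size times squared distance of group mean from grand mean.
vbg = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

# Within groups: squared distance of each score from its own group mean.
vwg = sum((s - mean(g)) ** 2 for g in groups for s in g)

k = vbg / vwg
print(round(vbg, 6), round(vwg, 6), round(k, 2))
```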


Validation based on the Critical Discriminant Score (1.19509)
If the discriminant score of a data set is < 1.19509, include that data set in the group corresponding to the Below category. If the discriminant score of a data set is > 1.19509, include that data set in the group corresponding to the Above category.

Classification of Data Sets based on the Critical Discriminant Score

Year   Original Classification   Revised Classification   Status
1      Below                     Below                    Unchanged
2      Below                     Below                    Unchanged
3      Above                     Above                    Unchanged
4      Below                     Below                    Unchanged
5      Below                     Below                    Unchanged
6      Above                     Above                    Unchanged
7      Below                     Below                    Unchanged
8      Above                     Above                    Unchanged
9      Below                     Below                    Unchanged
10     Below                     Above                    Changed
11     Above                     Above                    Unchanged
12     Above                     Above                    Unchanged

Direction for including a future data set
In future, if the values of the predictor variables $X_1$ and $X_2$ are known for a year, its discriminant score can be obtained using the discriminant function. Then, as per the guidelines stated above, that year can be included in the appropriate group.
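The validation rule can be expressed as a small classifier. The sketch below is illustrative: `classify` is a hypothetical name, and because the slides do not say what to do when a score equals the critical score exactly, this version assigns such a case to Below as an assumption.

```python
# Rounded coefficients and critical discriminant score from the example.
A, B = 0.17229, -0.04973
CRITICAL_SCORE = 1.19509

# Year -> (original grade, X1, X2) from the example data table.
records = {
    1: ("Below", 3, 4), 2: ("Below", 4, 5), 3: ("Above", 10, 7),
    4: ("Below", 5, 4), 5: ("Below", 6, 6), 6: ("Above", 11, 4),
    7: ("Below", 7, 4), 8: ("Above", 12, 5), 9: ("Below", 8, 7),
    10: ("Below", 9, 5), 11: ("Above", 13, 6), 12: ("Above", 14, 8),
}


def classify(x1, x2):
    """Assign a data set to Above or Below using the critical score.

    Assumption: a score exactly equal to the critical score goes to Below.
    """
    return "Above" if A * x1 + B * x2 > CRITICAL_SCORE else "Below"


revised = {year: classify(x1, x2) for year, (_, x1, x2) in records.items()}
changed = [year for year, (grade, _, _) in records.items()
           if revised[year] != grade]

for year in sorted(records):
    status = "Changed" if year in changed else "Unchanged"
    print(year, records[year][0], revised[year], status)
```

A future year is handled the same way: call `classify` with its $X_1$ and $X_2$ values.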

Hypothesis Testing of equality of factor eects


Hypothesis Testing for Equality of effect of the two factors
H0: The factors $X_1$ (Infrastructure) and $X_2$ (Interface events) are equal in importance
H1: The factors $X_1$ (Infrastructure) and $X_2$ (Interface events) are not equal in importance

The formula to compute F is shown below:

$$F = \frac{n_1 n_2 (n_1 + n_2 - m - 1)}{m(n_1 + n_2)(n_1 + n_2 - 2)} D^2$$

where m is the number of predictor variables (in this case, 2) and

$$D^2 = (n_1 + n_2 - 2)\{a[\bar{X}_{1(G2)} - \bar{X}_{1(G1)}] + b[\bar{X}_{2(G2)} - \bar{X}_{2(G1)}]\} = (7 + 5 - 2)(0.17229 \times 6 - 0.04973 \times 1) = 9.8401$$

and

$$F = \frac{7 \times 5 \times (7 + 5 - 2 - 1)}{2 \times (7 + 5)(7 + 5 - 2)} \times 9.8401 = 12.915$$


The degrees of freedom for the F ratio are $(m, n_1 + n_2 - m - 1) = (2, 9)$, where m = 2 is the number of factors. The table value is $F_{0.05,(2,9)} = 4.26$. Since $F_{\text{observed}} = 12.915 > F_{\text{critical}} = 4.26$, we reject H0: the factors $X_1$ (Infrastructure) and $X_2$ (Interface events) are not equal in importance. Based on H1 and the discriminant function, in which the coefficient of $X_1$ (0.17229) is larger in magnitude than that of $X_2$ (0.04973), the variable $X_1$ (annual spending on infrastructure) is more important than the other variable $X_2$ (annual spending on interface events).
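The F computation can be verified with a few lines of Python. This is a sketch under the example's own numbers: the function name `f_statistic` is illustrative, and the critical value 4.26 is simply the table value quoted above rather than something computed here.

```python
def f_statistic(n1, n2, m, d2):
    """F = n1*n2*(n1+n2-m-1) / (m*(n1+n2)*(n1+n2-2)) * D^2."""
    return n1 * n2 * (n1 + n2 - m - 1) / (m * (n1 + n2) * (n1 + n2 - 2)) * d2


a, b = 0.17229, -0.04973      # rounded discriminant coefficients
n1, n2, m = 7, 5, 2           # group sizes (Below, Above) and number of predictors
diff_x1 = 12 - 6              # mean of X1 in G2 minus mean of X1 in G1
diff_x2 = 6 - 5               # mean of X2 in G2 minus mean of X2 in G1

d2 = (n1 + n2 - 2) * (a * diff_x1 + b * diff_x2)
f_observed = f_statistic(n1, n2, m, d2)

F_CRITICAL = 4.26             # table value F(0.05; 2, 9) quoted in the example
reject_h0 = f_observed > F_CRITICAL

print(round(d2, 4), round(f_observed, 3), reject_h0)
```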


Problem 1 - from Panneerselvam, p. 481
The performance standard of employees, as a function of their age ($X_1$) and family size ($X_2$), is classified into Above Average and Below Average. The data on 10 different employees in a company are presented below:

Employee   Standard   $X_1$   $X_2$
1          Below      43      3
2          Below      24      4
3          Above      30      6
4          Below      55      3
5          Below      56      5
6          Above      41      3
7          Below      37      3
8          Above      22      4
9          Below      38      6
10         Below      59      4

(a) Design the discriminant function, $Y = aX_1 + bX_2$
(b) Compute the discriminant ratio, K, and identify the variable which is more important in relation to the other variable
(c) Validate the discriminant function using the given data by forming groups based on the critical discriminant score
(d) Test whether the group means are equal in importance at a significance level of 0.05


Problem 2 - from Panneerselvam, p. 482
The potential customers of a computer company rate the product of the company as good or bad based on the time to respond to breakdown calls ($X_1$) and the percentage discount on product price ($X_2$). The ratings by customers are presented below:

Customer   Rating   $X_1$ (hrs)   $X_2$ (%)
1          Good     24            5
2          Good     12            8
3          Bad      36            4
4          Bad      12            0
5          Bad      36            3
6          Good     36            10
7          Good     24            3
8          Bad      48            4
9          Bad      96            5
10         Good     36            12

(a) Design the discriminant function, $Y = aX_1 + bX_2$
(b) Compute the discriminant ratio, K, and identify the variable which is more important in relation to the other variable
(c) Validate the discriminant function using the given data by forming groups based on the critical discriminant score
(d) Test whether the group means are equal in importance at a significance level of 0.05
