
PROJECT REPORT

MULTIPLE DISCRIMINANT
ANALYSIS
In partial fulfilment of covering the course of Business Research
Methods

Submitted To:

Prof. Alok Kumar


Faculty, BRM
FORE School of Management

Submitted By:

Danish Sharma (251020)


Deeksha Dixit (251021)
Parth Dhingra (251036)
Safal Tyagi (251048)
Samik Mahotra (251053)
Tanmay Mathur (251061)

Student, FORE School of Management


New Delhi
FMG-25A

March 15, 2017


ACKNOWLEDGEMENT

This report is submitted as an end-term project on Multiple Discriminant Analysis for the
course Business Research Methods. We are thankful to our course professor, Prof. Alok Kumar,
FORE School of Management, New Delhi, for his guidance and assistance in the completion of
this course and project.

Danish Sharma

Deeksha Dixit

Parth Dhingra

Safal Tyagi

Samik Mahotra

Tanmay Mathur

FMG-25A
Fore School of Management
New Delhi
Introduction
Multiple Discriminant Analysis (MDA) is a method for compressing a multivariate signal to
yield a lower-dimensional signal amenable to classification.

MDA is not directly used to perform classification. It merely supports classification by
yielding a compressed signal amenable to classification. The method described in Duda et al.
(2001), Section 3.8.3, projects the multivariate signal down to an (M - 1)-dimensional space,
where M is the number of categories.

MDA is useful because most classifiers are strongly affected by the curse of dimensionality.
In other words, when signals are represented in very-high-dimensional spaces, the classifier's
performance is catastrophically impaired by the overfitting problem. This problem is reduced
by compressing the signal down to a lower-dimensional space as MDA does.

Multiple discriminant analysis (MDA), also known as canonical variates analysis (CVA) or
canonical discriminant analysis (CDA), constructs functions to maximally discriminate
between n groups of objects. This is an extension of linear discriminant analysis
(LDA) which - in its original form - is used to construct discriminant functions for objects
assigned to two groups.

Following a significant MANOVA result, the MDA procedure attempts to construct
discriminant functions (to be used as axes) from linear combinations of the original variables.
Each axis is constructed in a manner that maximizes the differences between groups while
being uncorrelated (orthogonal) to other axes in multivariate space (Figure 1). Thus, the most
'powerful' discriminatory functions are followed by functions that account for whatever
discriminatory potential is 'left over'. Together, the functions describe a hyperspace that best
separates the groups in multivariate space.

Key assumptions
1. The distribution of the original variables is assumed to be (close to) multivariate normal
in each group.
2. Explanatory variables are continuous. Categorical explanatory variables should be
evaluated by, e.g., discriminant correspondence analysis.
3. The covariance matrices of each group should be (near) equal.
4. It is assumed that multivariate linear functions can be used to discriminate between
groups.
5. The number of samples (objects) must be greater than the number of variables in the
analysis.
6. There should be at least two objects per group.
7. Variables should be homoscedastic. If the mean of a variable is correlated with its
variance, significance tests may be invalid.
8. There should be no linear dependency between explanatory variables.

Warnings
1. MDA is sensitive to outliers. These should be identified and treated accordingly.
2. MDA is only suitable when evaluating the variables' ability to linearly discriminate
between any grouping.
3. Highly correlated variables will contribute very similarly to an MDA solution and may
be redundant. Thus, variables that are uncorrelated are preferable.
4. While unequal group sizes can be tolerated, very large differences in group sizes can
distort results, particularly if there are very few (< 20) objects per group.
5. If MANOVA tests on a given set of explanatory variables are insignificant, MDA is
unlikely to be useful.
6. When interpreting the coefficients of a discriminant function, carefully distinguish
between standardized and unstandardized coefficients.
7. Heteroscedasticity is likely to lead to invalid significance tests.
8. Across implementations, the absolute values of discriminant weights may vary due to
different scaling and standardization approaches, but their relative proportions should be
the same.
Chapter 1

In this study, a large international air carrier has collected data on employees in three
different job classifications:
1) customer service personnel
2) mechanics
3) dispatchers
The director of Human Resources wants to know if these three job classifications appeal
to different personality types. Each employee is administered a battery of psychological
tests that include measures of interest in outdoor activity, sociability and
conservativeness.
The dataset has 244 observations on four variables: the job classification and three
psychological variables (Outdoor Interests, Social and Conservative Nature).
The following table summarizes a sample of the data:

Objective of the Problem


The objectives of the problem are as follows:
1. To understand the working of Multiple Discriminant Analysis using SPSS.
2. To understand how a model can be generated to categorize data into different
categories.
Data Analysis
Multiple Discriminant Analysis was done on the basis of this sample.

Here Outdoor, Social and Conservative are independent variables measured on a continuous scale.
On the basis of these independent variables, the job categories were evaluated.
As there are 3 job categories and 3 independent variables, multiple discriminant analysis
creates 2 discriminant functions, i.e. the smaller of (number of categories - 1) and the
number of independent variables.
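The count of discriminant functions follows a simple rule that can be sketched in a few lines. The function name below is hypothetical and only illustrates the rule (the smaller of groups minus one and the number of predictors); it is not part of SPSS.

```python
def n_discriminant_functions(n_groups: int, n_predictors: int) -> int:
    """Maximum number of discriminant functions: min(groups - 1, predictors)."""
    if n_groups < 2 or n_predictors < 1:
        raise ValueError("need at least two groups and one predictor")
    return min(n_groups - 1, n_predictors)


# Job study: 3 job categories, 3 psychological predictors.
n_functions = n_discriminant_functions(n_groups=3, n_predictors=3)
print(n_functions)
```

With only two groups the same rule gives a single function, which is the ordinary two-group linear discriminant case.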

The above table shows the descriptive statistics of the data set, i.e. the mean and standard
deviation of each variable in the different categories. For example, the mean outdoor score
of the customer service job is 12.52 and its standard deviation is 4.649.

The tests of equality of group means check, one variable at a time, whether that variable
discriminates significantly between the groups.
Ho: The group means of the variable are equal (the variable does not discriminate).
H1: At least one group mean differs (the variable discriminates).
As the Sig. value of every variable is less than 0.05, the null hypothesis is rejected for
each of them and all the variables in the model are significant.

The above table shows the correlation between the independent variables. From the table we
can interpret that all the variables are only weakly correlated with each other.

The above table shows the eigenvalues of both discriminant functions of our model. The
eigenvalue is the ratio of the variation between groups to the variation within groups.
Function D1 has 77% of the discriminating ability of the model and function D2 has 22.9%
of the discriminating ability of the model.

Wilks' Lambda signifies the unexplained variation, hence the lower the better. As the p value
is less than 0.05, the null hypothesis that the functions do not discriminate is rejected.

Standardized function coefficients show the relative contribution of each variable to a
discriminant function; the correlations between the variables and the functions are
reported separately in the structure matrix.
Unstandardized coefficients are the coefficients of the independent variables in the
discriminant functions:
D1 = 0.09X1 - 0.194X2 + 0.155X3 + 0.937
D2 = 0.225X1 + 0.05X2 - 0.087X3 - 3.623
where X1 = Outdoor Score
X2 = Social Score
X3 = Conservative Score

75% of the original grouped cases are correctly classified by the Model which SPSS
created.

Recommendation and Suggestions


The model classifies 75% of the data correctly, hence it is a model with decent accuracy for
categorization.

Chapter 2

Objective of the Problem


A biologist wants to establish if there is a relationship between the type of iris of a flower
and the length and breadth of its sepals and petals.
The type of iris for a flower can be classified into:
1. Setosa
2. Virginica
3. Versicolor

The hypothesis is that the type of iris depends upon the length and width of the petal and sepal.

Data Analysis
The input variables and data are as follows:
The output is as follows:
Multiple discriminant analysis of 3 categories gives 2 discriminant functions, i.e. D1 and
D2. The tests of equality of group means check, one variable at a time, whether that variable
discriminates significantly between the groups.
Ho: The group means of the variable are equal (the variable does not discriminate).
H1: At least one group mean differs (the variable discriminates).
As the Sig. value of every variable is less than 0.05, the null hypothesis is rejected for
each of them and all the variables are significant. The Pooled Within-Groups Matrices show
the correlation between the independent variables.
Eigenvalue = variation between groups / variation within groups.
In this analysis, the first function accounts for 99.7% of the discriminating ability of the
discriminating variables and the second function accounts for 0.3%.

We can verify this by noting that the sum of the eigenvalues is 107.901 + 0.320 = 108.221.
Then (107.901/108.221) = 0.997, i.e. 99.7%,
and (0.320/108.221) = 0.003, i.e. 0.3%.
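The same check can be written as a short script. This is only a sketch of the arithmetic, using the two eigenvalues quoted above; the variable names are illustrative.

```python
# Eigenvalues of the two discriminant functions, as quoted in the text.
eigenvalues = [107.901, 0.320]

total = sum(eigenvalues)
# Share of the discriminating ability carried by each function.
shares = [ev / total for ev in eigenvalues]

for number, share in enumerate(shares, start=1):
    print(f"Function {number}: {share:.1%} of discriminating ability")
```

The shares always sum to 1, so a very large first eigenvalue leaves little for the second function.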
Wilks' Lambda signifies the unexplained variation, hence the lower the better.
As the p value is less than 0.05, the null hypothesis that the functions do not discriminate
is rejected.
Standardized function coefficients show the relative contribution of each variable to a
discriminant function. The unstandardized coefficients give the discriminant functions:
D1 = 2.102 * length of sepal - 0.057 * width of sepal + 5.059 * length of petal - 1.703 *
width of petal - 30.017
D2 = 0.645 * length of sepal + 1.743 * width of sepal - 1.228 * length of petal + 2.7 * width
of petal - 7.978
The magnitudes of these coefficients indicate how strongly the discriminating variables affect
the score.
From the above output, it can be inferred that the first function carries almost all of the
discriminating ability (99.7%).

Recommendations and Suggestions


The model classifies the majority of the data correctly, with a very high percentage of
accuracy for categorization.

Chapter 3

Objective of the Problem


A researcher wanted to study the relationship of blood pressure (BP) status (Low, Normal, High)
with four other variables: Age, Weight, Body Surface Area (BSA) and Pulse. He recruited 42
adults and recorded their BP, Age, Weight, BSA and Pulse values.

Data Analysis
BP values are categorized into: Low (<120), Normal (120 to 140), High (>140)

Low is denoted by 1, Normal by 2, High by 3

BLOOD PRESSURE(code) AGE(in yrs) WEIGHT(kg) BSA(m2) PULSE(bpm)
1 27 75.00 2.27 62.00
1 22 68.00 2.30 61.00
2 26 80.00 2.62 70.00
1 29 62.00 2.36 65.00
1 21 59.00 2.31 62.00
2 21 65.00 2.61 73.00
1 27 71.00 2.40 69.00
2 24 68.00 2.38 76.00
2 29 76.00 2.43 75.00
1 28 70.00 2.31 67.00
1 23 63.00 2.26 63.00
2 26 67.00 2.51 73.00
2 22 62.00 2.53 71.00
1 27 73.00 2.34 66.00
1 40 74.00 2.36 68.00
2 36 68.00 2.47 72.00
3 48 77.00 2.49 78.00
2 31 73.00 2.32 74.00
2 27 61.00 2.35 71.00
3 39 85.00 2.54 79.00
2 32 73.00 2.36 72.00
3 46 78.00 2.47 76.00
2 32 69.00 2.34 73.00
3 53 83.00 2.49 77.00
3 49 77.00 2.46 79.00
3 62 86.00 2.46 80.00
3 47 80.00 2.53 78.00
3 56 83.00 2.42 81.00
2 25 65.00 2.36 73.00
1 37 61.00 2.29 64.00
3 63 72.00 2.51 81.00
2 59 62.00 2.34 72.00
2 52 69.00 2.29 72.00
1 31 71.00 2.24 65.00
1 34 68.00 2.11 63.00
3 52 87.00 2.47 82.00
2 30 72.00 2.27 76.00
2 44 69.00 2.22 74.00
1 27 75.00 2.39 62.00
2 36 71.00 2.27 76.00
3 41 84.00 2.51 79.00
2 44 78.00 2.38 75.00

Discriminant
Notes

Output Created: 14-MAR-2017 19:58:37
Comments:
Input
  Data: C:\Users\danish\Desktop\BRM_PRO\BRM_Blood_Pressure.sav
  Active Dataset: DataSet1
  Filter: <none>
  Weight: <none>
  Split File: <none>
  N of Rows in Working Data File: 474
Missing Value Handling
  Definition of Missing: User-defined missing values are treated as missing in the
  analysis phase.
  Cases Used: In the analysis phase, cases with no user- or system-missing values for any
  predictor variable are used. Cases with user-, system-missing, or out-of-range values for
  the grouping variable are always excluded.
Syntax
  DISCRIMINANT
  /GROUPS=bldprssr(1 3)
  /VARIABLES=age weight BSA Pulse
  /ANALYSIS ALL
  /SAVE=CLASS SCORES PROBS
  /PRIORS EQUAL
  /STATISTICS=MEAN STDDEV UNIVF RAW CORR CROSSVALID
  /CLASSIFY=NONMISSING POOLED.
Resources
  Processor Time: 00:00:00.06
  Elapsed Time: 00:00:00.16
Variables Created or Modified
  Dis_6: Predicted Group for Analysis 1
  Dis1_14: Discriminant Scores from Function 1 for Analysis 1
  Dis2_14: Discriminant Scores from Function 2 for Analysis 1
  Dis1_15: Probabilities of Membership in Group 1 for Analysis 1
  Dis2_15: Probabilities of Membership in Group 2 for Analysis 1
  Dis3_15: Probabilities of Membership in Group 3 for Analysis 1
Number of unweighted cases written to the working file after classification: 474

Analysis Case Processing Summary


Unweighted Cases                                                       N     Percent
Valid                                                                  42    8.9
Excluded  Missing or out-of-range group codes                          0     .0
          At least one missing discriminating variable                 0     .0
          Both missing or out-of-range group codes and at least one
          missing discriminating variable                              432   91.1
          Total                                                        432   91.1
Total                                                                  474   100.0

Group Statistics

                                               Valid N (listwise)
bldprssr  Variable  Mean      Std. Deviation   Unweighted  Weighted
Low       age       28.6923   5.61819          13          13.000
          weight    68.4615   5.54700          13          13.000
          BSA       2.3031    .07598           13          13.000
          Pulse     64.3846   2.53438          13          13.000
Normal    age       33.1111   10.49307         18          18.000
          weight    69.3333   5.39062          18          18.000
          BSA       2.3917    .11490           18          18.000
          Pulse     73.2222   1.83289          18          18.000
High      age       50.5455   7.68588          11          11.000
          weight    81.0909   4.65735          11          11.000
          BSA       2.4864    .03501           11          11.000
          Pulse     79.0909   1.81409          11          11.000
Total     age       36.3095   12.10621         42          42.000
          weight    72.1429   7.45579          42          42.000
          BSA       2.3890    .11113           42          42.000
          Pulse     72.0238   6.05055          42          42.000

Tests of Equality of Group Means

Wilks' Lambda F df1 df2 Sig.


age .473 21.740 2 39 .000
weight .474 21.646 2 39 .000
BSA .604 12.770 2 39 .000
Pulse .111 155.662 2 39 .000

Pooled Within-Groups Matrices

age weight BSA Pulse

Correlation age 1.000 .013 -.439 .213

weight .013 1.000 .059 .262

BSA -.439 .059 1.000 -.135

Pulse .213 .262 -.135 1.000

Summary of Canonical Discriminant Functions

Eigenvalues

Function  Eigenvalue  % of Variance  Cumulative %  Canonical Correlation
1         10.311a     92.7           92.7          .955
2         .806a       7.3            100.0         .668

a. First 2 canonical discriminant functions were used in the analysis.

Wilks' Lambda

Test of Function(s) Wilks' Lambda Chi-square df Sig.

1 through 2 .049 113.139 8 .000


2 .554 22.171 3 .000

Standardized Canonical
Discriminant Function
Coefficients

Function

1 2

age .340 .717


weight .011 .766
BSA .518 .263
Pulse .872 -.559
Structure Matrix

          Function
          1       2
Pulse     .877*   -.241
BSA       .251*   .068
weight    .274    .644*
age       .298    .494*

Pooled within-groups correlations between discriminating variables and standardized canonical discriminant functions
Variables ordered by absolute size of correlation within function.

*. Largest absolute correlation between each variable and any discriminant function

Canonical Discriminant Function


Coefficients

Function

1 2

age .040 .084


weight .002 .145
BSA 5.851 2.966
Pulse .421 -.270
(Constant) -45.926 -1.195

Unstandardized coefficients

Functions at Group Centroids

Function

bldprssr 1 2

Low -4.033 .631


Normal .387 -.993
High 4.134 .880

Unstandardized canonical
discriminant functions evaluated at
group means

Classification Statistic
Classification Processing Summary

Processed                                                     474
Excluded  Missing or out-of-range group codes                 0
          At least one missing discriminating variable        432
Used in Output                                                42

Prior Probabilities for Groups

Cases Used in Analysis

bldprssr Prior Unweighted Weighted

Low .333 13 13.000


Normal .333 18 18.000
High .333 11 11.000
Total 1.000 42 42.000

Classification Results (a, c)

                                      Predicted Group Membership
                           bldprssr   Low     Normal  High    Total
Original             Count Low        12      1       0       13
                           Normal     0       18      0       18
                           High       0       0       11      11
                     %     Low        92.3    7.7     .0      100.0
                           Normal     .0      100.0   .0      100.0
                           High       .0      .0      100.0   100.0
Cross-validated (b)  Count Low        12      1       0       13
                           Normal     0       18      0       18
                           High       0       0       11      11
                     %     Low        92.3    7.7     .0      100.0
                           Normal     .0      100.0   .0      100.0
                           High       .0      .0      100.0   100.0

a. 97.6% of original grouped cases correctly classified.
b. Cross validation is done only for those cases in the analysis. In cross validation, each case
is classified by the functions derived from all cases other than that case.
c. 97.6% of cross-validated grouped cases correctly classified.

As it is a multiple discriminant analysis of 3 categories, there will be 2 discriminant
functions, i.e. D1 and D2.
The tests of equality of group means check, one variable at a time, whether that variable
discriminates significantly between the groups.
Ho: The group means of the variable are equal (the variable does not discriminate).
H1: At least one group mean differs (the variable discriminates).
As the Sig. value of every variable (age, weight, BSA and Pulse) is .000, i.e. less than 0.05,
the null hypothesis is rejected for each of them and all four variables are significant.
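The F values in the Tests of Equality of Group Means table can be recovered from the univariate Wilks' Lambda values. The sketch below assumes the standard one-way ANOVA relation F = ((1 - Lambda) / Lambda) * (df2 / df1) with df1 = groups - 1 and df2 = cases - groups; the function name is illustrative, and small differences from the table come from Lambda being rounded to three decimals.

```python
def f_from_wilks(wilks_lambda: float, n_cases: int, n_groups: int) -> float:
    """Univariate F statistic implied by a single-variable Wilks' Lambda."""
    df1 = n_groups - 1          # between-groups degrees of freedom
    df2 = n_cases - n_groups    # within-groups degrees of freedom
    return (1 - wilks_lambda) / wilks_lambda * df2 / df1


# (Wilks' Lambda, F reported by SPSS) for each variable, from the table above.
reported = {
    "age": (0.473, 21.740),
    "weight": (0.474, 21.646),
    "BSA": (0.604, 12.770),
    "Pulse": (0.111, 155.662),
}

computed = {name: f_from_wilks(lam, n_cases=42, n_groups=3)
            for name, (lam, _) in reported.items()}

for name, (_, f_table) in reported.items():
    print(f"{name}: computed F = {computed[name]:.2f}, table F = {f_table}")
```

Pulse has by far the smallest Lambda and therefore the largest F, which is consistent with it dominating the first discriminant function.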
The Pooled Within-Groups Matrices show the correlation between the independent variables.
In this analysis, the first function accounts for 92.7% of the discriminating ability of the
discriminating variables and the second function accounts for 7.3%.
We can verify this by noting that the sum of the eigenvalues is
10.311 + 0.806 = 11.117.
Then (10.311/11.117) = 0.927 and (0.806/11.117) = 0.073.
Wilks' Lambda signifies the unexplained variation, hence the lower the better.
As the p value is less than 0.05 for both tests (functions 1 through 2, and function 2 alone),
the null hypothesis is rejected and both functions are significant.
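The canonical correlations, Wilks' Lambda values and chi-square statistics in the SPSS output all follow from the two eigenvalues. The sketch below assumes the usual textbook relations (canonical correlation = sqrt(ev / (1 + ev)), Lambda = product of 1 / (1 + ev), and Bartlett's chi-square approximation); the helper names are illustrative and are not SPSS functions.

```python
import math

eigenvalues = [10.311, 0.806]           # from the Eigenvalues table
n_cases, n_predictors, n_groups = 42, 4, 3

# Canonical correlation of each function with group membership.
canonical_corr = [math.sqrt(ev / (1 + ev)) for ev in eigenvalues]


def wilks_lambda(evs):
    """Wilks' Lambda for a set of functions: product of 1 / (1 + eigenvalue)."""
    lam = 1.0
    for ev in evs:
        lam /= 1 + ev
    return lam


def bartlett_chi_square(lam):
    """Bartlett's chi-square approximation for a Wilks' Lambda value."""
    return -(n_cases - 1 - (n_predictors + n_groups) / 2) * math.log(lam)


def degrees_of_freedom(functions_removed):
    """df for the test after removing the first k functions: (p - k)(g - k - 1)."""
    k = functions_removed
    return (n_predictors - k) * (n_groups - k - 1)


lambda_1_2 = wilks_lambda(eigenvalues)       # test of functions 1 through 2
lambda_2 = wilks_lambda(eigenvalues[1:])     # test of function 2 alone
chi_1_2 = bartlett_chi_square(lambda_1_2)
chi_2 = bartlett_chi_square(lambda_2)

print(f"canonical correlations: {canonical_corr[0]:.3f}, {canonical_corr[1]:.3f}")
print(f"1 through 2: Lambda = {lambda_1_2:.3f}, chi-square = {chi_1_2:.2f}, df = {degrees_of_freedom(0)}")
print(f"2:           Lambda = {lambda_2:.3f}, chi-square = {chi_2:.2f}, df = {degrees_of_freedom(1)}")
```

The results agree with the output tables to rounding, which confirms that a small Lambda is simply the mirror image of large eigenvalues.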
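Standardized function coefficients show the relative contribution of each variable to a
discriminant function; the correlations between the variables and the functions are given
in the Structure Matrix.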
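The link between the unstandardized and standardized coefficient tables can be illustrated with the group statistics. The sketch assumes the usual convention that a standardized coefficient equals the unstandardized coefficient multiplied by the pooled within-groups standard deviation of that variable; the data structures are illustrative and the numbers are copied from the SPSS tables.

```python
import math

variables = ["age", "weight", "BSA", "Pulse"]

# Group size and per-variable standard deviations from the Group Statistics table.
group_sd = {
    "Low": (13, [5.61819, 5.54700, 0.07598, 2.53438]),
    "Normal": (18, [10.49307, 5.39062, 0.11490, 1.83289]),
    "High": (11, [7.68588, 4.65735, 0.03501, 1.81409]),
}

# Unstandardized coefficients from the Canonical Discriminant Function Coefficients table.
unstandardized = {
    1: [0.040, 0.002, 5.851, 0.421],
    2: [0.084, 0.145, 2.966, -0.270],
}


def pooled_within_sd(index):
    """Pooled within-groups standard deviation of the variable at `index`."""
    numerator = sum((n - 1) * sds[index] ** 2 for n, sds in group_sd.values())
    denominator = sum(n - 1 for n, _ in group_sd.values())
    return math.sqrt(numerator / denominator)


standardized = {
    function: [b * pooled_within_sd(i) for i, b in enumerate(coefs)]
    for function, coefs in unstandardized.items()
}

for function, coefs in standardized.items():
    pairs = ", ".join(f"{v} = {c:.3f}" for v, c in zip(variables, coefs))
    print(f"Function {function}: {pairs}")
```

BSA has a large unstandardized coefficient only because it is measured on a small scale; after standardization Pulse is the strongest contributor to function 1.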
The coefficients of discriminant functions 1 and 2 can be read from the Canonical
Discriminant Function Coefficients table (unstandardized coefficients) in the Output window:
F1 = 0.040*age + 0.002*weight + 5.851*BSA + 0.421*Pulse - 45.926
F2 = 0.084*age + 0.145*weight + 2.966*BSA - 0.270*Pulse - 1.195
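To show how the two functions are used, the sketch below scores a case with the coefficients from the Canonical Discriminant Function Coefficients table and assigns it to the group with the nearest centroid. Nearest-centroid assignment is a simplification chosen for illustration (it assumes equal priors and a pooled covariance matrix, as in the /PRIORS EQUAL and POOLED syntax); it is not a reproduction of the SPSS classification routine, and the function names are hypothetical.

```python
# (coefficients for age, weight, BSA, Pulse; constant), from the coefficients table.
COEFFS = {
    "F1": ([0.040, 0.002, 5.851, 0.421], -45.926),
    "F2": ([0.084, 0.145, 2.966, -0.270], -1.195),
}

# Functions at Group Centroids table: (F1, F2) per group.
CENTROIDS = {
    "Low": (-4.033, 0.631),
    "Normal": (0.387, -0.993),
    "High": (4.134, 0.880),
}


def scores(age, weight, bsa, pulse):
    """Return the (F1, F2) discriminant scores of one case."""
    x = [age, weight, bsa, pulse]
    return tuple(sum(b * v for b, v in zip(weights, x)) + constant
                 for weights, constant in COEFFS.values())


def classify(age, weight, bsa, pulse):
    """Assign the case to the group whose centroid is closest in (F1, F2) space."""
    f1, f2 = scores(age, weight, bsa, pulse)
    return min(CENTROIDS,
               key=lambda g: (f1 - CENTROIDS[g][0]) ** 2 + (f2 - CENTROIDS[g][1]) ** 2)


# First row of the data table: age 27, weight 75, BSA 2.27, pulse 62 (recorded as Low).
print(scores(27, 75.00, 2.27, 62.00))
print(classify(27, 75.00, 2.27, 62.00))
```

Scoring the group means from the Group Statistics table gives values close to the reported centroids, which is a quick consistency check on the signs of the coefficients.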

Recommendations and Suggestions


The model classifies 97.6% of the data correctly (41 of 42 cases), hence it is a model with
high accuracy for categorization.
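The 97.6% figure can be recomputed from the counts in the Classification Results table. The sketch below only restates that arithmetic; the dictionary layout is an illustrative choice.

```python
# Rows are actual groups, columns are predicted groups (original counts).
confusion = {
    "Low": {"Low": 12, "Normal": 1, "High": 0},
    "Normal": {"Low": 0, "Normal": 18, "High": 0},
    "High": {"Low": 0, "Normal": 0, "High": 11},
}

correct = sum(confusion[group][group] for group in confusion)
total = sum(sum(row.values()) for row in confusion.values())
accuracy = correct / total

# Share of each actual group that was classified correctly.
per_group = {group: row[group] / sum(row.values()) for group, row in confusion.items()}

print(f"{correct} of {total} cases correct: {accuracy:.1%}")
for group, share in per_group.items():
    print(f"{group}: {share:.1%}")
```

The single misclassified case is a Low case predicted as Normal, so the Normal and High groups are classified perfectly.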
Chapter 4

A nutritionist wants to establish if there is a relationship between the variety of milk
and the composition of the milk.
To determine this, suitable research was conducted. Initially the variables that have an
impact on the variety of milk were identified. These variables were:
A. Water Content
B. Fat Content
C. Carbs Content
Data was collected and the milk samples were classified into three groups as follows:
A. Skimmed (code = 0)
B. Toned (code = 1)
C. Double Toned (code = 2)
Objective of the Problem
1. To understand the working of multiple discriminant analysis using SPSS.
2. To understand if the composition of the milk is related to its variety.

Data Analysis
The sample taken for multiple discriminant analysis is as follows. Here 0 refers to skimmed
milk, 1 refers to toned milk and 2 refers to double toned milk.
Here Water, Fats and Carbs are independent variables measured on a continuous scale. On the
basis of these independent variables, the variety of milk is evaluated.
As there are 3 varieties of milk, multiple discriminant analysis creates 2 discriminant
functions.
The tests of equality of group means check, one variable at a time, whether that variable
discriminates significantly between the groups.
Ho: The group means of the variable are equal (the variable does not discriminate).
H1: At least one group mean differs (the variable discriminates).

As the Sig. value of every variable is less than 0.05, the null hypothesis is rejected for
each of them and all the variables in the model are significant.

The above table shows the correlation between the independent variables. From the table we
can interpret that all the variables are only weakly correlated with each other.

The above table shows the eigenvalues of both discriminant functions of our model. The
eigenvalue is the ratio of the variation between groups to the variation within groups.
Function D1 has 90.4% of the discriminating ability of the model and function D2 has 9.6%
of the discriminating ability of the model.
Wilks' Lambda signifies the unexplained variation, hence the lower the better. For the test
of functions 1 through 2 the p value is less than 0.05, so the null hypothesis is rejected.
The 2nd function on its own is insignificant, as its p value is more than 0.05.

Standardized function coefficients show the relative contribution of each variable to a
discriminant function.

Unstandardized coefficients are the coefficients of the independent variables in the
discriminant functions:
D1 = 1.731*Carbs + 0.512*Fat - 0.035*Water - 19.65
D2 = -1.104*Carbs + 1.511*Fat - 0.025*Water - 3.113

The distribution of the scores from each function is standardized to have a mean of zero
and standard deviation of one.
The magnitudes of these coefficients indicate how strongly the discriminating variables
affect the score.
Recommendation and Suggestions
The model classifies 90% of the data correctly, hence it is a model with decent accuracy for
categorization.
Chapter 5

A credit card bank has been in business for the last 14 years. During the last 2 years its
repayment defaults have shot up considerably. Even though the bank charges a penalty interest
on all late payments, this high default rate is putting a lot of pressure on the bank's recovery
mechanism and has now begun to impact its profitability in this activity. The problem appears
to be the credit appraisal mechanism used by the bank to evaluate credit card applicants at the
time of credit card allotment. Hence the bank desires to revamp its appraisal system using its
past experience.
To determine this, the bank conducted suitable research. Initially the variables that have an
impact on a consumer's creditworthiness were identified. These variables were:
A. Consumer's age.
B. Monthly household income.
C. Number of years married.
Historical data was collected from the bank's own records and consumers were classified into
three groups as follows:
A. High risk (code = 1)
B. Medium risk (code = 2)
C. Low risk (code = 3)
This was done based on the bank's experience with the customers during the last two years.
Objective of the Problem
1. To understand the working of multiple discriminant analysis using SPSS.
2. To understand if the risk allocated to each customer is related to age, income and
number of years married.

Data Analysis
The sample taken for multiple discriminant analysis is as follows. Here 1 refers to high
risk, 2 refers to medium risk and 3 refers to low risk.
Here Age, Income and Years of marriage are independent variables measured on a continuous
scale. On the basis of these independent variables, risk is evaluated.
As there are 3 risk categories, multiple discriminant analysis creates 2 discriminant
functions.
The table below shows the descriptive statistics of the data set, i.e. the mean and standard
deviation of each variable in the different categories. For example, the mean age of the high
risk group is 26.70 and its standard deviation is 4.785.
The tests of equality of group means check, one variable at a time, whether that variable
discriminates significantly between the groups.
Ho: The group means of the variable are equal (the variable does not discriminate).
H1: At least one group mean differs (the variable discriminates).

As the Sig. value of every variable is less than 0.05, the null hypothesis is rejected for
each of them and all the variables in the model are significant.

The above table shows the correlation between the independent variables. From the table we
can interpret that all the variables are only weakly correlated with each other.
The above table shows the eigenvalues of both discriminant functions of our model. The
eigenvalue is the ratio of the variation between groups to the variation within groups.
Function D1 has 98% of the discriminating ability of the model and function D2 has 2% of
the discriminating ability of the model.

Wilks' Lambda signifies the unexplained variation, hence the lower the better. For the test
of functions 1 through 2 the p value is less than 0.05, so the null hypothesis is rejected.
The 2nd function on its own is insignificant, as its p value is more than 0.05.

Standardized function coefficients show the relative contribution of each variable to a
discriminant function.

Unstandardized coefficients are the coefficients of the independent variables in the
discriminant functions:
D1 = 0.054X1 + 0.00X2 + 0.039X3 - 6.970
D2 = 0.228X1 + 0.00X2 - 0.316X3 - 5.526
where X1 = Age
X2 = Income
X3 = Years of Marriage

Recommendation and Suggestions


The model classifies 75% of the data correctly, hence it is a model with decent accuracy for
categorization.
Chapter 6

In this study, there are three prominent chocolate brands:

1) Cadbury
2) Nestle
3) Hersheys
A nutritionist wants to establish if there is a relationship between the chocolate
brand (50 gm) and the composition of the chocolate.
The dataset has 42 observations on three variables.
The variables are Sugar, Fat and Milk.
The following table summarizes a sample of the data:

Objective of the Problem


The objectives of the problem are as follows:
1. To understand the working of Multiple Discriminant Analysis using SPSS.
2. To understand how a model can be generated to categorize data into different
categories.

Data Analysis
Multiple Discriminant Analysis was done on the basis of this sample.
Here Sugar, Fat and Milk are independent variables measured on a continuous scale. On the
basis of these independent variables, the chocolate brands were evaluated.
As there are 3 chocolate brands, multiple discriminant analysis creates 2 discriminant
functions.

The above table shows the descriptive statistics of the data set, i.e. the mean and standard
deviation of each variable in the different categories. For example, the mean fat of Nestle
is 5.93 and its standard deviation is .829.

The tests of equality of group means check, one variable at a time, whether that variable
discriminates significantly between the groups.
Ho: The group means of the variable are equal (the variable does not discriminate).
H1: At least one group mean differs (the variable discriminates).
As the Sig. value of every variable is less than 0.05, the null hypothesis is rejected for
each of them and all the variables in the model are significant.

The above table shows the correlation between the independent variables. From the table we
can interpret that all the variables are only weakly correlated with each other.

The above table shows the eigenvalues of both discriminant functions of our model. The
eigenvalue is the ratio of the variation between groups to the variation within groups.
Function D1 has 99.3% of the discriminating ability of the model and function D2 has 0.7%
of the discriminating ability of the model.

Wilks' Lambda signifies the unexplained variation, hence the lower the better. As the p value
is less than 0.05 for D1, the null hypothesis is rejected for it, whereas the p value is
greater than 0.05 for D2, hence the null hypothesis cannot be rejected for it.
Standardized function coefficients show the relative contribution of each variable to a
discriminant function.

Unstandardized coefficients are the coefficients of the independent variables in the
discriminant functions:
D1 = 0.671*Fat + 0.606*Sugar + 0.303*Milk - 11.564
D2 = 0.479*Fat + 0.372*Sugar - 0.432*Milk + 0.182

92.9% of original grouped cases are correctly classified.
88.1% of cross-validated grouped cases are correctly classified.

Recommendation and Suggestions


The model classifies 92.9% of the data correctly, hence it is a model with decent accuracy for
categorization.
