
IISWBM

C19 Advanced Marketing Research

Chinmoy Jana

Consolidated by: Debanjan Datta
CHAPTER 12: CORRELATION & REGRESSION

Correlation and Regression
Dr Chinmoy Jana
IISWBM, Kolkata
Email: chinmoyjana@yahoo.com

What is Correlation?
- Correlation is defined as the degree of relationship between two or more variables.
- The correlation between two random variables, X and Y, is a measure of the degree of linear association between the two variables.
- It is the strength and direction of a linear relationship between two variables.
- Correlation between two variables is called simple correlation.
- Correlation between one variable and several other variables is called multiple correlation.
Correlation: Graphical Presentation (Scatter Graph)
- The population correlation, denoted by ρ, can take on any value from -1 to 1.
- Positive correlation: when two variables X and Y move in the same direction, the correlation between the two is positive.
- Negative correlation: when two variables X and Y move in opposite directions, the correlation is negative.
- Zero correlation: the correlation between two variables X and Y is zero when the variables move with no connection to each other.

ρ = -1 indicates a perfect negative linear relationship
-1 < ρ < 0 indicates a negative linear relationship
ρ = 0 indicates no linear relationship
0 < ρ < 1 indicates a positive linear relationship
ρ = 1 indicates a perfect positive linear relationship

The absolute value of ρ indicates the strength or exactness of the relationship.

[Figure: scatter plots of Y against X illustrating ρ = -1, ρ = -0.8, ρ = 0, ρ = 0.8 and ρ = 1.]

Quantitative Estimate of a Linear Correlation

The correlation coefficient is defined by

r = Covariance of x and y / [(Standard deviation of x)(Standard deviation of y)]

where the covariance of x and y is a measure of the joint variation in x and y, and -1 ≤ r ≤ 1.

The linear correlation between two variables X and Y is given by Karl Pearson as:

r = Σ(x - x̄)(y - ȳ) / √[Σ(x - x̄)² Σ(y - ȳ)²]

Spearman Rank Correlation

When the data on two variables are given in the form of ranks of the two variables based on some criterion, the Spearman rank correlation is used:

r_s = 1 - 6 Σ d_i² / [n(n² - 1)]

where d_i is the difference in the ranks of the i-th individual or unit and n is the number of individuals or units.

Spurious correlation: a misleading correlation coefficient.
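A minimal computational sketch of the two coefficients defined above; the short x and y vectors are made-up illustrative data, not from the slides.

import numpy as np
from scipy import stats

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([1.5, 3.9, 6.1, 7.8, 11.2])

# Pearson r = Cov(x, y) / (sd_x * sd_y), computed directly from the definition
r_manual = np.cov(x, y, bias=True)[0, 1] / (x.std() * y.std())

# Library equivalents
r_pearson, _ = stats.pearsonr(x, y)
r_spearman, _ = stats.spearmanr(x, y)   # rank-based: r_s = 1 - 6*sum(d_i^2)/(n(n^2-1))

print(round(r_manual, 3), round(r_pearson, 3), round(r_spearman, 3))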

Regression Analysis
- The relationship between two variables can be known in degrees ranging from ignorance to knowledge, as follows:
  - Ignorance: we simply do not know whether any relationship exists at all.
  - We know that the two are related but do not know anything beyond this.
  - We know that the two are positively or negatively related, but are ignorant beyond this aspect of correlation.
  - We know the nature of the relationship, i.e., whether it is linear or curvilinear.
  - Knowledge: we know exactly the mathematical equation of the relationship, so that if one of the variables is known, the other can be derived from the equation.
- In real life the knowledge type of situation is very rare, and one has to be content with the next best, i.e., a statistical relationship.

Simple Linear Regression Model

The population simple linear regression model:

Y = β0 + β1 X + ε

where β0 + β1 X is the nonrandom (systematic) component and ε is the random component, and
- Y is the dependent variable, the variable we wish to explain or predict;
- X is the independent variable, also called the predictor variable;
- ε is the error term, the only random component in the model, and thus the only source of randomness in Y;
- β0 is the intercept of the systematic component of the regression relationship;
- β1 is the slope of the systematic component.


Graphically: Simple Linear Regression Model
- The simple linear regression model gives an exact linear relationship between the expected or average value of Y, the dependent variable, and X, the independent or predictor variable:

  E[Y_i] = β0 + β1 X_i

- Actual observed values of Y differ from the expected value by an unexplained or random error:

  Y_i = E[Y_i] + ε_i = β0 + β1 X_i + ε_i

- [Figure: regression plot showing the line E[Y] = β0 + β1 X, the intercept β0, the slope β1, and the error ε_i for an observed point Y_i at X_i.]

Assumptions
- The relationship between X and Y is a straight-line relationship.
- The values of the independent variable X are assumed fixed (not random); the only randomness in the values of Y comes from the error term ε_i.
- i) The distribution of the ε_i is normal. Implication: the errors are symmetrical, with both positive and negative values.
- ii) E(ε_i) = 0. Implication: the sum of positive and negative errors is zero.
- iii) Var(ε_i) = σ² for all values of i. Implication: fluctuations in all error terms are of the same magnitude.
- iv) r(ε_i, ε_j) = 0. Implication: error terms are uncorrelated, i.e., one error term does not influence another error term.
- [Figure: identical normal distributions of errors, all centered on the regression line E[Y] = β0 + β1 X.]

Estimation: The Method of Least Squares
- Estimation of a simple linear regression relationship involves finding estimated or predicted values of the intercept and slope of the linear regression line.
- The estimated regression equation:

  Y = b0 + b1 X + e

  where b0 estimates the intercept of the population regression line, β0; b1 estimates the slope of the population regression line, β1; and e stands for the observed errors, the residuals from fitting the estimated regression line b0 + b1 X to a set of n points.
- The estimated regression line:

  Ŷ = b0 + b1 X

  where Ŷ (Y-hat) is the value of Y lying on the fitted regression line for a given value of X.

Least Squares Regression
- The sum of squared errors in regression is:

  SSE = Σ_{i=1}^{n} e_i² = Σ_{i=1}^{n} (y_i - ŷ_i)²

- The least squares regression line is the one that minimizes the SSE with respect to the estimates b0 and b1.
- The normal equations:

  Σ y_i = n b0 + b1 Σ x_i
  Σ x_i y_i = b0 Σ x_i + b1 Σ x_i²

- At the least squares values of b0 and b1, SSE is minimized with respect to b0 and b1.
Sums of Squares, Cross Products, and Least Squares Estimators

Sums of squares and cross products:

SS_x = Σ(x - x̄)² = Σx² - (Σx)²/n
SS_y = Σ(y - ȳ)² = Σy² - (Σy)²/n
SS_xy = Σ(x - x̄)(y - ȳ) = Σxy - (Σx)(Σy)/n

Least squares regression estimators:

b1 = SS_xy / SS_x
b0 = ȳ - b1 x̄

Errors in Regression
[Figure: the fitted regression line Ŷ = b0 + b1 X, an observed data point Y_i, its predicted value Ŷ_i at X_i, and the error e_i = Y_i - Ŷ_i.]
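A minimal sketch (with made-up data, not from the slides) of the least squares estimators b1 = SS_xy / SS_x and b0 = ȳ - b1 x̄ defined above.

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(x)

SS_x  = np.sum(x**2) - np.sum(x)**2 / n
SS_xy = np.sum(x * y) - np.sum(x) * np.sum(y) / n

b1 = SS_xy / SS_x
b0 = y.mean() - b1 * x.mean()

y_hat = b0 + b1 * x
SSE = np.sum((y - y_hat)**2)        # sum of squared errors minimized by (b0, b1)
print(round(b0, 3), round(b1, 3), round(SSE, 3))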

How Good is the Regression?
- The coefficient of determination, r², is a descriptive measure of the strength of the regression relationship, a measure of how well the regression line fits the data.
- The deviation of each observation decomposes as:

  (y - ȳ) = (y - ŷ) + (ŷ - ȳ)
  Total deviation = Unexplained deviation (error) + Explained deviation (regression)

- Squaring and summing gives:

  Σ(y - ȳ)² = Σ(y - ŷ)² + Σ(ŷ - ȳ)², i.e., SST = SSE + SSR

- r² = SSR / SST = 1 - SSE / SST: the percentage of total variation explained by the regression.
- [Figure: decomposition of the deviation of a point Y from ȳ into an unexplained part (Y - Ŷ) and an explained part (Ŷ - ȳ).]

Multiple Linear Regression Model
- The k-variable multiple regression model: the population regression model of a dependent variable, Y, on a set of k independent variables X1, X2, ..., Xk is given by:

  Y = β0 + β1X1 + β2X2 + ... + βkXk + ε

  where β0 is the Y-intercept of the regression surface and each βi, i = 1, 2, ..., k, is the slope of the regression surface (sometimes called the response surface) with respect to Xi.
- Model assumptions:
  1. ε ~ N(0, σ²), independent of other errors.
  2. The variables Xi are uncorrelated with the error term.
- [Figure: the regression surface y = β0 + β1x1 + β2x2 + ε in (x1, x2, y) space.]


Using Statistics: Simple and Multiple Least-Squares Regression (Lines and Planes)
- In a simple regression model, the least-squares estimators minimize the sum of squared errors from the estimated regression line ŷ = b0 + b1x. Any two points (A and B), or an intercept and slope (β0 and β1), define a line in two dimensions.
- In a multiple regression model, the least-squares estimators minimize the sum of squared errors from the estimated regression plane ŷ = b0 + b1x1 + b2x2. Any three points (A, B, and C), or an intercept and coefficients of x1 and x2 (β0, β1, and β2), define a plane in three-dimensional space.
- [Figure: a fitted line in the (x, y) plane and a fitted plane in (x1, x2, y) space.]

The Estimated Regression Relationship
- The estimated regression relationship:

  Ŷ = b0 + b1X1 + b2X2 + ... + bkXk

  where Ŷ is the predicted value of Y, the value lying on the estimated regression surface. The terms bi, for i = 0, 1, ..., k, are the least-squares estimates of the population regression parameters βi.
- The actual, observed value of Y is the predicted value plus an error:

  y_j = b0 + b1x1j + b2x2j + ... + bkxkj + e_j,   j = 1, ..., n.

Least-Squares Estimation: The 2-Variable Normal Equations
- Minimizing the sum of squared errors with respect to the estimated coefficients b0, b1 and b2 yields the following normal equations, which can be solved for b0, b1 and b2:

  Σy = n b0 + b1 Σx1 + b2 Σx2
  Σx1y = b0 Σx1 + b1 Σx1² + b2 Σx1x2
  Σx2y = b0 Σx2 + b1 Σx1x2 + b2 Σx2²
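A minimal sketch (made-up data) of solving the 2-variable normal equations above for b0, b1 and b2 as a 3x3 linear system.

import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y  = np.array([3.1, 3.9, 7.2, 7.8, 11.5, 11.9])
n = len(y)

# Coefficient matrix and right-hand side of the normal equations
A = np.array([
    [n,         x1.sum(),      x2.sum()],
    [x1.sum(),  (x1**2).sum(), (x1*x2).sum()],
    [x2.sum(),  (x1*x2).sum(), (x2**2).sum()],
])
rhs = np.array([y.sum(), (x1*y).sum(), (x2*y).sum()])

b0, b1, b2 = np.linalg.solve(A, rhs)   # least-squares coefficients
print(round(b0, 3), round(b1, 3), round(b2, 3))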


Decomposition of the Total Deviation in a Multiple Regression Model
- Total deviation: Y - Ȳ; error deviation: Y - Ŷ; regression deviation: Ŷ - Ȳ.
- Total deviation = Regression deviation + Error deviation, i.e., SST = SSR + SSE.
- [Figure: a data point above the regression surface in (x1, x2, y) space, with its total deviation split into regression and error deviations.]

How Good is the Regression?
- The mean square error is an unbiased estimator of the variance of the population errors, σ², denoted by s²:

  MSE = SSE / (n - (k + 1)) = Σ(y - ŷ)² / (n - (k + 1))

- Standard error of estimate: s = √MSE
- The multiple coefficient of determination, R², measures the proportion of the variation in the dependent variable that is explained by the combination of the independent variables in the multiple regression model:

  R² = SSR / SST = 1 - SSE / SST

The F Test of a Multiple Regression Model
- A statistical test for the existence of a linear relationship between Y and any or all of the independent variables X1, X2, ..., Xk:

  H0: β1 = β2 = ... = βk = 0
  H1: Not all the βi (i = 1, 2, ..., k) are equal to 0

- ANOVA table:

  Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square                 F Ratio
  Regression            SSR              k                    MSR = SSR / k               F = MSR / MSE
  Error                 SSE              n - (k + 1)          MSE = SSE / (n - (k + 1))
  Total                 SST              n - 1                MST = SST / (n - 1)

  where n is the sample size (number of observations) and k is the number of independent variables.

Decomposition of the Sum of Squares and the Adjusted Coefficient of Determination
- SST = SSR + SSE
- The multiple coefficient of determination, R², measures the proportion of the variation in the dependent variable that is explained by the combination of the independent variables in the multiple regression model:

  R² = SSR / SST = 1 - SSE / SST

- The adjusted multiple coefficient of determination, R̄², is the coefficient of determination with SSE and SST divided by their respective degrees of freedom:

  R̄² = 1 - [SSE / (n - (k + 1))] / [SST / (n - 1)] = 1 - MSE / MST

- When an independent variable is added, i.e., the value of k is increased, the value of R² increases. But when the addition of another variable does not contribute towards explaining the variability in the dependent variable, the value of the adjusted R̄² decreases.
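A minimal sketch (made-up two-predictor data) of the quantities defined above: SST, SSR, SSE, R², adjusted R², and the overall F ratio.

import numpy as np

X = np.column_stack([
    np.ones(6),
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.5, 11.9])
n, k = len(y), X.shape[1] - 1          # k = number of independent variables

b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b

SST = np.sum((y - y.mean())**2)
SSE = np.sum((y - y_hat)**2)
SSR = SST - SSE

R2      = SSR / SST
R2_adj  = 1 - (SSE / (n - (k + 1))) / (SST / (n - 1))
F_ratio = (SSR / k) / (SSE / (n - (k + 1)))   # MSR / MSE
print(round(R2, 4), round(R2_adj, 4), round(F_ratio, 2))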
Multicollinearity
- Orthogonal X variables provide information from independent sources: no multicollinearity.
- Perfectly collinear X variables provide identical information content: no regression is possible.
- Some degree of collinearity: problems with regression depend on the degree of collinearity.
- A high degree of negative collinearity also causes problems with regression.
- [Figure: vector diagrams of x1 and x2 illustrating orthogonal, perfectly collinear, mildly collinear and negatively collinear predictors.]

Effects of Multicollinearity
- Variances of regression coefficients are inflated.
- Magnitudes of regression coefficients may be different from what is expected.
- Signs of regression coefficients may not be as expected.
- Adding or removing variables produces large changes in coefficients.
- Removing a data point may cause large changes in coefficient estimates or signs.
- In some cases, the F ratio may be significant while the t ratios are not.

Variance Inflation Factor

The variance inflation factor associated with X_h is:

VIF(X_h) = 1 / (1 - R_h²)

where R_h² is the R² value obtained for the regression of X_h on the other independent variables.

Relationship between VIF and R_h²: [Figure: VIF plotted against R_h² over 0.0 to 1.0, staying small for low R_h² and rising steeply towards 100 as R_h² approaches 1.] A small computational sketch follows.
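A minimal sketch (made-up, deliberately correlated predictors) of the VIF formula above: regress each X_h on the remaining predictors and take 1 / (1 - R_h²).

import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = 0.8 * x1 + rng.normal(scale=0.5, size=100)   # correlated with x1 by construction
x3 = rng.normal(size=100)
X = np.column_stack([x1, x2, x3])

def vif(X, h):
    # VIF of column h: 1 / (1 - R_h^2) from regressing X_h on the other columns
    y = X[:, h]
    others = np.column_stack([np.ones(len(y)), np.delete(X, h, axis=1)])
    b, *_ = np.linalg.lstsq(others, y, rcond=None)
    resid = y - others @ b
    r2_h = 1 - resid @ resid / np.sum((y - y.mean())**2)
    return 1 / (1 - r2_h)

print([round(vif(X, h), 2) for h in range(X.shape[1])])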
Output Analysis
- Descriptive Statistics: useful in understanding the overall distribution of the variables.
- Correlations: useful in understanding the relationships between the independent and dependent variables. The regression analysis is valid only if the independent variables are not interrelated; if they are related, they may lead to misinterpretation of the regression equation, a problem termed multicollinearity.
- Variables Entered / Removed: this table provides a summary of the variables entered into the model.
- Model Summary: provides all models that are significant at each step, the coefficient of determination (R²), the adjusted R², and the Durbin-Watson statistic for the model.
- The desired Durbin-Watson value is in the range 1.5 to 2.5; otherwise, caution is needed because the assumption that the residuals are uncorrelated may not be valid.
Output Analysis (Continued)
- ANOVA: indicates whether the model is significant. If the model is not significant, it implies that no relationship exists between the set of variables.
- Coefficients: the table provides the regression coefficients and their significance.
- Charts: used to test the validity of the assumption that the residuals are normally distributed.


CHAPTER 13: DISCRIMINANT ANALYSIS

Dr Chinmoy Jana
IISWBM, Kolkata
Email: chinmoyjana@yahoo.com

Discriminant Analysis
- A statistical technique for classification: it determines a linear function, called the discriminant function, of the variables which helps in discriminating between two groups of entities or individuals.
- The basic objective of discriminant analysis is to perform a classification function. From the analysis of past data, it can classify a given group of entities or individuals into two categories: those which would turn out to be successful and those which would not.
- Example: whether one would be a buyer of a particular product/service or not.
- Salesmen could be classified according to their age, health, sales aptitude score, communication ability score, etc.

Discriminant Analysis
- In a discriminant analysis, observations are classified into two or more groups, depending on the value of a multivariate discriminant function.
- As the figure illustrates, it may be easier to classify observations by looking at them from another direction. The groups appear more separated when viewed from a point perpendicular to Line L, rather than from a point perpendicular to the X1 or X2 axis. The discriminant function gives the direction that maximizes the separation between the groups.
- [Figure: two groups plotted in the (X1, X2) plane, with Line L showing the direction of maximum separation.]

The Discriminant Function
- The intersection of the normal marginal distributions of the two groups gives the cutting score, which is used to assign observations to groups. Observations with scores less than C are assigned to group 1, and observations with scores greater than C are assigned to group 2. Since the distributions may overlap, some observations may be misclassified.
- The model may be evaluated in terms of the percentages of observations assigned correctly and incorrectly.
- [Figure: overlapping score distributions for Group 1 and Group 2, with the cutting score C at their intersection.]
Discriminant Analysis
- Discriminant analysis is used to predict group membership. The technique is used to classify individuals/objects into one of the alternative groups on the basis of a set of predictor variables.
- The dependent variable in discriminant analysis is categorical, whereas the independent or predictor variables are either interval or ratio scaled in nature.
- When there are two groups (categories) of the dependent variable, we have two-group discriminant analysis; when there are more than two groups, it is a case of multiple discriminant analysis.

Objectives of Discriminant Analysis
The objectives of discriminant analysis are the following:
- To find a linear combination of variables that discriminates between categories of the dependent variable in the best possible manner.
- To find out which independent variables are relatively better at discriminating between groups.
- To determine the statistical significance of the discriminant function and whether any statistical difference exists among groups in terms of the predictor variables.
- To develop a procedure for assigning new objects, firms or individuals, whose profile but not group identity is known, to one of the two groups.
- To evaluate the accuracy of classification, i.e., the percentage of cases it is able to classify correctly.

Uses of Discriminant Analysis
Some of the uses of discriminant analysis are:
- Scale construction: discriminant analysis is used to identify the variables/statements that are discriminating and on which people with diverse views will respond differently.
- Perceptual mapping: the technique is also used extensively to create attribute-based spatial maps of the respondents' mental positioning of brands.
- Segment discrimination: the technique is extremely useful for understanding the key variables on which two or more groups differ from each other. Questions to which one may seek answers are as follows:
  - What are the demographic variables on which potentially successful and potentially unsuccessful salesmen differ?
  - What are the variables on which users/non-users of a product can be differentiated?
  - What are the economic and psychographic variables on which price-sensitive and non-price-sensitive customers can be differentiated?
  - What are the variables on which buyers of local/national brands of a product can be differentiated?
Discrimination Rules
- Maximum likelihood: assigns x to the group that maximizes the population (group) density.
- Bayes discriminant rule: assigns x to the group that maximizes π_i f_i(x), where π_i represents the prior probability of that classification and f_i(x) represents the population density.
- Fisher linear discriminant rule: maximizes the ratio between SS_between and SS_within, and finds a linear combination of the predictors that predicts group membership.

Discriminant Analysis Model
The mathematical form of the discriminant analysis model is:

Y = b0 + b1X1 + b2X2 + ... + bkXk

where
Y = dependent variable (the discriminant score)
bi = coefficients of the independent variables
Xi = predictor or independent variables

The dependent variable Y should be a categorical variable, whereas the independent variables X should be continuous (interval or ratio scaled). The dependent variable should be coded as 0, 1 or 1, 2 in the case of a two-group discriminant model.

Discriminant Analysis Model (Continued)
- The method of estimating the bi is based on the principle that the ratio of the between-group sum of squares to the within-group sum of squares is maximized. This makes the groups differ as much as possible on the values of the discriminant function.
- After the model has been estimated, the bi coefficients (also called discriminant coefficients) are used to calculate Y, the discriminant score, by substituting the values of Xi into the estimated discriminant model.
- The discriminant function with a constant term is called unstandardized, whereas the one without the constant term is known as the standardized discriminant function.

Assumptions
- The dependent variable should be non-metric and the independent variables should be metric or dummy variables.
- Variances are normal, linear and homogeneous.
- The assumption of linearity applies to the relationships between pairs of independent variables. Multicollinearity in DA is identified by examining tolerance values; it can be resolved by removing or combining variables with the help of PCA.
- Homogeneity of variance is important in the classification stage of DA. If one of the groups defined by the dependent variable has greater variance than the others, more cases will tend to be classified in that group. Homogeneity is tested with Box's M test, with the null hypothesis that the group variance-covariance matrices are equal. If the test fails to reject, and it is concluded that the variances are equal, one may use a pooled variance-covariance matrix in classification.
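The slides work through discriminant analysis in SPSS; as a hedged illustration only, the sketch below fits a two-group linear discriminant function on synthetic data (made-up "salesmen" scores) with scikit-learn and inspects the coefficients and hit ratio.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
# Two groups described by, say, aptitude and communication scores (synthetic)
group0 = rng.normal(loc=[50, 55], scale=8, size=(40, 2))
group1 = rng.normal(loc=[65, 70], scale=8, size=(40, 2))
X = np.vstack([group0, group1])
y = np.array([0] * 40 + [1] * 40)          # dependent variable coded 0 / 1

lda = LinearDiscriminantAnalysis().fit(X, y)

print("coefficients (b1, b2):", lda.coef_)   # coefficients of the discriminant function
print("constant (b0):", lda.intercept_)
print("hit ratio:", lda.score(X, y))         # proportion of cases classified correctly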
Key Terms
- Eigenvalue: The basic principle in the estimation of a discriminant function is that the variance between the groups relative to the variance within the groups should be maximized. The ratio of between-group variance to within-group variance is called the eigenvalue. The greater the eigenvalue, the better the differentiation, and hence the model. For two-group DA, there is one discriminant function and one eigenvalue, which accounts for all of the explained variance.
- Relative percentage: A function's eigenvalue divided by the sum of the eigenvalues of all discriminant functions in the model; the percentage of the model's discriminating power associated with a given discriminant function. The relative percentage is used to tell how many functions are important.
- Canonical correlation (R*): The simple correlation coefficient between the discriminant score and group membership (coded 0, 1 or 1, 2, etc.).
- Wilks' lambda: Can be used to test which independent variables contribute significantly to the discriminant function. It is given by the ratio of the within-group sum of squares to the total sum of squares. Wilks' lambda takes a value between 0 and 1; the lower the value, the higher the significance of the discriminant function. A statistically significant function enhances confidence that the differentiation between the groups exists. A significant lambda means: reject the null hypothesis that the two groups have the same mean discriminant function scores and conclude that the model discriminates Y.
- Discriminant score: The value resulting from applying the discriminant function formula to the data for a given case. The Z score is the discriminant score for standardized data.
- Centroid: The mean value of the discriminant scores for a particular group. The number of centroids equals the number of groups; the means of a group on all the functions are the group centroids.
- Cut-off score for classification: If the discriminant score of the function is less than or equal to the cut-off, the case is classified as 0; above the cut-off, it is classified as 1. The cut-off score is the average of the two group centroids when the sample sizes of the two groups are equal; for unequal groups, it is the weighted mean.

Key Terms (Continued)
- Classification matrix (confusion matrix or prediction matrix): Used to assess the performance of DA. The rows are the observed categories of the dependent variable and the columns are the predicted categories. When prediction is perfect, all cases lie on the diagonal. The percentage of cases on the diagonal is the percentage of correct classifications; this percentage is called the hit ratio.
- Expected hit ratio: The hit ratio is judged not relative to zero but relative to the percentage that would have been correctly classified by chance alone. For two-group discriminant analysis with a 50-50 split in the dependent variable, the expected percentage is 50%. For unequally split two-way groups of different sizes, the expected percentage is computed from the prior probabilities for groups, by multiplying the prior probabilities by the group sizes, summing over all groups, and dividing the sum by N. If the strategy is to pick the largest group for all cases, the expected percentage is the largest group size divided by N.
- Cross validation: Leave-one-out classification is available as a form of cross-validation of the classification table. Each case is classified using a discriminant function based on all cases except the given case. This is thought to give a better estimate of what the classification results would be in the population.
- Standardized discriminant coefficient: The absolute values of the coefficients in the standardized discriminant function indicate the relative contribution of the independent variables in discriminating between the two groups. Importance is assessed relative to the model being analysed; the addition or deletion of variables in the model can change discriminant coefficients markedly.
- Structural correlation coefficients (discriminant loadings): Another way of finding the relative contributions of the predictor variables in discriminating between groups is by comparing the structural coefficients of the predictor variables. The structural coefficients are obtained by computing the correlation between the discriminant score and each of the independent variables. The relative importance of the variables obtained by the two methods may differ if there is a high degree of correlation among the predictor variables.
Output Analysis
- Analysis Case Processing Summary: shows how many valid cases were selected, how many were excluded (due to missing data), the total, and their respective percentages.
- Group Statistics: provides group statistics of the independent variables for each category of the dependent variable.
- Tests of Equality of Group Means: provides the Wilks' lambda test for each independent variable. A significant result means that the respective variable's mean is different for the two groups (e.g., previously defaulted vs. previously not defaulted). An insignificant value indicates that the variable does not differ between the groups, or in other words does not discriminate the dependent variable.
- Pooled Within-Groups Matrices: correlations of the predictors.
- Box's Test of Equality of Covariance Matrices: the ranks and natural logarithms of the determinants provided are those of the group covariance matrices.
- Test Results: tests the null hypothesis of equal population covariance matrices. If Box's M is significant, the assumption of equality of variance may not be true; this is a caution for interpreting results.
- Summary of Canonical Discriminant Functions: provides the eigenvalues and the canonical correlation; indicates the percentage of variation in the dependent variable that is explained by the model.
- Wilks' Lambda: tests the significance of the model.
- Standardized Canonical Discriminant Function Coefficients.
- Structure Matrix: pooled within-group correlations between the discriminating variables and the standardized canonical discriminant functions, with variables ordered by the absolute size of the correlation within the function. The table gives the simple correlation between the independent variables and the discriminant function; a high correlation translates to high discriminating power.
- Canonical Discriminant Function Coefficients: provides the unstandardized coefficients; a negative sign indicates an inverse relation.
- Functions at Group Centroids: unstandardized canonical discriminant functions evaluated at the group means.

Output Analysis (Continued)
- Classification Statistics: classification processing summary.
- Prior Probabilities for Groups.
- Classification Function Coefficients: classification functions are used to assign cases to groups. There is a separate function for each group; a classification score is computed for each function, and the discriminant model assigns the case to the group whose classification function obtains the highest score.
- Classification Results: the confusion matrix. Provides the percentage of cases that are classified correctly, i.e., the hit ratio. The hit ratio should be at least 25% higher than the chance (random) probability.

Stepwise Statistics
- Variables Entered / Removed: at each step, the variable that maximizes the Mahalanobis distance between the two closest groups is entered.
- Maximum number of steps is 14; minimum partial F to enter is 3.84; maximum partial F to remove is 2.71; "F level, tolerance, or VIN insufficient for further computation."
- Variables in the Analysis: the table provides the predictors only.
- Wilks' Lambda: the table provides a summary of the variables that are in the analysis, the variables that are not in the analysis, and the model at each step, with its significance.
CHAPTER 14: FACTOR ANALYSIS

Principal Component Analysis and Factor Analysis
Dr. Chinmoy Jana
IISWBM, Management House, Kolkata

Factor
- A factor is a linear combination of the original variables. A factor represents an underlying dimension that summarizes or accounts for the original set of observed variables.
- The observable (manifest) variables are the symptoms we measure; the unobservable factors are what cause them.
- Factor analysis reduces the data while explaining the maximum possible variance.
- The basic principle behind the application of factor analysis is that the initial set of variables should be highly correlated. If the correlation coefficients between all the variables are small, factor analysis may not be an appropriate technique.

Factor Analysis
- A multivariate statistical technique with no distinction between dependent and independent variables.
- Looks for underlying relationships or associations: a correlation method that examines correlations between and among a large set of variables (r ≥ 0.30) to bind them into the underlying factors driving their data values.
- A data reduction method: a very useful method to reduce a large number of variables, which create data complexity, to a few manageable latent factors. These factors explain most of the variation in the original set of data.
- A factor is a linear combination of variables. It is a construct that is not directly observable but needs to be inferred from the input variables.
- The factors are statistically independent (orthogonal).

Types of Factor Analysis
- R-factor analysis: groups the variables.
- Q-factor analysis: groups the respondents.
- Exploratory factor analysis: explores the pattern among variables; there is no prior hypothesis to start with.
- Confirmatory factor analysis: used for confirming a model specification; the model is already in place.
Uses of Factor Analysis
- Segmentation analysis: factor analysis could be used for segmentation. For example, there could be different sets of two-wheeler customers owning two-wheelers because of the different importance they give to factors like prestige, economy and functional features.
- Marketing studies: the technique has extensive use in the field of marketing and can be successfully used for new product development, product acceptance research, development of advertising copy, pricing studies, branding studies, etc. For example, we can use it to:
  - identify the attributes of brands that influence consumers' choice;
  - get an insight into the media habits of various consumers;
  - identify the characteristics of price-sensitive customers.

Factor Analysis Process
1. Formulate the problem
2. Construct the correlation matrix
3. Determine the method of factor analysis
4. Determine the number of factors
5. Rotate factors
6. Interpret factors
7. Calculate factor scores / select surrogate variables
8. Determine model fit

Principal Component Analysis (PCA) and Common Factor Analysis (FA)
- Principal component analysis involves extracting linear composites of the observed variables.
- Factor analysis is based on a formal model predicting the observed variables from theoretical latent factors.
- Run principal component analysis if you simply want to reduce your correlated observed variables to a smaller set of important independent composite variables. Run factor analysis if you assume, or wish to test, a theoretical model of latent factors causing the observed variables.
- The bottom line is that these are two different models, conceptually. In PCA, the components are actual orthogonal linear combinations that maximize the total variance. In FA, the factors are linear combinations that maximize the shared portion of the variance, the underlying "latent constructs"; that is why FA is often called "common factor analysis". FA uses a variety of optimization routines and the result, unlike PCA, depends on the optimization routine used and the starting points for those routines; there is simply not a single unique solution.
- FA models are to be preferred when one wants to account explicitly for measurement error, which PCA does not do. Briefly stated, using PCA you express each component (factor) as a linear combination of the variables, whereas in FA it is the variables that are expressed as linear combinations of the factors (including communality and uniqueness components).

Key Terms
- Factor scores: the composite scores estimated for each respondent on the extracted factors; called component scores in PCA.
- Factor loading: the correlation coefficient between a variable included in the study and the factor score; called component loading in PCA.
- Factor matrix (component matrix): contains the factor loadings of all the variables on all the extracted factors.
- Eigenvalue: the percentage of variance explained by each factor can be computed using the eigenvalue. The eigenvalue of any factor is obtained by taking the sum of squares of the factor loadings of that component.
- Communality: the amount of variance a variable shares with the factors; it indicates how much of each variable is accounted for by the underlying factors taken together. In other words, it is a measure of the percentage of a variable's variation that is explained by the factors. A relatively high communality indicates that a variable has much in common with the other variables taken as a group.
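A minimal sketch (random illustrative data with two latent dimensions built in) of extracting principal components and computing the eigenvalues and component loadings described above.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
# 100 respondents rating 6 correlated statements (two latent dimensions by construction)
latent = rng.normal(size=(100, 2))
X = latent @ rng.normal(size=(2, 6)) + rng.normal(scale=0.5, size=(100, 6))

Z = StandardScaler().fit_transform(X)     # standardize so the analysis works on correlations
pca = PCA()
scores = pca.fit_transform(Z)             # component (factor) scores

eigenvalues = pca.explained_variance_
loadings = pca.components_.T * np.sqrt(eigenvalues)   # variable-component correlations

print("eigenvalues:", np.round(eigenvalues, 2))
print("components with eigenvalue > 1:", int(np.sum(eigenvalues > 1)))   # Kaiser criterion
print("loadings on first two components:\n", np.round(loadings[:, :2], 2))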
Key Terms (Continued)
- Factor plot (rotated factor space): the factors are on different axes and the variables are drawn against these axes. This plot can be interpreted only if the number of factors is three or fewer.
- Goodness of a factor: how well can a factor account for the correlations among the indicators? Examine the correlations among the indicators after the effect of the factor is removed. For a good factor solution, the resulting partial correlations should be near zero, because once the effect of the common factor is removed, there is nothing left to link the indicators.
- Scree plot: a plot of the eigenvalues against the factors in the order of their extraction.
- Bartlett's test of sphericity: tests the null hypothesis that there is no correlation between the variables.
- Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy: an index used to test the appropriateness of factor analysis. The KMO statistic compares the magnitudes of the observed correlation coefficients with the magnitudes of the partial correlation coefficients. A small value of KMO shows that the correlations between variables cannot be explained by the other variables. High values (> 0.5) indicate that factor analysis is an appropriate technique.
- Trace: the sum of squares of the values on the diagonal of the correlation matrix used in the factor analysis. It represents the total amount of variance on which the factor solution is based.

Assumptions
- Factor analysis requires metric data: the data should be either interval or ratio scaled in nature.
- No outliers in the data set.
- Adequate sample size: the number of respondents should be at least four to five times the number of variables.
- No perfect multicollinearity.
- Homoscedasticity is not required between variables.
- Linearity of variables.

Determining the Number of Factors
- A priori determination: depending on prior knowledge, the researcher can specify the number of factors.
- Based on eigenvalues: only factors with eigenvalues greater than 1.0 are retained. If the number of variables is less than 20, this approach will result in a conservative number of factors. (A small numerical sketch of this criterion follows this list.)
- Based on the scree plot: a plot of the eigenvalues against the number of factors in order of extraction. A steep slope means a good percentage of the total variance is explained; a shallow slope means the contribution to total variance is small and the component is not justified.
- Based on percentage of variance: the number of factors is chosen so that the cumulative percentage of variance extracted by the factors reaches a satisfactory level.
- Based on split-half reliability: the sample is split in half and factor analysis is performed on each half. Only factors with high correspondence of factor loadings across the two subsamples are retained.
- Based on significance tests: determine the statistical significance of the separate eigenvalues and retain only those factors that are statistically significant. The drawback is that for large samples (200 or more), many factors are likely to be statistically significant even though practically they account for only a small portion of the total variance.
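A minimal sketch (random illustrative data) of the eigenvalue-greater-than-1 criterion and the numbers behind a scree plot: the eigenvalues of the correlation matrix in order of extraction.

import numpy as np

rng = np.random.default_rng(3)
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 8)) + rng.normal(scale=0.7, size=(200, 8))

R = np.corrcoef(X, rowvar=False)                    # correlation matrix of the 8 variables
eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]  # in descending order of extraction

print("eigenvalues:", np.round(eigenvalues, 2))
print("factors retained (eigenvalue > 1):", int(np.sum(eigenvalues > 1.0)))
print("cumulative % of variance:", np.round(100 * np.cumsum(eigenvalues) / eigenvalues.sum(), 1))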
Extraction of Factors
- The first and foremost step is to decide how many factors are to be extracted from the given set of data.
- Since factors are linear combinations of the variables, which are supposed to be highly correlated, the mathematical form of a factor can be written as a weighted combination of the observed variables:

  F = W1X1 + W2X2 + ... + WkXk

- The principal component methodology involves searching for those values of Wi such that the first factor explains the largest portion of the total variance. This is called the first principal factor.
- This explained variance is then subtracted from the original input matrix so as to yield a residual matrix.
- A second principal factor is extracted from the residual matrix in such a way that the second factor accounts for most of the residual variance. One point to keep in mind is that the second principal factor has to be statistically independent of the first principal factor.
- The same principle is then repeated until there is little variance left to be explained.
- To decide on the number of factors to be extracted, the Kaiser-Guttman methodology is used, which states that the number of factors to be extracted should be equal to the number of factors having an eigenvalue of at least 1.

Rotation of Factors
- The second step in the factor analysis exercise is the rotation of the initial factor solution. This is because the initial factors are very difficult to interpret; therefore, the initial solution is rotated so as to yield a solution that can be interpreted easily.
- The basic idea of rotation is to obtain factors each of which has a few variables that correlate highly with it and others that correlate poorly with it. Similarly, there are other factors that correlate highly with those variables with which the first factors do not have a significant correlation.

Types of Rotation of Factors
- Orthogonal rotation: maintains 90 degrees between every pair of factors, i.e., the rotated factors remain uncorrelated with each other.
- Oblique rotation: allows the factors to have some correlation among them. It breaks the initial 90 degrees between pairs of factors and seeks the best association between the factors and the variables included, regardless of whether the factors remain orthogonal.
- [Figure: variables plotted in factor space, with orthogonally rotated axes (Rotated Factor 1 and Rotated Factor 2 at 90 degrees) and obliquely rotated axes.]
Varimax Rotation
- The varimax rotation method maximizes the variance of the loadings within each factor.
- The variance of a factor is largest when its smallest loadings tend towards zero and its largest loadings tend towards unity. Therefore, the rotation is carried out in such a way that the factor loadings, as in the first step, are close to unity or zero.
- To interpret the results, a cut-off point on the factor loading is selected. There is no hard and fast rule for deciding the cut-off point; generally, it is taken to be greater than 0.5.
- Once the cut-off point is decided, all the variables attached to a factor are used for naming the factor. This is a very subjective procedure and different researchers may name the same factors differently.
- A variable which appears in one factor should not appear in any other factor; that is, a variable should have a high loading on only one factor and low loadings on the other factors.
- If that is not the case, it implies that the question has not been understood properly by the respondent, or it may not have been phrased clearly. Another possible cause could be that the respondent has more than one opinion about a given item (statement).
- The total variance explained by the principal component method and by varimax rotation is the same; however, the variance explained by each factor can differ. The communalities of each variable remain unchanged by both methods.
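A minimal sketch of the varimax idea described above, using one common SVD-based implementation of the criterion; the small unrotated loadings matrix is made-up for illustration, not taken from the slides.

import numpy as np

def varimax(loadings, gamma=1.0, max_iter=50, tol=1e-6):
    # Rotate the loadings so each factor has a few large and many near-zero loadings
    p, k = loadings.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        u, s, vh = np.linalg.svd(
            loadings.T @ (L**3 - (gamma / p) * L @ np.diag(np.sum(L**2, axis=0)))
        )
        R = u @ vh
        d_new = np.sum(s)
        if d > 0 and d_new / d < 1 + tol:
            break
        d = d_new
    return loadings @ R

unrotated = np.array([[0.6, 0.6], [0.7, 0.5], [0.6, -0.5], [0.7, -0.6]])
print(np.round(varimax(unrotated), 2))   # loadings become more clearly separated across factors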

Output Analysis
- Extraction method: Principal Component Analysis. Rotation method: Varimax with Kaiser normalization.
- Correlation Matrix: PCA can be carried out if the correlation matrix for the variables contains at least two correlations of 0.30 or greater.
- KMO and Bartlett's Test: the minimum required KMO is 0.5, and the chi-square statistic should be significant.
- Communalities: estimates of the variance in each variable accounted for by the components. High communalities indicate that the variables are well represented by the extracted components. If any communalities are very low in a principal components extraction, you may need to extract another component.
- Total Variance Explained: this table provides the total variance contributed by each component, with its percentage and cumulative percentage.
- Scree Plot: the number of components plotted against the eigenvalues; helps to determine the optimal number of components. Components on the steep part of the slope indicate that a good percentage of the total variance is explained, hence the component is justified; a shallow slope indicates that the contribution to total variance is small and the component is not justified.
- Component (Factor) Matrix: the table provides each variable's component loadings, but it is not easily interpreted, so refer to the Rotated Component (Factor) Matrix.
Output Analysis (Continued)
- Rotated Component (Factor) Matrix: the most important table for interpretation. The maximum of each row (ignoring sign) indicates that the respective variable belongs to the respective component.
- Component (Factor) Score Coefficient Matrix: the table provides the component score coefficients for each variable. These scores are useful for replacing internally related variables in a regression analysis. The factor score for each component can be calculated as the linear combination of the component score coefficients of that component.


CHAPTER 15: CLUSTER ANALYSIS

Dr Chinmoy Jana
IISWBM, Kolkata
Email: chinmoyjana@yahoo.com

What is Cluster Analysis?
- Cluster analysis is a technique for grouping objects, cases or entities on the basis of multiple variables.
- This type of analysis is used to divide a given number of entities or objects into groups called clusters.
- The advantage of the technique is that it is applicable to both metric and non-metric data.
- The grouping can be done post hoc, i.e., after the primary data survey is over.

Cluster Analysis
- The objective is to classify a sample of entities into a small number of mutually exclusive clusters, based on the premise that entities are similar within clusters but dissimilar across clusters.
- The technique has wide applications in all branches of management; however, it is most often used for market segmentation analysis.

Basic Methodological Questions
- What are the relevant variables and descriptive measures of an entity?
- How do we measure the similarity between entities?
- Given that we have a measure of similarity between entities, how do we form clusters?
- How do we decide how many clusters are to be formed?


Usage of Cluster Analysis
- Market segmentation: customers/potential customers can be split into smaller, more homogeneous groups by using the method.
- Segmenting industries: the same grouping principle can be applied to industrial consumers.
- Segmenting markets: cities or regions with similar or common traits can be grouped on the basis of climatic or socio-economic conditions.
- Career planning and training analysis: for human resource planning, people can be grouped into clusters on the basis of their education, experience, aptitude and aspirations.
- Segmenting the financial sector/instruments: different factors like raw material cost, financial allocations, seasonality and other factors are used to group sectors together to understand the growth and performance of a group of industries.

Discriminant Analysis and Cluster Analysis
- In DA, data are classified into a given set of categories using some prior information about the data. The entire rule of classification is based on the categorical dependent variable and the tolerance of the model.
- CA does not assume any dependent variable. It uses different methods of classification to classify the data into groups without any prior information. Cases with similar data end up in the same group and cases with distinct data are classified into different groups.

Key Concepts in Cluster Analysis
- Agglomeration schedule: in a hierarchical method, provides information on the objects, starting with the most similar pair and then, at each stage, providing information on the object joining the pair at a later stage.
- ANOVA table: the univariate or one-way ANOVA statistics for each clustering variable. The higher the ANOVA value, the greater the difference between the clusters on that variable.
- Cluster variate: the variables or parameters representing the objects to be clustered and used to calculate the similarity between objects.
- Cluster centroid: the average values of the objects on all the variables in the cluster variate.
Key Concepts in Cluster Analysis (Continued)
- Cluster seeds: initial cluster centres in non-hierarchical clustering; the initial points from which one starts, around which the clusters are then built.
- Cluster membership: indicates the address, or the cluster, to which a particular person/object belongs.
- Dendrogram: a tree-like diagram used to graphically present the cluster results. The vertical axis represents the objects and the horizontal axis the inter-respondent distance. The figure is read from left to right.
- Distances between final cluster centres: the distances between the individual pairs of clusters. A robust solution that is able to demarcate the groups distinctly is one where the inter-cluster distance is large; the larger the distance, the more distinct the clusters.
- Entropy group: the individuals or small groups that do not seem to fit into any cluster.
- Final cluster centres: the mean value of the cluster on each of the variables that is part of the cluster variate.
- Hierarchical methods: a step-wise process that starts with the most similar pair and formulates a tree-like structure composed of separate clusters.
- Non-hierarchical methods: cluster seeds or centres are the starting points, and individual clusters are built around them based on some pre-specified distance from the seeds.

Key Concepts in Cluster Analysis (Continued)
- Proximity matrix: a data matrix that consists of pairwise distances/similarities between the objects. It is an N x N matrix, where N is the number of objects being clustered.
- Summary: the number of cases in each cluster is indicated in the non-hierarchical clustering method.
- Vertical icicle diagram: quite similar to the dendrogram, it is a graphical method of demonstrating the composition of the clusters. The objects are individually displayed at the top. At any given stage the columns correspond to the objects being clustered, and the rows correspond to the number of clusters. An icicle diagram is read from bottom to top.

Cluster Analysis Process
- Stage 1 - Research objectives: exploratory versus confirmatory objectives; select the variables used to cluster the objects.
- Stage 2 - Cluster assumptions: are the cluster variables metric or non-metric? For metric data, use distance measures of similarity (e.g., squared Euclidean distance); for non-metric data, use association measures of similarity (matching coefficients).
- Stage 3 - Clustering algorithm: is a hierarchical, non-hierarchical, or combined method used?
  - Hierarchical methods: single linkage, complete linkage, average linkage, Ward's method, centroid method.
  - Non-hierarchical methods: sequential threshold, parallel threshold, optimization.
  - Two-step cluster.
  - Combination: use a hierarchical method to specify cluster seeds for a non-hierarchical method.
- Stage 4 - Number of clusters: for hierarchical methods, examine the dendrogram, the cluster membership, and conceptual considerations.
- Stage 5 - Interpreting the clusters: examine the cluster variables; name the clusters.
- Stage 6 - Validating and profiling the clusters: validation; profiling.
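A minimal sketch (made-up two-variable data) of a hierarchical, single-linkage cluster analysis with SciPy: the agglomeration schedule and the tree that the dendrogram would display.

import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

rng = np.random.default_rng(4)
X = np.vstack([
    rng.normal([0, 0], 0.5, size=(10, 2)),
    rng.normal([4, 4], 0.5, size=(10, 2)),
    rng.normal([0, 5], 0.5, size=(10, 2)),
])

Z = linkage(X, method="single", metric="euclidean")   # agglomeration schedule
# Each row of Z: the two clusters merged, the distance coefficient, the new cluster size
print(np.round(Z[-5:], 2))                             # last few merges; look for a jump in distance

labels = fcluster(Z, t=3, criterion="maxclust")        # cut the tree at 3 clusters
print("cluster sizes:", np.bincount(labels)[1:])

# dendrogram(Z)   # with matplotlib available, this draws the tree diagram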
Output Analysis
- Case Processing Summary: provides the case processing summary and its percentages. Cases which have missing values are ignored.
- Single Linkage Agglomeration Schedule: details of the clusters formed at each stage. The "Coefficients" column indicates the distance coefficient. A sudden increase in the coefficient marks the stage at which combining becomes less appropriate; the difference in the coefficients between the current solution and the previous solution is one of the indicators for deciding the number of clusters.
- Icicle table: provides a summary of cluster formation; it is read from bottom to top. The topmost row is the single-cluster solution and the bottommost has all cases separate. The cases are in the columns; the first column indicates the number of clusters at that stage. Each case is separated by an empty column: a cross in the empty column means the two cases are combined, and a gap means the two cases are in separate clusters.
- Dendrogram: the most used tool for understanding the number of clusters and the cluster memberships. The cases are in the first column and they are connected by lines for each stage of clustering. The leftmost position is the all-separate solution and the rightmost is the one-cluster solution. The graph also has a distance scale from 0 to 25; the greater the width of the horizontal line for a cluster, the more appropriate the cluster.
- If the solution is not decisive, i.e., the differences are very close, one can try a different method, like furthest neighbour.

Output Analysis (Continued)
- Run the cluster solution again with the same method, which will provide a decisive solution, and save the cluster membership for, say, 4 clusters. One new variable named CLUE_1 is added, taking values from 1 to 4 to indicate the cluster membership.
- Conduct an ANOVA where the dependent variables are all the variables included in the cluster analysis and the factor is the cluster membership indicated by CLUE_1. The ANOVA will indicate whether the clusters really differ on the basis of the list of variables, which variables significantly distinguish the clusters, and which do not.
- Oneway Descriptives: provides descriptive statistics of the dependent variables for each cluster, along with a separate summary table.
- Test of Homogeneity of Variances: provides Levene's homogeneity test, which is a prerequisite for ANOVA, as ANOVA assumes that the different groups have equal variances. If the significance is less than 5%, the null hypothesis that the variances are equal is rejected, i.e., the assumption does not hold, so ANOVA cannot be used and a non-parametric test, the Kruskal-Wallis test, can be performed instead.
- The ANOVA table then tests the difference between means for the different clusters. The null hypothesis states that there is no difference between the clusters for a given variable; if the significance is less than 5% (p-value less than 0.05), the null hypothesis is rejected.
Non-Hierarchical Cluster: K-Means Cluster Output Analysis
- K-means clustering is used when the number of clusters is known.
- Quick Cluster / Initial Cluster Centres: variable values of the k well-spaced observations.
- Iteration History: the progress of the clustering process at each step.
- Final Cluster Centres: provides the final cluster centres.
- ANOVA: the F-tests should be used only for descriptive purposes, because the clusters have been chosen to maximize the differences among cases in different clusters.
- Number of Cases in Each Cluster: provides the number of cases for each cluster.
- This method does not consider standardization; hierarchical clustering is considered more valid in that respect.

Validating the Cluster Solution
- Use two-step clustering to measure the stability of the obtained solution.
- Split the data in half, conduct clustering on each half, and check the cluster centroids.
- Use subjective judgment to evaluate both the group formation and the clusters' potential for managerial decisions.

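A minimal sketch (made-up data, scaled first since k-means itself does not standardize) of a k-means solution with a known number of clusters, reporting the final cluster centres and cluster sizes discussed above.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(5)
X = np.vstack([
    rng.normal([0, 0], 0.6, size=(30, 2)),
    rng.normal([5, 5], 0.6, size=(30, 2)),
    rng.normal([0, 6], 0.6, size=(30, 2)),
])

Z = StandardScaler().fit_transform(X)    # standardize the clustering variables

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(Z)
print("final cluster centres:\n", np.round(km.cluster_centers_, 2))
print("number of cases in each cluster:", np.bincount(km.labels_))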


CHAPTER 16: CONJOINT ANALYSIS

Dr Chinmoy Jana
IISWBM, Kolkata
Email: chinmoyjana@yahoo.com

Conjoint Analysis
- Conjoint analysis attempts to determine the relative importance consumers attach to salient attributes and the utilities they attach to the levels of attributes.
- The respondents are presented with stimuli that consist of combinations of attribute levels and are asked to evaluate these stimuli in terms of their desirability.
- Conjoint procedures attempt to assign values to the levels of each attribute, so that the resulting values or utilities attached to the stimuli match, as closely as possible, the input evaluations provided by the respondents.

Key Terms
- Part-worth functions: the part-worth functions, or utility functions, describe the utility consumers attach to the levels of each attribute.
- Relative importance weights: the relative importance weights are estimated and indicate which attributes are important in influencing consumer choice.
- Attribute levels: the attribute levels denote the values assumed by the attributes.
- Full profiles: full profiles, or complete profiles of brands, are constructed in terms of all the attributes by using the attribute levels specified by the design.
- Pairwise tables: in pairwise tables, the respondents evaluate two attributes at a time until all the required pairs of attributes have been evaluated.
- Cyclical designs: designs employed to reduce the number of paired comparisons.
- Fractional factorial designs: designs employed to reduce the number of stimulus profiles to be evaluated in the full-profile approach.
- Orthogonal arrays: a special class of fractional designs that enable the efficient estimation of all main effects.
- Internal validity: this involves correlations of the predicted evaluations for the holdout or validation stimuli with those obtained from the respondents.
Conducting Conjoint Analysis
The steps are:
1. Formulate the problem
2. Construct the stimuli
3. Decide on the form of input data
4. Select a conjoint analysis procedure
5. Interpret the results
6. Assess reliability and validity

Formulate the Problem
- Identify the attributes and attribute levels to be used in constructing the stimuli.
- The attributes selected should be salient in influencing consumer preference and choice, and should be actionable.
- A typical conjoint analysis study involves six or seven attributes.
- At least three levels should be used, unless the attribute naturally occurs in binary form (two levels).
- The researcher should take into account the attribute levels prevalent in the marketplace and the objectives of the study.

Construct the Stimuli
- In the pairwise approach, also called two-factor evaluations, the respondents evaluate two attributes at a time until all the possible pairs of attributes have been evaluated.
- In the full-profile approach, also called multiple-factor evaluations, full or complete profiles of brands are constructed for all the attributes. Typically, each profile is described on a separate index card.
- In the pairwise approach, it is possible to reduce the number of paired comparisons by using cyclical designs. Likewise, in the full-profile approach, the number of stimulus profiles can be greatly reduced by means of fractional factorial designs.

Sneaker Attributes and Levels

Attribute   Level Number   Description
Sole        3              Rubber
            2              Polyurethane
            1              Plastic
Upper       3              Leather
            2              Canvas
            1              Nylon
Price       3              $30.00
            2              $60.00
            1              $90.00
Full-Profile Approach to Collecting Conjoint Data

Example of a sneaker product profile:
Sole: made of rubber
Upper: made of nylon
Price: $30.00

Construct the Stimuli (Continued)
- A special class of fractional designs, called orthogonal arrays, allows for the efficient estimation of all main effects. Orthogonal arrays permit the measurement of all main effects of interest on an uncorrelated basis. These designs assume that all interactions are negligible.
- Generally, two sets of data are obtained. One, the estimation set, is used to calculate the part-worth functions for the attribute levels. The other, the holdout set, is used to assess reliability and validity.

Decide on the Form of Input Data
- For non-metric data, the respondents are typically required to provide rank-order evaluations.
- In the metric form, the respondents provide ratings rather than rankings; in this case the judgments are typically made independently. In recent years, the use of ratings has become increasingly common.
- The dependent variable is usually preference or intention to buy. However, the conjoint methodology is flexible and can accommodate a range of other dependent variables, including actual purchase or choice.
- In evaluating the sneaker profiles, respondents were required to provide preference ratings.

Sneaker Profiles and Ratings

Profile No.   Sole   Upper   Price   Preference Rating
1             1      1       1       9
2             1      2       2       7
3             1      3       3       5
4             2      1       2       6
5             2      2       3       5
6             2      3       1       6
7             3      1       3       5
8             3      2       1       7
9             3      3       2       6

Note: the attribute levels correspond to those in the Sneaker Attributes and Levels table above.
Decide on the Form of Input Data (Continued)

The basic conjoint analysis model may be represented by the following formula:

U(X) = Σ_{i=1}^{m} Σ_{j=1}^{k_i} α_ij x_ij

where
U(X) = overall utility of an alternative
α_ij = the part-worth contribution or utility associated with the j-th level (j = 1, 2, ..., k_i) of the i-th attribute (i = 1, 2, ..., m)
x_ij = 1 if the j-th level of the i-th attribute is present, and 0 otherwise
k_i = number of levels of attribute i
m = number of attributes

The importance of an attribute, I_i, is defined in terms of the range of the part-worths α_ij across the levels of that attribute:

I_i = max_j(α_ij) - min_j(α_ij)

The attribute's importance is normalized to ascertain its importance relative to the other attributes, W_i:

W_i = I_i / Σ_{i=1}^{m} I_i,   so that   Σ_{i=1}^{m} W_i = 1

The simplest estimation procedure, and one which is gaining in popularity, is dummy variable regression. If an attribute has k_i levels, it is coded in terms of k_i - 1 dummy variables.

Decide on the Form of Input Data: Dummy Variable Regression

The model estimated may be represented as:

U = b0 + b1X1 + b2X2 + b3X3 + b4X4 + b5X5 + b6X6

where
X1, X2 = dummy variables representing Sole
X3, X4 = dummy variables representing Upper
X5, X6 = dummy variables representing Price

For Sole, the attribute levels were coded as follows:

           X1   X2
Level 1    1    0
Level 2    0    1
Level 3    0    0

Sneaker Data Coded for Dummy Variable Regression

Preference           Sole        Upper       Price
Rating (Y)           X1   X2     X3   X4     X5   X6
9                    1    0      1    0      1    0
7                    1    0      0    1      0    1
5                    1    0      0    0      0    0
6                    0    1      1    0      0    1
5                    0    1      0    1      0    0
6                    0    1      0    0      1    0
5                    0    0      1    0      0    0
7                    0    0      0    1      1    0
6                    0    0      0    0      0    1
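A minimal sketch of estimating the coefficients from the dummy-coded sneaker data above by ordinary least squares, then recovering the part-worths under the sum-to-zero constraint and the relative importance weights (compare with the values reported in the following sections).

import numpy as np

y = np.array([9, 7, 5, 6, 5, 6, 5, 7, 6], dtype=float)
D = np.array([   # columns: X1 X2 (Sole), X3 X4 (Upper), X5 X6 (Price)
    [1, 0, 1, 0, 1, 0],
    [1, 0, 0, 1, 0, 1],
    [1, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1],
    [0, 1, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 1],
], dtype=float)

X = np.column_stack([np.ones(len(y)), D])
b, *_ = np.linalg.lstsq(X, y, rcond=None)        # b0, b1, ..., b6
print("coefficients:", np.round(b, 3))

# Part-worths per attribute from alpha_1 - alpha_3 = b_a, alpha_2 - alpha_3 = b_b,
# alpha_1 + alpha_2 + alpha_3 = 0
importances = []
for name, (ba, bb) in zip(["Sole", "Upper", "Price"],
                          [(b[1], b[2]), (b[3], b[4]), (b[5], b[6])]):
    a3 = -(ba + bb) / 3
    alphas = np.array([ba + a3, bb + a3, a3])
    importances.append(alphas.max() - alphas.min())
    print(name, "part-worths:", np.round(alphas, 3))

weights = np.array(importances) / sum(importances)
print("relative importance:", np.round(weights, 3))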
Decide on the Form of Input Data (Continued)

The levels of the other attributes were coded similarly. The parameters were estimated as follows:

b0 = 4.222
b1 = 1.000
b2 = -0.333
b3 = 1.000
b4 = 0.667
b5 = 2.333
b6 = 1.333

Given the dummy variable coding, in which level 3 is the base level, the coefficients may be related to the part-worths:

α11 - α13 = b1
α12 - α13 = b2

To solve for the part-worths, an additional constraint is necessary:

α11 + α12 + α13 = 0

These equations for the first attribute, Sole, are:

α11 - α13 = 1.000
α12 - α13 = -0.333
α11 + α12 + α13 = 0

Solving these equations, we get:

α11 = 0.778,  α12 = -0.556,  α13 = -0.222

Decide on the Form of Input Data (Continued)

The part-worths for the other attributes reported in the results table can be estimated similarly.

For Upper we have:
α21 - α23 = b3
α22 - α23 = b4
α21 + α22 + α23 = 0

For the third attribute, Price, we have:
α31 - α33 = b5
α32 - α33 = b6
α31 + α32 + α33 = 0

The relative importance weights were calculated based on the ranges of the part-worths, as follows:

Range of utility (max - min):
Sole:  0.778 - (-0.556) = 1.334
Upper: 0.445 - (-0.556) = 1.001
Price: 1.111 - (-1.222) = 2.333
Sum of ranges of part-worths = 1.334 + 1.001 + 2.333 = 4.668

Relative importance of Sole  = 1.334 / 4.668 = 0.286
Relative importance of Upper = 1.001 / 4.668 = 0.214
Relative importance of Price = 2.333 / 4.668 = 0.500
Conducting Conjoint Analysis: Interpret the Results

Results of Conjoint Analysis

Attribute   Level No.   Description     Utility    Importance
Sole        3           Rubber           0.778
            2           Polyurethane    -0.556
            1           Plastic         -0.222     0.286
Upper       3           Leather          0.445
            2           Canvas           0.111
            1           Nylon           -0.556     0.214
Price       3           $30.00           1.111
            2           $60.00           0.111
            1           $90.00          -1.222     0.500

Interpret the Results
- The relative importance of the attributes should be considered. Price is the most important attribute: its range of utility values is the highest (2.333), contributing 50% of the total.
- Combination utility: the total utility of any combination can be calculated by picking the attribute levels of our choice. For example, the combined utility of Rubber Sole + Leather Upper + $30 Price = 0.778 + 0.445 + 1.111 = 2.334 (check that this is also the best combination).
- Individual attributes: for Sole, going from Rubber to Polyurethane there is a decrease in utility of 1.334 units, but the next level, i.e., Polyurethane to Plastic, shows an increase in utility of 0.334 units; and so on for the other attributes.

Conducting Conjoint Analysis: Assessing Reliability and Validity
- The goodness of fit of the estimated model should be evaluated. For example, if dummy variable regression is used, the value of R² will indicate the extent to which the model fits the data.
- Test-retest reliability can be assessed by obtaining a few replicated judgments later in data collection.
- The evaluations for the holdout or validation stimuli can be predicted by the estimated part-worth functions. The predicted evaluations can then be correlated with those obtained from the respondents to determine internal validity.
- If an aggregate-level analysis has been conducted, the estimation sample can be split in several ways and conjoint analysis conducted on each subsample. The results can be compared across subsamples to assess the stability of the conjoint analysis solutions.

Assumptions and Limitations of Conjoint Analysis
- Conjoint analysis assumes that the important attributes of a product can be identified.
- It assumes that consumers evaluate the choice alternatives in terms of these attributes and make trade-offs.
- The trade-off model may not be a good representation of the choice process.
- Another limitation is that data collection may be complex, particularly if a large number of attributes are involved and the model must be estimated at the individual level.
- The part-worth functions are not unique.
