
Journal of Sensory Studies ISSN 0887-8250

IS A CONSUMER PANEL ABLE TO RELIABLY EVALUATE THE TEXTURE OF DAIRY DESSERTS USING UNSTRUCTURED INTENSITY SCALES? EVALUATION OF GLOBAL AND INDIVIDUAL PERFORMANCE

GASTÓN ARES, FERNANDA BRUZZONE and ANA GIMÉNEZ

Sección Evaluación Sensorial, Departamento de Ciencia y Tecnología de Alimentos, Facultad de Química, Universidad de la República (UdelaR), Gral. Flores 2124, C.P. 11800, Montevideo, Uruguay

Corresponding author. TEL: +5982-9245735; FAX: +5982-9241906; EMAIL: gares@fq.edu.uy

Accepted for Publication August 10, 2011

doi:10.1111/j.1745-459X.2011.00352.x

ABSTRACT
In the last decade, consumer panels have been claimed to be capable of evaluating the intensity of sensory attributes of food products using intensity scales, providing results similar to those of trained assessors. In this context, the present study deals with the evaluation of the performance of a consumer panel for the texture evaluation of milk desserts, based on global and individual performance, and with its comparison with a panel of trained assessors. Four milk desserts with different texture characteristics were evaluated by 86 consumers and by a trained panel. Both panels evaluated five texture attributes using unstructured intensity scales. Consumers and trained assessors showed very similar discriminative capacity and reproducibility for all the evaluated texture characteristics. However, the consumer panel showed a lack of consensus in its evaluations, and the individual scores of most consumers did not significantly discriminate between samples.

PRACTICAL APPLICATIONS
Results from the present work show that although average attribute-intensity data from a consumer panel might be valid and comparable with data from a quantitative descriptive analysis performed by a panel of trained assessors, there is high variability among consumers. Therefore, care must be taken when using intensity scales to study consumers' perception of the sensory characteristics of food products, which suggests that more appropriate methodologies should be developed.

INTRODUCTION
Sensory profiling is a powerful tool for the food industry, and Quantitative Descriptive Analysis (QDA) is one of the most widely used methodologies for this purpose (Stone and Sidel 1993; Jellinek 1985; Meilgaard et al. 1999). In this methodology, assessors are selected based on their sensory capacity and trained in attribute recognition and scaling; they use a common, agreed sensory language, and products are scored in repeated trials to obtain a quantitative description (ASTM 1992).
Although QDA provides detailed, reliable and consistent results, it remains a very time-consuming approach because the vocabulary and associated training must be adapted to each product. As the time available for product development becomes shorter, there is industrial pressure to develop alternative methods that obviate the need to train a sensory panel, as well as to gather sensory information directly from consumers (Faye et al. 2006). In this context, several consumer profiling methodologies have gained popularity during the last decade, based on the assumption that consumers can provide an accurate description of the sensory characteristics of food products. Asking consumers to rate the intensity of different sensory attributes has been proposed by some authors as an alternative to the classical sensory profile provided by trained assessors (Husson et al. 2001; Worch et al. 2010). Through the years,
several authors have compared the performance of trained
assessors and consumer panels, reporting contradictory results.
Many authors have reported that trained panels perform better than consumers or untrained panelists in terms of discriminative capacity and reproducibility (Cardello et al. 1982; Sawyer et al. 1988; Roberts and Vickers 1994; Wolters and Allchurch 1994; Guerrero et al. 1997; Labbe et al. 2004). On the other hand, Moskowitz (1996) stated that consumers are capable of validly rating the sensory aspects of products, based on the comparison with results from trained assessors. This author compared consumers' ratings for 37 commercial sauce products with experts' ratings and physical measurements, and concluded, based on the high correlation between data sets, that consumers are able to assess the characteristics of food products.
More recently, a couple of articles have been published stating that consumers are able to perform QDA, providing results similar to those of panels of trained assessors. Husson et al. (2001) analyzed results from two consumer panels from different locations (218 and 124 assessors) for the sensory characterization of 28 grape/raspberry beverages by means of 10 attributes. Meanwhile, Worch et al. (2010) studied the sensory profile of 12 commercial perfumes provided by a panel of 12 experts and by 104 consumers using unstructured scales. These authors concluded, based on global panel performance, that the consumer panels provided similar results in terms of discrimination, consensus and reproducibility.
Despite the fact that consumer panels have been reported to provide results similar to those of trained assessor panels, these studies have based their conclusions on the evaluation of mean ratings and the comparison of product spaces from multivariate techniques (Moskowitz 1996; Husson et al. 2001; Worch et al. 2010). However, no study has been found that analyzes consumers' individual performance when using intensity scales to evaluate food products. Studying consumers' individual performance and reproducibility might provide interesting information about the reliability of consumer panels for characterizing the sensory properties of food products.
Texture is a complex, multiparameter attribute that includes a large number of characteristics (Szczesniak 2002). In particular, some texture attributes, such as creaminess, have no exact definition, and their understanding and perception might vary among individuals (Tournier et al. 2007). For this reason, consumers' evaluation of the texture of food products using intensity scales could be more variable and less consistent than their evaluation of flavor or odor characteristics.
In this context, the aim of the present work was to evaluate
the performance of a consumer panel for texture evaluation
of milk desserts, based on global and individual performance,
and to compare results with the performance of a panel of
trained assessors.

MATERIALS AND METHODS


Samples
Four milk desserts with different texture characteristics were formulated by varying the type and concentration of modified starch and the carrageenan concentration. Two types of modified starch were used: National Frigex and National 465 (National Starch, Trombudo Central, Brazil). The formulations of the desserts are shown in Table 1. They were selected based on previous studies in order to obtain samples with subtle but perceivable texture differences, as evaluated by a trained sensory panel.
The desserts were prepared in tap water using 12% powdered whole milk (Conaprole, Montevideo, Uruguay), 8% commercial sugar, modified starch, vanilla aroma (ARO10703, IFF, Buenos Aires, Argentina), carrageenan (TIC PRETESTED Colloid 710 H Powder, TIC Gums, Belcamp, MD) and 0.1% sodium tripolyphosphate. The rest of the formulation consisted of water, up to 100%.
The desserts were prepared by mixing the solid ingredients with water and pouring the mixture into a Thermomix TM 31 (Vorwerk Mexico S. de R.L. de C.V., México D.F., México). The dispersion was heated at 90°C for 5 min under strong agitation (1,100 rpm). The desserts were then placed in glass containers, closed, cooled to room temperature (25°C) and refrigerated (4–5°C) for 24 h prior to their evaluation.

Consumer Panel
A consumer study was carried out with 86 consumers (59% female and 41% male), ranging in age from 18 to 59 years. Participants were recruited considering their milk dessert consumption (at least once a week), as well as their interest and availability to participate in the study. At the recruitment stage, no information about the specific aim of the study was provided.
Twenty grams of each of the four milk desserts were served to the consumers at 10°C in closed, odorless plastic containers labeled with three-digit random numbers. Samples were presented monadically following a balanced rotation (multiple orthogonal Latin squares).
TABLE 1. FORMULATION OF THE MILK DESSERTS WITH DIFFERENT TEXTURE CHARACTERISTICS

Sample  Type of modified starch  Modified starch concentration (%)  Carrageenan concentration (%)
A       National Frigex          4.2                                0
B       National 465             4.2                                0.04
C       National Frigex          4.2                                0.04
D       National Frigex          5.2                                0.02

For each sample, consumers were first asked to try the dessert and to evaluate five texture attributes using unstructured scales anchored with "not much" at the left and "very" at the right, using the FIZZ computerized system (FIZZ, Version 2.0, 1994–2000, Biosystèmes, Couternon, France). The position on the line scale indicated by the consumers was converted to a number between 0 (leftmost position) and 10 (rightmost position). The evaluated attributes were thickness, creaminess, smoothness, gumminess and homogeneity. These attributes were the most frequently mentioned by consumers in a previous study in which they were asked to describe the texture of commercial milk desserts using an open-ended question. No definition of the attributes was provided; consequently, they were evaluated according to the consumers' previous understanding. Consumers evaluated the samples in duplicate in two sessions, which were held on two consecutive days.
Mineral water was available for rinsing between samples. The testing was carried out in a sensory laboratory designed in accordance with International Organization for Standardization (ISO) standard 8589 (ISO 1988), under artificial daylight-type illumination, temperature control (between 22 and 24°C) and air circulation.

Trained Assessor Panel


The sensory panel consisted of nine trained assessors, with ages ranging from 20 to 50 years. Assessors were selected following the guidelines of the ISO 8586-1 standard (ISO 1993). They all had a minimum of 200 h of experience in discrimination and descriptive tests of different foods and at least 20 h of experience in the evaluation of milk desserts.
Initially, five commercial milk desserts were presented to the assessors in order to generate texture descriptors. First, assessors were asked to generate their individual descriptors using a modified repertory grid method (Damasio and Costell 1991). Through open discussion with the panel leader, the assessors agreed on the best descriptors to describe the samples. According to the assessors, the best descriptors for describing the desserts' texture were thickness, creaminess, gumminess, melting, mouth coating, smoothness and homogeneity. In the present work, only results for the five attributes evaluated by the consumer panel are shown. The consensus definitions of the descriptors are shown in Table 2.
Then, assessors were trained in the evaluation of the descriptors using five commercial and two formulated milk desserts with different texture characteristics. In a first session, assessors were asked to evaluate each of the seven texture attributes and to score their intensity using unstructured line scales anchored with "not much" at the left and "very" at the right. Through open discussion with the panel leader, the assessors agreed on the anchor samples for each of the attributes.

TABLE 2. DEFINITION OF THE TEXTURE DESCRIPTORS USED BY THE TRAINED ASSESSOR PANEL

Thickness: Place a spoonful on your tongue, compress it against the palate once and evaluate the thickness perceived.
Smoothness: Place a spoonful of sample on your tongue, slide it against the palate and evaluate the presence and amount of granules or lumps.
Homogeneity: Extent to which the sample is perceived as having a unique texture while mixing with saliva.
Gumminess: Sensation related to the difficulty of disintegrating the sample in the mouth, not mixing easily with saliva.
Creaminess: Sensation related to a product with a smooth, homogeneous texture, intermediate thickness and moderate melting rate.

In successive sessions, the assessors received samples coded with three-digit numbers and were asked to score the texture attributes previously defined. A total of ten sessions, each lasting between 15 and 20 min, were used to train the panel.
After training, assessors were asked to evaluate the seven texture attributes and to score their intensity using unstructured scales anchored with "not much" at the left and "very" at the right, using the FIZZ computerized system (FIZZ, Version 2.0, 1994–2000, Biosystèmes). The position on the line scale indicated by the assessors was converted to a number between 0 (leftmost position) and 10 (rightmost position). Duplicate evaluations of the texture of the four milk desserts were performed in two sessions. Twenty grams of each sample were served to the assessors at 10°C in closed, odorless plastic containers labeled with three-digit random numbers. Mineral water was available for rinsing between samples. The testing was carried out in a sensory laboratory designed in accordance with ISO 8589 (ISO 1988). Evaluations were performed under artificial daylight-type illumination, temperature control (between 22 and 24°C) and air circulation.

Data Analysis
Evaluation of Global Panel Performance. The global
performance of the consumer and trained assessor panels was
evaluated using the following mixed linear analysis of variance (ANOVA) model:

Y_ijk = μ + α_i + β_j + γ_k + (αβ)_ij + (αγ)_ik + (βγ)_jk + ε_ijk    (1)

where Y_ijk is the score of an attribute given by panelist k to product i in session j, μ is the constant, α_i is the main effect of product i (set as fixed), β_j is the main effect of session j (set as random), γ_k is the effect of panelist k (set as random), (αβ)_ij is the interaction between product i and session j, (αγ)_ik is the interaction between product i and panelist k, (βγ)_jk is the interaction
between session j and panelist k, and ε_ijk is the residual. In this model, the Product main effect accounts for the discriminative capacity of the panel, the Panelist effect for differences in the use of the scale by the panelists (usually referred to as the level effect), the Product × Panelist interaction for the consensus of the panel, and the Product × Session and Panelist × Session interactions for the reproducibility of the panel.
When a significant Product effect was found, Tukey's test was used to compare the samples, at a 5% significance level. These analyses were performed separately for the consumer panel and the panel of trained assessors.
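As an illustration only, and not the authors' actual code, model (1) could be fitted per attribute in R roughly as sketched below. The sketch assumes a long-format data frame dat with hypothetical columns product, session, panelist and thickness, and uses the lmerTest package to obtain F-tests for the fixed Product effect.

```r
# Sketch of model (1) for one attribute (hypothetical data layout: one row
# per panelist x product x session). Not the authors' code.
library(lmerTest)  # lmer() with Satterthwaite F-tests for fixed effects

dat$product  <- factor(dat$product)   # fixed factor
dat$session  <- factor(dat$session)   # random factor
dat$panelist <- factor(dat$panelist)  # random factor

fit <- lmer(thickness ~ product +               # discriminative capacity
              (1 | panelist) + (1 | session) +  # random main effects
              (1 | product:panelist) +          # consensus term
              (1 | product:session) +           # reproducibility terms
              (1 | panelist:session),
            data = dat)

anova(fit)   # significance of the Product effect
ranova(fit)  # likelihood-ratio tests for the random terms
```

Pairwise product comparisons analogous to Tukey's test could then be obtained, for example, with the emmeans package.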
Evaluation of Individual Assessor Performance. For each attribute, the agreement between each panelist and the rest of the panel was evaluated by calculating the correlation coefficient between the data for each assessor and the data averaged over the rest of the panel for the two sessions considered, as suggested by Pagès et al. (2006) and Worch et al. (2010).
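A minimal sketch of this agreement index, assuming scores is a hypothetical products × panelists matrix of mean scores for one attribute (not the authors' code):

```r
# Agreement of each panelist with the rest of the panel for one attribute.
# `scores` is assumed to hold one row per product and one column per
# panelist (scores averaged over the two sessions).
agreement <- sapply(seq_len(ncol(scores)), function(k) {
  cor(scores[, k], rowMeans(scores[, -k, drop = FALSE]))
})

# Share of panelists exceeding the critical value reported in Table 5
mean(agreement > 0.878) * 100
```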
Moreover, the performance of each individual assessor was studied using ANOVA. For this, the following model was considered for each assessor:

Y_ij = μ + α_i + β_j + ε_ij    (2)

where Y_ij is the score of an attribute given by an assessor to product i in session j, μ is the constant, α_i is the main effect of product i (set as fixed), β_j is the main effect of session j (set as random) and ε_ij is the residual. In this model, the Product main effect accounts for the discriminative capacity of each assessor, the Session main effect takes into account the stability of the assessor's use of the scale over sessions, whereas the residual accounts for the Product × Session interaction, which reflects the stability of individual performance over sessions.
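A sketch of how model (2) might be run for every assessor, again assuming the hypothetical long-format data frame dat used above (not the authors' code):

```r
# Per-assessor two-way ANOVA (model 2) for one attribute. For each assessor
# it returns the p-value of the Product effect (discrimination) and the
# residual standard deviation (reproducibility over sessions).
individual <- lapply(split(dat, dat$panelist), function(d) {
  d$product <- factor(d$product)
  d$session <- factor(d$session)
  fit <- aov(thickness ~ product + session, data = d)
  c(p_product = summary(fit)[[1]][["Pr(>F)"]][1],        # Product row
    sd_resid  = sqrt(deviance(fit) / df.residual(fit)))  # residual SD
})

# Percentage of assessors whose scores discriminate at the 5% level
mean(sapply(individual, `[`, "p_product") < 0.05) * 100
```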
The individual performance of the consumer panel was also evaluated using principal component analysis (PCA) (King et al. 2001). A separate PCA was conducted for each attribute on the Product × Consumer matrix containing the average score of each consumer for each of the evaluated samples. This analysis was carried out on the correlation matrix.
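For instance, the PCA for thickness could be obtained as sketched below, where thick_means is a hypothetical 4 × 86 Product × Consumer matrix of mean thickness scores (not the authors' code):

```r
# PCA of the Product x Consumer matrix for one attribute, computed on the
# correlation matrix (centred and scaled columns; each column is assumed to
# have nonzero variance). Rows = the four products, columns = the consumers.
pca_thick <- prcomp(thick_means, center = TRUE, scale. = TRUE)

summary(pca_thick)                 # variance explained by each component
plot(pca_thick$rotation[, 1:2],    # consumers' loadings on the first two PCs
     xlab = "PC1", ylab = "PC2")
```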
All statistical analyses were performed using the packages SensoMineR (Lê and Husson 2008) and FactoMineR (Husson et al. 2007; Lê et al. 2008) in the R language (R Development Core Team 2007).

RESULTS
Evaluation of Global Panel Performance
Results from the ANOVA performed to evaluate the global performance of the consumer and trained assessor panels are shown in Table 3.
For the consumer panel, the Product effect was significant
for four out of the five evaluated texture attributes, suggesting
that consumers were able to detect differences in four texture
characteristics of the desserts. The largest differences were
found for thickness, creaminess and gumminess, whereas for
homogeneity no significant differences between the products
were found.
As shown in Table 4, according to the results of the consumer panel, the four desserts differed in their thickness and gumminess, whereas they were sorted into three groups according to their creaminess. Regarding smoothness, only sample D differed from the rest, being significantly less smooth.

TABLE 3. P-VALUES FROM THE ANALYSIS OF VARIANCE MODEL TO EVALUATE GLOBAL PANEL PERFORMANCE FOR THE CONSUMER AND TRAINED ASSESSOR PANELS

Panel              Effect              Thickness  Creaminess  Gumminess  Smoothness  Homogeneity
Consumer           Product             <0.0001    <0.0001     <0.0001    0.0015      0.821
Consumer           Panelist            <0.0001    <0.0001     <0.0001    <0.0001     <0.0001
Consumer           Session             0.381      0.191       0.612      0.411       0.093
Consumer           Product × Panelist  <0.0001    <0.0001     <0.0001    0.006       0.0045
Consumer           Product × Session   0.850      0.462       0.102      0.827       0.076
Consumer           Panelist × Session  0.242      0.044       0.138      0.076       0.985
Trained assessors  Product             <0.0001    <0.0001     <0.0001    0.578       0.095
Trained assessors  Panelist            0.0013     <0.0001     <0.0001    <0.0001     <0.0001
Trained assessors  Session             0.494      0.752       0.857      0.138       0.215
Trained assessors  Product × Panelist  0.147      0.063       0.185      0.852       0.793
Trained assessors  Product × Session   0.689      0.697       0.869      0.057       0.269
Trained assessors  Panelist × Session  0.419      0.003       0.553      0.066       0.074

Significant effects (5% significance level) are those with P < 0.05.


TABLE 4. MEAN SCORES FOR THE TEXTURE ATTRIBUTES OF THE MILK DESSERTS EVALUATED BY THE CONSUMER AND TRAINED ASSESSOR PANELS

Panel              Sample  Thickness  Creaminess  Gumminess  Smoothness  Homogeneity
Consumers          A       4.2a       5.3a        3.3a       6.9b        7.6a
Consumers          B       6.4c       6.8b,c      4.7c       7.0b        7.5a
Consumers          C       5.6b       6.3b        4.0b       7.1b        7.6a
Consumers          D       7.5d       7.9c        6.3d       6.4a        7.6a
Trained assessors  A       0.5a       1.3a        0.4a       9.9a        9.9a
Trained assessors  B       3.7c       7.9b,c      2.3c       9.9a        9.8a
Trained assessors  C       2.9b       6.7b        1.7b       9.9a        9.9a
Trained assessors  D       5.1d       8.5c        3.7d       9.8a        9.7a

Mean values with different superscripts within a column and panel are significantly different according to Tukey's test at a 5% significance level.

In the case of the trained assessor panel, the Product effect was significant for three of the five attributes. No significant differences were found for the attributes smoothness and homogeneity. The fact that consumers identified significant differences in smoothness whereas the trained assessors did not could be explained by considering that consumers might not have evaluated the attribute in the same way as the trained assessors, and therefore the two panels evaluated slightly different texture characteristics under the same descriptor.
The discriminative capacity of the consumer and trained assessor panels was very similar in terms of thickness, creaminess and gumminess. As shown in Table 4, the results of Tukey's test performed on the consumers' and trained assessors' evaluations were exactly the same, which suggests that both panels discriminated the desserts in the same way.
In terms of the use of the scale range, the performance of the trained assessor panel was superior to that of the consumer panel, as shown in Table 4. Trained assessors found larger texture differences between the products than consumers. This has already been reported by several authors, who have stated that untrained assessors usually find smaller differences among products than trained assessors (Cardello et al. 1982; Sawyer et al. 1988; Labbe et al. 2004). According to Labbe et al. (2004), this could be explained by the fact that untrained panels tend to use a narrow part of the scale because they have not been presented with any product references.
For both the consumer and trained assessor panels, the Panelist main effect was significant, indicating that assessors from both panels differed in the range of the scale they used for scoring the products. In the case of the trained assessor panel, these level differences indicate that even though the panelists were trained in attribute scaling using references, they did not use the scale in exactly the same way when evaluating the samples.
Meanwhile, for the consumer panel, the Panelist × Product interaction was highly significant for all the evaluated texture attributes, which indicates disagreement in the consumers' evaluations, i.e., inverse rankings of the samples for an attribute and scaling differences when evaluating the magnitude of an attribute (Stone and Sidel 1985). In the case of the trained assessor panel, the Panelist × Product interaction was not significant for any of the evaluated attributes, indicating good consensus among the panelists.
When looking at panel reproducibility, both panels showed similar behavior over sessions. Neither panel showed a significant Session × Product effect, suggesting that average scores across products and individual product scores did not change over sessions; therefore, consumers and trained assessors were stable in their evaluations from one session to the other.
For both the consumer and trained assessor panels, the Panelist × Session interaction was not significant, except for creaminess (Table 3). This suggests that some panelists might have changed their mean creaminess scores across products between sessions, which does not seem relevant considering the overall results.

Evaluation of Individual Assessor Performance

Individual assessor performance was evaluated in terms of agreement with the rest of the panel, discriminative capacity and reproducibility.
The agreement between trained assessors was remarkably
high for all the evaluated attributes. Except for homogeneity,
the correlation coefficients between individual and average
scores were higher than 0.9 (data not shown). For homogeneity, the correlation coefficient between the trained assessors
was 0.822. This lower value could be explained by the fact that
very small differences in this attribute were perceived by all
the assessors.
On the other hand, the correlation coefficients between individual and average scores for the consumer panel showed a clear lack of consensus (Table 5). Average Pearson correlation coefficients were low, ranging from 0.044 to 0.705, suggesting the lack of a unique criterion among consumers in the evaluation of the samples. The highest average correlation coefficient was found for thickness, whereas the lowest was found for homogeneity.

IS A CONSUMER PANEL REIABLE?

G. ARES, F. BRUZZONE and A. GIMNEZ

TABLE 5. MINIMUM, MAXIMUM AND AVERAGE PEARSON CORRELATION COEFFICIENTS BETWEEN INDIVIDUAL AND AVERAGE SCORES FOR THE FIVE TEXTURE ATTRIBUTES EVALUATED BY THE CONSUMER PANEL, AND PERCENTAGE OF CONSUMERS WITH SIGNIFICANT CORRELATION COEFFICIENTS

               Pearson correlation coefficient       Percentage of consumers with a
Attribute      Minimum    Maximum    Average         significant correlation coefficient (%)*
Thickness      -0.880     0.995      0.705           47.7
Creaminess     -0.966     0.995      0.388           34.8
Gumminess      -0.957     0.999      0.652           39.5
Smoothness     -0.996     1.000      0.147           16.3
Homogeneity    -0.963     0.984      0.044           11.6

*For a confidence level of 95% and four samples, a correlation coefficient is significant if it is higher than 0.878.

When only four samples are considered, a correlation coefficient is significant at a 5% significance level if it is higher than 0.878. Considering this critical value, the percentage of consumers who gave scores that significantly correlated with the average evaluation was close to or lower than 50% for all the evaluated attributes, thickness being the attribute for which this percentage reached its maximum (47.7%).
As shown in Table 5, Pearson correlation coefficients ranged from negative values close to -1 to positive values close to 1 for the five evaluated attributes. The existence of negative correlation coefficients could indicate that some consumers misunderstood the scale and evaluated the samples in the opposite way to the rest of the panel. However, the percentage of consumers with significant negative correlation coefficients was lower than 5% for all the evaluated attributes, which clearly shows that the consumer panel lacked agreement in its evaluations and that less than half of the consumers significantly agreed with the average evaluation.
The variability in the consumers' evaluations of the samples can also be appreciated in the representation of the consumers in the PCA performed on the data for each attribute (Fig. 1). As shown, for thickness, creaminess and gumminess, consumers were spread across the first two principal components, which indicates heterogeneity in their evaluation of these attributes.

FIG. 1. CONSUMERS' REPRESENTATION IN THE FIRST TWO DIMENSIONS OF THE PRINCIPAL COMPONENT ANALYSIS PERFORMED ON DATA FOR: (A) THICKNESS, (B) CREAMINESS AND (C) GUMMINESS


TABLE 6. PERCENTAGE OF PANELISTS FROM THE CONSUMER AND TRAINED ASSESSOR PANELS WHO GAVE SCORES THAT SIGNIFICANTLY DISCRIMINATED BETWEEN THE SAMPLES AT THE 5% AND 10% SIGNIFICANCE LEVELS

                               Percentage of panelists who gave scores that significantly
                               discriminated between the samples (%)
Panel              Attribute       5% level   10% level
Consumer           Thickness       33.7       45.3
Consumer           Creaminess      16.2       22.1
Consumer           Gumminess       24.4       34.9
Consumer           Smoothness      8.1        12.8
Consumer           Homogeneity     9.3        14.0
Trained assessors  Thickness       100        100
Trained assessors  Creaminess      100        100
Trained assessors  Gumminess       100        100
Trained assessors  Smoothness      0          0
Trained assessors  Homogeneity     0          0

Regarding the discriminative capacity of the assessors' individual scores, results from the trained assessor and consumer panels clearly differed (Table 6). In the case of the trained assessor panel, the scores of all the assessors discriminated between the samples in terms of thickness, creaminess and gumminess at a significance level of 5%. This indicates that the individual performance of the trained assessors was good in terms of their discriminative ability.
In the case of the consumer panel, results were very different. As shown in Table 6, the percentage of consumers who gave scores that discriminated between the samples in terms of thickness, creaminess and gumminess was lower than 50% at a significance level of 5% or 10%. This result suggests that only a small proportion of the consumers were able to give intensity scores that adequately discriminated between the samples.
In order to evaluate individual reproducibility, the standard deviation of the residuals of model (2), which accounts for the Product × Session interaction, was examined. In the case of the trained assessor panel, the standard deviation of the residuals for each trained assessor was always lower than 1, which indicates that they were all repeatable, using similar scores to evaluate the products in the two sessions considered. Meanwhile, the percentage of consumers who showed a standard deviation of the residuals higher than 1.96 ranged from 13% to 22% for the five attributes, suggesting worse reproducibility than that of the trained assessor panel. However, the percentage of consumers who might not have been able to score the attributes within a small range of the scale was not large; therefore, despite the fact that they did not receive any training, the performance of the consumers could be considered acceptable in terms of reproducibility.

DISCUSSION AND CONCLUSIONS


Considering global panel performance, it could be concluded that the consumer and trained assessor panels showed similar discriminative capacity and reproducibility. Both panels were able to detect the same significant differences in the thickness, creaminess and gumminess of the desserts, leading to similar conclusions regarding differences in the texture of the milk desserts.
Results from the present work provide information on the individual performance of a consumer panel, which has not been widely reported yet. Although the consumer panel as a whole showed a discriminative capacity similar to that of the trained assessor panel, the ability of the majority of the consumers to give scores that significantly discriminated among samples was poor. In the present study, less than half of the consumers were able to detect significant differences in the evaluated texture attributes of the milk desserts, suggesting that the great majority of the consumers were not capable of using intensity scales to perceive and/or express the differences in the texture of the milk desserts that the trained assessors perceived. Therefore, it seems that the lack of consensus in the consumer panel and the high variability in their evaluations were compensated for by the large sample size and by the fact that a small group of consumers were able to discriminate between the samples. For this reason, the consumers' mean texture evaluation was reliable and did not greatly differ from that of the trained assessors.
These results suggest that considering average data from consumer panels could be a valuable alternative for companies that do not have a panel of trained assessors for a particular application, as suggested by Husson et al. (2001) and Worch et al. (2010). Using consumers might allow these companies to obtain information about the sensory characteristics of food products without the need for time-consuming selection and training processes. However, conclusions from a consumer panel might be largely based on the perception of a small proportion of consumers who show outstanding performance in evaluating the characteristics of the products, even without training. Therefore, the evaluation of sensory attributes by consumers using intensity scales might only be recommended in specific applications, when food companies do not have a trained panel (which is common in developing countries such as Uruguay) or when the product is not evaluated on a regular basis. In these cases, the cost and time involved in the selection and training of assessors might be higher than those needed to perform a consumer study with 50–150 participants. Otherwise, using consumers for descriptive analysis with intensity scales would not be recommended because of the lack of consensus in their evaluations. Considering the high variability in consumers' evaluations using scales, it might be more appropriate to use other consumer profiling techniques that avoid scaling
issues and that provide similar results, such as check-all-that-apply questions.


Finally, it is important to highlight that in the present study only the texture of four milk dessert samples was evaluated, by a panel of 86 consumers. Thus, further research in this area is needed to overcome the limitations of the present study and to confirm its results. In particular, it would be interesting to evaluate whether the lack of consensus in consumers' evaluations depends on the product, the number of samples or the sensory modality. Another relevant issue is the study of the impact of the number of consumers on the conclusions drawn from sensory profiling methodologies.

ACKNOWLEDGMENTS
The authors are indebted to the Comisión Sectorial de Investigación Científica (CSIC) of the Universidad de la República (UdelaR) for financial support.
REFERENCES
AMERICAN SOCIETY FOR TESTING AND MATERIALS (ASTM). 1992. Quantitative Descriptive Analysis (QDA). ASTM Digital Library. DOI: 10.1520/MNL10523M.
CARDELLO, A.V., MALLER, O., KAPSALIS, J.G., SEGARS, R.A., SAWYER, F.M., MURPHY, C. and MOSKOWITZ, H. 1982. Perception of texture by trained and consumer panelists. J. Food Sci. 47, 1186–1197.
DAMASIO, M.H. and COSTELL, E. 1991. Análisis sensorial descriptivo: Generación de descriptores y selección de catadores. Rev. Agroquim. Tecnol. Aliment. 32, 165–177.
FAYE, P., BRÉMAUD, D., TEILLET, E., COURCOUX, P., GIBOREAU, A. and NICOD, H. 2006. An alternative to external preference mapping based on consumer perceptive mapping. Food Qual. Prefer. 17, 604–614.
GUERRERO, L., GOU, P. and ARNAU, J. 1997. Descriptive analysis of toasted almonds: A comparison between expert and semitrained assessors. J. Sens. Stud. 12, 39–54.
HUSSON, F., JOSSE, J., LÊ, S. and MAZET, J. 2007. FactoMineR: Factor analysis and data mining with R. R package version 1.04. http://cran.r-project.org/web/packages/FactoMineR/index.html (accessed November 13, 2009).
HUSSON, F., LE DIEN, S. and PAGÈS, J. 2001. Which value can be granted to sensory profiles given by consumers? Methodology and results. Food Qual. Prefer. 12, 291–296.
ISO. 1988. Sensory Analysis: General Guidance for the Design of Test Rooms, ISO 8589, International Organization for Standardization, Geneva, Switzerland.
ISO. 1993. ISO Standard 8586-1. Sensory Analysis: General Guidance for the Selection, Training, and Monitoring of Assessors, Part 1: Selected Assessors (1st Ed.), International Organization for Standardization, Geneva, Switzerland.
JELLINEK, G. 1985. Sensory Evaluation of Foods: Theory and Practice, Ellis Horwood, Chichester, U.K.
KING, M., HALL, J. and CLIFF, M. 2001. A comparison of methods for evaluating the performance of a trained sensory panel. J. Sens. Stud. 16, 567–581.
LABBE, D., RYTZ, A. and HUGI, A. 2004. Training is a critical step to obtain reliable product profiles in a real food industry context. Food Qual. Prefer. 15, 341–348.
LÊ, S. and HUSSON, F. 2008. SensoMineR: A package for sensory data analysis. J. Sens. Stud. 23, 14–25.
LÊ, S., JOSSE, J. and HUSSON, F. 2008. FactoMineR: An R package for multivariate analysis. J. Stat. Softw. 25(1), 1–18.
MEILGAARD, M., CIVILLE, G.V. and CARR, B.T. 1999. Sensory Evaluation Techniques, CRC Press, Boca Raton, FL.
MOSKOWITZ, H.R. 1996. Experts versus consumers: A comparison. J. Sens. Stud. 11, 19–37.
PAGÈS, J., LÊ, S. and HUSSON, F. 2006. Une approche statistique de la performance en analyse sensorielle descriptive. Sci. Aliment. 26, 116–169.
R DEVELOPMENT CORE TEAM. 2007. R: A Language and Environment for Statistical Computing, ISBN 3-900051-07-0, R Foundation for Statistical Computing, Vienna, Austria.
ROBERTS, A.K. and VICKERS, Z.M. 1994. A comparison of trained and untrained judges' evaluation of sensory attribute intensities and liking of Cheddar cheeses. J. Sens. Stud. 9, 1–20.
SAWYER, F.M., CARDELLO, A.V. and PRELL, P.A. 1988. Consumer evaluation of the sensory properties of fish. J. Sens. Stud. 53, 12–18.
STONE, H. and SIDEL, J.L. 1985. Sensory Evaluation Practices, Academic Press, London, U.K.
STONE, H. and SIDEL, J.L. 1993. Sensory Evaluation Practices, pp. 143–270, Academic Press, London, U.K.
SZCZESNIAK, A.S. 2002. Texture is a sensory property. Food Qual. Prefer. 13, 215–225.
TOURNIER, C., MARTIN, C., GUICHARD, E., ISSANCHOU, S. and SULMONT-ROSSÉ, C. 2007. Contribution to the understanding of consumers' creaminess concept: A sensory and a verbal approach. Int. Dairy J. 17, 555–564.
WOLTERS, C.J. and ALLCHURCH, E.M. 1994. Effect of training procedure on the performance of descriptive panels. Food Qual. Prefer. 5, 203–214.
WORCH, T., LÊ, S. and PUNTER, P. 2010. How reliable are the consumers? Comparison of sensory profiles from consumers and experts. Food Qual. Prefer. 21, 309–318.

