
Evidence for cultural dialects in vocal emotion expression:

Acoustic classification within and across five nations

Petri Laukka1, Daniel Neiberg2, and Hillary Anger Elfenbein3

1Department of Psychology, Stockholm University, Stockholm, Sweden


2Department of Speech, Music and Hearing, Royal Institute of Technology (KTH), Stockholm, Sweden
3Olin Business School, Washington University, St. Louis, MO, USA

Manuscript version of:


Laukka, P., Neiberg, D., & Elfenbein, H. A. (2014). Evidence for cultural differences in the expressive style
of affective speech: Acoustic classification within and across five nations. Emotion, 14, 445-449.

Correspondence:
Petri Laukka
Stockholm University, Department of Psychology
106 91 Stockholm, Sweden
e-mail: petri.laukka@psychology.su.se

Supplemental materials: http://dx.doi.org/10.1037/a0036048.supp


Also included at the end of this document.

Acknowledgments
We acknowledge grants from the Swedish Research Council (2006-1360) to PL and the United States National
Science Foundation (BCS-0617624) to HAE. Portions of this work were presented at the Interspeech 2011 conference
(Neiberg, Laukka, & Elfenbein, 2011). We thank colleagues Jean Althoff, Wanda Chui, Frederick K. Iraki,
Thomas Rockstuhl, and Nutankumar S. Thingujam for collecting stimuli.
Abstract
The possibility of cultural differences in the fundamental acoustic patterns used to express emotion
through the voice is an unanswered question central to the larger debate about the universality versus
cultural specificity of emotion. This study used emotionally inflected standard-content speech segments
expressing 11 emotions produced by 100 professional actors from 5 English-speaking cultures. Machine
learning simulations were employed to classify expressions based on their acoustic features, using
conditions where training and testing were conducted on stimuli coming from either the same or different
cultures. A wide range of emotions were classified with above-chance accuracy in cross-cultural
conditions, suggesting vocal expressions share important characteristics across cultures. However,
classification showed an in-group advantage with higher accuracy in within- versus cross-cultural
conditions. This finding demonstrates cultural differences in expressive vocal style, and supports the
dialect theory of emotions according to which greater recognition of expressions from in-group members
results from greater familiarity with culturally specific expressive styles.

Key words: acoustic features, cross-cultural, emotion recognition, in-group advantage, machine learning, vocal expression

Evidence for cultural dialects in vocal emotion expression:
Acoustic classification within and across five nations

Noticeable differences in the ability to recognize emotions from one's own cultural group versus foreign
cultural groups suggest that emotional displays, even though largely universal, also have systematic
cultural differences (Elfenbein, 2013). Yet existing research focuses nearly exclusively on data from human
observers' recognition of expressions, whereas there is a paucity of cross-cultural research on objective
properties of expressions (Scherer, Clark-Polner, & Mortillaro, 2011). The present study addresses this gap
by subjecting to acoustical analysis and machine learning models a large collection of vocal emotional
expressions from five countries.
The human voice provides a rich source of emotional information, conveyed by means of acoustic
patterns of pitch, intensity, voice quality, and durations (Juslin & Laukka, 2003). Across cultures,
perceivers can judge vocal expressions at rates far above those expected through chance (e.g., Laukka et
al., 2013; Sauter, Eisner, Ekman, & Scott, 2010; Scherer, Banse, & Wallbott, 2001; Van Bezooijen, Otto, &
Heenan, 1983), which documents a universal component. However, meta-analyses have documented
evidence for an in-group advantage, in that perceivers are more accurate when judging expressions from
their own culture versus other cultures (Elfenbein & Ambady, 2002; Juslin & Laukka, 2003). The dialect
theory of emotional communication starts with this observation and hypothesizes that lower accuracy
with outgroup expressions results from lower familiarity with subtle cultural differences in expression
style (Elfenbein, 2013), but this key hypothesis remains untested for vocal expression and tested only
indirectly for facial expressions (Elfenbein, Beaupré, Lévesque, & Hess, 2007).
Inspired by recent developments in affective computing (e.g., Schuller, Batliner, Steidl, & Seppi,
2011), we present a novel approach to test whether there is a match between lower accuracy and lower
familiarity with vocal expression style. More specifically, we employed machine learning methods to
classify expressions based on their acoustic features using a large database of affective speech, whereby
professional actors from different English-speaking cultures express a wide range of emotions with
different levels of intensity using constant verbal content (Laukka et al., 2010). Experiments were
performed in conditions where classifier programs were trained on stimuli from either the same or a
different culture vis-à-vis the stimuli subsequently used in the testing phase. If a particular emotion can
(or cannot) be classified with accuracy above chance in cross-cultural conditions, this would suggest that it
is (or is not) expressed using similar patterns of acoustic features across countries. In addition, if accuracy
is higher in within-cultural versus cross-cultural conditions, this would suggest the existence of culture-
specific expressive styles and implicate them as the cause of in-group advantage.
Our study has few direct parallels in the existing literature. Several classification studies have used
emotional speech developed in multiple countries, and these studies report decreased accuracy in
conditions where training and testing are performed on stimuli from different collections (Kamaruddin,
Wahab, & Quek, 2012; Lefter, Rothkrantz, Wiggers, & van Leeuwen, 2010; Schuller et al., 2010; Shami &
Verhelst, 2007). However, the stimuli used in these studies differed not only with respect to culture, but
also with regard to many other relevant factors (e.g., emotion labels, language, and recording conditions),
which makes it difficult to separate the effects of culture from other group effects. To test the predictions of
dialect theory, we strived to keep all relevant factors constant across conditions, except for speaker culture.
In addition, we accounted for the intensity of the expressed emotion (e.g., Frijda, Ortony,
Sonnemans, & Clore, 1992), for which prior research demonstrates large effects on vocal acoustics (Juslin
& Laukka, 2001). In doing so, we assessed both within- and cross-cultural classification in conditions
where training and testing were conducted either on same-intensity stimuli, or on stimuli conveying
different levels of emotion intensity. As such, our classification experiments provide information on how
consistently emotions are vocally expressed both within and across cultures.

Materials and Method

Vocal Stimuli and Acoustic Analyses. We used stimuli from the VENEC corpus, wherein 100
professional actors from five English-speaking cultures (Australia, India, Kenya, Singapore, and U.S.A.; 20
from each culture; 50% women; ages 18–30 years) expressed 11 emotions (anger, contempt, fear,
happiness, interest, neutral, sexual lust, pride, relief, sadness, and shame), using short phrases with
standardized verbal content (e.g., "Let me tell you something"). All actors were provided with scenarios
describing typical situations in which each emotion may be elicited, based on emotion appraisal research
(Ellsworth & Scherer, 2003), and were instructed to enact finding themselves in similar situations. More
details appear in Laukka et al. (2010, 2013). All emotions except neutral were expressed with three levels
of emotion intensity (below average, moderately high, and very high), which resulted in 3,100 emotion
portrayals used in analyses below. Recordings were conducted on location in each country using the same
equipment and similar conditions. Acoustic analysis of vocal stimuli was conducted using Praat software
(Boersma & Weenink, 2008). Thirty features were selected to represent aspects of the voice previously
associated with emotion (Juslin & Laukka, 2003): fundamental frequency, voice intensity, formants and
voice quality, and temporal characteristics, as described online in Table S1 (for additional details, see
Laukka, Neiberg, Forsell, Karlsson, & Elenius, 2011). The same feature-set was used for all experiments to
enable comparison of results across conditions.
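To make the feature extraction concrete, the sketch below computes a handful of the utterance-level measures listed in Table S1 (log-scale F0 mean, standard deviation, and quintile boundaries; intensity mean and standard deviation in dB; and local jitter) in Python via the praat-parselmouth interface to Praat. The pitch range, the reading of "first and last quintiles" as the 20th and 80th percentiles, and the file path are illustrative assumptions rather than the exact analysis settings used in the study.

    # Sketch of utterance-level acoustic feature extraction with praat-parselmouth.
    # Settings and the feature subset are illustrative, not the exact VENEC configuration.
    import numpy as np
    import parselmouth
    from parselmouth.praat import call

    def extract_features(wav_path):
        snd = parselmouth.Sound(wav_path)

        # Fundamental frequency (F0): keep voiced frames, convert to a log scale
        pitch = snd.to_pitch()
        f0 = pitch.selected_array["frequency"]
        f0 = f0[f0 > 0]                  # unvoiced frames are returned as 0
        log_f0 = np.log(f0)

        # Voice intensity contour (dB)
        int_db = snd.to_intensity().values.flatten()

        # Local jitter (cycle-to-cycle F0 perturbation) via a Praat point process
        pp = call(snd, "To PointProcess (periodic, cc)", 75, 600)
        jitter = call(pp, "Get jitter (local)", 0, 0, 0.0001, 0.02, 1.3)

        return {
            "F0M": log_f0.mean(), "F0SD": log_f0.std(),
            # "first and last quintiles" read here as the 20th/80th percentiles
            "F0Q1": np.percentile(log_f0, 20), "F0Q5": np.percentile(log_f0, 80),
            "IntM": int_db.mean(), "IntSD": int_db.std(),
            "Jitter": jitter,
        }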
Classification Experiments. Discriminative classification was carried out using support vector
machines (nu-SVM) with radial basis function kernel from the LibSVM-package (Chang & Lin, 2011).
Experiments were set up to examine match or mismatch between training and evaluation data in
conditions where training and testing were conducted on (a) same-intensity stimuli from the same culture
(within-culture/within-intensity; N = 3,100 evaluation experiments); (b) different-intensity stimuli from
the same culture (within-culture/cross-intensity; N = 6,000); (c) same-intensity stimuli from different
cultures (cross-culture/within-intensity, N = 12,400); and (d) different-intensity stimuli from different
cultures (cross-culture/cross-intensity, N = 24,000). In addition, we included (e) an omni-cultural/within-
intensity condition wherein training and evaluation were conducted using a combination of same-
intensity stimuli from all five cultures (N = 3,100). Classification was speaker-independent in all
conditions, which means that testing was always performed on a speaker not included in the training set.
We employed leave-out-one-speaker and leave-out-one-culture evaluation systems for within- and cross-
cultural classification, respectively (Schuller et al., 2010). The numbers of speakers used in the training
phase of each experiment were comparable in the within-cultural (N = 19) and cross-cultural (N = 20)
conditions, with evaluation conducted on one speaker at a time. For the omni-cultural condition we
instead used a leave-out-one-speaker-from-each-culture system where five speakers (each from different
cultures) were used for testing and the other 95 were used for training. Regularization parameters were
optimized using 13-fold cross-validation on each training set. Feature selection was not utilized in order to
keep the conditions similar in all experiments.
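As a rough sketch of this setup, the code below trains a nu-SVM with an RBF kernel (scikit-learn's NuSVC, which wraps LIBSVM) and evaluates it speaker-independently with a leave-out-one-speaker scheme for one culture. The column names, the feature-scaling step, and the fixed hyperparameters are assumptions made for illustration; in the actual experiments the regularization parameters were tuned by cross-validation on each training set.

    # Illustrative within-culture, leave-out-one-speaker evaluation with a nu-SVM.
    # Column names ("culture", "speaker", "emotion") and parameter values are assumed.
    import pandas as pd
    from sklearn.svm import NuSVC
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import make_pipeline
    from sklearn.model_selection import LeaveOneGroupOut

    def within_culture_accuracy(df: pd.DataFrame, feature_cols, culture):
        """Train and test on one culture, always holding out one speaker for testing."""
        data = df[df["culture"] == culture]
        X = data[feature_cols].values
        y = data["emotion"].values
        speakers = data["speaker"].values

        logo = LeaveOneGroupOut()
        correct, total = 0, 0
        for train_idx, test_idx in logo.split(X, y, groups=speakers):
            model = make_pipeline(StandardScaler(),
                                  NuSVC(nu=0.5, kernel="rbf", gamma="scale"))
            model.fit(X[train_idx], y[train_idx])
            correct += (model.predict(X[test_idx]) == y[test_idx]).sum()
            total += len(test_idx)
        return correct / total

A cross-cultural condition would follow the same pattern, but fit the pipeline on all stimuli from one culture and predict the stimuli of a speaker from another culture.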

Results
Overall classification rates were moderate yet significantly above the proportion expected by
chance, regardless of cultural and/or emotion-intensity match in training and testing (see Table 1). The
chance level in a classification task with 11 response options is 9.1% (one out of 11) and the lower limit of
the 95% confidence intervals for overall accuracy was higher than 9.1% in all conditions. This
demonstrates important regularities in the acoustic features used to express emotions across cultures and
levels of emotion intensity.
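In practical terms, the chance-level comparison amounts to checking whether the lower bound of the 95% confidence interval for the observed proportion of correct classifications exceeds 1/11 ≈ 9.1%. The paper does not state which interval formula was used; the minimal sketch below uses a normal approximation and made-up counts purely for illustration.

    # Is a classification rate above the 1/11 chance level?
    # Normal-approximation 95% CI for a proportion (illustrative only).
    import math

    def above_chance(n_correct, n_trials, n_classes=11, z=1.96):
        p_hat = n_correct / n_trials
        se = math.sqrt(p_hat * (1 - p_hat) / n_trials)
        lower, upper = p_hat - z * se, p_hat + z * se
        return p_hat, (lower, upper), lower > 1 / n_classes

    # Hypothetical counts: 165 of 1,000 test stimuli classified correctly
    print(above_chance(165, 1000))   # lower bound ~0.142 > 0.091, so above chance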
However, as predicted by the dialect theory, overall accuracy was significantly greater for classifiers
trained and tested on same-intensity stimuli from the same culture (23.5%) versus different cultures
(16.5%; z = 8.45, p < .001), demonstrating an in-group advantage based on acoustic properties. Classifiers
trained and tested on different-intensity stimuli from the same culture (20.2%), in turn, performed worse
than classifiers with the same intensity in both stages (z = 3.17, p = .002), but still outperformed classifiers
in the cross-culture/within-intensity condition (z = 6.13, p < .001). Accuracy in the cross-culture/cross-
intensity condition (15.6%) was lower than in the other conditions (zs ≥ 2.20, ps ≤ .028). These results
suggest that culture may have a larger effect on vocal acoustics than emotion intensity, and that their joint
contribution has the largest effect on vocal emotion encoding. In addition, overall accuracy was equal in
within- and omni-cultural conditions, suggesting that classifiers trained on several cultures performed on
par with within-culturally trained classifiers. At the level of individual emotions, all were classified with
above-chance accuracy in the within-culture/within-intensity condition and, except for pride, also in the
cross-culture/within-intensity condition (see Table 1). A significant in-group advantage was observed for
anger, contempt, fear, interest, neutral, pride, and sadness (zs ≥ 2.48, ps ≤ .013), but not for happiness, lust,
relief, or shame (zs ≤ 1.82). The decrease in accuracy when crossing intensity levels, in turn, was significant
for anger (z = 5.04, p < .001) and fear (z = 2.43, p = .015) in within-cultural conditions, and for anger (z =
2.99, p = .003) and shame (z = 2.28, p = .023) in cross-cultural conditions. Misclassification patterns were
similar across conditions and confusion matrices for all classification conditions are shown online in Table
S2. The most frequent confusions occurred between anger and happiness, and between lust and relief,
suggesting these expressions were conveyed using partly overlapping acoustic patterns. Figure 1, finally,
shows overall classification rates as a function of individual cultures. In keeping with dialect theory
predictions, the highest accuracy was consistently observed when there was a match between training and
testing culture. In-group accuracy was significantly higher than out-group accuracy for all cultures
(Australia, 21.1% vs. 17.1%; India, 30.2% vs. 16.9%; Kenya, 18.7% vs. 14.8%; Singapore, 20.8% vs. 15.7%;
and U.S.A., 26.6% vs. 18.1%; zs ≥ 2.48, ps ≤ .013), although not all pairwise comparisons between cultures
were significant (see Figure 1). Notably, classifiers trained on American expressions performed on par
with classifiers trained on Australian expressions when classifying expressions from Australia, suggesting
similarities in expressive style between these cultures.
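The condition comparisons reported above contrast overall classification rates between sets of evaluation experiments. The exact test behind the reported z values is not spelled out in the text; a standard two-proportion z-test, sketched below with hypothetical counts, is one common way to carry out such a comparison, and because the counts and test form here are assumptions it need not reproduce the statistics reported above.

    # Two-proportion z-test sketch for comparing classification accuracy between
    # two conditions (e.g., within-culture vs. cross-culture). Counts are hypothetical.
    import math

    def two_proportion_z(correct_a, n_a, correct_b, n_b):
        p_a, p_b = correct_a / n_a, correct_b / n_b
        p_pool = (correct_a + correct_b) / (n_a + n_b)
        se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
        return (p_a - p_b) / se

    # Hypothetical example: 23.5% of 3,100 vs. 16.5% of 12,400 evaluations correct
    z = two_proportion_z(round(0.235 * 3100), 3100, round(0.165 * 12400), 12400)
    print(f"z = {z:.2f}")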

Discussion
Results indicated systematic cultural differences in the acoustic expression patterns of emotional
speech across portrayals generated by professional actors. Machine learning simulations allowed for the
first time a truly random assignment of familiarity with another culture's expressive style, and
classification experiments crossed five cultures for training expression style with five cultures for
classifying expressions. Regardless of cultural match, a wide range of emotions were classified with
accuracy above chance, suggesting that vocal expressions share important characteristics across cultures.
However, classification accuracy was higher in within-cultural versus cross-cultural conditions, providing
the first demonstration of an in-group advantage based on objective properties of expressions.
These results provide the strongest evidence to date for the dialect theory of emotion (Elfenbein,
2013). Because the main difference between in-group and out-group conditions was familiarity with the
stimuli from the training culture, the most plausible mechanism for in-group advantage was greater
learning of culturally specific expression styles. Conscious management techniques (Ekman, 1972) can be
ruled out, because the fully balanced design allows for a test of the expressor-perceiver cultural match
independent of other group effects such as the use of decoding rules. Other mechanisms, such as ethnic
bias against members of different groups (Kilbride & Yarczower, 1983), can also be ruled out due to use of
a computational algorithm without human frailty.
The magnitude of the in-group advantage (approximately 7%) was similar to previous reports for
facial and vocal expressions (Elfenbein & Ambady, 2002; Juslin & Laukka, 2003) but varied between
emotions, with the largest differences observed for pride, anger, sadness, and fear, and the smallest for
happiness, lust, and relief. A tentative interpretation is that expressions of positive emotions (except for
pride) may be less culturally variable than negative ones, which is an exciting question for future research.
However, it is still early to draw firm conclusions, with no clear trends yet emerging across the limited
existing work (Elfenbein et al., 2007; Sauter et al., 2010). It is also possible that speakers varied more
idiosyncratically in expressing positive emotions, and so the simulation cast a wider net that may have
covered more cross-cultural differences. Pride was not cross-culturally recognized above chance, which
may indicate strongly salient culture-specific acoustic patterns. We speculate that pride might be more
universally expressed via body posture (Tracy & Robins, 2007) than vocal tone.

Overall classification performance was also lower when training and testing crossed levels of
emotion intensity. Large decreases were observed for anger and fear whereas most other emotions did not
show significant decreases. One explanation for this pattern of results could be that emotions vary in their
intensity ranges (Frijda et al., 1992). The anger and fear families contain states with different intensities,
ranging from mild irritation to hot rage and from mild nervousness to panic fear, respectively, but other
emotions may not exhibit this variability.
The in-group advantage further varied between cultures, where the cultures with the lowest within-
cultural accuracy also exhibited the smallest in-group advantage, possibly due to floor effects.
Interestingly, we observed that Australian expressions were equally well classified by both Australian-
trained and U.S.A.-trained classifiers. This suggests that Australia and the U.S.A., two countries with similar
profiles on Hofstede's (2001) cultural dimensions, may have similar expressive styles, and is in line with
dialect theory, which posits that cultural differences are not static, but instead vary as a function of cultural
distance (Elfenbein, 2013).
Emotional dialects may originate from a variety of sources, such as cultural variability in the modal
experience and conceptualization of various emotions (Fontaine, Scherer, & Soriano, 2013; Hess &
Thibault, 2009), or random drift that leads to divergent expressive styles when groups are stratified from
each other (Elfenbein, 2013). Linguistic factors may also contribute to emotional dialects, especially for
expressions conveyed through speech. We tried to take these sources into account by giving all actors
similar instructions whereby the emotional situations they were asked to enact were specified explicitly in
terms of appraisal dimensions. Although we controlled for the verbal content of the utterances, same-
language speakers from different cultures pronounce words differently due to local accents. If a local
accent has systematic effects on acoustic features that are used by the classifiers to discriminate between
emotions, this will lead to worse recognition when classifiers are trained and tested on stimuli with
different accents. However, we argue that, by definition, if an accent has a systematic influence on emotion
encoding, then this is part of the phenomenon rather than a nuisance effect. All the above sources may
have contributed to our results, and further research is needed to understand their separate contributions
to cultural differences in expressive style. We also note that the high degree of experimenter control for
stimulus factors other than speaker culture necessitated the use of deliberate portrayals instead of
spontaneous expressions. This may limit how well the findings generalize to more naturalistic stimuli.
Cross-cultural classification experiments provide an exciting new computational explanation of
how cultural differences can occur naturally as a consequence of people learning to interpret others'
expressions in a specific cultural context (Dailey et al., 2010). Although dialects in expressive style may
create cultural barriers for emotional communication, our results showed that the in-group advantage can
result from information gaps rather than from ingrained norms or prejudice. This suggests training and
intervention programs may be used to increase familiarity with previously unfamiliar elements of
nonverbal expression, thereby eliminating the in-group advantage (see Elfenbein, 2006). The results were
also noteworthy because classification rates in the omni-cultural condition, akin to a "citizen of the world"
who has become familiar with expressions from multiple groups, were on par with those in the within-
cultural condition. Computationally, cultural differences can be overcome if people learn to interpret
others' expressions in culturally diverse environments, suggesting optimism for increasingly multicultural
societies.

References
Boersma, P., & Weenink, D. (2008). Praat: Doing phonetics by computer [Computer software]. Retrieved
from http://www.praat.org
Chang, C.-C., & Lin, C.-J. (2011). LIBSVM: A library for support vector machines. ACM Transactions on
Intelligent Systems and Technology, 2, 27. doi: 10.1145/1961189.1961199
Dailey, M. N., Joyce, C., Lyons, M. J., Kamachi, M., Ishi, H., Gyoba, J., & Cottrell, G. W. (2010). Evidence
and a computational explanation of cultural differences in facial expression recognition. Emotion, 10,
874-893. doi: 10.1037/a0020019
Ekman, P. (1972). Universals and cultural differences in facial expressions of emotion. In J. K. Cole (Ed.),
Nebraska symposium on motivation, 1971 (Vol. 19, pp. 207-282). Lincoln, NE: University of Nebraska
Press.
Elfenbein, H. A. (2006). Learning in emotion judgments: Training and the cross-cultural understanding of
facial expressions. Journal of Nonverbal Behavior, 30, 21-36. doi: 10.1007/s10919-005-0002-y
Elfenbein, H. A. (2013). Nonverbal dialects and accents in facial expressions of emotion. Emotion Review, 5,
90-96. doi: 10.1177/1754073912451332
Elfenbein, H. A., & Ambady, N. (2002). On the universality and cultural specificity of emotion recognition:
A meta-analysis. Psychological Bulletin, 128, 203-235. doi: 10.1037/0033-2909.128.2.203
Elfenbein, H. A., Beaupré, M., Lévesque, M., & Hess, U. (2007). Toward a dialect theory: Cultural
differences in the expression and recognition of posed facial expressions. Emotion, 7, 131-146. doi:
10.1037/1528-3542.7.1.131
Ellsworth, P. C., & Scherer, K. R. (2003). Appraisal processes in emotion. In R. J. Davidson, K. R. Scherer, &
H. H. Goldsmith (Eds.), Handbook of affective sciences (pp. 572-595). New York: Oxford University
Press.
Fontaine, J. J. R., Scherer, K. R., & Soriano, C. (Eds.) (2013). Components of emotional meaning: A sourcebook.
New York: Oxford University Press.
Frijda, N. H., Ortony, A., Sonnemans, J., & Clore, G. L. (1992). The complexity of intensity: Issues
concerning the structure of emotion intensity. In M. S. Clark (Ed.), Review of personality and social
psychology (vol. 13, pp. 60-89). Thousand Oaks, CA: Sage.
Hess, U., & Thibault, P. (2009). Darwin and emotion expression. American Psychologist, 64, 120-128. doi:
10.1037/a0013386
Hofstede, G. (2001). Culture's consequences: Comparing values, behaviors, institutions, and organizations across
nations (2nd ed.). Thousand Oaks, CA: Sage.
Juslin, P. N., & Laukka, P. (2001). Impact of intended emotion intensity on cue utilization and decoding
accuracy in vocal expression of emotion. Emotion, 1, 381-412. doi: 10.1037/1528-3542.1.4.381
Juslin, P. N., & Laukka, P. (2003). Communication of emotions in vocal expression and music performance:
Different channels, same code? Psychological Bulletin, 129, 770-814. doi: 10.1037/0033-2909.129.5.770
Kamaruddin, N., Wahab, A., & Quek, C. (2012). Cultural dependency analysis for understanding speech
emotion. Expert Systems with Applications, 39, 5115-5133. doi: 10.1016/j.eswa.2011.11.028
Kilbride, J. E., & Yarczower, M. (1983). Ethnic bias in the recognition of facial expressions. Journal of
Nonverbal Behavior, 8, 27-41. doi: 10.1007/BF00986328
Laukka, P., Elfenbein, H. A., Chui, W., Thingujam, N. S., Iraki, F. K., Rockstuhl, T., & Althoff, J. (2010).
Presenting the VENEC corpus: Development of a cross-cultural corpus of vocal emotion
expressions and a novel method of annotating emotion appraisals. In L. Devillers, B. Schuller, R.
Cowie, E. Douglas-Cowie, & A. Batliner (Eds.), Proceedings of the LREC 2010 Workshop on Corpora for
Research on Emotion and Affect (pp. 53-57). Valletta, Malta: European Language Resources
Association.
Laukka, P., Elfenbein, H. A., Söder, N., Nordström, H., Althoff, J., Chui, W., Iraki, F. K., Rockstuhl, T., &
Thingujam, N. S. (2013). Cross-cultural decoding of positive and negative non-linguistic
vocalizations. Frontiers in Psychology, 4, 353. doi: 10.3389/fpsyg.2013.00353
Laukka, P., Neiberg, D., Forsell, M., Karlsson, I., & Elenius, K. (2011). Expression of affect in spontaneous
speech: Acoustic correlates and automatic detection of irritation and resignation. Computer Speech
and Language, 25, 84-104. doi: 10.1016/j.csl.2010.03.004
Lefter, I., Rothkrantz, L. J. M., Wiggers, P., & van Leeuwen, D. A. (2010). Emotion recognition from speech
by combining databases and fusion of classifiers. In P. Sojka, A. Horak, I. Kopecek, & K. Pala (Eds.),
Proceedings of the 13th International Conference on Text, Speech and Dialogue (pp. 353-360). Berlin:
Springer. doi: 10.1007/978-3-642-15760-8_45
Neiberg, D., Laukka, P., & Elfenbein, H. A. (2011). Intra-, inter-, and cross-cultural classification of vocal
affect. In Proceedings of the 12th Annual Conference of the International Speech Communication
Association, Interspeech 2011 (pp. 1581-1584). Florence, Italy: International Speech Communication
Association.
Sauter, D. A., Eisner, F., Ekman, P., & Scott, S. K. (2010). Cross-cultural recognition of basic emotions
through nonverbal emotional vocalizations. Proceedings of the National Academy of Sciences of the
United States of America, 107, 2408-2412. doi: 10.1073/pnas.0908239106
Scherer, K. R., Banse, R., & Wallbott, H. G. (2001). Emotion inferences from vocal expression correlate
across languages and cultures. Journal of Cross-Cultural Psychology, 32, 76-98. doi:
10.1177/0022022101032001009
Scherer, K. R., Clark-Polner, E., & Mortillaro, M. (2011). In the eye of the beholder? Universality and
cultural specificity in the expression and perception of emotion. International Journal of Psychology,
46, 401-435. doi: 10.1080/00207594.2011.626049
Schuller, B., Batliner, A., Steidl, S., & Seppi, D. (2011). Recognizing realistic emotions and affect in speech:
State of the art and lessons learnt from the first challenge. Speech Communication, 53, 1062-1087. doi:
10.1016/j.specom.2011.01.011
Schuller, B., Vlasenko, B., Eyben, F., Wöllmer, M., Stuhlsatz, A., Wendemuth, A., & Rigoll, G. (2010). Cross-
corpus acoustic emotion recognition: Variances and strategies. IEEE Transactions on Affective
Computing, 1, 119-131. doi: 10.1109/T-AFFC.2010.8
Shami, M., & Verhelst, W. (2007). An evaluation of the robustness of existing supervised machine learning
approaches to the classification of emotions in speech. Speech Communication, 49, 201-212. doi:
10.1016/j.specom.2007.01.006
Tracy, J. L., & Robins, R. W. (2007). Emerging insights into the nature and function of pride. Current
Directions in Psychological Science, 16, 147-150. doi: 10.1111/j.1467-8721.2007.00493.x
van Bezooijen, R., Otto, S. A., & Heenan, T. A. (1983). Recognition of vocal expressions of emotion: A three-
nation study to identify universal characteristics. Journal of Cross-Cultural Psychology, 14, 387-406.
doi: 10.1177/0022002183014004001

Table 1.
Classification rates (% accuracy) as a function of emotion for conditions where training and testing were conducted
on (1) same-intensity stimuli from the same culture, (2) different-intensity stimuli from the same culture, (3) same-
intensity stimuli from different cultures, (4) different-intensity stimuli from different cultures, and (5) same-
intensity stimuli from many cultures.
______________________________________________________________________________________________________________________
              (1) Within-culture/   (2) Within-culture/   (3) Cross-culture/    (4) Cross-culture/    (5) Omni-cultural/
Emotion       within-intensity      cross-intensity       within-intensity      cross-intensity       within-intensity
______________________________________________________________________________________________________________________
Anger         38.0 (32.5-43.5)      22.1 (18.8-25.4)      21.2 (18.9-23.5)      17.1 (15.6-18.6)      32.4 (27.1-37.7)
Contempt      17.8 (13.5-22.1)      16.8 (13.9-19.8)      11.0 (9.2-12.7)       10.5 (9.3-11.7)       16.3 (12.2-20.5)
Fear          29.4 (24.3-34.6)      22.0 (18.7-25.3)      19.0 (16.8-21.2)      17.2 (15.6-18.7)      32.7 (27.4-38.0)
Happiness     22.5 (17.8-27.3)      20.0 (16.8-23.2)      19.5 (17.3-21.8)      18.8 (17.3-20.4)      24.4 (19.5-29.2)
Interest      24.3 (19.4-29.1)      23.8 (20.4-27.2)      16.1 (14.0-18.2)      18.4 (16.8-19.9)      26.4 (21.4-31.4)
Lust          23.5 (18.7-28.3)      25.1 (21.7-28.6)      23.3 (20.9-25.7)      23.9 (22.2-25.6)      27.4 (22.3-32.4)
Neutral(a)    27.2 (18.5-35.9)      N/A                   16.4 (12.8-20.0)      N/A                   32.9 (23.7-42.1)
Pride         18.1 (13.7-22.4)      16.3 (13.3-19.2)      5.8 (4.4-7.1)         5.0 (4.1-5.9)         10.7 (7.2-14.2)
Relief        20.8 (16.2-25.4)      22.5 (19.2-25.9)      25.0 (22.6-27.5)      24.3 (22.6-26.0)      28.9 (23.7-34.0)
Sadness       20.0 (15.5-24.6)      15.4 (12.5-18.3)      12.1 (10.3-14.0)      10.4 (9.2-11.7)       17.0 (12.8-21.3)
Shame         16.5 (12.3-20.7)      18.1 (15.0-21.2)      12.5 (10.7-14.4)      10.0 (8.8-11.2)       14.7 (10.7-18.7)
______________________________________________________________________________________________________________________
Overall       23.1 (21.6-24.6)      20.2 (19.2-21.2)      16.5 (15.9-17.2)      15.6 (15.1-16.0)      23.1 (21.6-24.6)
______________________________________________________________________________________________________________________
Note. Results are averaged across experiments in each condition, and numbers within brackets indicate
95% confidence intervals. Underlined numbers indicate classification rates significantly above chance, as
indicated by 95% confidence intervals not overlapping with chance-level recall (chance level = 9.1%).
Numbers that are not underlined indicate classification rates not significantly above chance, as indicated
by z-tests. N = 300 observations/cell for the within-culture/within-intensity and omni-cultural/within-intensity
conditions (N = 100 for neutral portrayals); N = 600 observations/cell for the within-culture/cross-intensity
condition; N = 1,200 observations/cell for the cross-culture/within-intensity condition (N = 400 for neutral
portrayals); and N = 2,400 observations/cell for the cross-culture/cross-intensity condition.
(a) Neutral portrayals are not included in the overall classification rates.
Figure 1. Classification accuracy as a function of training and testing culture (Australia, India, Kenya, Singapore,
and USA) for the within-intensity condition.

Note: Error bars = 95% confidence intervals; IN = in-group condition (i.e., match between training and
testing culture); * = accuracy significantly lower than in-group accuracy, as indicated by non-overlapping
95% confidence intervals; † = accuracy significantly lower than in-group accuracy (z = 2.44, p = .015).

Table S1. Thirty acoustic measures related to pitch, intensity, formants, voice source and temporal aspects of speech.
___________________________________________________________________________
Pitch cues
F0M/SD Mean and standard deviation of the fundamental frequency (F0)
F0Q1/5 First and last quintiles of F0
F0Slope Slope of F0
F0FracRise/Fall Percentage of frames with F0 rise/fall
F0SSubtSD Standard deviation of F0, with the slope subtracted
Jitter Cycle-to-cycle variations in F0
___________________________________________________________________________
Intensity cues
IntM/SD Mean and standard deviation of voice intensity
IntQ1/5 First and last quintiles of voice intensity
IntSlope Slope of voice intensity
IntFracRise/Fall Percentage of frames with voice intensity rise/fall
Shimmer Cycle-to-cycle variations in intensity
___________________________________________________________________________
Formant cues
F13M Mean of formants 1–3
F13SD Standard deviation of formants 1–3
F13B Median bandwidth of formants 1–3
___________________________________________________________________________
Voice source cues
H1MH2 F0 amplitude minus 2nd F0 harmonic amplitude
H1MA3 F0 amplitude minus 3rd formant amplitude
___________________________________________________________________________
Temporal cues
SilDurM Mean of within-phrase silence duration
SyllDurM Mean of syllable duration
___________________________________________________________________________

Note. In a preprocessing step, each utterance was segmented into pseudo-syllables, defined as sequences
of unvoiced, voiced, and unvoiced segments with intensity and duration above thresholds tuned for
current data. Acoustic measures were calculated for these pseudo-syllables, and their averages over the
whole utterance were then used for the machine learning experiments. Temporal cues and M, SD, Q1/5,
and slope of F0 and voice intensity were instead computed over the whole utterance. All pitch-based
measures use a logarithmic scale and all intensity-based measures use the dB scale.
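As a rough illustration of this preprocessing step, the sketch below marks pseudo-syllable nuclei as contiguous voiced stretches of the F0 track that also exceed intensity and duration thresholds; the threshold values and the simplification of the unvoiced-voiced-unvoiced definition to its voiced core are assumptions, not the tuned settings used for the corpus.

    # Simplified pseudo-syllable segmentation sketch (thresholds are placeholders).
    import numpy as np

    def pseudo_syllable_nuclei(f0, intensity_db, frame_step=0.01,
                               min_db=50.0, min_dur=0.06):
        """Return (onset, offset) times of voiced stretches that are loud and long enough."""
        active = (np.asarray(f0) > 0) & (np.asarray(intensity_db) > min_db)
        segments, start = [], None
        for i, is_active in enumerate(active):
            if is_active and start is None:
                start = i
            elif not is_active and start is not None:
                if (i - start) * frame_step >= min_dur:
                    segments.append((start * frame_step, i * frame_step))
                start = None
        if start is not None and (len(active) - start) * frame_step >= min_dur:
            segments.append((start * frame_step, len(active) * frame_step))
        return segments

Per-syllable measures could then be averaged over all returned segments to obtain the utterance-level values used in the experiments.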

Table S2. Confusion matrices for conditions where training and testing were conducted on (1) same-intensity
stimuli from the same culture (within-culture/within-intensity), (2) different-intensity stimuli from the same culture
(within-culture/cross-intensity), (3) same-intensity stimuli from different cultures (cross-culture/within-intensity),
(4) different-intensity stimuli from different cultures (cross-culture/cross-intensity), and (5) same-intensity stimuli
from multiple cultures (omni-cultural/within-intensity).

1) Within-culture/within-intensity

                                                  Classified emotion
Intended       Anger  Contempt      Fear Happiness  Interest      Lust   Neutral     Pride    Relief   Sadness     Shame
emotion
________________________________________________________________________________________________________________________
Anger           38.0      10.8       8.2      10.2       5.5       2.7       2.4       9.4       6.2       5.8       0.8
Contempt        15.0      17.8       8.7       6.3       7.0       4.2       8.0      13.6       8.9       4.4       6.1
Fear            10.0       9.2      29.4      10.8       6.3       4.2       2.6       6.6       5.2      10.0       5.7
Happiness       16.7       9.6      10.7      22.5      11.8       1.4       5.7      10.7       4.3       3.1       3.5
Interest        10.9      10.7       8.0      11.4      24.3       6.3       6.1       9.8       2.3       6.1       4.2
Lust             4.7       5.2       9.7       3.3       6.2      23.5       5.9       7.7      17.6       6.3      10.0
Neutral          3.6      10.7       6.2       8.3       5.7       7.8      27.2       5.0       5.5       8.7      11.2
Pride           12.0       9.6      12.4      11.2      11.2       6.7       4.8      18.1       5.6       4.5       4.1
Relief           6.1       8.1       9.2       6.9       4.8      11.3       6.3       7.4      20.8       6.0      13.0
Sadness          5.7       6.1      14.8       6.6       7.7       6.9       7.5       7.3       5.1      20.0      12.2
Shame            5.6       8.6       9.4       6.0       5.7       9.9       8.1       4.4      13.6      12.2      16.5
________________________________________________________________________________________________________________________
2) Within-culture/cross-intensity

                                                  Classified emotion
Intended       Anger  Contempt      Fear Happiness  Interest      Lust   Neutral     Pride    Relief   Sadness     Shame
emotion
________________________________________________________________________________________________________________________
Anger           22.1      11.4       8.7      16.7      11.3       3.3       4.8       7.3       4.8       5.0       4.5
Contempt        14.4      16.8       6.9      10.1       7.7       4.1       9.9      11.1       6.3       6.6       6.2
Fear             8.9       8.1      22.0       9.5      11.2       4.1       5.1       8.7       5.9      10.8       5.8
Happiness       14.6       9.0      10.3      20.0      10.1       2.0       6.0      11.4       5.4       7.2       4.0
Interest         8.7       8.4       8.0       7.8      23.8       4.4       8.6       9.9       4.8       9.3       6.3
Lust             2.1       7.6       6.0       4.8       5.8      25.1       8.4       7.0      12.2       8.4      12.5
Neutral          3.6      10.7       6.2       8.3       5.7       7.8      27.2       5.0       5.5       8.7      11.2
Pride           11.0      11.6       7.7      12.3      11.6       3.8      10.2      16.3       4.1       5.9       5.6
Relief           8.5       9.4       4.5       8.6       6.8      12.3       8.3       4.9      22.5       4.1      10.1
Sadness          4.6       7.1      11.8       9.3       8.7       7.4      11.1       6.4       6.0      15.4      12.4
Shame            6.0       6.2       8.9       6.1       6.0       9.6      10.9       5.6       9.8      12.7      18.1
________________________________________________________________________________________________________________________


3) Cross-culture/within-intensity

                                                  Classified emotion
Intended       Anger  Contempt      Fear Happiness  Interest      Lust   Neutral     Pride    Relief   Sadness     Shame
emotion
________________________________________________________________________________________________________________________
Anger           21.2       7.4       9.6      16.0       6.6       5.4       4.0       5.5      10.4       8.3       5.6
Contempt        13.4      11.0       6.5      10.7       7.5       8.5       7.9       4.3      13.3       8.2       8.6
Fear            15.1       4.8      19.0      10.8       7.1       7.0       4.8       4.5       9.9      11.6       5.5
Happiness       16.3       6.6      11.3      19.5       8.9       5.4       3.8       6.4       8.4       6.7       6.8
Interest        11.0       6.6       9.6      13.9      16.1       9.3       6.9       3.8       9.8       8.0       5.0
Lust             6.8       5.3       7.9       5.1       5.5      23.3       6.2       2.9      20.6       7.2       9.2
Neutral          8.9       5.4       6.2       7.9       7.1       8.4      16.4       3.8      14.5       9.8      11.5
Pride           13.3       8.2       9.2      15.7       8.6       8.5       4.6       5.8      11.6       7.3       7.3
Relief           7.7       6.7       8.9       7.9       5.8      15.9       6.2       3.8      25.0       5.0       7.2
Sadness         11.4       5.5      11.9      10.2       5.8      11.2       7.2       5.3      10.7      12.1       8.6
Shame            7.8       5.7       7.1       8.1       4.7      16.2       7.8       4.5      16.0       9.6      12.5
________________________________________________________________________________________________________________________
4) Cross-culture/cross-intensity

                                                  Classified emotion
Intended       Anger  Contempt      Fear Happiness  Interest      Lust   Neutral     Pride    Relief   Sadness     Shame
emotion
________________________________________________________________________________________________________________________
Anger           17.1       8.6       9.4      16.6       8.8       3.9       7.0       6.6      10.0       6.7       5.5
Contempt        12.5      10.5       6.7      10.4       8.4       7.6      10.3       5.7      12.8       6.7       8.4
Fear            13.4       3.8      17.2      12.3       9.1       6.9       6.8       4.5       9.1      11.3       5.7
Happiness       13.8       8.2      10.1      18.8      10.1       5.1       6.4       6.8       8.9       6.9       4.9
Interest         9.7       6.2       9.8      13.2      18.4       9.3       8.4       4.6       9.0       6.6       4.9
Lust             4.8       5.8       6.3       4.9       6.9      23.9       7.4       2.8      21.3       6.7       9.1
Neutral          8.9       5.4       6.2       7.9       7.1       8.4      16.4       3.8      14.5       9.8      11.5
Pride           12.7       9.3       9.2      13.5      10.1       7.7       7.8       5.0      11.4       6.5       6.8
Relief           8.0       7.5       6.4       7.8       6.0      15.1       7.3       4.5      24.3       5.4       7.7
Sadness          8.2       6.3      12.5       9.1       8.5      10.1       9.6       4.7      10.7      10.4       9.8
Shame            8.5       6.7       8.4       6.4       4.9      14.9      11.1       4.4      15.4       9.4      10.0
________________________________________________________________________________________________________________________
5) Omni-cultural/within-intensity

                                                  Classified emotion
Intended       Anger  Contempt      Fear Happiness  Interest      Lust   Neutral     Pride    Relief   Sadness     Shame
emotion
________________________________________________________________________________________________________________________
Anger           32.4       8.1      11.2      18.0       5.9       2.1       4.0       7.1       6.6       1.9       2.8
Contempt        14.6      16.3       4.9       9.1      10.0       6.6       8.6      11.2       6.6       4.9       7.1
Fear            11.1       5.6      32.7       9.0       9.4       5.3       4.6       1.9       7.5       8.4       4.5
Happiness       17.9       7.5      11.7      24.4      10.8       1.9       4.9       8.5       6.2       3.0       3.0
Interest        10.0       8.4       8.7       9.9      26.4       4.5       6.0       7.4       8.7       6.8       3.2
Lust             4.1       4.1      12.1       3.5       6.0      27.4       7.0       4.5      15.6       1.6      14.1
Neutral          4.1       9.9       6.6       4.1      13.2       3.7      32.9       4.5       6.2       4.5      10.3
Pride           11.8      10.4      10.0      15.9       9.6       6.3       5.9      10.7       7.8       5.6       5.9
Relief           7.9       8.0       8.4       6.0       4.3      10.2       9.0       3.6      28.9       4.0       9.5
Sadness          7.7       5.1      14.9      10.2       6.0       5.8      11.7       5.2       5.3      17.0      10.9
Shame            5.2       5.6       4.8       8.6       6.5      10.4      13.8       6.1      13.4      10.9      14.7
________________________________________________________________________________________________________________________
