Cross-cultural and cross-linguistic perception of authentic emotions through speech: An acoustic-phonetic study with Brazilian and Swedish listeners

The influence of language and culture on the understanding of vocal emotions

We investigated the influence of culture and language on the understanding of vocal emotions. Listeners from different cultures and language families were asked to identify moderately expressed vocal emotions (joy, anger, sadness) or neutrality in each sentence of foreign speech without seeing the speaker. The web-based listening test consisted of 35 context-free sentences drawn from the Estonian Emotional Speech Corpus. Eight adult groups participated: 30 Estonians, 17 Latvians, 16 North Italians, 18 Finns, 16 Swedes, 16 Danes, 16 Norwegians, and 16 Russians. All participants lived in their home countries and, except for the Estonians, had no knowledge of Estonian. Results showed that most test groups differed significantly from the Estonians in the recognition of most emotions; only Estonian sadness was recognized well by all groups. The results indicate that genealogical relatedness of languages and similarity of cultural values confer no advantage in recognizing vocal emotions expressed in a different culture and language.

Perception of emotional speech in Brazilian Portuguese: an intonational and multidimensional approach

The analysis of emotional speech has played an important role in speech science. Studies on the expression of emotion in humans were conducted by Darwin (1872), Lange (1885), and James (1890), but little attention was paid to emotional speech. In the twentieth century several studies investigated the manifestation of emotion in speech (Skinner, 1935; Costanzo, Merkel, & Costanzo, 1969; Wallbott & Scherer, 1986), but few works compared emotional speech across languages (Scherer, 2000; Scherer et al., 2001; Pfitzinger et al., 2011), and those analysed acted emotional speech and non-natural language stimuli. This study aims to analyse the perception that Brazilian and English listeners (non-Portuguese speakers) have of Brazilian Portuguese (BP) emotional speech, using a three-dimensional model of emotion. Participants' judgments were compared with acoustic parameters of intonation extracted by the software ExProsodia (Ferreira Netto, 2008, 2010). The corpus comprises 32 excerpts of spontaneous BP speech produced by different speakers and collected from the internet. The three dimensions adopted were valence (unpleasant - pleasant), activation (non-agitated - agitated), and dominance (submissive - non-submissive). The participants (18 Brazilian and 18 English) listened to the 32 excerpts and rated each dimension on a scale from 0 to 100. Nine acoustic parameters of intonation were used: medium tone of sentences (MT); standard deviation of medium tone (sdMT); skewness of medium tone (sMT); coefficient of variation of medium tone (cvMT); lowest value (Hz) of the intonation unit (lvIU); duration (ms) of the intonation unit (IU); standard deviation of the intonation unit (sdIU); interval duration of the intonation unit (idIU); and standard deviation of the interval of the intonation unit (sdiIU). Simple and multiple linear regressions were run to measure how participants' judgements and the intonational parameters were related. Simple linear regression showed that perceived degrees of activation could be predicted by some intonational parameters: the higher the cvMT values, the greater the activation; conversely, the lower the IU values, the greater the activation. For perceived dominance, MT yielded significant results for the Brazilian responses. Multiple linear regression showed similar results but with greater explanatory power, i.e. higher R² values. Overall, the results point to a relation between high values of MT and cvMT (together with low values of IU, sdIU, idIU, and sdiIU) and high ratings of activation and dominance. The valence dimension showed no significant relation to the intonational parameters. The analysis showed that non-native listeners' evaluations could be explained by acoustic information alone, without the influence of the lexicon. Moreover, automatically extracted intonational parameters can predict perceived degrees of activation and dominance in BP emotional speech.
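To make the regression step concrete, here is a minimal sketch of how activation ratings might be regressed on intonational parameters, in the spirit of the analysis above. The synthetic data, the choice of cvMT and IU as predictors, and the effect sizes are illustrative assumptions; the paper's actual measurements and the ExProsodia output format are not reproduced here.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data: one row per excerpt (the study used 32 excerpts).
# cvMT = coefficient of variation of the medium tone,
# IU = duration (ms) of the intonation unit.
rng = np.random.default_rng(0)
n_excerpts = 32
cvMT = rng.uniform(0.02, 0.20, n_excerpts)
IU = rng.uniform(80, 400, n_excerpts)
# Assumed mean activation ratings (0-100), averaged over listeners,
# built so that higher cvMT and lower IU raise activation.
activation = 40 + 200 * cvMT - 0.05 * IU + rng.normal(0, 5, n_excerpts)

# Simple linear regression: activation ~ cvMT
simple = sm.OLS(activation, sm.add_constant(cvMT)).fit()
print(simple.rsquared)

# Multiple linear regression: activation ~ cvMT + IU
multi = sm.OLS(activation, sm.add_constant(np.column_stack([cvMT, IU]))).fit()
print(multi.rsquared)  # expected to exceed the simple model's R²
```

Comparing the two R² values mirrors the abstract's observation that the multiple regression has greater explanatory power than the simple one.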

Factors in the recognition of vocally expressed emotions: A comparison of four languages

Journal of Phonetics, 2009

To understand how language influences the vocal communication of emotion, we investigated how discrete emotions are recognized and acoustically differentiated in four language contexts: English, German, Hindi, and Arabic. Vocal expressions of six emotions (anger, disgust, fear, sadness, happiness, pleasant surprise) and neutral expressions were elicited from four native speakers of each language. Each speaker produced pseudo-utterances ("nonsense speech") which resembled their native language to express each emotion type, and the recordings were judged for their perceived emotional meaning by a group of native listeners in each language condition. Emotion recognition and acoustic patterns were analyzed within and across languages. Although overall recognition rates varied by language, all emotions could be recognized strictly from vocal cues in each language at levels exceeding chance. Anger, sadness, and fear tended to be recognized most accurately irrespective of language. Acoustic and discriminant function analyses highlighted the importance of speaker fundamental frequency (i.e., relative pitch level and variability) for signalling vocal emotions in all languages. Our data emphasize that while emotional communication is governed by display rules and other social variables, vocal expressions of 'basic' emotion in speech exhibit modal tendencies in their acoustic and perceptual attributes which are largely unaffected by language or linguistic similarity.
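As a rough illustration of the discriminant function analysis mentioned above, the sketch below trains a linear discriminant classifier on two f0-based features (pitch level and variability), the cues the abstract highlights. The class means and feature values are fabricated assumptions for demonstration, not the study's measurements.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

# Hypothetical features per utterance: mean f0 (relative pitch level)
# and f0 standard deviation (pitch variability).
rng = np.random.default_rng(1)
emotions = ["anger", "sadness", "fear"]
X_parts, y = [], []
for label, (f0_mean, f0_sd) in zip(emotions, [(220, 40), (160, 15), (240, 55)]):
    n = 40
    X_parts.append(np.column_stack([
        rng.normal(f0_mean, 20, n),  # mean f0 in Hz
        rng.normal(f0_sd, 5, n),     # f0 variability
    ]))
    y += [label] * n
X = np.vstack(X_parts)

# Discriminant function analysis: classify emotion from f0 features alone.
lda = LinearDiscriminantAnalysis()
print(cross_val_score(lda, X, y, cv=5).mean())  # accuracy well above chance (1/3)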

Åsa Abelin and J. Allwood - Cross-Linguistic Interpretation of Emotional Prosody

This study has three purposes: the first is to examine whether there is consensus in the way listeners interpret different emotions and attitudes expressed by a Swedish speaker; the second is to see whether this interpretation depends on the listeners' cultural and linguistic background; and the third is to ascertain whether there is any recurring relation between acoustic and semantic properties of the stimuli.

Common Factors in Emotion Perception Among Different Cultures

2000

There may exist common factors, independent of language and culture, in the human perception of emotion via speech sounds. This study investigated such factors using subjects from Japan, the United States, and China, none of whom had any experience living abroad. An emotional speech database without linguistic information was used and evaluated using 3- and 6-emotional…

Emotional Prosody: Does Culture Make a Difference?

Speech Prosody, 2006

We report on a multilingual comparison study of the effects of prosodic changes on emotional speech, conducted in France, Germany, Greece, and Turkey. Semantically identical sentences expressing emotionally relevant content were translated into the target languages and manipulated systematically with respect to pitch range, duration model, and jitter simulation. Perception experiments in the participating countries showed relevant effects irrespective of language. Nonetheless, some effects of language are also reported.

Relating Emotional Content to Speech Rate in Brazilian Portuguese

Speech Prosody 2004, 2004

In the speech science literature, emotion is frequently conceived in terms of a mapping between specific emotions, taken as psychological concepts, and phonetic parameters such as voice quality, speech rate, and prominence. The main goal of this paper is to propose a language-based analysis of the emotional content of the text in order to reach a more abstract and culturally independent approach to emotion in speech. This work presents an experiment relating speech rate to temporality as a constitutive element of emotion. We are able to quantify the temporal content of emotion in the text through a semiotic analysis.

The development of cross-cultural recognition of vocal emotion during childhood and adolescence, by Chronaki, Wigelsworth, Pell & Kotz

2018

Humans have an innate set of emotions that is recognised universally. However, emotion recognition also depends on socio-cultural rules. Although adults recognise vocal emotions universally, they identify emotions more accurately in their native language. We examined developmental trajectories of universal vocal emotion recognition in children. Eighty native English speakers completed a vocal emotion recognition task in their native language (English) and in foreign languages (Spanish, Chinese, and Arabic) expressing anger, happiness, sadness, fear, and neutrality. Emotion recognition was compared across 8-to-10-year-olds, 11-to-13-year-olds, and adults. Measures of behavioural and emotional problems were also taken. Results showed that although emotion recognition was above chance for all languages, native English-speaking children were more accurate in recognising vocal emotions in their native language. There was a larger improvement in recognising vocal emotion from the native language during adolescence. Vocal anger recognition did not improve with age for the non-native languages. This is the first study to demonstrate universality of vocal emotion recognition in children whilst supporting an "in-group advantage" for more accurate recognition in the native language. Findings highlight the role of experience in emotion recognition, have implications for child development in modern multicultural societies, and address important theoretical questions about the nature of emotions.

Vocal cues provide a rich source of information about a speaker's emotional state. The term 'prosody' derives from the Greek word 'prosodia' and refers to the changes in pitch, loudness, rhythm, and voice quality corresponding to a person's emotional state [1,2]. Recent debates have focused on whether the ability to recognise vocal emotion is universal (e.g., due to biological significance to conspecifics) or whether it is influenced by learning, experience, or maturation [3,4]. It is argued that humans have an innate, core set of emotions which seem to be expressed and recognised universally [5]. However, the way emotional expressions are perceived can be highly dependent on learning and culture [6]. It has been argued that when attending to the prosody conveyed in speech, listeners apply universal principles enabling them to recognise emotions in speech from foreign languages as accurately as in their native language [7]. However, it is also argued that cultural and social influences create subtle stylistic differences in emotional prosody perception [3]. In addition, cultural influences may affect how listeners interpret emotional meaning from prosody [8]. This is known as an "in-group advantage": listeners recognise emotional expressions in their native language more accurately than in a foreign language [7]. Previous research has provided support for the hypothesis of an "in-group advantage" in the recognition of vocal emotional expressions. Recent studies by Pell and colleagues [9] used pseudo-utterances produced by Spanish, English, German, and Arabic actors in five different emotions (anger, disgust, fear, sadness, and happiness) as well as neutral expressions. Pseudo-utterances reduce the effect of meaningful lexical-semantic information on the perception of vocally expressed emotions and mimic the phonotactic and morpho-syntactic properties of the…
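The "above chance for all languages" claim in studies like this one is typically assessed with a binomial test against the guessing rate. A minimal sketch, assuming five response options (anger, happiness, sadness, fear, neutrality, so chance = 1/5) and hypothetical trial counts; the numbers are not taken from the study:

```python
from scipy.stats import binomtest

# Hypothetical example: a listener labels 60 vocal stimuli, choosing one
# of five response options, so chance-level accuracy is 1/5 = 0.2.
n_trials = 60
n_correct = 22  # assumed count of correct identifications

result = binomtest(n_correct, n_trials, p=0.2, alternative="greater")
print(result.pvalue)  # a small p-value indicates recognition above chance
```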

Real vs. acted emotional speech: Comparing South-Asian and Caucasian speakers and observers

2008

Both acted and real emotional audiovisual speech was collected from 50 Caucasian speakers (from the Netherlands) and 45 South-Asian speakers (from Pakistan), using a novel adaptation of the Velten technique in which some participants are asked to act as if they are in a given emotional state, while those emotions are actually induced in other participants. Generally, the acted conditions did not lead to systematic non-neutral emotions, while the non-acted conditions led to the intended emotions being induced in both the Dutch and the Pakistani speakers (with the latter seeming to feel the emotions more strongly). We then performed a series of perception experiments in which Dutch and Pakistani observers were asked to judge the emotional state of Dutch and Pakistani speakers. Acted emotions of speakers from both cultures were perceived as stronger than their non-acted counterparts by both Dutch and Pakistani observers. Interestingly, for Dutch speakers the negative emotions were perceived as relatively strong, whereas for the Pakistani speakers the positive emotions stood out perceptually.

Emotion Inferences from Vocal Expression Correlate Across Languages and Cultures

Journal of Cross-Cultural Psychology, 2001

Whereas the perception of emotion from facial expression has been extensively studied cross-culturally, little is known about judges' ability to infer emotion from vocal cues. This article reports the results of a study conducted in nine countries in Europe, the United States, and Asia on vocal emotion portrayals of anger, sadness, fear, joy, and neutral voice as produced by professional German actors. Data show an overall accuracy of 66% across all emotions and countries. Although accuracy was substantially better than chance, there were sizable differences, ranging from 74% in Germany to 52% in Indonesia. However, patterns of confusion were very similar across all countries. These data suggest the existence of similar inference rules from vocal expression across cultures. Generally, accuracy decreased with increasing language dissimilarity from German in spite of the use of language-free speech samples. It is concluded that culture- and language-specific paralinguistic patterns may influence the decoding process.