Psychoacoustic cues to emotion in speech prosody and music
"Both music and speech have the capacity to communicate emotions to listeners through the organization of acoustic signals. Furthermore, there is evidence of shared acoustic profiles common to the expression of emotions in both domains. Previous research therefore provides a basis to theorise the existence of a general mechanism for the expression and recognition of emotions in speech and music, although the generality and ecological validity of these studies is limited. This research project combines a controlled experiment and computational modelling to investigate the perceptual codes associated with the expression of emotion in the acoustic domain. Previous research using this method has shown that spatiotemporal dynamics in psychoacoustic features of music are associated with two psychological dimensions of affect underlying judgments of subjective feelings: arousal and valence (Coutinho & Cangelosi, 2009; Coutinho & Cangelosi, 2011). The empirical study reported here provides human ratings of emotions perceived in excerpts of film music and natural speech samples for a large range of emotional states. The computational study creates a computer model that retrieves the relevant information from the acoustic stimuli and makes predictions about the emotional expressiveness of speech and music close to the responses of human subjects. Extending previous findings, we show that a significant part of the listeners’ reported emotions to music and speech prosody can be predicted from a set of seven psychoacoustic features: loudness, tempo/speech rate, melody/prosody contour, spectral centroid, spectral flux, sharpness, and roughness. The implications of these results are discussed in the context of cross-modal similarities in the communication of emotion in the acoustic domain. Key words: Emotion, Arousal and Valence, Music, Speech prosody, Psychoacoustics, Neural Networks"