Discriminating between High-Arousal and Low-Arousal Emotional States of Mind using Acoustic Analysis
Related papers
Acoustic and Physiological Feature Analysis of Affective Speech
Lecture Notes in Computer Science, 2006
The paper presents our recent work on the acoustic and physiological feature analysis of affective speech. An affective speech corpus is first built; it contains passages read in a neutral state and in ten typical affective states selected in Pleasure-Arousal-Dominance (PAD) space. Physiological data, including electrocardiogram, respiration, electrodermal data, and finger pulse, are also collected synchronously with the speech. Based on the corpus, the relationship between affective states and acoustic/physiological features is then studied through correlation analysis and co-clustering analysis. The analysis results show that most acoustic features and physiological features are significantly correlated with the arousal dimension, whereas only respiration features are more strongly correlated with the pleasure dimension.
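A minimal sketch of the correlation step described in this abstract, under the assumption that per-utterance acoustic and physiological features are compared against a PAD arousal rating. The feature names and ratings below are placeholders, not the authors' corpus; only the analysis pattern (Pearson correlation per feature) follows the abstract.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n_utterances = 50
arousal = rng.uniform(-1, 1, n_utterances)  # placeholder PAD arousal ratings

# Placeholder per-utterance features; two are made arousal-dependent on purpose.
features = {
    "mean_f0": 120 + 40 * arousal + rng.normal(0, 10, n_utterances),
    "rms_energy": 0.3 + 0.2 * arousal + rng.normal(0, 0.05, n_utterances),
    "respiration_rate": 15 + rng.normal(0, 2, n_utterances),
}

# Correlate each feature with the arousal dimension.
for name, values in features.items():
    r, p = pearsonr(values, arousal)
    print(f"{name:18s} r={r:+.2f}  p={p:.3f}")
```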
Classification of emotional speech through spectral pattern features
2014
Speech Emotion Recognition (SER) is a new and challenging research area with a wide range of applications in man-machine interaction. The aim of an SER system is to recognize human emotion by analyzing the acoustics of speech sound. In this study, Spectral Pattern features (SPs) and Harmonic Energy features (HEs) for emotion recognition are proposed. These features are extracted from the spectrogram of the speech signal using image processing techniques. For this purpose, details in the spectrogram image are first highlighted using a histogram equalization technique. Then, directional filters are applied to decompose the image into 6 directional components. Finally, a binary masking approach is employed to extract SPs from the sub-banded images. The proposed HEs are extracted by applying band-pass filters to the spectrogram image. The extracted features are reduced in dimension using a filtering feature selection algorithm based on the Fisher discriminant ratio. The classification accur...
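A rough sketch of the spectrogram-as-image idea: compute a log-magnitude spectrogram, enhance contrast, apply a couple of oriented filters, and threshold the responses into binary masks. The kernels, thresholds, and toy signal are illustrative assumptions, not the paper's exact six-direction filter bank.

```python
import numpy as np
from scipy.signal import spectrogram
from scipy.ndimage import convolve

sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
y = np.sin(2 * np.pi * 220 * t) + 0.5 * np.sin(2 * np.pi * 660 * t)  # toy "speech"

f, times, S = spectrogram(y, fs=sr, nperseg=512, noverlap=256)
img = np.log1p(S)                                      # log-magnitude "image"
img = (img - img.min()) / (np.ptp(img) + 1e-12)        # normalize to [0, 1]

# Simple histogram equalization via the empirical CDF.
ranks = np.argsort(np.argsort(img.ravel()))
img_eq = (ranks / ranks.size).reshape(img.shape)

# Two oriented (directional) filters; the paper decomposes into six directions.
horiz = np.array([[1, 1, 1], [0, 0, 0], [-1, -1, -1]], dtype=float)
vert = horiz.T
for name, kernel in [("horizontal", horiz), ("vertical", vert)]:
    response = convolve(img_eq, kernel)
    mask = response > response.mean() + response.std()  # binary masking
    print(f"{name:10s} pattern energy: {img_eq[mask].sum():.2f}")
```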
2022
In the presented study, an algorithm is analyzed for estimating speakers' emotions using speech analysis techniques. The primary focus of the algorithm is analyzing the trajectory of the fundamental frequency F0(t) to accurately determine a range of emotional states. The investigation includes a comprehensive analysis conducted within the planes (F0, 2) and (F0, T). During the initial training phase, a clear decision criterion is established to differentiate between emotional states. This criterion is defined based on the analysis of test signals and is positioned within the designated planes. The performance of the algorithm is evaluated during the testing phase using a confusion matrix. This evaluation allows for a precise assessment of the algorithm's capability to detect emotional states. Furthermore, a comparative study is undertaken, comparing outcomes related to the identification of various emotional states such as Normal/Anger, Normal/Boredom, and Normal/Anxiety. To provide a comprehensive presentation of the algorithm's effectiveness in identifying emotional states, the results are presented in tables and graphs.
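A hedged sketch of the F0-trajectory idea: estimate F0(t) frame by frame (here with crude autocorrelation, not the paper's tracker), then apply a decision criterion on summary statistics such as mean F0 and F0 range. The threshold values and the synthetic rising-pitch signal are illustrative assumptions.

```python
import numpy as np

def frame_f0(frame, sr, fmin=75, fmax=400):
    """Crude autocorrelation pitch estimate for a single frame."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + np.argmax(ac[lo:hi])
    return sr / lag

sr, dur = 16000, 1.0
t = np.linspace(0, dur, int(sr * dur), endpoint=False)
f0_true = 120 + 60 * t / dur                     # toy pitch rising 120 -> 180 Hz
y = np.sin(2 * np.pi * np.cumsum(f0_true) / sr)

frame_len, hop = 1024, 512
f0_track = [frame_f0(y[i:i + frame_len], sr)
            for i in range(0, len(y) - frame_len, hop)]
mean_f0, f0_range = np.mean(f0_track), np.ptp(f0_track)

# Illustrative decision criterion in the (mean F0, F0 range) plane.
label = "anger-like (high arousal)" if mean_f0 > 160 or f0_range > 80 else "normal"
print(f"mean F0 = {mean_f0:.1f} Hz, range = {f0_range:.1f} Hz -> {label}")
```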
2016
The objective of this work is a comparative study of the speech signal between its silence and non-silence regions. In this work, our main goal is to observe how the pitch contour, energy, and duration vary over time and to study how these changes play an important role in emotion recognition. An important step in emotion recognition from speech is to select significant features that carry most of the emotional information in the speech signal. Emotion recognition from speech draws on different types of features, among them prosodic, spectral, and acoustic features. Prosodic features are sometimes called supra-segmental features. They deal with the auditory qualities of the sound and can also reflect aspects of meaning, intention, and the emotional state of the characters [1][2]. Prosodic features carry pitch-related information used in identifying emotion, such as pitch, energy, and duration. In this work we also explored the importance of t...
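A minimal sketch of the prosodic measures this abstract lists: short-time energy to separate silence from speech regions, and the duration of the speech part. The energy threshold and the synthetic utterance are assumptions for illustration only.

```python
import numpy as np

sr = 16000
t = np.linspace(0, 2.0, 2 * sr, endpoint=False)
# Toy signal: 1 s of "speech" surrounded by silence.
y = np.where((t > 0.5) & (t < 1.5), np.sin(2 * np.pi * 150 * t), 0.0)

frame_len, hop = 400, 160                      # 25 ms frames, 10 ms hop
frames = [y[i:i + frame_len] for i in range(0, len(y) - frame_len, hop)]
energy = np.array([np.sum(f ** 2) for f in frames])

threshold = 0.1 * energy.max()                 # assumed silence threshold
speech_frames = energy > threshold
speech_duration = speech_frames.sum() * hop / sr
print(f"speech frames: {speech_frames.sum()}/{len(frames)}, "
      f"duration = {speech_duration:.2f} s")
```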
Review on Detection and Analysis of Emotion from Speech Signals
Recognizing emotion from speech has become one of the active research topics in speech processing and in applications based on human-computer interaction. The emotions considered in the experiments include happy, sad, fear, anger, boredom, and neutral. The discriminative capacity of emotional features in speech was examined first, followed by emotion classification performed on a custom dataset. The classification was performed with several classifiers. One of the primary feature qualities considered in the prepared dataset was the peak-to-peak distance obtained from the graphical representation of the speech signals. Emotion is defined as the positive or negative state of a person's mind that is associated with a pattern of physiological activity. Emotions describe a person's psychological state. MFCC-based parameters show the energy distribution in the frequency domain and also help in identifying the phonetic characteristics of speech. Feature extraction is performed using MFCC.
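A sketch of the MFCC extraction step mentioned above, using librosa as one common implementation (the review does not name a specific toolkit). The per-coefficient peak-to-peak statistic and the synthetic tone standing in for an utterance are illustrative assumptions.

```python
import numpy as np
import librosa

sr = 16000
t = np.linspace(0, 1.0, sr, endpoint=False)
y = np.sin(2 * np.pi * 200 * t).astype(np.float32)    # stand-in for an utterance

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)     # shape: (13, n_frames)
peak_to_peak = np.ptp(mfcc, axis=1)                    # per-coefficient range
feature_vector = np.concatenate([mfcc.mean(axis=1), peak_to_peak])
print(feature_vector.shape)                            # one 26-dim vector per utterance
```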
New approach in quantification of emotional intensity from the speech signal: emotional temperature
Expert Systems with Applications, 2015
Automatic speech emotion recognition has huge potential in applications in fields such as psychology, psychiatry, and affective computing. Spontaneous speech is continuous, and emotions are expressed at certain moments of the dialogue, giving rise to emotional turns. Therefore, real-time applications must be capable of detecting changes in the speaker's affective state. In this paper, we focus on recognizing activation from speech using a small feature set obtained from a temporal segmentation of the speech signal in different languages: German, English, and Polish. The feature set includes two prosodic features and four paralinguistic features related to the pitch and the spectral energy balance. This segmentation and feature set are suitable for real-time emotion applications because they allow detecting changes in the emotional state with very low processing times. The German corpus EMO-DB (Berlin Database of Emotional Speech), the English corpus LDC (Emotional Prosody Speech and Transcripts database), and the Polish Emotional Speech Database are used to train the Support Vector Machine (SVM) classifier and for gender-dependent activation recognition. The results are analyzed for each speech emotion and gender separately, with accuracies of 94.9%, 88.32%, and 90% obtained for the EMO-DB, LDC, and Polish databases, respectively. This new approach provides comparable performance with lower complexity than other approaches for real-time applications, thus making it an appealing alternative that may assist in the future development of automatic speech emotion recognition systems with continuous tracking.
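A toy sketch of the classification stage: a six-dimensional feature vector (two prosodic plus four spectral-balance features, as the abstract describes) fed to an SVM. The synthetic data stands in for EMO-DB-style features; the feature values, class separation, and kernel choice are assumptions, not the authors' extraction pipeline.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 200
scale = [20, 15, 0.1, 0.1, 0.1, 0.1]
# High-activation samples: higher pitch mean/range and spectral energy balance.
X_high = rng.normal([220, 80, 0.7, 0.6, 0.5, 0.4], scale, (n, 6))
X_low = rng.normal([140, 30, 0.4, 0.3, 0.2, 0.2], scale, (n, 6))
X = np.vstack([X_high, X_low])
y = np.array([1] * n + [0] * n)                        # 1 = high activation

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X_tr, y_tr)
print(f"toy activation accuracy: {clf.score(X_te, y_te):.2f}")
```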
Emotion Extraction from Human Voice Using Voice Frequency Spectrum Analysis
In this work, emotion recognition from speech data is studied. Today, applications and studies of speech focus on text and/or context, but this approach often misses the main thing that people want to say; for example, metaphors are frequently used to describe a situation with the opposite meaning. Emotion recognition from speech can be performed in three steps: filtering, analysis, and determination of the speaker's emotion. First, if the speech is not recorded in an isolated environment, the speech data is filtered to remove the noise produced by other speakers or environmental sounds in the conversation. Second, magnitude changes in the frequency spectrum of the speech signal are analyzed. Momentary speech has no associated text; therefore, the speech must be analyzed rapidly, and its emotional elements analyzed afterwards. According to the developed method, distinct magnitude fluctuations in a conversation indicate emotional changes. Third, the speaker's emotions are classified using common classification methods such as neural networks or decision trees. The first purpose of our work is to find the main message in people's speech using intonation. The second purpose is to let people and computers communicate naturally through speech.
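A small sketch of the "magnitude fluctuation" idea in the second step: compare magnitude spectra of successive frames and flag frames whose spectral change exceeds a threshold. The threshold rule and the toy amplitude-jump signal are illustrative assumptions, not the authors' exact criterion.

```python
import numpy as np

sr = 16000
t = np.linspace(0, 2.0, 2 * sr, endpoint=False)
# Toy signal whose amplitude jumps halfway through, mimicking an emotional change.
y = np.sin(2 * np.pi * 180 * t) * np.where(t < 1.0, 0.3, 1.0)

frame_len, hop = 512, 256
spectra = []
for i in range(0, len(y) - frame_len, hop):
    frame = y[i:i + frame_len] * np.hanning(frame_len)
    spectra.append(np.abs(np.fft.rfft(frame)))
spectra = np.array(spectra)

delta = np.linalg.norm(np.diff(spectra, axis=0), axis=1)   # frame-to-frame change
flagged = np.where(delta > delta.mean() + 3 * delta.std())[0]
print("frames with large spectral fluctuation:", flagged)
```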
Discriminating Emotions in the Valence Dimension from Speech Using Timbre Features
Applied Sciences, 2019
The most used and well-known acoustic features of a speech signal, the Mel frequency cepstral coefficients (MFCC), cannot characterize emotions in speech sufficiently when a classification is performed to classify both discrete emotions (i.e., anger, happiness, sadness, and neutral) and emotions in valence dimension (positive and negative). The main reason for this is that some of the discrete emotions, such as anger and happiness, share similar acoustic features in the arousal dimension (high and low) but are different in the valence dimension. Timbre is a sound quality that can discriminate between two sounds even with the same pitch and loudness. In this paper, we analyzed timbre acoustic features to improve the classification performance of discrete emotions as well as emotions in the valence dimension. Sequential forward selection (SFS) was used to find the most relevant acoustic features among timbre acoustic features. The experiments were carried out on the Berlin Emotional S...
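A hedged sketch of the sequential forward selection (SFS) step, using scikit-learn's SequentialFeatureSelector as one possible implementation (the abstract does not say which was used). The synthetic "timbre" features and the linear-SVM wrapper are placeholders for illustration.

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n, n_features = 300, 10                        # placeholder timbre features
y = rng.integers(0, 2, n)                      # 0 = negative valence, 1 = positive
X = rng.normal(size=(n, n_features))
X[:, 0] += y                                   # make a few features informative
X[:, 3] += 0.5 * y

sfs = SequentialFeatureSelector(SVC(kernel="linear"), n_features_to_select=3,
                                direction="forward", cv=3)
sfs.fit(X, y)
print("selected feature indices:", np.where(sfs.get_support())[0])
```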
Classification of emotions from speech signal
The article presents an analysis of the possibility of recognizing emotions from the speech signal in the Polish language. In order to perform the experiments, a database containing speech recordings with emotional content was created. On its basis, extraction of features from the speech signals was performed. The most important step was to determine which of the previously extracted features were the most suitable for distinguishing emotions and with what accuracy the emotions could be classified. Two feature selection methods, Sequential Forward Search (SFS) and t-statistics, were examined. Emotion classification was implemented using k-Nearest Neighbor (k-NN), Linear Discriminant Analysis (LDA), and Support Vector Machine (SVM) classifiers. Classification was carried out for pairs of emotions. The best results were obtained for classifying the neutral and fear emotions (91.9%) and the neutral and joy emotions (89.6%).
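A sketch of the pairwise setup with the t-statistic variant of feature selection: rank features by the t-statistic between two emotions, keep the top few, and classify with k-NN. The feature values are synthetic stand-ins for the Polish-corpus features, and the number of retained features is an assumption.

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
n, n_features = 120, 20
X_neutral = rng.normal(0.0, 1.0, (n, n_features))
X_fear = rng.normal(0.0, 1.0, (n, n_features))
X_fear[:, :4] += 1.5                           # a few discriminative features

X = np.vstack([X_neutral, X_fear])
y = np.array([0] * n + [1] * n)

t_stats, _ = ttest_ind(X[y == 0], X[y == 1], axis=0)
top = np.argsort(-np.abs(t_stats))[:5]         # keep 5 most discriminative features

knn = KNeighborsClassifier(n_neighbors=5)
acc = cross_val_score(knn, X[:, top], y, cv=5).mean()
print(f"neutral-vs-fear toy accuracy: {acc:.2f}")
```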
Acoustic Markers of Emotions Based on Voice Physiology
Acoustic models of emotions may benefit from considering the underlying voice production mechanism. This study sought to describe emotional expressions according to physiological variations measured from the inverse-filtered glottal waveform, in addition to standard parameter extraction. An acoustic analysis was performed on a subset of the /a/ vowels within the GEMEP database (10 speakers, 5 emotions). Of the 12 acoustic features computed, repeated-measures ANOVA showed significant main effects for 11 parameters. A subsequent principal components analysis revealed three components that explain acoustic variation due to emotion: "tension" (CQ, H1-H2, MFDR, LTAS), "perturbation" (jitter, shimmer, HNR), and "voicing" (fundamental frequency).
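An illustrative sketch of the PCA step: standardize the acoustic measures and inspect which features load most strongly on each component. The feature names follow the abstract, but the data are random placeholders, so the output will not reproduce the paper's "tension"/"perturbation"/"voicing" components.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

features = ["CQ", "H1-H2", "MFDR", "LTAS", "jitter", "shimmer", "HNR", "F0"]
rng = np.random.default_rng(0)
X = rng.normal(size=(50, len(features)))       # e.g. 10 speakers x 5 emotions

pca = PCA(n_components=3)
pca.fit(StandardScaler().fit_transform(X))
for i, component in enumerate(pca.components_):
    top = np.argsort(-np.abs(component))[:3]   # strongest loadings per component
    print(f"PC{i + 1}: " + ", ".join(features[j] for j in top))
```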