ramu reddy - Academia.edu
Papers by ramu reddy
Forensic Speaker Recognition, 2011
ABSTRACT In this chapter, we propose speaker-specific prosodic features for improving the performance of speaker recognition in noisy environments. This approach can be especially useful in the forensic analysis of speech. Degradation in speaker recognition performance is commonly observed due to transmission and channel impairments, microphone variability and background noise. In this work, spectral features are used to perform speaker recognition in the first stage, and dynamic aspects of speaker-specific prosody are used to improve the performance in the second stage. For this task, a speech corpus was collected at the Indian Institute of Technology, Kharagpur, from 50 speakers recorded over mobile phones. Background noise is simulated using additive white random noise from the Noisex database. Speech enhancement techniques are used to improve speaker recognition performance on noisy speech. Gaussian mixture models (GMMs) and support vector machines (SVMs) are used for developing speaker models. The performance of the speaker recognition system is observed to be 55% and 66% using prosodic and spectral features, respectively, for TIMIT speech at 15 dB SNR. A speaker recognition performance of around 73% is achieved using the combination of spectral and prosodic features for noisy speech after speech enhancement.
International Conference on Signal Processing and Communications, 2010
... Shashidhar G. Koolagudi, Ramu Reddy and K. Sreenivasa Rao, School of Information Technology, Indian Institute of Technology Kharagpur, Kharagpur - 721302, West Bengal ... In the study we have considered six emotions, namely anger, disgust, fear, happy, neutral and sadness ...
International Conference on Devices and Communications, 2011
In this paper, a simulated emotion Hindi speech corpus is introduced for analyzing the emotions present in speech signals. The proposed database was recorded using professional artists from Gyanavani FM radio station, Varanasi, India. The speech corpus was collected by simulating eight different emotions using neutral (emotion-free) text prompts. The emotions present in the database are anger, disgust, fear, happy, neutral, sad, sarcastic and surprise. The corpus is named the Indian Institute of Technology Kharagpur Simulated Emotion Hindi Speech Corpus (IITKGP-SEHSC). Emotion classification is performed on IITKGP-SEHSC using prosodic and spectral features. Mel frequency cepstral coefficients (MFCCs) are used to represent spectral information. Energy, pitch and duration are used to represent prosodic information. The average emotion recognition performance using prosodic and spectral features is found to be around 77% and 81%, respectively, for female speech utterances. This paper describes the design, acquisition, post-processing and evaluation of the proposed speech corpus. The quality of the emotions expressed in the database is evaluated using subjective listening tests. The emotion recognition performance in the subjective listening tests is observed to be around 74%. The results of the subjective listening tests are broadly on par with the results obtained using prosodic analysis of the database.
Dedicated to Professor Barry M. Trost on the occasion of his 70th birthday.