Preeti Rao | IIT Bombay
Papers by Preeti Rao
Signal Processing, 2000
The objective of this paper is to critically evaluate the performance of a nonstationary analysis method in tracking speech formant frequencies as they change with time due to the natural variations in the vocal-tract system during speech production. The method of instantaneous frequency estimation is applied to the tracking of speech formant frequencies to observe the time variations in the vocal-tract system characteristics within a pitch period. An implementation of an instantaneous frequency estimator based on the source-filter model of speech production is described for voiced speech formants. Based on experimental results from simulated as well as natural speech data, it is shown that the accuracy of the frequency estimates is heavily dependent on the nature of the glottal excitation waveform, the fundamental frequency and the frequency spacing of the formants in the speech signal. The effect of the choice of various analysis parameters on the accuracy of the estimates is discussed. It is shown that the instantaneous frequency estimate accurately represents the actual formant frequency only when the formants are well separated and there are distinct regions of the glottal cycle in which the source excitation can be considered negligible. Experimental results on natural speech vowels, which show differences in formant frequencies in the different phases of the glottal cycle, are presented.
This paper explores the extraction of melodic pitch contour from the polyphonic soundtrack of a song. The motivation for this work lies in developing automatic tools for the melody-based indexing of the database in a music retrieval system. The melody is assumed to be carried by the singer's voice accompanied mainly by percussive instruments. This scenario is typical of a large class of Indian movie songs. The challenges raised by this application are presented. A pitch detection method based on a perceptual model is shown to be a promising approach to the tracking of voice pitch in the presence of strong percussive background.
The spectrum interpolation synthesis model has recently been applied in the high-quality synthesis of harmonic musical sounds. In this work we investigate the performance of the model in the compression of music signals. Efficient methods for the automatic analysis, parameter extraction and synthesis of musical signals are presented. The system is tested on several examples of segments from wind and bowed string instruments. It is found that typically a perceived quality matching the original is obtained even when large portions of the waveform are generated by interpolation, implying that a high degree of compression is possible. Further, there is a graceful degradation in quality as the extent of interpolation is increased, which makes the model well suited for use in a scalable audio coding framework.
The extraction of pitch (or fundamental frequency) information from polyphonic audio signals remains a challenging problem. The specific case of detecting the pitch of a melodic instrument playing in a percussive background is presented. Time-domain pitch detection algorithms based on a temporal autocorrelation model, including the Meddis-Hewitt algorithm, are considered. The temporal and spectral characteristics of percussive interference degrade the performance of the pitch detection algorithms to various extents. From an experimental study of the pitch estimation errors obtained on a set of synthetic musical signals, the effectiveness of the auditory-perception-based modules of the Meddis-Hewitt pitch detection algorithm in improving the robustness of fundamental frequency tracking in the presence of percussive interference is discussed.
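To make the temporal autocorrelation idea concrete, here is a minimal sketch of a plain autocorrelation pitch estimator for a single frame. It is not the Meddis-Hewitt algorithm (which adds auditory filterbank and hair-cell stages before the autocorrelation) and it is not taken from the paper; the function name `autocorr_pitch`, the search range `fmin`/`fmax`, and the test tone are all assumptions chosen for illustration.

```python
import math

def autocorr_pitch(frame, fs, fmin=80.0, fmax=800.0):
    """Estimate pitch (Hz) as fs / lag, where lag maximizes the autocorrelation.

    Hypothetical helper: the search range [fmin, fmax] is an assumed default.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [v - mean for v in frame]           # remove DC before correlating
    lag_min = max(1, int(fs / fmax))        # shortest period considered
    lag_max = min(int(fs / fmin), n - 1)    # longest period considered
    best_lag, best_val = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized autocorrelation at this lag
        val = sum(x[i] * x[i + lag] for i in range(n - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return fs / best_lag

# Synthetic harmonic tone: 200 Hz fundamental plus a weaker second harmonic
fs = 8000
tone = [math.sin(2 * math.pi * 200 * k / fs)
        + 0.5 * math.sin(2 * math.pi * 400 * k / fs)
        for k in range(1024)]
f0 = autocorr_pitch(tone, fs)
print(f0)
```

On a clean harmonic tone like this the estimate should land at or very near the 200 Hz fundamental. A percussive burst added to the frame would introduce broadband energy into the autocorrelation, which is the kind of degradation the abstract describes.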
Journal of The Acoustical Society of America, 1991
Journal of The Acoustical Society of America, 2001
Both in speech synthesis and in sound coding it is often beneficial to have a measure that predicts whether, and to what extent, two sounds are different. This paper addresses the problem of estimating the perceptual effects of small modifications to the spectral envelope of a harmonic sound. A recently proposed auditory model is investigated that transforms the physical spectrum into a pattern of specific loudness as a function of critical band rate. A distance measure based on the concept of partial loudness is presented, which treats detectability in terms of a partial loudness threshold. This approach is adapted to the problem of estimating discrimination thresholds related to modifications of the spectral envelope of synthetic vowels. Data obtained from subjective listening tests using a representative set of stimuli in a 3IFC adaptive procedure show that the model makes reasonably good predictions of the discrimination threshold. Systematic deviations from the predicted thresholds may be related to individual differences in auditory filter selectivity. The partial loudness measure is compared with previously proposed distance measures, such as the Euclidean distance between excitation patterns and between specific loudness, applied to the same experimental data. An objective test measure shows that the partial loudness measure and the Euclidean distance of the excitation patterns are equally appropriate as distance measures for predicting audibility thresholds. The Euclidean distance between specific loudness performs worse than the other two.
... In this paper, we explore the feasibility of building a melody retrieval system for Hindi film songs (an undeniably significant segment of the Indian audio entertainment industry, which also provides the experience needed to develop a more general system in future). ...
Speech Communication, 2006
Traditional short-time spectral attenuation (STSA) speech enhancement algorithms are ineffective in the presence of highly nonstationary noise due to difficulties in the accurate estimation of the local noise spectrum. With a view to improving the speech quality in the presence of random noise bursts, characteristic of many environmental sounds, a simple post-processing scheme is proposed that can be applied to the output of an STSA speech enhancement algorithm. The post-processing algorithm uses spectral properties of the noise to detect noisy time-frequency regions, which are then attenuated using an SNR-based rule. A suitable suppression rule is developed that is applied to the detected noisy regions so as to achieve significant reduction of noise with minimal speech distortion. The post-processing method is evaluated in the context of two well-known STSA speech enhancement algorithms, and experimental results demonstrating improved speech quality are presented for a data set of real noise samples.
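The abstract does not state the suppression rule itself, so the following is only a sketch of the general pattern it describes: attenuate the time-frequency bins flagged as noisy with an SNR-dependent gain and leave the others alone. A generic Wiener-type gain is swapped in as a stand-in for the paper's rule; the names `wiener_gain` and `attenuate_flagged`, the gain floor of 0.1, and the sample values are assumptions.

```python
def wiener_gain(noisy_power, noise_power, floor=0.1):
    """Wiener-type amplitude gain from an SNR estimate, limited below by `floor`.

    Assumed stand-in rule, not the suppression rule developed in the paper.
    """
    # Power-subtraction SNR estimate, clamped at zero
    snr = max(noisy_power / noise_power - 1.0, 0.0)
    return max(snr / (1.0 + snr), floor)

def attenuate_flagged(noisy_power, noise_power, flagged, floor=0.1):
    """Scale only the bins flagged as noisy; pass the rest through unchanged."""
    return [
        # The gain acts on amplitude, so power is scaled by gain squared
        p * wiener_gain(p, n, floor) ** 2 if is_noisy else p
        for p, n, is_noisy in zip(noisy_power, noise_power, flagged)
    ]

# One frame of per-bin powers, with a hypothetical detector output in `flags`
frame = [10.0, 1.0, 4.0]
noise = [1.0, 1.0, 1.0]
flags = [True, True, False]
cleaned = attenuate_flagged(frame, noise, flags)
print(cleaned)
```

The floor keeps a flagged bin from being zeroed outright, a common way to trade residual noise against speech distortion; how the noisy regions are detected is left outside this sketch.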
Music information retrieval is a field of rapidly growing commercial interest. This paper describes TANSEN, a query-by-humming based music retrieval system under development at IIT, Bombay. Named after the legendary musician (and a tenuous acronym for "TA-Note Song Extractor-Navigator"), the system is designed to accept acoustic queries in the form of sung fragments, to search a database of Indian film songs. Algorithms for the extraction of melody from the query signal, and pattern matching for search and retrieval from the database are presented. The user interface is described, and experimental results obtained on a prototype version are reported.