NAVNEET UPADHYAY - Academia.edu
Papers by NAVNEET UPADHYAY
2013 National Conference on Communications (NCC), 2013
A spectral subtraction technique is presented for real-time speech enhancement in the aids used by hearing-impaired listeners. To reduce computational complexity and memory requirement, it uses a cascaded-median based estimation of the noise spectrum without voice activity detection. The technique is implemented and tested for satisfactory real-time operation, with a sampling frequency of 12 kHz, processing using a window length of 30 ms with 50% overlap, and noise estimation by a 3-frame 4-stage cascaded median, on a 16-bit fixed-point DSP processor with on-chip FFT hardware. Enhancement of speech with different types of additive stationary and non-stationary noise resulted in an SNR advantage of 4-13 dB.
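The cascaded-median idea in the abstract above can be illustrated with a minimal sketch. The abstract does not give the implementation, so the class name `CascadedMedian`, the buffering scheme, and the per-bin interface are assumptions made for illustration only. Each stage collects 3 values, passes their median to the next stage, and the output of the fourth stage serves as the noise estimate for one frequency bin.

```python
from statistics import median


class CascadedMedian:
    """Approximate a long-window median with a cascade of short medians.

    Hypothetical helper: tracks one frequency bin of the magnitude spectrum.
    """

    def __init__(self, frames_per_stage=3, stages=4):
        self.frames_per_stage = frames_per_stage
        self.buffers = [[] for _ in range(stages)]
        self.estimate = None  # stays None until the last stage has filled once

    def update(self, magnitude):
        """Push one frame's magnitude and return the current noise estimate."""
        carry = magnitude
        for buffer in self.buffers:
            buffer.append(carry)
            if len(buffer) < self.frames_per_stage:
                return self.estimate  # stage not full, nothing propagates
            carry = median(buffer)    # stage full: pass its median upward
            buffer.clear()
        self.estimate = carry         # the final stage produced a new estimate
        return self.estimate


tracker = CascadedMedian()
# Two noise-only frames followed by one louder frame, repeated 27 times.
outputs = [tracker.update(value) for value in [1.0, 1.0, 10.0] * 27]
```

With these settings a new estimate appears every 3^4 = 81 frames, while at most 3 x 4 = 12 values per bin are held in memory at any time, which is the kind of saving the abstract alludes to.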
The aim of this paper is to present an esoteric model of a nuclear spectrometer, e.g. a gamma ray spectrometer, as a communication channel [15]. The source entropy, receiver entropy, and joint entropy of a gamma ray spectrometer were estimated for an observed 1K gamma spectrum containing a 662 keV peak from a Cs source. The information loss estimated for the observed gamma spectrum was of the order of 94.5%. In a typical communication engineering channel, the information loss is of the order of 30% [6]. The information loss in a gamma spectrometer is thus far greater than that in a communication channel. Hence information extraction in a nuclear spectrometer is extremely challenging vis-a-vis a communication channel. This also explains the high redundancy of spectral channels, and justifies that the a priori information required is much more than the a posteriori information extracted in nuclear spectrometers. Also the model amply justifies wide ranging results for the IAEA intercomparison of spectr...
This paper proposes an improved multi-band spectral subtraction algorithm with the goal of improving the quality of the speech signal in various noise environments. In the proposed enhancement algorithm, the whole speech spectrum is divided into uniformly spaced contiguous frequency bands and spectral over-subtraction is performed independently in each band. The proposed algorithm uses a novel approach to estimate the noise from each band continuously, without using speech pause detection. The noise is estimated and updated by adaptively smoothing the noisy signal power in each uniformly spaced frequency band. The smoothing parameter is controlled by a linear function of the a-posteriori signal-to-noise ratio (SNR). The experiments are conducted for various types of noise. The results of the proposed enhancement algorithm are compared with the reference multi-band spectral subtraction algorithm. To test the performance of the proposed speech enhancement algorithm, objective quality ...
Applied Speech Processing
International Journal of Speech Technology
The performance of a speech recognition system degrades significantly in real-world environments because of the acoustic mismatch between the training and operating conditions. This paper presents a two-stage approach to make a speech recognition system immune to additive and uncorrelated background noise, i.e. robust. In the first stage, an oversampled wavelet packet transform (WPT) decomposes the entire noisy input speech into seventeen nonlinear frequency subbands, similar to the Bark scale of the human hearing system, and spectral subtraction based on adaptive noise estimation filters the noisy speech in each subband signal. The oversampled WPT is linear and is advantageous because it overcomes the shift-invariance problem by removing the decimation after the filtering at each decomposition level. In the second stage, a nonparametric approach is used for feature extraction from the filtered speech, and the parameters from the feature extraction stage are compared with the parameters extracted from speech signals stored in a template to recognize the utterance. A series of experiments is carried out to evaluate the performance of the proposed two-stage system in a variety of real environments, with and without the use of the first stage. Recognition accuracy is evaluated at the word level over a wide range of SNRs for various types of noisy environments. The experimental results show significant improvement in recognition performance at low SNR using the proposed system.
National Academy Science Letters
The performance level of a speech recognizer drops significantly when there is an acoustic mismatch between training and operational environments. A speech recognizer is called robust if it preserves good recognition accuracy even in mismatched conditions. The present study addresses the recognition of English speech in noisy environments and presents a comparative study of various frequency scales used in parameterization, based on the average recognition rate. For robust automatic speech recognition, a front-end signal enhancement component, the spectral subtraction algorithm, is used to prefilter the noisy input speech before it is fed to the recognizer. A number of frequency-warped scales, namely the perceptual scales (Mel scale, Bark scale, and equivalent rectangular bandwidth rate scale) and a non-perceptual scale called the uniform scale, are used in the parameterization for feature extraction from the enhanced speech. A suite of experiments is carried out to evaluate the performance of the speech recognizer, with and without the use of the front-end signal enhancement component, in a variety of noisy environments. Recognition accuracy is tested at the word linguistic level over a wide range of signal-to-noise ratios for both stationary and non-stationary noises.
International Journal of Speech Technology, 2016
Speech recognizers achieve high recognition accuracy in quiet acoustic environments, but their performance degrades drastically when they are deployed in real environments, where the speech is degraded by additive ambient noise. This paper advocates a two-phase approach for robust speech recognition in such environments. First, a front-end subband speech enhancement with an adaptive noise estimation (ANE) approach is used to filter the noisy speech. The whole noisy speech spectrum is partitioned into eighteen dissimilar subbands based on the Bark scale, and the noise power in each subband is estimated by the ANE approach, which does not require speech pause detection. Second, the filtered speech spectrum is processed by a nonparametric frequency-domain algorithm based on human perception, along with a back end that builds a robust classifier to recognize the utterance. A suite of experiments is conducted to evaluate the performance of the speech recognizer in a variety of real environments, with and without the use of the front-end speech enhancement stage. Recognition accuracy is evaluated at the word level over a wide range of signal-to-noise ratios for real-world noises. Experimental evaluations show that the proposed algorithm attains good recognition performance when the signal-to-noise ratio is lower than 5 dB.
Procedia Computer Science, 2016
This paper discusses the problem of single-channel speech enhancement in stationary environments, and proposes Wiener filtering with a recursive noise estimation algorithm. The Wiener filter is a linear estimator that minimizes the mean-squared error between the original and enhanced speech. The algorithm is implemented in the frequency domain, and the filter transfer function is adapted from sample to sample based on the speech signal statistics: the local mean and the local variance. For the noise estimation, the recursive noise estimation approach is used. In this approach, the noise is estimated from past and present spectral power values, using a smoothing parameter. The value of the smoothing parameter is selected in the range [0, 1]. For the performance evaluation of the proposed speech enhancement algorithm, objective evaluations together with informal listening tests are conducted on speech sentences, spoken by male and female speakers from the NOIZEUS corpus, degraded by white as well as pink noise at different SNR levels. For objective measures, the signal-to-noise ratio, segmental signal-to-noise ratio, and perceptual evaluation of speech quality are used. The measures show that the speech enhanced by the proposed algorithm is more pleasant to the human ear for both noise conditions in comparison to the conventional speech enhancement method.
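The recursive noise estimate described above, built from past and present spectral power values with a smoothing parameter in [0, 1], can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the default `alpha`, the gain floor, and the simple power-domain Wiener-type gain are all assumptions, and the local mean and variance statistics mentioned in the abstract are not modelled here.

```python
def update_noise(prev_noise, noisy_power, alpha=0.9):
    """First-order recursive average of the noise power, one value per bin.

    alpha close to 1 keeps the past estimate; alpha close to 0 follows
    the present frame. (Hypothetical default value.)
    """
    return [alpha * n + (1.0 - alpha) * p
            for n, p in zip(prev_noise, noisy_power)]


def wiener_gain(noisy_power, noise_power, floor=1e-3):
    """Wiener-type gain S / (S + N), with S estimated as max(P - N, 0)."""
    gains = []
    for p, n in zip(noisy_power, noise_power):
        speech = max(p - n, 0.0)              # estimated clean-speech power
        total = speech + n
        gain = speech / total if total > 0 else 0.0
        gains.append(max(gain, floor))        # floor avoids zeroing a bin
    return gains


noise = update_noise([1.0, 1.0], [3.0, 1.0], alpha=0.5)
gains = wiener_gain([4.0, 1.0], [1.0, 2.0])
```

In this sketch the first bin, whose power is well above the noise estimate, is passed with a gain of 0.75, while the second bin, which lies below the noise estimate, is attenuated down to the floor.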
2014 Fifth International Symposium on Electronic System Design, 2014
This paper discusses the problem of single-channel speech enhancement in various noise environments and presents an improved multi-band speech enhancement method that uses the masking properties of the human hearing system to address the additive noise and the remnant noise simultaneously. The improved multi-band spectral subtraction (I-MBSS) is used for enhancing speech degraded by real-world noises. The I-MBSS uses an adaptive noise estimation approach to estimate the noise from each band without complicated speech silence detection. The noise is estimated and updated by adaptively smoothing the noisy signal power in each band. The noise estimation technique uses a smoothing parameter which is controlled by a linear function of the a-posteriori SNR. Subsequently, the human hearing model is applied to the enhanced speech to compute the noise masking threshold, and the subtraction parameters are adjusted according to human perception. The method is tested on speech signals with different noise types at different levels, and the results are compared to the classical multi-band spectral subtraction algorithm. Speech enhancement performance is evaluated using output SNR and the study of spectrograms, as well as informal listening tests with several types of real-world noises. Based on the analyzed speech signals, the proposed enhancement scheme performs better than the classical multi-band spectral subtraction.
2012 4th International Conference on Intelligent Human Computer Interaction (IHCI), 2012
In this paper, an auditory perception based improved multi-band spectral subtraction algorithm is proposed to enhance the speech signal degraded by non-stationary or colored noises. In the proposed scheme, the whole speech spectrum is divided into non-uniform bands (N = 6) in accordance with the critical-band rate scale, and spectral subtraction is applied separately in each band. The proposed algorithm uses a new approach to estimate the noise power from each band without the need for explicit speech silence detection. The noise estimate is updated by adaptively smoothing the noisy signal power. The smoothing parameter is controlled by a linear function of the a-posteriori signal-to-noise ratio (SNR). This noise estimation approach gives accurate results at low SNR and works continuously in the presence of speech. The objective measures as well as informal subjective tests demonstrate that the proposed algorithm reduces remnant noise efficiently and that the enhanced speech contains minimal speech distortion with improved SNR.
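The adaptive noise estimate described above, in which the smoothing parameter is a linear function of the a-posteriori SNR, might look like the following sketch. The abstract does not state the coefficients of the linear function, so `slope`, `alpha_min`, and `alpha_max` are hypothetical values chosen only so that the estimate updates quickly when a band looks like noise and almost freezes when speech dominates.

```python
def smoothing_parameter(post_snr, slope=0.25, alpha_min=0.7, alpha_max=0.998):
    """Linear map from a-posteriori SNR to a smoothing parameter, clipped.

    All coefficients are illustrative assumptions, not values from the paper.
    """
    alpha = alpha_min + slope * (post_snr - 1.0)
    return min(max(alpha, alpha_min), alpha_max)


def adaptive_noise_update(noise_est, noisy_power):
    """Update the per-band noise power without any speech pause detection."""
    updated = []
    for n, p in zip(noise_est, noisy_power):
        # A-posteriori SNR: noisy power over the previous noise estimate.
        post_snr = p / n if n > 0 else float("inf")
        alpha = smoothing_parameter(post_snr)
        updated.append(alpha * n + (1.0 - alpha) * p)
    return updated


quiet = adaptive_noise_update([1.0], [1.0])     # band looks like noise
burst = adaptive_noise_update([1.0], [100.0])   # band dominated by speech
```

Because the parameter saturates near 1 at high a-posteriori SNR, a 100-fold burst in band power moves the noise estimate by only about 20% in this example, which is how the estimator can keep running during speech.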
International Journal of Speech Technology, 2013
In this paper, we propose a speech enhancement method in which the front-end decomposition of the input speech is performed by temporal processing using a filterbank. The proposed method incorporates a perceptually motivated stationary wavelet packet filterbank (PM-SWPFB) and an improved spectral over-subtraction (I-SOS) algorithm for the enhancement of speech in various noise environments. The stationary wavelet packet transform (SWPT) is a shift-invariant transform. The PM-SWPFB is obtained by selecting the stationary wavelet packet tree in such a manner that it closely matches the non-linear resolution of the critical band structure of the psychoacoustic model. After the decomposition of the input speech, the I-SOS algorithm is applied separately in each subband for the estimation of speech. The I-SOS uses a continuous noise estimation approach and estimates the noise power in each subband without the need for explicit speech silence detection. The subband noise power is estimated and updated by adaptively smoothing the noisy signal power. The smoothing parameter in each subband is controlled by a function of the estimated signal-to-noise ratio (SNR). The performance of the proposed speech enhancement method is tested on speech signals degraded by various real-world noises. Using objective speech quality measures (SNR, segmental SNR (SegSNR), and perceptual evaluation of speech quality (PESQ) score) and spectrograms with informal listening tests, we show that the proposed speech enhancement method outperforms the spectral subtractive-type algorithms and improves the quality and intelligibility of the enhanced speech.
International Journal of Image, Graphics and Signal Processing, 2013
The spectral subtraction method is a classical approach for enhancement of speech degraded by additive background noise. The basic principle of this method is to estimate the short-time spectral magnitude of speech by subtracting the estimated noise spectrum from the noisy speech spectrum. This can also be achieved by multiplying the noisy speech spectrum by a gain function and later combining it with the phase of the noisy speech. Besides reducing the background noise, this method introduces an annoying, perceptible tonal artifact in the enhanced speech, known as remnant musical noise, which affects listening. Several variations and implementations of this method have been adopted in past decades to address the limitations of the spectral subtraction method. These variations constitute a family of subtractive-type algorithms and operate in the frequency domain. The objective of this paper is to provide an extensive overview of spectral subtractive-type algorithms for enhancement of noisy speech. After the review, the paper concludes by mentioning a future direction of speech enhancement research from the spectral subtraction perspective.
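The two equivalent views in the abstract above, subtracting a noise magnitude and multiplying by a gain function, can be checked on a single frame with a minimal sketch. The function names and the zero floor are illustrative assumptions; a practical system would add framing, windowing, an FFT, and overlap-add around this step.

```python
import cmath


def spectral_subtract(noisy_spectrum, noise_magnitude, floor=0.0):
    """Subtract the noise magnitude per bin and reuse the noisy phase."""
    enhanced = []
    for y, n in zip(noisy_spectrum, noise_magnitude):
        magnitude = max(abs(y) - n, floor)   # half-wave rectified subtraction
        enhanced.append(cmath.rect(magnitude, cmath.phase(y)))
    return enhanced


def subtraction_gain(noisy_spectrum, noise_magnitude):
    """Equivalent gain form: G = max(1 - N / |Y|, 0), applied as G * Y."""
    gains = []
    for y, n in zip(noisy_spectrum, noise_magnitude):
        gains.append(max(1.0 - n / abs(y), 0.0) if abs(y) > 0 else 0.0)
    return gains


frame = [3 + 4j, 1j]      # two complex bins of one noisy frame
noise = [2.0, 2.0]        # estimated noise magnitude per bin
enhanced = spectral_subtract(frame, noise)
gains = subtraction_gain(frame, noise)
```

For the first bin, the magnitude drops from 5 to 3 while the phase is unchanged, and multiplying the noisy bin by the gain 0.6 gives the same result. Bins that are isolated survivors of the rectification from frame to frame are what is perceived as musical noise.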
Journal of Signal and Information Processing, 2013
This paper addresses the problem of single-channel speech enhancement in adverse environments. An improved multi-band spectral subtraction based on the critical-band rate scale is investigated in this study for enhancement of single-channel speech. In this work, the whole speech spectrum is divided into non-uniformly spaced frequency bands in accordance with the critical-band rate scale of the psycho-acoustic model, and spectral over-subtraction is carried out separately in each band. In addition, the noise in each band is estimated with an adaptive noise estimation approach that does not require explicit speech silence detection. The noise is estimated and updated by adaptively smoothing the noisy signal power in each band. The smoothing parameter is controlled by the a-posteriori signal-to-noise ratio (SNR). For the performance analysis of the proposed algorithm, objective measures such as SNR, segmental SNR, and perceptual evaluation of speech quality are computed for a variety of noises at different SNR levels. The speech spectrograms and objective evaluations of the proposed algorithm are compared with those of other standard speech enhancement algorithms, and show that the musical structure of the remnant noise and the background noise are better suppressed by the proposed algorithm.
Journal of Signal and Information Processing, 2013
This paper proposes a multi-band speech enhancement algorithm exploiting iterative processing for enhancement of single-channel speech. In the proposed algorithm, the output of the multi-band spectral subtraction (MBSS) algorithm is used again as the input signal for the next iteration. Because the additive noise is transformed into remnant noise after the first MBSS processing step, the remnant noise needs to be re-estimated. The proposed algorithm reduces the remnant musical noise further by feeding the enhanced output signal back to the input and performing the operation repeatedly. The newly estimated remnant noise is then used in the next MBSS step. This procedure is iterated a small number of times. The proposed algorithm estimates the noise in each iteration, and spectral over-subtraction is executed independently in each band. The performance of the proposed enhancement algorithm is evaluated for various types of noise at different SNR levels using 1) objective quality measures: signal-to-noise ratio (SNR), segmental SNR, and perceptual evaluation of speech quality (PESQ); and 2) a subjective quality measure: mean opinion score (MOS). The results of the proposed enhancement algorithm are compared with the popular MBSS algorithm. Experimental results as well as the objective and subjective quality measurement test results confirm that the enhanced speech obtained from the proposed algorithm is more pleasant to listeners than speech enhanced by the classical MBSS algorithm.
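The feedback loop described above can be sketched on a small matrix of power spectra (frames by bins). This is a simplified illustration under stated assumptions: a single full-band over-subtraction stands in for MBSS, the noise is re-estimated from the first `lead` frames, which are assumed to be speech-free, and the values of `over`, `floor`, and `iterations` are hypothetical.

```python
def estimate_noise(frames, lead=2):
    """Average the first `lead` frames, assumed here to contain no speech."""
    bins = len(frames[0])
    return [sum(frame[k] for frame in frames[:lead]) / lead
            for k in range(bins)]


def subtract(frames, noise, over=2.0, floor=0.01):
    """Power over-subtraction with a spectral floor relative to the input."""
    return [[max(p - over * n, floor * p) for p, n in zip(frame, noise)]
            for frame in frames]


def iterative_subtraction(frames, iterations=3):
    """Feed the enhanced output back as the input a small number of times."""
    current = frames
    for _ in range(iterations):
        noise = estimate_noise(current)      # re-estimate the remnant noise
        current = subtract(current, noise)   # next subtraction pass
    return current


# Two noise-only frames, then one frame with speech energy in bin 0.
power = [[1.0, 1.0], [1.0, 1.0], [10.0, 1.0]]
result = iterative_subtraction(power)
```

In this toy input the noise-only bins shrink by a factor of 100 on every pass, while the speech-dominated bin loses most of its noise contribution in the first pass and changes very little afterwards.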
2012 2nd International Conference on Power, Control and Embedded Systems, 2012
The spectral subtraction method is a conventional approach for single-channel speech enhancement. The basic principle of this method is to estimate the short-time spectral magnitude of speech by subtracting the estimated noise from the noisy speech spectrum and to combine it with the phase of the noisy speech. Besides reducing the noise, this method generates an unnatural and unpleasant noise, called remnant noise. This paper proposes a novel algorithm to reduce the remnant noise and thus improve the overall quality of the enhanced speech. In this algorithm, the output of the multi-band spectral subtraction (MBSS) method is used again as the input signal for the next iteration. After the MBSS method, the additive noise is changed to remnant noise. The remnant noise is re-estimated at each iteration. The newly estimated noise is then used in the next MBSS step. This procedure is iterated a small number of times. The simulation results as well as informal subjective evaluations show that the speech enhanced by the proposed algorithm is more pleasant to listeners than that of the conventional MBSS algorithm. This indicates that the proposed algorithm reduces remnant noise satisfactorily and produces good speech quality with improved signal-to-noise ratio.
2012 Third International Conference on Computer and Communication Technology, 2012
The spectral subtraction method is a classical approach for enhancement of degraded speech. The basic principle of this technique is to estimate the short-time spectral magnitude of speech by subtracting the estimated noise from the noisy speech spectrum and to combine it with the phase of the noisy speech. Besides reducing the noise, this method generates an unnatural and unpleasant noise, called remnant noise. The other drawback of this method is that it works only for white Gaussian noise, which has a flat spectrum and is distributed uniformly over the frequency spectrum. Real-world noise, however, is mostly colored and has a non-uniform spectrum. To handle this kind of noise, the spectral subtraction algorithm has been extended to a multi-band case with uniformly spaced frequency bands. In this paper, a perceptually motivated multi-band spectral subtraction algorithm is proposed to enhance the speech signal degraded by colored noise. In the proposed scheme, the whole speech spectrum is divided into non-uniform bands (N = 6) in accordance with the critical-band rate scale, and spectral subtraction is executed independently in each band. The simulation results as well as informal subjective evaluations show that the proposed algorithm reduces remnant noise efficiently and that the enhanced speech contains minimal speech distortion with improved signal-to-noise ratio.
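A multi-band subtraction of the kind described above can be sketched as follows. The abstract gives neither the band edges nor the subtraction rule, so everything concrete here is an assumption: `band_edges` is a hypothetical list of bin indices (it may be non-uniform), and the over-subtraction factor follows a piecewise-linear function of the band SNR with illustrative breakpoints.

```python
import math


def over_subtraction_factor(snr_db):
    """Piecewise-linear factor: subtract more in bands with low SNR.

    The breakpoints (-5 dB, 20 dB) and values are illustrative assumptions.
    """
    if snr_db < -5.0:
        return 4.75
    if snr_db > 20.0:
        return 1.0
    return 4.0 - 0.15 * snr_db


def multiband_subtract(noisy_power, noise_power, band_edges, floor=0.002):
    """Apply power over-subtraction independently in each band."""
    enhanced = list(noisy_power)
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        signal = sum(noisy_power[lo:hi])
        noise = sum(noise_power[lo:hi])
        if signal > 0 and noise > 0:
            snr_db = 10.0 * math.log10(signal / noise)
        else:
            snr_db = 20.0  # degenerate band: fall back to plain subtraction
        factor = over_subtraction_factor(snr_db)
        for k in range(lo, hi):
            enhanced[k] = max(noisy_power[k] - factor * noise_power[k],
                              floor * noisy_power[k])
    return enhanced


noisy = [100.0, 100.0, 1.0, 1.0]
noise = [1.0, 1.0, 1.0, 1.0]
enhanced = multiband_subtract(noisy, noise, band_edges=[0, 2, 4])
```

Here the first band has a 20 dB SNR and is barely touched, while the second band sits at 0 dB, is over-subtracted by a factor of 4, and falls to the spectral floor. Treating bands separately in this way is what lets the method cope with colored noise whose level differs across frequency.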
2012 1st International Conference on Recent Advances in Information Technology (RAIT), 2012
Spectral subtraction is a traditional approach for enhancing the quality of speech degraded by environmental noise. This algorithm is based on subtracting the estimated noise spectrum from the noisy speech spectrum and combining the result with the phase of the noisy speech. Besides suppressing the noise, this method introduces an unnatural and unpleasant remnant noise. Several variants of
Proceedings of the 1st International Conference on Wireless Technologies for Humanitarian Relief - ACWR '11, 2011
This paper presents a Kalman filter (KF) based channel estimation algorithm for orthogonal frequency division multiplexing (OFDM) systems. The cyclic prefix (CP) portion of the OFDM symbols is used for extracting the channel state information. The KF algorithm computes a channel estimate based on the information contained in the cyclic prefix. This channel estimation algorithm is compared with the classical
International Journal of …, 2011
The spectral subtraction method is a classical approach for enhancement of speech degraded by additive background noise. The basic principle of this method is to estimate the short-time spectral magnitude of speech by subtracting the estimated noise spectrum from the noisy speech spectrum. This can also be achieved by multiplying the noisy speech spectrum by a gain function and later combining it with the phase of the noisy speech. Besides reducing the background noise, this method introduces an annoying, perceptible tonal artifact in the enhanced speech, known as remnant musical noise, which affects listening. Several variations and implementations of this method have been adopted in past decades to address the limitations of the spectral subtraction method. These variations constitute a family of subtractive-type algorithms and operate in the frequency domain. The objective of this paper is to provide an extensive overview of spectral subtractive-type algorithms for enhancement of noisy speech. After the review, the paper concludes by mentioning a future direction of speech enhancement research from the spectral subtraction perspective.
Procedia Engineering, 2013
This paper proposes an improved multi-band spectral subtraction algorithm with the goal of improving the quality of the speech signal in various noise environments. In the proposed enhancement algorithm, the whole speech spectrum is divided into uniformly spaced contiguous frequency bands and spectral over-subtraction is performed independently in each band. The proposed algorithm uses a novel approach to estimate the noise from each band continuously, without using speech pause detection. The noise is estimated and updated by adaptively smoothing the noisy signal power in each uniformly spaced frequency band. The smoothing parameter is controlled by a linear function of the a-posteriori signal-to-noise ratio (SNR). The experiments are conducted for various types of noise, and the results of the proposed algorithm are compared with the reference multi-band spectral subtraction algorithm. To test the performance of the proposed speech enhancement algorithm, objective quality measurement tests (SNR, segmental SNR (Seg.SNR), and perceptual evaluation of speech quality (PESQ)) and spectrograms with informal listening tests are conducted for various noise types at different SNRs. Experimental results and objective quality evaluation test results confirm the performance of the proposed enhancement algorithm. The proposed enhancement algorithm provides sufficient noise reduction and good perceptual quality, without causing considerable signal distortion or remnant musical noise.
2013 National Conference on Communications (NCC), 2013
A spectral subtraction technique is presented for real-time speech enhancement in the aids used b... more A spectral subtraction technique is presented for real-time speech enhancement in the aids used by hearing impaired listeners. For reducing computational complexity and memory requirement, it uses a cascaded-median based estimation of the noise spectrum without voice activity detection. The technique is implemented and tested for satisfactory real-time operation, with sampling frequency of 12 kHz, processing using window length of 30 ms with 50% overlap, and noise estimation by 3-frame 4-satge cascadedmedian, on a 16-bit fixed-point DSP processor with on-chip FFT hardware. Enhancement of speech with different types of additive stationary and non-stationary noise resulted in SNR advantage of 4-13 dB.
The aim of this paper is to present an esoteric model of a nuclear spectrometer e.g. a gamma ray ... more The aim of this paper is to present an esoteric model of a nuclear spectrometer e.g. a gamma ray spectrometer as a communication channel [15]. The source entropy, receiver entropy, and joint entropy of a gamma ray spectrometer were estimated for an observed 1K gamma spectrum containing a 662keV peak from a Cs source. The information loss estimated for the observed gamma spectrum was of the order of 94.5%. In a typical communication engineering channel, the information loss is of the order of 30% [6]. The information loss in a gamma spectrometer is far more than that in a communication channel. Hence the information extraction in a nuclear spectrometer is extremely challenging vis-a-vis communication channel. This also explains high redundancy of spectral channels, and justifies that a priori information required is much more than the a posteriori information extracted in nuclear spectrometers. Also the model amply justifies wide ranging results for the IAEA intercomparison of spectr...
This paper proposes an improved multi-band spectral subtraction algorithm with the goal of improv... more This paper proposes an improved multi-band spectral subtraction algorithm with the goal of improving the quality of speech signal in various noise environments. In the proposed enhancement algorithm, the whole speech spectrum is divided into different uniformly spaced continuous frequency bands and spectral over-subtraction is performed independently, in each band. The proposed algorithm uses a novel approach to estimate the noise ffom each band continuously, without using speech pause detection. The noise is estimated and updated by adaptively smoothing the noisy signal power in each uniformly spaced frequency band. The smoothing parameter is controlled by a linear function of a-posteriori signal-to-noise ratio (SNR). The experiments are conducted for various types of noises. The results of proposed enhancement algorithm are compared with the reference multi-band spectral subtraction algorithm. To test the performance of the proposed speech enhancement algorithm, objective quality ...
Applied Speech Processing
International Journal of Speech Technology
The performance of speech recognition system degrades significantly in real-world environment, is... more The performance of speech recognition system degrades significantly in real-world environment, is a case of the acoustic mismatch between the training and operating conditions. This paper presents a two-stage approach to make a speech recognition system immune to additive and uncorrelated background noise i.e. robust . In the first stage, an oversampled wavelet packet decomposes the entire input noisy speech into seventeen nonlinear frequency subbands like the Bark scale of the human hearing system and the adaptive noise estimation based spectral subtraction filters the noisy speech from each subband signal. The oversampled WPT is linear and advantageous as it causes to overcome the shift - invariance complexity by removing the decimation after the filtering at each decomposition level. In the second stage, a nonparametric approach is used for feature extraction from filtered speech, and the parameters from the feature extraction stage are compared with the parameters extracted from speech signals stored in a template to recognize the utterance. A series of experiments are carried out to evaluate the performance of the proposed two-stage system in a variety of real environments, with and without the use of the first stage. Recognition accuracy is evaluated at the word level in a wide range of SNRs for various types of noisy environments. The experimental results show significant improvement in recognition performance at low SNR using the proposed system.
National Academy Science Letters
The performance level of speech recognizer drops significantly when there is an acoustic mismatch... more The performance level of speech recognizer drops significantly when there is an acoustic mismatch between training and operational environments. A speech recognizer is called robust if it preserves good recognition accuracy even in the mismatch conditions. Present study addresses the recognition of English speech in noisy environments and presents the comparative study of various frequency scales used in parameterization based on the average recognition rate. For the robust automatic speech reorganization, a front end signal enhancement component, spectral subtraction algorithm, is used to prefilter the noisy input speech prior fed to the recognizer. A number of frequency warped scales namely, perceptual scales viz, Mel scale, Bark scale, equivalent rectangular bandwidth rate scale, and a non-perceptual scale called uniform scale are used in the parameterization for feature extraction from enhanced speech. A suite of experiments is carried out to evaluate the performance of the speech recognizer, with and without the use of a front end signal enhancement component, in a variety of noisy environments. Recognition accuracy is tested in terms of word linguistic levels on a wide range of signal to noise ratios for both stationary and non-stationary noises.
International Journal of Speech Technology, 2016
Speech recognizers achieve high recognition accuracy under quiet acoustic environments, but their... more Speech recognizers achieve high recognition accuracy under quiet acoustic environments, but their performance degrades drastically when they are deployed in real environments, where the speech is degraded by additive ambient noise. This paper advocates a two phase approach for robust speech recognition in such environment. Firstly, a front end subband speech enhancement with adaptive noise estimation (ANE) approach is used to filter the noisy speech. The whole noisy speech spectrum is portioned into eighteen dissimilar subbands based on Bark scale and noise power from each subband is estimated by the ANE approach, which does not require the speech pause detection. Secondly, the filtered speech spectrum is processed by the non parametric frequency domain algorithm based on human perception along with the back end building a robust classifier to recognize the utterance. A suite of experiments is conducted to evaluate the performance of the speech recognizer in a variety of real environments, with and without the use of a front end speech enhancement stage. Recognition accuracy is evaluated at the word level, and at a wide range of signal to noise ratios for real world noises. Experimental evaluations show that the proposed algorithm attains good recognition performance when signal to noise ratio is lower than 5 dB.
Procedia Computer Science, 2016
This paper discusses the problem of single-channel speech enhancement in stationary environments and proposes Wiener filtering with a recursive noise estimation algorithm. The Wiener filter is a linear estimator that minimizes the mean-squared error between the original and enhanced speech. The algorithm is implemented in the frequency domain, and the filter transfer function is adapted from sample to sample based on the speech signal statistics: the local mean and the local variance. For noise estimation, a recursive approach is used, in which the noise is estimated from past and present spectral power values using a smoothing parameter. The value of the smoothing parameter is selected in the interval [0, 1]. To evaluate the proposed speech enhancement algorithm, objective evaluations and informal listening tests are conducted on sentences from the NOIZEUS corpus, spoken by male and female speakers and degraded by white and pink noise at different SNR levels. The objective measures are signal-to-noise ratio, segmental signal-to-noise ratio, and the perceptual evaluation of speech quality. The measures show that, for both noise conditions, the speech enhanced by the proposed algorithm is more pleasant to the human ear than speech enhanced by the conventional method.
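The recursive noise estimation described above can be illustrated with a minimal sketch. The abstract does not give the update rule, so the first-order recursion below, the function name `recursive_noise_estimate`, the default `alpha=0.9`, and the use of the first frame as the initial estimate are all assumptions made for illustration, not the authors' implementation.

```python
def recursive_noise_estimate(frame_powers, alpha=0.9):
    """Track noise power per frequency bin across frames.

    frame_powers: list of frames, each a list of spectral power values.
    alpha: smoothing parameter in [0, 1]; 0.9 is an illustrative value.
    Returns the noise estimate after every frame.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Assumption: the first frame is noise only and seeds the estimate.
    noise = list(frame_powers[0])
    history = [list(noise)]
    for power in frame_powers[1:]:
        # Weighted mix of the past estimate and the present spectral power.
        noise = [alpha * n + (1.0 - alpha) * p for n, p in zip(noise, power)]
        history.append(list(noise))
    return history
```

With `alpha` close to 1 the estimate changes slowly and relies mostly on past frames; with `alpha` close to 0 it follows the present frame almost directly.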
2014 Fifth International Symposium on Electronic System Design, 2014
This paper discusses the problem of single-channel speech enhancement in various noise environments and presents an improved multi-band speech enhancement method that uses the masking properties of the human hearing system to address additive noise and remnant noise simultaneously. The improved multi-band spectral subtraction (I-MBSS) is used to enhance speech degraded by real-world noises. The I-MBSS uses an adaptive noise estimation approach to estimate the noise in each band without complicated speech silence detection. The noise is estimated and updated by adaptively smoothing the noisy signal power in each band. The noise estimation technique uses a smoothing parameter that is controlled by a linear function of the a-posteriori SNR. Subsequently, the human hearing model is applied to the enhanced speech to compute the noise masking threshold, and the subtraction parameters are adjusted according to human perception. The method is tested on speech signals with different noise types at different levels, and the results are compared with those of the classical multi-band spectral subtraction algorithm. Speech enhancement performance is evaluated using output SNR, spectrograms, and informal listening tests with several types of real-world noises. For the analyzed speech signals, the proposed enhancement scheme performs better than the classical multi-band spectral subtraction.
2012 4th International Conference on Intelligent Human Computer Interaction (IHCI), 2012
In this paper, an auditory-perception-based improved multi-band spectral subtraction algorithm is proposed to enhance speech degraded by non-stationary or colored noise. In the proposed scheme, the whole speech spectrum is divided into non-uniform bands (N = 6) in accordance with the critical-band rate scale, and spectral subtraction is applied separately in each band. The proposed algorithm uses a new approach to estimate the noise power in each band without the need for explicit speech silence detection. The noise estimate is updated by adaptively smoothing the noisy signal power. The smoothing parameter is controlled by a linear function of the a-posteriori signal-to-noise ratio (SNR). This noise estimation approach gives accurate results at low SNR and works continuously in the presence of speech. The objective measures as well as informal subjective tests demonstrate that the proposed algorithm reduces remnant noise efficiently and that the enhanced speech contains minimal speech distortion with improved SNR.
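A minimal sketch of one band's noise update, with the smoothing parameter driven by the a-posteriori SNR, is given here for orientation. The abstract states only that the parameter is a linear function of the a-posteriori SNR; the specific line `alpha_min + slope * snr_post`, the clipping to `alpha_max`, all default values, and the function name `adaptive_noise_update` are hypothetical choices, not taken from the paper.

```python
def adaptive_noise_update(noise_prev, noisy_power,
                          slope=0.1, alpha_min=0.5, alpha_max=0.99):
    """Update the noise power estimate of one band for one frame.

    noise_prev: previous noise power estimate for the band (>= 0).
    noisy_power: noisy signal power in the band for this frame (>= 0).
    Returns (new_noise_estimate, alpha).
    """
    # A-posteriori SNR: noisy power relative to the previous noise estimate.
    snr_post = noisy_power / max(noise_prev, 1e-12)
    # Hypothetical linear mapping, clipped so alpha stays below alpha_max.
    alpha = min(alpha_max, alpha_min + slope * snr_post)
    # Smooth the noisy power into the running noise estimate.
    noise = alpha * noise_prev + (1.0 - alpha) * noisy_power
    return noise, alpha
```

Under these assumptions a high a-posteriori SNR (speech likely present) pushes `alpha` toward `alpha_max`, so the estimate is barely updated, while a low SNR lets the estimate follow the noisy power more closely. No explicit speech silence detector is involved.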
International Journal of Speech Technology, 2013
In this paper, we propose a speech enhancement method in which the front-end decomposition of the input speech is performed by temporal processing with a filterbank. The proposed method incorporates a perceptually motivated stationary wavelet packet filterbank (PM-SWPFB) and an improved spectral over-subtraction (I-SOS) algorithm for the enhancement of speech in various noise environments. The stationary wavelet packet transform (SWPT) is a shift-invariant transform. The PM-SWPFB is obtained by selecting the stationary wavelet packet tree so that it closely matches the non-linear resolution of the critical-band structure of the psychoacoustic model. After the decomposition of the input speech, the I-SOS algorithm is applied in each subband separately to estimate the speech. The I-SOS uses a continuous noise estimation approach and estimates the noise power in each subband without the need for explicit speech silence detection. The subband noise power is estimated and updated by adaptively smoothing the noisy signal power. The smoothing parameter in each subband is controlled by a function of the estimated signal-to-noise ratio (SNR). The performance of the proposed speech enhancement method is tested on speech signals degraded by various real-world noises. Using objective speech quality measures (SNR, segmental SNR (SegSNR), and perceptual evaluation of speech quality (PESQ) score), spectrograms, and informal listening tests, we show that the proposed method outperforms spectral subtractive-type algorithms and improves the quality and intelligibility of the enhanced speech.
International Journal of Image, Graphics and Signal Processing, 2013
The spectral subtraction method is a classical approach for the enhancement of speech degraded by additive background noise. The basic principle of this method is to estimate the short-time spectral magnitude of speech by subtracting an estimated noise spectrum from the noisy speech spectrum. This can also be achieved by multiplying the noisy speech spectrum by a gain function and later combining the result with the phase of the noisy speech. Besides reducing the background noise, this method introduces an annoying, perceptible tonal artifact in the enhanced speech, known as remnant musical noise, which affects human listening. Several variations and implementations of this method have been adopted in past decades to address its limitations. These variations constitute a family of subtractive-type algorithms that operate in the frequency domain. The objective of this paper is to provide an extensive overview of spectral subtractive-type algorithms for the enhancement of noisy speech. The paper concludes by indicating a future direction for speech enhancement research from the spectral subtraction perspective.
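The two equivalent views described above, subtracting a noise magnitude and applying a gain function, can be sketched for a single short-time frame as follows. The function names are illustrative, the noise magnitude estimate is assumed to be supplied by the caller, and negative differences are set to zero (half-wave rectification), which is one common choice rather than something the abstract prescribes.

```python
import cmath


def spectral_subtract(noisy_spectrum, noise_magnitude):
    """Subtract a noise magnitude estimate and reuse the noisy phase."""
    enhanced = []
    for y, n in zip(noisy_spectrum, noise_magnitude):
        # Negative differences are clipped to zero (assumed rectification).
        magnitude = max(abs(y) - n, 0.0)
        # Recombine the enhanced magnitude with the phase of the noisy bin.
        enhanced.append(cmath.rect(magnitude, cmath.phase(y)))
    return enhanced


def spectral_subtract_gain(noisy_spectrum, noise_magnitude):
    """Same operation written as a real-valued gain on each noisy bin."""
    enhanced = []
    for y, n in zip(noisy_spectrum, noise_magnitude):
        magnitude = abs(y)
        gain = max(1.0 - n / magnitude, 0.0) if magnitude > 0.0 else 0.0
        # A real gain scales the magnitude and leaves the phase untouched.
        enhanced.append(gain * y)
    return enhanced
```

Both functions take the complex spectrum of one frame (for example the output of an FFT) and a per-bin noise magnitude estimate. Bins where the noise estimate exceeds the noisy magnitude are zeroed, and isolated surviving peaks of this kind are a typical source of the musical noise mentioned above.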
Journal of Signal and Information Processing, 2013
This paper addresses the problem of single-channel speech enhancement in adverse environments. An improved multi-band spectral subtraction based on the critical-band rate scale is investigated for the enhancement of single-channel speech. In this work, the whole speech spectrum is divided into non-uniformly spaced frequency bands in accordance with the critical-band rate scale of the psychoacoustic model, and spectral over-subtraction is carried out separately in each band. In addition, the noise in each band is estimated with an adaptive noise estimation approach that does not require explicit speech silence detection. The noise is estimated and updated by adaptively smoothing the noisy signal power in each band. The smoothing parameter is controlled by the a-posteriori signal-to-noise ratio (SNR). For the performance analysis of the proposed algorithm, objective measures (SNR, segmental SNR, and perceptual evaluation of speech quality) are computed for a variety of noises at different SNR levels. The speech spectrograms and objective evaluations of the proposed algorithm are compared with those of other standard speech enhancement algorithms and show that the proposed algorithm better suppresses both the background noise and the musical structure of the remnant noise.
Journal of Signal and Information Processing, 2013
This paper proposes a multi-band speech enhancement algorithm that exploits iterative processing for the enhancement of single-channel speech. In the proposed algorithm, the output of the multi-band spectral subtraction (MBSS) algorithm is used as the input signal for the next iteration. Because the additive noise is transformed into remnant noise after the first MBSS processing step, the remnant noise needs to be re-estimated. The proposed algorithm reduces the remnant musical noise further by feeding the enhanced output signal back to the input and repeating the operation. The newly estimated remnant noise is then used in the next MBSS step. This procedure is iterated a small number of times. The proposed algorithm estimates the noise in each iteration, and spectral over-subtraction is executed independently in each band. The performance of the proposed enhancement algorithm is evaluated for various types of noise at different SNR levels using 1) objective quality measures: signal-to-noise ratio (SNR), segmental SNR, and perceptual evaluation of speech quality (PESQ); and 2) a subjective quality measure: mean opinion score (MOS). The results of the proposed enhancement algorithm are compared with those of the popular MBSS algorithm. Experimental results as well as the objective and subjective quality measurements confirm that the enhanced speech obtained from the proposed algorithm is more pleasant to listeners than speech enhanced by the classical MBSS algorithm.
2012 2nd International Conference on Power, Control and Embedded Systems, 2012
The spectral subtraction method is a conventional approach for single-channel speech enhancement. The basic principle of this method is to estimate the short-time spectral magnitude of speech by subtracting the estimated noise from the noisy speech spectrum and to combine it with the phase of the noisy speech. Besides reducing the noise, this method generates an unnatural and unpleasant noise, called remnant noise. This paper proposes a novel algorithm to reduce the remnant noise and thus improve the overall quality of the enhanced speech. In this algorithm, the output of the multi-band spectral subtraction (MBSS) method is used as the input signal for the next iteration. After the MBSS step, the additive noise has been changed to remnant noise. The remnant noise is re-estimated at each iteration, and the newly estimated noise is used in the next MBSS step. This procedure is iterated a small number of times. The simulation results as well as informal subjective evaluations show that the speech enhanced by the proposed algorithm is more pleasant to listeners than speech enhanced by the conventional MBSS algorithm. This indicates that the proposed algorithm reduces remnant noise satisfactorily and produces good speech quality with an improved signal-to-noise ratio.
2012 Third International Conference on Computer and Communication Technology, 2012
The spectral subtraction method is a classical approach for the enhancement of degraded speech. The basic principle of this technique is to estimate the short-time spectral magnitude of speech by subtracting the estimated noise from the noisy speech spectrum and to combine it with the phase of the noisy speech. Besides reducing the noise, this method generates an unnatural and unpleasant noise, called remnant noise. The other drawback of this method is that it can work only for white Gaussian noise, which has a flat spectrum and is distributed uniformly over the frequency spectrum. Real-world noise, however, is mostly colored and has a non-uniform spectrum. To handle this kind of noise, the spectral subtraction algorithm has been extended to a multi-band case with uniformly spaced frequency bands. In this paper, a perceptually motivated multi-band spectral subtraction algorithm is proposed to enhance speech degraded by colored noise. In the proposed scheme, the whole speech spectrum is divided into non-uniform bands (N = 6) in accordance with the critical-band rate scale, and spectral subtraction is executed independently in each band. The simulation results as well as informal subjective evaluations show that the proposed algorithm reduces remnant noise efficiently and that the enhanced speech contains minimal speech distortion with an improved signal-to-noise ratio.
2012 1st International Conference on Recent Advances in Information Technology (RAIT), 2012
Spectral subtraction is a traditional approach for enhancing the quality of speech degraded by environmental noise. This algorithm subtracts the estimated noise spectrum from the noisy speech spectrum and combines the result with the phase of the noisy speech. Besides suppressing the noise, this method introduces an unnatural and unpleasant remnant noise. Several variants of
Proceedings of the 1st International Conference on Wireless Technologies for Humanitarian Relief - ACWR '11, 2011
This paper presents the Kalman filter (KF) based channel estimation algorithm for orthogonal frequency division multiplexing (OFDM) systems. The cyclic prefix (CP) portion of the OFDM symbols is used for extracting the channel state information. The KF algorithm computes a channel estimate based on the information contained in the cyclic prefix. This channel estimation algorithm is compared with the classical
International Journal of …, 2011
The spectral subtraction method is a classical approach for the enhancement of speech degraded by additive background noise. The basic principle of this method is to estimate the short-time spectral magnitude of speech by subtracting an estimated noise spectrum from the noisy speech spectrum. This can also be achieved by multiplying the noisy speech spectrum by a gain function and later combining the result with the phase of the noisy speech. Besides reducing the background noise, this method introduces an annoying, perceptible tonal artifact in the enhanced speech, known as remnant musical noise, which affects human listening. Several variations and implementations of this method have been adopted in past decades to address its limitations. These variations constitute a family of subtractive-type algorithms that operate in the frequency domain. The objective of this paper is to provide an extensive overview of spectral subtractive-type algorithms for the enhancement of noisy speech. The paper concludes by indicating a future direction for speech enhancement research from the spectral subtraction perspective.
Procedia Engineering, 2013
This paper proposes an improved multi-band spectral subtraction algorithm with the goal of improving the quality of the speech signal in various noise environments. In the proposed enhancement algorithm, the whole speech spectrum is divided into uniformly spaced contiguous frequency bands, and spectral over-subtraction is performed in each band independently. The proposed algorithm uses a novel approach to estimate the noise in each band continuously, without using speech pause detection. The noise is estimated and updated by adaptively smoothing the noisy signal power in each uniformly spaced frequency band. The smoothing parameter is controlled by a linear function of the a-posteriori signal-to-noise ratio (SNR). Experiments are conducted for various types of noise, and the results of the proposed algorithm are compared with those of the reference multi-band spectral subtraction algorithm. To test the performance of the proposed speech enhancement algorithm, objective quality measurements (SNR, segmental SNR (Seg.SNR), and perceptual evaluation of speech quality (PESQ)), spectrograms, and informal listening tests are conducted for various noise types at different SNRs. Experimental results and objective quality evaluations confirm the performance of the proposed enhancement algorithm. The proposed algorithm provides sufficient noise reduction and good perceptual quality without causing considerable signal distortion or remnant musical noise.