Subjective and objective analysis of Speech Enhancement algorithms for Single Channel Speech Patterns of Indian & English Languages
Related papers
This paper presents a comparative study of seven single-channel speech enhancement techniques, including spectral subtraction, Wiener filtering, minimum mean square error under speech presence uncertainty (MMSE-SPU), p-MMSE, log-MMSE and modulation channel selection (MCS). To assess these techniques, 12 different practical noises were applied to five language databases. The results were analysed using subjective and objective measures. For the subjective measure, SNR, peak signal-to-noise ratio (PSNR), segmental SNR (SSNR) and mean square error (MSE) were considered, whereas for the objective measure the speech intelligibility index was used. The language databases (Hindi, Kannada, Malayalam, Bengali and English) were taken from the NOIZEUS speech corpus and the IIIT-H Indic speech database, while the noise database was obtained from the NOISEX-92 noise corpus. The algorithms were implemented in MATLAB. The results are encouraging and can guide the selection of a single-channel speech enhancement technique for practical noise reduction applications. Among all the methods considered, MCS shows the best overall performance across the five languages and 12 practical noises.
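The SNR, PSNR, SSNR and MSE measures named in the abstract above can all be computed from a clean reference signal and the enhanced output. The following is a minimal Python sketch, not the paper's MATLAB implementation: the function names, the 256-sample frame length and the [-10, 35] dB clamping range for segmental SNR are assumptions chosen for illustration (the clamping is a common convention, but the abstract does not state which one was used).

```python
import math

def _ratio_db(num, den):
    """10*log10(num/den); returns inf when the error term is exactly zero."""
    return math.inf if den == 0.0 else 10.0 * math.log10(num / den)

def mse(clean, enhanced):
    """Mean square error between two equal-length signals."""
    return sum((c - e) ** 2 for c, e in zip(clean, enhanced)) / len(clean)

def snr_db(clean, enhanced):
    """Global SNR in dB: clean-signal energy over residual-error energy."""
    signal = sum(c * c for c in clean)
    noise = sum((c - e) ** 2 for c, e in zip(clean, enhanced))
    return _ratio_db(signal, noise)

def psnr_db(clean, enhanced):
    """Peak SNR in dB: squared peak of the clean signal over the MSE."""
    peak = max(abs(c) for c in clean)
    return _ratio_db(peak * peak, mse(clean, enhanced))

def segmental_snr_db(clean, enhanced, frame_len=256, lo=-10.0, hi=35.0, eps=1e-12):
    """Mean of per-frame SNRs, each clamped to [lo, hi] dB (assumed convention)."""
    if len(clean) < frame_len:
        raise ValueError("signal is shorter than one frame")
    frame_snrs = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        c = clean[start:start + frame_len]
        e = enhanced[start:start + frame_len]
        signal = sum(x * x for x in c)
        noise = sum((x - y) ** 2 for x, y in zip(c, e))
        # eps keeps the logarithm finite for silent or perfectly restored frames
        snr = 10.0 * math.log10((signal + eps) / (noise + eps))
        frame_snrs.append(min(max(snr, lo), hi))
    return sum(frame_snrs) / len(frame_snrs)
```

Averaging per-frame values is what distinguishes segmental SNR from global SNR: low-energy frames count as much as loud ones, so residual noise in quiet passages is penalised more heavily.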
IETE Technical Review, 2014
International Journal of Interactive Multimedia and Artificial Intelligence, 2019
Many forms of human communication exist, such as text-based and nonverbal communication. Speech, however, is the most powerful and versatile form for humans. Speech signals enable humans to communicate, and this usefulness has led to a variety of speech processing applications. Successful use of these applications is, however, significantly hampered in the presence of background noise, which overlaps with and masks the target speech signal. To deal with this overlapping background noise, a front-end speech enhancement algorithm is crucial for making noisy speech intelligible and pleasant. Speech enhancement has been an important research and engineering problem for the last couple of decades. In this paper, we present a comprehensive survey of unsupervised single-channel speech enhancement (U-SCSE) algorithms. A taxonomy-based review of the U-SCSE algorithms is presented, and the associated studies on improving intelligibility and quality are outlined. Studies of speech enhancement algorithms from an unsupervised perspective are presented. Objective experiments have been performed to evaluate the potential of the U-SCSE algorithms to improve speech intelligibility and quality. It is found that unsupervised speech enhancement improves speech quality, but the improvement in speech intelligibility is limited. Finally, several research problems that require further work are identified.
Quality Evaluation of Speech Enhancement Algorithms for Normal and Hearing Loss Listeners
International Journal of Innovative Technology and Exploring Engineering, 2019
This paper reports a subjective quality test of enhanced speech from different enhancement algorithms for listeners with normal hearing (NH) and listeners with hearing impairment (HI). In the literature, subjective quality evaluation of speech enhancement methods mostly targets NH listeners, and fewer attempts have been made to evaluate subjectively for HI listeners. The algorithms evaluated come from four classes: spectral subtraction (SS), statistical model based (minimum mean square error), subspace (PKLT) and auditory (ideal binary mask using STFT, ideal binary mask using a gammatone filterbank and ideal binary mask using a gammachirp filterbank). The algorithms are evaluated using four types of real-world noise recorded in Indian scenarios, namely cafeteria, traffic, station and train, at -5, 0, 5 and 10 dB SNR. The evaluation follows the ITU-T P.835 standard in terms of three parameters: speech signal alone,...
Single Channel Speech Enhancement for Mixed Non-Stationary Noise Environments
Speech enhancement is an important step in improving the quality and intelligibility of a noisy speech signal. In practical environments more than one noise source is present, so it is necessary to design a technique that can remove mixed noises from single-channel speech signals. In this paper, a single-channel speech enhancement method is proposed for the reduction of mixed non-stationary noises. The proposed method is based on the wavelet packet transform and an ideal binary mask thresholding function. A Db10 mother wavelet packet transform is used to decompose the speech signal into three levels. After decomposition, a binary mask threshold function is used to threshold the noisy coefficients of the noisy speech signal. The performance of the proposed wavelet-with-ideal-mask method is compared with Wiener filtering, spectral subtraction, p-MMSE, log-MMSE, ideal channel selection, ideal binary mask, and hard and soft wavelet thresholding functions in terms of PESQ, SNR improvement, cepstral distance and frequency-weighted segmental SNR. The proposed method shows improved performance over conventional speech enhancement methods.
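The binary mask thresholding step described above can be illustrated on a single band of transform coefficients. This is a hedged sketch rather than the authors' method: the wavelet packet decomposition itself is omitted, the coefficient lists are hypothetical inputs, and the 0 dB local-SNR criterion is an assumed threshold. The mask is "ideal" in the sense that it needs the clean and noise coefficients separately, which are only available in controlled experiments.

```python
import math

def ideal_binary_mask(clean_coeffs, noise_coeffs, threshold_db=0.0):
    """Return 1 where a coefficient's local SNR exceeds threshold_db, else 0."""
    mask = []
    for s, n in zip(clean_coeffs, noise_coeffs):
        if s == 0.0:
            mask.append(0)          # no speech energy: discard
        elif n == 0.0:
            mask.append(1)          # no noise energy: keep
        else:
            local_snr_db = 10.0 * math.log10((s * s) / (n * n))
            mask.append(1 if local_snr_db > threshold_db else 0)
    return mask

def apply_mask(noisy_coeffs, mask):
    """Zero out the noisy coefficients that the mask marks as noise-dominated."""
    return [y * m for y, m in zip(noisy_coeffs, mask)]
```

In a full pipeline the masked coefficients of every band would then be passed to the inverse transform to reconstruct the enhanced waveform.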
Performance analysis of neural network, NMF and statistical approaches for speech enhancement
International Journal of Speech Technology, 2020
Bayesian estimators are very useful in speech enhancement and noise reduction. However, traditional estimators process only amplitudes and leave the phase unprocessed. Among the Bayesian estimators, super-Gaussian based estimators provide improved noise reduction. Super-Gaussian Bayesian estimators that use processed phase information to estimate amplitudes provide further improvement. In this work, Complex speech coefficients given Uncertain Phase (CUP) based Bayesian estimators such as CUP-GG (CUP estimator with speech spectral coefficients assumed as Gamma and noise spectral coefficients as Generalized Gamma) and CUP-NG (speech as Nakagami) are compared under white noise, pink noise, babble noise and non-stationary factory noise conditions. The statistical estimators are less effective under fully non-stationary conditions such as non-stationary factory noise and babble noise. Non-negative Matrix Factorization (NMF) based algorithms perform better for non-stationary noises. The drawback of NMF is that it requires a priori knowledge about speech. This drawback can be overcome by combining the advantages of the statistical and NMF approaches. The NR-NMF and WR-NMF speech enhancement methods are developed by providing posteriori regularization based on statistical assumptions about the distribution of the speech and noise DFT coefficients. A speech enhancement method that uses the CUP-GG estimator and NMF with online noise bases update is also considered for comparison. Progress in neural network based approaches has further shown that, with large datasets and better training, speech enhancement algorithms give improved results. In this work, a neural network approach for speech enhancement is implemented and compared with traditional estimators and NMF approaches. To generalize to unseen noise types, the proposed neural network approach uses dropout. For training the network, features obtained from the a priori SNR and a posteriori SNR are used. The objective of this paper is to analyze the performance of speech enhancement methods based on neural networks, NMF and statistical approaches. The objective performance measures Perceptual Evaluation of Speech Quality (PESQ), Short-Time Objective Intelligibility (STOI), Signal to Noise Ratio (SNR) and Segmental SNR (Seg SNR) are considered for comparison.
A Literature Survey on Single Channel Speech Enhancement Techniques
2020
Speech enhancement deals with processing noisy speech signals to improve human perception or machine understanding when noise degrades speech information. It is usually difficult to keep speech undistorted while reducing noise, and this compromise between speech distortion and noise reduction limits the performance of speech enhancement systems. For noisy speech with medium to high SNR, the goal is to produce a subjectively realistic signal by reducing noise levels; for speech with low SNR, the goal may be to reduce the noise level while retaining intelligibility. This work discusses the need for speech enhancement and its applications, gives an overview of the classification and the various approaches available, and presents an extensive literature survey of speech enhancement techniques on different platforms.
Objective comparison of speech enhancement algorithms under real world conditions
2008
Over the past decades, the problem of single-channel speech enhancement has been addressed by a great many researchers. In this work, selected methods from a variety of categories are applied to denoise speech signals corrupted by non-stationary urban noise. The performance of spectral subtraction, signal subspace, model-based and Kalman filtering approaches is evaluated. Several objective measures designed to predict the results of human listening tests are employed in order to reach accurate conclusions. Two series of experiments were carried out, and multiband spectral subtraction, along with a short-time spectral amplitude (STSA) estimator based on minimizing the mean square error (MSE) of the log-spectra, is shown to outperform the rest of the algorithms.
Single Channel Speech Enhancement Using Spectral Subtraction Based on Minimum Statistics
2011
Speech is an elementary source of human interaction. The quality and intelligibility of speech signals during communication are generally degraded by surrounding noise. Corrupted speech signals therefore need to be enhanced to improve quality and intelligibility. In the field of speech processing, much effort has been devoted to developing speech enhancement techniques that restore the speech signal by reducing the amount of disturbing noise. This thesis focuses on a single-channel speech enhancement technique that performs noise reduction by spectral subtraction based on minimum statistics. Minimum statistics means that the power spectrum of the non-stationary noise signal is estimated by finding the minimum values of a smoothed power spectrum of the noisy speech signal, which circumvents the speech activity detection problem. The performance of the spectral subtraction method is evaluated using single-channel speech data and for a wide range of noise types with various...
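The idea of tracking minima of a smoothed power spectrum can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the thesis's implementation: it assumes a fixed smoothing constant `alpha`, a short sliding window and a simple spectral floor, and it leaves out the time-varying smoothing and bias compensation that full minimum statistics estimators normally include. The function names and default values are hypothetical. Input is a list of frames, each a list of per-bin power values.

```python
from collections import deque

def track_noise_floor(noisy_power, alpha=0.85, window=8):
    """Per-bin noise estimate: minimum of a recursively smoothed power
    spectrum over the most recent `window` frames."""
    n_bins = len(noisy_power[0])
    smoothed = list(noisy_power[0])      # initialise with the first frame
    history = deque(maxlen=window)       # sliding window of smoothed frames
    estimates = []
    for frame in noisy_power:
        # first-order recursive smoothing of each frequency bin
        smoothed = [alpha * s + (1.0 - alpha) * p
                    for s, p in zip(smoothed, frame)]
        history.append(smoothed)
        # the minimum over the window is taken as the noise power
        estimates.append([min(h[k] for h in history) for k in range(n_bins)])
    return estimates

def subtract_noise(noisy_power, noise_power, floor=0.01):
    """Power spectral subtraction with a floor to avoid negative power."""
    return [[max(p - n, floor * p) for p, n in zip(pf, nf)]
            for pf, nf in zip(noisy_power, noise_power)]
```

Because speech energy in any one bin is intermittent, the windowed minimum tends to fall in a speech pause, which is why no explicit speech activity detector is needed. The window must be long enough to span the longest expected speech burst in that bin.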
Intelligibility investigation of single-channel noise reduction algorithms for Chinese and Japanese
2010 7th International Symposium on Chinese Spoken Language Processing, 2010
A large number of single-channel noise reduction algorithms have been proposed based largely on mathematical principles and evaluated with English speech. Given the different perceptual cues used by native listeners of different languages, it is of great interest to examine whether there are any language effects on speech intelligibility when the same noise reduction algorithm is used to process noisy speech in different languages. In this paper, a comparative evaluation is taken of various single-channel noise reduction algorithms applied to noisy speech for Chinese and Japanese. Clean speech signals (Chinese words and Japanese words) were first corrupted by three types of noise at two signal-to-noise ratios and then processed by five single-channel noise reduction algorithms. The processed signals were finally presented to normal-hearing listeners for recognition. Intelligibility evaluations showed that the majority of noise-reduction algorithms did not improve speech intelligibility and that significant differences in performance of noise reduction algorithms were observed across the two languages.