Robust DNN-Based Speech Enhancement with Limited Training Data (original) (raw)
Related papers
Normalized Features for Improving the Generalization of DNN Based Speech Enhancement
arXiv: Sound, 2017
Enhancing noisy speech is an important task to restore its quality and to improve its intelligibility. In traditional non-machine-learning (ML) based approaches the parameters required for noise reduction are estimated blindly from the noisy observation while the actual filter functions are derived analytically based on statistical assumptions. Even though such approaches generalize well to many different acoustic conditions, the noise suppression capability in transient noises is low. To amend this shortcoming, machine-learning (ML) methods such as deep learning have been employed for speech enhancement. However, due to their data-driven nature, the generalization of ML based approaches to unknown noise types is still discussed. To improve the generalization of ML based algorithms and to enhance the noise suppression of non-ML based methods, we propose a combination of both approaches. For this, we employ the a priori signal-to-noise ratio (SNR) and the a posteriori SNR estimated a...
Speech Enhancement Using Deep Neural Network
2016
Speech is the main source of human interaction. In everyday life,Speech understanding in noisy environments is still one of the major challenges for users. The quality and intelligibility of speech signals are generally gets corrupted by the surrounding background noise during communication. So to improve the quality and intelligibility, Corrupted speech signals is to be enhanced. In the field of speech processing, different effort has been taken to develop speech enhancement techniques in order to enhance the speech signal by reducing the amount of noise. Speech enhancement deals with improving the quality and intelligibility of speech which gets degraded in the presence of surrounding background noise. In various everyday environments, the goal of speech enhancement methods is to improving the quality and intelligibility of speech especially at low Signal-to-Noise ratios (SNR). Regarding intelligibility, different machine learning methods that aim to estimate an ideal binary mask ...
Noisy-target Training: A Training Strategy for DNN-based Speech Enhancement without Clean Speech
2021
Deep neural network (DNN)–based speech enhancement ordinarily requires clean speech signals as the training target. However, collecting clean signals is very costly because they must be recorded in a studio. This requirement currently restricts the amount of training data for speech enhancement less than 1/1000 of that of speech recognition which does not need clean signals. Increasing the amount of training data is important for improving the performance, and hence the requirement of clean signals should be relaxed. In this paper, we propose a training strategy that does not require clean signals. The proposed method only utilizes noisy signals for training, which enables us to use a variety of speech signals in the wild. Our experimental results showed that the proposed method can achieve the performance similar to that of a DNN trained with clean signals.
An Experimental Analysis of Deep Learning Architectures for Supervised Speech Enhancement
Electronics
Recent speech enhancement research has shown that deep learning techniques are very effective in removing background noise. Many deep neural networks are being proposed, showing promising results for improving overall speech perception. The Deep Multilayer Perceptron, Convolutional Neural Networks, and the Denoising Autoencoder are well-established architectures for speech enhancement; however, choosing between different deep learning models has been mainly empirical. Consequently, a comparative analysis is needed between these three architecture types in order to show the factors affecting their performance. In this paper, this analysis is presented by comparing seven deep learning models that belong to these three categories. The comparison includes evaluating the performance in terms of the overall quality of the output speech using five objective evaluation metrics and a subjective evaluation with 23 listeners; the ability to deal with challenging noise conditions; generalizatio...
Speech Enhancement Using Deep Learning Methods: A Review
Jurnal Elektronika dan Telekomunikasi
Speech enhancement, which aims to recover the clean speech of the corrupted signal, plays an important role in the digital speech signal processing. According to the type of degradation and noise in the speech signal, approaches to speech enhancement vary. Thus, the research topic remains challenging in practice, specifically when dealing with highly non-stationary noise and reverberation. Recent advance of deep learning technologies has provided great support for the progress in speech enhancement research field. Deep learning has been known to outperform the statistical model used in the conventional speech enhancement. Hence, it deserves a dedicated survey. In this review, we described the advantages and disadvantages of recent deep learning approaches. We also discussed challenges and trends of this field. From the reviewed works, we concluded that the trend of the deep learning architecture has shifted from the standard deep neural network (DNN) to convolutional neural network ...
Applications of deep learning to speech enhancement
2019
Deep neural networks (DNNs) have been successfully employed in a broad range of applications, having achieved state-of-the-art results in tasks such as acoustic modeling for automatic speech recognition and image classification. However, their application to speech enhancement problems such as denoising, dereverberation, and source separation is more recent work and therefore in more preliminary stages. In this work, we explore DNN-based speech enhancement from three different and complementary points of view. First, we propose a model to perform speech dereverberation by estimating its spectral magnitude from the reverberant counterpart. Our models are capable of extracting features that take into account both short and long-term dependencies in the signal through a convolutional encoder and a recurrent neural network for extracting long-term information. Our model outperforms a recently proposed model that uses different context information depending on the reverberation time, wit...