Efficient classification of noisy speech using neural networks
Related papers
Investigation on the Effect of the Input Features in the Noise Level Classification of Noisy Speech
2019
Noise Level Estimation plays a crucial role in Speech Enhancement (SE) algorithms. Recently, a few noise estimation (NE) algorithms have been developed for SE using the minimal-tracking method, but little research has been done on noise level classification (NLC). There is therefore a need to identify the audio features that are appropriate for NLC. In this paper, this problem is addressed and seventeen audio features of the noisy speech are examined for NLC using four standard and efficient classifiers: K-Nearest Neighbor (KNN), Naive Bayes (NB), Support Vector Machine (SVM), and Decision Tree (DT). The features are first optimized to achieve the best classification performance using Principal Component Analysis (PCA) and the Neighbourhood Component Feature Selection (NCFS) method. Finally, a comparative performance analysis is carried out by taking six different categories of real-life noisy speech signals from the standard speech ...
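A minimal sketch of the NLC pipeline described above: a matrix of audio features is reduced with PCA and passed to the four classifiers (KNN, NB, SVM, DT). The feature matrix, noise-level labels, and PCA dimensionality are placeholders, not the paper's actual data or settings.

```python
# Sketch: audio features -> PCA -> four standard classifiers for noise-level classification.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 17))    # placeholder: 17 audio features per noisy-speech clip
y = rng.integers(0, 4, size=600)  # placeholder: 4 discrete noise levels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

classifiers = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "NB": GaussianNB(),
    "SVM": SVC(kernel="rbf"),
    "DT": DecisionTreeClassifier(max_depth=8),
}
for name, clf in classifiers.items():
    # Standardize, project the 17 features onto a smaller PCA subspace, then classify.
    model = make_pipeline(StandardScaler(), PCA(n_components=8), clf)
    model.fit(X_tr, y_tr)
    print(f"{name}: test accuracy = {model.score(X_te, y_te):.3f}")
```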
Classification of Speech using Signal Processing and Artificial Neural Network
A wide range of research has been carried out on signal processing for denoising. Stress management is important for improving the speech of disabled persons. In order to provide proper speech practice for disabled children, their speech is analyzed. Initially, the normal and pathological subjects' speech is obtained with the same set of words. In this project, the classification of normal and pathological subjects' speech is discussed. The relevant features are extracted using Mel Frequency Cepstrum Coefficients (MFCC) for the words of both normal and pathological subjects. Dimensionality reduction of the features is achieved using Principal Component Analysis (PCA). Finally, the features are trained and tested using an Artificial Neural Network (ANN) for classification.
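A hedged sketch of the pipeline this abstract describes: per-utterance MFCC features, PCA for dimensionality reduction, and an ANN classifier. The recordings, labels, and network size below are synthetic placeholders standing in for the normal/pathological speech data.

```python
# Sketch: MFCC features -> PCA -> ANN classifier for two-class speech classification.
import numpy as np
import librosa
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def mfcc_features(signal, sr=16000, n_mfcc=13):
    # Average the MFCCs over frames to get one fixed-length feature vector per utterance.
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)

rng = np.random.default_rng(1)
utterances = [rng.normal(size=16000) for _ in range(40)]  # placeholder recordings
labels = rng.integers(0, 2, size=40)                      # 0 = normal, 1 = pathological (hypothetical)

X = np.stack([mfcc_features(u) for u in utterances])
model = make_pipeline(StandardScaler(), PCA(n_components=8),
                      MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0))
model.fit(X, labels)
print("training accuracy:", model.score(X, labels))
```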
Removing Noise from Speech Signals Using Different Approaches of Artificial Neural Networks
International Journal of Information Technology and Computer Science, 2015
In this research, four ANN models: Function Fitting (FitNet), Nonlinear AutoRegressive (NARX), Recurrent (RNN), and Cascaded-ForwardNet were constructed and trained separately to act as a filter that removes noise from any speech signal. Each model consists of input, hidden and output layers. The input layer has two neurons that represent the speech signal and its associated noise. The output layer includes one neuron that represents the enhanced signal after removing the noise. The four models were trained separately on stereo (noisy and clean) audio signals to produce the clean signal. Experiments were conducted for each model separately with different architectures, optimization training algorithms, and learning parameters to identify the model with the best noise-removal results. From the experiments, the best results were obtained from the FitNet and NARX models, respectively. TrainLM is the best training algorithm in this case. Finally, the results showed that the suggested architectures of the four models have the filtering ability to remove noise from both trained and untrained speech signal samples.
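A minimal sketch of the two-input / one-output filtering idea described above, using scikit-learn's MLPRegressor as a stand-in for MATLAB's FitNet (the Levenberg-Marquardt trainer, trainlm, is not available here, so the default Adam optimizer is used). The signals are synthetic placeholders.

```python
# Sketch: a feedforward net with inputs (noisy sample, noise) and one output (clean sample).
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)
t = np.linspace(0, 1, 8000)
clean = np.sin(2 * np.pi * 200 * t)    # placeholder "speech" signal
noise = 0.3 * rng.normal(size=t.size)  # associated noise
noisy = clean + noise

X = np.column_stack([noisy, noise])    # two input neurons per sample, as in the abstract
net = MLPRegressor(hidden_layer_sizes=(16,), activation="tanh",
                   max_iter=2000, random_state=0)
net.fit(X, clean)                      # target: the clean sample
enhanced = net.predict(X)
print("MSE after filtering:", np.mean((enhanced - clean) ** 2))
```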
Identification of Noises and Speech Signals by Artificial Neural Networks
Journal of Engineering Science and Technology Review
According to the literature, the most commonly used mathematical apparatuses for signal recognition tasks are Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN) and Deep Neural Networks (DNN). This paper presents an approach to identifying accidental noise impacts and human speech with superimposed noise using Backpropagation Neural Networks (BPNN) with different transfer functions. BPNNs with linear, tangent-sigmoid and log-sigmoid transfer functions in the output layer are tested. A neural architecture for noise recognition with 6 neurons in the hidden layer and a "tansig" output activation is selected, achieving an accuracy of 98.3%. For speech processing, an identical accuracy of 93.7% was observed with 4, 3 and 4 hidden neurons for the three types of output transfer functions.
Neural networks used for speech recognition
Proceedings of the Nineteenth National Radio Science Conference, 2002
This paper presents an investigation of speech recognition classification performance using two standard neural network structures as the classifier: a Feed-forward Neural Network (NN) trained with the back-propagation algorithm and a Radial Basis Function Neural Network.
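A hedged sketch contrasting the two classifier structures named above: a feed-forward network trained with backpropagation, and a simple radial basis function (RBF) network built from k-means centres, Gaussian activations and a linear readout. The feature data are synthetic placeholders, not speech features.

```python
# Sketch: feed-forward NN vs. a minimal RBF network on placeholder classification data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=400, n_features=12, n_classes=3,
                           n_informative=6, random_state=0)

# Feed-forward NN trained with backpropagation.
ffnn = MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000, random_state=0).fit(X, y)

# Minimal RBF network: Gaussian activations around k-means centres, linear readout.
centres = KMeans(n_clusters=15, n_init=10, random_state=0).fit(X).cluster_centers_

def rbf_layer(data, centres, gamma=0.1):
    d2 = ((data[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

readout = LogisticRegression(max_iter=2000).fit(rbf_layer(X, centres), y)

print("Feed-forward NN accuracy:", ffnn.score(X, y))
print("RBF network accuracy:", readout.score(rbf_layer(X, centres), y))
```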
Proceedings of the workshop on Speech and Natural Language - HLT '90, 1990
A model-based spectral estimation algorithm is derived that improves the robustness of speech recognition systems to additive noise. The algorithm is tailored for filter-bank-based systems, where the estimation should seek to minimize the distortion as measured by the recognizer's distance metric. This estimation criterion is approximated by minimizing the Euclidean distance between spectral log-energy vectors, which is equivalent to minimizing the nonweighted, nontruncated cepstral distance. Correlations between frequency channels are incorporated in the estimation by modeling the spectral distribution of speech as a mixture of components, each representing a different speech class, and assuming that spectral energies at different frequency channels are uncorrelated within each class. The algorithm was tested with SRI's continuous-speech, speaker-independent, hidden Markov model recognition system using the large-vocabulary NIST "Resource Management Task." When trained on a clean-speech database and tested with additive white Gaussian noise, the new algorithm has an error rate half of that with MMSE estimation of log spectral energies at individual frequency channels, and it achieves a level similar to that with the ideal condition of training and testing at constant SNR. The algorithm is also very efficient with additive environmental noise, recorded with a desktop microphone.
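A heavily simplified sketch of the mixture-based estimation idea described above: clean log filter-bank energy vectors are modelled as a mixture of speech classes with per-channel (diagonal) variances, and the estimate is a posterior-weighted combination of per-class estimates. Here the per-class estimate is approximated by the class mean and the data are synthetic placeholders; the paper's full noise model is not reproduced.

```python
# Sketch: posterior-weighted (MMSE-style) estimate of log filter-bank energies under a mixture model.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(4)
clean_logE = rng.normal(size=(2000, 20))  # placeholder clean log filter-bank energy vectors
gmm = GaussianMixture(n_components=8, covariance_type="diag",
                      random_state=0).fit(clean_logE)

def estimate_log_spectrum(observed_vec):
    # Posterior of each speech class given the observation, then a weighted
    # combination of the class-conditional means as the spectral estimate.
    post = gmm.predict_proba(observed_vec[None, :])[0]
    return post @ gmm.means_

noisy_vec = clean_logE[0] + 0.5 * rng.normal(size=20)  # placeholder noisy observation
estimate = estimate_log_spectrum(noisy_vec)
print("Euclidean log-spectral distance to clean:",
      np.linalg.norm(estimate - clean_logE[0]))
```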
Noise reduction algorithm for robust speech recognition using MLP neural network
2009 Asia-Pacific Conference on Computational Intelligence and Industrial Applications (PACIIA), 2009
We propose an efficient and effective nonlinear feature-domain noise suppression algorithm, motivated by the minimum mean square error (MMSE) optimization criterion. A Multi-Layer Perceptron (MLP) neural network in the log-spectral domain minimizes the difference between noisy and clean speech. By using this method as a pre-processing stage of a speech recognition system, the recognition rate in noisy environments is improved. We can extend the application of the system to different environments with different noises without re-training it; we need only to train the pre-processing stage with a small portion of noisy data, created by artificially adding different types of noises from the NOISEX-92 database to the TIMIT speech database. Experimental results show that the proposed method achieves a significant improvement in recognition rates.
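A hedged sketch of the pre-processing stage described above: an MLP trained to map noisy log-spectral feature vectors to their clean counterparts, which approximates the MMSE criterion via squared-error minimization. The frame data below are synthetic placeholders, not NOISEX-92 or TIMIT.

```python
# Sketch: MLP regression from noisy to clean log-spectral frames as a pre-processing stage.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(5)
clean_frames = rng.normal(size=(3000, 24))  # placeholder clean log-spectral frames
noisy_frames = clean_frames + 0.4 * rng.normal(size=clean_frames.shape)

# Minimizing squared error between predicted and clean features approximates
# the MMSE criterion in the log-spectral domain.
mlp = MLPRegressor(hidden_layer_sizes=(64,), activation="tanh",
                   max_iter=500, random_state=0)
mlp.fit(noisy_frames, clean_frames)

denoised = mlp.predict(noisy_frames[:5])
print("per-frame MSE after denoising:", np.mean((denoised - clean_frames[:5]) ** 2))
```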
Probabilistic decision-based neural networks for speech pattern classification
ICSP '98. 1998 Fourth International Conference on Signal Processing (Cat. No.98TH8344)
Probabilistic decision-based neural networks (PDBNNs) were originally proposed by Lin, Kung and Lin for human face recognition. Although high recognition accuracy has been achieved, not many illustrations were given to highlight the characteristics of the decision boundaries. This paper aims at providing detailed illustrations to compare the decision boundaries of PDBNNs with those of Gaussian mixture models through a pattern recognition task, namely the classification of two-dimensional vowel data. The original PDBNNs use elliptical basis functions with diagonal covariance matrices, which may be inefficient for modeling feature vectors with correlated components. This paper attempts to tackle this problem by using full covariance matrices. The paper also highlights the strengths of PDBNNs by demonstrating that the PDBNN's thresholding mechanism is very effective in rejecting data not belonging to any known classes.
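A hedged sketch of the two ideas highlighted above, using scikit-learn Gaussian mixture models as a stand-in for PDBNNs: full covariance matrices to model correlated feature components, and a per-class log-likelihood threshold to reject patterns that belong to no known class. The two-dimensional data and the threshold value are placeholders.

```python
# Sketch: per-class full-covariance GMMs with likelihood-threshold rejection of unknown data.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(6)
# Two "vowel" classes with correlated 2-D features (placeholder data).
class_a = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=300)
class_b = rng.multivariate_normal([4, 4], [[1.0, -0.6], [-0.6, 1.0]], size=300)

models = {name: GaussianMixture(n_components=2, covariance_type="full",
                                random_state=0).fit(data)
          for name, data in [("A", class_a), ("B", class_b)]}

def classify_with_rejection(x, threshold=-8.0):
    # Score the point under each class model; reject if every log-likelihood is low.
    scores = {name: m.score_samples(x[None, :])[0] for name, m in models.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > threshold else "rejected"

print(classify_with_rejection(np.array([0.2, 0.1])))     # expected: "A"
print(classify_with_rejection(np.array([20.0, -15.0])))  # expected: "rejected"
```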
Artificial neural networks for voice activity detection
Journal of Advanced Sciences and Engineering Technologies, 2022
Voice biometrics is an actively developing field that includes two related tasks of recognizing a speaker by voice: the verification task, which consists in determining the speaker's identity, and the identification task, which checks whether a recording belongs to a particular speaker. An open question remains regarding improving the quality of verification and identification algorithms in real conditions and reducing the probability of error. In this work, a voice activity detection (VAD) algorithm is proposed, which is a modification of an algorithm based on pitch statistics; VAD is investigated as a component of a speaker recognition system, and its main purpose is therefore to improve the quality of the system as a whole. Using the proposed modification of the VAD algorithm and an energy-based VAD algorithm as examples, the influence of this choice on the quality of speaker recognition...
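A minimal sketch of the baseline energy-based VAD of the kind used for comparison above: frame the signal, compute short-time log energy, and mark frames above an adaptive threshold as speech. The pitch-statistics modification from the paper is not reproduced, and the signal, frame sizes and margin are placeholders.

```python
# Sketch: frame-level energy-based voice activity detection with an adaptive threshold.
import numpy as np

def energy_vad(signal, sr=16000, frame_ms=25, hop_ms=10, margin_db=6.0):
    frame, hop = int(sr * frame_ms / 1000), int(sr * hop_ms / 1000)
    n_frames = 1 + (len(signal) - frame) // hop
    energies = np.array([
        10 * np.log10(np.mean(signal[i * hop:i * hop + frame] ** 2) + 1e-12)
        for i in range(n_frames)
    ])
    # Adaptive threshold: a fixed margin above the quietest frames (noise floor).
    threshold = np.percentile(energies, 10) + margin_db
    return energies > threshold  # True = frame marked as speech

rng = np.random.default_rng(7)
sr = 16000
noise = 0.01 * rng.normal(size=sr)                          # 1 s of background noise
tone = 0.5 * np.sin(2 * np.pi * 220 * np.arange(sr) / sr)   # 1 s placeholder "speech"
signal = np.concatenate([noise, tone + 0.01 * rng.normal(size=sr)])

decisions = energy_vad(signal, sr)
print("fraction of frames marked as speech:", decisions.mean())
```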