MAXOUT BASED DEEP NEURAL NETWORKS FOR ARABIC PHONEMES RECOGNITION

Performance Evaluation of Deep Convolutional Maxout Neural Network in Speech Recognition

25th National and 3rd International Iranian Conference on Biomedical Engineering (ICBME), 2018

In this paper, various structures and methods of Deep Artificial Neural Networks (DNNs) are evaluated and compared for the purpose of continuous Persian speech recognition. Among the first neural network models used in speech recognition applications were fully connected Neural Networks (FCNNs) and, subsequently, Deep Neural Networks (DNNs). Although these models perform better than GMM/HMM models, they do not have a structure suited to modeling local speech information. The Convolutional Neural Network (CNN) is a good option for modeling the local structure of biological signals, including speech signals. Another issue that Deep Artificial Neural Networks face is the convergence of networks on training data. The main inhibitor of convergence is the presence of local minima in the training process. Deep Neural Network pretraining methods, despite requiring a large amount of computation, are powerful tools for escaping local minima, but the use of appropriate neuronal models in the network structure appears to be a better solution to this problem. The Rectified Linear Unit (ReLU) and Maxout models are the most suitable neuronal models presented to date. Several experiments were carried out to evaluate the performance of the methods and structures mentioned. After verifying the proper functioning of these methods, a combination of all models was implemented on the FARSDAT speech database for continuous speech recognition. The results show that the combined model (CMDNN) improves the performance of ANNs in speech recognition over pre-trained fully connected NNs with sigmoid neurons by about 3%.
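The Maxout unit mentioned above takes the maximum over several learned affine projections of its input, giving a piecewise-linear convex activation instead of a fixed nonlinearity. A minimal NumPy sketch (not the paper's implementation; shapes and values are illustrative):

```python
import numpy as np

def maxout(x, W, b):
    """Maxout activation: each output unit is the max over k affine pieces.
    W has shape (k, d_out, d_in), b has shape (k, d_out)."""
    z = np.einsum("koi,i->ko", W, x) + b  # (k, d_out): one affine map per piece
    return z.max(axis=0)                  # elementwise max over the k pieces

# Tiny demo: 2 pieces, 1 output unit, 2 inputs; these weights realize |x[0]|
W = np.array([[[1.0, 0.0]], [[-1.0, 0.0]]])  # shape (2, 1, 2)
b = np.zeros((2, 1))
print(maxout(np.array([3.0, 5.0]), W, b))   # [3.]
print(maxout(np.array([-3.0, 5.0]), W, b))  # [3.]
```

Because ReLU is the special case `max(Wx + b, 0)` with one piece fixed at zero, Maxout subsumes it, which is one reason the two are grouped together as convergence-friendly neuron models.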

Deep neural network acoustic models for multi-dialect Arabic speech recognition

2015

Speech is a desirable communication method between humans and computers. The major concerns of automatic speech recognition (ASR) are determining a set of classification features and finding a suitable recognition model for those features. Hidden Markov Models (HMMs) have been demonstrated to be powerful models for representing time-varying signals. Artificial Neural Networks (ANNs) have also been widely used for representing time-varying quasi-stationary signals. Arabic is one of the oldest living languages and one of the oldest Semitic languages in the world; it is also the fifth most widely used language and is the mother tongue of roughly 200 million people. Arabic speech recognition has been a fertile area of research over the previous two decades, as attested by the various papers that have been published on this subject. This thesis investigates phoneme and acoustic models based on Deep Neural Networks (DNNs) and Deep Echo State Networks for multi-dialect Arabic Speec...

Recognition System for Nasal, Lateral and Trill Arabic Phonemes using Neural Networks

In this paper, we monitored and analyzed the performance of multi-layer feed-forward with back-propagation (MLFFBP) and cascade-forward (CF) networks on our phoneme recognition system for Standard Arabic (SA). This study focused on Malaysian children as test subjects and on four chosen phonemes from SA, exhibiting nasal, lateral and trill behaviors, tabulated at four different articulation places. The highest training recognition rates for the multi-layer and cascade-layer networks are 98.8% and 95.2% respectively, while the highest testing recognition rate achieved for both networks is 92.9%. k-fold cross validation was used to evaluate system performance. The selected network is a cascade-layer network with 40 and 10 hidden neurons in the first and second hidden layers respectively. The chosen network was used in the developed GUI system for user feedback.

Convolutional Neural Network for Arabic Speech Recognition

The Egyptian Journal of Language Engineering

This work focuses on single-word Arabic automatic speech recognition (AASR). Two techniques are used during the feature extraction phase: Mel-frequency spectral coefficients (MFSC) and Gammatone-frequency cepstral coefficients (GFCC), with their first- and second-order derivatives. The convolutional neural network (CNN) is mainly used to perform feature learning and classification. CNNs have achieved performance improvements in automatic speech recognition (ASR); local connectivity, weight sharing, and pooling are the crucial properties of CNNs with the potential to improve ASR. We tested the CNN model using an Arabic speech corpus of isolated words. The corpus is synthetically augmented by applying different transformations such as changing the pitch, the speed, and the dynamic range, adding noise, and shifting forward and backward in time. It was found that the maximum accuracy obtained when using GFCC with CNN is 99.77%. The results of this work are compared to previous reports and indicate that CNN achieves better performance in AASR.
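The augmentation transformations listed above (adding noise, shifting in time, changing speed) are standard signal-level operations on the raw waveform. A rough NumPy sketch, with parameter choices that are purely illustrative rather than the authors' settings:

```python
import numpy as np

def add_noise(x, snr_db, rng):
    """Add white Gaussian noise at a target signal-to-noise ratio in dB."""
    sig_power = np.mean(x ** 2)
    noise_power = sig_power / (10 ** (snr_db / 10))
    return x + rng.normal(0.0, np.sqrt(noise_power), size=x.shape)

def time_shift(x, n):
    """Shift forward (n > 0) or backward (n < 0) in time, zero-padding the gap."""
    y = np.zeros_like(x)
    if n >= 0:
        y[n:] = x[:len(x) - n]
    else:
        y[:n] = x[-n:]
    return y

def change_speed(x, factor):
    """Resample by linear interpolation; factor > 1 shortens (speeds up) the signal."""
    n_out = int(round(len(x) / factor))
    t = np.linspace(0, len(x) - 1, n_out)
    return np.interp(t, np.arange(len(x)), x)

rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 10, 1000))
noisy, shifted, fast = add_noise(x, 20, rng), time_shift(x, 5), change_speed(x, 2.0)
```

Pitch shifting without changing duration needs a spectral method (e.g. a phase vocoder) and is omitted here for brevity.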

Distinctive Phonetic Features Modeling and Extraction using Deep Neural Networks

IEEE Access

Feature extraction is a critical stage of digital speech processing systems. The quality of features is of great importance, as it provides the solid foundation upon which the subsequent stages stand. Distinctive Phonetic Features (DPFs) are among the most representative features of speech signals. The significance of DPFs lies in their ability to provide an abstract description of the places and manners of articulation of a language's phonemes. A phoneme's DPF element reflects unique articulatory information about that phoneme. Therefore, there is a need to discover and investigate each DPF element individually in order to achieve deeper understanding and to come up with a descriptive model for each one. Such fine-grained modeling satisfies the uniqueness of each DPF element. In this paper, the problem of DPF modeling and extraction for Modern Standard Arabic is tackled. Due to the remarkable success of Deep Neural Networks (DNNs) initialized using Deep Belief Networks (DBNs) in DSP applications and their capability of extracting highly representative features from raw data, we exploit their modeling power to investigate and model DPF elements. DNN models are compared to classical Multilayer Perceptron (MLP) models. The representativeness of several acoustic cues for different DPF elements was also measured. This work formalizes the DPF modeling problem as a binary classification problem. Because DPF elements are highly imbalanced data, evaluating the quality of models is a very tricky process. The paper addresses the proper evaluation measures that satisfy the imbalanced nature of DPF elements. After modeling each element individually, two top-level DPF extractors are designed: an MLP-based and a DNN-based extractor. Results show the quality of the DNN models and their superiority over MLPs, with accuracies of 89.0% and 86.7%, respectively.
INDEX TERMS Modern Standard Arabic, distinctive phonetic features, speech processing, deep belief networks, restricted Boltzmann machine.
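Because each DPF element is a heavily skewed binary label, raw accuracy is misleading: a majority-class predictor already scores high. A sketch of the kind of imbalance-aware measures the abstract alludes to (the exact measures used in the paper may differ):

```python
import numpy as np

def imbalance_metrics(y_true, y_pred):
    """Precision, recall, F1 and balanced accuracy for binary labels;
    these remain informative when the positive class is rare."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return {"precision": precision, "recall": recall, "f1": f1,
            "balanced_accuracy": (recall + specificity) / 2}

# A majority-class predictor on a 10%-positive set: 90% accuracy, zero recall
m = imbalance_metrics([1, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0] * 10)
```

The balanced accuracy of 0.5 in this demo exposes the predictor as no better than chance, which plain accuracy hides.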

Arabic phoneme recognition using neural networks

Proceedings of the 5th WSEAS …, 2006

The main theme of this paper is the recognition of isolated Arabic speech phonemes using artificial neural networks, as most research on speech recognition (SR) is based on Hidden Markov Models (HMMs). The technique in this paper can be divided into three major steps. First, preprocessing, in which the original speech is transformed into digital form; two preprocessing methods have been applied, FIR filtering and normalization. Second, the global features of the Arabic speech phonemes are extracted using cepstral coefficients, with a frame size of 512 samples, an overlap of 170 samples, and a Hamming window. Finally, Arabic speech phonemes are recognized using a supervised learning method and a Multi-Layer Perceptron (MLP) neural network based on feed-forward backpropagation. The proposed system achieved a recognition rate of 96.3% for most of the 34 phonemes. The database used in this paper is KAPD (King AbdulAziz Phonetics Database), and the algorithms were written in MATLAB.
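The stated analysis settings (512-sample frames, 170-sample overlap, Hamming window, cepstral coefficients) can be sketched as a generic real-cepstrum pipeline; this is an illustrative NumPy version, not the authors' MATLAB code:

```python
import numpy as np

def frame_signal(x, frame_len=512, overlap=170):
    """Split a signal into overlapping frames and apply a Hamming window
    (frame size and overlap follow the paper's stated settings)."""
    hop = frame_len - overlap
    n_frames = 1 + (len(x) - frame_len) // hop
    window = np.hamming(frame_len)
    return np.stack([x[i * hop: i * hop + frame_len] * window
                     for i in range(n_frames)])

def real_cepstrum(frame, n_coeffs=13):
    """Real cepstrum of one windowed frame: IFFT of the log magnitude
    spectrum, keeping the first few coefficients as features."""
    spectrum = np.abs(np.fft.rfft(frame)) + 1e-10  # epsilon avoids log(0)
    return np.fft.irfft(np.log(spectrum))[:n_coeffs]

x = np.random.default_rng(0).normal(size=2048)
frames = frame_signal(x)
feats = np.stack([real_cepstrum(f) for f in frames])
```

The choice of 13 coefficients per frame is an assumption for the sketch; the paper does not state how many cepstral coefficients were kept.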

Arabic Speech Classification Method Based on Padding and Deep Learning Neural Network

Baghdad Science Journal, 2021

Deep learning convolutional neural networks have been widely used to recognize or classify voice. Various techniques have been used together with convolutional neural networks to prepare voice data before the training process when developing the classification model. However, not all models can produce good classification accuracy, as there are many types of voice and speech. Classification of Arabic alphabet pronunciation is one such type, and accurate pronunciation is required when learning Qur’an reading. Thus, processing the pronunciations and training on the processed data require a specific approach. To overcome this issue, a method based on padding and a deep learning convolutional neural network is proposed to evaluate the pronunciation of the Arabic alphabet. Voice data from six school children were recorded and used to test the performance of the proposed method. The padding technique has been used to augment the voice data before feeding the data to the C...
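Padding variable-length recordings to a common length, as the abstract describes, can be as simple as the sketch below; the target length and the truncate-if-longer policy are assumptions, since the abstract does not specify them:

```python
import numpy as np

def pad_to_length(signals, target_len):
    """Zero-pad each 1-D recording to a fixed length so every training
    example has the same shape before it is fed to the CNN; recordings
    longer than target_len are truncated."""
    out = np.zeros((len(signals), target_len))
    for i, s in enumerate(signals):
        n = min(len(s), target_len)
        out[i, :n] = s[:n]
    return out

batch = pad_to_length([np.ones(3), np.ones(7)], 5)
```

Fixed-shape input is what lets a standard CNN, whose dense layers expect a constant input size, train on utterances of differing durations.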

CASCADE-FORWARD NEURAL NETWORKS FOR ARABIC PHONEMES BASED ON K-FOLD CROSS VALIDATION

In this paper, we monitored and analyzed the performance of cascade-forward (CF) networks on our phoneme recognition system for Standard Arabic (SA). This study focused on Malaysian children as test subjects and on four chosen phonemes from SA, exhibiting nasal, lateral and trill behaviors, tabulated at four different articulation places. K-fold cross validation was used to evaluate each network architecture k times, improving the reliability of the choice of the optimal architecture. Based on the k-fold cross validation method, namely 10-fold cross validation, the most suitable cascade-layer network architecture has 50 and 30 nodes in the first and second hidden layers respectively, with an MSE of less than 0.06. The training and testing recognition rates achieved were 91.6% and 89.3% respectively.
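The 10-fold cross validation used for architecture selection amounts to splitting the sample indices into k disjoint folds and holding each fold out once. A minimal sketch; the shuffling and seed are illustrative choices, not details from the paper:

```python
import numpy as np

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross validation.
    Each sample appears in exactly one test fold."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test

splits = list(k_fold_indices(100, k=10))
```

Averaging a network architecture's error over the k held-out folds is what makes the comparison between candidate layer sizes (e.g. 50/30 nodes) more reliable than a single train/test split.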

An Approach for Pronunciation Classification of Classical Arabic Phonemes Using Deep Learning

Applied Sciences

A mispronunciation of Arabic short vowels can change the meaning of a complete sentence. For this reason, both students and teachers of Classical Arabic (CA) require extra practice in correcting students’ pronunciation of Arabic short vowels, which makes the teaching and learning task cumbersome for both parties. An intelligent process of student evaluation can make learning and teaching easier for both students and teachers. Given that online learning has become the norm these days, modern learning requires assessment by virtual teachers. In our case, the task is to recognize the exact pronunciation of Arabic alphabets according to the standards. A major challenge in recognizing the precise pronunciation of Arabic alphabets is the correct identification of a large number of short vowels, which cannot be dealt with using traditional statistical audio processing techniques and machine learning models. Therefore, we developed a model that classifies Arabic short vowel...

Arabic phonetic features recognition using modular connectionist architectures

Proceedings 1998 IEEE 4th Workshop Interactive Voice Technology for Telecommunications Applications. IVTTA '98 (Cat. No.98TH8376), 1998

This paper proposes an approach for reliably identifying complex Arabic phonemes in continuous speech using a mixture of artificial neural experts. These experts are time delay neural networks using an original version of the autoregressive backpropagation algorithm (AR-TDNN). A module using specific cues generated by an ear model performs the speech phone segmentation. Perceptual linear predictive (PLP) coefficients, energy, zero crossing rate and their derivatives are used as input parameters. Serial and parallel architectures of AR-TDNN have been implemented and compared against a monolithic system using the simple backpropagation algorithm.
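Two of the listed input parameters, short-time energy and zero crossing rate, are straightforward frame-level computations. A minimal sketch with assumed frame settings (the paper does not state its frame length or hop):

```python
import numpy as np

def frame_features(x, frame_len=256, hop=128):
    """Short-time energy and zero-crossing rate per frame.
    ZCR is the fraction of adjacent sample pairs that change sign."""
    n_frames = 1 + (len(x) - frame_len) // hop
    energy = np.empty(n_frames)
    zcr = np.empty(n_frames)
    for i in range(n_frames):
        frame = x[i * hop: i * hop + frame_len]
        energy[i] = np.sum(frame ** 2)
        zcr[i] = np.mean(np.abs(np.diff(np.sign(frame)))) / 2
    return energy, zcr

# Sanity check: a signal alternating +1/-1 crosses zero at every sample
x = np.tile([1.0, -1.0], 512)
energy, zcr = frame_features(x)
```

Energy helps separate voiced from unvoiced segments, while ZCR is high for noisy/fricative sounds, which is why both commonly accompany PLP coefficients as segmentation cues.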