Arabic phonetic features recognition using modular connectionist architectures

Recognition System for Nasal, Lateral and Trill Arabic Phonemes using Neural Networks

In this paper, we monitored and analyzed the performance of multi-layer feed-forward with back-propagation (MLFFBP) and cascade-forward (CF) networks on our phoneme recognition system for Standard Arabic (SA). This study used Malaysian children as test subjects and focused on four chosen SA phonemes exhibiting nasal, lateral and trill behaviors, articulated at four different places of articulation. The highest training recognition rates for the multi-layer and cascade-layer networks are 98.8 % and 95.2 % respectively, while the highest testing recognition rate achieved by both networks is 92.9 %. k-fold cross validation was used to evaluate system performance. The selected network is a cascade-layer network with 40 and 10 hidden neurons in the first and second hidden layers respectively. The chosen network was used in the developed GUI system for user feedback.
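The selected cascade-layer architecture (40 and 10 hidden neurons) can be sketched as follows. This is a minimal Python illustration, not the authors' MATLAB code; the tanh activations and random weights are assumptions for shape-checking only. The defining property of a cascade-forward network is that every layer also receives the raw input features alongside earlier layer outputs.

```python
import math
import random

random.seed(0)

def layer(inputs, n_out):
    """One randomly initialised fully connected layer with tanh activation."""
    weights = [[random.uniform(-0.5, 0.5) for _ in inputs] for _ in range(n_out)]
    return [math.tanh(sum(w * x for w, x in zip(row, inputs))) for row in weights]

def cascade_forward(features):
    """Cascade-forward pass: unlike a plain feed-forward net, every layer
    also sees the raw input features, not just the previous layer's output."""
    h1 = layer(features, 40)             # first hidden layer: 40 neurons
    h2 = layer(features + h1, 10)        # second hidden layer: 10 neurons, sees input too
    return layer(features + h1 + h2, 4)  # 4 outputs, one per target phoneme

scores = cascade_forward([0.1] * 13)     # e.g. a 13-dimensional feature frame
print(len(scores))                       # → 4
```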

Arabic phoneme recognition using neural networks

Proceedings of the 5th WSEAS …, 2006

The main theme of this paper is the recognition of isolated Arabic speech phonemes using artificial neural networks, as most research on speech recognition (SR) is based on Hidden Markov Models (HMM). The technique in this paper can be divided into three major steps. First, preprocessing, in which the original speech is transformed into digital form; two preprocessing methods are applied, an FIR filter and normalization. Second, the global features of each Arabic speech phoneme are extracted using cepstral coefficients, with a frame size of 512 samples, an overlap of 170 samples, and a Hamming window. Finally, the Arabic speech phonemes are recognized using a supervised learning method and a Multi-Layer Perceptron (MLP) neural network based on feed-forward backpropagation. The proposed system achieved a recognition rate of about 96.3% for most of the 34 phonemes. The database used in this paper is KAPD (King AbdulAziz Phonetics Database), and the algorithms were written in MATLAB.
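The framing step described above (512-sample frames, 170-sample overlap, Hamming window) can be sketched as follows. This is an illustrative Python version, not the authors' MATLAB implementation; the test signal is an arbitrary sine.

```python
import math

def frames(signal, size=512, overlap=170):
    """Split a signal into overlapping frames and apply a Hamming window,
    matching the stated 512-sample frames with a 170-sample overlap."""
    step = size - overlap  # 342-sample hop between frame starts
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (size - 1))
              for n in range(size)]
    out = []
    for start in range(0, len(signal) - size + 1, step):
        frame = signal[start:start + size]
        out.append([s * w for s, w in zip(frame, window)])
    return out

signal = [math.sin(0.01 * n) for n in range(2048)]
windowed = frames(signal)
print(len(windowed))  # → 5 complete frames from a 2048-sample signal
```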

CASCADE-FORWARD NEURAL NETWORKS FOR ARABIC PHONEMES BASED ON K-FOLD CROSS VALIDATION

In this paper, we monitored and analyzed the performance of cascade-forward (CF) networks on our phoneme recognition system for Standard Arabic (SA). This study used Malaysian children as test subjects and focused on four chosen SA phonemes exhibiting nasal, lateral and trill behaviors, articulated at four different places of articulation. k-fold cross validation evaluates each network architecture k times to improve the reliability of the choice of the optimal architecture. Based on the k-fold cross validation method, namely 10-fold cross validation, the most suitable cascade-layer network architecture has 50 and 30 nodes in the first and second hidden layers respectively, with an MSE below 0.06. The training and testing recognition rates achieved were 91.6 % and 89.3 % respectively.
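The 10-fold cross validation used to pick the architecture can be sketched as follows. This is a generic Python illustration, assuming for simplicity that the sample count divides evenly into the folds: each fold is held out for testing once while the remaining folds train the candidate network.

```python
def k_fold_splits(n_samples, k=10):
    """Yield (train_idx, test_idx) index pairs for k-fold cross validation."""
    indices = list(range(n_samples))
    fold_size = n_samples // k
    for f in range(k):
        test = indices[f * fold_size:(f + 1) * fold_size]   # held-out fold
        train = indices[:f * fold_size] + indices[(f + 1) * fold_size:]
        yield train, test

splits = list(k_fold_splits(100, k=10))
print(len(splits))  # → 10 train/test splits; each network is evaluated k times
```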

MAXOUT BASED DEEP NEURAL NETWORKS FOR ARABIC PHONEMES RECOGNITION

Abstract—Arabic is widely spoken by the Malay people due to several factors, such as performing worship and reciting the Holy book of Muslims. Recently, Maxout deep neural networks have brought substantial improvements to speech recognition systems. Hence, in this paper, a fully connected feed-forward neural network with Maxout units is introduced. The proposed deep neural network involves three hidden layers with 500 Maxout units of 2 neurons each, along with Mel-Frequency Cepstral Coefficients (MFCC) for feature extraction from the phoneme waveforms. Further, the deep neural network is trained and tested on a corpus of consonant Arabic phonemes recorded from 20 Malay speakers. Each person was required to pronounce the twenty-eight consonant phonemes, with three chances given to each subject to articulate all the letters; in each chance, all the letters were captured in one continuous recording. The recording process was accomplished using a SAMSON C03U USB multi-pattern condenser microphone. Here, the data are divided into five waveforms for training the proposed Maxout network and fifteen waveforms for testing. Experimentally, the proposed Dropout function for training has shown considerable performance gains over the Sigmoid and Rectified Linear Unit (ReLU) functions. Eventually, the tested Maxout network has shown considerable outcomes compared to a Restricted Boltzmann Machine (RBM), Deep Belief Network (DBN), Convolutional Neural Network (CNN), conventional feed-forward neural network (NN) and Convolutional Auto-Encoder (CAE).
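A Maxout unit with 2 neurons, as used above, simply evaluates 2 affine functions of its input and keeps the larger activation. A minimal Python sketch, with illustrative weights rather than trained ones:

```python
def maxout(inputs, weights, biases):
    """A Maxout unit: compute each linear piece w·x + b and return the max.
    With 2 pieces per unit, as in the paper, each unit holds two affine
    'neurons' and outputs whichever activation is larger."""
    return max(sum(w * x for w, x in zip(ws, inputs)) + b
               for ws, b in zip(weights, biases))

# One unit with 2 pieces over a 3-dimensional input (weights are illustrative).
w = [[1.0, 0.0, -1.0], [-1.0, 2.0, 0.5]]
b = [0.0, 0.1]
y = maxout([0.2, 0.4, 0.1], w, b)
print(y)  # → 0.75 (second piece wins: -0.2 + 0.8 + 0.05 + 0.1)
```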

Phoneme recognition with a time-delay neural network

IJCNN-1990: International Joint Conference on Neural Networks, San Diego, 1990

A Time-Delay Neural Network architecture is used for speaker-dependent recognition of the long vowel sounds a, e and i. This work is similar to the work in [1,2,3,4], but differs in the following areas: 1) increasing the amount of data supplied to the network, 2) allowing longer and variable-length utterances, and 3) using English rather than Japanese speech. With this architecture, we have been able to obtain 100% recognition for a population of six speakers.
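The time-delay idea underlying the TDNN can be sketched as follows: each input position concatenates the current frame with delayed copies, so the first layer sees a short window of spectral history. This is a generic Python illustration; the delay set and frame dimensionality below are assumptions, not taken from the paper.

```python
def delay_context(frames, delays=(0, 1, 2)):
    """Build time-delay inputs: at each position t, concatenate frame t
    with its delayed copies, giving the network local temporal context."""
    out = []
    for t in range(max(delays), len(frames)):
        ctx = []
        for d in delays:
            ctx.extend(frames[t - d])  # append the frame delayed by d steps
        out.append(ctx)
    return out

frames = [[float(t)] * 16 for t in range(10)]  # 10 frames of 16 coefficients
ctx = delay_context(frames)
print(len(ctx), len(ctx[0]))  # → 8 positions, each 48-dimensional (3 × 16)
```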

Classification of the Arabic Emphatic Consonants using Time Delay Neural Network

International Journal of Computer Applications (0975–8887), Volume 80, No. 10, October 2013

This study concerns the use of Artificial Neural Networks (ANNs) in automatic classification of the emphatic consonants of the Standard Arabic Language (SAL). It reinforces the few works directed towards speech recognition in Standard Arabic. We have applied the Time Delay Neural Network (TDNN) approach, which permits classification of the phonemes by taking into account the dynamic aspect of speech, and consequently overcoming problems of the coarticulation phenomenon. We have conducted a supervised training method based on Bayesian Regularization (BR) backpropagation coupled with the Levenberg-Marquardt (LM) optimization algorithm, to adjust the synaptic weights in order to minimize the error between the computed output and the desired output for all samples. Based on the results, the proposed neural network provides a high recognition accuracy for the emphatic phonemes (92.25%). This choice of study is quite important: efficient phoneme classifiers lead to efficient word classifiers, and the ability to recognize phonemes accurately provides the basis for accurate recognition of words and continuous speech in the future.

RECOGNITION OF ARABIC PHONETIC FEATURES USING NEURAL NETWORKS AND KNOWLEDGE-BASED SYSTEM: A COMPARATIVE STUDY

International Journal on Artificial Intelligence Tools, 1999

This paper deals with a new indicative-features recognition system for Arabic which uses a set of simplified sub-neural-networks (SNN). For the analysis of speech, the perceptual linear predictive (PLP) technique is used. The ability of the system has been tested in experiments using stimuli uttered by 6 native Algerian speakers. The identification results have been compared with those obtained by the SARPH knowledge-based system. Our interest lies in the particularities of Arabic, such as geminate and emphatic consonants and duration. The results show that the SNN performed well in pure identification, while in the case of phonological duration the knowledge-based system performs better.

Arabic phonemes recognition system based on Malay speakers using neural network

The Arabic language is used by both native and non-native speakers, because Arabic is the language of the holy book of Muslims. In this paper, an Arabic phoneme recognition system for Malay speakers is proposed. This system consists of three main stages. The first stage is noise reduction; it aims to enhance the phoneme signals by excluding the unvoiced portions and keeping only the voiced signal, and a Wiener filter is adapted to accomplish this task. The second stage is based on the Mel-Frequency Cepstral Coefficients method, which extracts a vector of features to represent each phoneme signal. Finally, a pattern recognition neural network is designed as the recognizer. The proposed system produces sufficient outcomes with 20 hidden neurons.
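The Wiener filtering stage can be sketched via the standard per-frequency-bin gain H = S / (S + N), which attenuates bins where the estimated noise power dominates. This is a generic Python illustration of that gain rule, not the paper's exact implementation, and the power values below are made-up examples.

```python
def wiener_gain(signal_power, noise_power):
    """Per-bin Wiener gain H = S / (S + N): near 1 where clean-signal power
    dominates, near 0 where the noise estimate dominates."""
    return [s / (s + n) if (s + n) > 0 else 0.0
            for s, n in zip(signal_power, noise_power)]

# Three illustrative bins: strong signal, equal power, pure noise.
g = wiener_gain([4.0, 1.0, 0.0], [1.0, 1.0, 1.0])
print(g)  # → [0.8, 0.5, 0.0]
```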

Consonant Recognition by Modular Construction of Large Phonemic Time-Delay Neural Networks

In this paper we show that neural networks for speech recognition can be constructed in a modular fashion by exploiting the hidden structure of previously trained phonetic subcategory networks. The performance of the resulting larger phonetic nets was found to be as good as the performance of the subcomponent nets by themselves. This approach avoids the excessive learning times that would be necessary to train larger networks and allows for incremental learning. Large time-delay neural networks constructed incrementally by applying these modular training techniques achieved a recognition performance of 96.0% for all consonants and 94.7% for all phonemes.

1. Introduction

Recently we have demonstrated that connectionist architectures capable of capturing some critical aspects of the dynamic nature of speech can achieve superior recognition performance for difficult but small phonemic discrimination tasks such as discrimination of the voiced consonants B, D and G [1,2]. Encouraged by these results we wanted to explore the question of how we might expand on these models to make them useful for the design of speech recognition systems. A problem that emerges as we attempt to apply neural network models to the full speech recognition problem is the problem of scaling. Simply extending neural networks to ever larger structures and retraining them as one monolithic net quickly exceeds the capabilities of the fastest and largest supercomputers. The search complexity of finding good solutions in a huge space of possible network configurations also soon assumes unmanageable proportions. Moreover, having to decide on all possible classes for recognition ahead of time, as well as collecting sufficient data to train such a large monolithic network, is impractical to say the least.
In an effort to extend our models from small recognition tasks to large scale speech recognition systems, we must therefore explore modularity and incremental learning as design strategies to break up a large learning task into smaller subtasks. Breaking up a large task into subtasks to be tackled by individual black boxes interconnected in ad hoc arrangements, on the other hand, would mean abandoning one of the most attractive aspects of connectionism: the ability to perform complex constraint satisfaction in a massively parallel and interconnected fashion, in view of an overall optimal performance goal. In this paper we demonstrate, based on a set of experiments aimed at phoneme recognition, that it is indeed possible to construct large neural networks incrementally by exploiting the hidden structure of smaller pretrained subcomponent networks.
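The modular construction strategy described here can be sketched as follows: frozen, pretrained subcategory nets are run side by side on the same input, and only a small combining layer on top is trained. This is a minimal Python illustration; the subnet sizes and all weights are hypothetical placeholders, not values from the paper.

```python
import math

def subnet(weights, inputs):
    """A small pretrained sub-network: one frozen linear layer + tanh."""
    return [math.tanh(sum(w * x for w, x in zip(row, inputs)))
            for row in weights]

def modular_net(subnets, glue_weights, inputs):
    """Modular construction: run each frozen subcategory net, concatenate
    their activations, and apply the (trainable) combining layer — avoiding
    retraining one monolithic network from scratch."""
    hidden = []
    for w in subnets:
        hidden.extend(subnet(w, inputs))   # frozen subcomponent outputs
    return [sum(g * h for g, h in zip(row, hidden)) for row in glue_weights]

bdg_net = [[0.5, -0.5], [0.2, 0.1]]   # e.g. a pretrained B/D/G discriminator
ptk_net = [[-0.3, 0.4], [0.6, -0.2]]  # e.g. a pretrained P/T/K discriminator
glue = [[0.25] * 4, [-0.25] * 4]      # only these weights would be trained
y = modular_net([bdg_net, ptk_net], glue, [1.0, 1.0])
print(len(y))  # → 2 combined outputs
```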