Distinctive Phonetic Features Modeling and Extraction using Deep Neural Networks

Distinctive Phonetic Feature (DPF) Extraction Based on MLNs and Inhibition/Enhancement Network

IEICE TRANSACTIONS on …, 2009

This paper describes a distinctive phonetic feature (DPF) extraction method with low computation cost for use in a phoneme recognition system. The method comprises three stages. The first stage uses two multilayer neural networks (MLNs): MLN_LF-DPF, which maps continuous acoustic features, or local features (LFs), onto discrete DPF features, and MLN_Dyn, which constrains the DPF context at phoneme boundaries. The second stage incorporates inhibition/enhancement (In/En) functionalities to discriminate whether the dynamic patterns of the DPF trajectories are convex or concave: convex patterns are enhanced and concave patterns are inhibited. The third stage decorrelates the DPF vectors using the Gram-Schmidt orthogonalization procedure before feeding them into a hidden Markov model (HMM)-based classifier. In an experiment on Japanese Newspaper Article Sentences (JNAS) utterances, the proposed feature extractor, which incorporates two MLNs and an In/En network, was found to provide a higher phoneme correct rate with fewer mixture components in the HMMs.
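The Gram-Schmidt decorrelation step in the third stage can be illustrated with a minimal NumPy sketch (a generic orthogonalization routine, not the paper's implementation; the toy vectors are made up):

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a set of row vectors via classical Gram-Schmidt.

    Each row of `vectors` stands in for one DPF vector; the returned rows
    are mutually orthogonal and unit-length, so their correlations vanish.
    """
    ortho = []
    for v in vectors.astype(float):
        w = v.copy()
        for u in ortho:
            w -= np.dot(w, u) * u  # remove the component along u
        norm = np.linalg.norm(w)
        if norm > 1e-10:           # skip (near-)linearly dependent vectors
            ortho.append(w / norm)
    return np.array(ortho)

# Toy example: two correlated 3-dimensional "DPF" vectors
X = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])
Q = gram_schmidt(X)
print(np.round(Q @ Q.T, 6))  # approximately the 2x2 identity => decorrelated
```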

MAXOUT BASED DEEP NEURAL NETWORKS FOR ARABIC PHONEMES RECOGNITION

Abstract—Arabic is widely spoken by the Malay population for several reasons, such as performing worship and reciting the Holy book of Muslims. Recently, Maxout deep neural networks have brought substantial improvements to speech recognition systems. Hence, in this paper, a fully connected feed-forward neural network with Maxout units is introduced. The proposed deep neural network comprises three hidden layers, with 500 Maxout units and 2 neurons per unit, and uses Mel-Frequency Cepstral Coefficients (MFCC) as features extracted from the phoneme waveforms. The network is trained and tested on a corpus of consonant Arabic phonemes recorded from 20 Malay speakers. Each speaker was asked to pronounce the twenty-eight consonant phonemes, with three attempts given to articulate all the letters; in each attempt, all the letters were recorded continuously. The recording was done using a SAMSON C03U USB multi-pattern condenser microphone. The data are divided into five waveforms for training the proposed Maxout network and fifteen waveforms for testing. Experimentally, training with the proposed Dropout technique has shown considerable performance gains over the Sigmoid and Rectified Linear Unit (ReLU) functions. Finally, testing the Maxout network has shown favorable results compared to the Restricted Boltzmann Machine (RBM), Deep Belief Network (DBN), Convolutional Neural Network (CNN), a conventional feedforward neural network (NN), and a Convolutional Auto-Encoder (CAE).
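A Maxout unit, as used above, takes the maximum over a small set of affine projections of its input (two "neurons" per unit in this paper). A minimal NumPy sketch of one such layer follows; the layer sizes are illustrative and the weights are random rather than trained:

```python
import numpy as np

rng = np.random.default_rng(0)

def maxout_layer(x, W, b):
    """Maxout layer: W has shape (pieces, in_dim, out_units), b (pieces, out_units).

    Each output unit is the elementwise max over `pieces` linear pieces,
    giving a learned piecewise-linear, convex activation.
    """
    z = np.einsum('pio,i->po', W, x) + b  # one affine projection per piece
    return z.max(axis=0)                  # max over the pieces (2 neurons here)

in_dim, out_units, pieces = 13, 500, 2    # e.g. 13 MFCCs in, 500 Maxout units
W = rng.normal(scale=0.1, size=(pieces, in_dim, out_units))
b = np.zeros((pieces, out_units))
x = rng.normal(size=in_dim)
h = maxout_layer(x, W, b)
print(h.shape)  # (500,)
```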

Deep neural network acoustic models for multi-dialect Arabic speech recognition

2015

Speech is a desirable communication method between humans and computers. The major concerns of automatic speech recognition (ASR) are determining a set of classification features and finding a suitable recognition model for those features. Hidden Markov Models (HMMs) have been demonstrated to be powerful models for representing time-varying signals. Artificial Neural Networks (ANNs) have also been widely used for representing time-varying quasi-stationary signals. Arabic is one of the oldest living languages and one of the oldest Semitic languages in the world; it is also the fifth most widely used language and the mother tongue of roughly 200 million people. Arabic speech recognition has been a fertile area of research over the previous two decades, as attested by the various papers that have been published on the subject. This thesis investigates phoneme and acoustic models based on Deep Neural Networks (DNN) and Deep Echo State Networks for multi-dialect Arabic Speec...

Arabic Speech Classification Method Based on Padding and Deep Learning Neural Network

Baghdad Science Journal, 2021

Deep learning convolutional neural networks have been widely used to recognize or classify voice. Various techniques have been used together with convolutional neural networks to prepare voice data before the training process when developing a classification model. However, not all models can produce good classification accuracy, as there are many types of voice or speech. Classification of Arabic alphabet pronunciation is one such type of voice data, and accurate pronunciation is required in learning Qur’an reading. Thus, processing the pronunciation and training on the processed data require a specific approach. To overcome this issue, a method based on padding and a deep learning convolutional neural network is proposed to evaluate the pronunciation of the Arabic alphabet. Voice data from six school children were recorded and used to test the performance of the proposed method. The padding technique was used to augment the voice data before feeding the data to the C...
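Padding variable-length recordings to a common length, as described above, can be sketched as follows (a generic zero-padding illustration; the paper's exact padding scheme may differ):

```python
import numpy as np

def pad_waveforms(waves, target_len=None):
    """Right-pad each 1-D waveform with zeros so all share one length.

    CNN classifiers need fixed-size inputs, so shorter utterances are
    padded up to the longest one (or to an explicit target_len).
    """
    if target_len is None:
        target_len = max(len(w) for w in waves)
    return np.stack([np.pad(w, (0, target_len - len(w))) for w in waves])

# Three utterances of different lengths become one (3, 5) batch
batch = pad_waveforms([np.ones(3), np.ones(5), np.ones(2)])
print(batch.shape)  # (3, 5)
```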

Speech Recognition using Deep Neural Network (DNN) and Deep Belief Network (DBN)

International Journal for Research in Applied Science and Engineering Technology IJRASET, 2020

Continuous automatic speech recognition, the translation of spoken words into text, is a difficult task due to the high variability in speech signals. In recent years speech recognition has reached new heights of success; however, it still has a few limitations to overcome. Deep learning, also known as representation learning and sometimes referred to as unsupervised feature learning, is a subset of machine learning. Deep learning is becoming a conventional technology for speech recognition and has efficiently replaced Gaussian mixtures for speech recognition on a global scale. The predominant goal of this undertaking is to apply deep learning algorithms, including Deep Neural Networks (DNN) and Deep Belief Networks (DBN), to automatic continuous speech recognition. Keywords: Gaussian Mixture Model (GMM), Hidden Markov Models (HMMs), Deep Neural Networks (DNN), Deep Belief Networks (DBN).

An Approach for Pronunciation Classification of Classical Arabic Phonemes Using Deep Learning

Applied Sciences

A mispronunciation of Arabic short vowels can change the meaning of a complete sentence. For this reason, both students and teachers of Classical Arabic (CA) require extra practice to correct students’ pronunciation of Arabic short vowels, which makes teaching and learning cumbersome for both parties. An intelligent process for evaluating students can make learning and teaching easier for both students and teachers. Given that online learning has become the norm these days, modern learning requires assessment by virtual teachers. In our case, the task is to recognize the exact pronunciation of the Arabic alphabet according to the standards. A major challenge in recognizing the precise pronunciation of the Arabic alphabet is the correct identification of a large number of short vowels, which cannot be handled using traditional statistical audio processing techniques and machine learning models. Therefore, we developed a model that classifies Arabic short vowel...

Deep belief networks for phone recognition

NIPS Workshop on Deep Learning for …, 2009

Hidden Markov Models (HMMs) have been the state-of-the-art techniques for acoustic modeling despite their unrealistic independence assumptions and the very limited representational capacity of their hidden states. There are many proposals in the research community for deeper models that are capable of modeling the many types of variability present in the speech generation process. Deep Belief Networks (DBNs) have recently proved to be very effective for a variety of machine learning problems and this paper applies DBNs to acoustic modeling. On the standard TIMIT corpus, DBNs consistently outperform other techniques and the best DBN achieves a phone error rate (PER) of 23.0% on the TIMIT core test set.
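The phone error rate (PER) reported here is the Levenshtein (edit) distance between the recognized and reference phone sequences, divided by the reference length. A small self-contained sketch (the phone strings below are made-up examples, not TIMIT data):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phone sequences (lists of labels),
    computed with a single rolling DP row."""
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1,         # deletion
                                   d[j - 1] + 1,     # insertion
                                   prev + (r != h))  # substitution / match
    return d[-1]

def phone_error_rate(ref, hyp):
    return edit_distance(ref, hyp) / len(ref)

# One substitution (ae -> eh) plus one deletion (final sil) over 5 phones
print(phone_error_rate("sil k ae t sil".split(),
                       "sil k eh t".split()))  # 2 / 5 = 0.4
```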

Comparative Study of CNN Structures for Arabic Speech Recognition

Ingénierie des systèmes d information

Speech recognition is an essential ability of human beings and is crucial for communication. Consequently, automatic speech recognition (ASR) is a major area of research that increasingly uses artificial intelligence techniques to replicate this human ability. Among these techniques, deep learning (DL) models attract much attention, in particular convolutional neural networks (CNN), which are known for their power to model spatial relationships. In this article, three CNN architectures that performed well in recognized competitions were implemented to compare their performance in Arabic speech recognition: the well-known models AlexNet, ResNet, and GoogLeNet. These models were compared on a corpus composed of Arabic spoken digits collected from various sources, including messaging and social media applications, in addition to an online corpus. The AlexNet, ResNet, and GoogLeNet architectures achieved accuracies of 86.19%, 83.46%, and 89.61%, respectively. The results show the superiority of GoogLeNet and underline the potential of CNN architectures to model the acoustic features of low-resource languages such as Arabic.

Automatic Speech Attribute Detection of Arabic Language

2018

Recently, speech attribute features have caught the interest of the speech processing community and have been successfully employed in a wide variety of applications. In this paper we introduce the first intensive study of speech attribute detection in the Arabic language. For each speech attribute, namely the manners and places of articulation, a binary Deep Neural Network (DNN) classifier is trained to recognize the presence or absence of the attribute. The DNN consists of multiple fully connected hidden layers and a two-way softmax output layer, and is fed mel-scale filter bank features extracted from the speech signal. We further adopted the dropout regularization technique to alleviate classifier overfitting. The system was tested on a speech corpus of 90 hours collected from Quranic Arabic reciters. The results show that the speech attribute detectors achieved classification accuracies ranging from 76% to 95%.
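Each attribute detector described above ends in a two-way softmax over {present, absent}. A minimal NumPy forward pass of such a binary classifier follows; the layer sizes and filter-bank dimension are illustrative assumptions, and the weights are random rather than trained:

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(x):
    return np.maximum(0.0, x)

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def attribute_detector(fbank, weights):
    """Forward pass of one binary attribute DNN: fully connected hidden
    layers with ReLU, followed by a 2-way softmax output."""
    h = fbank
    for W, b in weights[:-1]:
        h = relu(W @ h + b)
    W, b = weights[-1]
    return softmax(W @ h + b)

# 40 mel filter-bank features -> two hidden layers -> 2-way softmax
dims = [40, 128, 128, 2]
weights = [(rng.normal(scale=0.1, size=(o, i)), np.zeros(o))
           for i, o in zip(dims[:-1], dims[1:])]
p = attribute_detector(rng.normal(size=40), weights)
print(p.shape, round(float(p.sum()), 6))  # (2,) 1.0
```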

Deep Learning Architectures, Algorithms for Speech Recognition: An Overview

International Journal of Advanced Research in Computer Science and Software Engineering, 2017

Speech is the most natural form of human communication, and speech processing has been one of the most inspiring areas of signal processing. Speech recognition could be of great interest in the future. Speech is an easy mode of communication for people interacting with a computer, rather than using a keyboard and mouse. The goal of deep learning is to invent a machine that can sense, remember, learn, and recognize like a real human being. In this paper, we explore the different deep learning architectures and the algorithms applied to train them. Our paper presents a study of different neural network classifiers, such as the Recurrent Neural Network (RNN), the Deep Recurrent Neural Network (DRNN), and the Deep Belief Network (DBN), and the algorithms used to train them.