Formant analysis in dysphonic patients and automatic Arabic digit speech recognition

Automatic Arabic digit speech recognition and formant analysis for voicing disordered people

2011

In this paper, speech from people with voice disorders is analyzed from an automatic speech recognition (ASR) point of view. Six types of voicing disorder (pathological voice) are analyzed to show the difficulty of automatically recognizing the corresponding speech. As a case study, Arabic spoken digits are taken as input. The distribution of the first four formants of the vowel /a/ is extracted to show that the formants of disordered speech deviate significantly from those of normal speech. The experimental results reveal that current ASR techniques are far from reliable for pathological speech and that this problem needs attention.

An Automatic Diagnostic System for Medically Disordered Voice

The automatic detection of vocal cord and vocal tract diseases from medically disordered speech is a challenging task in the medical field. An automatic diagnostic system for medically disordered voice is proposed in this paper. The system takes as input speech recorded by a patient with a voice disorder and outputs the type of disorder. The output is accompanied by a confidence score that indicates how confident the system is that it has recognized the type of disorder correctly. In the system, vowels are first detected in the disordered speech; two formants, F1 and F2, are then extracted from the vowels, and Euclidean distance is used for classification. The results of the proposed system are encouraging.
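The classification step described above (F1 and F2 compared by Euclidean distance) can be sketched as a nearest-centroid rule. This is only an illustration under stated assumptions: the abstract does not say how reference points are built or how the confidence score is computed, so the centroid values, the class labels, and the confidence formula below are all hypothetical.

```python
import math

# Hypothetical (F1, F2) reference centroids in Hz, one per disorder class.
# The labels and numbers are illustrative only; they are not from the paper.
CENTROIDS = {
    "disorder_a": (700.0, 1300.0),
    "disorder_b": (650.0, 1500.0),
    "disorder_c": (800.0, 1200.0),
}

def classify(f1, f2, centroids=CENTROIDS):
    """Return (label, confidence) for one (F1, F2) measurement."""
    # Euclidean distance from the measurement to every class centroid.
    distances = sorted(
        (math.dist((f1, f2), centroid), label)
        for label, centroid in centroids.items()
    )
    (d_best, best), (d_second, _) = distances[0], distances[1]
    # Assumed confidence: 1.0 when the point sits on the best centroid,
    # 0.5 when it is equidistant from the two nearest centroids.
    total = d_best + d_second
    confidence = 1.0 if total == 0 else d_second / total
    return best, confidence

label, confidence = classify(710.0, 1320.0)
```

With these made-up centroids, a measurement close to one centroid is assigned to that class with a confidence near 1, while a measurement midway between two centroids gets a confidence near 0.5.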

Automatic voice disorder classification using vowel formants

2011

In this paper, we propose an automatic voice disorder classification system that uses the first two formants of vowels. Five types of voice disorder, namely cyst, GERD, paralysis, polyp, and sulcus, are used in the experiments. Spoken Arabic digits recorded from people with voice disorders serve as input. The first and second formants are extracted from the vowels [Fatha] and [Kasra], which are present in Arabic digits. These four features are then used to classify the voice disorder with two classification methods: vector quantization (VQ) and neural networks. In the experiments, neural networks perform better than VQ. With neural networks, the classification rates for female and male speakers are 67.86% and 52.5%, respectively. The best classification rate, 78.72%, is obtained for sulcus in female speakers.
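As a rough illustration of the VQ branch, the sketch below assigns a recording to the class whose codebook quantizes its four-dimensional feature vectors (F1 and F2 of each of the two vowels) with the least average distortion. The codebook entries are invented for the example, and codebook training (for instance by k-means) is omitted; the abstract does not describe the authors' actual configuration.

```python
import math

def distortion(vectors, codebook):
    """Average distance from each vector to its nearest codeword."""
    return sum(
        min(math.dist(v, codeword) for codeword in codebook) for v in vectors
    ) / len(vectors)

def classify_vq(vectors, codebooks):
    """Pick the class whose codebook gives the lowest average distortion."""
    return min(codebooks, key=lambda label: distortion(vectors, codebooks[label]))

# Hypothetical codebooks; each codeword is
# (F1_fatha, F2_fatha, F1_kasra, F2_kasra) in Hz. Values are illustrative only.
CODEBOOKS = {
    "polyp": [(700, 1400, 350, 2200), (720, 1350, 370, 2100)],
    "sulcus": [(620, 1500, 300, 2400), (640, 1550, 320, 2350)],
}

# Two feature vectors from one hypothetical speaker.
sample = [(630, 1510, 310, 2380), (645, 1540, 315, 2360)]
predicted = classify_vq(sample, CODEBOOKS)
```

Here the sample vectors lie much closer to the second codebook, so that class is selected.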

Vocal Pathologies Detection and Mispronounced Phonemes Identification: Case of Arabic Continuous Speech

2016

In this work, we propose a novel acoustic-phonetic study of Arabic speakers with language disabilities and of non-native learners of Arabic. Its aims are to classify continuous Arabic speech as pathological or healthy and, for pathological speech, to identify the phonemes that pose pronunciation problems. The main idea is to compare a reference phonetic model of spoken Arabic with the model specific to the speaker concerned. For this task, we use automatic speech processing techniques such as forced alignment and artificial neural networks (ANN) (Basheer, 2000). On a test corpus of 100 speech sequences recorded by different speakers (healthy/pathological speech and native/foreign speakers), we attain a classification rate of 97%. The algorithms used to identify the phonemes that pose pronunciation problems are highly effective: we attain an identification rate of 100%.

Performance of Different Acoustic Measures to Discriminate Individuals With and Without Voice Disorders

Journal of Voice, 2020

The goal of this study is to compare and combine different acoustic features in discriminating subjects with and without voice disorders. A total of 484 adult patients participated in the research. All subjects recorded a sustained vowel /Ɛ/ and underwent a laryngoscopic examination of the larynx. Based on the results of the laryngeal examination performed by a physician and the auditory-perceptual judgment performed by a speech-language pathologist, the subjects were allocated to the group with (n = 52) or without (n = 432) voice disorder. Four types of acoustic features were used: traditional measures, cepstral measures, nonlinear measures, and recurrence quantification measures. Recordings comprised the emission of the vowel /e/. Quadratic discriminant analysis was used as the classifier. Individual features among the traditional, cepstral, and recurrence quantification measures achieved an acceptable performance of ≥70%. Combining measures improved the classifier performance. The best classification result (86.43% accuracy) was obtained by combining traditional linear and recurrence quantification measures. The results show that traditional, cepstral, and recurrence quantification measures are promising features that capture meaningful information about voice production and provide good classification performance. The findings of this study can be used to develop a computational tool for the diagnosis and monitoring of voice disorders.

Application of automatic speaker recognition techniques to pathological voice assessment (dysphonia)

2005

This paper investigates the adaptation of Automatic Speaker Recognition (ASR) techniques to pathological voice assessment (dysphonic voices). The aim of this study is to provide a novel method suitable for keeping track of the evolution of the patient's pathology: easy to use, fast, non-invasive for the patient, and affordable for clinicians. This method will be complementary to the existing ones (perceptual judgment and the usual objective measurements such as jitter and airflow), which remain time and human resource consuming. The system designed for this task relies on the GMM-based approach, which is the state of the art for speaker recognition. It is derived from the open-source ASR tools (LIA_Spk-Det and ALIZE) of the LIA lab. Experiments conducted on a dysphonic corpus provide promising results, underlining the interest of such an approach and opening the way to further research.
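The GMM-based approach mentioned above scores a recording by its average per-frame log-likelihood under competing Gaussian mixture models. The sketch below shows only that scoring step with diagonal covariances. It does not use or reproduce the LIA_Spk-Det or ALIZE tools, and the model parameters and feature frames are hypothetical placeholders; real systems train the mixtures on cepstral features.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def avg_log_likelihood(frames, gmm):
    """Mean per-frame log-likelihood; gmm is a list of (weight, mean, var)."""
    total = 0.0
    for x in frames:
        logs = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
        peak = max(logs)  # log-sum-exp trick for numerical stability
        total += peak + math.log(sum(math.exp(l - peak) for l in logs))
    return total / len(frames)

# Hypothetical two-component models over 2-D feature frames (illustrative only).
NORMAL_GMM = [(0.6, (0.0, 0.0), (1.0, 1.0)), (0.4, (1.0, 1.0), (1.0, 1.0))]
DYSPHONIC_GMM = [(0.5, (4.0, 4.0), (1.0, 1.0)), (0.5, (5.0, 3.0), (1.0, 1.0))]

frames = [(0.2, -0.1), (0.9, 1.1), (0.4, 0.3)]
# Log-likelihood ratio: positive favors the dysphonic model, negative the normal one.
score = avg_log_likelihood(frames, DYSPHONIC_GMM) - avg_log_likelihood(frames, NORMAL_GMM)
```

Because the example frames lie near the means of the first model, the ratio comes out negative, which under this sketch would favor the "normal" model.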

Interface of an Automatic Recognition System for Dysarthric Speech

International Journal of Advanced Computer Science and Applications

This paper addresses the realization of a Human/Machine (H/M) interface that includes a system for automatic recognition of continuous pathological speech (ARSCPS) and several communication tools. The interface is intended to help frail people with speech problems (dysarthric speech) access services provided by new information and communication technologies (ICT), while making it easier for doctors to reach a first diagnosis of the patient's disease. In addition, an ARSCPS has been developed and improved for normal and pathological voices and linked to our graphical interface, which is based on the Hidden Markov Model Toolkit (HTK) and on Hidden Markov Models (HMM). In our work, we used different feature extraction techniques for the speech recognition system in order to improve the intelligibility of dysarthric speech while developing an ARSCPS that performs well for both pathological and normal speakers. The techniques compared are the ETSI standard Mel Frequency Cepstral Coefficient front end (ETSI MFCC FE V2.0), Perceptual Linear Prediction (PLP) coefficients, Mel Frequency Cepstral Coefficients (MFCC), and the recently proposed Power Normalized Cepstral Coefficients (PNCC). In this context, we used the Nemours database, which contains 11 speakers with dysarthric speech and 11 speakers with normal speech.

Development of the Arabic Voice Pathology Database and Its Evaluation by Using Speech Features and Machine Learning Algorithms

Journal of healthcare engineering, 2017

A voice disorder database is an essential element of research on automatic voice disorder detection and classification. Ethnicity affects the voice characteristics of a person, so it is necessary to develop a database by collecting voice samples from the targeted ethnic group. Understanding the characteristics of a local group will improve the chances of arriving at a global solution for the accurate and reliable diagnosis of voice disorders. Motivated by this idea, an Arabic voice pathology database (AVPD) is designed and developed in this study by recording three vowels, running speech, and isolated words. For each recorded sample, the perceptual severity is also provided, which is a unique aspect of the AVPD. During the development of the AVPD, the shortcomings of different voice disorder databases were identified so that they could be avoided in the AVPD. In addition, the AVPD is evaluated by using six different types of speech features and four types of mach...

IJERT-Diagnosis of Disordered Speech using Automatic Speech Recognition

International Journal of Engineering Research and Technology (IJERT), 2020

https://www.ijert.org/diagnosis-of-disordered-speech-using-automatic-speech-recognition https://www.ijert.org/research/diagnosis-of-disordered-speech-using-automatic-speech-recognition-IJERTCONV8IS11029.pdf The purpose of the project is to help people with speech disorders communicate effectively. Speech is an effective means of communication. Many speech disorders are caused by stroke. This paper describes speech disorders in adults and various software packages available on the market. People with speech disorders mispronounce 17.6% of speech at the phonetic level. The study of speech signals and of the methods used to process them is called speech processing. Methods such as speech coding, speech synthesis, speech recognition, and speaker recognition play a vital role in the study of speech signals. Among these, speech recognition is considered in this paper. Speech recognition converts the acoustic signal that a speaker provides through a microphone into a set of words. Electronic circuits are required to extract the linguistic properties of the speaker's signal. This project is designed using MFCC as the feature extractor and a support vector machine as the classifier.

Automatic speech recognition (ASR) and its use as a tool for assessment or therapy of voice, speech, and language disorders

Logopedics Phoniatrics …, 2009

In general opinion, computerized automatic speech recognition (ASR) seems to be regarded only as a method for transcribing spoken language into written text, and as such as quite unreliable and rather cumbersome. However, owing to great advances in computer technology and informatics methodology, ASR has become quite dependable and easier to handle, and the number of applications has increased considerably. After some introductory background information on ASR, a number of applications of great interest to professionals in voice, speech, and language therapy are pointed out. In the foreseeable future, ASR technology will let a microphone replace the keyboard and mouse in many functions as the human-computer interface, and the computer will talk back via its loudspeaker. It seems important that professionals engaged in the care of oral communication disorders take part in this development so that their clients may get the optimal benefit from this new technology.