Sid-Ahmed Selouani - Academia.edu
Papers by Sid-Ahmed Selouani
This paper presents a new hybrid approach that aims to overcome the drawbacks of automatic speech recognition systems when faced with complex Arabic phonetic features such as emphasis, gemination and relevant vowel lengthening. The approach consists of using hearing/perception-based cues and dividing the global task of recognition into simple and well-defined subtasks. The subtasks are assigned to a set of Time-Delay Neural Networks using an autoregressive version of the backpropagation algorithm (AR-TDNNs). When they are incorporated in a hybrid structure, AR-TDNNs act as postprocessors of an HMM-based system. The reported results show that for either static or dynamic acoustic features, the hybrid system performs significantly better than its corresponding baseline system.
This paper deals with a new indicative features recognition system for Arabic which uses a set of simplified sub-neural-networks (SNNs). For the analysis of speech, the perceptual linear predictive technique is used. The ability of the system has been tested in experiments using stimuli uttered by 6 native Algerian speakers. The identification results are compared with those obtained by the SARPH knowledge-based system. We focus on the particularities of Arabic such as geminate and emphatic consonants and duration. The results show that the SNNs perform well in pure identification, while in the case of phonological duration the knowledge-based system performs better.
This paper proposes an approach for reliably identifying complex Arabic phonemes in continuous speech. The identification is carried out by a mixture of artificial neural experts. These experts are typically time-delay neural networks using an original version of the autoregressive backpropagation algorithm (AR-TDNN). A module using specific cues generated by an ear model performs the speech phone segmentation. Perceptual linear predictive (PLP) coefficients, energy, zero-crossing rate and their derivatives are used as input parameters. Serial and parallel architectures of AR-TDNN have been implemented and compared with a monolithic system using the simple backpropagation algorithm.
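As an illustration of two of the input parameters named above (energy and zero-crossing rate, plus a simple derivative), here is a minimal Python sketch. The frame length, hop size, the first-difference approximation of the derivatives, and all function names are assumptions made for this example; the abstract does not specify how the authors computed these features.

```python
def short_time_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def delta(values):
    """First-order difference, a crude stand-in for derivative features."""
    return [b - a for a, b in zip(values, values[1:])]


def frame_features(signal, frame_len, hop):
    """Return a list of (energy, zcr) pairs, one per full frame."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        feats.append((short_time_energy(frame), zero_crossing_rate(frame)))
    return feats
```

Calling `frame_features(signal, frame_len, hop)` and then `delta` on the resulting energy track gives one static and one dynamic value per frame, which is the general shape of input such a network would consume.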
International Journal on Artificial Intelligence Tools, 1999
This paper deals with a new indicative features recognition system for Arabic which uses a set of simplified sub-neural-networks (SNNs). For the analysis of speech, the perceptual linear predictive (PLP) technique is used. The ability of the system has been tested in experiments using stimuli uttered by 6 native Algerian speakers. The identification results are compared with those obtained by the SARPH knowledge-based system. We focus on the particularities of Arabic such as geminate and emphatic consonants and duration. The results show that the SNNs perform well in pure identification, while in the case of phonological duration the knowledge-based system performs better.
This paper presents a new system for radiological image classification. The proposed system is built on Hidden Markov Models (HMMs). In this work, the Hidden Markov Model Toolkit (HTK), which was primarily designed for speech recognition research, is adapted to handle image classification. Features are extracted with the shape context descriptor. They are converted to HTK format by first adding headers and then representing them in successive frames. Each frame is multiplied by a windowing function. The features are then used by HTK for training and classification. Classes of the medical IRMA database are used in the experiments. A comparison with a neural-network-based system shows the efficiency of the proposed approach.
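The conversion step described above (framing, windowing, adding a header) can be sketched in Python as follows. The 12-byte big-endian header layout (number of samples, sample period in 100 ns units, sample size in bytes, parameter kind) and the USER kind code 9 follow the HTK parameter file format. Everything else is an assumption made for illustration: the Hamming window, the frame length, the placeholder sample period, and the function names are not taken from the paper.

```python
import math
import struct

HTK_USER = 9  # HTK parameter-kind code for user-defined features


def hamming(n):
    """Hamming window of length n (assumed here; the paper names no window)."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def to_frames(features, frame_len):
    """Split a flat descriptor into consecutive windowed frames.

    Trailing values that do not fill a whole frame are dropped.
    """
    window = hamming(frame_len)
    n_frames = len(features) // frame_len
    return [
        [features[f * frame_len + i] * window[i] for i in range(frame_len)]
        for f in range(n_frames)
    ]


def htk_bytes(frames, samp_period=100000):
    """Serialize at least one frame as an HTK parameter file (big-endian).

    samp_period is a placeholder: it has no physical meaning for image features.
    """
    frame_len = len(frames[0])
    # Header: nSamples (int32), sampPeriod in 100 ns units (int32),
    # sampSize in bytes (int16), parmKind (int16).
    header = struct.pack(">iihh", len(frames), samp_period, 4 * frame_len, HTK_USER)
    # Body: each frame as frame_len 4-byte floats.
    body = b"".join(struct.pack(">%df" % frame_len, *frame) for frame in frames)
    return header + body
```

Writing the result of `htk_bytes(to_frames(descriptor, frame_len))` to disk in binary mode yields a file with the same layout HTK expects for user-defined parameter vectors.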
Investigating Automatic Recognition of Non-Native Arabic Speech. Sid-Ahmed Selouani (1), Yousef Ajami Alotaibi (2). (1) LARIHS Lab., Université de Moncton, Campus de Shippagan, Canada. (2) Computer Engineering Department, King Saud University, Saudi Arabia ...
International Journal of Computer Processing of Oriental Languages, 2009
Abstract: Compared to other major languages of the world, the Arabic language suffers from a dearth of research initiatives and research resources. As a result, Modern Standard Arabic (MSA) lacks reliable speech corpora for research in phonetics and related areas of ...
This paper addresses the issue of noise reduction applied to robust large-vocabulary continuous-speech recognition (CSR). We investigate strategies based on subspace filtering, which has proven very effective in the area of speech enhancement. We compare original hybrid techniques that combine the Karhunen-Loève transform (KLT), multilayer perceptron (MLP) and genetic algorithms (GAs) in order to obtain less-variant Mel-frequency parameters. One advantage of these methods is that they do not require estimation of either noise or speech spectra. To evaluate the effectiveness of these methods, an extensive set of recognition experiments is carried out in a severe interfering car noise environment for a wide range of SNRs varying from 16 dB to -4 dB using a noisy version of the TIMIT database.
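To make the SNR range above concrete, the following Python sketch shows one common way to corrupt clean speech with additive noise at a requested SNR. It is an assumption about how such a noisy test set could be built, not a description of how the authors produced their noisy version of TIMIT; the function names are hypothetical.

```python
import math


def power(x):
    """Mean squared amplitude of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(speech, noise, target_db):
    """Add noise to speech, scaled so the mixture has the requested SNR in dB."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    # Choose gain g so that power(speech) / power(g * noise) = 10^(target_db / 10).
    gain = math.sqrt(power(speech) / (power(noise) * 10 ** (target_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measure_snr_db(speech, noisy):
    """SNR in dB of a noisy signal, given the clean reference."""
    residual = [y - s for s, y in zip(speech, noisy)]
    return 10 * math.log10(power(speech) / power(residual))
```

Looping `mix_at_snr` over target values from 16 down to -4 produces one noisy copy of an utterance per condition; at -4 dB the scaled noise carries more power than the speech.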
... Shaughnessy (1). (1) INRS-EMT, 800 de la Gauchetière O, H5A 1K6, Montréal, Qc, Canada, {benabd, dougo}@inrs.emt.ca. (2) Université de Moncton, campus de Shippagan, E8S 1P6, NB, Canada, {bensalem, selouani}@umcs.ca. ABSTRACT ...
In this paper, a multi-stream paradigm is proposed to improve the performance of automatic speech recognition (ASR) systems in the presence of highly interfering car noise. It was found that combining the classical MFCCs with some auditory-based acoustic distinctive cues and the main formant frequencies of a speech signal using a multi-stream paradigm leads to an improvement in the recognition performance in noisy car environments.
Eurasip Journal on Audio, Speech, and Music Processing, 2008
The automatic recognition of foreign-accented Arabic speech is a challenging task since it involves a large number of nonnative accents. In addition, the nonnative speech data available for training are generally insufficient. Moreover, compared to other languages, the Arabic language has sparked a relatively small number of research efforts. In this paper, we are concerned with the problem of nonnative speech in a speaker-independent, large-vocabulary speech recognition system for Modern Standard Arabic (MSA). We analyze some major differences at the phonetic level in order to determine which phonemes play a significant part in the recognition performance for both native and nonnative speakers. Special attention is given to specific Arabic phonemes. The performance of an HMM-based Arabic speech recognition system is analyzed with respect to speaker gender and native origin. The WestPoint Modern Standard Arabic database from the Linguistic Data Consortium (LDC) and the Hidden Markov Model Toolkit (HTK) are used throughout all experiments. Our study shows that the best performance in overall phoneme recognition is obtained when nonnative speakers are involved in both the training and testing phases. This is not the case when a language model and phonetic lattice networks are incorporated in the system. At the phonetic level, the results show that female nonnative speakers perform better than male nonnative speakers, and that emphatic phonemes yield a significant decrease in performance when they are uttered by both male and female nonnative speakers.
Eurasip Journal on Advances in Signal Processing, 2003
Limiting the decrease in performance due to acoustic environment changes remains a major challenge for continuous speech recognition (CSR) systems. We propose a novel approach which combines the Karhunen-Loève transform (KLT) in the mel-frequency domain with a genetic algorithm (GA) to enhance the data representing corrupted speech. The idea consists of projecting noisy speech parameters onto the space generated by the genetically optimized principal axes derived from the KLT. The enhanced parameters increase the recognition rate for highly interfering noise environments. The proposed hybrid technique, when included in the front-end of an HTK-based CSR system, outperforms the conventional recognition process in severe interfering car noise environments for a wide range of signal-to-noise ratios (SNRs) varying from 16 dB to −4 dB. We also show the effectiveness of the KLT-GA method in recognizing speech subject to telephone channel degradations.
We present in this paper a signal subspace-based approach for enhancing a noisy signal. This algorithm is based on a principal component analysis (PCA) in which the optimal subspace selection is provided by a variance of the reconstruction error (VRE) criterion. This choice overcomes many limitations encountered with other selection criteria, such as overestimation of the signal subspace or the need for empirical parameters. We have also extended our subspace algorithm to take into account the case of colored and babble noise. The performance evaluation, which is made on the Aurora database, measures improvements in the distributed speech recognition of noisy signals corrupted by different types of additive noise. Our algorithm succeeds in improving the recognition of noisy speech in all noisy conditions.
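The general idea of PCA-based subspace enhancement can be sketched with NumPy as below. This is a simplified illustration, not the authors' algorithm: the subspace dimension `rank` is fixed by hand, whereas the paper selects it with the VRE criterion, and the extension to colored and babble noise is not covered. The function name and array layout are assumptions.

```python
import numpy as np


def pca_subspace_denoise(frames, rank):
    """Project frames onto their top-`rank` principal components.

    frames: array of shape (n_frames, dim). `rank` is chosen by hand here;
    the VRE-based selection described in the abstract is not implemented.
    """
    frames = np.asarray(frames, dtype=float)
    mean = frames.mean(axis=0)
    centered = frames - mean
    # Sample covariance is symmetric, so eigh is the appropriate solver.
    cov = centered.T @ centered / max(len(frames) - 1, 1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    basis = eigvecs[:, np.argsort(eigvals)[::-1][:rank]]
    # Project onto the retained subspace and map back to the original space.
    return centered @ basis @ basis.T + mean
```

Components outside the retained subspace are treated as noise and discarded, which is why the choice of `rank` matters: too large keeps noise, too small removes signal.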
In this paper we report the results of a comparative study on blind speech signal separation approaches. Three algorithms, Oriented Principal Component Analysis (OPCA), High Order Statistics (HOS), and Fast Independent Component Analysis (Fast-ICA), are objectively compared in terms of signal-to-interference ratio criteria. The results of experiments carried out using the TIMIT and AURORA speech databases show that OPCA outperforms the other techniques. It turns out that OPCA can be used for blindly separating temporal signals from their linear mixtures without the need for a pre-whitening step.
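A signal-to-interference ratio of the kind used for the comparison above can be computed as in the following Python sketch. It uses a simplified definition, assumed for this example, in which everything in the separated output that differs from the reference source counts as interference, and the estimate is assumed to be already matched to the reference in order and scale. The exact criterion used in the paper may differ.

```python
import math


def sir_db(target, estimate):
    """Signal-to-interference ratio in dB.

    Treats (estimate - target) as interference; assumes the estimate has
    already been aligned and rescaled to the reference source.
    """
    signal_energy = sum(t * t for t in target)
    interference_energy = sum((e - t) ** 2 for t, e in zip(target, estimate))
    if interference_energy == 0:
        return math.inf  # perfect separation
    return 10 * math.log10(signal_energy / interference_energy)
```

A higher value means less residual contribution from the other sources, so algorithms can be ranked by their average `sir_db` over a test set.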