Youcef Tabet - Academia.edu
Papers by Youcef Tabet
EAI/Springer Innovations in Communication and Computing, Jun 22, 2023
Service Oriented Computing and Applications
International Journal of Electronics, Communications, and Measurement Engineering
A new speech signal analysis method, referred to as the refined iterative adaptive method (RIAM), is introduced in this paper. Based on time-varying adaptive sinusoidal modeling, RIAM estimates, in an iterative and adaptive manner, the instantaneous components of time-varying quasi-periodic multi-component signals such as voiced speech. The proposed method adjusts the current analysis parameters to the time-varying characteristics of the speech signal, using a refined iterative sinusoidal parameter estimation algorithm that combines a frequency correction mechanism with an adaptive scheme. Experiments on voiced speech demonstrate that the proposed RIAM algorithm outperforms several well-known state-of-the-art approaches: it achieves a signal-to-reconstruction-error ratio (SRER) of 42.267 dB, an improvement of 19.752 dB, 5.054 dB, and 2.552 dB over the conventional sinusoidal model (SM), the adaptive harmonic model (aHM), a...
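The SRER figures quoted above compare a signal with its model reconstruction. A minimal sketch of the metric, assuming the common RMS-ratio definition in dB (the function name is illustrative, not taken from the paper):

```python
import numpy as np

def srer(signal, reconstruction):
    """Signal-to-reconstruction-error ratio in dB:
    20 * log10(RMS(signal) / RMS(signal - reconstruction))."""
    signal = np.asarray(signal, dtype=float)
    error = signal - np.asarray(reconstruction, dtype=float)
    return 20.0 * np.log10(np.std(signal) / np.std(error))
```

For example, a reconstruction scaled to 90% of the original leaves a residual one tenth the signal's RMS, giving an SRER of exactly 20 dB.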
Procedia Computer Science, 2015
For speech synthesis, an understanding of the physical and mathematical models of speech is essential. Speech modeling is therefore a large and well-documented field. The aim of this paper is to provide a background review of several speech models used in speech synthesis, specifically the Source-Filter Model, the Linear Prediction Model, the Sinusoidal Model, and the Harmonic/Noise Model. The most important models of speech signals are described from the earliest to the most recent, in order to highlight the major improvements each brings. A desirable parametric model of speech is one that is relatively simple, flexible, high quality, and robust in re-synthesis. Emphasis is given to the Harmonic/Noise Model, since it appears to be the most promising and robust model of speech.
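The sinusoidal model surveyed above represents a speech frame as a sum of sinusoids, each with its own amplitude, frequency, and phase. A minimal synthesis sketch under the stationary (constant-parameter) assumption; the function name and parameters are illustrative:

```python
import numpy as np

def sinusoidal_synthesis(amps, freqs_hz, phases, fs, n_samples):
    """Synthesize one frame as a sum of stationary sinusoids:
    s[n] = sum_k A_k * cos(2*pi*f_k*n/fs + phi_k)."""
    n = np.arange(n_samples)
    frame = np.zeros(n_samples)
    for a, f, p in zip(amps, freqs_hz, phases):
        frame += a * np.cos(2 * np.pi * f * n / fs + p)
    return frame
```

Real analysis-synthesis systems re-estimate these parameters frame by frame and interpolate them across frame boundaries, which is exactly where the later adaptive models improve on this stationary form.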
International Workshop on Systems, Signal Processing and their Applications, WOSSPA, 2011
The goal of this paper is to provide a short but comprehensive overview of Text-To-Speech synthesis, highlighting its digital signal processing component. First, two rule-based synthesis techniques (formant synthesis and articulatory synthesis) are explained; then concatenative synthesis is explored. Concatenative synthesis is simpler than rule-based synthesis, since there is no need to determine speech production rules. However, it introduces the challenges of applying prosodic modification to speech units and resolving discontinuities at unit boundaries. Prosodic modification produces artifacts that make the speech sound unnatural. Unit selection synthesis, a kind of concatenative synthesis, addresses this problem by storing numerous instances of each unit with varying prosodies; the unit that best matches the target prosody is selected and concatenated. To resolve remaining mismatches, the speech synthesis system combines the unit-selection method with the Harmonic plus Noise Model (HNM). This model represents the speech signal as a sum of a harmonic part and a noise part; decomposing the signal into these two parts enables more natural-sounding modifications. Finally, Hidden Markov Model (HMM) synthesis combined with an HNM model is introduced in order to obtain a Text-To-Speech system with smaller development time and cost.
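The HNM decomposition described above splits a frame into a harmonic part (multiples of the fundamental frequency f0) and a residual noise part. One simple way to realize this, shown here as a sketch and not as the paper's exact procedure, is a least-squares fit of a harmonic cosine/sine basis (the function name is illustrative):

```python
import numpy as np

def hnm_split(frame, f0_hz, fs, n_harmonics):
    """Fit a harmonic part (sum of harmonics of f0) to a frame by
    least squares; treat the residual as the noise part."""
    n = np.arange(len(frame))
    cols = []
    for k in range(1, n_harmonics + 1):
        w = 2 * np.pi * k * f0_hz / fs
        cols.append(np.cos(w * n))
        cols.append(np.sin(w * n))
    basis = np.stack(cols, axis=1)
    coeffs, *_ = np.linalg.lstsq(basis, frame, rcond=None)
    harmonic = basis @ coeffs
    return harmonic, frame - harmonic
```

For a purely harmonic input the residual is near zero; for real voiced speech the residual carries the stochastic component that HNM models separately.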
International Journal of Speech Technology, 2018
This paper explores common speech signal representations along with a brief description of their corresponding analysis-synthesis stages. The main focus is on adaptive sinusoidal representations, for which a refined model of speech is suggested, referred to as the Refined adaptive Sinusoidal Representation (R_aSR). Building on the performance of recently suggested adaptive sinusoidal models of speech, significant refinements are proposed at both the analysis and adaptive stages. First, a quasi-harmonic representation of speech is used in the analysis stage to obtain an initial estimate of the instantaneous model parameters. Next, in the adaptive stage, an adaptive scheme combined with an iterative frequency correction mechanism allows a robust estimation of the model parameters (amplitudes, frequencies, and phases). Finally, the speech signal is reconstructed as the sum of its estimated time-varying instantaneous components after an interpolation scheme. Objective evaluation tests show that the suggested R_aSR achieves higher-quality reconstruction than state-of-the-art models when modeling voiced speech signals. Moreover, listening evaluation tests indicate that the R_aSR attains transparent perceived quality.
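The frequency correction step mentioned above refines an initial frequency estimate so that each sinusoidal component locks onto the true partial. The paper's mechanism is not reproduced here; as a stand-in, this sketch does one correction step by scanning a narrow band around the initial guess for the frequency whose complex exponential best correlates with the frame (all names and parameters are illustrative):

```python
import numpy as np

def refine_frequency(frame, f_init, fs, search_hz=5.0, step_hz=0.1):
    """One frequency-correction step: scan a small band around an
    initial estimate and keep the candidate frequency maximizing the
    magnitude of the correlation with a complex exponential."""
    n = np.arange(len(frame))
    candidates = np.arange(f_init - search_hz,
                           f_init + search_hz + step_hz, step_hz)
    scores = [abs(np.sum(frame * np.exp(-2j * np.pi * f * n / fs)))
              for f in candidates]
    return candidates[int(np.argmax(scores))]
```

Iterating such a step as the signal's parameters drift, and re-estimating amplitudes and phases at each pass, is the general pattern behind adaptive sinusoidal analysis.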