Fernando Villavicencio | National Institute of Informatics

Papers by Fernando Villavicencio

Vivos Voco: A Survey of Recent Research on Voice Transformations at IRCAM

International Conference on Digital Audio Effects (DAFx), 2011

IRCAM has long experience in the analysis, synthesis and transformation of voice. Natural voice transformations are of great interest for many applications and can be combined with a text-to-speech system, leading to a powerful creation tool. We present research conducted at IRCAM on voice transformations over the last few years. Transformations can be achieved in a global way by modifying pitch, spectral envelope, durations, etc. While it sacrifices the possibility to attain a specific target voice, the ...

OBSERVATION-MODEL ERROR COMPENSATION FOR ENHANCED SPECTRAL ENVELOPE TRANSFORMATION IN VOICE CONVERSION

A strategy to enhance signal quality and naturalness was designed for probabilistic spectral envelope transformation in voice conversion. The modeling error of the probabilistic mixture in representing the observed envelope features generally translates into an averaging of the information in the spectral domain, resulting in over-smoothed spectra. Moreover, a transformation based on poorly modeled features might not be considered reliable. Our strategy consists of a novel definition of the spectral transformation to compensate for the effect of both over-smoothing and poor modeling. The results of an experimental evaluation show that the perceived naturalness of converted speech was enhanced.

Efficient Pitch Estimation on Natural Opera-Singing by a Spectral Correlation based Strategy

We present in this work a study of robust pitch estimation on signals with wide-range pitch content, as is the case of opera singing. Aiming to perform automatic feature extraction for the further development of parametric opera singing synthesis technology, we evaluate four state-of-the-art pitch estimators, reporting in particular the technical details of one introduced in previous work, based on a technique called Spectral Amplitude Autocorrelation (SAC). The results of subjective and objective evaluations show clear performance trends, denoting robust estimation performance for SAC without significant sensitivity to the pitch height.

High-Quality Voice Conversion

The goal of our research work was the application of Voice Conversion to high-quality speech. Our main interests were improving the quality of current systems and using high-quality speech. To achieve this, we focused on the study of improved spectral envelope modeling and timbre modification.

Voice liveness detection algorithms based on pop noise caused by human breath for automatic speaker verification

This paper proposes a novel countermeasure framework to detect spoofing attacks and so reduce the vulnerability of automatic speaker verification (ASV) systems. Recently, ASV systems have reached performance equivalent to that of other biometric modalities. However, spoofing techniques against these systems have also progressed drastically. Experimentation using advanced speech synthesis and voice conversion techniques has shown unacceptable false acceptance rates, and several new countermeasure algorithms have been explored to detect spoofing materials accurately. However, the countermeasures proposed so far are based on the acoustic differences between natural and artificial speech signals, which are expected to become gradually smaller in the near future. In this paper, we focus on voice liveness detection, which aims to validate whether the presented speech signals originated from a live human. We use the phenomenon of pop noise, a distortion that happens when human breath reaches a microphone, as liveness evidence. This paper proposes pop noise detection algorithms and shows through an experimental study that they can be used to discriminate live voice signals from artificial ones generated by means of speech synthesis techniques.

A STRATEGY FOR LF-BASED GLOTTAL-SOURCE & VOCAL-TRACT ESTIMATION ON STATIONARY MODAL SINGING

Proc. of EUSIPCO 2014, Sep 2014

This paper presents a methodology for estimation and modeling of the glottal source and vocal-tract information. The strategy proposes a simplified framework based on the characteristics of stationary singing, following a selection of glottal pulse model candidates driven by a single shape parameter. True-Envelope based models are applied, allowing efficient modeling of the observed filter information and accurate cancellation of the glottal source contribution in the spectrum. According to experimental studies on synthetic and real signals, the methodology achieves an adequate approximation of the source and filter information, leading to natural resynthesis quality using synthetic glottal excitation. The proposed estimation framework represents a promising technique for voice transformation on stationary modal voice.

Glottal Source Model Selection for Stationary Singing-Voice by Low-Band Envelope Matching

Proc. of NOLISP workshop 2013, 2013

In this paper a preliminary study on voice excitation modeling by single glottal shape parameter selection is presented. A strategy for direct model selection by matching derivative glottal source estimates with LF-based candidates driven by the Rd parameter is explored by means of two state-of-the-art similarity measures and a novel one considering spectral envelope information. An experimental study on synthetic singing-voice was carried out to compare the performance of the different measures and to observe potential relations with respect to different voice characteristics (e.g. vocal effort, pitch range, amount of aperiodicities and aspiration noise). The results of this study allow us to claim competitive performance for the proposed strategy and suggest preferable source modeling conditions for stationary singing-voice.

Non-parallel singing-voice conversion by phoneme-based modeling and covariance approximation

Proc. of Digital Audio Effects (DAFx) 2011, Sep 2011

In this work we present an approach to perform voice timbre conversion from unpaired data. Voice Conversion strategies are commonly restricted to the use of parallel speech corpora. Our proposition is based on two main concepts: the modeling of the timbre space based on phonetic information and a simple approximation of the cross-covariance of source-target features. The experimental results of this strategy on singing-voice data of the VOCALOID synthesizer showed a conversion performance comparable to that obtained by Maximum-Likelihood, thereby allowing us to achieve singer-timbre conversion from real singing performances.

Resurrecting past singers: non-parallel singing-voice conversion

Proc. of 1st InterSinging workshop, Sep 2010

We present in this work a strategy to perform timbre conversion from unpaired source and target data and its application to the singing-voice synthesizer VOCALOID to produce sung utterances with a past singer's voice. The conversion framework using unpaired data is based on a phoneme-constrained modeling of the timbre space and the assumption of a linear relation between the source and target features. The proposed nonparallel framework resulted in a performance close to that of the traditional approach based on GMM and paired data. When the framework was applied to convert an original singer database using sung performances of a past singer, the past singer’s timbre was successfully perceived in the singing-voice utterances performed by VOCALOID.

Applying voice conversion to concatenative singing-voice synthesis

Proc. of Interspeech 2010, Sep 2010

This work addresses the application of Voice Conversion to singing-voice. The GMM-based approach was applied to VOCALOID, a concatenative singing synthesizer, to perform singer timbre conversion. The conversion framework was applied to full-quality singing databases, achieving a satisfactory conversion effect on the synthesized utterances produced by VOCALOID. We report in this paper a description of our implementation as well as the results of our experimentation, which focused on studying the spectral conversion performance when applied to specific pitch-range data.

GMM-PCA based Speaker-Timbre Conversion on Full-Quality Speech

This work addresses a study of the GMM-based approach to achieve full-quality speaker timbre conversion. In general, high-quality voice conversion requires accurate spectral envelope estimates, resulting in high-dimensional feature vectors and relatively high computational cost. Aiming to achieve low-dimensional processing, accurate envelope estimates of the speakers are mel-frequency scaled and projected onto the space defined by a subset of the principal components. The GMM-based feature conversion is then performed in the reduced space. Our experimental findings confirm that this strategy provides benefits, especially in the resulting converted speech quality, with a significant computational cost reduction.

Applying Improved Spectral Modeling for High Quality Voice Conversion

Proc. of ICASSP 2009, 2009

In this work, accurate spectral envelope estimation is applied to Voice Conversion in order to achieve High-Quality timbre conversion. True-Envelope based estimators allow model order selection, leading to an adaptation of the spectral features to the characteristics of the speaker. Optimal residual signals can also be computed following a local adaptation of the model order in terms of the F0. A new perceptual criterion is proposed to measure the impact of the spectral conversion error. The proposed envelope models show improved spectral conversion performance as well as increased converted-speech quality when compared to Linear Prediction.

Extending efficient spectral envelope modeling to Mel-frequency based representation

Proc. of ICASSP 2008, 2008

In this work we consider the problem of spectral envelope estimation using spectra with a perceptually warped frequency axis. The goal of this work is to reduce the order of the spectral envelope model, which will facilitate the use of these envelopes for training voice conversion systems. We adapt the true-envelope estimator to Mel-frequency representations and adapt a recently proposed cepstral model order selection criterion to take into account the distortion of the frequency axis. We evaluate the modified order selection procedure using a perceptual framework for the evaluation of envelope estimation errors. The experimental evaluation carried out with real speech confirms our modifications. The results demonstrate that the Mel-frequency based true-envelope estimator achieves superior envelope estimation with significantly reduced model order.

All-Pole Spectral Envelope Modelling with Order Selection for Harmonic Signals

Proc. of ICASSP 2007, 2007

We present a study of all-pole spectral envelope estimation for the case of harmonic signals. We address the problem of model order selection and propose to make use of the fact that the spectral envelope is sampled by means of the harmonic structure to derive a reasonable choice for an appropriate model order. The experimental investigation uses synthetic ARMA featured signals with varying fundamental frequency and differing model structure to evaluate the performance of the selected all-pole models. The experimental results confirm the relation between optimal model order and the fundamental frequency.

On cepstral and all-pole based spectral envelope modeling with unknown model order

Pattern Recognition Letters, Elsevier Ed., 2007

In this work, we investigate spectral envelope estimation for harmonic signals. We address the issue of model order selection and propose to make use of the fact that the spectral envelope is sampled by means of the harmonic structure of the signal in order to derive upper bounds for the estimator order. An experimental study is performed using synthetic test signals with various fundamental frequencies and different model structures to evaluate the performance of the envelope models. Experimental results confirm the relation between optimal model order and fundamental frequency.

Improving LPC Spectral Envelope Extraction of Voiced Speech by True-Envelope Estimation

In this work we address the problem of all-pole spectral envelope estimation for speech signals. The currently widely used all-pole spectral envelope model suffers from well-known systematic errors and, more severely, from model order mismatch. We propose a procedure that first establishes a band-limited interpolation of the observed spectrum using a recently rediscovered true-envelope estimator and then uses the band-limited envelope to derive an all-pole envelope model named TE-LPC. The band-limited envelope that is used to derive the all-pole envelope model reduces the problem of the unknown all-pole model order. For the experimental investigation we propose a new perceptually motivated residual spectral peak flatness measure. The experimental results demonstrate that the proposed method significantly increases the spectral flatness for the low-order harmonics of voiced utterances, which are especially important perceptually.
