Exemplar-Based Spectral Detail Compensation for Voice Conversion
Related papers
Applying improved spectral modeling for High Quality voice conversion
2009
In this work, accurate spectral envelope estimation is applied to Voice Conversion in order to achieve High-Quality timbre conversion. True-Envelope based estimators allow model order selection, leading to an adaptation of the spectral features to the characteristics of the speaker. Optimal residual signals can also be computed following a local adaptation of the model order in terms of the F0. A new perceptual criterion is proposed to measure the impact of the spectral conversion error. The proposed envelope models show improved spectral conversion performance as well as increased converted-speech quality when compared to Linear Prediction.
A voice conversion method based on joint pitch and spectral envelope transformation
… International Conference on …, 2004
Most of the research in Voice Conversion (VC) is devoted to spectral transformation, while the conversion of prosodic features is essentially obtained through a simple linear transformation of pitch. These separate transformations lead to an unsatisfactory speech conversion quality, especially when the speaking styles of the source and target speakers are different. In this paper, we propose a method capable of jointly converting pitch and spectral envelope information. The parameters to be transformed are obtained by combining scaled pitch values with the spectral envelope parameters for the voiced frames and only spectral envelope parameters for the unvoiced ones. These parameters are clustered using a Gaussian Mixture Model (GMM). Then the transformation functions are determined using a conditional expectation estimator. Tests carried out show that this process leads to a satisfactory pitch transformation. Moreover, it makes the spectral envelope transformation more robust.
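The conditional expectation estimator mentioned in the abstract above can be illustrated with a minimal scalar sketch. It assumes a joint GMM over one source feature x and one target feature y; the function names (`gaussian_pdf`, `convert`) and the dictionary layout of the mixture components are hypothetical and chosen for illustration only. The paper itself works on vectors that combine scaled pitch and spectral envelope parameters, which this sketch does not reproduce.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def convert(x, components):
    """Conditional expectation E[y | x] under a joint GMM (scalar case).

    Each component is a dict with keys (hypothetical layout):
    weight, mu_x, mu_y, var_x, cov_yx.
    """
    # Weighted likelihood of x under the source marginal of each component.
    likelihoods = [
        c["weight"] * gaussian_pdf(x, c["mu_x"], c["var_x"]) for c in components
    ]
    total = sum(likelihoods)  # may underflow to 0 for x far from every component
    y_hat = 0.0
    for lik, c in zip(likelihoods, components):
        posterior = lik / total  # p(component | x)
        # Per-component linear regression from x to y.
        y_hat += posterior * (c["mu_y"] + c["cov_yx"] / c["var_x"] * (x - c["mu_x"]))
    return y_hat


single = [{"weight": 1.0, "mu_x": 0.0, "mu_y": 1.0, "var_x": 2.0, "cov_yx": 1.0}]
pair = [
    {"weight": 0.5, "mu_x": -1.0, "mu_y": 3.0, "var_x": 1.0, "cov_yx": 0.5},
    {"weight": 0.5, "mu_x": 1.0, "mu_y": 5.0, "var_x": 1.0, "cov_yx": 0.5},
]
```

With a single component the estimator reduces to one linear regression; with several, the posterior weights blend the per-component regressions. This weighted averaging is the mechanism commonly blamed for the over-smoothing discussed in several of the abstracts listed here.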
A strategy to enhance the signal quality and naturalness was designed for performing probabilistic spectral envelope transformation in voice conversion. The existing modeling error of the probabilistic mixture to represent the observed envelope features is translated generally as an averaging of the information in the spectral domain, resulting in over-smoothed spectra. Moreover, a transformation based on poorly modeled features might not be considered reliable. Our strategy consists of a novel definition of the spectral transformation to compensate for the effect of both over-smoothing and poor modeling. The results of an experimental evaluation show that the perceived naturalness of converted speech was enhanced.
Extended Conditional GMM and Covariance Matrix Correction for Real-Time Spectral Voice Conversion
recherche.ircam.fr
Gaussian mixture model (GMM)-based spectral voice conversion (VC) can be performed in real-time by applying the conversion method frame by frame. However, this local method can produce inappropriate trajectories of parameters and the converted spectrum can be excessively smoothed due to the statistical approach. In order to address these limitations, we propose an approach based on a new Extended Conditional GMM model. Two different feature vectors are used for the description of the source characteristics: one is specifically designed for a precise description of the spectral features to be transformed, the other one being designed for the selection of the transformations to be applied. The latter includes local descriptors of the trajectories of parameters via Discrete Cosine Transform (DCT) coefficients in order to generate local trajectories of parameters. Finally, the effect of over-smoothing is alleviated by a covariance matrix correction method. The proposed VC method is evaluated objectively and subjectively, showing a dramatic improvement compared to the conventional VC method.
Fast locally linear embedding algorithm for exemplar-based voice conversion
2017
The locally linear embedding (LLE) algorithm has been proven to have high output quality and applicability for voice conversion (VC) tasks. However, the major shortcoming of the LLE-based VC approach is the time complexity (especially in the matrix inversion process) during the conversion phase. In this paper, we propose a fast version of the LLE algorithm that significantly reduces the complexity. In the proposed method, each locally linear patch on the data manifold is described by a pre-computed cluster of exemplars, and thus the major part of on-line computation can be carried out beforehand in the off-line phase. Experimental results demonstrate that the VC performance of the proposed fast LLE algorithm is comparable to that of the original LLE algorithm and that a real-time VC system becomes possible because of the highly reduced time complexity.
High quality voice conversion based on Gaussian mixture model with dynamic frequency warping
2001
In the voice conversion algorithm based on the Gaussian Mixture Model (GMM), the quality of the converted speech is degraded because the converted spectrum is excessively smoothed. In this paper, we propose a new GMM-based algorithm with Dynamic Frequency Warping (DFW) to avoid the over-smoothing. We also propose that the converted spectrum be calculated by mixing the GMM-based converted spectrum and the DFW-based converted spectrum, to avoid the deterioration of conversion accuracy on speaker individuality. Evaluation experiments show that, with a proper weight for mixing the spectra, the proposed algorithm yields better converted speech quality than the GMM-based algorithm while matching its conversion accuracy on speaker individuality.
Locally Linear Embedding for Exemplar-Based Spectral Conversion
Interspeech 2016, 2016
This paper describes a novel exemplar-based spectral conversion (SC) system developed by the AST (Academia Sinica, Taipei) team for the 2016 voice conversion challenge (vcc2016). The key feature of our system is that it integrates the locally linear embedding (LLE) algorithm, a manifold learning algorithm that has been successfully applied for the super-resolution task in image processing, with the conventional exemplar-based SC method. To further improve the quality of the converted speech, our system also incorporates (1) the maximum likelihood parameter generation (MLPG) algorithm, (2) the postfiltering-based global variance (GV) compensation method, and (3) a high-resolution feature extraction process. The results of subjective evaluation conducted by the vcc2016 organizer show that our LLE-exemplar-based SC system notably outperforms the baseline GMM-based system (implemented by the vcc2016 organizer). Moreover, our own internal evaluation results confirm the effectiveness of the major LLE-exemplar-based SC method and the three additional approaches with improved speech quality.
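The general idea of LLE-style exemplar-based conversion can be sketched as follows: a source frame is reconstructed as a weighted combination of its nearest source exemplars, and the same weights are applied to the paired target exemplars. This is a generic illustration under stated assumptions, not the AST system: the names `solve` and `lle_convert`, the neighbour count `k`, and the simple diagonal regularisation `reg` are all hypothetical, and the MLPG, GV and high-resolution feature steps from the abstract are omitted.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def lle_convert(frame, source_exemplars, target_exemplars, k=2, reg=1e-6):
    """Convert one frame using LLE-style reconstruction weights (illustrative only)."""

    def sq_dist(exemplar):
        return sum((a - b) ** 2 for a, b in zip(frame, exemplar))

    # Indices of the k source exemplars closest to the input frame.
    order = sorted(range(len(source_exemplars)), key=lambda i: sq_dist(source_exemplars[i]))
    idx = order[:k]
    # Local Gram matrix of the differences between the frame and its neighbours.
    diffs = [[a - b for a, b in zip(frame, source_exemplars[i])] for i in idx]
    gram = [[sum(p * q for p, q in zip(di, dj)) for dj in diffs] for di in diffs]
    for i in range(k):
        gram[i][i] += reg  # assumed regularisation to keep the system solvable
    weights = solve(gram, [1.0] * k)
    total = sum(weights)
    weights = [w / total for w in weights]  # weights sum to one
    # Apply the source-side weights to the paired target exemplars.
    dim = len(target_exemplars[0])
    return [
        sum(weights[j] * target_exemplars[idx[j]][d] for j in range(k))
        for d in range(dim)
    ]


source = [[0.0], [1.0], [10.0]]
target = [[0.0], [2.0], [20.0]]
converted = lle_convert([0.5], source, target, k=2)
```

In this toy case the frame lies midway between the two nearest source exemplars, so both receive equal weight and the output is the midpoint of their paired target exemplars.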
Efficient model re-estimation in voice conversion
2008 16th European Signal Processing Conference, 2008
Voice conversion systems aim at converting an utterance spoken by one speaker to sound as speech uttered by a second speaker. Over the last few years, the interest towards voice conversion has risen immensely. Gaussian mixture model (GMM) based techniques have been found to be efficient in the transformation of features represented as scalars or vectors. However, a reasonably large amount of aligned training data is needed to achieve good results. To solve this problem, this paper presents an efficient model re-estimation scheme. The proposed technique is based on adjusting an existing well-trained conversion model for a new target speaker with only a very small amount of training data. The experimental results provided in the paper demonstrate the efficiency of the re-estimation approach in line spectral frequency conversion and show that the proposed approach can reach good performance while using only a very limited amount of adaptation data.
Voice Conversion Using GMM with Enhanced Global Variance
2011
The goal of voice conversion is to transform a sentence said by one speaker so that it sounds as if another speaker had said it. The classical conversion based on a Gaussian Mixture Model, and several other schemes suggested since, produce muffled-sounding outputs due to excessive smoothing of the spectral envelopes. To reduce the muffling effect, enhancement of the Global Variance (GV) of the spectral features was recently suggested. We propose a different approach for GV enhancement, based on the classical conversion formalized as a GV-constrained minimization. Listening tests show that an improvement in quality is achieved by the proposed approach.
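As background for the GV discussion, the simplest form of variance enhancement rescales a converted feature trajectory around its mean so that its variance matches a target value. The sketch below shows only that basic scaling idea; it is not the GV-constrained minimization proposed in the abstract above, and the function name `scale_to_target_variance` and the example values are hypothetical.

```python
from statistics import fmean, pvariance


def scale_to_target_variance(track, target_var):
    """Rescale a 1-D feature trajectory around its mean to reach target_var."""
    mean = fmean(track)
    var = pvariance(track, mu=mean)
    if var == 0.0:
        return list(track)  # a constant trajectory cannot be rescaled
    gain = (target_var / var) ** 0.5
    return [mean + gain * (x - mean) for x in track]


# Illustrative over-smoothed trajectory and an assumed target variance.
smoothed = [1.0, 1.2, 0.9, 1.1, 0.8]
enhanced = scale_to_target_variance(smoothed, target_var=0.5)
```

The mean of the trajectory is preserved while its spread is widened, which counteracts the reduced dynamic range typical of averaged, over-smoothed features.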