A first step towards text-independent voice conversion (original) (raw)
So far, all conventional voice conversion approaches are text-dependent, i.e., they need equivalent training utterances of source and target speaker. Since several recently proposed applications call for renouncing this requirement, in this paper, we present an algorithm which finds corresponding time frames within text-independent training data. The performance of this algorithm is tested by means of a voice conversion framework based on linear transformation of the spectral envelope. Experimental results are reported on a Spanish cross-gender corpus utilizing several objective error measures.