Tuấn Hồ - Academia.edu
Papers by Tuấn Hồ
Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020
This paper proposes a hierarchical latent embedding structure for the Vector Quantized Variational Autoencoder (VQVAE) to improve the performance of non-parallel voice conversion (NPVC) models. Previous studies on NPVC based on the vanilla VQVAE use a single codebook to encode the linguistic information at a fixed temporal scale. However, linguistic structure contains different semantic levels (e.g., phoneme, syllable, word) that span various temporal scales. As a result, the converted speech may contain unnatural pronunciations, which degrade its naturalness. To tackle this problem, we propose a hierarchical latent embedding structure comprising several vector quantization blocks that operate at different temporal scales. When trained on a multi-speaker database, the proposed model can encode voice characteristics into a speaker embedding vector, which can be used in one-shot learning settings. Results from objective and subjective tests indicate that the proposed model outperforms the conventional VQVAE-based model in both intra-lingual and cross-lingual conversion tasks. The official results from Voice Conversion Challenge 2020 reveal that the proposed model achieved the highest naturalness among autoencoder-based models in both tasks. Our implementation is being made available at 1 .
CLEO: 2013, 2013
We have developed a spectroscopic full-field optical coherence tomography system that can provide ultrahigh isotropic spatial resolution. The complex refractive index and thickness of an embedded absorptive thin film were measured simultaneously.
Digital signal processing techniques and today’s computing capabilities allow computers to “understand” human speech. This paper describes the EllaVoice application, a user-dependent, isolated voice command recognition tool. It was created in MATLAB, is based on dynamic programming, and could be used to control mobile robots. The paper deals with the application of selected techniques such as cross-word reference template creation and endpoint detection.
Asia Communications and Photonics Conference 2013, 2013
Few-mode crystalline fibers with broadband emissions in various wavelength ranges, covering ultraviolet to near infrared, are developed for core and clad pumping schemes. Applications using these cw and direct laser-diode-pumped sources will be addressed.
2014 IEEE Photonics Conference, 2014