Tuấn Hồ - Academia.edu

Tuấn Hồ

Papers by Tuấn Hồ

Non-parallel Voice Conversion based on Hierarchical Latent Embedding Vector Quantized Variational Autoencoder

Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

This paper proposes a hierarchical latent embedding structure for the Vector Quantized Variational Autoencoder (VQVAE) to improve the performance of non-parallel voice conversion (NPVC) models. Previous studies on NPVC based on the vanilla VQVAE use a single codebook to encode the linguistic information at a fixed temporal scale. However, linguistic structure contains different semantic levels (e.g., phoneme, syllable, word) that span various temporal scales. As a result, the converted speech may contain unnatural pronunciations, which degrade its naturalness. To tackle this problem, we propose a hierarchical latent embedding structure comprising several vector quantization blocks that operate at different temporal scales. When trained with a multi-speaker database, our proposed model can encode the voice characteristics into a speaker embedding vector, which can be used in one-shot learning settings. Results from objective and subjective tests indicate that our proposed model outperforms the conventional VQVAE-based model in both intra-lingual and cross-lingual conversion tasks. The official results from Voice Conversion Challenge 2020 reveal that our proposed model achieved the highest naturalness among autoencoder-based models in both tasks. Our implementation is being made available at 1 .
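As a rough illustration of what "vector quantization blocks operating at different temporal scales" could look like, the sketch below quantizes one frame sequence at two scales using nearest-neighbour codebook lookup. It is not the authors' implementation: the average pooling used to reach the coarser scale, the codebook values, and the names `nearest_code`, `average_pool`, and `quantize_multiscale` are all assumptions made for the example.

```python
# Illustrative sketch only: not the paper's model.
# Codebooks, strides, and function names are hypothetical.

def nearest_code(vector, codebook):
    """Return the index of the codebook entry closest to `vector` (squared L2)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))


def average_pool(frames, stride):
    """Average consecutive groups of `stride` frames to get a coarser time scale."""
    pooled = []
    for start in range(0, len(frames) - stride + 1, stride):
        group = frames[start:start + stride]
        dim = len(group[0])
        pooled.append([sum(f[d] for f in group) / stride for d in range(dim)])
    return pooled


def quantize_multiscale(frames, codebooks_by_stride):
    """Quantize the same frames at several temporal scales.

    `codebooks_by_stride` maps a pooling stride to the codebook for that scale.
    Returns a dict {stride: [code indices]}.
    """
    return {
        stride: [nearest_code(v, codebook) for v in average_pool(frames, stride)]
        for stride, codebook in codebooks_by_stride.items()
    }


# Toy 2-dimensional "acoustic" frames (made-up values).
frames = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.8, 1.1]]
codebooks = {
    1: [[0.0, 0.0], [1.0, 1.0]],   # fine scale: one code per frame
    2: [[0.1, 0.05], [0.9, 1.0]],  # coarse scale: one code per two frames
}
codes = quantize_multiscale(frames, codebooks)
print(codes)
```

Each scale produces its own, shorter or longer, sequence of discrete codes. In a trained model the codebooks would be learned rather than fixed by hand.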

Absorptive Thin Film Characterization with Spectroscopic Full-field Optical Coherence Tomography

CLEO: 2013, 2013

We have developed a spectroscopic full-field optical coherence tomography system that provides ultrahigh isotropic spatial resolution. The complex refractive index and thickness of an embedded absorptive thin film were measured simultaneously.

Voice Command Control for Mobile Robots

Digital signal processing techniques and today's computing capabilities allow computers to "understand" human speech. This paper describes the EllaVoice application, a user-dependent, isolated voice command recognition tool. It was created in MATLAB, is based on dynamic programming, and could be used to control mobile robots. The paper deals with the application of selected techniques such as cross-word reference template creation and endpoint detection.
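The abstract does not say which dynamic programming method EllaVoice uses. A common choice for isolated-word, template-based recognition is dynamic time warping (DTW), so the sketch below assumes DTW purely for illustration. It is written in Python rather than MATLAB, uses one scalar feature per frame, and the templates and the names `dtw_distance` and `recognize` are hypothetical.

```python
# Illustrative sketch only: assumes DTW, which the abstract does not confirm.
# Feature values and command templates are made up for the example.

def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of scalar features."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # stretch b: a advances alone
                cost[i][j - 1],      # stretch a: b advances alone
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def recognize(utterance, templates):
    """Return the name of the reference template with the smallest DTW distance."""
    return min(templates, key=lambda name: dtw_distance(utterance, templates[name]))


templates = {
    "go": [1.0, 3.0, 2.0],
    "stop": [5.0, 5.0, 1.0, 0.0],
}
# A slower rendition of "go": same shape, different duration.
command = recognize([1.0, 1.0, 3.0, 3.0, 2.0], templates)
print(command)
```

Because DTW allows frames to be repeated during alignment, the slower utterance still matches its template with zero cost. A real system would compare multi-dimensional feature vectors per frame after endpoint detection.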

Crystalline fiber based broadband light sources

Asia Communications and Photonics Conference 2013, 2013

Few-mode crystalline fibers with broadband emissions in various wavelength ranges, covering ultraviolet to near infrared, are developed for core and clad pumping schemes. Applications using these cw and direct laser-diode-pumped sources will be addressed.

Crystal fibers based broadband emissions and lasers

2014 IEEE Photonics Conference, 2014
