Self supervised learning for robust voice cloning