kean chin - Academia.edu

Papers by kean chin

Sound source separation algorithm using phase difference and angle distribution modeling near the target

Conference of the International Speech Communication Association, 2015

Large-scale, sequence-discriminative, joint adaptive training for masking-based robust ASR

Recently, it was shown that the performance of supervised time-frequency masking based robust automatic speech recognition techniques can be improved by training them jointly with the acoustic model [1]. The system in [1], termed deep neural network based joint adaptive training, used fully-connected feed-forward deep neural networks for estimating time-frequency masks and for acoustic modeling; stacked log mel spectra were used as features and training minimized cross entropy loss. In this work, we extend such jointly trained systems in several ways. First, we use recurrent neural networks based on long short-term memory (LSTM) units; this allows the use of un-stacked features, simplifying joint optimization. Next, we use a sequence discriminative training criterion for optimizing parameters. Finally, we conduct experiments on large scale data and show that joint adaptive training can provide gains over a strong baseline. Systematic evaluations on noisy voice-search data show relativ...

Raw Multichannel Processing Using Deep Neural Networks

New Era for Robust Speech Recognition, 2017

Understanding Recurrent Neural State Using Memory Signatures

2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018

Prior information for rapid speaker adaptation

Acoustic Modeling for Google Home

Improving joint uncertainty decoding performance by predictive methods for noise robust speech recognition

2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009

Rapid joint speaker and noise compensation for robust speech recognition

2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2011

Speech recognition method

The Journal of the Acoustical Society of America, 2008

Joint Uncertainty Decoding With Predictive Methods for Noise Robust Speech Recognition

IEEE Transactions on Audio, Speech, and Language Processing, 2011

Support vector machines applied to speech pattern classification

Master's thesis, Engineering Department, Cambridge …

Support Vector Machines (SVMs) are a new approach to pattern classification. They promise to give good generalisation and have been applied to various tasks. In this project, pattern recognition using SVMs is evaluated. Specifically, SVMs will be used to classify speech patterns. ...

Constrained discriminative mapping transforms for unsupervised speaker adaptation

2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2011

Extending the decomposition algorithm for support vector machines training

Improved language modelling using bag of word pairs

Speech factorization for HMM-TTS based on cluster adaptive training

Interspeech 2012

This paper presents a novel approach to factorize and control different speech factors in HMM-based TTS systems. In this paper, cluster adaptive training (CAT) is used to factorize speaker identity and expressiveness (i.e. emotion). Within a CAT framework, each speech factor can be modelled by a different set of clusters. Users can control speaker identity and expressiveness independently by modifying the weights associated with each set. These weights are defined in a continuous space, so variations of speaker and emotion are also continuous. Additionally, given a speaker who has only neutral-style training data, the approach is able to synthesise speech with that speaker’s voice and different expressions. Lastly, the paper discusses how generalization of the basic factorization concept could allow the production of expressive speech from neutral voices for other HMM-TTS systems not based on CAT.

An initial investigation of long-term adaptation for meeting transcription

Time-frequency masking for large scale robust speech recognition

Intonational features for identifying regional accents of Italian
