Training acoustic models with speech data from different languages
Related papers
… European Conference on Speech Communication and …, 2005
In this paper we describe an automated, linguistic knowledge-based method for building acoustic models for a target language for which there is no native training data. The method assumes the availability of well-trained acoustic models for a number of existing source languages. It employs statistically derived phonetic and phonological distance metrics, particularly a combined phonetic-phonological (CPP) metric, defined to characterize a variety of linguistic relationships between phonemes from the source languages and a target language. Using these metrics, candidate phonemes from the source languages are automatically selected for each phoneme of the target language and acoustic models are constructed. Our experiments show that this automated method can generate acoustic models of good quality, performing far above the general phoneme symbol-based cross-language transfer strategy and reaching the performance of models generated through acoustic-distance mapping.
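To make the idea of distance-based candidate selection concrete, here is a minimal Python sketch. The binary articulatory features, the toy inventory, and the plain Hamming distance are all illustrative assumptions; the paper's actual CPP metric combines weighted phonetic and phonological terms derived statistically.

```python
# Each phoneme is described by hypothetical binary articulatory features
# (values below are illustrative, not taken from the paper).
FEATURES = ("voiced", "nasal", "plosive", "fricative", "labial", "alveolar", "velar")

SOURCE = {  # toy source-language inventory
    "b": (1, 0, 1, 0, 1, 0, 0),
    "d": (1, 0, 1, 0, 0, 1, 0),
    "s": (0, 0, 0, 1, 0, 1, 0),
    "z": (1, 0, 0, 1, 0, 1, 0),
}

def phonetic_distance(f1, f2):
    """Hamming distance over articulatory features (a stand-in for CPP)."""
    return sum(a != b for a, b in zip(f1, f2))

def best_source_phoneme(target_features):
    """Select the source phoneme closest to the target phoneme's features."""
    return min(SOURCE, key=lambda p: phonetic_distance(SOURCE[p], target_features))

# Hypothetical target phoneme: a voiced alveolar fricative.
print(best_source_phoneme((1, 0, 0, 1, 0, 1, 0)))  # → z
```

In the full method, the selected candidates (possibly several per target phoneme) would then seed the construction of the target acoustic models.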
Crosslingual acoustic model development for automatic speech recognition
2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU), 2007
In this work we discuss the development of two crosslingual acoustic model sets for automatic speech recognition (ASR). The starting point is a set of multilingual Spanish-English-German hidden Markov models (HMMs). The target languages are Slovenian and French. During the discussion, the problem of defining a multilingual phoneme set and the associated dictionary mapping is considered, and a method is described to circumvent the related problems. The impact of the acoustic source models on the performance of the target systems is analyzed in detail. Several crosslingually defined target systems are built and compared to their monolingual counterparts. It is shown that crosslingually built acoustic models clearly outperform purely monolingual models if only a limited amount of target data is available.
Crosslingual transfer of source acoustic models to two different target languages
This paper presents ongoing work on crosslingual speech recognition in the MASPER initiative. Source acoustic models were transferred to two different target languages, Hungarian and Slovenian. Besides the monolingual source acoustic models, a semi-multilingual set was also defined. An expert-knowledge approach and a data-driven method were applied for the transfer. The crosslingual speech recognition results were used to analyse the robustness of different source acoustic models with respect to the influence of language similarity.
Towards multilingual speech recognition using data driven source/target acoustical units association
2004
Multilingual speech recognition motivates the study of acoustic modeling of target-language units using the units of one or more source languages. This paper presents a study of manual and data-driven association of two possible target units with a source language's phonemes. The target units studied are words and phonemes. Algorithms for data-driven association are described. While phoneme-to-phoneme association is more practical, word transcriptions provide better results. It is shown that more precise and rich source models are better suited to determining those associations. Experiments are conducted with French as the source language and Arabic as the target language.
Proceedings of the Sixth International Conference on …, 2008
In this paper we describe an approach that both creates crosslingual acoustic monophone model sets for speech recognition tasks and objectively predicts their performance without target-language speech data or acoustic measurement techniques. This strategy is based on a series of linguistic metrics characterizing the articulatory phonetic and phonological distances of target-language phonemes from source-language phonemes. We term these algorithms the Combined Phonetic and Phonological Crosslingual Distance (CPP-CD) metric and the Combined Phonetic and Phonological Crosslingual Prediction (CPP-CP) metric. The particular motivations for this project are the current unavailability and often prohibitively high production cost of speech databases for many strategically important low-and middle-density languages.
Cross-Language Phoneme Recognition for Under-Resourced Languages
2012
In the present research, we explore several methods for transferring phoneme models from a language with trained acoustic models (the source language) to another, untrained language (the target language). One approach uses acoustic distance measures to automatically define the mapping from source to target phonemes. This is achieved by training basic models for the target language using a limited amount of training data and calculating the distance between the source models and the target models. Naturally, this approach requires some data from the target language. Another approach, which also requires some data from the target language, is to use acoustic adaptation to adapt the source-language acoustic models so that they better match the acoustic properties of the data in the target language. Phoneme recognition results of these approaches are compared to a reference recognizer that is well trained on the target language.
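The acoustic-distance mapping described above can be sketched as follows, assuming each phone is summarized by a single diagonal-covariance Gaussian and using a symmetrized KL divergence as the distance. The phone labels, feature dimensionality, and model parameters below are toy assumptions, not values from the paper.

```python
import numpy as np

def gauss_kl(m1, v1, m2, v2):
    """KL divergence KL(N1 || N2) between two diagonal-covariance Gaussians."""
    return 0.5 * np.sum(np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

def map_phones(target_models, source_models):
    """Map each target phone to the acoustically closest source phone,
    using a symmetrized KL divergence between single-Gaussian models."""
    mapping = {}
    for t, (tm, tv) in target_models.items():
        dists = {s: gauss_kl(tm, tv, sm, sv) + gauss_kl(sm, sv, tm, tv)
                 for s, (sm, sv) in source_models.items()}
        mapping[t] = min(dists, key=dists.get)
    return mapping

# Toy single-Gaussian phone models in a 2-dimensional feature space.
src = {"a": (np.array([1.0, 0.0]), np.array([1.0, 1.0])),
       "i": (np.array([-1.0, 2.0]), np.array([1.0, 1.0]))}
tgt = {"æ": (np.array([0.8, 0.3]), np.array([1.0, 1.0]))}
print(map_phones(tgt, src))  # → {'æ': 'a'}
```

The target models here stand in for the "basic models" trained on a limited amount of target-language data; real systems would use Gaussian mixtures and a distance aggregated over mixture components.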
Cross-Lingual Phone Mapping for Large Vocabulary Speech Recognition of Under-Resourced Languages
IEICE Transactions on Information and Systems, 2014
This paper presents a novel acoustic modeling technique for large-vocabulary automatic speech recognition in under-resourced languages that leverages well-trained acoustic models of other languages (called source languages). The idea is to use a source-language acoustic model to score the acoustic features of the target language, and then map these scores to the posteriors of the target phones using a classifier. The target phone posteriors are then used for decoding in the usual hybrid acoustic modeling fashion. The motivation for this strategy is that human languages usually share similar phone sets, and hence it may be easier to predict the target phone posteriors from the scores generated by source-language acoustic models than to train an under-resourced-language acoustic model from scratch. The proposed method is evaluated on the Aurora-4 task with less than 1 hour of training data. Two types of source-language acoustic models are considered, i.e., hybrid HMM/MLP and conventional HMM/GMM models. In addition, we also use triphone tied states in the mapping. Our experimental results show that by leveraging well-trained Malay and Hungarian acoustic models, we achieved a 9.0% word error rate (WER) given 55 minutes of English training data. This is close to the WER of 7.9% obtained by using the full 15 hours of training data, and much better than the WER of 14.4% obtained by conventional acoustic modeling techniques with the same 55 minutes of training data.
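The score-to-posterior mapping at the core of this approach can be sketched with synthetic data. The paper trains an MLP classifier on real source-model scores; the sketch below substitutes a single softmax layer trained by full-batch gradient descent on simulated scores and labels, with all sizes and data purely hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 20 source-model scores per frame, 10 target phones,
# 400 labelled target-language frames.
n_src, n_tgt, n_frames = 20, 10, 400

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Simulated per-frame source-model scores and target phone labels
# (stand-ins for real posteriors/log-likelihoods and transcribed frames).
src_scores = rng.random((n_frames, n_src))
true_map = rng.random((n_src, n_tgt))
labels = (src_scores @ true_map).argmax(axis=1)

# The "classifier": a single softmax layer trained with full-batch gradient
# descent on cross-entropy (the paper uses an MLP; a linear map keeps this short).
W = np.zeros((n_src, n_tgt))
onehot = np.eye(n_tgt)[labels]
for _ in range(2000):
    posteriors = softmax(src_scores @ W)
    grad = src_scores.T @ (posteriors - onehot) / n_frames
    W -= 2.0 * grad

accuracy = (softmax(src_scores @ W).argmax(axis=1) == labels).mean()
print(f"frame-level accuracy of learned phone mapping: {accuracy:.2f}")
```

At decoding time, the predicted target phone posteriors would replace the usual acoustic-model likelihoods in the hybrid HMM framework.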
Cross-language phone recognition when the target language phoneme inventory is not known
2011
Cross-language speech recognition often assumes a certain amount of knowledge about the target language. However, there are hundreds of languages for which not even the phoneme inventory is known. In the work reported here, phone recognisers are evaluated on a cross-language task with minimal knowledge of the target language. A phonetic distance measure is introduced for the evaluation, allowing a distance to be calculated between any utterance of any language. This has a number of spin-off applications such as allophone detection, a phone-based ROVER approach to recognition, and cross-language forced alignment. Results show that some of these novel approaches will be of immediate use in characterising languages for which there is little phonological knowledge.
Acoustic Model Optimization for Multilingual Speech Recognition
2008
Since abundant resources are not always available for resource-limited languages, training an acoustic model with unbalanced training data for multilingual speech recognition is an interesting research issue. In this paper, we propose a three-step data-driven phone clustering method to train a multilingual acoustic model. The first step is to obtain a clustering rule for context-independent phone models, derived from a well-trained acoustic model using a similarity measurement. In the second step, we further cluster the sub-phone units using hierarchical agglomerative clustering with the delta Bayesian information criterion, according to the clustering rules. Then, we choose a parametric modeling technique, model complexity selection, to adjust the number of Gaussian components in a Gaussian mixture in order to optimize the acoustic model with respect to the new phoneme set and the available training data. We used an unbalanced trilingual corpus where the percentages of the amounts of the training ...
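The delta-BIC merge test used in agglomerative clustering can be illustrated in a few lines, assuming each cluster of feature frames is modelled by a single full-covariance Gaussian. This is the standard ΔBIC formulation; the data below are synthetic and the penalty weight is a free parameter.

```python
import numpy as np

def delta_bic(x1, x2, lam=1.0):
    """Delta BIC for merging two clusters of feature frames, each modelled by a
    single full-covariance Gaussian. Negative values favour the merge."""
    n1, n2, d = len(x1), len(x2), x1.shape[1]
    n = n1 + n2
    logdet = lambda x: np.linalg.slogdet(np.cov(x, rowvar=False))[1]
    # Log-likelihood gain of keeping the clusters separate...
    gain = 0.5 * (n * logdet(np.vstack([x1, x2]))
                  - n1 * logdet(x1) - n2 * logdet(x2))
    # ...versus the BIC penalty for the extra Gaussian's parameters.
    penalty = 0.5 * lam * (d + 0.5 * d * (d + 1)) * np.log(n)
    return gain - penalty

rng = np.random.default_rng(1)
same = delta_bic(rng.normal(0, 1, (200, 3)), rng.normal(0, 1, (200, 3)))
diff = delta_bic(rng.normal(0, 1, (200, 3)), rng.normal(6, 1, (200, 3)))
print(f"similar clusters: {same:.1f} (merge), distinct clusters: {diff:.1f} (keep)")
```

In the full method this test would drive the bottom-up merging of sub-phone units until no pair of clusters yields a negative ΔBIC.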
Towards language independent acoustic modeling
2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100), 2000
We describe procedures and experimental results using speech from diverse source languages to build an ASR system for a single target language. This work is intended to improve ASR in languages for which large amounts of training data are not available. We have developed both knowledge based and automatic methods to map phonetic units from the source languages to the target language. We employed HMM adaptation techniques and Discriminative Model Combination to combine acoustic models from the individual source languages for recognition of speech in the target language. Experiments are described in which Czech Broadcast News is transcribed using acoustic models trained from small amounts of Czech read speech augmented by English, Spanish, Russian, and Mandarin acoustic models.