Towards universal speech recognition
Related papers
A study of multilingual speech recognition
1997
This paper describes our work in developing multilingual (Swedish and English) speech recognition systems in the ATIS domain. The acoustic component of the multilingual systems is realized through sharing Gaussian codebooks across Swedish and English allophones. The language model (LM) components are constructed by training a statistical bigram model, with a common backoff node, on bilingual texts, and by combining two monolingual LMs into a probabilistic finite state grammar. This system uses a single decoder for Swedish and English sentences, and is capable of recognizing sentences with words from both languages. Preliminary experiments show that sharing acoustic models across the two languages has not resulted in improved performance, while sharing a backoff node at the LM component provides flexibility and ease in recognizing bilingual sentences at the expense of a slight increase in word error rate in some cases. As a by-product, the bilingual decoder also achieves good performance on language identification (LID).
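To make the common-backoff-node construction concrete, here is a minimal Python sketch of a bilingual bigram LM in which unseen word pairs back off through a node shared by both languages, which is what lets a single decoder cross from Swedish into English mid-sentence. The vocabulary, probabilities, and backoff weight are hypothetical placeholders, not the paper's estimates.

```python
# Minimal sketch of a bilingual bigram LM joined through a common
# backoff node. All probabilities below are made-up placeholders.

BIGRAM = {                      # P(w2 | w1), seen bigrams only
    ("show", "flights"): 0.20,  # English ATIS-style bigram
    ("visa", "flyg"): 0.25,     # Swedish ATIS-style bigram
}
UNIGRAM = {"flights": 0.01, "flyg": 0.008, "show": 0.02, "visa": 0.015}
BACKOFF_WEIGHT = 0.1            # mass reserved for the shared backoff node

def bigram_prob(w1, w2):
    """Score a word pair; unseen pairs back off through the common
    node, which makes mid-sentence language switches possible."""
    if (w1, w2) in BIGRAM:
        return BIGRAM[(w1, w2)]
    # Common backoff node: any word of either language can follow,
    # weighted by its unigram probability.
    return BACKOFF_WEIGHT * UNIGRAM.get(w2, 1e-6)

print(bigram_prob("show", "flights"))  # seen English bigram
print(bigram_prob("show", "flyg"))     # cross-language pair via backoff
```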
Multilingual speech recognition
2000
The speech-to-speech translation system Verbmobil requires a multilingual setting: recognition engines for the three languages German, English, and Japanese run in one common framework, together with a language identification component that switches between the recognizers. This article describes the challenges of multilingual speech recognition and presents different solutions to the automatic language identification task.
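As a schematic of how an LID component can switch between per-language recognizers, the sketch below routes each utterance to one of three engines. The scoring function and recognizers are hypothetical stubs, not Verbmobil's actual components.

```python
# Sketch of an LID-switched multilingual framework. The LID scores
# and the recognizers are stubs standing in for real engines.

def lid_scores(audio):
    """Hypothetical LID component: one score per language. A real
    system would use acoustic or phonotactic LID models."""
    return {"de": 0.2, "en": 0.7, "ja": 0.1}   # stub values

RECOGNIZERS = {
    "de": lambda audio: "<German hypothesis>",
    "en": lambda audio: "<English hypothesis>",
    "ja": lambda audio: "<Japanese hypothesis>",
}

def transcribe(audio):
    """Pick the most likely language, then run only that recognizer."""
    scores = lid_scores(audio)
    lang = max(scores, key=scores.get)
    return lang, RECOGNIZERS[lang](audio)

print(transcribe(audio=None))   # ('en', '<English hypothesis>')
```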
Multilingual and Crosslingual Speech Recognition
1998
This paper describes the design of a multilingual speech recognizer using an LVCSR dictation database which has been collected under the project GlobalPhone. This project at the University of Karlsruhe investigates LVCSR systems in 15 languages of the world, namely Arabic, Chinese, Croatian, English, French, German, Italian, Japanese, Korean, Portuguese, Russian, Spanish, Swedish, Tamil, and Turkish. For our experiments we used six of these languages to train and test several recognition engines in monolingual, multilingual and crosslingual setups. Based on a global phoneme set we built a multilingual speech recognition system which can handle five different languages. The acoustic models of the five languages are combined into a monolithic system and context dependent phoneme models are created using language questions.
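A minimal sketch of how a global phoneme set can be assembled: language-specific phones that share an IPA symbol are merged into one multilingual unit, which is the prerequisite for pooling acoustic models across languages. The tiny inventories below are illustrative, not GlobalPhone's actual phone sets.

```python
# Sketch of building a global phoneme set by merging phones that
# share an IPA symbol. Inventories are illustrative placeholders.

INVENTORIES = {
    "german":  {"a:": "aː", "sch": "ʃ", "t": "t"},
    "english": {"sh": "ʃ", "t": "t", "iy": "iː"},
    "spanish": {"t": "t", "rr": "r"},
}

def global_phoneme_set(inventories):
    """Map IPA symbol -> set of (language, phone) sharing that symbol."""
    merged = {}
    for lang, phones in inventories.items():
        for phone, ipa in phones.items():
            merged.setdefault(ipa, set()).add((lang, phone))
    return merged

for ipa, members in global_phoneme_set(INVENTORIES).items():
    shared = "(shared)" if len(members) > 1 else ""
    print(ipa, sorted(members), shared)   # 't' and 'ʃ' become shared units
```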
Towards Language-Universal Mandarin-English Speech Recognition
Interspeech 2019, 2019
Multilingual and code-switching speech recognition are two challenging tasks that have been studied separately in many previous works. In this work, we study the multilingual and code-switching problems jointly and present a language-universal bilingual system for Mandarin-English speech recognition. Specifically, we propose a novel bilingual acoustic model, which consists of two subnets initialized from monolingual systems and a shared output layer corresponding to the Character-Subword acoustic modeling units. The bilingual acoustic model is trained on a large Mandarin-English corpus with CTC and sMBR criteria. We find that this model, which is given no information about language identity, achieves performance comparable to well-trained language-specific Mandarin and English ASR systems on monolingual Mandarin and English test sets, respectively. More importantly, the proposed bilingual model automatically learns language switching. Experimental results on a Mandarin-English code-switching test set show that it achieves 11.8% and 17.9% relative error reduction on the Mandarin and English parts, respectively.
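A minimal PyTorch sketch of the described architecture follows: two language subnets feed one shared output layer over the joint Character-Subword inventory, with no language-identity input. All dimensions are invented, and combining the subnets by concatenation is an assumption here, since the abstract does not specify how the subnets feed the shared layer.

```python
# Sketch of a bilingual acoustic model: two subnets, one shared
# output layer, CTC-ready log-probabilities. Sizes are made up.

import torch
import torch.nn as nn

class BilingualCTCModel(nn.Module):
    def __init__(self, feat_dim=80, hidden=256, num_units=5000):
        super().__init__()
        # One subnet per language; the paper initializes these from
        # well-trained monolingual Mandarin and English systems.
        self.subnet_zh = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.subnet_en = nn.LSTM(feat_dim, hidden, batch_first=True)
        # Shared output layer over Mandarin characters + English
        # subwords (+1 for the CTC blank symbol).
        self.output = nn.Linear(2 * hidden, num_units + 1)

    def forward(self, feats):
        h_zh, _ = self.subnet_zh(feats)
        h_en, _ = self.subnet_en(feats)
        # Concatenating both subnets' views of the same frames is an
        # assumption; no language identity is provided anywhere.
        joint = torch.cat([h_zh, h_en], dim=-1)
        return self.output(joint).log_softmax(-1)

model = BilingualCTCModel()
frames = torch.randn(2, 100, 80)   # (batch, time, features)
log_probs = model(frames)          # suitable input for nn.CTCLoss
print(log_probs.shape)             # torch.Size([2, 100, 5001])
```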
Multilingual speech recognition in seven languages
Speech Communication, 2001
In this study we present approaches to multilingual speech recognition. We first define different approaches, namely portation, cross-lingual and simultaneous multilingual speech recognition. We then show experiments performed in these fields of multilingual speech recognition. In recent years we have ported our recognizer to languages other than German (Italian, Slovak, Slovenian, Czech, English, Japanese). We found that some languages achieve higher recognition performance on comparable tasks and are thus easier for automatic speech recognition than others. Furthermore, we present experiments which show the performance of cross-lingual speech recognition of an untrained language with a recognizer trained on other languages. The substitution of phones is important for cross-lingual and simultaneous multilingual recognition. We compared results in cross-lingual recognition for different baseline systems and found that the number of shared acoustic units is very important for performance. With simultaneous multilingual recognition, performance usually decreases compared to monolingual recognition. In a few cases, however, such as non-native speech, recognition can even be improved.
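A toy illustration of the phone-substitution step that cross-lingual recognition depends on: each phone of the untrained target language is kept if the source inventory shares it, and otherwise replaced by a preferred substitute. The inventories and substitution table below are hypothetical; real systems derive them from IPA features or acoustic distances.

```python
# Sketch of phone substitution for cross-lingual recognition.
# Inventories and substitution preferences are invented examples.

SOURCE_PHONES = {"a", "e", "i", "o", "u", "s", "ʃ", "t", "d"}

# Hypothetical substitution preferences for a few target phones.
SUBSTITUTION = {"θ": ["s", "t"], "ð": ["d"], "æ": ["e", "a"]}

def map_phone(phone):
    """Keep shared phones; otherwise fall back to listed substitutes."""
    if phone in SOURCE_PHONES:
        return phone
    for candidate in SUBSTITUTION.get(phone, []):
        if candidate in SOURCE_PHONES:
            return candidate
    return None   # no acoustic model available for this phone

print([map_phone(p) for p in ["θ", "æ", "i", "ŋ"]])
# ['s', 'e', 'i', None] -- the more phones the languages share,
# the fewer lossy substitutions are needed.
```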
Multilingual Speech Recognition and Language Identification
Automatic speech recognition (ASR) is an important technology for enabling and improving human-human and human-computer interaction, and today it is mature enough to be useful in many applications. In the multilingual ASR presented here, both language identification (LID) and ASR are based on DNNs. Running DNN-based recognizers for many languages in parallel avoids the extra latency that an early language decision would introduce, and benefits from the recognizers' scores to better decide which result to return to the user. These benefits come, however, with an increased processing cost, since the input is recognized multiple times. This architecture supports multiple languages, allowing users to interact naturally with the system in several languages.
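A schematic of that parallel-recognition architecture: every recognizer decodes the utterance and the best-scoring result is returned, so no early hard language decision is needed, at the price of decoding the input once per language. The recognizers and scores below are hypothetical stubs.

```python
# Sketch of parallel multilingual recognition with score-based
# selection. Recognizers return (hypothesis, score) stub pairs.

def recognize_all(audio, recognizers):
    """Decode with every recognizer; return the highest-scoring
    result. Cost grows with the number of languages."""
    results = {lang: rec(audio) for lang, rec in recognizers.items()}
    best_lang = max(results, key=lambda lang: results[lang][1])
    return best_lang, results[best_lang]

RECOGNIZERS = {
    "en": lambda a: ("hello world", -120.5),   # stub hypotheses/scores
    "fr": lambda a: ("allo monde", -150.2),
}

print(recognize_all(audio=None, recognizers=RECOGNIZERS))
# ('en', ('hello world', -120.5)) -- the English result wins on score.
```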
Language-independent and language-adaptive acoustic modeling for speech recognition
Speech Communication, 2001
With the distribution of speech technology products all over the world, the portability to new target languages becomes a practical concern. As a consequence our research focuses on the question of how to port LVCSR systems in a fast and efficient way. More specifically we want to estimate acoustic models for a new target language using speech data from varied source languages, but only limited data from the target language. For this purpose we introduce different methods for multilingual acoustic model combination and a polyphone decision tree specialization procedure. Recognition results using language-dependent, language-independent and language-adaptive acoustic models are presented and discussed in the framework of our GlobalPhone project, which investigates LVCSR systems in 15 languages.
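A heavily reduced sketch of what the combination strategies mean at the level of the model inventory: pooling data across languages gives one shared model per IPA phone (language-independent), while keeping a language tag gives one model per (language, phone) pair with no cross-language sharing. The data is fake, and the real methods operate on acoustic model parameters rather than labels.

```python
# Sketch contrasting two multilingual acoustic-model inventories:
# shared per-IPA-phone models versus per-(language, phone) models.

TRAINING_TOKENS = [
    ("german", "ʃ"), ("english", "ʃ"), ("german", "t"),
    ("english", "t"), ("spanish", "t"),
]

def model_inventory(tokens, language_independent=True):
    """Return the set of acoustic models implied by each strategy."""
    if language_independent:
        # One shared model per IPA phone, trained on all languages.
        return {phone for _, phone in tokens}
    # One model per (language, phone): no cross-language sharing.
    return set(tokens)

print(sorted(model_inventory(TRAINING_TOKENS, True)))   # ['t', 'ʃ']
print(sorted(model_inventory(TRAINING_TOKENS, False)))  # five tagged models
```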
Multi-lingual speech recognition system for speech-to-speech translation
2004
This paper describes the speech recognition module of the speech-to-speech translation system currently being developed at ATR. It is a multilingual large-vocabulary continuous speech recognition system supporting the Japanese, English and Chinese languages. A corpus-based statistical approach was adopted for the system design. The database we collected consists of more than 600,000 sentences covering a broad range of travel-related conversations in each of the three languages.
Cross-lingual speech recognition under runtime resource constraints
2009
This paper proposes and compares four cross-lingual and bilingual automatic speech recognition techniques under the constraints of limited memory size and CPU speed. The first three techniques fall into the category of lexicon conversion, where each phoneme sequence (PHS) in the foreign-language (FL) lexicon is mapped into a native-language (NL) phoneme sequence. The first technique determines the PHS mapping through international phonetic alphabet (IPA) features; the second and third techniques are data-driven: they determine the mapping by converting the PHS into corresponding context-independent and context-dependent hidden Markov models (HMMs), respectively, and searching for the NL PHS with the least Kullback-Leibler divergence (KLD) between the HMMs. The fourth technique falls into the category of acoustic-model (AM) merging, where the FL's AM is merged into the NL's AM by mapping each senone in the FL's AM to the senone in the NL's AM with the minimum KLD. We discuss the strengths and limitations of each technique, report empirical evaluation results on recognizing English utterances with a Korean recognizer, and demonstrate the high correlation between the average KLD and the word error rate (WER). The results show that the AM merging technique performs best, achieving a 60% relative WER reduction over the IPA-based technique.
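A small worked example of the KLD-based mapping idea: each foreign-language phone model is assigned the native-language phone whose distribution is nearest in KL divergence. Here single diagonal Gaussians stand in for the HMM state distributions (the paper maps full HMMs and senones), the closed form below is the standard one for diagonal Gaussians, and all parameters are invented.

```python
# Sketch of minimum-KLD phone mapping with diagonal Gaussians as
# stand-ins for HMM state distributions. Parameters are made up.

import numpy as np

def kld_diag_gauss(mu0, var0, mu1, var1):
    """KL(N0 || N1) for diagonal Gaussians, in closed form."""
    return 0.5 * np.sum(
        np.log(var1 / var0) + (var0 + (mu0 - mu1) ** 2) / var1 - 1.0
    )

# Hypothetical 2-dim acoustic models: phone -> (mean, variance).
FL_MODELS = {"θ": (np.array([1.0, 0.2]), np.array([0.5, 0.3]))}
NL_MODELS = {
    "s": (np.array([1.1, 0.1]), np.array([0.4, 0.3])),
    "t": (np.array([3.0, 2.0]), np.array([0.6, 0.5])),
}

for fl_phone, (mu0, var0) in FL_MODELS.items():
    best = min(NL_MODELS, key=lambda p: kld_diag_gauss(mu0, var0, *NL_MODELS[p]))
    print(fl_phone, "->", best)   # 'θ' maps to 's', its nearest model
```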