Document driven machine translation enhanced ASR

Speech translation enhanced automatic speech recognition

IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2005

Nowadays, official documents have to be made available in many languages; in the EU, for example, there are 20 official languages. The need for effective tools to aid the multitude of human translators in their work is therefore readily apparent. An ASR system that enables the human translator to dictate the translation in an unrestricted manner, instead of typing it, constitutes such a tool. In this work we improve the recognition performance of such an ASR system on the translator's target language by taking advantage of either a written or a spoken source language representation. To do so, machine translation techniques are used to translate between the languages involved, and the ASR systems are then biased towards the knowledge gained. We present an iterative approach for ASR improvement and outperform our baseline system by a relative word error rate reduction of 35.8% / 29.9% in the case of a written / spoken source language representation. Furthermore, we show how multiple target languages, as provided for example by different simultaneous translators during European Parliament debates, can be incorporated into our system design to improve all of the ASR systems involved.
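The biasing step described above can be illustrated with a minimal sketch: interpolating a base language model with a unigram cache built from MT hypotheses, so that words the MT system proposed become more likely during recognition. This is not the paper's actual model; the function name, data layout, and interpolation weight are all hypothetical.

```python
import math
from collections import Counter

def bias_lm(base_logprob, mt_hypotheses, weight=0.5):
    """Interpolate a base LM with a unigram cache built from MT output.

    base_logprob: dict mapping word -> log10 probability under the base LM
    mt_hypotheses: list of token lists produced by the MT system
    weight: interpolation weight for the MT-derived cache model (hypothetical)
    """
    counts = Counter(tok for hyp in mt_hypotheses for tok in hyp)
    total = sum(counts.values())
    biased = {}
    for word, lp in base_logprob.items():
        cache_p = counts.get(word, 0) / total if total else 0.0
        p = (1 - weight) * (10 ** lp) + weight * cache_p
        biased[word] = math.log10(p) if p > 0 else float("-inf")
    return biased
```

In an iterative scheme along the lines sketched in the abstract, the recognizer's output would in turn feed the MT system, and the cycle would repeat until the hypotheses stabilize.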

Spoken language translation using automatically transcribed text in training

In spoken language translation, a machine translation system takes speech as input and translates it into another language. A standard machine translation system, however, is trained on written language data and expects written language as input. In this paper we propose an approach that closes the gap between the output of automatic speech recognition and the input of machine translation by training the translation system on automatically transcribed speech. In our experiments we show improvements of up to 0.9 BLEU points on the IWSLT 2012 English-to-French speech translation task.

Integration of ASR and machine translation models in a document translation task

Interspeech 2007, 2007

This paper is concerned with the problem of machine-aided human language translation. It addresses a translation scenario in which a human translator dictates the spoken language translation of a source language text into an automatic speech dictation system. The source language text in this scenario is also presented to a statistical machine translation (SMT) system. The techniques presented in the paper assume that the optimum target language word string produced by the dictation system is modeled using the combined SMT and ASR statistical models. These techniques were evaluated on a speech corpus of human translators dictating English translations of French text obtained from transcriptions of the proceedings of the Canadian House of Commons. The combined ASR/SMT modeling techniques reduced ASR WER by 26.6% relative to the WER of an ASR system that did not incorporate SMT knowledge.
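A common way to combine ASR and SMT models, in the spirit of the abstract above, is a log-linear combination of the two scores over an N-best list. The following is only a minimal sketch under that assumption; the paper's actual combination may differ, and the interpolation weight is hypothetical.

```python
def rescore_nbest(nbest, smt_logprob, lam=0.4):
    """Pick the hypothesis maximizing a log-linear combination of
    ASR and SMT scores: (1 - lam) * asr_score + lam * smt_score.

    nbest: list of (hypothesis_string, asr_logprob) pairs
    smt_logprob: callable mapping a hypothesis to its SMT log probability
    lam: weight of the SMT model (hypothetical value)
    """
    return max(nbest, key=lambda h: (1 - lam) * h[1] + lam * smt_logprob(h[0]))[0]
```

With a well-tuned weight, hypotheses that the SMT model considers plausible translations of the source text are promoted over acoustically similar but implausible ones.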

Efficient language model adaptation for automatic speech recognition of spoken translations

Interspeech 2015, 2015

Direct integration of translation model (TM) probabilities into a language model (LM) with the purpose of improving automatic speech recognition (ASR) of spoken translations typically requires a number of complex operations for each sentence: many if not all of the LM probabilities need to be updated, the model needs to be renormalized, and the ASR system needs to load a new, updated LM for each sentence. In computer-aided translation environments, the time lost to these complex operations seriously reduces the potential of ASR as an efficient input method. In this paper we present a novel LM adaptation technique that drastically reduces the complexity of each of these operations. The technique updates LM probabilities using exponential weights based on TM probabilities for each sentence and does not enforce probability renormalization. Instead of storing each resulting language model in its entirety, we store only the update weights, which also reduces disk storage and loading time during ASR. Experiments on Dutch read speech translated from English show that both disk storage and recognition time drop dramatically compared to a baseline system that employs a more conventional way of updating the LM.
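Taken at face value, an exponential-weight update without renormalization amounts to adding a TM-derived term to the LM log probability and storing only those per-sentence weights. The sketch below is an interpretation of that idea, not the paper's implementation; the scaling factor and function names are assumptions.

```python
import math

def update_weights(tm_probs, alpha=1.0):
    """Per-sentence update weights derived from TM probabilities.

    Equivalent to scaling each LM probability by p_tm(w) ** alpha;
    only this small dict is stored, never a full adapted LM.
    tm_probs: dict word -> TM probability; alpha: scaling (hypothetical).
    """
    return {w: alpha * math.log(p) for w, p in tm_probs.items()}

def adapted_logprob(word, base_logprob, weights):
    """Apply the stored weight additively in log space; no renormalization
    is performed, trading exactness for per-sentence speed."""
    return base_logprob(word) + weights.get(word, 0.0)
```

Because only the weight dictionary changes per sentence, the base LM stays loaded across the whole document, which is where the claimed storage and loading-time savings would come from.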

The RWTH Aachen speech recognition and machine translation system for IWSLT 2012

In this paper, the automatic speech recognition (ASR) and statistical machine translation (SMT) systems developed by RWTH Aachen University for the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT) 2012 are presented. We participated in the ASR (English), MT (English-French, Arabic-English, Chinese-English, German-English) and SLT (English-French) tracks. For the MT track, both hierarchical and phrase-based SMT decoders are applied. A number of different techniques are evaluated in the MT and SLT tracks, including domain adaptation via data selection, translation model interpolation, phrase training for hierarchical and phrase-based systems, an additional reordering model, a word class language model, various Arabic and Chinese segmentation methods, postprocessing of speech recognition output with an SMT system, and system combination. By applying these methods, we show considerable improvements over the respective baseline systems.

Incorporating knowledge of source language text in a system for dictation of document translations

This paper describes methods for integrating source language and target language information for machine aided human translation (MAHT) of text documents. These methods are applied to a language translation task involving a human translator dictating a first draft translation of a source language document. A method is presented which integrates target language automatic speech recognition (ASR) models with source language statistical machine translation (SMT) and named entity recognition (NER) information at the phonetic level. Information extracted from a source language document, including translation model probabilities and translated named entities, is combined with acoustic-phonetic information obtained from phone lattices produced by the ASR system. Phone-level integration allows the combined MAHT system to correctly decode words that are either not in the ASR vocabulary or would have been incorrectly decoded by the ASR system. It is shown that the combined MAHT system r...

Interactive-Predictive Speech-Enabled Computer-Assisted Translation

2012

In this paper, we study the incorporation of statistical machine translation models into automatic speech recognition models in the framework of computer-assisted translation. The system is given a source language text to be translated and shows it to the human translator, who translates it orally. The system captures the user's speech, which is the dictation of the target language sentence. The human translator then uses an interactive-predictive process to correct the errors generated by the system. We show the efficiency of this method through a higher human productivity gain compared to the baseline systems: a pure ASR system and an integrated ASR and MT system.
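The interactive-predictive loop can be reduced to one core step: given the prefix the translator has validated (or corrected) so far, the system proposes the best completion consistent with that prefix. The following is a minimal illustration of that step over a ranked hypothesis list, not the paper's system; the fallback behavior is an assumption.

```python
def predict_completion(hypotheses, validated_prefix):
    """Interactive-predictive step: among hypotheses assumed to be sorted
    best-first, return the suffix of the best one that is consistent with
    the prefix the translator has validated so far."""
    for hyp in hypotheses:
        if hyp.startswith(validated_prefix):
            return hyp[len(validated_prefix):]
    return ""  # no consistent hypothesis; defer to the user's own typing
```

Each user keystroke extends the validated prefix, so the search space shrinks monotonically and every correction is made at most once.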

Evaluating Automatic Speech Recognition in Translation

2018

We address and evaluate the challenges of utilizing Automatic Speech Recognition (ASR) to support the human translator. Audio transcription and translation are known to be far more time-consuming than text translation, taking at least two to three times longer. Furthermore, the time needed to translate or transcribe audio depends heavily on audio quality, which can be impaired by background noise, overlapping voices, and other acoustic conditions. The purpose of this paper is to explore the integration of ASR into the translation workflow and to evaluate the challenges of utilizing ASR to support the human translator. We present several case studies in different settings in order to evaluate the benefits of ASR, with time as the primary factor in the evaluation. We show that ASR might be used effectively to assist, but not replace, the human translator in essential ways.

Literature Survey: Spoken Language Translation

2018

Spoken language translation has received a lot of attention in the last decade or so. It enables the translation of speech signals in a source language A into text in a target language B. The problem mainly involves machine translation (MT), automatic speech recognition (ASR), and machine learning (ML). The spoken utterances are first recognized and converted to text, and this source language text is then translated into the target language. In this paper, we start by looking into the whole flow of speech translation, covering automatic speech recognition and its techniques as well as neural machine translation. We study the coupling of the speech recognition system and the machine translation system through hypothesis selection and its features, taking the output of the ASR system and feeding it to the MT system. We also look into lattice and confusion network decoding.

Adaptation of lecture speech recognition system with machine translation output

2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

In spoken language translation, integration of the ASR and MT components is critical for good performance. In this paper, we consider the recognition setting where a text translation of each utterance is also available. We present experiments with different ASR system adaptation techniques that exploit MT system outputs. In particular, N-best MT outputs are represented as an utterance-specific language model, which is then used to rescore ASR lattices. We show that this method improves significantly over ASR alone, resulting in an absolute WER reduction of more than 6% for both in-domain and out-of-domain acoustic models.
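The utterance-specific LM idea above can be sketched with a unigram model estimated from the N-best MT outputs and interpolated with the ASR score during rescoring. This is a simplified illustration (the paper rescores lattices, and its LM and weights are surely richer); the smoothing constant and interpolation weight are hypothetical.

```python
import math
from collections import Counter

def mt_unigram_lm(mt_nbest, smoothing=1e-4):
    """Build an utterance-specific unigram LM from N-best MT outputs.

    mt_nbest: list of MT hypothesis strings for one utterance
    smoothing: additive mass for unseen words (hypothetical value)
    """
    counts = Counter(tok for hyp in mt_nbest for tok in hyp.split())
    total = sum(counts.values())
    def logprob(word):
        return math.log((counts.get(word, 0) + smoothing) / (total + smoothing))
    return logprob

def rescore(asr_nbest, mt_lm, lam=0.3):
    """Rescore ASR hypotheses (here an N-best list standing in for a
    lattice) by interpolating with the utterance-specific LM score."""
    def score(hyp, asr_score):
        lm = sum(mt_lm(tok) for tok in hyp.split())
        return (1 - lam) * asr_score + lam * lm
    return max(asr_nbest, key=lambda h: score(*h))[0]
```

The effect is that an acoustically competitive hypothesis whose words also appear in the MT outputs can overtake the first-pass best hypothesis.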