Spoken language translation using automatically transcribed text in training
Related papers
The USFD Spoken Language Translation System for IWSLT 2014
arXiv, 2015
The University of Sheffield (USFD) participated in the International Workshop on Spoken Language Translation (IWSLT) in 2014. In this paper, we introduce the USFD SLT system for IWSLT. Automatic speech recognition (ASR) is achieved by two multi-pass deep neural network systems with adaptation and rescoring techniques. Machine translation (MT) is achieved by a phrase-based system. The USFD primary system incorporates state-of-the-art ASR and MT techniques and gives a BLEU score of 23.45 and 14.75 on the English-to-French and English-to-German speech-to-text translation tasks with the IWSLT 2014 data. The USFD contrastive systems explore the integration of ASR and MT by using a quality estimation system to rescore the ASR outputs, optimising towards better translation. This gives a further 0.54 and 0.26 BLEU improvement respectively on the IWSLT 2012 and 2014 evaluation data.
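The contrastive idea, reranking ASR hypotheses by how well they are expected to translate rather than by ASR confidence alone, can be sketched as follows. This is a minimal illustration, not the exact USFD formulation: the linear interpolation weight and the `qe_score` predictor are assumptions standing in for the paper's trained quality estimation system.

```python
def rescore_for_translation(nbest, qe_score, alpha=0.5):
    """Rerank ASR n-best hypotheses by mixing the ASR model score
    with a predicted translation-quality score.

    nbest    -- list of (hypothesis_text, asr_log_score) pairs
    qe_score -- callable mapping a hypothesis to an estimated
                translation quality (higher = better); assumed to be
                a trained QE regressor, as in the contrastive runs
    alpha    -- interpolation weight (illustrative, not the paper's)
    """
    def combined(hyp):
        text, asr_score = hyp
        return alpha * asr_score + (1.0 - alpha) * qe_score(text)
    return max(nbest, key=combined)

# Example: prefer the hypothesis the QE model expects to translate well.
nbest = [("the cat sat on the mat", -12.3),
         ("the cat sad on the mat", -11.9)]
best = rescore_for_translation(nbest, qe_score=lambda t: -2.0 if "sad" in t else -0.5)
print(best[0])
```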
Speech translation enhanced automatic speech recognition
IEEE Workshop on Automatic Speech Recognition and Understanding, 2005., 2005
Nowadays, official documents have to be made available in many languages, for example in the EU with its 20 official languages. The need for effective tools to aid the multitude of human translators in their work is therefore easily apparent. An ASR system that enables the human translator to speak his translation in an unrestricted manner, instead of typing it, constitutes such a tool. In this work we improve the recognition performance of such an ASR system on the target language of the human translator by taking advantage of either a written or a spoken source-language representation. To do so, machine translation techniques are used to translate between the different languages, and the ASR systems involved are then biased towards the knowledge gained. We present an iterative approach for ASR improvement and outperform our baseline system by a relative word error rate reduction of 35.8% / 29.9% in the case of a written / spoken source-language representation. Further, we show how multiple target languages, as provided for example by different simultaneous translators during European Parliament debates, can be incorporated into our system design to improve all the ASR systems involved.
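One simple way to realize such biasing, sketched here as an assumption rather than the paper's exact iterative method, is to interpolate the baseline language model with a cache distribution estimated from the machine-translated source text:

```python
from collections import Counter

def make_biased_lm(base_lm, mt_output_tokens, beta=0.3):
    """Return an LM biased towards words that appear in the MT
    rendering of the source sentence.

    base_lm          -- callable p(word | context) of the baseline LM
    mt_output_tokens -- token list of the machine-translated source text
    beta             -- cache interpolation weight (illustrative value)
    """
    cache = Counter(mt_output_tokens)
    total = max(len(mt_output_tokens), 1)

    def biased_lm(word, context):
        p_cache = cache[word] / total  # unigram cache probability
        return (1.0 - beta) * base_lm(word, context) + beta * p_cache

    return biased_lm
```

In the paper's setting this would be repeated iteratively: the biased recognizer produces a better transcript, which is translated again, which in turn rebiases the recognizer.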
Literature Survey: Spoken Language Translation
2018
Spoken language translation has received a lot of attention in the last decade or so. It enables the translation of speech signals in a source language A into text in a target language B. The problem mainly involves machine translation (MT), automatic speech recognition (ASR) and machine learning (ML). The spoken utterances are first recognized and converted to text, and this source-language text is then translated into the target language. In this paper, we start by looking at the whole flow of speech translation, covering automatic speech recognition and its techniques as well as neural machine translation. We study the coupling of the speech recognition system and the machine translation system through hypothesis selection and the features used for it, taking the output of the ASR system and feeding it to the MT system. We also look into lattice and confusion network decoding.
The RWTH Aachen speech recognition and machine translation system for IWSLT 2012
In this paper, the automatic speech recognition (ASR) and statistical machine translation (SMT) systems of RWTH Aachen University developed for the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT) 2012 are presented. We participated in the ASR (English), MT (English-French, Arabic-English, Chinese-English, German-English) and SLT (English-French) tracks. For the MT track both hierarchical and phrase-based SMT decoders are applied. A number of different techniques are evaluated in the MT and SLT tracks, including domain adaptation via data selection, translation model interpolation, phrase training for hierarchical and phrase-based systems, additional reordering model, word class language model, various Arabic and Chinese segmentation methods, postprocessing of speech recognition output with an SMT system, and system combination. By application of these methods we can show considerable improvements over the respective baseline systems.
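Among the listed techniques, "domain adaptation via data selection" is concrete enough to sketch. The criterion below is the widely used cross-entropy difference of Moore and Lewis (2010); this is an assumption, since the abstract does not name the selection method:

```python
def select_in_domain(candidates, logp_in, logp_out, threshold=0.0):
    """Moore-Lewis style data selection: keep sentences that an
    in-domain LM scores better (lower cross-entropy) than a
    general-domain LM. The threshold of 0.0 is illustrative.

    candidates        -- iterable of non-empty token lists
    logp_in, logp_out -- callables giving per-token log-probabilities
                         under the in-domain / general-domain LM
    """
    selected = []
    for sent in candidates:
        h_in = -sum(logp_in(tok) for tok in sent) / len(sent)
        h_out = -sum(logp_out(tok) for tok in sent) / len(sent)
        if h_in - h_out < threshold:  # sentence looks in-domain
            selected.append(sent)
    return selected
```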
KIT's Multilingual Speech Translation System for IWSLT 2023
arXiv, 2023
Many existing speech translation benchmarks focus on native-English speech in high-quality recording conditions, which often do not match the conditions in real-life use-cases. In this paper, we describe our speech translation system for the multilingual track of IWSLT 2023, which evaluates translation quality on scientific conference talks. The test condition features accented input speech and terminology-dense content. The task requires translation into 10 languages of varying amounts of resources. In the absence of training data from the target domain, we use a retrieval-based approach (kNN-MT) for effective adaptation (+0.8 BLEU for speech translation). We also use adapters to easily integrate incremental training data from data augmentation, and show that it matches the performance of retraining. We observe that cascaded systems are more easily adaptable towards specific target domains, due to their separate modules. Our cascaded speech system substantially outperforms its end-to-end counterpart on scientific talk translation, although their performance remains similar on TED talks.
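At decoding time, kNN-MT interpolates the base model's next-token distribution with one induced by nearest neighbours in a datastore of (decoder state, target token) pairs. A minimal NumPy sketch follows; k, the temperature and the mixing weight lam are illustrative values, not KIT's tuned settings:

```python
import numpy as np

def knn_mt_probs(query, keys, values, p_model, vocab_size,
                 k=8, temp=10.0, lam=0.5):
    """Interpolate base MT probabilities with a kNN distribution
    built from a datastore of (hidden state, target token) pairs.

    query   -- decoder hidden state at the current step, shape (d,)
    keys    -- datastore states, shape (n, d)
    values  -- datastore target-token ids (ints), shape (n,)
    p_model -- base model distribution over the vocabulary, shape (V,)
    """
    dists = np.sum((keys - query) ** 2, axis=1)   # squared L2 distances
    nn = np.argsort(dists)[:k]                    # k nearest neighbours
    weights = np.exp(-dists[nn] / temp)           # softmax over -distance
    weights /= weights.sum()
    p_knn = np.zeros(vocab_size)
    np.add.at(p_knn, values[nn], weights)         # scatter neighbour mass
    return lam * p_knn + (1.0 - lam) * p_model
```

Because adaptation lives entirely in the datastore, swapping domains only means swapping datastores, which is why such retrieval-based approaches adapt without retraining.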
Harnessing Indirect Training Data for End-to-End Automatic Speech Translation: Tricks of the Trade
2019
For automatic speech translation (AST), end-to-end approaches are outperformed by cascaded models that transcribe with automatic speech recognition (ASR), then translate with machine translation (MT). A major cause of the performance gap is that, while existing AST corpora are small, massive datasets exist for both the ASR and MT subsystems. In this work, we evaluate several data augmentation and pretraining approaches for AST, by comparing all on the same datasets. Simple data augmentation by translating ASR transcripts proves most effective on the English–French augmented LibriSpeech dataset, closing the performance gap from 8.2 to 1.4 BLEU, compared to a very strong cascade that could directly utilize copious ASR and MT data. The same end-to-end approach plus fine-tuning closes the gap on the English–Romanian MuST-C dataset from 6.7 to 3.7 BLEU. In addition to these results, we present practical recommendations for augmentation and pretraining approaches. Finally, we decrease...
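The most effective trick, turning an ASR corpus into synthetic AST training data by machine-translating its transcripts, reduces to a simple loop. A sketch under the assumption that `translate` is any trained MT system (the paper's specific model is not reproduced here):

```python
def augment_ast_corpus(asr_corpus, translate):
    """Build synthetic speech-translation triples from an ASR corpus.

    asr_corpus -- iterable of (audio, transcript) pairs
    translate  -- callable mapping source text to target text
                  (hypothetical: any trained MT system)
    Yields (audio, transcript, synthetic_translation) triples that can
    be mixed with genuine AST data when training an end-to-end model.
    """
    for audio, transcript in asr_corpus:
        yield audio, transcript, translate(transcript)
```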
Efficient language model adaptation for automatic speech recognition of spoken translations
Interspeech 2015, 2015
Direct integration of translation model (TM) probabilities into a language model (LM) with the purpose of improving automatic speech recognition (ASR) of spoken translations typically requires a number of complex operations for each sentence. Many if not all of the LM probabilities need to be updated, the model needs to be renormalized and the ASR system needs to load a new, updated LM for each sentence. In computer-aided translation environments the time loss induced by these complex operations seriously reduces the potential of ASR as an efficient input method. In this paper we present a novel LM adaptation technique that drastically reduces the complexity of each of these operations. The technique consists of LM probability updates using exponential weights based on TM probabilities for each sentence and does not enforce probability renormalization. Instead of storing each resulting language model in its entirety, we only store the update weights which also reduces disk storage and loading time during ASR. Experiments on Dutch read speech translated from English show that both disk storage and recognition time drop dramatically compared to a baseline system that employs a more conventional way of updating the LM.
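In spirit, the technique multiplies each affected LM probability by an exponential weight derived from the TM probability and skips renormalization, storing only the per-sentence weights. A rough sketch; the exponent `alpha` and the per-word TM lookup are assumptions, not the paper's exact weighting function:

```python
import math

def tm_update_weights(tm_probs, alpha=0.5):
    """Compute per-word exponential update weights from translation
    model probabilities for one source sentence. Only these weights
    are stored on disk, not a full renormalized LM.

    tm_probs -- dict mapping target words to TM probabilities
    alpha    -- scaling exponent (illustrative value)
    """
    return {w: p ** alpha for w, p in tm_probs.items()}

def adapted_log_prob(word, context, base_lm_logprob, weights):
    """Apply a stored update weight on top of the baseline LM score.
    There is deliberately no renormalization, matching the paper's
    shortcut; unseen words keep their baseline score (log 1.0 = 0)."""
    boost = math.log(weights.get(word, 1.0))
    return base_lm_logprob(word, context) + boost
```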
SPEECH-TO-TEXT TRANSLATOR USING NATURAL LANGUAGE PROCESSING (NLP)
International Journal of Engineering Applied Sciences and Technology, 2024
This paper presents a novel approach to real-time speech-to-text conversion and translation leveraging cutting-edge deep learning models. The proposed system utilizes the Wave2Vec model for efficient speech recognition and the M2M-100 model for multilingual translation, providing seamless and accurate conversion of spoken input into translated text. By exploiting contextual information and the power of unsupervised pretraining, Wave2Vec excels at capturing intricate details of speech, enabling robust and accurate transcription even in noisy environments. In parallel, the M2M-100 model is employed to translate the transcribed text into the desired target language. M2M-100 stands out for its ability to translate between any pair of 100 languages, breaking down language barriers and facilitating seamless communication across diverse linguistic backgrounds. The proposed system operates in real time, accepting audio input from users via microphones or other input devices. The Wave2Vec model processes the incoming audio stream, transcribing it into text with high accuracy and efficiency. Subsequently, the transcribed text is passed through the M2M-100 model, which translates it into the desired target language, providing the final output in near real time. Experimental results demonstrate the effectiveness and robustness of the proposed system in accurately transcribing and translating speech across various languages and accents. The system's real-time capabilities make it suitable for a wide range of applications, including multilingual communication, transcription services, language learning platforms, and more.
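Both models have readily available implementations in the Hugging Face transformers library, so the described cascade fits in a few lines. The checkpoints and the English-to-French language pair are illustrative choices, not necessarily the paper's, and true real-time operation would additionally require chunked streaming of the audio:

```python
import torch
from transformers import (Wav2Vec2Processor, Wav2Vec2ForCTC,
                          M2M100ForConditionalGeneration, M2M100Tokenizer)

# ASR stage: wav2vec 2.0 CTC transcription (checkpoint is illustrative).
processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
asr = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

def transcribe(waveform, sampling_rate=16_000):
    inputs = processor(waveform, sampling_rate=sampling_rate,
                       return_tensors="pt")
    with torch.no_grad():
        logits = asr(inputs.input_values).logits
    ids = torch.argmax(logits, dim=-1)        # greedy CTC decoding
    return processor.batch_decode(ids)[0]

# MT stage: M2M-100 many-to-many translation (English -> French here).
tok = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")
mt = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")

def translate(text, src="en", tgt="fr"):
    tok.src_lang = src
    encoded = tok(text, return_tensors="pt")
    generated = mt.generate(**encoded,
                            forced_bos_token_id=tok.get_lang_id(tgt))
    return tok.batch_decode(generated, skip_special_tokens=True)[0]

# Cascade: translated = translate(transcribe(waveform))
```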
Integrating Speech Recognition and Machine Translation: Where do We Stand?
2006
This paper describes improvements to the interface between speech recognition and machine translation. We modify two different machine translation systems to effectively process dense speech recognition lattices. In addition, we describe how to fully integrate speech translation with machine translation based on weighted finite-state transducers. With a thorough set of experiments, we show that both the acoustic model scores and the source language model positively and significantly affect the translation quality. We found consistent improvements on three different corpora compared with translations of the single-best recognition results.
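The finding that acoustic scores and the source-language model both help suggests a log-linear combination of scores over lattice paths. The toy sketch below enumerates paths explicitly with assumed weights; the paper itself performs this combination through WFST composition rather than enumeration:

```python
def best_lattice_path(paths, lambdas=(1.0, 0.6, 0.8)):
    """Pick the lattice path maximizing a log-linear combination of
    acoustic, source-LM, and translation-model scores. The weights
    are illustrative; in practice they are tuned on held-out data.

    paths -- list of dicts with 'tokens' plus 'ac', 'lm', 'tm'
             log-scores for each path through the recognition lattice
    """
    l_ac, l_lm, l_tm = lambdas
    return max(paths, key=lambda p: l_ac * p["ac"]
                                    + l_lm * p["lm"]
                                    + l_tm * p["tm"])
```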