The USFD Spoken Language Translation System for IWSLT 2014
Related papers
LIUM’s systems for the IWSLT 2011 Speech Translation Tasks
This paper describes the three systems developed by the LIUM for the IWSLT 2011 evaluation campaign. We participated in three of the proposed tasks, namely the Automatic Speech Recognition task (ASR), the ASR system combination task (ASR_SC) and the Spoken Language Translation task (SLT), since these tasks are all related to speech translation. We present the approaches and specificities we developed on each task.
SRPOL’s System for the IWSLT 2020 End-to-End Speech Translation Task
Proceedings of the 17th International Conference on Spoken Language Translation
This paper describes the submission to the IWSLT 2020 (Ansari et al., 2020) End-to-End speech translation task by Samsung R&D Institute, Poland. We took part in the offline End-to-End English to German TED lectures translation task. We based our solution on our last year's submission (Potapczyk et al., 2019). We used a slightly altered Transformer (Vaswani et al., 2017) architecture with a ResNet-like (He et al., 2016) convolutional layer preparing the audio input to the Transformer encoder. To improve the model's quality of translation we introduced two regularization techniques and trained on a machine-translated Librispeech (Panayotov et al., 2015) corpus in addition to the iwslt-corpus, TEDLIUM2 (Rousseau et al., 2014) and MuST-C (Di Gangi et al., 2019) corpora. Our best model scored almost 3 BLEU higher than last year's model. To segment the 2020 test set we used exactly the same procedure as last year.
KIT's Multilingual Speech Translation System for IWSLT 2023
arXiv (Cornell University), 2023
Many existing speech translation benchmarks focus on native-English speech in high-quality recording conditions, which often do not match the conditions in real-life use-cases. In this paper, we describe our speech translation system for the multilingual track of IWSLT 2023, which evaluates translation quality on scientific conference talks. The test condition features accented input speech and terminology-dense contents. The task requires translation into 10 languages with varying amounts of resources. In the absence of training data from the target domain, we use a retrieval-based approach (kNN-MT) for effective adaptation (+0.8 BLEU for speech translation). We also use adapters to easily integrate incremental training data from data augmentation, and show that it matches the performance of retraining. We observe that cascaded systems are more easily adaptable towards specific target domains, due to their separate modules. Our cascaded speech system substantially outperforms its end-to-end counterpart on scientific talk translation, although their performance remains similar on TED talks.
The RWTH Aachen speech recognition and machine translation system for IWSLT 2012
In this paper, the automatic speech recognition (ASR) and statistical machine translation (SMT) systems of RWTH Aachen University developed for the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT) 2012 are presented. We participated in the ASR (English), MT (English-French, Arabic-English, Chinese-English, German-English) and SLT (English-French) tracks. For the MT track both hierarchical and phrase-based SMT decoders are applied. A number of different techniques are evaluated in the MT and SLT tracks, including domain adaptation via data selection, translation model interpolation, phrase training for hierarchical and phrase-based systems, additional reordering model, word class language model, various Arabic and Chinese segmentation methods, postprocessing of speech recognition output with an SMT system, and system combination. By application of these methods we can show considerable improvements over the respective baseline systems.
The USFD SLT system for IWSLT 2014
The University of Sheffield (USFD) participated in the International Workshop for Spoken Language Translation (IWSLT) in 2014. In this paper, we introduce the USFD SLT system for IWSLT. Automatic speech recognition (ASR) is achieved by two multi-pass deep neural network systems with adaptation and rescoring techniques. Machine translation (MT) is achieved by a phrase-based system. The USFD primary system incorporates state-of-the-art ASR and MT techniques and gives BLEU scores of 23.45 and 14.75 on the English-to-French and English-to-German speech-to-text translation tasks, respectively, with the IWSLT 2014 data. The USFD contrastive systems explore the integration of ASR and MT by using a quality estimation system to rescore the ASR outputs, optimising towards better translation. This gives a further 0.54 and 0.26 BLEU improvement respectively on the IWSLT 2012 and 2014 evaluation data.
We describe the Microsoft Speech Language Translation (MSLT) corpus, which was created in order to evaluate end-to-end conversational speech translation quality. The corpus was created from actual conversations over Skype, and we provide details on the recording setup and the different layers of associated text data. The corpus release includes Test and Dev sets with reference transcripts for speech recognition. Additionally, cleaned up transcripts and reference translations are available for evaluation of machine translation quality. The IWSLT 2016 release described here includes the source audio, raw transcripts, cleaned up transcripts, and translations to or from English for both French and German.
Samsung's System for the IWSLT 2019 End-to-End Speech Translation Task
2019
This paper describes the submission to the IWSLT 2019 End-to-End speech translation task by Samsung R&D Institute, Poland. We decided to focus on end-to-end English to German TED lectures translation and did not provide any submission for other speech tasks. We used a slightly altered Transformer architecture with a standard convolutional layer preparing the audio input to the Transformer encoder. Additionally, we propose an audio segmentation algorithm maximizing BLEU score on the tst2015 test set.
Fine-tuning on Clean Data for End-to-End Speech Translation: FBK @ IWSLT 2018
2018
This paper describes FBK's submission to the end-to-end English-German speech translation task at IWSLT 2018. Our system relies on a state-of-the-art model based on LSTMs and CNNs, where the CNNs are used to reduce the temporal dimension of the audio input, which is in general much higher than machine translation input. Our model was trained only on the audio-to-text parallel data released for the task, and fine-tuned on cleaned subsets of the original training corpus. The addition of weight normalization and label smoothing improved the baseline system by 1.0 BLEU point on our validation set. The final submission also featured checkpoint averaging within a training run and ensemble decoding of models trained during multiple runs. On test data, our best single model obtained a BLEU score of 9.7, while the ensemble obtained a BLEU score of 10.24.
ELITR Non-Native Speech Translation at IWSLT 2020
Proceedings of the 17th International Conference on Spoken Language Translation
This paper is an ELITR system submission for the non-native speech translation task at IWSLT 2020. We describe systems for offline ASR, real-time ASR, and our cascaded approach to offline SLT and real-time SLT. We select our primary candidates from a pool of pre-existing systems, develop a new end-to-end general ASR system, and a hybrid ASR trained on non-native speech. The provided small validation set prevents us from carrying out a complex validation, but we submit all the unselected candidates for contrastive evaluation on the test set.