A Compact Data Structure for Searchable Translation Memories

2005

In this paper we describe searchable translation memories, which allow translators to search their archives for possible translations of phrases. We describe how statistical machine translation can be used to align subsentential units in a translation memory, and rank them by their probability. We detail a data structure that allows for memory-efficient storage of the index. We evaluate the accuracy of translations retrieved from a searchable translation memory built from 50,000 sentence pairs, and find a precision of 86.6% for the top ranked translations.

Augmenting a statistical translation system with a translation memory

2005

Abstract. In this paper, we present a translation memory (TM) based system to augment a statistical machine translation (SMT) system. It is used for translating sentences which have close matches in the training corpus. Given a test sentence, we first extract sentence pairs from the training corpus whose source side is similar to the test sentence. Then, the TM system modifies the translations of these sentences by a sequence of substitution, deletion and insertion operations to obtain the desired result.
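The retrieval step described above can be sketched with a toy fuzzy-lookup routine. This is a minimal illustration, not the paper's actual method: the corpus, sentences, and the use of `difflib.SequenceMatcher` as the similarity function are all assumptions made for the example.

```python
import difflib

# Hypothetical toy "training corpus" of (source, target) pairs,
# invented for illustration only.
CORPUS = [
    ("press the power button", "appuyez sur le bouton d'alimentation"),
    ("close all open windows", "fermez toutes les fenetres ouvertes"),
    ("restart the computer", "redemarrez l'ordinateur"),
]

def retrieve_closest(test_sentence, corpus):
    """Return (source, target, score) for the corpus pair whose source
    side is most similar to the test sentence. The returned translation
    would then be adapted by substitution/deletion/insertion edits."""
    def sim(pair):
        return difflib.SequenceMatcher(None, test_sentence, pair[0]).ratio()
    src, tgt = max(corpus, key=sim)
    return src, tgt, difflib.SequenceMatcher(None, test_sentence, src).ratio()

src, tgt, score = retrieve_closest("press the reset button", CORPUS)
```

In the paper the similarity function and the subsequent edit operations are more elaborate; the sketch only shows the shape of the "find the closest training pair first" step.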

A Poor Man’s Translation Memory Using Machine Translation Evaluation Metrics

2012

We propose straightforward implementations of translation memory (TM) functionality for research purposes, using machine translation evaluation metrics as similarity functions. Experiments under various conditions demonstrate the effectiveness of the approach, but also highlight problems in evaluating the results using an MT evaluation methodology.

Improving Coverage of Translation Memories with Language Modelling

In this paper, we describe and evaluate current improvements to methods for enlarging translation memories. In comparison with the previous results in 2013, we have achieved improvement in coverage by almost 35 percentage points on the same test data. The basic subsegment splitting of the translation pairs is done using Moses and (M)GIZA++ tools, which provide the subsegment translation probabilities. The obtained phrases are then combined with subsegment combination techniques and filtered by large target language models.

Translation Memory Systems Have a Long Way to Go

Proceedings of the Workshop on Human-Informed Translation and Interpreting Technology, 2017

TM systems have changed the work of translators, and translators who do not benefit from these tools are now a tiny minority. These tools mostly operate on fuzzy (surface) matching and cannot benefit from already translated texts which are synonymous with (or paraphrased versions of) the text to be translated. The match score is mostly based on character-string similarity, calculated through Levenshtein distance. TM tools have difficulties detecting similarities even in sentences which represent a minor revision of sentences already available in the translation memory. This shortcoming of current TM systems was the subject of the present study and was empirically demonstrated in the experiments we conducted. To this end, we compiled a small translation memory (English-Spanish) and applied several lexical and syntactic transformation rules to the source sentences, with both English and Spanish serving as the source language. The results of this study show that current TM systems have a long way to go and highlight the need for TM systems equipped with NLP capabilities, which would offer translators the advantage of not having to translate a sentence again if an almost identical sentence has already been translated.
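The character-level fuzzy matching criticized above can be sketched in a few lines. This is a generic illustration of the Levenshtein-based match score as commonly reported by TM tools, not the scoring formula of any particular product.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein (edit) distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def fuzzy_match_score(segment: str, tm_entry: str) -> float:
    """Character-level similarity in [0, 1]:
    1 - distance / length of the longer string."""
    longest = max(len(segment), len(tm_entry), 1)
    return 1.0 - levenshtein(segment, tm_entry) / longest
```

A paraphrase such as "shut the door" vs. "close the door" scores poorly under this metric even though the meaning is identical, which is exactly the weakness the paper demonstrates.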

Integrating machine translation into translation memory systems

EAMT Machine Translation Workshop, 1996

Within the last few years, there has been a remarkable change in the tools used at the desktops of professional translators. Whereas traditionally the keyword for the automation of the translation process was machine translation (MT), this has shifted significantly towards the use of translation memory systems. At the same time, MT systems are increasingly targeting their genuine market: non-professional users, whose main interest lies in "quick information translation". This general development does not mean that MT is no longer used at translators' desktops. Rather, it means that the role of MT in a professional environment has changed significantly. For professional translators, MT is one software component among others within a central translation memory system. MT is reduced to a "proposal machine" for worst-case situations: if no information at all is accessible, or if the source is simple enough.

Improving Translation Memory Matching and Retrieval using Paraphrases

Most current Translation Memory (TM) systems work at the string level (character or word level) and lack semantic knowledge during matching. They use a simple edit distance calculated on the surface form, or some variation of it (stem, lemma), which does not take any semantic aspects of matching into consideration. This paper presents a novel and efficient approach to incorporating semantic information, in the form of paraphrasing, into the edit-distance metric. The approach computes edit distance while considering paraphrases efficiently, using dynamic programming and greedy approximation. In addition to using automatic evaluation metrics like BLEU and METEOR, we carried out an extensive human evaluation in which we measured post-editing time, keystrokes, HTER, and HMETEOR, and administered three subjective questionnaires. Our results show that paraphrasing improves TM matching and retrieval, resulting in translation performance increases when translators use paraphrase-enhanced TMs.
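The core idea of folding paraphrases into the edit-distance metric can be sketched with a word-level distance in which a substitution between known paraphrases costs nothing. This is a simplified sketch of the idea only: the tiny paraphrase table below is invented for the example (the paper uses a large automatically extracted paraphrase resource and single-pass dynamic programming with greedy approximation over multi-word paraphrases).

```python
# Hypothetical single-word paraphrase pairs, for illustration only.
PARAPHRASES = {
    ("begin", "start"), ("start", "begin"),
    ("shut", "close"), ("close", "shut"),
}

def paraphrase_edit_distance(src: str, tm: str) -> int:
    """Word-level edit distance where a substitution between known
    paraphrases is free, so semantically equivalent segments match."""
    s, t = src.split(), tm.split()
    prev = list(range(len(t) + 1))
    for i, ws in enumerate(s, 1):
        curr = [i]
        for j, wt in enumerate(t, 1):
            if ws == wt or (ws, wt) in PARAPHRASES:
                sub = prev[j - 1]           # exact match or paraphrase: free
            else:
                sub = prev[j - 1] + 1       # true substitution
            curr.append(min(prev[j] + 1,    # deletion
                            curr[j - 1] + 1,  # insertion
                            sub))
        prev = curr
    return prev[-1]
```

Under this metric, "begin the installation" and "start the installation" are a perfect match, whereas a plain surface edit distance would penalize the substitution.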

Consistent translation using discriminative learning: a translation memory-inspired approach

2011

We present a discriminative learning method to improve the consistency of translations in phrase-based Statistical Machine Translation (SMT) systems. Our method is inspired by Translation Memory (TM) systems, which are widely used by human translators in industrial settings. We constrain the translation of an input sentence using the most similar 'translation example' retrieved from the TM. Unlike previous research, which used simple fuzzy match thresholds, these constraints are imposed using discriminative learning to optimise translation performance. We observe that this method benefits the SMT system by producing not only consistent translations but also improved translation outputs. We report a 0.9 point improvement in BLEU score on English-Chinese technical documents.

Predictive translation memory

Proceedings of the 27th annual ACM symposium on User interface software and technology, 2014

The standard approach to computer-aided language translation is post-editing: a machine generates a single translation that a human translator corrects. Recent studies have shown this simple technique to be surprisingly effective, yet it underutilizes the complementary strengths of precision-oriented humans and recall-oriented machines. We present Predictive Translation Memory, an interactive, mixed-initiative system for human language translation. Translators build translations incrementally by considering machine suggestions that update according to the user's current partial translation. In a large-scale study, we find that professional translators are slightly slower in the interactive mode yet produce slightly higher quality translations, despite significant prior experience with the baseline post-editing condition. Our analysis identifies significant predictors of time and quality, and also characterizes interactive aid usage. Subjects entered over 99% of characters via interactive aids, a significantly higher fraction than that shown in previous work.