Beyond translation memories

Hybrid data-driven models of machine translation

Machine Translation, 2005

This paper presents an extended, harmonised account of our previous work on combining subsentential alignments from phrase-based statistical machine translation (SMT) and example-based MT (EBMT) systems to create novel hybrid data-driven systems capable of outperforming the baseline SMT and EBMT systems from which they were derived. In previous work, we demonstrated that while an EBMT system is capable of outperforming a phrase-based SMT (PBSMT) system constructed from freely available resources, a hybrid ‘example-based’ SMT system incorporating marker chunks and SMT subsentential alignments is capable of outperforming both baseline translation models for French–English translation. In this paper, we show that similar gains are to be had from constructing a hybrid ‘statistical’ EBMT system. Unlike the previous research, here we use the Europarl training and test sets, which are fast becoming the standard data in the field. On these data sets, while all hybrid ‘statistical’ EBMT variants still fall short of the quality achieved by the baseline PBSMT system, we show that adding the marker chunks to create a hybrid ‘example-based’ SMT system outperforms the two baseline systems from which it is derived. Furthermore, we provide further evidence in favour of hybrid systems by adding an SMT target-language model to the EBMT system, and demonstrate that this too has a positive effect on translation quality. We also show that many of the subsentential alignments derived from the Europarl corpus are created by either the PBSMT or the EBMT system, but not by both. In sum, therefore, despite the obvious convergence of the two paradigms, the crucial differences between SMT and EBMT contribute positively to the overall translation quality. 
The central thesis of this paper is that any researcher who continues to develop an MT system using either of these approaches will benefit further from integrating the advantages of the other model; dogged adherence to one approach will lead to inferior systems being developed.
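The observation that many subsentential alignments are found by only one of the two systems can be illustrated with a small set comparison. This is a minimal sketch, not the authors' implementation: the function name `compare_alignments` and the toy French-English chunk pairs are hypothetical, and real systems would also carry probabilities or counts for each pair.

```python
def compare_alignments(smt_pairs, ebmt_pairs):
    """Split two collections of (source, target) chunk pairs into shared and unique parts.

    Both arguments are assumed to be iterables of (source, target) string tuples.
    """
    smt, ebmt = set(smt_pairs), set(ebmt_pairs)
    return {
        "both": smt & ebmt,        # pairs proposed by both systems
        "smt_only": smt - ebmt,    # pairs only the SMT aligner produced
        "ebmt_only": ebmt - smt,   # pairs only the EBMT chunker produced
        "merged": smt | ebmt,      # pooled resource a hybrid system could draw on
    }


# Toy data, invented for illustration only.
smt_pairs = [("la maison", "the house"), ("bleue", "blue")]
ebmt_pairs = [("la maison", "the house"), ("la maison bleue", "the blue house")]

result = compare_alignments(smt_pairs, ebmt_pairs)
for name, pairs in result.items():
    print(name, sorted(pairs))
```

Under these assumptions, the `merged` set is larger than either input whenever the two systems disagree, which is the property the hybrid systems described above exploit.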

On the use of statistical machine-translation techniques within a memory-based translation system (AMETRA)

2003

The goal of the AMETRA project is to build a computer-assisted translation tool from Spanish to Basque under the memory-based translation framework. The system is based on a large collection of bilingual word segments. These segments are obtained using linguistic or statistical techniques from a Spanish-Basque bilingual corpus consisting of sentences extracted from the Basque Country's official government record. One of the tasks within the global information document of the AMETRA project is to study the combination of well-known statistical techniques for the translation of short sequences and techniques for memory-based translation. In this paper, we address the problem of constructing a statistical module to deal with the task of translating segments. The task undertaken in the AMETRA project is compared with other existing translation tasks. This study includes the results of some preliminary experiments we have carried out using well-known statistical machine translation tools and techniques.

Four Generations of Machine Translation Research and Prospects for the Future

Language Interpretation and Communication, 1978

This paper begins with a description of four generations of research in machine translation: the original efforts of 1957 to 1965 and three types of surviving and sometimes competing present projects. The three types of present projects include those relying on "brute force" methods involving larger and faster computers; those based on a linguistic tradition which asserts that knowledge required for machine translation can be assimilated to the structure of a grammar-based system with a semantic component; and those stemming from artificial intelligence research, with an emphasis on knowledge structures. The paper argues that the artificial intelligence approach has the best chance of simulating the communicative abilities necessary for realistic machine translation and gives an account of how knowledge structures might cope with one of the classic problems of machine translation: that of metaphor, or "semantic boundary breaking." (AA)

Machine Translation and Multi-Lingual Text

In this paper, I look at some of the problems of Machine Translation (MT) as compared to multi-lingual text processing and text retrieval. In particular, I discuss the question of the evaluation of the output of MT systems, as opposed to the output of other NLP applications.

Recent Advances in Example-Based Machine Translation

2003

Recent Advances in Example-Based Machine Translation is of relevance to researchers and program developers in the field of Machine Translation and especially Example-Based Machine Translation, bilingual text processing and cross-linguistic information retrieval. It is also of interest to translation technologists and localisation professionals. Recent Advances in Example-Based Machine Translation fills a void, because it is the first book to tackle the issue of EBMT in depth. It gives a state-of-the-art overview of EBMT techniques and provides a coherent structure in which all aspects of EBMT are embedded. Its contributions are written by long-standing researchers in the field of MT in general, and EBMT in particular. This book can be used in graduate-level courses in machine translation and statistical NLP.

Linking translation memories with example-based machine translation

Machine Translation Summit VII, 1999

The paper reports on experiments which compare the translation output of three corpus-based MT systems: a string-based translation memory (STM), a lexeme-based translation memory (LTM) and the example-based machine translation (EBMT) system EDGAR. We use a fully automatic evaluation method to compare the output of each MT system and discuss the results. We investigate the benefits of linking different MT strategies such as TM systems and EBMT systems.
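To make the idea of a string-based translation memory concrete, the following is a minimal sketch of fuzzy lookup over stored segment pairs. It does not reproduce the STM, LTM or EDGAR systems from the paper: the function `best_tm_match`, the 0.7 threshold and the English-French example segments are all assumptions chosen for illustration, and similarity is measured with Python's standard `difflib.SequenceMatcher` rather than any metric named in the source.

```python
from difflib import SequenceMatcher


def best_tm_match(query, memory, threshold=0.7):
    """Return (source, target, score) for the stored segment most similar to query.

    memory is a list of (source, target) string pairs.
    Returns None when no stored source reaches the (assumed) similarity threshold.
    """
    best, best_score = None, 0.0
    for source, target in memory:
        # Character-level similarity in [0, 1]; a lexeme-based memory would
        # compare lemmatised tokens instead of raw strings.
        score = SequenceMatcher(None, query, source).ratio()
        if score > best_score:
            best, best_score = (source, target), score
    if best is not None and best_score >= threshold:
        return best[0], best[1], best_score
    return None


# Hypothetical memory contents, invented for illustration.
memory = [
    ("Press the red button.", "Appuyez sur le bouton rouge."),
    ("Close the door.", "Fermez la porte."),
]

match = best_tm_match("Press the green button.", memory)
print(match)
```

A memory of this kind returns the stored translation of the closest segment unchanged, whereas an EBMT system would go on to adapt the mismatched part ("red" versus "green"), which is one motivation for linking the two strategies.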

The Universität Karlsruhe translation system for the EACL-WMT 2009

Proceedings of the Fourth Workshop on Statistical Machine Translation - StatMT '09, 2009

In this paper we describe the statistical machine translation system of the Universität Karlsruhe developed for the translation task of the Fourth Workshop on Statistical Machine Translation. The state-of-the-art phrase-based SMT system is augmented with alternative word reordering and alignment mechanisms as well as optional phrase table modifications. We participate in the constrained condition of German-English and English-German as well as in the constrained condition of French-English and English-French.

Improving the Performance of an Example-Based Machine Translation System Using a Domain-specific Bilingual Lexicon

2015

In this paper, we study the impact of using a domain-specific bilingual lexicon on the performance of an Example-Based Machine Translation system. We conducted experiments for the English-French language pair on in-domain texts from Europarl (European Parliament Proceedings) and out-of-domain texts from EMEA (European Medicines Agency Documents), and we compared the results of the Example-Based Machine Translation system against those of the Statistical Machine Translation system Moses. The results show that adding a domain-specific bilingual lexicon (extracted from a parallel domain-specific corpus) to the general-purpose bilingual lexicon of the Example-Based Machine Translation system improves translation quality for both in-domain and out-of-domain texts, and that the Example-Based Machine Translation system outperforms Moses when the texts to translate are related to the specific domain.

The 5th ELTLT CONFERENCE PROCEEDINGS DECISION MAKING IN TRANSLATION AND ITS CONSEQUENCE: THE IMPACT OF TRANSLATION TECHNIQUES ON THE QUALITY

2016

Every decision a translator makes has consequences, whether it concerns micro units of translation, macro units, or components larger than those translation units. One of the decisions a translator has to make when working on a translation task stems from the fact that a single translation unit can be translated in many different ways. In deciding how micro units of translation (which range from words to sentences) are translated, using what Molina and Albir term a "translation technique", a translator has to consider several factors, as each technique he or she chooses has an effect on translation quality. This paper aims to reveal how the decisions a translator makes about which techniques to use affect the quality of the translation he or she produces. There are two possibilities: correctly chosen and employed translation techniques heigh...