A web-based demonstrator of a multi-lingual phrase-based translation system

The CMU statistical machine translation system

2003

In this paper we describe the components of our statistical machine translation system. This system combines phrase-to-phrase translations extracted from a bilingual corpus using different alignment approaches. Special methods to extract and align named entities are used. We show how a manual lexicon can be incorporated into the statistical system in an optimized way. Experiments on Chinese-to-English and Arabic-to-English translation tasks are presented.

Analysis and System Combination of Phrase and N-Gram-Based Statistical Machine Translation Systems

2007

In the framework of the Tc-Star project, we analyze and propose a combination of two Statistical Machine Translation systems: a phrase-based and an N-gram-based one. The exhaustive analysis includes a comparison of the translation models in terms of efficiency (number of translation units used in the search and computational time) and an examination of the errors in each system's output. Additionally, we combine both systems, showing accuracy improvements.

Multi-Lingual Phrase-Based Statistical Machine Translation for Arabic-English

RANLP 2017 - Recent Advances in Natural Language Processing Meet Deep Learning, 2017

In this paper, we implement a multilingual Statistical Machine Translation (SMT) system for Arabic-English translation. Arabic text can be categorized into standard and dialectal Arabic. These two forms of Arabic differ significantly. Different mono-lingual and multilingual hybrid SMT approaches are compared. Mono-lingual systems consistently yield better translation accuracy on one Arabic form and poor accuracy on the other. Multilingual SMT models that are trained with pooled parallel MSA/dialectal data result in better accuracy. However, since the available parallel MSA data are much larger than the dialectal data, multilingual models are biased toward MSA. In this work, we propose a multilingual combination of different mono-lingual systems using an Arabic form classifier. The outcome of the classifier directs the system to use the appropriate mono-lingual models (standard, dialectal, or mixture). Testing the different SMT systems shows that the proposed classifier-based SMT system outperforms mono-lingual and data-pooled multilingual systems.
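
The classifier-based routing described in the abstract above can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the paper: the label strings, the `route_translation` function, and the toy classifier and toy "systems", which are stand-ins for a trained Arabic form classifier and trained mono-lingual SMT models.

```python
from typing import Callable, Dict

# Hypothetical labels for the output of the Arabic form classifier.
MSA, DIALECTAL, MIXED = "standard", "dialectal", "mixture"


def route_translation(
    sentence: str,
    classify: Callable[[str], str],
    systems: Dict[str, Callable[[str], str]],
) -> str:
    """Translate `sentence` with the system selected by the classifier label."""
    label = classify(sentence)
    if label not in systems:
        raise KeyError(f"no translation system registered for label {label!r}")
    return systems[label](sentence)


# Toy stand-ins (assumptions): a real setup would plug in a trained
# classifier and trained SMT decoders instead of these placeholders.
def toy_classifier(sentence: str) -> str:
    return DIALECTAL if "dialect" in sentence else MSA


toy_systems = {
    MSA: lambda s: f"[MSA model] {s}",
    DIALECTAL: lambda s: f"[dialect model] {s}",
    MIXED: lambda s: f"[mixture model] {s}",
}

routed = route_translation("a dialect sentence", toy_classifier, toy_systems)
```

Keeping the classifier and the per-form systems behind plain callables means each mono-lingual model can be trained and replaced independently of the routing logic.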

Bilingual phrases for statistical machine translation

2005

The statistical framework has proved to be very successful in machine translation. The main reason for this success is the existence of powerful techniques that allow machine translation systems to be built automatically from available parallel corpora. Most statistical machine translation approaches are based on single-word translation models, which do not take bilingual contextual information into account. The translation model in the phrase-based approach defines correspondences between sequences of contiguous source words (source segments) and sequences of contiguous target words (target segments) instead of only correspondences between single source words and single target words. That is, statistical phrase-based translation models make use of explicit bilingual contextual information. Different methods for selecting adequate bilingual word sequences and for training the parameters of these models are reviewed in this paper. Improved techniques for selecting and training model parameters are also introduced. The phrase-based approach has been assessed in different tasks using different corpora, and the results obtained are comparable to or better than those obtained using other statistical and non-statistical machine translation systems.
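
As a rough illustration of how a phrase-based model stores segment-to-segment correspondences, the sketch below estimates phrase translation probabilities by relative frequency over already-extracted phrase pairs. This is one common estimator and is assumed here for illustration only; the abstract does not say which training methods the paper reviews or introduces. The function name and the toy Spanish-English pairs are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Phrase = Tuple[str, ...]


def estimate_phrase_table(
    phrase_pairs: Iterable[Tuple[Phrase, Phrase]],
) -> Dict[Phrase, Dict[Phrase, float]]:
    """Relative-frequency estimate: p(target | source) = count(source, target) / count(source)."""
    pairs = list(phrase_pairs)  # materialize so the pairs can be counted twice
    pair_counts = Counter(pairs)
    source_counts = Counter(source for source, _ in pairs)
    table: Dict[Phrase, Dict[Phrase, float]] = defaultdict(dict)
    for (source, target), count in pair_counts.items():
        table[source][target] = count / source_counts[source]
    return dict(table)


# Toy extracted phrase pairs (assumed data, not from any corpus in the paper).
toy_pairs = [
    (("la", "casa"), ("the", "house")),
    (("la", "casa"), ("the", "house")),
    (("la", "casa"), ("the", "home")),
    (("verde",), ("green",)),
]
phrase_table = estimate_phrase_table(toy_pairs)
```

Because the keys are whole segments such as `("la", "casa")` rather than single words, the table encodes the bilingual context that single-word models lack.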

A Rule-Augmented Statistical Phrase-based Translation System

Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2014

Interactive or Incremental Statistical Machine Translation (IMT) aims to provide a mechanism that allows the statistical models involved in the translation process to be incrementally updated and improved. The source of knowledge normally comes from users who either post-edit the entire translation or just provide the translations for wrongly translated domain-specific terminology. Most of the existing work on IMT uses a batch learning paradigm, which does not allow translation systems to make use of the new input instantaneously. We introduce an adaptive MT framework with a Rule Definition Language (RDL) for users to amend MT results through translation rules or patterns. Experimental results show that our system acknowledges user feedback via RDL, which improves the translations of the baseline system on three test sets for Vietnamese-to-English translation.

Building an English-to-Arabic Statistical Machine Translation System for the ISL's Lecture

Using machines to overcome language barriers has been the main concern of many researchers over the last five decades. Among the various efforts to build an automatic translator, the statistical approach has proved especially promising. Based on machine learning methods, statistical machine translation is data-driven. Indeed, it makes it possible to build a translation system for any language pair with no need for language expertise. The Interactive Systems Laboratories already work on many SMT-based projects. This work focuses on expanding the SMT Lecture Translator by building an English-to-Arabic SMT system. The system was built using training data mainly provided by IBM and was optimized thanks to numerous measures such as data pre-processing and the generation of a more adequate language model.

Toward Building a Comprehensive Phrase-based English-Arabic Statistical Machine Translation System

The Egyptian Journal of Language Engineering, 2017

This paper explores a phrase-based statistical machine translation (PBSMT) pipeline for the English-Arabic (En-Ar) language pair. The work surveys the most recent experiments conducted to enhance Arabic machine translation in the En-Ar direction. It also focuses on free datasets and linguistically motivated ideas that enhance phrase-based En-Ar statistical machine translation (SMT), as it aims to use only those to build a large-scale En-Ar SMT system. In addition, the paper highlights Arabic linguistic challenges in Machine Translation (MT) in general. This paper can be considered a guide for building an En-Ar PBSMT system. Furthermore, the presented pipeline can be generalized to any language pair.

An Overview of Statistical Machine Translation Tools

International Journal of Advanced Research in Computer Science and Software Engineering

The machine translation process is a combination of many complex sub-processes, and the quality of the results of each sub-process, executed in a well-defined sequence, determines the overall accuracy of the translation. The Statistical Machine Translation (SMT) approach considers each sentence in the target language as a possible translation of any source-language sentence. Each candidate is assigned a probability, and the sentence with the highest probability is treated as the best translation. SMT is the most favoured approach not only because of its good results for corpus-rich language pairs, but also because of the tools with which the approach has been enhanced over the past two and a half decades. The paper gives a brief introduction to SMT: its steps and the different tools available for each step.
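
The selection rule described in the abstract above can be written compactly. The symbols are assumptions introduced here: f is the source sentence, e ranges over candidate target sentences, and ê is the chosen translation. The second equality is the standard noisy-channel decomposition via Bayes' rule, which is widely used in SMT but is not stated in the abstract itself.

```latex
\hat{e} \;=\; \arg\max_{e} \, P(e \mid f)
        \;=\; \arg\max_{e} \, P(f \mid e)\, P(e)
```

In this reading, P(f | e) plays the role of the translation model and P(e) the role of the language model; P(f) is dropped because it does not depend on e.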

A simple multilingual machine translation system

2003

The multilingual machine translation system described in the first part of this paper demonstrates that the translation memory (TM) can be used in a creative way for making the translation process more automatic (in a way which in fact does not depend on the languages used). The MT system is based upon exploitation of syntactic similarities between more or less related natural languages. It currently covers the translation from Czech to Slovak, Polish and Lithuanian. The second part of the paper also shows that one of the most popular TM-based commercial systems, TRADOS, can be used not only for the translation itself, but also for a relatively fast and natural method of evaluation of the translation quality of MT systems.

Phramer: an open source statistical phrase-based translator

Proceedings of the …, 2006

Phrase-Based Statistical Machine Translation Decoder: Phramer. The paper also presents the UTD (HLTRI) system built for the WMT06 shared task. Our goal was to improve the translation quality by enhancing the translation table and by preprocessing the source-language text.