Statistical Machine Translation prototype using UN parallel documents

Statistical Machine Translation

2016

Statistical Machine Translation (SMT) systems are built from sentence-aligned bilingual data, so translation quality depends on the data provided for training, and a large parallel corpus is required. The aim of this paper is to explore SMT using the Moses toolkit to build a German-English translator, trained on a parallel corpus for this language pair. The larger the training data provided to Moses, the more accurate the translated output.

Evaluation of machine translation systems and related procedures

ARPN journal of engineering and applied sciences, 2018

Today's high volume of international information exchange spans a wide range of localities. Because each locality has its own distinctive dialect, the need for an effective means of language translation is becoming increasingly apparent. One concern of information professionals is whether an interested party can access web information offered in an unfamiliar language. Machine translation (MT), an approach within natural language processing and part of the broad field of artificial intelligence, uses software to convert documents or spoken information from one natural language into another. Recently, a substantial number of procedures have been proposed for building efficient MT systems. While these procedures proved capable in certain areas, they fell short in others. The objectives of this endeavour are to (a) conduct a thorough investigation on mach...

Findings of the 2009 workshop on statistical machine translation

Proceedings of the Fourth Workshop on Statistical Machine Translation - StatMT '09, 2009

This paper presents the results of the WMT09 shared tasks, which included a translation task, a system combination task, and an evaluation task. We conducted a large-scale manual evaluation of 87 machine translation systems and 22 system combination entries. We used the ranking of these systems to measure how strongly more than 20 automatic metrics correlate with human judgments of translation quality. We also present a new evaluation technique whereby system output is edited and judged for correctness.

2 Overview of the shared translation and system combination tasks

The workshop examined translation between English and five other languages: German, Spanish, French, Czech, and Hungarian. We created a test set for each language pair by translating newspaper articles. We additionally provided training data and a baseline system.

2.1 Test data

The test data for this year's task was created by hiring people to translate news articles drawn from a variety of sources between the end of September and mid-October 2008. A total of 136 articles were selected, in roughly equal amounts, from a variety of Czech, English, French, German, Hungarian, Italian and Spanish news sites.
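One common way to measure how well an automatic metric agrees with a human ranking of systems is Spearman's rank correlation coefficient, ρ = 1 − 6Σd² / (n(n² − 1)), where d is the difference between a system's two ranks. The sketch below is a minimal illustration assuming untied rankings and that a higher metric score is better; the function names and the four-system data are hypothetical, not WMT09 results.

```python
def scores_to_ranks(scores):
    """Convert metric scores to ranks (1 = best); assumes higher is better and no ties."""
    order = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def spearman_rho(ranks_a, ranks_b):
    """Spearman's rank correlation for two untied rankings of the same systems."""
    n = len(ranks_a)
    if n != len(ranks_b) or n < 2:
        raise ValueError("need two rankings of the same n >= 2 systems")
    # Sum of squared rank differences per system.
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n * n - 1))


if __name__ == "__main__":
    # Hypothetical data: four systems ranked by humans and scored by one metric.
    human_ranks = [1, 2, 3, 4]
    metric_scores = [0.30, 0.25, 0.27, 0.20]
    rho = spearman_rho(human_ranks, scores_to_ranks(metric_scores))
    print(f"Spearman rho = {rho:.2f}")
```

Here the metric swaps the second and third systems, so ρ = 1 − 6·2/60 = 0.8. Real evaluations with tied ranks need a tie-corrected variant.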

A Feasibility Study for Chinese-Spanish Statistical Machine Translation

This article presents and describes an experimental prototype system for performing Chinese-to-Spanish and Spanish-to-Chinese machine translation. The system is based on the statistical machine translation (SMT) framework and, more specifically, it implements the bilingual n-gram SMT approach. Since, as far as we know, no large Chinese-Spanish parallel corpus is currently available for training purposes, an alternative experimental method for building a training corpus was used. This method is compared, in terms of translation quality, to the simpler approach of using English as a bridge language for performing Chinese-to-Spanish and Spanish-to-Chinese translations.

The CMU statistical machine translation system

2003

In this paper we describe the components of our statistical machine translation system. The system combines phrase-to-phrase translations extracted from a bilingual corpus using different alignment approaches. Special methods are used to extract and align named entities. We show how a manual lexicon can be incorporated into the statistical system in an optimized way. Experiments on Chinese-to-English and Arabic-to-English translation tasks are presented.
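Phrase-to-phrase translations are typically extracted from a word-aligned sentence pair by keeping every pair of spans that is consistent with the alignment. The sketch below shows that generic consistency-based extraction; it is not necessarily the CMU system's method, it skips the usual extension over unaligned edge words, and the function name and toy German-English sentence are hypothetical.

```python
def extract_phrase_pairs(src, tgt, alignment, max_len=3):
    """Extract phrase pairs consistent with a word alignment.

    src, tgt: lists of tokens.
    alignment: set of (i, j) links, i indexes src, j indexes tgt.
    A pair of spans is consistent when it contains at least one link and
    no link connects a word inside the pair to a word outside it.
    Simplification: unaligned words at span edges are not extended over.
    """
    pairs = set()
    for s_start in range(len(src)):
        for s_end in range(s_start, min(s_start + max_len, len(src))):
            # Target positions linked to any word in the source span.
            linked = [j for (i, j) in alignment if s_start <= i <= s_end]
            if not linked:
                continue  # no alignment point inside the span
            t_start, t_end = min(linked), max(linked)
            if t_end - t_start + 1 > max_len:
                continue
            # Reject the pair if a target word in the span links outside the source span.
            if any(not (s_start <= i <= s_end)
                   for (i, j) in alignment if t_start <= j <= t_end):
                continue
            pairs.add((" ".join(src[s_start:s_end + 1]),
                       " ".join(tgt[t_start:t_end + 1])))
    return pairs


if __name__ == "__main__":
    # Hypothetical one-to-one alignment for a toy sentence pair.
    src = "das Haus ist klein".split()
    tgt = "the house is small".split()
    links = {(0, 0), (1, 1), (2, 2), (3, 3)}
    for pair in sorted(extract_phrase_pairs(src, tgt, links, max_len=2)):
        print(pair)
```

Because the target span is taken as the full range of links from the source span, only the reverse direction (target words linking back outside the source span) needs an explicit check.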

LIMSI's statistical translation systems for WMT'09

Proceedings of the Fourth Workshop on Statistical Machine Translation - StatMT '09, 2009

This paper describes our Statistical Machine Translation systems for the WMT09 (en:fr) shared task. For this evaluation, we developed four systems using two different MT toolkits. Our primary submission, in both directions, is based on Moses boosted with contextual information on phrases, and is contrasted with a conventional Moses-based system. Additional contrastive systems are based on the Ncode toolkit, one of which uses (part of) the English/French GigaWord parallel corpus.

A web-based demonstrator of a multi-lingual phrase-based translation system

Proceedings of the Eleventh Conference of the European Chapter of the Association for Computational Linguistics: Posters & Demonstrations on - EACL '06, 2006

This paper describes a multi-lingual phrase-based Statistical Machine Translation system accessible through a Web page. Users can issue translation requests from Arabic, Chinese or Spanish into English. The same phrase-based statistical technology is used for all three supported language pairs, and new language pairs can easily be added to the demonstrator. The Web-based interface makes the translation system usable from any computer connected to the Internet.

The CMU-UKA statistical machine translation systems for IWSLT 2007

2007

This paper describes the CMU-UKA statistical machine translation systems submitted to the IWSLT 2007 evaluation campaign. Systems were submitted for three language pairs: Japanese→English, Chinese→English and Arabic→English. All systems were based on a common phrase-based SMT (statistical machine translation) framework, but a specific research problem was tackled for each language pair.

An Overview of Statistical Machine Translation Tools

International Journal of Advanced Research in Computer Science and Software Engineering

Machine translation is a combination of many complex sub-processes, and the quality of the results of each sub-process, executed in a well-defined sequence, determines the overall accuracy of the translation. The Statistical Machine Translation approach treats every sentence in the target language as a possible translation of any source-language sentence. The likelihood of each candidate is expressed as a probability, and the sentence with the highest probability is chosen as the best translation. SMT is the most favoured approach not only because of its good results for corpus-rich language pairs, but also because of the tools developed for it over the past two and a half decades. The paper gives a brief introduction to SMT: its steps and the different tools available for each step.
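The selection rule described above is commonly written in the classic noisy-channel form, with f the source sentence and e a candidate target sentence. Bayes' rule splits the posterior into a translation model P(f | e) and a language model P(e); the denominator P(f) is the same for every candidate, so it does not affect the maximization:

$$
\hat{e} = \arg\max_{e} P(e \mid f) = \arg\max_{e} \frac{P(f \mid e)\,P(e)}{P(f)} = \arg\max_{e} P(f \mid e)\,P(e)
$$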