Universal Reordering via Linguistic Typology (Daiber)

A Lexicalized Hierarchical Reordering Model for Phrase-Based Translation

Procedia - Social and Behavioral Sciences, 2011

In this paper, we present a reordering model based on Maximum Entropy with local and non-local features. This model extends a hierarchical reordering model for PBSMT [1] by integrating rich syntactic information directly into the decoder as local and non-local features of a Maximum Entropy model. The advantages of this model are (1) maintaining the strength of the phrase-based approach with a hierarchical reordering model, and (2) integrating many kinds of rich linguistic information into PBSMT as local and non-local features of the Maximum Entropy model. Experimental results on the English-Vietnamese pair show that our approach achieves significant improvements over a system that uses a lexicalized hierarchical reordering model.

Novel Reordering Approaches in Phrase-Based Statistical Machine Translation

2005

This paper presents novel approaches to reordering in phrase-based statistical machine translation. We perform consistent reordering of source sentences in training and estimate a statistical translation model. Using this model, we follow a phrase-based monotonic machine translation approach, for which we develop an efficient and flexible reordering framework that allows different reordering constraints to be introduced easily. In translation, we apply source sentence reordering at the word level and use a reordering automaton as input. We show how to compute reordering automata on demand using IBM or ITG constraints, and also introduce two new types of reordering constraints. We further add weights to the reordering automata. We present detailed experimental results and show that reordering significantly improves translation quality.
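The IBM constraint mentioned in this abstract has a compact characterization: at each step, the next source position to cover must be among the first k positions that are still uncovered. As a rough illustration of the constraint itself (not the paper's implementation, which compiles such constraints into weighted reordering automata), a minimal Python sketch that enumerates the permutations it admits:

```python
def ibm_permutations(n, k=2):
    """Enumerate permutations of source positions 0..n-1 allowed under
    IBM reordering constraints: at each step the next position chosen
    must be among the first k still-uncovered positions."""
    def extend(prefix, remaining):
        if not remaining:
            yield tuple(prefix)
            return
        # Only the first k uncovered positions are reachable next.
        for pos in remaining[:k]:
            rest = [p for p in remaining if p != pos]
            yield from extend(prefix + [pos], rest)
    yield from extend([], list(range(n)))
```

With k equal to the sentence length this degenerates to all n! permutations; small k keeps the search space (and the corresponding automaton) tractable.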

Shift-Reduce Word Reordering for Machine Translation

This paper presents a novel word reordering model that employs a shift-reduce parser for inversion transduction grammars. Our model uses rich syntactic parsing features for word reordering and runs in linear time. We apply it to postordering for phrase-based machine translation (PBMT) on Japanese-to-English patent tasks. Our experimental results show that our method achieves a significant improvement of +3.1 BLEU over the baseline PBMT system's score of 30.15 BLEU.

Analyzing the potential of source sentence reordering in statistical machine translation

2013

We analyze the performance of source sentence reordering, a common reordering approach, using oracle experiments on German-English and English-German translation. First, we show that the potential of this approach is very promising. Compared to a monotone translation, the optimally reordered source sentence leads to improvements of up to 4.6 and 6.2 BLEU points, depending on the language. Furthermore, we perform a detailed evaluation of the different aspects of the approach. We analyze the impact of restricting the search space with reordering lattices, and we show that using more complex rule types for reordering results in a better approximation of the optimally reordered source. However, a gap of about 3 to 3.8 BLEU points remains, presenting a promising perspective for research on extending the search space through better reordering rules. When evaluating the ranking of different reordering variants, the results reveal that the search for the best path in the lattice perfo...

Examining the Relationship between Preordering and Word Order Freedom in Machine Translation

Proceedings of the First Conference on Machine Translation: Volume 1, Research Papers

We study the relationship between word order freedom and preordering in statistical machine translation. To assess word order freedom, we first introduce a novel entropy measure which quantifies how difficult it is to predict word order given a source sentence and its syntactic analysis. We then address preordering for two target languages at the far ends of the word order freedom spectrum, German and Japanese, and argue that for languages with more word order freedom, attempting to predict a unique word order given source clues only is less justified. Subsequently, we examine lattices of n-best word order predictions as a unified representation for languages from across this broad spectrum and present an effective solution to a resulting technical issue, namely how to select a suitable source word order from the lattice during training. Our experiments show that lattices are crucial for good empirical performance for languages with freer word order (English-German) and can provide additional improvements for fixed word order languages (English-Japanese).
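The entropy measure itself is not spelled out in this summary; a minimal sketch of the general idea, assuming we already have a model's probability distribution over candidate target word orders for a sentence, uses Shannon entropy so that freer word order yields higher values:

```python
import math

def order_entropy(order_probs):
    """Shannon entropy (in bits) of a distribution over candidate word
    orders for one sentence. Higher entropy means the target word order
    is harder to predict from the source, i.e. freer word order.
    `order_probs` maps each candidate order (here a tuple of source
    positions) to its probability."""
    return -sum(p * math.log2(p) for p in order_probs.values() if p > 0)

# A near-deterministic order (fixed-order language) has low entropy;
# a uniform choice among four orders (free word order) has 2 bits.
fixed = order_entropy({(0, 1, 2): 0.95, (0, 2, 1): 0.05})
free = order_entropy({o: 0.25 for o in [(0, 1, 2), (0, 2, 1), (1, 0, 2), (2, 0, 1)]})
```

Under this reading, the paper's argument is that when entropy is high (as for German), committing to a single predicted order is less justified than keeping n-best orders in a lattice.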

Handling phrase reorderings for machine translation

Proceedings of the ACL-IJCNLP 2009 Conference Short Papers (ACL-IJCNLP '09), 2009

We propose a distance phrase reordering model (DPR) for statistical machine translation (SMT), where the aim is to capture phrase reorderings using a structure learning framework. On both the reordering classification and a Chinese-to-English translation task, we show improved performance over a baseline SMT system.

Syntax- and semantic-based reordering in hierarchical phrase-based statistical machine translation

Expert Systems with Applications, 2017

We present a syntax-based reordering model (RM) for hierarchical phrase-based statistical machine translation (HPB-SMT) enriched with semantic features. Our model brings a number of novel contributions: (i) while the previous dependency-based RM is limited to the reordering of head and dependent constituent pairs, we also model the reordering of pairs of dependents; (ii) our model is enriched with semantic features (WordNet synsets) in order to allow the reordering model to generalize to pairs not seen in training but with equivalent meaning; (iii) we evaluate our model on two language directions: English-to-Farsi and English-to-Turkish. These language pairs are particularly challenging due to the free word order, rich morphology, and lack of resources of the target languages. We evaluate our RM both intrinsically (accuracy of the RM classifier) and extrinsically (MT). Our best configuration outperforms the baseline classifier by 5-29% on pairs of dependents and by 12-30% on head and dependent pairs, while the improvement on MT ranges between 1.6% and 5.5% relative in terms of BLEU, depending on language pair and domain. We also analyze the feature weights to obtain further insights into the impact of the reordering-related features in the HPB-SMT model. We observe that the features of our RM are assigned significant weights and that they are complementary to the reordering feature included by default in the HPB-SMT model.

Syntactic Based Reordering Rules for Chinese-to-Japanese Machine Translation

2012

In Statistical Machine Translation (SMT), reordering rules have proved effective in extracting bilingual phrases and in decoding when translating between languages whose word orders are structurally different. Researchers have tackled the reordering problem in multiple ways. One basic idea is preordering (Xia and McCord, 2004; Collins et al., 2005), that is, to pre-order the source sentences to follow the word order of the target sentences before decoding. For example, making use of a source dependency parser, Xu et al. (2009) manually created dependency-to-string pre-ordering rules for translating English into five SOV (Subject-Object-Verb) languages. Later, dependency-tree-based preordering rules were automatically extracted by Genzel (2010) from word-aligned parallel sentences. In this work, we focus on Chinese-to-Japanese translation, motivated by the need to construct a direct machine translation system without using a pivot language. Chinese and Japanese involve s...

A Novel Reordering Model for Statistical Machine Translation

Word reordering is one of the fundamental problems of machine translation and an important factor in its quality and efficiency. In this paper, we introduce a novel reordering model based on an innovative structure named the phrasal dependency tree, which incorporates syntactic and statistical information in the context of a log-linear model. The phrasal dependency tree is a new syntactic structure based on dependency relations between contiguous non-syntactic phrases. In comparison with well-known reordering models such as the distortion, lexicalized, and hierarchical models, our experimental study demonstrates the superiority of our model across different evaluation measures. We evaluated the proposed model on a Persian-English SMT system. On average, our model achieved a significant improvement in precision with comparable recall with respect to the lexicalized and distortion models, and is found to be effective for medium- and long-distance reordering.

A simple and effective hierarchical phrase reordering model

Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '08), 2008

While phrase-based statistical machine translation systems currently deliver state-of-the-art performance, they remain weak on word order changes. Current phrase reordering models can properly handle swaps between adjacent phrases, but they typically lack the ability to perform the kind of long-distance reorderings possible with syntax-based systems. In this paper, we present a novel hierarchical phrase reordering model aimed at improving non-local reorderings, which seamlessly integrates with a standard phrase-based system with little loss of computational efficiency. We show that this model can successfully handle the key examples often used to motivate syntax-based systems, such as the rotation of a prepositional phrase around a noun phrase. We contrast our model with reordering models commonly used in phrase-based systems, and show that our approach provides statistically significant BLEU point gains for two language pairs: Chinese-English (+0.53 on MT05 and +0.71 on MT08) and Arabic-English (+0.55 on MT05).
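The orientation classes such models score (monotone, swap, discontinuous) can be illustrated with a small sketch. The span-based classification below follows the usual phrase-level definition; it is an illustration under that standard definition, not the authors' code, and the hierarchical variant additionally merges adjacent phrases into larger blocks before classifying, which turns many discontinuous cases into monotone or swap:

```python
def orientation(prev_src_span, cur_src_span):
    """Classify the reordering orientation of the current phrase
    relative to the previously translated phrase. Source-side spans
    are (start, end) pairs with end exclusive:
      'M' (monotone)      : current phrase directly follows the previous one
      'S' (swap)          : current phrase directly precedes the previous one
      'D' (discontinuous) : anything else
    """
    p_start, p_end = prev_src_span
    c_start, c_end = cur_src_span
    if c_start == p_end:
        return 'M'
    if c_end == p_start:
        return 'S'
    return 'D'
```

For example, translating source spans (0, 2) then (2, 4) is monotone, while (2, 4) then (0, 2) is a swap; a gap between the spans is discontinuous under the flat model but may collapse once neighbors are merged hierarchically.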