2 PETs and the Hidden Treebank
Related papers
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015
We present a novel approach for unsupervised induction of a Reordering Grammar using a modified form of permutation trees (Zhang and Gildea, 2007), which we apply to preordering in phrase-based machine translation. Unlike previous approaches, we induce both the hierarchical structure and the transduction function over it from word-aligned parallel corpora in one step. Furthermore, our model (1) handles non-ITG reordering patterns (up to 5-ary branching), (2) is learned from all derivations by treating not only labeling but also bracketing as a latent variable, (3) is entirely unlexicalized at the level of reordering rules, and (4) requires no linguistic annotation. Our model is evaluated both for accuracy in predicting target order and for its impact on translation quality. We report significant performance gains over phrase reordering and over two known preordering baselines for English-Japanese.
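The permutation trees underlying this model can be illustrated with a small shift-reduce routine: push each target position, then merge the smallest suffix of the stack whose positions form a contiguous range. This is a simplified sketch of PET construction, not the paper's induction algorithm; the nested-tuple tree representation is our own assumption.

```python
def pet(perm):
    """Build a permutation tree for perm, a permutation of 0..n-1.

    Shift-reduce: push leaves; after each push, merge the smallest
    suffix of the stack whose covered values form a contiguous range.
    Prime (non-ITG) permutations like (1, 3, 0, 2) yield nodes with
    arity > 2, matching the up-to-5-ary branching the model allows.
    """
    stack = []  # items: (lo, hi, size, subtree); each covers values lo..hi
    for v in perm:
        stack.append((v, v, 1, v))
        merged = True
        while merged:
            merged = False
            lo, hi, size = stack[-1][0], stack[-1][1], stack[-1][2]
            # scan downward for the smallest mergeable suffix
            for j in range(len(stack) - 2, -1, -1):
                lo = min(lo, stack[j][0])
                hi = max(hi, stack[j][1])
                size += stack[j][2]
                if hi - lo + 1 == size:  # items j..top cover a contiguous range
                    kids = tuple(t for (_, _, _, t) in stack[j:])
                    del stack[j:]
                    stack.append((lo, hi, size, kids))
                    merged = True
                    break
    assert len(stack) == 1, "input must be a permutation of 0..n-1"
    return stack[0][3]
```

For a monotone or ITG-style permutation the routine produces nested binary nodes, e.g. `pet([2, 1, 3, 0])` gives `(((2, 1), 3), 0)`, while the prime permutation `[1, 3, 0, 2]` collapses to a single 4-ary node `(1, 3, 0, 2)`.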
Syntactic Based Reordering Rules for Chinese-to-Japanese Machine Translation
2012
In Statistical Machine Translation (SMT), reordering rules have proved effective both in extracting bilingual phrases and in decoding when translating between languages whose word orders are structurally different. Researchers have tackled the reordering problem in multiple ways. One basic idea is preordering (Xia and McCord, 2004; Collins et al., 2005), that is, reordering the source sentences to follow the word order of the target sentences before decoding. For example, using a source-side dependency parser, Xu et al. (2009) manually created dependency-to-string preordering rules for translating English into five SOV (Subject-Object-Verb) languages. Later, Genzel (2010) automatically extracted dependency-tree-based preordering rules from word-aligned parallel sentences. In this work, we focus on Chinese-to-Japanese translation, motivated by the need to construct a direct machine translation system without using a pivot language. Chinese and Japanese involve s...
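The dependency-based preordering idea can be sketched in a few lines: walk a source dependency tree and emit object subtrees before their verb head, approximating SOV target order. The rule and the `(word, deprel, children)` node format are illustrative assumptions for this sketch, not the formalism of Xu et al. (2009) or Genzel (2010).

```python
def preorder(node):
    """node = (word, deprel, children). Toy preordering rule in the
    spirit of dependency-to-string preordering: emit object subtrees
    immediately before their head, so an SVO source comes out SOV.
    All dependants are emitted before the head in this sketch."""
    word, _, children = node
    objs = [c for c in children if c[1] in ("dobj", "iobj")]
    others = [c for c in children if c[1] not in ("dobj", "iobj")]
    out = []
    for c in others:
        out.extend(preorder(c))
    for c in objs:          # the preordering rule proper: object before verb
        out.extend(preorder(c))
    out.append(word)
    return out

# "John saw Mary" -> SOV-like order "John Mary saw"
tree = ("saw", "root", [("John", "nsubj", []), ("Mary", "dobj", [])])
print(preorder(tree))
```

In a real system such transformed source sentences would be fed to both training and decoding, as the abstract describes.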
Rule-based preordering on multiple syntactic levels in statistical machine translation
2014
We propose a novel data-driven, rule-based preordering approach that uses tree information from multiple syntactic levels. This approach extends tree-based reordering from one level to multiple levels, giving it the capability to process more complicated reordering cases. We have conducted experiments in the English-to-Chinese and Chinese-to-English translation directions. Our results show that the approach improves translation quality both when applied separately and when combined with other reordering approaches. Used alone, our reordering approach shows an improvement of 1.61 BLEU in the English-to-Chinese translation direction and 2.16 BLEU in the Chinese-to-English translation direction over the baseline, which used no word reordering. When our preordering approach is combined with the short-rule [1], long-rule [2] and tree-rule [3] preordering approaches, it shows further impr...
Syntax- and semantic-based reordering in hierarchical phrase-based statistical machine translation
Expert Systems with Applications, 2017
We present a syntax-based reordering model (RM) for hierarchical phrase-based statistical machine translation (HPB-SMT) enriched with semantic features. Our model brings a number of novel contributions: (i) while the previous dependency-based RM is limited to the reordering of head-dependant constituent pairs, we also model the reordering of pairs of dependants; (ii) our model is enriched with semantic features (WordNet synsets) that allow it to generalize to pairs not seen in training but with equivalent meaning; (iii) we evaluate our model on two language directions, English-to-Farsi and English-to-Turkish, pairs that are particularly challenging due to the free word order, rich morphology and lack of resources of the target languages. We evaluate our RM both intrinsically (accuracy of the RM classifier) and extrinsically (MT). Our best configuration outperforms the baseline classifier by 5-29% on pairs of dependants and by 12-30% on head-dependant pairs, while the improvement on MT ranges between 1.6% and 5.5% relative in terms of BLEU, depending on language pair and domain. We also analyze the feature weights to gain further insight into the impact of the reordering-related features in the HPB-SMT model. We observe that the features of our RM are assigned significant weights and that they are complementary to the reordering feature included by default in the HPB-SMT model.
A Linguistically Annotated Reordering Model for BTG-based Statistical Machine Translation
2008
In this paper, we propose a linguistically annotated reordering model for BTG-based statistical machine translation. The model incorporates linguistic knowledge to predict orders for both syntactic and non-syntactic phrases. The linguistic knowledge is automatically learned from source-side parse trees through an annotation algorithm. We empirically demonstrate that the proposed model leads to a significant improvement of 1.55% in the BLEU score over the baseline reordering model on the NIST MT-05 Chinese-to-English translation task.
Handling phrase reorderings for machine translation
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers on - ACL-IJCNLP '09, 2009
We propose a distance phrase reordering model (DPR) for statistical machine translation (SMT), where the aim is to capture phrase reorderings using a structure learning framework. On both the reordering classification and a Chinese-to-English translation task, we show improved performance over a baseline SMT system.
Novel Reordering Approaches in Phrase-Based Statistical Machine Translation
2005
This paper presents novel approaches to reordering in phrase-based statistical machine translation. We perform consistent reordering of source sentences in training and estimate a statistical translation model. Using this model, we follow a phrase-based monotonic machine translation approach, for which we develop an efficient and flexible reordering framework that makes it easy to introduce different reordering constraints. In translation, we apply source-sentence reordering at the word level and use a reordering automaton as input. We show how to compute reordering automata on demand using IBM or ITG constraints, and also introduce two new types of reordering constraints. We further add weights to the reordering automata. We present detailed experimental results and show that reordering significantly improves translation quality.
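The effect of an IBM-style reordering constraint can be sketched as a coverage-set search: at each step, only one of the first k not-yet-covered source positions may be chosen next. This is a toy enumerator standing in for the paper's reordering automata over coverage states, not their actual on-demand construction.

```python
def ibm_permutations(n, k):
    """Enumerate source-position permutations of 0..n-1 allowed by the
    IBM reordering constraint with window k: at every step the next
    position must be one of the first k uncovered positions.
    k = 1 forces monotone order; k = n allows all n! permutations."""
    def extend(prefix, uncovered):
        if not uncovered:
            yield prefix
            return
        for pos in uncovered[:k]:  # only the first k uncovered positions
            rest = [p for p in uncovered if p != pos]
            yield from extend(prefix + [pos], rest)
    yield from extend([], list(range(n)))
```

For example, `list(ibm_permutations(3, 1))` yields only `[[0, 1, 2]]`, while widening the window to 2 admits four permutations; an automaton compactly shares the coverage states that this naive enumeration revisits.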
A dependency-based word reordering approach for Statistical Machine Translation
2008
Reordering is of crucial importance for machine translation; solving the reordering problem can lead to remarkable improvements in translation performance. In this paper, we propose a novel approach to the word reordering problem in statistical machine translation. We rely on dependency relations retrieved from a statistical parser, combined with linguistic hand-crafted rules, to create the transformations. These dependency-based transformations can handle word movement at both the phrase and word level, which is a difficult problem for parse-tree-based approaches. The transformations are then applied as a preprocessor to the English side in both training and decoding to obtain an underlying word order closer to Vietnamese. The hand-crafted rules are extracted from the syntactic differences in word order between English and Vietnamese. This approach is simple and easy to implement with a small rule set, and does not lead to rule explosion. We describe experiments using our model on the VCLEVC corpus [18] for translation from English to Vietnamese, showing significant improvements of about 2-4% BLEU in comparison with the MOSES phrase-based baseline system [19].
Proceedings of the NAACL-HLT 2007/AMTA Workshop on Syntax and Structure in Statistical Translation - SSST '07, 2007
In this paper, we describe a source-side reordering method based on syntactic chunks for phrase-based statistical machine translation. First, we shallow-parse the source-language sentences. Then, reordering rules are automatically learned from source-side chunks and word alignments. During translation, the rules are used to generate a reordering lattice for each sentence. Experimental results are reported for a Chinese-to-English task, showing an improvement of 0.5%-1.8% absolute BLEU on various test sets and better computational efficiency than reordering during decoding. The experiments also show that reordering at the chunk level performs better than at the POS level.
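A chunk-level reordering rule and the alternatives it licenses can be sketched as follows; a real system would pack the alternatives into a lattice, whereas this toy simply lists them. The swap-rule format and the chunk categories are assumptions for the sketch, not the paper's learned rule formalism.

```python
# Hypothetical learned rule: a VP chunk followed by an NP chunk may swap.
RULES = {("VP", "NP"): ("NP", "VP")}

def reorderings(chunks):
    """chunks: list of (category, words) pairs from a shallow parser.
    Return the original chunk order plus every sequence produced by one
    application of a swap rule; these become paths in the lattice."""
    alts = [chunks]
    for i in range(len(chunks) - 1):
        key = (chunks[i][0], chunks[i + 1][0])
        if key in RULES:
            swapped = chunks[:i] + [chunks[i + 1], chunks[i]] + chunks[i + 2:]
            alts.append(swapped)
    return alts
```

On `[("NP", "he"), ("VP", "saw"), ("NP", "her")]` this yields the original order plus the object-fronted alternative, and the decoder would choose between the two paths.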
A Novel Reordering Model for Statistical Machine Translation
Word reordering is one of the fundamental problems of machine translation and an important factor in its quality and efficiency. In this paper, we introduce a novel reordering model based on an innovative structure, the phrasal dependency tree, which includes syntactic and statistical information in the context of a log-linear model. The phrasal dependency tree is a new syntactic structure based on dependency relations between contiguous non-syntactic phrases. In comparison with well-known and popular reordering models such as the distortion, lexicalized and hierarchical models, our experimental study demonstrates the superiority of our model across different evaluation measures. We evaluated the proposed model on a Persian-English SMT system. On average, our model achieved a significant improvement in precision, with comparable recall, with respect to the lexicalized and distortion models, and is found to be effective for medium- and long-distance reordering.