Khalil Sima'an - Academia.edu
Papers by Khalil Sima'an
This paper presents a new view of Explanation-Based Learning (EBL) of natural language parsing. Rather than employing EBL for specializing parsers by inferring new ones, this paper suggests employing EBL for learning how to reduce ambiguity only partially. We exemplify this by presenting a new EBL method that learns parsers that avoid spurious overgeneration, and we show how the same method can be used for reducing the sizes of stochastic grammars learned from treebanks, e.g. (Bod, 1995; Charniak, 1996; Sekine and Grishman, 1995). The present method consists of an EBL algorithm for learning partial-parsers, and a parsing algorithm which combines partial-parsers with existing "full-parsers". The learned partial-parsers, implementable as Cascades of Finite State Transducers (CF-STs), recognize and combine constituents efficiently, prohibiting spurious overgeneration. The parsing algorithm combines a learned partial-parser with a given full-parser such that the role of the full-parser is limited to combining the constituents recognized by the partial-parser, and to recognizing unrecognized portions of the input sentence. Besides reducing the parse-space prior to disambiguation, the present method provides a way for refining existing disambiguation models that learn stochastic grammars from treebanks, e.g. (Bod, 1995; Charniak, 1996; Sekine and Grishman, 1995). We exhibit encouraging empirical results using a pilot implementation: parse-space is reduced substantially with minimal loss of coverage. The speedup gain for disambiguation models is exemplified by experiments with the DOP model (Bod, 1995).
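To make the cascade idea concrete, below is a toy sketch of a partial-parser realised as a cascade of finite-state (regular-expression) transducers over a POS-tag string. The tag patterns and stage ordering are invented for the example; the paper learns its cascades from a treebank rather than hand-writing them.

```python
# A toy partial-parser as a cascade of finite-state (regex) transducers
# over a POS-tag string. Each stage rewrites a recognized constituent to
# a single nonterminal tag; later stages build on earlier results. The
# patterns below are invented for the example -- the paper learns its
# cascades from a treebank instead of hand-writing them.
import re

CASCADE = [
    ("NP", re.compile(r"\b(?:DT )?(?:JJ )*NN\b")),  # base noun phrase
    ("PP", re.compile(r"\bIN NP\b")),               # preposition + NP
    ("NP", re.compile(r"\bNP PP\b")),               # NP with PP attachment
]

def partial_parse(tags):
    """Run the cascade to a fixpoint; later stages can re-enable earlier
    ones, so we loop over all stages until nothing changes."""
    s = " ".join(tags)
    changed = True
    while changed:
        changed = False
        for label, pattern in CASCADE:
            new = pattern.sub(label, s)
            if new != s:
                s, changed = new, True
    return s.split()

# "the big dog in the park barked" as POS tags
print(partial_parse("DT JJ NN IN DT NN VBD".split()))  # ['NP', 'VBD']
```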
We propose a novel pipeline for translation into morphologically rich languages which consists of two steps: initially, the source string is enriched with target morphological features and then fed into a translation model which takes care of reordering and lexical choice that matches the provided morphological features. As a proof of concept, we first show improved translation performance for a phrase-based model translating source strings enriched with morphological features projected through the word alignments from target words to source words. Given this potential, we present a model for predicting target morphological features on the source string and its predicate-argument structure, and tackle two major technical challenges: (1) how to fit the morphological feature set to the training data, and (2) how to integrate the morphology into the back-end phrase-based model such that it can also be trained on projected (rather than predicted) features for a more efficient pipeline. For the ...
The Arabic language is a collection of spoken dialects with important phonological, morphological, lexical, and syntactic differences, along with a standard written language, Modern Standard Arabic (MSA). Since the spoken dialects are not officially written, it is very costly to obtain adequate corpora to use for training dialect NLP tools such as parsers. In this paper, we address the problem of parsing transcribed spoken Levantine Arabic (LA). We do not assume the existence of any annotated LA corpus (except for development and testing), nor of a parallel LA-MSA corpus. Instead, we use explicit knowledge about the relation between LA and MSA.
PARSEVAL, the default paradigm for evaluating constituency parsers, calculates parsing success (precision/recall) as a function of the number of matching labeled brackets across the test set. Nodes in constituency trees, however, are connected together to reflect important linguistic relations such as predicate-argument and direct-dominance relations between categories. In this paper, we present FREVAL, a generalization of PARSEVAL, where precision and recall are calculated not only for individual brackets, but also for co-occurring, connected brackets (i.e. fragments). FREVAL fragment precision (FLP) and fragment recall (FLR) interpolate the match across the whole spectrum of fragment sizes, ranging from those consisting of individual nodes (labeled brackets) to those consisting of full parse trees. We provide evidence that FREVAL is informative for inspecting relative parser performance by comparing a range of existing parsers.
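As a concrete reference point, here is a minimal sketch of the bracket-matching arithmetic in the PARSEVAL special case of FREVAL, where every fragment is a single labeled node. The nested-tuple tree encoding, helper names, and toy trees are illustrative assumptions, not the authors' implementation.

```python
# PARSEVAL-style labeled bracket scoring: the FREVAL special case where
# each fragment is a single node. The nested-tuple tree format and the
# toy trees are assumptions for illustration (real PARSEVAL tooling also
# handles details such as excluding preterminals).
from collections import Counter

def labeled_brackets(tree):
    """Collect (label, start, end) spans for every node of a tree given
    as nested tuples (label, child, ...) with string leaves."""
    spans = Counter()

    def walk(node, start):
        if isinstance(node, str):  # a leaf is one word wide
            return start + 1
        label, *children = node
        end = start
        for child in children:
            end = walk(child, end)
        spans[(label, start, end)] += 1
        return end

    walk(tree, 0)
    return spans

def parseval(gold, predicted):
    g, p = labeled_brackets(gold), labeled_brackets(predicted)
    matched = sum((g & p).values())  # multiset intersection of brackets
    return matched / sum(p.values()), matched / sum(g.values())

gold = ("S", ("NP", "she"), ("VP", ("V", "saw"), ("NP", "stars")))
pred = ("S", ("NP", "she"), ("VP", ("V", "saw"), "stars"))
print(parseval(gold, pred))  # (1.0, 0.8): pred misses the object NP
```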
We present a novel approach for unsupervised induction of a Reordering Grammar using a modified form of permutation trees (Zhang and Gildea, 2007), which we apply to preordering in phrase-based machine translation. Unlike previous approaches, we induce in one step both the hierarchical structure and the transduction function over it from word-aligned parallel corpora. Furthermore, our model (1) handles non-ITG reordering patterns (up to 5-ary branching), (2) is learned from all derivations by treating not only labeling but also bracketing as a latent variable, (3) is entirely unlexicalized at the level of reordering rules, and (4) requires no linguistic annotation. Our model is evaluated both for accuracy in predicting target order, and for its impact on translation quality. We report significant performance gains over phrase reordering, and over two known preordering baselines for English-Japanese.
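For readers unfamiliar with permutation trees, the following sketch builds one by a shift-reduce pass in the spirit of Zhang and Gildea (2007): stack items whose target positions form a contiguous range are grouped eagerly, and nodes of arity above 2 correspond to the non-ITG patterns mentioned above. This is an illustrative decomposition, not the authors' code.

```python
# Build a permutation tree (PET) in the spirit of Zhang and Gildea (2007):
# a shift-reduce pass that eagerly groups stack items whose target
# positions form a contiguous range. Nodes of arity above 2 capture the
# non-ITG patterns mentioned above. Illustrative only.
def permutation_tree(perm):
    stack = []  # items: (subtree, lo, hi) over target positions
    for v in perm:
        stack.append((v, v, v))
        reduced = True
        while reduced:
            reduced = False
            for k in range(2, len(stack) + 1):  # smallest groups first
                top = stack[-k:]
                lo = min(t[1] for t in top)
                hi = max(t[2] for t in top)
                # contiguous iff the spans tile the range [lo, hi] exactly
                if hi - lo + 1 == sum(t[2] - t[1] + 1 for t in top):
                    node = tuple(t[0] for t in top)
                    del stack[-k:]
                    stack.append((node, lo, hi))
                    reduced = True
                    break
    assert len(stack) == 1, "input must be a permutation of 1..n"
    return stack[0][0]

print(permutation_tree([2, 1, 4, 3]))  # ((2, 1), (4, 3)): binary, ITG
print(permutation_tree([2, 4, 1, 3]))  # (2, 4, 1, 3): non-ITG, 4-ary node
```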
Enriching statistical models with linguistic knowledge has been a major concern in Machine Translation (MT). In monolingual data, adjuncts are optional constituents contributing secondarily to the meaning of a sentence. One can therefore hypothesize that this secondary status is preserved in translation, and thus that adjuncts may align consistently with their adjunct translations, suggesting they form optional phrase pairs in parallel corpora. In this paper we verify this hypothesis on French-English translation data, and explore the utility of compiling adjunct-poor data for augmenting the training data of a phrase-based machine translation model.
While much work has been done to inform Hierarchical Phrase-Based SMT (Chiang, 2005) models linguistically, the adjunct/argument distinction has generally not been exploited for these models. But as Shieber (2007) points out, capturing this distinction makes it possible to abstract over 'intervening' adjuncts, and is thus relevant for (machine) translation in general. We contribute an adjunction-driven approach to hierarchical phrase-based modelling that uses source-side adjuncts to relax extraction constraints (allowing long-distance dependencies to be captured) and to guide translation through labelling. The labelling scheme can be reduced to two adjunct/non-adjunct labels, and improves translation over Hiero by up to 0.6 BLEU points for English-Chinese.
Selecting a set of nonterminals for the synchronous CFGs underlying hierarchical phrase-based models is usually done on the basis of a monolingual resource (such as a syntactic parser). However, a standard bilingual resource like word alignments is itself rich with reordering patterns that, if clustered somehow, might provide labels of a different (possibly complementary) nature from monolingual labels. In this paper we explore a first version of this idea based on a hierarchical decomposition of word alignments into recursive tree representations. We identify five clusters of alignment patterns in which the children of a node in a decomposition tree are found, and employ these five as nonterminal labels for the Hiero productions. Although this is our first non-optimized instantiation of the idea, our experiments show competitive performance with the Hiero baseline, exemplifying certain merits of this novel approach.
In this paper we explore the novel idea of building a single universal reordering model from English to a large number of target languages. To build this model we exploit typological features of word order for a large number of target languages together with source (English) syntactic features, and we train this model on a single combined parallel corpus representing all (22) involved language pairs. We contribute experimental evidence for the usefulness of linguistically defined typological features for building such a model. When the universal reordering model is used for preordering followed by monotone translation (no reordering inside the decoder), our experiments show that this pipeline gives comparable or improved translation performance relative to a phrase-based baseline for a large number of language pairs (12 out of 22) from diverse language families.
We present the first application of the adjunct/argument distinction to Hierarchical Phrase-Based SMT. We use rule labelling to characterize synchronous recursion with adjuncts and arguments. Our labels are bilingual, obtained from dependency annotations and extended to cover nonsyntactic phrases. The label set we derive in this manner is extremely small, as it contains only thirty-six labels, and yet we find it useful to cluster these labels even further. We present a clustering method that uses label similarity based on left-hand-side/right-hand-side joint trained-model estimates. The results of initial experiments show that our model performs similarly to Hiero on in-domain French-English data.
Deciding whether a synchronous grammar formalism generates a given word alignment (the alignment coverage problem) depends on finding an adequate instance grammar and then using it to parse the word alignment. But what does it mean to parse a word alignment by a synchronous grammar? This is formally undefined until we define an unambiguous mapping between grammatical derivations and word-level alignments. This paper proposes an initial, formal characterization of alignment coverage as intersecting two partially ordered sets (graphs) of translation equivalence units, one derived by a grammar instance and another defined by the word alignment. As a first sanity check, we report extensive coverage results for ITG on automatic and manual alignments. Even for the ITG formalism, our formal characterization makes explicit many algorithmic choices often left underspecified in earlier work.
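The following sketch illustrates one half of that intersection: enumerating the translation equivalence units a word alignment defines, using the standard phrase-pair consistency (closure) criterion. Function names and the toy alignment are assumptions for illustration.

```python
# Enumerate the translation equivalence units (TEUs) defined by a word
# alignment: source/target span pairs closed under the alignment links
# (the standard phrase-pair consistency criterion). Names and the toy
# alignment are illustrative assumptions.
def alignment_teus(n_src, links):
    """links: set of (src_pos, tgt_pos) pairs. Returns consistent span
    pairs ((s1, s2), (t1, t2)) with inclusive word-position bounds."""
    teus = []
    for s1 in range(n_src):
        for s2 in range(s1, n_src):
            tgt = [t for s, t in links if s1 <= s <= s2]
            if not tgt:
                continue  # require at least one alignment link
            t1, t2 = min(tgt), max(tgt)
            # closure: no link from the target span may escape the source span
            src_back = [s for s, t in links if t1 <= t <= t2]
            if s1 <= min(src_back) and max(src_back) <= s2:
                teus.append(((s1, s2), (t1, t2)))
    return teus

# "he saw her" -> "il l' a vue": "saw" aligns to both "a" and "vue",
# and "her" crosses to "l'"
links = {(0, 0), (1, 2), (1, 3), (2, 1)}
for teu in alignment_teus(3, links):
    print(teu)
```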
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), 2018
This work exploits translation data as a source of semantically relevant learning signal for models of word representation. In particular, we exploit equivalence through translation as a form of distributional context and jointly learn how to embed and align with a deep generative model. Our EMBEDALIGN model embeds words in their complete observed context and learns by marginalisation of latent lexical alignments. In addition, it embeds words as posterior probability densities, rather than point estimates, which allows us to compare words in context using a measure of overlap between distributions (e.g. KL divergence). We investigate our model's performance on a range of lexical semantics tasks, achieving competitive results on several standard benchmarks including natural language inference, paraphrasing, and text similarity.
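To illustrate the density-based comparison the abstract mentions, here is the standard closed-form KL divergence between diagonal Gaussians, which is one way to measure overlap between two words' posterior densities. The example vectors are hypothetical; this is not the EMBEDALIGN code.

```python
# Closed-form KL divergence between diagonal Gaussians -- one way to
# compare the posterior densities that EMBEDALIGN assigns to words in
# context. The example vectors are hypothetical; this is not the model's
# own code.
import numpy as np

def kl_diag_gaussians(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for diagonal Gaussians, given mean and variance vectors."""
    return 0.5 * np.sum(
        np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    )

# two hypothetical in-context posteriors for the word "bank"
mu_river, var_river = np.array([0.2, -1.0, 0.5]), np.array([0.3, 0.2, 0.4])
mu_money, var_money = np.array([1.1, 0.4, -0.3]), np.array([0.2, 0.5, 0.3])
print(kl_diag_gaussians(mu_river, var_river, mu_money, var_money))
```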
We would like to thank all the people who have contributed to the organisation and delivery of this workshop: the authors who submitted such high quality papers; the programme committee for their prompt and effective reviewing; our keynote speaker; the ACL 2016 organising committee, especially the workshops chairs; the participants in the workshop; and future readers of these proceedings for your shared interest in this exciting new area of research.
While it is generally accepted that many translation phenomena are correlated with linguistic structures, employing linguistic syntax for translation has proven a highly non-trivial task. The key assumption behind many approaches is that translation is guided by the source and/or target language parse, employing rules extracted from the parse tree or performing tree transformations. These approaches enforce strict constraints and might overlook important translation phenomena that cross linguistic constituents. We propose a novel flexible modelling approach to introduce linguistic information of varying granularity from the source side. Our method induces joint probability synchronous grammars and estimates their parameters by selecting and weighting together linguistically motivated rules according to an objective function directly targeting generalisation over future data. We obtain statistically significant improvements across 4 different language pairs with English as source, amounting to up to +1.92 BLEU for Chinese as target.
Machine Translation, 2015
Long-range word order differences are a well-known problem for machine translation. Unlike the standard phrase-based models which work with sequential and local phrase reordering, the hierarchical phrase-based model (Hiero) embeds the reordering of phrases within pairs of lexicalized context-free rules. This allows the model to handle long-range reordering recursively. However, the Hiero grammar works with a single nonterminal label, which means that the rules are combined together into derivations independently and without reference to context outside the rules themselves. Follow-up work explored remedies involving nonterminal labels obtained from monolingual parsers and taggers. As of yet, no labeling mechanisms exist for the many languages for which there are no good quality parsers or taggers. In this paper we contribute a novel approach for acquiring reordering labels for Hiero grammars directly from the word-aligned parallel training corpus, without use of any taggers or parsers. The new labels represent types of alignment patterns in which a phrase pair is embedded within larger phrase pairs. In order to obtain alignment patterns that generalize well, we propose to decompose word alignments into trees over phrase pairs. Besides this labeling approach, we contribute coarse and sparse features for learning soft, weighted label-substitution as opposed to standard substitution. We report extensive experiments comparing our model to two baselines: Hiero and the known syntax-augmented machine translation (SAMT) variant, which labels Hiero rules with nonterminals extracted from monolingual syntactic parses. We also test a simplified labeling scheme based on inversion transduction grammar (ITG).
Nederlands Tijdschrift Voor Tandheelkunde, 2010
In this study we give an overview of the ILLC-UvA (Institute for Logic, Language and Computation, University of Amsterdam) submission to the IWSLT 2010 evaluation campaign. It outlines the architecture and configuration of the novel features we are introducing: a syntax-based model for source-side reordering via tree transduction, and accurate training data selection. We have concentrated on the Chinese-to-English and English-to-Chinese DIALOG translation tasks.
The Prague Bulletin of Mathematical Linguistics, 2015
We present BEER, an open source implementation of a machine translation evaluation metric. BEER is a metric trained for high correlation with human ranking by using learning-to-rank training methods. For evaluation of lexical accuracy it uses sub-word units (character n-grams), while for measuring word order it uses hierarchical representations based on PETs (permutation trees). In the most recent WMT metrics tasks, BEER has shown high correlation with human judgments both at the sentence and the corpus level. In this paper we show how BEER can be used for (i) full evaluation of MT output, (ii) isolated evaluation of word order, and (iii) tuning MT systems.
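As a rough illustration of the sub-word side of such a metric, the sketch below scores a hypothesis against a reference with character n-gram precision/recall, which is robust to morphological variation that word-level matching misses. It is a simplified stand-in, not BEER's trained learning-to-rank model.

```python
# Character n-gram F1 against a reference: a simplified stand-in for the
# sub-word component of a metric like BEER, not its trained model.
from collections import Counter

def char_ngrams(text, n):
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def char_ngram_f1(hypothesis, reference, n=4):
    hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
    matched = sum((hyp & ref).values())  # clipped n-gram matches
    if matched == 0:
        return 0.0
    precision = matched / sum(hyp.values())
    recall = matched / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(char_ngram_f1("the cats sleep", "the cat sleeps"))   # high: shared stems
print(char_ngram_f1("completely different", "the cat sleeps"))  # 0.0
```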
Despite its state-of-the-art performance, the Data Oriented Parsing (DOP) model has been shown to suffer from biased parameter estimation, and the good performance seems more the result of ad hoc adjustments than correct probabilistic generalization over the data. In recent work, we developed a new estimation procedure, called Backoff Estimation, for DOP models that are based ...
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing Volume 3 - EMNLP '09, 2009
Recent syntactic extensions of statistical translation models work with a synchronous context-free or tree-substitution grammar extracted from an automatically parsed parallel corpus. The decoders accompanying these extensions typically exceed quadratic time complexity. This paper extends the Direct Translation Model 2 (DTM2) with syntax while maintaining linear-time decoding. We employ a linear-time parsing algorithm based on an eager, incremental interpretation of Combinatory Categorial Grammar (CCG). As every input word is processed, the local parsing decisions resolve ambiguity eagerly, by selecting a single supertag-operator pair for extending the dependency parse incrementally. Alongside translation features extracted from the derived parse tree, we explore syntactic features extracted from the incremental derivation process. Our empirical experiments show that our model significantly outperforms the state-of-the-art DTM2 system.
Proceedings of the 22nd International Conference on Computational Linguistics - COLING '08, 2008