Dependency Based Chinese Sentence Realization

Dependency-based n-gram models for general purpose sentence realisation

Natural Language Engineering, 2011

This paper presents a general-purpose, wide-coverage, probabilistic sentence generator based on dependency n-gram models. This is particularly interesting as many semantic or abstract syntactic input specifications for sentence realisation can be represented as labelled bi-lexical dependencies or typed predicate-argument structures. Our generation method captures the mapping between semantic representations and surface forms by linearising a set of dependencies directly, rather than via the application of grammar rules as in more traditional chart-style or unification-based generators. In contrast to conventional n-gram language models over surface word forms, we exploit structural information and various linguistic features inherent in the dependency representations to constrain the generation space and improve the generation quality. A series of experiments shows that dependency-based n-gram models generalise well to different languages (English and Chinese) and representations (L...
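The core idea — scoring candidate linearisations of a dependency subtree with an n-gram model over the dependency representation rather than raw word strings — can be sketched as below. This is a toy illustration, not the paper's system: the bigram table, labels, and scores are invented for the example.

```python
from itertools import permutations

# Toy bigram scores over dependency labels plus a "HEAD" placeholder.
# All values are illustrative assumptions, not trained model weights.
BIGRAM = {
    ("<s>", "nsubj"): 0.6, ("nsubj", "HEAD"): 0.7,
    ("HEAD", "dobj"): 0.8, ("dobj", "</s>"): 0.5,
    ("<s>", "HEAD"): 0.2, ("HEAD", "nsubj"): 0.1,
    ("nsubj", "dobj"): 0.1, ("dobj", "HEAD"): 0.1,
    ("dobj", "nsubj"): 0.05, ("nsubj", "</s>"): 0.1,
    ("HEAD", "</s>"): 0.1, ("<s>", "dobj"): 0.05,
}

def score(labels):
    """Sum of bigram scores over a padded label sequence."""
    padded = ["<s>"] + labels + ["</s>"]
    return sum(BIGRAM.get(b, 0.01) for b in zip(padded, padded[1:]))

def linearise(head, dependents):
    """Order a head and its labelled dependents by n-gram score,
    mapping the dependency set to a surface string directly."""
    items = [("HEAD", head)] + list(dependents)
    best = max(permutations(items),
               key=lambda p: score([lab for lab, _ in p]))
    return " ".join(word for _, word in best)

print(linearise("saw", [("nsubj", "John"), ("dobj", "Mary")]))
# → John saw Mary
```

In a real system the model would be trained on label (or word-plus-label) sequences read off a dependency treebank and applied recursively over the tree; the exhaustive permutation search here stands in for that constrained generation space.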

Multi-stage Chinese Dependency Parsing Based on Dependency Direction

This paper presents a novel method for multi-stage dependency parsing based on dependency direction. The parsing process is divided into multiple sub-stages that run sequentially, which makes it easier to apply tailored solutions to different issues in dependency parsing; each stage also provides a clearer context for the next. Furthermore, because candidate arcs are constrained by dependency direction, the proposed method has lower search complexity than classic graph-based methods. Experimental results show that, compared with common methods, the proposed method offers comparable accuracy and higher efficiency.
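A minimal sketch of the direction-split idea (an assumed illustration, not the paper's exact algorithm): stage 1 fixes arcs whose head precedes the dependent, stage 2 then attaches the remaining tokens to heads on their right, with stage-1 decisions visible as context. The scoring function and values are invented.

```python
def parse(words, arc_score):
    """Two-stage parse split by dependency direction.
    arc_score(h, d, heads) scores head h for dependent d given
    the partial analysis `heads`; -1 is the artificial root."""
    n = len(words)
    heads = {}
    # Stage 1: rightward arcs only (head index < dependent index).
    for d in range(n):
        cands = [(arc_score(h, d, heads), h) for h in range(-1, n) if h < d]
        if cands:
            s, h = max(cands)
            if s > 0:          # attach only confident rightward arcs
                heads[d] = h
    # Stage 2: leftward arcs for still-unattached tokens, conditioned
    # on the clearer context left by stage 1.
    for d in range(n):
        if d in heads:
            continue
        cands = [(arc_score(h, d, heads), h) for h in range(n) if h > d]
        if cands:
            s, h = max(cands)
            heads[d] = h
    return heads

# Toy scores: only the gold arcs of "the cat sleeps" score positively.
GOLD = {(-1, 2): 1.0, (2, 1): 0.9, (1, 0): 0.8}
def toy_score(h, d, context):
    return GOLD.get((h, d), -1.0)

print(parse(["the", "cat", "sleeps"], toy_score))
```

Because each direction is handled in its own pass, every stage searches a strictly smaller arc space than a full graph-based parser would.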

A subtree-based factorization of dependency parsing

2016

We propose a dependency parsing pipeline in which the parsing of long-distance projections and localized dependencies is explicitly decomposed at the input level. A chosen baseline dependency parsing model operates only on ‘carved’ sequences at the second stage, which are transformed from coarse constituent parsing outputs at the first stage. When k-best constituent parsing outputs are kept, a third stage is required to search for an optimal combination of the overlapping dependency subtrees. In this sense, our dependency model is subtree-factored. We explore alternative approaches for scoring subtrees, including feature-based models as well as continuous representations. The search for the optimal subset to combine is formulated as an ILP problem. This framework especially benefits models that perform poorly on long sentences, generally improving baselines by 0.75-1.28 (UAS) on English and achieving performance comparable to high-order models at lower cost. For Chinese, the most notable increase is ...
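The third-stage combination step amounts to an exact-cover selection problem: pick the subset of scored, possibly overlapping subtrees in which every token is covered exactly once, maximising the total score. The sketch below states that objective and constraint directly; the subtrees, spans, and scores are hypothetical, and enumeration stands in for an ILP solver purely for clarity.

```python
from itertools import combinations

# Hypothetical subtrees from k-best first-stage outputs: each covers a
# set of token indices and carries a model score (values are invented).
subtrees = [
    ({0, 1}, 1.2),
    ({2, 3}, 0.9),
    ({1, 2}, 1.5),   # overlaps both subtrees above
    ({0}, 0.3),
    ({3}, 0.4),
]
tokens = {0, 1, 2, 3}

def best_combination(subtrees, tokens):
    """Maximise total score subject to each token being covered by
    exactly one chosen subtree (the ILP constraint, solved here by
    brute-force enumeration)."""
    best, best_score = None, float("-inf")
    for r in range(1, len(subtrees) + 1):
        for combo in combinations(subtrees, r):
            covered = [t for span, _ in combo for t in span]
            if sorted(covered) == sorted(tokens):  # exact cover, no overlap
                total = sum(sc for _, sc in combo)
                if total > best_score:
                    best, best_score = combo, total
    return best, best_score

combo, total = best_combination(subtrees, tokens)
print(round(total, 2))
```

In an actual ILP formulation each subtree gets a binary indicator variable, the objective is the score-weighted sum of indicators, and each token contributes one equality constraint over the subtrees covering it.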