Dependency-based n-gram models for general purpose sentence realisation

Dependency Based Chinese Sentence Realization

This paper describes log-linear models for a general-purpose sentence realizer based on dependency structures. Unlike traditional realizers using grammar rules, our method realizes sentences by linearizing dependency relations directly in two steps. First, the relative order between the head and each dependent is determined by their dependency relation. Then the best linearizations compatible with that relative order are selected by log-linear models. The log-linear models incorporate three types of feature functions, including dependency relations, surface words and headwords. Our approach to sentence realization provides simplicity, efficiency and competitive accuracy. Trained on 8,975 dependency structures from a Chinese Dependency Treebank, the realizer achieves a BLEU score of 0.8874.
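The two-step scheme the abstract describes is straightforward to sketch: step 1 fixes each dependent to one side of its head according to its dependency relation, and step 2 ranks the remaining compatible orders with a log-linear score. The toy below is a minimal illustration, not the paper's realizer; the side constraints, the single bigram feature template, and the weights are invented stand-ins for the paper's trained feature functions over relations, surface words, and headwords.

```python
import itertools

def compatible(order, head, side):
    """Step 1: each dependent must fall on the side of the head
    that its dependency relation dictates."""
    pos = {w: i for i, w in enumerate(order)}
    return all((pos[dep] < pos[head]) == (s == "left")
               for dep, s in side.items())

def loglinear_score(order, weights):
    """Step 2: rank candidates by a sum of feature weights (a log-linear
    model without the normalizer, which is constant for a fixed input)."""
    feats = [f"bigram:{a}_{b}" for a, b in zip(order, order[1:])]
    return sum(weights.get(f, 0.0) for f in feats)

def realize(head, dependents, side, weights):
    words = [head] + list(dependents)
    candidates = [o for o in itertools.permutations(words)
                  if compatible(o, head, side)]
    return max(candidates, key=lambda o: loglinear_score(o, weights))

# Toy input: "she" and "apples" attach to "ate" on fixed sides.
side = {"she": "left", "apples": "right"}
weights = {"bigram:she_ate": 1.0, "bigram:ate_apples": 1.0}
print(realize("ate", ["she", "apples"], side, weights))
# -> ('she', 'ate', 'apples')
```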

Using regular tree grammars to enhance sentence realisation

Natural Language Engineering, 2011

Feature-based regular tree grammars (FRTG) can be used to generate the derivation trees of a feature-based tree adjoining grammar (FTAG). We make use of this fact to specify and implement both an FTAG-based sentence realiser and a benchmark generator for this realiser. We argue furthermore that the FRTG encoding enables us to improve on other proposals based on a grammar of TAG derivation trees in several ways. It preserves the compositional semantics that can be encoded in feature-based TAGs; it increases efficiency and restricts overgeneration; and it provides a uniform resource for generation, benchmark construction and parsing.
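To make the formalism concrete, here is a toy regular tree grammar in Python, without the feature structures that distinguish FRTGs: each rule rewrites a state into a labeled node whose children are themselves states, so the derivable trees form a regular tree language, which is how an RTG can describe the set of derivation trees of a TAG. The states, rules, and TAG-style labels (alpha_* for initial trees, beta_* for auxiliary trees) are invented for illustration.

```python
import itertools

# Rules of a toy regular tree grammar: a state rewrites to a labeled
# node whose children are again states. Invented grammar; the paper's
# FRTGs additionally carry feature structures on the rules.
RULES = {
    "S":  [("alpha_sleep", ["NP"])],
    "NP": [("alpha_john", []), ("beta_adj", ["NP"])],
}

def derive(state, depth=3):
    """Yield all trees (as nested tuples) derivable from `state`,
    up to a depth bound to keep the enumeration finite."""
    if depth == 0:
        return
    for label, child_states in RULES[state]:
        child_options = [list(derive(s, depth - 1)) for s in child_states]
        for kids in itertools.product(*child_options):
            yield (label,) + kids

for tree in derive("S"):
    print(tree)
# ('alpha_sleep', ('alpha_john',))
# ('alpha_sleep', ('beta_adj', ('alpha_john',)))
```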

Linguistically informed statistical models of constituent structure for ordering in sentence realization

2004

We present several statistical models of syntactic constituent order for sentence realization. We compare several models, including simple joint models inspired by existing statistical parsing models, and several novel conditional models. The conditional models leverage a large set of linguistic features without manual feature selection. We apply and evaluate the models in sentence realization for French and German and find that a particular conditional model outperforms all others.
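As a drastically simplified analogue of the conditional approach, the sketch below defines a properly normalized distribution over permutations of a head's daughters given the parent category, using a handful of hand-set precedence features; the paper's models instead learn their weights over a large automatically extracted feature set.

```python
import itertools
import math

def feats(order, parent):
    """Invented feature templates: which daughter comes first, and
    which adjacent precedence pairs occur under this parent."""
    return ([f"parent={parent}:first={order[0]}"] +
            [f"parent={parent}:{a}<{b}" for a, b in zip(order, order[1:])])

def order_logprob(order, parent, weights):
    """log P(order | parent, daughters): a conditional log-linear model,
    normalized over all permutations of the same daughters."""
    def score(o):
        return sum(weights.get(f, 0.0) for f in feats(o, parent))
    z = sum(math.exp(score(p))
            for p in itertools.permutations(sorted(order)))
    return score(order) - math.log(z)

# Hand-set weights favoring DET < ADJ < NOUN inside an NP.
weights = {"parent=NP:DET<ADJ": 1.5, "parent=NP:ADJ<NOUN": 1.5}
best = max(itertools.permutations(["NOUN", "DET", "ADJ"]),
           key=lambda o: order_logprob(o, "NP", weights))
print(best)  # -> ('DET', 'ADJ', 'NOUN')
```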

Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks

There is considerable research interest in encoding variable-length sentences into fixed-length vectors in a way that preserves sentence meaning. Two common methods are representations based on averaging word vectors, and representations based on the hidden states of recurrent neural networks such as LSTMs. The sentence vectors are used as features for subsequent machine learning tasks or for pre-training in the context of deep learning. However, not much is known about the properties encoded in these sentence representations or about the language information they capture. We propose a framework that facilitates better understanding of the encoded representations. We define prediction tasks around isolated aspects of sentence structure (namely sentence length, word content, and word order), and score representations by the ability to train a classifier to solve each prediction task when using the representation as input. We demonstrate the potential contribution of the approach by analyzing different sentence representation mechanisms. The analysis sheds light on the relative strengths of different sentence embedding methods with respect to these low-level prediction tasks, and on the effect of the encoded vector's dimensionality on the resulting representations.
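The probing recipe is easy to reproduce in miniature: encode sentences, then measure how well a shallow classifier recovers one isolated property from the vectors alone. The sketch below probes sentence length under a bag-of-words averaging encoder; the random word vectors, the tiny corpus, and the scikit-learn classifier are stand-ins for the trained embeddings and the experimental setup of the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Random stand-in word vectors; real probes use trained embeddings.
vocab = {w: rng.normal(size=50)
         for w in "the a cat dog sat ran on mat fast very".split()}

def encode(sentence):
    """Averaging (bag-of-words) encoder: mean of the word vectors."""
    return np.mean([vocab[w] for w in sentence.split()], axis=0)

sentences = ["the cat sat", "a dog ran fast",
             "the very fast dog sat on the mat", "a cat",
             "the dog sat on a mat", "cat ran"] * 20
X = np.stack([encode(s) for s in sentences])
y = np.array([min(len(s.split()) // 3, 2) for s in sentences])  # length bins

# Auxiliary task: predict the length bin from the sentence vector.
# Training accuracy only; the paper evaluates on held-out data.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("length-prediction accuracy:", clf.score(X, y))
```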

RNN Simulations of Grammaticality Judgments on Long-distance Dependencies

2018

The paper explores the ability of LSTM networks trained on a language modeling task to detect linguistic structures that are ungrammatical due to extraction violations (extra arguments and subject-relative clause island violations), and considers the implications for the debate on language innatism. The results show that the current RNN model can correctly classify (un)grammatical sentences under certain conditions, but it is sensitive to linguistic processing factors and probably ultimately unable to induce a more abstract notion of grammaticality, at least in the domain we tested.
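The evaluation logic, as opposed to the LSTM itself, fits in a few lines: train a language model, then check whether it assigns higher probability to the grammatical member of a minimal pair. The toy below substitutes an add-one-smoothed bigram model for the LSTM and an invented extra-argument contrast for the paper's test materials.

```python
import math
from collections import Counter

# Tiny invented training corpus standing in for the LM training data.
corpus = ["the cat sleeps", "the dog sleeps", "the cat sees the dog",
          "a dog sees a cat", "the dog sees the cat"]
bigrams, unigrams = Counter(), Counter()
for sent in corpus:
    toks = ["<s>"] + sent.split() + ["</s>"]
    unigrams.update(toks[:-1])          # contexts
    bigrams.update(zip(toks, toks[1:]))
V = len(set(unigrams)) + 1              # smoothing vocab (contexts + </s>)

def logprob(sentence):
    """Add-one-smoothed bigram log-probability of the full sentence."""
    toks = ["<s>"] + sentence.split() + ["</s>"]
    return sum(math.log((bigrams[(a, b)] + 1) / (unigrams[a] + V))
               for a, b in zip(toks, toks[1:]))

# Minimal pair: an intransitive verb receives an extra argument.
print(logprob("the cat sleeps"))          # higher (attested pattern)
print(logprob("the cat sleeps the dog"))  # lower  (violation analogue)
```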