Achieving Generality in Natural Language Generation: A Functional Approach to Transforming Representations

Implementation architectures for natural language generation

Natural Language Engineering, 2004

Generic software architectures aim to support re-use of components, focusing of research and development effort, and evaluation and comparison of approaches. In the field of natural language processing, generic frameworks for understanding have been successfully deployed to meet all of these aims, but nothing comparable yet exists for generation. The nature of the task itself, and the current methodologies available for researching it, seem to make it more difficult to reach the necessary level of consensus to support generic proposals. Recent work has made progress towards establishing a generic framework for generation at the functional level, but left open the issue of actual implementation. In this paper, we discuss the requirements for such an implementation layer for generation systems, drawing on two initial attempts to implement it. We argue that it is possible and useful to distinguish "functional architecture" from "implementation architecture" for generation systems.

1 The Case for a Generic Software Architecture for NLG

Most natural language generation (NLG) systems have some kind of modular structure. The individual modules may differ in complex ways: whether they are based on symbolic or statistical models, which particular linguistic theories they embrace, and so on. Ideally, such modules could be reused in other NLG systems. This would avoid duplication of work, allow realistic research specialisation, and allow empirical comparison of different approaches. Examples of ideas that might give rise to reusable modules include:
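To make the notion of a reusable module concrete, here is a minimal Python sketch of what a shared module interface for a pipelined generation system might look like. It is an illustration under our own assumptions: the names NLGModule, DocumentPlan and Pipeline, and the single process() method, are hypothetical and not part of any published framework.

```python
# A minimal sketch of one possible reusable-module interface for NLG.
# All names here are illustrative assumptions, not a published standard.
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class DocumentPlan:
    """Placeholder intermediate representation passed between modules."""
    content: dict = field(default_factory=dict)


class NLGModule(ABC):
    """Common interface so modules can be swapped between systems."""

    @abstractmethod
    def process(self, plan: DocumentPlan) -> DocumentPlan:
        """Transform one intermediate representation into the next."""


class Pipeline:
    """Chains modules; any component honouring process() is reusable."""

    def __init__(self, modules: list[NLGModule]):
        self.modules = modules

    def run(self, plan: DocumentPlan) -> DocumentPlan:
        for module in self.modules:
            plan = module.process(plan)
        return plan
```

Under this kind of contract, a content planner or surface realiser from one system could in principle be dropped into another without changing the surrounding pipeline, which is the reuse scenario the paper motivates.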

Acquiring Correct Knowledge for Natural Language Generation

Journal of Artificial Intelligence Research, 2003

Natural language generation (NLG) systems are computer software systems that produce texts in English and other human languages, often from non-linguistic input data. NLG systems, like most AI systems, need substantial amounts of knowledge. However, our experience in two NLG projects suggests that it is difficult to acquire correct knowledge for NLG systems; indeed, every knowledge acquisition (KA) technique we tried had significant problems. In general terms, these problems were due to the complexity, novelty, and poorly understood nature of the tasks our systems attempted, and were worsened by the fact that people write so differently. This meant in particular that corpus-based KA approaches suffered because it was impossible to assemble a sizable corpus of high-quality, consistent, manually written texts in our domains; and structured expert-oriented KA techniques suffered because experts disagreed and because we could not get enough information about special and unusual cases to build robust systems. We believe that such problems are likely to affect many other NLG systems as well. In the long term, we hope that new KA techniques may emerge to help NLG system builders. In the shorter term, we believe that understanding how individual KA techniques can fail, and using a mixture of different KA techniques with different strengths and weaknesses, can help developers acquire NLG knowledge that is mostly correct.

Lexical realization in natural language generation

1988

This paper describes a procedure for lexical selection of open-class lexical items in a natural language generation system. An optimum lexical selection module must be able to make realization decisions under varying contextual circumstances. First, it must be able to operate without the influence of context, based on meaning correspondences between elements of conceptual input and the lexical inventory of the target language. Second, it must be able to use contextual constraints, as supported by collocational information in the generation lexicon. Third, there must be an option of realizing input representations pronominally or through definite descriptions. Finally, there must also be an option of using elliptical constructions. The nature of background knowledge and the algorithm we suggest for this task are described. The lexical selection procedure is a part of a comprehensive generation system, DIOGENES.
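The four-way decision the abstract outlines (context-free meaning matching, collocational filtering, pronominal or definite-description realization, and ellipsis) can be pictured as a staged procedure. The following Python sketch is a hedged illustration of that control flow only; the data structures and names are our own assumptions, not the actual DIOGENES implementation.

```python
# Illustrative staged lexical-choice procedure; all structures are
# invented stand-ins, not the DIOGENES generation lexicon.

def select_realization(concept, context, lexicon):
    """Choose a realization for one open-class input concept."""
    # 4. Elide entirely if the context licenses an elliptical construction.
    if context.get("licenses_ellipsis"):
        return ""

    # 3. Realize pronominally (or via definite description) when the
    #    referent is already salient in the discourse.
    if concept["id"] in context.get("salient_referents", set()):
        return context.get("pronoun_for", {}).get(concept["id"], "it")

    # 1. Context-free selection: keep lexical entries whose meaning
    #    matches the conceptual input.
    candidates = [e for e in lexicon if e["meaning"] == concept["meaning"]]

    # 2. Contextual filtering: prefer entries whose collocational
    #    constraints are satisfied by neighbouring lexical choices.
    neighbours = context.get("neighbour_lemmas", set())
    for entry in candidates:
        if entry.get("collocates", set()) & neighbours:
            return entry["lemma"]
    return candidates[0]["lemma"] if candidates else concept["meaning"]


# Tiny usage example with made-up data:
lexicon = [
    {"lemma": "strong", "meaning": "INTENSE", "collocates": {"tea"}},
    {"lemma": "powerful", "meaning": "INTENSE", "collocates": {"engine"}},
]
concept = {"id": "c1", "meaning": "INTENSE"}
context = {"neighbour_lemmas": {"tea"}}
print(select_realization(concept, context, lexicon))  # -> "strong"
```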

A Reference Architecture for Natural Language Generation Systems (doi: 10.1017/S1351324906004104)

Natural Language Engineering, 2006

We present the RAGS (Reference Architecture for Generation Systems) framework: a specification of an abstract Natural Language Generation (NLG) system architecture to support sharing, re-use, comparison and evaluation of NLG technologies. We argue that the evidence from a survey of actual NLG systems calls for a different emphasis in a reference proposal from that seen in similar initiatives in information extraction and multimedia interfaces. We introduce the framework itself, in particular the two-level data model that allows us to support the complex data requirements of NLG systems in a flexible and coherent fashion, and describe our efforts to validate the framework through a range of implementations.

* This is a revised and updated version of the paper "A Reference Architecture for Generation Systems", which appeared (in error) in Natural Language Engineering 10(3/4), the Special Issue on Software Architectures for Language Engineering. This version should be cited in preference to the earlier one.

1 This survey drew on Reiter's original (Reiter 1994) formulation of the model. The later (Reiter and Dale 2000) formulation uses slightly different terminology, which we also use here, but for our purposes is otherwise not significantly different.

2 The systems surveyed were:
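The two-level data model mentioned above pairs abstract, typed representations with a uniform low-level encoding over which partial and mixed structures can be exchanged between modules. The Python sketch below is a rough illustration of that idea under our own naming assumptions (Obj, Graph, and the type labels); it is not the RAGS specification itself.

```python
# Rough illustration of a two-level data model: abstract typed
# structures expressed over a uniform low-level graph of objects and
# arrows. Names are our own assumptions, not the RAGS definitions.
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Obj:
    """Low level: a typed node; partial structures are legal."""
    ident: str
    level_type: str  # abstract-level type label, e.g. "RhetRep"


@dataclass
class Graph:
    """Low level: objects connected by named arrows."""
    objects: dict = field(default_factory=dict)
    arrows: list = field(default_factory=list)  # (source, label, target)

    def add(self, obj: Obj):
        self.objects[obj.ident] = obj

    def connect(self, src: str, label: str, tgt: str):
        self.arrows.append((src, label, tgt))


# High level: an abstract rhetorical structure built from the same
# low-level primitives, so modules can exchange partial representations.
g = Graph()
g.add(Obj("r1", "RhetRep"))
g.add(Obj("p1", "SemRep"))
g.add(Obj("p2", "SemRep"))
g.connect("r1", "nucleus", "p1")
g.connect("r1", "satellite", "p2")
```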

Natural Language Generation from Knowledge-Base Triples

2021

The main goal of this master's thesis is to create a machine-learning-based tool that can verbalize given data: from given RDF triples, it should create a corresponding text in a natural language (English) that is grammatically correct and fluent, contains all the information from the input data, and adds no information beyond it. The thesis begins by examining the publicly available datasets; it then focuses on the architectures of statistical machine learning models and their possible use for natural language generation. The work also covers possible numerical representations of text, text generation by machine learning models, and optimization algorithms for training the models. The next part of the thesis proposes two main solutions to the problem and examines each of them. All systems are evaluated with automatic metrics, and the best-performing models are then passed to a human (manual) evaluation. The last part of the thesis focuses...
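One widespread way to feed RDF triples to a sequence-to-sequence model, which this thesis may or may not adopt, is to linearise each (subject, predicate, object) triple with separator tokens and fine-tune the model to map the resulting string to the reference text. A minimal sketch, with invented separator tokens:

```python
# Hedged sketch of triple linearisation for neural data-to-text
# generation. The <S>/<P>/<O> separator tokens are assumptions.

def linearise(triples):
    """Turn RDF triples into a flat input string for a seq2seq model."""
    parts = []
    for subj, pred, obj in triples:
        parts.append(f"<S> {subj} <P> {pred} <O> {obj}")
    return " ".join(parts)


triples = [
    ("Alan_Turing", "birthPlace", "London"),
    ("Alan_Turing", "field", "Computer_Science"),
]
source = linearise(triples)
# A pretrained encoder-decoder model would then be fine-tuned to map
# `source` to the human-written verbalisation of these triples.
print(source)
```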

Reinterpretation of an existing NLG system in a Generic Generation Architecture

Proceedings of the …, 2000

The RAGS project aims to define a reference architecture for Natural Language Generation (NLG) systems. Currently the major part of this architecture consists of a set of datatype definitions for specifying the input and output formats for modules within NLG systems. In this paper we describe our efforts to reinterpret an existing NLG system in terms of these definitions. The system chosen was the Caption Generation System.
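In software terms, the exercise of reinterpreting a legacy system in terms of shared datatype definitions amounts to writing adapters around its native input and output formats. The sketch below illustrates that pattern with hypothetical stand-ins; CaptionPlanner and the conversion functions are invented for illustration and are not the actual Caption Generation System code.

```python
# Illustrative adapter pattern for conforming a legacy module to shared
# datatype definitions. Everything here is a hypothetical stand-in.

class CaptionPlanner:
    """Stand-in for a legacy module with its own native data format."""

    def plan(self, native_input: dict) -> dict:
        return {"caption_spec": native_input.get("graphic", "unknown")}


def to_native(shared_doc_rep: dict) -> dict:
    """Map the shared document representation to the legacy format."""
    return {"graphic": shared_doc_rep.get("content")}


def from_native(native_output: dict) -> dict:
    """Map the legacy output back into the shared representation."""
    return {"type": "DocRep", "content": native_output["caption_spec"]}


def adapted_plan(shared_doc_rep: dict) -> dict:
    """The legacy planner, viewed as a module in the shared architecture."""
    planner = CaptionPlanner()
    return from_native(planner.plan(to_native(shared_doc_rep)))


print(adapted_plan({"type": "DocRep", "content": "scatter plot"}))
```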

Description-directed Natural Language Generation

1985

We report here on a significant new set of capabilities that we have incorporated into our language generation system MUMBLE. Their impact will be to greatly simplify the work of any text planner that uses MUMBLE as its linguistics component, since MUMBLE can now take on many of the planner's text organization and decision-making problems with markedly less hand-tailoring of algorithms in either component.

Architectures for Natural Language Generation: Problems and Perspectives

1993

Current research in natural language generation is situated in a computational linguistics tradition that was founded several decades ago. We critically analyse some of the architectural assumptions underlying existing systems and point out some problems in the domains of text planning and lexicalization. Guided by the identification of major generation challenges viewed from the angles of knowledge-based systems and cognitive psychology, we sketch some new directions for future research.

Robust natural language generation from large-scale knowledge bases

1995

We have begun to see the emergence of large-scale knowledge bases that house tens of thousands of facts encoded in expressive representational languages. The richness of these representations offers the promise of significantly improving the quality of natural language generation, but their representational complexity, scale, and task-independence pose great challenges to generators.