Robust natural language generation from large-scale knowledge bases
Related papers
Natural Language Generation from Knowledge-Base Triples
2021
The main goal of this master's thesis is to create a machine-learning-based tool that verbalizes given data: from a set of RDF triples, it should produce a corresponding text in a natural language (English) that is grammatically correct and fluent, contains all the information from the input data, and adds no information of its own. The thesis begins by examining the publicly available datasets; it then focuses on the architectures of statistical machine learning models and their possible use for natural language generation. The work also covers numerical text representations, text generation by machine learning models, and optimization algorithms for training the models. The next part of the thesis proposes two main solutions to the problem and examines each of them. All systems are evaluated with automatic metrics, and the best-performing models are then passed to a human (manual) evaluation. The last part of the thesis focuses...
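As a concrete (if drastically simplified) illustration of the verbalization task, here is a minimal template-based sketch in Python; it does not reproduce the thesis's machine-learning models, and the predicates and templates are invented for illustration.

```python
# A minimal sketch of template-based verbalization of RDF triples.
# The predicate names and templates are illustrative assumptions.

TEMPLATES = {
    "birthPlace": "{s} was born in {o}.",
    "occupation": "{s} works as {o}.",
}

def verbalize(triples):
    """Render each (subject, predicate, object) triple with its template."""
    sentences = []
    for s, p, o in triples:
        template = TEMPLATES.get(p)
        if template is None:
            # Fallback keeps coverage: every input fact must surface in the text.
            template = "{s} " + p + " {o}."
        sentences.append(template.format(s=s, o=o))
    return " ".join(sentences)

print(verbalize([("Ada Lovelace", "birthPlace", "London"),
                 ("Ada Lovelace", "occupation", "a mathematician")]))
# -> "Ada Lovelace was born in London. Ada Lovelace works as a mathematician."
```

A real system must additionally guarantee fluency and completeness across arbitrary inputs, which is what motivates the statistical models the thesis studies.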
Generation that exploits corpus-based statistical knowledge
Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics, 1998
We describe novel aspects of a new natural language generator called Nitrogen. This generator has a highly flexible input representation that allows a spectrum of input from syntactic to semantic depth, and shifts the burden of many linguistic decisions to the statistical post-processor. The generation algorithm is compositional, making it efficient, yet it also handles non-compositional aspects of language. Nitrogen's design makes it robust and scalable, operating with lexicons and knowledge bases of one hundred thousand entities.
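Nitrogen's statistical post-processor ranks the many candidate realizations that its symbolic component proposes as a word lattice. Here is a minimal sketch of that overgenerate-and-rank idea with a toy bigram language model; the corpus, the smoothing, and the lattice are invented for illustration and are far simpler than Nitrogen's actual machinery.

```python
import math
from collections import Counter
from itertools import product

# Toy training corpus for the bigram model (an illustrative assumption).
corpus = "the dog sleeps . the dog barks . a cat sleeps .".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
V = len(unigrams)

def bigram_logprob(tokens, alpha=0.1):
    """Add-alpha smoothed bigram log-probability of a token sequence."""
    return sum(
        math.log((bigrams[(w1, w2)] + alpha) / (unigrams[w1] + alpha * V))
        for w1, w2 in zip(tokens, tokens[1:])
    )

# An underspecified input becomes a small "lattice" of alternatives;
# decisions like number agreement are deferred to the statistical ranker.
lattice = [["the", "a"], ["dog", "dogs"], ["sleeps", "sleep"], ["."]]
candidates = [list(path) for path in product(*lattice)]
best = max(candidates, key=bigram_logprob)
print(" ".join(best))  # -> "the dog sleeps ."
```

Enumerating every lattice path, as done here, explodes combinatorially; Nitrogen instead scores the lattice compactly, which is what makes it scale to large lexicons and knowledge bases.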
The realities of generating natural language from databases
1998
Research in natural language generation promises significant advances in the ways in which we can make available the contents of underlying information sources. Most work in the field relies on the existence of carefully constructed artificial intelligence knowledge bases; however, the reality is that most information currently stored on computers is not represented in this format. In this paper, we describe some work in progress where we attempt to generate large numbers of texts automatically from existing underlying databases. We focus here in particular on the automatic generation of descriptions of objects stored in a museum database, highlighting the difficulties that arise in using a real data source, and pointing to some possible solutions.
Natural Language Generation
Oxford Handbooks Online, 2005
Communication via a natural language requires two fundamental skills: producing text and understanding it. This article introduces the field of computational approaches to the former, natural language generation (NLG), showing some of the theoretical and practical problems that linguists, computer scientists, and psychologists have encountered when trying to explain how language works in machines or in their minds. The corresponding task of NLG spans a wide spectrum, ranging from planning some action to executing it. Providing architectures in which all of these decisions can be made to coexist, while still allowing the production of natural-sounding texts within a reasonable amount of time, is one of the major challenges of NLG. Another challenge is ascertaining just what the decisions involved in NLG are. This article surveys the cognitive, social, and linguistic dimensions of NLG and closes with open issues and problems in the field.
Implementation architectures for natural language generation
Natural Language Engineering, 2004
Generic software architectures aim to support re-use of components, focusing of research and development effort, and evaluation and comparison of approaches. In the field of natural language processing, generic frameworks for understanding have been successfully deployed to meet all of these aims, but nothing comparable yet exists for generation. The nature of the task itself, and the current methodologies available to research it, seem to make it more difficult to reach the necessary level of consensus to support generic proposals. Recent work has made progress towards establishing a generic framework for generation at the functional level, but left open the issue of actual implementation. In this paper, we discuss the requirements for such an implementation layer for generation systems, drawing on two initial attempts to implement it. We argue that it is possible and useful to distinguish "functional architecture" from "implementation architecture" for generation systems.

1 The Case for a Generic Software Architecture for NLG

Most natural language generation (NLG) systems have some kind of modular structure. The individual modules may differ in complex ways, according to whether they are based on symbolic or statistical models, what particular linguistic theories they embrace and so on. Ideally, such modules could be reused in other NLG systems. This would avoid duplication of work, allow realistic research specialisation and allow empirical comparison of different approaches. Examples of ideas that might give rise to reusable modules include:
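(The excerpt breaks off before that list.) To make the functional/implementation distinction concrete, here is a minimal sketch, assuming the classic three-stage pipeline (document planning, microplanning, surface realization) as the fixed functional architecture; every class and method name below is invented for illustration.

```python
# The functional architecture fixes the stage order and the data passed
# between stages; each module implementation is swappable behind that contract.

class Pipeline:
    """Fixed functional skeleton: plan -> microplan -> realize."""
    def __init__(self, planner, microplanner, realizer):
        self.planner = planner
        self.microplanner = microplanner
        self.realizer = realizer

    def generate(self, facts):
        plan = self.planner.plan(facts)          # content selection/ordering
        specs = self.microplanner.specify(plan)  # lexical choice, aggregation
        return self.realizer.realize(specs)      # surface realization

# Trivial drop-in implementations of each module.
class IdentityPlanner:
    def plan(self, facts):
        return facts  # keep the input order

class CopulaMicroplanner:
    def specify(self, plan):
        return [{"subj": f["s"], "verb": "is", "obj": f["o"]} for f in plan]

class TemplateRealizer:
    def realize(self, specs):
        return " ".join(f"{x['subj']} {x['verb']} {x['obj']}." for x in specs)

nlg = Pipeline(IdentityPlanner(), CopulaMicroplanner(), TemplateRealizer())
print(nlg.generate([{"s": "The exhibit", "o": "a bronze statue"}]))
# -> "The exhibit is a bronze statue."
```

Any module honoring the stage interfaces (a statistical realizer, say) can replace its trivial counterpart without touching the rest of the pipeline, which is exactly the re-use and comparison the paper argues a generic architecture should enable.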
Architectures for Natural Language Generation: Problems and Perspectives
1993
Current research in natural language generation is situated in a computational linguistics tradition that was founded several decades ago. We critically analyse some of the architectural assumptions underlying existing systems and point out some problems in the domains of text planning and lexicalization. Guided by the identification of major generation challenges viewed from the angles of knowledge-based systems and cognitive psychology, we sketch some new directions for future research.
In recent years, natural language generation research has begun to mature rapidly. It has witnessed both the appearance of sophisticated off-the-shelf surface realization systems and the development, in the knowledge representation community, of knowledge bases in a variety of domains. However, knowledge bases typically employ a logic formalism, while the state-of-the-art surface realizer, FUF, employs a unification-based formalism to encode input expressions, grammatical knowledge, and lexical knowledge.
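Since the mismatch turns on FUF's unification-based functional descriptions, a minimal sketch of feature-structure unification may help fix ideas; it treats functional descriptions as nested Python dicts, and the features in the example are invented for illustration rather than taken from FUF's grammar.

```python
# A minimal sketch of the unification operation at the heart of a
# FUF-style realizer: functional descriptions are nested attribute-value
# structures, and unification merges them, failing on conflicting values.

FAIL = object()

def unify(fd1, fd2):
    """Recursively unify two functional descriptions (nested dicts)."""
    if fd1 == fd2:
        return fd1
    if isinstance(fd1, dict) and isinstance(fd2, dict):
        result = dict(fd1)
        for key, val in fd2.items():
            if key in result:
                merged = unify(result[key], val)
                if merged is FAIL:
                    return FAIL
                result[key] = merged
            else:
                result[key] = val
        return result
    return FAIL  # conflicting atomic values cannot unify

# An input expression recast as a functional description (illustrative):
input_fd = {"cat": "clause", "proc": {"lex": "sell"}, "agent": {"lex": "museum"}}
# A grammar fragment contributing syntactic features (illustrative):
grammar_fd = {"cat": "clause", "tense": "past"}
print(unify(input_fd, grammar_fd))
# -> {'cat': 'clause', 'proc': {'lex': 'sell'}, 'agent': {'lex': 'museum'}, 'tense': 'past'}
```

The gap the abstract points at is the translation step before this: logic formulas from the knowledge base must first be recast as functional descriptions like input_fd above.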
Natural Language Generation in the context of the Semantic Web
Semantic Web, 2014
Natural Language Generation (NLG) is concerned with transforming some formal content input into a natural language output, given some communicative goal. Although this input has taken many forms and representations over the years, it is the semantic/conceptual representations that have always been considered as the "natural" starting ground for NLG. Therefore, it is natural that the semantic web, with its machine-processable representation of information with explicitly defined semantics, has attracted the interest of NLG practitioners from early on. We attempt to provide an overview of the main paradigms of NLG from SW data, emphasizing how the Semantic Web provides opportunities for the NLG community to improve their state-of-the-art approaches whilst bringing about challenges that need to be addressed before we can speak of a real symbiosis between NLG and the Semantic Web.
Language Generation for Broad-Coverage, Explainable Cognitive Systems
2022
This paper describes recent progress on natural language generation (NLG) for language-endowed intelligent agents (LEIAs) developed within the OntoAgent cognitive architecture. The approach draws heavily from past work on natural language understanding in this paradigm: it uses the same knowledge bases, theory of computational linguistics, agent architecture, and methodology of developing broad-coverage capabilities over time while still supporting near-term applications.
Proceedings of the Linguistic Resources for Automatic Natural Language Generation - LiRA@NLG
2017
The Linguistic Resources for Automatic Natural Language Generation (LiRA@NLG) workshop of the International Natural Language Generation conference (INLG 2017), held in Santiago de Compostela on September 4, 2017, brought together participants involved in developing large-coverage linguistic resources and researchers with an interest in expanding real-world Natural Language Generation (NLG) software. Linguists and developers of NLG software have been working separately for many years: NLG researchers are typically more focused on technical issues specific to text generation, where good performance (e.g. recall and precision) is crucial, whereas linguists tend to focus on problems related to the development of exhaustive and precise resources that are mainly 'neutral' vis-à-vis any NLP application (e.g. parsing or generating sentences), using various grammatical formalisms such as NooJ, TAG or HPSG. However, recent progress in both fields is reducing many of these differences, with large-coverage linguistic resources being used more and more by robust NLP software. For instance, NLG researchers can now use large dictionaries of multiword units and expressions, and several linguistic experiments have shown the feasibility of using large phrase-structure grammars (a priori used for text parsing) in 'generation' mode to automatically produce paraphrases of the sentences that these grammars describe.

The eight papers presented at the LiRA@NLG workshop focused on the following questions: How do we develop 'neutral' linguistic resources (dictionaries, morphological, phrase-structure and transformational grammars) that can be used both to parse and to generate texts automatically? Is it possible to generate grammatical sentences by using linguistic data alone, i.e. with no statistical methods to remove ambiguities? What are the limitations of rule-based systems, as opposed to stochastic ones?

The common themes that these articles explore are: how to build large-coverage dictionaries and morphological grammars that can be used by NLG applications, how to integrate a linguistically based generation module into a machine-translation system, and how to construct a syntactic grammar that can be used by a transformational engine to perform paraphrase generation. Linguists as well as computational linguists who work on automatic generation based on linguistic methods will find advanced, up-to-the-minute studies on these topics in this volume. Max Silberztein's article, "Automatic Generation from FOAF to English: Linguistic Contribution to Web Semantics," presents an automatic system capable of generating a large number of English sentences from Friend Of A Friend (FOAF) statements in the RDF Turtle notation, using NooJ's transformational engine in both Parse and Generation modes.
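To make the FOAF-to-English task concrete (Silberztein's system is built on NooJ's transformational engine, not on Python), here is a minimal sketch that maps one FOAF predicate to an English pattern; it assumes the rdflib library, and the data and the pattern are invented for illustration.

```python
# A minimal sketch of verbalizing FOAF statements in RDF Turtle notation.
# Requires: pip install rdflib
from rdflib import Graph
from rdflib.namespace import FOAF

ttl = """
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<http://example.org/alice> foaf:name "Alice" ;
                           foaf:knows <http://example.org/bob> .
<http://example.org/bob>   foaf:name "Bob" .
"""

g = Graph()
g.parse(data=ttl, format="turtle")

def name_of(node):
    """Prefer a foaf:name label; fall back to the raw IRI."""
    return str(g.value(node, FOAF.name) or node)

# One illustrative predicate-to-English pattern: foaf:knows -> "X knows Y."
for s, o in g.subject_objects(FOAF.knows):
    print(f"{name_of(s)} knows {name_of(o)}.")
# -> "Alice knows Bob."
```

A full linguistically based system would, as the workshop papers discuss, route such patterns through dictionaries and grammars to handle agreement, paraphrase, and multiword expressions rather than hard-coding one string per predicate.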