Natural language generation: The commercial state of the art in 2020

Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation

This paper surveys the current state of the art in Natural Language Generation (NLG), defined as the task of generating text or speech from non-linguistic input. A survey of NLG is timely in view of the changes that the field has undergone over the past decade or so, especially in relation to new (usually data-driven) methods, as well as new applications of NLG technology. This survey therefore aims to (a) give an up-to-date synthesis of research on the core tasks in NLG and the architectures in which such tasks are organised; (b) highlight a number of relatively recent research topics that have arisen partly as a result of growing synergies between NLG and other areas of artificial intelligence; and (c) draw attention to the challenges in NLG evaluation, relating them to similar challenges faced in other areas of NLP, with an emphasis on different evaluation methods and the relationships between them.

Data-driven Natural Language Generation: Paving the Road to Success

ArXiv, 2017

We argue that there are currently two major bottlenecks to the commercial use of statistical machine learning approaches for natural language generation (NLG): (a) The lack of reliable automatic evaluation metrics for NLG, and (b) The scarcity of high quality in-domain corpora. We address the first problem by thoroughly analysing current evaluation metrics and motivating the need for a new, more reliable metric. The second problem is addressed by presenting a novel framework for developing and evaluating a high quality corpus for NLG training.
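As a concrete illustration of the kind of automatic metric the paper analyses, a word-overlap score in the BLEU family can be sketched in a few lines. This is a simplified, hypothetical scorer written for illustration, not the metric the paper proposes:

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_precision(candidate, reference, n=2):
    """Clipped n-gram precision: the fraction of candidate n-grams
    that also occur in the reference (counts clipped, as in BLEU)."""
    cand = Counter(ngrams(candidate.split(), n))
    ref = Counter(ngrams(reference.split(), n))
    if not cand:
        return 0.0
    overlap = sum(min(count, ref[g]) for g, count in cand.items())
    return overlap / sum(cand.values())

# Three of the five candidate bigrams appear in the reference.
print(ngram_precision("the cat sat on the mat",
                      "the cat is on the mat", n=2))  # 0.6
```

Surface-overlap scores of this kind are exactly what makes such metrics unreliable for NLG: a perfectly good paraphrase that shares few n-grams with the single reference scores near zero.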

A repository of data and evaluation resources for natural language generation

2012

Starting in 2007, the field of natural language generation (NLG) has organised shared-task evaluation events every year, under the Generation Challenges umbrella. In the course of these shared tasks, a wealth of data has been created, along with associated task definitions and evaluation regimes. In other contexts too, sharable NLG data is now being created.

Toward Natural Language Generation by Humans

Intelligent Narrative Technologies, 2015

Natural language generation (NLG) has been featured in at most a handful of shipped games and interactive stories. This is certainly due to it being a very specialized practice, but another contributing factor is that the state of the art today, in terms of content quality, is simply inadequate. The major benefits of NLG are its alleviation of authorial burden and the capability it gives to a system of generating state-bespoke content, but we believe we can have these benefits without actually employing a full NLG pipeline. In this paper, we present the preliminary design of Expressionist, an in-development mixed-initiative authoring tool that instantiates an authoring scheme residing somewhere between conventional NLG and conventional human content authoring. In this scheme, a human author plays the part of an NLG module in that she starts from a set of deep representations constructed for the game or story domain and proceeds to specify dialogic content that may express those representations. Rather than authoring static dialogue, the author defines a probabilistic context-free grammar that yields templated dialogue. This allows a human author to still harness a computer's generativity, but in a capacity in which it can be trusted: operating over probabilities and treelike control structures. Additional features of Expressionist's design include arbitrary markup and realtime feedback showing currently valid derivations.
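The authoring scheme described above, in which a human defines a probabilistic context-free grammar that yields templated dialogue, can be sketched compactly. The grammar below is a toy invented for illustration; Expressionist's actual representation, markup, and feedback mechanisms are considerably richer:

```python
import random

# Toy PCFG: each nonterminal maps to (weight, expansion) pairs.
# Angle-bracketed tokens are nonterminals; [speaker] is a template
# slot filled in at realisation time with game-state values.
GRAMMAR = {
    "<greet>": [(0.6, "<hi> , <name> !"),
                (0.4, "<hi> there , <name> .")],
    "<hi>":    [(0.5, "hello"), (0.3, "hey"), (0.2, "good evening")],
    "<name>":  [(1.0, "[speaker]")],
}

def derive(symbol, rng):
    """Expand a symbol by sampling productions according to weight."""
    if symbol not in GRAMMAR:
        return symbol  # terminal token
    weights, expansions = zip(*GRAMMAR[symbol])
    choice = rng.choices(expansions, weights=weights)[0]
    return " ".join(derive(token, rng) for token in choice.split())

def realise(template, slots):
    """Fill template slots such as [speaker] with concrete values."""
    for slot, value in slots.items():
        template = template.replace(f"[{slot}]", value)
    return template

rng = random.Random(0)
print(realise(derive("<greet>", rng), {"speaker": "Ada"}))
```

The division of labour matches the paper's argument: the human authors all surface content, while the computer is trusted only with what it is good at, namely sampling over probabilities and tree-like control structures.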

Natural language generation

Handbook of Natural Language Processing, 2000

We report here on a significant new set of capabilities that we have incorporated into our language generation system MUMBLE. Their impact will be to greatly simplify the work of any text planner that uses MUMBLE as its linguistics component, since MUMBLE can now take on many of the planner's text organization and decision-making problems with markedly less hand-tailoring of algorithms in either component.

INLG 2008 Fifth International Natural Language Generation Conference

2008

We are pleased to introduce the technical program of the Fifth International Natural Language Generation Conference (INLG 2008), the Biennial Meeting of SIGGEN, the ACL Special Interest Group in Natural Language Generation. INLG is the leading international conference on research into natural language generation. It has been held in Sydney (Australia) in 2006, at Brockenhurst (UK) in 2004, in Harriman (New York, USA) in 2002, and in Mitzpe Ramon (Israel) in 2000.

Acquiring Correct Knowledge for Natural Language Generation

Journal of Artificial Intelligence Research, 2003

Natural language generation (NLG) systems are computer software systems that produce texts in English and other human languages, often from non-linguistic input data. NLG systems, like most AI systems, need substantial amounts of knowledge. However, our experience in two NLG projects suggests that it is difficult to acquire correct knowledge for NLG systems; indeed, every knowledge acquisition (KA) technique we tried had significant problems. In general terms, these problems were due to the complexity, novelty, and poorly understood nature of the tasks our systems attempted, and were worsened by the fact that people write so differently. This meant in particular that corpus-based KA approaches suffered because it was impossible to assemble a sizable corpus of high-quality consistent manually written texts in our domains; and structured expert-oriented KA techniques suffered because experts disagreed and because we could not get enough information about special and unusual cases to build robust systems. We believe that such problems are likely to affect many other NLG systems as well. In the long term, we hope that new KA techniques may emerge to help NLG system builders. In the shorter term, we believe that understanding how individual KA techniques can fail, and using a mixture of different KA techniques with different strengths and weaknesses, can help developers acquire NLG knowledge that is mostly correct.

Proceedings of the 12th European Workshop on Natural Language Generation (ENLG 2009)

Proceedings of the 12th European Workshop on Natural Language Generation (ENLG 2009), 2009

@Book{ENLG:2009,
  editor    = {Emiel Krahmer and Mari\"{e}t Theune},
  title     = {Proceedings of the 12th European Workshop on Natural Language Generation (ENLG 2009)},
  month     = {March},
  year      = {2009},
  address   = {Athens, Greece},
  publisher = {Association for Computational Linguistics},
  url       = {http://www.aclweb.org/anthology/W09-06}
}

@InProceedings{reiter-EtAl:2009:ENLG,
  author = {Reiter, Ehud and Turner, Ross and Alm, Norman and Black, Rolf and Dempster, Martin and Waller, Annalu},
  title  = {Using {NLG} to Help Language-Impaired Users Tell ...

Implementation architectures for natural language generation

Natural Language Engineering, 2004

Generic software architectures aim to support re-use of components, focusing of research and development effort, and evaluation and comparison of approaches. In the field of natural language processing, generic frameworks for understanding have been successfully deployed to meet all of these aims, but nothing comparable yet exists for generation. The nature of the task itself, and the current methodologies available to research it, seem to make it more difficult to reach the necessary level of consensus to support generic proposals. Recent work has made progress towards establishing a generic framework for generation at the functional level, but left open the issue of actual implementation. In this paper, we discuss the requirements for such an implementation layer for generation systems, drawing on two initial attempts to implement it. We argue that it is possible and useful to distinguish "functional architecture" from "implementation architecture" for generation systems.

1 The Case for a Generic Software Architecture for NLG

Most natural language generation (NLG) systems have some kind of modular structure. The individual modules may differ in complex ways, according to whether they are based on symbolic or statistical models, what particular linguistic theories they embrace and so on. Ideally, such modules could be reused in other NLG systems. This would avoid duplication of work, allow realistic research specialisation and allow empirical comparison of different approaches. Examples of ideas that might give rise to reusable modules include:
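At the implementation level, the kind of swappable-module interface argued for here might look something like the following schematic sketch. The stage names and toy data follow the classic three-stage pipeline (content planning, microplanning, realisation); the specific functions are our own invented stand-ins, not the paper's proposed layer:

```python
from typing import Any, Callable

# A pipeline module is any callable from one intermediate
# representation to the next (e.g. document plan -> sentence plans).
Module = Callable[[Any], Any]

def pipeline(*modules: Module) -> Module:
    """Compose independently authored modules into one generator,
    so e.g. a realiser can be swapped out without touching the rest."""
    def run(data: Any) -> Any:
        for module in modules:
            data = module(data)
        return data
    return run

# Hypothetical stand-ins for the three classic stages.
def document_planner(facts):
    return [("temp", facts["temp"])]            # crude content selection

def microplanner(messages):
    return [f"the temperature is {value}" for _, value in messages]

def realiser(sentences):
    return " ".join(s.capitalize() + "." for s in sentences)

generate = pipeline(document_planner, microplanner, realiser)
print(generate({"temp": 21}))  # The temperature is 21.
```

The point of a generic implementation architecture is precisely that the interchange representations between such stages are agreed in advance, so modules from different research groups compose without hand-tailored glue.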