David M Howcroft | Heriot-Watt University

Papers by David M Howcroft

Toward Bayesian Synchronous Tree Substitution Grammars for Sentence Planning

Proceedings of the 11th International Conference on Natural Language Generation, 2018

Developing conventional natural language generation systems requires extensive attention from human experts in order to craft complex sets of sentence planning rules. We propose a Bayesian nonparametric approach to learn sentence planning rules by inducing synchronous tree substitution grammars for pairs of text plans and morphosyntactically specified dependency trees. Our system is able to learn rules that can be used to generate novel texts after training on small datasets.

G-TUNA: a corpus of referring expressions in German, including duration information

Corpora of referring expressions elicited from human participants in a controlled environment are an important resource for research on automatic referring expression generation. Here we present G-TUNA, a new corpus of referring expressions for German. Using images of furniture as stimuli, similarly to the TUNA and D-TUNA corpora, our corpus extends these corpora by providing data collected in a simulated driving dual-task setting, and additionally provides exact duration annotations for the spoken referring expressions. This corpus will hence allow researchers to analyze the interaction between referring expression length and speech rate under conditions where the listener is under high vs. low cognitive load.

The Extended SPaRKy Restaurant Corpus: designing a corpus with variable information density

Natural language generation (NLG) systems rely on corpora both for hand-crafted approaches in a traditional NLG architecture and for statistical end-to-end (learned) generation systems. Limitations in existing resources, however, make it difficult to develop systems that can vary the linguistic properties of an utterance as needed. For example, when users' attention is split between a linguistic task and a secondary task such as driving, a generation system may need to reduce the information density of an utterance to compensate for the reduction in user attention. We introduce a new corpus in the restaurant recommendation and comparison domain, collected in a paraphrasing paradigm, where subjects wrote texts targeting either a general audience or an elderly family member. This design resulted in a corpus of more than 5000 texts that exhibit a variety of lexical and syntactic choices and differ with respect to average word and sentence length and surprisal. The corpus includes two levels of meaning representation: flat 'semantic stacks' for propositional content and Rhetorical Structure Theory (RST) relations between these propositions.

Psycholinguistic Models of Sentence Processing Improve Sentence Readability Ranking

While previous research on readability has typically focused on document-level measures, recent work in areas such as natural language generation has pointed out the need for sentence-level readability measures. For many years, much of psycholinguistics has focused on processing measures that provide difficulty estimates on a word-by-word basis. However, these psycholinguistic measures have not yet been tested on sentence readability ranking tasks. In this paper, we use four psycholinguistic measures (idea density, surprisal, integration cost, and embedding depth) to test whether these features are predictive of readability levels. We find that psycholinguistic features significantly improve performance by up to 3 percentage points over a standard document-level readability metric baseline.

Search Challenges in Natural Language Generation with Complex Optimization Objectives

Automatic natural language generation (NLG) is a difficult problem even when merely trying to come up with natural-sounding utterances. Ubiquitous applications, in particular companion technologies, pose the additional challenge of flexible adaptation to a user or a situation. This requires optimizing complex objectives, such as information density, in combinatorial search spaces described using declarative input languages. We believe that AI search and planning are a natural match for these problems and could contribute substantially to solving them effectively. We illustrate this using a concrete example NLG framework, summarize the relevant optimization objectives, and provide an initial list of research challenges.

From OpenCCG to AI Planning: Detecting Infeasible Edges in Sentence Generation

Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, 2016

The search space in grammar-based natural language generation tasks can get very large, which is particularly problematic when generating long utterances or paragraphs. Using surface realization with OpenCCG as an example, we show that we can effectively detect partial solutions (edges) that cannot ultimately be part of a complete sentence because of their syntactic category. Formulating the completion of an edge into a sentence as finding a solution path in a large state-transition system, we demonstrate a connection to AI Planning, which is concerned with this kind of problem. We design a compilation from OpenCCG into AI Planning that allows the detection of infeasible edges via AI Planning dead-end detection methods (proving the absence of a solution to the compilation). Our experiments show that this can filter out large fractions of infeasible edges in, and thus benefit the performance of, complex realization processes.

Inducing Clause-Combining Rules: A Case Study with the SPaRKy Restaurant Corpus

Proceedings of the 15th European Workshop on Natural Language Generation (ENLG), 2015

We describe an algorithm for inducing clause-combining rules for use in a traditional natural language generation architecture. An experiment pairing lexicalized text plans from the SPaRKy Restaurant Corpus with logical forms obtained by parsing the corresponding sentences demonstrates that the approach is able to learn clause-combining operations that have essentially the same coverage as those used in the SPaRKy Restaurant Corpus. This paper fills a gap in the literature, showing that it is possible to learn microplanning rules for both aggregation and discourse connective insertion, an important step towards ameliorating the knowledge acquisition bottleneck for NLG systems that produce texts with rich discourse structures using traditional architectures.

Enhancing the Expression of Contrast in the SPaRKy Restaurant Corpus

Proceedings of the 14th European Workshop on Natural Language Generation, Aug 8, 2013

We show that Nakatsu & White’s (2010) proposed enhancements to the SPaRKy Restaurant Corpus (SRC; Walker et al., 2007) for better expressing contrast do indeed make it possible to generate better texts, including ones that make effective and varied use of contrastive connectives and discourse adverbials. After first presenting a validation experiment for naturalness ratings of SRC texts gathered using Amazon’s Mechanical Turk, we present an initial experiment suggesting that such ratings can be used to train a realization ranker that selects more highly rated texts when trained on a sample of generated restaurant recommendations with the contrast enhancements than when trained on one without them. We conclude with a discussion of possible ways of improving the ranker in future work.

Articles by David M Howcroft

How speakers adapt object descriptions to listeners under load

Language, Cognition and Neuroscience, 2019

A controversial issue in psycholinguistics is the degree to which speakers employ audience design during language production. Hypothesising that a consideration of the listener's needs is particularly relevant when the listener is under cognitive load, we had speakers describe objects for a listener performing an easy or a difficult simulated driving task. We predicted that speakers would introduce more redundancy in their descriptions in the difficult driving task, thereby accommodating the listener's reduced cognitive capacity. The results showed that speakers did not adapt their descriptions to a change in the listener's cognitive load. However, speakers who had previously experienced the driving task themselves and who were presented with the difficult driving task first were more redundant than other speakers. These findings may suggest that speakers only consider the listener's needs in the presence of strong enough cues, and do not update their beliefs about these needs during the task.
