G-TUNA: a corpus of referring expressions in German, including duration information

Speaker-dependent variation in content selection for referring expression generation

2010

In this paper we describe machine learning experiments that aim to characterise the content selection process for distinguishing descriptions. Our experiments are based on two large corpora of human-produced descriptions of objects in relatively small visual scenes; the referring expressions are annotated with their semantic content. The visual context of reference is widely considered to be a primary determinant of content in referring expression generation, so we explore whether a model can be trained to predict the collection of descriptive attributes that should be used in a given situation. Our experiments demonstrate that speaker-specific preferences play a much more important role than existing approaches to referring expression generation acknowledge.
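As a rough illustration of this kind of setup (a minimal sketch with invented feature names and toy data, not the paper's actual corpora or code), attribute selection can be framed as multi-label classification over context features, with the speaker's identity included as one more feature:

```python
# Minimal sketch: predict which descriptive attributes a speaker will use,
# given scene features plus the speaker's identity. All feature names and
# values below are invented for illustration.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Toy trials: visual-context features plus a speaker ID.
trials = [
    {"distractors_same_colour": 1, "distractors_same_size": 2, "speaker": "s1"},
    {"distractors_same_colour": 0, "distractors_same_size": 3, "speaker": "s1"},
    {"distractors_same_colour": 1, "distractors_same_size": 0, "speaker": "s2"},
    {"distractors_same_colour": 2, "distractors_same_size": 1, "speaker": "s2"},
]
# The attribute set each speaker actually produced in that trial.
attribute_sets = [{"colour"}, {"size"}, {"colour", "size"}, {"colour"}]

vec = DictVectorizer()            # also one-hot encodes the speaker ID
X = vec.fit_transform(trials)
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(attribute_sets)

# One binary classifier per attribute: include it in the description or not.
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)

new_trial = {"distractors_same_colour": 1, "distractors_same_size": 1,
             "speaker": "s1"}
pred = clf.predict(vec.transform([new_trial]))
print(mlb.inverse_transform(pred))  # predicted attribute set for this trial
```

Dropping the speaker field from the feature dictionaries yields a context-only baseline; comparing the two conditions is one way to probe the speaker effect the abstract reports.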

Towards a balanced corpus of multimodal referring expressions in dialogue

2008

This paper describes an experiment in which dialogues are elicited through an identification task. The primary purpose of the experiment is to test a number of hypotheses regarding both the production and perception of multimodal referring expressions. To achieve this, the experiment was designed such that a number of factors (prior reference, focus of attention, visual attributes and cardinality) were systematically manipulated. The collected data are currently being transcribed.

Generating a Novel Dataset of Multimodal Referring Expressions

Proceedings of the 13th International Conference on Computational Semantics - Short Papers, 2019

Referring expressions and definite descriptions of objects in space exploit information about both object characteristics and locations. Linguistic referencing strategies can rely on increasingly high-level abstractions to distinguish an object in a given location from similar ones elsewhere, yet the description of the intended location may still be unnatural or difficult to interpret. Modalities like gesture may communicate spatial information such as location more concisely. When communicating with each other, humans mix language and gesture to reference entities, changing modalities as needed. Recent progress in AI and human-computer interaction has created systems where a human can interact with a computer multimodally, but computers often lack the capacity to intelligently mix modalities when generating referring expressions. We present a novel dataset of referring expressions combining natural language and gesture, describe its creation and evaluation, and discuss its uses for training models to generate and interpret multimodal referring expressions.

Referring Expression Generation: Taking Speakers’ Preferences into Account

Lecture Notes in Computer Science, 2014

We describe a classification-based approach to referring expression generation (REG) that makes use of standard context-related features, and an extension that adds speaker-related features. Results on four test corpora of definite descriptions show that taking speakers' preferences into account outperforms the standard REG model.
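One plausible reading of the speaker-related extension (a hypothetical sketch, not the authors' implementation) is to augment each instance's context features with the speaker's empirical attribute-usage frequencies:

```python
# Hypothetical sketch: merge context features with a profile of the
# speaker's past attribute-usage frequencies. All names are illustrative.
from collections import Counter

def speaker_profile(history):
    """history: attribute sets this speaker produced in earlier trials."""
    counts = Counter(attr for attrs in history for attr in attrs)
    total = max(len(history), 1)
    return {f"spk_freq_{attr}": n / total for attr, n in counts.items()}

context = {"n_distractors": 4, "target_has_unique_colour": 1}
history = [{"colour"}, {"colour", "size"}, {"colour"}]

features = {**context, **speaker_profile(history)}
print(features)
# {'n_distractors': 4, 'target_has_unique_colour': 1,
#  'spk_freq_colour': 1.0, 'spk_freq_size': 0.3333333333333333}
```

A standard classifier trained on the merged dictionaries would then realise the context-plus-speaker condition that the abstract compares against the context-only baseline.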

Referring in installments: A corpus study of spoken object references in an interactive virtual environment

2012

Commonly, the result of referring expression generation algorithms is a single noun phrase. In interactive settings with a shared workspace, however, human dialog partners often split referring expressions into installments that adapt to changes in the context and to actions of their partners. We present a corpus of human–human interactions in the GIVE-2 setting in which instructions are spoken. A first study of object descriptions in this corpus shows that references in installments are quite common in this scenario and suggests that ...

Generating referring expressions in multimodal contexts

Workshop on Coherence in …, 2000

This paper addresses the need to structure the global context set into subsets or domains in order to adequately explain the use of referring expressions in a multimodal corpus. We underline, in particular, the importance of taking into account not only discourse, but also perception and gesture in the construction of these domains. We propose a unified context model in which the context is built up dynamically from different information sources, and show that this way of modelling context correctly predicts the use of referring expressions in a corpus of instructional dialogues.

Trainable speaker-based referring expression generation

Twelfth Conference on …, 2008

Previous work in referring expression generation has explored general-purpose techniques for attribute selection and surface realization. However, most of this work did not take into account (a) stylistic differences between speakers or (b) trainable surface realization approaches that combine semantic and word-order information. In this paper we describe and evaluate several end-to-end referring expression generation algorithms that take speaker style into consideration and use data-driven surface realization techniques.

Refer, Reuse, Reduce: Generating Subsequent References in Visual and Conversational Contexts

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020

Dialogue participants often refer to entities or situations repeatedly within a conversation, which contributes to its cohesiveness. Subsequent references exploit the common ground accumulated by the interlocutors and hence have several interesting properties, namely, they tend to be shorter and to reuse expressions that were effective in previous mentions. In this paper, we tackle the generation of first and subsequent references in visually grounded dialogue. We propose a generation model that produces referring utterances grounded in both the visual and the conversational context. To assess the referring effectiveness of its output, we also implement a reference resolution system. Our experiments and analyses show that the model produces better, more effective referring utterances than a model not grounded in the dialogue context, and generates subsequent references that exhibit linguistic patterns akin to those produced by humans.

Trainable Referring Expression Generation using Overspecification Preferences

arXiv, 2017

Referring expression generation (REG) models that use speaker-dependent information require a considerable amount of training data produced by every individual speaker, or may otherwise perform poorly. In this work we present a simple REG experiment that allows the use of larger training data sets by grouping speakers according to their overspecification preferences. Intrinsic evaluation shows that this method generally outperforms the personalised method found in previous work.
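A rough illustration of the grouping idea (toy data and arbitrary thresholds; not the authors' code): score each speaker's overspecification rate, then pool training data within coarse preference groups instead of training one model per speaker:

```python
# Sketch: group speakers by how often they overspecify, i.e. include
# attributes beyond a minimal distinguishing description. Toy data only.
from collections import defaultdict

# (speaker, attributes_used, minimal_attributes_needed) per description
records = [
    ("s1", {"colour", "size"}, {"colour"}),
    ("s1", {"colour"}, {"colour"}),
    ("s2", {"colour", "size", "location"}, {"size"}),
    ("s2", {"colour", "size"}, {"size"}),
    ("s3", {"type"}, {"type"}),
]

counts = defaultdict(lambda: [0, 0])  # speaker -> [overspecified, total]
for speaker, used, minimal in records:
    counts[speaker][0] += int(bool(used - minimal))
    counts[speaker][1] += 1
rates = {s: over / total for s, (over, total) in counts.items()}

def preference_group(rate, low=0.33, high=0.66):  # arbitrary cut-offs
    return "minimal" if rate < low else "moderate" if rate < high else "over"

groups = defaultdict(list)
for speaker, rate in rates.items():
    groups[preference_group(rate)].append(speaker)

print(rates)         # {'s1': 0.5, 's2': 1.0, 's3': 0.0}
print(dict(groups))  # {'moderate': ['s1'], 'over': ['s2'], 'minimal': ['s3']}
```

Each group's pooled data then trains a single REG model, trading speaker-level personalisation for more training material per model.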