Marilyn Walker | University of California, Santa Cruz
Papers by Marilyn Walker
International Conference on Interactive Digital Storytelling, Nov 2014
We present an annotation scheme and combinatorial authoring procedure by which a small base of annotated human-authored dialogue exchanges can be exploited to automatically generate many new exchanges. The combinatorial procedure builds recombinant exchanges by reasoning about individual lines of dialogue in terms of their mark-up, which is attributed during annotation and captures what a line expresses about the story world and what it specifies about lines that may precede or succeed it in new contexts. From a human evaluation task, we find that while our computer-authored recombinant dialogue exchanges are not rated as highly as human-authored ones, they still rate quite well and express game state more than twice as strongly as the human-authored exchanges. We envision immediate practical use of our method in a collaborative authoring scheme in which, given a small database of annotated dialogue, the computer instantly generates many full exchanges that the human author then polishes, if necessary. We believe that combinatorial dialogue authoring represents an immediate and substantial reduction in authorial burden relative to current authoring practice.
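To make the recombination idea concrete, here is a minimal sketch assuming a hypothetical mark-up in which each line carries the facts it asserts about the story world (effects) and the facts that must already hold for it to follow another line (preconditions); the `Line` class, the greedy `recombine` chainer, and the sample lines are invented for illustration and are not the paper's actual annotation scheme.

```python
from dataclasses import dataclass, field

@dataclass
class Line:
    """A single annotated line of dialogue (hypothetical mark-up schema)."""
    text: str
    effects: set = field(default_factory=set)        # what the line asserts about the story world
    preconditions: set = field(default_factory=set)  # what must already hold for the line to follow

def recombine(opening: Line, pool: list, max_len: int = 4) -> list:
    """Greedily chain annotated lines whose preconditions are satisfied
    by the facts established so far, yielding a new recombinant exchange."""
    exchange, facts = [opening], set(opening.effects)
    for _ in range(max_len - 1):
        candidates = [l for l in pool if l not in exchange and l.preconditions <= facts]
        if not candidates:
            break
        nxt = candidates[0]            # a real system would rank or enumerate candidates
        exchange.append(nxt)
        facts |= nxt.effects
    return exchange

pool = [
    Line("Did you find the key?", effects={"asked_about_key"}),
    Line("It was under the altar.", preconditions={"asked_about_key"}, effects={"key_found"}),
    Line("Then we can open the crypt tonight.", preconditions={"key_found"}),
]
for line in recombine(pool[0], pool):
    print(line.text)
```

A real authoring tool would rank candidate continuations rather than taking the first, and would enumerate many alternative chains instead of a single greedy one.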
Journal of Artificial Intelligence Research, 2002
Natural Language Engineering, 2000
Computer Speech & Language, 1998
This paper presents PARADISE (PARAdigm for DIalogue System Evaluation), a general framework for evaluating and comparing the performance of spoken dialogue agents. The framework decouples task requirements from an agent's dialogue behaviors, supports comparisons among dialogue strategies, enables the calculation of performance over subdialogues and whole dialogues, specifies the relative contribution of various factors to performance, and makes it possible to compare agents performing different tasks by normalizing for task complexity. After presenting PARADISE, we illustrate its application to two different spoken dialogue agents. We show how to derive a performance function for each agent and how to generalize results across agents. We then show that, once such a performance function has been derived, it can be used both for making predictions about future versions of an agent and as feedback to the agent, so that the agent can learn to optimize its behavior based on its experiences with users over time.
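As a rough illustration of the kind of performance function PARADISE supports, the sketch below combines a normalized task-success measure with normalized cost measures. The specific metrics, weights, and numbers are placeholders, since in PARADISE they are estimated empirically (e.g. by regressing against user satisfaction).

```python
import statistics

def z_normalize(value: float, population: list) -> float:
    """Normalize a metric to zero mean and unit variance over a set of dialogues."""
    mean = statistics.mean(population)
    stdev = statistics.pstdev(population) or 1.0
    return (value - mean) / stdev

def performance(kappa, costs, kappa_pop, cost_pops, alpha=0.5, weights=None):
    """PARADISE-style performance: weighted task success minus weighted, normalized costs.
    alpha and the cost weights would normally be estimated from user-satisfaction data."""
    weights = weights or [1.0 / len(costs)] * len(costs)
    success = alpha * z_normalize(kappa, kappa_pop)
    cost = sum(w * z_normalize(c, pop) for w, c, pop in zip(weights, costs, cost_pops))
    return success - cost

# Example: one dialogue's task-success kappa and two cost measures (turns, ASR errors),
# normalized against small made-up populations of dialogues.
print(performance(0.8, [12, 3],
                  kappa_pop=[0.5, 0.8, 0.9],
                  cost_pops=[[10, 12, 20], [1, 3, 6]]))
```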
Computational Linguistics, 1996
In computational theories of discourse, there are at least three processes presumed to operate under a limited attention constraint of some type: (1) ellipsis interpretation; (2) pronominal anaphora interpretation; and (3) inference of discourse relations between representations A and B of utterances in a discourse, e.g. B motivates A. In each case, the interpretation of the current element B of a discourse depends on the accessibility of an earlier element A. According to the limited attention constraint, only a limited number of candidates need to be considered in the processing of B; for example, only a limited number of entities in the discourse model are potential cospecifiers for a pronoun.
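The following toy sketch shows what a limited attention constraint can look like operationally: only the k most recently mentioned entities remain available as candidate cospecifiers for a pronoun. The class and capacity value are invented for the example and are not the specific model evaluated in the paper.

```python
from collections import deque

class DiscourseModel:
    """Toy discourse model that keeps only the k most recent entities
    as candidate cospecifiers, illustrating a limited attention constraint."""
    def __init__(self, capacity: int = 3):
        self.recent = deque(maxlen=capacity)   # older entities fall out of attention

    def mention(self, entity: str):
        if entity in self.recent:
            self.recent.remove(entity)
        self.recent.appendleft(entity)         # most recent first

    def pronoun_candidates(self):
        return list(self.recent)

dm = DiscourseModel(capacity=3)
for e in ["Alice", "the report", "Bob", "the meeting"]:
    dm.mention(e)
print(dm.pronoun_candidates())   # only the 3 most recent entities are considered
```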
In order to take steps towards establishing a methodology for evaluating Natural Language systems, we conducted a case study. We attempt to evaluate two different approaches to anaphoric processing in discourse by comparing the accuracy and coverage of two published algorithms for finding the co-specifiers of pronouns in naturally occurring texts and dialogues. We present the quantitative results of hand-simulating these algorithms, but this analysis naturally gives rise to both a qualitative evaluation and recommendations for performing such evaluations in general. We illustrate the general difficulties encountered with quantitative evaluation. These are problems with: (a) allowing for underlying assumptions, (b) determining how to handle underspecifications, and (c) evaluating the contribution of false positives and error chaining.
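For readers unfamiliar with the accuracy/coverage distinction used in such comparisons, a small sketch follows: accuracy is measured over the pronouns an algorithm actually attempts to resolve, while coverage is the fraction of all pronouns it attempts. The data structures and example values are hypothetical.

```python
def score(predictions: dict, gold: dict) -> tuple:
    """Accuracy over pronouns the algorithm attempts, and coverage over all pronouns.
    predictions maps pronoun ids to a chosen cospecifier (or None if unresolved)."""
    attempted = {p: a for p, a in predictions.items() if a is not None}
    correct = sum(1 for p, a in attempted.items() if gold.get(p) == a)
    accuracy = correct / len(attempted) if attempted else 0.0
    coverage = len(attempted) / len(gold) if gold else 0.0
    return accuracy, coverage

gold = {"p1": "the agent", "p2": "the user", "p3": "the schedule"}
pred = {"p1": "the agent", "p2": "the schedule", "p3": None}
print(score(pred, gold))   # (0.5, 0.666...)
```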
While the notion of a cooperative response has been the focus of considerable research in natural language dialogue systems, there has been little empirical work demonstrating how such responses lead to more efficient, natural, or successful dialogues. This paper presents an experimental evaluation of two alternative response strategies in TOOT, a spoken dialogue agent that allows users to access train schedules stored on the web via a telephone conversation. We compare the performance of two versions of TOOT (literal and cooperative) by having users carry out a set of tasks with each version. Using hypothesis testing methods, we show that a combination of response strategy, application task, and task/strategy interactions accounts for various types of performance differences. Using the PARADISE evaluation framework to estimate an overall performance function, we identify interdependencies that exist between speech recognition and response strategy. Our results elaborate the conditions under which TOOT's cooperative rather than literal strategy contributes to greater performance.
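As a generic illustration of the hypothesis-testing step (not the paper's actual analysis, which also models task and interaction effects), the sketch below compares a performance measure across the two response strategies with a two-sample t-test on invented data.

```python
from scipy import stats

# Hypothetical per-user task completion times (seconds) under each strategy.
literal = [210, 250, 245, 300, 280, 260]
cooperative = [180, 200, 230, 220, 210, 190]

t, p = stats.ttest_ind(cooperative, literal)
print(f"t = {t:.2f}, p = {p:.3f}")   # a small p-value suggests a strategy effect on this measure
```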
This paper describes a novel method by which a dialogue agent can learn to choose an optimal dialogue strategy. While it is widely agreed that dialogue strategies should be formulated in terms of communicative intentions, there has been little work on automatically optimizing an agent's choices when there are multiple ways to realize a communicative intention. Our method is based on a combination of learning algorithms and empirical evaluation techniques. The learning component of our method is based on algorithms for reinforcement learning, such as dynamic programming and Q-learning. The empirical component uses the PARADISE evaluation framework to identify the important performance factors and to provide the performance function needed by the learning algorithm. We illustrate our method with a dialogue agent named ELVIS (EmaiL Voice Interactive System), which supports access to email over the phone. We show how ELVIS can learn to choose among alternate strategies for agent initiative, for reading messages, and for summarizing email folders.
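Here is a minimal sketch of the reinforcement learning component, using tabular Q-learning over a toy two-state email dialogue. The states, actions, simulator, and rewards are invented for illustration; in the ELVIS setting the reward would come from an empirically derived performance function rather than a hand-coded simulator.

```python
import random
from collections import defaultdict

STATES = ["greeting", "reading_mail", "done"]
ACTIONS = ["system_initiative", "mixed_initiative"]

def simulate(state, action):
    """Toy environment: pretend mixed initiative works slightly better when reading mail."""
    if state == "greeting":
        return "reading_mail", 0.0, False
    reward = 1.0 if action == "mixed_initiative" else 0.5
    return "done", reward, True

def q_learn(episodes=2000, alpha=0.1, gamma=0.9, eps=0.1):
    """Tabular Q-learning: epsilon-greedy action choice plus one-step TD updates."""
    Q = defaultdict(float)
    for _ in range(episodes):
        state, done = "greeting", False
        while not done:
            action = (random.choice(ACTIONS) if random.random() < eps
                      else max(ACTIONS, key=lambda a: Q[(state, a)]))
            nxt, reward, done = simulate(state, action)
            best_next = max(Q[(nxt, a)] for a in ACTIONS)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = nxt
    return Q

Q = q_learn()
print(max(ACTIONS, key=lambda a: Q[("reading_mail", a)]))  # learned strategy for that state
```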
Conversation between two people is usually of MIXED-INITIATIVE, with CONTROL over the conversation being transferred from one person to another. We apply a set of rules for the transfer of control to four sets of dialogues consisting of a total of 1862 turns. The application of the control rules lets us derive domain-independent discourse structures. The derived structures indicate that initiative plays a role in the structuring of discourse. In order to explore the relationship of control and initiative to discourse processes like centering, we analyze the distribution of four different classes of anaphora for two data sets. This distribution indicates that some control segments are hierarchically related to others. The analysis suggests that discourse participants often mutually agree to a change of topic. We also compared initiative in Task Oriented and Advice Giving dialogues and found that both the allocation of control and the manner in which control is transferred are radically different for the two dialogue types. These differences can be explained in terms of collaborative planning principles.
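The sketch below shows the general shape of utterance-type-based control assignment (assertions, commands, questions, and prompts). The exact rule set here is illustrative rather than a quotation of the rules applied in the paper.

```python
# Illustrative control-transfer rules keyed on utterance type; the precise
# conditions used in the paper may differ from this simplified version.
def control(utterance_type: str, speaker: str, hearer: str, is_response: bool) -> str:
    """Return which participant holds control after this utterance."""
    if utterance_type == "prompt":                 # e.g. "uh-huh" -- the speaker abdicates
        return hearer
    if utterance_type in ("assertion", "question") and is_response:
        return hearer                              # responses do not seize control
    return speaker                                 # commands, new assertions, new questions

turns = [
    ("A", "question", False),   # A asks -> A takes control
    ("B", "assertion", True),   # B answers -> control stays with A
    ("B", "question", False),   # B asks a new question -> control passes to B
    ("A", "prompt", False),     # A: "mm-hmm" -> control stays with B
]
for speaker, utype, is_resp in turns:
    hearer = "B" if speaker == "A" else "A"
    print(f"{speaker} ({utype}): control -> {control(utype, speaker, hearer, is_resp)}")
```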
This paper presents PARADISE (PARAdigm for DIalogue System Evaluation), a general framework for evaluating spoken dialogue agents. The framework decouples task requirements from an agent's dialogue behaviors, supports comparisons among dialogue strategies, enables the calculation of performance over subdialogues and whole dialogues, specifies the relative contribution of various factors to performance, and makes it possible to compare agents performing different tasks by normalizing for task complexity.
Systems in which human users speak to a computer in order to achieve a goal are called spoken dialogue systems. Such systems are some of the few realized examples of open-ended, real-time, goal-oriented interaction between humans and computers, and are therefore an ...
This paper describes the application of the PARADISE evaluation framework to the corpus of 662 human-computer dialogues collected in the June 2000 DARPA Communicator data collection. We describe results based on the standard logfile metrics as well as results based on additional qualitative metrics derived using the DATE dialogue act tagging scheme. We show that performance models derived using the standard metrics can account for 37% of the variance in user satisfaction, and that the addition of DATE metrics improves the models by an absolute 5%.
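The modelling step itself is ordinary multivariate linear regression of user satisfaction on per-dialogue metrics, with R² reporting the variance accounted for. The sketch below uses invented data and feature names purely to show the shape of the computation; the Communicator study used 662 dialogues and metrics such as task completion and DATE dialogue-act counts.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical per-dialogue metrics: task_completion, system_turns, word_error_rate.
X = np.array([
    [1, 25, 0.10],
    [0, 40, 0.35],
    [1, 30, 0.15],
    [0, 55, 0.40],
    [1, 20, 0.05],
])
y = np.array([4.5, 2.0, 4.0, 1.5, 5.0])   # user satisfaction ratings

model = LinearRegression().fit(X, y)
print("R^2:", model.score(X, y))          # variance in satisfaction accounted for
print("weights:", model.coef_)            # relative contribution of each metric
```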
The objective of the DARPA Communicator project is to support rapid, cost-effective development of multi-modal speech-enabled dialog systems with advanced conversational capabilities. In order to make this a reality, it is important to be able to evaluate the contribution of various ...
Computing Research Repository, 1995
Conversation between two people is usually of MIXED-INITIATIVE, with CONTROL over the conversation being transferred from one person to another. We apply a set of rules for the transfer of control to four sets of dialogues consisting of a total of 1862 turns. The application of the control rules lets us derive domain-independent discourse structures. The derived structures indicate that initiative plays a role in the structuring of discourse. In order to explore the relationship of control and initiative to discourse processes like centering, we analyze the distribution of four different classes of anaphora for two data sets. This distribution indicates that some control segments are hierarchically related to others. The analysis suggests that discourse participants often mutually agree to a change of topic. We also compared initiative in Task Oriented and Advice Giving dialogues and found that both the allocation of control and the manner in which control is transferred are radically different for the two dialogue types. These differences can be explained in terms of collaborative planning principles.
Mobile interfaces need to allow the user and system to adapt their choice of communication modes according to user preferences, the task at hand, and the physical and social environment. We describe a multimodal application architecture which combines finite-state multimodal language processing, a speech-act based multimodal dialogue manager, dynamic multimodal output generation, and user-tailored text planning to enable rapid prototyping of multimodal interfaces with flexible input and adaptive output. Our testbed application MATCH (Multimodal Access To City Help) provides a mobile multimodal speech-pen interface to restaurant and subway information for New York City.
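MATCH combines speech and gesture with finite-state methods over input lattices; the sketch below is a drastically simplified stand-in that only shows the intuition of multimodal fusion, resolving a deictic spoken request against the entities selected by a pen gesture. All structures and values are invented for the example.

```python
# Simplified multimodal fusion: combine a spoken command containing a deictic
# reference ("these") with the entities a pen gesture selected on the map.
def fuse(speech: dict, gesture: dict) -> dict:
    interpretation = dict(speech)
    if speech.get("object") == "deictic" and gesture.get("selection"):
        interpretation["object"] = gesture["selection"]   # resolve "these" to the circled items
    return interpretation

speech = {"act": "request", "predicate": "show_phone_numbers", "object": "deictic"}
gesture = {"type": "area", "selection": ["Carmine's", "Via Quadronno"]}
print(fuse(speech, gesture))
# {'act': 'request', 'predicate': 'show_phone_numbers', 'object': ["Carmine's", 'Via Quadronno']}
```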
Computing Research Repository, 1997
This paper introduces Linguistic Style Improvisation, a theory and set of algorithms for improvisation of spoken utterances by artificial agents, with applications to interactive story and dialogue systems. We argue that linguistic style is a key aspect of character, and show how speech act representations common in AI can provide abstract representations from which computer characters can improvise. We show that the mechanisms proposed introduce the possibility of socially oriented agents, meet the requirements that lifelike characters be believable, and satisfy particular criteria for improvisation proposed by Hayes-Roth.
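As a toy illustration of improvising surface form from an abstract speech act, the sketch below realizes the same request under different style settings. The parameters and templates are invented for the example and are not the paper's algorithms.

```python
# Hypothetical style-parameterized realization of one speech act: the same
# abstract request surfaces differently depending on directness and formality.
def realize(speech_act: dict, directness: float, formality: float) -> str:
    if speech_act["act"] != "request":
        raise ValueError("only requests are handled in this sketch")
    item = speech_act["item"]
    if directness > 0.7:
        return f"Please hand me the {item}." if formality > 0.5 else f"Give me the {item}."
    return (f"Could you possibly pass the {item}?" if formality > 0.5
            else f"Any chance I could get the {item}?")

act = {"act": "request", "item": "map"}
for d, f in [(0.9, 0.2), (0.9, 0.8), (0.3, 0.8), (0.3, 0.2)]:
    print(realize(act, d, f))
```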