Data-Driven Dialogue Systems for Social Agents

Developing a Twitter bot that can join a discussion using state-of-the-art architectures

Social Network Analysis and Mining

118 pages. Twitter is today mostly used for sharing and commenting on news [1]. Interaction between Twitter users is therefore inevitable, and it sometimes leads people to move daily debates to this social platform. Since being dominant in these debates is crucial, automating this process has become highly popular [2]. In this work, we aim to train a bot that classifies posted tweets according to their semantics and generates logical tweets about a popular discussion, which for this study is the U.S. gun debate. Bots are trained to tweet independently on their side of the debate and also to reply to a tweet from the opposite view. State-of-the-art architectures are tested to obtain more accurate classification. We apply the GloVe embedding model to represent tweets. Instead of using handcrafted features, a long short-term memory (LSTM) neural network is applied to these embeddings to obtain more informative, equal-size feature vectors. This model is trained to encode a tweet that is fed to it as a sequence of embeddings. The encoding is used for both the classification and generation tasks. An LSTM sequence-to-sequence model is used to generate tweets and replies to tweets. An attention mechanism is added to the reply model to produce more related tweets. We propose a new metric for measuring the relatedness of a reply to the target tweet. Additionally, human evaluators measure the quality of generated tweets according to their relatedness to the topic and to the target tweet being replied to.
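The abstract does not say which attention variant the reply model uses. As a rough illustration only, the sketch below assumes plain dot-product attention over encoder states; the function names (`attend`, `softmax`, `dot`) and the toy vectors are hypothetical and are not taken from the work itself.

```python
import math
from typing import List, Tuple

Vector = List[float]

def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state: Vector, encoder_states: List[Vector]) -> Tuple[List[float], Vector]:
    """Assumed dot-product attention (the source does not name the variant).

    Scores each encoder state against the current decoder state, normalises
    the scores into weights, and returns the weights together with the
    weighted sum of encoder states (the context vector).
    """
    scores = [dot(decoder_state, h) for h in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states)) for i in range(dim)]
    return weights, context

# Toy example: three encoder states for a target tweet, one decoder state.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
weights, context = attend([2.0, 0.0], encoder_states)
```

In a reply model of this kind, the context vector would typically be combined with the decoder state at each step so that the generated reply stays close to the target tweet.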

Edina: Building an Open Domain Socialbot with Self-dialogues

ArXiv, 2017

We present Edina, the University of Edinburgh's social bot for the Amazon Alexa Prize competition. Edina is a conversational agent whose responses utilize data harvested from Amazon Mechanical Turk (AMT) through an innovative new technique we call self-dialogues. These are conversations in which a single AMT Worker plays both participants in a dialogue. Such dialogues are surprisingly natural, efficient to collect and reflective of relevant and/or trending topics. These self-dialogues provide training data for a generative neural network as well as a basis for soft rules used by a matching score component. Each match of a soft rule against a user utterance is associated with a confidence score which we show is strongly indicative of reply quality, allowing this component to self-censor and be effectively integrated with other components. Edina's full architecture features a rule-based system backing off to a matching score, backing off to a generative neural network. Our hyb...
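The back-off order described in this abstract (rules, then matching score, then generative model) can be sketched roughly as follows. This is only an illustration under stated assumptions: the three component functions, the Jaccard word-overlap score, the stored prompt, and the 0.5 threshold are all hypothetical stand-ins, not Edina's actual components.

```python
from typing import Callable, List, Optional, Tuple

# A component returns (reply, confidence), or None when it has nothing to offer.
Reply = Optional[Tuple[str, float]]

def rule_based(utterance: str) -> Reply:
    # Hypothetical hand-written rule: fires only on an exact greeting.
    if utterance.lower().strip() in {"hi", "hello"}:
        return ("Hello! What would you like to talk about?", 1.0)
    return None

def matching_score(utterance: str) -> Reply:
    # Hypothetical soft rule: Jaccard word overlap with a stored dialogue turn.
    stored = {"do you like movies": "I love movies. Have you seen anything good lately?"}
    words = set(utterance.lower().split())
    best: Reply = None
    for prompt, reply in stored.items():
        prompt_words = set(prompt.split())
        score = len(words & prompt_words) / len(words | prompt_words) if words else 0.0
        if best is None or score > best[1]:
            best = (reply, score)
    return best

def generative(utterance: str) -> Reply:
    # Stand-in for a neural generator: always produces something.
    return ("That's interesting, tell me more.", 0.0)

def respond(utterance: str, threshold: float = 0.5) -> Tuple[str, str]:
    """Try each component in order; a component below its confidence floor
    'self-censors' and the cascade backs off to the next one."""
    cascade: List[Tuple[str, Callable[[str], Reply], float]] = [
        ("rules", rule_based, 0.0),
        ("match", matching_score, threshold),
    ]
    for name, component, min_conf in cascade:
        result = component(utterance)
        if result is not None and result[1] >= min_conf:
            return name, result[0]
    return "generative", generative(utterance)[0]
```

For instance, `respond("hello")` is answered by the rule component, `respond("do you like movies")` by the matching component, and an utterance with no overlap falls through to the generative stand-in.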

Further Advances in Open Domain Dialog Systems in the Third Alexa Prize Socialbot Grand Challenge

2020

Building open domain conversational systems that allow users to have engaging conversations on topics of their choice is a challenging task. The Alexa Prize Socialbot Grand Challenge was launched in 2016 to tackle the problem of achieving natural, sustained, coherent and engaging open-domain dialogs. In the third iteration of the competition, university teams have moved the needle on the state of the art, bringing together common sense knowledge representations, neural response generation models, NLU systems enhanced by large-scale transformer models and improved dialog policies to switch between graph-based representations or retrieval-based or templated dialog fragments, along with generated responses. The Third Socialbot Grand Challenge included an improved version of the CoBot (conversational bot) toolkit from the prior competition, along with topic and dialog act detection models, conversation evaluators, and a sensitive content detection model so that the competing teams could...

Gunrock: A Social Bot for Complex and Engaging Long Conversations

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations, 2019

Gunrock is the winner of the 2018 Amazon Alexa Prize, as evaluated by coherence and engagement from both real users and Amazon-selected expert conversationalists. We focus on understanding complex sentences and having in-depth conversations in open domains. In this paper, we introduce some innovative system designs and related validation analysis. Overall, we found that users produce longer sentences to Gunrock, which are directly related to users' engagement (e.g., ratings, number of turns). Additionally, users' backstory queries about Gunrock are positively correlated to user satisfaction. Finally, we found dialog flows that interleave facts and personal opinions and stories lead to better user satisfaction.

The Day a System Becomes a Conversation Partner—Exploring New Horizons in Social Dialogue Systems with Large-scale Deep Learning

NTT Technical Review, 2021

People live their lives by casually talking with others on a daily basis. Such "social" dialogue contributes to building trust among people and satisfying their desire to talk with others. There has been a growing interest in social dialogue systems to satisfy the human desire for chatting with others, and we have been working on a wide range of research projects to develop such systems. With the rapid progress in deep learning, high-performance social dialogue systems using deep learning have been proposed. In this article, we introduce NTT's social dialogue system using the latest deep-learning models as well as the current achievements obtained and challenges with this system.

Neural Generation Meets Real People: Building a Social, Informative Open-Domain Dialogue Agent

2022

We present Chirpy Cardinal, an open-domain social chatbot. Aiming to be both informative and conversational, our bot chats with users in an authentic, emotionally intelligent way. By integrating controlled neural generation with scaffolded, handwritten dialogue, we let both the user and bot take turns driving the conversation, producing an engaging and socially fluent experience. Deployed in the fourth iteration of the Alexa Prize Socialbot Grand Challenge, Chirpy Cardinal handled thousands of conversations per day, placing second out of nine bots with an average user rating of 3.58/5.

Viola: A Topic Agnostic Generate-and-Rank Dialogue System

Cornell University - arXiv, 2021

We present Viola, an open-domain dialogue system for spoken conversation that uses a topic-agnostic dialogue manager based on a simple generate-and-rank approach. Leveraging recent advances of generative dialogue systems powered by large language models, Viola fetches a batch of response candidates from various neural dialogue models trained with different datasets and knowledge-grounding inputs. Additional responses originating from template-based generators are also considered, depending on the user's input and detected entities. The hand-crafted generators build on a dynamic knowledge graph injected with rich content that is crawled from the web and automatically processed on a daily basis. Viola's response ranker is a fine-tuned polyencoder that chooses the best response given the dialogue history. While dedicated annotations for the polyencoder alone can indirectly steer it away from choosing problematic responses, we add rule-based safety nets to detect neural degeneration and a dedicated classifier to filter out offensive content. We analyze conversations that Viola took part in for the Alexa Prize Socialbot Grand Challenge 4 and discuss the strengths and weaknesses of our approach. Lastly, we suggest future work with a focus on curating conversation data specifically for socialbots that will contribute towards a more robust data-driven socialbot. 4th Proceedings of Alexa Prize (Alexa Prize 2020).
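The generate-and-rank flow described above (collect candidates, filter unsafe ones, rank the rest) might be sketched as below. Everything here is an assumption made for illustration: the word-overlap scorer merely stands in for the fine-tuned polyencoder, the blocklist stands in for the offensive-content classifier, and the repetition rule is one possible degeneration check, none of which is Viola's actual implementation.

```python
from typing import Callable, List

Generator = Callable[[List[str]], str]

def is_degenerate(text: str, max_repeat: int = 3) -> bool:
    # Hypothetical rule: the same token repeated max_repeat times in a row.
    tokens = text.lower().split()
    run = 1
    for prev, cur in zip(tokens, tokens[1:]):
        run = run + 1 if cur == prev else 1
        if run >= max_repeat:
            return True
    return False

def is_offensive(text: str, blocklist: frozenset = frozenset({"stupid"})) -> bool:
    # Toy blocklist standing in for a trained offensive-content classifier.
    return any(tok.strip(".,!?") in blocklist for tok in text.lower().split())

def overlap_score(history: List[str], candidate: str) -> float:
    # Stand-in for a learned ranker: share of candidate words seen in the history.
    context = set(" ".join(history).lower().split())
    cand = set(candidate.lower().split())
    return len(context & cand) / len(cand) if cand else 0.0

def generate_and_rank(history: List[str], generators: List[Generator],
                      fallback: str = "Sorry, could you say that again?") -> str:
    """Gather one candidate per generator, drop unsafe ones, return the top-ranked."""
    candidates = [g(history) for g in generators]
    safe = [c for c in candidates if c and not is_degenerate(c) and not is_offensive(c)]
    if not safe:
        return fallback
    return max(safe, key=lambda c: overlap_score(history, c))

# Toy generators standing in for neural and template-based response sources.
history = ["I went hiking in the mountains"]
generators: List[Generator] = [
    lambda h: "the the the the",
    lambda h: "You are stupid",
    lambda h: "Which mountains did you visit",
    lambda h: "Nice weather today",
]
best = generate_and_rank(history, generators)
```

Because the dialogue manager only sees a flat list of candidates, new generators can be added without changing the ranking step, which is the sense in which such a design is topic-agnostic.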

Local Knowledge Powered Conversational Agents

ArXiv, 2020

State-of-the-art conversational agents have advanced significantly in conjunction with the use of large transformer-based language models. However, even with these advancements, conversational agents still lack the ability to produce responses that are informative and coherent with the local context. In this work, we propose a dialog framework that incorporates both local knowledge as well as users' past dialogues to generate high quality conversations. We introduce an approach to build a dataset based on Reddit conversations, where outbound URL links are widely available in the conversations and the hyperlinked documents can be naturally included as local external knowledge. Using our framework and dataset, we demonstrate that incorporating local knowledge can largely improve informativeness, coherency and realisticness measures using human evaluations. In particular, our approach consistently outperforms the state-of-the-art conversational model on the Reddit dataset across al...

Alquist: The Alexa Prize Socialbot

ArXiv, 2018

This paper describes a new open domain dialogue system Alquist developed as part of the Alexa Prize competition for the Amazon Echo line of products. The Alquist dialogue system is designed to conduct a coherent and engaging conversation on popular topics. We are presenting a hybrid system combining several machine learning and rule based approaches. We discuss and describe the Alquist pipeline, data acquisition, and processing, dialogue manager, NLG, knowledge aggregation and hierarchy of sub-dialogs. We present some of the experimental results.

End-to-End Dialogue with Sentiment Analysis Features

2017

Psychiatric assistance for suicide prevention does not have a wide enough reach to help the number of victims who commit suicide every year. To help people cope with suicidal thoughts when formal care is unavailable, we propose an artificial intelligence, text-based conversational agent that generates responses similar to those of a counselor. The application will offer a temporary channel for expression that serves as a transition to speaking with a professional psychiatrist. We expand upon existing approaches by utilizing sentiment analysis data, or scores that rank the emotional content of users’ text input, when generating responses. We also train a response generation system based on a dataset of counseling and therapy transcripts. We posit that inclusion of sentiment analysis data provides marginally better responses based on quantitative metrics of quality. We hope our results will advance realistic conversation modeling and promote further research into its humanitarian appl...