XDAI: A Tuning-free Framework for Exploiting Pre-trained Language Models in Knowledge Grounded Dialogue Generation (original) (raw)

Knowledge-Grounded Dialogue Generation with a Unified Knowledge Representation

Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Knowledge-grounded dialogue systems are challenging to build due to the lack of training data and heterogeneous knowledge sources. Existing systems perform poorly on unseen topics due to limited topics covered in the training data. In addition, heterogeneous knowledge sources make it challenging for systems to generalize to other tasks because knowledge sources in different knowledge representations require different knowledge encoders. To address these challenges, we present PLUG 1 , a language model that homogenizes different knowledge sources to a unified knowledge representation for knowledge-grounded dialogue generation tasks. PLUG is pre-trained on a dialogue generation task conditioned on a unified essential knowledge representation. It can generalize to different downstream knowledgegrounded dialogue generation tasks with a few training examples. The empirical evaluation on two benchmarks shows that our model generalizes well across different knowledgegrounded tasks. It can achieve comparable performance with state-of-the-art methods under a fully-supervised setting and significantly outperforms other methods in zero-shot and fewshot settings. * Work was done when Yu Li was interning at MSR 1 Pre-trained Language model with a Unified knowledge representation for knowledge-Grounded dialogues. Dataset Knowledge % Topics Open-domain Wizard of Wikipedia articles 0.02% CMU_DoG articles 0.04% Recommendation REDIAL tables 15.0% OPENDIALKG graph 7.5%

InstructTODS: Large Language Models for End-to-End Task-Oriented Dialogue Systems

arXiv (Cornell University), 2023

Large language models (LLMs) have been used for diverse tasks in natural language processing (NLP), yet remain under-explored for task-oriented dialogue systems (TODS), especially for end-to-end TODS. We present In-structTODS, a novel off-the-shelf framework for zero-shot end-to-end task-oriented dialogue systems that can adapt to diverse domains without fine-tuning. By leveraging LLMs, Instruct-TODS generates a proxy belief state that seamlessly translates user intentions into dynamic queries for efficient interaction with any KB. Our extensive experiments demonstrate that In-structTODS achieves comparable performance to fully fine-tuned TODS in guiding dialogues to successful completion without prior knowledge or task-specific data. Furthermore, a rigorous human evaluation of end-to-end TODS shows that InstructTODS produces dialogue responses that notably outperform both the gold responses and the state-of-the-art TODS in terms of helpfulness, informativeness, and humanness. Moreover, the effectiveness of LLMs in TODS is further supported by our comprehensive evaluations on TODS subtasks: dialogue state tracking, intent classification, and response generation. Code and implementations could be found here 1 .

AuGPT: Dialogue with Pre-trained Language Models and Data Augmentation

ArXiv, 2021

Attention-based pre-trained language models such as GPT-2 brought considerable progress to end-to-end dialogue modelling. However, they also present considerable risks for taskoriented dialogue, such as lack of knowledge grounding or diversity. To address these issues, we introduce modified training objectives for language model finetuning, and we employ massive data augmentation via backtranslation to increase the diversity of the training data. We further examine the possibilities of combining data from multiples sources to improve performance on the target dataset. We carefully evaluate our contributions with both human and automatic methods. Our model achieves state-of-the-art performance on the MultiWOZ data and shows competitive performance in human evaluation.

Response Generation with Context-Aware Prompt Learning

ArXiv, 2021

Pre-trained language models (PLM) have marked a huge leap in neural dialogue modeling. While PLMs are pre-trained on largescale text corpora, they are usually fine-tuned on scarce dialogue data with specific domain knowledge and dialogue styles. However, tailoring the language models while fully utilizing prior knowledge in large pre-trained models remains a challenge. In this paper, we present a novel approach for pre-trained dialogue modeling that casts the dialogue generation problem as a prompt-learning task. Instead of fine-tuning on limited dialogue data, our approach, DialogPrompt, learns continuous prompt embeddings optimized for dialogue contexts, which appropriately elicit knowledge from the large pre-trained model. To encourage the model to better utilize the prompt embeddings, the prompt encoders are designed to be conditioned on the input dialogue context. Experiments on popular conversation datasets show that our approach significantly outperforms the fine-tuning basel...

Robust Conversational AI with Grounded Text Generation

ArXiv, 2020

This article presents a hybrid approach based on a Grounded Text Generation (GTG) model to building robust task bots at scale. GTG is a hybrid model which uses a large-scale Transformer neural network as its backbone, combined with symbol-manipulation modules for knowledge base inference and prior knowledge encoding, to generate responses grounded in dialog belief state and real-world knowledge for task completion. GTG is pre-trained on large amounts of raw text and human conversational data, and can be fine-tuned to complete a wide range of tasks. The hybrid approach and its variants are being developed simultaneously by multiple research teams. The primary results reported on task-oriented dialog benchmarks are very promising, demonstrating the big potential of this approach. This article provides an overview of this progress and discusses related methods and technologies that can be incorporated for building robust conversational AI systems.

Knowledge Enhanced Fine-Tuning for Better Handling Unseen Entities in Dialogue Generation

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Although pre-training models have achieved great success in dialogue generation, their performance drops dramatically when the input contains an entity that does not appear in pretraining and fine-tuning datasets (unseen entity). To address this issue, existing methods leverage an external knowledge base to generate appropriate responses. In real-world scenario, the entity may not be included by the knowledge base or suffer from the precision of knowledge retrieval. To deal with this problem, instead of introducing knowledge base as the input, we force the model to learn a better semantic representation by predicting the information in the knowledge base, only based on the input context. Specifically, with the help of a knowledge base, we introduce two auxiliary training objectives: 1) Interpret Masked Word, which conjectures the meaning of the masked entity given the context; 2) Hypernym Generation, which predicts the hypernym of the entity based on the context. Experiment results on two dialogue corpus verify the effectiveness of our methods under both knowledge available and unavailable settings. * Contribution during internship at MSRA. (a) Non-knowledge dialogue generation. (b) Knowledge grounded dialogue generation. Note that the knowledge of "COVID-19" can not be retrieved from the knowledge base, because it is a new term. (c) The proposed knowledge enhanced dialogue generation.

GODEL: Large-Scale Pre-Training for Goal-Directed Dialog

arXiv (Cornell University), 2022

We introduce GODEL (Grounded Open Dialogue Language Model), a large pretrained language model for dialog. In contrast with earlier models such as DialoGPT, GODEL leverages a new phase of grounded pre-training designed to better support adapting GODEL to a wide range of downstream dialog tasks that require information external to the current conversation (e.g., a database or document) to produce good responses. Experiments against an array of benchmarks that encompass task-oriented dialog, conversational QA, and grounded open-domain dialog show that GODEL outperforms state-of-the-art pre-trained dialog models in few-shot finetuning setups, in terms of both human and automatic evaluation. A novel feature of our evaluation methodology is the introduction of a notion of utility that assesses the usefulness of responses (extrinsic evaluation) in addition to their communicative features (intrinsic evaluation). We show that extrinsic evaluation offers improved inter-annotator agreement and correlation with automated metrics. Code and data processing scripts are publicly available. 1

Neural, Neural Everywhere: Controlled Generation Meets Scaffolded, Structured Dialogue∗

2021

In this paper, we present the second iteration of Chirpy Cardinal, an open-domain dialogue agent developed for the Alexa Prize SGC4 competition. Building on the success of the SGC3 Chirpy, we focus on improving conversational flexibility, initiative, and coherence. We introduce a variety of methods for controllable neural generation, ranging from prefix-based neural decoding over a symbolic scaffolding, to pure neural modules, to a novel hybrid infilling-based method that combines the best of both worlds. Additionally, we enhance previous news, music and movies modules with new APIs, as well as make major improvements in entity linking, topical transitions, and latency. Finally, we expand the variety of responses via new modules that focus on personal issues, sports, food, and even extraterrestrial life! These components come together to create a refreshed Chirpy Cardinal that is able to initiate conversations filled with interesting facts, engaging topics, and heartfelt responses.

Viola: A Topic Agnostic Generate-and-Rank Dialogue System

Cornell University - arXiv, 2021

We present Viola, an open-domain dialogue system for spoken conversation that uses a topic-agnostic dialogue manager based on a simple generate-and-rank approach. Leveraging recent advances of generative dialogue systems powered by large language models, Viola fetches a batch of response candidates from various neural dialogue models trained with different datasets and knowledge-grounding inputs. Additional responses originating from template-based generators are also considered, depending on the user's input and detected entities. The hand-crafted generators build on a dynamic knowledge graph injected with rich content that is crawled from the web and automatically processed on a daily basis. Viola's response ranker is a fine-tuned polyencoder that chooses the best response given the dialogue history. While dedicated annotations for the polyencoder alone can indirectly steer it away from choosing problematic responses, we add rule-based safety nets to detect neural degeneration and a dedicated classifier to filter out offensive content. We analyze conversations that Viola took part in for the Alexa Prize Socialbot Grand Challenge 4 and discuss the strengths and weaknesses of our approach. Lastly, we suggest future work with a focus on curating conversation data specifcially for socialbots that will contribute towards a more robust data-driven socialbot. 4th Proceedings of Alexa Prize (Alexa Prize 2020).

AuGPT: Auxiliary Tasks and Data Augmentation for End-To-End Dialogue with Pre-Trained Language Models

Proceedings of the 3rd Workshop on Natural Language Processing for Conversational AI, 2021

Attention-based pre-trained language models such as GPT-2 brought considerable progress to end-to-end dialogue modelling. However, they also present considerable risks for taskoriented dialogue, such as lack of knowledge grounding or diversity. To address these issues, we introduce modified training objectives for language model finetuning, and we employ massive data augmentation via backtranslation to increase the diversity of the training data. We further examine the possibilities of combining data from multiples sources to improve performance on the target dataset. We carefully evaluate our contributions with both human and automatic methods. Our model substantially outperforms the baseline on the MultiWOZ data and shows competitive performance with state of the art in both automatic and human evaluation.