GitHub - mawentao277/PR-Embedding: Conversational Word Embedding for Retrieval-based Dialog System (ACL2020)

This repository contains resources for the following ACL 2020 paper.

Title: Conversational Word Embedding for Retrieval-based Dialog System
Authors: Wentao Ma, Yiming Cui, Ting Liu, Dong Wang, Shijin Wang, Guoping Hu
Link: https://www.aclweb.org/anthology/2020.acl-main.127/

News

2020/7/16: We have released the code. The data for the PersonaChat example is in data_for_personachat (password vzoQ). Clone the code and download the data to give it a try ^^
2020/7/6: We have uploaded the Chinese PR-Embeddings based on Zhidao (password g3FK) and Weibo (password Yz6H): Zhidao Embedding | Weibo Embedding. The Zhidao Embedding is the one used in the experiments section of the paper.

Requirements

gcc 4.8.5 (or >= 4.8.5)
Python 3.6
Keras 2.1.2 (or >= 2.1)
TensorFlow 1.12.0 (or >= 1.12)
(We ran the code with gcc 4.8.5 + Python 3.6 + Keras 2.1.2 + TensorFlow 1.12.0.)

Quick Start

  1. Prepare the corpus: each line has two columns corresponding to a conversation pair, for example:
    Hi , how are you \t I am fine , Thank you !
  2. Train your PR-Embedding:
    sh train.sh $corpus_file $embedding_file
    $corpus_file is the corpus file from step 1, and $embedding_file is the output PR-Embedding.
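
The corpus layout from step 1 can be produced and sanity-checked with a short script before training. This is a minimal sketch, not part of the repository: the function names `write_corpus` and `check_corpus` are hypothetical, and it assumes the two columns are separated by a single tab and that the text is already tokenized with spaces, as in the example line above.

```python
def write_corpus(pairs, path):
    """Write (post, reply) pairs as one tab-separated line per pair."""
    with open(path, "w", encoding="utf-8") as f:
        for post, reply in pairs:
            # Collapse tabs/newlines inside a text so each line keeps
            # exactly two columns (assumption: one tab separates them).
            post = " ".join(post.split())
            reply = " ".join(reply.split())
            f.write(f"{post}\t{reply}\n")


def check_corpus(path):
    """Return 1-based numbers of lines without exactly two non-empty columns."""
    bad = []
    with open(path, encoding="utf-8") as f:
        for n, line in enumerate(f, 1):
            cols = line.rstrip("\n").split("\t")
            if len(cols) != 2 or not all(c.strip() for c in cols):
                bad.append(n)
    return bad


if __name__ == "__main__":
    # "corpus.txt" is a placeholder file name for this illustration.
    write_corpus([("Hi , how are you", "I am fine , Thank you !")], "corpus.txt")
    print(check_corpus("corpus.txt"))
```

If `check_corpus` returns an empty list, the file can be passed to `train.sh` as `$corpus_file`.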

Citation

If you use the data or code in this repository, please cite our paper:

@inproceedings{ma-etal-2020-conversational,
    title = "{C}onversational {W}ord {E}mbedding for {R}etrieval-{B}ased {D}ialog {S}ystem",
    author = "Ma, Wentao  and
      Cui, Yiming  and
      Liu, Ting  and
      Wang, Dong  and
      Wang, Shijin  and
      Hu, Guoping",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.acl-main.127",
    pages = "1375--1380",
}