mawentao277/PR-Embedding: Conversational Word Embedding for Retrieval-based Dialog System (ACL 2020)
This repository contains resources for the following ACL 2020 paper.
Title: Conversational Word Embedding for Retrieval-based Dialog System
Authors: Wentao Ma, Yiming Cui, Ting Liu, Dong Wang, Shijin Wang, Guoping Hu
Link: https://www.aclweb.org/anthology/2020.acl-main.127/
News
2020/7/16 We have released the code. The data for the PersonaChat example is in data_for_personachat (password vzoQ). Clone the code and download the data to try it out.
2020/7/6 We have uploaded the Chinese PR-Embeddings trained on Zhidao (password g3FK) and Weibo (password Yz6H): Zhidao Embedding | Weibo Embedding. The Zhidao Embedding is the one used in the experiments in the paper.
Requirements
gcc4.8.5 (or >=4.8.5)
Python3.6
Keras2.1.2 (or >=2.1)
Tensorflow1.12.0 (or >=1.12)
(We ran the code with gcc4.8.5 + Python3.6 + Keras2.1.2 + Tensorflow1.12.0)
Quick Start
- Prepare the corpus: each line has two columns corresponding to a conversation pair, for example:
Hi , how are you \t I am fine , Thank you !
- Train your PR-Embedding:
sh train.sh $corpus_file $embedding_file
$corpus_file is the corpus file from step 1 and $embedding_file is the output PR-Embedding.
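The corpus format from step 1 can be produced and sanity-checked with a small helper. The following is only a sketch, not part of this repository: the function names `write_corpus` and `check_corpus` are hypothetical, and it assumes the text is already tokenized (space-separated, as in the example above) and that the file is UTF-8 encoded.

```python
def write_corpus(pairs, path):
    """Write (post, reply) pairs to `path`, one tab-separated pair per line.

    Hypothetical helper, not part of the PR-Embedding code.
    Returns the number of pairs written.
    """
    written = 0
    with open(path, "w", encoding="utf-8") as f:
        for post, reply in pairs:
            # Collapse all whitespace so a stray tab or newline inside
            # an utterance cannot create an extra column or line.
            post = " ".join(post.split())
            reply = " ".join(reply.split())
            if not post or not reply:
                continue  # skip pairs with an empty side
            f.write(f"{post}\t{reply}\n")
            written += 1
    return written


def check_corpus(path):
    """Return the 1-based line numbers that do not have exactly two non-empty columns."""
    bad = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            cols = line.rstrip("\n").split("\t")
            if len(cols) != 2 or not all(c.strip() for c in cols):
                bad.append(lineno)
    return bad
```

For example, `write_corpus([("Hi , how are you", "I am fine , Thank you !")], "corpus.txt")` writes one line in the two-column format, and the resulting file can then be passed as $corpus_file to train.sh. The file name corpus.txt is a placeholder.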
Citation
If you use the data or code in this repository, please cite our paper:
@inproceedings{ma-etal-2020-conversational,
title = "{C}onversational {W}ord {E}mbedding for {R}etrieval-{B}ased {D}ialog {S}ystem",
author = "Ma, Wentao and
Cui, Yiming and
Liu, Ting and
Wang, Dong and
Wang, Shijin and
Hu, Guoping",
booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
month = jul,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.acl-main.127",
pages = "1375--1380",
}