GitHub - Skytliang/Multi-Agents-Debate: MAD: The first work to explore Multi-Agent Debate with Large Language Models :D

🔥 This work aims to explore the debating capability of LLMs by proposing the MAD framework, which stands for Multi-Agents Debate.

"Truth emerges from the clash of adverse ideas."
"真理越辩越明。"

Brief Introduction

The cognitive behavior of large language models (LLMs) has attracted significant attention recently. For example, self-reflection, a concept that usually refers to a person's introspection and examination of their own thoughts, has also proved effective for LLMs on challenging NLP tasks. However, we point out that self-reflection can easily suffer from the degeneration-of-thought (DoT) issue, for instance when an agent's thinking is distorted, when it resists changing its mind, or when it receives no external feedback.

Figure 1: Comparison between debate and reflection.

In this project, we explore the potential of a debating interaction framework among LLMs. With MAD, the agents are in a state of 'tit for tat', which means that (1) the distorted thinking of one agent can be corrected by the other 😀; (2) the resistance to change of one agent is offset by the other 😄; and (3) each agent can provide external feedback to the other 😆.

As a result, MAD is less likely to suffer from the DoT issue and can exploit more of the potential of LLMs. Experiments show that MAD brings significant and consistent improvements on Counterintuitive QA and Commonsense-MT tasks.

JOIN US on this journey of exploring the interaction and debating capability of LLMs. 🚀🚀🚀

Framework

Figure 2: Framework of Multi-Agent Debate. Here we designate the devil as the affirmative side and the angel as the negative side. We want the angel to correct the devil's mistakes.
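
The turn-taking between the two sides can be illustrated with a minimal Python sketch. This is not the repository's implementation: the names `debate`, `Agent`, `devil`, and `angel` are hypothetical, the agents are stubs standing in for LLM calls, and stopping after a fixed number of rounds is an assumption made only to keep the example short.

```python
from typing import Callable, List

# Hypothetical agent signature (not from the repository): an agent receives
# the topic and the transcript so far, and returns its next argument.
Agent = Callable[[str, List[str]], str]


def debate(topic: str, affirmative: Agent, negative: Agent, rounds: int = 2) -> List[str]:
    """Alternate turns between the two sides for a fixed number of rounds."""
    transcript: List[str] = []
    for _ in range(rounds):
        # The affirmative side (devil) speaks first in each round.
        transcript.append("affirmative: " + affirmative(topic, transcript))
        # The negative side (angel) responds with the devil's turn in view.
        transcript.append("negative: " + negative(topic, transcript))
    return transcript


# Stub agents standing in for LLM calls, so the sketch runs offline.
def devil(topic: str, transcript: List[str]) -> str:
    return "my answer stands"


def angel(topic: str, transcript: List[str]) -> str:
    return "I challenge the claim: " + transcript[-1]


for turn in debate("What is Alice's average speed?", devil, angel, rounds=1):
    print(turn)
```

Passing the shared transcript to both sides is what lets each agent act as external feedback for the other; in a real setup the stubs would be replaced by prompts sent to a model.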

Run

Preparation

pip3 install -r requirements.txt

Run MAD

Run Interactive

If you just want to try it out, you can run the interactive script on your own machine.

Or simply try our demo for translation here.

Main Results

Counterintuitive QA

Table 1: Reasoning accuracy on Counter-Intuitive AR.

Case 1

When Alice walks up the hill, her speed is 1 m/s and when she goes down the hill, her speed is 3 m/s. Then when Alice walks up and down the hill, what is her average speed? (1.5m/s)
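
The stated answer follows from dividing total distance by total time, not from averaging the two speeds (which would give the tempting but wrong 2 m/s). The short check below is a sketch that assumes Alice covers the same distance uphill and downhill; the function name `average_speed` is illustrative.

```python
from fractions import Fraction


def average_speed(up_speed, down_speed):
    """Average speed over an equal-distance round trip: total distance / total time."""
    distance = Fraction(1)  # one-way distance; any positive value cancels out
    total_time = distance / Fraction(up_speed) + distance / Fraction(down_speed)
    return 2 * distance / total_time


print(float(average_speed(1, 3)))  # 1.5
```

Exact fractions are used so the result is not affected by floating-point rounding.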

Figure 3: An Animation to Show the Process of MAD.

Debate process

Commonsense Machine Translation

Table 2: Translation performance on Common MT.

Case 1

Given the Chinese sentence "吃掉敌人一个师。", please provide its translation in English.

Case 2

Given the Chinese sentence "他从后门搞到了不少名酒。", please provide its translation in English.

Reference

Citation

@article{liang2023encouraging,
  title={Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate},
  author={Liang, Tian and He, Zhiwei and Jiao, Wenxiang and Wang, Xing and Wang, Yan and Wang, Rui and Yang, Yujiu and Tu, Zhaopeng and Shi, Shuming},
  journal={arXiv preprint arXiv:2305.19118},
  year={2023}
}