A Comparative Analysis of the Performance of Large Language Models as Debaters in Multi-Agent Debates


Authors: ZHANG Liyan, LIANG Zhijian [1] (School of Computer Science and Technology, North University of China, Taiyuan 030051, China)

Affiliation: [1] School of Computer Science and Technology, North University of China, Taiyuan, Shanxi 030051, China

Source: Journal of North University of China (Natural Science Edition), 2025, No. 2, pp. 219-229 (11 pages)

Abstract: This paper explores the potential and limitations of large language models (LLMs) in simulating human intelligence, particularly debating ability. It combines Chain-of-Thought (CoT) prompting with Retrieval-Augmented Generation (RAG) in Multi-Agent Debate (MAD) to build a multi-agent debate framework, CoRAG-MAD. The framework simulates the format of a human debate competition in four stages: opening statements, cross-examination (attack and defence), free debate, and closing statements. Three debate scenarios are designed: fair debate, unequal debate, and mixed debate. The debate content is analyzed in depth using automated evaluation tools together with review by human experts. Experiments on the OrChiD dataset show that CoRAG-MAD effectively improves several abilities of LLMs in every debate scenario. In the unequal debate, LLMs' logical reasoning score improves by up to 57.56% and their creativity score by up to 49.77%. In the mixed debate, their collaborative ability improves by up to 23.36% and their overall performance by up to 28.20%. Ablation and comparative experiments confirm three effects: the CoT module enhances logical reasoning, the RAG module improves factual accuracy and stimulates creative thinking, and the combined CoRAG approach is effective in MAD.
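The abstract does not specify the prompts, retrieval source, or turn order that CoRAG-MAD uses. As a rough illustration only, the Python sketch below shows one way a four-stage debate loop could pair a CoT-style instruction with a retrieval step before each turn. Everything in it is hypothetical and not the authors' implementation: the names `Debater`, `retrieve`, and `run_debate`, the word-overlap retriever, the prompt wording, and the alternating pro/con order. The `llm` argument is any prompt-to-text function, for example a wrapper around a model API.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# The four stages named in the abstract, in order.
STAGES = ["opening statements", "cross-examination", "free debate", "closing statements"]

# Hypothetical: any function mapping a prompt string to a reply string.
LLM = Callable[[str], str]


def retrieve(query: str, corpus: List[str], k: int = 2) -> List[str]:
    """Toy stand-in for RAG: rank passages by word overlap with the query."""
    q = set(query.lower().split())
    ranked = sorted(corpus, key=lambda p: len(q & set(p.lower().split())), reverse=True)
    return ranked[:k]


@dataclass
class Debater:
    name: str
    side: str  # "pro" or "con"
    llm: LLM

    def speak(self, motion: str, stage: str, history: List[str], corpus: List[str]) -> str:
        # Retrieval step: query with the motion plus the two most recent turns.
        evidence = retrieve(motion + " " + " ".join(history[-2:]), corpus)
        lines = [
            f"Motion: {motion}",
            f"You argue the {self.side} side. Stage: {stage}.",
            "Retrieved evidence:",
            *[f"- {e}" for e in evidence],
            "Debate so far:",
            *history[-4:],
            # CoT-style instruction: reason step by step before answering.
            "Think step by step, then state your argument.",
        ]
        return self.llm("\n".join(lines))


def run_debate(motion: str, pro: List[Debater], con: List[Debater],
               corpus: List[str]) -> List[Tuple[str, str, str]]:
    """Run the stages in order; within a stage, pro and con speakers alternate."""
    transcript: List[Tuple[str, str, str]] = []
    history: List[str] = []
    for stage in STAGES:
        for p, c in zip(pro, con):
            for d in (p, c):
                reply = d.speak(motion, stage, history, corpus)
                history.append(f"{d.name}: {reply}")
                transcript.append((stage, d.name, reply))
    return transcript


if __name__ == "__main__":
    # Offline stand-ins for two different models.
    model_a: LLM = lambda prompt: "argument from model A"
    model_b: LLM = lambda prompt: "argument from model B"
    corpus = ["remote work raises productivity", "office work aids collaboration"]
    # Mixed teams: each side combines both models.
    pro = [Debater("pro-1", "pro", model_a), Debater("pro-2", "pro", model_b)]
    con = [Debater("con-1", "con", model_b), Debater("con-2", "con", model_a)]
    for stage, name, reply in run_debate("Remote work is better", pro, con, corpus):
        print(f"[{stage}] {name}: {reply}")
```

This sketch rests on one assumption: that the three scenarios differ only in how models are assigned. Under it, giving both sides the same model would resemble the fair setting, and giving each side a different model would resemble the unequal setting. Mixing models within each team, as in the `__main__` block, would resemble the mixed setting.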

Keywords: multi-agent debate; retrieval-augmented generation; chain-of-thought; large language models; NLP

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]
