Automated Negotiation Strategy Based on TD3 Algorithm

Authors: CHEN Zuo-Ming; ZHAN Jie-Yu (School of Computer Science, South China Normal University, Guangzhou 510631, China)

Affiliation: [1] School of Computer Science, South China Normal University, Guangzhou 510631

Source: Computer Systems & Applications (《计算机系统应用》), 2023, No. 3, pp. 15-24 (10 pages)

Funding: National Natural Science Foundation of China, Young Scientists Fund (62006085).

Abstract: Negotiation is the process by which people communicate on certain issues to reach an agreement. Automated negotiation aims to reduce negotiation costs, improve negotiation efficiency, and optimize negotiation outcomes through the use of negotiating agents. In recent years, deep reinforcement learning has been applied to automated negotiation with good results. However, problems remain: agents take a long time to train, strategies depend on specific negotiation domains, and negotiation information is not fully exploited. To address these problems, this paper proposes a negotiation strategy based on the TD3 deep reinforcement learning algorithm. Pre-training reduces the exploration cost of the training process. Optimized state and action definitions improve the robustness of the strategy so that it adapts to different negotiation scenarios. A multi-head semantic neural network and an opponent preference prediction module make full use of the information exchanged during negotiation. Experimental results show that the strategy performs the negotiation task well in different negotiation environments.

Keywords: automated negotiation; negotiation strategy; deep reinforcement learning; TD3 algorithm; preference prediction

Classification code: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
