Inverse reinforcement learning algorithm based on D2GA

Authors: DUAN Cheng-long, YUAN Jie, CHANG Qian-kun, ZHANG Ning-ning [1]

Affiliation: [1] School of Electrical Engineering, Xinjiang University, Urumqi 830017, Xinjiang, China

Source: Computer Engineering & Science, 2024, No. 11, pp. 2053-2062 (10 pages)

Funding: National Natural Science Foundation of China (62263031); Natural Science Foundation of Xinjiang Uygur Autonomous Region (2022D01C53).

Abstract: Traditional generative adversarial inverse reinforcement learning suffers from two problems: expert demonstrations are difficult to obtain, and generated samples are used inefficiently. To address them, a double-discriminator generative adversarial (D2GA) inverse reinforcement learning algorithm based on hindsight experience replay (HER) is proposed. In this algorithm, HER automatically synthesizes expert-like positive samples, which D2GA trains adversarially against negative samples generated by the reinforcement learning method soft actor-critic (SAC). SAC then solves for the optimal policy using the recovered optimal reward function. The proposed D2GA algorithm is compared with classical inverse reinforcement learning algorithms on four tasks in the Fetch robotic-arm environment. The results show that, with no demonstration data available, D2GA reaches the desired task success rate within relatively few episodes and outperforms currently popular inverse reinforcement learning algorithms.
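As an illustration of how HER can synthesize expert-like positive samples without demonstrations, the following is a minimal sketch of the standard "final" relabeling strategy. It is not the authors' implementation: the `Transition` fields, the `reached` tolerance, and the function names are hypothetical, and the abstract does not say which relabeling strategy D2GA uses.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

Vec = Tuple[float, ...]


@dataclass(frozen=True)
class Transition:
    """One step of a goal-conditioned episode (hypothetical layout)."""
    obs: Vec
    action: Vec
    achieved_goal: Vec  # goal actually reached after taking the action
    desired_goal: Vec   # goal the policy was asked to reach


def reached(achieved: Vec, goal: Vec, tol: float = 0.05) -> bool:
    """True if the achieved goal lies within `tol` (Euclidean) of the goal.

    The tolerance value is an assumption chosen for illustration.
    """
    dist = sum((a - g) ** 2 for a, g in zip(achieved, goal)) ** 0.5
    return dist <= tol


def her_relabel_final(episode: List[Transition]) -> List[Transition]:
    """Relabel every step with the goal achieved at the end of the episode.

    Under the substituted goal the trajectory ends in success, so it can
    stand in for an expert-like positive sample.
    """
    if not episode:
        return []
    final_goal = episode[-1].achieved_goal
    return [replace(t, desired_goal=final_goal) for t in episode]
```

In a pipeline like the one the abstract describes, the relabeled transitions would play the role of positive samples for the discriminators, while unrelabeled SAC rollouts would play the role of negative samples.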

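Keywords: deep reinforcement learning; hindsight experience replay; inverse reinforcement learning; generative adversarial network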

Classification: TP183 [Automation and Computer Technology: Control Theory and Control Engineering]
