An Approach for Training Moral Agents via Reinforcement Learning (基于强化学习的伦理智能体训练方法)

Cited by: 1

Authors: Gu Tianlong; Gao Hui [2]; Li Long; Bao Xuguang [2]; Li Yunhui [2]

Affiliations: [1] College of Cyber Security, Jinan University, Guangzhou 510632; [2] Guangxi Key Laboratory of Trusted Software (Guilin University of Electronic Technology), Guilin, Guangxi 541004

Source: Journal of Computer Research and Development (《计算机研究与发展》), 2022, No. 9, pp. 2039-2050 (12 pages)

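Funding: National Natural Science Foundation of China (62172350, 61966009, 61961007, 61862016, 62006057); Guangxi Natural Science Foundation (2019GXNSFBA245049, 2019GXNSFBA245059); Fundamental Research Funds for the Central Universities (21621028).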

Abstract: Artificial agents such as autonomous vehicles and care robots play an increasingly important role in human life, and their ethical issues have attracted wide attention. To give agents the ability to comply with human ethical norms, an approach for training moral agents based on crowdsourcing and reinforcement learning is proposed. First, crowdsourcing is used to collect a data set of behavior examples, and techniques such as text clustering and association analysis are used to generate plot graphs and trajectory trees, which define the basic behavior space of the agent and indicate the order in which behaviors occur. Second, the concept of meta-ethical behavior is proposed: by generalizing over similar behaviors in different scenarios, it expands the behavior space of the moral agent, and nine kinds of meta-ethical behaviors are extracted from the Code of Daily Behavior of Middle School Students (《中学生日常行为规范》). Finally, a behavior grading mechanism and a corresponding reinforcement learning reward and punishment function are proposed, and the moral agent is trained on this basis. In a simulated scenario of buying medicine, training experiments are carried out with the Q-learning algorithm and the DQN (deep Q-network) algorithm, respectively. The experimental results show that the trained agents can complete the expected task in an ethical manner, which verifies the rationality and effectiveness of the proposed method.

Keywords: moral agent; ethically aligned design; ethical grading; reinforcement learning; crowdsourcing

Classification: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]
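
The abstract pairs a behavior grading mechanism with a reinforcement learning reward and punishment function and trains the agent with Q-learning. As a rough illustration only, the sketch below runs tabular Q-learning on a tiny medicine-buying scenario. The states, actions, grade names, and reward values are all hypothetical assumptions made for this example; none of them are taken from the paper.

```python
import random

# Hypothetical behavior grades and their reward offsets (illustrative values,
# NOT the grading scheme from the paper).
GRADE_REWARD = {"commendable": 2.0, "neutral": 0.0, "forbidden": -10.0}

# Toy medicine-buying MDP (assumed for illustration):
# state -> {action: (next_state, task_reward, grade)}.
# Without the grade offsets, cutting in line and stealing pay slightly more.
TRANSITIONS = {
    "home": {"go_to_pharmacy": ("queue", -1.0, "neutral")},
    "queue": {
        "wait_in_line": ("counter", -2.0, "commendable"),
        "cut_in_line": ("counter", -1.0, "forbidden"),
    },
    "counter": {
        "pay": ("done", 9.0, "commendable"),
        "steal": ("done", 10.0, "forbidden"),
    },
}


def step(state, action):
    """Return (next_state, reward); reward = task reward + grade offset."""
    next_state, task_reward, grade = TRANSITIONS[state][action]
    return next_state, task_reward + GRADE_REWARD[grade]


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in acts} for s, acts in TRANSITIONS.items()}
    for _ in range(episodes):
        state = "home"
        while state != "done":
            actions = list(q[state])
            if rng.random() < epsilon:
                action = rng.choice(actions)              # explore
            else:
                action = max(actions, key=q[state].get)   # exploit
            next_state, reward = step(state, action)
            # The terminal state "done" has no entry in q, so its value is 0.
            future = max(q[next_state].values()) if next_state in q else 0.0
            q[state][action] += alpha * (reward + gamma * future - q[state][action])
            state = next_state
    return q


def greedy_path(q):
    """Follow the highest-valued action from "home" until the task ends."""
    state, path = "home", []
    while state != "done":
        action = max(q[state], key=q[state].get)
        path.append(action)
        state, _ = step(state, action)
    return path


q_table = train()
print(greedy_path(q_table))
```

Under these assumed values, the grade offsets outweigh the small task-level advantage of cutting in line or stealing, so the learned greedy policy should wait in line and pay. Setting every entry of `GRADE_REWARD` to zero shows the contrast: the task reward alone then favors the forbidden shortcuts.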
