Authors: CHEN Zhuoran, LIU Zeyang, WAN Lipeng, CHEN Xingyu, ZHU Yameng, WANG Chengze, CHENG Xiang, ZHANG Ya[4], ZHANG Senlin[5], WANG Xiaohui[6], LAN Xuguang[1]
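Affiliations: [1] Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an 710049; [2] China Academy of Launch Vehicle Technology, Beijing 100076; [3] School of Electronics, Peking University, Beijing 100871; [4] School of Automation, Southeast University, Nanjing 210096; [5] College of Electrical Engineering, Zhejiang University, Hangzhou 310027; [6] Artificial Intelligence Research Institute, China Electric Power Research Institute, Beijing 100192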
Source: Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》), 2024, No. 10, pp. 851-872 (22 pages)
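Funding: Supported by the National Key Research and Development Program of China (No. 2021ZD0112700) and the National Natural Science Foundation of China, Key Program (No. 62125305, 62088102, U23A20339, 62203348).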
Abstract: Reinforcement learning (RL) is a widely used machine learning paradigm for solving sequential decision-making problems. Its core idea is to let an agent interact with its environment and receive feedback, from which it gradually learns an optimal policy. As practical applications demand ever more computing power and data, the shift from single-agent intelligence to collective intelligence is becoming an inevitable trend in the future development of artificial intelligence, which brings RL many new opportunities and challenges. Starting from the concept of deep multi-agent reinforcement learning (MARL), this paper first distills and analyzes its current theoretical difficulties, including poor scalability, credit assignment, the exploration-exploitation dilemma, environment non-stationarity, and partial observability of information. It then describes in detail the solutions researchers have proposed for these problems, together with their advantages and disadvantages. Finally, it introduces typical MARL training and learning environments and practical applications in complex decision-making domains such as smart city construction, games, robot control, and autonomous driving, and summarizes the challenges and future directions of cooperative MARL.
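The core idea summarized in the abstract, an agent learning a policy from the feedback it receives while interacting with an environment, can be illustrated with a minimal single-agent tabular Q-learning sketch. This is not a method taken from the paper: the chain environment, its size, the hyperparameters (`alpha`, `gamma`, `epsilon`), and the helper names `step`, `q_learning`, and `greedy_policy` are all hypothetical choices made for illustration.

```python
import random

# Hypothetical toy environment (not from the paper): a 1-D chain of N_STATES
# cells. The agent starts at cell 0; action 0 moves left, action 1 moves right.
# Reaching the last cell ends the episode with reward 1; every other step gives 0.
N_STATES = 5
ACTIONS = (0, 1)


def step(state, action):
    """Return (next_state, reward, done) for the toy chain environment."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning: the agent improves Q(s, a) from environment feedback."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: explore with probability epsilon, otherwise exploit.
            # Ties are broken randomly so the untrained agent still wanders.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            nxt, reward, done = step(state, action)
            # Bootstrapped target; the terminal state has no future value.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


def greedy_policy(q):
    """Greedy action for each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    q_table = q_learning()
    print(greedy_policy(q_table))  # prints [1, 1, 1, 1]: always move right
```

Letting several such learners update independently in a shared environment is one simple way to see the non-stationarity problem mentioned above: from each agent's point of view, the other learning agents are part of an environment that keeps changing.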
Keywords: deep reinforcement learning; multi-agent; credit assignment; human feedback; Markov decision process
CLC number: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]