Authors: SONG Bo (宋波); YE Wei (叶伟); MENG Xianghui (孟祥辉)
Affiliations: [1] Department of Electronic and Optical Engineering, Space Engineering University, Beijing 101416, China; [2] Unit 95801 of the PLA, Beijing 100076, China
Source: Systems Engineering and Electronics (《系统工程与电子技术》), 2021, No. 11, pp. 3338-3351 (14 pages)
Abstract: Cognitive radio and dynamic spectrum allocation are effective means of addressing spectrum scarcity. With the rapid development of machine learning techniques such as deep learning and reinforcement learning in recent years, swarm intelligence technology represented by multi-agent reinforcement learning has made continual breakthroughs, making distributed, intelligent dynamic spectrum allocation possible. This paper reviews in detail the key research achievements in reinforcement learning and multi-agent reinforcement learning, as well as research on modeling methods and algorithms for the dynamic spectrum allocation process based on multi-agent reinforcement learning. Existing algorithms are grouped into four types: independent Q-learning, cooperative Q-learning, joint Q-learning, and multi-agent actor-critic. The advantages and disadvantages of these four types of methods are analyzed, and the key problems and possible solutions of dynamic spectrum allocation methods based on multi-agent reinforcement learning are summarized.
Keywords: spectrum management; cognitive radio; dynamic spectrum allocation; machine learning; reinforcement learning; multi-agent reinforcement learning
CLC number: TN929.5 [Electronics and Telecommunications / Communication and Information Systems]
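As an informal illustration of the independent Q-learning formulation named in the abstract, the following is a minimal sketch of how several secondary users might each learn a channel-selection policy from collision feedback alone. It is not the algorithm from the article: the agent count, channel count, stateless (bandit-style) update, and collision-based reward are all illustrative assumptions.

```python
# Illustrative sketch (assumptions, not the article's method): independent
# Q-learning for dynamic spectrum allocation. Each secondary user keeps its
# own Q-table over channels and updates it from its own reward only.
import random

N_AGENTS = 3        # secondary users learning independently (assumed)
N_CHANNELS = 5      # available spectrum channels (assumed)
ALPHA, EPS = 0.1, 0.1   # learning rate and exploration rate (assumed)

# one stateless Q-table per agent: Q[agent][channel]
Q = [[0.0] * N_CHANNELS for _ in range(N_AGENTS)]

def choose_channel(agent):
    """Epsilon-greedy channel selection from the agent's own Q-values."""
    if random.random() < EPS:
        return random.randrange(N_CHANNELS)
    q = Q[agent]
    return q.index(max(q))

for step in range(10000):
    choices = [choose_channel(a) for a in range(N_AGENTS)]
    for a, ch in enumerate(choices):
        # reward 1 if the agent transmits alone on its channel, 0 on collision
        reward = 1.0 if choices.count(ch) == 1 else 0.0
        # stateless Q-learning update (no next-state term in this sketch)
        Q[a][ch] += ALPHA * (reward - Q[a][ch])

print("Learned channel preferences per agent:")
for a in range(N_AGENTS):
    best = Q[a].index(max(Q[a]))
    print(f"agent {a}: best channel = {best}, Q = {[round(v, 2) for v in Q[a]]}")
```

Because each agent treats the others as part of a non-stationary environment, such independent learners typically converge to a collision-free channel assignment only in simple settings; this limitation is one motivation for the cooperative, joint, and actor-critic variants the survey compares.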