Authors: CHEN Ping-ping (陈平平), ZHANG Xu (张旭), XIE Zhao-peng (谢肇鹏), QIU Yu-ping (丘毓萍), FANG Yi (方毅) [3]
Affiliations: [1] School of Advanced Manufacturing, Fuzhou University, Jinjiang, Fujian 362251, China; [2] College of Physics and Information Engineering, Fuzhou University, Fuzhou, Fujian 350108, China; [3] School of Information Engineering, Guangdong University of Technology, Guangzhou, Guangdong 510006, China
Source: Acta Electronica Sinica (《电子学报》), 2024, No. 6, pp. 1824-1831 (8 pages)
Funding: National Natural Science Foundation of China (No.62171135, No.62322106, No.62071131); Natural Science Foundation of Fujian Province (No.2022J06010).
Abstract: To improve communication efficiency and ensure user fairness when applying dynamic spectrum access (DSA) in multi-user, multi-channel communication scenarios, this paper proposes the MAPPO-DSA algorithm, based on multi-agent proximal policy optimization (MAPPO). Single-channel access wastes spectrum when several channels are idle at the same time, so the algorithm adopts multi-channel access. Multi-channel access, however, makes the state and action spaces grow exponentially, which raises computational cost and makes learning harder. To address this, the paper introduces the MAPPO deep reinforcement learning (DRL) algorithm to learn and optimize access strategies efficiently in complex environments. User fairness is ensured by designing and optimizing reinforcement learning elements in MAPPO, such as observations and rewards, and by sharing network parameters. Experimental results in different scenarios show that MAPPO-DSA learns near-optimal access strategies, that its network throughput approaches the theoretical upper limit in some scenarios, and that it significantly outperforms existing algorithms while effectively ensuring user fairness.
Keywords: dynamic spectrum access; deep reinforcement learning; multi-agent proximal policy optimization; multi-channel access
CLC number: TP317.4 [Automation and Computer Technology: Computer Software and Theory]
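The abstract's point that multi-channel access makes the action space grow exponentially can be illustrated with a small counting sketch. The action encoding below is an assumption made only for illustration (each user either stays idle, transmits on one channel, or transmits on any subset of channels); the abstract does not state the encoding actually used by MAPPO-DSA, and the function names are hypothetical.

```python
from itertools import product


def single_channel_actions(num_channels):
    """Assumed single-channel access: stay idle or pick exactly one channel."""
    idle = tuple(0 for _ in range(num_channels))
    one_hot = [
        tuple(1 if c == k else 0 for c in range(num_channels))
        for k in range(num_channels)
    ]
    return [idle] + one_hot


def multi_channel_actions(num_channels):
    """Assumed multi-channel access: transmit on any subset of the channels."""
    return list(product((0, 1), repeat=num_channels))


def joint_action_count(per_user_actions, num_users):
    """Size of the joint action space when every user chooses independently."""
    return per_user_actions ** num_users


if __name__ == "__main__":
    for n in (2, 4, 8):
        single = len(single_channel_actions(n))  # grows linearly: n + 1
        multi = len(multi_channel_actions(n))    # grows exponentially: 2 ** n
        print(n, single, multi, joint_action_count(multi, 3))
```

Under this assumed encoding, the per-user action count grows from N + 1 to 2^N for N channels, and the joint count over several users grows faster still, which is the kind of blow-up that motivates a learned policy over exhaustive search.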