Multi-agent Deep Reinforcement Learning Based Resource Allocation for Underwater Acoustic Networks


Authors: LI Mengfan (李梦凡); ZHANG Yuzhi (张育芝); HAN Xiang (韩翔); FENG Xiaomei (冯晓美) (School of Communication and Information Engineering, Xi'an University of Science and Technology, Xi'an 710054, China)

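Affiliation: [1] School of Communication and Information Engineering, Xi'an University of Science and Technology, Xi'an 710054, China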

Source: Telecommunication Engineering (《电讯技术》), 2025, No. 2, pp. 283-292 (10 pages)

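Funding: National Natural Science Foundation of China (61801372); Scientific Research Program of the Shaanxi Provincial Department of Education (22JK0454).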

Abstract: In resource-limited underwater acoustic networks, soft frequency reuse and adaptive resource allocation can improve network capacity and energy efficiency. However, the long propagation delay and time-varying nature of the underwater acoustic channel mean that the feedback channel state information (CSI) used by adaptive techniques is both time-varying and outdated, and this imperfect feedback CSI degrades the performance of adaptive systems. To address this problem, a multi-agent deep Q network based resource allocation (MADQN-RA) method is proposed. The method treats the underwater acoustic soft frequency reuse network as a multi-agent system and uses sequences of outdated feedback CSI as the system state. With an effective reward expression, the agents can track the variation of the time-varying, delayed underwater acoustic channel and make resource allocation decisions accordingly. To further improve the agents' decision accuracy while avoiding part of the learning cost incurred as the state-space dimension grows, MADQN-RA is enhanced with a dynamic state length method. Simulation results show that the proposed methods outperform other learning-based methods and channel-prediction-based methods, and come closer to the theoretical optimum.
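The idea of using a sequence of outdated feedback CSI as the agent's state can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `outdated_csi_states`, the parameters `feedback_delay` and `state_length`, and the synthetic Gaussian CSI trace are all assumptions made for illustration, and the abstract does not specify how the state is actually encoded.

```python
from collections import deque
import random


def outdated_csi_states(csi_trace, feedback_delay, state_length):
    """Yield (t, state) pairs for a hypothetical agent.

    At slot t the transmitter only knows CSI samples up to slot
    t - feedback_delay (the feedback is outdated). The state is the
    most recent `state_length` samples that have arrived so far.
    """
    window = deque(maxlen=state_length)  # sliding window of fed-back CSI
    for t in range(len(csi_trace)):
        arrived = t - feedback_delay  # newest sample available at slot t
        if arrived < 0:
            continue  # no feedback has arrived yet
        window.append(csi_trace[arrived])
        if len(window) == state_length:
            yield t, tuple(window)


# Synthetic CSI trace (assumed values, e.g. channel gain in dB).
random.seed(0)
trace = [round(random.gauss(10.0, 2.0), 2) for _ in range(8)]
states = list(outdated_csi_states(trace, feedback_delay=2, state_length=3))

for t, state in states:
    print(f"slot {t}: state = {state}")
```

In this sketch a longer `state_length` gives the agent more channel history at the cost of a larger state space, which is the trade-off the abstract's dynamic state length method is said to address.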

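Keywords: underwater acoustic networks; resource allocation; feedback channel state information; multi-agent deep Q network; dynamic state length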

Classification number: TN929.3 [Electronics and Telecommunications: Communication and Information Systems]

 
