Authors: 武艳 (WU Yan)[1]; 潘广川 (PAN Guangchuan); 姚明旿 (YAO Mingwu)[1]; 杨清海 (YANG Qinghai)[1]; 梁中明 (LEUNG Victor C.M.) (State Key Laboratory of Integrated Services Network, School of Telecommunications, Xidian University, Xi'an 710071, China; Internet of Things Research Center, School of Computer and Software, Shenzhen University, Shenzhen 518060, China)
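Affiliations: [1] State Key Laboratory of Integrated Services Network, Xidian University, Xi'an 710071, Shaanxi, China; [2] Internet of Things Research Center, School of Computer and Software, Shenzhen University, Shenzhen 518060, Guangdong, China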
Source: Journal on Communications (《通信学报》), 2024, No. 8, pp. 180-191 (12 pages)
Funding: National Key Research and Development Program of China (No.2020YFB1807700); Innovation Team Fund of Shaanxi Province (No.2024RS-CXTD-01).
Abstract: The vertical handover policy of space-air-ground integrated network cyber-physical systems is studied using deep reinforcement learning, motivated by the complexity of the system model and the difficulty of obtaining prior knowledge of the network topology or making modeling assumptions. First, by jointly considering system stability, handover cost, and network usage cost constraints, the vertical handover policy problem is modeled as a constrained Markov decision process (CMDP), and a sufficient condition guaranteeing the existence of a feasible solution is given. Second, a constraint-proximal policy optimization (CPPO) algorithm is proposed to solve the CMDP, and a distributed reinforcement learning mechanism is introduced at the base station side to accelerate training convergence. Simulation results verify the effectiveness and superiority of the proposed vertical handover policy compared with the baseline policies.
Keywords: space-air-ground integrated network; cyber-physical system; deep reinforcement learning; vertical handover
Classification: TN929.5 [Electronics and Telecommunications: Communication and Information Systems]