Authors: Yingzhi HUANG, Zhaoyang ZHANG, Jingze CHE, Zhaohui YANG, Qianqian YANG, Kai-Kit WONG
Affiliations: [1] College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, 310027, China; [2] Department of Electronic and Electrical Engineering, University College London, London, WC1E 6BT, UK
Source: Science China (Information Sciences) [中国科学(信息科学)(英文版)], 2023, Issue 6, pp. 200-217 (18 pages)
Funding: Supported in part by the National Key R&D Program of China (Grant Nos. 2020YFB1807101, 2018YFB1801104) and the National Natural Science Foundation of China (Grant Nos. 61725104, U20A20158, 61922071).
Abstract: Machine learning (ML) has been empowering all aspects of wireless communication system design. Among these, reinforcement learning (RL)-based approaches have attracted considerable research attention because they can interact with the environment directly and learn efficiently from the collected experiences. In this paper, we propose a novel and efficient RL-based multi-beam combining scheme for future millimeter-wave (mmWave) three-dimensional (3D) multi-input multi-output (MIMO) communication systems. The proposed scheme does not require perfect channel state information (CSI) or precise user location information, both of which are generally difficult to obtain in practice, and it addresses the crucial challenge of computational complexity incurred by the extremely large state and action spaces associated with multiple users, multiple paths, and multiple 3D beams. In particular, a self-attention deep deterministic policy gradient (DDPG)-based beam selection and combination framework is proposed to adaptively learn the 3D beamforming pattern without CSI. We aim to maximize the sum-rate of the mmWave 3D-MIMO system by optimizing the serving beam set and the corresponding combining weights for each user. To this end, the transformer is incorporated into the DDPG to obtain the global information of the input elements and capture the signal directions precisely, which leads to a near-optimal beamformer design. Simulation results verify the superiority of the proposed self-attention DDPG over conventional ML-based beamforming schemes in terms of sum-rate under various scenarios.
Keywords: reinforcement learning (RL); deep deterministic policy gradient (DDPG); self-attention; precoding/combining; millimeter-wave (mmWave); multi-input multi-output (MIMO)
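Classification: TN929.5 [Electronics and Telecommunications: Communication and Information Systems]; TP181 [Electronics and Telecommunications: Information and Communication Engineering]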