Self-attention reinforcement learning for multi-beam combining in mmWave 3D-MIMO systems

Authors: Yingzhi HUANG, Zhaoyang ZHANG, Jingze CHE, Zhaohui YANG, Qianqian YANG, Kai-Kit WONG

Affiliations: [1] College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China; [2] Department of Electronic and Electrical Engineering, University College London, London WC1E 6BT, UK

Source: Science China (Information Sciences), 2023, Issue 6, pp. 200-217 (18 pages in total). Chinese journal name: 中国科学(信息科学)(英文版)

Funding: Supported in part by the National Key R&D Program of China (Grant Nos. 2020YFB1807101, 2018YFB1801104) and the National Natural Science Foundation of China (Grant Nos. 61725104, U20A20158, 61922071).

Abstract: Machine learning (ML) has been empowering all aspects of wireless communication system design. Among ML methods, reinforcement learning (RL)-based approaches have attracted considerable research attention because they can interact with the environment directly and learn efficiently from the collected experiences. In this paper, we propose a novel and efficient RL-based multi-beam combining scheme for future millimeter-wave (mmWave) three-dimensional (3D) multi-input multi-output (MIMO) communication systems. The proposed scheme does not require perfect channel state information (CSI) or precise user location information, both of which are generally difficult to obtain in practice, and it addresses the crucial challenge of computational complexity incurred by the extremely large state and action spaces associated with multiple users, multiple paths, and multiple 3D beams. In particular, a self-attention deep deterministic policy gradient (DDPG)-based beam selection and combination framework is proposed to learn the 3D beamforming pattern adaptively without CSI. We aim to maximize the sum-rate of the mmWave 3D-MIMO system by optimizing the serving beam set and the corresponding combining weights for each user. To this end, the transformer is incorporated into the DDPG to obtain the global information of the input elements and capture the signal directions precisely, which leads to a near-optimal beamformer design. Simulation results verify the superiority of the proposed self-attention DDPG over conventional ML-based beamforming schemes in terms of sum-rate under various scenarios.
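
To make the optimization target in the abstract concrete, the following is a minimal sketch of how a sum-rate could be evaluated for a given choice of serving beams and combining weights. It is not the authors' formulation: the uniform linear array (instead of a 3D planar array), the single-path channels, the four-beam codebook, and the function names `steering_vector`, `combine_beams`, and `sum_rate` are all assumptions made for illustration. An RL agent such as the DDPG described above would be the component that proposes `beam_ids` and `weights`; here they are fixed by hand.

```python
import cmath
import math


def steering_vector(n_antennas, angle_rad):
    """Unit-norm steering vector of a half-wavelength uniform linear array (assumed geometry)."""
    scale = 1.0 / math.sqrt(n_antennas)
    return [scale * cmath.exp(1j * math.pi * n * math.sin(angle_rad))
            for n in range(n_antennas)]


def combine_beams(codebook, beam_ids, weights):
    """Weighted sum of the selected codebook beams, normalized to unit power."""
    n = len(codebook[0])
    w = [sum(c * codebook[b][i] for b, c in zip(beam_ids, weights))
         for i in range(n)]
    norm = math.sqrt(sum(abs(x) ** 2 for x in w))
    return [x / norm for x in w]


def sum_rate(channels, beamformers, noise_power):
    """Sum over users of log2(1 + SINR); other users' beams count as interference."""
    total = 0.0
    for k, h in enumerate(channels):
        gains = [abs(sum(hc.conjugate() * wc for hc, wc in zip(h, w))) ** 2
                 for w in beamformers]
        interference = sum(gains) - gains[k]
        total += math.log2(1.0 + gains[k] / (interference + noise_power))
    return total


# Hypothetical toy setup: 8 antennas, a 4-beam codebook, two single-path users.
N = 8
codebook = [steering_vector(N, math.radians(a)) for a in (-45, -15, 15, 45)]
channels = [steering_vector(N, math.radians(-15)),
            steering_vector(N, math.radians(45))]
beamformers = [
    combine_beams(codebook, [1, 0], [0.9, 0.1]),  # user 0: mostly beam 1
    combine_beams(codebook, [3, 2], [0.9, 0.1]),  # user 1: mostly beam 3
]
print(sum_rate(channels, beamformers, noise_power=0.1))
```

In this sketch the beam indices and weights play the role of the action, and the returned sum-rate could serve as a reward signal; how the paper actually defines states, actions, and rewards is not given in the abstract.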

Keywords: reinforcement learning (RL); deep deterministic policy gradient (DDPG); self-attention; precoding/combining; millimeter-wave (mmWave); multi-input multi-output (MIMO)

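Classification codes: TN929.5 [Electronics and Telecommunications: Communication and Information Systems]; TP181 [Electronics and Telecommunications: Information and Communication Engineering]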

 
