UAV Assisted Communication and Resource Scheduling in Cell-free Massive MIMO Based on Deep Reinforcement Learning Approach
Abstract: In cell-free massive Multiple Input Multiple Output (MIMO) networks, distributed Access Points (APs) serve multiple users simultaneously, achieving the large-capacity transmission of virtual MIMO over a wide area, while Unmanned Aerial Vehicle (UAV) assisted communication can enhance coverage for hotspot or edge users in that area. To reduce the feedback-link load and improve the spectrum efficiency of UAV-assisted communication, this paper studies a joint scheduling scheme comprising AP power allocation, UAV service zone selection, and user scheduling. First, the AP power allocation and UAV service zone selection problems are jointly modeled as a Double-Action Markov Decision Process (DAMDP), and a Deep Reinforcement Learning (DRL) algorithm based on Q-learning and Convolutional Neural Networks (CNN) is proposed to solve it. Then, user scheduling is formulated as a 0-1 optimization problem and solved by decomposition into sub-problems. Simulation results show that, compared with existing schemes, the proposed DRL-based resource scheduling scheme effectively improves the spectrum efficiency of the cell-free massive MIMO network.
Keywords:
- Cell-free massive MIMO
- UAV-assisted communication
- Resource scheduling
- Deep reinforcement learning
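The DAMDP formulation implies that, at each decision step, the agent jointly selects an AP power level and a UAV service zone from a single Q-value vector defined over the Cartesian product of the two action sets. The snippet below is a minimal sketch of such double-action selection under an epsilon-greedy policy; the names `num_power_levels` and `num_zones`, their sizes, and the index-decoding order are illustrative assumptions rather than the paper's notation.

```python
# Minimal sketch of double-action selection for the DAMDP: one flat Q-value vector
# covers all (AP power level, UAV service zone) pairs, and the chosen index is
# decoded back into the two sub-actions. All sizes below are assumed for illustration.
import numpy as np

rng = np.random.default_rng(0)

num_power_levels = 4      # candidate AP transmit-power levels (assumed)
num_zones = 10            # candidate UAV service zones (assumed)
num_actions = num_power_levels * num_zones


def select_double_action(q_values: np.ndarray, epsilon: float) -> tuple[int, int]:
    """Epsilon-greedy choice over the joint action space, returned as (power, zone)."""
    if rng.random() < epsilon:
        idx = int(rng.integers(num_actions))     # explore: random joint action
    else:
        idx = int(np.argmax(q_values))           # exploit: best joint action
    return idx // num_zones, idx % num_zones     # decode into the two sub-actions


# Example with random Q-values standing in for the CNN output of Table 1.
power_idx, zone_idx = select_double_action(rng.normal(size=num_actions), epsilon=0.1)
print(power_idx, zone_idx)
```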
Table 1 Framework of the DQN algorithm

| Layer | Input | Kernel | Activation | Output |
|---|---|---|---|---|
| Convolutional layer 1 | $\left( \sum M_l \right) \times N \times 1$ | 3×3, 8 | ReLU | $\left( \sum M_l \right) \times N \times 8$ |
| Convolutional layer 2 | $\left( \sum M_l \right) \times N \times 8$ | 3×3, 16 | ReLU | $\left( \sum M_l \right) \times N \times 16$ |
| Fully connected layer 1 | $\left( \sum M_l \right) \times N \times 16$ | NA | ReLU | 1024 |
| Fully connected layer 2 | 1024 | NA | Linear | $\prod \left( M_l \times N \right)$ |
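The sketch below is one possible PyTorch rendering of the network in Table 1: two 3×3 convolutions with 8 and 16 channels, a 1024-unit fully connected layer with ReLU, and a linear output layer producing one Q-value per joint action. The `padding=1` setting, the batch/channel layout, and the example dimensions are assumptions chosen so the layer shapes match the table; they are not specified in the paper.

```python
# A minimal PyTorch sketch of the Q-network in Table 1. Layer sizes follow the table;
# the state dimensions (sum_Ml, N) and the action-space size are placeholders.
import torch
import torch.nn as nn


class DQNNetwork(nn.Module):
    def __init__(self, sum_Ml: int, N: int, num_actions: int):
        super().__init__()
        # Conv layer 1: (sum_Ml) x N x 1 -> (sum_Ml) x N x 8, 3x3 kernel, ReLU
        self.conv1 = nn.Conv2d(1, 8, kernel_size=3, padding=1)
        # Conv layer 2: (sum_Ml) x N x 8 -> (sum_Ml) x N x 16, 3x3 kernel, ReLU
        self.conv2 = nn.Conv2d(8, 16, kernel_size=3, padding=1)
        # Fully connected layer 1: flattened features -> 1024, ReLU
        self.fc1 = nn.Linear(sum_Ml * N * 16, 1024)
        # Fully connected layer 2: 1024 -> one Q-value per joint action, linear output
        self.fc2 = nn.Linear(1024, num_actions)
        self.relu = nn.ReLU()

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # state: (batch, 1, sum_Ml, N), e.g. a map of AP-user large-scale channel gains
        x = self.relu(self.conv1(state))
        x = self.relu(self.conv2(x))
        x = x.flatten(start_dim=1)
        x = self.relu(self.fc1(x))
        return self.fc2(x)   # Q-values for every (power level, service zone) pair


# Example: 4 APs with 8 antennas each (sum_Ml = 32), N = 10 users,
# and a joint action space of 40 power-level / service-zone combinations.
q_net = DQNNetwork(sum_Ml=32, N=10, num_actions=40)
q_values = q_net(torch.randn(1, 1, 32, 10))
print(q_values.shape)  # torch.Size([1, 40])
```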