Dynamic Resource Allocation Based on K-armed Bandit for Multi-UAV Air-Ground Network

MA Nan (马楠); XU Kui (许魁); XIA Xiaochen (夏晓晨); XIE Wei (谢威); XU Jianhui (徐键卉); SHEN Maiying (申麦英)

doi:10.11999/JEIT210877

The Army Engineering University of PLA, Nanjing 210007, China

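Funds: The National Natural Science Foundation of China (62071485, 61901519, 61771486, 62001513), The Basic Research Project of Jiangsu Province (BK20192002), The Natural Science Foundation of Jiangsu Province (BK20201334, BK20181335, BK20200579)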

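About the authors:
MA Nan: female, Ph.D. candidate. Research interests: mobile communications, UAV air-ground networks, massive MIMO.

XU Kui: male, associate professor, Ph.D. supervisor. Research interests: mobile communications, UAV air-ground networks, massive MIMO, communication signal processing.

XIA Xiaochen: male, lecturer. Research interests: wireless communications, UAV air-ground networks, massive MIMO.

XIE Wei: male, associate professor, master's supervisor. Research interests: mobile communications, air-ground communications, massive MIMO.

XU Jianhui: female, lecturer. Research interests: mobile communications, UAV air-ground networks, massive MIMO, communication signal processing.

SHEN Maiying: female, teaching assistant. Research interests: air-ground communications, wireless sensor networks.

Corresponding author:
XU Kui, lgdxxukui@sina.com

CLC number: TN929.5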
Publication history
- Received: 2021-08-25
- Revised: 2022-01-11
- Accepted: 2022-01-18
- Published online: 2022-02-21
- Issue date: 2022-09-19

Abstract: To address dynamic resource allocation in a multi-Unmanned Aerial Vehicle (UAV) air-ground network equipped with massive MIMO, a K-armed bandit-based reinforcement learning algorithm is proposed that jointly optimizes the user selection and power allocation strategies of multiple UAVs in order to maximize the system throughput. First, users are clustered according to their geographic locations, and the cluster center nodes are used to plan the UAV trajectories. Second, without considering UAV-to-UAV communication links, the multi-UAV resource allocation problem is transformed into multiple mutually independent agent reinforcement learning problems. Finally, an episodic multi-agent multi-state K-armed bandit algorithm is proposed to jointly optimize user selection and power allocation. By defining the position index of the UAV at each time step as the state space, each UAV can dynamically adapt to changes in its own position and in the channel. Simulation results show that the proposed scheme autonomously adjusts the resource allocation strategy as the environment state changes and effectively improves the total system throughput compared with existing schemes.
Key words:
- UAV-enabled air-ground network
- Dynamic resource allocation
- Multi-agent reinforcement learning
- K-armed bandit
- Massive MIMO

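Figure 1 Application scenario of the space-air-ground integrated network

Figure 2 Comparison of k-means and k-means++ clustering results

Figure 3 Simulation scenario

Figure 4 Average maximum throughput under different exploration rates

Figure 5 Actual throughput during training under different exploration rates

Figure 6 Average maximum throughput of the four schemes

Figure 7 Distribution of the average maximum throughput of the four schemes

Figure 8 Distribution of the average user throughput under the two trajectories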

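Table 1 Cluster-center selection algorithm based on k-means++

Initialization: number of clusters $ {k_{\text{c}}} $
(1) Randomly select the first cluster center from all users and denote it ${c_1}$;
(2) Compute the horizontal distance from every other user to ${c_1}$, denoted $d({{\mathbf{v}}_k},{c_1})$;
(3) Select the second cluster center node ${c_2}$ from all users, where the $m$-th user is selected with probability
    $ \dfrac{{{d^2}({{\mathbf{v}}_m},{c_1})}}{{\displaystyle\sum\limits_{j = 1}^K {{d^2}({{\mathbf{v}}_j},{c_1})} }} $    (12)
(4) To select center $j$, perform the following operations:
    (a) Compute the distance from each observation to each cluster center node, and assign each observation to its nearest cluster;
    (b) For $m = 1,2, \cdots ,K$ and $p = 1,2, \cdots ,{k_{\text{c}}} - 1$, randomly select center $j$ from all users with probability
    $ \dfrac{{{d^2}({{\mathbf{v}}_m},{c_p})}}{{\displaystyle\sum\limits_{\{ h;{{\mathbf{v}}_h} \in {C_p}\} } {{d^2}({{\mathbf{v}}_h},{c_p})} }} $    (13)
    where ${C_p}$ is the set of all users closest to cluster center node ${c_p}$, and ${{\mathbf{v}}_m} \in {C_p}$. That is, each subsequent center is selected with a probability proportional to its squared distance to the nearest center already chosen;
(5) Repeat step (4) until ${k_{\text{c}}}$ centers have been selected.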
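
The seeding procedure in Table 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation: it uses the standard k-means++ weighting, in which each user is drawn with probability proportional to its squared distance to the nearest center already chosen, normalized over all users (the form of Eq. (12)), rather than the per-cluster normalization of Eq. (13). The function name `kmeanspp_centers`, the 2-D tuple representation of user positions, and the `rng` parameter are assumptions made for illustration.

```python
import random


def kmeanspp_centers(users, k_c, rng=None):
    """Pick k_c cluster centers from `users` with k-means++ seeding.

    `users` is a list of (x, y) horizontal positions (assumed representation).
    Returns the selected centers as a list of positions taken from `users`.
    """
    if not 1 <= k_c <= len(users):
        raise ValueError("k_c must be between 1 and the number of users")
    rng = rng or random.Random()

    def sq_dist(a, b):
        # Squared horizontal (Euclidean) distance between two positions.
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    # Step (1): the first center is drawn uniformly at random.
    chosen = [rng.randrange(len(users))]
    # Step (2): squared distance from every user to the nearest chosen center.
    d2 = [sq_dist(u, users[chosen[0]]) for u in users]

    # Steps (3)-(5): draw the remaining centers with squared-distance weights.
    while len(chosen) < k_c:
        if sum(d2) == 0:
            # Degenerate case: all remaining users coincide with a center.
            remaining = [i for i in range(len(users)) if i not in chosen]
            idx = rng.choice(remaining)
        else:
            idx = rng.choices(range(len(users)), weights=d2, k=1)[0]
        chosen.append(idx)
        # Keep, for each user, the distance to its nearest chosen center.
        d2 = [min(old, sq_dist(u, users[idx])) for old, u in zip(d2, users)]
    return [users[i] for i in chosen]
```

A call such as `kmeanspp_centers(positions, 3, random.Random(0))` returns three user positions to serve as initial centers. Because an already-chosen user has weight zero, the same user is never drawn twice as long as the positions are distinct.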

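Table 2 Episodic multi-agent multi-state K-armed bandit algorithm for user selection and power allocation

Initialization: exploration parameter $\varepsilon $, maximum number of training episodes ${N_{{\text{epi}}}}$, state-action value function $ Q_m^1(s,a) = 0 $, $\forall m \in \mathcal{M}$;
(1) For all UAVs, set the initial state ${s_m}(0)$;
(2) ${N_{{\text{epi}}}} = {N_{{\text{epi}}}} - 1$;
(3) For each step of the episode, $t = 1,2, \cdots ,T$, each UAV independently executes the following steps:
    (a) Select action ${a_m}(t)$ according to policy $ \pi _m^\varepsilon $;
    (b) Execute action ${a_m}(t)$, obtain the immediate reward ${R_m}(t + 1)$, and transition to state ${s_m}(t + 1)$;
    (c) Update the state-action value function $ Q_m^{t + 1}(s,a) = Q_m^t(s,a) + \alpha (R_m^t - Q_m^t(s,a)) $;
(4) Repeat steps (1) to (3) until ${N_{{\text{epi}}}} = 0$.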
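
The training loop in Table 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the state at step $t$ is taken to be the position index along a pre-planned trajectory, so it advances deterministically; the policy $ \pi _m^\varepsilon $ is assumed to be $\varepsilon $-greedy; and `reward_fn` is a hypothetical callback standing in for the throughput obtained by UAV `m` in state `s` under action `a` (a joint user-selection and power-allocation choice). The names `train_bandits` and `epsilon_greedy` are likewise hypothetical.

```python
import random


def epsilon_greedy(q_row, epsilon, rng):
    """Explore with probability epsilon, otherwise pick a best-valued action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    # Break ties among equally valued actions at random.
    return rng.choice([a for a, q in enumerate(q_row) if q == best])


def train_bandits(num_uavs, num_states, num_actions, reward_fn,
                  n_episodes=200, epsilon=0.1, alpha=0.1, seed=0):
    """Run independent multi-state bandits, one per UAV.

    reward_fn(m, s, a) is a hypothetical callback returning the immediate
    reward of UAV m taking action a in state s.
    Returns Q with Q[m][s][a] the learned state-action value.
    """
    rng = random.Random(seed)
    # Initialization: Q_m(s, a) = 0 for every UAV m.
    Q = [[[0.0] * num_actions for _ in range(num_states)]
         for _ in range(num_uavs)]
    for _ in range(n_episodes):            # steps (2) and (4): episode loop
        for s in range(num_states):        # step (3): one step per position index
            for m in range(num_uavs):      # each UAV acts independently
                a = epsilon_greedy(Q[m][s], epsilon, rng)     # step (a)
                r = reward_fn(m, s, a)                        # step (b)
                Q[m][s][a] += alpha * (r - Q[m][s][a])        # step (c)
    return Q
```

After training, the greedy allocation for UAV `m` at position index `s` is the action with the largest `Q[m][s][a]`. Keeping a separate value table per UAV reflects the assumption that no UAV-to-UAV link is used, so no information is exchanged between agents.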
