Dynamic Resource Allocation Based on K-armed Bandit for Multi-UAV Air-Ground Network

MA Nan; XU Kui; XIA Xiaochen; XIE Wei; XU Jianhui; SHEN Maiying

doi:10.11999/JEIT210877

Volume 44 Issue 9

Sep. 2022

MA Nan, XU Kui, XIA Xiaochen, XIE Wei, XU Jianhui, SHEN Maiying. Dynamic Resource Allocation Based on K-armed Bandit for Multi-UAV Air-Ground Network[J]. Journal of Electronics & Information Technology, 2022, 44(9): 3117-3125. doi: 10.11999/JEIT210877

doi: 10.11999/JEIT210877 cstr: 32379.14.JEIT210877

The Army Engineering University of PLA, Nanjing 210007, China

Funds: The National Natural Science Foundation of China (62071485, 61901519, 61771486, 62001513), The Basic Research Project of Jiangsu Province (BK20192002), The Natural Science Foundation of Jiangsu Province (BK20201334, BK20181335, BK20200579)

Received Date: 2021-08-25
Revised Date: 2022-01-11
Accepted Date: 2022-01-18

Available Online: 2022-02-21

Publish Date: 2022-09-19

Abstract

To address resource allocation in Unmanned Aerial Vehicle (UAV)-enabled air-ground networks with massive MIMO, a reinforcement learning algorithm based on the K-armed bandit is proposed that jointly optimizes user selection and power allocation to maximize the total throughput of ground users. First, users are clustered by geographic location, and the cluster centers are used to plan the UAV trajectories. Second, by neglecting the UAV-UAV communication links, the multi-UAV resource allocation problem is transformed into a set of mutually independent reinforcement learning problems, one per agent. Finally, an episode-based, multi-agent, multi-state K-armed bandit algorithm is proposed for the joint optimization of user selection and power allocation; defining the position index of each UAV as the state space lets the UAV adapt dynamically to changes in its position and channel state. Simulation results show that the proposed algorithm adapts its resource allocation strategy to the channel conditions and improves the total system throughput compared with existing schemes.
Keywords:
- UAV-enabled air-ground network
- Dynamic resource allocation
- Multi-agent reinforcement learning
- K-armed bandit
- Massive MIMO
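
The multi-state bandit idea outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the epsilon-greedy exploration rule, the running-mean value update, the class name `StateBandit`, and the `toy_reward` function are all assumptions made for illustration. Each state stands in for a UAV position index, and each arm stands in for one joint (user selection, power allocation) choice. Because the abstract treats the UAVs as mutually independent agents, one such bandit would be created per UAV.

```python
import random


class StateBandit:
    """Epsilon-greedy K-armed bandit with one value table per state.

    Hypothetical sketch: a state plays the role of a UAV position index,
    an arm plays the role of one (user selection, power level) choice.
    """

    def __init__(self, n_states, n_arms, epsilon=0.1, seed=0):
        self.n_arms = n_arms
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # q[s][a] is the running mean reward of arm a in state s.
        self.q = [[0.0] * n_arms for _ in range(n_states)]
        self.counts = [[0] * n_arms for _ in range(n_states)]

    def greedy(self, state):
        """Return the arm with the highest estimated value in this state."""
        row = self.q[state]
        return max(range(self.n_arms), key=row.__getitem__)

    def select(self, state):
        """Explore with probability epsilon, otherwise exploit."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_arms)
        return self.greedy(state)

    def update(self, state, arm, reward):
        """Incrementally update the mean reward of (state, arm)."""
        self.counts[state][arm] += 1
        n = self.counts[state][arm]
        self.q[state][arm] += (reward - self.q[state][arm]) / n


def toy_reward(state, arm):
    """Assumed stand-in for throughput: the best arm differs per state."""
    return 1.0 if arm == state else 0.2


def run_episodes(bandit, reward_fn, n_states, n_episodes):
    """One episode visits every position index once, in trajectory order."""
    for _ in range(n_episodes):
        for state in range(n_states):
            arm = bandit.select(state)
            bandit.update(state, arm, reward_fn(state, arm))
    return bandit


if __name__ == "__main__":
    uav = run_episodes(StateBandit(n_states=3, n_arms=4), toy_reward, 3, 2000)
    print([uav.greedy(s) for s in range(3)])
```

Keeping a separate value table per position index is what lets the learned choice differ along the trajectory; a single stateless bandit would average rewards over all positions and could not track position-dependent channel conditions.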
