Power Allocation and Trajectory Design for Unmanned Aerial Vehicle Relay Network with Mobile Users
Abstract: In Unmanned Aerial Vehicle (UAV) relay communication, allocating the relay UAV's communication resources and planning its motion are key problems. To improve the communication efficiency of UAV relay systems, this paper proposes a joint planning method for UAV relay power allocation and trajectory design based on the Proximal Policy Optimization (PPO) algorithm. The joint planning problem in the mobile-user scenario is modeled as a Markov decision process. Taking into account that user location information is acquired imprecisely, the reward function is designed to maximize the throughput of the relay communication system subject to a constraint on the user outage probability. PPO, a deep reinforcement learning algorithm with relatively fast convergence, is then used to solve the problem, yielding an optimized flight trajectory for the relay UAV and a reasonable and effective allocation of the relay transmit power. Simulation results show that, in UAV relay scenarios with randomly moving users, the proposed method improves system throughput by 22% and 15% compared with methods based on a random policy and on the conventional Deep Deterministic Policy Gradient (DDPG) algorithm, respectively. These results indicate that the proposed method effectively improves the communication efficiency of the system.
Table 1 Parameters of the reward function

| Reward parameter | Value |
| --- | --- |
| $ {\xi _{\text{out}}} $ | -0.5 |
| $ \zeta $ | $ 1 \times 10^{-3} $ |
| $ {\xi _{\text{c}}} $ | $ 1 \times 10^{-9} $ |
| $ {\varepsilon _{\text{ec}}} $ | 73 J |
| $ {\xi _{\text{ec}}} $ | 0.02 |
| $ {\xi _{\text{bd}}} $ | -1.5 |
| $ {\xi _{\text{acc}}} $ | -1 |

Algorithm 1 PPO-PATD
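(1) Initialize the network parameters $ \theta $ and the buffer D;
(2) for each episode do
(3) Initialize the positions of the UAV, the base station and each user; set the initial UAV velocity to 0 and the total battery energy to $ {e_{\text{total}}} $;
(4) for each time slot k do
(5) Form the state $ {s^k} $ of the current time slot from the UAV position, the imprecise user positions obtained by the UAV, the base station position and the UAV velocity;
(6) Select the action $ {a^k} = {\pi _{{\theta _{\text{old}}}}}({s^k}) $ and save the action probability $ P({\pi _{{\theta _{\text{old}}}}}({a^k}|{s^k})) $;
(7) if the action $ {a^k} $ violates the acceleration constraint then
(8) $ {\boldsymbol{a}}_{\text{uav}}^k = {a_{\max }}({\boldsymbol{a}}_{\text{uav}}^k/\left\| {{\boldsymbol{a}}_{\text{uav}}^k} \right\|) $;
(9) end if
(10) The UAV executes the adjusted action;
(11) Compute the UAV velocity: $ {\boldsymbol{v}}_{\text{uav}}^{k + 1} = {\boldsymbol{v}}_{\text{uav}}^k + {\boldsymbol{a}}_{\text{uav}}^k\delta $;
(12) if $ \left\| {{\boldsymbol{v}}_{\text{uav}}^{k + 1}} \right\| > {v_{\max }} $ then
(13) $ {\boldsymbol{v}}_{\text{uav}}^{k + 1} = {v_{\max }}({\boldsymbol{v}}_{\text{uav}}^{k + 1}/\left\| {{\boldsymbol{v}}_{\text{uav}}^{k + 1}} \right\|) $;
(14) end if
(15) if the boundary constraint is violated after the action is executed then
(16) Adjust the UAV position and velocity to satisfy the boundary constraint;
(17) end if
(18) Each user moves randomly to a new position; enter the next state $ {s^{k + 1}} $ and obtain the reward $ {r^k} $;
(19) Save $ \left\{ {{s^k},{a^k},P({\pi _{{\theta _{\text{old}}}}}({a^k}|{s^k})),{r^k}} \right\} $ to D;
(20) if D holds enough data then
(21) Compute the discounted reward from its defining equation;
(22) Compute the advantage estimate from its defining equation;
(23) for each update-time $ = 1, \ldots ,{n_{\text{update}}} $ do
(24) Obtain the state value from the critic network;
(25) Compute the objective function $ L_{{\text{clip + vf + }}{{\text{S}}_{\text{e}}}}^k $ according to Eq. (34);
(26) Update the network parameters $ \theta $ by maximizing $ L_{{\text{clip + vf + }}{{\text{S}}_{\text{e}}}}^k $;
(27) end for
(28) $ {\theta _{\text{old}}} \leftarrow \theta $; clear the buffer D;
(29) end if
(30) Update the state: $ {s^k} \leftarrow {s^{k + 1}} $;
(31) end for
(32) end for

Table 2 Simulation parameters of the UAV relay communication system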
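| Parameter | Value |
| --- | --- |
| Number of users $ N $ | 10 |
| Time slot $ \delta $ | 0.2 s |
| Unit path loss $ {\beta _0} $ | -42 dB |
| Non-line-of-sight link attenuation factor $ {a_0} $ | 0.18 |
| Path loss exponent $ \alpha $ | 2.07 |
| Line-of-sight probability $ {P_{\text{LoS}}} $ | 0.95 |
| $ {\sigma ^2} $ | -95 dBm |
| Base station transmit power $ {p_{\text{bs}}} $ | 10 W |
| Maximum UAV transmit power $ p_{\text{uav},\max } $ | 2 W |
| Signal-to-noise ratio threshold $ {G_{\text{th}}} $ | 0.42 dB |
| Total bandwidth $ B $ | 100 MHz |
| UAV weight $ G $ | 40.18 N |
| Air density $ \rho $ | 1.201 kg/m³ |
| Rotor disc area $ S $ | 0.19 m² |
| Drag coefficient related to the rotor blade shape $ {C_{\text{blade}}} $ | 0.09 |
| Weight of the critic network objective $ {c_1} $ | -0.5 |
| Weight of the policy entropy $ {c_2} $ | -0.01 |
| Maximum standard deviation of the action distribution $ {\sigma _{a,\max }} $ | 0.6 |
| Minimum standard deviation of the action distribution $ {\sigma _{a,\min }} $ | 0.1 |
| Decay factor of the action distribution standard deviation $ {\partial _a} $ | 0.9995 |
| Number of simulation episodes (Episodes) | 5000 |
| Size of buffer D (Buffer-size) | 4096 |
| Number of consecutive network updates $ {n_{\text{update}}} $ | 64 |
| PPO clipping parameter $ \varepsilon $ | 0.2 |
| Discount factor for the expected reward $ \gamma $ | 0.99 |
| Policy network learning rate | 0.0001 |
| Critic network learning rate | 0.0003 |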
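
Steps (7) to (14) and step (21) of Algorithm 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the values used for a_max and v_max are placeholders because neither appears in Table 2, and the discounted reward is assumed to follow the usual backward recursion G_k = r_k + γ G_{k+1} with no bootstrap value, since its defining equation is not reproduced here. Only δ = 0.2 s and γ = 0.99 are taken from Table 2.

```python
import math


def clip_norm(vec, max_norm):
    """Rescale vec to length max_norm if it is longer; otherwise return a copy."""
    norm = math.sqrt(sum(c * c for c in vec))
    if norm > max_norm:
        return [max_norm * c / norm for c in vec]
    return list(vec)


def step_velocity(v, a, delta, a_max, v_max):
    """Sketch of steps (7)-(14): clip the acceleration, integrate over one
    time slot, then clip the resulting velocity.

    Returns (next velocity, acceleration actually applied).
    """
    a_used = clip_norm(a, a_max)                              # steps (7)-(9)
    v_next = [vi + ai * delta for vi, ai in zip(v, a_used)]   # step (11)
    return clip_norm(v_next, v_max), a_used                   # steps (12)-(14)


def discounted_returns(rewards, gamma):
    """Sketch of step (21), assuming G_k = r_k + gamma * G_{k+1}
    with G = 0 after the last stored reward (an assumption)."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for k in reversed(range(len(rewards))):
        running = rewards[k] + gamma * running
        returns[k] = running
    return returns


if __name__ == "__main__":
    # delta and gamma follow Table 2; a_max and v_max are placeholder values.
    v_next, a_used = step_velocity(
        v=[19.0, 0.0], a=[30.0, 40.0], delta=0.2, a_max=5.0, v_max=20.0
    )
    print("applied acceleration:", a_used)
    print("next velocity:", v_next)
    print("discounted returns:", discounted_returns([1.0, 1.0, 1.0], gamma=0.99))
```

Rescaling by the norm, rather than clipping each component separately, preserves the direction chosen by the policy and changes only its magnitude, which matches the form of steps (8) and (13).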