Collaborative Multi-agent Trajectory Optimization for Unmanned Aerial Vehicles Under Low-altitude Mixed-obstacle Airspace
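Summary: In low-altitude intelligent networks, the sharp growth in user numbers and the increasing complexity of the airspace make it difficult for Unmanned Aerial Vehicles (UAVs) carrying mobile base stations to balance data transmission performance against flight safety when serving multiple users. This paper therefore constructs a UAV obstacle-avoidance communication system model based on a collision probability map. To maximize UAV communication energy efficiency under low-altitude mixed obstacles, it proposes a Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm with user scheduling optimization, which achieves collaborative multi-UAV trajectory planning. Simulation analysis shows that, in mixed-obstacle airspace, the proposed strategy effectively improves the energy efficiency of the UAV system while achieving an approximately 8-fold reduction in average collision probability compared with traditional obstacle-avoidance methods.
Abstract: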
Objective: The rapid expansion of the low-altitude economy has driven the development of low-altitude intelligent networks as a key component of the Internet of Things (IoT). In such networks, the growing number of users challenges the ability of Unmanned Aerial Vehicles (UAVs) carrying mobile base stations to sustain data transmission quality. Efficient access technologies are therefore essential to ensure service quality as user density increases. At the same time, the growing complexity of airspace elevates the risk of in-flight collisions, necessitating integrated strategies to improve both communication efficiency and flight safety. This study proposes a collaborative trajectory planning framework for multiple UAVs operating in low-altitude, mixed-obstacle environments. The approach incorporates Non-Orthogonal Multiple Access (NOMA) to increase spectral efficiency and communication capacity, together with a discrete collision probability map for obstacle avoidance. A novel multi-UAV communication and obstacle-avoidance model is developed, and an optimized Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm is introduced to schedule users and plan UAV trajectories. The objective is to maximize communication energy efficiency while ensuring reliable obstacle avoidance. The proposed method effectively enhances multi-UAV coordination in complex airspace and improves the overall communication performance.
Methods: To ensure energy efficiency and reliable obstacle avoidance for multiple UAVs operating in low-altitude, mixed-obstacle environments, a multi-user communication system model is proposed, incorporating collaborative multi-UAV trajectory planning. This model comprises two key components. First, a collision probability model based on discrete obstacles extends the conventional low-altitude obstacle representation into a probabilistic collision map.
Second, a multi-user communication framework is constructed using fractional-order transmission energy allocation under NOMA, integrating both UAV communication and flight energy models within a unified UAV energy efficiency framework. Based on this model, the problem of maximizing energy efficiency is formulated, accounting for coordinated UAV communication and obstacle avoidance. To solve this problem, an integrated strategy is proposed. A multi-agent direction-preprocessing K-means++ algorithm is first used to enhance convergence during user scheduling optimization. Based on the optimized user allocation and environmental awareness, a state space is defined together with a 3D action space consisting of 27 directional movement options. The MADDPG algorithm is then trained by alternately updating Actor and Critic networks over the defined state-action space. Once trained, the network outputs trajectory planning policies that achieve both effective obstacle avoidance and optimized communication energy efficiency.
Results and Discussions: The proposed trajectory planning framework applies a user scheduling algorithm that dynamically allocates users at each time step, incorporating the positions of other UAVs, obstacles, and associated collision probabilities as environmental inputs. The MADDPG network is trained using a reward function defined by energy efficiency and collision probability, enabling the generation of trajectory planning solutions that maintain both communication performance and flight safety for multiple UAVs. Simulation results show that the planned trajectories (depicted by red, yellow, and blue lines) are shorter on average than those obtained using the traditional safety radius method (Fig. 3). Compared with trajectory planning approaches based on varying safety radius values, the proposed method achieves an approximately 8-fold reduction in average collision probability (Fig. 5).
In terms of communication performance, the NOMA-based approach significantly outperforms Frequency-Division Multiple Access (FDMA). Furthermore, the proposed algorithm, incorporating multi-agent direction preprocessing optimization, yields an average improvement of 10.81% in communication energy efficiency over the non-optimized variant, as evaluated by the mean across multiple iterations (Fig. 6). The network also demonstrates rapid environmental adaptation within 20 training iterations and exhibits superior generalization compared to conventional reward-based reinforcement learning algorithms (Fig. 4).
Conclusions: This paper presents a multi-UAV collaborative communication and trajectory planning solution that ensures both flight safety and communication performance in low-altitude mixed-obstacle airspace during multi-user operations. A UAV collaborative NOMA communication system model, based on a collision probability map, is developed. An optimized MADDPG algorithm for user scheduling is introduced to address the multi-UAV trajectory planning problem, aiming to maximize communication energy efficiency. The algorithm comprises two key components: first, a user scheduling algorithm based on K-means++ that establishes the user-UAV connection relationships; second, the MADDPG algorithm, which generates UAV trajectory planning solutions under dynamic environmental conditions and the established connection relationships. Simulation results reveal the following key findings: (1) the optimized MADDPG algorithm enhances multi-UAV communication while ensuring flight safety; (2) the proposed algorithm significantly improves obstacle avoidance performance, reducing collision probability approximately 8-fold compared to traditional methods; (3) the inclusion of multi-agent direction preprocessing improves communication energy efficiency by an average of 10.81%. However, this study only considers a low-altitude environment with mixed static obstacles.
In real-world scenarios, obstacles may move or intrude dynamically, and future work should explore the impact of dynamic obstacles on trajectory planning.
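The 3D action space with 27 directional movement options mentioned in the Methods can be illustrated with a minimal sketch. It assumes that the 27 options are all combinations of a step of -1, 0, or +1 along each axis (which includes a hover action), and that the airspace is the 200 m cube suggested by the simulation parameters; the names `ACTIONS`, `apply_action`, `step`, and `bounds` are hypothetical and do not come from the paper.

```python
from itertools import product

# Assumption: every combination of -1, 0, +1 along x, y, z gives 3^3 = 27
# options, including the all-zero "hover" action.
ACTIONS = list(product((-1, 0, 1), repeat=3))


def apply_action(position, action_index, step=1.0, bounds=(200.0, 200.0, 200.0)):
    """Move one step in the chosen direction, clipped to the airspace box."""
    direction = ACTIONS[action_index]
    return tuple(
        min(max(p + step * d, 0.0), b)
        for p, d, b in zip(position, direction, bounds)
    )


if __name__ == "__main__":
    print(len(ACTIONS))
    print(apply_action((100.0, 100.0, 100.0), ACTIONS.index((1, 0, -1)), step=2.0))
```

A discrete index into `ACTIONS` is only one possible encoding; how the paper maps the continuous output of the MADDPG Actor network to these 27 options is not stated here.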
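Algorithm 1 User scheduling optimization
Input: $O_{l,t}$, $U$
(1) For each $i$ in $\mathcal{L}$:
  (2) Randomly select $O_{i,t}$ as a cluster center
  (3) Using $d_{i,l,t}^{-1}$ as the probability, select the $O_{j,t}$ with the highest probability as a cluster center
  (4) Repeat until $U$ cluster centers have been determined
(5) While the cluster centers keep changing, repeat:
  (6) Assign each $O_{l,t}$ $(l = 1,2,\cdots,L)$ to the class of its nearest cluster center
  (7) Take the mean of each class as the new cluster center
(8) Compare the cluster centers with the preset directions $O_{u0}$ and, according to distance, assign each UAV to serve the users of the corresponding cluster center
(9) Using $d_\rho$ as the criterion, set $\alpha_{u,l,t}$ to 0 for every $O_{l,t}$ whose distance $d_{u,l,t}$ is larger
Output: $\alpha_{u,l,t}$

Algorithm 2 Multi-agent deep deterministic policy gradient algorithm
Input: $S, A, R, \mathcal{L}$
(1) Initialize the Actor-Critic network parameters
(2) Repeat the update until the iteration limit is reached:
  (3) Randomly sample a tuple $S_t, A_t, R_t, S_{t+1}$ from the experience pool as input
  (4) Feed $S_{t+1}$ into the Actor network to obtain the predicted action $\hat a$
  (5) Feed $\hat a, S_{t+1}$ into the Critic network to obtain the value function of the next state, $Q_{\text{next}}(s)$, and compute the estimate $\hat y$ of the value function of the current state $S_t$
  (6) Compute $L(\varepsilon)$ from $\hat y$ and update the Critic network parameters
  (7) Feed $S_t$ into the Actor network to obtain the predicted action $a'$
  (8) Feed $a', S_t$ into the Critic network to obtain the value function $y'$ of that state-action pair
  (9) Compute the mean value $J(\varepsilon)$ and update the Actor network parameters
(10) With the trained networks, input the initial state and action to obtain the optimal policy $\pi_*$
Output: $\pi_*$

Table 1 Simulation parameters
| Parameter | Value | Parameter | Value |
| --- | --- | --- | --- |
| $[O_x, O_y, O_z]$ (m) | [200, 200, 200] | $d_0$ | 0.6 |
| $\eta_{\text{LoS}}$ | 1.173 | $\rho$ (kg/m$^3$) | 1.225 |
| $\eta_{\text{NLoS}}$ | 9.974 | $p_{\text{s}}$ | 0.05 |
| $\alpha_0$ | 0.11 | $A$ (m$^2$) | 0.503 |
| $\beta_0$ | 12.08 | $d_\rho$ (m) | 3 |
| $z_0$ (m) | 4 | $\text{b}$ | 1.4 |
| $\text{p}_0$ | 79.856 | $\zeta$ | 0.6 |
| $\text{p}_1$ | 0.15496 | $\tau$ | 0.8 |
| $U_{\text{tip}}$ (m/s) | 120 | $\alpha_{\text{p}}$ | 0.4 |

Table 2 Flight distances in the trajectory simulation (m)
| UAV | Trajectory length $T$, proposed algorithm | Trajectory length $T$, safety-radius algorithm |
| --- | --- | --- |
| UAV1 | 100 | 112 |
| UAV2 | 80 | 88 |
| UAV3 | 84 | 108 |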
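The user scheduling steps listed above (K-means++ clustering of user positions, matching clusters to UAVs by distance to their preset directions, and thresholding by $d_\rho$) can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's implementation: the seeding uses standard k-means++ sampling proportional to squared distance rather than the inverse-distance rule of step (3), the cluster-to-UAV matching is a simple greedy pass, distances are measured from each UAV's preset point, and all names (`schedule_users`, `uav_anchors`, `kmeanspp_seed`, `lloyd`) are hypothetical.

```python
import math
import random


def kmeanspp_seed(points, k, rng):
    """Standard k-means++ seeding (assumes at least k distinct points)."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared distance from each point to its nearest chosen center.
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


def lloyd(points, centers, max_iter=100):
    """Alternate assignment and mean update until the centers stop changing."""
    labels = []
    for _ in range(max_iter):
        labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centers.append(old)  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


def schedule_users(users, uav_anchors, d_rho, seed=0):
    """Return alpha[u][l] = 1 if UAV u serves user l, else 0."""
    rng = random.Random(seed)
    k = len(uav_anchors)
    centers, labels = lloyd(users, kmeanspp_seed(users, k, rng))

    # Assumption: greedy matching, each UAV takes the nearest free cluster.
    free = set(range(k))
    cluster_of_uav = []
    for anchor in uav_anchors:
        best = min(sorted(free), key=lambda j: math.dist(anchor, centers[j]))
        free.remove(best)
        cluster_of_uav.append(best)

    # Users farther than d_rho from their UAV's preset point are dropped.
    alpha = [[0] * len(users) for _ in range(k)]
    for u, anchor in enumerate(uav_anchors):
        for l, (user, lab) in enumerate(zip(users, labels)):
            if lab == cluster_of_uav[u] and math.dist(anchor, user) <= d_rho:
                alpha[u][l] = 1
    return alpha
```

For example, with two well-separated user groups and two UAVs, `schedule_users(users, [(10.0, 10.0), (0.0, 0.0)], d_rho=5.0)` assigns each group to the UAV whose preset point is closer; a greedy match is not guaranteed to be globally optimal, so an assignment solver could be substituted if that matters.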