WANG Ao, LI Dapeng, XU Yifan, FAN Bingyang, HAN Guang, ZHAO Haitao. Prior-guided temporal fusion multi-UAV cooperative obstacle avoidance route planning[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT251231

Prior-guided temporal fusion multi-UAV cooperative obstacle avoidance route planning

doi: 10.11999/JEIT251231 cstr: 32379.14.JEIT251231
Funds: The Joint Funds of the National Natural Science Foundation of China under Grant U24B201873, and the International Science and Technology Cooperation Program of Jiangsu Province under Grant BZ2025021
  • Accepted Date: 2026-03-24
  • Rev Recd Date: 2026-03-24
  • Available Online: 2026-04-19
Objective  Traditional multi-agent reinforcement learning methods for multi-UAV navigation in cluttered 3D environments often suffer from slow convergence, weak coordination, and limited global awareness under partial observability. To address these issues, this paper proposes a prior-guided cooperative route-planning framework, termed Prior-Guided LSTM-QMIX (PGL-QMIX), which incorporates environment-derived heuristic scores instead of explicit reference trajectories. The objective is to reduce route length and collision risk while preserving real-time decision-making capability.

Methods  The multi-UAV route-planning task is formulated as a partially observable Markov game. Static environmental information, including obstacle distribution, free-space structure, and goal locations, is mapped into a heuristic scoring field, and the locally visible part of this field is fused with each UAV's local observation. An individual-level LSTM captures temporal dependencies in local perception, while a system-level LSTM-based mixing network dynamically generates the mixing weights and biases for value decomposition, enabling coordinated joint value estimation. Potential-based reward shaping is further introduced to improve training stability.

Results and Discussions  Simulation results in 3D grid environments show that PGL-QMIX converges faster and more stably than QMIX, VDN, and MAPPO. Compared with the QMIX baseline, it reduces the average route length by 8.8%, 12.3%, and 16.1%, improves convergence speed by 20.5%, 26.6%, and 38.1%, and increases the steady-state task success rate by 5.22, 14.99, and 37.25 percentage points in the three scenarios, respectively. The generated trajectories are also shorter and more efficient across different map sizes.

Conclusions  PGL-QMIX improves coordination, safety, and route efficiency for multi-UAV cooperative obstacle avoidance in cluttered 3D environments. By combining heuristic prior guidance, recurrent temporal fusion, and value decomposition, the proposed method achieves faster convergence, higher success rates, and better generalization than existing baselines. Future work will consider realistic UAV dynamic constraints and communication-aware cooperative obstacle avoidance.
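The Methods section describes mapping static environment information (obstacles and goal locations) into a heuristic scoring field whose locally visible portion is fused with each UAV's observation. One plausible construction of such a field, sketched below under the assumption that the score decreases with BFS distance to the goal over free cells of a 3D occupancy grid (the paper's exact scoring rule is not reproduced here):

```python
from collections import deque

import numpy as np


def heuristic_field(occupancy, goal):
    """Build a static heuristic scoring field over a 3D occupancy grid.

    occupancy: 3D array, 0 = free cell, 1 = obstacle.
    goal: (x, y, z) index of the goal cell.

    Runs a BFS from the goal over free cells, then converts the hop
    distance d to a score 1 / (1 + d) in (0, 1]; obstacle and
    unreachable cells score 0. This is an illustrative choice, not the
    paper's published definition.
    """
    dist = np.full(occupancy.shape, np.inf)
    dist[goal] = 0.0
    queue = deque([goal])
    moves = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    while queue:
        x, y, z = queue.popleft()
        for dx, dy, dz in moves:
            n = (x + dx, y + dy, z + dz)
            inside = all(0 <= n[i] < occupancy.shape[i] for i in range(3))
            if inside and occupancy[n] == 0 and dist[n] == np.inf:
                dist[n] = dist[x, y, z] + 1
                queue.append(n)
    return np.where(np.isfinite(dist), 1.0 / (1.0 + dist), 0.0)
```

Each UAV would then read out only the window of this field around its own position and concatenate it with its local observation, which is how the fused input stays compatible with partial observability.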
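The system-level mixing network follows the QMIX family: per-agent Q-values are combined with non-negative weights so that the joint value is monotone in each agent's value, which lets decentralized greedy actions stay consistent with the joint greedy action. A minimal NumPy sketch of that monotonic mixing step, with plain arrays standing in for the weights and biases that PGL-QMIX would generate from a system-level LSTM (shapes here are hypothetical, and ReLU stands in for the ELU commonly used in QMIX):

```python
import numpy as np


def mix_q_values(agent_qs, w1, b1, w2, b2):
    """QMIX-style monotonic value mixing.

    agent_qs: (n_agents,) per-agent Q-values.
    w1, b1, w2, b2: mixing parameters; taking |w| enforces non-negative
    weights, so dQ_tot/dQ_i >= 0 for every agent i.
    """
    hidden = np.maximum(0.0, agent_qs @ np.abs(w1) + b1)  # ReLU layer
    return float(hidden @ np.abs(w2) + b2)                # scalar Q_tot
```

The point of generating `w1, b1, w2, b2` from a recurrent network conditioned on the global state, rather than fixing them, is that the mixing can adapt over time while the absolute-value trick still guarantees monotonicity at every step.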
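The abstract also states that potential-based reward shaping is used to stabilize training. The general form of that technique is standard: a shaping term F(s, s') = γΦ(s') − Φ(s) is added to the environment reward, which densifies the learning signal without changing the set of optimal policies. A minimal sketch, where using the heuristic score of a UAV's current cell as the potential Φ is an assumption for illustration, not the paper's stated definition:

```python
def shaped_reward(env_reward, phi_s, phi_s_next, gamma=0.99):
    """Potential-based reward shaping.

    Adds F(s, s') = gamma * Phi(s') - Phi(s) to the environment reward.
    For any potential function Phi this preserves optimal policies.
    Assumed (illustrative) choice here: Phi(s) = heuristic score of the
    UAV's current grid cell, so moving toward the goal yields a
    positive shaping bonus.
    """
    return env_reward + gamma * phi_s_next - phi_s
```

For example, a step whose heuristic score rises from 0.5 to 1.0 receives a positive shaping bonus even if the sparse environment reward is zero, which is what accelerates convergence in long-horizon navigation tasks.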

    Figures(10)  / Tables(5)
