WANG Ao, LI Dapeng, XU Yifan, FAN Bingyang, HAN Guang, ZHAO Haitao. Prior-guided temporal fusion multi-UAV cooperative obstacle avoidance route planning[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT251231

Prior-guided temporal fusion multi-UAV cooperative obstacle avoidance route planning

doi: 10.11999/JEIT251231 cstr: 32379.14.JEIT251231
Funds: The Joint Funds of the National Natural Science Foundation of China under Grant U24B201873, and the International Science and Technology Cooperation Program of Jiangsu Province under Grant BZ2025021
  • Accepted Date: 2026-03-24
  • Rev Recd Date: 2026-03-24
  • Available Online: 2026-04-19
Objective  Traditional multi-agent reinforcement learning methods for multi-UAV navigation in cluttered 3D environments often suffer from slow convergence, weak coordination, and limited global awareness under partial observability. To address these issues, this paper proposes a prior-guided cooperative route-planning framework, termed Prior-Guided LSTM-QMIX (PGL-QMIX), which incorporates environment-derived heuristic scores instead of explicit reference trajectories. The objective is to reduce route length and collision risk while preserving real-time decision-making capability.

Methods  The multi-UAV route-planning task is formulated as a partially observable Markov game. Static environmental information, including obstacle distribution, free-space structure, and goal locations, is mapped into a heuristic scoring field, and the locally visible part of this field is fused with each UAV's local observation. An individual-level LSTM captures temporal dependencies in local perception, while a system-level LSTM-based mixing network dynamically generates the mixing weights and biases for value decomposition, enabling coordinated joint value estimation. Potential-based reward shaping is further introduced to improve training stability.

Results and Discussions  Simulation results in 3D grid environments show that PGL-QMIX converges faster and more stably than QMIX, VDN, and MAPPO. Compared with the QMIX baseline, it reduces the average route length by 8.8%, 12.3%, and 16.1%, improves convergence speed by 20.5%, 26.6%, and 38.1%, and increases the steady-state task success rate by 5.22, 14.99, and 37.25 percentage points in the three scenarios, respectively. The generated trajectories are also shorter and more efficient across different map sizes.

Conclusions  PGL-QMIX improves coordination, safety, and route efficiency for multi-UAV cooperative obstacle avoidance in cluttered 3D environments. By combining heuristic prior guidance, recurrent temporal fusion, and value decomposition, the proposed method achieves faster convergence, higher success rates, and better generalization than existing baselines. Future work will consider realistic UAV dynamic constraints and communication-aware cooperative obstacle avoidance.
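The Methods section describes mapping static environment information (obstacles and goal locations) into a heuristic scoring field whose locally visible portion is fused with each UAV's observation. One plausible construction of such a field, sketched below under the assumption that the score decreases with BFS distance to the goal over free cells of a 3D occupancy grid (the paper's exact scoring rule is not reproduced here):

```python
from collections import deque

import numpy as np


def heuristic_field(occupancy, goal):
    """Build a static heuristic scoring field over a 3D occupancy grid.

    occupancy: 3D array, 0 = free cell, 1 = obstacle.
    goal: (x, y, z) index of the goal cell.

    Runs a BFS from the goal over free cells, then converts the hop
    distance d to a score 1 / (1 + d) in (0, 1]; obstacle and
    unreachable cells score 0. This is an illustrative choice, not the
    paper's published definition.
    """
    dist = np.full(occupancy.shape, np.inf)
    dist[goal] = 0.0
    queue = deque([goal])
    moves = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    while queue:
        x, y, z = queue.popleft()
        for dx, dy, dz in moves:
            n = (x + dx, y + dy, z + dz)
            inside = all(0 <= n[i] < occupancy.shape[i] for i in range(3))
            if inside and occupancy[n] == 0 and dist[n] == np.inf:
                dist[n] = dist[x, y, z] + 1
                queue.append(n)
    return np.where(np.isfinite(dist), 1.0 / (1.0 + dist), 0.0)
```

Each UAV would then read out only the window of this field around its own position and concatenate it with its local observation, which is how the fused input stays compatible with partial observability.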
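The system-level mixing network follows the QMIX family: per-agent Q-values are combined with non-negative weights so that the joint value is monotone in each agent's value, which lets decentralized greedy actions stay consistent with the joint greedy action. A minimal NumPy sketch of that monotonic mixing step, with plain arrays standing in for the weights and biases that PGL-QMIX would generate from a system-level LSTM (shapes here are hypothetical, and ReLU stands in for the ELU commonly used in QMIX):

```python
import numpy as np


def mix_q_values(agent_qs, w1, b1, w2, b2):
    """QMIX-style monotonic value mixing.

    agent_qs: (n_agents,) per-agent Q-values.
    w1, b1, w2, b2: mixing parameters; taking |w| enforces non-negative
    weights, so dQ_tot/dQ_i >= 0 for every agent i.
    """
    hidden = np.maximum(0.0, agent_qs @ np.abs(w1) + b1)  # ReLU layer
    return float(hidden @ np.abs(w2) + b2)                # scalar Q_tot
```

The point of generating `w1, b1, w2, b2` from a recurrent network conditioned on the global state, rather than fixing them, is that the mixing can adapt over time while the absolute-value trick still guarantees monotonicity at every step.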
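The abstract also states that potential-based reward shaping is used to stabilize training. The general form of that technique is standard: a shaping term F(s, s') = γΦ(s') − Φ(s) is added to the environment reward, which densifies the learning signal without changing the set of optimal policies. A minimal sketch, where using the heuristic score of a UAV's current cell as the potential Φ is an assumption for illustration, not the paper's stated definition:

```python
def shaped_reward(env_reward, phi_s, phi_s_next, gamma=0.99):
    """Potential-based reward shaping.

    Adds F(s, s') = gamma * Phi(s') - Phi(s) to the environment reward.
    For any potential function Phi this preserves optimal policies.
    Assumed (illustrative) choice here: Phi(s) = heuristic score of the
    UAV's current grid cell, so moving toward the goal yields a
    positive shaping bonus.
    """
    return env_reward + gamma * phi_s_next - phi_s
```

For example, a step whose heuristic score rises from 0.5 to 1.0 receives a positive shaping bonus even if the sparse environment reward is zero, which is what accelerates convergence in long-horizon navigation tasks.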

    Figures(10)  / Tables(5)
