HUANG Gaoyong, SONG Jun, FANG Xuming, YAN Li, HE Rong. Multi-Agent Reinforcement Learning Method for Dual-UAV Cooperative Trajectory Optimization in Railway Inspection[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT251321

Multi-Agent Reinforcement Learning Method for Dual-UAV Cooperative Trajectory Optimization in Railway Inspection

doi: 10.11999/JEIT251321 cstr: 32379.14.JEIT251321
Funds:  The National Natural Science Foundation of China (62071393)
  • Accepted Date: 2026-03-24
  • Rev Recd Date: 2026-03-24
  • Available Online: 2026-04-19
Objective  The rapid expansion of global railway infrastructure calls for efficient inspection methods to replace conventional manual or dedicated vehicle-based approaches, which suffer from inefficiency, limited coverage, and safety risks, especially in hazardous or inaccessible areas. Unmanned Aerial Vehicles (UAVs) offer a promising alternative; however, their deployment in strictly regulated railway protection zones presents significant challenges, and single-UAV operations are often constrained by limited perspectives and data asynchrony. This paper proposes a cooperative dual-UAV inspection framework to address these challenges. The objective is to jointly plan the flight trajectories and inspection sequences of two UAVs so as to maximize task-completion quality while satisfying multiple coupled constraints, including energy consumption, obstacle avoidance, communication range, and formation synchronization.

Methods  To tackle this high-dimensional, non-convex, NP-hard optimization problem, a two-stage hierarchical framework decomposes the coupled multi-constraint model into tractable sub-problems. In the first stage, the framework decouples the problem by determining the optimal cooperative observation positions for each inspection task. A Particle Swarm Optimization (PSO) algorithm identifies the 3D coordinates for both UAVs that maximize sensor coverage and data quality. In the second stage, the joint optimization of continuous flight trajectories and the discrete task sequence is formulated as a Multi-Agent Deep Reinforcement Learning (MADRL) problem. To ensure robust convergence under strict safety constraints, a Risk-Adaptive Exploration Noise Mechanism (RAENM) is integrated into the training process. The problem is then solved by an improved Multi-Agent Twin Delayed Deep Deterministic Policy Gradient (MATD3) algorithm under a Centralized Training with Decentralized Execution (CTDE) paradigm. Each UAV acts as an independent agent whose state space comprises its kinematic data, target location, remaining energy, and obstacle proximity. The action space defines the UAV control inputs, while a multi-component reward function balances competing objectives: it rewards efficient navigation toward targets, penalizes high energy consumption, enforces safety through penalties for entering railway protection zones and approaching obstacles, and incentivizes collaboration through rewards for synchronized task execution.

Results and Discussions  The proposed framework was evaluated in simulation against state-of-the-art baseline algorithms. Benefiting from the training stability provided by the risk-adaptive mechanism, the improved MATD3 showed better convergence and scalability in complex multi-agent scenarios and consistently achieved higher cumulative rewards, particularly as the number of inspection tasks increased. In path planning, it generated more compact trajectories and consistently achieved the shortest total path lengths (e.g., 13,025 meters in a two-task scenario, outperforming the next best algorithm by approximately 4.5%). It also yielded the lowest cumulative energy consumption across all scenarios, and maintained the smallest navigation error and the smallest time difference between UAV arrivals at shared inspection points, indicating high control precision and strong spatiotemporal coordination. Consequently, by minimizing positional deviations and ensuring synchronized coverage, the improved MATD3 achieved the highest final inspection task quality scores in all evaluations.

Conclusions  This paper proposes a two-stage hierarchical framework for optimizing dual-UAV cooperative trajectories in railway infrastructure inspection. It integrates the PSO algorithm to determine optimal perceptual positions with the improved MATD3 algorithm to learn dynamic collaborative flight policies. Experiments demonstrate that the proposed solution outperforms state-of-the-art baselines across multiple key performance indicators, including path efficiency, energy conservation, collision avoidance, and inspection coverage. This work provides a foundation for deploying intelligent multi-UAV systems in critical infrastructure monitoring. Future work will focus on enhancing robustness by incorporating real-world uncertainties, such as communication delays and dynamic environmental conditions.
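As an illustration of the first stage, the following is a minimal PSO sketch that searches for a pair of 3D observation positions. It is not the authors' implementation: the cost function `toy_cost`, the standoff distance, the minimum separation, the search bounds, and all PSO hyperparameters are hypothetical placeholders, since the abstract does not specify the coverage or data-quality model.

```python
import math
import random


def toy_cost(x, target=(0.0, 0.0, 0.0), standoff=20.0, min_sep=10.0):
    """Hypothetical stand-in for an observation-quality objective (lower is better).

    x holds two UAV positions: [x1, y1, z1, x2, y2, z2].
    """
    p1, p2 = x[:3], x[3:]
    d1 = math.dist(p1, target)          # UAV 1 distance to the inspection target
    d2 = math.dist(p2, target)          # UAV 2 distance to the inspection target
    sep = math.dist(p1, p2)             # inter-UAV separation
    penalty = max(0.0, min_sep - sep) ** 2
    return (d1 - standoff) ** 2 + (d2 - standoff) ** 2 + penalty


def pso(cost, lo, hi, n_particles=30, n_iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Plain global-best PSO with position clamping to [lo, hi]."""
    rng = random.Random(seed)
    dim = len(lo)
    pos = [[rng.uniform(lo[d], hi[d]) for d in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi[d], max(lo[d], pos[i][d] + vel[i][d]))
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    return gbest, gbest_cost


if __name__ == "__main__":
    # Assumed bounds: x, y in [-50, 50] m, altitude in [5, 50] m, for each UAV.
    lo = [-50.0, -50.0, 5.0] * 2
    hi = [50.0, 50.0, 50.0] * 2
    best, best_cost = pso(toy_cost, lo, hi)
    print("UAV 1:", best[:3], "UAV 2:", best[3:], "cost:", best_cost)
```

In a realistic setting, `toy_cost` would be replaced by a model of sensor coverage and data quality, and the bounds would encode the railway protection zone and altitude limits.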
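The multi-component reward described in the Methods can be sketched as a per-step function for one agent. This is only one plausible shape: the abstract names the components (progress toward the target, energy, protection-zone and obstacle penalties, synchronization) but gives no formulas, so every weight, threshold, and argument name below is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # All weights are hypothetical; the abstract does not report values.
    progress: float = 1.0
    energy: float = 0.01
    zone: float = 10.0
    obstacle: float = 5.0
    sync: float = 2.0


def step_reward(prev_dist, dist, energy_used, in_protection_zone,
                obstacle_dist, arrival_gap=None, w=RewardWeights(),
                safe_dist=5.0, sync_tol=1.0):
    """Per-step reward for one UAV agent (illustrative only).

    prev_dist, dist    : distance to the current target before/after the step (m)
    energy_used        : energy consumed during the step
    in_protection_zone : True if the UAV is inside the railway protection zone
    obstacle_dist      : distance to the nearest obstacle (m)
    arrival_gap        : |t_arrival_1 - t_arrival_2| at a shared inspection
                         point (s), or None if not both UAVs have arrived
    """
    r = w.progress * (prev_dist - dist)        # reward moving toward the target
    r -= w.energy * energy_used                # penalize energy consumption
    if in_protection_zone:
        r -= w.zone                            # penalize entering the protection zone
    if obstacle_dist < safe_dist:
        # Penalty grows linearly as the UAV gets closer to the obstacle.
        r -= w.obstacle * (safe_dist - obstacle_dist) / safe_dist
    if arrival_gap is not None and arrival_gap <= sync_tol:
        r += w.sync                            # reward synchronized arrival
    return r
```

Separating the weights into a small dataclass makes it easy to tune the trade-off between safety and efficiency without touching the reward logic.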
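For readers unfamiliar with the TD3 family, the two ingredients that distinguish it from plain deterministic policy gradients are the clipped double-Q target and target policy smoothing. The scalar sketch below shows both for a single transition; in a multi-agent CTDE setting the two critic values would come from centralized critics evaluated on joint observations and actions. The abstract does not describe the authors' modifications, so this shows only the standard TD3 target, with hypothetical default values.

```python
def smoothed_target_action(action, noise, noise_clip=0.5, low=-1.0, high=1.0):
    """Target policy smoothing: add clipped noise, then clip to the action range."""
    eps = max(-noise_clip, min(noise_clip, noise))
    return max(low, min(high, action + eps))


def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Clipped double-Q target: bootstrap from the smaller of the two target critics."""
    not_done = 0.0 if done else 1.0
    return reward + gamma * not_done * min(q1_next, q2_next)
```

Taking the minimum of the two critic estimates counteracts the overestimation bias that a single critic tends to accumulate.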

    Figures(9)  / Tables(2)
