Multi-agent Reinforcement Learning Method for Trajectory Optimization in Dual-UAV Cooperative Railway Inspection

HUANG Gaoyong; SONG Jun; FANG Xuming; YAN Li; HE Rong

doi:10.11999/JEIT251321

Volume 48 Issue 5

May 2026

HUANG Gaoyong, SONG Jun, FANG Xuming, YAN Li, HE Rong. Multi-agent Reinforcement Learning Method for Trajectory Optimization in Dual-UAV Cooperative Railway Inspection[J]. Journal of Electronics & Information Technology, 2026, 48(5): 1936-1947. doi: 10.11999/JEIT251321

doi: 10.11999/JEIT251321 cstr: 32379.14.JEIT251321

HUANG Gaoyong^{1, 2},
SONG Jun¹,
FANG Xuming¹,
YAN Li^{1, 2},
HE Rong¹

1. Sichuan Province Key Laboratory of Information Coding and Transmission, Southwest Jiaotong University, Chengdu 610031, China
2. Sichuan Provincial Engineering Research Center for Train Operation Control Technology, Chengdu 611756, China

Funds: The National Natural Science Foundation of China (62071393)

Received Date: 2025-12-15
Revised Date: 2026-03-21
Accepted Date: 2026-03-24

Available Online: 2026-04-19

Publish Date: 2026-05-30

Abstract

Objective: Conventional railway inspection methods, including manual inspection and dedicated inspection vehicles, suffer from low efficiency, limited coverage, and safety risks, especially in hazardous or inaccessible areas. Unmanned Aerial Vehicles (UAVs) offer a promising alternative, but deployment in strictly regulated railway protection zones remains challenging. In particular, single-UAV inspection is limited by restricted viewpoints, coverage blind spots, and poor data synchronization. To address these issues, this paper proposes a dual-UAV cooperative railway inspection framework. The objective is to jointly optimize the flight trajectories and inspection task sequence of the two UAVs to maximize inspection task quality under coupled constraints on energy consumption, obstacle avoidance, communication rate, and cooperative synchronization.

Methods: To solve this high-dimensional, non-convex, NP-hard problem, a two-stage hierarchical framework is proposed. In the first stage, the cooperative observation positions for each inspection task are determined: Particle Swarm Optimization (PSO) is used to search for the three-dimensional coordinates of the two UAVs that give the best coverage and inspection quality. In the second stage, continuous trajectory optimization is formulated as a Multi-Agent Deep Reinforcement Learning (MADRL) problem. To improve convergence stability under strong safety constraints, a Risk-Adaptive Exploration Noise Mechanism (RAENM) is incorporated into the training process. The problem is then solved by an improved Multi-Agent Twin Delayed Deep Deterministic Policy Gradient (MATD3) algorithm under the Centralized Training with Decentralized Execution (CTDE) paradigm. Each UAV is modeled as an independent agent. Its state includes kinematic information, target position, remaining energy, and obstacle distance. Its action space defines the flight control variables.
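
The first-stage position search can be illustrated with a generic PSO loop. The sketch below is not the authors' implementation: the abstract does not give the inspection-quality objective, so `toy_cost`, the standoff distance, the minimum separation, the bounds, and all PSO hyperparameters are hypothetical placeholders chosen only to make the example run. Each particle is a 6-D vector holding the 3-D coordinates of both UAVs.

```python
import math
import random

def toy_cost(x, target=(0.0, 0.0, 0.0), standoff=30.0, min_sep=20.0):
    """Hypothetical cost for x = (x1, y1, z1, x2, y2, z2); lower is better."""
    p1, p2 = x[:3], x[3:]
    cost = 0.0
    for p in (p1, p2):
        # Assumed preference: each UAV views the target from a fixed range.
        cost += (math.dist(p, target) - standoff) ** 2
    sep = math.dist(p1, p2)
    if sep < min_sep:
        # Assumed penalty: the two viewpoints should not coincide.
        cost += 10.0 * (min_sep - sep) ** 2
    return cost

def pso(cost, lower, upper, n_particles=30, n_iters=200,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Standard global-best PSO with box constraints enforced by clamping."""
    rng = random.Random(seed)
    dim = len(lower)
    pos = [[rng.uniform(lower[d], upper[d]) for d in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lower[d]), upper[d])
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    return gbest, gbest_cost

# Assumed search box: +/-100 m horizontally, 10 to 60 m altitude, for each UAV.
lower = [-100.0, -100.0, 10.0] * 2
upper = [100.0, 100.0, 60.0] * 2
best, best_cost = pso(toy_cost, lower, upper)
print("UAV 1:", best[:3], "UAV 2:", best[3:], "cost:", best_cost)
```

In a real setting the cost would be replaced by the paper's inspection-quality model, and constraints such as the railway protection zone would need to be encoded either in the bounds or as penalty terms.
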
A composite reward function is designed to balance multiple objectives, including target approaching, energy saving, obstacle avoidance, railway-protection-zone compliance, and synchronized cooperative arrival.

Results and Discussions: The proposed framework is evaluated through simulations against several baseline algorithms. The results show that the improved MATD3 method achieves faster and more stable convergence, especially as the number of inspection tasks increases. In path planning, it generates more compact trajectories and the shortest total path length. For example, in the two-task scenario, the total path length is reduced to 13,025 m, about 4.5% shorter than that of the next best method. In addition, the proposed method achieves the lowest cumulative energy consumption in all tested scenarios. It also yields the smallest navigation error and the smallest arrival-time difference between the two UAVs at shared inspection points, indicating higher control accuracy and better spatiotemporal coordination. By reducing position deviation and improving synchronization, the proposed method achieves the highest inspection task quality in all evaluation settings.

Conclusions: This paper proposes a two-stage hierarchical framework for dual-UAV cooperative trajectory optimization in railway inspection. The framework combines PSO-based cooperative observation position optimization with improved MATD3-based trajectory learning. Simulation results show that the proposed method outperforms the baseline methods in path efficiency, energy saving, cooperative synchronization, and inspection task quality. This study provides support for the deployment of intelligent multi-UAV systems in railway infrastructure inspection. Future work will consider more realistic factors, including communication uncertainty and dynamic environments.
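
The composite reward mentioned in the Methods can be pictured as a weighted sum of per-step terms, one for each listed objective. The sketch below is only an illustration of that structure: the abstract does not give the actual terms or weights, so the `StepInfo` fields, the weight values, the safety distance, and the exponential synchronization bonus are all assumptions.

```python
import math
from dataclasses import dataclass

@dataclass
class StepInfo:
    """Hypothetical per-step quantities for one UAV agent."""
    prev_dist_to_target: float    # m, distance before the step
    dist_to_target: float         # m, distance after the step
    energy_used: float            # J, energy spent during the step
    obstacle_dist: float          # m, distance to the nearest obstacle
    inside_protection_zone: bool  # True if the UAV violates the protected zone
    arrived: bool                 # True if the inspection point was reached
    arrival_time_gap: float       # s, |t_uav1 - t_uav2| at the shared point

def composite_reward(s, w_approach=1.0, w_energy=0.01, w_obstacle=5.0,
                     w_zone=10.0, sync_decay=0.5, safe_dist=10.0):
    """Weighted sum of the five objectives; all weights are assumed values."""
    # Target approaching: reward progress made toward the target.
    r = w_approach * (s.prev_dist_to_target - s.dist_to_target)
    # Energy saving: penalize energy spent.
    r -= w_energy * s.energy_used
    # Obstacle avoidance: penalty grows linearly inside the safety distance.
    if s.obstacle_dist < safe_dist:
        r -= w_obstacle * (safe_dist - s.obstacle_dist) / safe_dist
    # Protection-zone compliance: flat penalty for a violation.
    if s.inside_protection_zone:
        r -= w_zone
    # Synchronized arrival: bonus in (0, 1] that decays with the time gap.
    if s.arrived:
        r += math.exp(-sync_decay * s.arrival_time_gap)
    return r

step = StepInfo(prev_dist_to_target=100.0, dist_to_target=95.0,
                energy_used=50.0, obstacle_dist=20.0,
                inside_protection_zone=False, arrived=False,
                arrival_time_gap=0.0)
print(composite_reward(step))
```

Shaping terms of this kind trade off against each other through the weights, which is why such rewards usually need tuning per scenario.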
