Citation: PENG Yuxiang, HAN Zhiwei, WANG Hui, HONG Weijia, LIU Zhigang. Research on Active Control Strategies for High-speed Train Pantographs Based on Reinforcement Learning-guided Model Predictive Control Algorithms[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT250343

Research on Active Control Strategies for High-speed Train Pantographs Based on Reinforcement Learning-guided Model Predictive Control Algorithms

doi: 10.11999/JEIT250343 cstr: 32379.14.JEIT250343
  • Received Date: 2025-04-30
  • Rev Recd Date: 2025-09-12
  • Available Online: 2025-09-16
Objective  The coupled dynamics of the pantograph–catenary system are a critical determinant of current collection stability and the overall operational efficiency of high-speed trains. This study proposes an active control strategy that addresses complex operating conditions to mitigate fluctuations in pantograph–catenary contact force. Conventional approaches face inherent limitations: model-free Reinforcement Learning (RL) suffers from low sample efficiency and a tendency to converge to local optima, while Model Predictive Control (MPC) is constrained by its short optimization horizon. To integrate their complementary advantages, this paper develops a Reinforcement Learning-guided Model Predictive Control (RL-GMPC) algorithm for active pantograph control. The objective is to design a controller that combines the long-term planning capability of RL with the online optimization and constraint-handling features of MPC. This hybrid framework is intended to overcome the challenges of sample inefficiency, short-sighted planning, and limited adaptability, thereby achieving improved suppression of contact force fluctuations across diverse operating speeds and environmental disturbances.

Methods  A finite element model of the pantograph–catenary system is established, in which a simplified three-mass pantograph model is integrated with nonlinear catenary components to simulate dynamic interactions. The reinforcement learning framework is designed with an adaptive latent dynamics model to capture system behavior and a robust reward estimation module to normalize multi-scale rewards. The RL-GMPC algorithm is formulated by combining MPC for short-term trajectory optimization with a terminal state value function for estimating long-term cumulative rewards, thus balancing immediate and future performance. A Markov decision process environment is constructed by defining the state variables (pantograph displacement, velocity, acceleration, and contact force), the action space (pneumatic lift force adjustment), and the reward function, which penalizes contact force deviations and abrupt control changes; illustrative sketches of these components are given after the abstract.

Results and Discussions  Experimental validation under Beijing–Shanghai line conditions demonstrates significant reductions in the contact force standard deviation: 14.29%, 18.07%, 21.52%, and 34.87% at 290, 320, 350, and 380 km/h, respectively. The RL-GMPC algorithm outperforms conventional H∞ control and Proximal Policy Optimization (PPO) by generating smoother control inputs and suppressing high-frequency oscillations. Robustness tests under 20% random wind disturbances show a 30.17% reduction in contact force variations, confirming adaptability to dynamic perturbations. Cross-validation with different catenary configurations (Beijing–Guangzhou and Beijing–Tianjin lines) reveals consistent performance improvements, with deviations reduced by 19.31%–33.62% across speed profiles. Training efficiency analysis indicates that RL-GMPC requires 57% fewer interaction samples than PPO to achieve convergence, demonstrating superior sample efficiency.

Conclusions  The RL-GMPC algorithm integrates the predictive capabilities of model-based control with the adaptive learning strengths of reinforcement learning. By dynamically optimizing pantograph posture, it enhances contact stability across varying speeds and environmental disturbances. Its demonstrated robustness to parameter variations and external perturbations highlights its practical applicability in high-speed railway systems. This study establishes a novel framework for improving pantograph–catenary interaction quality, reducing maintenance costs, and advancing the development of next-generation high-speed trains.
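As a rough Python sketch of the Methods described above, the snippet below sets up a lumped three-mass pantograph model and a reward that penalizes contact-force deviation and abrupt control changes. All names (pantograph_step, reward, f_ref) and all parameter values, matrices, and weights are illustrative assumptions introduced here; the paper's finite element catenary model and identified pantograph parameters are not reproduced.

    import numpy as np

    # Illustrative three-mass pantograph parameters (head, upper frame, lower frame).
    # These values are assumptions, not the paper's identified parameters.
    M = np.diag([6.0, 7.0, 5.0])                       # masses [kg]
    C = np.array([[ 60.0,  -60.0,    0.0],
                  [-60.0,  130.0,  -70.0],
                  [  0.0,  -70.0,  150.0]])            # damping matrix [N·s/m]
    K = np.array([[ 7.0e3, -7.0e3,    0.0],
                  [-7.0e3,  2.1e4, -1.4e4],
                  [  0.0,  -1.4e4,  1.408e4]])         # stiffness matrix [N/m]

    def pantograph_step(x, v, f_contact, u_lift, dt=1e-3):
        """One explicit-Euler step of the lumped three-mass pantograph.
        The contact force acts on the head mass and the pneumatic lift
        adjustment u_lift acts on the lower frame."""
        f_ext = np.array([-f_contact, 0.0, u_lift])
        a = np.linalg.solve(M, f_ext - C @ v - K @ x)
        return x + dt * v, v + dt * a, a

    def reward(f_contact, u_lift, u_prev, f_ref=120.0, w_f=1.0, w_u=0.05):
        """Reward penalizing contact-force deviation from a reference value
        and abrupt changes in the lift command (weights are assumed)."""
        return -(w_f * (f_contact - f_ref) ** 2 + w_u * (u_lift - u_prev) ** 2)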
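The planning step of an RL-guided MPC controller can likewise be sketched as sampling-based MPC whose short-horizon rollouts are scored by predicted rewards plus a discounted terminal value estimate, so that consequences beyond the horizon still influence the chosen action. This is a hedged sketch under the assumption of learned dynamics, reward, and value callables (and the NumPy import above); it is not the authors' exact formulation.

    def rl_gmpc_plan(s0, dynamics, reward_fn, value_fn, u_prev=0.0,
                     horizon=5, n_samples=256, gamma=0.99, sigma=5.0):
        """One control step: sample candidate lift-force sequences around the
        previous action, roll them out through the learned dynamics model,
        score each rollout by the sum of predicted rewards plus the discounted
        terminal state value, and apply only the first action of the best one."""
        actions = u_prev + sigma * np.random.randn(n_samples, horizon)
        returns = np.zeros(n_samples)
        for i in range(n_samples):
            s, ret = s0, 0.0
            for t in range(horizon):
                ret += (gamma ** t) * reward_fn(s, actions[i, t])
                s = dynamics(s, actions[i, t])         # one-step model prediction
            ret += (gamma ** horizon) * value_fn(s)    # terminal value bootstrap
            returns[i] = ret
        return actions[int(np.argmax(returns)), 0]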
