Citation: PENG Yuxiang, HAN Zhiwei, WANG Hui, HONG Weijia, LIU Zhigang. Research on Active Control Strategies for High-speed Train Pantographs Based on Reinforcement Learning-guided Model Predictive Control Algorithms[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT250343

Research on Active Control Strategies for High-speed Train Pantographs Based on Reinforcement Learning-guided Model Predictive Control Algorithms

doi: 10.11999/JEIT250343 cstr: 32379.14.JEIT250343
  • Received Date: 2025-04-30
  • Rev Recd Date: 2025-09-12
  • Available Online: 2025-09-16
Objective  The coupled dynamics of the pantograph–catenary system are a critical determinant of current collection stability and the overall operational efficiency of high-speed trains. This study proposes an active control strategy that addresses complex operating conditions to mitigate fluctuations in pantograph–catenary contact force. Conventional approaches face inherent limitations: model-free Reinforcement Learning (RL) suffers from low sample efficiency and a tendency to converge to local optima, while Model Predictive Control (MPC) is constrained by its short optimization horizon. To integrate their complementary advantages, this paper develops a Reinforcement Learning-guided Model Predictive Control (RL-GMPC) algorithm for active pantograph control. The objective is to design a controller that combines the long-term planning capability of RL with the online optimization and constraint-handling features of MPC. This hybrid framework is intended to overcome the challenges of sample inefficiency, short-sighted planning, and limited adaptability, thereby achieving improved suppression of contact force fluctuations across diverse operating speeds and environmental disturbances.

Methods  A finite element model of the pantograph–catenary system is established, in which a simplified three-mass pantograph model is integrated with nonlinear catenary components to simulate dynamic interactions. The reinforcement learning framework is designed with an adaptive latent dynamics model to capture system behavior and a robust reward estimation module to normalize multi-scale rewards. The RL-GMPC algorithm is formulated by combining MPC for short-term trajectory optimization with a terminal state value function for estimating long-term cumulative rewards, thus balancing immediate and future performance. A Markov decision process environment is constructed by defining the state variables (pantograph displacement, velocity, acceleration, and contact force), the action space (pneumatic lift force adjustment), and the reward function, which penalizes contact force deviations and abrupt control changes; illustrative sketches of these components are given after the abstract.

Results and Discussions  Experimental validation under Beijing–Shanghai line conditions demonstrates significant reductions in the contact force standard deviation: 14.29%, 18.07%, 21.52%, and 34.87% at 290, 320, 350, and 380 km/h, respectively. The RL-GMPC algorithm outperforms conventional H∞ control and Proximal Policy Optimization (PPO) by generating smoother control inputs and suppressing high-frequency oscillations. Robustness tests under 20% random wind disturbances show a 30.17% reduction in contact force variations, confirming adaptability to dynamic perturbations. Cross-validation with different catenary configurations (Beijing–Guangzhou and Beijing–Tianjin lines) reveals consistent performance improvements, with deviations reduced by 19.31%–33.62% across speed profiles. Training efficiency analysis indicates that RL-GMPC requires 57% fewer interaction samples than PPO to achieve convergence, demonstrating superior sample efficiency.

Conclusions  The RL-GMPC algorithm integrates the predictive capabilities of model-based control with the adaptive learning strengths of reinforcement learning. By dynamically optimizing pantograph posture, it enhances contact stability across varying speeds and environmental disturbances. Its demonstrated robustness to parameter variations and external perturbations highlights its practical applicability in high-speed railway systems. This study establishes a novel framework for improving pantograph–catenary interaction quality, reducing maintenance costs, and advancing the development of next-generation high-speed trains.
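As a rough Python sketch of the Methods described above, the snippet below sets up a lumped three-mass pantograph model and a reward that penalizes contact-force deviation and abrupt control changes. All names (pantograph_step, reward, f_ref) and all parameter values, matrices, and weights are illustrative assumptions introduced here; the paper's finite element catenary model and identified pantograph parameters are not reproduced.

    import numpy as np

    # Illustrative three-mass pantograph parameters (head, upper frame, lower frame).
    # These values are assumptions, not the paper's identified parameters.
    M = np.diag([6.0, 7.0, 5.0])                       # masses [kg]
    C = np.array([[ 60.0,  -60.0,    0.0],
                  [-60.0,  130.0,  -70.0],
                  [  0.0,  -70.0,  150.0]])            # damping matrix [N·s/m]
    K = np.array([[ 7.0e3, -7.0e3,    0.0],
                  [-7.0e3,  2.1e4, -1.4e4],
                  [  0.0,  -1.4e4,  1.408e4]])         # stiffness matrix [N/m]

    def pantograph_step(x, v, f_contact, u_lift, dt=1e-3):
        """One explicit-Euler step of the lumped three-mass pantograph.
        The contact force acts on the head mass and the pneumatic lift
        adjustment u_lift acts on the lower frame."""
        f_ext = np.array([-f_contact, 0.0, u_lift])
        a = np.linalg.solve(M, f_ext - C @ v - K @ x)
        return x + dt * v, v + dt * a, a

    def reward(f_contact, u_lift, u_prev, f_ref=120.0, w_f=1.0, w_u=0.05):
        """Reward penalizing contact-force deviation from a reference value
        and abrupt changes in the lift command (weights are assumed)."""
        return -(w_f * (f_contact - f_ref) ** 2 + w_u * (u_lift - u_prev) ** 2)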
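The planning step of an RL-guided MPC controller can likewise be sketched as sampling-based MPC whose short-horizon rollouts are scored by predicted rewards plus a discounted terminal value estimate, so that consequences beyond the horizon still influence the chosen action. This is a hedged sketch under the assumption of learned dynamics, reward, and value callables (and the NumPy import above); it is not the authors' exact formulation.

    def rl_gmpc_plan(s0, dynamics, reward_fn, value_fn, u_prev=0.0,
                     horizon=5, n_samples=256, gamma=0.99, sigma=5.0):
        """One control step: sample candidate lift-force sequences around the
        previous action, roll them out through the learned dynamics model,
        score each rollout by the sum of predicted rewards plus the discounted
        terminal state value, and apply only the first action of the best one."""
        actions = u_prev + sigma * np.random.randn(n_samples, horizon)
        returns = np.zeros(n_samples)
        for i in range(n_samples):
            s, ret = s0, 0.0
            for t in range(horizon):
                ret += (gamma ** t) * reward_fn(s, actions[i, t])
                s = dynamics(s, actions[i, t])         # one-step model prediction
            ret += (gamma ** horizon) * value_fn(s)    # terminal value bootstrap
            returns[i] = ret
        return actions[int(np.argmax(returns)), 0]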
