Data-Driven Secure Control for Cyber-Physical Systems under Denial-of-Service Attacks: An Online Mode-Dependent Switching-Q-Learning Strategy

ZHANG Ruifeng; YANG Rongni

doi:10.11999/JEIT250746

Article Contents

Article Navigation > Journal of Electronics & Information Technology > 2025 >

ZHANG Ruifeng, YANG Rongni. Data-Driven Secure Control for Cyber-Physical Systems under Denial-of-Service Attacks: An Online Mode-Dependent Switching-Q-Learning Strategy[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT250746

Citation:

ZHANG Ruifeng, YANG Rongni. Data-Driven Secure Control for Cyber-Physical Systems under Denial-of-Service Attacks: An Online Mode-Dependent Switching-Q-Learning Strategy[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT250746

Citation:

ZHANG Ruifeng, YANG Rongni. Data-Driven Secure Control for Cyber-Physical Systems under Denial-of-Service Attacks: An Online Mode-Dependent Switching-Q-Learning Strategy[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT250746

PDF( 1501 KB)

Data-Driven Secure Control for Cyber-Physical Systems under Denial-of-Service Attacks: An Online Mode-Dependent Switching-Q-Learning Strategy

doi: 10.11999/JEIT250746 cstr: 32379.14.JEIT250746

ZHANG Ruifeng,
YANG Rongni^,

School of Control Science and Engineering, Shandong University, Jinan 250061, China

Funds: The National Natural Science Foundation of China (62273208)

Received Date: 2025-08-12
Accepted Date: 2025-11-05
Rev Recd Date: 2025-11-05

Available Online: 2025-11-13

Abstract

Abstract

Objective The open network architecture of cyber-physical systems (CPSs) enables remarkable flexibility and scalability, but it also renders CPSs highly vulnerable to cyber-attacks. Particularly, denial-of-service (DoS) attacks have emerged as one of the predominant threats, which can cause packet loss and reduce system performance by directly jamming channels. On the other hand, CPSs under dormant and active DoS attacks can be regarded as dual-mode switched systems with stable and unstable subsystems, respectively. Therefore, it is worth exploring how to utilize the switched system theory to design a secure control approach with high degrees of freedom and low conservatism. However, due to the influence of complex environments such as attacks and noises, it is difficult to model practical CPSs exactly. Currently, although a Q-learning-based control method demonstrates potential for handling unknown CPSs, the significant research gap exists in switched systems with unstable modes, particularly for establishing the evaluable stability criterion. Therefore, it remains to be investigated for unknown CPSs under DoS attacks to apply switched system theory to design the learning-based control algorithm and evaluable security criterion. Methods An online mode-dependent switching-Q-learning strategy is presented to study the data-driven evaluable criterion and secure control for unknown CPSs under DoS attacks. Initially, the CPSs under dormant and active DoS attacks are transformed into switched systems with stable and unstable subsystems, respectively. Subsequently, the optimal control problem of the value function is addressed for the model-based switched systems by designing a new generalized switching algebraic Riccati equation (GSARE) and obtaining the corresponding mode-dependent optimal security controller. Furthermore, the existence and uniqueness of the GSARE’s solution are proved. In what follows, with the help of model-based results, a data-driven optimal security control law is proposed by developing a novel online mode-dependent switching-Q-learning control algorithm. Finally, through utilizing the learned control gain and parameter matrices from the above algorithm, a data-driven evaluable security criterion with the attack frequency and duration is established based on the switching constraints and subsystem constraints. Results and Discussions In order to verify the efficiency and advantage of the proposed methods, comparative experiments of the wheeled robot are displayed in this work. Firstly, compare the model-based result (Theorem 1) and the data-driven result (Algorithm 1) as follows: From the iterative process curves of control gain and parameter matrices (Fig. 2 and Fig. 3), it can be observed that the optimal control gain and parameter matrices under threshold errors can all be successfully obtained from both the model-based GSARE and the data-driven algorithm. Meanwhile, the tracking errors of CPSs can converge to 0 by utilizing the above data-driven controller (Fig. 5), which ensures the exponential stability of CPSs and verifies the efficiency of our proposed switching-Q-learning algorithm. Secondly, it is evident from learning process curves (Fig.4) that although the initial value of the learned control gain is not stabilizable, the optimal control gain can still be successfully learned to stabilize the system from Algorithm 1. This result significantly reduces conservatism compared to existing Q-learning approaches, which take stabilizable initial control gains as the learning premise. Thirdly, compare the data-driven evaluable security criterion in Theorem 2 of this work and existing criteria as follows: While the switching parameters learned from Algorithm 1 do not satisfy the popular switching constraint to obtain the model dwell-time, by utilizing the evaluable security criterion proposed in this paper, the attack frequency and duration are obtained based on the new switching constraints and subsystem constraints. Furthermore, it is seen from the comparison of the evaluable security criteria (Tab.1) that our proposed evaluable security criterion is less conservative than the existing evaluable criteria. Finally, the learned optimal controller and the obtained DoS attack constraints are applied to the tracking control experiment of a wheeled robot under DoS attacks, and the result is compared with existing results via Q-learning controllers. It is evident from the tracking trajectory comparisons of the robot (Fig.6 and Fig.7) that the robot enables significantly faster and more accurate trajectory tracking with the help of our proposed switching-Q-learning controller. Therefore, the efficiency and advantage of the proposed algorithm and criterion in this work are verified. Conclusions Based on the learning strategy and the switched system theory, this study presents an online mode-dependent switching-Q-learning control algorithm and the corresponding evaluable security criterion for the unknown CPSs under DoS attacks. The detailed results are provided as follows: (1) By representing the unknown CPSs under dormant and active DoS attacks as unknown switched systems with stable and unstable subsystems, respectively, the security problem of CPSs under DoS attacks is transformed into a stabilization problem of the switched systems, which offers high design freedom and low conservatism. (2) A novel online mode-dependent switching-Q-learning control algorithm is developed for unknown switched systems with unstable modes. Through the comparative experiments, the proposed switching-Q-learning algorithm effectively increases the design freedom of controllers and decreases conservatism over existing Q-learning algorithms. (3) A new data-driven evaluable security criterion with the attack frequency and duration is established based on the switching constraints and subsystem constraints. It is evident from the comparative criteria that the proposed criterion demonstrates significantly reduced conservatism over existing evaluable criteria via single subsystem constraints and traditional model dwell-time constraints.
- Cyber-Physical Systems (CPSs),
- Denial-of-Service (DoS) attacks,
- Switching-Q-learning,
- Data-driven security control,
- Evaluable security criterion

FullText(HTML)

References(35)

References

[1]	杨挺, 刘亚闯, 刘宇哲, 等. 信息物理系统技术现状分析与趋势综述[J]. 电子与信息学报, 2021, 43(12): 3393–3406. doi: 10.11999/JEIT211135. YANG Ting, LIU Yachuang, LIU Yuzhe, et al. Review on cyber-physical system: Technology analysis and trends[J]. Journal of Electronics & Information Technology, 2021, 43(12): 3393–3406. doi: 10.11999/JEIT211135.
[2]	杨光红, 芦安洋, 安立伟. 网络攻击下的信息物理系统安全状态估计研究综述[J]. 控制与决策, 2023, 38(8): 2093–2105. doi: 10.13195/j.kzyjc.2023.0885. YANG Guanghong, LU Anyang, and AN Liwei. A survey on secure state estimation of cyber-physical systems under cyber attacks[J]. Control and Decision, 2023, 38(8): 2093–2105. doi: 10.13195/j.kzyjc.2023.0885.
[3]	LU Kangdi and WU Zhengguang. Resilient event-triggered load frequency control for cyber-physical power systems under DoS attacks[J]. IEEE Transactions on Power Systems, 2023, 38(6): 5302–5313. doi: 10.1109/TPWRS.2022.3229667.
[4]	OBAYYA M, AL-WESABI F N, ALABDAN R, et al. Artificial intelligence for traffic prediction and estimation in intelligent cyber-physical transportation systems[J]. IEEE Transactions on Consumer Electronics, 2024, 70(1): 1706–1715. doi: 10.1109/TCE.2023.3320513.
[5]	李云鹏, 张立宪, 韩岳江, 等. 基于模型预测控制的子母式无人机编队飞行控制方法[J]. 自动化学报, 2025, 51(2): 312–326. doi: 10.16383/j.aas.c240405. LI Yunpeng, ZHANG Lixian, HAN Yuejiang, et al. Model predictive control-based formation flight control method for composite UAVs[J]. Acta Automatica Sinica, 2025, 51(2): 312–326. doi: 10.16383/j.aas.c240405.
[6]	LANGNER R. Stuxnet: Dissecting a cyberwarfare weapon[J]. IEEE Security & Privacy, 2011, 9(3): 49–51. doi: 10.1109/MSP.2011.67.
[7]	KHAN S. Distributed sensors, computation and AI for automation, protection and maintenance of power grid[C]. Proceedings of the 2022 18th International Computer Engineering Conference, Cairo, Egypt, 2022: 130–135. doi: 10.1109/ICENCO55801.2022.10032522.
[8]	XU Hang, BARBOT S, and WANG Teng. Remote sensing through the fog of war: Infrastructure damage and environmental change during the Russian-Ukrainian conflict revealed by open-access data[J]. Natural Hazards Research, 2024, 4(1): 1–7. doi: 10.1016/j.nhres.2024.01.006.
[9]	WANG Zhe, ZHANG Heng, YANG Chaoqun, et al. Improved zero-dynamics attack scheduling with state estimation[J]. IEEE/CAA Journal of Automatica Sinica, 2025, 12(2): 472–474. doi: 10.1109/JAS.2024.124737.
[10]	ZHAO Rui, ZUO Zhiqiang, SHI Yang, et al. DoS and stealthy deception attacks for switched systems: A cooperative approach[J]. IEEE Transactions on Automatic Control, 2024, 69(7): 4396–4410. doi: 10.1109/TAC.2023.3321248.
[11]	DE PERSIS C and TESI P. Input-to-state stabilizing control under denial-of-service[J]. IEEE Transactions on Automatic Control, 2015, 60(11): 2930–2944. doi: 10.1109/TAC.2015.2416924.
[12]	SU Lei and YE Dan. Observer-based output feedback control for cyber-physical systems under randomly occurring packet dropout and periodic DoS attacks[J]. ISA Transactions, 2019, 95: 58–67. doi: 10.1016/j.isatra.2019.05.008.
[13]	TAN Wen, HOU Zhongsheng, and LI Yuanxin. Data-driven containment control for unknown MIMO nonlinear MASs under aperiodic DoS attacks[J]. IEEE Transactions on Automation Science and Engineering, 2025, 22: 7762–7772. doi: 10.1109/TASE.2024.3469153.
[14]	DEBRUHL B and TAGUE P. Digital filter design for jamming mitigation in 802.15. 4 communication[C]. Proceedings of 20th International Conference on Computer Communications and Networks, Lahaina, USA, 2011: 1–6. doi: 10.1109/ICCCN.2011.6006020.
[15]	SHI Ting, SHI Peng, and CHAMBERS J. Dynamic event-triggered model predictive control under channel fading and denial-of-service attacks[J]. IEEE Transactions on Automation Science and Engineering, 2024, 21(4): 6448–6459. doi: 10.1109/TASE.2023.3325534.
[16]	YUAN Yuan, YUAN Huanhuan, GUO Lei, et al. Resilient control of networked control system under DoS attacks: A unified game approach[J]. IEEE Transactions on Industrial Informatics, 2016, 12(5): 1786–1794. doi: 10.1109/TII.2016.2542208.
[17]	SAEEDI M, ZAREI J, RAZAVI-FAR R, et al. Event-triggered adaptive optimal fast terminal sliding mode control under denial-of-service attacks[J]. IEEE Systems Journal, 2022, 16(2): 2684–2692. doi: 10.1109/JSYST.2021.3073816.
[18]	ZHU Yanzheng and ZHENG Weixing. Observer-based control for cyber-physical systems with periodic DoS attacks via a cyclic switching strategy[J]. IEEE Transactions on Automatic Control, 2020, 65(8): 3714–3721. doi: 10.1109/TAC.2019.2953210.
[19]	WU Chengwei, WU Ligang, LIU Jianxing, et al. Active defense-based resilient sliding mode control under denial-of-service attacks[J]. IEEE Transactions on Information Forensics and Security, 2020, 15: 237–249. doi: 10.1109/TIFS.2019.2917373.
[20]	SHEN Hao, LIU Xinmiao, MA Qian, et al. Observer-based control for interval type-2 fuzzy systems under PDT-based DoS attacks[J]. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2025, 55(6): 3780–3790. doi: 10.1109/TSMC.2025.3547320.
[21]	WANG Fuxing, LONG Yue, and LI Tieshan. Thruster fault detection for unmanned marine vehicles under DoS attacks: An asynchronous switched method[C]. Proceedings of the 14th International Conference on Information Science and Technology, Chengdu, China, 2024: 554–559. doi: 10.1109/ICIST63249.2024.10805417.
[22]	华和安, 方勇纯, 钱辰, 等. 基于线性滤波器的四旋翼无人机强化学习控制策略[J]. 电子与信息学报, 2021, 43(12): 3407–3417. doi: 10.11999/JEIT210251. HUA He’an, FANG Yongchun, QIAN Chen, et al. Reinforcement learning control strategy of quadrotor unmanned aerial vehicles based on linear filter[J]. Journal of Electronics & Information Technology, 2021, 43(12): 3407–3417. doi: 10.11999/JEIT210251.
[23]	REN Yan, ZHANG Heng, YANG Wei, et al. Transferable adversarial attack against deep reinforcement learning-based smart grid dynamic pricing system[J]. IEEE Transactions on Industrial Informatics, 2024, 20(6): 9015–9025. doi: 10.1109/TII.2024.3379645.
[24]	YIN Liyuan, XU Lezhong, HOU Fusheng, et al. Security analysis and control under periodic DoS attacks[J]. IEEE Internet of Things Journal, 2024, 11(5): 8473–8484. doi: 10.1109/JIOT.2023.3319703.
[25]	LIU Jinliang, DONG Yanhui, ZHA Lijuan, et al. Reinforcement learning-based tracking control for networked control systems with DoS attacks[J]. IEEE Transactions on Information Forensics and Security, 2024, 19: 4188–4197. doi: 10.1109/TIFS.2024.3376250.
[26]	GAO Weinan, DENG Chao, JIANG Yi, et al. Resilient reinforcement learning and robust output regulation under denial-of-service attacks[J]. Automatica, 2022, 142: 110366. doi: 10.1016/j.automatica.2022.110366.
[27]	LI Hao, CHEN Hua, and ZHANG Wei. On model-free reinforcement learning for switched linear systems: A subspace clustering approach[C]. Proceedings of the 2018 56th Annual Allerton Conference on Communication, Control, and Computing, Monticello, USA, 2018: 123–130, doi: 10.1109/ALLERTON.2018.8635985.
[28]	CHEN Hua, ZHANG Linfang, and ZHANG Wei. Optimal control inspired Q-learning for switched linear systems[C]. Proceedings of the 2020 American Control Conference (ACC), Denver, USA, 2020: 4003–4010. doi: 10.23919/ACC45564.2020.9147818.
[29]	ZHANG Xuewen, WANG Yun, XIA Jianwei, et al. Optimal tracking control for discrete-time modal persistent dwell time switched systems based on Q-learning[J]. Optimal Control Applications and Methods, 2023, 44(6): 3327–3341. doi: 10.1002/oca.3040.
[30]	SUN Jiayue, ZHANG Huaguang, WANG Yingchun, et al. Optimal tracking control of switched systems applied in grid-connected hybrid generation using reinforcement learning[J]. Neural Computing and Applications, 2021, 33(15): 9363–9374. doi: 10.1007/s00521-021-05696-2.
[31]	WU Jiacheng, LIAN Bosen, SU Hongye, et al. Data-driven weighted $\tiny{H_\infty }$ control of persistent dwell time switched systems with optimal disturbance attenuation guaranteed[J]. IEEE Transactions on Automation Science and Engineering, 2025, 22: 8162–8173. doi: 10.1109/TASE.2024.3480449.
[32]	ZHAI Lijing and VAMVOUDAKIS K G. Data-based and secure switched cyber-physical systems[J]. Systems & Control Letters, 2021, 148: 104826. doi: 10.1016/j.sysconle.2020.104826.
[33]	ZHANG Wei, ABATE A, HU Jianghai, et al. Exponential stabilization of discrete-time switched linear systems[J]. Automatica, 2009, 45(11): 2526–2536. doi: 10.1016/j.automatica.2009.07.018.
[34]	AL-TAMIMI A, LEWIS F L, and ABU-KHALAF M. Model-free Q-learning designs for linear discrete-time zero-sum games with application to H-infinity control[J]. Automatica, 2007, 43(3): 473–481. doi: 10.1016/j.automatica.2006.09.019.
[35]	FEI Zhongyang, SHI Shuang, and SHI Peng. Analysis and Synthesis for Discrete-Time Switched Systems: A Quasi-Time-Dependent Method[M]. Cham: Springer, 2020: 23–25. doi: 10.1007/978-3-030-25812-2.