SHAP-based Reliable Threshold Decision-driven Remaining Useful Life Prediction for MOSFETs
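Abstract: Conventional fixed-threshold early warning methods for power MOSFETs are disconnected from the physical failure mechanisms. To address this, the paper proposes a lifetime prediction framework that integrates explainable artificial intelligence (XAI). First, an adaptive dual-threshold partitioning strategy is designed that combines K-means clustering with the proximal policy optimization (PPO) algorithm. The strategy takes the initial solution obtained by clustering as the search starting point and builds a multi-objective reward function that accounts for interval proportion, a state-transition sensitivity incentive, and a threshold-spacing penalty. This reward guides the agent in optimizing the thresholds so that the degradation stages are partitioned accurately. Second, to improve understanding of the black-box decision process, SHAP interpretability analysis is introduced to verify the rationality of the threshold decisions at the level of feature-mechanism correlation. The analysis shows that the low threshold is dominated by steady-state features of the healthy stage and satisfies the safety baseline requirement, whereas the high threshold is dominated by the dynamics of late-stage accelerated degradation and accurately locates the critical point. This confirms the credibility and transparency of the threshold decisions. On this basis, an early warning is triggered when the degradation data exceed the trusted low threshold, and a residual-connected stacked gated recurrent unit (R-SGRU) is used to predict the remaining useful life. Experiments on the NASA dataset show that the model clearly outperforms several other models, including the long short-term memory network (LSTM) and the temporal convolutional network (TCN), with a test-set MSE below 0.0015 and R² above 0.98. The study provides accurate and reliable decision support for early warning in MOSFETs. It also uses explainability techniques to link data features with physical mechanisms, advancing artificial intelligence for device prognostics toward greater trustworthiness and reliability.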
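Keywords:
- Power MOSFET
- Proximal policy optimization (PPO)
- Remaining useful life (RUL) prediction
- SHAP interpretability
- Residual-connected stacked gated recurrent unit (R-SGRU)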
Abstract: To address the disconnect between conventional fixed-threshold early warning methods for power MOSFETs and their physical failure mechanisms, this paper proposes a lifetime prediction framework that integrates Explainable Artificial Intelligence (XAI). First, an adaptive dual-threshold partitioning strategy is designed by combining K-means clustering with the Proximal Policy Optimization (PPO) algorithm. The initial solution obtained by K-means is used as the search starting point. A multi-objective reward function is then constructed from an interval-proportion term, a state-transition sensitivity incentive, and a threshold-spacing penalty. This function guides the agent in threshold optimization and enables accurate partitioning of degradation stages. Second, SHapley Additive exPlanations (SHAP) analysis is introduced to improve the interpretability of the black-box decision-making process. It verifies the rationality of threshold decisions from the perspective of feature-mechanism correlations. The results show that the low threshold is mainly governed by steady-state features in the healthy stage and meets the safety baseline requirement. The high threshold is dominated by dynamic features of late-stage accelerated degradation and accurately identifies the critical point. These findings confirm the reliability and transparency of the threshold decisions. Based on this framework, an early warning mechanism is triggered when degradation data exceed the reliable low threshold. A Residual-connected Stacked Gated Recurrent Unit (R-SGRU) is then used for Remaining Useful Life (RUL) prediction. Experiments on the NASA dataset show that the proposed model outperforms several baseline models, including Long Short-Term Memory (LSTM) and Temporal Convolutional Network (TCN). The test-set Mean Squared Error (MSE) is below 0.0015, and R² is above 0.98. This study provides accurate and reliable decision support for early warning in MOSFETs. It also links data features with physical mechanisms through explainable techniques, supporting the development of trustworthy artificial intelligence for device prognostics.

Objective: This study addresses two key issues in power MOSFET lifetime prognostics: the disconnect between conventional fixed-threshold early warning methods and physical mechanisms, and the limited interpretability of existing approaches. A framework integrating adaptive dual-threshold partitioning with XAI is proposed to support predictive maintenance with both physical credibility and high prediction accuracy.

Methods: An adaptive dual-threshold partitioning strategy is proposed by integrating K-means clustering with PPO reinforcement learning. Threshold positions are optimized using a multi-objective reward function to accurately identify degradation stages. SHAP analysis is used to quantify the contributions of 13-dimensional morphological features based on Shapley values. This validates the physical rationality of threshold decisions from a mechanistic perspective. When degradation data exceed the low threshold, an early warning is triggered. The R-SGRU network is then used for RUL prediction by capturing long-term dependencies through its gating mechanism. The proposed method is validated using the NASA dataset, forming a complete technical route from intelligent early warning to accurate prediction.

Results and Discussions: The thresholds optimized by PPO achieve the lowest MSE and the highest R² among the compared partitioning methods (Table 1). SHAP analysis reveals the physical rationale for the threshold decisions. The low threshold is mainly governed by steady-state features of the healthy stage. By contrast, the high threshold is determined by accelerated degradation dynamics. This result establishes a quantitative correlation between data-driven results and physical failure mechanisms. SHAP interaction heatmaps (Figs. 6 and 7) further show the synergistic effects among features. Device failure is a complex process driven by the coordinated evolution of multiple features. The R-SGRU prediction model based on the optimized thresholds performs well on the NASA dataset (Table 5). Across the four device groups, the model achieves an MSE below 0.0015 and an R² above 0.98, outperforming the baseline models.

Conclusions: This study proposes an XAI-based framework for predicting the RUL of power MOSFETs. For threshold partitioning, an adaptive dual-threshold strategy combining K-means clustering and PPO reinforcement learning is adopted. A multi-objective reward function enables accurate identification of nonlinear degradation stages, and its performance is validated across four test devices. For interpretability, SHAP analysis provides mechanistic support for threshold decisions. The results show that low thresholds depend on steady-state features in the healthy period, whereas high thresholds are dominated by late-stage accelerated degradation features. This pattern is consistent with actual failure mechanisms. Feature interaction heatmaps reveal complex cooperative effects among multiple features and improve the understanding of the decision-making process. The R-SGRU prediction model shows strong time-series modeling capability, with stable and accurate predictions. This work establishes a complete technical route from intelligent early warning to accurate prediction. It achieves adaptive threshold optimization and links data-driven results with physical mechanisms through interpretability analysis. The findings provide reliable support for the intelligent operation and maintenance of power MOSFETs.
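The dual-threshold step described in the abstract (K-means start, then refinement under a multi-objective reward) can be illustrated with a minimal Python sketch. The source does not give the reward formula, so everything specific here is an assumption: the form of the three terms, the target stage shares `target`, the minimum gap `min_gap`, the weights, and the synthetic `series`. The PPO agent itself is not implemented; the sketch only shows the starting point it would search from and the score it would try to increase.

```python
def kmeans_1d(values, k=3, iters=100):
    """Plain 1-D k-means (Lloyd's algorithm); returns the sorted cluster centers."""
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic start: one center per equal-count slice of the sorted data.
    centers = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in ordered:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            groups[nearest].append(v)
        # Keep the old center if a cluster happens to be empty.
        updated = sorted(
            sum(g) / len(g) if g else centers[j] for j, g in enumerate(groups)
        )
        if updated == centers:
            break
        centers = updated
    return centers


def initial_thresholds(values):
    """Low/high thresholds as midpoints between adjacent cluster centers."""
    c0, c1, c2 = kmeans_1d(values, k=3)
    return (c0 + c1) / 2, (c1 + c2) / 2


def first_crossing_slope(series, thr):
    """Absolute increment at the first sample that reaches `thr` (0.0 if none)."""
    for i in range(1, len(series)):
        if series[i - 1] < thr <= series[i]:
            return abs(series[i] - series[i - 1])
    return 0.0


def reward(series, low, high, target=(0.6, 0.3, 0.1), min_gap=0.1,
           weights=(1.0, 1.0, 1.0)):
    """Hypothetical three-term reward; NOT the paper's formula."""
    if low >= high:
        return -10.0  # invalid ordering of the two thresholds
    n = len(series)
    shares = (
        sum(v < low for v in series) / n,
        sum(low <= v < high for v in series) / n,
        sum(v >= high for v in series) / n,
    )
    # Interval proportion: penalize deviation from the assumed stage shares.
    proportion = -sum(abs(s - t) for s, t in zip(shares, target))
    # Sensitivity incentive: favor thresholds crossed where the signal moves fast.
    sensitivity = first_crossing_slope(series, low) + first_crossing_slope(series, high)
    # Spacing penalty: discourage thresholds closer together than min_gap.
    spacing = -max(0.0, min_gap - (high - low))
    w_p, w_s, w_g = weights
    return w_p * proportion + w_s * sensitivity + w_g * spacing


# Synthetic degradation curve (assumption): slow, faster, then rapid growth.
series = ([0.01 * i for i in range(60)]
          + [0.60 + 0.03 * i for i in range(30)]
          + [1.50 + 0.08 * i for i in range(10)])
low, high = initial_thresholds(series)
print(f"initial thresholds: low={low:.3f}, high={high:.3f}")
print(f"reward at the K-means start: {reward(series, low, high):.4f}")
```

In a full implementation, an agent would perturb `low` and `high` from this starting point and keep the moves that raise the reward; starting from the clustering result rather than from random values narrows that search.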
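Table 1 Comparison of threshold performance

| Partitioning method | MSE | R² | Early-warning ratio (%) |
|---|---|---|---|
| Fixed threshold | 0.001987 | 0.976051 | 16.23 |
| K-means | 0.001953 | 0.977837 | 8.35 |
| Change point | 0.001599 | 0.978407 | 8.32 |
| Quantile | 0.001698 | 0.977378 | 20.05 |
| Bayesian optimization | 0.001553 | 0.978950 | 26.37 |
| PPO | 0.00150 | 0.983891 | 15.68 |

Table 2 Global SHAP importance for the low threshold

| Feature | SHAP value |
|---|---|
| Initial fluctuation | 0.001315 |
| First deviation point | 0.000960 |
| Early-stage slope | 0.000898 |
| Early-stage mean | 0.000847 |
| Curve curvature | 0.000545 |

Table 3 Global SHAP importance for the high threshold

| Feature | SHAP value |
|---|---|
| Late-stage slope | 0.001333 |
| Late-stage mean | 0.000778 |
| Acceleration inflection point | 0.000701 |
| First deviation point | 0.000650 |
| Number of abrupt change points | 0.000440 |

Table 4 Accuracy comparison in the model ablation experiment

| Method | MSE | R² | Training time (s) | Early-warning accuracy (%) |
|---|---|---|---|---|
| GRU | 0.001622 | 0.975846 | 59.61 | 93.56 |
| SGRU | 0.001675 | 0.977097 | 158.30 | 94.97 |
| R-SGRU | 0.0015 | 0.983891 | 167.64 | 95.11 |

Table 5 Accuracy comparison of prediction methods

| Method | MSE | R² | Training time (s) | Early-warning accuracy (%) |
|---|---|---|---|---|
| LSTM | 0.002127 | 0.967152 | 56.69 | 94.82 |
| BiLSTM | 0.001688 | 0.971866 | 102.02 | 94.88 |
| TCN | 0.001695 | 0.971414 | 463.96 | 93.21 |
| Transformer | 0.001598 | 0.974956 | 531.94 | 94.03 |
| Proposed method | 0.0015 | 0.983891 | 167.64 | 95.11 |

Table 6 Comparison of device RUL prediction errors (min)

| Device | Low threshold | Actual RUL | Predicted RUL | Error |
|---|---|---|---|---|
| No. 8 | 95.2 | 32.4 | 32.1 | 0.3 |
| No. 9 | 106.6 | 83.7 | 84.1 | 0.4 |
| No. 12 | 112.6 | 75.3 | 75.6 | 0.3 |
| No. 14 | 58.6 | 60.2 | 60.4 | 0.2 |

Table 7 Cross-device generalization results

| Test device | Training devices | MSE | R² |
|---|---|---|---|
| No. 8 | Nos. 9, 12, 14 | 0.005548 | 0.93 |
| No. 9 | Nos. 8, 12, 14 | 0.008145 | 0.90 |
| No. 12 | Nos. 8, 9, 14 | 0.004257 | 0.94 |
| No. 14 | Nos. 8, 9, 12 | 0.009348 | 0.89 |

Table 8 Prediction performance at different degradation stages

| Training data proportion (%) | MSE | R² |
|---|---|---|
| 40 | 0.016641 | 0.872 |
| 50 | 0.009604 | 0.911 |
| 60 | 0.004489 | 0.943 |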
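The tables report MSE and R² throughout. As a reading aid, the short Python sketch below gives the standard definitions of both metrics and recomputes the error column of Table 6 as the absolute difference between predicted and actual RUL. The source does not say how the series were scaled before the reported MSE and R² were computed, so those values are not reproduced here. The function names `mse` and `r2` and the dictionary `table6` are illustrative only.

```python
def mse(actual, predicted):
    """Mean squared error between two equal-length sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r2(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Table 6 values in minutes: device -> (actual RUL, predicted RUL, reported error)
table6 = {
    8: (32.4, 32.1, 0.3),
    9: (83.7, 84.1, 0.4),
    12: (75.3, 75.6, 0.3),
    14: (60.2, 60.4, 0.2),
}

# Recompute the error column, rounded to the table's one-decimal precision.
recomputed = {dev: round(abs(pred - act), 1) for dev, (act, pred, _) in table6.items()}
for dev, (_, _, reported) in table6.items():
    print(f"device {dev}: reported {reported} min, recomputed {recomputed[dev]} min")
```

A perfect predictor gives an MSE of 0 and an R² of 1, while always predicting the mean of the actual values gives an R² of 0.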