A Multi-Dimensional Scenario-Based Evaluation Method for Deep Learning Side-Channel Analysis Using a Multi-Attribute Decision Model
-
Abstract: To address the single-dimensional metrics, limited fairness, and disconnect from engineering scenarios in the evaluation of deep learning side-channel analysis (DL-SCA) models, this paper proposes a multi-dimensional, scenario-based evaluation framework grounded in systems engineering. The framework first builds a hierarchical evaluation index system covering attack efficacy, resource overhead, and environmental adaptability. It then designs a hybrid CRITIC-AHP multi-attribute decision mechanism that fuses data-driven objective weights with scenario-oriented subjective weights, so that the evaluation matches the requirements of different applications. On this basis, a multi-dimensional attack performance metric is defined that aggregates the multi-dimensional information into an intuitive, comparable composite score, giving a unified quantitative basis for model selection. Experiments on the ASCAD dataset show that the framework distinguishes model strengths in typical resource-constrained, high-performance, high-noise, and real-time scenarios. For example, CNN obtains the highest composite score in the resource-constrained scenario (0.723), and CNN-LSTM performs best in the high-noise environment (0.863), which improves the rigor and interpretability of model selection.

Extended Abstract:

Objective: Deep learning has significantly advanced side-channel analysis (DL-SCA), enabling attacks against protected implementations. However, moving DL-SCA models from research to practical deployment is hindered by the lack of systematic, fair, and scenario-aware evaluation methodologies. Current evaluations rely predominantly on one-dimensional metrics such as Guessing Entropy (GE) and Success Rate (SR), and neglect practical dimensions such as resource consumption and environmental robustness. Comparisons are also often unfair because hyperparameter optimization is inconsistent across models, and they provide no quantifiable guidance for model selection under diverse real-world constraints (e.g., resource-limited devices, high-noise environments, or real-time requirements). This paper addresses these gaps by proposing a comprehensive, systems engineering-based evaluation framework that enables holistic, quantifiable, and scenario-adaptive assessment of DL-SCA models.

Methods: A multi-dimensional, scenario-based evaluation framework is constructed on systems engineering principles. First, a hierarchical evaluation index system is established, comprising three criteria (attack efficacy, resource overhead, and environmental adaptability) and six metrics (GE, SR, training time TC, peak memory consumption MC, model complexity MoC, and noise robustness Rob). Second, a standardized evaluation process following the "V-model" is designed to ensure fairness. This process requires independent hyperparameter optimization by grid search for each candidate model (MLP, CNN, CNN-LSTM) before the multi-dimensional data are collected. Third, the core of the framework is a hybrid CRITIC-AHP (Criteria Importance Through Intercriteria Correlation, Analytic Hierarchy Process) Multi-Attribute Decision Making (MADM) engine. The CRITIC method derives objective weights from the statistical characteristics (contrast intensity and conflict) of the measured data matrix. The AHP method incorporates subjective, scenario-specific preferences (e.g., prioritizing low memory or high robustness) through pairwise comparison matrices. The two sets of weights are fused to generate the final scenario-adapted weights. Finally, a Multi-dimensional Attack Performance Metric (MAPM) is defined as the linear weighted sum of the normalized metric values under the fused weights, giving a single, comparable score for each model in a given scenario.

Results and Discussions: The framework is validated on the standard ASCAD fixed-key dataset. After independent optimization, the three model architectures are evaluated on all six metrics. The CRITIC method yields the objective base weight vector W_critic = [0.17, 0.19, 0.15, 0.21, 0.14, 0.14]. For four predefined scenarios (Resource-Constrained, High-Performance, High-Noise, Real-Time), scenario-specific AHP judgments are made and fused with the objective weights to produce the final adapted weights (Table 8). For instance, in the Resource-Constrained scenario, memory consumption (MC) receives the highest weight (0.52), while in the High-Noise scenario, robustness (Rob) dominates (0.57). The resulting MAPM scores (Table 9, Fig. 9, Fig. 10) quantify the differentiated advantages of each model and demonstrate the framework's scenario-aware decision capability: CNN achieves the highest score in the High-Performance scenario (0.894), MLP scores highest in the Real-Time scenario (0.758) because it has the lowest training time, and CNN-LSTM performs best in the High-Noise scenario (0.863) owing to its superior robustness, despite its higher resource cost. These results indicate that no model is best in every scenario, and that the proposed MAPM provides a quantitative basis for model selection under specific engineering constraints.

Conclusions: This paper proposes a systems engineering-based, multi-dimensional evaluation framework to address key limitations in current DL-SCA model assessment. By integrating a hierarchical index system, a fair V-model process, and a hybrid CRITIC-AHP MADM engine, the framework quantifies and balances the trade-offs between attack efficacy, resource cost, and environmental adaptability. The experimental results on the ASCAD benchmark demonstrate its practical utility in generating quantifiable, scenario-aware model selection guidelines. The proposed MAPM offers a direct decision basis for engineers facing diverse deployment contexts, bridging the gap between academic attack construction and practical model deployment in DL-SCA. Future work may extend the evaluation to more model architectures and datasets, increase the automation of the framework, and validate it in real-world deployment scenarios.
-
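The CRITIC weighting and the MAPM weighted sum described in the Methods paragraph can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `minmax_normalize`, `critic_weights`, and `mapm` are hypothetical, min-max scaling is assumed as the normalization, and the choice of scalar per metric (GE and SR at 2000 traces from Table 4, Rob as the SR at noise level 0.7 from Table 6, resource values from Table 5) is an assumption, because the abstract does not say how each metric is reduced to a single number. The AHP fusion step is omitted, so the scores are not expected to reproduce Table 9.

```python
import numpy as np

def minmax_normalize(X, benefit):
    """Scale each column to [0, 1]; cost-type columns are flipped so larger is better."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    scaled = (X - lo) / (hi - lo)          # assumes no column is constant
    return np.where(benefit, scaled, 1.0 - scaled)

def critic_weights(Z):
    """CRITIC: information_j = std_j * sum_k (1 - corr_jk), then normalize to sum 1."""
    sigma = Z.std(axis=0, ddof=1)          # contrast intensity of each metric
    corr = np.corrcoef(Z, rowvar=False)    # correlation between metric columns
    conflict = (1.0 - corr).sum(axis=0)    # conflict with all other metrics
    info = sigma * conflict
    return info / info.sum()

def mapm(Z, w):
    """Linear weighted sum of normalized metric values (one score per model)."""
    return Z @ w

models = ["MLP", "CNN", "CNN-LSTM"]
metrics = ["GE", "SR", "TC", "MC", "MoC", "Rob"]
# Assumed scalar summaries (see lead-in): rows follow `models`, columns follow `metrics`.
X = np.array([
    [0.3, 1.00,  568, 1950,  35, 0.710],
    [0.0, 1.00, 2850,  520, 425, 0.500],
    [8.0, 0.90, 1450, 3050,  78, 0.830],
])
# SR and Rob are benefit-type; GE, TC, MC, MoC are cost-type.
benefit = np.array([False, True, False, False, False, True])

Z = minmax_normalize(X, benefit)
w_critic = critic_weights(Z)
scores = mapm(Z, w_critic)

for name, s in zip(models, scores):
    print(f"{name}: {s:.3f}")
```

Swapping `w_critic` for a scenario-specific weight vector (for example, a column of Table 8) gives the scenario-adapted score with the same `mapm` call.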
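Table 1 Evaluation index system based on hierarchical decomposition

| Layer | Name | Element | Description |
|---|---|---|---|
| Goal layer | Model selection | Select the optimal DL-SCA model | Final goal of the evaluation |
| Criteria layer | Attack efficacy | Core ability to recover the key | Functional criterion |
| Criteria layer | Resource overhead | Computation and storage cost | Non-functional criterion |
| Criteria layer | Environmental adaptability | Stability under noise | Environmental criterion |
| Metric layer | Guessing entropy (GE) | Negative indicator | Measures attack efficiency |
| Metric layer | Success rate (SR) | Positive indicator | Measures immediate effectiveness |
| Metric layer | Time complexity (TC) | Negative indicator | Training time cost |
| Metric layer | Space complexity (MC) | Negative indicator | Peak memory cost |
| Metric layer | Model complexity (MoC) | Negative indicator | Parameter count (storage cost) |
| Metric layer | Noise robustness (Rob) | Positive indicator | Resistance to interference |
| Alternative layer | Candidate models | MLP, CNN, CNN-LSTM | Entities under evaluation |

Table 2 Experimental environment configuration

| Item | Value |
|---|---|
| CPU | Intel Core i7-11700K |
| GPU | NVIDIA GeForce RTX 4090 (24 GB) |
| Memory | 128 GB |
| Operating system | Windows 11 |
| Deep learning framework | TensorFlow 2.10.0, Keras 2.10.0 |
| CUDA/cuDNN | 11.2/8.1.0 |

Table 3 Network configurations from hyperparameter optimization

| Hyperparameter | MLP | CNN | CNN-LSTM |
|---|---|---|---|
| FC layers | 5 | 2 | - |
| Neurons | 200 | [512, 256] | - |
| Filters | - | [64, 128, 256, 512] | 4 |
| Kernel size | - | 3 | 50 |
| Conv layers | - | 4 | 1 |
| LSTM units | - | - | 128 |
| Batch size | 100 | 300 | 200 |
| Activation | ReLU | ReLU | ReLU |
| Learning rate | 1e-5 | 1e-4 | 1e-5 |
| Epochs | 300 | 200 | 500 |
| Optimizer | RMSprop | RMSprop | RMSprop |

Table 4 Comparison of attack efficacy metrics (test set: N = 10,000 traces)

| Number of traces | CNN_GE | CNN_SR | MLP_GE | MLP_SR | CNN-LSTM_GE | CNN-LSTM_SR |
|---|---|---|---|---|---|---|
| 100 | 76.0 | 0.0 | 85.0 | 0.0 | 90.0 | 0.0 |
| 300 | 12.0 | 0.55 | 45.0 | 0.10 | 70.0 | 0.05 |
| 500 | 1.5 | 0.88 | 25.0 | 0.45 | 55.0 | 0.15 |
| 1000 | 0.2 | 0.98 | 12.0 | 0.75 | 40.0 | 0.30 |
| 1500 | 0.05 | 0.995 | 5.5 | 0.99 | 28.0 | 0.50 |
| 2000 | 0.0 | 1.000 | 0.3 | 1.00 | 8.0 | 0.90 |
| 3000 | 0.0 | 1.000 | 0.0 | 1.00 | 3.0 | 0.97 |
| 4000 | 0.0 | 1.000 | 0.0 | 1.00 | 0.0 | 1.00 |

Table 5 Comparison of resource overhead metrics

| Model | Parameters (MoC, ×10^4) | Training time (TC, s) | Peak memory (MC, MB) |
|---|---|---|---|
| MLP | 35 | 568 | 1950 |
| CNN | 425 | 2850 | 520 |
| CNN-LSTM | 78 | 1450 | 3050 |

Table 6 Comparison of robustness (Rob) metrics

| Noise level | MLP_SR | MLP_GE | CNN_SR | CNN_GE | CNN-LSTM_SR | CNN-LSTM_GE |
|---|---|---|---|---|---|---|
| 0.0 | 0.992 | 0.08 | 0.784 | 10.25 | 0.865 | 6.45 |
| 0.1 | 0.980 | 0.15 | 0.770 | 12.50 | 0.860 | 7.20 |
| 0.2 | 0.960 | 0.30 | 0.750 | 15.80 | 0.855 | 8.50 |
| 0.3 | 0.930 | 0.65 | 0.720 | 20.10 | 0.850 | 10.20 |
| 0.4 | 0.890 | 1.20 | 0.680 | 25.50 | 0.845 | 12.80 |
| 0.5 | 0.840 | 2.10 | 0.630 | 32.00 | 0.840 | 16.00 |
| 0.6 | 0.780 | 3.50 | 0.570 | 40.20 | 0.835 | 20.50 |
| 0.7 | 0.710 | 5.80 | 0.500 | 50.10 | 0.830 | 25.80 |

Table 7 AHP judgment matrix and consistency check for the high-noise scenario

| Metric | GE | SR | TC | MC | MoC | Rob | Weight |
|---|---|---|---|---|---|---|---|
| GE | 1 | 1 | 3 | 3 | 3 | 1/4 | 0.15 |
| SR | 1 | 1 | 3 | 3 | 3 | 1/4 | 0.15 |
| TC | 1/3 | 1/3 | 1 | 1 | 1 | 1/5 | 0.06 |
| MC | 1/3 | 1/3 | 1 | 1 | 1 | 1/5 | 0.06 |
| MoC | 1/3 | 1/3 | 1 | 1 | 1 | 1/5 | 0.06 |
| Rob | 4 | 4 | 5 | 5 | 5 | 1 | 0.52 |

Table 8 Final weight allocation for each scenario

| Metric | Resource-constrained | High-performance | High-noise | Real-time |
|---|---|---|---|---|
| Guessing entropy (GE) | 0.04 | 0.37 | 0.06 | 0.09 |
| Success rate (SR) | 0.06 | 0.41 | 0.04 | 0.11 |
| Training time (TC) | 0.11 | 0.05 | 0.12 | 0.38 |
| Memory usage (MC) | 0.52 | 0.06 | 0.13 | 0.29 |
| Model complexity (MoC) | 0.12 | 0.04 | 0.08 | 0.08 |
| Robustness (Rob) | 0.15 | 0.07 | 0.57 | 0.05 |

Table 9 Multi-dimensional scenario-based comprehensive evaluation (MAPM) results and rankings

| Model | Resource-constrained | Rank | High-performance | Rank | High-noise | Rank | Real-time | Rank |
|---|---|---|---|---|---|---|---|---|
| CNN | 0.723 | 1 | 0.894 | 1 | 0.382 | 3 | 0.501 | 2 |
| MLP | 0.608 | 2 | 0.832 | 2 | 0.785 | 2 | 0.758 | 1 |
| CNN-LSTM | 0.289 | 3 | 0.214 | 3 | 0.863 | 1 | 0.324 | 3 |

Table 10 Scenario-to-metric constraint mapping

| Scenario | Typical deployment environment | Core constraint | Corresponding metrics | Data source |
|---|---|---|---|---|
| Resource-constrained | IoT edge nodes, smart cards, low-power MCUs | Storage and compute strictly limited | Peak memory (MC), parameter count (MoC) | Measured during training |
| High-performance | GPU servers, cloud computing platforms | Abundant resources; attack efficacy is the only concern | Guessing entropy (GE), success rate (SR) | Attack experiments on the test set |
| High-noise | Industrial sites, long-range or non-invasive electromagnetic leakage attacks | Low environmental signal-to-noise ratio, strong interference | Noise robustness (Rob) | Tests with added noise |
| Real-time attack | Online attack systems, payment terminals, in-vehicle or IoT IDS | Training or inference latency strictly limited | Training time (TC) | Measured during training |
-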
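The pairwise judgments in Table 7 can be turned into AHP weights and checked for consistency along the following lines. This is a sketch under assumptions: the principal-eigenvector method and Saaty's random index values are the textbook AHP procedure, but the source does not state which prioritization method produced the Weight column, so the computed weights need not equal the rounded values in the table. The function name `ahp_weights` is hypothetical.

```python
import numpy as np

# Pairwise comparison matrix from Table 7 (order: GE, SR, TC, MC, MoC, Rob).
A = np.array([
    [1,   1,   3, 3, 3, 1/4],
    [1,   1,   3, 3, 3, 1/4],
    [1/3, 1/3, 1, 1, 1, 1/5],
    [1/3, 1/3, 1, 1, 1, 1/5],
    [1/3, 1/3, 1, 1, 1, 1/5],
    [4,   4,   5, 5, 5, 1],
])

# Saaty's random consistency index for matrix orders 3 to 9.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}

def ahp_weights(A):
    """Return (weights, lambda_max, consistency_ratio) for a pairwise matrix."""
    eigvals, eigvecs = np.linalg.eig(A)
    k = np.argmax(eigvals.real)            # principal (largest real) eigenvalue
    lam_max = eigvals[k].real
    w = np.abs(eigvecs[:, k].real)         # principal eigenvector, made positive
    w = w / w.sum()                        # normalize so the weights sum to 1
    n = A.shape[0]
    ci = (lam_max - n) / (n - 1)           # consistency index
    cr = ci / RANDOM_INDEX[n]              # consistency ratio
    return w, lam_max, cr

weights, lam_max, cr = ahp_weights(A)
for name, w in zip(["GE", "SR", "TC", "MC", "MoC", "Rob"], weights):
    print(f"{name}: {w:.3f}")
print(f"lambda_max = {lam_max:.3f}, CR = {cr:.3f}")
```

A consistency ratio below 0.1 is the usual acceptance threshold for an AHP judgment matrix; a larger value would suggest revising the pairwise judgments before the weights are fused with the CRITIC weights.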