A Cross-Dimensional Collaborative Framework for Header-Metadata-Driven Encrypted Traffic Identification
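Abstract (translated from Chinese): With encryption now widely applied to network communication, encrypted traffic identification has become a core problem that network security urgently needs to solve. Traditional payload-based identification methods risk feature invalidation as encryption algorithms keep evolving, which creates detection blind spots in dynamic network environments. Meanwhile, the structured features of packet headers, the key carrier of protocol interaction, have not been fully exploited. In addition, as encryption protocols develop, existing methods suffer from poor feature interpretability and weak model robustness under adversarial attacks. To address these challenges, this paper proposes a header-feature-driven cross-dimensional collaborative framework for encrypted traffic identification. It analyzes three dimensions: traffic feature selection and identification performance, interpretability evaluation that quantifies feature contributions, and the effect of adversarial perturbations on model robustness. The analysis systematically reveals and demonstrates that header features play the dominant role in encrypted traffic identification, overcoming the limits of single-perspective analysis and overturning the established assumption that identification must rely on payload data. The framework can analyze the performance boundaries of deep models and assess the credibility of their decisions. It can also prune redundancy through effective feature screening, reducing model complexity while improving anti-interference capability in encrypted scenarios, and thereby supports the design of lighter and more robust identification models. Comparative experiments on the ISCXVPN2016 and ISCXTor2016 datasets show the following. For identification performance, the F1 score of the header-only model is up to 6% higher than that of the full-traffic model and up to 61% higher than that of the payload-only model, which verifies the effectiveness of header features for classification. For interpretability, the feature contribution quantification shows that the average share of the relevance score for header features is up to 89.8% higher than that for payload features, which highlights their dominant influence on model decisions. For anti-interference robustness, the model that includes header features has a clearly higher maximum performance retention rate than the payload-only model under equal bandwidth perturbation, with a maximum gap of 98.46%, which confirms the key role of header features in strengthening model robustness.

Abstract: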
Objective: With the widespread adoption of network communication encryption technologies, encrypted traffic identification has become a critical problem in network security. Traditional identification methods based on payload content face the risk of feature invalidation as encryption algorithms continue to evolve, leading to detection blind spots in dynamic network environments. Meanwhile, the structured information embedded in packet headers, an essential carrier for protocol interaction, remains underutilized. Furthermore, as encryption protocols evolve, existing encrypted traffic identification approaches suffer from poor feature interpretability and weak model robustness against adversarial attacks. To address these challenges, this paper proposes a cross-dimensional collaborative identification framework for encrypted traffic, driven by header metadata features. The framework systematically reveals and demonstrates the dominant role of header features in encrypted traffic identification, overcoming the constraints of single-perspective analyses and reducing dependence on payload data. It further enables the assessment of deep model performance boundaries and decision credibility. Effective feature screening and pruning eliminate redundant attributes, which reduces model complexity while improving anti-interference capability in encrypted scenarios and facilitates the design of lighter and more reliable encrypted traffic identification models.

Methods: This study performs an analysis along three dimensions: (1) network traffic feature selection and identification performance, (2) quantitative evaluation of feature importance in classification, and (3) assessment of model robustness under adversarial perturbations.
First, three forms of encrypted traffic packets (header only, payload only, and header plus payload) are compared in terms of their characteristics, differences, and effects on identification performance using a One-Dimensional Convolutional Neural Network (1D-CNN). This comparison verifies the dominant role of header features in encrypted traffic identification. Second, two explainable algorithms, Layer-wise Relevance Propagation (LRP) and Deep Taylor Decomposition (DTD), are employed to further confirm the essential contribution of header features to network traffic classification. The relative importance of header and payload features is quantified from two perspectives: (i) relevance scores obtained by backpropagation and (ii) contribution coefficients derived from Taylor series expansion, thereby enhancing feature interpretability. Finally, adversarial attack experiments are conducted using Projected Gradient Descent (PGD) and random perturbations. Adversarial traffic is produced either by injecting carefully constructed adversarial perturbation data into the initial and terminal parts of the payload or by adding randomly generated noise, and the effect of these perturbations on model decision-making is examined. This analysis evaluates the stability and anti-interference capability of the encrypted traffic identification model under adversarial conditions.

Results and Discussions: Comparative experiments conducted on the ISCXVPN2016 and ISCXTor2016 datasets yield three key findings.

(1) Recognition performance. The model based solely on header features achieves an F1 score up to 0.06 (6 percentage points) higher than that of the model using complete traffic, and up to 0.61 (61 percentage points) higher than that of the model using only payload features. These results verify that header features are indispensable in encrypted traffic identification. The structural information embedded in headers plays a dominant role in enabling the model to accurately classify traffic types.
Even without payload data, high identification accuracy can be achieved using header information alone (Figure 2, Table 4).

(2) Interpretability evaluation. The LRP and DTD methods are used to quantify the contribution of header features to model classification. The relevance of header features to the classification decision is markedly higher than that of payload features, with the average share of the relevance score up to 89.8% higher (Figures 3 and 4, Table 5). This result is highly consistent with the classification behavior of the 1D-CNN, further confirming the critical importance and dominant influence of header features in encrypted traffic identification.

(3) Anti-interference robustness. The combined header and payload model exhibits strong robustness under adversarial attacks. Particularly under low-bandwidth conditions, the model incorporating header features shows a markedly higher maximum performance retention rate under equivalent bandwidth perturbation than the pure payload model, with the maximum difference reaching 98.46%. This finding confirms the essential role of header features in enhancing model robustness (Figures 5 and 6). Header-based models maintain stable recognition performance, whereas payload information is more susceptible to interference, leading to sharp performance degradation.

In addition, the identification performance, contribution quantification, and anti-attack effectiveness of header features are influenced by data type and distribution characteristics. In certain cases, payload features provide auxiliary support, suggesting a complementary relationship between the two feature domains.

Conclusions: This study addresses core challenges in encrypted traffic identification, including feature degradation, limited interpretability, and weak adversarial robustness in traditional payload-dependent methods.
A cross-dimensional collaborative identification framework driven by header features is proposed. Through systematic theoretical analysis and experimental validation from three perspectives, the framework demonstrates the irreplaceable value of header features in network traffic identification and overcomes the limitations of conventional single-perspective approaches. It provides a theoretical foundation for improving the efficiency, interpretability, and robustness of encrypted traffic identification models. Future work will focus on enhancing dynamic adaptability, integrating multi-modal features, implementing lightweight architectures, and strengthening adversarial defense mechanisms. These directions are expected to advance encrypted traffic identification technology toward higher intelligence, adaptability, and resilience.
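The payload-restricted PGD attack described under Methods can be illustrated with a minimal sketch. It assumes a toy logistic scorer (weights `w`, bias `b`) in place of the paper's 1D-CNN, byte values normalised to [0, 1], and a hypothetical 0/1 `mask` marking the payload positions that may change (for example, the leading and trailing payload bytes). As a simplification it overwrites those positions rather than injecting new bytes. The defaults `eps=0.3`, `eps_iter=0.03`, and `max_iter=10` follow the PGD settings listed in Table 3; the function and argument names are illustrative and do not come from the paper.

```python
import math


def pgd_payload_only(x, y, w, b, mask, eps=0.3, eps_iter=0.03, max_iter=10):
    """L-infinity PGD against a toy logistic scorer, touching only masked bytes.

    x    : list of floats in [0, 1] (normalised packet bytes)
    y    : true label, +1 or -1
    w, b : weights and bias of the stand-in linear model (assumption)
    mask : list of 0/1 flags; 1 marks positions the attacker may perturb
    """
    x_adv = list(x)
    for _ in range(max_iter):
        # Signed margin of the current adversarial input under the toy model.
        margin = y * (sum(wi * xi for wi, xi in zip(w, x_adv)) + b)
        # Derivative of the logistic loss log(1 + exp(-margin)) w.r.t. the score.
        coeff = -y / (1.0 + math.exp(margin))
        for i, (wi, mi) in enumerate(zip(w, mask)):
            if not mi:
                continue  # header positions are left untouched
            grad_i = coeff * wi
            sign = (grad_i > 0) - (grad_i < 0)
            # Gradient-ascent step on the loss.
            stepped = x_adv[i] + eps_iter * sign
            # Project back into the eps-ball around the clean input ...
            stepped = min(max(stepped, x[i] - eps), x[i] + eps)
            # ... and into the valid value range.
            x_adv[i] = min(max(stepped, 0.0), 1.0)
    return x_adv
```

Because the mask zeroes out header positions, any drop in the model's score can be attributed to payload changes alone, which is the comparison the robustness dimension relies on.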
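Table 1 Sample distribution of the datasets after preprocessing (number of samples)

| Traffic type | ISCXVPN2016 samples | ISCXTor2016 samples |
| --- | --- | --- |
| Chat | 2160 | 28284 |
| Email | 2488 | 16579 |
| Filetransfer | 25630 | 76912 |
| P2P | 21703 | 55818 |
| Streaming | 15103 | 40544 |
| VoIP | 12462 | 27557 |

Table 2 Sample distribution after perturbation injection (number of samples)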
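
| Traffic type | ISCXVPN2016 samples | ISCXTor2016 samples |
| --- | --- | --- |
| Chat | 720 | 9428 |
| Email | 829 | 5526 |
| Filetransfer | 8543 | 25637 |
| P2P | 7234 | 18606 |
| Streaming | 5034 | 13514 |
| VoIP | 4154 | 9186 |

Table 3 Parameter settings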
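
| Method | Parameter | Symbol | Value |
| --- | --- | --- | --- |
| 1D CNN | Learning rate | lr | 0.002 |
| 1D CNN | Weight decay | weight_decay | 0.001 |
| 1D CNN | Training epochs | epoch | 50 |
| 1D CNN | Batch size | batch_size | 1024 |
| PGD | Training set / test set | train/test | 8:2 |
| PGD | Maximum iterations | max_iter | 10 |
| PGD | Perturbation threshold | eps | 0.3 |
| PGD | Gradient perturbation step size | eps_iter | 0.03 |

Table 4 Traffic identification performance of the models on the two datasets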
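
H: header only; P: payload only; HP: header and payload.

| Dataset | Traffic type | Precision (H) | Precision (P) | Precision (HP) | Recall (H) | Recall (P) | Recall (HP) | F1 score (H) | F1 score (P) | F1 score (HP) | Accuracy (H) | Accuracy (P) | Accuracy (HP) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ISCXVPN2016 | Chat | 0.92 | 0.63 | 0.90 | 0.94 | 0.50 | 0.95 | 0.93 | 0.59 | 0.93 | 0.94 | 0.55 | 0.95 |
| ISCXVPN2016 | Email | 0.92 | 0.86 | 0.96 | 0.88 | 0.52 | 0.83 | 0.90 | 0.65 | 0.89 | 0.88 | 0.52 | 0.83 |
| ISCXVPN2016 | Filetransfer | 0.99 | 0.54 | 0.99 | 0.99 | 0.94 | 1.00 | 0.99 | 0.69 | 0.99 | 0.99 | 0.94 | 0.99 |
| ISCXVPN2016 | P2P | 1.00 | 0.91 | 1.00 | 1.00 | 0.60 | 1.00 | 1.00 | 0.73 | 1.00 | 1.00 | 0.60 | 1.00 |
| ISCXVPN2016 | Streaming | 0.99 | 0.53 | 0.99 | 1.00 | 0.29 | 1.00 | 0.99 | 0.38 | 0.99 | 1.00 | 0.29 | 1.00 |
| ISCXVPN2016 | VoIP | 0.99 | 0.81 | 0.98 | 0.97 | 0.51 | 0.98 | 0.98 | 0.63 | 0.98 | 0.97 | 0.51 | 0.98 |
| ISCXTor2016 | Chat | 0.88 | 0.84 | 0.64 | 0.53 | 0.24 | 0.67 | 0.67 | 0.38 | 0.65 | 0.50 | 0.37 | 0.81 |
| ISCXTor2016 | Email | 0.98 | 0.98 | 0.97 | 0.90 | 0.55 | 0.83 | 0.94 | 0.70 | 0.89 | 0.95 | 0.90 | 0.96 |
| ISCXTor2016 | Filetransfer | 0.99 | 1.00 | 0.99 | 0.98 | 0.71 | 0.86 | 0.98 | 0.83 | 0.92 | 0.98 | 0.94 | 0.98 |
| ISCXTor2016 | P2P | 0.95 | 0.87 | 0.90 | 0.97 | 0.89 | 0.97 | 0.96 | 0.88 | 0.94 | 0.97 | 0.97 | 0.99 |
| ISCXTor2016 | Streaming | 0.85 | 0.55 | 0.85 | 0.97 | 0.89 | 0.97 | 0.91 | 0.68 | 0.91 | 0.97 | 0.96 | 0.98 |
| ISCXTor2016 | VoIP | 0.88 | 0.81 | 0.86 | 0.79 | 0.82 | 0.84 | 0.83 | 0.81 | 0.85 | 0.81 | 0.86 | 0.92 |

Table 5 Average share of relevance scores attributed to the header and payload for each traffic type in the two datasets under the LRP and DTD interpretability methods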
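
| Method | Dataset | Category | Chat | Email | Filetransfer | P2P | Streaming | VoIP |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| LRP | ISCXVPN2016 | H | 0.89 | 0.95 | 0.88 | 0.87 | 0.86 | 0.89 |
| LRP | ISCXVPN2016 | P | 0.11 | 0.05 | 0.12 | 0.13 | 0.14 | 0.11 |
| LRP | ISCXTor2016 | H | 0.72 | 0.62 | 0.57 | 0.75 | 0.57 | 0.82 |
| LRP | ISCXTor2016 | P | 0.28 | 0.38 | 0.43 | 0.25 | 0.43 | 0.18 |
| DTD | ISCXVPN2016 | H | 0.88 | 0.82 | 0.97 | 0.84 | 0.88 | 0.88 |
| DTD | ISCXVPN2016 | P | 0.12 | 0.18 | 0.03 | 0.16 | 0.12 | 0.12 |
| DTD | ISCXTor2016 | H | 0.76 | 0.70 | 0.59 | 0.72 | 0.75 | 0.83 |
| DTD | ISCXTor2016 | P | 0.24 | 0.30 | 0.41 | 0.28 | 0.25 | 0.17 |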
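The H and P rows of Table 5 sum to 1 for every traffic type, so each entry can be read as the fraction of total relevance that falls on header or payload positions. The following is a minimal sketch of that aggregation. It assumes per-byte relevance scores from LRP or DTD are already available as a list, that absolute values are used, and that shares are normalised per sample before averaging. The paper does not state these details, and the function names and the `header_len` argument are hypothetical.

```python
def header_payload_share(relevance, header_len):
    """Split one sample's per-byte relevance into header and payload shares.

    relevance  : list of per-position relevance scores (e.g. from LRP or DTD)
    header_len : number of leading positions that belong to the header
    """
    # Use magnitudes so that positive and negative relevance do not cancel
    # (an assumption of this sketch).
    magnitudes = [abs(r) for r in relevance]
    total = sum(magnitudes)
    if total == 0:
        raise ValueError("relevance vector is all zeros")
    header = sum(magnitudes[:header_len]) / total
    return header, 1.0 - header


def mean_share(samples, header_len):
    """Average the per-sample header share over all samples of one class."""
    shares = [header_payload_share(r, header_len)[0] for r in samples]
    header = sum(shares) / len(shares)
    return header, 1.0 - header
```

For instance, a sample with relevance `[0.4, -0.4, 0.1, 0.1]` and `header_len=2` puts roughly 0.8 of its relevance on the header and 0.2 on the payload.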