A Network Traffic Anomaly Detection Method Integrating Unsupervised Adaptive Sampling with Enhanced Siamese Network
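Abstract: Traditional machine-learning-based methods for network traffic anomaly detection perform poorly when traffic data are class-imbalanced. To address this problem, this paper proposes a network traffic anomaly detection method that combines unsupervised adaptive sampling with an improved Siamese network. First, a K-medoids-based Adaptive Few-shot Sampling (KAFS) algorithm is designed. It uses unsupervised clustering to dynamically and adaptively draw a small number of more representative samples from each traffic class, which balances normal and attack traffic and improves the quality of the data used to train the few-shot learning model. Then, a Siamese Multi-Layer Perceptron (SMLP) model with a robust loss function is built for traffic anomaly detection. The model trains two multi-layer perceptrons of identical structure on paired traffic samples from the training set, capturing nonlinear relationships across traffic features and learning the similarities and differences in traffic data, which further improves classification accuracy for attack traffic. Experimental results show that the proposed method achieves detection accuracies of up to 99.80% and 98.26% on the CICIDS2017 and CICIDS2018 datasets, respectively. Compared with other methods, its detection rate for unknown attacks is higher by at least 2.85% and 1.73%, respectively, effectively improving traffic anomaly detection performance.
Abstract: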
Objective: The increasing complexity of network architectures and the rising frequency of cyberattacks have heightened the demand for effective network traffic anomaly detection. Although machine learning and deep learning approaches have been widely applied, their effectiveness is often limited by the class imbalance commonly observed in network traffic data. To address this limitation, this study proposes a network traffic anomaly detection method integrating unsupervised adaptive sampling with an enhanced Siamese network. An adaptive sampling algorithm is developed to balance the distribution of normal and anomalous traffic, improving the representativeness of training data. A Siamese Multi-Layer Perceptron (SMLP) model is then trained using a robust loss function to capture both similarities and differences in traffic patterns. This architecture enhances the model’s ability to identify anomalies, particularly under class-imbalance conditions. The proposed framework provides a scalable and data-efficient approach for improving the accuracy of network anomaly detection and reinforcing cybersecurity.

Methods: The proposed K-medoids-based Adaptive Few-shot Sampling (KAFS) algorithm applies unsupervised K-medoids clustering to group the traffic data within each category by feature distribution, forming multiple subclasses. From these subclasses, a small number of representative samples are adaptively selected to construct a balanced few-shot training set. This approach maintains a proportionate representation of normal and attack traffic, reducing model bias toward the dominant normal class and ensuring more effective learning across categories. Sample quality is further improved by prioritizing representativeness during selection. For the constructed training set, a traffic anomaly detection model based on an SMLP is designed. The model’s loss function combines the encoding loss from the MLP with a prediction loss defined by the distance between anchor samples and the corresponding normal or malicious samples. This structure enables the model to distinguish both similarities and subtle differences in traffic behavior, thereby enhancing the accuracy of attack traffic detection.

Results and Discussions: The proposed network traffic anomaly detection method, which integrates unsupervised adaptive sampling with an enhanced Siamese network, demonstrates strong performance on the CICIDS2017 and CICIDS2018 datasets. As shown in Fig. 8, the SMLP model trained using traffic samples selected by the KAFS sampling algorithm achieves superior detection performance, confirming the effectiveness of the KAFS approach. In Fig. 9, detection accuracies of 99.80% and 98.26% are achieved for attack-class traffic in the CICIDS2017 and CICIDS2018 datasets, respectively. Evaluation metrics presented in Fig. 9 and Fig. 10 show that the proposed method consistently outperforms other Siamese network architectures and loss functions in terms of accuracy, precision, recall, and F1-score, further supporting the validity of the SMLP design. As shown in Tables 5 and 7, the method attains detection performance comparable to that of state-of-the-art algorithms while using substantially fewer samples, highlighting its suitability for practical deployment where data availability may be limited. Statistical analysis of the results (Tables 6 and 9) confirms that the performance gains achieved by the proposed method are statistically significant. Fig. 11 and Fig. 12 further illustrate that the method delivers notable improvements over existing approaches in detecting unknown attack types, demonstrating its adaptability and robustness under evolving threat conditions.

Conclusions: This study addresses the challenges of sparse attack traffic and class imbalance in network traffic anomaly detection by proposing a method that combines unsupervised adaptive sampling with an enhanced Siamese network. A KAFS algorithm is designed to dynamically select representative training sets using unsupervised clustering. To enable accurate detection with limited input samples, an SMLP is developed to compute distances between traffic samples. A robust loss function is introduced, incorporating both the encoding loss from the MLP and a prediction loss based on the distance between anchor, normal, and malicious samples, thereby improving training efficiency. Experimental validation using the CICIDS2017 and CICIDS2018 datasets confirms the method’s effectiveness in detecting attack traffic with few samples. Future work will focus on further enhancing few-shot intrusion detection techniques to improve detection accuracy in real-world network environments.
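The sampling step described in the Methods can be sketched as follows. This is a minimal illustration in plain Python, not the authors' KAFS implementation: the function names `k_medoids` and `few_shot_sample`, the Euclidean distance, the random initialization, and the per-class cluster counts passed in `k_per_class` are all assumptions made for the example. In particular, the abstract does not state how KAFS adapts the number of clusters or samples per class, so the sketch takes those counts as inputs and simply returns each cluster's medoid as its representative sample.

```python
import math
import random


def k_medoids(points, k, n_iter=50, seed=0):
    """Alternating K-medoids on a list of feature tuples; returns medoid indices."""
    rng = random.Random(seed)
    n = len(points)
    # Pairwise Euclidean distances between all samples.
    dist = [[math.dist(p, q) for q in points] for p in points]
    medoids = rng.sample(range(n), k)
    for _ in range(n_iter):
        # Assignment step: attach every sample to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i in range(n):
            nearest = min(medoids, key=lambda m: dist[i][m])
            clusters[nearest].append(i)
        # Update step: the new medoid minimizes total distance within its cluster.
        new_medoids = []
        for m, members in clusters.items():
            if not members:
                new_medoids.append(m)  # keep the medoid of an empty cluster
                continue
            new_medoids.append(
                min(members, key=lambda c: sum(dist[c][j] for j in members))
            )
        if set(new_medoids) == set(medoids):
            break  # converged: medoids no longer change
        medoids = new_medoids
    return sorted(medoids)


def few_shot_sample(features, labels, k_per_class):
    """Return indices of the per-class medoids, used as the few-shot training set.

    k_per_class is a hypothetical input: {class label: number of medoids}.
    """
    selected = []
    for label, k in k_per_class.items():
        idx = [i for i, y in enumerate(labels) if y == label]
        k = min(k, len(idx))  # a class cannot yield more medoids than samples
        local = k_medoids([features[i] for i in idx], k)
        selected.extend(idx[j] for j in local)
    return sorted(selected)


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic imbalanced data: 300 "normal" and 25 "attack" flows, 4 features each.
    normal = [tuple(rng.gauss(0.0, 1.0) for _ in range(4)) for _ in range(300)]
    attack = [tuple(rng.gauss(4.0, 1.0) for _ in range(4)) for _ in range(25)]
    features = normal + attack
    labels = ["normal"] * 300 + ["attack"] * 25
    picked = few_shot_sample(features, labels, {"normal": 8, "attack": 8})
    print(len(picked), sum(labels[i] == "attack" for i in picked))
```

Choosing medoids rather than centroids means every selected point is a real traffic record, which is what a few-shot training set needs. Requesting the same number of medoids for both classes gives a balanced set even though the raw data are heavily skewed toward normal traffic.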
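Table 1 Distribution of training data in the CICIDS2017 dataset
| Category | Type | Samples |
|---|---|---|
| Normal traffic | Normal traffic | 43 |
| Attack traffic | DDoS | 32 |
| Attack traffic | DoS GoldenEye | 17 |
| Attack traffic | PortScan | 25 |
| Attack traffic | Bot | 16 |
| Attack traffic | FTP-Patator | 24 |
| Attack traffic | SSH-Patator | 20 |

Table 2 Distribution of training data in the CICIDS2018 dataset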
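| Category | Type | Samples |
|---|---|---|
| Normal traffic | Normal traffic | 29 |
| Attack traffic | Bot | 42 |
| Attack traffic | Bruteforce | 12 |
| Attack traffic | DoS | 25 |
| Attack traffic | Infiltration | 20 |

Table 3 Comparison of the structure and detection results of advanced methods addressing class imbalance in traffic anomaly detection (%)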
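| Method | Sampling method | K setting | Model | Loss | Accuracy | Detection rate | Precision | F1-score |
|---|---|---|---|---|---|---|---|---|
| FC-Net | Random sampling | Fixed and identical | CNN and DNN | Mean squared error | 95.67 | 95.28 | 94.32 | 94.80 |
| FS-IDS | Random sampling | Fixed and identical | AE+CNN | Mean squared error | 97.71 | 96.56 | 97.88 | 97.22 |
| LIO-IDS | Oversampling | Fixed and identical | LSTM + I-OVO | Categorical cross-entropy | 97.56 | 99.24 | 95.08 | 97.11 |
| Proposed method | KAFS | Adaptive | SMLP | Binary cross-entropy | 99.80 | 99.61 | 99.90 | 99.75 |

Table 4 Comparison of multi-class precision and detection rate of different methods (%)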
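| Type | FC-Net precision | FC-Net detection rate | FS-IDS precision | FS-IDS detection rate | Proposed method precision | Proposed method detection rate |
|---|---|---|---|---|---|---|
| DDoS | 98.45 | 97.62 | 98.36 | 97.97 | 99.58 | 99.87 |
| DoS GoldenEye | 89.77 | 99.52 | 96.07 | 99.28 | 99.94 | 99.54 |
| PortScan | 85.46 | 99.77 | 92.86 | 99.72 | 99.08 | 99.84 |
| Bot | 98.32 | 98.73 | 97.98 | 96.78 | 99.14 | 99.91 |
| FTP-Patator | 99.24 | 99.34 | 99.71 | 99.59 | 99.82 | 100.00 |
| SSH-Patator | 99.49 | 100.00 | 99.61 | 99.92 | 99.99 | 99.77 |

Table 5 Comparison of detection performance between the proposed method and advanced methods on CICIDS2017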
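| Method | Model structure | Samples | Accuracy (%) | Precision (%) | Detection rate (%) | F1-score (%) |
|---|---|---|---|---|---|---|
| VFBLS | Multi-feature broad learning system | 21942 | 97.39 | 96.90 | 97.60 | 97.25 |
| HNN | CNN/LSTM+DNN | 225745 | 99.84 | 99.98 | 99.13 | 99.55 |
| DEIL-RVM | Dynamic ensemble relevance vector machine | 2830696 | 99.56 | 99.44 | 99.41 | 99.42 |
| FCL-SBLS | Federated continuous learning and stacked broad learning system | 2264557 | 95.28 | 94.14 | 94.30 | 94.22 |
| Proposed method | KAFS and SMLP | 177 | 99.80 | 99.90 | 99.61 | 99.75 |

Table 6 Statistical significance levels of the detection performance differences between existing advanced methods and the proposed method on the CICIDS2017 dataset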
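| Method | Accuracy | Precision | Detection rate | F1-score |
|---|---|---|---|---|
| VFBLS | 1.60e-11 | 8.20e-13 | 1.58e-10 | 6.61e-11 |
| HNN | 0.20 | 2.30e-02 | 7.65e-05 | 2.49e-02 |
| DEIL-RVM | 1.48e-04 | 5.48e-06 | 2.36e-02 | 5.04e-04 |
| FCL-SBLS | 5.98e-14 | 3.89e-15 | 1.26e-13 | 5.15e-14 |

Table 7 Comparison of detection results between the proposed method and advanced methods on CICIDS2018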
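| Method | Samples | Accuracy (%) | Precision (%) | Detection rate (%) | F1-score (%) |
|---|---|---|---|---|---|
| DSSTE+miniVGGNeT | 154034 | 97.26 | 94.46 | 95.18 | 94.82 |
| ICVAE-BSM | 1628599 | 97.83 | 96.30 | 95.42 | 95.86 |
| FL-IIDS | 75955 | 98.21 | 96.35 | 96.27 | 96.31 |
| Proposed method | 141 | 98.26 | 96.94 | 96.44 | 96.68 |

Table 8 Comparison of multi-class precision and detection rate with advanced methods on CICIDS2018 (%)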
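| Type | ICVAE-BSM precision | ICVAE-BSM detection rate | DSSTE+miniVGGNeT precision | DSSTE+miniVGGNeT detection rate | FL-IIDS precision | FL-IIDS detection rate | Proposed method precision | Proposed method detection rate |
|---|---|---|---|---|---|---|---|---|
| Bot | 91.04 | 94.56 | 89.11 | 95.36 | 96.14 | 99.65 | 95.82 | 99.70 |
| Bruteforce | 92.29 | 94.37 | 89.08 | 95.24 | 92.36 | 100.00 | 93.86 | 99.61 |
| DoS | 92.52 | 93.81 | 91.99 | 92.59 | 97.16 | 98.40 | 97.51 | 95.87 |
| Infiltration | 87.72 | 93.77 | 87.38 | 93.08 | 93.91 | 82.06 | 95.65 | 95.39 |

Table 9 Statistical significance levels of the detection performance differences between existing advanced methods and the proposed method on the CICIDS2018 dataset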
| Method | Accuracy | Precision | Detection rate | F1-score |
|---|---|---|---|---|
| ICVAE-BSM | 2.99e-05 | 5.15e-06 | 2.30e-06 | 4.91e-07 |
| DSSTE+miniVGGNeT | 6.18e-08 | 1.02e-11 | 2.42e-10 | 3.34e-10 |
| FL-IIDS | 6.27e-03 | 1.33e-08 | 4.19e-06 | 5.82e-04 |
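The four metrics reported in the tables above are related in a way that can be checked directly. The sketch below assumes the standard binary-classification definitions, with "detection rate" taken to mean recall; the text here does not define the metrics, so that reading is an assumption, as are the helper names `f1_score` and `detection_metrics`. It recomputes the F1-score of each Table 3 row from the published precision and detection rate.

```python
def f1_score(precision, detection_rate):
    """Harmonic mean of precision and detection rate (recall), in the inputs' units."""
    return 2 * precision * detection_rate / (precision + detection_rate)


def detection_metrics(tp, fp, tn, fn):
    """Accuracy, precision, detection rate and F1 (all in %) from confusion counts."""
    accuracy = 100 * (tp + tn) / (tp + fp + tn + fn)
    precision = 100 * tp / (tp + fp)
    detection_rate = 100 * tp / (tp + fn)
    return accuracy, precision, detection_rate, f1_score(precision, detection_rate)


# (precision %, detection rate %, reported F1 %) copied from Table 3.
TABLE_3 = {
    "FC-Net": (94.32, 95.28, 94.80),
    "FS-IDS": (97.88, 96.56, 97.22),
    "LIO-IDS": (95.08, 99.24, 97.11),
    "Proposed method": (99.90, 99.61, 99.75),
}

for name, (p, r, reported) in TABLE_3.items():
    print(f"{name}: recomputed F1 = {f1_score(p, r):.2f}, reported = {reported:.2f}")
```

The recomputed values agree with the reported F1-scores to within about 0.01, which is the precision of the published inputs, so the Table 3 rows are internally consistent under these definitions.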

[1] PAN Chengsheng, LI Zhixiang, YANG Wensheng, et al. Anomaly detection method of network traffic based on secondary feature extraction and BiLSTM-attention[J]. Journal of Electronics & Information Technology, 2023, 45(12): 4539–4547. doi: 10.11999/JEIT221296.
[2] GUPTA N, JINDAL V, and BEDI P. CSE-IDS: Using cost-sensitive deep learning and ensemble algorithms to handle class imbalance in network-based intrusion detection systems[J]. Computers & Security, 2022, 112: 102499. doi: 10.1016/j.cose.2021.102499.
[3] LEEVY J L and KHOSHGOFTAAR T M. A survey and analysis of intrusion detection models based on CSE-CIC-IDS2018 Big Data[J]. Journal of Big Data, 2020, 7(1): 104. doi: 10.1186/s40537-020-00382-x.
[4] HE Xiaoqiang, CHEN Qianbin, TANG Lun, et al. Federated continuous learning based on stacked broad learning system assisted by digital twin networks: An incremental learning approach for intrusion detection in UAV networks[J]. IEEE Internet of Things Journal, 2023, 10(22): 19825–19838. doi: 10.1109/jiot.2023.3282648.
[5] WU Zhijun, GAO Pan, CUI Lei, et al. An incremental learning method based on dynamic ensemble RVM for intrusion detection[J]. IEEE Transactions on Network and Service Management, 2022, 19(1): 671–685. doi: 10.1109/tnsm.2021.3102388.
[6] LI Zhida, RIOS A L G, and TRAJKOVIĆ L. Machine learning for detecting anomalies and intrusions in communication networks[J]. IEEE Journal on Selected Areas in Communications, 2021, 39(7): 2254–2264. doi: 10.1109/jsac.2021.3078497.
[7] LEI Shengwei, XIA Chunhe, LI Zhong, et al. HNN: A novel model to study the intrusion detection based on multi-feature correlation and temporal-spatial analysis[J]. IEEE Transactions on Network Science and Engineering, 2021, 8(4): 3257–3274. doi: 10.1109/tnse.2021.3109644.
[8] JIN Zhigang, ZHOU Junyi, LI Bing, et al. FL-IIDS: A novel federated learning-based incremental intrusion detection system[J]. Future Generation Computer Systems, 2024, 151: 57–70. doi: 10.1016/j.future.2023.09.019.
[9] RESENDE P A A and DRUMMOND A C. A survey of random forest-based methods for intrusion detection systems[J]. ACM Computing Surveys, 2019, 51(3): 48. doi: 10.1145/3178582.
[10] SHAO Ling, WU Di, and LI Xuelong. Learning deep and wide: A spectral method for learning deep networks[J]. IEEE Transactions on Neural Networks and Learning Systems, 2014, 25(12): 2303–2308. doi: 10.1109/TNNLS.2014.2308519.
[11] TANG Hong, LIU Dan, YAO Lishuang, et al. Feature selection algorithm for class imbalanced internet traffic[J]. Journal of Electronics & Information Technology, 2021, 43(4): 923–930. doi: 10.11999/JEIT190992.
[12] TELIKANI A, GANDOMI A H, CHOO K K R, et al. A cost-sensitive deep learning-based approach for network traffic classification[J]. IEEE Transactions on Network and Service Management, 2022, 19(1): 661–670. doi: 10.1109/tnsm.2021.3112283.
[13] GUPTA N, JINDAL V, and BEDI P. LIO-IDS: Handling class imbalance using LSTM and improved one-vs-one technique in intrusion detection system[J]. Computer Networks, 2021, 192: 108076. doi: 10.1016/j.comnet.2021.108076.
[14] LIU Lan, WANG Pengcheng, LIN Jun, et al. Intrusion detection of imbalanced network traffic based on machine learning and deep learning[J]. IEEE Access, 2021, 9: 7550–7563. doi: 10.1109/ACCESS.2020.3048198.
[15] ZHANG Ying and LIU Qiang. On IoT intrusion detection based on data augmentation for enhancing learning on unbalanced samples[J]. Future Generation Computer Systems, 2022, 133: 213–227. doi: 10.1016/j.future.2022.03.007.
[16] BALASUBRAMANIAM S, VIJESH JOE C, SIVAKUMAR T A, et al. Optimization enabled deep learning-based DDoS attack detection in cloud computing[J]. International Journal of Intelligent Systems, 2023, 2023: 2039217. doi: 10.1155/2023/2039217.
[17] LAKE B M and BARONI M. Human-like systematic generalization through a meta-learning neural network[J]. Nature, 2023, 623(7985): 115–121. doi: 10.1038/s41586-023-06668-3.
[18] KUMAR V and SINHA D. Synthetic attack data generation model applying generative adversarial network for intrusion detection[J]. Computers & Security, 2023, 125: 103054. doi: 10.1016/j.cose.2022.103054.
[19] YAN Mi, HUI S C, and LI Ning. DML-PL: Deep metric learning based pseudo-labeling framework for class imbalanced semi-supervised learning[J]. Information Sciences, 2023, 626: 641–657. doi: 10.1016/j.ins.2023.01.074.
[20] YAN Fei, LI Nianqiao, ILIYASU A M, et al. Insights into security and privacy issues in smart healthcare systems based on medical images[J]. Journal of Information Security and Applications, 2023, 78: 103621. doi: 10.1016/j.jisa.2023.103621.
[21] XU Congyuan, SHEN Jizhong, and DU Xin. A method of few-shot network intrusion detection based on meta-learning framework[J]. IEEE Transactions on Information Forensics and Security, 2020, 15: 3540–3552. doi: 10.1109/tifs.2020.2991876.
[22] YANG Jingcheng, LI Hongwei, SHAO Shuo, et al. FS-IDS: A framework for intrusion detection based on few-shot learning[J]. Computers & Security, 2022, 122: 102899. doi: 10.1016/j.cose.2022.102899.
[23] SHARAFALDIN I, LASHKARI A H, and GHORBANI A A. Toward generating a new intrusion detection dataset and intrusion traffic characterization[C]. The 4th International Conference on Information Systems Security and Privacy, Funchal, Portugal, 2018: 108–116. doi: 10.5220/0006639801080116.