A Network Traffic Anomaly Detection Method Integrating Unsupervised Adaptive Sampling with Enhanced Siamese Network

YIN Zinuo; CHEN Hongchang; MA Hailong; HU Tao; BAI Luxin

doi:10.11999/JEIT241115

cstr: 32379.14.JEIT241115


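Details

Author biographies:
YIN Zinuo: female, PhD candidate. Research interests include cyberspace security and network traffic anomaly detection.

CHEN Hongchang: male, professor. Research interests include communication and information systems, data science, and artificial intelligence.

MA Hailong: male, professor. Research interests include endogenous cyberspace security technology, intelligent detection of network threats, and novel network architectures.

HU Tao: male, assistant researcher. Research interests include novel network architectures.

BAI Luxin: male, master's student. Research interests include satellite Internet, software-defined networking, and network security.

Corresponding author:
HU Tao, hutaondsc@163.com

CLC number: TN915.08; TP393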
Publication history
- Received: 2024-12-19
- Revised: 2025-05-14
- Published online: 2025-06-03
- Issue date: 2025-07-22

A Network Traffic Anomaly Detection Method Integrating Unsupervised Adaptive Sampling with Enhanced Siamese Network

YIN Zinuo¹, CHEN Hongchang¹, MA Hailong¹,², HU Tao¹, BAI Luxin¹

1. Information Technology Institute, Information Engineering University, Zhengzhou 450001, China
2. Key Laboratory of Cyberspace Security, Ministry of Education of China, Zhengzhou 450001, China

Funds: Xiong’an New Area Science and Technology Innovation Special Project (2022XAGG0111), The National Natural Science Foundation of China (62176264)

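Abstract: Network traffic anomaly detection methods based on traditional machine learning perform poorly when traffic data are class-imbalanced. To address this problem, this paper proposes a network traffic anomaly detection method that combines unsupervised adaptive sampling with an improved Siamese network. First, a K-medoids-based Adaptive Few-shot Sampling (KAFS) algorithm is designed. It uses unsupervised clustering to dynamically and adaptively draw a small number of more representative samples from each traffic class, which balances normal and attack traffic and improves the quality of the data used to train the few-shot learning model. Then, a Siamese Multi-Layer Perceptron (SMLP) model with a robust loss function is constructed for traffic anomaly detection. The model trains two structurally identical multi-layer perceptrons on pairs of traffic samples from the training set, capturing nonlinear relationships across traffic features and learning the similarities and differences in traffic data, which further improves classification accuracy for attack traffic. Experimental results show that the proposed method achieves detection accuracies of 99.80% and 98.26% on the CICIDS2017 and CICIDS2018 datasets, respectively. Compared with other methods, it improves the detection rate for unknown attacks by at least 2.85% and 1.73%, respectively, effectively improving traffic anomaly detection performance.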
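
The sampling step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the source does not state how KAFS chooses the number of clusters or how many samples it draws per subclass, so the stopping rule in `adaptive_sample` (the smallest k whose clustering cost falls to at most `tol` times the single-cluster cost), the parameters `k_max` and `tol`, the Euclidean distance, and the toy data are all assumptions introduced here.

```python
import numpy as np


def k_medoids(X, k, n_iter=50, seed=0):
    """Alternating K-medoids on the samples of one traffic class.

    Returns (medoid indices, total distance of samples to their nearest medoid).
    """
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances between all samples.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    medoids = rng.choice(len(X), size=k, replace=False)
    for _ in range(n_iter):
        labels = D[:, medoids].argmin(axis=1)
        updated = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:
                # New medoid: the member with the smallest total distance
                # to the other members of its cluster.
                totals = D[np.ix_(members, members)].sum(axis=1)
                updated[c] = members[totals.argmin()]
        if np.array_equal(updated, medoids):
            break
        medoids = updated
    cost = D[:, medoids].min(axis=1).sum()
    return medoids, cost


def adaptive_sample(X, k_max=5, tol=0.5, seed=0):
    """Assumed rule: take the smallest k whose cost is <= tol * cost(k=1),
    then return the medoids as the representative samples of this class."""
    k_cap = min(k_max, len(X))
    base_cost = k_medoids(X, 1, seed=seed)[1]
    for k in range(1, k_cap + 1):
        medoids, cost = k_medoids(X, k, seed=seed)
        if cost <= tol * base_cost or k == k_cap:
            return X[medoids]


# Toy data (hypothetical): one traffic class with two behavioural modes.
rng = np.random.default_rng(42)
toy_class = np.vstack([
    rng.normal(0.0, 0.1, size=(20, 2)),
    rng.normal(10.0, 0.1, size=(20, 2)),
])
representatives = adaptive_sample(toy_class)
print(representatives.shape)
```

Applying `adaptive_sample` to each traffic class separately would give a small per-class sample set. Because medoids are actual data points, every selected sample is a real flow record rather than a synthetic average.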
Abstract:
Objective: The growing complexity of network architectures and the rising frequency of cyberattacks have increased the demand for effective network traffic anomaly detection. Machine learning and deep learning approaches are widely applied, but their effectiveness is often limited by the class imbalance common in network traffic data. To address this limitation, this study proposes a network traffic anomaly detection method that integrates unsupervised adaptive sampling with an enhanced Siamese network. An adaptive sampling algorithm balances the distribution of normal and anomalous traffic and improves the representativeness of the training data. A Siamese Multi-Layer Perceptron (SMLP) model is then trained with a robust loss function to capture both similarities and differences in traffic patterns. This architecture strengthens the model’s ability to identify anomalies, particularly under class imbalance. The proposed framework offers a scalable, data-efficient approach to improving the accuracy of network anomaly detection and reinforcing cybersecurity.
Methods: The proposed K-medoids-based Adaptive Few-shot Sampling (KAFS) algorithm applies unsupervised K-medoids clustering to group the traffic data within each category by feature distribution, forming multiple subclasses. From these subclasses, a small number of representative samples are adaptively selected to construct a balanced few-shot training set. This keeps normal and attack traffic proportionately represented, reduces model bias toward the dominant normal class, and supports more effective learning across categories. Sample quality is further improved by prioritizing representativeness during selection. For the constructed training set, a traffic anomaly detection model based on an SMLP is designed. The model’s loss function combines the encoding loss from the MLP with a prediction loss defined by the distance between anchor samples and the corresponding normal or malicious samples. This structure enables the model to distinguish both similarities and subtle differences in traffic behavior, thereby improving the accuracy of attack traffic detection.
Results and Discussions: The proposed method demonstrates strong performance on the CICIDS2017 and CICIDS2018 datasets. As shown in Fig. 8, the SMLP model trained on traffic samples selected by the KAFS algorithm achieves superior detection performance, confirming the effectiveness of KAFS. In Fig. 9, detection accuracies of 99.80% and 98.26% are achieved for attack-class traffic in the CICIDS2017 and CICIDS2018 datasets, respectively. The evaluation metrics in Fig. 9 and Fig. 10 show that the proposed method consistently outperforms other Siamese network architectures and loss functions in accuracy, precision, Detection Rate (DR), and F1-score, further supporting the SMLP design. As shown in Tables 5 and 7, the method attains detection performance comparable to that of state-of-the-art algorithms while using substantially fewer samples, which makes it suitable for practical deployment where data availability is limited. Statistical analysis of the results (Tables 6 and 9) indicates that the performance gains achieved by the proposed method are statistically significant. Fig. 11 and Fig. 12 further show that the method delivers notable improvements over existing approaches in detecting unknown attack types, demonstrating its adaptability and robustness under evolving threat conditions.
Conclusions: This study addresses the challenges of sparse attack traffic and class imbalance in network traffic anomaly detection by proposing a method that combines unsupervised adaptive sampling with an enhanced Siamese network. The KAFS algorithm dynamically selects representative training sets using unsupervised clustering. To enable accurate detection with limited input samples, an SMLP is developed to compute distances between traffic samples. A robust loss function is introduced that incorporates both the encoding loss from the MLP and a prediction loss based on the distances between anchor, normal, and malicious samples, thereby improving training efficiency. Experimental validation on the CICIDS2017 and CICIDS2018 datasets confirms the method’s effectiveness in detecting attack traffic with few samples. Future work will focus on further enhancing few-shot intrusion detection techniques to improve detection accuracy in real-world network environments.
Keywords:
- Network traffic anomaly detection
- Class imbalance
- Adaptive sampling
- Siamese Multi-Layer Perceptron (SMLP)
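
The Siamese arrangement summarized in the abstract, two structurally identical MLP branches scoring a pair of traffic samples by the distance between their embeddings, can be sketched as a forward pass. This is an illustrative sketch under stated assumptions, not the published SMLP: the layer widths, the He initialization, the `exp(-distance)` similarity, and the random inputs are hypothetical choices made here. Only the pairwise prediction term is shown, using binary cross-entropy as listed for the proposed method in Table 3. The encoding-loss term is omitted because its exact form is not given on this page.

```python
import numpy as np


def init_mlp(sizes, seed=0):
    """He-initialised weights for an MLP with layer widths `sizes`."""
    rng = np.random.default_rng(seed)
    return [
        (rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def encode(params, x):
    """Shared encoder: ReLU hidden layers followed by a linear embedding layer."""
    h = x
    for i, (W, b) in enumerate(params):
        h = h @ W + b
        if i < len(params) - 1:
            h = np.maximum(h, 0.0)
    return h


def pair_similarity(params, x1, x2):
    """Both branches reuse the same `params`; the score is exp(-distance)."""
    distance = np.linalg.norm(encode(params, x1) - encode(params, x2), axis=-1)
    return np.exp(-distance)


def pair_bce(similarity, same_class, eps=1e-7):
    """Binary cross-entropy between the similarity score and the pair label."""
    p = np.clip(similarity, eps, 1.0 - eps)
    return float(np.mean(-(same_class * np.log(p) + (1 - same_class) * np.log(1 - p))))


# Hypothetical batch: 4 pairs of 8-dimensional flow feature vectors.
rng = np.random.default_rng(1)
params = init_mlp([8, 16, 4])
x1 = rng.normal(size=(4, 8))
x2 = rng.normal(size=(4, 8))
same_class = np.array([1.0, 0.0, 1.0, 0.0])  # 1 = both flows share a class
loss = pair_bce(pair_similarity(params, x1, x2), same_class)
print(loss)
```

Sharing one parameter set between the two branches means that a pair of identical inputs always receives a similarity of 1, so the loss only has to shape the distances between different samples.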

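Fig. 1 Kernel density plots of four typical features of normal and attack traffic in the CICIDS2017 dataset

Fig. 2 Kernel density plots of four typical features of normal and attack traffic in the CICIDS2018 dataset

Fig. 3 Structure of the network traffic anomaly detection method combining unsupervised adaptive sampling with an improved Siamese network

Fig. 4 KAFS algorithm

Fig. 5 SMLP model

Fig. 6 Loss curves of the SMLP model on the CICIDS2017 and CICIDS2018 datasets

Fig. 7 Contribution of flow statistical features to model detection performance on the CICIDS2017 and CICIDS2018 datasets

Fig. 8 Comparison of detection results for different sampling methods

Fig. 9 Detection performance of different Siamese networks on the CICIDS2017 and CICIDS2018 datasets

Fig. 10 Detection performance of an MLP with standard loss versus the SMLP with fused encoding and prediction losses on the CICIDS2017 and CICIDS2018 datasets

Fig. 11 Detection performance of different detection methods on unknown attacks in the CICIDS2017 dataset

Fig. 12 Detection performance of different detection methods on unknown attacks in the CICIDS2018 dataset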

Table 1 Training data distribution for the CICIDS2017 dataset

Category	Normal traffic	Attack traffic					
Type	Normal traffic	DDoS	DoS GoldenEye	PortScan	Bot	FTP-Patator	SSH-Patator
Number of samples	43	32	17	25	16	24	20

Table 2 Training data distribution for the CICIDS2018 dataset

Category	Normal traffic	Attack traffic			
Type	Normal traffic	Bot	Bruteforce	DoS	Infiltration
Number of samples	29	42	12	25	20

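Table 3 Comparison of the structure and detection results of advanced methods addressing class imbalance in traffic anomaly detection (%)

Method	Sampling method	K value setting	Model	Loss	Accuracy	Detection rate	Precision	F1-score
FC-Net	Random sampling	Fixed and identical	CNN and DNN	Mean squared error	95.67	95.28	94.32	94.80
FS-IDS	Random sampling	Fixed and identical	AE+CNN	Mean squared error	97.71	96.56	97.88	97.22
LIO-IDS	Oversampling	Fixed and identical	LSTM + I-OVO	Categorical cross-entropy	97.56	99.24	95.08	97.11
Proposed method	KAFS	Adaptive	SMLP	Binary cross-entropy	99.80	99.61	99.90	99.75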
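
The four metrics reported in Table 3 can be related to each other with a short sketch. It assumes the standard confusion-matrix definitions and treats detection rate as recall. The page itself does not define the metrics, so these definitions are an assumption, and the confusion counts below are hypothetical. The last lines recompute the F1-score of the proposed method from its reported precision and detection rate as a consistency check.

```python
def detection_metrics(tp, fp, tn, fn):
    """Accuracy, precision, detection rate (recall) and F1 from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    detection_rate = tp / (tp + fn)
    f1 = 2 * precision * detection_rate / (precision + detection_rate)
    return accuracy, precision, detection_rate, f1


def f1_from(precision, detection_rate):
    """Harmonic mean of precision and detection rate (same units as the inputs)."""
    return 2 * precision * detection_rate / (precision + detection_rate)


# Hypothetical confusion counts, for illustration only.
acc, prec, dr, f1 = detection_metrics(tp=90, fp=10, tn=95, fn=5)

# Consistency check against the proposed-method row of Table 3 (values in %).
table3_f1 = round(f1_from(99.90, 99.61), 2)
print(table3_f1)
```

With precision 99.90 and detection rate 99.61, the harmonic mean rounds to 99.75, which matches the F1-score listed for the proposed method.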

Table 4 Comparison of multi-class precision and detection rate for different methods (%)

Type	FC-Net precision	FC-Net detection rate	FS-IDS precision	FS-IDS detection rate	Proposed method precision	Proposed method detection rate
DDoS	98.45	97.62	98.36	97.97	99.58	99.87
DoS GoldenEye	89.77	99.52	96.07	99.28	99.94	99.54
PortScan	85.46	99.77	92.86	99.72	99.08	99.84
Bot	98.32	98.73	97.98	96.78	99.14	99.91
FTP-Patator	99.24	99.34	99.71	99.59	99.82	100.00
SSH-Patator	99.49	100.00	99.61	99.92	99.99	99.77

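Table 5 Comparison of detection performance between the proposed method and advanced methods on CICIDS2017

Method	Model structure	Number of samples	Accuracy (%)	Precision (%)	Detection rate (%)	F1-score (%)
VFBLS	Multi-feature broad learning system	21942	97.39	96.90	97.60	97.25
HNN	CNN/LSTM+DNN	225745	99.84	99.98	99.13	99.55
DEIL-RVM	Dynamic ensemble relevance vector machine	2830696	99.56	99.44	99.41	99.42
FCL-SBLS	Federated continuous learning and stacked broad learning system	2264557	95.28	94.14	94.30	94.22
Proposed method	KAFS and SMLP	177	99.80	99.90	99.61	99.75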

Table 6 Statistical significance levels of detection performance for existing advanced methods versus the proposed method on the CICIDS2017 dataset

Method	Accuracy	Precision	Detection rate	F1-score
VFBLS	1.60e-11	8.20e-13	1.58e-10	6.61e-11
HNN	0.20	2.30e-02	7.65e-05	2.49e-02
DEIL-RVM	1.48e-04	5.48e-06	2.36e-02	5.04e-04
FCL-SBLS	5.98e-14	3.89e-15	1.26e-13	5.15e-14

Table 7 Comparison of detection results between the proposed method and advanced methods on CICIDS2018

Method	Number of samples	Accuracy (%)	Precision (%)	Detection rate (%)	F1-score (%)
DSSTE+miniVGGNeT	154034	97.26	94.46	95.18	94.82
ICVAE-BSM	1628599	97.83	96.30	95.42	95.86
FL-IIDS	75955	98.21	96.35	96.27	96.31
Proposed method	141	98.26	96.94	96.44	96.68

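Table 8 Comparison of multi-class precision and detection rate with advanced methods on CICIDS2018 (%)

Type	ICVAE-BSM precision	ICVAE-BSM detection rate	DSSTE+miniVGGNeT precision	DSSTE+miniVGGNeT detection rate	FL-IIDS precision	FL-IIDS detection rate	Proposed method precision	Proposed method detection rate
Bot	91.04	94.56	89.11	95.36	96.14	99.65	95.82	99.70
Bruteforce	92.29	94.37	89.08	95.24	92.36	100.00	93.86	99.61
DoS	92.52	93.81	91.99	92.59	97.16	98.40	97.51	95.87
Infiltration	87.72	93.77	87.38	93.08	93.91	82.06	95.65	95.39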

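Table 9 Statistical significance levels of detection performance for existing advanced methods versus the proposed method on the CICIDS2018 dataset

Method	Accuracy	Precision	Detection rate	F1-score
ICVAE-BSM	2.99e-05	5.15e-06	2.30e-06	4.91e-07
DSSTE+miniVGGNeT	6.18e-08	1.02e-11	2.42e-10	3.34e-10
FL-IIDS	6.27e-03	1.33e-08	4.19e-06	5.82e-04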
