基于距离比较的AC自动机并行匹配算法

姜海洋; 李雪菲; 杨晔

doi:10.11999/JEIT210009

基于距离比较的AC自动机并行匹配算法

doi: 10.11999/JEIT210009 cstr: 32379.14.JEIT210009

姜海洋^{1, 2, 3, ,},
李雪菲^{1, 2},
杨晔^{1, 2}

1.
中国科学院计算技术研究所北京 100190
2.
中国科学院大学北京 100049
3.
江苏省未来网络创新研究院南京 211111

基金项目: 国家重点研发计划(2019YFB1804500)，光合基金B类(20210702)

详细信息

作者简介:
姜海洋：男，1982年生，副研究员，研究方向为高性能数据包处理、数据驱动安全等

李雪菲：女，1996年生，硕士生，研究方向为网络安全、异常检测

杨晔：男，1995年生，博士生，研究方向为网络虚拟化、网络安全

通讯作者:
姜海洋　jianghaiyang@ict.ac.cn

中图分类号: TN915.08; TP393.08
计量
- 文章访问数: 833
- HTML全文浏览量: 846
- PDF下载量: 110
- 被引次数: 0
出版历程
- 收稿日期: 2021-01-05
- 修回日期: 2021-09-01
- 录用日期: 2021-09-01
- 网络出版日期: 2021-12-20
- 刊出日期: 2022-02-25

Distance Comparison Based Parallel Pattern Matching

JIANG Haiyang^{1, 2, 3
, ,},
LI Xuefei^{1, 2},
YANG Ye^{1, 2}

1.
Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
2.
University of Chinese Academy of Sciences, Beijing 100049, China
3.
Jiangsu Future Networks Innovation Institute, Nanjing 211111, China

Funds: The National Key Research and Development Program of China (2019YFB1804500), The GHfund B (20210702)

摘要

摘要: 随着网络带宽的快速增长，作为网络安全设备核心模块的多模式匹配(MPM)算法面临严峻的性能挑战。该文提出一种高效的数据包分割和并行匹配算法—距离比较并行匹配算法(DCPM)。和已有方法相比，并行的DCPM线程间不存在同步开销，引入的冗余检测开销达到理论最小。基于Aho-Corasick(AC)算法，在8核处理器平台上将DCPM算法与已有的数据包分割方法进行了性能比较。实验结果表明，和已有方法相比，DCPM算法的适应性更好，性能受网络流量中模式串占比、模式串长度及自动机状态数等因素的影响更小；在处理真实数据集时，DCPM算法的加速比提升1.3～3.5倍。
- 模式匹配 /
- 多线程 /
- 多核 /
- 深度包检测 /
- Aho-Corasick算法
Abstract: Multi-Pattern Matching(MPM) works as a core algorithm of packet processing procedure. In order to improve the performance, an efficient packet segmentation and parallel pattern matching algorithm–DCPM (Distance Comparison Parallel Matching) is proposed based on the Aho-Corasick (AC) algorithm. Comparing with existing solutions, DCPM eliminates the threads’ synchronization overhead and decreases the redundant detection overhead. The DCPM algorithm is evaluated on an eight-core processor server platform. The experimental results show that the performance is largely improved (1.3～3.5 times when processing real-world traffic with 8 threads, compared with existing solutions). Meanwhile, the performance of DCPM is less affected by the proportion of pattern strings in the traffic, the length of pattern strings, as well as the number of states in automata.
- Pattern matching /
- Multi-threads /
- Multi-core /
- Deep Packet Inspection(DPI) /
- Aho-Corasick (AC) algorithm

HTML全文

图 1 模式串集合{he, she, his, hers}对应的DFA

下载: 全尺寸图片幻灯片

图 2 基于数据包分片的负载均衡并行匹配方法

下载: 全尺寸图片幻灯片

图 3 DDP算法示意图

下载: 全尺寸图片幻灯片

图 4 SPPM算法示意图

下载: 全尺寸图片幻灯片

图 5 枚举式并行匹配算法示意图

下载: 全尺寸图片幻灯片

图 6 DCPM算法理论分析示意图

下载: 全尺寸图片幻灯片

图 7 不同pattern集和检测集下各算法加速比随线程数变化情况

下载: 全尺寸图片幻灯片

图 8 真实规则集和检测集下各算法加速比情况

下载: 全尺寸图片幻灯片

图 9 不同状态数规则集下各算法加速比随线程数变化情况

下载: 全尺寸图片幻灯片

表 1 单个DCPM线程内的模式匹配算法

输入：AC自动机状态跳转函数AC_DFA(state, character)；AC 　　　　自动机的状态深度查询函数depth(state)；待匹配的数据　　　　包载荷payload[M]；分片的起始、结束位置start，end；　　　　最长的模式串长度 max_signature；
输出：匹配结果payload_results[M]
(1) 初始化相关参数，令记录线程状态的state_i=0，记录分片中　　处理位置的index_i=start，记录越过分割点长度的step_i=0;
(2) 数据包载荷的payload[start]到payload[end]间，执行AC自动　　机正常匹配阶段，更新对应的state_i、index_i和匹配结果　　 payload_results[index_i]
(3) 进入验证匹配阶段
WHILE(step_i < max_signature –1)
{
IF depth(state_i) ≤ step_i
RETURN;
END IF;
state_i $ \leftarrow $AC_DFA (state_i, payload[index_i])；
更新对应的匹配结果payload_results[index_i]
index_i++; step_i++;
END WHILE;
}
(4) 输出匹配结果payload_results[M]

下载: 导出CSV

表 2 DCPM匹配算法正常阶段检测结果

线程1	字符串	e	s	h	s	h
线程1	状态跳转	s₀	s₇	s₈	s₇	s₈
线程2	字符串	i	s	s	i	h
线程2	状态跳转	s₀	s₇	s₇	s₀	s₁
线程3	字符串	s	h	s	r	e
线程3	状态跳转	s₇	s₈	s₇	s₀	s₀

下载: 导出CSV

表 3 DCPM算法验证阶段检测结果

线程1 (向后处理分片2)	input	初始值	i	s	s
	state	s₈	s₅	s₆	s₇
	depth	2	2	3	1
	step	0	1	2	3
线程2 (向后处理分片3)	input	初始值	s
	state	s₁	s₇
	depth	1	1
	step	0	1

下载: 导出CSV

表 4 人工产生模式串集合

特征模式集	包含特征模式数	特征模式长度范围(Byte)	生成DFA的状态数
pattern1	1000	10～20	14283
pattern2	1000	10～200	100310

下载: 导出CSV

表 5 人工产生待匹配字符串集合

对应特征模式集	混合字符串集	特征模式比例(%)	总字节数 (Byte)
pattern1	mstring1_0	0	87481547
	mstring1_30	30	87393564
	mstring1_50	50	87190985
	mstring1_70	70	87187583
	mstring1_100	100	87201482
pattern2	mstring2_0	0	87420048
	mstring2_30	30	87763399
	mstring2_50	50	87532636
	mstring2_70	70	87361422
	mstring2_100	100	87632722

下载: 导出CSV

表 6 实验所用的snort模式串集合1

特征模式集	包含特征模式数	特征模式长度范围(Byte)	生成DFA的状态数
Snort rule1	2858	2～192	37902

下载: 导出CSV

表 7 实验所用真实流量数据集1

流量类型	数据包集	数据包长度 (Byte)	总字节数 (Byte)
良性流量	Gmail	70～100	6996770
良性流量	Weibo	1500～1600	132744419
恶意流量	Bot	1500～1600	113776976

下载: 导出CSV

表 8 实验所用的snort模式串集合2

特征模式集	包含特征模式数	特征模式长度范围(Byte)	生成DFA的状态数
Snort rule2	52	4～44	723
Snort rule3	828	3～65	9615

下载: 导出CSV

表 9 实验所用真实流量数据集2

流量来源	日期	流长度 (Byte)	Size(MB)
Darpa	Monday	1200～20000	145.8

下载: 导出CSV

表 10 正确性测试实验结果

匹配数		AC	DCPM
匹配数		AC	线程数1	线程数2	线程数4	线程数8
测试字符集	mstring1_0	0	0	0	0	0
	mstring1_50	322835	322835	322835	322835	322835
	mstring1_100	641993	641993	641993	641993	641993
	mstring2_0	0	0	0	0	0
	mstring2_50	47613	47613	47613	47613	47613
	mstring2_100	95759	95759	95759	95759	95759

下载: 导出CSV

参考文献(18)

[1]	REN Hao, LITT H, LIU Dongxiao, et al. Toward efficient and secure deep packet inspection for outsourced middlebox[C]. 2019 IEEE International Conference on Communications (ICC), Shanghai, China, 2019: 1–6.
[2]	JIANG Haiyang, ZHANG Guangxing, XIE Gaogang, et al. Scalable high-performance parallel design for network intrusion detection systems on many-core processors[C]. The Architectures for Networking and Communications Systems, San Jose, United States, 2013: 137–146.
[3]	BINSAHAQ A, SHELTAMI T R, and SALAH K. A survey on autonomic provisioning and management of QoS in SDN networks[J]. IEEE Access, 2019, 7: 73384–73435. doi: 10.1109/ACCESS.2019.2919957
[4]	LARABA A, FRANÇOIS J, CHRISMENT I, et al. Defeating protocol abuse with P4: Application to explicit congestion notification[C]. 2020 IFIP Networking Conference (Networking), Paris, France, 2020: 431–439.
[5]	KOPEK C V, FULP E W, and WHEELER P S. Distributed data parallel techniques for content-matching intrusion detection systems[C]. The MILCOM 2007 – IEEE Military Communications Conference, Orlando, USA, 2007: 1–7.
[6]	HAKAK S I, KAMSIN A, SHIVAKUMARA P, et al. Exact string matching algorithms: Survey, issues, and future research directions[J]. IEEE Access, 2019, 7: 69614–69637. doi: 10.1109/ACCESS.2019.2914071
[7]	BREMLER-BARR A, HAY D, and KORAL Y. CompactDFA: Scalable pattern matching using longest prefix match solutions[J]. IEEE/ACM Transactions on Networking, 2014, 22(2): 415–428. doi: 10.1109/TNET.2013.2253119
[8]	BOTACIN M, GALANTE L, CESCHIN F, et al. The AV says: Your hardware definitions were updated![C]. The 14th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC), York, UK, 2019: 27–34.
[9]	Cisco. Clam AntiVirus(ClamAv)[CP/OL]. http://www.clamav.net/.
[10]	Cisco. Snort official website[CP/OL]. https://www.snort.org/.
[11]	NAJAM M, YOUNIS U, and UR RASOOL R. Speculative parallel pattern matching using stride-k DFA for deep packet inspection[J]. Journal of Network and Computer Applications, 2015, 54: 78–87. doi: 10.1016/j.jnca.2015.04.013
[12]	FU Zhe, LIU Zhi, and LI Jun. Efficient parallelization of regular expression matching for deep inspection[C]. The 26th International Conference on Computer Communication and Networks (ICCCN), Vancouver, Canada, 2017: 1–9.
[13]	JIANG Peng and AGRAWAL G. Combining SIMD and many/multi-core parallelism for finite state machines with enumerative speculation[C]. The 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Austin, USA, 2017: 179–191.
[14]	CHOI B, CHAE J, JAMSHED M, et al. DFC: Accelerating string pattern matching for network applications[C]. The 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI’16), Santa Clara, USA, 2016: 551–565.
[15]	WANG Xiang, HONG Yang, CHANG H, et al. Hyperscan: A fast multi-pattern regex matcher for modern CPUs[C]. The 16th USENIX Conference on Networked Systems Design and Implementation, Boston, USA, 2019: 631–648.
[16]	WANG Wei, ZHU Ming, ZENG Xuewen, et al. Malware traffic classification using convolutional neural network for representation learning[C]. 2017 International Conference on Information Networking, Da Nang, Vietnam, 2017: 712–717.
[17]	DEHKORDI A B, SOLTANAGHAEI M, and BOROUJENI F Z. The DDoS attacks detection through machine learning and statistical methods in SDN[J]. The Journal of Supercomputing, 2021, 77(3): 2383–2415. doi: 10.1007/s11227-020-03323-w
[18]	MIT: Darpa dataset[EB/OL]. http://www.ll.mit.edu/ideval/data/1999data.html/.