Distance Comparison Based Parallel Pattern Matching
-
摘要: 随着网络带宽的快速增长,作为网络安全设备核心模块的多模式匹配(MPM)算法面临严峻的性能挑战。该文提出一种高效的数据包分割和并行匹配算法—距离比较并行匹配算法(DCPM)。和已有方法相比,并行的DCPM线程间不存在同步开销,引入的冗余检测开销达到理论最小。基于Aho-Corasick(AC)算法,在8核处理器平台上将DCPM算法与已有的数据包分割方法进行了性能比较。实验结果表明,和已有方法相比,DCPM算法的适应性更好,性能受网络流量中模式串占比、模式串长度及自动机状态数等因素的影响更小;在处理真实数据集时,DCPM算法的加速比提升1.3~3.5倍。
-
关键词:
- 模式匹配 /
- 多线程 /
- 多核 /
- 深度包检测 /
- Aho-Corasick算法
Abstract: Multi-Pattern Matching(MPM) works as a core algorithm of packet processing procedure. In order to improve the performance, an efficient packet segmentation and parallel pattern matching algorithm–DCPM (Distance Comparison Parallel Matching) is proposed based on the Aho-Corasick (AC) algorithm. Comparing with existing solutions, DCPM eliminates the threads’ synchronization overhead and decreases the redundant detection overhead. The DCPM algorithm is evaluated on an eight-core processor server platform. The experimental results show that the performance is largely improved (1.3~3.5 times when processing real-world traffic with 8 threads, compared with existing solutions). Meanwhile, the performance of DCPM is less affected by the proportion of pattern strings in the traffic, the length of pattern strings, as well as the number of states in automata. -
表 1 单个DCPM线程内的模式匹配算法
输入:AC自动机状态跳转函数AC_DFA(state, character);AC
自动机的状态深度查询函数depth(state);待匹配的数据
包载荷payload[M];分片的起始、结束位置start,end;
最长的模式串长度 max_signature;输出:匹配结果payload_results[M] (1) 初始化相关参数,令记录线程状态的statei=0,记录分片中
处理位置的indexi=start,记录越过分割点长度的stepi=0;(2) 数据包载荷的payload[start]到payload[end]间,执行AC自动
机正常匹配阶段,更新对应的statei、indexi和匹配结果
payload_results[indexi](3) 进入验证匹配阶段 WHILE(stepi < max_signature –1) { IF depth(statei) ≤ stepi RETURN; END IF; statei $ \leftarrow $AC_DFA (statei, payload[indexi]); 更新对应的匹配结果payload_results[indexi] indexi++; stepi++; END WHILE; } (4) 输出匹配结果payload_results[M] 表 2 DCPM匹配算法正常阶段检测结果
线程1 字符串 e s h s h 状态跳转 s0 s7 s8 s7 s8 线程2 字符串 i s s i h 状态跳转 s0 s7 s7 s0 s1 线程3 字符串 s h s r e 状态跳转 s7 s8 s7 s0 s0 表 3 DCPM算法验证阶段检测结果
线程1
(向后处理分片2)input 初始值 i s s state s8 s5 s6 s7 depth 2 2 3 1 step 0 1 2 3 线程2
(向后处理分片3)input 初始值 s state s1 s7 depth 1 1 step 0 1 表 4 人工产生模式串集合
特征模式集 包含特征模式数 特征模式长度
范围(Byte)生成DFA的
状态数pattern1 1000 10~20 14283 pattern2 1000 10~200 100310 表 5 人工产生待匹配字符串集合
对应特征模式集 混合字符串集 特征模式比例(%) 总字节数
(Byte)pattern1 mstring1_0 0 87481547 mstring1_30 30 87393564 mstring1_50 50 87190985 mstring1_70 70 87187583 mstring1_100 100 87201482 pattern2 mstring2_0 0 87420048 mstring2_30 30 87763399 mstring2_50 50 87532636 mstring2_70 70 87361422 mstring2_100 100 87632722 表 6 实验所用的snort模式串集合1
特征模式集 包含特征模
式数特征模式长度范围(Byte) 生成DFA的
状态数Snort rule1 2858 2~192 37902 表 7 实验所用真实流量数据集1
流量类型 数据包集 数据包长度
(Byte)总字节数
(Byte)良性流量 Gmail 70~100 6996770 Weibo 1500~1600 132744419 恶意流量 Bot 1500~1600 113776976 表 8 实验所用的snort模式串集合2
特征模式集 包含特征模式数 特征模式长度
范围(Byte)生成DFA的
状态数Snort rule2 52 4~44 723 Snort rule3 828 3~65 9615 表 9 实验所用真实流量数据集2
流量来源 日期 流长度
(Byte)Size(MB) Darpa Monday 1200~20000 145.8 表 10 正确性测试实验结果
匹配数 AC DCPM 线程数1 线程数2 线程数4 线程数8 测试
字符集mstring1_0 0 0 0 0 0 mstring1_50 322835 322835 322835 322835 322835 mstring1_100 641993 641993 641993 641993 641993 mstring2_0 0 0 0 0 0 mstring2_50 47613 47613 47613 47613 47613 mstring2_100 95759 95759 95759 95759 95759 -
[1] REN Hao, LITT H, LIU Dongxiao, et al. Toward efficient and secure deep packet inspection for outsourced middlebox[C]. 2019 IEEE International Conference on Communications (ICC), Shanghai, China, 2019: 1–6. [2] JIANG Haiyang, ZHANG Guangxing, XIE Gaogang, et al. Scalable high-performance parallel design for network intrusion detection systems on many-core processors[C]. The Architectures for Networking and Communications Systems, San Jose, United States, 2013: 137–146. [3] BINSAHAQ A, SHELTAMI T R, and SALAH K. A survey on autonomic provisioning and management of QoS in SDN networks[J]. IEEE Access, 2019, 7: 73384–73435. doi: 10.1109/ACCESS.2019.2919957 [4] LARABA A, FRANÇOIS J, CHRISMENT I, et al. Defeating protocol abuse with P4: Application to explicit congestion notification[C]. 2020 IFIP Networking Conference (Networking), Paris, France, 2020: 431–439. [5] KOPEK C V, FULP E W, and WHEELER P S. Distributed data parallel techniques for content-matching intrusion detection systems[C]. The MILCOM 2007 – IEEE Military Communications Conference, Orlando, USA, 2007: 1–7. [6] HAKAK S I, KAMSIN A, SHIVAKUMARA P, et al. Exact string matching algorithms: Survey, issues, and future research directions[J]. IEEE Access, 2019, 7: 69614–69637. doi: 10.1109/ACCESS.2019.2914071 [7] BREMLER-BARR A, HAY D, and KORAL Y. CompactDFA: Scalable pattern matching using longest prefix match solutions[J]. IEEE/ACM Transactions on Networking, 2014, 22(2): 415–428. doi: 10.1109/TNET.2013.2253119 [8] BOTACIN M, GALANTE L, CESCHIN F, et al. The AV says: Your hardware definitions were updated![C]. The 14th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC), York, UK, 2019: 27–34. [9] Cisco. Clam AntiVirus(ClamAv)[CP/OL]. http://www.clamav.net/. [10] Cisco. Snort official website[CP/OL]. https://www.snort.org/. [11] NAJAM M, YOUNIS U, and UR RASOOL R. Speculative parallel pattern matching using stride-k DFA for deep packet inspection[J]. Journal of Network and Computer Applications, 2015, 54: 78–87. doi: 10.1016/j.jnca.2015.04.013 [12] FU Zhe, LIU Zhi, and LI Jun. Efficient parallelization of regular expression matching for deep inspection[C]. The 26th International Conference on Computer Communication and Networks (ICCCN), Vancouver, Canada, 2017: 1–9. [13] JIANG Peng and AGRAWAL G. Combining SIMD and many/multi-core parallelism for finite state machines with enumerative speculation[C]. The 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Austin, USA, 2017: 179–191. [14] CHOI B, CHAE J, JAMSHED M, et al. DFC: Accelerating string pattern matching for network applications[C]. The 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI’16), Santa Clara, USA, 2016: 551–565. [15] WANG Xiang, HONG Yang, CHANG H, et al. Hyperscan: A fast multi-pattern regex matcher for modern CPUs[C]. The 16th USENIX Conference on Networked Systems Design and Implementation, Boston, USA, 2019: 631–648. [16] WANG Wei, ZHU Ming, ZENG Xuewen, et al. Malware traffic classification using convolutional neural network for representation learning[C]. 2017 International Conference on Information Networking, Da Nang, Vietnam, 2017: 712–717. [17] DEHKORDI A B, SOLTANAGHAEI M, and BOROUJENI F Z. The DDoS attacks detection through machine learning and statistical methods in SDN[J]. The Journal of Supercomputing, 2021, 77(3): 2383–2415. doi: 10.1007/s11227-020-03323-w [18] MIT: Darpa dataset[EB/OL]. http://www.ll.mit.edu/ideval/data/1999data.html/.