模糊测试中的位置自适应变异调度策略

杨智; 徐航; 桑伟泉; 孙浩东; 金舒原

doi:10.11999/JEIT240060

模糊测试中的位置自适应变异调度策略

doi: 10.11999/JEIT240060 cstr: 32379.14.JEIT240060

1.
信息工程大学密码工程学院郑州 450004
2.
中山大学计算机学院广州 510275

基金项目: 国家自然科学基金(62176265)

详细信息

作者简介:
杨智：男，教授，研究方向为操作系统安全、云计算安全和隐私保护

徐航：男，硕士生，研究方向为软件安全性分析

桑伟泉：男，硕士，研究方向为网络安全与人工智能

孙浩东：男，硕士生，研究方向为软件安全性分析

金舒原：女，教授，研究方向为漏洞挖掘、网络攻防、人工智能安全和操作系统安全

通讯作者:
徐航　1174290091@qq.com

中图分类号: TN915.08; TP393.08
计量
- 文章访问数: 461
- HTML全文浏览量: 313
- PDF下载量: 36
- 被引次数: 0
出版历程
- 收稿日期: 2024-01-26
- 修回日期: 2024-07-13
- 网络出版日期: 2024-08-02
- 刊出日期: 2024-09-26

Position-Adaptive Mutation Scheduling Strategy in Fuzzing

1.
School of Cryptographic Engineering, Information Engineering University, Zhengzhou 450004, China
2.
School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510275, China

Funds: The National Natural Science Foundation of China (62176265)

摘要

摘要: 种子自适应变异调度策略是基于变异的模糊测试中最新的技术，该技术能够根据种子的语法和语义特征自适应地调整变异算子的概率分布，然而其存在两个问题：(1)无法根据变异位置自适应地调整概率分布；(2)使用的汤普森采样算法在模糊测试场景中容易导致学习到的概率分布接近平均分布，进而导致变异调度策略失效。针对上述问题，该文提出一种位置自适应变异调度策略，通过一种自定义的双层多臂老虎机模型为变异位置和变异算子建立联系，并且采用置信区间上界算法选择变异算子，实现位置自适应的同时避免了出现平均分布的问题。基于American Fuzzy Lop(AFL)实现了位置自适应的模糊测试器 (PAMSSAFL)，实验结果表明位置自适应的变异调度策略能明显提升模糊测试器的bug发现能力和覆盖能力。
- 漏洞挖掘 /
- 模糊测试 /
- 变异 /
- 覆盖率
Abstract: The seed-adaptive mutation scheduling strategy is the latest technology in mutation-based fuzzing, which can adaptively adjust the probability distribution of the mutation operators according to the syntax and semantic characteristics of the seed. However, it has two problems: (1) it is unable to adaptively adjust the probability distribution according to the mutation position; (2) The Thompson Sampling algorithm used in the fuzzing scenario is easy to lead to the learned probability distribution close to the average distribution, which leads to the failure of the mutation scheduling strategy. Focusing on the above problems, a position-adaptive mutation scheduling strategy is proposed. This technology establishes the relationship between the mutation position and the mutation operators through a user-defined double-layer multi-armed bandit model, and uses the Upper Confidence Bound algorithm to select the mutation operator, so as to achieve position adaptation and avoid the problem of average distribution. The position-adaptive fuzzer Position-Adaptive Mutation Scheduling Strategy AFL (PAMSSAFL) is implemented based on American Fuzzy Lop (AFL). The comparison results show that the position-adaptive mutation scheduling strategy can improve the bug detection ability and coverage ability of the fuzzer.
- Vulnerability mining /
- Fuzzing /
- Mutation /
- Coverage

HTML全文

图 1 种子自适应变异调度策略流程

下载: 全尺寸图片幻灯片

图 2 示例种子

下载: 全尺寸图片幻灯片

图 3 双层多臂老虎机示例

下载: 全尺寸图片幻灯片

图 4 PAMSSAFL系统流程

下载: 全尺寸图片幻灯片

图 5 位置自适应变异调度策略流程

下载: 全尺寸图片幻灯片

图 6 触发漏洞的测试用例

下载: 全尺寸图片幻灯片

图 7 路径数量对比

下载: 全尺寸图片幻灯片

图 8 SeamFuzz测试objdump和who时学习到的概率分布

下载: 全尺寸图片幻灯片

表 1 变异算子类型

类型	释义
Flip(k)	翻转第k个比特，0翻转为1，1翻转为0
Overwrite(s, e, v)	将第s个字节至第e个字节的内容复写为v
Arithmetic(s, e, v)	将第s个字节至第e个字节的内容与 v进行算术运算
Delete(s, l)	从第s个字节开始删除长度l的内容
Insert(s, v)	在第s个字节之后插入v

下载: 导出CSV

表 2 示例代码段

1. function example() {
2. 　　number = getNumber();
3. 　　if(number == 10) bug1();
4. 　　content = getContent();
5. 　　length = strlen(content);
6. 　　if(length > 8) bug2();
7. }

下载: 导出CSV

表 3 MAGMA测试结果

模糊测试器		libpng	libtiff	libxml2	lua	openssl	总计
AFL	覆盖	6	6	8	1	4	25
AFL	触发	2	4	1	1	1	9
DARWIN	覆盖	6	6	8	1	4	25
DARWIN	触发	1	3	1	1	1	7
PAMSSAFL	覆盖	6	8	8	1	4	27
PAMSSAFL	触发	1	5	2	1	1	10
SeamFuzz	覆盖	6	6	8	1	4	24
SeamFuzz	触发	1	3	1	1	1	7
AFL++	覆盖	6	8	8	1	4	27
AFL++	触发	2	4	4	1	1	12
PAMSSAFL++	覆盖	6	8	8	1	4	27
PAMSSAFL++	触发	2	5	4	1	2	14

下载: 导出CSV

表 4 LAVA-M测试结果

模糊测试器	base64	md5sum	uniq	who	总计
AFL	0	7	0	0	7
DARWIN	0	7	1	4	12
PAMSSAFL	0	7	4	5	16
SeamFuzz	0	7	1	0	8

下载: 导出CSV

表 5 真实应用中的漏洞挖掘情况

程序	版本	所在文件	漏洞类型	AFL	DARWIN	PAMSSAFL	SeamFuzz
binutils	2.28	libbfd.c	缓冲区溢出	0 h 55 min	1 h 11 min	0 h 27 min	0 h 48 min
binutils	2.30	dwarf1.c	整数溢出	∞	1 h 36 min	1 h 3 min	2 h 1 min
cflow	1.6	parser.c	释放后使用	15 h 9 min	10 h 33 min	6 h 36 min	6 h 57 min
cflow	1.6	parser.c	跨界读取	∞	∞	5 h 51 min	10 h 22 min
libexif	0.6.14	exif-entry.c	缓冲区溢出	0 h 18 min	0 h 10 min	0 h 6 min	0 h 5 min
libexif	0.6.14	exif-data.c	跨界读取	0 h 21 min	0 h 24 min	0 h 31 min	0 h 27 min
libtiff	4.0.4	tif_print.c	跨界读取	1 h 25 min	0 h 18 min	0 h 6 min	0 h 53 min
libxml2	2.9.4	valid.c	缓冲区溢出	∞	∞	6 h 8 min	∞
tcpdump	4.9.0	print-sl.c	缓冲区溢出	0 h 49 min	0 h 47 min	0 h 4 min	0 h 3 min
tcpdump	4.9.2	print-bootp.c	跨界读取	0 h 6 min	0 h 7 min	0 h 9 min	0 h 9 min
xpdf	3.02	parser.cc	无限递归	19 h 46 min	3 h 21 min	0 h 51 min	2 h 39 min
平均时长				10 h 4 min	6 h 2 min	1 h 59 min	4 h 24 min

下载: 导出CSV

表 6 漏洞所在代码段

1. static u_int lastlen[2][256];
2. static void sliplink_print() {
3. 　int dir = p[SLX_DIR];
4. 　compressed_sl_print(dir);
5. }
6. static void compressed_sl_print(int dir) {
7. 　lastlen[dir][lastconn] = length - (hlen << 2);
8. }

下载: 导出CSV

表 7 路径数量和边数量统计

目标程序	版本	输入参数	AFL		DARWIN		PAMSSAFL		SeamFuzz
目标程序	版本	输入参数	路径	边	路径	边	路径	边	路径	边
cflow	1.7	@@	1588	2251	1449	2219	1644	2247	1640	2230
objdump	2.41	-d @@	2020	6462	2174	6441	2222	6590	2050	6467
tcpdump	4.99.4	-nr @@	4603	11441	4709	11790	5198	12414	4864	11401
tiffinfo	4.6.0	@@	2991	4391	2879	4424	2959	4356	2998	4394
增幅(%)			0	0	+0.08	+1.34	+7.33	+4.33	+3.12	–0.22

下载: 导出CSV

参考文献(17)

[1]	ZALEWSKI M. American Fuzzy Lop (AFL) fuzzer[EB/OL]. https://lcamtuf.coredump.cx/afl/, 2023.
[2]	Honggfuzz: A security oriented, feedback-driven, evolutionary, easy-to-use fuzzer[EB/OL]. https://github.com/google/honggfuzz, 2023.
[3]	LLWM. LibFuzzer—A library for coverage-guided fuzz testing[EB/OL]. http://llvm.org/docs/LibFuzzer.html, 2023.
[4]	LEE M, CHA S, and OH H. Learning seed-adaptive mutation strategies for greybox fuzzing[C]. 2023 IEEE/ACM 45th International Conference on Software Engineering, Melbourne, Australia, 2023: 384–396. doi: 10.1109/ICSE48619.2023.00043.
[5]	AGRAWAL S and GOYAL N. Analysis of thompson sampling for the multi-armed bandit problem[C]. The 25th Annual Conference on Learning Theory, Edinburgh, UK, 2012: 39.
[6]	AUER P, CESA-BIANCHI N, and FISCHER P. Finite-time analysis of the Multiarmed bandit problem[J]. Machine Learning, 2002, 47(2/3): 235–256. doi: 10.1023/A:1013689704352.
[7]	JAUERNIG P, JAKOBOVIC D, PICEK S, et al. DARWIN: Survival of the fittest fuzzing mutators[C]. The 30th Annual Network and Distributed System Security Symposium, San Diego, USA, 2023. doi: 10.14722/ndss.2023.23159.
[8]	LEMIEUX C and SEN K. FairFuzz: A targeted mutation strategy for increasing greybox fuzz testing coverage[C]. The 33rd IEEE/ACM International Conference on Automated Software Engineering, Montpellier, France, 2018: 475–485. doi: 10.1145/3238147.3238176.
[9]	SHE Dongdong, SHAH A, and JANA S. Effective seed scheduling for fuzzing with graph centrality analysis[C]. The 43rd IEEE Symposium on Security and Privacy, San Francisco, USA, 2022: 2194–2211. doi: 10.1109/SP46214.2022.9833761.
[10]	SAHA S, SARKER L, SHAFIUZZAMAN M, et al. Rare path guided fuzzing[C]. The 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, Seattle, USA, 2023: 1295–1306. doi: 10.1145/3597926.3598136.
[11]	LÜ Chenyang, JI Shouling, ZHANG Chao, et al. MOPT: Optimized mutation scheduling for fuzzers[C]. The 28th USENIX Security Symposium, Santa Clara, USA, 2019: 1949–1966.
[12]	WU Mingyuan, JIANG Ling, XIANG Jiahong, et al. One fuzzing strategy to rule them all[C]. The 44th International Conference on Software Engineering, Pittsburgh, USA, 2022: 1634–1645. doi: 10.1145/3510003.3510174.
[13]	SUTTON R S and BARTO A G. Reinforcement Learning: An Introduction[M]. 2nd ed. Cambridge: The MIT Press, 2018: 31–47.
[14]	李明磊, 陆余良, 黄晖, 等. 模糊测试变异算子调度优化模型[J]. 小型微型计算机系统, 2021, 42(10): 2190–2195. doi: 10.3969/j.issn.1000-1220.2021.10.029. LI Minglei, LU Yuliang, HUANG Hui, et al. Fuzzy tester mutation operator scheduling optimization algorithm[J]. Journal of Chinese Computer Systems, 2021, 42(10): 2190–2195. doi: 10.3969/j.issn.1000-1220.2021.10.029.
[15]	FIORALDI A, MAIER D C, EIßFELDT H, et al. AFL++: Combining incremental steps of fuzzing research[C]. The 14th USENIX Conference on Offensive Technologies, Berkeley, USA, 2020: 10.
[16]	HAZIMEH A, HERRERA A, and PAYER M. Magma: A ground-truth fuzzing benchmark[J]. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 2020, 4(3): 49. doi: 10.1145/3428334.
[17]	DOLAN-GAVITT B, HULIN P, KIRDA E, et al. LAVA: Large-scale automated vulnerability addition[C]. 2016 IEEE Symposium on Security and Privacy, San Jose, USA, 2016: 110–121. doi: 10.1109/SP.2016.15.