Position-Adaptive Mutation Scheduling Strategy in Fuzzing
-
摘要: 种子自适应变异调度策略是基于变异的模糊测试中最新的技术,该技术能够根据种子的语法和语义特征自适应地调整变异算子的概率分布,然而其存在两个问题:(1)无法根据变异位置自适应地调整概率分布;(2)使用的汤普森采样算法在模糊测试场景中容易导致学习到的概率分布接近平均分布,进而导致变异调度策略失效。针对上述问题,该文提出一种位置自适应变异调度策略,通过一种自定义的双层多臂老虎机模型为变异位置和变异算子建立联系,并且采用置信区间上界算法选择变异算子,实现位置自适应的同时避免了出现平均分布的问题。基于American Fuzzy Lop(AFL)实现了位置自适应的模糊测试器 (PAMSSAFL),实验结果表明位置自适应的变异调度策略能明显提升模糊测试器的bug发现能力和覆盖能力。Abstract: The seed-adaptive mutation scheduling strategy is the latest technology in mutation-based fuzzing, which can adaptively adjust the probability distribution of the mutation operators according to the syntax and semantic characteristics of the seed. However, it has two problems: (1) it is unable to adaptively adjust the probability distribution according to the mutation position; (2) The Thompson Sampling algorithm used in the fuzzing scenario is easy to lead to the learned probability distribution close to the average distribution, which leads to the failure of the mutation scheduling strategy. Focusing on the above problems, a position-adaptive mutation scheduling strategy is proposed. This technology establishes the relationship between the mutation position and the mutation operators through a user-defined double-layer multi-armed bandit model, and uses the Upper Confidence Bound algorithm to select the mutation operator, so as to achieve position adaptation and avoid the problem of average distribution. The position-adaptive fuzzer Position-Adaptive Mutation Scheduling Strategy AFL (PAMSSAFL) is implemented based on American Fuzzy Lop (AFL). The comparison results show that the position-adaptive mutation scheduling strategy can improve the bug detection ability and coverage ability of the fuzzer.
-
Key words:
- Vulnerability mining /
- Fuzzing /
- Mutation /
- Coverage
-
表 1 变异算子类型
类型 释义 Flip(k) 翻转第k个比特,0翻转为1,1翻转为0 Overwrite(s, e, v) 将第s个字节至第e个字节的内容复写为v Arithmetic(s, e, v) 将第s个字节至第e个字节的内容与
v进行算术运算Delete(s, l) 从第s个字节开始删除长度l的内容 Insert(s, v) 在第s个字节之后插入v 表 2 示例代码段
1. function example() { 2. number = getNumber(); 3. if(number == 10) bug1(); 4. content = getContent(); 5. length = strlen(content); 6. if(length > 8) bug2(); 7. } 表 3 MAGMA测试结果
模糊测试器 libpng libtiff libxml2 lua openssl 总计 AFL 覆盖 6 6 8 1 4 25 触发 2 4 1 1 1 9 DARWIN 覆盖 6 6 8 1 4 25 触发 1 3 1 1 1 7 PAMSSAFL 覆盖 6 8 8 1 4 27 触发 1 5 2 1 1 10 SeamFuzz 覆盖 6 6 8 1 4 24 触发 1 3 1 1 1 7 AFL++ 覆盖 6 8 8 1 4 27 触发 2 4 4 1 1 12 PAMSSAFL++ 覆盖 6 8 8 1 4 27 触发 2 5 4 1 2 14 表 4 LAVA-M测试结果
模糊测试器 base64 md5sum uniq who 总计 AFL 0 7 0 0 7 DARWIN 0 7 1 4 12 PAMSSAFL 0 7 4 5 16 SeamFuzz 0 7 1 0 8 表 5 真实应用中的漏洞挖掘情况
程序 版本 所在文件 漏洞类型 AFL DARWIN PAMSSAFL SeamFuzz binutils 2.28 libbfd.c 缓冲区溢出 0 h 55 min 1 h 11 min 0 h 27 min 0 h 48 min binutils 2.30 dwarf1.c 整数溢出 ∞ 1 h 36 min 1 h 3 min 2 h 1 min cflow 1.6 parser.c 释放后使用 15 h 9 min 10 h 33 min 6 h 36 min 6 h 57 min cflow 1.6 parser.c 跨界读取 ∞ ∞ 5 h 51 min 10 h 22 min libexif 0.6.14 exif-entry.c 缓冲区溢出 0 h 18 min 0 h 10 min 0 h 6 min 0 h 5 min libexif 0.6.14 exif-data.c 跨界读取 0 h 21 min 0 h 24 min 0 h 31 min 0 h 27 min libtiff 4.0.4 tif_print.c 跨界读取 1 h 25 min 0 h 18 min 0 h 6 min 0 h 53 min libxml2 2.9.4 valid.c 缓冲区溢出 ∞ ∞ 6 h 8 min ∞ tcpdump 4.9.0 print-sl.c 缓冲区溢出 0 h 49 min 0 h 47 min 0 h 4 min 0 h 3 min tcpdump 4.9.2 print-bootp.c 跨界读取 0 h 6 min 0 h 7 min 0 h 9 min 0 h 9 min xpdf 3.02 parser.cc 无限递归 19 h 46 min 3 h 21 min 0 h 51 min 2 h 39 min 平均时长 10 h 4 min 6 h 2 min 1 h 59 min 4 h 24 min 表 6 漏洞所在代码段
1. static u_int lastlen[2][256]; 2. static void sliplink_print() { 3. int dir = p[SLX_DIR]; 4. compressed_sl_print(dir); 5. } 6. static void compressed_sl_print(int dir) { 7. lastlen[dir][lastconn] = length - (hlen << 2); 8. } 表 7 路径数量和边数量统计
目标程序 版本 输入参数 AFL DARWIN PAMSSAFL SeamFuzz 路径 边 路径 边 路径 边 路径 边 cflow 1.7 @@ 1588 2251 1449 2219 1644 2247 1640 2230 objdump 2.41 -d @@ 2020 6462 2174 6441 2222 6590 2050 6467 tcpdump 4.99.4 -nr @@ 4603 11441 4709 11790 5198 12414 4864 11401 tiffinfo 4.6.0 @@ 2991 4391 2879 4424 2959 4356 2998 4394 增幅(%) 0 0 +0.08 +1.34 +7.33 +4.33 +3.12 –0.22 -
[1] ZALEWSKI M. American Fuzzy Lop (AFL) fuzzer[EB/OL]. https://lcamtuf.coredump.cx/afl/, 2023. [2] Honggfuzz: A security oriented, feedback-driven, evolutionary, easy-to-use fuzzer[EB/OL]. https://github.com/google/honggfuzz, 2023. [3] LLWM. LibFuzzer—A library for coverage-guided fuzz testing[EB/OL]. http://llvm.org/docs/LibFuzzer.html, 2023. [4] LEE M, CHA S, and OH H. Learning seed-adaptive mutation strategies for greybox fuzzing[C]. 2023 IEEE/ACM 45th International Conference on Software Engineering, Melbourne, Australia, 2023: 384–396. doi: 10.1109/ICSE48619.2023.00043. [5] AGRAWAL S and GOYAL N. Analysis of thompson sampling for the multi-armed bandit problem[C]. The 25th Annual Conference on Learning Theory, Edinburgh, UK, 2012: 39. [6] AUER P, CESA-BIANCHI N, and FISCHER P. Finite-time analysis of the Multiarmed bandit problem[J]. Machine Learning, 2002, 47(2/3): 235–256. doi: 10.1023/A:1013689704352. [7] JAUERNIG P, JAKOBOVIC D, PICEK S, et al. DARWIN: Survival of the fittest fuzzing mutators[C]. The 30th Annual Network and Distributed System Security Symposium, San Diego, USA, 2023. doi: 10.14722/ndss.2023.23159. [8] LEMIEUX C and SEN K. FairFuzz: A targeted mutation strategy for increasing greybox fuzz testing coverage[C]. The 33rd IEEE/ACM International Conference on Automated Software Engineering, Montpellier, France, 2018: 475–485. doi: 10.1145/3238147.3238176. [9] SHE Dongdong, SHAH A, and JANA S. Effective seed scheduling for fuzzing with graph centrality analysis[C]. The 43rd IEEE Symposium on Security and Privacy, San Francisco, USA, 2022: 2194–2211. doi: 10.1109/SP46214.2022.9833761. [10] SAHA S, SARKER L, SHAFIUZZAMAN M, et al. Rare path guided fuzzing[C]. The 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, Seattle, USA, 2023: 1295–1306. doi: 10.1145/3597926.3598136. [11] LÜ Chenyang, JI Shouling, ZHANG Chao, et al. MOPT: Optimized mutation scheduling for fuzzers[C]. The 28th USENIX Security Symposium, Santa Clara, USA, 2019: 1949–1966. [12] WU Mingyuan, JIANG Ling, XIANG Jiahong, et al. One fuzzing strategy to rule them all[C]. The 44th International Conference on Software Engineering, Pittsburgh, USA, 2022: 1634–1645. doi: 10.1145/3510003.3510174. [13] SUTTON R S and BARTO A G. Reinforcement Learning: An Introduction[M]. 2nd ed. Cambridge: The MIT Press, 2018: 31–47. [14] 李明磊, 陆余良, 黄晖, 等. 模糊测试变异算子调度优化模型[J]. 小型微型计算机系统, 2021, 42(10): 2190–2195. doi: 10.3969/j.issn.1000-1220.2021.10.029.LI Minglei, LU Yuliang, HUANG Hui, et al. Fuzzy tester mutation operator scheduling optimization algorithm[J]. Journal of Chinese Computer Systems, 2021, 42(10): 2190–2195. doi: 10.3969/j.issn.1000-1220.2021.10.029. [15] FIORALDI A, MAIER D C, EIßFELDT H, et al. AFL++: Combining incremental steps of fuzzing research[C]. The 14th USENIX Conference on Offensive Technologies, Berkeley, USA, 2020: 10. [16] HAZIMEH A, HERRERA A, and PAYER M. Magma: A ground-truth fuzzing benchmark[J]. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 2020, 4(3): 49. doi: 10.1145/3428334. [17] DOLAN-GAVITT B, HULIN P, KIRDA E, et al. LAVA: Large-scale automated vulnerability addition[C]. 2016 IEEE Symposium on Security and Privacy, San Jose, USA, 2016: 110–121. doi: 10.1109/SP.2016.15.