Software Source Code Vulnerability Detection Based on CNN-GAP Interpretability Model

WANG Jian, KUANG Hongyu, LI Ruilin, SU Yunfei

Citation: WANG Jian, KUANG Hongyu, LI Ruilin, SU Yunfei. Software Source Code Vulnerability Detection Based on CNN-GAP Interpretability Model[J]. Journal of Electronics & Information Technology, 2022, 44(7): 2568-2575. doi: 10.11999/JEIT210412


doi: 10.11999/JEIT210412
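Funds: The National Natural Science Foundation of China (61702540), The Hunan Provincial Natural Science Foundation (2018JJ3615)
Details
    Author biographies:

    WANG Jian: male, born in 1975, professor and doctoral supervisor. Research interests: vulnerability discovery and analysis, design and analysis of network security protocols, and network security testing and evaluation.

    KUANG Hongyu: male, born in 1996, PhD candidate. Research interests: vulnerability discovery and analysis.

    LI Ruilin: male, born in 1982, associate professor. Research interests: vulnerability discovery and analysis, and design and analysis of network security protocols.

    SU Yunfei: male, born in 1982, PhD. Research interests: vulnerability discovery and analysis, and network penetration testing analysis.

    Corresponding author:

    WANG Jian jwang@nudt.edu.cn

  • CLC number: TN919.31; TN915.08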

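  • Abstract: Source code vulnerability detection is an important means of ensuring the security of software systems. In recent years, a variety of deep learning models have been applied to source code vulnerability detection and have greatly improved its efficiency, but several problems remain: user-defined identifiers produce too many out-of-vocabulary words, the semantics of the embedded word vectors are not accurate enough, and the neural network models lack interpretability. To address these problems, this paper proposes a source code vulnerability detection method based on an interpretable model that combines a Convolutional Neural Network (CNN) with Global Average Pooling (GAP). First, during source code preprocessing, some user-defined identifiers are normalized, and One-hot encoding is used for word embedding to alleviate the out-of-vocabulary problem. Then, a CNN-GAP neural network model is constructed to identify functions that contain CWE-119 buffer overflow vulnerabilities. Finally, the Class Activation Mapping (CAM) interpretability method is used to visualize the results and mark the code that may be related to the vulnerability. Comparison with the model proposed by Russell et al. and the VulDeePecker model proposed by Li et al. shows that the CNN-GAP model achieves comparable or even better performance and offers a degree of interpretability, which helps researchers analyze vulnerabilities in greater depth.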
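The pipeline summarized in the abstract (One-hot token encoding, convolution, global average pooling, class activation mapping) can be illustrated with a minimal pure-Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the token sequence, the two hand-picked width-2 kernels, the class weights, and all function names. A real model would learn its kernels and weights from data.

```python
from typing import List

Vector = List[float]


def one_hot(tokens: List[str], vocab: List[str]) -> List[Vector]:
    """Encode each token as a One-hot vector over a fixed vocabulary."""
    index = {tok: i for i, tok in enumerate(vocab)}
    return [[1.0 if index[tok] == i else 0.0 for i in range(len(vocab))]
            for tok in tokens]


def conv1d_relu(x: List[Vector], kernels: List[List[Vector]]) -> List[List[float]]:
    """Slide each kernel over the token axis; return one ReLU feature map per kernel."""
    maps = []
    for kernel in kernels:
        width = len(kernel)
        fmap = []
        for t in range(len(x) - width + 1):
            s = sum(kernel[i][d] * x[t + i][d]
                    for i in range(width) for d in range(len(x[0])))
            fmap.append(max(0.0, s))
        maps.append(fmap)
    return maps


def global_average_pool(maps: List[List[float]]) -> List[float]:
    """GAP: reduce every feature map to its mean activation."""
    return [sum(f) / len(f) for f in maps]


def class_score(pooled: List[float], class_weights: List[float]) -> float:
    """Linear class score computed on the pooled features (bias omitted)."""
    return sum(w * p for w, p in zip(class_weights, pooled))


def class_activation_map(maps: List[List[float]], class_weights: List[float]) -> List[float]:
    """CAM: weight each feature map by its class weight and sum per position."""
    return [sum(w * f[t] for w, f in zip(class_weights, maps))
            for t in range(len(maps[0]))]


# Hypothetical normalized token sequence (VAR0/VAR1 stand in for user identifiers).
tokens = ["strcpy", "(", "VAR0", ",", "VAR1", ")"]
vocab = ["strcpy", "(", "VAR0", ",", "VAR1", ")"]
x = one_hot(tokens, vocab)


def pair_kernel(first: str, second: str) -> List[Vector]:
    """Hand-picked width-2 kernel that responds when `first` is followed by `second`."""
    return one_hot([first, second], vocab)


kernels = [pair_kernel("strcpy", "("), pair_kernel(",", "VAR1")]
vulnerable_weights = [1.5, 0.5]  # assumed weights for the "vulnerable" class

maps = conv1d_relu(x, kernels)
score = class_score(global_average_pool(maps), vulnerable_weights)
cam = class_activation_map(maps, vulnerable_weights)

for t, value in enumerate(cam):
    print(f"{tokens[t]} {tokens[t + 1]}: {value:.2f}")
```

Because GAP is linear, the mean of the activation map equals the class score when no bias is used, which is why the map can be read as a per-position breakdown of the prediction. In this toy setup the largest value falls on the window starting at `strcpy`.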
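  • Figure 1  Example of identifier normalization in function source code

    Figure 2  Three forms of separators

    Figure 3  Structure of the CNN-GAP model

    Figure 4  Precision-Recall curves of the CNN-GAP and Russell models

    Figure 5  ROC curve of the CNN-GAP model

    Figure 6  Heap overflow example code

    Figure 7  Stack overflow example code

    Figure 8  Visualization result of the CNN-GAP model for the heap overflow example code

    Figure 9  Visualization result of the CNN-GAP model for the stack overflow example code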

    Table 1  Results of the CNN-GAP model on the Russell test set

    Model         Accuracy  Precision  Recall  F1      TPR     FPR     FNR
    CNN-GAP       0.8945    0.9531     0.8299  0.8872  0.8299  0.0407  0.1700

    Table 2  Comparison of experimental results between the CNN-GAP model and the VulDeePecker model

    Model         Accuracy  Precision  Recall  F1      TPR     FPR     FNR
    CNN-GAP       0.8490    0.9235     0.7610  0.8344  0.7610  0.0630  0.2390
    VulDeePecker  -         0.9170     0.8200  0.8660  0.8200  0.0290  0.1800
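The columns in Tables 1 and 2 are related by standard definitions: Recall equals TPR, FNR = 1 - TPR, and F1 is the harmonic mean of Precision and Recall. The short sketch below recomputes F1 and FNR for the CNN-GAP rows from the reported Precision and Recall as a consistency check. The function names are illustrative and the inputs are the four-decimal values from the tables, so small rounding differences in the last digit are expected.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def false_negative_rate(tpr: float) -> float:
    """FNR is the complement of the true positive rate."""
    return 1.0 - tpr


# (precision, recall) for the CNN-GAP rows as reported in Tables 1 and 2.
reported = {
    "Table 1 (Russell test set)": (0.9531, 0.8299),
    "Table 2 (VulDeePecker comparison)": (0.9235, 0.7610),
}

for name, (precision, recall) in reported.items():
    f1 = f1_score(precision, recall)
    fnr = false_negative_rate(recall)
    print(f"{name}: F1={f1:.4f}, FNR={fnr:.4f}")
```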
  • [1] COUSOT P, COUSOT R, FERET J, et al. The ASTRÉE analyzer[C]. The 14th European Symposium on Programming, Edinburgh, UK, 2005: 21–30.
    [2] HOLZMANN G J. The model checker SPIN[J]. IEEE Transactions on Software Engineering, 1997, 23(5): 279–295. doi: 10.1109/32.588521
    [3] YAMAGUCHI F, LOTTMANN M, and RIECK K. Generalized vulnerability extrapolation using abstract syntax trees[C]. The 28th Annual Computer Security Applications Conference, New York, USA, 2012: 359–368.
    [4] YAMAGUCHI F, GOLDE N, ARP D, et al. Modeling and discovering vulnerabilities with code property graphs[C]. The 2014 IEEE Symposium on Security and Privacy, Berkeley, USA, 2014: 590–604.
    [5] MILLER B P, FREDRIKSEN L, and SO B. An empirical study of the reliability of UNIX utilities[J]. Communications of the ACM, 1990, 33(12): 32–44. doi: 10.1145/96267.96279
    [6] STEPHENS N, GROSEN J, SALLS C, et al. Driller: Augmenting fuzzing through selective symbolic execution[C]. The 2016 23rd Network and Distributed System Security Symposium, San Diego, USA, 2016: 1–16.
    [7] PORTOKALIDIS G, SLOWINSKA A, and BOS H. Argos: An emulator for fingerprinting zero-day attacks for advertised honeypots with automatic signature generation[C]. The 1st ACM SIGOPS/EuroSys European Conference on Computer Systems, New York, USA, 2006: 15–27.
    [8] ZOU Quanchen, ZHANG Tao, WU Runpu, et al. From automation to intelligence: Survey of research on vulnerability discovery techniques[J]. Journal of Tsinghua University: Science & Technology, 2018, 58(12): 1079–1094.
    [9] LIN Guanjun, ZHANG Jun, LUO Wei, et al. POSTER: Vulnerability discovery with function representation learning from unlabeled projects[C]. The 2017 ACM SIGSAC Conference on Computer and Communications Security, New York, USA, 2017: 2539–2541.
    [10] LIN Guanjun, ZHANG Jun, LUO Wei, et al. Cross-project transfer representation learning for vulnerable function discovery[J]. IEEE Transactions on Industrial Informatics, 2018, 14(7): 3289–3297. doi: 10.1109/TII.2018.2821768
    [11] LIN Guanjun, ZHANG Jun, LUO Wei, et al. Software vulnerability discovery via learning multi-domain knowledge bases[J]. IEEE Transactions on Dependable and Secure Computing, 2021, 18(5): 2469–2485. doi: 10.1109/TDSC.2019.2954088
    [12] LIU Shigang, LIN Guanjun, HAN Qinglong, et al. DeepBalance: Deep-learning and fuzzy oversampling for vulnerability detection[J]. IEEE Transactions on Fuzzy Systems, 2020, 28(7): 1329–1343.
    [13] LIU Shigang, LIN Guanjun, QU Lizhen, et al. CD-VulD: Cross-domain vulnerability discovery based on deep domain adaptation[J]. IEEE Transactions on Dependable and Secure Computing, 2022, 19(1): 438–451. doi: 10.1109/TDSC.2020.2984505
    [14] LI Zhen, ZOU Deqing, XU Shouhuai, et al. VulDeePecker: A deep learning-based system for vulnerability detection[C]. The 25th Annual Network and Distributed System Security Symposium, San Diego, USA, 2018.
    [15] ZOU Deqing, WANG Sujuan, XU Shouhuai, et al. μVulDeePecker: A deep learning-based system for multiclass vulnerability detection[J]. IEEE Transactions on Dependable and Secure Computing, 2021, 18(5): 2224–2236. doi: 10.1109/TDSC.2019.2942930
    [16] LI Zhen, ZOU Deqing, XU Shouhuai, et al. VulDeeLocator: A deep learning-based fine-grained vulnerability detector[J]. IEEE Transactions on Dependable and Secure Computing, 2022, 19(4): 2821–2837. doi: 10.1109/TDSC.2021.3076142
    [17] RUSSELL R, KIM L, HAMILTON L, et al. Automated vulnerability detection in source code using deep representation learning[C]. The 17th IEEE International Conference on Machine Learning and Applications, Orlando, USA, 2018: 757–762.
    [18] ZHOU Yaqin, LIU Shangqing, SIOW J K, et al. Devign: Effective vulnerability identification by learning comprehensive program semantics via graph neural networks[C]. The 33rd International Conference on Neural Information Processing Systems, Red Hook, USA, 2019: 10197–10207.
    [19] DUAN Xu, WU Jingzheng, LUO Tianyue, et al. Vulnerability mining method based on code property graph and attention BiLSTM[J]. Journal of Software, 2020, 31(11): 3404–3420.
    [20] ZHOU Bolei, KHOSLA A, LAPEDRIZA A, et al. Learning deep features for discriminative localization[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 2921–2929.
Publication history
  • Received: 2021-05-12
  • Revised: 2021-11-11
  • Accepted: 2021-11-11
  • Published online: 2021-11-15
  • Issue published: 2022-07-25
