An Interpretable Vulnerability Detection Method Based on Graph and Code Slicing

GAO Wenchao, SUO Jianhua, ZHANG Ao

Citation: GAO Wenchao, SUO Jianhua, ZHANG Ao. An Interpretable Vulnerability Detection Method Based on Graph and Code Slicing[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT250363


doi: 10.11999/JEIT250363 cstr: 32379.14.JEIT250363
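Details
    About the authors:

    GAO Wenchao: female, associate professor; research interests include natural language processing, computer vision, and big data

    SUO Jianhua: male, master's student; research interests include large models and natural language processing

    ZHANG Ao: male, master's degree; research interests include big data, artificial intelligence, and software security

    Corresponding author:

    GAO Wenchao gaowc@cumtb.edu.cn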

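  • 1) In all tables of this paper, where bold and underlined entries appear, bold marks the best value of each evaluation metric and underline marks the second-best value.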


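  • Abstract: Deep learning has been widely applied to vulnerability detection. Mainstream methods fall into two classes, those based on code sequences and those based on code graphs: the former tend to produce false positives because they ignore code structure, while the latter struggle to capture execution order. In addition, both generally lack interpretability, which makes it hard to locate the root cause of a vulnerability. To address this, this paper proposes GSVD, an interpretable vulnerability detection method based on graph and code slicing. The model uses a gated graph convolutional network to extract structural semantics from multi-dimensional code graphs (AST, DDG, CDG), and combines taint-analysis-driven code slicing with a bidirectional long short-term memory network (BiLSTM) to capture code sequence features, so that the two views complement each other. Drawing on the idea of the HITS algorithm, an explainer named VDExplainer is designed to reveal the model's decision process. Experiments show that GSVD reaches an accuracy of 64.57% on the Devign dataset, outperforming several baseline models, which indicates that it detects vulnerabilities effectively while also providing interpretable localization at the level of code lines.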
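  • Figure 1  Overall architecture of the GSVD model

    Figure 2  Code graph representation

    Figure 3  Example of the code slicing algorithm

    Figure 4  Important node selection process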

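    Algorithm 1  Code slicing algorithm

     Input: abstract syntax tree AST, data dependence graph DDG, control dependence graph CDG, initial set of tainted variables ${T_0}$
     Output: set of tainted statements S
     1: T ← ${T_0}$ // initialize the set of tainted variables
     2: S ← $\varnothing$ // initialize the set of tainted statements
     3: for each statement s ∈ AST (depth-first traversal) do
     4:  if s contains external input then
     5:   add the input variables to T, and let S ← S ∪ {s}
     6:  else if s depends on a variable in T then // data-dependence propagation
     7:   if s is an assignment x = f(y) then
     8:    if y ∈ T, x $\notin$ T then T ← T ∪ {x}, S ← S ∪ {s}
     9:    else if x ∈ T, y $\notin$ T then T ← T − {x} // sanitization
     10:   else if s is a function call z = f(x1, ···, xn) then
     11:    if $\exists$ xi ∈ T then T ← T ∪ {output variables of z}, S ← S ∪ {s}
     12:    else T ← T − {output variables of z} // sanitization
     13:   else S ← S ∪ {s}
     14:  end if
     15:  if s ∈ S then // control-dependence propagation
     16:   for each c ∈ CDG.control_dependents(s) do
     17:    add the variables in c to T, and let S ← S ∪ {c}
     18:   end for
     19:  end if
     20: end for
     21: return S ≠ $\varnothing$ ? S : AST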
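
The listing above can be traced on a toy input with a minimal Python sketch. This is not the authors' implementation: the `Stmt` record, the `taint_slice` function, the flat statement list standing in for a depth-first AST traversal, and the eight-statement example are all hypothetical. The sketch also assumes that "depends on a variable in T" means the statement defines or uses a tainted variable, which is what makes the sanitization branch reachable.

```python
from dataclasses import dataclass, field


@dataclass
class Stmt:
    sid: int                                  # statement id (e.g. a line number)
    kind: str                                 # "input", "assign", "call", or "other"
    defs: set = field(default_factory=set)    # variables written by the statement
    uses: set = field(default_factory=set)    # variables read by the statement


def taint_slice(stmts, control_dependents, initial_taint=()):
    """Follow the numbered steps of the listing over `stmts` in traversal order."""
    tainted = set(initial_taint)              # T
    sliced = set()                            # S, stored as statement ids
    by_id = {s.sid: s for s in stmts}
    for s in stmts:
        if s.kind == "input":                 # steps 4-5: external input
            tainted |= s.defs
            sliced.add(s.sid)
        elif (s.defs | s.uses) & tainted:     # step 6 (assumed reading of "depends on")
            src_tainted = bool(s.uses & tainted)
            dst_tainted = bool(s.defs & tainted)
            if s.kind == "assign":            # steps 7-9
                if src_tainted and not dst_tainted:
                    tainted |= s.defs
                    sliced.add(s.sid)
                elif dst_tainted and not src_tainted:
                    tainted -= s.defs         # sanitization
            elif s.kind == "call":            # steps 10-12
                if src_tainted:
                    tainted |= s.defs
                    sliced.add(s.sid)
                else:
                    tainted -= s.defs         # sanitization
            else:                             # step 13
                sliced.add(s.sid)
        if s.sid in sliced:                   # steps 15-19: control dependence
            for cid in control_dependents.get(s.sid, ()):
                c = by_id[cid]
                tainted |= c.defs | c.uses
                sliced.add(cid)
    # step 21: fall back to the whole function when nothing was selected
    return sorted(sliced) if sliced else [s.sid for s in stmts]


# Hypothetical function body, one Stmt per source line.
example = [
    Stmt(1, "input", {"n"}),                        # n = read_input()
    Stmt(2, "assign", {"size"}, {"n"}),             # size = n * 4
    Stmt(3, "call", {"buf"}, {"size"}),             # buf = malloc(size)
    Stmt(4, "assign", {"k"}),                       # k = 10
    Stmt(5, "assign", {"n"}),                       # n = 0  (sanitizes n)
    Stmt(6, "call", set(), {"k"}),                  # log(k)
    Stmt(7, "other", set(), {"buf"}),               # if (buf)
    Stmt(8, "call", set(), {"buf", "src", "size"}), # memcpy(buf, src, size)
]
example_cdg = {7: [8]}                              # statement 8 is controlled by 7
print(taint_slice(example, example_cdg))
```

In this toy run the untainted statements 4 to 6 drop out of the slice, which is the kind of reduction in code length that slicing is meant to achieve.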

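    Table 1  Statistical characteristics of the dataset

    Feature metric            Training set  Validation set  Test set
    Total samples             21854         2732            2732
    Average function length   51.71         49.84           51.98
    Average statement count   41.84         40.80           41.81
    Average AST depth         13.26         13.27           13.23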

    Table 2  GSVD experimental results (%)

    Method      Accuracy  Precision  Recall  F1-Score
    Cppcheck    57.96     79.55      11.34   19.85
    FlawFinder  52.04     46.69      34.41   39.62
    BiLSTM      59.37     /          /       /
    TextCNN     60.69     /          /       /
    RoBERTa     61.05     /          /       /
    CodeBERT    62.08     /          /       /
    Devign      59.22     57.23      44.46   50.04
    ReGVD_GCN   61.90     62.62      42.31   50.50
    ReGVD_GGCN  62.12     61.58      46.61   53.06
    Zeng’s      64.49     64.29      51.08   56.93
    GSVD        64.57     61.17      62.63   61.89
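
As a quick consistency check on the results table, the sketch below recomputes F1 from the reported precision and recall. It assumes the standard definition of F1 as the harmonic mean of the two, which this page does not state explicitly. The names `f1_score`, `TABLE2`, and `max_f1_gap` are illustrative, and only rows that report all three values are included.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the results table, in percent.
TABLE2 = {
    "Cppcheck": (79.55, 11.34, 19.85),
    "FlawFinder": (46.69, 34.41, 39.62),
    "Devign": (57.23, 44.46, 50.04),
    "ReGVD_GCN": (62.62, 42.31, 50.50),
    "ReGVD_GGCN": (61.58, 46.61, 53.06),
    "Zeng's": (64.29, 51.08, 56.93),
    "GSVD": (61.17, 62.63, 61.89),
}


def max_f1_gap(rows):
    """Largest absolute difference between recomputed and reported F1."""
    return max(abs(f1_score(p, r) - f1) for p, r, f1 in rows.values())


for name, (p, r, f1) in TABLE2.items():
    print(f"{name:12s} recomputed={f1_score(p, r):6.2f} reported={f1:6.2f}")
```

The rows also show why F1 matters here: Cppcheck has the highest precision in the table but a recall of only 11.34%, so its F1 is the lowest.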

    Table 3  Comparison of feature fusion methods (%)

    Method                 Accuracy  Precision  Recall  F1-Score
    Feature concatenation  63.36     59.75      61.99   60.85
    Weighted sum           63.14     63.33      46.93   53.91
    Attention mechanism    62.55     59.54      57.68   58.60
    GSVD                   64.57     61.17      62.63   61.89
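
For readers unfamiliar with the three baseline fusion strategies named in this table, the following pure-Python sketch shows one common form of each for a graph feature vector and a sequence feature vector. The formulations are assumptions: the page gives neither the weights, the attention design, nor the fusion used by GSVD itself, and the function names are illustrative.

```python
import math


def concat_fusion(graph_feat, seq_feat):
    """Feature concatenation: output dimension is the sum of both inputs."""
    return list(graph_feat) + list(seq_feat)


def weighted_sum_fusion(graph_feat, seq_feat, alpha=0.5):
    """Weighted sum with a fixed mixing weight alpha (assumed hyperparameter)."""
    return [alpha * g + (1 - alpha) * s for g, s in zip(graph_feat, seq_feat)]


def attention_fusion(graph_feat, seq_feat, query):
    """Softmax over dot-product scores of each view against a query vector."""
    views = (graph_feat, seq_feat)
    scores = [sum(q * x for q, x in zip(query, v)) for v in views]
    top = max(scores)                          # subtract max for numerical stability
    exps = [math.exp(sc - top) for sc in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [weights[0] * g + weights[1] * s for g, s in zip(graph_feat, seq_feat)]


g, s = [1.0, 2.0], [3.0, 4.0]
print(concat_fusion(g, s))
print(weighted_sum_fusion(g, s, alpha=0.5))
print(attention_fusion(g, s, query=[0.0, 0.0]))   # zero query gives equal weights
```

In a trained model the mixing weight or the query would be learned parameters rather than constants.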

    Table 4  Comparison of data duplication (%)

    Method        Data duplication rate
    Unsliced      0.2
    VulDeePecker  19.58
    SySeVR        22.10
    Ours          6.42
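
The page does not define how the duplication rate is measured. One plausible reading, sketched below, counts the share of samples whose text matches an earlier sample once whitespace is collapsed. The function name, the normalization rule, and the sample strings are assumptions made for illustration.

```python
def duplication_rate(samples):
    """Percentage of samples that repeat an earlier sample after whitespace normalization."""
    if not samples:
        return 0.0
    seen = set()
    duplicates = 0
    for code in samples:
        key = " ".join(code.split())      # collapse runs of whitespace
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return 100.0 * duplicates / len(samples)


slices = ["a = b;", "a  =  b;", "c = d;", "e = f;"]
print(duplication_rate(slices))
```

Under this reading, aggressive slicing raises the rate because different functions can reduce to the same short fragment, which is consistent with the gap between unsliced code and the slicing-based methods in the table.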

    Table 5  Effect of code slicing

    Setting   Accuracy  Precision  Recall  F1-Score  Average source lines
    Unsliced  64.13%    61.30%     59.44%  60.36%    51.98
    Sliced    64.57%    61.17%     62.63%  61.89%    17.30

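    Table 6  Explanation and localization results of the explainers (%)

    Explainer     Important node selection ratio  Vulnerability localization accuracy
    GNNExplainer  30                              18.37
    GNNExplainer  40                              28.94
    GNNExplainer  50                              41.12
    VDExplainer   30                              24.30
    VDExplainer   40                              35.68
    VDExplainer   50                              48.77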
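
VDExplainer is described only as drawing on the idea of the HITS algorithm; its actual formulation is not given on this page. The sketch below therefore shows plain HITS power iteration on a small directed graph, followed by keeping a top percentage of nodes as in the selection-ratio column above. The toy graph, the function names, and the choice to rank by authority score are assumptions.

```python
import math


def hits(nodes, edges, iterations=20):
    """Standard HITS: authority = sum of in-neighbor hubs, hub = sum of out-neighbor authorities."""
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        new_auth = {n: 0.0 for n in nodes}
        for u, v in edges:
            new_auth[v] += hub[u]
        norm = math.sqrt(sum(a * a for a in new_auth.values())) or 1.0
        auth = {n: a / norm for n, a in new_auth.items()}

        new_hub = {n: 0.0 for n in nodes}
        for u, v in edges:
            new_hub[u] += auth[v]
        norm = math.sqrt(sum(h * h for h in new_hub.values())) or 1.0
        hub = {n: h / norm for n, h in new_hub.items()}
    return hub, auth


def top_nodes(scores, percent):
    """Keep the top `percent` of nodes (rounded up), ties broken by node id."""
    k = max(1, (len(scores) * percent + 99) // 100)
    ranked = sorted(scores, key=lambda n: (-scores[n], n))
    return ranked[:k]


# Hypothetical dependence graph over five code lines: 1, 2, 4 feed line 3, which feeds line 5.
toy_nodes = [1, 2, 3, 4, 5]
toy_edges = [(1, 3), (2, 3), (4, 3), (3, 5)]
toy_hub, toy_auth = hits(toy_nodes, toy_edges)
print(top_nodes(toy_auth, 40))
```

Line 3 receives the highest authority because three statements point to it, so it survives even a small selection ratio; the selected nodes would then be mapped back to code lines for localization.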
  • [1] GAO Qing, MA Sen, SHAO Sihao, et al. CoBOT: Static C/C++ bug detection in the presence of incomplete code[C]. Proceedings of the 26th Conference on Program Comprehension, Gothenburg, Sweden, 2018: 385–388. doi: 10.1145/3196321.3196367.
    [2] ZHANG Yu, HUO Wei, JIAN Kunpeng, et al. SRFuzzer: An automatic fuzzing framework for physical SOHO router devices to discover multi-type vulnerabilities[C]. The 35th Annual Computer Security Applications Conference, San Juan, USA, 2019: 544–556. doi: 10.1145/3359789.3359826.
    [3] LI Zhen, ZOU Deqing, XU Shouhuai, et al. VulDeePecker: A deep learning-based system for vulnerability detection[C]. The 25th Annual Network and Distributed Systems Security Symposium, San Diego, USA, 2018. doi: 10.14722/ndss.2018.23158.
    [4] ZOU Deqing, WANG Sujuan, XU Shouhuai, et al. μVulDeePecker: A deep learning-based system for multiclass vulnerability detection[J]. IEEE Transactions on Dependable and Secure Computing, 2021, 18(5): 2224–2236. doi: 10.1109/TDSC.2019.2942930.
    [5] LI Zhen, ZOU Deqing, XU Shouhuai, et al. SySeVR: A framework for using deep learning to detect software vulnerabilities[J]. IEEE Transactions on Dependable and Secure Computing, 2022, 19(4): 2244–2258. doi: 10.1109/TDSC.2021.3051525.
    [6] ZHOU Yaqin, LIU Shangqing, SIOW J, et al. Devign: Effective vulnerability identification by learning comprehensive program semantics via graph neural networks[C]. Proceedings of the 33rd International Conference on Neural Information Processing Systems, Vancouver, Canada, 2019: 915. doi: 10.5555/3454287.3455202.
    [7] FENG Qi, FENG Chendong, and HONG Weijiang. Graph neural network-based vulnerability predication[C]. 2020 IEEE International Conference on Software Maintenance and Evolution (ICSME), Adelaide, Australia, 2020: 800–801. doi: 10.1109/ICSME46990.2020.00096.
    [8] NGUYEN V A, NGUYEN D Q, NGUYEN V, et al. ReGVD: Revisiting graph neural networks for vulnerability detection[C]. Proceedings of the ACM/IEEE 44th International Conference on Software Engineering: Companion Proceedings, Pittsburgh, USA, 2022: 178–182. doi: 10.1145/3510454.3516865.
    [9] CHAKRABORTY S, KRISHNA R, DING Yangruibo, et al. Deep learning based vulnerability detection: Are we there yet?[J]. IEEE Transactions on Software Engineering, 2022, 48(9): 3280–3296. doi: 10.1109/TSE.2021.3087402.
    [10] ALI G M A and CHEN Hongsong. Contract-guardian: A bagging-based gradient boosting decision tree for detection vulnerability in smart contract[J]. Cluster Computing, 2025, 28(8): 528. doi: 10.1007/s10586-025-05230-2.
    [11] GUO Daya, ZHU Qihao, YANG Dejian, et al. DeepSeek-coder: When the large language model meets programming -- the rise of code intelligence[J]. arXiv preprint arXiv: 2401.14196, 2024.
    [12] DeepSeek-AI. DeepSeek-coder-V2: Breaking the barrier of closed-source models in code intelligence[J]. arXiv preprint arXiv: 2406.11931, 2024.
    [13] AGHAEI E, NIU Xi, SHADID W, et al. SecureBERT: A domain-specific language model for cybersecurity[C]. 18th International Conference on Security and Privacy in Communication Networks, Kansas, USA, 2022: 39–56. doi: 10.1007/978-3-031-25538-0_3.
    [14] SUN Yuqiang, WU Daoyuan, XUE Yue, et al. LLM4Vuln: A unified evaluation framework for decoupling and enhancing LLMs' vulnerability reasoning[J]. arXiv preprint arXiv: 2401.16185, 2024.
    [15] FAR S M T and FEYZI F. Large language models for software vulnerability detection: A guide for researchers on models, methods, techniques, datasets, and metrics[J]. International Journal of Information Security, 2025, 24(2): 78. doi: 10.1007/s10207-025-00992-7.
    [16] ZHOU Xin, CAO Sicong, SUN Xiaobing, et al. Large language model for vulnerability detection and repair: Literature review and the road ahead[J]. ACM Transactions on Software Engineering and Methodology, 2025, 34(5): 145. doi: 10.1145/3708522.
    [17] YING R, BOURGEOIS D, YOU Jiaxuan, et al. GNNExplainer: Generating explanations for graph neural networks[C]. Proceedings of the 33rd International Conference on Neural Information Processing Systems, Vancouver, Canada, 2019: 829. doi: 10.5555/3454287.3455116.
    [18] FAN Jiahao, LI Yi, WANG Shaohua, et al. A C/C++ code vulnerability dataset with code changes and CVE summaries[C]. Proceedings of the 17th International Conference on Mining Software Repositories, Seoul, Korea, 2020: 508–512. doi: 10.1145/3379597.3387501.
    [19] D'ABRUZZO PEREIRA J and VIEIRA M. On the use of open-source C/C++ static analysis tools in large projects[C]. 2020 16th European Dependable Computing Conference (EDCC), Munich, Germany, 2020: 97–102. doi: 10.1109/EDCC51268.2020.00025.
    [20] FERSCHKE O, GUREVYCH I, and RITTBERGER M. FlawFinder: A modular system for predicting quality flaws in wikipedia[C]. CLEF 2012 Evaluation Labs and Workshop, Online Working Notes, Rome, Italy, 2012: 1178.
    [21] GRAVES A. Long short-term memory[M]. GRAVES A. Supervised Sequence Labelling with Recurrent Neural Networks. Berlin: Springer, 2012: 37–45. doi: 10.1007/978-3-642-24797-2_4.
    [22] CHEN Yahui. Convolutional neural network for sentence classification[D]. [Master dissertation], University of Waterloo, 2015.
    [23] LIU Yinhan, OTT M, GOYAL N, et al. RoBERTa: A robustly optimized BERT pretraining approach[C]. International Conference on Learning Representations, Addis Ababa, Ethiopia, 2020.
    [24] FENG Zhangyin, GUO Daya, TANG Duyu, et al. CodeBERT: A pre-trained model for programming and natural languages[C]. Findings of the Association for Computational Linguistics: EMNLP 2020, 2020: 1536–1547. doi: 10.18653/v1/2020.findings-emnlp.139.
    [25] ZENG Ciling, ZHOU Bo, DONG Huoyuan, et al. A general source code vulnerability detection method via ensemble of graph neural networks[C]. The 6th International Conference on Frontiers in Cyber Security, Chengdu, China, 2023: 560–574. doi: 10.1007/978-981-99-9331-4_37.
Publication history
  • Revised:  2025-11-13
  • Accepted:  2025-11-13
  • Published online:  2025-11-18
