An Interpretable Vulnerability Detection Method Based on Graph and Code Slicing
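Summary: Deep learning has been widely applied to vulnerability detection. Mainstream methods fall into two classes, those based on code sequences and those based on code graphs: the former tend to produce false positives because they ignore structure, while the latter struggle to capture execution order. Moreover, both generally lack interpretability, which makes it hard to locate the root cause of a vulnerability. To address this, this paper proposes GSVD, an interpretable vulnerability detection method based on graphs and code slicing. The model uses a gated graph convolutional network to extract structural semantics from multi-dimensional code graphs (AST, DDG, CDG), and combines taint-analysis-driven code slicing with a bidirectional long short-term memory network to capture code sequence features precisely, so that the two approaches complement each other. In addition, drawing on the idea of the HITS algorithm, an explainer named VDExplainer is designed to reveal the model's decision process. Experiments show that GSVD reaches an accuracy of 64.57% on the Devign dataset, outperforming multiple baseline models, and that it detects vulnerabilities effectively while providing interpretable localization at the level of code lines.
Abstract: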
Objective: Deep learning has been widely applied to source code vulnerability detection. Mainstream methods can be categorized into sequence-based and graph-based approaches. Sequence-based models usually convert structured code into a linear sequence, which ignores the syntactic and structural information of the program and often leads to a high false-positive rate. Graph-based models capture structural features effectively but fail to model the execution order of the program. In addition, their prediction granularity is usually coarse and limited to the function level. Both types of methods lack interpretability, which makes it difficult for developers to locate the root causes of vulnerabilities. Although large language models (LLMs) have made progress in code understanding, they still suffer from high computational overhead, hallucination in the security domain, and insufficient understanding of complex program logic. To address these issues, this paper proposes an interpretable vulnerability detection method based on graphs and code slicing (GSVD). The proposed method integrates structural semantics and sequential features and provides fine-grained, line-level explanations for model decisions.

Methods: The proposed method consists of four main components: code graph feature extraction, code sequence feature extraction, feature fusion, and an interpreter module (Fig. 1). First, the source code is normalized, and the Joern static analysis tool is used to convert it into multiple code graphs, including the Abstract Syntax Tree (AST), Data Dependency Graph (DDG), and Control Dependency Graph (CDG). Together, these graphs represent the syntactic structure, data flow, and control flow of the program. Node features are then initialized by combining CodeBERT embeddings with one-hot encodings of node types. Given the adjacency matrix of each graph, a Gated Graph Convolutional Network (GGCN) equipped with a self-attention pooling layer extracts deep structural semantic features. In parallel, a code slicing algorithm based on taint analysis (Algorithm 1) is designed. The algorithm identifies taint sources and propagates taint along data and control dependencies, generating concise code slices that are closely related to potential vulnerabilities. These slices remove irrelevant code noise and are processed by a Bidirectional Long Short-Term Memory (BiLSTM) network to capture long-range sequential dependencies. After both graph and sequence features are obtained, a gating mechanism is introduced for feature fusion: the two feature vectors are fed into a Gated Recurrent Unit (GRU), which learns the dependency between structural and sequential information through its dynamic state updates. Finally, to support vulnerability localization, an explainer named VDExplainer is designed around the characteristics of the vulnerability detection task. Inspired by the HITS algorithm, it iteratively computes the "authority" and "hub" values of nodes to evaluate their importance under the constraint of an edge mask, thus achieving node-level interpretability.

Results and Discussions: To evaluate the effectiveness of GSVD, comparative experiments (Table 2) are conducted on the Devign (FFmpeg + Qemu) dataset against several baseline models. GSVD achieves the highest accuracy and F1-score, 64.57% and 61.89%, respectively. Its recall reaches 62.63%, indicating that the proposed method performs the vulnerability detection task effectively and misses fewer vulnerabilities. To verify the effectiveness of the GRU-based fusion mechanism, it is compared with three other feature fusion strategies: feature concatenation, weighted sum, and attention mechanism (Table 3). GSVD achieves the best overall performance, with accuracy, recall, and F1-score reaching 64.57%, 62.63%, and 61.89%, respectively. Its precision is 61.17%, slightly lower than the 63.33% obtained by the weighted sum method. Ablation experiments (Tables 4 and 5) further confirm the importance of the proposed slicing algorithm. The taint propagation-based slicing method reduces the average number of code lines from 51.98 to 17.30 (a 66.72% reduction) and lowers the data redundancy rate to 6.42%, compared with 19.58% for VulDeePecker and 22.10% for SySeVR. This noise suppression yields an improvement of 1.53 percentage points in F1-score, demonstrating the ability of the slicing method to focus on key code segments. Finally, interpretability experiments (Table 6) on the Big-Vul dataset validate the effectiveness of VDExplainer. The proposed explainer outperforms the standard GNNExplainer at all evaluation thresholds. When 50% of the nodes are selected, the localization accuracy improves by 7.65 percentage points, showing its advantage in node-level vulnerability localization. In summary, GSVD not only achieves superior detection performance but also improves the interpretability of model decisions, providing practical support for vulnerability localization and remediation.

Conclusions: The GSVD model addresses the limitations of single-modal approaches by deeply integrating graph structures with taint analysis-based code slices. It achieves notable improvements in vulnerability detection accuracy and interpretability. In addition, VDExplainer provides node-level and line-level vulnerability localization, enhancing the practical value of the model. Experimental results confirm the superiority of the proposed method in both detection performance and interpretability.
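The HITS-inspired scoring described in the Methods can be illustrated with a small sketch. It is not the authors' VDExplainer: the function names `masked_hits` and `_normalize`, the edge-list representation, and the fixed (rather than learned) edge mask are all assumptions made here for illustration. The sketch only shows how authority and hub values can be iterated when each edge contributes in proportion to a mask weight.

```python
import math


def _normalize(values):
    """Scale a score vector to unit L2 norm; leave an all-zero vector unchanged."""
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values] if norm > 0 else values


def masked_hits(num_nodes, edges, edge_mask, iterations=50):
    """HITS-style authority/hub iteration with per-edge mask weights.

    edges: list of (src, dst) pairs (hypothetical representation).
    edge_mask: one weight in [0, 1] per edge; assumed fixed here.
    """
    hub = [1.0] * num_nodes
    auth = [1.0] * num_nodes
    for _ in range(iterations):
        # Authority of a node: masked sum of hub scores over its incoming edges.
        new_auth = [0.0] * num_nodes
        for (src, dst), w in zip(edges, edge_mask):
            new_auth[dst] += w * hub[src]
        auth = _normalize(new_auth)
        # Hub of a node: masked sum of authority scores over its outgoing edges.
        new_hub = [0.0] * num_nodes
        for (src, dst), w in zip(edges, edge_mask):
            new_hub[src] += w * auth[dst]
        hub = _normalize(new_hub)
    return auth, hub


# Toy graph: node 2 is reached from nodes 0 and 1; node 0 also points to node 1.
edges = [(0, 1), (0, 2), (1, 2)]
auth, hub = masked_hits(3, edges, edge_mask=[1.0, 1.0, 1.0])
masked_auth, _ = masked_hits(3, edges, edge_mask=[1.0, 0.0, 0.0])
```

With all edges kept, node 2 receives the highest authority and node 0 the highest hub value; masking out the two edges into node 2 shifts the top authority to node 1. How VDExplainer obtains the mask and combines the two scores into a node importance is not specified in this abstract.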
Key words:
- Vulnerability detection
- Deep learning
- Graph neural network
- Code slicing
- Interpretability
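The taint-propagation slicing described in the Methods (Algorithm 1) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Stmt` record, its `kind`, `defs`, and `uses` fields, and the `control_dependents` dictionary are hypothetical stand-ins for what Joern's AST, DDG, and CDG would provide, and statement ids are assumed to follow source order.

```python
from dataclasses import dataclass, field


@dataclass
class Stmt:
    sid: int                                 # position in source order (assumed)
    kind: str                                # "input", "assign", "call" or "other"
    defs: set = field(default_factory=set)   # variables written by the statement
    uses: set = field(default_factory=set)   # variables read by the statement


def taint_slice(stmts, control_dependents, initial_taint=()):
    """Return the tainted statements, or all statements if none are tainted."""
    tainted = set(initial_taint)
    sliced = {}
    for s in stmts:
        if s.kind == "input":
            # External input: its variables become taint sources.
            tainted |= s.defs
            sliced[s.sid] = s
        elif (s.defs | s.uses) & tainted:
            reads_taint = bool(s.uses & tainted)
            writes_taint = bool(s.defs & tainted)
            if s.kind == "assign":
                if reads_taint and not writes_taint:
                    tainted |= s.defs          # data-dependency propagation
                    sliced[s.sid] = s
                elif writes_taint and not reads_taint:
                    tainted -= s.defs          # sanitization
            elif s.kind == "call":
                if reads_taint:
                    tainted |= s.defs
                    sliced[s.sid] = s
                else:
                    tainted -= s.defs          # sanitization
            else:
                sliced[s.sid] = s
        if s.sid in sliced:
            # Control-dependency propagation.
            for c in control_dependents.get(s.sid, []):
                tainted |= c.defs | c.uses
                sliced[c.sid] = c
    if not sliced:
        return list(stmts)
    return [sliced[sid] for sid in sorted(sliced)]


program = [
    Stmt(1, "input", defs={"buf"}),
    Stmt(2, "assign", defs={"n"}, uses={"buf"}),
    Stmt(3, "assign", defs={"k"}, uses={"c"}),
    Stmt(4, "call", defs={"r"}, uses={"n"}),
    Stmt(5, "assign", defs={"n"}, uses={"zero"}),
    Stmt(6, "call", defs={"q"}, uses={"n"}),
]
slice_ids = [s.sid for s in taint_slice(program, {})]
print(slice_ids)  # [1, 2, 4]
```

In this toy program, statement 3 never touches tainted data, statement 5 overwrites `n` with an untainted value (sanitization), and statement 6 therefore falls outside the slice.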
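Algorithm 1 Code slicing algorithm
Input: abstract syntax tree AST, data dependency graph DDG, control dependency graph CDG, initial set of tainted variables $T_0$
Output: set of tainted statements S
1: T ← $T_0$ // initialize the set of tainted variables
2: S ← $\varnothing$ // initialize the set of tainted statements
3: for each statement s ∈ AST (depth-first traversal) do
4:   if s contains external input then
5:     add the input variables to T, and let S ← S ∪ {s}
6:   else if s depends on a variable in T then // data-dependency propagation
7:     if s is an assignment x = f(y) then
8:       if y ∈ T, x $\notin$ T then T ← T ∪ {x}, S ← S ∪ {s}
9:       else if x ∈ T, y $\notin$ T then T ← T − {x} // sanitization
10:    else if s is a function call z = f(x1, ···, xn) then
11:      if $\exists$ xi ∈ T then T ← T ∪ {output variables of z}, S ← S ∪ {s}
12:      else T ← T − {output variables of z} // sanitization
13:    else S ← S ∪ {s}
14:  end if
15:  if s ∈ S then // control-dependency propagation
16:    for each c ∈ CDG.control_dependents(s) do
17:      add the variables in c to T, and let S ← S ∪ {c}
18:    end for
19:  end if
20: end for
21: return S ≠ $\varnothing$ ? S : AST

Table 1 Distribution of dataset statistics
Metric                          Training set   Validation set   Test set
Total samples                   21854          2732             2732
Average function length         51.71          49.84            51.98
Average number of statements    41.84          40.80            41.81
Average AST depth               13.26          13.27            13.23

Table 2 Experimental results of GSVD (%)
Model         Accuracy   Precision   Recall   F1-Score
Cppcheck      57.96      79.55       11.34    19.85
FlawFinder    52.04      46.69       34.41    39.62
BiLSTM        59.37      /           /        /
TextCNN       60.69      /           /        /
RoBERTa       61.05      /           /        /
CodeBERT      62.08      /           /        /
Devign        59.22      57.23       44.46    50.04
ReGVD_GCN     61.90      62.62       42.31    50.50
ReGVD_GGCN    62.12      61.58       46.61    53.06
Zeng's        64.49      64.29       51.08    56.93
GSVD          64.57      61.17       62.63    61.89

Table 3 Comparison of feature fusion methods (%)
Method                  Accuracy   Precision   Recall   F1-Score
Feature concatenation   63.36      59.75       61.99    60.85
Weighted sum            63.14      63.33       46.93    53.91
Attention mechanism     62.55      59.54       57.68    58.60
GSVD                    64.57      61.17       62.63    61.89

Table 4 Comparison of data redundancy (%)
Method         Data redundancy rate
Unsliced       0.2
VulDeePecker   19.58
SySeVR         22.10
Ours           6.42

Table 5 Impact of code slicing
Setting    Accuracy   Precision   Recall   F1-score   Average lines of source code
Unsliced   64.13%     61.30%      59.44%   60.36%     51.98
Sliced     64.57%     61.17%      62.63%   61.89%     17.30

Table 6 Explainer localization experiment (%)
Explainer      Proportion of important nodes selected   Vulnerability localization accuracy
GNNExplainer   30                                        18.37
GNNExplainer   40                                        28.94
GNNExplainer   50                                        41.12
VDExplainer    30                                        24.30
VDExplainer    40                                        35.68
VDExplainer    50                                        48.77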
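The derived figures quoted in the abstract can be recomputed from the tabulated values as a consistency check. The short sketch below uses the standard F1 definition (harmonic mean of precision and recall) and a plain relative reduction; the helper names `f1_score` and `relative_reduction` are introduced here for illustration only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    return 2 * precision * recall / (precision + recall)


def relative_reduction(before, after):
    """Relative decrease from `before` to `after`, in percent."""
    return (before - after) / before * 100


gsvd_f1 = f1_score(61.17, 62.63)                   # GSVD row of Table 2
line_reduction = relative_reduction(51.98, 17.30)  # average lines, Table 5
f1_gain = 61.89 - 60.36                            # sliced vs. unsliced F1, Table 5
localization_gain = 48.77 - 41.12                  # 50% node selection, Table 6

print(f"{gsvd_f1:.2f}")            # 61.89
print(f"{line_reduction:.2f}")     # 66.72
print(f"{f1_gain:.2f}")            # 1.53
print(f"{localization_gain:.2f}")  # 7.65
```

The recomputed values match the reported F1-score, the 66.72% reduction in average code lines, and the gains of 1.53 and 7.65, the last two being differences in percentage points.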