Source Code Vulnerability Detection Method Integrating Code Sequences and Property Graphs

YANG Hongyu, LUO Jingchuan, CHENG Xiang, HU Juncheng

Citation: YANG Hongyu, LUO Jingchuan, CHENG Xiang, HU Juncheng. Source Code Vulnerability Detection Method Integrating Code Sequences and Property Graphs[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT250470

doi: 10.11999/JEIT250470 cstr: 32379.14.JEIT250470
Funds: The Civil Aviation Joint Research Fund Project of the National Natural Science Foundation of China (U2433205); The Jiangsu Provincial Basic Research Program Natural Science Foundation Youth Fund Project (BK20230558)
More information
    About the authors:

    YANG Hongyu: Male, Professor and Ph.D. supervisor. His research interests include network and system security, software security, and network security situation awareness.

    LUO Jingchuan: Male, Master's student. His research interests include network and information security.

    CHENG Xiang: Male, Lecturer. His research interests include network and system security, network security situation awareness, and APT attack detection.

    HU Juncheng: Male, Lecturer. His research interests include computer architecture, IoT big data, and artificial intelligence and security.

    Corresponding author:

    YANG Hongyu, yhyxlx@hotmail.com

  • CLC number: TP393

  • Abstract: Existing source code vulnerability detection methods cannot fully extract and effectively fuse code features, so the detection model learns an incomplete representation of the code and detection performance suffers. To address this, this paper proposes a source code vulnerability detection method that integrates code sequences and code property graphs. First, a code sequence representation and a code property graph representation are obtained. Second, the pre-trained model UniXcoder is used to extract semantic features of the code, and a residual gated graph convolutional network is used to extract its graph-structure features. Then, a Multimodal Attention Fusion Network (MAFN) is constructed for feature fusion; by learning the interactions between the semantic and graph-structure features, it produces fused features that are more valuable for the vulnerability detection task. Finally, an interpolation-based prediction classifier is introduced and the model's focus is adjusted, which improves generalization to samples with different characteristics and optimizes detection performance. Experimental results on multiple datasets show that the proposed method performs well, improving accuracy by 0.08%–1.38% and precision by 5.19%–8.15%.
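
    To make the fusion step described above concrete, the sketch below (in PyTorch) shows one way a pooled sequence embedding (e.g., from UniXcoder) and a pooled graph embedding (e.g., from a residual gated graph convolutional network) can be combined with cross-attention before binary classification. The module name, dimensions, pooling, and two-way attention layout are illustrative assumptions, not the paper's MAFN implementation.

        # Illustrative sketch only -- not the authors' MAFN. It assumes each encoder
        # has already produced one pooled vector per function.
        import torch
        import torch.nn as nn

        class FusionSketch(nn.Module):
            def __init__(self, dim=768, heads=8):
                super().__init__()
                # Each modality attends to the other (cross-attention).
                self.seq_to_graph = nn.MultiheadAttention(dim, heads, batch_first=True)
                self.graph_to_seq = nn.MultiheadAttention(dim, heads, batch_first=True)
                self.classifier = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                                nn.Linear(dim, 2))

            def forward(self, seq_feat, graph_feat):
                # seq_feat, graph_feat: (batch, 1, dim) pooled embeddings.
                s, _ = self.seq_to_graph(seq_feat, graph_feat, graph_feat)
                g, _ = self.graph_to_seq(graph_feat, seq_feat, seq_feat)
                fused = torch.cat([s.squeeze(1), g.squeeze(1)], dim=-1)
                return self.classifier(fused)  # logits: non-vulnerable vs. vulnerable

        model = FusionSketch()
        logits = model(torch.randn(4, 1, 768), torch.randn(4, 1, 768))  # shape (4, 2)
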
  • Figure 1  Framework of CSPG

    Figure 2  Structure of MAFN

    Figure 3  Architecture of the vulnerability detection network

    Figure 4  Ablation experiment results on different datasets

    Figure 5  Performance comparison of different pre-trained model combinations
    Table 1  Detection performance comparison on the Devign dataset (%)

    Method         Accuracy   Precision   Recall   F1 score
    Devign         55.02      52.90       56.37    54.58
    CodeBERT       63.49      64.46       43.37    51.85
    Reveal         62.38      59.80       53.71    56.59
    LineVul        62.37      61.55       48.21    54.07
    Vul-LMGNNs     65.19      63.64       53.85    58.33
    SCALE          66.18      61.88       68.69    65.11
    CSPG           67.56      69.65       50.31    58.42

    Table 2  Detection performance comparison on the Reveal dataset (%)

    Method         Accuracy   Precision   Recall   F1 score
    Devign         87.49      36.65       31.55    33.91
    CodeBERT       88.58      47.80       41.04    44.16
    Reveal         85.37      29.90       40.91    33.87
    LineVul        88.87      48.60       56.15    49.10
    Vul-LMGNNs     89.16      44.38       33.65    38.27
    SCALE          90.02      52.32       78.69    62.85
    CSPG           90.10      50.82       43.87    47.09

    Table 3  Detection performance comparison on the SVulD dataset (%)

    Method         Accuracy   Precision   Recall   F1 score
    Devign         73.57      9.72        50.31    16.29
    CodeBERT       80.56      14.33       55.32    22.76
    Reveal         82.58      12.92       40.08    19.31
    LineVul        80.57      15.95       64.45    25.58
    Vul-LMGNNs     86.20      57.83       45.39    50.86
    SCALE          87.63      22.56       57.03    32.33
    CSPG           87.76      65.98       45.50    53.86
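
    As a quick consistency check on Tables 1–3, the F1 column is the harmonic mean of the precision and recall columns. The small helper below (a sketch, not taken from the paper) reproduces, for example, the CSPG row of Table 3.

        # F1 as the harmonic mean of precision and recall (values in %).
        def f1_score(precision, recall):
            return 2 * precision * recall / (precision + recall)

        print(round(f1_score(65.98, 45.50), 2))  # 53.86, matching CSPG in Table 3
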

    Table 4  Comparison of accuracy (%) for different parameter values

             λ1=0    λ1=0.1  λ1=0.2  λ1=0.3  λ1=0.4  λ1=0.5  λ1=0.6  λ1=0.7  λ1=0.8  λ1=0.9  λ1=1.0
    λ2=0     54.87   63.05   65.03   66.69   65.98   66.46   64.32   65.31   65.55   64.48   64.36
    λ2=0.1   54.19   65.66   63.61   62.90   66.53   67.56   64.87   64.04   63.73   66.57   *
    λ2=0.2   55.06   64.68   66.10   65.03   66.57   66.85   65.43   66.42   64.60   *       *
    λ2=0.3   55.06   63.45   65.07   64.72   64.95   65.74   64.48   65.78   *       *       *
    λ2=0.4   54.51   64.20   66.57   66.02   65.43   66.10   65.27   *       *       *       *
    λ2=0.5   54.63   61.19   65.55   63.21   65.55   64.68   *       *       *       *       *
    λ2=0.6   55.22   63.21   64.68   65.43   64.56   *       *       *       *       *       *
    λ2=0.7   54.15   63.96   64.75   64.87   *       *       *       *       *       *       *
    λ2=0.8   54.47   63.29   65.19   *       *       *       *       *       *       *       *
    λ2=0.9   53.56   63.21   *       *       *       *       *       *       *       *       *
    λ2=1.0   53.32   *       *       *       *       *       *       *       *       *       *
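
    The cells left as "*" in Table 4 suggest that the two weights were only evaluated where λ1 + λ2 ≤ 1; that reading, and the evaluate() placeholder below, are assumptions rather than details taken from the paper. The sketch enumerates such a triangular grid and returns the best-scoring pair; the table's maximum, 67.56% at (λ1, λ2) = (0.5, 0.1), matches the CSPG accuracy reported for Devign in Table 1.

        # Hypothetical grid search over (λ1, λ2); evaluate(l1, l2) stands in for a
        # full training/validation run and is not part of the paper's code.
        from itertools import product

        def best_weights(evaluate, step=0.1):
            grid = [round(i * step, 1) for i in range(int(round(1 / step)) + 1)]
            pairs = [(l1, l2) for l1, l2 in product(grid, grid) if l1 + l2 <= 1.0 + 1e-9]
            scores = {pair: evaluate(*pair) for pair in pairs}
            best = max(scores, key=scores.get)
            return best, scores[best]

        # Toy usage (replace the lambda with real validation accuracy); picks (0.5, 0.1).
        print(best_weights(lambda l1, l2: -(l1 - 0.5) ** 2 - (l2 - 0.1) ** 2))
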
  • [1] SU Xiaohong, ZHENG Weining, JIANG Yuan, et al. Research and progress on learning-based source code vulnerability detection[J]. Chinese Journal of Computers, 2024, 47(2): 337–374 (in Chinese). doi: 10.11897/SP.J.1016.2024.00337.
    [2] FU M and TANTITHAMTHAVORN C. LineVul: A transformer-based line-level vulnerability prediction[C]. The 19th International Conference on Mining Software Repositories, Pittsburgh, USA, 2022: 608–620. doi: 10.1145/3524842.3528452.
    [3] LI Zhen, ZOU Deqing, XU Shouhuai, et al. SySeVR: A framework for using deep learning to detect software vulnerabilities[J]. IEEE Transactions on Dependable and Secure Computing, 2022, 19(4): 2244–2258. doi: 10.1109/TDSC.2021.3051525.
    [4] XIA Yuying, SHAO Haijian, and DENG Xing. VulCoBERT: A CodeBERT-based system for source code vulnerability detection[C]. The 2024 International Conference on Generative Artificial Intelligence and Information Security, Guangzhou, China, 2024: 249–252. doi: 10.1145/3665348.3665391.
    [5] DU Gewangzi, CHEN Liwei, WU Tongshuai, et al. CPMSVD: Cross-project multiclass software vulnerability detection via fused deep feature and domain adaptation[C]. The 49th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, South Korea, 2024: 4950–4954. doi: 10.1109/ICASSP48485.2024.10447552.
    [6] SHESTOV A, LEVICHEV R, MUSSABAYEV R, et al. Finetuning large language models for vulnerability detection[J]. IEEE Access, 2025, 13: 38889–38900. doi: 10.1109/ACCESS.2025.3546700.
    [7] DO C X, LUU N T, and NGUYEN P T L. Optimizing software vulnerability detection using RoBERTa and machine learning[J]. Automated Software Engineering, 2024, 31(2): 40. doi: 10.1007/s10515-024-00440-1.
    [8] FENG Zhangyin, GUO Daya, TANG Duyu, et al. CodeBERT: A pre-trained model for programming and natural languages[C]. Findings of the Association for Computational Linguistics: EMNLP 2020, 2020: 1536–1547. doi: 10.18653/v1/2020.findings-emnlp.139.
    [9] ZHOU Yaqin, LIU Shangqing, SIOW J, et al. Devign: Effective vulnerability identification by learning comprehensive program semantics via graph neural networks[C]. The 33rd International Conference on Neural Information Processing Systems, Vancouver, Canada, 2019: 915.
    [10] CHAKRABORTY S, KRISHNA R, DING Yangruibo, et al. Deep learning based vulnerability detection: Are we there yet?[J]. IEEE Transactions on Software Engineering, 2022, 48(9): 3280–3296. doi: 10.1109/TSE.2021.3087402.
    [11] WEN Xincheng, GAO Cuiyun, GAO Shuzheng, et al. SCALE: Constructing structured natural language comment trees for software vulnerability detection[C]. The 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, Vienna, Austria, 2024: 235–247. doi: 10.1145/3650212.3652124.
    [12] LIU Ruitong, WANG Yanbin, XU Haitao, et al. Vul-LMGNNs: Fusing language models and online-distilled graph neural networks for code vulnerability detection[J]. Information Fusion, 2025, 115: 102748. doi: 10.1016/j.inffus.2024.102748.
    [13] TANG Mingwei, TANG Wei, GUI Qingchi, et al. A vulnerability detection algorithm based on Residual Graph Attention Networks for source code imbalance (RGAN)[J]. Expert Systems with Applications, 2024, 238: 122216. doi: 10.1016/j.eswa.2023.122216.
    [14] SHAO Miaomiao, DING Yuxin, CAO Jing, et al. GraphFVD: Property graph-based fine-grained vulnerability detection[J]. Computers & Security, 2025, 151: 104350. doi: 10.1016/j.cose.2025.104350.
    [15] HU Yutao, WANG Suyuan, WU Yueming, et al. Slice-level vulnerability detection and interpretation method based on graph neural network[J]. Journal of Software, 2023, 34(6): 2543–2561 (in Chinese). doi: 10.13328/j.cnki.jos.006849.
    [16] QIU Fangcheng, LIU Zhongxin, HU Xing, et al. Vulnerability detection via multiple-graph-based code representation[J]. IEEE Transactions on Software Engineering, 2024, 50(8): 2178–2199. doi: 10.1109/TSE.2024.3427815.
    [17] ZHANG Guodong, YAO Tianyu, QIN Jiawei, et al. CodeSAGE: A multi-feature fusion vulnerability detection approach using code attribute graphs and attention mechanisms[J]. Journal of Information Security and Applications, 2025, 89: 103973. doi: 10.1016/j.jisa.2025.103973.
    [18] GUO Daya, LU Shuai, DUAN Nan, et al. UniXcoder: Unified cross-modal pre-training for code representation[C]. The 60th Annual Meeting of the Association for Computational Linguistics, Dublin, Ireland, 2022: 7212–7225. doi: 10.18653/v1/2022.acl-long.499.
    [19] BRESSON X and LAURENT T. Residual gated graph convnets[EB/OL]. https://arxiv.org/abs/1711.07553, 2017.
    [20] LIN Yuxiao, MENG Yuxian, SUN Xiaofei, et al. BertGCN: Transductive text classification by combining GNN and BERT[C]. Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, Bangkok, Thailand, 2021: 1456–1462. doi: 10.18653/v1/2021.findings-acl.126.
    [21] NI Chao, YIN Xin, YANG Kaiwen, et al. Distinguishing look-alike innocent and vulnerable code by subtle semantic representation learning and explanation[C]. The 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, San Francisco, USA, 2023: 1611–1622. doi: 10.1145/3611643.3616358.
    [22] GUO Daya, REN Shuo, LU Shuai, et al. GraphCodeBERT: Pre-training code representations with data flow[C]. 9th International Conference on Learning Representations, 2021.
    [23] ZHOU Shuyan, ALON U, AGARWAL S, et al. CodeBERTScore: Evaluating code generation with pretrained models of code[C]. The 2023 Conference on Empirical Methods in Natural Language Processing, Singapore, Singapore, 2023: 13921–13937. doi: 10.18653/v1/2023.emnlp-main.859.
    [24] GUO Daya, XU Canwen, DUAN Nan, et al. LongCoder: A long-range pre-trained language model for code completion[C]. 40th International Conference on Machine Learning, Honolulu, USA, 2023: 12098–12107.
Publication history
  • Received: 2025-05-27
  • Revised: 2025-10-01
  • Published online: 2025-10-23
