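Multi-modal Joint Distillation Optimization for Source Code Vulnerability Detection

ZHANG Xuejun, ZHANG Yifan, LIU Cancan, JIA Xiaohong, CHEN Zhuo, ZHANG Lei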

Citation: ZHANG Xuejun, ZHANG Yifan, LIU Cancan, JIA Xiaohong, CHEN Zhuo, ZHANG Lei. Multi-modal Joint Distillation Optimization for Source Code Vulnerability Detection[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT250453

doi: 10.11999/JEIT250453 cstr: 32379.14.JEIT250453
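Funds: The Key Research and Development Program of Gansu Province (25YFFA089), The Industry Support Fund of the Education Department of Gansu Province (2022CYZC-38), The National Natural Science Foundation of China (62366029)
Details
    Author biographies:

    ZHANG Xuejun: male, PhD, professor, doctoral supervisor. Research interests: location privacy protection, vulnerability mining, edge computing

    ZHANG Yifan: male, master's student. Research interests: vulnerability mining, deep learning

    LIU Cancan: female, master's student. Research interests: vulnerability mining, deep learning

    JIA Xiaohong: male, PhD, lecturer. Research interests: pattern recognition and image processing

    CHEN Zhuo: male, master's student. Research interests: vulnerability mining, deep learning

    ZHANG Lei: female, master's degree, engineer. Research interests: network and information security

    Corresponding author:

    ZHANG Xuejun xuejunzhang@mail.lzjtu.cn

  • CLC number: TN915.08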

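  • Abstract: Existing deep-learning-based source code vulnerability detection methods underuse the available features, tend to learn spurious features, and do little to optimize cross-modal consistency. To address these problems, this paper proposes mVulD-DO, a vulnerability detection framework that combines deep distillation with improved multi-modal consistency. Using program dependence graphs and code slicing, the method extracts several semantic modalities from source code, namely function names, variable names, Token_type, and local code snippets, to characterize code semantics more precisely, and it builds a structural modality from a heterogeneous adjacency matrix and a graph attention network. Multi-layer feature distillation layers then distill each semantic modality in depth to extract its dominant feature peaks, a BLSTM captures sequential dependencies, and adaptive dynamic Sinkhorn regularization aligns the semantic and structural feature distributions globally. Finally, the aligned modalities are fused by a global attention layer, and a softmax classifier performs binary classification on the fused features. Extensive comparison and ablation experiments show that mVulD-DO reaches 87.11% accuracy, 86.37% F1-score, and 83.59% Recall, outperforming mainstream methods and confirming the synergy and generalization ability of multi-modal representation, deep distillation, and joint optimization in vulnerability detection.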
  • Fig. 1  Overall framework of mVulD-DO

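    Algorithm 1  PDG-C subgraph slicing procedure

     Input: G=(V,E), key node set K
     Output: subgraph G_S
      1. Initialize the node set S←K
      2. Repeat the following steps until S no longer changes:
       a. Forward slicing: for each v∈S, traverse all (v,u)∈E and add u to the temporary set New
       b. Backward slicing: for each v∈S, traverse all (u,v)∈E and add u to New
       c. Merge: S←S∪New
       d. Clear New
      3. Subgraph extraction: G_S←G[S]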
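The slicing listing above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes the graph is given as a plain list of directed `(u, v)` edges, and the function name `slice_subgraph` and the toy node labels are hypothetical. The full paper may restrict edge types or slicing depth in ways this page does not show.

```python
from collections import defaultdict


def slice_subgraph(edges, key_nodes):
    """Return (S, induced_edges) for the slice grown from key_nodes.

    edges: iterable of directed (u, v) pairs, the edge set E.
    key_nodes: iterable of nodes, the key node set K.
    """
    edges = list(edges)
    successors = defaultdict(set)    # v -> {u : (v, u) in E}
    predecessors = defaultdict(set)  # v -> {u : (u, v) in E}
    for u, v in edges:
        successors[u].add(v)
        predecessors[v].add(u)

    selected = set(key_nodes)        # step 1: S <- K
    while True:                      # step 2: repeat until S stops changing
        new = set()                  # 2d: New starts empty on every pass
        for v in selected:
            new |= successors.get(v, set())    # 2a: forward slicing
            new |= predecessors.get(v, set())  # 2b: backward slicing
        if new <= selected:          # nothing new was reached: fixed point
            break
        selected |= new              # 2c: merge

    # step 3: the induced subgraph G[S] keeps edges with both endpoints in S
    induced = [(u, v) for u, v in edges if u in selected and v in selected]
    return selected, induced


# Toy graph (hypothetical): x -> y is not connected to the key node b.
edges = [("a", "b"), ("b", "c"), ("d", "c"), ("x", "y")]
nodes, sub_edges = slice_subgraph(edges, {"b"})
```

Because the listing follows edges in both directions until a fixed point, the slice ends up containing every node connected to K when edge direction is ignored; in the toy graph that is `a`, `b`, `c`, and `d`, while `x` and `y` are left out.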

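    Table 1  Comparison with baseline methods (%)

    Dataset            Method              Recall   FPR     F1      ACC
    CVE-fixes + SARD   mVulD-DO            83.59    10.98   86.37   87.11
    CVE-fixes + SARD   VulDeePecker        55.41    44.68   58.18   55.41
    CVE-fixes + SARD   SlicedLocator       59.75    16.59   78.26   82.04
    CVE-fixes + SARD   SySeVR              77.39    22.61   79.33   77.39
    CVE-fixes + SARD   ReGVD               77.33    8.06    82.01   85.84
    CVE-fixes + SARD   Devign              58.10    35.00   62.60   63.11
    CVE-fixes + SARD   VulLLM-CL-7b        67.51    49.96   56.71   57.28
    CVE-fixes + SARD   Codebert-FT         57.71    2.01    71.91   81.17
    CVE-fixes + SARD   UniXcoder-FT        72.76    5.51    80.65   85.42
    CVE-fixes + SARD   GraphCodeBERT-FT    23.64    8.68    34.83   63.06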
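For readers comparing the columns, the sketch below computes Recall, FPR, P (precision), F1, and ACC from a binary confusion matrix using the standard textbook definitions. The function name `binary_metrics` and the counts in the usage line are hypothetical. This page does not state how the paper averages its metrics, so the tabulated values need not follow exactly from these binary formulas.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return Recall, FPR, P, F1 and ACC as percentages.

    tp/fp/tn/fn are confusion-matrix counts with "vulnerable" as the
    positive class (an assumption made for this sketch).
    """
    def pct(num, den):
        # Guard against empty denominators, e.g. no positive samples.
        return 100.0 * num / den if den else 0.0

    recall = pct(tp, tp + fn)               # share of vulnerable samples found
    fpr = pct(fp, fp + tn)                  # share of clean samples flagged
    precision = pct(tp, tp + fp)            # share of flagged samples that are vulnerable
    accuracy = pct(tp + tn, tp + fp + tn + fn)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)   # harmonic mean of P and Recall
    return {"Recall": recall, "FPR": fpr, "P": precision, "F1": f1, "ACC": accuracy}


# Hypothetical counts, not taken from the paper.
scores = binary_metrics(tp=80, fp=10, tn=90, fn=20)
```

A low FPR alongside a low Recall, as in some baseline rows, indicates a detector that rarely flags clean code but also misses many vulnerable samples, which is why the table reports both.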

    Table 2  Ablation results for different representations (%)

    Method                         Recall   FPR     F1      ACC     P
    mVulD-DO (var+type+tokens)     87.99    16.75   85.07   85.25   79.28
    mVulD-DO (func+type+tokens)    87.60    16.43   85.07   85.27   79.51
    mVulD-DO (func+var+tokens)     83.78    17.06   83.02   83.30   78.15
    mVulD-DO (type+tokens)         91.85    25.18   81.96   82.00   72.65
    mVulD-DO (var+tokens)          90.92    19.67   84.69   84.79   77.09
    mVulD-DO (func+tokens)         82.38    17.85   81.95   82.24   77.06
    mVulD-DO (tokens)              89.19    26.09   80.28   80.42   71.23
    mVulD-DO (undis)               68.49    12.19   77.26   77.27   87.09
    mVulD-DO                       83.59    10.98   86.37   87.11   83.93

    Table 3  Comparison of joint learning on different datasets (%)

    Dataset   Variant (with or without joint optimization)   ACC     Recall   FPR     P       F1
    Call      mVulD-DO (unsinkhorn)                          85.66   89.88    17.47   79.82   85.58
    Call      mVulD-DO                                       87.12   90.46    15.47   81.93   87.03
    Array     mVulD-DO (unsinkhorn)                          90.70   94.03    11.60   84.88   90.52
    Array     mVulD-DO                                       92.03   96.65    11.17   85.70   91.90
    Ptr       mVulD-DO (unsinkhorn)                          70.90   62.53    21.71   71.76   70.45
    Ptr       mVulD-DO                                       71.22   63.43    21.91   71.87   70.81
    OPS       mVulD-DO (unsinkhorn)                          88.37   85.95    10.42   80.43   87.12
    OPS       mVulD-DO                                       92.17   91.83    7.65    85.67   91.34
    Total     mVulD-DO (unsinkhorn)                          82.54   77.04    13.45   80.65   81.98
    Total     mVulD-DO                                       87.11   83.59    10.98   83.93   86.37
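The (unsinkhorn) rows remove the Sinkhorn regularization named in the abstract. This page does not give its formulation, so the sketch below shows only the textbook Sinkhorn-Knopp iteration for entropy-regularized optimal transport between two sets of feature vectors. The uniform weights, squared Euclidean cost, fixed `epsilon`, and the function names `sinkhorn_plan` and `sinkhorn_cost` are all assumptions made for illustration; the paper's adaptive dynamic variant is not reproduced here.

```python
import math


def sinkhorn_plan(cost, epsilon=0.1, n_iters=200):
    """Entropy-regularized transport plan between two uniform distributions.

    cost: n x m list of lists of pairwise costs.
    Returns an n x m plan whose rows sum to about 1/n and columns to 1/m.
    Very large cost/epsilon ratios can underflow exp(); a log-domain
    version would be more robust than this plain sketch.
    """
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n   # uniform mass on the first feature set (assumption)
    b = [1.0 / m] * m   # uniform mass on the second feature set (assumption)
    kernel = [[math.exp(-c / epsilon) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals a and b.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def sinkhorn_cost(xs, ys, epsilon=0.1, n_iters=200):
    """Transport cost <P, C> using squared Euclidean distance as the cost."""
    cost = [[sum((p - q) ** 2 for p, q in zip(x, y)) for y in ys] for x in xs]
    plan = sinkhorn_plan(cost, epsilon, n_iters)
    return sum(plan[i][j] * cost[i][j]
               for i in range(len(xs)) for j in range(len(ys)))


# Hypothetical 2-D features standing in for semantic and structural embeddings.
semantic = [[0.0, 0.0], [1.0, 0.0]]
aligned = sinkhorn_cost(semantic, [[0.0, 0.0], [1.0, 0.0]])
shifted = sinkhorn_cost(semantic, [[3.0, 0.0], [4.0, 0.0]])
```

Used as a training penalty, a term of this kind is small when the two feature distributions already overlap and grows as they drift apart, which is the general sense in which a Sinkhorn regularizer aligns modalities.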
Publication history
  • Received:  2025-05-26
  • Revised:  2025-08-20
  • Published online:  2025-09-01
