YANG Hongyu, LUO Jingchuan, CHENG Xiang, HU Juncheng. Source Code Vulnerability Detection Method Integrating Code Sequences and Property Graphs[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT250470

Source Code Vulnerability Detection Method Integrating Code Sequences and Property Graphs

doi: 10.11999/JEIT250470 cstr: 32379.14.JEIT250470
Funds:  The Civil Aviation Joint Research Fund of the National Natural Science Foundation of China (U2433205); the Natural Science Foundation of Jiangsu Province, Basic Research Program Youth Fund (BK20230558)
  • Received Date: 2025-05-27
  • Rev Recd Date: 2025-10-01
  • Available Online: 2025-10-23
  •   Objective  Code vulnerabilities create opportunities for hacker intrusions, and if they are not promptly identified and remedied, they pose serious threats to cybersecurity. Deep learning–based vulnerability detection methods leverage large collections of source code to learn secure programming patterns and vulnerability characteristics, enabling the automated identification of potential security risks and enhancing code security. However, most existing deep learning approaches rely on a single network architecture, extracting features from only one perspective, which constrains their ability to comprehensively capture multi-dimensional code characteristics. Some studies have attempted to address this by extracting features from multiple dimensions, yet the adopted feature fusion strategies are relatively simplistic, typically limited to feature concatenation or weighted combination. Such strategies fail to capture interdependencies among feature dimensions, thereby reducing the effectiveness of feature fusion. To address these challenges, this study proposes a source code vulnerability detection method integrating code sequences and property graphs. By optimizing both feature fusion and vulnerability detection processes, the proposed method effectively enhances the accuracy and robustness of vulnerability detection.  Methods  The proposed method consists of four components: feature representation, feature extraction, feature fusion, and vulnerability detection (Fig. 1). First, vector representations of the code sequence and the Code Property Graph (CPG) are obtained. Using word embedding and node embedding techniques, the code sequence and graph nodes are mapped into fixed-dimensional vectors, which serve as inputs for subsequent feature extraction. Next, the pre-trained UniXcoder model is employed to capture contextual information and extract semantic features from the code. 
In parallel, a Residual Gated Graph Convolution Network (RGGCN) is applied to the CPG to capture complex structural information, thereby extracting graph structural features. To integrate these complementary representations, a Multimodal Attention Fusion Network (MAFN) is designed to model the interactions between semantic and structural features. This network generates informative fused features for the vulnerability detection task. Finally, a Multilayer Perceptron (MLP) performs classification on the semantic features, structural features, and fused features. An interpolated prediction classifier is then applied to optimize the detection process by balancing multiple prediction outcomes. By adaptively adjusting the model’s focus according to the characteristics of different code samples, the classifier enables the detection model to concentrate on the most critical features, thereby improving overall detection accuracy.  Results and Discussions  To validate the effectiveness of the proposed method, comparative experiments were conducted against baseline approaches on the Devign, Reveal, and SVulD datasets. The experimental results are summarized in Tables 1–3. On the Devign dataset, the proposed method achieved an accuracy improvement of 1.38% over SCALE and a precision improvement of 5.19% over CodeBERT. On the Reveal dataset, it improved accuracy by 0.08% compared to SCALE, with precision being closest to that of SCALE. On the SVulD dataset, the method achieved an accuracy improvement of 0.13% over SCALE and a precision gain of 8.15% over Vul-LMGNNs. Collectively, these results demonstrate that the proposed method consistently yields higher accuracy and precision. This improvement can be attributed to its effective integration of semantic information extracted by UniXcoder and structural information captured by RGGCN.
By contrast, CodeBERT and LineVul effectively learn code semantics but exhibit insufficient understanding of complex structural patterns, resulting in weaker detection performance. Devign and Reveal employ gated graph neural networks to capture structural information from code graphs but lack the ability to model semantic information contained in code sequences, which constrains their performance. Vul-LMGNNs attempt to improve detection performance by jointly learning semantic and structural features; however, their feature fusion strategy relies on simple concatenation. This approach fails to account for correlations between features, severely limiting the expressive power of the fused representation and reducing detection performance. In contrast, the proposed method fully leverages and integrates semantic and structural features through multimodal attention fusion. By modeling feature interactions rather than treating them independently, it achieves superior accuracy and precision, enabling more effective vulnerability detection.  Conclusions  Fully integrating code features across multiple dimensions can significantly enhance vulnerability detection performance. Compared with baseline methods, the proposed approach enables deeper modeling of interactions among code features, allowing the detection model to develop a more comprehensive understanding of code characteristics and thereby achieve superior detection accuracy and precision.
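The residual gated graph convolution step described in the Methods (after Bresson and Laurent's residual gated graph ConvNets [19]) can be sketched in miniature. The sketch below uses scalar stand-ins for the layer's weight matrices and scalar node features; the function name and weights are illustrative assumptions, not the paper's implementation:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def rggcn_layer(h, edges, u, v, a, b):
    """One residual gated graph convolution step on scalar node features.

    h: list of node features; edges: list of (src, dst) pairs from the CPG.
    u, v, a, b: scalar stand-ins for the layer's weight matrices.
    Each node keeps a residual copy of itself and adds a ReLU-activated
    sum of gated messages from its in-neighbors.
    """
    out = []
    for i, hi in enumerate(h):
        agg = 0.0
        for (j, k) in edges:
            if k == i:                          # message j -> i
                gate = sigmoid(a * hi + b * h[j])  # edge gate eta_ij
                agg += gate * v * h[j]
        out.append(hi + max(0.0, u * hi + agg))    # residual + ReLU
    return out
```

The edge gate lets the layer damp messages along uninformative CPG edges, which is why gated variants tend to cope better with the heterogeneous edge types of a code property graph than plain graph convolutions.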
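The fusion and interpolated-prediction ideas can likewise be sketched with toy scalars: a cross-attention-style blend of the semantic and structural vectors, and a confidence-weighted interpolation over the three classifier heads. The elementwise scoring and the confidence heuristic below are simplifying assumptions for illustration, not the MAFN or classifier as published:

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def cross_attention_fuse(sem, struct):
    """Toy cross-attention: the semantic vector queries the structural
    one, scoring each dimension by their elementwise product, then adds
    the attended structural signal back residually."""
    weights = softmax([s * t for s, t in zip(sem, struct)])
    attended = [w * t for w, t in zip(weights, struct)]
    return [s + a for s, a in zip(sem, attended)]

def interpolated_predict(p_sem, p_struct, p_fused):
    """Blend the three heads' vulnerability probabilities, weighting the
    more confident heads (those farther from 0.5) more heavily."""
    heads = (p_sem, p_struct, p_fused)
    conf = softmax([abs(p - 0.5) for p in heads])
    return sum(w * p for w, p in zip(conf, heads))
```

Modeling the interaction (here via the attention weights) rather than concatenating the two feature vectors is what distinguishes this style of fusion from the simple concatenation the text criticizes in Vul-LMGNNs.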
  • [1]
    SU Xiaohong, ZHENG Weining, JIANG Yuan, et al. Research and progress on learning-based source code vulnerability detection[J]. Chinese Journal of Computers, 2024, 47(2): 337–374. doi: 10.11897/SP.J.1016.2024.00337.
    [2]
    FU M and TANTITHAMTHAVORN C. LineVul: A transformer-based line-level vulnerability prediction[C]. The 19th International Conference on Mining Software Repositories, Pittsburgh, USA, 2022: 608–620. doi: 10.1145/3524842.3528452.
    [3]
    LI Zhen, ZOU Deqing, XU Shouhuai, et al. SySeVR: A framework for using deep learning to detect software vulnerabilities[J]. IEEE Transactions on Dependable and Secure Computing, 2022, 19(4): 2244–2258. doi: 10.1109/TDSC.2021.3051525.
    [4]
    XIA Yuying, SHAO Haijian, and DENG Xing. VulCoBERT: A CodeBERT-based system for source code vulnerability detection[C]. The 2024 International Conference on Generative Artificial Intelligence and Information Security, Guangzhou, China, 2024: 249–252. doi: 10.1145/3665348.3665391.
    [5]
    DU Gewangzi, CHEN Liwei, WU Tongshuai, et al. CPMSVD: Cross-project multiclass software vulnerability detection via fused deep feature and domain adaptation[C]. The 49th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, South Korea, 2024: 4950–4954. doi: 10.1109/ICASSP48485.2024.10447552.
    [6]
    SHESTOV A, LEVICHEV R, MUSSABAYEV R, et al. Finetuning large language models for vulnerability detection[J]. IEEE Access, 2025, 13: 38889–38900. doi: 10.1109/ACCESS.2025.3546700.
    [7]
    DO C X, LUU N T, and NGUYEN P T L. Optimizing software vulnerability detection using RoBERTa and machine learning[J]. Automated Software Engineering, 2024, 31(2): 40. doi: 10.1007/s10515-024-00440-1.
    [8]
    FENG Zhangyin, GUO Daya, TANG Duyu, et al. CodeBERT: A pre-trained model for programming and natural languages[C]. Findings of the Association for Computational Linguistics: EMNLP 2020, 2020: 1536–1547. doi: 10.18653/v1/2020.findings-emnlp.139.
    [9]
    ZHOU Yaqin, LIU Shangqing, SIOW J, et al. Devign: Effective vulnerability identification by learning comprehensive program semantics via graph neural networks[C]. The 33rd International Conference on Neural Information Processing Systems, Vancouver, Canada, 2019: 915.
    [10]
    CHAKRABORTY S, KRISHNA R, DING Yangruibo, et al. Deep learning based vulnerability detection: Are we there yet?[J]. IEEE Transactions on Software Engineering, 2022, 48(9): 3280–3296. doi: 10.1109/TSE.2021.3087402.
    [11]
    WEN Xincheng, GAO Cuiyun, GAO Shuzheng, et al. SCALE: Constructing structured natural language comment trees for software vulnerability detection[C]. The 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, Vienna, Austria, 2024: 235–247. doi: 10.1145/3650212.3652124.
    [12]
    LIU Ruitong, WANG Yanbin, XU Haitao, et al. Vul-LMGNNs: Fusing language models and online-distilled graph neural networks for code vulnerability detection[J]. Information Fusion, 2025, 115: 102748. doi: 10.1016/j.inffus.2024.102748.
    [13]
    TANG Mingwei, TANG Wei, GUI Qingchi, et al. A vulnerability detection algorithm based on Residual Graph Attention Networks for source code imbalance (RGAN)[J]. Expert Systems with Applications, 2024, 238: 122216. doi: 10.1016/j.eswa.2023.122216.
    [14]
    SHAO Miaomiao, DING Yuxin, CAO Jing, et al. GraphFVD: Property graph-based fine-grained vulnerability detection[J]. Computers & Security, 2025, 151: 104350. doi: 10.1016/j.cose.2025.104350.
    [15]
    HU Yutao, WANG Suyuan, WU Yueming, et al. Slice-level vulnerability detection and interpretation method based on graph neural network[J]. Journal of Software, 2023, 34(6): 2543–2561. doi: 10.13328/j.cnki.jos.006849.
    [16]
    QIU Fangcheng, LIU Zhongxin, HU Xing, et al. Vulnerability detection via multiple-graph-based code representation[J]. IEEE Transactions on Software Engineering, 2024, 50(8): 2178–2199. doi: 10.1109/tse.2024.3427815.
    [17]
    ZHANG Guodong, YAO Tianyu, QIN Jiawei, et al. CodeSAGE: A multi-feature fusion vulnerability detection approach using code attribute graphs and attention mechanisms[J]. Journal of Information Security and Applications, 2025, 89: 103973. doi: 10.1016/j.jisa.2025.103973.
    [18]
    GUO Daya, LU Shuai, DUAN Nan, et al. UniXcoder: Unified cross-modal pre-training for code representation[C]. The 60th Annual Meeting of the Association for Computational Linguistics, Dublin, Ireland, 2022: 7212–7225. doi: 10.18653/v1/2022.acl-long.499.
    [19]
    BRESSON X and LAURENT T. Residual gated graph convnets[EB/OL]. https://arxiv.org/abs/1711.07553, 2017.
    [20]
    LIN Yuxiao, MENG Yuxian, SUN Xiaofei, et al. BertGCN: Transductive text classification by combining GNN and BERT[C]. Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, Bangkok, Thailand, 2021: 1456–1462. doi: 10.18653/v1/2021.findings-acl.126.
    [21]
    NI Chao, YIN Xin, YANG Kaiwen, et al. Distinguishing look-alike innocent and vulnerable code by subtle semantic representation learning and explanation[C]. The 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, San Francisco, USA, 2023: 1611–1622. doi: 10.1145/3611643.3616358.
    [22]
    GUO Daya, REN Shuo, LU Shuai, et al. GraphCodeBERT: Pre-training code representations with data flow[C]. 9th International Conference on Learning Representations, 2021.
    [23]
    ZHOU Shuyan, ALON U, AGARWAL S, et al. CodeBERTScore: Evaluating code generation with pretrained models of code[C]. The 2023 Conference on Empirical Methods in Natural Language Processing, Singapore, Singapore, 2023: 13921–13937. doi: 10.18653/v1/2023.emnlp-main.859.
    [24]
    GUO Daya, XU Canwen, DUAN Nan, et al. LongCoder: A long-range pre-trained language model for code completion[C]. 40th International Conference on Machine Learning, Honolulu, USA, 2023: 12098–12107.
    Figures(5)  / Tables(4)

    Article Metrics

    Article views (29) PDF downloads(2) Cited by()
    Proportional views
    Related

    /

    DownLoad:  Full-Size Img  PowerPoint
    Return
    Return