Software Source Code Vulnerability Detection Based on CNN-GAP Interpretability Model

WANG Jian; KUANG Hongyu; LI Ruilin; SU Yunfei

doi:10.11999/JEIT210412

Volume 44 Issue 7

Jul. 2022

Turn off MathJax

Article Contents

Article Navigation > Journal of Electronics & Information Technology > 2022 > 44(7): 2568-2575

WANG Jian, KUANG Hongyu, LI Ruilin, SU Yunfei. Software Source Code Vulnerability Detection Based on CNN-GAP Interpretability Model[J]. Journal of Electronics & Information Technology, 2022, 44(7): 2568-2575. doi: 10.11999/JEIT210412

Citation:

WANG Jian, KUANG Hongyu, LI Ruilin, SU Yunfei. Software Source Code Vulnerability Detection Based on CNN-GAP Interpretability Model[J]. Journal of Electronics & Information Technology, 2022, 44(7): 2568-2575. doi: 10.11999/JEIT210412

Citation:

PDF( 1751 KB)

Software Source Code Vulnerability Detection Based on CNN-GAP Interpretability Model

doi: 10.11999/JEIT210412 cstr: 32379.14.JEIT210412

College of Electronic Science and Technology, National University of Defense Technology, Changsha 410073, China

Funds: The National Natural Science Foundation of China (61702540), The Hunan Provincial Natural Science Foundation (2018JJ3615)

Received Date: 2021-05-12
Accepted Date: 2021-11-11
Rev Recd Date: 2021-11-11

Available Online: 2021-11-15

Publish Date: 2022-07-25

Abstract

Abstract

Source code vulnerability detection is an important method to ensure the security of software system. In recent years, a variety of deep learning models are applied to source code vulnerability detection, which improves greatly the efficiency of vulnerability detection. However, there are still some problems in source code vulnerability detection based on deep learning, such as too many words outside the database caused by user-defined identifier, inaccurate semantics of embedded word vector, lack of interpretability of neural network model, and so on. A new software source code vulnerability detection method is proposed based on Convolution Neural Networks (CNN) and Global Average Pooling (GAP) interpretability model. Firstly, some user-defined identifiers are normalized in the source code preprocessing, and one hot coding is used for word embedding to alleviate the problem of too many words outside the database. Then, CNN-GAP neural network model is built to identify the functions containing CWE-119 type vulnerabilities. Finally, Class Activation Mapping (CAM) interpretable method is used to output visually the results and identify the codes that may be related to vulnerabilities. Compared with the model proposed by Russell and Vuldeepecker model proposed by Li et al., the experimental results show that CNN-GAP model can achieve quite or even better performance, and has a certain interpretability, which is convenient for researchers to analyze the vulnerability more deeply.
- Source code vulnerability detection,
- Deep learning,
- Interpretability of neural network

FullText(HTML)

References(20)

References

[1]	COUSOT P, COUSOT R, FERET J, et al. The ASTRÉE analyzer[C]. The 14th European Symposium on Programming, Edinburgh, UK, 2005: 21–30.
[2]	HOLZMANN G J. The model checker SPIN[J]. IEEE Transactions on Software Engineering, 1997, 23(5): 279–295. doi: 10.1109/32.588521
[3]	YAMAGUCHI F, LOTTMANN M, and RIECK K. Generalized vulnerability extrapolation using abstract syntax trees[C]. The 28th Annual Computer Security Applications Conference, New York, USA, 2012: 359–368.
[4]	YAMAGUCHI F, GOLDE N, ARP D, et al. Modeling and discovering vulnerabilities with code property graphs[C]. The 2014 IEEE Symposium on Security and Privacy, Berkeley, USA, 2014: 590–604.
[5]	MILLER B P, FREDRIKSEN L, and SO B. An empirical study of the reliability of UNIX utilities[J]. Communications of the ACM, 1990, 33(12): 32–44. doi: 10.1145/96267.96279
[6]	STEPHENS N, GROSEN J, SALLS C, et al. Driller: Augmenting fuzzing through selective symbolic execution[C]. The 2016 23rd Network and Distributed System Security Symposium, San Diego, USA, 2016: 1–16.
[7]	PORTOKALIDIS G, SLOWINSKA A, and BOS H. Argos: An emulator for fingerprinting zero-day attacks for advertised honeypots with automatic signature generation[C]. The 1st ACM SIGOPS/EuroSys European Conference on Computer Systems, New York, USA, 2006: 15–27.
[8]	邹权臣, 张涛, 吴润浦, 等. 从自动化到智能化: 软件漏洞挖掘技术进展[J]. 清华大学学报:自然科学版, 2018, 58(12): 1079–1094. ZOU Quanchen, ZHANG Tao, WU Runpu, et al. From automation to intelligence: Survey of research on vulnerability discovery techniques[J]. Journal of Tsinghua University:Science &Technology, 2018, 58(12): 1079–1094.
[9]	LIN Guanjun, ZHANG Jun, LUO Wei, et al. POSTER: Vulnerability discovery with function representation learning from unlabeled projects[C]. The 2017 ACM SIGSAC Conference on Computer and Communications Security, New York, USA, 2017: 2539–2541.
[10]	LIN Guanjun, ZHANG Jun, LUO Wei, et al. Cross-project transfer representation learning for vulnerable function discovery[J]. IEEE Transactions on Industrial Informatics, 2018, 14(7): 3289–3297. doi: 10.1109/TII.2018.2821768
[11]	LIN Guanjun, ZHANG Jun, LUO Wei, et al. Software vulnerability discovery via learning multi-domain knowledge bases[J]. IEEE Transactions on Dependable and Secure Computing, 2021, 18(5): 2469–2485. doi: 10.1109/TDSC.2019.2954088
[12]	LIU Shigang, LIN Guanjun, HAN Qinglomg, et al. DeepBalance: Deep-learning and fuzzy oversampling for vulnerability detection[J]. IEEE Transactions on Fuzzy Systems, 2020, 28(7): 1329–1343.
[13]	LIU Shigang, LIN Guanjun, QU Lizhen, et al. CD-VulD: Cross-domain vulnerability discovery based on deep domain adaptation[J]. IEEE Transactions on Dependable and Secure Computing, 2022, 19(1): 438–451. doi: 10.1109/TDSC.2020.2984505
[14]	LI Zhen, ZOU Deqing, XU Shouhuai, et al. VulDeePecker: A deep learning-based system for vulnerability detection[C]. The 25th Annual Network and Distributed System Security Symposium, San Diego, USA, 2018.
[15]	ZOU Deqing, WANG Sujuan, XU Shouhuai, et al. μVulDeePecker: A deep learning-based system for multiclass vulnerability detection[J]. IEEE Transactions on Dependable and Secure Computing, 2021, 18(5): 2224–2236. doi: 10.1109/TDSC.2019.2942930
[16]	LI Zhen, ZOU Deqing, XU Shouhuai, et al. VulDeeLocator: A deep learning-based fine-grained vulnerability detector[J]. IEEE Transactions on Dependable and Secure Computing, 2022, 19(4): 2821–2837. doi: 10.1109/TDSC.2021.3076142
[17]	RUSSELL R, KIM L, HAMILTON L, et al. Automated vulnerability detection in source code using deep representation learning[C]. The 17th IEEE International Conference on Machine Learning and Applications, Orlando, USA, 2018: 757–762.
[18]	ZHOU Yaqin, LIU Shangqing, SIOW J K, et al. Devign: Effective vulnerability identification by learning comprehensive program semantics via graph neural networks[C]. The 33rd International Conference on Neural Information Processing Systems, Red Hook, USA, 2019: 10197–10207.
[19]	段旭, 吴敬征, 罗天悦, 等. 基于代码属性图及注意力双向LSTM的漏洞挖掘方法[J]. 软件学报, 2020, 31(11): 3404–3420. DUAN Xu, WU Jingzheng, LUO Tianyue, et al. Vulnerability mining method based on code property graph and attention BiLSTM[J]. Journal of Software, 2020, 31(11): 3404–3420.
[20]	ZHOU Bolei, KHOSLA A, LAPEDRIZA A, et al. Learning deep features for discriminative localization[C]. 2016 IEEE conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 2921–2929.