A Lightweight Program Anomaly Detection Method for Heterogeneous Platforms
Abstract: Existing anomaly detection methods require pre-learning and are sensitive to noise, which leads to long detection times and high false positive rates. Based on an analysis of existing anomaly detection cases, this paper proposes a new perspective grounded in platform heterogeneity: a program is run on multiple heterogeneous platforms, where a normal program produces the same result on every platform while an anomalous program behaves differently across platforms. On this basis, a lightweight program anomaly detection method for heterogeneous platforms is designed. System state data is collected, and feature engineering is used to construct a multidimensional vector that clearly characterizes anomalies. The data is preprocessed with label encoding and Max-Min normalization. The difference degree between the data of the platforms is then calculated, and a threshold rule is applied to compare the results and identify anomalies. Compared with an unsupervised feature clustering method, the proposed method improves detection accuracy by 13.12% while achieving a low false positive rate and a short detection time.
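The preprocessing step described above (label encoding followed by Max-Min normalization, Algorithm 1 in Table 3) can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the authors' code: the column names `io_process` and `cpu_process` are hypothetical, only these process-name columns are label-encoded (codes assigned in sorted order of the distinct names), and Eq. (1), which is not reproduced in this excerpt, is assumed to be the standard Max-Min formula $x' = (x - \min(x_j)) / (\max(x_j) - \min(x_j))$.

```python
from typing import Dict, List, Sequence, Union

Value = Union[str, float]

# Hypothetical names of the process-name columns that need label encoding
# (Algorithm 1, step 3); numeric columns such as usr or read pass through.
CATEGORICAL = ("io_process", "cpu_process")


def label_encode(column: Sequence[str]) -> List[int]:
    """Map each distinct string to an integer code in sorted order."""
    classes = sorted(set(column))
    code = {name: i for i, name in enumerate(classes)}
    return [code[v] for v in column]


def min_max_normalize(column: Sequence[float]) -> List[float]:
    """Scale a feature to [0, 1] with (x - min) / (max - min)."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant feature: avoid division by zero
        return [0.0] * len(column)
    return [(x - lo) / (hi - lo) for x in column]


def preprocess(dataset: Dict[str, List[Value]]) -> Dict[str, List[float]]:
    """Label-encode categorical columns, then Max-Min normalize every column."""
    result: Dict[str, List[float]] = {}
    for name, column in dataset.items():
        numeric = label_encode(column) if name in CATEGORICAL else column
        result[name] = min_max_normalize(numeric)
    return result


if __name__ == "__main__":
    sample = {
        "usr": [12.0, 30.0, 21.0],
        "cpu_process": ["proc_b", "proc_a", "proc_b"],
    }
    print(preprocess(sample))
    # {'usr': [0.0, 1.0, 0.5], 'cpu_process': [1.0, 0.0, 1.0]}
```

Mapping a constant feature to 0 is also our own choice; the paper's handling of that case is not stated here.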
Key words:
- Program anomaly detection
- Heterogeneous platforms
- System status features
- Diversity
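Table 1 Detection results

| CVE | Number of samples | Result: two crashes | Result: divergence |
|---|---|---|---|
| 2016-6946 | 51 | 8 | 40 |
| 2016-4204 | 78 | 7 | 37 |
| 2016-4119 | 1 | 0 | 1 |
| 2016-1091 | 63 | 6 | 31 |
| 2016-1077 | 1 | 0 | 1 |
| 2016-1046 | 4 | 0 | 4 |
| 2015-5097 | 4 | 0 | 4 |
| 2015-2426 | 14 | 6 | 8 |
| 2015-0090 | 1 | 0 | 1 |
| Total | 217 | 27 | 127 |

Table 2 Features in the dataset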
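
| No. | Feature | Description |
|---|---|---|
| 1 | total cpu usage | CPU utilization of user-space programs (usr); CPU utilization of system-space programs (sys); CPU idle percentage (idl); hardware interrupts (hiq); software interrupts (siq) |
| 2 | dsk/total | Disk read bandwidth (read); disk write bandwidth (write) |
| 3 | net/total | Network device send bandwidth (send); network device receive bandwidth (recv) |
| 4 | system | Interrupt count (int); context switches (csw) |
| 5 | most expensive in memory (memory process) | Process currently using the most memory |
| 6 | most expensive in i/o (i/o process) | Process currently using the most I/O |
| 7 | most expensive in cpu (cpu process) | Process currently using the most CPU |
| 8 | syscall | System call read/write information (syscall) |

Table 3 Data preprocessing algorithm (Algorithm 1)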
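Input: the collected data set
Output: the preprocessed data set
(1) Retrieve all data in the data set.
(2) Examine each feature.
(3) If the feature is i/o process or cpu process, extract all of its feature values.
(4) Create a label encoder and encode all features.
(5) On the encoded data set, compute the minimum $\min(x_j)$ and maximum $\max(x_j)$ of each feature.
(6) For all data, compute the normalized value using Eq. (1).

Table 4 Anomaly detection method based on the differences of multidimensional system state features (Algorithm 2)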
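Input: normalized data sets E1, E2, E3 of platforms 1, 2 and 3; threshold $\varepsilon$ on the differences between difference degrees
Output: anomaly decision
(1) Merge the preprocessed data sets.
(2) Compute the weight of each attribute feature using Eqs. (4) to (6).
(3) Compute the state-feature difference degree between each pair of platforms using Eq. (8).
(4) Compare the difference degrees and compute their differences. Let
(5) diff1 = dif(E1, E2) - dif(E2, E3)
(6) diff2 = dif(E1, E3) - dif(E2, E3)
(7) diff3 = dif(E1, E2) - dif(E1, E3)
(8) Apply the threshold rule to determine whether a platform is anomalous.
(9) If diff1 > 0, diff2 > 0, |diff1| > $\varepsilon$ and |diff2| > $\varepsilon$, platform 1 is judged anomalous.
(10) If diff2 < 0, diff3 > 0, |diff2| > $\varepsilon$ and |diff3| > $\varepsilon$, platform 2 is judged anomalous.
(11) If diff1 < 0, diff3 < 0, |diff1| > $\varepsilon$ and |diff3| > $\varepsilon$, platform 3 is judged anomalous.

Table 5 Anomaly detection time (s) of the two algorithms for different sample sizes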

| Number of samples | 1105 | 2003 | 3046 | 4127 | 5478 |
|---|---|---|---|---|---|
| Proposed algorithm | 0.64 | 0.72 | 0.87 | 1.02 | 1.35 |
| K-means | 1.56 | 3.32 | 5.92 | 8.47 | 11.70 |
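The threshold rule of Algorithm 2 (Table 4) can be sketched in Python as below. Steps (5) to (11) are implemented literally. The weights of Eqs. (4) to (6) and the difference degree of Eq. (8) are not given in this excerpt, so `difference_degree` is a hypothetical stand-in (a weighted mean absolute difference, assuming each platform is summarized by one normalized feature vector) and should be replaced by Eq. (8) in a faithful implementation.

```python
from typing import Optional, Sequence


def difference_degree(a: Sequence[float], b: Sequence[float],
                      weights: Optional[Sequence[float]] = None) -> float:
    """Hypothetical stand-in for Eq. (8): weighted mean absolute difference
    between two platforms' normalized feature vectors (uniform by default)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    w = list(weights) if weights is not None else [1.0] * len(a)
    return sum(wi * abs(x - y) for wi, x, y in zip(w, a, b)) / sum(w)


def detect_anomalous_platform(d12: float, d13: float, d23: float,
                              eps: float) -> Optional[int]:
    """Threshold rule of Algorithm 2, steps (5)-(11).

    d12, d13, d23 are dif(E1,E2), dif(E1,E3), dif(E2,E3).
    Returns 1, 2 or 3 for the platform judged anomalous, or None.
    """
    diff1 = d12 - d23  # step (5)
    diff2 = d13 - d23  # step (6)
    diff3 = d12 - d13  # step (7)
    if diff1 > 0 and diff2 > 0 and abs(diff1) > eps and abs(diff2) > eps:
        return 1  # step (9): platform 1 is far from both others
    if diff2 < 0 and diff3 > 0 and abs(diff2) > eps and abs(diff3) > eps:
        return 2  # step (10): platform 2 is far from both others
    if diff1 < 0 and diff3 < 0 and abs(diff1) > eps and abs(diff3) > eps:
        return 3  # step (11): platform 3 is far from both others
    return None  # no rule fired: no platform judged anomalous


if __name__ == "__main__":
    e1 = [0.9, 0.8, 0.7]  # platform 1 deviates from the other two
    e2 = [0.1, 0.2, 0.1]
    e3 = [0.1, 0.1, 0.2]
    d12 = difference_degree(e1, e2)
    d13 = difference_degree(e1, e3)
    d23 = difference_degree(e2, e3)
    print(detect_anomalous_platform(d12, d13, d23, eps=0.1))  # 1
```

Since $\varepsilon \ge 0$, each pair of conditions such as diff1 > 0 and |diff1| > $\varepsilon$ reduces to diff1 > $\varepsilon$; the literal form is kept here to mirror the steps in Table 4.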