Malicious Domain Name Detection Model Based on CNN and LSTM

Bin ZHANG, Renjie LIAO

Citation: Bin ZHANG, Renjie LIAO. Malicious Domain Name Detection Model Based on CNN and LSTM[J]. Journal of Electronics & Information Technology, 2021, 43(10): 2944-2951. doi: 10.11999/JEIT200679

doi: 10.11999/JEIT200679
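Funds: The Foundation and Frontier Technology Research Project of Henan Province (142300413201), The Open Fund Project of Information Assurance Technology Key Laboratory (KJ-15-109), The Research Project of Information Engineering University (2019f3303)
Details
    Author biographies:

    Bin ZHANG: male, born in 1969, professor and doctoral supervisor; research interest: information system security

    Renjie LIAO: male, born in 1996, master's student; research interest: machine-learning-based malicious domain name detection

    Corresponding author:

    Renjie LIAO lrj2803@163.com

  • CLC number: TN915.08; TP393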

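  • Abstract: To improve the accuracy of malicious domain name detection, this paper proposes a detection model that combines a Convolutional Neural Network (CNN) with a Long Short-Term Memory (LSTM) network. The model detects malicious domain names by extracting sequence features of character combinations of different lengths from the domain name string. First, to avoid the sparse distribution of N-Gram features, a CNN extracts character-combination features from the domain name string and converts them into dense vectors of fixed dimension. Second, to fully exploit the contextual information in the domain name string, an LSTM extracts deep sequence features that relate each character combination to those before and after it. An attention mechanism assigns smaller weights to the output features at the positions of padding characters, which reduces the interference of padding with feature extraction and strengthens the extraction of long-distance sequence features. Finally, the CNN's strength in extracting local features is combined with the LSTM's strength in extracting sequence features to obtain sequence features of character combinations of different lengths for domain name detection. Experiments show that the model achieves higher recall and a higher F1 score than models that use a CNN or an LSTM alone. In particular, its detection accuracy on the matsnu and suppobox families of malicious domain names is 24.8 and 3.77 percentage points higher, respectively, than that of the LSTM-only model.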
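The attention step described in the abstract can be illustrated with a generic attention-pooling sketch in Python. It assumes the common form in which each LSTM output receives a scalar score, the scores are normalized with softmax, and the outputs are summed with those weights. The function name `attention_pool`, the toy hidden states, and the hand-set scores are hypothetical and not taken from the paper, where the scores would be learned.

```python
import math

def attention_pool(hidden_states, scores):
    """Softmax-normalize per-position scores and return (weights, context).

    hidden_states: list of equal-length vectors, one per sequence position.
    scores: one scalar per position (hand-set here; learned in a real model).
    """
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden_states[0])
    # Weighted sum of the hidden states, one output value per dimension.
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return weights, context

# Toy sequence: three real positions followed by two padding positions.
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
scores = [2.0, 1.5, 2.5, -4.0, -4.0]  # hypothetical: low scores at padding

weights, context = attention_pool(hidden_states, scores)
```

When the padded positions receive low scores, their weights are close to zero, so the pooled vector is dominated by the positions that hold real characters.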
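  • Figure 1  Malicious domain name detection model combining CNN and LSTM (LSTM-Parallel CNN ATT-LSTM, L-PCAL)

    Figure 2  LSTM unit with attention mechanism (ATT-LSTM)

    Figure 3  Comparison of ROC curves

    Figure 4  Visualization of attention weights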

    Table 1  Comparison of model detection performance

    Model                Recall (%)  Precision (%)  FPR (%)  F1-Score  Test Time (s)
    Bi-Gram DT           84.37       75.32          22.60    0.7959    1.05
    LSTM                 93.75       93.58          6.57     0.9367    4.46
    Bi-LSTM              90.88       96.49          3.38     0.9360    7.34
    Stack-CNN            86.31       94.01          5.62     0.9001    0.62
    Parallel-CNN         88.39       94.54          5.22     0.9136    0.57
    PCAL                 92.66       95.96          3.98     0.9428    12.16
    L-PCL                92.17       96.38          3.54     0.9423    13.26
    CAL-PCAL             93.02       95.41          3.98     0.9420    11.94
    L-PCAL (this paper)  93.91       95.42          4.61     0.9466    12.67
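The F1-Score column of Table 1 can be cross-checked against the Recall and Precision columns, since F1 is their harmonic mean. The sketch below recomputes it for three rows. The helper name `f1_score` is illustrative, and because the published recall and precision are already rounded, the recomputed values are only expected to agree with the table to about three decimal places.

```python
def f1_score(recall_pct, precision_pct):
    """Harmonic mean of recall and precision, both given in percent."""
    r = recall_pct / 100.0
    p = precision_pct / 100.0
    return 2.0 * p * r / (p + r)

# (recall %, precision %, reported F1) copied from Table 1.
rows = {
    "LSTM": (93.75, 93.58, 0.9367),
    "Parallel-CNN": (88.39, 94.54, 0.9136),
    "L-PCAL": (93.91, 95.42, 0.9466),
}

recomputed = {name: f1_score(r, p) for name, (r, p, _) in rows.items()}
for name, (_, _, reported) in rows.items():
    print(f"{name}: recomputed {recomputed[name]:.4f}, reported {reported:.4f}")
```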

    Table 2  Comparison of TPR and AUC across models

    Model                TPR (%) at FPR 1%  TPR (%) at FPR 2%  TPR (%) at FPR 3%  AUC
    LSTM                 80.12              85.82              89.83              0.9846
    Bi-LSTM              83.11              88.19              90.23              0.9840
    Stack-CNN            72.58              79.82              82.24              0.9613
    Parallel-CNN         77.13              82.04              84.85              0.9671
    L-PCAL (this paper)  85.74              90.40              92.17              0.9867
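Table 2 reports the true positive rate (TPR) at fixed false positive rate (FPR) operating points. One way to obtain such a number from raw classifier scores is to sweep the decision threshold and keep the best TPR whose FPR stays within the budget. This is a sketch under stated assumptions: the function name `tpr_at_fpr` and the toy scores are hypothetical, and the source does not say how its operating points were computed.

```python
def tpr_at_fpr(benign_scores, malicious_scores, max_fpr):
    """Best TPR achievable while keeping FPR <= max_fpr.

    A sample is flagged as malicious when its score is >= the threshold.
    """
    # Every observed score is a candidate threshold; +inf flags nothing.
    candidates = sorted(set(benign_scores) | set(malicious_scores))
    candidates.append(float("inf"))
    best = 0.0
    for threshold in candidates:
        fpr = sum(s >= threshold for s in benign_scores) / len(benign_scores)
        if fpr <= max_fpr:
            tpr = sum(s >= threshold for s in malicious_scores) / len(malicious_scores)
            best = max(best, tpr)
    return best

# Toy scores (hypothetical): higher means "more likely malicious".
benign = [0.1, 0.2, 0.3, 0.9]
malicious = [0.4, 0.8, 0.85, 0.95]

strict = tpr_at_fpr(benign, malicious, max_fpr=0.0)
relaxed = tpr_at_fpr(benign, malicious, max_fpr=0.25)
```

On the toy data, allowing one false positive in four raises the achievable TPR, which is the same trade-off the three FPR columns of Table 2 show.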

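    Table 3  Comparison of detection accuracy on word-concatenation malicious domain names

    Model                Accuracy (%), matsnu  Accuracy (%), suppobox
    LSTM                 0.78                  81.57
    Bi-LSTM              0.78                  74.59
    Stack-CNN            0                     18.39
    Parallel-CNN         0.78                  16.08
    PCAL                 0                     66.86
    L-PCL                37.98                 74.59
    CAL-PCAL             7.75                  74.59
    L-PCAL (this paper)  25.58                 85.34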
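The gains quoted in the abstract for matsnu and suppobox correspond to the differences between the L-PCAL and LSTM rows of Table 3, in percentage points. A short check in Python (the variable names are illustrative):

```python
# Accuracy (%) on (matsnu, suppobox), copied from Table 3.
accuracy = {
    "LSTM": (0.78, 81.57),
    "L-PCAL": (25.58, 85.34),
}

# Difference in percentage points, rounded to the table's precision.
gain_matsnu = round(accuracy["L-PCAL"][0] - accuracy["LSTM"][0], 2)
gain_suppobox = round(accuracy["L-PCAL"][1] - accuracy["LSTM"][1], 2)
print(gain_matsnu, gain_suppobox)
```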
Publication history
  • Received:  2020-08-04
  • Revised:  2020-12-13
  • Published online:  2021-02-06
  • Issue date:  2021-10-18