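A Cybersecurity Entity Recognition Approach Based on Character Representation Learning and Temporal Boundary Diffusion

HU Ze, LI Wenjun, YANG Hongyu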

Citation: HU Ze, LI Wenjun, YANG Hongyu. A Cybersecurity Entity Recognition Approach Based on Character Representation Learning and Temporal Boundary Diffusion[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT240953

doi: 10.11999/JEIT240953
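Funds: The National Natural Science Foundation of China (62201576, U1833107), The Supporting Fund of the National Natural Science Foundation of China (3122023PT10)
    Author biographies:

    HU Ze: male, lecturer. Research interests: artificial intelligence, natural language processing, and network information security

    LI Wenjun: female, master's student. Research interests: artificial intelligence security and natural language processing

    YANG Hongyu: male, professor and doctoral supervisor. Research interests: network and system security, software security, and cybersecurity situational awareness

    Corresponding author:

    YANG Hongyu yhyxlx@hotmail.com

  • CLC number: TN915.08; TP391.1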

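  • Abstract: Cybersecurity entity recognition underpins threat information extraction and knowledge graph construction, and is therefore critical for discovering and responding to cyber threats. Mainstream named entity recognition (NER) methods generalize poorly to the cybersecurity domain and struggle to determine the boundaries of cybersecurity entities clearly. To address these problems, this paper proposes a cybersecurity entity recognition approach based on character representation learning and temporal boundary diffusion. First, the NER task is decomposed into two subtasks, entity boundary detection and entity classification, which are handled separately. Second, for entity boundary detection, a question-answering-based method encodes the predefined questions together with the data, a dilated convolutional residual character network (DCR-CharNet) extracts character-level features, and a temporal boundary diffusion network (TBDN) determines the entity boundaries. Third, for entity classification, the same question-answering method is used and a classifier is trained independently to determine entity types. Finally, the output of the boundary detection task is fed into the entity classification task to determine the type of each entity. To verify its effectiveness, the approach is evaluated on the cyber threat intelligence dataset DNRTI. The experimental results show that better boundary detection effectively improves NER performance. On the cybersecurity entity recognition task, the approach has a low resource overhead and outperforms recently proposed baselines; compared with methods from the last two years, its F1 score is higher by 0.40% to 1.65%.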
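The two-stage decomposition described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the question text, the example sentence, and the toy detector and classifier are all hypothetical stand-ins for the trained boundary detection and classification models. Only the overall flow (question plus sentence as input, spans first, types second) follows the abstract, and the labels `HackOrg` and `Tool` are taken from the DNRTI label set.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def build_qa_input(question: str, tokens: List[str]) -> List[str]:
    """Concatenate a predefined question with the sentence, QA-style."""
    return ["[CLS]", *question.split(), "[SEP]", *tokens, "[SEP]"]


def recognize(
    tokens: List[str],
    detect_boundaries: Callable[[List[str]], List[Span]],
    classify_span: Callable[[List[str], Span], str],
) -> List[Tuple[str, str]]:
    """Stage 1 finds entity spans; stage 2 labels each span independently."""
    spans = detect_boundaries(tokens)
    return [(" ".join(tokens[s:e]), classify_span(tokens, (s, e))) for s, e in spans]


# Toy stand-ins for the two trained models (hypothetical, illustration only).
def toy_detector(tokens: List[str]) -> List[Span]:
    # Treat every maximal run of capitalised tokens as one entity span.
    spans, start = [], None
    for i, tok in enumerate(tokens + [""]):  # "" is a sentinel that closes a final run
        if tok[:1].isupper():
            if start is None:
                start = i
        elif start is not None:
            spans.append((start, i))
            start = None
    return spans


def toy_classifier(tokens: List[str], span: Span) -> str:
    # A lookup table replaces the independently trained classifier.
    lexicon = {"APT28": "HackOrg", "Mimikatz": "Tool"}
    return lexicon.get(" ".join(tokens[span[0]:span[1]]), "Unknown")


sentence = "APT28 used Mimikatz in the attack".split()
print(build_qa_input("Extract entity mentions", sentence))
print(recognize(sentence, toy_detector, toy_classifier))
```

Because the classifier only sees spans handed over by the detector, any boundary error propagates to the final result, which is consistent with the abstract's observation that better boundary detection improves overall NER performance.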
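  • Fig. 1  Framework of the cybersecurity entity recognition approach based on character representation learning and temporal boundary diffusion

    Fig. 2  Schematic of the question-answering method applied in the boundary detection task

    Fig. 3  Network structure of DCR-CharNet

    Fig. 4  Structure of the dilated residual block

    Fig. 5  Network structure of TBDN

    Fig. 6  Learning rate and F1 score curves over the training epochs

    Fig. 7  F1 scores for different numbers of dilated residual blocks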

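    Table 1  Hyperparameter settings

    Parameter                            Value
    Maximum learning rate                1e-4
    Optimizer                            AdamW
    Word embedding dimension             768
    Maximum sentence length              256
    Number of dilated residual blocks    3
    Training epochs                      200
    Batch size                           32
    Loss function                        CE Loss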
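Table 1 sets the number of dilated residual blocks to 3. The pure-Python sketch below gives a rough idea of why stacking such blocks helps character-level feature extraction: each block widens the context that a position can see. The kernel size of 3, the dilation rates 1, 2, 4, the ReLU activation, and the block layout are assumptions made for illustration; the page does not state these details, and this is not the DCR-CharNet architecture itself.

```python
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """Same-length 1D convolution with zero padding (odd kernel size assumed)."""
    half = (len(kernel) // 2) * dilation
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i - half + j * dilation  # taps are spaced `dilation` apart
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def dilated_residual_block(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """Add ReLU(conv(x)) back onto the input (residual connection)."""
    y = dilated_conv1d(x, kernel, dilation)
    return [xi + max(0.0, yi) for xi, yi in zip(x, y)]


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Number of input positions that can influence one output position."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Impulse input: the non-zero outputs show how far information has spread.
signal = [0.0] * 15
signal[7] = 1.0
for d in (1, 2, 4):  # assumed dilation rates for the three blocks
    signal = dilated_residual_block(signal, [1.0, 1.0, 1.0], d)

spread = sum(1 for v in signal if v != 0.0)
print(spread, receptive_field(3, [1, 2, 4]))
```

Under these assumptions, three blocks cover 15 neighbouring characters, whereas three undilated layers with the same kernel would cover only 7.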

    Table 2  Number of instances of each label in the dataset

    Label       Count
    HackOrg     5 645
    Tool        4 784
    Aera        3 447
    SamFile     2 400
    Time        2 659
    SecTeam     1 921
    OffAct      2 669
    Org         2 489
    Idus        2 136
    Features    2 441
    Purp        2 424
    Way         2 018
    Exp         1 559
    Total       36 592

    Table 3  Comparison with other named entity recognition methods (%)

    Method                  Accuracy   Precision   Recall   F1 score
    IDCNN+CRF               98.66      74.42       77.40    75.88
    CNN+BiLSTM+CRF          98.69      76.20       76.07    76.14
    BERT                    -          77.00       82.00    80.00
    BERT+BiLSTM+HSA         95.43      86.16       84.54    85.34
    CTERMRFRAT              -          -           -        88.31
    UTERMMF (SOTA, 2023)    -          90.50       88.34    89.41
    This paper              97.30      90.03       89.51    89.77
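The F1 scores in Table 3 can be checked from the reported precision and recall, and the comparison against the two most recent baselines can be recomputed. The short Python sketch below does both. It assumes that the single value listed for CTERMRFRAT is its F1 score and that the improvement is measured relative to the baseline F1; under those assumptions the results match the 0.40% and 1.65% quoted in the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in percent)."""
    return 2 * precision * recall / (precision + recall)


def relative_gain(new: float, old: float) -> float:
    """Improvement of `new` over `old`, as a percentage of `old`."""
    return (new - old) / old * 100


# This paper's row in Table 3: precision 90.03, recall 89.51.
ours = round(f1_score(90.03, 89.51), 2)
print(ours)

# Baseline F1 scores from Table 3 (CTERMRFRAT value assumed to be F1).
print(round(relative_gain(ours, 89.41), 2))  # vs. UTERMMF
print(round(relative_gain(ours, 88.31), 2))  # vs. CTERMRFRAT
```

The absolute differences are 0.36 and 1.46 percentage points, so the range in the abstract appears to be a relative improvement rather than an absolute one.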

    Table 4  Comparison of model resource overhead

    Method          Parameters (M)   Inference speed (sent/s)   Maximum GPU power (W)   F1 score (%)
    DiffusionNER    381              59.5                       252.8                   84.29
    This paper      198              112.6                      436.9                   89.77

    Table 5  Ablation results for named entity recognition (%)

    Method        Accuracy   Precision   Recall   F1 score
    This paper    97.30      90.03       89.51    89.77
    -CharNet      96.84      88.27       87.85    88.06
    -QA           96.68      87.72       87.13    87.42
    -TBDN         96.61      87.33       87.08    87.21

    Table 6  Ablation results for entity boundary detection (%)

    Method        Accuracy   Precision   Recall   F1 score
    This paper    98.33      93.96       93.42    93.69
    -CharNet      98.00      92.75       92.09    92.42
    -TBDN         97.84      92.13       91.52    91.82

    Table 7  Prediction results for each label (%)

    Label       Precision   Recall    F1 score
    HackOrg     91.33       89.52     90.41
    Tool        82.33       82.89     82.61
    Aera        90.05       89.16     89.60
    SamFile     94.35       85.20     89.54
    Time        94.27       91.93     93.08
    SecTeam     91.67       90.41     91.03
    OffAct      92.91       85.51     89.06
    Org         87.72       79.37     83.33
    Idus        90.08       95.16     92.55
    Features    91.43       100.00    95.52
    Purp        86.92       98.94     92.54
    Way         86.46       98.81     92.22
    Exp         97.53       100.00    98.75
Publication history
  • Received:  2024-10-28
  • Revised:  2025-02-10
  • Published online:  2025-02-19
