Chinese Medical Named Entity Recognition Combined with Multi-Feature Embedding and Multi-Network Fusion

LEI Songze, LIU Bo, WANG Yufei, SHAN Aokui

Citation: LEI Songze, LIU Bo, WANG Yufei, SHAN Aokui. Chinese Medical Named Entity Recognition Combined with Multi-Feature Embedding and Multi-Network Fusion[J]. Journal of Electronics & Information Technology, 2023, 45(8): 3032-3039. doi: 10.11999/JEIT220802

doi: 10.11999/JEIT220802
Funds: The National Joint Engineering Laboratory of New Network and Detection Foundation (GSYSJ2016008)
    About the authors:

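    LEI Songze: male, Ph.D., associate professor. His research interests include deep learning and pattern recognition.

    LIU Bo: female, master's student. Her research interests include deep learning.

    WANG Yufei: female, master's student. Her research interests include deep learning.

    SHAN Aokui: male, master's student. His research interests include deep learning.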

    Corresponding author:

    LIU Bo, liubo0909888@163.com

  • 1) https://github.com/google-research/bert
    2) https://pinyin.sogou.com/dict/cate/index/132?rf=dictindex
    3) http://tool.httpcn.com/Zi/
    4) https://openhownet.thunlp.org/
  • CLC number: TP391.1; R-05

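  • Abstract: In the medical domain, named entity recognition can extract valuable information from large-scale electronic medical record (EMR) texts. Chinese Named Entity Recognition (NER) is particularly difficult because features that locate entity boundaries are lacking and semantic information is often extracted incompletely. This paper proposes a Multi-Feature Embedding and Multi-Network Fusion model (MFE-MNF) for Chinese EMRs. The model embeds multi-granularity features, namely characters, words, radicals, and external knowledge, to extend the character representation and make entity boundaries explicit. The feature vectors are fed into two parallel paths, a Bidirectional Long Short-Term Memory network (BiLSTM) and an Adaptive Graph Convolutional Network (AGCN) constructed in this paper, which capture contextual and global semantic information more completely and alleviate incomplete semantic extraction. Experiments on the CCKS2019 and CCKS2020 datasets show that, compared with conventional NER models, the proposed model extracts entities accurately and effectively.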
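The abstract says each character's representation is extended with word, radical, and external-knowledge (sememe) features. The sketch below shows one plausible way to do that, assuming plain concatenation: a character vector, the mean vector of lexicon words whose span covers the character, the vector of the character's radical, and the mean sememe vector of those covering words. All lookup tables (`CHAR_EMB`, `WORD_EMB`, `RADICAL_OF`, `RADICAL_EMB`, `SEMEME_EMB`), the 2-dimensional toy vectors, and mean pooling over matched words are hypothetical; the paper's exact fusion is not given in this section.

```python
from typing import Dict, List

DIM = 2  # toy width; real character vectors would be 768-dim (Table 1)

# Hypothetical lookup tables standing in for BERT character vectors, a medical
# word lexicon, a radical dictionary, and HowNet-derived sememe vectors.
CHAR_EMB: Dict[str, List[float]] = {"阑": [0.1, 0.2], "尾": [0.3, 0.4], "炎": [0.5, 0.6]}
WORD_EMB: Dict[str, List[float]] = {"阑尾": [1.0, 0.0], "阑尾炎": [0.0, 1.0]}
RADICAL_OF: Dict[str, str] = {"阑": "门", "尾": "尸", "炎": "火"}
RADICAL_EMB: Dict[str, List[float]] = {"门": [0.2, 0.2], "尸": [0.4, 0.4], "火": [0.9, 0.1]}
SEMEME_EMB: Dict[str, List[float]] = {"阑尾炎": [0.7, 0.3]}


def mean(vectors: List[List[float]]) -> List[float]:
    """Element-wise mean; a zero vector when nothing matched."""
    if not vectors:
        return [0.0] * DIM
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def matched_words(sentence: str, i: int, lexicon: Dict[str, List[float]]) -> List[str]:
    """Lexicon words in `sentence` whose span covers character position i."""
    return [sentence[s:e]
            for s in range(i + 1)
            for e in range(i + 1, len(sentence) + 1)
            if sentence[s:e] in lexicon]


def char_representation(sentence: str, i: int) -> List[float]:
    """Concatenate character, word, radical, and sememe features for position i."""
    ch = sentence[i]
    words = matched_words(sentence, i, WORD_EMB)
    char_vec = CHAR_EMB.get(ch, [0.0] * DIM)
    word_vec = mean([WORD_EMB[w] for w in words])
    radical_vec = RADICAL_EMB.get(RADICAL_OF.get(ch, ""), [0.0] * DIM)
    sememe_vec = mean([SEMEME_EMB[w] for w in words if w in SEMEME_EMB])
    return char_vec + word_vec + radical_vec + sememe_vec  # list concatenation
```

For "阑尾炎", position 0 is covered by both "阑尾" and "阑尾炎", so `char_representation("阑尾炎", 0)` yields an 8-dimensional vector whose word slice is the mean of the two word vectors. Mean pooling keeps the width fixed however many words match; a learned attention over matched words would be a reasonable alternative.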
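  • Figure 1  Knowledge embedding module

    Figure 2  Character representation based on multi-feature embedding

    Figure 3  Semantic tree of “入院后诊断为阑尾炎” (“diagnosed with appendicitis after admission”)

    Figure 4  Annotation results for a Chinese electronic medical record

    Figure 5  Training results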

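    Table 1  Experimental parameter settings

    Parameter | Value
    Character embedding dimension | 768
    Number of GCN layers | 2
    Sliding window size | 10 characters
    Dropout | 0.5
    Batch size | 64
    Epochs | 80
    Learning rate | 0.001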
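Table 1 lists two GCN layers and a sliding window of 10 characters. A minimal sketch of how such a window could define the graph, assuming every pair of characters that falls inside a common window of that width is linked (equivalently |i - j| < window, self-loops included), followed by the standard symmetric-normalized propagation H' = ReLU(D^-1/2 A D^-1/2 H W). This is a plain GCN, not the paper's adaptive variant, whose adjacency construction is not described here; the function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def window_adjacency(n: int, window: int) -> Matrix:
    """A[i][j] = 1 if characters i and j share a sliding window of `window` characters.

    Two positions share some window of that width exactly when |i - j| < window,
    so the diagonal (self-loops) is always 1.
    """
    return [[1.0 if abs(i - j) < window else 0.0 for j in range(n)] for i in range(n)]


def normalize(adj: Matrix) -> Matrix:
    """Symmetric normalization D^-1/2 A D^-1/2; degrees are > 0 thanks to self-loops."""
    deg = [sum(row) for row in adj]
    return [[a / math.sqrt(deg[i] * deg[j]) for j, a in enumerate(row)]
            for i, row in enumerate(adj)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain dense matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn(adj: Matrix, h: Matrix, weights: List[Matrix]) -> Matrix:
    """Apply one ReLU(A_hat @ H @ W) layer per weight matrix (Table 1 uses 2 layers)."""
    a_hat = normalize(adj)
    for w in weights:
        h = [[max(0.0, x) for x in row] for row in matmul(matmul(a_hat, h), w)]
    return h


# Four characters with scalar features; window 10 covers the whole sentence,
# so each layer averages over all nodes and every row becomes [2.5].
out = gcn(window_adjacency(4, 10), [[1.0], [2.0], [3.0], [4.0]], [[[1.0]], [[1.0]]])
```

With a window shorter than the sentence, the adjacency becomes banded, so each layer mixes information only from nearby characters and stacking layers widens the receptive field.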

    Table 2  Comparison of models on the CCKS2019 dataset (%)

    Model | P | R | F1
    Word2vec-BiLSTM-CRF[5] | 80.74 | 80.42 | 80.59
    Bert-BiLSTM-CRF[21] | 82.45 | 81.86 | 82.08
    ME-CNER[6] | 83.56 | 82.91 | 83.13
    Lattice LSTM[19] | 84.44 | 83.89 | 84.18
    Bert-GCN-CRF[20] | 85.05 | 84.14 | 84.65
    MFE-MNF | 85.31 | 84.96 | 85.15

    Table 3  Comparison of models on the CCKS2020 dataset (%)

    Model | P | R | F1
    Word2vec-BiLSTM-CRF[5] | 87.16 | 86.77 | 86.97
    Bert-BiLSTM-CRF[21] | 88.78 | 88.35 | 88.61
    ME-CNER[6] | 90.10 | 90.17 | 90.15
    Lattice LSTM[19] | 91.10 | 90.41 | 90.54
    Bert-GCN-CRF[20] | 91.19 | 90.91 | 90.96
    MFE-MNF | 91.45 | 91.09 | 91.21
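Tables 2 and 3 report precision (P), recall (R), and F1 in percent, but this section does not spell out the scoring protocol. A common choice for CCKS clinical NER, assumed here, is exact-match, entity-level micro scoring over BIO tags: an entity counts as correct only if its type, start, and end all match, and F1 = 2PR/(P + R). The entity labels `DISEASE` and `DRUG` and the helper names are illustrative.

```python
from typing import List, Optional, Set, Tuple

Span = Tuple[str, int, int]  # (entity type, start, end exclusive)


def bio_spans(tags: List[str]) -> Set[Span]:
    """Decode BIO tags into typed spans; a stray I-X opens a new entity."""
    spans: Set[Span] = set()
    etype: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" flushes the last entity
        if tag.startswith("I-") and tag[2:] == etype:
            continue  # still inside the current entity
        if etype is not None:
            spans.add((etype, start, i))
        etype, start = (None, i) if tag == "O" else (tag[2:], i)
    return spans


def prf1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged exact-match entity precision, recall, and F1."""
    tp = n_pred = n_gold = 0
    for g, p in zip(gold, pred):
        gs, ps = bio_spans(g), bio_spans(p)
        tp += len(gs & ps)
        n_pred += len(ps)
        n_gold += len(gs)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, if the gold sequence contains a 3-character disease and a 2-character drug, and the prediction gets the disease right but truncates the drug to one character, exact matching scores one of two predicted and one of two gold entities correct, so P = R = F1 = 0.5.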

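    Table 4  Comparison of computational complexity and running time across models

    Model | Parameters (M) | Computation (M) | Time (s)
    Word2vec-BiLSTM-CRF[5] | 17 | 26 | 4.49
    Bert-BiLSTM-CRF[21] | 124 | 200 | 1.97
    ME-CNER[6] | 15 | 23 | 3.36
    Lattice LSTM[19] | 47 | 78 | 5.33
    Bert-GCN-CRF[20] | 126 | 203 | 4.54
    MFE-MNF | 105 | 176 | 3.21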

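    Table 5  Ablation study of the embedding module (%)

    Model | P | R | F1
    character | 87.93 | 87.58 | 87.77
    + word | 89.29 | 88.51 | 89.08
    + radical | 89.74 | 89.33 | 89.52
    + sememe | 90.05 | 89.62 | 89.85
    + word + radical | 90.43 | 90.09 | 90.28
    + word + sememe | 91.01 | 90.37 | 90.74
    + character + sememe + radical + word | 91.45 | 91.09 | 91.21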

    Table 6  Ablation study of the semantic information extraction module (%)

    Model | P | R | F1
    BiLSTM + AGCN | 91.45 | 91.09 | 91.21
    - BiLSTM | 90.13 | 89.85 | 90.04
    - AGCN | 89.89 | 89.42 | 89.65
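The two ablation tables are easier to compare as differences. The snippet below copies the F1 column of Tables 5 and 6 and reports every row relative to a reference row: character-only embeddings for Table 5 and the full BiLSTM + AGCN model for Table 6. The dictionary and function names are hypothetical; only the numbers come from the tables.

```python
from typing import Dict

# F1 columns (%) copied from Table 5 and Table 6.
TABLE5_F1: Dict[str, float] = {
    "character": 87.77,
    "+ word": 89.08,
    "+ radical": 89.52,
    "+ sememe": 89.85,
    "+ word + radical": 90.28,
    "+ word + sememe": 90.74,
    "+ character + sememe + radical + word": 91.21,
}
TABLE6_F1: Dict[str, float] = {"BiLSTM + AGCN": 91.21, "- BiLSTM": 90.04, "- AGCN": 89.65}


def deltas(scores: Dict[str, float], reference: str) -> Dict[str, float]:
    """F1 difference of every row from the reference row, in percentage points."""
    ref = scores[reference]
    return {name: round(value - ref, 2) for name, value in scores.items() if name != reference}


table5_gains = deltas(TABLE5_F1, "character")
table6_drops = deltas(TABLE6_F1, "BiLSTM + AGCN")
```

With these values, the four features together add 3.44 F1 points over characters alone, sememes give the largest single-feature gain (2.08), and removing AGCN (-1.56) costs more than removing BiLSTM (-1.17).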

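    Table 7  Dictionary and coverage experiments on the CCKS2019 dataset (%)

    Entities seen in training set | P (no dictionary) | R (no dictionary) | F1 (no dictionary) | P (dictionary) | R (dictionary) | F1 (dictionary)
    All | 90.69 | 90.03 | 90.38 | 91.45 | 91.09 | 91.21
    Some | 88.28 | 87.60 | 87.92 | 88.99 | 88.23 | 88.62
    None | 86.88 | 86.77 | 86.85 | 87.60 | 87.09 | 87.29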

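    Table 8  Dictionary and coverage experiments on the CCKS2020 dataset (%)

    Entities seen in training set | P (no dictionary) | R (no dictionary) | F1 (no dictionary) | P (dictionary) | R (dictionary) | F1 (dictionary)
    All | 85.28 | 84.57 | 84.92 | 85.31 | 84.96 | 85.15
    Some | 82.82 | 81.14 | 81.46 | 83.53 | 82.77 | 83.13
    None | 81.42 | 80.31 | 80.77 | 82.14 | 81.63 | 81.83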
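Tables 7 and 8 split results by whether test entities appear in the training set. The exact splitting protocol is not stated in this section; the sketch below assumes sentence-level buckets: a test sentence is "all" if every gold entity string occurs in the training set, "partial" if only some do, and "none" if none do, with entity-free sentences skipped. The function names and sample strings are hypothetical.

```python
from typing import Dict, List, Set


def coverage_bucket(entities: List[str], train_entities: Set[str]) -> str:
    """Classify one sentence (non-empty entity list) by how many entities were seen in training."""
    seen = sum(1 for e in entities if e in train_entities)
    if seen == len(entities):
        return "all"
    if seen > 0:
        return "partial"
    return "none"


def split_by_coverage(test_entities: List[List[str]], train_entities: Set[str]) -> Dict[str, List[int]]:
    """Return sentence indices per bucket; sentences without entities are skipped."""
    buckets: Dict[str, List[int]] = {"all": [], "partial": [], "none": []}
    for idx, entities in enumerate(test_entities):
        if entities:
            buckets[coverage_bucket(entities, train_entities)].append(idx)
    return buckets


train = {"阑尾炎", "阿莫西林"}
test = [["阑尾炎"], ["阑尾炎", "胆囊结石"], ["胆囊结石"], []]
buckets = split_by_coverage(test, train)
```

Each bucket can then be scored separately, once with and once without the external dictionary, to fill the three rows of the tables.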
  • [1] MURRAY E, POLLACK L, WHITE M, et al. Clinical decision-making: Patients’ preferences and experiences[J]. Patient Education and Counseling, 2007, 65(2): 189–196. doi: 10.1016/j.pec.2006.07.007
    [2] GOEURIOT L, JONES G J F, KELLY L, et al. Medical information retrieval: Introduction to the special issue[J]. Information Retrieval Journal, 2016, 19(1): 1–5. doi: 10.1007/s10791-015-9277-8
    [3] ANSARI A, MAKNOJIA M, and SHAIKH A. Intelligent question answering system based on artificial neural network[C]. 2016 IEEE International Conference on Engineering and Technology (ICETECH), Coimbatore, India, 2016: 758–763.
    [4] WU Fangzhao, LIU Junxin, WU Chuhan, et al. Neural Chinese named entity recognition via CNN-LSTM-CRF and joint training with word segmentation[C]. The World Wide Web Conference, San Francisco, USA, 2019: 3342–3348.
    [5] DONG Chuanhai, ZHANG Jiajun, ZONG Chengqing, et al. Character-based LSTM-CRF with radical-level features for Chinese named entity recognition[C]. The 24th International Conference on Computer Processing of Oriental Languages, 5th National CCF Conference on Natural Language Processing and Chinese Computing, Kunming, China, 2016: 239–250.
    [6] XU Canwen, WANG Feiyang, HAN Jialong, et al. Exploiting multiple embeddings for Chinese named entity recognition[C]. The 28th ACM International Conference on Information and Knowledge Management, Beijing, China, 2019: 2269–2272.
    [7] ZHANG Naixin, LI Feng, XU Guangluan, et al. Chinese NER using dynamic meta-embeddings[J]. IEEE Access, 2019, 7: 64450–64459. doi: 10.1109/ACCESS.2019.2916816
    [8] WANG Xiao, DOU Shihan, XIONG Limao, et al. MINER: Improving out-of-vocabulary named entity recognition from an information theoretic perspective[C]. The 60th Annual Meeting of the Association for Computational Linguistics, Dublin, Ireland, 2022.
    [9] ZHANG Le, LI Jian, TANG Liang, et al. Deep learning recognition method for target entity in military field based on pre-trained BERT[J]. Journal of Information Engineering University, 2021, 22(3): 331–337. doi: 10.3969/j.issn.1671-0673.2021.03.013
    [10] ZHU Enwei and LI Jinpeng. Boundary smoothing for named entity recognition[C]. The 60th Annual Meeting of the Association for Computational Linguistics, Dublin, Ireland, 2022.
    [11] GUO Lihua, LI Yang, WANG Suge, et al. Name entity recognition in legal instruments based on matching strategy and community attention mechanism[J]. Journal of Chinese Information Processing, 2022, 36(2): 85–92. doi: 10.3969/j.issn.1003-0077.2022.02.010
    [12] JI Bin, LIU Rui, LI Shasha, et al. A hybrid approach for named entity recognition in Chinese electronic medical record[J]. BMC Medical Informatics and Decision Making, 2019, 19(2): 64. doi: 10.1186/s12911-019-0767-2
    [13] YAN Hang, GUI Tao, DAI Junqi, et al. A unified generative framework for various NER subtasks[EB]. https://doi.org/10.48550/arXiv.2016.01223?file=arXiv.2016.01223.
    [14] LIU Qin, ZHENG Rui, RONG Bao, et al. Flooding-X: Improving BERT’s resistance to adversarial attacks via loss-restricted fine-tuning[C]. The 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Dublin, Ireland, 2022: 5634–5644.
    [15] LI Fei, LIN Zhichao, ZHANG Meishan, et al. A span-based model for joint overlapped and discontinuous named entity recognition[EB]. https://doi.org/10.48550/arXiv.2016.14373.
    [16] YAO Liang, MAO Chengsheng, and LUO Yuan. Graph convolutional networks for text classification[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2019, 33(1): 7370–7377. doi: 10.1609/aaai.v33i01.33017370
    [17] CETOLI A, BRAGAGLIA S, O'HARNEY A D, et al. Graph convolutional networks for named entity recognition[C]. The 16th International Workshop on Treebanks and Linguistic Theories, Prague, Czech Republic, 2018.
    [18] AN Ying, XIA Xianyun, CHEN Xianlai, et al. Chinese clinical named entity recognition via multi-head self-attention based BiLSTM-CRF[J]. Artificial Intelligence in Medicine, 2022, 127: 102282. doi: 10.1016/j.artmed.2022.102282
    [19] ZHANG Yue and YANG Jie. Chinese NER using lattice LSTM[C]. The 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Australia, 2018.
    [20] JING Shenqi and ZHAO Youlin. Recognizing clinical named entity from Chinese electronic medical record texts based on semi-supervised deep learning[J]. Journal of Information Resources Management, 2021, 11(6): 105–115. doi: 10.13365/j.jirm.2021.06.105
    [21] DAI Zhenjin, WANG Xutao, NI Pin, et al. Named entity recognition using BERT BiLSTM CRF for Chinese electronic health records[C]. 2019 12th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), Suzhou, China, 2019: 1–5.
Figures (5) / Tables (8)
Metrics
  • Article views: 787
  • HTML full-text views: 472
  • PDF downloads: 138
  • Times cited: 0
Publication history
  • Received: 2022-06-17
  • Revised: 2022-12-02
  • Published online: 2022-12-08
  • Issue date: 2023-08-21