Chinese Medical Named Entity Recognition Combined with Multi-Feature Embedding and Multi-Network Fusion

LEI Songze (雷松泽); LIU Bo (刘博); WANG Yufei (王瑜菲); SHAN Aokui (单奥奎)

doi:10.11999/JEIT220802


School of Computer Science and Engineering, Xi’an Technological University, Xi’an 710021, China

Funds: The National Joint Engineering Laboratory of New Network and Detection Foundation (GSYSJ2016008)


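Author biographies:
LEI Songze: male, Ph.D., associate professor; research interests include deep learning and pattern recognition

LIU Bo: female, master's student; research interests include deep learning

WANG Yufei: female, master's student; research interests include deep learning

SHAN Aokui: male, master's student; research interests include deep learning

Corresponding author:
LIU Bo, liubo0909888@163.com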

1) https://github.com/google-research/bert
2) https://pinyin.sogou.com/dict/cate/index/132?rf=dictindex
3) http://tool.httpcn.com/Zi/
4) https://openhownet.thunlp.org/
CLC number: TP391.1; R-05
Publication history
- Received: 2022-06-17
- Revised: 2022-12-02
- Published online: 2022-12-08
- Issue date: 2023-08-21


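Abstract

In the medical field, named entity recognition can extract valuable information from large-scale electronic medical record (EMR) text. Chinese Named Entity Recognition (NER) is harder still, because features for locating entity boundaries are lacking and semantic information tends to be extracted incompletely. This paper proposes a Multi-Feature Embedding and Multi-Network Fusion model (MFE-MNF) for Chinese EMRs. The model embeds multi-granularity features, namely characters, words, radicals and external knowledge, which extends the feature representation of each character and makes entity boundaries explicit. The feature vectors are fed into two parallel paths, a Bidirectional Long Short-Term Memory network (BiLSTM) and an adaptive graph convolutional network constructed in this paper, to capture contextual and global semantic information thoroughly and thereby alleviate incomplete semantic extraction. Experiments on the CCKS2019 and CCKS2020 datasets show that, compared with conventional entity recognition models, the proposed model extracts entities accurately and effectively, reaching F1 scores of 85.15% and 91.21% respectively (Tables 2 and 3).

Keywords:
- Named Entity Recognition (NER)
- Multi-feature embedding
- Multi-network fusion
- Adaptive graph convolutional network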
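The multi-granularity character representation described in the abstract can be illustrated with a minimal sketch. It assumes the character, word, radical and sememe vectors are simply concatenated per character; the abstract does not state the fusion operator. All lookup tables (`CHAR_EMB`, `WORD_EMB`, `RADICAL_OF`, `RADICAL_EMB`, `SEMEME_EMB`) are hypothetical toy stand-ins for BERT character vectors, dictionary words, radicals and HowNet sememes, and the longest-covering-word rule is one common way to expose entity boundaries, not necessarily the authors' method.

```python
from typing import Dict, List

# Hypothetical toy lookup tables (2-d each); real ones would hold e.g. 768-d BERT
# character vectors plus learned word, radical and sememe vectors.
CHAR_EMB: Dict[str, List[float]] = {"阑": [0.1, 0.2], "尾": [0.3, 0.4], "炎": [0.5, 0.6]}
WORD_EMB: Dict[str, List[float]] = {"阑尾炎": [1.0, 1.0]}
RADICAL_OF: Dict[str, str] = {"阑": "门", "尾": "尸", "炎": "火"}
RADICAL_EMB: Dict[str, List[float]] = {"门": [0.0, 1.0], "尸": [1.0, 0.0], "火": [0.5, 0.5]}
SEMEME_EMB: Dict[str, List[float]] = {"阑尾炎": [0.9, 0.1]}  # assumed: one pooled sememe vector per word
ZERO = [0.0, 0.0]  # fallback for missing features


def covering_word(sentence: str, i: int, lexicon: Dict[str, List[float]]) -> str:
    """Return the longest lexicon word that covers position i ("" if none)."""
    best = ""
    for start in range(0, i + 1):
        for end in range(i + 1, len(sentence) + 1):
            word = sentence[start:end]
            if word in lexicon and len(word) > len(best):
                best = word
    return best


def char_representation(sentence: str) -> List[List[float]]:
    """Concatenate character, word, radical and sememe features for every character."""
    reps = []
    for i, ch in enumerate(sentence):
        word = covering_word(sentence, i, WORD_EMB)
        reps.append(
            CHAR_EMB.get(ch, ZERO)
            + WORD_EMB.get(word, ZERO)
            + RADICAL_EMB.get(RADICAL_OF.get(ch, ""), ZERO)
            + SEMEME_EMB.get(word, ZERO)
        )
    return reps


if __name__ == "__main__":
    print(char_representation("阑尾炎")[0])  # [0.1, 0.2, 1.0, 1.0, 0.0, 1.0, 0.9, 0.1]
```

For "阑尾炎" (appendicitis) each character gets an 8-dimensional vector whose word and sememe slots are shared by all three characters, so the dictionary word's boundary is visible at every position. Concatenation keeps the four granularities in separate dimensions; a weighted sum or attention over them would be alternatives.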


Figure 1 Knowledge embedding module


Figure 2 Character representation based on multi-feature embedding


Figure 3 Semantic tree of “入院后诊断为阑尾炎” (“diagnosed with appendicitis after admission”)


Figure 4 Annotation results for Chinese electronic medical records


Figure 5 Training results


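Table 1 Experimental parameter settings

Parameter	Value	Unit
Character embedding dimension	768	dimensions
GCN layers	2	layers
Sliding window size	10	characters
Dropout	0.5	–
Batch size	64	–
Epochs	80	–
Learning rate	0.001	–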
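Table 1 lists a 2-layer GCN and a sliding window of 10 characters. One plausible reading, assumed here and not confirmed on this page, is that characters are linked when they fall inside the same window, and a standard GCN layer (self-loops plus symmetric normalisation, as in Kipf and Welling) propagates features over that graph. The paper's adaptive GCN presumably adjusts the graph itself; that part is not reproduced, and the helper names below are hypothetical. Pure Python, no training.

```python
import math
from typing import List

Matrix = List[List[float]]


def window_adjacency(n: int, window: int) -> Matrix:
    """Link characters i and j when they fit in one window of `window` consecutive
    characters, i.e. |i - j| < window; self-loops included (assumed graph)."""
    return [[1.0 if abs(i - j) < window else 0.0 for j in range(n)] for i in range(n)]


def normalize(adj: Matrix) -> Matrix:
    """Symmetric normalisation D^-1/2 A D^-1/2; self-loops guarantee degree > 0."""
    deg = [sum(row) for row in adj]
    return [[a / math.sqrt(deg[i] * deg[j]) for j, a in enumerate(row)]
            for i, row in enumerate(adj)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn(h: Matrix, adj: Matrix, weights: List[Matrix]) -> Matrix:
    """One GCN layer per weight matrix: H <- ReLU(A_hat @ H @ W)."""
    a_hat = normalize(adj)
    for w in weights:
        h = [[max(0.0, x) for x in row] for row in matmul(matmul(a_hat, h), w)]
    return h


sentence = "入院后诊断为阑尾炎"
adj = window_adjacency(len(sentence), window=10)   # window size from Table 1
h0 = [[float(i), 1.0] for i in range(len(sentence))]  # toy 2-d character features
identity = [[1.0, 0.0], [0.0, 1.0]]
h2 = gcn(h0, adj, [identity, identity])             # two layers, as in Table 1
```

With a window of 10, this 9-character sentence becomes fully connected, so every output row is the mean of the inputs; under this reading the window only restricts the graph for longer records.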


Table 2 Comparison of models on the CCKS2019 dataset (%)

Model	P	R	F1
Word2vec-BiLSTM-CRF^[5]	80.74	80.42	80.59
Bert-BiLSTM-CRF^[21]	82.45	81.86	82.08
ME-CNER^[6]	83.56	82.91	83.13
Lattice LSTM^[19]	84.44	83.89	84.18
Bert-GCN-CRF^[20]	85.05	84.14	84.65
MFE-MNF	85.31	84.96	85.15
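Tables 2 and 3 report precision (P), recall (R) and F1. Entity-level NER is often scored by exact matching, where a predicted entity counts only if its boundaries and type both match the gold annotation; assuming that protocol (this page does not state it), the sketch below decodes BIO tag sequences into typed spans and computes micro-averaged P, R and F1. The tag names `DIS` and `DRUG` are hypothetical examples.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Decode BIO tags such as 'B-DIS', 'I-DIS', 'O' into typed entity spans."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last entity
        if tag.startswith("I-") and etype == tag[2:]:
            continue  # current entity continues
        if start is not None:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-") or tag.startswith("I-"):  # a stray I- also opens an entity
            start, etype = i, tag[2:]
    return spans


def prf1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged exact-match precision, recall and F1, in %."""
    tp = n_pred = n_gold = 0
    for g, p in zip(gold, pred):
        gs, ps = bio_to_spans(g), bio_to_spans(p)
        tp += len(gs & ps)
        n_pred += len(ps)
        n_gold += len(gs)
    precision = 100.0 * tp / n_pred if n_pred else 0.0
    recall = 100.0 * tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [["B-DIS", "I-DIS", "I-DIS", "O", "B-DRUG"]]
pred = [["B-DIS", "I-DIS", "I-DIS", "O", "O"]]
print(prf1(gold, pred))  # (100.0, 50.0, 66.66666666666667)
```

The missed drug entity leaves precision untouched but halves recall, which is why F1, the harmonic mean, is the single figure the tables compare.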


Table 3 Comparison of models on the CCKS2020 dataset (%)

Model	P	R	F1
Word2vec-BiLSTM-CRF^[5]	87.16	86.77	86.97
Bert-BiLSTM-CRF^[21]	88.78	88.35	88.61
ME-CNER^[6]	90.10	90.17	90.15
Lattice LSTM^[19]	91.10	90.41	90.54
Bert-GCN-CRF^[20]	91.19	90.91	90.96
MFE-MNF	91.45	91.09	91.21
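F1 is the harmonic mean of precision and recall, F1 = 2PR/(P + R). The short worked check below recomputes it from the MFE-MNF rows of Tables 2 and 3 and measures the margin over the strongest baseline, Bert-GCN-CRF; all numbers are copied from the tables.

```python
# (P, R, reported F1) in %, copied from Tables 2 and 3
RESULTS = {
    "CCKS2019": {"Bert-GCN-CRF": (85.05, 84.14, 84.65), "MFE-MNF": (85.31, 84.96, 85.15)},
    "CCKS2020": {"Bert-GCN-CRF": (91.19, 90.91, 90.96), "MFE-MNF": (91.45, 91.09, 91.21)},
}


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


for dataset, rows in RESULTS.items():
    p, r, reported = rows["MFE-MNF"]
    margin = reported - rows["Bert-GCN-CRF"][2]
    print(f"{dataset}: recomputed F1 = {f1(p, r):.2f}, reported = {reported:.2f}, "
          f"margin over Bert-GCN-CRF = {margin:+.2f}")
```

It prints a recomputed F1 of 85.13 (reported 85.15) with a margin of +0.50 on CCKS2019, and 91.27 (reported 91.21) with +0.25 on CCKS2020. The small gaps between recomputed and reported F1 may reflect a different aggregation, for example averaging over runs or entity types, which this page does not specify.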


Table 4 Comparison of computational complexity and running time of each model

Model	Parameters (M)	Computation (M)	Time (s)
Word2vec-BiLSTM-CRF^[5]	17	26	4.49
Bert-BiLSTM-CRF^[21]	124	200	1.97
ME-CNER^[6]	15	23	3.36
Lattice LSTM^[19]	47	78	5.33
Bert-GCN-CRF^[20]	126	203	4.54
MFE-MNF	105	176	3.21
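A short worked comparison of Table 4, taking MFE-MNF against Bert-GCN-CRF (the closest-scoring baseline in Tables 2 and 3). The relative reduction is (baseline − ours) / baseline; units follow the table, and what "computation" counts (for example FLOPs) is not specified on this page.

```python
# (parameters in M, computation in M, time in s), copied from Table 4
COST = {
    "Bert-GCN-CRF": (126, 203, 4.54),
    "MFE-MNF": (105, 176, 3.21),
}


def relative_reduction(baseline: float, ours: float) -> float:
    """Percentage saved relative to the baseline."""
    return 100.0 * (baseline - ours) / baseline


for name, base, ours in zip(("parameters", "computation", "time"),
                            COST["Bert-GCN-CRF"], COST["MFE-MNF"]):
    print(f"{name}: {relative_reduction(base, ours):.2f}% lower")
```

This prints reductions of 16.67% in parameters, 13.30% in computation and 29.30% in time. MFE-MNF is not the fastest model in Table 4, though: Bert-BiLSTM-CRF reports 1.97 s.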


Table 5 Ablation study of the embedding module (%)

Model	P	R	F1
character	87.93	87.58	87.77
+ word	89.29	88.51	89.08
+ radical	89.74	89.33	89.52
+ sememe	90.05	89.62	89.85
+ word + radical	90.43	90.09	90.28
+ word + sememe	91.01	90.37	90.74
+ word + radical + sememe	91.45	91.09	91.21
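Each feature's contribution in Table 5 can be read as the F1 gain over the character-only baseline (87.77). The worked example below tabulates those gains from the table; the last entry is the full model (character plus word, radical and sememe features).

```python
# F1 values (%) copied from Table 5
ABLATION_F1 = {
    "character": 87.77,
    "+ word": 89.08,
    "+ radical": 89.52,
    "+ sememe": 89.85,
    "+ word + radical": 90.28,
    "+ word + sememe": 90.74,
    "+ word + radical + sememe": 91.21,
}

baseline = ABLATION_F1["character"]
gains = {name: round(score - baseline, 2)
         for name, score in ABLATION_F1.items() if name != "character"}
for name, gain in sorted(gains.items(), key=lambda kv: kv[1]):
    print(f"{name:<28} +{gain:.2f}")
```

The gains run from +1.31 (word) through +1.75 (radical) and +2.08 (sememe) to +3.44 for the full model. They are not additive: word and sememe add 1.31 and 2.08 on their own but 2.97 together, so the features partly overlap in the information they supply.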


Table 6 Ablation study of the semantic information extraction module (%)

Model	P	R	F1
BiLSTM+AGCN	91.45	91.09	91.21
- BiLSTM	90.13	89.85	90.04
- AGCN	89.89	89.42	89.65
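Table 6 shows that removing either path lowers F1, by 1.17 points without the BiLSTM and 1.56 without the AGCN, so the two streams appear to carry complementary information. How the paths are merged is not described on this page; the minimal sketch below assumes per-character concatenation followed by a linear layer that produces one score per tag. The weights and tag names are hypothetical toy values, and the greedy argmax stands in for whatever decoder the model uses (for example a CRF, as in the baselines' names).

```python
from typing import List

Vector = List[float]


def fuse(bilstm_out: List[Vector], gcn_out: List[Vector]) -> List[Vector]:
    """Concatenate the two paths' outputs character by character (assumed fusion)."""
    if len(bilstm_out) != len(gcn_out):
        raise ValueError("both paths must produce one vector per character")
    return [h + g for h, g in zip(bilstm_out, gcn_out)]


def emission_scores(fused: List[Vector], weight: List[Vector], bias: Vector) -> List[Vector]:
    """Linear layer mapping each fused vector to one score per tag: s = W x + b."""
    return [
        [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b for row, b in zip(weight, bias)]
        for x in fused
    ]


def greedy_tags(scores: List[Vector], tag_names: List[str]) -> List[str]:
    """Pick the highest-scoring tag per character; a CRF would decode jointly instead."""
    return [tag_names[max(range(len(s)), key=s.__getitem__)] for s in scores]


fused = fuse([[1.0, 0.0], [0.0, 1.0]], [[0.5], [0.2]])  # toy BiLSTM and GCN outputs
scores = emission_scores(fused, weight=[[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]], bias=[0.0, 0.0])
print(greedy_tags(scores, ["B-DIS", "O"]))  # ['B-DIS', 'O']
```

Concatenation lets the output layer weight contextual and graph features independently; gated or attention-based fusion would be alternatives.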


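Table 7 Dictionary and coverage experiments on the CCKS2020 dataset (%)

Entities in training set	P (no dictionary)	R (no dictionary)	F1 (no dictionary)	P (with dictionary)	R (with dictionary)	F1 (with dictionary)
All appear	90.69	90.03	90.38	91.45	91.09	91.21
Partly appear	88.28	87.60	87.92	88.99	88.23	88.62
None appear	86.88	86.77	86.85	87.60	87.09	87.29

Table 8 Dictionary and coverage experiments on the CCKS2019 dataset (%)

Entities in training set	P (no dictionary)	R (no dictionary)	F1 (no dictionary)	P (with dictionary)	R (with dictionary)	F1 (with dictionary)
All appear	85.28	84.57	84.92	85.31	84.96	85.15
Partly appear	82.82	81.14	81.46	83.53	82.77	83.13
None appear	81.42	80.31	80.77	82.14	81.63	81.83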
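Tables 7 and 8 split the results by whether test entities occur in the training set. The page does not define the split precisely; one plausible reading, assumed here, buckets each test sentence by how many of its gold entity strings were seen in training ("all", "partial" or "none"). The function names and example entities are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def coverage_bucket(entities: Iterable[str], train_entities: Set[str]) -> str:
    """Classify a test sentence by how many of its entities also occur in training data."""
    ents = list(entities)
    seen = sum(1 for e in ents if e in train_entities)
    if seen == len(ents):
        return "all"
    return "partial" if seen else "none"


def split_test_set(test: List[List[str]], train_entities: Set[str]) -> Dict[str, List[int]]:
    """Return sentence indices per bucket; sentences without entities are skipped."""
    buckets: Dict[str, List[int]] = {"all": [], "partial": [], "none": []}
    for idx, entities in enumerate(test):
        if entities:
            buckets[coverage_bucket(entities, train_entities)].append(idx)
    return buckets


train_entities = {"阑尾炎", "阿司匹林"}
test = [["阑尾炎"], ["阑尾炎", "胆囊炎"], ["胆囊炎"], []]
print(split_test_set(test, train_entities))  # {'all': [0], 'partial': [1], 'none': [2]}
```

Under this reading, the lower scores in the "None appear" rows measure generalisation to unseen entities, which is where an external dictionary would be expected to help most.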


References

[1]	MURRAY E, POLLACK L, WHITE M, et al. Clinical decision-making: Patients’ preferences and experiences[J]. Patient Education and Counseling, 2007, 65(2): 189–196. doi: 10.1016/j.pec.2006.07.007
[2]	GOEURIOT L, JONES G J F, KELLY L, et al. Medical information retrieval: Introduction to the special issue[J]. Information Retrieval Journal, 2016, 19(1): 1–5. doi: 10.1007/s10791-015-9277-8
[3]	ANSARI A, MAKNOJIA M, and SHAIKH A. Intelligent question answering system based on artificial neural network[C]. 2016 IEEE International Conference on Engineering and Technology (ICETECH), Coimbatore, India, 2016: 758–763.
[4]	WU Fangzhao, LIU Junxin, WU Chuhan, et al. Neural Chinese named entity recognition via CNN-LSTM-CRF and joint training with word segmentation[C]. The World Wide Web Conference, San Francisco, USA, 2019: 3342–3348.
[5]	DONG Chuanhai, ZHANG Jiajun, ZONG Chengqing, et al. Character-based LSTM-CRF with radical-level features for Chinese named entity recognition[C]. The 24th International Conference on Computer Processing of Oriental Languages, 5th National CCF Conference on Natural Language Processing and Chinese Computing, Kunming, China, 2016: 239–250.
[6]	XU Canwen, WANG Feiyang, HAN Jialong, et al. Exploiting multiple embeddings for Chinese named entity recognition[C]. The 28th ACM International Conference on Information and Knowledge Management, Beijing, China, 2019: 2269–2272.
[7]	ZHANG Naixin, LI Feng, XU Guangluan, et al. Chinese NER using dynamic meta-embeddings[J]. IEEE Access, 2019, 7: 64450–64459. doi: 10.1109/ACCESS.2019.2916816
[8]	WANG Xiao, DOU Shihan, XIONG Limao, et al. MINER: Improving out-of-vocabulary named entity recognition from an information theoretic perspective[C]. The 60th Annual Meeting of the Association for Computational Linguistics, Dublin, Ireland, 2022.
[9]	ZHANG Le, LI Jian, TANG Liang, et al. Deep learning recognition method for target entity in military field based on pre-trained BERT[J]. Journal of Information Engineering University, 2021, 22(3): 331–337. doi: 10.3969/j.issn.1671-0673.2021.03.013 (in Chinese)
[10]	ZHU Enwei and LI Jinpeng. Boundary smoothing for named entity recognition[C]. The 60th Annual Meeting of the Association for Computational Linguistics, Dublin, Ireland, 2022.
[11]	GUO Lihua, LI Yang, WANG Suge, et al. Name entity recognition in legal instruments based on matching strategy and community attention mechanism[J]. Journal of Chinese Information Processing, 2022, 36(2): 85–92. doi: 10.3969/j.issn.1003-0077.2022.02.010 (in Chinese)
[12]	JI Bin, LIU Rui, LI Shasha, et al. A hybrid approach for named entity recognition in Chinese electronic medical record[J]. BMC Medical Informatics and Decision Making, 2019, 19(2): 64. doi: 10.1186/s12911-019-0767-2
[13]	YAN Hang, GUI Tao, DAI Junqi, et al. A unified generative framework for various NER subtasks[EB]. https://doi.org/10.48550/arXiv.2106.01223.
[14]	LIU Qin, ZHENG Rui, RONG Bao, et al. Flooding-X: Improving BERT’s resistance to adversarial attacks via loss-restricted fine-tuning[C]. The 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Dublin, Ireland, 2022: 5634–5644.
[15]	LI Fei, LIN Zhichao, ZHANG Meishan, et al. A span-based model for joint overlapped and discontinuous named entity recognition[EB]. https://doi.org/10.48550/arXiv.2106.14373.
[16]	YAO Liang, MAO Chengsheng, and LUO Yuan. Graph convolutional networks for text classification[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2019, 33(1): 7370–7377. doi: 10.1609/aaai.v33i01.33017370
[17]	CETOLI A, BRAGAGLIA S, O'HARNEY A D, et al. Graph convolutional networks for named entity recognition[C]. The 16th International Workshop on Treebanks and Linguistic Theories, Prague, Czech Republic, 2018.
[18]	AN Ying, XIA Xianyun, CHEN Xianlai, et al. Chinese clinical named entity recognition via multi-head self-attention based BiLSTM-CRF[J]. Artificial Intelligence in Medicine, 2022, 127: 102282. doi: 10.1016/j.artmed.2022.102282
[19]	ZHANG Yue and YANG Jie. Chinese NER using lattice LSTM[C]. The 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Australia, 2018.
[20]	JING Shenqi and ZHAO Youlin. Recognizing clinical named entity from Chinese electronic medical record texts based on semi-supervised deep learning[J]. Journal of Information Resources Management, 2021, 11(6): 105–115. doi: 10.13365/j.jirm.2021.06.105 (in Chinese)
[21]	DAI Zhenjin, WANG Xutao, NI Pin, et al. Named entity recognition using BERT BiLSTM CRF for Chinese electronic health records[C]. 2019 12th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), Suzhou, China, 2019: 1–5.
