Integrating Representation Learning and Knowledge Graph Reasoning for Diabetes and Complications Prediction

WANG Yu'ao (王宇翱); HUANG Yeqi (黄叶琪); LI Qingyuan (李青远); LIU Yun (刘云); JING Shenqi (景慎旗); SHAN Tao (单涛); GUO Yong'an (郭永安)

doi: 10.11999/JEIT250798 cstr: 32379.14.JEIT250798

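1. Jiangsu Key Laboratory of Intelligent Information Processing and Communication Technology, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
2. Department of Information, Jiangsu Province Hospital, Nanjing 210029, China

Funds: National Key Research and Development Program of China (2023YFC3605800); Frontier Leading Technology Basic Research Project of Jiangsu Province (BK20202001); Postgraduate Research & Practice Innovation Program of Jiangsu Province (SJCX24_0285)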

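Author biographies:
WANG Yu'ao: male, Ph.D. student; research interests: artificial intelligence and intelligent information processing

HUANG Yeqi: female, master's student; research interest: medical artificial intelligence

LI Qingyuan: male, master's student; research interests: artificial intelligence and medical information processing

LIU Yun: female, professor; research interests: intelligent medicine, medical informatics, and clinical big data

JING Shenqi: male, senior engineer; research interest: medical information big data

SHAN Tao: male, senior engineer; research interest: medical information big data

GUO Yong'an: male, professor; research interest: intelligent information processing

Corresponding author:
GUO Yong'an, guo@njupt.edu.cn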

Footnote 1) https://physionet.org/; this dataset is provided by the US National Institutes of Health
CLC number: TN912.34
Publication history
- Received: 2025-08-26
- Revised: 2025-10-27
- Published online: 2025-11-04


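Abstract (Chinese)

Abstract: Joint prediction of diabetes and its complications is important for reducing the harm of chronic disease and improving patient prognosis. Existing prediction methods, however, face heterogeneous and sparse data, complex entity relations, and difficulty in precisely capturing high-order associations between diseases and medical concepts, all of which limit prediction accuracy and the ability to identify multiple conditions. To address these problems, this paper proposes REKG-MDP, a prediction model for diabetes and its complications based on representation learning and knowledge graph reasoning. A medical knowledge graph is built by integrating electronic health records with supplementary medical knowledge: the patient side is enriched with basic personal information, test indicators, and history of present illness, and the disease side with comorbidity information, susceptible populations, common causes, and diagnostic criteria, which alleviates data sparsity and heterogeneity. Four relation connectivity patterns (symmetric, antisymmetric, inverse, and compositional) are modeled jointly, and a reasoning module that combines a hierarchical attention mechanism with a graph convolutional network dynamically adjusts neighbor weights at both the global and local levels, aggregating multi-order neighbor information and capturing high-order semantic relations. Experiments on the MIMIC-IV dataset show that the proposed model clearly outperforms existing methods on joint prediction of diabetes and its complications, with significant gains in prediction accuracy and multi-condition identification.
- Joint prediction of multiple diseases /
- Representation learning /
- Medical knowledge graph /
- Graph neural network /
- Attention mechanism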
Abstract:
Objective: Diabetes mellitus and its complications are recognized as major global health challenges, causing severe morbidity, high healthcare costs, and reduced quality of life. Accurate joint prediction of these conditions is essential for early intervention but is hindered by data heterogeneity, sparsity, and complex inter-entity relationships. To address these challenges, a Representation Learning Enhanced Knowledge Graph-based Multi-Disease Prediction (REKG-MDP) model is proposed. Electronic Health Records (EHRs) are integrated with supplementary medical knowledge to construct a comprehensive Medical Knowledge Graph (MKG), and higher-order semantic reasoning combined with relation-aware representation learning is applied to capture complex dependencies and improve predictive accuracy across multiple diabetes-related conditions.

Methods: The REKG-MDP framework consists of three modules. First, an MKG is constructed by integrating structured EHR data from the MIMIC-IV dataset with external disease knowledge. Patient-side features include demographics, laboratory indices, and medical history, whereas disease-side attributes cover comorbidities, susceptible populations, etiological factors, and diagnostic criteria. This integration mitigates data sparsity and enriches semantic representation. Second, a relation-aware embedding module captures four relational patterns: symmetric, antisymmetric, inverse, and compositional. These patterns are used to optimize entity and relation embeddings for semantic reasoning. Third, a Hierarchical Attention-based Graph Convolutional Network (HA-GCN) aggregates multi-hop neighborhood information. Dynamic attention weights capture both local and global dependencies, and a bidirectional mechanism enhances the modeling of patient-disease interactions.

Results and Discussions: Experiments demonstrate that REKG-MDP consistently outperforms four baselines: two machine learning models (DCKD-RF and bSES-AC-RUN-FKNN) and two graph-based models (KGRec and PyRec). Compared with the strongest baseline, REKG-MDP achieves improvements in precision (P), F1, and NDCG of 19.39%, 19.67%, and 19.39% for single-disease prediction ($ n=1 $); 16.71%, 21.83%, and 23.53% for $ n=3 $; and 22.01%, 20.34%, and 20.88% for $ n=5 $ (Table 4). Ablation studies confirm the contribution of each module. Removing relation-pattern modeling reduces performance metrics by approximately 12%, removing hierarchical attention decreases them by 5–6%, and excluding disease-side knowledge produces the largest decline, of up to 20% (Fig. 5). Sensitivity analysis indicates that increasing the embedding dimension from 32 to 128 enhances performance by more than 11%, whereas excessive dimensionality (256) leads to over-smoothing (Fig. 6). Adjusting the $ \beta $ parameter strengthens sample discrimination, improving P, F1, and NDCG by 9.28%, 27.9%, and 8.08%, respectively (Fig. 7).

Conclusions: REKG-MDP integrates representation learning with knowledge graph reasoning to enable multi-disease prediction. The main contributions are as follows: (1) integrating heterogeneous EHR data with disease knowledge mitigates data sparsity and enhances semantic representation; (2) modeling diverse relational patterns and applying hierarchical attention improves the capture of higher-order dependencies; and (3) extensive experiments confirm the model's superiority over state-of-the-art baselines, with ablation and sensitivity analyses validating the contribution of each module. Remaining challenges include managing extremely sparse data and ensuring generalization across broader populations. Future research will extend REKG-MDP to model temporal disease progression and additional chronic conditions.
- Joint prediction of multiple diseases /
- Representation learning /
- Medical Knowledge Graph (MKG) /
- Graph neural network /
- Attention mechanism
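The abstract reports P, F1, and NDCG at cutoffs $ n=1 $, $ n=3 $, and $ n=5 $. How the paper computes them is not shown on this page, so the following is only a minimal sketch that assumes the usual top-n ranking definitions with binary relevance. The function names, the example patient, and the ranked list are hypothetical.

```python
import math

def precision_at_n(ranked, relevant, n):
    """Fraction of the top-n predicted diseases that the patient actually has."""
    hits = sum(1 for d in ranked[:n] if d in relevant)
    return hits / n

def recall_at_n(ranked, relevant, n):
    """Fraction of the patient's diseases that appear in the top-n predictions."""
    if not relevant:
        return 0.0
    hits = sum(1 for d in ranked[:n] if d in relevant)
    return hits / len(relevant)

def f1_at_n(ranked, relevant, n):
    """Harmonic mean of precision@n and recall@n."""
    p = precision_at_n(ranked, relevant, n)
    r = recall_at_n(ranked, relevant, n)
    return 2 * p * r / (p + r) if p + r > 0 else 0.0

def ndcg_at_n(ranked, relevant, n):
    """Binary-relevance NDCG: hits near the top of the ranking count more."""
    dcg = sum(1 / math.log2(i + 2)
              for i, d in enumerate(ranked[:n]) if d in relevant)
    ideal = sum(1 / math.log2(i + 2) for i in range(min(n, len(relevant))))
    return dcg / ideal if ideal > 0 else 0.0

# Hypothetical patient: ranked predictions (ICD-10 codes) and true diagnoses.
ranked = ["E11", "I10", "N18", "E78.5", "I50"]
relevant = {"E11", "N18", "I50"}

for n in (1, 3, 5):
    print(n,
          round(precision_at_n(ranked, relevant, n), 4),
          round(f1_at_n(ranked, relevant, n), 4),
          round(ndcg_at_n(ranked, relevant, n), 4))
```

Under these assumed definitions, P@1 and NDCG@1 are always equal, which is consistent with the identical P@1 and NDCG@1 columns in Table 4. F1@1 is bounded by recall whenever a patient has more than one diagnosis, which would explain why F1@1 is much lower than P@1 there.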

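Fig. 1 Architecture of the REKG-MDP model

Fig. 2 Knowledge graph construction workflow

Fig. 3 Example of a medical-domain knowledge graph

Fig. 4 Aggregation of node embedding vectors

Fig. 5 Performance comparison of REKG-MDP and its three variants

Fig. 6 Effect of the embedding dimension on REKG-MDP performance

Fig. 7 Effect of $\beta $ on REKG-MDP performance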
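The aggregation step named in the Fig. 4 caption can be illustrated with a small attention-weighted neighbor sum. This is a simplified sketch, not the paper's HA-GCN: it assumes dot-product scores and a softmax over each node's direct neighbors (the local level only), and it omits the global weighting and the bidirectional mechanism. All node names, embeddings, and function names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def aggregate(node, neighbors, emb):
    """One step: add the attention-weighted sum of neighbor embeddings to the node's own."""
    scores = [dot(emb[node], emb[n]) for n in neighbors]
    weights = softmax(scores)
    dim = len(emb[node])
    pooled = [sum(w * emb[n][k] for w, n in zip(weights, neighbors))
              for k in range(dim)]
    return [emb[node][k] + pooled[k] for k in range(dim)], weights

def propagate(emb, adjacency, hops):
    """Repeat the step so each node receives information from up to `hops` edges away."""
    layers = [emb]
    for _ in range(hops):
        prev = layers[-1]
        layers.append({
            n: (aggregate(n, adjacency[n], prev)[0] if adjacency[n] else prev[n])
            for n in prev
        })
    return layers

# Hypothetical toy graph: the patient reaches the disease only through a test indicator.
emb = {
    "patient": [1.0, 0.0],
    "high_glucose": [0.5, 0.5],
    "normal_bmi": [-0.5, 0.2],
    "T2DM": [0.0, 1.0],
}
adjacency = {
    "patient": ["high_glucose", "normal_bmi"],
    "high_glucose": ["patient", "T2DM"],
    "normal_bmi": ["patient"],
    "T2DM": ["high_glucose"],
}

layers = propagate(emb, adjacency, hops=2)
_, weights = aggregate("patient", adjacency["patient"], emb)
print(weights)
```

In this toy graph the disease embedding influences the patient only at the second hop, which is the kind of higher-order dependency that multi-hop aggregation is meant to capture.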


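Table 1 Examples of relation connectivity patterns in the medical knowledge graph

Relation pattern	Explanation	Medical example
Symmetric	The relation between two entities is mutual: if A has the relation to B, then B also has it to A	(diabetes, comorbidity, hyperlipidemia)
Antisymmetric	If A has the relation to B, then B does not have it to A	(patient, BMI, obesity)
Inverse	A relation implies a reversed counterpart: if $ {r_1}(A,B) $ holds, then $ {r_2}(B,A) $ holds	(hyperglycemia, causes, diabetes) → (diabetes, risk factor, hyperglycemia)
Compositional	One entity can be connected to another through a chain of relations: if $ {r_1}(A,B) $ and $ {r_2}(B,C) $ hold, then $ {r_3}(A,C) $ can be inferred	(patient, has, abnormal test indicator) + (disease, diagnostic basis, abnormal test indicator) → (patient, suffers from, disease)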
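One well-known way to represent all four patterns in Table 1 is to model each relation as a rotation in the complex plane, as RotatE-style embeddings do. The sketch below assumes that formulation purely for illustration; this page does not state which scoring function REKG-MDP uses. The embedding values, angles, and helper names are hypothetical.

```python
import cmath
import math

def rotate(head, phases):
    """Apply a relation (one rotation angle per dimension) to a complex embedding."""
    return [h * cmath.exp(1j * p) for h, p in zip(head, phases)]

def distance(u, v):
    """L1 distance between two complex vectors; near 0 means the triple fits."""
    return sum(abs(x - y) for x, y in zip(u, v))

# Hypothetical 2-dimensional entity embedding.
a = [1 + 0j, 0.6 + 0.8j]

# Symmetric: rotating by pi twice is the identity, so r(A,B) implies r(B,A).
r_sym = [math.pi, math.pi]
b = rotate(a, r_sym)
symmetric_ok = distance(rotate(b, r_sym), a) < 1e-9

# Antisymmetric: with a generic angle, applying r to B does not lead back to A.
r_anti = [0.7, 1.1]
b2 = rotate(a, r_anti)
antisymmetric_ok = distance(rotate(b2, r_anti), a) > 1e-3

# Inverse: r2 uses the negated angles of r1, so r1(A,B) implies r2(B,A).
r1 = [0.7, 1.1]
r2 = [-p for p in r1]
inverse_ok = distance(rotate(rotate(a, r1), r2), a) < 1e-9

# Compositional: applying r_a then r_b equals one rotation r3 with summed angles.
r_a, r_b = [0.3, 0.5], [0.4, 0.2]
r3 = [x + y for x, y in zip(r_a, r_b)]
c = rotate(rotate(a, r_a), r_b)
composition_ok = distance(rotate(a, r3), c) < 1e-9

print(symmetric_ok, antisymmetric_ok, inverse_ok, composition_ok)
```

The design point is that a single parameterization (one angle per dimension) covers every row of Table 1, so no pattern needs a separate model component.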


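Table 2 Knowledge graph statistics

Data type	Size
Training set	2910
Test set	1942
Diseases	18
Patients	4852
Test indicators	92
Basic personal information types	18
Comorbidities / common causes / susceptible populations	485
Relation types	18
Triples in the knowledge graph	163118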


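Table 3 Diseases and disease categories used in this paper

Disease category	ICD-10 code	Disease name
Metabolic diseases	E11	Type 2 diabetes mellitus
	E78.5	Hyperlipidemia
	E11.4	Diabetic neuropathy
	E10.2 & E11.2	Diabetic chronic kidney disease
	E10.65 & E11.65	Hyperglycemia
	E10	Type 1 diabetes mellitus
	E78.0	Hypercholesterolemia
	E10.1 & E11.1	Diabetic ketoacidosis
Cardiovascular and cerebrovascular diseases	I10	Hypertension
	I50	Heart failure
	I25.1	Coronary atherosclerotic heart disease
	I21	Myocardial infarction
	I63	Ischemic stroke
	G45	Transient ischemic attack
	I70	Atherosclerosis
Kidney diseases	N18	Chronic kidney disease
Non-alcoholic fatty liver disease	K75.81	Non-alcoholic steatohepatitis
Non-alcoholic fatty liver disease	K76.0	Fatty liver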
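For a multi-disease prediction task, the codes in Table 3 have to be turned into per-patient label sets. The sketch below shows one way this could be done, assuming raw diagnosis codes may be more specific than the table entries and may be written with or without the dot. The longest-prefix rule, the function names, and the example codes are assumptions for illustration, not the paper's preprocessing.

```python
# Codes and names follow Table 3; cardiovascular codes are written with the letter I.
DISEASES = {
    "E11": "Type 2 diabetes mellitus",
    "E78.5": "Hyperlipidemia",
    "E11.4": "Diabetic neuropathy",
    "E10.2": "Diabetic chronic kidney disease",
    "E11.2": "Diabetic chronic kidney disease",
    "E10.65": "Hyperglycemia",
    "E11.65": "Hyperglycemia",
    "E10": "Type 1 diabetes mellitus",
    "E78.0": "Hypercholesterolemia",
    "E10.1": "Diabetic ketoacidosis",
    "E11.1": "Diabetic ketoacidosis",
    "I10": "Hypertension",
    "I50": "Heart failure",
    "I25.1": "Coronary atherosclerotic heart disease",
    "I21": "Myocardial infarction",
    "I63": "Ischemic stroke",
    "G45": "Transient ischemic attack",
    "I70": "Atherosclerosis",
    "N18": "Chronic kidney disease",
    "K75.81": "Non-alcoholic steatohepatitis",
    "K76.0": "Fatty liver",
}

def normalize(code):
    """Drop the dot and upper-case, so 'e11.40' and 'E1140' compare equal."""
    return code.replace(".", "").strip().upper()

# Pre-normalized lookup, longest keys first so the most specific entry wins.
_LOOKUP = sorted(((normalize(k), v) for k, v in DISEASES.items()),
                 key=lambda kv: len(kv[0]), reverse=True)

def label_for(code):
    """Return the Table 3 disease for a raw code, or None if it is not covered."""
    norm = normalize(code)
    for prefix, name in _LOOKUP:
        if norm.startswith(prefix):
            return name
    return None

def labels_for_patient(codes):
    """Collapse one patient's diagnosis codes into a sorted multi-label target."""
    return sorted({name for name in map(label_for, codes) if name is not None})

# Hypothetical patient record.
print(labels_for_patient(["E11.40", "I10", "J45.909", "E1165"]))
```

Whether a specific code such as E11.4 should also count toward its parent E11 is a modeling decision; this sketch assigns each code only to its most specific match.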


Table 4 Performance comparison of REKG-MDP and four baseline methods (↑: relative improvement over the strongest baseline)

模型	P@1	P@3	P@5	F1@1	F1@3	F1@5	NDCG@1	NDCG@3	NDCG@5
REKG-MDP	0.9655 (↑19.39%)	0.8879 (↑16.71%)	0.8280 (↑22.01%)	0.4200 (↑19.67%)	0.7332 (↑21.83%)	0.8121 (↑20.34%)	0.9655 (↑19.39%)	0.9151 (↑23.53%)	0.8946 (↑20.88%)
DCKD-RF	0.7199	0.4192	0.3455	0.3058	0.3569	0.3375	0.7199	0.4651	0.4329
bSES-AC-RUN-FKNN	0.7106	0.4670	0.3995	0.3086	0.3972	0.3902	0.7106	0.4855	0.4384
KGRec	0.8087	0.7608	0.6786	0.2910	0.4804	0.5057	0.8087	0.6544	0.6017
PyRec	0.7948	0.7018	0.6537	0.3510	0.6018	0.6748	0.7948	0.7408	0.7401
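The ↑ percentages in Table 4 can be checked directly from the table. The sketch below assumes each percentage is the relative gain of REKG-MDP over the best baseline for that metric, which is how the abstract describes it. The scores are copied from Table 4, while the variable and function names are hypothetical.

```python
METRICS = ["P@1", "P@3", "P@5", "F1@1", "F1@3", "F1@5",
           "NDCG@1", "NDCG@3", "NDCG@5"]

# Scores copied from Table 4.
PROPOSED = [0.9655, 0.8879, 0.8280, 0.4200, 0.7332, 0.8121,
            0.9655, 0.9151, 0.8946]
BASELINES = {
    "DCKD-RF": [0.7199, 0.4192, 0.3455, 0.3058, 0.3569, 0.3375,
                0.7199, 0.4651, 0.4329],
    "bSES-AC-RUN-FKNN": [0.7106, 0.4670, 0.3995, 0.3086, 0.3972, 0.3902,
                         0.7106, 0.4855, 0.4384],
    "KGRec": [0.8087, 0.7608, 0.6786, 0.2910, 0.4804, 0.5057,
              0.8087, 0.6544, 0.6017],
    "PyRec": [0.7948, 0.7018, 0.6537, 0.3510, 0.6018, 0.6748,
              0.7948, 0.7408, 0.7401],
}
# Improvements as printed in Table 4, in percent.
REPORTED = [19.39, 16.71, 22.01, 19.67, 21.83, 20.34, 19.39, 23.53, 20.88]

def gains_over_best_baseline():
    """For each metric, find the strongest baseline and the relative gain in percent."""
    rows = []
    for i, metric in enumerate(METRICS):
        best_name, best_scores = max(BASELINES.items(), key=lambda kv: kv[1][i])
        gain = (PROPOSED[i] - best_scores[i]) / best_scores[i] * 100
        rows.append((metric, best_name, gain))
    return rows

for (metric, best_name, gain), reported in zip(gains_over_best_baseline(), REPORTED):
    print(f"{metric:7s} best baseline: {best_name:6s} "
          f"computed: {gain:6.2f}%  reported: {reported:6.2f}%")
```

Recomputing from the four-decimal scores agrees with the printed percentages to within a few hundredths of a percentage point; the small residuals are what rounding of the table entries would produce. The strongest baseline is not the same model for every column: KGRec leads on P@n and NDCG@1, while PyRec leads on F1@n, NDCG@3, and NDCG@5.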

