Integrating Representation Learning and Knowledge Graph Reasoning for Diabetes and Complications Prediction

WANG Yuao; HUANG Yeqi; LI Qingyuan; LIU Yun; JING Shenqi; SHAN Tao; GUO Yongan

doi:10.11999/JEIT250798

WANG Yuao, HUANG Yeqi, LI Qingyuan, LIU Yun, JING Shenqi, SHAN Tao, GUO Yongan. Integrating Representation Learning and Knowledge Graph Reasoning for Diabetes and Complications Prediction[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT250798

doi: 10.11999/JEIT250798 cstr: 32379.14.JEIT250798

1. Jiangsu Key Laboratory of Intelligent Information Processing and Communication Technology, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
2. Department of Information, Jiangsu Province Hospital, Nanjing 210029, China

Funds: The National Key Research Program of China (2023YFC3605800), The Frontier Leading Technology Basic Research Program of Jiangsu Province (BK20202001), The Postgraduate Research & Practice Innovation Program of Jiangsu Province (SJCX24_0285)

Received Date: 2025-08-26
Revision Received Date: 2025-10-27

Available Online: 2025-11-04

Abstract

Objective: Diabetes mellitus and its complications are major global health challenges that cause severe morbidity, high healthcare costs, and reduced quality of life. Accurate joint prediction of these conditions is essential for early intervention but is hindered by data heterogeneity, sparsity, and complex inter-entity relationships. To address these challenges, a Representation Learning Enhanced Knowledge Graph-based Multi-Disease Prediction (REKG-MDP) model is proposed. Electronic Health Records (EHRs) are integrated with supplementary medical knowledge to construct a comprehensive Medical Knowledge Graph (MKG), and higher-order semantic reasoning combined with relation-aware representation learning is applied to capture complex dependencies and improve predictive accuracy across multiple diabetes-related conditions.

Methods: The REKG-MDP framework consists of three modules. First, an MKG is constructed by integrating structured EHR data from the MIMIC-IV dataset with external disease knowledge. Patient-side features include demographics, laboratory indices, and medical history, whereas disease-side attributes cover comorbidities, susceptible populations, etiological factors, and diagnostic criteria. This integration mitigates data sparsity and enriches semantic representation. Second, a relation-aware embedding module captures four relational patterns: symmetric, antisymmetric, inverse, and compositional. These patterns are used to optimize entity and relation embeddings for semantic reasoning. Third, a Hierarchical Attention-based Graph Convolutional Network (HA-GCN) aggregates multi-hop neighborhood information. Dynamic attention weights capture both local and global dependencies, and a bidirectional mechanism enhances the modeling of patient–disease interactions.
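
The four relational patterns named in the Methods can be illustrated with a rotation-based scoring function in the style of RotatE, where each relation is a vector of phase angles that rotates the head entity in the complex plane. This is a minimal sketch under that assumption, not the REKG-MDP embedding module itself, whose scoring function the abstract does not specify. The helper names (`rotate`, `misfit`, `compose`, `invert`) and all toy values are hypothetical.

```python
import cmath
import math

def rotate(entity, phases):
    """Rotate each complex coordinate of `entity` by the matching phase angle."""
    return [x * cmath.exp(1j * p) for x, p in zip(entity, phases)]

def misfit(head, phases, tail):
    """L1 distance between the rotated head and the tail (0 means a perfect fit)."""
    return sum(abs(h - t) for h, t in zip(rotate(head, phases), tail))

def compose(first, second):
    """Applying two rotations in sequence adds their phase angles."""
    return [a + b for a, b in zip(first, second)]

def invert(phases):
    """The inverse relation rotates by the negated phase angles."""
    return [-p for p in phases]

# Toy 2-dimensional complex embedding of a head entity (hypothetical values).
head = [1 + 0j, 0 + 1j]

# Symmetric pattern: phases of 0 or pi, so the same relation also maps tail back to head.
symmetric = [math.pi, 0.0]
sym_tail = rotate(head, symmetric)
symmetric_reverse = misfit(sym_tail, symmetric, head)

# Antisymmetric pattern: generic phases fit head -> tail but not tail -> head.
generic = [math.pi / 2, math.pi / 3]
tail = rotate(head, generic)
forward = misfit(head, generic, tail)
reverse = misfit(tail, generic, head)

# Inverse pattern: negated phases map the tail back to the head.
inverse = misfit(tail, invert(generic), head)

# Compositional pattern: two hops equal one hop with summed phases.
r1, r2 = [0.3, 1.1], [0.5, -0.4]
two_hop_tail = rotate(rotate(head, r1), r2)
composed = misfit(head, compose(r1, r2), two_hop_tail)

for name, value in [("symmetric reverse", symmetric_reverse), ("forward", forward),
                    ("reverse", reverse), ("inverse", inverse), ("composed", composed)]:
    print(f"{name}: {value:.6f}")
```

In this sketch every misfit is numerically zero except `reverse`, which shows that a relation with generic phase angles cannot hold in both directions at once.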

Results and Discussions: Experiments demonstrate that REKG-MDP consistently outperforms four baselines: two machine learning models (DCKD-RF and bSES-AC-RUN-FKNN) and two graph-based models (KGRec and PyRec). Compared with the strongest baseline, REKG-MDP achieves average improvements in P, F1, and NDCG of 19.39%, 19.67%, and 19.39% for single-disease prediction ($n=1$); 16.71%, 21.83%, and 23.53% for $n=3$; and 22.01%, 20.34%, and 20.88% for $n=5$ (Table 4). Ablation studies confirm the contribution of each module. Removing relation-pattern modeling reduces performance metrics by approximately 12%, removing hierarchical attention decreases them by 5–6%, and excluding disease-side knowledge produces the largest decline of up to 20% (Fig. 5). Sensitivity analysis indicates that increasing the embedding dimension from 32 to 128 enhances performance by more than 11%, whereas excessive dimensionality (256) leads to over-smoothing (Fig. 6). Adjusting the $\beta$ parameter strengthens sample discrimination, improving P, F1, and NDCG by 9.28%, 27.9%, and 8.08%, respectively (Fig. 7).

Conclusions: REKG-MDP integrates representation learning with knowledge graph reasoning to enable multi-disease prediction. The main contributions are as follows: (1) integrating heterogeneous EHR data with disease knowledge mitigates data sparsity and enhances semantic representation; (2) modeling diverse relational patterns and applying hierarchical attention improves the capture of higher-order dependencies; and (3) extensive experiments confirm the model’s superiority over state-of-the-art baselines, with ablation and sensitivity analyses validating the contribution of each module. Remaining challenges include managing extremely sparse data and ensuring generalization across broader populations. Future research will extend REKG-MDP to model temporal disease progression and additional chronic conditions.

Key words:
- Joint prediction of multiple diseases
- Representation learning
- Medical Knowledge Graph (MKG)
- Graph neural network
- Attention mechanism
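
The abstract reports P, F1, and NDCG at cutoffs $n=1$, $n=3$, and $n=5$. The sketch below shows one common way to compute such top-n metrics for a single patient from a ranked list of predicted diseases, assuming binary relevance and precision defined as hits divided by n. The paper's exact definitions may differ, and the function names and the example ranking are hypothetical.

```python
import math

def precision_at_n(ranked, relevant, n):
    """Fraction of the top-n predictions that are true diagnoses."""
    return sum(1 for d in ranked[:n] if d in relevant) / n

def recall_at_n(ranked, relevant, n):
    """Fraction of the true diagnoses recovered in the top-n predictions."""
    if not relevant:
        return 0.0
    return sum(1 for d in ranked[:n] if d in relevant) / len(relevant)

def f1_at_n(ranked, relevant, n):
    """Harmonic mean of precision and recall at cutoff n."""
    p = precision_at_n(ranked, relevant, n)
    r = recall_at_n(ranked, relevant, n)
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)

def ndcg_at_n(ranked, relevant, n):
    """Binary-relevance NDCG: discounted gain divided by the best achievable gain."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, d in enumerate(ranked[:n]) if d in relevant)
    ideal_hits = min(len(relevant), n)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical ranking for one patient and that patient's true diagnoses.
ranked = ["type 2 diabetes", "hypertension", "diabetic nephropathy",
          "diabetic retinopathy", "asthma"]
relevant = {"type 2 diabetes", "diabetic nephropathy"}

for n in (1, 3, 5):
    print(n,
          round(precision_at_n(ranked, relevant, n), 4),
          round(f1_at_n(ranked, relevant, n), 4),
          round(ndcg_at_n(ranked, relevant, n), 4))
```

Averaging these per-patient values over a test set would give dataset-level scores comparable in form, though not necessarily in definition, to those in the abstract.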
