Evaluation of Domestic Large Language Models as Educational Tools for Cancer Patients

ZHANG Junli; XU Weiran; WANG Zhao

doi:10.11999/JEIT251056

ZHANG Junli, XU Weiran, WANG Zhao. Evaluation of Domestic Large Language Models as Educational Tools for Cancer Patients[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT251056

doi: 10.11999/JEIT251056 cstr: 32379.14.JEIT251056

1. Department of Oncology, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, China
2. Department of Cardiology, Peking University Third Hospital, Beijing 100191, China

Funds: (1) the National Key Laboratory of Virtual Reality Technology and Systems at Beihang University (Open Project No. VRLAB2025C15); (2) the Beijing Tiantan Hospital Institutional Research Fund (Management Project, TYGL202402)

Accepted Date: 2026-01-22
Revision Received Date: 2026-01-22

Available Online: 2026-02-11

Abstract

Objective: With cancer incidence and mortality rising rapidly worldwide, patient education has become a critical strategy for reducing disease burden and improving patient outcomes. However, traditional education methods, such as paper-based materials and face-to-face consultations, are constrained by time, location, and limited personalization. The emergence of large language models (LLMs) has opened new opportunities for delivering intelligent, scalable, and personalized health education. Although domestic (Chinese-developed) LLMs such as Doubao, Kimi, and DeepSeek are widely used in general scenarios, their utility in oncology education remains underexplored. This study aimed to systematically evaluate the performance of these three domestic LLMs in cancer patient education across multiple dimensions, providing empirical evidence for their potential clinical application and optimization.

Methods: Frequently asked patient education questions were collected through group discussions with oncology nurses from a tertiary hospital. Nineteen oncology nurses with ≥1 year of clinical experience participated in item selection, and the ten most common questions were chosen, covering diet, nutrition, treatment, adverse drug reactions, and prognosis. Each question was entered independently into Doubao (Pro, ByteDance, May 2024), Kimi (V1.1, Moonshot AI, Nov 2023), and DeepSeek (R1, DeepSeek AI, Jan 2025), each time in a “new chat” session to avoid contextual interference. Responses were standardized to remove model identifiers and then randomly coded. Quality evaluation followed a blinded design: thirteen inpatients with cancer assessed responses for readability and effectiveness, while six senior oncologists rated them for accuracy, comprehensiveness, and professionalism. Each dimension was scored on a self-designed five-point Likert scale. Statistical analyses were conducted using GraphPad Prism 9.5.1. One-way ANOVA with Bonferroni correction was applied for dimension-level comparisons, while Welch’s ANOVA with Games-Howell post hoc tests was used for the overall score analysis. Results were presented in tables and radar plots.

Results and Discussions: Overall, the three models achieved mean total scores of 4.05±0.687 (Doubao), 4.17±0.791 (Kimi), and 4.19±0.640 (DeepSeek). Welch’s ANOVA showed significant overall differences (F=5.537, P=0.004). Games-Howell analysis revealed that Doubao performed significantly worse than Kimi and DeepSeek (P=0.005 and 0.042, respectively), while Kimi and DeepSeek did not differ significantly (P=0.975). From the patient perspective, Kimi outperformed its peers, achieving the highest scores in readability (4.615±0.534) and effectiveness (4.476±0.560), with statistically significant differences (P<0.05). Patients rated Kimi’s responses to lifestyle-related queries, such as managing nausea or loss of appetite during chemotherapy, as particularly clear and actionable. From the expert perspective, DeepSeek scored highest in accuracy (4.117±0.846), comprehensiveness (4.100±0.681), and professionalism (3.917±0.645), with significant advantages over Kimi (P<0.01) and a more moderate advantage over Doubao (P<0.05). DeepSeek was favored for technical and evidence-based questions, such as drug metabolism or the evaluation of integrative therapies. The divergence between patient and expert assessments highlighted a mismatch: the “most understandable” responses (Kimi) were not always the “most professional” (DeepSeek). Because the two models' strengths are complementary, future research should explore layered output formats or dual verification mechanisms that balance readability with professional rigor, minimizing the risk of misinformation while improving accessibility. Despite these promising findings, the study has limitations. It was a single-center study with a relatively small sample, and only patients with lung and breast cancer were included. The evaluation simulated static Q&A interactions rather than the dynamic multi-turn dialogues that are more representative of real-world consultations. Additionally, technical enhancements such as retrieval-augmented generation (RAG), fine-tuning on oncology-specific corpora, and multi-agent collaboration were not implemented. Future studies should expand to multi-center designs, more diverse cancer populations, and advanced LLM optimization methods.

Conclusions: Domestic LLMs demonstrated considerable potential as tools for cancer patient education. Kimi excelled in communication and patient-centered knowledge translation, while DeepSeek showed strength in professional accuracy and comprehensiveness. Doubao, although moderate across all dimensions, lagged behind in overall performance. The results indicate that LLMs can complement traditional health education by bridging the gap between patient comprehension and clinical expertise.
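
The Methods state that responses were stripped of model identifiers and randomly coded before blinded rating. The abstract does not describe how this was done, so the following is only a minimal Python sketch of one way such a step could work. The function names, the `[assistant]` placeholder, the `R001`-style codes, and the sample texts are all hypothetical, not the authors' procedure.

```python
import random
import re

# Model names taken from the abstract; everything else here is illustrative.
MODEL_NAMES = ("Doubao", "Kimi", "DeepSeek")
_NAME_PATTERN = re.compile("|".join(map(re.escape, MODEL_NAMES)), re.IGNORECASE)


def strip_identifiers(text):
    """Replace any mention of a model name with a neutral placeholder."""
    return _NAME_PATTERN.sub("[assistant]", text)


def blind_responses(responses, seed=None):
    """Shuffle (model, question_id, text) records and assign neutral codes.

    Returns (blinded, key). `blinded` is what raters would see; `key` maps
    each code back to its model and should stay hidden until rating is done.
    """
    rng = random.Random(seed)  # a fixed seed makes the coding reproducible
    shuffled = list(responses)
    rng.shuffle(shuffled)
    blinded, key = [], {}
    for i, (model, question_id, text) in enumerate(shuffled, start=1):
        code = f"R{i:03d}"
        blinded.append(
            {"code": code, "question": question_id, "text": strip_identifiers(text)}
        )
        key[code] = model
    return blinded, key


# Placeholder records, not real model output.
sample = [
    ("Doubao", "Q1", "Doubao placeholder answer to question 1."),
    ("Kimi", "Q1", "Placeholder answer from Kimi to question 1."),
    ("DeepSeek", "Q1", "Placeholder answer (DeepSeek) to question 1."),
]
blinded_sample, unblinding_key = blind_responses(sample, seed=2024)
```

Keeping the unblinding key separate from the rating sheets is the design point: raters only ever see the code, the question, and the scrubbed text.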
Keywords:
- Large Language Models (LLMs)
- Cancer
- Patient education
- Readability
- Professional accuracy
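
The overall-score comparison in the abstract relies on Welch's ANOVA, which does not assume equal variances across groups. The study ran it in GraphPad Prism, so the sketch below is not the authors' code. It shows the standard Welch F statistic and its degrees of freedom using only Python's standard library. The function name and the ratings are hypothetical, and the Games-Howell post hoc step is not implemented.

```python
from statistics import mean, variance


def welch_anova(groups):
    """Return (F, df1, df2) for Welch's one-way ANOVA.

    `groups` is a sequence of score lists, one per model. Each group needs
    at least two scores and a non-zero sample variance.
    """
    k = len(groups)
    if k < 2:
        raise ValueError("need at least two groups")
    sizes = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    variances = [variance(g) for g in groups]  # sample variance (n - 1)

    # Each group is weighted by the inverse of its squared standard error.
    weights = [n / v for n, v in zip(sizes, variances)]
    total_weight = sum(weights)
    weighted_mean = sum(w * m for w, m in zip(weights, means)) / total_weight

    between = sum(
        w * (m - weighted_mean) ** 2 for w, m in zip(weights, means)
    ) / (k - 1)
    lam = sum(
        (1 - w / total_weight) ** 2 / (n - 1) for w, n in zip(weights, sizes)
    )

    f_stat = between / (1 + 2 * (k - 2) / (k ** 2 - 1) * lam)
    df1 = k - 1
    df2 = (k ** 2 - 1) / (3 * lam)
    return f_stat, df1, df2


# Hypothetical five-point Likert ratings, NOT the study's data.
hypothetical_ratings = {
    "model_A": [4, 4, 5, 3, 4, 5],
    "model_B": [5, 4, 5, 5, 4, 3],
    "model_C": [4, 5, 4, 4, 5, 5],
}
f_stat, df1, df2 = welch_anova(list(hypothetical_ratings.values()))
# A P value would come from the F distribution's survival function,
# for example scipy.stats.f.sf(f_stat, df1, df2) if SciPy is available.
```

The denominator degrees of freedom `df2` is usually fractional, which is why reported Welch results often show non-integer values there.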
