Citation: ZHANG Yangzi, XU Ting, GAO Zhaoya, SI Zhenduo, XU Weiran. Comparison of DeepSeek-V3.1 and ChatGPT-5 in Multidisciplinary Team Decision-making for Colorectal Liver Metastases[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT250849

Comparison of DeepSeek-V3.1 and ChatGPT-5 in Multidisciplinary Team Decision-making for Colorectal Liver Metastases

doi: 10.11999/JEIT250849 cstr: 32379.14.JEIT250849
Funds:  The Ministry of Education Industry-University Cooperative Education Project (2506193449); The National Key Laboratory of Virtual Reality Technology and Systems at Beihang University (VRLAB2025C15); Beijing Tiantan Hospital Institutional Research Fund (TYGL202402)
  • Received Date: 2025-09-01
  • Revised Date: 2025-11-06
  • Accepted Date: 2025-11-12
  • Available Online: 2025-11-18
Objective: Colorectal cancer (CRC) is the third most commonly diagnosed malignancy worldwide. Approximately 25–50% of patients with CRC develop liver metastases during the course of their disease, which increases the disease burden. Although the multidisciplinary team (MDT) model improves survival in colorectal liver metastases (CRLM), its broader implementation is limited by delayed knowledge updates and regional differences in medical standards. Large language models (LLMs) can integrate multimodal data, clinical guidelines, and recent research findings, and can generate structured diagnostic and therapeutic recommendations. These features suggest potential to support MDT-based care. However, the actual effectiveness of LLMs in MDT decision-making for CRLM has not been systematically evaluated. This study assesses the performance of DeepSeek-V3.1 and ChatGPT-5 in supporting MDT decisions for CRLM and examines the consistency of their recommendations with MDT expert consensus. The findings provide evidence-based guidance and identify directions for optimizing LLM applications in clinical practice.

Methods: Six representative virtual CRLM cases are designed to capture key clinical dimensions, including colorectal tumor recurrence risk, resectability of liver metastases, genetic mutation profiles (e.g., KRAS/BRAF mutations, HER2 amplification status, and microsatellite instability), and patient functional status. Using a structured prompt strategy, MDT treatment recommendations are generated separately by DeepSeek-V3.1 and ChatGPT-5. Independent evaluations are conducted by four MDT specialists from gastrointestinal oncology, gastrointestinal surgery, hepatobiliary surgery, and radiation oncology. The model outputs are scored on a 5-point Likert scale across seven dimensions: accuracy, comprehensiveness, frontier relevance, clarity, individualization, hallucination risk, and ethical safety. Statistical analysis is performed to compare the performance of DeepSeek-V3.1 and ChatGPT-5 across individual cases, evaluation dimensions, and clinical disciplines.

Results and Discussions: Both DeepSeek-V3.1 and ChatGPT-5 show robust performance across all six virtual CRLM cases, with an average overall score of ≥ 4.0 on a 5-point scale, indicating that they provide clinically acceptable decision support within a complex MDT framework. DeepSeek-V3.1 shows superior overall performance compared with ChatGPT-5 (4.27±0.77 vs. 4.08±0.86, P=0.03). Case-by-case analysis shows that DeepSeek-V3.1 performs significantly better in Cases 1, 4, and 6 (P=0.04, P<0.01, and P=0.01, respectively), whereas ChatGPT-5 receives higher scores in Case 2 (P<0.01). No significant differences are observed in Cases 3 and 5 (P=0.12 and P=1.00, respectively), suggesting complementary strengths across clinical scenarios (Table 3).

In the multidimensional assessment, both models receive high scores (range: 4.12–4.87) in clarity, individualization, hallucination risk, and ethical safety, indicating that their recommendations are readable, patient-tailored, reliable, and ethically sound. Improvements are still needed in accuracy, comprehensiveness, and frontier relevance (Fig. 1). DeepSeek-V3.1 shows a significant advantage in frontier relevance (3.90±0.65 vs. 3.42±0.72, P=0.03) and ethical safety (4.87±0.34 vs. 4.58±0.65, P=0.03) (Table 4), indicating more effective incorporation of recent evidence and more consistent delivery of ethically robust guidance. For the case with concomitant BRAF V600E and KRAS G12D mutations, DeepSeek-V3.1 accurately references a phase III randomized controlled study published in the New England Journal of Medicine in 2025 and recommends a triple regimen consisting of a BRAF inhibitor + EGFR monoclonal antibody + FOLFOX. By contrast, ChatGPT-5 follows the conventional recommendation for RAS/BRAF-mutant populations (FOLFOXIRI + bevacizumab) without integrating recent evidence on targeted combination therapy. This difference shows the effect of timely knowledge updates on the clinical value of LLM-generated recommendations. For MSI-H CRLM, ChatGPT-5's recommendation of "postoperative immunotherapy" is not supported by phase III evidence or existing guidelines. Direct use of such recommendations may lead to overtreatment or ineffective therapy, which raises a clear ethical concern and illustrates the hallucination risk of LLMs.

Discipline-specific analysis shows notable variation. In radiation oncology, DeepSeek-V3.1 provides significantly more precise guidance on treatment timing, dosage, and techniques than ChatGPT-5 (4.55±0.67 vs. 3.38±0.91, P<0.01), demonstrating closer alignment with clinical guidelines. In contrast, ChatGPT-5 performs better in gastrointestinal surgery (4.48±0.67 vs. 4.17±0.85, P=0.02), with experts rating its recommendations on surgical timing and resectability as more concise and accurate. No significant differences are identified in gastrointestinal oncology and hepatobiliary surgery (P=0.89 and P=0.14, respectively), indicating comparable performance in these areas (Table 5). These findings show a performance bias across medical sub-specialties, suggesting that LLM effectiveness depends on the distribution and quality of training data.

Conclusions: Both DeepSeek-V3.1 and ChatGPT-5 demonstrate strong capabilities in providing reliable recommendations for CRLM-MDT decision-making. DeepSeek-V3.1 shows notable advantages in integrating cutting-edge knowledge, ensuring ethical safety, and supporting radiation oncology, whereas ChatGPT-5 excels in gastrointestinal surgery, reflecting complementary strengths of the two models. This study confirms the feasibility of using LLMs as "MDT collaborators", offering a readily applicable technical approach to bridging regional disparities in clinical expertise and improving the efficiency of decision-making. However, model hallucination and insufficient evidence grading remain key limitations. Moving forward, mechanisms such as real-world clinical validation, evidence traceability, and reinforcement learning from human feedback are expected to further develop LLMs into more powerful auxiliary tools for CRLM-MDT decision support.
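
The scoring scheme described in the Methods (5-point Likert ratings summarized as mean ± SD and compared between the two models) can be illustrated with a short sketch. The abstract does not name the statistical test that was used, so the exact paired sign-flip permutation test below is an assumption chosen only because it needs no external library; it is not presented as the authors' method. The function names and the ratings in the usage example are hypothetical placeholders, not data from the study.

```python
from itertools import product
from statistics import mean, stdev

# The seven evaluation dimensions named in the abstract.
DIMENSIONS = (
    "accuracy", "comprehensiveness", "frontier relevance", "clarity",
    "individualization", "hallucination risk", "ethical safety",
)


def summarize(scores):
    """Return (mean, sample SD) for a list of 1-5 Likert ratings."""
    for s in scores:
        if s not in (1, 2, 3, 4, 5):
            raise ValueError(f"not a 5-point Likert rating: {s!r}")
    return mean(scores), stdev(scores)


def sign_flip_p_value(a, b):
    """Exact two-sided paired sign-flip permutation test.

    Assumption: the study's actual test is not stated in the abstract;
    this test is used here purely for illustration. `a` and `b` are
    paired ratings (same rater, same case, same dimension).
    """
    if len(a) != len(b):
        raise ValueError("paired samples must have equal length")
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    # Enumerate every way of flipping the sign of each paired difference.
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed:
            extreme += 1
    return extreme / total


if __name__ == "__main__":
    # Hypothetical paired ratings for one dimension (NOT study data).
    model_a = [5, 4, 4, 5, 4, 5, 4, 4]
    model_b = [4, 4, 3, 4, 4, 4, 3, 4]
    for name, ratings in (("model A", model_a), ("model B", model_b)):
        m, sd = summarize(ratings)
        print(f"{name}: {m:.2f}±{sd:.2f}")
    print(f"P={sign_flip_p_value(model_a, model_b):.4f}")
```

The enumeration is exponential in the number of pairs, so this form is only practical for small samples; for larger rating sets a standard statistics package would be the usual choice.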
  • [1]
    SUNG H, FERLAY J, SIEGEL R L, et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries[J]. CA: A Cancer Journal for Clinicians, 2021, 71(3): 209–249. doi: 10.3322/caac.21660.
    [2]
    MARTIN J, PETRILLO A, SMYTH E C, et al. Colorectal liver metastases: Current management and future perspectives[J]. World Journal of Clinical Oncology, 2020, 11(10): 761–808. doi: 10.5306/wjco.v11.i10.761.
    [3]
    REBOUX N, JOOSTE V, GOUNGOUNGA J, et al. Incidence and survival in synchronous and metachronous liver metastases from colorectal cancer[J]. JAMA Network Open, 2022, 5(10): e2236666. doi: 10.1001/jamanetworkopen.2022.36666.
    [4]
    VALDERRAMA-TREVIÑO A I, BARRERA-MERA B, CEBALLOS-VILLALVA J C, et al. Hepatic metastasis from colorectal cancer[J]. Euroasian Journal of Hepato-Gastroenterology, 2017, 7(2): 166–175. doi: 10.5005/jp-journals-10018-1241.
    [5]
    ZEINEDDINE F A, ZEINEDDINE M A, YOUSEF A, et al. Survival improvement for patients with metastatic colorectal cancer over twenty years[J]. npj Precision Oncology, 2023, 7(1): 16. doi: 10.1038/s41698-023-00353-4.
    [6]
    LAN Y T, JIANG J K, CHANG S C, et al. Improved outcomes of colorectal cancer patients with liver metastases in the era of the multidisciplinary teams[J]. International Journal of Colorectal Disease, 2016, 31(2): 403–411. doi: 10.1007/s00384-015-2459-4.
    [7]
    TOPOL E J. High-performance medicine: The convergence of human and artificial intelligence[J]. Nature Medicine, 2019, 25(1): 44–56. doi: 10.1038/s41591-018-0300-7.
    [8]
    THIRUNAVUKARASU A J, TING D S J, ELANGOVAN K, et al. Large language models in medicine[J]. Nature Medicine, 2023, 29(8): 1930–1940. doi: 10.1038/s41591-023-02448-8.
    [9]
    SINGHAL K, AZIZI S, TU Tao, et al. Large language models encode clinical knowledge[J]. Nature, 2023, 620(7972): 172–180. doi: 10.1038/s41586-023-06291-2.
    [10]
    MESKÓ B and GÖRÖG M. A short guide for medical professionals in the era of artificial intelligence[J]. npj Digital Medicine, 2020, 3: 126. doi: 10.1038/s41746-020-00333-z.
    [11]
    PARK Y E and CHAE H. The fidelity of artificial intelligence to multidisciplinary tumor board recommendations for patients with gastric cancer: A retrospective study[J]. Journal of Gastrointestinal Cancer, 2024, 55(1): 365–372. doi: 10.1007/s12029-023-00967-8.
    [12]
    HORESH N, EMILE S H, GUPTA S, et al. Comparing the management recommendations of large language model and colorectal cancer multidisciplinary team: A pilot study[J]. Diseases of the Colon & Rectum, 2025, 68(1): 41–47. doi: 10.1097/DCR.0000000000003504.
    [13]
    CHOO J M, RYU H S, KIM J S, et al. Conversational artificial intelligence (chatGPT™) in the management of complex colorectal cancer patients: Early experience[J]. ANZ Journal of Surgery, 2024, 94(3): 356–361. doi: 10.1111/ans.18749.
    [14]
    UMIHANIC S, OSMANOVIC H, SELAK N, et al. Evaluating the concordance between ChatGPT and multidisciplinary teams in breast cancer treatment planning: A study from Bosnia and Herzegovina[J]. Journal of Clinical Medicine, 2025, 14(18): 6460. doi: 10.3390/jcm14186460.
    [15]
    AMMO T, GUILLAUME V G J, HOFMANN U K, et al. Evaluating ChatGPT-4o as a decision support tool in multidisciplinary sarcoma tumor boards: Heterogeneous performance across various specialties[J]. Frontiers in Oncology, 2025, 14: 1526288. doi: 10.3389/fonc.2024.1526288.
    [16]
    LEE J T, LI V C S, WU J J, et al. Evaluation of performance of generative large language models for stroke care[J]. npj Digital Medicine, 2025, 8(1): 481. doi: 10.1038/s41746-025-01830-9.
    [17]
    ELEZ E, YOSHINO T, SHEN Lin, et al. Encorafenib, cetuximab, and mFOLFOX6 in BRAF-mutated colorectal cancer[J]. New England Journal of Medicine, 2025, 392(24): 2425–2437. doi: 10.1056/NEJMoa2501912.
    [18]
    OMAR M, SORIN V, COLLINS J D, et al. Multi-model assurance analysis showing large language models are highly vulnerable to adversarial hallucination attacks during clinical decision support[J]. Communications Medicine, 2025, 5(1): 330. doi: 10.1038/s43856-025-01021-3.
    [19]
    AHN S. A guide to evade hallucinations and maintain reliability when using large language models for medical research: A narrative review[J]. Annals of Pediatric Endocrinology & Metabolism, 2025, 30(3): 115–118. doi: 10.6065/apem.2448278.139.
    [20]
    LIU Jiaxi. ChatGPT: Perspectives from human-computer interaction and psychology[J]. Frontiers in Artificial Intelligence, 2024, 7: 1418869. doi: 10.3389/frai.2024.1418869.
    [21]
    RAJARAM A, LI H, HOLODINSKY J K, et al. Opening the black box: Challenges and opportunities regarding interpretability of artificial intelligence in emergency medicine[J]. Canadian Journal of Emergency Medicine, 2025, 27(2): 83–86. doi: 10.1007/s43678-024-00827-9.
    Figures(1)  / Tables(5)
