LU Jiafa, TANG Kai, ZHANG Guoming, YU Xiaofan, GU Wenqi, LI Zhuo. A Study on Lightweight Method of TCM Structured Large Model Based on Memory-Constrained Pruning[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT250909

A Study on Lightweight Method of TCM Structured Large Model Based on Memory-Constrained Pruning

doi: 10.11999/JEIT250909 cstr: 32379.14.JEIT250909
Funds: The National Science and Technology Major Project (SQ2024AAA030211)
  • Received Date: 2025-09-15
  • Accepted Date: 2026-01-22
  • Rev Recd Date: 2026-01-22
  • Available Online: 2026-02-11
Objective  The structuring of Traditional Chinese Medicine (TCM) Electronic Medical Records (EMRs) is essential for knowledge discovery, clinical decision support, and intelligent diagnosis. However, two major barriers remain. First, TCM EMRs are primarily unstructured free text, often paired with tongue images, which complicates automated processing. Second, grassroots hospitals usually have limited GPU resources, which restricts the deployment of large pretrained models. This study addresses these challenges by proposing a lightweight multimodal model based on memory-constrained pruning. The method is designed to preserve near–state-of-the-art accuracy while sharply reducing memory consumption and computation cost, ensuring practical use in resource-limited healthcare settings.

Methods  A three-stage architecture is used, comprising an encoder, a multimodal fusion module, and a decoder. For text, a distilled TinyBERT encoder is combined with a BiLSTM-CRF decoder to extract 23 categories of TCM clinical entities, including symptoms, syndromes, prescriptions, and herbs. For images, a ResNet-50 encoder processes tongue diagnosis photographs. A memory-constrained pruning strategy is introduced in which an LSTM decision network observes convolutional feature maps and adaptively prunes redundant channels while retaining key diagnostic information. Gradient reparameterization and dynamic channel grouping improve pruning flexibility, and a reinforcement-learning controller stabilizes training. INT8 mixed-precision quantization, gradient accumulation, and Dynamic Batch Pruning (DBP) further reduce memory usage. A TCM terminology-enhanced lexicon is integrated into the encoder embeddings to improve recognition of rare entities. The system is trained end-to-end on paired EMR–tongue datasets (Fig. 1) to optimize multimodal information flow.
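The core idea of pruning under a hard memory budget can be illustrated with a minimal sketch. Note that the paper's LSTM decision network and reinforcement-learning controller are simplified here to a plain score-based heuristic; the function name, the per-channel cost model, and the example scores are illustrative assumptions, not the authors' implementation.

```python
def prune_to_memory_budget(importance, bytes_per_channel, budget_bytes):
    """Greedily drop the least-important channels until the estimated
    feature-map memory fits the budget; returns a boolean keep-mask.

    `importance` holds one score per channel (e.g. mean |activation|).
    The paper's adaptive LSTM decision network is replaced by this
    simple heuristic purely for illustration.
    """
    n = len(importance)
    keep = [True] * n
    # visit channels from least to most important
    order = sorted(range(n), key=lambda i: importance[i])
    used = bytes_per_channel * n  # estimated activation memory if all channels kept
    for i in order:
        if used <= budget_bytes:   # budget satisfied: stop pruning
            break
        keep[i] = False
        used -= bytes_per_channel
    return keep

# Example: 8 channels at 1 MiB of activations each, 5 MiB budget
scores = [0.9, 0.1, 0.4, 0.8, 0.05, 0.6, 0.3, 0.7]
mask = prune_to_memory_budget(scores, 1 << 20, 5 << 20)
# the three lowest-scoring channels (indices 4, 1, 6) are pruned
```

A real implementation would re-estimate importance per batch and let the controller trade accuracy against memory, but the stopping condition (estimated memory at or below the hardware budget) is the same constraint the method enforces.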
Results and Discussions  Experiments are performed on 10,500 de-identified EMRs paired with tongue images from 21 tertiary hospitals. On an RTX 3060 GPU, the model achieves an F1-score of 91.7%, reduces peak GPU memory to 3.8 GB, and reaches an inference speed of 22 records per second (Table 1). Compared with BERT-Large, memory consumption decreases by 75%, throughput increases 2.7×, and accuracy remains comparable. Ablation studies confirm the contributions of each component. The adaptive attention gating mechanism increases F1 by 2.8% (Table 2). DBP reduces memory usage by 40%–62% with minimal accuracy loss and improves performance on EMRs exceeding 5,000 characters (Fig. 2). The terminology-enhanced lexicon improves recognition of rare entities such as “blood stasis” by 6.2%. Structured EMR fields also support association rule mining, and the confidence of syndrome–symptom relationships increases by 18% (Algorithm 1). These findings highlight three observations: (1) multimodal fusion with lightweight design provides clinical advantages over unimodal models; (2) memory-constrained pruning achieves stable channel reduction under strict hardware limits and outperforms magnitude-based pruning; and (3) pruning, quantization, and dynamic batching show strong synergy when jointly designed. The results support the feasibility of deploying high-performing TCM EMR structuring systems in real-world environments with limited computational capacity.

Conclusions  This study proposes a lightweight multimodal framework for structuring TCM EMRs. Memory-constrained pruning, combined with quantization and DBP, substantially compresses the visual encoder while maintaining text–image fusion accuracy. The approach reaches near–state-of-the-art performance with sharply reduced hardware requirements, enabling deployment in regional hospitals and clinics.
Beyond efficiency gains, the structured multimodal outputs enhance TCM knowledge graphs and improve downstream tasks such as syndrome classification and treatment recommendation. The framework narrows the gap between powerful pretrained models and limited hardware resources in grassroots institutions and provides a scalable direction for lightweight multimodal NLP in medical informatics. Future work includes integrating modalities such as pulse-wave signals, extending pruning strategies with graph neural networks, and exploring adaptive cross-modal attention to strengthen clinical applicability.
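The Dynamic Batch Pruning idea described above, keeping memory bounded even when some EMRs exceed 5,000 characters, can be sketched with a simple length-aware batching routine. The cost model (memory proportional to batch size times longest record) and all names below are illustrative assumptions, not the paper's algorithm.

```python
def dynamic_batches(record_lengths, bytes_per_char, budget_bytes):
    """Group record indices into batches whose estimated memory stays
    under budget_bytes. Hypothetical cost model (an assumption):
    memory ~= batch_size * longest_record_in_batch * bytes_per_char.
    Sorting by length keeps very long EMRs in small batches instead of
    shrinking the batch size globally.
    """
    order = sorted(range(len(record_lengths)), key=lambda i: record_lengths[i])
    batches, current = [], []
    for i in order:
        # ascending order, so record i is the longest in the candidate batch
        projected = (len(current) + 1) * record_lengths[i] * bytes_per_char
        if current and projected > budget_bytes:
            batches.append(current)
            current = []
        current.append(i)  # a single over-budget record still gets its own batch
    if current:
        batches.append(current)
    return batches

# Example: four short records and two long ones, 1 byte/char, 3,000-byte budget
lengths = [6000, 500, 800, 5200, 400, 700]
groups = dynamic_batches(lengths, 1, 3000)
# → [[4, 1, 5], [2], [3], [0]]: short records share a batch,
#   each long record runs alone
```

The same principle generalizes to token counts under a GPU activation budget, which is where the reported 40%–62% memory reduction would come from.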
    Figures(3)  / Tables(7)

    Article Metrics

    Article views (105) PDF downloads(2) Cited by()
    Proportional views
    Related

    /

    DownLoad:  Full-Size Img  PowerPoint
    Return
    Return