Hierarchical Fusion Multi-Instance Learning for Weakly Supervised Pathological Image Classification
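Summary: Pathological image classification is essential for cancer diagnosis, but existing methods depend on random sampling and make insufficient use of multilevel information. This paper proposes a hierarchical fusion multi-instance learning method to address these problems. First, pathological images are split into patches at different levels and features are extracted with ResNet-50. Second, to address inaccurate pseudo-bag labels and background noise, a pseudo-bag division method based on an attention evaluation function is proposed: gated attention evaluates the importance of the low-resolution features, the features are divided into low-resolution pseudo-bags according to their scores, and the corresponding high-resolution pseudo-bags are obtained through index mapping. Finally, to address the insufficient use of multilevel information, a two-stage classification model is designed. Stage 1 performs preliminary classification on the low-resolution pseudo-bags and uses prediction confidence to select highly discriminative key regions together with their corresponding high-resolution features. Stage 2 uses a cross-attention mechanism to fuse the selected low-resolution features with their corresponding high-resolution features, then concatenates the result with the high-resolution pseudo-bag features aggregated by gated attention, so that local detail and global information are analyzed together. Training uses a two-branch cross-entropy loss that jointly optimizes the low-resolution preliminary classification task and the high-resolution final classification task. The model is tested on two public datasets, Camelyon16 and TCGA-LUNG, and one private skin cancer dataset, NBU-Skin. The results show that the method outperforms algorithms such as CLAM and TransMIL on both the multi-center public datasets and the private dataset; on NBU-Skin, 5-fold cross-validation gives a mean accuracy of 90.5% and a mean AUC of 0.976. The method also performs stably on cross-disease and cross-center data, offering a new approach to AI-assisted diagnosis in cancer pathology.
Abstract: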
Objective: Cancer mortality in China continues to rise, and pathological image classification has become central to diagnosis. Pathological images have a multilevel structure, yet many existing methods focus only on the highest resolution or use simple feature concatenation for multi-scale fusion. These strategies do not make effective use of hierarchical information. In addition, most approaches rely on random pseudo-bag division to handle high-resolution images. Because cancerous regions in positive slides are sparse, random sampling often produces incorrect pseudo-labels and low signal-to-noise ratios, which reduce classification accuracy. This study proposes a Hierarchical Fusion Multi-Instance Learning (HFMIL) method that integrates multilevel feature fusion with a pseudo-bag division strategy based on an attention evaluation function to improve accuracy and interpretability in pathological image classification.
Methods: A weakly supervised multilevel classification method is proposed to use the hierarchical characteristics of pathological images and improve cancer image classification performance. The method has three main steps. First, multilevel features are extracted. Blank regions are removed, low-resolution images are divided into patches, and these patches are indexed to their corresponding high-resolution regions. Semantic features capture low-resolution tissue structure and high-resolution cellular detail. Second, pseudo-bags are constructed using an attention-based evaluation function. Class activation mapping is used to compute patch-level scores. Patches are ranked, and high-scoring ones are selected as potential positive samples. Low-scoring patches are discarded to maintain pseudo-label relevance. High-resolution pseudo-bags are then generated using index mapping, which reduces incorrect pseudo-labels and improves the signal-to-noise ratio. Third, a two-stage classification model is developed. Low-resolution pseudo-bags are aggregated with a gated attention mechanism for preliminary classification. A cross-attention mechanism then fuses the most informative low-resolution features with their corresponding high-resolution features. The fused representation is concatenated with aggregated high-resolution pseudo-bags to form an image-level feature vector for final prediction. Training uses a two-stage loss that combines low-resolution and overall cross-entropy losses. Experiments on three pathological image datasets confirm the effectiveness of the method in weakly supervised settings.
Results and Discussions: The proposed method is compared with several recent weakly supervised classification approaches, including ABMIL, CLAM, TransMIL, and DTFD, using three pathological image datasets: the publicly available Camelyon16 and TCGA-LUNG datasets and a private skin cancer dataset, NBU-Skin. The results show clear performance gains. On Camelyon16, the method achieves 88.3% accuracy and an AUC of 0.979 (Table 2). On TCGA-LUNG, accuracy reaches 86.0% and AUC 0.931 (Table 2), exceeding the comparative methods. On the NBU-Skin dataset, accuracy reaches 90.5% and AUC 0.976 for multiclass tasks (Table 2). Ablation studies further examine the necessity of the multilevel feature fusion and pseudo-bag division modules. The combination of these modules improves classification performance. On the skin cancer dataset, removing the pseudo-bag division module reduces accuracy from 93.8% to 90.7%, and removing the multilevel feature fusion module reduces accuracy further to 80.0% (Table 3). These results confirm that each component contributes to the effectiveness of the method.
Conclusions: A weakly supervised pathological image classification method that integrates multilevel feature fusion and an attention-based pseudo-bag division strategy is proposed. The method uses hierarchical information effectively and reduces errors caused by incorrect pseudo-labels and low signal-to-noise ratios. Experiments show consistent improvements in accuracy and AUC across three datasets. The main contributions are: (1) a multilevel feature extraction and fusion strategy that uses a cross-attention mechanism to combine features across scales; (2) an attention-based pseudo-bag division method that identifies potential positive regions and improves pseudo-label correctness through a top-k strategy while reducing background noise; and (3) superior performance compared with recent weakly supervised classifiers. Future work may include optimizing cross-level attention mechanisms, extending the framework to prognosis prediction or lesion segmentation, and developing more efficient feature extraction and fusion modules for broader clinical use.
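The second-stage fusion and the combined loss described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes single-head scaled dot-product attention without learned projections, low-resolution features acting as queries and high-resolution features as keys and values, mean pooling of the fused tokens, and equal weighting of the two cross-entropy terms; none of these choices is stated in the abstract. The function names (`cross_attention`, `fuse_and_concat`, `two_branch_loss`) are hypothetical, and `pooled_high` stands in for the gated-attention aggregate of a high-resolution pseudo-bag.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: every query attends over all keys.

    Assumption: identity projections; a trained model would learn Q/K/V maps.
    """
    scale = math.sqrt(len(queries[0]))
    fused = []
    for q in queries:
        logits = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(logits)
        fused.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return fused


def fuse_and_concat(low_selected, high_selected, pooled_high):
    """Fuse selected low-res features with their high-res counterparts,
    mean-pool the fused tokens (assumption), and concatenate the result
    with the pooled high-resolution pseudo-bag feature."""
    fused = cross_attention(low_selected, high_selected, high_selected)
    dim = len(fused[0])
    fused_mean = [sum(tok[j] for tok in fused) / len(fused) for j in range(dim)]
    return fused_mean + list(pooled_high)


def two_branch_loss(probs_low, probs_final, label):
    """Cross-entropy of the stage-1 prediction plus cross-entropy of the
    final prediction, with equal weights (assumption)."""
    return -math.log(probs_low[label]) - math.log(probs_final[label])


# Toy example: 2 selected low-res features, 3 high-res features, 3-dim pooled bag.
low = [[1.0, 0.0], [0.0, 1.0]]
high = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
slide_vector = fuse_and_concat(low, high, [0.5, 0.5, 0.5])
print(len(slide_vector))  # 5
```

Using the low-resolution features as queries means each coarse region retrieves the fine-grained detail most similar to it; swapping the roles would be an equally plausible reading of the abstract.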
Key words:
- Pathological image
- Multiple instance learning
- Deep learning
- Artificial intelligence
Algorithm 1 Multilevel corresponding feature extraction method
Input: $ N $ pathological images $ \{W_{i}\}_{i=1}^{N} $, where $ W_{i} $ denotes the $ i $-th pathological image and each image has $ m $ patches at low resolution; the pretrained network ResNet50
Output: the low-resolution feature sets $ \{\boldsymbol{X}_{i,\text{low}}\}_{i=1}^{N} $, where $ \boldsymbol{X}_{i,\text{low}}=\{x_{i,1},x_{i,2},\cdots ,x_{i,m}\} $; the high-resolution feature sets $ \{\boldsymbol{X}_{i,\text{high}}\}_{i=1}^{N} $, where $ \boldsymbol{X}_{i,\text{high}}=\{h_{i,1},h_{i,2},\cdots ,h_{i,16m}\} $
(1) for $ i\leftarrow 1 $ to $ N $ do
(2)     Extract the tissue contour with Otsu's method (maximum between-class variance) and remove blank regions
(3)     Split $ W_{i} $ into 256×256 patches to obtain the low-resolution coordinates $ \text{low\_coords}_{k}=(x_{k},y_{k}) $ of the $ k $-th patch
(4)     for $ k\leftarrow 1 $ to $ m $ do
(5)         Extract the patch $ P_{k,\text{low}} $ from $ W_{i} $ at the coordinates $ \text{low\_coords}_{k}=(x_{k},y_{k}) $
(6)         $ \boldsymbol{f}_{\text{low}}\leftarrow \mathrm{ResNet}50(P_{k,\text{low}}) $
(7)         $ \boldsymbol{X}_{i,\text{low}}.\mathrm{append}(\boldsymbol{f}_{\text{low}}) $
(8)         Compute the high-resolution coordinates of the $ k $-th patch: $ \text{high\_coords}_{k}=\{(x_{k}\times 4+u\times 256,\ y_{k}\times 4+v\times 256)\mid u,v\in \{0,1,2,3\}\} $
(9)         foreach $ (x_\text{high},y_\text{high})\in \text{high\_coords}_{k} $ do
(10)            Extract the patch $ P_{k,\text{high}} $ from $ W_{i} $ at the coordinates $ (x_\text{high},y_\text{high}) $
(11)            $ \boldsymbol{f}_{\text{high}}\leftarrow \mathrm{ResNet}50(P_{k,\text{high}}) $
(12)            $ \boldsymbol{X}_{i,\text{high}}.\mathrm{append}(\boldsymbol{f}_{\text{high}}) $
(13)        end foreach
(14)    end for
(15) end for
Table 1 Data distribution
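| Dataset | Total | Class distribution (count) | Training set | Validation set | Test set |
| --- | --- | --- | --- | --- | --- |
| NBU-Skin | 326 | BCC 166; MM 106; SCC 54 | 228 | 33 | 65 |
| TCGA-LUNG | 1053 | LUSC 541; LUAD 512 | 738 | 105 | 210 |
| Camelyon16 | 399 | NORMAL 239; TUMOR 160 | 243 | 27 | 129 |
Table 2 Comparison of model results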
| Model | Inference time | GPU memory | Camelyon16 accuracy | Camelyon16 AUC | NBU-Skin accuracy | NBU-Skin AUC | TCGA-LUNG accuracy | TCGA-LUNG AUC |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Mean-Pooling[23] | 0.229 ms | 2148 MB | 0.675 | 0.761 | 0.755±0.050 | 0.863±0.114 | 0.813±0.094 | 0.881±0.099 |
| Max-Pooling[23] | 0.232 ms | 2148 MB | 0.587 | 0.599 | 0.797±0.018 | 0.881±0.044 | 0.801±0.031 | 0.860±0.034 |
| ABMIL[13] | 0.468 ms | 2276 MB | 0.862 | 0.876 | 0.838±0.063 | 0.927±0.089 | 0.844±0.023 | 0.919±0.026 |
| Dsmil[17] | 0.864 ms | 2284 MB | 0.836 | 0.862 | 0.727±0.027 | 0.809±0.107 | 0.783±0.041 | 0.856±0.040 |
| Dsmil + APBD | 0.913 ms | 2463 MB | 0.850 | 0.919 | 0.847±0.039 | 0.925±0.044 | 0.844±0.011 | 0.918±0.016 |
| Dsmil + RankMix[18] | 1.176 ms | 2949 MB | 0.855 | 0.897 | 0.823±0.086 | 0.922±0.042 | 0.849±0.017 | 0.917±0.023 |
| Dsmil + ReMix[20] | 0.901 ms | 3052 MB | 0.829 | 0.905 | 0.818±0.055 | 0.908±0.047 | 0.833±0.037 | 0.915±0.020 |
| CLAM-SB[14] | 0.456 ms | 2584 MB | 0.806 | 0.865 | 0.862±0.070 | 0.950±0.020 | 0.834±0.030 | 0.912±0.034 |
| CLAM-SB + MDDP[24] | 0.820 ms | 3394 MB | 0.872 | 0.868 | 0.751±0.083 | 0.878±0.055 | 0.841±0.033 | 0.925±0.042 |
| CLAM-MB[14] | 0.841 ms | 2584 MB | 0.782 | 0.770 | 0.865±0.050 | 0.953±0.032 | 0.840±0.043 | 0.912±0.019 |
| TransMIL[15] | 6.620 ms | 10114 MB | 0.858 | 0.906 | 0.798±0.100 | 0.930±0.061 | 0.819±0.038 | 0.885±0.030 |
| DTFD(MaxS)[16] | 1.833 ms | 2140 MB | 0.858 | 0.870 | 0.859±0.024 | 0.898±0.012 | 0.764±0.010 | 0.837±0.017 |
| DTFD(MaxMinS)[16] | 1.911 ms | 2336 MB | 0.881 | 0.906 | 0.792±0.101 | 0.887±0.074 | 0.832±0.031 | 0.907±0.031 |
| DTFD(AFS)[16] | 2.052 ms | 2130 MB | 0.881 | 0.901 | 0.786±0.101 | 0.891±0.081 | 0.849±0.036 | 0.927±0.019 |
| HFMIL (this paper) | 2.249 ms | 2356 MB | 0.883 | 0.979 | 0.905±0.030 | 0.976±0.016 | 0.860±0.043 | 0.931±0.026 |
Note: Bold indicates the best value.
Table 3 Ablation study results
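| HFMIL ablation setting: modules enabled (multilevel feature fusion, pseudo-bag division) | Camelyon16 accuracy | Camelyon16 AUC | NBU-Skin accuracy | NBU-Skin AUC | TCGA-LUNG accuracy | TCGA-LUNG AUC |
| --- | --- | --- | --- | --- | --- | --- |
| neither | 0.472 | 0.420 | 0.800 | 0.882 | 0.785 | 0.880 |
| √ (one module) | 0.751 | 0.668 | 0.876 | 0.952 | 0.823 | 0.854 |
| √ (one module) | 0.798 | 0.782 | 0.907 | 0.968 | 0.880 | 0.927 |
| √ √ (both) | 0.883 | 0.979 | 0.938 | 0.997 | 0.900 | 0.966 |
Note: Bold indicates the best value.
Table 4 Results of the top-k parameter sensitivity experiment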
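| top-k value | Accuracy | AUC |
| --- | --- | --- |
| 5% | 0.907±0.055 | 0.975±0.013 |
| 10% | 0.901±0.058 | 0.971±0.011 |
| 15% | 0.905±0.030 | 0.976±0.016 |
| 20% | 0.900±0.037 | 0.972±0.015 |
| 25% | 0.892±0.066 | 0.966±0.023 |
Note: Bold indicates the best value.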
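The top-k ratios in the table above set how many low-resolution patches are kept before pseudo-bags are formed. The following Python sketch shows one way this selection and the index mapping to high-resolution features could work. It assumes that attention scores are already available for each low-resolution patch, that retained patches are dealt round-robin into a fixed number of pseudo-bags, and that the 16 high-resolution features of low-resolution patch k occupy positions 16k to 16k+15, following the append order of the feature extraction algorithm. The function name, the `n_bags` parameter, and the round-robin rule are illustrative assumptions, not details taken from the paper.

```python
def divide_pseudo_bags(scores, top_k_ratio=0.15, n_bags=4, high_per_low=16):
    """Keep the top-scoring low-res patches and split them into pseudo-bags.

    Returns (low_bags, high_bags): lists of index lists into the
    low-resolution and high-resolution feature arrays of one slide.
    """
    n_keep = max(1, round(len(scores) * top_k_ratio))
    # Rank patch indices by descending attention score (stable for ties).
    ranked = sorted(range(len(scores)), key=lambda k: -scores[k])[:n_keep]
    # Round-robin split (assumption) so each pseudo-bag gets high-scoring patches.
    low_bags = [ranked[b::n_bags] for b in range(n_bags)]
    # Index mapping: low-res patch k owns high-res slots [16k, 16k + 16).
    high_bags = [
        [high_per_low * k + t for k in bag for t in range(high_per_low)]
        for bag in low_bags
    ]
    return low_bags, high_bags


# Hypothetical attention scores for 8 low-resolution patches.
scores = [0.02, 0.30, 0.05, 0.25, 0.01, 0.20, 0.10, 0.07]
low_bags, high_bags = divide_pseudo_bags(scores, top_k_ratio=0.5, n_bags=2)
print(low_bags)           # [[1, 5], [3, 6]]
print(high_bags[0][:3])   # [16, 17, 18]
```

Because only indices are passed around, the same pseudo-bag assignment can be applied to the low-resolution and high-resolution feature arrays without copying features.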
[1] HAN Bingfeng, ZHENG Rongshou, ZENG Hongmei, et al. Cancer incidence and mortality in China, 2022[J]. Journal of the National Cancer Center, 2024, 4(1): 47–53. doi: 10.1016/j.jncc.2024.01.006.
[2] JIANG Mengqi, HAN Yuchen, and FU Xiaolong. Research progress on H-E stained whole slide image analysis by artificial intelligence in lung cancer[J]. China Oncology, 2024, 34(3): 306–315. doi: 10.19401/j.cnki.1007-3639.2024.03.009.
[3] WANG Yumeng, LIU Zhenbing, and LIU Zaiyi. Privacy-preserving federated weakly-supervised learning for cancer subtyping on histopathology images[J/OL]. https://jeit.ac.cn/cn/article/doi/10.11999/JEIT250842, 2025.
[4] JIN Huaiping, XUE Feiyue, LI Zhenhui, et al. Prognostic prediction of gastric cancer based on ensemble deep learning of pathological images[J]. Journal of Electronics & Information Technology, 2023, 45(7): 2623–2633. doi: 10.11999/JEIT220655.
[5] FEI Manman, ZHANG Xin, CHEN Dongdong, et al. Whole slide cervical cancer classification via graph attention networks and contrastive learning[J]. Neurocomputing, 2025, 613: 128787. doi: 10.1016/j.neucom.2024.128787.
[6] ZHANG Jiawei, SUN Zhanquan, WANG Kang, et al. Prognosis prediction based on liver histopathological image via graph deep learning and transformer[J]. Applied Soft Computing, 2024, 161: 111653. doi: 10.1016/j.asoc.2024.111653.
[7] LI Mingze, ZHANG Bingbing, SUN Jian, et al. Weakly supervised breast cancer classification on WSI using transformer and graph attention network[J]. International Journal of Imaging Systems and Technology, 2024, 34(4): e23125. doi: 10.1002/ima.23125.
[8] WANG Fuying, XIN Jiayi, ZHAO Weiqin, et al. TAD-graph: Enhancing whole slide image analysis via task-aware subgraph disentanglement[J]. IEEE Transactions on Medical Imaging, 2025, 44(6): 2683–2695. doi: 10.1109/TMI.2025.3545680.
[9] WU Kun, JIANG Zhiguo, TANG Kunming, et al. Pan-cancer histopathology WSI pre-training with position-aware masked autoencoder[J]. IEEE Transactions on Medical Imaging, 2025, 44(4): 1610–1623. doi: 10.1109/TMI.2024.3513358.
[10] ZHANG Yinhui, ZHANG Jinkai, HE Zifen, et al. Global perception and sparse feature associate image-level weakly supervised pathological image segmentation[J]. Journal of Electronics & Information Technology, 2024, 46(9): 3672–3682. doi: 10.11999/JEIT240364.
[11] YAN Rui, LV Zhilong, YANG Zhidong, et al. Sparse and hierarchical transformer for survival analysis on whole slide images[J]. IEEE Journal of Biomedical and Health Informatics, 2024, 28(1): 7–18. doi: 10.1109/JBHI.2023.3307584.
[12] MA Yingfan, LUO Xiaoyuan, FU Kexue, et al. Transformer-based video-structure multi-instance learning for whole slide image classification[C]. The 38th AAAI Conference on Artificial Intelligence, Vancouver, Canada, 2024: 14263–14271. doi: 10.1609/aaai.v38i13.29338.
[13] ILSE M, TOMCZAK J, and WELLING M. Attention-based deep multiple instance learning[C]. The 35th International Conference on Machine Learning, Stockholm, Sweden, 2018: 2127–2136.
[14] LU M Y, WILLIAMSON D F K, CHEN T Y, et al. Data-efficient and weakly supervised computational pathology on whole-slide images[J]. Nature Biomedical Engineering, 2021, 5(6): 555–570. doi: 10.1038/s41551-020-00682-w.
[15] SHAO Zhuchen, BIAN Hao, CHEN Yang, et al. TransMIL: Transformer based correlated multiple instance learning for whole slide image classification[C]. The 35th International Conference on Neural Information Processing Systems, 2021: 164.
[16] ZHANG Hongrun, MENG Yanda, ZHAO Yitian, et al. DTFD-MIL: Double-tier feature distillation multiple instance learning for histopathology whole slide image classification[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 18780–18790. doi: 10.1109/CVPR52688.2022.01824.
[17] LI Bin, LI Yin, and ELICEIRI K W. Dual-stream multiple instance learning network for whole slide image classification with self-supervised contrastive learning[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 14313–14323. doi: 10.1109/CVPR46437.2021.01409.
[18] CHEN Y C and LU C S. RankMix: Data augmentation for weakly supervised learning of classifying whole slide images with diverse sizes and imbalanced categories[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, Canada, 2023: 23936–23945. doi: 10.1109/CVPR52729.2023.02292.
[19] LIU Pei, JI Luping, ZHANG Xinyu, et al. Pseudo-bag mixup augmentation for multiple instance learning-based whole slide image classification[J]. IEEE Transactions on Medical Imaging, 2024, 43(5): 1841–1852. doi: 10.1109/TMI.2024.3351213.
[20] YANG Jiawei, CHEN Hanbo, ZHAO Yu, et al. ReMix: A general and efficient framework for multiple instance learning based whole slide image classification[C]. The 25th International Conference on Medical Image Computing and Computer-Assisted Intervention, Singapore, Singapore, 2022: 35–45. doi: 10.1007/978-3-031-16434-7_4.
[21] BEJNORDI B E, VETA M, VAN DIEST P J, et al. Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer[J]. JAMA, 2017, 318(22): 2199–2210. doi: 10.1001/jama.2017.14585.
[22] TOMCZAK K, CZERWIŃSKA P, and WIZNEROWICZ M. The Cancer Genome Atlas (TCGA): An immeasurable source of knowledge[J]. Contemporary Oncology, 2015, 19(1A): A68–A77. doi: 10.5114/wo.2014.47136.
[23] ZHOU S K, RUECKERT D, and FICHTINGER G. Handbook of Medical Image Computing and Computer Assisted Intervention[M]. London: Academic Press, 2020: 521–546.
[24] LOU Wei, LI Guanbin, WAN Xiang, et al. Multi-modal denoising diffusion pre-training for whole-slide image classification[C]. The 32nd ACM International Conference on Multimedia, Melbourne, Australia, 2024: 10804–10813. doi: 10.1145/3664647.3680882.