Privacy-Preserving Federated Weakly-Supervised Learning for Cancer Subtyping on Histopathology Images

WANG Yumeng; LIU Zhenbing; LIU Zaiyi

doi:10.11999/JEIT250842

WANG Yumeng, LIU Zhenbing, LIU Zaiyi. Privacy-Preserving Federated Weakly-Supervised Learning for Cancer Subtyping on Histopathology Images[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT250842

doi: 10.11999/JEIT250842 cstr: 32379.14.JEIT250842

WANG Yumeng^{1, 2, 3, 4}, LIU Zhenbing^{1}, LIU Zaiyi^{1, 2, 3, 4}

1. School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
2. Department of Radiology, Guangdong Provincial People’s Hospital, Guangzhou 510080, China
3. Guangdong Academy of Medical Sciences, Southern Medical University, Guangzhou 510080, China
4. Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application, Guangzhou 510080, China

Funds: The National Natural Science Foundation of China (82272075, U22A20345), Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application (2022B1212010011)

Received Date: 2025-09-01
Accepted Date: 2025-11-17
Revised Date: 2025-11-17

Available Online: 2025-11-25

Abstract

Objective: Data-driven deep learning methods are widely applied to cancer subtyping, yet their performance depends on large training datasets with fine-grained annotations. For gigapixel Whole Slide Images (WSIs), such annotations are labor-intensive and costly. Clinical data are typically stored in isolated data silos, and sharing them across institutions raises privacy concerns. Federated Learning (FL) enables a global model to be trained from data distributed across multiple medical centers without transmitting local data. However, in conventional FL, substantial heterogeneity across centers reduces the performance and stability of the global model.

Methods: A privacy-preserving FL method is proposed for gigapixel WSIs in computational pathology. Weakly supervised attention-based Multiple Instance Learning (MIL) is integrated with differential privacy to support training when only slide-level labels are available. Within each client, a multi-scale attention-based MIL method is used for local training on histopathology WSIs; this weakly supervised setting reduces the need for costly pixel-level annotation. During the federated update, local differential privacy is applied to limit the risk of sensitive information leakage: random noise drawn from a Gaussian or Laplace distribution is added to the model parameters after each client’s local training. Furthermore, a federated adaptive reweighting strategy is introduced to address the heterogeneity of pathological images across clients by dynamically balancing the influence of local data quantity and quality on each client’s aggregation weight.

Results and Discussions: The proposed FL framework is evaluated on two clinical diagnostic tasks: Non-small Cell Lung Cancer (NSCLC) histologic subtyping and Breast Invasive Carcinoma (BRCA) histologic subtyping. As shown in Table 1, Table 2, and Fig. 4, the proposed FL method, both with and without differential privacy (Ours with DP and Ours w/o DP), achieves higher accuracy and stronger generalization than localized models and other FL approaches. Its classification performance remains competitive even when compared with the centralized model (Fig. 3). These results indicate that privacy-preserving FL is a feasible and effective strategy for multicenter histopathology images and may reduce the performance degradation typically caused by data heterogeneity across centers. When the magnitude of the added noise is kept within a limited range, stable classification can also be achieved (Table 3). The two main components, the multi-scale representation attention network and the federated adaptive reweighting strategy, each contribute to consistent performance improvement (Table 4). In addition, the proposed FL method maintains stable classification performance across different hyperparameter settings (Table 5, Table 6), which supports its robustness.

Conclusions: The proposed FL method addresses two central challenges in multicenter computational pathology: the presence of data silos and concerns over privacy. It also alleviates the performance degradation caused by inter-center data heterogeneity. Because balancing model accuracy with privacy protection remains a key challenge, future work will focus on methods that preserve privacy while sustaining stable classification performance.

Key words:
- Histopathology
- Whole slide image
- Federated Learning (FL)
- Differential privacy
- Weakly-supervised learning
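
The federated update described in the Methods (noise added to each client's parameters after local training, followed by aggregation with weights that balance data quantity and quality) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the weighting formula or the noise calibration, so the convex combination controlled by `lam`, the per-client `quality` scores, the noise `scale`, and all function names below are hypothetical choices made for illustration only. Model parameters are represented as flat lists of floats, and the noise scale is a free parameter that is not calibrated to any privacy budget.

```python
import random
from typing import List, Optional, Sequence

Params = List[float]  # flattened parameter vector (stand-in for real model weights)


def add_noise(params: Sequence[float], scale: float, kind: str = "gaussian",
              rng: Optional[random.Random] = None) -> Params:
    """Perturb parameters with zero-mean Gaussian or Laplace noise.

    `scale` is the Gaussian standard deviation or the Laplace scale b.
    ASSUMPTION: the scale is chosen by hand, not derived from an epsilon budget.
    """
    if scale < 0:
        raise ValueError("scale must be non-negative")
    if scale == 0:
        return list(params)  # no noise requested
    rng = rng or random.Random()
    if kind == "gaussian":
        return [p + rng.gauss(0.0, scale) for p in params]
    if kind == "laplace":
        # The difference of two i.i.d. exponentials with mean `scale`
        # follows a Laplace(0, scale) distribution.
        rate = 1.0 / scale
        return [p + rng.expovariate(rate) - rng.expovariate(rate) for p in params]
    raise ValueError(f"unknown noise kind: {kind}")


def aggregation_weights(num_samples: Sequence[int], quality: Sequence[float],
                        lam: float = 0.5) -> List[float]:
    """Hypothetical reweighting: mix normalized data quantity and quality.

    lam = 1.0 reduces to sample-count weighting (as in FedAvg);
    lam = 0.0 uses only the quality scores. The weights sum to 1.
    """
    total_n = sum(num_samples)
    total_q = sum(quality)
    return [lam * n / total_n + (1.0 - lam) * q / total_q
            for n, q in zip(num_samples, quality)]


def aggregate(client_params: Sequence[Sequence[float]],
              weights: Sequence[float]) -> Params:
    """Weighted average of client parameter vectors, coordinate by coordinate."""
    dim = len(client_params[0])
    return [sum(w * p[i] for w, p in zip(weights, client_params))
            for i in range(dim)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Three illustrative clients with made-up parameters, sizes, and quality scores.
    local = [[0.2, -0.1, 0.5], [0.4, 0.0, 0.3], [0.1, 0.2, 0.6]]
    noisy = [add_noise(p, scale=0.01, kind="gaussian", rng=rng) for p in local]
    weights = aggregation_weights([300, 120, 80], [0.90, 0.85, 0.70], lam=0.5)
    print("weights:", weights)
    print("global :", aggregate(noisy, weights))
```

In this sketch the server only ever sees the perturbed vectors in `noisy`, which mirrors the idea of applying local differential privacy on the client side. A larger `scale` gives stronger perturbation at the cost of accuracy, consistent with the observation that classification stays stable only while the noise magnitude is kept within a limited range.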
