Privacy-Preserving Federated Weakly-Supervised Learning for Cancer Subtyping on Histopathology Images

WANG Yumeng, LIU Zhenbing, LIU Zaiyi

Citation: WANG Yumeng, LIU Zhenbing, LIU Zaiyi. Privacy-Preserving Federated Weakly-Supervised Learning for Cancer Subtyping on Histopathology Images[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT250842

doi: 10.11999/JEIT250842 cstr: 32379.14.JEIT250842
Funding: National Natural Science Foundation of China (82272075, U22A20345); Guangdong Provincial Science and Technology Program (2022B1212010011)
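Details
    Author biographies:

    WANG Yumeng: female, PhD candidate; research interests: medical image processing and analysis, federated learning

    LIU Zhenbing: male, professor; research interests: medical image processing, machine learning

    LIU Zaiyi: male, professor and chief physician; research interests: artificial intelligence for medical radiomics

    Corresponding author:

    LIU Zhenbing zbliu@guet.edu.cn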

  • CLC number: TP399; TP391.4

Funds: The National Natural Science Foundation of China (82272075, U22A20345), Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application (2022B1212010011)
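  • Abstract: Data-driven deep learning methods have achieved strong performance, but they usually depend on large amounts of finely annotated training data. Medical data, moreover, tend to sit in isolated "data silos", and complex data-sharing procedures risk leaking patient privacy. Federated learning (FL) lets multiple medical centers collaboratively train a deep learning model without sharing their data. In computational pathology, however, pathology images from different centers are commonly heterogeneous, and this inherent heterogeneity can significantly degrade model performance. To address these problems, this study proposes a privacy-preserving FL method for gigapixel whole slide images (WSIs) in computational pathology that combines weakly supervised attention-based multiple instance learning (MIL) with differential privacy. Specifically, each participating client trains its local model with a weakly supervised multi-scale attention MIL method that requires only slide-level labels, which addresses the high cost of annotating gigapixel pathology WSIs. In the federated weight aggregation stage, local differential privacy is introduced to further reduce the risk of sensitive data leakage, and a new federated adaptive reweighting strategy is adopted to overcome the challenges posed by the heterogeneity of pathology images across clients. The proposed FL method is evaluated on two cancer histological subtyping tasks. Experimental results show that, while protecting patient data privacy, the proposed method achieves higher classification accuracy than localized models and other FL methods, and its classification performance remains competitive with that of the centralized model.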
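The server-side aggregation step mentioned in the abstract can be pictured with a minimal sketch. It assumes each client uploads a flat parameter vector and the server forms a weighted average. The coefficients used here are plain local sample counts (a FedAvg-style choice) standing in for the paper's adaptive reweighting rule, which this page does not specify. The names `aggregate`, `clients`, and `sizes` are hypothetical.

```python
import numpy as np

def aggregate(client_weights, client_coeffs):
    """Weighted average of client parameter vectors.

    client_weights: list of 1-D arrays of identical shape, one per client.
    client_coeffs: non-negative aggregation coefficients (normalized here).
    NOTE: an illustrative stand-in, not the paper's reweighting strategy.
    """
    coeffs = np.asarray(client_coeffs, dtype=float)
    coeffs = coeffs / coeffs.sum()           # make coefficients sum to 1
    stacked = np.stack(client_weights)       # shape: (num_clients, num_params)
    return coeffs @ stacked                  # shape: (num_params,)

# Three hypothetical clients with toy 4-parameter models.
clients = [
    np.array([1.0, 0.0, 2.0, 4.0]),
    np.array([3.0, 0.0, 2.0, 0.0]),
    np.array([2.0, 3.0, 2.0, 2.0]),
]
sizes = [100, 100, 200]                      # assumed local sample counts
global_weights = aggregate(clients, sizes)
```

Any reweighting scheme that outputs non-negative per-client coefficients could be passed in place of `sizes` without changing the averaging code.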
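  • Figure 1. Schematic of the network framework

    Figure 2. Data distribution across the client datasets

    Figure 3. ROC and PR curves compared with the baseline methods

    Figure 4. Classification performance of different FL methods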

    Algorithm 1. Multi-scale representation attention network

     Input: training pathology WSIs $ \{I_n\}_{n=1}^{N} $, label set $ \{Y_n\}_{n=1}^{N} $, feature extractor $ f(\cdot) $, fully connected layer $ h_1(\cdot) $ in the first branch, fully connected layer $ h_2(\cdot) $ in the second branch, attention module $ \mathrm{Atten}(\cdot) $
     Output: predicted class vector $ P_n^w $ of the pathology WSI
     1: $ \{I_{ni}\}_{i=1}^{M_b} \leftarrow I_n $ // split the WSI into $ M_b $ bag-level images
     2: $ \gamma_{ni} \leftarrow 1 $ // initialize the attention weight of each bag-level image in the WSI
     3: for each iteration $ t \in \{1, 2, \cdots, T\} $ do
          // first branch
     4:   for each minibatch do
     5:     $ \{I_{nij}\}_{j=1}^{M_p} \leftarrow I_{ni} $ // divide the bag-level image into $ M_p $ patch-level images
     6:     $ \{Z_{nijk}^{c}\}_{k=1}^{M_c} \leftarrow f(I_{nij}) $
     7:     $ \{\eta_{nijk}\}_{k=1}^{M_c} \leftarrow \mathrm{Atten}(h_1(\{Z_{nijk}^{c}\}_{k=1}^{M_c})) $ // cell-level attention weights
     8:     $ Z_{nij}^{p} \leftarrow \sum_{k=1}^{M_c} \eta_{nijk} \cdot Z_{nijk}^{c} $
     9:     $ \{\delta_{nij}\}_{j=1}^{M_p} \leftarrow \mathrm{Atten}(h_1(\{Z_{nij}^{p}\}_{j=1}^{M_p})) $ // patch-level attention weights
     10:    $ Z_{ni}^{b} \leftarrow \sum_{j=1}^{M_p} \delta_{nij} \cdot Z_{nij}^{p} $
     11:  end
          // second branch
     12:  for each minibatch do
     13:    $ \{\gamma_{ni}\}_{i=1}^{M_b} \leftarrow \mathrm{Atten}(h_2(\{Z_{ni}^{b}\}_{i=1}^{M_b})) $ // bag-level attention weights
     14:    $ Z_n^w \leftarrow \sum_{i=1}^{M_b} \gamma_{ni} \cdot Z_{ni}^{b} $
     15:    $ P_n^w \leftarrow h(Z_n^w) $
     16:  end
     17: end
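The nested pooling in steps 6 to 15 can be sketched in a few lines of NumPy. This is a forward-pass illustration only, under several assumptions not stated on this page: the attention module is taken to be a softmax over scores of the form `v . tanh(W z)` (in the style of attention-based MIL), the fully connected layer and the attention scoring are merged into one function, features are random stand-ins for the output of $ f(\cdot) $, and the final classifier `W_cls` is hypothetical.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())                  # subtract max for numerical stability
    return e / e.sum()

def attention_pool(features, W, v):
    """Pool (m, d) features into one (d,) vector with softmax attention.

    Assumed scoring rule: score_i = v . tanh(W z_i).
    Returns the pooled vector and the (m,) attention weights.
    """
    scores = np.tanh(features @ W.T) @ v     # shape: (m,)
    weights = softmax(scores)
    return weights @ features, weights

rng = np.random.default_rng(0)
d, hidden = 8, 4
M_b, M_p, M_c = 3, 4, 5                      # bags per WSI, patches per bag, cell-level features per patch
Z_c = rng.normal(size=(M_b, M_p, M_c, d))    # stand-in for f(I_nij) over all bags and patches
W1, v1 = rng.normal(size=(hidden, d)), rng.normal(size=hidden)   # first branch
W2, v2 = rng.normal(size=(hidden, d)), rng.normal(size=hidden)   # second branch
W_cls = rng.normal(size=(2, d))              # hypothetical two-class classifier

Z_b = np.empty((M_b, d))
for i in range(M_b):
    Z_p = np.empty((M_p, d))
    for j in range(M_p):
        Z_p[j], _ = attention_pool(Z_c[i, j], W1, v1)   # steps 7-8: cell -> patch
    Z_b[i], _ = attention_pool(Z_p, W1, v1)             # steps 9-10: patch -> bag
Z_w, gamma = attention_pool(Z_b, W2, v2)                # steps 13-14: bag -> WSI
P_w = softmax(W_cls @ Z_w)                              # step 15: class probabilities
```

Because every level is a convex combination of the level below, the weights `gamma` (and their patch- and cell-level counterparts) can be read as relative contributions of each region to the slide-level prediction.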

    Table 1. Comparison with baseline methods

    Dataset  Model         AUC    ACC    PRE    SEN    SPE    F1-score  P-value
    NSCLC    Client #1     0.901  0.777  0.837  0.684  0.869  0.753     <0.001
    NSCLC    Client #2     0.832  0.751  0.738  0.776  0.727  0.756     0.034
    NSCLC    Client #3     0.838  0.716  0.678  0.816  0.616  0.741     <0.001
    NSCLC    Centralized   0.964  0.929  0.938  0.918  0.939  0.928     0.003
    NSCLC    Ours w/o DP   0.969  0.934  0.947  0.918  0.949  0.933     -
    NSCLC    Ours with DP  0.964  0.914  0.909  0.918  0.909  0.914     0.023
    BRCA     Client #1     0.800  0.814  0.877  0.889  0.537  0.883     0.045
    BRCA     Client #2     0.833  0.845  0.855  0.967  0.390  0.908     <0.001
    BRCA     Client #3     0.788  0.763  0.809  0.915  0.195  0.859     <0.001
    BRCA     Centralized   0.898  0.881  0.906  0.948  0.634  0.927     0.124
    BRCA     Ours w/o DP   0.900  0.887  0.917  0.941  0.683  0.929     -
    BRCA     Ours with DP  0.892  0.876  0.891  0.961  0.561  0.925     0.702
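For readers less familiar with the column abbreviations, the threshold-based metrics in these tables (ACC, PRE, SEN, SPE, F1-score) follow from a binary confusion matrix under their standard definitions. The sketch below assumes a two-class task with one class designated positive; the counts are invented for illustration and are not taken from the paper. AUC is left out because it requires predicted scores, not just counts.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard metrics from binary confusion-matrix counts."""
    acc = (tp + tn) / (tp + fp + tn + fn)    # accuracy
    pre = tp / (tp + fp)                     # precision
    sen = tp / (tp + fn)                     # sensitivity (recall)
    spe = tn / (tn + fp)                     # specificity
    f1 = 2 * pre * sen / (pre + sen)         # harmonic mean of precision and sensitivity
    return {"ACC": acc, "PRE": pre, "SEN": sen, "SPE": spe, "F1": f1}

# Hypothetical counts, for illustration only.
m = binary_metrics(tp=90, fp=10, tn=85, fn=15)
```

A high SEN paired with a low SPE, as in several single-client BRCA rows, indicates a model that labels most slides as the positive class.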

    Table 2. Comparison with other FL methods

    Dataset  Model         AUC    ACC    PRE    SEN    SPE    F1-score  P-value
    NSCLC    FedAVG        0.960  0.893  0.914  0.867  0.919  0.890     <0.001
    NSCLC    FedAR         0.961  0.898  0.906  0.888  0.909  0.897     0.006
    NSCLC    HistoFL       0.896  0.807  0.800  0.816  0.798  0.808     0.013
    NSCLC    Ours w/o DP   0.969  0.934  0.947  0.918  0.949  0.933     -
    NSCLC    Ours with DP  0.964  0.914  0.909  0.918  0.909  0.914     0.023
    BRCA     FedAVG        0.797  0.778  0.827  0.908  0.293  0.866     <0.001
    BRCA     FedAR         0.845  0.814  0.831  0.961  0.268  0.891     0.002
    BRCA     HistoFL       0.819  0.825  0.884  0.895  0.561  0.890     <0.001
    BRCA     Ours w/o DP   0.900  0.887  0.917  0.941  0.683  0.929     -
    BRCA     Ours with DP  0.892  0.876  0.891  0.961  0.561  0.925     0.702

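    Table 3. Effect of different noise levels on the classification performance of the proposed method

    Noise     Level   NSCLC AUC  NSCLC ACC  NSCLC F1-score  BRCA AUC  BRCA ACC  BRCA F1-score
    Gaussian  0.1     0.845      0.751      0.717           0.706     0.799     0.887
    Gaussian  0.01    0.963      0.914      0.914           0.876     0.876     0.924
    Gaussian  0.001   0.964      0.914      0.914           0.892     0.876     0.925
    Gaussian  0.0001  0.957      0.909      0.910           0.894     0.892     0.932
    Laplace   0.1     0.767      0.726      0.722           0.593     0.789     0.882
    Laplace   0.01    0.971      0.914      0.913           0.897     0.892     0.933
    Laplace   0.001   0.958      0.909      0.909           0.892     0.876     0.925
    Laplace   0.0001  0.963      0.924      0.924           0.872     0.892     0.933
    -         0       0.969      0.934      0.933           0.900     0.887     0.929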
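To make the two noise families concrete, here is a small sketch of perturbing a parameter vector before upload. It assumes the "level" column corresponds to the scale parameter of the noise distribution, which this page does not confirm, and the function name `perturb` is hypothetical. Adding noise alone does not establish a formal privacy budget; that additionally requires bounding the sensitivity of the update (for example by clipping), which is outside this sketch.

```python
import numpy as np

def perturb(weights, scale, kind="gaussian", rng=None):
    """Return weights plus zero-mean noise of the given kind and scale.

    kind: "gaussian" (scale = standard deviation) or
          "laplace" (scale = Laplace scale b; standard deviation is sqrt(2) * b).
    """
    rng = np.random.default_rng() if rng is None else rng
    if kind == "gaussian":
        noise = rng.normal(loc=0.0, scale=scale, size=weights.shape)
    elif kind == "laplace":
        noise = rng.laplace(loc=0.0, scale=scale, size=weights.shape)
    else:
        raise ValueError(f"unknown noise kind: {kind}")
    return weights + noise

rng = np.random.default_rng(42)
w = np.zeros(10_000)                         # toy parameter vector
spread = {}
for scale in (0.1, 0.01, 0.001, 0.0001):     # levels assumed to map to scales
    spread[scale] = (
        perturb(w, scale, "gaussian", rng).std(),
        perturb(w, scale, "laplace", rng).std(),
    )
```

At equal scale the Laplace perturbation has a larger standard deviation and heavier tails than the Gaussian one, which is one possible reading of the sharper drop in the Laplace 0.1 row.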

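    Table 4. Quantitative results on the NSCLC dataset for different combinations of the method's basic components

    Baseline model  Federated adaptive reweighting  Multi-scale representation attention  AUC    ACC    PRE    SEN    SPE    F1-score
                                                                                          0.896  0.807  0.800  0.816  0.798  0.808
                                                                                          0.930  0.873  0.892  0.847  0.899  0.869
                                                                                          0.960  0.893  0.914  0.867  0.919  0.890
                                                                                          0.969  0.934  0.947  0.918  0.949  0.933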

    Table 5. Performance of the proposed method on the NSCLC dataset with different numbers of clients

    Clients  Model      AUC    ACC    PRE    SEN    SPE    F1-score
    3        Localized  0.857  0.748  0.751  0.759  0.737  0.750
    3        HistoFL    0.896  0.807  0.800  0.816  0.798  0.808
    3        Ours       0.969  0.934  0.947  0.918  0.949  0.933
    4        Localized  0.747  0.680  0.675  0.693  0.668  0.677
    4        HistoFL    0.855  0.766  0.823  0.670  0.860  0.739
    4        Ours       0.931  0.873  0.860  0.887  0.860  0.873
    5        Localized  0.754  0.657  0.653  0.742  0.572  0.671
    5        HistoFL    0.898  0.819  0.796  0.860  0.778  0.827
    5        Ours       0.939  0.899  0.900  0.900  0.899  0.900

    Table 6. Performance of the proposed method on the NSCLC dataset under different hyperparameter settings (β and λ)

    β    λ   AUC    ACC    PRE    SEN    SPE    F1-score
    0.5  1   0.945  0.883  0.879  0.888  0.879  0.883
    0.5  5   0.958  0.904  0.899  0.908  0.899  0.904
    0.5  10  0.963  0.898  0.898  0.898  0.899  0.898
    1    1   0.957  0.898  0.906  0.888  0.909  0.897
    1    5   0.954  0.914  0.918  0.908  0.919  0.913
    1    10  0.969  0.934  0.947  0.918  0.949  0.933
    1.5  1   0.928  0.878  0.894  0.857  0.899  0.875
    1.5  5   0.946  0.904  0.876  0.939  0.869  0.906
    1.5  10  0.948  0.888  0.913  0.857  0.919  0.884
Publication history
  • Revised: 2025-11-17
  • Accepted: 2025-11-17
  • Published online: 2025-11-25
