高级搜索

留言板

尊敬的读者、作者、审稿人, 关于本刊的投稿、审稿、编辑和出版的任何问题, 您可以本页添加留言。我们将尽快给您答复。谢谢您的支持!

姓名
邮箱
手机号码
标题
留言内容
验证码

面向纵向联邦学习的隐私保护数据对齐框架

高莹 谢雨欣 邓煌昊 朱祖坤 张一余

高莹, 谢雨欣, 邓煌昊, 朱祖坤, 张一余. 面向纵向联邦学习的隐私保护数据对齐框架[J]. 电子与信息学报, 2024, 46(8): 3419-3427. doi: 10.11999/JEIT231234
引用本文: 高莹, 谢雨欣, 邓煌昊, 朱祖坤, 张一余. 面向纵向联邦学习的隐私保护数据对齐框架[J]. 电子与信息学报, 2024, 46(8): 3419-3427. doi: 10.11999/JEIT231234
GAO Ying, XIE Yuxin, DENG Huanghao, ZHU Zukun, ZHANG Yiyu. A Privacy-preserving Data Alignment Framework for Vertical Federated Learning[J]. Journal of Electronics & Information Technology, 2024, 46(8): 3419-3427. doi: 10.11999/JEIT231234
Citation: GAO Ying, XIE Yuxin, DENG Huanghao, ZHU Zukun, ZHANG Yiyu. A Privacy-preserving Data Alignment Framework for Vertical Federated Learning[J]. Journal of Electronics & Information Technology, 2024, 46(8): 3419-3427. doi: 10.11999/JEIT231234

面向纵向联邦学习的隐私保护数据对齐框架

doi: 10.11999/JEIT231234
基金项目: 北京市自然科学基金 (M21033),腾讯微信犀牛鸟基金(本基金无项目编号)
详细信息
    作者简介:

    高莹:女,副教授,博士生导师,研究方向为密码学、隐私计算和区块链

    谢雨欣:女,博士生,研究方向为联邦学习

    邓煌昊:男,博士生,研究方向为联邦学习

    朱祖坤:男,硕士生,研究方向为联邦学习

    张一余:男,硕士,研究方向为纵向联邦学习

    通讯作者:

    高莹 gaoying@buaa.edu.cn

  • 中图分类号: TN918;TP309

A Privacy-preserving Data Alignment Framework for Vertical Federated Learning

Funds: The Natural Science Foundation of Beijing Municipality (M21033), The Tencent Rhino-Bird Joint Research Program
  • 摘要: 纵向联邦学习中,各个客户端持有的数据集中包含有重叠的样本ID和不同维度的样本特征,需要进行数据对齐以适应模型训练。现有数据对齐技术一般将各方样本ID交集作为公开信息,如何在不泄露样本ID交集的前提下实现数据对齐成为亟需解决的问题。基于可交换加密和同态加密技术,该文构造了隐私保护的数据对齐框架ALIGN,包括数据加密、密文盲化、密文求交和特征拼接等步骤,使得相同的原始样本ID经过双重可交换加密可变换为相同的密文,并且对样本特征经同态加密后又进行了盲化处理。ALIGN框架能够对参与方样本ID的密文求交,将交集内样本ID对应的全部特征数据进行拼接并以秘密分享形式分配给参与方。相比现有数据对齐技术,该框架不仅能够保护样本ID交集的隐私性,同时能安全地删除样本ID交集外的样本信息。对ALIGN框架的安全性证明表明,除数据规模外,各客户端不能通过数据对齐获得关于对方数据的任何信息,保证了隐私保护策略的有效性。与现有工作相比,每增加10%的冗余数据,ALIGN框架利用所得数据对齐结果可将模型训练时间缩短约1.3秒,将模型训练准确度稳定在85%以上。仿真实验结果表明,通过ALIGN框架进行纵向联邦学习数据对齐,有利于提升后续模型训练的效率和模型准确度。
  • 图  1  纵向联邦学习数据对齐及训练阶段示意图

    图  2  ALIGN框架流程图

    图  3  ALIGN和Ciruit-PSI对齐数据的模型训练对比

    表  1  本文符号汇总

    符号 含义
    $ {P_{\rm A}},{P_{\rm B}} $ 纵向联邦学习参与方
    $ {n_{\rm A}} $ 参与方$ {P_{\rm A}} $的原始数据集包含的样本数量
    $ {n_{\rm B}} $ 参与方$ {P_{\rm B}} $的原始数据集包含的样本数量
    $ [n](n \in \mathbb{Z}) $ 集合$ \{ 1,2, \cdots ,n\} $
    m 参与方$ {P_{\rm A}} $和$ {P_{\rm B}} $的样本ID交集大小
    $\left\langle {\cdot } \right\rangle_{\rm A} $ 参与方$ {P_{\rm A}} $的秘密分享份额
    $\left\langle {\cdot } \right\rangle_{\rm B} $ 参与方$ {P_{\rm B}} $的秘密分享份额
    $ {[x]_k} $ 以密钥k加密明文x得到的密文
    下载: 导出CSV

    1  ALIGN框架任意参与方数据对齐算法

     输入:数据集$ ({\text{ID}},X) $,可交换加密算法$ E( \cdot ) $及密钥k,同态加密算法$ H( \cdot ) $及公私钥对$ ({\text{pk,sk}}) $
     输出:数据对齐结果$ {\text{res}} $
     Begin
      # ID加密和特征加密
      $ P\;{\text{sends}}\;{\text{pk}}\;{\text{to}}\;P'\;{\text{and receives}}\;{\text{pk}}'\;{\text{from}}\;P' $
      $ ({\text{ID}},X) $$ \leftarrow $$ ({E_k}({\text{ID}}),{H_{{\text{pk}}}}(X)) $
      $ P\;{\text{sends}}\;({\text{ID}},X)\;{\text{to}}\;P'\;{\text{and receives}}\;({\text{I}}{{\text{D}}_1},{X_1})\;{\text{from}}\;P' $
      # ID重加密和特征密文盲化
      $ {\text{I}}{{\text{D}}_1} \leftarrow {E_k}({\text{I}}{{\text{D}}_1}) $
      $ R = [] $
      for i in $ {\text{range(|}}{X_1}{\text{|)}} $
       $ {r_i} \leftarrow {\text{RandomGenerator}}(\cdot) $ # RandomGenerator(·)表示随机数生成器
       $ R.{\text{append}}({r_i}) $
       $ {X_1}[i] \leftarrow {X_1}[i] - {H_{{\text{pk}}}}({r_i}) $;
       end for
      $ {\text{dict}} \leftarrow \{ {\text{I}}{{\text{D}}_1}[i]:R[i] $for i in $ {\text{range(|I}}{{\text{D}}_{\text{1}}}{\text{|)\} }} $
      $ \pi \leftarrow {\text{PermutationGenerator}}(\cdot) $ #PermutationGenerator(·)表示随机置换函数生成器,用于随机置换列表中元素的顺序
      $ ({\text{I}}{{\text{D}}_1},{X_1}) \leftarrow \pi ({\text{I}}{{\text{D}}_1}),\pi ({X_1}) $
      $ P\;{\text{sends}}\;({\text{I}}{{\text{D}}_1},{X_1})\;{\text{to}}\;P'\;{\text{and receives}}\;({\text{I}}{{\text{D}}_2},{X_2})\;{\text{from}}\;P' $
      # ID密文求交和特征拼接
      $ {\text{re}}{{\text{s}}_{\text{1}}}{\text{, re}}{{\text{s}}_2} = [],[] $;
      for i in $ {\text{range(|I}}{{\text{D}}_2}{\text{|)}} $
       for $ {\text{id}} $ in $ {\text{I}}{{\text{D}}_1} $:
        if $ {\text{I}}{{\text{D}}_2}[i] = = {\text{id}} $:
         $ {\text{re}}{{\text{s}}_{\text{1}}}{\text{.append}}({X_2}[i]) $
         $ {\text{re}}{{\text{s}}_2}{\text{.append}}({\text{dict[id}}]) $
        end if
       end for
      end for
      # 解密生成秘密份额
      $ {\text{result}} \leftarrow ({H^{{{ - 1}}}}_{{\text{sk}}}({\text{re}}{{\text{s}}_{\text{1}}}{\text{),re}}{{\text{s}}_{\text{2}}}) $
     End
    下载: 导出CSV

    表  2  ALIGN框架运行时间

    ID交集大小 运行时间(s)
    50 2.30
    100 4.49
    150 6.69
    200 8.82
    250 11.07
    下载: 导出CSV
  • [1] YANG Qiang, LIU Yang, CHEN Tianjian, et al. Federated machine learning: Concept and applications[J]. ACM Transactions on Intelligent Systems and Technology, 2019, 10(2): 12. doi: 10.1145/3298981.
    [2] 刘艺璇, 陈红, 刘宇涵, 等. 联邦学习中的隐私保护技术[J]. 软件学报, 2022, 33(3): 1057–1092. doi: 10.13328/j.cnki.jos.006446.

    LIU Yixuan, CHEN Hong, LIU Yuhan, et al. Privacy-preserving techniques in federated learning[J]. Journal of Software, 2022, 33(3): 1057–1092. doi: 10.13328/j.cnki.jos.006446.
    [3] LI Tian, SAHU A K, TALWALKAR A, et al. Federated learning: Challenges, methods, and future directions[J]. IEEE Signal Processing Magazine, 2020, 37(3): 50–60. doi: 10.1109/MSP.2020.2975749.
    [4] KAIROUZ P, MCMAHAN H B, AVENT B, et al. Advances and open problems in federated learning[J]. Foundations and Trends® in Machine Learning, 2021, 14(1/2): 1–210. doi: 10.1561/2200000083.
    [5] BELTRÁN E T M, PÉREZ M Q, SÁNCHEZ P M S, et al. Decentralized federated learning: Fundamentals, state of the art, frameworks, trends, and challenges[J]. IEEE Communications Surveys & Tutorials, 2023, 25(4): 2983–3013. doi: 10.1109/COMST.2023.3315746.
    [6] 陈晋音, 李荣昌, 黄国瀚, 等. 纵向联邦学习方法及其隐私和安全综述[J]. 网络与信息安全学报, 2023, 9(2): 1–20. doi: 10.11959/j.issn.2096−109x.2023017.

    CHEN Jinyin, LI Rongchang, HUANG Guohan, et al. Survey on vertical federated learning: Algorithm, privacy and security[J]. Chinese Journal of Network and Information Security, 2023, 9(2): 1–20. doi: 10.11959/j.issn.2096−109x.2023017.
    [7] LIU Yang, KANG Yan, ZOU Tianyuan, et al. Vertical federated learning: Concepts, advances and challenges[J]. arXiv: 2211.12814, 2023.
    [8] ROMANINI D, HALL A J, PAPADOPOULOS P, et al. Pyvertical: A vertical federated learning framework for multi-headed splitNN[J]. arXiv: 2104.00489, 2021.
    [9] LI Qun, THAPA C, ONG L, et al. Vertical federated learning: Taxonomies, threats, and prospects[J]. arXiv: 2302.01550, 2023.
    [10] WEI Kang, LI Jun, MA Chuan, et al. Vertical federated learning: Challenges, methodologies and experiments[J]. arXiv: 2202.04309, 2022.
    [11] FREEDMAN M J, NISSIM K, and PINKAS B. Efficient private matching and set intersection[C]. International Conference on the Theory and Applications of Cryptographic Techniques, Interlaken, Switzerland, 2004: 1–19. doi: 10.1007/978-3-540-24676-3_1.
    [12] PINKAS B, SCHNEIDER T, TKACHENKO O, et al. Efficient circuit-based PSI with linear communication[C]. The 38th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Darmstadt, Germany, 2019: 122–153. doi: 10.1007/978-3-030-17659-4_5.
    [13] PINKAS B, SCHNEIDER T, and ZOHNER M. Scalable private set intersection based on OT extension[J]. ACM Transactions on Privacy and Security, 2018, 21(2): 7. doi: 10.1145/3154794.
    [14] PINKAS B, ROSULEK M, TRIEU N, et al. PSI from PaXoS: Fast, malicious private set intersection[C]. The 39th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Zagreb, Croatia, 2020: 739–767. doi: 10.1007/978-3-030-45724-2_25.
    [15] LU Linpeng and DING Ning. Multi-party private set intersection in vertical federated learning[C]. The 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications, Guangzhou, China, 2020: 707–714. doi: 10.1109/TrustCom50675.2020.00098.
    [16] HARDY S, HENECKA W, IVEY-LAW H, et al. Private federated learning on vertically partitioned data via entity resolution and additively homomorphic encryption[J]. arXiv: 1711.10677, 2017.
    [17] LIU Yang, ZHANG Xiong, and WAN Libin. Asymmetrical vertical federated learning[J]. arXiv: 2004.07427, 2020.
    [18] RINDAL P and SCHOPPMANN P. VOLE-PSI: Fast OPRF and circuit-psi from vector-ole[C]. The 40th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Zagreb, Croatia, 2021: 901–930. doi: 10.1007/978-3-030-77886-6_31.
    [19] CHANDRAN N, GUPTA D, and SHAH A. Circuit-PSI with linear complexity via relaxed batch OPPRF[J]. Proceedings on Privacy Enhancing Technologies, 2022, 2022(1): 353–372. doi: 10.2478/POPETS-2022-0018.
    [20] RAGHURAMAN S and RINDAL P. Blazing fast PSI from improved OKVS and subfield vole[C]. 2022 ACM SIGSAC Conference on Computer and Communications Security, Los Angeles, USA, 2022: 2505–2517. doi: 10.1145/3548606.3560658.
    [21] LIU Yang, ZHANG Bingsheng, MA Yuxiang, et al. iPrivJoin: An ID-private data join framework for privacy-preserving machine learning[J]. IEEE Transactions on Information Forensics and Security, 2023, 18: 4300–4312. doi: 10.1109/TIFS.2023.3288455.
    [22] AGRAWAL R, EVFIMIEVSKI A, and SRIKANT R. Information sharing across private databases[C]. 2003 ACM SIGMOD International Conference on Management of Data, San Diego, USA, 2003: 86–97. doi: 10.1145/872757.872771.
    [23] AHIRWAL R R and AHKE M. Elliptic curve diffie-hellman key exchange algorithm for securing hypertext information on wide area network[J]. International Journal of Computer Science and Information Technologies, 2013, 4(2): 363–368.
    [24] PAILLIER P. Public-key cryptosystems based on composite degree residuosity classes[C]. International Conference on the Theory and Application of Cryptographic Techniques, Prague, Czech Republic, 1999: 223–238. doi: 10.1007/3-540-48910-X_16.
    [25] SHAMIR A. How to share a secret[J]. Communications of the ACM, 1979, 22(11): 612–613. doi: 10.1145/359168.359176.
    [26] BONAWITZ K A, EICHNER H, GRIESKAMP W, et al. Towards federated learning at scale: System design[C]. Machine Learning and Systems 2019, Stanford, USA, 2019.
    [27] SHAMIR A, RIVEST R L, and ADLEMAN L M. Mental poker[M]. KLARNER D A. The Mathematical Gardner. Boston: Springer, 1981: 37–43. doi: 10.1007/978-1-4684-6686-7_5.
    [28] POHLIG S and HELLMAN M. An improved algorithm for computing logarithms overGF(p)and its cryptographic significance (Corresp. )[J]. IEEE Transactions on information Theory, 1978, 24(1): 106–110. doi: 10.1109/TIT.1978.1055817.
    [29] DIFFIE W and HELLMAN M. New directions in cryptography[J]. IEEE Transactions on Information Theory, 1976, 22(6): 644–654. doi: 10.1109/TIT.1976.1055638.
    [30] LECUN Y. The MNIST database of handwritten digits[EB/OL]. http://yann.lecun.com/exdb/mnist/, 1998.
    [31] BARNES R. CrypTen[EB/OL].https://github.com/facebookresearch/CrypTen, 2020.
  • 加载中
图(3) / 表(3)
计量
  • 文章访问数:  238
  • HTML全文浏览量:  140
  • PDF下载量:  32
  • 被引次数: 0
出版历程
  • 收稿日期:  2023-11-07
  • 修回日期:  2024-04-02
  • 网络出版日期:  2024-04-17
  • 刊出日期:  2024-08-10

目录

    /

    返回文章
    返回