GAO Ying, XIE Yuxin, DENG Huanghao, ZHU Zukun, ZHANG Yiyu. A Privacy-preserving Data Alignment Framework for Vertical Federated Learning[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT231234

A Privacy-preserving Data Alignment Framework for Vertical Federated Learning

doi: 10.11999/JEIT231234
Funds: Natural Science Foundation of Beijing Municipality (M21033), Tencent Rhino-Bird Joint Research Program.
  • Received Date: 2023-11-07
  • Rev Recd Date: 2024-04-02
  • Available Online: 2024-04-17
  • In vertical federated learning, the clients' datasets share overlapping sample IDs but hold features of different dimensions, so data alignment is necessary before model training. Because existing data alignment techniques make the intersection of sample IDs public, aligning the data without leaking the intersection is a key issue. This paper proposes ALIGN, a privacy-preserving data alignment framework based on commutative (interchangeable) encryption and homomorphic encryption, consisting of four stages: data encryption, ciphertext blinding, private intersection, and feature splicing. Sample IDs are encrypted twice with a commutative encryption algorithm, so identical plaintexts yield identical ciphertexts regardless of the order in which the two keys are applied; sample features are encrypted with a homomorphic encryption algorithm and then randomly blinded. The intersection is computed over the encrypted sample IDs, and the corresponding features are then spliced and secret-shared among the participants. Compared with existing techniques, the framework protects the privacy of the ID intersection and safely removes samples whose IDs fall outside it. The security proof shows that no participant learns anything about the others' data beyond its size, which confirms the effectiveness of the privacy-preserving strategies. Simulation experiments show that for every 10% reduction in redundant data, the runtime decreases by about 1.3 s while model accuracy stays above 85%, indicating that using ALIGN for data alignment in vertical federated learning improves the efficiency and accuracy of subsequent model training.
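As a rough illustration of the double-encryption step for sample IDs, the Python sketch below uses a Pohlig-Hellman-style exponentiation cipher, E_k(x) = x^k mod p, which is one well-known commutative cipher. The abstract does not say which commutative scheme the framework uses, so this choice, the toy modulus 2**127 - 1, the SHA-256 ID encoding, and the helper names (`keygen`, `hash_to_group`, `enc`, `aligned_ids`) are all assumptions. Both parties are simulated in one process, and the double ciphertexts are compared in the clear as in classic commutative-encryption PSI; the sketch therefore reveals the intersection and does not reproduce the intersection-hiding property claimed above. It needs Python 3.9+.

```python
import hashlib
import math
import secrets

# Toy modulus (assumption): the Mersenne prime 2**127 - 1.
# It is far too small and not a safe prime, so it is NOT secure in practice.
P = 2**127 - 1


def keygen(p: int = P) -> int:
    """Pick a secret exponent k with gcd(k, p - 1) == 1, so x -> x^k mod p is a bijection."""
    while True:
        k = secrets.randbelow(p - 3) + 2  # k in [2, p - 2]
        if math.gcd(k, p - 1) == 1:
            return k


def hash_to_group(sample_id: str, p: int = P) -> int:
    """Hypothetical encoding: hash a sample ID to a nonzero residue mod p."""
    h = int.from_bytes(hashlib.sha256(sample_id.encode()).digest(), "big") % p
    return h or 1  # avoid 0, which is not in the multiplicative group


def enc(x: int, k: int, p: int = P) -> int:
    """Pohlig-Hellman-style exponentiation cipher: E_k(x) = x^k mod p."""
    return pow(x, k, p)


def aligned_ids(ids_a: list[str], ids_b: list[str]) -> list[str]:
    """Single-process simulation of two parties A and B (hypothetical helper)."""
    ka, kb = keygen(), keygen()  # each party's secret key
    # A encrypts its IDs under ka; B then re-encrypts the ciphertexts under kb.
    double_a = {enc(enc(hash_to_group(i), ka), kb): i for i in ids_a}
    # B encrypts its IDs under kb; A then re-encrypts the ciphertexts under ka.
    double_b = {enc(enc(hash_to_group(i), kb), ka) for i in ids_b}
    # (x^ka)^kb == (x^kb)^ka mod p, so equal IDs yield equal double ciphertexts.
    return sorted(double_a[c] for c in double_a.keys() & double_b)


if __name__ == "__main__":
    print(aligned_ids(["u1", "u2", "u3", "u5"], ["u2", "u3", "u4"]))  # ['u2', 'u3']
```

Because gcd(k, p - 1) = 1, the map x -> x^k mod p permutes the nonzero residues, so two double ciphertexts match only when the underlying hashed IDs match.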
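The ciphertext-blinding step relies on additive homomorphism. One common way to realize it, shown here as a sketch rather than the framework's actual construction, is textbook Paillier encryption: the holder of Enc(x) multiplies in Enc(s) for a random mask s, the key holder decrypts only x + s, and the pair (-s, x + s) forms additive shares of x modulo n. The toy primes, the `Paillier` class, and the `blind_to_shares` helper are hypothetical; the code handles nonnegative integer features only and needs Python 3.9+ for `math.lcm`.

```python
import math
import secrets

# Toy primes (assumption): Mersenne primes 2**31 - 1 and 2**61 - 1.
# Real deployments need primes of 1024 bits or more each.
P, Q = 2**31 - 1, 2**61 - 1


class Paillier:
    """Minimal textbook Paillier cryptosystem with generator g = n + 1."""

    def __init__(self, p: int = P, q: int = Q):
        self.n = p * q
        self.n2 = self.n * self.n
        self.lam = math.lcm(p - 1, q - 1)
        # With g = n + 1, L(g^lam mod n^2) = lam mod n, so mu = lam^-1 mod n.
        self.mu = pow(self.lam, -1, self.n)

    def encrypt(self, m: int) -> int:
        while True:  # random r in Z_n^*
            r = secrets.randbelow(self.n - 1) + 1
            if math.gcd(r, self.n) == 1:
                break
        # (n + 1)^m = 1 + m*n (mod n^2)
        return (1 + (m % self.n) * self.n) * pow(r, self.n, self.n2) % self.n2

    def decrypt(self, c: int) -> int:
        u = pow(c, self.lam, self.n2)
        return (u - 1) // self.n * self.mu % self.n

    def add(self, c1: int, c2: int) -> int:
        """Enc(m1) * Enc(m2) mod n^2 decrypts to m1 + m2 mod n."""
        return c1 * c2 % self.n2


def blind_to_shares(he: Paillier, c_x: int) -> tuple[int, int]:
    """Hypothetical helper: turn Enc(x) into additive shares of x mod n."""
    s = secrets.randbelow(he.n)                # random blinding mask
    c_blinded = he.add(c_x, he.encrypt(s))     # Enc(x + s)
    share_key_holder = he.decrypt(c_blinded)   # sees only x + s mod n
    share_ct_holder = (-s) % he.n              # keeps -s mod n
    return share_ct_holder, share_key_holder


if __name__ == "__main__":
    he = Paillier()
    print(he.decrypt(he.add(he.encrypt(20), he.encrypt(22))))  # 42
    a, b = blind_to_shares(he, he.encrypt(42))
    print((a + b) % he.n)  # 42
```

Since s is uniform modulo n, the decrypted value x + s reveals nothing about x on its own; only the sum of both shares recovers it.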
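The final stage, feature splicing followed by secret sharing, can be pictured with plaintext data: each aligned row concatenates party A's and party B's feature vectors, and every entry is then split into additive shares. The sketch below makes several assumptions not stated in the abstract: 16-bit fixed-point encoding, shares in the ring Z_{2^64}, a configurable number of parties, and the hypothetical helper names `splice`, `share`, and `reconstruct`. It joins rows on plaintext IDs only to show the data layout and the sharing arithmetic, so it does not provide the framework's privacy guarantees.

```python
import secrets

RING = 2**64   # assumption: shares live in Z_{2^64}
SCALE = 2**16  # assumption: 16 fractional bits for fixed-point encoding


def encode(x: float) -> int:
    """Fixed-point encode a real number into the ring (two's-complement style)."""
    return round(x * SCALE) % RING


def decode(v: int) -> float:
    """Map a ring element back to a real number; the upper half is negative."""
    v %= RING
    if v >= RING // 2:
        v -= RING
    return v / SCALE


def splice(ids, feats_a, feats_b):
    """Concatenate A's and B's feature vectors row by row for the aligned IDs."""
    return [feats_a[i] + feats_b[i] for i in ids]


def share(matrix, n_parties=2):
    """Additively secret-share every encoded entry among n_parties."""
    shares = [[[0] * len(row) for row in matrix] for _ in range(n_parties)]
    for r, row in enumerate(matrix):
        for c, x in enumerate(row):
            rand = [secrets.randbelow(RING) for _ in range(n_parties - 1)]
            last = (encode(x) - sum(rand)) % RING  # all shares sum to encode(x)
            for p, s in enumerate(rand + [last]):
                shares[p][r][c] = s
    return shares


def reconstruct(shares):
    """Sum the shares entry-wise and decode the result."""
    return [
        [decode(sum(s[r][c] for s in shares)) for c in range(len(shares[0][r]))]
        for r in range(len(shares[0]))
    ]


if __name__ == "__main__":
    X = splice(["u2", "u3"],
               {"u2": [0.5, -1.25], "u3": [2.0, 3.0]},
               {"u2": [7.0], "u3": [-0.75]})
    s0, s1 = share(X)
    print(reconstruct([s0, s1]))  # [[0.5, -1.25, 7.0], [2.0, 3.0, -0.75]]
```

Each individual share is uniformly random in the ring, so a single party learns nothing about the spliced features until the shares are combined.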

    Figures (3) / Tables (3)
