A Privacy-preserving Data Alignment Framework for Vertical Federated Learning

GAO Ying; XIE Yuxin; DENG Huanghao; ZHU Zukun; ZHANG Yiyu

doi:10.11999/JEIT231234

Volume 46 Issue 8

Aug. 2024

Turn off MathJax

Article Contents

Article Navigation > Journal of Electronics & Information Technology > 2024 > 46(8): 3419-3427

GAO Ying, XIE Yuxin, DENG Huanghao, ZHU Zukun, ZHANG Yiyu. A Privacy-preserving Data Alignment Framework for Vertical Federated Learning[J]. Journal of Electronics & Information Technology, 2024, 46(8): 3419-3427. doi: 10.11999/JEIT231234

Citation:

GAO Ying, XIE Yuxin, DENG Huanghao, ZHU Zukun, ZHANG Yiyu. A Privacy-preserving Data Alignment Framework for Vertical Federated Learning[J]. Journal of Electronics & Information Technology, 2024, 46(8): 3419-3427. doi: 10.11999/JEIT231234

Citation:

PDF( 2220 KB)

A Privacy-preserving Data Alignment Framework for Vertical Federated Learning

doi: 10.11999/JEIT231234 cstr: 32379.14.JEIT231234

1.
School of Cyber Science and Technology, Beihang University, Beijing 100191, China
2.
Zhongguancun Laboratory, Beijing 100194, China

Funds: The Natural Science Foundation of Beijing Municipality (M21033), The Tencent Rhino-Bird Joint Research Program

Received Date: 2023-11-07
Rev Recd Date: 2024-04-02

Available Online: 2024-04-17

Publish Date: 2024-08-30

Abstract

Abstract

In vertical federated learning, the datasets of the clients have overlapping sample IDs and features of different dimensions, thus the data alignment is necessary for model training. As the intersection of the sample IDs is public in current data alignment technologies, how to align the data without any leakage of the intersection becomes a key issue. The proposed private-preserving data ALIGNment framework (ALIGN) is based on interchangeable encryption and homomorphic encryption technologies, mainly including data encryption, ciphertext blinding, private intersecting, and feature splicing. The sample IDs are encrypted twice based on an interchangeable encryption algorithm, where the same ciphertexts correspond to the same plaintexts, and the sample features are encrypted and then randomly blinded based on a homomorphic encryption algorithm. The intersection of the encrypted sample IDs is obtained, and the corresponding features are then spliced and secretly shared with the participants. Compared to the existing technologies, the privacy of the ID intersection is protected, and the samples corresponding to the IDs outside intersection can be removed safely in our framework. The security proof shows that each participant cannot obtain any knowledge of each other except for the data size, which guarantees the effectiveness of the private-preserving strategies. The simulation experiments demonstrate that the runtime is shortened about 1.3 seconds and the model accuracy keeps higher than 85% with every 10% reduction of the redundant data. The simulation experimental results show that using the ALIGN framework for vertical federated learning data alignment is beneficial for improving the efficiency and accuracy of subsequent model training.
- Vertical federated learning,
- Data alignment,
- Privacy protection,
- Commutative encryption,
- Homomorphic encryption

FullText(HTML)

References(31)

References

[1]	YANG Qiang, LIU Yang, CHEN Tianjian, et al. Federated machine learning: Concept and applications[J]. ACM Transactions on Intelligent Systems and Technology, 2019, 10(2): 12. doi: 10.1145/3298981.
[2]	刘艺璇, 陈红, 刘宇涵, 等. 联邦学习中的隐私保护技术[J]. 软件学报, 2022, 33(3): 1057–1092. doi: 10.13328/j.cnki.jos.006446. LIU Yixuan, CHEN Hong, LIU Yuhan, et al. Privacy-preserving techniques in federated learning[J]. Journal of Software, 2022, 33(3): 1057–1092. doi: 10.13328/j.cnki.jos.006446.
[3]	LI Tian, SAHU A K, TALWALKAR A, et al. Federated learning: Challenges, methods, and future directions[J]. IEEE Signal Processing Magazine, 2020, 37(3): 50–60. doi: 10.1109/MSP.2020.2975749.
[4]	KAIROUZ P, MCMAHAN H B, AVENT B, et al. Advances and open problems in federated learning[J]. Foundations and Trends^® in Machine Learning, 2021, 14(1/2): 1–210. doi: 10.1561/2200000083.
[5]	BELTRÁN E T M, PÉREZ M Q, SÁNCHEZ P M S, et al. Decentralized federated learning: Fundamentals, state of the art, frameworks, trends, and challenges[J]. IEEE Communications Surveys & Tutorials, 2023, 25(4): 2983–3013. doi: 10.1109/COMST.2023.3315746.
[6]	陈晋音, 李荣昌, 黄国瀚, 等. 纵向联邦学习方法及其隐私和安全综述[J]. 网络与信息安全学报, 2023, 9(2): 1–20. doi: 10.11959/j.issn.2096−109x.2023017. CHEN Jinyin, LI Rongchang, HUANG Guohan, et al. Survey on vertical federated learning: Algorithm, privacy and security[J]. Chinese Journal of Network and Information Security, 2023, 9(2): 1–20. doi: 10.11959/j.issn.2096−109x.2023017.
[7]	LIU Yang, KANG Yan, ZOU Tianyuan, et al. Vertical federated learning: Concepts, advances and challenges[J]. arXiv: 2211.12814, 2023.
[8]	ROMANINI D, HALL A J, PAPADOPOULOS P, et al. Pyvertical: A vertical federated learning framework for multi-headed splitNN[J]. arXiv: 2104.00489, 2021.
[9]	LI Qun, THAPA C, ONG L, et al. Vertical federated learning: Taxonomies, threats, and prospects[J]. arXiv: 2302.01550, 2023.
[10]	WEI Kang, LI Jun, MA Chuan, et al. Vertical federated learning: Challenges, methodologies and experiments[J]. arXiv: 2202.04309, 2022.
[11]	FREEDMAN M J, NISSIM K, and PINKAS B. Efficient private matching and set intersection[C]. International Conference on the Theory and Applications of Cryptographic Techniques, Interlaken, Switzerland, 2004: 1–19. doi: 10.1007/978-3-540-24676-3_1.
[12]	PINKAS B, SCHNEIDER T, TKACHENKO O, et al. Efficient circuit-based PSI with linear communication[C]. The 38th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Darmstadt, Germany, 2019: 122–153. doi: 10.1007/978-3-030-17659-4_5.
[13]	PINKAS B, SCHNEIDER T, and ZOHNER M. Scalable private set intersection based on OT extension[J]. ACM Transactions on Privacy and Security, 2018, 21(2): 7. doi: 10.1145/3154794.
[14]	PINKAS B, ROSULEK M, TRIEU N, et al. PSI from PaXoS: Fast, malicious private set intersection[C]. The 39th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Zagreb, Croatia, 2020: 739–767. doi: 10.1007/978-3-030-45724-2_25.
[15]	LU Linpeng and DING Ning. Multi-party private set intersection in vertical federated learning[C]. The 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications, Guangzhou, China, 2020: 707–714. doi: 10.1109/TrustCom50675.2020.00098.
[16]	HARDY S, HENECKA W, IVEY-LAW H, et al. Private federated learning on vertically partitioned data via entity resolution and additively homomorphic encryption[J]. arXiv: 1711.10677, 2017.
[17]	LIU Yang, ZHANG Xiong, and WAN Libin. Asymmetrical vertical federated learning[J]. arXiv: 2004.07427, 2020.
[18]	RINDAL P and SCHOPPMANN P. VOLE-PSI: Fast OPRF and circuit-psi from vector-ole[C]. The 40th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Zagreb, Croatia, 2021: 901–930. doi: 10.1007/978-3-030-77886-6_31.
[19]	CHANDRAN N, GUPTA D, and SHAH A. Circuit-PSI with linear complexity via relaxed batch OPPRF[J]. Proceedings on Privacy Enhancing Technologies, 2022, 2022(1): 353–372. doi: 10.2478/POPETS-2022-0018.
[20]	RAGHURAMAN S and RINDAL P. Blazing fast PSI from improved OKVS and subfield vole[C]. 2022 ACM SIGSAC Conference on Computer and Communications Security, Los Angeles, USA, 2022: 2505–2517. doi: 10.1145/3548606.3560658.
[21]	LIU Yang, ZHANG Bingsheng, MA Yuxiang, et al. iPrivJoin: An ID-private data join framework for privacy-preserving machine learning[J]. IEEE Transactions on Information Forensics and Security, 2023, 18: 4300–4312. doi: 10.1109/TIFS.2023.3288455.
[22]	AGRAWAL R, EVFIMIEVSKI A, and SRIKANT R. Information sharing across private databases[C]. 2003 ACM SIGMOD International Conference on Management of Data, San Diego, USA, 2003: 86–97. doi: 10.1145/872757.872771.
[23]	AHIRWAL R R and AHKE M. Elliptic curve diffie-hellman key exchange algorithm for securing hypertext information on wide area network[J]. International Journal of Computer Science and Information Technologies, 2013, 4(2): 363–368.
[24]	PAILLIER P. Public-key cryptosystems based on composite degree residuosity classes[C]. International Conference on the Theory and Application of Cryptographic Techniques, Prague, Czech Republic, 1999: 223–238. doi: 10.1007/3-540-48910-X_16.
[25]	SHAMIR A. How to share a secret[J]. Communications of the ACM, 1979, 22(11): 612–613. doi: 10.1145/359168.359176.
[26]	BONAWITZ K A, EICHNER H, GRIESKAMP W, et al. Towards federated learning at scale: System design[C]. Machine Learning and Systems 2019, Stanford, USA, 2019.
[27]	SHAMIR A, RIVEST R L, and ADLEMAN L M. Mental poker[M]. KLARNER D A. The Mathematical Gardner. Boston: Springer, 1981: 37–43. doi: 10.1007/978-1-4684-6686-7_5.
[28]	POHLIG S and HELLMAN M. An improved algorithm for computing logarithms overGF(p)and its cryptographic significance (Corresp. )[J]. IEEE Transactions on information Theory, 1978, 24(1): 106–110. doi: 10.1109/TIT.1978.1055817.
[29]	DIFFIE W and HELLMAN M. New directions in cryptography[J]. IEEE Transactions on Information Theory, 1976, 22(6): 644–654. doi: 10.1109/TIT.1976.1055638.
[30]	LECUN Y. The MNIST database of handwritten digits[EB/OL]. http://yann.lecun.com/exdb/mnist/, 1998.
[31]	BARNES R. CrypTen[EB/OL].https://github.com/facebookresearch/CrypTen, 2020.