Trusted Federated Secure Aggregation via Similarity Clustering

CAI Hongyun; ZHANG Yu; WANG Shiyun; ZHAO Ao; ZHANG Meiling

doi: 10.11999/JEIT221088 cstr: 32379.14.JEIT221088

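1. School of Cyber Security and Computer, Hebei University, Baoding 071000, China
2. Key Laboratory on High Trusted Information System in Hebei Province, Baoding 071000, China

Funds: the Natural Science Foundation of Hebei Province (F2020201023), the Science and Technology Project of Hebei Education Department (ZD2022105), the High-level Personnel Starting Project of Hebei University (521100221089)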

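Author biographies:
CAI Hongyun: female, Ph.D., associate professor. Research interests: federated learning, privacy computing, and recommender system security.

ZHANG Yu: male, master's student. Research interest: federated learning.

WANG Shiyun: female, master's student. Research interest: attack detection.

ZHAO Ao: male, master's student. Research interest: privacy measurement.

ZHANG Meiling: female, master's student. Research interest: privacy computing.

Corresponding author:
ZHANG Yu, zhangyu990813@163.com

CLC number: TN915; TP309.2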
Publication history
- Received: 2022-08-19
- Revised: 2023-02-21
- Published online: 2023-02-23
- Issue date: 2023-03-10


Abstract: Federated learning can effectively circumvent participants' data privacy problems, but the parameters or gradients exchanged during model training may still leak participants' private data, and malicious participants can seriously degrade the aggregation process and the quality of the model. To address this, a trusted Federated Secure Aggregation method based on Similarity Clustering (FSA-SC) is proposed. First, a weight is computed for each client from the size of its training dataset and its communication distance to the server, and the clients with higher weights are selected as candidates for server-side model aggregation. Second, the candidate clients are clustered according to the similarity between them into a benign group and an abnormal group. Finally, for members of the abnormal group, intra-class broadcast and secondary negotiation are used to replace and record parameters, so that malicious clients can be detected and identified. To verify the effectiveness of FSA-SC, federated recommendation is taken as the application scenario, with MovieLens 1M, Netflix, and a sampled Amazon dataset as the experimental datasets. The results indicate that FSA-SC achieves efficient secure aggregation and is more robust than the baseline methods.

Key words:
- Federated learning
- Privacy protection
- Model attack
- Secure aggregation
- Similarity clustering
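The first two steps described in the abstract (weight-based candidate selection, then similarity-based grouping) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not give the weighting formula, the similarity measure, or the clustering algorithm, so the linear weight with a hypothetical `alpha`, the cosine similarity, and the largest-gap split used here are all assumptions. The third step (intra-class broadcast and secondary negotiation) is not covered.

```python
import numpy as np

def select_candidates(data_sizes, distances, k, alpha=0.5):
    """Pick the k clients with the highest combined weight.

    Assumption: the weight is a convex combination (hypothetical `alpha`) of a
    normalized data-size score and a normalized inverse-distance score.
    """
    sizes = np.asarray(data_sizes, dtype=float)
    dists = np.asarray(distances, dtype=float)
    size_score = sizes / sizes.sum()
    inv = 1.0 / (1.0 + dists)          # closer clients score higher
    dist_score = inv / inv.sum()
    weights = alpha * size_score + (1.0 - alpha) * dist_score
    order = np.argsort(-weights, kind="stable")   # descending by weight
    return order[:k], weights

def split_by_similarity(updates):
    """Split clients into (benign, abnormal) index lists.

    Assumption: cosine similarity between flattened updates, and a simple
    one-dimensional split at the largest gap in mean similarity, standing in
    for whatever clustering the paper actually uses. The lower-similarity
    side is labeled abnormal, which presumes benign clients are the majority.
    """
    U = np.asarray(updates, dtype=float)
    norms = np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    U = U / norms
    sim = U @ U.T
    n = len(U)
    mean_sim = (sim.sum(axis=1) - np.diag(sim)) / (n - 1)  # exclude self-similarity
    order = np.argsort(mean_sim, kind="stable")
    cut = int(np.argmax(np.diff(mean_sim[order]))) + 1
    abnormal = sorted(order[:cut].tolist())
    benign = sorted(order[cut:].tolist())
    return benign, abnormal
```

For example, with four updates pointing in roughly the same direction and one pointing the opposite way, `split_by_similarity` places the opposing client alone in the abnormal group.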

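Figure 1 Workflow of the FSA-SC algorithm

Figure 2 Blinding process

Figure 3 Secondary negotiation

Figure 4 Hit ratio comparison of the three methods on the MovieLens 1M dataset

Figure 5 Hit ratio comparison of the three methods on the Netflix dataset

Figure 6 Hit ratio comparison of the three methods on the Amazon dataset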
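Figures 4 to 6 compare methods by hit ratio. As a reminder of how that metric is commonly computed in recommendation experiments, here is a small sketch. The cutoff `k` and the one-held-out-item-per-user protocol are assumptions; this page does not state which variant the paper uses, and the function name is hypothetical.

```python
def hit_ratio_at_k(ranked_items_per_user, held_out_item_per_user, k=10):
    """Fraction of users whose held-out item appears in their top-k ranking."""
    hits = 0
    for ranked, target in zip(ranked_items_per_user, held_out_item_per_user):
        if target in ranked[:k]:
            hits += 1
    return hits / len(held_out_item_per_user)
```

Calling it with one ranked list per user and the matching held-out items yields a value between 0 and 1, where higher is better.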

Table 1 Performance comparison of FSA-SC and SMPC

Party	Communication	Computation	Storage	Method
User	$O(m\gamma + nr)$	$O(m^{2} + (m-1)c(r,n))$	$O((\gamma + 2\mu)m + nr)$	SMPC
User	$O(m\gamma' + (1+m)nr)$	$O((m+1)m + c(r,n))$	$O(\gamma + 2\mu + mnr)$	FSA-SC
Server	$O(m^{2}\gamma + nmr)$	$O(m^{2} + (m-\vartheta)(m-1)c(r,n))$	$O(m^{2}\mu + mnr)$	SMPC
Server	$O(m\gamma + nr)$	$O(m + (m-\vartheta)c(r,n))$	$O(m\mu + mnr)$	FSA-SC
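One relationship can be read directly off the server-side communication entries of Table 1, taking the expressions inside the $O(\cdot)$ at face value. The symbols $m$, $n$, $r$, and $\gamma$ are not defined on this page, so the following is purely an algebraic observation about the two table entries and not a claim about measured cost:

```latex
\[
\frac{m^{2}\gamma + nmr}{m\gamma + nr}
  = \frac{m\,(m\gamma + nr)}{m\gamma + nr}
  = m
\]
```

That is, the SMPC server communication expression is exactly $m$ times the FSA-SC one.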

Table 2 Parameter settings

Dataset	Number of clients	Communication rounds	Local iterations	Batch size	Learning rate
MovieLens 1M	20	50	50	256	0.001
Netflix	20	30	50	256	0.001
Amazon	10	30	40	256	0.001
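The settings in Table 2 could be captured in a small configuration object like the one below. The values are copied from the table; the class, field, and function names are hypothetical, and the helper assumes a client takes part in every communication round, which gives an upper bound given that FSA-SC selects only some clients per round.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExperimentConfig:
    num_clients: int
    communication_rounds: int
    local_iterations: int
    batch_size: int
    learning_rate: float

# Values taken from Table 2; the dictionary keys are the dataset names.
CONFIGS = {
    "MovieLens 1M": ExperimentConfig(20, 50, 50, 256, 0.001),
    "Netflix": ExperimentConfig(20, 30, 50, 256, 0.001),
    "Amazon": ExperimentConfig(10, 30, 40, 256, 0.001),
}

def max_local_iterations(cfg: ExperimentConfig) -> int:
    """Upper bound on local iterations for one client across all rounds."""
    return cfg.communication_rounds * cfg.local_iterations
```

For instance, `max_local_iterations(CONFIGS["MovieLens 1M"])` evaluates to 50 rounds times 50 local iterations.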
