AccFed: Federated Learning Acceleration Based on Model Partitioning in Internet of Things

Cao Shaohua; Chen Hui; Chen Shu; Zhang Hanqing; Zhang Weishan

doi:10.11999/JEIT220240


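College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, China

Funds: The National Natural Science Foundation of China (62072469), the Postgraduate Student Innovation Project (YCX2021129), and the Open Project of the State Key Laboratory of Complex System Management and Control, Institute of Automation, Chinese Academy of Sciences (20210114)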

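Author biographies:
Cao Shaohua: Male, associate professor and master's supervisor. His research interests include SDN, cloud computing, and edge computing.

Chen Hui: Male, master's student. His research interests include edge intelligence, federated learning, and SDN.

Chen Shu: Female, master's student. Her research interests include smart cities and 5G.

Zhang Hanqing: Male, master's student. His research interests include computation offloading and data caching in edge computing.

Zhang Weishan: Male, professor and doctoral supervisor. His research interests include big data platforms, pervasive cloud computing, service-oriented computing, and federated learning.

Corresponding author:
Cao Shaohua, shaohuacao@upc.edu.cn

CLC number: TN929.5; TP399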
Publication history
- Received: 2022-03-08
- Revised: 2022-05-11
- Published online: 2022-05-20
- Issue date: 2023-05-10


Abstract

Abstract: With the rapid development of the Internet of Things (IoT), the deep integration of Artificial Intelligence (AI) and Edge Computing (EC) has given rise to Edge AI. However, IoT devices have limited computing and communication resources and often require privacy protection, so accelerating Edge AI while preserving privacy remains a challenge. Federated Learning (FL), an emerging distributed learning paradigm, has great potential for preserving privacy and improving model performance, but its communication and local training are inefficient. To address these challenges, this paper proposes AccFed, an FL acceleration framework. First, a device-edge-cloud collaborative training algorithm based on model partitioning is proposed, which adapts to the network state to accelerate FL local training. Then, a model aggregation algorithm that aggregates after multiple rounds of local iteration is designed to accelerate FL aggregation. Finally, experimental results show that AccFed outperforms the control group in training accuracy, convergence speed, and training time.

Keywords:
- Edge Artificial Intelligence (AI)
- Federated Learning (FL)
- Device-edge-cloud synergy
- Model partitioning

Figure 1 Edge AI in IoT scenarios

Figure 2 The AccFed framework

Figure 3 Schematic of the AlexNet branch network structure

Figure 4 Training accuracy of FedAvg, SplitFed, and AccFed when $k = 3$

Figure 5 Model accuracy of FedAvg, SplitFed, and AccFed when $k = 5$

Figure 6 Model accuracy of FedAvg, SplitFed, and AccFed when $k = 7$

Figure 7 Model accuracy of AccFed before 50 iterations

Figure 8 Training time of FedAvg, SplitFed, and AccFed at 150 iterations

Figure 9 Loss comparison of FedAvg, SplitFed, and AccFed when $k = 3$

Figure 10 Loss comparison of FedAvg, SplitFed, and AccFed when $k = 5$

Figure 11 Loss comparison of FedAvg, SplitFed, and AccFed when $k = 7$

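Table 1 Comparison of AccFed with FL and SL on various metrics

Metric	FL	SL	AccFed
Model construction	Fast	Slow	Fast
Privacy	Moderate	Excellent	Excellent
Computation offloading	No	Yes	Yes
Communication cost	Moderate	High	Low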


Algorithm 1 DPS algorithm
Input: user-required latency latency, input data size $D_{\text{in}}$, branch network topology (including $N_{\text{ex}}$, $N_i$), $f(L_j)$
Output: partition point $p$, minimum latency $T$
(1) while true do
(2)   Monitor the network state via "ping"
(3)   if computation offloading is required then
(4)     if the network state is static then
(5)       for $i = 1:N_{\text{ex}}$ do
(6)         Select the $i$-th exit point
(7)         for $j = 1:N_i$ do
(8)           $\mathrm{TE}_j \leftarrow f_{\text{e}}(L_j)$
(9)           $\mathrm{TD}_j \leftarrow f_{\text{d}}(L_j)$
(10)        end for
(11)        $p \leftarrow \arg\min_p \left( T_{\text{d}} + T_{\text{t}} + T_{\text{e}} \right)$; $T_{i,p} \leftarrow \min_p \left( T_{\text{d}} + T_{\text{t}} + T_{\text{e}} \right)$
(12)        if $T_{i,p} \le$ latency then
(13)          Return $i, p, T_{i,p}$
(14)        end if
(15)      end for
(16)      Return NULL
(17)    else
(18)      $T_{\max} \leftarrow +\infty$
(19)      for $\alpha = 0:\dfrac{T}{\min(T_i)};\ \alpha \leftarrow \alpha + \sigma$ do
(20)        for $\gamma = 0:\dfrac{T}{\min(T_i)};\ \gamma \leftarrow \gamma + \sigma$ do
(21)          Execute lines 4 to 16 and update $T_{\max}$
(22)        end for
(23)        If a value below the threshold is found, shrink the search space
(24)      end for
(25)    end if
(26)  end if
(27) end while
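
The static-network branch of Algorithm 1 (lines 5 to 16) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the per-layer latency lists `td` and `te` stand in for the predictions $f_{\text{d}}(L_j)$ and $f_{\text{e}}(L_j)$, and the names `out_bytes`, `bandwidth`, `best_partition`, and `dps_static` are hypothetical. It also assumes that the transfer time is the payload size divided by the bandwidth, and that nothing is sent when every layer stays on the device.

```python
from typing import List, Optional, Sequence, Tuple

Branch = Tuple[List[float], List[float], List[float]]  # (td, te, out_bytes)


def best_partition(td: Sequence[float], te: Sequence[float],
                   out_bytes: Sequence[float],
                   bandwidth: float) -> Tuple[int, float]:
    """Return (p, T) minimising T_d + T_t + T_e for one exit branch.

    The first p layers run on the device and the rest on the edge.
    out_bytes[p] is the payload sent when cutting after p layers
    (out_bytes[0] is the raw input). All names here are assumptions.
    """
    n = len(td)
    best_p, best_t = 0, float("inf")
    for p in range(n + 1):
        t_device = sum(td[:p])          # layers 1..p on the device
        t_edge = sum(te[p:])            # layers p+1..n on the edge
        # Assumed: no transfer when the whole branch stays on the device.
        t_transfer = 0.0 if p == n else out_bytes[p] / bandwidth
        total = t_device + t_transfer + t_edge
        if total < best_t:
            best_p, best_t = p, total
    return best_p, best_t


def dps_static(branches: Sequence[Branch], bandwidth: float,
               latency: float) -> Optional[Tuple[int, int, float]]:
    """Try exit points in order; return the first (i, p, T) meeting latency."""
    for i, (td, te, out_bytes) in enumerate(branches, start=1):
        p, t = best_partition(td, te, out_bytes, bandwidth)
        if t <= latency:
            return i, p, t
    return None  # corresponds to "Return NULL" in line 16


if __name__ == "__main__":
    demo = [([4.0, 4.0], [1.0, 1.0], [10.0, 2.0])]
    print(dps_static(demo, bandwidth=1.0, latency=7.5))
```

In the demo branch, cutting after the first layer gives the smallest total, because the intermediate output is much smaller than the raw input while the edge is faster than the device.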


Algorithm 2 Device-Edge-Cloud Synergy FL algorithm
Input: number of clients $N$, number of participants $K$, network bandwidth $B$
Output: global model
(1) Randomly select $K$ of the $N$ clients to take part in FL
(2) Run DPS() according to $B$ to obtain $p$
Procedure Device
(3) for each epoch do
(4)   for each batch $b_i$ do
(5)     $O_p \leftarrow \text{Output}\left(b_i, W_{\mathrm{d}}\right)$
(6)     Send the output $O_p$ of the first $p$ layers and the activation function to the edge
(7)     Receive $\nabla L\left(O_p\right)$ from the edge
(8)     $W_{\mathrm{d}} \leftarrow W_{\mathrm{d}} - \eta \cdot \nabla L\left(O_p\right) \cdot \nabla O_p\left(W_{\mathrm{d}}\right)$
(9)     Apply parameter clipping to the change in $W_{\mathrm{d}}$
(10)  end for
(11)  Compute the average change $\delta_{W_{\mathrm{d}}}$ of $W_{\mathrm{d}}$; if $\delta_{W_{\mathrm{d}}}$ decreases, increase the number of local iterations
Procedure Edge
(12) Fetch the latest global model $W_{\mathrm{c}}$ from the cloud
(13) $W_{\mathrm{e}} \leftarrow W_{\mathrm{c}}$
(14) while true do
(15)   Receive $O_p$ and the activation function from the device
(16)   $W_{\mathrm{e}} \leftarrow W_{\mathrm{e}} - \eta \cdot \nabla L\left(W_{\mathrm{e}}\right)$
(17)   Send $\nabla L\left(O_p\right)$ to the device
(18) end while
Procedure Cloud
(19) Initialize $W_{\mathrm{c}}$
(20) for each round do
(21)   Send $W_{\mathrm{c}}$ to the edge
(22)   Receive $W_{\mathrm{d}}$ from the devices
(23)   Run the federated averaging algorithm to update $W_{\mathrm{c}}$
(24)   Clip $W_{\mathrm{c}}$ and determine the standard deviation $\sigma$ of the Gaussian noise
(25)   $W_{\mathrm{c}} \leftarrow W_{\mathrm{c}} + N(0, \sigma^2)$
(26) end for
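
The data flow of Algorithm 2 can be sketched in plain Python under strong simplifications, and the sketch should not be read as the authors' code. The device part and the edge part are each reduced to one scalar weight, and the loss is assumed to be $L = \frac{1}{2}(\hat{y} - y)^2$. The function names, `max_norm`, `noise_multiplier`, and the rule `sigma = noise_multiplier * max_norm / number of models` are all hypothetical choices made for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple


def device_forward(w_d: float, x: float) -> float:
    """Step (5): output O_p of the device-side part (one scalar weight here)."""
    return w_d * x


def edge_step(w_e: float, o_p: float, y: float,
              lr: float) -> Tuple[float, float]:
    """Steps (15)-(17): update W_e and return the gradient w.r.t. O_p."""
    err = w_e * o_p - y            # dL/dy_hat for L = 0.5 * (y_hat - y)^2
    grad_o_p = err * w_e           # uses W_e before the update
    w_e_new = w_e - lr * err * o_p
    return w_e_new, grad_o_p


def device_backward(w_d: float, x: float, grad_o_p: float, lr: float) -> float:
    """Step (8): chain rule, dL/dW_d = dL/dO_p * dO_p/dW_d."""
    return w_d - lr * grad_o_p * x


def clip_by_norm(w: Sequence[float], max_norm: float) -> List[float]:
    """Scale w so that its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(v * v for v in w))
    scale = min(1.0, max_norm / norm) if norm > 0.0 else 1.0
    return [v * scale for v in w]


def cloud_round(models: Sequence[Sequence[float]], counts: Sequence[int],
                max_norm: float, noise_multiplier: float,
                rng: random.Random) -> List[float]:
    """Steps (23)-(25): weighted FedAvg, clipping, then Gaussian noise."""
    total = float(sum(counts))
    dim = len(models[0])
    avg = [sum(c * m[k] for m, c in zip(models, counts)) / total
           for k in range(dim)]
    clipped = clip_by_norm(avg, max_norm)
    # Assumed noise rule; the source only states that sigma is derived.
    sigma = noise_multiplier * max_norm / len(models)
    return [v + rng.gauss(0.0, sigma) for v in clipped]


if __name__ == "__main__":
    o_p = device_forward(1.0, 2.0)
    w_e, g = edge_step(0.5, o_p, 2.0, lr=0.1)
    w_d = device_backward(1.0, 2.0, g, lr=0.1)
    print(w_e, w_d)
    print(cloud_round([[3.0, 4.0], [3.0, 4.0]], [1, 1], 2.5, 1.0,
                      random.Random(0)))
```

Only $O_p$ and $\nabla L(O_p)$ cross the device-edge link in this sketch, which is the property the split is meant to provide: the raw batch never leaves the device.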


Table 2 Device specifications

Device	Memory (GB)	Quantity	Computing capability
Raspberry Pi 3B+	1	3	Weak
Raspberry Pi 4B	8	2	Moderate
Jetson Xavier NX	16	2	Strong
Server	32	1	Strongest

