A Distributed Elastic Net Regression Algorithm for Private Data Analytics in Internet of Things
Abstract: To address the problems caused by traditional centralized data analytics in the Internet of Things (IoT), such as excessive bandwidth occupation, high communication latency, and data privacy leakage, this paper proposes a distributed learning algorithm for elastic net regression, a typical linear regression model. Built on the Alternating Direction Method of Multipliers (ADMM), the algorithm decomposes the elastic net regression objective into several sub-problems that each IoT node can solve independently using its local data. Unlike traditional centralized algorithms, the proposed algorithm does not require an IoT node to upload its private data to the server for training; instead, each node uploads only locally trained intermediate parameters, which the server then aggregates in a simple step. Through several rounds of such collaborative iteration, the server obtains the final model. Experimental results on two typical datasets show that the algorithm converges quickly to the optimal solution within dozens of iterations. Compared with a localized algorithm in which each node trains a model solely on its own local data, the proposed algorithm improves the validity and accuracy of the resulting model; compared with the centralized algorithm, it guarantees the accuracy and scalability of model training while effectively protecting individual private data from leakage.
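For readers unfamiliar with the decomposition, the objective being distributed is the standard elastic net of Zou and Hastie, and ADMM operates on its global-consensus reformulation. The regularization weights $\lambda_1$, $\lambda_2$ below are illustrative, and the notation may differ slightly from the paper's Eqs. (12)-(18), which this excerpt does not reproduce:

```latex
% Standard elastic net objective over the pooled data of N nodes
\min_{\boldsymbol{w},\,b}\;
  \sum_{i=1}^{N}\sum_{j}\frac{1}{2}
  \bigl(y_{ij}-\boldsymbol{w}^{\mathrm{T}}\boldsymbol{x}_{ij}-b\bigr)^{2}
  +\lambda_{1}\lVert\boldsymbol{w}\rVert_{1}
  +\frac{\lambda_{2}}{2}\lVert\boldsymbol{w}\rVert_{2}^{2}

% Equivalent global-consensus form solved by ADMM: each node i keeps a
% local copy (w_i, b_i) that must agree with the global model (w, b)
\begin{aligned}
\min_{\{\boldsymbol{w}_{i},\,b_{i}\},\,\boldsymbol{w},\,b}\;&
  \sum_{i=1}^{N}\sum_{j}\frac{1}{2}
  \bigl(y_{ij}-\boldsymbol{w}_{i}^{\mathrm{T}}\boldsymbol{x}_{ij}-b_{i}\bigr)^{2}
  +\lambda_{1}\lVert\boldsymbol{w}\rVert_{1}
  +\frac{\lambda_{2}}{2}\lVert\boldsymbol{w}\rVert_{2}^{2}\\
\text{s.t.}\;\;& \boldsymbol{w}_{i}=\boldsymbol{w},\quad b_{i}=b,\quad i=1,2,\cdots,N
\end{aligned}
```

In this consensus form, only the local copies $(\boldsymbol{w}_i, b_i)$ touch the node's private data, which is what allows each sub-problem to be solved on-device.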
Table 1  Flow of the PRP conjugate gradient algorithm
Input: feature vectors $\boldsymbol{x}_{ij}$; response variables $y_{ij}$; server-provided parameters $\alpha = \left\{(\boldsymbol{w}^{k+1}, b^{k+1})\right\}$; dual variables $\gamma^k = \{(\gamma_{i,w}^k, \gamma_{i,b}^k)\}$; $b_i^*$
Output: the local optimal solution $\boldsymbol{w}_i^*$ of IoT node $i$
(1) Set the initial iteration count $t = 0$, the initial vector $\boldsymbol{w}_i^0 = \boldsymbol{0}$, the convergence tolerance $\varepsilon = 10^{-5}$, and the initial search direction $\boldsymbol{p}^0 = -g(\boldsymbol{w}_i^0)$;
(2) repeat /* iterate */
(3)  for $j = -1:2:1$
(4)   if $F(\boldsymbol{w}_i^t + \lambda^t \boldsymbol{p}^t) > F(\boldsymbol{w}_i^t + j\boldsymbol{p}^t)$ then
(5)    $\lambda^t \leftarrow j$;
(6)   end if
(7)  end for
(8)  $\boldsymbol{w}_i^{t+1} \leftarrow \boldsymbol{w}_i^t + \lambda^t \boldsymbol{p}^t$;
(9)  $\beta^t \leftarrow \dfrac{g(\boldsymbol{w}_i^{t+1})^{\rm T}\left(g(\boldsymbol{w}_i^{t+1}) - g(\boldsymbol{w}_i^t)\right)}{g(\boldsymbol{w}_i^t)^{\rm T}\, g(\boldsymbol{w}_i^t)}$;
(10) $\boldsymbol{p}^{t+1} \leftarrow -g(\boldsymbol{w}_i^{t+1}) + \beta^t \boldsymbol{p}^t$;
(11) $t \leftarrow t + 1$;
(12) until $\left\| g(\boldsymbol{w}_i^t) \right\| \le \varepsilon$; /* convergence criterion met, stop iterating */
(13) $\boldsymbol{w}_i^* \leftarrow \boldsymbol{w}_i^t$;
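A minimal Python sketch of Table 1 may make the control flow concrete. Here $F$ and $g$ stand for the node's local objective (the paper's Eq. (14), not reproduced in this excerpt) and its gradient, so both are passed in as assumed callables; the initial trial step `lam = 1.0` is also an assumption, since Table 1 does not state how $\lambda^t$ is initialized.

```python
import numpy as np

def prp_conjugate_gradient(F, grad, w0, eps=1e-5, max_iter=1000):
    """Sketch of Table 1: PRP conjugate gradient with the table's coarse
    line search. F and grad are assumed callables for the node's local
    objective (Eq. (14)) and its gradient g."""
    w = np.asarray(w0, dtype=float)
    g = grad(w)
    p = -g                                    # step (1): initial direction
    for _ in range(max_iter):                 # max_iter is a safety guard
        if np.linalg.norm(g) <= eps:          # step (12): stopping rule
            break
        lam = 1.0                             # assumed initial trial step
        for j in (-1.0, 1.0):                 # steps (3)-(7): j = -1:2:1
            if F(w + lam * p) > F(w + j * p):
                lam = j                       # keep the better step size
        w = w + lam * p                       # step (8): iterate update
        g_new = grad(w)
        beta = g_new @ (g_new - g) / (g @ g)  # step (9): PRP coefficient
        p = -g_new + beta * p                 # step (10): new direction
        g = g_new                             # step (11) advances t
    return w                                  # step (13): local optimum w_i*
```

The coarse line search in steps (3)-(7) only compares the two candidate step sizes $j \in \{-1, 1\}$, which keeps each local iteration cheap at the cost of a less precise step.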
Table 2  Flow of the distributed elastic net regression learning algorithm
Input: the sample data of the IoT nodes, i.e., feature vectors $\boldsymbol{x}_{ij}$ and the corresponding dependent variables $y_{ij}$
Output: the final result $\alpha = \left\{(\boldsymbol{w}^*, b^*)\right\}$
(1) Server initialization: $k = 0$, $\overline{\boldsymbol{w}}^0 = \boldsymbol{0}$, $\overline{b}^0 = 0$, $\varepsilon^{\rm rel} = 10^{-2}$, $\varepsilon^{\rm abs} = 10^{-4}$;
(2) Initialization at IoT node $i$: $k = 0$, $\gamma_{i,w}^0 = 0$, $\gamma_{i,b}^0 = 0$;
(3) repeat /* iterate */
(4)  The server aggregates the intermediate parameters $(\boldsymbol{w}_i^k, b_i^k)$ and $(\gamma_{i,w}^k, \gamma_{i,b}^k)$ uploaded by the IoT nodes, computes the averages $\overline{\boldsymbol{w}}^k, \overline{b}^k, \overline{\gamma_w}^k, \overline{\gamma_b}^k$, updates the parameters $(\boldsymbol{w}^{k+1}, b^{k+1})$ according to Eqs. (12) and (13), and broadcasts the result to the IoT nodes;
(5)  Using the server-provided parameters $(\boldsymbol{w}^{k+1}, b^{k+1})$, IoT node $i$ solves problem (14) to obtain $(\boldsymbol{w}_i^{k+1}, b_i^{k+1})$;
(6)  IoT node $i$ updates the dual variables $(\gamma_{i,w}^{k+1}, \gamma_{i,b}^{k+1})$ according to Eqs. (17) and (18);
(7)  IoT node $i$ sends the new intermediate parameters $(\boldsymbol{w}_i^{k+1}, b_i^{k+1})$ and $(\gamma_{i,w}^{k+1}, \gamma_{i,b}^{k+1})$ to the server;
(8)  $k \leftarrow k + 1$;
(9) until $\left\| r^k \right\|_2 \le \varepsilon^{\rm rel}$ and $\left\| s^k \right\|_2 \le \varepsilon^{\rm abs}$; /* convergence criterion met, stop iterating */
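The per-round message pattern of Table 2 can likewise be sketched. Because Eqs. (12), (13), (17), and (18) are not reproduced in this excerpt, the updates below use the textbook consensus-ADMM forms for an elastic-net-regularized consensus variable (a soft-thresholding step at the server, dual ascent at the nodes); `rho`, `lam1`, and `lam2` are illustrative penalty and regularization parameters, not values taken from the paper.

```python
import numpy as np

def soft_threshold(v, kappa):
    """Elementwise soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def server_step(ws, bs, gws, gbs, lam1, lam2, rho):
    """Table 2, step (4): average the parameters uploaded by the N nodes
    and update the global model (w, b). Textbook consensus-ADMM elastic-net
    updates stand in for the paper's Eqs. (12)-(13)."""
    N = len(ws)
    w_bar, gw_bar = np.mean(ws, axis=0), np.mean(gws, axis=0)
    b_bar, gb_bar = np.mean(bs), np.mean(gbs)
    # The l1/l2 regularizers act on the global variable: soft-threshold the
    # averaged point, then shrink by the l2 weight.
    w_new = soft_threshold(N * rho * w_bar + N * gw_bar, lam1) / (lam2 + N * rho)
    b_new = b_bar + gb_bar / rho  # the bias is left unregularized
    return w_new, b_new

def node_dual_step(gw_i, gb_i, w_i, b_i, w_glob, b_glob, rho):
    """Table 2, step (6): dual ascent at node i, the usual consensus-ADMM
    form standing in for Eqs. (17)-(18)."""
    return gw_i + rho * (w_i - w_glob), gb_i + rho * (b_i - b_glob)

# One communication round (Table 2, steps (4)-(8)):
#   1. each node solves its local sub-problem, e.g. with
#      prp_conjugate_gradient above, yielding (w_i, b_i)      -- step (5)
#   2. nodes upload (w_i, b_i) and their duals to the server  -- step (7)
#   3. the server calls server_step and broadcasts (w, b)     -- step (4)
#   4. each node calls node_dual_step to update its duals     -- step (6)
```

Note that only the averaged parameters and duals cross the network; the raw samples $(\boldsymbol{x}_{ij}, y_{ij})$ never leave the node, which is the privacy property the abstract claims.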