An Outlier Cleaning Algorithm Based on Deep Learning
Abstract: Appropriate anomalous-data cleaning algorithms can greatly improve data quality in the Internet of Things (IoT). Many researchers clean spatio-temporally correlated data with statistical methods or with classification and clustering methods. However, these methods require additional prior knowledge and impose extra computational overhead on the sink node. Based on the low-rank and sparse matrix decomposition model, this paper proposes a fast anomalous-data cleaning algorithm built on a deep neural network to clean spatio-temporally correlated data in the IoT. Combining the spatio-temporal correlation of the sensed data with the sparsity of the outliers, the cleaning problem is cast as an optimization problem and solved with the Iterative Shrinkage-Thresholding Algorithm (ISTA); ISTA is then unfolded into a deep neural network of fixed length. Experimental results on a real-world dataset show that the proposed method updates the thresholds automatically, converges faster than traditional ISTA, and achieves higher accuracy.
Table 1 ISTA-Net anomalous data recovery algorithm
Input: measurement matrix $\boldsymbol{R}$, number of deep neural network layers $K$
(1) Initialize $\boldsymbol{S} = \boldsymbol{L} = \boldsymbol{0}$, $\lambda_1 > 0$, $\lambda_2 > 0$
(2) for each sample in the dataset do
(3)     Initialize $\boldsymbol{L}^0$ and $\boldsymbol{S}^0$ as all-zero matrices, $k = 0$
(4)     while $k < K$ do
(5)         $\boldsymbol{G}_{1_k} = \frac{1}{2}\boldsymbol{L}^k - \frac{1}{2}\boldsymbol{S}^k + \frac{1}{2}\boldsymbol{R}$
(6)         $\boldsymbol{G}_{2_k} = \frac{1}{2}\boldsymbol{S}^k - \frac{1}{2}\boldsymbol{L}^k + \frac{1}{2}\boldsymbol{R}$
(7)         $\boldsymbol{L}^{k+1} = \mathrm{SVT}_{\lambda_1/L_f}\left\{\boldsymbol{G}_{1_k}\right\}$
(8)         $\boldsymbol{S}^{k+1} = \mathcal{T}_{\lambda_2/L_f}\left\{\boldsymbol{G}_{2_k}\right\}$
(9)         $k \leftarrow k + 1$
(10)    end while
(11)    Output $\boldsymbol{L}^K$ and $\boldsymbol{S}^K$, and compute the normalized mean square error (NMSE)
(12)    Run the session
(13)    for each neuron in the hidden or output layers do
(14)        Update every weight and bias in the network
(15)    end for
(16) end for
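The inner loop of Table 1 (steps 3 to 11) can be sketched in NumPy as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, $L_f = 2$ is inferred from the $\frac{1}{2}$ coefficients in steps 5 and 6, the thresholds $\lambda_1/L_f$ and $\lambda_2/L_f$ are kept fixed rather than learned, the NMSE is assumed to be the squared Frobenius error divided by the squared Frobenius norm of the reference, and the training steps 12 to 15 are omitted.

```python
import numpy as np


def soft_threshold(x, tau):
    """Element-wise soft-thresholding operator T_tau."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def svt(x, tau):
    """Singular value thresholding: soft-threshold the singular values of x."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    # u * s_thresholded scales the columns of u, then vt maps back.
    return (u * soft_threshold(s, tau)) @ vt


def ista_low_rank_sparse(R, K, lam1, lam2, L_f=2.0):
    """Run K fixed-threshold iterations of steps 5-9 in Table 1.

    L_f = 2.0 is an assumption inferred from the 1/2 coefficients.
    """
    L = np.zeros_like(R, dtype=float)  # low-rank component, L^0 = 0
    S = np.zeros_like(R, dtype=float)  # sparse outlier component, S^0 = 0
    for _ in range(K):
        # Both gradient steps use the iterates from the previous layer.
        G1 = 0.5 * L - 0.5 * S + 0.5 * R
        G2 = 0.5 * S - 0.5 * L + 0.5 * R
        L = svt(G1, lam1 / L_f)
        S = soft_threshold(G2, lam2 / L_f)
    return L, S


def nmse(estimate, truth):
    """Normalized mean square error (assumed Frobenius-norm definition)."""
    return np.linalg.norm(estimate - truth) ** 2 / np.linalg.norm(truth) ** 2
```

A call such as `L, S = ista_low_rank_sparse(R, K=10, lam1=1.0, lam2=0.1)` returns the estimated clean data `L` and the estimated outliers `S` for one sample; the parameter values here are placeholders. In the unfolded network described in the abstract, each pass through the loop corresponds to one layer and the thresholds become trainable parameters instead of constants.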
