多变量时间序列中基于克罗内克压缩感知的缺失数据预测算法

郭艳; 宋晓祥; 李宁; 钱鹏

doi:10.11999/JEIT180541

多变量时间序列中基于克罗内克压缩感知的缺失数据预测算法

doi: 10.11999/JEIT180541 cstr: 32379.14.JEIT180541

陆军工程大学通信工程学院南京 210007

基金项目: 国家自然科学基金(61571463, 61371124, 61472445)；江苏省自然科学基金(BK20171401)

详细信息

作者简介:
郭艳：女，1971年生，教授，研究方向为大数据、信号处理、压缩感知

宋晓祥：男，1993年生，硕士生，研究方向为大数据、压缩感知

李宁：男，1967年生，副教授，研究方向为信号处理、认知无线电

钱鹏：男，1991年生，博士生，研究方向为压缩感知、无源目标定位

通讯作者:
宋晓祥　guoyan_1029@sina.com

中图分类号: TN911.7
计量
- 文章访问数: 2356
- HTML全文浏览量: 994
- PDF下载量: 82
- 被引次数: 0
出版历程
- 收稿日期: 2018-06-01
- 修回日期: 2018-10-29
- 网络出版日期: 2018-11-19
- 刊出日期: 2019-04-01

Missing Data Prediction Based on Kronecker Compressing Sensing in Multivariable Time Series

Institute of Communications Engineering, Army Engineering University, Nanjing 210007, China

Funds: The National Natural Science Foundation of China (61571463, 61371124, 61472445), The Jiangsu Province Natural Science Foundation (BK20171401)

摘要

摘要:
针对现有算法在预测多变量时间序列中的缺失数据时不适用或只适用于缺失数据较少的情况，该文提出一种基于克罗内克压缩感知的缺失数据预测算法。首先，利用多变量时间序列的时域平滑特性和序列之间的潜在相关性从时空两个方面设计了稀疏表示基，从而将缺失数据预测问题建模成稀疏向量恢复问题。模型求解部分，根据缺失数据的位置特点设计了适合当前应用场景且与稀疏表示基相关性低的观测矩阵。接着，从稀疏表示向量是否足够稀疏和感知矩阵是否满足有限等距特性两个方面验证了模型的性能。最后，仿真结果表明，所提算法在数据缺失严重的情况下具有良好的性能。
- 多变量时间序列 /
- 缺失数据 /
- 克罗内克压缩感知 /
- 时域平滑特性 /
- 潜在相关性
Abstract:
In view of the problem that the existing methods are not applicable or are only feasible to the case where only a low ratio of data are missing in multivariable time series, a missing data prediction algorithm is proposed based on Kronecker Compressed Sensing (KCS) theory. Firstly, the sparse representation basis is designed to largely utilize both the temporal smoothness characteristic of time series and potential correlation between multiple time series. In this way, the missing data prediction problem is modeled into the problem of sparse vector recovery. In the solution part of the model, according to the location of missing data, the measurement matrix is designed suitable for the current application scenario and low correlation with the sparse representation basis. Then, the validity of the model is verified from two aspects: Whether the sparse representation vector is sufficiently sparse and the sensing matrix satisfies the restricted isometry property. Simulation results show that the proposed algorithm has good performance in the case where a high ratio of data are missing.
- Multivariable time series /
- Missing data /
- Kronecker Compressing Sensing (KCS) /
- Temporal smoothness characteristic /
- Potential correlation

HTML全文

图 1 MOTES中数据缺失类型的影响

下载: 全尺寸图片幻灯片

图 2 GSA中数据缺失类型的影响

下载: 全尺寸图片幻灯片

图 3 SST中数据缺失类型的影响

下载: 全尺寸图片幻灯片

图 4 数据集特性对算法性能的影响

下载: 全尺寸图片幻灯片

表 1 多变量时间序列

数据源	t₁	t₂	t₃	t₄	$\cdots$	t_K
s₁	0.2	?	?	0.3		1.9
s₂	?	0.4	0.5	?		?
s₃	?	0.7	?	0.6		?
s₄	0.8	?	?	?		2.1
$ \vdots $	$ \vdots $	$ \vdots $	$ \vdots $	$ \vdots $		$ \vdots $
s_N	?	0.1	1.2	1.3		?

下载: 导出CSV

表 2 稀疏表示基稀疏性能比较

n	100	200	300	500
MOTES_1	0.9897	0.9952	0.9973	0.9992
MOTES_2	0.7428	0.8731	0.9301	0.9786
GSA_1	0.9687	0.9995	0.9998	0.9999
GSA_2	0.5227	0.6841	0.9018	0.9721
SST_1	0.8759	0.9429	0.9722	0.9935
SST_2	0.7390	0.8785	0.9393	0.9849

下载: 导出CSV

表 3 稀疏表示基和观测矩阵非相关性的大小

NK	2000	3000	4000	6000	8000
$I({ {{ψ}}_R},{ {φ}_{{\rm{KCS}}1}})$	1998	2995	3997	5996	7993
$I({ {{ψ}}_R},{ {φ}_{{\rm{KCS}}2}})$	1998	2999	4000	5999	8000
$I({ {{ψ}}_E},{ {φ}_{{\rm{KCS}}1}})$	1999	2996	3997	5999	7995
$I({ {{ψ}}_E},{ {φ}_{{\rm{KCS}}2}})$	1999	2999	3999	6000	8000
$I({ {{ψ}}_C},{ {φ}_{{\rm{KCS}}1}})$	1997	2996	3997	5995	7995
$I({ {{ψ}}_C},{ {φ}_{{\rm{KCS}}2}})$	1998	2999	3998	5999	7998

下载: 导出CSV

表 4 各方法在不同数据缺失率下的性能比较

		数据缺失率(%)
		20	50	80	90	95
MOTES	KCS-GAMP-SBL	0.0266	0.0356	0.0840	0.1101	0.1509
	TDMF	0.0287	0.0871	0.1962	0.2604	0.3684
	SI	0.0862	0.1604	1.0172	1.7662	3.5204
	RNN	0.0392	0.0951	0.8875	0.9174	1.0529
	KPPCA	0.0278	0.0608	0.1826	0.2769	0.4516
GSA	KCS-GAMP-SBL	0.0357	0.0460	0.1985	0.2769	0.3304
	TDMF	0.0548	0.1278	0.3557	0.4764	0.5688
	SI	0.0935	0.1455	0.8348	2.9442	4.6315
	RNN	0.0526	0.0945	1.0858	1.1639	1.7984
	KPPCA	0.0498	0.1012	0.3892	0.4879	0.5872
SST	KCS-GAMP-SBL	0.0181	0.0327	0.0956	0.1225	0.1767
	TDMF	0.0242	0.0544	0.1788	0.2374	0.2961
	SI	0.0485	0.0851	0.7451	1.3802	2.0596
	RNN	0.0262	0.0488	0.3417	0.5896	1.0421
	KPPCA	0.0226	0.0644	0.1736	0.2705	0.2909

下载: 导出CSV

表 5 不同算法平均运行时间(ART)的比较(s)

	SI	RNN	KPPCA	TDMF	KCS-GAMP-SBL
MOTES	0.7350	3.1832	11.5781	8.4362	3.0802
GSA	1.5361	7.0320	24.8685	20.4093	15.8633
SST	0.7216	2.5216	10.8683	8.9381	2.9032

下载: 导出CSV

表 6 稀疏表示基的选择对算法均方根误差(RMSE)的影响

	数据缺失率(%)
	20	50	80	90	95
MOTES_1	0.0266	0.0356	0.0930	0.1201	0.1509
MOTES_2	0.0312	0.0422	0.1151	0.1387	0.1954
GSA_1	0.0357	0.0460	0.1985	0.2769	0.3304
GSA_2	0.0412	0.0681	0.2313	0.3343	0.4068
SST_1	0.0181	0.0327	0.0996	0.1275	0.1767
SST_2	0.0199	0.0340	0.1265	0.1534	0.2463

下载: 导出CSV

参考文献(21)

SOWMYA R and SUNEETHA K R. Data mining with big data[C]. International Conference on Intelligent Systems and Control, Coimbatore, India, 2017: 246–250. doi: 10.1109/ISCO.2017.7855990.

JAQUES N, TAYLOR S, SANO A, et al. Multimodal autoencoder: A deep learning approach to filling in missing sensor data and enabling better mood prediction[C]. International Conference on Affective Computing & Intelligent Interaction, San Antonio, USA, 2017: 202–208. doi: 10.1109/ACII.2017.8273601.

BALOUJI E, SALOR Q, and ERMIS M. Exponential smoothing of multiple reference frame components with GPUs for real-time detection of time-varying harmonics and interharmonics of EAF currents[C]. IEEE Industry Applications Society Meeting, Cincinatti, USA, 2017: 1–8. doi: 10.1109/IAS.2017.8101815.

KOZERA R and WILKOLAZKA M. Natural spline interpolation and exponential parameterization for length estimation of curves[C]. International Conference of Numerical Analysis & Applied Mathematics, Rhodes, Greece, 2017: 1–140.

LAO Wenchao, WANG Ying, CHEN Peng, et al. Time series forecasting via weighted combination of trend and seasonality respectively with linearly declining increments and multiple sine functions[C]. International Joint Conference on Neural Networks, Beijing, China, 2014: 832–837. doi: 10.1109/IJCNN.2014.6889609.

LIPPI M, BERTINI M, and FRASCONI P. Short-term traffic flow forecasting: An experimental comparison of time-series analysis and supervised learning[J]. IEEE Transactions on Intelligent, Transportation, System, 2013, 14(2): 871–882 doi: 10.1109/TITS.2013.2247040

STRAUMAN A S, BIANCHI F M, and MIKALSEN K Ø. Classification of postoperative surgical site infections from blood measurements with missing data using recurrent neural networks[C]. International Conference on Biomedical & Health Informatics, Las Vegas, USA, 2018: 307–310. doi: 10.1109/BHI.2018.8333430.

LI Li, LI Yuebiao, and LI Zhiheng. Efficient missing data imputing for traffic flow by considering temporal and spatial dependence[J]. Transportation Research Part C, 2013, 34(9): 108–120.

LI Yuebiao, LI Zhiheng, LI Li, et al. Comparison on PPCA, KPPCA and MPPCA based missing data imputing for traffic flow[C]. Proceedings of IEEE Conference on Intelligent Transportation System, Wuhan, China, 2013: 1535–1540.

SHI Weiwei, ZHU Yongxin, and HUANG Tian. Effective prediction of missing data on apache spark over multivariable time series[J]. IEEE Transactions on Big Data, 2018 doi: 10.1109/TBDATA.2017.2719703

HADI A and WAHIDAH I. Delay estimation using compressive sensing on WSN IEEE 802.15.4[C]. International Conference on Control, Electronics, Renewable Energy and Communications, Bandung, Indonesia, 2017: 192–197. doi: 10.1109/ICCEREC.2016.7814975.

DUARTEAND M F and BARANIUK R G. Kronecker compressive sensing[J]. IEEE Transactions on Image Processing, 2012, 21(2): 494–504 doi: 10.1109/TIP.2011.2165289

ZHOU Haifei, TAN Liangsheng, GE Fei, et al. Traffic matrix estimation: Advanced-Tomogravity method based on a precise gravity model[J]. International Journal of Communication Systems, 2015, 28(10): 1709–1728 doi: 10.1002/dac.2787

CHEN S S, DONOHO D L, and SAUNDERS M A. Atomic decomposition by basis pursuit[J]. SIAM Review, 2001, 43(1): 129–159 doi: 10.1137/S003614450037906X

TROPP J A and GILBERT A C. Signal recovery from random measurements via orthogonal matching pursuit[J]. IEEE Transactions on Information Theory, 2007, 53(12): 4655–4666 doi: 10.1109/TIT.2007.909108

CHARTRAND R and YIN W. Iteratively reweighted algorithms for compressive sensing[C]. IEEE International Conference on Acoustics, Speech and Signal Processing, Las Vegas, NV, USA, 2008: 3869–3872. doi: 10.1109/ICASSP.2008.4518498.

AL-SHOUKAIRI M, SCHNITER P, and RAO B D. A GAMP based low complexity sparse Bayesian learning algorithm[J]. IEEE Transactions on Signal Processing, 2018, 66(2): 294–308 doi: 10.1109/TSP.2017.2764855

SAMUEL M. Intel lab data[OL]. http://db.csail.mit.edu, 2016.

FONOLLOSA J, SHEIK S, HUERTA R, et al. Reservoir computing compensates slow response of chemo sensor arrays exposed to fast varying gas concentrations in continuous monitoring[J]. Sensors & Actuators B Chemical, 2015, 215: 618–629.

Tropical Atmosphere Ocean. NOAA/Pacific Marine Environmental Laboratory[OL]. http://www.pmel.noaa.gov/tao/proj_over/proj_over.html, 2016.

WU Xiaopei and LIU Mingyan. In-situ soil moisture sensing: Measurement scheduling and estimation using compressive sensing[C]. ACM/IEEE, International Conference on Information Processing in Sensor Networks, Beijing, China, 2012: 1–11. doi: 10.1109/IPSN.2012.6920949.

施引文献

资源附件(0)

访问统计