Multi-Scale Attention Recurrent Network with Multi-order Taylor Differential Knowledge for Deep Spatiotemporal Sequence Prediction

SUN Qiang; ZHAO Ke

doi:10.11999/JEIT231108

Volume 46 Issue 6

Jun. 2024


SUN Qiang, ZHAO Ke. Multi-Scale Attention Recurrent Network with Multi-order Taylor Differential Knowledge for Deep Spatiotemporal Sequence Prediction[J]. Journal of Electronics & Information Technology, 2024, 46(6): 2605-2618. doi: 10.11999/JEIT231108



cstr: 32379.14.JEIT231108


Department of Communication Engineering, School of Automation and Information Engineering, Xi’an University of Technology, Xi’an 710048, China

Funds: The Open Research Fund of Key Laboratory of Ecology and Environment in Qinling and Loess Plateau of Shaanxi Meteorological Bureau (2021G-28)

Received Date: 2023-10-11
Rev Recd Date: 2024-04-29

Available Online: 2024-05-15

Publish Date: 2024-06-30

Abstract


Deep spatiotemporal sequence prediction methods that incorporate prior physical knowledge commonly model the dynamics with Partial Differential Equations (PDEs). However, two main issues remain: (1) the PDE approximations have limited precision; and (2) the recurrent network cannot efficiently capture spatiotemporal features at multiple spatial scales or the edge spatial information of the spatiotemporal sequences. To address these challenges, a Taylor Differential Incorporated Convolutional Recurrent Neural Network (TDI-CRNN) is proposed in this paper. Firstly, to enhance the approximation accuracy of higher-order partial differential equations and to alleviate the limitations of PDE applications, a physical module with multi-order Taylor approximation is designed. The module first approximates the differentials of the input sequence by means of the Taylor expansion, then couples the differential convolution layers of different orders via differential coefficients, and dynamically adjusts the truncation order and the number of differential terms of the Taylor expansion. Secondly, to capture the multi-scale spatial features of the hidden states in the recurrent network and to better capture the edge spatial information of the spatiotemporal sequences, a Multi-Scale Attention Recurrent Module (MSARM) is devised. Multi-scale convolution and spatial attention mechanisms are used in the convolution layer of the Multi-scale Convolution Spatial Attention UNet (MCSA-UNet) to focus on local spatial regions within the spatiotemporal sequences. Extensive experiments are conducted on the Moving MNIST, KTH, and CIKM datasets. On the Moving MNIST dataset, the Mean Squared Error (MSE) drops to 42.7, while the Structural Similarity Index Measure (SSIM) increases to 0.912. On the KTH dataset, the SSIM and Peak Signal-to-Noise Ratio (PSNR) increase to 0.882 and 29.03, respectively. On the real weather radar echo dataset CIKM, the Critical Success Index (CSI) increases to 0.515. The visualization and quantitative prediction results verify the rationality and effectiveness of the TDI-CRNN model.
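
As a rough illustration of why the truncation order of a Taylor expansion matters for differential approximation, the sketch below compares two textbook central-difference stencils for a first derivative. It is not the TDI-CRNN physical module: the coefficients are the standard fixed ones rather than learned differential convolution kernels, the test function (a sine wave) is chosen only for convenience, and the names `STENCILS`, `first_derivative`, and `max_error` are hypothetical.

```python
import math

# Central-difference weights for d/dx, obtained by truncating Taylor
# expansions at different orders. These are standard textbook values
# (an assumption for illustration), not kernels taken from the paper.
STENCILS = {
    2: [-1 / 2, 0.0, 1 / 2],                    # truncation error O(h^2)
    4: [1 / 12, -2 / 3, 0.0, 2 / 3, -1 / 12],   # truncation error O(h^4)
}


def first_derivative(f, x, h, order):
    """Approximate f'(x) with the central stencil of the given accuracy order."""
    weights = STENCILS[order]
    radius = len(weights) // 2
    # Weighted sum of samples at x - radius*h, ..., x + radius*h.
    total = sum(w * f(x + (k - radius) * h) for k, w in enumerate(weights))
    return total / h


def max_error(order, h, n=50):
    """Largest absolute error of the stencil for f = sin, sampled on [0, 2*pi)."""
    points = [2 * math.pi * i / n for i in range(n)]
    return max(
        abs(first_derivative(math.sin, x, h, order) - math.cos(x))
        for x in points
    )


if __name__ == "__main__":
    for order in (2, 4):
        print(f"order {order}: max error = {max_error(order, h=0.1):.2e}")
```

With the same step size, the fourth-order stencil keeps more Taylor terms and its error is markedly smaller than that of the second-order stencil, which is the general effect a higher truncation order is meant to exploit.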
Keywords

- Spatiotemporal sequence prediction
- Long Short-Term Memory (LSTM)
- Knowledge-guided
- Partial differential equation
- Taylor expansion

