YANG Zhenzhen, XU Yi, WANG Chengye, YANG Yongpeng. Multi-scale Frequency Adapter and Dual-path Attention for Time Series Forecasting[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT251188

Multi-scale Frequency Adapter and Dual-path Attention for Time Series Forecasting

doi: 10.11999/JEIT251188 cstr: 32379.14.JEIT251188
Funds:  the National Natural Science Foundation of China (No. 62571269), and the Postgraduate Research & Practice Innovation Program of Jiangsu Province (Nos. KYCX24_1125, SJCX24_0279)
  • Accepted Date: 2026-01-13
  • Rev Recd Date: 2026-01-13
  • Available Online: 2026-03-06
  •   Objective  With the rapid development of big data technology, time series data have been increasingly applied in areas such as meteorology, power systems, and finance. Nonetheless, mainstream methods for time series forecasting face notable challenges in multi-scale modeling and frequency-domain feature extraction, which prevents the comprehensive capture of crucial dynamic properties and periodic patterns in complex datasets. Traditional statistical approaches, including ARIMA, rely on assumptions of linear relationships, resulting in poor performance on nonlinear or high-dimensional time series data. Although deep learning methods, notably those based on convolutional neural networks and Transformers, have improved forecasting accuracy through advanced feature extraction and long-range dependency modeling, limitations remain in their ability to efficiently extract and fuse multi-scale features in both the temporal and frequency domains. These deficiencies lead to instability and suboptimal accuracy, particularly in dynamic and highly variable applications. This paper addresses these challenges by proposing an intelligent forecasting framework that effectively models multi-scale information and enhances prediction accuracy in diverse scenarios.  Methods  The proposed method introduces a multi-scale frequency adapter and dual-path attention (MFADA) framework for time series forecasting. The framework integrates two key modules: the multi-scale frequency adapter (MFA) and the multi-scale dual-path attention (MDA). The MFA module efficiently captures multi-scale frequency features using adaptive pooling and depthwise convolutions, which enhances sensitivity to various frequency components and supports modeling of both short-term and long-term dependencies.
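The multi-scale adaptive pooling idea behind the MFA module can be illustrated with a minimal sketch: a 1-D series is average-pooled to several coarser resolutions, each branch is upsampled back, and the branches are fused. This is only an illustration of multi-scale pooling under assumed scales and simple average fusion; the function names, scale choices, and fusion rule are hypothetical, not the authors' implementation.

```python
def adaptive_avg_pool(series, out_len):
    """Average-pool a 1-D sequence down to out_len buckets (PyTorch-style bins)."""
    n = len(series)
    pooled = []
    for i in range(out_len):
        start = (i * n) // out_len
        end = ((i + 1) * n) // out_len
        bucket = series[start:end] or series[start:start + 1]
        pooled.append(sum(bucket) / len(bucket))
    return pooled

def upsample(pooled, n):
    """Nearest-neighbour upsample back to the original length n."""
    m = len(pooled)
    return [pooled[(i * m) // n] for i in range(n)]

def multiscale_pyramid(series, scales=(4, 8, 16)):
    """Pool the series at several scales, upsample, and fuse by averaging."""
    n = len(series)
    branches = [upsample(adaptive_avg_pool(series, s), n) for s in scales]
    return [sum(vals) / len(branches) for vals in zip(*branches)]

x = [float(i % 8) for i in range(32)]   # toy periodic series
fused = multiscale_pyramid(x)
print(len(fused))  # 32
```

Coarser scales in this sketch emphasize long-term trends while finer scales retain short-term detail, which is the intuition behind modeling both short- and long-term dependencies in one pass.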
The MDA module applies a multi-scale attention mechanism to strengthen fine-grained modeling across both the temporal and feature dimensions, enabling effective extraction and fusion of comprehensive time- and frequency-domain information. The entire framework is designed with computational efficiency in mind to ensure scalability. Experimental validation on 8 public datasets demonstrates superior performance and robustness compared with existing mainstream time series forecasting approaches.  Results and Discussions  Extensive experiments were conducted on 8 publicly available multivariate datasets: ECL, Weather, ETT (ETTm1, ETTm2, ETTh1, ETTh2), Solar-Energy, and Traffic. The evaluation metrics were mean absolute error (MAE) and mean squared error (MSE), with additional consideration given to parameter count, FLOPs, and training time for computational efficiency. Experimental comparisons with state-of-the-art models, including Fredformer, Peri-midFormer, iTransformer, TFformer, PatchTST, MSGNet, TimesNet, and TCM, show that the proposed MFADA consistently achieves superior forecasting performance across most datasets and forecasting horizons (Table 1), with the best average MSE and MAE of 0.163 and 0.261 on ECL, a 13.2% and 17.3% decrease versus TimesNet at forecasting length 96. On the periodic ETTm1 dataset, the average MSE reaches 0.377, outperforming MSGNet by 5.3%. Ablation studies (Table 2) demonstrate the importance of both the MFA and MDA modules: removing MFA or reverting MDA to standard self-attention increases error rates on ECL, Weather, ETTh1, and ETTh2, indicating their synergistic contribution to modeling complex dynamics. Complexity analysis (Fig. 2) reveals that MFADA achieves the best balance among forecasting accuracy, parameter efficiency, and training time, outperforming Fredformer, MSGNet, and TimesNet. Visualization results for ECL and ETTh2 (Fig. 3, Fig. 4) confirm the ability of MFADA to track ground-truth trends, forecast turning points, and outperform baselines in both global and local prediction fidelity. Notably, MFADA's performance lags on the Traffic dataset because of its strong spatial correlations, highlighting spatial structure integration as a future direction.  Conclusions  This paper proposes MFADA, a novel time series forecasting method integrating multi-scale frequency adaptation and dual-path attention mechanisms. MFADA stands out with four key strengths: (1) the MFA module effectively extracts and merges multi-scale frequency-domain features, emphasizing diverse temporal scales through pyramid pooling and channel gating; (2) the MDA module captures multi-scale dependencies along both the temporal and feature dimensions, enabling fine-grained dynamic modeling; (3) the architecture maintains computational efficiency through lightweight convolution and pooling operations; (4) superior results across 8 datasets and various forecasting lengths demonstrate robust generalization, especially for multivariate and long-term forecasting scenarios. The extensive experiments confirm that MFADA advances the state of the art in accurate and efficient time series forecasting, offering promising perspectives for both academic research and practical deployment. Future work will explore spatial correlation integration to further enhance model applicability.
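The dual-path idea described above, attention along both the temporal and the feature axes, can be sketched as plain scaled dot-product self-attention applied to each axis of a (time, channels) window, with the two paths fused by averaging. This is a generic single-head, weight-free sketch of the dual-path concept, not the MDA module itself; the multi-scale branching and learned projections are omitted for brevity.

```python
import numpy as np

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    """Scaled dot-product self-attention over the first axis of x (rows attend to rows)."""
    d = x.shape[-1]
    scores = softmax(x @ x.T / np.sqrt(d), axis=-1)
    return scores @ x

def dual_path_attention(x):
    """Attend along time (rows) and along features (columns), then fuse by averaging.
    x: (T, C) window of a multivariate series."""
    temporal = self_attention(x)        # (T, C): each time step attends to all steps
    feature = self_attention(x.T).T     # (T, C): each channel attends to all channels
    return 0.5 * (temporal + feature)

rng = np.random.default_rng(0)
x = rng.standard_normal((96, 7))        # e.g. a 96-step window with 7 variables
y = dual_path_attention(x)
print(y.shape)  # (96, 7)
```

Running attention over the feature axis lets the model mix information across variables, while the temporal path captures dependencies across time steps; fusing the two gives each output element context from both directions.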
  • [1]
    KONG Xiangjie, CHEN Zhenghao, LIU Weiyao, et al. Deep learning for time series forecasting: A survey[J]. International Journal of Machine Learning and Cybernetics, 2025, 16(5): 5079–5112. doi: 10.1007/s13042-025-02560-w.
    [2]
    ZHONG Weiyi, ZHAI Dengshuai, XU Wenran, et al. Accurate and efficient daily carbon emission forecasting based on improved ARIMA[J]. Applied Energy, 2024, 376: 124232. doi: 10.1016/j.apenergy.2024.124232.
    [3]
    PAN Jinwei, WANG Yiqiao, ZHONG Bo, et al. Statistical feature-based search for multivariate time series forecasting[J]. Journal of Electronics & Information Technology, 2024, 46(8): 3276–3284. doi: 10.11999/JEIT231264.
    [4]
    DA SILVA D G and DE MOURA MENESES A A M. Comparing long short-term memory (LSTM) and bidirectional LSTM deep neural networks for power consumption prediction[J]. Energy Reports, 2023, 10: 3315–3334. doi: 10.1016/j.egyr.2023.09.175.
    [5]
    ZHENG Qinghe, LI Binglin, YU Zhiguo, et al. Research progress of deep learning enabled automatic modulation classification technology[J]. Journal of Electronics & Information Technology, 2025, 47(11): 4096–4111. doi: 10.11999/JEIT250674.
    [6]
    LIU Hui, FENG Haoran, MA Jiani, et al. Spatial self-attention incorporated imputation algorithm for severely missing multivariate time series[J]. Journal of Electronics & Information Technology, 2025, 47(10): 3917–3928. doi: 10.11999/JEIT250220.
    [7]
    RABBANI M B A, MUSARAT M A, ALALOUL W S, et al. A comparison between seasonal autoregressive integrated moving average (SARIMA) and exponential smoothing (ES) based on time series model for forecasting road accidents[J]. Arabian Journal for Science and Engineering, 2021, 46(11): 11113–11138. doi: 10.1007/s13369-021-05650-3.
    [8]
    WU Haixu, HU Tengge, LIU Yong, et al. TimesNet: Temporal 2D-variation modeling for general time series analysis[C]. Proceedings of the 11th International Conference on Learning Representations, Kigali, Rwanda, 2023.
    [9]
    COUTINHO E R, MADEIRA J G F, BORGES D G F, et al. Multi-step forecasting of meteorological time series using CNN-LSTM with decomposition methods[J]. Water Resources Management, 2025, 39(7): 3173–3198. doi: 10.1007/s11269-025-04102-z.
    [10]
    CAI Wanlin, LIANG Yuxuan, LIU Xianggen, et al. MSGNet: Learning multi-scale inter-series correlations for multivariate time series forecasting[C]. Proceedings of the 38th AAAI Conference on Artificial Intelligence, Vancouver, Canada, 2024: 11141–11149. doi: 10.1609/aaai.v38i10.28991.
    [11]
    YUNITA A, PRATAMA M H D I, ALMUZAKKI M Z, et al. Performance analysis of neural network architectures for time series forecasting: A comparative study of RNN, LSTM, GRU, and hybrid models[J]. MethodsX, 2025, 15: 103462. doi: 10.1016/j.mex.2024.103462.
    [12]
    YADAV H and THAKKAR A. NOA-LSTM: An efficient LSTM cell architecture for time series forecasting[J]. Expert Systems with Applications, 2024, 238: 122333. doi: 10.1016/j.eswa.2023.122333.
    [13]
    UBAL C, DI-GIORGI G, CONTRERAS-REYES J E, et al. Predicting the long-term dependencies in time series using recurrent artificial neural networks[J]. Machine Learning and Knowledge Extraction, 2023, 5(4): 1340–1358. doi: 10.3390/make5040068.
    [14]
    ZENG Ailing, CHEN Muxi, ZHANG Lei, et al. Are transformers effective for time series forecasting?[C]. Proceedings of the 37th AAAI Conference on Artificial Intelligence, Washington, USA, 2023: 11121–11128. doi: 10.1609/aaai.v37i9.26317.
    [15]
    JIANG Hongwei, LIU Dongsheng, DING Xinyi, et al. TCM: An efficient lightweight MLP-based network with affine transformation for long-term time series forecasting[J]. Neurocomputing, 2025, 617: 128960. doi: 10.1016/j.neucom.2024.128960.
    [16]
    ZHOU Haoyi, ZHANG Shanghang, PENG Jieqi, et al. Informer: Beyond efficient transformer for long sequence time-series forecasting[C]. Proceedings of the 35th AAAI Conference on Artificial Intelligence, Palo Alto, USA, 2021: 11106–11115. doi: 10.1609/aaai.v35i12.17325.
    [17]
    WU Haixu, XU Jiehui, WANG Jianmin, et al. Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting[C]. Proceedings of the 35th Conference on Neural Information Processing Systems, Red Hook, USA, 2021: 22419–22430.
    [18]
    ZHOU Tian, MA Ziqing, WEN Qingsong, et al. Fedformer: Frequency enhanced decomposed transformer for long-term series forecasting[C]. Proceedings of the International Conference on Machine Learning, Baltimore, USA, 2022: 27268–27286.
    [19]
    NIE Yuqi, NGUYEN N H, SINTHONG P, et al. A time series is worth 64 words: Long-term forecasting with transformers[C]. Proceedings of the 11th International Conference on Learning Representations, Kigali, Rwanda, 2023.
    [20]
    WU Qiang, YAO Gechang, FENG Zhixi, et al. Peri-midFormer: Periodic pyramid transformer for time series analysis[C]. Proceedings of the 38th International Conference on Neural Information Processing Systems, Vancouver, Canada, 2024: 415. doi: 10.52202/079017-0415.
    [21]
    LIU Yong, HU Tengge, ZHANG Haoran, et al. iTransformer: Inverted transformers are effective for time series forecasting[C]. Proceedings of the 12th International Conference on Learning Representations, Vienna, Austria, 2024.
    [22]
    ZHAO Tianlong, FANG Lexin, MA Xiang, et al. TFformer: A time-frequency domain bidirectional sequence-level attention based transformer for interpretable long-term sequence forecasting[J]. Pattern Recognition, 2025, 158: 110994. doi: 10.1016/j.patcog.2024.110994.
    [23]
    ZHOU Tian, NIU Peisong, WANG Xue, et al. One fits all: Power general time series analysis by pretrained LM[C]. Proceedings of the 37th International Conference on Neural Information Processing Systems, New Orleans, USA, 2023: 1877.
    [24]
    PIAO Xihao, CHEN Zheng, MURAYAMA T, et al. Fredformer: Frequency debiased transformer for time series forecasting[C]. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Barcelona, Spain, 2024: 2400–2410. doi: 10.1145/3637528.3671928.
    [25]
    GAO Shixuan, ZHANG Pingping, YAN Tianyu, et al. Multi-scale and detail-enhanced segment anything model for salient object detection[C]. Proceedings of the 32nd ACM International Conference on Multimedia, Melbourne, Australia, 2024: 9894–9903. doi: 10.1145/3664647.3680650.
    [26]
    SI Yunzhong, XU Huiying, ZHU Xinzhong, et al. SCSA: Exploring the synergistic effects between spatial and channel attention[J]. Neurocomputing, 2025, 634: 129866. doi: 10.1016/j.neucom.2025.129866.