Document Type : Research Article

Author

Allameh Tabatabai University

DOI: 10.22054/jmmf.2025.83410.1157

Abstract

In this paper, we evaluate the performance of two machine learning architectures, Recurrent Neural Networks (RNNs) and Transformer-based models, on four commodity-based company indices from the Tehran Stock Exchange. The Transformer-based models used in this study are Autoformer, FEDformer, Informer, and PatchTST; the RNN-based models are GRU and LSTM. The dataset comprises daily observations from April 20, 2020, to November 20, 2024. To improve the generalization of the models and prevent overfitting, we employ two techniques: splitting the data into training and test samples, and applying regularization methods such as dropout. Hyperparameters for all models were selected by visual inspection. Our results indicate that the PatchTST model outperforms the other methods in terms of Root Mean Squared Error (RMSE) at both the 1-day and 5-day (1-week) forecasting horizons. The FEDformer model also performs well, particularly on the MetalOre time series. In contrast, the Autoformer model performs relatively poorly at longer horizons, while the GRU and LSTM models yield mixed results. These findings underscore the strong influence of model choice and forecasting horizon on forecast accuracy, and emphasize the importance of careful model selection and hyperparameter tuning for achieving optimal performance.
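
To make the evaluation protocol summarized above concrete, the following is a minimal sketch in Python with PyTorch: a chronological train/test split, a dropout-regularized GRU, and RMSE reported at the 1-day and 5-day horizons. The 80/20 split ratio, 30-day input window, network sizes, and the synthetic series are illustrative assumptions, not values reported in the paper.

import numpy as np
import torch
import torch.nn as nn

HORIZON = 5    # forecast 5 steps ahead (1-week horizon); assumption
LOOKBACK = 30  # input window length; assumption

class GRUForecaster(nn.Module):
    def __init__(self, hidden=64, dropout=0.2):
        super().__init__()
        # dropout between stacked GRU layers acts as the regularizer
        self.gru = nn.GRU(1, hidden, num_layers=2, dropout=dropout, batch_first=True)
        self.head = nn.Linear(hidden, HORIZON)

    def forward(self, x):                  # x: (batch, LOOKBACK, 1)
        out, _ = self.gru(x)
        return self.head(out[:, -1, :])    # predict the next HORIZON values

def make_windows(series):
    # Slice a 1-D series into (input window, future horizon) pairs.
    X, y = [], []
    for i in range(len(series) - LOOKBACK - HORIZON + 1):
        X.append(series[i : i + LOOKBACK])
        y.append(series[i + LOOKBACK : i + LOOKBACK + HORIZON])
    X = torch.tensor(np.array(X), dtype=torch.float32).unsqueeze(-1)
    y = torch.tensor(np.array(y), dtype=torch.float32)
    return X, y

def rmse(pred, target):
    return torch.sqrt(torch.mean((pred - target) ** 2)).item()

# Chronological split: the test block strictly follows the training block,
# so no future information leaks into training.
series = np.cumsum(np.random.randn(1200))  # stand-in for a daily index level
split = int(0.8 * len(series))
X_tr, y_tr = make_windows(series[:split])
X_te, y_te = make_windows(series[split:])

model = GRUForecaster()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(20):                        # short full-batch training loop
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(X_tr), y_tr)
    loss.backward()
    opt.step()

model.eval()                               # disables dropout for evaluation
with torch.no_grad():
    pred = model(X_te)
print("1-day RMSE:", rmse(pred[:, 0], y_te[:, 0]))
print("5-day RMSE:", rmse(pred[:, 4], y_te[:, 4]))

In the study itself, the same protocol would be repeated for each of the six models and each of the four commodity indices, with the RMSE values compared across the two horizons.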

Keywords
