Data Warehousing and Data Science

16 February 2022

Forecasting time series: using statistics vs machine learning

Filed under: Data Science,Machine Learning — Vincent Rainardi @ 6:59 am

This article outlines how ARIMA and LSTM are used for forecasting time series, and discusses which one is better.
There are many references at the end of this article for those who would like to find out more.


In ML, we use regression to predict the values of a variable (y) based on the values of other variables (x1, x2, x3, …). For example, we predict the stock price of a company based on its financial ratios, fundamentals and ESG factors.

In time series forecasting, we predict the values of a variable in the future, based on the values of that variable in the past. For example, we predict the stock price of a company, based on the past prices.

A time series is a sequence of numbers, each collected at a fixed interval of time (for example, daily or monthly).

How do we forecast a time series? There are 2 ways: a) using statistics, b) using machine learning. In this article I’ll give a brief explanation of both. But before that, let’s clear up one thing first: is “time series” plural or singular?

Time Series: plural or singular?

A time series is a sequence of numbers like this 1, 2, 3, 4, 5, … This is one time series, not one time serie.

We can have two time series like this: 1, 2, 3, 4, 5, … and 6, 7, 8, 9, 10, … These are two time series, not two time serieses.

So the singular form is “series” and the plural form is also “series”, not “serieses”. The word “series” is both singular and plural. See the Merriam-Webster dictionary explanation in Ref #1 below.

Forecasting a time series means finding out what the next numbers are in one series (1, 2, 3, 4, 5, …).

Forecasting two time series means finding out what the next numbers are in two series (1, 2, 3, 4, 5, … and 6, 7, 8, 9, 10, …).

Forecasting time series using statistics

We can use regression to forecast a time series. We can also use Moving Average to forecast a time series.

Auto-Regressive model (AR)

Using regression, we use the past values of the forecast variable as the input variables, which is why this method is called the Auto-Regressive model. It is called “auto” because the input variables are past values of the forecast variable itself. For example, with three past values:

$$y_t = c + c_1 y_{t-1} + c_2 y_{t-2} + c_3 y_{t-3} + \epsilon_t$$

where $y_{t-1}$, $y_{t-2}$, $y_{t-3}$ are the past values of y, and $c$, $c_1$, $c_2$, $c_3$ are constants.

$\epsilon_t$ is white noise: a sequence of uncorrelated random numbers with a mean of zero and a standard deviation that stays the same over time.
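As a rough illustration of the autoregressive idea, the sketch below computes a one-step forecast from the three most recent values of a series. The function name `ar_forecast` and the coefficient values are made up for illustration; in practice the constants are estimated from data, and the noise term is left out because its expected value is zero.

```python
def ar_forecast(history, c, coeffs):
    """One-step AR forecast: c + c1*y[t-1] + c2*y[t-2] + ... (noise term omitted)."""
    p = len(coeffs)
    if len(history) < p:
        raise ValueError("need at least p past values")
    # Most recent value first, so it pairs with c1, the next one with c2, and so on.
    recent = history[::-1][:p]
    return c + sum(ci * yi for ci, yi in zip(coeffs, recent))


history = [1.0, 2.0, 3.0]  # y[t-3], y[t-2], y[t-1] (hypothetical values)
forecast = ar_forecast(history, c=0.5, coeffs=[0.5, 0.3, 0.2])
print(forecast)  # 0.5 + 0.5*3 + 0.3*2 + 0.2*1
```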

Moving Average model (MA)

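In the Moving Average model, the forecast variable is the mean of the series plus a weighted sum of the current and past error terms. For example, with three past errors:

$$y_t = \mu + \epsilon_t + a_1 \epsilon_{t-1} + a_2 \epsilon_{t-2} + a_3 \epsilon_{t-3}$$

where $\epsilon_t$ is the white noise error term (in practice, the difference between the actual value and the value the model forecast for that time step), $\mu$ is the mean and $a_1$, $a_2$, $a_3$ are constants.

It is called moving average because we start with the average (mean), then keep moving/shifting it by a weighted sum of the recent epsilons (the error terms).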

I need to emphasise here that the Moving Average model is not the Moving Average analysis that we use for the stock price, where we simply calculate the average of stock prices in the last 20 days.
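To make the distinction concrete, here is a minimal sketch of a Moving Average model forecast next to a plain rolling average. All names (`ma_residuals`, `ma_forecast`, `rolling_mean`) and numbers are hypothetical, and the sketch assumes the error terms before the start of the series are zero, which is a common simplification rather than the only choice.

```python
def ma_residuals(series, mu, coeffs):
    """Recover the error terms of an MA(q) model one step at a time."""
    errors = []
    for y in series:
        # Forecast for this step: mean plus weighted past errors (most recent first).
        past = errors[::-1][:len(coeffs)]
        predicted = mu + sum(a * e for a, e in zip(coeffs, past))
        errors.append(y - predicted)
    return errors


def ma_forecast(series, mu, coeffs):
    """One-step forecast of the MA model: mean plus weighted recent errors."""
    errors = ma_residuals(series, mu, coeffs)
    past = errors[::-1][:len(coeffs)]
    return mu + sum(a * e for a, e in zip(coeffs, past))


def rolling_mean(series, window):
    """The moving average used in price charts: a plain mean of the last values."""
    return sum(series[-window:]) / window


series = [11.0, 10.0]  # hypothetical observations
model_forecast = ma_forecast(series, mu=10.0, coeffs=[0.5])
chart_average = rolling_mean(series, window=2)
print(model_forecast, chart_average)
```

The two numbers differ because the model forecast is built from past forecast errors, while the chart average only looks at past values.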

ARMA model

The ARMA model is the combination of the Auto-Regressive model and the Moving Average model. That is why it is called ARMA: the AR bit means Auto-Regressive, whereas the MA bit means Moving Average. So we forecast using the previous values of the forecast variable (Auto-Regressive model) and using the mean plus the error terms (Moving Average model).

ARMA has 2 parameters i.e. ARMA(p,q)
where p = order of the autoregressive part and q = order of the moving average part.
Whereas AR and MA each have 1 parameter i.e. AR(p) and MA(q).
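For reference, the general ARMA(p,q) model can be written in the standard textbook form below (see for example Ref #2). The symbols follow the ones used in this article: $c$ and $c_i$ for the autoregressive constants, $a_j$ for the moving average constants and $\epsilon_t$ for white noise. Note that once the autoregressive terms are present, the constant $c$ is related to the mean of the series but is no longer equal to it.

```latex
$$y_t = c + \sum_{i=1}^{p} c_i \, y_{t-i} + \epsilon_t + \sum_{j=1}^{q} a_j \, \epsilon_{t-j}$$
```

Setting q = 0 gives AR(p), and setting p = 0 gives MA(q).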

ARIMA model

The ARIMA model is the ARMA model plus differencing. Differencing means creating a new series by taking the difference between the value at t and the value at (t-1).

For example, from this series: 0, 1, 3, 2, 3, 3, … (call it y)
We can make a new series by taking the difference between the numbers: 1, 2, -1, 1, 0, … (call it y’)
We can take the difference again (called second order differencing): 1, -3, 2, -1, … (call it y’’)

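The I in ARIMA stands for Integrated. Integrated here refers to differencing: the model is fitted on the differenced series, and the results are integrated (summed back up) to get back to the original series.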
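The differencing example above can be reproduced in a few lines. This is a small sketch with hypothetical helper names (`difference`, `integrate`); it also shows that summing the differences back up, starting from the first value, recovers the original series.

```python
from itertools import accumulate


def difference(series):
    """Return the series of differences between consecutive values."""
    return [b - a for a, b in zip(series, series[1:])]


def integrate(first_value, diffs):
    """Undo one round of differencing by cumulative summing from a known first value."""
    return list(accumulate(diffs, initial=first_value))


y = [0, 1, 3, 2, 3, 3]
y1 = difference(y)    # first order differencing
y2 = difference(y1)   # second order differencing
restored = integrate(y[0], y1)
print(y1, y2, restored)
```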

So the difference between the ARMA model and the ARIMA is: in ARMA we use y, whereas in ARIMA we use y’ or y’’.

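In the ARIMA model we use the AR model and the MA model on y’ or y’’, like this (shown here for y’):

$$y'_t = c + c_1 y'_{t-1} + \dots + c_p y'_{t-p} + \epsilon_t + a_1 \epsilon_{t-1} + \dots + a_q \epsilon_{t-q}$$

ARIMA has 3 parameters i.e. ARIMA(p,d,q)
where p = order of the autoregressive part, d = degree of differencing (how many times the series is differenced), and q = order of the moving average part.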


SARIMAX model

In SARIMAX, the S means Seasonal and the X means Exogenous.

Seasonal means that the series has a pattern that repeats from season to season. For example, in the decomposition plot in Ref #5, the series on the top line consists of a trend part, a seasonal part and a random part. The seasonal part has a repeating pattern.

The SARIMAX model includes the seasonal part as well as the non-seasonal part.

SARIMAX has 7 parameters i.e. SARIMAX(p,d,q)x(P,D,Q,s)

where p, d, q are as defined above, P, D, Q are the seasonal counterparts of the p, d, q parameters, and s is the number of periods in one seasonal cycle, e.g. for monthly data with a yearly pattern s = 12, for quarterly data s = 4.

In time series, an exogenous variable is a parallel time series which is used as a weighted input to the model (Ref #6).

The exogenous variable is one of the inputs to SARIMAX. In Python (statsmodels library), the arguments for SARIMAX are:

SARIMAX(y, X, order=(p, d, q), seasonal_order=(P, D, Q, s))

where y is the time series, X is the Exogenous variable/factor, and the others are as described before.

Forecasting time series using machine learning

The class of machine learning models which deals with temporal sequences is called the Recurrent Neural Network (RNN). A temporal sequence means anything which has a time element (a series of things happening one after the other), such as speech, handwriting, images, video. And that includes time series of course.

An RNN is a neural network which has an internal memory, which is why it is able to recognise patterns in time series. There are many RNN models, such as the Elman network, Jordan network, Hopfield network, LSTM and GRU.

The RNN most widely used for predicting a time series is LSTM (Long Short-Term Memory). An LSTM cell has 3 gates: an input gate, an output gate and a forget gate.

In the usual LSTM cell diagram, the horizontal line at the top (from ct-1 to ct) is the cell state. It is the memory of the cell. Along this line, there are 3 things happening: the cell state is multiplied by the “forget gate”, increased/reduced by the “input gate” and finally the value is taken to the “output gate”.

  • The forget gate removes unwanted information from the cell state (c), based on the previous output (ht-1) and the current input (xt).
  • The input gate adds new information to the cell state. The current input (xt) and the previous output (ht-1) pass through a σ and a tanh, and the two results are multiplied and then added to the cell memory line.
  • The output gate calculates the output from the cell state (c), the previous output (ht-1) and the current input (xt).
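The three gates can be sketched as one forward step of an LSTM cell. To keep it readable this is a scalar version (one input, one hidden unit) in plain Python; real implementations use weight matrices and vectors. The function name `lstm_step` and the weight layout are assumptions made for this illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, w):
    """One forward step of a scalar LSTM cell.

    w maps each of "forget", "input", "candidate", "output" to (w_x, w_h, bias).
    """
    def pre(name):
        w_x, w_h, b = w[name]
        return w_x * x_t + w_h * h_prev + b

    f = sigmoid(pre("forget"))        # forget gate: how much of c_prev to keep
    i = sigmoid(pre("input"))         # input gate: how much new information to add
    g = math.tanh(pre("candidate"))   # candidate values to add to the cell state
    o = sigmoid(pre("output"))        # output gate: how much of the cell state to expose
    c_t = f * c_prev + i * g          # new cell state (the memory line)
    h_t = o * math.tanh(c_t)          # new output
    return h_t, c_t


# With all weights at zero every gate is 0.5 and the candidate is 0,
# so the cell simply keeps half of its previous memory.
zero = {name: (0.0, 0.0, 0.0) for name in ("forget", "input", "candidate", "output")}
h_t, c_t = lstm_step(x_t=1.0, h_prev=0.0, c_prev=2.0, w=zero)
print(h_t, c_t)
```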

Architecturally, there are different ways to forecast time series using LSTM (Ref #7):

  • Fully Connected LSTM: a neural network with several layers of LSTM units with each layer fully connected to the next layer.
  • Bidirectional LSTM: the LSTM model learns the time series in backward direction in addition to the forward direction.
  • CNN LSTM: the time series is processed by a CNN first (1 dimensional), then processed by LSTM.
  • ConvLSTM: the convolutional structure is inside the LSTM cell (in both the input-to-state and state-to-state transitions), see Ref #13 and #16.
  • Encoder-Decoder LSTM: for forecasting several time steps. The Encoder maps the time series into a fixed-length vector, and the Decoder maps this vector back to a variable-length output sequence.
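Whichever architecture is chosen, the series usually has to be reshaped into input windows and target values before a network can be trained on it. The sketch below shows one common way to do this framing in plain Python; the helper name `make_windows` and the window sizes are hypothetical. With `n_out=1` it produces one-step samples, and with a larger `n_out` it produces the multi-step targets that an Encoder-Decoder model would predict.

```python
def make_windows(series, n_in, n_out=1):
    """Split a series into (input window, target values) samples."""
    samples = []
    for start in range(len(series) - n_in - n_out + 1):
        inputs = series[start:start + n_in]
        targets = series[start + n_in:start + n_in + n_out]
        samples.append((inputs, targets))
    return samples


series = [10, 20, 30, 40, 50, 60]
one_step = make_windows(series, n_in=3)
multi_step = make_windows(series, n_in=3, n_out=2)
print(one_step)
print(multi_step)
```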

Which one is better, ARIMA or LSTM?

Well, that is the million dollar question! Some research suggests that LSTM is better (Ref #17, #20, #24), some suggests that ARIMA is better (Ref #19) and some says that XGBoost (XGB) is better than both LSTM and ARIMA (Ref #23). So it depends on the case, but in the studies listed here LSTM more often comes out ahead in terms of accuracy (RMSE, MAPE).
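For readers unfamiliar with the two accuracy measures, here is a small sketch of how they are calculated. RMSE is the root mean squared error and MAPE is the mean absolute percentage error; lower is better for both. The function names and the sample numbers are made up for illustration, and this MAPE assumes none of the actual values is zero.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the series."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mape(actual, predicted):
    """Mean absolute percentage error, as a percentage (actual values must be non-zero)."""
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


actual = [100.0, 200.0]
predicted = [110.0, 180.0]
print(rmse(actual, predicted), mape(actual, predicted))
```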

It is an interesting topic for research, and there are other approaches too, such as Facebook’s Prophet, GRU, GAN and their combinations (Ref #25, #26, #27). It may be possible to get better accuracy by combining the above approaches. I’m still searching for a topic for my MSc dissertation, and it looks like this could be the one!


References

  1. Merriam-Webster dictionary explanation on “series” plurality: link
  2. Forecasting: Principles and Practice, by Rob J. Hyndman and George Athanasopoulos: link
  3. Wikipedia on ARIMA: link
  4. ARIMA model in statsmodels: link
  5. Penn State Eberly College of Science: link
  6. Quick Adviser: link
  7. How to Develop LSTM Model for Time Series Forecasting, by Jason Brownlee: link
  8. Time Series Prediction with LSTM RNN in Python with Keras: link
  9. Time Series Forecasting: Predicting Stock Prices Using An ARIMA Model, by Serafeim Loukas: link
  10. Time Series Forecasting: Predicting Stock Prices Using An LSTM Model, by Serafeim Loukas: link
  11. Wikipedia on RNN: link
  12. RNN and LSTM, by Vincent Rainardi: link
  13. Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting, by Xingjian Shi, Zhourong Chen, Hao Wang, Dit-Yan Yeung, Wai-kin Wong, Wang-chun Woo: link
  14. Exploiting the ConvLSTM: Human Action Recognition using Raw Depth Video-Based RNN, by Adrian Sanchez-Caballero, David Fuentes-Jimenez, Cristina Losada-Gutiérrez: link
  15. Convolutional LSTM for spatial forecasting, by Sigrid Keydana: link
  16. Very Deep Convolutional Networks for End-to-End Speech Recognition, by Yu Zhang, William Chan, Navdeep Jaitly: link
  17. A Comparison of ARIMA and LSTM in Forecasting Time Series, by Sima Siami-Namini, Neda Tavakoli, Akbar Siami Namin: link
  18. ARIMA vs Prophet vs LSTM for Time Series Prediction, by Konstantin Kutzkov: link
  19. A Comparative Analysis of the ARIMA and LSTM Predictive Models and Their Effectiveness for Predicting Wind Speed, by Meftah Elsaraiti, Adel Merabet: link
  20. Weather Forecasting Using Merged LSTM and ARIMA Model, by Afan Galih Salman, Yaya Heryadi, Edi Abdurahman, Wayan Suparta: link
  21. Comparing ARIMA Model and LSTM RNN Model in Time-Series Forecasting, by Vaibhav Kumar: link
  22. A Comparison between ARIMA, LSTM, and GRU for Time Series Forecasting, by Peter Yamak, Li Yujian, Pius Kwao Gadosey: link
  23. Machine Learning Outperforms Classical Forecasting on Horticultural Sales Predictions, by Florian Haselbeck, Jennifer Killinger, Klaus Menrad, Thomas Hannus, Dominik G. Grimm: link
  24. Forecasting Covid-19 Transmission with ARIMA and LSTM Techniques in Morocco, by Mohamed Amine Rguibi, Najem Moussa, Abdellah Madani, Abdessadak Aaroud, Khalid Zine-dine: link
  25. Time Series Forecasting papers on Research Gate: link
  26. Stock Price Forecasting by a Deep Convolutional Generative Adversarial Network, by Alessio Staffini: link
  27. A novel approach based on combining deep learning models with statistical methods for COVID-19 time series forecasting, by Hossein Abbasimehr, Reza Paki, Aram Bahrini: link
