LSTM
Long Short-Term Memory (LSTM) is a machine learning algorithm that can recognise patterns in sequential data, for example a series of words in an email, a series of spoken words (voice), a series of pictures (a video), daily rainfall data, daily solar radiation, or monthly sales. LSTM can learn the patterns in such data and make predictions about future values. It has been applied widely in many areas, including the financial markets.
RNN
LSTM belongs to a family of algorithms called Recurrent Neural Networks (RNN). An RNN is a machine learning algorithm that can learn patterns in sequential data. Apart from LSTM, there are other algorithms in the RNN family, for example the Gated Recurrent Unit (GRU). As the name implies, an RNN is a neural network. A neural network is an architecture made of a series of layers, and each layer consists of nodes. Each node (cell) in an LSTM has 3 gates: the forget gate, the input gate and the output gate.
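To make the three gates more concrete, here is a minimal sketch of one time step of a single LSTM cell in plain Python. It assumes a scalar input and a one-unit cell so the arithmetic is easy to follow; the parameter layout (`params` mapping each gate to an input weight, a recurrent weight and a bias), the example weights, and the names `lstm_cell_step` and `run_lstm` are all hypothetical. A library such as Keras uses weight matrices over many units instead.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function, used to squash gate activations into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_cell_step(x: float, h_prev: float, c_prev: float, params: dict) -> tuple:
    """One time step of a one-unit LSTM cell with a scalar input.

    params maps "forget", "input", "candidate" and "output" to an
    (input_weight, recurrent_weight, bias) tuple (hypothetical layout).
    """
    def pre_activation(name: str) -> float:
        w_x, w_h, bias = params[name]
        return w_x * x + w_h * h_prev + bias

    forget_gate = sigmoid(pre_activation("forget"))     # how much old memory to keep
    input_gate = sigmoid(pre_activation("input"))       # how much new information to add
    candidate = math.tanh(pre_activation("candidate"))  # the new information itself
    output_gate = sigmoid(pre_activation("output"))     # how much memory to expose

    c = forget_gate * c_prev + input_gate * candidate   # updated cell state (memory)
    h = output_gate * math.tanh(c)                      # updated hidden state (output)
    return h, c


def run_lstm(sequence: list, params: dict) -> float:
    """Feed a sequence through the cell and return the final hidden state."""
    h, c = 0.0, 0.0
    for x in sequence:
        h, c = lstm_cell_step(x, h, c, params)
    return h


# Hypothetical weights, chosen only to make the example run.
example_params = {
    "forget": (0.5, 0.1, 1.0),
    "input": (0.6, 0.2, 0.0),
    "candidate": (0.9, 0.3, 0.0),
    "output": (0.4, 0.1, 0.0),
}
print(run_lstm([0.1, 0.2, 0.3], example_params))
```

The cell state `c` is what lets an LSTM carry information across many time steps: the forget gate scales the old memory down, the input gate adds a scaled candidate, and the output gate decides how much of the memory becomes the hidden state passed to the next step.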
Stock Markets
There are many different forms of financial markets: stock markets, bond markets, commodity markets (trading the future price of oil, gold, aluminium, wheat, etc.), currency markets (trading the exchange rates of fiat currency pairs and cryptocurrencies), money markets (trading very liquid short-term securities such as US Treasury bills) and derivative markets (trading fixed income and equity derivatives such as options, futures and swaps). These markets have different characteristics. Some are volatile, like stock markets and currency markets; some are quite stable, like bond markets and money markets. For individual investors, the stock market is arguably the most popular, much more so than the bond, commodity, derivative or money markets.
Forecasting vs prediction
Before we get into stock price forecasting, let’s understand the difference between prediction and forecasting. In machine learning, the word “prediction” means trying to guess the value of a variable. If you have the number of windows in a house and you try to guess the price of the house, that’s prediction. Or if you have people’s heights and you try to guess their weights, that’s prediction. Of course, you can use many variables for prediction, for example the house location, the floor area, the number of bedrooms and the age of the house to predict the price of the house.
In machine learning, forecasting deals specifically with time: it means predicting future values. For example, how many washing machines will be sold in our store next month, or whether it is going to rain tomorrow. We can base our forecast on the past values of the same metric; for example, we can forecast next month’s inflation based on the inflation over the last 2 years. Or we can base our forecast on something else entirely; for example, we can forecast traffic jams based on the scheduled events in that city. And of course, forecasting is a form of prediction, because predicting means trying to guess the value of a variable, including its value in the future.
Stock Price Forecast
In the stock market there are a lot of variables that we can forecast (i.e. guess their future values). For example, we can try to predict which companies will leave the S&P 500 next year. We can predict tomorrow’s number of transactions, or the volume, or the direction of the price (up or down), or the market sentiment. Or we can try to correlate one variable with another. Out of all these variables, the stock price is the most popular one. That is not surprising: if you know what the stock price will be in the future, you can make a huge fortune.
There are many stock price forecasting articles. Most of them use only the last few weeks of data, and they forecast only the next day’s price. That is not what happens in the real world. Many use the closing price or the opening price. Again, that’s not what happens in the real world. When investors sell, the price they get is not the opening or closing price, nor the highest or lowest price. They get somewhere between the highest and the lowest price of the day. This is not the mid price: the mid price is the midpoint between the ask (offer) price and the bid price at a particular moment, whereas the high, low, open and close summarise a whole day’s worth of trades for a particular stock or bond. Or options, futures, or any other derivative, commodity or currency, for that matter.
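If you want a single daily price that sits between the high and the low, one simple choice is the midpoint of the day’s range; another commonly used proxy is the “typical price”, (high + low + close) / 3. Which proxy to use is a modelling assumption, not something fixed by the market. The sketch below also shows the quote mid price for comparison; all function names are hypothetical.

```python
def mid_price(bid: float, ask: float) -> float:
    """Quote mid price: halfway between the best bid and the best ask (offer)."""
    if ask < bid:
        raise ValueError("ask should not be below bid")
    return (bid + ask) / 2


def range_midpoint(high: float, low: float) -> float:
    """Midpoint of the day's trading range (one possible daily price proxy)."""
    if high < low:
        raise ValueError("high should not be below low")
    return (high + low) / 2


def typical_price(high: float, low: float, close: float) -> float:
    """'Typical price' (high + low + close) / 3, another common proxy."""
    return (high + low + close) / 3


# Hypothetical daily bar: high 110, low 90, close 104.
print(range_midpoint(110.0, 90.0))  # 100.0
print(typical_price(110.0, 90.0, 104.0))
```

Whichever proxy is chosen, it should be used consistently for both the training targets and the evaluation, so the model is judged on the same kind of price it was trained to forecast.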
Principles of Forecasting
Many forecasting methods are based on the assumption that the future will behave like the past. That assumption is the fundamental reason to believe that we can predict the future at all, and it is the foundation of traditional time series forecasting methods such as ARIMA and Exponential Smoothing.
The technical term for this is autoregression, meaning that the future values of a time series are modelled as a linear combination of the past values of that time series. (The related term autocorrelation describes how strongly a time series is correlated with its own past values.) Another technical term is the moving average model, meaning that the future values of a time series are related to the past forecast errors. Both autoregression and the moving average model are fundamental principles used in ARIMA.
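A minimal sketch of the autoregressive idea: fit an AR(1) model, y[t] = c + phi * y[t-1], by ordinary least squares and roll it forward. This covers only the AR part; ARIMA also adds differencing (the “I”) and moving-average terms, and in practice you would use a library such as statsmodels rather than hand-rolled code. The function names and the toy series are hypothetical.

```python
def fit_ar1(series: list) -> tuple:
    """Fit y[t] = c + phi * y[t-1] by ordinary least squares; return (c, phi)."""
    if len(series) < 3:
        raise ValueError("need at least 3 observations")
    x = series[:-1]  # lagged values y[t-1]
    y = series[1:]   # current values y[t]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("series is constant; phi is not identifiable")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    phi = sxy / sxx
    c = mean_y - phi * mean_x
    return c, phi


def forecast_ar1(series: list, c: float, phi: float, steps: int) -> list:
    """Roll the AR(1) equation forward, feeding each forecast back in."""
    forecasts = []
    last = series[-1]
    for _ in range(steps):
        last = c + phi * last
        forecasts.append(last)
    return forecasts


history = [10.0, 7.0, 5.5, 4.75, 4.375, 4.1875]  # generated by y[t] = 2 + 0.5 * y[t-1]
c, phi = fit_ar1(history)
print(c, phi)  # close to 2.0 and 0.5
print(forecast_ar1(history, c, phi, steps=3))
```

Because the toy series was generated exactly by an AR(1) equation, the fit recovers its coefficients; real prices are noisy, so the fitted coefficients would only be estimates.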
Exponential Smoothing, on the other hand, assumes that the future values of a time series are a weighted average of its past values, with weights that decrease exponentially: recent events get larger weights than events in the distant past.
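Simple exponential smoothing can be written in a few lines. The smoothing parameter `alpha` (between 0 and 1) controls how fast the weights decay, so a larger `alpha` puts more emphasis on recent values. This sketch initialises the level with the first observation, which is one common convention rather than the only one; the function name is hypothetical.

```python
def simple_exponential_smoothing(series: list, alpha: float) -> float:
    """Return the one-step-ahead forecast from simple exponential smoothing.

    The level is initialised with the first observation (one common convention).
    Each update blends the newest value with the previous level, so the value
    k steps back ends up with weight alpha * (1 - alpha) ** k (apart from the
    first observation, which keeps the leftover weight).
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if not series:
        raise ValueError("series must not be empty")
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


print(simple_exponential_smoothing([0.0, 10.0, 10.0], alpha=0.5))  # 7.5
print(simple_exponential_smoothing([0.0, 10.0, 10.0], alpha=0.9))  # closer to 10
```

With `alpha=1` the forecast is simply the last observed value (the naive forecast); as `alpha` gets smaller, older observations keep more of their influence.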
The Art of Stock Forecasting
Like any prediction, the art of stock forecasting is choosing the right variables to use. The stock’s own time series might not be enough to forecast its future. In that case we need to include other variables, such as the trading volume, other stocks, and the industry average. That is the most important ingredient: the input variables. So the first step, before choosing the algorithm or method, is to establish a set of variables that are highly correlated with the future price of the stock.
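One simple way to screen candidate variables is to measure the correlation between each variable today and the stock price some days later. The sketch below computes the Pearson correlation between `candidate[t]` and `price[t + horizon]`; the function names, the data and the choice of Pearson correlation are assumptions. Note that Pearson correlation only captures linear relationships.

```python
import math


def pearson(xs: list, ys: list) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def lagged_correlation(candidate: list, price: list, horizon: int) -> float:
    """Correlation between candidate[t] and price[t + horizon]."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1 day")
    return pearson(candidate[:-horizon], price[horizon:])


# Hypothetical data: rank candidate variables by how well they line up
# with the price 1 day later.
price = [5.0, 1.0, 4.0, 2.0, 8.0, 3.0, 7.0, 6.0]
candidates = {
    "leading_indicator": [1.0, 4.0, 2.0, 8.0, 3.0, 7.0, 6.0, 0.0],
    "volume": [300.0, 120.0, 250.0, 180.0, 400.0, 220.0, 310.0, 260.0],
}
ranked = sorted(
    candidates.items(),
    key=lambda item: -abs(lagged_correlation(item[1], price, 1)),
)
for name, values in ranked:
    print(name, round(lagged_correlation(values, price, 1), 3))
```

Ranking by the absolute value keeps strongly negative correlations in the shortlist too, since they are just as useful to a model as positive ones.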
The second most important ingredient in stock forecasting is choosing the right algorithm. It does not have to be machine learning. Time series forecasting is about a hundred years old, so ignore the non-ML methods at your peril. Combine the traditional methods with the ML methods. Learn about Facebook’s Prophet, GARCH, Exponential Smoothing (ES), ARIMA, Theta and other non-ML methods. Combine them with the ML methods commonly used for forecasting time series, such as LSTM and CNN. But don’t stop there: also use less conventional ML algorithms such as XGBoost and Random Forest. There is plenty of literature on each of these methods, which gives us a good starting point.
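Combining methods can be as simple as averaging their forecasts. In this sketch, three deliberately simple forecasters (the naive last value, a 3-day moving average and simple exponential smoothing) stand in for heavier models such as Prophet, ARIMA or LSTM; the equal weights and all function names are assumptions.

```python
def naive_forecast(series: list) -> float:
    """Tomorrow equals today."""
    return series[-1]


def moving_average_forecast(series: list, window: int = 3) -> float:
    """Average of the last `window` observations."""
    recent = series[-window:]
    return sum(recent) / len(recent)


def ses_forecast(series: list, alpha: float = 0.5) -> float:
    """Simple exponential smoothing, level initialised with the first value."""
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


def combined_forecast(series: list) -> float:
    """Equal-weight average of the individual forecasts (hypothetical weighting)."""
    forecasts = [
        naive_forecast(series),
        moving_average_forecast(series),
        ses_forecast(series),
    ]
    return sum(forecasts) / len(forecasts)


print(combined_forecast([1.0, 2.0, 3.0, 4.0]))  # 3.375
```

Equal weights are the simplest starting point; a common refinement is to weight each model by how well it performed on a validation period.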
That is the art of stock forecasting: finding the right combination of variables and algorithms that gives us the best results. And then the third ingredient: tuning those combinations, by trying out different regularisation, different hyper-parameters, different optimisers and different evaluation criteria, all in an attempt to get the best results.
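Tuning can be illustrated with a small grid search: try several values of a hyper-parameter, score each one on a held-out part of the series, and keep the best. The sketch below tunes the `alpha` of simple exponential smoothing using the one-step-ahead mean absolute error (MAE) on the last `n_test` points; the model, the MAE criterion, the prices and the function names are stand-ins chosen for illustration.

```python
def ses_forecast(series: list, alpha: float) -> float:
    """Simple exponential smoothing forecast, level initialised with the first value."""
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


def one_step_mae(series: list, alpha: float, n_test: int) -> float:
    """Mean absolute error of one-step-ahead forecasts over the last n_test points."""
    if not 0 < n_test < len(series):
        raise ValueError("n_test must leave at least one training point")
    errors = []
    for t in range(len(series) - n_test, len(series)):
        forecast = ses_forecast(series[:t], alpha)  # only data before day t is used
        errors.append(abs(series[t] - forecast))
    return sum(errors) / len(errors)


def grid_search_alpha(series: list, alphas: list, n_test: int) -> tuple:
    """Score every candidate alpha and return (best_alpha, scores)."""
    scores = {alpha: one_step_mae(series, alpha, n_test) for alpha in alphas}
    best_alpha = min(scores, key=scores.get)
    return best_alpha, scores


# Hypothetical prices with a steady upward trend.
prices = [100.0, 101.0, 103.0, 102.0, 105.0, 107.0, 106.0, 109.0, 111.0, 112.0]
best, scores = grid_search_alpha(prices, alphas=[0.1, 0.3, 0.5, 0.7, 0.9], n_test=3)
print(best, scores)
```

Here the held-out points act as a validation set; the final evaluation should be done on data that was not used during tuning, otherwise the reported error will look better than it really is.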
Different types of LSTM
When it comes to stock price forecasting using LSTM, there are 5 different architectures that we can use: vanilla LSTM, stacked LSTM, convolutional LSTM, bidirectional LSTM and CNN LSTM.
- Vanilla LSTM is a single LSTM layer, possibly with a regularisation measure such as a dropout layer, and optionally with one or two dense layers to convert the output into its final form (for example, classes or forecast values).
- Stacked LSTM means multiple LSTM layers stacked one after another.
- Bidirectional LSTM learns the time series both forward and backward.
- Convolutional LSTM reads the time series using convolution (see my separate article for what convolution means).
- CNN LSTM also reads the time series using convolution.
The difference between CNN LSTM and Convolutional LSTM is this: CNN LSTM uses a separate convolution layer to read the input, then passes its output to an LSTM layer, whereas Convolutional LSTM does not use a separate convolution layer but uses a Keras layer called ConvLSTM2D, which has the convolutional reading built in.
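The two architectures also expect the input to be arranged differently. Following the convention used in common Keras examples (an assumption here), each input window is split into subsequences: for CNN LSTM each sample has the shape (subsequences, steps, features), while ConvLSTM2D expects each sample as (time, rows, columns, channels), so a univariate series is given a single row. The helper names are hypothetical, and plain nested lists are used instead of NumPy arrays to keep the example dependency-free.

```python
def split_into_subsequences(window: list, n_seq: int) -> list:
    """Split one input window into n_seq equal-length subsequences."""
    if n_seq < 1 or len(window) % n_seq != 0:
        raise ValueError("window length must be divisible by n_seq")
    n_steps = len(window) // n_seq
    return [window[i * n_steps:(i + 1) * n_steps] for i in range(n_seq)]


def cnn_lstm_sample(window: list, n_seq: int) -> list:
    """Layout for CNN LSTM: (subsequences, steps, features=1).

    The convolution layer reads each subsequence; the LSTM then reads
    the sequence of convolution outputs.
    """
    return [[[value] for value in sub] for sub in split_into_subsequences(window, n_seq)]


def conv_lstm_sample(window: list, n_seq: int) -> list:
    """Layout for ConvLSTM2D: (time=subsequences, rows=1, columns=steps, channels=1).

    The convolution happens inside the recurrent cell, so each time step
    is treated as a one-row image.
    """
    return [[[[value] for value in sub]] for sub in split_into_subsequences(window, n_seq)]


window = [10.0, 20.0, 30.0, 40.0]  # hypothetical 4-day input window
print(cnn_lstm_sample(window, n_seq=2))   # [[[10.0], [20.0]], [[30.0], [40.0]]]
print(conv_lstm_sample(window, n_seq=2))  # [[[[10.0], [20.0]]], [[[30.0], [40.0]]]]
```

The values are identical in both layouts; only the nesting differs, which is why the same data preparation can feed either architecture once it is reshaped.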
Implementation
Implementation examples of these 5 architectures can be found on Jason Brownlee’s website: https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/
Here is an example of a Vanilla LSTM (my own implementation):

In the above example, the Y Gap is 66, i.e. the number of trading days between the last trading day in the stock time series and the start of the forecasting period. For example: the time series runs from 1st Jan 2016 to 31st Dec 2021, the Y Gap is the period covering January, February and March 2022, and the forecasting period is the first week of April 2022.
That is why, in the example above, the LSTM Output is set to 71 days: 66 days (the number of trading days in the 3-month gap, i.e. January, February and March 2022) plus 5 days (the number of trading days in the first week of April).
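The Y Gap can be built into the training data directly. The sketch below slides a window over the series and, for each sample, takes `n_in` past days as X and the next `n_gap + n_out` days as Y, so with `n_gap=66` and `n_out=5` every target has 71 values, matching the LSTM Output above. The look-back length `n_in` is not stated here, so it is a hypothetical parameter, as are the function name and the placeholder prices.

```python
def make_gap_windows(series: list, n_in: int, n_gap: int, n_out: int) -> tuple:
    """Build (X, Y) samples where each Y covers the gap plus the forecast period.

    X[i] holds n_in consecutive days; Y[i] holds the n_gap + n_out days that
    immediately follow, so the last n_out values of Y[i] are the forecast period.
    """
    span = n_in + n_gap + n_out
    if len(series) < span:
        raise ValueError("series is too short for even one sample")
    X, Y = [], []
    for start in range(len(series) - span + 1):
        X.append(series[start:start + n_in])
        Y.append(series[start + n_in:start + span])
    return X, Y


# Hypothetical settings: 60-day look-back, 66-day gap, 5-day forecast period.
prices = [float(p) for p in range(300)]  # placeholder price series
X, Y = make_gap_windows(prices, n_in=60, n_gap=66, n_out=5)
print(len(X), len(X[0]), len(Y[0]))  # 170 60 71
forecast_period_targets = [y[-5:] for y in Y]  # targets for the 5 days after the gap
```

Before passing X to a Keras LSTM layer it would be reshaped to three dimensions (samples, time steps, features), e.g. with NumPy.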
The number of trading days corresponds to the days when the stock market is open. This is usually Monday to Friday, minus the public holidays.
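Counting trading days is easy to automate once you have the exchange’s holiday list. This sketch counts weekdays (Monday to Friday) between two dates inclusive and skips any dates in a user-supplied `holidays` list. The holiday list is an assumption you must supply for your own exchange, since holidays differ between markets; the single date used below is only illustrative, and the function name is hypothetical.

```python
from datetime import date, timedelta


def count_trading_days(start: date, end: date, holidays=()) -> int:
    """Count Monday-to-Friday dates from start to end (inclusive), minus holidays."""
    holiday_set = set(holidays)
    count = 0
    day = start
    while day <= end:
        # weekday(): Monday is 0 and Sunday is 6, so < 5 means Monday to Friday
        if day.weekday() < 5 and day not in holiday_set:
            count += 1
        day += timedelta(days=1)
    return count


# Hypothetical holiday list; use your exchange's official calendar in practice.
example_holidays = [date(2022, 1, 17)]
print(count_trading_days(date(2022, 1, 10), date(2022, 1, 21), example_holidays))  # 9
```

The same function can be used to compute the Y Gap and the forecasting period for any date range, instead of relying on a fixed number of trading days per month.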
Training the model is quite simple:
model.fit(X_train, Y_train, epochs=50, batch_size=32)
I do hope this article has inspired you to take a stab at stock price forecasting. If you have any questions or would like to discuss any aspect of stock forecasting, I’m available at vrainardi@gmail.com.