Where does NoSQL sit in the data warehousing landscape? Does it just sit on the bench, with no role to play? Do we use it in the staging area? Or does it function as one of the main data stores?
NoSQL has many faces: document databases, key-value stores and graph databases are all NoSQL. So its usage naturally depends on what type of data we have to store. If our data is about individual people (or individual companies), it lends itself to being stored in a graph database, because people have connections (ditto companies). Or you can store them (people and companies) as documents.
That is if you focus on one of them at a time. But if you would like to process them en masse, aggregate them, and so on, then tabular storage has its advantages. Key-value pairs (KVP) have a distinct advantage when it comes to schema-on-read: data from any source can be stored as key-value pairs in the data lake, deliberately avoiding modelling the data.
The idea of schema-on-read is simple, but a bit difficult to accept. Yet it is key to understanding NoSQL's role. The basic premise of a data lake is that we don't model the data when storing it in the lake; we model it when we use it. KVP is ideal for this, i.e. as the data lake store. All the attributes from all tables are stored in just one column. Obviously this requires a good data dictionary and data lineage, not just at entity level (what entities contain, how they are used, etc.) but down to each attribute.
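To make this concrete, here is a minimal sketch in Python of the idea; the entity types and attribute names are invented for illustration, not taken from any particular implementation:

```python
# A minimal sketch of schema-on-read KVP storage: every attribute of every
# source row becomes one (entity type, entity id, key, value) record, so
# the lake needs no upfront data model.
def to_kvp(entity_type, entity_id, row):
    """Flatten one source row into key-value-pair records."""
    return [(entity_type, entity_id, key, str(value))
            for key, value in row.items()]

# Two sources with completely different schemas land in the same store.
lake = []
lake += to_kvp("customer", "C001", {"name": "Acme Ltd", "country": "UK"})
lake += to_kvp("trade", "T042", {"ccy": "USD", "notional": 5_000_000})

# Schema-on-read: the "model" is applied only when we query.
customer_names = {eid: val for etype, eid, key, val in lake
                  if etype == "customer" and key == "name"}
```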
Document DB's role in data warehousing (or any data lakescape for that matter) is very different. A document DB has models: every entity has attributes, and that becomes the basis of the data model. But unlike the tabular model, in a document DB individual entities can have different attributes. Let's look at a security as an example (as in a financial instrument). The attributes of an option are very different from those of a CDS. An option has a "strike price" attribute, for example, and a CDS doesn't. A CDS has a "reference entity" attribute, whereas an option doesn't.
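For illustration, the two securities might be stored like this; the field names are my own assumptions, not from any particular data model:

```python
# Two documents in the same "securities" collection, with different
# attributes per entity. Field names are illustrative assumptions.
option = {
    "security_id": "SEC-001",
    "type": "option",
    "underlying": "ABC Corp",
    "strike_price": 105.0,          # an option has a strike price...
    "expiry": "2025-12-19",
}
cds = {
    "security_id": "SEC-002",
    "type": "cds",
    "reference_entity": "XYZ Plc",  # ...while a CDS has a reference entity
    "spread_bps": 120,
    "maturity": "2029-06-20",
}
```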
In a document DB, entities have relationships with each other. If the relationship is one-to-many, the parent IDs are put on the child entities. If it is many-to-many, we use a "joining document", or we can do embedding and referencing (see ref #6 below).
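A minimal sketch of those two patterns (IDs and field names invented for illustration):

```python
# One-to-many: the child document carries the parent's ID.
trade = {"trade_id": "T100", "security_id": "SEC-001", "quantity": 500}

# Many-to-many (funds hold many securities, securities sit in many funds):
# a "joining document" links the two sides, much like a junction table
# in a relational database.
fund_holding = {
    "fund_id": "F007",
    "security_id": "SEC-002",
    "weight": 0.03,
}
```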
There are other types of NoSQL too: wide column stores, tuple stores, object stores, triple stores, multi-model DBs. We really are limiting ourselves if we only do relational databases. There are many, many different types of data store outside the relational DB. And as data architects, it is our job to know them well. Even if we eventually use SQL for a project, we will know very well why we are not implementing it on the other data stores.
A "wide column store" uses rows and columns like a relational database, but the column names and data types can vary from row to row. The most famous wide column store is probably Cassandra, but both Amazon and Azure offer them too.
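Conceptually (this is a toy sketch, not Cassandra's actual API), a wide column store behaves like a two-level map where each row key has its own set of columns:

```python
# A toy model of a wide column store: row key -> {column name -> value}.
# The two rows have different columns, which a relational table would not
# allow but which is normal here.
store = {
    "user:1001": {"name": "Alice", "email": "alice@example.com"},
    "user:1002": {"name": "Bob", "last_login": "2024-03-01", "tier": "gold"},
}
bobs_tier = store["user:1002"].get("tier")   # columns are looked up per row
```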
So where does that leave us? Where do we use NoSQL in data warehousing or a data lake? KVP is useful as a data lake store. A graph database would be useful as a main data store if our data is about individual objects (such as people, securities or companies) which are not processed/queried en masse. A document DB is useful for modelling entities which have variable attributes, such as securities. So NoSQL can be used as a data lake store, or as one of the main data stores.
This article outlines how ARIMA and LSTM are used for forecasting time series, and which one is better. Plenty of references are available at the end of this article for those who would like to find out more.
Introduction
In ML, we use regression to predict the values of a variable (y) based on the values of other variables (x1, x2, x3, …). For example, we predict the stock price of a company based on its financial ratios, fundamentals and ESG factors.
In time series forecasting, we predict the values of a variable in the future, based on the values of that variable in the past. For example, we predict the stock price of a company, based on the past prices.
A time series is a sequence of numbers, each collected at a fixed interval of time.
How do we forecast a time series? There are 2 ways: a) using statistics, b) using machine learning. In this article I'll give a brief explanation of both. But before that, let's clear one thing up first: is "time series" plural or singular?
Time Series: plural or singular?
A time series is a sequence of numbers like this: 1, 2, 3, 4, 5, … This is one time series, not one time serie.
We can have two time series like this: 1, 2, 3, 4, 5, … and 6, 7, 8, 9, 10, … These are two time series, not two time serieses.
So the singular form is "series" and the plural form is also "series", not "serieses". The word "series" is both singular and plural; see the Merriam-Webster dictionary explanation in Ref #1 below.
Forecasting a time series means finding out what the next numbers in one series are (1, 2, 3, 4, 5, …).
Forecasting two time series means finding out what the next numbers in two series are (1, 2, 3, 4, 5, … and 6, 7, 8, 9, 10, …).
Forecasting time series using statistics
We can use regression to forecast a time series. We can also use a Moving Average model.
Auto-Regressive model (AR)
Using regression, we use the past values of the forecast variable as the input variables, which is why this method is called the Auto-Regressive (AR) model. It is called "auto" because the input variables are the past values of the forecast variable itself:

$$y_t = c + c_1 y_{t-1} + c_2 y_{t-2} + c_3 y_{t-3} + \dots + \epsilon_t$$

where $y_{t-1}, y_{t-2}, y_{t-3}$ are the past values of y, and $c, c_1, c_2, c_3$ are constants.
$\epsilon_t$ is white noise: a sequence of random numbers with a mean of zero and a standard deviation that stays the same over time.
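In Python, an AR model can be fitted with the statsmodels library; here is a minimal sketch, where the lag order (3) and the toy data are illustrative assumptions:

```python
# A minimal sketch of fitting an AR(3) model with statsmodels.
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

y = np.sin(np.linspace(0, 10, 200)) + np.random.normal(0, 0.1, 200)

res = AutoReg(y, lags=3).fit()   # y_t regressed on y_{t-1}, y_{t-2}, y_{t-3}
forecast = res.predict(start=len(y), end=len(y) + 4)   # the next 5 values
```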
Moving Average model (MA)
In the Moving Average model, the forecast variable is the mean of the series plus the error terms:

$$y_t = \mu + \epsilon_t + a_1 \epsilon_{t-1} + a_2 \epsilon_{t-2} + a_3 \epsilon_{t-3} + \dots$$

where $\epsilon_t$ is the white noise error term (the difference between the actual value and the forecast at time t), $\mu$ is the mean and $a_1, a_2, a_3$ are constants.
It is called "moving average" because we start with the average (the mean), then keep moving/shifting it by the epsilon error terms.
I need to emphasise here that the Moving Average model is not the moving average analysis that we use on stock prices, where we simply calculate the average of the prices over the last 20 days.
ARMA model
The ARMA model is the combination of the Auto-Regressive model and the Moving Average model, which is why it is called ARMA: the AR bit means Auto-Regressive and the MA bit means Moving Average. So we forecast using the previous values of the forecast variable (the Auto-Regressive part) and using the mean plus the error terms (the Moving Average part).
ARMA has 2 parameters, i.e. ARMA(p,q), where p = order of the autoregressive part and q = order of the moving average part, whereas AR and MA each have 1 parameter, i.e. AR(p) and MA(q).
ARIMA model
The ARIMA model is the ARMA model plus differencing. Differencing means creating a new series by taking the difference between the value at t and the value at (t-1).
For example, from this series: 0, 1, 3, 2, 3, 3, … (call it y), we can make a new series by taking the differences between consecutive numbers: 1, 2, -1, 1, 0, … (call it y'). We can take the differences again (this is called second order differencing): 1, -3, 2, -1, … (call it y'').
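NumPy's diff function reproduces this example directly:

```python
import numpy as np
y = np.array([0, 1, 3, 2, 3, 3])
y1 = np.diff(y)         # first order differencing:  [ 1  2 -1  1  0]
y2 = np.diff(y, n=2)    # second order differencing: [ 1 -3  2 -1]
```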
The I in ARIMA stands for Integrated. Integrated here means Differencing.
So the difference between the ARMA model and the ARIMA model is: in ARMA we use y, whereas in ARIMA we use y' or y''.
In the ARIMA model we use the AR model and the MA model on y' or y'', like this:

$$y'_t = c + c_1 y'_{t-1} + c_2 y'_{t-2} + \dots + a_1 \epsilon_{t-1} + a_2 \epsilon_{t-2} + \dots + \epsilon_t$$
ARIMA has 3 parameters, i.e. ARIMA(p,d,q), where p = order of the autoregressive part, d = degree of differencing, and q = order of the moving average part.
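In statsmodels, AR, MA, ARMA and ARIMA can all be expressed through this (p, d, q) order; a minimal sketch, with orders and data chosen only for illustration:

```python
# The same statsmodels ARIMA class covers AR, MA, ARMA and ARIMA,
# depending on the (p, d, q) order.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

y = np.cumsum(np.random.normal(0, 1, 300))   # a toy non-stationary series

ar    = ARIMA(y, order=(2, 0, 0)).fit()      # AR(2)
ma    = ARIMA(y, order=(0, 0, 1)).fit()      # MA(1)
arma  = ARIMA(y, order=(2, 0, 1)).fit()      # ARMA(2,1): no differencing
arima = ARIMA(y, order=(2, 1, 1)).fit()      # ARIMA(2,1,1): works on y'

forecast = arima.forecast(steps=5)           # the next 5 values
```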
SARIMAX model
The S here means Seasonal and the X here means Exogenous.
Seasonal means that the series has a repeating pattern from season to season. For example, a series can be decomposed into a trend part, a seasonal part and a random part, where the seasonal part has the repeating pattern (source: Ref #5).
The SARIMAX model includes the seasonal part as well as the non-seasonal part.
SARIMAX has 7 parameters i.e. SARIMAX(p,d,q)x(P,D,Q,s)
Where p, d, q are as defined above, P, D, Q are the seasonal counterparts of the p, d, q parameters, and s is the number of seasons per year, e.g. s = 12 for monthly data and s = 4 for quarterly data.
In time series, an exogenous variable is a parallel time series that is used as a weighted input to the model (Ref #6).
The exogenous variable is one of the parameters in SARIMAX. In Python (the statsmodels library), the parameters for SARIMAX are:
SARIMAX (y, X, order=(p, d, q), seasonal_order=(P, D, Q, s))
where y is the time series, X is the Exogenous variable/factor, and the others are as described before.
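A minimal sketch of fitting SARIMAX on monthly data (s = 12); the orders and the exogenous series are illustrative assumptions:

```python
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

n = 120   # 10 years of monthly data (toy)
y = 10 + np.sin(np.arange(n) * 2 * np.pi / 12) + np.random.normal(0, 0.2, n)
X = np.random.normal(0, 1, (n, 1))           # one exogenous variable

res = SARIMAX(y, exog=X, order=(1, 0, 0),
              seasonal_order=(1, 0, 0, 12)).fit(disp=False)

# Future exogenous values must be supplied to forecast ahead.
X_future = np.random.normal(0, 1, (6, 1))
forecast = res.forecast(steps=6, exog=X_future)
```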
Forecasting time series using machine learning
The area of machine learning which deals with temporal sequences is Recurrent Neural Networks (RNN). A temporal sequence is anything with a time element (a series of things happening one after the other), such as speech, handwriting, images or video. And that includes time series, of course.
An RNN is a neural network which has an internal memory, which makes it able to recognise patterns in time series. There are many RNN models, such as the Elman network, the Jordan network, the Hopfield network, LSTM and GRU.
The most widely used method for predicting a time series is LSTM. An LSTM cell has 3 gates: an input gate, an output gate and a forget gate.
In the usual LSTM cell diagram, the horizontal line along the top (from $c_{t-1}$ to $c_t$) is the cell state. It is the memory of the cell. Along this line, 3 things happen: the cell state is multiplied by the "forget gate", increased/reduced by the "input gate", and finally the value is taken to the "output gate".
The forget gate removes unwanted information from the cell state (c), based on the previous output ($h_{t-1}$) and the current input ($x_t$).
The input gate adds new information to the cell state. The current input ($x_t$) and the previous output ($h_{t-1}$) pass through a σ and a tanh, are multiplied, then added to the cell memory line.
The output gate calculates the output from the cell state (c), the previous output ($h_{t-1}$) and the current input ($x_t$).
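For reference, in the standard formulation (notation varies slightly between papers) the three gates and the cell state update are:

$$\begin{aligned}
f_t &= \sigma(W_f [h_{t-1}, x_t] + b_f) && \text{(forget gate)} \\
i_t &= \sigma(W_i [h_{t-1}, x_t] + b_i) && \text{(input gate)} \\
\tilde{c}_t &= \tanh(W_c [h_{t-1}, x_t] + b_c) && \text{(candidate state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state update)} \\
o_t &= \sigma(W_o [h_{t-1}, x_t] + b_o) && \text{(output gate)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(output)}
\end{aligned}$$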
Architecturally, there are several ways to forecast time series using LSTM (Ref #7); a minimal Keras sketch of the first one follows this list:
Fully Connected LSTM: a neural network with several layers of LSTM units with each layer fully connected to the next layer.
Bidirectional LSTM: the LSTM model learns the time series in backward direction in addition to the forward direction.
CNN LSTM: the time series is processed by a CNN first (1 dimensional), then processed by LSTM.
ConvLSTM: the convolutional structure is inside the LSTM cell (in both the input-to-state and state-to-state transitions), see Ref #13 and #16.
Encoder-Decoder LSTM: for forecasting several time steps ahead. The encoder maps the time series into a fixed-length vector, and the decoder maps this vector back to a variable-length output sequence.
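Here is the promised sketch of the first architecture: a stacked LSTM forecasting the next value of a univariate series in Keras. The window size, layer sizes and toy data are illustrative assumptions:

```python
# A minimal sketch: stacked LSTM layers forecasting the next value of a
# univariate series from a sliding window of past values.
import numpy as np
from tensorflow import keras

def make_windows(series, window=10):
    """Cut a 1-D series into (X, y) pairs: `window` past values -> next value."""
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    return X[..., np.newaxis], series[window:]  # shape: (samples, steps, features)

series = np.sin(np.linspace(0, 20, 500))        # toy series
X, y = make_windows(series)

model = keras.Sequential([
    keras.layers.LSTM(32, return_sequences=True, input_shape=(10, 1)),
    keras.layers.LSTM(16),            # second LSTM layer
    keras.layers.Dense(1),            # regression head: the next value
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, verbose=0)

next_value = model.predict(series[-10:].reshape(1, 10, 1))
```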
Which one is better, ARIMA or LSTM?
Well, that is the million dollar question! Some research suggests that LSTM is better (Ref #17, #20, #24), some suggests that ARIMA is better (Ref #19), and some says that XGBoost is better than both LSTM and ARIMA (Ref #23). So it depends on the case, but generally speaking LSTM is better in terms of accuracy (RMSE, MAPE).
It is an interesting topic for research, along with other approaches such as Facebook's Prophet, GRU, GAN and their combinations (Ref #25, #26, #27). It is possible to get better accuracy by combining the above approaches. I'm still searching for a topic for my MSc dissertation, and it looks like this could be the one!
References:
Merriam-Webster dictionary explanation on “series” plurality: link
Forecasting: Principles and Practice, by Rob J. Hyndman and George Athanasopoulos: link
Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting, by Xingjian Shi, Zhourong Chen, Hao Wang, Dit-Yan Yeung, Wai-kin Wong, Wang-chun Woo: link
Exploiting the ConvLSTM: Human Action Recognition using Raw Depth Video-Based RNN, by Adrian Sanchez-Caballero, David Fuentes-Jimenez, Cristina Losada-Gutiérrez: link
Convolutional LSTM for spatial forecasting, by Sigrid Keydana: link
Very Deep Convolutional Networks for End-to-End Speech Recognition, by Yu Zhang, William Chan, Navdeep Jaitly: link
A Comparison of ARIMA and LSTM in Forecasting Time Series, by Sima Siami-Namini, Neda Tavakoli, Akbar Siami Namin: link
ARIMA vs Prophet vs LSTM for Time Series Prediction, by Konstantin Kutzkov: link
A Comparative Analysis of the ARIMA and LSTM Predictive Models and Their Effectiveness for Predicting Wind Speed, by Meftah Elsaraiti, Adel Merabet: link
Weather Forecasting Using Merged LSTM and ARIMA Model, by Afan Galih Salman, Yaya Heryadi, Edi Abdurahman, Wayan Suparta: link
Comparing ARIMA Model and LSTM RNN Model in Time-Series Forecasting, by Vaibhav Kumar: link
A Comparison between ARIMA, LSTM, and GRU for Time Series Forecasting, by Peter Yamak, Li Yujian, Pius Kwao Gadosey: link
Machine Learning Outperforms Classical Forecasting on Horticultural Sales Predictions by Florian Haselbeck, Jennifer Killinger, Klaus Menrad, Thomas Hannus, Dominik G. Grimm: link
Forecasting Covid-19 Transmission with ARIMA and LSTM Techniques in Morocco by Mohamed Amine Rguibi, Najem Moussa, Abdellah Madani, Abdessadak Aaroud, Khalid Zine-dine: link
Time Series Forecasting papers on Research Gate: link
Stock Price Forecasting by a Deep Convolutional Generative Adversarial Network by Alessio Staffini: link
A novel approach based on combining deep learning models with statistical methods for COVID-19 time series forecasting by Hossein Abbasimehr, Reza Paki, Aram Bahrini: link
It really puzzled me when people talked about using CNN for stock market prediction. CNN is for processing images. How can CNN be used for predicting the stock market? Surely we need LSTM for that, because it is a time series?
The key here is to recognise that a time series can be viewed in polar coordinates (Ref #2 and #3).
Stock charts are time series: basically the price of a stock across time. This is in Cartesian coordinates, i.e. X and Y. A point in X and Y coordinates can be converted into polar coordinates and vice versa (Ref #4).
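The standard conversion between the two coordinate systems is:

$$r = \sqrt{x^2 + y^2}, \qquad \theta = \arctan\!\left(\frac{y}{x}\right)$$

and back again: $x = r\cos\theta$, $y = r\sin\theta$.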
This way, a line in an x and y chart (such as a time series) can be converted into a polar coordinate chart. The reason we convert it to polar coordinates is to make it easier to detect patterns and anomalies (Ref #5).
To identify the temporal correlation across different time intervals, we look at the cosine of the sum of the angles between each pair of points (Ref #6 and #7), like this:

$$G = \begin{bmatrix} \cos(\phi_1 + \phi_1) & \cos(\phi_1 + \phi_2) & \cdots & \cos(\phi_1 + \phi_n) \\ \cos(\phi_2 + \phi_1) & \cos(\phi_2 + \phi_2) & \cdots & \cos(\phi_2 + \phi_n) \\ \vdots & \vdots & \ddots & \vdots \\ \cos(\phi_n + \phi_1) & \cos(\phi_n + \phi_2) & \cdots & \cos(\phi_n + \phi_n) \end{bmatrix}$$

This matrix of cosines is called the Gramian Matrix.
To make it easier to visualise, we convert the Gramian Matrix into an image (Ref #3).
The image produced this way is called a Gramian Angular Summation Field (GASF). If the Gramian Matrix uses sin instead of cos, and a minus instead of a plus, the resulting image is called a Gramian Angular Difference Field (GADF). Together, GASF and GADF are called GAF (Gramian Angular Field).
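A minimal NumPy sketch of this encoding, following the steps described above (the series is first rescaled to [-1, 1] so that arccos is defined):

```python
# GASF: G[i, j] = cos(phi_i + phi_j); GADF: G[i, j] = sin(phi_i - phi_j),
# where phi is the arccos of the series rescaled to [-1, 1].
import numpy as np

def _angles(series):
    s = np.asarray(series, dtype=float)
    s = 2 * (s - s.min()) / (s.max() - s.min()) - 1   # rescale to [-1, 1]
    return np.arccos(np.clip(s, -1, 1))               # angular encoding

def gasf(series):
    phi = _angles(series)
    return np.cos(phi[:, None] + phi[None, :])

def gadf(series):
    phi = _angles(series)
    return np.sin(phi[:, None] - phi[None, :])

image = gasf([0, 1, 3, 2, 3, 3])   # a 6x6 GASF "image"
```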
So why do we use GAF? What are the advantages? The first advantage is that GAF preserves temporal dependency: time increases from the top left corner of the GAF image to the bottom right corner. Each element of the Gramian Matrix is a superposition (or difference) of directions with respect to the time difference between 2 points. Therefore GAF contains temporal correlations.
Second, the main diagonal of the Gramian Matrix contains the values with no time difference between the 2 points, meaning that the main diagonal contains the actual angular values. Using this main diagonal we can reconstruct the time series from the features learned by the neural network (Ref #6 and #7).
Different colour schemes are used when creating a GAF chart, but the common one runs from blue to green to yellow to red: blue for -1, green for 0, yellow for 0.3 and red for 1 (Ref #6).
Once the time series become images, we can process them using CNN. But how do we use CNN to predict stock prices? The idea is to take the time series of thousands of stocks and indices from various time periods, convert them into GAF images, and label each image with the percentage up or down the next day.
We then train a CNN on those GAF images to predict which series will be up or down the next day and by how much (a regression exercise).
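A hedged Keras sketch of that training step; the image size, network shape and the (random) stand-in data are illustrative assumptions:

```python
# A small CNN that regresses next-day percentage moves from GAF images.
import numpy as np
from tensorflow import keras

n, size = 1000, 64                     # toy dataset of 1000 fake GAF images
X = np.random.rand(n, size, size, 1)   # in practice: gasf(price_window)
y = np.random.randn(n) * 0.02          # label: next-day % move (stand-in)

model = keras.Sequential([
    keras.layers.Conv2D(16, 3, activation="relu", input_shape=(size, size, 1)),
    keras.layers.MaxPooling2D(),
    keras.layers.Conv2D(32, 3, activation="relu"),
    keras.layers.MaxPooling2D(),
    keras.layers.Flatten(),
    keras.layers.Dense(1),             # regression head: predicted % move
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=3, verbose=0)
```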
Second, for each stock chart we produce various indicators such as Bollinger Bands, Exponential Moving Average, Ichimoku Cloud, etc. (Ref #15), and convert all of them to GAF images as well.
We put all these indicator images together with the stock/index image, forming a 3D image (dimensions: x = time, y = values, z = indicators). We use the same labels, i.e. the percentage up or down the next day, to train a CNN on those 3D images. Now we don't only use the stock prices to predict the next day's movement, but also the indicators.
Of course, we can use a conventional LSTM to do the same task, without converting anything to polar coordinates. But the point of this article is using CNN for stock prediction.
I’ll finish with 2 more points:
Apart from GAF there are other methods for converting time series into images, for example the Markov Transition Field (MTF, Ref #9) and the Recurrence Plot (RP, Ref #10). We can use MTF and RP images (of both the prices and the indicators) to predict the next day's prices.
There are other methods of using CNN to predict stock prices without involving images: the stock prices (and their indicators) remain as time series. See Ref #11 first, then #12 and #13. The time series is cut at different points and converted into a matrix (see the sketch below).
If you Google "using CNN for predicting stock prices", the chances are it is this matrix method that you will find, rather than the image method. Because this matrix method produces X and y (input variables and a target variable), we can also use other ML algorithms, including classical ones such as Linear Regression, Decision Trees, Random Forest, Support Vector Machines and Extreme Gradient Boosting.
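A minimal sketch of the matrix method with a classical algorithm (Random Forest here; the window size and toy prices are illustrative):

```python
# Cut the series at different points to build an (X, y) supervised matrix,
# then fit any regressor on it.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

prices = 100 + np.cumsum(np.random.randn(300))   # toy price series

window = 20
X = np.array([prices[i:i + window] for i in range(len(prices) - window)])
y = prices[window:]                              # target: the next price

model = RandomForestRegressor(n_estimators=100).fit(X[:-50], y[:-50])
predictions = model.predict(X[-50:])             # hold out the last 50 rows
```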
The third method is CNN-LSTM. In this method the local perception and weight sharing of CNN are used to reduce the number of parameters in the data before the data is processed by the LSTM (Ref #14).
So there you go: there are 3 ways of using CNN for predicting stock prices. The first is using images (GAF, MTF, RP, etc.), the second is converting the time series into an X and y matrix, and the third is putting a CNN in front of an LSTM.
References:
Encoding time series as images, Louis de Vitry (link).
Convolutional Neural Network for stock trading using technical indicators, Kumar Chandar S (link)
Imaging time series for classification of EMI discharge sources, Imene Mitiche, Gordon Morison, Alan Nesbitt, Michael Hughes-Narborough, Brian G. Stewart, Philip Boreha (link)
Sensor classification using Convolutional Neural Network by encoding multivariate time series as two dimensional colored images, Chao-Lung Yang, Zhi-Xuan Chen, Chen-Yi Yang (link)
Spatially Encoding Temporal Correlations to Classify Temporal Data Using Convolutional Neural Networks, Zhiguang Wang, Tim Oates (link)
Imaging Time-Series to Improve Classification and Imputation, ZhiguangWang and Tim Oates (link)
How to encode Time-Series into Images for Financial Forecasting using Convolutional Neural Networks, Claudio Mazzoni (link)
Advanced visualization techniques for time series analysis, Michaël Hoarau (link)
Financial Market Prediction Using Recurrence Plot and Convolutional Neural Network, Tameru Hailesilassie (link)
How to Develop Convolutional Neural Network Models for Time Series Forecasting, Jason Brownlee (link)
Using CNN for financial time series prediction, Jason Brownlee (link)
CNNPred: CNN-based stock market prediction using several data sources, Ehsan Hoseinzade, Saman Haratizadeh (link)
A CNN-LSTM-Based Model to Forecast Stock Prices Wenjie Lu, Jiazheng Li, Yifan Li, Aijun Sun, Jingyang Wang (link)