Data Warehousing and Business Intelligence

17 April 2017

Definition of Big Data and Data Warehousing

Filed under: Data Architecture,Data Warehousing — Vincent Rainardi @ 5:38 pm
Tags: ,

I’m annoyed that many people label the normal Data Warehousing & BI stuff as “Big Data”. It is not. For me, Big Data is the “Hadoop stuff” (HDFS). It is is in SQL Server, Oracle or Teradata, it is just a normal database, not Big Data.

Big Data is stored on HDFS (Hadoop Distributed File System), not in RDBMS. Oracle, IBM, Microsoft, SAP, Teradata, all use Hadoop to store Big Data. Big Data is queried using MapReduce.

The reason why Big Data can’t be stored in RDBMS is because the format is not tabular. Sometimes it is 2 columns, sometimes it is 200 columns. Like Twitter data. The second reason is because it is too big. Sensors can make 100 measurements in a second, and in a year it could be Petabytes. Web Logs is another example. Tracking the ask and offer price of every transaction in every stock market is another example. Yes we can put Petabytes into SQL Server or Oracle, into Netezza or Teradata, but not at this speed. Hadoop on the other hand is designed exactly to cope with these kind of speed and volume.

Now the usage. What is Big Data Analytics? Big Data Analytics is when we do analytics on Hadoop Data.

Is Fraud Detection Big Data Analytics? Not always. Fraud Detection can be done on a normal Data Warehouse or a database. Is Machine Learning Big Data? Not always. Machine Learning can be done on a normal Data Warehouse or a database. If the Fraud Detection or the Machine Learning is done on data stored in Hadoop, then it is Big Data Analytics.

Even if it is only 200 GB, if it is stored in Hadoop, it is a Big Data. Even if the data is 5 Petabyte, if it is stored in an SQL Server database, it is not a Big Data, in my opinion.

Even if the data is in tabular format (i.e. columns and rows), if it is stored in Hadoop, it is a Big Data. But if it is stored in an Oracle database, it is not a Big Data.

Every Big Data Architecture that I know uses Hadoop. No companies (or government) implement Big Data on an RDBMS. Or on a non HDFS files. Every single company, every single Big Data case I read implement the Big Data on Hadoop. I may be wrong and would be happy to be corrected. If you know a case which implement Big Data on a non-Hadoop system/architecture, I will grateful if you could let me know, either through comments, or via vrainardi@gmail.com.

 

 

Advertisements

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at WordPress.com.

%d bloggers like this: