Data Warehousing and Data Science

11 May 2018

Big Volume Data Warehouse

Filed under: Data Warehousing — Vincent Rainardi @ 7:42 pm

Some people are worried with the volume of a snapshot fact table. If we insert half a million rows a day, after a year we would get 180 million rows, and after 5 years close to 1 billion rows. For example, position valuation of a large asset manager, or account balances of a large retail bank.

The volume issue also occurs on transaction fact table, i.e. we could be inserting a few million rows a day, resulting a few billion rows a year. For example, stock exchange bids and offers, search engines searches, the browsing traffic of a large ecommerce sites, or the good old supermarket checkouts case?

If we haven’t dealt with billions rows fact table like above, we would be worried about the query performance. How long will it take to query a fact table with 10 billion rows? Would it take a minute? 10 minutes? Well it could be, if the database engine has to scan the table. But if it is partitioned, it could only takes a second.

So the question now becomes: ok, great to know that it can take a second to query a 10 billion rows fact table. But how do I partition that big fact table? Well it depends on what fact table it is. If it a snapshot fact table, the partition should be on the snapshot date. If it is a transaction fact table, the partition should be on the transaction date.

Ok, that’s great. But what if we need to query across dates? For example, in the stock exchange data warehouse, to find out which are the top 10 most traded shares by volume (or by value) in the last 3 months. That query will take a lot of minutes. To do this, we need to prepare a fact table which stores the transaction volume and value of every share on every date. We populate this daily snapshot fact table every day, immediately after the main transaction fact table is populated.

For example, today is 11th May 2018. Right after the 10th May data is loaded into the shares transaction table, we calculate the volume and value of every shares traded on 10th May and insert these rows into this daily snapshot fact table. By querying this table it will only take a second to find out what are the top 10 most traded shares in the last 3 months.

Blog at