Data Warehousing, BI and Data Science

20 February 2019

Transactional Fact Tables

Filed under: Data Warehousing — Vincent Rainardi @ 9:11 pm

Transactional fact tables are not as popular as periodic snapshot fact table. In this article I would like to compare transactional and periodic snapshot fact tables, list their advantages and disadvantages, and give two cases from my own experience where I needed to decide between the two. But first let me explain what these two types of fact tables are.

What is a transactional fact table?

In dimensional data modelling we have 3 types of fact tables: transactional, periodic snapshot and accumulation snapshot. I’m going to explain the first two below.

1. Transactional Fact Table

A transactional fact table is a fact table where:

  • Each event is stored in the fact table only once.
  • It has a date column indicating when the event occurred.
  • It has an identifier column which identifies each event.
  • The number of rows is the same as the source table.

A classic example is when in the source system we have a sales table containing customer orders (for example, in a shop, a restaurant, or a factory). Say Monday we had 80 orders, Tuesday 70 orders, Wednesday 90, and so on. So on Monday night we load the 80 rows for Monday into the data warehouse. On Tuesday night we load the 70 rows for Tuesday, and on Wednesday night we load the 90 rows for Wednesday.

Transactional Fact Tables
Figure 1
. A transactional fact table loading 3 days data

In the data warehouse we store the customer orders in the Sales Fact Table, which is a transactional fact table.

  • In this Sales Fact Table, every order is stored only once. The 70 orders on Tuesday are different to the 80 orders on Monday. And they are also different to the 90 Wednesday orders.
  • For this we use the order date column. In the above example the Monday, Tuesday and Wednesday are the order date. This order date column indicates when the event occurred, when the order happened.
  • In this sales table we also have a sales identifier, such as order number if it is a shop, or ticket number if it is a restaurant.
  • On Wednesday night, after the warehouse load finishes, we have 80+70+90 = 240 rows, the same as in the source system.

In addition to insert, we also have update in the source table. In the example above, in addition to 70 new orders on Tuesday, we also have updates to some of the 80 Monday orders. This is they key difference to the Periodic Snapshot fact table: a Transactional fact table updates existing rows, and therefore lost some history.

Other ideal examples of a transactional fact table is the journal table in an accounting system, a trade table in an investment banking system, a premium table in an insurance system, a payment table in a payment system, a transaction table in a retail banking system, and a call table in a telecommunication system.

2. Periodic Snapshot Fact Table

A periodic snapshot fact table is a fact table where:

  • The whole source system is copied into the fact table regularly.
  • The same event is stored multiple times.
  • It has a snapshot date column indicating when a copy of the source table was created.

An ideal example of a periodic snapshot fact table is the bank account balance. At the end of each day, the balances of every customer account in the bank is stored in this account balance table. Say there were 20,000 customers on Monday; 22,000 customers on Tuesday and 24,000 customers on Wednesday.

Periodic Snapshot Fact Tables
Figure 2. Periodic Snapshot Fact Table

  • Every day we copy the whole content of the account balance table into the periodic snapshot fact table.
  • So on Monday night we stored 20,000 rows in the account balances periodic snapshot fact table, on Tuesday night 22,000 rows and on Wednesday night 24,000 rows. So an account is copied every day to the fact table, each day with potentially a different balance amount.
  • In the fact table we have a column called snapshot date. For all the rows created on Monday night, we set the snapshot date column to (for example) 11th Feb 2018. For the rows created on Tuesday night we set the snapshot date to 12th Feb 2018 and for the Wednesday rows we set the snapshot date to 13th Feb 2018.

Of course there are accounts which were closed on Tuesday and no longer in the account balance table in the source system. In the fact table, the Monday data set contains these accounts, but the Tuesday data set doesn’t contain these accounts, and neither does the Wednesday data set.

And there are accounts which were updated on Tuesday. These changes will be reflected on the Tuesday snapshot in the fact table, different to their Monday rows.

Another example is inventory table in manufacturing, holdings table in fund management, billing balance table in telecommunication, and daily temperature table in a weather system.

Advantages and Disadvantages

The advantages of a transactional fact table are:

  • It mimics the source table
  • It is simpler as we only have once version of each event

The disadvantages of a transactional fact table are:

  • We don’t have the previous values
  • Update is slow if the fact table is large, potentially performance issue

Now let’s get on with the two cases of transactional fact table implementations.

Case1: Retail Association

Let’s suppose we are running a Retail Association. Every shop in the country reports their annual sales data on our website, within one month of their financial year end. In January 2019 there were 8,000 shops reporting their sales data on our website, and in Feb 2019 there were 9000 shops reporting their sales data. There are about 100,000 shops in total.

Every month we get a CSV file containing the previous month data. The file is called Data_YYYYMM.csv and the file contains a column called Reporting Date. So:

  • The Data_20180131.csv contains the 8000 shops reporting in January 2019, with the Reporting Date column containing dates from 1st to 31st Jan 2019.
  • The Data_20190228.csv contains the 9000 shops reporting in February 2019, with the Reporting Date column containing dates from 1st to 28th Feb 2019.

Case 1
Figure 3. Retail Association monthly data file loaded into a transactional fact table

Because on the February file the January data is not given in full (but only the changed and new rows), we can’t make it as a periodic snapshot fact table. So in this case a transactional fact table is the only option.

Heart of Transactional Fact Tables

In theory the loading is straight forward. Jan file is loaded, then Feb file, and so on. But in reality this is rarely the case. A few shops who supposed to report in January were late, and they reported in February. So the in the February file we also get a few rows of the January “late reporter” shops.

Some shops made an error in their January submission and corrected it in February. So in February file we also have a few rows containing January corrections.

That is the heart of transactional fact tables: performing updates to the previous months’ data. In this case it goes back only one month, but in real cases it could be a few months.

Case 2: Waste Reporting

Suppose we are working with a government department responsible for implementing the waste regulations in our country. Every company and government agency in the country needs to report the amount and types of their packaging waste, electronic waste, commercial waste and recyclables e.g. plastic, paper, metal. This report is annual (once a year). Every town council also need to report the amount and types of their household waste, also annually.

The timing when they have to submit the report is different for every company, agency and council, depending on their financial year end. But the report contains the same year, which is the previous calendar year. So companies/agencies/councils with financial year ending on 31st March 2018 need to report the waste happened from Jan to Dec 2017. Those with financial year ending on 31st August 2018 also need to report the waste happened from Jan to Dec 2017.

The data we get from this central government department is monthly, i.e. every month we get a file. The file contains the waste produced in the previous calendar year. So the file we get in all 12 months of 2019 contains the waste data for 2018, but growing every month. For example, in Jan 2019 the file contains 17,000 entities (i.e. companies, agencies and councils), in Feb 2019 the file contains 32,000 entities, the March 2019 file contains 49,000 entities, and so on, like below.

Case 2
Figure 4. Waste Regulation monthly data file loaded into a transactional fact table

The February file contains all the 17,000 January entities plus 15,000 entities that report in February. But the February file also contains corrections for January. In other words, in the February file the waste data for some of these 17,000 January entities might have changed. For example, company A which in January reported waste of 11,000 kg of waste for 2018 calendar year, in February might submitted a correction to change this figure to 11,500 kg. This correction may happen up to 6 months after the submission deadline. So the correction for January data can happen in the July file.

Now we need to decide whether transactional fact table is the right approach for this case.

Case 2 Periodic vs Transactional
Figure 5. Comparing Transactional Fact Table and Periodic Snapshot

If we build this fact table as a periodic snapshot fact table, we will have every version of the data. We will have the January version when Entity A was 11,000 kg, and we will also have the February and March version when entity A was 11,500 kg.

If we build it as a transactional fact table, we will only have the latest version.

From my experience many people simply go for the snapshot model because of “just in case”. In case we need the January version.

The other reason, which is more valid, is trace-ability. If we use a transactional fact table, the report we produced in Feb 2019 (based on the Jan data) will have different numbers to the report we produce in Apr 2019 (based on the Mar data). If we use the snapshot model, we can explain the differences, i.e. because in January Entity A was 11000 kg, not 11500 kg.

So in case 1 we are dealing with “incremental” monthly data files, whereas in case 2 we are dealing with “accumulated” monthly data files.

 

Advertisements

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

Create a free website or blog at WordPress.com.

%d bloggers like this: