# Data Warehousing and Data Science

## 20 February 2019

### Transactional Fact Tables

Filed under: Data Warehousing — Vincent Rainardi @ 9:11 pm

Transactional fact tables are not as popular as periodic snapshot fact table. In this article I would like to compare transactional and periodic snapshot fact tables, list their advantages and disadvantages, and give two cases from my own experience where I needed to decide between the two. But first let me explain what these two types of fact tables are.

What is a transactional fact table?

In dimensional data modelling we have 3 types of fact tables: transactional, periodic snapshot and accumulation snapshot. I’m going to explain the first two below.

1. Transactional Fact Table

A transactional fact table is a fact table where:

• Each event is stored in the fact table only once.
• It has a date column indicating when the event occurred.
• It has an identifier column which identifies each event.
• The number of rows is the same as the source table.

A classic example is when in the source system we have a sales table containing customer orders (for example, in a shop, a restaurant, or a factory). Say Monday we had 80 orders, Tuesday 70 orders, Wednesday 90, and so on. So on Monday night we load the 80 rows for Monday into the data warehouse. On Tuesday night we load the 70 rows for Tuesday, and on Wednesday night we load the 90 rows for Wednesday.

Figure 1

In the data warehouse we store the customer orders in the Sales Fact Table, which is a transactional fact table.

• In this Sales Fact Table, every order is stored only once. The 70 orders on Tuesday are different to the 80 orders on Monday. And they are also different to the 90 Wednesday orders.
• For this we use the order date column. In the above example the Monday, Tuesday and Wednesday are the order date. This order date column indicates when the event occurred, when the order happened.
• In this sales table we also have a sales identifier, such as order number if it is a shop, or ticket number if it is a restaurant.
• On Wednesday night, after the warehouse load finishes, we have 80+70+90 = 240 rows, the same as in the source system.

In addition to insert, we also have update in the source table. In the example above, in addition to 70 new orders on Tuesday, we also have updates to some of the 80 Monday orders. This is they key difference to the Periodic Snapshot fact table: a Transactional fact table updates existing rows, and therefore lost some history.

Other ideal examples of a transactional fact table is the journal table in an accounting system, a trade table in an investment banking system, a premium table in an insurance system, a payment table in a payment system, a transaction table in a retail banking system, and a call table in a telecommunication system.

2. Periodic Snapshot Fact Table

A periodic snapshot fact table is a fact table where:

• The whole source system is copied into the fact table regularly.
• The same event is stored multiple times.
• It has a snapshot date column indicating when a copy of the source table was created.

An ideal example of a periodic snapshot fact table is the bank account balance. At the end of each day, the balances of every customer account in the bank is stored in this account balance table. Say there were 20,000 customers on Monday; 22,000 customers on Tuesday and 24,000 customers on Wednesday.

Figure 2. Periodic Snapshot Fact Table

• Every day we copy the whole content of the account balance table into the periodic snapshot fact table.
• So on Monday night we stored 20,000 rows in the account balances periodic snapshot fact table, on Tuesday night 22,000 rows and on Wednesday night 24,000 rows. So an account is copied every day to the fact table, each day with potentially a different balance amount.
• In the fact table we have a column called snapshot date. For all the rows created on Monday night, we set the snapshot date column to (for example) 11th Feb 2018. For the rows created on Tuesday night we set the snapshot date to 12th Feb 2018 and for the Wednesday rows we set the snapshot date to 13th Feb 2018.

Of course there are accounts which were closed on Tuesday and no longer in the account balance table in the source system. In the fact table, the Monday data set contains these accounts, but the Tuesday data set doesn’t contain these accounts, and neither does the Wednesday data set.

And there are accounts which were updated on Tuesday. These changes will be reflected on the Tuesday snapshot in the fact table, different to their Monday rows.

Another example is inventory table in manufacturing, holdings table in fund management, billing balance table in telecommunication, and daily temperature table in a weather system.

The advantages of a transactional fact table are:

• It mimics the source table
• It is simpler as we only have once version of each event

The disadvantages of a transactional fact table are:

• We don’t have the previous values
• Update is slow if the fact table is large, potentially performance issue

Now let’s get on with the two cases of transactional fact table implementations.

Case1: Retail Association

Let’s suppose we are running a Retail Association. Every shop in the country reports their annual sales data on our website, within one month of their financial year end. In January 2019 there were 8,000 shops reporting their sales data on our website, and in Feb 2019 there were 9000 shops reporting their sales data. There are about 100,000 shops in total.

Every month we get a CSV file containing the previous month data. The file is called Data_YYYYMM.csv and the file contains a column called Reporting Date. So:

• The Data_20180131.csv contains the 8000 shops reporting in January 2019, with the Reporting Date column containing dates from 1st to 31st Jan 2019.
• The Data_20190228.csv contains the 9000 shops reporting in February 2019, with the Reporting Date column containing dates from 1st to 28th Feb 2019.

Figure 3. Retail Association monthly data file loaded into a transactional fact table

Because on the February file the January data is not given in full (but only the changed and new rows), we can’t make it as a periodic snapshot fact table. So in this case a transactional fact table is the only option.

Heart of Transactional Fact Tables

In theory the loading is straight forward. Jan file is loaded, then Feb file, and so on. But in reality this is rarely the case. A few shops who supposed to report in January were late, and they reported in February. So the in the February file we also get a few rows of the January “late reporter” shops.

Some shops made an error in their January submission and corrected it in February. So in February file we also have a few rows containing January corrections.

That is the heart of transactional fact tables: performing updates to the previous months’ data. In this case it goes back only one month, but in real cases it could be a few months.

Case 2: Waste Reporting

Suppose we are working with a government department responsible for implementing the waste regulations in our country. Every company and government agency in the country needs to report the amount and types of their packaging waste, electronic waste, commercial waste and recyclables e.g. plastic, paper, metal. This report is annual (once a year). Every town council also need to report the amount and types of their household waste, also annually.

The timing when they have to submit the report is different for every company, agency and council, depending on their financial year end. But the report contains the same year, which is the previous calendar year. So companies/agencies/councils with financial year ending on 31st March 2018 need to report the waste happened from Jan to Dec 2017. Those with financial year ending on 31st August 2018 also need to report the waste happened from Jan to Dec 2017.

The data we get from this central government department is monthly, i.e. every month we get a file. The file contains the waste produced in the previous calendar year. So the file we get in all 12 months of 2019 contains the waste data for 2018, but growing every month. For example, in Jan 2019 the file contains 17,000 entities (i.e. companies, agencies and councils), in Feb 2019 the file contains 32,000 entities, the March 2019 file contains 49,000 entities, and so on, like below.

Figure 4. Waste Regulation monthly data file loaded into a transactional fact table

The February file contains all the 17,000 January entities plus 15,000 entities that report in February. But the February file also contains corrections for January. In other words, in the February file the waste data for some of these 17,000 January entities might have changed. For example, company A which in January reported waste of 11,000 kg of waste for 2018 calendar year, in February might submitted a correction to change this figure to 11,500 kg. This correction may happen up to 6 months after the submission deadline. So the correction for January data can happen in the July file.

Now we need to decide whether transactional fact table is the right approach for this case.

Figure 5. Comparing Transactional Fact Table and Periodic Snapshot

If we build this fact table as a periodic snapshot fact table, we will have every version of the data. We will have the January version when Entity A was 11,000 kg, and we will also have the February and March version when entity A was 11,500 kg.

If we build it as a transactional fact table, we will only have the latest version.

From my experience many people simply go for the snapshot model because of “just in case”. In case we need the January version.

The other reason, which is more valid, is trace-ability. If we use a transactional fact table, the report we produced in Feb 2019 (based on the Jan data) will have different numbers to the report we produce in Apr 2019 (based on the Mar data). If we use the snapshot model, we can explain the differences, i.e. because in January Entity A was 11000 kg, not 11500 kg.

So in case 1 we are dealing with “incremental” monthly data files, whereas in case 2 we are dealing with “accumulated” monthly data files.

1. do we save history in transaction fact tables?

Comment by ankit — 21 July 2020 @ 6:55 am

• No we don’t store history in transaction fact tables. We store history in snapshot fact tables.

Comment by Vincent Rainardi — 21 July 2020 @ 7:55 am

2. Your definition of Period Snapshot Fact Table contradicts Kimball – “A row in a periodic snapshot fact table summarizes many measurement events occurring over a standard period, such as a day, a week, or a month. The grain is the
period, not the individual transaction”

Comment by lesoftwareman — 30 August 2020 @ 11:50 pm

• Thanks for your comment Lesoftwareman. In what way does it contradict Kimball’s principles? Is it the single event principle?

Comment by Vincent Rainardi — 31 August 2020 @ 7:40 pm

3. Hello Vincent, In your diagram of periodic snapshot fact table you are coping all the transactions present in the operational data source to the fact table where as definition suggests a single summary row of transactional information that occurred during the period.

Comment by Ankit — 25 September 2020 @ 9:40 pm

• Both of them are correct Ankit, in my opinion. Sometimes we need to summarise them, sometimes we need to bring all the transactions. It depends on what we will be using them for. How is it in your case?

Comment by Vincent Rainardi — 25 September 2020 @ 10:12 pm

4. Hi, I agree with other comments that your definition of a periodic snapshot is not correct. A periodic snapshot isn’t just a copy of all the source data at a point in time. It’s a reflection of either the latest status of something at a point-in-time (e.g., the last temperature measurement at end-of-month) or it’s some aggregation of all the records up that point (average temperature for the month). Recall that disk space was costly when Kimball was developing his framework – it wouldn’t make sense for him to be advocating copying your entire source system regularly. Nowadays things are different of course….

Comment by Paul — 25 November 2020 @ 11:39 pm

Blog at WordPress.com.