Data Platform and Data Science

27 November 2021

Using Temporal Tables in Data Warehousing

Filed under: Data Warehousing — Vincent Rainardi @ 2:38 pm

Can we use temporal tables in data warehousing? Yes for staging area (persistent). Yes for slowly changing dimension, but not recommended. No for standard types fact tables, but yes for particular cases.

What are temporal tables?

Temporal tables keep all the historical changes, i.e. deleted rows and previous values before updates. Here is the documentation: link.

We can query a temporal table to get the latest version or the old versions:

Using Temporal Tables for Staging Area

We can use a temporal table for a staging area. There are 2 types of staging area: transient, which is the one you truncate before every load, and persistent, which you don’t truncate / delete (you keep all the historical data here).

Please read Hans Michials experience here: link (on using temporal table for staging area). The conclusion was: a temporal tables are a lot faster than conventional persistent staging tables. And temporal tables are able to adopt changes in the data model too.

And, out of the box, temporal tables enable us to do historical load into dimensions. And it is simple and reliable.

Using Temporal Tables for Slowly Changing Dimensions

Please read what Microsoft wrote in August 2021 first: link (on using temporal tables for slowly changing dimensions). Then please read what Tim Michell wrote in April 2019: link (scroll down to this part: using temporal tables for slowly changing dimensions). The conclusion: temporal tables is good for slowly changing dimensions. It simplifies the load process. It simplifies changing a dimension from type 1 to type 2.

But it is a bit tricky to implement temporal tables for SCD type 2. The join to the fact table can’t be on surrogate key, but on natural key plus the effective dates. That is not a good, both for performance and conceptually, thanks to Gary who pointed out at the end of Tim’s article.

And Gary also pointed out one important fact: in temporal tables the temporal times can overlap. In SCD type 2, the effective dates must not overlap (because otherwise we won’t know which row was effective at certain datetime.

And the killer point is (again, also from Gary) in SCD type 2 dimension, I’m usually very selective about which attribute we want to set as type 2. If there is no good reason for that attribute to be type 2, it stays as type 1 (including corrections). But in the temporal tables, changes on every column result in new rows. We can’t be selective and choose only 5 columns out of 50, for example.

Because of those reasons, I would not recommend using temporal table for SCD type 2 dimension.

Using temporal tables for fact tables

First please read Paul Te Braak article on using temporal tables for fact tables: link. A SCD type 2 dimension captures which attributes were active for each fact row, whereas a temporal table doesn’t do that.

Let’s have a look at the 2 transaction above from James Hood buying oil filter for his TD5 Discovery II Land Rover car. One was last year (Sep 2020) and was this week (Nov 2021). The attributes of the oil filter are different between last year and this year. The customer attributes are also different. That is what SCD type 2 dimension captures. They captures which attributes were active for each fact row. Whereas temporal tables don’t do that.

Second, please read Martin Schoombee article on how he implemented temporal tables for his retail pricing tool case: link. In this specific case, to me it makes sense to use temporal fact table. Martin’s case is similar to Davide Mauri’s insurance document case that he presented at SQLBits X: link. In this case it also makes sense to use temporal tables for the fact table.

But, temporal tables are not suitable for replacing the 3 Kimball fact table types i.e. periodic snapshot fact table, accumulative snapshot fact table or transaction fact table. In the periodic snapshot fact table, the timing for snapshotting is at particular dates, for example end of month or beginning of month. We control the timing, but in temporal tables we don’t have any control on the snapshot date. Because of all the reasons above we can’t use temporal tables for all 3 Kimball types of the fact tables. But your can use it if your case is like Martin’s or Davide’s cases above.

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a comment

Blog at WordPress.com.