This is a short article about the time element in the effective/expiry date column in an SCD type 2 dimension. The format used in this article (and all my articles in this blog) is DD/MM/YYYY. Thanks to Jim Kennedy for his feedback about the date format. Apologies for the confusion.
In many data warehouses, I still see that the effective date and expity date columns in the type 2 dimension is set to date only, i.e. 12/02/2012. This is incorrect. We should have the time element as well, i.e. 12/02/2012 08:56:17. Why, because if we have 3 updates on an attribute for the same day, we will be screwed.
Here’s an example:
The customer type is updated 3 times today. In the beginning it was “UK Individual”. At 10 o’clock it changes to “International”. Then at 1 o’clock it changes to “Business”. And finally at 3 o’clock it changes to “Group”. Then we’ll have something like this:
… resulting in a lot of confusion when the fact table calls “give me the customer type for 13/2/2012”. Because 4 rows are returned. All 4 customer types above are returned when you join the fact table with “…where effective_date <= 13/02/2012 and expiry date >= 13/02/2012”.
If the effective and expiry dates have time elements in them, it would like this:
Now when the fact table calls “give me the customer type for 13/2/2012”, the answer is UK Individual. Why? Because 13/2/2012 implies that it is 13/2/2012 00:00:00. Hence it will return only 1 row: the top row above.
But is that what you want? Usually no. Usually we want the last value entered that day, i.e. in the above case it’s Group. But how do we do that? Simple, we add 23:59:59 like this “… where effective_date <= 13/02/2012 23:59:59 and expiry_date >= 13/02/2012 23:59:59”
As always I welcome suggestions and questions at email@example.com
Vincent Rainardi 15/2/2012