8 March 2011

Time Dimension

Filed under: Data Warehousing — Vincent Rainardi @ 10:03 pm
Tags:

This is a classic topic in dimensional modelling. I am not talking about date dimension here, but time dimension, e.g. 2011-03-08 19:31:05.289. Consider this case:

It is an order table in a stock brokerage company (click here for a background). It has these 3 columns: Order date time, cancellation datetime, settlement date. Of course it has other columns, but they are irrelevant to this post. Sample data:

Order Datetime| Cancellation Datetime|Settlement Date
2011-03-08 08:59:31.201|NULL|2011-03-11
2011-02-12 14:07:22.549|2011-02-13 06:38:30.216|2011-02-16

The SettlementDate is irrelevant in this matter because it is a date column. Let’s take Order Datetime column as an example, because it is a datetime column. The question is, how are we going to store it in the warehouse/mart? We have 3 options:

a. Create OrderDateKey and OrderTimeKey in the fact table? They then link to 2 dimensions: DimDate and DimTime.
b. Create OrderDatetimeKey in the fact table? This then links to DimDatetime.
c. Not keying it, but store it as a datetime column in the fact table?

Some of the things that we need to consider in this case are:

1. Do you need the millisecond element of the time?
2. Do you need the “second” element of the time or just the minutes?
3. Do you need to band the time, e.g. into 1 hour slots or half hour slots.
4. Do you need to analyse the time element separately from the date element?
Or the reverse: do you need to analyse just the date element without the time?
5. Is this a fact table or a dimension table?

In many DW projects option b is considered impractical. Option a and c are chosen more often than b. If the millisecond is needed, usually option c is more suitable. If the millisecond is not needed, usually option a is more suitable.

If it is a fact table, then the tendency is to key the datetime column (option a). If it is a dimension table, the tendency is not to key the datetime column (option c).

If the date is analysed separately from the time element, option a is preferred than option b.

Apart from the 3 options above, we have a “combi” option, i.e. the date column is keyed, but the time column is not. This is rarely used. I’m struggling to see the justification or the case where this approach is more suitable than option a, b, and c.

DimDatetime (option b) practically speaking can only be used if the grain is minute. We can’t do b if the grain is second or ms.

Here are the number of rows for each dim (the word in brackets shows the grain of the dim):

DimTime (min):     1,440
DimTime (sec): 86,400
DimTime (ms): 86,400,000

DimDatetime (min): 5,256,000
DimDatetime (sec): 315,360,000
DimDatetime (ms): 315,360,000,000

That is why practically speaking the last 2 lines can’t be used. Because the dimension would have had 315 million and 315 billion rows respectively.

As usual I welcome any feedback and discussion at vrainardi@gmail.com. Vincent 8/3/11.

Update, 21/4/2011: If you need to compose a hierarchy between date attributes and time attribute (say day and hour) then it is best to keep them as 1 dimension (not Dim_Date and Dim_Time separately)

1. Vincent – I was doing some research on best practices while building an architecture for data warehouse model to support dates and times down to the millisecond and came across your post.

Regarding your topic and the premis of my research, I have a scenario that requires my grain to be down to the (ms) over a date span (1/1/1999 to 12/31/2011). With that in mind, one dimension wouldn’t be appropriate, would it?

Regards,
Michael Lee

Comment by Michael Lee — 5 June 2011 @ 1:33 pm

• Apologies I’m not sure why even though you posted this in 2011, I just saw it today.
Yes Michael, it would be better to have 2 dimensions (date and time of day) rather than 1 dimension with the grain of ms.

Comment by Vincent Rainardi — 9 February 2015 @ 8:39 am

2. […] Altough it takes a bit of time to maintain the lookup table, overall we still save time as the time saved by querying a slim key lookup table is a lot greater than the time required to maintain the lookup table itself. This is especially true for a Master Data Management (MDM) dimensions such as customer, product and account, where they are used all over the place in the data warehouse and data marts. Time dimension is not that long – 10 years is only 3651 rows – hence I tend not to have a lookup for time dimension. For those of us who think to set the grain of time dimension to hours or minutes, the general advice is: don’t. Either put a time stamp column on the fact table or have a time of day dimension. But this discussion (time of day dimension) is for another article: link. […]

Pingback by Populating Fact Tables | Data Warehousing and Business Intelligence — 9 February 2015 @ 8:09 am

Blog at WordPress.com.