This is a classic topic in dimensional modelling. I am not talking about date dimension here, but time dimension, e.g. 2011-03-08 19:31:05.289. As always it is easier to explain by example:
It is an order table in a stock brokerage company (click here for a background). It has these 3 columns: Order date time, cancellation datetime, settlement date. Of course it has other columns, but they are irrelevant to this post. Sample data:
Order Datetime| Cancellation Datetime|Settlement Date
2011-03-08 08:59:31.201|NULL|2011-03-11
2011-02-12 14:07:22.549|2011-02-13 06:38:30.216|2011-02-16
The SettlementDate is irrelevant in this matter because it is a date column. Let’s take Order Datetime column as an example, because it is a datetime column. The question is, how are we going to store it in the warehouse/mart? We have 3 options:
a. Create OrderDateKey and OrderTimeKey in the fact table? They then link to 2 dimensions: DimDate and DimTime.
b. Create OrderDatetimeKey in the fact table? This then links to DimDatetime.
c. Not keying it, but store it as a datetime column in the fact table?
Some of the things that we need to consider in this case are:
- Do you need the millisecond element of the time?
- Do you need the “second” element of the time or just the minutes?
- Do you need to band the time, e.g. into 1 hour slots or half hour slots.
- Do you need to analyse the time element separately from the date element?
Or the reverse: do you need to analyse just the date element without the time? - Is this a fact table or a dimension table?
In many DW projects option b is considered impractical. Option a and c are chosen more often than b. If the millisecond is needed, usually option c is more suitable. If the millisecond is not needed, usually option a is more suitable.
If it is a fact table, then the tendency is to key the datetime column (option a). If it is a dimension table, the tendency is not to key the datetime column (option c).
If the date is analysed separately from the time element, option a is preferred than option b.
Apart from the 3 options above, we have a “combi” option, i.e. the date column is keyed, but the time column is not. This is rarely used. I’m struggling to see the justification or the case where this approach is more suitable than option a, b, and c.
DimDatetime (option b) practically speaking can only be used if the grain is minute. We can’t do b if the grain is second or ms.
Here are the number of rows for each dim (the word in brackets shows the grain of the dim):
DimTime (min): 1,440
DimTime (sec): 86,400
DimTime (ms): 86,400,000
DimDatetime (min): 5,256,000
DimDatetime (sec): 315,360,000
DimDatetime (ms): 315,360,000,000
That is why practically speaking the last 2 lines can’t be used. Because the dimension would have had 315 million and 315 billion rows respectively.
As usual I welcome any feedback and discussion at vrainardi@gmail.com. Vincent 8/3/11.
Update, 21/4/2011: If you need to compose a hierarchy between date attributes and time attribute (say day and hour) then it is best to keep them as 1 dimension (not Dim_Date and Dim_Time separately)