If a dimension only has one attribute, is it worth putting the attribute in the fact table instead? Why should we bother to create a separate table for that dimension, with 2 columns: dim key and that single attribute? That will make the query slow right? Consider a simple 1 dim 1 fact scenario:
create dim1 (dim1key int, attr1 varchar(20))
create fact1 (dim1key int, measure1 money)
A typical star join query for that mart is:
select d.attr1, f.sum(measure1)
from fact1 f
inner join dim1 d on d.dim1key = f.dim1key
group by d.attr1
If we put the attribute in the fact table:
create fact1(attr1 varchar(20), measure1 money)
we could query the fact table directly, which means better performance:
select attr1, sum(measure1) from fact1 group by attr1
Well, there are 3 reasons why we should put that single attribute in its own dimension table:
- To keep the fact table slim (4 bytes int rather than 20 bytes varchar)
- If the value changes in the source system, we don’t have to update the big fact table
- Yes it’s only 1 attribute today, but in the future we could have another attribute
There is an exception: snapshot month. We use snapshot month column in a periodic snapshot fact table. Snapshot month in the form of an integer (201003 for March 2010) doesn’t violate the 3 point above.
- It is an integer, not char(6).
- The value never changes, March 2010 will still be March 2010 forever
- There will not be another attribute (snapshot year? there is no such things)
How about snapshot date (20100311)? And snapshot week (201014)? Same thing, in my opinion. They can go directly on the fact table without having a dim table.
If on the other hand it’s a varchar (or char because the length is constant), then it needs to go to its own dimension table. For example: ‘2010-03-11’ (snapshot date), ‘2010-14’ (snapshot week) and ‘2010-03’ (snapshot month).
There is one more thing I need to mention: junk dimension (Kimball’s term). We could (sometimes should) combine dimensions with one attribute into a junk dimension.
VR, 30/9/13: one other factor that we need to consider is the cardinality of the attribute column. Cardinality in database design means “degree of value uniqueness”. An DD attribute with very high cardinality such as order number, transaction ID and invoice ID, should probably be left in the fact table. This is the convention, which was recommended in Kimball’s Data Warehouse Toolkit book (put DD in fact table). But if the DD has very low cardinality (status for example), i would argue that it is beneficial to put it in a separate fact table. Factor #3 above (there will be another attribute) would be applicable to Status DD. Status could have 3 attributes, not just 1: account status code (A, C, S), account status (Active, Closed, Suspended), and account status description (explaining that Suspended means in dispute pending investigation).