Data Warehousing and Data Science

11 March 2010

A dimension with only one attribute

Filed under: Data Warehousing — Vincent Rainardi @ 7:10 pm
Tags:

If a dimension only has one attribute, is it worth putting the attribute in the fact table instead? Why should we bother to create a separate table for that dimension, with 2 columns: dim key and that single attribute? That will make the query slow right? Consider a simple 1 dim 1 fact scenario:
create dim1 (dim1key int, attr1 varchar(20))
create fact1 (dim1key int, measure1 money)

A typical star join query for that mart is:
select d.attr1, f.sum(measure1)
from fact1 f
inner join dim1 d on d.dim1key = f.dim1key
group by d.attr1

If we put the attribute in the fact table:
create fact1(attr1 varchar(20), measure1 money)
we could query the fact table directly, which means better performance:
select attr1, sum(measure1) from fact1 group by attr1

Well, there are 3 reasons why we should put that single attribute in its own dimension table:

  1. To keep the fact table slim (4 bytes int rather than 20 bytes varchar)
  2. If the value changes in the source system, we don’t have to update the big fact table
  3. Yes it’s only 1 attribute today, but in the future we could have another attribute

There is an exception: snapshot month. We use snapshot month column in a periodic snapshot fact table. Snapshot month in the form of an integer (201003 for March 2010) doesn’t violate the 3 point above.

  1. It is an integer, not char(6).
  2. The value never changes, March 2010 will still be March 2010 forever
  3. There will not be another attribute (snapshot year? there is no such things)

How about snapshot date (20100311)? And snapshot week (201014)? Same thing, in my opinion. They can go directly on the fact table without having a dim table.

If on the other hand it’s a varchar (or char because the length is constant), then it needs to go to its own dimension table. For example: ‘2010-03-11’ (snapshot date), ‘2010-14’ (snapshot week) and ‘2010-03’ (snapshot month).

There is one more thing I need to mention: junk dimension (Kimball’s term). We could (sometimes should) combine dimensions with one attribute into a junk dimension.

VR, 30/9/13: one other factor that we need to consider is the cardinality of the attribute column. Cardinality in database design means “degree of value uniqueness”. An DD attribute with very high cardinality such as order number, transaction ID and invoice ID, should probably be left in the fact table. This is the convention, which was recommended in Kimball’s Data Warehouse Toolkit book (put DD in fact table). But if the DD has very low cardinality (status for example), i would argue that it is beneficial to put it in a separate fact table. Factor #3 above (there will be another attribute) would be applicable to Status DD. Status could have 3 attributes, not just 1: account status code (A, C, S), account status (Active, Closed, Suspended), and account status description (explaining that Suspended means in dispute pending investigation).

1 Comment »

  1. […] A dimension which only has 1 column, and therefore be kept in the fact table is called a Degenerate Dimension. A Degenerate Dimension is usually used to store identifier of the source table, such as Transaction ID and Order ID. But it is also perfectly valid for dimensions which naturally only have one attribute/column, like Year dimension. See my article about “A dimension with only one attribute” here: link. […]

    Pingback by DimMonth, DimQuarter and DimYear | Data Warehousing and Business Intelligence — 29 December 2015 @ 6:23 am | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

Blog at WordPress.com.

%d bloggers like this: