Data Warehousing and Business Intelligence

19 December 2009

Where To Put an Attribute (DW)

Filed under: Data Warehousing — Vincent Rainardi @ 9:59 pm
Tags:

When designing a data warehouse, sometimes we need to decide whether to put an attribute(s) in its own dimension, or in an existing dimension. That is, if you have a fact table with dim1_key on it, you need to decide whether to put this new attribute(s) on dim1 or create a new dim2.

In travel or airline industry, you could have a ‘ticket’ dimension and put all the ticket details on it. In insurance, you could have a policy dimension and put all policy details on it. In retail banking, we could have an account dimension and put all the account details on it. In telecom industry, we could have call dimension and put all the call details on it. In health care, we could have a dimension called ‘case’ and put all the case detail on it. In investment banking, we could have a dimension called Transaction, and put all the transaction details on it. This concept is the same as creating an order dimension in the retail or ecommerce industry. It’s like a large dimension, which should be split into several dimensions.

In banking, a group consists of several companies, and each company can have several accounts. We are not going to put the companies details into account dimension, are we? We are going to put the companies details into customer dimension. But, are we going to split company and group into separate dimensions? In telecom,

In some cases, it is a decision whether we snowflake or keep it as a star. Which is easy to decide, as that depends on your style/principle of warehousing. But in some cases, it’s not a case of snowflaking. In snowflake, dim2 is linked to the fact table through dim1. But in many of the cases above, it is about whether we have dim1 & dim2 both linked to the fact table, or combine them into 1 dim (with lower grain).

It’s not a matter of fact table grain either, i.e. whether we can split the measure into several rows based on those dim2 attributes. Because we know that (for example) the grain of the fact table is dim1, dim2, dim3, dim4. But for analysis purpose, we could split dim4 into 2 dims. We could put say 3 attributes of dim4 into dim5. The grain of the fact table is still dim1, dim2, dim3, dim4. Dim 5 does not affect the measure. If the value of measure1 on a fact row is 100, after we split dim4 it will still be 100. The fact that there are 5 dim keys in the fact table doesn’t mean that the grain of the fact table is those 5 dims. No. In many cases, the number of the dim keys on the fact table is more than the ‘grain keys’.

In these cases, the question we should ask myself when deciding is: can those dim2 attributes change without dim1 attributes changing? If they can, then they should be in a separate dimension.

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at WordPress.com.

%d bloggers like this: