In a fact table there are a few dimension key columns. These columns link to dimension tables. Each dimension table has a grain, the lowest unit that constitute a row (apart from SCD stuff). The question here is whether we should put an attribute from one of those dimensions into its own dimension. As usual, it is easier to explain by example.
Example 1. ProductGroupKey. In the classic case of retail mart, in the sales fact table we have ProductKey, DateKey, StoreKey and CustomerKey. In the Product dimension we have product type. A product group depends only on product key, hence Product Group is an attribute of product dimension. The grain of product dimension is product. Product group has a higher grain than product. Should we separate product group attributes into a separate dimension and have ProductGroupKey in the sales fact table
Example 2. CustomerKey. In the stock broker data mart (see my SQLBits presentation here for a background), in the account balance fact table we have AccountKey, OrderDateKey, OrderTypeKey, ShareKey, SettlementDateKey, OrderStatusKey and CancellationDateKey. We have an Account dimension and we also have a Customer dimension. A customer can have several accounts but an account can only belong to one customer. So a customer depends totally on an account. Hence the CustomerKey should be attribute of the account dimension. The grain of the account dimension is account. Customer has a higher grain than account. There is no doubt we need to have a customer dimension, but should we keep the CustomerKey in the account dimension or should put the CustomerKey in the fact table?
Example 3. StatusKey. In the product or customer dimension, we have customer status and product status. There could be several statuses. For customer: credit status, CRM status, customer life cycle status, whereas for product: stock status, product life cycle status. Some statuses change once a year, some once a week. Should we put these statuses into a separate dimension and have a status key in the fact table?
The things that we need to consider when making these decisions are:
- Whether there are many other attributes at the same level or not. In example 2, customer name is an attribute of an account, but there are many other customer attributes which are at customer level, hence they need to be put into a separate dimension.
- This is an influential factor: Whether there are measures which are at that level. In example 1, if we have several measures at Product Group level, and we can’t break those measures down to Product level, then we will need to have a dimension at this grain. We need to have a Product Group dimension.
- One way for dealing with a rapidly changing attribute in a type 2 dimension is to put it in its own dimension, with StatusKey column in the fact table. If in the product dimension there are 3 statuses and they are changing every day, then they should be put into a separate dimension, with direct link to the fact tables.
- This is an influential factor: Whether the relationship between that higher grain attribute and the key of the dimension is many-to-many. If it is many-to-many, then it needs to be moved to a separate dimension. For example, in Example 1 above, if a product can belong to 2 product groups, then product group cannot stay in the product dimension.
- Whether the attribute is also an attribute of another dimension. For example, geography attributes such as city and country. In insurance data warehouse, both the broker dimension and the risk dimension have city attribute. We may want to consider putting “city” in a separate dimension called geography or location.
Number 2 and 4 above are influential factors, i.e. the decision is sort of “forced” (borrowing a chess terminology); you don’t have a choice. Whereas number 1, 3 & 5 are “non-forced”, i.e. you don’t have to do it; you have a choice.
As usual I welcome question and discussion at firstname.lastname@example.org. Vincent 5/3/11.