Data Warehousing and Business Intelligence

17 June 2010

One or Two Dimensions

Filed under: Data Warehousing — Vincent Rainardi @ 5:41 am
Tags:

Each customer can have more than one account. Should we

a)  create one dimension: dim_account, with grain = account, PK = account_key, and customer details become attributes of this dimension. On the fact table we only have account_key FK.
b)  create two dimensions: dim_account (grain = account, PK = account_key) and dim_customer (grain: customer, PK: customer_key). On the fact table we have 2 FKs: account_key and customer_key.
c)  snowflake, i.e. create dim_account and dim_customer as per b), but on the fact table we only have 1 FK: account_key. On dim_account we have customer_key FK.

What are the advantages and disadvantages for each of the three approaches above? In what case/situation does each approach work best?

Before I start, there are 2 more approaches, derived from the above 3 main ones above:
d)  as per b) but on dim_account we have customer_key FK, which connects dim_account directly to dim_customer
e)  as per a) but on dim_account we have customer_key column, and on the fact table we have 2 FKs: account_key and customer_key

OK, now the advantages and disadvantages. For each of the 5 approaches above, I will elaborate the main advantage, in which situation is that approach best used, the main disadvantage and any issues when using that approach.

a) One Dimension

The main advantage is simpliciy, i.e. there is only 1 dimension and 1 fact table, and a 1-column link between them. This approach is best used when we don’t have other fact table(s) with grain = customer and we don’t need dim_customer to be created.

The main disadvantage is we don’t have dim_customer. So a fact table with the grain = customer can’t be built. An attempt to use dim_account for that fact table (using account_key to link) will result in difficulties in maintaining the account_key column on the fact table (to point to the right customer). The difficulties are especilly felt when the dim is SCD type 2.

b) Two Dimensions

The main advantage is: from the fact table we can the get customer details without knowing the account_key. In which Ideal to be used when we have a fact table with the grain = account and another fact table with grain = customer, but we don’t have the need to get the customer details for a particular account.

The main disadvantage is: we cannot ‘go’ from dim_account to dim_customer (and vice versa) without going through the fact table. All ‘trafic’ between dim_account and dim_customer must go via the fact table. How? Using select distinct, e.g.

select distinct account_key, customer_key from fact1

The fact table could be big. If it’s a 1 billion row fact table, this query would be slow. An unclustered index on both the account_key and customer_key helps the performance, but still a lot slower than approach c), d) or e). Unfortunately, we can’t materialize the above query using an indexed view. The select statement in an indexed view cannot contain DISTINCT keyword.

c) Snowflake

The main advantage is: it’s modular i.e. we have 2 separate dimension tables but can combine them easily to create a bigger dimension. If we have a fact table with the grain = account and another fact table with grain = customer, and we need to get the customer details for a particular account without going through the fact table, then this is the ideal approach to use.

The main disadvantage of snowflake approach is: in a fact table with the grain = account, to get the breakdown of a measure by customer details it’s a little bit more complicated than approach a).

In Figure a) : (One Dimension approach)

select a.customer_attribute1, sum(f.measure1)
from fact1 f
inner join dim_account a on f.account_key = a.account_key
group by a.customer_attribute1

In Figure c) : (Snowflake approach)

select c.customer_attribute1, sum(f.measure1)
from fact1 f
inner join dim_account a on f.account_key = a.account_key
inner join dim_customer c on a.customer_key = c.customer_key
group by c.customer_attribute1

But this disadvantage is relatively very small compared to the advantage of having a customer dimension which is linked to the account dimension.

d) Two Dimensions with Inter-Dimension Link

The main disadvantage of the Two Dimensions approach is that we can’t “go” from account dimension to customer dimension without going through the fact table. And the main disadvantage of the Snowflake approach is that we can’t access dim_customer directly from the fact table. This approach is trying to fix both problems. So there are 2 main advantages:

a)  we can “go” from account dimension to customer dimension, and
b)  we can access dim_customer directly from the fact table

When should we use this approach? When a) we have a fact table with the grain = customer so we need to create dim_customer, b) when we need to access customer details from the fact table directly, and c) we need to get the customer details for a particular account without going through the fact table.

What is the main disadvantage of this approach? The “duplicated effort” in maintaining the customer key. We maintain the customer key column in the fact table as well as in the dim_account.

e) One Dimension With Customer Key

This approach is trying to fix the disadvantage of the One Dimension approach, which is the inability to build a fact table with the grain = customer because we don’t have dim_customer with its own customer_key.

So in this approach we add a column to the dim_account: customer_key. This enable us to build a fact table with the grain = customer, but keeping the number of dimensions to one.

This approach is not as popular as c) and d) in solving dim_customer issue because it is “undecisive”. It is trying to create a dim_customer but it doesn’t want to create a dim_customer. In c) and d) we have dim_customer physically, but in e) the dim_customer is hidden inside dim_account. Making it a) more difficult to maintain (especially if it is a type 2), and b) less modular/flexible, which are the main disadvantages of this approach.

Please bear in mind that the above concept is applicable not only dim_customer & dim_account, but to all dimensions with hierarchical relationship. For example, dim_product and dim_product_group, dim_policy and dim_line.

About these ads

7 Comments »

  1. [...] Answer: Depends on how they are going to be used, as I explained in my article “One or two dimensions” here. [...]

    Pingback by Data Warehousing Interview Questions « Data Warehousing and Business Intelligence — 11 December 2010 @ 10:59 pm | Reply

  2. [...] via the fact table. Which is why some designer “link the 2 dimension via a back door”. Read here for details, case [...]

    Pingback by Denormalising a Fact Table « Data Warehousing and Business Intelligence — 26 February 2011 @ 9:00 am | Reply

  3. In the section … you begin the second example with … Should it be ? You are joining dim_customer to dim_account because it is a snowflake.

    Comment by Aalam Rangi — 18 September 2014 @ 3:52 pm | Reply

    • Sorry Alaam, which section did you mean? And which second example did you mean?

      Comment by Vincent Rainardi — 18 September 2014 @ 4:28 pm | Reply

      • Wow, the comment box just gobbled my words :-) I think it cuts out parentheses or colons.

        Anyway, I was referring to section C-Snowflake which has two example code snippets in blue. The second snippet is preceded by the intro text IN B which refers to the figure b (Two Dimensions). Should it not be referring to figure c (Snowflake) because the snippet has an INNER JOIN between dim_customer and dim_account?

        Comment by Aalamjeet Rangi — 19 September 2014 @ 6:00 pm

      • Yes you are right Aalam, it is figure c) not figure b). I have corrected it. Thanks for the correction.

        Comment by Vincent Rainardi — 19 September 2014 @ 11:45 pm

  4. My previous comment lost some words so rewriting. In the section c) Snowflake … you begin the second example with In b): … Should it be written as In c): ? Because you are joining dim_customer to dim_account as a snowflake.

    Comment by Aalam Rangi — 18 September 2014 @ 3:54 pm | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Rubric Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 208 other followers

%d bloggers like this: