Data Warehousing and Business Intelligence

15 February 2012

City and Rank

Filed under: Data Warehousing — Vincent Rainardi @ 10:20 pm
Tags:

City is an attribute of the customer dimension. Why don’t we put city in its own dimension, with separate SK on the fact table? Because we need to analyse city in the customer context.

City is also an attribute of the supplier dimension. Why don’t we put city in its own dimension, with separate SK on the fact table? Because we need to analyse city in the supplier context.

Why don’t we snowflake and put city in its own table, linked to both the customer and supplier dimension? Because if we snowflake the structure it is more complex; we need to do more joins. And because we need to analyse city with two separate contexts: customer and supplier.

It is best to leave city on both the customer dim and the supplier dim (and the broker dim, and the store dim, where applicable). They will form different geography hierarchies on each dimension. On the customer dim it could be city-county-country-area-region. On the supplier it could be city-country-region.

The organisation (data) for the geography hierarchy on each dimension could also be different. The store dimension may have smaller areas than the customer dim. On the customer dim Luxemburg may belong to Western Europe area. But on the store dim Luxemburg may belong to Benelux area.

This design gives flexibility which often required by many large corporations. Contradict to popular believes that corporations need a single, golden standard (ala MDM), I found that many corporations require more flexibility than a single standard. This is because corporations are essentially group of companies and each part of the organisation have different way of doing business.

The case of industry sector is also similar. It is a hierarchy which grouped the customers, suppliers and partner organisations. Telecom, retail, healthcare, government, manufacturing, transport, technology, etc. This is particularly important in the B2B business, and in particular for financial institutions. In the UK and US we have SIC, Standard Industrial Classification. For example: 200 different SICs are then grouped into 80 subclasses, and these subclasses are grouped into 40 classes, which is then grouped into 6 hyperclasses.

Customer Rank

Customer rank is an attribute of a customer. We order all active customers based on their orders in the last 12 months. So customer with rank = 700 is the 700th biggest purchaser in the company. The “valuation” is done on monthly basis. So a customer which is #700 on 1st January could be #800 on 1st Feb. Should we put rank in the customer dimension as a normal attribute?

I believe the answer is no. We should be putting customer rank on its own fact table. Why? Because it changes every month, it would be sensible to create a fact table with 2 SKs: month key and customer key. So the grain is 1 row for each customer for each month. The measures that we can put in this fact table are: rank, order value (last 12 months), order value (last 6 months), number of orders (last 12 months), number of products purchased (last 12 months), number of distinct products purchased (last 12 months), avarage order value (last 12 months), etc.

Unlike the other measures I just mentioned, rank is not really a measure, I know. It is an attribute.But because it changes every month, we put it in a separate fact table. What if we only calculate the customer rank not once a month but once a quarter? We still need to put rank in a separate fact table, not in the customer dimension. What if we calculate it once a year? We still need to put rank in a separate fact table. What if we calculate the customer rank every day? Then we definitely need to put it on a daily snapshot fact table.

The above monthly customer fact table could store a lot of other useful measures and attributes. Basically any monthly evaluation or calculation that concerns customer can be stored in this fact table. Various different customer ratios for example.

So you see, even though rank is (by entity definition, dependency, etc) an attribute/property of a customer, during dimensional modelling we could choose not to put it in the customer dimension.

As always I welcome suggestions and questions at vrainadi@gmail.com

Vincent Rainardi, 13/2/2012

Advertisements

4 Comments »

  1. Hi

    I see your conclusions for Rank, but I dont see any for City
    Can you please clarify a little bit more on what are your suggestions for City attribute?

    Snowflake?
    Separate Dim table for Customer’s city?
    Separate Dim table for Supplier’s city?

    Comment by Armando — 15 February 2012 @ 11:15 pm | Reply

    • Hi Armando. Apologies I missed the conclusion for City. I’ve added it to the article.

      Comment by Vincent Rainardi — 16 February 2012 @ 7:49 am | Reply

  2. It should probably be
    “City is also an attribute of the supplier dimension. Why don’t we put city in its own dimension, with separate SK on the fact table? Because we need to analyse city in the __supplier__ context.”

    Comment by Utwig — 23 February 2012 @ 10:47 pm | Reply

    • Corrected from “customer context” to “supplier context”. Thanks Andrej. Vincent

      Comment by Vincent Rainardi — 24 February 2012 @ 7:49 am | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Create a free website or blog at WordPress.com.

%d bloggers like this: