Sometimes we are not sure whether a number should be a measure or an attribute. As always it is easier to learn by example.
Example 1. It’s a customer loyalty programme in an eShop, an eCommerce company. Every year they give all customers a simple survey containing 3 questions. One of the question is how satisfied are you with eShop.com. It’s an integer number between 1 (lowest) and 10 highest. The question is: is this customer satisfaction number a measure, or an attribute? Would you a) create a fact table with CustomerKey and YearKey and have the CustomerSatisfaction as measure, or b) put it as an attribute in DimCustomer?
Example 2. eEverything.com sells thousands of product items in 5 categories: eBook, eFilm, eMusic, eSoftware, eGames. Every December, they calculate all the things that customer bought from eEverything.com and give them 5 numbers: the probability that the customer will make purchase an eBook/eFilm/eMusic/eSoftware/eGames in the coming year. These 5 probability numbers will drive the CRM email marketing campaign in the next year. Are these probability numbers measures or attributes? Would you a) create a fact table with CustomerKey and YearKey and have them as 5 measure columns, or b) put them as 5 attributes in DimCustomer?
Example 3. MyStore.com is an eCommerce company that let customers build their own stores. Customers can search millions of products and choose what they want to put on their stores. From socks to cups. From notebooks to TV. Rather than “browsing the whole internet” they have everything that they like in 1 place. They can even add the same product from two different suppliers (with a different price of course). Because MyStore.com has millions of customers, they can aggregate the products that are wanted by many customers. Say that this month, product A (which is $10) is wanted by 100,000 people. MyStore then asks the supplier for a 5% discount. Knowing that they will have lots of sales, the supplier usually agrees to give a discount. MyStore then contacts the customers if they would like to buy it for $9.50. Each product has a number, which shows how the probability of the product will be given a discount, because the demand hits a “mass scale”. Every day, MyStore data warehouse system calculates this number, which is called PFD, Probability For a Discount. Would you put PFD as a measure in a fact table (with ProductKey and DateKey) or as an attribute of the product dimension?
In my opinion, the factors that we need to consider when making this decision is:
- How frequent does it change. If it changes daily then put it as a measure. If it never changes then put it as an attribute. If it’s changes once a year we can handle it as SCD, if the dim is not too large (say up to 1 million members)
- Usually a numeric is put as a measure, but if it is not aggregatable then what’s the point? It make sense to put it in a dimension as an attribute. But see grain below (point 3). But if it is aggregatable then there is an advantage to put it as a measure.
All 3 examples above are not aggregatable.
- Is the grain the same as the dimension? In the 3 example above, the grain is only the customer (plus time of course). But if the grain spans across several dimensions (like this example) then we cannot put it on 1 dimension. It needs to be put on a fact table.
As usual I welcome comments or discussion at email@example.com. Vincent 2/3/11.