Data Warehousing and Business Intelligence

18 December 2015

Measure or Attribute

Filed under: Data Architecture,Data Warehousing — Vincent Rainardi @ 10:00 pm

Last week two people asked me about this subject so I thought it would be beneficial for others if I write it. It is about whether a number should be a measure (a fact) or an attribute (of a dimension). Apologies in advance I can’t reveal the real case as it is a client’s case. But I will present several similar cases to illustrate this issue. As always, it is better to learn by example (cases) than by theory.

First case: worker rate

The first case is the timesheet system in the construction industry. The daily rate of the worker is the number in question. The hourly rate is recorded as an attribute of the worker. The fact table recorded the hours each worker worked in the project, every day. The data mart is used to analyse the cost the project (by work area), and to plan for the next 6-12 months. The cost for each worker for each day is simply the rate multiplied by the hours worked.

There are 2 issues here. One, when the rate entered last month is incorrect, how do we correct it. Second, how to plan for next year using the current rate + 5%, and current work hours -5%.

In this case, it is better to treat the rate as a measure, located in the fact table for actual costs as well as in the fact table for planning. This way we can correct the past rates and recalculate last month’s costs. We can also change the future rates (in the planning fact table) and recalculate the future costs. Yes we will occupy more disk space because we are repeating the rate on every row, but that is a very good price to pay for the advantage we get: more flexibility, ability to support the required functionality and higher performance.

The need to put the rate in the worker dimension arises when we need to know the historical rates for each worker. But this need can also be satisfied if we store the rate as a measure. Granted it would be less performant as we have to plough through lots of fact rows to distil the data to get distinct historical rate per worker. So if this is a mandatory requirement, it is a good idea if we put rate an type 2 attribute, in addition to putting the rate as a measure.

Second case: product price

This is a classic example from the retail industry. But it is also applicable to banking and insurance, pharma, energy and telecom sectors. In a shop, each product has a price. The same product can have different price in different stores. The price can change at any time. The price reviews can happen weekly or monthly, or ad-hoc. It can be done at the store level, city level or regional level. The actual price paid depends on discount or promotion applied for each transaction, at the time the customer purchase the product.

The issue here is, whether we should store the price of the product in a sales fact table, or as an attribute in the product dimension. I maintain an opinion that the price must be store in the sales fact table, for each transaction. In addition, the price should also be stored in the price list fact table. The price on the price list fact table is the official price for each store, for that day. Whereas the price on the sales fact table is the actual price paid, which could be different from price list, because of the offer or promotion applied to that transaction. For example, buy one get one free, or three for the price of two. Or, because the product is in a bundle such as cheese and wine, pyjamas and sleeper, a dictionary and thesaurus, etc. Or, because the product is simply discounted on that day, perhaps due to expiry date, or because the manager want to get rid of the stock quickly.

Having a price in the product dimension, in my opinion is not right. Product attributes such as size and colour, model and optional extras, are true product properties and therefore should be in the product dimension as attributes. These attributes are the same across stores, they are truly the property of a product. But prices can be different across stores, so they are not sole property of a product.

The price may be attached not to an individual product code, but to several product codes. For example, regardless of colour, the prices of the products are the same. Prices can be dynamic, e.g. prices of tickets (entertainment industry or transport industry) of the same show or flight can be different from one seat to the next, from one booking to the next, from one time slot to the next. Prices can be in bands, for example in utility industry (electricity and gas) and in telecommunication industry (mobile calls) the first 100 kWh can have different price to the second 100 kWh, Friday price can be different to Sunday calls.

The concept of price list fact table is more accommodative to satisfy the above demanding requirements, compared to putting price as a type 2 attribute in the product dimension.

Advertisements

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Create a free website or blog at WordPress.com.

%d bloggers like this: