There are 4 ways we can deal with Status attribute:
- Put it in a separate dimension, with a dim key in the fact table.
- Put it in a “status fact table”, which is an accumulated snapshot.
- Leave it in the dimension table as type 1.
- Leave it in the dimension table as type 2.
Once we explained what they are to the business, number 3 is usually rejected outright. The issue is that they need to record the date when the status changes and they want to analyse it. Whether it is insurance or banking, manufacturing or telco, it is usually a “pipeline process”, aka “workflow”. In a pipeline process, an object (document, project, application, etc) goes to different stages. The users often need to know how long the application/document stays at each stage. Either they want to find a “bottleneck”, or inefficiencies, e.g. “why on earth it takes 2 weeks to approve it?”
Apart from the dates they also want to know the indicator/flag, i.e. say you have 5 stages: stage1 then stage2 then it goes to either stage3 or stage4. If it goes to stage3 then finish, but if it goes to stage4 it goes to stage5. It is confusing using dummy data isn’t it. I’ll make up something. Credit Card Application:
Case 1: The application was received 7/3/11, credit scoring passed 9/3/11 (2 days), application approved 10/3/11 (1 day), credit limit determined 14/3/11 (2 working days), card issued 15/3/11 (1 working day).
Case 2: The application was received 7/3/11, credit scoring failed 9/3/11 (2 days), and application rejected 10/3/11 (1 day).
The queries are about how many days passes (on average) between status 4 (app approved) and status 6 (credit limit determined), etc. How many apps are currently at status 1 (app received but not scored yet)?
The best way to store the dates is to have accumulative fact table, as per option 2 above. As always there are pluses and minuses of each approach. The weakness of number 2 is that the number of statuses is fixed. Leaving it as type 2 (option 4 above) does gives us the history (what status and when) but the query is more horrible / difficult, in comparison to option 2. Putting it into separate dim does resolve the “rapidly changing attribute” issue, but doesn’t give us the management of dates and statuses.
When you are doing accumulative fact table, consider creating indicator/flag columns out of 1000 apps in the process (status = 1,2,3,4,5 not 5 or 7), how many of them are on status 1, how many are on status 2, etc? Also consider creating a column called current status. This is a must have in most situation. Useful to answer queries like, how many of status 6 do we have now? Who are they and what are they break down by … etc.
As usual I welcome any comments and discussion at firstname.lastname@example.org. Vincent 17/3/11.