Many people have seen or implemented transaction fact table and periodic snapshot fact table but not the 3rd type of fact table: accumulative snapshot fact table. The purpose of this article is to introduce it, with an example. I have written an article about it in the past (link), but this one is a bit different. I try to be more practical, and an email from a reader I received yesterday made me realise that the previous article does not cover some areas. In particular, the business aspect of it, which is rather important.
We use this fact table when we need to analyse a process. A process is a series of business events, which comes one after the other in stages.
As an example, customers applying for a health care insurance. First, the customer fills an application form, including the bank details for direct debit and the effective date of the policy. The insurer then receives the form and sends medical questionaire. The insurer receives the medical questionaire and checks it, then confirms the medical details with their internal medical team, and at the same time, checks with the applicant’s GP (doctor). Underwriting is the next step, that is when the insurer determines the rate for this applicant (based on their risks), and from the rate the underwriter then calculates the premium and issue a policy along with membership certificate. The policy then become ‘alive’ and the first payment is then taken from the applicant bank account. As per the regulation, the applicant can cancel the policy during the grace period of 21 days and have their money back.
As we can see in the example above, within a process we can have branching where two or more flows happens at the same time. We can also have alternate paths, like cancellation. I won’t go into process flows in details as it is not the purpose of this article. But I will simplify the example above into only 3 stages:
- Stage 1: Application form received
- Stage 2: Medical questionnaire received
- Stage 3: Policy document sent
Next I will discuss the analysis that the business usually want (the “BI Questions”).
Analysis (the “BI Questions”)
The usual BI Questions that the business try to answer are:
- Volume of business – Early Stage: How many applications did we receive in Q4 2015? How does this compare to Q4 2014 and Q3 2015? That is quarterly, but it is also common to do monthly comparison (Dec 2015 compared to the previous 3 months), and annual comparison (2015 compared to 2014).
- Volume of business – Final Stage: How many policies did we issue in Q4 2015? How does this compare to Q4 2014 and Q3 2015? Again, above is quarterly, but it is also common to do monthly and annual comparison.
- Efficiency of the business process – Conversion Rate: Of all applications we received in Q3 2015, how many became policies? How does this compare to Q2 2015? Note that this question can only be answered 3 months after, because it takes a few weeks for an application to become a policy. So at the moment (3rd Feb 2015) we can’t ask for Q4 2015 because they are still being processed / in progress.
- Efficiency of the business process – Speed of Flow: Of all application we received in Q3 2015, on average how many business days (BD) does it take to reach stage two (medical form received)? And how many BDs does it take to reach stage three (policy document sent)? How does this compare to the previous quarter and previous year? Let’s say that the answers are: 7.2 BDs to reach stage 2 and 15.8 BDs to reach stage 3.
- Efficiency of the business process – Issues and Anomalies: Show me the applications we received in Q3 2015, which reached stage 3 in more than 20 BDs. How many in total? Why were they late – is it because of stage 2 being slow or stage 3 being slow?
- Efficiency of the business process – Best Practices: Show me the applications we received in Q3 2015, which reached stage 3 in less than 12 BDs. How many in total? Why were they early – is it because of stage 2 being fast, or stage 3 being fast?
- The same questions as per 1 to 6 above, but specifically for customers in a particular region, e.g. London, North East, Scotland; and a particular customer type, e.g. family (rather than individual), business (rather than personal).
- The same questions as per 1 to 6 above, but specifically for a particular product type, e.g. “comprehensive cancer treatment”, “policies which include alternative treatments”
The basic design of accumulative snapshot fact table which can answer the above question is:
The fact table consists of 4 groups of columns:
- Group A: Dimension Keys. The most important ones are product_key, and customer_key. This enables us to do #7 and #8 above, i.e. filtering the data on a particular product attribute(s) or customer attribute(s).
- Group B: Dates. This group contains the dates of all stages in the process. In the above example, this group would contain date_app_received, date_medical_received, and date_policy_sent. The purpose of these columns is to specify the time frame of the dataset, i.e. applications received in Q4 2015, or policies issued in Q3 2015.
- Group C: Durations. This group contains the duration between stages. In the above example, we have 2 columns in group C: days_medical_received (the number of days from the day we received application until the day we received the medical questionnaire) and days_policy_sent (the number of days from the day we received the medical questionnaire until the day we sent the policy document). The purpose of the columns in group C is to enable us analyse the business process efficiency (#4 above, the speed of flow), as well as “bottleneck analysis” (#5 and #6 in the example above).
- Group D: Measures. This group contains numerical measures such as: quantity of applications (hardcoded to 1 for all rows) , the premium amount (annualised), the number of people in the policy, the sum insured, etc. The purpose of Group D columns is to enable us to answer “how many applications did we…”, “what is the value of the policies that we…”, etc.
The way we query this table is in 3 steps:
- First we set the time period of the analysis by filtering the row on one (or more) of the Group B columns.
- Then we filter the rows further on the product or customer dimensions using the dim keys in Group A.
- Once we got the rows that we want to involve, we sum one (or more) of the the measure columns (Group D). Or, we can sum one (or more) of the duration columns (Group C).