Status is a specific stage in a workflow. This is an example of a workflow:
In the above workflow, we have 6 statuses. The workflow begins with Status 1, and ends with Status 5 or 6. The normal workflow is: Status 1, Status 2, Status 4, Status 6 (1,2,4,6). And we have 3 alternate flows:
- 1, 3, 5
- 1, 2, 3, 5
- 1, 2, 4, 3, 5
An example of a workflow is a mortgage application. A mortgage application needs to go through various stages such as: credit score check, collateral check, application approved, application declined, etc.
When the workflow is in action, we have a date of those various stages:
In the above example, the application got at Status 1 on date1, at Status 2 on date2, at Status 4 on date3 and it got at status 6 on date4.
In the transaction/source system (e.g. mortgage application), the questions are about individual applications, such as:
SS1. What is the current status of Application 1?
SS2. When did Application 2 reach Status 4?
But in the data warehouse, the questions are less about individual applications. The questions are more about the status of many or all applications, such as:
DW1. How many applications are currently on Status 4?
DW2. What is the average duration (in days) between Status 2 and Status 4?
In many data warehouse implementations, Status is created as an attribute on the main dimension, e.g. customer dimension, product dimension, or application dimension. The dimension is set as Slowly Changing Dimension type 2 (SCD2), like this:
This SCD2 structure is more appropriate to answer Transaction/Source System questions such as SS1 and SS2 above. Using this SCD2 structure, it is difficult to answer Data Warehouse questions such as DW1 and DW2 above. Using this structure the query to get “How many on Status 4” is:
select count(*) from DimApp where IsCurrent = 1 and Status = 'Status4'
Output: 2
And the query to answer “What is the average duration (in days) between Status 2 and Status 4?” is
with x as (select DateDiff(d, EffectiveDate, ExpiryDate) as Days from DimApp where Status = 'Status2') select avg(Days*1.0) from x
Output: 1.5
The “*1.0” is to convert days column from integer to decimal so we don’t get an integer output.
But, from Status 2 an application doesn’t always go to Status 4. It can go to Status 3 instead! So how do we find out which application goes from Status 2 to Status 4 (App2) and which app goes from Status 2 to Status 3 (App3)? We can’t!
Not able to identify which row is the previous state, we are forced to add “Previous App Key” column:
Now we can join the table to itself using PrevKey and AppKey column. So the query to answer “What is the average duration (in days) between Status 2 and Status 4?” becomes:
select avg(datediff(d, StatusFromDate, StatusToDate)*1.0) from ( select A.AppId, B.Status as StatusFrom, B.EffectiveDate as StatusFromDate, A.Status as StatusTo, A.EffectiveDate as StatusToDate from DimApp A join DimApp B on A.PrevKey = B.AppKey ) X where StatusFrom = 'Status2' and StatusTo = 'Status4'
Output: 2.5
Using Accumulating Snapshot Fact Table
To answer question DW1 and DW2 above, it is more appropriate to use Accumulating Snapshot Fact Table, like this:
The “Ind” columns are indicator columns. It shows which status the application is currently on.
For a background on Accumulating Snapshot you can read Kimball Toolkit book chapter 3 page 75 and his article here.
Using this structure the query to get “how many on status 4” becomes:
select sum(S4Ind) from FactApp
Output: 2
Whereas the query to answer “What is the average duration (in days) between Status 2 and Status 4?” is:
select avg(datediff(d, S2Date, S4Date)*1.0) from FactApp where S2Date is not null and S4Date is not null
As you can see, it is easier to use Accumulating Snapshot Fact Table to answer Data Warehousing questions such as:
DW1. How many applications are currently on Status 4?
DW2. What is the average duration (in days) between Status 2 and Status 4?
Vincent Rainardi, 8th August 2011




