Data Warehousing and Business Intelligence

9 August 2011

Using Accumulating Snapshot Fact table to Monitor Status

Filed under: Data Warehousing — Vincent Rainardi @ 4:32 am
Tags:

Status is a specific stage in a workflow. This is an example of a workflow:

In the above workflow, we have 6 statuses. The workflow begins with Status 1, and ends with Status 5 or 6. The normal workflow is: Status 1, Status 2, Status 4, Status 6 (1,2,4,6). And we have 3 alternate flows:

  1. 1, 3, 5
  2. 1, 2, 3, 5
  3. 1, 2, 4, 3, 5

An example of a workflow is a mortgage application. A mortgage application needs to go through various stages such as: credit score check, collateral check, application approved, application declined, etc.

When the workflow is in action, we have a date of those various stages:

In the above example, the application got at Status 1 on date1, at Status 2 on date2, at Status 4 on date3 and it got at status 6 on date4.

In the transaction/source system (e.g. mortgage application), the questions are about individual applications, such as:
SS1. What is the current status of Application 1?
SS2. When did Application 2 reach Status 4?

But in the data warehouse, the questions are less about individual applications. The questions are more about the status of many or all applications, such as:
DW1. How many applications are currently on Status 4?
DW2. What is the average duration (in days) between Status 2 and Status 4?

In many data warehouse implementations, Status is created as an attribute on the main dimension, e.g. customer dimension, product dimension, or application dimension. The dimension is set as Slowly Changing Dimension type 2 (SCD2), like this:

This SCD2 structure is more appropriate to answer Transaction/Source System questions such as SS1 and SS2 above. Using this SCD2 structure, it is difficult to answer Data Warehouse questions such as DW1 and DW2 above. Using this structure the query to get “How many on Status 4” is:

select count(*) from DimApp where IsCurrent = 1 and Status = 'Status4'

Output: 2

And the query to answer “What is the average duration (in days) between Status 2 and Status 4?” is

with x as
(select DateDiff(d, EffectiveDate, ExpiryDate) as Days from DimApp where Status = 'Status2')
select avg(Days*1.0) from x

Output: 1.5

The “*1.0” is to convert days column from integer to decimal so we don’t get an integer output.

But, from Status 2 an application doesn’t always go to Status 4. It can go to Status 3 instead! So how do we find out which application goes from Status 2 to Status 4 (App2) and which app goes from Status 2 to Status 3 (App3)? We can’t!

Not able to identify which row is the previous state, we are forced to add “Previous App Key” column:

Now we can join the table to itself using PrevKey and AppKey column. So the query to answer “What is the average duration (in days) between Status 2 and Status 4?” becomes:

select avg(datediff(d, StatusFromDate, StatusToDate)*1.0) from
( select A.AppId, B.Status as StatusFrom, B.EffectiveDate as StatusFromDate,
  A.Status as StatusTo, A.EffectiveDate as StatusToDate
  from DimApp A
  join DimApp B on A.PrevKey = B.AppKey
) X where StatusFrom = 'Status2' and StatusTo = 'Status4'

Output: 2.5

 

Using Accumulating Snapshot Fact Table

To answer question DW1 and DW2 above, it is more appropriate to use Accumulating Snapshot Fact Table, like this:

The “Ind” columns are indicator columns. It shows which status the application is currently on.

For a background on Accumulating Snapshot you can read Kimball Toolkit book chapter 3 page 75 and his article here.

Using this structure the query to get “how many on status 4” becomes:

select sum(S4Ind) from FactApp

Output: 2

Whereas the query to answer “What is the average duration (in days) between Status 2 and Status 4?” is:

select avg(datediff(d, S2Date, S4Date)*1.0) from FactApp
where S2Date is not null and S4Date is not null

As you can see, it is easier to use Accumulating Snapshot Fact Table to answer Data Warehousing questions such as:
DW1. How many applications are currently on Status 4?
DW2. What is the average duration (in days) between Status 2 and Status 4?

Vincent Rainardi, 8th August 2011

About these ads

3 Comments »

  1. This is nice, but in your workflow you could visit status 3 twice. So the questions I would like to answer are:
    1. Which apps where in Status X on a specific date range.
    2. Which apps where in Status X on a specific date range, and this was the last time they hit this status so far.
    3. Which apps where in Status X on a specific date range, and this was the first time they hit this status.

    I think this is not possible to solve with an Accumulating Snapshot Fact Table, and i have to stick to the status dimension table, right? maybe adding a flag “first” and “last” next to “current” to indicate whether the app passed this status the first or the last time or none of it as I can check for current at the moment.

    Do you have a better approach where I could use the benefits of an Accumulating Snapshot Fact Table?

    Comment by Philipp Kemmeter — 2 October 2014 @ 12:23 pm | Reply

    • Okay, i have to correct: In your workflow you cannot visit status 3 multiple times. But let’s say, you could :)

      Comment by Philipp Kemmeter — 2 October 2014 @ 12:59 pm | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Rubric Theme. Create a free website or blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 208 other followers

%d bloggers like this: