Data Warehousing and Business Intelligence

9 August 2011

Using Accumulating Snapshot Fact table to Monitor Status

Filed under: Data Warehousing — Vincent Rainardi @ 4:32 am
Tags:

Status is a specific stage in a workflow. This is an example of a workflow:

In the above workflow, we have 6 statuses. The workflow begins with Status 1, and ends with Status 5 or 6. The normal workflow is: Status 1, Status 2, Status 4, Status 6 (1,2,4,6). And we have 3 alternate flows:

  1. 1, 3, 5
  2. 1, 2, 3, 5
  3. 1, 2, 4, 3, 5

An example of a workflow is a mortgage application. A mortgage application needs to go through various stages such as: credit score check, collateral check, application approved, application declined, etc.

When the workflow is in action, we have a date of those various stages:

In the above example, the application got at Status 1 on date1, at Status 2 on date2, at Status 4 on date3 and it got at status 6 on date4.

In the transaction/source system (e.g. mortgage application), the questions are about individual applications, such as:
SS1. What is the current status of Application 1?
SS2. When did Application 2 reach Status 4?

But in the data warehouse, the questions are less about individual applications. The questions are more about the status of many or all applications, such as:
DW1. How many applications are currently on Status 4?
DW2. What is the average duration (in days) between Status 2 and Status 4?

In many data warehouse implementations, Status is created as an attribute on the main dimension, e.g. customer dimension, product dimension, or application dimension. The dimension is set as Slowly Changing Dimension type 2 (SCD2), like this:

This SCD2 structure is more appropriate to answer Transaction/Source System questions such as SS1 and SS2 above. Using this SCD2 structure, it is difficult to answer Data Warehouse questions such as DW1 and DW2 above. Using this structure the query to get “How many on Status 4” is:

select count(*) from DimApp where IsCurrent = 1 and Status = 'Status4'

Output: 2

And the query to answer “What is the average duration (in days) between Status 2 and Status 4?” is

with x as
(select DateDiff(d, EffectiveDate, ExpiryDate) as Days from DimApp where Status = 'Status2')
select avg(Days*1.0) from x

Output: 1.5

The “*1.0” is to convert days column from integer to decimal so we don’t get an integer output.

But, from Status 2 an application doesn’t always go to Status 4. It can go to Status 3 instead! So how do we find out which application goes from Status 2 to Status 4 (App2) and which app goes from Status 2 to Status 3 (App3)? We can’t!

Not able to identify which row is the previous state, we are forced to add “Previous App Key” column:

Now we can join the table to itself using PrevKey and AppKey column. So the query to answer “What is the average duration (in days) between Status 2 and Status 4?” becomes:

select avg(datediff(d, StatusFromDate, StatusToDate)*1.0) from
( select A.AppId, B.Status as StatusFrom, B.EffectiveDate as StatusFromDate,
  A.Status as StatusTo, A.EffectiveDate as StatusToDate
  from DimApp A
  join DimApp B on A.PrevKey = B.AppKey
) X where StatusFrom = 'Status2' and StatusTo = 'Status4'

Output: 2.5

 

Using Accumulating Snapshot Fact Table

To answer question DW1 and DW2 above, it is more appropriate to use Accumulating Snapshot Fact Table, like this:

The “Ind” columns are indicator columns. It shows which status the application is currently on.

For a background on Accumulating Snapshot you can read Kimball Toolkit book chapter 3 page 75 and his article here.

Using this structure the query to get “how many on status 4” becomes:

select sum(S4Ind) from FactApp

Output: 2

Whereas the query to answer “What is the average duration (in days) between Status 2 and Status 4?” is:

select avg(datediff(d, S2Date, S4Date)*1.0) from FactApp
where S2Date is not null and S4Date is not null

As you can see, it is easier to use Accumulating Snapshot Fact Table to answer Data Warehousing questions such as:
DW1. How many applications are currently on Status 4?
DW2. What is the average duration (in days) between Status 2 and Status 4?

Vincent Rainardi, 8th August 2011

About these ads

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Rubric Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 208 other followers

%d bloggers like this: