One of the questions frequently asked by somebody who is learning dimensional modelling is “How do we preserve the history of the facts?” This question usually comes up after they learned about SCD (Slowly Changing Dimension), which made them realise that we could preserve the history of dimensional attributes.
What they meant by the word “fact” was measure, i.e. the numerical column in the fact table. For example, in an insurance warehouse, if in the source system a premium of £100 later on changed to £200, what will happen to the fact table? Earlier today somebody came to me with this question.
I explained to him the concept of “Transaction Fact Table” versus “(Periodic) Snapshot Fact Table”. Then explained that in the case of Transaction Fact Table, we overwrite the £100 with £200. But in the case of Snapshot Fact Table (say daily snapshot), we store £100 for snapshot date1 and £200 for snapshot date2. E.g. if the premium was £100 on 1st July, then on 10th July it was changed to £200, in the snapshot fact table for snapshot date = 1st to 9th July the premium is £100, but from 10th July onwards the premium is £200.
So a simple answer to the question of “how do we preserve the history of the fact” is: using snapshot fact table.
Then he went on and say, “But what if it’s a retail case and the price of the product is updated from £100 to £200, and it’s not overwritten, but inserted as a different date.” I explained the concept of sales version price, and that price is encapsulated in the sales transaction. E.g. sales1 on 1st July for product1 has price = £100. If on 10th July the price of product1 is changed to £200, then sales2 which happened on 10th July will have the price = £200. So in the transaction fact table there will be 2 rows, one for each sales transaction. Sales1 has price = £100 and sales2 has price = £200. So it is the sales transaction that is captured, and price is captured as “part of the sales”.
So this case it is not about preserving the history of the fact. It’s a different matter. The price is “captured” every time there is a sales transaction.
Would you store the price as dimension? Hmmm it’s for another blog post. But the short answer is (as always in dimensional modelling): It depends on how it would be used. Generally it is better to store price as a fact (with date/time key), rather than as a dimension. In retail it is price, but in the case of Forex, banking & insurance, it’s “rate” rather than “price”.
“But what if”, he asked again, “in the first case above (premium), the snapshot fact table is monthly, and the premium is changed from £100 to £200, then back to £100 again, within the same month? Would we loose the history?” What he meant was: say on 1st July the premium was created as £100, then on 10th July it was updated to £200. Then on 20th July it was changed back to £100. In this case the July version of that premium in the snapshot fact table would be £100. Yes we will loose the “history”, i.e. the warehouse never captures the fact that the premium was £200 for 10 days. But if the snapshot fact table was daily, the warehouse would capture it.
Hope this is useful. As usual I welcome any comments and questions on vrainardi@gmail.com.
Here is how I would reflect changing premium.
In the transaction table on date 1 we would record a payment of $200, if 10 days later the premium changed to $100 we would record a transaction of -$100 (refund). If 10 days later we changed our mind again and decided to charge premium $200 we would record a transaction of $100
Is there anything wrong with this approach?
Thank you
Comment by Alex Kiper — 16 April 2012 @ 10:55 pm |
I think that approach is OK. Perhaps we want to fully cancel out before entering the new amount. But practically we in the data warehouse / BI don’t have control over it. It is the source system people that make this decision about the approach of how to reflect changing premium.
Comment by Vincent Rainardi — 17 April 2012 @ 6:28 pm |