If your data warehouse is processing files from other systems, and one day you didn’t get a file from a source system, what would you do? Of course you will still run the data warehouse ETL, to process the available data files. The data warehouse ETL (or “the daily batch”) must run every day, so that the business gets the latest data.
How, here’s the question, if that 1 missing file was produced the next day, what would you do with it? Can you process it into your data warehouse?
If the file populates a snapshot fact table, it is straight forward because the new data will just occupy the missing snapshot date:
But for dimensions it’s a bit tricky, especially if it’s type 2.
As you see above, in addition to inserting a row, we have to update the effective date of the next row. The question is, how do we know which one is the next row? Well, by sorting on the effective date. Yes it’s not simple.
How about type 3 dims?
We need to update the Prev Version, as well as the Current Version Date, i.e.
How about transaction fact table? Quite straight forward, as we only need to insert the missing date:
How about accumulating snapshot fact table? Not straight forward. In addition to updating status 2 columns, we need to update the duration of status 1:
In the above scenario, status 3 happened on 3rd Jan. Then on 4th Jan we receive the data file for 2nd Jan.
Vincent Rainardi, 28/1/2012