I was very surprised to receive an email this morning from somebody saying that they delete the whole dimension table and insert all rows again, and they use surrogate keys. And they are having problem because the existing rows get new surrogate keys.
Whenever I heard “delete all rows in the dim table” a big alarm bell rang in my head. The word “delete” and “dim table” should not be in the same sentence!
This is the second time this month I encountered people doing this and I suspect there are many others out there doing the same thing, and having exactly the same problem. So I thought I have to write about this asap. So here it goes.
As always it is better to explain using an example. Say it’s a Member dimension. In the source system we have Membership table, like this:
And in the data warehouse we have Member dimension, like this:
There are 2 types of changes happening in the source Membership table:
- Change to an existing row
- New row created
Let’s make these 2 types of changes: change Ayane’s name to Eyane and create a new member: Agus Salim, like this:
When we truncate the Member dimension and reload all rows, this is what we get:
All members get new surrogate keys.
G48’s surrogate key changed from 1 to 3.
G49’s surrogate key changed from 2 to 4.
That is the issue about “And they are having problem because the existing rows get new surrogate keys” that I mentioned at the beginning of this article.
The problem with G48’s SK changed from 1 to 3 is: the fact table row for G48 is still referring to SK = 1. Now the fact table SKs don’t match with the dim table SKs, causing issues when you join them.
As I said above, we should not delete from the dim table. Instead, we should update changed rows, and insert new rows. After the update, the dim table should be like this:
So once again, the word “delete” and “dim table” should not be in the same sentence!