Data Warehousing, BI and Data Science

12 July 2012

Delete All Rows in the Dimension Table

Filed under: Data Warehousing — Vincent Rainardi @ 5:32 pm
Tags:

I was very surprised to receive an email this morning from somebody saying that they delete the whole dimension table and insert all rows again, and they use surrogate keys. And they are having problem because the existing rows get new surrogate keys.

Whenever I heard “delete all rows in the dim table” a big alarm bell rang in my head. The word “delete” and “dim table” should not be in the same sentence!

This is the second time this month I encountered people doing this and I suspect there are many others out there doing the same thing, and having exactly the same problem. So I thought I have to write about this asap. So here it goes.

As always it is better to explain using an example. Say it’s a Member dimension. In the source system we have Membership table, like this:

And in the data warehouse we have Member dimension, like this:

There are 2 types of changes happening in the source Membership table:

  • Change to an existing row
  • New row created

Let’s make these 2 types of changes: change Ayane’s name to Eyane and create a new member: Agus Salim, like this:

When we truncate the Member dimension and reload all rows, this is what we get:

All members get new surrogate keys.
G48’s surrogate key changed from 1 to 3.
G49’s surrogate key changed from 2 to 4.
That is the issue about “And they are having problem because the existing rows get new surrogate keys” that I mentioned at the beginning of this article.

The problem with G48’s SK changed from 1 to 3 is: the fact table row for G48 is still referring to SK = 1. Now the fact table SKs don’t match with the dim table SKs, causing issues when you join them.

As I said above, we should not delete from the dim table. Instead, we should update changed rows, and insert new rows. After the update, the dim table should be like this:

So once again, the word “delete” and “dim table” should not be in the same sentence!

 

Advertisements

9 Comments »

  1. Good and simple explanation.
    But there’s a little hick up in this sentence “As I said above, we should not delete from the fact table”

    Comment by Geoffrey — 15 August 2012 @ 6:37 am | Reply

  2. I had had cases where I deleted dim tables, but I know the consequences. By this post, can you envisage where truncating a dim table is evitable?

    Comment by abacusdotcom — 25 February 2014 @ 11:55 am | Reply

  3. What if we deleted G48 in the Membership table (source). How should we handle it in the dimension/fact table?

    Comment by Chris — 11 June 2018 @ 2:49 pm | Reply

    • Hi Chris, if the dimension table is type 2, 4, 5 or 6, we expire the G48 rows by updating the expiry_date column. If it’s type 1 or type 3 we soft delete the row by setting delete_flag column to Y (useful for membership statistics e.g. to know how many active members, etc). We don’t delete rows in a dimension table physically because they are used by the fact tables.

      Comment by Vincent Rainardi — 11 June 2018 @ 6:15 pm | Reply

      • How about in fact tables, do you also implement the delete_flag column?
        Thanks for getting back to me.

        Comment by Chris — 12 June 2018 @ 8:01 pm

  4. Hi Chris, there is a split of opinions among practitioners about a Delete/Active Flag column in the fact table. My friend used that column to mark transactions which had been deleted from source. He also zeroed out the amount during the update to make sure that queries which don’t use that flag return the right aggregated amount. Reason: to differentiate between zero amount transactions and deleted transactions.

    I prefer not to have that column in the fact table, but to insert a new row with the reverse amount and same surrogate keys to effectively cancelling out the original transaction. Reason: performance. Updating a fact table with billions of rows is painfully slow, whereas insert only takes a split second.

    Comment by Vincent Rainardi — 14 June 2018 @ 7:30 am | Reply

    • Thank you so much for sharing your knowledge. Appreciate a lot!

      Comment by Chris — 14 June 2018 @ 5:52 pm | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s

Blog at WordPress.com.

%d bloggers like this: