Data Warehousing, BI and Data Science

14 March 2012

Name of Surrogate Key Columns

Filed under: Data Warehousing — Vincent Rainardi @ 5:15 am
Tags:

The convention amongst Kimball practitioners is to name the surrogate key columns as the dimension name + “_key”, i.e product_key, claim_key, risk_key. And this is done both in the fact table and in the dimension table. This way, when you create the BO universe, or SSAS cube, or MicroStrategy metadata/repository, it’s easier because we can say ‘link on the same name’

But recently I came across a different naming convention. In the fact table the surrogate key column is called PK_[Dim Name]_ID and in the dimension it is called FK_[Dim_Name]_ID. For example, the surrogate key of the product dimension is PK_Product_ID and in the fact table it is called FK_Product_ID. The benefit of this naming convention is so that in Toad, or in SSMS, when you type

select * from fact_daily_sales f
join dim_product dp on dp.

As soon as you type “.” the intellisense pops out displaying all the columns in dim_product and when you type “PK” there is only 1 column selected: PK_Product_ID. So it becomes:

select * from fact_daily_sales f
join dim_product dp on dp.pk_product_id = f.

And as soon as you type “.” the intellisense pops out displaying all the columns in fact_daily_sales table and when you type “FK” Toad/SSMS offers you all the surrogate keys in that fact table. Beautiful isn’t it?

Yes, but with 1 disadvantage: you can’t join on names between the fact table and the dimension table because the surrogate key columns have different names.

 

Name of Dimension and Fact Tables

Whilst I’m at it (naming), I might as well write about naming the dimension and fact tables. The convention amongst Kimball practitioners is to prefix the dimension table name with “dim_”, i.e. dim_[Dim Name]. And for the fact table the prefix is “fact_”, i.e. dim_[Fact Table Name]. For example: dim_product, dim_customer, fact_customer_profitability. This is because of the examples set by the Kimball Group, and by Microsoft SQL Server product team when they created Adventure Works data warehouse.

To differentiate transaction fact tables from snapshot fact tables, the data warehouse team in some companies added “_trans” or “_snapshot” suffix. And to differentiate accumulating snapshot fact table form periodic snapshot fact table, they add “_acc” suffix. For example: fact_sales_trans and fact_inventory_level_snapshot_acc. If it is a periodic snapshot, they mention the frequency, i.e. daily or monthly: fact_daily_sales_snapshot, fact_monthly_sales_snapshot.

In some companies, they shorten it further by naming the dimension table “D_” instead of “dim_” and the fact tables as “F_” instead of “fact_”. I personally prefer the D and F convention rather than “dim_” and “fact_”. Reason: we know that there are only 3 types of tables in the whole warehouse: fact, dimension and bridge.

Rather than using underscore (“_”) to separate words, I prefer to camel case it, to match Microsoft / .NET naming standard, i.e. fDailySales, dProduct, dCustomer. I prefer to type the full words rather than abbreviated them, i.e. instead of fProdOutput I type fProductionOutput.

I have yet come across data warehouse team who put the word “dim” and “fact” at the end, i.e. SalesFact, sales_fact, ProductDim or product_dim. Everyone seems to be putting the “dim” and “fact” in the front.

A data warehouse user once said to me: “Fact tables are fine, but dim tables? We want bright tables, not dim ones” 🙂

Advertisements

3 Comments »

  1. We’ve also taking a new approach with our conventions … similar to C# conventions as well, but with the end user in mind. Our goals are to make BI as self-service as possible for our end users, and in efforts to align to this goal, we’ve chosen to leave out as much non-business terminology as possible. For our table names, we’ve dropped Fact and Dim altogether, as the terms mean nothing to our users. This leaves us with table names such as Customer, Product, and Date. For snapshots, we similarly include the frequency of the snapshot, but that is all – i.e. SalesDaily and SalesMonthly. We have also stopped using underscores and instead chosen to use Pascal case for column names, although this is more for development reasons – primarily it makes it easy to point an ORM like Entity Framework at the model and not have to clean up all the property names.

    Comment by trentfoley — 14 March 2012 @ 4:58 pm | Reply

  2. What about “SK_Product” as the “D_Product” key and also “SK_Product” in a “F_DailySales”?

    Comment by Bruno Rafael Oliveira — 20 August 2015 @ 11:32 am | Reply

    • Hi Bruno, that is a good idea, and used in some implementations as I found out. The other format is Product_SK, but I think your SK_Product is better. I think in general prefix is better than suffix, i.e. Key_Product or SK_Product is better than Product_Key or Product_SK. This is because if it is prefix we can take advantage of the intellisense.

      Comment by Vincent Rainardi — 20 August 2015 @ 8:55 pm | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at WordPress.com.

%d bloggers like this: