Data Warehousing and Business Intelligence

9 July 2012

Duplicate Attribute Keys in SSAS

Filed under: Analysis Services,Data Warehousing — Vincent Rainardi @ 5:32 pm
Tags: ,

Sometimes when we process the cube we get an error message: duplicate attribute key. And at other times we get: missing attribute key. In this article I will try to address these 2 issues.

A. Duplicate Attribute Key

There are 2 things to check in this case:

  1. Do you use surrogate key instead of business key?
  2. Is there a duplicate business key in the dimension table?

1. Surrogate Key vs Business Key

In the data warehouse, to link the fact tables to the dim tables we must use surrogate keys. We must not use business key to connect fact table to dim table. But time and time again I have seen people not reading Kimball and Caserta’s book thoroughly and try to build a data warehouse quickly. They use the business key to connect the fact and dim. It is easier to understand if I use an example, so I’ll use income fact table as an example.

FactIncome:

DimWork:

The reason we have 2 rows for WorkId 781 is because DimWork is SCD type 2.

The above example shows that they link FactIncome to DimWork using the business key: WorkId. This is a fatal error. Because when we query “What is the IncomeAmount for WorkId 781 on 9/7/12?” the DW will return 2 rows to the cube:

select DW.WorkId, DW.WorkKey, F.IncomeAmount
from FactIncome F
join DimWork DW on DW.WorkId = F.WordId
where F.SnapshotDateKey = 20120709

Output:

781     2        140
781     3        140

And SSAS will sum the two 140 up into 280.

It is vital that in the fact table we use surrogate key to connect to the dimension table, like this:

FactIncome:

So that when we query “What is the IncomeAmount for WorkId 781 on 9/7/12?” the DW will return only 1 row to the cube:

select DW.WorkId, DW.WorkKey, F.IncomeAmount
from FactIncome F
join DimWork DW on DW.WorkId = F.WordId
where F.SnapshotDateKey = 20120709

Output:

781     2        140

2. Is there a duplicate business key in the dimension table?

The second reason of experiencing “duplicate attribute key” in SSAS is because there is a duplicate business key in the dimension table.

This could happen if there are 2 rows in the source table with the same business key, but the business key on the second row has trailing spaces. For example:

Row 1: CustomerId = ‘GA34’
Row 2: CustomerId = ‘GA34  ’

The ETL consider these 2 rows as two different customers and allocate different surrogate key in the customer dimension.

If you use business key as the key in SSAS dimension, SSAS will say that there is a duplicate attribute key in the customer dimension, i.e. there are two rows with CustomerId = ‘GA34’. This is because SSAS trim the attribute key.

Of course you can suppress the issue using custom configuration (in the property of the dimension), but this is hiding the issue and making the front end (say Tableau or Strategy Companion Analyser) returning incorrect data to the users.

This issue will not occur by the way, if you use surrogate key. It will only occur if you use the business key as the key of the dimension in SSAS.

B. Missing Attribute Key

The second error, missing attribute key, means that there is a surrogate key in the fact table which does not exist in the dimension table. We can isolate the fact row that gives the error message by looking at the details of the error message that SSAS gives us when we process the cube. The value of the key is given in the message. We can then query the dim table to check if the dim table has a row with that SK.

A possible reason why the SK exists in the fact table but not in the dim table is because of a mistake on the ETL task that populates the fact table. The ETL task can be SSIS workflow or Informatica session/mapping, but it is the same principle: the part of the workflow or mapping that does the lookup failed. This is either because the Lookup Transform didn’t use the correct Business Key, or because the Lookup Transform didn’t look at the correct column.

There is another possibility if it is a type 2 dim: the Lookup Transform didn’t get the correct version, i.e. part of lookup logic that examines the EffectiveDate and ExpiryDate (or ValidToDate and ValidFromDate) is not working properly.

Those are the various possibilities of why we could have “duplicate attribute key” and “missing attribute key” in SSAS cube.

About these ads

2 Comments »

  1. Hi,
    Nice article, Just wondering in 1st scenario if we want to group by Location “Manchester” we will get two rows. It is possible if we group by Location we should only see one row result. Thanks

    Comment by X — 26 November 2012 @ 5:20 pm | Reply

    • Thanks X. It’s not appropriate to group by Location. We would have an issue if we group by Location, for example when we need to use other attribute in the dimension. The correct solution is to use surrogate key to connect the dimension to the fact table.

      Comment by Vincent Rainardi — 27 November 2012 @ 6:59 am | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Rubric Theme. Create a free website or blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 212 other followers

%d bloggers like this: