Data Warehousing and Business Intelligence

12 November 2009

Concatenating attributes to form dimension key

Filed under: Analysis Services — Vincent Rainardi @ 9:14 pm
Tags:

In SSAS, to create a dimension from a large transaction table, we should avoid using the table’s primary key as the dim key. Instead, it is better to use the concatenation of the attributes as the dim key.

For example, in stock trading we have a transaction table called Trade with these columns: action (whether it’s buy, sell or stop loss), trade type (AT for Automated Trade, O for Ordinary Trade, NK for Block Trade), status (whether it’s outstanding, in progress, or completed), result (failed, rejected or successful). This Trade table has a lot of data, say 50 millions rows, containing 8 years of history. And these attributes (actions, trade types, trade status and trade results) are not stored in separate tables. There is no table called trade types for example; the only place we could find AT, O and NK is in the Trade table.

Now we want to create an Trade dimension with that four attributes. The primary key of the Trade table is trade_ref. If we use “select trade_ref, action, trade_type, status, result from trade”, SSAS will have to do “select distinct” 5 times to build the dimension. It would take probably 10 minutes to build this dimension. From my experience, if we have 15-20 attributes instead of 4, it could take about 30-45 minutes to build the dimension. About 60-70% of this time is to build the trade_ref attribute, which is the dimension’s key attribute. A more efficient way of doing it is to:

a) split it into 4 dimensions, or
b) do “select distinct action + “|” + trade_type + “|” + status + “|” + result + “|” + action as dimkey, trade_type, status, result from trade”

From my experience, users prefer to have related attribute in the same dimension. If we do a), those 4 attributes will be located in separate dimensions. Where as if we do b), we will have those 4 attributes in 1 dimension, and the query would finish in a few seconds, if we have an index in each of the 4 columns. Probably 5 minutes for 15 attributes. This key attribute is hidden anyway, so whether we use trade_ref or the concatenated one, users won’t see it.

In the Named Query the Measure Group (MG), we need to provide a dim key. We can’t use trade_ref for this (select trade_ref, … from Trade – [Q1]). We need to use the concatenated columns (select action + “|” + trade_type + “|” + status + “|” + result + “|” + action as trade_key, … from Trade [Q2]). From experience, the [Q2] query performs as well as the [Q1] query. In a 50 millions transaction table, both queries take about 15 minutes (10 partitions located on separate disks). This is because the MG query is a straight forward query (1 pass), which is very different from the dim query (multiple passes, 1 for each attribute, select distinct).

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at WordPress.com.

%d bloggers like this: