Data Warehousing and Business Intelligence

13 March 2011

Dimensions with Multi Valued Attributes

Filed under: Data Warehousing — Vincent Rainardi @ 1:19 am
Tags:

A multi valued attribute is an attribute which has more than 1 value per dimension row. A “Multi Valued Attribute” is different to A “Multi Valued Dimension”. A “Multi Value Attribute” occurs in a dimension, whereas a “Multi Valued Dimension” occurs in a fact table. A “Multi Valued Dimension” is a dimension with more than 1 value per fact row. As always it is best to explain by example:

DimCustomer

CustomerName|City|PhoneNumber
Anita Haynes|Edinburgh|0131 123456
Bobby Locard|Peterborough|01733 123456
Ken Barrera|Cambridge|01223 111111, 1223 222222

The Phone Number column above is an example of  a multi valued attribute. It can have more than 1 value.

Another example for DimCustomer is interest. In a travel company, one customer can have several interests, which could be captured in an online survey.

Examples of multivalued attributes in the Product dimension are colour and size, e.g.

ProductKey|ProductCode|Colour|Size
1|A|Red|Small
2|B|Blue|Small,Medium
3|C|Green,Yellow|Small

The Colour column above is an example of a multi valued attribute. It can have more than 1 value. Same with the Size column.

An example of a Multi Valued Dimension is given by Ralph Kimball and Margy Ross in the Health Care chapter of their Toolkit book (page 263 chapter 13 on the 2nd edition): Diagnosis dimension on the Billing Line Fact Table. For each fact row there could be more than 1 diagnosis. 99% of the time, when this happens, we remove that dim from the fact table, because the grain does not suit the fact table. However, as Kimball and Ross said, it is sometimes useful to keep them in the fact table.

But this article is about Multi Valued Attribute, not Multi Valued Dimension.

There are several approaches to deal with a dimension with a multi valued attribute.

  1. Lower the grain of the dimension
  2. Put the attribute in another dimension, link direct to the fact table
  3. Use a fact table (bridge table) to link the 2 dimensions
  4. Have several columns in the dim for that attribute
  5. Put the attribute in a snow-flaked sub dimension
  6. Keep in one column using commas or pipes

1. Lower the grain of the dimension

In this approach we change the grain of the dimension, taking into account the multivalued attribute(s). It is quicker to easier by example: in the case of the colour and size above, we change the DimProduct to a lower grain, e.g. the combination ProductCode-Colour-Size as follows:

ProductKey|ProductCode|Colour|Size
1|A|Red|Small
2|B|Blue|Small
3|B|Blue|Medium
4|C|Green|Small
5|C|Yellow|Small

But in many cases we are not able to do this, because the fact table requires that the product dimension is at Product Code grain. For example: we don’t have the sales information per colour and size, but only by product code. You could argue that if the sale is for product B, we could put ProductKey 2 or 3, doesn’t really matter which one. But this gives the wrong impression that we know the colour or size.

2. Move the attribute to another dimension, linked direct to the fact table

An example of this is DimProduct which has a “Manufacturer” attribute. A product is usually made by 1 manufacturer, but sometimes two or three. One solution is to move manufacturer to its own dimension (DimManufacturer) and in the fact table we have ManufacturerKey. This approach is only acceptable if the fact table is relevant to the new dim, for example FactSupplierPerformance. But if the fact table is nothing to do with DimManufacturer (say FactSales) we can’t put ManufacturerKey there. Note that to use this approach we need to know (or assume) how to allocate the measures for each manufacturer, e.g. equal proportion, one takes all, or weighted:

FactSupplierPerformance

Before:

ProductKey|SupplierKey |Measure1
1|1|10
2|2|9

After, if product1 has 2 manufacturers, equal proportion:

ProductKey|ManufacturerKey|SupplierKey |Measure1
1|1|1|5
1|2|1|5
2|2|2|9

“One takes all”: (this is the easiest, but the question is always: which one?)

ProductKey|ManufacturerKey|SupplierKey |Measure1
1|1|1|10
1|2|1|0
2|2|2|9

In most cases, we don’t have the allocation information (as in, if it is 50-50 or 30-70), we only know that product1 has 2 manufacturers.

3. Use a bridge table to link the 2 dimensions

This is the “standard” approach, when we need a multivalued sub dim to be linked to the main dim. For example, in the case of manufacturer above:

FactSupplierPerformance:
ProductKey|SupplierKey |Measure1
1|1|10
2|2|9

DimProduct:
ProductKey|ProductCode
1|A
2|B

FactProductManufacturerBridge:
ProductKey|ManufacturerKey|Weight
1|1|0.5
1|2|0.5
2|1|1
(Weight or percentage column, aka allocation column, is optional)

DimManufacturer:
ManufacturerKey|ManufacturerName
1|Manufacturer1
2|Manufacturer2

Some of the advantages of this approach are:

  • We keep the fact table as it is
  • We keep the main dim as it is
  • That “guessing of the percentage/weight” is limited only to the relationship between product and manufacturer

Whereas the disadvantage, in comparison to approach 1 & 2 is: we need “more join” as there are more tables.

4. Have several columns in the dim for that attribute

If the number of attributes is small and fixed, this is by far the most popular approach to handle Multi Valued Attribute, i.e. in the case of DimCustomer above, have Phone1 and Phone2 columns. I said “if the number of attributes is small and fixed”, i.e. 2 or 3. But if the number of attributes is large (e.g. >10) or if it’s variable (e.g. sometimes 2, sometimes 20), approach 2 and 3 above are more popular, and more appropriate.

5. Put the attribute in a snowflake sub dimension

We can’t really do this, as it is 1 to many (1 row in the main dim corresponds to many rows in the sub dim). So we need a bridge table, which brings us back to approach 3.

6. Keep in 1 column using commas or pipes

e.g. “01223 111111, 1223 222222”. This is a crazy idea, I know, but there have been talks about it in the DW practice. Not only talks, I’ve seen it implemented too! It is more flexible than having several columns (approach 4) and simpler than approach 3 or 2. So if the purpose of the attribute is “display only” or “information only” on a report (rather than analyse or slice & dice), there is an argument for using this approach, particularly if the number of attributes is small (e.g. 1 to 4).

As usual I welcome suggestions and discussion at vrainardi@gmail.com. Thank you for reading. Vincent 13/3/11. Revised for minor corrections 1/4/11.

Advertisements

9 Comments »

  1. Hi Vincent,

    Another approach is to combine the fact approach with a cte (i guess were on sql server) and expose the attribute results as a delimited attribute through the dimension view. This gives you a high degree of flexibility for storage and ease of display.

    Regards,

    Paul

    Comment by Paul te Braak — 1 April 2011 @ 7:27 am | Reply

  2. […] a particular dimension member. Rather than focus too much on the warehouse technique in question Vincent Rainardi’s blog post provides a very good technically agnostic explanation far better than I […]

    Pingback by Multi-Valued Attributes in SQL Services Analysis Services (SSAS) « Brokenninja's Blog — 5 May 2011 @ 1:30 pm | Reply

  3. […] a particular dimension member. Rather than focus too much on the warehouse technique in question Vincent Rainardi’s blog post provides a very good technically agnostic explanation far better than I […]

    Pingback by Multi-Valued Attributes in SQL Services Analysis Services (SSAS) - Shaun Ryan's blog — 5 May 2011 @ 3:25 pm | Reply

  4. […] a particular dimension member. Rather than focus too much on the warehouse technique in question Vincent Rainardi’s blog post provides a very good technically agnostic explanation far better than I […]

    Pingback by Multi-Valued Attributes in SQL Server Analysis Services (SSAS) - Shaun Ryan's blog — 5 May 2011 @ 4:41 pm | Reply

  5. I am not sure if I understand the approach 3 properly. Kindly provide another detailed example

    Comment by Anyone — 10 May 2011 @ 3:39 pm | Reply

  6. […] attributes” as I showed in my previous article, “Dimensions with Multi Valued Attributes” here. Here is an example of a bridge […]

    Pingback by Bridge Table with Multiple Instances « Data Warehousing and Business Intelligence — 28 July 2011 @ 5:39 am | Reply

  7. […] 2 dimensions. A bridge table is usually used in data warehousing for multi valued attributes, see here. A bridge table has no measure (factless). Usually a bridge table contains only 2 columns, both are […]

    Pingback by Bridge Table with Date « Data Warehousing and Business Intelligence — 30 July 2011 @ 3:06 am | Reply

  8. […] a particular dimension member. Rather than focus too much on the warehouse technique in question Vincent Rainardi’s blog post provides a very good technically agnostic explanation far better than I could. I’m […]

    Pingback by Multi-Valued Attributes in SQL Server Analysis Services (SSAS) « semanticmi — 15 September 2011 @ 11:45 am | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Create a free website or blog at WordPress.com.

%d bloggers like this: