A multi valued attribute is an attribute which has more than 1 value per dimension row. A “Multi Valued Attribute” is different to A “Multi Valued Dimension”. A “Multi Value Attribute” occurs in a dimension, whereas a “Multi Valued Dimension” occurs in a fact table. A “Multi Valued Dimension” is a dimension with more than 1 value per fact row. As always it is best to explain by example:
Anita Haynes|Edinburgh|0131 123456
Bobby Locard|Peterborough|01733 123456
Ken Barrera|Cambridge|01223 111111, 1223 222222
The Phone Number column above is an example of a multi valued attribute. It can have more than 1 value.
Another example for DimCustomer is interest. In a travel company, one customer can have several interests, which could be captured in an online survey.
Examples of multivalued attributes in the Product dimension are colour and size, e.g.
The Colour column above is an example of a multi valued attribute. It can have more than 1 value. Same with the Size column.
An example of a Multi Valued Dimension is given by Ralph Kimball and Margy Ross in the Health Care chapter of their Toolkit book (page 263 chapter 13 on the 2nd edition): Diagnosis dimension on the Billing Line Fact Table. For each fact row there could be more than 1 diagnosis. 99% of the time, when this happens, we remove that dim from the fact table, because the grain does not suit the fact table. However, as Kimball and Ross said, it is sometimes useful to keep them in the fact table.
But this article is about Multi Valued Attribute, not Multi Valued Dimension.
There are several approaches to deal with a dimension with a multi valued attribute.
- Lower the grain of the dimension
- Put the attribute in another dimension, link direct to the fact table
- Use a fact table (bridge table) to link the 2 dimensions
- Have several columns in the dim for that attribute
- Put the attribute in a snow-flaked sub dimension
- Keep in one column using commas or pipes
1. Lower the grain of the dimension
In this approach we change the grain of the dimension, taking into account the multivalued attribute(s). It is quicker to easier by example: in the case of the colour and size above, we change the DimProduct to a lower grain, e.g. the combination ProductCode-Colour-Size as follows:
But in many cases we are not able to do this, because the fact table requires that the product dimension is at Product Code grain. For example: we don’t have the sales information per colour and size, but only by product code. You could argue that if the sale is for product B, we could put ProductKey 2 or 3, doesn’t really matter which one. But this gives the wrong impression that we know the colour or size.
2. Move the attribute to another dimension, linked direct to the fact table
An example of this is DimProduct which has a “Manufacturer” attribute. A product is usually made by 1 manufacturer, but sometimes two or three. One solution is to move manufacturer to its own dimension (DimManufacturer) and in the fact table we have ManufacturerKey. This approach is only acceptable if the fact table is relevant to the new dim, for example FactSupplierPerformance. But if the fact table is nothing to do with DimManufacturer (say FactSales) we can’t put ManufacturerKey there. Note that to use this approach we need to know (or assume) how to allocate the measures for each manufacturer, e.g. equal proportion, one takes all, or weighted:
After, if product1 has 2 manufacturers, equal proportion:
“One takes all”: (this is the easiest, but the question is always: which one?)
In most cases, we don’t have the allocation information (as in, if it is 50-50 or 30-70), we only know that product1 has 2 manufacturers.
3. Use a bridge table to link the 2 dimensions
This is the “standard” approach, when we need a multivalued sub dim to be linked to the main dim. For example, in the case of manufacturer above:
(Weight or percentage column, aka allocation column, is optional)
Some of the advantages of this approach are:
- We keep the fact table as it is
- We keep the main dim as it is
- That “guessing of the percentage/weight” is limited only to the relationship between product and manufacturer
Whereas the disadvantage, in comparison to approach 1 & 2 is: we need “more join” as there are more tables.
4. Have several columns in the dim for that attribute
If the number of attributes is small and fixed, this is by far the most popular approach to handle Multi Valued Attribute, i.e. in the case of DimCustomer above, have Phone1 and Phone2 columns. I said “if the number of attributes is small and fixed”, i.e. 2 or 3. But if the number of attributes is large (e.g. >10) or if it’s variable (e.g. sometimes 2, sometimes 20), approach 2 and 3 above are more popular, and more appropriate.
5. Put the attribute in a snowflake sub dimension
We can’t really do this, as it is 1 to many (1 row in the main dim corresponds to many rows in the sub dim). So we need a bridge table, which brings us back to approach 3.
6. Keep in 1 column using commas or pipes
e.g. “01223 111111, 1223 222222”. This is a crazy idea, I know, but there have been talks about it in the DW practice. Not only talks, I’ve seen it implemented too! It is more flexible than having several columns (approach 4) and simpler than approach 3 or 2. So if the purpose of the attribute is “display only” or “information only” on a report (rather than analyse or slice & dice), there is an argument for using this approach, particularly if the number of attributes is small (e.g. 1 to 4).
As usual I welcome suggestions and discussion at firstname.lastname@example.org. Thank you for reading. Vincent 13/3/11. Revised for minor corrections 1/4/11.