A few weeks ago I came across EAV fact tables. EAV is Entity Attribute Values data model (read this for a background on EAV structure). It is a data model which enables us to add column into a table without actually changing the structure. At first I thought this EAV approach has no place in Kimball dimensional model. But after thinking and rethinking I saw that it had some advantages too, not just disadvantages. And in some cases it is appropriate to be used.
So below I’ll explain what it it is, the advantages, the disadvantages and the verdict about it.
What does it look like?
An EAV Fact Table looks like this:
The above example is taken from retail industry (a fashion retailer), where they analysed the profitability of each product line every day. What is a product line? They have 9 product groups: women, men, kids, shoes, handbags, accessories, watches, jewelry, beauty. The women product group consists of 23 product lines: dresses, tops, jumpers, cardigans, shirts, T-shirts, blazers, capes, jackets, coats, skirts, shorts, trousers, jeans, leggings, tights, socks, nightware, lingerie, leisureware, swimware, suits, new.
The above EAV fact table is product line performance fact table. Every day, based on the sales figures, profit margins, direct costs and overheads, they calculated various performance measure for each product line: sales growth, profit margin, product distribution, margin stability, cost effectiveness, price variation, colour variation, size variation, style variation, etc. TypeKey 217 means 1 week sales growth, 218 means 1 week margin stability, 219 is product distribution, and so on. Some measures are time-based, so they have periods such as 1 day, 1 week or 1 month. Some measures have 2 versions: net and gross. Some measures have location variations i.e. local and global. Similar measures are grouped.
Performance measurements are different for each product line. Measurements applicable for a product line may not be appliable to other product line. Using EAV structure fits the structure of performance measurement data, and makes it flexible. Because there are so many product lines with so many different performance measures, almost each week they have a new performance measure. This is because in the business analytic software they can create a new measure at any time. They define the formula for that measure, and on which product lines the measure are applicable, the different range of time periods applicable to that new measure, whether it is gross or net measurement, and whether it is global or local based.
What’s bad about it?
The main down side of EAV fact tables is: when we query the fact table we may need to pivot it. If we need to retrieve the weekly margin stability for every product lines, we could just filter on Performance Type Key = 128 and we get what we need. But if we need to retrieve all time variances for sales growth (1 week, 2 weeks, 1 month, etc) for a certain product lines, then we will need to pivot the data.
This pivoting is annoying because we have to hardcode the performance type names to make them as columns. At times it could be so annoying that we wished we had a normal Kimball style fact table so we didn’t have to pivot the data.
What’s good about it?
In Kimball star schema, in the fact tables the measures are created as fixed columns. If we have a new measure, we will need to create a new column. If the retail analytics software is churning out new measures almost every week, our development cycle will not be able to cope with the pace. In a mature warehouse, it will probably take about a month to complete the analysis, design, development, testing and release process, just to add 1 column. Equal to probably 10 man days.
If the employee cost rate is $550 per day (sum of salary, bonus, medical, dental, vision, relocation, life insurance, accident insurance, training, pension, cost of HR staff, childcare, cost of facilities, and 401k for a year, say $120k, divided by 250 working days per year minus 30 days vacation and sick time) that 1 column would cost $5500. It is very not cost effective.
Having a flexible structure like the EAV fact table means that we don’t have to worry about new performance measure churned by the retail analytic software almost every week, ever. That saves $5500 per week, or $275,000 per year, which is a lot of money.
If new measures are created quite often (more than once per quarter) and the measures are different for every product lines (or every customer, or other dimensionality such as instrument or asset type), then the EAV Fact Table approach is justified.
If the measures are quite stable (changes are less than once per quarter), and the measures are the same for every dimensionality, then an EAV Fact Table is not justified. We should build it as a normal, Kimball-style fact table.
But that is just my opinon. You could have different experiences, and therefore different opinions. Hence I would like to hear your comments in this topic.