Data Warehousing, BI and Data Science

27 February 2011

Mixed Grain Fact Table

Filed under: Data Warehousing — Vincent Rainardi @ 7:19 am
Tags:

A mixed grain fact table is a fact table where we have measures with different grain. For example, one measure is weekly whereas the other measure is monthly. In this post I’d like to go through the advantages and disadvantages of that approach. The Kimball group clearly stated that measures in a fact table must have the same grain, see Kimball’s Toolkit book chapter 2.

As usual it is easier to explain “by example”:

It is a data mart. They have measures per week and per month, but they don’t have the measure on daily basis. Should we have 2 fact tables, one for weekly and one for monthly, like this: (1)

Or should we create a mixed grain fact table like this: (2)

In the above fact table, the black lines are weekly figures, whereas the red lines are monthly figures. They are both put in the same fact table, but on different measure columns. On the rows where the weekly measure exist, the monthly measure is zero. And vice versa. Therefore weekly and monthly total are correct:

select D.Week, sum(F.WeeklyMeasure) from FactMixedGrain F
join DimDate D on F.DimDate = D.DimDate group by D.Week

Result:

select D.Month, sum(F.MonthlyMeasure) from FactMixedGrain F
join DimDate D on F.DimDate = D.DimDate group by D.Month

Result:

Usually the main reason of doing option (2) is to have both the weekly and monthly measures in 1 fact table. This is done to save development effort, particularly ETL effort. It is easier to populate just 1 table than two tables.

That’s the advantage. Now the disadvantages. The issue with option (2) is that the grain of the fact table is mixed. There are 2 grains. In practice we have other dimension key columns in the fact table. And we also have other measure columns in the fact table. Some of these measures are weekly and some of them are monthly. Perhaps some of these measures are daily.

The issues with a fact table with a mixed grain like that are:

  1. When the reporting from the table, we are risking mixing measures with different grains in the same formula, resulting in meaningless result.
  2. The DateKey is used to indicate both the end of week and the end of month. This requires certain logic in the ETL, which prolong its development.
  3. In the fact table we have black and red lines, indicating weekly and monthly grain. Black for weekly and red for monthly. We may end up having yellow lines too, for daily. And blue lines too, for quarterly. Oh, and green lines too, for yearly. After that, it will be very confusing. When we query the fact table, we wouldn’t be able to tell which line is which.
  4. When we join this fact table with another fact table (drilling across), we are risking joining the wrong rows.

In option (2) above, the date dimension is used for 2 different grains in the fact table. Other examples like this are:

  1. Product dimension used at the product group level and at the product category level.
  2. Customer dimension used at the company level and at the group level.

When we have 2 or 3 dimensions, with each has dual grain like that, it becomes really confusing and risky.

In the above example, the same dimension key column has dual grain. In other cases, the mixed grain in the fact table is caused by different dimension key columns. One measure correspond to dim1, where as another measure correspond to dim2. Where measure1 has value, measure 2 is 0. And vice versa, like this:

And so it becomes really confusing. So I would recommend avoiding this design, mixing measures with different grains into the same fact table. Instead, it is better to put them into separate fact tables.

As usual I welcome comments and discussion at vrainardi@gmail.com. Vincent 27/2/11.

Advertisements

3 Comments »

  1. I had this same experience building daily and monthly performance mart for a bank here in Nigeria. The solution is for Monthly Performance Report, all of a sudden, there comes an overzealous or creative analyst that wants to get daily MPR for weekly report, then monthly as it is. I maintained the fact table then introduced a column in the DimDate dimension called “IsMonthEnd”. The holds “Y” or “N” and there is an understanding with report builders to filter by the column. Great post you have here, thanks for sharing your thoughts, as usual a good site/place everyday. Many thanks.

    Comment by abacusdotcom — 26 October 2015 @ 4:53 pm | Reply

  2. I have a report which requires join between two fact tables(from two different business processes). I know I can do this at reporting level but don’t think it is the efficient way of handling this scenario. What is the best way of data modelling if I need to create fact based on two business processes?

    Should I create one fact using source tables for two business processes?
    Or
    create fact for each business process separately based on single source table and then join the results from the queries?
    Kimball suggests second approach for drilling across but how to implement this in data modeling? Does that mean creating facts and then dimensions which contain joins of two facts? Or this can only be done by reporting tool?

    Comment by Pravin Pande — 17 December 2018 @ 7:42 pm | Reply

    • Hi Pravin, the best practice is to create a separate fact table for each business process. Because you have two different business processes, you should be creating two fact tables.

      But in my experience it depends on what these two business processes are, and it depends on what the grain of the fact tables are. In some cases we need to put them into one fact table, not two. For example:
      1. Invoicing process and sales process: 2 fact tables
      2. Loan application process and mortgage application process: could be 1 fact table (depends on grain)
      3. Portfolio positions/holdings and benchmark positions (2 different processes): both 1 fact table approach and two fact table approach have advantages and disadvantages (see my article here https://dwbi1.wordpress.com/2018/09/25/data-warehouse-for-asset-management/)
      4. Derivative trade and bond purchase/sale (2 different processes): definitely 1 fact table
      5. Water pollution and climate change: 2 fact tables if it is geo-climate forecasting, but 1 fact table if it is ESG scoring.
      6. Account balances for saving accounts and mortgage accounts (2 different processes): definitely 1 fact table
      7. Shipment and purchasing: definitely 2 fact tables

      Note: For us to take the one fact table approach, two business processes must have the same grain.

      Because you have two different business processes, you should be creating two fact tables. But I would advise against joining two fact tables at run time when the report runs (unless the fact tables are small, i.e. less than 20,000 rows). Instead, after both fact tables are populated, at the end of the load calculate the numbers required by the report, then store the result/output in an output table. Make this output table generic, i.e. it should be reusable for other purpose (storing results of other calculations). For the entire system you could have 2 or 3 output tables (with slightly different structures), but not more than 4.

      Comment by Vincent Rainardi — 18 December 2018 @ 8:15 am | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

Create a free website or blog at WordPress.com.

%d bloggers like this: