Data Warehousing and Business Intelligence

29 December 2015

DimMonth, DimQuarter and DimYear

Filed under: Data Architecture,Data Warehousing — Vincent Rainardi @ 6:21 am
Tags: ,

Sometimes the grains of our fact tables are monthly, quarterly, or yearly. In such cases, how do we create DimMonth, DimQuarter and DimYear? Some of the questions in these cases are:

  1. Why do we need to create Month as a dimension? We can’t Month column in the fact table remain as Month, not as a dimension key?
  2. What does DimMonth look like? What are the attributes? How about DimQuarter?
  3. Should we create DimMonth as a physical table, or as a view on top of DimDate?
  4. What is the surrogate key of DimMonth and DimQuarter? Do we create a new SK, or do we use the SK from DimDate?
  5. Do we need to create a dimension for year? It seems weird because it only has 1 column (so that would be against the “degenerate dimension” concept)

For example, in the source system we have a table which stores the monthly targets for each store:

Or quarterly target like this:

How do we create a fact table for this? Should we create it like this?

Question 1 above: Why do we need to create Month as a dimension? Why can’t we leave it as Month in the fact table, not as a dim key, like this?

Question 2 above: if we decided to keep the column as MonthKey, how should DimMonth look like? What are the attributes? Should DimMonth be like this?

What attributes should we put there?

  • Do we need quarter and year?
  • Do we need half year? e.g. 2015 H1 and 2015 H2
  • For month name, should we use the short name or long name? e.g. Oct 2015 or October 2015?
  • Do we need an attribute for “Month Name without Year”, e.g. October?
  • Do we need “Month End Date” column? e.g. 2015-10-30
  • Do we need “Month is a Quarter End” indicator column for March, June, Sep and Dec?
  • Do we need “Number of days in a month” column? e.g. 30 or 31, or 28 or 29.

Or should we use “intelligent key” like this?

Question 3 above: Should we create DimMonth as a view of DimDate like this?

create view DimMonth as

select min(DateKey) as MonthKey, MonthNumber, MonthName, Quarter, [Year]

from DimDate

group by MonthNumber, MonthName, Quarter, [Year]

 

What are the advantages of creating DimMonth and DimQuarter as a view of DimDate? (compared to creating it as a physical table) What are the disadvantages?

 

I think with the above questions and examples we are now clear about what the issue is. Now let’s answer those questions.

 

Q1. Do we need to create Month as dimension? We can’t Month column in the fact table remain as Month, not as a dimension key, like this?

 

We need the Month column in the fact table to be a Dim Key to a month dimension because we need to access Year and other attributes such as Quarter.

 

Bringing Year into the Sales Target fact table like below is not a good idea, because it makes it inflexible. For example, if we want to add Quarter column we have to alter the fact table structure.

 

Using a Dimension Key to link the fact table to a Month dimension makes it a flexible structure:

 

There is an exception to this: Snapshot Month column. In a monthly periodic snapshot fact table, the first column is Snapshot Month. In this case, we do not need to create this column as a dimension key, linking it to DimMonth. In this case, we do not need a DimMonth. Because we do not need other attributes (like Year or Quarter). A monthly periodic snapshot fact table stores the measures as of the last day of every month, or within that month. For example: number of customers, number of products, number of orders, number of orders for each customer, the highest price and lowest price within that month for every product, the number of new customers for that month, etc.

 

Q2. What does DimMonth look like? What are the attributes?

 

Obviously, the grain of DimMonth is 1 row per month. So we are clear about what the rows are. But what are the columns? Well it depends on what we need.

I usually put MonthNumber, MonthName, Quarter and Year in DimMonth, because they are frequently used.

I don’t find “Month Name without the Year” as a useful attribute. I rarely come across the need for “Half Year” attribute.

“Month is a Quarter End” column is also rarely used. Instead, we usually use “Quarter” column.

“Month End Date” and “Number of days in a month” are also rarely used. Instead, we usually use “IsMonthEnd” indicator column in the DimDate.

For month name, should we use the short name (Oct 2015) or the long name (October 2015)? I found that the short name is more used that the long name. But the month number (2015-10) is even more frequently used that the short name

Q3. Should we create DimMonth as a physical table, or as a view on top of DimDate?

This is really the core of this article. A view on top of DimDate is better in my opinion, because we avoid maintaining two physical tables. And it makes the dimensions less cluttered.

If we make DimMonth and DimQuarter as a physical dimensions, in SSMS Object Explorer, when we open the table section we would see these:

DimDate

DimMonth

DimQuarter

DimYear

But if we create DimMonth and DimQuarter as views, then we will only see DimDate in the Object Explorer’s table section. The DimMonth and DimQuarter will be in the view section.

The main disadvantage of creating DimMonth as a view from DimDate is that it is less flexible. The attribute column that we want to appear in DimMonth should exist in DimDate. But I found that DimMonth usually only need 2 or 3 attributes i.e. Month, Quarter, Year; and all of them are available in the DimDate table. So this is not an issue.

Avoiding maintaining 2 physical tables is quite important because when we extend the date dimension (adding more years i.e. more rows) and we forget to extend DimMonth and DimQuarter, then we will cause an error.

The other consideration is of course the performance. I do not find the performance of DimMonth and DimQuarter to be an issue. This is because DimDate is not too large, and more importantly because the monthly and quarterly fact tables are small, less than 1 million rows. They are much smaller than daily fact tables which have millions or billions of rows.

 

Q4. What is the surrogate key of DimMonth and DimQuarter? Do we create a new SK, or do we use the SK from DimDate?

If we create DimMonth and DimQuarter as physical tables, then the surrogate key can either be pure surrogate (1, 2, 3, …) or intelligent key (201510, 201511, 201512, etc.)

But if we create them as a view of DimDate, then the surrogate key can be either the first day of the month (20151001, 20151101, 20151201, etc.) or the month itself (201510, 201511, 201512, etc.). I prefer the latter than the former because it is more intuitive (intelligent key) and there is no ambiguity like the former.

The script to create the view for DimMonth with SK = 201510, 201511, 201512, etc. is like this:

create view DimMonth as

select distinct convert(int,left(convert(varchar,SK_Date),6)) as MonthKey,

[MonthName] , Quarter, [Year]

from DimDate

 

Q5. Do we need to create a dimension for year?

 

No we don’t need to create DimYear, because it would only have 1 column.

What should we call the dim key column in the fact table then? Is it YearKey or Year? We should call it YearKey, to be consistent with the other dim key columns.

A dimension which only has 1 column, and therefore be kept in the fact table is called a Degenerate Dimension. A Degenerate Dimension is usually used to store identifier of the source table, such as Transaction ID and Order ID. But it is also perfectly valid for dimensions which naturally only have one attribute/column, like Year dimension. See my article about “A dimension with only one attribute” here: link.

Advertisements

1 Comment »

  1. Great and pragmatic article.
    I faced some similar design issues recently with the date dimension and with the geography dimension. For the geography I had at least 3 options: (a) a self-referenced table (let’s say city, country and region in the same table with a parent Id), (b) a denormalized table with the hierarchy flatten-out, (c) a snowflake schema with a table for each “level” or entity. I finally used the option (c) because there are fact tables with different granularities (sales by region, inventories by country, etc..) and I want to keep FKs using the surrogated keys. I think other options also worked depending on the requirements. For the date dimension, it is also possible to have a single table with the flat hierarchy if the end tool is SSAS for instance. It is possible to logically join a measure group to a dimension attribute other to the dimension key. That will work, the problem in my opinion is to have a consistence design and if you want to use FKs or not. I think FKs are the better way to force the integrity of the data.

    Comment by Paul Hernandez — 29 December 2015 @ 12:18 pm | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at WordPress.com.

%d bloggers like this: