Data Warehousing and Business Intelligence

13 March 2010

Creating a Data Mart from a Data Warehouse: Four Questions

Filed under: Data Warehousing,SQL Server — Vincent Rainardi @ 11:44 pm
Tags: ,

I meant a dimensional data warehouse, not a normalised data warehouse here. The things I’d like to cover are:

  1. In the DM, could we change the grain of the fact table? (to a higher grain)
    In doing so could we combine several fact tables into one?
  2. If we have SCD type2 dimension in the DW, should we make it type 1 in the DM?
  3. If the DW is in star schema, could we make it snow flake in DM?
    If the DW is in snow flake schema, could we make it a star?
  4. Should the DM be allowed to change the attribute names and entity names?
    Is there a good reason for doing it?

The answers of these questions largely depend on the reason of creating the DM in the first place. In this article I listed down 6 reasons for creating a DM from a DW:

  1. Because we need to create a static copy of a few fact & dim tables which does not change every day like the warehouse, for analysis purposes.
  2. Because they need a mart for data mining.
  3. To ease the query workload on the warehouse.
  4. Because they want to change the data to simulate some business scenarios.
  5. To support the query performance of the reports.
  6. Because the data warehouse is in normalised format.

In this article I’d like to focus on #5: a DM which is created to support the reports’ performance (or BI application). The DM is structured differently to the DW.

In the DM, could we change the grain of the fact table? (to a higher grain)
In doing so could we combine several fact tables into one?

Yes by all means. That was the intention of creating the data mart in the first place. To support query/report performance we need to create the fact table at a higher grain and that’s what relational aggregation is all about. Alternatively you could build a cube and have your reports reading the cube.

Yes by all means combine the fact tables. The idea of optimising data structure for report performance is that we store the data in a ready-for-consumption format. If this means reading several fact tables from the DW and combine them into a single DM fact table so be it. This is where the covering index will be very useful. A nonclustered index that includes all ‘grain determinant’ dim keys in the fact table would speed up the join. And no, referring to this article, a “fact key” column would not be useful as you can’t join on it.

If we have SCD type2 dimension in the DW, should we make it type 1 in the DM?

Yes by all means do that. If you want to make your mart perform, simplify it. The principle of designing a mart is: 1) only take from DW what you really need, 2) structure it in the most performing way. The DW on the other hand, has different design principles, i.e. 1) store anything and everything, and 2) structure it in the most flexible way.

SCD type 2 is a classic example of this ‘flexibility’ principle. Worry that there will be a need to understand the value of those attribute when the transaction happened, the DW modeller declares the dimension as type 2. Some designers even created the “current attribute” columns, in that type 2 dim, containing the current value. And some created the “current key” column, which contain the dim key of the current row. For a good background on this Slowly Changing Dimension type 2+1, please read Margy Ross and Ralph Kimball article here. Look for the heading “type 2 with current overwrite”. It is not a new concept. They wrote it 5 years ago (March 2005). I first came across it when writting this article. In there I highlighted what Margy Ross said about type 6 which is type 2+3+1 in her article: Combining SCD Techniques. And that was in October 2000!

So yes, if the dimension is type 2 in the DW and you do not need historical value of attributes, you should make the dimension as type 1 in your mart.

Just a few notes on the details when creating a type 1 dim from a type 2 dim:

  1. Use Is Current Flag column instead of date validity columns (it’s faster). A discussion about the data type of the Is Current Flag column is here.
  2. If your source type 2 dimension has “current attribute” columns, use them (instead of the normal attribute column).
  3. If the type 2 dim has both “is_current” flag and “current attribute”, use is_current column regardless whether you use the “current attribute” columns or the normal attribute columns.
  4. You will need to translate the dim keys in the fact table to the current dim keys. If your source dim has “current dim key” column, it makes the job a lot easier. Otherwise this operation is a quite heavy. Essentially we need to prepare a table with 2 columns: dim key and current dim key. We work out the current dim key column by looking up the natural key where is_current = ‘Y’

If the DW is in star schema, could we make it snow flake in DM?
If the DW is in snow flake schema, could we make it a star?

First let’s go through the reasons. The reasons people need to normalised a star schema into a snow flake schema are: a) to support a data entry application where the application tags certain information entered against the attribute member, b) to support Reporting Services drop down lists, possibly Sharepoint lists also, c) ease of maintenance, because that particular attribute is used by many dimensions, and d) company standard for dimensional modelling.

Where as the reasons people need to make a star are: a) query performance, b) query performance and c) query performance.

Considering that the purpose of creating the data mart is to support report’s query performance, In my opinion the answer to this 2 questions is: yes by all means make it snow flake or star as necessary to support the report/application.

Some notes when normalising a star schema:

  1. There is another way of normalise a star (other than physicalised it as tables). And that is by creating views. Be it materialised view or not (indexed view in SQL Server).
  2. If you have both attribute code and attribute name in the DW dim then “select distinct attribute code, attribute name from dim1” – no problem. But if you don’t have the attribute code, then you need to either a) link on attribute name, or b) create and maintain the attribute code in the mart. I’d recommend the latter. Of course there’s another option which is to ask the DW to add the attribute code.

When creating a star from snow flake:

  1. Consider SCD type, e.g. a) parent & child source are type 2, target is type 2, b) parent and child source are type 1, target is type 1, c) parent is type 2, child is type 1 (or vice versa) and target is type 1. To create a type 2 from type 1 is not possible.
  2. Out of the 3 combinations on point 1 above (more if you have 3 levels), the issue is when you want to create a type2 target (option a in point 1 above). We need to make sure that the parent is joining to the child dimension on the right row, e.g. which one is the child row for that parent row etc.
  3. Set your dimension grain (of the target dimension) according to what the application/report need. Usually not at the lowest possible level, but at higher level, e.g. product type – product group – product code: if the report works at product type then set the grain at product type, so the report will have good query performance.
  4. If necessary create 2 dims at different grain, 1 at product type and the other at product code. Reports which work at product type level don’t have to use the product code dim. The higher level dim could be created using indexed view rather than a physical table.
  5. If you change the dimension (either the grain or SCD type) you will need to translate the dim keys in the fact table.

Should the DM be allowed to change the attribute names and entity names?
Is there a good reason for doing it?

By ‘entity’ I mean table, e.g. dimension table and fact table.

Yes, in my opinion. The DW is designed to be flexible and incorporating different business areas/units, so in DW we create it as 2 dimensions. Whereas in DM we only deal with 1 area of the business so we only need to create 1 dimension.

For example, in wealth management we have 2 account dimensions: private accounts and corporate accounts. They have different attributes because their have different natures of business. They are used in different fact tables. So the dimension name in the DW are accordingly named as dim_private_account and dim_corporate_account.

When we create a DM for private business, it makes sense if we want to call the account dimension as dim_account (without the ‘private’ prefix) because in this mart we only deal with the private business.

But if the name changes are because ‘I just simply don’t like it’ I would discourage you from changing the attribute/table name. As much as possible you should try to keep the attribute and table names in the DM the same as in DW. This is for consistency, and also for ease of use. If a measure in DW is called potential_profitability_ratio, and in DM it’s called potential_profitability or profitability_ratio then users will be confused. Not only the end users, but report writers and ETL developers will be confused too. Exact column names give the additional assurance that they are the same, and contain the same data.

2 Comments »

  1. […] article @ https://dwbi1.wordpress.com/2010/03/13/creating-a-data-mart-from-a-data-warehouse-four-questions/ GA_googleAddAttr("AdOpt", "1"); GA_googleAddAttr("Origin", "other"); […]

    Pingback by Creating a Data Mart from a Data Warehouse: Four Questions « Business Intelligence — 18 July 2011 @ 9:54 am | Reply

  2. Great article. Thanks

    Comment by Mustafa — 6 December 2016 @ 9:09 am | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at WordPress.com.

%d bloggers like this: