Data Warehousing and Business Intelligence

3 March 2010

Reasons for Creating a Data Mart from a Data Warehouse

Filed under: Data Warehousing — Vincent Rainardi @ 11:45 pm
Tags:

There are several reasons why we want to create a data mart from a data warehouse. But before I list them down I would like to clarify the meaning of “data mart” and the meaning of “data warehouse”.

Data mart: in this article I am referring to the Kimball meaning of the word “mart”, which means that it is in dimensional model. Meaning that it consists of a few fact tables (usually 1 to 5 but not more than 10), and the necessary dimensions (usually about 5 to 10). Quoting my own definition in my book, a dimensional data mart is a group of related fact tables and their corresponding dimension tables containing the measurements of business events categorised by their dimensions.

The other possible meanings of “data mart” in data warehousing practice are (which I’m not talking about them in this article):

  1. An OLAP cube. Usually quite small in size, created from 1 or 2 fact tables (and the necessary dimension). Probably not at the lowest grain but at higher/summarised level.
  2. A relational database but not dimensional (not fact & dim). Usually it is not at 3rd normal form either. Just a collection of tables without (necessarily) having referential integrity / foreign key relationships between the tables. It’s quite a small database, there are not many tables, probably between 5 and 15.

Whereas for the term “data warehouse”, it’s as what I defined in my book: a data warehouse is a system that retrieves and consolidates data periodically from the source systems into a dimensional or normalised data store. It usually keeps years of history and is queried for business intelligence or other analytical activities. It is typically updated in batches, not every time the transaction happens in the source systems.

So, with those meanings of data mart and data warehouse in mind, the reasons for creating a data mart from a data warehouse are:

  1. Because we need to create a static copy of a few fact tables (with their corresponding dimensions) which does not change every day like the warehouse, for analysis purposes. Usually it’s only temporary. The users use the data in the mart for a project which does not last long, say a few days to a few weeks. And after that the data mart is deleted.
  2. Because they need a mart for data mining. Processing a data mining model (“training”, “predictive analysis”) creates heavy work load and we don’t want it to affect the performance of the central/core warehouse.
  3. To ease the query workload on the warehouse. Say 50% of the reports are querying one certain fact table (and its dimensions). To enlighten the burden of the warehouse we create a copy of that fact table (and its dimension) in a separated database located on different server, and point some of th reports that way. This mart is refreshed/updated every day.
  4. Because they want to change the data to simulate some business scenarios. They can’t change the data in the core warehouse (it will affect everybody!) so we provide a data mart for them to running their scenarios.
  5. To support the query performance of the reports (the mart is structured differently, indexed differently). Probably putting additional tables in it (summary tables/fact at higher grain, or list tables). The mart is usually read only, refreshed/updated every day.
  6. Because the data warehouse is in normalised format (kind of NDS, probably on PDW), so we need to build a dimensional mart for the analytics/OLAP to get the data from.
Advertisements

4 Comments »

  1. Reason #6 is very important and I would add that having a data warehouse separated from marts allows to create several data marts from the same tables – oftentimes the same entity generates different dimensions for different analysis (just imagine a Product dimension containing only the products which are active and another one, maybe SCD2, with all product history).

    Comment by Marco Russo — 4 March 2010 @ 11:09 am | Reply

  2. […] answers of these questions largely depend on the reason of creating the DM in the first place. In this article I listed down 6 reasons for creating a DM from a […]

    Pingback by Creating a Data Mart from a Data Warehouse: Four Questions « Data Warehousing and Business Intelligence — 13 March 2010 @ 11:44 pm | Reply

  3. […] answers of these questions largely depend on the reason of creating the DM in the first place. In this article I listed down 6 reasons for creating a DM from a […]

    Pingback by Creating a Data Mart from a Data Warehouse: Four Questions « Business Intelligence — 18 July 2011 @ 9:54 am | Reply

  4. thanks for the nice sharing, I am enlightened whether to put data mart or not for my DSS prototype study.

    Comment by Arvin Alonzo — 7 March 2017 @ 12:14 am | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at WordPress.com.

%d bloggers like this: