Data Warehousing, BI and Data Science

13 March 2012

Star Schema or Snowflake

Filed under: Data Warehousing — Vincent Rainardi @ 8:05 am

Last week somebody asked me that classic question again: which one should I do, star schema or snowflake and why?

Star schema, because:
1. Query performance
Because it is 1 level join, star schema perform better in query times than snowflake. We don’t need to join multiple tables to form a dimension. All we need to join is the dimension table and the fact table. If your BI tool is ‘all in memory’ such as QlikView, PowerPivot, and SSAS (MOLAP), then they are fast. But if your BI tool is ‘touch the disk everytime’ such as BO InfoView and SSRS, then you need

2. Simplicity in design
This is the main reason why I prefer star schema than snowflake. It is simple. We do not normalise the dimension. We put all the product attributes in 1 dimension. It makes it simple. It is easy to relate one attribute and another. We don’t need to join tables.
The most important reason to choose Star Schema is this: simplicity in SCD type 2. With Star Schema, product dimension is one table so it is easier to make it type 2. With Snowflake it is scattered over 5-6 tables and it is a nightmare to implement type 2.

3. Flexibility for hierarchy
The relationship between hierarchy levels can be different for each dimension. The relationship of city-area-region in the customer dimension can be different from the relationship of city-area-region in the branch dimension. And it can be different in supplier dimension. We don’t put them in 1 central location like in Snowflake. This makes it flexible to define the hierarchy.

4. Compatibility with BI Tools and database engine
Star schema has wider recognition in the DWBI industry. Some database engines such as SQL Server 2008 recognise that it is a star schema and optimise the query accordingly. Some BI tools such as BO and MicroStrategy can create the metadata layer automatically based on star schema.

As a principle, we always have to look at the other side. We must not look at one side only. The disadvantages of Star Schema are:

1. Longer load time
Because we need to perform joins to get all the data required, the ETL in Star Schema takes longer time than Snowflake. But please see SCD type 2 in point 2 above. It make it longer to load a Snowflake Schema if the dimension is type 2.

2. Inflexibility
In Star Schema the dimension is already formed in its entirity. So it’s not possible to dismantle it and offer half of it to one fact table, and offer the other half to another fact table. In Snowflake we can. The dimension is stored in several pieces, enabling us to take the pieces we need.

Now, after all above, you need to read this: When to snowflake (link).

Above is my opinion. I know there are many experts who know much more than me, who may have different opinions. So as always I welcome any comments and discussions at vrainardi@gmail.com

Vincent Rainardi 13/3/2012

Advertisements

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Create a free website or blog at WordPress.com.

%d bloggers like this: