Let’s have a look at this dimensional model:
It’s a snowflake schema. It is the dimensional model for a book distribution company. The company buys books from publishers, stores them in their warehouses and sells them in their shops. “Line” means product line, i.e. book category such as academic books, fiction books, sport books, etc.
There are a few interesting points here.
- DimLine, DimAuthor and DimPublisher are the sub dimensions of DimProduct.
- DimCity is a sub dimension of DimCustomer.
- Each sub dimension has a surrogate key.
Because each sub dimension has a surrogate key, there is additional complexity. If the sub dims are type 2, whenever the author name or publisher name changes, their surrogate key changes, resulting in a new row in DimProduct. Let me demonstrate:
Below are DimProduct, DimLine, DimAuthor and DimPublisher:
Now let’s make 1 change in each sub dimension (SCD type 2):
Each time any of the sub dim changes, in the DimProduct we will have 3 new rows:
As you can see above, we now have 4 versions of Rainfall in DimProduct.
That is the issue of using Snowflake: we can’t do type 2 on sub dims. We can only do type 1, like this:
Whenever I asked a DW practitioner which DW methodology he uses Star or Snowflake, the answers almost always have been Star. But when I dig deeper why not snowflake, nobody brings up this fundamental issue: inability to store the history of attributes. I hope this article would make it clearer to DW practitioners.