I first noticed SCD type 0 in Wikipedia about 1.5 years ago (link). We recently had a debate/discussion in the office with regard to SCD type 0:
- Some said that SCD type 0 was static, i.e. once we load it we don’t change it.
- Some said that SCD type 0 was static, i.e. once we load a row into the dim, we don’t change that row. But we can load it again with a different Surrogate Key! Wow what a revolution!
- Some wanted to find the definition of SCD type 0 in the Kimball’s Toolkit book. I said to him that type 0 is not there but he still insisted in searching the book. Cover to cover. And after sometime he gave up. Yup, it’s not there. Ralph Kimball and Margy Ross never wrote about type 0. About type 1+2+3 = 6 yes, they had, but not SCD type 0.
In my opinion, with SCD type 0 we don’t change the dim row and we don’t reload it under new Surrogate Key (SK). The question here is: how do we identify the dim row? Using the natural key of course (NK) – some people call it Business Key (BK).
What if the business key changes? Then it’s a new dim row of course, and we load it in. Didn’t we just say that the way we identify a dim row is by the NK?
Do we load new rows? Of course we do. How do we know they are new rows? Well, as I said before, we use the NK to identify a row.
The Idea of SCD Type 0 Was Inspired by SSIS SCD Transform?
I think this type 0 thing was probably inspired by SSIS SCD Transformation. In SSIS SCD Wizard we have 3 different types of attributes: fixed attribute, changing attribute and historical attribute, see below:
This is probably what triggered SCD type 0 idea. In 2009 it’s been 4 year since SSIS 2005 (5 years if you include the CTP) hence in 2009 the SSIS SCD Wizard became a common thing. If we think about it, it is natural when we see the “fixed attribute” in the SCD Wizard, we think “oh that’s probably type 0”. See here for example.
What’s Type 0 For?
OK. Enough with the etymology or the history of it. The correct question is this: what’s that for? What is SCD type 0 for? Because if it’s never get used, why bother talking about it right?
Why do we want to ignore any changes that happen in that dim? What are the examples/cases?
To answer that question first we need to differentiate between a type 0 attribute and a type 0 dimension. Not many type 0 dims around (although I’ll try to give 1-2 examples) but there are many type 0 attributes around. In a Type 2 dimension, we often find some columns which are fixed. We don’t change their value. For example: created by and created date. For these columns if their values change, we don’t want to create a new row. And we don’t want to change their values either. Let’s define what a type 0 attribute is, then go through an example. After that, let’s define what a type 0 dimension is, followed by an example.
Type 0 Attribute
A type 0 Attribute is: a column with fixed value in type 2 dimension. It is of course possible to have a type 0 attribute in a type 1 dimension.
An example of a type 0 attribute in a type 2 dimension: in the source system we have a customer table with 10 columns, plus 1 column: Created Date. One day, for customer number 1, the Created Date column has a different value. For 11 years it has always been 31/10/2000 but today it is 6/7/2011. We don’t want to change the value of the Created Date column to 6/7/2011. We want it to still be 31/10/2000. We want to ignore this change, because it is a Created Date column, i.e. the date the customer was created. We know that customer number 1 was created 11 years ago and its record has been there since 2000. Setting its Created Date to 2011 is incorrect. Why in the source system the Created Date became 6/7/2011? Probably because of data migration, i.e. the row for customer1 was inserted into a new table hence the “system created date” was set to the today’s date. In this case we want to deliberately ignore the new value. Created Date is a type 0 column.
In that customer table we also have Last Modified Date column. Yesterday the value for this column for customer 1 was 24/8/2007. Today the value of that column changes to 6/7/2011. We do want to change the value of that column in our customer dimension table to 6/7/2011, but we don’t want to create a new row. Last Modified Date column is a type 1 column. We are not interested in keeping its historical values.
So that’s an example of a type 0 column (Created Date), as opposed to a type 1 column (Last Modified Date).
When defining a type 2 dimension, often we don’t set every single column to be “historical”/type 2. Some columns are type 1 attributes, some columns are type 2 attributes, and some columns are type 0 attributes. These type 0 attributes are usually the system column, i.e. when the row was created, name of the application that created the row (or IP address), etc.
Please note that in some data warehouses, the Last Modified Date and Last Modified User are type 2 columns. We do track their history because we want to know, who changed the row and when. For example: credit card application in a bank. We want to track who changed the status of the application and when. So the “last modified date” becomes a very important column, by which the Data Warehouse tracks the status of each application, using an “Accumulated Snapshot Fact Table” like this:
PS. I’ll blog about this “Using Accumulated Snapshot Fact Table for tracking status” later on. Promise. But now let me finish this “type 0 dimension” first.
Type 0 Dimension
OK so that’s SCD type 0 attribute. How about SCD type 0 dimension? Now that we have understood what a type 0 attribute is, we can define what a type 0 dimension is.
A SCD Type 0 Dimension is: a dimension of which all attributes are type 0.
In a type 0 dimension, we can add new rows (with new natural keys), but we don’t want the existing rows in dimension to change. An example of a type 0 dimension is an event dimension, and a “derived” dimension. I’ll explain the “derived” dimension first, then the event dimension.
SCD Type 0 dimension is often found in “derived” dimensions. Let me explain what a “Derived” dimension is using an example. Say we have a transaction table. In this table there are 2 columns called “Grouping Type ID” and “Grouping Type”. The value of this “Grouping Type” column is either “Multiple” or “Single”. Whereas the Grouping Type ID column has M and S value. We don’t have the source table for Grouping Type. So we derived/created a dimension called Grouping Type dimension in your mart, and created 3 columns: Surrogate Key, Grouping Type ID and Grouping Type Description. We populated the dim with select distinct type from that transaction table, resulting in 2 rows, like below:
Then we maintain the dimension by comparing today’s “select distinct type” with the current content of dimension table. We only insert if there is a new grouping type. But we don’t want the rows which are already in the dim to be touched. If there’s an “S” row coming today with description = “Syndication”, we ignore this new row. We can add new rows (with new natural keys), but we don’t want the existing rows in dimension to change. Because the source is a transaction table, not a master table. The dimension is “derived” from a transaction table.
The second case of a type 0, which is quite common also, is an “Event” dimension. An “event” dimension is created over time. The rows in an event dimension always “go forward”, i.e. incrementally moving forward in time. There is no “going back” in time. Every event that occur in the business will create a row in the dimension (in some cases more than 1 row). In banking, an example of event is “credit event”. A credit event happens when a company, or a government, goes default (unable to meet its financial obligation, for example a coupon on a bond).
In scientific DW, it is more common to find an event dimension. Weather measurements, electricity measurements, all have event dimensions. The event is the measurement that happens automatically every period of time. An event dimension is characterised with timestamp for its natural key. Which means that what makes it unique is when the measurement happens. For example, a “Differential Pressure Transmitter” (DPT) measures the pressure difference between 2 points and transmits the measurement electronically continuously (analogue output). If we do a sampling say every 1 second, then we have a measurement every second. The time the measurement was taken is the natural key to this Measurement Event Dimension.
To give us an idea about what the attributes of an event dimension are: the attributes of an event dimension are usually the input parameters to the event. In the event of the event being a “production run” (as in in a factory/manufacturer), the input parameters are the settings of the machines in that run. Some DW architect put the single value outputs of the event/measurement into the event dimension. I disagree with this approach. In my opinion the outputs of the event (as they are numerical values) needs to be put in a fact table.
So a type 0 attribute is a column with fixed value in type-2 dimension or type-1 dimension. The values of a type 0 attribute column never change. The values remain the same forever. An example of a type 0 attribute is created date column.
A type 0 dimension is a dimension of which all attributes are type 0. In a type 0 dimension we can add new rows (with new natural keys) but we don’t want existing rows to change. Examples of a type 0 dimension are derived dimensions and event dimensions.
For those of you seeking what a type 0 dimension is, I hope this helps. There are many respectable data warehouse architects out there and some of them may not necessarily agree with my opinion above. In that case I would like to hear from them, and I will be glad to learn from them.
Vincent Rainardi, 12/7/2011