In a Data Warehousing project, when a Data Warehouse Designer or a Dimensional Modeller completed his design and hand it over to the ETL developers, some people think his utilisation will drop from 100% to 10%. This is not true. During ETL development phase, the DW Designer will still be fully occupied. The ETL developers will ask him a lot of questions for many weeks. And the testers will also ask him questions too.
This “answering questions” task seems to be a light job but in reality it takes quite a lot of time. Particularly if the developers and testers are overseas, e.g. UK/US to India. Non face-to-face communications (e.g. telephone conference supported by desktop sharing software) take more effort than face-to-face.
What kind of questions do the ETL developers ask? Mainly ETL questions but some are modelling questions. So it would help if you have an ETL architect in your DW/BI project. Because if you do, this ETL architect would be able to take away the burden of the DW designer having to interface with many developers at once. Instead the DW designer will only need to interface with 1 person.
These are examples of the questions from the ETL developers:
- Column length, e.g. the natural key (aka business key) is varchar(20) in source table 1, but varchar(30) in source table 2.
- Records with NULL measures or NULL natural keys in the source system: what to do with them?
- Duplicate natural keys in the source system: how to de-dup them?
- Inconsistency between the dimension model and the data, e.g. the dim model says that the source column is X, but in the data there’s no column called X.
- Inconsistencies of data types, either between source system and staging or between staging and the dimensional data store.
- Decimal points: should we round to 2 decimal point? But the source data is 6 decimal point hence truncation.
- The rule for the ETL is: if it’s null set it to “unknown”. But it is a numeric column, not a string column. Should we set as null or 0?
In addition to answering the above questions asked by ETL developers, the DW designer still has to finish the dim model design. When a dim model is given to the developers to be built, in most cases it is not 100% complete, probably 90% or 95% complete, but not 100%. This is because:
a) There are source systems that we can’t get yet.
b) There are changes in the source systems.
c) Misunderstanding of the source systems, i.e. the dim model was designed without having the source data
So a DW designer or a dim modeller (what’s the difference between a DW designer and a dim modeller, you might ask? I’ll explain after this) still needs to do these tasks when the development is being done. In some projects, these “grey” parts of the dim model are labelled as “phase 2”. It is common in DW projects that the ETL development (and construction of test packages) is started when the dim model achieves a somewhat mature state. Usually, most of the source systems are known to the designer, but there is always 1 or 2 which he hasn’t got them yet. Of course, logically, he start the design work without waiting for this 1 or 2 little gaps in the data sources. And guess what? 2 months later, after the design is “finished”, those 1 or 2 items are still not there. And they got pushed to phase 2. This is common.
Differences Between a DW Designer and a Dim Modeller
What’s the difference between a DW designer and a dim modeller, you might ask? A dimensional model is the fact and dimension tables that form the data warehouse. It is the data model of these tables, i.e. the Entity Relationship Diagram (ERD). It specifies the table names, and how they relate to each other (through the surrogate keys). It specifies the columns in those tables, and the data types and length of each column. It specifies the default values if any, and specific column properties such as “identity”. It also specifies related table properties such as indexes and material views.
A dimensional modeller is the person who designs/creates the dimensional model. His job title might be “data architect”, because he doesn’t merely design the data warehouse/mart, but he also designs OLTP databases. It is uncommon for someone to have the title of “dimensional modeller”. “Data Architect” is a much more common title.
Notice that a dimensional model does not explain how we are going to populate the tables. It explains the tables, but not how to populate it and where we are going to populate it from. This usually falls in the hands of an ETL architect, who, alongside the dim modeller, specifies how the tables will be populated. It is common in a DWBI project that once the dim modeller completed the design of a “subject area” of the dim model, that area is passed to the ETL architect so he can specify the data population for that area. So they work simultaneously, shortening the duration of the project. An ETL architect is also known as an ETL designer.
But in small DW projects (6 months end-to-end or less), 1 person does both the dim modelling and the ETL architect. In this case the person is labelled as a “DW designer”, or as it is more commonly known, a “DW Architect”.
Vincent Rainardi, 14/6/2011