Sometimes in a data warehousing or BI projects we need to review an existing dimensional model. In this post I will try to write down what to look for when reviewing.
The most important thing to look out when reviewing a dimensional model is the functionality. The technicality is the easy bit. You can read the Kimball book, read all Kimball Group articles, read my blog, and you might consider yourself an expert technically on dimensional modelling. But if you don’t understand insurance, you won’t be able review the dimensional model in a Lloyd’s underwriter. You could be asking lots of basic questions: what is a “line”, what does “to underwrite” mean? During the first month of the project, you will have to spend a lot of hours reviewing the business requirements of the source systems. You need to understand the source systems. That’s the key.
After you understand the source systems, you can ask yourself these questions: (I’m going to use Investment Bank for an example, and abbreviate Dimensional Model with DM)
- Pick a business area (e.g. Repo, Credit Risk, FX, Fixed Income). Don’t try to analyse the business in its entirety.
- In one area, say Repo, list all the sub areas that need analysis (for example position analysis, exposure analysis, collateral analysis, etc), and check if all of them are covered in the DM. It is possible that you have 6 sub areas for analysis and the dim model only cover 5.
- Then pick one sub area (say exposure analysis), ask yourself what are the business questions that need answering when analysing exposure? For example, daily re-pricing/marking-to-market, exposure by counterparty, rating, liquidity analysis, etc. For example: “what is the value of three days’ worth of the underlying security’s turnover?” To manage the liquidity the business needs to know this.
A good business analyst has experience in Repo in other banks, hence he or she can guide the business user of what to do/expect in the analysis. But what I observe is that the data warehouse architect/designer often don’t understand the business hence he/she can’t design the DM.
- Now that you know all the questions, you can check if the DM can be used to answer those questions. Are the missing attributes/measures? Are they at the right grain? Are there a missing fact table? Are there a missing dimension? Is the history captured? Does the fact table contain the all the necessary data? (for example: we need 5 product types but the fact table only contains 4).
I must emphasise that there is no point going straight to point 4 without doing point 1 to 3.
As I said this is the easy bit. Just read Kimball’s DW Toolkit book and you will be able to evaluate the design of the DM. If you are the business user (rather than the dimensional modeller), avoid going into the technical straight away, without discussing the functionalities. You need to sort out the Functionality first. There is no point talking about SCD, grain, etc if the functionality is wrong.
I’ll list down some technical things:
- Measured stored in dimensions
- Incorrect grain
- DD not captured
- SCD type 2 over used in all attributes
- Incorrect data type
- Some data is required but not available
- Daily snapshot causing performance issue
- Aggregate fact table not defined
- Multi Valued Attribute
As I said, Kimball Toolkit book will give you an exhaustive list. You can also refer to my Advanced Dimensional Modelling presentation here to get some ideas.