A data warehouse is usually built when the operational system is up and running. But sometimes we have to build a data warehouse or data mart at the same time as the operational system. This has particular challenges.
- We can’t get production quality data, because the system is not in production yet. And for several items we can’t get the data at all, because that part of the system is not built yet.
- The source system in general is still buggy and not stable. The database can still change. The functionality can still change.
This doesn’t mean that we can’t do it. We can. But there are certain things that we need to do.
- First and foremost, for the benefit of project management (the economics of resources), agree the interface between the operational system and MI. This will speed up the project timeline overall, because the MI / DW team can start development without waiting for the operational system to complete first. We remove the dependency between them.
- Unlike a normal warehouse project where we usually query the source tables directly using our Data Integration tool, I would advise not to do it that way. It is better if this agreed interface is in the form of flat files. This is because a) the operational database is still changing, you don’t want to interface with a database which is still changing. By interfacing using files we establish some stability. b) the files could be used by other downstream systems too, not only the warehouse.
- Produce sample files for the interface. If we have the data in the operational system, use real data. If for certain areas we don’t have any data (because the system hasn’t been built yet for that area), then use a made up data. Ask the ops team to describe the functionality of that area, then compose the sample data. Ask the operational team to verify that the sample data makes sense.
- Then we go ahead with the usual nitty gritty of DW project development cycle, be it Agile or waterfall. Define the front end of the MI, either cube or report. Define the dim model, create the tables. Build the ETL (can’t believe we still use this term) and build the reports & cubes.
- Follow the ops team development. If they develop area 4 first, then area 1, we do so too. If they put area 4 to UAT, and area 1 is still in dev, we do so too. I know we need all the dims to complete a particular star, and some of them could be in area 1 which is still in dev. But the principles still apply: as far as possible, follow the operational team development progress. You don’t want to MI something which hasn’t been developed.
As usual I welcome any questions and comments at email@example.com. Vincent 5/3/11.