About 4 years ago I had an idea (well, more of a dream really) of “building a data warehouse at the press of a button”. This means that everything is automated, including the development of the reports/cubes and the development of ETL. I understand that we can build SSIS, SSRS, and SSAS programmatically (and Informatica PowerCenter mappings too). I also understand that it is painfully slow and tedious process. But we can make the process easier by creating a few templates then at run time we choose the nearest match template and modify it.
This works for ETL, reports and cube. But how about tables? Corresponding dim and fact tables could be generated on the fly by analysing the source tables based on its data types: numerical columns become measures and textual columns become attributes. Yet, I admit it is a difficult process, definitely requires human involvement. Still, it’s easier than doing it from scratch. We present the “machine made” fact & dim tables to the designer, who will then “fix” the tables. And the “which source column goes where” would be automatically fed to ETL creation module.
This was I think what Kalido does. But it does not create the cube and reports. It only creates the ETL. If I’m mistaken here I’d be happy to be corrected. I do feel strange that someone automate the DW build but not the BI part.
Like Kalido (and Insource Data Academy), Wherescape RED creates the ETL automatically. It also minimizes the time for designing the warehouse tables. Dimensions can be built by drag-and-drop the source table. SCD type 2 is also automated. RED also builds the cube automatically and the MicroStrategy project too. Documentation is also automated.
This article is about Wherescape RED so I’m going to go into a little bit more detail about how it is done.
In high level, the steps are:
- Load source tables
- Create dimensions
- Create dimension views
- Create staging tables
- Create fact tables
- Create SSAS cubes
- Create MicroStrategy reports
Having used it myself (evaluation copy) I found that it is easy to use. Step by step tutorial with screenshots is provided to it’s easy to do step 1 to 6 above (I have tried number 7 as it requires MicroStrategy extension to be installed – not included in the evaluation copy).
This is how it looks:
Lots of operations are done by drag and drop, and fill in the properties. It’s user friendly, easy to use, and quite simple to understand (note: you need to understand data warehousing of course).
You can use SQL Server, Oracle, DB2 or Teradata for your warehouse platform. These four are very popular platform for data warehousing. In the next release they include Netezza. No RED does not do Informix, Sybase or MySQL data warehouses.
The question that some people have about RED is: “Is it only suitable for simple DWs?” No it isn’t. Whether it’s complex fact table or complex dimension, we can use RED to build it. It can build aggregate fact table, SCD, hierarchy, KPI. We can also modify the generated SQL script if we need to do any custom stuff. You can also define indexes and create SQL agent jobs/scheduler.
The other questions are about the platform: Can it do BO? Informatica? Cognos reports? No, unfortunately. Any reporting tool such as BO/Cognos/RS can be used, but not automatically. You have to build all the reports manually. The ETL is SQL Script (T-SQL in SQL Server, PL/SQL in Oracle), or SSIS package. Yes RED can generate SSIS packages to load the data from source to stage/load tables, but not from stage to fact/dim tables. It doesn’t generate code for any other ETL tool: Ab Initio, OWB, ODI, PowerCenter, etc. Only SSIS.
The main disadvantage of RED (this is what people often ask when evaluating), is that it is ELT. It uses SQL Server T-SQL/Oracle PL-SQL to load data from the load tables to the DW/dim model (SSIS only load to stage/”load tables”). It doesn’t do ETL. Only ELT. Celinio explains the difference between ETL and ELT (link). Rob Davenport from Insource /Data Academy provides excellent white paper explaining advantage/disadvantages of ETL and ELT in great length.
ELT has its own advantages: flexible, easy to maintain programmatically*, simpler, lower cost (use DB server, not separate server). *difficult to maintain manually because it’s lots of SQL code. But of course once RED generates the SSIS package you can add all the transformation you like.
The main advantage of RED is fast development time. It surely cuts a lot of development time. My estimate is that a warehouse which takes 3 months to build using custom development (say using SQL Server + SSIS or Oracle + Informatica), can be built in 1 months using RED. And after it is built, changing or expanding is also quick.
Second advantage is ease of use. It’s a mature product, been there for years, so it’s easy to use. I found by following the tutorial within a day I’m able to use the product to build a DW. Very short training time. Proper ETL tools such as SSIS/Informatica requires weeks of basic training, plus weeks of on the job training (or buy the skills at £400-£500 / day).
Third advantage is once it’s built you can leave RED and continue to do it manually, maintaining the generated tables and ELT scripts or SSIS packages yourself. Expanding it as you need to, and developing the SSIS packages as you need to. And building the reports using your tool of choice, e.g. RS, Cognos or BO.
4th advantage and this is what I like best as I work quite a lot with cube: RED generates the SSAS cube. And RED also builds MicroStrategy project, enabling excellent reporting capability which is very flexible.
In my opinion Wherescape lives up to its reputation. We can use it build a data warehouse in much shorter amount of time. And this, in my experience, is what every BI consultant need. Reports only take a month to build but the DW takes 6 months* to build. If you can cut it down to 2 months it will be a big advantage, money wise and time/PM too. *that’s a small one, a big one is 18 months. The biggest plus point is that we can expand it later on: further develop and customise the SQL Scripts/SSIS package, SSAS cube, MicroStrategy projects for reporting. Unfortunately it does not generate Informatica or SSRS/BO/Cognos, some of the most popular tools in DWBI.
As always I’ll be glad to receive comments or corrections.
Vincent Rainardi 14/11/2011