Data Warehousing and Data Science

31 May 2013

Seven Methods of Data Integration

Filed under: Data Warehousing — Vincent Rainardi @ 6:48 pm

If you have to read from 2 systems and produce an integral view of the data from both systems, what options do you have to integrate the data?

1. One of course, is data warehousing, i.e. we ETL the data in. We bring the data from both source systems into our DW, and setup the BI on top of the DW. This is the traditional methods.

2. The second method, is similar to the first one, but not a data warehouse. It’s an ODS. It does not have surrogate key, dimensions, and fact tables. It has tables for reference data and transaction data. If you have 2 instances of POS systems (Point of Sale), the ODS basically is the union of the two.

3. The third method is data virtualisation. This method doesn’t create a copy of the data. We query the data in the source systems on the fly, integrate them on the fly, and present the data to a BI tool. Composite is one tool that does this. Of course, the issue here is performance. A query which joins 15 tables from 3 system could take a hour, and we would be famous amongst the source system admins for grinding their system to a halt. Hence there is technique of caching the data. So we extract incrementally, and cache the data. When the BI tool asks for the data, we retrieve from cache.

4. The fourth method is using OLAP cubes. Basically we get data from source systems as files and we load the files into OLAP cubes, i.e. SSAS, QlikView, Spotfire, TM1. So the data goes from files straight into OLAP system (BI), without being stored in a relational database first. The integration happens in the cubes.

5. The fifth method is using views. If both source systems are on Oracle, we can setup a Database Link. If both are on SQL Server, we can setup a Linked Server. This Linked Server enable us to query tables located in different servers. Now we can create views which join tables from different servers/systems. The BI tool queries these views and get the integrated data. The data integration basically happens in the views.

6. The sixth method is also in BI tool, like number 4 above. But rather than in the cube, the integration happens in the report. The report sources the data from different systems. All good reporting tools these days have some kind of metadata layer. BO has Universes. Cognos has Framework Manager. SSRS has Data Source View. In this metadata layer we integrate data from multiple source system. To the report, the data coming from the metadata layer is integrated.

7. The seventh method is to bring the data from the tables in system 2 into system 1. So that when we query the table in system 1, we get data from both systems. The data integration happens in the source system.

Those are the seven methods of integrating data from multiple source systems.

Blog at