Data Warehousing and Data Science

15 November 2009

Using Cubes For Reconciliation

Filed under: Uncategorized — Vincent Rainardi @ 11:19 pm

cube can be used for 4 purposes:
1. Data analysis
2. Reporting
3. Reconcilation
4. Testing

By far the most common use of a cube is data analysis. Data from dimensional data warehouse is loaded into cubes so that business users can browse the cubes to analyse the data. Browsing a cube means slicing and dicing the data to get the break down of the measures by the dimensional attributes. It also means drilling down (viewing the data at a more detail level) and drilling up (viewing the data at a higher level).

The second usage of a cube is for reporting. In this case the cube is accessed by a reporting tool such as SSRS, and displayed in tabular format. ‘cube’ reports like this often contain parameters in the form of dropdown lists. Users use the dropdown lists to specify which slice of the cube to display on the report. For example, if the cube has 3 dimensions (product, date, store), and the report is “monthly sales by product” (i.e. product on the rows, and months on the column), user can select which store to display in the report parameter.

Data analysis and reporting probably cover 99% of the cube usage. Not many companies (and other types of organisations) use OLAP cubes for any other purposes. The idea of using cubes for reconciliation is quite simple: it is to compare 2 systems and identify the differences. To do this, data from both systems are loaded into the cube. In the cube, for each measure that we want to compare we create a calculated measure, which is basically the measure from system B minus the measure from system A.

We then browse the cube. If at the very top level this calculated measure shows say £1 million that is the difference between system A and system B. Users can then drill down to get the break down of this £1 million by whatever dimension attributes they want.

For example, say a retail company needs to migrate data from system A to system B. System A is an old system, and system B is the new system. This is not a straight forward task, but it is a very common scenario. All companies need to change their systems sooner or later. Tables table on system A are mapped to the tables in system B. Each column within each table in system A is mapped to the target column system B. And there are various business rules. And there are numerous exceptions. If this than that, if that then this. But if the condition is so and so, then use this calculation instead. And so on and so forth. After a few months of design and development, the team managed to create the ETL, or data migration application. Say it is in SSIS. The team managed to execute the SSIS package successfully (in Test environment of course) and all data from system A has been migrated to system B.

Now comes the checking. In system A, the total sales from all stores in City X in 2005 was $1,300,000. In system B it was also $1,300,000. Checked. The inventory value of product Y in 2006 was $2,900,000. In system B it was $2,800,000. The $100k difference was expected, because there is a business rule affecting the inventory valuation in 2006. Checked. But the total sales of all products, from all stores worldwide, for all years doesn’t match. It differ by say $1,500,000.  Testers and analysts alike all issuing SQL queries to both system A and system B trying to pin point the root cause. After long hours they found the cause: a price list update in such and such store for such and such product in such and such year. And it goes on and on. All in all, a huge chunk of project time is spent on this activity: Reconciliation. Checking and rechecking. Hundreds of “select from where group by” queries issued agains both source and target systems. Weeks and weeks of valuable time is spent on this Reconciliation.

Using the reconciliation cube I described earlier, the testers will see the top level difference. Then they can drill down on any attribute at any level to find out the break down of that figure. They can then decide whether that difference is expected or not. If it is not expected, they can check the ETL package and correct it. Rerun the package and refresh the cube. After several iterations they can declare that the testing is complete and they are ready to sign off the project. At the migration day, the reconciliation cube is then pointed at the production system. After the ETL package is executed on the production database(s), the reconcilation cube is refreshed and reviewed. This approach is much more effective than issuing hundreds of SQL queries to do it manually.

This approach can also be applied for checking data warehouse load, i.e. to reconcile the data that is already loaded in the warehouse against the data in the source system. There are millions of numbers in the source system and millions of numbers in the warehouse. To be able to state confidently that all numbers match is a pretty tall order. This is especially so if your tool is limited to SQL queries. OLAP cubes, on the other hand, is purposely built to handle millions of numbers. And using a simple calculated measure like B-A (or (B-A)/A to get the percentage) the reconcilation process is so much simpler.

1 Comment »

  1. Thanks, its useful. If has enough space better to do initial reconcile via cubes.

    Comment by San — 16 September 2016 @ 3:57 am | Reply

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

Blog at

%d bloggers like this: