Data Warehousing and Data Science

17 February 2016

The Problem with Data Quality

Filed under: Analysis Services — Vincent Rainardi @ 8:44 am

The problem with data quality is not the technicality. It is not difficult to design and build a reconciliation system, which checks and validates the numbers in the data warehouse and in BI reports/analytics (or non-BI reports/analytics!).
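To illustrate how small the technical part can be, here is a minimal sketch of a reconciliation check that compares totals from a source system with totals in the warehouse. The figures, the dictionary layout and the `reconcile` function are all hypothetical; a real system would read both sides from databases and apply business-specific tolerances.

```python
# Hypothetical totals per business line: one side from the source system,
# the other from the data warehouse. All figures are made up for illustration.
source_totals = {"equities": 1_250_000.00, "bonds": 830_500.00, "derivatives": 412_300.00}
warehouse_totals = {"equities": 1_250_000.00, "bonds": 829_900.00}


def reconcile(source, target, tolerance=0.01):
    """Return (key, source_value, target_value) for every key where the sides disagree.

    A key missing on either side is reported with None for that side.
    """
    issues = []
    for key in sorted(source.keys() | target.keys()):
        s, t = source.get(key), target.get(key)
        if s is None or t is None or abs(s - t) > tolerance:
            issues.append((key, s, t))
    return issues


issues = reconcile(source_totals, warehouse_totals)
for key, s, t in issues:
    print(f"{key}: source={s} warehouse={t}")
```

Producing the list of issues is the easy part; each entry in it still needs a person to investigate and fix it.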

The problem is, once the reconciliation/validation program spits out hundreds or thousands of issues, who will be correcting them? That is the issue!

It requires someone to investigate the issues and, more importantly, fix them. This requires funding, which can seldom be justified (it is difficult to quantify the benefits), and a combination of skills which rarely exists within one person. So that doubles the money, because we need to hire 2 people. The “checker” who checks the DQ reports is largely an operational application support type of person, whilst the “fixer” needs to have a detective mindset and development skills. To make matters worse, these development skills are usually platform specific, e.g. .NET, Oracle, SSIS or Business Objects.

Assuming a £40k salary in the UK, then adding NI, pension, software, desk, training, bonus, insurance, consumables, appraisal, and payroll costs (a total of £15k), and multiplying by 2 people, it is a £110k/year operation. Adding half the time of a manager (£60k salary + £20k costs, so £40k for half-time), it is a £150k/year operation.
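The arithmetic above can be written out as a small sketch, which makes it easy to plug in your own salary and overhead figures. The numbers are the ones quoted in this post; the function and variable names are made up for illustration.

```python
def annual_cost(salary, overhead, fte=1.0):
    """Yearly cost of a role: (salary + overhead), scaled by the fraction of time spent."""
    return (salary + overhead) * fte


# Checker and fixer: two full-time people at £40k salary + £15k overhead each.
staff_cost = 2 * annual_cost(40_000, 15_000)

# Manager at half time: £60k salary + £20k costs.
manager_cost = annual_cost(60_000, 20_000, fte=0.5)

total_cost = staff_cost + manager_cost

print(f"Staff:   £{staff_cost:,.0f}")
print(f"Manager: £{manager_cost:,.0f}")
print(f"Total:   £{total_cost:,.0f}")
```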

It is usually difficult to show that a data quality program delivers a benefit bigger than £100k, which is already less than the £150k it costs to run. The build cost of a DQ program can be included in the application development cost (e.g. data validation, data reconciliation, automated checks), but the operational cost is an issue.

So the fundamental issue is not actually finding a person, or a team of people. The fundamental issue is actually to get the funding to pay these people. The old adage in IT is usually true: anything is possible in IT, provided we have the funding and the time.

The benefit can’t come from FTE reduction (full-time equivalent, i.e. headcount), because DQ work does not reduce the workload (unless a manual DQ process is already in place, of course). And it doesn’t come from increased sales or revenue either. Try to find a link between better DQ and increased revenue, and you’ll find that it is hard to establish. And we know that headcount reduction and revenue increase are the two major factors for funding an activity within a company.

3 factors that drive data quality work

But fortunately there are 2 other factors that we can use: compliance and risk.

Compliance in the financial services industry, the healthcare industry, etc. requires reports to the regulators in a timely manner and with good accuracy. That drives the data quality work. For example, if we report that the credit derivative loss position is $1.6bn, whereas it is actually $2.1bn, we could be risking a penalty/fine of several million dollars.

Risk: there are other risks apart from compliance, namely credit risk, interest rate risk, counterparty risk, etc. Different industries have different risks of course, with financial services probably having the largest monetary amounts, but they all drive data quality work. If the data quality is low, we risk misstating the risk amount, and that could cost the company a fortune.

The 3rd factor to use is the data warehouse. If your company stores a lot of data in one place, such as a data warehouse, and the data quality is low, then all the investment is wasted. A £600k DW warrants a £50k DQ. And if your DW has been there for 4 years, the actual cost (development + operation) could easily exceed the £1m mark. A general 10% ratio yields a £100k investment in the DQ work.
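As a rough sketch of that rule of thumb, the ratio can be turned into a one-line calculation. The 10% default and the £ figures come from the paragraph above; the function names are hypothetical, and the ratio itself is a rule of thumb rather than a standard.

```python
def dq_budget(dw_cost, ratio_percent=10):
    """Suggested DQ investment as a percentage of the total DW cost."""
    return dw_cost * ratio_percent / 100


def implied_ratio_percent(dw_cost, dq_cost):
    """The percentage of DW cost that a given DQ spend represents."""
    return dq_cost / dw_cost * 100


# A DW whose total cost has reached £1m, at the general 10% ratio.
budget_1m = dq_budget(1_000_000)

# The £600k DW / £50k DQ pairing implies a ratio slightly under 10%.
ratio_600k = implied_ratio_percent(600_000, 50_000)

print(f"DQ budget for a £1m DW: £{budget_1m:,.0f}")
print(f"Implied ratio for £50k on a £600k DW: {ratio_600k:.1f}%")
```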

The key argument to use with regard to the DW’s DQ is “DW usage”. A centralised data store such as a DW is likely to be used across many applications/teams/business lines. Each of these applications and business lines is at risk of having operational issues if the data in the DW is incorrect. And if we don’t monitor the DQ, we can be sure that the data will be incorrect. That is quite an argument for a Data Quality project.

1 Comment »

  1. Sometimes it is hard to convince managers to have dedicated people for DQ. That brings us to the point that BI developers and data stewards need to work together before delivering the project in order to improve the data quality. The most common problem is that the deadlines become hard to meet and the project plan goes completely off track. Morale also drops and some people could get frustrated. The typical solution is to work 2 or 3 extra hours a day.

    Comment by Paul Hernandez — 17 February 2016 @ 10:41 am
