Data Warehousing and Data Science

18 June 2022

Reporting – Stability or Correctness

Filed under: Data Warehousing — Vincent Rainardi @ 6:25 am

There is a benefit in restating last month’s reports using more recent data. There is also a benefit in not changing the content of any report after it is published. In this article I’d like to outline the pros and cons of both approaches.

As always it is better to learn by example. Say on 1st April 2022 you were calculating the carbon footprint of company A as of 31 March 2022 (the last 12 months). You used the 2020 carbon data as that was the latest carbon data available for this company.

One month later, on 2nd May 2022, the 2021 carbon data became available. So you recalculated those 12 numbers (the last 12 months’ carbon footprint, from 30/4/21 to 31/3/22) using this latest carbon data.

We can see that this approach makes sense: we’d want to restate the March report to use the more recent data, to make it more accurate.

But this approach comes with a cost: our trustworthiness. When version 1 of the report came out on 1st April 2022, the numbers were used by many users in the firm. When the 2nd version of the report came out on 2nd May, some of these users didn’t get it, so they kept using the old numbers. Of those who got the new version, some were not sure what had changed. Was it just the carbon data that changed, or something else too?

Suppose you publish this report on the first working day of every month (the last 12 months of carbon data for thousands of companies). Users will then be confused, because on 2nd May you not only published the April report but also restated the March report. And you restated the February, January and December reports too. Say users can access these reports on the intranet using Power BI, and on the report there is a drop-down list to change the “As of” month.

This approach causes confusion amongst the users, because the March numbers keep changing. The February, January and December numbers keep changing too. To prevent that confusion, some companies implement the “freeze after publish” approach. Once the December report is published on 4th Jan 2022, it is frozen. Once the March report is published on 1st April, it is frozen. Its numbers won’t be changed. Even if in April the data that was used to make that March report changes, the numbers in the March report are not restated.
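As a rough illustration of the “freeze after publish” idea, here is a minimal Python sketch. The class name FrozenReportStore, its methods and the figures are all hypothetical, made up for this example. A real warehouse would more likely keep snapshot tables keyed by the “As of” date, but the principle is the same: a report for a given “As of” date can be written once, and later changes to the source data do not touch it.

```python
from datetime import date
from types import MappingProxyType


class FrozenReportStore:
    """Hypothetical in-memory store: each 'as of' date can be published only once."""

    def __init__(self):
        self._reports = {}  # as_of date -> (published_on, read-only numbers)

    def publish(self, as_of, published_on, numbers):
        if as_of in self._reports:
            raise ValueError(f"Report as of {as_of} is frozen and cannot be restated")
        # Copy the numbers and wrap them read-only, so later changes
        # to the source data do not leak into the published report.
        self._reports[as_of] = (published_on, MappingProxyType(dict(numbers)))

    def get(self, as_of):
        return self._reports[as_of][1]


store = FrozenReportStore()
march = date(2022, 3, 31)
source = {"Company A": 1250.0}  # illustrative footprint figure, not real data
store.publish(march, date(2022, 4, 1), source)

# In May the source data changes, but the published March report does not.
source["Company A"] = 1310.0
try:
    store.publish(march, date(2022, 5, 2), source)
    restated = True
except ValueError:
    restated = False

print(store.get(march)["Company A"], restated)
```

The April report would simply be published under its own “As of” date (30 April), using whatever data is the latest at that time.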

This second approach promotes confidence amongst the users. The numbers in all reports are stable, and that is why they are trustworthy. Even though some of the numbers may be wrong now, they were all correct on the day the report was produced: the reports were produced using the best data available at that time.

Which approach is better? The second one, because it creates a stable and trustworthy environment for the users, which takes precedence over correctness. I found that “The report was correct on the day it was produced” was quite an acceptable principle for many users. And a few months later the “incorrectness” is small: only a very small amount of data has changed, so that March report is only slightly incorrect. That’s why it is more important to freeze the report and get stability and trust (and have 99% correctness) than to keep correcting past reports, sacrificing stability and trust to obtain 100% correctness.
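To make “slightly incorrect” concrete, one could compare the frozen numbers against a recalculation using the latest data, without republishing anything. The sketch below is only an assumption about how such a check might look: the function name max_relative_drift and all the figures are made up for illustration.

```python
def max_relative_drift(frozen, recalculated):
    """Largest relative difference between frozen and recalculated values."""
    return max(
        (
            abs(recalculated[name] - value) / abs(value)
            for name, value in frozen.items()
            if name in recalculated and value != 0
        ),
        default=0.0,  # nothing comparable means no measurable drift
    )


frozen_march = {"Company A": 1000.0, "Company B": 400.0}        # made-up figures
recalculated_march = {"Company A": 1010.0, "Company B": 400.0}  # made-up figures

drift = max_relative_drift(frozen_march, recalculated_march)
print(f"{drift:.1%}")
```

If the drift stays small, the frozen report can be left alone. A large drift would be a signal to discuss with the users whether this is one of the cases where correctness matters more.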

Of course you need to be politically correct and ask the users what they want. But if you’ve been to many companies and seen what happens with both approaches, you also have an obligation to inform the users about the consequences of both approaches before they make a decision. And of course, my recommendation above is not universal. You need to look at individual cases one by one, as each case is different. There are cases where being 100% correct is more important than stability.
