A data warehouse is fed from source systems. A calculation is better performed upstream in the source system than in the data warehouse. This is what I call the principle of “Do It Upstream” in data warehousing.
Why do we need to do it upstream? Why don’t we calculate it in the data warehouse? Because:
a) It is easier and more economical
b) The result can be used by other downstream systems, not only the DW
c) It forms a logical grouping architecturally
For example, a credit risk system A is responsible for calculating various measures, ratios and indicators. Suppose we then need one more ratio, which system A does not yet calculate. It is easier to calculate it in system A because it is already calculating various ratios. The new ratio will be in the same place/program as the other calculations, and it will already have access to the database columns it requires.
What I mean by “logical grouping” in point c) above is that when the data is published/sent to downstream systems, the new ratio can be included in the existing group of ratios and published together with minimal extra effort.
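The credit risk example can be sketched roughly as follows. This is only an illustration: the `Exposure` columns, the ratio names and the `calculate_ratios` function are all hypothetical, not taken from any real system. The point is that the new ratio sits next to the existing ones, reads columns the upstream system already has, and is published in the same record.

```python
from dataclasses import dataclass


@dataclass
class Exposure:
    # Hypothetical input row already available inside system A.
    counterparty: str
    debt: float
    equity: float
    ebit: float
    interest_expense: float


def safe_div(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def calculate_ratios(row):
    """All ratios live in one place, so they are published as one group."""
    return {
        "counterparty": row.counterparty,
        # An existing ratio (hypothetical).
        "debt_to_equity": safe_div(row.debt, row.equity),
        # The "one more ratio": added next to the existing ones, reusing
        # columns that system A can already read.
        "interest_cover": safe_div(row.ebit, row.interest_expense),
    }


record = calculate_ratios(Exposure("ACME", 200.0, 100.0, 50.0, 10.0))
print(record)
```

Every downstream system that consumes `record`, not only the DW, gets the new ratio without any extra work on its side.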
As always, there are exceptions to everything. There are 2 exceptions to this principle:
a) If the calculation involves data from several source/upstream systems, then we have to perform it in the DW, no argument about it.
b) If the calculation is complex and the DW is better equipped to do it, it could be better to perform it in the DW. An example of this is customer classification, e.g. a mobile company that wants to classify its customers based on spending patterns.
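A minimal sketch of both exceptions, assuming a mobile company with two hypothetical source systems (billing and prepaid top-ups). Neither system sees the customer's total spend, so the calculation can only happen where the data meets, in the DW. The threshold-based `classify` function is a deliberately simple stand-in for a real spending-pattern classification, and its thresholds are made up.

```python
from collections import defaultdict

# Hypothetical extracts from two separate source systems.
billing_rows = [  # (customer_id, monthly contract spend)
    ("C1", 45.0),
    ("C2", 10.0),
]
prepaid_rows = [  # (customer_id, top-up amount)
    ("C2", 5.0),
    ("C3", 120.0),
    ("C3", 30.0),
]


def total_spend(*sources):
    """Sum spend per customer across every source system."""
    totals = defaultdict(float)
    for rows in sources:
        for customer_id, amount in rows:
            totals[customer_id] += amount
    return dict(totals)


def classify(spend, low=20.0, high=100.0):
    """Assign a spending band; the thresholds are illustrative only."""
    if spend >= high:
        return "high"
    if spend >= low:
        return "medium"
    return "low"


totals = total_spend(billing_rows, prepaid_rows)
segments = {cid: classify(amount) for cid, amount in totals.items()}
print(segments)
```

Customer C2 shows why this cannot be done upstream: the spend is split across both systems, and only the combined figure gives the right band.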
It is usually a hard fight to ask an upstream system to do the work for a downstream system. “Why should I?” and “What do I get from this?” will have to be answered in their favour before the management of the upstream system will agree to do it.
And why should they, indeed? Think about it from their point of view: you are asking them to do some work, out of their own budget, which produces no benefit to them. No one in their right mind would agree to that. This is why, in reality, the work of adding an extra calculation falls under the DW project umbrella. “You need it, you do it”.
There are ways to deal with these politics. These 2 are the usual ones:
a) Ask somebody in a higher position who oversees both of you. Present the case of “This company will benefit $x if we do this upstream”. As always, when dealing with C-level people, just “show them the money”. Anybody in their right mind will follow the money.
b) Pay them. Cross-charge the activity. It is totally funded by you, and therefore (important!) controlled by you.
Fighting with Yourself
This is a much harder fight. You yourself want the work to fall into your hands. As a DW Manager, you want this work to be done in the DW. You will get more budget (obviously, it would be foolish not to ask for an increase in money) and you will get more resources (time and people), meaning more power. In a project under tight constraints this could be a welcome relief. Everybody under you (Mr. Architect, Mr. Analyst, Mr. Developer, Mr. Tester) would prefer the work to be done in the DW, for the obvious reason that it benefits them.
There are 2 usual ways to look at this:
a) By giving the work to the upstream system you gain some control over them, and with it the possibility of promotion to a position overseeing both departments.
b) You build a closer relationship with the CEO (as you have to present the case to him, see the politics above).
Not sure why I wrote about politics and management today. Probably because it just flows naturally from the “Upstream” topic. Probably because of the hot summer weather (32 °C today in London!). I think I had better stay on the technical side, dealing with the “how to make it happen”. It’s simpler and less of a headache 🙂
Vincent Rainardi, 27/6/2011