Data Warehousing, BI and Data Science

10 April 2012

Data Type Changes in the Source System

Filed under: Data Warehousing — Vincent Rainardi @ 6:35 pm
Tags:

If the data type of a column in the source system changes from varchar2(50) to varchar2(100), and they start sending data down that field with phrases over 50 characters long, what would you do from data warehouse point of view?

Are you going to ask the source system to postpone that change?

Next time you design a warehouse, are you going to double the width of all columns? (don’t be ridiculous)

Are you going to stop taking that feed into your warehouse? (don’t be ridiculous)

Or are you going to let the data flow into your warehouse and be truncated at 50 char limit?

Most likely, the source system won’t wait the downstream systems to be ready. Including the data warehouse. Their timing (of going to production) depends on the business needs. If they can do it in 1 month, they will. They won’t wait for the data warehouse or BI system to be ready. Remember that you (data warehouse) may not the only downstream system affected by this change. There may be other downstream systems.

What usually happens is
a) The DW/BI will ask the source system to delay the change until they are ready.
b) If the source system won’t wait, then DW/BI will try to do a “Expedited Change”
c) If DW can’t do expedited change, then the data will be truncated in the warehouse and as long as the business is ready to accept this truncation in their reports/cubes then fine

Therefore it is important for the source system to inform the downstream systems about any changes on the fields that are published. If the field is not published they can do what they like. If the field is published and they change the data type, rename or delete it, then all downstream systems need to sign it off first.

It is best practice that the publication is done using a set of Stored Procedures, and we use VS 2010 Database Project* to detect if that column is published or not, before changing it. *Or other tool with the same functionality, i.e. ERWin, Embarcadero, Enterprise Architect. In VS 2010, if you rename or delete a column, and this column is used by an SP, the DB project won’t compile and you can’t proceed. You need to change the stored proc first before you can deploy this change to the physical DB.

But in many small companies, there is no concept of “publishing data”. DW reads directly (ODBC) from the source system’s tables. The source system does not publish its data. It’s the downstream system which retrieves the data. In this case there’s no shield or intermediate layer. The minute the source system changes, the DW ETL will fail. DW will then raise an emergency fix ticket to get it fixed. But in a small company there is no emergency fix ticket. Developer would fix it straight in production, usually within 2 hours.

Vincent Rainardi, 10/4/2012

Advertisements

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Create a free website or blog at WordPress.com.

%d bloggers like this: