Data Warehousing and Data Science

28 April 2012

Bank Data Model

Filed under: Business Knowledge — Vincent Rainardi @ 2:33 pm

I created a retail bank data model in April 2012 for my nephew, who is studying Information System, in order to explain to him what a database of a bank might look like. Rather than concentrating to the data type (length of each column, numeric 20 or 16, etc — and the column lengths are different from bank to bank), I decided to concentrate on the data values, which to him is much more valuable, in order for to give him a good view and experience about the banking industry. I made the data myself. Minimalist, only about 5-10 rows per table, but it gives him an idea of what the data could be.

I put the database is in Excel, so that he doesn’t need SQL Server or Oracle to view it.
The link is here: bank-data-model2

It is a normalised model. The aspiration is to do it to 3NF. But sometimes for practical reason I didn’t do it to 3NF (e.g. only 2NF). This is the list of tables:

  1. Customer
  2. Address
  3. Telephone Number
  4. Customer Address
  5. Customer Telephone
  6. Account
  7. Account Holder
  8. Occupation
  9. Transaction
  10. List Lookup
  11. Product
  12. Branch
  13. Employee
  14. Currency
  15. Interest Rate Scheme
  16. Interest Rate
  17. Product Interest Rate
  18. Transaction

For illustration, below are the screenshots of 4 tables: Customer, Account, Interest Rate Scheme, and Interest Rate

Customer Table1

Customer Table2

Account Table

Interest Rate Scheme Table

Interest Rate Table

Transaction Table

10 April 2012

Data Type Changes in the Source System

Filed under: Data Warehousing — Vincent Rainardi @ 6:35 pm
Tags:

If the data type of a column in the source system changes from varchar2(50) to varchar2(100), and they start sending data down that field with phrases over 50 characters long, what would you do from data warehouse point of view?

Are you going to ask the source system to postpone that change?

Next time you design a warehouse, are you going to double the width of all columns? (don’t be ridiculous)

Are you going to stop taking that feed into your warehouse? (don’t be ridiculous)

Or are you going to let the data flow into your warehouse and be truncated at 50 char limit?

Most likely, the source system won’t wait the downstream systems to be ready. Including the data warehouse. Their timing (of going to production) depends on the business needs. If they can do it in 1 month, they will. They won’t wait for the data warehouse or BI system to be ready. Remember that you (data warehouse) may not the only downstream system affected by this change. There may be other downstream systems.

What usually happens is
a) The DW/BI will ask the source system to delay the change until they are ready.
b) If the source system won’t wait, then DW/BI will try to do a “Expedited Change”
c) If DW can’t do expedited change, then the data will be truncated in the warehouse and as long as the business is ready to accept this truncation in their reports/cubes then fine

Therefore it is important for the source system to inform the downstream systems about any changes on the fields that are published. If the field is not published they can do what they like. If the field is published and they change the data type, rename or delete it, then all downstream systems need to sign it off first.

It is best practice that the publication is done using a set of Stored Procedures, and we use VS 2010 Database Project* to detect if that column is published or not, before changing it. *Or other tool with the same functionality, i.e. ERWin, Embarcadero, Enterprise Architect. In VS 2010, if you rename or delete a column, and this column is used by an SP, the DB project won’t compile and you can’t proceed. You need to change the stored proc first before you can deploy this change to the physical DB.

But in many small companies, there is no concept of “publishing data”. DW reads directly (ODBC) from the source system’s tables. The source system does not publish its data. It’s the downstream system which retrieves the data. In this case there’s no shield or intermediate layer. The minute the source system changes, the DW ETL will fail. DW will then raise an emergency fix ticket to get it fixed. But in a small company there is no emergency fix ticket. Developer would fix it straight in production, usually within 2 hours.

Vincent Rainardi, 10/4/2012

Blog at WordPress.com.