One of the difficult decisions that a data architect needs to make is about the “code decode table”. It is also known as Decode table. It is a 3 column table: entity, code, description. Like this:
Of course, we will need to add a primary key. And, with its advantage and disadvantage, we could normalise the entity column by moving it into another table. I disagree with normalising it, as it causes unnecessary join, more complexity for not much benefit.
And then you can add “systems columns”, such as LastModifiedBy, LastModifiedDateTime, CreatedDateTime (or InsertedTimeStamp), ExpiryDate, etc. So the final form of the Code Decode table is like this:
Another table which is similar to Code Decode table is Grouping table. For example, we can group 200 countries in the world, from Angola to Zanzibar into 3 sales region: EMEA, Asia Pacific, Americas. But we can also group them into 7 distribution region: Europe, Asia ex Japan, North America, Africa, etc. Or into this: Emerging Market, Developed Market, etc. We can group products, we can group currencies, etc.
This week it is probably the 10th times I came across Code Decode table, or Key Value Mapping, and for the first time I think it is a right decision to have a Code Decode table. The main reason is that it enables us to add minor entities without doing any structural change, i.e. we don’t have to create a new table. During the initial 1-2 year of a new system, data architects usually need to add a number of new code decode tables.
This “flexibility” could be a life saver in the initial life of a new system. The ability to add a new entity without creating a new table enables us to grow the system, add new functionality to the system whilst at the same time keeping the design simple.
This table has no place in a dimensional model, of course. Because the description will be an attribute on a dimension table, whereas the code will be removed.
But in a normalised database, I believe (for the first time) that it is useful and we should have it. I know it is against the normalisation principles, but the advantages (as I mentioned above) outweigh the disadvantages.
The disadvantages of having a Code Decode table are:
- Unable to enforce referential integrity (Foreign Keys)
- In the same query we need to do multiple joins to this Code Decode table
- It is more difficult to automatically build maintenance pages
Not being able to enforce referential integrity means that we are risking having an invalid payment method. The valid payment methods are CC, DD and BT; but we could have CA (for Cash) in the transaction table.
The 2nd point impacts performance, when the code decode table becomes large, the query becomes slower. It could reach 5000 rows because instead of updating rows, we may decide to preserve the historical values, so we still have the old description. For most systems it is unnecessary to keep the previous values of the description, but occasionally in a few companies this is required, for example for auditing, legal,or security reasons.
To minimise the performance impact the common solutions are a) use temp tables, and b) use nolock. And they can be implemented together.
Using temp tables means we select different entities from the code decode table (different rows) into different temp tables, then use these temp tables in the main query. Using nolock enables SQL Server to access different parts of the table at the same time. Both result in better query performance.
Conclusion
As a principle, for the first time, I think it is good to have code decode table, and grouping table. The advantage becomes more obvious as we scale out. If there are only 10 entities (small systems), I would prefer to build them as separate tables, but if there are 100 entities (large systems), I would put them into a code decode table.