Data Warehousing, BI and Data Science

13 November 2017

Numerical Attributes

Filed under: Data Architecture,Data Warehousing — Vincent Rainardi @ 7:24 pm

Should we put numerical attributes in the dimension table or in the fact table? Numerical attributes are static attributes of an entity but they are numeric, not string. For example: interest rate of a swap and weight of a product.

I usually follow the textbook approach and put them in the dimension. But for measures and attributes of OTC derivative instruments, we should be putting them in the fact table.

The fact table contains daily (and intraday) valuations, such as price, market value and the “Greeks”. An interest rate swap usually has a fixed leg and a floating leg. For example, receive 0.886% pay EUR 3m + 0.567%. So we will get a fixed interest rate of 0.886% and pay our counterparty the rate of EURIBOR 3m plus 0.567% (currently EURIBOR 3m is -0.329%).

These two attributes (the 0.886% and 0.567%) are part of the interest rate swap contract and are fixed for the duration of the contract (until maturity). When it reaches the maturity date, the contract ends and if we want to extend it, we will normally make a new contract, with a new SEDOL. Hence these 2 attributes are essentially static attributes (they are called receive interest rate and pay interest rate).

Because they are static they should be put into the dimension, which in this case is the security dimension or instrument dimension, rather than the fact table, which in this case is the instrument valuation fact table.

But I recommend putting them into the fact table for 4 reasons:

  1. Incorrect business judgement (a simple “sweep all” rule, and if we are wrong then it’s not too bad)
  2. Bytes
  3. Floating
  4. Calculation

1. Incorrect business judgement

There are many types of OTC derivative instruments, e.g. Interest Rate Swaps, Credit Default Swap, FX Options, Equity Index Swap, CDS Options, Asset Swap, Total Return Swap, etc. There are about 15 of them, each with different numerical attributes. So in there are about 50 or so numerical attributes.

We could sit down with the business and categorise these 50 numerical attributes into two:

  • A measure or fact: meaning they are potentially changing every day
  • A dimension attribute: meaning that they are static and never change

But the business judgement could be wrong. Something which they think is static could be changing. Here are some of those 50 attributes and imagine you sit down with the business categorising them into the two categories above:

Strike price interval, put option premium, strike price, intrinsic value, market value, fair value, time value, put call ratio, put call parity, recovery rate, expiration value, underlying asset value, fixed interest rate, floating interest rate margin, nominal amount, notional value, pay duration, pay/receive spread, pay/receive margin, credit spread.

In addition to the input from the business people, it can be useful to check the source system for historical changes. Does the value of that column/field ever changes? If yes, how often?

If we miscategorise them, there are two things which can happen:
a) A measure is miscategorised as an attribute
b) An attribute is miscategorised as a measure

Generally speaking, b) is fine but a) has quite serious consequences, because a) means that 1 instrument in the instrument dimension can have 100 rows, even 1000 rows. And if we have 300,000 instruments (covering both active positions, and non-active positions), the size of the instrument dimension will be very big, 300 million rows, making it slow.

The reason an instrument can have 1000 rows is that the attribute is actually a measure. We think it is static, but it actually changes every day. Or even several times a day.

On the other hand if an attribute is miscategorised as a measure, that attribute will be put in the fact table. And because it is numeric, it does not take a lot of bytes, even if it is repeated every day.

So to be safe, if you are not sure, put it as a measure in the fact table. The setup of an instrument can take a few weeks, particularly if it is a swap. During this period there is a continuous update on the instrument rows. If we have 20 numerical attributes for a swap, during this few weeks period we could have 20 rows because on day 1 only 5 attributes are setup, on day 2 another 5 is setup, on day 3 another 5, and on day 4 they change the attribute setup on day 1, and so on. So the whole thing can take 3 weeks, producing 20 rows. If it is in the fact table, a) we will only get 1 version a day, and b) we are preventing the instrument dimension from ballooning into many millions of rows.

2. Bytes

Numerical attributes take up much less bytes / disk space compared to textual attributes. Therefore, it does not slow things down if we put numerical attributes in the fact table, compared to the risk of miscategorising them into the instrument dimension, which will cause the dimension to “explode”.

3. Floating

If the attribute value is 0.00352281 and it is a float in the source system, it can be stored as 0.0035228100000000001. And it is possible that the value in the following day is 0.0035228100000000000, just because of the rounding in the floating point processor. Perhaps it is derived from a calculation of A/B. And if it changing from 00001 to 00000 then to 00001 again the next day, we have a phenomenon called “Flip Flop”. Flip Flop is where a dimension attribute keep changing values (usually twice a day) because it is updated from 2 different systems. This causes the number of rows in the dimension to “explode”.

4. Calculation

The last argument is that numerical attributes might be required for calculation, for example column1 divided by column2, or column1 + column2 + column3. If all columns involved in the calculation are stored in the fact table, we will be able to perform the calculation without joining the any dimension. We just need to filter on the partitioning date of the fact table, which is very fast.

Conclusion

I am not advocating to store all numerical attributes in the fact table for all cases. Only in the case of OTC instrument, I would recommend putting the numerical attributes in the fact table, because of the 4 reasons above.

But for most cases, usually the best approach is to put the numerical attributes in the dimension. For example: the dimension of a product (the width, length and height) and the weight of a product. Yes they do change from time to time, but not daily, hence we want to record these changes as SCD type 2 in the product dimension.

Advertisements

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s

Blog at WordPress.com.

%d bloggers like this: