Data Platform and Data Science

25 September 2018

Data Warehouse for Asset Management

Filed under: Data Warehousing — Vincent Rainardi @ 6:50 am

In this article I would like to illustrate what a data warehouse for an asset management could look like. This article consists of 3 sections:

  • Summary
  • Fact Tables
  • Dimension Tables

In the Summary section I will explain the main fact and dimension tables, along with the potential issues. In the Fact Tables and Dimension Tables sections I will explain them in more detail.

A. Summary

A.1. Fact Tables

The main fact tables in an asset management warehouse are position, performance, risk and transaction. In the next 4 paragraphs I’ll explain what they are briefly. Later on in the article I will be explaining them in more detail.

The position fact table (aka valuation fact table) contains the daily holdings and benchmark constituents for every portfolio, both security/instrument positions and cash positions. The main issue with this fact table is the matching between the holding/portfolio positions and benchmark positions. We have the option of not attaching the benchmark, which makes it flexible particularly for portfolios having more than one benchmarks. The position fact table is mainly used to provide portfolio breakdown, and top N holdings. The other issue is lookthrough.

The performance fact table contains the performance returns of every portfolios compared with their benchmarks, for many different periods e.g. 3M, 6M, 1Y, 3Y. There is also another type of performance fact table, which contains the performance attribution to different factors such as yield curve, sector allocation and security selection. Performance can also be allocated to different countries, currencies and asset classes, each compared to the benchmark. Please see my article here.

Because of the non additive nature of risk numbers, there are many risk fact tables. They mainly hold the risk sensitivities such as interest rate sensitivities (PV01, DV01), inflation sensitivities (IE01), credit rate sensitivities (CR01) and exchange rate sensitivities. The risk fact table also stores the VAR data (value at risk) at portfolio level and at “carved-up” level (for example, industry sector, currency and asset class)

Transaction fact table stores every purchase, sale and corporate action in every portfolio, including the fees and charges. The transaction status, which changes at every stage on different dates, can be stored on an accumulated snapshot fact table. Whereas the trades data can be stored on a “transaction” fact table (as opposed to snapshot fact table).

Other business areas requiring fact tables are:

  • Fund administration e.g. redemptions, subscriptions, distribution payments
  • Liquidity management e.g. cash ladder (projected cash)
  • Collateral management and counterparty exposure
  • ESG data (Environment, Social, Governance)

A.2. Dimensions

In an asset/investment management DW, the following dimensions are commonly found:

  1. Security dimension (aka Instrument dimension or Asset dimension) which contains 3 types of securities: held securities, benchmark securities and watchlist securities.
    Note: cash is a financial instrument but not a security. Only equity, fixed income and money market are financial securities, including derivatives (refer to The Securities Exchange Act, section 3 part 10)
  2. Issuer dimension (aka company dimension) containing a list of public and private companies who issue shares or bonds, had an OTC trade/deal with us, or has ESG data.
  3. Client dimension (aka customer dimension) containing a list of clients from the CMS such as Salesforce, with their usual attributes such as client type, Mifid category, industry sector, and address.
  4. Portfolio dimension (aka account dimension, product dimension or fund dimension) which contains a list of open funds, segregated account/mandate and trust funds that the company runs. Portfolio dimension should contain share classes too.
  5. Benchmark dimension, containing a list of benchmarks which the portfolios are measured against, including blended benchmarks. The “pairing” between portfolios and benchmarks is not done in this dimension.
  6. Country dimension covers two country code schemes (ISO 2 chars) and (ISO 3 chars) and 2 to 3 country name columns (one for each OMS/IMS, plus ISO).
  7. Currency dimension is pretty static. A country column is not useful here, as the currency name already includes the country and some currencies are used by several countries. It is usually a good practice to update country and currency dimensions manually.
  8. Rating dimension. Covers Fitch, Moody’s, and S&P. Each agency is put on different set of rows, with an “agency” column. This data structure is known as a “stacked” dimension. Also covers “major” rating and “notched” rating as two different attributes/columns (AA is a major and AA+ is a notched). Also covers cash ratings (Moody’s), which is quite different from credit rating, possibly “watch” rating and “outlook” too.
  9. Industry Sector dimension. Covers at least ICB and GICS, along with major providers such as ML, Barclays, Bloomberg. Some are 4 levels deep, some 7 levels. Unlike rating, don’t try to do translation between providers here, so stack them up with a “provider” column, then level 1 to level 7 columns.
  10. Asset class dimension (aka asset type, instrument type, asset group) which contains a list like: equity, fixed income, cash, property, commodity as level 1. Level 2 contains the breakdown for example for fixed income the level 2s are bond, floating rate note, CDS, bond future. For bond the level 3s are callable/puttable or government/corporate.
  11. Performance return basis dimension: contains the fee basis (gross or net of fees), cumulative or annualised, arithmetic or geometric, pricing method (single or dual pricing), etc. We should model them as a junk dimension because most attributes only have 2-3 values. I don’t recommend the performance period to be put here (e.g. 3 month, 1 year, 5 years), best to put them as columns in the performance return fact table.
  12. Broker dimension: stores all brokers we use in trading. Used by the transaction fact tables.
  13. And finally, date dimension.

A.3. Issues

The usual issues/dilemmas with designing the dimensions are:

  1. Do we include country of risk/incorporation, currency, industry sector, etc. in the security dimension? Absolutely, as type 2 attributes. Plus we also have them as separate dimensions in their own right.
  2. How do you model derivative instruments which has very different attributes? By creating Option, Swaps, Futures and CDS as separate dimensions. Don’t separate the OTC Derivatives (Over The Counter, e.g. options, swaps, CDS) from ETDs (Exchange Traded Derivatives, e.g. futures, forwards). Don’t create too many dimensions for derivatives, but group them up. 4 or 5 dimensions sounds about right. Don’t worry too much if certain attributes are duplicated between the main security dimension and the derivative instrument dimension, it’s ok.
  3. Do we put cash in the security dimension or not? Don’t. Only real financial instruments should be in the security dimension (including CD, Repo, OTC and ETD), but leave settled and unsettled cash out (including collateral cash for margin requirements) See my article here for more info: link. Cash in different currencies should be put directly in the position fact table, clearly labelled as settled or unsettled as many portfolio management decisions hang on this label.

B. Details – Fact Tables

B.1. Position Fact Tables

Position fact tables store the valuation for each position from the back office accounting system, including adjustments. The valuation is daily, but the one at the month end is continuously adjusted / restated until a few weeks later.

For example, every business day we could be receiving:

  1. Daily position valuation from the OMS system
  2. Daily position valuation from the accounting system
  3. Today’s version of the last month end position valuation from the accounting system

The valuation is not only on the securities that we hold, but also the cash. There are a few things we need to be aware here, with regards to the validity of our position fact table:

  1. The cash amount is dependent on the settlement status of the transactions for that day.
    For example, this morning we had £1m cash, and this afternoon we bought £300k worth of equities and bonds. So what’s our cash position at the end of today? Answer: Settled cash = £700k, unsettled cash = £300k.
  2. The securities we hold is affected by the trade status.
    For example, today the portfolio manager put an order to buy £200k of bond X. If the status of the order is “executed by the broker” (it’s called “filled orders”) then we will have that security in today’s position. But if the status of the order is “not yet executed by the broker” then we don’t have that security in today’s position. There are many stages of an order/trade including “draft/created”, “verified”, “authorised”, “compliance checked”, “placed with/sent to the broker”, “confirmed by the broker”, “filled/executed by the broker”, “allocated”, “settled”, and “closed”. The position is created when the order is “executed”.
  3. The value of an equity position is the number of shares we hold x the previous day price of that equity (close of business price, e.g. when the exchange ended trading – the exact hour depends on which country the exchange is located). Like wise of bonds and any exchange traded securities. But for OTC derivative positions like options and swaps, the valuation requires some calculations to determine the “fair” market value, e.g. Black-Scholes model for options (this discipline is generally called “derivative pricing”). This is because OTC derivatives are not traded on exchanges, but a contract between two parties.
  4. Some of the positions could be valued in-house, and some positions could be valued by an external company to whom we outsource our back office operation. In this case we may need to merge two sources to get the holistic view.

Design of Position Fact Tables:

  • Name: Fact Holding, Fact Valuation or Fact Position. “Holding” restricts us from including benchmark positions. “Valuation” restricts us from including fixed income analytics. So in my opinion “Position” is the best name.
  • Grain: 1 row for each position in every portfolio and benchmark per day.
  • Portfolio and benchmark: two separate position fact tables would cause all sort of painful issues when joining, so put into 1 fact table but join on the front end OMS (Charles River, Aladdin, Simcorp, ThinkFolio). Do not join portfolio and benchmark positions in the DW. This is probably the most important point in this article.
  • Don’t use “stacked” fact table. A stacked fact table is where portfolio positions and benchmark positions are put as separate rows in the same fact table. This leads to all sorts of performance issue because that fact table needs to be joined to itself causing locking and blocking.
  • Front end and back end: put data from back end (State Street, Fund Accounting, Northern Trust) and front end (CRD, Aladdin, TF) in two separate position fact tables. The former contains accounting/finance valuations and the latter contains daily trade-by-trade valuations.
  • Daily and month end: put daily positions and month end positions in a separate fact tables. The daily position fact table only has 1 date key column (valuation date), and the month end position fact table has 2 date key columns (valuation date and as at date), because the month end position is restated in the first few business days. This business day is the as at date.
  • Surrogate keys: date key (called reporting date), portfolio key or benchmark key, security key. Those are SKs which make up the grain. Other SKs are: currency key, client key, issuer key.
  • History: stores 5 years of daily snapshot.
  • Partition: partitioned on reporting date.
  • Indexing: index on date key, portfolio key and security key. Also index on other SK columns.
  • Restated month end valuation: stored in a separate fact table.
  • Measures: valuations of each position in 3 currencies (portfolio, base and local) and their implied FX rates, fixed income analytics such as durations, yields and spreads, maturities (in years, 4 dp), risk measures such as position level IE01, PV01 and DV01.
    Note: Analytics numbers are from front end OMS. Fund accounting/finance/back office only provides valuation numbers.

Loading: daily file from back office outsourcing/finance and daily export from OMS.
If the valuation of OTC derivative positions is conducted in house rather than outsourced, this needs to be merged into the back office data.

Usage: correct security key is vital as this fact table is the main source for reporting aggregated positions (portfolio breakdown) by currency, asset class, country, industry sector, rating, etc. which is required by both clients and regulator. Mostly usage are only interested in the month end, but settled positions. Some requires executed (unsettled) positions.

Specifics:

  • Lookthrough: store the positions without a lookthrough. A lookthrough position is a fund held in the portfolio, usually our own funds. In a portfolio, in addition to holding cash, equity or fixed income position, we also hold funds. We should not “expand” the “content” of this fund. Instead we should store the value of the whole fund. If we “expand”, the valuation of other positions become incorrect, particularly cash positions.
  • Unsettled cash positions: in the Position fact table, separate unsettled cash positions from settled cash. The separation and calculation of the unsettled cash should be done within fund accounting, not within the DW.
  • Mid day vs end of day valuations: always take the end of day valuations, even if we do the valuations in house (e.g. for options and swaps).

B.2. Performance Fact Tables

The Portfolio Performance is calculated in the back office system such as Stat Pro, Barclays Point (now Bloomberg Port), MSCI BarraOne, or outsourced to back office provider such as State Street or Northern Trust (Passport).

Design:

  • Name: Fact Performance Return (to differentiate from Performance Attribution).
  • Grain: 1 row for each share class, per month, per return type (gross or net return), per return calculation (arithmetic or geometric return), per data source, per row type (portfolio/benchmark/excess return), per cumulative type (annualised or cumulative), per price/total type (price return or total return which is price + income).
  • Surrogate keys: reporting date key, account key (1 account for each share class), benchmark key, currency key, return type key, return calculation key, data source key
  • Degenerate dimensions: performance row type (P for portfolio, B for benchmark, E excess return = R for relative return = A for active return), cumulative type (A for annualised, C for cumulative), performance total/price type (P for Price Only Returns, T for Total Return which is Price + Income), return calculation (A for arithmetic, G for geometric).
    Alternatively we can also put all these DD columns in a junk dimension.
  • Measures: 1M, QTD, 3M, 6M, YTD, 1Y, 3Y, 5Y, 10Y (M = month, Q = Quarter, Y = Year). Some asset managers also have 2Y and 15Y.
  • Loading: from StatPro/Simcorp/Aladdin output or files from the back office provider such as NT or State Street.

Specifics:

  • Some have MTD (month to date), which stores the daily intra-month returns. To store MTD numbers we will need to update the MTD column every day (unlike other measures which are an insert). We update the MTD column for the rows/portfolios that requires daily intra-month update.
  • A popular alternative design is to create a period dimension, containing 1M to 10Y, effectively making this fact table a vertical fact table. I prefer to have a separate measure column for each period, because the first thing we do when we have a vertical performance fact table is to pivot it so that the 1 column of numbers becomes 1 row of numbers. Also because the number of periods are relatively fixed.
  • Performance numbers have to be signed off. Before the sign off happens, the data is already available actually, and can be used for reporting, but there a (small) risk that the data is incorrect – which in some cases is acceptable. So we need to add a DD column (or part of the junk dimension) indicating the sign off status, e.g. preliminary or signed off.
  • Another attribute which should be added to this fact table is the sign off date (as a date, not a date key).
  • Some asset managers calculate and store the performance of their funds on daily basis rather than month end only. For example, on Monday the 1Y performance of Fund1 is 12.34% and the 3M is 3.88%. On Tuesday the 1Y is 12.57% and the 3M is 3.74%. And so on. So every day we calculate and store the performance of all funds for all period. For this I recommend creating a separate daily snapshot fact table, leaving the monthly performance fact table intact, still to be loaded once a month. If the users require stable, signed off performance numbers, use the monthly fact table. If the users require quick and dirty data, use a daily performance fact table.

Performance Attribution Fact Table

The active return or excess return or relative return is the return of the portfolio minus the return of the benchmark. This active return can be attributed/allocated to several factors, i.e. yield curve, duration, security selection, currency, credit, carry and sector allocation. This is called performance attribution data. For example, a relative return of 6% can be attributed to:

  • 2% because of duration
  • 1% because of security selection
  • 2% because of sector allocation
  • 1% because of carry (FX rate differences)

Design:

  • Name: Fact Performance Attribution
  • Grain: 1 row per portfolio, per reporting date.
  • Surrogate keys: reporting date key, portfolio key, benchmark key.
    Additionally: currency key.
  • Measures: yield curve, duration, security selection, credit, carry, sector allocation, and total.
  • Loading: from performance attribution system such as StatPro/Point/Barra output or files from the back office providers such as NT or State Street.

B.3. Risk Fact Tables

The risk fact tables store the sensitivities numbers and value at risk. They also stores the results of “stress test” on multiple scenarios, such as European sovereign default, global economic recession, oil price fluctuation, tightened credit markets, liquidity squeeze, exchange rate fluctuation, Brexit scenarios, raise of US interest rates.

I’ll start with sensitivities fact table, the VAR fact table, then stress test fact table. Note: IE01 reflects how sensitive the portfolio is to inflation. PV01 reflects how sensitive the portfolio is to interest rates.

Sensitivities fact table:

  • Description: combine IE01, PV01 and DV01 in one fact table, storing them as different measures.
  • Name: Fact Portfolio Sensitivities
  • Grain: 1 row per portfolio, per day.
  • Surrogate keys: calculation date key (when the sensitivities were calculated), portfolio key
  • Measures: IE01, PV01, DV01. Some asset managers also store CR01 and IR01.
    (IR01 is more for interest rate swap, whereas DV01 is for bonds. Both are interest rate sensitivity, but IR01 refers to Libor whereas DV01 revers to the Treasury).
  • Loading: from risk system such as IBM Algo Risk, MSCI BarraOne, MSCI RiskMetrics, Simcorp. Daily load of the output files.
  • Note: IE01, PV01 and DV01 also calculated at position level. Put them in the position fact table. Keep the ones in this sensitivities fact table at fund/portfolio level.

Value At Risk fact table:

  • Description: stores value at risk (VAR) at portfolio level, and at sector and country levels. Note that VAR measure is not additive, so we can’t aggregate the VAR of all European countries to calculate the VAR for Europe. So we need to store the VAR at country level and at region level. Ditto for the sector. A VAR measure depends on the confidence level (i.e. 95%, 99%, 99.9%, etc.) and the time horizon (i.e. 1 day, 10 days, etc.) For example: on 20th Nov 2018, the one day VAR of this portfolio is $1 million with a confidence level of 99%. Which means that the possibility of this portfolio losing $1m or more in one day is 1%. Whereas “10-day VAR with a confidence level of 95% is $4m” means: the possibility of losing $4m in 10 days is 5%.
  • Name: Fact Portfolio Value At Risk
  • Grain: 1 row for each sector/country, per portfolio, per day, per horizon, per confidence level.
  • Surrogate keys: calculation date key, portfolio key, sector/country key
  • Measure: because the time horizon is fixed at 1 day and 10 days, and the confidence levels are fixed at 99.9%, 99% and 95%. I prefer to have the time horizons and confidence levels as columns so we have 6 VAR measures / 6 columns like this: VAR_1D_999, VAR_1D_99, VAR_1D_95, VAR_10D_999, VAR_10D_99, VAR_10_95. This saves us from pivoting the result all the time.
  • Rows: the rows in this fact table are one of the following 5 types:
    1. Portfolio Level
    2. Country Level
    3. Region Level
    4. Industry Level (we pick 1 scheme, such as Merrill Lynch or Barclays)
    5. Sector Level
  • Note: the tricky thing here is storing the sector or geographical segments because the measure is non-aggregatable. In the country dimension, we do have SKs for each country, but we don’t have SKs for the regions. Should we create rows for Asia, Latin America, Europe, etc. in the country dimension? No. We must not change the grain of a dimension, by adding higher level rows. The best way I find is to create a separate (new) VAR Segment dimension, which has these columns: Segment Scheme (value: Country, Region, Industry or Sector), Segment L1, Segment L2, Segment L3. The rows with Segment Scheme = Country have the L1 populated with country, L2 with region, L3 blank. The rows with Segment Scheme = Region have the L1 populated with region, L2 and L3 blank. The rows with Segment Scheme populated with Industry have the L1 populated with Industry, L2 with Sector, L3 with Sector Group.

Why not separating the VAR fact table as 3 tables? 1 for portfolio level, 1 for country (and region) level and 1 for industry (and sector) level? Because they have the same measures, and because they are not aggregatable anyway, so we can’t sum the rows. So might as well put them all in one table.

One note with regards to ML sector or Barclays sector, or GICS sectors, etc.: it is not a problem obtaining the classification scheme i.e. how different industries are grouped into sectors and super sectors. The problem is to get the data of 20,000 companies in the world, each of them is allocated into a sector. Is that company manufacturing or retail? Obtaining this data is the issue. Not just for 100 or 1000 companies, but for 20,000 public companies in the world, that is the issue. So in reality we will use the industrial classification used by our security data provider, i.e. if we are sourcing the securities data from Bloomberg then we use the industrial classification provided by Bloomberg. If we source the securities data from Thomson Reuters or Factset, then we use the industry data from them.

This is part of security/instrument mastering process. I mean attaching the ML, GICS (MSCI and S&P) or Barclays sector into each security. And attaching ratings into each security. And attaching country of risk and country of incorporation into each security (and country of domicile).

We have discussed the VAR on industry sector and on geography (countries). How about the carving-up of the VAR based on the asset classes and ratings? These two are not as common as the industry and countries, but it is possible to run them pass the risk system and get an output. If this is the case in your company, then we put them into the same VAR fact table above, with the VAR Segment Key column pointing to rows in the VAR Segment dimension for asset classes (such as IRS, CDS, bond and equity) and ratings (such as AA-, BBB+, etc.)

B.4. Transaction Fact Table

Background: Internal and external orders

There are two types of order: the one which the fund managers put in to our trader (this is called “internal-facing”) and the one which our trader put out to our brokers (this is called “external-facing”). Within our company, it is possible two fund managers (or more) put an order for the same security (on the same day or thereabout), and our trader combines them into one external order. After the order is executed, the trader allocates the security into two portfolios. The opposite can also happens, i.e. one order from a fund manager is very big so the trader decided to split it into two smaller external orders and put them to separate brokers to get better price. After the orders are executed, the trader combines the two external orders and put them both to the same portfolio belonging to that fund manager.

The transaction fact table here stores the internally-facing orders. Not the externally facing orders.

Background: order lifecycle

  • A member of the desk (e.g. a financial analyst, or the fund manager) sends an email to a trader asking them to buy or sell an exchange traded security, or make an OTC derivative deal. He specifies the quantity and the price boundary (a range of price where the desk is prepared to accept) and the timing (must be same day, or can it be next day, or even day after next). If is a bond, he specifies the issue, i.e. the maturity and coupon. If it is equity, they specify the class (e.g. Alphabet class A or C, with or without voting rights), the ADR (for US), HK or Shanghai (for Chinese equity). If it’s a derivative they specify the terms of the deal, e.g. reference rate, maturity, etc.
  • A trader enter the order into their blotter, recording the security (e.g. ISIN/SEDOL), maturity, coupon, dates, class, etc. and to which portfolio this order is for. The trader enter the order on the Order Management System (OMS), aka Investment Management System (IMS) and the order status is Draft.
  • When the order is created, various compliance checks automatically run on the OMS for that portfolio, and if this order breaches one of the rules (be it mandate or regulatory), OMS will block this order and it cannot continue to the next stage.
  • If the order passes the compliance check, the manager of that fund then get notified automatically, asking them to approve this order (known formally as “authorise”). The fund manager checks the impact of this order to her/his portfolio e.g. the valuations, the risks, and the portfolio characteristics such as duration, and then authorise the order.
  • In the mean time the trader tries to find the best price by comparing quotes from several brokers. If it is an OTC deal, the risk system automatically checks the counterparty risk level of the broker. If the risk level is breached, this order is blocked and can’t continue.
  • Once the order is authorised, and the risk + compliance systems are both green, the trader go ahead with one of the brokers (or two) and place an order with them. At this point the status of the order becomes “sent”. There is a new MiFID regulation with regards to best execution, i.e. an asset management company must execute every order to the best of the client’s interest, not the company’s interest. This means finding the best price amongst the brokers and this is reported to the regulator.
  • The broker then sends the order to the exchange, to find a match on the opposite side, i.e. if it is a sell order they find a buy side, and vice versa. Once a match is found and agreed the order becomes a trade, and the order status becomes “filled”. The exchange sends the trade details to the broker i.e. who the other side is, and at what price the trade happened. The exchange also sends the trade details to the custodian i.e. the bank who holds the securities on behalf of the asset manager (usually also does settlements and administration such as dividend and coupon payments).
    Note that trade means both sides whereas order means one side.
  • The broker informs our trader that the trade has been executed, along with the trade details such as price, who the opposite side is and the settlement date. This is called “confirmation”. The broker confirms all the trade details with the trader. The broker on the other side also confirm the trade details with their client. Any incorrect details would be corrected at this stage.
  • Two days after the trade was made, we settle the trade, known as T+2 (some securities are T+1 or T+3). If we are buying, our back office will transfer the money to the clearing house, and the clearing house sends the security to our custodian. If we are selling, our custodian will send the security to the clearing house and our back office will receive the payment. Often the custodian bank offers settlement services, which means that, instead of the back office making and receiving the payments, it is the custodian bank who is making and receiving payments to and from the clearing house. After this stage, the status of the order becomes “settled”.
  • Then the securities is allocated into the relevant portfolios (if we are buying, and the trader combines several internal orders into 1 external order). At this point the status of the order is “allocated”. But, the new security was put in the portfolio way before “allocation”. The security is put as “holding” in the portfolio (and therefore affects the value of the portfolio), when the order status is “filled”.
  • After the order is allocated, the order is closed. The order status at that point becomes “closed”.
  • Before an order is executed (order status = “filled”), it can be cancelled and the order status becomes “cancelled”. After the order is filled, the order can’t be cancelled as both sides have committed*. If the broker made a mistake, e.g. the buying the wrong security, then the broker pay for the wrong security out of their pocket (which they will sell later) and then buy the correct security for us the AMC.
    *Note: The exchange usually allows cancellation, as long as both sides agree. But the opposite side of the trade rarely agree to the request to cancel the trade. Correction, yes (at confirmation stage) but cancelling, usually no.

Ok that’s a lot of background. Now let’s specify the design of the fact table.

Transaction Fact Table

  • Description: stores every order for every portfolio. In the transaction fact table we do not store the other side of the trade. We only store our side of the trade. We store the quantity, price, value and the dates of every stage.
  • Name: Fact Transaction
  • Grain: 1 row for every order
  • Fact Table Type: accumulated snapshot
  • Surrogate Keys: Transaction Key (aka Order Key), Order Status Key, Broker Key, Trader Key, Portfolio Key, Security Key, Exchange Key, Authoriser Key, Order Created Date Key, Authorised Date Key, Sent Date Key, Filled Date Key, Confirmation Date Key, Settlement Date Key, Cancellation Date Key, Closed Date Key.
  • Degenerate Dimension: Order ID (internal), Order Type (B for Buy and S for Sell), Trade ID (external)
  • Measures: Quantity, Trade Price, Order Value, Traded Yield, Traded Spread
  • Loading: once a day from the “trade file” exported by the OMS at the end of the day (about 11pm London time, after the US market is closed). Just like position file. This trade file is loaded into the data warehouse straight after it is produced by the OMS. All orders which are not “closed” are “loaded” into this fact table: new orders are inserted, existing orders are updated.

Note: Bear in mind that 2 internal orders (2 order IDs) can be combined into 1 external trade (1 trade ID). And 1 internal order (1 order ID) can be split into 2 external trades (2 trade IDs), each going to different brokers, each has different opposite side. The grain of this fact table is 1 row per internal order in the OMS. If 2 orders are combined into 1 trade, 2 rows will have the same trade ID. If an order is split into 2 trades, only the trade ID of the primary trade is stored here on that row (the largest trade, by value).

Unlike in the retail industry (supermarkets, shops) where an order usually contain multiple items, in OMS an order usually contains only one security.

External Trade Fact Table

Now that we know what a transaction fact table is, we can create an external trade fact table which stores externally facing trades.

  • Name: Fact External Trade
  • Grain: 1 row for each trade
  • Fact Table Type: Accumulated Snapshot
  • Surrogate Keys: Trade Key, Trade Date Key, Exchange Key, Broker Key, Trade Type Key, Security Key, Counterparty Key, Confirmation Date Key, Settlement Date Key.
  • Degenerate Dimension: Trade ID
  • Measures: Quantity, Trade Price, Market Value
  • Loading: once a day from the OMS export

C. Dimensions

C.1. Security Dimension

A security is a financial instrument, such as shares/stocks of a company. The followings are also securities:

  • Bond: an instrument where we get paid coupon (interest), usually every 6 months
  • Option: an agreement to buy or sell shares or bonds (or currencies, or indexes)
  • Future: an agreement to buy or sell bonds or commodities in the future
  • Forward: like Future but for currencies (including NDF*)
  • Swap: an agreement to exchange income from a security
  • Foreign Exchange (FX): an agreement to buy a currency
  • Credit Default Swap (CDS): an agreement to get compensation if a bond is defaulted (the issuer can’t pay the coupon/interest)
  • Loan: an agreement to borrow money with interest
  • Securitised: a bond backed by a collection of loans or mortgages
  • Certificate of Deposit: a certificate from a bank which indicates that we have deposited a large sum of money in that bank

Note: the above is called “asset class”.

*Non-Deliverable Forward is for emerging market currencies which can’t be brought outside the country. At maturity (usually 1 to 12 months) the 2 counterparties settle (in USD) the difference between the NDF FX rate and the spot FX rate. It is called Non Deliverable because the notional amount is not exchanged, only the difference is settled.

The followings are not a security:

  • Cash, i.e. money in different currencies (but an option to buy money in other currency at an agreed rate is a security)
  • Account/deposit, i.e. money that we put to a broker or a bank e.g. for margin trading or settling trades.
  • Real estate, i.e. property such as a house or a land
  • Commodity, such as oil, cotton, iron and copper (but the a future on commodity is a security)

In the broadest sense securities can be grouped into two: bond or equity. Bond is formally known as “debt”, to include other fixed income securities outside bonds, such as derivatives (options, swaps and forward).

A security can be listed in an exchange. For example: a share can be listed in a stock exchange, a future can be listed in a future exchange. A security can be negotiated just directly with an investment bank (they are called the “sell side”, we are called the “buy side”) and this type is called OTC, stands for over-the-counter, meaning that we make a deal outside the exchange. OTCs are mostly swaps and options, but an equity and future can also be bought outside the exchange (directly from the seller, which is mostly banks). This attribute is called security type, i.e. listed or OTC. Listed security is also called ETS, stands for Exchange Traded Security, or ETD, stands for Exchange Traded Derivatives (if it is a derivative).

Security Master Application

An asset/investment management company usually has an instrument mastering system, aka Security Master Application. This application enables the creation of a security (financial instrument*) in real time whenever the following event occurs (or before hand):

  1. A benchmark contains a new security (not already exists in the security master table).
  2. An order contains a new security.
  3. An OTC deal/trade is made.
  4. A file from ESG vendors contains a new security

*Note: cash is a financial instrument but not a security (ditto deposits).
Asset is everything that is held in a portfolio, including cash, deposits and derivatives.

Security Master

The Security Master application stores every security used in the company. Equity, bonds, options, futures, swaps, loans, certificate of deposit, treasury, money market, ETF, everything. But not funds, either external funds or internal pooled funds. Old securities which have matured are also stored here. Every security held in every portfolio is stored here. Every security held in any benchmark used by the company is stored here. Every security resulting from any over-the-counter trade is stored here (e.g. IRS, CDS, FX option). Every ISIN or Sedol mentioned in any ESG file from any ESG vendors (e.g. Bloomberg, Factset, ICOS, EY, Nasdaq, Thomson Reuters, FTSE Russell) is also stored here. The ESG data is actually per company, not per security. But, next to the company name, the vendor put a security identifier such as primary ISIN or Sedol of that company (usually the common equity). But if the company is not an equity issuer then they put the fixed income identifier/ISIN.

The primary source of a Security Master application is the Bloomberg security universe. There are about 650,000 securities there, but an investment/asset management company usually only request the actively traded ones, which is about 67,000. This way, when fund managers are considering to buy a new equity or bond which is not in the index, they will find it in the OMS. Because the OMS takes the security from the Security Master and the Security Master has every actively traded security, be it shares, bonds, treasury, listed options or listed futures.

Of course the Bloomberg security universe does not contain OTC derivatives*, because an OTC derivative is a security created by two companies making a deal between them to buy and sell an option or a swap. They are not listed in any exchange. Therefore the OTC system needs to connect to the Security Master.
*Note: are there OTCs which are not derivatives? Yes, an equity can be bought outside an exchange.

The main purpose of a Security Master app is to enrich the “skeleton” security with various attributes (usually from Bloomberg) such as security name, country of risk, country of incorporation, country of domicile, currency, rating, issuer, exchange, and industry sectors, along with various identifier such as ISIN, Sedol, Cusip and Ticker. Security Master also have screens/pages where the users can manually alter the attributes of a security. This is usually done by the data management team or the operations team.

Security Dimension

The ideal source of a security is the security master application where the creation, integration and enrichment happens, but if it is not available (usually in the case of a small asset manager), then security dimension becomes the creation and integration point. Creation point means that new securities such as OTC are created in the security dimension (skeleton only) and then enriched by making a “per sec” call to Bloomberg to get the remaining attributes. I have to emphasise that a security dimension is not the right place for mastering security in the company, because a) the mastering process requires a lot of manual checking, and potentially manual entries, b) it requires judgements with regards to what value we want to allocate to security, when there are conflicting values coming from different data sources.

Dimension Design:

  • Dimension Name: Dim Security (alternatively Dim Instrument)
  • Dimension Type: SCD Type 2
  • Grain: 1 row for each version of a security
  • Business Key: Security ID (from Security Master)
  • Surrogate Key: Security Key
  • Description: contains listed and OTC securities but not cash, funds or deposits. This is a “core and custom” dimensions, i.e. Dim Security as the core dimension contains about 40-50 common attributes across asset class and Dim Security Equity/Fixed Income/Swap/Option/CDS as custom dimensions covering attributes specific to those asset classes.
    Note: The “core and custom” dimensions is also known as “supertype and subtype” dimension, see Jeremy Kashel’s explanation here on Adatis blog and a short description here on Kimball group website. Also see the Learn Data Modelling post on this here and Oracle Academy presentation here.
  • Number of rows: about 500,000 to 2m (30-60,000 distinct securities, 10-20 version each)
  • Number of columns: about 40-50
  • Attributes: Security ID (from Security Master), Security Name, Description, ISIN, Sedol, Cusip, BBG ID (Bloomberg), Ticker (Bloomberg), FSYM ID (Factset), RED ID (Markit), RIC (Thomson Reuters), Master Issue ID (Moody’s), Exchange, Issuer, Currency, Country (Risk, Domicile, Incorporation), Market, Industry (GICS, ICB, ML, BBG, Barclays), Issue Date, Security Type, Maturity Date, Underlying Issuer, Asset Type/Class/Group.
  • Security attributes not included: Series, Tenor, Reference Name (these are CDS attributes), callable, call dates, puttable, put dates (these are Option attributes), Nominal, Principal, Coupon, Debt Type, Convertible Flag, Perpetual Flag, Securitisation, Zero Coupon, Sinking Fund, Credit Ratings, Reference Yield Curve (these are fixed income attributes). These attributes are stored in Dim Security Fixed Income, Dim Security Equity, Dim Security Swap, Dim Security Option and Dim Security CDS
  • System columns: inserted datetime, last updated datetime, latest flag.
  • Loading: from security master. Rows in the security master is already deduplicated, meaning that it’s 1 row per security. So we can take an export of the security master, and apply the SCD type 2 process on the attribute that we like to keep the historical value.

Notes:

  • Dimension Name: should it be called instrument dimension or security dimension? Either one is fine. Financial instrument includes cash and deposit, whereas security doesn’t. Is oil a financial instrument? No. Is it a security? No. Commodities like cotton, wheat, soya beans, iron ore, copper and oil are not financial instruments or securities. But the options on them are financial instruments and also securities.
    Gold is not a financial instrument. A loan is a financial instrument, because it is tradable. An FX option is a financial instrument because it is a trade (a deal) between 2 parties. Property is not a financial instrument.
    The definitive definition of a financial instrument is given by the Securities Act (link) and the IMF: link.
  • Security Name is a type 1 attribute (treat changes as corrections)
  • The same CDS is issued by several banks, for example a 5 year CDS on a Tesco bond can be issued by JP Morgan and Citi Group. Tesco is the reference entity, whereas JPM and Citi are the issuer. So the “issuer” of a CDS is different from the issuer of a bond or shares.
  • Licence on identifiers: to use the proprietary security identifier listed above (FSYM, BBG ID, RIC, etc) we need to pay a licence fee. Ditto Cusip (link). We can’t store them without paying first. But not ISIN and Sedol, as far as I know.
  • Ticker: Each vendor has their own ticker, e.g. Bloomberg tickers are different from Factset tickers, and are different from Reuters tickers. They are all licenced (not free). For example: The ticker for PT Telekomunikasi Indonesia are TKLM:IJ (Bloomberg) and TLKM:JK (Reuters). The Exchange Ticker code (known as EPIC or TIDM in UK) is free to use as far as I know. For example: NYSE: T, LSE: LLOY, ETR: BMW, NASDAQ: AMZN and IDX: TKLM.
  • The shares of a company can be traded on different exchanges under different ISINs. For example, RBS is traded in London (ISIN: GB00B7T77214) and New York (ISIN: US7800976893, as ADR).
  • OTC swaps such as IRS are stored in the security dimension as one row, not 2 rows. Most of the attributes are stored in Dim_Security_Swaps. Only the core attributes are stored in Dim_Security. In Dim_Security_Swaps IRSes are stored as 2 rows: 1 for the fixed leg and 1 for the floating leg. It is possible that both floating, in the case of multi-currencies swap.

C.2. Issuer Dimension

The issuer dimension stores four things:

  • All public companies which issue shares or bonds.
  • All central governments, municipal governments and government related bodies/agencies which issue bonds.
  • All the investment banks which issue credit default swaps (CDS) and certificate of deposit (CD).
  • All companies which have ESG data (both public and private)

Dimension Design:

  • Dimension Name: Dim Issuer
  • Dimension Type: type 2
  • Grain: 1 row for each version of company, government, or agency
  • Business Key: Bloomberg Company ID, or Factset Entity ID, or Markit RED ID, or LEI, or DUNS ID.
  • Surrogate Key: Issuer Key
  • Attributes: Issuer Code (one of the business keys above), issuer name (legal name) – this is type 1, issuer short name, several entity identifiers (BBCID, RED, FSYM, LEI, DUNS) – this is type 1, several primary security identifiers (primary ISIN, primary SEDOL, primary CUSIP, primary ticker, primary exchange) – this is type 2, issuer type (public company, private company, central government, municipal government, or government agency) – this is type 2, headquarter address, telephone, contact person, country of risk (primary country of operation), country of domicile, country of incorporation (all countries are stored as 2 character ISO codes, not as surrogate keys), parent company ID (this is the business key e.g. BBG CID, not the surrogate key), industry sector (several columns, i.e. GICS, ICB, SIC, NAICS, ISIC, ML, BBG, Barclays).
  • Attributes not included: CEO (this is stored in the Executive Dimension because it date dependent), market cap (this is stored in a fact table because it is date dependent), number of employees (date dependent – should be stored in a fact table)
  • System columns: inserted datetime, last updated datetime, latest flag.
  • Loading: a daily export of the issuer table from the security master get loaded into Dim Issuer, with SCD type 2 process.

Notes:

  • Name: we should not call this dimension Dim Company because the governments are not companies.
    Dim Entity or Dim Legal Entity is possible but it is not popular nor clear.
  • Other entity identifiers are listed on Alacra (link). Most of them are either sectoral (e.g. only banks) or geographical (e.g. only Japan).
  • The main challenge in populating this dimension is to make sure that each company is only inserted once. This matching process (to identify if a company already exists) thankfully does not happen here in the issuer dimension, but in the security master system.
    The security master system, after creating a new security, it populates the issuer code and underlying issuer code for that new security. At this time, it tries to find that issuer in its internal issuer table, using the natural key (e.g. RED, FSYM or BBG CID). Because issuers are already deduplicated in the security master, we in the DW team just need to consume it.
    But the ESG data sources will have companies not in the “investment” data space. They need to be added into the security master, and come out into the DW.

C.3. Client Dimension

A client is from whom we receive a mandate and money to invest on their behalf. It could be a segregated mandate, which means that we manage their money separately, i.e. in a separate portfolio. It could be a pooled fund, which means that we put their money into one of our existing fund, and manage it together with the money from other clients. In this context, our clients are companies or government bodies.

The client dimension is usually built from Salesforce which stores the detail of every single client and prospect. Most of the asset managers that I know use Salesforce for their CRM (Salesforce is used by many other industry too, e.g. banking and insurance).

Many asset managers are institutional, e.g. they only have corporate clients. In this case the client dimension will contain only companies and governments/government bodies. Some asset managers are retail, e.g. they have individual investors as well as corporate clients. In this case the client dimension will contain companies, government/government bodies and individual people. Companies have different attributes to people. Companies have different attributes to governments too, but not as different as people’s attributes, i.e. companies and governments and government bodies/agencies can be put into one table.

The question is, should we create two client dimensions, one for company/government and one for individual people? Yes we should. The retail client have different requirements, different attributes and different processes to the corporate clients. And they access different funds too.

Within corporate clients, we have a few major categories: insurance clients, pension funds, consultants, corporate investors, and government/sovereign clients.

Insurance clients usually request the following Solvency II related reports: Pillar 3 QRT (Quantitative Reporting Templates), Tripartite template, capital charge calculation with risk pillar attribution, and capital charge analysis.

Whereas pension fund clients usually only needs the following (insurance clients need these too): asset valuation (holding level and higher level aggregations, e.g. sector, rating, duration), performance attribution, performance return. All of the above are comparing the portfolio to the benchmark. Plus they also need market review, market outlook and trades report.

Dimension Design:

  • Dimension Name: Dim Client
  • Dimension Type: type 2
  • Grain: 1 row for each version of a client
  • Business Key: Salesforce Account Number
  • Attributes: Client Name, Address (including post code and country), Billing Address (if different), Telephone, Website, Main Contact Person, Main Email Address, Account Manager, Industry, Client Type, Client Sector, Account Source, Parent Company ID, Date became client, Date of first mandate.
  • System columns: the usual 3.
  • Data source: from Salesforce.

If we are administering our own retail funds, we need to create a subscriber dimension, which contains the individual investors. We need to install and use a fund administrator software, and the subscribe dimension is populated from the export of the fund administrator system. If the fund administration is outsourced to a third party, we get our subscriber data from the third party.

Dimension Design:

  • Dimension name: Dim Subscriber (or Dim Investor)
  • Dimension Type: type 2
  • Grain: 1 row for each version of a subscriber
  • Business Key: Subscriber
  • Surrogate Key: Subscriber Key
  • Attributes: Subscriber name, Address, Date of birth, Email address, Telephone number, Date of first subscription.
  • Data source: fund administration system

It is rare that we need to create this Dim Subscriber because we are not dealing with the day-to-day subscription (people putting money in) or redemption (people pulling their money out), or income distribution (allocating the dividends from the company in the fund to the subscribers). The day to day task of administering a fund is usually outsourced. If we do manage the day to day fund administration, then we also need to create a fact table for subscriptions and redemptions, and any other movement of money within the funds.

C.4. Portfolio Dimension

In layman term, a portfolio is a collection of shares or bonds that we manage as one entity. In addition shares and bonds, we can also have derivatives such as options, futures, swaps, FX, CDS, securitised, CD, etc (see section C1 above where I explained about various securities), and cash.

A typical equity portfolio consists of 30 to 60 shares occupying 98% of the fund, with the remaining 2% as cash. It is open to the public, meaning that individual investor can in that fund. The fund could be available in different share classes, i.e.

  • In different currencies such as USD, EUR and GBP
  • With different distribution type, i.e. accumulation or income. Accumulation means the dividend from the shares is reinvested into the fund. Income means the dividend from the shares is distributed to investors.
  • For different target inventors, i.e. institutional or retail investor. One share class might have minimum investment of £100 because it is for retail investor, whereas another share class might have minimum investment of £100,000 because it is for institutional investor.
  • With different charges, i.e. one share class having Initial Charge of 5% and Ongoing Charges Figure (OCF) of 1%, whereas another share class has Initial Charge of 4% and OCF of 1.5%. OCF includes expenses that we pay for administrator, accountant, auditor, legal, depositary and registrar.
    A depositary bank (with an o in the US, depository) holds the shares in the fund and distribute the dividends from the shares. It is also known as custodian bank or fiduciary.
    A registrar keeps a register of shareholders of the fund (including AML and KYC) and produce capital account report and distributor position report. For a fund the correct term is “subscriber”, for a unit trust it is “shareholder”. AML stands for Anti Money Laundering, to check that the money from the subscriber is from valid source. KYC stands for Know Your Customer, to check the identity of the subscriber to make sure they are who they say they are, and investigate if they are politically connected or domiciled in restricted countries.
  • With different performance fees, i.e. one share class may charge a performance fee, whereas the other share class doesn’t charge a performance fee. For example the investor pays 10% of the amount by which the NAV of the fund exceeds the FTSE All Share index.
  • ISA compliant, meaning whether individual investors can keep this fund in their Individual Saving Account (ISA) or not.
  • SIPP compliant, meaning whether individual investors can keep this fund in their Self Investing Pension Plan account (SIPP) or not.

A fund can be open ended, meaning that when there is a new subscriber they issue new shares (or unit of shares, to be precise) and when a subscriber sells their holding their shares are sold (called “redemption”). Or it can be a closed ended fund, meaning that the number of shares is fixed, and the fund is listed in the stock exchange so investors can buy and sell it.

A fund has specific objective, which is to invest in certain markets, in certain financial instrument. For example, only equity or only bonds (or both). Only in Europe or only in Asia (or global). Only small companies or only big companies (or both). Only emerging market or only developed market (or both). Only in bio technology companies only in financial services companies, or only in mining companies (or all industry sectors).

A fund has a benchmark, for example to outperform FTSE 100 GBP Total Return Index. Total Return means that the dividend is reinvested back into the index. FTSE 100 is the 100 largest companies listed in London stock exchange.

So that is one end of the spectrum: a portfolio which is a fund. The other end of the spectrum is a portfolio which is a segregated mandate. We receive a mandate (instruction) from the pension fund trustee of Supermarket ABC to invest £20 million of their money in a certain way, for example: 80% of it must be invested as fixed income in the US and European markets separately, we must not take any short position and we must not use derivative other than FX options to hedge currency risk. We create a client account called Supermarket ABC, and under this account we create a portfolio called “Supermarket ABC US Fixed Income” and “Supermarket ABC European Fixed Income”.

Now that we know the difference between a portfolio, a fund, a mandate and an account, we are ready to design the portfolio dimension. But one more thing: some asset manager called a portfolio as a “product”, because that’s what it is known as in the sales and marketing department. In my opinion this is incorrect because a product can consists of several portfolios.

Dimension Design:

  • Dimension Name: Dim Portfolio (Dim Account, Dim Fund, Dim Product are not correct names, due to the reasons I described above)
  • Dimension Type: type 2, particularly on portfolio manager, status, valuation point, distribution date, depository because we need to record the previous values. Not on inception date, portfolio name, description, etc – they are type 1 attributes because the changes means correction.
  • Grain: 1 row for each version of the portfolio.
  • Business Key: Portfolio Code
  • Surrogate Key: Portfolio Key
  • Attributes: Portfolio Name, Portfolio Description, ISIN, Client Code (not Client Key that’s snowflaking), Legal Structure (ICVC, OEIC, Unit Trust, etc.), Portfolio Type (Fund, Sub Fund, Segregated or Shell/Composite) Main Asset Class (Equity, Debt or both), Distribution Type (Accumulation or Income), Parent Portfolio Code, Main Share Class, Base Currency, Portfolio Manager (or Fund Manager), Status (Active, Soft Close or Closed), Inception Date, Closed Date, Income Frequency, Depositary Bank, Valuation Point (e.g. daily at 12pm), Main Benchmark (a better source of benchmark is the performance fact table, as a portfolio can have more than one benchmark).
  • Attributes which should not be included: number of holdings, fund size, Sharpe, Alpha, Beta and yield because they are date dependent so they should be stored in fact tables. Also Ex Dividend Date and Distribution Date because it could be twice a year.
  • Data Source: daily export from a portfolio master data system, either an in-house built application or a part of the Investment Management System.

Arguments for a Share Class Dimension

A portfolio can be a fund or it can be a segregated account/mandate. If the portfolio is a fund, the fund is offered to the public in many different variations. Each variation is called a “share class”. For example:

  • It can be offered in various currencies, e.g. in USD, in EUR or in GBP.
  • There are 2 choices with regards to dividend (or coupon if it is a fixed income fund). The dividend can be reinvested back in the fund, or the dividend can be distributed. The former is called “accumulation” share class, and the latter is called “income” share class.
  • Whether a performance fee is charged or not. For example, if the performance of the fund is over the benchmark, then the investor pays 20% of the outperformance, e.g. if the fund performance is 7% and the benchmark is 5%, investor pay 20% of this 2% = 0.4%.
  • Different minimum investment amount or minimum top up amount / regular saving value.
  • Different initial charges and on-going charges
  • Whether the broker commission is included in the fees or separate. The former is called “inclusive” share class, whereas the latter is called “unbundled” share class. Generally speaking the “inclusive” share classes have higher management charges but have higher loyalty bonuses.
  • Some share classes can be included in ISA or SIPP, and some can’t.

A fund can be not public. It is called “pooled funds”. We can put the client’s money into a completely separate account. This is called segregated accounts. Or we can put money from several clients into the same account which has the same purpose, for example: to counter inflation risk, or to counter interest rate risk. This is called a pooled fund. This approach lower the cost of operation, and therefore lower the charges to the clients. A pooled fund can have several share classes, such as different currencies. Although the number of share classes in a pooled fund is not as many as the number of share classes in a public fund (generally speaking).

The purpose of having a share class dimension is primarily to support the performance fact table. This is because the performance can be calculated on a “net of fees” basis, therefore different share classes with different charges will have different performance.

One important question is whether we should put the share classes in the portfolio dimension or not. The right answer is yes we should. Because share classes are portfolios. Therefore they should be create in the portfolio dimension. Share classes should not be created as a separate dimension. So the portfolio dimension should contain: public funds, pooled funds, segregated mandate and share classes.

C.5. Benchmark Dimension

The benchmark dimension stores the names and descriptions of the indices from index providers such as Rimes, MSCI and Barclays, but not the constituents (the index constituents are stored in the benchmark position fact table).

It also stores the names and descriptions of the “blended benchmark”, which is a combination of several indices. For example, a global fund which is 40% equity and 60%% credit can have a benchmark of 40% MSCI World and 60% of Barclays Global Credit.

A benchmark is not always an index of equity or credit. A benchmark can also be an interest rate such as 6 months LIBOR and EURIBOR. This is especially true for money market funds and absolute return funds.

So a benchmark can be one of these 3 things:

  • An index
  • A combination of indices (blended)
  • An interest rate

Dimension Design:

  • Dimension Name: Dim Benchmark
  • Dimension Type: type 2 on benchmark status, region (treat changes as genuine and store historical values). Type 1 on benchmark name, benchmark description, benchmark type, base currency, date of issue, issuer, asset class (treat changes as correction).
  • Grain: 1 row for each version of the benchmark
  • Surrogate Key: Benchmark Key
  • Business Key: Benchmark Code
  • Attributes: benchmark code, benchmark name, benchmark descirption, benchmark type (index, blended, interest rate), base currency, benchmark status, issuer (e.g. Rimes, MSCI, Barclays), issue date, expiry date, asset class (e.g. credit, equity, cash, etc.), region (e.g. World, US, Europe, Asia, Emerging Market, etc.),
  • Attributes not included: average rating (this is calculated monthly and stored in an “output table”), performance return of the benchmark (this is stored monthly in the performance fact table), number of constituents (this is calculated monthly from the benchmark position fact table and stored in an “output table”).
  • Data Source: index provider such as Rimes, MSCI and Barclays. They give daily files which we need to load into the Investment Management System and the data warehouse (benchmark position fact table and benchmark dimension).

C.6. Country Dimension

It seems obvious that a data warehouse for an asset manager should have a country dimension but sometimes this could be the very last dimension built because it is not required until later.

In the Security Dimension for each instrument we have 3 country attributes: country of risk (operation), country of domicile and country of incorporation (registration). They are stored as two character ISO country codes (star schema). We don’t store the country keys there (snowflaking). So what is the point of having a country dimension? Where else do we use country?

We use country in the position fact table (both holding and benchmark). We also use country in the ESG fact tables, such as event fact table, director fact table, carbon footprint, climate change, pollution, water scarcity, and many other fact tables.

The country dimension is also used to translate the 2 character ISO country code in the Security Dimension I mentioned above.

This dimension table contains just one attribute: the country name. Well it also contains the 3 character ISO country code but the main attribute is the country name. The issue is, the country name could be different for each system. The country name on the Investment Management System (the investment desks) might be different for client reporting. So we need to create several attributes (columns) for country name.

Dimension Design:

  • Dimension Name: Dim Country
  • Dimension Type: Type 1 (override the country name with the new value)
  • Grain: 1 row per country
  • Business Key: ISO Country Code (2 characters)
  • Surrogate Key: Country Key
  • Attributes: ISO Country Code (3 characters), Country Name (IMS), Country Name (Client Reporting), Short Country Name (Client Reporting), Country Name (ESG).
  • Attributes which should not be there: Currency, Capital, Time Zone, Dialling Code, Main Language. We don’t need these attributes in the Country Dimension, either for Client Reporting or for investment desks. Creating them in an asset management data warehouse is a waste of time. If it is for governmental organisation or NGO data warehouse they might be required, but not for investment warehouse.
  • Data Source: initially created from ISO 3166. When operational it is updated manually using a SQL script, either to insert a new row or to update the country name.

Notes:

  1. “European Union” and “Antarctica” are entries in the country dimension which is legally not a country. Others are: Great Britain, England, Wales, Scotland, Cocos Island,
  2. “The” and “Republic” prefix is often disputed between department, particularly for reporting. For example: “The Republic or Armenia” or “Republic of Armenia” or “Armenia”. Most project settled for the last one, i.e. without “The”, without “Republic”, without “of”.
    Other examples are: The Netherland, The Republic of Moldova, The Turks and Caicos Islands, and The French Southern Territories, The Sudan and The United Arab Emirates. In all these cases, the word “The” are omitted.
    The word “the” is not omitted if it is not at the beginning of the country name. For example: “South Georgia and the South Sandwich Islands” and “Saint Vincent and the Grenadines”.
  3. “East Timor” or “Timor Leste” or “Timor-Leste” (with a hyphen)
  4. Country name should be nvarchar to cater for unicode. For example: “Saint Barthélemy” (note the acute) and “Curaçao” (note the c with cedilla). I prefer to not use diacritics but client reporting requirement could dictate otherwise.
  5. The country dimension should include non sovereign countries such as Faroe Islands, Gibraltar, Greenland, Hong Kong and Guernsey. Please refer to ISO 3166 (link) for a complete list. Their sovereign are: Denmark, UK, Denmark, China and UK, respectively.
  6. Ivory Coast: the official country name is “Côte d’Ivoire” (with a circumflex), but the English name of the country is Ivory Coast so that is the entry in the country dimension.
  7. “South Korea” or “The Republic of Korea” (the latter is the legal name but many systems are using the former)
  8. The need for the short country name attribute for reporting arises to cater for long official country names such as “The United Kingdom of Great Britain and Northern Ireland” which is usually shortened as “United Kingdom”, “The United States of America” which is usually shortened as “USA”, and “People’s Republic of China” which is usually shortened as “China”.
  9. The list of countries in the country dimension is not a political list. Its main purpose is to have all possible countries that the IMS and ESG system have, and for client/regulatory reporting. Hence it contains non UN member states such as Palestine, Taiwan and Vatican City.
  10. Korea: There are two countries. The country names commonly used are “Republic of Korea” (South Korea) and “Democratic People’s Republic of Korea” (North Korea), omitting the word “The” as per the convention (see point 2).
  11. Congo: There are two countries. The country names commonly used are “Congo” and “Democratic Republic of the Congo”, omitting the word “The” as per the convention (see point 2).
  12. Virgin Islands: There are two countries. The country names commonly used for the country dimension are “British Virgin Islands” and “United States Virgin Islands”. They are a bit different to the ISO entries which are: “Virgin Islands (British)” and “Virgin Islands (U.S.)”.

C.7. Currency Dimension

The currency dimensions stores the currency code (business key) and the currency name.

The currency dimension is used by the position/holding fact table (both the portfolio side and the benchmark side), the transaction fact table and the portfolio valuation fact table. It is also used by the performance fact table, the value-at-risk fact table and several ESG fact table such as executive compensations. In these fact tables the currencies are stored as currency keys.

In the dimension tables, the currency must be stored as 3 letters code (e.g. GBP, USD, EUR), not the currency key as this would be snowflaking.

The currency dimension also contains the metal currencies and supranational currencies. The metal currencies are used for metal such as gold (XAU) and silver (XAG). The supranational currencies are currencies used by multiple countries, such as the Euro, the East Caribbean dollar (XCD), the CFP franc (XPF), and the CFA franc (XOF and XAF).

Crypto currencies are also listed in the currency dimension. For example: BTC (Bitcoin), ETH (Ethereum), LTC (Litecoin), NMC (Namecoin), XRP (Ripple)

Dimension Design:

  • Dimension Name: Dim Currency
  • Dimension Type: Type 1 (override the currency name with the new value)
  • Grain: 1 row per currency
  • Business Key: ISO Currency Code (3 characters)
  • Surrogate Key: Currency Key
  • Attributes: Currency Name, Short Country Name (for Reporting), Currency Symbol (for Reporting), Date Valid From, Date Valid Until.
  • Attributes which should not be there: Country (one currency can be used by multiple countries), Old Currency Name (old currencies have different codes and listed on different rows, see point 5 below).
  • Data Source: initially created from ISO 4217. When operational it is updated manually using SQL script, either to insert a new row or to update the currency name.

Notes:

  1. The data type for the currency name should be VARCHAR(50). The current longest currency name is “Bosnia and Herzegovina Convertible Mark” which is 39 characters.
  2. ISO is strictly 3 letter characters so VARCHAR(3) is the correct data type for the currency code column. But to accommodate crypto currencies such as DASH, STRAT, LOOM, TREZOR, ERC20, DOGE we should make it VARCHAR(10). Particularly if the investment company / hedge fund is in the crypto business.
  3. If you opted for VARCHAR(3) in #2, the unknown row (currency key 0) has the 3 letters currency code of “XXX”, not “UNK”.
  4. Old and new currencies should be stored in different rows with the same country code in the first 2 characters. For example: Mexico: MXP (until 1993) and MXN (current), Russia: RUR (1992-1997) and RUB (current), Sudan: SDD (1992-2007) and SDG (current), Argentine: ARP (1983-1985), ARA (1985-1991) and ARS (current).
    Other examples are the predecessors of Euro, such as BEF (Belgian Franc), FRF (French Franc), ITL (Italian Lira), DEM (German Mark) and GRD (Greek Drachma).
  5. The Currency Symbol column is vital for reporting, e.g. £, $, €.
  6. China: CHY is used in the mainland (stand for Chinese Yuan), whereas CNH is used in overseas (the H stands for Hong Kong). RMB is the unofficial name for the mainland currency, it stands for Renminbi.
  7. The 4 most widely used currencies are USD, EUR, GBP and JPY. After that the most widely used are AUD, CAD, CHF and NZD. When people says major currencies they usually means these top 8. Number 8 is debatable between NZD, NOK and SEK. For fund’s base currency NZD is more popular. In forex trading NOK and SEK are traded more than NZD. In forex trading CNH is also a major currency. In some asset managers, “major” currencies mean the top six, i.e. the top 4 plus AUD and CAD.

C.8. Rating Dimension

Rating dimension stores the credit ratings from the 3 major agencies, i.e. Standard & Poor’s, Moody’s and Fitch. S&P and Fitch are using the same codes e.g. AA- and BB+ (except for default) but Moody’s codes are different e.g. Aa1 and Baa2.

On the one hand it seems that we need to establish a mapping between them. For example Moody’s Aa1 is equivalent to S&P/Fitch AA+. On the other hand the Cs and Ds are not really mapable, not to mention WR, SD, RD and short term ratings. Based on my experience we do not need this mapping relationship between the 3 agencies, because in the position/holding fact table we need to store the ratings as a junk dimension.

We don’t store the rating of each bond in the security dimension. Instead, we store them in the position fact table (as a junk dimension rather individual rating key, to minimise join). This is because when we need rating data, it is always for a specific date (usually a month end date).

Let me explain the last sentence because the concept is extremely important for the design of this dimension:

  1. We do not need to know that A1 in Moody’s is equivalent to A+ in S&P and A+ in Fitch.
    (and there is no one-to-one mapping for WR, SD, RD, Ca, C, DDD, DD any way)
  2. What we need to know is: what are the Moody’s, S&P and Fitch ratings for bond X on 30 Sep 2022.
    The Moody’s rating could be A1 but S&P rating could be A, not A+ which is the equivalent of A1. And Fitch rating could be A- which different to Moody’s and S&P.

So for 2) we store the ratings for bond X on 30 Sep 2022 in the position fact table, like this:
Date | Security | S&P | Moody’s | Fitch
30/09/2022 | Bond X | A1 | A | A-

Of course when storing in the fact table we should translate them into surrogate keys. We could store each of the 3 ratings individually like this:
Date_Key | Security_Key | Rating_SnP_Key | Rating_Moodys_Key | Rating_Fitch_Key
20220930 | 12345 | 5 | 6 | 4

Using a junk dimension

But in the position fact table we don’t just store 3 ratings. We store something like 20 ratings, which are the combination between those 3 agencies, and the “House” rating. House rating is the internal rating that we (the asset manager) give ourselves, based on certain algorithm which is accepted across all desks. The House rating is also known as the “Manager” rating.

The combination ratings are something like this:

  • The highest of S&P, Moody’s and Fitch
  • The lowest of S&P, Moody’s and Fitch
  • The average of S&P, Moody’s and Fitch
  • The highest of S&P and Moody’s only (don’t use Fitch)
  • The lowest of S&P and Moody’s only (don’t use Fitch)
  • The second highest of S&P, Moody’s and Fitch
  • Use the S&P rating, and if the S&P rating is blank use the Moody’s rating.
  • For asset class A, B and C use the second highest of S&P, Moody’s and Fitch, and for other asset classes use the average of S&P, Moody’s and Fitch.
  • The highest of S&P, Moody’s and Fitch, and if none of them is available use the house rating.
  • Use the house rating if it exists. If it doesn’t exist use the average of S&P, Moody’s and Fitch.
  • For asset class A and B use the highest rating of the underlying instrument (e.g. options and CDS).

The house view should be maintained by the risk department, based on the rule agreed by all desks. The process largely automatic, but there are a few cases where ratings need to be manually determined, for example CDS. Or where they need to override the automatic rating calculation for a specific bond due to the in-house decision/view.

Because we are not only storing 3 ratings (S&P, Moody’s, Fitch) but a total of about 20 ratings as explained above, a practical way to store the ratings of the holdings is using a junk dimension. In the source table (holdings) we do a select distinct of those 20 columns. Out of many thousands holding positions, this select distinct will only give us a few hundred rows. We give every single one of these rows a surrogate key, and replace those 20 columns with this surrogate key. That’s how in the fact table we only have 1 SK column which represents the 20 rating columns.

Why do we need a rating dimension?

So why do we need a rating dimension then, if the ratings for holdings are stored in the fact table? The answer is: we need a rating dimension to do “notching”, “grouping”, “averaging” and “IG and HY”.

Notching and grouping

A notch is one step in the rating ladder. S&P ratings for example, from the highest value are as follows:
AAA, AA+, AA, AA-, A+, A, A-, BBB+, BBB, BBB-, BB+, BB, BB- and so on.

If S&P downgraded the rating of a company from BBB+ to BBB, we say that the rating moves one notch. If the rating changed from BBB+ to BBB-, it moves two notches.

AAA, AA, A, BBB, BB, B, CCC and so on are called “un-notched” ratings. Whereas AA+, AA-, BBB+, BBB-, BB+, BB- and so on are called “notched ratings”.

Similarly, AAA, AA+, AA, AA-, A+, A, A-, BBB+, BBB, BBB-, BB+, BB, BB-, and so on is called the “notched” rating scale. Whereas AAA, AA, BBB, BB, B, and so on are called “un-notched” rating scale.

For credit portfolios/funds we often need to create a “breakdown by credit rating” report, like this:

Rating Portfolio Benchmark
AAA 5% 8%
AA 15% 20%
A 75% 65%
BBB 3% 7%
Cash 2% 0%
Total 100% 100%

This breakdown by rating report is used in the client reports, in the marketing materials such as brochures and factsheets, in the meeting material (for example, when a client account manager meet with the pension trustees), and for investment management (for example, to manage exposure to credit risk).

The above is an example of an “un-notched” breakdown report by credit rating. But in many occasions a “notched” breakdown report is required. A notched report is more detail than an un-notched report. They are like this:

Rating Portfolio Benchmark
AAA 5% 8%
AA+ 2% 4%
AA 5% 7%
AA- 8% 9%
A+ 10% 5%
A 25% 25%
A- 40% 35%
BBB+ 1% 2%
BBB 1% 3%
BBB- 1% 2%
Cash 2% 0%
Total 100% 100%

We can see in the above that the A rating for this fund is 75%, and on the notched report we can see that this 75% is broken down further into 10% A+, 25% A and 40% A-. This way, the investors, clients, fund managers and financial analysts can all see more detail information about their funds.

To produce these notched and un-notched credit rating breakdown reports we need the rating dimension.

Going from notched ratings to un-notched ratings is called “grouping”. Whereas going from the un-notched ratings to notched ratings is called “notching” (detailing out). Un-notched ratings are also known as “rating groups” or “major ratings” or “coarse ratings”. Whereas notched ratings are also known as “detailed ratings” or “minor ratings”.

Often, the mandate from the client includes a credit rating criteria, such as: 30% of portfolio should be AAA and AA (this includes AA+, AA and AA-). In the above fund, the AAA and AA are 5% + 15% = 20%. The rating dimension is used to monitor this criteria every single day, and the fund managers get alerted if weighting gets below the client mandate (or going to be, because of a trade request – which should be blocked in OMS).

Averaging

On the other hand, the mandate (instruction) from the client could be the average rating. For example, the average rating of the portfolio should be AA. To calculate the average rating, we need the rating dimension.

In the above fund, what is the average credit rating? Here is how to calculate it.

In the rating dimension we labelled all ratings with number 1 to 27, as follows:
AAA = 1, AA+ = 2, AA = 3, AA- = 4, A+ = 5, A = 6, A- = 7, and so on.

Then we convert the ratings in above table to numbers, like this:
(Note: cash is given number 1, the same as AAA because it has no credit risk)

Rating Number Weight Number x Weight
AAA 1 5% 0.05
AA+ 2 2% 0.04
AA 3 5% 0.15
AA- 4 8% 0.32
A+ 5 10% 0.50
A 6 25% 1.50
A- 7 40% 2.80
BBB+ 8 1% 0.08
BBB 9 1% 0.09
BBB- 10 1% 0.10
Cash 1 2% 0.02
Total 5.65

Because A+ is 5 and A is 6, a total of 5.65 means that the average rating of the fund is somewhere between A+ and A. In this case it closer to 6 (A). So the average rating of the fund is A (for that date).

IG and HY

So we now know that in the rating dimension we need two attributes: one for “un-notched ratings” and one for the rating values (numbers).

We also need an attribute to classify the ratings into what is called Investment Grade or High Yield. Investment grade rating is anything from BBB- and above. High Yield is any rating below the BBB-.

Investment Grade is abbreviated IG, and High Yield is abbreviated HY. HY ratings are also known as “junk” rating. In some companies, to avoid saying junk they called HY rating as “Non Investment Grade” ratings, abbreviated as “NIG” or “Below Investment Grade”, abbreviated as “BIG”.

We need this attribute so that we can report the % of the fund which is at IG rating vs HY rating.

Dimension Design:

So that’s it. Now we know what “notching”, “grouping”, “averaging” and “IG and HY” are, we are now ready to understand the dimension structure.

  • Dimension Name: Dim Rating
  • Dimension Type: Type 1 (treat changes in description and any other attribute as an error/correction)
  • Grain: 1 row for each rating, per provider/agency.
  • Business Key: Rating Code, which is the notched rating.
  • Surrogate Key: Rating Key
  • Attributes: Rating Agency (this should be the first column after the surrogate key), Rating Category (the second column after SK, value: long term, short term, outlook), Rating Code (this is the notched rating), Rating Description, Unnotched Rating (or Rating Group, preferably), Rating Value, Rating Grade (IG or HY), Equivalent Rating (for Moody’s rows this column contains the S&P equivalent rating, whereas for S&P and Fitch rows this column contains the Moody’s equivalent).
  • Attributes which should not be here: PD (probability of default), Moody’s, S&P, Fitch (they are put as rows rather than column, with the corresponding rating is in the Equivalent Rating column). Rating for holding should not be stored here. ESG rating, short term rating, rating outlook and cash rating can also be stored here (see notes below).
  • Data source: this dimension is initially loaded using a SQL script composed from S&P, Moody’s and Fitch, DBRS and AM Best websites, i.e. SP&: link, Moody’s: link, Fitch: link, DBRS: link, AM Best: link.

Notes:

  • This structure is known as a “stacked” dimension, because individual rating scale are “stacked” on top of each other. For example: row 1 to 27 could be for S&P and row 28 to 54 are for Moody’s. We have 2 columns which identify the rating scale: Rating Provider and Rating Category. When using this Rating Dimension, we specify the values for these two columns, for example: Rating Provider = ‘S&P’ and Rating Category = ‘Long Term’
  • The securities data from Bloomberg only contains S&P, Moody’s and Fitch. It does not contain DBRS and AM Best.
  • S&P, Moody’s and Fitch provide credit ratings on both issuer level and at issue/security level. We can get this data from Bloomberg or directly from the rating agencies (but then we would have the challenge of translating the issuer/security ID).
  • Rating number (rating value): for S&P and Fitch scale, from AAA to C the rating numbers are clear, i.e. 1 to 21 (note that CCC has + and -). D and SD (RD for Fitch) are both given number 20. But NR, blank and R are not given rating numbers (the issuer/security) is excluded from the averaging process.
    It is important that NR, blank and R are excluded because otherwise the average rating for the fund would be slightly lower than what it should be.
  • For Moody’s, the rating numbers from Aaa to Caa3 are given number 1 to 19. Ca is 20, C is 21. Blank, NR, WR, e, and p are excluded from the averaging process.
  • Cash is given rating number 1, the same as AAA because cash has no credit risk.
  • Money market holdings such as T, CP, CD, deposit, repo uses short term credit rating which can then translated to the long term rating equivalent, e.g. A-1 to AAA (notches are ignored), A-2 to AA, A-3 to A, B to BB, C to C, D to D.
  • Derivative securities: exchange-listed derivatives such as options and CDS (single names) are given the rating of the underlying. Basket derivatives such as index option and CDS indexes are not given any rating.
  • Fund holdings: funds have credit rating ended in “f”. They are translated into the issuer equivalent rating by removing the f suffix.
  • For securities without credit rating we could take the credit rating of the issuer.
  • Rating Outlook: in addition to the long term and short term credit ratings, securities/assets in the holdings/benchmarks also have rating outlooks. Their values are: positive, negative, stable. So we should store them in the rating dimension too, with Rating Category = Outlook. Rating outlook is also known as “rating watch”.
  • ESG Rating: ESG data providers such as MSCI and Sustainalytics provides ESG ratings. We can also store these ESG ratings in this rating dimension, by setting the Rating Provider and Rating Category columns accordingly. For example: Rating Provider = MSCI and Rating Category = ESG.

C.9. Industry Dimension

The industry dimension is very important to an asset management data warehouse, because it enables the business to dissect their portfolio by industry. The industry dimension is the easy bit: it contains the hierarchy, i.e. (top to bottom): industry super group, industry group, and industry sub group. Or in other company it may be called (top to bottom): industry sector, industry group, industry. That’s the easy part, i.e. the grouping of industry.

The difficult part is to ensure that every single asset in the portfolio and benchmark is given the lowest level of this industry hierarchy. No blanks. That’s difficult. It requires hard work of the reference data team, every day. The index/benchmark data from MSCI or “Bloomberg Barclays” has this industry field, but not every holding is in the benchmark. Luckily only a few holdings are not in the benchmark (we are talking hundreds not thousands) so the reference data team needs to assign the industry manually to those assets which are in the holdings but not in benchmark.
Note: it is called “Bloomberg Barclays” because 2 years ago (Aug 2016) Bloomberg bought the index business of Barclays (used to be Lehman before 2008).

There are a few popular industry classification scheme/hierarchies:

  • Bloomberg Barclays Classification System (BCLASS, 4 levels): link, see page 17.
  • Bloomberg Industry Classification System (BICS, 7 level)): link
  • ICE Sector Classification: link (before Oct 2017 it was Bank of America Merrill Lynch, 4 levels): link
  • Global Industry Classification Standard (GICS, 4 levels), developed by MSCI and S&P: link
  • Industry Classification Benchmark (ICB, 4 levels), owned by FTSE Russell, used by Nasdaq and NYSE: link, link
  • Standard Industrial Classification (SIC): link
  • North American Industry Classification System (NASIC): link
  • UK Standard Industrial Classification (UK SIC): link
  • International Standard Industry Classification (ISIC): link
  • Thomson Reuters Business Classification (TRBC): link
  • Morningstar Classification: Equity: link, Fixed Income: link

Rimes is one of the largest index providers (link), along with MSCI and Bloomberg Barclays. Rimes also provides the industrial classification fields for the benchmark/index constituents.

So in short, the industrial classification that an asset management firm uses depends on what industry fields is being supplied in the asset/securities data. In my experience at 3 investment banks and 3 asset managers, Barclays (now Bloomberg), GICS and ICB are the most commonly used.

GICS and ICB are good for equities as they classify industry sectors, but the strength of Barclays classification (and ML) is that they contains fixed-income specific classification too, such as securitized and government. Normal equities securities such as the shares of Vodafone, can be classified using GICS, ICB and BICS, but fixed income securities such as ABS, RMBS and the bonds issued by the government of Austria can’t be classified using GICS, ICB or BICS. Using Barclays and ML they can be classified. So for credit funds, we have to use Barclays or ML.

This is why on the equity desks (or in an “equity house”) they call it “Industrial Classification”, because every company has an industry. But on the credit desks (or in a “fixed income house”) they call it “Sector Classification”, because it is not just companies and it is not just industry. It also covers governments and other instruments not issued by companies.

Dimension Design:

  • Dimension Name: Dim Industry
  • Dimension Structure: “stacked” dimension
  • Dimension Type: type 1 (changes should be treated as corrections)
  • Grain: 1 row per industry sector, per scheme
  • Business Key: the lowest level of the hierarchy (e.g. sub industry), plus the scheme
  • Surrogate Key: Industry Key
  • Attributes: Industry Classification Scheme (or Scheme, for short), Level 1 to Level 7 (both code and name), and system attributes such as the last modified datetime and the inserted datetime.
  • Attributes which should not be there: parent code (because all parent-child relationship should be put as columns/attributes), provider (because the owner keeps changing over the year and some schemes are supplied by more than one provider, for example: Rimes and MSCI)
  • Data Source: the initial data load is from the individual provider (links above) and when it’s operational any unknown row in the fact table should be manually review and then a new industry or sector should be manually created (not automatically created).

Notes:

  • The system attributes such as the last modified datetime and the inserted datetime should be applied to every dimension, not just industry.
  • This structure is called “stacked” dimension because one scheme is placed on top of the other schemes.
  • The main purpose of this dimension is to group the industry of the security/company into higher levels.
  • The industry dimension is used for both the security related fact tables (such as portfolio and benchmark position and analytic fact tables) and the company related fact tables (such as ESG fact tables).
  • Mapping one scheme to another is the big question here. Like rating, the question is not “where does industry X in GICS maps to in ICB?” But the question is: on 31st October 2018, what is the GICS, ICB and Barclays industries for this security?” This is best answered using a junk dimension containing various industry classifications for a security, attached to the position fact table.

C.10. Asset Class Dimension

Asset class dimension is one of the three core classification data in an asset manager data warehouse (the other two are industry and rating). Asset class is sometimes known as asset group or asset type.

There are several ways to classify investment holdings. At the top level it is: equity, fixed income, cash, real estate, and commodity. Other top level asset class are (arguably): FX, infrastructure, collectibles.

Note:

  • Cash here includes cash equivalents, such as bank deposits and money market instruments (T bills, CD, CP, etc). But money market instruments, albeit short term, carry credit risks, so arguably they should be classified as fixed income.
  • Real estate is a better term than property because it includes land and agriculture.
  • Collectibles such as art and antique are usually called “alternative investments” to cover not just art and antique, but also various different kind of investments such as wine, coins, private equity, structured products and venture capital.
  • A fund is an interesting one to classify into asset class because it can be equity, fixed income or cash.

Instead of just asset class and asset sub class, to add flexibility the top level (e.g. equity, fixed income, cash equivalent, etc) is called asset super class. So we have three level now.

The fixed income asset super class is divided into the following asset classes: government bond, corporate bond, securitized and derivatives. Government is further divided into these asset sub classes: sovereign, government agency, local authority, and supranational. Corporate are divided into financials, industrials, and utilities. Securitised is divided into MBS, ABS, CMBS, CMO and Covered Bonds. Derivatives is divided into the following asset sub classes: swap, option, future, forward, warrant, CLN and CDS.

The equity asset super class is divided into the following asset classes: common stock, preferred stock, stock option, index option, stock futures, equity swaps, index futures, index swaps.

Dimension Design:

  • Dimension Name: Dim Asset Class
  • Dimension Structure: “stacked” dimension
  • Dimension Type: Type 1 (treat changes on all attributes as corrections)
  • Business Key: Asset Sub Class
  • Surrogate Key: Asset Class Key
  • Attributes: Asset Class Scheme, Asset Sub Class Code, Asset Sub Class Name, Asset Class Code, Asset Class Name, Asset Super Class Code, Asset Super Class Name, and system attributes such as the last modified datetime and the inserted datetime.
  • Attributes which should not be here: Government Backed, Market (EM/DM), Callable/Puttable, Floating/Fixed, OTC or Exchange Traded – these are all asset class related but not asset class classification. They should be in the “Asset Class junk dimension” attached directly to the position fact table. Industry grouping/sectors shouldn’t be here either, i.e. cyclical or defensive for equity.
  • Data source: loaded from manually crafted spreadsheet (static). This dimension is maintained manually in production using SQL script.

Notes:

  • I wouldn’t recommend calling the attributes in this dimension as Level1, Level2, Level3, etc (like in the Industry dimension) because it makes the whole thing obscure. These attributes widely known as asset (sub) class or asset (sub) type so let’s call them that.
  • It is possible that for certain asset class the levels go to more than 3 levels. In this case it would be justified to label them as Level1, 2, 3, 4, 5, and so on.
  • This Asset Class dimension is used for reporting, i.e. to group assets within the portfolios and benchmarks.
  • There could be a different schemes for different report so the structure of this dimension is a “stacked” dimension, with a scheme column.
  • There are many ways to categorise securities on asset classes, as I describe at the beginning of this section. But these are the main decisions that a warehouse designer should be making:
    • Split derivatives into OTC vs exchange traded or not? At which level should this split happen, highest level possible or the lowest level possible? In my opinion, we should not split into OTC vs ETD.
    • Within the securitized, should you include collateralized (CMO, CDO, CLO)?
    • Should you split muni into tax-exempt and taxable?
    • After you categorised equity into cash, future and option, what is the next level down? In my opinion it should not be based on industry. Basket/index vs single name perhaps? Should you divide the options into call and put?
    • Where to put FRN, CLN, ELN, CFD, FX Derivatives, Fund, TRS?
  • By nature the asset class hierarchy is ragged. Some are only 2-3 levels, but some are 4-5 levels. That’s why if we make it to “accurate” it would become less useful. Because to use it we need to take a certain level and report on it. And many of the attributes are not asset class, e.g. DM/EM, OTC/ETD, etc. So some attributes are best to be left in the asset dimension (or a junk dimension of the position fact table) and only the clear, basic, useful asset classes are put into this asset class dimension. We need to strike a balance between completeness and usefulness.
  • There could be a lot of permutations and combinations which are required for reporting. The most flexible way is to store these custom mappings in a “rule table” which will guide the grouping of asset in the portfolios and benchmarks.

C.11. Performance Return Basis Dimension

The performance data comes with attributes like: fee basis (net or gross), valuation basis (ask, mid, or bid), valuation timing (end of day or mid day), annualised or cumulative, price basis or total return basis, pricing method (single or dual pricing), etc. These are the things which are stored in the performance related dimension.

We can create 7 to 9 different mini dimensions for this. Or we can create one dimension for this. In my opinion one dimension is better. It should be a junk dimension. Instead of putting all the theoretical possible combinations, we should be putting just the actual combination which occurs in the data.

Dimension Design:

  • Dimension Name: Dim Performance Basis
  • Dimension Type: junk dimension
  • Business Key: combination of 6 to 8 performance related attributes in the performance return data
  • Surrogate Key: Performance Basis Key
  • Attributes: Fee Basis (net of fees or gross of fees), Valuation Basis (ask, mid or bid), Valuation Timing (EOD or Mid Day), Annualisation Basis (annualised or cumulative), Return Basis (price only or total return, i.e. the income/dividend is reinvested), Pricing Basis (single or dual pricing)
  • Attributes which should not be here: income or accumulated (this should be in the share class dimension), and currency (this should be an attribute directly on the performance return fact table), inclusive or unbundled (this should be in the share class dimension), performance fee basis (this is not a factor of performance return, it should be in the share class dimension).
  • Data loading: from the output of the performance calculation system such as Barclays Point or StatPro, or a third-party back office provider.

C.12. Broker Dimension

A broker is the company that we place our order with, to buy or sell securities (we as in asset manager). A dealer is a company who buys and sells securities with their own money. A broker-dealer is a company who is a broker and a dealer.

When we want to buy shares or bonds, we go to a broker. The broker then finds us a seller who has those shares or bonds. This seller is called a counterparty. Counterparty means the other party to a financial transaction/trade. If we buy, the counterparty is a seller. If we sell, the counterparty is a buyer. Sometimes the counterparty is the broker themselves, because they are a broker-dealer. They deal their own money, as well as their client money.

When we want to buy shares or bonds, we go to a broker. But if we want to buy OTC swaps or options, we go to a counterparty. This is because an Over The Counter (OTC) contract is a direct trade between two parties, without going through a broker: us and another counterparty. This counterparty is usually an investment bank, the top ones in London are: JP Morgan, Goldman Sachs, Citi, Morgan Stanley, BOA Merrill Lynch, Nomura, Barclays, BNP Paribas.

A prime brokerage is a company who is a broker, a dealer, a counterparty, and doing some other services too related to security and trades, i.e. clearing, custody, asset service, securities lending, financing, reporting, risk management, etc.

We put both broker, broker-dealer, prime brokerage and counterparties into the broker dimension..

Dimension Design

  • Dimension Name: Dim Broker
  • Dimension Type: Type 2 (record historical values for certain attributes)
  • Business Key: Broker Code
  • Security Key: Broker Key
  • Attributes: Broker Code (business key), Broker Name, Broker Type, Broker Status, Address (including post code, city and country – the address of the Headquarters), Email Address, Website, Contact Person. Also attributes related to regulatory reporting (see Notes below).
  • Type 2 Attributes: Broker Status, Contact Person. Others are type 1 attributes (treat changes as corrections). Address is type 1, i.e. any changes should be treated as correction. Usually there are so many corrections on the address, name, email address, and we don’t usually need their previous address, it is not reported anywhere. So storing previous address in Salesforce is enough.
  • Attributes which should not be there: any time dependent attributes related to counterparty risk or mark-to-market counterparty valuation, e.g. pricing method. Thy should be in a junk dimension attached directly to the MTM fact table or counterparty risk fact table.
  • Data Source: the best place to maintain broker and counterparty data is Salesforce. Daily extract/query via dbAmp is the best mechanism (similar mechanism to client data).

Notes:

  • There are several Mifid II attributes which should be in the Client Dimension and Broker/Counterparty Dimension, for example: Mifid Reportable Status or Type. These are used for classifying client and brokers for the purpose of regulatory reporting.

C.13. Date Dimension

Finally we reach the last dimension but not in the slightest least: the Date Dimension. As per the usual practice, it is best to populate the date dimension with the last 10 years dates and the next 10 years dates.

But, as experience shows, there are many bonds with maturity of 20 years (they expires 20 years from now, i.e. 2038). There are even bonds with expiry date of 100 years, e.g. it expires in 2116. In addition, there are funds which are 15 years old so the inception date is 15 years ago. So for an asset management data warehouse I would recommend to populate the date dimension with the following:

  • Daily dates from 20 years ago to 20 years in the future (1998 to 2038, round this to 5 year multiple e.g. 1995 to 2040, so 1 Jan 1995 to 31 Dec 2039). That’s 14600 rows.
  • Month end dates from 20 to 120 years in the future (31 Jan 2040 to 31 Dec 2140). That’s 1200 rows.
  • Month end dates from 50 to 20 years ago (1 Jan 1975 to 31 Dec 1994). That’s 360 rows.
  • 1 Jan 1900 and 31 Dec 9999 for the effective date and the expiry date of SCD type 2 rows (known as min date and max date).

In an asset management data warehouse, the most useful attributes in the date dimension is the Month End Flag, the Last Week Day In Month flag, the Previous Month End Date, and the Business Day attribute. They are frequently used for client and regulatory reporting, as well as investment analysis (BI) e.g. performance, risk and allocation analysis.

The Same Day Last Year, Same Day Last Quarter and Same Day Last Month are surprisingly useful, at times.

Most asset management company only need bank holiday data for one or two countries, e.g. UK and US. Even though they have a rep office or sales office in Japan, Ireland, Germany, Australia, etc. for reporting they don’t need the bank holiday dates for these countries.

Dimension Design:

  • Dimension Name: Dim Date
  • Dimension Type: Type 0 (insert only, no changes)
  • Business Key: Date (sometimes called Calendar Date or Full Date), data type: date, not datetime2.
  • Surrogate Key: Date Key (YYYYMMDD as per the convention in warehousing)
  • Attributes: Date, Month Number, Month Name, Month Short Name, Quarter, Year, Year Month (YYYYMM), Year Quarter (YYYY Qn), Week Number (in a year), Week In Month, Week Day, Month End Flag (Y for 31/3/2019), Last Week Day In Month Flag (Y for 29/3/2019), Week Day Flag (Y for Monday to Friday), Previous Month End Date, Previous Month Last Week Date (for 14/10/2018 this is 30/9/2018 and 28/8/2018 respectively), Business Day (1,2,3,… denoting the Nth working day in that month, blank/null for week ends), First Date In Month, Last Date In Month, First Date in Quarter, Last Date in Quarter (for 16/8/2018 these 4 attributes are 1/8/2018, 31/8/2018, 1/7/2018 and 30/9/2018 respectively), UK Bank Holiday Flag, US Bank Holiday Flag, Same Day Last Year, Same Day Last Quarter, Same Day Last Month (for 16/12/2018 these 3 attributes are 16/12/2017, 16/9/18 and 16/11/2018).
  • Attributes that should not be here: there is no point having Date Inserted and Last Modified Date as this is a static dimension bar bank holidays.
  • Data Source: programmatically populated at the beginning for the project.

Notes:

  • I can’t emphasis the importance of populating this programmatically using a stored procedure rather than manually from Excel. We are dealing with 16000 rows x 30 columns here so making an error is quite possible, and this dimension is used all of the place throughout the warehouse and in various reports and analytic tools.
  • The bank holiday flags are update every November for the year after next. For example in Nov 2018 we set the bank holiday flags for year 2020.

6 September 2018

Modern Data Warehousing

Filed under: Data Warehousing — Vincent Rainardi @ 7:09 pm

What should 2018 Data Warehousing look like (as opposed to in 2005)? That’s what the word “Modern” refers to. We now have streaming data, we have big data, we have data lake, and we have data scientists. A modern DW in 2018 incorporates these 4 factors. An “old” DW in 2005 did not incorporate these 4 factors.

The first attempt to define a Modern DW was from Bill Inmon in his DW 2.0 concept in 2006, where he added unstructured data, different user communities and (whitepaper: link, book: link). There was no big data or NoSQL at that time, but we already had unstructured data in data warehousing. I would argue that metadata and different archiving lines were not new, they had been in DW before 2005.

Then there was an attempt to define a Modern DW as Lambda, i.e. streaming and batch. The best definition is probably from Melissa Coates: link, search “characteristics of a modern data warehouse” on this page. Here she defines other importance characteristics such as Hadoop, data lake, NoSQL, bimodal (sandbox). I would argue that data virtualisation, near real-time, governance, agile delivery, self service BI, large volumes, MDM and high variety are not new. They had been in DW before 2005.

The Azure community is quite adamant with the Lamda concept of a modern DW. Notably Adatis, in particular Simon Whiteley in April 2017 (link), and in SQL Saturday. Microsoft supports this Lambda idea in Azure (concept: link, diagram: link).

Then there is a concept of “Data Lake side-by-side with DW”, notably proposed by Barry Devlin, supported by Hortonworks (link) and also proposed by BlueGranite (link) where Melissa works (see her SQL Saturday presentation is here). The diagram on page 5 of the BlueGranite eBook, which essentially the same as page 5 of Melissa’s SQL Saturday, is in my opinion the best diagram describing a modern DW, as we can see below:

BlueGranite diagram

Here we have the data lake as the staging area for both the streaming data and the data warehousing. And it is also the staging area for the NoSQL data store such as graph database. And the data lake is also accessed by the machine learning. So there are 4 purposes of the data lake.

Arguably it’s Microsoft centric (i.e. Azure), but they are the best vendor at the moment with the most complete components for modern DW (in my opinion). A close second is AWS, with Redshift for the DW and S3 for the data lake (catalogue & ETL on Glue). The Hadoop and Spark are on EMR (Elastic MapReduce). GCP is on the 3rd place with BigQuery as the DW, Cloud Storage as the data lake, and Cloud DataFlow for the streaming data. The Hadoop and Spark are on Cloud Dataproc.

Blog at WordPress.com.