Data Warehousing and Data Science

3 January 2017

Data Warehousing/Business Intelligence in Investment Banking

Filed under: Analysis Services — Vincent Rainardi @ 8:39 am

I’ve written an overview of an investment bank (IB): link, which would provide a useful background to this, if you want to read it first. The intended audience of this article are those who have no experience in investment banking, so I will be explaining all the IB terminologies as we go along.

Of the 7 business areas in IB, a business intelligence system is highly required in 3 areas: research, fund/asset management, trading, and almost not required at all in the other 4 areas. I’ll start with trading as it is IB’s main business, and it is the one area that uses DW/BI the most.

An investment bank does trading on behalf of its clients because it acts as a broker dealer. It receives an order from many clients to buy and sell securities such as shares, bonds, options, FX forwards, IRS, commodity, and CDS. Both Exchange-Traded and Over-The-Counter (bilateral agreement with another dealer).

As a broker dealer, we need to offer our clients good BI, preferably real time, of all the trades we do for them. That’s the minimum, but preferably we offer market data / market intelligence. Clients usually don’t use us just because we have thin spreads, and clean-speedly executions, but because we win them over with our information systems which enable them to make decision early and accurately.

This BI enables the clients to see each and every trades, in full life cycle. I’m going to explain the mechanics so please bear with me. Our client setup the order on their OMS (Order Management System) such as Charles River or ThinkFolio. We receive the order via Omgeo CTM or Swift, for example to sell 200,000 of bond X. We start the process of origination, execution, validation, confirmation, clearing, settlement and accounting. To get the best price we might (progammatically!) have to split the order 3 ways: 75k to Goldman, 75k to MS and 50k the next day with Citi.

The terms could be different from each counterparty, particularly if it is margin trading such as a Repo. The order could be complex i.e. a delta neutral trade, a volatility play, or an option spread.

Broker or dealer: we become a broker when we buy an ETS (exchanged-traded-security) or ETD (exchange-traded-derivatives) such as shares, gilts, treasury, or CP (commercial paper) on behalf of our clients. Here we purchase or sell the security/derivative on an exchange. We also become a broker if we make an OTC (over the counter) contract on behalf of our clients, for example IRS and FX Forward or some futures. Here we make a custom “deal” with another bank. We become a dealer when we trade on behalf of ourselves, with our own money, for example for our Proprietary Desk (I’ll explain this shortly) or for our own Treasury department.

Here is an overview of each of the 6 steps:

  1. Origination is about making sure that we capture the client orders correctly into our trading system. Quantity, security details, maturity, price requirements, timing restrictions, counterparty restrictions (exchange restriction), and terms (like European or American for options, ISDA / DTCC for CDS).
  2. Execution is about sending the order down the wire to different counterparties, or to our internal pool first to match with other client orders. ETS orders could be combined with other client orders, or with our own prop desk, and placed externally in one big volume to get a good deal.
  3. Validation is done by mid office to check the trade against the counterparty, i.e. we send them our trade details, they send us their trade details, and we compare them electronically. Unmatched trades are reported back to counterparty, and fixed manually.
  4. Confirmation: after we agreed with the counterparty, we send the details of the trade to the clients, electronically.
  5. Clearing: clearning agency e.g. NSCC electronically sends the contract to both parties and both formally accept the trade terms.
  6. Settlement: payment to counterparty (OTC) or Exchange (ETS), usually T+1 or 2 depending on which exchange. Settlement for STP (straight through processing) is same day.
  7. Accounting: trades are booked to clients accounts. Calculates balance of client funds, and margin requirements.

The BI systems we are offering to clients on our extranet needs to reflect every single one of the above status/step, on real time basis (like 2-5 mins after the event). If we only offer overnight updates we may be able to win small clients but big fish (buy side) all needs real time, as it affects the fund manager ability to manage their portfolios intraday.


Business Intelligence for Trading

The 5 core requirements are: data coverage, for external & internal use, position exposure, risk management, regulatory compliance.

Data Coverage: There are many different asset classes within the IMSes across the bank (Investment Management System), and for OTCs there are several external trade repositories, all of which need to be tapped, and have the data extracted into our data warehouse. Our DW/BI must store every single trade we execute, and every single order we receive (both from client and internal), regardless of the status i.e. we must still store cancelled order, novations, compressions.

External & Internal Use: the DW/BI is used both by our clients to view their transactions, positions and risks, but also importantly by our traders and managers to understand their activity. In terms of security, Chinese Wall is an absolute must. No traders should be able to view client’s price-sensitive information, and clients must only be able to see their own positions.

Position Exposure: a “position” is an obligation or asset that a client have, on a particular day. A trade to buy a bond for example, results in that client having an long position in that bond. A position can be long or short, open or closed, have a book value and market value, date-and-security based, and has an assocated P&L value i.e. the potential profit or loss. Clients must be able to see their aggregation positions at various levels. For example, number of CDS positions and the grossed & netted exposure for each day. Clients should be able too see overal exposure by broker, by clearing house, by asset type, by status, by sector, by currency, by country, and by instrument.

Risk Management: every position/exposure carry certain market risk, e.g. the EUR-USD rate can move against you, yield and duration of bonds can increase or decrease. To quantify this market risk, we calculate VAR (Value at risk), e.g. there is a 1% chance that the market value of all Investment Grade fixed income positions decrease by 1 million in 1 day. We call this 1% as “99% convidence level”, this 1 day as “the risk time horizon”, and this 1 million as “the VAR”. Clients should be able to see VAR by broker, currency, currency, country, maturity, asset class, on a 1-day, 1-week and 1-month horizon, at 95%, 99% and 99.9% confidence levels.

Every trade has certain operational risks. A dashboard displaying daily automatic confirmation and validation across all market positions (compressed and uncompressed) gives our client confidence about how much risk exposure they have on each asset class, broker, currency, e.g. volume of mismatched trades. Every trade has certain counterparty risks, therefore we need to provide to our clients exposure to each counterparty, e.g. £4.5m agains RBS on 17/12/2016. They can see the this daily counterparty exposure on a time line, for different brokers, for different asset class, with the associated MTM (marked-to-market).

Regulatory Compliance: Different regulators demand different reports which we need to produce simultaneously, e.g. Dodd Frank in US, EMIR in EU, FATCA, Mifid. We need to help our clients fulfill their regulatory reporting requirements, as well as our own reports to the authorities, e.g. we must report every OTC Derivative contract to a trade repository, implement margining standards, and monitor clearing obligation in each CCP (Central Counterparty). This includes fraud detection, insider trading, sanctions, AML (Anti Money Laundering), KYC (Know Your Clients), RWA (Basel III). Clients pass to us their regulatory reporting requirements, and we create automated reports within our DW/BI to give them this data, securely downloadable on extranet.


Prop Desk and Fund Management

Many investment banks also invest their own money into some assets, most notably fixed income. We call this “Prop” – short for Proprietary Desk or Proprietary Trading. There 2 kinds: it can be short term, i.e. just this trade, or it can be long term, the trade is part of a big plan to manage money, some kind of “fund management”.

We want to do prop because we want to make market. If a client wants to sell their CDS cheaply, or buy an IRS at good price, why don’t we take the opposite side and create a market? If someone is under pressure to sell a UBS CDS at 6% below the market price it is profitable for us to buy it and sell it to someone else the next day at 2% below the market and we make immidiate money.

The prop desk also actively looks at the market for opportunity. So if they believe that the Columbian bond recent 32% dive has made the price too low, they will instruct our trader to buy some. Or the Prop could want a Delta Neutral Trade, to gain from Volatility. Which means, it doesn’t matter which way the price is moving, whether the price is moving down or up we’ll still make money. The more volatile the price, the more money we make.

So that’s the incidental. But the bank’s money is usually managed from a long term perspective, not just or day-to-day trading. As an example, we may want to create a long position on Cable because we believe US politics will result in the Fed raising rates next year. Or take advantage from the raising EM market next year, or Japan equity.

All the big IBs (JPM, Nomura, UBS, Barclays, etc) have asset/wealth management business. They take client’s money and invest it on the clients’ behalf. This AM business also instructs our trader to buy or sell stuff to create positions for the clients.

The BI requied to support the prop money and asset management has the following objectives:

  1. Compliance: each fund has a “boundary” e.g. 80-90% must be invested in EM bonds. “EM” = Emerging Market, i.e. Chile, Russia, India, China, etc. Must be at least 95% in USD. Use of options must be <= 10%. Latam (Latin America) maximum is 30%. And so on. And all these “limits” are coded into the OMS (Order Management System) so that everytime a trade is created on the OMS, it will be automatically checked before the trade is executed. And therefore, afterwards, we will be able to plot in our BI, the chart of “EM Bonds” content of the fund (percentage wise) for each month from 2014 to 2016. And not just EM Bonds, but on the same dashboard we should be able to see the regional breakdown of the fund, in the last 3 years. We should also see the currency breakdown of the fund, month to month, in the last 3 years. And we could see in that chart if any non-compliance event happened.
  2. Profitability: we can compare the amount of money we make in each trade, to the amount of capital required to do the trade, and the amount of time required. We can aggregate this to client level, fund level, country level, currency level, asset class level etc., for any time duration we like.
  3. Risk: we can look at the level of risk we take in each trade, in terms of contribution to VAR, and in terms of historical events. Meaning that, if the 9/11 event happens tomorrow, what kinds of risk do we have in this portfolio. Have we hedged it? This is tested not just for 1 event like 9/11, but hundreds of events. Some of them are real events which happened in the past. Some of them are theoretical events such as raise of interest rates, and FX rates.



The Research guys make money by selling market analysis. Also on economic and geo-political risk. Both the market as a whole (macro/overview level) or on an individual company level. We charge clients some fees to allow them to use our market analysis tool, which enable them to monitor market behaviour, market events, and get signals and alerts on certain situations they want to trade on.

So the BI for Research has two purposes:

  1. To support our analysts doing their market and economy analysis.
  2. To support our clients doing their analysis.

The BI provides for example various market indicators such as CDS spreads for each corp bond, historical interest rates and currency rates, yield curves. From the bond price and maturity we calculate yield, duration, and various spreads on each and every fixed income instrument, and let our analysts and clients feast on this data.

They can filter by asset class, country, currency, region, industry sector. See what is the trend today, this week, this month, and this quarter. We let our clients view our forecasts that our analysts have created.


Part 4. Technical Design

There are 5 architectures which can be used for BI for IB. Three of them are old approaches (#1-3 below), and two of them are new approaches (Data Vault & Hadoop)

  1. DDS only (Dimensional Data Store)
  2. ODS (Operational Data Store) + DDS
  3. NDS (Normalised Data Store) + DDS
  4. DV (Data Vault) + DDS
  5. Hadoop + DDS

A bit on terminology:

  • DDS means dim & fact tables. It’s a dimensional model, i.e. Kimball star schema.
  • ODS means current version only. The historical value of attributes are not stored. Surrogate Keys are generated in the DDS.
  • NDS means storing historical values of the attributes. DDS becomes dummy. Surrogate Keys are generated in the NDS.
  • Data Vault stores the attributes separately to the business keys. And the links between business keys are also stored separately. It is superior for capturing time-based relationship between tables.
  • Hadoop stores data in a distributed file system (HDFS, Yarn), and use Hive or NoSQL to query the data out.

As you notice above, in all 5 architectures the BI tool (Cognos, BO, Tableau, etc.) is facing the DDS. The ODS, NDS, DV and Hadoop are all “back end engine” which the users never see. Their function is to integrate data from many source systems.

Because of this the ODS, NDS, DV and Hadoop are called the Integration Layer (IL) or the Enterprise Layer (EL). Whereas the DDS is called the Reporting Layer (LR) or the Presentation Layer (PL). The primary benefit for having an Integration Layer separate to Reporting Layer is to make data integration easier because it has a separate model, which breaks down the entities into more detail level than RL. IL data model is shaped up to be “write friendly” (hence normalised model) whereas RL data model is shaped up to be “read friendly” (hence denormalised model).


Dimensional Model Design

Dimensions: Date, Client, Broker, Branch, Product, Instrument, Desk, Trader, Issuer, Country of Risk, Currency, Rating, Collateral, Fund, Asset Class.

Designing instrument dimension (also called “security” dimension) in IB is a tricky business. First we need to decide whether we want cash and OTC in the instrument dimension. Second, we need to decide whether we want to split the derivatives (IRS & CDS in particular) into a separate instrument dimension because of their rich attributes. Third, we need to decide whether the instrument classification (such as instrument currency, rating, country of risk, asset class, etc.) should be stored inside the instrument dimension or outside.

Because of the last point, sometimes you don’t see the rating dimension and country dimension, because they are amalgamated into the instrument dimension. But issuer and currency are usually created (more often than not).

When creating Client, Branch and Broker, some companies feel clever and created a table called Party in the Integration Layer. I advise against doing this and prefer to split them into 3 entities for clarity and flexibility reasons. In DDS they are always separate dimensions. Some companies split the address and country in the ODS/NDS into separate table in the name of normalisation. I advise against doing this and prefer to keep demography and geography attributes (such as address & country) within the client, branch & broker table for simplicity and flexibility, even if that means data redundancy and breaking the 3rd Normal Form.

There are many, many instrument classifications in IB, from asset type, duration bands, maturity bands, country, country group, region, currency, currency group, broker type, coupon type, interest rate type, settlement type, collateral type, contract type, rating type, clearning house type, issue type, market type, derivative type, interest calculation type, direction, swap type, etc. All these can be created as their own table or created as 1 common key-attribute-value table. I prefer the latter due to its simplicity and consistency in modelling, as well as ease of use. The primary benefit for doing so is that we don’t need to change the data structure, i.e. using 1 common table we don’t need to keep creating a new table when we have a new classification, which in IB it happens almost every month. Within RL of course all of them goes to 1 dimension: instrument.


Fact Tables

There 8 major fact tables in IB:

  1. Position
  2. Transaction
  3. Risk
  4. Performance
  5. Collateral
  6. Deal Pipeline
  7. Client Account
  8. Prop Desk

I’ll explain the above 8 one by one, bear in mind that each of them might be implemented as two or more physical tables.

  1. The Position fact table stores the daily value of each position (instruments and cash) in each portfolio. It also stores the analytic values of each instrument, such as spread and yield. And time-based properties such as duration and maturity along with its banding.
  2. Transaction fact table stores the buying and selling activities in each portfolio, and non-trade transactions too, e.g. interest payments, defaults, haircuts, instrument maturity, and corporate actions.
  3. The Risk fact table stores the VARs, Tracking Errors, stress test impacts. To be flexible, in some IBs it is designed as a “vertical fact table” where the “risk type” column determines the measures. But for read-efficiency and for clarity reasons I prefer to create the measures as individual columns, with the time horizon (1 month, 1 year, etc.) as a separate columns, and “cuts” (by region, currency, asset class, etc.) as separate columns.

Risk is usually calculated not at instrument level, but at “cuts” level, including at portfolio level. But if it does then we need to create a separate fact table to store risk numbers at instrument level.

  1. The Performance fact tables stores the growth of a fund in the last 1 month, 3 months, 6 months, 1 year, 3 years, 5 years and 10 years period. Each “share class” is stored in their own row.


Before continuing to fact table number 5 to 8, let me explain the aggregatability first:

Unlike Position and Transaction fact tables, both Risk and Performance fact tables are not aggregatable. Every “cut” in the Risk fact table stands on their own, and every share class in the Performance fact table stands on their own.

The Position and Transaction fact tables are only aggregatable up to portfolio or fund level. It best not to put the share class into position or transaction fact table, because from position and transaction point of view, they are the same. But in the Risk and Performance fact table, we must put the share class, because the Risk and Performance numbers are different for each share class (because Risk and Performance numbers are affected by the portfolio currency and accumulation/income).


  1. The Collateral fact table stores the market value of individual OTC derivative (called MTM – marked to market) and the required value adjustments in collateral against each broker-dealer.
  2. The Deal Pipeline fact table stores the flow of a deal between the bank and lending clients, M&A clients, and transaction banking clients. It record the status since when the client was a prospect, until the deal is agreed and closed.
  3. The Client Account fact table stores the clients money that we manage and invest, including subscriptions (deposits, i.e. client puts money in) and redemptions (withdrawals, i.e. client takes money out), buying and selling activities. This fact table need to store both the movements and the daily balance of each client account (hence preferably split into two – one periodic snapshot, one transactional).
  4. The Proprietary Desk fact table stores the bank’s own money that we manage and invest, including subscriptions and redemptions, buys and sells, and corporate actions. It is a legal requirement in US, EU and UK that we must separate clients money and our own money.


That’s the technical design of the BI for IB. Hope it has been useful for you.


  1. Message Automation
  2. ISDA OTC Commodity Lifecycle
  3. Trade Processing



  1. Hi Vincent,
    Nice writes up on Investment Banking.

    Vincent – I have a general question regarding banding in DHW. Lets suppose we have predefined age band as 0-18, 18-30, 30-45, 45-60 and 60+.
    Now the requirement is to make this band dynamic.
    E.g – There are different set of user groups and they want to see data in different bands on a particular MIS. They might want to see data in different band for different MIS as well. And this should reflect for historical data as well.

    User Group 1 want age band as 0-20, 20-40, 40-60 and 60+
    User Group 2 want age band as 0-30, 30-45, 45-60 and 60+
    User Group 3 want age band as 0-17, 18-35, 35-50, 50-65 and 65+

    How can this be implemented in datawarehouse in efficient way.

    Thanks, Santosh

    Comment by skrai81 — 15 February 2017 @ 7:32 am | Reply

    • Hi Santosh, good question. Dynamic banding is implemented by having banding tables like this:

      Table Name: Band Detail
      Band Detail ID, Band ID, Segment Label, Lower Limit, Upper Limit, Sort Order
      0, 0, 0, ‘Unknown Band’, 0, 0, 0
      1, 1, 1, ‘0 to 9’, 0, 9, 10
      2, 1, 2, ‘10 to 19’, 10, 19, 20
      21, 3, 1, ‘$0 to $9,999.99’, 0, 9999.99, 10
      22, 3, 2, ‘$10,000 to $19,999.99’, 10000, 19999.99, 20

      Table Name: Band
      Band ID, Band Group ID, Band Name, Band Description
      0, ‘Unknown Band’, ‘Unknown Band’
      1, ‘Age Bracket (10 multiples)’, ’12 age groups from 0 to 120 in 10 multiples’
      2, ‘Age Bracket (5 multiples)’, ’24 age groups from 0 to 120 in 10 multiples’
      3, ‘Annual Income ($10k increment)’, ‘income groups from 0 to $100k in $10k increment’

      Table Name: Band Group
      Band Group ID, Band Group Name, Band Group Description
      0, ‘Unknown Band Group’, ‘Unknown Band Group’
      1, ‘Age Brackets’, ‘Classification of customer age into different brackets’
      2, ‘Income Brackets, ‘Classification of customer annual income into different brackets’
      3, ‘Delivery Days’, ‘Classification of actual delivery days into different brackets’

      1. At first implementation, it is best to resist the temptation to add Effective Start Date, Effective End Date and Current Flag columns.
      2. For age brackets, it is clearer to label the segment as ‘0 to 9’ than ‘0 to 10’ because of the overlapping 10. The age is always rounded to integer, either rounded down to rounded to the nearest.

      Assuming that this is our customer dimension table,
      Table Name: Dim Customer
      Customer Key, Customer ID, Customer Name, Date of Birth
      135, ‘L6923’, ‘Andrew Kerr’, ‘5 Jun 1964’
      249, ‘H3410’, ‘Phoebe Rathbone’, ‘9 Sep 1981’

      We then create a view, which adds the Age Bracket column into the Customer Dimension above as follows:
      create view dim_customer_view as

      select dc.customer_key, dc.customer_id, dc.customer_name, dc.date_of_birth,
      band.segment_label as age_bracket_10
      from dim_customer dc
      inner join
      ( select b.band_name, b.segment_name, b.lower_limit, b
      from band_detail bd
      inner join band b on bd.band_id = b.band_id
      ) band
      on band.band_name = ‘Age Bracket (10 multiples)’
      and round(datediff(Y, getdate(), dc.date_of_birth), 0)
      between band.lower_limit and band.upper_limit

      1. You can choose to use “x >= lower_limit and x < upper_limit”, and changed the upper limit to be the same as the lower limit of the next bracket. For example, for band_id 21 in the above example, set the upper_limit to 10,000 instead of 19,999.99.
      2. We can create age_bracket_5 column by doing the same join. In this case it is more efficient to put the “band subquery” into a CTE and use it multiple times.

      Because age bracket changes every day, this method eliminate restamping the age bracket attribute during ETL. For a large dimensions with over 10 million rows, I recommend physicalise the view as a physical table (drop and recreate daily during DW load).

      Comment by Vincent Rainardi — 16 February 2017 @ 8:10 am | Reply

      • Thanks a lot Vincent for step by step explanation.

        Comment by skrai81 — 17 February 2017 @ 9:50 am

  2. For (almost) dynamic banding I sometime use a simpler but somewhat less general procedure. I use a single DimBand dimension, with a row for every year (100> rows). Then Each column could be a different band, for example “Band 0-18-25-40-60-Over 60” has year 41 assigned to the “5: – 40 to 59” band. It is not perfect (I need to add a column if a want a new band, and I need to use a lexicographic order in descriptions) but is dead simple and effective.

    Comment by momobo — 13 March 2017 @ 1:29 pm | Reply

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

Blog at

%d bloggers like this: