Data Warehousing and Data Science

27 September 2015

Credit Risk and Market Risk

Filed under: Business Knowledge — Vincent Rainardi @ 5:59 am

Broadly speaking when we talk about risk in investment banking IT, it’s about 2 things: Credit Risk and Market Risk. Other financial risks are liquidity risks, operational risks, legal risks, but they don’t usually require a large IT systems to manage them.

Credit Risk

As a bank, credit is about lending money to companies to get interest. The companies are called obligors or counterparty. These obligors have obligation to pay us certain amount at certain times. The risk here is if those companies cannot pay us the amount they need to pay, when it’s due. This is called a default.

Credit risk is about 2 things: a) to manage the credit portfolio, b) to manage credit transactions. For a), the goal is to maximise the risk-adjusted return by maintaining the credit exposure of the whole portfolio within certain parameters. This is done using economic capital, correlation and hedging. These subjects are explained in these articles:

Economic Capital:


Things that a credit risk business analyst (BA) is expected to understand are: counterparty risk, CVA, Basel II & III, credit portfolio management, PD, LGD, EAD, Expected Loss, VAR, KMV, PFE, volatility, Economic Capital, RWA, Monte Carlo, correlation, ratings, credit derivative.

A credit risk data warehouse has the following functionalities:

  • Calculate Value At Risk and volatility of the credit portfolio every single day.
  • Produce regulatory reports such as Risk Weighted Assets, capital requirements, stress tests, and Potential Future Exposure.
  • Calculate portfolio risk measures such as Exposure At Default, Expected Positive Exposure, Credit Valuation Adjustment, counterparty risk.

Market Risk

Banks, insurance companies, pension funds and hedge funds all invest their cash in various things: shares, bonds, derivative, commodity, property, or in other companies. You intend to keep them for years. This is called investment portfolio, e.g. if you have £1 million to invest, you put 20% in bond, 50% in shares, etc.

Sometimes you don’t keep it for a long time. But only a few days, or even a few hours. This is called trading portfolio. Shares, FX, commodity, derivative, etc.

The value of your portfolio (be it investment or trading) can go up or down depending on 4 factors: the share prices, FX rates, interest rate and commodity prices. These 4 factors is called market risk, because the prices of these 4 factors are determined by the market (the buyer and the seller).

23 September 2015

Investment Banking

Filed under: Business Knowledge — Vincent Rainardi @ 7:45 am

Traditionally, the core business of an investment bank (IB) was to help companies raise funds in the capital market  and doing merger & acquisition. In addition to these 2 core services, IBs also offer these services to clients: research, fund management, trading, market making and wealth management. IBs also do trading for themselves (using their own money, called prop desk).

Let’s take a look at these services one by one. But before that, let’s quickly describe sell side and buy sides, and private and public sides.

In the investment banking world there are 2 sides: the sell side and the buy side. The sell side (link) are companies that sell investment services, for example: an IB which does broking/dealing, raise funds in capital market, M&A/advisory, and research. Buy side (link) are companies that buy investment services, for example: private equity funds, mutual funds, life insurance company, hedge funds, and pension funds.

Within an investment bank, we have 2 sides: private side and public side. Private side is the part of the bank which have access to inside company information (i.e. their clients) which are not available to the public. For example: M&A division and capital market division (Debt Capital Market/DCM and Equity Capital Market/ECM). Public side is the part of the bank which only have access to public information. For example: trading and research. Between these 2 side we have a “chinese wall”, which separate the 2 parts of an investement bank. The 2 parts must not (by law) exchange information. Chinese wall is a fundamental principle that has to be considered very seriously when designing IT systems for an investment bank.

Now let’s take a look at the services of investment banking:

  1. Raising capital is basically issuing bonds or equity (IPO or secondary offering). The bank acts as underwriter, meaning that the bank (usually a syndicate) buys all the bonds or shares from the company, then sells it to the market with spread (for stock) or fee (for bond). This require a lot of corporate finance work.
  2. M&A is the original meaning of “investment banking”, i.e. to find the client a buyer, or to find the client a company to buy (takeover, acquisition). Or, to have an idea that if company A & B are merged there would be advantage for both companies, such as synergy, vertical integration, increased market share or economy of scale, then try to sell the merger idea to both companies.
    There is also spin off or de-merger, where some part of a company is detached (created as a new company), and then sold off to another company. M&A also involves a lot of corporate finance work. M&A is also called “advisory”.
  3. Research covers equity research, fixed income research, macro economic research, technical analysis, quantitative analysis. In addition to individual companies, equity research provides industry trends, market trends, sector weightings and geographical preferences. Technical analysis (link) studies the historical price to predict the future direction in a particular market or a single-name issue. Research also provides tools which enables clients to access forecasts and evaluate capital structure, and to search for a specific company/sector/year/asset class.
  4. Fund management manages clients’ money in mutual funds (open ended) or investment trusts (close ended). Covering many sectors, i.e. by asset class (equity, bond, cash, commodity), by geography (UK, US, European, Global), by type (growth, income, recovery, absolute return).
  5. Trading buys and sells securities in the capital markets, on behalf of the clients. Covers various asset classes including equity, credit, FX, commodity, securitized, prime, multi asset and tailored. Tailored brokerage offers tailored off-market transaction such as distressed situations, sale-and-leaseback and company expansions (link). Prime brokerage (link) offers services for clients to borrow securities and trade/invest on netted basis and leveraged basis (link).
  6. Market making: provide liquidity in the market by quoting both buy and sell price (simultaneously) in a share or a bond or a commodity, usually narrower than the market spread (link), hoping to make money from the bid-offer spread (link).
  7. Wealth Management: provide advisory on financials and investments to high net-worth individuals/families, as well as the work/execution. This includes retail banking, estate planning, will, tax and investment management (link). Aka private banking, which is misleading because wealth management is not only banking but also legal, tax, and investing services.

Some of the top investment banks are (link): Goldman Sachs, Morgan Stanley, JP Morgan, Bank of America Merrill Lynch, Deutsche Bank, Citigroup, Credit Suisse, Barclay, UBS, HSBC, Nomura, RBC, BNP Paribas, RBS.

19 September 2015

Investment Banking Books for BAs and Developers

Filed under: Business Knowledge — Vincent Rainardi @ 7:04 pm

A friend recently asked me to recommend books in investment banking and this was the list I came up with. The intended audience of these books are people who don’t have investment banking background or work experience, but have experience in database development or data architecture. So it is more of “I’m a BA, developer or architect with retail / healthcare / manufacturing experience and want to get into investment banking or asset management* (as a BA/developer/architect, not as a trader or financial analyst!), what books should I read?” So it’s kind of “I want to learn the business processes from IT point of view”.

There are two meanings of the words “investment banking”. Traditionally it means Merger, Acquisition and LBO (Leveraged Buyout). It is about analysing financial statements, valuation methods, and M&A modelling. These are skills and knowledge required to do the traditional business of an investment bank, which is to help clients raising capital by issuing securities as well as advising clients on M&A (link). I went to an investment banking course with IBI in 2011 and learned these traditional functions to my surprise. The second meaning is: an investment bank is a bank who trades financial securities, or acting as an intermediary in the trade as brokerage or market maker, as well as providing analysis, research and ratings (aka the “sell side” of Wall Street). The buy side of Wall Street are investment companies such as asset manager, who buy securities for investment purpose and fund management (see below).

Below I’m suggesting one book for each area of investment banking (both meanings above), as well as the buy side.

  • Introduction: Investment Banking Explained: An Insider’s Guide to the Industry, by Michel Fleuriet, link
  • Trading: The Trade Lifecycle: Behind the Scenes of the Trading Process by Robert P. Baker, link.
  • Equity: Investments: Principles of Portfolio and Equity Analysis (CFA Institute Investment Series) by Michael McMillan and Jerald Pinto, link.
  • Fixed Income: Fixed Income Analysis by Frank J. Fabozzy, link.
  • Derivatives: Derivatives Demystified: A Step-by-Step Guide to Forwards, Futures, Swaps and Options by AM Chisholm, link.
  • Merger & Acquisition: Investment Banking: Valuation, Leveraged Buyouts, and Mergers and Acquisition, by Joshua Pearl & Joshua Rosenbaum, link.
  • Asset Management*: A Guide to Fund Management by Daniel Broby, link.
  • Risk: Risk Management and Financial Institution by JC Hull, link.

*Asset Management or Investment Management is an industry sector containing investment companies, which manage pooled funds or segragated mandates from clients, and invest in stock market, bond market, currencies,  properties, cash or in other investments such as commodities, derivatives, etc. These investment companies are called Asset Managers or Fund Manager (link), e.g. Schroders, Invesco Perpetual, Fidelity, Blackrock. Investment banks like JP Morgan, Credit Suisse, HSBC, and UBS also have asset management division (link, link).

There are departments in Investment Banking which are not listed above, i.e. Finance, Compliance, Treasury, ALM. But these departments exist in all 3 types of banking (retail banking, corporate banking and investment banking), not just investment banking. So below I list the books in banking, not just investment banking, including those departments above. But again this is for IT developers (not system analyst!) or BAs or architects who wants to get banking knowledge or a job in banking.

  • Introduction: FT Guide to Banking by Glen Arnold, link.
  • Retail Banking: Retail Banking by Dr Ramamurthy Natarajan, link.
  • Corporate Banking: Corporate Banking: A guide book for novice by Dr Ramamurthy Natarajan, link.
  • Central Banking: Central Banking: Theory and Practice in Sustaining Monetary and Financial Stability by Thammarak Moenjak, link.
  • Treasury: Treasury Operations Handbook by Philip JL Parker, link.
  • Compliance: Financial Regulation and Compliance: How to Manage Competing and Overlapping Regulatory Oversight, by H David Kotz, link.
  • Finance: Accounting and Finance: An Introduction by Dr Peter Atrill & Eddie McLaney, link.
  • ALM: Bank Asset and Liability Managment: Strategy Trading Analysis by Irving Henry & Moorad Choudhry, link.

My banking experience: I was lucky to have a bit of investment banking and asset management experience since 2011 to date working at RBS (credit risk and credit portfolio management), Barclays (fixed income, CDS), UBS (finance, reporting), Bluebay (asset management, fixed income, risk), Insight Investment (asset management, LDI and fixed income). I had my retail and corporate banking experience when I was working for Andersen Consulting (now Accenture) in Jakarta, Indonesia, where we did a system project for Bank Exim (corporate banking), a BPR project also at Bank Exim (retail banking) and a Merger & Acquisition project at Bank Mandiri, the largest bank in Indonesia. I was fortunate that my father was a bank manager, worked for 3 banks in his career (Bank Ekonomi Indonesia, Bank Karman, Bank Umum National), from whom I got my banking passion and inspiration.

Magazines and news websites in Investment Management sector are listed below. This list is UK focus.

  • Investment Week, link
  • Global Investor, link
  • Institutional Investor, link
  • Portfolio Advisor, link
  • The Hedge Fund Journal, link
  • Risk.Net, link
  • What Investment, link
  • Institute of Asset Management, link
  • Fund Web, link
  • Financial Advisor, link

Magazines and news websites in Investment Banking (trading, FX, credit, equity, derivatives, M&A) are listed below. It is by no means comprehensive as I only spent like five minutes on it, with the intention to enhance the list over time, i.e. removing the one which are not so useful, and adding new ones.

Data Types for Common Columns

Filed under: Analysis Services — Vincent Rainardi @ 4:50 pm
  1. People’s name: nvarchar(50) for surname or first name, nvarchar(100) for full name. Norwegian and German names have special characters hence unicode.
    I prefer surname than last name, and given name than first name. But in most cases I don’t have a say because column name should match the screen.
  2. Company name: nvarchar(100). According to UK’s company house the longest they found was 82 character (link). Unicode (nvarchar) of course, to cater for German and Scandinavian companies.
  3. Address: nvarchar(300). This cover street1 (70), street2 (70), street 3 (70), local area (50), but not city/town, state/county/provice, post code and country. This is for international address, not country specific. I don’t believe in 255 (link).
  4. City or town: varchar(100). The record is a town New Zealand with 85 chars, followed by Wales with 58 chars (link). I usually set it to 50 as none of the big cities are more than 50 but I think there are small towns which more than 50.
  5. State/county/provice: varchar(100). This is for international location, not country specific. “Newfoundland and Labrador” in Canada is 25.
  6. Country: varchar(100). Formal name of UK is 52 chars (UK of GB and NI) but it is usually United Kingdom (14). I think the longest one is “South Georgia and the South Sandwich Islands” for now (44), but to future proof it I prefer 100 than 50. Got it: “Territory of the French Southern and Antarctic Lands” is 52 chars (that’s the formal name, yes the ISO name is only 35 chars)
  7. Email address: varchar(300). The official length is 254 (link). I don’t believe in 255 (link).
  8. Post code: varchar(20). 8 in UK and 5 in US and Indonesia, 10 in Brazil, Guam, Iran, and Saudi. So I’m taking 20 for future proofing it.
  9. Telephone numbers: varchar(30) is enough for international telephone number in any country. The longest is probably India and China. India’s mobile is 16 chars e.g. +91-AAA AAB BBBB. China is 17: +86 755 XXXX YYYY.

12 September 2015

EAV Fact Tables

Filed under: Data Architecture,Data Warehousing — Vincent Rainardi @ 4:20 am
Tags: ,

A few weeks ago I came across EAV fact tables. EAV is Entity Attribute Values data model (read this for a background on EAV structure). It is a data model which enables us to add column into a table without actually changing the structure. At first I thought this EAV approach has no place in Kimball dimensional model. But after thinking and rethinking I saw that it had some advantages too, not just disadvantages. And in some cases it is appropriate to be used.

So below I’ll explain what it it is, the advantages, the disadvantages and the verdict about it.

What does it look like?

An EAV Fact Table looks like this:

What does an EAV Fact Table look like

The above example is taken from retail industry (a fashion retailer), where they analysed the profitability of each product line every day. What is a product line? They have 9 product groups: women, men, kids, shoes, handbags, accessories, watches, jewelry, beauty. The women product group consists of 23 product lines: dresses, tops, jumpers, cardigans, shirts, T-shirts, blazers, capes, jackets, coats, skirts, shorts, trousers, jeans, leggings, tights, socks, nightware, lingerie, leisureware, swimware, suits, new.

The above EAV fact table is product line performance fact table. Every day, based on the sales figures, profit margins, direct costs and overheads, they calculated various performance measure for each product line: sales growth, profit margin, product distribution, margin stability, cost effectiveness, price variation, colour variation, size variation, style variation, etc. TypeKey 217 means 1 week sales growth, 218 means 1 week margin stability, 219 is product distribution, and so on. Some measures are time-based, so they have periods such as 1 day, 1 week or 1 month. Some measures have 2 versions: net and gross. Some measures have location variations i.e. local and global. Similar measures are grouped.

Performance measurements are different for each product line. Measurements applicable for a product line may not be appliable to other product line. Using EAV structure fits the structure of performance measurement data, and makes it flexible. Because there are so many product lines with so many different performance measures, almost each week they have a new performance measure. This is because in the business analytic software they can create a new measure at any time. They define the formula for that measure, and on which product lines the measure are applicable, the different range of time periods applicable to that new measure, whether it is gross or net measurement, and whether it is global or local based.

What’s bad about it?

The main down side of EAV fact tables is: when we query the fact table we may need to pivot it. If we need to retrieve the weekly margin stability for every product lines, we could just filter on Performance Type Key = 128 and we get what we need. But if we need to retrieve all time variances for sales growth (1 week, 2 weeks, 1 month, etc) for a certain product lines, then we will need to pivot the data.

This pivoting is annoying because we have to hardcode the performance type names to make them as columns. At times it could be so annoying that we wished we had a normal Kimball style fact table so we didn’t have to pivot the data.

What’s good about it?

In Kimball star schema, in the fact tables the measures are created as fixed columns. If we have a new measure, we will need to create a new column. If the retail analytics software is churning out new measures almost every week, our development cycle will not be able to cope with the pace. In a mature warehouse, it will probably take about a month to complete the analysis, design, development, testing and release process, just to add 1 column. Equal to probably 10 man days.

If the employee cost rate is $550 per day (sum of salary, bonus, medical, dental, vision, relocation, life insurance, accident insurance, training, pension, cost of HR staff, childcare, cost of facilities, and 401k for a year, say $120k, divided by 250 working days per year minus 30 days vacation and sick time) that 1 column would cost $5500. It is very not cost effective.

Having a flexible structure like the EAV fact table means that we don’t have to worry about new performance measure churned by the retail analytic software almost every week, ever. That saves $5500 per week, or $275,000 per year, which is a lot of money.

Final verdict

If new measures are created quite often (more than once per quarter) and the measures are different for every product lines (or every customer, or other dimensionality such as instrument or asset type), then the EAV Fact Table approach is justified.

If the measures are quite stable (changes are less than once per quarter), and the measures are the same for every dimensionality, then an EAV Fact Table is not justified. We should build it as a normal, Kimball-style fact table.

But that is just my opinon. You could have different experiences, and therefore different opinions. Hence I would like to hear your comments in this topic.

9 September 2015

Loading a Dimension Table using SSIS

Filed under: Analysis Services — Vincent Rainardi @ 7:46 am

Someone recently asked my opinion on loading a dimension table using MS BI / SSIS, i.e. comparing the SCD Transformation, the Merge Join + Conditional Split transformation, the Change Data Capture and the Merge command. Below is my opinion. I would love to hear your comments, opinions, and experience on this topic.

SCD Transformation created by SCD Wizard is not efficient in terms of performance, as I explained here. This is because it is doing row-by-row operation. The Dimension Merge SCD Component (link) tried to address the issues in the SCD Transformation and SCD Wizard.

We can use Merge Join transformation where the souce rows are merged with the dimension rows on the business key column. Then we use a Conditional Split transformation to update or insert into the dimension table based on the surrogate key column (if SK is null then insert, if SK is not null then update). Masud Parvez provides a good example here. But this is also a row-by-row operation so I would not use this approach when the dimension table is large. In particular, I don’t like the fact that we have to read the dimension table first and make it as a source (to be merged with the incoming source rows). This is highly inefficient. I also feel that using Merge Join + Conditional Split is cumbersome, compared to using a separate Update and Insert command.

Change Data Capture in SQL Server provides an output table with the same structure as the source table, with five additional columns (Start Log Sequence Number, Sequence Value, Operation, Update Mask), see here. CDC is a mechanism to extract the data change from the source system. It is not a mechanism to insert or update a dimension table. To load a dimension table, we will need to read the CDC output table, and update or insert into the dimension table based on the _$Operation column (1 is a delete, 2 is an insert, 3 and 4 are update).

Using Merge SQL command to load a dimension table is explained quite comprehensively by Alex Whittles here. First of all, we need to be aware of the bugs with the Merge command as Aaron Bertrand explained here, some of which continue to exist even until version 2012 as they are “by design”. We also need to be aware of the concurrency/locking issues, resulting in PK Violation error, as Dan Guzman explained here. Then we need to prepare the index. Microsoft recommend a clustered index in the target table, but in the dimenson table the clustered index should be the surrogate key column, not the business key column.

Merge command is less efficient compared to separate Update and Insert commands, because when we use separate Update and Insert commands, the Insert command does the Insert for both the new rows and the existing rows in one go. The Merge command on the other hand must do the Insert twice, once for the new rows and another one for the existing rows.

My preferred approach to load a dimension table using SSIS is to bring the source rows into a staging table, then use Execute SQL command to do the following, which I think is the most efficient way:

  1. Compare the stage table and the dimension table to find out which rows already exist in the dimension table, and those rows have changed.
  2. Update the dimension table, to set the expiry date of the rows which have changed.
  3. Insert the new rows and the changed rows into the dimension table.

For example: we have a dimension table called dim_country in the Data Warehouse (DW) with 5 columns: country_key, country_code, country_name, effective_date, expiry_date, is_current. In the staging table is called country, which has 2 columns: country_code and country_name.

update dw.dim_country set expiry_date = getdate(), is_current = ‘N’
from the s
inner join dw.dim_country d on d.country_code = s.country_code
where d.country_code = s.country_code and d.country_name <> s.country_name

insert into dw.dim_country ( country_code, country_name, effective_date,
expiry_date, is_current )
select s.country_code, s.country_name, getdate() as effective_date,
’31 Dec 9999′ as expiry_date, ‘Y’ as is_current
from the s
left join dw.dim_country d on s.country_code = d.country_code
where d.country_name is null –new rows
or (d.country_name <> s.country_name) –changed rows

Blog at