Data Warehousing and Data Science

7 May 2016

U-SQL

Filed under: Analysis Services — Vincent Rainardi @ 7:26 pm

U-SQL is a new language used in Azure Data Lake. It is a mix between T-SQL and C#. It is used to load files into Data Lake.

To use U-SQL, we need to add a New Data Lake Store in our Azure Portal, click Sign Up for Preview (we need to use IE; in Chrome we can’t click Sign Up). We will get an email when it is approved (can be a few hours, can be a few weeks).

Here is an example for calculating the duration for each currency in the portfolio:

@rawData = EXTRACT SecurityName string?, Weight decimal?, ModifiedDuration decimal, MaturityDate DateTime, Currency string
FROM “/FixedIncome/EMD.csv” USING Extractors.Csv(silent: true);

@aggData = SELECT Currency, SUM(Weight * ModifiedDuration) as Duration FROM @rawData GROUP BY Currency;

OUTPUT @aggData TO “/Report/Duration.csv” ORDER BY Currency USING Outputters.Csv();

The above U-SQL code imports a comma delimited file called Portfolio.csv, calculates the duration for each currency and create a file called Duration.csv. I’ll explain the 3 lines above (EXTRACT, SELECT, OUTPUT) one by one below.

EXTRACT

Case sensitive: U-SQL is case sensitive and strong-typed. It is case sensitive because it is using C# syntax. So “EXTRACT” must be in upper case, “string” must be in lower case and “DateTime” must be in mixed case.

EXTRACT is used to import a file into a row set. It can be used to import both a structured file (all rows have the same number of columns) or unstructured files (rows have different number of columns). Documentation: here.

The data type is C# data type, not T-SQL. Full list of C# built-in data types is here. DateTime is a structure, not a data type. It has Date, Time, Year, Month, Day, Hour, Minute, Second, Today, Now, TimeOfDay and DayOfWeek properties. See here for documentation.

The ? at the end of “string?” and “decimal?” means NOT NULL.

One thing we need to remember is that in U-SQL it is “schema on read”. Which means that we define the data types of the file we import as we import them.

When importing a file we need to use an extractor. It is a data interface program which reads byte stream in parallel into a row set. U-SQL has 3 built-in Extractors: one reads text files (Extractors.Text), one reads tab delimited files (Extractors.Tsv) and one reads Comma Separated Values files (Extractors.Csv). The last two are derived from the first one. By default U-SQL split the imported file into several pieces and import the pieces in parallel. U-SQL imports unstructured files using user-defined extractor. See here for documentation on extractors.

Silent is a parameter. It means if the row in the CSV file doesn’t have 5 columns, ignore the row (we expect the CSV file to contains 5 columns: SecurityName, Weight, ModifiedDuration, MaturityDate, and Currency). There are other parameters such as: (documentation here)

  • rowDelimiter: the character at the end of the row, e.g. carriage return, linefeed
  • quoting: whether the values are wrapped in double quotes or not
  • nullEscape: what string represents a null value

@rawData is the name of the row set containing data we import from the CSV file. We can then refer to this row set with this name.

SELECT

Once we have this @rawData row set, we can do a normal SQL statement to it, such SELECT, UPDATE, INSERT and DELETE. But we need to use C# syntax. So the where clause is not “where AssetClass = “Options” ” but “where AssetClass == “Options” ”. And SELECT, UPDATE, INSERT, DELETE all needs to be in upper case. Other supported SQL statement in U-SQL are: JOIN, UNION, APPLY, FETCH, EXCEPT, and INTERSECT.

  • SELECT: U-SQL supports WHERE, FROM, GROUP BY, ORDER BY and FETCH. Documentation here. FETCH means get the first N rows, link. For ALIAS see here.
  • JOIN: U-SQL supports INNER JOIN, LEFT (OUTER) JOIN, RIGHT (OUTER) JOIN, FULL (OUTER) JOIN, CROSS JOIN, SEMIJOIN, and ANTISEMIJOIN. Documentation here. Both the “ON T1.Col1 == T2.Col2” and the old style “WHERE T1.Col1 == T2.Col2” are supported (link).
  • UNION: U-SQL supports both UNION and UNION ALL. Doc: here.
  • APPLY: U-SQL supports both CROSS APPLY and OUTER APPLY. Doc: here.
  • Interaction between row sets (these are hyperlinked): EXCEPT, INTERSECT, REDUCE, COMBINE, PROCESS.
  • Aggregate functions: SUM, COUNT, FIRST, LAST, MAX, MIN, AVG, VAR, STDEV, ARRAY_AGG, MAP_AGG. Doc: here. ARRAY_AGG combines values into an array. Split breaks an array into values (link).
  • Operators: IN, LIKE, BETWEEN. Doc: here.
  • Logical operators: Remember that U-SQL is using C# syntax so AND is &&, OR is ||, = is ==, <> is != and not is !
  • “condition ? if_true : if_false” is also supported, see here.

 

OUTPUT

The output of a SQL statement (called Expression in U-SQL) can be assigned to a row set like in the example above i.e. “@aggData = “, or we can do these 3 things:

  1. Output it to a new file using OUTPUT
  2. Output it to a new table using CREATE TABLE AS
  3. Output it to an existing table using INSERT INTO

The OUTPUT expression saves the row set into a file. There are 3 built-in outputters in U-SQL: Outputters.Tcv for tab delimited files, Outputters.Csv for comma delimited files, and Outputters.Text for any delimiter. They accept these parameters: quoting, nullEscape, rowDelimiter, escapeCharacter, and encoding. Documentation: here.

Documentation

The U-SQL language reference is here (still work in progress), and the U-SQL tutorial is here.

Data Lake documentation is here and the learning guide is here.

Articles on U-SQL from Microsoft:

  • From Microsoft U-SQL team: link
  • Ed Macauley U-SQL tutorial: link, click on Tutorial under Prerequisites.
  • From Mike Rys: link

3 May 2016

Asset Management Business Processes and Systems

Filed under: Business Knowledge — Vincent Rainardi @ 7:46 am

I am passionate and fascinated with Asset Management sector. It is a complex industry. To work as a BA in asset management we need to know both investment banking (security, trading, credit risk, compliance) and investing (portfolio management, asset classes, hedging, performance). In this article I will try to explain asset management business from system point of view, i.e. what systems are required to run the business. But first a brief intro about the business itself.

Business Overview

An asset management business makes money from the fees they charge the clients. There are 7 main business processes in an asset management company:

  1. Product development
  2. Distribution (means sales and marketing)
  3. Client on-boarding
  4. Investment management
  5. Risk management
  6. Client reporting
  7. Compliance

To do the above 7 business processes we need these 7 departments: investment, IT, marketing, legal, finance, HR, facilities.

  1. Product development is the process of creating a product. A product in asset management is either a public fund, a pooled fund, or a managed portfolio (discretionary or mandate). The product development team must decide the product goals, asset allocation strategy (including limits), hedging policy, restrictions (not allowing repo for example), the benchmark and the fund manager. If it is a public fund, the team must decide (with the help of the legal team) the legal structure, the trustee/depositary, the registrar, and the country of domicile (for tax purposes). With the help from marketing and finance, they also need to determine, launch date (fund launch and unit launch), the sales policy, the distribution channels (including commission structure, marketing strategy, and platforms), capacity (maximum size for soft close), the share classes, ISA wrapper, minimum investment (initial and monthly), charges & fees, the spread, the pricing (single/dual), valuation point, and seed money.
  2. Distribution is basically sales and marketing. Marketing covers advertising and communications with pension funds, investment advisers, family offices, investment manager databases, brokers, asset managers, HNWIs, and financial services companies; and create marketing material and literature. Sales meet with prospective institutional clients (e.g. pension funds, pension scheme trustees), draft and review client agreements, being an account manager, manage client relationships, sorting out client needs and requests, reviewing the performance of client mandates, and proactively develop the business.
  3. Client on-boarding process is about gathering information and requirement from a new client, and enter it into the system (including background check, account setup, credit ratings), drafting and signing agreements for client mandates (legal will review first). We need to ensure that we comply with the regulations at all times, particularly KYC (Know Your Customer), AML (Anti Money Laundering), FATCA (Foreign Account Tax Compliance Act), EMIR and MiFID II. We need to agree with the client whether it is execution only, advisory or discretionary mandate; from which office we want to serve them (EU, non-EU, US), the asset allocation, portfolio strategy, portfolio limits and policy, fees structure, tax & payment withholding, before eventually we can receive their money to be invested.
  4. Investment management: once we receive client’s money, the investment process begins. The fund managers (FM) will start creating orders to purchase securities in the order management system (OMS) such as SimCorp, Charles River or ThinkFolio. The orders/trades will then be aggregated, executed, settled, and allocated. The fund managers use a portfolio management system (PMS) to construct the portfolio, manage asset allocation, monitor exposure and holdings, and manage risks, as per the investment mandates. They also use the PMS to run what-if scenarios and assess the profit & loss based on daily market prices, and create a composite portfolio with a blended benchmark. The Performance Team uses a performance and attribution system (PAS) to understand the performance of the fund against the benchmark (for each share class), attribution of performance for each factor e.g. security selection, asset allocation, duration, curve, and currency.
  5. Risk management is about understanding the sensitivities of the portfolio to interest rate changes (duration, DV01, Greeks), understanding exposure to different sectors and currencies, understanding ex-ante risks such as VaR, tracking error and volatility; hedging credit exposure, and other risk measures such as OAS, fair value, spread, and liquidity risk. These processes are usually performed a risk team which will inform the FMs of any risk found above tolerance. The risk team will also conduct stress testing by running particular scenarios against the portfolio: FX rate changes, interest rate shifts, credit spread changes, or against certain historical events such as 9/11 or an incoming event such as Brexit. It is also about managing counterparty risk and credit risk using collateral based on MtM, particularly for OTC derivatives.
  6. Client reporting is about updating clients how their portfolios are performing (against the benchmark and attribution), the asset allocation (by asset class, rating, sector, duration and maturity), and the risk profile (sensitivities to interest rates, inflation, credit risk, and other factors). The reports also contain updates on market situations, market analysis from the fund managers (and how they position the portfolio to take advantage of it), and transactions during the period (what securities were purchased/sold and why. This is usually performed not by the FMs, but by a separate client reporting team. If it is a public fund, the asset manager needs to produce KIIDs (soon to be KID) and fact sheets which contain the fund objective, investment policy, fund performance, asset allocation breakdown, top holdings, fund details, fees & changes, risk profile, and fund manager commentary.
  7. Compliance processes are about ensuring that all portfolios that the asset manager has are as per the investment objectives and restrictions. For example, if a portfolio must hedge at least 80% to Sterling, every day the compliance process will test this boundary. Other examples of the portfolio objectives and restrictions tested by compliance process are: no equity, IG fixed income only (no HY), emerging market only, no derivatives, no MBS, gilt only (no corporates), equity only (no fixed income), duration > 5y. It is important to test these boundaries every day because a) it could be regulatory (e.g. a GBP fixed income fund is not allowed to invest in equity or in other currency), b) breaking the mandate (e.g. the mandate from a pension fund which have a strict rule not to invest in derivative, because it is too risky), and c) client’s trust (e.g. if a fund manager doesn’t stick to the rules, existing clients will switch and potential clients will choose other asset managers and the AUM will dry up).

Systems Required

What systems are required to support the above 7 business processes?

  1. Investment Management System (IMS): this is a combination of order management system (aka trade management system), portfolio management system, risk management system, compliance system. Order management is a must have feature, i.e. ability to place trades with brokers and bank (Omgeo CTM and Swift messages) from origination, execution, validation, confirmation, clearing and settlement. Every good IMS must check compliance / portfolio limits before a trade is submitted, for example if High Yield is only allowed 20% and a trade will make the portfolio having 21% HY, the IMS will prevent the trade to be submitted. Examples of IMS: ThinkFolio, Charles River, SimCorp, EBIMS, IMSplus, e-AMIS, Vestio, Nirvana, FC Portfolio, VestServe.
  2. Performance Management System (PMS): it takes the daily prices of each security and daily benchmark values, calculate the NAV for each share class (value of the portfolio minus liabilities), and provide the official unit price for each share class, based on which the 1m, 3m, 6m, 1y, 3y, 5y, and SI performance is calculated. A very important feature of a PMS is performance attribution, which is the ability to determine how much each factor contributes to the performance. For example, out of 12% performance in the last 1y, 3% of it is because of security selection, 4% because of currency, and 5% because of duration. A good PMS provides GIPS compliant numbers. Examples of PMS: Barclays Point (now Bloomberg), Eagle (aka Pace), StatPro, VestServe.
  3. Risk Management System (RMS): the risk manager module within IMS usually only provides the analytics (e.g. duration, spread, yield, convexity) but it does not calculate value at risk, volatility, beta, alpha, Sharpe ratio, and tracking error because they are too complex. An RMS calculates them, as well as doing stress testing, counterparty risk and liquidity risk. A good RMS can calculate the contribution of each factor to the portfolio VaR, volatility and Beta (e.g. out of 300 bps VaR, 90 is because of government bonds, 110 because of corporate sector selection, 80 is because of curve, and 50 because of currency), or even down to the individual security, as well as calculating Component VaR, Marginal VaR and Incremental VaR. Examples of RMS: Risk Metrics, IBM Algo Risk, Risk Value, Kraytis.
  4. Client Relationship Management System (CRM): not only CRM supports the client on-boarding process (including contract/agreement, mandates, investment guidelines, AUM, KYC, AML, and detailed customer information), but also managing prospects & opportunities, deal conversion, sales targets, client meetings, reporting requests, mandate changes and revenue tracking. A good CRM system can support fundraising, product road shows, managing reporting and compliance on client-by-client basis, mass mailing campaigns, statement generation, pipeline management, client portal, and mobile access. Examples of CRM systems: SalesForce, Maximizer, Satuit, Dynamics, WDX, Communica, ProTrak, Dynamo. Mark Margolis wrote a good article on CRM for asset management.
  5. Client Reporting System (CRS): clients need to know how their investments are doing. This includes performance, attribution, risk profile, asset breakdown, and transactions. Client also need specific reports such as regulatory (e.g. Solvency II for insurance companies), duration and maturity breakdown (for fixed income houses), inflation sensitivities (for pension funds), counterparty risk (for banking clients), currency hedging, etc. We can satisfy these requests individually (fully tailored solution for each client), or we can satisfy them en-mass (create one solution that can be used by clients because it can do everything). Example of CRS are Kurtosys, Vermillion, Pulse, Comarch, and SS&C.
  6. Compliance system: checks all portfolios and positions every day for compliance breach and mandate/guideline breach.
  7. Fund administration: maintain a register of all shareholders (for investment trust) and investors (for unit trust/SICAV/OEIC), record all subscriptions and redemptions, calculate AUM for every fund share class every day taking into account all redemptions and subscriptions. An example of a fund administration system is IBS PAMS.
  8. Fund accounting: calculates the NAV for every portfolio/funds every day, based on EOD market prices, taking into account all liabilities.
  9. Collateral management: calculate mark to market valuations for every OTC derivatives (IRS, CDS, repo, etc.) and calculate the value of every single collateral with every single counterparty and determine how much over/under our counterparty position, and manage it.
  10. OTC pricing: calculate the prices of every OTC derivative contract that we have including swaps (IRS, inflation swaps, currency swaps, CDS) and options. Also calculates the derivative of price (the Greeks) e.g. Gamma, Vega, Theta.

 

Blog at WordPress.com.