Data Warehousing and Data Science

7 May 2016


Filed under: Analysis Services — Vincent Rainardi @ 7:26 pm

U-SQL is a new language used in Azure Data Lake. It is a mix between T-SQL and C#. It is used to load files into Data Lake.

To use U-SQL, we need to add a New Data Lake Store in our Azure Portal, click Sign Up for Preview (we need to use IE; in Chrome we can’t click Sign Up). We will get an email when it is approved (can be a few hours, can be a few weeks).

Here is an example for calculating the duration for each currency in the portfolio:

@rawData = EXTRACT SecurityName string?, Weight decimal?, ModifiedDuration decimal, MaturityDate DateTime, Currency string
FROM “/FixedIncome/EMD.csv” USING Extractors.Csv(silent: true);

@aggData = SELECT Currency, SUM(Weight * ModifiedDuration) as Duration FROM @rawData GROUP BY Currency;

OUTPUT @aggData TO “/Report/Duration.csv” ORDER BY Currency USING Outputters.Csv();

The above U-SQL code imports a comma delimited file called Portfolio.csv, calculates the duration for each currency and create a file called Duration.csv. I’ll explain the 3 lines above (EXTRACT, SELECT, OUTPUT) one by one below.


Case sensitive: U-SQL is case sensitive and strong-typed. It is case sensitive because it is using C# syntax. So “EXTRACT” must be in upper case, “string” must be in lower case and “DateTime” must be in mixed case.

EXTRACT is used to import a file into a row set. It can be used to import both a structured file (all rows have the same number of columns) or unstructured files (rows have different number of columns). Documentation: here.

The data type is C# data type, not T-SQL. Full list of C# built-in data types is here. DateTime is a structure, not a data type. It has Date, Time, Year, Month, Day, Hour, Minute, Second, Today, Now, TimeOfDay and DayOfWeek properties. See here for documentation.

The ? at the end of “string?” and “decimal?” means NOT NULL.

One thing we need to remember is that in U-SQL it is “schema on read”. Which means that we define the data types of the file we import as we import them.

When importing a file we need to use an extractor. It is a data interface program which reads byte stream in parallel into a row set. U-SQL has 3 built-in Extractors: one reads text files (Extractors.Text), one reads tab delimited files (Extractors.Tsv) and one reads Comma Separated Values files (Extractors.Csv). The last two are derived from the first one. By default U-SQL split the imported file into several pieces and import the pieces in parallel. U-SQL imports unstructured files using user-defined extractor. See here for documentation on extractors.

Silent is a parameter. It means if the row in the CSV file doesn’t have 5 columns, ignore the row (we expect the CSV file to contains 5 columns: SecurityName, Weight, ModifiedDuration, MaturityDate, and Currency). There are other parameters such as: (documentation here)

  • rowDelimiter: the character at the end of the row, e.g. carriage return, linefeed
  • quoting: whether the values are wrapped in double quotes or not
  • nullEscape: what string represents a null value

@rawData is the name of the row set containing data we import from the CSV file. We can then refer to this row set with this name.


Once we have this @rawData row set, we can do a normal SQL statement to it, such SELECT, UPDATE, INSERT and DELETE. But we need to use C# syntax. So the where clause is not “where AssetClass = “Options” ” but “where AssetClass == “Options” ”. And SELECT, UPDATE, INSERT, DELETE all needs to be in upper case. Other supported SQL statement in U-SQL are: JOIN, UNION, APPLY, FETCH, EXCEPT, and INTERSECT.

  • SELECT: U-SQL supports WHERE, FROM, GROUP BY, ORDER BY and FETCH. Documentation here. FETCH means get the first N rows, link. For ALIAS see here.
  • JOIN: U-SQL supports INNER JOIN, LEFT (OUTER) JOIN, RIGHT (OUTER) JOIN, FULL (OUTER) JOIN, CROSS JOIN, SEMIJOIN, and ANTISEMIJOIN. Documentation here. Both the “ON T1.Col1 == T2.Col2” and the old style “WHERE T1.Col1 == T2.Col2” are supported (link).
  • UNION: U-SQL supports both UNION and UNION ALL. Doc: here.
  • APPLY: U-SQL supports both CROSS APPLY and OUTER APPLY. Doc: here.
  • Interaction between row sets (these are hyperlinked): EXCEPT, INTERSECT, REDUCE, COMBINE, PROCESS.
  • Aggregate functions: SUM, COUNT, FIRST, LAST, MAX, MIN, AVG, VAR, STDEV, ARRAY_AGG, MAP_AGG. Doc: here. ARRAY_AGG combines values into an array. Split breaks an array into values (link).
  • Operators: IN, LIKE, BETWEEN. Doc: here.
  • Logical operators: Remember that U-SQL is using C# syntax so AND is &&, OR is ||, = is ==, <> is != and not is !
  • “condition ? if_true : if_false” is also supported, see here.



The output of a SQL statement (called Expression in U-SQL) can be assigned to a row set like in the example above i.e. “@aggData = “, or we can do these 3 things:

  1. Output it to a new file using OUTPUT
  2. Output it to a new table using CREATE TABLE AS
  3. Output it to an existing table using INSERT INTO

The OUTPUT expression saves the row set into a file. There are 3 built-in outputters in U-SQL: Outputters.Tcv for tab delimited files, Outputters.Csv for comma delimited files, and Outputters.Text for any delimiter. They accept these parameters: quoting, nullEscape, rowDelimiter, escapeCharacter, and encoding. Documentation: here.


The U-SQL language reference is here (still work in progress), and the U-SQL tutorial is here.

Data Lake documentation is here and the learning guide is here.

Articles on U-SQL from Microsoft:

  • From Microsoft U-SQL team: link
  • Ed Macauley U-SQL tutorial: link, click on Tutorial under Prerequisites.
  • From Mike Rys: link


  1. I have a column 00099 in my input file in Blob . How can I extract that column in U_SQL using EXTRACT command as the column name is integer and U-SQL engine is not recognizing the integer columns .

    Comment by Gayathri — 26 July 2018 @ 2:11 pm | Reply

    • Hi Gayathri, try enclosing the numeric column name with square brackets, i.e. [00099].

      Comment by Vincent Rainardi — 27 July 2018 @ 5:00 am | Reply

      • Hi Vincent , It worked ! Thanks a lot 🙂

        Comment by Gayathri — 27 July 2018 @ 7:58 am

  2. Is it possible in USQL to have all the letters of a column in Upper case.If so please tell me how to do it in USQL

    Comment by Shankar — 2 August 2018 @ 7:53 am | Reply

    • Yes it is possible Shankar to make all the letters of a column in upper case, using UPPER function.
      Like this: SELECT UPPER(Column1) FROM Table1

      Comment by Vincent Rainardi — 3 August 2018 @ 6:48 am | Reply

  3. 3. Is it possible to extract the content from the latest file in blob using U-SQL? Like I have multiple files with filename_20180803.csv , filename_20180802.csv . I would like to extract the filename_20180803.csv only

    Comment by Gayathri — 3 August 2018 @ 8:58 am | Reply

  4. Is it possible to extract exponential numbers from the input file using Extract ? Currently I am extracting the column values as string and then converting to decimal in select . Is there any way so that I can directly extract the numbers having E in the values in Extract?

    Comment by Gayathri — 23 August 2018 @ 9:34 am | Reply

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

Blog at

%d bloggers like this: