Data Warehousing and Data Science

27 February 2011

SSIS: Importing Files – Read The First N Rows

Filed under: SSIS — Vincent Rainardi @ 8:17 am

Sometimes we need to read the first few rows of the file, not the whole file. Usually this is because the file contains a header that gives us information about the structure of the file, like this:


In the above example we want to read only the first 2 rows, which are the header. But SSIS has no built-in facility to read only the first N rows; we have to read the whole file, which is ridiculous if the file has 1 million rows. So how do we read just the first N rows?

To read the first N rows there are 3 alternatives:

  1. Use Konesans Row Number Transformation.
  2. Put all rows into a table then select top N from it.
  3. Use a script component as a source or transform.

Using Konesans Row Number Transformation

Thanks to Allan and Darren, the Row Number Transformation gives us a row number for every row in the file. This row number is added as a column to the output of the data flow; for further details please see their web site. We can then use a Conditional Split to split the data flow, like this:

One output of the Conditional Split, called Header in this case, contains only the first 2 rows, which can then be loaded into the target table.

Using this method we still have to read the entire file, as summarised nicely by CTKeane here. So practically speaking this approach is ideal if the file we are importing is small, e.g. 1000 rows. However, it is also suitable for big files if we want to read or process the remainder of the file as well. Say the file consists of 1 million rows: the first 2 rows are the header, and the remaining 999,998 rows are the details. We can then process the default output of the Conditional Split, which contains the detail rows.
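The split condition itself is just an SSIS expression on the row number column that the transformation adds (the column name RowNumber below is whatever you configured in the component, and the literal 2 could equally be a package variable such as @[User::N]):

```
RowNumber <= 2
```

Rows matching this condition go to the Header output; everything else falls through to the default output.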

The second thing we need to be aware of is that it is a third-party component, which our infrastructure managers might not be willing to support. So before going down this route it is wise to check first with whoever is in charge of the production/UAT/dev servers in your company/project.

Putting All Rows into a Table Then Selecting the Top N

In this approach we import the whole file into a staging table, then select the top N rows from that staging table. Again, this is fine if we want to process the remainder of the file as well as the header. But if we don't want to read the rest of the file, this approach doesn't save us anything: we still load every row into the staging table.

If the file that we are importing consists of 2 sections, like the above example, then when defining the flat file connection manager we need to define the file as 1 column, and only 1 column. When we query that 1-column table we then split the long string into several columns based on the delimiters.

Transact SQL does not have a string split function like C# does. We can use charindex, which gives us the position of a delimiter. So the idea is that we walk the long string, and each time we find a delimiter we use substring to extract that column's data. For an example see this post on the SQLTeam web site.
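As a sketch of this idea (the table and column names FileStaging, RowId and RawLine here are made up for illustration, assuming a one-column staging table with a pipe-delimited file), splitting the first two columns out of the long string could look like this:

```sql
-- Hypothetical staging table: FileStaging(RowId int identity, RawLine varchar(8000))
-- TOP without ORDER BY is not guaranteed to preserve file order,
-- so we order by an identity column populated at load time
SELECT TOP 2
    SUBSTRING(RawLine, 1, CHARINDEX('|', RawLine) - 1)            AS Col1,
    SUBSTRING(RawLine, CHARINDEX('|', RawLine) + 1, LEN(RawLine)) AS Col2
FROM FileStaging
ORDER BY RowId;
```

For more than two columns we repeat the charindex walk, feeding the position of the previous delimiter into the next CHARINDEX call via its third argument.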

Use a Script Component as a Source/Transform

If your flat file is of a static structure (fixed number of columns) then you can use the script component as a transform, and use flat file source as normal, as Jamie Thompson described here. But if your flat file contains multiple structures, like in the above example, it is probably best to use the script component as a source.

To use a script component as a transform, first create a flat file source pointing to the file. Then add a script component and choose Transform when SSIS asks us to select Source, Destination or Transform.

  • On the Connection Managers tab we don’t set anything.
  • On the Input Columns tab we select the columns that we want. These input columns will be automatically added to the Input 0 section of the Inputs and Outputs tab.
  • On the Inputs and Outputs tab, rename Output 0 to something sensible, e.g. CustomerRecordFileOutput, then create the Output Columns. Make these columns of type four-byte signed integer [DT_I4]. Also, on the output, set the SynchronousInputID property to None, which makes the component asynchronous (the default is the input, i.e. synchronous).
  • Finally, on the Script tab, set ReadOnlyVariables to User::N, i.e. the package variable containing how many rows we want to read. This is important; otherwise the package variable will not be recognised within the script. Then edit the script as follows (C#):
    int i = 1;  // row counter: a class-level field, so it persists across buffer rows

    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        if (i <= Variables.N)
        {
            Output0Buffer.AddRow();
            Output0Buffer.Col1 = Row.col1;
        }
        i++;
    }


Basically, in the above code, we create a row counter called i. Note that i must persist across buffer rows, i.e. it should be a class-level field rather than a local variable that resets on every row. We increment i for every input buffer row, and as long as i is less than or equal to N we produce an output row; otherwise we don't. Leave the PreExecute and PostExecute methods as they are; we just need to amend the Input0_ProcessInputRow method as above.

The data flow looks like this:

If you use the script component as a source, this is how:

  1. Don’t create a flat file source, just create a flat file connection manager.

    Suppose our source file is like this:


    So create a flat file connection manager with 2 columns, tick the “Column names in the first data row” checkbox, format: Delimited, Header row delimiter: {CR}{LF}, column delimiter: vertical bar.

  2. Add the script component. Choose “As a Source” when asked whether it’s a source, destination or transformation.
  3. Edit the script component and on the Connection Managers tab add the flat file connection that we created in step 1 above. For example, let's call it ConnFile2. Like this:

  4. Then on the Inputs and Outputs tab, rename Output 0 to File2Output. Then click on "Output Columns" and click Add Column twice, calling the columns Col1 and Col2 and setting their data types according to the file you are importing:

  5. On the Script tab, add the package variable N to the ReadOnlyVariables:

  6. Then edit the script as follows: (based on MSDN article, Creating a Source with the Script Component, link)
public class ScriptMain : UserComponent
{
    private StreamReader SR;
    private string File2;

    public override void AcquireConnections(object Transaction)
    {
        // ConnFile2 is the flat file connection manager we added in step 3
        IDTSConnectionManager100 ConnMgr = this.Connections.ConnFile2;
        File2 = (string)ConnMgr.AcquireConnection(null);
    }

    public override void PreExecute()
    {
        base.PreExecute();
        SR = new StreamReader(File2);
    }

    public override void PostExecute()
    {
        base.PostExecute();
        SR.Close();
    }

    public override void CreateNewOutputRows()
    {
        string Row;
        string[] Cols;
        char[] Delimiter = "|".ToCharArray();

        // Skip the header row first
        Row = SR.ReadLine();

        // Read the first N rows only
        for (int i = 1; i <= Variables.N; i++)
        {
            Row = SR.ReadLine();
            if (Row == null) break;   // file has fewer than N data rows
            Cols = Row.Split(Delimiter);
            File2OutputBuffer.AddRow();
            File2OutputBuffer.Col1 = Cols[0];
            File2OutputBuffer.Col2 = Cols[1];
        }
    }
}



The Data Flow looks like this:
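As a rough standalone sketch (outside SSIS, so the names FirstNRows, Read and file2.txt below are illustrative, not part of the SSIS object model), the core of the script source above, skip the header line and then read and split the first N rows, is just this:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class FirstNRows
{
    // Reads the first n data rows of a delimited file, skipping one header line.
    public static List<string[]> Read(string path, int n, char delimiter = '|')
    {
        var rows = new List<string[]>();
        using (var sr = new StreamReader(path))
        {
            sr.ReadLine(); // skip the header line
            string line;
            for (int i = 1; i <= n && (line = sr.ReadLine()) != null; i++)
                rows.Add(line.Split(delimiter));
        }
        return rows;
    }

    public static void Main()
    {
        // A small sample file in the same shape as the post's example
        File.WriteAllText("file2.txt", "Col1|Col2\na1|b1\na2|b2\na3|b3\n");
        var rows = Read("file2.txt", 2);
        Console.WriteLine(rows.Count);   // 2
        Console.WriteLine(rows[0][0]);   // a1
        Console.WriteLine(rows[1][1]);   // b2
    }
}
```

The null check in the loop condition also covers files with fewer than N data rows, which the SSIS script should guard against as well.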


  1. […] specify the last row, which is useful when you need to import only the first N rows (see my post here). But as the source is a SQL table anyway we can of course the Top N on our select statement. So […]

    Pingback by SQL Server Destination or OLEDB « Data Warehousing and Business Intelligence — 28 February 2011 @ 12:30 am | Reply

  2. Thank you very much for this article. It help me a lot!!!

    Comment by Leo — 1 November 2011 @ 9:13 pm | Reply

  3. There is an alternative to the Row Number Transformation, see

    Comment by todorp — 19 December 2011 @ 4:25 am | Reply

  4. You can use T-SQL to get first N rows from file:
    BULK INSERT [tablename] FROM 'filename' WITH (FIRSTROW=1, LASTROW=N)

    Comment by Koit — 18 October 2012 @ 3:42 pm | Reply

  5. Hi. What is this File2OutputBuffer. I am getting an error as “The name File2OutputBuffer” does not exist in current context.

    Comment by Sonali — 2 August 2017 @ 3:00 pm | Reply

  6. Hi Sonali, please see this:
    When we create a Script Component as a source, we need to specify a Connection Manager. In the above case, the Connection Manager is File2.
    Then on the Script Transformation Editor, if you click on Inputs and Outputs in the left pane, you can see File2Output in the middle. Each output has an output buffer, which is empty at first. So to answer your question, File2OutputBuffer is a variable created automatically by SSIS which holds the rows that will go to that output. We can add rows to this output buffer by using the AddRow function.
    So when we have a structure of file header like this:
    we can use this:
    Row = SR.ReadLine();
    Cols = Row.Split(Delimiter);
    File2OutputBuffer.Col1 = Cols[0];
    File2OutputBuffer.Col2 = Cols[1];
    to put a to column1 and b to column 2, so we can examine the header values.

    Comment by Vincent Rainardi — 5 August 2017 @ 5:42 am | Reply
