Sometimes we need to read only the first few rows of a file, not the whole file. Usually this is because the file contains a header that gives us information about the structure of the file, like this:
col1|col2|col3
a,a,a
col1|col2|col3|col4|col5
1,1,1,1,1
2,2,2,2,2
3,3,3,3,3
In the above example we want to read only the first 2 rows, which are the header. But SSIS has no built-in facility to read just the first N rows; we have to read the whole file, which is ridiculous if the file has 1 million rows. So how do we read the first N rows only?
To read the first N rows there are 3 alternatives:
- Use Konesans Row Number Transformation.
- Put all rows into a table then select top N from it.
- Use a script component as a source or transform.
Using Konesans Row Number Transformation
Thanks to Allan and Darren, the Row Number Transformation gives us a row number for every row in the file. This row number is added as a column to the output of the data flow. For further details please see their web site, SQLIS.com. We can then use a Conditional Split to split the data flow, like this:

One output of the Conditional Split, called Header in this case, contains only the first 2 rows, which can then be loaded into the target table.
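The split condition is just a comparison on the generated row number column. As a minimal sketch, assuming the Row Number Transformation's output column is called RowNumber and a package variable N holds the number of header rows (both names are ours for this example), the condition on the Header output would be:
RowNumber <= @[User::N]
or simply RowNumber <= 2 if the header size is fixed.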
Using this method we still have to read the entire file, as summarised nicely by CTKeane here. So practically speaking this approach is ideal if the file we are importing is small, e.g. 1000 rows. However, it is also suitable for importing big files if we want to read or process the remainder of the file as well. Say the file consists of 1 million rows: the first 2 rows are the header, and the remaining 999,998 rows are the details. We can then process the default output of the Conditional Split, which contains the detail rows.
The second thing we need to be aware of is that it is a third-party component, which our infrastructure managers might not be willing to support. So before going down this route it is wise to check first with whoever is in charge of the production/UAT/dev servers in your company/project.
Putting all rows into a table then selecting the top N from it
In this approach we import the whole file into a staging table, then select the top N rows from that staging table. Again, this is fine if we want to process the remainder of the file as well as the header. But if we don't want to read the remainder of the file, this solution doesn't help, because we still have to load every row.
If the file that we are importing consists of 2 sections, like the above example, then when defining the flat file connection manager we need to define the file as 1 column. Only 1 column. When we query that 1-column table we then split the long string into several columns based on the delimiters.
Transact SQL doesn't have a built-in string split function like C# does. We can use CHARINDEX, which gives us the location of each delimiter. The idea is to crawl the long string, and when we find a delimiter we use SUBSTRING to capture that column's data. For an example see this post on the SQLTeam web site.
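As an illustration, here is a minimal T-SQL sketch. It assumes a staging table called Staging with a single varchar column RawLine holding each raw line of the file, plus an identity column LineNo to preserve the file order (all of these names are made up for the example), and a pipe delimiter:
SELECT TOP (2)
    SUBSTRING(RawLine, 1, CHARINDEX('|', RawLine) - 1) AS col1,
    SUBSTRING(RawLine, CHARINDEX('|', RawLine) + 1,
        CHARINDEX('|', RawLine, CHARINDEX('|', RawLine) + 1)
        - CHARINDEX('|', RawLine) - 1) AS col2
FROM Staging
ORDER BY LineNo;
and so on for the remaining columns; each additional column needs another nested CHARINDEX to find the next delimiter.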
Use a Script Component as a Source/Transform
If your flat file has a static structure (a fixed set of columns) then you can use the script component as a transform and use a flat file source as normal, as Jamie Thompson described here. But if your flat file contains multiple structures, like in the above example, it is probably best to use the script component as a source.
To use a script component as a transform, first create a flat file source pointing to the file. Then add a script component and choose Transform when SSIS asks us to select Source, Destination or Transform.
- On the Connection Managers tab we don’t set anything.
- On the Input Columns tab we select the columns that we want. These input columns will be automatically added to the Input 0 section of the Inputs and Outputs tab.
- On the Inputs and Outputs tab, rename Output 0 to something sensible, e.g. CustomerRecordFileOutput. Then create the Output Columns, with data types to suit the data (four-byte signed integers in this example). Also, on the output, set the SynchronousInputID property to None, which makes the output asynchronous (by default it points to the input, i.e. synchronous).
- Finally, on the Script tab, add User::N, the package variable containing how many rows we want to read, to the ReadOnlyVariables. This is important; otherwise that package variable will not be recognised within the script. Then edit the script as follows (C#):
int i = 1;  // row counter; declared at class level so it persists between rows

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Only copy the first N rows to the (asynchronous) output
    if (i <= Variables.N)
    {
        Output0Buffer.AddRow();
        Output0Buffer.Col1 = Row.col1;
        i++;
    }
}
Basically, in the above code, we create a row counter, which we call i. Note that i must be declared at class level, not inside the method; otherwise it would reset to 1 on every row. We increment i for every input buffer row, and as long as i is less than or equal to N we copy the row to the output; otherwise we don't produce any output. Bear in mind that the flat file source still reads every row of the file; the script merely stops passing rows on after the first N. Leave the PreExecute and PostExecute methods as they are; we just need to amend the Input0_ProcessInputRow method as above.
The data flow looks like this:

If you use the script component as a source, this is how:
- Don't create a flat file source; just create a flat file connection manager.
Suppose our source file is like this:
col1|col2
1|1
2|2
3|3
4|4
5|5
So create a flat file connection manager with 2 columns, tick the “Column names in the first data row” checkbox, format: Delimited, Header row delimiter: {CR}{LF}, column delimiter: vertical bar.
- Add the script component. Choose “As a Source” when asked whether it’s a source, destination or transformation.
- Edit the script component and on the Connection Managers tab add the flat file connection that we created in step 1 above. For example, let's call it ConnFile2. Like this:

- Then on the Inputs and Outputs tab, rename Output 0 to File2Output. Click on "Output Columns" and click Add Column twice to create Col1 and Col2, setting their data types according to the file you are importing:

- On the Script tab, add the package variable N to the ReadOnlyVariables:

- Then edit the script as follows, based on the MSDN article Creating a Source with the Script Component (link):
using System.IO;  // needed for StreamReader; add it with the other using directives

public class ScriptMain : UserComponent
{
    private StreamReader SR;
    private string File2;

    public override void AcquireConnections(object Transaction)
    {
        // Get the file name from the flat file connection manager we added (ConnFile2)
        IDTSConnectionManager100 ConnMgr = this.Connections.ConnFile2;
        File2 = (string)ConnMgr.AcquireConnection(null);
    }

    public override void PreExecute()
    {
        base.PreExecute();
        SR = new StreamReader(File2);
    }

    public override void PostExecute()
    {
        base.PostExecute();
        SR.Close();
    }

    public override void CreateNewOutputRows()
    {
        string Row;
        string[] Cols;
        char[] Delimiter = "|".ToCharArray();

        // Skip the header row first
        Row = SR.ReadLine();

        // Read only the first N data rows
        for (int i = 1; i <= Variables.N; i++)
        {
            Row = SR.ReadLine();
            if (Row == null) break;  // stop early if the file has fewer than N rows
            Cols = Row.Split(Delimiter);
            File2OutputBuffer.AddRow();
            File2OutputBuffer.Col1 = Cols[0];
            File2OutputBuffer.Col2 = Cols[1];
        }
    }
}
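One advantage of this approach over the previous two is worth noting: because the StreamReader reads the file line by line and the loop stops after N rows, the script only reads the first N+1 lines of the file, so a million-row file costs no more than a small one.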
The Data Flow looks like this:
