In ETL we often need to import files. But these files usually don’t have a fixed file name. They are variable. The file names change every day, because they have date on their file names. For example, yesterday it was called trans_20110627. Today it is called trans_20110628. Tomorrow it will be called trans_20110629.
How do we import them? We set the file name at run time.
How do we set the file name at run time? Using a script task, at run time we set the value of a variable. This variable is used as the file name of the file.
This might be one of the “oldest tricks in the book”, but still, I found that many people don’t know it. So it’s probably worth writing a post about it.
So here’s how to do it, step by step:
First, create a variable called TransFileName, of type string, scope = package:
Then create a script task on the control flow, call it Set File Name:
Edit the script task, put the TransFileName variable on the ReadWriteVariable:
Then click on Edit Script:
For simplicity in the above script I just put a fixed file name. But in the real project the logic is for example: find today’s file, find the latest file, etc, which is for another blog post. I’ll write that “find the latest file logic in using c# script” post in the next few days and put the link here.
Save the ScriptMain.cs window and close it. Click OK to come out of the Script Task Editor.
Create a Data Flow task:
Double click the Data Flow Task to edit it. Create Flat File Source and OLE DB Destination.
Double click on the Trans File to edit it. Click New on the Flat File Source Editor. Click Browse and select the file we want to import. For this exercise I’ve created a small text file like this:
It only contains 1 column and 2 rows. I also created a small table to illustrate the process:
create table VarFileName.Trans ( Col1 int )
So, click on Browse and select the file we want to import:
Click “Column names in the first data row” if your file has column names in the first row. Click on Columns tab on the left pane:
Specify the column delimiter e.g. pipe (|). Click on the Advanced tab on the left pane:
Set the data type for each column. Click OK to close the Flat File Connection Manager Editor and go back to the Flat File Source Editor:
Click on the Columns on the left pane:
Click OK to close the Flat File Source Editor.
Edit the OLE DB, point it to the target table, e.g. VarFileName.Trans:
Click on Mappings on the left pane and check that all columns on the file have been mapped properly to the table. In the example there is only 1 column:
Click OK to close the OLE DB Destination Editor and go back to the Data Flow window.
Now we are going to use the TransFileName variable as the file name of the file. We do a right click on the TransFile on the Connection Manager and select Properties.
Click on the ellipsis on the Expressions to open the Property Expression Editor. Set the Property column to Connection String. Under the Expression column, click on the ellipsis to open the Expression Builder. Choose the TransFileName variable, drag it to the Expression box:
Click OK to close the Expression Builder. Click OK again to close the Property Expressions Editor.
That’s it, we are done. At run time, the script task will set the value of the TransFileName variable. This variable is used as the file name of the file we want to import. Here’s the Control Flow (left) and Data Flow (right) at when we run it:
This can also be used to when we need to export to a file which we only know the name at run time. It works the same way.
Hope this helps,
Vincent Rainardi, 26/6/2011