Data Warehousing and Data Science

5 April 2011

Stored Procedure vs ETL Tool

Filed under: Data Warehousing — Vincent Rainardi @ 7:03 am

Some SQL Server data warehouses are loaded using stored procedures (SPs). Data from the source system(s) is dumped into stage, and then many SPs are executed one after the other to load the data from stage into the warehouse. This batch of SPs resembles what we had 20 years ago, when mainframes or AS/400s ran overnight batches.

If the source system is not a SQL Server, an ETL tool like SSIS or Informatica may be used to import the data into stage. If the source system is a SQL Server, a series of SPs can be used instead. An ETL tool can read data from any RDBMS and deliver data to any RDBMS. That is its distinct advantage over SPs, which work only on SQL Server.

A few days ago I talked to a friend who uses SSIS just to dump the source data into stage. He then uses SPs to load the stage data into the warehouse. This scenario is quite common, and is usually chosen for these reasons:

  1. Both the source/operational system and the data warehouse are in SQL Server
  2. The person designing the warehouse was not familiar with ETL tools. Probably he was a SQL developer who was asked to develop a data warehouse. He had written hundreds of SPs in the past, so “when you hold a hammer everything looks like a nail”.
  3. The person responsible for the ETL believed that SPs deliver better performance than an ETL tool.

The question he asked was: if both the source system and the data warehouse are in SQL Server, is there a reason to use an ETL tool to load data from stage into the warehouse? I explained that there are a few main reasons why people decide to use an ETL tool:

  1. With an ETL tool, it is easier to maintain the code
  2. An ETL tool is more flexible than SPs (it has more functionality)
  3. An ETL tool delivers better performance than SPs
  4. Features such as logging/audit and metadata are built in

With an ETL tool, it is easier to maintain the code

An ETL tool is GUI based: you drag and drop, and map columns by “connecting the lines”. You have a visual picture of what the source is, what the destination is and what the transformations are. SPs, on the other hand, are text based. True, you can search them using the Information Schema Routines view or syscomments, but it’s like a jungle of spaghetti code. Even with decent documentation, the production support team would still struggle to understand it.
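As a sketch of what that text search looks like (DimCustomer is a hypothetical table name; sys.sql_modules is the supported replacement for the deprecated syscomments):

```sql
-- Find every stored procedure whose definition mentions a given table.
SELECT OBJECT_SCHEMA_NAME(m.object_id) AS [schema],
       OBJECT_NAME(m.object_id)        AS [procedure]
FROM   sys.sql_modules AS m
JOIN   sys.objects     AS o ON o.object_id = m.object_id
WHERE  o.[type] = 'P'                        -- stored procedures only
AND    m.[definition] LIKE '%DimCustomer%';  -- table name is illustrative
```

Even with this, you only find where a table is mentioned; what the transformation does still has to be read line by line.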

An ETL tool is more flexible than SPs

With a decent ETL tool like BODI, SSIS, Informatica, Ab Initio or DataStage/WebSphere, everything is built in. Lookup, aggregation, branching based on a condition, expressions, string & date manipulations: all built in. More than three times now I’ve seen, in the case of SPs, the developer create SQL functions for string or date manipulation. They are clever and nice, but with an ETL tool it’s easier because it has a library of ready-made functions. And when it comes to advanced features, SPs can’t compete: fuzzy lookup, debugging, the ability to create custom scripts, data matching algorithms, data quality features, data profiling, the ability to process XML feeds, and so on.

An ETL tool delivers better performance than SPs

Some people have said to me that they don’t need performance: it’s only a small data mart. OK, but let’s talk about performance nevertheless, in case you do a bigger project in the future. A lot of performance-related features are built in, such as in-memory lookup, parallel-running tasks, the ability to utilise 3 different servers to run a massive load, high-performance aggregation, high-performance joins, and high-availability features (when node 1 stops, node 2 keeps going).

Logging, Audit and Metadata

If we use SPs we will need to do the logging, auditing and metadata manually. With a decent ETL tool you don’t have to: data lineage analysis, logging into database tables or files, and tracing who’s doing what and when are all out-of-the-box standard functionality.
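To illustrate what “doing the logging manually” means, here is a minimal hand-rolled sketch. The log table, procedure and column names are hypothetical, not from any particular project; every SP in the batch has to repeat this bookkeeping:

```sql
-- A hand-rolled ETL log table, plus the pattern each SP must repeat.
CREATE TABLE dbo.EtlLog
( LogId      INT IDENTITY(1,1) PRIMARY KEY
, TaskName   VARCHAR(100)   NOT NULL
, StartTime  DATETIME       NOT NULL
, EndTime    DATETIME       NULL
, RowsLoaded INT            NULL
, ErrorMsg   NVARCHAR(4000) NULL );
GO
CREATE PROCEDURE dbo.LoadDimCustomer
AS
BEGIN
    DECLARE @LogId INT, @Rows INT;

    -- Record the start of the task.
    INSERT INTO dbo.EtlLog (TaskName, StartTime)
    VALUES ('LoadDimCustomer', GETDATE());
    SET @LogId = SCOPE_IDENTITY();

    BEGIN TRY
        INSERT INTO dbo.DimCustomer (CustomerCode, CustomerName)
        SELECT CustomerCode, CustomerName
        FROM   stage.Customer;
        SET @Rows = @@ROWCOUNT;

        -- Record success and the row count.
        UPDATE dbo.EtlLog
        SET    EndTime = GETDATE(), RowsLoaded = @Rows
        WHERE  LogId = @LogId;
    END TRY
    BEGIN CATCH
        -- Record the failure, then re-raise for the calling batch.
        DECLARE @Err NVARCHAR(4000);
        SET @Err = ERROR_MESSAGE();
        UPDATE dbo.EtlLog
        SET    EndTime = GETDATE(), ErrorMsg = @Err
        WHERE  LogId = @LogId;
        RAISERROR(@Err, 16, 1);
    END CATCH
END
```

With an ETL tool, the equivalent of all of this comes for free with every task.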

As usual I welcome any comments and questions. Vincent, 5/3/11.


  1. It’s really very nice

    Comment by sreekanth — 14 June 2011 @ 4:03 pm | Reply

  2. The fact that you have not listed any drawbacks of standard ETL tools says a lot about the type of projects you are basing your comments on. To claim that SSIS performs better than standard T-SQL is rather absurd, to say the least.

    Comment by Tahir Riaz — 15 November 2011 @ 11:15 am | Reply

    • Thank you for your comment Tahir

      Comment by Vincent Rainardi — 15 November 2011 @ 7:52 pm | Reply

    • Vincent, I agree with what you are saying, but I would not include SSIS as an ETL tool; in fact it impairs performance. Yes, it moves and transforms data, but very clumsily. I would consider a real ETL tool to be repository based. I think this is the biggest advantage of a robust ETL tool like Informatica. The main reason companies don’t use a robust ETL tool is the cost.

      Comment by Tim — 20 December 2011 @ 7:37 pm | Reply

  3. Hi Vincent,

    Thanks for your interesting post. As I think it is written a bit one-sided, I would like to discuss some points from another perspective.

    .) Using an ETL tool it is easier to maintain the code
    It’s true that ETL tools are GUI based, but in my experience that doesn’t necessarily mean it’s easier to maintain. There are many more SQL coders out there than people with experience in a specific ETL tool. I personally find an error much faster in PL/SQL code than by debugging mappings in ETL tools.
    I would also argue that developing PL/SQL can be faster than developing ETL logic with ETL tools, as you can easily write a code generator to automate the development with dynamic SQL code.

    .)An ETL tool is more flexible than SPs
    Lookups, aggregations, branching-based inserts, conditions, expressions, and string & date manipulations are built into modern database solutions as well, and they are even faster.
    You are right that some ETL tools may deliver advanced functions that you don’t find in a database.

    .)An ETL tool delivers better performance than SPs
    In almost all cases, SPs executed directly in the database must be faster than any ETL tool. Not only because you don’t have to transfer your data to your ETL server and back to your DB, but also because joins/lookups, aggregations, filters, etc. can run much faster in the DB and can be tuned using indexes and other techniques.

    .)Logging, Audit and Metadata
    I have to agree here that monitoring/logging is usually easier with ETL tools. You have to implement your own logging mechanism in your SPs.

    Personally I have used both, ETL tools (Informatica, SSIS, Pentaho, Talend) as well as SPs to develop ETL processes.
    My opinion is that it totally depends on the project which approach is better. When you have different source systems, web services, files, etc. that you have to bring together, and you want to design your ETL process to be DB-independent, ETL tools are clearly the choice.
    When the source and DWH systems are on the same DB and you have a lot of similar transformations, I would use SPs to automate the development process and get better performance.

    Comment by Mike — 22 January 2012 @ 7:29 pm | Reply

  4. Hi Vincent,

    This is a very passionate topic, both on your blog and at many organizations. I think this problem, like many, is a focus on the part rather than on the whole. What I mean by that is that professionals needing to build a specific solution look at the tools they are most familiar with and can execute on using that tool. Many of the best programmers choose stored procedures because their ability to execute with them is very high.

    The whole focuses on the bigger picture: what happens after the SQL expert who built that solution leaves and someone else has to support it, or what happens after the small project turns into a mammoth project. The SP/view solution begins to be crushed under its own weight, with very few options for tuning.

    Comment by Jonathan Smith — 30 October 2012 @ 2:12 pm | Reply

  5. Hi Author,
    Based on my experience I have found SPs work faster than ETL. SPs are pre-compiled and stored in SQL Server.

    Comment by momtazulkarim — 4 September 2013 @ 9:07 pm | Reply

  6. SP is much faster than SSIS when done right; you have to write the DW transformations (slowly changing dimension builds etc.) in a set-based manner though.

    Comment by Will — 5 February 2015 @ 3:57 am | Reply

  7. Hi Vincent, this post is very useful.

    I have a question related to this post. I know that some tools work as ETL (like Pentaho DI, PowerCenter, SAS Data Integration) and others work as ELT (in-database transformations, like Oracle ODI).
    Some people say that ELT has better performance than ETL. What do you think? Do you know the main differences?

    I know that ELT is SQL based and treats the dataset as a whole in each step, but in some ETL tools you can work row by row. Perhaps row by row is slower if you think of a single step, but if you have several steps connected in a process flow you can send the processed rows to the next step, so each step doesn’t have to wait for the previous one to complete, and you get better performance. I think row by row has its advantages; it depends on whether you have synchronous or asynchronous steps (a sort, for example, needs the whole dataset to execute).

    But I don’t know if in ELT you can do in-memory steps (for example an in-memory lookup) like in ETL.

    Any advice you have about ETL vs ELT will be good.
    Thanks in advance,


    Comment by Juan V. — 22 March 2016 @ 10:15 am | Reply

    • Hi Juan, people define ELT differently, so let me clear that up first.
      I define ELT as: load the data AS IS into a table in the same database, and then load it into the target table with transformation.
      And ETL as: load the data from the source file/database directly into the target table with transformations done on-the-fly.

      Some ETL tools can’t perform ETL. They must load the file into a table located in the same database as the target table, and then execute a stored procedure to load the data from that table into the target table. This is the main reason for doing ELT, i.e. the limited features of the software.
      In addition to this primary reason, people also do ELT because of speed. Loading the file into a table first can be extremely fast, because the RDBMS is built to support that. And the set-based operation done by the stored procedure can be very efficient, because the RDBMS is built that way.
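      In T-SQL terms, that set-based transformation step is just a single INSERT … SELECT from the landed table. All table and column names below are illustrative:

      ```sql
      -- The file has already been bulk-loaded AS IS into stage.Sales.
      -- One set-based statement then transforms and loads it into the target,
      -- letting the RDBMS do the surrogate key lookups as joins.
      INSERT INTO dbo.FactSales (DateKey, ProductKey, SalesAmount)
      SELECT d.DateKey, p.ProductKey, s.Amount
      FROM   stage.Sales    AS s
      JOIN   dbo.DimDate    AS d ON d.FullDate    = s.SaleDate
      JOIN   dbo.DimProduct AS p ON p.ProductCode = s.ProductCode;
      ```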

      As always, it is better to explain using examples than theory like the above. For example, Teradata FastLoad can load a file into a table with very good performance (alternatively we can use MultiLoad or TPT), and we can insert data from this table into the target table with very good performance too, because of the MPP architecture. Using Informatica we can do ETL by loading data directly from a file, doing the transformation (such as looking up surrogate keys) and then upserting into the target table using an ODBC connection, but the performance is not as good (Ref 1, pages 3-5, and Ref 2, page 5).

      Another example of ELT is Cadis (Markit EDM), which requires us to land the data we read from a file into a table first, before running a stored procedure to load the data into the target table.

      Another example of ELT is loading a partitioned table in SQL Server using partition switching. We load the data into a table in the same database as the partitioned target table, then we switch this table with an empty partition in the target table. The partition switch is a split-second operation, because there is no data movement; it is only a metadata operation (Ref 3).
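      A sketch of that switch (table names and partition number are illustrative; the staging table must match the target’s schema, indexes and check constraints):

      ```sql
      -- Bulk load into a staging table that mirrors the target partition,
      -- then switch it in: a metadata-only operation, near-instant.
      ALTER TABLE stage.FactSales_201603
      SWITCH TO dbo.FactSales PARTITION 3;
      ```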

      I would recommend not getting too hung up on this ELT-ETL paradigm, because in reality we will have to perform the following:
      a) Data quality checks
      b) Joins with other tables, and complex operations such as row stamping
      Unless the data is very, very simple, these two dictate that we load the data (from a file, or an RDBMS) into a table first.

      Examples of data quality checks: 1) Find out how many rows are in the source file, and if it is less than 10% of the last 3 days’ average, don’t load the data into the target table. 2) If the date column is not the same as the date in the file name, reject the file and don’t load it.
      Examples of complex operations: 1) If the total of column X is not 100%, scale every row up or down to make it 100% before loading column X into the target table. 2) Set the value of column X to A, B or C depending on the values of columns K, L, M and N according to these 10 rules, in a waterfall manner, meaning that if a row has been covered by rule 1, rule 2 should not touch it.
      As you can see from these 4 examples, the DQ checks and complex operations force us to load the data into a staging table first, so we can do these operations before loading the data into the target table.
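      The first DQ check, for instance, can be sketched in T-SQL like this (stage.Sales and the load-history table dbo.EtlLog are hypothetical names):

      ```sql
      -- Abort the load if today's stage row count is below 10%
      -- of the average of the last 3 days' loads.
      DECLARE @Today INT, @Avg3Days INT;

      SELECT @Today = COUNT(*) FROM stage.Sales;

      SELECT @Avg3Days = AVG(RowsLoaded)
      FROM  (SELECT TOP (3) RowsLoaded
             FROM   dbo.EtlLog              -- hypothetical load-history table
             WHERE  TaskName = 'LoadFactSales'
             ORDER BY StartTime DESC) AS last3;

      IF @Today < 0.1 * @Avg3Days
          RAISERROR('Stage row count suspiciously low; load aborted.', 16, 1);
      ```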

      Ref 1:
      Ref 2:
      Ref 3:

      Comment by Vincent Rainardi — 23 March 2016 @ 8:35 am | Reply

  8. Thanks for your help Vincent, it is very useful

    Comment by Juan V. — 23 March 2016 @ 4:45 pm | Reply

  9. Hi Vincent,

    I know you have worked with Microsoft SQL Server Integration Services (SSIS). Do you recommend that ETL tool?
    I have worked with other ETL tools like PowerCenter, Pentaho DI (Kettle) and SAS/DI. Can you compare SSIS with any of these?
    Thanks in advance. Any advice about the advantages and disadvantages of SSIS will be appreciated.


    Comment by Juan V. — 1 April 2016 @ 10:21 pm | Reply

  10. […] Stored Procedure vs ETL Tool […]

    Pingback by SQL Server:SSIS vs Stored Procedure – Andy Tsai學習筆記 — 11 April 2016 @ 6:47 am | Reply
