Data Warehousing and Business Intelligence

10 August 2010

General performance considerations for “insert select”

Filed under: Analysis Services — Vincent Rainardi @ 3:26 pm

“Insert select” is an “old school” way of populating a target table:

insert into target_table (col1, col2, …, col N)
select col1, col2, …, col N from source_table

“insert select” works not only in SQL Server but also in most RDBMS including Oracle, DB2, Postgres, Sybase, Ingres, MySQL, Teradata and Netezza.

Performance can be very good (in comparison to the “workflow” approach). Particularly if you have a very powerful DB engine (talking about Exadata, PDW, Netezza, Teradata)

The main disadvantage is obviously both source and target tables need to be on the same server, unless you create a linked server.

Source tables:

  1. Index on join columns *
  2. Partition on join column *

Target table:

  1. Empty it (truncate)
  2. No index on target table *
  3. No FK on target table

Database:

  1. Simple recovery mode *
  2. Enough usable space on data files (don’t let the file grow when insert is happening) *
  3. Put file groups on separate spindles

Server:

  1. If joining large table with large table like this: have large memory (64 or 128GB)
  2. Put file groups on RAID 10  disks instead of RAID 5
  3. Use 2008 instead of 2005 (select insert is bulk enabled in 2008)

* means significant impact

I know that not everything is possible, e.g. we can’t empty the table, etc. But the above list might give you some “ideas” about what can be done to increase your loading speed. Say from 1 hour to 10 mins. I’ve seen 10 fold increase by implementing some of the star ones above.

As usual, I welcome discussions and questions at vrainardi@gmail.com or via comments

Addition, 3/9/2010: In 2008 Insert Select could be minimally logged (in 2005 we can’t do this). This is done by simply adding tablock query hint, i.e. “insert into table1 with (tablock) select col1 from table2”. Sorry I forgot to mention this in the original post, for details please see “Minimally Logged Insert” in my “SQL Server 2008 Data Warehousing Features” presentation.

Advertisements

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at WordPress.com.

%d bloggers like this: