“Insert select” is an “old school” way of populating a target table:
insert into target_table (col1, col2, …, col N)
select col1, col2, …, col N from source_table
Performance can be very good (in comparison to the “workflow” approach). Particularly if you have a very powerful DB engine (talking about Exadata, PDW, Netezza, Teradata)
The main disadvantage is obviously both source and target tables need to be on the same server, unless you create a linked server.
- Index on join columns *
- Partition on join column *
- Empty it (truncate)
- No index on target table *
- No FK on target table
- Simple recovery mode *
- Enough usable space on data files (don’t let the file grow when insert is happening) *
- Put file groups on separate spindles
- If joining large table with large table like this: have large memory (64 or 128GB)
- Put file groups on RAID 10 disks instead of RAID 5
- Use 2008 instead of 2005 (select insert is bulk enabled in 2008)
* means significant impact
I know that not everything is possible, e.g. we can’t empty the table, etc. But the above list might give you some “ideas” about what can be done to increase your loading speed. Say from 1 hour to 10 mins. I’ve seen 10 fold increase by implementing some of the star ones above.
As usual, I welcome discussions and questions at firstname.lastname@example.org or via comments
Addition, 3/9/2010: In 2008 Insert Select could be minimally logged (in 2005 we can’t do this). This is done by simply adding tablock query hint, i.e. “insert into table1 with (tablock) select col1 from table2”. Sorry I forgot to mention this in the original post, for details please see “Minimally Logged Insert” in my “SQL Server 2008 Data Warehousing Features” presentation.