Data Warehousing and Business Intelligence

10 March 2010

Merge in SQL Server Data Warehousing

Filed under: Data Warehousing,SQL Server — Vincent Rainardi @ 7:20 pm
Tags: ,

Merge is a Transact SQL statement that enables us to perform insert, update and delete in one statement.

In SQL Server data warehousing, merge is typically used for:

  • Updating SCD type 1 or type 2 dimensions
  • Incremental loading of transaction or accumulated fact tables
  • Updating a data mart from a data warehouse, particularly dimension tables

Why use merge instead of SSIS?

  • No, we still use merge in SSIS for ETL (rather than stored procedures), for code maintainability and ability to reach various platforms.
  • Some projects use SPs for ETL (after landing the data to the DW SQL Server) because they believe it’s faster. But that is for another blog post.

The question should be why using merge in SSIS instead of SCD2 or LookUp + OLEDB?

  • Because merge is more efficient (better performance)
  • Because merge also does delete, in addition to insert and update
    (like replication but set based)

Example/demo:

dim2 is a dimension table with 3 columns: dimkey2, attr1, attr2, containing 1000 rows. In this example we want to update dim2 to reflect changes (update, insert, delete) in the source table which is a master data table. First we create the master data table, by copying from dim2 and modifying it.

select * from dim2

— Create the source table for dim2
select * into master2 from dim2
alter table master2 drop column dimkey2
select * from master2

— Create a new row, delete a row and update a row in the source table
update master2 set attr2 = ‘Value1a’ where attr1 = ‘Value1’
delete from master2 where attr1 = ‘Value4’
insert into master2 (attr1, attr2) values (‘Value4a’, ‘Value4a’)

— Now do a merge to update dim2
select * into dim2backup from dim2 –backup first
merge dim2 as T
using master2 as S on (T.attr1 = S.attr1)
when matched and T.attr2 <> S.attr2 then update set T.attr2 = S.attr2
when not matched by target then insert (attr1,attr2) values (S.attr1,S.attr2)
when not matched by source then delete
output deleted.*, inserted.*, $action;

Optimisation:

  • It’s a join operation so index on join columns
  • Put the constraint on the when clause, rather than on the join condition
    (No ‘simple parameterisation’ for merge)
  • If you need to put contraints (literals) on the join condition, put the merge in a stored proc so the literals are parameterised. Or specify ‘parameterization forced’ query hint.
  • Put the constraint/filter on a view
  • Careful when using CTE

1 Comment »

  1. […] Merge: faster loading […]

    Pingback by SQL 2008 Data Warehousing Features « Data Warehousing and Business Intelligence — 11 March 2010 @ 12:06 am | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at WordPress.com.

%d bloggers like this: