Merge is a Transact SQL statement that enables us to perform insert, update and delete in one statement.
In SQL Server data warehousing, merge is typically used for:
- Updating SCD type 1 or type 2 dimensions
- Incremental loading of transaction or accumulated fact tables
- Updating a data mart from a data warehouse, particularly dimension tables
Why use merge instead of SSIS?
- No, we still use merge in SSIS for ETL (rather than stored procedures), for code maintainability and ability to reach various platforms.
- Some projects use SPs for ETL (after landing the data to the DW SQL Server) because they believe it’s faster. But that is for another blog post.
The question should be why using merge in SSIS instead of SCD2 or LookUp + OLEDB?
- Because merge is more efficient (better performance)
- Because merge also does delete, in addition to insert and update
(like replication but set based)
dim2 is a dimension table with 3 columns: dimkey2, attr1, attr2, containing 1000 rows. In this example we want to update dim2 to reflect changes (update, insert, delete) in the source table which is a master data table. First we create the master data table, by copying from dim2 and modifying it.
select * from dim2
— Create the source table for dim2
select * into master2 from dim2
alter table master2 drop column dimkey2
select * from master2
— Create a new row, delete a row and update a row in the source table
update master2 set attr2 = ‘Value1a’ where attr1 = ‘Value1’
delete from master2 where attr1 = ‘Value4’
insert into master2 (attr1, attr2) values (‘Value4a’, ‘Value4a’)
— Now do a merge to update dim2
select * into dim2backup from dim2 –backup first
merge dim2 as T
using master2 as S on (T.attr1 = S.attr1)
when matched and T.attr2 <> S.attr2 then update set T.attr2 = S.attr2
when not matched by target then insert (attr1,attr2) values (S.attr1,S.attr2)
when not matched by source then delete
output deleted.*, inserted.*, $action;
- It’s a join operation so index on join columns
- Put the constraint on the when clause, rather than on the join condition
(No ‘simple parameterisation’ for merge)
- If you need to put contraints (literals) on the join condition, put the merge in a stored proc so the literals are parameterised. Or specify ‘parameterization forced’ query hint.
- Put the constraint/filter on a view
- Careful when using CTE