Data Warehousing, BI and Data Science

12 March 2010

Change Tracking in SQL Server Data Warehousing

Filed under: Data Warehousing,SQL Server — Vincent Rainardi @ 9:43 pm
Tags: ,

Change Tracking is a mechanism to detect that a row has changed. It does not provide information about which column has changed. In SQL Server data warehousing, Change Tracking is typically used for:

  • Propagating transactions from the source system in real time, to ODS or DW
  • Synchronising DMs with DW

Why use Change Tracking instead of CDC or triggers?

  • Real time (synchronous)
  • Lightweight
  • Two way
  • No change to source table
  • Saving months (£) of development time

As I explained here, for incremental extraction we need to rely on last updated timestamp or identity columns. If we don’t have them, we could create CRC column to detect changes in rows. The extract of 2 columns (PK & CRC) is compared with previous day extract to detect: deletion, new rows and changed rows. We then use the PK to bring those rows to DW.

Now with Change Tracking the mechanism is provided out of the box. Yes, true that the source needs to be in 2008. But it’s only a matter of time before those 2005 boxes are upgraded to 2008. So in 2-3 years’ time I expect many SQL Server data warehousing projects will be utilising Change Tracking to provide real time data warehousing solution.

Disadvantages:

  • Does not provide the data that is changed

Demo/example:

— create an populate source table
drop table master3
create table master3
( attr1 int not null primary key clustered,
attr2 varchar(10),
attr3 varchar(10)
)

declare @i int
set @i = 1
while @i <= 1000
begin
insert into master3 (attr1, attr2, attr3)
values(@i, ‘Value’+CONVERT(varchar,@i), ‘Value’+CONVERT(varchar,@i))
set @i += 1
end

select * from master3

— enable change tracking and verify
alter database test set change_tracking = on
( change_retention = 3 days, auto_cleanup = on)

alter table master3 enable change_tracking
with (track_columns_updated = on)

select t.name, c.*
from sys.change_tracking_tables c
inner join sys.tables t on c.object_id = t.object_id

— obtain current version
declare @CT_Ver bigint
set @CT_Ver = change_tracking_current_version()

— make changes to the table
update master3 set attr2 = ‘Value2a’ where attr1 = 2
update master3 set attr3 = ‘Value1a’ where attr1 = 1
delete from master3 where attr1 = 3
insert into master3 (attr1, attr2) values (3000, ‘Value3a’)

— query changes
select CT.attr1, CT.sys_change_operation,
CT.sys_change_columns, CT.sys_change_context
from changetable(changes master3, @CT_Ver) as CT

— check column ID
select c.* from sys.columns c
join sys.tables t on t.object_id = c.object_id
where t.name = ‘master3’

Advertisements

2 Comments »

  1. To add to it, deletion detection can be a two stage process, as most of the OLTP systems does a soft delete. Soft delete is a kind of update where some column like “IsDeleted” is updated with “true/false” to mark a record as deleted.

    Comment by Siddharth Mehta — 13 March 2010 @ 11:45 pm | Reply

  2. […] Change tracking: real time, lightweight propagation of data changes into DW Leave a Comment […]

    Pingback by SQL 2008 Data Warehousing Features « Data Warehousing and Business Intelligence — 31 March 2010 @ 6:19 pm | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at WordPress.com.

%d bloggers like this: