Data Warehousing and Data Science

11 March 2010

Change Data Capture in SQL Server Data Warehousing

Filed under: Data Warehousing,SQL Server — Vincent Rainardi @ 10:12 pm
Tags: ,

CDC is a mechanism to store data changes (update, insert and delete) happening in a source table by reading transaction log files. These changes are stored in a change table, enabling us to apply those changes to the a target table incrementally.

In SQL Server data warehousing, CDC is typically used for:

  • Propagating changes in the source system into DW
  • Drip feed ODS changes into DW
  • Updating dimensions in data marts
  • Propagating standing data changes into DW

Why use CDC instead of SSIS* or replication:

  • Near real time ETL
  • Small volume of data
  • More efficient than replication
  • Auditable
  • Configurable clean up

*We still use CDC in SSIS, not outside SSIS. The question should be why use CDC in SSIS instead of the normal daily load dataflow package.

Example / demo:
— enable DB for CDC and verify
use Test
exec sys.sp_cdc_enable_db
select is_cdc_enabled from sys.databases where name = ‘Test’
–output:1

— create and populate the source table (must have PK):
create table master1
( attr1 int not null primary key clustered,
attr2 varchar(20) not null
)
insert into master1 values (1,’A’)
insert into master1 values (2,’B’)
insert into master1 values (3,’C’)
select * from master1

— enable the table for CDC and verify:
exec sys.sp_cdc_enable_table @source_schema = ‘dbo’,
@source_name = ‘master1’, @role_name = ‘cdc_admin’,
@supports_net_changes = 1
select is_tracked_by_cdc from sys.tables where name = ‘master1’
–output: 1

— update the source table
update master1 set attr2 = ‘A2’ where attr1 = ‘1’
delete from master1 where attr1 = 2
insert into master1 (attr1, attr2) values (5, ‘E’)
select * from master1

1     A2
3     C
5     E

— retrieve the changes:
select * from cdc.fn_cdc_get_all_changes_dbo_master1
(sys.fn_cdc_get_min_lsn(‘dbo_master1’), sys.fn_cdc_get_max_lsn(), ‘all’)

select * from cdc.fn_cdc_get_net_changes_dbo_master1
(sys.fn_cdc_get_min_lsn(‘dbo_master1’), sys.fn_cdc_get_max_lsn(), ‘all’)

CDC functions:

— check CDC metadata tables

select * from cdc.captured_columns

select * from cdc.change_tables

select * from cdc.dbo_master1_CT

select * from cdc.lsn_time_mapping

exec sys.sp_cdc_help_change_data_capture

Disadvantages:

  • Lots of change tables and functions
  • Bad for big changes e.g. truncate & reload

Mechanism:

  • Internally implemented as sp_replcmds so reduces log contention
  • 2 agent jobs: capture and clean up
  • Column data type changed: still works
  • Column dropped: null
  • New column: ignored
  • Table dropped: change table dropped

Optimisation:

  • Stop the capture job during load
  • When applying changes to target, it is ideal to use merge

CDC Templates:

2 Comments »

  1. […] Changed data capture: enables us to drip changes from DW to DMs, or from ODS/OS to DW […]

    Pingback by SQL 2008 Data Warehousing Features « Data Warehousing and Business Intelligence — 11 March 2010 @ 10:17 pm | Reply

    • By nature DM is batch process that involve level of agregation. CDC is ideal process for DW tables population by capturing events on operational sources. Properly designed DW (on the event level) with CDC enabled on operational sources will help you to avoid data quality issues that exists within oparational databases.
      A tarnsaction in operational database can have multiply events or one event only. The point is that you have to design DW to capture all events around the transaction. And thats where you use CDC. If you fall in trap and your DW design is transaction centric, than practically you are migrating business rules and all data issues from transactional databases to DW.

      Comment by Milovan Banicevic — 21 April 2011 @ 4:24 am | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

Blog at WordPress.com.

%d bloggers like this: