Categories

SSIS CDC LSN Debugging nightmare introduction

The other day I was working on an SSIS implementation that was utilizing Change Data Capture (CDC) to identify the source data that needed to be transformed into the destination.  This is the first time that I had used CDC other than a quick demo and I was grateful to have Whitney’s help with understanding some of the…shall we say eccentricities of CDC.

You can see Change Data Capture since Last Request Package Sample or Change Data Capture for Specified Interval Package Sample for the official MS examples which is the basis of what I was trying to accomplish, although in the specified interval sample they cheated by using a database snapshot to get the starting LSN and then just ran continuously from there keeping the LSNs in variables, they didn’t address the concept of true interval run that would be come to a stopping point and then start again after an interval passes.

So to give a little context to this post you need to understand some basics of SSIS and CDC.  CDC utilizes Log Sequence Numbers (LSNs) to know the start and stop point of the changes you are trying to identify.  CDC is table based and logs those changes to an “instance” of the table that you specify, which is basically just a copy of the table DDL with a few added columns, the new table exists within the cdc schema and the table name is what you specify via the “instance name” parameter. You can track changes on the entire table or just specific columns, and there are functions to get all the changes or just the net changes.

Within CDC you have some very helpful system functions to assist with figuring out what LSNs you should be selecting, including:

  • sys.fn_cdc_get_max_lsn
    sys.fn_cdc_get_min_lsn
    sys.fn_cdc_map_time_to_lsn
    sys.fn_cdc_map_lsn_to_time

The other thing to remember is that LSNs are binary.  In SSIS there is no variable type binary.  The first odd thing I noticed (and this should have been a clue to the difficulties that lie ahead) was that the sys.fn_map_time_to_lsn has a strange parameter called the ‘relational operator’ and I kid you not these are the potential values:

  • largest less than
    largest less than or equal
    smallest greater than
    smallest greater than or equal

Yes the actual text strings are what is needed to be entered there.  The idea being that LSNs aren’t directly time related to so you need to specify the relationship in relation to the datetime your passing. 

The next  thing I noticed that is strange was that sys.fn_cdc_get_min_lsn must be instance specific, while sys.fn_cdc_get_max_lsn is the entire server.  So hopefully this will be a little more straight forward in the next version. <TangentRant>although that’s what I thought about Service Broker which I love, but sadly still isn’t that straight forward, however that’s for another post</TangentRant> 

So to cover all bases this was my process:

1.) The parent package would get the LSN boundaries based upon the last successful run of the package (StartLSN) and the current run (EndLSN), passing those variables to the children packages.

2.) The child package which is instance specific, would then verify that that StartLSN was not less than the minimum LSN for that instance, if it was, than use the MinLSN instead. 

3.) Utilize the TVF to get the net changes cdc.fn_cdc_get_net_changes_<capture_instance> (to get all changes you would use cdc.fn_cdc_get_all_changes_<capture_instance>)

A couple other issues here when using with SSIS here:

1.) It’s a TVF so if you pass bad parameters as start and end LSNs, you get an error in addition to the column header information:

“An insufficient number of arguments were supplied for the procedure or function cdc.fn_cdc_get_net_changes_ … .”

2.) It’s a TVF so you can’t use it as an OLEDB data source, instead you have to use an ssis variable as the data source, which mean the query needs to be set with the proper query with null values for the Start and End LSNs (remember even though it brings back an error it will bring back column information).  This also means the script task is needed to change the alter the query string variable to substitute the proper start and end LSNs at runtime.

So now that we have the basis of what I was doing, now onto the debugging…

1 comment to SSIS CDC LSN Debugging nightmare introduction

Leave a Reply

 

 

 

You can use these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>