SSIS CDC LSN Debugging nightmare first problem

Thursday, 17. December 2009 7:00 | Author:Aaron Lowe

So now that we have the idea let’s look as the four main problems I encountered:

The first problem I noticed was that the SQL query that was running still had nulls in on the StartLSN and EndLSN

so I was querying:

cdc.fn_cdc_get_net_changes_<instance name> (null, null, ‘all’)

Which as I said last post, returns an error. So throw a break point into the Script task and figure out why it’s not setting properly.  This took me awhile, here’s the code:

var vStartLSN = Dts.Variables["User::sStartLSN"];
var vEndLSN = Dts.Variables["User::sEndLSN"];
var vSQLQuery = Dts.Variables["User::sSQLQuery"];

string sStartLSN;
string sEndLSN;

Dts.VariableDispenser.LockForRead("User::sStartLSN");
Dts.VariableDispenser.LockForRead("User::sEndLSN");
Dts.VariableDispenser.LockForWrite("User::sSQLQuery");

sStartLSN = vStartLSN.Value.ToString();
sEndLSN = vEndLSN.Value.ToString();

vSQLQuery.Value =
    "SELECT " +
    "    c.__$operation " +
    "    , c.Column1 " +
    "    , c.Column2 " +
    "    , ColumnN " +
    "FROM " +
    "    cdc.fn_cdc_get_net_changes_<instance_name>( " + sStartLSN + ",  " + sEndLSN + ", 'all') c";
Dts.Variables.Unlock();

So hitting my first breakpoint, I first realized that the StartLSN variable was receiving a null.  That wasn’t supposed to happen, so I looked at the SQL code I used to set the variables in the SQL Task:

SELECT sys.fn_cdc_get_min_lsn(<instance_name>) as @StartLSN

A-ha, that’s it. The function returns a record set and I had the task to just get a return value, not a record set.

So changed to:

SELECT @StartLSN = sys.fn_cdc_get_min_lsn(<instance_name>);

SELECT @StartLSN as StartLSN

That got me past the first problem, now onto the next problem.

Category:2008, CDC, SSIS | Comment (0)

SSIS CDC LSN Debugging nightmare introduction

Wednesday, 16. December 2009 7:00 | Author:Aaron Lowe

The other day I was working on an SSIS implementation that was utilizing Change Data Capture (CDC) to identify the source data that needed to be transformed into the destination.  This is the first time that I had used CDC other than a quick demo and I was grateful to have Whitney’s help with understanding some of the…shall we say eccentricities of CDC.

You can see Change Data Capture since Last Request Package Sample or Change Data Capture for Specified Interval Package Sample for the official MS examples which is the basis of what I was trying to accomplish, although in the specified interval sample they cheated by using a database snapshot to get the starting LSN and then just ran continuously from there keeping the LSNs in variables, they didn’t address the concept of true interval run that would be come to a stopping point and then start again after an interval passes.

So to give a little context to this post you need to understand some basics of SSIS and CDC.  CDC utilizes Log Sequence Numbers (LSNs) to know the start and stop point of the changes you are trying to identify.  CDC is table based and logs those changes to an “instance” of the table that you specify, which is basically just a copy of the table DDL with a few added columns, the new table exists within the cdc schema and the table name is what you specify via the “instance name” parameter. You can track changes on the entire table or just specific columns, and there are functions to get all the changes or just the net changes.

Within CDC you have some very helpful system functions to assist with figuring out what LSNs you should be selecting, including:

  • sys.fn_cdc_get_max_lsn
    sys.fn_cdc_get_min_lsn
    sys.fn_cdc_map_time_to_lsn
    sys.fn_cdc_map_lsn_to_time

The other thing to remember is that LSNs are binary.  In SSIS there is no variable type binary.  The first odd thing I noticed (and this should have been a clue to the difficulties that lie ahead) was that the sys.fn_map_time_to_lsn has a strange parameter called the ‘relational operator’ and I kid you not these are the potential values:

  • largest less than
    largest less than or equal
    smallest greater than
    smallest greater than or equal

Yes the actual text strings are what is needed to be entered there.  The idea being that LSNs aren’t directly time related to so you need to specify the relationship in relation to the datetime your passing. 

The next  thing I noticed that is strange was that sys.fn_cdc_get_min_lsn must be instance specific, while sys.fn_cdc_get_max_lsn is the entire server.  So hopefully this will be a little more straight forward in the next version. <TangentRant>although that’s what I thought about Service Broker which I love, but sadly still isn’t that straight forward, however that’s for another post</TangentRant> 

So to cover all bases this was my process:

1.) The parent package would get the LSN boundaries based upon the last successful run of the package (StartLSN) and the current run (EndLSN), passing those variables to the children packages.

2.) The child package which is instance specific, would then verify that that StartLSN was not less than the minimum LSN for that instance, if it was, than use the MinLSN instead. 

3.) Utilize the TVF to get the net changes cdc.fn_cdc_get_net_changes_<capture_instance> (to get all changes you would use cdc.fn_cdc_get_all_changes_<capture_instance>)

A couple other issues here when using with SSIS here:

1.) It’s a TVF so if you pass bad parameters as start and end LSNs, you get an error in addition to the column header information:

“An insufficient number of arguments were supplied for the procedure or function cdc.fn_cdc_get_net_changes_ … .”

2.) It’s a TVF so you can’t use it as an OLEDB data source, instead you have to use an ssis variable as the data source, which mean the query needs to be set with the proper query with null values for the Start and End LSNs (remember even though it brings back an error it will bring back column information).  This also means the script task is needed to change the alter the query string variable to substitute the proper start and end LSNs at runtime.

So now that we have the basis of what I was doing, now onto the debugging…

Category:2008, CDC, SSIS | Comments (1)

VSDBPro Permissions part 2

Monday, 12. October 2009 9:00 | Author:Aaron Lowe

So now that we have the login and the user set up, how do we actually assign permissions? 

I prefer to assign permissions to roles and add users to roles which mirrors the user/groups of windows, so going down that path we have to create a role (going under Security –> Roles):

CREATE ROLE [TestRole]

Now comes the part that isn’t very intuitive.  To associate a user with a role you have to go back to the Solution Explorer and actually create a generic user script file (see here), you can call it whatever you want and place it wherever you want in the project, however if you reverse engineer the database it will be called <ProjectName>.rolememberships.sql under the schema objects folder:

EXECUTE sp_addrolemember @rolename = N'TestRole', @membername = N'TestUser';

Now we have the role defined and the users as part of the role, we can start adding permissions, this is where you’ll have loads of fun provided you enjoy manually editing xml documents.  Again in Solution Explorer under Properties you’ll see a Database.sqlpermissions.sql file:

Properties

In there you’ll see some xml with examples:

<?xml version="1.0" encoding="utf-8"?>
<?XML:NAMESPACE PREFIX = [default] urn:Microsoft.VisualStudio.Data.Schema.Permissions NS = "urn:Microsoft.VisualStudio.Data.Schema.Permissions" />
  

So as you can see while it’s not that intuitive it is pretty straight forward.

Category:VSDBPro | Comment (0)