SQLPASS Adventure Part 1: SSIS PreCon 1

This week I’m attending SQLPass, and Monday was the first Pre-Con day.  I had the pleasure of sitting in “A Day of SSIS in the Enterprise” presented by Andy Leonard (blog|twitter), Tim Mitchell (blog|twitter) and Matt Mason (blog|twitter).

Having spent a significant time in SSIS already and even had thanked a few people, including Andy, a couple years ago when I was learning how to load data warehouses, I was really looking forward to this session.

Agenda:

  • Frameworks
  • Using Metadata
  • Centralized Logging
  • Monitoring & Analytics
  • Logging in SQL Server “Denali”
  • Configurations
  • Expressions
  • Deployment
  • Data Flow Internals
  • Measuring Performance
  • Data Flow Best Practices
  • Scripting
  • Design Patterns

First was the de facto disclaimer – This is a level 300 session and it’s difficult to stay at 300 through everything – Configurations, deployment, expressions, etc. They’re covering many areas some of which we might be more expert at than others, but they were looking at being comprehensive.

Andy took the stage first and presented his work on SSIS Frameworks that he’s been honing and presenting for awhile.  It’s some great stuff and if you’ve never looked at it, do yourself a favor and go there now, I’ll wait.  While he was presenting he said the best line, which I ended up tweeting – “You can’t avoid SSIS Pain, only pick when you feel it.”  Matt then talked about some of the upcoming features in Denali and plugged his session later in the week where he’ll be making some “announcements”.

Next up, Tim took the stage and talked about the configurations, expression and deployment at which point Matt chimed in with some of the new deployment options in Denali and since we were running early, he went on to talk about Data Flow Internals before breaking for lunch.  After lunch Matt finished discussing the date flow task and then Tim did scripting.

The day ended with Matt presenting Design Patterns and listed out design patterns he “could” talk about:

  • Using the DB Engine
  • Lookups
  • Parallel Processing
  • Avoiding Transactions
  • Surrogate Key Generation
  • Partitioning
  • CDC
  • Null Value Substitution
  • Balanced Data Distribution
  • Using Multiple Transforms
  • Late Arriving Facts
  • Work Pile Pattern – Scale Out
  • Slowly Changing Dimensions
  • Change Detection
  • XML Patterns

Course we only had time to look at 5 of them (great presentations leave you wanting more, right?).

Overall all three presenters were excellent – they were both knoweldgable passionate about their material, the slide decks were very professional, demo’s were also well prepared and they handled questions while staying on schedule very well.  They were even monitoring twitter to get questions during the session as well as honoring requests to just keep going and not worry about the time.  As with any session this long they were right about there being some things that I already knew and some things I didn’t.  However even the things I knew, it was good to have a refresher on the why and even get the re-assurance that I do something right once in awhile. Smile

Now onto some stuff I learned:

  • Got pretty deep into how the Data Flow worked, talking about the Buffers, Execution Trees and Backpressure.
  • Discussed when to use synchronous, asynchronous and blocking transformations
  • Discussed SSIS strategy for performance – Operations Volume Application Location (OVAL)
  • All LOB columns are paged to disk, no exceptions, but you can change the location
  • Currency/money data types are more efficient than doubles
  • Don’t use the SQL Destination (I’ve never used this but had seen the 30% faster claims)
  • There is an actual good use case to use the SCD Wizard. Smile
  • Cascading lookups
  • How to design for scale out scenarios – this is really cool and can even utilize Service Broker, I’m really pumped to try this one out.
  • When not to use SSIS to move data
  • Perf Counter – Buffers Swap to Disk should always be 0, otherwise you’re using too much (or don’t have enough) memory
  • Why SQL 2005 expressions run time values trump design time and why SQL 2008 is the reverse
  • Don’t use SSIS Transactions (i.e., transaction property within the packages)

Thanks to Andy, Tim and Matt for a great session.  They obviously spent a lot of time preparying, in fact they were even tweeting Sunday about practicing.  Andy later said they were making tweaks during their practice session and even decided to drop talking about source control in favor of additional Design Patterns.  Very good content, thanks to the three presenters!

Finished the day at a networking event at Lowell’s.  Looking forward to  Tuesday’s “Advanced T-SQL for SQL Server 2008 and Denali” by Itzik Ben-Gan (blog)

Case Solved: SQL Function Design for Performance

This is the solution post for – The Case of the Time it takes to Convert Time Zones

So did you figure it out?  The case asked 2 questions:

  1. What was the problem?
  2. How did it happen?

What was the problem?

So lets walk through how it would convert a datetime from CT to UTC.  We’d issue a SELECT dbo.GetDateOffset(@MyDate, 1); Next the function builds 2 in memory table variables – 1 for getting the start date of DST for the given year and 1 for getting the end date of DST for the given year.  It then compares @MyDate to see if it’s within the DST range and then offsets accordingly. Now increase that to 50,000 calls, and we have some serious performance issues.

How did it happen?

It happened because of 2 reasons.  First, the mantra of “I’ll get it to work and then worry about performance”.  Secondly, and more importantly, I didn’t spend time thinking about the problem – you know creating a design.  I figured it’s just a small function, not a huge system and therefore no big deal.  However what is a huge system made up of?  Oh yeah, that’s right, small functions and other pieces of code.  Had I taken the time to actually design it I would have written it better.

The Solution

So let’s actually think about the problem again, and let’s begin by looking at Daylight Savings, but look at it a different way:

2005 and Before:

  • Begins the first Sunday in April clocks are set ahead one hour, so 2:00 a.m. becomes 3:00 a.m.
  • Ends the last Sunday in October clocks are set back one hour, so 2:00 a.m. becomes 1:00 a.m.

2006 and After:

  • Ends the second Sunday in March clocks are set ahead one hour, so 2:00 a.m. becomes 3:00 a.m.
  • Ends the first Sunday in November clocks are set back one hour, so 2:00 a.m. becomes 1:00 a.m.

That means for 10 months out of the year we know exactly what to do without any other investigation!  For 2005 and before November, December, January, February, March I need to offset by 6, while May, June, July, August, September I need to offset by 5.  For 2006 and after I shift accordingly.    Awesome!  However while that drastically reduces the amount of times the GetStartDST and GetEndDST is called, those functions are still called to decide the start and end date for April and October of 2005 and before as well as March and November 2006 and after.

We’re doing so well, let’s continue to break it down, and start with the first one.  How can we figure out the first Sunday of a April in 2005.  We know the date and we know the month, how about if we just get the weekday of the first of the month and then just offset?  The idea being that if the first of the month is a Monday, then we know the first Sunday of the month is the 7th, also if the first of the month is a Saturday we know the first Sunday of the month is the 2nd.  We don’t need to use an in memory table at all!!

So without further adieu, here was my final function:

CREATE FUNCTION dbo.GetOffsetDate
(
    @Date datetime
    , @Sign bit
    --0 = UTC to CT
    --1 = CT to UTC
)
RETURNS datetime
AS
BEGIN
    IF @Date = CAST('1753-01-01 00:00:00.000' as datetime) AND @Sign = 0 RETURN @Date;
    IF @Date = CAST('9999-12-31 23:59:59.997' as datetime) AND @Sign = 1 RETURN @Date;

    DECLARE
        @Time tinyint
        , @Datetimeoffset datetimeoffset
        , @BeginDST datetime
        , @EndDST datetime
        , @ReturnDate datetime;

    SET @Time = 0;
    SELECT @Datetimeoffset = CAST(@Date as datetimeoffset);

    IF YEAR(@Date) >= 2006
        BEGIN
            IF MONTH(@Date) IN (1,2,12) SET @Time = 1; --Not in DST
            IF MONTH(@Date) IN (4,5,6,7,8,9,10) SET @Time = 2; --In DST
        END;
    ELSE
        BEGIN
            IF MONTH(@Date) IN (1,2,3,11,12) SET @Time = 1; --Not in DST
            IF MONTH(@Date) IN (5,6,7,8,9) SET @Time = 2; --In DST
        END;

    --Still Need to determin if DST or Not, either a starting or ending month
    IF @Time = 0
        BEGIN
            DECLARE
                @DSTDate date
                , @WeekDay int;

            IF YEAR(@Date) >= 2006
                BEGIN
                    --2006 and forward second Sunday in March, clocks are set ahead - 2:00 a.m. becomes 3:00 a.m.
                    SET @DSTDate = CAST(CAST(YEAR(@Date) as varchar(4)) + '-03-01' as Date);
                    SET
                        @WeekDay =
                            CASE DATEPART(WEEKDAY, @DSTDate)
                                WHEN 1 THEN 8
                                WHEN 2 THEN 14
                                WHEN 3 THEN 13
                                WHEN 4 THEN 12
                                WHEN 5 THEN 11
                                WHEN 6 THEN 10
                                WHEN 7 THEN 9
                            END;
                    SELECT @BeginDST = CAST(CAST(@DSTDate as varchar) + ' 02:00:00' as datetime);

                    --2005 and before first Sunday in April, clocks are set ahead - 2:00 a.m. becomes 3:00 a.m.
                    SET @DSTDate = CAST(CAST(YEAR(@Date) as varchar(4)) + '-04-01' as Date);
                    SET
                        @WeekDay =
                            CASE DATEPART(WEEKDAY, @DSTDate)
                                WHEN 1 THEN 1
                                WHEN 2 THEN 7
                                WHEN 3 THEN 6
                                WHEN 4 THEN 5
                                WHEN 5 THEN 4
                                WHEN 6 THEN 3
                                WHEN 7 THEN 2
                            END;
                    SELECT @EndDST = CAST(CAST(@DSTDate as varchar) + ' 02:00:00' as datetime);
                END;
            ELSE
                BEGIN
                    --2006 first Sunday in November, clocks are set back - 2:00 a.m. becomes 1:00 a.m.
                    SET @DSTDate = CAST(CAST(YEAR(@Date) as varchar(4)) + '-11-01' as Date);
                    SET
                        @WeekDay =
                            CASE DATEPART(WEEKDAY, @DSTDate)
                                WHEN 1 THEN 1
                                WHEN 2 THEN 7
                                WHEN 3 THEN 6
                                WHEN 4 THEN 5
                                WHEN 5 THEN 4
                                WHEN 6 THEN 3
                                WHEN 7 THEN 2
                            END;
                    SELECT @BeginDST = CAST(CAST(@DSTDate as varchar) + ' 02:00:00' as datetime);

                    --2005 last Sunday in October, clocks are set back - 2:00 a.m. becomes 1:00 a.m.
                    SET @DSTDate = CAST(CAST(YEAR(@Date) as varchar(4)) + '-10-31' as Date);
                    SET
                        @WeekDay =
                            Case DATEPART(WEEKDAY, @DSTDate)
                                WHEN 1 THEN 31
                                WHEN 2 THEN 30
                                WHEN 3 THEN 29
                                WHEN 4 THEN 28
                                WHEN 5 THEN 27
                                WHEN 6 THEN 26
                                WHEN 7 THEN 25
                            END;
                    SELECT @EndDST = CAST(CAST(@DSTDate as varchar) + ' 02:00:00' as datetime);
                END;
        END;

    SELECT
        @ReturnDate =
            CASE
                --UTC to CT
                WHEN @Sign = 0
                    THEN
                        CASE
                            WHEN (@Datetimeoffset BETWEEN @BeginDST AND @EndDST) OR (@Time = 2)
                                --UTC to CDT
                                THEN
                                    SWITCHOFFSET(@Datetimeoffset, '-05:00')
                                --UTC to CST
                                ELSE
                                    SWITCHOFFSET(@Datetimeoffset, '-06:00')
                        END
                --CT to UTC
                ELSE
                    CASE
                        WHEN (@Datetimeoffset BETWEEN @BeginDST AND @EndDST) OR (@Time = 2)
                            --CDT to UTC
                            THEN
                                SWITCHOFFSET (@Datetimeoffset, '+05:00')
                            --CST to UTC
                            ELSE
                                SWITCHOFFSET (@Datetimeoffset, '+06:00')
                    END
            END;

    RETURN @ReturnDate;
END

As you’ll see it really was a rewrite, not just “fixed” for performance.

So what did we learn?  Performance is not just a bolt-on at the end and always, always, always, think about the problem before you try code a solution.

The Case of the Time it takes to Convert Time Zones

Full Disclosure – I loved Encyclopedia Brown as a kid.

A couple weeks ago I was tasked with a the problem of unifying datetime fields within disparate databases that were loaded into a single data warehouse.  The challenge was that vendor product A recorded all time in UTC while vendor product B recorded all time in CT.  CT being the composite of CDT and CST.  With all SQL servers being in UTC (therefore applying the difference of GETDATE() and GETUTCDATE() was not possible), the primary challenge was determining when the datetime needed offset by 5 hours and when it needed offset by 6 hours.

Daylight Savings Begins:

  • 2005 and before the first Sunday in April clocks are set ahead one hour, so 2:00 a.m. becomes 3:00 a.m.
  • 2006 and forward the second Sunday in March clocks are set ahead one hour, so 2:00 a.m. becomes 3:00 a.m.

Daylight Savings Ends:

  • 2005 and before the last Sunday in October clocks are set back one hour, so 2:00 a.m. becomes 1:00 a.m.
  • 2006 and after the first Sunday in November clocks are set back one hour, so 2:00 a.m. becomes 1:00 a.m.

Next I opened up SSMS and starting writing TSQL code.  Wanting to be as efficient as possible I figured I’d create a single function that could convert either direction UTC to CT or CT to UTC, the function would return the converted datetime.

So the process:

  1. Pass the datetime to the function
  2. Determine the beginning and end of Daylight savings for that year
  3. if the datetime is between the start and end date of Daylight savings offset by 5, if not offset by 6, the offset bit parameter determines whether to add or subtract

I ended up with these 3 functions:

  • dbo.GetOffsetDate – this function took in a datetime and a bit field to indicate direction (UTC to CT or CT to UTC)
  • dbo.GetDTStart – this functions was called from dbo.GetOffsetDate and took a datetime to determine the beginning of Daylight Savings time for that year
  • dbo.GetDTEnd– this functions was called from dbo.GetOffsetDate and took a datetime to determine the ending of Daylight Savings time for that year

GetOffsetDate was fairly straightforward, took in parameters called dbo.GetDTStart and dbo.GetDSTEnd and then performed an embedded case statement to determine if the date was in between DSTStart and DTSEnd or if it was an add or subtract operation.

CREATE FUNCTION dbo.GetOffsetDate
(
    @Date datetime
    , @Sign bit
    --1 = CT to UTC
    --0 = UTC to CT
)
RETURNS datetime
AS
BEGIN
    DECLARE
        @BeginDST datetime
        , @EndDST datetime
        , @Datetimeoffset datetimeoffset
        , @ReturnDate datetime;

    SELECT @Datetimeoffset = CAST(@Date as datetimeoffset);

    SELECT
        @BeginDST = dbo.GETDSTStart(@Date)
        , @EndDST = dbo.GETDSTEnd(@Date)

    SELECT
        @ReturnDate =
        CASE
            WHEN @Datetimeoffset BETWEEN @BeginDST AND @EndDST
                THEN
                    CASE
                        WHEN @Sign = 1
                            THEN
                                SWITCHOFFSET(@Datetimeoffset, '+05:00')
                            ELSE
                                SWITCHOFFSET (@Datetimeoffset, '-05:00')
                    END
            ELSE
                CASE
                    WHEN @Sign = 0
                        THEN
                            SWITCHOFFSET(@Datetimeoffset, '+06:00')
                        ELSE
                            SWITCHOFFSET (@Datetimeoffset, '-05:00')
                END
        END;

    RETURN @ReturnDate;
END

The real magic as they say was how it was determining the first Sunday in April, second Sunday in March, last Sunday in October and first Sunday in November.   So passing in a date I created a table variable and inserted 31 rows (no month has more than 31 days).  I then inserted dates beginning with the first of the appropriate month (March, April, October or November) for all 31 rows.  I then generated a ROW_NUMBER over the Sundays in that month and took the appropriate numbered Sunday (first, second or last – last I generated a ROW_NUMBER using DESC and grabbed the first).

CREATE FUNCTION dbo.GETDSTStart
(
    @Date datetime
)
RETURNS datetime
AS
BEGIN
    --2005 and before first Sunday in April, clocks are set ahead one hour at 2:00 a.m. becomes 3:00 a.m.
    --2006 and forward second Sunday in March, clocks are set ahead one hour at 2:00 a.m. becomes 3:00 a.m. 

    DECLARE
        @BeginDSTDate date
        , @BeginDST datetime;
    DECLARE @Begin table (pkID int NOT NULL IDENTITY(1,1), DT date NULL);

    IF YEAR(@Date) >= 2006
        BEGIN
            INSERT INTO @Begin DEFAULT VALUES
            WHILE SCOPE_IDENTITY() < 31
                INSERT INTO @Begin DEFAULT VALUES;

            UPDATE
                @Begin
            SET
                DT = DATEADD(Day, pkID-1, cast(CAST(YEAR(@Date) as varchar(4)) + '-03-01' AS Date));

            SELECT
                @BeginDSTDate = DT
            FROM
                (
                SELECT
                    DT
                    , ROW_NUMBER() OVER(ORDER BY pkID ASC) AS [RowNumber]
                FROM
                    @Begin
                WHERE
                    ISNULL(DATEPART(Weekday, DT), 0) = 1
                ) d
            WHERE
                d.RowNumber = 2;
        END
    ELSE
        BEGIN
            INSERT INTO @Begin DEFAULT VALUES
            WHILE SCOPE_IDENTITY() < 30
                INSERT INTO @Begin DEFAULT VALUES;

            UPDATE
                @Begin
            SET
                DT = DATEADD(Day, pkID-1, cast(CAST(YEAR(@Date) as varchar(4)) + '-04-01' AS Date));

            SELECT
                @BeginDSTDate = DT
            FROM
                (
                SELECT
                    DT
                    , ROW_NUMBER() OVER(ORDER BY pkID ASC) AS [RowNumber]
                FROM
                    @Begin
                WHERE
                    ISNULL(DATEPART(Weekday, DT), 0) = 1
                ) d
            WHERE
                d.RowNumber = 1;
        END

    SELECT
        @BeginDST = CAST(CAST(@BeginDSTDate as varchar) + ' 02:00:00' as datetime);

    RETURN @BeginDST;
END
CREATE FUNCTION dbo.GETDSTEnd
(
    @Date datetime
)
RETURNS datetime
AS
BEGIN
    --2005 last Sunday in October, clocks are set back one hour at 2:00 a.m. becomes 1:00 a.m.
    --2006 first Sunday in November, clocks are set back one hour at 2:00 a.m. becomes 1:00 a.m. 

    DECLARE
        @EndDSTDate date
        , @EndDST datetime;
    DECLARE @End table (pkID int NOT NULL IDENTITY(1,1), DT date NULL);

    IF YEAR(@Date) >= 2006
        BEGIN
            INSERT INTO @End DEFAULT VALUES
            WHILE SCOPE_IDENTITY() < 30
                INSERT INTO @End DEFAULT VALUES;

            UPDATE
                @End
            SET
                DT = DATEADD(Day, pkID-1, cast(CAST(YEAR(@Date) as varchar(4)) + '-11-01' AS Date));

            SELECT
                @EndDSTDate = DT
            FROM
                (
                SELECT
                    DT
                    , ROW_NUMBER() OVER(ORDER BY pkID ASC) AS [RowNumber]
                FROM
                    @End
                WHERE
                    ISNULL(DATEPART(Weekday, DT), 0) = 1
                ) d
            WHERE
                d.RowNumber = 1;
        END
    ELSE
        BEGIN
            INSERT INTO @End DEFAULT VALUES
            WHILE SCOPE_IDENTITY() < 31
                INSERT INTO @End DEFAULT VALUES;

            UPDATE
                @End
            SET
                DT = DATEADD(Day, pkID-1, cast(CAST(YEAR(@Date) as varchar(4)) + '-10-01' AS Date));

            SELECT
                @EndDSTDate = DT
            FROM
                (
                SELECT
                    DT
                    , ROW_NUMBER() OVER(ORDER BY pkID DESC) AS [RowNumber]
                FROM
                    @End
                WHERE
                    ISNULL(DATEPART(Weekday, DT), 0) = 1
                ) d
            WHERE
                d.RowNumber = 1;
        END
    SELECT
        @EndDST = CAST(CAST(@EndDSTDate as varchar) + ' 02:00:00' as datetime);

    RETURN @EndDST;
END

Now that it worked, I threw it into my SSIS package at which point the ETL dropped from about a minute to 25!!.  Unacceptable.  I started looking at the code trying to find improvements to no avail.  And then it dawned on me not only what the problem was, but more importantly why it happened.  I quickly resolved the problem and learned from why it happened.  Within the hour I was able to get an accurate and performant solutions.  What was the problem?  How did it happen?

Tune into the next post to find out.