The final day of the conference began with Rob Farley (twitter | blog) and Buck Woody (twitter | blog) singing a Rob Farley original, “I Should Have Looked the Other Way”. You can see the performance here (audio compression issues) or the lyrics version here (start at 2:20).
Next up were board announcements, including Wayne Snyder and Rick Heiges both rolling off the board. While they were bringing Wayne up on stage, they showed quotes about Wayne from people within the community, all of which were very moving. The one that I thought really stood out (and I apologize that I didn’t catch who said it) was about how Wayne transformed SQLPASS from a technical conference into a family reunion.
Keynote
Then it was on to the final keynote, given by Dr. David DeWitt (site). You can download his presentation here (this is a summary of that presentation; if you are really interested, go download it or, better yet, watch it). Dr. DeWitt was a professor for years and has mastered the art of explaining complex concepts. During the presentation we were able to email questions, which were answered after the presentation. Today he decided to tackle Big Data.
In 2009 there was 0.8 zettabytes (ZB) of data (1 ZB = 1 million petabytes = 1 billion terabytes = 1 trillion GB), and by the year 2020 the expectation is to have 35 ZB (a growth factor of 44). That’s a lot of data, much of it coming from sources such as sensors, Web 2.0, web clicks, etc. Much of this data is valuable to store, but we don’t necessarily care about ACID properties, relational integrity, or the other guarantees that would call for a traditional RDBMS. The point being, as always: right tool for the right job.
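The unit arithmetic behind those figures can be checked with a few lines of Python. This is a minimal sketch that assumes decimal (SI) prefixes, where each step up is a factor of 1,000; the variable names are made up for illustration.

```python
# Decimal (SI) prefixes, expressed as a number of bytes.
GB = 10**9
TB = 10**12
PB = 10**15
ZB = 10**21

# How many of each smaller unit fit in one zettabyte.
pb_per_zb = ZB // PB   # one million
tb_per_zb = ZB // TB   # one billion
gb_per_zb = ZB // GB   # one trillion

# Figures quoted in the keynote summary above.
data_2009_zb = 0.8
data_2020_zb = 35

# Growth factor between the 2009 figure and the 2020 projection.
growth = data_2020_zb / data_2009_zb

if __name__ == "__main__":
    print(f"1 ZB = {pb_per_zb:,} PB = {tb_per_zb:,} TB = {gb_per_zb:,} GB")
    print(f"Growth factor 2009 -> 2020: about {round(growth)}x")
```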
So in these cases, what is the right tool? Well, with Wednesday’s announcement, it looks like Microsoft is putting its weight behind Hadoop and MapReduce, which offer:
- Scalability and a high degree of fault tolerance
- Ability to quickly analyze massive collections of records without forcing data to first be modeled, cleansed and loaded
- Easy to use programming paradigm for writing and executing analysis programs that scale to 1000s of nodes and PBs of data
- Low up front software and hardware costs
So what does the system look like?
- Hadoop Distributed File System (HDFS) – objectives are load balancing, fast access and fault tolerance; designed with the expectation that hardware/software failures will happen
- MapReduce – framework for writing/executing distributed, fault-tolerant algorithms, built around 2 functions: map, which divides a large problem into smaller problems and performs the same function on each of them, and reduce, which then combines the results
- Hive & Pig – Hive was created by Facebook and is SQL-like, while Pig was created by Yahoo and is more procedural; both target MapReduce jobs. Because writing MapReduce directly is complex, HiveQL was created to combine the best features of SQL with MapReduce
- Sqoop – package for moving data between HDFS and relational DB systems via command line load and unload utilities
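To make the map and reduce idea from the list above a little more concrete, here is a toy word count in plain Python. It is only a single-process sketch of the programming model, not Hadoop’s actual API and not something shown in the keynote; the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are invented for illustration. In a real cluster the framework would run the map and reduce steps on many nodes and handle the grouping and the failures in between.

```python
from collections import defaultdict


def map_phase(record):
    """Map: turn one input record (a line of text) into (key, value) pairs."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group all values by key, which the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine every value seen for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map over every record, group by key, then reduce each group."""
    mapped = [pair for record in records for pair in map_phase(record)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    lines = ["big data is big", "data is everywhere"]
    print(run_job(lines))
```

Each call to `map_phase` only looks at its own record and each call to `reduce_phase` only looks at its own key, which is what lets the same two functions be spread across thousands of nodes.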
He then showed some performance metrics of SQL PDW and stated:
I assert that it is MUCH easier to add support to SQL Server PDW for unstructured data (w/o having to load it), improved scalability, and fault tolerance than it is to ever get competitive performance from a Hadoop-based system
But again, the point is that both of these types of systems (RDBMS and Hadoop) are going to be working together; it will not be a case of choosing one or the other.
Sessions
After the keynote I sat in on Adam Machanic’s (twitter | blog) Query Tuning Mastery for a bit before going downstairs to host the “SSIS for all, DBAs developers, etc.” table at the Birds of a Feather luncheon with Matt Masson (twitter | blog). Ted Krueger (twitter | blog) had to leave early and Mike Walsh (twitter | blog) was looking for volunteers. We had some good conversation, ranging from “What is SSIS” to “How do I do meta-driven SSIS”.
After that I jumped into Rewrite Your T-SQL for Great Good by Jeremiah Peschka (twitter | blog). Jeremiah’s slides were, as always, excellent, and I have a similar presentation entitled Writing Professional Database Code, so I figured I’d go and check it out. I was glad I did, as it had a different focus and perspective. This just underscored to me again that each person brings their own experiences and perspective, which is valuable. In all honesty, “the content is already out there” is a common excuse that I (as well as others) have used for why I couldn’t present or blog. So lesson learned again: stop making excuses.
Anyway, he talked about consistency and gave a link to open source unit testing tools. As a user of TFS, I haven’t had much experience with these and look forward to investigating them. He gave a lot of query performance options and even threw down the gauntlet for everyone who uses “Distinct” in a query to explain why it is necessary, as most of the time it’s used because of either a bad data model or bad joins. Great stuff!
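To illustrate the “Distinct” point, here is a small sketch that is not taken from the session. It uses Python’s built-in `sqlite3` module with an in-memory database rather than SQL Server, and the `customers` and `orders` tables are hypothetical. The join returns one row per order, DISTINCT hides the resulting duplicates after the fact, and an EXISTS subquery asks the actual question (“which customers have at least one order?”) without creating the duplicates at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
""")

# Plain join: one row per order, so a customer with two orders shows up twice.
joined = conn.execute("""
    SELECT c.name
    FROM customers c
    JOIN orders o ON o.customer_id = c.id
    ORDER BY c.name
""").fetchall()

# DISTINCT removes the duplicates that the join itself produced.
with_distinct = conn.execute("""
    SELECT DISTINCT c.name
    FROM customers c
    JOIN orders o ON o.customer_id = c.id
    ORDER BY c.name
""").fetchall()

# EXISTS never multiplies the customer rows, so no DISTINCT is needed.
with_exists = conn.execute("""
    SELECT c.name
    FROM customers c
    WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
    ORDER BY c.name
""").fetchall()

if __name__ == "__main__":
    print("join:    ", joined)
    print("distinct:", with_distinct)
    print("exists:  ", with_exists)
```

The last two queries return the same rows, but only the EXISTS version says what was meant.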
Last, I went to Are you a Linchpin? Career Management lessons to help you become indispensable. This was a panel discussion, with Q&A at the end, that used Linchpin: Are You Indispensable by Seth Godin as the starting point of the discussion. The panel was made up of:
- Andy Warren (twitter | blog)
- Brent Ozar (twitter | blog)
- Jeremiah Peschka (twitter | blog)
- Kevin Kline (twitter | blog)
- Louis Davidson (twitter | blog)
- Stacia Misner (twitter | blog)
- Thomas LaRock (twitter | blog)
- And Moderator: Andy Leonard (twitter | blog)
One of the interesting things that Kevin said (attributing credit to Brent) was:
People are going to remember you for 1, 2, maybe 3 adjectives. What adjectives do you want to be known for?
My mother had always told me something similar:
People won’t remember what you did, but they will remember how you made them feel
Anyway, it was a great conversation with some interesting Q&A. I would love to see more of these types of things.
Whew, that’s it for the summit. I had a great time and am looking forward to next year’s!