Category Archives: Database

SQLPASS Conference Day 3, the Last Day

The Final day of the conference began with Rob Farley (twitter | blog) and Buck Woody (twitter | blog) singing a Rob Farley original “I Should have Looked the Other Way” which you can see here (audio compression issues), for the performance or here for the lyrics version (start at 2:20).

Next up was board announcements such as Wayne Snyder and Rick Heiges both rolling off the board.  While they were bringing Wayne up on stage they were showing quotes from people within the community about Wayne, all of which were very moving, however the one that I thought really stood out was about (and I apologize that I didn’t catch who said it) how Wayne transformed SQLPASS from a technical conference into a family reunion.

Keynote

Then onto the final keynote done by Dr. David DeWitt (site), you can download his presentation here (this is a summary of that presentation, if you are really interested, go download it or better yet, watch it).  Dr. DeWitt was a professor for years and has mastered the ability of explaining complex concepts.  During the presentation we were able to email questions which were answered after the presentation.  Today he decided to tackle Big Data.

In 2009 there was 0.8 Zettabyte (ZB) (1 million petabytes/1 trillion terabytes/1 quadrillion GB) and by the years 2020 the expectation is to have 35 ZBs (growth factor of 44).  That’s a lot of data, much of it coming from sources such as sensors, Web2.0, Web clicks, etc.  However much of this data while valuable to store, we don’t necessarily care about ACID properties or relational integrity or other things that would utilize a traditional RDBMS. Point being as always, right tool for the right job.

So in these cases, what is the right tool?  Well with Wednesday’s announcement, looks like Microsoft is putting their weight behind Hadoop and MapReduce, which offers:

  • Scalability and a high degree of fault tolerance
  • Ability to quickly analyze massive collections of records without forcing data to first be modeled, cleansed and loaded
  • Easy to use programming paradigm for writing and executing analysis programs that scale to 1000s of nodes and PBs of data
  • Low up front software and hardware costs

So what’s the system look like:

image

  1. Hadoop Distributed File System (HDFS) –  objectives are load balancing, fact access and fault tolerance, designed with the expectations that hardware/software failures
  2. MapReduce – framework for writing/executing distributed, fault tolerant algorithms – 2 functions map which divided a large problem into smaller problems and then performs the same function on all smaller problems and reduce which then combines the results.
  3. Hive & Pig – Hive was created by Facebook as a and is SQL-like, while Pig was created by Yahoo and is more procedural; both target MapReduce jobs.  However due to the complexity of MapReduce, HiveQL was created to combine the best features of SQL with MapReduce
  4. Sqoop – package for moving data between HDFS and relational DB systems via command line load and unload utilities

He then showed some performance metrics of SQL PDW and stated

I assert that it is MUCH easier to add support to SQL Server PDW for unstructured data (w/o having to load it), improved scalability, and fault tolerance than it is to ever get competitive performance from a Hadoop-based system

But again the point being that both of these type of systems (RDBMS and Hadoop)  are going to be working together, it will not be a case of choosing one or the other.

Sessions

After the keynote I sat in Adam Machanic’s (twitter | blog) Query Tuning Mastery for a bit before going downstairs to host the “SSIS for all, DBAs developers, etc.” table at the Birds of a Feather luncheon with Matt Masson (twitter | blog).  Ted Krueger (twitter | blog) had to leave leave early and Mike Walsh (twitter | blog) was looking for volunteers.  Had some good conversation ranging from “What is SSIS” to “How do I do meta-driven SSIS”

After that I jumped into Rewrite Your T-SQL for Great Good by Jeremiah Peschka (twitter | blog).  Jeremiah’s slides as always excellent, and I have a similar presentation entitled Writing Professional Database Code so I figured I’d go and borrow check it out.  I was glad I did as it had a different focus and perspective.  This just underscored to me again that each person brings their own experiences and perspective which is valuable.  In all honesty it is a common excuse that I used (as well as others) as to why I couldn’t present or blog, because the content was already out there.   So lesson learned again, stop making excuses. SmileAnyway he talked about consistency and gave a link to open source Unit Testing tools.  As a user of TFS, I haven’t had much experience with these and look forward to investigating them.  He gave a lot of query performance options and even threw down the gauntlet for everyone who uses “Distinct” in a query to explain the necessity of it as most of the times it’s used due to either a bad data model or bad joins.  Great stuff!

Last I went to Are you a Linchpin? Career Management lessons to help you become indispensable.  This was a panel discussion with Q&A at the end Linchpin: Are You Indispensable by Seth Godin as the starting point of the discussion.  The panel was made up of:

One of the interesting things that Kevin said (attributing credit to Brent) was that

People are going to remember you for 1, 2, maybe 3 adjectives.  What adjectives do you want to be known for?

My mother had always told me something similar:

People won’t remember what you did, but they will remember how you made them feel

Anyway it was a great conversation with some interesting Q&A, would love to see more of these type of things.

Whew, that’s it for the summit, had a great time and looking forward to next years!

Don’t Worry, I’ll Get To It

Looking back you realized where you made your first mistake.  You thought it would work out differently; in fact, you knew it would.  Alas, you believed the lie.

Wait, let’s back up:

Sweat mars your brow as you start to feel panic set in.  You reach for your mountain dew can and quickly put it down so no one will see your hand shaking (plus don’t want to spill a good dew).

Still need to back up further:

You were minding your own business watching a great SQLPASS webcast and thinking of how you were going to implement those cool things you learned.  At least, that was until you got THE PHONECALL.  Your boss calls and says, “Hey there’s this project going on that is going to need some database help, I’ve told them you’re just the person to do it.  So I’ve scheduled you in a conference call tomorrow morning first thing.  Sounded pretty straight forward so should be no problem.” You then start downloading your brain to notepad and internet links so you don’t forget the SQLPASS presentation.

You come in the next day, get your morning Mountain Dew (hey it’s my story and I’m not much of a coffee drinker) and jump on the conference call.    That is when you hear this:

“Glad you could join us; your boss said you were the best, so we’re expecting great things from you.  What we need is a database to store the Excel documents we’ve been working on for the past year.  We’ve just been emailing them around, so we’ll need to import the data and be able to create reports.  I already spec’d it out in Access and don’t think it should take more than a week, but with you on here, it should take less time!  This is good because we’re already committed to deadlines in about a month. I will email you the Excel documents, once you look and understand them we will talk later this afternoon to answer any questions you have. K, thanks, bye”

You start seeing visions of SSIS, SSRS and SQL go through your head and you’re wrapping your brain around everything that needs to be done, when you get the email: “Here’s a couple of the first type of spreadsheets, there are actually 3 different types of spreadsheets but it’s all related.  Should be no problem for a wiz like you, huh? :) When you’re ready for the rest, let us know and we can create a CD of them – think there’s about 300 or so.  I forgot to mention on the call that while we have a deadline in a month, we promised to start getting reports out next week, so I’m really looking forward to seeing the reports this week.”

Sweat mars your brow as you start to feel panic set in.  You reach for your mountain dew can and quickly put it down so no one will see your hand shaking (plus don’t want to spill a good dew).   Then a thought creeps into your head, you initially reject it, and look around to make sure no one heard it, but as you think about it, the thought starts making sense and you start to smile.  You now have the way to meet or possibly exceed the expectations and you can look like the hero.  So you say to yourself, “Don’t worry, I’ll just import manually, create the code as I go to be quick, then I can go back and clean it up later”

———————

The above scenario is not that uncommon in our field.  We are brought in as the experts and expected to produce things quickly.  The business group does not know and possibly does not even care about everything we do. But, really, it is not their job, if they did, you would not have one.  Things like code quality, code coverage, and security are your job, not theirs.

However, I am still seeing too many people try to be the hero by working around the clock or reducing code quality to meet the requirements as soon as possible.  Let me refute some more common reasons:

I will have plenty of time…

The idea that you will all of a sudden have this huge amount of free time to do performance improvements, proper error handling, better naming conventions, etc. is a myth.  More likely, you will be pulled onto another project and then someone will knock on your door in a year or two asking you to explain this really bad code they found.  If on the other hand you are experiencing large blocks of free time, please send me a note as I might have some work for you.  Companies pay you to work, which is the opposite of having free time.

I’ll just do it this once…

If you put that kind of code out there, you will probably end up doing a lot of manual process/manipulation.  This means when it is time to actually go-live, you scramble to get it all working, again reducing quality.  I have often seen “While it’s in development I don’t have to worry about getting it scheduled.  Oh that process didn’t work; I’ll just manually fix it.”  If you have to fix it manually in development, you will have to fix it manually in prod.

They just don’t understand

Our job should not just be seen as a cost center it should be seen as an asset.  That does not mean we just produce great code/functionality, it means we help the business achieve the business goals.  In other words, we work “with” not “for” the business.

Back in the Exchange 5.5 days, I did some work as a systems administrator.  We discovered that some people were using mailboxes that had up to 500 MB in them, while the average was around 100; I researched best practices and did all this analysis.  I finally brought the suggestion to my boss, “We should limit everyone’s mailbox to XX size that way we don’t have space issues” His response “This is obviously the way our people work and if it is, why we shouldn’t just invest in more space so they can continue to work that way?”

I had never considered the business, only the technology.  When you just have technology on the brain and not the business or the people you come up with solutions for the technology. “My system would be perfect except for all those users”

Take the time to explain what you need to accomplish so you can create what they want you to accomplish.  They want it sooner rather than later? Then reflect some of those decisions back to them, even empowering them to make some.  “Yes I can get that really cool function out there this week, but everyone in the world will have access to it,  With a couple more days I can also have it secure.  Which would you like?”  Of course, this means that you have to think more strategically about the project, which can be difficult at times.  We all know how easy (and fun) it is to just jump in there and start producing code”.  Transparency is the key.

But I’m always having to change things

Realize that some refactoring will always be necessary. The days of going away for 6 months to create something and come back to do your big “ta-da” are long gone.  People want things faster with more agility so things can change as we go along.    This means more pressure to produce faster, which means refactoring and adding additional functionality as you go.  However every line of code you write should be quality.  I don’t know how many times I have seen Proof-of-Concepts (POCs) turn into the actual production system.  Just because it is a POC from a business perspective does not mean that it should be a POC from a code quality perspective.  I actually saw someone code up a textbook T-SQL injection (put the stored procedure parameter straight in the where clause without any validation), with the intention of fixing it later.  If it is wrong to do later, it is wrong to do now.

In the development world, there are many tools for code coverage, unit testing, refactoring, etc.  While there are some tools out there like VSDBPro unit testing, VSDBPro and Red Gate refactoring, etc., the tools in the database world are not as mature.  There is no IDE that forces white space or automatically documents for you.  We do not have tools where we can just do a few clicks and have end-to-end tests.

However, that is no excuse to be a gunslinger and just shoot from the hip.  We are professionals and should be creating professional code that is readable, documented and have a standard of quality.

If you are struggling with that, adopt a Test Driven Development (TDD) philosophy or something else that will force you to think about or do things better.

Or maybe it is taking you 5 minutes to figure out what is going on in a stored procedure you just wrote a couple days ago.  If so, let’s be honest you failed in either code readability and/or documentation. No excuses, there are even good free tools out there. Whitespaceisreallyagoodthing.

Code Quality, Code Readability and Code Documentation is Important

SQL Server Training is about to change

Just in case you are not reading Brent Ozar’s blog (why aren’t you?) you wouldn’t have heard that come Monday he’s going to be making an announcement on a solution to SQL Server Training.  He’s outlined the issue Database Professionals have with getting quality information as well as showed who he’s working with to solve the issue here and no it’s not another forum.