Common SQL errors and pseudo-code
Recently I've been reading (and commenting) on various SQL newsgroups.
More and more I find that the problems people have are because they haven't defined what they want properly. Perhaps this is a difference between people who learn to program in a structured environment (such as a university), and people who learn to program by being thrown in at the deep end.
When I went to uni, I could already code adequately. I wasn't a great programmer, but I didn't feel like I had major coding issues. At uni, they taught us things like pseudo-code, and I kinda dismissed it, thinking that that was for people who couldn't code. But quite quickly, I saw the value in it.
Pseudo-code is really handy in all kinds of ways. By writing out what I'm trying to do in my own language, I can achieve a stack of things.
Firstly, it makes me think better about what it is I'm trying to achieve. Writing something out in English will help you notice when you're approaching something from the wrong direction.
Secondly, it helps show me where modules should be written. Anything I've written as a sentence that isn't almost a line of code in itself, I put as a function, or better, re-use a function I've written before. If it's new, I write pseudo-code for that function, which helps me work out what information I need to pass in and return.
Thirdly, it breaks what I'm doing down into smaller, more quickly achievable chunks. This is a big win. It means that your project is broken down to really measurable components. If you finish your pseudo-code and find that you have 100 things to write, then you can mark them off your list, and easily see how far you have to go.
But back to SQL...
SQL queries are really not that different. Hopefully your database design was based on a chunk of English in the first place, so that you have a reference to what everything does in a language you understand nicely. So then, when you need to write a query, you start by expressing it in English. Then you explain what you mean by each statement. When it's broken down enough, your query is just about written already.
For example. Recently someone asked about a query that would run on the 'pubs' database to show all the employees that had the next anniversary of their hiredate coming up in the next 30 days.
Well, considering that 'hire_date' is stored for everyone in the employees table, the location of the data is a no-brainer. It's only the computing bit that is going to be tricky.
So let's explain our statement better.
Someone's anniversary is the date this year (or next year if it's already passed) which is an exact number of years added on to the original date.
The 'exact number of years' added on to the original date is just the difference in the number of years between then and now.
Oh yeah - let's not use getdate(), because that makes testing harder. Later, we can replace our variable with getdate(), but that's for later.
datepart(year,hire_date) is the year of hire.
datepart(year,@now) is this year.
So, dateadd(year,datepart(year,@now)-datepart(year,hire_date),hire_date) is when that date occurred this year.
So then we check to see if it's already been (but comparing with tomorrow, because we don't want to miss today!)
case when dateadd(year,datepart(year,@now)-datepart(year,hire_date),hire_date) < dateadd(day,1,@now)
then dateadd(year,datepart(year,@now)-datepart(year,hire_date)+1,hire_date)
else dateadd(year,datepart(year,@now)-datepart(year,hire_date),hire_date)
end
So you see, it pretty much wrote itself! Next put all the values you need to compare in the resultset, and then make your where clause from that:
So you construct:
Select fname, lname, hire_date,
,datepart(year,hire_date)
,dateadd(year,datepart(year,@now)-datepart(year,hire_date),hire_date)
,case when dateadd(year,datepart(year,@now)-datepart(year,hire_date),hire_date) < dateadd(day,1,@now)
then dateadd(year,datepart(year,@now)-datepart(year,hire_date)+1,hire_date)
else dateadd(year,datepart(year,@now)-datepart(year,hire_date),hire_date)
end
From employee
where datediff(day, @now, case when dateadd(year,datepart(year,@now)-datepart(year,hire_date),hire_date) < dateadd(day,1,@now)
then dateadd(year,datepart(year,@now)-datepart(year,hire_date)+1,hire_date)
else dateadd(year,datepart(year,@now)-datepart(year,hire_date),hire_date)
end) <= 30
order by 7
Looks messy... could maybe be optimised a little. But it's correct, which counts for a lot. And we could verify that it was correct by looking down the list of values in the earlier statement. And because we're using a variable, we could look for interesting cases such as leap years and entries near the New Year.