Rob Farley

28 April 2005

Common SQL errors and pseudo-code

Recently I've been reading (and commenting) on various SQL newsgroups.

More and more I find that the problems people have are because they haven't defined what they want properly. Perhaps this is a difference between people who learn to program in a structured environment (such as a university), and people who learn to program by being thrown in at the deep end.

When I went to uni, I could already code adequately. I wasn't a great programmer, but I didn't feel like I had major coding issues. At uni, they taught us things like pseudo-code, and I kinda dismissed it, thinking that that was for people who couldn't code. But quite quickly, I saw the value in it.

Pseudo-code is really handy in all kinds of ways. By writing out what I'm trying to do in my own language, I can achieve a stack of things.

Firstly, it makes me think better about what it is I'm trying to achieve. Writing something out in English will help you notice when you're approaching something from the wrong direction.

Secondly, it helps show me where modules should be written. Anything I've written as a sentence that isn't almost a line of code in itself, I put as a function, or better, re-use a function I've written before. If it's new, I write pseudo-code for that function, which helps me work out what information I need to pass in and return.

Thirdly, it breaks what I'm doing down into smaller, more quickly achievable chunks. This is a big win. It means that your project is broken down to really measurable components. If you finish your pseudo-code and find that you have 100 things to write, then you can mark them off your list, and easily see how far you have to go.

But back to SQL...

SQL queries are really not that different. Hopefully your database design was based on a chunk of English in the first place, so that you have a reference to what everything does in a language you understand nicely. So then, when you need to write a query, you start by expressing it in English. Then you explain what you mean by each statement. When it's broken down enough, your query is just about written already.

For example. Recently someone asked about a query that would run on the 'pubs' database to show all the employees that had the next anniversary of their hiredate coming up in the next 30 days.

Well, considering that 'hire_date' is stored for everyone in the employees table, the location of the data is a no-brainer. It's only the computing bit that is going to be tricky.

So let's explain our statement better.

Someone's anniversary is the date this year (or next year if it's already passed) which is an exact number of years added on to the original date.

The 'exact number of years' added on to the original date is just the difference in the number of years between then and now.

Oh yeah - let's not use getdate(), because that makes testing harder. Later, we can replace our variable with getdate(), but that's for later.

datepart(year,hire_date) is the year of hire.
datepart(year,@now) is this year.

So, dateadd(year,datepart(year,@now)-datepart(year,hire_date),hire_date) is when that date occurred this year.

So then we check to see if it's already been (but comparing with tomorrow, because we don't want to miss today!)


case when dateadd(year,datepart(year,@now)-datepart(year,hire_date),hire_date) < dateadd(day,1,@now)
    then dateadd(year,datepart(year,@now)-datepart(year,hire_date)+1,hire_date)
    else dateadd(year,datepart(year,@now)-datepart(year,hire_date),hire_date)
end

So you see, it pretty much wrote itself! Next put all the values you need to compare in the resultset, and then make your where clause from that:

So you construct:


Select fname, lname, hire_date,
,datepart(year,hire_date)
,dateadd(year,datepart(year,@now)-datepart(year,hire_date),hire_date)
,case when dateadd(year,datepart(year,@now)-datepart(year,hire_date),hire_date) < dateadd(day,1,@now)
    then dateadd(year,datepart(year,@now)-datepart(year,hire_date)+1,hire_date)
    else dateadd(year,datepart(year,@now)-datepart(year,hire_date),hire_date)
 end
From employee
where datediff(day, @now, case when dateadd(year,datepart(year,@now)-datepart(year,hire_date),hire_date) < dateadd(day,1,@now)
    then dateadd(year,datepart(year,@now)-datepart(year,hire_date)+1,hire_date)
    else dateadd(year,datepart(year,@now)-datepart(year,hire_date),hire_date)
 end) <= 30
order by 7

Looks messy... could maybe be optimised a little. But it's correct, which counts for a lot. And we could verify that it was correct by looking down the list of values in the earlier statement. And because we're using a variable, we could look for interesting cases such as leap years and entries near the New Year.

22 April 2005

RowNum

Having used Oracle quite a bit early in my database experience, I found that I liked using rownum for all kinds of things.

Nowadays, in SQL Server, I'm very pleased to see that SQL2005 is introducing the feature to the Microsoft world. About time!

It got me thinking though, about a common situation that I come across. Let's suppose I want a list of dates between two dates, and then join that list with another table for some reason. Perhaps I want to look at the total sales that went through each day... returning 0 if there were none (rather than just skipping that day, which is what would happen if I were to only look at the Sales table).

At the moment, I tend to use a table that I just store numbers in for this type of purpose. Like this:

select * from dbo.numbers

nums
----
0
1
2
3
4
5
...

Then I can write a query like this:


select dateadd(day,num.number,@startdate) theDate, sum(sales.amount) theSales
from numbers num (NOLOCK)
     left join salesdata sales (NOLOCK)
          on sales.saledate >= dateadd(day,num.number,@startdate)
          and sales.saledate < dateadd(day,num.number+1,@startdate)
where num.number <= datediff(day,@startdate,@enddate)
group by num.number

This query will give me zeros for weekends, public holidays, whenever no sales are made. Makes it display much more consistently in a report.

If I think I'm going to use more numbers than I have stored, I can always call a top-up function first, to check how high my numbers go and then add more as required. Or I could use a function to create the table on the fly... but I'm not so keen on this idea, as disk-space for a table is typically cheaper than the processing time involved in creating a temporary table each time.

Using rownum, I could have a table of bit values, which could be stored much smaller. I would just have to have enough of them in the table, checking count(*) instead of max(number) in my top-up function.

My query would then become:


select dateadd(day,rownum,@startdate) theDate, sum(sales.amount) theSales
from bits (NOLOCK)
     left join salesdata sales (NOLOCK)
          on sales.saledate >= dateadd(day,rownum,@startdate)
          and sales.saledate < dateadd(day,rownum+1,@startdate)
where rownum <= datediff(day,@startdate,@enddate)
group by dateadd(day,rownum,@startdate)

But you can see that rownum appears in the 'group by' statement, which isn't healthy at all.

Also, if I want to return multiple records for each day, I have an issue, because rownum will increase by one on each row. Really, I need a subquery, like this:


select dateadd(day,num.number,@startdate) theDate, sum(sales.amount) theSales, sales.salesman
from (select rownum as number from bits (NOLOCK) where rownum <= datediff(day,@startdate,@enddate)) num
     left join salesdata sales (NOLOCK)
          on sales.saledate >= dateadd(day,rownum,@startdate)
          and sales.saledate < dateadd(day,rownum+1,@startdate)
group by sales.salesman, num.number

20 April 2005

Graphical tools for writing SQL queries

Someone commented: But if it weren't for graphical tools, I wouldn't have ever been able to conceptualise a join

Fair enough. In which case, learn about joins, and then stop using your graphical tools. Once you know the difference between an inner join and a left/right join, then leave the graphical joins alone. I reckon the only use for graphical tools is to avoid typos in table names.

SQL isn't hard!

Haha... I just showed this blog page to the friend of mine I referred to in the blog below. He told me that I made it sound like he had a clue. Thing is... I reckon he does.

SQL shouldn't be hard! Typically, people confuse themselves with SQL, instead of just thinking about it in a straight-forward way. I think if you're getting confused about writing a SQL statement, then just try to do it more slowly. Put something together than queries everything you might need to use. Then try to make sure that each record is reflected properly. Then apply your aggregations, etc, and you'll probably find that you can get your query done correctly first time.

Oh, and I try to get people to avoid using graphical tools to write their queries. I think they just make things worse. Of course they can help with joins, but joins aren't very hard either. Just learn the syntax, and try writing your queries out from scratch. Chances are you'll quickly learn to do it well.

Aggregations in SQL

Today a friend of mine asked me about a SQL query he was trying to write. He had expressed it in English, along the lines of: "Each record represents a cost to the organisation dependent on the value of field3. If field3 is 1, then it costs us $1.20. If it's not, then it costs us $1.50."

He had worked out that he wanted to multiply the count where field3 = 1 by 1.2, and then add that to 1.5 * the count where field3 <> 1. But got stuck.

Problem was, he expressed it as:

CASE WHEN field3 = 1 THEN 1.20 * COUNT(id) else 1.50 * COUNT(id) END

When he tried to sum this, he got in a mess, because sum and count don't really go together very well like that.

What he wanted was:

sum(case when field3 = 1 then 1.20 else 1.50 end)

Reason being that he should've concentrated on working out the cost FOR EACH RECORD. That's easy to sum, with an appropriate 'group by' clause. By trying to work it out against the count, he was just approaching the problem from the wrong direction.

DataView column names

Today I had a SQL query which I wanted to make a table from in ASP.Net (using C#). I didn't want to use a DataGrid and bind it, I wanted to be able to loop through the fields and display it as I used to do for a RecordSet. The odd 'switch' statement to alter the formatting, that type of thing. Nice and generic, that I can pass DataViews into, so that whenever I have a field with a particular name, I can display a link, or whatever.

With an old ADO recordset, I could just refer to rs.Fields[i].Name for the column name and rs.Fields[i].Value for the values. But no such luck in .Net.

The answer turned out to be to refer to the Table property of the DataView. Then you have refer to the Columns property, and get the ColumnName out of there. Like this:

// 'dv' is my DataView. You should use better variable names than this.
Table t = new Table();
TableRow tr = new TableRow();
for (int i=0;i<dv.Table.Columns.Count;i++)
{
tc = new TableCell();
tc.HorizontalAlign = HorizontalAlign.Center;
tc.Text = dv.Table.Columns[i].ColumnName;
tr.Cells.Add(tc);
}
t.Rows.Add(tr);

foreach (DataRowView dvr in dv)
{
TableRow tr = new TableRow();
for (int i=0;i<dv.Table.Columns.Count;i++)
{
TableCell tc = new TableCell();
// switch on dv.Table.Columns[i].ColumnName if you want
tc.Text = dvr[i].ToString();
tr.Cells.Add(tc);
}
t.Rows.Add(tr);
}
return(t);

First blog

Well, I'm not entirely convinced about this blogging thing, but I figured I should probably give it a go for a while. After all, I'm sure there are things that I know that other people might be interested in, stuff like that. If it turns out I can help people out through a blog, then great.

Rob Farley - You could win from here!

Links

Blogs