Rob Farley

Rob Rob Farley has been consulting in IT since completing a Computer Science degree with first class honours in 1997. Before moving to Adelaide, he worked in consultancies in Melbourne and London. He runs the development department in one of Australia's leading IT firms, as well as doing database application consultancy and training. He heads up the Adelaide SQL Server User Group, and holds several Microsoft certifications.

Rob has been involved with Microsoft technologies for most of his career, but has also done significant work with Oracle and Unix systems. His preferred database is SQL Server and his preferred language is C#. Recently he has been involved with Microsoft Learning in the US, creating and reviewing new content for the next generation of Microsoft exams.

Over the years, Rob's clients have included BP Oil, OneLink Transit, Accenture, Avanade, Australian Electorial Commission, the Chartered Institute of Personnel and Development, the Royal Borough of Kingston, Help The Aged, Unisys, Department of Treasury and Finance (Vic), National Mutual, the Bible Society and others.

Did you mean to come here? My blog is now at http://msmvps.com/blogs/robfarley



22 October 2005

Median aggregate working

Well, I feel like I cheated, but I do have a working user-defined aggregate function that calculates the median.

I wrote earlier about it not really being possible. Well, I managed to implement one that works, even if it's very nastily done, really slow, with other issues too.

I persist the data in a file, and then in the Terminate() function, I read it back into an ArrayList, sort it, and grab the middle record. It's not very elegant, but it works.

The issues are:

1/ I had to mark it as unsafe because it accesses the file system. So then I had to make the database trustworthy and use an sa account to import the assembly.

2/ If the query is interrupted, the file isn't deleted properly. I can't implement IDispose, because the CLR object is disposed after every record, so that idea doesn't work at all. So instead, I would use a separate process to look for old files and remove them.

3/ It's really slow. On AdventureWorks on my laptop, it took less than a second to work through the 290 records in Employee, but 13 seconds to get through SalesOrderHeader. But it does give the correct records. I figure it's no good for a website, but it could be okay for a scheduled report, and to be honest... if it's something that a SQL developer is putting together, then they can use row_number() to get the median out. This is more useful as an available function that could be used in a report built by a business user in Report Manager.

I guess 'really slow' is subjective. I hope so anyway. :)

I won't post code here... I'm sure you all know how to write data to a file and then read it back again. You can always post a comment if you really want to see the source code.