Rob Farley

Rob Rob Farley has been consulting in IT since completing a Computer Science degree with first class honours in 1997. Before moving to Adelaide, he worked in consultancies in Melbourne and London. He runs the development department in one of Australia's leading IT firms, as well as doing database application consultancy and training. He heads up the Adelaide SQL Server User Group, and holds several Microsoft certifications.

Rob has been involved with Microsoft technologies for most of his career, but has also done significant work with Oracle and Unix systems. His preferred database is SQL Server and his preferred language is C#. Recently he has been involved with Microsoft Learning in the US, creating and reviewing new content for the next generation of Microsoft exams.

Over the years, Rob's clients have included BP Oil, OneLink Transit, Accenture, Avanade, Australian Electorial Commission, the Chartered Institute of Personnel and Development, the Royal Borough of Kingston, Help The Aged, Unisys, Department of Treasury and Finance (Vic), National Mutual, the Bible Society and others.

Did you mean to come here? My blog is now at http://msmvps.com/blogs/robfarley



30 September 2006

Primes

A challenge is always good. Ward Pond has a way of finding prime numbers.

My machine isn't as chunky as some, so my numbers are different. But...

I think Ward's performance largely comes from the fact that his starting point assumes he has found all the primes up to 31 already. So I adjusted his query a little, commenting out the section where he had all the previously done primes:

INSERT INTO dbo.Primes (Prime)
SELECT num
FROM dbo.nums
WHERE /*(num % 2 <> 0 OR num = 2)
AND (num % 3 <> 0 OR num = 3)
AND (num % 5 <> 0 OR num = 5)
AND (num % 7 <> 0 OR num = 7)
AND (num % 11 <> 0 OR num = 11)
AND (num % 13 <> 0 OR num = 13)
AND (num % 17 <> 0 OR num = 17)
AND (num % 19 <> 0 OR num = 19)
AND (num % 23 <> 0 OR num = 23)
AND (num % 29 <> 0 OR num = 29)
AND (num % 31 <> 0 OR num = 31)
AND */ num <> 1
AND num <= @Limit

..and having a different starting place for @Last

SET @Last = 1 --31


I set @Limit to 10,000, and it ran in 350 ms. His original query did it in only 60ms, which tells me that starting at 31 makes a massive difference, and is definitely a good idea. But let's see how another algorithm might compare, with both systems starting at 2.

I'm starting with a prepopulated auxiliary table of numbers, just as he did. I'm using the same counting mechanism as him.


DECLARE @Limit int;
SET @Limit = 10000;

DECLARE @Start datetime, @End datetime;
SET @Start = CURRENT_TIMESTAMP;


insert into dbo.rf_primesfound
select p.num
from dbo.nums n1
join
dbo.nums f
on n1.num between 2 and @limit
and f.num between 2 and sqrt(@limit)
and f.num <= @limit / n1.num
right join
dbo.nums p
on p.num = f.num * n1.num
where f.num is null
and p.num between 2 and @limit


SET @End = CURRENT_TIMESTAMP

SELECT @Start AS Start_time, @End AS End_time,
DATEDIFF(ms, @Start, @End) AS Duration,
COUNT(*) AS Primes_found, @Limit AS Limit
FROM dbo.rf_primesfound

select * from dbo.rf_primesfound


I set @Limit to 10,000, and my one ran in 170 ms.

Ok, so let's see how well they scale. Let's try for 100,000. Mine took 1673ms. His took 3243ms.

For 1,000,000, mine took 21 seconds. His took 55 seconds. But when I let him start at 31, his took about 5 seconds.

I think the difference comes down to the fact that Ward is going through the list one prime at a time, whereas I'm taking a set-based approach and just filtering them all out at once.
I feel like there would be ways of improving it even more, but will be someone else's challenge.

I definitely think the way to go is to start with the low-hanging fruit. Perhaps there are ways that I could improve mine to take a few steps to get the low-hanging fruit first... not today though.