Login or Sign Up to become a member!
LessThanDot Sit Logo

LessThanDot

Data Management

Less Than Dot is a community of passionate IT professionals and enthusiasts dedicated to sharing technical knowledge, experience, and assistance. Inside you will find reference materials, interesting technical discussions, and expert tips and commentary. Once you register for an account you will have immediate access to the forums and all past articles and commentaries.

LTD Social Sitings

Lessthandot twitter Lessthandot Linkedin Lessthandot friendfeed Lessthandot facebook Lessthandot rss

Note: Watch for social icons on posts by your favorite authors to follow their postings on these and other social sites.

Your profile

Authors

Search

XML Feeds

Google Ads

« What makes a successful SQL Server Reporting Services implementation?SQL Friday, The Best SQL Server Links Of The Past Week Episode 18 »
comments
Rate Post:
submit to reddit Digg!FacebookDotnetkicks

A while back George, Denis and I did a series of posts on calculating distance between sets of latitude/longitude coordinates. Those posts can be found here.

Part 1: T-SQL
Part 2: Geospatial Data Type (2008)
Part 3: CLR (2005 +)

At the time I promised to run some tests. I had an idea how it would turn out at that point, but wanted to make my testing a bit more thorough. So, I've got the three functions that were created from following along with those blog entries. I then created a new stored proc for each method. For example (this is the TSQL version):

  1. CREATE PROC [dbo].[ClosestZips_TSQL] (@ZipCode CHAR(5), @Quantity INT)
  2. AS
  3.  
  4. BEGIN
  5. SET NOCOUNT ON
  6.  
  7. SELECT TOP ( @Quantity ) h.ZipCode
  8.     , h.City
  9.     , h.StateAbbreviation
  10.     , h.Latitude
  11.     , h.Longitude
  12.     , dbo.clrDistCalc(h.Longitude, h.Latitude, g.Longitude, g.Latitude) AS Distance
  13. FROM zipcodes g
  14. JOIN zipcodes h ON g.zipcode <> h.zipcode
  15.     AND g.ZipCode = @ZipCode
  16. WHERE dbo.tsqlDistCalc(h.Longitude, h.Latitude, g.Longitude, g.Latitude)<=(200 * 1609.344)
  17. ORDER BY dbo.tsqlDistCalc(h.Longitude, h.Latitude, g.Longitude, g.Latitude)
  18.  
  19. END

Then, I ran these procs for my zip code, for a varying number of results. The number of results didn't turn out to matter, so I've left that out. This table contains the results:

MethodAverage Time (10 Runs)
T-SQL2,075 ms
GEOSPATIAL1,272 ms
CLR600 ms

These times were on a pretty weak laptop, at least by software developer's standards. I ran the test a whole bunch of times, and these were typical times (but in the low end of the spectrum I observed). I am sure times on a better machine would be much faster, and I would love to hear other people's results.

So, the CLR was the winner in this test, but there are some things to consider besides speed. One is its availability. Of course it is not available on SQL 2000 or earlier, but there are other factors in play as well. For example, on my shared web host I cannot run CLR code in the database :-/. Some organizations may not allow CLR integration code on their servers as a rule either. A big advantage you have with the other two is that if you have a SQL Server with the requisite version at your disposal, you can use them.

Another thing to consider is what else you can do with the data. Using the CLR and T-SQL approaches, the data is stored as raw coordinate pairs. I haven't confirmed this, but they seem much quicker to write to as well, so that should be taken into consideration if your app will need to do a lot of writing (not likely for a "Closest Store" type of thing).

Storing the raw coordinate pairs will also allow you to do some cool things to limit the size of your query, that can yield tremendous performance gains. For example, George showed me a trick once where you can first use the pythagorean theorem to approximate distance, and use this to create a bounding box. You then limit the number of pairs you need to run the "real" calculation on and really speed things up. But that might be a topic for another post.

I've tried to cover the ups and downs of these methods the best that I could. I'd love to hear about anything I missed. Have fun!

*** Got a SQL Server question? Check out our Microsoft SQL Server Programming forum or our Microsoft SQL Server Admin forum

About the Author

Alex is a .net and SQL Server developer from southeastern PA, where he lives with a lovely fiance and a veritable smorgasbord of pets. He's also working on a masters degree in Software Engineering. He loves mountain biking, home brewing, and the mono runtime.
Social SitingsTwitterHomePageLTD RSS Feed
891 views
submit to reddit Digg!FacebookDotnetkicks

Comments and Feedback

1 comment

Comment from: CB [Visitor] Email
At a former job we had to do this sort of thing, and also ran some benchmark software against the application (this was pre-2008, so no spatial type). Found that CLR and/or using UDF in TSQL was just too slow. We ended up putting the whole haversine calculation right inside the main SQL statement...way faster. Made for some messy code, but ran lots faster....if I can track it down I'll post.
05/03/09 @ 12:33

Leave a comment


Your email address will not be revealed on this site.

Your URL will be displayed.
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)