Login or Sign Up to become a member!
LessThanDot Sit Logo

LessThanDot

Data Management

Less Than Dot is a community of passionate IT professionals and enthusiasts dedicated to sharing technical knowledge, experience, and assistance. Inside you will find reference materials, interesting technical discussions, and expert tips and commentary. Once you register for an account you will have immediate access to the forums and all past articles and commentaries.

LTD Social Sitings

Lessthandot twitter Lessthandot Linkedin Lessthandot friendfeed Lessthandot facebook Lessthandot rss

Note: Watch for social icons on posts by your favorite authors to follow their postings on these and other social sites.

Your profile

    Search

    XML Feeds

    Google Ads

    « What makes a successful SQL Server Reporting Services implementation?SQL Friday, The Best SQL Server Links Of The Past Week Episode 18 »
    comments

    A while back George, Denis and I did a series of posts on calculating distance between sets of latitude/longitude coordinates. Those posts can be found here.

    Part 1: T-SQL
    Part 2: Geospatial Data Type (2008)
    Part 3: CLR (2005 +)

    At the time I promised to run some tests. I had an idea how it would turn out at that point, but wanted to make my testing a bit more thorough. So, I've got the three functions that were created from following along with those blog entries. I then created a new stored proc for each method. For example (this is the TSQL version):

    1. CREATE proc [dbo].[ClosestZips_TSQL] (@ZipCode char(5), @Quantity int)
    2. as
    3.  
    4. begin
    5. set nocount on
    6.  
    7. SELECT top ( @Quantity ) h.ZipCode
    8.     , h.City
    9.     , h.StateAbbreviation
    10.     , h.Latitude
    11.     , h.Longitude
    12.     , dbo.clrDistCalc(h.Longitude, h.Latitude, g.Longitude, g.Latitude) as Distance
    13. FROM zipcodes g
    14. JOIN zipcodes h ON g.zipcode <> h.zipcode
    15.     AND g.ZipCode = @ZipCode
    16. WHERE dbo.tsqlDistCalc(h.Longitude, h.Latitude, g.Longitude, g.Latitude)<=(200 * 1609.344)
    17. ORDER BY dbo.tsqlDistCalc(h.Longitude, h.Latitude, g.Longitude, g.Latitude)
    18.  
    19. end

    Then, I ran these procs for my zip code, for a varying number of results. The number of results didn't turn out to matter, so I've left that out. This table contains the results:

    MethodAverage Time (10 Runs)
    T-SQL2,075 ms
    GEOSPATIAL1,272 ms
    CLR600 ms

    These times were on a pretty weak laptop, at least by software developer's standards. I ran the test a whole bunch of times, and these were typical times (but in the low end of the spectrum I observed). I am sure times on a better machine would be much faster, and I would love to hear other people's results.

    So, the CLR was the winner in this test, but there are some things to consider besides speed. One is its availability. Of course it is not available on SQL 2000 or earlier, but there are other factors in play as well. For example, on my shared web host I cannot run CLR code in the database :-/. Some organizations may not allow CLR integration code on their servers as a rule either. A big advantage you have with the other two is that if you have a SQL Server with the requisite version at your disposal, you can use them.

    Another thing to consider is what else you can do with the data. Using the CLR and T-SQL approaches, the data is stored as raw coordinate pairs. I haven't confirmed this, but they seem much quicker to write to as well, so that should be taken into consideration if your app will need to do a lot of writing (not likely for a "Closest Store" type of thing).

    Storing the raw coordinate pairs will also allow you to do some cool things to limit the size of your query, that can yield tremendous performance gains. For example, George showed me a trick once where you can first use the pythagorean theorem to approximate distance, and use this to create a bounding box. You then limit the number of pairs you need to run the "real" calculation on and really speed things up. But that might be a topic for another post.

    I've tried to cover the ups and downs of these methods the best that I could. I'd love to hear about anything I missed. Have fun!

    *** Got a SQL Server question? Check out our Microsoft SQL Server Programming forum or our Microsoft SQL Server Admin forum

    About the Author

    User bio imageAlex is a .net and SQL Server developer from southeastern PA, where he lives with a lovely wife and a veritable smorgasbord of pets. He recently completed a masters degree in Software Engineering from Penn State. He loves mountain biking, open source software, home brewing, Syracuse basketball, and the mono runtime.
    Social SitingsTwitterLinkedInHomePageLTD RSS Feed
    1745 views
    Instapaper

    1 comment

    Comment from: CB [Visitor] Email
    CB At a former job we had to do this sort of thing, and also ran some benchmark software against the application (this was pre-2008, so no spatial type). Found that CLR and/or using UDF in TSQL was just too slow. We ended up putting the whole haversine calculation right inside the main SQL statement...way faster. Made for some messy code, but ran lots faster....if I can track it down I'll post.
    05/03/09 @ 12:33

    Leave a comment


    Your email address will not be revealed on this site.

    Your URL will be displayed.
    (Line breaks become <br />)
    (Name, email & website)
    (Allow users to contact you through a message form (your email will not be revealed.)