George and I decided to show you how you can do simple radius searches based on Zip Codes, he did the SQL 2005 and before version here: SQL Server Zipcode Latitude/Longitude proximity distance search and I will do the SQL Server 2008 version.
The first thing we need to do is load our data. There are various sources for this data, ranging from free to expensive monthly subscriptions. One source of free data is: http://download.geonames.org/export/zip/ make sure to grab the US.ZIP file for this demo. Unrar/Unzip the file and you will have a US.txt file.
Once you have downloaded your data, the next step is to import it in to your database. You can use the following script to do it.
T-SQL | |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 | CREATE TABLE ZipCodesTemp( [Country] [VARCHAR](2) NULL, [ZipCode] [VARCHAR](5) NULL, [City] [VARCHAR](200) NULL, [STATE] [VARCHAR](50) NULL, [StateAbbreviation] [VARCHAR](2) NULL, [County] [VARCHAR](50) NULL, [Unused1] [VARCHAR](5) NULL, [Unused2] [VARCHAR](1) NULL, [Latitude] [DECIMAL](8,5) NULL, [Longitude] [DECIMAL](8,5) NULL, [Unused3] [VARCHAR](1) NULL ) DECLARE @bulk_cmd VARCHAR(1000) SET @bulk_cmd = 'BULK INSERT ZipCodesTemp FROM ''C:YourFolderUS.txt'' WITH (FIELDTERMINATOR=''t'', ROWTERMINATOR = '''+CHAR(10)+''')' EXEC(@bulk_cmd) |
CREATE TABLE ZipCodesTemp( [Country] [VARCHAR](2) NULL, [ZipCode] [VARCHAR](5) NULL, [City] [VARCHAR](200) NULL, [STATE] [VARCHAR](50) NULL, [StateAbbreviation] [VARCHAR](2) NULL, [County] [VARCHAR](50) NULL, [Unused1] [VARCHAR](5) NULL, [Unused2] [VARCHAR](1) NULL, [Latitude] [DECIMAL](8,5) NULL, [Longitude] [DECIMAL](8,5) NULL, [Unused3] [VARCHAR](1) NULL ) DECLARE @bulk_cmd VARCHAR(1000) SET @bulk_cmd = 'BULK INSERT ZipCodesTemp FROM ''C:YourFolderUS.txt'' WITH (FIELDTERMINATOR=''t'', ROWTERMINATOR = '''+CHAR(10)+''')' EXEC(@bulk_cmd)
Now you will create this table
T-SQL | |
1 2 3 4 5 6 7 8 9 10 11 12 | CREATE TABLE ZipCodes( [Country] [VARCHAR](2) NULL, [ZipCode] [VARCHAR](5) NOT NULL, [City] [VARCHAR](200) NULL, [STATE] [VARCHAR](50) NULL, [StateAbbreviation] [VARCHAR](2) NULL, [County] [VARCHAR](50) NULL, [Latitude] [DECIMAL](8,5) NOT NULL, [Longitude] [DECIMAL](8,5) NOT NULL, [GeogCol1] [GEOGRAPHY] NULL, [GeogColTemp] [varchar](100) NULL ) |
CREATE TABLE ZipCodes( [Country] [VARCHAR](2) NULL, [ZipCode] [VARCHAR](5) NOT NULL, [City] [VARCHAR](200) NULL, [STATE] [VARCHAR](50) NULL, [StateAbbreviation] [VARCHAR](2) NULL, [County] [VARCHAR](50) NULL, [Latitude] [DECIMAL](8,5) NOT NULL, [Longitude] [DECIMAL](8,5) NOT NULL, [GeogCol1] [GEOGRAPHY] NULL, [GeogColTemp] [varchar](100) NULL )
There is at least one duplicate row in this file so we will import only uniques
T-SQL | |
1 2 3 | INSERT ZipCodes (Country,ZipCode,City,STATE,StateAbbreviation,County,Latitude,Longitude) SELECT DISTINCT Country,ZipCode,City,STATE,StateAbbreviation,County,Latitude,Longitude FROM ZipCodesTemp |
INSERT ZipCodes (Country,ZipCode,City,STATE,StateAbbreviation,County,Latitude,Longitude) SELECT DISTINCT Country,ZipCode,City,STATE,StateAbbreviation,County,Latitude,Longitude FROM ZipCodesTemp
Our next step will be to update the GeogCol1 table with something that SQL server can understand.
Here is some sample code that displays the format of this datatype
T-SQL | |
1 2 3 | DECLARE @h geography; SET @h = geography::STGeomFromText('POINT(-77.36750 38.98390)', 4326); SELECT @h |
DECLARE @h geography; SET @h = geography::STGeomFromText('POINT(-77.36750 38.98390)', 4326); SELECT @h
output
———————————————-
0xE6100000010C6744696FF07D4340EC51B81E855753C0
As you can see it is some binary data. This data is using the World Geodetic System 1984 (WGS 84)
To see if this is supported in your database you can run this query
T-SQL | |
1 | SELECT * FROM sys.spatial_reference_systems |
SELECT * FROM sys.spatial_reference_systems
And yes in my database it has a spatial_reference_id of 4326
T-SQL | |
1 2 | SELECT * FROM sys.spatial_reference_systems WHERE spatial_reference_id = 4326 |
SELECT * FROM sys.spatial_reference_systems WHERE spatial_reference_id = 4326
Here is the meta data
GEOGCS[“WGS 84”, DATUM[“World Geodetic System 1984”,
ELLIPSOID[“WGS 84”, 6378137, 298.257223563]], PRIMEM[“Greenwich”, 0],
UNIT[“Degree”, 0.0174532925199433]]
Back to our code, in order to have this data in our table
SET @h = geography::STGeomFromText(‘POINT(-77.36750 38.98390)’, 4326);
We need to do some things, first we update our temp column
T-SQL | |
1 2 3 | UPDATE zipcodes SET GeogColTemp= 'POINT(' + convert(varchar(100),longitude) +' ' + convert(varchar(100),latitude) +')' |
UPDATE zipcodes SET GeogColTemp= 'POINT(' + convert(varchar(100),longitude) +' ' + convert(varchar(100),latitude) +')'
Now we can update out geography column
T-SQL | |
1 2 | UPDATE zipcodes SET GeogCol1 = geography::STGeomFromText(GeogColTemp,4326) |
UPDATE zipcodes SET GeogCol1 = geography::STGeomFromText(GeogColTemp,4326)
We can drop the temp column now
T-SQL | |
1 | ALTER TABLE zipcodes DROP COLUMN GeogColTemp |
ALTER TABLE zipcodes DROP COLUMN GeogColTemp
Now we have to add a primary key, this is needed because otherwise we won’t be able to create our spatial index and the following message would be displayed
Server: Msg 12008, Level 16, State 1, Line 1
Table ‘zipcodes’ does not have a clustered primary key as required by the spatial index. Make sure that the primary key column exists on the table before creating a spatial index.
T-SQL | |
1 2 3 4 5 6 | ALTER TABLE zipcodes ADD CONSTRAINT [PK_ZipCode] PRIMARY KEY CLUSTERED ( Zipcode, Longitude ) WITH FILLFACTOR = 100 |
ALTER TABLE zipcodes ADD CONSTRAINT [PK_ZipCode] PRIMARY KEY CLUSTERED ( Zipcode, Longitude ) WITH FILLFACTOR = 100
Create the spatial index
T-SQL | |
1 2 | CREATE SPATIAL INDEX SIndx_SpatialTable_geography_col1 ON zipcodes(GeogCol1); |
CREATE SPATIAL INDEX SIndx_SpatialTable_geography_col1 ON zipcodes(GeogCol1);
first I will show you an example to calculate the distance that you can execute
T-SQL | |
1 2 3 4 5 | DECLARE @g geography; DECLARE @h geography; SET @h = geography::STGeomFromText('POINT(-77.36750 38.98390)', 4326); SET @g = geography::STGeomFromText('POINT(-77.36160 38.85570)', 4326); SELECT @g.STDistance(@h)/1609.344; |
DECLARE @g geography; DECLARE @h geography; SET @h = geography::STGeomFromText('POINT(-77.36750 38.98390)', 4326); SET @g = geography::STGeomFromText('POINT(-77.36160 38.85570)', 4326); SELECT @g.STDistance(@h)/1609.344;
as you can see the distance is 8.8490611480890067
Now I want to see all the zipcode which are within 20 miles of zipcode 10028 (yes I used to live there)
Here is a way that will take a long time since it is not sargable, this will take about 2 seconds
T-SQL | |
1 2 3 4 5 6 | SELECT h.* FROM zipcodes g JOIN zipcodes h on g.zipcode <> h.zipcode AND g.zipcode = '10028' AND h.zipcode <> '10028' WHERE g.GeogCol1.STDistance(h.GeogCol1)/1609.344 <= 20 |
SELECT h.* FROM zipcodes g JOIN zipcodes h on g.zipcode <> h.zipcode AND g.zipcode = '10028' AND h.zipcode <> '10028' WHERE g.GeogCol1.STDistance(h.GeogCol1)/1609.344 <= 20
Now we all know functions on the left side of the operator are bad, here is how this is optimized, we switch the calculation to the right side of the = sign
T-SQL | |
1 2 3 4 5 6 | SELECT h.* FROM zipcodes g JOIN zipcodes h on g.zipcode <> h.zipcode AND g.zipcode = '10028' AND h.zipcode <> '10028' WHERE g.GeogCol1.STDistance(h.GeogCol1)<=(20 * 1609.344) |
SELECT h.* FROM zipcodes g JOIN zipcodes h on g.zipcode <> h.zipcode AND g.zipcode = '10028' AND h.zipcode <> '10028' WHERE g.GeogCol1.STDistance(h.GeogCol1)<=(20 * 1609.344)
that ran in between 15 and 60 milliseconds
To find everything between 10 and 20 miles you can use this
T-SQL | |
1 2 3 4 5 6 7 | SELECT h.* FROM zipcodes g JOIN zipcodes h on g.zipcode <> h.zipcode AND g.zipcode = '10028' AND h.zipcode <> '10028' WHERE g.GeogCol1.STDistance(h.GeogCol1)<=(20 * 1609.344) AND g.GeogCol1.STDistance(h.GeogCol1)>= (10 * 1609.344) |
SELECT h.* FROM zipcodes g JOIN zipcodes h on g.zipcode <> h.zipcode AND g.zipcode = '10028' AND h.zipcode <> '10028' WHERE g.GeogCol1.STDistance(h.GeogCol1)<=(20 * 1609.344) AND g.GeogCol1.STDistance(h.GeogCol1)>= (10 * 1609.344)
As you can see doing stuff like this on SQL Server 2008 is fairly easy because of the geograpy data type
*** If you have a SQL related question try our Microsoft SQL Server Programming forum or our Microsoft SQL Server Admin forum
50 Comments
I have absolutely no use for this, but this is a really cool new feature and a really great post. Thanks, Denis! 🙂
You are welcome.
Credits go to George since he came up with the idea of the blogpost I just happened to do the 2008 version
Thanks for doing this you guys, I’d really been wanting to see a good tutorial to get me up to speed on this. I’m about to move crummy web page to a SQL 2008 host, I’m really hoping I’ll be able to use this feature for a new and improved version of the location thing I’m working on.
I had a little beef with their data for one zip code. This is more like it
update ZipCodes
set GeogCol1 = geography::STGeomFromText(‘POINT(40.02963540591171 -75.56317090988159)’, 4326)
where ZipCode = 19345
These spatial indexes are pretty impressive!
My apologies if I missed it, but in some of your procedures you use ‘1609.344’ in some of the calculations.
What is ‘1609.344’ ?
Chuck, it is done to convert to miles, the metric system is used in geospatial calculations
Thanks!
I see it now. I didn’t think about the conversion.
Thank you so much for a great tutorial about geograpy data type.
I was very glad to find this article! I have a site I’m working on, and by using the techniques described in this blog, I was able to make my proximity searching MUCH faster.
I wrote a blog entry (http://www.kidseatcheaptonight.com/blog/2009/08/site-updated-speed-increase.aspx) pointing some people to this blog, if any tech geeks happen to come across my site, hopefully they will pay your blog a visit too.
Thanks!
Jeremy
I would highly recommend doing the geography right from the original numeric rather than converting to text:
geography::Point(latitude, longitude, 4326);
As in:
DECLARE @h geography;
SET @h = geography::Point(-77.36750, 38.98390, 4326);
SELECT @h
Then there is no interim conversion and problems are resolved before the statement is run.
this is great and works well. one question, how can i sort by distinct (closer to farthest)?
Joe,
can you try if
ORDER BY g.GeogCol1.STDistance(h.GeogCol1)
works for you, not near a PC with sql server right now but that should work
Hi … Denis …
to connvert to miles we used this : ‘1609.344’
and what the value if i want to convert to metre
thanx
Bass for meters leave of the division
examples
–MILES
DECLARE @g geography;
DECLARE @h geography;
SET @h = geography::STGeomFromText(‘POINT(-77.36750 38.98390)’, 4326);
SET @g = geography::STGeomFromText(‘POINT(-77.36160 38.85570)’, 4326);
SELECT @g.STDistance(@h)/1609.344; –miles 8.84906114808901
GO
–METERS
DECLARE @g geography;
DECLARE @h geography;
SET @h = geography::STGeomFromText(‘POINT(-77.36750 38.98390)’, 4326);
SET @g = geography::STGeomFromText(‘POINT(-77.36160 38.85570)’, 4326);
SELECT @g.STDistance(@h); –meters…14241.1834643102
Go
–KILOMETERS
DECLARE @g geography;
DECLARE @h geography;
SET @h = geography::STGeomFromText(‘POINT(-77.36750 38.98390)’, 4326);
SET @g = geography::STGeomFromText(‘POINT(-77.36160 38.85570)’, 4326);
SELECT @g.STDistance(@h)/1000; –kilometers 14.2411834643102
GO
Thanx a large ,,, Denis
Actually i v been create simple GIS applcations
using sql server 2008 for database and
visual studio 2008 to build the appliaction …
but i dont know how to preview spatial data on visual studio 2008 as shown on sql server 2008 …
if you have reference or tutorial for me…
i am very appreciated ….
i hope you not borring for my question
thanx before
and sory i am not good enough in english …
I am taking a T-SQL class and need some help. I have to writw UDF that takes as input the latitude and lingitutde of each of two points on the earth’s surface, and calculates the distance between the ow points. The @distance formula was given and need the accuracy of one tenth of a mile.
Hope some can help!
Eve
Questions like these are better posted in the forum.
This stuff is amazing. I’ve just started rewriting a proximity search piece of one of our websites and happened upon this little gem. Thanks Denis and George!
This is GREAT!! Wow, been working with Zip Code radius searches and this makes it so much easier. Thanks!!!
One question, why doesn’t the query return the zip code that was searched in the results? The distance would be 0 but is still less than the search radius.
Excellent Post. Thank you very much!
Excellent Post, but I have the same question as Matt. The query that you had provided doesn’t return the zipcode that is being searched around. Anyone have a solution to that?
We threw this together to do radius searches based on city rather than a starting zip code. All based on Denis’ article.
CREATE FUNCTION [dbo].[GetProximityZipCodesByCityStateCountry]
(
@Country varchar(2),
@City varchar(200),
@State varchar(2),
@MinRadius int,
@MaxRadius int,
@Units char(1)
)
RETURNS
@Zips TABLE
(
ZipCode varchar(5)
)
AS
BEGIN
DECLARE @Min int, @Max int, @center Geography
IF(@Units = ‘k’) –Kilometers
SELECT @Min = (@MinRadius * 1000), @Max = (@MaxRadius * 1000)
ELSE — Miles
SELECT @Min = (@MinRadius * 1609.344), @Max = (@MaxRadius * 1609.344)
SET @center = (SELECT geography::STGeomFromText(‘POINT(‘ + CONVERT(VARCHAR(100),AVG(Longitude)) +’ ‘ + CONVERT(VARCHAR(100),AVG(Latitude)) +’)’,4326)
FROM ZipCodes g
WHERE Country = @Country AND City = @City and StateAbbreviation = @State)
INSERT INTO @Zips
SELECT ZipCode
FROM zipcodes
WHERE GeogCol1.STDistance(@center) BETWEEN @Min AND @Max
RETURN
END
GO
Ashey, looks cool, thanks for the code
I have not been able to load the provided us.text data set without errors. Feel stupid sense there is no other posts on it. I get errors saying the unused 3 column is too short. So i make it bigger varchar(100) and the data goes in but now there is longitude info in the unused 3. I have tried to use excel to format the set to go in correctly with no luck.. and this is another dumb queston but what is the point of the 3 empty columns in the data set anyways???
I’m having the same problem as Jack, any suggestions? Jack – did you get it to work?
Guys, I had the same problem. The fix was to add a column after unused2. I added unused3 and changed the current unused3 to unused 4. Then they will both be populated.
Hope this help,
Ryan
ryan(@)ryanhelms.com
First, thanks for the great head start with working on the proximity search portion of my application. This has all been a great help. I knew the geography data type would come in handy at some point!
I only have one question. This is obviously disregarding any input on services to get driving time, rather than point to point calculations? This is great for part of my solution, but driving time would complete it.
Any ideas would be greatly appreciated.
Thanks
Ryan
* I meant driving mileage / time, not just time.
I got the data loaded now, but I also have the question about how to include the zip code that is being searched from, we’d want to include that in any results.
Ashey –
I am looking to use this for finding all records in our database that are, for example, within 5 miles of Los Angeles. It looks like this returns the zip codes that are within 5 miles of a central point, but for a place like Los Angeles, that doesn’t even return all the zip codes that are in Los Angeles, let alone ones that are 5 miles outside LA.
Do you have any suggestions for how I could accomplish this? Thanks.
Elizabeth-
If you define Los Angeles as just a collection of Zip codes, than you can expand that to “Los Angeles + 5 miles out” by applying Denis’ search process to every Zip in the Los Angeles data set, and removing duplicates.
e.g. in the version of the data set I currently have, there are 104 zip codes identified where stateabbreviation = ‘CA’ and city = ‘Los Angeles’. Select all zips within 5 miles of any of those Zips, and you should get pretty much what you’re looking for… all zips within Los Angeles + 5 mi
Remember you’re searching from the centroid of each zip, so there’s some distance between there and the boundary of the “Zip area” (which is a very vague concept; Zip codes are not polygons). Depending on how accurate you’re trying to be you might add a bit to your 5 mile radius to compensate for that.
I’m not sure if I’m doing something wrong or if the script is out of date with the file but I’m getting errors. Thanks for writing the article though.
My errors are below:
Msg 4863, Level 16, State 1, Line 1
Bulk load data conversion error (truncation) for row 1, column 11 (Unused3).
Msg 4863, Level 16, State 1, Line 1
Bulk load data conversion error (truncation) for row 2, column 11 (Unused3).
Msg 4863, Level 16, State 1, Line 1
Bulk load data conversion error (truncation) for row 3, column 11 (Unused3).
Msg 4863, Level 16, State 1, Line 1
Bulk load data conversion error (truncation) for row 4, column 11 (Unused3).
Msg 4863, Level 16, State 1, Line 1
Bulk load data conversion error (truncation) for row 5, column 11 (Unused3).
Msg 4863, Level 16, State 1, Line 1
Bulk load data conversion error (truncation) for row 6, column 11 (Unused3).
Msg 4863, Level 16, State 1, Line 1
Bulk load data conversion error (truncation) for row 7, column 11 (Unused3).
Msg 4863, Level 16, State 1, Line 1
Bulk load data conversion error (truncation) for row 8, column 11 (Unused3).
Msg 4863, Level 16, State 1, Line 1
Bulk load data conversion error (truncation) for row 9, column 11 (Unused3).
Msg 4863, Level 16, State 1, Line 1
Bulk load data conversion error (truncation) for row 10, column 11 (Unused3).
Msg 4863, Level 16, State 1, Line 1
Bulk load data conversion error (truncation) for row 11, column 11 (Unused3).
Msg 4865, Level 16, State 1, Line 1
Cannot bulk load because the maximum number of errors (10) was exceeded.
Msg 7399, Level 16, State 1, Line 1
The OLE DB provider “BULK” for linked server “(null)” reported an error. The provider did not give any information about the error.
Msg 7330, Level 16, State 2, Line 1
Cannot fetch a row from OLE DB provider “BULK” for linked server “(null)”.
I have applied this procedure posted above in my application this procedure is good to get geolocation of Person using ZipCode.
I have used google api to get latitude and longitude from zip code but sometimes it will give me lat. and log. or sometimes it will not return.
I am having this kind of problem.
I also got the same Bulk data error, just change the Unused3 datatype to varchar(30), just that latitude values get imported to longitude field and longitude values get imported to unused3 field with a return carriage (char(10)) added to second row onwards in the unused3 field, so you need to replace this character or you would find the cannot convert varchar to numeric errors.
I got the bulk data errors too. Instead of creating the ZipCodesTemp table I let the data import wizard create the table for me. First I renamed the text file “US.txt” to “ZipCodesTemp.txt”. Next, I let the data import wizard data-type the columns as it saw fit (varchar(50)). As you continue, you see the wizard wants to import this data into the “ZipCodesTemp” table that does not yet exist. The wizard will automatically create the table and import all of the records. This resulted in zero errors with over 43,000+ records imported.
Renaming US.txt to ZipCodesTemp.txt and using the data import wizard fixed myriad problems with bulk insert …thanks!
P.S. This was an incredibly useful article with easy to follow instructions and awesome examples. Thank you for taking the time to demonstrate how this new functionality works.
Michelle Ufford, the very first poster (02/11/09 @ 14:46 ), said, “I have absolutely no use for this, but this is a really cool new feature and a really great post.”
For anyone else who might be wondering, “What would I ever use this for?” Well, as a web application developer, I have a customer who wants to be able to offer a list of available jobs in and around the area the web visitor is surfing from before the visitor enters any data.
One FREE source that will tell you the location of the visitor’s ISP (the company that issued the external IP Address of the device they used on the Internet): http://www.geoplugin.net/json.gp?jsoncallback=?
Give it a try. Paste the above URL in your browser and you will see a JSON result with a lot of useful info including latitude and longitude.
Wow, thanks! So awesome! Using this tutorial, I got this up and running so easily.
Does all geographic data need to be in one table with one spacial index? Or, do the performance benefits remain if you join two tables with their own spacial indexes and use the geographic methods on them (e.g. .STDistance)?
Thank you this is exactly what I needed, I was looking forward to having chance of using and understanding Geography Data types.
With so many Smart Phone applications using Lat and long along with mapping, having Geo data by zip when detailed Lat long are not available this is a great way to move forward.
Any chance you’ve come up with a way to get the zip code for a particular lat/long? If so, can you share it with me.
Thanks
Ken
Great post 🙂
Just wanted to say thanks for posting this, been looking for a way to calculate distances using tables and not variables and you were the first one to help me!
Fantastic Article – Just started using the new geography type and this helped big time
This worked for the US values but I tried to apply the same logic and scripts to create a proximity lookup for Canada (I used the CA.txt file) and I got an error stating that latitude had to be in the range -90 to 90. This affects about 30% of the addresses in Canada.
How is this possible when the ranges are supposed to be -180 to 180? If anyone knows why I am getting this error, please let me know. Thank you.
Greetings,
When I run this part.
ALTER TABLE zipcodes ADD
CONSTRAINT [PK_ZipCode] PRIMARY KEY CLUSTERED
(
Zipcode,
Longitude
) WITH FILLFACTOR = 100
I get this error message… Any ideas why there is a duplicate key?
Msg 1505, Level 16, State 1, Line 2
The CREATE UNIQUE INDEX statement terminated because a duplicate key was found for the object name ‘dbo.ZipCodes’ and the index name ‘PK_ZipCode’. The duplicate key value is (10804, -73.78630).
Msg 1750, Level 16, State 0, Line 2
Could not create constraint. See previous errors.
The statement has been terminated.
Hello All,
Brilliant and well-explained article. Thank you very much.
Is there any particular reason why you used World Geodetic System 1984? I played around with other spatial_reference_ids and I noticed the value was different.
Regards.
I have used this code and it works great, BUT it does not include the actual zip code that I am entering. All the ones around it BUT the zip code itself. How can I make sure this is included in the results?.
Also is it required to declare h and g as geography objects? I am not a big SQL guy. 🙂
Thanks for the awesome article but I fail to understand the signifiance of the spatial id in the beginning of the article. Any help is appreciated, thanks!
Simply just to say THANK YOU 🙂
Works like lightening.