George and I decided to show you how you can do simple radius searches based on Zip Codes, he did the SQL 2005 and before version here: SQL Server Zipcode Latitude/Longitude proximity distance search and I will do the SQL Server 2008 version.

The first thing we need to do is load our data. There are various sources for this data, ranging from free to expensive monthly subscriptions. One source of free data is: http://download.geonames.org/export/zip/ make sure to grab the US.ZIP file for this demo. Unrar/Unzip the file and you will have a US.txt file.

Once you have downloaded your data, the next step is to import it in to your database. You can use the following script to do it.

T-SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
CREATE TABLE ZipCodesTemp(
    [Country] [VARCHAR](2) NULL,
    [ZipCode] [VARCHAR](5) NULL,
    [City] [VARCHAR](200) NULL,
    [STATE] [VARCHAR](50) NULL,
    [StateAbbreviation] [VARCHAR](2) NULL,
    [County] [VARCHAR](50) NULL,
    [Unused1] [VARCHAR](5) NULL,
    [Unused2] [VARCHAR](1) NULL,
    [Latitude] [DECIMAL](8,5) NULL,
    [Longitude] [DECIMAL](8,5) NULL,
    [Unused3] [VARCHAR](1) NULL
    )
 
 
DECLARE @bulk_cmd VARCHAR(1000)
SET @bulk_cmd = 'BULK INSERT ZipCodesTemp
   FROM ''C:YourFolderUS.txt''
   WITH (FIELDTERMINATOR=''t'', ROWTERMINATOR = '''+CHAR(10)+''')'
 
EXEC(@bulk_cmd)
CREATE TABLE ZipCodesTemp(
    [Country] [VARCHAR](2) NULL,
    [ZipCode] [VARCHAR](5) NULL,
    [City] [VARCHAR](200) NULL,
    [STATE] [VARCHAR](50) NULL,
    [StateAbbreviation] [VARCHAR](2) NULL,
    [County] [VARCHAR](50) NULL,
    [Unused1] [VARCHAR](5) NULL,
    [Unused2] [VARCHAR](1) NULL,
    [Latitude] [DECIMAL](8,5) NULL,
    [Longitude] [DECIMAL](8,5) NULL,
    [Unused3] [VARCHAR](1) NULL
    )


DECLARE @bulk_cmd VARCHAR(1000)
SET @bulk_cmd = 'BULK INSERT ZipCodesTemp
   FROM ''C:YourFolderUS.txt''
   WITH (FIELDTERMINATOR=''t'', ROWTERMINATOR = '''+CHAR(10)+''')'
 
EXEC(@bulk_cmd)

Now you will create this table

T-SQL
1
2
3
4
5
6
7
8
9
10
11
12
CREATE TABLE ZipCodes(
    [Country] [VARCHAR](2)  NULL,
    [ZipCode] [VARCHAR](5) NOT NULL,
    [City] [VARCHAR](200)  NULL,
    [STATE] [VARCHAR](50)  NULL,
    [StateAbbreviation] [VARCHAR](2)  NULL,
    [County] [VARCHAR](50) NULL,
    [Latitude] [DECIMAL](8,5) NOT NULL,
    [Longitude] [DECIMAL](8,5) NOT NULL,
    [GeogCol1] [GEOGRAPHY]  NULL,
    [GeogColTemp] [varchar](100) NULL
    )
CREATE TABLE ZipCodes(
    [Country] [VARCHAR](2)  NULL,
    [ZipCode] [VARCHAR](5) NOT NULL,
    [City] [VARCHAR](200)  NULL,
    [STATE] [VARCHAR](50)  NULL,
    [StateAbbreviation] [VARCHAR](2)  NULL,
    [County] [VARCHAR](50) NULL,
    [Latitude] [DECIMAL](8,5) NOT NULL,
    [Longitude] [DECIMAL](8,5) NOT NULL,
    [GeogCol1] [GEOGRAPHY]  NULL,
    [GeogColTemp] [varchar](100) NULL
    )

There is at least one duplicate row in this file so we will import only uniques

T-SQL
1
2
3
INSERT  ZipCodes (Country,ZipCode,City,STATE,StateAbbreviation,County,Latitude,Longitude)
SELECT DISTINCT Country,ZipCode,City,STATE,StateAbbreviation,County,Latitude,Longitude
FROM  ZipCodesTemp
INSERT  ZipCodes (Country,ZipCode,City,STATE,StateAbbreviation,County,Latitude,Longitude)
SELECT DISTINCT Country,ZipCode,City,STATE,StateAbbreviation,County,Latitude,Longitude
FROM  ZipCodesTemp

Our next step will be to update the GeogCol1 table with something that SQL server can understand.
Here is some sample code that displays the format of this datatype

T-SQL
1
2
3
DECLARE @h geography;
SET @h = geography::STGeomFromText('POINT(-77.36750 38.98390)', 4326);
SELECT @h
DECLARE @h geography;
SET @h = geography::STGeomFromText('POINT(-77.36750 38.98390)', 4326);
SELECT @h

output
———————————————-
0xE6100000010C6744696FF07D4340EC51B81E855753C0

As you can see it is some binary data. This data is using the World Geodetic System 1984 (WGS 84)

To see if this is supported in your database you can run this query

T-SQL
1
SELECT * FROM sys.spatial_reference_systems
SELECT * FROM sys.spatial_reference_systems

And yes in my database it has a spatial_reference_id of 4326

T-SQL
1
2
SELECT * FROM sys.spatial_reference_systems
WHERE spatial_reference_id = 4326
SELECT * FROM sys.spatial_reference_systems
WHERE spatial_reference_id = 4326

Here is the meta data

GEOGCS["WGS 84", DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84", 6378137, 298.257223563]], PRIMEM["Greenwich", 0],
UNIT["Degree", 0.0174532925199433]]

Back to our code, in order to have this data in our table

SET @h = geography::STGeomFromText(‘POINT(-77.36750 38.98390)’, 4326);

We need to do some things, first we update our temp column

T-SQL
1
2
3
UPDATE zipcodes 
SET GeogColTemp= 'POINT(' + convert(varchar(100),longitude) 
+' ' +  convert(varchar(100),latitude) +')'
UPDATE zipcodes 
SET GeogColTemp= 'POINT(' + convert(varchar(100),longitude) 
+' ' +  convert(varchar(100),latitude) +')'

Now we can update out geography column

T-SQL
1
2
UPDATE zipcodes 
SET GeogCol1 =  geography::STGeomFromText(GeogColTemp,4326)
UPDATE zipcodes 
SET GeogCol1 =  geography::STGeomFromText(GeogColTemp,4326)

We can drop the temp column now

T-SQL
1
ALTER TABLE zipcodes DROP COLUMN GeogColTemp
ALTER TABLE zipcodes DROP COLUMN GeogColTemp

Now we have to add a primary key, this is needed because otherwise we won’t be able to create our spatial index and the following message would be displayed

Server: Msg 12008, Level 16, State 1, Line 1
Table ‘zipcodes’ does not have a clustered primary key as required by the spatial index. Make sure that the primary key column exists on the table before creating a spatial index.

T-SQL
1
2
3
4
5
6
ALTER TABLE zipcodes ADD 
    CONSTRAINT [PK_ZipCode] PRIMARY KEY  CLUSTERED 
    (
        Zipcode,
        Longitude
    ) WITH  FILLFACTOR = 100 
ALTER TABLE zipcodes ADD 
	CONSTRAINT [PK_ZipCode] PRIMARY KEY  CLUSTERED 
	(
		Zipcode,
		Longitude
	) WITH  FILLFACTOR = 100 

Create the spatial index

T-SQL
1
2
CREATE SPATIAL INDEX SIndx_SpatialTable_geography_col1 
   ON zipcodes(GeogCol1);
CREATE SPATIAL INDEX SIndx_SpatialTable_geography_col1 
   ON zipcodes(GeogCol1);

first I will show you an example to calculate the distance that you can execute

T-SQL
1
2
3
4
5
DECLARE @g geography;
DECLARE @h geography;
SET @h = geography::STGeomFromText('POINT(-77.36750 38.98390)', 4326);
SET @g = geography::STGeomFromText('POINT(-77.36160 38.85570)', 4326);
SELECT @g.STDistance(@h)/1609.344;
DECLARE @g geography;
DECLARE @h geography;
SET @h = geography::STGeomFromText('POINT(-77.36750 38.98390)', 4326);
SET @g = geography::STGeomFromText('POINT(-77.36160 38.85570)', 4326);
SELECT @g.STDistance(@h)/1609.344;

as you can see the distance is 8.8490611480890067

Now I want to see all the zipcode which are within 20 miles of zipcode 10028 (yes I used to live there)

Here is a way that will take a long time since it is not sargable, this will take about 2 seconds

T-SQL
1
2
3
4
5
6
SELECT h.* 
FROM zipcodes g 
JOIN zipcodes h on g.zipcode <> h.zipcode
AND g.zipcode = '10028'
AND h.zipcode <> '10028'
WHERE g.GeogCol1.STDistance(h.GeogCol1)/1609.344 <= 20
SELECT h.* 
FROM zipcodes g 
JOIN zipcodes h on g.zipcode <> h.zipcode
AND g.zipcode = '10028'
AND h.zipcode <> '10028'
WHERE g.GeogCol1.STDistance(h.GeogCol1)/1609.344 <= 20

Now we all know functions on the left side of the operator are bad, here is how this is optimized, we switch the calculation to the right side of the = sign

T-SQL
1
2
3
4
5
6
SELECT h.* 
FROM zipcodes g 
JOIN zipcodes h on g.zipcode <> h.zipcode
AND g.zipcode = '10028'
AND h.zipcode <> '10028'
WHERE g.GeogCol1.STDistance(h.GeogCol1)<=(20 * 1609.344)
SELECT h.* 
FROM zipcodes g 
JOIN zipcodes h on g.zipcode <> h.zipcode
AND g.zipcode = '10028'
AND h.zipcode <> '10028'
WHERE g.GeogCol1.STDistance(h.GeogCol1)<=(20 * 1609.344)

that ran in between 15 and 60 milliseconds

To find everything between 10 and 20 miles you can use this

T-SQL
1
2
3
4
5
6
7
SELECT h.* 
FROM zipcodes g 
JOIN zipcodes h on g.zipcode <> h.zipcode
AND g.zipcode = '10028'
AND h.zipcode <> '10028'
WHERE g.GeogCol1.STDistance(h.GeogCol1)<=(20 * 1609.344)
AND g.GeogCol1.STDistance(h.GeogCol1)>= (10 * 1609.344)
SELECT h.* 
FROM zipcodes g 
JOIN zipcodes h on g.zipcode <> h.zipcode
AND g.zipcode = '10028'
AND h.zipcode <> '10028'
WHERE g.GeogCol1.STDistance(h.GeogCol1)<=(20 * 1609.344)
AND g.GeogCol1.STDistance(h.GeogCol1)>= (10 * 1609.344)

As you can see doing stuff like this on SQL Server 2008 is fairly easy because of the geograpy data type

*** If you have a SQL related question try our Microsoft SQL Server Programming forum or our Microsoft SQL Server Admin forum