In my Are you ready for SQL Server 2012 or are you still partying like it is 1999? post, I wrote about how you should start using SQL Server 2005 and SQL Server 2008 functionality now in order to prepare for SQL Server 2012. I still see tons of code that is written in the pre 2005 style and people still keep using those functions, procs and statements even though SQL Server 2005 and 2008 have much better functionality.
In today’s post I will cover Windowing functions. SQL Server 2005 introduced 4 windowing functions, these functions are ROW_NUMBER(), RANK(), DENSE_RANK() and NTILE().
The functions make it very easy to add a number to a resultset or do paging without having to use TOP and SET ROWCOUNT
In the 2000 days you would have to use either a running count or you would insert into a temporary table with an identity column and select from that table ordered by the identity column.
In order to run all the examples we need to create a simple table first.
IF OBJECT_ID('tempdb..#Rankings') IS NOT NULL
DROP TABLE #Rankings
CREATE TABLE #Rankings (Value CHAR(1))
INSERT INTO #Rankings
SELECT 'A' UNION ALL
SELECT 'B' UNION ALL
SELECT 'C' UNION ALL
SELECT 'D' UNION ALL
SELECT 'E' UNION ALL
SELECT 'F'
ROW_NUMBER
The ROW_NUMBER function returns the sequential number of a row within a partition of a result set, starting at 1 for the first row in each partition. The syntax might look complicated the first time you use it but it is really not
The query below will simply add a number to the resultset
SELECT ROW_NUMBER() OVER ( ORDER BY Value) AS Row,*
FROM #Rankings
ORDER BY Row
Row Value -------------------- ----- 1 A 2 B 3 C 4 D 5 E 6 F
Let’s add some more rows so that the table is not unique anymore
INSERT INTO #Rankings
SELECT 'A' UNION ALL
SELECT 'B' UNION ALL
SELECT 'C'
When we run the same query again, you can see that it is ordered by Value and the Row column matches that ordering. You can also see that bot rows with the value A have a different row number
SELECT ROW_NUMBER() OVER ( ORDER BY Value) AS Row,*
FROM #Rankings
ORDER BY Row
Row Value -------------------- ----- 1 A 2 A 3 B 4 B 5 C 6 C 7 D 8 E 9 F
You can add more than one ranking function in the same query, in the query below we will add two row numbers, one will be sorted ascending and one will be sorted descending
SELECT ROW_NUMBER() OVER ( ORDER BY Value) AS Row,
ROW_NUMBER() OVER ( ORDER BY Value DESC) AS RowDesc,*
FROM #Rankings
ORDER BY Row
Row RowDesc Value -------------------- -------------------- ----- 1 8 A 2 9 A 3 6 B 4 7 B 5 4 C 6 5 C 7 3 D 8 2 E 9 1 F
As you can see Row is ordered ascending and RowDesc is ordered descending
RANK
The RANK function is similar to ROW_NUMBER but it will keep the same number for the same value, it also skips numbers when the value changes
SELECT RANK() OVER ( ORDER BY Value) AS Row,*
FROM #Rankings
ORDER BY Row
Row Value -------------------- ----- 1 A 1 A 3 B 3 B 5 C 5 C 7 D 8 E 9 F
Now the rows with the value A have the same exact number, but did you notice that it skips numbers? What if you don’t want to skip numbers? You can use DENSE_RANK for that
DENSE_RANK
The DENSE_RANK function is similar to the RANK function, the difference is that numbers are not skipped when the value changes
SELECT DENSE_RANK() OVER ( ORDER BY Value) AS Row,*
FROM #Rankings
ORDER BY Row
Row Value -------------------- ----- 1 A 1 A 2 B 2 B 3 C 3 C 4 D 5 E 6 F
More examples
Now let’s take a look at some more examples. First drop the table we created earlier and create this table instead
IF OBJECT_ID('tempdb..#Rankings') IS NOT NULL
DROP TABLE #Rankings
GO
CREATE TABLE #Rankings (Value CHAR(1), SomeValue int)
INSERT INTO #Rankings
SELECT 'A',1 UNION ALL
SELECT 'A',2 UNION ALL
SELECT 'A',3 UNION ALL
SELECT 'B',2 UNION ALL
SELECT 'C',3 UNION ALL
SELECT 'C',3 UNION ALL
SELECT 'C',9 UNION ALL
SELECT 'D',4 UNION ALL
SELECT 'E',5 UNION ALL
SELECT 'F',6
Now let’s run our simple ROW_NUMBER query again
SELECT ROW_NUMBER() OVER ( ORDER BY Value) AS Row,*
FROM #Rankings
ORDER BY Row
Row Value SomeValue -------------------- ----- ----------- 1 A 1 2 A 2 3 A 3 4 B 2 5 C 3 6 C 3 7 C 9 8 D 4 9 E 5 10 F 6
So, nothing special, what if you want the number start over when the value column changes. This would be a sample output
A 1
A 2
A 3
B 1
C 1
C 2
In order to do that, you can use partition, if you partition by Value, when the Value changes the number will start from 1 again. Here is what the query looks like
SELECT ROW_NUMBER() OVER ( PARTITION BY Value ORDER BY SomeValue DESC) AS Row,*
FROM #Rankings
ORDER BY Row
Row Value SomeValue -------------------- ----- ----------- 1 A 3 2 A 2 3 A 1 1 B 2 1 C 9 2 C 3 3 C 3 1 D 4 1 E 5 1 F 6
How about only returning the distinct value in the Value column together with the max value of the SomeValue column? In order to do that, you can use a Common Table Expression (see SQL Advent 2011 Day 5: Common Table Expressions for some information about Common Table Expressions), you partition by Value and then you select all the rows that are 1
;WITH cte AS( SELECT ROW_NUMBER() OVER ( PARTITION BY Value ORDER BY SomeValue DESC) AS Row,*
FROM #Rankings)
SELECT * FROM cte WHERE row = 1
ORDER By Value
Row Value SomeValue -------------------- ----- ----------- 1 A 3 1 B 2 1 C 9 1 D 4 1 E 5 1 F 6
That was pretty simple, now for a more complex example
This example is from Itzik Ben-Gan, I am not sure if I saw it in a forum or news group, it is really cool if you think about how the code works
First create this table
CREATE TABLE #Test (ID INT IDENTITY not null, SomeDate DATETIME, CrewID CHAR(2))
INSERT #Test VALUES('6/1/2008', '1')
INSERT #Test VALUES('6/2/2008', '1')
INSERT #Test VALUES('6/2/2008', '1')
INSERT #Test VALUES('6/3/2008', '1')
INSERT #Test VALUES('6/4/2008', '1')
INSERT #Test VALUES('6/6/2008', '1')
INSERT #Test VALUES('6/7/2008', '1')
INSERT #Test VALUES('6/10/2008', '1')
INSERT #Test VALUES('6/10/2008', '1')
INSERT #Test VALUES('6/29/2008', '1')
INSERT #Test VALUES('6/30/2008', '1')
INSERT #Test VALUES('7/1/2008', '2')
INSERT #Test VALUES('7/2/2008', '2')
INSERT #Test VALUES('7/3/2008', '2')
INSERT #Test VALUES('7/3/2008', '2')
INSERT #Test VALUES('7/4/2008', '2')
INSERT #Test VALUES('7/6/2008', '2')
INSERT #Test VALUES('7/7/2008', '2')
INSERT #Test VALUES('7/10/2008', '2')
INSERT #Test VALUES('7/10/2008', '2')
INSERT #Test VALUES('7/20/2008', '2')
INSERT #Test VALUES('7/21/2008', '2')
INSERT #Test VALUES('7/29/2008', '2')
Here is the desired output
CrewID startdate enddate ------ ---------- ---------- 1 2008-06-01 2008-06-04 1 2008-06-06 2008-06-07 1 2008-06-10 2008-06-10 1 2008-06-29 2008-06-30 2 2008-07-01 2008-07-04 2 2008-07-06 2008-07-07 2 2008-07-10 2008-07-10 2 2008-07-20 2008-07-21 2 2008-07-29 2008-07-29
As you can see if the cleaning crew cleaned for more than one continuous day then use the min and max dates from the range. If the same day shows up more than once then use it only ones. when a break occurs, start with another day
And here is the solution
;WITH c AS
(
SELECT CrewID, SomeDate,
DATEADD(DAY, -1*DENSE_RANK() OVER (PARTITION BY CrewID ORDER BY SomeDate),
SomeDate) AS Grouped
FROM #Test
)
SELECT CrewID, MIN(SomeDate) AS startdate, MAX(SomeDate) AS enddate
FROM c
GROUP BY CrewID, Grouped
ORDER BY CrewID,startdate;
That is all for today, come back tomorrow for some more goodies