LessThanDot

In my Are you ready for SQL Server 2012 or are you still partying like it is 1999? post, I wrote about how you should start using SQL Server 2005 and SQL Server 2008 functionality now in order to prepare for SQL Server 2012. I still see tons of code that is written in the pre 2005 style and people still keep using those functions, procs and statements even though SQL Server 2005 and 2008 have much better functionality.

In today’s post I will cover Windowing functions. SQL Server 2005 introduced 4 windowing functions, these functions are ROW_NUMBER(), RANK(), DENSE_RANK() and NTILE().

The functions make it very easy to add a number to a resultset or do paging without having to use TOP and SET ROWCOUNT

In the 2000 days you would have to use either a running count or you would insert into a temporary table with an identity column and select from that table ordered by the identity column.

In order to run all the examples we need to create a simple table first.

IF OBJECT_ID('tempdb..#Rankings') IS NOT NULL
DROP TABLE #Rankings
		   
		   
   CREATE TABLE #Rankings (Value CHAR(1))
    INSERT INTO #Rankings
    SELECT 'A' UNION ALL
    SELECT 'B' UNION ALL
    SELECT 'C' UNION ALL
    SELECT 'D' UNION ALL
    SELECT 'E' UNION ALL
    SELECT 'F'

ROW_NUMBER

The ROW_NUMBER function returns the sequential number of a row within a partition of a result set, starting at 1 for the first row in each partition. The syntax might look complicated the first time you use it but it is really not

The query below will simply add a number to the resultset

SELECT ROW_NUMBER() OVER ( ORDER BY Value) AS Row,*
FROM #Rankings
ORDER BY Row

Row                  Value
-------------------- -----
1                    A
2                    B
3                    C
4                    D
5                    E
6                    F

Let’s add some more rows so that the table is not unique anymore

INSERT INTO #Rankings
    SELECT 'A' UNION ALL
    SELECT 'B' UNION ALL
    SELECT 'C'

When we run the same query again, you can see that it is ordered by Value and the Row column matches that ordering. You can also see that bot rows with the value A have a different row number

SELECT ROW_NUMBER() OVER ( ORDER BY Value) AS Row,*
FROM #Rankings
ORDER BY Row

Row                  Value
-------------------- -----
1                    A
2                    A
3                    B
4                    B
5                    C
6                    C
7                    D
8                    E
9                    F

You can add more than one ranking function in the same query, in the query below we will add two row numbers, one will be sorted ascending and one will be sorted descending

SELECT ROW_NUMBER() OVER ( ORDER BY Value) AS Row,
	   ROW_NUMBER() OVER ( ORDER BY Value DESC) AS RowDesc,*
FROM #Rankings
ORDER BY Row

Row                  RowDesc              Value
-------------------- -------------------- -----
1                    8                    A
2                    9                    A
3                    6                    B
4                    7                    B
5                    4                    C
6                    5                    C
7                    3                    D
8                    2                    E
9                    1                    F

As you can see Row is ordered ascending and RowDesc is ordered descending

RANK

The RANK function is similar to ROW_NUMBER but it will keep the same number for the same value, it also skips numbers when the value changes

SELECT RANK() OVER ( ORDER BY Value) AS Row,*
FROM #Rankings
ORDER BY Row

Row                  Value
-------------------- -----
1                    A
1                    A
3                    B
3                    B
5                    C
5                    C
7                    D
8                    E
9                    F

Now the rows with the value A have the same exact number, but did you notice that it skips numbers? What if you don’t want to skip numbers? You can use DENSE_RANK for that

DENSE_RANK

The DENSE_RANK function is similar to the RANK function, the difference is that numbers are not skipped when the value changes

SELECT DENSE_RANK() OVER ( ORDER BY Value) AS Row,*
FROM #Rankings
ORDER BY Row

Row                  Value
-------------------- -----
1                    A
1                    A
2                    B
2                    B
3                    C
3                    C
4                    D
5                    E
6                    F

More examples

Now let’s take a look at some more examples. First drop the table we created earlier and create this table instead

IF OBJECT_ID('tempdb..#Rankings') IS NOT NULL
		   DROP TABLE #Rankings
		   GO
CREATE TABLE #Rankings (Value CHAR(1), SomeValue int)

    INSERT INTO #Rankings
    SELECT 'A',1 UNION ALL
    SELECT 'A',2 UNION ALL
    SELECT 'A',3 UNION ALL
    SELECT 'B',2 UNION ALL
    SELECT 'C',3 UNION ALL
    SELECT 'C',3 UNION ALL
    SELECT 'C',9 UNION ALL
    SELECT 'D',4 UNION ALL
    SELECT 'E',5 UNION ALL
    SELECT 'F',6

Now let’s run our simple ROW_NUMBER query again

SELECT ROW_NUMBER() OVER ( ORDER BY Value) AS Row,*
FROM #Rankings
ORDER BY Row

Row                  Value SomeValue
-------------------- ----- -----------
1                    A     1
2                    A     2
3                    A     3
4                    B     2
5                    C     3
6                    C     3
7                    C     9
8                    D     4
9                    E     5
10                   F     6

So, nothing special, what if you want the number start over when the value column changes. This would be a sample output

A 1

A 2

A 3

B 1

C 1

C 2

In order to do that, you can use partition, if you partition by Value, when the Value changes the number will start from 1 again. Here is what the query looks like

SELECT ROW_NUMBER() OVER ( PARTITION BY Value ORDER BY SomeValue DESC) AS Row,*
FROM #Rankings
ORDER BY Row

Row                  Value SomeValue
-------------------- ----- -----------
1                    A     3
2                    A     2
3                    A     1
1                    B     2
1                    C     9
2                    C     3
3                    C     3
1                    D     4
1                    E     5
1                    F     6

How about only returning the distinct value in the Value column together with the max value of the SomeValue column? In order to do that, you can use a Common Table Expression (see SQL Advent 2011 Day 5: Common Table Expressions for some information about Common Table Expressions), you partition by Value and then you select all the rows that are 1

;WITH cte AS( SELECT ROW_NUMBER() OVER ( PARTITION BY Value ORDER BY SomeValue DESC) AS Row,*
FROM #Rankings)

SELECT * FROM cte WHERE row = 1
ORDER By Value

Row                  Value SomeValue
-------------------- ----- -----------
1                    A     3
1                    B     2
1                    C     9
1                    D     4
1                    E     5
1                    F     6

That was pretty simple, now for a more complex example

This example is from Itzik Ben-Gan, I am not sure if I saw it in a forum or news group, it is really cool if you think about how the code works

First create this table

CREATE TABLE #Test (ID INT IDENTITY not null, SomeDate DATETIME, CrewID CHAR(2))
 
INSERT #Test VALUES('6/1/2008', '1')
INSERT #Test VALUES('6/2/2008', '1')
INSERT #Test VALUES('6/2/2008', '1')
INSERT #Test VALUES('6/3/2008', '1')
INSERT #Test VALUES('6/4/2008', '1')
INSERT #Test VALUES('6/6/2008', '1')
INSERT #Test VALUES('6/7/2008', '1')
INSERT #Test VALUES('6/10/2008', '1')
INSERT #Test VALUES('6/10/2008', '1')
INSERT #Test VALUES('6/29/2008', '1')
INSERT #Test VALUES('6/30/2008', '1')
 
INSERT #Test VALUES('7/1/2008', '2')
INSERT #Test VALUES('7/2/2008', '2')
INSERT #Test VALUES('7/3/2008', '2')
INSERT #Test VALUES('7/3/2008', '2')
INSERT #Test VALUES('7/4/2008', '2')
INSERT #Test VALUES('7/6/2008', '2')
INSERT #Test VALUES('7/7/2008', '2')
INSERT #Test VALUES('7/10/2008', '2')
INSERT #Test VALUES('7/10/2008', '2')
INSERT #Test VALUES('7/20/2008', '2')
INSERT #Test VALUES('7/21/2008', '2')
INSERT #Test VALUES('7/29/2008', '2')

Here is the desired output

CrewID startdate  enddate
------ ---------- ----------
1      2008-06-01 2008-06-04
1      2008-06-06 2008-06-07
1      2008-06-10 2008-06-10
1      2008-06-29 2008-06-30
2      2008-07-01 2008-07-04
2      2008-07-06 2008-07-07
2      2008-07-10 2008-07-10
2      2008-07-20 2008-07-21
2      2008-07-29 2008-07-29

As you can see if the cleaning crew cleaned for more than one continuous day then use the min and max dates from the range. If the same day shows up more than once then use it only ones. when a break occurs, start with another day

And here is the solution

;WITH c AS
(
  SELECT CrewID, SomeDate,
    DATEADD(DAY, -1*DENSE_RANK() OVER (PARTITION BY CrewID ORDER BY SomeDate),
SomeDate) AS Grouped
  FROM #Test
)
SELECT CrewID, MIN(SomeDate) AS startdate, MAX(SomeDate) AS enddate
FROM c
GROUP BY CrewID, Grouped
ORDER BY CrewID,startdate;

That is all for today, come back tomorrow for some more goodies

SQL Advent 2011 Day 6: Windowing functions

ROW_NUMBER

RANK

DENSE_RANK

More examples

About SQLDenis