Login or Sign Up to become a member!
LessThanDot Sit Logo

LessThanDot

Data Management

Less Than Dot is a community of passionate IT professionals and enthusiasts dedicated to sharing technical knowledge, experience, and assistance. Inside you will find reference materials, interesting technical discussions, and expert tips and commentary. Once you register for an account you will have immediate access to the forums and all past articles and commentaries.

LTD Social Sitings

Lessthandot twitter Lessthandot Linkedin Lessthandot friendfeed Lessthandot facebook Lessthandot rss

Note: Watch for social icons on posts by your favorite authors to follow their postings on these and other social sites.

Your profile

Authors

Search

XML Feeds

Google Ads

« How to search for all words inclusive without using Full Text searchSQL Server Decode/Encode unicode data »
comments
Rate Post:
submit to reddit Digg!FacebookDotnetkicks

In this blog post I will show you a couple of ways of getting the top 2 values from a set in SQL Server. The dense_rank queries will not work if you have SQL server 2000 but the other ones should.

Let's say you have a table that has this data
100
100
100
99
99
95
95
90

You want to get the 2 highest amounts in that table the values 100 and 99, how would you do this?
Let's take a look at some posibillities, first create this table and populate it with data

  1. CREATE TABLE TestTies (id INT IDENTITY,SomeValue TINYINT)
  2.  
  3. INSERT TestTies VALUES(100)
  4. INSERT TestTies VALUES(100)
  5. INSERT TestTies VALUES(100)
  6. INSERT TestTies VALUES(99)
  7. INSERT TestTies VALUES(99)
  8.  
  9. INSERT TestTies VALUES(95)
  10. INSERT TestTies VALUES(95)
  11. INSERT TestTies VALUES(90)

Top 2 obviously will not work

  1. SELECT TOP 2 id, SomeValue
  2. FROM TestTies
  3. ORDER BY SomeValue DESC
output
-----------------
id	SomeValue
1	100
2	100

You also cannot use WITH TIES because that just brings the value 100

  1. SELECT TOP 2  WITH TIES id, SomeValue
  2. FROM TestTies
  3. ORDER BY SomeValue DESC
Output
-----------------
id	SomeValue
1	100
2	100
3	100

Here are a couple of ways that will work

For all the queries below the ouput will be this

Output
------------------
id	SomeValue
1	100
2	100
3	100
4	99
5	99

The first one is by using the DENSE_RANK() function. The queries below are functionally identical, one is using a Common Table Expression while the other one is using a subquery

  1. --query 1
  2. WITH rankings AS (
  3. SELECT *,DENSE_RANK() OVER ( ORDER BY SomeValue DESC)  AS Rank
  4. FROM TestTies)
  5.  
  6. SELECT id, SomeValue FROM rankings
  7. WHERE Rank <=2
  1. --query 2
  2. SELECT id, SomeValue FROM   (
  3. SELECT *,DENSE_RANK() OVER ( ORDER BY SomeValue DESC)  AS Rank
  4. FROM TestTies) x
  5. WHERE Rank <=2

We can also use the MAX function twice like in the query below

  1. --query 3
  2. SELECT *
  3. FROM TestTies
  4. WHERE SomeValue >= (SELECT MAX(SomeValue)
  5.             FROM TestTies
  6.             WHERE SomeValue < (SELECT MAX(SomeValue)
  7.             FROM TestTies))

Another option is to use distinct top 2 in a sub query

  1. --query 4
  2. SELECT *
  3. FROM TestTies
  4. WHERE SomeValue in(
  5. SELECT DISTINCT TOP 2  SomeValue
  6. FROM TestTies
  7. ORDER BY SomeValue DESC)

Finally in query 5 we do a running count, as you can see that looks complicated

  1. --query 5
  2. SELECT l.id, l.SomeValue
  3. FROM(SELECT v.SomeValue, v.id,
  4.     Ranking =       (SELECT COUNT(DISTINCT SomeValue)
  5.             FROM TestTies a
  6.             WHERE v.SomeValue <= a.SomeValue)
  7.     FROM TestTies v) l
  8. WHERE l.Ranking <=2
  9. ORDER BY l.Ranking

So how do these queries perform in regards to each other?

Hit CTRL + K, select all the code in the code block below and hit F5/execute

  1. --query 1
  2. WITH rankings AS (
  3. SELECT *,DENSE_RANK() OVER ( ORDER BY SomeValue DESC)  AS Rank
  4. FROM TestTies)
  5.  
  6. SELECT id, SomeValue FROM rankings
  7. WHERE Rank <=2
  8.  
  9. --query 2
  10. SELECT id, SomeValue FROM   (
  11. SELECT *,DENSE_RANK() OVER ( ORDER BY SomeValue DESC)  AS Rank
  12. FROM TestTies) x
  13. WHERE Rank <=2
  14.  
  15.  
  16.  
  17. --query 3
  18. SELECT *
  19. FROM TestTies
  20. WHERE SomeValue >= (SELECT MAX(SomeValue)
  21.             FROM TestTies
  22.             WHERE SomeValue < (SELECT MAX(SomeValue)
  23.             FROM TestTies))
  24.  
  25.  
  26. --query 4
  27. SELECT *
  28. FROM TestTies
  29. WHERE SomeValue in(
  30. SELECT DISTINCT TOP 2  SomeValue
  31. FROM TestTies
  32. ORDER BY SomeValue DESC)
  33.  
  34.  
  35.  
  36. --query 5
  37. SELECT l.id, l.SomeValue
  38. FROM(SELECT v.SomeValue, v.id,
  39.     Ranking =       (SELECT COUNT(DISTINCT SomeValue)
  40.             FROM TestTies a
  41.             WHERE v.SomeValue <= a.SomeValue)
  42.     FROM TestTies v) l
  43. WHERE l.Ranking <=2
  44. ORDER BY l.Ranking

Here is the result

query 1 9.89% (dense_rank CTE)
query 2 9.89% (dense_rank sub query)
query 3 6.70% (max twice)
query 4 19.41% (distinct sub query)
query 5 54.11% (running count)

Wow, query 5 running count is slower than the other 4 combined, this was expected of course. It is also interesting that dense_rank is not as efficient as using max twice
Let's do some more testing, we will create a non clustered index on the SomeValue column

  1. CREATE INDEX ix_SomeValue ON TestTies(SomeValue DESC)

Run the 5 queries again

query 1 13.58% (dense_rank CTE)
query 2 13.58% dense_rank sub query)
query 3 24.03% (max twice)
query 4 13.95% (distinct sub query)
query 5 34.32% (running count)

As you can see now dense_rank is fastest. Let's make that non clustered index a clustered index and look at the plans again.

  1. DROP INDEX  TestTies.ix_SomeValue
  2.  
  3. CREATE CLUSTERED INDEX ix_SomeValue ON TestTies(SomeValue DESC)

Below are the results of running those queries again

query 1 7.78% (dense_rank CTE)
query 2 7.78% dense_rank sub query)
query 3 23.32% (max twice)
query 4 15.95% (distinct sub query)
query 5 45.16% (running count)

As you can see when we have an index on the column then dense_rank is the fastest out of all. Feel free to load up some more data into the table and experiment with these queries. If you know of another way to accomplish this feel free to post a comment with your query.




*** If you have a SQL related question try our Microsoft SQL Server Programming forum or our Microsoft SQL Server Admin forum

About the Author

User bio imageDenis has been working with SQL Server since version 6.5. Although he worked as an ASP/JSP/ColdFusion developer before the dot com bust, he has been working exclusively as a database developer/architect since 2002. In addition to English, Denis is also fluent in Croatian and Dutch, but he can curse in many other languages and dialects (just ask the SQL optimizer) He lives in Princeton, NJ with his wife and three kids.
Social SitingsTwitterFacebookLinkedInHomePageLTD RSS Feed
785 views
submit to reddit Digg!FacebookDotnetkicks

Comments and Feedback

3 comments

Comment from: Naomi [Member] Email
*****
Interesting info, thanks.
07/08/09 @ 15:55
Comment from: niikola [Member] Email
****-
It would be interesting to see cpu time from execution as Select top 5 returns exactly the same query cost as dense ranks. Unfortunately for all the queries cpu time is 0
07/09/09 @ 02:32
Comment from: SQLDenis [Member] Email
*****
Niikola, you wouldn't be able to use top 5 because in reality you wouldn't know if the top 2 distinct would be top 2, top 5, top 10 etc

Filling up the table with 10000 values or so might give different plans, also statistics io and time would be a better metric to check against, I leave that exercise up to the reader :-)
07/09/09 @ 04:34

Leave a comment


Your email address will not be revealed on this site.

Your URL will be displayed.
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)