Login or Sign Up to become a member!
LessThanDot Sit Logo

LessThanDot

Data Management

Less Than Dot is a community of passionate IT professionals and enthusiasts dedicated to sharing technical knowledge, experience, and assistance. Inside you will find reference materials, interesting technical discussions, and expert tips and commentary. Once you register for an account you will have immediate access to the forums and all past articles and commentaries.

LTD Social Sitings

Lessthandot twitter Lessthandot Linkedin Lessthandot friendfeed Lessthandot facebook Lessthandot rss

Note: Watch for social icons on posts by your favorite authors to follow their postings on these and other social sites.

Your profile

    Search

    XML Feeds

    Google Ads

    « How to search for all words inclusive without using Full Text searchSQL Server Decode/Encode unicode data »
    comments

    In this blog post I will show you a couple of ways of getting the top 2 values from a set in SQL Server. The dense_rank queries will not work if you have SQL server 2000 but the other ones should.

    Let's say you have a table that has this data
    100
    100
    100
    99
    99
    95
    95
    90

    You want to get the 2 highest amounts in that table the values 100 and 99, how would you do this?
    Let's take a look at some posibillities, first create this table and populate it with data

    1. create table TestTies (id int identity,SomeValue tinyint)
    2.  
    3. insert TestTies values(100)
    4. insert TestTies values(100)
    5. insert TestTies values(100)
    6. insert TestTies values(99)
    7. insert TestTies values(99)
    8.  
    9. insert TestTies values(95)
    10. insert TestTies values(95)
    11. insert TestTies values(90)

    Top 2 obviously will not work

    1. select top 2 id, SomeValue
    2. from TestTies
    3. order by SomeValue desc
    output
    -----------------
    id	SomeValue
    1	100
    2	100

    You also cannot use WITH TIES because that just brings the value 100

    1. select top 2  WITH TIES id, SomeValue
    2. from TestTies
    3. order by SomeValue desc
    Output
    -----------------
    id	SomeValue
    1	100
    2	100
    3	100

    Here are a couple of ways that will work

    For all the queries below the ouput will be this

    Output
    ------------------
    id	SomeValue
    1	100
    2	100
    3	100
    4	99
    5	99

    The first one is by using the DENSE_RANK() function. The queries below are functionally identical, one is using a Common Table Expression while the other one is using a subquery

    1. --query 1
    2. with rankings as (
    3. select *,DENSE_RANK() OVER ( ORDER BY SomeValue desc)  as Rank
    4. from TestTies)
    5.  
    6. select id, SomeValue from rankings
    7. where Rank <=2
    1. --query 2
    2. select id, SomeValue from   (
    3. select *,DENSE_RANK() OVER ( ORDER BY SomeValue desc)  as Rank
    4. from TestTies) x
    5. where Rank <=2

    We can also use the MAX function twice like in the query below

    1. --query 3
    2. select *
    3. from TestTies
    4. where SomeValue >= (select max(SomeValue)
    5.             from TestTies
    6.             where SomeValue < (select max(SomeValue)
    7.             from TestTies))

    Another option is to use distinct top 2 in a sub query

    1. --query 4
    2. select *
    3. from TestTies
    4. where SomeValue in(
    5. select distinct top 2  SomeValue
    6. from TestTies
    7. order by SomeValue desc)

    Finally in query 5 we do a running count, as you can see that looks complicated

    1. --query 5
    2. select l.id, l.SomeValue
    3. from(select v.SomeValue, v.id,
    4.     Ranking =       (select count(distinct SomeValue)
    5.             from TestTies a
    6.             where v.SomeValue <= a.SomeValue)
    7.     from TestTies v) l
    8. where l.Ranking <=2
    9. order by l.Ranking

    So how do these queries perform in regards to each other?

    Hit CTRL + K, select all the code in the code block below and hit F5/execute

    1. --query 1
    2. with rankings as (
    3. select *,DENSE_RANK() OVER ( ORDER BY SomeValue desc)  as Rank
    4. from TestTies)
    5.  
    6. select id, SomeValue from rankings
    7. where Rank <=2
    8.  
    9. --query 2
    10. select id, SomeValue from   (
    11. select *,DENSE_RANK() OVER ( ORDER BY SomeValue desc)  as Rank
    12. from TestTies) x
    13. where Rank <=2
    14.  
    15.  
    16.  
    17. --query 3
    18. select *
    19. from TestTies
    20. where SomeValue >= (select max(SomeValue)
    21.             from TestTies
    22.             where SomeValue < (select max(SomeValue)
    23.             from TestTies))
    24.  
    25.  
    26. --query 4
    27. select *
    28. from TestTies
    29. where SomeValue in(
    30. select distinct top 2  SomeValue
    31. from TestTies
    32. order by SomeValue desc)
    33.  
    34.  
    35.  
    36. --query 5
    37. select l.id, l.SomeValue
    38. from(select v.SomeValue, v.id,
    39.     Ranking =       (select count(distinct SomeValue)
    40.             from TestTies a
    41.             where v.SomeValue <= a.SomeValue)
    42.     from TestTies v) l
    43. where l.Ranking <=2
    44. order by l.Ranking

    Here is the result

    query 1 9.89% (dense_rank CTE)
    query 2 9.89% (dense_rank sub query)
    query 3 6.70% (max twice)
    query 4 19.41% (distinct sub query)
    query 5 54.11% (running count)

    Wow, query 5 running count is slower than the other 4 combined, this was expected of course. It is also interesting that dense_rank is not as efficient as using max twice
    Let's do some more testing, we will create a non clustered index on the SomeValue column

    1. create index ix_SomeValue on TestTies(SomeValue desc)

    Run the 5 queries again

    query 1 13.58% (dense_rank CTE)
    query 2 13.58% dense_rank sub query)
    query 3 24.03% (max twice)
    query 4 13.95% (distinct sub query)
    query 5 34.32% (running count)

    As you can see now dense_rank is fastest. Let's make that non clustered index a clustered index and look at the plans again.

    1. drop index  TestTies.ix_SomeValue
    2.  
    3. create clustered index ix_SomeValue on TestTies(SomeValue desc)

    Below are the results of running those queries again

    query 1 7.78% (dense_rank CTE)
    query 2 7.78% dense_rank sub query)
    query 3 23.32% (max twice)
    query 4 15.95% (distinct sub query)
    query 5 45.16% (running count)

    As you can see when we have an index on the column then dense_rank is the fastest out of all. Feel free to load up some more data into the table and experiment with these queries. If you know of another way to accomplish this feel free to post a comment with your query.




    *** If you have a SQL related question try our Microsoft SQL Server Programming forum or our Microsoft SQL Server Admin forum

    About the Author

    User bio imageDenis has been working with SQL Server since version 6.5. Although he worked as an ASP/JSP/ColdFusion developer before the dot com bust, he has been working exclusively as a database developer/architect since 2002. In addition to English, Denis is also fluent in Croatian and Dutch, but he can curse in many other languages and dialects (just ask the SQL optimizer) He lives in Princeton, NJ with his wife and three kids.
    Social SitingsTwitterFacebookLinkedInHomePageFlickrLTD RSS Feed
    2008 views
    Instapaper

    4 comments

    Comment from: Naomi Nosonovsky [Member]
    *****
    Naomi Nosonovsky Interesting info, thanks.
    07/08/09 @ 15:55
    Comment from: niikola [Member] Email
    ****-
    niikola It would be interesting to see cpu time from execution as Select top 5 returns exactly the same query cost as dense ranks. Unfortunately for all the queries cpu time is 0
    07/09/09 @ 02:32
    Comment from: SQLDenis [Member] Email
    *****
    SQLDenis Niikola, you wouldn't be able to use top 5 because in reality you wouldn't know if the top 2 distinct would be top 2, top 5, top 10 etc

    Filling up the table with 10000 values or so might give different plans, also statistics io and time would be a better metric to check against, I leave that exercise up to the reader :-)
    07/09/09 @ 04:34
    Comment from: Raj [Visitor] Email · http://netraju.blogspot.com
    Raj Good article Denis, Nice to see the comparison results.
    09/10/11 @ 01:20

    Leave a comment


    Your email address will not be revealed on this site.

    Your URL will be displayed.
    (Line breaks become <br />)
    (Name, email & website)
    (Allow users to contact you through a message form (your email will not be revealed.)