Login or Sign Up to become a member!
LessThanDot Sit Logo

LessThanDot

Data Management

Less Than Dot is a community of passionate IT professionals and enthusiasts dedicated to sharing technical knowledge, experience, and assistance. Inside you will find reference materials, interesting technical discussions, and expert tips and commentary. Once you register for an account you will have immediate access to the forums and all past articles and commentaries.

LTD Social Sitings

Lessthandot twitter Lessthandot Linkedin Lessthandot friendfeed Lessthandot facebook Lessthandot rss

Note: Watch for social icons on posts by your favorite authors to follow their postings on these and other social sites.

Your profile

Authors

Search

XML Feeds

Google Ads

« How Do You Check If A Temporary Table Exists In SQL Server.NET Framework 3.5 SP1: LINQ perf improvements (LINQ to Objects and LINQ to SQL) »
comments
Rate Post:
submit to reddit Digg!FacebookDotnetkicks

Take a look at this query.

  1. SELECT * FROM
  2.  
  3. (
  4.  
  5. SELECT customer_id, 'MTD' AS record_type, COUNT(*), SUM(...), AVG(...)
  6.  
  7. FROM payment_table
  8.  
  9.  
  10. WHERE YEAR(payment_dt) = YEAR(GETDATE())
  11.  
  12. and MONTH(payment_dt) = MONTH(GETDATE())
  13.  
  14. GROUP BY customer_id) MTD_payments
  15.  
  16. UNION ALL
  17.  
  18. (
  19.  
  20. SELECT customer_id, 'YTD' AS record_type, COUNT(*), SUM(...), AVG(...)
  21.  
  22. FROM payment_table
  23.  
  24. WHERE
  25.  
  26. WHERE YEAR(payment_dt) = YEAR(GETDATE())
  27.  
  28. GROUP BY customer_id) YTD_payments
  29.  
  30. UNION ALL
  31.  
  32. (
  33.  
  34. SELECT customer_id, 'LTD' AS record_type, COUNT(*), SUM(...), AVG(...)
  35.  
  36. FROM payment_table) LTD_payments
  37.  
  38. ) payments_report
  39.  
  40. ORDER BY customer_id, record_type

Can you see the problem?
A person had this query, it would run for over 24 hours. Wow, that is pretty bad, I don't think I had ever written something that ran over an hour, and the ones I did were mostly defragmentation and update statistics jobs.

The problem is that the following piece of code

  1. WHERE YEAR(payment_dt) = YEAR(GETDATE())
  2. and MONTH(payment_dt) = MONTH(GETDATE())

is not sargable. First what does it mean to be sargable? A query is said to be sargable if the DBMS engine can take advantage of an index to speed up the execution of the query (using index seeks, not covering indexes). The term is derived from a contraction of Search ARGument Able.

This query is not sargable because there is a function on the column, whenever you use a function on the column you will not get an index seek but an index scan. The difference between an index seek and an index scan can be explained like this: when searching for something in a book, you go to the index in the back find the page number and go to the page, that is an index seek. When looking for something in a book you go from page one until the last page, read all the words on all the ages and get what you need, that was an index scan. Do you see how much more expensive in terms of performance that was?

Let's get back to the query, what can we do to make this piece of code use an index seek?

  1. WHERE YEAR(payment_dt) = YEAR(GETDATE())
  2. and MONTH(payment_dt) = MONTH(GETDATE())

You would change it to this:

  1. WHERE payment_dt >= DATEADD(mm, DATEDIFF(mm, 0, GETDATE())+0, 0)
  2. and payment_dt < DATEADD(mm, DATEDIFF(mm, 0, GETDATE())+1, 0)

You can see the complete question on the MSDN forum site here:
http://social.msdn.microsoft.com/Forums/en-US/transactsql/thread/d8464d33-38cc-48ba-bd15-5e7d14eeb19c

The Person said that his query went from over 24 hours to 36 seconds. Wow!! That is very significant. hardware cannot help you out if you have bad queries like that.

The same exact day I answered a very similar question, take a look here: http://social.msdn.microsoft.com/Forums/en-US/transactsql/thread/6107ec5c-b671-485f-941e-4efdf3d2fc66

The person had this

  1. AND DATEDIFF(d, '08/10/2008', DateCreated) >= 0
  2. AND DATEDIFF(d, DateCreated, '08/15/2008') >= 0

I told him to change it to this

  1. AND DateCreated >= '08/10/2008'
  2. and DateCreated <= '08/15/2008'

And that solved that query. If you are interested in some more performance, I have written some Query Optimization items on the LessThanDot Wiki. Below are some direct links

Case Sensitive Search
No Functions on Left Side of Operator
Query Optimizations With Dates
Optimization: Set Nocount On
No Math In Where Clause
Don't Use (select *), but List Columns

If you are interested in some blogposts about dates, take a look at these two which I wrote earlier
How Are Dates Stored In SQL Server?
Do You Know How Between Works With Dates?

About the Author

User bio imageDenis has been working with SQL Server since version 6.5. Although he worked as an ASP/JSP/ColdFusion developer before the dot com bust, he has been working exclusively as a database developer/architect since 2002. In addition to English, Denis is also fluent in Croatian and Dutch, but he can curse in many other languages and dialects (just ask the SQL optimizer) He lives in Princeton, NJ with his wife and three kids.
Social SitingsTwitterFacebookLinkedInHomePageLTD RSS Feed
8103 views
submit to reddit Digg!FacebookDotnetkicks

Comments and Feedback

17 comments

Comment from: Adam Zochowski [Visitor] · http://apz.nofate.com
|| what can we do to make this piece
|| of code use an index seek

|| WHERE YEAR(payment_dt) = YEAR(GETDATE())
|| and MONTH(payment_dt) = MONTH(GETDATE())

why not

create index idx_Year on table (year(payment_dt))
create index idx_Month on table (month(payment_dt))

year(getDate()) is evaluted once (since getdate does not change per row, only per transaction), and year(payment_dt) would be evaluated through index?


Why the ugly and hard to grok

WHERE payment_dt >= DATEADD(mm, DATEDIFF(mm, 0, GETDATE())+0, 0)

08/18/08 @ 12:48
Comment from: Venshi [Visitor]
*****
I hadn't thought about this before. Is this true for all functions? I have a few statements with COALESCE on both sides but haven't noticed any issues. I'll have to see if I can improve on them.
08/18/08 @ 13:40
Comment from: SQLDenis [Member] Email
Adam, because the more indexes you add, the slower your inserts and updates will be
08/18/08 @ 14:15
Comment from: SQLDenis [Member] Email
Venshi,

Look at the execution plan to see if you get a seek or a scan

where coalesce(column,0) = 0

should result in a scan
08/18/08 @ 14:17
Comment from: Kenny [Visitor]
*****
240000% improvement?
08/18/08 @ 14:39
Comment from: jürgen [Visitor]
****-
the only thing i would add is that the analogy of the book surely is a bit misleading, because reading the book from start to the item you are looking for is analogous to a table scan whereas reading the index from beginning to what you are looking for is analogous to an index scan.

the use of functions causes this behaviour, because as in this case to find all the index items where the date has a certain month the db can't use direct access. it's analogous to finding all the index items in a book that contain 'blah'. the solution is to limit the search to a range of dates, ie between two index items that can be accessed directly. i don't think there is a way to convert my example so that the db can directly access index items.
08/18/08 @ 14:59
Comment from: SQLDenis [Member] Email
Jürgen,

It also depends if the index is clustered or not
08/18/08 @ 15:45
Comment from: Mark Brackett [Visitor] Email
While I understand the technicalities, I think it's a pretty sad statement of affairs that after 20 years of development Sql Server (and other RDBMS's) still can't perform the following logic:

1. YEAR(GETDATE()) is deterministic (for a single statement, at least) returning 2008 - so it's effectively a constant.
2. (YEAR(datetime) = 2008) is exactly equal to (datetime BETWEEN '1/1/2008' AND '12/31/2008').

I learned how to "solve for x" in 4th grade, and now I can even buy a $30 calculator that'll do it for me. Is it too much to ask that my $5000 server, and $20000 software do the same?

For a supposed declarative language that touts it's "Query Optimizer", Sql sure is brain dead quite often.
08/18/08 @ 17:21
Comment from: Jerry Hung [Visitor] · http://www.jerryhung.ca
****-
I am surprised the person did not look at the Execution Plan to figure out what's going on

A simple UNION query that runs 24 hours = something is SERIOUSLY wrong

I do agree, SQL should be smarter in Date-handling cases
and YEAR(getdate()) and MONTH(getdate()) are essentially CONSTANT in the above query. The user should save them into a variable at least

I've learned to use BETWEEN if I can, versus Date functions or DATEDIFF/DATEPART
08/19/08 @ 09:20
Comment from: jürgen [Visitor]
****-
it's not the YEAR(GETDATE()) and MONTH(getdate()) that's the problem, but the YEAR(payment_dt) and MONTH(payment_dt).

But you might think that the query optimiser would know that YEAR(payment_dt) = YEAR(GETDATE()) was the same as payment_date BETWEEN '1-Jan-' + YEAR(GETDATE()) and '31-Dec-' + YEAR(GETDATE()) and MONTH(payment_dt) = MONTH(GETDATE()) in combination with YEAR(payment_dt) = YEAR(GETDATE()) is the same as payment_dt BETWEEN '1-' + DATENAME(m,GETDATE()) + '-' + CONVERT(varchar,YEAR(GETDATE())) and DATEADD(d,-1,DATEADD(m,1,'1-' + DATENAME(m,GETDATE()) + '-' + CONVERT(varchar,YEAR(GETDATE())))) (there is probably a nicer way of doing this).
08/19/08 @ 15:04
Comment from: jürgen [Visitor]
****-
i made a mistake with using BETWEEN in the previous comment, it should be:

payment_dt >= '1-' + DATENAME(m,GETDATE()) + '-' + CONVERT(varchar,YEAR(GETDATE())) and payment_dt < DATEADD(m,1,'1-' + DATENAME(m,GETDATE()) + '-' + CONVERT(varchar,YEAR(GETDATE())))

which is pretty much the same as your answer, but not as nice.
08/19/08 @ 15:21
Comment from: Emtucifor [Member] Email
*****
>> Why the ugly and hard to grok

It's not ugly when the performance is so great *without adding any more indexes*! And it's only hard to grok if you're new to it. The concept is quite simple, in fact, much simpler than the code looks.
08/20/08 @ 11:39
Comment from: Steve Campbell [Visitor] · http://blog.perfectapi.com
****-
The title should be "Only in a database is compiler optimization so poor that it cannot figure out a trivial transform".

Really, from a developer perspective, it is *ridiculous* how poor the sql compiler is at making simple optimizations.
08/21/08 @ 11:25
Comment from: chrissie1 [Member] Email
*****
Well Steve I'm sure MS will appreciate the patch you are going to give them ;-).
08/21/08 @ 13:03
Comment from: Brent Ozar [Visitor] · http://www.brentozar.com
*****
I'm chuckling about the angry developer comments above, complaining about why SQL Server can't predict the results on year/getdate. It's a poor musician who blames his instrument....
08/30/08 @ 10:35
Comment from: jbr [Visitor] · http://www.dl3m.com
***--
thanks for posting
09/07/08 @ 15:42
Comment from: Naomi [Member] Email
*****
Great article and very important.
07/14/09 @ 13:25

Leave a comment


Your email address will not be revealed on this site.

Your URL will be displayed.
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)