Take a look at this query.
T-SQL | |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 | select * from ( select customer_id, 'MTD' as record_type, count(*), sum(...), avg(...) from payment_table where year(payment_dt) = year(getDate()) and month(payment_dt) = month(getDate()) group by customer_id) MTD_payments UNION ALL ( select customer_id, 'YTD' as record_type, count(*), sum(...), avg(...) from payment_table where where year(payment_dt) = year(getDate()) group by customer_id) YTD_payments UNION ALL ( select customer_id, 'LTD' as record_type, count(*), sum(...), avg(...) from payment_table) LTD_payments ) payments_report order by customer_id, record_type |
select * from ( select customer_id, 'MTD' as record_type, count(*), sum(...), avg(...) from payment_table where year(payment_dt) = year(getDate()) and month(payment_dt) = month(getDate()) group by customer_id) MTD_payments UNION ALL ( select customer_id, 'YTD' as record_type, count(*), sum(...), avg(...) from payment_table where where year(payment_dt) = year(getDate()) group by customer_id) YTD_payments UNION ALL ( select customer_id, 'LTD' as record_type, count(*), sum(...), avg(...) from payment_table) LTD_payments ) payments_report order by customer_id, record_type
Can you see the problem?
A person had this query, it would run for over 24 hours. Wow, that is pretty bad, I don’t think I had ever written something that ran over an hour, and the ones I did were mostly defragmentation and update statistics jobs.
The problem is that the following piece of code
T-SQL | |
1 2 | where year(payment_dt) = year(getDate()) and month(payment_dt) = month(getDate()) |
where year(payment_dt) = year(getDate()) and month(payment_dt) = month(getDate())
is not sargable. First what does it mean to be sargable? A query is said to be sargable if the DBMS engine can take advantage of an index to speed up the execution of the query (using index seeks, not covering indexes). The term is derived from a contraction of Search ARGument Able.
This query is not sargable because there is a function on the column, whenever you use a function on the column you will not get an index seek but an index scan. The difference between an index seek and an index scan can be explained like this: when searching for something in a book, you go to the index in the back find the page number and go to the page, that is an index seek. When looking for something in a book you go from page one until the last page, read all the words on all the ages and get what you need, that was an index scan. Do you see how much more expensive in terms of performance that was?
Let’s get back to the query, what can we do to make this piece of code use an index seek?
T-SQL | |
1 2 | where year(payment_dt) = year(getDate()) and month(payment_dt) = month(getDate()) |
where year(payment_dt) = year(getDate()) and month(payment_dt) = month(getDate())
You would change it to this:
T-SQL | |
1 2 | where payment_dt >= dateadd(mm, datediff(mm, 0, getdate())+0, 0) and payment_dt < dateadd(mm, datediff(mm, 0, getdate())+1, 0) |
where payment_dt >= dateadd(mm, datediff(mm, 0, getdate())+0, 0) and payment_dt < dateadd(mm, datediff(mm, 0, getdate())+1, 0)
You can see the complete question on the MSDN forum site here:
http://social.msdn.microsoft.com/Forums/en-US/transactsql/thread/d8464d33-38cc-48ba-bd15-5e7d14eeb19c
The Person said that his query went from over 24 hours to 36 seconds. Wow!! That is very significant. hardware cannot help you out if you have bad queries like that.
The same exact day I answered a very similar question, take a look here: http://social.msdn.microsoft.com/Forums/en-US/transactsql/thread/6107ec5c-b671-485f-941e-4efdf3d2fc66
The person had this
T-SQL | |
1 2 | AND DATEDIFF(d, '08/10/2008', DateCreated) >= 0 AND DATEDIFF(d, DateCreated, '08/15/2008') >= 0 |
AND DATEDIFF(d, '08/10/2008', DateCreated) >= 0 AND DATEDIFF(d, DateCreated, '08/15/2008') >= 0
I told him to change it to this
T-SQL | |
1 2 | AND DateCreated >= '08/10/2008' and DateCreated < '08/16/2008' |
AND DateCreated >= '08/10/2008' and DateCreated < '08/16/2008'
And that solved that query. If you are interested in some more performance, I have written some Query Optimization items on the LessThanDot Wiki. Below are some direct links
<a href="http://wiki.lessthandot.com/index.php/Case_Sensitive_Search" title="Case Sensitive Search">Case Sensitive Search</a> <a href="http://wiki.lessthandot.com/index.php/No_Functions_on_Left_Side_of_Operator" title="No Functions on Left Side of Operator">No Functions on Left Side of Operator</a> <a href="http://wiki.lessthandot.com/index.php/Query_Optimizations_With_Dates" title="Query Optimizations With Dates">Query Optimizations With Dates</a> <a href="http://wiki.lessthandot.com/index.php/Optimization:_Set_Nocount_On" title="Optimization: Set Nocount On">Optimization: Set Nocount On</a> <a href="http://wiki.lessthandot.com/index.php/No_Math_In_Where_Clause" title="No Math In Where Clause">No Math In Where Clause</a> <a href="http://wiki.lessthandot.com/index.php/Don%27t_Use_%28select_%2A%29%2C_but_List_Columns" title="Don't Use (select *), but List Columns">Don't Use (select *), but List Columns</a>
If you are interested in some blogposts about dates, take a look at these two which I wrote earlier
How Are Dates Stored In SQL Server?
Do You Know How Between Works With Dates?
20 Comments
|| what can we do to make this piece
|| of code use an index seek
|| WHERE YEAR(payment_dt) = YEAR(GETDATE())
|| and MONTH(payment_dt) = MONTH(GETDATE())
why not
create index idx_Year on table (year(payment_dt))
create index idx_Month on table (month(payment_dt))
year(getDate()) is evaluted once (since getdate does not change per row, only per transaction), and year(payment_dt) would be evaluated through index?
Why the ugly and hard to grok
WHERE payment_dt >= DATEADD(mm, DATEDIFF(mm, 0, GETDATE())+0, 0)
I hadn’t thought about this before. Is this true for all functions? I have a few statements with COALESCE on both sides but haven’t noticed any issues. I’ll have to see if I can improve on them.
Adam, because the more indexes you add, the slower your inserts and updates will be
Venshi,
Look at the execution plan to see if you get a seek or a scan
where coalesce(column,0) = 0
should result in a scan
240000% improvement?
the only thing i would add is that the analogy of the book surely is a bit misleading, because reading the book from start to the item you are looking for is analogous to a table scan whereas reading the index from beginning to what you are looking for is analogous to an index scan.
the use of functions causes this behaviour, because as in this case to find all the index items where the date has a certain month the db can’t use direct access. it’s analogous to finding all the index items in a book that contain ‘blah’. the solution is to limit the search to a range of dates, ie between two index items that can be accessed directly. i don’t think there is a way to convert my example so that the db can directly access index items.
Jürgen,
It also depends if the index is clustered or not
While I understand the technicalities, I think it’s a pretty sad statement of affairs that after 20 years of development Sql Server (and other RDBMS’s) still can’t perform the following logic:
1. YEAR(GETDATE()) is deterministic (for a single statement, at least) returning 2008 – so it’s effectively a constant.
2. (YEAR(datetime) = 2008) is exactly equal to (datetime BETWEEN ‘1/1/2008′ AND ’12/31/2008’).
I learned how to “solve for x” in 4th grade, and now I can even buy a $30 calculator that’ll do it for me. Is it too much to ask that my $5000 server, and $20000 software do the same?
For a supposed declarative language that touts it’s “Query Optimizer”, Sql sure is brain dead quite often.
I am surprised the person did not look at the Execution Plan to figure out what’s going on
A simple UNION query that runs 24 hours = something is SERIOUSLY wrong
I do agree, SQL should be smarter in Date-handling cases
and YEAR(getdate()) and MONTH(getdate()) are essentially CONSTANT in the above query. The user should save them into a variable at least
I’ve learned to use BETWEEN if I can, versus Date functions or DATEDIFF/DATEPART
it’s not the YEAR(GETDATE()) and MONTH(getdate()) that’s the problem, but the YEAR(payment_dt) and MONTH(payment_dt).
But you might think that the query optimiser would know that YEAR(payment_dt) = YEAR(GETDATE()) was the same as payment_date BETWEEN ‘1-Jan-‘ + YEAR(GETDATE()) and ’31-Dec-‘ + YEAR(GETDATE()) and MONTH(payment_dt) = MONTH(GETDATE()) in combination with YEAR(payment_dt) = YEAR(GETDATE()) is the same as payment_dt BETWEEN ‘1-‘ + DATENAME(m,GETDATE()) + ‘-‘ + CONVERT(varchar,YEAR(GETDATE())) and DATEADD(d,-1,DATEADD(m,1,’1-‘ + DATENAME(m,GETDATE()) + ‘-‘ + CONVERT(varchar,YEAR(GETDATE())))) (there is probably a nicer way of doing this).
i made a mistake with using BETWEEN in the previous comment, it should be:
payment_dt >= ‘1-‘ + DATENAME(m,GETDATE()) + ‘-‘ + CONVERT(varchar,YEAR(GETDATE())) and payment_dt < DATEADD(m,1,’1-‘ + DATENAME(m,GETDATE()) + ‘-‘ + CONVERT(varchar,YEAR(GETDATE())))
which is pretty much the same as your answer, but not as nice.
>> Why the ugly and hard to grok
It’s not ugly when the performance is so great *without adding any more indexes*! And it’s only hard to grok if you’re new to it. The concept is quite simple, in fact, much simpler than the code looks.
The title should be “Only in a database is compiler optimization so poor that it cannot figure out a trivial transform”.
Really, from a developer perspective, it is *ridiculous* how poor the sql compiler is at making simple optimizations.
Well Steve I’m sure MS will appreciate the patch you are going to give them ;-).
I’m chuckling about the angry developer comments above, complaining about why SQL Server can’t predict the results on year/getdate. It’s a poor musician who blames his instrument….
thanks for posting
Great article and very important.
Great info here.. even almost 2 years later! Question – many of the tables I’m dealing with at work have broken up month, day, and year into three different fields: STAT_DAY, STAT_MONTH (indexed), and FOUR_DIGIT_YEAR (indexed). I’ve been tinkering around with the best way to get this year’s data, but I’ve primarily been using:
WHERE FOUR_DIGIT_YEAR=YEAR(GETDATE())
How would this fit into “sargable-ness”? Any better way to do it?
Will, YEAR(GETDATE() should be fine since the functions is evaluated against a column but only one time
you could also do this
declare @Year int
select @Year = YEAR(GETDATE())
…..
…..
WHERE FOUR_DIGIT_YEAR= @Year
I enjoyed reading the blog and all the comments. So has anything changed when using 2012 or 2014.