Login or Sign Up to become a member!
LessThanDot Sit Logo

LessThanDot

Data Management

Less Than Dot is a community of passionate IT professionals and enthusiasts dedicated to sharing technical knowledge, experience, and assistance. Inside you will find reference materials, interesting technical discussions, and expert tips and commentary. Once you register for an account you will have immediate access to the forums and all past articles and commentaries.

LTD Social Sitings

Lessthandot twitter Lessthandot Linkedin Lessthandot friendfeed Lessthandot facebook Lessthandot rss

Note: Watch for social icons on posts by your favorite authors to follow their postings on these and other social sites.

Your profile

Search

XML Feeds

Google Ads

« Removing replication and handling tables with identity seedsLog Shipping for DR and failing back in case of disaster. The cheap way! »
comments
Rate Post:
submit to reddit Digg!FacebookDotnetkicks

Your testbed has to have the same volume of data as on production otherwise you are not really testing anything.

This blogpost is kind of a rant after I noticed this post on Stackoverflow

I do not believe there is a problem with the create trigger statement itself. The create trigger statement was successful and quick in a test environment, and the trigger works correctly when rows are inserted/updated to the table. Although when I created the trigger on the test database there was no load on the table and it had considerably less rows, which is different than on the live/production database (100 vs. 13,000,000+).

Now how on earth can you expect anything to behave the same when you compare 100 rows against 13 million?
This is one of the fundamental flaws when people design a database, move it to production and then find out that it blows up/breaks down/is unusable on production

The worst case I have seen was when someone designed a table with a CompanyName column which was CHAR(5000). Yes you read that right CHAR(5000) nor VARCHAR(5000). On 'staging' it was all fine with 50 rows or so. They moved this to production loaded it up with 100000 rows and it was slow as hell. What can you expect when you have only one row per page.....this was just terrible.

I understand that not every shop has the money to store terabytes of data but guess what? You can buy a USB TB hard drive for about $100. Plug in 5 of those and test with volume otherwise you will suffer later.

Now let's look at some code to see what the difference is

First create these two tables

  1. CREATE TABLE TestSmall (id INT IDENTITY not null,Somevalue CHAR(108),SomeValue2 UNIQUEIDENTIFIER)
  2. go
  3.  
  4. CREATE TABLE TestBig (id INT IDENTITY not null,Somevalue CHAR(108),SomeValue2 UNIQUEIDENTIFIER)
  5. go

We will populate the small table with 256 rows and the big one with 65536 rows

  1. --256 rows
  2. INSERT TestSmall
  3. SELECT CONVERT(VARCHAR(36),newid())
  4.  + CONVERT(VARCHAR(36),newid())
  5.  + CONVERT(VARCHAR(36),newid()),newid() FROM master..spt_values t1
  6. WHERE t1.type = 'p'
  7. and t1.number < 256
  8. go
  9.  
  10. --65536 rows
  11. INSERT TestBig
  12. SELECT CONVERT(VARCHAR(36),newid())
  13.  + CONVERT(VARCHAR(36),newid())
  14.  + CONVERT(VARCHAR(36),newid()),newid() FROM master..spt_values t1
  15. outer apply master..spt_values t2
  16. WHERE t1.type = 'p'
  17. and t1.number < 256
  18. and t2.type = 'p'
  19. and t2.number < 256
  20. go

Now we will create a clustered index on each table

  1. CREATE CLUSTERED INDEX ix_somevalue_small ON TestSmall(Somevalue)
  2. go
  3. CREATE CLUSTERED INDEX ix_somevalue_big ON TestBig(Somevalue)
  4. go

Time to run some code
First we have to turn on statistics for time

  1. SET STATISTICS io ON

Now run these queries

  1. SELECT * FROM TestSmall
  2. WHERE Somevalue like '2%'
  3.  
  4. SELECT * FROM TestBig
  5. WHERE Somevalue like '2%'

Table 'TestSmall'. Scan count 1, logical reads 2, physical reads 0
Table 'TestBig'. Scan count 1, logical reads 74, physical reads 0

As you can see the reads are much higher for the TestBig table, this is of course not surprising since the TestBig has a lot more rows

What will happen if we write a non sargable query by using a function in the WHERE clause?

  1. SELECT * FROM TestSmall
  2. WHERE LEFT(Somevalue,1) = '2'
  3.  
  4. SELECT * FROM TestBig
  5. WHERE LEFT(Somevalue,1) = '2'

Table 'TestSmall'. Scan count 1, logical reads 7, physical reads 0
Table 'TestBig'. Scan count 1, logical reads 1132, physical reads 0

Okay, so the smaller table had 3.5 times more reads while the bigger table had 15 times more reads. Just imagine what would happen if the bigger table was even bigger?

Time to turn of the statistics for IO

  1. SET STATISTICS io OFF

Now we will look at statistics for time, you can do that by running the following command

  1. SET STATISTICS TIME ON

Let's run the same queries again

  1. SELECT * FROM TestSmall
  2. WHERE Somevalue like '2%'
  3.  
  4. SELECT * FROM TestBig
  5. WHERE Somevalue like '2%'

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:
CPU time = 15 ms, elapsed time = 97 ms.

As you can see the numbers are much better for the smaller table
When we do the non sargable queries the numbers don't increase for the smaller table but they do for the bigger table

  1. SELECT * FROM TestSmall
  2. WHERE LEFT(Somevalue,1) = '2'
  3.  
  4. SELECT * FROM TestBig
  5. WHERE LEFT(Somevalue,1) = '2'

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:
CPU time = 31 ms, elapsed time = 132 ms.

Since data might be cached and you would like to start fresh every time you can execute the following command to clear the cache

  1. DBCC freeproccache
  2. DBCC dropcleanbuffers




Finally I will leave you with execution plan pics

Sargable Query

  1. SELECT * FROM TestSmall
  2. WHERE Somevalue like '2%'
  3.  
  4. SELECT * FROM TestBig
  5. WHERE Somevalue like '2%'

Sargable Query

Non Sargable Query

  1. SELECT * FROM TestSmall
  2. WHERE LEFT(Somevalue,1) = '2'
  3.  
  4. SELECT * FROM TestBig
  5. WHERE LEFT(Somevalue,1) = '2'

Non Sargable Query




*** If you have a SQL related question try our Microsoft SQL Server Programming forum or our Microsoft SQL Server Admin forum

About the Author

User bio imageDenis has been working with SQL Server since version 6.5. Although he worked as an ASP/JSP/ColdFusion developer before the dot com bust, he has been working exclusively as a database developer/architect since 2002. In addition to English, Denis is also fluent in Croatian and Dutch, but he can curse in many other languages and dialects (just ask the SQL optimizer) He lives in Princeton, NJ with his wife and three kids.
Social SitingsTwitterFacebookLinkedInHomePageLTD RSS Feed
3122 views
submit to reddit Digg!FacebookDotnetkicks

Comments and Feedback

9 comments

Comment from: thirster42 [Member] Email
*****
Awesome.

Are there more statistics like those besides io and time? If so, you should either make a blog posts on them or a wiki.
06/02/09 @ 10:16
Comment from: SQLDenis [Member] Email
*****
Maybe I will do that, there are a bunch of these in addition to time and io like

SET SHOWPLAN_ALL ON
SET SHOWPLAN_TEXT ON
SET STATISTICS PROFILE ON
06/02/09 @ 10:21
Comment from: Ted Krueger (onpnt) [Member] Email
*****
Nice post Denis!

The hardware cannot be ignore even given volume testing of course. This will affect IO, CPU and the bottom line performance factors. Given a well rounded test plan you can get a close calculation to the performance of the hardware on production vs test if the money is not there for not only data mirroring but hardware mirroring.
06/02/09 @ 11:02
Comment from: SQLDenis [Member] Email
*****
onpnt, oh man, don't even get me started on that one :-)
I worked at a company once where the DB server was a single 700MHZ xeon with 1GB of RAM, our Test server was a Dual 2GHZ xeon box with 3GB of RAM

That of course is not good either, ideally you would have the same machine as on production however you can go to a less expensive machine for QA if the budget is not there
06/02/09 @ 11:13
Comment from: sqlsister [Member] Email
*****
Great Job Dennis.
06/02/09 @ 11:44
Comment from: Ted Krueger (onpnt) [Member] Email
*****
LOL That's one I haven't heard yet. I smaller box for production than test.
06/02/09 @ 13:22
Comment from: SQLDenis [Member] Email
*****
>>LOL That's one I haven't heard yet. I smaller box for production than test.

Client refused to upgrade their box since we developed for many clients we did upgrade our box. So we tested the code for that client on an old machine :-)
06/02/09 @ 13:28
Comment from: Alex Ullrich [Member] Email
*****
Damn, I want a 700 GHZ server :D

Good post though, how long until you start shouting at people for their identity columns / magnetic tape systems
06/02/09 @ 18:10
Comment from: SQLDenis [Member] Email
*****
>>Damn, I want a 700 GHZ server :D

I don't know what you mean :P

I have no problem with identity, I used them all the time, I do have a problem with clustered GUIDs when not using NEWSEQUENTIALID() but NEWID()
06/03/09 @ 07:15

Leave a comment


Your email address will not be revealed on this site.

Your URL will be displayed.
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)