Login or Sign Up to become a member!
LessThanDot Sit Logo

LessThanDot

Data Management

Less Than Dot is a community of passionate IT professionals and enthusiasts dedicated to sharing technical knowledge, experience, and assistance. Inside you will find reference materials, interesting technical discussions, and expert tips and commentary. Once you register for an account you will have immediate access to the forums and all past articles and commentaries.

LTD Social Sitings

Lessthandot twitter Lessthandot Linkedin Lessthandot friendfeed Lessthandot facebook Lessthandot rss

Note: Watch for social icons on posts by your favorite authors to follow their postings on these and other social sites.

Your profile

    Search

    XML Feeds

    Google Ads

    « Removing replication and handling tables with identity seedsLog Shipping for DR and failing back in case of disaster. The cheap way! »
    comments

    Your testbed has to have the same volume of data as on production otherwise you are not really testing anything.

    This blogpost is kind of a rant after I noticed this post on Stackoverflow

    I do not believe there is a problem with the create trigger statement itself. The create trigger statement was successful and quick in a test environment, and the trigger works correctly when rows are inserted/updated to the table. Although when I created the trigger on the test database there was no load on the table and it had considerably less rows, which is different than on the live/production database (100 vs. 13,000,000+).

    Now how on earth can you expect anything to behave the same when you compare 100 rows against 13 million?
    This is one of the fundamental flaws when people design a database, move it to production and then find out that it blows up/breaks down/is unusable on production

    The worst case I have seen was when someone designed a table with a CompanyName column which was CHAR(5000). Yes you read that right CHAR(5000) nor VARCHAR(5000). On 'staging' it was all fine with 50 rows or so. They moved this to production loaded it up with 100000 rows and it was slow as hell. What can you expect when you have only one row per page.....this was just terrible.

    I understand that not every shop has the money to store terabytes of data but guess what? You can buy a USB TB hard drive for about $100. Plug in 5 of those and test with volume otherwise you will suffer later.

    Now let's look at some code to see what the difference is

    First create these two tables

    1. create table TestSmall (id int identity not null,Somevalue char(108),SomeValue2 uniqueidentifier)
    2. go
    3.  
    4. create table TestBig (id int identity not null,Somevalue char(108),SomeValue2 uniqueidentifier)
    5. go

    We will populate the small table with 256 rows and the big one with 65536 rows

    1. --256 rows
    2. insert TestSmall
    3. select convert(varchar(36),newid())
    4.  + convert(varchar(36),newid())
    5.  + convert(varchar(36),newid()),newid() from master..spt_values t1
    6. where t1.type = 'p'
    7. and t1.number < 256
    8. go
    9.  
    10. --65536 rows
    11. insert TestBig
    12. select convert(varchar(36),newid())
    13.  + convert(varchar(36),newid())
    14.  + convert(varchar(36),newid()),newid() from master..spt_values t1
    15. outer apply master..spt_values t2
    16. where t1.type = 'p'
    17. and t1.number < 256
    18. and t2.type = 'p'
    19. and t2.number < 256
    20. go

    Now we will create a clustered index on each table

    1. create clustered index ix_somevalue_small on TestSmall(Somevalue)
    2. go
    3. create clustered index ix_somevalue_big on TestBig(Somevalue)
    4. go

    Time to run some code
    First we have to turn on statistics for time

    1. set statistics io on

    Now run these queries

    1. select * from TestSmall
    2. where Somevalue like '2%'
    3.  
    4. select * from TestBig
    5. where Somevalue like '2%'

    Table 'TestSmall'. Scan count 1, logical reads 2, physical reads 0
    Table 'TestBig'. Scan count 1, logical reads 74, physical reads 0

    As you can see the reads are much higher for the TestBig table, this is of course not surprising since the TestBig has a lot more rows

    What will happen if we write a non sargable query by using a function in the WHERE clause?

    1. select * from TestSmall
    2. where left(Somevalue,1) = '2'
    3.  
    4. select * from TestBig
    5. where left(Somevalue,1) = '2'

    Table 'TestSmall'. Scan count 1, logical reads 7, physical reads 0
    Table 'TestBig'. Scan count 1, logical reads 1132, physical reads 0

    Okay, so the smaller table had 3.5 times more reads while the bigger table had 15 times more reads. Just imagine what would happen if the bigger table was even bigger?

    Time to turn of the statistics for IO

    1. set statistics io off

    Now we will look at statistics for time, you can do that by running the following command

    1. set statistics time on

    Let's run the same queries again

    1. select * from TestSmall
    2. where Somevalue like '2%'
    3.  
    4. select * from TestBig
    5. where Somevalue like '2%'

    SQL Server Execution Times:
    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:
    CPU time = 15 ms, elapsed time = 97 ms.

    As you can see the numbers are much better for the smaller table
    When we do the non sargable queries the numbers don't increase for the smaller table but they do for the bigger table

    1. select * from TestSmall
    2. where left(Somevalue,1) = '2'
    3.  
    4. select * from TestBig
    5. where left(Somevalue,1) = '2'

    SQL Server Execution Times:
    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:
    CPU time = 31 ms, elapsed time = 132 ms.

    Since data might be cached and you would like to start fresh every time you can execute the following command to clear the cache

    1. dbcc freeproccache
    2. dbcc dropcleanbuffers




    Finally I will leave you with execution plan pics

    Sargable Query

    1. select * from TestSmall
    2. where Somevalue like '2%'
    3.  
    4. select * from TestBig
    5. where Somevalue like '2%'

    Sargable Query

    Non Sargable Query

    1. select * from TestSmall
    2. where left(Somevalue,1) = '2'
    3.  
    4. select * from TestBig
    5. where left(Somevalue,1) = '2'

    Non Sargable Query




    *** If you have a SQL related question try our Microsoft SQL Server Programming forum or our Microsoft SQL Server Admin forum

    About the Author

    User bio imageDenis has been working with SQL Server since version 6.5. Although he worked as an ASP/JSP/ColdFusion developer before the dot com bust, he has been working exclusively as a database developer/architect since 2002. In addition to English, Denis is also fluent in Croatian and Dutch, but he can curse in many other languages and dialects (just ask the SQL optimizer) He lives in Princeton, NJ with his wife and three kids.
    Social SitingsTwitterFacebookLinkedInHomePageFlickrLTD RSS Feed
    4177 views
    InstapaperVote on HN

    9 comments

    Comment from: David Forck (thirster42) [Member]
    *****
    Awesome.

    Are there more statistics like those besides io and time? If so, you should either make a blog posts on them or a wiki.
    06/02/09 @ 10:16
    Comment from: SQLDenis [Member] Email
    *****
    SQLDenis Maybe I will do that, there are a bunch of these in addition to time and io like

    SET SHOWPLAN_ALL ON
    SET SHOWPLAN_TEXT ON
    SET STATISTICS PROFILE ON
    06/02/09 @ 10:21
    Comment from: Ted Krueger (onpnt) [Member]
    *****
    Ted Krueger (onpnt) Nice post Denis!

    The hardware cannot be ignore even given volume testing of course. This will affect IO, CPU and the bottom line performance factors. Given a well rounded test plan you can get a close calculation to the performance of the hardware on production vs test if the money is not there for not only data mirroring but hardware mirroring.
    06/02/09 @ 11:02
    Comment from: SQLDenis [Member] Email
    *****
    SQLDenis onpnt, oh man, don't even get me started on that one :-)
    I worked at a company once where the DB server was a single 700MHZ xeon with 1GB of RAM, our Test server was a Dual 2GHZ xeon box with 3GB of RAM

    That of course is not good either, ideally you would have the same machine as on production however you can go to a less expensive machine for QA if the budget is not there
    06/02/09 @ 11:13
    Comment from: sqlsister [Member] Email
    *****
    sqlsister Great Job Dennis.
    06/02/09 @ 11:44
    Comment from: Ted Krueger (onpnt) [Member]
    *****
    Ted Krueger (onpnt) LOL That's one I haven't heard yet. I smaller box for production than test.
    06/02/09 @ 13:22
    Comment from: SQLDenis [Member] Email
    *****
    SQLDenis >>LOL That's one I haven't heard yet. I smaller box for production than test.

    Client refused to upgrade their box since we developed for many clients we did upgrade our box. So we tested the code for that client on an old machine :-)
    06/02/09 @ 13:28
    Comment from: Alex Ullrich [Member] Email
    *****
    Alex Ullrich Damn, I want a 700 GHZ server :D

    Good post though, how long until you start shouting at people for their identity columns / magnetic tape systems
    06/02/09 @ 18:10
    Comment from: SQLDenis [Member] Email
    *****
    SQLDenis >>Damn, I want a 700 GHZ server :D

    I don't know what you mean :P

    I have no problem with identity, I used them all the time, I do have a problem with clustered GUIDs when not using NEWSEQUENTIALID() but NEWID()
    06/03/09 @ 07:15

    Leave a comment


    Your email address will not be revealed on this site.

    To mislead the spambots.

    Your URL will be displayed.
    (Line breaks become <br />)
    (Name, email & website)
    (Allow users to contact you through a message form (your email will not be revealed.)