Login or Sign Up to become a member!
LessThanDot Sit Logo

LessThanDot

Data Management

Less Than Dot is a community of passionate IT professionals and enthusiasts dedicated to sharing technical knowledge, experience, and assistance. Inside you will find reference materials, interesting technical discussions, and expert tips and commentary. Once you register for an account you will have immediate access to the forums and all past articles and commentaries.

LTD Social Sitings

Lessthandot twitter Lessthandot Linkedin Lessthandot friendfeed Lessthandot facebook Lessthandot rss

Note: Watch for social icons on posts by your favorite authors to follow their postings on these and other social sites.

Your profile

    Search

    XML Feeds

    Google Ads

    « Dealing with Microsoft.Ace and OPENROWSET ErrorsDealing with The multi-part identifier "dbo.Table.Column" could not be bound. error in an update statement »
    comments

    Generally when we are defining tables, the more specific the column definition the better. Yesterday, however, I ran into a case where better definition actually has increased storage use for no appreciable benefit.

    Integers - Using Numeric vs Int

    As I was working on the database I came across a curious sight, multiple columns defined as numeric(7,0), numeric(9,0), and so on. It seemed like someone was trying to provide the database with the most specific definition possible for a number of different pieces of data. Having never run into this particular practice, I immediately started searching for a reason. Was it smaller? faster? better?

    Storage

    Using a very specific, well-defined numeric has actually cost us storage space, not reduced it. A numeric (or decimal) with a precision value of 1 to 9 requires 5 bytes and with 10-19 requires 9 bytes. Compare this to the many varieties of int:

    DigitsInt VarietyInt BytesNumeric(*,0) BytesDifference
    1 - 2tinyint154 bytes
    3 - 4smallint253 bytes
    5 - 9int451 byte
    10 - 18bigint891 byte

    So for every row and every index that includes this value, we lose storage space.

    References: Int reference on MSDN and Numeric/Decimal reference on MSDN

    Performance #1

    When SQL Server is asked to execute a math function (+,-,*,/), it uses a defined set of rules to determine the output type, then implicitly converts the arguments to that type (see this article for a subset that relates to decimals). This means that in many cases there could be implicit conversions to numeric from int, so it's possible someone believed we could try and tweak our performance by defining the field as numeric instead of an int.

    Let's test out implicit conversions:

    1. /* ****** Creation of some number tables ****** */
    2. Create Table NumberIntTest(Num Int Identity(1,1) Primary Key)
    3. go
    4.  
    5. Set NOCOUNT ON
    6. Begin Tran
    7.     Insert Into NumberIntTest Default Values
    8.     Insert Into NumberIntTest Default Values
    9.     Insert Into NumberIntTest Default Values
    10.     Insert Into NumberIntTest Default Values
    11.     Insert Into NumberIntTest Default Values
    12.     Insert Into NumberIntTest Default Values
    13.     Insert Into NumberIntTest Default Values
    14.     Insert Into NumberIntTest Default Values
    15.     Insert Into NumberIntTest Default Values
    16.     Insert Into NumberIntTest Default Values
    17. Commit Tran
    18. go 100000
    19.  
    20. Create Table NumberNumericTest(Num numeric(7,0) Primary Key)
    21. Go
    22.  
    23. Set NOCOUNT ON
    24. Begin Tran
    25.     Insert Into NumberNumericTest(Num) Select Num From NumberIntTest WHERE Num > 10001
    26. Commit Tran
    27. Go
    28.  
    29. /* ****** Execute a variety of test scripts ****** */
    30. DECLARE @Start DateTime;
    31. DECLARE @Garbage int, @Junk numeric(7,0);
    32.  
    33. DECLARE @Int int; SET @Int = 1;
    34. DECLARE @Num numeric(7,0); SET @Num = 1;
    35.  
    36. -- Divide an int by an int
    37. SELECT @start = GETDATE();
    38. SELECT @Garbage = Num/@Int FROM NumberIntTest n;
    39. SELECT DateDiff(Millisecond, @Start, GetDate());
    40.  
    41. -- Divide a numeric(7,0) by an int
    42. SELECT @start = GETDATE();
    43. SELECT @Junk = Num/@Int FROM NumberNumericTest n;
    44. SELECT DateDiff(Millisecond, @Start, GetDate());
    45.  
    46. -- Divide a numeric(7,0) by a numeric(7,0)
    47. SELECT @start = GETDATE();
    48. SELECT @Junk = Num/@Num FROM NumberNumericTest n;
    49. SELECT DateDiff(Millisecond, @Start, GetDate());
    50.  
    51. -- Divide an int by an int w/ explicit casting to numeric
    52. SELECT @start = GETDATE();
    53. SELECT @Junk = Num/CAST(@Int as numeric(7,0)) FROM NumberIntTest n;
    54. SELECT DateDiff(Millisecond, @Start, GetDate());
    55.  
    56. -- Divide an numeric(7,0) by an int w/ explicit casting
    57. SELECT @start = GETDATE();
    58. SELECT @Junk = Num/CAST(@Int as numeric(7,0)) FROM NumberNumericTest n;
    59. SELECT DateDiff(Millisecond, @Start, GetDate());

    Initially I compared the execution plans and didn't see much difference, but after some modifications (thanks George!) and additions we can see the differences between a number of different situations.

    Sample Results:

    Testtime (ms)
    int/int - No Cast170
    numeric/int - Implicit Cast313
    numeric/numeric - Implicit Cast296
    int/CAST(int as numeric) - Implicit Cast320
    numeric/CAST(int as numeric)290

    In the second test's plan we can see an example of that implicit cast:

    numeric/int w/ Implicit Cast:
    
      |--Compute Scalar(DEFINE:([Expr1002]=CONVERT_IMPLICIT(numeric(7,0),[utils].[dbo].[NumberNumericTest].[Num] as [n].[Num]/CONVERT_IMPLICIT(numeric(10,0),[@Int],0),0)))
           |--Clustered Index Scan(OBJECT:([utils].[dbo].[NumberNumericTest].[PK__NumberNu__C7D08B630AD2A005] AS [n]))
    

    So if we did have an integer that we need to operate on with a float value (the last case), the addition of a simple cast on the integer argument will bring the execution performance in line with having two numerics, meaning there is no gain in storing the value as a numeric(7,0).

    Performance #2

    The other potential performance impact is with auto-parameterization. Auto-parameterization occurs when you provide SQL Server with a non-parameterized SQL statement. The server determines a type for those parameters and parameterizes them (part of the magic that makes plans reusable for non-parameterized queries). I couldn't find anything terribly recent, but as far back as SQL Server 6.5 and 7.0 the engine was documented as using the int type for any non-decimal value of 9 digits or less. This means that in the unlikely situation that you're executing inline, non-parameterized SQL statements and have used numeric(*,0) types in your table definitions, you will actually be taking a performance hit for the implicit conversion from auto-parameterized integer to the numeric(*,0) field.

    And if that wasn't bad enough, the same SQL Server documentation says that SQL Server treats integers as more exact than numeric and decimal types. It doesn't specify why the document goes out of its way to share this information with us, but generally when someone goes out of their way to point out something like this in a document, I get a little nervous and tend to focus more heavily on their 'recommended' practice (use int).

    More information on Parameterization and SQL 7 Comparison Optimization

    The Wrap-up

    So at the end of the day, using a numeric(*,0) requires more space, provides no appreciable benefit over explicit casting, and can actually harm you if you are executing non-parameterized SQL statements against your server.

    There are two options for finding these columns, using a SQL query like the one below or downloading SQLCop to check for this and many other common situations.

    1. SELECT TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, DATA_TYPE, NUMERIC_PRECISION, NUMERIC_SCALE
    2. FROM INFORMATION_SCHEMA.COLUMNS C
    3. WHERE C.DATA_TYPE IN ('numeric','decimal') AND NUMERIC_SCALE = 0 AND NUMERIC_PRECISION <= 18

    About the Author

    User bio imageEli delivers software and technology solutions for a living. His roles have included lone developer, accidental DBA, team lead, and even unintentional Solaris consultant once. With experience in adhoc, Lean, and Agile environments across NSF grants, SaaS products, and in-house IT groups, he is just as willing to chat about the principles of Lean or Continuous Delivery as he is to dive into Azure, SQL Server, or the last ATDD project he created.
    Social SitingsTwitterLinkedInHomePagedeliciousLTD RSS Feed
    Instapaper

    12 comments

    Comment from: SQLDenis [Member] Email
    SQLDenis Probably a person who started on another RDBMS platform where they don't have int, smallint, bigint etc etc
    09/02/10 @ 09:37
    Comment from: George Mastros (gmmastros) [Member]
    George Mastros (gmmastros) I found one occurrence in my database. I wrote code to auto-generate audit tables. In the base table, i have a decimal(16,8), but the audit table has decimal(18,0). Obviously something went wrong with the auto-generate audit table process. This blog helped me discover the problem.

    Thanks.
    09/02/10 @ 09:43
    Comment from: Eli Weinstock-Herman (tarwn) [Member]
    Eli Weinstock-Herman (tarwn) Erik found a mis-type, apparently I was thinking 9 bytes and typing 10 for the numeric of precision 10-19. Math and numbers should be corrected now.
    09/02/10 @ 10:46
    Comment from: Janice Lee [Visitor] · http://twitter.com/JaniceCLee
    Janice Lee Aha! :) Hey, Eli! Great post.

    P.S. I meant to tell you, I actually used your VM post 3 weekends ago to set up my SQL 2008 R2 lab. ;) M-m-m-m...was that helpful or what. Thanks again.
    09/02/10 @ 14:23
    Comment from: Erik [Member] Email
    Erik I like that you did performance testing. It's nice to see people prove things sometimes rather than just making assertions.

    Erik
    10/07/10 @ 21:27
    Comment from: MattM [Visitor]
    MattM Thanks for the article Eli,

    I was doing an import from a foxpro database into sql server and several of the columns are Numeric(n,0). I had gone down the route of just creating integer columns to store these numbers (and did not encounter any problems). But I was curious as to whether there was any benefit either way or some reason why I should preserve the datatype - for the most part, the foxpro columns set up like this were identity values, but not all.

    Much appreciate your efforts in helping to make the decision less arduous :)

    cheers
    Matt
    12/15/11 @ 21:19
    Comment from: Eli Weinstock-Herman (tarwn) [Member]
    Eli Weinstock-Herman (tarwn) If you are doing a direct import and do not plan to access the values from FoxPro, then I would do the import, take a backup, then convert the columns in place. If you are going to continue accessing the data from Foxpro, then you might want to do a quick test to see if it is comfortable mapping to a SQL Server int column or not. According to MSDN, FoxPro stores ints in an 8 byte numeric internally, but I don't know how it handles mapping when it uses a backend SQL and if it will accept the mapping without complaint.

    One numeric(n,0) to int savings may not seem worth it, but if you add that savings up across all of your tables and non-clustered indexes, then add in potential savings from char to varchar conversion where appropriate (another artifact I have seen from FoxPro), I think it will be fairly substantial.
    12/16/11 @ 04:13
    Comment from: Naomi Nosonovsky [Member]
    Naomi Nosonovsky Eli,

    Visual FoxPro stores integer values as 4 bytes (same as SQL Server). This is taken directly from Visual FoxPro Help file:


    To store non-decimal numeric values when performance and table storage limitations are especially important, use the Integer field type. To generate values, such ID numbers, use Integer (Autoinc) that automatically increments the value.

    In tables, the Integer field type is stored as a 4-byte binary value. As a result, it requires less memory than other Numeric data types. As a binary value, it requires no ASCII

    For specifications about the Integer and Integer (Autoinc) field types, see Visual FoxPro Data and Field Types.
    --------------------------
    So, indeed for storing values without decimals integer and bigint (VFP does not have bigint) will be a better choice.

    -----------------
    Interestingly, though, in the Data and Field Types we read the following definition:

    Numeric
    Integers or decimal numbers

    For example, the quantity of items ordered
    8 bytes in memory; 1 to 20 bytes in table
    - .9999999999E+19 to .9999999999E+20

    So, in the table the integer takes 4 bytes to store, but when defining integer value in code, it will be 8 bytes in memory. A bit confusing.
    12/16/11 @ 07:05
    Comment from: Eli Weinstock-Herman (tarwn) [Member]
    Eli Weinstock-Herman (tarwn) Interesting, the MSDN documentation on this lists Data types and Field types, but the terminology wasn't very clear that Data type meant application level and Field type meant database. It leads me to wonder why, when FoxPro can store a non-decimal numeric as a 4-byte int in it's own tables, it translates to a much larger numeric value in SQL Server. Perhaps the version in SQL Server is a representation of the application (data) types in FoxPro rather than the table (field) types?
    12/16/11 @ 09:10
    Comment from: Naomi Nosonovsky [Member]
    Naomi Nosonovsky I believe that integer in FoxPro will correctly convert to integer in SQL Server and vice versa. For numeric values it's not 1:1 translation.

    The Integer type is relatively new in FoxPro (I believe it appeared only that FoxPro became Visual FoxPro). So old, dbase format didn't have integer type.
    12/16/11 @ 10:56
    Comment from: RGPSoftware [Visitor]
    RGPSoftware I've considered doing this, using decimal(9,0) instead of int in some cases, in order to avoid the need to CAST when doing division. The main reason to avoid this is mostly to avoid unintentional errors by people who have less understanding of the data when they are using it for reporting, etc. You can argue that anyone making such use of the data should be more knowledgeable, but that's not always realistic in the real world.

    The fact is that if you do math using an int the result is a truncated value, and the only way to avoid that is to either CAST the int to a decimal any time you want to do math (particularly division or even the use of aggregates like AVG()) or define the column as a decimal.

    So to me, this post fails to address the main reason why someone would chose to make use of decimal (or numeric) data types with 0 scale. Yes, everything you said is true, but that still doesn't address the reason why this may be used.
    01/11/12 @ 08:42
    Comment from: Eli Weinstock-Herman (tarwn) [Member]
    Eli Weinstock-Herman (tarwn) RPGSoftware - Thanks for the feedback. It's good that you weighed the options before deciding on Numeric(N,0) as your field type, I was simply trying to point out what the potential costs were that we should take into account. Most (all?) implementations I have personally worked with don't take these costs into account, as I often see them used across the board for all integer and numeric data in a database (including id's which will never be used as part of decimal math).

    In your situation I'm not sure I would have made the same decision as I think it would put your database and people in a better position to explicitly cast the values the way they need them rather than expecting the field to do it for them. All it takes for this process to breakdown is for a single field to be typed as an int and suddenly you have invalid business-level information (which is unlikely to be caught immediately).

    I think it is very realistic to expect developers writing queries to understand typecasting or to at least be testing their work and detecting the issue. I would consider supporting ignorance in that area to be detrimental to them and the business. If the data is to be used by non-developers, then I would consider exposing it as a consumable view that the end user could better understand anyway, rather than direct access to a table, so I could perform any necessary casting at this point.
    01/11/12 @ 09:11

    Leave a comment


    Your email address will not be revealed on this site.

    Your URL will be displayed.
    (Line breaks become <br />)
    (Name, email & website)
    (Allow users to contact you through a message form (your email will not be revealed.)