Generally when we are defining tables, the more specific the column definition the better. Yesterday, however, I ran into a case where better definition actually has increased storage use for no appreciable benefit.
Integers – Using Numeric vs Int
As I was working on the database I came across a curious sight, multiple columns defined as numeric(7,0), numeric(9,0), and so on. It seemed like someone was trying to provide the database with the most specific definition possible for a number of different pieces of data. Having never run into this particular practice, I immediately started searching for a reason. Was it smaller? faster? better?
Storage
Using a very specific, well-defined numeric has actually cost us storage space, not reduced it. A numeric (or decimal) with a precision value of 1 to 9 requires 5 bytes and with 10-19 requires 9 bytes. Compare this to the many varieties of int:
Digits |
---|
So for every row and every index that includes this value, we lose storage space.
References: Int reference on MSDN and Numeric/Decimal reference on MSDN
Performance #1
When SQL Server is asked to execute a math function (+,-,*,/), it uses a defined set of rules to determine the output type, then implicitly converts the arguments to that type (see this article for a subset that relates to decimals). This means that in many cases there could be implicit conversions to numeric from int, so it’s possible someone believed we could try and tweak our performance by defining the field as numeric instead of an int.
Let’s test out implicit conversions:
/* ****** Creation of some number tables ****** */
Create Table NumberIntTest(Num Int Identity(1,1) Primary Key)
go
Set NOCOUNT ON
Begin Tran
Insert Into NumberIntTest Default Values
Insert Into NumberIntTest Default Values
Insert Into NumberIntTest Default Values
Insert Into NumberIntTest Default Values
Insert Into NumberIntTest Default Values
Insert Into NumberIntTest Default Values
Insert Into NumberIntTest Default Values
Insert Into NumberIntTest Default Values
Insert Into NumberIntTest Default Values
Insert Into NumberIntTest Default Values
Commit Tran
go 100000
Create Table NumberNumericTest(Num numeric(7,0) Primary Key)
Go
Set NOCOUNT ON
Begin Tran
Insert Into NumberNumericTest(Num) Select Num From NumberIntTest WHERE Num > 10001
Commit Tran
Go
/* ****** Execute a variety of test scripts ****** */
DECLARE @Start DateTime;
DECLARE @Garbage int, @Junk numeric(7,0);
DECLARE @Int int; SET @Int = 1;
DECLARE @Num numeric(7,0); SET @Num = 1;
-- Divide an int by an int
SELECT @start = GETDATE();
SELECT @Garbage = Num/@Int FROM NumberIntTest n;
SELECT DateDiff(Millisecond, @Start, GetDate());
-- Divide a numeric(7,0) by an int
SELECT @start = GETDATE();
SELECT @Junk = Num/@Int FROM NumberNumericTest n;
SELECT DateDiff(Millisecond, @Start, GetDate());
-- Divide a numeric(7,0) by a numeric(7,0)
SELECT @start = GETDATE();
SELECT @Junk = Num/@Num FROM NumberNumericTest n;
SELECT DateDiff(Millisecond, @Start, GetDate());
-- Divide an int by an int w/ explicit casting to numeric
SELECT @start = GETDATE();
SELECT @Junk = Num/CAST(@Int as numeric(7,0)) FROM NumberIntTest n;
SELECT DateDiff(Millisecond, @Start, GetDate());
-- Divide an numeric(7,0) by an int w/ explicit casting
SELECT @start = GETDATE();
SELECT @Junk = Num/CAST(@Int as numeric(7,0)) FROM NumberNumericTest n;
SELECT DateDiff(Millisecond, @Start, GetDate());
Initially I compared the execution plans and didn’t see much difference, but after some modifications (thanks George!) and additions we can see the differences between a number of different situations.
Sample Results:
Test |
---|
In the second test’s plan we can see an example of that implicit cast:
numeric/int w/ Implicit Cast: |--Compute Scalar(DEFINE:([Expr1002]=CONVERT_IMPLICIT(numeric(7,0),[utils].[dbo].[NumberNumericTest].[Num] as [n].[Num]/CONVERT_IMPLICIT(numeric(10,0),[@Int],0),0))) |--Clustered Index Scan(OBJECT:([utils].[dbo].[NumberNumericTest].[PK__NumberNu__C7D08B630AD2A005] AS [n]))
So if we did have an integer that we need to operate on with a float value (the last case), the addition of a simple cast on the integer argument will bring the execution performance in line with having two numerics, meaning there is no gain in storing the value as a numeric(7,0).
Performance #2
The other potential performance impact is with auto-parameterization. Auto-parameterization occurs when you provide SQL Server with a non-parameterized SQL statement. The server determines a type for those parameters and parameterizes them (part of the magic that makes plans reusable for non-parameterized queries). I couldn’t find anything terribly recent, but as far back as SQL Server 6.5 and 7.0 the engine was documented as using the int type for any non-decimal value of 9 digits or less. This means that in the unlikely situation that you’re executing inline, non-parameterized SQL statements and have used numeric(*,0) types in your table definitions, you will actually be taking a performance hit for the implicit conversion from auto-parameterized integer to the numeric(*,0) field.
And if that wasn’t bad enough, the same SQL Server documentation says that SQL Server treats integers as more exact than numeric and decimal types. It doesn’t specify why the document goes out of its way to share this information with us, but generally when someone goes out of their way to point out something like this in a document, I get a little nervous and tend to focus more heavily on their ‘recommended’ practice (use int).
More information on Parameterization and SQL 7 Comparison Optimization
The Wrap-up
So at the end of the day, using a numeric(*,0) requires more space, provides no appreciable benefit over explicit casting, and can actually harm you if you are executing non-parameterized SQL statements against your server.
There are two options for finding these columns, using a SQL query like the one below or downloading SQLCop to check for this and many other common situations.
SELECT TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, DATA_TYPE, NUMERIC_PRECISION, NUMERIC_SCALE
FROM INFORMATION_SCHEMA.COLUMNS C
WHERE C.DATA_TYPE IN ('numeric','decimal') AND NUMERIC_SCALE = 0 AND NUMERIC_PRECISION <= 18