Something I never really would have thought much about was the creation of statistics does as it impacts an index creation – mostly, the naming convention. In some situations, there may be a need to create your own statistics. Now, 99.999% of the time you will likely be indexing and creating statistics in a much different form than what I’m about to show but nonetheless, it happened to me today so I’m going to show it so others can avoid it.
Take an example table named, testarea loaded with some test data.
CREATE TABLE testarea (colint INT, colchar varchar(155))
GO
INSERT INTO testarea
SELECT RAND()*100,REPLICATE('x',155)
GO 1000
At the time of this table’s creation, you may already know that colint is going to be needed in a predicate form. At this time you could create your own statistics on colint.
CREATE STATISTICS colint_cover ON [dbo].[testarea]([colint])
GO
So far, nothing is a big surprise here. We have a table and a column that we created statistics on effectively getting ahead of the game. What we do need to pay close attention to is the name of the statistics we have chosen.
In the case of the colint column, we’ll effectively want to have an index if it is being used as a predicate. In this case, we could call that index colint_cover, while it may include non-key columns to cover queries.
CREATE INDEX colint_cover ON testarea(colint)
INCLUDE (colchar)
GO
Trying to execute this statement will generate the following error
Msg 1913, Level 16, State 1, Line 1
The operation failed because an index or statistics with name ‘colint_cover’ already exists on table ’testarea’.
This frustrated me for a second given the creation of an index will automatically generate a statistics name exactly the same as the index.
For example
CREATE INDEX IDX_colint ON [dbo].[testarea]([colint])
GO
select * from sys.stats where object_id = OBJECT_ID('testarea')
GO
As stated earlier, the statistics are named the same as the index.
In order to create the index at this point, a different name would be needed.
CREATE INDEX idx_colint_cover ON testarea(colint)
INCLUDE (colchar)
GO
But run the sys.stats query again
SELECT * FROM sys.stats WHERE object_id = OBJECT_ID('testarea')
GO
So, as shown, we now have duplicated statistics which will take twice the resources to maintain.
This is something to keep in mind if you ever create statistics manually. Mostly, ensure you are not creating duplicate statistics.
On a final note: This brings up the need to follow naming conventions closely and ensure that best practice is done for your databases and instances. Some ideas are starting procedures with what operations they perform. ins_name, del_name and so on. Statistics can follow this with, stats_cols. Defining these naming conventions and strictly following them will avoid these odd occurrences that I’ve talked about today.