Recently, during a pre-conference seminar that I presented, the group and I had a long discussion about why there would be a need to add a nonclustered index that consisted of the primary key as the key column.  This topic also came up in a client training session I presented and in passing with a few peers.  Given the topic and the overall consensus that this was a bad practice, I wanted to discuss it in a post and show a reason why creating a nonclustered index with a key column only being the primary key is actually a great optimization for some plans.

I’d like to focus on what would be the largest aspect to why you would use an index strategy of creating nonclustered indexes with only the primary key as the key columns: pages pulled into the buffer.

Let’s go through an example

Using AdventureWork2012, create a table from Sales.SalesOrderHeader that can be manipulated to fit the following tuning steps.

T-SQL
1
2
3
4
5
SELECT * INTO dbo.IndexPageCount FROM Sales.SalesOrderHeader
GO
ALTER TABLE dbo.IndexPageCount 
ADD CONSTRAINT PK_SalesOrderID PRIMARY KEY (SalesOrderID) 
GO
SELECT * INTO dbo.IndexPageCount FROM Sales.SalesOrderHeader
GO
ALTER TABLE dbo.IndexPageCount 
ADD CONSTRAINT PK_SalesOrderID PRIMARY KEY (SalesOrderID) 
GO

The above statement will create dbo.IndexPageCount and make the SalesOrderID the primary key column.  At this point, no other indexing has been done.  If a query was executed that relied on a predicate of the SalesOrderID, technically, further indexing may not be needed.  For example, review the following query and execution plan generated from how the table and indexing is setup on IndexPageCount.

T-SQL
1
2
3
4
5
6
7
8
9
10
11
12
SET STATISTICS IO ON
 
SELECT 
    hdr.DueDate
    ,hdr.ShipDate
    ,SUM(SubTotal) AS SubTotals
FROM
dbo.IndexPageCount hdr 
WHERE hdr.SalesOrderID < 50000
GROUP BY 
     hdr.DueDate
    ,hdr.ShipDate
SET STATISTICS IO ON

SELECT 
	hdr.DueDate
	,hdr.ShipDate
	,SUM(SubTotal) AS SubTotals
FROM
dbo.IndexPageCount hdr 
WHERE hdr.SalesOrderID < 50000
GROUP BY 
	 hdr.DueDate
	,hdr.ShipDate

IO results

Table ‘IndexPageCount’. Scan count 1, logical reads 164, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 

And the plan generated from this query

In all respects, this plan is optimized fairly well.  The one key aspect to the query and plan is, the clustered index is the table and will pull all columns along with it, which equates to more pages that are required to come into the buffer.   We can look at the page count required for the exact table by utilizing dm_os_buffer_descriptors, allocation_unit, sys.indexes and sys.partitions DMOs and catalog views.

Query referenced from: http://www.sqlteam.com/article/what-data-is-in-sql-server-memory

T-SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
SELECT TOP 25 
    obj.[name],
    i.[name],
    i.[type_desc],
    count(*)AS Buffered_Page_Count ,
    count(*) * 8192 / (1024 * 1024) as Buffer_MB
FROM sys.dm_os_buffer_descriptors AS bd 
    INNER JOIN 
    (
        SELECT object_name(object_id) AS name 
            ,index_id ,allocation_unit_id, object_id
        FROM sys.allocation_units AS au
            INNER JOIN sys.partitions AS p 
                ON au.container_id = p.hobt_id 
                    AND (au.type = 1 OR au.type = 3)
        UNION ALL
        SELECT object_name(object_id) AS name   
            ,index_id, allocation_unit_id, object_id
        FROM sys.allocation_units AS au
            INNER JOIN sys.partitions AS p 
                ON au.container_id = p.hobt_id 
                    AND au.type = 2
    ) AS obj 
        ON bd.allocation_unit_id = obj.allocation_unit_id
LEFT JOIN sys.indexes i on i.object_id = obj.object_id AND i.index_id = obj.index_id
WHERE database_id = db_id() AND obj.[name] = 'IndexPageCount'
GROUP BY obj.name, obj.index_id , i.[name],i.[type_desc]
ORDER BY Buffered_Page_Count DESC
SELECT TOP 25 
	obj.[name],
	i.[name],
	i.[type_desc],
	count(*)AS Buffered_Page_Count ,
	count(*) * 8192 / (1024 * 1024) as Buffer_MB
FROM sys.dm_os_buffer_descriptors AS bd 
    INNER JOIN 
    (
        SELECT object_name(object_id) AS name 
            ,index_id ,allocation_unit_id, object_id
        FROM sys.allocation_units AS au
            INNER JOIN sys.partitions AS p 
                ON au.container_id = p.hobt_id 
                    AND (au.type = 1 OR au.type = 3)
        UNION ALL
        SELECT object_name(object_id) AS name   
            ,index_id, allocation_unit_id, object_id
        FROM sys.allocation_units AS au
            INNER JOIN sys.partitions AS p 
                ON au.container_id = p.hobt_id 
                    AND au.type = 2
    ) AS obj 
        ON bd.allocation_unit_id = obj.allocation_unit_id
LEFT JOIN sys.indexes i on i.object_id = obj.object_id AND i.index_id = obj.index_id
WHERE database_id = db_id() AND obj.[name] = 'IndexPageCount'
GROUP BY obj.name, obj.index_id , i.[name],i.[type_desc]
ORDER BY Buffered_Page_Count DESC

As shown, the resulting buffered page count is 172 at this point.  This page count is where we want to optimize the query and see how the indexing strategy can assist in lowering the overall page count and buffer utilization.

Optimizing by lowering buffered page count

To lower the page count, we essentially need to lessen the need to pull the pages that come along with the seek operation on the clustered index.  This can be done by indexing the primary key and effectively covering the query by including in the nonkey columns, the resulting columns in the query, or better known as, a covering index.

Clear the buffer so we are sure we  look at the new index and compare the page count in the buffer to the previous results.

T-SQL
1
DBCC DROPCLEANBUFFERS
DBCC DROPCLEANBUFFERS

Next, run the script below to create the nonclustered index or, covering index that consists of the key column being the primary key column of SalesOrderID and INCLUDE the DueDate, ShipDate and SunTotal columns.

T-SQL
1
CREATE INDEX IDX_SalesOrderID_COVER_ASC ON dbo.IndexPageCount (SalesOrderID) INCLUDE (DueDate,ShipDate,SubTotal)
CREATE INDEX IDX_SalesOrderID_COVER_ASC ON dbo.IndexPageCount (SalesOrderID) INCLUDE (DueDate,ShipDate,SubTotal)

Execute the previously used query

T-SQL
1
2
3
4
5
6
7
8
9
10
SELECT 
    hdr.DueDate
    ,hdr.ShipDate
    ,SUM(SubTotal) AS SubTotals
FROM
dbo.IndexPageCount hdr 
WHERE hdr.SalesOrderID < 50000
GROUP BY 
     hdr.DueDate
    ,hdr.ShipDate
SELECT 
	hdr.DueDate
	,hdr.ShipDate
	,SUM(SubTotal) AS SubTotals
FROM
dbo.IndexPageCount hdr 
WHERE hdr.SalesOrderID < 50000
GROUP BY 
	 hdr.DueDate
	,hdr.ShipDate

After reviewing the statistics IO from the second execution, the new covering index was utilized and showed an improvement on logical reads.

Table ‘IndexPageCount’. Scan count 1, logical reads 29, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Looking at the execution plan, we can see the new nonclustered index is effectively being utilized as well.

Of course, we’ve only replaced an index seek for another index seek.  To see the real optimization, the IO and then equating that to review of the pages pulled into the buffer should be checked.  Execute the query to review the pages in the buffer that was used previously.

As shown, the page count in the buffer is drastically lower than the previous count of 172.  Based on this, we’ve optimized the execution of the query and plan but, more importantly, optimized the resources that are being used by the execution of the query.