Login or Sign Up to become a member!
LessThanDot Sit Logo

LessThanDot

Data Management

Less Than Dot is a community of passionate IT professionals and enthusiasts dedicated to sharing technical knowledge, experience, and assistance. Inside you will find reference materials, interesting technical discussions, and expert tips and commentary. Once you register for an account you will have immediate access to the forums and all past articles and commentaries.

LTD Social Sitings

Lessthandot twitter Lessthandot Linkedin Lessthandot friendfeed Lessthandot facebook Lessthandot rss

Note: Watch for social icons on posts by your favorite authors to follow their postings on these and other social sites.

Your profile

    Search

    XML Feeds

    Google Ads

    « Why you shouldn't use SELECT *SQL Advent 2012: Recap »
    comments

    I received an email the other day asking why, when running SHOW_STATISTICS, a person was always seeing the primary key in the statistics column density and other output, when the primary key was not part of the index the statistics were created from.

    This is a great question and relates to an article I’m writing on selectivity and ordering in the creation of an index while writing queries.  I thought a quick post would be great on it, since the answer isn’t truly standing out when you search for the reason statistics on nonclustered indexes will always contain the primary key or unique constraint column (on a table that is clustered).

    Index Structure

    The first part of this is the index the statistics were created off of.  If the table being indexed is a clustered table, as shown in listing 1, the unique row indicator for that table will become the primary key.  In our case, Col_ID is now the row indicator for the table or referenced to the actual data row in case the need arises.

    1. CREATE TABLE SelectivityIndexOrder (
    2. Col_ID INT IDENTITY(1,1) PRIMARY KEY,
    3. Col_String VARCHAR(25),
    4. Col_Date DATETIME DEFAULT GETDATE(),
    5. Col_TinyInt TINYINT,
    6. Col_BIT BIT DEFAULT (0)
    7. );
    8.  
    9. DECLARE @LOOP INT = 1
    10. WHILE @LOOP <= 1000
    11. BEGIN
    12.     INSERT INTO SelectivityIndexOrder (Col_String,Col_TinyInt)
    13.     SELECT 'String value for row ' + CAST(@LOOP AS VARCHAR(4)), 2
    14.   SET @LOOP += 1
    15. END
    16.  
    17. WHILE @LOOP <= 2001
    18. BEGIN
    19.     INSERT INTO SelectivityIndexOrder (Col_String,Col_TinyInt)
    20.     SELECT 'String value for row ' + CAST(@LOOP AS VARCHAR(4)), 3
    21.   SET @LOOP += 1
    22. END
    23.  
    24. CREATE NONCLUSTERED INDEX IDX_BIT ON SelectivityIndexOrder (Col_BIT,Col_date)
    25. INCLUDE (Col_String)

    Listing 1

    What this means is, in the case of the creation of a nonclustered index, the column(s) that are being indexed are part of the index (the leaf node values).  The other structural part of a nonclustered index on a table with a clustered index is the row indicator.  In a clustered, unique situation, the best value for a row indicator is the clustered index.  This is the pointer back to the actual data row.

    Now that we know how the clustered values are used in a nonclustered index, let’s take a look at the structure of the nonclustered index.  Using DBCC IND and PAGE, we can drill down to see the actual utilization of the keys.

    1. DBCC IND('QTuner',"SelectivityIndexOrder",2)
    2. GO


    Figure 1

     

    1. DBCC TRACEON(3604)
    2. DBCC PAGE('QTuner',1,434394,3)
    3. GO


    Figure 2

     

    As shown above, the IND results give us a page number of 434394 that we can look at.  Running DBCC PAGE on that page results in the output in figure 2.  This shows a column value of the Col_ID (key) as the pointers back to those rows.

    Now, knowing that statistics are generated automatically for indexes, this would enforce the need for the statistics to not only be created on the Col_BIT column, but in this case, on the Col_ID column as well.

    Using SHOW_STATISTICS, we can see that happening on our example, as shown in Figure 3.

    1. DBCC SHOW_STATISTICS ("SelectivityIndexOrder", IDX_BIT);
    2. GO


    Figure 3

    As with any structure of statistics, the same rule of the histogram being created only for the first column still applies here.  This moves into the topic of selectivity and the future post mentioned in the beginning of this article.  Notice the density is stored for the other columns combined.  In our NC index, we have Col_BIT and Col_Date.  In order to store the row indicator, then, Col_ID is part of the statistics.

    What about HEAPs?

    As shown up until now, the reason the key is stored in the statistics is due to the key being the row indicator in the nonclustered index.  SQL Server needs this to effectively look back to the data rows.  What would happen to the statistics if all of this was based on a HEAP table (a table not clustered)?

    To test this, use listing 2.

    1. CREATE TABLE SelectivityIndexOrderHeap (
    2. Col_ID INT IDENTITY(1,1),
    3. Col_String VARCHAR(25),
    4. Col_Date DATETIME DEFAULT GETDATE(),
    5. Col_TinyInt TINYINT,
    6. Col_BIT BIT DEFAULT (0)
    7. );
    8.  
    9. DECLARE @LOOP INT = 1
    10. WHILE @LOOP <= 1000
    11. BEGIN
    12.     INSERT INTO SelectivityIndexOrderHeap (Col_String,Col_TinyInt)
    13.     SELECT 'String value for row ' + CAST(@LOOP AS VARCHAR(4)), 2
    14.   SET @LOOP += 1
    15. END
    16.  
    17. WHILE @LOOP <= 2001
    18. BEGIN
    19.     INSERT INTO SelectivityIndexOrderHeap (Col_String,Col_TinyInt)
    20.     SELECT 'String value for row ' + CAST(@LOOP AS VARCHAR(4)), 3
    21.   SET @LOOP += 1
    22. END
    23.  
    24. CREATE NONCLUSTERED INDEX IDX_BIT ON SelectivityIndexOrderHeap (Col_BIT,Col_date)
    25. INCLUDE (Col_String)
    26. GO
    27. DBCC SHOW_STATISTICS (SelectivityIndexOrderHeap, [IDX_BIT]);
    28. GO
    29. CREATE NONCLUSTERED INDEX IDX_BIT ON SelectivityIndexOrderHeap (Col_BIT,Col_date)
    30. INCLUDE (Col_String)
    31. GO
    32. DBCC SHOW_STATISTICS (SelectivityIndexOrderHeap, [IDX_BIT]);
    33. GO

    Listing 2

    The output from listing 2 is shown in figure 4.


    Figure 4

    Notice in figure 4 that the key, which doesn’t exist on the table, is shown.  This goes back to the same reasoning of why a key is included in the statistics of a clustered table.  With a HEAP, however, the row indicator is a pointer directly to the data row.  In listing 3, running DBCC PAGE on the first page shown, we can see how to find the data.

    1. DBCC IND('QTuner',SelectivityIndexOrderHeap,2)
    2. GO
    3. DBCC TRACEON(3604)
    4. DBCC PAGE('QTuner',1,434400,1)
    5. GO

    Listing 3


    Figure 5

    I don’t want to dive too far into the PAGE output as it is out of scope.  What I will point out is the row outlined being the slot and the offset.  To really dig into PAGE, read, “Inside the Storage Engine: Anatomy of a page” by Paul Randal.

    For the HEAP table and the nonclustered index, there is a need for the extra step, unlike the step utilizing a key in the clustered table, to utilize the row indicator directly back to the data rows in the HEAP structure.  Since this is not an actual column, that would not show in statistics.

    Summary

    As we’ve covered, the reason a you will see the primary key or unique keys in statistics, even when you do not specifically add them to a nonclustered index, is due to the need to have the row indicator as part of the index.

    About the Author

    Ted Krueger is a SQL Server MVP and has been working in development and database administration for 13+ years. Specialties range from High Availability and Disaster / Recovery setup and testing methods down to custom assembly development for SQL Server Reporting Services. Ted blogs and is also one of the founders of LessThanDot.com technology community. Some of the articles focused on are Backup / Recovery, Security, SSIS and working on SQL Server and using all of the SQL Server features available to create stable and scalable database services. @onpnt Personal Blog over at http://onpnt.wordpress.com/
    Social SitingsTwitterLinkedInLTD RSS Feed
    2162 views
    InstapaperVote on HN

    No feedback yet

    Leave a comment


    Your email address will not be revealed on this site.

    To mislead the spambots.

    Your URL will be displayed.
    (Line breaks become <br />)
    (Name, email & website)
    (Allow users to contact you through a message form (your email will not be revealed.)