This is day fifteen of the SQL Advent 2012 series of blog posts. Today we are going to look at the benefit of indexes
So how does an index work?
How does an index work, how does it help SQL Server finding stuff faster? Here is an simple non technology explanation. If I told you to grab a cookbook and give me all the recipes in that book for cod, what would you do? There are two things yuo can do, you can read through the whole book page by page until you get to the last page looking for any cod recipes. Or………..take a look at the picture below
See that, in one second I know exactly where to find cod recipes, it is on page 305 and 61. Which do you think is faster, looking it up in an index or scanning through the book?
SQL Server pretty much uses the same technique. There are two types of basic indexes in SQL Server, these are clustered and non clustered indexes. A clustered index will contain all the data from the table as well, a non clustered index will have a pointer to the table row if there is no clustered index on that table or to the clustered index position is there is a clustered index on the table.
A table with a clustered index is also called a clustered table. A table without a clustered index is also called a heap.
SQL Server also has XML, spatial, columnstore, filtered, full text indexes but we are just focusing on the basic indexes in this post.
Here is what books on line has to say about the storage of a clustered index
In SQL Server, indexes are organized as B-trees. Each page in an index B-tree is called an index node. The top node of the B-tree is called the root node. The bottom level of nodes in the index is called the leaf nodes. Any index levels between the root and the leaf nodes are collectively known as intermediate levels. In a clustered index, the leaf nodes contain the data pages of the underlying table. The root and intermediate level nodes contain index pages holding index rows. Each index row contains a key value and a pointer to either an intermediate level page in the B-tree, or a data row in the leaf level of the index. The pages in each level of the index are linked in a doubly-linked list.
A non clustered index is a little different since it doesn’t store the whole data pages, here is what books on line has to say about the storage of a nonclustered index
_Nonclustered indexes have the same B-tree structure as clustered indexes, except for the following significant differences:
- The data rows of the underlying table are not sorted and stored in order based on their nonclustered keys.
- The leaf layer of a nonclustered index is made up of index pages instead of data pages.
Nonclustered indexes can be defined on a table or view with a clustered index or a heap. Each index row in the nonclustered index contains the nonclustered key value and a row locator. This locator points to the data row in the clustered index or heap having the key value.
The row locators in nonclustered index rows are either a pointer to a row or are a clustered index key for a row, as described in the following:
- If the table is a heap, which means it does not have a clustered index, the row locator is a pointer to the row. The pointer is built from the file identifier (ID), page number, and number of the row on the page. The whole pointer is known as a Row ID (RID).
- If the table has a clustered index, or the index is on an indexed view, the row locator is the clustered index key for the row. If the clustered index is not a unique index, SQL Server makes any duplicate keys unique by adding an internally generated value called a uniqueifier. This four-byte value is not visible to users. It is only added when required to make the clustered key unique for use in nonclustered indexes. SQL Server retrieves the data row by searching the clustered index using the clustered index key stored in the leaf row of the nonclustered index.
Cool, I will now just add indexes on every column. Not so fast, there are some things to consider here, for every update, insert and delete statement indexes have to be maintained. If you have a busy OLTP database, you have to find the right balance between your read and write IO. Not enough indexes and your retrieval queries will suffer, too many indexes and your inserts will be slower. Also keep in mind that indexes take up storage, the more you have the bigger your database will be.
Keep your clustered indexes narrow
Try to keep your clustered indexes as narrow as possible, if you can use something like an integer, this is only 4 bytes. The reason to keep your clustered indexes narrow is that when you have non clustered indexes, the row locator is the clustered index key for the row. In this case your non clustered index will become bigger as well and now you won’t be able to store as much data on a page. To illustrate that let’s take a look at some simple code
First let’s create this table and populate it with 2048 rows
sql CREATE TABLE Test1(id int, somecol char(36), somecol2 char(36)) GO
INSERT Test1 SELECT number,newid(),newid() FROM master..spt_values WHERE type = ‘P’
> Add a clustered index > > sql CREATE CLUSTERED INDEX cx on Test1(id)
Add these two non clustered indexes
sql CREATE NONCLUSTERED INDEX ix1 on Test1(somecol) CREATE NONCLUSTERED INDEX ix2 on Test1(somecol2)
> Let's check how much storage is required for the non clustered indexes > > sql SELECT > DB_NAME(DATABASE_ID) AS [DatabaseName], > OBJECT_NAME(OBJECT_ID) AS TableName, > SI.NAME AS IndexName, > INDEX_TYPE_DESC AS IndexType, > PAGE_COUNT AS PageCounts > FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL , NULL, N'LIMITED') DPS > INNER JOIN sysindexes SI > ON DPS.OBJECT_ID = SI.ID AND DPS.INDEX_ID = SI.INDID > AND OBJECT_NAME(OBJECT_ID) = 'Test1' > GO
Here is the output, as you can see the non clustered indexes take up 12 pages
DatabaseName TableName IndexName IndexType PageCounts tempdb Test1 cx CLUSTERED INDEX 22 tempdb Test1 ix1 NONCLUSTERED INDEX 12 tempdb Test1 ix2 NONCLUSTERED INDEX 12
If we check the table size
sql EXEC sp_spaceused ‘Test1’
> <pre>name rows reserved data index_size unused Test1 2048 472 KB 176 KB 240 KB 56 KB</pre> > > We see that it is using 240 KB for the indexes > > Let's recreate the clustered index with all 3 columns now. > > sql CREATE CLUSTERED INDEX cx on Test1(id,somecol,somecol2) > WITH DROP_EXISTING
Recreating the clustered index also recreated the non clustered indexes. Let’s check now how many pages a non clustered index is
sql SELECT DB_NAME(DATABASE_ID) AS [DatabaseName], OBJECT_NAME(OBJECT_ID) AS TableName, SI.NAME AS IndexName, INDEX_TYPE_DESC AS IndexType, PAGE_COUNT AS PageCounts FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL , NULL, N’LIMITED’) DPS INNER JOIN sysindexes SI ON DPS.OBJECT_ID = SI.ID AND DPS.INDEX_ID = SI.INDID AND OBJECT_NAME(OBJECT_ID) = ‘Test1’ GO
> Here are the results > > <pre>DatabaseName TableName IndexName IndexType PageCounts tempdb Test1 cx CLUSTERED INDEX 22 tempdb Test1 ix1 NONCLUSTERED INDEX 21 tempdb Test1 ix2 NONCLUSTERED INDEX 21</pre> > > As you can see the non clustered indexes went from 12 to 21 pages > > The index size changed, if you run this > > sql EXEC sp_spaceused 'Test1'
Here is the result
name rows reserved data index_size unused Test1 2048 600 KB 176 KB 384 KB 40 KB
So we went from 240 KB to 384 KB for the index storage.
So why does this matter you ask? SQL Server will use indexes for all kind of things, if you run a COUNT(*) it will use an index, if you do a JOIN it will use an index, it will use indexes in GROUP By queries and many more things.
Let’s look at a simple example, when you do a COUNT(*), the optimizer will pick a non clustered index if there is one since it usually has less columns than the clustered index
sql SET SHOWPLAN_TEXT ON GO
SELECT count(*) FROM Test1 GO
SET SHOWPLAN_TEXT OFF GO ```
Here is the plan
Basically it had to scan through all the index pages to get the count, if your index was now still 12 pages instead of 21, SQL Server would take less time to accomplish this.
I barely scratched the surface on indexing, it is a big topic and I recommend you start by navigating to the topic in Books On Line: http://msdn.microsoft.com/en-us/library/ms175049.aspx
You can also take a look at the following posts written about indexing right here on this site