You have to have the concept of indexing and the pros and cons if you’re in a DBA role or taking on that role while in another position. If anything you have the understanding that when you create an index there are going to be maintenance tasks and a certain impact on your system that go along with the objects. At the least, you need to understand that fragmentation is a major concern and how indexing can affect the size of your databases.
I have one database around 100GB with around a dozen tables that go greater or around 5GB in size. These tables are used heavily in reporting but really when it comes down to it are historic and financial data that typically is hit hard with changes monthly. Now if you go and create an index on one of those tables you have a very good chance of increasing the size you use greatly. Remember the basics in only one clustered and 240+ non-clustered per table (from memory). In the highest level of explanations I can come up with on the differences would be, clustered indexes on the pages is the data ordered as you have set the index (or primary key) to be. Non-clustered are pointers to the data. So you see this is why the age old question you’ll hear in interviews is, “What is the fundamental difference between clustered and non-clustered indexes?” If you answer physical ordering then you’re off to a good start to the interview. Now this goes much deeper and we could go into heap and b-tree and how the data is stored across the pages and linked and on… This isn’t what I want to get across. My focus is to get you working better and faster with knowledge of what you need to prepare for when creating indexes. Experience and time working on SQL Server along with heavy reading into how the system works and data is stored gives you deeper insight to the rest.
So in that it is critical that in the position of supporting any database that you have queries running on that it is performing to the best it can be. You can do this with SQL Server tools without having the deep knowledge of a seasoned DBA. The tool I’m referring to is the Database Engine Tuning Advisor. This tool has saved me dozens of times when time is important and either blocking or simply performance was driving my instances down. What the advisor will do for you is simple. It recommends statistics and indexes that will improve the performance of a query you pass through it for starters. You can use the tuning advisor for much more but for now let’s focus on the basic and probably the most common use of creating indexes and statistics off queries prior to putting them in production.
Let’s try it.
Back in my DBA database I will create a table named tuning_example
CREATE TABLE [dbo].[tuning_example](
[col1] [varchar](40) NULL,
[num] [int] NULL,
[col2] [varchar](20) NULL,
[seed] [int] IDENTITY(1,1) NOT NULL,
CONSTRAINT [PK_tuning_example] PRIMARY KEY CLUSTERED
(
[seed] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
With a quick load to put some data in it
declare @loop int
set @loop = 1
while @loop <= 10000
begin
insert into tuning_example values ('data example' + cast(@loop as varchar(5)),@loop,CAST(@loop/3 as varchar(20)));
set @loop += 1;
end
You can see again I like that script to create task in SSMS. This is good actually because it shows you that when you create a primary key what this does is create your clustered index. When we do an estimated execution plan on a query such as
SELECT seed FROM tuning_example
We should see in a execution plan that this will use the clustered index.
This is why setting a primary key is always a good idea. Always a good idea when it benefits the table and system that is.
Now let’s try this without the primary key
DROP TABLE tuning_example
CREATE TABLE [dbo].[tuning_example](
[col1] [varchar](40) NULL,
[num] [int] NULL,
[col2] [varchar](20) NULL,
[seed] [int] IDENTITY(1,1) NOT NULL)
GO
declare @loop int
set @loop = 1
while @loop <= 10000
begin
insert into tuning_example values ('data example' + cast(@loop as varchar(5)),@loop,CAST(@loop/3 as varchar(20)));
set @loop += 1;
end
So we have a table with the same data now which is literally identical minus out clustered index. Check out the execution plan of the same query
Ouch! We all know table scans are bad. Do this on a 5GB table and you can go to lunch while it is running. So what do you do? This is the cool part.
Make sure for this example that the only query in the query window is the one we just ran. There will be an icon in the tool bar next to display estimated execution plan for launching the tuning advisor in 2005. In 2008 you select the query and or right click a white space in the query window and select “Analyze query in database engine tuning advisor”
one of the following
When you launch this the engine will basically fills all your options required to run off the query in the window you had up. So go ahead and click Start Analysis.
Once the engine completes you’ll see a listing of what is recommended for supporting indexes and statistics to improve the performance of the query. There will also be several report options for documenting the creations and also the performance increase estimation from the recommended objects. We get the big part with the help in the index creation
Use the copy to clipboard option, paste the statement in the query window and what your execution plan should turn into is index usage and performance increase tremendously.
Cool!!!
Ok here is the bad news. We saw that from the tuning advisor we needed an index on seed but the fun only just begins. What you need to worry about now is fragmentation. Fragmentation is easy to handle as long as you handle it. Denis has an excellent blog post on this “Finding Fragmentation Of An Index And Fixing It” /index.php/DataMgmt/DataDesign/finding-fragmentation-of-an-index-and-fi and you should also read “how to get the selectivity of an index” /index.php/DataMgmt/DataDesign/how-to-get-the-selectivity-of-an-index I rebuild indexes on work tables daily and some hourly depending on the edition of the instance and having the ONLINE option with enterprise to avoid locking issues. If you do not do this then the performance of SQL Server will degrade rapidly. Statistics also have to be brought up in this mix. The tuning advisor will commonly give you recommendations on statistics creations.
The best thing I can say is read Denny’s blog on statistics http://searchsqlserver.techtarget.com/tip/0,289483,sid87_gci1278729,00.html
This is without a doubt the best article I have read covering statistics and affects. I identified key places where performance degraded drastically over a time frame of 6 hours on work tables for sales and the process in which they exported to historic tables. This performance issue was getting to the point users did notice and it was a problem. The first thing I did was update the statistics on those tables to test performance differences in hour by hour changes to the tables. Given that test I found that the tables required me to run an update on the statistics around every 3 hours during posting times. Performance has never been better after that maintenance task was added.
To see how the tuning advisor helps with statistics, recreate your table and load the data again. Then change our query to this
SELECT col1,seed FROM tuning_example
WHERE col2 >= 30 And col2 <= 500
Run the advisor again and you should see additional statistics on col2 recommended. Really it amazes me how much SQL Server has to it making our lives easier. All we need now it a robot to hit F5 a few times.