I have been thinking about creating a tool like this for a while now. I know that there is the SQL Server 2005 Best Practices Analyzer and that in SQL server 2008 this has become Policy Management. What I want to create is a pure T-SQL tool where you can just run a stored proc and specify what to check.

I wonder if people would be interested in such a tool, I started a therad here: SQLCop and got some feedback from various people.

Here is what I want this tool to do

_identify that:

no procs start with sp_

no tables are named tbl…..

no column names have spaces or other crazy characters in them

every table has a primary key

no table is clustered on a uniqueidentifier when NEWSEQUENTIALID()is not used as a default

log and datafiles are on separate drives

identify procs that call undocumented proc

identify procs that have exec with dynamic sql and recommend to rewrite that to use sp_executesql

identify redundant indexes

identify unused indexes

indentify fragmented tables

identify stale statistics

tables have the float datatype

text and image datatypes are used

procs that have transactions but don’t check for @@error, @@trancount or xact_state()

procs that have select * instead of column list

George added the following

Highlight columns where the collation does not match the default collation for the database

identify where the database default collation does not match the default collation for TempDB

procedures with cursors

no foreign keys. check for columns with ID in the name and if it’s not a PK, then it should be a foreign key.

using @@Identity instead of scope_identity()

Order By ordinal value instead of listing column names

using select * instead of a column list

Check for “Auto Close”, “Auto Create Statistics”, “Auto Shrink”, and “Auto Update Statistics”

triggers that assume a single row is inserted, updated, or deleted.

non-sargable where clauses.

check for functions that operate on table data.

queries involving multiple tables without ANSI style joins.

SQLSister came up with the following

Fields that have no data in the whole table or all fields are “”

Fields where the len of the data stored is close to the max length of the field (possible that the data size may need to be increased), also a way to see the data at the max length to see if data appears to be cut off (Hey I got a file from a client one time that clearly didn’t have enough space intheir last_name field as it contained many records that were one or two characters off a common last_name and almost all the records were the same length.)

Tables where the only unique index is on a surrogate key (these may not be wrong but might point to where some unique indexes need to be created)

Fields containing string delimited data

Tables that are too wide and tables where the total bytes per record are higher that the total bytes allowed to be sotred in one record.

Field names that are some text followed by a number, especially if there is more than 1 (phone_1, phone 2, etc)

First_name, last_name, middle_name combinations that have unique indexes or are pks.

triggers that do updates based on the whoe data set (no refernce to inserted or deleted)

Fields containing characters not on the ordinary keyboard. Help to find all those rotten data entries with tabs, etc that mess up data imports and exports.

Emtucifor came up with this

Look for columns using char() instead of varchar() — I know this may not be an issue in 2008 but it is in 2005.

Warn about columns called “ID” in every table instead of “TableNameID”.

Warn about columns not named the same between foreign key table and primary key table (unless another column in the FK table already has that name).

Warn about potential problems cause by overuse of WITH (NOLOCK).

Warn about constraints without explicit names, generated by the server such as “PK__blah__07AC1A97” since scripts against these will not work from database to database (what if you script out “alter table blah drop constraint PK__blah__07AC1A97” from Dev and try to run it against Prod it will fail).

Generalizing Denis’s uniqueidentifier idea: Find tables whose insert and update patterns cause frequent page splits due to the layout of indexes.

Find cases of hash joins occurring when loop joins should be used (my most common query-performance improver). My guess is that statistics are misleading the engine to overestimate the number of rows, thus switching to a hash join. But when the leftmost input has only a few rows, e.g. 2000 when the right input has 150000, then a loop join is almost always far superior to a hash join. A join hint can correct the problem, or another way since the number of rows is low is to put them into a temp table or table variable. Sure, those use tempdb, but it is a small amount, and hash joins can take huge temp table space!

Find query plans that ought to be using bookmark lookups (joining nonclustered index seek to clustered index) instead of scanning clustered index and recommend ways to solve (such as using temp table or table variable, using index hint, other methods)

Warn about semi-joins using “SELECT *” instead of the better “SELECT 1” (improvement at query parse time, not expanding * into column list then later discarding it)

Suggest testing when semi-joins are explicitly or implicitly used, since switching in either direction can at times greatly improve performance (depending on execution plan)

Implicit anti-semi-join:

SELECT DISTINCT

FROM A

LEFT JOIN B ON A.ID = B.ID

WHERE B.ID IS NULL

Explicit anti-semi-join:

SELECT

FROM A

WHERE NOT EXISTS (SELECT 1 FROM B WHERE A.ID = B.ID)

Implicit semi-join:

SELECT DISTINCT A.Something — no B columns in select list, DISTINCT removes duplicates or not DISTINCT keyword but A<->B relationship is one-to-one

FROM A

INNER JOIN B ON A.ID = B.ID

Explicit semi-join

SELECT

FROM A

WHERE EXISTS (SELECT 1 FROM B WHERE A.ID = B.ID)

So that is a partial list, What do you think, would this be a useful tool? Do you have any suggestions? Should this be an application perhaps or pure T-SQL.

Leave a comment with your suggestion