I have been thinking about creating a tool like this for a while now. I know that there is the SQL Server 2005 Best Practices Analyzer and that in SQL Server 2008 this has become Policy Management. What I want to create is a pure T-SQL tool where you can just run a stored proc and specify what to check.
I wonder if people would be interested in such a tool; I started a thread here: SQLCop and got some feedback from various people.
Here is what I want this tool to do (a quick sketch of a couple of these checks follows the list):
identify that:
no procs start with sp_
no tables are named tbl…
no column names have spaces or other crazy characters in them
every table has a primary key
no table is clustered on a uniqueidentifier when NEWSEQUENTIALID() is not used as a default
log and data files are on separate drives
identify procs that call undocumented procs
identify procs that have exec with dynamic SQL and recommend rewriting that to use sp_executesql
identify redundant indexes
identify unused indexes
identify fragmented tables
identify stale statistics
tables that have the float datatype
columns that use the text and image datatypes
procs that have transactions but don’t check for @@error, @@trancount or xact_state()
procs that have select * instead of a column list
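To give an idea of what these checks could look like in pure T-SQL, here is a minimal sketch of two of them (procs starting with sp_ and tables without a primary key) against the standard catalog views; the queries in the actual tool may well end up looking different:

-- Sketch only: procedures whose names start with sp_
SELECT s.name AS SchemaName, p.name AS ProcName
FROM sys.procedures p
JOIN sys.schemas s ON s.schema_id = p.schema_id
WHERE p.name LIKE 'sp[_]%';

-- Sketch only: user tables without a primary key
SELECT s.name AS SchemaName, t.name AS TableName
FROM sys.tables t
JOIN sys.schemas s ON s.schema_id = t.schema_id
WHERE OBJECTPROPERTY(t.object_id, 'TableHasPrimaryKey') = 0;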
George added the following (a quick sketch of two of these follows the list):
Highlight columns where the collation does not match the default collation for the database
identify where the database default collation does not match the default collation for TempDB
procedures with cursors
no foreign keys: check for columns with ID in the name; if such a column is not a PK, it should probably be a foreign key.
using @@Identity instead of scope_identity()
Order By ordinal value instead of listing column names
using select * instead of a column list
Check for “Auto Close”, “Auto Create Statistics”, “Auto Shrink”, and “Auto Update Statistics”
triggers that assume a single row is inserted, updated, or deleted.
non-sargable where clauses.
check for functions that operate on table data.
queries involving multiple tables without ANSI style joins.
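As a rough sketch of two of George's checks, and assuming nothing beyond the standard catalog views, the collation mismatches and the database option flags could be found like this:

-- Sketch only: columns whose collation differs from the database default
SELECT t.name AS TableName, c.name AS ColumnName, c.collation_name
FROM sys.columns c
JOIN sys.tables t ON t.object_id = c.object_id
WHERE c.collation_name IS NOT NULL
  AND c.collation_name <> CONVERT(sysname, DATABASEPROPERTYEX(DB_NAME(), 'Collation'));

-- Sketch only: database option flags worth reviewing
SELECT name, is_auto_close_on, is_auto_shrink_on,
       is_auto_create_stats_on, is_auto_update_stats_on
FROM sys.databases
WHERE name = DB_NAME();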
SQLSister came up with the following
Fields that have no data in the whole table, or where every value is an empty string ("")
Fields where the len of the data stored is close to the max length of the field (possibly the column size needs to be increased), along with a way to see the data at that max length to check whether it appears to be cut off. (I once got a file from a client that clearly didn't have enough space in their last_name field: it contained many records that were one or two characters off a common last name, and almost all of those records were the same length.)
Tables where the only unique index is on a surrogate key (these may not be wrong but might point to where some unique indexes need to be created)
Fields containing delimited string data
Tables that are too wide, and tables where the total bytes per record are higher than the total bytes allowed to be stored in one record (a rough sketch of this check follows the list).
Field names that are some text followed by a number, especially if there is more than one (phone_1, phone_2, etc.)
First_name, last_name, middle_name combinations that have unique indexes or are pks.
triggers that do updates based on the whole data set (no reference to inserted or deleted)
Fields containing characters not on the ordinary keyboard. Help to find all those rotten data entries with tabs, etc. that mess up data imports and exports.
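A rough sketch of SQLSister's "too wide" table check: sum the declared maximum byte lengths per table and flag anything over the 8,060-byte in-row limit. The checks that need to look at the actual data (empty columns, values near the max length) would need dynamic SQL built per column, so they are not shown here.

-- Sketch only: tables whose declared row size exceeds the 8060-byte in-row limit
-- (max_length is -1 for (n)varchar(max)/varbinary(max)/xml, so those count as 0 here)
SELECT s.name AS SchemaName, t.name AS TableName,
       SUM(CASE WHEN c.max_length = -1 THEN 0 ELSE c.max_length END) AS DeclaredBytes
FROM sys.tables t
JOIN sys.schemas s ON s.schema_id = t.schema_id
JOIN sys.columns c ON c.object_id = t.object_id
GROUP BY s.name, t.name
HAVING SUM(CASE WHEN c.max_length = -1 THEN 0 ELSE c.max_length END) > 8060;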
Emtucifor came up with this
Look for columns using char() instead of varchar() — I know this may not be an issue in 2008 but it is in 2005.
Warn about columns called “ID” in every table instead of “TableNameID”.
Warn about columns not named the same between foreign key table and primary key table (unless another column in the FK table already has that name).
Warn about potential problems caused by overuse of WITH (NOLOCK).
Warn about constraints without explicit names, i.e. names generated by the server such as "PK__blah__07AC1A97", since scripts against these will not work from database to database (if you script out "alter table blah drop constraint PK__blah__07AC1A97" from Dev and try to run it against Prod, it will fail). A sketch of this check appears after the join examples below.
Generalizing Denis’s uniqueidentifier idea: Find tables whose insert and update patterns cause frequent page splits due to the layout of indexes.
Find cases of hash joins occurring when loop joins should be used (my most common query-performance improver). My guess is that statistics are misleading the engine into overestimating the number of rows, thus switching to a hash join. But when the leftmost input has only a few rows, say 2,000 against 150,000 in the right input, a loop join is almost always far superior to a hash join. A join hint can correct the problem, or, since the number of rows is low, another way is to put them into a temp table or table variable. Sure, those use tempdb, but it is a small amount, and hash joins can take huge tempdb space!
Find query plans that ought to be using bookmark lookups (joining nonclustered index seek to clustered index) instead of scanning clustered index and recommend ways to solve (such as using temp table or table variable, using index hint, other methods)
Warn about semi-joins using “SELECT *” instead of the better “SELECT 1” (improvement at query parse time, not expanding * into column list then later discarding it)
Suggest testing when semi-joins are explicitly or implicitly used, since switching in either direction can at times greatly improve performance (depending on execution plan)
Implicit anti-semi-join:
SELECT DISTINCT A.*
FROM A
LEFT JOIN B ON A.ID = B.ID
WHERE B.ID IS NULL
Explicit anti-semi-join:
SELECT A.*
FROM A
WHERE NOT EXISTS (SELECT 1 FROM B WHERE A.ID = B.ID)
Implicit semi-join:
SELECT DISTINCT A.Something -- no B columns in the select list; DISTINCT removes the duplicates (or no DISTINCT when the A<->B relationship is one-to-one)
FROM A
INNER JOIN B ON A.ID = B.ID
Explicit semi-join:
SELECT A.Something
FROM A
WHERE EXISTS (SELECT 1 FROM B WHERE A.ID = B.ID)
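Going back to Emtucifor's system-named constraint idea, here is a rough sketch of that check, assuming only the is_system_named flag that the constraint catalog views expose in SQL Server 2005 and up:

-- Sketch only: constraints the server named itself (PK__blah__07AC1A97 style),
-- which will have different names in every database they are created in
SELECT OBJECT_NAME(parent_object_id) AS TableName, name AS ConstraintName, type_desc AS ConstraintType
FROM sys.key_constraints WHERE is_system_named = 1
UNION ALL
SELECT OBJECT_NAME(parent_object_id), name, type_desc
FROM sys.default_constraints WHERE is_system_named = 1
UNION ALL
SELECT OBJECT_NAME(parent_object_id), name, type_desc
FROM sys.foreign_keys WHERE is_system_named = 1;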
So that is a partial list. What do you think, would this be a useful tool? Do you have any suggestions? Should this perhaps be an application, or pure T-SQL?
Leave a comment with your suggestion