Standards and best practices are like flu shots you take before you’re infected; Database best practices protect your databases from bad things. But, we all make mistakes. It could be because we’re on a time crunch, or we’re lazy (which I’m guilty of by the way), or maybe it’s part of being a developer.
Common mistakes include: tables without a primary key, column name problems, missing foreign keys, etc., This is where I love LTD’s very own SQLCop. I can quickly go on with my database development and rely on SQLCop to detect the issues. It saves time and ensures that database standards are met.
However, there are some issues explicit to data warehouses that SQLCop doesn’t look for. I list those issues below and provide scripts to detect them. I use these scripts in conjunction with SQLCop.
Detect tables in a data warehouse that aren’t prefixed with either Dim or Fact:
Tables in a warehouse are generally prefixed with Dim and Fact for dimensions and fact respectively, to easily distinguish them.
SELECT [schema_name] = s.name ,
table_name = t.name
FROM sys.tables t
INNER JOIN sys.schemas s ON s.schema_id = t.schema_id
WHERE t.name NOT LIKE 'Dim%'
AND t.name NOT LIKE 'Fact%'
AND t.TYPE = 'U';
Find tables in a data warehouse that don’t have a primary key:
Like in OLTP databases, all tables in a data warehouse also should have a primary key defined.
SELECT schema_name = SCHEMA_NAME(schema_id) ,
table_name = name
FROM sys.tables
WHERE OBJECTPROPERTY(OBJECT_ID, 'TableHasPrimaryKey') = 0
ORDER BY SCHEMA_NAME(schema_id) ,
name;
Detect dimension tables with a composite primary key:
A composite primary key on a dimension table causes degraded performance. It is best to create a single column primary key.
SELECT c.TABLE_NAME ,
COUNT(*)
FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS pk ,
INFORMATION_SCHEMA.KEY_COLUMN_USAGE c
WHERE CONSTRAINT_TYPE = 'PRIMARY KEY'
AND c.TABLE_NAME = pk.TABLE_NAME
AND c.CONSTRAINT_NAME = pk.CONSTRAINT_NAME
AND c.TABLE_NAME LIKE 'Dim%'
GROUP BY c.TABLE_NAME
HAVING COUNT(*) > 1
Detect dimension tables that don’t have Identity column as a primary key:
Usually, surrogate key is made the primary key of the dimension table. Surrogate key is an auto generated Identity value.
SELECT dim_table = t.name ,
primary_key = c.name ,
c.is_identity
FROM sys.tables t
INNER JOIN sys.key_constraints kc ON t.OBJECT_ID = kc.parent_object_id
INNER JOIN sys.indexes i ON i.OBJECT_ID = kc.parent_object_id
AND i.type_desc = 'CLUSTERED'
INNER JOIN sys.index_columns ic ON ic.OBJECT_ID = kc.parent_object_id
AND ic.index_id = 1
INNER JOIN sys.columns c ON c.OBJECT_ID = t.OBJECT_ID
AND c.column_id = ic.column_id
WHERE t.TYPE = 'U'
AND t.name LIKE 'Dim%'
AND kc.type_desc = 'PRIMARY_KEY_CONSTRAINT'
AND c.is_identity = 0
Detect primary keys that don’t follow the naming convention:
SELECT dim_table = t.name ,
primary_key = c.name
FROM sys.tables t
INNER JOIN sys.key_constraints kc ON t.OBJECT_ID = kc.parent_object_id
INNER JOIN sys.indexes i ON i.OBJECT_ID = kc.parent_object_id
AND i.type_desc = 'CLUSTERED'
INNER JOIN sys.index_columns ic ON ic.OBJECT_ID = kc.parent_object_id
AND ic.index_id = 1
INNER JOIN sys.columns c ON c.OBJECT_ID = t.OBJECT_ID
AND c.column_id = ic.column_id
WHERE t.TYPE = 'U'
AND t.name LIKE 'Dim%'
AND kc.type_desc = 'PRIMARY_KEY_CONSTRAINT'
AND c.name <> REPLACE(t.name, 'Dim', '') + 'Key'
Detect fact tables that have no foreign keys:
Without a foreign key, a fact table isn’t really a fact table.
SELECT table_name = t.name
, fk_count = COUNT(*)
FROM sys.tables t
INNER JOIN
sys.foreign_keys fk ON t.OBJECT_ID = fk.parent_object_id
WHERE t.name LIKE 'fact%'
GROUP BY t.name
HAVING COUNT(*) < 1
Detect fact tables that have foreign key(s) to another fact table:
It’s unlikely to have a fact table related to another fact table.
SELECT foreign_key = fk.name ,
child_table = t.name ,
parent_name = rt.name
FROM sys.foreign_keys fk
INNER JOIN sys.tables rt ON rt.object_id = fk.referenced_object_id
INNER JOIN sys.tables t ON t.object_id = fk.parent_object_id
WHERE rt.name LIKE 'Fact%'
Detect missing foreign key(s) in fact tables – Columns suffixed with Key, but don’t have foreign key constraint:
I stole the following query from here posted by George Mastros, and replaced ID with Key to use it for data warehouse scenario.
SELECT C.TABLE_SCHEMA,C.TABLE_NAME,C.COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS C
INNER Join INFORMATION_SCHEMA.TABLES T
ON C.TABLE_NAME = T.TABLE_NAME
And T.TABLE_TYPE = 'Base Table'
AND T.TABLE_SCHEMA = C.TABLE_SCHEMA
LEFT Join INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE U
ON C.TABLE_NAME = U.TABLE_NAME
And C.COLUMN_NAME = U.COLUMN_NAME
And U.TABLE_SCHEMA = C.TABLE_SCHEMA
WHERE U.COLUMN_NAME IS Null
And C.COLUMN_NAME Like '%Key'
ORDER BY C.TABLE_SCHEMA, C.TABLE_NAME, C.COLUMN_NAME
Results of above queries aren’t always issues. They are just rare, you’ve to look at them closely and make sure there is a reason for each choice. Also, you may use different naming conventions that make these queries void. In that case, I hope you’re able to alter them to your needs.
Follow me on Twitter! @SamuelVanga