How big are your core databases right now? Do you know how they got that way? Is that normal? These questions are impossible to answer just by looking at the database options dialog in SSMS. They are also questions I've had to try to answer in several different environments, because without logging we not only didn't know what normal growth looked like for our system, we also didn't know which tables and indexes were driving that growth.

The answer lies in a very simple query and a little logging.

Aaron Nelson (twitter|blog) is hosting T-SQL Tuesday this month with the topic of “Logging”. Check out his post for a list of other T-SQL Tuesday posts this month and congratulate him on speaking at TechEd on Tuesday morning (he's denied being the loud PowerShell fan in the keynote, btw).

The Setup

Monitoring sizes at the index and table level provides the raw material for several different statistics. Luckily this is fairly easy to do for both on-premises and SQL Azure databases.

For the purposes of these examples I'm going to log the growth data in the same database being measured. Obviously this won't work for every environment, and it adds an extra variable to our growth statistics.

First up, let’s create a table to store the raw data in:

CREATE TABLE dbo.DatabaseSizeLog(
	Id		int IDENTITY(1,1) NOT NULL PRIMARY KEY,
	LogTime		DateTime2 NOT NULL,
	DatabaseId	int NOT NULL,
	ObjectId	int NOT NULL,
	TableName	varchar(200) NOT NULL,
	IndexId		int NULL,
	IndexName	varchar(200) NULL,
	AllocatedBytes	bigint NOT NULL,	-- bigint: byte counts overflow int past ~2 GB
	UsedBytes	bigint NOT NULL,
	UnusedBytes	bigint NOT NULL,
	[RowCount]	bigint NOT NULL
);

Note that I am capturing both ids and names for the table and index. We could JOIN to get the names, but being able to run a quick one-line statement to look at key information is worth the slightly higher storage cost. I've also used an integer IDENTITY column as the required clustered index for a SQL Azure database. A better option would be clustering on LogTime, DatabaseId, and IndexId, in whichever order makes the most sense for how you intend to look at the data, as sketched below.
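
If you go that route, here's a minimal sketch of the alternative definition (I've added ObjectId to the key, since IndexId alone doesn't distinguish between indexes belonging to different tables):

-- Same columns as above, but clustered on how we query rather than on an artificial key.
CREATE TABLE dbo.DatabaseSizeLog(
	LogTime		DateTime2 NOT NULL,
	DatabaseId	int NOT NULL,
	ObjectId	int NOT NULL,
	TableName	varchar(200) NOT NULL,
	IndexId		int NULL,
	IndexName	varchar(200) NULL,
	AllocatedBytes	bigint NOT NULL,
	UsedBytes	bigint NOT NULL,
	UnusedBytes	bigint NOT NULL,
	[RowCount]	bigint NOT NULL
);

-- A non-unique clustered index still satisfies SQL Azure's clustered index requirement.
CREATE CLUSTERED INDEX CIX_DatabaseSizeLog
	ON dbo.DatabaseSizeLog (LogTime, DatabaseId, ObjectId, IndexId);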

Once we have the raw table, we need the query to capture the data:

INSERT INTO dbo.DatabaseSizeLog(LogTime, DatabaseId, ObjectId, 
				TableName, IndexId, IndexName, AllocatedBytes, 
				UsedBytes, UnusedBytes, [RowCount])
SELECT    SYSDATETIME()
	, DB_ID()
	, o.object_id AS TableId
	, o.name
	, ps.index_id AS [IndexId]
	-- index_id 0 is a heap and 1 is the clustered index; count both as the table itself
	, CASE WHEN ps.index_id > 1 THEN i.name ELSE NULL END AS IndexName
	, ps.reserved_page_count * 8192 AS Allocated
	, ps.used_page_count * 8192 AS Used
	, (ps.reserved_page_count - ps.used_page_count) * 8192 AS Unused
	, ps.row_count AS [RowCount]
FROM sys.dm_db_partition_stats ps
	INNER JOIN sys.objects o ON ps.object_id = o.object_id
	INNER JOIN sys.indexes i ON i.index_id = ps.index_id AND i.object_id = ps.object_id
WHERE o.type NOT IN ('S','IT');
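
One side effect of logging into the same database is that dbo.DatabaseSizeLog shows up in its own snapshots. If that bothers you, you could add AND o.object_id <> OBJECT_ID('dbo.DatabaseSizeLog') to the WHERE clause above, or just keep an eye on how much of the total the log itself accounts for with a quick check like this:

-- How much space is the size log itself taking up (in MB)?
SELECT SUM(AllocatedBytes) / 1048576.0 AS LogTableMB
FROM dbo.DatabaseSizeLog
WHERE ObjectId = OBJECT_ID('dbo.DatabaseSizeLog');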

This should be placed in a scheduled job in SQL Server Agent for an on-premises database, or tied into a scheduled task system (perhaps via a worker role) for a SQL Azure database.
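
For the on-premises case, here's a minimal sketch of the Agent job setup. The job, step, and schedule names are my own examples, as is dbo.LogDatabaseSizes, a hypothetical stored procedure you'd create to wrap the capture query above:

USE msdb;
GO

EXEC dbo.sp_add_job @job_name = N'Log Database Sizes';

EXEC dbo.sp_add_jobstep @job_name = N'Log Database Sizes',
	@step_name = N'Capture size snapshot',
	@subsystem = N'TSQL',
	@database_name = N'YourDatabase',	-- the database being monitored
	@command = N'EXEC dbo.LogDatabaseSizes;';

EXEC dbo.sp_add_jobschedule @job_name = N'Log Database Sizes',
	@name = N'Hourly',
	@freq_type = 4,			-- daily
	@freq_interval = 1,		-- every day
	@freq_subday_type = 8,		-- repeat in units of hours
	@freq_subday_interval = 1;	-- every hour

EXEC dbo.sp_add_jobserver @job_name = N'Log Database Sizes', @server_name = N'(local)';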

Now that we have some data and we're logging it at some regular interval, let's start getting some use out of it.

Basic Information

Once we have some data in the table, we can build some information gathering queries to learn more about our data growth rates.

What are our largest tables right now?

DECLARE @TargetTime DateTime2;	-- must match the column type, or the equality test below can miss rows
SELECT TOP 1 @TargetTime = LogTime FROM dbo.DatabaseSizeLog ORDER BY LogTime DESC;

SELECT TOP 10
	TableName, 
	"Allocated (MB)" = CAST(SUM(AllocatedBytes) as DECIMAL(18,2)) / 1048576, 
	"Used in MB" = CAST(SUM(UsedBytes) as DECIMAL(18,2)) / 1048576, 
	"Unused in MB" = CAST(SUM(UnusedBytes) as DECIMAL(18,2)) / 1048576, 
	"Row Count" = MIN([RowCount])
FROM dbo.DatabaseSizeLog
WHERE LogTime = @TargetTime
GROUP BY TableName
ORDER BY SUM(AllocatedBytes) DESC;

What percentage of our largest tables is taken up by indexes?

DECLARE @TargetTime DateTime2;	-- must match the column type, or the equality test below can miss rows
SELECT TOP 1 @TargetTime = LogTime FROM dbo.DatabaseSizeLog ORDER BY LogTime DESC;

SELECT TOP 10
	TableName, 
	"Total Allocated (MB)" = CAST(SUM(AllocatedBytes) as DECIMAL(18,2)) / 1048576, 
	"Percent For Indexes" = CAST(SUM(CASE WHEN IndexName IS NOT NULL THEN AllocatedBytes ELSE 0 END) as DECIMAL(18,2)) / CAST(SUM(AllocatedBytes) as DECIMAL(18,2)),
	"Row Count" = MIN([RowCount])
FROM dbo.DatabaseSizeLog
WHERE LogTime = @TargetTime
GROUP BY TableName
ORDER BY SUM(AllocatedBytes) DESC;

Which tables are growing the quickest?

WITH LogHistory AS (
	SELECT Id, LogTime, DatabaseId, ObjectId, 
			TableName, IndexId, IndexName, AllocatedBytes, 
			UsedBytes, UnusedBytes, [RowCount],
			ROW_NUMBER() OVER (PARTITION BY ObjectId, IndexId ORDER BY LogTime DESC)	AS PastTimeNumber
	FROM dbo.DatabaseSizeLog
)
SELECT 
		LH_NOW.TableName,
		PercentGrowth = CASE 
			WHEN SUM(LH_THEN.AllocatedBytes) = 0 AND SUM(LH_NOW.AllocatedBytes) = 0 THEN 0
			WHEN SUM(LH_THEN.AllocatedBytes) = 0 THEN 1
			ELSE SUM(LH_NOW.AllocatedBytes) / CAST(SUM(LH_THEN.AllocatedBytes) AS DECIMAL(18,2)) - 1
		END,
		BytesPerHour = (SUM(LH_NOW.AllocatedBytes) - SUM(LH_THEN.AllocatedBytes)) / (NULLIF(DATEDIFF(minute, LH_THEN.LogTime, LH_NOW.LogTime), 0) / 60.0)
FROM LogHistory LH_NOW
	INNER JOIN LogHistory LH_THEN ON LH_NOW.ObjectId = LH_THEN.ObjectId AND LH_NOW.IndexId = LH_THEN.IndexId
WHERE LH_NOW.PastTimeNumber = 1
	AND LH_THEN.PastTimeNumber = 3	-- tweak this to widen or narrow the window: 3 compares against the third-most-recent snapshot
GROUP BY LH_NOW.TableName, LH_NOW.LogTime, LH_THEN.LogTime
ORDER BY SUM(LH_NOW.AllocatedBytes) - SUM(LH_THEN.AllocatedBytes) DESC;

Monitoring

Besides being able to look at some general statistics, if we have a system that can monitor the result of a SQL query, there are a number of values we can start alerting on:

Database size

Most recent total allocated size of the database (in MB).

SELECT TOP 1 SUM(AllocatedBytes) / 1048576.0 AS SizeInMB
FROM dbo.DatabaseSizeLog
GROUP BY LogTime
ORDER BY LogTime DESC;
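
For example, if your monitoring system just wants a pass/fail signal, a sketch like this wraps the same query in a threshold test; the 40000 MB limit is an arbitrary number I picked, so substitute whatever ceiling matters for your environment:

DECLARE @LimitMB decimal(18,2) = 40000;	-- arbitrary example threshold
DECLARE @CurrentMB decimal(18,2);

-- Most recent total allocated size, same query as above.
SELECT TOP 1 @CurrentMB = SUM(AllocatedBytes) / 1048576.0
FROM dbo.DatabaseSizeLog
GROUP BY LogTime
ORDER BY LogTime DESC;

IF @CurrentMB > @LimitMB
	RAISERROR('Database size has passed the configured threshold.', 16, 1);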

Hourly database growth

Hourly growth, in MB per hour, based on the two most recent entries.

WITH LastTwoEntries AS (
	SELECT TOP 2 
			SizeInMB = SUM(AllocatedBytes) / 1048576.0,
			LogTime
	FROM dbo.DatabaseSizeLog
	GROUP BY LogTime
	ORDER BY LogTime DESC
)
-- LTE1 is the newer entry, so the older time goes first in DATEDIFF
SELECT (LTE1.SizeInMB - LTE2.SizeInMB) / (DATEDIFF(minute, LTE2.LogTime, LTE1.LogTime) / 60.0)
FROM LastTwoEntries LTE1
	INNER JOIN LastTwoEntries LTE2 ON LTE1.LogTime > LTE2.LogTime;

Wait, you deleted how many records?!?

This lists all tables that have shrunk by more than 10% since the previous log entry.

WITH Entries AS (
	SELECT  ObjectId,
			TableName,
			SizeInMB = AllocatedBytes / 1048576.0,
			LogTime,
			ROW_NUMBER() OVER (PARTITION BY ObjectId ORDER BY LogTime DESC) AS Num
	FROM dbo.DatabaseSizeLog
	WHERE IndexName IS NULL
)
SELECT E1.TableName
FROM Entries E1
	INNER JOIN Entries E2 ON E1.ObjectId = E2.ObjectId
WHERE E1.Num = 1 
	AND E2.Num = 2
	AND E2.SizeInMB <> 0
	AND E1.SizeInMB < E2.SizeInMB - (.10 * E2.SizeInMB);
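
And if you'd rather be told than have to remember to ask, here's a sketch of the same check reworked as an alert; failing a job step with RAISERROR at severity 16 is one way to trip whatever notification you already have wired up for job failures:

DECLARE @ShrunkTables int;

WITH Entries AS (
	SELECT  ObjectId,
			SizeInMB = AllocatedBytes / 1048576.0,
			ROW_NUMBER() OVER (PARTITION BY ObjectId ORDER BY LogTime DESC) AS Num
	FROM dbo.DatabaseSizeLog
	WHERE IndexName IS NULL
)
SELECT @ShrunkTables = COUNT(*)
FROM Entries E1
	INNER JOIN Entries E2 ON E1.ObjectId = E2.ObjectId
WHERE E1.Num = 1
	AND E2.Num = 2
	AND E2.SizeInMB <> 0
	AND E1.SizeInMB < E2.SizeInMB - (.10 * E2.SizeInMB);

IF @ShrunkTables > 0
	RAISERROR('%d table(s) shrank by more than 10%% since the previous size snapshot.', 16, 1, @ShrunkTables);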

And more

By now I hope you have even more ideas for ways to slice this simple data set. Logging is a powerful tool, and even a simple query can provide a great deal of information if we capture its results over time. The basic query that started this post probably didn't look like much, but logging its values over time exposed a lot of new information about our database. Having visibility into how our systems are running can be the difference between finding out our database hit a size limit after the fact and knowing it is growing at an abnormal pace long before the problem occurs.