Take a look at the books in your office. Think about how much space they take up. Now pick one up and page through it. Is there a lot of white space? Some pages may only have a couple paragraphs on them. There are a lot of words and letters repeated. If there was a way to compress the book, it would be much lighter to store and carry around, wouldn’t it?

The same principle applies to your SQL Server databases. Pages have free space on them. They also have bits of data that are repeated. A great feature of SQL Server is data compression, which will reduce the size of a database on disk, making storage and administration easier.

SQL Server 2008 (and newer versions) Enterprise Edition includes two types of data compression: row and page. The following demos were done in SQL Server 2012 Enterprise Edition in order to show the value data compression has and efficiency gains that can be had by utilizing this enterprise-level feature.’ The demos will also show key performance factors to be aware of that may, or may not, be effected by data compression.

Row Compression

Row compression reduces the amount of space rows take up on a page in several ways:

  • Metadata overhead is reduced.
  • It uses the least amount of storage possible for some data types. Some numeric data types are reduced in size, and some character data types remove padding. For a full list of which data types can be compressed, and the method used, read Row Compression Implementation.
  • NULL and 0 values take up no space on a page.

Let’s see this in action. I’ll walk you through creating and populating a table, checking the size of the table, and viewing the size of the rows with DBCC PAGE. Then, I’ll enable row compression, rebuild the table, and revisit the page to see what’s changed.

First, I’ll create a table with several data types INT, CHAR(50), DATETIME, MONEY, and DECIMAL, to show the effects of compression on each.

T-SQL
1
2
3
4
5
6
7
8
9
10
USE AdventureWorks2012; 
GO  
CREATE TABLE TestRowCompression 
( ItemID INT , 
  ItemName CHAR(50) , 
  DateAdded DATETIME , 
  UnitPrice MONEY , 
  ItemLength DECIMAL , 
  ItemWidth DECIMAL  ) 
WITH ( DATA_COMPRESSION = NONE);
USE AdventureWorks2012; 
GO  
CREATE TABLE TestRowCompression 
( ItemID INT , 
  ItemName CHAR(50) , 
  DateAdded DATETIME , 
  UnitPrice MONEY , 
  ItemLength DECIMAL , 
  ItemWidth DECIMAL  ) 
WITH ( DATA_COMPRESSION = NONE);

Then, I’ll insert data into the table. I’m using a variety of values, including 0 and NULL, to show how that is compressed.

T-SQL
1
2
3
4
5
6
INSERT INTO TestRowCompression 
VALUES ( 100000, 'Small Red Lego Brick', '2012/12/4', '0.75', 0, 0 ); 
INSERT INTO TestRowCompression 
VALUES ( 100001, 'Medium Blue Lego Brick', '2012/12/04', NULL, 1.5, .5 ); 
INSERT INTO TestRowCompression  
VALUES ( 12, 'Green Plate', '2012/12/01 08:34:56', '500.00', 12, 12 );
INSERT INTO TestRowCompression 
VALUES ( 100000, 'Small Red Lego Brick', '2012/12/4', '0.75', 0, 0 ); 
INSERT INTO TestRowCompression 
VALUES ( 100001, 'Medium Blue Lego Brick', '2012/12/04', NULL, 1.5, .5 ); 
INSERT INTO TestRowCompression  
VALUES ( 12, 'Green Plate', '2012/12/01 08:34:56', '500.00', 12, 12 );

Let’s see how the data appears when selected in a normal query.

T-SQL
1
2
3
4
5
6
7
SELECT ItemID , 
 ItemName ,  
 DateAdded , 
 UnitPrice , 
 ItemLength , 
 ItemWidth 
FROM TestRowCompression;
SELECT ItemID , 
 ItemName ,  
 DateAdded , 
 UnitPrice , 
 ItemLength , 
 ItemWidth 
FROM TestRowCompression;

What I want to know now is how much space these rows are taking up in the database, and on the page. I can use sp_spaceused to find the reserved and used disk space of the table.

T-SQL
1
sp_spaceused 'dbo.TestRowCompression';
sp_spaceused 'dbo.TestRowCompression';

Here, I can see that there are three rows, and the data is taking up 8KB of space that’s one page. How do I tell how much space each row is using? I can view the page, using DBCC PAGE, to find that information.

Note: if you’re not familiar with DBCC IND and DBCC PAGE, I recommend reading Using DBCC PAGE and DBCC IND to find out if page splits ever roll back by Paul Randal.

I’ll use DBCC IND to find the file number and page number.

T-SQL
1
DBCC IND ('AdventureWorks2012', 'TestRowCompression', 0);
DBCC IND ('AdventureWorks2012', 'TestRowCompression', 0);

I’ll run DBCC PAGE, using the data page – PageType 1, PagePID 23816 – to view the data on the page.

T-SQL
1
2
3
4
DBCC TRACEON (3604); 
GO 
DBCC PAGE ('AdventureWorks2012', 1, 23816, 3); 
GO
DBCC TRACEON (3604); 
GO 
DBCC PAGE ('AdventureWorks2012', 1, 23816, 3); 
GO

I’m going to skim past the header and go to the information for the first row. Here, in the top right, I can see the entire row uses 95 bytes. You can see the bytes used by each field by reviewing ‘Length (physical)’. The ItemID field is an INT, which takes 4 bytes for storage. The ItemName field is declared as CHAR(50), and even though it is only 20 characters in length, it’s using 50 bytes. ItemLength and ItemWidth take up 9 bytes each, even though they are values of 0. This is the way data is normally stored on pages.

Reviewing the data for the third record, we can see that also takes up 95 bytes, even though the values for ItemID and ItemName are shorter in length.

If I am looking to reduce the space used by data in my database, I could enable row compression to do so. Before you start rebuilding all your tables in a development environment, use the stored procedure sp_estimate_data_compression_savings to determine what level of compression can be achieved. This will help you determine how much space can be saved, and if it is worth making the change.

T-SQL
1
EXEC sp_estimate_data_compression_savings 'dbo', 'TestRowCompression', NULL, NULL, 'ROW';
EXEC sp_estimate_data_compression_savings 'dbo', 'TestRowCompression', NULL, NULL, 'ROW';

Compressing this table would offer very little benefit, since at this time there are only three rows. However, I decide I want to enable row compression, to manage future data growth.

Note: this will rebuild the table you are working with. Depending on the size of your table, this may take some time. Always test this in a development environment first.

T-SQL
1
2
ALTER TABLE dbo.TestRowCompression REBUILD 
WITH (DATA_COMPRESSION = ROW);
ALTER TABLE dbo.TestRowCompression REBUILD 
WITH (DATA_COMPRESSION = ROW);

I’ll run sp_spaceused again to compare the space used for the table.

T-SQL
1
sp_spaceused 'dbo.TestRowCompression';
sp_spaceused 'dbo.TestRowCompression';

The data size has remained the same, but the unused size has increased. But, what this doesn’t tell me is how much space is SQL Server using for each field in each row? What effect has row compression had on the rows? Let’s view the page again.

I will run DBCC IND again to get the page number. Then, I run DBCC PAGE to view the results.

T-SQL
1
2
3
4
5
DBCC IND ('AdventureWorks2012', 'TestRowCompression', 0);
DBCC TRACEON (3604);
GO
DBCC PAGE ('AdventureWorks2012', 1, 183656, 3);
GO
DBCC IND ('AdventureWorks2012', 'TestRowCompression', 0);
DBCC TRACEON (3604);
GO
DBCC PAGE ('AdventureWorks2012', 1, 183656, 3);
GO

I can see things have changed drastically! The overall row size is reduced to 42 bytes. ItemID is only using 3 bytes. ItemName is now using 20 bytes. DateAdded has been reduced from 8 to 7 bytes. UnitPrice has decreased from 8 to 2 bytes. ItemLength and ItemWidth are using 0 bytes.

Looking at the third row, I can see even less space is being used ‘ 36 bytes. Here, the INT field, ItemID, is reduced to 1 byte, and ItemName is using only 11 bytes.

This proves that row compression can be a powerful tool for reducing the space taken up by data on pages! Here are a few examples, ranging from Person.Person barely compressing, to dbo.bigProduct showing over 50% compression.

Page Compression

The next level of data compression is page compression. The leaf level of tables and indexes will have three operations performed on it, in order:

  • Row compression
  • Prefix compression
  • Dictionary compression

We’ve already covered row compression. How do prefix and dictionary work?

Prefix Compression

This type will look at the data stored in each column. It will find a pattern in the beginning ‘ the prefix ‘ of each value, store that in a section of the page called the compression information structure (CIS), and reference the prefix in the row, instead of the entire value.

Let’s use names as an example, since that is something you could have with many repeating values.

Header

Jenna

Jen

Jennifer

Jennie

Jenny

Jeren

Jeri

Jessica

Jess

Jes

Joseph

Jonathan

After prefix compression is applied, the prefixes are stored in the CIS and then referenced in the rows.

Header

Jen (0)

Jes (1)

Jo (2)

0na

0

0nifer

0nie

0ny

Jeren

Jeri

1sica

1s

1

2seph

2nathan

Dictionary Compression

Dictionary compression takes this one step further and replaces repeating values on the page, regardless of whether they are at the beginning of the value, in the middle, or at the end.

Header

Jen (0)

Jes (1)

Jo (2)

er (3)

ni (4)

na (5)

05

0

04f3

04e

0ny

J3en

J3i

1sica

1s

1

2seph

25than

To show the effects, let’s copy the data from TestRowCompression into a new table.

T-SQL
1
2
3
4
5
6
7
8
SELECT ItemID , 
 ItemName , 
 DateAdded , 
 UnitPrice , 
 ItemLength , 
 ItemWidth 
INTO''' TestPageCompression 
FROM''' TestRowCompression;
SELECT ItemID , 
 ItemName , 
 DateAdded , 
 UnitPrice , 
 ItemLength , 
 ItemWidth 
INTO''' TestPageCompression 
FROM''' TestRowCompression;

I’ll alter the table to use page compression and check the size.

T-SQL
1
2
3
ALTER TABLE dbo.TestPageCompression REBUILD 
WITH (DATA_COMPRESSION = PAGE);
sp_spaceused 'dbo.TestPageCompression';
ALTER TABLE dbo.TestPageCompression REBUILD 
WITH (DATA_COMPRESSION = PAGE);
sp_spaceused 'dbo.TestPageCompression';

Right now, this is no different from when I applied row compression. Because this is a small table, with few repeating values, this does not surprise me. I have another table in my database, dbo.bigTransactionHistory, which is larger and may offer better results. Let’s look.

T-SQL
1
sp_spaceused 'dbo.bigTransactionHistory';
sp_spaceused 'dbo.bigTransactionHistory';

We can see the table has approximately 1100 MB of data.

First, I will apply row compression and check the size.

T-SQL
1
2
3
ALTER TABLE dbo.bigTransactionHistory REBUILD 
WITH (DATA_COMPRESSION = ROW);
sp_spaceused 'dbo.bigTransactionHistory';
ALTER TABLE dbo.bigTransactionHistory REBUILD 
WITH (DATA_COMPRESSION = ROW);
sp_spaceused 'dbo.bigTransactionHistory';

The space used by data is now down to 725 MB.

Now I will apply page compression to see if this can be reduced further.

T-SQL
1
2
3
ALTER TABLE dbo.bigTransactionHistory REBUILD 
WITH (DATA_COMPRESSION = PAGE);
sp_spaceused 'dbo.bigTransactionHistory';
ALTER TABLE dbo.bigTransactionHistory REBUILD 
WITH (DATA_COMPRESSION = PAGE);
sp_spaceused 'dbo.bigTransactionHistory';

It has indeed reduced the space used, down to 472 MB. This is roughly 58% space savings.

Using DBCC PAGE, I can view the CIS.

T-SQL
1
2
3
4
5
DBCC IND ('AdventureWorks2012', 'bigTransactionHistory', 0);
DBCC TRACEON (3604);
GO 
DBCC PAGE ('AdventureWorks2012', 1, 380976, 3); 
GO
DBCC IND ('AdventureWorks2012', 'bigTransactionHistory', 0);
DBCC TRACEON (3604);
GO 
DBCC PAGE ('AdventureWorks2012', 1, 380976, 3); 
GO

The compression information is stored below the header and above the rows.

Page compression can also greatly reduce the space your database is using on disk. Here are a few examples of tables that have had page compression applied. In these examples, space savings range from 38% to 61%.

Question:’if I have enabled row or page compression, can I take a backup of the database, which is on Enterprise Edition, and restore it to a server that is Standard Edition? No, that can’t be done. Paul Randal demonstrates this in Does my database contain Enterprise-only features? Be aware of this, and the need to manage and be aware of your compression settings, if this could happen in your environment.

There Has To Be A Negative, Right?

Data compression sounds fantastic, and it is. But, as Robert Heinlein would say, TANSTAAFL. The price you pay for your data taking up less space on disk is higher CPU usage. When the data is written to the table, or indexes are rebuilt, additional CPU will be needed to compress it. If your system currently experiences CPU pressure, it’s not a good idea to enable compression on it.

Resources

SQL Server Technical Paper – Data Compression: Strategy, Capacity Planning and Best Practices