This is day twenty-one of the SQL Advent 2012 series of blog posts. Today we are going to look at why it matters how you do things when working with a VLDB
VLDB stands for Very Large Database, Not too long ago the definition of VLDB was a database that occupies more than 1 terabyte or contains several billion rows. This of course will change over time, there are quite a few companies with Petabyte size databases. Servers with many CPUs and lots of RAM are required when your databases are big
What is the big deal with VLDB anyway, it is just bigger right?
The problem with a VLDB is that you have to change your mindset and you have to change your ways how you do certain things. Think of it like driving 20 miles per hours compared to driving 160 miles per hour, when you drive very fast you can’t get away with doing stupid stuff on the road, you will crash. The same is true when working with big databases. You can’t just delete 100 million rows, you might fill up the log file, you have to do it in batches if you can’t use a truncate statement.
While you can get away with having just one drive when dealing with smaller database, this doesn’t hold true for Very Large Databases. With Very Large Databases, ideally you want separate drives for tempdb, log file and data files. You can also put the non clustered indexes on a different spindle, separate from the heaps and clustered indexes. Also make sure that you size your database files correctly to improve performance.
If you have 64 GB of RAM and your database is 50 GB, it is very likely that the whole database will be in RAM at some point. When your database is 2 TB and you have only 512 GB of RAM, you cannot even have a quarter of the DB in RAM. This is where you need to have fast hard drives. A fast SAN or some Solid State Drives are worth looking into.
Partitioning can help with maintenance and returning data faster, take a look at my partitioning post for more info
Make the indexes narrow, you want your lookups to be as efficient as possible.
If using partitions you can now rebuild just one partition of the index, this will make index maintenance easier and faster.
When you have a small database, you can delete all the rows from a table without a problem generally. Every now and then you will have someone do this, they will of course do this just one because after that you will have THE TALK with them about this
Instead of doing that, use truncate or do deletes in batches of 50000 for example
Select * from HugeTable
Every now and then you will have someone execute something like the following
SELECT * FROM HugeTable ORDER By SomeColumn
When doing something like that SQL Server will create a worktable in tempdb, if the table is big and your tempdb is placed on a drive that doesn’t have a lot of space, you will run out of space, take a look at Dealing with the could not allocate new page for database ‘TEMPDB’. There are no more pages available in filegroup DEFAULT error message how to resolve this
Compression is great, I use it, it makes the backups smaller, it makes the restore faster. I use database compression as well as data compression, when using data compression, SQL Server will be able to store more data per page and thus you will be able to have more data in RAM
When coding against Very Large Databases, you need to test with a QA or testbox that has about the same data, you will get into trouble if you don’t. Take a look at Your testbed has to have the same volume of data as on production in order to simulate normal usage to see what can happen.
Ah yes, how to bring the database to its knees, have some n00bs write some queries against your database. While you can get away with writing non-SARGable queries, queries where the index is not used, you will suffer immensely if you do this on Very Large Databases
I only touched upon a couple of key points, just keep in mind that if you do the thing I mentioned here even with smaller databases, you won’t suffer when your database starts to grow. And no, while premature optimization might be the root of all evil, I would call this best practices instead