Login or Sign Up to become a member!
LessThanDot Sit Logo

LessThanDot

Data Management

Less Than Dot is a community of passionate IT professionals and enthusiasts dedicated to sharing technical knowledge, experience, and assistance. Inside you will find reference materials, interesting technical discussions, and expert tips and commentary. Once you register for an account you will have immediate access to the forums and all past articles and commentaries.

LTD Social Sitings

Lessthandot twitter Lessthandot Linkedin Lessthandot friendfeed Lessthandot facebook Lessthandot rss

Note: Watch for social icons on posts by your favorite authors to follow their postings on these and other social sites.

Your profile

    Search

    XML Feeds

    Google Ads

    « My SQL Server Days slides and demosSQL Advent 2012 Day 1: Sizing database files »
    comments

    This is day two of the SQL Advent 2012 series of blog posts. Today we are going to take a look at how data types can have an impact in queries and also the size of your database.

    Char vs NChar

    SQL Server has two data types to store character data[1], both of them come in fixed and variable length sizes. The char and varchar data type uses one byte of store to store one character, the nchar and nvarchar data type uses two bytes of store to store one character. The nchar and nvarchar data types are used to store unicode of data

    Let's think about that for a second, what we are saying is that the char and varchar data type can store twice the number of characters in the same amount of store as the nchar and nvarchar data type. Why does this matter, space is cheap right? True, space is getting cheaper but we are also storing more and more data every year.

    Now think about what happens you have everything stored as unicode data

    • What happens to your backup and restore process, will it be faster or slower, will the files be bigger if not compressed?
    • What about when transferring the results to and from your database server, are the packets able to store the same number of characters.
    • What about the amount of data on a page, what does this do to indexes and index lookups, how does it affect index maintenance?

    If you don't need it, then don't use unicode data.
    Some examples of what I have seen stored in nchar and nvarchar when realy you shouldn't:

    Zip Code for US addresses
    US addresses
    Social Security Numbers (which were stored in plain text none the less)
    Integer data (enforced by constraints or the app layer to make sure these were only digits)

    Let's take a quick look by running some T-SQL

    First create these two tables

    1. CREATE TABLE TestChar (SomeCol char(10))
    2. GO
    3.  
    4. CREATE TABLE TestNChar (SomeCol nchar(10))
    5. GO

    Now populate each with some data

    1. INSERT TestChar
    2. SELECT TOP 1000000 '1234567890'
    3. FROM sys.sysobjects c1
    4. CROSS JOIN sys.sysobjects c2
    5. CROSS JOIN sys.sysobjects c3
    6. CROSS JOIN sys.sysobjects c4
    7. GO
    8.  
    9. INSERT TestNChar
    10. SELECT TOP 1000000 '1234567890'
    11. FROM sys.sysobjects c1
    12. CROSS JOIN sys.sysobjects c2
    13. CROSS JOIN sys.sysobjects c3
    14. CROSS JOIN sys.sysobjects c4
    15. GO

    Let's see how much space is used by both tables

    1. EXEC sp_spaceused 'TestChar'
    2.  
    3. EXEC sp_spaceused 'TestNChar'

    18,824 KB
    28,744 KB

    If you looked at the reserved column, you will see that the nchar data is using 10 MB more than the char data

    Implicit conversions

    Besides the storage increase there is also a problem when querying for data that looks like varchar but is stored as unicode. Run the code below.

    1. SET SHOWPLAN_TEXT ON
    2. GO
    3. DECLARE @v varchar(10) = '0123456789'
    4.  
    5. SELECT * FROM TestChar WHERE SomeCol LIKE  @v +'%'
    6. GO
    7.  
    8. SET SHOWPLAN_TEXT OFF
    9. GO

    Here is the plan for that query

    |--Table Scan(OBJECT:([tempdb].[dbo].[TestChar]),
    WHERE:([tempdb].[dbo].[TestChar].[SomeCol] like [@v]+'%'))

    If we look at the plan we can see that this looks pretty good
    Usually people will sometimes change the datatype of a column but will not change any code that access this column. Let's now change the data type of the column to nchar

    1. ALTER TABLE TestChar ALTER COLUMN SomeCol nchar(10)
    2. GO

    Run the query that gives you the plan again

    1. SET SHOWPLAN_TEXT ON
    2. GO
    3. DECLARE @v varchar(10) = '0123456789'
    4.  
    5. SELECT * FROM TestChar WHERE SomeCol LIKE  @v +'%'
    6. GO
    7.  
    8. SET SHOWPLAN_TEXT OFF
    9. GO

    Here is the plan

    |--Table Scan(OBJECT:([tempdb].[dbo].[TestChar]),
    WHERE:([tempdb].[dbo].[TestChar].[SomeCol] like CONVERT_IMPLICIT(nvarchar(11),[@v]+'%',0)))

    As you can see, there is a conversion going on right now.

    In order to get rid of the conversion, use the correct data types

    1. SET SHOWPLAN_TEXT ON
    2. GO
    3. DECLARE @v nvarchar(10) = '0123456789'
    4.  
    5. SELECT * FROM TestChar WHERE SomeCol LIKE  @v +'%'
    6. GO
    7.  
    8. SET SHOWPLAN_TEXT OFF
    9. GO


    Using larger datatypes when it is not needed

    I see this problem mostly with the integer data types. Below is a list of the integer data types together with their storage size and range

    tinyint
    Storage size is 1 byte. Integer data from 0 through 255.

    smallint
    Storage size is 2 bytes. Integer data from -2^15 (-32,768) through 2^15 - 1 (32,767).

    int
    Storage size is 4 bytes. Integer data from -2^31 (-2,147,483,648) through 2^31 - 1 (2,147,483,647).

    bigint
    Storage size is 8 bytes. Integer data from -2^63 (-9,223,372,036,854,775,808) through 2^63-1 (9,223,372,036,854,775,807).

    Now imagine facebook with a billion users decided to use bigint as CountryID in their Country table, this key is then uses as a foreign key in the user demographics table. This is wasteful,either use a smallint since we won't go through 32 thousand countries in the forseeable feature or use the 2 or 3 character ISO code. The problem is even worse if you have a compound 6 column key and it is used as a foreign key in tons of other tables...that was real fun to clean up....use a surrogate 1 column key in that case...but be sure to test....normalize till it hurts then denormalize till it works....I will cover normalization in another post...just wanted to mention it




    That is all for day two of the SQL Advent 2012 series, come back tomorrow for the next one, you can also check out all the posts from last year here: SQL Advent 2011 Recap

    [1] I know there is text and ntext but hose are deprecated

    About the Author

    User bio imageDenis has been working with SQL Server since version 6.5. Although he worked as an ASP/JSP/ColdFusion developer before the dot com bust, he has been working exclusively as a database developer/architect since 2002. In addition to English, Denis is also fluent in Croatian and Dutch, but he can curse in many other languages and dialects (just ask the SQL optimizer) He lives in Princeton, NJ with his wife and three kids.
    Social SitingsTwitterFacebookLinkedInHomePageFlickrLTD RSS Feed
    1927 views
    InstapaperVote on HN

    No feedback yet

    Leave a comment


    Your email address will not be revealed on this site.

    To mislead the spambots.

    Your URL will be displayed.
    (Line breaks become <br />)
    (Name, email & website)
    (Allow users to contact you through a message form (your email will not be revealed.)