Login or Sign Up to become a member!
LessThanDot Sit Logo

LessThanDot

Data Management

Less Than Dot is a community of passionate IT professionals and enthusiasts dedicated to sharing technical knowledge, experience, and assistance. Inside you will find reference materials, interesting technical discussions, and expert tips and commentary. Once you register for an account you will have immediate access to the forums and all past articles and commentaries.

LTD Social Sitings

Lessthandot twitter Lessthandot Linkedin Lessthandot friendfeed Lessthandot facebook Lessthandot rss

Note: Watch for social icons on posts by your favorite authors to follow their postings on these and other social sites.

Your profile

    Search

    XML Feeds

    Google Ads

    « A couple of ways of getting the top 2 distinct values from a set in SQL ServerHow to copy data/append data into files from within T-SQL »
    comments

    Not all systems can correctly accommodate Unicode data. Therefore, it's not uncommon to receive data that is encoded in such a way that it can be stored in an ASCII format. Of course, SQL Server can store Unicode data easily if you use the nvarchar data type. Converting to and from Unicode data can be time consuming. When you receive data that is Unicode encoded, I suggest you decode it and store it in an nvarchar column. This will result in better performance.

    Usually, when Unicode is encoded in an ASCII format, the Unicode value is somehow embedded within it. For example, the Greek letter Omega Ω, has a Unicode value of 937. It may be encoded like this: Ω

    Usually, non Unicode data will not be encoded. So you may see a string like this: "Ωmega"

    Converting from an encoded string to a real Unicode string is more difficult than it may seem. There are 65,535 possible Unicode characters (the first 255 match ASCII characters). The example I showed has 3 digit Unicode values, but we should also be able to accommodate 4 and 5 digit Unicode characters too.

    The basis for the function below is the CharIndex SQL Server functions. CharIndex has an optional 3rd parameter that allows you to specify where (within the string) to start searching. What we do is first find where the encoded Unicode data starts, and then where it ends. Once we know this, we do a simple replace. Since there can be multiple encoded Unicode characters, we put the whole thing in to a while loop.

    1. Create Function DecodeUnicodeData(@Data nVarChar(4000), @Prefix VarChar(100), @Suffix VarChar(100))
    2. Returns nvarchar(4000)
    3. As
    4. Begin
    5.   Declare @Start Int
    6.   Declare @End Int
    7.  
    8.   While CharIndex(@Prefix, @Data) > 0
    9.     Begin
    10.       Set @Start = CharIndex(@Prefix, @Data)
    11.       Set @End = CharIndex(@Suffix, @Data, @Start)
    12.  
    13.       Set @Data = Replace(@Data, SubString(@Data, @Start, @End -@Start + Len(@Suffix)),NCHAR(SubString(@Data, @Start+ Len(@Prefix),@End -@Start - Len(@Prefix))))
    14.     End
    15.  
    16.   Return @Data
    17.  
    18. End

    You can call it like this:

    1. Select dbo.DecodeUnicodeData('Ωmega', '&#', ';')

    This blog wouldn't be complete if I didn't also give you a function for encoding data, too. What I mean is, suppose you need to supply data in this format. How would you do it?

    1. Create Function EncodeUnicodeData(@Data NVarChar(4000), @Prefix VarChar(20), @Suffix VarChar(20))
    2. Returns VarChar(8000)
    3. As
    4. Begin
    5.   Declare @i Int
    6.   Declare @Output VarChar(8000)
    7.  
    8.   Set @i = 1
    9.   Set @Output = ''
    10.  
    11.   While @i <= Len(@Data)   
    12.     Begin
    13.       If Unicode(SubString(@Data, @i, 1)) > 255
    14.         Set @Output = @Output + @Prefix + Convert(VarChar(5),Unicode(SubString(@Data, @i, 1))) + @Suffix
    15.       Else
    16.         Set @Output = @Output + SubString(@Data, @i, 1)
    17.  
    18.       Set @i = @i + 1
    19.     End
    20.  
    21.   Return @Output
    22. End

    You can call it like this:

    1. Select dbo.EncodeUnicodeData(N'&#937;mega', '&#', ';')

    About the Author

    George has been developing software professionally for 19 years, first for the department of defense, and then for various other companies. In 1998, George started his software company, Orbit Software, specializing in School Bus Transportation software. His specialty is refining SQL Server queries to deliver optimal performance.
    Social SitingsTwitterLTD RSS Feed
    unicode
    Instapaper

    3 comments

    Comment from: SQLDenis [Member] Email
    *****
    SQLDenis Nice stuff George, now I can encode and decode that awesome Wingdings document I wrote :-)

    just kidding, I do have big5 and gb Chinese characters

    Here is an example 道琼斯中国88指数



    07/08/09 @ 07:37
    Comment from: Shital Desai [Visitor]
    Shital Desai Hi,

    I want to store phone symbol
    http://www.fileformat.info/info/unicode/char/2706/index.htm

    How to store this phone symbol using SQL server 2005 server side coding (Storedp Procedure), any suggesiton?

    I would really appreciate your help.

    Regards

    Shital
    12/23/09 @ 06:34
    Comment from: SQLDenis [Member] Email
    SQLDenis Shital, you will see ? or a square in SSMS/Em but if you paste it into word it will be properly encoded...you need to set the encoding in your web page to properly display it
    12/23/09 @ 08:14

    Leave a comment


    Your email address will not be revealed on this site.

    Your URL will be displayed.
    (Line breaks become <br />)
    (Name, email & website)
    (Allow users to contact you through a message form (your email will not be revealed.)