Login or Sign Up to become a member!
LessThanDot Sit Logo

LessThanDot

Data Management

Less Than Dot is a community of passionate IT professionals and enthusiasts dedicated to sharing technical knowledge, experience, and assistance. Inside you will find reference materials, interesting technical discussions, and expert tips and commentary. Once you register for an account you will have immediate access to the forums and all past articles and commentaries.

LTD Social Sitings

Lessthandot twitter Lessthandot Linkedin Lessthandot friendfeed Lessthandot facebook Lessthandot rss

Note: Watch for social icons on posts by your favorite authors to follow their postings on these and other social sites.

Your profile

    Search

    XML Feeds

    Google Ads

    « Calculating number of workdays between 2 datesTable sizes »
    comments

    SQL Server (T-SQL specifically) is not usually the best place to write a function for modifying the case of your data. String functions are generally slow and often a bit cumbersome to implement. That being said, it's not uncommon to have data in your tables that needs to be "cleaned up a bit". In situations like this, it is acceptable to write and use a function like this.

    With SQL, it is easy to convert strings to upper case or lower case, but what about making it mixed case/title case. This functionality is useful in many different situations. There are many examples on the internet that show how this can be done. Some work better than others, and perform better than others. The simplest versions of this will loop through a string, character by character, and capitalize anything following a space. Unfortunately, this is not always adequate.

    I have developed a method that actively seeks out any character that is not a letter. Under certain circumstances, this is more appropriate because it satisfies more situations. This method also reduces the number of loops that need to be executed by using the PatIndex function and looking for non-alpha characters followed by lower case alpha characters.

    General theory:

    Like I stated earlier, this code actively seeks out data from within the string that needs to be changed. The first thing we do is convert the string to lower case except for the first character, which is converted to upper case. For example, suppose you have this: THIS IS-ALL "Caps"

    At the beginning of the function, we convert to lower case except for the first character, so it becomes:

    This is-all "caps"

    Next, we actively seek out positions where one character is non-alphabetic followed by a lower case alphabetic character. In this case, we would return character position 5 (space followed by lower case i). We then use the stuff function to convert this to upper case. This would result in:

    This Is-all "caps"

    We then repeat this process until there are no more occurrences of non-alpha followed by lower case alpha characters.

    This code uses several functions, some that you may not be familiar with. Upper converts characters to upper case. Lower converts characters to lower case. SubString allows you to retrieve data from the middle of string. The less common functions here are PatIndex and Stuff.

    PatIndex searches through a string and returns the position where a search pattern is found. In our case, we want to find a "non-alpha character followed by a lower case alpha character". By default, searches are not case sensitive. This is actually controlled by the collation of your database. You can get case sensitive searches by using a binary collation. For this function, I use:

    PatIndex('%[^a-zA-Z][a-z]%', @Data COLLATE Latin1_General_Bin)

    The other function I use here is stuff. Stuff can be used to insert text in to the middle of a string. It can also be used similar to a Replace function. With replace, you essentially replace all occurrences of one string with another. Stuff is different in that you can replace data based on its position within a string without regard to what is at that position. For example:

    Select Stuff('lower case', 7, 1, 'C')

    Notice how the 7th character (the lower case c) is replaced with an upper case C.

    1. Create Function dbo.Proper(@Data VarChar(8000))
    2. Returns VarChar(8000)
    3. As
    4. Begin
    5.   Declare @Position Int
    6.  
    7.   Select @Data = Stuff(Lower(@Data), 1, 1, Upper(Left(@Data, 1))),
    8.          @Position = PatIndex('%[^a-zA-Z][a-z]%', @Data COLLATE Latin1_General_Bin)
    9.  
    10.   While @Position > 0
    11.     Select @Data = Stuff(@Data, @Position, 2, Upper(SubString(@Data, @Position, 2))),
    12.            @Position = PatIndex('%[^a-zA-Z][a-z]%', @Data COLLATE Latin1_General_Bin)
    13.  
    14.   Return @Data
    15. End

    In my opinion, there are several things that make this function better. It will correctly capitalize all words and it minimizes the number of loops. In fact, it will loop just once for each word that needs to be capitalized.

    About the Author

    George has been developing software professionally for 19 years, first for the department of defense, and then for various other companies. In 1998, George started his software company, Orbit Software, specializing in School Bus Transportation software. His specialty is refining SQL Server queries to deliver optimal performance.
    Social SitingsTwitterLTD RSS Feed
    InstapaperVote on HN

    21 comments

    Comment from: remou [Member] Email
    remou In British English, there are exceptions, 'of' and 'and', for example, are not capitalized, and names get complicated (MacHenry, Machinery).
    02/24/10 @ 08:53
    Comment from: Brad Schulz [Visitor] Email
    Brad Schulz Great stuff, George.

    The one thing that I might add to your PATINDEX is the checking for an apostrophe... perhaps.

    For example, take the word "can't"... we wouldn't want it to come out as "Can'T". However, if this function is truly used for properizing names, then we would certainly want it to convert "o'bryan" to "O'Bryan"... but then again we don't want to end up with "Joe'S Fish Sauce Shop" either.

    That darn apostrophe is a pain, isn't it?

    --Brad
    02/24/10 @ 10:13
    Comment from: George Mastros (gmmastros) [Member]
    George Mastros (gmmastros) Thanks Brad. I hadn't thought of that. We could (potentially) look for the apostrophe as the 2nd character in a string and capitalize the 3rd character (assuming the 3rd character is an alpha). This would add some complication, but should be possible.

    Remou's situation is a bit more problematic. We would almost have to have a hard coded list of names, which I really don't like.
    02/24/10 @ 11:25
    Comment from: SQLDenis [Member] Email
    SQLDenis Nice one George, I can see this being beneficial to a lot of people
    02/24/10 @ 12:51
    Comment from: jonmcrawford [Visitor] · http://www.linkedin.com/in/jonmcrawford
    jonmcrawford Or, if you have a Tally (Numbers) table handy, you can get rid of the loop:
    http://www.sqlservercentral.com/Forums/Topic530630-8-2.aspx
    02/25/10 @ 08:05
    Comment from: David Dye [Visitor]
    David Dye Awesome work!
    02/26/10 @ 05:34
    Comment from: niikola [Member] Email
    niikola As we had discussion about most efficient fnSplitString function, what about idea of using that one, capitalize every row and concatenate again? I'm curious how efficient it would be. Another option is to use the same algorithm as in fnSplitString to decide where every word starts. I assume that after every punctuation mark there will be space, and at the moment I can't find any exception apart from open brackets - but that does not mean there are no more exceptions.
    02/28/10 @ 06:32
    Comment from: Naomi Nosonovsky [Member]
    Naomi Nosonovsky I'm guessing CLR function will beat this algorithm.
    07/07/10 @ 23:11
    Comment from: Jeff Moden [Visitor]
    Jeff Moden Naomi is correct... SQLCLR will beat this algorithm... unless you have a DBA or Company that won't allow it. ;-)

    I'm not sure why this article got such low marks. You just have to appreciate the technical prowess behind using COLLATE to make this problem so much easier and performant than other folks attempts at the same thing. I haven't tested it, yet, but I believe it'll probably beat even a Tally Table solution just because it's such a bloody tight and effecient loop.

    Nicely done, George. Like I said... not sure why others didn't appreciate this function just for its shear genius but I gave this a +5. Thanks for sharing it.
    09/25/10 @ 19:42
    Comment from: Jeff Moden [Visitor]
    Jeff Moden Heh... did I mention I hated moderated forums especially ones that won't let you edit your own bloody post until after it's met approval?
    09/25/10 @ 21:13
    Comment from: Jeff Moden [Visitor]
    Jeff Moden Verified... it beats the Tally Table and smokes at least 1 "expert" idea of how to do this same thing in XML. Sometimes memory-only RBAR isn't RBAR at all ...
    09/26/10 @ 18:33
    Comment from: George Mastros (gmmastros) [Member]
    George Mastros (gmmastros) Jeff, thanks for performance testing this.

    Please note that I did not make any claims that this method would be faster than any other. I merely mention that this method reduces the number of loops. Of course, it's implied that reducing the loops will shorten the time, but I make no claims about that either.

    I'm glad you liked this blog post and I appreciate your comments.
    09/27/10 @ 06:11
    Comment from: Jeff Moden [Visitor]
    Jeff Moden Sorry for the late reply, George.

    I know you didn't make any claims as to performance. I did and mostly because While Loops are typical low performers. I just wanted folks to know that your good code didn't fit in that category especially since this is a very commonly requested function.

    I also suspect that a lot of people rated this article as "average" because they didn't actually test the performance of the code.
    12/01/10 @ 20:13
    Comment from: Carrie [Visitor] · http://sarahcarter69.blogspot.com/
    Carrie I study IT at the high school and I just had my first SQL course. This was like a tutorial one, and what I see here, scares me a little. But I will print it and I'm sure that in the future will be very useful. When I wil start to.."speak the language" . :)
    12/02/10 @ 07:46
    Comment from: George Mastros (gmmastros) [Member]
    George Mastros (gmmastros) @Carrie,

    As a beginner, you should realize that the functionality presented in this blog is not 'normal'. From a database perspective, this functionality should never be necessary because databases are, at a fundamental level, meant to store and retrieve data.

    Unfortunately, database programmers are often required to clean up data, which is where this type of functionality becomes useful. From a database purist perspective, the database should not be modifying your data, it should simply be storing it.

    I wish you well in your studies. My best advice to you is to realize that your learning will never be done. I've been using Microsoft SQL Server for 10 years and I still find new things to learn.
    12/02/10 @ 08:11
    Comment from: pratyaharam [Visitor]
    pratyaharam I tried many methods and attempted many different approaches and I could not beat George's. So Hats off to you George :-)
    12/03/10 @ 11:03
    Comment from: Carrie [Visitor] · http://sarahcarter69.blogspot.com/
    Carrie Thank you very much George for your support and for encouraging me. I'm sure that I'll always have something new to learn. It's a little harder at the beginning but it's so interesting and useful! It worths the effort. So I'll try to do my best! Thank you again!
    01/19/11 @ 03:08
    Comment from: Viviane Rosa [Visitor]
    Viviane Rosa Thank you very much, for my case i use a data warehouse so performance is not a issue, i just wanted something simple and i've got it. About Joe's or O'Bryan, it is minimum, it is really good to transform capital texts which are hard to read.

    How they say here in New Zealand, Cher.
    01/31/11 @ 13:33
    Comment from: Sandra [Visitor] · http://www.hornypharaoh.com
    Sandra You really are an SQL guru! Thank you so much for all this. Your tutorial helped me a lot!
    03/16/11 @ 09:19
    Comment from: Naomi Nosonovsky [Member]
    Naomi Nosonovsky I wish Denali has Proper function because using scalar function will slow down the performance.
    10/07/11 @ 13:30
    Comment from: Naomi Nosonovsky [Member]
    Naomi Nosonovsky Wondering if you ever adjusted this function to handle cases with O' and MacDonald /McNut ?
    12/22/11 @ 14:21

    Leave a comment


    Your email address will not be revealed on this site.

    To mislead the spambots.

    Your URL will be displayed.
    (Line breaks become <br />)
    (Name, email & website)
    (Allow users to contact you through a message form (your email will not be revealed.)