Login or Sign Up to become a member!
LessThanDot Sit Logo

LessThanDot

Data Management

Less Than Dot is a community of passionate IT professionals and enthusiasts dedicated to sharing technical knowledge, experience, and assistance. Inside you will find reference materials, interesting technical discussions, and expert tips and commentary. Once you register for an account you will have immediate access to the forums and all past articles and commentaries.

LTD Social Sitings

Lessthandot twitter Lessthandot Linkedin Lessthandot friendfeed Lessthandot facebook Lessthandot rss

Note: Watch for social icons on posts by your favorite authors to follow their postings on these and other social sites.

Your profile

    Search

    XML Feeds

    Google Ads

    « SQL Server 2005 Service Pack 3 has been releasedSQL Friday, The Best SQL Server Links Of The Past Week Episode 2 »
    comments

    We all have perfectly normalized tables, with perfectly scrubbed data, right? I wish! Sometimes we are stuck with dirty data in legacy applications. What's worse is that we are sometimes expected to do interesting things with dirty data. In this blog, I will show you how to extract a number from a varchar column that contains letters and numbers.

    First, let’s take a look at what some of that data might look like:

    2.1 miles
    4 miles
    Approximately 6.5 miles
    3.9
    7.2miles

    Suppose we wanted to extract just the numeric portion of the text. How would we do this? My first reaction was to use PatIndex to find the first non-numeric character. Unfortunately, this won’t work because of the 3rd row (Approximately 6.5 miles). Then, I thought about CharIndex, knowing there is an optional 3rd parameter that allows us to pick the starting location for the search. Unfortunately, CharIndex doesn’t allow pattern matching, and PatIndex doesn’t accommodate starting somewhere other than the beginning.

    How do we do this? Well… it get’s tricky.

    The first thing we need to do is to find the character that is a number. For this, we can use PatIndex:

    Select PatIndex('%[0-9.-]%', Data)

    Next, Let's remove any characters from the beginning of the string, like this:

    Select SubString(Data, PatIndex('%[0-9.-]%', Data), 8000)

    This allows us to accommodate any characters that appear before the numbers. The substring result forces the numbers to the beginning of the string. Next step is to determine where the numbers end. For that, we can use PatIndex again.

    Select PatIndex('%[^0-9.-]%', SubString(Data, PatIndex('%[0-9.-]%', Data), 8000))

    PatIndex returns an integer for the first character that matches. We'll want to use this in a LEFT function, there there is a potential problem. If there's no match, PatIndex will return 0. So, we should make sure that PatIndex will always return at least one. We can trick the system by appending a character before running the PatIndex function. Like this:

    Select PatIndex('%[^0-9.-]%', SubString(Data, PatIndex('%[0-9.-]%', Data), 8000) + 'X')

    By including something we know will match the pattern, we guarantee that we get a number greater than zero from the PatIndex function.

    Next step is to get the left part of the substring, which will return just the numbers. Like this:

    Select Left(SubString(Data, PatIndex('%[0-9.-]%', Data), 8000), PatIndex('%[^0-9.-]%', SubString(Data, PatIndex('%[0-9.-]%', Data), 8000) + 'X')-1)

    Now, I will admit that this is a very ugly formula to use. However, there is a high probability that it is re-usable. Because of this, it would make a nice little function to have in our SQL Arsenal.

    1. Create Function dbo.GetNumbers(@Data VarChar(8000))
    2. Returns VarChar(8000)
    3. AS
    4. Begin   
    5.     Return Left(
    6.              SubString(@Data, PatIndex('%[0-9.-]%', @Data), 8000),
    7.              PatIndex('%[^0-9.-]%', SubString(@Data, PatIndex('%[0-9.-]%', @Data), 8000) + 'X')-1)
    8. End

    Now, we can use this function wherever we need it. As a final step, let's test it on our sample data. Before doing this, let's think about other data that could potentially cause us problems. We should test an empty string, NULL, a string without any numbers, and a string that contains multiple numbers separated by characters.
    Declare @Temp Table(Data VarChar(60))

    1. Insert Into @Temp Values('2.1 miles')
    2. Insert Into @Temp Values('4 miles')
    3. Insert Into @Temp Values('Approximately 6.5 miles')
    4. Insert Into @Temp Values('3.9')
    5. Insert Into @Temp Values('7.2miles')
    6. Insert Into @Temp Values('')
    7. Insert Into @Temp Values(NULL)
    8. Insert Into @Temp Values('No Numbers Here')
    9. Insert Into @Temp Values('approximately 2.5 miles, but less than 3')
    10.  
    11. Select Data, dbo.GetNumbers(Data)
    12. From   @Temp

    The results are:

    
    Data                                     Numbers
    ---------------------------------------- -------
    2.1 miles                                2.1
    4 miles                                  4
    Approximately 6.5 miles                  6.5
    3.9                                      3.9
    7.2miles                                 7.2
                                             
    NULL                                     NULL
    No Numbers Here                          
    approximately 2.5 miles, but less than 3 2.5
    

    I wish we could count on our data being perfect, but we really can't. The best we can do is to use the data we are given in a way that works.

    About the Author

    George has been developing software professionally for 19 years, first for the department of defense, and then for various other companies. In 1998, George started his software company, Orbit Software, specializing in School Bus Transportation software. His specialty is refining SQL Server queries to deliver optimal performance.
    Social SitingsTwitterLTD RSS Feed
    Instapaper

    33 comments

    Comment from: Ted Krueger (onpnt) [Member]
    *****
    Ted Krueger (onpnt) This is excellent and I owe you thanks again ;)
    12/12/08 @ 12:54
    Comment from: Stephen Moore [Visitor] · https://www.accordhr.com
    ****-
    Stephen Moore Simple way to do this in Oracle is

    SELECT TRANSLATE('blah123blah', '0123456789abcdefghijklmnopqrstuvwxyz','0123456789')
    FROM dual;

    The translate function looks for a character by character translation. If it only gets a match on numerics it will strip out the remaining characters. Might need to add uppercase characters to 2nd arg, not sure.
    12/12/08 @ 14:14
    Comment from: Ted Krueger (onpnt) [Member]
    Ted Krueger (onpnt) how do decimal values get handled in your example? It looks like a basic pattern match function to me and not really what the author was getting at. In your comment a basic LIKE regex pattern is the equal in T-SQL and not much to do with the actual blog
    12/13/08 @ 10:32
    Comment from: David Krueger [Visitor] Email
    *****
    David Krueger This worked great, it is going to be a real life saver.
    01/19/09 @ 09:13
    Comment from: John Titi [Visitor] · http://www.pngpeles.com
    *****
    John Titi This is an excellent function. I have included in mine as well as clients databases with aknowledgement for the author.
    02/20/09 @ 18:13
    Comment from: Rich Ellingham [Visitor]
    *****
    Rich Ellingham Massive thanks, I looked in numerous places for such an answer. Many thanks.
    03/18/09 @ 09:12
    Comment from: Alex Ullrich [Member] Email
    Alex Ullrich I needed just this today :)
    05/19/09 @ 13:31
    Comment from: Jason Adams [Visitor] Email
    *****
    Jason Adams George,

    I was looking for something like this the other day. Found it via
    tek-tips. Thanks again!
    05/29/09 @ 08:24
    Comment from: Nate [Visitor]
    *****
    Nate THANKS! Great function for deailing with very poor database design. You have saved me a lot of time. Thanks.
    07/01/09 @ 07:39
    Comment from: Maureen [Visitor] Email
    *****
    Maureen How can i extract percentages only?

    Eg: 0.35 from "Up to 0.35% pa"
    08/24/09 @ 00:46
    Comment from: Mark [Visitor]
    *****
    Mark First off, great! We have people who submit data via cellphone and I needed a way to get just the numbers out of the messages that they send. I can't rely only on this function to do all the cleaning that's required, and I've written some of my own code to bulk it out, but it certainly helped!

    I did some playing and I want to ask if this is not just as good/better than your final query above:

    SELECT SUBSTRING(@NumTrim, PATINDEX('%[0-9.-]%', @NumTrim), PATINDEX('%[^0-9.-]%', SUBSTRING(@NumTrim, PATINDEX('%[0-9.-]%', @NumTrim), 8000)+'X')-1)

    I've simply removed the LEFT() expression and used the same info to feed one substring. It works with the data I've fed it, but am I missing something?
    09/14/09 @ 05:09
    Comment from: sql_Quest [Visitor] Email · http://sqlquest.blogspot.com/
    *****
    sql_Quest The article is excellent. Liked a lot !

    Maureen:

    the code works fine for extracting 0.35 from "Up to 0.35% pa"

    09/14/09 @ 14:59
    Comment from: Vijaya Kadiyala [Visitor] · http://www.dotnetvj.com
    Vijaya Kadiyala This article is good but one question, but for the string
    "approximately 2.5 miles, but less than 3" function returned only 2.5.
    Does it mean to find only the first occurence of the numbers?

    Thanks -- Vijaya Kadiyala
    www.DotNetVJ.com

    12/20/09 @ 19:15
    Comment from: George Mastros (gmmastros) [Member]
    George Mastros (gmmastros) Vijaya,

    Yes. This is only meant to find the first occurrence of a number within a string. If you have a different requirement than this, please post a question in the forum here, and I (and others) will be glad to help.

    http://forum.lessthandot.com/viewforum.php?f=17
    12/21/09 @ 06:12
    Comment from: Alex Dybenko [Visitor] Email · http://accessblog.net
    Alex Dybenko Thanks, great article!
    02/04/10 @ 01:56
    Comment from: Alex Dybenko [Visitor] Email · http://accessblog.net
    Alex Dybenko Here a function to create series based on this one:
    t-sql Increase series number
    02/04/10 @ 02:07
    Comment from: Steve J [Visitor]
    Steve J Excellent article and very helpful!

    Thanks
    03/09/10 @ 16:39
    Comment from: niikola [Member] Email
    niikola CTE to extract all numbers from string:


    Declare @txt varchar(4000) ='ab123sdf45.6dfgh. 6fgjh7df12345.67gh9a123,456.79aa8'
    Declare @num varchar(10) = '%[0-9.,]%'
    Declare @oth varchar(10) = '%[^0-9.,]%'

    Set @txt+='X'

    ;With a as (
    Select 1 as i,
    Substring(@txt,Patindex(@num,@txt),patindex(@oth,Substring(@txt,Patindex(@num,@txt),4000))-1) as num,
    substring(@txt,Patindex(@num,@txt)+patindex(@oth,Substring(@txt,Patindex(@num,@txt),4000))-1,4000) as txt
    Union All
    Select i+1,
    Substring(txt,Patindex(@num,txt),patindex(@oth,Substring(txt,Patindex(@num,txt),4000))-1),
    substring(txt,Patindex(@num,txt)+patindex(@oth,Substring(txt,Patindex(@num,txt),4000))-1,4000)
    from a
    Where txt like '%[0-9]%'
    )
    Select i, num from a
    Where num like '%[0-9]%'


    10/28/10 @ 01:36
    Comment from: Naomi Nosonovsky [Member]
    Naomi Nosonovsky Here is another interesting approach to keep only numbers (all numbers) and remove anything else from the string - Alex Kuznetsov idea

    from this thread Extracting numbers from string
    11/15/10 @ 07:12
    Comment from: Srr [Visitor] Email
    Srr Hi,

    Thanks, great article!

    I want extract first numeric values of the string, like:

    Ex.
    Input output
    ------ ---------
    # 1 0
    6 patches 6
    - 0
    1 15 gm tube 1
    1 BOX-8 1
    1/2pak 1
    1`5 gm 1
    16 packs/7e 16
    one box(100) 0
    1-28 day pak 1
    01/16/11 @ 22:37
    Comment from: Srr [Visitor]
    Srr input
    ----------
    # 1
    6 patches
    -
    1 15 gm tube
    1 BOX-8
    1/2pak
    1`5 gm
    16 packs/7e
    one box(100)
    1-28 day pak


    output
    ---------
    0
    6
    0
    1
    1
    1
    1
    16
    0
    1
    01/17/11 @ 02:31
    Comment from: Naomi Nosonovsky [Member]
    Naomi Nosonovsky Try


    declare @t table(Col varchar(100))
    insert into @t
    select
    '# 1'
    union all select
    '6 patches'
    union all select
    '- '
    union all select
    '1 15 gm tube'
    union all select
    '1 BOX-8'
    union all select
    '1/2pak'
    union all select
    '1`5 gm'
    union all select
    '16 packs/7e'
    union all select
    'one box(100)'
    union all select
    '1-28 day pak'

    select COL, F.Pos, F2.Number
    from @t
    cross apply (select PATINDEX ('%[^0-9]%',Col) as Pos) F
    cross apply (select Case when F.Pos > 1 then cast(left(Col, Pos - 1) as int) else 0 end as [Number]) F2
    01/17/11 @ 12:57
    Comment from: George Mastros (gmmastros) [Member]
    George Mastros (gmmastros) Naomi,

    Your query fails when the string is all numbers. For example, add "union all select '128'" to the end of your sample data.

    Srr, you will probably have better luck if you try this:

    Convert(Int, Left(YourColumnName, PatIndex('%[^0-9]%', YourColumnName + ' ')-1))

    01/17/11 @ 13:06
    Comment from: Naomi Nosonovsky [Member]
    Naomi Nosonovsky Yes. This is also simpler than cross apply and checking Pos.
    01/17/11 @ 13:32
    Comment from: Srr [Visitor]
    Srr Thanks George Mastros,

    But i want like 14.5 values also.
    I want in numeric(12,2) not in int,i tried but getting error.

    Convert(numeric(12,2), Left(col, PatIndex('%[^0-9.]%', col + ' ')-1))
    from table

    Thanks in advance,
    Srr
    01/17/11 @ 21:30
    Comment from: George Mastros (gmmastros) [Member]
    George Mastros (gmmastros) Srr,

    You are getting an error because you CAN convert an empty string to an int, but not to a decimal.

    Try this:
    Coalesce(Convert(Decimal(12,2), NullIf(Left(Col, PatIndex('%[^0-9.]%', Col + ' ')-1), '')), 0)

    Instead of directly converting to decimal, we use the NULLIF function to convert empty strings to nulls. Then we convert, and finally use coalesce to convert the nulls to 0.
    01/18/11 @ 07:30
    Comment from: Srr [Visitor]
    Srr Hi George Mastros,

    It is very nice.U helped me alot,my problem solved.
    In my query i used like this:
    case charindex('.',col)
    when 1 then case isnumeric(col) when 1 then col else '0' end
    else Coalesce(Convert(numeric(12,2),NullIf(Left(ltrim(col),PatIndex('%[^0-9.]%',ltrim(col) + ' ')-1), '')),0)
    end

    Thanks,
    Srr

    01/18/11 @ 21:51
    Comment from: Srr [Visitor]
    Srr Hi,
    I have table like below,

    first_nm last_nm Gender
    Raj s F
    Rajp e M
    Kar Sh M
    De re M
    Sw ko F
    Kal la F
    Sa Sa M

    now i need to update my table each row last_nm with another record last_nm within gender(means female-female,male-male)

    example:
    first_nm last_nm Gender
    Rajp re M
    Kar e M
    De Sh M
    Sw s F
    Kal ko F
    Raj la F

    Thanks,
    Srr
    01/19/11 @ 04:37
    Comment from: Paul [Visitor] · http://methodCRM.com
    Paul Awesome code. I was looking for the most efficient way to get the next increment of invoice numbers, where there can be an alphanumeric prepender and post, where the max number is not guranteed to be the last record. This code works flawlessly. Thank you.
    03/17/11 @ 20:04
    Comment from: Anand [Visitor] Email
    Anand I want to re-write the query given above:

    Select Left(SubString(Data, PatIndex('%[0-9.-]%', Data), 8000), PatIndex('%[^0-9.-]%', SubString(Data, PatIndex('%[0-9.-]%', Data), 8000) + 'X')-1)

    but in Teradata syntax. Some part the about query does not work the way it should in Teradata. For example, '%[0-9.-]%' is not working in Teradata.
    please help
    08/21/11 @ 12:49
    Comment from: kasinathpatil [Visitor]
    kasinathpatil declare @debid bigint
    declare @address varchar(200) = null
    set @debid=(select debtor_id from account where debtor_id=34691)
    set @address = (Select Left(SubString(information, PatIndex('%[0-9.-]%', information), 8000), PatIndex('%[^0-9.-]%', SubString(information, PatIndex('%[0-9.-]%', information), 8000) + 'X')-1)
    from debtor_info where description = 'Mailing Address' and score > 0 and debtor_id=@debid)
    print @address




    I have this address--(30 JLN SAM HUP HUT 34900 PANTAIREMIS PERAK)
    i need to pik pincode like -34900 from the abow data can u please send me query ,im using abow query but it reterns only first digit of the dada...
    10/28/11 @ 04:36
    Comment from: George Mastros (gmmastros) [Member]
    George Mastros (gmmastros) @kasinathpatil,

    The code in this blog is meant to find any number. It appears as though you are looking for a 5 digit number, which does change this a bit. As such, I would encourage you to try this instead.

    -------------------------------------------------
    Declare @Information VarChar(200)

    Set @Information = '30 JLN SAM HUP HUT 34900 PANTAIREMIS PERAK'

    Select Case When PatIndex('%[0-9.-][0-9.-][0-9.-][0-9.-][0-9.-]%', @information) > 0
    Then SubString(@Information, PatIndex('%[0-9.-][0-9.-][0-9.-][0-9.-][0-9.-]%', @information), 5)
    Else '' End
    -------------------------------------------------

    If a 5 digit number is found, it will be returned. If there is no 5 digit number, this code will return an empty string.

    One word of caution, if you have a string of numbers longer than 5 digits, this code will return the first 5 digits of that number. For example, if you have a 5 (or more) digit house number, this code will return that instead of the 5 digit number.
    10/28/11 @ 06:29
    Comment from: jasmit [Visitor] · http://tutjunction.com
    jasmit very helpful
    thanks
    03/06/12 @ 03:17

    Leave a comment


    Your email address will not be revealed on this site.

    Your URL will be displayed.
    (Line breaks become <br />)
    (Name, email & website)
    (Allow users to contact you through a message form (your email will not be revealed.)