Login or Sign Up to become a member!
LessThanDot Sit Logo

LessThanDot

Data Management

Less Than Dot is a community of passionate IT professionals and enthusiasts dedicated to sharing technical knowledge, experience, and assistance. Inside you will find reference materials, interesting technical discussions, and expert tips and commentary. Once you register for an account you will have immediate access to the forums and all past articles and commentaries.

LTD Social Sitings

Lessthandot twitter Lessthandot Linkedin Lessthandot friendfeed Lessthandot facebook Lessthandot rss

Note: Watch for social icons on posts by your favorite authors to follow their postings on these and other social sites.

Your profile

    Search

    XML Feeds

    Google Ads

    « Who shrunk the files? (DBCC SHRINKFILE Audit)PASS Summit is upon us »
    comments

    Someone at my job needed to grab a filepath, take out the filename and then split all the pieces into their own columns which were separated by either a dot or an underscore. If you had the following table

    1. CREATE TABLE #test (id INT primary key, PATH VARCHAR(1000))
    2. INSERT #test VALUES(1,'Y:\d\gfd\gd\gd_Data\bla_2010_10_14_Pre_Calc.BAK')
    3. INSERT #test VALUES(2,'Y:\d\gfd\bla_2010_10_15_Pre_Calc.BAK')
    4. INSERT #test VALUES(4,'Y:\d\gfd\gd\gdg\dg\d\gd\gd\Some_Data\bla_2010_16_Pre_Calc.BAK')
    5. INSERT #test VALUES(3,'Y:\d\gfd\gd\gdg\dg\d\gdgdSome_Data\bla_2010_11_12_Pre_Calc.BAK')

    Then this was the required output.

    id PATH 1 2 3 4 5 6 7 8 9
    1 bla_2010_10_14_Pre_Calc_BAK bla 2010 10 14 Pre Calc BAK NULL NULL
    2 bla_2010_10_15_Pre_Calc_BAK bla 2010 10 15 Pre Calc BAK NULL NULL
    3 bla_2010_11_12_Pre_Calc_BAK bla 2010 11 12 Pre Calc BAK NULL NULL
    4 bla_2010_16_Pre_Calc_BAK bla 2010 16 Pre Calc BAK NULL NULL NULL

    I thought that this was an interesting exercise, the person who asked me about this wrote some procedures and user defined functions but it was too slow because he was using a lot of charindex and substring functions and storing results in intermediate variables, then continued parsing until the end of the string.

    In order to attack this problem you have to divide and conquer. Here are the four steps that are needed

    1. Take the dot out
    2. Get just the filename
    3. Split the string
    4. Pivot the results


    1 Take the dot out

    This is pretty easy, all you have to do is use the replace function and replace '.' with '_'

    1. declare @FilePath varchar(100)
    2. SELECT @FilePath = 'Y:\d\gfd\gd\gd_Data\bla_2010_10_14_Pre_Calc.BAK'
    3. SELECT REPLACE(@FilePath,'.','_')

    Result
    -------------------------------------------------
    Y:\d\gfd\gd\gd_Data\bla_2010_10_14_Pre_Calc_BAK


    2 Get just the filename

    To get just the filename, you use the reverse function, then you look for the position of the first backslash, you will then use the right function and use the position of the first backslash as the number of characters to keep. You add -1 to the length because you do not want the backslash included.

    1. declare @FilePath varchar(100)
    2. SELECT @FilePath = 'Y:\d\gfd\gd\gd_Data\bla_2010_10_14_Pre_Calc.BAK'
    3. SELECT RIGHT(@FilePath,PATINDEX('%\%',REVERSE(@FilePath))-1) AS PATH

    Result
    -------------------------------------------------
    bla_2010_10_14_Pre_Calc.BAK


    2A combine 1 and 2

    This is just a combination of 1 and 2, take the dot out and get just the file name

    1. declare @FilePath varchar(100)
    2. SELECT @FilePath = 'Y:\d\gfd\gd\gd_Data\bla_2010_10_14_Pre_Calc.BAK'
    3. SELECT RIGHT(REPLACE(@FilePath,'.','_'),PATINDEX('%\%',REVERSE(@FilePath))-1) AS PATH

    Result
    --------------------------------------
    bla_2010_10_14_Pre_Calc_BAK


    3 Split the string

    To split the string we can use the built in table of numbers master..spt_values. Running the code below

    1. declare @FilePath varchar(100)
    2. SELECT @FilePath = 'Y:\d\gfd\gd\gd_Data\bla_2010_10_14_Pre_Calc.BAK'
    3. SELECT @FilePath = RIGHT(REPLACE(@FilePath,'.','_'),PATINDEX('%\%',REVERSE(@FilePath))-1)
    4.  
    5. SELECT @FilePath as FilePath, SUBSTRING('_' + @FilePath + '_', Number + 1,
    6.     CHARINDEX('_', '_' + @FilePath + '_', Number + 1) - Number -1)AS VALUE
    7.     FROM master..spt_values v
    8.     WHERE TYPE = 'P'
    9.     AND Number <= LEN('_' + @FilePath  + '_') - 1
    10.     AND SUBSTRING('_' + @FilePath  + '_', Number, 1) = '_'

    This is the output

    FilePathVALUE
    bla_2010_10_14_Pre_Calc_BAK bla
    bla_2010_10_14_Pre_Calc_BAK 2010
    bla_2010_10_14_Pre_Calc_BAK 10
    bla_2010_10_14_Pre_Calc_BAK 14
    bla_2010_10_14_Pre_Calc_BAK Pre
    bla_2010_10_14_Pre_Calc_BAK Calc
    bla_2010_10_14_Pre_Calc_BAK BAK

    We are almost there, instead of just one string we want to grab the whole table. We do a cross join with the numbers table and also use the ROW_NUMBER function to assign a column number which we will use as the column name later
    Run this to see what we have so far

    1. SELECT id,PATH, SUBSTRING('_' + PATH + '_', Number + 1,
    2.     CHARINDEX('_', '_' + PATH + '_', Number + 1) - Number -1)AS VALUE,
    3.     ROW_NUMBER() OVER (PARTITION BY id ORDER BY id,number ) AS ROW
    4.     FROM master..spt_values v
    5.     CROSS JOIN (SELECT id,RIGHT(REPLACE(PATH,'.','_'),PATINDEX('%\%',REVERSE(PATH))-1) AS PATH  
    6.     FROM #test ) AS #test
    7.     WHERE TYPE = 'P'
    8.     AND Number <= LEN('_' + PATH  + '_') - 1
    9.     AND SUBSTRING('_' + PATH  + '_', Number, 1) = '_'

    Here is the output

    id PATH VALUE ROW
    1 bla_2010_10_14_Pre_Calc_BAK bla 1
    1 bla_2010_10_14_Pre_Calc_BAK 2010 2
    1 bla_2010_10_14_Pre_Calc_BAK 10 3
    1 bla_2010_10_14_Pre_Calc_BAK 14 4
    1 bla_2010_10_14_Pre_Calc_BAK Pre 5
    1 bla_2010_10_14_Pre_Calc_BAK Calc 6
    1 bla_2010_10_14_Pre_Calc_BAK BAK 7
    2 bla_2010_10_15_Pre_Calc_BAK bla 1
    2 bla_2010_10_15_Pre_Calc_BAK 2010 2
    2 bla_2010_10_15_Pre_Calc_BAK 10 3
    2 bla_2010_10_15_Pre_Calc_BAK 15 4
    2 bla_2010_10_15_Pre_Calc_BAK Pre 5
    2 bla_2010_10_15_Pre_Calc_BAK Calc 6
    2 bla_2010_10_15_Pre_Calc_BAK BAK 7
    3 bla_2010_11_12_Pre_Calc_BAK bla 1
    3 bla_2010_11_12_Pre_Calc_BAK 2010 2
    3 bla_2010_11_12_Pre_Calc_BAK 11 3
    3 bla_2010_11_12_Pre_Calc_BAK 12 4
    3 bla_2010_11_12_Pre_Calc_BAK Pre 5
    3 bla_2010_11_12_Pre_Calc_BAK Calc 6
    3 bla_2010_11_12_Pre_Calc_BAK BAK 7
    4 bla_2010_16_Pre_Calc_BAK bla 1
    4 bla_2010_16_Pre_Calc_BAK 2010 2
    4 bla_2010_16_Pre_Calc_BAK 16 3
    4 bla_2010_16_Pre_Calc_BAK Pre 4
    4 bla_2010_16_Pre_Calc_BAK Calc 5
    4 bla_2010_16_Pre_Calc_BAK BAK 6

    The missing piece is transposing the columns, this is easily accomplished by using the PIVOT operator. As you can see we use the ROW_NUMBER function to create a number for us, this number is then used as a column name by the Pivot operator. Here is the final query which produced the desired output

    1. SELECT * FROM
    2.    (
    3.     SELECT id,PATH, SUBSTRING('_' + PATH + '_', Number + 1,
    4.     CHARINDEX('_', '_' + PATH + '_', Number + 1) - Number -1)AS VALUE,
    5.     ROW_NUMBER() OVER (PARTITION BY id ORDER BY id,number ) AS ROW
    6.     FROM master..spt_values v
    7.     CROSS JOIN (SELECT id,RIGHT(REPLACE(PATH,'.','_'),PATINDEX('%\%',REVERSE(PATH))-1) AS PATH  
    8.     FROM #test ) AS #test
    9.     WHERE TYPE = 'P'
    10.     AND Number <= LEN('_' + PATH  + '_') - 1
    11.     AND SUBSTRING('_' + PATH  + '_', Number, 1) = '_') AS pivTemp
    12.     PIVOT
    13. (   MAX(VALUE)
    14.     FOR ROW IN ([1],[2],[3],[4],[5],[6],[7],[8],[9])
    15.   ) AS pivTable

    The query produces the desired output.

    id PATH 1 2 3 4 5 6 7 8 9
    1 bla_2010_10_14_Pre_Calc_BAK bla 2010 10 14 Pre Calc BAK NULL NULL
    2 bla_2010_10_15_Pre_Calc_BAK bla 2010 10 15 Pre Calc BAK NULL NULL
    3 bla_2010_11_12_Pre_Calc_BAK bla 2010 11 12 Pre Calc BAK NULL NULL
    4 bla_2010_16_Pre_Calc_BAK bla 2010 16 Pre Calc BAK NULL NULL NULL


    Conclusion

    As you can see, if you apply small steps and divide the problem, you can easily solve it even if it looks very difficult at first glance. Remember most likely you can already use something which is built into the product, try not to reinvent the wheel.




    *** Remember, if you have a SQL related question, try our Microsoft SQL Server Programming forum or our Microsoft SQL Server Admin forum

    About the Author

    User bio imageDenis has been working with SQL Server since version 6.5. Although he worked as an ASP/JSP/ColdFusion developer before the dot com bust, he has been working exclusively as a database developer/architect since 2002. In addition to English, Denis is also fluent in Croatian and Dutch, but he can curse in many other languages and dialects (just ask the SQL optimizer) He lives in Princeton, NJ with his wife and three kids.
    Social SitingsTwitterFacebookLinkedInHomePageFlickrLTD RSS Feed
    1361 views
    Instapaper

    3 comments

    Comment from: Mike Blaszczak [Visitor] · http://www.screwylizardracing.com/
    Mike Blaszczak Why not use a .NET user defined function? Wouldn't that be faster? Aren't applications like this entirely the point of CLR integration?
    10/23/10 @ 14:58
    Comment from: SQLDenis [Member] Email
    SQLDenis Mike, CLR is a good option however a lot of shops won't enable CLR and the box that this code was running on didn't have it enabled either
    10/23/10 @ 15:07
    Comment from: Ted Krueger (onpnt) [Member]
    Ted Krueger (onpnt) CLR is a good option but the memory concern is also a factor. for this, it probably wouldn't be an issue but for thousands of executions, it could become a bottleneck.

    I had something similar (although doing a few more complex processing) in place and had to revert back due to memory utilization. Although I handle it in SSIS now on a job server so memory is controlled outside of the source. (another good option)

    10/25/10 @ 05:52

    Leave a comment


    Your email address will not be revealed on this site.

    Your URL will be displayed.
    (Line breaks become <br />)
    (Name, email & website)
    (Allow users to contact you through a message form (your email will not be revealed.)