Someone at my job needed to grab a filepath, take out the filename and then split all the pieces into their own columns which were separated by either a dot or an underscore. If you had the following table
CREATE TABLE #test (id INT primary key, PATH VARCHAR(1000))
INSERT #test VALUES(1,'Y:dgfdgdgd_Databla_2010_10_14_Pre_Calc.BAK')
INSERT #test VALUES(2,'Y:dgfdbla_2010_10_15_Pre_Calc.BAK')
INSERT #test VALUES(4,'Y:dgfdgdgdgdgdgdgdSome_Databla_2010_16_Pre_Calc.BAK')
INSERT #test VALUES(3,'Y:dgfdgdgdgdgdgdgdSome_Databla_2010_11_12_Pre_Calc.BAK')
Then this was the required output.
id | PATH | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|---|
1 | bla_2010_10_14_Pre_Calc_BAK | bla | 2010 | 10 | 14 | Pre | Calc | BAK | NULL | NULL |
2 | bla_2010_10_15_Pre_Calc_BAK | bla | 2010 | 10 | 15 | Pre | Calc | BAK | NULL | NULL |
3 | bla_2010_11_12_Pre_Calc_BAK | bla | 2010 | 11 | 12 | Pre | Calc | BAK | NULL | NULL |
4 | bla_2010_16_Pre_Calc_BAK | bla | 2010 | 16 | Pre | Calc | BAK | NULL | NULL | NULL |
I thought that this was an interesting exercise, the person who asked me about this wrote some procedures and user defined functions but it was too slow because he was using a lot of charindex and substring functions and storing results in intermediate variables, then continued parsing until the end of the string.
In order to attack this problem you have to divide and conquer. Here are the four steps that are needed
**
- Take the dot out
- Get just the filename
- Split the string
- Pivot the results
1 Take the dot out
This is pretty easy, all you have to do is use the replace function and replace ‘.’ with ‘_’
declare @FilePath varchar(100)
SELECT @FilePath = 'Y:dgfdgdgd_Databla_2010_10_14_Pre_Calc.BAK'
SELECT REPLACE(@FilePath,'.','_')
Result
————————————————-
Y:dgfdgdgd_Databla_2010_10_14_Pre_Calc_BAK
2 Get just the filename
To get just the filename, you use the reverse function, then you look for the position of the first backslash, you will then use the right function and use the position of the first backslash as the number of characters to keep. You add -1 to the length because you do not want the backslash included.
declare @FilePath varchar(100)
SELECT @FilePath = 'Y:dgfdgdgd_Databla_2010_10_14_Pre_Calc.BAK'
SELECT RIGHT(@FilePath,PATINDEX('%%',REVERSE(@FilePath))-1) AS PATH
Result
————————————————-
bla_2010_10_14_Pre_Calc.BAK
2A combine 1 and 2
This is just a combination of 1 and 2, take the dot out and get just the file name
declare @FilePath varchar(100)
SELECT @FilePath = 'Y:dgfdgdgd_Databla_2010_10_14_Pre_Calc.BAK'
SELECT RIGHT(REPLACE(@FilePath,'.','_'),PATINDEX('%%',REVERSE(@FilePath))-1) AS PATH
Result
————————————–
bla_2010_10_14_Pre_Calc_BAK
3 Split the string
To split the string we can use the built in table of numbers master..spt_values. Running the code below
declare @FilePath varchar(100)
SELECT @FilePath = 'Y:dgfdgdgd_Databla_2010_10_14_Pre_Calc.BAK'
SELECT @FilePath = RIGHT(REPLACE(@FilePath,'.','_'),PATINDEX('%%',REVERSE(@FilePath))-1)
SELECT @FilePath as FilePath, SUBSTRING('_' + @FilePath + '_', Number + 1,
CHARINDEX('_', '_' + @FilePath + '_', Number + 1) - Number -1)AS VALUE
FROM master..spt_values v
WHERE TYPE = 'P'
AND Number <= LEN('_' + @FilePath + '_') - 1
AND SUBSTRING('_' + @FilePath + '_', Number, 1) = '_'
This is the output
FilePath | VALUE |
---|---|
bla_2010_10_14_Pre_Calc_BAK | bla |
bla_2010_10_14_Pre_Calc_BAK | 2010 |
bla_2010_10_14_Pre_Calc_BAK | 10 |
bla_2010_10_14_Pre_Calc_BAK | 14 |
bla_2010_10_14_Pre_Calc_BAK | Pre |
bla_2010_10_14_Pre_Calc_BAK | Calc |
bla_2010_10_14_Pre_Calc_BAK | BAK |
We are almost there, instead of just one string we want to grab the whole table. We do a cross join with the numbers table and also use the ROW_NUMBER function to assign a column number which we will use as the column name later
Run this to see what we have so far
SELECT id,PATH, SUBSTRING('_' + PATH + '_', Number + 1,
CHARINDEX('_', '_' + PATH + '_', Number + 1) - Number -1)AS VALUE,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY id,number ) AS ROW
FROM master..spt_values v
CROSS JOIN (SELECT id,RIGHT(REPLACE(PATH,'.','_'),PATINDEX('%%',REVERSE(PATH))-1) AS PATH
FROM #test ) AS #test
WHERE TYPE = 'P'
AND Number <= LEN('_' + PATH + '_') - 1
AND SUBSTRING('_' + PATH + '_', Number, 1) = '_'
Here is the output
id | PATH | VALUE | ROW |
---|---|---|---|
1 | bla_2010_10_14_Pre_Calc_BAK | bla | 1 |
1 | bla_2010_10_14_Pre_Calc_BAK | 2010 | 2 |
1 | bla_2010_10_14_Pre_Calc_BAK | 10 | 3 |
1 | bla_2010_10_14_Pre_Calc_BAK | 14 | 4 |
1 | bla_2010_10_14_Pre_Calc_BAK | Pre | 5 |
1 | bla_2010_10_14_Pre_Calc_BAK | Calc | 6 |
1 | bla_2010_10_14_Pre_Calc_BAK | BAK | 7 |
2 | bla_2010_10_15_Pre_Calc_BAK | bla | 1 |
2 | bla_2010_10_15_Pre_Calc_BAK | 2010 | 2 |
2 | bla_2010_10_15_Pre_Calc_BAK | 10 | 3 |
2 | bla_2010_10_15_Pre_Calc_BAK | 15 | 4 |
2 | bla_2010_10_15_Pre_Calc_BAK | Pre | 5 |
2 | bla_2010_10_15_Pre_Calc_BAK | Calc | 6 |
2 | bla_2010_10_15_Pre_Calc_BAK | BAK | 7 |
3 | bla_2010_11_12_Pre_Calc_BAK | bla | 1 |
3 | bla_2010_11_12_Pre_Calc_BAK | 2010 | 2 |
3 | bla_2010_11_12_Pre_Calc_BAK | 11 | 3 |
3 | bla_2010_11_12_Pre_Calc_BAK | 12 | 4 |
3 | bla_2010_11_12_Pre_Calc_BAK | Pre | 5 |
3 | bla_2010_11_12_Pre_Calc_BAK | Calc | 6 |
3 | bla_2010_11_12_Pre_Calc_BAK | BAK | 7 |
4 | bla_2010_16_Pre_Calc_BAK | bla | 1 |
4 | bla_2010_16_Pre_Calc_BAK | 2010 | 2 |
4 | bla_2010_16_Pre_Calc_BAK | 16 | 3 |
4 | bla_2010_16_Pre_Calc_BAK | Pre | 4 |
4 | bla_2010_16_Pre_Calc_BAK | Calc | 5 |
4 | bla_2010_16_Pre_Calc_BAK | BAK | 6 |
The missing piece is transposing the columns, this is easily accomplished by using the PIVOT operator. As you can see we use the ROW_NUMBER function to create a number for us, this number is then used as a column name by the Pivot operator. Here is the final query which produced the desired output
SELECT * FROM
(
SELECT id,PATH, SUBSTRING('_' + PATH + '_', Number + 1,
CHARINDEX('_', '_' + PATH + '_', Number + 1) - Number -1)AS VALUE,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY id,number ) AS ROW
FROM master..spt_values v
CROSS JOIN (SELECT id,RIGHT(REPLACE(PATH,'.','_'),PATINDEX('%%',REVERSE(PATH))-1) AS PATH
FROM #test ) AS #test
WHERE TYPE = 'P'
AND Number <= LEN('_' + PATH + '_') - 1
AND SUBSTRING('_' + PATH + '_', Number, 1) = '_') AS pivTemp
PIVOT
( MAX(VALUE)
FOR ROW IN ([1],[2],[3],[4],[5],[6],[7],[8],[9])
) AS pivTable
The query produces the desired output.
id | PATH | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|---|
1 | bla_2010_10_14_Pre_Calc_BAK | bla | 2010 | 10 | 14 | Pre | Calc | BAK | NULL | NULL |
2 | bla_2010_10_15_Pre_Calc_BAK | bla | 2010 | 10 | 15 | Pre | Calc | BAK | NULL | NULL |
3 | bla_2010_11_12_Pre_Calc_BAK | bla | 2010 | 11 | 12 | Pre | Calc | BAK | NULL | NULL |
4 | bla_2010_16_Pre_Calc_BAK | bla | 2010 | 16 | Pre | Calc | BAK | NULL | NULL | NULL |
Conclusion
As you can see, if you apply small steps and divide the problem, you can easily solve it even if it looks very difficult at first glance. Remember most likely you can already use something which is built into the product, try not to reinvent the wheel.
*** Remember, if you have a SQL related question, try our Microsoft SQL Server Programming forum or our Microsoft SQL Server Admin forum