Scott Stauffer (blog | twitter) asked a question on #sqlhelp a few days ago, and I thought I’d walk through an example of the task he was trying to accomplish.
Below is a snapshot of the tweet.

My first thought was to throw a script component transformation into the data flow. As each row enters the buffer, it can be parsed to validate whether it is the header. If the unique pattern in the string is found, lock a variable and assign the value to it.
Let’s get some things out of the way first.
The script component will read every row, which by itself will run slower than other methods such as importing directly without a transformation. That is the beauty of SSIS, though: there are always six ways to do anything. In our situation, a script task placed in front of the data flow task could be used to grab the header, assign it to the variable, and delete the header line. That would be much more efficient.
Special circumstances do call for methods that may perform slower in exchange for stability and validation during processing. The reason for showing this method is to provide validation on the rows through regular expressions and pattern matching. It is also lighter on resources in one respect, since rows are simply read into the buffer and back out. When a method such as the script task is used to modify a flat file prior to import, the entire file is pulled into memory, and with some large imports that can be a hardship to manage on mid-size servers.
Let’s look at the two methods mentioned above. The first will be the script component that matches the string value we need.
Create a package named header_Split and the following supporting tables in your lab database:
CREATE TABLE [dbo].[header](
    [Column 0] [varchar](50) NULL
) ON [PRIMARY]
GO

CREATE TABLE [dbo].[header_split](
    [Column 0] [varchar](50) NULL
) ON [PRIMARY]
GO
Create a variable at the package scope named “global”.
Our global connections for the package will be one flat file connection named “importer” and an OLEDB connection named after the instance. Our flat file will contain the following text:
header <grab me> string
test
test
test
test
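As a quick aside, the <grab me> token is what we will key on. Here is a small standalone check, a hypothetical console snippet rather than part of the package, showing what the pattern used later in the script component matches in that header line:

Imports System
Imports System.Text.RegularExpressions

Module PatternCheck
    Sub Main()
        ' The first line of our sample flat file
        Dim headerLine As String = "header <grab me> string"
        ' "<", four word characters, whitespace, two word characters, ">"
        Dim m As Match = Regex.Match(headerLine, "<[\w]{4}\s[\w]{2}>")
        If m.Success Then
            Console.WriteLine(m.Value)   ' prints <grab me>
        End If
    End Sub
End Module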
Bring over a data flow task and name it “DF Flat File Pump”.

In the data flow task, bring over a flat file source and configure it to use our importer connection.
Now bring over a script component, and when the “Select Script Component Type” dialog appears, select the “Transformation” option.
In the Script Transformation Editor, select our one column (default: Column 0).
Go into the “Inputs and Outputs” window and select the output.
If the names are all left at their defaults, the output will be titled Output 0.
Change the ExclusionGroup to 1.
Next, add a new output, Output 1.
In the new output, change the ExclusionGroup to 1 as well and set the SynchronousInputID to the Input 0 ID; it will be listed in the dropdown.

Select the Script window and add the variable “global” to the ReadWriteVariables property.
Next, open the scripting environment by clicking “Design Script”.
The code to perform the action is below. It utilizes regular expressions to first find the match to the unique value we need and then assigns it to the variable global. In order to assign a variable in the script component without throwing PostExecute errors, we must lock the variable by using the VariableDispenser.LockOneForWrite method.
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
Imports System.Text.RegularExpressions

Public Class ScriptMain
    Inherits UserComponent

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

        ' Look for the unique header pattern, e.g. "<grab me>"
        If Regex.IsMatch(Row.Column0, "<[\w]{4}\s[\w]{2}>") Then

            Dim mismatch As Match
            Dim pattern As New Regex("<[\w]{4}\s[\w]{2}>")
            mismatch = pattern.Match(Row.Column0)

            If mismatch.Length > 0 Then
                ' Lock the package variable for write, assign the matched value, then unlock
                Dim var As IDTSVariables90
                Me.VariableDispenser.LockOneForWrite("global", var)
                var("global").Value = mismatch.Value
                var.Unlock()
            End If

            ' Header rows go to Output 0
            Row.DirectRowToOutput0()
        Else
            ' Data rows go to Output 1
            Row.DirectRowToOutput1()
        End If

    End Sub

End Class
Drag and drop two OLE DB Destinations into the data flow and rename them “data pump” and “headers”. We are only creating the headers destination for testing purposes, to validate the headers coming out and for later use if needed.
Connect Output 1 to the data pump destination and map the column to the column in the header_split table. Then connect Output 0 to the headers destination and map the output column to the one column in the header table.
Execute the package.

The results show that the row we identified as a header, based on our regular expression pattern match, flows to the headers destination. At the same time, the data we need flows to the data pump destination.
Querying this in SSMS shows

The major problem with this process is that we are processing row by row while putting each row under a microscope. Imagine this in a 300 million row import. It wouldn’t perform very well, but it would get the job done if needed.
The next solution, which will be much easier to implement and maintain while gaining performance, will follow in a second part to this post. We will utilize a script task to manipulate the file beforehand: assign the variable, delete the header, and import the data as needed.
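In the meantime, here is a rough sketch of that script task idea. The file path is a hypothetical placeholder, the task is assumed to have “global” listed in its ReadWriteVariables, and the pattern is the same one the script component used; the actual implementation will be covered in part two.

Imports System
Imports System.IO
Imports System.Text.RegularExpressions
Imports Microsoft.SqlServer.Dts.Runtime

Public Class ScriptMain

    Public Sub Main()
        ' Hypothetical path to the flat file used by the "importer" connection
        Dim path As String = "C:\import\importer.txt"
        Dim lines As String() = File.ReadAllLines(path)

        If lines.Length > 0 Then
            ' Same pattern the script component used to recognize the header
            Dim m As Match = Regex.Match(lines(0), "<[\w]{4}\s[\w]{2}>")
            If m.Success Then
                ' "global" must be listed in the task's ReadWriteVariables
                Dts.Variables("global").Value = m.Value

                ' Rewrite the file without the header line
                Dim body(lines.Length - 2) As String
                Array.Copy(lines, 1, body, 0, lines.Length - 1)
                File.WriteAllLines(path, body)
            End If
        End If

        Dts.TaskResult = Dts.Results.Success
    End Sub

End Class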
6 Comments
This is cool! … I have a slight variation on the same problem, wanted to see if you have any thoughts…. The problem I have is an occasional “extra” cr/lf in the file which causes the line to wrap/split in unexpected places. The problem is that the file-reader is reading a row at a time, and those rows get split… any ideas on removing “extra” cr/lf’s (leaving the correct one at the end of the row). (not a fixed width file… a delimited csv).
I like this solution, but I did come up with another that does something quite similar without any scripting. I guess that means I have to get off my duff and blog about it and reference your blog. I am glad my question inspired a blog post. I think it would make sense to have a robust component developed to help process flat file headers. This would seem like a great tool for enterprise ETL. I was hoping that there was already a tool. Your solution is great, but it would be nice to have a configurable solution in a task.
I agree with you Scott. I asked Jamie T. to take a look. It is very possible I am missing something obvious. It may exist and I missed it.
Thanks for the great feedback!
Thanks onpnt! I forwarded a link to my supervisor, who has been struggling with this very issue. Can’t wait for part two!
Wow, thanks genomon! Feedback like that is really why we keep going on sharing things like this 🙂
“The next solution that will be much easier to implement, maintain and gain performance will follow in a second part to this post.”
Bring it on!
Anxiously awaiting with bated breath…