Login or Sign Up to become a member!
LessThanDot Sit Logo

LessThanDot

Data Management

Less Than Dot is a community of passionate IT professionals and enthusiasts dedicated to sharing technical knowledge, experience, and assistance. Inside you will find reference materials, interesting technical discussions, and expert tips and commentary. Once you register for an account you will have immediate access to the forums and all past articles and commentaries.

LTD Social Sitings

Lessthandot twitter Lessthandot Linkedin Lessthandot friendfeed Lessthandot facebook Lessthandot rss

Note: Watch for social icons on posts by your favorite authors to follow their postings on these and other social sites.

Your profile

    Search

    XML Feeds

    Google Ads

    « Use INDEXKEY_PROPERTY in SQL Server to determine if columns in indexes are sorted ascending or descendingCTE and hierarchical queries »
    comments

    I'm going to go over some methods to import data from text files into SQL Server today. The particular file I went out and grabbed is comma delimited and with a few qualifiers in it. It is a typical file you may get and a request made to import or just for your own administrative tasks.

    Below is the location of field layout and file that I grabbed off the net to play with. This is just a text file comma separated of zip codes. I will attach the file as well to this blog.

    http://spatialnews.geocomm.com/newsletter/2000/jan/zipcodes.html

    Field 1 - State Fips Code
    Field 2 - 5-digit Zipcode
    Field 3 - State Abbreviation
    Field 4 - Zipcode Name
    Field 5 - Longitude in Decimal Degrees (West is assumed, no minus sign)
    Field 6 - Latitude in Decimal Degrees (North is assumed, no plus sign)
    Field 7 - 1990 Population (100%)
    Field 8 - Allocation Factor (decimal portion of state within zipcode)

    Example of file

    Import Wizard

    First and very manual technique is the import wizard. This is great for ad-hoc and just to slam it in tasks.

    In SSMS right click the database you want to import into. Scroll to Tasks and select Import Data…

    For the data source we want out zips.txt file. Browse for it and select it. You should notice the wizard tries to fill in the blanks for you. One key thing here with this file I picked is there are “ “ qualifiers. So we need to make sure we add “ into the text qualifier field. The wizard will not do this for you.

    Go through the remaining pages to view everything. No further changes should be needed though

    Hit next after checking the pages out and select your destination. This in our case will be DBA.dbo.zips.

    Following the destination step, go into the edit mappings section to ensure we look good on the types and counts.

    Hit next and then finish. Once completed you will see the count of rows transferred and the success or failure rate

    Import wizard completed and you have the data!

    bcp utility

    Method two is bcp with a format file http://msdn.microsoft.com/en-us/library/ms162802.aspx

    This is probably going to win for speed on most occasions but is limited to the formatting of the file being imported. For this file it actually works well with a small format file to show the contents and mappings to SQL Server.

    To create a format file all we really need is the type and the count of columns for the most basic files. In our case the qualifier makes it a bit difficult but there is a trick to ignoring them. The trick is to basically throw a field into the format file that will reference it but basically ignore it in the import process.

    Given that our format file in this case would appear like this

    1. 9.0
    2. 9
    3. 1       SQLCHAR       0       0       "\""         0     dummy1             ""
    4. 2       SQLCHAR       0       50      "\",\""      1     Field1             ""
    5. 3       SQLCHAR       0       50      "\",\""      2     Field2             ""
    6. 4       SQLCHAR       0       50      "\",\""      3     Field3             ""
    7. 5       SQLCHAR       0       50      "\","        4     Field4             ""
    8. 6       SQLCHAR       0       50      ","          5     Field5             ""
    9. 7       SQLCHAR       0       50      ","          6     Field6             ""
    10. 8       SQLCHAR       0       50      ","          7     Field7             ""
    11. 9       SQLCHAR       0       50      "\n"         8     Field8             ""

    The bcp call would be as follows

    C:\Program Files\Microsoft SQL Server\90\Tools\Binn>bcp DBA..zips in "C:\zips.txt" -f "c:\zip_format_file.txt" -S LKFW0133 -T

    Given a successful run you should see this in command prompt after executing the statement

    1. Starting copy...
    2. 1000 rows sent to SQL Server. Total sent: 1000
    3. 1000 rows sent to SQL Server. Total sent: 2000
    4. 1000 rows sent to SQL Server. Total sent: 3000
    5. 1000 rows sent to SQL Server. Total sent: 4000
    6. 1000 rows sent to SQL Server. Total sent: 5000
    7. 1000 rows sent to SQL Server. Total sent: 6000
    8. 1000 rows sent to SQL Server. Total sent: 7000
    9. 1000 rows sent to SQL Server. Total sent: 8000
    10. 1000 rows sent to SQL Server. Total sent: 9000
    11. 1000 rows sent to SQL Server. Total sent: 10000
    12. 1000 rows sent to SQL Server. Total sent: 11000
    13. 1000 rows sent to SQL Server. Total sent: 12000
    14. 1000 rows sent to SQL Server. Total sent: 13000
    15. 1000 rows sent to SQL Server. Total sent: 14000
    16. 1000 rows sent to SQL Server. Total sent: 15000
    17. 1000 rows sent to SQL Server. Total sent: 16000
    18. 1000 rows sent to SQL Server. Total sent: 17000
    19. 1000 rows sent to SQL Server. Total sent: 18000
    20. 1000 rows sent to SQL Server. Total sent: 19000
    21. 1000 rows sent to SQL Server. Total sent: 20000
    22. 1000 rows sent to SQL Server. Total sent: 21000
    23. 1000 rows sent to SQL Server. Total sent: 22000
    24. 1000 rows sent to SQL Server. Total sent: 23000
    25. 1000 rows sent to SQL Server. Total sent: 24000
    26. 1000 rows sent to SQL Server. Total sent: 25000
    27. 1000 rows sent to SQL Server. Total sent: 26000
    28. 1000 rows sent to SQL Server. Total sent: 27000
    29. 1000 rows sent to SQL Server. Total sent: 28000
    30. 1000 rows sent to SQL Server. Total sent: 29000

    bcp import completed!

    BULK INSERT

    Next, we have BULK INSERT given the same format file from bcp

    1. CREATE TABLE zips (
    2.    Col1 nvarchar(50),
    3.    Col2 nvarchar(50),
    4.    Col3 nvarchar(50),
    5.    Col4 nvarchar(50),
    6.    Col5 nvarchar(50),
    7.    Col6 nvarchar(50),
    8.    Col7 nvarchar(50),
    9.    Col8 nvarchar(50)
    10.    );
    11. GO
    12. INSERT INTO zips
    13.     SELECT *
    14.       FROM  OPENROWSET(BULK  'C:\Documents and Settings\tkrueger\My Documents\blog\cenzuszipcodes\zips.txt',
    15.       FORMATFILE='C:\Documents and Settings\tkrueger\My Documents\blog\zip_format_file.txt'    
    16.       ) as t1 ;
    17. GO

    That was simple enough given the work on the format file that we already did. Bulk insert isn’t as fast as bcp but gives you some freedom from within TSQL and SSMS to add functionality to the import.

    SSIS

    Next is my favorite playground in SSIS

    We can do many methods in SSIS to get data from point A, to point B. I’ll show you data flow task and the SSIS version of BULK INSERT

    First create a new integrated services project.

    Create a new flat file connection by right clicking the connection managers area. This will be used in both methods

    Bulk insert

    You can use format file here as well which is beneficial to moving methods around. This essentially is calling the same processes with format file usage. Drag over a bulk insert task and double click it to go into the editor.

    Fill in the information starting with connection. This will populate much as the wizard did.

    Example of format file usage

    Or specify your own details

    Execute this and again, we have some data

    Data Flow method

    Bring over a data flow task and double click it to go into the data flow tab.

    Bring over a flat file source and SQL Server destination. Edit the flat file source to use the connection manager “The file” we already created. Connect the two once they are there

    Double click the SQL Server Destination task to open the editor. Enter in the connection manager information and select the table to import into.

    Go into the mappings and connect the dots per say

    Typical issue of type conversions is Unicode to non-unicode.

    We fix this with a Data conversion or explicit conversion in the editor. Data conversion tasks are usually the route I take. Drag over a data conversation task and place it between the connection from the flat file source to the SQL Server destination.

    New look in the mappings

    And after execution…

    SqlBulkCopy Method

    Sense we’re in the SSIS package we can use that awesome “script task” to show SlqBulkCopy. Not only fast but also handy for those really “unique” file formats we receive so often

    Bring over a script task into the control flow

    Double click the task and go to the script page. Click the Design script to open up the code behind

    Go ahead and put this code into the task.

    1. Imports System
    2. Imports System.Data
    3. Imports System.Math
    4. Imports System.Xml
    5. Imports System.Data.SqlClient
    6. Imports System.Data.OleDb
    7. Imports Microsoft.SqlServer.Dts.Runtime
    8.  
    9. Public Class ScriptMain
    10.  
    11.  
    12.     Public Sub Main()
    13.         Dim conn As New SqlConnection("Data Source=LKFW0133;Initial Catalog=DBA;Integrated Security=SSPI")
    14.  
    15.         Using bulk As New SqlBulkCopy(conn.ConnectionString)
    16.             bulk.BatchSize = 1000
    17.             bulk.NotifyAfter = 1000
    18.             bulk.DestinationTableName = "zips"
    19.             AddHandler bulk.SqlRowsCopied, AddressOf OnSqlRowsCopied
    20.             bulk.WriteToServer(LoadupDTFromTxt)
    21.         End Using
    22.  
    23.         Dts.TaskResult = Dts.Results.Success
    24.     End Sub
    25.  
    26.     Private Sub OnSqlRowsCopied(ByVal sender As Object, _
    27.         ByVal args As SqlRowsCopiedEventArgs)
    28.         Console.WriteLine("Copied {0} so far...", args.RowsCopied)
    29.     End Sub
    30.  
    31.  
    32.     Private Function LoadupDTFromTxt() As DataTable
    33.         Dim cn As New OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0; Data Source=C:\;Extended Properties=""Text;HDR=No;FMT=Delimited""")
    34.         Dim da As New OleDbDataAdapter()
    35.         Dim ds As New DataSet
    36.         Dim cd As New OleDbCommand("SELECT * FROM C:\zips.txt", cn)
    37.         cn.Open()
    38.         da.SelectCommand = cd
    39.         ds.Clear()
    40.         da.Fill(ds, "zips")
    41.         Return ds.Tables(0)
    42.         cn.Close()
    43.     End Function
    44. End Class

    Then execute the script task

    Again the same results as previous methods but with a new look.

    All of these methods have a place in each unique situation. Performance wise in my experience, bcp wins on speed typically. This is not always a method you can use though which leads us to other resources SQL Server and services provide us.

    Of course none of this would be completely finished unless we added some statistics on runtimes along with these methods. I went ahead and created 3 addition zips.txt files to go with the original. In these files we have the following

    zips.txt - 29,471 rows at around 1.8MB
    zips_halfmill.txt - 500,991 rows at around 31.4MB
    zips_million.txt - 1,001,981 rows at around 62.8MB
    zips_5mill.txt - 5,009,901 rows and around 314.3MB

    I ran these each through all the methods. My results are below. Mind you, the important thing to understand is that I write my blogs/articles off my eprsonal test lab. In no way do I utilize monster servers that would be more suited for benchmarking each. All hardware is created not so equal and results will be varying given that variable. Take these results but keep in mind that resources will make them move up and down the chart. Memory, I/O and CPU is a big factor in speed.

    Tests were complete by running each process 5 times. All resources cleared on each execution.
    Shown in AVG of milliseconds between the types. Import Wizard was not tested as this is basically a Data Flow Task behind the scenes and can be seen (minus the slow user clicking things) from the Data Flow Task in SSIS results

    Hope this helps as a good reference in your own imports.

    About the Author

    Ted Krueger is a SQL Server MVP and has been working in development and database administration for 13+ years. Specialties range from High Availability and Disaster / Recovery setup and testing methods down to custom assembly development for SQL Server Reporting Services. Ted blogs and is also one of the founders of LessThanDot.com technology community. Some of the articles focused on are Backup / Recovery, Security, SSIS and working on SQL Server and using all of the SQL Server features available to create stable and scalable database services. @onpnt Personal Blog over at http://onpnt.wordpress.com/
    Social SitingsTwitterLinkedInLTD RSS Feed
    46874 views
    InstapaperVote on HN

    17 comments

    Comment from: Ted Krueger (onpnt) [Member]
    Ted Krueger (onpnt) During the writing of this I received a lot of great feedback on some testing from others in the field. SQLGuyChuck @ http://www.sqlwebpedia.com/blogs/sqlguychuck reminded me to look into asynchronous data loads for one which can have major performance increases. I did not have time on this round but it is absolutely an aspect to research part in the SSIS tasks. Simon Sabin who runs SQLBits.com also mentioned testing select into and insert into with the trace on. Again I'll try to get these into this or future writes but time is not on my side this round. Thanks for the help guys!

    Recommend following them on twitter.
    http://twitter.com/simon_sabin
    http://twitter.com/sqlguychuck
    09/14/09 @ 10:35
    Comment from: sqlsister [Member] Email
    *****
    sqlsister I often need to use the wizard to bring in the first instance of a new import to a work table just to see if the promised structure and the actual structure are the same (assuming I even have a promised structure which I often do not). When you get .txt files with 20,000,000 records, it's not so easy to open them in Notepad and look.

    One thing about using the wizard. It defaults to a size of 50 (it's going to assume all the data is string type data unless you tell it differently). If you have fields larger than that (as I do 100% of the time), you will need to fix this (in the advanced tab you can set column sizes) before running the wizard or you will get a truncation error. Drove me crazy when I first moved up to 2008 that a file that would easily import in 2000 would not import with the wizard in 2008. Took me the longest time to figure out how to do it.
    09/15/09 @ 08:30
    Comment from: Ted Krueger (onpnt) [Member]
    Ted Krueger (onpnt) George (gmmastros) found a fatal issue with my format file above. He worked on it offline and let me know. I've replaced it here so no one runs into the bugs I create ;-)

    Thanks George!

    09/15/09 @ 13:52
    Comment from: SQLDenis [Member] Email
    SQLDenis In BCP and BULK INSERT did you use tablock? Otherwise it won't be truly a minimally logged operation

    so for BULK INSERT it would be WITH(TABLOCK)

    bcp would be -h "TABLOCK"

    and in SSIS/DTS you need to check the lock table option
    09/15/09 @ 19:26
    Comment from: Ted Krueger (onpnt) [Member]
    Ted Krueger (onpnt) Thanks Denis. I didn't go into much deeper settings than direct imports on this but that is always one of the best tips and settings to ensure is set on bulk loads. Should have added it as a notation and glad it's here now in your comment.
    09/16/09 @ 06:12
    Comment from: Siddharth Mehta [Visitor] · http://siddhumehta.blogspot.com
    ****-
    Siddharth Mehta While importing data from text file with import export wizard, there can be issues for date formatting. I have posted a post regarding the same on my blog: http://siddhumehta.blogspot.com/2009/09/sql-server-2008-import-export-wizard.html

    May be you would like to moderate this.

    Regards,
    Siddharth.
    09/28/09 @ 06:54
    Comment from: andyg8180 [Visitor]
    *****
    andyg8180 I use the sqlbulkcopy method with datasets and stuff... Its definitely a time saver... I would highly recommend mentioning the .Dispose in your code. I ran into some memory leaks when i didnt dispose some datasets... I write my code like this

    Dim ds as dataset
    Try

    ::Do bulkcopy work::

    Catch ex
    (handle your exception)
    Finally
    ds.dispose
    End Try

    If youre just doing one BulkInsert, then fine, you dont need to, but if youre going to be running numerous Bulks one after another, be sure to dispose...
    09/28/09 @ 07:09
    Comment from: Ted Krueger (onpnt) [Member]
    Ted Krueger (onpnt) Thanks andyg8180 and Siddharth Mehta, both very good points that should be covered.
    09/28/09 @ 07:12
    Comment from: Joel Mansford [Visitor] · http://www.mansford.net
    ****-
    Joel Mansford Two further methods I often use for small Excel / CSV where the SQL table is already defined (append):
    1) Copy + Paste (yes!), if you have the table already defined then you can often copy+paste the contents of the excel file directly in to one of the Management Studio 'Edit 200 rows' windows. Note that network latency to the SQL Server can be a real issue with this though.

    2) Use the Excel CONCATENATE function to generate an INSERT () VALUES compatible statement for each row. SQL Server 2008 allows us to not repeat the whole INSERT statement so this can be relatively efficient.
    Then copy+paste the resulting 'SQL' statement from Excel in to Management studio. I find this approach is good where there are few columns (<5) and not too many rows.
    09/28/09 @ 09:12
    Comment from: Jeffrey Roughgarden [Visitor]
    ****-
    Jeffrey Roughgarden It would be nice to have the performance comparison updated.
    10/05/09 @ 14:47
    Comment from: John Fuhrman [Visitor]
    John Fuhrman Very good article.

    Do you have any examples of how to use SISS for importing data from
    multiple source files in a subdirectory?

    I have been tasked with migrating all data that is currently in the Access Databases to MS SQL.

    There are a total of 366 MDB files spread access the various departments. Most of these are either not being used now or are left overs from a previous conversion project(before my time) or previous versions of the DB Application.

    After attempting to eliminate the "junk" I am left with somewhere around 100 or so files that will need to be migrated into SQL server. Of those files a dozen or so are the Accual Access Applications that are in use.

    So, right now I am attempting to write a few scripts to import the data from the "archives" of the dozen or so Access Applications.

    Once the data has been put into SQL server I will then start to modify the Access Applications forms, and reports to point to SQL server instead of using the local DBs and convert the Access queries to SQL queries on MS SQL. In short convert the Access Applications to Access Projects.

    One other challenge that I have found with this project is that many of the Access Applications use linked tables to pull data from each other.

    The two largest of the Access Applications are Monthly and eWARTZ both of which don't link to other data sources, but other Access Applications link to them.

    Because I have not had much experience with Applications built on Access the difference in data types has been a bit of a problem, but I have been able to solve most.

    The SSMA (SQL Server Migration Assistant) from MS has been a useful tool but lacks the ability to import multiple files into the same tables. It wants to overwrite the previous data. (truncate the tables before starting the next file.)

    So I started down the OPENROWSET road to be able to append the data from each of the Access DBs semi-manualy. As I got further into this I found that as the Access Application grew so did the tables. The eWARTZ Application has had about 8 columns added to it. So I have a table on the SQL server that has ALL Columns but need to adjust the insert statement to account for the older archived data and pad the archived data with defaault values for the missing column data during the insert.

    Hope this helps explain what I am trying to get accomplished.

    Thanks so much for your help.
    12/09/09 @ 07:46
    Comment from: Ted Krueger (onpnt) [Member]
    Ted Krueger (onpnt) Hi John, Thanks for the comments :-)

    From what you have there I think you can get this done quickly by using a foreach enumerator with a data flow task inside it. Each iteration of the foreach on an mdb file would set a variable as the source of the data flow task. if the mappings are completely different on the mdb's you may be better off getting the mdbs into a directory in which the mappings all match or create a script task to set the mappings in the foreach.

    This may be better off asked in the forums here on LTD also. I'm just throwing up a very high level of the process.

    Thanks again
    12/09/09 @ 07:59
    Comment from: John Fuhrman [Visitor]
    John Fuhrman Well that worked just slick!! And was much easier than I had expected!!

    Thanks
    John
    12/09/09 @ 11:04
    Comment from: Ted Krueger (onpnt) [Member]
    Ted Krueger (onpnt) :-) Glad it helped out.
    12/09/09 @ 11:31
    Comment from: LydiaG [Visitor]
    LydiaG I need to import from a XML file into a SQL server table. The table exits already with data, I really need to do insert.

    I have about 80 XML files, and need to inset into 80 different tables. Each table has about 50 columns.

    I read xml data into DataSet, then use SqlBuldCopy to insert into the table.

    But, I am encountering the problem:

    System.InvalidOperationException {"The given value of type String from the data source cannot be converted to type int of the specified target column."}

    InnerException {"Failed to convert parameter value from a String to a Int32."}

    Also, I just thought about, can SQLBulkCopy do “Insert into” data to an existing table with Data.

    what will be a good way to do this task?


    05/03/10 @ 11:53
    Comment from: Neville Mandelbaum [Visitor] · http://friendsofjimcirone.org/bb/profile.php?id=42923
    Neville Mandelbaum This is always helpful
    05/07/10 @ 03:18
    Comment from: Ali Sanan [Visitor] Email
    Ali Sanan Hi,

    I tried out the Dataflow method, I am getting the error of conversion between non-unicode and unicode,
    I am trying to import column3 from file to colum1 of zips table, what particular type should I select in data conversion control ? I tried string,float,double but all failed, destination column is nvarchar(50)

    Please help.
    01/01/11 @ 08:36

    Leave a comment


    Your email address will not be revealed on this site.

    To mislead the spambots.

    Your URL will be displayed.
    (Line breaks become <br />)
    (Name, email & website)
    (Allow users to contact you through a message form (your email will not be revealed.)