A couple of days ago onpnt posted the following blogpost: Import directory of XML files into SQL Server 2005

In that post he was using SQLCLR to import a bunch of files. Some of you might not be so familiar with .NET so I am providing a T-SQL way to do something similar

You will need to create a directory testxml on the c drive and put a bunch of XML files in there. If you don’t have any XML files then save the following two as file1.xml and file2.xml

file1.xml

<MusicCollection>
 <Artist>
  <ArtistName>Pink Floyd</ArtistName>
 <Album>
  <AlbumName>Wish You Were Here</AlbumName>
  <YearReleased>1975</YearReleased>
  </Album>
 <Album>
  <AlbumName>The Wall</AlbumName>
  <YearReleased>1979</YearReleased>
  </Album>
  </Artist>
 <Artist>
  <ArtistName>Prince</ArtistName>
 <Album>
  <AlbumName>Purple Rain</AlbumName>
  <YearReleased>1984</YearReleased>
  </Album>
 <Album>
  <AlbumName>Lotusflow3r</AlbumName>
  <YearReleased>2009</YearReleased>
  </Album>
 <Album>
  <AlbumName>1999</AlbumName>
  <YearReleased>1982</YearReleased>
  </Album>
  </Artist>
 <Artist>
  <ArtistName>Incubus</ArtistName>
 <Album>
  <AlbumName>Morning View</AlbumName>
  <YearReleased>2001</YearReleased>
  </Album>
 <Album>
  <AlbumName>Light Grenades</AlbumName>
  <YearReleased>2006</YearReleased>
  </Album>
  </Artist>
  </MusicCollection>

file2.xml

<MusicCollection>
 <Artist>
  <ArtistName>Pink Floyd</ArtistName>
 <Album>
  <AlbumName>Wish You Were Here</AlbumName>
  <YearReleased>1975</YearReleased>
  </Album>
 <Album>
  <AlbumName>The Wall</AlbumName>
  <YearReleased>1979</YearReleased>
  </Album>
  </Artist>
 <Artist>
  <ArtistName>Prince</ArtistName>
 <Album>
  <AlbumName>Purple Rain</AlbumName>
  <YearReleased>1984</YearReleased>
  </Album>
 <Album>
  <AlbumName>Lotusflow3r</AlbumName>
  <YearReleased>2009</YearReleased>
  </Album>
 <Album>
  <AlbumName>1999</AlbumName>
  <YearReleased>1982</YearReleased>
  </Album>
  </Artist>
</MusicCollection>

Now that we have our files we are ready to grab all the files in the directory. We will use a plain vanilla DOS dir command for this with the B switch so that we don’t get a lot of garbage returned. Here is what this block of code looks like

IF OBJECT_ID('tempdb..#tempList') IS NOT NULL
DROP TABLE #tempList

CREATE TABLE #tempList ([FileName] VARCHAR(500))

--plain vanilla dos dir command with /B switch (bare format)
INSERT INTO #tempList
EXEC MASTER..XP_CMDSHELL 'dir c:testxml /B'


--delete the null values
DELETE #tempList WHERE [FileName] IS NULL

-- Delete all the files that don't have xml extension
DELETE #tempList WHERE [FileName] NOT LIKE '%.xml'

--this will be used to loop over the table
alter table #tempList add id int identity
go

Now let’s see what has actually been inserted into the table

select * from #tempList

Output

———————

FileName	id
file1.xml	1
file2.xml	2

The following table will be used to store the XML.

CREATE TABLE [dbo].[XMLImport](
    [filename] [VARCHAR](500) NULL,
    [timecreated] [DATETIME] NULL,
    [xmldata] [xml] NULL
) ON [PRIMARY]
GO

Here is where the import happens, since we have to use dynamic SQL to do the XML import it is better to use SP_EXECUTESQL instead of EXEC since SP_EXECUTESQL has output parameters.

I have put comments in this codeblock but if you need more information how exactly this works then leave me a comment.

truncate table XMLImport --in case you want to rerun just this codeblock
declare @Directory varchar(50)
select @Directory = 'c:testxml'

declare @FileExist int
DECLARE @FileName varchar(500),@DeleteCommand varchar(1000),@FullFileName varchar(500)

DECLARE @SQL NVARCHAR(1000),@xml xml

--This is so that we know how long the loop lasts
declare @LoopID int, @MaxID int
SELECT @LoopID = min(id),@MaxID = max(ID)
FROM #tempList



WHILE @LoopID <= @MaxID
BEGIN

	SELECT @FileNAme = filename
	from #tempList
	where id = @LoopID

	SELECT @FullFileName = @Directory + @FileName 
	
	exec xp_fileexist @FullFileName , @FileExist output
	if @FileExist =1 --sanity check in case some evil person removed the file
	begin
	SELECT @SQL = N'select @xml = xml 
		FROM OPENROWSET(BULK ''' + @FullFileName +''' ,Single_BLOB) as TEMP(xml)'
	 
	-- Just like in the bedroom, this is where the magic happens
	-- We use the output functionality to fill the xml variable for later use
	EXEC SP_EXECUTESQL @SQL, N'@xml xml OUTPUT', @xml OUTPUT
	 
	
	--The actual insert happens here, as you can see we use the output value (@xml)
	INSERT XMLImport ([filename],timecreated,xmldata)
	SELECT @FileName,getdate(),@xml
	
	SET @DeleteCommand = 'del ' +  @Directory + @FileName 
	--maybe you want to delete or move the file to another directory
	-- ** here is how to delete the files you just imported
	-- uncomment line below to delete the file just inserted
	--EXEC MASTER..XP_CMDSHELL @DeleteCommand
	-- ** end of here is how to delete the files
	end

	--Get the next id, instead of +1 we grab the next value in case of skipped id values
	SELECT @LoopID = min(id)
	FROM #tempList
	where id > @LoopID
END

So that is all the code that you need to make this happen, let’s see what is actually inserted into the table

select * from XMLImport

output

————————————————————————

filename	timecreated		xmldata
file1.xml	2009-04-29 09:18:42.313	Pink Floyd....
file2.xml	2009-04-29 09:18:42.330	Pink Floyd....

Attention/warning!!

Here are a couple of warnings for you.

xp_cmdshell

It is not a best practice to have xp_cmdshell enabled. As a matter of fact beginning with SQL Server 2005, the product ships with xp_cmdshell disabled. If you try to run xp_cmdshell you will get the following message if it is not enabled

Server: Msg 15281, Level 16, State 1, Procedure xp_cmdshell, Line 1

SQL Server blocked access to procedure ‘sys.xp_cmdshell’ of component ‘xp_cmdshell’ because this component is turned off as part of the security configuration for this server. A system administrator can enable the use of ‘xp_cmdshell’ by using sp_configure. For more information about enabling ‘xp_cmdshell’, see “Surface Area Configuration” in SQL Server Books Online.

To enable xp_cmdshell execute the following code

EXECUTE SP_CONFIGURE 'show advanced options', 1
RECONFIGURE WITH OVERRIDE
GO
 
EXECUTE SP_CONFIGURE 'xp_cmdshell', '1'
RECONFIGURE WITH OVERRIDE
GO
 
EXECUTE SP_CONFIGURE 'show advanced options', 0
RECONFIGURE WITH OVERRIDE
GO

OPENROWSET

In SQL Server 2005 and 2008 OPENROWSET is also disabled by default, if you try to run an OPENROWSET query then you will see the following message:

Server: Msg 15281, Level 16, State 1, Line 1

SQL Server blocked access to STATEMENT ‘OpenRowset/OpenDatasource’ of component ‘Ad Hoc Distributed Queries’ because this component is turned off as part of the security configuration for this server. A system administrator can enable the use of ‘Ad Hoc Distributed Queries’ by using sp_configure. For more information about enabling ‘Ad Hoc Distributed Queries’, see “Surface Area Configuration” in SQL Server Books Online.

To enable OPENROWSET and OPENQUERY you can use the previous script but instead of ‘xp_cmdshell’ you will use ‘Ad Hoc Distributed Queries’. The script to enable Ad Hoc Distributed Queries is below

EXECUTE SP_CONFIGURE 'show advanced options', 1
RECONFIGURE WITH OVERRIDE
GO
 
EXECUTE SP_CONFIGURE 'Ad Hoc Distributed Queries', '1'
RECONFIGURE WITH OVERRIDE
GO
 
EXECUTE SP_CONFIGURE 'show advanced options', 0
RECONFIGURE WITH OVERRIDE
GO

xp_fileexist

The stored proc xp_fileexist is undocumented so be aware that it could change with a service pack or be removed all together

*** If you have a SQL related question try our Microsoft SQL Server Programming forum or our Microsoft SQL Server Admin forum