
LessThanDot

A decade of helpful technical content

This is an archive of the posts published to LessThanDot from 2008 to 2018, over a decade of useful content. While we're no longer adding new content, we still receive a lot of visitors and wanted to make sure the content didn't disappear forever.

T-SQL Tuesday #38 – Resolving an SSIS package performance problem

It's that time of the month again: T-SQL Tuesday is back and Jason Brimhall (blog | twitter) is hosting the 38th installment. The theme is "standing firm", which basically means telling a story about one of these words: resolve, resolution or resolute. My contribution is the story of how I once resolved an issue with an SSIS package that took more than one hour to complete.

At a client, a package developed some time ago had started to slow down enormously, to the point where it took over one hour to complete, if it completed at all. Sometimes out-of-memory exceptions were thrown and they had to start all over again. They asked if I could take a look at the package and perhaps speed it up a little. The package was not a typical data warehouse scenario; it ran against a normalized database that was being used as an OLTP application database and a reporting database at the same time (don't get me started on the design). The package itself was pretty complex – especially the data flow – and it managed quite a workflow. The data set being handled was large, but not so large that it should have been a gigantic problem.

I took a quick look at the package and some things immediately popped out:

- Lookup components. There were quite a few of them, some with partial or no caching, and even worse, they all used the dropdown box to select the table. This is a big red flag.
- OLE DB Commands. I just hate those pesky little buggers. They fire off a statement for each row in your data flow, causing an avalanche of transactions against the database. The database in question was in the Full recovery model, something I had no control over, so logging exploded when this package ran. If the previous bullet is a red flag, this one is DEFCON 2.
- OLE DB Destinations without the Fast Load option. Pretty much the same problem as an OLE DB Command: each insert is a separate transaction instead of a bulk insert.
- Redundant or unnecessary logic. Obviously, this is to be avoided.

So how do you tackle a package like this? Let's start with the Lookups. Most of the referenced tables were pretty large, but if you write a query retrieving only the columns you actually need instead of using the dropdown box, you can fit a whole lot more rows in memory. I guess the original creator of the package used partial or no caching because the referenced data sets were too big to fit in memory. I wrote a few queries selecting only the lookup keys and the columns that needed to be retrieved (which is the point of a lookup, obviously) and suddenly everything fit nicely in memory with the Full cache option. Nice.

Next, the OLE DB Commands. Usually they are used to execute UPDATE statements, as there isn't a way to do set-based updates in the data flow. I got rid of them by creating temporary tables at the beginning of the package and inserting the data into the temp table with an OLE DB Destination using the Fast Load option. After the data flow, an Execute SQL Task executes a set-based UPDATE statement that does an inner join between the destination table and the temp table. You wouldn't believe the performance improvement this technique gives on large data sets. I'm a strong advocate of using a hybrid combination of SSIS components and T-SQL in SSIS packages; for the time being, updates certainly belong in the T-SQL realm.

Then the OLE DB Destinations without the Fast Load option. No idea why the default was changed to a worse option, but it was quickly resolved: simply switch back to Fast Load and you're good to go.

Finally, I looked at the logic of the package itself. There were some script components doing pretty basic stuff (I guess the creator was a .NET developer doing his first SSIS development), so I removed them and moved all the logic to the source query. I also removed columns that weren't used in the data flow: in SSIS, it's very important to keep your row size as small as possible in order to fit as many rows as possible in one buffer. I also increased the buffer size, so rows moved into and out of the data flow faster. To finish everything off, I removed a SORT component used to remove duplicates and used a DISTINCT clause in the source query instead. A SORT component is a blocking component – DEFCON 1 for large data sets – and is to be avoided at all costs.

All this work took one afternoon. I ran the package and behold, it finished in a mere 4 minutes. From over one hour to just a couple of minutes. The sad part is that most of these performance bottlenecks could easily have been avoided. By being resolute (see what I did there?) about some very simple SSIS design principles, you can make a big difference in the performance of a package. If you're interested in SSIS performance tuning, start with this magnificent SQLCAT article: Top 10 SQL Server Integration Services Best Practices (http://sqlcat.com/sqlcat/b/top10lists/archive/2008/10/01/top-10-sql-server-integration-services-best-practices.aspx?PageIndex=2). It's my bible when tuning SSIS packages.
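To make the temp-table pattern concrete, here is a hedged sketch of what replacing an OLE DB Command with a set-based UPDATE might look like. The table and column names (dbo.Customer, #CustomerUpdates, Email) are invented for illustration; the real package used its own schema.

```sql
-- Before the data flow, an Execute SQL Task creates the temp table.
-- (Set RetainSameConnection=True on the connection manager so the
-- temp table survives between tasks.)
CREATE TABLE #CustomerUpdates
(
    CustomerID int          NOT NULL,
    Email      varchar(100) NULL
);

-- The data flow then bulk-loads the changed rows into #CustomerUpdates
-- through an OLE DB Destination with the Fast Load option.

-- After the data flow, a single Execute SQL Task performs one
-- set-based UPDATE instead of one statement per row:
UPDATE c
SET    c.Email = u.Email
FROM   dbo.Customer AS c
       INNER JOIN #CustomerUpdates AS u
           ON u.CustomerID = c.CustomerID;
```

The whole update becomes one minimally logged bulk insert plus one joined UPDATE, instead of tens of thousands of individual statements hammering the transaction log.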

Read More...

Use the sys.database_mirroring DMV to quickly check if the databases are in principal or mirror role and what state they are in

I have been using mirroring for about three years now and I must say it is one of the best features added to SQL Server. We used to do plain old replication and log shipping, but almost all of those setups have been replaced by mirroring. Someone at work asked me how to check what state the databases are in and whether a database on one server is the principal or the mirror. There are a couple of ways to check this. You can open up SSMS and expand the Databases folder; next to the database name you will see something like (Mirror, Synchronized / Restoring…) or (Principal, Synchronized). Other states are possible as well, for example disconnected, synchronizing or suspended.
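For the DMV route, a query along these lines returns the role and state for every mirrored database (the columns come from sys.database_mirroring; the WHERE clause filters out databases that aren't mirrored):

```sql
SELECT DB_NAME(database_id) AS database_name,
       mirroring_role_desc,   -- PRINCIPAL or MIRROR
       mirroring_state_desc   -- e.g. SYNCHRONIZED, SYNCHRONIZING,
                              --      DISCONNECTED, SUSPENDED
FROM   sys.database_mirroring
WHERE  mirroring_guid IS NOT NULL;
```

Run the same query on both servers and you can see at a glance which side is currently the principal.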

Read More...

Truncate rollback differences between SQL Server and Oracle

On June 13, 2007 I wrote a blog post about the common myth that you can't roll back a TRUNCATE statement in SQL Server, and it showed that you can in fact roll back a truncate. Here is some code that shows that. First create this very simple table:

CREATE TABLE dbo.TruncateTest (ID int IDENTITY PRIMARY KEY, SomeOtherCol varchar(49))
GO

Add the following two rows:

INSERT dbo.TruncateTest VALUES('1')
INSERT dbo.TruncateTest VALUES('1')

Now execute this whole block in one shot: you will see three resultsets, two of them will have two rows and one resultset will be empty.
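A minimal sketch of such a block, using the dbo.TruncateTest table from above, could look like this:

```sql
BEGIN TRAN
    SELECT * FROM dbo.TruncateTest   -- resultset 1: two rows
    TRUNCATE TABLE dbo.TruncateTest
    SELECT * FROM dbo.TruncateTest   -- resultset 2: empty
ROLLBACK TRAN

SELECT * FROM dbo.TruncateTest       -- resultset 3: two rows again
```

Because the TRUNCATE runs inside an explicit transaction, the ROLLBACK restores both rows: truncate is logged (as page deallocations), it just isn't logged row by row.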

Read More...

What is the Dual table in Oracle and why do I need it?

When coming from SQL Server, you might find it strange that you don't see code like this in Oracle:

select 2
select sysdate -- getdate() in SQL Server

Unlike in SQL Server, that code won't run in Oracle: Oracle requires a FROM clause in its SELECT syntax. This is why Oracle has the DUAL table. If you try to run something like select 2; you will get the following error
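A minimal sketch of the difference, assuming a standard Oracle installation:

```sql
-- SQL Server: both of these run fine without a FROM clause
select 2;
select getdate();

-- Oracle: a SELECT needs a FROM clause, so you select from DUAL,
-- the built-in one-row, one-column dummy table
select 2 from dual;
select sysdate from dual;   -- Oracle's equivalent of getdate()
```

DUAL exists precisely so you have something to select from when you only want to evaluate an expression.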

Read More...

Making an interface for plugwise with nancy

Introduction

People who follow me might have heard me complain about Plugwise and their poor service and late delivery, so I have mixed feelings about them, to say the least. But their product is awesome and it works. So what does Plugwise do, I hear you ask? Well, you plug a Plugwise unit into an electricity socket and then you plug your device into the Plugwise. You can now see how much electricity the device (or devices) is using, and you can turn it on or off from a distance or via a schedule. The bad part: their software only works on Windows and you need a ZigBee-compatible USB stick to communicate with the units. There are some efforts to make them work on Linux, and I even read that you can control them via the Raspberry Pi. This opens up lots of opportunities, one being that I can build my own web server to control the devices and then log in to that server from my phone, tablet or any other computer that has a browser. And today I got it working.

Read More...

Differences between Oracle and SQL Server when working with NULL and blank values

If you ever have to start working with Oracle, keep in mind that NULLs and blank values don't work exactly the same as in SQL Server. Let's take a look at some examples. Create this table and insert some rows:

create table TestNull(Col2 varchar(100));
insert into TestNull values(NULL);
insert into TestNull values('Bla');
insert into TestNull values('');
insert into TestNull values(' ');

As you can see, we inserted four rows: one NULL, one 'Bla', one empty string and one single space.
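As a teaser, using the TestNull table above, even a simple NULL count already behaves differently, because Oracle treats the empty string as NULL:

```sql
select count(*) from TestNull where Col2 is null;
-- SQL Server: 1  (only the row inserted as NULL)
-- Oracle:     2  (the NULL row and the '' row, since '' is NULL in Oracle)
```

The row containing a single space is not NULL in either product, which is one more reason to be careful when comparing or migrating string data between the two.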

Read More...

Creating MongoDB as a service on Windows 8

In addition to Scala, I decided to mess around with MongoDB as well. This post is about how to install MongoDB as a service on Windows 8 (by the way, this should work exactly the same on Windows 7). Here is what Wikipedia has to say about MongoDB:

MongoDB (from "humongous") is an open source document-oriented database system developed and supported by 10gen. It is part of the NoSQL family of database systems. Instead of storing data in tables as is done in a "classical" relational database, MongoDB stores structured data as JSON-like documents with dynamic schemas (MongoDB calls the format BSON), making the integration of data in certain types of applications easier and faster.

Read More...

Installing Scala 2.10 on Eclipse Juno

As part of my resolutions for the year I said I would write more blog posts and also get into different technologies, so I decided to take a look at Scala. Well, it turns out Scala 2.10 was released yesterday. But first, what is Scala anyway? From the Scala site:

Scala is a general purpose programming language designed to express common programming patterns in a concise, elegant, and type-safe way. It smoothly integrates features of object-oriented and functional languages, enabling Java and other programmers to be more productive. Code sizes are typically reduced by a factor of two to three when compared to an equivalent Java application.

Read More...

Book Review: Expert Performance Indexing for SQL Server 2012

I just finished reading Expert Performance Indexing for SQL Server 2012 by Jason Strate (blog | twitter) and Ted Krueger (blog | twitter). This is the only index-specific book for SQL Server that I know of, and it was a long-overdue resource. The progression of the book is very logical, from index fundamentals to special types of indexes, and from index maintenance to a method for analyzing and implementing changes. The examples and queries included are thorough yet understandable: you are given a solid foundation as to why you should do something, and the tools with which to do it. Another outstanding benefit of this book is that Jason and Ted have taken the knowledge gained in their years of working with business users and distilled it here. The book goes beyond the purely technical reasons for doing something and encourages you to think about the impact on the business, the applications, and the users behind the databases.

Read More...

Why you shouldn't use SELECT *

A customer’s DBA team created a checklist for the development teams with some best practices for writing proper T-SQL, and asked me to write some contributions for their tips document library. Since I'm doing the research and writing the documents anyway, I might as well post them here. The content may not be all sparkling and new, but the demand from customers shows there are still people out there having trouble finding the correct information.
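To illustrate the kind of tip in that checklist, here is a hedged before/after sketch; dbo.Customer and its columns are invented for the example:

```sql
-- Avoid: returns every column, silently changes shape when the schema
-- changes, and usually can't be satisfied by a narrow covering index
SELECT *
FROM   dbo.Customer;

-- Prefer: list exactly the columns the caller needs
SELECT CustomerID, FirstName, LastName
FROM   dbo.Customer;
```

The explicit list moves less data over the network, makes the query's contract obvious, and gives the optimizer a chance to answer it from a smaller index.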

Read More...