This month’s T-SQL Tuesday Wednesday topic is all about failure. To be exact, Crap Code. I’m not going to post any code but instead post the problem, solution and coding initiative that were taken. Code was the “crap” part that in all was the failure in this real-life event. For the record, the story I am about to tell everyone in this post, is a real-life story that I did early on in my career. Although it is my hope to always teach in order to prevent things like this particular failure to happen to all of you, I also have the belief that we learn from failure. Sometimes, a person just needs to have the reaction and heart pumping feeling from a complete failure that affects thousands of users, in order to harden how we need to approach anything.
For years the company has dealt with two systems that are completely disconnected but business continuity replies on the information of the systems to be compared, aggregated and reported on. You were just promoted to the sole .NET/Data Engineer in the company and quickly included on the project that will automate combing these data sources into one source for quick and efficient reporting.
The first meeting is in session and you get all the requirements in front of you. As you see all the requirements you recall recently using a trigger for the first time to handle data in and data out of a table. Immediately you spout out, “We could do this in a trigger in a weeks’ time”.
The team doesn’t even question you. You are the brain child for SQL Server and knowledge base for .NET as combined to anything that has Microsoft on it. See, the team is an SAP/Oracle team. They have no clue what the other side of the wall does or can do.
So you all agree on your recommendation. All is good in the data world and you set off to write the trigger.
I’m going to go over the way this project was attacked by non-other than, myself.
I took about twenty minutes to find the table in System A to determine the table that held the data that I needed pulled out and inserted into another database. Once I had the table and the actual columns, I wrote a trigger. The trigger was lengthy due to some business logic that was required. The finished trigger ended up being around 300 lines of Grade-A Transact SQL and something to be proud of. All development server tests went smooth and it was ready for production.
Going into Production
The trigger implementation into production was fairly easy. Execute the CREATE TRIGGER and wait for the data to start inserting into the new database. The script was executed and data started moving. Success!!!
The next day a clerk started reporting some performance problems with saving new records from the application that the new trigger was on. I dismissed the trigger and started looking at other things. After all, the application was a web site. We all know web sites are slow. In fact, after getting all the users out and performing an IISRESET, the application was working fine again for normal searches. Success!!!!
Twenty minutes later the clerk tried the new record screen again and the problem in fact was not fixed. Moreover, some of the other clerks started receiving errors when saving with a very cryptic error message about transactions failing. The trigger finally came under review and I started to look into handling the odd transaction error in the trigger. It was obvious it was the trigger, due to a specific column being references in the error.
I found XACT_ABORT and thought it is was a solution because it will roll back the entire transaction and not throw the ugly error. It works! Success!!!
It works for a little while and then piece-meal records are being inserted into the main systems database.
The problems here
We all see where this is going so we’ll stop and start to talk about this complete failure on my part.
First, should I have shouted out without giving much thought to the method of using a trigger?
Answer: Absolutely not! Simply because something is fresh in your mind does not mean it is appropriate for the next project. A skill learned is simply another skill in your arsenal. Each project should be approached with the mindset that it will require its unique method to achieve a successful implementation. Would a trigger be a possible solution? Possibly. However, given what I know now, do I even consider triggers? Not very often and we’ll get into why not.
Second, if you read closely, you would have caught this line, “The trigger was lengthy due to some business logic that was required”. OK, what’s the issue here?
Answer: If I could go back to 14-15 years ago, or whenever this real-life example happened, I’d kick my own ass.
One of the most critical aspects to database designs I’ve learned is: Business Logic has no place in T-SQL and most of all at the transactional level. Databases are meant to store data. T-SQL is a set based language that is downright horrid at conditional logic or implementing rules based on the business. Implementing rules based on data is different. Defaults, constraints…all are good to manage the storage of data. The performance that came from this 300 line trigger based on the business logic completely drowned the applications speed and ability to insert new records on this particular table. That table happened to be the most critical table and starting point to an entire ticketing process. So we can see how having it function optimally was critical.
Third, the entire project was a failure given the errors and problems that started appearing. Could this have been prevented even if a trigger was the answer?
Answer: YES!!! We read that I tested in development. That was a good thing. We should test our work. However, I tested in development and simply inserted a record here and there with no other tests. Load testing was critical and would have shown the issues that came up after this was sent to production. There should have been a normal development, user acceptance testing and then training protocols put into place.
Fourth (and worst): Was the use of XACT_ABORT to simply kill the transaction and roll it back to fix the nasty error the right thing to do?
Answer: XACT_ABORT may have been the option to set here to clean things up in the trigger but throwing it in and rolling that transaction back in the current transaction caused a massive data corruption problem. The application had started other transactions and had its own roll back process if something failed. So what I essentially did was trash the integrity of the database by doing this. I could have tested the use, validated the failure and data after that and then been satisfied.
What I learned?
I learned very quickly to keep my mouth shut when it comes to me “thinking” I have the answer. Discuss potential solutions and paths that can be reviewed as possible solutions. Research other possible solutions that may be in place already. Do the work and don’t commit to something as a solution simply because you were excited you recently learned it.
And I beg all of you; test your work under the normal use that the user community is putting on it. The worst person to test a program is a programmer. This goes the same for databases. The worst person to test an applications use of a database and results is a database administrator or developer. They will typically just write an INSERT and run it without using the application.