Login or Sign Up to become a member!
LessThanDot Sit Logo

LessThanDot

Data Management

Less Than Dot is a community of passionate IT professionals and enthusiasts dedicated to sharing technical knowledge, experience, and assistance. Inside you will find reference materials, interesting technical discussions, and expert tips and commentary. Once you register for an account you will have immediate access to the forums and all past articles and commentaries.

LTD Social Sitings

Lessthandot twitter Lessthandot Linkedin Lessthandot friendfeed Lessthandot facebook Lessthandot rss

Note: Watch for social icons on posts by your favorite authors to follow their postings on these and other social sites.

Your profile

Search

XML Feeds

Google Ads

« Script Task vs Flat File Connection ManagementDynamic SQL Server connections in Reporting Services »
comments
Rate Post:
submit to reddit Digg!FacebookDotnetkicks

It really can get bad…

Recently I had the pleasure of working on a new database introduced to the environment. This database brought with it the need for some critical reports to be developed in order to properly allow the new business plan to flow. Typically any database and software piece added to the business flow will cause the same two tasks to be deemed critical to the success of the implementation of the system. In these tasks, reporting will force you as either a DBA or DB Developer to become intimate with the database and how it is storing data. During that process of getting to know other database designs and storage engines all together, you will find bad and good practices. In my career I have seen the ugly and then the really ugly but I found on this particular implementation it could get even uglier.

Deal with it?

The sad truth to the matter is, third party injections into our stable environments is something you cannot prevent. Business entities buy software by the payload and will never stop doing that. It’s important to understand these entities are simply doing their part to make the business run better and make money in the long run. Of course this all means we end up dealing with the good, the bad and the ugly of software and database design.

So now that we all accept the fact we have no choice of dealing with these issues we can start to write about them. This give hope that we can help the database community and prevent the recreation of the practices. My own hopes from what I found in this one database are it will never cross your mind as a good idea. This design flaw truly was beyond ugly and bordering just under the hammer of the gun on bad.

The design uncovered…

After hours of searching the database to find critical data for reporting I realized there was little I could accomplish without contacting the vendor with my questions. The reply I received was nothing short of jaw dropping to where the data I needed really was being stored.

It’s important to note that this database sat on one of the top 3 enterprise database servers out there on the market. The database server is respected by any database professional at being able to handle enterprise level needs without question. In this case that only proved to help the situation by helping hide horror hiding under the covers.

So the data I was after was related to customer billing, item long descriptions, creation date and customer number itself. These should be few key data mining steps that should take minutes to determine locations and relations in any database. I’m going to limit this blog to those key columns but the design flowed into the entire database and tables. I took the time to populate my own fake table with data to show the issues.

Let’s take a look at my create table statement

  1. CREATE TABLE FINDME
  2. (
  3. KEYS CHAR(500)
  4. ,STRINGS CHAR(3000)
  5. ,TXT1 CHAR(150)
  6. ,TXT2 CHAR(100)
  7. ,DATE1 DATETIME
  8. ,DATE2 DATETIME
  9. ,DATE3 DATETIME
  10. ,DATE4 DATETIME
  11. ,DATE5 DATETIME
  12. )
  13.  
  14. INSERT INTO FINDME
  15. VALUES(
  16.         '9999999933333ABC94ORT               9432          CUST5     2SOP'
  17.         ,'MY CUSTOMER 5  ADDRESS1 NYYNTRUE    54324532'
  18.         ,'THIS IS MY PRODUCT DESCRIPTION THAT IS REALLY LONG'
  19.         ,' SO I NEED LOTS OF COLUMNS'
  20.         ,GETDATE()
  21.         ,GETDATE()-1
  22.         ,GETDATE()-10
  23.         ,GETDATE()+1
  24.         ,GETDATE()-2
  25.       )
  26.  
  27.  
  28. SELECT * FROM FINDME

We can already see that the first issue is the CHAR data types. That isn’t really the disturbing part though. Well, CHAR does scare me when I see it but let’s move beyond that for now. The problem is, you can immediately see the storage of dozens of unrelated and unique chunks of data held within the same columns. My career started off in development so I’m no stranger to using fixed field strings in my code. It’s a great way to pass parameters between logic in order to make passing those objects simplified and quickly maintained with a brief key so others know the fixed format and can extract the data as needed. Now put that into a database and we have a serious problem. There is no means to even the first form of normalization and you start to manipulate data not in result sets but case steps that cause performance issues.

Notice the CUST5 string. This CUST5 related back to CUSTOMER which holds other customer data. How are you going to join these tables? Given a normal primary to foreign relationship you would compose a query such as

  1. SELECT
  2.     a.CUSTID
  3.     ,b.CUSTID
  4. FROM
  5. FINDME a
  6. JOIN CUSTOMER b ON a.CUSTID = b.CUSTID

Let’s do it with the data as it is in our FINDME table now though. We’ll assume for this exercise that CUSTID is actually stored in its own column in CUSTOMER

  1. SELECT
  2.     SUBSTRING(a.KEYS,51,5)
  3.     ,b.CUSTID
  4. FROM
  5. FINDME a
  6. JOIN CUSTOMER b ON SUBSTRING(a.KEYS,51,5) = b.CUSTID

First query will obtain index seeks given the supporting indexes and even on high count tables will perform in milliseconds for you. Second has no hope whatsoever of being a query that will perform well. It is nonsargable, gives no patter for proper indexing and storage will become an issue across the table. So you are already struggling to just get your data out of the database.

Second problem is the storage of the description in multiple fields. By nature concatenation is slow. For that matter, any string manipulation in any development plan is slower than not. In order to get your long description out of this table you will be forced into the following

  1. SELECT
  2.     TXT1 + TXT2
  3. FROM
  4. FINDME

But wait, how do we know there is a leading space to form this string correctly? You don’t and you’ll have to test for it if you are a good developer.

Now there is a need to find the create date. So which is it? OK, yes. An ERD would give you the answer to that question and any good software company (Hint if that’s you) will provide the diagrams for you. What happens if the ERD is not available at audit time and the company you purchased this from is belly up? You are forced to start testing in the application to figure what data is going where. You will be entering data while watching profiler and viewing the tables to see the value you need. This will give you the best guess assessment of how to write your query. Who reading this has made those, “best guess” choices? I have and know they never work so please, don’t use those unless you fall into just that predicament.

Conclusion? Not really…

I only went over a few issues so far just with that one table. If you have designed databases before I’m sure you see about a dozen issues. I want to save the others for some follow up blogs as they are much bigger issues and showing the affects they have on the database server should be keyed in on. So by no means is this a conclusion on the “bad design” issues just found in this one instance. I do hope that from this and future blogs to come from me, that you don’t make these mistakes in your own designs.

About the Author

Ted Krueger is a SQL Server MVP and has been working in development and database administration for 13+ years. Specialties range from High Availability and Disaster / Recovery setup and testing methods down to custom assembly development for SQL Server Reporting Services. Ted blogs and is also one of the founders of LessThanDot.com technology community. Some of the articles focused on are Backup / Recovery, Security, SSIS and working on SQL Server and using all of the SQL Server features available to create stable and scalable database services. @onpnt
Social SitingsTwitterLinkedInLTD RSS Feed
3930 views
submit to reddit Digg!FacebookDotnetkicks

Comments and Feedback

14 comments

Comment from: Jamie Thomson [Visitor] · http://sqlblog.com/blogs/jamie_thomson/
I'm stunned. "mistakes" doesn't even come near near to describing this, "negligence" is more like it!
I would never have thought that systems could be built as badly as this, how long will it be until we see a software company being taken to court for providing s/w of such poor quality. They shouldn't be allowed to get away with it.

Its scandalous, really! Name and shame Ted, name and shame!
14/12/09 @ 09:00
Comment from: Arnie Rowland [Visitor] Email
I'll use this most excellent opportunity to plug the new MSDN/TechNet Database Design Forum. The Database Design Forum is Moderated by a group of experienced and dedicated MVPs and will be a great resource for exactly these types of issues.

Perhaps the DEV team for this product could be referred to the Database Design forum.

http://social.msdn.microsoft.com/Forums/en-US/databasedesign/threads
14/12/09 @ 11:10
Comment from: SQLDenis [Member] Email
--CELKO-- would love to scream at this 'design'

This is just basically a flat file dumped into a table....might as well use Excel instead :-)


I have seen once a customerid column which was char(5000) instead of 5

or a ManagerFor column where the people the manager managed would be a comma separated list of IDs...then you needed to parse this to get the names of all those people
14/12/09 @ 11:34
Comment from: sqlsister [Member] Email
I've noticed that all the commercial software where I have had the dubious pleasure of looking at their database structures were extremely, badly designed.

I see several reasons for this. First they think they should be able to use any database as a backend and thus design by avoiding better performing database specific commands.

Second, they don't have to run their own Enterprise software with 1000 users, so they never see the pain of the poor performance.

Third, they think they don't need to hire database specialists and rely on applications programmers to design the databases.

Fourth, they have taken that nonsense about premature optimization to mean that they don't have to worry about optimizing for performance at all.

At least this one had the field names in English, I had to figure out a backend written by a French company one time.

I agree with Jamie Thompson though, this design is pure negligence and incompetence.
14/12/09 @ 12:35
Comment from: Emtucifor [Member] Email
There is a huge rift between development and management. Management, because it can't SEE the cost on a balance sheet, ignores the costs of bad design.

IT and development, because it doesn't understand how to speak to management in terms that management will get excited about, accepts the quick fix it's ordered to implement. Or perhaps IT doesn't even know what sensible technology practices are--and management is just as ignorant as before so doesn't correct this problem with proper hiring and training practices.

The fact is that bad design incurs ongoing and likely increasing costs for the entire lifetime of the product. The choice as development progresses is between repeatedly paying down the run-up redesign debt through carefully-chosen continuous refactoring (the part management doesn't want to spend one cent on) or eventually either ending up with an unfixable system with huge hidden costs, or plunking down far more money than it otherwise would have taken to fix the system or design a new one.

It's so sickening to see the garbage in these enterprise-level application databases that cost millions of dollars.
14/12/09 @ 17:25
Comment from: riverguy [Member] Email
It's fortunate that the vendor at least provided you with some clues as far as how the data is stored. I've worked with several vendors in the past where these types of questions were never answered. Sometimes it's hard to get a question passed through technical support back over to the system architects.
15/12/09 @ 07:10
Comment from: Kermit [Member] Email
sqlsister: "Third, they think they don't need to hire database specialists and rely on applications programmers to design the databases."

Hit the nail on the head, this is the most likely cause of what you are seeing with that database and not always a failing of the database engine (but can be a factor).

The developers have very little experience, time or the required resource (DBA, someone other than a coder) in-house to deal with issues like good database design (A copy of 'Database Design For Mere Mortals' on a desk wouldn't hurt) and data normalisation.

16/12/09 @ 06:38
Comment from: traingamer [Member] Email
Even database rookies know enough to put a FK in its own column.

What an inherently inept structure. I can't even fathom how someone could 'design' something that ugly. (Bring in the 'Man with no name' to finish it off.)
16/12/09 @ 11:54
Comment from: Steve [Visitor]
I'll repeat what others have said, where's the design? I stopped at " KEYS CHAR(500)." From that alone I knew it was hopeless.
16/12/09 @ 12:22
Comment from: jason [Visitor]
I'm not a dba. I'm a programmer. I've got about 6 years experience. To put that into perspective, I would have cringed at this design after my first 8 months out of college. I had worked with a database that wasn't close to normalized. Foreign key's matched on differing names and weren't numeric a lot of the time.

I've actually worked on projects where the tables really were a series of charA, charB, intA, intB, dateA, dateB, on and on. It was a nightmare as we had to magically just know about what data was intended to be stored in what field. Maybe that should be called Copperfield Programming.

*cringe*

Maybe a solution to get management to understand is to write up a fake cost sheet for the project, give it to them, let them sweat for a day or two, and then fix it by showing how much less it is by putting in some forethought and, maybe, some more money into it up front.
17/12/09 @ 07:04
Comment from: kevinK [Visitor]
Don't just blame outside packages.
I worked as a DBA, a group asked for help with the system they were developing.
They had not had any DBA involvement to date, they were an "elite" team, above the rules everyone else developed with.

So their system had fields called "Data", "Date", "Number" to name a few, you can extrapolate what the rest of the DB looked like.

After attempting to work with them to improve the design, fighting with them, even bringing it up to their CIO (we had 2) who I had much respect for. I gave up and helped the as best as I could.

As a final note 6 months before implementation, before development was even finished / stable, they ordered low 6 figures worth of hardware and software to cluster their SQL server etc.
Remember 'before development was even finished / stable', as it turned out they were never able to get their system to work, it was scaled down drastically and the clustered systems ordered were never fully utilized.

This was effectively an offline system that ran overnite, so I never was able to get a good answer why they needed to cluster them.

But they were an 'Elite' team.
20/12/09 @ 13:56
Comment from: Troy Doering [Visitor]
I to have had the pleasure of working with poorly designed DB. A well designed Db should just be common sense.
20/12/09 @ 14:10
Comment from: fregatePallada [Visitor]
G'day onpnt,

Thank you for interesting article - I had experience with HORROR DB story:
- primary key in one table consists of 5 (!) fields, two of them dates (suppose to represent a date range, second date is NULLable to highlight the fact that this is a current period);
- other tables that suppose to reference a table above sometimes have 3 or 4 fields that suppose to match PK in first table;
- No referential integrity existed whatsoever.

All my attempts to get message to management and Date Architect had been met with BLANK facial expression and answer "We are running a profitable business for last 15 years. And who do you think you are to tell a that our data is wrong".

What we did:
- provided 20+ SQL queries that highlighted the most obvious data issues with explanations what does it translates in business terms and impact in dozen of thousand of BAD records (after that our client spent 6 month to clean operational data);
- created a VALIDATION layer that performs sanity data check on ...reading data from DB (X-files message - trust NO ONE).
02/01/10 @ 19:42
Comment from: Christian Doran [Visitor]
This has got to be the result of doing an upgrade of an old flat-file based system presumably using source code conversion software; not even the worst programmers I have ever worked with could design it that badly!
04/01/10 @ 04:24

Leave a comment


Your email address will not be revealed on this site.

Your URL will be displayed.
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)