Login or Sign Up to become a member!
LessThanDot Sit Logo

LessThanDot

Data Management

Less Than Dot is a community of passionate IT professionals and enthusiasts dedicated to sharing technical knowledge, experience, and assistance. Inside you will find reference materials, interesting technical discussions, and expert tips and commentary. Once you register for an account you will have immediate access to the forums and all past articles and commentaries.

LTD Social Sitings

Lessthandot twitter Lessthandot Linkedin Lessthandot friendfeed Lessthandot facebook Lessthandot rss

Note: Watch for social icons on posts by your favorite authors to follow their postings on these and other social sites.

Your profile

    Search

    XML Feeds

    Google Ads

    « Madison SQL Server User GroupCheatsheet: Relational Database Management Systems Comparison »
    comments

    Undeniably, the largest impact a database administrator can have on a database server’s performance is with indexing.  One index can mean the difference between a 30 minute overall time of execution or a few seconds overall time of execution.  At the same time, indexing can have a negative impact on overall database server performance.  This negative impact can be from too many indexes, overlapping indexes, or indexes that are not defined correctly.

    Jason Strate (Blog | Twitter) has written an index analysis script that can assist in showing statistics for all types of index-related topics including missing indexes, foreign key indexing, alignment of indexes, and overall performance of an index.  To further the value of this script Jason has developed, we can automate the use of the script by utilizing SQL Server Integration Services to execute the script and export the results to a file that can be later analyzed.  This gives us analysis over time, proactive analysis on a database that may change often for query needs and retention of performance impacts from that analysis for later review.

    Designing an SSIS Package

    The design of the SSIS package that will use Jason’s script involves utilizing dynamic features and methods.  This dynamic nature of the package will allow for the script to be executed on several SQL Server Instances and several databases that are located on each of those instances.

    The Design

    The SSIS package will first use an external file that will have a list of all the servers to collect the index statistics from.   In the first process, the file will be read and a data table used to populate the list into a variable of an object data type.  This method is commonly referred to in SSIS as Shredding Variables.

    Once the variable is loaded with the list of servers from the file in a table format, a Foreach Loop Container is utilized to perform the actual shredding task.  For each row in the table contained in the variable, the internal processing will occur.

    The internal processing of the Foreach Loop will consist of an Execute T-SQL Task to execute Jason’s script based on the instance the Foreach Loop is focused on (or indexed currently), which will pass the results to another Foreach Loop Container that will process each row returned and insert the row into a flat file destination.

    The SSIS Control Flow

    With this design and method used, there are a few bottlenecks that appear.  Later we will look at these bottlenecks and put another design together that will allow a higher scalable SSIS package and a direct data flow.  The design above has many methods that are valuable and will be shown.

    Shredding Variables

    With SSIS, configurations can be used to allow a package to become more dynamic with the SQL Server Instance it uses and databases in those instances it connects directly to.  In some cases, multiple instances are needed in a situation where the package should iterate through each instance and perform the same task on each instance.  To do this, flat files are often a stable and easy solution for other database administrators to implement and alter.

    The flat file connection used in this package connects to a text file containing a list of SQL Server Instances formatted as follows.

    ONPNT\RC0

    ONPNT\RC0_Mirror

    ONPNT

    In order to utilize this file and the list, the first task in the package is a script task.  This script task opens the file, reads the contents and populates a data table with them.  This data table is then set to a variable that is an object type.  The code in the demonstration to do this is shown below.

    1. try
    2. {
    3.     string Serverlisting;
    4.     DataTable dt = new DataTable();
    5.     dt.Columns.Add("Server");
    6.  
    7.     Serverlisting = (Dts.Connections["ServerList"].AcquireConnection(Dts.Transaction) as String);
    8.  
    9.     using (System.IO.StreamReader sr = new System.IO.StreamReader(Serverlisting))
    10.     {
    11.         string line;
    12.         while ((line = sr.ReadLine()) != null)
    13.         {
    14.             DataRow newRow = dt.NewRow();
    15.             newRow["Server"] = line.ToString();
    16.             dt.Rows.Add(newRow);
    17.         }
    18.     }
    19.     Dts.Variables["ServerListing"].Value = dt;
    20. }
    21. catch (Exception ex)
    22. {
    23.     MessageBox.Show(ex.Message.ToString());
    24. }
    25.            
    26. Dts.TaskResult = (int)ScriptResults.Success;
    27. }

    To complete the shredding of the variable process, a Foreach Loop Container is used with an enumerator, Foreach ADO Enumerator.  This is the same method of parsing record sets used in past years with scripting languages.   Once the Foreach ADO Enumerator is set, the ADO object source variable option is made available.  Select the variable to be read in this drop down list.  The last major configuration is to set one or more variables that will be populated by the enumeration of each row in the object variable.  In the case of the container that will read the server list, only one column per row is available.  So only one variable mapping is required.

     

    For a detailed article on shredding variables, read “Using Variables as a source in a Data Flow Component” by Josef Richberg (Blog | Twitter).

     

    Data Flow

    With the method and designs we have chosen to do this SSIS package, the data flow that will read the data and export to the file, will need to utilize a script component as the source.  The script component as a source in a data flow task allows us to perform several code specific transformations, such aslogic or even data extraction.  Earlier we shredded the secondary object variable into several matching variables.  Those variables are now required to be the source in an output format.  This output format is then passed to the destination.  Before being able to set the variables as output values, the output columns need to be defined.  For each variable, or column that is needed as an output column, the column needs to be defined in the Inputs and Outputs page of the script transformation editor.

    Using the Add Column button, add a column into the Output Columns for each matching variable that was created and populated from the Foreach Loop over the object variable.

    Remember to ensure the length and data types of these columns are accurate.  Lengths that are too small can cause truncation errors and lengths that are too large can cause unwanted performance problems.  In some cases, data types can be explicitly converted at the time of the buffer moving in the data flow, but this value should match to ensure no performance problems or errors on casting the data types occur.  Remember, casting from unicode to non-unicode will always generate an error.  In flat file and many systems or other data sources, Unicode will be the default.

     

    After the output columns are defined, the next step is to add the variables as write available variables to the script component and then add the code needed to move the variables to the output columns or output buffer.

    1. IDTSVariables100 var = null;
    2.  
    3. VariableDispenser.LockForWrite("User::buffer_mb");
    4. VariableDispenser.LockForWrite("User::duplicate_indexes");
    5. VariableDispenser.LockForWrite("User::has_unique");
    6. VariableDispenser.LockForWrite("User::index_action");
    7. VariableDispenser.LockForWrite("User::index_name");
    8. VariableDispenser.LockForWrite("User::indexed_columns");
    9. VariableDispenser.LockForWrite("User::is_unique");
    10. VariableDispenser.LockForWrite("User::object_name");
    11. VariableDispenser.LockForWrite("User::overlapping_indexes");
    12. VariableDispenser.LockForWrite("User::pct_in_buffer");
    13. VariableDispenser.LockForWrite("User::related_foreign_keys");
    14. VariableDispenser.LockForWrite("User::related_foreign_keys_xml");
    15. VariableDispenser.LockForWrite("User::row_block_pct");
    16. VariableDispenser.LockForWrite("User::row_count");
    17. VariableDispenser.LockForWrite("User::size_in_mb");
    18. VariableDispenser.LockForWrite("User::table_buffer_mb");
    19. VariableDispenser.LockForWrite("User::type_desc");
    20. VariableDispenser.LockForWrite("User::user_lookups");
    21. VariableDispenser.LockForWrite("User::user_scans");
    22. VariableDispenser.LockForWrite("User::user_seeks");
    23. VariableDispenser.LockForWrite("User::user_total_pct");
    24. VariableDispenser.LockForWrite("User::included_columns");
    25. VariableDispenser.GetVariables(out var);
    26.  
    27. Output0Buffer.AddRow();
    28.  
    29. Output0Buffer.buffermb = (String)var["User::buffer_mb"].Value;
    30. Output0Buffer.duplicateindexes = (String)var["User::duplicate_indexes"].Value;
    31. Output0Buffer.hasunique = (Int32)var["User::has_unique"].Value;
    32. Output0Buffer.indexaction = (String)var["User::index_action"].Value;
    33. Output0Buffer.indexname = (String)var["User::index_name"].Value;
    34. Output0Buffer.indexedcolumns = (String)var["User::indexed_columns"].Value;
    35. Output0Buffer.isunique = (Int32)var["User::is_unique"].Value;
    36. Output0Buffer.objectname = (String)var["User::object_name"].Value;
    37. Output0Buffer.overlappingindexes = (String)var["User::overlapping_indexes"].Value;
    38. Output0Buffer.pctinbuffer = (String)var["User::pct_in_buffer"].Value;
    39. Output0Buffer.relatedforeignkeys = (String)var["User::related_foreign_keys"].Value;
    40. Output0Buffer.relatedforeignkeysxml = (String)var["User::related_foreign_keys_xml"].Value;
    41. Output0Buffer.rowblockpct = (String)var["User::row_block_pct"].Value;
    42. Output0Buffer.rowcount = (String)var["User::row_count"].Value;
    43. Output0Buffer.sizeinmb = (String)var["User::size_in_mb"].Value;
    44. Output0Buffer.tablebuffermb = (String)var["User::table_buffer_mb"].Value;
    45. Output0Buffer.typedesc = (String)var["User::type_desc"].Value;
    46. Output0Buffer.userlookups = (String)var["User::user_lookups"].Value;
    47. Output0Buffer.userscans = (String)var["User::user_scans"].Value;
    48. Output0Buffer.userseeks = (String)var["User::user_seeks"].Value;
    49. Output0Buffer.usertotalpct = (String)var["User::user_total_pct"].Value;
    50. Output0Buffer.includedcolumns = (String)var["User::included_columns"].Value;
    51.  
    52. var.Unlock();
    53. Output0Buffer.SetEndOfRowset();

    The code above locks all the variables and then assigns them accordingly to the output buffer columns.  One completed, the variables are unlocked and the row is sent through.

     

    At this point, mapping the output columns to the flat file destination columns is all that is needed.

    Configuration File

    Add a configuration file to the package to complete the dynamic and mobile setup of the package.  The configuration file will take the server list and database list file names.  It should also contain a default database, in most cases master, for administrative operations.

     

    For detailed information on using and setting up configuration files, refer to “Making SSIS Dynamic: Configuration Management

     

    Execution

    This package runs well and performed the exact task that was required of it.  The shredding of the object variables completed, the output buffer created for each row and loaded into the file destination.  The files were also created for each task that we wanted based on each SQL Server Instance.

    Review and Problems with this Design

    In review, we’ve talked about how to do several things to make this package possible.

    1. Dynamic file lists for reusable package executions on several SQL Server Instances
    2. Shred variables and map them to multiple variables for each column returned by using the Foreach Loop Container and the ADO Enumerator
    3. Script component as sources and populating the output buffer with variables.

     

    Although this design works well on databases with a lower number of indexes and databases used less frequently, the design is inherently poor on performance overall if a large database is a factor.  In the tests I made, the least utilized and smallest database resulted in a relatively quick total execution time.  Using a larger more realistic database, the total execution time suffered drastically from the design.

    Based on three executions of the package for each instance set in the Server List file.

     

    We went through this design to show methods that could be used.  We also went through the design to show that even when coming from something that was successfully execute and results validated, the overall achievement and performance factor was poor.  Due to those factors, this package will be redesigned to better use SQL Server and the Data Flow task in SSIS.  In the next article we will build that package from scratch and show each step that is involved.

     

    Use this design when there is a need.  Several cases require the shredding of variables and utilizing them as a data source.  Be sure that when utilizing either a row-by-row operations or a one row in type situation, to measure your performance on all database sizes.

    About the Author

    Ted Krueger is a SQL Server MVP and has been working in development and database administration for 13+ years. Specialties range from High Availability and Disaster / Recovery setup and testing methods down to custom assembly development for SQL Server Reporting Services. Ted blogs and is also one of the founders of LessThanDot.com technology community. Some of the articles focused on are Backup / Recovery, Security, SSIS and working on SQL Server and using all of the SQL Server features available to create stable and scalable database services. @onpnt Personal Blog over at http://onpnt.wordpress.com/
    Social SitingsTwitterLinkedInLTD RSS Feed
    1413 views
    InstapaperVote on HN

    8 comments

    Comment from: Christiaan Baes (chrissie1) [Member]
    Christiaan Baes (chrissie1) You might need to rethink that first C# example.
    01/25/12 @ 06:39
    Comment from: Ted Krueger (onpnt) [Member]
    Ted Krueger (onpnt) I'm very open to better ways to write code I don't get to play with often, my friend :) That's what I have you for. to show me the better light!
    01/25/12 @ 06:53
    Comment from: Keith Mescha [Visitor]
    Keith Mescha Ted, any reason you used a script task to read in the file of servers versus just a flat file data source control? Wondering if there is some benefit to that approach.
    01/25/12 @ 07:02
    Comment from: Ted Krueger (onpnt) [Member]
    Ted Krueger (onpnt) Hey Keith, error handling is usually the reasoning but I'm not following how you would do that with a flat file source without in an out of the pipe? (multiple DFs and ending up still going back to the control flow to still enumerate each one)

    Sounds interesting actually. Off to try it out :)
    01/25/12 @ 07:12
    Comment from: Ted Krueger (onpnt) [Member]
    Ted Krueger (onpnt) On a side note. The follow up blog to this one and a MUCH better way to do this will be posted shortly. It executes in seconds and the bottleneck or highest utilization that shows in that design is the index script itself which won't have much tuning available. This can all be done with an execute sql task and DF (with the same script task to dynamically hit multiple servers and databases).
    01/25/12 @ 07:15
    Comment from: SQLDenis [Member] Email
    SQLDenis I have a way to this in PowerShell as well


    here is what you need to do

    save this as IndexAnalysis.ps1

    ###############################################
    Param($InstanceName,$DatabaseName)

    Add-PSSnapin SqlServerCmdletSnapin100
    Add-PSSnapin SqlServerProviderSnapin100



    $FilePath = "C:\PowerShellOutput\" + $InstanceName+ "_" + $DatabaseName + "_IndexAnalysis2005.csv"


    Invoke-Sqlcmd -ServerInstance $InstanceName -Database $DatabaseName -Query "
    EXEC sp_IndexAnalysis2005" |
    export-csv $FilePath

    ###############################################

    now call it like this, change "D:\SVN\Powershell" to where your path is

    powershell.exe "D:\SVN\Powershell\IndexAnalysis.ps1" -InstanceName LocalHost -DatabaseName SomeDatabase


    So then the thing left to do is to loop over the servers and databases that you want. Stick it in a table or file and it should be pretty eay

    Maybe I should make it into a blog post?
    01/25/12 @ 10:08
    Comment from: Ted Krueger (onpnt) [Member]
    Ted Krueger (onpnt) Absolutely make it into a blog. I'd add scheduling it also. Very nice and thanks!
    01/25/12 @ 11:47
    Comment from: bini [Visitor]
    bini Nice, is there any way we can download the whole package and try it out ?!
    02/28/12 @ 13:01

    Leave a comment


    Your email address will not be revealed on this site.

    To mislead the spambots.

    Your URL will be displayed.
    (Line breaks become <br />)
    (Name, email & website)
    (Allow users to contact you through a message form (your email will not be revealed.)