Login or Sign Up to become a member!
LessThanDot Sit Logo

LessThanDot

Data Management

Less Than Dot is a community of passionate IT professionals and enthusiasts dedicated to sharing technical knowledge, experience, and assistance. Inside you will find reference materials, interesting technical discussions, and expert tips and commentary. Once you register for an account you will have immediate access to the forums and all past articles and commentaries.

LTD Social Sitings

Lessthandot twitter Lessthandot Linkedin Lessthandot friendfeed Lessthandot facebook Lessthandot rss

Note: Watch for social icons on posts by your favorite authors to follow their postings on these and other social sites.

Your profile

    Search

    XML Feeds

    Google Ads

    « Dealing with The multi-part identifier "dbo.Table.Column" could not be bound. error in an update statementThe Lazy DBA Series: Wizards! »
    comments

    With this blog post I am hoping to start a new series of blogs devoted to the interesting T-SQL problems I encounter in forums during the week.

    The idea of this blog series came to me on Wednesday night when I thought I solved a complex problem...

    First Problem - Divide data into 15 min. time intervals

    The first problem, I'd like to discuss, is found in this MSDN thread:

    Given this table

    SalesDateTimeSalesAmount
    2010-08-10 00:05:1258.22
    2010-08-10 00:08:2221.10
    2010-08-10 00:09:388.45
    2010-08-10 00:18:049.52
    2010-08-10 00:19:5645.20
    2010-08-10 11:35:1547.12
    2010-08-10 11:36:1288.55
    2010-08-10 11:40:3145.12

    find the Average Sale amount for the 15 minutes time interval.

    The first idea, that comes to mind, of how to solve this problem, is to use integer math in T-SQL. In T-SQL, unlike other languages, when you divide one integer by another integer, you get an integer in return.

    So, if we divide datepart(minute,SalesDateTime) by 15, we will organize the data into the 15 minutes intervals.

    Now, my first idea was to use datepart function to get year, month, day, hour portions of the date and then construct the date based on these parts using concatenation and string functions. Later, upon thinking, I realized we can use CONVERT() function to grab first portion of the datetime field (up to hours), then add Group * 15 to get the minute part and then convert back to datetime. While I was writing this solution and testing it, Tom Cooper came up with the simpler idea based on the datediff function and fixed date.

    I list both of the solutions below:

    1. declare @Sales table (SalesDateTime datetime,     SalesAmount decimal(10,2))
    2. insert into @Sales
    3. select
    4.  
    5. '2010-08-10 00:05:12',   58.22
    6. union all select
    7. '2010-08-10 00:08:22',   21.10
    8. union all select
    9. '2010-08-10 00:09:38',   8.45
    10.  
    11.  union all select
    12.  
    13. '2010-08-10 00:18:04',   9.52
    14. union all select
    15. '2010-08-10 00:19:56',   45.20
    16.  
    17.  union all select
    18.  
    19. '2010-08-10 11:35:15',   47.12
    20. union all select
    21. '2010-08-10 11:36:12',   88.55
    22. union all select
    23. '2010-08-10 11:40:31',   45.12
    24. union all select
    25. '2010-08-10 11:52:31', 23.45
    26.  
    27. -- Naomi's query
    28.  
    29. ;with cte as (select MIN(SalesDateTime) as MinDate,
    30. MAX(SalesDateTime) as MaxDate,
    31. convert(varchar(14),SalesDateTime, 120) as StartDate, DATEPART(minute, SalesDateTime) /15 as GroupID,
    32. avg(SalesAmount) as AvgAmount
    33. from @Sales
    34. group by convert(varchar(14),SalesDateTime, 120),
    35. DATEPART(minute, SalesDateTime) /15)
    36.  
    37. select dateadd(minute, 15*GroupID, CONVERT(datetime,StartDate+'00')) as [Start Date],
    38. dateadd(minute, 15*(GroupID+1), CONVERT(datetime,StartDate+'00')) as [End Date],
    39. cast(AvgAmount as decimal(12,2)) as [Average Amount],
    40. MinDate as [Min Date], MaxDate as [Max Date]
    41. from cte
    42.  
    43. -- Tom Cooper's query
    44. ;With cte As
    45. (Select DateAdd(minute, 15 * (DateDiff(minute, '20000101', SalesDateTime) / 15), '20000101') As SalesDateTime, SalesAmount
    46. From @Sales)
    47. Select SalesDateTime, Cast(Avg(SalesAmount) As decimal(12,2)) As AvegSalesAmount
    48. From cte
    49. Group By SalesDateTime;

    See also a different and simpler approach suggested by Celko in this thread of using Time based Calendar table.



    Second Problem - Find overlapping ranges

    The second problem is really a gem and it is presented in How can I find overlapped ranges? thread:

    Name Chromosome Start End
    N1 chr3 17443613 17443685
    N2 chr3 17443521 17443685
    N3 chr2 162180459 162180499
    N343 chr2 131865573 131865687
    N34 chr16 34623393 34623610
    N2456 chr3 17443512 17443685
    N43243 chr3 17443608 17443685

    The whole table is about 31,000 records (includes 22 chromosomes + X, Y chromosomes).

    The problem is to find the overlapped regions (or ranges) and these overlapped records have to be on the same Chromosome of this table.

    -------------
    I saw this problem at ~11pm on Wednesday night, thought I solved it and that's when the idea of this blog came to my mind.

    The next day, however, based on Hunchback's (Alejandro Mesa) comment I realized, that my "solution" worked only for a single overlapping range for the same chromosome. If we have multiple overlapping ranges, that solution will not work. I spent ~ an hour trying to fix my idea for multiple overlapping ranges, but gave up at the end, as I had work to do. So, I will show two ingenious solutions of this problem presented in that thread by (Peso) Peter Larsson.

    First let's create the test table with 100K records:

    1. CREATE TABLE [dbo].[Chromosomes](
    2.     [Name] [varchar](10) NOT NULL,
    3.     [Chromosome] [varchar](10) NOT NULL,
    4.     [iStart] [int] NOT NULL,
    5.     [iEnd] [int] NOT NULL
    6. )
    7. Go
    8. CREATE NONCLUSTERED INDEX [IX_Chromosomes] ON [dbo].[Chromosomes]
    9. (
    10.     [Chromosome] ASC,
    11.     [iStart] ASC,
    12.     [iEnd] ASC
    13. )
    14. INCLUDE ( [Name])
    15. insert into Chromosomes
    16. SELECT    'N' + CAST(ABS(CHECKSUM(NEWID())) % 5000 AS VARCHAR(12)) AS Name,
    17.     'chr' + CAST(ABS(CHECKSUM(NEWID())) % 22 AS VARCHAR(12)) AS Chromosome,
    18.     iStart,
    19.     iStart + ABS(CHECKSUM(NEWID())) % 10000 AS iEnd
    20. FROM    (
    21.         SELECT    TOP(100000)
    22.             ABS(CHECKSUM(NEWID())) % 200000000 AS iStart
    23.         FROM    Tally
    24.     ) AS d
    25.    
    26. select * from Chromosomes

    The first solution uses cursor and takes ~27 sec. to execute:

    1. -- Cursor based idea
    2. set nocount on
    3. declare @TimeStart datetime = getdate()
    4. CREATE TABLE    #Work
    5.         (
    6.             Chromosome VARCHAR(10) NOT NULL,
    7.             FromNum INT NOT NULL,
    8.             ToNum INT NOT NULL,
    9.             Names VARCHAR(MAX) NOT NULL,
    10.             Items INT NOT NULL
    11.         )
    12.  
    13. CREATE CLUSTERED INDEX IX_Chromosome ON #Work (Chromosome, FromNum)
    14.  
    15. DECLARE    curWork CURSOR READ_ONLY FOR    SELECT        Chromosome,
    16.                             iStart,
    17.                             iEnd,
    18.                             Name
    19.                     FROM        dbo.Chromosomes
    20.                     ORDER BY    Chromosome,
    21.                             iStart
    22.  
    23. DECLARE    @FromNum INT,
    24.     @ToNum INT,
    25.     @Chromosome VARCHAR(10),
    26.     @Name VARCHAR(10)
    27.  
    28. OPEN    curWork
    29.  
    30. FETCH    NEXT
    31. FROM    curWork
    32. INTO    @Chromosome,
    33.     @FromNum,
    34.     @ToNum,
    35.     @Name
    36.  
    37. WHILE @@FETCH_STATUS = 0
    38.     BEGIN
    39.         UPDATE    #Work
    40.         SET    FromNum = CASE WHEN FromNum < @FromNum THEN FromNum ELSE @FromNum END,
    41.             ToNum = CASE WHEN ToNum < @ToNum THEN @ToNum ELSE ToNum END,
    42.             Names = CASE WHEN Names LIKE '%, ' + @Name + ', %' THEN Names ELSE Names + ', ' + @Name END,
    43.             Items = Items + 1
    44.         WHERE    Chromosome = @Chromosome
    45.             AND FromNum <= @ToNum
    46.             AND ToNum >= @FromNum
    47.  
    48.         IF @@ROWCOUNT = 0
    49.             INSERT    #Work
    50.                 (
    51.                     Chromosome,
    52.                     FromNum,
    53.                     ToNum,
    54.                     Names,
    55.                     Items
    56.                 )
    57.             VALUES    (
    58.                     @Chromosome,
    59.                     @FromNum,
    60.                     @ToNum,
    61.                     @Name,
    62.                     1
    63.                 )
    64.  
    65.         FETCH    NEXT
    66.         FROM    curWork
    67.         INTO    @Chromosome,
    68.             @FromNum,
    69.             @ToNum,
    70.             @Name
    71.     END
    72.  
    73. CLOSE        curWork
    74. DEALLOCATE    curWork
    75.  
    76. SELECT        Chromosome,
    77.         FromNum,
    78.         ToNum,
    79.         Names,
    80.         Items
    81. FROM        #Work
    82. --WHERE        Items > 1    -- Uncomment this line to get only the overlapping ranges
    83. ORDER BY    FromNum
    84.  
    85. DROP TABLE #Work
    86. print 'Time elapsed (sec): ' + convert(varchar(30),datediff(second, @TimeStart, getdate()))

    Set based solution based on the quirky update idea - it takes ~5 second to execute:

    1. set statistics time off
    2. set nocount on
    3. declare @TimeStart datetime = getdate()
    4. SELECT  Name,
    5.     Chromosome,
    6.     iStart,
    7.     iEnd,
    8.     0 AS Grp
    9. INTO    #Temp
    10. FROM    dbo.Chromosomes
    11.  
    12. DECLARE @Grp INT = 0,
    13.     @Chromosome VARCHAR(10) = '',
    14.     @End INT = 0
    15.  
    16. ;WITH cteUpdate(Chromosome, iStart, iEnd, Grp)
    17. AS (
    18.     SELECT      TOP(2147483647)
    19.             Chromosome,
    20.             iStart,
    21.             iEnd,
    22.             Grp
    23.     FROM        #Temp
    24.     ORDER BY    Chromosome,
    25.             iStart
    26. )
    27. -- Quirky update - updating variable and field at the same time
    28. UPDATE  cteUpdate
    29. SET @Grp = Grp =    CASE
    30.                 WHEN Chromosome <> @Chromosome THEN @Grp + 1
    31.                 WHEN @End < iStart THEN @Grp + 1
    32.                 ELSE @Grp
    33.             END,
    34.     @EndCASE
    35.             WHEN Chromosome <> @Chromosome THEN iEnd
    36.             WHEN iEnd < @End THEN @End
    37.             ELSE iEnd
    38.         END,
    39.     @Chromosome = Chromosome
    40.  
    41. SELECT      Chromosome,
    42.         MIN(iStart) AS FromNum,
    43.         MAX(iEnd) AS ToNum,
    44.         CAST(MIN(Name) AS VARCHAR(MAX)) AS Names,
    45.         COUNT(*) AS Items,
    46.         Grp
    47. INTO        #Stage
    48. FROM        #Temp
    49. GROUP BY    Chromosome,
    50.         Grp
    51.  
    52. UPDATE      s
    53. SET     s.Names += f.Names
    54. FROM        #Stage AS s
    55. CROSS APPLY (
    56.             SELECT DISTINCT ', ' + x.Name
    57.             FROM        #Temp AS x
    58.             WHERE       x.Grp = s.Grp
    59.                     AND x.Name > s.Names
    60.             FOR XML     PATH('')
    61.         ) AS f(Names)
    62. WHERE       s.Items > 1
    63.  
    64. SELECT      Chromosome,
    65.         FromNum,
    66.         ToNum,
    67.         Names,
    68.         Items
    69. FROM        #Stage
    70. ORDER BY    Chromosome,
    71.         FromNum
    72.  
    73. DROP TABLE  #Temp,
    74.         #Stage
    75.        
    76. print 'Time elapsed (sec): ' + convert(varchar(30),datediff(second, @TimeStart, getdate()))


    Third problem - Transpose columns to rows

    The third problem is simple enough, but yet quite interesting. It is presented in the following Vertical Result Set thread

    Transpose table of any structure vertically (in other words, each column becomes a row) and show just a few typical data for every column.

    First problem is to convert the data into the same format. I chose nvarchar(max), but it may not be 100% working solution if there are columns with exotic data types that can not be converted to nvarchax(max). In this case we may want to use sql_variant as the type instead.

    Here is the solution I used based on AdventureWorks.Person.Address table:

    1. declare @SQL nvarchar(max)
    2.  
    3. select @SQL = coalesce(@SQL + '
    4. FROM AdventureWorks.Person.Address
    5. UNION ALL
    6. ','') + 'SELECT convert(nvarchar(max),' + quotename(column_name) + ') as Column_Value, ' +
    7. QUOTENAME(Column_Name,'''') + ' as Column_Name'
    8.  from AdventureWorks.INFORMATION_SCHEMA.COLUMNS
    9. where TABLE_NAME = 'Address' and TABLE_SCHEMA = 'Person'
    10. --print @cols
    11.  
    12.  
    13. set @SQL =' ;with cte as (' + @SQL + '
    14. FROM AdventureWorks.Person.Address)
    15. -- get 10 records per each column
    16. select * from (select *,
    17. row_number() over (partition by Column_Name order by Column_Value) as row
    18. from cte) X
    19. where Row <=10'
    20. order by Row, Column_Name
    21. --print @SQL
    22.  
    23. execute(@SQL)

    And here is Hunchback's (Alejandro Mesa) solution using SQL Server 2008 specific syntax:

    1. SELECT
    2.     C.*
    3. FROM
    4.     (
    5.  SELECT TOP (3)
    6.      AddressID,
    7.      AddressLine1,
    8.      AddressLine2,
    9.      City,
    10.      StateProvinceID,
    11.      PostalCode
    12.  FROM
    13.      Person.Address
    14.  ORDER BY
    15.   ModifiedDate
    16.  ) AS T
    17.  CROSS APPLY
    18.  (
    19.  VALUES
    20.   (AddressID, 'AddressID', CAST(AddressID AS sql_variant)),
    21.   (AddressID, 'AddressLine1', CAST(AddressLine1 AS sql_variant)),
    22.   (AddressID, 'AddressLine2', CAST(AddressLine2 AS sql_variant)),
    23.   (AddressID, 'City', CAST(City AS sql_variant)),
    24.   (AddressID, 'StateProvinceID', CAST(StateProvinceID AS sql_variant)),
    25.   (AddressID, 'PostalCode', CAST(PostalCode AS sql_variant))
    26.  ) AS C(rowident, cn, cv)
    27. ORDER BY
    28.  rowident,
    29.  cn;
    30. GO


    Fourth problem - Count % of NULLs in every column in a table

    I decided to add this problem here rather than create a new blog for the last week interesting problems.
    The problem is presented in this thread. For any given table find percent of NULL values in any column and post the result.

    My solution for this problem is to create dynamic query using the idea from my other blog How to get information about all databases without a loop:

    1. USE AdventureWorks
    2.  
    3. DECLARE  @TotalCount DECIMAL(10,2),
    4.          @SQL        NVARCHAR(MAX)
    5.  
    6. SELECT @TotalCount = COUNT(* )
    7. FROM   [AdventureWorks].[Production].[Product]
    8.  
    9. SELECT @SQL = COALESCE(@SQL + ', ','SELECT ') + CASE
    10.                                                   WHEN IS_NULLABLE = 'NO' THEN '0'
    11.                                                   ELSE 'cast(sum (case when ' + QUOTENAME(column_Name) + ' IS NULL then 1 else 0 end)/@TotalCount*100.00 as decimal(10,2)) '
    12.                                                 END + ' as [' + column_Name + ' NULL %] '
    13. FROM   INFORMATION_SCHEMA.COLUMNS
    14. WHERE  TABLE_NAME = 'Product'
    15.        AND TABLE_SCHEMA = 'Production'
    16.  
    17. SET @SQL = 'set @TotalCount = NULLIF(@TotalCount,0)  ' + @SQL + ' FROM [AdventureWorks].Production.Product'
    18.  
    19. --print @SQL
    20. EXECUTE SP_EXECUTESQL
    21.   @SQL ,
    22.   N'@TotalCount decimal(10,2)' ,
    23.   @TotalCount

    Hope you find these problems interesting as well and see you in a week (or more)...

    The next series of this blog:

    Interesting T-SQL problems - new problems

    And perhaps you appreciate this topic as well
    How to search a string in all tables in a database




    *** Remember, if you have a SQL related question, try our Microsoft SQL Server Programming forum or our Microsoft SQL Server Admin forum

    About the Author

    Naomi Nosonovsky, Senior Software Developer, has more than 15 years of enterprise experience in analysis, design, development and implementation of high-performance client/server data management solutions. She is a Personality of the Year for 2008, 2009, 2010 and 2011 at www.universalthread.com in .NET, SQL Server & VFP categories. She is also an All Star contributor/MCC Community Contributor at forums.asp.net and MSDN T-SQL forum.
    Social SitingsFacebookLinkedIndeliciousLTD RSS Feed
    6001 views
    InstapaperVote on HN

    5 comments

    Comment from: SQLDenis [Member] Email
    SQLDenis These are some cool solutions to some interesting problems

    08/29/10 @ 16:21
    Comment from: Erik [Member] Email
    Erik Ooh! I like how the UNION ALL SELECT literal values table can be replaced with the more succinct VALUES syntax, which (surprisingly) can still use an outer reference.
    08/29/10 @ 18:33
    Comment from: Mike [Visitor]
    Mike I like this site and information thanks.
    09/12/10 @ 18:06
    Comment from: Naomi Nosonovsky [Member]
    Naomi Nosonovsky I posted a followup on a different site. See here
    Interesting T-SQL problems - new problems

    You can find my new other blog also interesting
    How to search a string in all tables in a database
    11/04/10 @ 13:00
    Comment from: Fırat Esmer [Visitor] · http://firatesmer.com
    F&#305;rat Esmer There are some nice tips. Thanks
    04/25/13 @ 02:30

    Leave a comment


    Your email address will not be revealed on this site.

    To mislead the spambots.

    Your URL will be displayed.
    (Line breaks become <br />)
    (Name, email & website)
    (Allow users to contact you through a message form (your email will not be revealed.)