Login or Sign Up to become a member!
LessThanDot Sit Logo

LessThanDot

Data Management

Less Than Dot is a community of passionate IT professionals and enthusiasts dedicated to sharing technical knowledge, experience, and assistance. Inside you will find reference materials, interesting technical discussions, and expert tips and commentary. Once you register for an account you will have immediate access to the forums and all past articles and commentaries.

LTD Social Sitings

Lessthandot twitter Lessthandot Linkedin Lessthandot friendfeed Lessthandot facebook Lessthandot rss

Note: Watch for social icons on posts by your favorite authors to follow their postings on these and other social sites.

Your profile

    Search

    XML Feeds

    Google Ads

    « T-SQL Window Functions - Part 4: Analytic FunctionsT-SQL Window Functions - Part 2: Ranking Functions »
    comments

    This is part 3 in my series on SQL window functions. In this post, we will explore using aggregation functions with T-SQL windows. SQL Server supports most of the aggregation functions such as SUM and AVG in this context with the exceptions of GROUPING and GROUPING_ID. However, prior to SQL Server 2012 only the PARTITION BY clause was supported which greatly limited the usability of aggregate window functions. When support for the ORDER BY clause was introduced in SQL Server 2012, more complex business problems such as running totals could be solved without the extensive use of cursors or nested select statement. In my experience, I used to try various ways to get around this limitation including pushing the data to .NET as it could solve this problem more efficiently. However, this was not always possible when working with reporting. Now that we are able to use SQL to solve the problem, more complex and low-performing solutions can be replaced with these window functions.

    Once again, the following CTE will be used as the query in all examples throughout the post:

    with CTEOrders as
    (select cast(1 as int) as OrderID, cast('3/1/2012' as date) as OrderDate, cast(10.00 as money) as OrderAmt, 'Joe' as CustomerName
    union select 2, '3/1/2012', 11.00, 'Sam'
    union select 3, '3/2/2012', 10.00, 'Beth'
    union select 4, '3/2/2012', 15.00, 'Joe'
    union select 5, '3/2/2012', 17.00, 'Sam'
    union select 6, '3/3/2012', 12.00, 'Joe'
    union select 7, '3/4/2012', 10.00, 'Beth'
    union select 8, '3/4/2012', 18.00, 'Sam'
    union select 9, '3/4/2012', 12.00, 'Joe'
    union select 10, '3/4/2012', 11.00, 'Beth'
    union select 11, '3/5/2012', 14.00, 'Sam'
    union select 12, '3/6/2012', 17.00, 'Beth'
    union select 13, '3/6/2012', 19.00, 'Joe'
    union select 14, '3/7/2012', 13.00, 'Beth'
    union select 15, '3/7/2012', 16.00, 'Sam'
    )
    select OrderID
    ,OrderDate
    ,OrderAmt
    ,CustomerName
    from CTEOrders;

    Using PARTITION BY with Aggregate Functions

    SQL Server 2005 and the newer versions supports the usage of the PARTITION BY clause by itself. This allowed for some simple aggregate windows. The following example shows SUM and AVG for different partitions of data. The third function actually creates and average using a SUM and COUNT function.

    select CustomerName
    ,OrderDate
    ,OrderAmt
    ,SUM(OrderAmt) OVER(PARTITION BY CustomerName) CustomerTotal
    ,AVG(OrderAmt) OVER(PARTITION BY OrderDate) AvgDailyAmt
    ,CAST(COUNT(OrderID) OVER(PARTITION BY OrderDate) as DECIMAL(8,3)) / CAST(COUNT(OrderID) OVER() as DECIMAL(8,3)) PctOfTotalPerDay
    from CTEOrders
    order by OrderDate;

    NOTE: The COUNT aggregate returns an integer value. In order to return the decimal, the values need to be explicitly converted to decimal types. Otherwise, the result was rounding to zero for all results in this sample.

    Results

    CustomerName

    OrderDate

    OrderAmt

    CustomerTotal

    AvgDailyAmt

    PctOfTotalPerDay

    Joe

    3/1/2012

    10

    68

    10.5

    0.133333333

    Sam

    3/1/2012

    11

    76

    10.5

    0.133333333

    Sam

    3/2/2012

    17

    76

    14

    0.2

    Joe

    3/2/2012

    15

    68

    14

    0.2

    Beth

    3/2/2012

    10

    61

    14

    0.2

    Joe

    3/3/2012

    12

    68

    12

    0.066666667

    Joe

    3/4/2012

    12

    68

    12.75

    0.266666667

    Beth

    3/4/2012

    10

    61

    12.75

    0.266666667

    Beth

    3/4/2012

    11

    61

    12.75

    0.266666667

    Sam

    3/4/2012

    18

    76

    12.75

    0.266666667

    Sam

    3/5/2012

    14

    76

    14

    0.066666667

    Beth

    3/6/2012

    17

    61

    18

    0.133333333

    Joe

    3/6/2012

    19

    68

    18

    0.133333333

    Beth

    3/7/2012

    13

    61

    14.5

    0.133333333

    Sam

    3/7/2012

    16

    76

    14.5

    0.133333333

    Using Subselects

    As many of you know, subselect statements in SQL Server are supported, but harder to optimize in SQL Server versus Oracle. Until window functions were introduced all of the items above could be solved by subselects, but performance would degrade as the results needed to work with larger sets of data. With the improved functionality in SQL Server 2012, you should not need to use subselects to return row based aggregations. Besides the performance implications, maintenance will also be much simpler as the SQL becomes more transparent. For reference, here is the subselect syntax to return the same results as above:

    select cte.CustomerName
    ,cte.OrderDate
    ,cte.OrderAmt
    ,(select SUM(OrderAmt) from CTEOrders where CustomerName = cte.CustomerName) CustomerTotal
    ,(select AVG(OrderAmt) from CTEOrders where OrderDate = cte.OrderDate) AvgDailyAmt
    ,(select cast(COUNT(OrderID) as DECIMAL(8,3)) from CTEOrders where OrderDate = cte.OrderDate)
    / (select cast(COUNT(OrderID) as DECIMAL(8,3)) from CTEOrders) AvgDailyAmt
    from CTEOrders cte
    order by cte.OrderDate;

    As you can see, while it is possible to solve the same function using the subselects, the code is already getting messier and with data sets larger than what we have here, you would definitely see performance degredation.

    Some Thoughts on GROUP BY

    While I am digressing, I wanted to also highlight some details concerning GROUP BY. The one the biggest difficulties working with the GROUP BY clause and aggregates, every column must either be a part of the GROUP BY or have an aggregation associated with it. The window functions help solve this problem as well.
    In the following examples, the first query returns the sum of the amount by day. This is pretty standard logic when working with aggregated queries in SQL.

    select OrderDate
    ,sum(OrderAmt) as DailyOrderAmt
    from CTEOrders
    group by OrderDate;

    However, if you wanted to see more details, but not include them in the aggregation, the following will not work.

    select OrderDate
    ,OrderID
    ,OrderAmt
    ,sum(OrderAmt) as DailyOrderAmt
    from CTEOrders
    group by OrderDate
    ,OrderID
    ,OrderAmt;

    This SQL statement will reach row individually with the sum at the detail level. You can solve this using the subselect above which is not recommended or you can use a window function.

    select OrderDate
    ,OrderID
    ,OrderAmt
    ,sum(OrderAmt) OVER (PARTITION BY OrderDate) as DailyOrderAmt
    from CTEOrders

    As you can see here and in previous examples the OVER clause allows you to manage the grouping based on the context specified in relationship to the current row.
    One other twist on the GROUP BY clause. First, I need to give credit to Itzik Ben-Gan for calling this to my attention at one of our Minnesota SQL Server User Group meetings. In his usual fashion he was showing some T-SQL coolness and he showed an interesting error when using the OVER clause with the GROUP BY clause.

    The following will return an error because the first expression is an aggregate, but the second expression which is using the OVER clause is not. Also note that in this example the OVER clause is being evaluated for the entire set of data.

    select sum(OrderAmt)
    , sum(OrderAmt) over() as TotalOrderAmt
    from CTEOrders
    group by CustomerName

    The expression above returns the following error:
    Column 'CTEOrders.OrderAmt' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause

    The goal of the statement above was to show the customer’s total order amount with the overall order amount. The following statement resolves this issue because it is aggregating the aggregates. The window is now summing the aggregated amount which are grouped on the customer name.

    select sum(OrderAmt)
    , sum(sum(OrderAmt)) over() as TotalOrderAmt
    from CTEOrders
    group by CustomerName

    Thanks again to Itzik for bringing this problem and resolution to my attention.

    Aggregates with ORDER BY

    With the expansion of the OVER clause to include ORDER BY support with aggregates, window functions increased their value substantially. One of the key business problems this allowed us to solve was a running aggregate.

    The first example shows how to get a running total by customer based on date and order ID.

    select OrderID
    ,OrderDate
    ,OrderAmt
    ,CustomerName
    ,SUM(OrderAmt) OVER (PARTITION BY CustomerName ORDER BY OrderDate, OrderID) as RunningByCustomer
    from CTEOrders
    ORDER BY CustomerName, OrderDate;

    Results

    OrderID

    OrderDate

    OrderAmt

    CustomerName

    RunningByCustomer

    3

    3/2/2012

    10

    Beth

    10

    7

    3/4/2012

    10

    Beth

    20

    10

    3/4/2012

    11

    Beth

    31

    12

    3/6/2012

    17

    Beth

    48

    14

    3/7/2012

    13

    Beth

    61

    1

    3/1/2012

    10

    Joe

    10

    4

    3/2/2012

    15

    Joe

    25

    6

    3/3/2012

    12

    Joe

    37

    9

    3/4/2012

    12

    Joe

    49

    13

    3/6/2012

    19

    Joe

    68

    2

    3/1/2012

    11

    Sam

    11

    5

    3/2/2012

    17

    Sam

    28

    8

    3/4/2012

    18

    Sam

    46

    11

    3/5/2012

    14

    Sam

    60

    15

    3/7/2012

    16

    Sam

    76

    This next example is more creative. It begins to show how powerful the window functions are. In this statement, we are going to return the annual running total aggregated by day. The differentiator here is that we use a DATEPART function in the OVER clause to achieve the desired results.

    select OrderID
    ,OrderDate
    ,OrderAmt
    ,CustomerName
    ,SUM(OrderAmt) OVER (PARTITION BY datepart(yyyy, OrderDate) ORDER BY OrderDate) as AnnualRunning
    from CTEOrders
    ORDER BY OrderDate;

    Results

    OrderID

    OrderDate

    OrderAmt

    CustomerName

    AnnualRunning

    1

    3/1/2012

    10

    Joe

    21

    2

    3/1/2012

    11

    Sam

    21

    3

    3/2/2012

    10

    Beth

    63

    4

    3/2/2012

    15

    Joe

    63

    5

    3/2/2012

    17

    Sam

    63

    6

    3/3/2012

    12

    Joe

    75

    7

    3/4/2012

    10

    Beth

    126

    8

    3/4/2012

    18

    Sam

    126

    9

    3/4/2012

    12

    Joe

    126

    10

    3/4/2012

    11

    Beth

    126

    11

    3/5/2012

    14

    Sam

    140

    12

    3/6/2012

    17

    Beth

    176

    13

    3/6/2012

    19

    Joe

    176

    14

    3/7/2012

    13

    Beth

    205

    15

    3/7/2012

    16

    Sam

    205

    The ORDER BY clause creates an expanding group within the partition. In the examples above, the partition was the customer. Within each partition, ordered groups based on order date and order id are “created”. At each row, the order date and order id groups are aggregated up to the current row’s group thus producing the running total. If more than one row has the same order grouping, all of the rows in the group are aggregated into the total as shown in the second example above with the days and years.

    Aggregates with ROWS

    The ROWS clause is used to further define the partition by specifying which physical rows to include based on their proximity to the current row. As noted in the first post in the series, ROWS requires the ORDER BY clause as this determines the orientation of the partition.

    The following example uses the FOLLOWING keywords to find the next two purchases that the customer made.

    select OrderID
    ,OrderDate
    ,OrderAmt
    ,CustomerName
    ,SUM(OrderAmt) OVER (PARTITION BY CustomerName ORDER BY OrderDate, OrderID ROWS BETWEEN 1 FOLLOWING and 2 FOLLOWING) as NextTwoAmts
    from CTEOrders
    order by CustomerName, OrderDate, OrderID;

    Results

    OrderID

    OrderDate

    OrderAmt

    CustomerName

    NextTwoAmts

    3

    3/2/2012

    10

    Beth

    21

    7

    3/4/2012

    10

    Beth

    28

    10

    3/4/2012

    11

    Beth

    30

    12

    3/6/2012

    17

    Beth

    13

    14

    3/7/2012

    13

    Beth

    NULL

    1

    3/1/2012

    10

    Joe

    27

    4

    3/2/2012

    15

    Joe

    24

    6

    3/3/2012

    12

    Joe

    31

    9

    3/4/2012

    12

    Joe

    19

    13

    3/6/2012

    19

    Joe

    NULL

    2

    3/1/2012

    11

    Sam

    35

    5

    3/2/2012

    17

    Sam

    32

    8

    3/4/2012

    18

    Sam

    30

    11

    3/5/2012

    14

    Sam

    16

    15

    3/7/2012

    16

    Sam

    NULL

    As we noted in the first blog, the last two rows in the partition only contain partial values. For example, order ID 12 contains the sum of only one order, 14, and order ID 14 has now rows following it in the partition and returns NULL as a result. When working with the ROWS clause this must be taken into account.

    Aggregates with RANGE

    Lastly, adding the RANGE to the OVER clause allows you to create aggregates which go to the beginning or end of the partition. RANGE is commonly used with UNBOUNDED FOLLOWING which goes to the end of the partition and UNBOUNDED PRECEDING which goes to the beginning of the partition. One of the most common use would be to specify the rows from the beginning of the partition to the current row which allows for aggregations such as year to date.

    In the example below, we are calculating the average order size over time to the current row. This could be a very effective in a trending report.

    select OrderID
    ,OrderDate
    ,OrderAmt
    ,CustomerName
    ,AVG(OrderAmt) OVER (ORDER BY OrderID RANGE BETWEEN UNBOUNDED PRECEDING and CURRENT ROW) as AvgOrderAmt
    from CTEOrders
    order by OrderDate;

    Results

    OrderID

    OrderDate

    OrderAmt

    CustomerName

    AvgOrderAmt

    1

    3/1/2012

    10

    Joe

    10

    2

    3/1/2012

    11

    Sam

    10.5

    3

    3/2/2012

    10

    Beth

    10.333333

    4

    3/2/2012

    15

    Joe

    11.5

    5

    3/2/2012

    17

    Sam

    12.6

    6

    3/3/2012

    12

    Joe

    12.5

    7

    3/4/2012

    10

    Beth

    12.142857

    8

    3/4/2012

    18

    Sam

    12.875

    9

    3/4/2012

    12

    Joe

    12.777777

    10

    3/4/2012

    11

    Beth

    12.6

    11

    3/5/2012

    14

    Sam

    12.727272

    12

    3/6/2012

    17

    Beth

    13.083333

    13

    3/6/2012

    19

    Joe

    13.538461

    14

    3/7/2012

    13

    Beth

    13.5

    15

    3/7/2012

    16

    Sam

    13.666666

    As you can see, the latest versions of OVER clause supports powerful yet simple aggregations which can help in a multitude of reporting and business solutions. Up next, the last blog in the series – Analytic Functions which are all new in SQL Server 2012.

    About the Author

    User bio imageSteve Hughes is a Practice Lead at Magenic. He works with Microsoft’s SQL Server stack to deliver data and business intelligence solutions. He has been working with technology for over 15 years with much of that time spent on creating business intelligence solutions including multidimensional databases, reporting solutions, and data marts. He is passionate about using data effectively and helping customers understand that data is valuable and profitable. Steve also has experience working with a large Software as a Service vendor where he received awards for customer satisfaction and engineering excellence. Steve believes the data in the cloud will change how information can be delivered to and consumed by the masses. He also provides insights from the field on his own blog at http://dataonwheels.wordpress.com. Steve also is active in PASS as MNPASS chapter leader and NE Regional Mentor.
    Social SitingsTwitterLinkedInHomePageLTD RSS Feed
    InstapaperVote on HN

    3 comments

    Comment from: Brad [Visitor]
    Brad Great article with detailed explanations and examples! Nice to know you can have a windowed aggregate surrounding a group by aggregate. Thanks!
    09/18/12 @ 10:22
    Comment from: Joe Celko [Visitor] Email
    Joe Celko Did you ever consider using ANSI/ISO Standard SQL and not Dialect? Why did you use MONEY when you know it does incorrect math? Why did you not use ISO-8601 dates since MS is finally moving over to them? Here is a quick re-write of the CTE body

    CTE_Orders (order_id, order_date, order_amt, customer_name)
    AS
    (SELECT X.*
    FROM (VALUES (CAST(1 AS INTEGER), CAST('2012-03-01' AS DATE),
    CAST(10.00 AS DECIMAL(12,2)), CAST('Joe' AS VARCHAR(25)),
    (2, '2012-03-01', 11.00, 'Sam'),
    (3, '2012-03-02', 10.00, 'Beth'),
    (4, '2012-03-02', 15.00, 'Joe'),
    (5, '2012-03-02', 17.00, 'Sam'),
    (6, '2012-03-03', 12.00, 'Joe'),
    (7, '2012-03-04', 10.00, 'Beth'),
    (8, '2012-03-04', 18.00, 'Sam'),
    (9, '2012-03-04', 12.00, 'Joe'),
    (10, '2012-03-04', 11.00, 'Beth'),
    (11, '2012-03-05', 14.00, 'Sam'),
    (12, '2012-03-06', 17.00, 'Beth'),
    (13, '2012-03-06', 19.00, 'Joe'),
    (14, '2012-03-07', 13.00, 'Beth'),
    (15, '2012-03-07', 16.00, 'Sam'))))
    AS X(order_id, order_date, order_amt, customer_name)

    I have some good stories about the Window clause and how it got into the Standards. However, most of use the ROWS option and not the RANGE
    09/18/12 @ 13:29
    Comment from: Steve Hughes (DataOnWheels) [Member]
    Steve Hughes (DataOnWheels) Joe -
    Thanks for the data type feedback. I will take this into consideration in the future for my general use of types and for my demos. For consistency, I will keep the CTE unchanged for this series as it is demo code only.
    09/19/12 @ 09:19

    Leave a comment


    Your email address will not be revealed on this site.

    To mislead the spambots.

    Your URL will be displayed.
    (Line breaks become <br />)
    (Name, email & website)
    (Allow users to contact you through a message form (your email will not be revealed.)