This is part 3 in my series on SQL window functions. In this post, we will explore using aggregation functions with T-SQL windows. SQL Server supports most of the aggregation functions such as SUM and AVG in this context with the exceptions of GROUPING and GROUPING_ID. However, prior to SQL Server 2012 only the PARTITION BY clause was supported which greatly limited the usability of aggregate window functions. When support for the ORDER BY clause was introduced in SQL Server 2012, more complex business problems such as running totals could be solved without the extensive use of cursors or nested select statement. In my experience, I used to try various ways to get around this limitation including pushing the data to .NET as it could solve this problem more efficiently. However, this was not always possible when working with reporting. Now that we are able to use SQL to solve the problem, more complex and low-performing solutions can be replaced with these window functions.

Once again, the following CTE will be used as the query in all examples throughout the post:

with CTEOrders as
(select cast(1 as int) as OrderID, cast(‘3/1/2012’ as date) as OrderDate, cast(10.00 as money) as OrderAmt, ‘Joe’ as CustomerName
union select 2, ‘3/1/2012’, 11.00, ‘Sam’
union select 3, ‘3/2/2012’, 10.00, ‘Beth’
union select 4, ‘3/2/2012’, 15.00, ‘Joe’
union select 5, ‘3/2/2012’, 17.00, ‘Sam’
union select 6, ‘3/3/2012’, 12.00, ‘Joe’
union select 7, ‘3/4/2012’, 10.00, ‘Beth’
union select 8, ‘3/4/2012’, 18.00, ‘Sam’
union select 9, ‘3/4/2012’, 12.00, ‘Joe’
union select 10, ‘3/4/2012’, 11.00, ‘Beth’
union select 11, ‘3/5/2012’, 14.00, ‘Sam’
union select 12, ‘3/6/2012’, 17.00, ‘Beth’
union select 13, ‘3/6/2012’, 19.00, ‘Joe’
union select 14, ‘3/7/2012’, 13.00, ‘Beth’
union select 15, ‘3/7/2012’, 16.00, ‘Sam’
)
select OrderID
,OrderDate
,OrderAmt
,CustomerName
from CTEOrders;

Using PARTITION BY with Aggregate Functions

SQL Server 2005 and the newer versions supports the usage of the PARTITION BY clause by itself. This allowed for some simple aggregate windows. The following example shows SUM and AVG for different partitions of data. The third function actually creates and average using a SUM and COUNT function.

select CustomerName
,OrderDate
,OrderAmt
,SUM(OrderAmt) OVER(PARTITION BY CustomerName) CustomerTotal
,AVG(OrderAmt) OVER(PARTITION BY OrderDate) AvgDailyAmt
,CAST(COUNT(OrderID) OVER(PARTITION BY OrderDate) as DECIMAL(8,3)) / CAST(COUNT(OrderID) OVER() as DECIMAL(8,3)) PctOfTotalPerDay
from CTEOrders
order by OrderDate;

NOTE: The COUNT aggregate returns an integer value. In order to return the decimal, the values need to be explicitly converted to decimal types. Otherwise, the result was rounding to zero for all results in this sample.

Results

CustomerName

OrderDate

OrderAmt

CustomerTotal

AvgDailyAmt

PctOfTotalPerDay

Joe

3/1/2012

10

68

10.5

0.133333333

Sam

3/1/2012

11

76

10.5

0.133333333

Sam

3/2/2012

17

76

14

0.2

Joe

3/2/2012

15

68

14

0.2

Beth

3/2/2012

10

61

14

0.2

Joe

3/3/2012

12

68

12

0.066666667

Joe

3/4/2012

12

68

12.75

0.266666667

Beth

3/4/2012

10

61

12.75

0.266666667

Beth

3/4/2012

11

61

12.75

0.266666667

Sam

3/4/2012

18

76

12.75

0.266666667

Sam

3/5/2012

14

76

14

0.066666667

Beth

3/6/2012

17

61

18

0.133333333

Joe

3/6/2012

19

68

18

0.133333333

Beth

3/7/2012

13

61

14.5

0.133333333

Sam

3/7/2012

16

76

14.5

0.133333333

Using Subselects

As many of you know, subselect statements in SQL Server are supported, but harder to optimize in SQL Server versus Oracle. Until window functions were introduced all of the items above could be solved by subselects, but performance would degrade as the results needed to work with larger sets of data. With the improved functionality in SQL Server 2012, you should not need to use subselects to return row based aggregations. Besides the performance implications, maintenance will also be much simpler as the SQL becomes more transparent. For reference, here is the subselect syntax to return the same results as above:

select cte.CustomerName
,cte.OrderDate
,cte.OrderAmt
,(select SUM(OrderAmt) from CTEOrders where CustomerName = cte.CustomerName) CustomerTotal
,(select AVG(OrderAmt) from CTEOrders where OrderDate = cte.OrderDate) AvgDailyAmt
,(select cast(COUNT(OrderID) as DECIMAL(8,3)) from CTEOrders where OrderDate = cte.OrderDate)
/ (select cast(COUNT(OrderID) as DECIMAL(8,3)) from CTEOrders) AvgDailyAmt
from CTEOrders cte
order by cte.OrderDate;

As you can see, while it is possible to solve the same function using the subselects, the code is already getting messier and with data sets larger than what we have here, you would definitely see performance degredation.

Some Thoughts on GROUP BY

While I am digressing, I wanted to also highlight some details concerning GROUP BY. The one the biggest difficulties working with the GROUP BY clause and aggregates, every column must either be a part of the GROUP BY or have an aggregation associated with it. The window functions help solve this problem as well.

In the following examples, the first query returns the sum of the amount by day. This is pretty standard logic when working with aggregated queries in SQL.

select OrderDate
,sum(OrderAmt) as DailyOrderAmt
from CTEOrders
group by OrderDate;

However, if you wanted to see more details, but not include them in the aggregation, the following will not work.

select OrderDate
,OrderID
,OrderAmt
,sum(OrderAmt) as DailyOrderAmt
from CTEOrders
group by OrderDate
,OrderID
,OrderAmt;

This SQL statement will reach row individually with the sum at the detail level. You can solve this using the subselect above which is not recommended or you can use a window function.

select OrderDate
,OrderID
,OrderAmt
,sum(OrderAmt) OVER (PARTITION BY OrderDate) as DailyOrderAmt
from CTEOrders

As you can see here and in previous examples the OVER clause allows you to manage the grouping based on the context specified in relationship to the current row.

One other twist on the GROUP BY clause. First, I need to give credit to Itzik Ben-Gan for calling this to my attention at one of our Minnesota SQL Server User Group meetings. In his usual fashion he was showing some T-SQL coolness and he showed an interesting error when using the OVER clause with the GROUP BY clause.

The following will return an error because the first expression is an aggregate, but the second expression which is using the OVER clause is not. Also note that in this example the OVER clause is being evaluated for the entire set of data.

select sum(OrderAmt)
, sum(OrderAmt) over() as TotalOrderAmt
from CTEOrders
group by CustomerName

The expression above returns the following error:

Column ‘CTEOrders.OrderAmt’ is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause

The goal of the statement above was to show the customer’s total order amount with the overall order amount. The following statement resolves this issue because it is aggregating the aggregates. The window is now summing the aggregated amount which are grouped on the customer name.

select sum(OrderAmt)
, sum(sum(OrderAmt)) over() as TotalOrderAmt
from CTEOrders
group by CustomerName

Thanks again to Itzik for bringing this problem and resolution to my attention.

Aggregates with ORDER BY

With the expansion of the OVER clause to include ORDER BY support with aggregates, window functions increased their value substantially. One of the key business problems this allowed us to solve was a running aggregate.

The first example shows how to get a running total by customer based on date and order ID.

select OrderID
,OrderDate
,OrderAmt
,CustomerName
,SUM(OrderAmt) OVER (PARTITION BY CustomerName ORDER BY OrderDate, OrderID) as RunningByCustomer
from CTEOrders
ORDER BY CustomerName, OrderDate;

Results

OrderID

OrderDate

OrderAmt

CustomerName

RunningByCustomer

3

3/2/2012

10

Beth

10

7

3/4/2012

10

Beth

20

10

3/4/2012

11

Beth

31

12

3/6/2012

17

Beth

48

14

3/7/2012

13

Beth

61

1

3/1/2012

10

Joe

10

4

3/2/2012

15

Joe

25

6

3/3/2012

12

Joe

37

9

3/4/2012

12

Joe

49

13

3/6/2012

19

Joe

68

2

3/1/2012

11

Sam

11

5

3/2/2012

17

Sam

28

8

3/4/2012

18

Sam

46

11

3/5/2012

14

Sam

60

15

3/7/2012

16

Sam

76

This next example is more creative. It begins to show how powerful the window functions are. In this statement, we are going to return the annual running total aggregated by day. The differentiator here is that we use a DATEPART function in the OVER clause to achieve the desired results.

select OrderID
,OrderDate
,OrderAmt
,CustomerName
,SUM(OrderAmt) OVER (PARTITION BY datepart(yyyy, OrderDate) ORDER BY OrderDate) as AnnualRunning
from CTEOrders
ORDER BY OrderDate;

Results

OrderID

OrderDate

OrderAmt

CustomerName

AnnualRunning

1

3/1/2012

10

Joe

21

2

3/1/2012

11

Sam

21

3

3/2/2012

10

Beth

63

4

3/2/2012

15

Joe

63

5

3/2/2012

17

Sam

63

6

3/3/2012

12

Joe

75

7

3/4/2012

10

Beth

126

8

3/4/2012

18

Sam

126

9

3/4/2012

12

Joe

126

10

3/4/2012

11

Beth

126

11

3/5/2012

14

Sam

140

12

3/6/2012

17

Beth

176

13

3/6/2012

19

Joe

176

14

3/7/2012

13

Beth

205

15

3/7/2012

16

Sam

205

The ORDER BY clause creates an expanding group within the partition. In the examples above, the partition was the customer. Within each partition, ordered groups based on order date and order id are “created”. At each row, the order date and order id groups are aggregated up to the current row’s group thus producing the running total. If more than one row has the same order grouping, all of the rows in the group are aggregated into the total as shown in the second example above with the days and years.

Aggregates with ROWS

The ROWS clause is used to further define the partition by specifying which physical rows to include based on their proximity to the current row. As noted in the first post in the series, ROWS requires the ORDER BY clause as this determines the orientation of the partition.

The following example uses the FOLLOWING keywords to find the next two purchases that the customer made.

select OrderID
,OrderDate
,OrderAmt
,CustomerName
,SUM(OrderAmt) OVER (PARTITION BY CustomerName ORDER BY OrderDate, OrderID ROWS BETWEEN 1 FOLLOWING and 2 FOLLOWING) as NextTwoAmts
from CTEOrders
order by CustomerName, OrderDate, OrderID;

Results

OrderID

OrderDate

OrderAmt

CustomerName

NextTwoAmts

3

3/2/2012

10

Beth

21

7

3/4/2012

10

Beth

28

10

3/4/2012

11

Beth

30

12

3/6/2012

17

Beth

13

14

3/7/2012

13

Beth

NULL

1

3/1/2012

10

Joe

27

4

3/2/2012

15

Joe

24

6

3/3/2012

12

Joe

31

9

3/4/2012

12

Joe

19

13

3/6/2012

19

Joe

NULL

2

3/1/2012

11

Sam

35

5

3/2/2012

17

Sam

32

8

3/4/2012

18

Sam

30

11

3/5/2012

14

Sam

16

15

3/7/2012

16

Sam

NULL

As we noted in the first blog, the last two rows in the partition only contain partial values. For example, order ID 12 contains the sum of only one order, 14, and order ID 14 has now rows following it in the partition and returns NULL as a result. When working with the ROWS clause this must be taken into account.

Aggregates with RANGE

Lastly, adding the RANGE to the OVER clause allows you to create aggregates which go to the beginning or end of the partition. RANGE is commonly used with UNBOUNDED FOLLOWING which goes to the end of the partition and UNBOUNDED PRECEDING which goes to the beginning of the partition. One of the most common use would be to specify the rows from the beginning of the partition to the current row which allows for aggregations such as year to date.

In the example below, we are calculating the average order size over time to the current row. This could be a very effective in a trending report.

select OrderID
,OrderDate
,OrderAmt
,CustomerName
,AVG(OrderAmt) OVER (ORDER BY OrderID RANGE BETWEEN UNBOUNDED PRECEDING and CURRENT ROW) as AvgOrderAmt
from CTEOrders
order by OrderDate;

Results

OrderID

OrderDate

OrderAmt

CustomerName

AvgOrderAmt

1

3/1/2012

10

Joe

10

2

3/1/2012

11

Sam

10.5

3

3/2/2012

10

Beth

10.333333

4

3/2/2012

15

Joe

11.5

5

3/2/2012

17

Sam

12.6

6

3/3/2012

12

Joe

12.5

7

3/4/2012

10

Beth

12.142857

8

3/4/2012

18

Sam

12.875

9

3/4/2012

12

Joe

12.777777

10

3/4/2012

11

Beth

12.6

11

3/5/2012

14

Sam

12.727272

12

3/6/2012

17

Beth

13.083333

13

3/6/2012

19

Joe

13.538461

14

3/7/2012

13

Beth

13.5

15

3/7/2012

16

Sam

13.666666

As you can see, the latest versions of OVER clause supports powerful yet simple aggregations which can help in a multitude of reporting and business solutions. Up next, the last blog in the series – Analytic Functions which are all new in SQL Server 2012.