In this blog post I would like to touch a very common problem which often trips up newbies and even serious SQL Professionals. I wanted to write this blog post for quite some time, but always put it aside. However, just a few days ago I have been dealing with this problem at my work place, so I believe it’s time to discuss the problem.

For the discussion let’s consider AdventureWorks database SalesOrderHeader and SalesOrderDetail tables. To illustrate the problem, let’s assume that returns are handled in a different table with a structure very similar to SalesOrderDetail. And let’s assume there are about 20 percent of returns (the business must not be doing too well!).

Ok, to create the returns table in the Sales schema let’s use the following code:

SELECT IDENTITY(INT) AS ReturnID, D.LineTotal AS ReturnTotal, D.ProductID, D.OrderQty, D.UnitPrice, D.SalesOrderID
INTO Sales.SalesReturns
FROM Sales.SalesOrderDetail D TABLESAMPLE (20 PERCENT)

Now, suppose we need to get quantity ordered and quantity returned as well as line total for both quantity and returned grouped by Customer ID. Let’s write the most intuitive query:

SELECT OH.CustomerID, 
       COALESCE(SUM(OD.OrderQty),0) AS [Ordered quantity],
       COALESCE(SUM(OD.LineTotal),0) AS [Ordered],
       COALESCE(SUM(R.OrderQty),0) AS [Returned quantity],       
       COALESCE(SUM(R.ReturnTotal),0) AS [Returned]

FROM Sales.SalesOrderHeader OH LEFT JOIN Sales.SalesOrderDetail OD ON OH.SalesOrderID = OD.SalesOrderID
LEFT JOIN Sales.SalesReturns R ON OH.SalesOrderID = R.SalesOrderID 
GROUP BY OH.CustomerID 
ORDER BY OH.CustomerID 

Which gives us these results

Aggregate Query Results – Ordered and Returned
CustomerIDOrdered quantityOrderedReturned quantityReturned
1882724610.142300857713619.368600
219825199.4993920.000000
3201404084947.217763189163814775.599788
482684493242.54479275304036924.558800
51190325894.561736973261739.811536

Now, let's verify these results by only running Ordered amounts.

Let's run this query (I re-wrote using CTE for simplicity):

;with cte as (SELECT OH.CustomerID, 
COALESCE(SUM(OD.OrderQty),0) AS [Ordered quantity],
COALESCE(SUM(OD.LineTotal),0) AS [Ordered]

FROM Sales.SalesOrderHeader OH LEFT JOIN Sales.SalesOrderDetail OD ON OH.SalesOrderID = OD.SalesOrderID
GROUP BY OH.CustomerID 
)
SELECT * FROM cte

which produces the following results:

Aggregate Query Results – Ordered only
CustomerIDOrdered quantityOrdered
112185177.081200
219825199.499392
31695361999.058170
4980586524.947384
529986177.847024

We see that numbers differ dramatically. So, what did happen in our first query? Why did it produce such high numbers that have nothing to do with the real numbers?

The reason is that for each row of Ordered results we get N Returned rows and so the total number of rows per one customer became M*N, where M is number of Ordered rows and N number of Returned rows. Therefore our results get much higher numbers.

So, the rule I always employ when I need to do aggregates that will involve more than one table to JOIN is to produce the necessary results separately as derived tables or CTE and only then join them together to form final result.

Here is how our query works when I re-write it using this idea:

;WITH Ordered AS (SELECT OH.CustomerID, 
COALESCE(SUM(OD.OrderQty),0) AS [Ordered Quantity],
COALESCE(SUM(OD.LineTotal),0) AS [Ordered]

FROM Sales.SalesOrderHeader OH LEFT JOIN Sales.SalesOrderDetail OD ON OH.SalesOrderID = OD.SalesOrderID
Group BY OH.CustomerID 
),

Returned AS (SELECT OH.CustomerID, 
COALESCE(SUM(R.OrderQty),0) AS [Returned Quantity],
COALESCE(SUM(R.ReturnTotal),0) AS [Returned]

FROM Sales.SalesOrderHeader OH LEFT JOIN Sales.SalesReturns R ON OH.SalesOrderID = R.SalesOrderID
Group BY OH.CustomerID 
)

SELECT COALESCE(O.CustomerID, R.CustomerID) AS CustomerID, 
COALESCE(O.[Ordered Quantity],0) AS [Ordered Quantity],
COALESCE(O.[Ordered],0) AS [Ordered], 
COALESCE(R.[Returned Quantity],0) AS [Returned Quantity],
COALESCE(R.[Returned],0) AS [Returned]

FROM Ordered O FULL JOIN Returned R ON O.CustomerID = R.CustomerID 
ORDER BY COALESCE(O.CustomerID, R.CustomerID)

with these correct now results:

Aggregate Query Results – Ordered and Returned
CustomerIDOrdered QuantityOrderedReturned QuantityReturned
112185177.0812004437199.802700
219825199.4993920.000000
31695361999.05817047191827.440195
4980586524.947384242130206.961392
529986177.8470248222023.096824

I used the FULL JOIN just in case we may have Ordered items and no returns or vs. versa for the customer.

I received a question of how to write this code for SQL 2000. In SQL 2000 we can not use CTE (common table expressions), but we can use derived tables, so the code above will be re-written in this manner:

SELECT COALESCE(O.CustomerID, R.CustomerID) AS CustomerID,
COALESCE(O.[Ordered Quantity],0) AS [Ordered Quantity],
COALESCE(O.[Ordered],0) AS [Ordered],
COALESCE(R.[Returned Quantity],0) AS [Returned Quantity],
COALESCE(R.[Returned],0) AS [Returned]

FROM (SELECT OH.CustomerID,
COALESCE(SUM(OD.OrderQty),0) AS [Ordered Quantity],
COALESCE(SUM(OD.LineTotal),0) AS [Ordered]

FROM Sales.SalesOrderHeader OH LEFT JOIN Sales.SalesOrderDetail OD ON OH.SalesOrderID = OD.SalesOrderID
GROUP BY OH.CustomerID
)O FULL JOIN (SELECT OH.CustomerID,
COALESCE(SUM(R.OrderQty),0) AS [Returned Quantity],
COALESCE(SUM(R.ReturnTotal),0) AS [Returned]

FROM Sales.SalesOrderHeader OH LEFT JOIN Sales.SalesReturns R ON OH.SalesOrderID = R.SalesOrderID
GROUP BY OH.CustomerID
) R ON O.CustomerID = R.CustomerID
ORDER BY COALESCE(O.CustomerID, R.CustomerID)

So, hopefully you can now easily recognize the problem and apply the same idea when dealing with a similar situation.

Say, few days ago I looked into report. The number of joins used in the query made my head dizzy – I told right away that because of the JOINs the total results are not going to be correct.

So, how did I solve the problem? I included the totals using WINDOW AGGREGATE function in the CTE first and then added the rest of the JOINs to get the data needed for report.

Unfortunately, I am not sure if an easier method exists for reports in SSRS when we need to present information for customer orders, for example, but also display customer addresses (and there may be many addresses per customer). Obviously, we will not get correct totals if we will just sum the records – we need to apply some tricks, simplest of them will be to pre-calculate totals and include these fields as part of the returned dataset. I will be interested to know your solutions for this problem.

I should also mention another way to solve this query (assuming we also want to return the Customer name) by using OUTER APPLY query.

Here I am showing two different approaches for comparison. The first approach performs better.

;WITH Ordered AS (SELECT OH.CustomerID, 
COALESCE(SUM(OD.OrderQty),0) AS [Ordered Quantity],
COALESCE(SUM(OD.LineTotal),0) AS [Ordered]

FROM Sales.SalesOrderHeader OH LEFT JOIN Sales.SalesOrderDetail OD ON OH.SalesOrderID = OD.SalesOrderID
Group BY OH.CustomerID 
),

Returned AS (SELECT OH.CustomerID, 
COALESCE(SUM(R.OrderQty),0) AS [Returned Quantity],
COALESCE(SUM(R.ReturnTotal),0) AS [Returned]

FROM Sales.SalesOrderHeader OH LEFT JOIN Sales.SalesReturns R ON OH.SalesOrderID = R.SalesOrderID
Group BY OH.CustomerID 
)

SELECT TOP (10) C.FirstName, C.LastName, 
COALESCE(O.[Ordered Quantity],0) AS [Ordered Quantity],
COALESCE(O.[Ordered],0) AS [Ordered], 
COALESCE(R.[Returned Quantity],0) AS [Returned Quantity],
COALESCE(R.[Returned],0) AS [Returned]
--INTO dbo.QueryResults
FROM Sales.vIndividualCustomer C
LEFT JOIN Ordered O On C.CustomerID = O.CustomerID 
LEFT JOIN Returned R ON C.CustomerID = R.CustomerID 
ORDER BY C.CustomerID 


SELECT TOP (10) C.CustomerID, C.FirstName, C.LastName,  
COALESCE(O.[Ordered Quantity],0) AS [Ordered Quantity],
COALESCE(O.[Ordered],0) AS [Ordered], 
COALESCE(R.[Returned Quantity],0) AS [Returned Quantity],
COALESCE(R.[Returned],0) AS [Returned]
FROM Sales.vIndividualCustomer C
OUTER APPLY (SELECT SUM(OD.OrderQty) AS [Ordered Quantity],
SUM(OD.LineTotal) AS [Ordered]
FROM Sales.SalesOrderHeader OH 
LEFT JOIN Sales.SalesOrderDetail OD ON OH.SalesOrderID = OD.SalesOrderID
WHERE OH.CustomerID = C.CustomerID) O

OUTER APPLY (SELECT 
COALESCE(SUM(R.OrderQty),0) AS [Returned Quantity],
COALESCE(SUM(R.ReturnTotal),0) AS [Returned]

FROM Sales.SalesOrderHeader OH 
LEFT JOIN Sales.SalesReturns R ON OH.SalesOrderID = R.SalesOrderID
WHERE OH.CustomerID = C.CustomerID) R
ORDER BY C.CustomerID 

*** Remember, if you have a SQL related question, try our Microsoft SQL Server Programming forum or our Microsoft SQL Server Admin forum