This blog post contains test cases for the first article in the series Including an Aggregated Column’s Related Values
There is also an excellent (as always) article by Itzik Ben Gan Optimizing TOP N per Group Queries. I suggest to read and try the recommendations from that article first.
There is a related thread on MSDN forum where different methods are also compared in speed.
One more related discussion on MSDN with more performance tests.
Take a look also at this T-SQL challenge
USE AdventureWorks
-- This solution is SQL Server 2000 compatible - correlated subquery
SET STATISTICS TIME ON
SELECT Cust.[CustomerID],
Cust.[TerritoryID],
Cust.[AccountNumber],
Cust.[CustomerType],
Ord.[SalesOrderID],
Ord.[OrderDate],
Ord.[DueDate],
Ord.[ShipDate],
Ord.[Status],
Ord.[SubTotal], Ord.[TaxAmt],
Ord.[Freight], Ord.[TotalDue] FROM Sales.Customer Cust
INNER JOIN Sales.SalesOrderHeader Ord
ON Cust.CustomerID = Ord.CustomerID
WHERE Ord.OrderDate =
(SELECT MAX(OrderDate) AS LastDate FROM
Sales.SalesOrderHeader OH WHERE OH.CustomerID = Ord.CustomerID)
-- The above solution will return duplicate records if there is more than 1 maximum date for a given customer
SELECT Cust.[CustomerID],
Cust.[TerritoryID],
Cust.[AccountNumber],
Cust.[CustomerType],
Ord.[SalesOrderID],
Ord.[OrderDate],
Ord.[DueDate],
Ord.[ShipDate],
Ord.[Status],
Ord.[SubTotal], Ord.[TaxAmt],
Ord.[Freight], Ord.[TotalDue] FROM Sales.Customer Cust
INNER JOIN Sales.SalesOrderHeader Ord
ON Cust.CustomerID = Ord.CustomerID
WHERE Ord.SalesOrderID =
(SELECT top 1 SalesOrderID FROM
Sales.SalesOrderHeader OH WHERE OH.CustomerID = Cust.CustomerID order by OrderDate DESC)
-- This is SQL Server 2000 compatible solution based on derived table idea
SELECT Cust.[CustomerID],
Cust.[TerritoryID],
Cust.[AccountNumber],
Cust.[CustomerType],
Ord.[SalesOrderID],
Ord.[OrderDate],
Ord.[DueDate],
Ord.[ShipDate],
Ord.[Status],
Ord.[SubTotal], Ord.[TaxAmt],
Ord.[Freight], Ord.[TotalDue] FROM Sales.Customer Cust
INNER JOIN Sales.SalesOrderHeader Ord
ON Cust.CustomerID = Ord.CustomerID
INNER join (SELECT CustomerID, MAX(OrderDate) AS LastDate FROM
Sales.SalesOrderHeader OH GROUP BY CustomerID) LastOrder
ON Cust.CustomerID = LastOrder.CustomerID and Ord.OrderDate = LastOrder.LastDate
-- The same comment as above applies - it will return duplicate records in case of several same last dates for the Customer
--- Two solutions bellow are only available in SQL Server 2005 and up - if we want them to return multiple records
--- We would need to use RANK() function instead of ROW_NUMBER()
SELECT * FROM (SELECT Cust.[CustomerID],
Cust.[TerritoryID],
Cust.[AccountNumber],
Cust.[CustomerType],
Ord.[SalesOrderID],
Ord.[OrderDate],
Ord.[DueDate],
Ord.[ShipDate],
Ord.[Status],
Ord.[SubTotal], Ord.[TaxAmt],
Ord.[Freight], Ord.[TotalDue],
ROW_NUMBER() OVER (PARTITION BY Ord.CustomerID
ORDER BY OrderDate DESC) AS rown FROM Sales.Customer Cust
INNER JOIN Sales.SalesOrderHeader Ord
ON Cust.CustomerID = Ord.CustomerID) Ordered WHERE rown = 1
SELECT TOP 1 WITH ties Cust.[CustomerID],
Cust.[TerritoryID],
Cust.[AccountNumber],
Cust.[CustomerType],
Ord.[SalesOrderID],
Ord.[OrderDate],
Ord.[DueDate],
Ord.[ShipDate],
Ord.[Status],
Ord.[SubTotal], Ord.[TaxAmt],
Ord.[Freight], Ord.[TotalDue] FROM Sales.Customer Cust
INNER JOIN Sales.SalesOrderHeader Ord
ON Cust.CustomerID = Ord.CustomerID
ORDER BY ROW_NUMBER() OVER (PARTITION BY Ord.CustomerID
ORDER BY OrderDate DESC)
-- Compound key solution that outperforms all other solutions
SELECT Cust.[CustomerID],
Cust.[TerritoryID],
Cust.[AccountNumber],
Cust.[CustomerType],
Ord.[SalesOrderID],
Ord.[OrderDate],
Ord.[DueDate],
Ord.[ShipDate],
Ord.[Status],
Ord.[SubTotal], Ord.[TaxAmt],
Ord.[Freight], Ord.[TotalDue] FROM Sales.Customer Cust
INNER JOIN Sales.SalesOrderHeader Ord
ON Cust.CustomerID = Ord.CustomerID
INNER join (SELECT CustomerID,
MAX(CONVERT(NVARCHAR(30), OrderDate, 126) + CAST(SalesOrderID AS CHAR(12))) AS CompoundKey FROM
Sales.SalesOrderHeader OH GROUP BY CustomerID) LastOrder
ON Cust.CustomerID = LastOrder.CustomerID and Ord.SalesOrderID = CAST(RIGHT(LastOrder.CompoundKey,12) AS INT)
SET STATISTICS TIME OFF
With the results
(19127 row(s) affected)
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 327 ms, elapsed time = 825 ms.
(19119 row(s) affected)
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 578 ms, elapsed time = 1261 ms.
(19127 row(s) affected)
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 265 ms, elapsed time = 810 ms.
(19119 row(s) affected)
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 359 ms, elapsed time = 865 ms.
(19119 row(s) affected)
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 436 ms, elapsed time = 972 ms.
(19119 row(s) affected)
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 375 ms, elapsed time = 778 ms.
The execution plans
This illustrates a real case scenario – see ASP.NET forum’s thread
CREATE TABLE Items(
ItemId int NOT NULL,
ItemName varchar(20) NOT NULL,
ItemDesciption varchar(20) NULL,
Supplier varchar(20) NULL,
ItemValue numeric(10, 2) NULL,
CONSTRAINT PK_Items PRIMARY KEY CLUSTERED (ItemId ASC),
CONSTRAINT UQ_Items_Name UNIQUE NONCLUSTERED (ItemName ASC)
)
GO
CREATE TABLE Bid(
BidId int IDENTITY(1,1) NOT NULL,
ItemId int NOT NULL,
BidAmount decimal(10, 2) NOT NULL,
BidDateTime datetime NOT NULL,
CONSTRAINT PK_Bid PRIMARY KEY CLUSTERED (BidId ASC)
)
GO
CREATE NONCLUSTERED INDEX IX_Bid
ON dbo.Bid (ItemId ASC, BidAmount DESC)
ALTER TABLE dbo.Bid
ADD CONSTRAINT FK_Bid_Items1
FOREIGN KEY(ItemId)
REFERENCES dbo.Items (ItemId)
GO
ALTER TABLE dbo.Bid CHECK CONSTRAINT FK_Bid_Items1
GO
SET NOCOUNT ON
Declare @i int
Declare @j int
Declare @r int
Set @i=2000
While @i<=100000 Begin
Insert Into Items( ItemId, ItemName, ItemDesciption, ItemValue )
Select @i,
'Item' + Right('0000'+ Cast(@i as varchar(5)),5),
'Desc' + Right('0000'+ Cast(@i as varchar(5)),5),
cast(Rand()*1000. as numeric(10,2))
Select @r = Rand()*10, @j=1
While @j<=@r Begin
Insert Into Bid(ItemId, BidAmount, BidDateTime)
Select @i,
cast(Rand()*1000. as numeric(10,2)),
GetDate()
Set @j=@j+1
end
Set @i=@i+1
end
GO
6 cases combined:
set nocount on
declare @ItemsCount int, @BidsCount int
select @ItemsCount = COUNT(*) from Items
select @BidsCount = COUNT(*) from Bid
print 'Test case - Number of Items - ' + cast(@ItemsCount as varchar(10)) + ' Bids count - ' + cast(@BidsCount as varchar(10))
print replicate('-',50) + char(13) + 'Nikola''s solution - TOP clause '
set statistics time on
SELECT Bid.BidId,
Bid.ItemId,
Bid.BidAmount,
Bid.BidDateTime,
Items.ItemId AS Expr1,
Items.ItemName,
Items.ItemDesciption,
Items.Supplier,
Items.ItemValue
From Items
Left Join Bid
on Bid.ItemId = Items.ItemId
and Bid.BidId in (Select Top 1 BidId
From Bid b
Where b.ItemId=Items.ItemId
Order By b.BidAmount desc) order by Items.ItemId
set statistics time off
print replicate('-',50) + char(13) + 'Derived table solution - can have duplicates '
set statistics time on
SELECT X.BidId,
X.ItemId,
X.BidAmount,
X.BidDateTime,
Items.ItemId AS Expr1,
Items.ItemName,
Items.ItemDesciption,
Items.Supplier,
Items.ItemValue
From Items
Left Join (select Bid.* from Bid inner join (
Select ItemId, MAX(BidAmount) as BidAmount from Bid group by ItemId ) B on
b.ItemId=Bid.ItemId and Bid.BidAmount = B.BidAmount) X on Items.ItemId = X.ItemId order by Items.ItemId
set statistics time off
print replicate('-',50) + char(13) + 'Correlated subquery solution '
set statistics time on
SELECT Bid.BidId,
Bid.ItemId,
Bid.BidAmount,
Bid.BidDateTime,
Items.ItemId AS Expr1,
Items.ItemName,
Items.ItemDesciption,
Items.Supplier,
Items.ItemValue
From Items
Left Join Bid on Bid.ItemId = Items.ItemId where Bid.ItemId IS NULL OR
Bid.BidAmount = (select MAX(BidAmount) from Bid b where b.ItemId = Items.ItemId) order by Items.ItemId
set statistics time off
print replicate('-',50) + char(13) + 'ROW_NUMBER() solution '
set statistics time on
select * from (SELECT Bid.BidId,
Bid.ItemId,
Bid.BidAmount,
Bid.BidDateTime,
Items.ItemId AS Expr1,
Items.ItemName,
Items.ItemDesciption,
Items.Supplier,
Items.ItemValue, ROW_NUMBER() over (PARTITION by Items.ItemID order by Bid.BidAmount Desc) as rn
From Items
Left Join Bid on Bid.ItemId = Items.ItemId) X where rn = 1 order by X.ItemId
set statistics time off
print replicate('-',50) + char(13) + 'Row_number() - slight variation '
set statistics time on
SELECT top 1 with ties Bid.BidId,
Bid.ItemId,
Bid.BidAmount,
Bid.BidDateTime,
Items.ItemId AS Expr1,
Items.ItemName,
Items.ItemDesciption,
Items.Supplier,
Items.ItemValue
From Items
Left Join Bid on Bid.ItemId = Items.ItemId
order by ROW_NUMBER() over (PARTITION by Items.ItemID order by Bid.BidAmount Desc)
set statistics time off
print replicate('-',50) + char(13) + 'Finally - compound key solution '
set statistics time on
select Bid.ItemId,
Bid.BidAmount,
cast(right(Bid.CompKey,30) as datetime) as BidDateTime,
Items.ItemId AS Expr1,
Items.ItemName,
Items.ItemDesciption,
Items.Supplier,
Items.ItemValue
From Items
Left Join (select ItemID, Max(BidAmount) as BidAmount,
MAX(cast(BidAmount as char(10)) + convert(char(30),BidDateTime,126)) as CompKey from Bid group by ItemId) Bid
on Items.ItemId = Bid.ItemId order by Items.ItemId
set statistics time off
Test results in SQL Server 2008 Express
Test case - Number of Items - 100000 Bids count - 497381
————————————————–
Nikola’s solution - TOP clause
SQL Server Execution Times:
CPU time = 842 ms, elapsed time = 1904 ms.
————————————————–
Derived table solution - can have duplicates
SQL Server Execution Times:
CPU time = 624 ms, elapsed time = 2734 ms.
————————————————–
Correlated subquery solution
SQL Server Execution Times:
CPU time = 3729 ms, elapsed time = 6418 ms.
————————————————–
ROW_NUMBER() solution
SQL Server Execution Times:
CPU time = 1342 ms, elapsed time = 4175 ms.
————————————————–
Row_number() - slight variation
SQL Server Execution Times:
CPU time = 2137 ms, elapsed time = 4633 ms.
————————————————–
Finally - compound key solution
SQL Server Execution Times:
CPU time = 1108 ms, elapsed time = 2742 ms.
*** Remember, if you have a SQL related question, try our Microsoft SQL Server Programming forum or our Microsoft SQL Server Admin forum