Often, I see in articles and code a query using the EXISTS logical operator and a Subquery within the EXISTS using SELECT *. Typically this is an area where it’s been instilled in me to use SELECT 1 instead of SELECT * for performance reasons. However, is it a myth or is it reality that SELECT * vs. SELECT 1 will make an impact to performance when used in the EXISTS logical operator? The best way to find the answer is to take a closer look.
**Explanation of SELECT 1 vs. SELECT ***
There is one essential difference between the use of SELECT * and SELECT 1. SELECT * will expand the column list and then throw what isn’t needed out. Now, don’t take, “throw what isn’t needed out” literally. The compilation of the query will simply determine which columns are relevant and to be used. With SELECT 1, this step isn’t performed during compilation..
It’s important to note that compilation is where the effects of this occur. The runtime versions of these different methods will be used functionally, the same with the same performance.
Example
To look closely at the difference in compilation and execution time, take the following two queries.
SELECT SalesOrderID FROM Sales.SalesOrderHeader
WHERE EXISTS (SELECT * FROM Sales.SalesOrderDetail WHERE CarrierTrackingNumber = '4911-403C-98')
OPTION (RECOMPILE)
SELECT SalesOrderID FROM Sales.SalesOrderHeader
WHERE EXISTS (SELECT 1 FROM Sales.SalesOrderDetail WHERE CarrierTrackingNumber = '4911-403C-98')
OPTION (RECOMPILE)
Looking at both execution plans, they are identical.
These queries perform the same task but are differentiated by the use of * and 1 in the EXISTS logical operator. If we execute these two queries using SQLQueryStress to examine the elapsed time closely, we can determine if there truly a difference between the two methods
As shown, there is a slight improvement with the use of SELECT 1. Running this test repeatedly shows variable returns in elapsed time. However, the running average of SELECT 1 has a better overall impact and is constant.
Now, the impact of .0190 is extremely low. In order to really see if the metadata expansion of the columns has a larger impact when there are a high number of columns, run the following code to create an extremely wide table.
DECLARE @int int = 1
DECLARE @cmd VARCHAR(max) = ''
WHILE @int <= 1000
BEGIN
SET @cmd = @cmd + ' col' + CAST(@int AS varchar(4)) + ' tinyint, '
SET @int += 1
END
SET @cmd = 'CREATE TABLE tbl (SalesOrderID INT, ' + @cmd + ' MyWideTable int)'
Exec (@cmd)
This creates a table that has 1002 columns in it. Now, alter the select statements to see if the expansion of the columns and review of the metadata during compilation affects the elapsed time. Note: we haven’t loaded data into tbl because it should not be required for this operation.
As shown, the time difference in using SELECT 1 vs. SELECT * has roughly the same effect even with the extremely high count of columns. The fact remains that the need to expand the columns and remove what is not needed is part of the compilation process. The higher column count really didn’t show much of an effect over a typical column with 20 or 30 columns.
Summary
Although there is a minimal impact when using SELECT * over SELECT 1 in the EXISTS logical operator, there truly isn’t enough impact to cause a throw down on calling it a best practice to never use SELECT * in the EXISTS. However, an impact is an impact and for that, we sit on the fence of best practices. This is why I’ll keep using SELECT 1 in this type of query and why I’ll keep recommending it over the SELECT *.