Monday, May 23, 2011

SQL Server: Which Performs Better, IN or EXISTS


It’s true that to solve a problem in TSQL you have many choices, and adopting any of these techniques you can get your desired results. Today we will discuss two very commonly used IN and EXISTS clause to filter query result set based on records from a sub-query. As compared to EXISTS, IN is more commonly used, but which one is more efficient? That’s what we are looking for, today.
 IN and EXISTS perform in a same way if we use them with single column search. Like as follow:
USE AdventureWorks
GO
--IN
SELECT * FROM Production.Product pr
WHERE ProductID IN
      (SELECT ProductID FROM Purchasing.ProductVendor)
--EXISTS
SELECT * FROM Production.Product pr
WHERE EXISTS
      (SELECT 1 FROM Purchasing.ProductVendor  
              WHERE ProductID = pr.ProductID)
Logical reads and query elapsed time is also same for both queries.
In situations where you need to filter records based on more than one columns existence in sub-query, you will find EXISTS much better in performance. To observe this, let’s create two temporary tables.
CREATE TABLE #Cars (Make VARCHAR(50), Color VARCHAR(30), Seats INT)
CREATE TABLE #CarIssuance (Make VARCHAR(50),Color VARCHAR(30),IssuanceDate DATETIME)
GO
INSERT INTO #Cars
SELECT 'Honda','Black',2 union all
SELECT 'Honda','White',2 union all
SELECT 'Toyota','Black',4 union all
SELECT 'Toyota','Silver',4 union all
SELECT 'BMW', 'Red',2
GO
INSERT INTO #CarIssuance
SELECT 'Honda','Black','2011-05-20' union all
SELECT 'BMW','Red','2011-05-03' union all
SELECT 'Toyota','Black','2011-05-03'
If we need to get all records from #Cars table, where records exists in #CarIssuance table on basis of “make” and “color” columns. Let’s first try traditional IN clause.
SELECT * FROM #Cars
WHERE Color  IN
 (SELECT Color FROM #CarIssuance)
 AND Make IN
      (SELECT Make FROM #CarIssuance)
Table '#CarIssuance Scan count 2, logical reads 8, physical reads 0, read-ahead reads 0
Table '#Cars Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0
SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 40 ms.
Now execute same query with EXISTS option
SELECT * FROM #Cars Cr
WHERE
 EXISTS
 (SELECT 1 FROM #CarIssuance CI
                        WHERE Make = Cr.Make
                              AND Color = Cr.Color)

Table '#CarIssuance Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0
Table '#Cars Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0
SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 1 ms.
Exists performed much better (We got result in 1ms instead of 40ms) just because we have used single sub-query but in case of “IN”, we have used multiple sub-queries. So if somehow, we change our first query of “IN” clause so that we can use only one sub-query, performance will be same as to EXISTS. But code will be bit mixed up and in real life sometime even its not possible.
SELECT * FROM #Cars Cr
WHERE Color  IN
 (SELECT Color FROM #CarIssuance WHERE Make = Cr.Make)
Summary: Exists performs much better when used for more then one column filter from a sub-query but as a good programming practice EXISTS must be preferred even when handling with single column filter.

3 comments:

  1. The last 2 queries are NOT equivalent!

    The EXISTS version checks that 'Make' and 'Color' are found on the same line.

    The IN version check that 'Make' and 'Color' are both found, but possibly on different lines.

    The correct EXISTS equivalent for the IN query is SELECT * FROM #Cars Cr WHERE EXISTS (SELECT 1 FROM #CarIssuance WHERE Make = Cr.Make) AND EXISTS (SELECT 1 FROM #CarIssuance WHERE Color = Cr.Color)

    The equivalent IN query for the EXISTS query is the one you have indicated at the end of your article. As you noted, in that case performance is the same...

    ReplyDelete
  2. The multicolumn Exists and In comparison is not logically the same. The Exists version the make and model must match on the same record, while the In version, the field can match any record.

    ReplyDelete
  3. Your conclusion is logical - "IN" gets a list of all product ids and then tries to find if your current product id matches one from the list. As for "EXISTS", it just checks for in sub-query if the ID simply exists in the table.

    The gap will be much higher once there is more data in your other table.

    ReplyDelete

All suggestions are welcome