It’s true that to solve a problem in TSQL you have many choices, and adopting any of these techniques you can get your desired results. Today we will discuss two very commonly used IN and EXISTS clause to filter query result set based on records from a sub-query. As compared to EXISTS, IN is more commonly used, but which one is more efficient? That’s what we are looking for, today.
IN and EXISTS perform in a same way if we use them with single column search. Like as follow:
USE AdventureWorks
GO
--IN
SELECT * FROM Production.Product pr
WHERE ProductID IN
(SELECT ProductID FROM Purchasing.ProductVendor)
--EXISTS
SELECT * FROM Production.Product pr
WHERE EXISTS
(SELECT 1 FROM Purchasing.ProductVendor
WHERE ProductID = pr.ProductID)
Logical reads and query elapsed time is also same for both queries.
In situations where you need to filter records based on more than one columns existence in sub-query, you will find EXISTS much better in performance. To observe this, let’s create two temporary tables.
CREATE TABLE #Cars (Make VARCHAR(50), Color VARCHAR(30), Seats INT)
CREATE TABLE #CarIssuance (Make VARCHAR(50),Color VARCHAR(30),IssuanceDate DATETIME)
GO
INSERT INTO #Cars
SELECT 'Honda','Black',2 union all
SELECT 'Honda','White',2 union all
SELECT 'Toyota','Black',4 union all
SELECT 'Toyota','Silver',4 union all
SELECT 'BMW', 'Red',2
GO
INSERT INTO #CarIssuance
SELECT 'Honda','Black','2011-05-20' union all
SELECT 'BMW','Red','2011-05-03' union all
SELECT 'Toyota','Black','2011-05-03'
If we need to get all records from #Cars table, where records exists in #CarIssuance table on basis of “make” and “color” columns. Let’s first try traditional IN clause.
SELECT * FROM #Cars
WHERE Color IN
(SELECT Color FROM #CarIssuance)
AND Make IN
(SELECT Make FROM #CarIssuance)
Table '#CarIssuance Scan count 2, logical reads 8, physical reads 0, read-ahead reads 0
Table '#Cars Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 40 ms.
Now execute same query with EXISTS option
SELECT * FROM #Cars Cr
WHERE
EXISTS
(SELECT 1 FROM #CarIssuance CI
WHERE Make = Cr.Make
AND Color = Cr.Color)
Table '#CarIssuance Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0
Table '#Cars Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
Exists performed much better (We got result in 1ms instead of 40ms) just because we have used single sub-query but in case of “IN”, we have used multiple sub-queries. So if somehow, we change our first query of “IN” clause so that we can use only one sub-query, performance will be same as to EXISTS. But code will be bit mixed up and in real life sometime even its not possible.
SELECT * FROM #Cars Cr
WHERE Color IN
(SELECT Color FROM #CarIssuance WHERE Make = Cr.Make)
Summary: Exists performs much better when used for more then one column filter from a sub-query but as a good programming practice EXISTS must be preferred even when handling with single column filter.