SQL Server: Which Performs Better, IN or EXISTS : Connect SQL

Monday, May 23, 2011

SQL Server: Which Performs Better, IN or EXISTS

It’s true that to solve a problem in TSQL you have many choices, and adopting any of these techniques you can get your desired results. Today we will discuss two very commonly used IN and EXISTS clause to filter query result set based on records from a sub-query. As compared to EXISTS, IN is more commonly used, but which one is more efficient? That’s what we are looking for, today.

IN and EXISTS perform in a same way if we use them with single column search. Like as follow:

USE AdventureWorks

--IN

SELECT * FROM Production.Product pr

WHERE ProductID IN

(SELECT ProductID FROM Purchasing.ProductVendor)

--EXISTS

SELECT * FROM Production.Product pr

WHERE EXISTS

(SELECT 1 FROM Purchasing.ProductVendor
WHERE ProductID = pr.ProductID)

Logical reads and query elapsed time is also same for both queries.

In situations where you need to filter records based on more than one columns existence in sub-query, you will find EXISTS much better in performance. To observe this, let’s create two temporary tables.

CREATE TABLE #Cars (Make VARCHAR(50), Color VARCHAR(30), Seats INT)

CREATE TABLE #CarIssuance (Make VARCHAR(50),Color VARCHAR(30),IssuanceDate DATETIME)

INSERT INTO #Cars

SELECT 'Honda','Black',2 union all

SELECT 'Honda','White',2 union all

SELECT 'Toyota','Black',4 union all

SELECT 'Toyota','Silver',4 union all

SELECT 'BMW', 'Red',2

INSERT INTO #CarIssuance

SELECT 'Honda','Black','2011-05-20' union all

SELECT 'BMW','Red','2011-05-03' union all

SELECT 'Toyota','Black','2011-05-03'

If we need to get all records from #Cars table, where records exists in #CarIssuance table on basis of “make” and “color” columns. Let’s first try traditional IN clause.

SELECT * FROM #Cars

WHERE Color IN

(SELECT Color FROM #CarIssuance)

AND Make IN

(SELECT Make FROM #CarIssuance)

Table '#CarIssuance Scan count 2, logical reads 8, physical reads 0, read-ahead reads 0

Table '#Cars Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 40 ms.

Now execute same query with EXISTS option

SELECT * FROM #Cars Cr

WHERE

EXISTS

(SELECT 1 FROM #CarIssuance CI

WHERE Make = Cr.Make

AND Color = Cr.Color)

Table '#CarIssuance Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0

Table '#Cars Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 1 ms.

Exists performed much better (We got result in 1ms instead of 40ms) just because we have used single sub-query but in case of “IN”, we have used multiple sub-queries. So if somehow, we change our first query of “IN” clause so that we can use only one sub-query, performance will be same as to EXISTS. But code will be bit mixed up and in real life sometime even its not possible.

SELECT * FROM #Cars Cr

WHERE Color IN

(SELECT Color FROM #CarIssuance WHERE Make = Cr.Make)

Summary: Exists performs much better when used for more then one column filter from a sub-query but as a good programming practice EXISTS must be preferred even when handling with single column filter.

3 comments:

doeMay 24, 2011 at 9:07 AM
The last 2 queries are NOT equivalent!

The EXISTS version checks that 'Make' and 'Color' are found on the same line.

The IN version check that 'Make' and 'Color' are both found, but possibly on different lines.

The correct EXISTS equivalent for the IN query is SELECT * FROM #Cars Cr WHERE EXISTS (SELECT 1 FROM #CarIssuance WHERE Make = Cr.Make) AND EXISTS (SELECT 1 FROM #CarIssuance WHERE Color = Cr.Color)

The equivalent IN query for the EXISTS query is the one you have indicated at the end of your article. As you noted, in that case performance is the same...
ReplyDelete
Replies
AnonymousMay 24, 2011 at 11:47 AM
The multicolumn Exists and In comparison is not logically the same. The Exists version the make and model must match on the same record, while the In version, the field can match any record.
ReplyDelete
Replies
itoctopusSeptember 20, 2012 at 1:22 PM
Your conclusion is logical - "IN" gets a list of all product ids and then tries to find if your current product id matches one from the list. As for "EXISTS", it just checks for in sub-query if the ID simply exists in the table.

The gap will be much higher once there is more data in your other table.
ReplyDelete
Replies

Add comment

All suggestions are welcome

Connect SQL

Monday, May 23, 2011

SQL Server: Which Performs Better, IN or EXISTS

3 comments:

Translate

About Me

Categories

Sql Server Must Visit Blogs