Wednesday, January 12, 2011

SQL SERVER: Why We Should Avoid COUNT (*)


A blog reader asked that why should we avoid COUNT (*) and why it is declared forbidden in an early post SQL SERVER: 9 Most Forbidden Things . Actual point was that we should avoid to use COUNT (*) to check records existence. How few developers check record existence:

USE AdventureWorks
GO
-- Declare a variable to store intermediate results
DECLARE @TotalCount INT
-- Count total records and place resultant figure in variable
SELECT  @TotalCount=COUNT(*) FROM Sales.SalesOrderDetail
IF @TotalCount >0
-- If records exists then

      SELECT 1

If we place SET STATISTICS IO ON before query to check logical reads we will get following results.

Table 'SalesOrderDetail'. Scan count 1, logical reads 228, physical reads 0, read-ahead reads 0

Now modify above statements as follow to get same results and check how costly the query is.

IF EXISTS(SELECT 1 FROM Sales.SalesOrderDetail)

      SELECT 1

Table 'SalesOrderDetail'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0

Difference between logical reads of both queries is huge and that is the reason we should avoid count (*)

If your target is to count all records for some other options use following query

USE AdventureWorks
GO

SELECT  sys.partitions.rows
FROM    sys.partitions
WHERE   index_id = 1

        AND object_id = OBJECT_ID('Sales.SalesOrderDetail')

Table 'sysrowsets'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0

I urge you to execute above query and COUNT (*) query for largest table of your own database and check the difference of logical reads.

6 comments:

  1. Is it possible to use this technique in criteria queries?

    ReplyDelete
  2. Why can't SQL Server optimize our count(*) query to perform as well as your version?

    ReplyDelete
  3. One thing to bear in mind is accuracy - the sys.partitions approach may not be 100% accurate depending on when the metadata was last updated by SQL Server. If you don't need it to be 100% accurate then it's a great way to do it!

    ReplyDelete
  4. Thanx Adrian for mentioning this important point.

    ReplyDelete
  5. Im sorry, but i dont see this improvement in a real-life sittuation, this is: to check if a record exist in some conditions (it is not usuall for me to check if a table has records, it will always have, i want to know about the exact condition).

    That said, i tried to check if i could use your tip to improve the performance, i used the following code

    trial 1
    SELECT COUNT(*) FROM dbo.Orders where CustomerID = 'VINET'

    trial 2
    IF EXISTS(SELECT 1 FROM dbo.Orders where CustomerID = 'VINET') SELECT 1

    And the result on the execution plan is 50% - 50% !!!!

    Both effort will reside in the index seek in that case, i think your tip will only work when there is no need to read anything from the database to check the conditions, is this right? Or do i have a better way to check for records?

    Thanks and regards

    ReplyDelete
  6. Thanks for the article. Sorry Gabriel I saw great improvement with the tip.

    Trial 1
    SET STATISTICS IO ON
    SELECT COUNT(*) FROM dbo.myTable where ID = 113
    (1 row(s) affected) count = 430,115
    Table 'properties'. Scan count 1, logical reads 592, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.



    Trial 2
    SET STATISTICS IO ON
    IF EXISTS(SELECT 1 FROM dbo.myTable where ID = 113)
    SELECT 1

    Table 'properties'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    ReplyDelete

All suggestions are welcome