Tuesday, May 10, 2011

SQL Server: Short Code Doesn’t Mean Smart Code


Recently a blog reader shared an interesting thing. This mail was basically in response to my early post Why to Avoid TRIM functions in WHERE and JOIN clauses,  where we have discussed, that why we should avoid functions (user defined or system) in where clause columns, because these functions in WHERE, JOIN and GROUP  clauses mislead query optimizer for proper index selection and ultimately results in poor query performance.
Blog reader asked that he tried to remove functions from WHERE clause of all the queries but few queries where date was involve, was hard to correct and after asking at some forum he got a solution and now his code is shorter and quicker.
Actual query was something like as following:
Use AdventureWorks
Go
DECLARE @FindDate DATETIME
SET @FindDate = '2005/09/12 12:00:00'

SELECT  *
FROM    Purchasing.PurchaseOrderDetail
WHERE   CAST(CONVERT(VARCHAR(30), ModifiedDate, 101) AS DATETIME) = CAST(CONVERT(VARCHAR(30), @FindDate, 101) AS DATETIME)
It’s an everyday query, where records from “Purchasing.PurchaseOrderDetail” table are required but where modifieddate column values are equal to given parameter (date). But comparison should be based on date only and time portion should be ignored.
Modified smart query :
SELECT  *
FROM    Purchasing.PurchaseOrderDetail
WHERE   DATEDIFF(DD,ModifiedDate,@FindDate) = 0
Though new query is shorter, but is it quick? Let’s checkout input/output statistics, query time and execution plan for both quires.

Table 'PurchaseOrderDetail'. Scan count 1, logical reads 66, physical reads 0
CPU time = 0 ms,  elapsed time = 11 ms.

Table 'PurchaseOrderDetail'. Scan count 1, logical reads 66, physical reads 0
CPU time = 0 ms,  elapsed time = 9 ms
Though second query looks more smart but if we ignore minor difference of query elapsed time, both query are almost same as both queries are using clustered index scan and have same value of logical reads. Because problem still exists i.e. Function on WHERE clause columns. For best query performance we have to get rid of this DATEDIFF function too. Here is a better version, as per performance and not the code because our first priority should be performance.
SELECT  *
FROM    Purchasing.PurchaseOrderDetail
WHERE   ModifiedDate >= CAST(
CONVERT(VARCHAR(30), @FindDate, 101) AS DATETIME)
        AND ModifiedDate < = DATEADD(SS, 86399,
                                     CAST(CONVERT(VARCHAR(30), @FindDate, 101) AS DATETIME))
Query code is even more lengthy then first version but what about performance, lets check out input/output stats with query time.



Table 'PurchaseOrderDetail'. Scan count 1, logical reads 4, physical reads 0
CPU time = 0 ms,  elapsed time = 1 ms.
From execution plan, it’s clear that after removing functions from modifieddate column (used in WHERE clause), query optimizer selected proper non clustered index, which searched only 4 pages for result and finally query performance is increased.

4 comments:

  1. another interesting post , thanks dude now I have become permanent subscriber of your SQL articles.great job.

    JP
    How HashMap works in Java

    ReplyDelete
  2. Would using BETWEEN be cleaner and just as fast?

    ReplyDelete
  3. i prefer this

    AND ModifiedDate < DATEADD(d, 1, CAST(CONVERT(VARCHAR(30), @FindDate, 101) AS DATETIME))

    over this

    AND ModifiedDate <= DATEADD(SS, 86399, CAST(CONVERT(VARCHAR(30), @FindDate, 101) AS DATETIME))

    You should avoid using seconds because datetime data type is accurate upto milliseconds and datetime2 datatype is accurate upto microseconds. Use day for simplicity and to avoid confusion.

    Regards,
    Syed Muhammad Yasir

    ReplyDelete
  4. I like the comment above.

    Regards,
    Muntazer Mehdi

    ReplyDelete

All suggestions are welcome