It’s
a common practice by database developers to use local variables in stored
procedures and scripts to place filter on basis of these local variables. YES,
these local variables can slowdown your queries. Let’s prove it.
Create a new table and insert dummy rows.
CREATE TABLE TempTable
(tempID UNIQUEIDENTIFIER,tempMonth INT, tempDateTime DATETIME )
GO
GO 100000 -- (EXECUTE THIS BATCH 100000 TIME)
([tempDateTime] ASC)
INCLUDE ( [tempID]) WITH ( ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
Now let’s execute a simple query with hard coded values in WHERE clause
SELECT * FROM TempTable
WHERE tempDateTime > '2012-07-10 03:18:01.640'
-------------------------------------------------------------------------------------------
Table 'TempTable'. Scan count 1, logical reads 80, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SET @RequiredDate = '2012-07-10 03:18:01.640'
SELECT * FROM TempTable
WHERE tempDateTime > @RequiredDate
------------------------------------------------------------------------------------------
Table 'TempTable'. Scan count 1, logical reads 481, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
In our case of inequality operator in query, query optimizer used a simple formula of 30% of total rows.
In Case of Equality Operator:
DECLARE
@RequiredDate DATETIME
SET @RequiredDate = '2012-07-10 03:18:01.640'
SELECT * FROM TempTable
WHERE tempDateTime = @RequiredDate
Estimated Rows = Density * Total Number = 0.0007358352 * 100000 = 73.5835
Create a new table and insert dummy rows.
USE
AdventureWorks
GO CREATE TABLE TempTable
(tempID UNIQUEIDENTIFIER,tempMonth INT, tempDateTime DATETIME )
GO
INSERT INTO TempTable (tempID, tempMonth, tempDateTime)
SELECT NEWID(),(CAST(100000*RAND() AS INT) % 12) + 1 ,GETDATE()GO 100000 -- (EXECUTE THIS BATCH 100000 TIME)
-- Create an index to support our
query
CREATE NONCLUSTERED INDEX
[IX_tempDateTime] ON [dbo].[TempTable] ([tempDateTime] ASC)
INCLUDE ( [tempID]) WITH ( ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
Now let’s execute a simple query with hard coded values in WHERE clause
SET STATISTICS IO ON
GOSELECT * FROM TempTable
WHERE tempDateTime > '2012-07-10 03:18:01.640'
-------------------------------------------------------------------------------------------
Table 'TempTable'. Scan count 1, logical reads 80, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Check
out its execution plan and index seeks properties. You can find that estimated
rows are double to actual rows but that’s not a big difference to affect
execution plan and resultantly optimizer has selected a proper plan to execute
this query.
Query
optimizer has estimated number of rows from its base statistics histogram
i.e. EQ_ROWS + AVG_RANGE_ROWS (77 + 88.64286)
DBCC SHOW_STATISTICS ('dbo.TempTable', IX_tempDateTime)
Now,
let’s modify our SELECT query and use local variable and execute it. You will
find that query optimizer has selected a different plan this time, a more
costly plan. WHY ??
DECLARE
@RequiredDate DATETIMESET @RequiredDate = '2012-07-10 03:18:01.640'
SELECT * FROM TempTable
WHERE tempDateTime > @RequiredDate
------------------------------------------------------------------------------------------
Table 'TempTable'. Scan count 1, logical reads 481, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Huge
difference of estimated and actual number of rows clearly indicating that query
optimizer was unable to proprly estimate number of rows and with this wrong
estimation, it has selected a more costly execution plan. Basically Query Optimizer
does not know the value of local variable at the time of optimization and
resultantly can’t use histogram of statistics. It behaves differently with
inequality and equality operators.
In
Case of Inequality Operator:In our case of inequality operator in query, query optimizer used a simple formula of 30% of total rows.
Estimated Rows =(Total Rows * 30)/100
= (100000*30)/100 = 30000
SET @RequiredDate = '2012-07-10 03:18:01.640'
SELECT * FROM TempTable
WHERE tempDateTime = @RequiredDate
If
equality operator is used with local variables, query optimizer gets estimated
rows figure from a different formula i.e.
Density * Total Number of Table Rows. Execute following query
to get density value.
DBCC SHOW_STATISTICS
('dbo.TempTable', IX_tempDateTime)
All Density = 0.0007358352
Total Number of Rows in Table =
100000Estimated Rows = Density * Total Number = 0.0007358352 * 100000 = 73.5835
Drop table when not required
DROP TABLE
TempTable
I see the result too, but I have to disagree on the conclusion.
ReplyDeleteSince not all columns are included, the second query goes for a total scan for all rows to avoid the costly key lookup, hence 481 reads.
The first query takes only 3 reads to get the wanted 77 rows, but then it uses 77 reads to get the missing column. If you use SET STATISTICS TIME ON, you can see there is no (or minimal) time difference between the two approaches.
If you make tempmonth column included too, you get identical results with both queries.