Sort records without ORDER BY?
How often you see some smartie “optimizes” the query by removing ORDER BY, justifying that query always goes by that index, and index is in desired order for the result? Or they say: “I executed this query thousands of times and always got the result in order of that index”.
Order of the result might be really good for thousands of times, but it is not guaranteed. One day it is ok, the next day it isn’t. Let me show you with an example.
We will populate a table with 100 rows, where Id is identity (1-100), clustered key:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
USE tempdb GO IF OBJECT_ID('dbo.test_t', 'U') IS NOT NULL DROP table test_T GO CREATE TABLE test_t ( id INT IDENTITY PRIMARY KEY CLUSTERED, filler CHAR(4000) NOT NULL DEFAULT 'a', some_int int DEFAULT (ABS(CHECKSUM(NEWID()))%1000+1) ) GO INSERT INTO test_t(filler) SELECT TOP 100 'a' FROM sys.all_columns c CROSS JOIN sys.all_columns c2 GO |
Notice “some_int” column that is just some arbitrary int value, and a “filler” which makes this table a bit bigger in terms of pages. We want to select the rows from that table, and get them in order of Id without using ORDER BY. Let us assume we have a WHERE clause that implements certain business logic of our program:
1 2 |
DECLARE @m INT = 0x7FFFFFFF SELECT t.id FROM test_t t WHERE t.some_int BETWEEN -@m AND @m |
The query returns all 100 rows despite the WHERE clause, and all ordered nicely by ID. Seems like it is working! Or is it?
What if after some time, someone creates a non-clustered index? For example, someone wants to optimize SELECT operations on that table by creating suggested “missing indexes”:
1 2 3 4 5 6 |
CREATE INDEX test_t_ix ON test_t(some_int) GO DECLARE @m INT = 0x7FFFFFFF SELECT t.id FROM test_t t WHERE t.some_int BETWEEN -@m AND @m GO |
Wow! We have the SAME QUERY, but rows are NOT ordered anymore!
Why?
Because optimizer decided to go with a different execution plan (it goes with non-clustered index on some_int, and with a nested loop does a key lookup on a clustered index). Execution plans can change with time, as data grows, to better suite larger amounts of data or different distributions of data than it was before.
You could force an index with a hint, fix the plans with plan guide, but it is generally a bad practice. Because data change, and what was optimal then, might not be optimal now. With hints you criple the optimizer for a freedom to adjust to a data change. If you limit it, it cannot make a better plan for the new distribution of data that occured with time and you might experience slower perfromance.
Summary
Don’t rely on indexes to deliver the rows in some specific order. Plans change, data and statistics change, indexes are added and removed,it is a live tissue. The only guarantee that result will be sorted in particular way is ORDER BY. Use ORDER BY or leave sort operation out of the DB and let the consumer app do the sorts.
Leave a Reply