How NOT to use CROSS APPLY
OUTER APPLY(SELECT TOP 1) pattern is widespread, even baked into some frameworks like Entity Framework. But is it a good practice? This demo will show. Based on a real-world optimization problem and solved with great success.
The demo shows it can produce a very bad performing queries, as it produces “nested loops” which is not optimal for bigger number of rows/iterations. For more rows much more efficient is merge or hash join. Also order of join is set in code, while much better is to give optimizer freedom to choose the order. Using outer apply top 1 “anti-pattern” everywhere is a bad practice. Alternatives like grouping and windowing functions often perform much much better, as optimizer can choose other join types, it is not limited to just “nested loops”, and can choose different join order. Demo shows how giving optimizer freedom and avoiding this anti-pattern can result in 100x or 1000x faster performance. Example is from real-life situation, where I optimized a long running query from 7h down to around 2minutes.
Entire code from the demo is here:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 |
SET STATISTICS IO, TIME OFF GO IF OBJECT_ID('dbo.Street') IS NOT NULL DROP TABLE dbo.Street GO SELECT TOP 5000000 -- 5M StreetId = ROW_NUMBER() OVER(ORDER BY(SELECT NULL)), -- 1,2,... StreetName = LEFT(NEWID(), 10), MaxAmount = CONVERT(money, NULL), DateOfMaxAmount = CONVERT(SMALLDATETIME, NULL) INTO dbo.Street FROM sys.all_columns, sys.all_columns c GO IF OBJECT_ID('dbo.StreetDetails') IS NOT NULL DROP TABLE dbo.StreetDetails GO SELECT TOP 5000000 -- 5M StreetId = ABS(CHECKSUM(NEWID()))%50000+1, -- 1-50 000 DateFrom = CONVERT(SMALLDATETIME, DATEADD(MINUTE, CHECKSUM(NEWID())%100000, GETDATE())), Amount = ABS(CHECKSUM(NEWID()))%1000 INTO dbo.StreetDetails FROM sys.all_columns, sys.all_columns c GO SELECT TOP 20 s.* FROM dbo.Street s ORDER BY s.StreetId SELECT TOP 20 d.* FROM dbo.StreetDetails d ORDER BY d.StreetId, DateFrom SET STATISTICS IO, TIME ON GO -- How long must this query run? UPDATE s SET s.MaxAmount = f.Amount, s.DateOfMaxAmount = f.DateFrom -- SELECT TOP 10 * FROM dbo.Street s CROSS APPLY ( SELECT TOP 1 d.Amount, d.DateFrom FROM dbo.StreetDetails d WHERE d.StreetId = s.StreetId AND d.DateFrom < GETDATE() ORDER BY d.Amount DESC ) f -- OPTION(NO_PERFORMANCE_SPOOL) GO UPDATE s SET s.MaxAmount = d.Amount, s.DateOfMaxAmount = d.DateFrom FROM dbo.Street s JOIN ( SELECT d.StreetId, d.Amount, d.DateFrom, Rnr = ROW_NUMBER() OVER(PARTITION BY StreetId ORDER BY d.Amount DESC) FROM dbo.StreetDetails d WHERE d.DateFrom < GETDATE() ) d ON d.StreetId = s.StreetId WHERE d.Rnr = 1 -- top 1 GO -- DROP INDEX IX_Street_CL ON dbo.Street CREATE CLUSTERED INDEX IX_Street_CL ON dbo.Street(StreetId) -- DROP INDEX IX_StreetDetails_CL ON dbo.StreetDetails CREATE CLUSTERED INDEX IX_StreetDetails_CL ON dbo.StreetDetails(StreetId, Amount) GO |
Thanks Vedran for this article.
From my experience, I also encountered many situations where CROSS APPLY is not the best solution because as you said leads to nested loop (with outer references) which is not fit well for all scenarios.
The second solution is also interesting with (dummy) CCI batch mode (in some degrees and depending on the version you’re using of course) with ROW_NUMBER() PARTITION BY and corresponding Windows Aggregate operator. Batch mode has good chance to be propagated from right to left of the execution plan as well.
Thanks David for your valuable input!