READ COMMITTED sucks!
Are you still using the READ COMMITTED transaction isolation level? The default transaction isolation level on all SQL Server versions (2000-2014) has serious inconsistency problems, by design. Not many people are familiar with how and why they happen. I am writing this because a fellow MCM just ran into the problem, scratching his head over how it is possible. A few days ago I also met an experienced Oracle DBA, a friend (yes, I do have Oracle friends! :)), who asked me “Does SQL Server still have consistency problems?”, making me scratch my head over what he meant by that. I will show you demo code here to reproduce the problem, and a solution to it.
How does READ COMMITTED work?
Under the default READ COMMITTED isolation level, shared locks on rows/pages are released as soon as the row is read. The consequence is two anomalies at the single-statement level, which some call “inconsistent analysis”:
– the same row is read multiple times
– legitimate rows are skipped, even though they are present in the table the whole time
These are serious, statement-level inconsistencies. A single, simple SELECT can read the same row twice and return duplicates in the result, or it can skip some rows entirely. The worst thing is that you will often not notice at all that the data is inconsistent.
If you are running a SELECT and another transaction updates a row you have already read in a way that moves it ahead of your reading point, you will read that row twice. If another transaction updates a row you have not yet read in a way that moves it behind your reading point, you will not read that row at all. Such data anomalies are difficult to reproduce because the timing is crucial.
Let’s reproduce it!
Here is the script. You have to be fast with the second session because timing is crucial:
--==== SETUP - begin
create database test_artefacts
GO
alter database test_artefacts set recovery simple
alter database test_artefacts set read_committed_snapshot off
alter database test_artefacts set allow_snapshot_isolation off
go
use test_artefacts
go
create table big_table
(
    id int primary key clustered,
    filler char(8000) default('a')
)
go
-- insert 500 000 rows. Or number large enough for selecting all rows from that table to take at least 5 seconds.
truncate table big_table
go
insert into big_table(id, filler)
select t.id, t.id
from
(
    select top 10000 id = ROW_NUMBER() over(order by (select null)) + isnull((SELECT max(id) from big_table), 0)
    from sys.all_columns a cross join sys.all_columns b
) t
go 50

if OBJECT_ID('tempdb..#t') is not null drop table #t
--select COUNT(*) from big_table -- 500 000
--==== SETUP - end

-- Start session 1, then session 2 very soon after (about 1 second or less)

--==== SESSION 1 - begin
-- Copy 500 000 rows table
select * into #t from big_table
go
--==== SESSION 1 - end

--==== SESSION 2 - begin
use test_artefacts
go
--sp_spaceused big_table
--Update first rows to move them at the end of the clustered index.
--We are NOT changing number of records. We do not do inserts nor deletes.
--Number of rows should stay constant.
update t set id=id+1000000
from (select top 1 * from dbo.big_table order by id) t
go 123
--==== SESSION 2 - end

--==== RESULT
select COUNT(*) from #t        -- 500 123! We expected 500 000.
select COUNT(*) from big_table -- 500 000!

-- Run the same experiment again, but this time with this code in session 2:
update t set id=-id
from (select top 1 * from dbo.big_table order by id desc) t
go 123

--==== RESULT
select COUNT(*) from #t        -- 499 877! We expected 500 000.
select COUNT(*) from big_table -- 500 000!
Basically, it creates a database and a 500 000-row table, then copies the full table into a #temp table while another session runs updates. We would expect the copy to have the same number of rows as the original table (500 000), since an UPDATE command does not change the number of rows – at every moment there are exactly 500 000 rows in the table.
But – surprise!
In the first example we copied 123 rows too many (500 123), showing that some rows were read multiple times.
In the second example we copied 123 rows too few (499 877), showing that some rows were skipped during the copy.
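If you want to see exactly which rows caused the mismatch, a quick comparison between the copy and the source is enough (a small sketch; run it right after an experiment, while the #t table from the script above still exists):

-- Ids we copied into #t that no longer exist in the source table
-- (old versions of the rows that were moved while we were reading)
select id from #t
except
select id from big_table

-- Ids that exist in the source table but never made it into the copy
-- (rows that were moved behind the reading point, as in the second experiment)
select id from big_table
except
select id from #t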
The solution
The solution to this complex problem is really simple, and it has been available since SQL 2005: READ COMMITTED SNAPSHOT. You can enable it at the database level with a single ALTER DATABASE command (see the example below). But it also has consequences: tempdb usage grows, and changed rows become 14 bytes wider because of the version pointer, possibly causing page splits if your fill factor is 100%. So you probably want to test it before enabling it on existing systems; for new databases it should be safe to use. On SQL 2000 your only option is to introduce more locks: the REPEATABLE READ or SERIALIZABLE isolation level.
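For example, this is how the demo database from the script above could be switched over (a sketch, not a full rollout plan; note that the ALTER needs exclusive access to the database, so on a live system you may need WITH ROLLBACK IMMEDIATE and a quiet moment):

-- Enable READ COMMITTED SNAPSHOT on the demo database
alter database test_artefacts set read_committed_snapshot on with rollback immediate
go
-- Verify the setting
select name, is_read_committed_snapshot_on
from sys.databases
where name = 'test_artefacts'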
Just to remind our Oracle friends:
Returning a snapshot of old data that is in the midst of being modified is, again, not necessarily the best option, nor is it necessarily telling “the truth”.
It might just as well be that there is a serious mistake, like a wrong product price, which someone detected and is just updating. But still, the end user will get the old and wrong $1 price for the new Mac – whereas under READ COMMITTED this read would wait until the correction is done.
So it always depends. Having a choice and knowing about it is what leads to “returning the right information” (and not only consistent information).
Andreas, thank you for your valuable comment. Indeed, with SQL Server, if another transaction is modifying a row, we can choose whether to wait for that transaction to finish and read the fresh value (standard READ COMMITTED behaviour), or not to wait and read the value as it was before the other transaction began changing the row (READ COMMITTED SNAPSHOT). Even if the database is in RC SNAPSHOT mode, we can get standard RC behaviour per statement with the READCOMMITTEDLOCK hint, as in the example below.
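For example (a sketch against the demo table from this post), this forces a single query back to classic locking READ COMMITTED behaviour even when the database has READ_COMMITTED_SNAPSHOT turned on:

-- Per-statement override: take shared locks and wait for in-flight changes,
-- instead of reading the last committed row version
select COUNT(*)
from dbo.big_table with (READCOMMITTEDLOCK)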
Hi,
I strongly recommend reading this great article by Kendra Little:
http://www.brentozar.com/archive/2013/01/implementing-snapshot-or-read-committed-snapshot-isolation-in-sql-server-a-guide/
Regarding what Andreas says, there is also a great article by Craig Freedman that explains how you can end up having different results from the same set of operations, depending on the transaction isolation level you choose: http://blogs.msdn.com/b/craigfr/archive/2007/05/16/serializable-vs-snapshot-isolation-level.aspx
Regards