Jeremy Hylton : weblog : 2003-05-14

Consistency in ZODB

Wednesday, May 14, 2003

ZODB uses optimistic concurrency control. It can be hard to describe what isolation guarantees it provides, because optimistic implementations don't map naturally onto the ANSI isolation levels.

Atul Adya wrote a thesis that describes isolation levels using directed graphs. A node in the graph is a transaction, and an edge is a dependency between the two transactions. These definitions make sense for optimistic and multi-version schemes as well as traditional locking implementations.

I've made some notes about Adya's thesis, and about how they apply to ZODB. I had hoped to understand how the formalisms apply to ZODB, but I'll need to spend some more time on it.

Adya's thesis is available online -- Weak Consistency: A Generalized Theory and Optimistic Implementations for Distributed Transactions -- and so is a shorter conference paper that presents that basic graph notion -- Generalized Isolation Level Definitions.

There are three kinds of direct dependencies between transactions. (See section 3.1.4 of Adya' thesis.)

write-depends
Ti installs a version of x and Tj installs a new version of x, where install means writes x and commits.

read-depends
Ti installs xi and Tj reads xi.

anti-depends
Ti reads xi and Tj installs xj, where xj is the next version of x

ZODB reports two kinds of conflict errors: read conflicts and write conflicts. A read conflict occurs while a transaction is executing. A write conflict occurs while a transaction is committing. ZODB uses backward validation, which means that a committing transaction is compared against all earlier committed transactions for conflicts.

When a transaction commits, each object it wrote is checked for conflicts. If the revision of the object the transaction read is not the same as the current revision in the database, a ConflictError is raised. This means another transaction installed a new version of the object while the current transaction was running. The explanation in Adya's terminology goes like this:

There are three transactions to consider:

The history of activities would be something like this:

w1 c1 r2 r3 w3 c3 w2 c2
(w -- write, r -- read, c -- commit).

This history is invalid, and ZODB wil report a conflict. There are several dependencies present in this history:

  1. T2 directly read-depends on T1
  2. T3 directly read-depends on T1

  3. T3 directly anti-depends on T2

  4. T1 directly write-depends on T2
  5. T2 directly write-depends on T3

The resulting graph has a cycle between T2 and T3, resulting from edges #3 and #5. T2 reads the object and T3 installs the next version, creating an anti-dependency (#3). But if T2 were to commit, it would create a write-dependency in the other direction (#5).

One significant drawback of ZODB's consistency mechanism is that reads are not tracked explicitly. If an object is loaded in memory when a transaction starts, it is possible to see an inconsistent view of the database. The cause appears to be efficiency. Persistent objects provide no read notification to their database connection. As a result, the database has no idea what objects a transaction reads unless it is asked to fetch the object from storage. When a transaction starts, any objects modified by earlier transactions are flushed from the cache. So the window of vulnerability is at most the length of a transaction.

It's hard to say if what isolation level ZODB provides given this problem with reading stale data. It certainly provides PL-2 isolation (basicly equivalent to ANSI read committed), since PL-2 allows inconsistent reads. Adya presents PL-2+ as the weakest level that provides basic consistency (originally defined by Weihl) provided that updates are serializable.

Definition 11 : Basic-Consistency. A transaction Tj is provided basic-consistency if the values read by Tj are the result of a serial execution of some subset of committed update transactions and each update transaction in the serial execution executes the same steps as it did in the concurrent execution.

In practice, this means that a programmer could rely on PL-2+ if a transaction was read-only or if its writes did not affect consistency, because, e.g., objects written are not involved in consistency constraints.

Adya distinguishes between consistency at commit-time and at execution-time. Commit-time consistency means that an invalid transaction is not allowed to commit. Execution-time consistency is just as important; it means that an executing transaction is not allowed to see an inconsistent state, even if it will eventually abort.

Execution-time consistency is important for data structures that involve constraints among multiple objects. One BTrees bug we chased down was caused by a failure of execution-time consistency. One transaction add enough keys to a BTree bucket to cause it to split. A bucket split modifies two objects; in addition to the bucket itself, the parent is modified to add a new key for the new bucket. In the buggy case, a second transaction started and saw the modification to the bucket but not the parent. As a result, it added new keys in the wrong bucket; because this transaction did not see the new key in the parent, it would put keys that belong in the new bucket in the old bucket.

Several years ago, Andrew Kuchling asked if ZODB supports ACID:

I don't really see how the ZODB supports Consistency, since there's no mechanism for specifying integrity constraints within ZODB itself, though you can certainly write such checks into your classes. For example, you can't enforce that the .actor attribute is a User object that must be present in a given list.

I think he misunderstands was consistency means for ACID transactions. The BTrees example illustrates how database consistency is necessary to be able to program certain kinds of applications. Features like referential integrity and type-checking can be helpful, but are different than consistency / isolation.

ZODB provides execution time consistency by raising read conflict errors. A simple explanation of read conflict errors requires some extra details about how ZODB works. ZODB transactions execute in parallel. When one connection commits a transaction, it sends an invalidation message to all the other connections; the invalidation messages lists all the objects that were modified by the committed transaction. The next time another connection commits a transaction, it will apply the invalidation; i.e. it will ghostify all the objects in the invalidation message.

If a connection is asked to load an object from the database and it has received (but not applied) an invalidation message for the object, it raises a ReadConflictError. This error is necessary to avoid an inconsistent read. When ZODB loads an object from storage, it always reads the most recent revision. Since the connection has an invalidation message for the object, the object was modified after the transaction began. If it reads the current revision of that object, it could be inconsistent with other objects read before the transaction began.

I believe a read conflict would be allowed by PL-2 but not by PL-2+. The cycle in graph for a read conflict involves a read dependency and an anti-dependency. See example on p. 68. The example involves two objects with the constraints that x + y >= 0. The Hbroken history could raise a ReadConflictError in ZODB, unless the read object was already in the cache.

If ZODB was modified so that ReadConflictError was possible when reading objects from the cache, then it would clearly provide PL-2+. It may also provide enough of a hook to provide full serializability for transactions that ask for it. The example transaction in the PL-3 section (3.2.3) would not work in ZODB; a conflict error would occur. I should try to understand what phenomena ZODB does allow, described in terms of a serializability graph. It may be that PL-3 is also achievable at relatively low cost. One possibility is to keep the read set locally and compare it to the invalidation set when a transaction commits. I think this would work unless an invalidation message was in transit at commit time.