Jeremy Hylton : weblog : 2003-09-25

ZODB: Transaction ids, timestamps, serial numbers

Thursday, September 25, 2003

This entry has detailed notes on transaction ids in ZODB, probably a boring topic for most readers. Each storage associates a transaction id with a transaction. Operations like undo and history, which need to refer to specific transactions, use this transaction id. The only constraint on the ids is that they be monotonically increasing.

Each object revision has a serial number. The serial number uniquely identifies a revision of the object. It is possible for different transactions to write data records with the same serial number. For example, an abort version operation will write a new data record with the same serial number as the last non-version data record. (The abort version case may be the only case where serial numbers are re-used.)

In most implementations, the transaction id is used as the serial number of each object revision. So if transaction id 12 commits changes to four objects, each object will get the serial number 12. I don't think there is any code that relies on this feature, though. I think it is just a convenient implementation technique.
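As a toy illustration of that technique (not actual ZODB code; the ToyStorage class and its commit() method are made up for this note), a storage can simply stamp every object it writes with the id of the committing transaction:

```python
class ToyStorage:
    """Hypothetical in-memory storage; this is not the ZODB storage API."""

    def __init__(self):
        # Maps object id -> serial number of the object's current revision.
        self.serials = {}

    def commit(self, tid, oids):
        # Reuse the transaction id as the serial number of every
        # object revision written by this transaction.
        for oid in oids:
            self.serials[oid] = tid


storage = ToyStorage()
storage.commit(12, ["a", "b", "c", "d"])  # four objects all get serial 12
storage.commit(13, ["b"])                 # only "b" moves on to serial 13
```

Nothing here depends on the serial being equal to the transaction id; any value that uniquely identifies the revision would do.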

The transaction ids are implemented using ZODB TimeStamp objects, although I'm not sure if that is part of the contract or just a detail of the implementation that has crept into widespread use. When the dump utilities for a storage print out the transaction headers, they translate the transaction id using time.ctime(). For debugging and analyzing failures, it is convenient to read the ids as timestamps.

When ZEO communicates invalidations from server to client, it sends a set of invalidations along with the transaction id that generated them. Currently, the ZEO client stores the transaction id in its cache. When it needs to validate the cache, it requests all the changes since the last transaction id it received invalidations for. (If there are too many, it falls back to validating the serial numbers of every object.)
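The validation scheme can be sketched roughly as follows. This is a simplified model, not the ZEO protocol: ToyServer, changes_since(), validate_cache() and the max_entries limit are all hypothetical names, and the current_serials dict stands in for asking the server about each object.

```python
class ToyServer:
    """Hypothetical stand-in for a server that logs invalidations."""

    def __init__(self, max_entries=2):
        self.log = []  # (tid, set of oids), in increasing tid order
        self.max_entries = max_entries

    def commit(self, tid, oids):
        self.log.append((tid, set(oids)))

    def changes_since(self, tid):
        """Return (latest_tid, changed_oids), or None if there are too many."""
        newer = [(t, oids) for t, oids in self.log if t > tid]
        if len(newer) > self.max_entries:
            return None
        if not newer:
            return tid, set()
        changed = set().union(*(oids for _, oids in newer))
        return newer[-1][0], changed


def validate_cache(cache, last_tid, server, current_serials):
    """Drop stale entries from cache (oid -> serial); return the new last tid."""
    result = server.changes_since(last_tid)
    if result is None:
        # Too many changes: fall back to checking every cached serial.
        for oid in list(cache):
            if cache[oid] != current_serials.get(oid):
                del cache[oid]
        return server.log[-1][0]
    latest_tid, changed = result
    for oid in changed:
        cache.pop(oid, None)
    return latest_tid


server = ToyServer(max_entries=2)
server.commit(1, ["a", "b"])
server.commit(2, ["b"])

# Quick path: only one transaction is newer than tid 1.
cache = {"a": 1, "b": 1}
last_tid = validate_cache(cache, 1, server, {"a": 1, "b": 2})

# Slow path: three more transactions exceed max_entries.
for tid in (3, 4, 5):
    server.commit(tid, ["a"])
cache2 = {"a": 1, "c": 0}
last_tid2 = validate_cache(cache2, 2, server, {"a": 5, "b": 2, "c": 0})
```

The quick path only works because the ids are ordered, which is the one property the storages actually promise.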

All of the standard storages use timestamps for transaction ids, relying on the laterThan() method of TimeStamp to guarantee that timestamps are always increasing. The repr of a TimeStamp object is an 8-byte string that is used as the id. (It's very weird to use repr() this way; in ZODB4, we used a method on the TimeStamp instead.) BaseStorage and ClientStorage have basically the same code. The tpc_begin() method takes a transaction id as an optional second argument, although the only place this is used is copyTransactionsFrom(). (Incidentally, the laterThan() call is still used for a supplied id, so there's no guarantee that the specified tid will actually be used.)
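Here is a rough sketch of the "always later than the last one" idea. It does not use the real TimeStamp class or its byte layout; as a simplifying assumption it packs microseconds since the epoch into an 8-byte big-endian string, and choose_tid() is a made-up helper name.

```python
import struct
import time


def choose_tid(last_tid, requested=None, now=None):
    """Return an 8-byte id that is strictly greater than last_tid."""
    if requested is None:
        if now is None:
            now = time.time()
        # Assumed encoding for this sketch only: microseconds since the epoch.
        requested = struct.pack(">Q", int(now * 1_000_000))
    if requested <= last_tid:
        # The clock did not advance (or a stale id was supplied), so use
        # the smallest id that is still later than the last one.
        requested = struct.pack(">Q", struct.unpack(">Q", last_tid)[0] + 1)
    return requested


first = choose_tid(b"\x00" * 8, now=100.0)
second = choose_tid(first, now=100.0)        # same clock reading, id is bumped
third = choose_tid(second, requested=first)  # supplied id is too old, not used
```

Fixed-width big-endian strings compare in the same order as the integers they encode, so a plain string comparison is enough to check ordering. The third call shows the caveat about supplied ids: the caller asked for one id and got a different one.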

If a single transaction spans multiple storages, each storage could pick a different id. The code uses the current time at seconds granularity as the default, so there's a decent chance that they will end up being the same. But it's also possible for two storages to generate timestamps before and after a clock tick, respectively.

I discussed some of these issues with Jim, and I discovered that he did not have a clear sense of the requirements for transaction ids. There's no written specification that I know of, and most of the implementations use the same techniques. We agreed, in the end, that we would declare that monotonically increasing ids were all that was required.