Jeremy Hylton : weblog : 2003-04-16

Packing a storage

Wednesday, April 16, 2003

Rys McCusker asked why every storage implementation needs to provides its own garbage collection (via the pack() method). The garbage collector is hard to implement. So it would be better to implement it once and allow all the storages to share that implementation. I didn't have a good answer when he asked, but I've spent about a week struggling with pack() and have one now.

The garbage collector implements the same policy for each storage: A pack, given a specified pack time, allows the storage to reclaim storage for revisions that are only needed to support undo of transactions before the pack time. That is, if an object revision is neither the current revision at the pack time nor referenced as a result of an undo or version operation after the pack time. An efficient implementation of this policy depends on all sorts of details of the storage implementation.

I have been working on a new pack implementation for FileStorage for the last several days. The current implementation is very complicated and seems impossible to maintain. As a result, I've resisted any substantial refactoring. But we've found several bugs recently and small patches to fix those bugs just increase the complexity. No choice but to start from scratch and try to do the simplest thing.

The pack implementation that Barry came up with for Berkeley storages is fairly simple, but can't be used for FileStorage at all. The chief difference is that Berkeley has data structures that can be updated in place. The storage can add and remove elements from a Berkeley table as needed. FileStorage is an append-only log, so it isn't practical to maintain auxiliary data structures. BerkeleyDB also allows these data structures to be moved between memory and disk transparently. In FileStorage, data structures like the index are always in memory.

Berkeley storage uses reference counting to do most of the work. The data and references for an object revision are stored in a separate table. Every transaction that uses the same data has a reference to the entry in the pickle table. (In FileStorage, there is a backpointer to the first transaction that wrote the pickle.) When a transaction is removed by pack, the reference count is decremented. (The Berkeley pack uses a mark-and-sweep phase to handle unreachable objects.)

The new FileStorage pack implementation starts by determining what data records written before the pack time are reachable. A FileStorage is pack by copying all the reachable records from the original file into a new file. Reachability is harder to determine that you might expect, because the undo and versions features create references to object revisions that otherwise unreachable. Tim helped me develop a straightforward reachability pass. It still needs revision to handle versions and to be space efficient.