Jeremy Hylton : weblog : 2003-10-21

Chandler and ZODB

Tuesday, October 21, 2003, 1:52 p.m.

Early versions of Chandler used ZODB, but the Chandler team later replaced it with a repository system they developed themselves. Andi Vajda, the repository developer, posted a short explanation for that decision today:

Today, the Chandler repository is not really so much an object database as an item XML database combined with large collections of references directly stored in Berkeley DB....

The trade-off for these decisions and design choices is a somewhat steeper learning curve for programmers expecting a real object database like ZODB. My hope is that this trade-off is well worth the gains.

John Anderson, Chandler's architect, offered the following comment about ZODB:

In my discussions with various programmers concerning ZODB, it seems like they are reluctant to spend much time learning it and would rather go invent something new.

A key mismatch between ZODB's goals and the Chandler developers' goals was language independence. One of their developers explained to me that they might want to run client code on the repository, and that Python was too slow to run on a server. It's an odd comment, given that ZODB's primary purpose is to provide persistence for an application server.

I haven't seen a document that describes the current Chandler repository or its programming model. It would be interesting to see how Chandler-specific requirements affected the protocol. I recall that an early design for the repository used an XML-based protocol that was like IMAP but with many more bells and whistles. It wasn't completely obvious how to integrate this protocol with ZODB. At the same time, it wasn't clear how many of these requirements were just YAGNI.

The key integration problem was granularity. ZODB works at the level of individual objects. When you request an object, you get the persistent object and any non-persistent objects it contains. (They might be better called second-class persistent objects; they do persist, but they can't be shared or referred to by name.) If you want to be able to load part of something, you need to organize that something as a collection of individual persistent objects. In Zope, large binaries are stored as a collection of smaller persistent objects, so that you can load part of the data without loading all of it. In that particular case, Zope accepts extra complexity in exchange for finer-grained control over resource usage. In general, it seems good to have a single naming and sharing mechanism.
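Here is a toy sketch of that granularity trade-off. It is not ZODB or Zope code: ToyStore, save_monolithic, save_chunked, and read_range are hypothetical names invented for illustration, and a plain dict stands in for the pickled record of each persistent object. The only point is that a store which loads whole records can serve a partial read cheaply only if the data was split into separately addressable records up front.

```python
class ToyStore:
    """Hypothetical object store: one record per first-class object."""

    def __init__(self):
        self._records = {}     # oid -> record (stand-in for a pickle)
        self.bytes_loaded = 0  # payload bytes returned by load() so far

    def save(self, record):
        oid = len(self._records)
        self._records[oid] = record
        return oid

    def load(self, oid):
        # A record is always loaded whole, including everything inside it.
        record = self._records[oid]
        self.bytes_loaded += len(record.get("data", b""))
        return record


def save_monolithic(store, data):
    """Store the bytes inside a single record (second-class data)."""
    return store.save({"data": data})


def save_chunked(store, data, chunk_size):
    """Store each chunk as its own record, plus a head record naming them."""
    chunk_oids = [
        store.save({"data": data[i:i + chunk_size]})
        for i in range(0, len(data), chunk_size)
    ]
    return store.save({"chunk_size": chunk_size, "chunks": chunk_oids})


def read_range(store, oid, start, stop):
    """Return data[start:stop], loading as few records as the layout allows."""
    if stop <= start:
        return b""
    head = store.load(oid)
    if "data" in head:
        # Monolithic layout: the whole payload came along with the object.
        return head["data"][start:stop]
    size = head["chunk_size"]
    first, last = start // size, (stop - 1) // size
    parts = [store.load(c)["data"] for c in head["chunks"][first:last + 1]]
    offset = first * size
    return b"".join(parts)[start - offset:stop - offset]


if __name__ == "__main__":
    payload = bytes(range(100))
    for saver in (save_monolithic, lambda s, d: save_chunked(s, d, 10)):
        store = ToyStore()
        oid = saver(store, payload)
        read_range(store, oid, 5, 15)
        print(store.bytes_loaded, "bytes loaded for a 10-byte read")
```

The chunked layout costs an extra head record and more bookkeeping, which is the same kind of complexity Zope takes on for large binaries.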

Another interesting requirement for Chandler was to be able to load objects in bulk from the server. It would be nice to have some kind of prefetching that would expose this to ZODB. I wrote about prefetching in April.
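To make the bulk-loading idea concrete, here is a minimal sketch of what a prefetch hook could look like. Everything in it is an assumption made for illustration: ToyServer, load_many, and PrefetchingClient are hypothetical names, not part of ZODB or the Chandler repository. It only shows that a batched request can fill a client cache in one round trip instead of one per object.

```python
class ToyServer:
    """Hypothetical remote store that counts client round trips."""

    def __init__(self, records):
        self._records = dict(records)
        self.round_trips = 0

    def load(self, oid):
        # One object per request.
        self.round_trips += 1
        return self._records[oid]

    def load_many(self, oids):
        # Many objects in a single request.
        self.round_trips += 1
        return {oid: self._records[oid] for oid in oids}


class PrefetchingClient:
    """Hypothetical client cache with an explicit prefetch hint."""

    def __init__(self, server):
        self._server = server
        self._cache = {}

    def prefetch(self, oids):
        # Ask only for objects that are not already cached.
        missing = [oid for oid in oids if oid not in self._cache]
        if missing:
            self._cache.update(self._server.load_many(missing))

    def get(self, oid):
        # Fall back to a single-object load on a cache miss.
        if oid not in self._cache:
            self._cache[oid] = self._server.load(oid)
        return self._cache[oid]


if __name__ == "__main__":
    server = ToyServer({i: f"item-{i}" for i in range(50)})
    client = PrefetchingClient(server)
    client.prefetch(range(50))
    items = [client.get(i) for i in range(50)]
    print(len(items), "items in", server.round_trips, "round trip(s)")
```

The hard part in practice is knowing which objects to ask for; the sketch leaves that decision to the caller.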