[Mailman-Developers] ZODB and ZEO for mailman

Ken Manheimer klm@digicool.com
Tue, 20 Jun 2000 17:55:13 -0400 (EDT)


When i left CNRI for Digital Creations, i thought i might continue
with mailman development, but that didn't wind up happening.  However,
i'm still pretty darn enthusiastic about the benefits of using Zope
stuff for mailman, and seeing Andrew Kuchling's recent ZODB/ZEO
introduction:

  http://starship.python.net/crew/amk/python/writing/zodb-zeo.html

brings home the point pretty well - i think many of mailman's burning
issues would best be solved using ZODB and ZEO.

ZODB is the Zope Object DataBase, and ZEO is the Zope Enterprise
Option, which lets multiple processes access a single database
concurrently.

Where would this help?  Lots and lots of places.  Mailman wants a
persistent object store, badly.  Anytime mailman does something with a
maillist, an instance of the Maillist class is instantiated, which
then fetches its instance information from a marshalled dictionary in
the filesystem.  Unless the activity is sure to *not* involve any
state changes, the maillist in question is locked - since multiple
instances of the maillist are separate copies!
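
A rough sketch of the pattern described above, for illustration only.
ToyMailList, its method names and the lock-file scheme are all made up
here and are not mailman's actual code; only the marshal module is real.
It shows why two instances of the same list are separate copies, and why
a lock is needed around any change.

```python
import marshal
import os
import tempfile


class ToyMailList:
    """Hypothetical stand-in for a list object; not mailman's real class."""

    def __init__(self, directory, name):
        self.path = os.path.join(directory, name + ".db")
        self.lockpath = self.path + ".lock"
        self.state = {}  # each instance holds its own private copy

    def lock(self):
        # O_EXCL makes creation fail if the lock file already exists.
        fd = os.open(self.lockpath, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)

    def unlock(self):
        os.remove(self.lockpath)

    def load(self):
        # Re-read the marshalled dictionary from the filesystem.
        if os.path.exists(self.path):
            with open(self.path, "rb") as f:
                self.state = marshal.load(f)

    def save(self):
        with open(self.path, "wb") as f:
            marshal.dump(self.state, f)


def demo():
    directory = tempfile.mkdtemp()
    a = ToyMailList(directory, "test-list")
    b = ToyMailList(directory, "test-list")

    a.lock()
    try:
        # A second instance cannot take the lock while the first holds it.
        try:
            b.lock()
            b_got_lock = True
        except FileExistsError:
            b_got_lock = False
        a.load()
        a.state["members"] = ["anne@example.com"]
        a.save()
    finally:
        a.unlock()

    # b is a separate copy: it sees nothing until it re-reads the file.
    before = dict(b.state)
    b.load()
    return b_got_lock, before, b.state
```

Every interface that might change state has to go through this
lock/load/save dance, which is where the locking pain comes from.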

Lots of stuff is jammed into the current, feeble persistence
mechanism, beyond the basic maillist state - stuff like subscriber
information, messages caught for administrative approval (though that
may be changing), maybe other stuff.  This means that not only are
multiple concurrent instances of a maillist distinct, disassociated
copies, but also that there's no way to share these other components
between them, without going to other storage.  (Other storage, e.g. the
filesystem, for things like the message delivery queues.)

Imagine, instead, that maillist instances and the other components
were implemented as persistent objects in the ZODB, which is reached
from the web, email, news, and command line interfaces with
lightweight scripts that plug into ZEO.  ZODB, being transactional,
would handle conflict resolution - no more locking performance or
misbehavior hassles - and the Connection class cache would take care
of rolling stuff in and out based on activation, for good
responsiveness and memory performance.
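
To make the "transactional instead of locked" idea concrete, here is a
toy model in plain Python.  It is NOT the ZODB API: ToyStore,
ToyTransaction and the serial-number scheme are invented for this
sketch, and the real thing works per persistent object and is a good
deal smarter.  The point is only that writers never block each other; a
clash is detected at commit time, and the loser retries.

```python
class ConflictError(Exception):
    """Raised when a commit is based on stale data (toy version)."""


class ToyStore:
    def __init__(self):
        self._data = {}  # key -> (serial, value)

    def begin(self):
        return ToyTransaction(self)


class ToyTransaction:
    def __init__(self, store):
        self.store = store
        self.read_serials = {}  # serial seen when each key was first read
        self.writes = {}        # pending changes, invisible until commit

    def get(self, key, default=None):
        if key in self.writes:
            return self.writes[key]
        serial, value = self.store._data.get(key, (0, default))
        self.read_serials.setdefault(key, serial)
        return value

    def set(self, key, value):
        self.get(key)  # record the serial this write is based on
        self.writes[key] = value

    def commit(self):
        # Refuse to commit if anyone else committed to our keys meanwhile.
        for key in self.writes:
            current_serial = self.store._data.get(key, (0, None))[0]
            if current_serial != self.read_serials[key]:
                raise ConflictError(key)
        for key, value in self.writes.items():
            self.store._data[key] = (self.read_serials[key] + 1, value)


def demo():
    store = ToyStore()
    setup = store.begin()
    setup.set("members", ("anne@example.com",))
    setup.commit()

    # Two interfaces (say, web and email) start from the same state.
    web = store.begin()
    mail = store.begin()
    web.set("members", web.get("members") + ("bob@example.com",))
    mail.set("members", mail.get("members") + ("cat@example.com",))

    web.commit()
    try:
        mail.commit()
        conflicted = False
    except ConflictError:
        conflicted = True
        # Retry against fresh state instead of having held a lock.
        mail = store.begin()
        mail.set("members", mail.get("members") + ("cat@example.com",))
        mail.commit()

    return conflicted, store.begin().get("members")
```

Neither writer waited on the other, and neither update was lost.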

Messages going through the system could be implemented as persistent
objects in their own right.  This is useful for mediated transmission
through the message pipeline, and also for retention for archival
purposes.  (Maybe the message store would be a separately mounted
ZODB, tuned for use as a smart archive, with provisions for good
cataloguing, message annotations, etc.)  Multiple threads in the
delivery pipeline could be processing the messages at once - for
parallel news gatewaying and mail transmission.  Messages held for
approval would already be in a kind of archive - currently, they're
held as message objects in the respective maillists, and are
marshalled and unmarshalled with the maillist state.  Bogus!
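
A small sketch of the parallel-pipeline idea, assuming messages are
ordinary shared objects.  ToyMessage, the stage names and the in-memory
lock are hypothetical; the lock merely stands in for whatever the object
database would do to keep concurrent updates safe.

```python
import threading


class ToyMessage:
    """Hypothetical message object shared by all pipeline stages."""

    def __init__(self, msgid, body):
        self.msgid = msgid
        self.body = body
        self.done = set()  # names of stages that have handled this message
        self._lock = threading.Lock()

    def mark(self, stage):
        with self._lock:
            self.done.add(stage)


def run_stage(stage, messages):
    # A real stage would hand the message to an MTA or a news server.
    for msg in messages:
        msg.mark(stage)


def demo():
    messages = [ToyMessage("<%d@example.com>" % i, "hello") for i in range(3)]
    threads = [
        threading.Thread(target=run_stage, args=(stage, messages))
        for stage in ("mail", "news")
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [sorted(msg.done) for msg in messages]
```

Both stages work on the same message objects at once, rather than each
list carrying its own marshalled copies around.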

How are subscribers currently represented?  Urgh.  Last i knew, they
were manifest in scattered places in the maillist structure - entries
in the members or digest_members attributes, for membership info, with
parallel entries in the passwords (or some name like that) dictionary,
and probably elsewhere.  Unless this has since changed, they weren't
instance objects, but scattered data, and most importantly, they're
specific to each list.  Bogus!
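
For contrast, a sketch of what a shared member object might look like.
The Member and ToyList classes and their attributes are assumptions made
up for this example, not mailman's data model; the scattered layout in
the leading comment is only an approximation of the one described above.

```python
# Roughly the scattered, per-list layout (approximate, from memory):
#   mlist.members        = {"anne@example.com": 1}
#   mlist.digest_members = {}
#   mlist.passwords      = {"anne@example.com": "secret"}
# ...repeated independently inside every list the person belongs to.


class Member:
    """Hypothetical member object, shared by every list it joins."""

    def __init__(self, address, password):
        self.address = address
        self.password = password
        self.digest_lists = set()  # names of lists received as digests


class ToyList:
    """Hypothetical list that refers to members instead of copying them."""

    def __init__(self, name):
        self.name = name
        self.members = {}  # address -> Member (a reference, not a copy)

    def subscribe(self, member, digest=False):
        self.members[member.address] = member
        if digest:
            member.digest_lists.add(self.name)


def demo():
    anne = Member("anne@example.com", "secret")
    dev = ToyList("mailman-developers")
    users = ToyList("mailman-users")
    dev.subscribe(anne)
    users.subscribe(anne, digest=True)

    # One change is visible from every list, since they share the object.
    anne.password = "new-secret"
    return (
        dev.members["anne@example.com"].password,
        users.members["anne@example.com"].password,
        sorted(anne.digest_lists),
    )
```

With a persistent object store underneath, that sharing would carry
across processes too, not just within one interpreter.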

This last may be the most telling failing of the current
maillist-marshal based persistence mechanism - it's neither unified nor
transparent, so there's no easy way to share state between lists.
Judicious use of ZODB plus ZEO could solve all that, making it easy to
represent the components of the system - mailing lists, members,
messages, etc - as real objects, with transparent, transactional
persistence and concurrent access built in.

My cries of "bogus", above, should not be taken as condemnation.  I
still think mailman is great - i'm pretty pleased to see the way it
has taken off.  I do think that the current architecture has severe
limitations; in particular, it will continue to involve great pain
with filesystem locking, obstruction to unified membership, etc.  And
i think that a concurrent persistent object system is the right way to
deal with this - and ZODB plus ZEO are just about ideal prospects for
providing that...

Take a look at Andrew's description -

  http://starship.python.net/crew/amk/python/writing/zodb-zeo.html

Ken Manheimer
klm@digicool.com