[Mailman-Developers] about qrunner and locking

Barry A. Warsaw barry@digicool.com
Fri, 8 Dec 2000 14:13:34 -0500


>>>>> "CVR" == Chuq Von Rospach <chuqui@plaidworks.com> writes:

    >> Background for those who don't know: zodb is the Zope Object
    >> Database, ZEO is Zope Enterprise Objects.

    CVR> My only worry about this is adding enough complexity and
    CVR> overhead that mailman loses its attractiveness to the small
    CVR> site.

If I was proposing to swallow all of Zope, I think that'd be a valid
criticism.  I think ZODB is self-contained, transparent, and small
enough to outweigh any complexity.  In fact it may be a complexity win
because of the headaches involved in the current architecture.
Certainly it'll be miles better than trying to cook our own, which I
fear would end up looking a lot like ZODB, feature-wise.

    >> Pythonlabs.  We've talked about all this stuff before, but the
    >> question now is: is it better to jump in sooner rather than
    >> later?

    CVR> Probably sooner, if that's the direction we want to go -- but
    CVR> that simply defines 2.1 as "bug fixes and really easy stuff",
    CVR> and puts us in 3.0 development sooner, rather than later.  So
    CVR> to a good degree it means 2.1 or 3.0 determinations are made
    CVR> based on "easy" rather than "high priority" to minimize
    CVR> re-doing stuff when it's rearchitected.

Yep, although you have to add i18n into the mix for 2.1, which is
probably enough if we want to release it early in 2001.

    >> We'd have to handle collisions for multiple qrunner processes,
    >> potentially on separate machines.  One way that doesn't involve
    >> locking shenanigans is to divide the hash space up and assign a
    >> segment to each out-qrunner process.

    CVR> here's another way that should work: each record has a
    CVR> locking field in it. When qrunner wants to execute that item,
    CVR> it reads the field.  If the field is NULL, it writes its ID
    CVR> (whatever it is, guaranteed unique) into that locking
    CVR> field. It then waits a beat, and reads it back.  If it reads
    CVR> back its own ID, it knows it owns the record and can execute
    CVR> it. If it reads back someone else's ID, it lost the lock, but
    CVR> someone else owns the record so it can skip it and move on.

    CVR> you can simulate atomic locks with a little thought and
    CVR> cooperative processes, by everyone writing to the store and
    CVR> then seeing who won.  A LOT easier from and administrative
    CVR> view than partitioning hashes and the like, IMHO.
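Chuq's write-and-read-back scheme could be sketched roughly like this.  This is purely illustrative: `try_claim`, `store`, and the `owner` field are made-up names, and `store` here is just an in-process dict standing in for whatever shared record store we'd actually use -- the whole point of the scheme is that the writes are visible to the competing qrunners, which a plain dict obviously doesn't demonstrate:

```python
import time

def try_claim(store, record_id, my_id, beat=0.1):
    """Attempt to claim a queue record; return True if we won it.

    Each record carries a locking field ('owner').  A qrunner writes
    its unique ID into the field, waits a beat for competing writes
    to land, then reads it back to see whether it won.
    """
    rec = store[record_id]
    if rec.get('owner') is not None:
        return False                # someone already owns it -- skip it
    rec['owner'] = my_id            # write our ID into the locking field
    time.sleep(beat)                # wait a beat
    # Read it back: if it's still our ID, we own the record.
    return store[record_id].get('owner') == my_id
```

The loser simply moves on to the next record, so no process ever blocks waiting on a lock.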

Hmm, I do worry about using writes to coordinate the various
processes.  I think we're heading toward the NFS atomicity problem
again.  Administratively, partitioning the hash space shouldn't be too
hard -- we can simply have a variable that says how many concurrent
qrunners to start and divide the hash space up evenly.
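The even split really is just a modulo test -- a hypothetical sketch (`owns` and its parameters are names I'm inventing here):

```python
def owns(msg_hash, runner_index, num_runners):
    """True if the qrunner at runner_index handles this message.

    With num_runners concurrent qrunners, each message hash lands in
    exactly one runner's segment, with no locking or coordination.
    """
    return msg_hash % num_runners == runner_index
```

Every hash value maps to exactly one runner, so two qrunners can never collide on the same message.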

If we want to weight the hash partitions, then I think it would be
simple to have a list of partition weights.  The length of the list
would be the number of concurrent qrunner processes, and then it's a
simple matter of taking the ratio for each individual weight.
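Something like this, say (again just a sketch; `partition_bounds` and the choice of a 32-bit hash space are assumptions):

```python
def partition_bounds(weights, space=2**32):
    """Divide [0, space) into contiguous segments proportional to weights.

    The length of the weights list is the number of concurrent
    qrunners; each runner i handles hashes in bounds[i] = (start, end).
    """
    total = sum(weights)
    bounds, start, acc = [], 0, 0
    for w in weights:
        acc += w
        end = space * acc // total  # cumulative ratio of the hash space
        bounds.append((start, end))
        start = end
    return bounds
```

So a weights list of [1, 3] gives the second qrunner three quarters of the hash space, and equal weights degenerate to the even split above.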

What's difficult is getting the right qrunners to start on separate
machines, and splitting the hash space up across machines, but that'd
be difficult anyway.

-Barry