[Mailman-Developers] A couple of archiving questions

Andrew Stuart andrew.stuart at supercoders.com.au
Mon Feb 16 12:11:53 CET 2015


OK, looks pretty easy. Seems like I’ll just write an archiver if I want access to archived messages.

Can there be only one active archiver, or multiple?  Say I wanted all messages to go to a zip archiver, as well as to MHonArc or Pipermail, as well as to my custom archive script.





On 16 Feb 2015, at 8:57 pm, Stephen J. Turnbull <stephen at xemacs.org> wrote:

Andrew Stuart writes:

>>> Why would you want to access "messages scheduled for archiving"
>>> via REST?  I have trouble imagining a use case.
> 
> It would make it pretty easy to write an archiver if all I had to
> do is poll via the REST API for new messages waiting to be archived
> whenever I feel like it and put them somewhere.

It's no harder than that to write a Handler in Mailman 2, and IIRC the
additional burden (adding the message-id-hash) in Mailman 3 is
elsewhere in the pipeline.  Here's the whole thing for MM 2:

import time
from cStringIO import StringIO

from Mailman import mm_cfg
from Mailman.Queue.sbcache import get_switchboard



def process(mlist, msg, msgdata):
    # short circuits
    if msgdata.get('isdigest') or not mlist.archive:
        return
    # Common practice seems to favor "X-No-Archive: yes".  No other value for
    # this header seems to make sense, so we'll just test for its presence.
    # I'm keeping "X-Archive: no" for backwards compatibility.
    if msg.has_key('x-no-archive') or msg.get('x-archive', '').lower() == 'no':
        return
    # Send the message to the archiver queue
    archq = get_switchboard(mm_cfg.ARCHQUEUE_DIR)
    # Send the message to the queue
    archq.enqueue(msg, msgdata)

*One* function of *six* lines after deleting comments (and ignoring
imports), all of which you need to do on the archiver side of the REST
interface anyway.  Except the part about getting the archiver queue
object, but you'll need some equivalent *on the Mailman core side* to
ensure that messages hang around until archived.  So this is more
complex than the current push design.  And it also leaves you
vulnerable to DoS'ing yourself if the polling process goes down and
the queue fills your disk -- probably not a *big* issue, but one that
needs a little thought at least to be sure it isn't.  (The self-DoS
problem is a non-problem for the current design, because if you are
expecting to archive locally you probably do have the storage for it.)
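
For concreteness, here is a rough sketch of what the polling side of such a
pull design might look like.  The /archive-queue resource and the
delete-to-acknowledge step are purely hypothetical -- nothing like them
exists in the core's REST API today -- the point is just that the poller
still has to "put them somewhere" and then acknowledge each message so the
core knows it can drop it:

import time
import requests

# Hypothetical resource -- the core's REST API does not expose this today.
QUEUE_URL = 'http://localhost:8001/3.0/archive-queue'
AUTH = ('restadmin', 'restpass')

def store_message(message_id, text):
    # "Put them somewhere" -- this is where the real archiver work lives.
    with open('/var/archive/%s.eml' % message_id, 'w') as fp:
        fp.write(text)

def poll_once():
    resp = requests.get(QUEUE_URL, auth=AUTH)
    resp.raise_for_status()
    for entry in resp.json().get('entries', []):
        store_message(entry['message_id'], entry['text'])
        # Tell the core the message is safely archived so it can delete it;
        # without this acknowledgement the queue just grows until the disk
        # fills (the self-DoS problem above).
        requests.delete('%s/%s' % (QUEUE_URL, entry['message_id']), auth=AUTH)

if __name__ == '__main__':
    while True:
        poll_once()
        time.sleep(60)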

Note that the "archq.enqueue()" in the above is semantically just "put
them somewhere" (I'm quoting you).  That's where the devilish details
are in both Pipermail and in your abstract REST-pull-based archiver,
not in interfacing with the core pipeline.
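
For reference, the "put them somewhere" in the MM 2 handler above is roughly
this: Switchboard.enqueue() serializes the message plus its metadata into a
uniquely named pickle file in the queue directory, where the archive qrunner
picks it up later.  A much-simplified sketch (from memory, not the real
Switchboard code):

import os
import time
import cPickle

def enqueue(queue_dir, msg, msgdata):
    # Simplified: pickle the message and its metadata under a unique,
    # timestamp-based name.  The real code also folds the list name and a
    # hash into the filebase, handles text vs. object pickling, and so on.
    filebase = '%.6f' % time.time()
    tmpfile = os.path.join(queue_dir, filebase + '.tmp')
    fp = open(tmpfile, 'wb')
    try:
        cPickle.dump(str(msg), fp, cPickle.HIGHEST_PROTOCOL)
        cPickle.dump(msgdata, fp, cPickle.HIGHEST_PROTOCOL)
    finally:
        fp.close()
    # Rename only after the write succeeds, so a reader never sees a
    # half-written queue entry.
    os.rename(tmpfile, os.path.join(queue_dir, filebase + '.pck'))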

I suppose you could push the MM 2 archive queue (which I'm pretty sure
currently is in Pipermail) into core in MM 3, and then you could use
REST to pull the messages out, but really, I don't see a big gain
except that you use REST for everything.  But writing an archiver is
not as simple as you seem to think (unless it just piles up the
messages somewhere, and we already have that as example code in MM 3:
mailman3/src/mailman/archiving/prototype.py -- hardly more complex
than the MM 2 code above, and it actually does the work of storing the
messages!)
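
To put a number on "hardly more complex": an MM 3 archiver is a class
implementing the IArchiver interface, which is what prototype.py does (it
just drops each message into a maildir).  A stripped-down sketch along the
same lines -- the method names follow IArchiver, but treat the storage
details, class name, and paths here as illustrative, not a copy of
prototype.py:

from mailbox import Maildir

from zope.interface import implementer

from mailman.interfaces.archiver import IArchiver


@implementer(IArchiver)
class PileOfMessages:
    """Toy archiver: pile the messages up in one maildir per list."""

    name = 'pile'

    @staticmethod
    def list_url(mlist):
        # This toy archiver has no web UI, so there is nothing to link to.
        return None

    @staticmethod
    def permalink(mlist, msg):
        return None

    @staticmethod
    def archive_message(mlist, msg):
        # One maildir per mailing list; create=True makes it on demand.
        mbox = Maildir('/var/tmp/archives/' + mlist.fqdn_listname, create=True)
        mbox.add(msg)

If I remember the configuration right, this also answers the "multiple
archivers" question earlier in the thread: each archiver gets its own
[archiver.<name>] section in mailman.cfg with a class: pointing at the
implementation and enable: yes, and any number of them can be enabled at
once -- the core hands every message to every enabled archiver.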

Regards,
Steve


