[Mailman-Developers] Retrieving individual messages from raw Mailman mboxes via http

Andy Sy andy at nospam.com
Sun Oct 5 05:10:17 EDT 2003


I am thinking of adapting a browser-based message thread
interface I made (see http://www.neotitans.com/page.gif) to
work with GNU Mailman lists, among other things.  The interface
is collapsible and outlinable, needs no page refresh, and loads
message bodies on demand.

Ideally, I would like it to function as a 'www-interface
gateway' that works with all existing Mailman raw archives.
From what I've been researching, one would need some kind of
index into the raw mbox file, either a mail summary file format
or a database which would contain file seek pointers into
the raw mbox.
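To make the idea of seek pointers concrete, here is a minimal sketch
in Python of such an index. It is only an illustration, not the .msf
format or anything Mailman produces: the function name
build_mbox_index is hypothetical, and it assumes a message starts at
any line beginning with "From " that is either the first line of the
file or follows a blank line.

```python
import io


def build_mbox_index(stream, start=0):
    """Return a list of (offset, length) pairs, one per message.

    Assumption: a message begins at a line starting with b"From "
    that is the first line scanned or is preceded by a blank line.
    `start` must itself be a message boundary (or 0).
    """
    stream.seek(start)
    starts = []
    pos = start
    prev_blank = True  # the scan start counts as a boundary
    for line in stream:
        if prev_blank and line.startswith(b"From "):
            starts.append(pos)
        prev_blank = line in (b"\n", b"\r\n")
        pos += len(line)  # track byte offsets by hand
    ends = starts[1:] + [pos]
    return [(s, e - s) for s, e in zip(starts, ends)]


# Small in-memory mbox standing in for a real archive file.
sample = (
    b"From a@x Sun Oct  5 05:10:17 2003\n"
    b"Subject: one\n\nbody one\n\n"
    b"From b@y Sun Oct  5 06:00:00 2003\n"
    b"Subject: two\n\nbody two\n"
)
index = build_mbox_index(io.BytesIO(sample))
```

Each (offset, length) pair is exactly what a later byte-range
retrieval would need; a real index would also store subject, date
and threading headers alongside it.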

It would then use a ranged HTTP request to retrieve only
the particular message body it needs to display (would this
work?). Several issues arise which I'd be glad to have input
on from the experts on this list:
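As a rough sketch of the ranged request, assuming the web server
that hosts the raw mbox honours the HTTP Range header (servers that
do answer with status 206), something like the following could fetch
one message. The function names and the example URL are made up for
illustration; offset and length would come from the index.

```python
import urllib.request


def build_range_request(url, offset, length):
    """Build a GET request for bytes offset .. offset+length-1."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length > 0")
    last = offset + length - 1  # Range end positions are inclusive
    req = urllib.request.Request(url)
    req.add_header("Range", "bytes=%d-%d" % (offset, last))
    return req


def fetch_message(url, offset, length):
    """Fetch one message; falls back if the server ignores Range."""
    req = build_range_request(url, offset, length)
    with urllib.request.urlopen(req) as resp:
        data = resp.read()
        if resp.status == 206:  # Partial Content: only our bytes
            return data
        # Status 200: the whole file was sent, so slice it locally.
        # This works but defeats the purpose for a large mbox.
        return data[offset:offset + length]


# Hypothetical archive URL, used only to show the header produced.
req = build_range_request("http://lists.example.org/mylist.mbox", 100, 50)
```

If the server answers 200 instead of 206, the whole archive is
transferred on every request, so the gateway would need to detect
that case and refuse to use such a host.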


I. Which mbox index / mail summary file format to use?

The Mozilla .msf format looks like a strong candidate.
Does anyone have other suggestions?  Does Mailman maintain
such a mail summary file and is it publicly accessible
by default?


II. index / mail summary file performance and maintenance

Mozilla .msf files can be regenerated on the fly, but for a
100MB mailbox this already takes a few minutes (Python-list's
mailbox is 600MB+!).  Assuming index file corruption is very
rare, this should not be a real problem.


III. index / mail summary file hosting issues

If an index/mail summary file is not available by default, and
such a www-interface gateway were to work with no additional work
on the list manager's part, then the index/mail summary file would
have to be generated by the machine hosting the gateway instead.

- What, then, would be the mechanics of the (constant) remote reindexing
that would need to be done as new messages come in?  Would it be possible
to poll the size of the raw mbox and, if it has changed, reindex only
the data after the last retrieved file position?

- How often do list admins compact/expunge their raw archive mboxes?

Every time they do, afaik, the index / mail summary file would have
to be regenerated.

- Is it possible, then, for the www-interface gateway to automatically sense
if the remotely hosted raw archive mbox has been expunged/compacted?

- Also, how would the www-interface gateway machine know when its index
/ mail summary file has been corrupted?
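One possible heuristic for the questions above, sketched in Python
under stated assumptions: the index is a list of (offset, length)
pairs, the gateway remembers the mbox size from its last run, and it
can read a few bytes at any offset (remotely this would be a small
ranged request). The names plan_update and index_is_consistent are
hypothetical. A shrunken file, or a remembered offset that no longer
starts with "From ", is taken as a sign of compaction or corruption.

```python
import io


def plan_update(known_size, current_size, last_start_bytes):
    """Decide how to bring the local index up to date.

    known_size       -- mbox size when it was last indexed
    current_size     -- size reported now (e.g. from polling)
    last_start_bytes -- bytes now found at the offset of the last
                        indexed message
    """
    if current_size < known_size:
        return "rebuild"  # file shrank: expunged or compacted
    if not last_start_bytes.startswith(b"From "):
        return "rebuild"  # old offset no longer on a boundary
    if current_size == known_size:
        return "unchanged"
    return "append"  # index only the bytes after known_size


def index_is_consistent(stream, index):
    """Check that every indexed offset still begins a message."""
    for offset, _length in index:
        stream.seek(offset)
        if stream.read(5) != b"From ":
            return False
    return True


# Tiny in-memory example standing in for the remote mbox.
data = (
    b"From a@x Sun Oct  5 05:10:17 2003\n\nhi\n\n"
    b"From b@y Sun Oct  5 06:00:00 2003\n\nyo\n"
)
second = data.index(b"From b@y")
index = [(0, second), (second, len(data) - second)]
print(index_is_consistent(io.BytesIO(data), index))
print(plan_update(len(data), len(data) + 120, data[second:second + 5]))
```

This cannot catch every case (a rewritten file could happen to keep
"From " at the same offsets), so spot-checking several offsets, or
comparing a checksum of the already-indexed prefix, would make it
more robust.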


A second possible approach would be for the www-interface gateway machine
to maintain its own copy of the raw archive and constantly rsync it with
the one maintained by the list admin.  This will probably only be feasible
if Mailman list admins provide rsync access to the raw mailbox archive.

- Would adding rsync serving of the raw mailbox to Mailman be a good idea?
(If it were built into Mailman, it would be more likely to be enabled
by default.)
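For the rsync approach, the gateway side could be as small as the
following sketch. It assumes the list host runs an rsync daemon that
exports the raw mbox; the rsync:// URL and the local file name are
invented for illustration, and the helper names are hypothetical.

```python
import subprocess


def rsync_command(remote, local_path):
    """Build the rsync command line used to mirror the raw mbox."""
    # -a preserves file attributes, -z compresses in transit,
    # --partial keeps partially transferred files after an
    # interrupted run.
    return ["rsync", "-az", "--partial", remote, local_path]


def mirror(remote, local_path):
    """Run rsync once and return its exit status (0 on success)."""
    return subprocess.run(rsync_command(remote, local_path)).returncode


# Hypothetical source and destination, shown without running rsync.
cmd = rsync_command("rsync://lists.example.org/archives/mylist.mbox",
                    "mylist.mbox")
```

Running mirror() from cron every few minutes would keep the local
copy current, and the index could then be rebuilt or extended
against the local file rather than over HTTP.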



-- 
=========================================
reply-to: a n d y @ n e t f x p h . c o m
http://www.neotitans.com
Web and Software Development






