[Mailman-Developers] Requirements for a new archiver

J C Lawrence claw at kanga.nu
Thu Oct 30 10:06:04 EST 2003


On Thu, 30 Oct 2003 07:04:19 +0100 
Brad Knowles <brad.knowles at skynet.be> wrote:
> At 12:40 AM -0500 2003/10/30, J C Lawrence wrote:

>> I've already said my bits there and proposed what I see as the cheap,
>> easy, incremental improvement course: Twisted's NNTP supports for
>> storage, Message IDs for keys, a variant best-effort detection and
>> rewriting policy for collisions, and a MeoWWW derivative for HTML
>> presentation/posting.

> I don't know anything about Twisted or MeoWWW, so I can't say how they
> address the subjects above.

Twisted is a Pythonic library that implements most of the basic network
protocols.  Among other things it has RFC-conformant NNTP server and
client implementations.  Creating an NNTP server with a backing message
store is, literally, three lines in Python.  Of course it doesn't
support all the nifties that real netnews servers do, a la expiry,
administrative controls, feeds, etc.  It's not intended for that market,
and Mailman doesn't need that support.  If deployment sites need that,
they're going to be using inn2|[BCD]News|Diablo anyway.

MeoWWW is a (very inefficient but fixable) pythonic CGI which supports
reading and posting to netnews via NNTP.  It has various nice UI points,
a decent feature set (more than we have now), and does The Right Thing
in almost every aspect I've checked except for performance in the spool
reads.

> I can say that I'm not sure about an NNTP-based storage solution...

We should really start out by splitting that discussion.  NNTP is an
access protocol.  Netnews servers have various storage formats and
techniques.  Currently NNTP and IMAP are the only standardised
wide-deployment protocols for message spool access.  I'm not interested
in IMAP for the reasons previously discussed.  NNTP isn't great, but it
is already supported by Mailman for the new gating features and adds a
clean abstraction model which allows trivial replacement of Mailman's
implementation by inn2|[BCD]News|Diablo|whatever should the deployment
site wish.  Additionally, again as a standards-based protocol, it
allows clean abstraction for archive presentation: anything that talks
NNTP can now be an effective Mailman archive presenter.  Ditto for
archive indexing.
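
A rough sketch of the abstraction described above, in Python.  Every
name here (MessageStore, InMemoryStore, subjects) is hypothetical and is
not a Mailman or Twisted API; the in-memory class merely stands in for
whatever sits behind the NNTP interface, be it Mailman's own store or a
real news server.

```python
from abc import ABC, abstractmethod
from email import message_from_string
from email.message import Message


class MessageStore(ABC):
    """Hypothetical interface: what a presenter or indexer needs from a spool."""

    @abstractmethod
    def post(self, group: str, raw: str) -> str:
        """Store a raw message in a group and return its Message-ID."""

    @abstractmethod
    def article(self, message_id: str) -> Message:
        """Fetch one message by Message-ID."""

    @abstractmethod
    def listing(self, group: str) -> list[str]:
        """Return the Message-IDs in a group, in arrival order."""


class InMemoryStore(MessageStore):
    """Toy backend; a real deployment could swap in an NNTP client here."""

    def __init__(self):
        self._articles = {}   # Message-ID -> parsed message
        self._groups = {}     # group name -> list of Message-IDs

    def post(self, group, raw):
        msg = message_from_string(raw)
        message_id = msg["Message-ID"]
        if message_id is None:
            raise ValueError("message has no Message-ID header")
        self._articles[message_id] = msg
        self._groups.setdefault(group, []).append(message_id)
        return message_id

    def article(self, message_id):
        return self._articles[message_id]

    def listing(self, group):
        return list(self._groups.get(group, []))


def subjects(store: MessageStore, group: str) -> list[str]:
    """A presenter that only talks to the interface, never to the backend."""
    return [store.article(mid)["Subject"] for mid in store.listing(group)]
```

The point of the sketch is that subjects() never learns which backend
it is talking to, so replacing the store does not touch the presenter
or the indexer.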

As a dev I'm interested in arguments about how to handle the store
behind the NNTP interface -- I find that stuff fun and intriguing -- but
also think they are fairly uninteresting right now for Mailman
specifically.  The 90% case for Mailman will have fewer than 200K
messages in its site-wide spool, and most of those an order of
magnitude fewer.  For me the interesting point is that once we abstract
the message storage behind a well-supported standards-based protocol we
can incrementally improve our implementation and those really concerned
with the larger cases can throw in inn2 or whatever else, like a filter
to SQL, instead.  In the meantime we get the flexibility and time to
grow and do it Really Right.  Additionally, having adopted such a
well-defined abstraction model once, should something better appear
down the road it should be a comparatively small cost to support that
in addition or instead.

> ... although certain storage techniques we've recently discussed
> borrow a lot from extant NNTP implementations, and I'm not sure how
> much sense it would make to rip out just those parts we know we need,
> or if we could actually reasonably take the whole thing,
> kit-n-caboodle.

Which may indeed happen.

> I do believe that we need an alternative solution to the message-id
> header as it was presented to us in the message, as a stable
> guaranteed unique (well, as good as MD-5 or SHA-1 gets) message
> identifier that can always be used to refer to the exact same message
> no matter what.

I'm of two minds here.  I see the temptation.  I like using
Message-IDs, and they are a natural fit to the model semantically, but
messing with Message-IDs has unpleasant effects for some other systems.

<shrug>

> Whether we use this message identifier as a replacement for the
> message-id header value as it was presented to us -- I think that's a
> more philosophical discussion, and I think we should address it by
> allowing both options but deciding which would be a reasonable default
> to take.

<nod> I'm on the side of rewriting Message-IDs if we do generate our own
keys.  I don't like it, but it seems the cleanest approach.
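
One way the generated-key-plus-rewrite policy could look, as a minimal
sketch only.  The assumptions are mine and not settled in this thread:
the key is a SHA-1 digest of the raw message text, the original ID is
preserved in an X-Original-Message-ID header (a made-up header name),
and the spool is a plain dict.  The function names and the
lists.example.org domain are likewise hypothetical.

```python
import hashlib
from email import message_from_string


def content_key(raw: str) -> str:
    """SHA-1 hex digest of the raw message text (assumed choice of input)."""
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()


def store_message(spool: dict, raw: str, domain: str = "lists.example.org") -> str:
    """Store a message keyed by Message-ID; rewrite the ID on a collision.

    spool maps Message-ID -> (content key, parsed message).
    Returns the Message-ID the message ends up stored under.
    """
    msg = message_from_string(raw)
    key = content_key(raw)
    message_id = msg["Message-ID"]

    # Same ID and same content: an exact duplicate, nothing to do.
    if message_id in spool and spool[message_id][0] == key:
        return message_id

    # Missing ID, or same ID with different content: generate our own.
    if message_id is None or message_id in spool:
        if message_id is not None:
            msg["X-Original-Message-ID"] = message_id  # hypothetical header
        del msg["Message-ID"]  # no-op when the header is absent
        message_id = f"<{key}@{domain}>"
        msg["Message-ID"] = message_id

    spool[message_id] = (key, msg)
    return message_id
```

Keeping the original ID in a side header is one way to soften the
unpleasant effects on other systems, since threading tools can still
find the value the sender used.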

> Given that the mailman UI is basically completely contained within the
> CGI, I'm inclined to leave it there and work on improving it
> internally, allowing us to continue to work with most any webserver
> the client may have.  

Agreed.

> I don't know how MeoWWW addresses this issue, either by replacing the
> webserver, or providing additional tools that may make it easier to
> present a good and consistent UI.

MeoWWW is a CGI as discussed above.  Twisted implements both sides of
HTTP in addition to the NNTP discussed above, but I haven't looked at
the details.

-- 
J C Lawrence                
---------(*)                Satan, oscillate my metallic sonatas. 
claw at kanga.nu               He lived as a devil, eh?		  
http://www.kanga.nu/~claw/  Evil is a name of a foeman, as I live.


