[Mailman-Developers] Improving the archives

Stephen J. Turnbull stephen at xemacs.org
Fri Jul 20 15:21:27 CEST 2007


Barry Warsaw writes:

 > First, I want to avoid talking about file system layout.  To me,  
 > that's an implementation detail we needn't worry about right now.   

Agreed.

 > How likely is it that two messages with the same message-id and
 > date are /not/ duplicates?

For message id generators that include a time-stamp in the generated
id, approximately the same as the probability that two messages with
the same message-id are not duplicates, no?

 > Heck, at that point, I'd feel justified in simply automatically
 > rejecting the duplicate and chucking it from the archive.

I'd rather not go there.  There may be applications for the archiver
that require that all mail received be filed.

Counterproposal: have a "collisions" namespace, and provide an
interface for the list owner to decide what to do with them.  They
could be thrown away, they could be given an alternative global ID
somehow and added (e.g., the archive page could add a "See probable
duplicates too" link), or they could be put into a moderation-like
queue for list admins to decide about.
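
To make the counterproposal concrete, here is a rough sketch of a store
with a separate "collisions" namespace and an owner-selectable policy.
Everything in it (MessageStore, CollisionPolicy, the dict-based storage,
keying on the (message-id, date) pair) is made up for illustration and
is not an existing Mailman interface.

```python
from dataclasses import dataclass, field
from enum import Enum


class CollisionPolicy(Enum):
    """What the list owner wants done with a colliding message (hypothetical)."""
    DISCARD = "discard"        # throw the later message away
    ADD_AS_ALTERNATE = "add"   # file it under an alternative id
    HOLD = "hold"              # park it for a list admin to decide


@dataclass
class MessageStore:
    policy: CollisionPolicy = CollisionPolicy.HOLD
    messages: dict = field(default_factory=dict)    # key -> message text
    collisions: dict = field(default_factory=dict)  # key -> held message texts
    alternates: dict = field(default_factory=dict)  # key -> alternate keys

    def add(self, message_id, date, text):
        """File a message; return the key it was stored under, or None."""
        key = (message_id, date)
        if key not in self.messages:
            self.messages[key] = text
            return key
        if self.policy is CollisionPolicy.DISCARD:
            return None
        if self.policy is CollisionPolicy.ADD_AS_ALTERNATE:
            # Derive an alternative id by appending a sequence number, so the
            # archive page can link to "probable duplicates" of the original.
            seq = len(self.alternates.setdefault(key, [])) + 1
            alt_key = (message_id, date, seq)
            self.messages[alt_key] = text
            self.alternates[key].append(alt_key)
            return alt_key
        # HOLD: keep the message in the collisions namespace, unarchived.
        self.collisions.setdefault(key, []).append(text)
        return None
```

Note that add() never raises on a collision, so the calling application
does not have to invent a recovery strategy; the decision stays with
whoever configured the policy.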

 > So now, think of the interface to a message store that supports this  
 > addressing scheme.  Well it's something like:

I don't understand how the calling application is supposed to deal
with a DuplicateMessageError exception since it should not change
either the Message-ID or the Date if present.

I see this as a major problem with any proposal to use only author
headers in computing the "global id".
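
A small illustration of the problem, assuming for the sake of argument
that the global id is a base32-encoded SHA-1 over Message-ID plus Date.
The global_id() function and the sample headers are my own invention,
not the scheme as actually proposed.

```python
import base64
import hashlib
from email.message import EmailMessage


def global_id(msg):
    """Hypothetical global id computed from author-set headers only."""
    seed = str(msg["Message-ID"] or "") + str(msg["Date"] or "")
    digest = hashlib.sha1(seed.encode("utf-8")).digest()
    return base64.b32encode(digest).decode("ascii")


def make(body):
    """Build a message with fixed (made-up) Message-ID and Date headers."""
    msg = EmailMessage()
    msg["Message-ID"] = "<12345@example.com>"
    msg["Date"] = "Fri, 20 Jul 2007 15:21:27 +0200"
    msg.set_content(body)
    return msg


first = make("fat-finger send")
second = make("the intended version")

# Different content, identical id: the store sees a collision, and the
# caller has nothing it is allowed to change to make the ids differ.
assert first.get_content() != second.get_content()
assert global_id(first) == global_id(second)
```

Any id derived only from those headers behaves this way, whatever the
hash; the body never enters into it.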

 > Or by using the global id, or by rejecting messages with duplicate  
 > message ids.

Er, the MTA has already accepted it.  Do you plan to generate a list
manager bounce to the poster?  This has the unpleasant misfeature that
it could be used to bounce spam off the list manager, since the poster
needs to see content to determine whether this is a multiple send or
actually the "intended version" after a "fat-finger" send; we already
know the message-id isn't good enough.


