[Mailman-Users] Efficient handling of cross-posting

Mikhail T. mi+mailman at aldan.algebra.com
Tue Jan 29 17:09:21 CET 2008


On Monday, 28 January 2008, 08:05, Brad Knowles wrote:
> We do not do a "single instance store" within the archiving system of 
> Mailman, and I can pretty much guarantee you that we never will.
> That's not to say that this is necessarily a bad idea, but I think we
> have much, much more important issues to resolve

May I suggest that you underestimate the importance of this feature? Cross-posting 
can often be justified from the end user's perspective, but admins discourage it 
precisely because it increases archival-storage requirements...

> We do not implement any kind of IMAP or other user mailbox service 
> with Mailman.  If you want that, you should go somewhere else.

Brad, I brought up a particular IMAP server's implementation as /an example/ 
of how a single message can appear in multiple mailboxes while only one copy of 
it is stored. You refer to this as "single instance store".

IMAP-server developers are simply more affected by the same issue: when people 
CC multiple addressees, the same message ends up in multiple mailboxes. 
IMAP-server admins also don't have the "luxury" of prohibiting CC-ing, as 
mailing-list admins often do. So IMAP servers already implement "single 
instance store", and it would be nice (and logical) if mailing-list software 
did too, starting with the recognized leader of the pack...
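To make the idea concrete, here is a minimal Python sketch of a single 
instance store, assuming two copies count as "the same message" only when 
their bytes are identical (compared via a SHA-256 digest). The class and 
method names (SingleInstanceStore, deliver, messages, stored_bytes) are 
hypothetical and not taken from any IMAP server or from Mailman. Each 
distinct message is stored once; each mailbox keeps only references to it.

```python
import hashlib
from collections import defaultdict


class SingleInstanceStore:
    """Toy (hypothetical) single instance store: each distinct message is
    kept once, and every mailbox only records references to it."""

    def __init__(self):
        self._blobs = {}                      # digest -> raw message bytes
        self._mailboxes = defaultdict(list)   # mailbox name -> list of digests

    def deliver(self, mailbox, raw_message):
        # Identify the message by a hash of its full bytes (an assumption:
        # only byte-identical copies are recognized as duplicates).
        digest = hashlib.sha256(raw_message).hexdigest()
        # Keep the bytes only the first time this digest is seen.
        self._blobs.setdefault(digest, raw_message)
        # The mailbox itself stores just a reference.
        self._mailboxes[mailbox].append(digest)
        return digest

    def messages(self, mailbox):
        # Resolve the mailbox's references back to message bytes.
        return [self._blobs[d] for d in self._mailboxes[mailbox]]

    def stored_bytes(self):
        # Total storage actually used by message bodies.
        return sum(len(b) for b in self._blobs.values())


if __name__ == "__main__":
    store = SingleInstanceStore()
    msg = b"Subject: hi\r\n\r\nbody\r\n"
    store.deliver("alice", msg)   # CC'd to two people...
    store.deliver("bob", msg)
    print(store.stored_bytes() == len(msg))   # ...but stored only once
```

Keying on the full bytes only catches identical copies. Mailing-list software 
typically rewrites headers per list (List-Id, subject prefixes), so a list 
archive would more plausibly recognize cross-posts by their Message-ID.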

> I *violently* disagree with your claim.  If a message was
> cross-posted to multiple mailing lists and indexed by Google, then
> Google will most certainly return multiple hits for the same message,
> and this is precisely what any proper search engine should do.
>
> De-duplication at this level is absolutely the worst thing you could
> do -- at least by default

And yet Google does exactly that, de-duplication, in its search results... It 
displays a note at the bottom of the page saying that duplicate results were 
suppressed...

> Mailman does not incorporate any search function, therefore which
> searches return which messages is totally and completely irrelevant
> to Mailman.

Well, this is the more important point: I was under the (mistaken) impression 
that it does. There is no point arguing on a Mailman forum about how a good 
search engine should behave if Mailman implements no search function.

Thank you very much for your comments, guys. We'll look into the 
"sister-list" feature of 2.1.10 to eliminate or reduce multiple copies of a 
message going to the same subscriber, and await 3.0 for a full solution to 
the problem.

I hope you'll give the idea of "single instance storage" another thought. 
There is already an option to archive in Maildir format. Optionally storing 
hard links instead of copies of cross-posts can't be too difficult...
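A rough sketch of the hard-link idea in Python, assuming a POSIX filesystem 
where the shared store and all list archives live on the same volume (hard 
links cannot cross filesystems). The layout (a shared _store directory plus 
one Maildir per list), the use of the Message-ID to recognize cross-posts, 
and the function name archive_crosspost are all hypothetical; this is not 
how Mailman's archiver actually works.

```python
import hashlib
import os
from email import message_from_bytes


def archive_crosspost(raw_message, archive_root, list_names):
    """Archive one message into several list archives, storing it only once.

    Hypothetical layout (not Mailman's real archiver):
      <archive_root>/_store/<digest>              the single stored copy
      <archive_root>/<list>/{tmp,new,cur}/...     one Maildir per list
    Each list's entry in cur/ is a hard link to the stored copy.
    """
    msg = message_from_bytes(raw_message)
    msgid = msg.get("Message-ID")
    if msgid is None:
        raise ValueError("message has no Message-ID header")
    # Derive the stored file name from the Message-ID, so every list that
    # receives the same cross-post maps to the same stored file.
    digest = hashlib.sha256(str(msgid).strip().encode("utf-8")).hexdigest()

    store_dir = os.path.join(archive_root, "_store")
    os.makedirs(store_dir, exist_ok=True)
    stored = os.path.join(store_dir, digest)
    if not os.path.exists(stored):
        # First list to see this message writes the only real copy.
        with open(stored, "wb") as f:
            f.write(raw_message)

    linked = []
    for name in list_names:
        for sub in ("tmp", "new", "cur"):
            os.makedirs(os.path.join(archive_root, name, sub), exist_ok=True)
        # Simplified Maildir name: ':2,' is the info suffix with no flags set.
        target = os.path.join(archive_root, name, "cur", digest + ":2,")
        if not os.path.exists(target):
            os.link(stored, target)  # new directory entry, same inode
        linked.append(target)
    return stored, linked
```

The Maildir file names here are simplified; real Maildir delivery uses unique 
names built from the time, process ID and hostname. One trade-off: all lists 
then share the same bytes, so per-list changes such as subject prefixes or 
footers would have to be kept out of the stored copy or applied when the 
archive is rendered.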

Yours,

 -mi

