[Mailman-Users] Efficient handling of cross-posting

Brad Knowles brad at shub-internet.org
Wed Jan 30 03:11:02 CET 2008


On 1/29/08, Mikhail T. wrote:

>  May I suggest, you underestimate the importance of this feature?
>  Cross-posting may often be justified from the end-user perspective, but is
>  discouraged by the admins exactly because it increases the archival-storage
>  requirements...

I've never once heard an admin discourage cross-posting because of 
archival storage requirements.  In my experience, the objection to 
cross-posting has much more to do with political control: some admins 
don't want their lists to be well-publicised, and no admin likes 
dealing with the fallout when hundreds of users from other lists do 
a "reply-all" that includes their list, and those messages get 
rejected or put in the hold queue because the senders are not 
subscribers to the list.

>  Brad, I brought up a particular IMAP-server's implementation as /an example/
>  of how a single message can appear in multiple mailboxes, while only one
>  copy of it is stored. You refer to this as "single instance store".

We don't really have mailboxes at all.  We have mail archives.  The 
raw mail archives are kept in 7th edition "mbox" format, and for the 
"cooked" archives they are broken down by month (or other archive 
rotation policy as set by the listowner) and either stored as 
something akin to 7th edition mbox format files (for the plain text 
archives) or split up into multiple *.html files for the HTML format 
archives, but in none of these cases are any of these files what you 
would call a "mailbox" per se.

>  IMAP-server developers are just more affected by the same issue -- people
>  CC-ing multiple addressees results in the same message getting to multiple
>  mailboxes. IMAP-server admins also don't have the "luxury" of prohibiting
>  CC-ing, as mailing-list admins often do. So IMAP-servers already implement
>  the "single instance store", and it would be nice (and logical) if mailing
>  list software did too -- starting with the recognized leader of the pack...

UW-IMAP certainly doesn't do single instance store, and I'm pretty 
sure that Courier-IMAP and Dovecot don't do single instance store by 
default.  There are enough problems that come along with single 
instance store that people are not likely to turn on such features by 
default.

>  And yet Google does just that -- de-duplication -- in its search results...
>  It will display a warning at the bottom of the page, saying that duplicate
>  results were suppressed...

That's just search results.  They're not actually storing the 
original copies of those objects, and they give you the option of 
turning off that feature if you like.

That's completely different from doing an Internet-wide 
de-duplication of all data.

>  Well, this is more important -- I was under the (mistaken) impression, that
>  it does. There is no point arguing, how a good search-engine should do
>  things on a Mailman forum, if Mailman implements no search function.

We don't do forums, either.

We do provide hooks that other people have used to implement such 
features, but none of that has been incorporated into the baseline 
version of Mailman.

>  I hope, you'll give the idea of "single instance storage" another thought.
>  There is already an option to archive in "Maildir" format. Optionally storing
>  hardlinks instead of copies of cross-posts can't be too difficult...

I believe you'll find it a lot harder than you think to convert the 
entire archive storage mechanism to use Maildir as an option, and 
then to integrate single instance store on top of that.  Once you do 
that, you're welcome to contribute the code, and then it becomes a 
matter of when one of the core developers can take a look at that 
code and decide whether or not to actually incorporate that into a 
future version of Mailman.
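
For anyone who wants to experiment, here is a minimal sketch of the 
hardlink idea from the quoted proposal, under some loud assumptions: 
one file per message, all directories on the same filesystem (hard 
links cannot cross filesystems), and byte-identical copies.  That 
last assumption rarely holds in practice, since each list normally 
adds its own headers, so a real implementation would need some other 
key, such as the Message-ID.  The function name and directory layout 
are hypothetical, and none of this reflects how Mailman stores 
archives.

```python
import hashlib
import os


def store_once(raw_message, store_dir, list_dirs):
    """Write raw_message (bytes) once, then hard-link it into each list directory.

    Returns the path of the canonical copy under store_dir.
    """
    digest = hashlib.sha256(raw_message).hexdigest()
    os.makedirs(store_dir, exist_ok=True)
    canonical = os.path.join(store_dir, digest)
    if not os.path.exists(canonical):
        # Write to a temporary name first so readers never see a partial file.
        tmp = canonical + ".tmp"
        with open(tmp, "wb") as fh:
            fh.write(raw_message)
        os.replace(tmp, canonical)
    for list_dir in list_dirs:
        os.makedirs(list_dir, exist_ok=True)
        link = os.path.join(list_dir, digest)
        if not os.path.exists(link):
            # A hard link is a second name for the same data on disk.
            os.link(canonical, link)
    return canonical
```

Note that this covers only storage.  The sketch says nothing about 
the monthly mbox files, the HTML archives, or removing a message from 
one list's archive but not another's, which is where most of the 
difficulty lies.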

Personally, I think we have much higher priorities elsewhere, but 
then I don't assign tasks to guys like Mark Sapiro, Tokio Kikuchi, or 
Barry Warsaw.

-- 
Brad Knowles <brad at shub-internet.org>
LinkedIn Profile: <http://tinyurl.com/y8kpxu>

