[Mailman-Users] Archive merge and search

Stephen J. Turnbull stephen at xemacs.org
Mon Nov 10 09:30:38 CET 2014


Barry S. Finkel writes:
 > On 11/9/2014 8:37 PM, Hal wrote:

 > > I did some more research and found out that the MBOX format isn't
 > > standardized as there are 4 different variations around
 > > (http://homepage.ntlworld.com/jonathan.deboynepollard/FGA/mail-mbox-formats.html).

Jamie Zawinski's page http://www.jwz.org/doc/content-length.html has
some historical information not on that page, and is far more
entertaining (well, to warped minds like mine, anyway).  The most
important point is that there are a lot more than 4 variations, and it turns
out that there's a good chance that a given mbox file can contain a
mixture of them.
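
If you want a rough idea of which conventions a given file mixes, a
quick survey along these lines may help.  This is only a sketch: the
function name survey_mbox is made up for illustration, and it counts
matching lines anywhere in the file without distinguishing headers
from bodies, so treat the numbers as hints rather than a diagnosis.

```python
def survey_mbox(lines):
    """Count line types that hint at which mbox conventions are in use.

    Hypothetical helper: it looks only at how each line starts and does
    not parse messages, so body lines are counted along with headers.
    """
    counts = {"from_": 0, "quoted_from": 0, "content_length": 0}
    for line in lines:
        if line.startswith("From "):
            # Candidate From_ delimiter (or an unquoted body line).
            counts["from_"] += 1
        elif line.startswith(">From "):
            # Quoted "From " line, as produced by From-quoting variants.
            counts["quoted_from"] += 1
        elif line.lower().startswith("content-length:"):
            # Suggests a variant that uses Content-Length to find message ends.
            counts["content_length"] += 1
    return counts
```

Usage would be something like survey_mbox(open(path, errors="replace")),
where path is whatever mbox file you are checking.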

 > > Investigating the MBOX files in a text editor I found the problematic
 > > ones to have headers starting with ">From " (without the quotes) which
 > > the working ones didn't, so I removed all those lines from a couple of
 > > MBOX files, imported into the Mailman archives and all looked fine!
 > > Obviously I can't check every single posting, so does my discovery and
 > > solution sound feasible?

I'm surprised that this works.  What should work is to remove the ">"
from From_ delimiter lines.
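
A minimal sketch of that repair, under some loud assumptions: the
delimiter is taken to look like "From <sender> <asctime-style date>"
with the sender as a single token, and a line counts as a delimiter
only if it follows a blank line or starts the file.  The names
FROM_DELIM and unquote_delimiters are invented for this example, and
the pattern will need adjusting if your files use a different date or
sender format.  A body line that happens to be a quoted copy of a full
From_ line after a blank line would be unquoted too, so check the
result on a copy first.

```python
import re

# Assumed shape of a quoted From_ delimiter; real archives vary.
FROM_DELIM = re.compile(
    r"^>From \S+ +(Mon|Tue|Wed|Thu|Fri|Sat|Sun) "
    r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) [ \d]\d "
    r"\d\d:\d\d:\d\d \d{4}"
)

def unquote_delimiters(lines):
    """Yield lines, dropping the ">" from lines that look like delimiters.

    Other ">From " lines (ordinary quoted body text) pass through unchanged.
    """
    prev_blank = True  # the start of the file counts as following a blank line
    for line in lines:
        if prev_blank and FROM_DELIM.match(line):
            line = line[1:]  # strip only the single leading ">"
        yield line
        prev_blank = line.strip() == ""
```

Write the output to a new file rather than editing in place, then
import the new file and compare message counts.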

 > When I read a message that has "From " changed to ">From " (at the
 > beginning of a line), I have no trouble interpreting the mail.
 > The URL above says that the transformation "corrupts" mailboxes.
 > I would use the term "changes", as the e-mail body has been
 > changed.

See Jamie's page for why "corrupt" (in quotes) is of appropriate
severity.  In particular, the example of a digital signature is
salient.

 > I dislike the format that uses "Content-Length:" to determine the end
 > of a message.

So do we all.  I'm sure it was directly responsible for the decline of
Sun and its eventual consumption by Oracle. ;-)

