[Mailman-Users] Archive merge and search
Stephen J. Turnbull
stephen at xemacs.org
Mon Nov 10 09:30:38 CET 2014
Barry S. Finkel writes:
> On 11/9/2014 8:37 PM, Hal wrote:
> > I did some more research and found out that the MBOX format isn't
> > standardized as there are 4 different variations around
> > (http://homepage.ntlworld.com/jonathan.deboynepollard/FGA/mail-mbox-formats.html).
Jamie Zawinski's page http://www.jwz.org/doc/content-length.html has
some historical information not on that page, and is far more
entertaining (well, to warped minds like mine, anyway). The most
important point is that there are a lot more than 4 variations, and
that there's a good chance a given mbox file contains a mixture of
them.
> > Investigating the MBOX files in a text editor I found the problematic
> > ones to have headers starting with ">From " (without the quotes) which
> > the working ones didn't, so I removed all those lines from a couple of
> > MBOX files, imported into the Mailman archives and all looked fine!
> > Obviously I can't check every single posting, so does my discovery and
> > solution sound feasible?
I'm surprised that this works. What should work is to remove the ">"
from From_ delimiter lines.
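A minimal sketch of that un-escaping in Python, in case it helps. The
heuristics are my assumptions, not anything Mailman itself does: a line
is treated as a wrongly escaped delimiter only if it follows a blank
line (or starts the file), is followed by something that looks like a
header field, and ends in an asctime-style date. The names
unescape_delimiters, FROM_RE and HEADER_RE are made up for illustration.
From_ lines with a timezone or some other date layout will not match, so
adjust the pattern to what your files actually contain and work on a
copy.

```python
import re

# Assumed shape of an escaped delimiter:
#   ">From addr Mon Nov 10 09:30:38 2014"
FROM_RE = re.compile(
    r"^>From \S+ +(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun) "
    r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) [ \d]\d "
    r"\d\d:\d\d:\d\d \d{4}\s*$"
)
# A header field name: printable ASCII except ":" followed by a colon.
HEADER_RE = re.compile(r"^[!-9;-~]+:")


def unescape_delimiters(lines):
    """Return a copy of lines with '>' stripped from escaped From_ delimiters.

    Body lines such as ">From the start ..." are left alone because they
    fail the date pattern or are not followed by a header line.
    """
    fixed = list(lines)
    for i, line in enumerate(fixed):
        prev_blank = i == 0 or fixed[i - 1].strip() == ""
        next_is_header = i + 1 < len(fixed) and HEADER_RE.match(fixed[i + 1])
        if prev_blank and next_is_header and FROM_RE.match(line):
            fixed[i] = line[1:]  # drop only the leading ">"
    return fixed
```

To use it, read the mbox with open(path, newline="") and readlines(),
pass the list through unescape_delimiters(), and write the result to a
new file before importing that into the archive.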
> When I read a message that has "From " changed to ">From " (at the
> beginning of a line), I have no trouble interpreting the mail.
> The URL above says that the transformation "corrupts" mailboxes.
> I would use the term "changes", as the e-mail body has been
> changed.
See Jamie's page for why "corrupt" (in quotes) is of appropriate
severity. In particular, the example of a digital signature is
salient.
> I dislike the format that uses "Content-Length:" to determine the end
> of a message.
So do we all. I'm sure it was directly responsible for the decline of
Sun and its eventual consumption by Oracle. ;-)