[Mailman-Users] Importing large archives ... design limit hit, and possible bug
Scott Courtney
courtney at 4th.com
Sun Jun 2 09:31:34 CEST 2002
On Sunday 02 June 2002 02:35 am, Scott Courtney wrote:
> By slightly I mean, among other things, that it wraps
> the header lines in a way that appears incompatible with RFC 822. The Perl
> script that extracts these pages has no way of knowing what Yahoo has done
> to the original data.
>
> I used some careful "grep -v" filtering to eliminate the offending lines,
> all of which begin (at least in my data) with either "by " or " for", and
> everything now imports.
For the record:
1. It's more than just the Received: headers that are the problem. Other types
of header are also wrapped in Yahoo's archives.
2. This is in fact noncompliant with RFC 822, which requires every continuation
line of a folded (logical) header line to begin with linear whitespace, that is,
a space or a tab. See the example below.
Received: blah blah blah
    blah blah blah

Received: blah blah blah
blah blah blah
The first is valid; the second is not. Messages retrieved from the "view
source" function on YahooGroups appear to have the second format.
I may work out an awk script to do the cleanup automatically, if time permits.
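In the meantime, here is a rough sketch of that cleanup, written in Python rather than awk. It is untested against real YahooGroups output and rests on one assumption: inside the header block, any line that neither starts with whitespace nor looks like "Field-Name:" is a continuation that lost its indentation. The names refold_headers and FIELD_START are made up for illustration, and the function handles one message at a time.

```python
import re

# RFC 822 field name: printable ASCII except space and colon, then a colon.
FIELD_START = re.compile(r"^[!-9;-~]+:")

def refold_headers(lines):
    """Indent header lines that look like continuations missing their
    leading whitespace. `lines` is one message, without line endings."""
    fixed = []
    in_headers = True
    seen_field = False  # leave anything before the first real header alone
    for line in lines:
        if in_headers and line.strip() == "":
            in_headers = False  # first blank line ends the header block
        if (in_headers and seen_field
                and not line[0].isspace()
                and not FIELD_START.match(line)):
            # Assumed to be a wrapped fragment: restore the folding whitespace.
            line = "\t" + line
        elif in_headers and FIELD_START.match(line):
            seen_field = True
        fixed.append(line)
    return fixed

if __name__ == "__main__":
    message = [
        "Received: blah blah blah",
        "blah blah blah",
        "Subject: test",
        "",
        "body text",
    ]
    print("\n".join(refold_headers(message)))
```

A wrapped fragment that happens to begin with a word followed directly by a colon would be mistaken for a new header, so the output still deserves a spot check before importing.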
Scott
--
-----------------------+------------------------------------------------------
Scott Courtney | "I don't mind Microsoft making money. I mind them
courtney at 4th.com | having a bad operating system." -- Linus Torvalds
http://www.4th.com/ | ("The Rebel Code," NY Times, 21 February 1999)