[Mailman-Users] Importing large archives ... design limit hit, and possible bug

Scott Courtney courtney at 4th.com
Sun Jun 2 09:31:34 CEST 2002


On Sunday 02 June 2002 02:35 am, Scott Courtney wrote:
> By slightly I mean, among other things, that it wraps
> the header lines in a way that appears incompatible with RFC 822. The Perl
> script that extracts these pages has no way of knowing what Yahoo has done
> to the original data.
>
> I used some careful "grep -v" filtering to eliminate the offending lines,
> all of which begin (at least in my data) with either "by " or " for", and
> everything now imports.

For the record:

1. The problem is not limited to the Received: headers. Other types of
   header are also wrapped in Yahoo's archives.
2. This wrapping does in fact violate RFC 822, which requires that each
   continuation line of a folded header field begin with linear whitespace
   (a space or a tab). See the example below.

Received: blah blah blah
    blah blah blah

Received: blah blah blah
blah blah blah

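The first is valid; the second is not, because its continuation line starts
in column one and so no longer reads as part of the Received: header.
Messages retrieved from the "view source" function on YahooGroups appear to
use the second format.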

I may work out an awk script to do the cleanup automatically, if time permits.
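
A rough sketch of what such a cleanup could look like, written here in Python
rather than awk. It is untested against real Yahoo data and rests on several
assumptions: the input is a single message, the header block ends at the first
blank line, and any header line that neither starts with whitespace nor looks
like "Field-Name:" is a wrapped continuation. The function name
refold_headers and the four-space indent are illustrative choices only.

```python
import re

# An RFC 822 field name is printable ASCII without spaces or colons,
# immediately followed by a colon.
FIELD_START = re.compile(r"^[!-9;-~]+:")


def refold_headers(message: str) -> str:
    """Indent header lines that look like unindented continuations.

    Assumption: `message` holds one message whose headers end at the
    first blank line. Body lines are passed through untouched.
    """
    out = []
    in_headers = True
    seen_field = False  # becomes True once a real "Name:" line was seen
    for line in message.split("\n"):
        if in_headers:
            if line == "":
                in_headers = False      # blank line ends the header block
            elif line[0] in " \t":
                pass                    # already a valid continuation line
            elif FIELD_START.match(line):
                seen_field = True       # start of a new header field
            elif seen_field:
                line = "    " + line    # repair: add leading whitespace
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    broken = "Received: blah blah blah\nblah blah blah\n\nbody text\n"
    print(refold_headers(broken))
```

One known weakness of this heuristic: a wrapped line that happens to begin
with a word followed directly by a colon would be mistaken for a new header
and left alone, so the output would still need a spot check.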

Scott

-- 
-----------------------+------------------------------------------------------
Scott Courtney         | "I don't mind Microsoft making money. I mind them
courtney at 4th.com       | having a bad operating system."    -- Linus Torvalds
http://www.4th.com/    | ("The Rebel Code," NY Times, 21 February 1999)





