[Mailman-Users] Importing large archives ... design limit hit, and possible bug

Scott Courtney courtney at 4th.com
Sun Jun 2 08:35:00 CEST 2002


On Sunday 02 June 2002 02:06 am, Scott Courtney wrote:
> I'm not sure if this is "the" problem, but it is certainly "a" problem. The
> parser in Pipermail chokes on headers that look like this:
>
> Received: from blah blah blah
> by blah blah blah
> Received: from some other thing
> by some other thing
[...]
> It appears that AOL's mailer has, or at least had, a habit of wrapping
> these header lines. I still need to dig into RFC 2822 or RFC 822 to see
> whether the blame goes to Pipermail for not liking the lines or to AOL for
> generating them, but joining these lines in the text editor seems to make
> Pipermail accept the messages. Anyone know offhand?

BINGO! The culprit is unmasked!

It's not Pipermail and it's not AOL. It's Yahoo. If you recall, I am importing
YahooGroups lists. The only way to get your data from these people is via
their web interface, and they don't let you have a raw source dump. They have
a link to "view source" of a message, and it gives you full headers, but it
formats them slightly. By slightly I mean, among other things, that it wraps
the header lines in a way that appears incompatible with RFC 822. The Perl
script that extracts these pages has no way of knowing what Yahoo has done
to the original data.
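
For anyone who wants to check their own export before importing, here is a minimal sketch of a detector. It assumes the RFC 822 folding rule, under which a continuation line must begin with a space or tab; any header line that neither starts that way nor opens a new "Name:" field is reported. The function name and sample data are hypothetical, and the input is assumed to be one message's header block without the mbox "From " separator line.

```python
import re

# A field name is printable ASCII without spaces or colons, followed by ':'.
FIELD_START = re.compile(r"^[!-9;-~]+:")

def find_unfolded_lines(header_lines):
    """Return (index, line) pairs for header lines that are neither the
    start of a field nor a whitespace-led continuation line."""
    bad = []
    for i, line in enumerate(header_lines):
        if line.startswith((" ", "\t")):
            continue  # properly folded continuation
        if FIELD_START.match(line):
            continue  # start of a new header field
        bad.append((i, line))
    return bad

# Sample shaped like the headers quoted above (hypothetical data).
headers = [
    "Received: from blah blah blah",
    "by blah blah blah",
    "Received: from some other thing",
    "by some other thing",
    "Subject: test",
]
print(find_unfolded_lines(headers))
# [(1, 'by blah blah blah'), (3, 'by some other thing')]
```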

I used some careful "grep -v" filtering to eliminate the offending lines,
all of which begin (at least in my data) with either "by " or " for", and
everything now imports.
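
A rough Python equivalent of that filtering, as a sketch only: it drops lines starting with "by " or " for", but only inside header blocks, so body text that happens to begin with "by " is left alone. It assumes mbox-style input where each message starts with a "From " line and the headers end at the first blank line; the function name and prefixes parameter are made up for illustration. Like the grep approach, it discards the wrapped Received: details rather than rejoining them.

```python
def drop_wrapped_header_lines(lines, prefixes=("by ", " for")):
    """Yield lines, skipping header lines that start with one of `prefixes`.

    Assumes mbox-style input: a "From " line opens each message and the
    header block ends at the first blank line.
    """
    in_headers = False
    for line in lines:
        if line.startswith("From "):
            in_headers = True           # new message, headers follow
        elif in_headers and line.strip() == "":
            in_headers = False          # blank line ends the headers
        elif in_headers and line.startswith(prefixes):
            continue                    # drop the offending wrapped line
        yield line

# Hypothetical sample message.
sample = [
    "From a@b Sun Jun  2 08:35:00 2002\n",
    "Received: from x\n",
    "by y\n",
    " for <z>; some date\n",
    "Subject: hi\n",
    "\n",
    "by the way, this body line is kept\n",
]
cleaned = list(drop_wrapped_header_lines(sample))
```

To process a file, pass the open input file as `lines` and write each yielded line to the output file.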

I'm documenting this because, with Yahoo's new "privacy" (note quotes) policy,
I'll bet a lot of list admins will make the same move I'm making -- away from
YahooGroups and toward Mailman or something like it. So the message is: beware
of Yahoo's header rewrapping, which can cause Pipermail to silently discard
data when it builds the archive.

Scott

-- 
-----------------------+------------------------------------------------------
Scott Courtney         | "I don't mind Microsoft making money. I mind them
courtney at 4th.com       | having a bad operating system."    -- Linus Torvalds
http://www.4th.com/    | ("The Rebel Code," NY Times, 21 February 1999)





