[Mailman-Users] "No subject" messages in archives
Ivan Van Laningham
ivanlan at pauahtun.org
Sun May 20 13:41:14 CEST 2007
Hi All--
Mark Sapiro wrote:
> Ivan Van Laningham wrote:
>> But I have one list for which I used archives from two previous
>> incarnations of the list, plus the current archive mbox, as input to
>> arch. I made sure that the previous archives were in mbox format and
>> that they contained only one "From " line per message.
>
>
> Are you sure? Did you run bin/cleanarch against the .mbox file to check
> it?
>
I ran cleanarch, yes, but all it did was to escape every single "From "
line, which would make arch think there was only one message.
>
> This usually results from a message containing an embedded "From "
> somewhere in the message body. The message is archived properly under
> its correct date and subject, but that entry is truncated at the line
> that begins with "From ". Then the rest of the message is archived as
> a separate message. Since it has no From:, Subject: or Date: headers,
> it is archived with the current date and no subject. Also, text
> following the "From " up to the first totally empty (not just blank)
> line is considered part of the header and is not archived with this
> 'second' message.
>
That would describe what I'm seeing, except that--
>
> If there is any message body text in the 'No subject' archived entry,
> you should be able to find that in the .mbox.
>
Right, but there are 5,000 entries with "No subject" and no body, not a
hint of a body.
>
>> The _only_ thing I can see, in the current mbox,
>> is that the end of the last message from the old archives ends on one
>> line and the "From " line for the next message begins on the very next
>> line, with no blank lines between,
>
>
> That shouldn't cause this.
>
Good to know.
>
>> and everywhere else there are either
>> one or more blank lines or one of those message separator lines from
>> AOL:
>> "----------MB_8C9379FAFA8ECEC_DAC_6C2A_WEBMAIL-MC05.sysops.aol.com--"
>> These bogus entries aren't really hurting anything, I suppose, but they
>> are annoying and it is irritating to have to scroll down 5000 lines to
>> get to the next real message.
>
>
> They are actually, because they represent missing pieces of other
> messages.
>
How to track them down?
>
>> What is causing this? And is there anything I can do to get rid of the
>> problem? I am willing to live with it if I have to, but I would prefer
>> having a fix.
>
>
> I think you have unescaped "From " lines in the bodies of messages. Run
> bin/cleanarch (with the -n/--dry-run option) to check.
>
> Another possibility is you have real looking but extraneous
> (duplicate?) "From " lines not followed by a real message with
> Subject: and Date: headers prior to the next "From ".
>
Do lines beginning with whitespace before a From count? There are about
a hundred of those in the input mbox.
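
For tracking the bogus entries down, here is a rough sketch of the kind
of scan I have in mind. It is not part of Mailman; the function name
find_suspect_separators is made up for illustration. It assumes that
only a "From " at the very start of a line acts as a separator (so the
indented ones are ignored), and that the header block runs up to the
first totally empty line, as Mark describes. It reports every "From "
line whose header block lacks a Subject: or a Date: header.

```python
def find_suspect_separators(lines):
    """Return (line_number, text) for each column-0 "From " line that is
    not followed by both Subject: and Date: headers.

    Assumption: the header block ends at the first totally empty line
    or at the next column-0 "From " line, whichever comes first.
    """
    lines = list(lines)
    suspects = []
    for i, line in enumerate(lines):
        # Lines with leading whitespace before "From" are skipped here.
        if not line.startswith("From "):
            continue
        headers = set()
        j = i + 1
        while j < len(lines):
            follow = lines[j]
            if follow.rstrip("\r\n") == "" or follow.startswith("From "):
                break
            name, sep, _ = follow.partition(":")
            # Ignore continuation lines, which start with whitespace.
            if sep and name and not name[0].isspace():
                headers.add(name.lower())
            j += 1
        if not {"subject", "date"} <= headers:
            suspects.append((i + 1, line.rstrip("\r\n")))
    return suspects
```

Something like find_suspect_separators(open("list.mbox", errors="replace"))
would then list the line numbers to look at, where list.mbox stands in
for the real archive file.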
Metta,
Ivan
--
Ivan Van Laningham
God N Locomotive Works
http://www.pauahtun.org/
http://www.python.org/workshops/1998-11/proceedings/papers/laningham/laningham.html
Army Signal Corps: Cu Chi, Class of '70
Author: Teach Yourself Python in 24 Hours