[Mailman-Users] "No subject" messages in archives

Mark Sapiro msapiro at value.net
Sat May 19 19:33:01 CEST 2007


Ivan Van Laningham wrote:
>
>But I have one list for which I used archives from two previous 
>incarnations of the list, plus the current archive mbox, as input to 
>arch.  I made sure that the previous archives were in mbox format and 
>that they contained only one "From " line per message.


Are you sure? Did you run bin/cleanarch against the .mbox file to check
it?

>Once I was 
>convinced they were all ready, I combined the old archive mbox with the 
>current archive mbox using cat, and ran arch.
>
>It worked perfectly, creating archive pages going all the way back to 
>1999, except that in the archive page for the month in which I ran arch 
>(May) for the day on which I ran it (May 7), I have in the vicinity of 
>5000 entries for messages with "No subject" and no body.  The index page 
>for May looks like this:
>
># [Guppies] Malice 2008   Suzanne Williams
># No subject
># No subject
># No subject
>... 5000 entries
># No subject
># No subject
># [Guppies] harsh words for cheating   peg908 at aol.com
># [Guppies] harsh words for cheating   Vwright


This usually results from a message containing an embedded "From "
somewhere in the message body. The message is archived properly under
its correct date and subject, but that entry is truncated at the line
that begins with "From ". Then the rest of the message is archived as
a separate message. Since it has no From:, Subject: or Date: headers,
it is archived with the current date and no subject. Also, text
following the "From " up to the first totally empty (not just blank)
line is considered part of the header and is not archived with this
'second' message.
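
To illustrate the effect, here is a minimal Python sketch. It is not
Mailman's archiver code: naive_split and the sample message are
hypothetical, and the splitter simply assumes that every line beginning
with "From " starts a new message. It shows the second chunk coming out
with no Subject: header.

```python
import email


def naive_split(mbox_text):
    """Split mbox text at every line starting with 'From ' (hypothetical helper)."""
    chunks, current = [], []
    for line in mbox_text.splitlines(keepends=True):
        # Any "From " line, real separator or not, starts a new chunk.
        if line.startswith("From ") and current:
            chunks.append("".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("".join(current))
    return chunks


# Made-up message whose body contains an unescaped "From " line.
SAMPLE = (
    "From alice@example.com Mon May  7 10:00:00 2007\n"
    "From: alice@example.com\n"
    "Subject: [Guppies] hello\n"
    "Date: Mon, 7 May 2007 10:00:00 -0700\n"
    "\n"
    "First paragraph of the body.\n"
    "From here on the text ends up in a second entry.\n"
    "\n"
    "Last paragraph.\n"
)

chunks = naive_split(SAMPLE)
subjects = [email.message_from_string(chunk)["Subject"] for chunk in chunks]
for subject in subjects:
    print(subject or "No subject")
```

The first chunk keeps its real headers but loses everything from the
embedded "From " line onward; the second chunk has no Subject: or Date:
at all.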


>I tried to find these mysterious entries in the current archive mbox, 
>but they don't appear.


If there is any message body text in the 'No subject' archived entry,
you should be able to find that in the .mbox.


>The _only_ thing I can see, in the current mbox, 
>is that the end of the last message from the old archives ends on one 
>line and the "From " line for the next message begins on the very next 
>line, with no blank lines between,


That shouldn't cause this.


>and everywhere else there are either 
>one or more blank lines or one of those message separator lines from 
>AOL: 
> >"----------MB_8C9379FAFA8ECEC_DAC_6C2A_WEBMAIL-MC05.sysops.aol.com--"<
>
>These bogus entries aren't really hurting anything, I suppose, but they 
>are annoying and it is irritating to have to scroll down 5000 lines to 
>get to the next real message.


They actually are hurting something, because they represent missing
pieces of other messages.


>What is causing this?  And is there anything I can do to get rid of the 
>problem?  I am willing to live with it if I have to, but I would prefer 
>having a fix.


I think you have unescaped "From " lines in the bodies of messages. Run
bin/cleanarch (with the -n/--dry-run option) to check.

Another possibility is that you have real-looking but extraneous
(duplicate?) "From " lines not followed by a real message with
Subject: and Date: headers prior to the next "From ".
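
If you want to look for such lines yourself, a rough sketch along these
lines may help. It is not what bin/cleanarch does; the function name
suspicious_from_lines and the sample data are hypothetical. It flags
every "From " line whose following header block (up to the first
totally empty line or the next "From " line) lacks Subject: or Date:.

```python
def suspicious_from_lines(lines):
    """Return (line_number, text) for 'From ' lines lacking Subject:/Date: headers."""
    hits = []
    i, n = 0, len(lines)
    while i < n:
        if not lines[i].startswith("From "):
            i += 1
            continue
        names = set()
        j = i + 1
        # Collect header names until an empty line or the next "From " line.
        while j < n and lines[j].rstrip("\r\n") != "" and not lines[j].startswith("From "):
            if ":" in lines[j] and not lines[j][0].isspace():
                names.add(lines[j].split(":", 1)[0].lower())
            j += 1
        if not {"subject", "date"} <= names:
            hits.append((i + 1, lines[i].rstrip("\r\n")))
        i = j
    return hits


# Made-up data: line 7 is a duplicate separator with no headers after it.
SAMPLE_LINES = [
    "From alice@example.com Mon May  7 10:00:00 2007\n",
    "From: alice@example.com\n",
    "Subject: hi\n",
    "Date: Mon, 7 May 2007 10:00:00 -0700\n",
    "\n",
    "Body text.\n",
    "From bob@example.com Mon May  7 11:00:00 2007\n",
    "From bob@example.com Mon May  7 11:00:00 2007\n",
    "From: bob@example.com\n",
    "Subject: re\n",
    "Date: Mon, 7 May 2007 11:00:00 -0700\n",
    "\n",
    "Reply.\n",
]

report = suspicious_from_lines(SAMPLE_LINES)
for number, text in report:
    print(number, text)
```

Against a real file you would pass in the lines of the .mbox instead of
SAMPLE_LINES. Both unescaped "From " lines in bodies and duplicate
separators should show up in the report.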

-- 
Mark Sapiro <msapiro at value.net>       The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan


