[Mailman-Developers] Random HTML archiving failures possibly solved

Georg Mischler schorsch@schorsch.com
Mon, 18 Dec 2000 12:40:31 -0500 (EST)


Hi all,

There have been a number of reports of HTML archiving failing
mysteriously, which the experts were apparently unable to
reproduce.  I think I have just found a bug in Mailbox.py from
2.0 that can cause this behaviour. Since I'm CVS challenged, I
am unable to check whether it has already been fixed since then,
but here it goes anyway.

The pattern that checks for the Unix-style "From " lines
fails when it encounters a negative timezone (such as -0500),
because the sign is not matched by the final \d\d\d\d:

 _fromlinepattern = r'From \s*\S+\s+\w\w\w\s+\w\w\w\s+\d\d?\s+' \
                  r'\d\d?:\d\d(:\d\d)?(\s+\S+)?\s+\d\d\d\d\s*$'

As a consequence, when a mailbox file starts with a message
from such a timezone, Mailman will think it contains no
messages at all.  A more robust approach (assuming that a plus
sign in front of the timezone is also legal) would probably
look similar to this:

 _fromlinepattern = r'From \s*\S+\s+\w\w\w\s+\w\w\w\s+\d\d?\s+' \
                  r'\d\d?:\d\d(:\d\d)?(\s+\S+)?\s+[+-]?\d\d\d\d\s*$'

At least this fixes the problem on my system here...
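
For anyone who wants to check this, here is a small sketch that
runs both patterns against two sample lines. The sample lines
are made up for illustration, and I am assuming the numeric
offset comes after the year; the names OLD, NEW, plain and
negative are not from Mailbox.py.

```python
import re

# Pattern as found in Mailbox.py 2.0 (quoted above).
OLD = (r'From \s*\S+\s+\w\w\w\s+\w\w\w\s+\d\d?\s+'
       r'\d\d?:\d\d(:\d\d)?(\s+\S+)?\s+\d\d\d\d\s*$')
# Proposed pattern: allows an optional sign before the last four digits.
NEW = (r'From \s*\S+\s+\w\w\w\s+\w\w\w\s+\d\d?\s+'
       r'\d\d?:\d\d(:\d\d)?(\s+\S+)?\s+[+-]?\d\d\d\d\s*$')

# Hypothetical sample lines, one without and one with a signed offset.
plain = 'From schorsch@schorsch.com Mon Dec 18 12:40:31 2000'
negative = 'From schorsch@schorsch.com Mon Dec 18 12:40:31 2000 -0500'

old_plain = re.match(OLD, plain) is not None
old_negative = re.match(OLD, negative) is not None
new_plain = re.match(NEW, plain) is not None
new_negative = re.match(NEW, negative) is not None

print('old pattern:', old_plain, old_negative)
print('new pattern:', new_plain, new_negative)
```

The old pattern should accept only the first line, while the
new one should accept both.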

On second thought, wouldn't it be even better to use
rfc822.parsedate_tz() here as well? I realize this implies
some processing overhead, but I'd prefer robustness over
the last two percent of increased performance.
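
A rough sketch of what that could look like follows. It is not
taken from Mailbox.py: the function name is_from_line is
hypothetical, and it uses email.utils.parsedate_tz as a stand-in,
since the rfc822 module no longer exists in Python 3. Note that
this check is more lenient than the regular expression, so it
might accept lines the pattern would reject.

```python
from email.utils import parsedate_tz


def is_from_line(line):
    """Return True if line looks like a Unix mbox "From " separator.

    Hypothetical helper: the date part is handed to parsedate_tz()
    instead of being matched field by field with a regular expression.
    """
    if not line.startswith('From '):
        return False
    # Expected layout: "From", envelope sender, then the date portion.
    parts = line.split(None, 2)
    if len(parts) < 3:
        return False
    # parsedate_tz() returns None when it cannot parse the date.
    return parsedate_tz(parts[2]) is not None


print(is_from_line('From schorsch@schorsch.com Mon Dec 18 12:40:31 2000 -0500'))
print(is_from_line('Subject: no separator here'))
```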


Have fun!

-schorsch

-- 
Georg Mischler  --  simulations developer  --  schorsch at schorsch.com
+schorsch.com+  --  lighting design tools  --  http://www.schorsch.com/