[Spambayes] Re: Mailbox class in the spambayes project & python2.2.1

Greg Ward gward@python.net
Thu, 26 Sep 2002 16:50:19 -0400


On 26 September 2002, Alexander Leidinger said:
> Shouldn't I first fix the From lines (those which need to have a '>' in
> front of them)?

It depends how smart your mbox parser is.  Neither of the classes
supplied by mailbox.py is smart enough -- UnixMailbox is too strict (it
doesn't like the goofy "hh: m:ss" time format), and PortableUnixMailbox
is too loose (it assumes every occurrence of "\nFrom " is a message
delimiter).
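
To make the strict/loose trade-off concrete, here is a rough sketch using
nothing but the re module.  STRICT borrows the date-aware pattern from the
class further down in this message as a stand-in for a strict parser, and
LOOSE is an assumed approximation of "any line starting with 'From '";
neither is copied from mailbox.py, and classify() is a made-up helper.

```python
import re

# Stand-in for a strict parser: requires a well-formed date on the line.
STRICT = re.compile(r'From .*?(\S+)\s+'
                    r'\w\w\w\s+\w\w\w\s+\d?\d\s+'
                    r'\d?\d:\d\d(:\d\d)?(\s+[^\s]+)?\s+\d\d\d\d\s*$')

# Stand-in for a loose parser: any line that begins with "From ".
LOOSE = re.compile(r'From ')


def classify(line):
    """Return (strict_says_delimiter, loose_says_delimiter) for one line."""
    return (bool(STRICT.match(line)), bool(LOOSE.match(line)))


if __name__ == '__main__':
    samples = [
        "From alice@example.com Thu Sep 26 16:50:19 2002\n",  # real delimiter
        "From alice@example.com Thu Sep 26 16: 5:19 2002\n",  # goofy time
        "From what I can tell, this is body text\n",          # unquoted body
    ]
    for line in samples:
        print(classify(line), repr(line))
```

The second sample is a genuine delimiter that the strict pattern rejects,
and the third is body text that the loose pattern accepts, which is the
whole problem in two lines.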

The good thing about mailbox.py is that it's really easy to define your
own message delimiter; from a script I was messing with the other day to
deal with a concatenation of messages that can hardly be called an "mbox
file":

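# UnixMailbox is in the mailbox module that ships with Python 2.
from mailbox import UnixMailbox

class MyUnixMailbox(UnixMailbox):
    # Regex that decides whether a line starting with "From " is a real
    # message delimiter; override it to suit your archive.
    _fromlinepattern = (r'From .*?(\S+)\s+'
                        r'\w\w\w\s+\w\w\w\s+\d?\d\s+'
                        r'\d?\d:\d\d(:\d\d)?(\s+[^\s]+)?\s+\d\d\d\d\s*$')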

If you can come up with a regex to match the real "From " lines in
your mail archive -- in particular the wonky "hh: m:ss" time format --
then drop it in there and away-hey-hey you go.
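
As a starting point, here is one possible way to relax the time portion so
that both "16:50:19" and "16: 5:19" get through.  It is only a sketch: the
exact shape of the wonky timestamps in your archive is an assumption on my
part, and TOLERANT_FROM and is_from_line() are names invented for this
example.  Test it against your own data before trusting it.

```python
import re

# Same overall shape as the pattern above, but the time field allows
# optional whitespace after each colon and one- or two-digit fields.
TOLERANT_FROM = (r'From .*?(\S+)\s+'
                 r'\w\w\w\s+\w\w\w\s+\d?\d\s+'
                 r'\d?\d:\s*\d?\d(:\s*\d?\d)?(\s+[^\s]+)?\s+\d\d\d\d\s*$')

_tolerant_re = re.compile(TOLERANT_FROM)


def is_from_line(line):
    """Return True if the line looks like a real "From " delimiter."""
    return _tolerant_re.match(line) is not None


if __name__ == '__main__':
    for line in ("From alice@example.com Thu Sep 26 16:50:19 2002\n",
                 "From alice@example.com Thu Sep 26 16: 5:19 2002\n",
                 "From what I can tell, this is body text\n"):
        print(is_from_line(line), repr(line))
```

If that holds up on your archive, the TOLERANT_FROM string is what would go
in as _fromlinepattern in a subclass like the one above.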

        Greg
-- 
Greg Ward <gward@python.net>                         http://www.gerg.ca/
And I wonder ... will Elvis take the place of Jesus in a thousand years?
    -- Dead Kennedys