[Tutor] Mailbox.UnixMailbox vs Mailbox.PortableUnixMailbox

Michael Janssen Janssen@rz.uni-frankfurt.de
Mon Mar 17 15:11:02 2003


On Mon, 17 Mar 2003, Danny Yoo wrote:

> There's a reason: PortableUnixMailbox was much too permissive in what it
> considered to be "boundaries" between emails in a mail file, and broke
> when I tried feeding it the total wisdom of the Tutor mailing archive.
>
>
> A mailbox file consists of all the emails, concatenated to each other end
> by end.  How does the system distinguish where one message begins and
> another ends?  One way is to look for anything that begins with,
>
>     From: ...
>
> and treat that as the start of a new message.  This is probably the
> strategy that UnixMailbox takes (although I think it does a few more
> checks to see that it's really seeing the start of an email header).
> PortableUnixMailbox is a little looser: it looks for anything like
>
>     From ...

In a Unix mailbox, every message starts with the "unixfrom" line. Example:

    From janssen@rz.uni-frankfurt.de Mon Mar 17 20:41:35 2003

Escaping any "From[space]" at the start of a line in the mail body is left
to the MTAs: "From me" should be turned into "From=20me". The Tutor archive
isn't fed by such a clever program, it seems ;-)
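
A minimal sketch of that escaping step, assuming the body is already
quoted-printable encoded (otherwise "=20" would not decode back to a space).
The name escape_from_lines is made up for illustration and is not taken from
any real MTA:

```python
def escape_from_lines(body):
    """Rewrite body lines starting with "From " so that they cannot be
    mistaken for a unixfrom line. Assumes quoted-printable encoding."""
    escaped = []
    for line in body.split("\n"):
        if line.startswith("From "):
            # "=20" is the quoted-printable spelling of a space
            line = "From=20" + line[len("From "):]
        escaped.append(line)
    return "\n".join(escaped)


body = "Hello,\nFrom me to you: a test.\nFrom: is a header and stays as it is."
print(escape_from_lines(body))
```

Only the second line of the sample body changes, because it is the only one
that starts with "From" followed by a space.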

This way you can use a very lazy check for unixfromness.
PortableUnixMailbox indeed checks only for a leading From[space] (never
"From:", which would be the From header).  UnixMailbox additionally checks
whether the line continues with a mail address and a date. If the mail
client adds more information (Pine, for example, adds the timezone to the
date), this test fails and no message is found. UnixMailbox doesn't check
for a double newline (or the start of the file) before the unixfrom, which
would be the proper behaviour.
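
The difference between the two checks can be sketched with plain functions.
This is only an illustration of the two strategies as described above, not
the code from mailbox.py; the function names and sample lines are my own:

```python
import re

# Strict pattern as quoted in this post for UnixMailbox
STRICT = re.compile(r"From \s*[^\s]+\s+\w\w\w\s+\w\w\w\s+\d?\d\s+"
                    r"\d?\d:\d\d(:\d\d)?(\s+[^\s]+)?\s+\d\d\d\d\s*$")


def lazy_is_unixfrom(line):
    # PortableUnixMailbox style: only the leading "From " counts
    return line.startswith("From ")


def strict_is_unixfrom(line):
    # UnixMailbox style: an address and a date have to follow as well
    return STRICT.match(line) is not None


samples = [
    "From janssen@rz.uni-frankfurt.de Mon Mar 17 20:41:35 2003",
    "From: Michael Janssen <janssen@rz.uni-frankfurt.de>",
    "From me to you, with greetings",
]
for line in samples:
    print(lazy_is_unixfrom(line), strict_is_unixfrom(line), line)
```

Both checks accept the first line and reject the "From:" header. Only the
lazy check accepts the third line, which is ordinary body text.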

UnixMailbox provides a nice mechanism for replacing the default regular
expression that tests for unixfromness with your own. Example:

import mailbox
import re

class UnixMailboxWithAdditionalData(mailbox.UnixMailbox):
    _regexp = re.compile(r"From \s*[^\s]+\s+\w\w\w\s+\w\w\w\s+\d?\d\s+"
                         r"\d?\d:\d\d(:\d\d)?(\s+[^\s]+)?\s+\d\d\d\d\s*")

instead of r"From \s*[^\s]+\s+\w\w\w\s+\w\w\w\s+\d?\d\s+" \
           r"\d?\d:\d\d(:\d\d)?(\s+[^\s]+)?\s+\d\d\d\d\s*$"

Note the missing "$" at the end of the regexp: without it, the mail client
may append additional information after the year.
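
To see what dropping the "$" changes, here is a small stand-alone comparison
using only the re module. The " +0100" suffix is an assumed example of extra
data; what a given mail client really appends may look different:

```python
import re

PREFIX = (r"From \s*[^\s]+\s+\w\w\w\s+\w\w\w\s+\d?\d\s+"
          r"\d?\d:\d\d(:\d\d)?(\s+[^\s]+)?\s+\d\d\d\d\s*")

strict = re.compile(PREFIX + "$")   # default: nothing may follow the year
relaxed = re.compile(PREFIX)        # override: trailing data is tolerated

plain = "From janssen@rz.uni-frankfurt.de Mon Mar 17 20:41:35 2003"
# Assumed example of extra data after the year
extra = plain + " +0100"

for line in (plain, extra):
    print(bool(strict.match(line)), bool(relaxed.match(line)), line)
```

Both patterns match the plain line; only the relaxed one still matches once
something follows the year.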

All this is mentioned in the mailbox.py file (in a comment, not a docstring).

It also follows that one can trick UnixMailbox by putting a line like
 "From janssen@rz.uni-frankfurt.de Mon Mar 17 20:41:35 2003"

into one's mail (oh, I really should put quotes around it ;-)
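
A rough sketch of that trick, again with the pattern alone rather than the
real mailbox classes. The helper count_messages and the sample text are
invented for this illustration:

```python
import re

STRICT = re.compile(r"From \s*[^\s]+\s+\w\w\w\s+\w\w\w\s+\d?\d\s+"
                    r"\d?\d:\d\d(:\d\d)?(\s+[^\s]+)?\s+\d\d\d\d\s*$")


def count_messages(text):
    """Count the lines that the strict pattern accepts as a unixfrom."""
    return sum(1 for line in text.split("\n") if STRICT.match(line))


mbox_text = "\n".join([
    "From janssen@rz.uni-frankfurt.de Mon Mar 17 20:41:35 2003",
    "Subject: one single message",
    "",
    "The next line is body text, not a separator:",
    "From janssen@rz.uni-frankfurt.de Mon Mar 17 20:41:35 2003",
    "so a reader relying on the pattern alone sees two messages.",
])

# Same text, but with the body line indented and quoted
quoted_text = mbox_text.replace("\nFrom ", '\n "From ')

print(count_messages(mbox_text), count_messages(quoted_text))
```

The unquoted body line is counted as the start of a second message, while
the indented and quoted version is not.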

Michael