[issue11728] mbox parser incorrect behaviour

Steffen Daode Nurpmeso report at bugs.python.org
Mon Jun 13 15:56:25 CEST 2011


Steffen Daode Nurpmeso <sdaoden at googlemail.com> added the comment:

Hello Valery Masiutsin, I recently stumbled over this while searching
for the link to the standard that I had stored in another issue.
(Without being logged in, that is.)
The de facto standard (http://qmail.org/man/man5/mbox.html) says:

HOW A MESSAGE IS READ
          A reader scans through an mbox file looking for From_ lines.
          Any From_ line marks the beginning of a message.  The reader
          should not attempt to take advantage of the fact that every
          From_ line (past the beginning of the file) is preceded by a
          blank line.
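
A minimal sketch of the reading rule quoted above, assuming the whole
mbox is already in memory as a str.  The name split_mbox is made up for
illustration and is not part of Python's mailbox module; the point is
only that the split is driven by From_ lines and never looks at the
preceding blank line:

```python
def split_mbox(text):
    """Split mbox text into messages at every line starting with 'From '.

    Hypothetical helper: it deliberately ignores whether a blank line
    precedes the From_ line, as the quoted man page recommends.
    """
    messages = []
    current = None  # lines of the message being collected, if any
    for line in text.splitlines(keepends=True):
        if line.startswith("From "):
            # A From_ line always starts a new message.
            if current is not None:
                messages.append("".join(current))
            current = [line]
        elif current is not None:
            current.append(line)
        # Lines before the first From_ line are dropped.
    if current is not None:
        messages.append("".join(current))
    return messages
```

Note that such a reader only works if every body line starting with
"From " has been quoted by the writer.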

This is, however, the recent version.  The "mbox" manpage of my up-to-date
Mac OS X 10.6.7, which dates from 2002, does not state this, for example.
However, all known MBOX standards, i.e. MBOXO, MBOXRD, MBOXCL, require
proper quoting of non-From_ "From " lines (by preceding them with '>').
So your example should not fail in Python.
(But hey - are you sure *that* has been produced by Perl?)

You're right, however, that Python seems to support only the old MBOXO
way of un-escaping, which converts only plain "From " to/from ">From ".
That is not even mentioned anymore in the current standard, which only
describes MBOXRD ("(>*From )" -> ">"+match.group(1)).
(Lucky me: I own Mac OS X, otherwise I wouldn't even know.)
Thus you're in trouble if the unescaping is performed before the split.
This is another issue, though: "MBOX parser uses MBOXO algorithm".
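
To make the difference concrete, here is a rough sketch of both quoting
schemes applied to a message body held in a str.  The function names are
invented for this example and this is not the code of Python's mailbox
module; it only illustrates the two rules as described above:

```python
import re

# MBOXRD: any line made of zero or more '>' followed by "From ".
_RD_QUOTE = re.compile(r"^(>*From )", re.MULTILINE)
_RD_UNQUOTE = re.compile(r"^>(>*From )", re.MULTILINE)


def mboxo_quote(body):
    """MBOXO: quote only plain "From " lines."""
    return re.sub(r"^From ", ">From ", body, flags=re.MULTILINE)


def mboxo_unquote(body):
    """MBOXO: turn ">From " back into "From " (lossy, see below)."""
    return re.sub(r"^>From ", "From ", body, flags=re.MULTILINE)


def mboxrd_quote(body):
    """MBOXRD: add one '>' to every "(>*From )" line."""
    return _RD_QUOTE.sub(r">\1", body)


def mboxrd_unquote(body):
    """MBOXRD: strip exactly one '>' from every ">(>*From )" line."""
    return _RD_UNQUOTE.sub(r"\1", body)
```

With a body that already contains a ">From " line, the MBOXRD pair
round-trips, while the MBOXO pair cannot tell the original ">From " from
one it added itself and so returns a different body.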

;> - Ciao, Steffen

----------
nosy: +sdaoden

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue11728>
_______________________________________

