[Patches] mailbox.UnixMailbox and rfc822.Message.unixfrom

Guido van Rossum guido@python.org
Fri, 31 Mar 2000 16:01:38 -0500


> I actually meant 'too many users of rfc822.Message.unixfrom to bluntly
> remove it'. My main problem with this (mis)feature of the mailbox module is
> that it *looks* like it can handle all mailboxes, and it does its best, but
> it fails in the Unixmailbox case. Were one to write a tool to read, modify
> and write mailboxes, one would have to use a rewritten mailbox module, or
> special-case the unixmailbox and write a separate module for that.

Frankly, I don't see any support for *writing* mailboxes anywhere in
the mailbox module.  All it does is allow you to *read* them.
Recreating a mailbox given its messages would require a lot of code
that is not part of the current module.

> I'm not arguing the sensibility of the unixfrom line here. We use qmail's
> maildir format as much as possible, for a large number of reasons (though we
> don't actually use qmail) and I much prefer it over unixmailboxes. But the
> unixmailbox format is the most widespread mailbox format -- all Unix
> mailclients read it, most write it by default, as do most MDAs (not only
> sendmail) and Netscape and Eudora use it too, for instance.
> 
> But the fact remains that you can't use the mailbox module the same way for
> different mailboxes -- the other single-file mailbox formats use fixed
> end/start message indicators, that don't contain any other information. The
> Unixmailbox format does, and it's necessary to keep that data if you want to
> re-write the message somewhere. And there is no way to retrieve the unixfrom
> line *at all*, except by not using this module.
> 
> > > > Why do you need to access the Unix from header through the mailbox
> > > > module?
> > > 
> > > I thought Mailman needed it.
> 
> > [But it doesn't]
> 
> The only reason (as far as I can see) that it doesn't need it is because
> programmers have always worked around this problem. Mailman's archiver
> creates two sets of archives for each maillist, an HTML version and a unix
> mailbox (the 'downloadable' version). I don't know if you ever tried to use
> the unix mailbox format, but some people have, and come away with a nasty
> surprise: It's not a valid mailbox. Why? Because the unixfrom line has to
> be kludged, and the kludge uses the wrong date format, in addition to the
> wrong From address and the wrong time. (I posted a fix for the kludge in
> the meantime, by the way.)
> 
> (pipermail/HyperArch *do* use the mailbox module's UnixMailbox, you see.)
> 
> And yes, I know this can all be fixed in Mailman (and if it won't be fixed in
> the python library, I'll post the patch there myself ;) but I still see it
> as a problem in the python mailbox module. If the mailbox module had done it
> right from the start, the kludge wouldn't even *be* there.
> 
> > The mailbox module supports several other mailbox formats which don't
> > have a "Unix From line" either.  It is intended to provide a uniform
> > API for reading mailboxes.  Features specific to mailbox formats don't
> > belong in this API.
> 
> If it forgets about unixfrom, it's not an API for reading mailboxes, it's an
> API for reading individual *messages*. It loses any and all information
> about the mailbox, and that's what I disagree with... It's not useful for
> real mailbox-editing. 

"It's not useful" is a bit of an exaggeration, don't you think?  You
seem to be fixed on the idea that the Unix header contains useful
information.  I have been living without this information for years
and have never missed it...

> [ about the coding style of the fix ]
> 
> > Well, since _search_end() is always called after a successful
> > _search_start(), you could have solved the problem locally in
> > _search_end().
> 
> Ah, well, I was worried it might screw up if someone else seek()ed to the
> start of a message, and then called next(), but I now see it can't happen...
> Not because next() seeks back to seekp, but because the seek() method on the
> mailbox class is nonfunctional :)
> 
> Okay, if I write the patch like that (and maybe remove and/or fix the seek()
> method on the mailbox class?) does it stand a chance of being accepted? Or
> should I write a mailbox-handling module instead? :-)

Here's what I suggest for a patch.  It works for me.  If it works for
you, I'll check it in.

Index: mailbox.py
===================================================================
RCS file: /projects/cvsroot/python/dist/src/Lib/mailbox.py,v
retrieving revision 1.18
diff -c -r1.18 mailbox.py
*** mailbox.py	2000/02/10 17:17:13	1.18
--- mailbox.py	2000/03/31 20:53:03
***************
*** 97,109 ****
--- 97,112 ----
  
          def _search_start(self):
                  while 1:
+ 			pos = self.fp.tell()
                          line = self.fp.readline()
                          if not line:
                                  raise EOFError
                          if line[:5] == 'From ' and self._isrealfromline(line):
+ 				self.fp.seek(pos)
                                  return
  
          def _search_end(self):
+ 		self.fp.readline()	# Throw away header line
                  while 1:
                          pos = self.fp.tell()
                          line = self.fp.readline()
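
For anyone who wants to try the idea outside the library, here is a minimal,
self-contained sketch of what the patch above does: _search_start() rewinds to
the beginning of the "From " line so that the line stays part of the message,
and _search_end() first skips that line before looking for the next one. The
class name UnixMailboxSketch, the helpers next_raw() and split_messages(), and
the simplified _is_from_line() check are all made up for illustration; this is
not the real mailbox.UnixMailbox, and the sample data is invented.

```python
import io


class UnixMailboxSketch:
    """Hypothetical stand-in for mailbox.UnixMailbox, for illustration only."""

    def __init__(self, fp):
        self.fp = fp  # binary file object positioned at the start of the mailbox

    def _is_from_line(self, line):
        # Assumption: a simplified test. The real module also checks that the
        # rest of the line looks like a genuine "From " separator.
        return line.startswith(b"From ")

    def _search_start(self):
        while True:
            pos = self.fp.tell()
            line = self.fp.readline()
            if not line:
                raise EOFError
            if self._is_from_line(line):
                self.fp.seek(pos)  # rewind: keep the "From " line in the message
                return

    def _search_end(self):
        self.fp.readline()  # throw away this message's own "From " line
        while True:
            pos = self.fp.tell()
            line = self.fp.readline()
            if not line:
                return  # end of file ends the last message
            if self._is_from_line(line):
                self.fp.seek(pos)  # leave the next "From " line unread
                return

    def next_raw(self):
        """Return the next message as bytes, "From " line included, or None."""
        try:
            self._search_start()
        except EOFError:
            return None
        start = self.fp.tell()
        self._search_end()
        stop = self.fp.tell()
        self.fp.seek(start)
        return self.fp.read(stop - start)


def split_messages(data):
    """Split raw mailbox bytes into a list of raw messages."""
    box = UnixMailboxSketch(io.BytesIO(data))
    messages = []
    while True:
        msg = box.next_raw()
        if msg is None:
            return messages
        messages.append(msg)


# Invented sample mailbox with two messages.
SAMPLE = (
    b"From alice@example.com Fri Mar 31 16:01:38 2000\n"
    b"Subject: one\n"
    b"\n"
    b"first body\n"
    b"\n"
    b"From bob@example.com Fri Mar 31 16:05:00 2000\n"
    b"Subject: two\n"
    b"\n"
    b"second body\n"
)

if __name__ == "__main__":
    for raw in split_messages(SAMPLE):
        print(raw.splitlines()[0].decode("ascii"))
```

Because each raw message now begins with its own "From " line, concatenating
the pieces gives back the original mailbox bytes, which is the property a
read-modify-write tool would need.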

--Guido van Rossum (home page: http://www.python.org/~guido/)