[Spambayes-checkins] spambayes timtoken.py,1.6,1.7

Guido van Rossum guido@python.org
Sat, 07 Sep 2002 01:35:37 -0400


> > Made tokenize() polymorphic.  It now accepts an email.Message.Message
> > instance, a file-like object (something with a readline method), or a
> > string (anything else).
> 
> Good change.  One question/concern:
> 
> > ! def tokenize(obj):
> >       # Create an email Message object.
> > !     if isinstance(obj, email.Message.Message):
> > !         msg = obj
> > !     elif hasattr(obj, "readline"):
> > !         msg = email.message_from_file(obj)
> > !     else:
> > !         try:
> > !             msg = email.message_from_string(obj)
> > !         except email.Errors.MessageParseError:
> > !             yield 'control: MessageParseError'
> > !             # XXX Fall back to the raw body text?
> > !             return
> >
> >       # Special tagging of header lines.
> 
> It's a fact of life that some messages can't be parsed by the email package,
> and the code was careful to catch that when parsing from a string.  I don't
> see anything here to protect the system from dying if a message can't be
> parsed from file.  Barry, when would MessageParseError get raised then?  At
> the time message_from_file() is called (in which case fixing the above is
> easy), or at some later time when trying to invoke some method of the
> Message object (in which case I'm not sure what to do)?

I'm guessing at the time that message_from_file() is called;
message_from_string() is a thin layer on top of that using StringIO,
so if the except clause above works for message_from_string(), the
same clause should work for message_from_file().  I'll add the same
try/except around the message_from_file() call.
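
Roughly, the guarded version could look like the sketch below.  This is
only an illustration, not the checked-in code: parse_message() is a
hypothetical helper name, and it uses the current module spellings
(email.errors, email.message) rather than the email.Errors and
email.Message names in the diff above.  It also assumes the parse error,
if any, is raised at parse time rather than later.

```python
import email
import email.errors
import email.message
import io


def parse_message(obj):
    """Return a Message built from obj, or None if parsing fails.

    Hypothetical helper: obj may be a Message instance, a file-like
    object (anything with a readline method), or a string.
    """
    if isinstance(obj, email.message.Message):
        # Already parsed; nothing to do.
        return obj
    try:
        if hasattr(obj, "readline"):
            # File-like object: same guard as the string case.
            return email.message_from_file(obj)
        # Anything else is treated as a string.
        return email.message_from_string(obj)
    except email.errors.MessageParseError:
        # Caller decides what to do, e.g. yield a control token.
        return None


# Usage: both input kinds go through the same except clause.
msg_from_file = parse_message(io.StringIO("Subject: hello\n\nbody\n"))
msg_from_str = parse_message("Subject: hello\n\nbody\n")
```

Putting both calls under one try keeps the file and string paths
symmetric, so tokenize() would only need a single check for None before
emitting 'control: MessageParseError'.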

--Guido van Rossum (home page: http://www.python.org/~guido/)