[Spambayes-checkins] spambayes timtoken.py,1.6,1.7
Guido van Rossum
guido@python.org
Sat, 07 Sep 2002 01:35:37 -0400
> > Made tokenize() polymorphic. It now accepts an email.Message.Message
> > instance, a file-like object (something with a readline method), or a
> > string (anything else).
>
> Good change. One question/concern:
>
> > ! def tokenize(obj):
> >       # Create an email Message object.
> > !     if isinstance(obj, email.Message.Message):
> > !         msg = obj
> > !     elif hasattr(obj, "readline"):
> > !         msg = email.message_from_file(obj)
> > !     else:
> > !         try:
> > !             msg = email.message_from_string(obj)
> > !         except email.Errors.MessageParseError:
> > !             yield 'control: MessageParseError'
> > !             # XXX Fall back to the raw body text?
> > !             return
> >
> >       # Special tagging of header lines.
>
> It's a fact of life that some messages can't be parsed by the email package,
> and the code was careful to catch that when parsing from a string. I don't
> see anything here to protect the system from dying if a message can't be
> parsed from file. Barry, when would MessageParseError get raised then? At
> the time message_from_file() is called (in which case fixing the above is
> easy), or at some later time when trying to invoke some method of the
> Message object (in which case I'm not sure what to do)?
I'm guessing at the time that message_from_file() is called;
message_from_string() is a thin layer on top of that using StringIO,
so if the above code works for message_from_string(), it should work
for message_from_file(). I'll add it.
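Following that reasoning, one way to protect both parse paths is to move the try/except so it wraps the file branch as well as the string branch. The sketch below is a minimal, hypothetical Python 3 rendering, not the checked-in code. It assumes the modern module names (`email.message`, `email.errors`) in place of the 2002-era `email.Message` and `email.Errors`, and it emits header names as tokens purely as a stand-in for the real tokenizer. Note also that Python 3's default parser is lenient and usually records malformed input in `msg.defects` instead of raising.

```python
import email
import email.errors
import email.message
import io


def tokenize(obj):
    """Hypothetical sketch: parse obj into a Message, guarding both parse paths."""
    if isinstance(obj, email.message.Message):
        # Already parsed: use it directly.
        msg = obj
    else:
        try:
            if hasattr(obj, "readline"):
                # File-like object: parse it directly.
                msg = email.message_from_file(obj)
            else:
                # Anything else is treated as a string; this goes through
                # the same parser, fed from an in-memory StringIO.
                msg = email.message_from_string(obj)
        except email.errors.MessageParseError:
            # Same fallback for both paths: emit a control token and stop.
            yield 'control: MessageParseError'
            return

    # Stand-in for the real tokenizing (hypothetical): one token per header name.
    for name in msg.keys():
        yield 'header:' + name.lower()


if __name__ == "__main__":
    # Usage: the same message given as a string and as a file-like object.
    raw = "Subject: hi\nFrom: a@b.c\n\nbody\n"
    print(list(tokenize(raw)))
    print(list(tokenize(io.StringIO(raw))))
```

Because the try/except now sits outside the string-versus-file branch, a parse failure on either path yields the same control token instead of propagating out of the generator.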
--Guido van Rossum (home page: http://www.python.org/~guido/)