[Email-SIG] Maybe a bug, maybe not
Alexandre Ratti
alex at gabuzomeu.net
Mon May 3 15:45:55 EDT 2004
[Resent because first message was bounced by the email-sig list.]
Hi Eric,
[Eric S. Johansson wrote]
> found a very common form of spam that triggers an exception. don't know
> if you considered a bug or not. I've enclosed a sample message and a
> very simple program to trigger the bug. From my limited understanding,
> the payload type is correct but somehow it is dispatched to the wrong
> handler. When I was writing the test program, I also copied some of the
> generator code so I could see what method was being requested etc. then
> I ran into limits of my knowledge and time
[http://mail.python.org/pipermail/email-sig/2004-May/000101.html]
I also received several junk emails that crash the email package. They
are a pain because they also crash spambayes since it uses this package.
I'm copying the spambayes list since people started reporting this
problem on this list too.
I suspect that the crashes occur because these messages have multipart
boundaries but a text content type header. This causes the
"_handle_text" method of the Generator class (in email/Generator.py) to
be called. This method expects get_payload() to return a string, but
since the message is multipart, get_payload() returns a list of parts
instead.
This seems similar to a known issue:
http://sourceforge.net/tracker/index.php?func=detail&aid=846938&group_id=5470&atid=105470
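The mismatch I suspect (a text content type on a message whose payload
is a list of parts) can be mimicked by building a message by hand. This
is only a sketch: the helper name has_text_multipart_mismatch and the
"XXXX" boundary are made up for illustration, and it uses the current
email.message API rather than the 2.5.4 package discussed here.

```python
from email.message import Message


def has_text_multipart_mismatch(msg):
    """Return True when the header says text/* but the payload is a list of parts.

    Hypothetical helper, not part of the email package.
    """
    return msg.get_content_maintype() == "text" and msg.is_multipart()


# Hand-built stand-in for the malformed spam: a text/plain header
# that still carries a boundary parameter and holds a sub-part.
outer = Message()
outer["Content-Type"] = 'text/plain; boundary="XXXX"'

inner = Message()
inner.set_payload("spam body")
outer.attach(inner)  # the payload becomes a list, so is_multipart() is True

payload = outer.get_payload()
print(type(payload).__name__, has_text_multipart_mismatch(outer))
```

A handler chosen from the text content type alone would receive this
list where it expects a string.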
I'm not sure at which level of the email package this problem should be
fixed. For now, I applied this simple fix in the Generator.py module:
replace the _handle_text method with this code:
    def _handle_text(self, msg):
        payload = msg.get_payload()
        if payload is None:
            return
        cset = msg.get_charset()
        if cset is not None:
            payload = cset.body_encode(payload)
        if not _isstring(payload):
            # Changed to handle malformed messages with a text base
            # type and a multipart content.
            if type(payload) == type([]) and msg.is_multipart():
                return self._handle_multipart(msg)
            else:
                raise TypeError, 'string payload expected: %s' % type(payload)
        if self._mangle_from_:
            payload = fcre.sub('>From ', payload)
        self._fp.write(payload)
or use this diff (against the 2.5.4 version of the email package):
--- Generator.orig.py Mon May 3 20:41:27 2004
+++ Generator.py Mon May 3 20:43:46 2004
@@ -197,7 +197,12 @@
         if cset is not None:
             payload = cset.body_encode(payload)
         if not _isstring(payload):
-            raise TypeError, 'string payload expected: %s' % type(payload)
+            # Changed to handle malformed messages with a text base
+            # type and a multipart content.
+            if type(payload) == type([]) and msg.is_multipart():
+                return self._handle_multipart(msg)
+            else:
+                raise TypeError, 'string payload expected: %s' % type(payload)
         if self._mangle_from_:
             payload = fcre.sub('>From ', payload)
         self._fp.write(payload)
This change seems to fix the problem. I fed a mailbox with several of
these messages to spambayes and they were parsed OK and flagged as spam
as expected.
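The fallback rule in the fix can also be illustrated outside the
Generator class. The sketch below is not the email package's code: the
function name flatten_body is hypothetical, and it simply joins the
bodies of the sub-parts with newlines instead of writing MIME
boundaries. It only shows the decision being made: use a string payload
as is, recurse into a list payload, and raise TypeError for anything
else.

```python
from email.message import Message


def flatten_body(msg):
    """Return the body text, tolerating a list payload on a text/* message.

    Simplified stand-in for the patched _handle_text logic (assumption:
    joining part bodies with newlines is enough for illustration).
    """
    payload = msg.get_payload()
    if payload is None:
        return ""
    if isinstance(payload, str):
        return payload
    # Malformed case: text base type but multipart content.
    if isinstance(payload, list) and msg.is_multipart():
        return "\n".join(flatten_body(part) for part in payload)
    raise TypeError("string payload expected: %s" % type(payload))


# Hand-built malformed message with two sub-parts.
outer = Message()
outer["Content-Type"] = "text/plain"
for text in ("first", "second"):
    part = Message()
    part.set_payload(text)
    outer.attach(part)

result = flatten_body(outer)
print(result)
```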
Cheers.
Alexandre