[spambayes-bugs] [ spambayes-Bugs-1600821 ] Classifier UnicodeDecodeError on wrong transfer encoding

SourceForge.net noreply at sourceforge.net
Tue Oct 2 12:55:51 CEST 2007


Bugs item #1600821, was opened at 2006-11-21 17:59
Message generated for change (Comment added) made by montanaro
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=1600821&group_id=61702

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: imapfilter
Group: 1.0.1
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Ivan Vilata i Balaguer (ivilata)
Assigned to: Skip Montanaro (montanaro)
Summary: Classifier UnicodeDecodeError on wrong transfer encoding

Initial Comment:
Running ``sb_imapfilter.py`` 1.0.1 seems to raise the following ``UnicodeDecodeError`` while classifying a mail that declares a 7-bit content transfer encoding but contains 8-bit characters::

    Traceback (most recent call last):
      File "/usr/bin/sb_imapfilter.py", line 924, in ?
        run()
      File "/usr/bin/sb_imapfilter.py", line 914, in run
        imap_filter.Filter()
      File "/usr/bin/sb_imapfilter.py", line 785, in Filter
        self.unsure_folder)
      File "/usr/bin/sb_imapfilter.py", line 703, in Filter
        evidence=True)
      File "/usr/lib/python2.4/site-packages/spambayes/classifier.py", line 190, in chi2_spamprob
        clues = self._getclues(wordstream)
      File "/usr/lib/python2.4/site-packages/spambayes/classifier.py", line 496, in _getclues
        clues.sort()
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1: ordinal not in range(128)

I'm attaching the mail which caused this.  I know it is not properly formatted, but it is a legitimate mail produced by a popular MUA (Thunderbird 1.5).  Spam is surely formatted even worse.
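
A quick way to check whether a given mail has this shape (declared 7-bit, but carrying 8-bit bytes) is sketched below. This is only an illustration using the Python 3 standard-library ``email`` package, not SpamBayes code; the function name ``parts_with_undeclared_8bit`` and the sample messages are hypothetical.

```python
import email


def parts_with_undeclared_8bit(raw_message):
    """Return the content types of parts that are declared (or default to)
    7bit but whose body contains bytes >= 0x80.

    Hypothetical helper for illustration; not part of SpamBayes.
    """
    msg = email.message_from_bytes(raw_message)
    offending = []
    for part in msg.walk():
        if part.is_multipart():
            continue
        # A missing Content-Transfer-Encoding header defaults to 7bit.
        cte = str(part.get("Content-Transfer-Encoding", "7bit")).strip().lower()
        if cte != "7bit":
            continue
        payload = part.get_payload(decode=True) or b""
        if any(byte >= 0x80 for byte in payload):
            offending.append(part.get_content_type())
    return offending


# Made-up sample messages: the first one lies about its encoding.
bad_mail = (b"Content-Type: text/plain\r\n"
            b"Content-Transfer-Encoding: 7bit\r\n\r\ncaf\xe9\r\n")
good_mail = (b"Content-Type: text/plain\r\n"
             b"Content-Transfer-Encoding: 7bit\r\n\r\ncafe\r\n")
bad_result = parts_with_undeclared_8bit(bad_mail)
good_result = parts_with_undeclared_8bit(good_mail)
```

Running such a check over a mailbox would at least show how common these messages are before they reach the classifier.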

Someone reported the same problem on the list: http://www.mail-archive.com/spambayes@python.org/msg04543.html

----------------------------------------------------------------------

>Comment By: Skip Montanaro (montanaro)
Date: 2007-10-02 05:55

Message:
Logged In: YES 
user_id=44345
Originator: NO

None of these make the current version of sb_filter.py barf.
I wonder if there's something peculiar about the way the
mail is transmitted via IMAP?  (Just a wild guess.)


----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2007-10-01 19:06

Message:
Logged In: YES 
user_id=44345
Originator: NO

File Added: mailbox

----------------------------------------------------------------------

Comment By: Jesús Cea Avión (jcea)
Date: 2007-10-01 17:46

Message:
Logged In: YES 
user_id=97460
Originator: NO

Three examples sent to skip at pobox.com.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2007-09-25 21:48

Message:
Logged In: YES 
user_id=44345
Originator: NO

jcea,

Do you have an email message I can work with?  If so, zip it and send it
to me as an attachment (skip at pobox.com).

Thx,

Skip


----------------------------------------------------------------------

Comment By: Jesús Cea Avión (jcea)
Date: 2007-09-25 21:02

Message:
Logged In: YES 
user_id=97460
Originator: NO

My version is 1.0.4 and the traceback is:

"""
Traceback (most recent call last):
  File "/usr/local/lib/python2.5/site-packages/Milter/__init__.py", line 203, in <lambda>
    milter.set_eom_callback(lambda ctx: ctx.getpriv().eom())
  File "antispam.py", line 513, in eom
    prob=hammiedb.score(msg)
  File "/usr/local/lib/python2.5/site-packages/spambayes/hammie.py", line 62, in score
    return self._scoremsg(msg, evidence)
  File "/usr/local/lib/python2.5/site-packages/spambayes/hammie.py", line 38, in _scoremsg
    return self.bayes.spamprob(tokenize(msg), evidence)
  File "/usr/local/lib/python2.5/site-packages/spambayes/classifier.py", line 190, in chi2_spamprob
    clues = self._getclues(wordstream)
  File "/usr/local/lib/python2.5/site-packages/spambayes/classifier.py", line 496, in _getclues
    clues.sort()
UnicodeDecodeError: 'ascii' codec can't decode byte 0xfa in position 0: ordinal not in range(128)
"""
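
One plausible reading of this traceback, which the thread does not confirm, is that ``clues.sort()`` ends up comparing a unicode token with an 8-bit byte-string token. In Python 2 such a comparison implicitly decodes the byte string with the ASCII codec. The sketch below reproduces only that decoding step, in Python 3 syntax; ``implicit_ascii_decode`` is a hypothetical name and the token is made up.

```python
def implicit_ascii_decode(token):
    """Mimic the ASCII decode that Python 2 applies implicitly when a
    byte string is compared with a unicode string (assumed mechanism)."""
    return token.decode("ascii")


try:
    implicit_ascii_decode(b"\xfa")  # a made-up 8-bit token
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)
```

The message produced has the same form as the last line of the traceback, which is consistent with this reading but does not prove it.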


----------------------------------------------------------------------

Comment By: Jesús Cea Avión (jcea)
Date: 2007-09-17 10:37

Message:
Logged In: YES 
user_id=97460
Originator: NO

My version is 1.0.4.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2007-09-06 21:11

Message:
Logged In: YES 
user_id=44345
Originator: NO

I ran the submitted email through the current sb_filter.py in Subversion
(probably the same classifier as in 1.1a4).  It worked for me.  While I
don't use the IMAP filter, any of the SpamBayes applications should use the
same classifier code.  I'm not sure this is a problem in the current code. 
What version of SpamBayes are you using?


----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2007-09-05 12:23

Message:
Logged In: YES 
user_id=44345
Originator: NO

Do you have a traceback?  What version of SpamBayes are you using?


----------------------------------------------------------------------

Comment By: Jesús Cea Avión (jcea)
Date: 2007-09-05 09:59

Message:
Logged In: YES 
user_id=97460
Originator: NO

I'm seeing a lot of current spam (more than one message per hour on my
system) crashing spambayes because the messages are marked as "ascii" but
the body is actually 8-bit.

When my milter spam filter crashes, sendmail disables milter filtering for
50 seconds because of the failure (that is my configuration, and I would
rather not touch it), so a lot of spam gets through: about 30-100 spams
every time this bug hits.

Please, increase the priority of this bug a bit... It is hitting. Hard.
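
Until the classifier itself is fixed, a stopgap for a setup like this could be to catch the error around the scoring call so that one bad message does not take the whole filter down. This is a minimal sketch under stated assumptions: ``safe_score``, its ``fallback`` value, and the stand-in scorers are hypothetical, and the real call site would be something like ``hammiedb.score(msg)`` from the traceback in this thread.

```python
def safe_score(score, msg, fallback=0.5):
    """Call score(msg); if it fails with a Unicode error, return a neutral
    fallback probability instead of letting the exception propagate.

    Hypothetical wrapper; 0.5 is an assumed "unsure" value.
    """
    try:
        return score(msg)
    except UnicodeError:  # UnicodeDecodeError is a subclass
        return fallback


# Stand-in scorers for illustration only.
def broken_scorer(msg):
    return b"\xfa".decode("ascii")  # always raises UnicodeDecodeError


def working_scorer(msg):
    return 0.9


fallback_result = safe_score(broken_scorer, "message")
normal_result = safe_score(working_scorer, "message")
```

This hides the symptom rather than fixing the cause: messages that trigger the error are simply treated as unsure.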


----------------------------------------------------------------------
