[spambayes-bugs] [ spambayes-Bugs-901920 ] sb_dbexpimp.py barfs on 0xA3 char
SourceForge.net
noreply at sourceforge.net
Sat Feb 21 18:12:50 EST 2004
Bugs item #901920, was opened at 2004-02-21 23:12
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=901920&group_id=61702
Category: None
Group: Source code 1.0a9 (0.9)
Status: Open
Resolution: None
Priority: 5
Submitted By: Dougie Lawson (dougielawson)
Assigned to: Nobody/Anonymous (nobody)
Summary: sb_dbexpimp.py barfs on 0xA3 char
Initial Comment:
I've got a problem where I can't export my hammie.db
and re-import it with sb_dbexpimp.py.
The script barfs with "UnicodeDecodeError".
jerry:/etc/spambayes # ~/sb_dbexpimp.py -i -d new.db -f hammie.db.export
Importing database new.db using file hammie.db.export
Debug: current word: subject%3A%20%A3
Traceback (most recent call last):
  File "/root/broke.py", line 271, in ?
    runImport(dbFN, useDBM, newDBM, flatFN)
  File "/root/broke.py", line 203, in runImport
    word = uunquote(word)
  File "/root/broke.py", line 116, in uunquote
    return unicode(urllib.unquote(s), 'utf-8')
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa3 in position 9: unexpected code byte
jerry:/etc/spambayes #
I added this code to get the debugging output:
def uunquote(s):
    try:
        return unicode(urllib.unquote(s), 'utf-8')
    except UnicodeDecodeError, e:
        print "Debug: current word: %s\n" % s
        raise
jerry:/etc/spambayes # python -V
Python 2.3.3
0xA3 is the GBP currency symbol (£) in Latin-1 (ISO-8859-1). The web
interface handles it OK.
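For anyone trying to reproduce this, the failure can be sketched outside SpamBayes. The snippet below is an illustration in Python 3 (the report above is on Python 2.3), and tolerant_unquote is a hypothetical helper, not part of sb_dbexpimp.py. It assumes the exported token holds a raw Latin-1 byte: percent-decoding "subject%3A%20%A3" yields a lone 0xA3 at offset 9, which is not valid UTF-8 (UTF-8 encodes £ as 0xC2 0xA3), so a strict UTF-8 decode fails. Falling back to Latin-1 is one possible workaround, not necessarily the right fix for the script.

```python
from urllib.parse import unquote_to_bytes


def tolerant_unquote(s, fallback="latin-1"):
    """Percent-decode s, trying UTF-8 first and a fallback codec second.

    Hypothetical helper for illustration only; the fallback codec is an
    assumption about how the token was originally stored.
    """
    raw = unquote_to_bytes(s)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # A lone 0xA3 is not valid UTF-8, so treat the bytes as Latin-1.
        return raw.decode(fallback)


word = "subject%3A%20%A3"        # the token from the debug output
raw = unquote_to_bytes(word)     # bytes of "subject: " followed by 0xA3

try:
    raw.decode("utf-8")          # strict decode, as uunquote() attempts
except UnicodeDecodeError as exc:
    print("strict decode fails at byte offset", exc.start)

print(tolerant_unquote(word))    # decodes via the Latin-1 fallback
```

A token that was exported as proper UTF-8 ("subject%3A%20%C2%A3") takes the first branch and decodes to the same string, so the fallback only affects tokens that are not valid UTF-8.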
I get these results for a word query on "subject: £"
Statistics for 'subject: £'
Number of spam messages: 0.
Number of ham messages: 2.
Probability that a message containing this word is
spam: 0.091837.
If you need a copy of the exported (100K) file just ask.