[spambayes-bugs] [ spambayes-Bugs-922063 ] Intermittent sb_filter.py failure with URL pickle
SourceForge.net
noreply at sourceforge.net
Fri Oct 19 06:36:37 CEST 2007
Bugs item #922063, was opened at 2004-03-23 17:10
Message generated for change (Comment added) made by david_abrahams
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=922063&group_id=61702
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: hammie
Group: Source code - CVS
Status: Closed
Resolution: Fixed
Priority: 5
Private: No
Submitted By: Leonid (leobru)
Assigned to: Tony Meyer (anadelonbrin)
Summary: Intermittent sb_filter.py failure with URL pickle
Initial Comment:
Here are the relevant .spambayesrc lines:
[Tokenizer]
x-fancy_url_recognition=True
x-pick_apart_urls=True
[URLRetriever]
x-slurp_urls=True
Here is the stack trace:
  File "/usr/home/leob/spambayes-1.0a9/scripts/sb_filter.py", line 239, in ?
    main()
  File "/usr/home/leob/spambayes-1.0a9/scripts/sb_filter.py", line 231, in main
    action(msg)
  File "/usr/home/leob/spambayes-1.0a9/scripts/sb_filter.py", line 163, in filter
    return h.filter(msg)
  File "/usr/home/leob/opt/lib/python2.2/site-packages/spambayes/hammie.py", line 109, in filter
    prob, clues = self._scoremsg(msg, True)
  File "/usr/home/leob/opt/lib/python2.2/site-packages/spambayes/hammie.py", line 38, in _scoremsg
    return self.bayes.spamprob(tokenize(msg), evidence)
  File "/usr/home/leob/opt/lib/python2.2/site-packages/spambayes/classifier.py", line 246, in slurping_spamprob
    slurp_tokens = list(self._generate_slurp())
  File "/usr/home/leob/opt/lib/python2.2/site-packages/spambayes/classifier.py", line 550, in _generate_slurp
    self.setup()
  File "/usr/home/leob/opt/lib/python2.2/site-packages/spambayes/classifier.py", line 609, in setup
    self.bad_urls = pickle.load(b_file)
  File "/usr/home/leob/opt/lib/python2.2/pickle.py", line 982, in load
    return Unpickler(file).load()
  File "/usr/home/leob/opt/lib/python2.2/pickle.py", line 597, in load
    dispatch[key](self)
  File "/usr/home/leob/opt/lib/python2.2/pickle.py", line 667, in load_string
    raise ValueError, "insecure string pickle"
----------------------------------------------------------------------
Comment By: David Abrahams (david_abrahams)
Date: 2007-10-18 23:36
Message:
Logged In: YES
user_id=52572
Originator: NO
I'm seeing the same problem in ImageStripper.py now:
saving 720 items to /home/dave/spambayes/imagecache.pck
Traceback (most recent call last):
  File "/usr/local/bin/sb_filter.py", line 290, in ?
    main()
  File "/usr/local/bin/sb_filter.py", line 281, in main
    action(msg)
  File "/usr/local/bin/sb_filter.py", line 199, in filter
    return self.h.filter(msg)
  File "/usr/local/lib/python2.4/site-packages/spambayes/hammie.py", line 156, in filter
    debug, train)
  File "/usr/local/lib/python2.4/site-packages/spambayes/hammie.py", line 110, in score_and_filter
    prob, clues = self._scoremsg(msg, True)
  File "/usr/local/lib/python2.4/site-packages/spambayes/hammie.py", line 39, in _scoremsg
    return self.bayes.spamprob(tokenize(msg), evidence)
  File "/usr/local/lib/python2.4/site-packages/spambayes/classifier.py", line 196, in chi2_spamprob
    clues = self._getclues(wordstream)
  File "/usr/local/lib/python2.4/site-packages/spambayes/classifier.py", line 498, in _getclues
    for word in Set(wordstream):
  File "/usr/local/lib/python2.4/site-packages/spambayes/tokenizer.py", line 1281, in tokenize
    for tok in self.tokenize_body(msg):
  File "/usr/local/lib/python2.4/site-packages/spambayes/tokenizer.py", line 1640, in tokenize_body
    from spambayes.ImageStripper import crack_images
  File "/usr/local/lib/python2.4/site-packages/spambayes/ImageStripper.py", line 391, in ?
    crack_images = ImageStripper(_cachefile).analyze
  File "/usr/local/lib/python2.4/site-packages/spambayes/ImageStripper.py", line 305, in __init__
    self.cache = pickle.load(open(self.cachefile))
ValueError: insecure string pickle
----------------------------------------------------------------------
Comment By: Tony Meyer (anadelonbrin)
Date: 2004-11-02 20:11
Message:
Logged In: YES
user_id=552329
For the sake of resolving this, I've changed the code anyway:
1. If an error occurs loading the pickle, then a new one is
used - at least the classifier will keep going, and this
shouldn't hurt much (it's only a cache).
2. Saving saves to a temp file first, and then replaces the
old one. This should be completely (*nix) or reasonably
(win32) robust.
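
The two changes described above might look roughly like the following. This is
only a sketch in current Python, not the code that was checked in: the names
load_cache and save_cache are hypothetical, and the cache is assumed to be a
plain dict that is safe to discard.

```python
import os
import pickle
import tempfile


def load_cache(path):
    """Return the pickled cache at `path`, or an empty dict if it cannot be read."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except Exception:
        # A damaged pickle can raise several exception types (EOFError,
        # ValueError, pickle.UnpicklingError, ...). The data is only a
        # cache, so start over with an empty one instead of aborting.
        return {}


def save_cache(cache, path):
    """Write the cache to a temp file first, then replace the old file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the target directory so that the final
    # rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(cache, f)
            f.flush()
            os.fsync(f.fileno())
        # The old file is only replaced once the new one is fully written.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the partial temp file and leave the old cache untouched.
        try:
            os.unlink(tmp_path)
        except OSError:
            pass
        raise
```

With this shape, a process that dies in the middle of a save leaves at worst a
stray temp file behind, and a reader never sees a half-written cache.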
----------------------------------------------------------------------
Comment By: Tony Meyer (anadelonbrin)
Date: 2004-08-03 00:59
Message:
Logged In: YES
user_id=552329
I'm guessing that something went wrong writing the pickle.
(I get an EOFError trying to open the attached pickle). The
slurping code really ought to do what the other code does
and save a copy and then replace the original once the save
succeeds.
I'm reluctant to do this at the moment, though, since it
seems fairly likely that the slurping code will vanish given
that it's only experimental and no-one's spoken up saying
that it does them any good.
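
To illustrate the guess above, a pickle that is cut short while being written
fails when it is loaded again. The snippet below is a small demonstration in
current Python with made-up cache contents; the helper try_load is
hypothetical, and the exact exception type depends on the Python version and
on where the data is cut off.

```python
import pickle

# Made-up stand-in for the URL cache contents.
data = pickle.dumps({"http://example.invalid/": "bad"})


def try_load(blob):
    """Return the unpickled object, or the name of the exception raised."""
    try:
        return pickle.loads(blob)
    except (EOFError, pickle.UnpicklingError) as exc:
        return type(exc).__name__


print(try_load(data))                   # the complete pickle loads normally
print(try_load(data[:-1]))              # final byte missing
print(try_load(data[:len(data) // 2]))  # cut off halfway through
```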
----------------------------------------------------------------------