[spambayes-bugs] [ spambayes-Bugs-922063 ] Intermittent sb_filter.py failure with URL pickle

SourceForge.net noreply at sourceforge.net
Fri Oct 19 06:36:37 CEST 2007


Bugs item #922063, was opened at 2004-03-23 17:10
Message generated for change (Comment added) made by david_abrahams
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=922063&group_id=61702

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: hammie
Group: Source code - CVS
Status: Closed
Resolution: Fixed
Priority: 5
Private: No
Submitted By: Leonid (leobru)
Assigned to: Tony Meyer (anadelonbrin)
Summary: Intermittent sb_filter.py failure with URL pickle

Initial Comment:
Here are the relevant .spambayesrc lines:

[Tokenizer]
x-fancy_url_recognition=True
x-pick_apart_urls=True

[URLRetriever]
x-slurp_urls=True

Here is the stack trace:

  File "/usr/home/leob/spambayes-1.0a9/scripts/sb_filter.py", line 239, in ?
    main()
  File "/usr/home/leob/spambayes-1.0a9/scripts/sb_filter.py", line 231, in main
    action(msg)
  File "/usr/home/leob/spambayes-1.0a9/scripts/sb_filter.py", line 163, in filter
    return h.filter(msg)
  File "/usr/home/leob/opt/lib/python2.2/site-packages/spambayes/hammie.py", line 109, in filter
    prob, clues = self._scoremsg(msg, True)
  File "/usr/home/leob/opt/lib/python2.2/site-packages/spambayes/hammie.py", line 38, in _scoremsg
    return self.bayes.spamprob(tokenize(msg), evidence)
  File "/usr/home/leob/opt/lib/python2.2/site-packages/spambayes/classifier.py", line 246, in slurping_spamprob
    slurp_tokens = list(self._generate_slurp())
  File "/usr/home/leob/opt/lib/python2.2/site-packages/spambayes/classifier.py", line 550, in _generate_slurp
    self.setup()
  File "/usr/home/leob/opt/lib/python2.2/site-packages/spambayes/classifier.py", line 609, in setup
    self.bad_urls = pickle.load(b_file)
  File "/usr/home/leob/opt/lib/python2.2/pickle.py", line 982, in load
    return Unpickler(file).load()
  File "/usr/home/leob/opt/lib/python2.2/pickle.py", line 597, in load
    dispatch[key](self)
  File "/usr/home/leob/opt/lib/python2.2/pickle.py", line 667, in load_string
    raise ValueError, "insecure string pickle"


----------------------------------------------------------------------

Comment By: David Abrahams (david_abrahams)
Date: 2007-10-18 23:36

Message:
Logged In: YES 
user_id=52572
Originator: NO

I'm seeing the same problem in ImageStripper.py now:

saving 720 items to /home/dave/spambayes/imagecache.pck
Traceback (most recent call last):
  File "/usr/local/bin/sb_filter.py", line 290, in ?
    main()
  File "/usr/local/bin/sb_filter.py", line 281, in main
    action(msg)
  File "/usr/local/bin/sb_filter.py", line 199, in filter
    return self.h.filter(msg)
  File "/usr/local/lib/python2.4/site-packages/spambayes/hammie.py", line 156, in filter
    debug, train)
  File "/usr/local/lib/python2.4/site-packages/spambayes/hammie.py", line 110, in score_and_filter
    prob, clues = self._scoremsg(msg, True)
  File "/usr/local/lib/python2.4/site-packages/spambayes/hammie.py", line 39, in _scoremsg
    return self.bayes.spamprob(tokenize(msg), evidence)
  File "/usr/local/lib/python2.4/site-packages/spambayes/classifier.py", line 196, in chi2_spamprob
    clues = self._getclues(wordstream)
  File "/usr/local/lib/python2.4/site-packages/spambayes/classifier.py", line 498, in _getclues
    for word in Set(wordstream):
  File "/usr/local/lib/python2.4/site-packages/spambayes/tokenizer.py", line 1281, in tokenize
    for tok in self.tokenize_body(msg):
  File "/usr/local/lib/python2.4/site-packages/spambayes/tokenizer.py", line 1640, in tokenize_body
    from spambayes.ImageStripper import crack_images
  File "/usr/local/lib/python2.4/site-packages/spambayes/ImageStripper.py", line 391, in ?
    crack_images = ImageStripper(_cachefile).analyze
  File "/usr/local/lib/python2.4/site-packages/spambayes/ImageStripper.py", line 305, in __init__
    self.cache = pickle.load(open(self.cachefile))
ValueError: insecure string pickle


----------------------------------------------------------------------

Comment By: Tony Meyer (anadelonbrin)
Date: 2004-11-02 20:11

Message:
Logged In: YES 
user_id=552329

For the sake of resolving this, I've changed the code anyway:

 1. If an error occurs loading the pickle, then a new one is
used - at least the classifier will keep going, and this
shouldn't hurt much (it's only a cache).

 2. Saving now writes to a temp file first and then replaces the
old one.  This should be completely (*nix) or reasonably
(win32) robust.
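Both changes can be sketched in a few lines of modern Python. This is not the actual SpamBayes patch: `load_cache` and `save_cache` are hypothetical helper names, the empty-dict fallback is an assumption, and `os.replace` (Python 3.3+) stands in for whatever rename logic the 2004 code used on win32.

```python
import os
import pickle
import tempfile


def load_cache(path):
    """Return the pickled cache at *path*, or a fresh empty dict.

    Any failure (missing file, truncated or corrupted pickle) is treated
    as a cache miss, since a cache can always be rebuilt.
    """
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except Exception:
        # Corrupt or unreadable cache: start over instead of crashing.
        return {}


def save_cache(obj, path):
    """Pickle *obj* to a temp file beside *path*, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem as the target so the
    # final rename can replace the old file in one step.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(obj, f, pickle.HIGHEST_PROTOCOL)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes reach the disk
        os.replace(tmp_path, path)
    except BaseException:
        # The old cache is untouched; remove the partial temp file.
        os.unlink(tmp_path)
        raise
```

On POSIX systems `os.replace` maps to `rename(2)`, so a concurrent reader sees either the complete old pickle or the complete new one, never a half-written file. On Windows it overwrites the target with weaker guarantees, which fits the "reasonably (win32) robust" caveat above.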

----------------------------------------------------------------------

Comment By: Tony Meyer (anadelonbrin)
Date: 2004-08-03 00:59

Message:
Logged In: YES 
user_id=552329

I'm guessing that something went wrong writing the pickle. 
(I get an EOFError trying to open the attached pickle).  The
slurping code really ought to do what the other code does
and save a copy and then replace the original once the save
succeeds.
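The diagnosis that the write was interrupted is easy to reproduce in miniature. The sketch below is Python 3 and not SpamBayes code; `load_truncated` is a hypothetical helper. It pickles a dict in text protocol 0, keeps only the first part of the bytes, and tries to load the rest. Python 3 typically reports this as `EOFError` or `pickle.UnpicklingError`. Python 2's unpickler could instead raise `ValueError: insecure string pickle` when a quoted string did not end in a matching quote, which a cut-off line can produce.

```python
import pickle


def load_truncated(obj, keep_fraction=0.5):
    """Pickle *obj*, drop the tail of the bytes, and try to load the rest.

    Returns the name of the exception raised, or None if loading worked.
    """
    data = pickle.dumps(obj, protocol=0)
    truncated = data[: int(len(data) * keep_fraction)]
    try:
        pickle.loads(truncated)
    except (EOFError, pickle.UnpicklingError, ValueError) as exc:
        # The stream ends before the STOP opcode, so loading cannot finish.
        return type(exc).__name__
    return None
```

A full pickle (`keep_fraction=1.0`) loads cleanly. Any strict prefix is missing at least the final STOP opcode, so it always fails, just as a cache file cut short by an interrupted save would.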

I'm reluctant to do this at the moment, though, since it
seems fairly likely that the slurping code will vanish given
that it's only experimental and no-one's spoken up saying
that it does them any good.

----------------------------------------------------------------------
