[spambayes-dev] Re: Pickle vs DB inconsistencies

Meyer, Tony T.A.Meyer at massey.ac.nz
Thu Jun 26 17:48:14 EDT 2003


Ah well, I almost had it right before...

OK, more investigation, prompted by trying to come up with an example for
Tim.

(Note: I wasn't saying that the database package was broken, just that
the _wordinfo* functions in storage.py were.)

I can now get a list of the incorrect words by putting a print statement
in either of two places: it lists either all the words for which
_wordinfodel() is called, or all the words for which the
"del self.changed_words[word]" line in _wordinfoset() does not raise an
exception.

I guess the problem is not what I guessed before (to my credit, I said
that I was unsure, and that I had narrowed it down, which was true ;),
but along the lines of the delete issue that Tim pointed out.  I was
somewhat on the right track...

The problem (I am more sure, but still in the unsure range ;) occurs when
tokens are deleted before they are written to the db.  (A much nicer and
easier-to-solve problem :)

Here's example code:

from spambayes.storage import DBDictClassifier
from spambayes.classifier import WordInfo
d = DBDictClassifier("fail.db")
print "Should not be an entry"
print d._wordinfoget("tok")
w = WordInfo()
w.hamcount = 1
d._wordinfoset("tok", w)
print "Should have a ham count of 1, spam count of 0"
print d._wordinfoget("tok")
w.hamcount -= 1  # not really necessary
d._wordinfodel("tok")
#d.store()  # uncomment this line and it will work
print "Should not be an entry (or have ham and spam of 0)"
print d._wordinfoget("tok")
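
For illustration only, here is a toy model of how a write-back cache could
show this symptom. It is a sketch under assumptions, not the code in
storage.py: the class ToyWriteBackStore and all of its attribute and method
names are hypothetical, and the real cause may differ in detail. The idea is
that a delete which only schedules removal from the db leaves the unwritten
entry in memory until the next store.

```python
class ToyWriteBackStore(object):
    """Hypothetical write-back cache; NOT the real DBDictClassifier."""

    def __init__(self):
        self.db = {}                  # stands in for the on-disk database
        self.cache = {}               # records set since the last store()
        self.pending_deletes = set()  # keys to drop from db at store() time

    def get(self, key):
        # The in-memory copy wins over the copy in the db.
        if key in self.cache:
            return self.cache[key]
        return self.db.get(key)

    def set(self, key, value):
        self.cache[key] = value
        self.pending_deletes.discard(key)

    def delete_buggy(self, key):
        # Assumed bug: only the db removal is scheduled, so an entry that
        # was never written stays visible in the cache until store().
        self.pending_deletes.add(key)

    def delete_fixed(self, key):
        # Drop the in-memory copy as well as scheduling the db removal.
        self.cache.pop(key, None)
        self.pending_deletes.add(key)

    def store(self):
        # Apply scheduled deletions first, then flush what is left.
        for key in self.pending_deletes:
            self.cache.pop(key, None)
            self.db.pop(key, None)
        self.pending_deletes.clear()
        self.db.update(self.cache)
        self.cache.clear()


buggy = ToyWriteBackStore()
buggy.set("tok", 1)
buggy.delete_buggy("tok")
stale = buggy.get("tok")        # entry is still there before store()
buggy.store()
after_store = buggy.get("tok")  # gone once store() has run

fixed = ToyWriteBackStore()
fixed.set("tok", 1)
fixed.delete_fixed("tok")
gone = fixed.get("tok")         # gone immediately, no store() needed
```

In this model the fix is to have the delete remove the in-memory copy too,
rather than relying on store() to tidy up.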

=Tony Meyer


