[spambayes-bugs] [ spambayes-Bugs-777026 ] Possible cause for db
corruption in storage.py/DBDictClassif
SourceForge.net
noreply at sourceforge.net
Thu Jul 24 15:28:37 EDT 2003
Bugs item #777026, was opened at 2003-07-25 04:17
Message generated for change (Settings changed) made by anadelonbrin
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=777026&group_id=61702
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Fionn Behrens (fionn)
>Assigned to: Mark Hammond (mhammond)
Summary: Possible cause for db corruption in storage.py/DBDictClassif
Initial Comment:
DBDictClassifier uses a neat trick to save some memory:
    def _wordinfoset(self, word, record):
        if record and (record.spamcount + record.hamcount <= 1):
            self.db[word] = record.__getstate__()
            # Remove this word from the changed list (not that it should
            # be there, but strange things can happen :)
            try:
                del self.changed_words[word]
            except KeyError:
                pass
Unfortunately, the programmer seems to have overlooked
that a self.wordinfo[word] entry may already exist if
record.spamcount + record.hamcount was previously > 1
and a message has since been untrained.
So if a record is untrained from, e.g., a count of 2
to a count of 1, the wordinfo[word] entry will still
hold a count of 2 while the db[word] entry holds 1.
This can lead to minor miscounts in the spam/ham totals.
To circumvent the problem, the following should be
added to storage.py at line 239 (referring to version
1.0a3, right below the code part you see above):
            try:
                del self.wordinfo[word]
            except KeyError:
                pass
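
For illustration, here is a minimal, self-contained sketch of the
mismatch described above and of the effect of the proposed deletion.
It is not the storage.py code: ToyClassifier, Record, run() and the
apply_fix flag are hypothetical names, plain dicts stand in for the
database, and the sketch assumes the cache holds its own copy of the
counts rather than a shared record object.

```python
class Record:
    """Hypothetical stand-in for a per-word record (spam and ham counts)."""

    def __init__(self, spamcount=0, hamcount=0):
        self.spamcount = spamcount
        self.hamcount = hamcount

    def __getstate__(self):
        return (self.spamcount, self.hamcount)


class ToyClassifier:
    """Toy model: singletons go straight to db, other words are cached."""

    def __init__(self, apply_fix):
        self.db = {}             # stands in for the on-disk database
        self.wordinfo = {}       # in-memory cache (assumed to hold copies)
        self.changed_words = {}  # words waiting to be written out
        self.apply_fix = apply_fix

    def _wordinfoset(self, word, record):
        if record and (record.spamcount + record.hamcount <= 1):
            # Singleton: write directly to the database.
            self.db[word] = record.__getstate__()
            self.changed_words.pop(word, None)
            if self.apply_fix:
                # Proposed addition: drop any stale cached entry.
                self.wordinfo.pop(word, None)
        else:
            self.wordinfo[word] = record.__getstate__()
            self.changed_words[word] = True


def run(apply_fix):
    """Train a word to a count of 2, then untrain it to a count of 1."""
    c = ToyClassifier(apply_fix)
    c._wordinfoset("word", Record(spamcount=2))  # count 2: cached
    c._wordinfoset("word", Record(spamcount=1))  # count 1: singleton
    return c.wordinfo.get("word"), c.db.get("word")
```

In this sketch run(False) returns ((2, 0), (1, 0)), i.e. the cache and
the database disagree, while run(True) returns (None, (1, 0)), so only
the database entry remains.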
----------------------------------------------------------------------