[spambayes-bugs] [ spambayes-Bugs-777026 ] Possible cause for db corruption in storage.py/DBDictClassif

SourceForge.net noreply at sourceforge.net
Thu Jul 24 15:28:37 EDT 2003


Bugs item #777026, was opened at 2003-07-25 04:17
Message generated for change (Settings changed) made by anadelonbrin
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=777026&group_id=61702

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Fionn Behrens (fionn)
>Assigned to: Mark Hammond (mhammond)
Summary: Possible cause for db corruption in storage.py/DBDictClassif

Initial Comment:

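DBDictClassifier uses a neat trick to save some memory: a record
whose combined spam and ham count is at most 1 is written straight
to the database (self.db):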

    def _wordinfoset(self, word, record):
        if record and (record.spamcount + record.hamcount <= 1):
            self.db[word] = record.__getstate__()
            # Remove this word from the changed list (not that it should be
            # there, but strange things can happen :)
            try:
                del self.changed_words[word]
            except KeyError:
                pass

Unfortunately, the programmer seems to have overlooked
that a self.wordinfo[word] entry may already exist if
record.spamcount + record.hamcount was greater than 1
at some earlier point and a message has since been
untrained. For example, if a record is untrained from a
count of 2 down to 1, wordinfo[word] will still hold 2
while the db[word] entry holds 1. This can lead to
minor miscounts in the spam/ham counts.
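The scenario above can be reproduced with a minimal, hypothetical sketch. This is not SpamBayes code: ToyDBDictStore, learn() and the plain integer counts are invented for illustration. It assumes that lookups consult the wordinfo cache before the database, and that counts above 1 are kept in wordinfo (that else branch is an assumption; only the low-count branch is quoted above). Because plain integers are stored, the cache and the database hold independent values, which is the situation the report describes.

```python
class ToyDBDictStore:
    """Hypothetical two-tier store: an in-memory cache over a dict 'db'."""

    def __init__(self):
        self.db = {}             # persistent tier (stands in for the db file)
        self.wordinfo = {}       # in-memory cache of counts
        self.changed_words = {}  # cached words pending write-back

    def _wordinfoget(self, word):
        # ASSUMPTION: the cache is checked before the database.
        if word in self.wordinfo:
            return self.wordinfo[word]
        return self.db.get(word)

    def _wordinfoset(self, word, count):
        if count <= 1:
            # Mirrors the quoted branch: low counts go straight to the db.
            self.db[word] = count
            self.changed_words.pop(word, None)
            # As reported: a stale self.wordinfo[word] is left behind here.
        else:
            # ASSUMPTION: higher counts are kept in the cache.
            self.wordinfo[word] = count
            self.changed_words[word] = True

    def learn(self, word, delta):
        # delta=+1 trains a message containing 'word', delta=-1 untrains it.
        self._wordinfoset(word, (self._wordinfoget(word) or 0) + delta)


store = ToyDBDictStore()
store.learn("free", +1)   # count 1 -> written straight to db
store.learn("free", +1)   # count 2 -> cached in wordinfo
store.learn("free", -1)   # count 1 -> db updated, cache still holds 2
print(store.db["free"], store.wordinfo["free"])   # 1 2
print(store._wordinfoget("free"))                 # 2 (stale)
```

Training twice and untraining once leaves db["free"] at 1 while the cached copy still reads 2, and in this model that stale value is what subsequent lookups return.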

To fix the problem, the following should be added to
storage.py at line 239 (as of version 1.0a3), directly
below the code shown above:

            # Also discard any stale cached copy of this record.
            try:
                del self.wordinfo[word]
            except KeyError:
                pass
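Applied to a hypothetical toy model (again not SpamBayes code; ToyDBDictStore, learn() and the cache-first lookup are illustrative assumptions), the proposed fix amounts to evicting the cached entry whenever a record is written directly to the database. Here dict.pop(word, None) stands in for the try/del/except KeyError form above; the two are equivalent.

```python
class ToyDBDictStore:
    """Hypothetical two-tier store (not the SpamBayes API) with the fix."""

    def __init__(self):
        self.db = {}             # persistent tier
        self.wordinfo = {}       # in-memory cache
        self.changed_words = {}  # cached words pending write-back

    def _wordinfoget(self, word):
        # ASSUMPTION: the cache is checked before the database.
        if word in self.wordinfo:
            return self.wordinfo[word]
        return self.db.get(word)

    def _wordinfoset(self, word, count):
        if count <= 1:
            self.db[word] = count
            self.changed_words.pop(word, None)
            # The proposed fix: evict any cached copy so that reads fall
            # through to the freshly written db value.
            self.wordinfo.pop(word, None)
        else:
            # ASSUMPTION: higher counts are kept in the cache.
            self.wordinfo[word] = count
            self.changed_words[word] = True

    def learn(self, word, delta):
        # delta=+1 trains a message containing 'word', delta=-1 untrains it.
        self._wordinfoset(word, (self._wordinfoget(word) or 0) + delta)


store = ToyDBDictStore()
for delta in (+1, +1, -1):      # train, train, untrain
    store.learn("free", delta)
print(store._wordinfoget("free"), "free" in store.wordinfo)   # 1 False
```

After the same train, train, untrain sequence, the cache no longer holds the word and lookups fall through to the database's count of 1.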

----------------------------------------------------------------------



