[spambayes-bugs] [ spambayes-Bugs-777026 ] Possible cause for db corruption in storage.py/DBDictClassif

SourceForge.net noreply at sourceforge.net
Thu Jul 24 21:25:30 EDT 2003


Bugs item #777026, was opened at 2003-07-25 02:17
Message generated for change (Comment added) made by mhammond
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=777026&group_id=61702

Category: None
Group: None
>Status: Closed
>Resolution: Invalid
Priority: 5
Submitted By: Fionn Behrens (fionn)
Assigned to: Mark Hammond (mhammond)
Summary: Possible cause for db corruption in storage.py/DBDictClassif

Initial Comment:

DBDictClassifier uses a neat trick to save some memory:

    def _wordinfoset(self, word, record):
        if record and (record.spamcount + record.hamcount <= 1):
            self.db[word] = record.__getstate__()
            # Remove this word from the changed list (not that it should be
            # there, but strange things can happen :)
            try:
                del self.changed_words[word]
            except KeyError:
                pass

Unfortunately, the programmer seems to have overlooked
that there might already be a self.wordinfo[word] entry
if (record.spamcount+record.hamcount) was > 1
previously and some message has since been untrained.
So if a record is untrained from a count of 2 to a
count of 1, wordinfo[word] will still hold 2 while the
db[word] entry will be 1. This can lead to minor
miscounts in the spam/ham counts.

To circumvent the problem, the following should be
added to storage.py at line 239 (referring to version
1.0a3, right below the code part you see above):

            try:
                del self.wordinfo[word]
            except KeyError:
                pass
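
For illustration, here is a minimal sketch of where that extra
eviction would sit. Record and ToyClassifier are hypothetical
stand-ins written for this note, with plain dicts in place of the
real database; this is not the actual storage.py code.

```python
# Toy model only: Record and ToyClassifier are hypothetical stand-ins,
# not the SpamBayes classes.

class Record:
    def __init__(self, spamcount=0, hamcount=0):
        self.spamcount = spamcount
        self.hamcount = hamcount

    def __getstate__(self):
        return (self.spamcount, self.hamcount)


class ToyClassifier:
    def __init__(self):
        self.db = {}             # stands in for the on-disk database
        self.wordinfo = {}       # in-memory cache of Record objects
        self.changed_words = {}  # words waiting to be written later

    def _wordinfoset(self, word, record):
        if record.spamcount + record.hamcount <= 1:
            # Singleton: write straight to the database.
            self.db[word] = record.__getstate__()
            self.changed_words.pop(word, None)
            # Proposed addition: drop any cached entry as well.
            self.wordinfo.pop(word, None)
        else:
            self.wordinfo[word] = record
            self.changed_words[word] = True


clf = ToyClassifier()
rec = Record(spamcount=2)
clf._wordinfoset("offer", rec)   # count 2: kept in the cache
rec.spamcount -= 1               # untrain one message
clf._wordinfoset("offer", rec)   # count 1: written through, cache evicted
```

With the eviction in place, a later lookup in this toy model has
to fall back to db, so the cache and the database cannot disagree
about a singleton word.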

----------------------------------------------------------------------

>Comment By: Mark Hammond (mhammond)
Date: 2003-07-25 13:25

Message:
Logged In: YES 
user_id=14198

The code currently works: in the case you describe,
self.wordinfo[key] is still correctly set to 1, so
_wordinfoget() returns the correct value.
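
One plausible reason the cached value stays correct is that
untraining decrements the same record object that the cache
already holds. That is an assumption made for this sketch, not
something stated in this comment, and Record and the plain dicts
below are hypothetical stand-ins.

```python
# Hypothetical stand-ins: a bare Record class and plain dicts.

class Record:
    def __init__(self, spamcount=0, hamcount=0):
        self.spamcount = spamcount
        self.hamcount = hamcount


wordinfo = {}   # in-memory cache
db = {}         # stands in for the on-disk database

record = Record(spamcount=2)
wordinfo["offer"] = record       # cached while the count is 2

# Untrain (assumed flow): fetch the cached object, decrement it in place.
cached = wordinfo["offer"]
cached.spamcount -= 1

# Singleton write-through of the now-decremented record.
db["offer"] = (cached.spamcount, cached.hamcount)

# The cache holds the very same object, so it sees the new count too.
same_object = wordinfo["offer"] is record
```

Under that assumption the stale cache entry is never read with an
old count: wordinfo and db both report 1 for the word.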

1.3 is quite out of date - other bugs have been fixed since
then. However, I added a test\test_storage.py file that
tries to exercise these edge cases - if you believe there is
still a bug, please provoke that into failing.

----------------------------------------------------------------------
