[spambayes-dev] Re: Pickle vs DB inconsistencies

Meyer, Tony T.A.Meyer at massey.ac.nz
Wed Jun 25 14:06:55 EDT 2003


Ok, I think I have this figured out now.

The DBDictClassifier currently tries to be efficient by keeping
"singleton" words (i.e. words that have appeared only once) out of the
wordinfo cache and saving them directly to the database.  That would be
fine, except that they are *not* actually written to the database until
store() is called.  This means that between a call to _wordinfoset() and
a call to store() the counts for those words are unreliable.
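
To make the window concrete, here is a minimal toy sketch of the
behaviour as I understand it.  It is not the real DBDictClassifier code:
the class name, the `pending` dict, and the plain dict standing in for
the database are all made up for illustration, and I am assuming a
lookup simply finds nothing when a word is in neither the cache nor the
database.

```python
class ToySingletonClassifier:
    """Hypothetical stand-in for the behaviour described above."""

    def __init__(self):
        self.db = {}        # stands in for the on-disk database
        self.wordinfo = {}  # in-memory cache of non-singleton words
        self.pending = {}   # singleton writes deferred until store()

    def _wordinfoset(self, word, count):
        if count == 1:
            # Singleton: keep it out of the cache and queue it for the db.
            self.wordinfo.pop(word, None)
            self.pending[word] = count
        else:
            self.pending.pop(word, None)
            self.wordinfo[word] = count

    def _wordinfoget(self, word):
        # Looks only in the cache and the db, never in the pending writes.
        if word in self.wordinfo:
            return self.wordinfo[word]
        return self.db.get(word)

    def store(self):
        # Only here do the deferred singletons reach the database.
        self.db.update(self.pending)
        self.db.update(self.wordinfo)
        self.pending.clear()


clf = ToySingletonClassifier()
clf._wordinfoset("cheap", 1)
before = clf._wordinfoget("cheap")  # singleton not yet visible
clf.store()
after = clf._wordinfoget("cheap")   # visible once store() has run
print(before, after)  # prints "None 1"
```

In this toy the word is invisible between the set and the store, which
is the kind of inconsistency I think we are seeing.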

To get around this, we need to either sync the db inside _wordinfoset()
(which seems to be expensive), or cache the singleton words after all,
or do something else.
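
As a rough sketch of the "cache the words after all" option, again
using made-up names and a plain dict in place of the real database (so
this is an illustration of the idea, not a patch):

```python
class CachingToyClassifier:
    """Hypothetical sketch: every word, singleton or not, goes via the cache."""

    def __init__(self):
        self.db = {}          # stands in for the on-disk database
        self.wordinfo = {}    # in-memory cache, now including singletons
        self.changed = set()  # words that still need writing to the db

    def _wordinfoset(self, word, count):
        # No special case for count == 1: the cache always has the latest value.
        self.wordinfo[word] = count
        self.changed.add(word)

    def _wordinfoget(self, word):
        if word in self.wordinfo:
            return self.wordinfo[word]
        return self.db.get(word)

    def store(self):
        # Flush only the words that changed since the last store().
        for word in self.changed:
            self.db[word] = self.wordinfo[word]
        self.changed.clear()


fixed = CachingToyClassifier()
fixed._wordinfoset("cheap", 1)
seen_before_store = fixed._wordinfoget("cheap")  # readable straight away
fixed.store()
seen_after_store = fixed._wordinfoget("cheap")
```

The trade-off is memory: the cache grows with every singleton, which is
presumably what the current code was trying to avoid.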

Anyway, this is how it seems to me - I could be wrong!  If Mark or
someone more familiar with this stuff could look at it, that would be
great.

=Tony Meyer


