[spambayes-dev] Re: Pickle vs DB inconsistencies

Meyer, Tony T.A.Meyer at massey.ac.nz
Wed Jun 25 13:28:17 EDT 2003


From my testing:

  * Unless dbexpimp.py is broken, the 10.pkl and 10.db you supplied were
not identical.  Two tokens have different counts:
'header:MIME-Version:1' and 'header:Mime-Version:1' (2,5 vs 4,3 and 4,3 vs
2,3 respectively).  I'm not sure what this means!

  * I couldn't use the same test messages as you, because the filenames
weren't valid on a win32 system and I couldn't unpack them.  Instead I
grabbed a random message of my own as a test and changed the simplescore
script to add an initial learn (since I can't unlearn a message that's not
in the db).

  * Is the message you give to simplescore one of those that were
trained?  It should be, because you can't untrain a message that hasn't
been trained (you might get negative counts).
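Comparing the two stores comes down to a per-token diff of the counts. A minimal sketch, assuming both the pickle and the db have already been loaded into plain dicts mapping each token to a (spamcount, hamcount) pair; `diff_token_counts` is a hypothetical helper, not part of dbexpimp.py, and the sample values below are illustrative only.

```python
def diff_token_counts(pickle_counts, db_counts):
    """Return {token: (pickle_value, db_value)} for every token whose counts differ.

    Both arguments are assumed to map token -> (spamcount, hamcount).
    A token present on only one side is reported with None for the other side.
    """
    diffs = {}
    for token in set(pickle_counts) | set(db_counts):
        in_pickle = pickle_counts.get(token)
        in_db = db_counts.get(token)
        if in_pickle != in_db:
            diffs[token] = (in_pickle, in_db)
    return diffs


if __name__ == "__main__":
    # Illustrative values, not taken from 10.pkl or 10.db.
    pkl = {"subject:hello": (1, 0), "header:X-Test:1": (2, 5)}
    db = {"subject:hello": (1, 0), "header:X-Test:1": (4, 3), "from:someone": (0, 1)}
    for token, (a, b) in sorted(diff_token_counts(pkl, db).items()):
        print(token, a, b)
```

Reporting missing tokens as None, rather than skipping them, means a token that exists in only one store shows up in the diff too.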

However, even taking all of this into account, I also get a db count 1
higher for each token.  This problem goes away if there is a save call
after every learn/unlearn call [1].  That would explain why the problem
doesn't occur when running hammie multiple times.
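The expected behaviour is that a learn followed by the matching unlearn leaves every token count where it started, whether or not a save happens in between. A minimal sketch of that check, assuming a hypothetical write-back store whose unsaved changes live in an in-memory cache until `save()` copies them to a backing dict (standing in for the db). `WriteBackStore` and `round_trip_is_noop` are illustrative names, not spambayes classes, and this is not how the spambayes DB classifier is actually written. Reads go through the cache first, and `unlearn` refuses to push any count below zero.

```python
class WriteBackStore:
    """Hypothetical token store: changes stay in a cache until save() writes them out."""

    def __init__(self, backing):
        self.backing = backing  # stands in for the on-disk db: token -> (spam, ham)
        self.cache = {}         # token -> [spam, ham], changed but not yet saved

    def _get(self, token):
        # Read through the cache first; reading only the backing store
        # would see stale counts for tokens changed since the last save().
        if token in self.cache:
            return self.cache[token]
        return list(self.backing.get(token, (0, 0)))

    def learn(self, tokens, is_spam):
        idx = 0 if is_spam else 1
        for token in set(tokens):
            counts = self._get(token)
            counts[idx] += 1
            self.cache[token] = counts

    def unlearn(self, tokens, is_spam):
        idx = 0 if is_spam else 1
        current = {token: self._get(token) for token in set(tokens)}
        # Check everything before changing anything, so a bad untrain
        # leaves the store untouched instead of half-updated.
        bad = sorted(t for t, counts in current.items() if counts[idx] == 0)
        if bad:
            raise ValueError(f"untraining would make counts negative for {bad}")
        for token, counts in current.items():
            counts[idx] -= 1
            self.cache[token] = counts

    def save(self):
        for token, counts in self.cache.items():
            if counts == [0, 0]:
                self.backing.pop(token, None)  # drop tokens with no counts left
            else:
                self.backing[token] = tuple(counts)
        self.cache.clear()


def round_trip_is_noop(backing, tokens, is_spam, save_between):
    """Learn then unlearn the same tokens; True if the backing store is unchanged."""
    before = dict(backing)
    store = WriteBackStore(backing)
    store.learn(tokens, is_spam)
    if save_between:
        store.save()
    store.unlearn(tokens, is_spam)
    store.save()
    return backing == before
```

With reads going through the cache, the round trip is a no-op either way. A store whose reads bypassed the cache would give a result that depends on whether `save()` ran in between; that is offered only as one way such an inconsistency can arise, not as a diagnosis of the DB classifier.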

I'll keep looking...

=Tony Meyer

[1] Doing this, however, triggered the db error that was recently
submitted as a bug; I presume that is also where Greg ran into it.  I
think 'word' should be 'key': it makes sense and seems to work.  I
suspect (sorry, Mark!) that this is where the problem lies.


