[spambayes-dev] Re: Pickle vs DB inconsistencies
Meyer, Tony
T.A.Meyer at massey.ac.nz
Wed Jun 25 13:28:17 EDT 2003
From my testing:
* Unless dbexpimp.py is broken, the 10.pkl and 10.db you supplied were
not identical. There are two tokens that have different counts:
'header:MIME-Version:1' and 'header:Mime-Version:1' (2,5 vs 4,3 and
4,3 vs 2,3 respectively). I'm not sure what this means!
* I couldn't use the same test messages as you because the filenames
weren't valid for a win32 system and I couldn't unpack them. I grabbed
a random message of my own to use as a test and changed the simplescore
script, adding an initial learn (since I can't unlearn one that's not in
the db).
* Is the message that you give to simplescore one of the ones that was
trained? It should be, because you can't untrain a message that hasn't
been trained (you might get negative counts).
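The pickle/db comparison in the first point can be sketched roughly as
follows. This is only an illustration, not what dbexpimp.py does: it
assumes both stores have already been loaded into plain dicts mapping
token to a pair of counts, and diff_token_counts and the sample data are
made up for the example.

```python
def diff_token_counts(pickle_counts, db_counts):
    """Return {token: (pickle_value, db_value)} for tokens whose counts differ.

    A token missing from one store is reported with None on that side.
    """
    differences = {}
    for token in sorted(set(pickle_counts) | set(db_counts)):
        from_pickle = pickle_counts.get(token)
        from_db = db_counts.get(token)
        if from_pickle != from_db:
            differences[token] = (from_pickle, from_db)
    return differences


# Made-up sample data, not taken from the real 10.pkl / 10.db files.
pickle_counts = {"alpha": (1, 2), "beta": (0, 3), "gamma": (5, 5)}
db_counts = {"alpha": (1, 2), "beta": (1, 3), "delta": (2, 0)}

result = diff_token_counts(pickle_counts, db_counts)
for token, (from_pickle, from_db) in result.items():
    print(token, from_pickle, from_db)
```

Reporting a missing token as None keeps "absent from one store" distinct
from "present with zero counts", which matters when chasing off-by-one
differences.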
However, even given all of this, I also get a db count that is 1 higher
for each token. This problem goes away if there is a save call after
every learn/unlearn call [1]. This would be why the problem doesn't
occur when running hammie multiple times.
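A toy sketch of the two points above: saving after every learn/unlearn
call, and why unlearning a message that was never learned drives counts
negative. ToyClassifier, learn_and_save and unlearn_and_save are
hypothetical stand-ins written for this example, not the spambayes
classes, and save here only counts calls rather than writing anything.

```python
class ToyClassifier:
    """Hypothetical in-memory stand-in: token -> (spam_count, ham_count)."""

    def __init__(self):
        self.counts = {}
        self.save_calls = 0

    def _adjust(self, tokens, is_spam, delta):
        for token in tokens:
            spam, ham = self.counts.get(token, (0, 0))
            if is_spam:
                spam += delta
            else:
                ham += delta
            self.counts[token] = (spam, ham)

    def learn(self, tokens, is_spam):
        self._adjust(tokens, is_spam, +1)

    def unlearn(self, tokens, is_spam):
        self._adjust(tokens, is_spam, -1)

    def save(self):
        # Assumption: a real store would flush to disk here.
        self.save_calls += 1


def learn_and_save(classifier, tokens, is_spam):
    classifier.learn(tokens, is_spam)
    classifier.save()


def unlearn_and_save(classifier, tokens, is_spam):
    classifier.unlearn(tokens, is_spam)
    classifier.save()


# Learn first, then unlearn: counts return to zero.
trained = ToyClassifier()
learn_and_save(trained, ["a", "b"], True)
unlearn_and_save(trained, ["a", "b"], True)

# Unlearn without a prior learn: counts go negative.
untrained = ToyClassifier()
unlearn_and_save(untrained, ["a"], True)
```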
I'll keep looking...
=Tony Meyer
[1] Although this gave me the db error that was recently submitted as a
bug, which is also where Greg ran into it, I presume. I think 'word'
should be 'key' - it makes sense and seems to work. I think (sorry
Mark!) that it's in here that there is a problem.