[Spambayes] don't update if you don't want to retrain
Richie Hindle
richie@entrian.com
Sun Dec 1 20:20:10 2002
[Neale]
> I've just checked in a new anydbm that has a more appropriate list of
> database back-ends to try on the Windows platform. [...]
> This should eliminate any dbm concerns for Windows folk.
You left dbhash in the list - that's just another interface to the broken
bsddb. And if that gets removed, Windows users will be left with dumbdbm -
the name doesn't inspire confidence, and the docstring says "XXX TO DO: -
seems to contain a bug when updating..."
As far as I can see there's a complete solution available to these DBM
problems. Perhaps I've missed something, but I've been back over all the
discussions and I can't see anything wrong with it:
o We demand bsddb 3 or better on platforms where bsddb is the dbm
implementation that gets picked up. So until Python 2.3 is released,
Windows users need to install pybsddb. I've just done this and it's
trivial. (We already demand a new "email" library and no-one's
complained.) Would this cause problems on any other platforms?
o If training goes slowly, we implement Tim Peters' idea: "Bulk training
could be taught to use a new classifier based on an in-memory dict.
When that's done, the in-memory dict's ham and spam counts would be
added into the persistent DB (rewriting only those WordInfo records
corresponding to words that appeared in the bulk training data), and
then the in-memory dict could be thrown away."
o Or (Neale) you were talking about writing a caching front-end for the
DBM (regardless of which actual DBM was behind it) - that would work
as well.
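
A minimal sketch of the bulk-training idea quoted above, assuming the
persistent DB behaves like a mapping from word to a (spamcount, hamcount)
pair. The names bulk_train and merge_into are made up for illustration and
are not the actual classifier API:

```python
def bulk_train(messages):
    """Count words in an in-memory dict.

    `messages` is an iterable of (words, is_spam) pairs. Each word is
    counted once per message, as [spamcount, hamcount].
    """
    counts = {}
    for words, is_spam in messages:
        for word in set(words):
            record = counts.setdefault(word, [0, 0])
            record[0 if is_spam else 1] += 1
    return counts


def merge_into(db, counts):
    """Add the in-memory counts into the persistent mapping.

    Only records for words seen during bulk training are rewritten;
    everything else in `db` is left untouched.
    """
    for word, (spam, ham) in counts.items():
        old_spam, old_ham = db.get(word, (0, 0))
        db[word] = (old_spam + spam, old_ham + ham)
```

After merge_into returns, the in-memory dict can simply be dropped. A plain
dict stands in for the DBM here; a real store would also need to serialise
the pairs.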
Wouldn't that solve *everything*? Startup times would be quick, training
would be quick, no buggy DBM implementations would be used, and different
components wouldn't default to different storage formats (hammie vs.
pop3proxy). Installing pybsddb on Windows is trivial, and once Python 2.3
comes out you won't even need to do that.
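
For the caching front-end alternative, something along these lines might
do. It is a rough sketch only: the class name CachingDB and its flush
method are invented here, and the back-end is assumed to support plain
item get and set, whichever DBM it actually is:

```python
class CachingDB:
    """Write-back cache in front of any mapping-like DBM object."""

    def __init__(self, backend):
        self.backend = backend
        self.cache = {}      # key -> value, filled on first read or on write
        self.dirty = set()   # keys written since the last flush

    def __getitem__(self, key):
        if key not in self.cache:
            # A missing key raises KeyError from the back-end, as usual.
            self.cache[key] = self.backend[key]
        return self.cache[key]

    def __setitem__(self, key, value):
        # Writes stay in memory until flush() is called.
        self.cache[key] = value
        self.dirty.add(key)

    def flush(self):
        """Write every modified record through to the back-end."""
        for key in self.dirty:
            self.backend[key] = self.cache[key]
        self.dirty.clear()
```

Training would write through the cache and call flush() once at the end,
so the slow back-end is hit at most once per modified word.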
I've probably missed something - it's hard to keep up!
--
Richie Hindle
richie@entrian.com