[spambayes-dev] A spectacular false positive

Tim Peters tim.one at comcast.net
Sat Nov 15 22:34:59 EST 2003


[T. Alexander Popiel]
> ...
> This is something that I don't understand... why do we care if the
> database is huge?  With 100 gigabyte drives commonplace, why are
> we quibbling over 20 or 40 megabytes?

I expect large drives are still rare among consumers, and this has become a
"mass market" application.  It wouldn't be *just* the database size, of
course -- keeping "last access" up to date also requires caching token
timestamps in memory, and most significantly updating the DB on disk after
scoring (we never have to write to disk after scoring now, only after
training).  So there are many costs.  I'd feel a lot better about it if
Berkeley DB were a lot faster on Windows, and weren't still implicated in so
many maddeningly baffling database corruption reports.
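
To make the write-after-scoring cost concrete, here is a minimal sketch.
It is not SpamBayes code: the TokenStore class, its method names, and the
track_access flag are all hypothetical, and a plain dict stands in for the
on-disk database.  It only illustrates that recording "last access" turns
a read-only scoring lookup into one that leaves dirty records to flush.

```python
import time


class TokenStore:
    """Hypothetical token store; NOT the SpamBayes classifier API."""

    def __init__(self, track_access=False, clock=time.time):
        self.records = {}      # token -> [spam_count, ham_count, last_access]
        self.dirty = set()     # tokens changed in memory, not yet written out
        self.track_access = track_access
        self.clock = clock

    def train(self, tokens, is_spam):
        """Training always changes counts, so it always dirties records."""
        now = self.clock()
        for tok in set(tokens):
            rec = self.records.setdefault(tok, [0, 0, now])
            rec[0 if is_spam else 1] += 1
            rec[2] = now
            self.dirty.add(tok)

    def score_lookup(self, tokens):
        """Fetch counts for scoring; a pure read unless track_access is on."""
        now = self.clock()
        found = {}
        for tok in set(tokens):
            rec = self.records.get(tok)
            if rec is None:
                continue
            found[tok] = (rec[0], rec[1])
            if self.track_access:
                rec[2] = now          # timestamp cached in memory...
                self.dirty.add(tok)   # ...and now owed to the disk
        return found

    def flush(self):
        """Stand-in for the disk write; returns how many records it covers."""
        written = len(self.dirty)
        self.dirty.clear()
        return written
```

With track_access off, flush() after a score_lookup() has nothing to
write; with it on, every known token in the scored message has to be
written back.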



