[Spambayes] Database reduction

Neale Pickett neale@woozle.org
Fri Nov 1 00:14:41 2002


So then, Tim Peters <tim.one@comcast.net> is all like:

> [cool database trick]

The bigger problem, at least for hammie, is that pickling wordinfo
instances makes huge strings, most of which are redundant information.
When pickling a whole Bayes object, the pickler is smart enough not to
repeatedly say "this is a wordinfo object" but rather, I assume, "this
is of type 2", only having to name type 2 once.  However, hammie pickles
each wordinfo individually, keyed by a string, so every record names its
type all over again.  This makes for fast lookups, but giant databases.
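
To put a rough number on that, here is a minimal sketch that compares
the two approaches.  It is not hammie's actual code: make_wordinfo and
pickled_sizes are made-up names, types.SimpleNamespace is only a
stand-in for the real wordinfo class, and the counts are invented.

```python
import pickle
from types import SimpleNamespace


def make_wordinfo(spamcount, hamcount):
    # Hypothetical stand-in for a wordinfo record; the real class differs.
    return SimpleNamespace(spamcount=spamcount, hamcount=hamcount)


def pickled_sizes(n_words=1000):
    # Invented data: n_words records keyed by a word string.
    words = {"word%d" % i: make_wordinfo(i % 50, i % 30)
             for i in range(n_words)}

    # One pickle per record, as when each value is stored under its own
    # database key.  Every pickle has to name the record's type again.
    per_record_total = sum(len(pickle.dumps(info))
                           for info in words.values())

    # One pickle for the whole mapping.  The pickler's memo lets it name
    # the type once and refer back to it for later records.
    whole = len(pickle.dumps(words))
    return per_record_total, whole


per_record_total, whole = pickled_sizes()
print("sum of per-record pickles:", per_record_total, "bytes")
print("single pickle of the dict:", whole, "bytes")
```

Note that the single pickle also contains the key strings, while the
per-record total counts only the values, so the comparison if anything
understates the overhead of pickling each record separately.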

Tim just mentioned a performance tweak; is this an indicator that now
would be a good time to resume trying to reduce hammie's database size?

Neale