[Spambayes] Database reduction

Neale Pickett neale@woozle.org
Mon Nov 4 22:36:37 2002


So then, Tim Peters <tim.one@comcast.net> is all like:

> OTOH,
> 
> >>> cPickle.dumps(w.__getstate__(), 1)
> '(U\x04aoeuq\x01K\x00K\x00K\x00K\x02t.'
> >>> len(_)
> 19
> >>>
> 
> which is shorter than your string repr.  This isn't typical because 2
> is an absurd spamprob (it's > 1, and is an int instead of a double);
> the savings would be greater with a real spamprob (which will consume
> about 19 bytes in a string repr, but about 8 in a pickle).

Right.  I had some code in hammie to pickle the tuple instead of the
object itself, but I thought it was a pretty gnarly kludge at the time.
In any case, some variation on this seems obviously the right way to go.
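
A minimal sketch of the "pickle the state tuple, not the object" idea,
written for current Python (pickle rather than cPickle). The WordInfo
class here is a hypothetical stand-in whose field order is assumed from
the tuple in the quoted session, and dump_wordinfo / load_wordinfo are
illustrative names, not hammie functions.

```python
import pickle


class WordInfo:
    """Hypothetical stand-in for the classifier's per-word record."""

    def __init__(self, atime, spamprob=0.5):
        self.atime = atime
        self.spamcount = self.hamcount = self.killcount = 0
        self.spamprob = spamprob

    def __getstate__(self):
        # Assumed field order: (atime, spamcount, hamcount, killcount, spamprob)
        return (self.atime, self.spamcount, self.hamcount,
                self.killcount, self.spamprob)

    def __setstate__(self, state):
        (self.atime, self.spamcount, self.hamcount,
         self.killcount, self.spamprob) = state


def dump_wordinfo(info):
    # Pickle only the state tuple, so the class name is never written out.
    return pickle.dumps(info.__getstate__(), 1)


def load_wordinfo(data):
    # Build an empty instance without calling __init__, then restore state.
    info = WordInfo.__new__(WordInfo)
    info.__setstate__(pickle.loads(data))
    return info
```

Because the stored bytes hold only the tuple, the database no longer
records which class produced it; the loader has to know that.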

> [ Tim magic regarding pickle hacks ]

> I'd avoid all that and pickle the states, but that's just me.

I'm inclined to agree with you.  If I do this, though, we all have to
agree on a convention: if you modify a wordinfo object, you *must*
write it back to the dictionary.  Otherwise hammie will never know it
changed, and the change is lost.  I was bitten by this a few times at
first, and I haven't played with the code enough to know whether any
such bugs have crept back in.
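
To illustrate why the write-back matters once states are pickled, here
is a small sketch. StateStore is a hypothetical dict-like wrapper (not
hammie's actual class), a plain dict stands in for the on-disk database,
and a two-item list [spamcount, hamcount] stands in for a wordinfo
object.

```python
import pickle


class StateStore:
    """Hypothetical dict-like store that keeps pickled state tuples."""

    def __init__(self):
        self._db = {}  # stands in for the on-disk database

    def __getitem__(self, word):
        # Every lookup unpickles a fresh copy of the stored state.
        return list(pickle.loads(self._db[word]))

    def __setitem__(self, word, counts):
        self._db[word] = pickle.dumps(tuple(counts), 1)


store = StateStore()
store["viagra"] = [0, 0]

rec = store["viagra"]
rec[0] += 1                         # changes only the in-memory copy
unchanged = store["viagra"]         # the store still holds the old counts

store["viagra"] = rec               # explicit write-back
updated = store["viagra"]           # now the store sees the change
```

The store hands out copies, so it has no way to notice a mutation unless
the caller assigns the record back.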

Would it be out of line to alter WordInfo to be immutable, to encourage
folks to write it back to the dictionary?
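
One way an immutable WordInfo could look in current Python is a
namedtuple; this is only a sketch of the idea, not the design being
proposed here, and the field names are assumed. In-place updates fail
loudly, so the only way to change a record is to build a new one and
store it.

```python
from collections import namedtuple

# Hypothetical immutable record; field names are assumptions.
WordInfo = namedtuple(
    "WordInfo", "atime spamcount hamcount killcount spamprob")

wordinfo = {
    "aoeu": WordInfo(atime=0, spamcount=0, hamcount=0,
                     killcount=0, spamprob=0.5),
}

rec = wordinfo["aoeu"]

in_place_update_failed = False
try:
    rec.spamcount += 1              # namedtuple fields are read-only
except AttributeError:
    in_place_update_failed = True

# The only way to "modify" the record: make a new one and write it back.
wordinfo["aoeu"] = rec._replace(spamcount=rec.spamcount + 1)
```

A side benefit is that tuple(rec) is already the state tuple, so there
is nothing extra to extract before pickling.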

Neale