[Spambayes] expiration ideas.

Alexander G. M. Smith agmsmith@rogers.com
Sun Oct 20 17:04:14 2002


Anthony Baxter wrote:
>   Keep the "interim" wordinfo around (gzipped, datestamped) until your
>   expiration time is up - then undo the earlier merge, subtracting
>   the spamcount/hamcounts. 
> 
> Thoughts? Unless there's a screamingly obvious "don't be stupid" I'll
> play with this tomorrow (ah, leave....)
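
A minimal sketch of the merge/undo idea quoted above, assuming each
wordinfo batch is a plain dict mapping a token to a (spamcount,
hamcount) pair. The names merge, unmerge, total, old_batch and
new_batch are hypothetical, chosen for illustration; this is not the
actual Spambayes code.

```python
def merge(total, interim):
    """Add an interim batch's per-token counts into the running totals."""
    for token, (spam, ham) in interim.items():
        counts = total.setdefault(token, [0, 0])
        counts[0] += spam
        counts[1] += ham


def unmerge(total, interim):
    """Undo an earlier merge by subtracting that batch's counts."""
    for token, (spam, ham) in interim.items():
        counts = total[token]
        counts[0] -= spam
        counts[1] -= ham
        if counts[0] == 0 and counts[1] == 0:
            # No remaining batch mentions this token, so drop it.
            del total[token]


# Hypothetical data: two datestamped batches merged in order.
total = {}
old_batch = {"viagra": (3, 0), "meeting": (0, 2)}
new_batch = {"viagra": (1, 0), "python": (0, 4)}
merge(total, old_batch)
merge(total, new_batch)

# The old batch reaches its expiration time, so its counts are removed.
unmerge(total, old_batch)
```

This only works if the stored interim batch is exactly what was merged
earlier, which is why it has to be kept around until it expires.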

Sounds reasonable.  But I'd rather keep around the whole messages so
that I can change tokenizing schemes.  Or perhaps use one of those
future inter-word relation schemes.

Storing the complete messages takes roughly ten times the space of a
word list (5.9MB raw, 2.4MB zipped archive, 1.5MB gzip tar file, 1.2MB
bzip2ed tar file vs 660KB raw, 270KB zipped word list), but that is
still almost trivial on today's computers and huge disk drives.  So,
you have to ask yourself if a 10X space (and tokenizing time) savings
is worth it.
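
As a quick check of the sizes quoted above, the ratios can be worked
out directly. This is only arithmetic on the figures in this message,
assuming 1MB = 1024KB; the variable names are made up for illustration.

```python
KB_PER_MB = 1024  # assumption: binary megabytes

# (complete messages in KB, word list in KB), from the figures above
sizes_kb = {
    "raw": (5.9 * KB_PER_MB, 660),
    "zipped": (2.4 * KB_PER_MB, 270),
}

# How many times larger the stored messages are than the word list.
ratios = {name: messages / words
          for name, (messages, words) in sizes_kb.items()}

for name, ratio in ratios.items():
    print(f"{name}: messages are about {ratio:.1f}x the word list")
```

Both ratios come out at roughly 9, which is where the "10X" figure
comes from.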

- Alex