[Spambayes] expiration ideas.
Anthony Baxter
anthony@interlink.com.au
Mon Oct 21 05:22:47 2002
>>> "Alexander G. M. Smith" wrote
> Anthony Baxter wrote:
> > Keep the "interim" wordinfo around (gzipped, datestamped) until your
> > expiration time is up - then undo the earlier merge, subtracting
> > the spamcount/hamcounts.
> Sounds reasonable. But I'd rather keep around the whole messages so
> that I can change tokenizing schemes. Or perhaps use one of those
> future inter-word relation schemes.
That's fine, but once this stuff is deployed, how many end-users are
going to want to tweak their tokeniser? I'd suggest approximately
three eighths of one fifth of bugger-all :)
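
For concreteness, the expiry scheme quoted at the top (keep each batch's
counts gzipped and datestamped, then subtract them once they're too old)
might look roughly like the sketch below. This is not the real Spambayes
wordinfo format: the JSON-in-gzip files, the "wordinfo-<timestamp>" file
naming and the function names are all assumptions made up for illustration,
and it assumes at most one batch is saved per second.

```python
import gzip
import json
import os
import time

PREFIX, SUFFIX = "wordinfo-", ".json.gz"  # hypothetical file naming


def merge_counts(wordinfo, batch, sign=1):
    """Add (sign=1) or subtract (sign=-1) a batch of [spamcount, hamcount]
    pairs into the main word -> [spamcount, hamcount] mapping."""
    for word, (spam, ham) in batch.items():
        cur_spam, cur_ham = wordinfo.get(word, (0, 0))
        new_spam = max(0, cur_spam + sign * spam)
        new_ham = max(0, cur_ham + sign * ham)
        if new_spam == 0 and new_ham == 0:
            wordinfo.pop(word, None)  # word no longer seen anywhere
        else:
            wordinfo[word] = [new_spam, new_ham]


def save_interim(batch, directory, now=None):
    """Keep the batch around, gzipped and datestamped, so it can be undone."""
    now = time.time() if now is None else now
    path = os.path.join(directory, "%s%d%s" % (PREFIX, int(now), SUFFIX))
    with gzip.open(path, "wt", encoding="utf-8") as f:
        json.dump(batch, f)
    return path


def expire(wordinfo, directory, max_age_seconds, now=None):
    """Undo the merge of every interim batch older than max_age_seconds."""
    now = time.time() if now is None else now
    for name in sorted(os.listdir(directory)):
        if not (name.startswith(PREFIX) and name.endswith(SUFFIX)):
            continue
        stamp = int(name[len(PREFIX):-len(SUFFIX)])
        if now - stamp < max_age_seconds:
            continue
        path = os.path.join(directory, name)
        with gzip.open(path, "rt", encoding="utf-8") as f:
            batch = json.load(f)
        merge_counts(wordinfo, batch, sign=-1)  # subtract the old counts
        os.remove(path)
```

Each training run would call merge_counts() and save_interim() with the
same batch, and a periodic job would call expire(). Only the small
per-batch count files are kept, never the messages themselves.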
> The total space is several times (ten times) more than a word list
> (5.9MB raw, 2.4MB zipped archive, 1.5MB gzip tar file, 1.2MB
> bzip2ed tar file vs 660KB raw, 270KB zipped word list), but it is
> still almost trivial on today's computers and huge disk drives to
> store the complete messages. So, you have to ask yourself if a
> 10X space (and tokenizing time) savings is worth it.
For one user, fine - but what about a setting where you've got multiple
users, say, using an IMAP server? You'd want the filtering to happen
on the server, so that end users don't have to run a program to
download the mail, check it, and send commands to the IMAP server
to move the spam out of the way...
I also get enough email that I really don't want to be lugging
around all of my old email for a couple of months...
Anthony
--
Anthony Baxter <anthony@interlink.com.au>
It's never too late to have a happy childhood.