[Spambayes] Graph results

T. Alexander Popiel popiel at wolfskeep.com
Sat Mar 1 07:47:19 EST 2003


In message:  <20030301134557.15F1216F16 at jmason.org>
             jm at jmason.org (Justin Mason) writes:
>
>Alexander -- nice work!  Thanks for investigating this...

Heh.  It's just a way to use up even more CPU-hours, in the same
spirit as was prevalent last October... ;-)

>> 2. Spambayes continues to improve for a couple months,
>>    but I'm starting to see an increase in errors after
>>    about 4-5 months.  I don't know why this is; it might
>>    be because spam is mutating, or it might be because
>>    my definition of spam has been mutating.
>
>Spam has definitely been mutating heavily in the last 4 months.

Oh, definitely.  However, since the test runs were training
throughout the data period, one would hope that they'd have
picked up on the mutations without a loss of accuracy.  Of
course, some of the mutations include features that SB doesn't
recognize at all (s p a c e d  o u t  w o r d s), which could
well be the source of the trouble.

I'm just worried that having too much information about past
forms of spam may be interfering with recognition of current
spam.  The mechanism would be spam probability deflation: each
feature's probability is based on the fraction of known spams
containing that feature, so as more spams with differing
features become known, the probability for any given feature
decreases.  Hence my interest in aging.
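
To make the deflation concern concrete, here is a minimal sketch
in Python.  It assumes a simplified Graham-style per-feature
probability (spam fraction over the sum of spam and ham
fractions) and leaves out the Bayesian adjustment a real
classifier would apply.  The function name and all of the counts
are hypothetical, chosen only for illustration.

```python
def spam_prob(spam_with_feature, n_spam, ham_with_feature, n_ham):
    """Simplified per-feature spam probability from training counts.

    Assumption: probability = spam fraction / (spam fraction + ham fraction),
    with no smoothing for rare features.
    """
    spam_ratio = spam_with_feature / n_spam
    ham_ratio = ham_with_feature / n_ham
    return spam_ratio / (spam_ratio + ham_ratio)


# Hypothetical early state: the feature is in 50 of 100 spams, 5 of 100 hams.
early = spam_prob(50, 100, 5, 100)

# Hypothetical later state: 900 more spams of newer forms have been trained,
# none of which contain the feature.  The ham side is unchanged.
late = spam_prob(50, 1000, 5, 100)

print(f"early: {early:.3f}  late: {late:.3f}")
```

With these made-up counts the feature drops from a strong spam
indicator to a neutral one, even though nothing about the
feature itself has changed; only the size of the spam corpus
has.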

>> Anyway, the next thing for me to really look at is the effect
>> of aging...
>
>As in expiration of tokens?  I thought SB didn't use that?
>Or do you mean validity of trained results from >3 months ago...

Standard SB doesn't, you're right.  On the other hand, my personal
installation (not what I ran tests with!) expires messages after
120 days.  I'm curious to see if this is actually the boon I
suspect it is.
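
One way such an expiry could work, sketched in Python under
stated assumptions: each trained message is kept as a
(trained_on, is_spam, features) record, and records older than
the cutoff are dropped before the counts are rebuilt.  The
record layout and the expire() helper are hypothetical and are
not taken from the SB code.

```python
from datetime import date, timedelta

MAX_AGE = timedelta(days=120)  # the 120-day window mentioned above


def expire(trained, today, max_age=MAX_AGE):
    """Return only the records trained within max_age of today."""
    return [rec for rec in trained if today - rec[0] <= max_age]


# Hypothetical training records: (trained_on, is_spam, features)
trained = [
    (date(2002, 10, 1), True, {"viagra"}),
    (date(2003, 2, 1), True, {"mortgage"}),
    (date(2003, 2, 20), False, {"python"}),
]

kept = expire(trained, today=date(2003, 3, 1))
n_spam = sum(1 for _, is_spam, _ in kept if is_spam)
n_ham = len(kept) - n_spam
```

Here the October message falls outside the window, so it no
longer contributes to the spam count that the per-feature
fractions are divided by.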

- Alex


