[spambayes-dev] More obvious logarithmic expiration data

T. Alexander Popiel popiel at wolfskeep.com
Mon Jun 9 15:09:03 EDT 2003


In message:  <1055186806.42.2463 at sake.mondoinfo.com>
             Matthew Dixon Cowles <matt at mondoinfo.com> writes:
>
>It's a combination. Four scores were computed for each message and
>the within-two-weeks, within-one-week, and within-24-hours scores
>were compared to the within-30-days score. Presumably, the larger
>differences are from the comparisons with the results that use the
>shorter cutoffs.

I've seen strange things come out of the data wherein more training
made things worse, so I wouldn't take the above presumption on faith. ;-)

>I've thought a bit of that. It might be useful to bias the delete
>function toward retaining a token that hadn't been used for longer
>periods as a function of hamcount+spamcount. Some more work could
>determine if that's a valuable strategy.

*nod*  This is why, in my own thinking, I was leaning towards average
appearance frequency over the lifetime of the token rather than time
since last use.  Of course, I still haven't done anything beyond
thought experiments.
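
A minimal sketch of the two expiration criteria discussed above, for
comparison only.  Nothing here is taken from the spambayes code: the
TokenRecord fields, the function names, the logarithmic bias, and the
thresholds are all assumptions made for illustration, and times are
plain day numbers.

```python
import math
from dataclasses import dataclass


@dataclass
class TokenRecord:
    # Hypothetical record layout; not the actual spambayes structure.
    hamcount: int
    spamcount: int
    first_seen: float  # day the token was first trained
    last_seen: float   # day the token was last used


def expire_by_last_use(rec, now, base_days=30.0):
    """Time-since-last-use cutoff, biased by hamcount + spamcount.

    The allowed idle time grows with the log of the total count, so
    heavily used tokens are retained longer (an assumed bias function).
    """
    total = rec.hamcount + rec.spamcount
    allowed_idle = base_days * math.log2(2 + total)
    return (now - rec.last_seen) > allowed_idle


def average_frequency(rec, now):
    """Appearances per day over the lifetime of the token."""
    lifetime = max(now - rec.first_seen, 1.0)  # avoid dividing by zero
    return (rec.hamcount + rec.spamcount) / lifetime


def expire_by_frequency(rec, now, min_per_day=0.05):
    """Expire tokens whose lifetime average frequency is too low."""
    return average_frequency(rec, now) < min_per_day


rare = TokenRecord(hamcount=1, spamcount=0, first_seen=0.0, last_seen=0.0)
common = TokenRecord(hamcount=40, spamcount=60, first_seen=0.0, last_seen=0.0)
```

With now=100.0, both criteria would drop the rare token and keep the
common one, even though neither has been seen for 100 days.  Whether
either rule actually helps accuracy is exactly the open question; this
only shows how the two could be expressed.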

>I agree but I meant something simpler than that. If I were on
>vacation for two weeks and therefore hadn't scored any messages in
>that time, it wouldn't make sense to expire my entire database.

I keep forgetting that some people score messages only when a
human is about to look at them.  I've got my rules in procmail,
so even if I go on vacation, the scoring takes place.
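
One way to handle the vacation case, sketched under the assumption
that the database records when the last message was scored: measure a
token's idle time against that last scoring time rather than the wall
clock.  The function and parameter names are hypothetical, and times
are plain day numbers.

```python
def idle_days(token_last_seen, last_scoring_time):
    """Idle time relative to the most recent scoring activity.

    Using the last scoring time (an assumed bookkeeping value) instead
    of the wall clock means a gap in scoring does not age any tokens.
    """
    return max(last_scoring_time - token_last_seen, 0.0)


def should_expire(token_last_seen, last_scoring_time, cutoff_days=30.0):
    """True if the token was idle longer than the cutoff."""
    return idle_days(token_last_seen, last_scoring_time) > cutoff_days
```

For a token last seen on day 95 with scoring last done on day 100, the
idle time is 5 days no matter how long afterwards the expiration pass
runs.  For a setup where scoring never pauses, this reduces to the
ordinary wall-clock rule.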

- Alex


