[spambayes-dev] More obvious logarithmic expiration data
T. Alexander Popiel
popiel at wolfskeep.com
Mon Jun 9 15:09:03 EDT 2003
In message: <1055186806.42.2463 at sake.mondoinfo.com>
Matthew Dixon Cowles <matt at mondoinfo.com> writes:
>
>It's a combination. Four scores were computed for each message and
>the within-two-weeks, within-one-week, and within-24-hours scores
>were compared to the within-30-days score. Presumably, the larger
>differences are from the comparisons with the results that use the
>shorter cutoffs.
I've seen strange things come out of the data wherein more training
made things worse, so I wouldn't take the above presumption on faith. ;-)
>I've thought a bit of that. It might be useful to bias the delete
>function toward retaining a token that hadn't been used for longer
>periods as a function of hamcount+spamcount. Some more work could
>determine if that's a valuable strategy.
*nod* This is why I was leaning towards average appearance
frequency over the lifetime of the token instead of time since
last use in my own thoughts. Of course, I still haven't done
anything beyond thought-experiments.
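
To make the two ideas concrete, here is a rough sketch, not code from spambayes. The TokenInfo record, the day-based timestamps, the logarithmic bias, and the 30-day and 0.05-per-day thresholds are all assumptions made up for illustration. The first function is the last-use rule with the cutoff stretched as hamcount+spamcount grows; the second is the average-appearance-frequency rule.

```python
import math
from dataclasses import dataclass


@dataclass
class TokenInfo:
    # Hypothetical per-token record; times are in days since some epoch.
    hamcount: int
    spamcount: int
    first_seen: float
    last_used: float


def expire_by_biased_age(tok, now, base_cutoff_days=30.0):
    """Last-use expiry, biased toward keeping heavily used tokens.

    The cutoff grows with log(1 + hamcount + spamcount); the logarithm
    is an arbitrary choice of bias, not something tested.
    """
    total = tok.hamcount + tok.spamcount
    cutoff = base_cutoff_days * (1.0 + math.log1p(total))
    return (now - tok.last_used) > cutoff


def expire_by_average_frequency(tok, now, min_per_day=0.05):
    """Expire tokens whose lifetime average appearance rate is too low."""
    total = tok.hamcount + tok.spamcount
    # Clamp the lifetime to one day so brand-new tokens are not
    # divided by a near-zero age.
    lifetime_days = max(now - tok.first_seen, 1.0)
    return (total / lifetime_days) < min_per_day


rare = TokenInfo(hamcount=1, spamcount=0, first_seen=0.0, last_used=0.0)
common = TokenInfo(hamcount=50, spamcount=50, first_seen=0.0, last_used=0.0)
for name, tok in (("rare", rare), ("common", common)):
    print(name,
          expire_by_biased_age(tok, now=60.0),
          expire_by_average_frequency(tok, now=60.0))
```

Whether either rule actually helps is exactly the open question above; the sketch only shows that the two criteria need different bookkeeping (last use versus first appearance plus counts).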
>I agree but I meant something simpler than that. If I were on
>vacation for two weeks and therefore hadn't scored any messages in
>that time, it wouldn't make sense to expire my entire database.
I keep forgetting that some people score messages only when a
human is about to look at them. I've got my rules in procmail,
so even if I go on vacation, the scoring takes place.
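
For setups that score only on demand, one possible way around the vacation problem is to measure token age against the most recent scoring activity instead of the wall clock. This is only a sketch under that assumption; the stale_tokens helper, the dict of last-use times in days, and the 30-day cutoff are hypothetical.

```python
def stale_tokens(last_used, cutoff_days=30.0):
    """Return tokens unused for more than cutoff_days of *database* time.

    last_used maps token -> day of last use. The reference point is the
    newest last-use time in the database, not the current date, so a
    gap with no scoring at all ages nothing.
    """
    if not last_used:
        return []
    reference = max(last_used.values())
    return sorted(tok for tok, day in last_used.items()
                  if reference - day > cutoff_days)


# The database was last touched on day 100; today's date is irrelevant.
db = {"viagra": 100.0, "meeting": 95.0, "oldword": 60.0}
print(stale_tokens(db))
```

With this rule a two-week break leaves the result unchanged, since nothing in the database moved while no messages were scored.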
- Alex