[spambayes-dev] spammy subject lines
T. Alexander Popiel
popiel at wolfskeep.com
Tue Oct 14 12:28:58 EDT 2003
In message: <1ED4ECF91CDED24C8D012BCF2B034F13026F29A6 at its-xchg4.massey.ac.nz>
"Tony Meyer" <tameyer at ihug.co.nz> writes:
>> > So, I (and anyone else ;) should do timcv.py,
>> That's right, and with -n10 if possible.
>
>Is there a recommended number of messages in each bucket (or min and max
>numbers)? I think I remember seeing 500 mentioned at one point, but I
>can't remember where (and am too lazy to search).
Some of the original shootout tests were done with a minimum of 2000
messages each of ham and spam, divided into 10 buckets (200 per bucket).
Of course, more is better, but since Tim gave 200 per bucket as a useful
lower bound back then, I'll trust him. :-)
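
To make the bucket arithmetic concrete, here is a rough Python sketch of
splitting a corpus into 10 buckets and walking the cross-validation folds.
It is only an illustration and not how timcv.py is implemented; the names
split_into_buckets and cross_validation_folds, the fixed seed, and the
placeholder message ids are all made up for this example.

```python
import random


def split_into_buckets(messages, n_buckets, seed=0):
    """Shuffle the messages and deal them round-robin into n_buckets lists."""
    shuffled = list(messages)
    random.Random(seed).shuffle(shuffled)  # fixed seed: repeatable split
    return [shuffled[i::n_buckets] for i in range(n_buckets)]


def cross_validation_folds(buckets):
    """Yield (training, held_out) pairs, holding out one bucket at a time."""
    for i, held_out in enumerate(buckets):
        training = [msg
                    for j, bucket in enumerate(buckets) if j != i
                    for msg in bucket]
        yield training, held_out


# Placeholder ids standing in for 2000 real ham messages (hypothetical data).
ham = [f"ham-{i}" for i in range(2000)]
buckets = split_into_buckets(ham, 10)
sizes = [len(b) for b in buckets]  # 2000 messages / 10 buckets = 200 each
```

With 2000 messages and 10 buckets, each fold trains on the other nine
buckets (1800 messages) and tests on the remaining 200. The same split
would be done separately for the spam.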
>Would a better move be to update cmp.py so that it does know about
>unsures?
+1
>I do *get* a lot of list mail, but I don't keep it around
Eh, am I the only one around here who never throws away mail?
I've got 75K+ messages, over 2/3 of which are spam, collected
over the last year...
>Of course, my data doesn't really tell us anything until we can compare
>it to someone else's...hopefully the OP, at least, will give this a go.
Working on it.
- Alex