[spambayes-dev] spammy subject lines

T. Alexander Popiel popiel at wolfskeep.com
Tue Oct 14 12:28:58 EDT 2003


In message:  <1ED4ECF91CDED24C8D012BCF2B034F13026F29A6 at its-xchg4.massey.ac.nz>
             "Tony Meyer" <tameyer at ihug.co.nz> writes:
>> > So, I (and anyone else ;) should do timcv.py,
>> That's right, and with -n10 if possible.
>
>Is there a recommended number of messages in each bucket (or min and max
>numbers)?  I think I remember seeing 500 mentioned at one point, but I
>can't remember where (and am too lazy to search).

Some of the original shootout tests were done with a minimum of 2000
ham and 2000 spam messages, each divided into 10 buckets (200 per
bucket). Of course, more is better, but since Tim said back then that
200 per bucket was a useful lower bound, I'll trust him. :-)
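The bucket arithmetic above (2000 messages of each class over 10 buckets
gives 200 per bucket) can be sketched in Python. This is a hypothetical
illustration only: split_into_buckets and min_messages_needed are made-up
names, not part of timcv.py or any other spambayes tool, and the real test
setup may lay out its data differently.

```python
# Hypothetical sketch: these helpers are NOT part of spambayes; they only
# illustrate the bucket arithmetic discussed in this thread.

MIN_PER_BUCKET = 200  # lower bound per bucket, as quoted in the thread


def split_into_buckets(messages, n=10):
    """Deal messages round-robin into n buckets of near-equal size."""
    buckets = [[] for _ in range(n)]
    for i, msg in enumerate(messages):
        buckets[i % n].append(msg)
    return buckets


def min_messages_needed(n=10, per_bucket=MIN_PER_BUCKET):
    """Messages of one class needed so each of n buckets gets per_bucket."""
    return n * per_bucket


ham = ["ham-%d" % i for i in range(2000)]
buckets = split_into_buckets(ham, n=10)
print([len(b) for b in buckets])  # ten buckets of 200 each
print(min_messages_needed(10))    # 2000
```

Ham and spam would each be split this way separately. Round-robin dealing
keeps bucket sizes within one message of each other; a real setup might
shuffle first or use another scheme.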

>Would a better move be to update cmp.py so that it does know about
>unsures?

+1

>I do *get* a lot of list mail, but I don't keep it around

Eh, am I the only one around here who never throws away mail?
I've got 75K+ messages, over 2/3 of them spam, collected
over the last year...

>Of course, my data doesn't really tell us anything until we can compare
>it to someone else's...hopefully the OP, at least, will give this a go.

Working on it.

- Alex


