[spambayes-dev] siickkk and deprrravved stufff totallly grossssse

Tim Peters tim.one at comcast.net
Mon Dec 22 12:50:25 EST 2003


[Glenn Brown]
>> 'from:addr:aloktorvaldis.com'       0.155172            1      0
>> 'from:addr:ger'                     0.155172            1      0
>> 'from:name:detractors m. tinnier'   0.155172            1      0
>> 'heyyouguys,'                       0.155172            1      0
>> 'huunngg'                           0.155172            1      0
>> 'message-id:@aloktorvaldis.com'     0.155172            1      0
>> 'reply-to:addr:aloktorvaldis.com'   0.155172            1      0
>> 'subject:giirrllss'                 0.155172            1      0
>> 'subject:soak'                      0.155172            1      0
>> 'subject:squiiirrrtt'               0.155172            1      0
>> 'url:aloktorvaldis'                 0.155172            1      0

[T. Alexander Popiel]
> Based on these clues, I'd say that you trained on one of these
> messages as ham.  That'll certainly encourage a ham classification
> for them.

Yup, looks certain -- or else Glenn makes some mighty fine distinctions
about which kinds of porn spam he *wants* to see <wink>.

This line:

'reply-to:addr:stacy'               0.290906            1      1

also tells us the database was trained on a lot more spam than ham (a token
appearing equally often in both ends up with a decidedly hammy spamprob).
Glenn, you should find that spambayes works better if you train on *less*
spam (or more ham -- the math works out best if you train on an
approximately equal number of each).  This database isn't wildly unbalanced,
but it's beyond the point where my classifier starts acting flaky.
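
To make the arithmetic behind that inference concrete, here is a rough sketch of how the imbalance can be read off the clue lines. It is not lifted from the classifier source. It assumes the default settings unknown_word_strength = 0.45 and unknown_word_prob = 0.5 (not stated anywhere in this thread), and it assumes the two trailing columns are ham count then spam count. The function names are made up for illustration.

```python
# Assumed defaults, not stated in the thread.
S = 0.45  # unknown_word_strength
X = 0.5   # unknown_word_prob


def spamprob(hamcount, spamcount, nham, nspam):
    """Adjusted spamprob for a token seen in hamcount of nham hams
    and spamcount of nspam spams (token must have been seen at least once)."""
    hamratio = hamcount / nham
    spamratio = spamcount / nspam
    prob = spamratio / (hamratio + spamratio)
    n = hamcount + spamcount
    # Blend the raw estimate with the prior X, weighted by S.
    return (S * X + n * prob) / (S + n)


def spam_to_ham_ratio(observed):
    """Invert spamprob for a token seen exactly once in ham and once in spam,
    returning the implied nspam / nham."""
    n = 2
    prob = (observed * (S + n) - S * X) / n
    # With equal counts, prob reduces to nham / (nham + nspam).
    return (1.0 - prob) / prob


if __name__ == "__main__":
    # A token seen in one ham and no spam; training totals don't matter here.
    print(round(spamprob(1, 0, 100, 100), 6))
    # Implied spam:ham ratio from the 'reply-to:addr:stacy' clue.
    print(round(spam_to_ham_ratio(0.290906), 2))
```

Under those assumptions the first call reproduces the 0.155172 shown for the ham-only tokens above, and the second suggests roughly three spam trained for each ham. With balanced training a token seen once on each side would come out at 0.5.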



