[spambayes-dev] correlated clues
Toby Dickenson
tdickenson at geminidataloggers.com
Sat Jun 26 09:47:52 EDT 2004
I'm seeing a significant number of misclassified spams that come through
mailing lists. If the original spam body is small then it doesn't generate
enough tokens to outweigh those added by the mailing list. Manually removing
the list-added tokens from the message causes it to be firmly nailed as spam.
(To be fair, most of these small ones are viruses, not spams. But spambayes
does a good job of classifying those viruses that I receive directly, rather
than via a list.)
Example evidence below.
Has anyone implemented or tested any mechanism to inhibit these gangs of
tokens?
X-Spambayes-Classification: ham; 0.25
X-Spambayes-Evidence: '*H*': 0.67; '*S*': 0.16; 'so?': 0.11;
    'header:Received:4': 0.15; 'subject:] ': 0.16; 'url:zope': 0.19;
    'sender:addr:zope.org': 0.19; 'zope': 0.20;
    'email addr:zope.org': 0.20; 'think': 0.20;
    'to:addr:zope.org': 0.21; 'subject:Zope': 0.21;
    'sender:no real name:2**0': 0.23; 'url:mailman': 0.24;
    'url:listinfo': 0.24; 'url:mail': 0.26; 'subject:[': 0.29;
    'maillist': 0.31; 'url:org': 0.31; 'header:Errors-To:1': 0.32;
    'content-disposition:inline': 0.33; 'reply-to:none': 0.34;
    'subject:!': 0.72; 'charset:windows-1252': 0.88;
    'from:addr:info': 0.93; 'message-id:@mail.zope.org': 0.94;
    'subject:you': 0.95;
    'content-type:application/x-zip-compressed': 0.98;
    'filename:fname piece:zip': 0.98
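
To make the effect concrete, here is a rough sketch that recombines the
clues above with and without the list-related ones. It assumes the usual
chi-squared combining, score = (S - H + 1) / 2, which is consistent with
the *H*, *S* and 0.25 values in the header. The names combine,
drop_list_clues and LIST_MARKERS are made up for this illustration and
are not part of spambayes, and the marker list is only my guess at which
tokens the list adds. The probabilities in the header are rounded, so the
numbers will not match the real classifier exactly.

```python
import math

# Clues copied from the X-Spambayes-Evidence header (rounded values).
CLUES = [
    ("so?", 0.11), ("header:Received:4", 0.15), ("subject:] ", 0.16),
    ("url:zope", 0.19), ("sender:addr:zope.org", 0.19), ("zope", 0.20),
    ("email addr:zope.org", 0.20), ("think", 0.20),
    ("to:addr:zope.org", 0.21), ("subject:Zope", 0.21),
    ("sender:no real name:2**0", 0.23), ("url:mailman", 0.24),
    ("url:listinfo", 0.24), ("url:mail", 0.26), ("subject:[", 0.29),
    ("maillist", 0.31), ("url:org", 0.31), ("header:Errors-To:1", 0.32),
    ("content-disposition:inline", 0.33), ("reply-to:none", 0.34),
    ("subject:!", 0.72), ("charset:windows-1252", 0.88),
    ("from:addr:info", 0.93), ("message-id:@mail.zope.org", 0.94),
    ("subject:you", 0.95),
    ("content-type:application/x-zip-compressed", 0.98),
    ("filename:fname piece:zip", 0.98),
]

# Hypothetical substrings marking a clue as "added by the list".
LIST_MARKERS = ("zope", "mailman", "listinfo", "maillist",
                "errors-to", "subject:[", "subject:]")


def chi2q(x2, v):
    """Chi-squared survival function for an even number of degrees v."""
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def combine(probs):
    """Combine per-token spam probabilities into one score in [0, 1]."""
    n = len(probs)
    if n == 0:
        return 0.5
    s_stat = -2.0 * sum(math.log(1.0 - p) for p in probs)
    h_stat = -2.0 * sum(math.log(p) for p in probs)
    spam_ind = 1.0 - chi2q(s_stat, 2 * n)   # the *S* value
    ham_ind = 1.0 - chi2q(h_stat, 2 * n)    # the *H* value
    return (spam_ind - ham_ind + 1.0) / 2.0


def drop_list_clues(clues):
    """Remove every clue whose token contains one of the list markers."""
    return [(tok, p) for tok, p in clues
            if not any(mark in tok.lower() for mark in LIST_MARKERS)]


if __name__ == "__main__":
    kept = drop_list_clues(CLUES)
    print("all clues:      %.2f" % combine([p for _, p in CLUES]))
    print("list clues cut: %.2f" % combine([p for _, p in kept]))
```

With all 27 clues the score stays on the ham side of 0.5. Dropping the 13
clues that match the markers moves it well onto the spam side, even though
that filter also removes one spammy clue (the message-id one). A real
mechanism would need something better than a hand-written marker list.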
--
Toby Dickenson