[spambayes-dev] correlated clues

Toby Dickenson tdickenson at geminidataloggers.com
Sat Jun 26 09:47:52 EDT 2004


I'm seeing a significant number of misclassified spams that come through 
mailing lists. If the original spam body is small then it doesn't generate 
enough tokens to outweigh those added by the mailing list. Manually removing 
the list-added tokens from the evidence causes the message to be firmly 
nailed as spam.

(To be fair, most of these small ones are viruses, not spam. But spambayes 
does a good job of classifying the viruses that I receive directly, rather 
than via a list.)

Example evidence below.

Has anyone implemented or tested any mechanism to inhibit these gangs of 
tokens?

X-Spambayes-Classification: ham; 0.25
X-Spambayes-Evidence: '*H*': 0.67; '*S*': 0.16; 'so?': 0.11;
        'header:Received:4': 0.15; 'subject:] ': 0.16; 'url:zope': 0.19;
        'sender:addr:zope.org': 0.19; 'zope': 0.20;
        'email addr:zope.org': 0.20; 'think': 0.20;
        'to:addr:zope.org': 0.21; 'subject:Zope': 0.21;
        'sender:no real name:2**0': 0.23; 'url:mailman': 0.24;
        'url:listinfo': 0.24; 'url:mail': 0.26; 'subject:[': 0.29;
        'maillist': 0.31; 'url:org': 0.31; 'header:Errors-To:1': 0.32;
        'content-disposition:inline': 0.33; 'reply-to:none': 0.34;
        'subject:!': 0.72; 'charset:windows-1252': 0.88;
        'from:addr:info': 0.93; 'message-id:@mail.zope.org': 0.94;
        'subject:you': 0.95;
        'content-type:application/x-zip-compressed': 0.98;
        'filename:fname piece:zip': 0.98
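
To make the effect concrete, here is a rough sketch of chi-squared
combining applied to the clues above, with and without the list-related
ones. It assumes the combining scheme is the usual chi-squared one, where
score = (S - H + 1) / 2. The names chi2q, combine, LIST_MARKERS and
is_list_token are made up for this illustration and are not spambayes
APIs, and the choice of which tokens count as "added by the list" is my
own guess. The probabilities are the rounded values from the header, so
the sketch is not expected to reproduce the reported H and S exactly.

```python
import math

def chi2q(x2, dof):
    """Upper tail of the chi-squared distribution, for even dof only."""
    assert dof % 2 == 0
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def combine(probs):
    """Combine per-token spam probabilities into one score in [0, 1]."""
    probs = list(probs)
    n = len(probs)
    if n == 0:
        return 0.5
    s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    h = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (s - h + 1.0) / 2.0

# Clues copied from the X-Spambayes-Evidence header (rounded values).
EVIDENCE = {
    'so?': 0.11, 'header:Received:4': 0.15, 'subject:] ': 0.16,
    'url:zope': 0.19, 'sender:addr:zope.org': 0.19, 'zope': 0.20,
    'email addr:zope.org': 0.20, 'think': 0.20, 'to:addr:zope.org': 0.21,
    'subject:Zope': 0.21, 'sender:no real name:2**0': 0.23,
    'url:mailman': 0.24, 'url:listinfo': 0.24, 'url:mail': 0.26,
    'subject:[': 0.29, 'maillist': 0.31, 'url:org': 0.31,
    'header:Errors-To:1': 0.32, 'content-disposition:inline': 0.33,
    'reply-to:none': 0.34, 'subject:!': 0.72, 'charset:windows-1252': 0.88,
    'from:addr:info': 0.93, 'message-id:@mail.zope.org': 0.94,
    'subject:you': 0.95,
    'content-type:application/x-zip-compressed': 0.98,
    'filename:fname piece:zip': 0.98,
}

# Hypothetical markers for tokens that the list software contributes.
LIST_MARKERS = ('zope', 'mailman', 'listinfo', 'maillist', 'errors-to',
                'subject:[', 'subject:] ')

def is_list_token(token):
    """True if the token looks like it came from the list, not the sender."""
    lowered = token.lower()
    return any(marker in lowered for marker in LIST_MARKERS)

score_all = combine(EVIDENCE.values())
score_no_list = combine(p for t, p in EVIDENCE.items() if not is_list_token(t))
print("all clues: %.2f   without list clues: %.2f" % (score_all, score_no_list))
```

With this particular guess at the list tokens, the score should move from
the ham side of 0.5 to the spam side. A broader marker set (for example,
also dropping 'url:mail' or 'reply-to:none') would move it further.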

-- 
Toby Dickenson


