[Spambayes] Problems with classifying as spam

Jesse Pelton jsp at PKC.com
Thu Feb 4 15:29:16 CET 2010


SpamBayes doesn't follow links (see http://spambayes.sourceforge.net/faq.html#will-show-spam-clues-notify-a-spammer-that-i-opened-their-message for a tangentially related discussion), but it does process message headers.  The headers carry a lot of useful information, and the tokens they produce are easy to mistake for something that came from a Web site.
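
To make that concrete, here is a minimal sketch of how header fields alone can produce clues that never appear in the visible body.  This is not SpamBayes's tokenizer: the function name `header_tokens`, the list of fields, the word pattern, and the sample message are all assumptions made up for illustration.

```python
import email
import re

def header_tokens(raw_message, fields=("from", "to", "received", "message-id")):
    """Hypothetical header tokenizer: emit one 'field:word' token per word.

    Only the headers are read; the body is never inspected and no
    network access takes place.
    """
    msg = email.message_from_string(raw_message)
    tokens = []
    for field in fields:
        # get_all() returns every occurrence of a header, or the default.
        for value in msg.get_all(field, []):
            for word in re.findall(r"[\w.-]+", value.lower()):
                tokens.append(f"{field}:{word}")
    return tokens

# Made-up message: the body holds nothing but a link.
raw = (
    "Received: from mail.example.net (unknown [203.0.113.7])\n"
    "From: Discount Pharmacy <sales@example.net>\n"
    "To: ocean@example.org\n"
    "Subject: hello\n"
    "\n"
    "http://example.com/\n"
)

tokens = header_tokens(raw)
for token in tokens:
    print(token)
```

Every token printed comes from a header line, so a clue list built this way can be long even when the body is a single URL.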

Unless you're willing to dive into the code and the math, I'd caution against trying to second-guess SpamBayes.  You're going to want it to behave rationally, and it doesn't (at least at the level you're looking at); it behaves statistically.  That's why the troubleshooting guide (http://spambayes.svn.sourceforge.net/viewvc/spambayes/trunk/spambayes/Outlook2000/docs/troubleshooting.html#Messages_have_incorrect_or_unexpected) suggests sending all the spam clues to the list when trying to understand why a given message isn't classified as expected.
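
As a rough illustration of what "statistically" means here, the sketch below combines per-clue spam probabilities with a chi-squared (Fisher-style) rule.  Treat it as a simplified sketch under stated assumptions rather than SpamBayes's actual scoring code: the names `chi2q` and `combine` and the sample probabilities are invented for the example, and every probability must lie strictly between 0 and 1.

```python
import math

def chi2q(x2, v):
    """Probability that a chi-squared variable with v degrees of freedom
    (v must be even) is at least x2."""
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def combine(probs):
    """Combine per-clue spam probabilities into one score in [0, 1].

    Assumes every probability is strictly between 0 and 1.
    """
    n = len(probs)
    # Evidence for spam: the product of (1 - p) is tiny when clues are spammy.
    s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    # Evidence for ham: the product of p is tiny when clues are hammy.
    h = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (s - h + 1.0) / 2.0

spammy = combine([0.99, 0.98, 0.97])   # three strong spam clues
mixed = combine([0.99, 0.01])          # one strong spam clue, one strong ham clue
print(spammy, mixed)
```

The point of the second call is that one strong ham clue can cancel one strong spam clue, leaving the score in the middle.  A single obviously spammy word does not decide the outcome; the whole clue list does, which is why the list needs all of it.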


-----Original Message-----
From: spambayes-bounces+jsp=pkc.com at python.org on behalf of Ocean
Sent: Thu 2/4/2010 8:58 AM
To: spambayes at python.org
Subject: [Spambayes] Problems with classifying as spam
 

	In addition to the startup problems, Spambayes is having problems
marking messages as spam.  



As an example, I received this email:

------------------------------

Subject: ***Discount_Viagra_VXPL_Percocet*_Adderall****

Body:

<URL Link>***Discount_Viagra_VXPL_Percocet*_Adderall****!
<Links to:> http://kashertqdum17.com/

------------------------------


	That's it.  The only text in the body of the message is that URL
link.  


There are two issues I see showing up:


1.  The subject and link text aren't being parsed properly.  Nowhere in the
spam clues are the words "viagra", "percocet", or "adderall" showing up.
The spam token involving the subject is "'subject:****'".  So, not only is
SpamBayes not treating the underscores as word separators, but it's not even
getting to the words, because it looks like it's getting choked up on the
asterisks.

2.  I've got a *lot* of tokens showing up in the Spam Clues that are nowhere
in the email itself.  I'm guessing that Spambayes is actually going to that
link and processing what's on the page, but if so, that's a big problem.
First of all, it gives the spammers more flexibility in trying to bypass
spambayes.  And second, if it's following links, then it's confirming to the
spammers that my email address is valid.  That's a huge no-no.  Spambayes
should not be following links at all, but should only look in the message
itself.
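
On issue 1, a clue such as 'subject:****' is what one would expect from a tokenizer that emits runs of punctuation as tokens of their own and treats the underscore as a word character.  The sketch below assumes exactly that; the patterns, the length cutoff `MAX_WORD_LEN`, and the placeholder token format are hypothetical and are not copied from SpamBayes.

```python
import re

WORD_RE = re.compile(r"\w+")            # \w matches letters, digits, and '_'
PUNCT_RUN_RE = re.compile(r"[^\w\s]+")  # runs of punctuation such as '****'
MAX_WORD_LEN = 12                       # hypothetical cutoff for "too long"

def subject_tokens(subject):
    """Hypothetical subject tokenizer, for illustration only."""
    tokens = []
    for word in WORD_RE.findall(subject):
        if len(word) <= MAX_WORD_LEN:
            tokens.append("subject:" + word.lower())
        else:
            # Over-long words are replaced by a coarse placeholder that
            # records only the first character and the rounded-down length.
            tokens.append("subject:skip:%s %d" % (word[0], len(word) // 10 * 10))
    for run in PUNCT_RUN_RE.findall(subject):
        tokens.append("subject:" + run)
    return tokens

tokens = subject_tokens("***Discount_Viagra_VXPL_Percocet*_Adderall****")
for token in tokens:
    print(token)
```

Under these assumptions the underscores glue "Discount_Viagra_VXPL_Percocet" into one 29-character word, which is then reduced to a placeholder, so "viagra" never becomes a token of its own while the asterisk runs do.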



