[Spambayes] Randomized Spam Beating SpamBayes

Amedee Van Gasse amedee at amedee.be
Wed Oct 18 00:24:44 CEST 2006


On Tuesday 17-10-2006 at 16:12 [timezone -0600], Quinn wrote:
> > Sounds like something a "dissociated press" or other random text
> > generator created. Perhaps you know about the monkeys with a
> > typewriter? If you let a thousand monkeys press random keys on
> > a typewriter, eventually one of them will by accident write a
> > few lines from a Shakespeare sonnet. These random text
> > generators work in a similar way.
> 
> Interesting.  I hadn't realized that was being used to actually do anything;
> that's kind of cool.  Not sure if these are coming from that sort of thing,
> though.  There are references to specific websites and publications
> scattered around self-referentially.  I really think they're somehow farming
> real source and taking strings of variable length and just stringing them
> together.  It's a pretty good way to produce coherent-ish body text that
> doesn't read as gibberish from an electronic standpoint.
> 
> So, does this sort of thing defeat SpamBayes?  They're making it through the
> filter with great regularity, and have been for quite a while, so the
> algorithms haven't figured it out in several hundred messages.  Is there
> _any_ way to deal with it, in SB or any other filter other than sender
> black- or white lists?  
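
For what it's worth, the kind of generator described above (real source
text chopped up and restitched) can be sketched in a few lines. This is
only an illustration, assuming a word-level Markov chain in the spirit of
"dissociated press"; it is not a claim about how these particular spammers
work. The names build_chain and generate are made up for the example.

```python
import random
from collections import defaultdict


def build_chain(text, order=2):
    """Map each run of `order` consecutive words to the words seen after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return chain


def generate(chain, length=30, seed=None):
    """Walk the chain, restarting at a random state when a dead end is hit."""
    rng = random.Random(seed)
    states = sorted(chain)          # sorted so a fixed seed is reproducible
    state = rng.choice(states)
    out = list(state)
    while len(out) < length:
        followers = chain.get(state)
        if not followers:
            # Dead end: jump to another fragment of the source text.
            state = rng.choice(states)
            out.extend(state)
            continue
        out.append(rng.choice(followers))
        state = tuple(out[-len(state):])
    return " ".join(out[:length])
```

Every word in the output comes from real text, and short runs keep their
original order, which is why the result looks like ordinary prose to a
token-based filter.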

I suppose there must be some way, because I don't get them.
Your message with the example scored as unsure:

X-Spambayes-Classification: unsure; 0.79

If it hadn't included the typical SpamBayes mailing list headers, I'm
sure it would have gotten an even higher spam score.
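
The "unsure" verdict comes from comparing the score against two cutoffs.
A rough sketch, assuming cutoffs of 0.20 for ham and 0.90 for spam (I
believe these are the usual defaults, but check your own configuration).
classify is a hypothetical helper for illustration, not the SpamBayes API.

```python
def classify(score, ham_cutoff=0.20, spam_cutoff=0.90):
    """Turn a spam probability in [0, 1] into a three-way verdict.

    The cutoff values are assumptions for this example.
    """
    if score < ham_cutoff:
        return "ham"
    if score > spam_cutoff:
        return "spam"
    return "unsure"


print(classify(0.79))
```

With those cutoffs, 0.79 lands in the unsure band, fairly close to the
spam side.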

-- 
Amedee


