OT: spam filtering idea

Paul Wright -$P-W$- at verence.demon.co.uk
Tue Jan 14 08:27:49 EST 2003


In article <mailman.1042490665.8620.python-list at python.org>,
Tim Peters  <tim.one at comcast.net> wrote:
>[Paul Wright]
>> <http://www.jerf.org/irights/2002/11/18.html> argues that human malice
>> can and will defeat Bayesian filters, and that widespread adoption of
>> them will end up making spam harder to recognize by hand. 
...
>He's probably right that the way to beat this generation of filters is to
>create spam statistically indistinguishable from ham.  The unknown not
>addressed there is that all forms of advertising are a percentage game, and
>current spam uses (e.g.) ALL CAPS and huge fonts and bright colors because
>those tricks increase response rate.  Spam so bland that it looks like it
>came from your grandmother may not draw a response rate large enough to
>repay the costs of spamming (which, while tiny on a per-msg basis, aren't
>zero).

Indeed. However, I am seeing a lot of "minimalist" spam which is
obviously intended to evade body filtering: usually just a URL and a
hashbuster. I imagine that they're banking on people being curious
enough to click the link. I'm planning to deal with short spam like
this by looking up the IP address of the linked site's host in
blacklists, but it isn't yet enough of a problem to worry about.
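
The blacklist lookup can be sketched as follows. This assumes a conventional DNS-based blacklist (DNSBL), where you reverse the IPv4 octets, append the list's zone, and treat a successful lookup as "listed". The zone dnsbl.example.org and all helper names are hypothetical placeholders. The injectable resolve parameter exists only so the logic can be exercised offline.

```python
import re
import socket
from urllib.parse import urlparse

# Hypothetical blacklist zone; substitute a real DNSBL you trust.
DNSBL_ZONE = "dnsbl.example.org"

# Crude matcher for http(s) URLs in a plain-text body.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)

def url_hosts(body):
    """Return the distinct (lowercased) hostnames of http(s) URLs in body."""
    hosts = []
    for url in URL_RE.findall(body):
        host = urlparse(url).hostname
        if host and host not in hosts:
            hosts.append(host)
    return hosts

def dnsbl_query_name(ip, zone=DNSBL_ZONE):
    """Build the DNSBL query name: reversed IPv4 octets, then the zone."""
    return ".".join(reversed(ip.split("."))) + "." + zone

def is_listed(host, zone=DNSBL_ZONE, resolve=socket.gethostbyname):
    """Return True if the host's IPv4 address is listed in the DNSBL zone.

    A listed address resolves (conventionally to 127.0.0.x); an unlisted
    one fails with socket.gaierror. Hosts that do not resolve at all are
    also treated as unlisted here.
    """
    try:
        ip = resolve(host)
        resolve(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False
    return True
```

A hashbuster does not help the spammer here. The random text changes the message body, but it does not change the host being linked to.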

>> [1] How many boneheaded keyword filters will now bounce this post when
>> it goes out as mail on the python list, I wonder? There's an awful lot
>> of snake oil out there being sold as spam filters.
>
>I received it via the c.l.py mailing list gateway.  My personal spambayes
>filter gave your msg a score of 0.9998 for haminess (0.0 = spam, 1.0 = ham),

To be clear, I don't think the Bayesian stuff is snake oil. However, mailing
list operators often complain about broken filters which seem to operate on
single key phrases (such as "Viagra" or "my pictures") in isolation,
causing legitimate discussion to get filtered. Someone out there is
probably making money selling these filters to big business, alas. See
<http://groups.google.com/groups?selm=ahkddc%24b7a%241%40verence.demon.co.uk>
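
The following minimal Python sketch contrasts a single-phrase filter with a simplified Graham-style naive Bayes combination of per-token probabilities. It is not spambayes's actual scoring scheme. The token probabilities, tokenizer and function names are hypothetical and were chosen only to show the effect.

```python
import math
import re

# The kind of filter criticised above: one phrase decides the verdict.
BANNED_PHRASES = ("viagra", "my pictures")

def keyword_filter(text):
    """Return True (reject) if any banned phrase appears anywhere in text."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in BANNED_PHRASES)

def tokenize(text):
    """Lowercase word tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z0-9$']+", text.lower())

def combined_spam_probability(text, token_probs, unknown=0.5):
    """Simplified Graham-style combination of per-token spam probabilities.

    token_probs maps token -> P(spam | token); unseen tokens get `unknown`.
    Returns a value near 1.0 for spammy text and near 0.0 for hammy text.
    """
    log_odds = 0.0
    for tok in set(tokenize(text)):
        # Clamp so that no single token is absolute proof either way.
        p = min(max(token_probs.get(tok, unknown), 0.01), 0.99)
        log_odds += math.log(p) - math.log(1.0 - p)
    # Numerically stable logistic of the summed log-odds, which equals
    # prod(p) / (prod(p) + prod(1 - p)).
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)

# Hypothetical, hand-picked token probabilities for illustration only;
# a real filter learns these from its own training corpus.
HYPOTHETICAL_TOKEN_PROBS = {
    "viagra": 0.99, "cheap": 0.95,
    "python": 0.01, "discussion": 0.05, "mailing": 0.1, "list": 0.2,
}
```

Using those values, keyword_filter rejects a post such as "Mailing list filters that reject any post mentioning Viagra break Python discussion". The combined score for the same post falls well below 0.5, because the other words outweigh the single spammy token.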

-- 
Paul Wright | http://pobox.com/~pw201 |