spam classification breaker

Tim Peters tim.one at comcast.net
Thu Feb 5 12:20:54 EST 2004


[Michael Hudson]
> I did wonder what the point of some of the stuff that ends up in my
> unsure folder was.  It seems so mashed up that even if I wanted to
> work out what the hell they were selling me I would have a hard time
> figuring it out.

Those usually aren't attempts to defeat Bayesian classifiers (and as Dr.
Graham-Cumming noted in the article, "word salad" isn't effective against
such classifiers).  They're usually attempts to frustrate fingerprinting
schemes -- lots of filters work by trying to measure the similarity of a new
message to a database of "fingerprints" derived from known spam.  Lots of
spam has always contained random gibberish strings to defeat the simplest
approaches of that kind.  A newer twist is adding random dictionary words
("word salad"), because some fingerprinting schemes got smart enough to
ignore non-dictionary words, or even to penalize a message for containing
gibberish.
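
To make that arms race concrete, here's a toy sketch.  It assumes a
"fingerprint" is nothing more than the set of words in a message, compared
by Jaccard similarity; real fingerprinting schemes are more elaborate, and
DICTIONARY, the sample messages, and every function name below are made up
for illustration.

```python
# Toy illustration only: a "fingerprint" here is just a set of words.
# DICTIONARY, the messages, and all names are hypothetical.

DICTIONARY = {
    "buy", "cheap", "pills", "now", "click", "here", "today",
    "garden", "violin", "harbor", "marble", "candle", "thunder",
}

def fingerprint(text, dictionary=None):
    """Return the set of lowercase words, optionally only dictionary words."""
    words = set(text.lower().split())
    if dictionary is not None:
        words &= dictionary
    return words

def similarity(a, b):
    """Jaccard similarity of two word sets (0.0 disjoint, 1.0 identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

known_spam = "buy cheap pills now click here today"
gibberish = known_spam + " xqzvk wplmf zzrtq jhgdd kqpwx bvnnm qqzzx"
word_salad = known_spam + " garden violin harbor marble candle thunder"

def score(message, dictionary=None):
    """Similarity of a new message to the known-spam fingerprint."""
    return similarity(fingerprint(known_spam, dictionary),
                      fingerprint(message, dictionary))

# Random gibberish dilutes a naive fingerprint match.
plain_gibberish = score(gibberish)
# Ignoring non-dictionary words restores the match.
filtered_gibberish = score(gibberish, DICTIONARY)
# Random dictionary words dilute it again, despite the filter.
filtered_salad = score(word_salad, DICTIONARY)

print(plain_gibberish, filtered_gibberish, filtered_salad)
```

In this toy setup the gibberish copy scores well below the original until
non-dictionary words are ignored, and the word-salad copy slips past that
fix because its padding is all real words.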

Another thing feeding into this is that the smaller spammers seem to have
trouble figuring out how to use their software.

...

>> I have to deduce *your* magic words, not mine.  I have to send email
>> to you, and deduce what you did and didn't look at.  This is an
>> expensive process for the spammer, of course.

> Surely there comes a point when just sending me mail selling something
> I actually want becomes cheaper...

Note that the article didn't say spammers *are* doing this; it's just
something they *could* do.  AFAIK, it's just a theoretical attack at this
point.  Sales is a percentage game, and spammers prosper now on response
rates so low that any increase in the cost of delivering spam (which is very
cheap per unit for them, but not free) hurts.  At some point it will indeed
become more profitable to use traditional targeted advertising, which is
much more expensive to produce (it's not cheap for me to guess what you'd
buy), but also enjoys much higher response rates.
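
The arithmetic behind that can be sketched in a few lines.  Every number
below is invented purely to show the shape of the tradeoff (none comes from
the article or from real spam data), and the function names are mine.

```python
# Hypothetical figures only; nothing here is measured data.

def profit_per_message(response_rate, revenue_per_sale, cost_per_message):
    """Expected profit from sending one message."""
    return response_rate * revenue_per_sale - cost_per_message

def break_even_response_rate(revenue_per_sale, cost_per_message):
    """Response rate at which expected profit per message is exactly zero."""
    return cost_per_message / revenue_per_sale

REVENUE = 20.0  # assumed revenue per sale

# Assumed bulk spam: 1 response per 100,000 messages, very cheap delivery.
spam_now = profit_per_message(0.00001, REVENUE, 0.0001)

# Same response rate, but delivery costs ten times as much.
spam_costlier = profit_per_message(0.00001, REVENUE, 0.001)

# Assumed targeted advertising: far costlier per message, far better response.
targeted = profit_per_message(0.05, REVENUE, 0.50)

print(spam_now, spam_costlier, targeted)
print(break_even_response_rate(REVENUE, 0.001))
```

With these made-up inputs the spam run is barely profitable at the low cost
and loses money once the per-message cost rises tenfold, while the targeted
campaign stays in the black despite costing far more per message.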




