[Mailman-Developers] Interesting study -- spam on posted addresses...

21 Feb 2002 13:23:48 +0900

>>>>> "Chuq" == Chuq Von Rospach <chuqui@plaidworks.com> writes:

    Chuq> On 2/20/02 1:37 PM, "Damien Morton"
    Chuq> <dm-temp-310102@nyc.rr.com> wrote:
    >> As far as I can see they are using url/cgi encoding in the
    >> email address. This is trivial to circumvent, as is using html
    >> entities, or any other reversible scheme.

    Chuq> With a constantly varying algorithm. So they obfuscate, but
    Chuq> they never obfuscate in a predictable way. Which means if
    Chuq> you're a spambot, you have to look at every byte of every
    Chuq> page and attempt to de-obfuscate it in every possible way to
    Chuq> see if it's obfuscated. You CAN do it, but you make it
    Chuq> computationally massively expensive.

Er, last I heard "massively expensive" ~ "exponential".  This is
O(n*m) where _n_ is the number of bytes and _m_ is the number of
obfuscations, and _m_ is bounded by user patience.
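
To make that bound concrete, here is a minimal sketch of what a
harvester might do: one roughly linear pass over the page per known
obfuscation.  The decoder table, the address regex, and the function
name `harvest` are all hypothetical, made up for illustration; I am
not claiming any real spambot or site works this way.

```python
import html
import re
from urllib.parse import unquote

# Hypothetical decoder table: each entry reverses one obfuscation scheme.
# A real harvester would presumably carry more entries; this is the "m".
DECODERS = [
    lambda s: s,      # no obfuscation at all
    unquote,          # url/cgi encoding, e.g. %40 -> @
    html.unescape,    # html entities, e.g. &#64; -> @
]

# Deliberately loose address pattern, good enough for a sketch.
ADDRESS = re.compile(r"[A-Za-z0-9._+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def harvest(page):
    """Return every address visible after applying each decoder once."""
    found = set()
    for decode in DECODERS:                 # m passes over the page...
        decoded = decode(page)              # ...each touching n bytes
        found.update(ADDRESS.findall(decoded))
    return found


page = "mail user%40example.com or admin&#64;example.org or plain@example.net"
addresses = harvest(page)
```

Adding another reversible scheme adds one entry to the table and one
more pass over the page, so the cost grows linearly in _m_, not
exponentially.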

Nor do the spammers need to deobfuscate all the obfuscations.  They
only need enough that they're getting a reasonable harvest rate.  But
the people who post to /. etc tend to be repeat offenders, and the
obfuscation is random.  So we lose as soon as the amount of address
content obfuscated in this way becomes noticeable.

And maybe before that, as many spammers seem to take address-hiding as
a personal offense, in the same way that crackers view passwords.

-- 
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
              Don't ask how you can "do" free software business;
              ask what your business can "do for" free software.