Antispam measures circumventing

Chris Angelico rosuav at gmail.com
Fri Sep 20 11:44:17 EDT 2013


On Sat, Sep 21, 2013 at 1:04 AM, Jugurtha Hadjar
<jugurtha.hadjar at gmail.com> wrote:
> Supposing my name is John Doe and the e-mail is john.doe at hotmail.com, my
> e-mail was written like this:
>
> REMOVEMEjohn.doSPAMeSPAM at REMOVEMEhotmail.com
>
> With a note saying to remove the capital letters.
>
> Now, I wrote this :
>
> for character in my_string:
> ...     if (character == character.upper()) and (character != '@') and (character != '.'):
> ...             my_string = my_string.replace(character,'')
>
>
> And the end result was john.doe at hotmail.com.
>
> Is there a better way to do that ?

Instead of matching the ones that are the same as their uppercase
version, why not instead keep the ones that are the same as their
lowercase?

>>> email = 'REMOVEMEjohn.doSPAMeSPAM at REMOVEMEhotmail.com'
>>> ''.join(filter(lambda x: x==x.lower(),email))
'john.doe at hotmail.com'

This could be a neat introduction to a functional style of code, if
you haven't already met it; use of filter and lambda expressions can
make for some beautifully expressive code.
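
If lambda looks odd at first, the same filter can be spelled as a
generator expression. This is only a sketch of that alternative, and
strip_uppercase is a name made up for the example, not anything
standard. Note that digits, spaces and punctuation survive either
version, since lower() leaves them unchanged.

```python
def strip_uppercase(text):
    """Keep every character that lower() leaves unchanged."""
    # Uppercase letters differ from their lowercase form, so they are dropped.
    # Lowercase letters, digits, spaces and punctuation all pass through.
    return ''.join(ch for ch in text if ch == ch.lower())


email = 'REMOVEMEjohn.doSPAMeSPAM at REMOVEMEhotmail.com'
print(strip_uppercase(email))
```

Which spelling reads better is mostly taste; the generator expression
avoids the lambda, while filter makes the "keep what passes this test"
intent explicit.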

> Also, what would in your opinion make it *harder* for a non-human
> to retrieve the original e-mail address? Maybe a function with no
> inverse function ? Generating an image that can't be converted back
> to text, etc..

Ah, now you're getting into the realm of CAPTCHAs. I'll be quite frank
with you: Don't bother. Many, many experts are already working on it,
with varying levels of success. Spammers keep getting better at
harvesting addresses and solving CAPTCHAs, while your legit users
don't, so you end up making it harder for the humans while it stays
possible for the bots. (Some CAPTCHAs are solved by simply farming the
jobs off to actual human beings, in China I think I heard, for a
pittance each. There's fundamentally no way to prevent that.) So your
options are:

1) Call on someone else's code. Search the internet for ways of
concealing email addresses, pick one that isn't too much hassle to
legit users, and use it. I've seen quite a few that put the email
address in an image, one way or another; they tend to be a bit
annoying, but some aren't too bad.

2) Give up on protecting your address, and protect your inbox instead.
Get some good spam filtering, and let 'em send it all at you. I run a
local mail server for a few domains, and even with the filter set
conservatively enough to all but eliminate false positives, we see
only a handful of false negatives (according to my logs, 182 emails
reported as spam this week, across all domains and all accounts - most
accounts see <10 a week, a couple of them see maybe 20-30). And again,
you can call on someone else to do the work for you - sending all your
mail to Gmail lets you take advantage of their filtering, for
instance.

But hey. If you want to play around with text processing, Python's a
good choice for it!

ChrisA


