Antispam measures circumventing
Jussi Piitulainen
jpiitula at ling.helsinki.fi
Fri Sep 20 12:23:29 EDT 2013
Jugurtha Hadjar writes:
> Supposing my name is John Doe and the e-mail is john.doe at hotmail.com,
> my e-mail was written like this:
>
> REMOVEMEjohn.doSPAMeSPAM at REMOVEMEhotmail.com
>
> With a note saying to remove the capital letters.
>
> Now, I wrote this :
>
> for character in my_string:
> ...     if (character == character.upper()) and (character != '@') and (character != '.'):
> ...         my_string = my_string.replace(character, '')
That does a lot of needless work, but I'll suggest other things
instead of expanding on this remark.
First, there's character.isupper() that will replace your entire
condition.
Second, there's ''.join(c for c in my_string if not c.isupper()).
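Put together, those two suggestions might look like the sketch below. The name strip_upper is only for illustration, and I have assumed the address really contains '@' where the list archive shows " at ".

```python
# The obfuscated address from the question, with '@' restored
# (assumption: the archive rewrote '@' as " at ").
obfuscated = 'REMOVEMEjohn.doSPAMeSPAM@REMOVEMEhotmail.com'

def strip_upper(text):
    """Return text without its uppercase letters (hypothetical helper)."""
    # isupper() is False for '@', '.', digits and spaces, so no
    # special cases are needed for them.
    return ''.join(c for c in text if not c.isupper())

cleaned = strip_upper(obfuscated)
print(cleaned)  # john.doe@hotmail.com
```

The string is walked once and a new one is built, rather than calling replace once per character.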
> And the end result was john.doe at hotmail.com.
>
> Is there a better way to do that ? Without using regular expressions
> (Looked *really* ugly and it doesn't really make sense, unlike the few
> lines I've written, which are obvious even to a beginner like me).
I don't see how you get to consider '[A-Z]' ugly. (Python's re module
doesn't have the named character classes like '[[:upper:]]' that would
do more than ASCII in some regexp systems.)
Third, here's a way - try help(str.translate) and help(str.maketrans)
or python.org for some details:
>>> from string import ascii_uppercase
>>> 'Ooh, CamelCase!'.translate(str.maketrans('', '', ascii_uppercase))
'oh, amelase!'
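Applied to the address from the question, the same idea could be sketched like this. DROP_UPPER and deobfuscate are names I made up here, and '@' is assumed where the archive shows " at ".

```python
from string import ascii_uppercase

# Build the translation table once. The third argument to
# str.maketrans lists the characters to delete.
DROP_UPPER = str.maketrans('', '', ascii_uppercase)

def deobfuscate(text):
    """Remove ASCII uppercase letters from text (hypothetical helper)."""
    return text.translate(DROP_UPPER)

address = deobfuscate('REMOVEMEjohn.doSPAMeSPAM@REMOVEMEhotmail.com')
print(address)  # john.doe@hotmail.com
```

Like '[A-Z]', this only removes ASCII capitals, since the table is built from ascii_uppercase.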
> I obviously don't like SPAM, but I just thought "If I were a spammer,
> how would I go about it".
>
> Eventually, some algorithm of detecting the
> john<dot>doe<at>hotmail<dot>com must exist.
>
> Also, what would in your opinion make it *harder* for a non-human to
> retrieve the original e-mail address? Maybe a function with no
> inverse function ? Generating an image that can't be converted back
> to text, etc..
Something meaningful: make it john.doeray at hotmail.com with a note to
"remove the female deer" for john.ray at hotmail.com, or "remove the drop
of golden sun" for "john.doe at hotmail.com". You may get a cease and
desist letter - much uglier than a simple regex - if you do literally
this, but you get the idea. I've seen people using "remove the animal"
or "remove the roman numeral".
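A rough sketch of why this is harder for a program, using the made-up address from above (with '@' assumed for " at "): stripping capitals leaves the posted address untouched, and which word to remove depends on understanding the note.

```python
posted = 'john.doeray@hotmail.com'

# The uppercase-stripping trick finds nothing to remove.
stripped = ''.join(c for c in posted if not c.isupper())
print(stripped == posted)  # True

# A reader who understands the note removes the right word.
without_deer = posted.replace('doe', '', 1)  # "remove the female deer"
without_sun = posted.replace('ray', '', 1)   # "remove the drop of golden sun"
print(without_deer)  # john.ray@hotmail.com
print(without_sun)   # john.doe@hotmail.com
```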
(Maybe put .invalid at the end. But I wish spam were effectively
against the law.)