[Tutor] Textual Captchas: Would like to show that bots can crack a simple text captcha

Luke Paireepinart rabidpoobear at gmail.com
Wed Aug 1 13:32:17 CEST 2007


Kyle Brooks wrote:
> Hi.
>
> My name is Kyle Brooks, and I hereby introduce myself with this post.
>
> I have seen a lot of text captchas. A text captcha is the logical
> opposite of an image captcha. That is, among other qualities: image
> captchas are inaccessible, while text captchas are accessible; and image
> captchas cannot be easily read by bots, while text captchas can, because
> they are clearly embedded in forms for all to see.
>
> Therein lies the problem. You see, text captchas can be insecure! But
> there are too many people that think text captchas are secure, that
> they are a panacea. So, I would like to show that a bot can crack a
> simple textual captcha that involves math, like 1 + 1.
>   
Essentially, if you generate these questions programmatically and
they're all math-based, then you're right: this is trivially easy to
crack.
If, however, you mix in questions such as "What color is the sky?" or
"Do cats meow or bark?" with the math questions, they become slightly
harder to crack.
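To make "trivially easy" concrete, here is a minimal solver sketch in Python 3. It assumes the math part of the question is non-negative integers joined by +, -, * or / (the regular expression is my assumption about the question format, not something every site follows). It walks the parsed expression with the ast module instead of calling eval(), so a hostile page can't make the bot run arbitrary code, and it returns None for non-math questions, which is exactly where mixing in other questions starts to help.

```python
import ast
import operator
import re

# Map AST operator node types to the arithmetic they stand for.
OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

# Assumed format: integers joined by + - * / with optional spaces, e.g. "3 + 4 * 2".
EXPR_RE = re.compile(r"\d+(?:\s*[-+*/]\s*\d+)+")


def _eval(node):
    """Evaluate a parsed expression, allowing only integers and + - * /."""
    if isinstance(node, ast.Expression):
        return _eval(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, int):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        return OPS[type(node.op)](_eval(node.left), _eval(node.right))
    raise ValueError("unsupported expression")


def solve(question):
    """Find the arithmetic in a captcha question and evaluate it without eval()."""
    match = EXPR_RE.search(question)
    if match is None:
        return None  # not a math question; a bot would need another strategy
    return _eval(ast.parse(match.group(), mode="eval"))
```

For example, solve("What is 1 + 1?") gives 2, while solve("What color is the sky?") gives None.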
> I want to generate a captcha, show it, and have the user (in this
> case, me) submit the answer.
You could write a simple CGI script to do this.  You'll obviously need
a web server with CGI enabled running on your local machine.
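A CGI script is just a program the web server runs per request: GET form data arrives in the QUERY_STRING environment variable, and whatever the script prints (headers, a blank line, then the page) goes back to the browser. Here is a minimal Python 3 sketch using only the standard library (it avoids the old cgi module, which was removed in Python 3.13). The SECRET value and the field names answer and token are hypothetical. Because each CGI request is a fresh process, the page carries an HMAC of the answer rather than the answer itself, so the script can check a submission without storing state; tokens can be replayed, which is acceptable for a demo.

```python
#!/usr/bin/env python3
import hashlib
import hmac
import os
import random
from urllib.parse import parse_qs

SECRET = b"change-me"  # hypothetical server-side secret


def sign(answer):
    """HMAC of the answer, so the page never contains the answer itself."""
    return hmac.new(SECRET, str(answer).encode(), hashlib.sha256).hexdigest()


def new_question():
    """Return a (question text, integer answer) pair such as ("What is 3 + 4?", 7)."""
    a, b = random.randrange(1, 30), random.randrange(1, 30)
    op = random.choice(["+", "-", "*"])
    answer = {"+": a + b, "-": a - b, "*": a * b}[op]
    return "What is %d %s %d?" % (a, op, b), answer


def render_form(question, token):
    # method="get" with no action: the browser submits back to this same URL.
    return ('<form method="get">'
            '<p id="captcha">%s</p>'
            '<input type="text" name="answer">'
            '<input type="hidden" name="token" value="%s">'
            '<input type="submit"></form>' % (question, token))


def check(query_string):
    """True if the submitted answer matches the signed token."""
    params = parse_qs(query_string)
    answer = params.get("answer", [""])[0].strip()
    token = params.get("token", [""])[0]
    return hmac.compare_digest(sign(answer).encode(), token.encode())


def main():
    qs = os.environ.get("QUERY_STRING", "")
    print("Content-Type: text/html")
    print()  # blank line ends the CGI headers
    if "token" in parse_qs(qs):
        print("YOU SOLVED IT" if check(qs) else "WRONG")
    else:
        question, answer = new_question()
        print(render_form(question, sign(answer)))


if __name__ == "__main__":
    main()
```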
>  Then I would like to write an automated
> bot that goes to the webpage, reads the captcha, evaluates it, puts it
> in, and submits it.
>
> I would like some suggestions on how to do both stages.
>   
First, install Apache.  Then get CGI working.
Then, figure out a way to generate your captchas and their answers.
Here's a very hackish and simple way to do this:

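import random

# Each template contributes one operand ('%s') followed by an operator.
ops = ['%s +', '%s -', '%s *', '%s /']
temp = ' '.join(random.choice(ops) for _ in range(random.randrange(3, 6)))
temp += ' %s'   # the final operand closes the expression

# One operand per placeholder. Start at 1 so '/' never divides by zero.
vals = [random.randrange(1, 30) for _ in range(temp.count('%s'))]
finalstr = temp % tuple(vals)
print("The captcha question:", finalstr)

# eval() is acceptable here only because we built finalstr ourselves.
# In Python 3, '/' is true division, so the answer may not be an integer.
finalanswer = eval(finalstr)
print("The answer should be:", finalanswer)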

I must confess, I woke up only a few minutes ago and didn't take the
time to come up with a good solution to your problem,
so there's almost certainly a better way to do this.  You might want to
wrap finalstr in some random text, such as
"what is the answer to " + finalstr + "?", so that your naysayers can't
say "but my captchas have other text!" etc.
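Wrapping the expression in varied phrasing could look like this minimal sketch; the TEMPLATES list and the wrap() helper are hypothetical, and you would add as many phrasings as you like.

```python
import random

# Hypothetical phrasings; each must contain exactly one '%s' for the expression.
TEMPLATES = [
    "What is the answer to %s?",
    "Please compute %s.",
    "Solve this: %s =",
]


def wrap(expr):
    """Embed an arithmetic expression in a randomly chosen question."""
    return random.choice(TEMPLATES) % expr
```

For example, wrap("3 + 4") might return "Please compute 3 + 4.".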

As for automating the reading of the captcha, use BeautifulSoup to parse
the HTML and figure out where on the page the captcha lies.
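With BeautifulSoup installed, pulling out the question can be as short as BeautifulSoup(html, "html.parser").find(id="captcha").get_text(), assuming the site puts the question in an element with id="captcha" (that id is hypothetical; you'd find the real one by viewing the page source). If you'd rather not install anything, here is a sketch of the same idea with the standard library's html.parser, which collects all the text inside the target element.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag, so they must not affect nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class CaptchaFinder(HTMLParser):
    """Collects the text inside the element whose id matches target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0   # > 0 while we are inside the target element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag not in VOID and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def extract_captcha(html, target_id="captcha"):
    """Return the visible text of the captcha element, or '' if not found."""
    finder = CaptchaFinder(target_id)
    finder.feed(html)
    return "".join(finder.parts).strip()
```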

Automating the form submission is pretty easy, especially if the form
uses GET rather than POST: you just submit by requesting the form's URL
with the form data attached to the end as a query string.
For example, if I wanted to submit the values "q = 2000" and "x = 100", I
would request this:
http://www.example.com/captcha.html?q=2000&x=100
In that case you don't even have to deal with the HTML.  Just use urllib
to load this web address and save the page locally,
so you can check whether it says "YOU SOLVED IT" or "YOU SUCK" or whatever.
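The GET submission described above might look like this in Python 3's urllib; build_submit_url() and submit() are hypothetical helper names. urlencode() takes care of escaping, so an answer containing spaces or symbols is still sent correctly.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_submit_url(form_action, fields):
    """Append form fields as a query string, as a browser does for method="get"."""
    return form_action + "?" + urlencode(fields)


def submit(form_action, fields):
    """Fetch the result page so it can be saved or searched for a success message."""
    with urlopen(build_submit_url(form_action, fields)) as response:
        return response.read().decode("utf-8", errors="replace")
```

For example, build_submit_url("http://www.example.com/captcha.html", {"q": 2000, "x": 100}) produces the URL shown above, and a bot could test "YOU SOLVED IT" in submit(...). If the form uses POST instead, passing the encoded fields as urlopen(url, data=urlencode(fields).encode()) sends them in the request body.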
> In advance, thanks for any help that you may give.
>
> - Kyle
>   
-Luke

