help!! *extra* tricky web page to extract data from...

Steve Holden steve at holdenweb.com
Tue Mar 13 20:13:58 EDT 2007


Paul Rubin wrote:
> "Diez B. Roggisch" <deets at nospam.web.de> writes:
>> Obviously this wouldn't really help, as you can't predict which events
>> a website actually wants, or in which order. Especially if the site
>> does not _want_ to be scrapable: think of a simple "click on the images
>> in the order of the numbers shown on them" captcha.
> 
> Sure, but most sites don't go to such lengths, and even captchas can
> be defeated if you're trying to scrape a specific site and are willing
> to spend effort on the particular captcha generator that it uses.
> Plus there is always www.captchasolver.com (!).
> 
I especially like the terms and conditions they ask you to acknowledge if 
you want to sign up as a worker:

   http://www.captchasolver.com/join/worker#

regards
  Steve
-- 
Steve Holden       +44 150 684 7255  +1 800 494 3119
Holden Web LLC/Ltd          http://www.holdenweb.com
Skype: holdenweb     http://del.icio.us/steve.holden
Blog of Note:          http://holdenweb.blogspot.com
See you at PyCon?         http://us.pycon.org/TX2007



