[Tutor] Retrieving data from a web site

Dave Angel davea at davea.name
Sat May 18 04:49:56 CEST 2013


On 05/17/2013 07:57 PM, Phil wrote:
> I'd like to "download" eight digits from a web site where the digits are
> stored as individual graphics. Is this possible, using perhaps, one of
> the countless number of Python modules? Is this the function of a web
> scraper?
>

Anything's possible.  But if these "digits" are purposely hard to read,
perhaps to avoid spamming, as "captcha" pictures are, then the likelihood
of your reading them algorithmically is vanishingly small.

There are libraries to "scrape" textual information from the web page, 
no sweat.  But that information might not even point directly to the 8 
image files.  There could be many layers of indirection, through 
javascript and other tricks.
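
For the simple case, where the page's HTML names the image files
directly in <img> tags, a minimal sketch using only the standard library
might look like the following.  The sample markup, the example.com URL
and the names ImgCollector and find_image_urls are all made up for
illustration; a real page may build its image links in javascript, in
which case this finds nothing.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgCollector(HTMLParser):
    """Collect the src of every <img> tag, in document order."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths against the page's own URL.
                self.sources.append(urljoin(self.base_url, src))


def find_image_urls(html, base_url):
    """Return absolute URLs for all images named in the HTML text."""
    parser = ImgCollector(base_url)
    parser.feed(html)
    return parser.sources


# Hypothetical page fragment, standing in for the downloaded HTML.
SAMPLE = '<p><img src="/digits/a1.png"><img src="img/b2.gif" alt="x"></p>'

for url in find_image_urls(SAMPLE, "http://example.com/page/"):
    print(url)
```

On a live site you would fetch the HTML first (urllib.request.urlopen
is one way) and pass the decoded text to find_image_urls.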

But most importantly, if the images are deliberately distorted parodies
of digits, most of us would be stymied, and I don't know of any library
anywhere that's intended to "break" such coding.

So I'd recommend starting with the images themselves.  Visit the page in
a regular browser, use screen capture techniques to capture each of the
displayed images, and have at it.  If you have no luck with those,
there's no point in writing the other code, which could be anything from
easy to very hard.
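
If the images turn out to be plain, undistorted digits, one cheap
experiment is to check whether the site serves the very same file every
time a given digit appears.  The sketch below assumes exactly that, and
it is only an assumption: it fingerprints each saved image's bytes and
looks the fingerprint up in a table you build by hand from ten labelled
samples.  The function names and the fake byte strings are invented for
illustration, and none of this helps with captcha-style pictures, which
differ on every request.

```python
import hashlib


def fingerprint(image_bytes):
    """Return a hex digest identifying one exact image file."""
    return hashlib.sha256(image_bytes).hexdigest()


def build_table(labelled):
    """Map fingerprints to digits, given {digit: image_bytes} samples."""
    return {fingerprint(data): digit for digit, data in labelled.items()}


def read_digits(images, table):
    """Translate a sequence of image files to text; '?' marks unknowns."""
    return "".join(table.get(fingerprint(data), "?") for data in images)


# Fake stand-ins for the contents of saved image files (assumption:
# a real run would read these with open(path, "rb").read()).
samples = {"3": b"fake-bytes-for-3", "7": b"fake-bytes-for-7"}
table = build_table(samples)

captured = [b"fake-bytes-for-7", b"fake-bytes-for-3", b"something-else"]
print(read_digits(captured, table))
```

Any "?" in the result means the bytes didn't match a known sample, which
is itself a hint that the images are not as simple as hoped.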

-- 
DaveA

