stealth screen scraping with python?

kyosohma at gmail.com
Fri May 11 15:43:35 EDT 2007


On May 11, 2:32 pm, different.eng... at gmail.com wrote:
> Folks:
>
> I am screen scraping a large volume of data from Yahoo Finance each
> evening, and parsing with Beautiful Soup.
>
> I was wondering if anyone could give me some pointers on how to make
> it less obvious to Yahoo that this is what I am doing, as I fear that
> they probably monitor for this type of activity, and will soon ban my
> IP.
>
> -DE

Depends on what you're doing exactly. I've done something like this
and it only hits the page once:

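from urllib.request import urlopen  # Python 2: from urllib import urlopen

URL = 'http://quote.yahoo.com/d/quotes.csv?s=%s&f=sl1c1p2'
TICKS = ('AMZN', 'AMD', 'EBAY', 'GOOG', 'MSFT', 'YHOO')
u = urlopen(URL % ','.join(TICKS))  # one request covers every ticker
for data in u:
    # each row: symbol, last price, change, percent change
    tick, price, chg, per = data.decode().strip().split(',')
    # do something with data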
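
If the rows come back with quoted fields, a plain split(',') leaves the quotes in the strings. Here is a rough sketch of parsing the rows with the csv module instead. The parse_quotes name and the sample rows are made up for illustration, and the exact response format is an assumption, so adjust to whatever the server really returns.

```python
import csv
import io

def parse_quotes(text):
    """Parse CSV quote rows into (symbol, price, change, percent) tuples."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if len(fields) != 4:
            continue  # skip blank or malformed rows
        tick, price, chg, per = fields
        rows.append((tick, float(price), float(chg), per))
    return rows

# Made-up sample input for illustration, not real market data.
SAMPLE = '"AAA",10.50,+0.25,"+2.44%"\n"BBB",20.00,-1.00,"-4.76%"\n'
```

Calling parse_quotes(SAMPLE) gives one tuple per row with the quotes stripped and the numeric fields converted to floats.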

If you grab all the data in one fell swoop (which is what you
should aim for), it's harder for Yahoo! to tell what you're doing.
And I can't see why they'd care, since a single request is all a
browser makes anyway. It's hitting the site a bunch of times in a
short period that sets off the alarms.
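
For a ticker list too long for one request, the same idea can be stretched into a few batched requests with a pause in between. This is only a sketch: the helper names, the batch size of 50 and the 5 second pause are arbitrary assumptions, not known Yahoo limits.

```python
import time
from urllib.request import urlopen

URL = 'http://quote.yahoo.com/d/quotes.csv?s=%s&f=sl1c1p2'  # same URL as above

def batches(tickers, size):
    """Split a list of tickers into consecutive groups of at most `size`."""
    return [tickers[i:i + size] for i in range(0, len(tickers), size)]

def batch_urls(tickers, size=50):
    """Build one URL per batch instead of one URL per ticker."""
    return [URL % ','.join(group) for group in batches(list(tickers), size)]

def fetch_all(tickers, size=50, pause=5.0, opener=urlopen):
    """Fetch every batch, sleeping `pause` seconds between requests."""
    pages = []
    for n, url in enumerate(batch_urls(tickers, size)):
        if n:
            time.sleep(pause)  # spread the requests out over time
        pages.append(opener(url).read())
    return pages
```

Something like fetch_all(TICKS) would then make one request per batch rather than one per symbol. The opener argument is there so the function can be tried with a stand-in object instead of the real network.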

Mike



