[Baypiggies] web scraping best practice question

Isaac hyperneato at gmail.com
Mon Nov 2 20:22:11 CET 2009


Hello Baypiggies.

I wrote a Python script to send a query to a single website. I am
curious: what is the best practice for the rate of sending requests
when scraping a single site? I'll have about 4000 requests.
I thought about _politely_ writing:

import random
import time

for x in large_query_list:
    send_scrap_query(x)
    # Pause 1 to 5 seconds (inclusive) before the next request.
    t = random.randint(1, 5)
    time.sleep(t)

to pause for a pseudo-random duration between requests, so I don't
put strain on anyone's network. Does anyone have recommendations for
best practices regarding the rate of sending a set of queries? I missed
the talk about web scraping from the beginning of the year.
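
For a sense of scale: a pause drawn from 1 to 5 seconds averages 3
seconds, so 4000 requests means roughly 12,000 seconds of sleeping,
or about 3 hours 20 minutes. Below is a minimal sketch of the same
loop as a reusable function. The names polite_run and
estimated_sleep_seconds, and the fetch and sleep parameters, are
hypothetical and only for illustration; fetch stands in for whatever
actually sends the query. It also uses random.uniform rather than
randint, so the pauses are not whole seconds.

```python
import random
import time


def polite_run(queries, fetch, min_delay=1.0, max_delay=5.0, sleep=time.sleep):
    """Call fetch(query) for each query, pausing a random time between calls.

    fetch and sleep are passed in (an assumption of this sketch) so the
    loop can be tried out without touching the network.
    """
    results = []
    for i, query in enumerate(queries):
        if i > 0:
            # Pause only between requests, not before the first one.
            sleep(random.uniform(min_delay, max_delay))
        results.append(fetch(query))
    return results


def estimated_sleep_seconds(n_requests, min_delay=1.0, max_delay=5.0):
    """Expected total pause time: (n - 1) gaps at the mean delay."""
    return max(n_requests - 1, 0) * (min_delay + max_delay) / 2.0
```

Calling polite_run(large_query_list, send_scrap_query) would behave
much like the loop above, except that it does not sleep after the
final request.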

-Isaac

