[Baypiggies] web scraping best practice question
Isaac
hyperneato at gmail.com
Mon Nov 2 20:22:11 CET 2009
Hello Baypiggies.
I wrote a Python script to send a query to a single website. I am
curious: what is the best practice for the rate of sending requests
when scraping a single site? I'll have about 4000 requests.
I thought about _politely_ writing:
import random
import time

for x in large_query_list:
    send_scrap_query(x)
    # sleep a random whole number of seconds, 1 to 5 inclusive
    t = random.randint(1, 5)
    time.sleep(t)
to pause for a pseudo-random duration (1 to 5 seconds) between requests,
so I don't put strain on anyone's network. Does anyone have
recommendations for best practices regarding the rate of sending a set
of queries? I missed the web scraping talk from the beginning of the year.
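One common refinement of the loop above is a minimal sketch along these lines. It assumes Python 3.6+ and makes two changes. First, it honors a site's robots.txt `Crawl-delay` directive when one is present, falling back to a default otherwise. `Crawl-delay` is a nonstandard extension that some sites publish. Second, it enforces a minimum gap measured from the previous request, plus optional random jitter, so time spent waiting on a slow response counts toward the pause. The names `crawl_delay_from_robots` and `PoliteThrottle` are hypothetical helpers, not part of any library. `RobotFileParser.parse()` and `crawl_delay()` are real methods from the standard library.

```python
import random
import time
from urllib import robotparser


def crawl_delay_from_robots(robots_lines, user_agent, default=1.0):
    """Return the Crawl-delay (seconds) robots.txt asks of user_agent, else default.

    Hypothetical helper: robots_lines are the lines of an already-fetched robots.txt.
    """
    parser = robotparser.RobotFileParser()
    parser.parse(robots_lines)
    delay = parser.crawl_delay(user_agent)  # None if no Crawl-delay applies
    return float(delay) if delay is not None else default


class PoliteThrottle:
    """Hypothetical throttle: enforce min_interval seconds between requests, plus jitter.

    clock and sleep are injectable so the logic can be tested without real waiting.
    """

    def __init__(self, min_interval, max_jitter=0.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.max_jitter = max_jitter
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time the previous request was allowed to start

    def wait(self):
        """Block until the next request may start; return the seconds slept."""
        now = self._clock()
        slept = 0.0
        if self._last is not None:
            # Earliest allowed start: previous start + interval + random jitter.
            target = self._last + self.min_interval + random.uniform(0, self.max_jitter)
            slept = max(0.0, target - now)
            if slept:
                self._sleep(slept)
        self._last = self._clock()
        return slept
```

In a real run, you would fetch the site's robots.txt once, split it into lines, and pass them to `crawl_delay_from_robots`. Then you would build `PoliteThrottle(delay, max_jitter=2.0)` and call `throttle.wait()` immediately before each `send_scrap_query(x)`. The same `RobotFileParser` object also offers `can_fetch(user_agent, url)`, which is worth checking before querying a path at all.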
-Isaac