[Baypiggies] web scraping best practice question
Andrew Dalke
dalke at dalkescientific.com
Tue Nov 3 23:04:12 CET 2009
On Nov 2, 2009, at 10:24 PM, Dennis Reinhardt wrote:
> 1) Save the pages you access so that if you need to re-parse, you
> have a local copy ... or you hit an error and need to reacquire.
For one project, what I did for this was set up a Squid caching proxy
and configure it to keep all pages for a few hours. (Since the client
selects it via http_proxy, it acts as a forward cache, not a reverse
proxy.) That way I could test nearly everything, including HTTP error
codes, without writing a separate file I/O interface and without
hitting the remote server hard while I was debugging things. The only
change in my code was setting http_proxy.
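A minimal sketch of the client side, assuming Squid is already running
on its default port 3128 on localhost (the host/port here are
illustrative, not from the original post):

```python
import os
import urllib.request

# Point all HTTP traffic at a local Squid cache (assumed to be at
# localhost:3128, Squid's default port). While it holds pages,
# repeated fetches during debugging come from the cache instead of
# the remote server.
os.environ["http_proxy"] = "http://localhost:3128"

# ProxyHandler() with no arguments picks up http_proxy from the
# environment, so the scraping code itself needs no other change.
proxy = urllib.request.ProxyHandler()
opener = urllib.request.build_opener(proxy)

def fetch(url):
    """Fetch a page through the caching proxy."""
    with opener.open(url) as resp:
        return resp.status, resp.read()
```

Because the proxy is selected purely through the environment, the same
scraper runs unmodified in production by simply not setting http_proxy.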
Andrew
dalke at dalkescientific.com