Network failure when using urllib2

Ravi Teja webraviteja at gmail.com
Mon Jan 8 19:15:39 EST 2007


jdvolz at gmail.com wrote:
> I have a script that uses urllib2 to repeatedly lookup web pages (in a
> spider sort of way).  It appears to function normally, but if it runs
> too long I start to get 404 responses.  If I try to use the internet
> through any other programs (Outlook, Firefox, etc.) it will also fail.
> If I stop the script, the internet returns.
>
> Has anyone observed this behavior before?  I am relatively new to
> Python and would appreciate any suggestions.
>
> Shuad

I am assuming that you are fetching the full page every little while.
You are not supposed to do that. The admin of the web site you are
constantly hitting has probably configured his server to block you
temporarily when that happens. But don't feel bad :-). This is a common
beginner's mistake.
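
The quickest thing you can do is put a pause between your requests.
This is just a rough sketch, not anything from your script -- the URLs
and the 5 second delay are placeholders you would adjust:

    import time
    import urllib2

    # placeholder URLs, substitute the pages your spider actually visits
    urls = ['http://example.com/page1', 'http://example.com/page2']

    for url in urls:
        html = urllib2.urlopen(url).read()
        # ... process html here ...
        time.sleep(5)  # pause so the server is not hammered continuously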

Read here on the proper way to do this:
http://diveintopython.org/http_web_services/review.html
See especially section 11.3.3, Last-Modified/If-Modified-Since, on the
next page.
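
If you want an idea of what that looks like with urllib2, here is a
minimal sketch of a conditional GET. The URL is a placeholder; the
article above covers this properly:

    import urllib2

    url = 'http://example.com/feed.xml'  # placeholder URL

    # First fetch: remember when the server says the page last changed.
    response = urllib2.urlopen(url)
    data = response.read()
    last_modified = response.info().getheader('Last-Modified')

    # Later fetches: send If-Modified-Since so the server can answer
    # with a short 304 Not Modified instead of the whole page.
    request = urllib2.Request(url)
    if last_modified:
        request.add_header('If-Modified-Since', last_modified)
    try:
        response = urllib2.urlopen(request)
        data = response.read()  # page changed, take the new copy
        last_modified = response.info().getheader('Last-Modified')
    except urllib2.HTTPError, e:
        if e.code == 304:
            pass  # page unchanged, keep using the old data
        else:
            raise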

Ravi Teja.



