Simple Python web proxy stalls for some web sites

Richie Hindle richie at entrian.com
Fri Oct 8 05:20:29 EDT 2004


[Richie]
> By default, urllib2 specifies "User-Agent: Python-urllib/x.y".  Some
> sites, Google included, reject this because they don't like to be
> web-scraped.
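For readers on Python 3, where urllib2's opener machinery became `urllib.request`, you can inspect that default header yourself. This is a minimal sketch that assumes Python 3. It relies only on the fact that `build_opener()` returns an `OpenerDirector` whose `addheaders` list holds the headers added to every request. The exact version number printed depends on your interpreter.

```python
import urllib.request

# build_opener() returns an OpenerDirector; its addheaders list holds the
# headers it adds to every request, including the default User-agent.
opener = urllib.request.build_opener()
default_headers = dict(opener.addheaders)

# Prints something like "Python-urllib/3.12", the Python 3 analogue of
# urllib2's "Python-urllib/x.y".
print(default_headers["User-agent"])
```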

[Bryan]
> Google dis' Python?  No way!

Way, I'm afraid.

> I checked, and Google is answering in good faith.

Try doing an actual query:

>>> import urllib2
>>> f = urllib2.urlopen("http://www.google.com/")  # Works OK
>>> f = urllib2.urlopen("http://www.google.com/search?q=python")
Traceback (most recent call last):
[...]
urllib2.HTTPError: HTTP Error 403: Forbidden
>>>

This is probably not the problem you're facing right now, but it will be
a problem when you solve your current one.  8-)
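When you do get there, the usual fix is to send your own User-Agent instead of the default. A minimal sketch, again assuming Python 3's `urllib.request` (Python 2's `urllib2.Request` took a similar `headers` argument). The browser-style string and the `make_request` helper are hypothetical, chosen only for illustration.

```python
import urllib.request

# Hypothetical User-Agent string; any non-default value works for this demo.
BROWSER_UA = "Mozilla/5.0 (compatible; ExampleFetcher/0.1)"

def make_request(url, user_agent=BROWSER_UA):
    """Build a Request whose User-Agent replaces urllib's default (hypothetical helper)."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

req = make_request("http://www.google.com/search?q=python")

# Request stores header names via str.capitalize(), so the key is "User-agent".
print(req.get_header("User-agent"))
```

Passing `req` to `urllib.request.urlopen()` would send this header in place of "Python-urllib/x.y", because the opener only adds its default User-agent when the request does not already carry one. Whether a particular site then answers is still up to that site.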

-- 
Richie Hindle
richie at entrian.com



