Simple Python web proxy stalls for some web sites

Richie Hindle richie at entrian.com
Thu Oct 7 09:24:41 EDT 2004


[Carl]
> I have written a simple web proxy using the Python standard library
> BaseHTTPRequestHandler.  [...]  some web sites simply seem to stall
> indefinitely (e.g. www.google.com).

By default, urllib2 specifies "User-Agent: Python-urllib/x.y".  Some
sites, Google included, reject this because they don't like to be
web-scraped.

You need to tell urllib2 to ditch its built-in User-Agent header and
pass through the one from the browser.  I don't know how to do that off
the top of my head, but some Googling should soon find the answer.  8-)

I do know that there were some bugs in this that were fixed in Python
2.3, so make sure you're using at least that version.
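
As a rough sketch of the pass-through idea above, and not a tested fix
for Google specifically: a header set explicitly on a urllib2 Request
takes precedence over the opener's default "Python-urllib/x.y".  The
helper name build_upstream_request is made up for illustration, and the
import fallback assumes that on Python 3 the same Request API lives in
urllib.request.

```python
try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3: same Request API


def build_upstream_request(url, browser_headers):
    """Hypothetical helper: build a Request carrying the browser's User-Agent.

    browser_headers is anything with a dict-style .get(), such as the
    handler's self.headers or a plain dict.
    """
    request = urllib2.Request(url)
    user_agent = browser_headers.get("User-Agent")
    if user_agent:
        # A header set on the Request itself is used instead of the
        # opener's default "Python-urllib/x.y" value.
        request.add_header("User-Agent", user_agent)
    return request
```

Inside the handler's do_GET, something like
urllib2.urlopen(build_upstream_request(self.path, self.headers)) would
then forward the browser's own User-Agent, assuming self.path holds the
full URL as it normally does for a proxied request.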

-- 
Richie Hindle
richie at entrian.com



