Simple Python web proxy stalls for some web sites
Richie Hindle
richie at entrian.com
Thu Oct 7 09:24:41 EDT 2004
[Carl]
> I have written a simple web proxy using the Python standard library
> BaseHTTPRequestHandler. [...] some web sites simply seem to stall
> indefinitely (e.g. www.google.com).
By default, urllib2 sends "User-Agent: Python-urllib/x.y". Some
sites, Google included, reject this because they don't like to be
web-scraped.
You need to tell urllib2 to ditch its built-in User-Agent header and
pass through the one from the browser. I don't know how to do that off
the top of my head, but some Googling should soon find the answer. 8-)
I do know that there were some bugs in overriding that header that were
fixed in Python 2.3, so make sure you're using at least that version.
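
For what it's worth, here is a minimal sketch of the pass-through idea, not
a tested fix for Carl's proxy. It assumes Python 3, where urllib2 lives on
as urllib.request; build_upstream_request and browser_headers are
hypothetical names made up for illustration (in a BaseHTTPRequestHandler,
browser_headers would be self.headers).

```python
import urllib.request  # the Python 3 successor to urllib2


def build_upstream_request(url, browser_headers):
    """Build a Request that carries the browser's User-Agent, if it sent one.

    browser_headers is any mapping with a .get() method, for example
    self.headers inside a BaseHTTPRequestHandler (an assumption here).
    """
    request = urllib.request.Request(url)
    user_agent = browser_headers.get("User-Agent")
    if user_agent:
        # A header set on the Request itself takes precedence over the
        # opener's default "Python-urllib/x.y" value.
        request.add_header("User-Agent", user_agent)
    return request


if __name__ == "__main__":
    req = build_upstream_request(
        "http://www.example.com/",
        {"User-Agent": "Mozilla/5.0 (illustrative value)"},
    )
    # Request stores header names capitalized as "User-agent".
    print(req.get_header("User-agent"))
```

The proxy would then hand the returned object to urllib.request.urlopen()
in place of the bare URL. If the browser sent no User-Agent, the library
default is left alone.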
--
Richie Hindle
richie at entrian.com