Python Web Servers and Page Retrievers

Max Erickson maxerickson at gmail.com
Wed Apr 11 20:31:07 EDT 2007


"Collin Stocks" <collinstocks at gmail.com> wrote:

> I tried it, and when I checked it through a proxy I saw that it
> didn't really work, at least in the versions that I have (urllib
> v1.17 and urllib2 v2.5). It just added that header onto the end,
> so the request carried two User-Agent headers, each with a
> different value. I might add that my script IS able to retrieve
> search pages from Google, whereas both urllibs get FORBIDDEN with
> the headers that they use.
> 

I don't know enough about either library to argue about it, but here
is what I get following the Dive Into Python example (but hitting
Google for a search):

>>> import urllib2
>>> opener=urllib2.build_opener()
>>> request=urllib2.Request('http://www.google.com/search?q=tesla+battery')
>>> request.add_header('User-Agent','OpenAnything/1.0 +http://diveintopython.org/')
>>> data=opener.open(request).read()
>>> data
'<html><head><meta http-equiv="content-type" content="text/html; 
charset=ISO-8859-1"><title>tesla battery - Google Search</title><
[snip rest of results page]

This is with Python 2.5 on Windows.
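
For what it's worth, here is a rough way to look at which User-Agent
values the request object itself is holding, without touching the
network. It is only a sketch: it inspects the Request, not the bytes
on the wire, so a proxy is still the real test. My understanding (an
assumption, not something I have traced through the source) is that
the opener only adds its own default User-agent when the request does
not already carry one. The try/except import is just there so the same
lines also run on Python 3, where the module is named urllib.request.

```python
try:
    import urllib2  # Python 2, as in the session above
except ImportError:
    import urllib.request as urllib2  # Python 3 name for the same module

request = urllib2.Request('http://www.google.com/search?q=tesla+battery')
request.add_header('User-Agent', 'OpenAnything/1.0 +http://diveintopython.org/')

# add_header() stores the header name capitalized, so 'User-Agent'
# is kept (and looked up) as 'User-agent'.
user_agent = request.get_header('User-agent')

# Collect every header on the request whose name is some spelling of
# User-Agent; a duplicate would show up here as a second entry.
agent_headers = [(name, value) for name, value in request.header_items()
                 if name.lower() == 'user-agent']

print(user_agent)
print(agent_headers)
```

If the list comes back with a single entry, the request object at
least is not carrying two values. A second header seen at the proxy
would then have to be coming from somewhere else.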


max



