[Tutor] urllib2.urlopen()

Sun Oct 14 08:15:33 CEST 2012

On 10/13/2012 07:50 PM, Steven D'Aprano wrote:
> On 14/10/12 12:45, Ray Jones wrote:
>> On 10/13/2012 05:09 PM, Brian van den Broek wrote:
>>> On 13 October 2012 19:44, Ray Jones<crawlzone at gmail.com>  wrote:
>>>> I am attempting to capture url headers and have my script make
>>>> decisions based on the content of those headers.
>>>>
>>>> Here is what I am using in the relevant portion of my script:
>>>>
>>>> try:
>>>>      urllib2.urlopen('http://myurl.org')
>>>> except urllib2.HTTPError, e:
>
> Well, in this case, for that URL, the connection succeeds without
> authentication. It might help if you test with a URL that actually
> fails :)
Ya think? ;))
>
>>>> In the case of authentication error, I can print e.info() and get all
>>>> the relevant header information. But I don't want to print.
>
> Then don't.
>
> If you can do `print e.info()`, then you can also do `info = e.info()`
> and inspect the info programmatically.
>
One would expect that to be true. But when I do info = e.info(), info is
<httplib.HTTPMessage instance at 0x85bdd2c>.

When I print e.info(), I get the following:

Content-Type: text/html
Connection: close
WWW-Authenticate: Basic realm="xxxx"
Content-Length: xx

I can iterate through e.info() with a 'for' loop, but all I get as a
result is:

connection
content-type
www-authenticate
content-length

In other words, I get the header names but not the corresponding values.

The same also happens if I iterate through e.headers.
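
One approach that may work here, offered as a sketch under assumptions rather than a confirmed fix: the object returned by e.info() behaves like a mapping, so items() and get() should return the values as well as the names. The code below is written for Python 3 (urllib.error.HTTPError in place of urllib2.HTTPError) and builds a fake 401 error so that it runs without a network connection. make_fake_401 and headers_as_dict are hypothetical helper names used only for illustration; they are not part of urllib2.

```python
import io
from email.message import Message
from urllib.error import HTTPError   # Python 2: urllib2.HTTPError


def make_fake_401(url="http://myurl.org"):
    """Hypothetical helper: build a 401 error offline, for illustration only."""
    hdrs = Message()
    hdrs["Content-Type"] = "text/html"
    hdrs["Connection"] = "close"
    hdrs["WWW-Authenticate"] = 'Basic realm="xxxx"'
    return HTTPError(url, 401, "Unauthorized", hdrs, io.BytesIO(b""))


def headers_as_dict(err):
    """Hypothetical helper: map lowercased header names to their values."""
    return {name.lower(): value for name, value in err.info().items()}


err = make_fake_401()
info = headers_as_dict(err)

# Decide on the header content instead of printing it.
if err.code == 401 and "www-authenticate" in info:
    challenge = info["www-authenticate"]   # the value, not just the name
else:
    challenge = None

# A single header can also be looked up directly; the lookup ignores case.
same_challenge = err.info().get("WWW-Authenticate")
```

In the script quoted above, the same calls would go inside the `except urllib2.HTTPError, e:` block, for example `e.info().items()` or `e.info().get('WWW-Authenticate')`. This assumes Python 2's httplib.HTTPMessage exposes the same mapping methods, which it inherits from rfc822.Message.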

> but unfortunately the docs are rather sparse. In this case, I strongly
> recommend the "urllib2 missing manual":
>
> http://www.voidspace.org.uk/python/articles/urllib2.shtml
>
I checked out this site, but it didn't have any more information than
I had already found on another site.

Any further suggestions?

Ray