[Python-bugs-list] [ python-Bugs-413135 ] urllib2 fails with proxy requiring auth
SourceForge.net
noreply@sourceforge.net
Wed, 12 Feb 2003 12:45:30 -0800
Bugs item #413135, was opened at 2001-04-02 15:14
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=413135&group_id=5470
Category: Python Library
Group: None
Status: Closed
Resolution: Fixed
Priority: 5
Submitted By: Paul Moore (pmoore)
Assigned to: Moshe Zadka (moshez)
Summary: urllib2 fails with proxy requiring auth
Initial Comment:
The following program:

import urllib2

proxy_info = {
    'user': 'my_name', 'pass': 'my_pass',
    'host': 'my-proxy', 'port': 80
}

# build a new opener that uses a proxy requiring authorization
proxy_support = urllib2.ProxyHandler(
    {"http": "http://%(user)s:%(pass)s@%(host)s:%(port)d" % proxy_info})
opener = urllib2.build_opener(proxy_support, urllib2.HTTPHandler)

# install it
urllib2.install_opener(opener)

f = urllib2.urlopen('http://www.python.org/')
print f.headers
print f.read()
fails with the following error on Python 2.1b2 (on Windows):

C:\Data>python21 proxy_auth.py
Traceback (most recent call last):
  File "proxy_auth.py", line 18, in ?
    f = urllib2.urlopen('http://www.python.org/')
  File "c:\applications\python21\lib\urllib2.py", line 135, in urlopen
    return _opener.open(url, data)
  File "c:\applications\python21\lib\urllib2.py", line 318, in open
    '_open', req)
  File "c:\applications\python21\lib\urllib2.py", line 297, in _call_chain
    result = func(*args)
  File "c:\applications\python21\lib\urllib2.py", line 823, in http_open
    return self.do_open(httplib.HTTP, req)
  File "c:\applications\python21\lib\urllib2.py", line 801, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error host not found>
A similar error occurred in beta 1, but this was
reported as bug 406683. The fix is in beta 2. I
applied the fix manually in beta 1, and it worked, so
I can only assume that something else changed in the
transition from beta 1 to beta 2, which broke this
again.
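[Editor's note: for readers on current Python, the same setup can be sketched with urllib.request, urllib2's Python 3 successor. The credential and proxy values below are the placeholders from the report, not a real proxy.]

```python
from urllib.request import ProxyHandler, build_opener, install_opener

# Placeholder values from the report; substitute real credentials.
proxy_info = {
    'user': 'my_name', 'pass': 'my_pass',
    'host': 'my-proxy', 'port': 80,
}

# Embed the credentials directly in the proxy URL, as the original
# script does.
proxy_url = 'http://%(user)s:%(pass)s@%(host)s:%(port)d' % proxy_info

# Build and install an opener that routes http:// requests via the proxy.
opener = build_opener(ProxyHandler({'http': proxy_url}))
install_opener(opener)

print(proxy_url)  # http://my_name:my_pass@my-proxy:80
```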
----------------------------------------------------------------------
Comment By: John J Lee (jjlee)
Date: 2003-02-12 20:45
Message:
Logged In: YES
user_id=261020
Well, I believe Paul when he says his proxy was doing this, but
AFAICS, RFC 2616 says this is incorrect:
(RFC 2616, p. 128):
The Host request-header field specifies the Internet host and port
number of the resource being requested, as obtained from the original
URI given by the user or referring resource (generally an HTTP URL,
as described in section 3.2.2). The Host field value MUST represent
the naming authority of the origin server or gateway given by the
original URL. This allows the origin server or gateway to
differentiate between internally-ambiguous URLs, such as the root "/"
URL of a server for multiple host names on a single IP address.
Can't find anything to say this doesn't apply to requests going via a
proxy.
Of course (well, I say of course -- I *think* I understand fully), the
Host: header is redundant for proxies, because you have to send the
full absoluteURI (with host) anyway, but I think Python should follow
the RFC rather than some random, and presumably broken, proxy.
[well, to clarify: Host is redundant when using a proxy, but still
required to be present by the RFC -- don't know why]
BTW, urllib.py (in 2.2.1, anyway) *does* still follow RFC 2616, so
urllib now differs from Moshe's 'fixed' urllib2.
I think the patch to urllib2 should be reversed.
John
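[Editor's note: the redundancy John describes can be made concrete. A request sent through a proxy puts the full absoluteURI in the request line, yet RFC 2616 still requires a Host header naming the origin server. A minimal illustration, using the URL from the report:]

```python
from urllib.parse import urlparse

url = 'http://www.python.org/'

# Via a proxy, the request line carries the full absoluteURI ...
request_line = 'GET %s HTTP/1.1' % url

# ... yet RFC 2616 (section 14.23) still requires a Host header
# naming the origin server from the original URL, so the host
# effectively appears twice.
host_header = 'Host: %s' % urlparse(url).netloc

print(request_line)  # GET http://www.python.org/ HTTP/1.1
print(host_header)   # Host: www.python.org
```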
----------------------------------------------------------------------
Comment By: Moshe Zadka (moshez)
Date: 2001-04-11 08:45
Message:
Logged In: YES
user_id=11645
You're right!
I've fixed this in urllib2.py v 1.12
----------------------------------------------------------------------
Comment By: Paul Moore (pmoore)
Date: 2001-04-10 10:46
Message:
Logged In: YES
user_id=113328
I found the problem. In urllib2.py, class
AbstractHTTPHandler, method do_open, the first line is now
host = urlparse.urlparse(req.get_full_url())[1]
It used to be
host = req.get_host()
With the old version, the code works (with my proxy). With
the new version it doesn't, as it passes the destination
host, rather than the proxy name (and so loses the proxy
info totally).
Paul.
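[Editor's note: Paul's diagnosis is easy to reproduce on modern Python, where urllib.request's Request.host attribute plays the role of the old get_host(). Parsing the full URL always yields the destination host, while set_proxy() rewrites the request's host to point at the proxy, which is the behaviour the old code relied on:]

```python
from urllib.parse import urlparse
from urllib.request import Request

url = 'http://www.python.org/'

# The broken line's approach: parse the full URL. This always yields
# the *destination* host, even when a proxy is configured.
print(urlparse(url)[1])  # www.python.org

# The old get_host() approach: the request's host attribute, which
# set_proxy() rewrites so the connection goes to the proxy instead.
req = Request(url)
req.set_proxy('my-proxy:80', 'http')
print(req.host)  # my-proxy:80
```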
----------------------------------------------------------------------
Comment By: Moshe Zadka (moshez)
Date: 2001-04-09 15:11
Message:
Logged In: YES
user_id=11645
I've just tested with my installation of Python 2.1b2 and it
works. So I cannot reproduce the problem, and I need more
information from you: can you insert prints in the correct
places (e.g. do_open) to see what host urllib2 *thinks* it
is trying to access?
Thanks.
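[Editor's note: Moshe's suggestion of temporary prints can be sketched, in Python 3 terms, as a handler subclass that reports the host just before do_open connects. DebugHTTPHandler is an illustrative name, not part of the library:]

```python
import urllib.request

class DebugHTTPHandler(urllib.request.HTTPHandler):
    """Print which host the library is about to connect to."""

    def do_open(self, http_class, req, **kwargs):
        # req.host is the proxy when set_proxy() was applied,
        # otherwise the destination host.
        print('do_open connecting to:', req.host)
        return super().do_open(http_class, req, **kwargs)

# build_opener() accepts the class itself; being an HTTPHandler
# subclass, it replaces the default HTTP handler.
opener = urllib.request.build_opener(DebugHTTPHandler)
```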
----------------------------------------------------------------------