Internationalized domain names not working with urlopen

John Nagle nagle at animats.com
Wed Jun 13 02:17:32 EDT 2012


I'm trying to open

http://пример.испытание

with

urllib2.urlopen(s1)

in Python 2.7 on Windows 7. This produces a Unicode exception:

 >>> s1
u'http://\u043f\u0440\u0438\u043c\u0435\u0440.\u0438\u0441\u043f\u044b\u0442\u0430\u043d\u0438\u0435'
 >>> fd = urllib2.urlopen(s1)
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   File "C:\python27\lib\urllib2.py", line 126, in urlopen
     return _opener.open(url, data, timeout)
   File "C:\python27\lib\urllib2.py", line 394, in open
     response = self._open(req, data)
   File "C:\python27\lib\urllib2.py", line 412, in _open
     '_open', req)
   File "C:\python27\lib\urllib2.py", line 372, in _call_chain
     result = func(*args)
   File "C:\python27\lib\urllib2.py", line 1199, in http_open
     return self.do_open(httplib.HTTPConnection, req)
   File "C:\python27\lib\urllib2.py", line 1168, in do_open
     h.request(req.get_method(), req.get_selector(), req.data, headers)
   File "C:\python27\lib\httplib.py", line 955, in request
     self._send_request(method, url, body, headers)
   File "C:\python27\lib\httplib.py", line 988, in _send_request
     self.putheader(hdr, value)
   File "C:\python27\lib\httplib.py", line 935, in putheader
     hdr = '%s: %s' % (header, '\r\n\t'.join([str(v) for v in values]))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 
0-5: ordinal not in range(128)
 >>>

The HTTP library is trying to write the hostname into the Host header as
ASCII, and the first six characters ("пример", positions 0-5) can't be
encoded. Why isn't "urllib2" handling that?
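The failure can be reproduced without any network access. In Python 2, the `str(v)` call in httplib's `putheader` (last frame of the traceback) encodes a unicode header value with the default ASCII codec. The sketch below applies the same encoding to the hostname from the post; `first_unencodable_span` is a hypothetical helper written only for this illustration, and it runs on both Python 2.7 and 3.

```python
# Reproduce the UnicodeEncodeError from the traceback, offline.
HOST = u'\u043f\u0440\u0438\u043c\u0435\u0440.\u0438\u0441\u043f\u044b\u0442\u0430\u043d\u0438\u0435'


def first_unencodable_span(text, codec='ascii'):
    """Hypothetical helper: return (start, end) of the first run of
    characters that `codec` cannot encode, or None if `text` encodes."""
    try:
        text.encode(codec)
    except UnicodeEncodeError as exc:
        # exc.end is exclusive, so (0, 6) is reported as "position 0-5"
        return exc.start, exc.end
    return None


print(first_unencodable_span(HOST))          # (0, 6): the six letters of "пример"
print(first_unencodable_span(HOST, 'idna'))  # None: the IDNA codec accepts it
```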

What does "urllib2" want?  Percent escapes?  Punycode?
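For the hostname, the usual answer is Punycode. DNS lookups and the Host header use the ASCII IDNA (`xn--...`) form of the name. Percent escapes are the normal encoding for non-ASCII characters in the path and query, not in the host. Below is a minimal sketch, assuming the netloc has no `user:password@` part. The hypothetical helper `idna_url` re-encodes only the hostname with Python's built-in `idna` codec (IDNA 2003) and leaves the path, query and fragment untouched. It runs on Python 2.7 and 3.

```python
try:
    from urllib.parse import urlsplit, urlunsplit  # Python 3
except ImportError:
    from urlparse import urlsplit, urlunsplit      # Python 2


def idna_url(url):
    """Hypothetical helper: return `url` with its hostname IDNA-encoded.

    Assumes no user:password@ in the netloc; the path, query and
    fragment are passed through unchanged (not percent-escaped).
    """
    parts = urlsplit(url)
    # .hostname is lowercased and stripped of any port
    host = parts.hostname.encode('idna').decode('ascii')
    netloc = host if parts.port is None else '%s:%d' % (host, parts.port)
    return urlunsplit((parts.scheme, netloc, parts.path,
                       parts.query, parts.fragment))
```

Under these assumptions, `idna_url(s1)` gives the all-ASCII `http://xn--e1afmkfd.xn--80akhbyknj4f`, which could then be passed to `urllib2.urlopen`. The Host header would then contain no non-ASCII characters.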

				John Nagle
