Unexpected exception from socket.getaddrinfo on Unicode URL

John Nagle nagle at animats.com
Sat Apr 21 02:01:01 EDT 2007


     Here's a strange little bug.  "socket.getaddrinfo" blows up
if given a bad domain name containing ".." in Unicode.  The
same string in ASCII produces the correct "gaierror" exception.

     Actually, this deserves a documentation mention.  The "socket" module,
given a Unicode string, calls the International Domain Name parser,
"idna.py", which has a whole error system of its own.  The IDNA
documentation says that "Furthermore, the socket module transparently converts 
Unicode host names to ACE, so that applications need not be concerned about 
converting host names themselves when they pass them to the socket module."
However, that's not quite true; the IDNA rules say that syntax errors must
be treated as errors, so you have to be prepared for IDNA exceptions.
They are all "UnicodeError" exceptions.

     It's worth a mention in the documentation for "socket".
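
     Until then, a caller can be prepared for both kinds of failure.
The following is only a minimal sketch, assuming a recent Python in which
the u'' prefix and "except ... as" are both accepted; "resolve_or_none" is
a hypothetical helper name, not something the socket module provides.

```python
import socket

def resolve_or_none(host, port):
    """Return getaddrinfo() results, or None if the name cannot be used.

    Hypothetical helper for illustration; not part of the socket module.
    """
    try:
        return socket.getaddrinfo(host, port)
    except socket.gaierror:
        # The resolver itself rejected or could not find the name.
        return None
    except UnicodeError:
        # The IDNA codec rejected the name before any lookup was attempted.
        return None

# A Unicode host name with an empty label ("..") goes through the IDNA codec.
print(resolve_or_none(u'www.gallery84..com', 'http'))
```

     Treating both exceptions the same way is a design choice made for
this sketch; a caller that wants to tell a malformed name apart from a
failed lookup would handle the two "except" branches differently.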

					John Nagle

D:\>/python25/python.exe
Python 2.5 (r25:51908, Sep 19 2006, 09:52:17) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
 >>> ss = 'www.gallery84..com'
 >>> uss = unicode(ss)
 >>> import socket
 >>> socket.getaddrinfo(ss,"http")
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
socket.gaierror: (11001, 'getaddrinfo failed')
 >>> socket.getaddrinfo(uss,"http")
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   File "D:\python25\lib\encodings\idna.py", line 164, in encode
     result.append(ToASCII(label))
   File "D:\python25\lib\encodings\idna.py", line 73, in ToASCII
     raise UnicodeError("label empty or too long")
UnicodeError: label empty or too long
 >>>
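
The traceback above ends inside "encodings/idna.py", which suggests the
failure can be reproduced without the socket module at all, by applying
the "idna" codec to the same string directly.  A small sketch, assuming a
recent Python (the variable names are illustrative only):

```python
# Same malformed host name as in the session above, as a Unicode string.
name = u'www.gallery84..com'

try:
    # Invoke the IDNA codec directly, with no socket call involved.
    name.encode('idna')
except UnicodeError as exc:
    # The empty label between the two dots is rejected by the codec.
    failed = True
    print('IDNA rejected %r: %s' % (name, exc))
else:
    failed = False
```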



