[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names
Martin v. Löwis
report at bugs.python.org
Fri Oct 29 21:31:49 CEST 2010
Martin v. Löwis <martin at v.loewis.de> added the comment:
The Solaris case, then, is already supported with no change required: if Solaris bans non-ASCII in the network configuration (or, rather, recommends using IDNA), this will work fine with the current code.
The Josefsson AI_IDN flag is irrelevant to Python, IMO: it treats byte names as locale-encoded and converts them with IDNA. Python 3 users really should use Unicode strings in the first place for non-ASCII data, in which case socket.getaddrinfo uses IDNA anyway. However, it can't hurt to expose this flag if the underlying C library supports it. AI_CANONIDN might be interesting to implement, but I'd rather wait to see whether it gains RFC approval. In any case, undoing IDNA is orthogonal to this issue (which is about non-ASCII data returned from the socket API).
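To illustrate the point about Unicode strings, here is a minimal sketch of the conversion that takes place when a str host name is encoded with IDNA. It uses the standard "idna" codec directly rather than making a real lookup; the helper name to_idna and the host name "bücher.example" are made up for illustration only.

```python
def to_idna(hostname: str) -> bytes:
    """Encode a Unicode host name to its ASCII-compatible (IDNA) form.

    Hypothetical helper for illustration; it only wraps the stdlib codec.
    """
    return hostname.encode("idna")


# "bücher.example", written with an escape so the source stays ASCII.
unicode_name = "b\u00fccher.example"
ascii_name = to_idna(unicode_name)

print(ascii_name)  # b'xn--bcher-kva.example'

# The codec also works in the other direction.
round_tripped = ascii_name.decode("idna")

# Pure-ASCII names pass through unchanged.
plain = to_idna("localhost")
```

Only the label containing non-ASCII characters is rewritten; ASCII labels such as "example" are left alone.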
If anything needs to be done on Unix, I think that the gethostname result should be decoded using the file system encoding; I then wouldn't mind using the surrogateescape error handler there for good measure. This won't hurt systems that restrict host names to ASCII, and may do some good for systems that don't.
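A rough sketch of what that decoding could look like at the Python level follows. This is not the actual socket module code (which is in C); decode_hostname is a hypothetical helper, and the byte strings are invented sample data. The encoding is passed explicitly in the example so that the result does not depend on the local configuration.

```python
import sys


def decode_hostname(raw, encoding=None):
    """Decode a raw host name the way PEP 383 decodes file names.

    Hypothetical helper: uses the file system encoding by default and
    the surrogateescape handler, so undecodable bytes are preserved as
    lone surrogates instead of raising UnicodeDecodeError.
    """
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    return raw.decode(encoding, "surrogateescape")


# An ASCII-only name is unaffected.
ascii_host = decode_hostname(b"myhost", "utf-8")

# 0xFF is never valid in UTF-8; it is escaped as U+DCFF.
odd_host = decode_hostname(b"host-\xff", "utf-8")
print(ascii(odd_host))  # 'host-\udcff'

# Encoding with the same handler restores the original bytes.
restored = odd_host.encode("utf-8", "surrogateescape")
```

Note that a string containing lone surrogates cannot be printed directly on most terminals, hence the use of ascii() above.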
----------
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue9377>
_______________________________________