[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

Tue Aug 24 01:07:06 CEST 2010

Martin v. Löwis <martin at v.loewis.de> added the comment:

>> Is this patch in response to an actual problem, or a theoretical problem?
>> If "actual problem": what was the specific application, and what was the specific host name?
> 
> It's about environments, not applications

Still, my question remains. Is it a theoretical problem (i.e. one
of your imagination), or a real one (i.e. one you observed in real
life, without explicitly triggering it)? If real: what was the
specific environment, and what was the specific host name?

> There are two points here.  One is that the decoding can fail; I
> do think that programmers will find this surprising, and the fact
> that Python refuses to return what was actually received is a
> regression compared to 2.x.

True. However, I think this is an acceptable regression,
assuming the problem is merely theoretical. It is ok if
an operation fails that you will never run into in real life.

> That means that when a decoded hostname contains a non-ASCII
> character which is not prohibited by IDNA/Nameprep, that string
> will, when used in a subsequent call, not refer to the hostname
> that was actually received, because it will be re-encoded using a
> different codec.

Again, I fail to see the problem in this. It won't happen in
real life. However, if you worried that this could be abused,
I think it should decode host names as ASCII, not as UTF-8.
Then it will be symmetric again (IIUC).

----------
title: socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names -> socket,	PEP 383: Mishandling of non-ASCII bytes in host/domain names

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue9377>
_______________________________________