[Python-Dev] [ssl] The weird case of IDNA
Nathaniel Smith
njs at pobox.com
Sun Dec 31 01:13:10 EST 2017
On Sat, Dec 30, 2017 at 7:26 AM, Stephen J. Turnbull
<turnbull.stephen.fw at u.tsukuba.ac.jp> wrote:
> Christian Heimes writes:
> > Questions:
> > - Is everybody OK with breaking backwards compatibility? The risk is
> > small. ASCII-only domains are not affected
>
> That's not quite true, as your German example shows. In some Oriental
> renderings it is impossible to distinguish halfwidth digits from
> full-width ones as the same glyphs are used. (This occasionally
> happens with other ASCII characters, but users are more fussy about
> digits lining up.) That is, while technically ASCII-only domain names
> are not affected, users of ASCII-only domain names are potentially
> vulnerable to confusable names when IDNA is introduced. (Hopefully
> the Asian registrars are as woke as the German ones! But you could
> still register a .com containing full-width digits or letters.)
This particular example isn't an issue: in IDNA encoding, full-width
and half-width digits are normalized together, so number１.com and
number1.com actually refer to the same domain name. This is true in
both the 2003 and 2008 versions:
# IDNA 2003
In [7]: "number\uff11.com".encode("idna")
Out[7]: b'number1.com'
# IDNA 2008 (using the 'idna' package from PyPI, after `import idna`)
In [8]: idna.encode("number\uff11.com", uts46=True)
Out[8]: b'number1.com'
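For readers without the third-party package, the IDNA 2003 half of this can be reproduced with the standard library alone. This is a minimal sketch: it assumes only the stdlib "idna" codec (which implements IDNA 2003, RFC 3490) and `unicodedata`. The NFKC check illustrates the compatibility folding that IDNA 2003's nameprep step applies, which is why the full-width digit collapses to ASCII.

```python
import unicodedata

# U+FF11 is FULLWIDTH DIGIT ONE; the second name uses the ASCII digit "1".
fullwidth = "number\uff11.com"
ascii_name = "number1.com"

# NFKC compatibility normalization folds full-width forms to their ASCII
# counterparts; IDNA 2003's nameprep profile includes NFKC.
assert unicodedata.normalize("NFKC", fullwidth) == ascii_name

# The stdlib "idna" codec implements IDNA 2003, so both spellings
# encode to the same on-the-wire name.
encoded = fullwidth.encode("idna")
print(encoded)  # b'number1.com'
```

IDNA 2008 with UTS #46 mapping (the `idna.encode(..., uts46=True)` call above) needs the PyPI package and is not covered by this sketch.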
That said, IDNA does still allow for a bunch of spoofing opportunities
that aren't possible with pure ASCII, and this requires some care:
https://unicode.org/faq/idn.html#16
This is mostly a UI issue, though; there's not much that the socket or
ssl modules can do to help here.
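As an illustration of the kind of UI-level mitigation meant here, an application can show users the ASCII (A-label) form of a hostname whenever it contains an internationalized label, similar in spirit to how browsers fall back to displaying Punycode for suspicious names. This is a hypothetical sketch: `ascii_form` and `has_idn_label` are illustrative names, it uses the stdlib IDNA 2003 codec rather than IDNA 2008/UTS #46, and real mitigations (mixed-script detection, per-TLD policies, UTS #39 confusable checks) are considerably more involved.

```python
def ascii_form(hostname: str) -> str:
    """Return the ASCII-compatible form of *hostname*.

    Hypothetical helper; uses the stdlib "idna" codec (IDNA 2003 rules).
    """
    return hostname.encode("idna").decode("ascii")


def has_idn_label(hostname: str) -> bool:
    """Return True if any label of the encoded name is a Punycode label.

    Hypothetical helper: names that fold to pure ASCII (such as the
    full-width digit example) are not flagged.
    """
    return any(label.startswith("xn--") for label in ascii_form(hostname).split("."))


# "p\u0430ypal.com" uses CYRILLIC SMALL LETTER A (U+0430) in place of "a".
spoof = "p\u0430ypal.com"
if has_idn_label(spoof):
    # A UI could show the xn-- form next to (or instead of) the Unicode form.
    print(ascii_form(spoof))

print(ascii_form("number\uff11.com"))  # number1.com
```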
-n
--
Nathaniel J. Smith -- https://vorpus.org