[Python-Dev] [ssl] The weird case of IDNA

Guido van Rossum guido at python.org
Fri Dec 29 23:46:12 EST 2017


This being a security issue, I think it's okay to break 3.6. Might even
backport to 3.5 if it's easy?

On Dec 29, 2017 1:59 PM, "Christian Heimes" <christian at python.org> wrote:

> Hi,
>
> tl;dr
> This mail is about internationalized domain names and TLS/SSL. It
> doesn't concern you if you live in ASCII-land. A couple of other
> developers and I would like to change the ssl module in a
> backwards-incompatible way to fix IDN support for TLS/SSL.
>
>
> Simply speaking, the IDNA standards (Internationalized Domain Names in
> Applications) describe how to encode non-ASCII domain names. DNS and
> X.509 certificates cannot handle non-ASCII host names, so any non-ASCII
> part of a hostname is Punycode-encoded. For example, the host name
> 'www.bücher.de' (books) is translated into 'www.xn--bcher-kva.de'. In
> IDNA terms, 'www.bücher.de' is called an IDN U-label (Unicode) and
> 'www.xn--bcher-kva.de' an IDN A-label (ASCII). Please refer to the TR46
> document [1] for more information.
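A minimal illustration of the U-label/A-label round trip, using CPython's built-in 'idna' codec. That codec implements IDNA 2003; for 'bücher' the result happens to be the same under IDNA 2008, so it is only a sketch of the conversion, not of either standard's full rules:

```python
# Convert between a U-label (Unicode) and an A-label (ASCII) with the
# standard library's "idna" codec (IDNA 2003 rules).
u_label = "www.bücher.de"

a_label = u_label.encode("idna")        # text -> ASCII bytes
print(a_label)                          # b'www.xn--bcher-kva.de'

round_trip = a_label.decode("idna")     # ASCII bytes -> text
print(round_trip)                       # www.bücher.de
```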
>
> In a perfect world, it would be very simple: we'd have only one IDNA
> standard. However, there are multiple standards that are incompatible
> with each other. The German TLD .de demands IDNA 2008 with UTS #46
> compatibility mapping. The hostname 'www.straße.de' maps to
> 'www.xn--strae-oqa.de'. However, in the older IDNA 2003 standard,
> 'www.straße.de' maps to 'www.strasse.de', but 'strasse.de' is a totally
> different domain!
>
>
> CPython only supports IDNA 2003.
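A small sketch of the difference for 'straße'. CPython's 'idna' codec applies the IDNA 2003 nameprep mapping, which folds 'ß' to 'ss' before encoding. IDNA 2008 keeps 'ß', so its A-label is the 'xn--' prefix plus the raw Punycode of the label. Computing that with the stdlib 'punycode' codec, as below, only works because 'straße' needs no further mapping; it skips the validity checks that a real IDNA 2008 encoder (such as the idna package from PyPI) performs:

```python
# IDNA 2003 (stdlib "idna" codec): nameprep maps 'ß' to 'ss' first,
# so the result is plain ASCII with no xn-- label at all.
idna2003 = "www.straße.de".encode("idna")
print(idna2003)        # b'www.strasse.de'

# IDNA 2008 keeps 'ß'. For this single label the A-label is just the
# ACE prefix plus raw Punycode (no mapping, no validation done here).
label = "straße"
idna2008_label = b"xn--" + label.encode("punycode")
print(idna2008_label)  # b'xn--strae-oqa'
```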
>
> It's less of an issue for the socket module. It only converts text to
> IDNA bytes on the way in, and all functions accept both bytes and text.
> Since IDNA encoding does not change ASCII and IDNA-encoded data is
> ASCII, it is also no problem to pass IDNA 2008-encoded text or bytes to
> any socket function.
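The ASCII pass-through can be checked without any network access: the built-in 'idna' codec returns pure-ASCII input unchanged, so an A-label produced by an IDNA 2008 encoder is not re-mapped by CPython's IDNA 2003 step. A minimal check:

```python
# Hostnames that are already ASCII: a plain name and IDNA 2008 A-labels.
ascii_names = ["strasse.de", "xn--strae-oqa.de", "www.xn--bcher-kva.de"]

for name in ascii_names:
    encoded = name.encode("idna")
    # ASCII input passes through unchanged; no IDNA 2003 mapping applies.
    print(name, encoded, encoded == name.encode("ascii"))
```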
>
> Example:
>
> >>> import socket
> >>> import idna  # from PyPI
> >>> names = ['straße.de', b'strasse.de',
> ...          idna.encode('straße.de'),                  # IDNA 2008 A-label (bytes)
> ...          idna.encode('straße.de').decode('ascii')]  # same A-label as text
> >>> for name in names:
> ...     print(name, socket.getaddrinfo(name, None, socket.AF_INET,
> ...                                    socket.SOCK_STREAM, 0,
> ...                                    socket.AI_CANONNAME)[0][3:5])
> ...
> straße.de ('strasse.de', ('89.31.143.1', 0))
> b'strasse.de' ('strasse.de', ('89.31.143.1', 0))
> b'xn--strae-oqa.de' ('xn--strae-oqa.de', ('81.169.145.78', 0))
> xn--strae-oqa.de ('xn--strae-oqa.de', ('81.169.145.78', 0))
>
> As you can see, 'straße.de' is canonicalized as 'strasse.de'. The
> IDNA 2008-encoded hostname maps to a different IP address.
>
>
> On the other hand, the ssl module is currently completely broken. It
> converts hostnames from bytes to text with the 'idna' codec in some
> places, but not in all. The SSLSocket.server_hostname attribute and the
> server name passed to the SSLContext.set_servername_callback() callback
> are decoded as U-labels. The certificate's common name and subject
> alternative name fields are not decoded and are therefore A-labels. They
> *must* stay A-labels, because hostname verification is only defined in
> terms of A-labels. We even had a security issue once, because a partial
> wildcard like 'xn*.example.org' must not match IDN hosts like
> 'xn--bcher-kva.example.org'.
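A simplified sketch of why the comparison has to happen on A-labels. The functions below are hypothetical helpers, not the ssl module's actual implementation: they match a certificate name against a hostname, both given as ASCII A-labels, allow a wildcard only in the leftmost label, and refuse to expand a partial wildcard such as 'xn*' against an 'xn--' label. A full '*' label is still allowed to match an A-label here; that is an assumption of this sketch:

```python
def _label_matches(pattern_label: str, host_label: str) -> bool:
    if pattern_label == "*":
        return True  # full wildcard: any single label, A-labels included
    if "*" not in pattern_label:
        return pattern_label == host_label
    if host_label.startswith("xn--") or pattern_label.startswith("xn--"):
        # Never expand a partial wildcard against an IDN A-label.
        return False
    if pattern_label.count("*") > 1:
        return False
    prefix, suffix = pattern_label.split("*")
    return (len(host_label) >= len(prefix) + len(suffix)
            and host_label.startswith(prefix)
            and host_label.endswith(suffix))


def wildcard_matches(pattern: str, hostname: str) -> bool:
    """Hypothetical check of a certificate name against a hostname.

    Both arguments must already be ASCII (A-label form); U-labels are
    rejected instead of being silently compared.
    """
    if not (pattern.isascii() and hostname.isascii()):
        raise ValueError("compare A-labels only")
    pattern_labels = pattern.lower().split(".")
    host_labels = hostname.lower().split(".")
    if len(pattern_labels) != len(host_labels):
        return False
    # Only the leftmost label may contain a wildcard.
    if any("*" in label for label in pattern_labels[1:]):
        return False
    return (_label_matches(pattern_labels[0], host_labels[0])
            and pattern_labels[1:] == host_labels[1:])
```

With this sketch, wildcard_matches('xn*.example.org', 'xn--bcher-kva.example.org') is False, while wildcard_matches('www*.example.org', 'www1.example.org') is True.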
>
> In issue [2] and PR [3], we all agreed that the only sensible fix is to
> make 'SSLSocket.server_hostname' an ASCII text A-label. But this is a
> backwards-incompatible fix. On the other hand, IDNA is totally broken
> without the fix. Also, in my opinion, PR [3] does not go far enough.
> Since we have to break backwards compatibility anyway, I'd like to
> modify SSLContext.set_servername_callback() at the same time.
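A sketch of what "ASCII text A-label" means in practice for application code. to_a_label() and log_server_name() are hypothetical names; only ssl.SSLContext, ssl.PROTOCOL_TLS_SERVER and set_servername_callback() (whose callback receives the socket, the server name or None, and the context) are real APIs. The helper uses the stdlib 'idna' codec, so it follows IDNA 2003 rules; an IDNA 2008 encoder could be swapped in:

```python
import ssl
from typing import Optional, Union


def to_a_label(hostname: Union[str, bytes]) -> str:
    """Hypothetical helper: normalize a hostname to ASCII text (A-label)."""
    if isinstance(hostname, bytes):
        # Bytes are expected to be an already-encoded ASCII hostname.
        return hostname.decode("ascii")
    # Text may be a U-label; encode it (IDNA 2003) and return ASCII text.
    return hostname.encode("idna").decode("ascii")


def log_server_name(sslsock: ssl.SSLSocket, server_name: Optional[str],
                    context: ssl.SSLContext) -> None:
    # Normalize to A-label form, the form certificates use, before logging.
    if server_name is not None:
        print("client asked for", to_a_label(server_name))
    # Returning None lets the handshake continue.


server_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
server_ctx.set_servername_callback(log_server_name)
```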
>
> Questions:
> - Is everybody OK with breaking backwards compatibility? The risk is
> small. ASCII-only domains are not affected and IDNA users are broken
> anyway.
> - Should I only fix 3.7 or should we consider a backport to 3.6, too?
>
> Regards,
> Christian
>
> [1] https://www.unicode.org/reports/tr46/
> [2] https://bugs.python.org/issue28414
> [3] https://github.com/python/cpython/pull/3010

