[Python-Dev] [ssl] The weird case of IDNA
Christian Heimes christian at python.org
Fri Dec 29 15:54:46 EST 2017
Hi,
tl;dr This mail is about internationalized domain names and TLS/SSL. It doesn't concern you if you live in ASCII-land. A couple of other developers and I would like to change the ssl module in a backwards-incompatible way to fix IDN support for TLS/SSL.
Simply speaking, the IDNA standards (Internationalized Domain Names in Applications) describe how to encode non-ASCII domain names. DNS and X.509 certificates cannot handle non-ASCII host names, so any non-ASCII part of a hostname is Punycode-encoded. For example, the host name 'www.bücher.de' (books) is translated into 'www.xn--bcher-kva.de'. In IDNA terms, 'www.bücher.de' is called an IDN U-label (Unicode) and 'www.xn--bcher-kva.de' an IDN A-label (ASCII). Please refer to the TR46 document [1] for more information.
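To make the U-label/A-label terminology concrete, here is a minimal sketch using only the standard library's 'idna' codec (which implements IDNA 2003). The variable names are mine and purely illustrative.

```python
# U-label -> A-label with the stdlib "idna" codec (IDNA 2003).
u_label = "www.bücher.de"
a_label = u_label.encode("idna")
print(a_label)      # b'www.xn--bcher-kva.de'

# A-label -> U-label. Only the label with the "xn--" prefix is touched;
# the pure-ASCII labels "www" and "de" pass through unchanged.
round_trip = a_label.decode("idna")
print(round_trip)   # www.bücher.de
```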
In a perfect world, it would be very simple: we would have only one IDNA standard. However, there are multiple standards that are incompatible with each other. The German TLD .de demands IDNA 2008 with UTS#46 compatibility mapping. The hostname 'www.straße.de' maps to 'www.xn--strae-oqa.de'. However, in the older IDNA 2003 standard, 'www.straße.de' maps to 'www.strasse.de', and 'strasse.de' is a totally different domain!
CPython only supports IDNA 2003.
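The mismatch can be reproduced without any third-party package. The sketch below assumes nothing beyond the standard library: the 'idna' codec shows the IDNA 2003 result, and the raw 'punycode' codec is used to look inside the IDNA 2008 A-label quoted above. Variable names are illustrative only.

```python
# The stdlib codec implements IDNA 2003, whose nameprep step maps 'ß' to 'ss'.
idna2003 = "www.straße.de".encode("idna")
print(idna2003)   # b'www.strasse.de'

# The IDNA 2008 A-label keeps the 'ß'. Decoding its Punycode part directly
# (bypassing the idna codec) shows which U-label it stands for.
label_2008 = "xn--strae-oqa"
decoded = label_2008[len("xn--"):].encode("ascii").decode("punycode")
print(decoded)    # straße
```

So the same user input ends up as two different ASCII names depending on which standard performs the conversion.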
It's less of an issue for the socket module. It only converts text to IDNA bytes on the way in. All functions support bytes and text. Since IDNA encoding does not change ASCII and IDNA-encoded data is ASCII, it is also no problem to pass IDNA 2008-encoded text or bytes to all socket functions.
Example:
    >>> import socket
    >>> import idna  # from PyPI
    >>> names = ['straße.de', b'strasse.de', idna.encode('straße.de'),
    ...          idna.encode('straße.de').decode('ascii')]
    >>> for name in names:
    ...     print(name, socket.getaddrinfo(name, None, socket.AF_INET, socket.SOCK_STREAM, 0, socket.AI_CANONNAME)[0][3:5])
    ...
    straße.de ('strasse.de', ('89.31.143.1', 0))
    b'strasse.de' ('strasse.de', ('89.31.143.1', 0))
    b'xn--strae-oqa.de' ('xn--strae-oqa.de', ('81.169.145.78', 0))
    xn--strae-oqa.de ('xn--strae-oqa.de', ('81.169.145.78', 0))
As you can see, 'straße.de' is canonicalized as 'strasse.de'. The IDNA 2008 encoded hostname maps to a different IP address.
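The reason an already-encoded name survives the socket module can be checked offline: the 'idna' codec leaves pure-ASCII input alone. A small sketch, standard library only, with illustrative variable names:

```python
# An IDNA 2008 A-label is plain ASCII, so the IDNA 2003 codec that the
# socket module applies to text arguments returns it unchanged.
a_label_text = "xn--strae-oqa.de"
a_label_bytes = a_label_text.encode("idna")
print(a_label_bytes)   # b'xn--strae-oqa.de'
```

In other words, callers who need IDNA 2008 semantics can do the conversion themselves and hand the resulting A-label to the socket functions.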
On the other hand, the ssl module is currently completely broken. It converts hostnames from bytes to text with the 'idna' codec in some places, but not in all. The SSLSocket.server_hostname attribute and the argument passed to the SSLContext.set_servername_callback() callback are decoded to U-labels. A certificate's common name and subject alternative name fields are not decoded and are therefore A-labels. They must stay A-labels because hostname verification is only defined in terms of A-labels. We even had a security issue once, because a partial wildcard like 'xn*.example.org' must not match IDN hosts like 'xn--bcher-kva.example.org'.
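To illustrate why matching has to happen on A-labels, here is a deliberately simplified sketch of a wildcard check. It is not the ssl module's implementation; the function name wildcard_matches and its exact rules are my own assumptions, loosely following the RFC 6125 idea that a partial wildcard must never be applied to an IDN label.

```python
def wildcard_matches(pattern: str, hostname: str) -> bool:
    """Hypothetical, simplified hostname match; both arguments are A-labels."""
    p_labels = pattern.lower().split(".")
    h_labels = hostname.lower().split(".")
    if len(p_labels) != len(h_labels):
        return False
    left_p, left_h = p_labels[0], h_labels[0]
    if not left_h:
        return False
    if left_p == "*":
        pass  # a full wildcard matches any single non-empty label
    elif "*" in left_p:
        # A partial wildcard must not match when either side is an
        # IDN A-label, otherwise 'xn*' would cover every IDN host.
        if left_p.startswith("xn--") or left_h.startswith("xn--"):
            return False
        prefix, _, suffix = left_p.partition("*")
        if "*" in suffix:
            return False  # more than one wildcard is not supported here
        if len(left_h) < len(prefix) + len(suffix):
            return False
        if not (left_h.startswith(prefix) and left_h.endswith(suffix)):
            return False
    elif left_p != left_h:
        return False
    # All remaining labels must match exactly.
    return p_labels[1:] == h_labels[1:]


print(wildcard_matches("xn*.example.org", "xn--bcher-kva.example.org"))  # False
print(wildcard_matches("*.example.org", "www.example.org"))              # True
```

If the hostname were compared as a U-label instead, the 'xn--' check could not even be expressed, which is one reason the comparison has to stay in A-label form.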
In issue [2] and PR [3], we all agreed that the only sensible fix is to make 'SSLSocket.server_hostname' an ASCII text A-label. But this is a backwards-incompatible fix. On the other hand, IDNA is totally broken without the fix. Also, in my opinion, PR [3] does not go far enough. Since we have to break backwards compatibility anyway, I'd like to modify SSLContext.set_servername_callback() at the same time.
Questions:
- Is everybody OK with breaking backwards compatibility? The risk is small. ASCII-only domains are not affected and IDNA users are broken anyway.
- Should I only fix 3.7 or should we consider a backport to 3.6, too?
Regards, Christian
[1] https://www.unicode.org/reports/tr46/
[2] https://bugs.python.org/issue28414
[3] https://github.com/python/cpython/pull/3010