[Python-Dev] [ssl] The weird case of IDNA

Nathaniel Smith njs at pobox.com
Sun Dec 31 01:13:10 EST 2017


On Sat, Dec 30, 2017 at 7:26 AM, Stephen J. Turnbull <turnbull.stephen.fw at u.tsukuba.ac.jp> wrote:

Christian Heimes writes:
> Questions:
> - Is everybody OK with breaking backwards compatibility? The risk is
>   small. ASCII-only domains are not affected

That's not quite true, as your German example shows. In some Oriental renderings it is impossible to distinguish halfwidth digits from full-width ones as the same glyphs are used. (This occasionally happens with other ASCII characters, but users are more fussy about digits lining up.) That is, while technically ASCII-only domain names are not affected, users of ASCII-only domain names are potentially vulnerable to confusable names when IDNA is introduced. (Hopefully the Asian registrars are as woke as the German ones! But you could still register a .com containing full-width digits or letters.)

This particular example isn't an issue: in IDNA encoding, full-width and half-width digits are normalized together, so number１.com (written with the full-width digit U+FF11) and number1.com actually refer to the same domain name. This is true in both the 2003 and 2008 versions:

IDNA 2003

In [7]: "number\uff11.com".encode("idna")
Out[7]: b'number1.com'

IDNA 2008 (using the 'idna' package from pypi)

In [8]: idna.encode("number\uff11.com", uts46=True)
Out[8]: b'number1.com'
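For anyone who wants to poke at why the two spellings collapse, here is a minimal sketch using only the standard library. It assumes the stdlib "idna" codec, which implements IDNA 2003 only; the nameprep step there applies NFKC compatibility normalization, and that is what folds U+FF11 into ASCII "1". The helper name same_domain is made up for this illustration.

```python
import unicodedata


def same_domain(a: str, b: str) -> bool:
    """Return True if both names encode to the same IDNA 2003 ASCII form.

    Hypothetical helper for illustration. It relies on the stdlib "idna"
    codec (IDNA 2003), not on the 'idna' package from PyPI (IDNA 2008).
    """
    return a.encode("idna") == b.encode("idna")


fullwidth = "number\uff11.com"  # contains FULLWIDTH DIGIT ONE (U+FF11)
halfwidth = "number1.com"       # plain ASCII digit

# NFKC compatibility normalization maps the full-width digit to ASCII.
print(unicodedata.normalize("NFKC", fullwidth) == halfwidth)

# As a result, both spellings produce the same bytes on the wire.
print(same_domain(fullwidth, halfwidth))
```

Both checks should print True. The IDNA 2008 result shown above depends on the uts46=True mapping step in the 'idna' package, which this sketch does not exercise.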

That said, IDNA does still allow for a bunch of spoofing opportunities that aren't possible with pure ASCII, and this requires some care: https://unicode.org/faq/idn.html#16

This is mostly a UI issue, though; there's not much that the socket or ssl modules can do to help here.
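As a rough illustration of the kind of confusable that normalization does not remove, the sketch below compares a pure-ASCII name with a lookalike in which one letter is Cyrillic. The names are invented for the example, to_wire is a hypothetical helper, and the stdlib "idna" codec (IDNA 2003) is assumed.

```python
def to_wire(name: str) -> bytes:
    """Encode a hostname to its ASCII form with the stdlib IDNA 2003 codec.

    Hypothetical helper for illustration only.
    """
    return name.encode("idna")


latin = "example.com"
lookalike = "ex\u0430mple.com"  # third letter is CYRILLIC SMALL LETTER A (U+0430)

# The two strings can render identically in many fonts...
print(latin, lookalike)

# ...but they encode to different names, so DNS and certificate matching
# treat them as unrelated hosts.
print(to_wire(latin))
print(to_wire(lookalike))
print(to_wire(latin) == to_wire(lookalike))
```

The last line should print False: the lookalike becomes an "xn--" label, so the lower layers already see two distinct names, and the confusion only arises when the Unicode form is displayed to a person.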

-n

-- Nathaniel J. Smith -- https://vorpus.org


