[Python-Dev] [ssl] The weird case of IDNA
Nathaniel Smith njs at pobox.com
Sun Dec 31 01:13:10 EST 2017
On Sat, Dec 30, 2017 at 7:26 AM, Stephen J. Turnbull <turnbull.stephen.fw at u.tsukuba.ac.jp> wrote:
> Christian Heimes writes:
>
> > Questions:
> > - Is everybody OK with breaking backwards compatibility? The risk is
> >   small. ASCII-only domains are not affected
>
> That's not quite true, as your German example shows. In some Oriental renderings it is impossible to distinguish halfwidth digits from full-width ones as the same glyphs are used. (This occasionally happens with other ASCII characters, but users are more fussy about digits lining up.) That is, while technically ASCII-only domain names are not affected, users of ASCII-only domain names are potentially vulnerable to confusable names when IDNA is introduced. (Hopefully the Asian registrars are as woke as the German ones! But you could still register a .com containing full-width digits or letters.)
This particular example isn't an issue: in IDNA encoding, full-width and half-width digits are normalized together, so number１.com (written with a full-width digit one, U+FF11) and number1.com actually refer to the same domain name. This is true in both the 2003 and 2008 versions:
IDNA 2003

In [7]: "number\uff11.com".encode("idna")
Out[7]: b'number1.com'

IDNA 2008 (using the 'idna' package from pypi)

In [8]: idna.encode("number\uff11.com", uts46=True)
Out[8]: b'number1.com'
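For anyone who wants to poke at this outside of IPython, here is a minimal standalone sketch using only the standard library. It covers the IDNA 2003 case only (the stdlib "idna" codec), and the helper name to_ascii_2003 is made up for illustration. The NFKC call is there because the nameprep step of IDNA 2003 applies NFKC normalization, which is what folds the full-width digit into a plain ASCII "1".

```python
import unicodedata


def to_ascii_2003(name: str) -> bytes:
    """Encode a domain name with the stdlib 'idna' codec (IDNA 2003).

    Hypothetical helper name, used only for this illustration.
    """
    return name.encode("idna")


fullwidth = "number\uff11.com"  # U+FF11 FULLWIDTH DIGIT ONE
plain = "number1.com"           # ordinary ASCII digit

# NFKC compatibility normalization maps the full-width digit to ASCII "1".
folded = unicodedata.normalize("NFKC", fullwidth)

# Both spellings produce the same bytes on the wire.
same_domain = to_ascii_2003(fullwidth) == to_ascii_2003(plain)

print(folded, same_domain)
```

Since both spellings produce identical bytes, anything downstream of the encoding step (DNS lookup, certificate matching) has no way to tell them apart.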
That said, IDNA does still allow for a bunch of spoofing opportunities that aren't possible with pure ASCII, and this requires some care: https://unicode.org/faq/idn.html#16
This is mostly a UI issue, though; there's not much that the socket or ssl modules can do to help here.
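To make the contrast with the full-width digit case concrete, here is a rough sketch of the kind of confusable that normalization does not fold away. The name below is a hypothetical lookalike built on the reserved example.com domain, with a Cyrillic letter swapped in for the Latin "a"; the helper name to_ascii_2003 is likewise made up, and only the stdlib IDNA 2003 codec is exercised.

```python
def to_ascii_2003(name: str) -> bytes:
    """Encode a domain name with the stdlib 'idna' codec (IDNA 2003).

    Hypothetical helper name, used only for this illustration.
    """
    return name.encode("idna")


latin = "example.com"
# Hypothetical lookalike: U+0430 CYRILLIC SMALL LETTER A replaces Latin "a".
lookalike = "ex\u0430mple.com"

latin_wire = to_ascii_2003(latin)
lookalike_wire = to_ascii_2003(lookalike)

# Unlike the full-width digit, the Cyrillic letter is not folded to ASCII,
# so the two names stay distinct after encoding.
differs = latin_wire != lookalike_wire

# The lookalike label goes out as an ACE ("xn--") label.
is_ace = lookalike_wire.startswith(b"xn--")

print(latin_wire, lookalike_wire, differs, is_ace)
```

At the byte level the two names are plainly different, so the socket and ssl layers treat them as unrelated hosts; the confusion only exists where the Unicode form is displayed to a person.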
-n
-- Nathaniel J. Smith -- https://vorpus.org