[Python-Dev] Unicode literals in Python 2.7
Chris Angelico rosuav at gmail.com
Thu Apr 30 03:43:46 CEST 2015
On Thu, Apr 30, 2015 at 11:03 AM, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> Note that even if you have a UTF-8 input source, some users are likely to be surprised because IIRC Python doesn't canonicalize in its codecs; that is left for higher-level libraries. Linux UTF-8 is usually NFC normalized, while Mac UTF-8 is NFD normalized.
>
> > >>> u'\xce\xb1'
>
> Note that that is perfectly legal Unicode.
It's legal Unicode, but it doesn't mean what he typed in. This means:
'\xce' LATIN CAPITAL LETTER I WITH CIRCUMFLEX
'\xb1' PLUS-MINUS SIGN
but the original input was:
'\u03b1' GREEK SMALL LETTER ALPHA
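
A minimal sketch of the difference, using the standard `unicodedata` module. The variable names are only illustrative, and decoding as Latin-1 is used here as a stand-in for "treat each byte as its own code point", which is what the `u'\xce\xb1'` literal amounts to:

```python
import unicodedata

# The two bytes that UTF-8 uses to encode GREEK SMALL LETTER ALPHA.
utf8_bytes = b'\xce\xb1'

# Each byte read as a separate code point (same as u'\xce\xb1').
as_two_chars = utf8_bytes.decode('latin-1')

# The bytes decoded as UTF-8, which is what was actually typed.
as_intended = utf8_bytes.decode('utf-8')

two_char_names = [unicodedata.name(ch) for ch in as_two_chars]
intended_names = [unicodedata.name(ch) for ch in as_intended]

print(len(as_two_chars), two_char_names)
print(len(as_intended), intended_names)
```

Both results are valid Unicode strings, but the first has two characters and the second has one.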
ChrisA