[Python-Dev] Support of UTF-16 and UTF-32 source encodings
Stephen J. Turnbull stephen at xemacs.org
Sun Nov 15 02:23:50 EST 2015
Steve Dower writes:
> Saying [UTF-16] is rarely used is rather exposing your own unawareness though - it could arguably be the most commonly used encoding (depending on how you define "used").
Because we're discussing the storage of .py files, the relevant definition is the one used by the Unicode Standard, of course: a text/plain stream intended to be manipulated by any conformant Unicode processor that claims to handle text/plain. File formats like Word's, which have in-band formatting codes and allow embedded non-text content, don't count, and neither do operating system or stdlib APIs. Nor have I seen UTF-16 used in email or HTML since the unregretted days of Win2k betas[1] (but I don't frequent Windows- or Java-oriented sites, so I have to admit my experience is limited in a possibly relevant way).
In Japan my impression is that modern versions of Windows have Memopad[sic] configured to emit UTF-8-with-signature by default for new files, and if not, the abomination known as Shift JIS (I'm not sure if that is a user or OEM option, though). Never a widechar encoding (after all, the whole point of Shift JIS was to use an 8-bit encoding for the katakana syllabary to save space or bandwidth).
I think if anyone wants to use UTF-16 or UTF-32 for exchange of Python programs, they probably already know how to convert them to UTF-8. As somebody already suggested, this can be delegated to the py.exe launcher, if necessary, AFAICS.
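For concreteness, here is a rough sketch of the kind of conversion meant above. It assumes the file starts with a BOM, and the helper name to_utf8 is made up for illustration; it is not what py.exe or any existing tool actually does.

```python
import codecs

# Check the longer BOMs first: the UTF-32-LE BOM begins with the UTF-16-LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def to_utf8(data: bytes) -> bytes:
    """Hypothetical helper: re-encode BOM-marked UTF-16/UTF-32 bytes as UTF-8."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            # The generic "utf-16"/"utf-32" codecs consume the BOM and take
            # the byte order from it, so the result carries no signature.
            return data.decode(name).encode("utf-8")
    # No wide-char BOM found: assume the data is already ASCII-compatible.
    return data
```

A launcher or pre-processing step could apply something like this to the raw bytes before handing the source to the interpreter. Note that BOM sniffing is a heuristic: a UTF-16-LE file whose first character after the BOM is U+0000 would be mistaken for UTF-32-LE.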
I don't see any good reason for allowing non-ASCII-compatible encodings in the reference CPython interpreter.
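To illustrate what "ASCII-compatible" buys in practice, the sketch below scans raw bytes for a coding declaration. The pattern is modeled on the one in PEP 263, and declared_encoding is an illustrative name; this is not CPython's actual tokenizer code.

```python
import re

# Byte-level pattern modeled on the PEP 263 coding declaration.
CODING_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def declared_encoding(raw_line: bytes):
    """Return the encoding named in a coding comment, or None if none is found."""
    m = CODING_RE.match(raw_line)
    return m.group(1).decode("ascii") if m else None


line = "# coding: utf-16\n"
as_utf8 = line.encode("utf-8")       # ASCII characters keep their ASCII bytes
as_utf16 = line.encode("utf-16-le")  # every ASCII character gains a NUL byte

print(declared_encoding(as_utf8))
print(declared_encoding(as_utf16))
print(b"\x00" in as_utf16)
```

In an ASCII-compatible encoding the declaration can be read before the encoding is known. In UTF-16 the bytes of "coding" are interleaved with NULs, so a byte-level scan of this kind finds nothing, and the encoding has to be established some other way (for example from a BOM) before the first line can be read at all.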
However, having mentioned Windows and Java, I have to wonder about IronPython and Jython, respectively. Having never lived in either of those environments, I don't know what text encoding their users might prefer (or even occasionally encounter) in Python program source.
Steve
Footnotes: [1] The version of Outlook Express shipped with them would emit "HTML" mail with ASCII tags and UTF-16-encoded text (even if it was encodable in pure ASCII). No, it wasn't spam, either, so it probably really was Outlook Express as it claimed to be in one of the headers.