[Python-Dev] Support of UTF-16 and UTF-32 source encodings

M.-A. Lemburg mal at egenix.com
Sun Nov 15 15:38:45 EST 2015

On 14.11.2015 23:56, Victor Stinner wrote:

> These encodings are rarely used. I don't think that any text editor uses them. Editors use ASCII, Latin-1, UTF-8 and... all locale encodings. But I don't know of any OS using UTF-16 as a locale encoding. UTF-32 wastes disk space.

UTF-16 is used a lot for Windows text files, e.g. Unicode CSV files (the save as "Unicode text file" option writes UTF-16).

However, nowadays all text editors also support UTF-8, and many of them recognize the UTF-8 BOM as an identifier for detecting Unicode text files.
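To illustrate what BOM-based detection looks like in practice, here is a minimal sketch in Python. The helper names `sniff_bom` and `decode_text` are hypothetical (they are not part of the standard library); only the `codecs.BOM_*` constants and the codec names are real. It assumes the data either starts with a BOM or is plain UTF-8, which is a simplification.

```python
import codecs

# Order matters: the UTF-32 LE BOM (FF FE 00 00) begins with the same bytes
# as the UTF-16 LE BOM (FF FE), so the longer UTF-32 marks are checked first.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(data: bytes, default: str = "utf-8") -> str:
    """Guess a codec name from a leading BOM (hypothetical helper)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    # No BOM found: fall back to the caller's default (an assumption).
    return default


def decode_text(data: bytes) -> str:
    """Decode bytes using the sniffed codec; the codec consumes the BOM."""
    return data.decode(sniff_bom(data))
```

A file written by a "Unicode text" export would typically go through the `utf-16` branch, while a UTF-8 file with a BOM goes through `utf-8-sig`. One known limitation of this kind of sniffing: UTF-16 LE text whose first character after the BOM is U+0000 is indistinguishable from a UTF-32 LE BOM.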

> Ok, even if it exists, Python already accepts a very wide range of encodings. It is not worth making the parser much more complex just to support encodings which are also never used (for .py files).

Agreed. In Python 2 we decided to only allow ASCII supersets for Python source files, which ruled out multi-byte encodings such as UTF-16 and UTF-32. I don't think we need to make the parser more complex just to support them. UTF-8 works fine as Python source code encoding.
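The ASCII-superset rule can be observed from Python itself. The sketch below uses the standard library's `tokenize.detect_encoding()`, which looks for a UTF-8 BOM or an encoding declaration in the first two lines of a source file; `source_encoding` is a hypothetical wrapper added only for illustration.

```python
import io
import tokenize


def source_encoding(source: bytes) -> str:
    """Return the encoding that tokenize detects for raw source bytes."""
    # detect_encoding() takes a readline callable and returns a tuple of
    # (encoding name, list of the raw lines it had to read).
    encoding, _lines_read = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


plain = b"x = 1\n"                                # no declaration at all
declared = b"# -*- coding: latin-1 -*-\nx = 1\n"  # explicit declaration
with_bom = b"\xef\xbb\xbfx = 1\n"                 # UTF-8 BOM, no declaration

for sample in (plain, declared, with_bom):
    print(source_encoding(sample))
```

The three samples should report `utf-8`, `iso-8859-1` (the normalized name for `latin-1`) and `utf-8-sig`. The declaration line has to be readable before the encoding is known, which only works when the encoding agrees with ASCII on those bytes. Passing UTF-16 encoded source to the same function is expected to raise `SyntaxError` in current CPython.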

> Victor
>
> On 14 Nov 2015 20:20, "Serhiy Storchaka" <storchaka at gmail.com> wrote:

>> For now UTF-16 and UTF-32 source encodings are not supported. There is a comment in Parser/tokenizer.c:
>>
>>     /* Disable support for UTF-16 BOMs until a decision is made whether
>>        this needs to be supported. */
>>
>> Can we decide whether this support will be added in the foreseeable future (say, in the next 10 years) or not? Removing the commented-out and related code will help to refactor the tokenizer, which in turn can help to fix some existing bugs (e.g. issue14811, issue18961, issue20115 and maybe others). The current tokenizing code is too tangled. If support for UTF-16 and UTF-32 is planned, I'll take this into account during refactoring. But in many places besides the tokenizer, source files are expected to be in an ASCII-compatible encoding.

Python-Dev mailing list Python-Dev at python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/victor.stinner%40gmail.com

-- Marc-Andre Lemburg eGenix.com
