[Python-Dev] len(chr(i)) = 2?

R. David Murray rdmurray at bitdance.com
Mon Nov 22 18:30:29 CET 2010


On Mon, 22 Nov 2010 12:00:14 -0500, Alexander Belopolsky <alexander.belopolsky at gmail.com> wrote:

> I recently updated chr() and ord() documentation and used "narrow/wide" terms. I thought UCS2/4 proponents objected to that on the basis that these terms are imprecise.

For reference, a grep in py3k/Doc reveals that there are currently exactly 23 lines mentioning UCS2 or UCS4 in the docs. Most are in the unicode part of the c-api, and 6 are in what's new for 2.2:

c-api/arg.rst: Convert a null-terminated buffer of Unicode (UCS-2 or UCS-4) data to a Python
c-api/arg.rst: Convert a Unicode (UCS-2 or UCS-4) data buffer and its length to a Python

c-api/unicode.rst: for :c:type:`Py_UNICODE` and store Unicode values internally as UCS2. It is also
c-api/unicode.rst: possible to build a UCS4 version of Python (most recent Linux distributions come
c-api/unicode.rst: with UCS4 builds of Python). These builds then use a 32-bit type for
c-api/unicode.rst: :c:type:`Py_UNICODE` and store Unicode data internally as UCS4. On platforms
c-api/unicode.rst: short` (UCS2) or :c:type:`unsigned long` (UCS4).
c-api/unicode.rst:Note that UCS2 and UCS4 Python builds are not binary compatible. Please keep
c-api/unicode.rst: values is interpreted as an UCS-2 character.

whatsnew/2.2.rst:usually stored as UCS-2, as 16-bit unsigned integers. Python 2.2 can also be
whatsnew/2.2.rst:compiled to use UCS-4, 32-bit unsigned integers, as its internal encoding by
whatsnew/2.2.rst:supplying :option:`--enable-unicode=ucs4` to the configure script. (It's also
whatsnew/2.2.rst:When built to use UCS-4 (a "wide Python"), the interpreter can natively handle
whatsnew/2.2.rst:compiled to use UCS-2 (a "narrow Python"), values greater than 65535 will still
whatsnew/2.2.rst:Marc-André Lemburg. The changes to support using UCS-4 internally were

howto/unicode.rst:.. comment Additional topic: building Python w/ UCS2 or UCS4 support
howto/unicode.rst: - [ ] Building Python (UCS2, UCS4)

library/sys.rst: characters are stored as UCS-2 or UCS-4.

library/json.rst: specified. Encodings that are not ASCII based (such as UCS-2) are not

faq/extending.rst:When importing module X, why do I get "undefined symbol: PyUnicodeUCS2*"?
faq/extending.rst:If instead the name of the undefined symbol starts with PyUnicodeUCS4, the
faq/extending.rst: ... print('UCS4 build')
faq/extending.rst: ... print('UCS2 build')
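
For anyone following along, the faq/extending.rst hits above come from a build check based on sys.maxunicode. A minimal sketch of that check, tied back to the len(chr(i)) question in the subject line, might look like the following. The helper names build_kind and units_for are made up for illustration and do not appear in the docs; the sketch also assumes Python 3, and on 3.3 and later the narrow branch should never be taken.

```python
import sys


def build_kind():
    """Classify the running interpreter as a narrow or wide Unicode build.

    Hypothetical helper: narrow builds report sys.maxunicode == 0xFFFF,
    wide builds report 0x10FFFF.
    """
    return "wide (UCS4)" if sys.maxunicode > 0xFFFF else "narrow (UCS2)"


def units_for(code_point):
    """Return len(chr(code_point)), i.e. how many str items the code point occupies.

    Hypothetical helper: on a narrow build a non-BMP code point is stored
    as a surrogate pair, which is where len(chr(i)) == 2 comes from.
    """
    return len(chr(code_point))


if __name__ == "__main__":
    print(build_kind())
    print(units_for(0x41))      # a BMP character
    print(units_for(0x10000))   # the first non-BMP code point
```

Comparing units_for(0x10000) against the reported build kind shows the behaviour the "narrow/wide" wording in the chr() and ord() docs is trying to describe.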

-- R. David Murray www.bitdance.com


