[Python-Dev] len(chr(i)) = 2?
Alexander Belopolsky alexander.belopolsky at gmail.com
Mon Nov 22 18:00:14 CET 2010
- Previous message: [Python-Dev] len(chr(i)) = 2?
- Next message: [Python-Dev] len(chr(i)) = 2?
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Mon, Nov 22, 2010 at 11:13 AM, Nick Coghlan <ncoghlan at gmail.com> wrote: ..
>> Do you think these articles are helpful for someone learning how to use chr() and ord() in Python for the first time?
>
> No, that's what the documentation of chr() and ord() is for. For that use case, it doesn't matter what the terms are.
I recently updated the chr() and ord() documentation and used the "narrow/wide" terms. I thought UCS2/4 proponents objected to that on the basis that these terms are imprecise.
http://docs.python.org/dev/library/functions.html#chr
http://docs.python.org/dev/library/functions.html#ord
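For illustration, here is a minimal sketch of how the two build types can be told apart at run time through sys.maxunicode. The helper name build_kind is hypothetical and not part of any Python API. Note that interpreters from Python 3.3 onward always report 0x10FFFF, so they fall into the "wide" branch.

```python
import sys

def build_kind(maxunicode=sys.maxunicode):
    """Classify a build by its largest code point (hypothetical helper)."""
    # Narrow builds report 0xFFFF; wide builds report 0x10FFFF.
    return "narrow" if maxunicode == 0xFFFF else "wide"

print(build_kind())
```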
> They could say "in a FOO build this will do X, in a BAR build it will do Y, see <link> for a detailed explanation of the differences between FOO and BAR builds of Python" and be perfectly adequate for the task. If there is no appropriate documentation link to point to (probably somewhere in the C API docs if it isn't anywhere else) then that is a key issue that needs to be fixed, rather than trying to change the terms that have been in use for the better part of a decade already.
That's the point that I was trying to make. Using the somewhat vague narrow/wide terms gives us an opportunity to describe exactly what is going on without confusing the reader with the intricacies of the Unicode Standard or Python's compliance with a particular version of it.
> The raw meaning of UCS2/UCS4 mainly comes into the story when people are encountering this as a config option when building Python. The whole idea of changing the terms for the two build types should have been short circuited by the "status quo wins a stalemate" guideline, but apparently that didn't happen at the time.
It also comes in the "Data model" reference section on String which is currently out of date:
""" Strings The items of a string object are Unicode code units. A Unicode code unit is represented by a string object of one item and can hold either a 16-bit or 32-bit value representing a Unicode ordinal (the maximum value for the ordinal is given in sys.maxunicode, and depends on how Python is configured at compile time). Surrogate pairs may be present in the Unicode object, and will be reported as two separate items. The built-in functions chr() and ord() convert between code units and nonnegative integers representing the Unicode ordinals as defined in the Unicode Standard 3.0. Conversion from and to other encodings are possible through the string method encode(). """ http://docs.python.org/dev/reference/datamodel.html
The out of date part is the reference to the Unicode Standard 3.0. I don't think we should refer to a specific version of Unicode here. It has little consequence for the "Python data model" and AFAICT does not come into play anywhere except unicodedata which is currently at version 6.0.
The description of chr() and ord() is also not accurate on narrow builds, and neither is the statement "The items of a string object are Unicode code units."
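To make the narrow-build behaviour concrete, here is a sketch that computes the UTF-16 code units for a code point, which is what a narrow build stores as the items of a string. The function name utf16_code_units is made up for this example; it uses only str.encode() and the struct module from the standard library.

```python
import struct

def utf16_code_units(code_point):
    """Return the UTF-16 code units for one code point (illustrative helper)."""
    data = chr(code_point).encode("utf-16-le")
    # Each code unit is a 16-bit little-endian value.
    return list(struct.unpack("<%dH" % (len(data) // 2), data))

# A BMP character fits in one code unit; a non-BMP one needs a surrogate pair.
print([hex(u) for u in utf16_code_units(0x20AC)])
print([hex(u) for u in utf16_code_units(0x10000)])
```

On a narrow build, len(chr(0x10000)) equals the length of the second list, that is 2, which is the behaviour in the subject line.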