[Python-Dev] Multilingual programming article on the Red Hat Developer blog

Stephen J. Turnbull stephen at xemacs.org
Thu Sep 18 06:57:40 CEST 2014


Steven D'Aprano writes:

On Wed, Sep 17, 2014 at 09:21:56AM +0900, Stephen J. Turnbull wrote:

Guido's mantra is something like "Python's str doesn't contain characters or even code points[1], it contains code units."

But is that true?

It's not. That's why I wrote the slightly pejorative "mantra" and qualified it with "something like". The precise statement is "something like" this: the array property is more important than preserving character boundaries, so slices etc. are allowed to do unexpected or even evil things in the presence of astral characters in UTF-16 representations.
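To make the slicing point concrete, here is a minimal sketch, not taken from the thread. It emulates a UTF-16 representation by hand, because CPython 3.3 and later (PEP 393) index str by code point. The sample string and the variable names are assumptions chosen for illustration.

```python
# One astral character (U+1F600) between two ASCII letters.
s = "a\U0001F600b"

# On Python 3.3+ a str is indexed by code point, so this is 3.
n_code_points = len(s)

# Emulate a UTF-16 representation as a list of 16-bit code units.
raw = s.encode("utf-16-le")
units = [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]

# The astral character occupies two code units (a surrogate pair),
# so a slice taken at a code unit boundary can cut the pair in half.
head = units[:2]
is_lone_high_surrogate = 0xD800 <= head[-1] <= 0xDBFF

print(n_code_points, [hex(u) for u in units], is_lone_high_surrogate)
```

The array property holds in both views: indexing is constant time and slices have predictable lengths. Only in the code unit view can a slice end in the middle of a character.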

I don't understand what you are trying to say here.

Nor am I sure what you are trying to say here.

We can discuss this off-list if you would like. The natives are getting restless.

non-characters.

Actually not quite. "Noncharacter"

Note the hyphen! (Just kidding, I will avoid that terminology in the future. I knew, but forgot.)

Characters are those code points that may be assigned an interpretation as a character, including undefined characters (private space and reserved).

So characters are code points which are characters, including undefined characters? :-)

No, there's a clear hierarchy here.
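The archived message breaks off here, so the following is only a rough sketch of one such hierarchy, assuming the Unicode Standard's code point types and the definition of "character" quoted above. The function name classify and its return labels are hypothetical, and the "reserved" result depends on the Unicode database version shipped with the interpreter.

```python
import unicodedata


def classify(cp: int) -> str:
    """Hypothetical helper: place a code point in a rough hierarchy."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    # Surrogates are code points, but never characters.
    if 0xD800 <= cp <= 0xDFFF:
        return "surrogate"
    # The 66 noncharacters: U+FDD0..U+FDEF, plus the last two
    # code points of every plane (low 16 bits FFFE or FFFF).
    if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
        return "noncharacter"
    # Everything else may be interpreted as a character.
    cat = unicodedata.category(chr(cp))
    if cat == "Co":
        return "character (private use)"
    if cat == "Cn":
        # Unassigned in this interpreter's Unicode database.
        return "character (reserved)"
    return "character (assigned)"


for cp in (0x41, 0xD800, 0xFFFE, 0xE000, 0x40000):
    print(f"U+{cp:04X}", classify(cp))
```

The order of the checks matters: unicodedata reports noncharacters as Cn as well, so they have to be filtered out before the reserved case.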


