[Python-Dev] Multilingual programming article on the Red Hat Developer blog
Stephen J. Turnbull stephen at xemacs.org
Fri Sep 12 19:24:15 CEST 2014
Jeff Allen writes:
> Simply having a block "for private use" seems to create an unmanaged space for conflict,
No. The uncharted range of human language already contains those conflicts, including recently invented nonsense like "emoticons" and the annual "design a character" contest run by a newspaper in Taipei (IIRC the grand prize is getting your character added to the national standard, though maybe it's just added to that newspaper's collection of private-space characters). Believe me, "private use space, manage it yourself" was the best they could do.
For twenty years now I've been working with the bureaucratic insanity of the Japanese national standard -- it took almost three decades before every Japanese citizen could store their name in a computer using government-approved codes -- and the chaos of the Taiwanese national standard -- which contains hordes of characters with one known use and no known meaning, many of them duplicates. Neither approach works as well as Unicode's, despite its design-by-committee flaws overlaid with national animosities that can flare into linguicidal vetoes and code-space-stuffing logrolling.
> reminiscent of the "other 128 characters" in bilingual programming. I wondered if the way to respect use by applications might be to make it private to a particular sub-class of str, idly however.
If I understand your suggestion, that's precisely the intent of PEP 383: to make undecodable bytes in a coded character stream private. But those bytes need to be in the stream one way or another. So PEP 383 chose a non-Unicode encoding (based on the "lone surrogate" device Markus Kuhn invented for UTF-8b), which does effectively make those elements private to Python (though of course not in the Unicode sense, since lone surrogates aren't even characters in Unicode).
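For readers who haven't seen PEP 383's device in action, here is a minimal illustration using Python 3's standard `surrogateescape` error handler. The sample bytes are made up for the demonstration; the point is that each undecodable byte 0xXX (0x80 to 0xFF) becomes the lone surrogate U+DCXX, which round-trips losslessly but which a strict encoder refuses.

```python
# PEP 383's "surrogateescape" error handler: an undecodable byte 0xXX
# (0x80..0xFF) is decoded as the lone surrogate U+DCXX, which is not a
# Unicode character at all.
raw = b"caf\xe9"  # Latin-1 bytes; 0xE9 is not valid UTF-8 here

text = raw.decode("utf-8", errors="surrogateescape")
print(repr(text))

# The same handler maps the lone surrogate back to the original byte,
# so the round trip is lossless.
assert text.encode("utf-8", errors="surrogateescape") == raw

# A strict UTF-8 encoder rejects the lone surrogate, which is what keeps
# it "private to Python".
try:
    text.encode("utf-8")
except UnicodeEncodeError as exc:
    print("strict UTF-8 rejects it:", exc.reason)
```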
But I gather the "native" Unicode type in Java doesn't allow you to use that dodge, because it checks for malformed Unicode internally (i.e., at a level Jython can't control). So you have to embed such stream elements in the space of Unicode characters, either in the private use space or in unallocated (reserved) space. The latter seems like asking for trouble: the only way to avoid it would be to be prepared to move that data around in case of collision, when a future version of Unicode allocates those code points. But that's precisely what I'm suggesting doing in private space, so the issue is the same either way. Private space with a local registry seems saner.
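To make "private space with a local registry" a little more concrete, here is a rough sketch in Python rather than Jython/Java, and only for UTF-8 input. Everything in it is hypothetical: the function names `pua_decode` and `pua_encode`, the candidate blocks in `CANDIDATE_BASES` (all inside the BMP Private Use Area, U+E000 to U+F8FF), and the use of `surrogateescape` as a convenient first pass. The returned base stands in for the registry entry; if the well-formed input already uses a block, the escaped bytes are moved to the next candidate block.

```python
# Hypothetical sketch: escape undecodable UTF-8 bytes into a Private Use
# Area block instead of lone surrogates. The candidate bases are an
# arbitrary illustrative choice; each claims base+0x80..base+0xFF.
CANDIDATE_BASES = (0xF700, 0xF600, 0xF500)


def _in_block(ch: str, base: int) -> bool:
    return base + 0x80 <= ord(ch) <= base + 0xFF


def pua_decode(data: bytes) -> tuple[str, int]:
    """Decode UTF-8, mapping each undecodable byte b to chr(base + b).

    Returns the text and the base used; the caller must keep the base
    (the "local registry" entry) to reverse the mapping.
    """
    # First pass: undecodable bytes become lone surrogates U+DC80..U+DCFF.
    text = data.decode("utf-8", errors="surrogateescape")
    for base in CANDIDATE_BASES:
        # Collision: well-formed input already uses this block, so move
        # the escaped bytes to the next candidate block instead.
        if any(_in_block(ch, base) for ch in text):
            continue
        return "".join(
            chr(base + ord(ch) - 0xDC00) if 0xDC80 <= ord(ch) <= 0xDCFF else ch
            for ch in text
        ), base
    raise ValueError("every candidate PUA block collides with the input")


def pua_encode(text: str, base: int) -> bytes:
    """Reverse pua_decode, given the base recorded for this text."""
    # Turn escaped PUA characters back into lone surrogates, then let
    # surrogateescape emit the original raw bytes.
    restored = "".join(
        chr(0xDC00 + ord(ch) - base) if _in_block(ch, base) else ch
        for ch in text
    )
    return restored.encode("utf-8", errors="surrogateescape")


raw = b"caf\xe9"
text, base = pua_decode(raw)
assert pua_encode(text, base) == raw

# Input that already contains U+F7E9 forces a move to the next block.
raw2 = "\uf7e9".encode("utf-8") + b"\xff"
text2, base2 = pua_decode(raw2)
assert pua_encode(text2, base2) == raw2
```

The design cost shows up immediately: unlike a lone surrogate, the escaped character is a perfectly valid Unicode character, so nothing but the recorded base distinguishes it from real private-use data. Every component that reads the text would have to consult the same registry.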