[Python-3000] PEP 3131 accepted

Stephen J. Turnbull stephen at xemacs.org
Thu May 24 12:05:24 CEST 2007


Josiah Carlson writes:

> Removing those words that some found offensive, perhaps I will get a response to the point of my post: "your tools aren't very good" and "Emacs does it right" are not valid responses to the concerns brought up regarding Unicode.

You're still missing my point, and I don't find the words offensive. (It's a pain in the neck, since I had already written my reply, but I'll remove them too.) Nor do I find offensive your completely groundless conclusion that I'm deprecating other tools.

I find them to be an indicator of your fears, which cannot be grounded in any experience of mine in exactly the kind of environment PEP 3131 will provide. I strongly suspect you have no experience at all, not even hearsay, to offer. Please prove me wrong! My experience is far from definitive.

But if you can't, well, I don't blame you for your fear, but I also cannot take it seriously as a reason not to implement this PEP in the face of my own long experience.

> but Ka-Ping already stated why this argument is invalid: there does not currently exist a font where one can differentiate all the glyphs,

I'll tell you why Ka-Ping's argument is a strawman. First, one only needs to be able to distinguish those characters that one can read. It's nice to be able to admire the rest, of course, but you don't need to see them as a speaker of that language would. You just use a font you like for the characters you can read, and the rest can be any old dog.

Second, you do not need a single font with universal coverage. I typically use different fonts for Roman, Kanji, half-width kana, and Hangul. If I happen to have some Chinese in there, that will be yet another font. If I had cause to use Arabic, Hebrew, or Thai, they would be yet other fonts. It simply is not at all unpleasant to use LucidaTypewriter for ASCII and Latin-1 in the same buffer with Sazanami Gothic for Japanese.

N.B. Martin is correct to point out the existence of the SIL BMP fallback font, but that doesn't answer the real issue: users should use the fonts (and tools) they like best.

> and further, even if one could visually differentiate similar

I have actually worked in an environment where you can't visually distinguish different characters. Security aside, it's a PITA, and you do want tools to deal with it. Those tools are not expensive; simply audit the editor buffer for characters outside of the user's acceptable set, and be 99% happy. Once you've got tools, it's not a big deal. Can you find somebody with experience to say otherwise?
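The buffer audit described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not code from Emacs or any other editor: the "acceptable set" is assumed here to be Latin-1 (U+0000 to U+00FF), and the names `ACCEPTABLE` and `audit_buffer` are hypothetical. Only `unicodedata.name` comes from the standard library. The sample buffer contains a Cyrillic "а" that renders like a Latin "a", the kind of look-alike that cannot be told apart visually.

```python
import unicodedata

# Hypothetical acceptable set: Latin-1 (U+0000..U+00FF). A real user
# would substitute the scripts they can actually read.
ACCEPTABLE = {chr(c) for c in range(0x100)}


def audit_buffer(text, acceptable=ACCEPTABLE):
    """Return (line, column, char, Unicode name) for every character
    in text that falls outside the acceptable set."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch not in acceptable:
                findings.append((lineno, col, ch, unicodedata.name(ch, "<unnamed>")))
    return findings


# "\u0430mount" looks like "amount" but starts with CYRILLIC SMALL LETTER A,
# so Python 3 treats it as a different identifier.
source = "def total(items):\n    \u0430mount = sum(items)\n    return amount\n"

for lineno, col, ch, name in audit_buffer(source):
    print(f"line {lineno}, col {col}: U+{ord(ch):04X} {name}")
# prints: line 2, col 5: U+0430 CYRILLIC SMALL LETTER A
```

Reporting the Unicode character name rather than the glyph is the point: the name stays unambiguous even when no installed font can distinguish the two characters.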

> glyphs, remembering the 64,000+ glyphs that are available in just the primary Unicode plane to differentiate them, is a herculean task.

Strawman. The only people who need to remember the glyphs are those who need to read them anyway, or to read glyphs that look like them (cf. Ka-Ping's example). So they have already memorized them.

> Never mind the fact that people use dozens, perhaps hundreds, of different editors to write and maintain Python code; the 'Emacs works' argument is poor at best. It was invalid then, and it is invalid now.

It was intended only to counter Ka-Ping's strawman of "impossible to detect", and it demolishes that claim.

But addressing the content of what you write, you mean that, in a world that allows multilingual identifiers, 'Emacs works' "smells like" [from your original post] a threat to the market share of editors that can't deal with multilingual identifiers, not to mention the work habits of Emacs-haters everywhere, don't you?

Well, you're probably wrong. If your users need to deal with multilingual identifiers, maybe they'll prefer to switch to Emacs. If they need extremely robust handling of multilingual identifiers on a daily basis, they probably will switch to Emacs.

I doubt it, though. What they'll probably do is write a five-line patch to get them 90% of the way to what Emacs gives them out of the box, and be ecstatic that they don't have to use Emacs at all. (That's a guess; as an XEmacs developer, I don't see much of that activity.)

And that's a big "if". Most of your users will never, in their working lives, see code in a language that the current version of your editor can't deal with, and 90% won't in the usable life of your product. That I can tell you from experience. Emacs has all these wonderful multilingual features, but you know what? 95% of our users are monoscript 100% of the time.[1] 90% of the rest use their primary script 95% of the time. Emacs being multilingual only means that the one language might be Japanese or Thai. If 99% of your users currently use only ISO-8859-15, that isn't going to change by much just because Python now allows Thai identifiers.

In other words, if you're up the multilingual creek without a paddle, Emacs will get you to shore. Do you have a problem with it, put that way?

> That's an invalid argument, and you know it. "Just use hex escapes"?

No, my argument is not "just use hex escapes". Please read it again, and if you wish to respond to what I wrote, feel free.

So, you have my apologies, but I still advocate implementation of PEP 3131 over your objections, and those of Ka-Ping.

Footnotes: [1] E.g., all Swiss know a half-dozen languages, but they can write all of them in one script, ISO-8859-15.


