[Python-3000] pep 3131 again

"Martin v. Löwis" martin at v.loewis.de
Thu May 17 20:42:56 CEST 2007


> 2. Python forbids these characters.

> Martin, JavaScript treats these specially, and I think Python probably should, too:
>
> The ECMAScript 3 standard for JavaScript requires the tokenizer to throw away all Unicode format-control characters (general category Cf). ECMAScript 4 will likely tweak this (an incompatible change) to retain those characters only in strings and regexps. I like that better.

I've added this as an open issue. It would be easy to add, but I would like to get some confirmation first that it actually helps writers of the RTL languages (preferably from some native speakers).

The proposed change would be that Cf characters would be allowed only in and immediately around identifiers, and in string literals and comments, i.e. the scanner would work this way:

IOW, you couldn't put the formatting characters around whitespace, keywords, or punctuation.
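To make the proposed rule concrete, here is a rough sketch of a checker, not the actual scanner. The function name `misplaced_format_chars` is hypothetical, and the sketch assumes single-line input with only plain `'...'` and `"..."` string literals. It reports the positions of Cf characters that are neither inside a string literal or comment nor attached to a non-keyword identifier.

```python
import keyword
import unicodedata


def is_cf(ch):
    """True for Unicode format-control characters (general category Cf)."""
    return unicodedata.category(ch) == "Cf"


def misplaced_format_chars(line):
    """Return indices of Cf characters the proposed rule would reject.

    Simplifying assumptions (not part of the PEP): one logical line,
    no triple-quoted strings, no string prefixes.
    """
    bad = []
    i, n = 0, len(line)
    while i < n:
        ch = line[i]
        if ch in "'\"":
            # String literal: Cf characters are allowed, skip to the closing quote.
            j = i + 1
            while j < n and line[j] != ch:
                j += 2 if line[j] == "\\" else 1
            i = j + 1
        elif ch == "#":
            # Comment: Cf characters are allowed up to the end of the line.
            break
        elif is_cf(ch) or ch == "_" or ch.isalnum():
            # Word-like run, possibly with Cf characters inside or around it.
            j = i
            while j < n and (is_cf(line[j]) or line[j] == "_" or line[j].isalnum()):
                j += 1
            word = "".join(c for c in line[i:j] if not is_cf(c))
            # Only a real identifier (not a keyword or number) may carry Cf characters.
            if not word.isidentifier() or keyword.iskeyword(word):
                bad.extend(k for k in range(i, j) if is_cf(line[k]))
            i = j
        else:
            # Whitespace or punctuation; a Cf character never reaches this branch.
            i += 1
    return bad
```

With this sketch, `misplaced_format_chars("x\u200f = 1")` reports nothing, while a right-to-left mark after `if` or after `=` is flagged.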

An alternative implementation would be to drop formatting characters everywhere except in string literals.
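A sketch of that alternative, again only as an illustration: the helper name `drop_format_chars_outside_strings` is made up, and it assumes single-line input with plain `'...'` and `"..."` literals (no triple quotes, no prefixes). Cf characters are kept inside string literals and silently dropped everywhere else, including comments.

```python
import unicodedata


def drop_format_chars_outside_strings(line):
    """Remove Cf characters from a line except inside string literals."""
    out = []
    quote = None  # the quote character of the string literal we are inside, if any
    i = 0
    while i < len(line):
        ch = line[i]
        if quote is not None:
            # Inside a string literal: keep everything verbatim.
            out.append(ch)
            if ch == "\\" and i + 1 < len(line):
                # Keep the escaped character too, so \' does not end the literal.
                out.append(line[i + 1])
                i += 1
            elif ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
            out.append(ch)
        elif ch == "#":
            # Comment: strip Cf characters from the rest of the line and stop,
            # so a quote inside the comment is not mistaken for a string.
            out.append("".join(c for c in line[i:]
                               if unicodedata.category(c) != "Cf"))
            break
        elif unicodedata.category(ch) != "Cf":
            out.append(ch)
        i += 1
    return "".join(out)
```

For example, `drop_format_chars_outside_strings("x\u200f = 'a\u200fb'")` gives back `"x = 'a\u200fb'"`: the mark after `x` is gone, the one inside the literal survives.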

I'll repeat that UTR#39 explicitly discourages support for formatting characters in identifiers.

Regards, Martin


