[Python-Dev] Raw string syntax inconsistency

MRAB python at mrabarnett.plus.com
Mon Jun 18 03:13:31 CEST 2012


On 18/06/2012 00:55, Nick Coghlan wrote:

On Mon, Jun 18, 2012 at 6:41 AM, Guido van Rossum <guido at python.org> wrote:

Would it make sense to detect and reject these in 3.3 if the 2.7 syntax is used?

Possibly - I'm trying not to actually change any of the internals of the string literal processing, though. (If I recall the way we implemented the change correctly, by the time we get to processing the string contents, we've forgotten which specific prefix was used.)

However, this question did remind me of another detail I wanted to check after realising this discrepancy existed: it turns out this semantic inconsistency already arises if you use "from __future__ import unicode_literals" to get supposedly "Python 3 style" string literals in 2.x:

Python 2.7.3 (default, May 29 2012, 14:54:22)
>>> from __future__ import unicode_literals
>>> print(r"\u03b3")
γ
>>> print("\u03b3")
γ

Python 3.2.1 (default, Jul 11 2011, 18:54:42)
>>> print(r"\u03b3")
\u03b3
>>> print("\u03b3")
γ

So, perhaps the answer is to leave this as is, and try to make 2to3 smart enough to detect such escapes and replace them with their properly encoded (according to the source code encoding) Unicode equivalent?
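The difference shown in the sessions above can be reproduced on Python 3 alone. This is only a sketch: it assumes the `raw_unicode_escape` codec is an acceptable stand-in for how 2.x reads a raw Unicode literal (it decodes `\u` and `\U` escapes and leaves every other backslash alone). The variable names are illustrative.

```python
# Python 3 only. The raw_unicode_escape codec stands in for Python 2's
# handling of raw Unicode literals (an assumption made for illustration).
body = rb"\u03b3\n"  # the bytes between the quotes of the literal

py3_raw = r"\u03b3\n"                        # what 3.x produces for r"\u03b3\n"
py2_raw = body.decode("raw_unicode_escape")  # roughly what 2.x produced

print(len(py3_raw))  # 8: backslash, u, 0, 3, b, 3, backslash, n
print(len(py2_raw))  # 3: γ, backslash, n
```

In both versions the `\n` stays as two characters; only the `\u` escape is treated differently.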

What if it's not possible to encode that character? I suppose that it could be expanded into a string expression so that a non-raw string literal could be used, possibly using implicit concatenation, parenthesised, if necessary (or always?).
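As a rough illustration of that expansion, here is how a 2.x literal such as ur"\u03b3\d+" might be spelled for 3.x while keeping the source ASCII-only. The example literal and the name `pattern` are made up for this sketch.

```python
# 2.x meaning of ur"\u03b3\d+": the character γ, then backslash, d, +.
# A non-raw literal carries the escape; a raw literal carries the rest.
# The two are joined by implicit concatenation inside parentheses.
pattern = ("\u03b3" r"\d+")

print(pattern)       # γ\d+
print(len(pattern))  # 4
```

The parentheses are not required on a single line, but they keep the expression safe if it is later split across lines or used as an operand.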

After all, that's already the way to include such characters in a forward compatible way when using the future import:

Python 2.7.3 (default, May 29 2012, 14:54:22)
>>> from __future__ import unicode_literals
>>> print("γ")
γ
>>> print(r"γ\n")
γ\n

Python 3.2.1 (default, Jul 11 2011, 18:54:42)
>>> print("γ")
γ
>>> print(r"γ\n")
γ\n

So, rather than going ahead with reverting "ur" support as I first suggested (since it turns out that's not a new problem, but just a different way of spelling an existing problem), how about I do the following:

1. Add a note to PEP 414 and the Py3k porting guide regarding the discrepancy in escaping semantics for raw Unicode strings between 2.x and 3.x
2. Reject the tracker issue for reverting the ur support (the semantic problem already exists, and any solution we come up with for __future__.unicode_literals should handle the ur prefix as well)
3. Create a new feature request for 2to3 to see if it can automatically handle the problem of translating "\u" and "\U" escapes into properly encoded Unicode characters

The scope of the problem is really quite small: you have to be using a raw Unicode string in 2.x (either via the string prefix, or the future import) and using a "\u" or "\U" escape within that string.

[snip]
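The translation of "\u" and "\U" escapes proposed for 2to3 could be sketched along these lines. This is not 2to3 code: `expand_raw_escapes` is a hypothetical helper that works on the text between the quotes of a 2.x raw Unicode literal, and it assumes the 2.x rule that an escape is only recognised when its run of backslashes has odd length.

```python
import re

# Hypothetical helper, not part of 2to3.
# Assumption: \uXXXX / \UXXXXXXXX count as escapes only after an odd-length
# run of backslashes, as documented for 2.x raw Unicode literals.
_ESCAPE = re.compile(
    r"(?<!\\)((?:\\\\)*)"                         # even run of backslashes, kept as-is
    r"\\(?:u([0-9a-fA-F]{4})|U([0-9a-fA-F]{8}))"  # the escape itself
)

def expand_raw_escapes(body):
    """Replace raw-Unicode escapes in a literal body with the characters."""
    def repl(match):
        code_point = int(match.group(2) or match.group(3), 16)
        # chr() raises ValueError for values above 0x10FFFF.
        return match.group(1) + chr(code_point)
    return _ESCAPE.sub(repl, body)

print(expand_raw_escapes(r"\u03b3\d+"))  # γ\d+
print(expand_raw_escapes(r"\\u03b3"))    # \\u03b3 (escaped backslash, unchanged)
```

A real fixer would also have to check that each resulting character can be written in the file's source encoding, which this sketch does not attempt.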


