[Python-Dev] Raw string syntax inconsistency
Terry Reedy tjreedy at udel.edu
Mon Jun 18 07:59:39 CEST 2012
On 6/17/2012 9:07 PM, Guido van Rossum wrote:
On Sun, Jun 17, 2012 at 4:55 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
So, perhaps the answer is to leave this as is, and try to make 2to3 smart enough to detect such escapes and replace them with their properly encoded (according to the source code encoding) Unicode equivalent?
But the whole point of the reintroduction of u"..." is to support code that isn't run through 2to3.
People writing 2&3 code sometimes use 2to3 once (or a few times) on their 2.6/7 version during development to find things they must pay attention to. So Nick's idea could be helpful to people who do not want to use 2to3 routinely either in development or deployment.
Frankly, I don't care how it's done, but I'd say it's important not to silently have different behavior for the same notation in the two versions.
The fundamental problem was giving the 'u' prefix two different meanings in 2.x: 'change the storage type from bytes to unicode', and 'change the contents by partially cooking the literal even when raw processing is requested'*. The only way to silently have the same behavior is to re-introduce the second meaning of partial cooking. (But I would rather make it unnecessary.) But that would freeze the 'u' prefix, or at least 'ur' ('un-raw') forever. It would be better to introduce a new, separate 'p' prefix, to mean partially raw, partially cooked. (But I am opposed to
*I think this non-orthogonal interaction effect was a design mistake and that it would have been better to have re do all the cooking needed by also interpreting \u and \U sequences. I also think we should add this now for 3.3 if possible, to make partial cooking at the parsing stage unnecessary. Putting the processing in re makes it work for all strings, not just those given as literals.
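A minimal sketch of what having re do the cooking looks like, assuming Python 3.3 or later, where the re module accepts \u and \U escapes in str patterns. The variable names are only illustrative.

```python
import re

# A raw literal keeps the backslash: six separate characters.
raw = r"\u03b3"
# An ordinary literal is cooked by the parser into one character.
cooked = "\u03b3"

assert len(raw) == 6
assert len(cooked) == 1

# re interprets the \u escape itself, so the raw pattern matches
# the single cooked character (Python 3.3+).
literal_match = re.match(raw, cooked)
assert literal_match is not None

# Because the processing happens in re, it also works for a pattern
# built at run time rather than written as a literal.
built = "\\" + "u03b3"
built_match = re.match(built, cooked)
assert built_match is not None
```

The second half is the point of doing this in re: the pattern never passes through the parser as a single literal, yet the escape is still honored.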
If that means we have to add an extra step to the compiler to reject r"\u03b3", so be it.
I do not get this. Surely you cannot mean to suddenly start rejecting, in 3.3, a large set of perfectly legal and sensible 6 and 10 character sequences when embedded in literals?
Hm. I still encounter enough environments that don't know how to display such characters that I would prefer to have a rock solid \u escape mechanism. I can think of two ways to support "expanded" unicode characters in raw strings a la Python 2:
(a) let the re module interpret the escapes (like it does for \r and \n);
As said above, I favor this. The 2.x partial cooking (with 'ur' prefix) was primarily a substitute for this.
(b) the user can write r"someblah" "\u03b3" r"moreblah".
This is somewhat orthogonal to (a). Users can do this whenever they want partial processing of backslashes without doubling those they want left as is. A generic example is r'someraw' 'somecooked' r'moreraw' 'morecooked'.
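A short sketch of option (b) in Python 3: adjacent literals are joined at compile time, and each piece follows the rules of its own prefix. The pattern and the sample text are made up for illustration.

```python
import re

# Raw pieces keep their backslashes; the middle, cooked piece is turned
# into the single character U+03B3 by the parser.
pattern = r"\d+\s*" "\u03b3" r"\s*\w+"

# The same string written without raw literals needs doubled backslashes.
doubled = "\\d+\\s*" + "\u03b3" + "\\s*\\w+"
assert pattern == doubled

# The joined pattern behaves like any other regular expression.
sample = "42 \u03b3 abc"
m = re.match(pattern, sample)
assert m is not None
```

This does not depend on re understanding \u at all, since the escape is already a real character by the time re sees the pattern.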
-- Terry Jan Reedy