[Python-Dev] \u and \U escapes in raw unicode string literals

Thomas Heller theller at ctypes.org
Fri May 11 13:05:05 CEST 2007


M.-A. Lemburg wrote:

On 2007-05-11 07:52, Martin v. Löwis wrote:

This is what prompted my question, actually: in Py3k, in the str/unicode unification branch, r"\u1234" changes meaning: before the unification, this was an 8-bit string, where the \u was not special, but now it is a unicode string, where \u is special.

That is true for non-raw strings also: the meaning of "\u1234" also changes.

However, traditionally, there was no escaping mechanism in raw strings in Python, and I feel that this is a good principle, because it is easy to learn (if you leave out the detail that \ can't be the last character in a raw string - which should get fixed also, IMO). So I think in Py3k, r"\u1234" should continue to be a string with 6 characters. Otherwise, people will complain that os.stat(r"c:\windows\system32\user32.dll") fails. Telling them to write os.stat(r"c:\windows\system32\u005Cuser32.dll") will just cause puzzled faces.

Using double backslashes won't cause that reaction:

os.stat("c:\\windows\\system32\\user32.dll")

Sure. But I want to use raw strings for Windows path names; it's much easier to type.
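For readers following the exchange, here is a minimal sketch of how this plays out in released Python 3 interpreters, where raw literals do not process \u. It does not reflect the 2007 unification branch under discussion; the helper name compiles is hypothetical and exists only for this illustration.

```python
# Behaviour of released Python 3 interpreters, not the 2007 branch discussed here.
raw = r"\u1234"      # six characters: backslash, 'u', '1', '2', '3', '4'
cooked = "\u1234"    # one character: U+1234

# The raw spelling and the double-backslash spelling name the same path.
raw_path = r"c:\windows\system32\user32.dll"
escaped_path = "c:\\windows\\system32\\user32.dll"


def compiles(source):
    """Return True if `source` parses as a Python expression (hypothetical helper)."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False


# The source text being checked is r"c:\windows\" : a raw literal that ends
# in a single backslash, which the tokenizer still rejects.
trailing_backslash_ok = compiles('r"c:\\windows\\"')

print(len(raw), len(cooked))
print(raw_path == escaped_path)
print(trailing_backslash_ok)
```

The last check illustrates the one wrinkle mentioned above: a backslash cannot be the final character of a raw literal.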

Also note that Windows is smart enough nowadays to parse the good old Unix forward slash:

os.stat("c:/windows/system32/user32.dll")

In my opinion this is a Windows bug and not a feature. Especially because there are Windows API functions (the shell functions, IIRC) that do NOT accept forward slashes.

Would you say that *nix is dumb because it doesn't parse "\usr\include"?
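As a rough illustration of the separator asymmetry, the standard library's ntpath and posixpath modules can be imported on any platform. The sketch below only manipulates strings. It makes no claim about which Windows API functions accept forward slashes.

```python
import ntpath
import posixpath

forward = "c:/windows/system32/user32.dll"

# ntpath treats both '/' and '\' as separators; normpath rewrites them as '\'.
backward = ntpath.normpath(forward)
windows_name = ntpath.basename(forward)

# posixpath treats only '/' as a separator, so the backslashes below are
# ordinary characters and the whole string counts as one file name.
posix_name = posixpath.basename(r"\usr\include")

print(backward)
print(windows_name)
print(posix_name)
```

Normalizing with ntpath.normpath before handing a path to code that insists on backslashes is one possible workaround, under the assumption that the path is meant to be a Windows path in the first place.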

Windows path names are one of the two primary applications of raw strings (the other being regexes).
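The regex use case can be sketched the same way. The sample strings below are made up for illustration. The point is only that a raw literal avoids doubling every backslash that the regex engine itself needs to see.

```python
import re

# A regex that matches one literal backslash, spelled two ways.
raw_pattern = r"\\"        # two characters reach the regex engine
escaped_pattern = "\\\\"   # the same two characters, written without a raw literal

parts = re.split(raw_pattern, r"c:\windows\system32")

# Raw literals also keep regex escapes such as \d readable.
numbers = re.findall(r"\d+", "user32.dll and shell32.dll")

print(raw_pattern == escaped_pattern)
print(parts)
print(numbers)
```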

Thomas


