[Python-Dev] \u and \U escapes in raw unicode string literals

Guido van Rossum guido at python.org
Thu May 10 20:45:57 CEST 2007


I just discovered that, in all versions of Python as far back as I have access to (2.0), \uXXXX escapes are interpreted inside raw unicode strings. Thus:

>>> a = ur"\u1234"
>>> len(a)
1

Contrast this with:

>>> a = ur"\x12"
>>> len(a)
4

The \U escape has the same behavior, in versions that support it.
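The contrast can also be reproduced without the ur prefix. This is only a sketch: it assumes the raw_unicode_escape codec applies the same rule as a ur"" literal, and decode_raw is a hypothetical helper name, not anything from the standard library.

```python
# Sketch only: the raw_unicode_escape codec stands in for the ur"" literal,
# on the assumption that both apply the same decoding rule.
def decode_raw(source_bytes):
    """Hypothetical helper: decode bytes as a ur"" literal body is assumed to be read."""
    return source_bytes.decode("raw_unicode_escape")

bmp = decode_raw(rb"\u1234")       # \uXXXX is interpreted: one character
hexesc = decode_raw(rb"\x12")      # \x is left alone: backslash, x, 1, 2
wide = decode_raw(rb"\U00010000")  # \UXXXXXXXX is interpreted as well

print(len(bmp), len(hexesc))
print(repr(wide))
```

The byte literals use an rb prefix so that the backslashes reach the codec untouched.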

Does anyone remember why it is done this way? The reference manual describes this behavior, but doesn't give an explanation:

""" When an "r" or "R" prefix is used in conjunction with a "u" or "U" prefix, then the \uXXXX and \UXXXXXXXX escape sequences are processed while all other backslashes are left in the string. For example, the string literal ur"\u0062\n" consists of three Unicode characters: `LATIN SMALL LETTER B', `REVERSE SOLIDUS', and `LATIN SMALL LETTER N'. Backslashes can be escaped with a preceding backslash; however, both remain in the string. As a result, \uXXXX escape sequences are only recognized when there are an odd number of backslashes. """
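The manual's example and its odd-number-of-backslashes rule can be checked roughly as follows. Again a sketch, assuming the raw_unicode_escape codec mirrors what the compiler does for ur"" literals; the variable names are made up for illustration.

```python
# Assumption: decoding with raw_unicode_escape behaves like a ur"" literal.
CODEC = "raw_unicode_escape"

# The manual's example, ur"\u0062\n": b, backslash, n.
manual_example = rb"\u0062\n".decode(CODEC)

# Three backslashes (odd): the first two stay, the third starts a \u escape.
odd = rb"\\\u0062".decode(CODEC)

# Two backslashes (even): nothing is interpreted, all seven characters stay.
even = rb"\\u0062".decode(CODEC)

for label, value in [("manual", manual_example), ("odd", odd), ("even", even)]:
    print(label, len(value), repr(value))
```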

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


