[Python-3000] Raw strings containing \u or \U

Ron Adam rrr at ronadam.com
Fri May 18 18:17:53 CEST 2007


Georg Brandl wrote:
> Ron Adam wrote:
>> Guido van Rossum wrote:
>>> That would be great! This will automatically turn \u1234 into 6 characters, right?
>>
>> I'm not exactly clear when the '\uxxxx' characters get converted. There isn't any conversion done in tokenizer.c that I can see. At that point it's primarily concerned with finding the beginning and end of the string. It looks like everything between the beginning and end is just passed along "as is" and is translated later in the chain.
>
> Look at Python/ast.c, which has functions parsestr() and decode_unicode(). The latter calls PyUnicode_DecodeRawUnicodeEscape(), which I think is the function you're looking for.
>
> Georg
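As a side note on the conversion Georg mentions: the same raw-unicode-escape decoding is also available from Python as the raw_unicode_escape codec, so its effect can be sketched without reading the C code. This is only an illustration run on a current Python 3, not the code path in ast.c itself, and the variable names are made up for the example.

```python
# Six ASCII characters: backslash, 'u', '1', '2', '3', '4'.
raw_bytes = b"\\u1234"

# The raw-unicode-escape codec turns \uXXXX into a single code point...
decoded = raw_bytes.decode("raw_unicode_escape")

# ...but leaves other backslash sequences, such as \n, untouched.
untouched = b"\\n".decode("raw_unicode_escape")

# In Python 3 a raw string literal keeps \u1234 as six characters,
# which is the behaviour Guido asks about.
literal = r"\u1234"

print(len(decoded), len(untouched), len(literal))
```

The contrast between decoded and literal is the point: the codec collapses the escape into one character, while a Python 3 raw literal leaves all six characters in place.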

Thanks, I'll look there.

That should be where I need to look to fix a glitch where the last quote of a raw string is treated as both the end of the string and part of the string.

    >>> r'\'
    "\\'"

Interestingly, it works just fine for raw byte strings. (I wish the letters were reversed; saying bytes-raw-string is awkward.)

    >>> br'\'
    b'\\'
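For comparison, here is a small sketch of how a stock Python 3 interpreter treats a raw literal that ends in a backslash. It assumes an unpatched build, so it will not reproduce the glitch above; the helper name compiles() is hypothetical and exists only for this example.

```python
def compiles(source):
    """Return True if `source` is accepted as a Python expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True

# Source text r'\' : the backslash stops the quote from closing the
# literal, so the string is unterminated.
str_ends_with_backslash = compiles("r'\\'")
bytes_ends_with_backslash = compiles("br'\\'")

# Source text r'\'' is valid: the backslash and the quote both stay
# in the resulting value.
value = eval("r'\\''")

print(str_ends_with_backslash, bytes_ends_with_backslash, len(value))
```

In an unpatched interpreter both literals are rejected, and the backslash-quote pair only survives when another quote follows to close the string.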

Anyway, I've made the corresponding modifications to tokenize.py and tokenize_tests.txt.

The tests for tokenize.py need to be updated. They do a round trip test, but I've found that doesn't mean it's the correct round trip!
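To make the round-trip concern concrete, here is a minimal sketch using the tokenize module as shipped with a current Python 3 (not the modified version discussed here). A round trip can pass even when the tokens themselves are wrong in a consistent way, so the sketch pairs it with a direct check on the STRING token. The helper type_and_string() and the sample source line are assumptions made for the example.

```python
import io
import tokenize

def type_and_string(source):
    """Tokenize `source` and return its (type, string) pairs."""
    readline = io.StringIO(source).readline
    return [(tok.type, tok.string)
            for tok in tokenize.generate_tokens(readline)]

# One line of source containing a raw string with a \u sequence.
source = "x = r'\\u1234'\n"

# Round trip: tokenize, untokenize, then tokenize the result again.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
rebuilt = tokenize.untokenize(tokens)
round_trip_ok = type_and_string(rebuilt) == type_and_string(source)

# Independent check: the raw string must arrive as a single STRING
# token whose text is unchanged from the source.
string_tokens = [s for t, s in type_and_string(source)
                 if t == tokenize.STRING]

print(round_trip_ok, string_tokens)
```

The second check is the one that would catch a tokenizer that splits or truncates the literal but does so identically on both passes.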

Cheers, Ron


