Issue 14221: re.sub backreferences to numbered groups produce garbage

The first example below works; the second one produces output containing garbage characters. (This came up while I was creating a set of examples for a tutorial on regular expressions.)

import re

text = "The cat ate the rat."
print("before: %s" % text)
m = re.search("The (\w+) ate the (\w+)", text)
text = "The %s ate the %s." % (m.group(2), m.group(1))
print("after : %s" % text)

text = "The cat ate the rat."
print("before: %s" % text)
text = re.sub("(\w+) ate the (\w+)", "\2 ate the \1", text)
print("after : %s" % text)

You forgot to use raw strings:

>>> text = "The cat ate the rat."
>>> print("before: %s" % text)
before: The cat ate the rat.
>>> text = re.sub("(\w+) ate the (\w+)", r"\2 ate the \1", text)
>>> print("after : %s" % text)
after : The rat ate the cat.
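
For anyone landing here with the same symptom, here is a minimal sketch of why the raw string matters. In an ordinary string literal, "\2" and "\1" are octal escapes, so Python turns them into the control characters chr(2) and chr(1) before re.sub ever sees the replacement. The variable names below are only illustrative and are not part of the original report.

```python
import re

text = "The cat ate the rat."
pattern = r"(\w+) ate the (\w+)"

# In a normal string literal, "\2" and "\1" are octal escapes for the
# control characters chr(2) and chr(1), so re.sub gets no backreference.
assert "\2 ate the \1" == "\x02 ate the \x01"
garbled = re.sub(pattern, "\2 ate the \1", text)

# A raw string keeps the backslashes, so re.sub receives real backreferences.
swapped = re.sub(pattern, r"\2 ate the \1", text)

# Two equivalent spellings: the explicit \g<N> form, and doubled backslashes
# in a normal string literal.
swapped_g = re.sub(pattern, r"\g<2> ate the \g<1>", text)
swapped_escaped = re.sub(pattern, "\\2 ate the \\1", text)

print(repr(garbled))  # 'The \x02 ate the \x01.'
print(swapped)        # The rat ate the cat.
```

The \g<N> form is also the safer choice when a backreference is followed directly by a digit, since r"\g<2>0" means group 2 followed by a literal "0", whereas r"\20" would be read as a reference to group 20.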

(Maybe you should reconsider writing yet another tutorial about regular expressions, and possibly submit patches to improve the official regex howto if you think it's not good enough.)