Traceback (most recent call last):
  File "/Users/jlaver/retest.py", line 6, in test_escape
    self.assertEquals(re.escape('-'), '-')
AssertionError: '\\-' != '-'

The only place you can do bad things with hyphens is in a character class. I fail to see how you'd be in the situation of wanting to use escape()d data in a character class. Even if I could think of a reason to do that, it's decidedly not the usual case. It's http://bugs.python.org/issue2650 all over again, just with a different character (in that case, underscore). While we're at it, what else shouldn't it be escaping?
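To make the character-class point concrete, here is a minimal sketch of where a hyphen is and is not special. The `user_chars` and `char_class` names are hypothetical, chosen only for illustration; the last part shows the unusual case of building a character class from escaped data.

```python
import re

# Outside a character class a hyphen is an ordinary literal,
# so the escaped and unescaped patterns match the same text.
assert re.fullmatch(r"a-c", "a-c")
assert re.fullmatch(r"a\-c", "a-c")

# Inside a character class an unescaped hyphen between two
# characters denotes a range; an escaped one is a literal hyphen.
assert re.fullmatch(r"[a-c]", "b")
assert re.fullmatch(r"[a\-c]", "b") is None
assert re.fullmatch(r"[a\-c]", "-")

# Hypothetical use: a character class built from escaped data.
# Because re.escape() escapes the hyphen, no range is formed.
user_chars = "a-c"
char_class = "[" + re.escape(user_chars) + "]"
assert re.fullmatch(char_class, "-")
assert re.fullmatch(char_class, "b") is None
```

The escaped hyphen only changes the result in the character-class case; everywhere else it is redundant but harmless.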
The help says:

"""
>>> help(re.escape)
Help on function escape in module re:

escape(pattern)
    Escape all the characters in pattern except ASCII letters, numbers and '_'.
"""

The complementary approach is to escape _only_ the metacharacters.
Quite right, it does say that in the documentation. The documentation is perfectly correct, but the behaviour is wrong in my opinion and as you suggest, we should be escaping metacharacters only.
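As a rough sketch of what "escaping metacharacters only" could look like: the helper name `escape_metachars_only` and the `METACHARACTERS` set are assumptions made for illustration, not anything in the `re` module, and the set is not claimed to be complete.

```python
import re

# Assumption: an illustrative set of characters that are special
# outside a character class. This is not an official list.
METACHARACTERS = frozenset(r".^$*+?{}[]\|()")


def escape_metachars_only(text):
    """Hypothetical helper: backslash-escape only METACHARACTERS."""
    return "".join("\\" + ch if ch in METACHARACTERS else ch for ch in text)


# The hyphen is left alone, unlike with re.escape().
assert escape_metachars_only("-") == "-"
assert escape_metachars_only("a.b") == "a\\.b"

# The escaped text still matches the original string literally.
sample = "1+1=2?"
assert re.fullmatch(escape_metachars_only(sample), sample)
```

Output from a helper like this would not be safe to drop into a character class, or into a pattern compiled with `re.VERBOSE`, where whitespace and `#` also become significant.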
In #2650 re.escape() was updated to match Perl's behavior. I don't think there's any actual reason to change it -- it brings no benefits and it might break some code (even if admittedly it's not very likely).
I looked up quotemeta with perldoc and you're right, it will quote the hyphen. Given that Python's regex engine correctly deals with unnecessarily quoted characters, I suppose this is fine.
I can think of a real disadvantage with the current behaviour: it messes up Unicode graphemes. For example:

>>> print('हिन्दी')
हिन्दी
>>> print(re.escape('हिन्दी'))
\ह\ि\न\्\द\ी

Of course, that's only a problem if you need to print it out or write it to a file.
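The effect can be sketched without depending on any particular version of `re.escape()`. The helper `escape_every_code_point` below is hypothetical and simply mimics the output shown above by putting a backslash before every code point, including combining marks.

```python
import re
import unicodedata


def escape_every_code_point(text):
    """Hypothetical helper: put a backslash before every code point."""
    return "".join("\\" + ch for ch in text)


word = "हिन्दी"
escaped = escape_every_code_point(word)

# Code points in the Unicode "Mark" categories attach to the preceding
# base character when rendered; a backslash in front of them splits
# the grapheme apart on display.
marks = [ch for ch in word if unicodedata.category(ch).startswith("M")]

# The pattern is still valid and still matches the original text,
# so only the printed or stored form is affected.
still_matches = re.fullmatch(escaped, word) is not None
```

This is consistent with the point that the problem is cosmetic: matching is unaffected, but the escaped string no longer renders as the original word.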
History
Date                 User          Action  Args
2022-04-11 14:57:49  admin         set     github: 62862
2013-08-06 16:18:28  mrabarnett    set     messages: +
2013-08-06 15:26:13  ezio.melotti  set     stage: resolved
2013-08-06 13:48:51  jjl           set     status: open -> closed; resolution: wont fix; messages: +