Issue 18662: re.escape should not escape the hyphen

Created on 2013-08-05 17:10 by jjl, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description
rebugtest.py jjl,2013-08-05 17:10 Test case
Messages (6)
msg194495 - (view) Author: James Laver (jjl) Date: 2013-08-05 17:10
Traceback (most recent call last):
  File "/Users/jlaver/retest.py", line 6, in test_escape
    self.assertEquals(re.escape('-'), '-')
AssertionError: '\\-' != '-'

The only place you can do bad things with hyphens is in a character class. I fail to see how you'd be in the situation of wanting to use escape()d data in a character class. Even if I could think of a reason to do that, it's decidedly not the usual case.

It's http://bugs.python.org/issue2650 all over again, just with a different character (in that case, underscore).

While we're at it, what else shouldn't it be escaping?
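Editorial illustration (not part of the original message): a minimal sketch of the character-class case mentioned above, where an unescaped hyphen changes meaning. The input string `user_text` is a hypothetical example.

```python
import re

user_text = "a-c"  # hypothetical input containing a hyphen

# Unescaped, the hyphen forms the range a..c inside the class.
range_class = re.compile("[" + user_text + "]")

# Escaped, the hyphen is a literal, so the class holds only 'a', '-' and 'c'.
literal_class = re.compile("[" + re.escape(user_text) + "]")

print(range_class.fullmatch("b") is not None)    # True: 'b' lies in the range a..c
print(literal_class.fullmatch("b") is not None)  # False: 'b' is not in the class
print(literal_class.fullmatch("-") is not None)  # True: the hyphen is a literal
```

Outside a character class the hyphen has no special meaning, which is the point the report makes.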
msg194496 - (view) Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2013-08-05 17:19
The help says:

"""
>>> help(re.escape)
Help on function escape in module re:

escape(pattern)
    Escape all the characters in pattern except ASCII letters, numbers and '_'.
"""

The complementary approach is to escape _only_ the metacharacters.
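Editorial illustration (not part of the original message): a rough sketch of what "escape only the metacharacters" could look like. The helper name `escape_meta_only` and the chosen metacharacter set are assumptions made for this example; the set covers characters that are special outside a character class in non-verbose mode, and it is not how `re.escape` is implemented.

```python
import re

# Assumed metacharacter set: special outside a character class, non-verbose mode.
_META = re.compile(r"[.^$*+?{}\[\]\\|()]")


def escape_meta_only(text):
    """Hypothetical helper: backslash-escape only the assumed metacharacters."""
    return _META.sub(lambda m: "\\" + m.group(0), text)


print(escape_meta_only("a-b"))     # a-b (the hyphen is left alone)
print(escape_meta_only("1+1=2?"))  # 1\+1=2\?
```

A pattern built this way would still need extra care inside a character class or under `re.VERBOSE`, where hyphens, whitespace and `#` become significant.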
msg194497 - (view) Author: James Laver (jjl) Date: 2013-08-05 17:35
Quite right, it does say that in the documentation. The documentation is perfectly correct, but the behaviour is wrong in my opinion and as you suggest, we should be escaping metacharacters only.
msg194526 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2013-08-06 05:52
In #2650 re.escape() was updated to match Perl's behavior. I don't think there's any actual reason to change it -- it brings no benefits and it might break some code (even if admittedly it's not very likely).
msg194544 - (view) Author: James Laver (jjl) Date: 2013-08-06 13:48
I looked up quotemeta with perldoc and you're right, it will quote the hyphen. Given that python's regex engine correctly deals with unnecessarily quoted characters, I suppose this is fine.
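Editorial illustration (not part of the original message): a small check of the observation above that an unnecessarily escaped hyphen still matches a literal hyphen. The sample string `literal` is a hypothetical input.

```python
import re

literal = "1-2"  # hypothetical sample text

escaped = re.escape(literal)  # a backslash is placed before the hyphen

# The escaped pattern, a hand-written escaped pattern and the plain pattern
# all match the same literal text.
print(re.fullmatch(escaped, literal) is not None)   # True
print(re.fullmatch(r"1\-2", literal) is not None)   # True
print(re.fullmatch("1-2", literal) is not None)     # True
```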
msg194563 - (view) Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2013-08-06 16:18
I can think of a real disadvantage with the current behaviour: it messes up Unicode graphemes. For example:

>>> print('हिन्दी')
हिन्दी
>>> print(re.escape('हिन्दी'))
\ह\ि\न\्\द\ी

Of course, that's only a problem if you need to print it out or write it to a file.
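Editorial illustration (not part of the original message): the output of `re.escape` on non-ASCII text may differ between Python versions, so this sketch emulates the rule quoted from the help text earlier in the thread rather than calling `re.escape`. The helper `escape_per_docstring` is hypothetical. It shows why combining marks get separated from their base characters: every code point receives its own backslash.

```python
import string

# Characters the quoted help text says are left unescaped.
_SAFE = frozenset(string.ascii_letters + string.digits + "_")


def escape_per_docstring(text):
    """Hypothetical emulation: escape everything except ASCII letters, digits, '_'."""
    return "".join(ch if ch in _SAFE else "\\" + ch for ch in text)


word = "हिन्दी"
escaped = escape_per_docstring(word)

# Each code point, including vowel signs and the virama, is now preceded by a
# backslash, so the rendered grapheme clusters are broken apart.
print(escaped)
print(len(escaped) == 2 * len(word))  # True
```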
History
Date User Action Args
2022-04-11 14:57:49 admin set github: 62862
2013-08-06 16:18:28 mrabarnett set messages: +
2013-08-06 15:26:13 ezio.melotti set stage: resolved
2013-08-06 13:48:51 jjl set status: open -> closed; resolution: wont fix; messages: +
2013-08-06 05:52:36 ezio.melotti set type: behavior -> enhancement; messages: +; versions: - Python 2.6, Python 3.1, Python 2.7, Python 3.2, Python 3.3, Python 3.5
2013-08-05 17:35:44 jjl set messages: +
2013-08-05 17:19:34 mrabarnett set messages: +
2013-08-05 17:10:44 jjl create