Issue 1519638: Unmatched Group issue - workaround

Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: BMintern, Nikker, THRlWiTi, effbot, ezio.melotti, gerardjp, mchaput, mrabarnett, nneonneo, python-dev, serhiy.storchaka, terry.reedy, timehorse
Priority: normal Keywords: patch

Created on 2006-07-09 18:34 by nneonneo, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
re_sub_unmatched_group.patch serhiy.storchaka,2014-09-18 10:54 review
Messages (23)
msg29112 - (view) Author: Robert Xiao (nneonneo) * Date: 2006-07-09 18:34
Using sre.sub[n], an "unmatched group" error can occur. The test I used is this pattern:

    sre.sub("foo(?:b(ar)|baz)","\\1","foobaz")

This will cause the following backtrace to occur:

    Traceback (most recent call last):
      File "", line 1, in ?
      File "lib/python2.4/sre.py", line 142, in sub
        return _compile(pattern, 0).sub(repl, string, count)
      File "lib/python2.4/sre.py", line 260, in filter
        return sre_parse.expand_template(template, match)
      File "lib/python2.4/sre_parse.py", line 782, in expand_template
        raise error, "unmatched group"
    sre_constants.error: unmatched group

Python Version 2.4.3, Mac OS X (behaviour has been verified on Windows 2.4.3 as well).

This behaviour, while by design, is unwanted because this type of substitution usually expects a blank match to be returned (i.e. the example should return '').

The example that I was trying resembles the following:

    sre.sub("User: (?:Registered User #(\d+)|Guest)","%USERID \\1%",data)

The intended behaviour is that the backreference expands to "" when the user is a guest and to the user number when the user is a registered member. However, when this function encounters a Guest, it raises an exception and terminates, which is not what is wanted. Perl and other regex engines behave as I have described, substituting empty strings for unmatched groups. The code fix is relatively simple, and would really help out for these types of things.
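The root of the report can be seen without calling sub at all. The following is a minimal sketch, not taken from the thread, that reuses the reporter's pattern with the current re module; the variable names are illustrative only.

```python
import re

pattern = re.compile(r"foo(?:b(ar)|baz)")

# "foobar" takes the first branch, so group 1 captures "ar".
matched = pattern.search("foobar").group(1)

# "foobaz" takes the second branch; group 1 never participates
# in the match and is reported as None, not as an empty string.
unmatched = pattern.search("foobaz").group(1)

print(repr(matched), repr(unmatched))
```

It is this None value that the template expansion in the traceback refuses to substitute.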
msg29113 - (view) Author: Matt Chaput (mchaput) Date: 2007-02-15 18:35
The current behavior also makes the "sub" function useless when you need to backreference a group that might not capture, since you have no chance to deal with the exception.
msg29114 - (view) Author: Robert Xiao (nneonneo) * Date: 2007-02-17 02:56
AFAIK the findall function works as desired in this respect: empty matches will return empty strings.
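A short sketch of the findall behaviour mentioned here, using the pattern from the original report (the input string is an assumption chosen for illustration):

```python
import re

# With exactly one capturing group, findall returns that group for every
# match; a group that did not participate is reported as "".
found = re.findall(r"foo(?:b(ar)|baz)", "foobar foobaz")

print(found)
```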
msg58672 - (view) Author: Brandon Mintern (BMintern) Date: 2007-12-16 12:24
This is still a problem which has just given me a headache, because using re.sub now requires gymnastics instead of just using a simple string as I did in Perl.
msg69541 - (view) Author: Gerard (gerardjp) Date: 2008-07-11 08:17
Hi All, I found a workaround for the re.sub method so it does not raise an exception but returns an empty string when backref-ing an empty group. This is the nutshell: When doing a search and replace with sub, replace the group written as optional with a group written as an alternation with one empty subexpression. So instead of this “(.+?)?” use this “(|.+?)” (without the double quotes). If there’s nothing matched by this group, the empty subexpression matches. Then an empty string is returned instead of a None and the sub method is executed normally instead of raising the “unmatched group” error. A complete description is in my post: http://www.gp-net.nl/2008/07/11/solved-python-regex-raising-exception-unmatched-group/ Regards, Gerard.
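A minimal sketch of the workaround described above. The patterns and the sample text are assumptions made up for illustration (digits between "a" and "b"); they are not taken from the linked post.

```python
import re

text = "ab a12b"

# Optional group: when no digits are present the group does not participate.
optional_group = re.compile(r"a(\d+)?b")
# Same idea with an empty alternative: the group always participates,
# matching "" when no digits are present.
empty_alternative = re.compile(r"a(|\d+)b")

optional_capture = optional_group.search("ab").group(1)
empty_capture = empty_alternative.search("ab").group(1)

# The backreference is now always defined, so a plain template works.
rewritten = empty_alternative.sub(r"[\1]", text)

print(repr(optional_capture), repr(empty_capture), rewritten)
```

The rewrite only helps when the group is optional on its own; it does not cover a group that sits on one side of an alternation.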
msg69558 - (view) Author: Brandon Mintern (BMintern) Date: 2008-07-11 16:52
Looking at your code example, that solution seems quite obvious now, and I wouldn't even call it a "workaround". Thanks for figuring this out. Now if I could only remember what code I was using that for...
msg78272 - (view) Author: Robert Xiao (nneonneo) * Date: 2008-12-24 21:30
How would I apply that workaround to my example? re.sub("foo(?:b(ar)|baz)","\\1","foobaz")
msg79830 - (view) Author: Gerard (gerardjp) Date: 2009-01-14 05:21
Dear Bobby, I don't see what would be the part that generates the empty string? Regards, Gerard.
msg79853 - (view) Author: Robert Xiao (nneonneo) * Date: 2009-01-14 14:34
Well, in this example the group (ar) is unmatched, so sre throws the error, and because of the alternation, the workaround you mentioned doesn't seem to directly apply. A better example is probably

    re.sub("foo(?:b(ar)|foo)","\\1","foofoo")

because this can't be simply repaired by refactoring the regex. The correct behaviour, as I have observed in other regex implementations, is to replace the group by the empty string; for example, in Javascript:

    >>> 'foobar'.replace(/foo(?:b(ar)|baz)/,'$1')
    "ar"
    >>> 'foobaz'.replace(/foo(?:b(ar)|baz)/,'$1')
    ""
msg81064 - (view) Author: Gerard (gerardjp) Date: 2009-02-03 15:59
Bobby, Can you post the actual text you need this for? The back ref indeed returns a None. I'm wondering if the regex can be simplified and if a positive lookbehind could solve this. Semantically speaking: if there's a "b" then return the "ar", because then an empty alternate might again be of help. Kind regards, Gerard.
msg81118 - (view) Author: Robert Xiao (nneonneo) * Date: 2009-02-04 00:36
It was so long ago, I've since redone half my codebase (the hack is still there, but I can't remember what it was meant to replace now :( ). Sorry about that.
msg81220 - (view) Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2009-02-05 19:32
This has been addressed in issue #2636.
msg81462 - (view) Author: Gerard (gerardjp) Date: 2009-02-09 16:44
Matthew, Thanx for the heads-up! Regards, Gerard.
msg108662 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2010-06-26 00:30
If I understand "This has been addressed in issue #2636." correctly, this issue should be closed as, perhaps, out-of-date or duplicate, with 2636 as superseder. Correct?
msg108669 - (view) Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2010-06-26 00:58
Issue #2636 resulted in the new regex module (also available on PyPI), so this issue is addressed by that, but there's no patch for the re module.
msg108670 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2010-06-26 01:09
It would be nice if you could port 'pieces' of #2636 to Python, in order to fix this and other bugs (and possibly add more features too).
msg155967 - (view) Author: Nikki DelRosso (Nikker) Date: 2012-03-15 22:02
I'm having the same issue as the original author of this issue was. The workaround does not apply to the situation where the captured text is on one side of an "or" grouping, rather than just being optional. I'm trying to remove groups of text in parentheses that come at the end of a string, but if the content in a pair of parentheses is a number, I want to retain it. My regular expression looks like so:

These work:

    >>> re.sub(r'(?:\((?:(\d+)|.*?)\)\s*)+$','\\1','avatar (2009)')
    'avatar 2009'
    >>> re.sub(r'(?:\((?:(\d+)|.*?)\)\s*)+$','\\1','avatar (2009) (special edition)')
    'avatar 2009'

This doesn't:

    >>> re.sub(r'(?:\((?:(\d+)|.*?)\)\s*)+$','\\1','avatar (special
    Traceback (most recent call last):
      File "", line 1, in <module>
      File "/usr/lib/python2.6/re.py", line 151, in sub
        return _compile(pattern, 0).sub(repl, string, count)
      File "/usr/lib/python2.6/re.py", line 278, in filter
        return sre_parse.expand_template(template, match)
      File "/usr/lib/python2.6/sre_parse.py", line 793, in expand_template
        raise error, "unmatched group"
    sre_constants.error: unmatched groupedition)')

Is there some way I can apply this workaround to this situation?
msg155969 - (view) Author: Nikki DelRosso (Nikker) Date: 2012-03-15 22:04
Sorry, the non-working command should look as follows: re.sub(r'(?:\((?:(\d+)|.*?)\)\s*)+$','\\1','avatar (special edition)')
msg155982 - (view) Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2012-03-16 00:59
The replacement can be a callable, so you could do this: re.sub(r'(?:\((?:(\d+)|.*?)\)\s*)+$', lambda m: m.group(1) or '', 'avatar (special edition)')
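The callable approach can be wrapped up as a small sketch. The pattern is the one from the thread; the names TRAILING_PARENS, keep_number and strip_trailing_parens are hypothetical and exist only for this illustration.

```python
import re

# Pattern from the thread: trailing parenthesised groups, with group 1
# capturing the content only when it is a number.
TRAILING_PARENS = re.compile(r"(?:\((?:(\d+)|.*?)\)\s*)+$")


def keep_number(match):
    # match.group(1) is None when the numeric branch did not participate,
    # so fall back to an empty string instead of relying on a "\1" template.
    return match.group(1) or ""


def strip_trailing_parens(title):
    return TRAILING_PARENS.sub(keep_number, title)


print(repr(strip_trailing_parens("avatar (2009)")))
print(repr(strip_trailing_parens("avatar (special edition)")))
```

Because the callable receives the match object, it never goes through template expansion, so it works the same way on versions that raise "unmatched group" for string templates.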
msg155983 - (view) Author: Nikki DelRosso (Nikker) Date: 2012-03-16 01:08
Perfect; thank you!
msg227037 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-09-18 10:54
Here is a patch which makes unmatched groups be replaced by an empty string. These changes look more like a new feature than a bug fix and therefore can be applied only to 3.5.
msg228966 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2014-10-10 08:16
New changeset bd2f1ea04025 by Serhiy Storchaka in branch 'default': Issue 1519638: Now unmatched groups are replaced with empty strings in re.sub() https://hg.python.org/cpython/rev/bd2f1ea04025
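A sketch of what the change means for the reporter's second example, assuming Python 3.5 or later. The name USER and the sample inputs are made up for illustration.

```python
import re

# Assumes Python 3.5+; earlier versions raise "unmatched group" for the
# guest case instead of substituting an empty string.
USER = re.compile(r"User: (?:Registered User #(\d+)|Guest)")

registered = USER.sub(r"%USERID \1%", "User: Registered User #42")
guest = USER.sub(r"%USERID \1%", "User: Guest")

print(repr(registered), repr(guest))
```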
msg228969 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-10-10 08:45
Thank you for your review Antoine.
History
Date User Action Args
2022-04-11 14:56:18 admin set github: 43640
2014-10-10 08:45:02 serhiy.storchaka set status: open -> closed resolution: fixed messages: + stage: patch review -> resolved
2014-10-10 08:16:35 python-dev set nosy: + python-dev messages: +
2014-10-10 07:50:01 serhiy.storchaka set assignee: serhiy.storchaka
2014-10-08 20:32:20 pitrou set assignee: effbot -> (no value)
2014-09-18 10:54:53 serhiy.storchaka set files: + re_sub_unmatched_group.patch type: enhancement components: + Library (Lib) versions: + Python 3.5, - Python 2.6, Python 2.7 keywords: + patch nosy: + serhiy.storchaka messages: + stage: patch review
2013-09-16 14:39:27 THRlWiTi set nosy: + THRlWiTi
2012-03-16 01:08:10 Nikker set messages: +
2012-03-16 00:59:59 mrabarnett set messages: +
2012-03-15 22:04:12 Nikker set messages: +
2012-03-15 22:02:49 Nikker set nosy: + Nikker messages: +
2010-06-26 01:09:57 ezio.melotti set nosy: + ezio.melotti messages: +
2010-06-26 00:58:24 mrabarnett set messages: +
2010-06-26 00:30:53 terry.reedy set nosy: + terry.reedy messages: + versions: - Python 2.5, Python 3.0
2009-02-09 16:44:49 gerardjp set messages: +
2009-02-05 19:32:55 mrabarnett set nosy: + mrabarnett messages: +
2009-02-04 00:36:38 nneonneo set messages: +
2009-02-03 15:59:47 gerardjp set messages: +
2009-01-14 14:34:02 nneonneo set messages: + versions: + Python 2.6, Python 2.5, Python 3.0
2009-01-14 05:21:40 gerardjp set messages: +
2008-12-24 21:30:42 nneonneo set messages: +
2008-09-27 14:39:08 timehorse set versions: + Python 2.7, - Python 2.5
2008-09-27 14:36:36 timehorse set nosy: + timehorse
2008-07-11 16:52:19 BMintern set messages: +
2008-07-11 08:17:20 gerardjp set nosy: + gerardjp messages: + title: Unmatched Group issue -> Unmatched Group issue - workaround
2007-12-16 12:24:50 BMintern set nosy: + BMintern messages: +
2006-07-09 18:34:12 nneonneo create