Issue 22437: re module: number of named groups is limited to 100 max

Created on 2014-09-18 17:39 by yselivanov, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name                    Uploaded
re_maxgroups.patch           serhiy.storchaka, 2014-09-18 20:36
re_maxgroups_dynamic.patch   serhiy.storchaka, 2014-09-21 20:49
Messages (10)
msg227055 - Author: Yury Selivanov (yselivanov) (Python committer) Date: 2014-09-18 17:39
While writing a lexer for the JavaScript language, I managed to hit the limit on named groups in a single regexp: it's 100. The check is in the sre_compile.py:compile() function, and there is even an XXX comment on it. Unfortunately, I'm not an expert in this module, so I'm not sure whether this check can be lifted, or at least whether the number can be bumped to 200 or 500 (why 100, btw?). Please share your thoughts.
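The lexer pattern described above typically joins one named group per token type into a single alternation, so every new token type adds a group. A minimal sketch of that technique, assuming Python 3.6+ (for f-strings) and hypothetical token names (`NUMBER`, `IDENT`, `OP`, `SKIP`); `Match.lastgroup` reports which named alternative matched. A real JavaScript lexer would have many more token types, which is how the count grows past 100:

```python
import re

# Hypothetical token specification: (name, regex) pairs, tried in order.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT", r"[A-Za-z_$][A-Za-z0-9_$]*"),
    ("OP", r"[+\-*/=;()]"),
    ("SKIP", r"\s+"),
]

# One named group per token type, joined into a single alternation.
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))


def tokenize(source):
    """Yield (kind, text) pairs; raise ValueError on an unexpected character."""
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character {source[pos]!r} at {pos}")
        kind = m.lastgroup  # name of the named group that matched
        if kind != "SKIP":
            yield kind, m.group()
        pos = m.end()


print(list(tokenize("x = 42;")))
# [('IDENT', 'x'), ('OP', '='), ('NUMBER', '42'), ('OP', ';')]
```

None of the token patterns can match the empty string, so each loop iteration is guaranteed to advance `pos`.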
msg227058 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2014-09-18 18:04
It is 100 to avoid a syntactic ambiguity between numbered group references (\1 to \99) and octal escapes, if I remember correctly. I can't remember whether that constraint still applies in Python 3, where octal notation was made stricter in general.
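The ambiguity can be observed in current Python 3 (3.6+ for the f-string below): in a pattern, a backslash followed by three octal digits is an octal character escape, so `\100` means the character `@` (0o100 == 64), not a reference to group 100. A two-digit escape such as `\12` is parsed as a backreference instead. A small sketch:

```python
import re

# A backslash plus three octal digits is an octal escape, not a group reference:
# \100 == chr(0o100) == "@".
octal = re.compile(r"\100")
print(octal.fullmatch("@") is not None)  # True

# Two digits are a backreference: with 12 groups, \12 repeats group 12 ("l").
twelve = re.compile("".join(f"({c})" for c in "abcdefghijkl") + r"\12")
print(twelve.fullmatch("abcdefghijkll") is not None)  # True
```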
msg227060 - Author: Matthew Barnett (mrabarnett) (Python triager) Date: 2014-09-18 18:54
In the regex module, I borrowed the \g<...> escape from .sub's replacement string to provide an alternative way to refer to a group in a pattern, and that let me remove the limit.
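In the standard `re` module, `\g<...>` is the replacement-string syntax for `re.sub`, and unlike `\N` it is unambiguous for any group number or name. A sketch using only `re` (the third-party `regex` module's in-pattern `\g<...>` is not shown), assuming Python 3.5 or later so that a pattern may have more than 100 groups:

```python
import re

# 150 capturing groups, each matching one character of the input.
pattern = re.compile("(.)" * 150)
text = "".join(chr(ord("A") + i % 26) for i in range(150))

# \g<150> unambiguously names group 150 in the replacement string.
last = pattern.sub(r"\g<150>", text)
# \g<1>0 means "group 1 followed by a literal 0"; \10 would mean group 10.
first_then_zero = pattern.sub(r"\g<1>0", text)
print(last, first_then_zero)  # prints: T A0
```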
msg227063 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2014-09-18 20:36
There are two reasons for this limitation. The first is the one David mentioned: there is no syntax to backreference a group with a number > 99 (although there is a syntax for conditional groups and for substitutions). The second is that the current implementation of the regexp engine uses a fixed-size array for groups. Here is a patch that raises the static limit to 1000 groups. It also allows an arbitrary group number to be specified in the form "(?P=number)", which conforms to the syntax of conditional groups and substitutions.
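The conditional-group syntax mentioned above already accepts any group number: `(?(100)yes|no)` refers to group 100 unambiguously, even though `\100` in the same pattern would be an octal escape. A sketch assuming Python 3.5+, with 99 mandatory groups and an optional 100th:

```python
import re

# Groups 1-99 each match "a"; group 100 is an optional "b".
prefix = "(a)" * 99 + "(b)?"
# Conditional: if group 100 matched, require "c", otherwise require "d".
pattern = re.compile(prefix + "(?(100)c|d)")

print(bool(pattern.fullmatch("a" * 99 + "bc")))  # True: group 100 matched
print(bool(pattern.fullmatch("a" * 99 + "d")))   # True: group 100 did not match
print(bool(pattern.fullmatch("a" * 99 + "bd")))  # False
```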
msg227064 - Author: Yury Selivanov (yselivanov) (Python committer) Date: 2014-09-18 20:53
Serhiy, this is awesome! Is it possible to split the patch in two, and commit the one that just increases the group limit to 3.4 as well? Thank you
msg227066 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2014-09-18 21:13
This is definitely not a bug fix. Maybe Matthew will commit it to the regex module, and then you could use regex instead of re.
msg227237 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2014-09-21 20:49
Here is a patch that removes the static limit. It is much more complicated than the first patch, and I would prefer to apply the first patch first. Aren't 1000 groups enough for everyone?
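The change that was eventually committed removed the fixed ceiling on capturing groups, so on Python 3.5 and later a pattern can have well over 1000 of them. A quick check, assuming a Python 3.5+ interpreter; the exact upper bound is implementation-defined and is not probed here:

```python
import re

# 2000 capturing groups: beyond both the old limit of 100 and the
# 1000-group static limit proposed in the first patch.
pattern = re.compile("(x)" * 2000)
m = pattern.fullmatch("x" * 2000)

print(pattern.groups)  # 2000
print(m.group(2000))   # x
print(m.span(1500))    # (1499, 1500)
```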
msg227635 - Author: Yury Selivanov (yselivanov) (Python committer) Date: 2014-09-26 16:51
I'm fine with either one, Serhiy. The static one looks good to me.
msg227820 - Author: Roundup Robot (python-dev) (Python triager) Date: 2014-09-29 19:50
New changeset 0b85ea4bd1af by Serhiy Storchaka in branch 'default': Issue #22437: Number of capturing groups in regular expression is no longer https://hg.python.org/cpython/rev/0b85ea4bd1af
msg227825 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2014-09-29 20:15
Thank you, Antoine, for your review. To avoid a discrepancy between re and regex (and other engines), I have committed only part of the dynamic patch, without adding support for backreferences with an index over 99. This limit is unlikely to be reached in a hand-written regular expression, and in a generated regular expression you can use named groups. I have found that backreference syntax is one of the least consistent things across regular expression engines. There are at least 8 different variants (\N, \gN, \g<N>, \g{N}, \k<N>, \k'N', \k{N}, (?P=N)), and \g<N> in Perl has a different meaning.
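For generated patterns, the workaround suggested above is to name any group that needs a backreference and refer to it with `(?P=name)`, which works regardless of the group's position. A sketch with a hypothetical generator, `build_pattern`, that emits 120 groups and refers back to the last one:

```python
import re


def build_pattern(n):
    """Hypothetical generator: n-1 unnamed digit groups, then a named group."""
    parts = [r"(\d)"] * (n - 1)
    parts.append(r"(?P<last>[a-z]+)")
    # (?P=last) repeats whatever the named group matched; a numeric \120
    # here would be parsed as an octal escape, not a backreference.
    parts.append(r"-(?P=last)")
    return re.compile("".join(parts))


pattern = build_pattern(120)
print(pattern.groups)                                # 120
print(pattern.groupindex["last"])                    # 120
print(bool(pattern.fullmatch("7" * 119 + "ab-ab")))  # True
print(bool(pattern.fullmatch("7" * 119 + "ab-cd")))  # False
```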
History
Date User Action Args
2022-04-11 14:58:08 admin set github: 66627
2014-09-29 20:15:38 serhiy.storchaka set status: open -> closed, resolution: fixed, messages: +, stage: patch review -> resolved
2014-09-29 19:50:49 python-dev set nosy: + python-dev, messages: +
2014-09-26 16:51:05 yselivanov set messages: +
2014-09-21 20:49:46 serhiy.storchaka set files: + re_maxgroups_dynamic.patch, messages: +
2014-09-18 21:13:27 serhiy.storchaka set messages: +
2014-09-18 20:53:23 yselivanov set messages: +
2014-09-18 20:36:43 serhiy.storchaka set assignee: serhiy.storchaka, stage: patch review, versions: + Python 3.5
2014-09-18 20:36:02 serhiy.storchaka set files: + re_maxgroups.patch, keywords: + patch, messages: +
2014-09-18 18:54:25 mrabarnett set messages: +
2014-09-18 18:04:00 r.david.murray set nosy: + r.david.murray, messages: +
2014-09-18 17:39:42 yselivanov create