Issue 14260: re.groupindex is available for modification and continues to work, having incorrect data inside it (original) (raw)

process

Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: Abel.Farias, eric.araujo, eric.snow, ezio.melotti, georg.brandl, gvanrossum, mrabarnett, py.user, python-dev, serhiy.storchaka, vstinner
Priority: normal Keywords: needs review, patch

Created on 2012-03-12 05:28 by py.user, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
re_groupindex_copy.patch serhiy.storchaka,2014-11-01 15:27 review
re_groupindex_proxy.patch serhiy.storchaka,2014-11-01 15:27 review
Messages (18)
msg155442 - (view) Author: py.user (py.user) * Date: 2012-03-12 05:28
>>> import re >>> p = re.compile(r'abc(?Pdef)') >>> p.sub(r'\g', 'abcdef123abcdef') 'def123def' >>> p.groupindex['n'] = 2 >>> p.sub(r'\g', 'abcdef123abcdef') 'def123def' >>> p.groupindex {'n': 2} >>>
msg155459 - (view) Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2012-03-12 18:16
The re module creates the dict purely for the benefit of the user, and as it's a normal dict, it's mutable. An alternative would to use an immutable dict or dict-like object, but Python doesn't have such a class, and it's probably not worth writing one just for this use-case.
msg155484 - (view) Author: py.user (py.user) * Date: 2012-03-12 21:37
Matthew Barnett wrote: > The re module creates the dict purely for the benefit of the user this dict affects on regex.sub() >>> import re >>> p = re.compile(r'abc(?Pdef)') >>> p.groupindex {'n': 1} >>> p.groupindex['n'] = 2 >>> p.sub(r'\g', 'abcdef') Traceback (most recent call last): File "/usr/local/lib/python3.2/sre_parse.py", line 811, in expand_template literals[index] = s = g(group) IndexError: no such group During handling of the above exception, another exception occurred: Traceback (most recent call last): File "", line 1, in File "/usr/local/lib/python3.2/re.py", line 286, in filter return sre_parse.expand_template(template, match) File "/usr/local/lib/python3.2/sre_parse.py", line 815, in expand_template raise error("invalid group reference") sre_constants.error: invalid group reference >>>
msg155546 - (view) Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2012-03-13 00:52
It appears I was wrong. :-( The simplest solution in that case is for it to return a _copy_ of the dict.
msg155560 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2012-03-13 02:34
But regex.sub is affected only if you manually muck with the dict, right? If so, then it looks like a case of “it hurts when I do this” (the doctor’s reply: “Don’t do this.”)
msg155570 - (view) Author: py.user (py.user) * Date: 2012-03-13 05:02
the first message shows how it can work with a broken dict Éric Araujo wrote: > But regex.sub is affected only if you manually muck with the dict, right? I can get code from anywhere
msg155593 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2012-03-13 11:54
> I can get code from anywhere I am afraid I don’t understand. Could you start again and explain what bug you ran into, i.e. what behavior does not match what the docs say? At present this report looks like it is saying “when I put random things in an internal data structures then bad things happen”, and I don‘t think Python promises to not break when people do random editions to internal data structures.
msg155702 - (view) Author: py.user (py.user) * Date: 2012-03-14 00:56
I take someone's code make tests for its behavior all tests say "the code is working" I continue to write the code make tests for its behavior all tests say "the code is working" I install it somewhere and it crashes now it is depending on the cache, when this exception is raised Éric Araujo wrote: >and I don‘t think Python promises to not break when people do random editions when people do something wrong, python should raise an exception
msg155734 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2012-03-14 07:26
Looks like a case for a read-only dict/dictproxy :)
msg175494 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-13 10:14
I fully agree with Éric. Just don't do this.
msg175497 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-11-13 11:06
I'm not so sure. If dicts or classes are used for configuration or informational purposes, I prefer them to be locked down. An example of the first is the decimal context, where it was possible to write context.emax = 9 instead of context.Emax = 9 without getting an error. This is an easy mistake to make and can be hard to track down in a large program. The mistake here is maybe less likely, but I agree with Georg that it's a case for a read-only dict/dictproxy.
msg175502 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2012-11-13 16:01
I propose using a MappingProxy type in 3.4 and add an example to the docs for stable versions.
msg175505 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-13 16:48
Copy or proxy may affect performance. We will need to make benchmarks to see how much.
msg230450 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-11-01 15:27
Here are two patches which implement two alternative solutions. They are based on regex code. Dict copying patch matches current regex behavior and needs modifying other code to avoid small slowdown. Artificial example: $ ./python -m timeit -s 'import re; n = 100; m = re.match("".join("(?P<g%d>.)" % g for g in range(n)), "x" * n); t = ",".join(r"\g<g%d>" % g for g in range(n))' -- 'm.expand(t)' Without patch: 7.48 msec per loop With re_groupindex_copy.patch but without modifying _expand: 9.61 msec per loop With re_groupindex_copy.patch and with modifying _expand: 7.41 msec per loop While stdlib code can be modified, this patch can cause small slowdown of some third-party code. Dict proxying patch has no performance effect, but it is slightly less compatible. Some code can accept dict but not dict-like object.
msg234878 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-01-28 09:23
Ping.
msg239041 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-03-23 15:27
What approach looks better, a copy or a read-only proxy?
msg239094 - (view) Author: py.user (py.user) * Date: 2015-03-24 07:35
@Serhiy Storchaka > What approach looks better, a copy or a read-only proxy? ISTM, your proxy patch is better, because it expects an exception rather than silence.
msg239526 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-03-29 22:02
New changeset 4d5826fa77a1 by Serhiy Storchaka in branch 'default': Issue #14260: The groupindex attribute of regular expression pattern object https://hg.python.org/cpython/rev/4d5826fa77a1
History
Date User Action Args
2022-04-11 14:57:27 admin set github: 58468
2015-03-30 07:08:36 serhiy.storchaka set status: open -> closedassignee: serhiy.storchakaresolution: fixedstage: patch review -> resolved
2015-03-29 22:02:50 python-dev set nosy: + python-devmessages: +
2015-03-24 07:35:37 py.user set messages: +
2015-03-23 15:27:01 serhiy.storchaka set messages: +
2015-01-28 09:23:56 serhiy.storchaka set keywords: + needs reviewmessages: +
2014-11-05 20:17:53 serhiy.storchaka set stage: needs patch -> patch reviewversions: + Python 3.5, - Python 2.7, Python 3.2, Python 3.3, Python 3.4
2014-11-01 15:27:02 serhiy.storchaka set files: + re_groupindex_copy.patch, re_groupindex_proxy.patchkeywords: + patchmessages: +
2014-10-14 15:13:45 skrah set nosy: - skrah
2012-11-13 22:03:11 py.user set title: re.groupindex available for modification and continues to work, having incorrect data inside it -> re.groupindex is available for modification and continues to work, having incorrect data inside it
2012-11-13 16:48:56 serhiy.storchaka set messages: +
2012-11-13 16:01:30 eric.araujo set resolution: not a bug -> (no value)stage: needs patchmessages: + versions: + Python 2.7, Python 3.3, Python 3.4
2012-11-13 11🔞05 skrah set title: re.grupindex available for modification and continues to work, having incorrect data inside it -> re.groupindex available for modification and continues to work, having incorrect data inside it
2012-11-13 11:12:13 Abel.Farias set nosy: + Abel.Farias
2012-11-13 11:11:28 Abel.Farias set title: re.groupindex available for modification and continues to work, having incorrect data inside it -> re.grupindex available for modification and continues to work, having incorrect data inside it
2012-11-13 11:06:20 skrah set status: pending -> opennosy: + skrahmessages: +
2012-11-13 10:14:55 serhiy.storchaka set status: open -> pendingnosy: + serhiy.storchakamessages: + resolution: not a bug
2012-11-13 03:04:49 eric.snow set nosy: + eric.snow
2012-03-14 12:15:04 vstinner set nosy: + gvanrossum
2012-03-14 07:26:21 georg.brandl set nosy: + georg.brandl, vstinnermessages: +
2012-03-14 00:56:37 py.user set messages: +
2012-03-13 11:54:14 eric.araujo set messages: +
2012-03-13 05:02:23 py.user set messages: +
2012-03-13 02:34:27 eric.araujo set nosy: + eric.araujomessages: +
2012-03-13 00:52:18 mrabarnett set messages: +
2012-03-12 21:37:18 py.user set messages: +
2012-03-12 18:16:40 mrabarnett set messages: +
2012-03-12 05:35:27 eric.smith set title: regex.groupindex available for modification and continues to work, having incorrect data inside it -> re.groupindex available for modification and continues to work, having incorrect data inside it
2012-03-12 05:28:22 py.user create