Issue 602444: non greedy match bug

Created on 2002-08-30 14:44 by rjroy, last changed 2022-04-10 16:05 by admin. This issue is now closed.

Files
File name | Uploaded | Description
reference.pdf | rjroy, 2002-08-30 14:44 | file that will trigger error
Messages (4)
msg12215 - Author: Robert Roy (rjroy) Date: 2002-08-30 14:44
When using the following regular expression to extract all objects from a PDF file, I get a "maximum recursion limit exceeded" error. The attached PDF file reproduces the error. If I use "import pre as re" instead, it works fine. Platform: Win2k, Python 2.2.1 build #34.

    import re
    GETOBJECT = re.compile(r'\d+\s+\d+\s+obj.+?endobj', re.I|re.S|re.M)
    pdf = open('userguide.pdf', 'rb').read()
    all = GETOBJECT.findall(pdf)
    print len(all)
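The report above targets Python 2.2.1. For comparison, here is a minimal Python 3 sketch of the same extraction, run against synthetic data rather than the attached PDF; the helper make_fake_pdf and the object count and body size are hypothetical. Because a file opened in 'rb' mode yields bytes in Python 3, the pattern has to be a bytes literal. On a current CPython interpreter this is expected to complete without a recursion error, even though each lazy .+? walks across a 100,000-byte body.

```python
import re

# Bytes pattern: in Python 3 a file opened in 'rb' mode yields bytes, and a
# str pattern cannot be used on bytes. re.M is kept from the report but has
# no effect here because the pattern contains no ^ or $.
GETOBJECT = re.compile(rb'\d+\s+\d+\s+obj.+?endobj', re.I | re.S | re.M)


def make_fake_pdf(n_objects, body_size):
    """Hypothetical helper: build a PDF-like blob of 'N 0 obj ... endobj' blocks."""
    parts = [b'%PDF-1.4\n']
    for i in range(1, n_objects + 1):
        parts.append(b'%d 0 obj\n' % i)
        parts.append(b'x' * body_size)  # long body the lazy .+? must walk across
        parts.append(b'\nendobj\n')
    return b''.join(parts)


pdf = make_fake_pdf(n_objects=50, body_size=100_000)
objects = GETOBJECT.findall(pdf)
print(len(objects))  # 50
```

The list is named objects rather than all, so the built-in all() is not shadowed as it is in the original snippet.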
msg12216 - Author: Robert Roy (rjroy) Date: 2003-02-14 18:56
Logged In: YES user_id=352797 The maximum recursion limit problem in the re module is well known. Until this limitation in the implementation is removed, see http://www.python.org/dev/doc/devel/lib/module-re.html and http://python.org/sf/493252 for ways to work around it.
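One possible workaround, sketched here in Python 3 and not necessarily the one the linked pages describe, avoids the lazy .+? entirely: a regex matches only the short object header, and bytes.find locates the next "endobj", so no regex repeat ever spans an object body. This assumes each object ends at the first "endobj" that follows its header. The names HEADER, END_MARKER, and extract_objects are hypothetical, and unlike the re.I pattern in the report, the "endobj" lookup here is case-sensitive.

```python
import re

# Match only the short object header (e.g. b'12 0 obj'); no lazy repeat is
# needed because the end of each object is found with bytes.find below.
HEADER = re.compile(rb'\d+\s+\d+\s+obj', re.I)
END_MARKER = b'endobj'  # case-sensitive, unlike the re.I pattern in the report


def extract_objects(data):
    """Hypothetical workaround: return each 'N G obj ... endobj' span in data."""
    objects = []
    pos = 0
    while True:
        m = HEADER.search(data, pos)
        if m is None:
            break
        end = data.find(END_MARKER, m.end())
        if end == -1:
            break  # no 'endobj' remains, so no later header can be closed either
        end += len(END_MARKER)
        objects.append(data[m.start():end])
        pos = end
    return objects


sample = b'1 0 obj\n<< /A 1 >>\nendobj\n2 0 obj\nfoo\nendobj\n'
print(len(extract_objects(sample)))  # 2
```

The trade-off is a small explicit loop in exchange for never asking the regex engine to backtrack over large spans.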
msg12217 - Author: Brett Cannon (brett.cannon) (Python committer) Date: 2003-05-21 05:54
Logged In: YES user_id=357491 Closing this since hitting the recursion limit is not a bug.
msg12218 - Author: Gustavo Niemeyer (niemeyer) (Python committer) Date: 2003-05-24 16:52
Logged In: YES user_id=7887 As Gary Herron correctly pointed out to me, this was fixed in 2.3 with the introduction of a new opcode to handle single-character non-greedy matching. This won't be fixed in 2.2.3, but will hopefully be backported to 2.2.4 together with other regular expression fixes.
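One way to see the construct this message refers to is the re.DEBUG flag, which prints the parse tree while compiling a pattern. The sketch below assumes a Python 3 interpreter; dump_parse_tree is a hypothetical helper name. In the captured output, .+? appears as a MIN_REPEAT node whose body is a single ANY item, which is exactly the single-character non-greedy case. The specialized opcode itself is chosen later by the compiler (in current CPython sources it appears to be named MIN_REPEAT_ONE); that is an internal detail, and the exact debug format differs between Python versions.

```python
import io
import re
from contextlib import redirect_stdout


def dump_parse_tree(pattern):
    """Hypothetical helper: capture what re.DEBUG prints while compiling pattern."""
    buf = io.StringIO()
    with redirect_stdout(buf):
        # Patterns compiled with re.DEBUG are not cached, so this always prints.
        re.compile(pattern, re.DEBUG)
    return buf.getvalue()


tree = dump_parse_tree(r'obj.+?endobj')
print(tree)
```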
History
Date | User | Action | Args
2022-04-10 16:05:38 | admin | set | github: 37117
2002-08-30 14:44:20 | rjroy | create |