Issue 33566: re.findall() deadlocked when the expected ending char does not occur until end of string

Created on 2018-05-18 08:06 by mamamiaibm, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (6)
msg317013 - (view) Author: Min (mamamiaibm) Date: 2018-05-18 08:06
First, I wrote something like this:
    patn = r"\bROW\s*\((\d+|\*)\)(.|\s)*?\)"
    newlines = re.sub(patn, "\nYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY\n", newlines)
but if the file (or string) ended without the expected ")", the code deadlocked there: no progress, no exception, and no exit. Then I changed it to:
    patn = r"\bROW\s*\((\d+|\*)\)(.
msg317015 - (view) Author: Min (mamamiaibm) Date: 2018-05-18 08:09
Sorry, forgot I have upgraded to 3.6.2, not 3.5
msg317017 - (view) Author: Min (mamamiaibm) Date: 2018-05-18 08:19
Sorry again, the sample code offered is an issue with re.sub(), not findall() :o)))
msg317042 - (view) Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2018-05-18 17:47
You don't give the value of 'newlines', but the problem is probably catastrophic backtracking, not deadlock.
msg317043 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2018-05-18 17:56
Min, you need to give a complete example other people can actually run for themselves. Offhand, this part of the regexp (.|\s)* all by itself _can_ cause exponential-time behavior. You can run this for yourself:
    >>> import re
    >>> p = r"(.|\s)*K"
    >>> re.search(p, " " * 10) # fast
    >>> re.search(p, " " * 15) # fast
    >>> re.search(p, " " * 20) # obviously takes a bit of time
    >>> re.search(p, " " * 21) # very obviously takes time
    >>> re.search(p, " " * 22) # over a second
    >>> re.search(p, " " * 25) # about 10 seconds
Etc.
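The growth described in the message above can also be measured in a script. This is a minimal sketch, not part of the original thread: the helper name `time_search` and the chosen lengths are illustrative assumptions, and the absolute timings depend on the machine and Python version. The point is only that a space can be matched by either `.` or `\s`, so each extra space roughly doubles the work before the search fails.

```python
import re
import time

# Same ambiguous pattern as in the message above: a space can be matched
# by either "." or "\s", so a failing search explores both choices per character.
AMBIGUOUS = re.compile(r"(.|\s)*K")

def time_search(pattern, n):
    """Return the seconds spent searching a string of n spaces (hypothetical helper)."""
    start = time.perf_counter()
    result = pattern.search(" " * n)
    elapsed = time.perf_counter() - start
    assert result is None  # the text contains no "K", so the search must fail
    return elapsed

# Lengths kept small on purpose; larger values get slow very quickly.
for n in (10, 12, 14, 16, 18):
    print(f"n={n:2d}  {time_search(AMBIGUOUS, n):.4f}s")
```

If the timings roughly double for each additional character, the pattern is backtracking exponentially rather than hanging in a deadlock.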
msg322599 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2018-07-29 00:18
Closing as not-a-bug - not enough info to reproduce, but the regexp looked prone to exponential-time backtracking to both MRAB and me, and there's been no response to requests for more info.
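For readers who hit the same symptom, one common way to remove this kind of ambiguity is to replace `(.|\s)*?` with `.*?` under `re.DOTALL`, so that each character can be consumed in only one way. The sketch below is an assumption about what the reporter intended, not a fix taken from the thread; the names `ROW_CALL` and `replace_rows` and the shortened replacement text are hypothetical.

```python
import re

# Hypothetical rewrite of the reporter's first pattern: "(.|\s)*?" becomes
# ".*?" with re.DOTALL, so "." also matches newlines and no alternation is needed.
ROW_CALL = re.compile(r"\bROW\s*\((\d+|\*)\).*?\)", re.DOTALL)

def replace_rows(text, replacement="\nYYY\n"):
    """Replace each 'ROW(n) ... )' span; input lacking the closing ')' is left unchanged."""
    return ROW_CALL.sub(replacement, text)

# A closing ")" exists, so the whole span is replaced.
print(repr(replace_rows("x ROW(1) a\nb ) y")))

# No closing ")" before the end of the string: the call returns promptly
# and the text comes back unchanged.
unterminated = "ROW(*)" + " \n" * 5000
print(replace_rows(unterminated) == unterminated)
```

With this form a failed match costs time linear in the remaining text for each `ROW(...)` start, instead of doubling with every whitespace character.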
History
Date User Action Args
2022-04-11 14:59:00 admin set github: 77747
2018-07-29 00:18:40 tim.peters set status: open -> closed; components: + Regular Expressions; nosy: + ezio.melotti; messages: +; resolution: not a bug; stage: resolved
2018-05-18 17:56:58 tim.peters set nosy: + tim.peters; messages: +
2018-05-18 17:47:19 mrabarnett set nosy: + mrabarnett; messages: +
2018-05-18 08:19:22 mamamiaibm set messages: +
2018-05-18 08:09:57 mamamiaibm set messages: +; versions: + Python 3.6, - Python 3.5
2018-05-18 08:06:05 mamamiaibm create