Issue 33113: Query performance is very low and can even lead to denial of service (original) (raw)

Created on 2018-03-21 06:28 by ghi5107, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
test_performance.py ghi5107,2018-03-21 06:28 repro file
Messages (7)
msg314187 - (view) Author: guohui (ghi5107) Date: 2018-03-21 06:28
I found a issue in regex (findall search)function, when seaching some content by some pattern, the function return for a long long time, match performance is very low. I think this issue could lead to too low query performance, or a attacker may exploit the issue to cause a denail of service condition. system: python 2.7.14 regex(2018.2.21) poc: import re pat = r'^(\(?[\w\d\-\.\\]{3,}\|?){1,}[\w\d\-\.\\]{3,}\)?$' #plaintext content content = r'(ftp\x3a\x2f\x2f http\x3a\x2f\x2f https\x3a\x2f\x2f
msg322584 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-07-28 18:36
I suggest closing this as "wontfix". This is a just an non-optimized regexp pattern leading to long run times. That these are possible is a well-known trait of backtracking regular expression engines in general, and ours in particular. IMO this isn't a security issue since the root of the issue is the pattern. I don't see this as a bug or a significant performance issue either, and there is no concrete enhancement suggestion here. For clarification, the given pattern is equivalent to: pat = r'''^ ( \(? [\w\d\-\.\\]{3,} \|? )+ [\w\d\-\.\\]{3,} \)? $'''
msg322585 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-07-28 18:37
Clarification: The given pattern is equivalent to that in my previous post, assuming the latter is used with the re.VERBOSE flag.
msg322587 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-07-28 19:53
I concur with Tal. This problem is called "catastrophic backtracking".
msg322593 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2018-07-28 22:25
Note: if you found a regexp like this _in_ the Python distribution, then a bug report would be appropriate. It's certainly possible to write regexps that can suffer catastrophic backtracking, and we've repaired a few of those, over the years, that shipped with Python. But we can't stop you from writing such things yourself. If you post your regexp to, e.g., comp.lang.python or StackOverflow, I'm sure someone will show you how to rewrite it in a safe way. But be prepared to explain in English what you're trying to accomplish with it. For example, while it appears you're trying to ensure there are at least 3 characters (of the right kind) between "|" separators, for some reason you made matching " " optional. That leaves open an exponential number of ways to try to match long strings of non-" " characters between "
msg322608 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-07-29 03:51
Perhaps it would be nice to add a section about catastrophic backtracking and ways of resolving it in the RE Howto.
msg322611 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-07-29 06:50
Serhiy, that would be a good idea. A short mention of the issue with a link to an external reference would also suffice IMO.
History
Date User Action Args
2022-04-11 14:58:58 admin set github: 77294
2018-07-29 06:50:44 taleinat set messages: +
2018-07-29 03:51:47 serhiy.storchaka set messages: +
2018-07-28 22:25:48 tim.peters set nosy: + tim.petersmessages: +
2018-07-28 19:53:24 serhiy.storchaka set status: open -> closedresolution: wont fixmessages: + stage: resolved
2018-07-28 18:37:20 taleinat set messages: +
2018-07-28 18:36:11 taleinat set nosy: + taleinatmessages: +
2018-03-21 08:52:06 xiang.zhang set nosy: + serhiy.storchaka
2018-03-21 06:28:15 ghi5107 create