msg314187 - (view) |
Author: guohui (ghi5107) |
Date: 2018-03-21 06:28 |
I found a issue in regex (findall search)function, when seaching some content by some pattern, the function return for a long long time, match performance is very low. I think this issue could lead to too low query performance, or a attacker may exploit the issue to cause a denail of service condition. system: python 2.7.14 regex(2018.2.21) poc: import re pat = r'^(\(?[\w\d\-\.\\]{3,}\|?){1,}[\w\d\-\.\\]{3,}\)?$' #plaintext content content = r'(ftp\x3a\x2f\x2f |
http\x3a\x2f\x2f |
https\x3a\x2f\x2f |
msg322584 - (view) |
Author: Tal Einat (taleinat) *  |
Date: 2018-07-28 18:36 |
I suggest closing this as "wontfix". This is a just an non-optimized regexp pattern leading to long run times. That these are possible is a well-known trait of backtracking regular expression engines in general, and ours in particular. IMO this isn't a security issue since the root of the issue is the pattern. I don't see this as a bug or a significant performance issue either, and there is no concrete enhancement suggestion here. For clarification, the given pattern is equivalent to: pat = r'''^ ( \(? [\w\d\-\.\\]{3,} \|? )+ [\w\d\-\.\\]{3,} \)? $''' |
|
|
msg322585 - (view) |
Author: Tal Einat (taleinat) *  |
Date: 2018-07-28 18:37 |
Clarification: The given pattern is equivalent to that in my previous post, assuming the latter is used with the re.VERBOSE flag. |
|
|
msg322587 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2018-07-28 19:53 |
I concur with Tal. This problem is called "catastrophic backtracking". |
|
|
msg322593 - (view) |
Author: Tim Peters (tim.peters) *  |
Date: 2018-07-28 22:25 |
Note: if you found a regexp like this _in_ the Python distribution, then a bug report would be appropriate. It's certainly possible to write regexps that can suffer catastrophic backtracking, and we've repaired a few of those, over the years, that shipped with Python. But we can't stop you from writing such things yourself. If you post your regexp to, e.g., comp.lang.python or StackOverflow, I'm sure someone will show you how to rewrite it in a safe way. But be prepared to explain in English what you're trying to accomplish with it. For example, while it appears you're trying to ensure there are at least 3 characters (of the right kind) between "|" separators, for some reason you made matching " |
" optional. That leaves open an exponential number of ways to try to match long strings of non-" |
" characters between " |
msg322608 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2018-07-29 03:51 |
Perhaps it would be nice to add a section about catastrophic backtracking and ways of resolving it in the RE Howto. |
|
|
msg322611 - (view) |
Author: Tal Einat (taleinat) *  |
Date: 2018-07-29 06:50 |
Serhiy, that would be a good idea. A short mention of the issue with a link to an external reference would also suffice IMO. |
|
|