[Python-Dev] (no subject) (original) (raw)

MRAB python at mrabarnett.plus.com
Tue Dec 26 15:15:18 EST 2017


On 2017-12-26 07:01, Yogev Hendel wrote:

I don't know if this is the right place to put this, but I've found the following lines of code results in an incredibly long processing time. Perhaps it might be of use to someone. /import re/ /pat = re.compile('^/?(?:\w+)/(?:[%\w-]+/?)+/?$')/ /pat.match('/t/a-aa-aa-aaaaa-aa-aa-aa-aa-aa-aa./')/ The pattern has a repeated repeat, which results in catastrophic backtracking.

As an example, think about how the pattern (?:a+)+b would try to match the string 'aaac'.

 Match 'aaa', but not 'c'.

 Match 'aa' and 'a', but not 'c'.

 Match 'a' and 'aa', but not 'c'.

 Match 'a' and 'a' and 'a', but not 'c'.

That's 4 failed attempts.

Now try match the string 'aaaac'.

 Match 'aaaa', but not 'c'.

 Match 'aaa' and 'a', but not 'c'.

 Match 'aa' and 'aa', but not 'c'.

 Match 'aa' and 'a a', but not 'c'.

 Match 'a' and 'aaa', but not 'c'.

 Match 'a' and 'aa' and 'a', but not 'c'.

 Match 'a' and 'a aa', but not 'c'.

 Match 'a' and 'a a' and 'a', but not 'c'.

That's 8 failed attempts.

Each additional 'a' in the string to match will double the number of attempts.

Your pattern has (?:[%\w-]+/?)+, and the '/' is optional. The string has a '.', which the pattern can't match, but it'll keep trying until it finally fails.

If you add just 1 more 'a' or '-' to the string, it'll take twice as long as it does now.

You need to think more carefully about how the pattern matches and what it'll do when it doesn't match.



More information about the Python-Dev mailing list