Issue 19365: Quadratic complexity in the parsing of re replacement string (original) (raw)
sre_parse.parse_template uses string concatenation to accumulate characters.
def literal(literal, p=p, pappend=a):
if p and p[-1][0] is LITERAL:
p[-1] = LITERAL, p[-1][1] + literal
else:
pappend((LITERAL, literal))
This operation have quadratic calculation complexity for long replacement strings.
$ ./python -m timeit -n1 -r1 -s "from sre_parse import parse_template; repl = 'x'*100000" "parse_template(repl, '')" 1 loops, best of 1: 3.38 sec per loop $ ./python -m timeit -n1 -r1 -s "from sre_parse import parse_template; repl = 'x'*200000" "parse_template(repl, '')" 1 loops, best of 1: 18.2 sec per loop
The proposed patch change amortized complexity to be linear. It also speeds up parsing shorter strings.
$ ./python -m timeit -n1 -r1 -s "from sre_parse import parse_template; repl = 'x'*100000" "parse_template(repl, '')" 1 loops, best of 1: 915 msec per loop $ ./python -m timeit -n1 -r1 -s "from sre_parse import parse_template; repl = 'x'*200000" "parse_template(repl, '')" 1 loops, best of 1: 1.79 sec per loop