Issue 19365: Quadratic complexity in the parsing of re replacement string (original) (raw)

sre_parse.parse_template uses string concatenation to accumulate characters.

def literal(literal, p=p, pappend=a):
    if p and p[-1][0] is LITERAL:
        p[-1] = LITERAL, p[-1][1] + literal
    else:
        pappend((LITERAL, literal))

This operation have quadratic calculation complexity for long replacement strings.

$ ./python -m timeit -n1 -r1 -s "from sre_parse import parse_template; repl = 'x'*100000" "parse_template(repl, '')" 1 loops, best of 1: 3.38 sec per loop $ ./python -m timeit -n1 -r1 -s "from sre_parse import parse_template; repl = 'x'*200000" "parse_template(repl, '')" 1 loops, best of 1: 18.2 sec per loop

The proposed patch change amortized complexity to be linear. It also speeds up parsing shorter strings.

$ ./python -m timeit -n1 -r1 -s "from sre_parse import parse_template; repl = 'x'*100000" "parse_template(repl, '')" 1 loops, best of 1: 915 msec per loop $ ./python -m timeit -n1 -r1 -s "from sre_parse import parse_template; repl = 'x'*200000" "parse_template(repl, '')" 1 loops, best of 1: 1.79 sec per loop