[issue19365] Quadratic complexity in the parsing of re replacement string

Serhiy Storchaka report at bugs.python.org
Wed Oct 23 16:36:02 CEST 2013


New submission from Serhiy Storchaka:

sre_parse.parse_template uses string concatenation to accumulate characters.

    def literal(literal, p=p, pappend=a):
        if p and p[-1][0] is LITERAL:
            p[-1] = LITERAL, p[-1][1] + literal
        else:
            pappend((LITERAL, literal))

This operation have quadratic calculation complexity for long replacement strings.

$ ./python -m timeit -n1 -r1 -s "from sre_parse import parse_template; repl = 'x'*100000"  "parse_template(repl, '')"
1 loops, best of 1: 3.38 sec per loop
$ ./python -m timeit -n1 -r1 -s "from sre_parse import parse_template; repl = 'x'*200000"  "parse_template(repl, '')"
1 loops, best of 1: 18.2 sec per loop

The proposed patch change amortized complexity to be linear. It also speeds up parsing shorter strings.

$ ./python -m timeit -n1 -r1 -s "from sre_parse import parse_template; repl = 'x'*100000"  "parse_template(repl, '')"
1 loops, best of 1: 915 msec per loop
$ ./python -m timeit -n1 -r1 -s "from sre_parse import parse_template; repl = 'x'*200000"  "parse_template(repl, '')"
1 loops, best of 1: 1.79 sec per loop

----------
components: Library (Lib), Regular Expressions
files: re_parse_template.patch
keywords: patch
messages: 201032
nosy: ezio.melotti, mrabarnett, pitrou, serhiy.storchaka
priority: normal
severity: normal
stage: patch review
status: open
title: Quadratic complexity in the parsing of re replacement string
type: performance
versions: Python 2.7, Python 3.3, Python 3.4
Added file: http://bugs.python.org/file32315/re_parse_template.patch

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue19365>
_______________________________________


More information about the Python-bugs-list mailing list