[issue19365] Quadratic complexity in the parsing of re replacement string
Serhiy Storchaka
report at bugs.python.org
Wed Oct 23 16:36:02 CEST 2013
New submission from Serhiy Storchaka:
sre_parse.parse_template uses string concatenation to accumulate characters.
def literal(literal, p=p, pappend=a):
if p and p[-1][0] is LITERAL:
p[-1] = LITERAL, p[-1][1] + literal
else:
pappend((LITERAL, literal))
This operation have quadratic calculation complexity for long replacement strings.
$ ./python -m timeit -n1 -r1 -s "from sre_parse import parse_template; repl = 'x'*100000" "parse_template(repl, '')"
1 loops, best of 1: 3.38 sec per loop
$ ./python -m timeit -n1 -r1 -s "from sre_parse import parse_template; repl = 'x'*200000" "parse_template(repl, '')"
1 loops, best of 1: 18.2 sec per loop
The proposed patch change amortized complexity to be linear. It also speeds up parsing shorter strings.
$ ./python -m timeit -n1 -r1 -s "from sre_parse import parse_template; repl = 'x'*100000" "parse_template(repl, '')"
1 loops, best of 1: 915 msec per loop
$ ./python -m timeit -n1 -r1 -s "from sre_parse import parse_template; repl = 'x'*200000" "parse_template(repl, '')"
1 loops, best of 1: 1.79 sec per loop
----------
components: Library (Lib), Regular Expressions
files: re_parse_template.patch
keywords: patch
messages: 201032
nosy: ezio.melotti, mrabarnett, pitrou, serhiy.storchaka
priority: normal
severity: normal
stage: patch review
status: open
title: Quadratic complexity in the parsing of re replacement string
type: performance
versions: Python 2.7, Python 3.3, Python 3.4
Added file: http://bugs.python.org/file32315/re_parse_template.patch
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue19365>
_______________________________________
More information about the Python-bugs-list
mailing list