[issue19329] Faster compiling of charset regexpes

Serhiy Storchaka report at bugs.python.org
Thu Oct 24 21:24:59 CEST 2013


Serhiy Storchaka added the comment:

Here is a more complex patch that optimizes charset compilation. It affects small charsets too. Big charsets now support the same optimizations as small charsets, and the optimized bitmap can now be used even if the charset contains category items or non-BMP characters.
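
For readers unfamiliar with the bitmap idea, here is a minimal sketch of how a charset restricted to code points below 256 can be folded into a single 256-bit membership mask. This is only an illustration, not the code from the patch: the names charset_bitmap and in_bitmap are hypothetical, and the real compiler emits opcode words rather than a Python int.

```python
def charset_bitmap(items):
    """Build a 256-bit membership mask (illustrative helper, not from the patch).

    items: iterable of single characters or (lo, hi) inclusive ranges.
    """
    bits = 0
    for item in items:
        if isinstance(item, tuple):
            lo, hi = (ord(c) for c in item)
        else:
            lo = hi = ord(item)
        if hi > 255:
            raise ValueError("code point outside the 256-entry bitmap")
        for cp in range(lo, hi + 1):
            bits |= 1 << cp  # set the bit for this code point
    return bits


def in_bitmap(bits, ch):
    """Test membership with one shift and mask instead of scanning ranges."""
    cp = ord(ch)
    return cp < 256 and bool((bits >> cp) & 1)


# Equivalent of the charset in '[0-9A-Za-z_]'
word = charset_bitmap([("0", "9"), ("A", "Z"), ("a", "z"), "_"])
```

With such a mask, in_bitmap(word, "a") is true and in_bitmap(word, "-") is false. Characters outside the first 256 code points need a different representation, which is where the big-charset handling comes in.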

$ ./python -m timeit "from sre_compile import compile; r = '[0-9]+'"  "compile(r, 0)"
Unpatched: 1000 loops, best of 3: 457 usec per loop
Patched: 1000 loops, best of 3: 368 usec per loop
$ ./python -m timeit "from sre_compile import compile; r = '[ \t\n\r\v\f]+'"  "compile(r, 0)"
Unpatched: 1000 loops, best of 3: 490 usec per loop
Patched: 1000 loops, best of 3: 413 usec per loop
$ ./python -m timeit "from sre_compile import compile; r = '[0-9A-Za-z_]+'"  "compile(r, 0)"
Unpatched: 1000 loops, best of 3: 760 usec per loop
Patched: 1000 loops, best of 3: 527 usec per loop
$ ./python -m timeit "from sre_compile import compile; r = r'[^\ud800-\udfff]*'"  "compile(r, 0)"
Unpatched: 100 loops, best of 3: 2.07 msec per loop
Patched: 1000 loops, best of 3: 1.44 msec per loop
$ ./python -m timeit "from sre_compile import compile; r = '[\u0410-\u042f\u0430-\u043f\u0404\u0406\u0407\u0454\u0456\u0457\u0490\u0491]+'"  "compile(r, 0)"
Unpatched: 100 loops, best of 3: 8.24 msec per loop
Patched: 100 loops, best of 3: 2.13 msec per loop
$ ./python -m timeit "from sre_compile import compile; r = '[%s]' % ''.join(map(chr, range(256, 2**16, 255)))"  "compile(r, 0)"
Unpatched: 10 loops, best of 3: 119 msec per loop
Patched: 10 loops, best of 3: 24.1 msec per loop
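
The measurements above can also be approximated from a script. The sketch below is an assumption-laden variant, not the exact benchmark: it goes through the public re.compile() and calls re.purge() before each compile so the pattern cache is not what gets measured, instead of importing sre_compile directly. The helper name time_compile is hypothetical, and absolute numbers will differ by machine and Python version.

```python
import re
import timeit


def time_compile(pattern, number=100, repeat=3):
    """Return the best-of-`repeat` compile time in usec per loop."""
    def run():
        re.purge()           # drop the compiled-pattern cache first
        re.compile(pattern)  # parse and compile from scratch

    best = min(timeit.repeat(run, number=number, repeat=repeat))
    return best / number * 1e6


# A few of the patterns from the timings above
patterns = ['[0-9]+', '[ \t\n\r\v\f]+', '[0-9A-Za-z_]+']

for p in patterns:
    print('%r: %.1f usec per loop' % (p, time_compile(p)))
```

Run it once with an unpatched build and once with a patched build to compare. The cost of re.purge() is included in each loop, so the figures are slightly higher than those from timing the compile call alone.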

----------
title: Faster compiling of big charset regexpes -> Faster compiling of charset regexpes
Added file: http://bugs.python.org/file32337/re_optimize_charset.patch

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue19329>
_______________________________________

