[issue1160] Medium size regexp crashes python

Fredrik Lundh report at bugs.python.org
Sun Sep 23 21:10:00 CEST 2007


Fredrik Lundh added the comment:

Well, I'm not sure 81k qualifies as "medium sized", really.  If you look
at the size distribution for typical RE:s (which are usually
handwritten, not machine generated), that's one or two orders of
magnitude larger than "medium".

(And even if this was guaranteed to work on all Python builds, my guess
is that performance would be pretty bad compared to using a minimal RE
and checking potential matches against a set.  The "|" operator is
mostly O(N), not O(1).)
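
A minimal sketch of the "minimal RE plus set" approach, assuming the large
pattern was a list of literal words joined with "|". The names KEYWORDS,
WORD_RE and find_keywords are hypothetical and only serve the illustration;
the actual pattern from the report is not reproduced here.

```python
import re

# Hypothetical word list; in the reported case this would hold the many
# alternatives that were joined with "|" into one large pattern.
KEYWORDS = {"spam", "eggs", "bacon"}

# Minimal RE: match any candidate word without listing the alternatives.
WORD_RE = re.compile(r"\w+")


def find_keywords(text, keywords=KEYWORDS):
    """Return the words in text that are members of keywords, in order."""
    # Set membership is a hash lookup, so the cost per candidate does not
    # grow with the number of keywords.
    return [m.group() for m in WORD_RE.finditer(text) if m.group() in keywords]
```

For example, find_keywords("spam and eggs, no ham") returns ['spam', 'eggs'].
The compiled pattern stays tiny however many keywords there are.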

As for fixing this, the "byte code" used by the RE engine uses a word
size equal to the Unicode character size (sizeof(Py_UNICODE)) for the
given platform.  I don't think it would be that hard to set it to 32
bits also on platforms using 16-bit Unicode characters (if anyone would
like to experiment, just set SRE_CODE to "unsigned long" in sre.h and
see what happens when you run the test suite).
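
To illustrate why the word size matters, here is a rough sketch in Python
rather than C. It assumes that the failure comes from some value in the
compiled code (for example an offset on the order of the pattern size) not
fitting into a 16-bit code word; that is an assumption, not something
verified against the engine. The helper fits_in_code_word is hypothetical,
and the array type codes "H" and "L" merely stand in for "unsigned short"
and "unsigned long".

```python
from array import array


def fits_in_code_word(value, typecode):
    """Return True if value can be stored in one array item of typecode."""
    try:
        array(typecode, [value])
    except OverflowError:
        return False
    return True


# Illustrative value only: roughly the size of the 81k pattern.
PATTERN_SIZE = 81 * 1024

# "H" is unsigned short (16 bits on common platforms): too small.
print(fits_in_code_word(PATTERN_SIZE, "H"))
# "L" is unsigned long (at least 32 bits): large enough.
print(fits_in_code_word(PATTERN_SIZE, "L"))
```

On platforms where unsigned short is 16 bits, the first check fails because
the largest representable value is 65535, while the second succeeds.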

__________________________________
Tracker <report at bugs.python.org>
<http://bugs.python.org/issue1160>
__________________________________

