advice modifying re library to support more than 100 named captures.

Tim Peters tim.peters at gmail.com
Tue May 16 20:50:22 EDT 2006


[Richard Meraz]
> We need to capture more than 99 named groups using python regular
> expressions.
> ...
> it's clear why the language designers have decided on this limitation.  For
> our system, however, it is essential that we be able to capture an arbitrary
> number of groups.
>
> Could anyone on the list suggest what parts of the library code make
> assumptions about this restriction? We'd like to make some local changes to
> the core library to allow us to continue the development of our system (we
> don't want to switch to another language). We removed the condition in
> sre_compile.py that raises an exception for compiled regexps with more than
> 100 groups.  This allowed us to compile a regular expression with more than
> 100 groups, but subsequent attempts to match or search with that regular
> expression resulted in segfaults.

Which is a good clue that you'll have to understand the C code
implementing regexps.  That's in Modules/_sre.c.  In the absence of
understanding, your best bet is to get in a debugger, see where it's
segfaulting, guess at the cause, try to fix it, and start over.

For a start, you'll certainly need to boost the value of this #define in sre.h:

#define SRE_MARK_SIZE 200

Sorry, but I have no idea whether you'll need more than just that.
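To see why that particular #define matters, here is a minimal C sketch of the sizing arithmetic. It assumes, as sre_compile.py appears to do when it emits MARK opcodes, that capturing group n (counting from 1) is bracketed by mark slots (n-1)*2 and (n-1)*2+1. Under that assumption N groups need 2*N slots, and 200 slots is exactly what 100 groups require. MAX_GROUPS, toy_state, mark_slot, required_mark_size and set_mark are hypothetical names for illustration only, not code from _sre.c, and the sketch says nothing about whatever else in _sre.c may also need changing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: how many capturing groups this build should support. */
#define MAX_GROUPS 500

/* SRE_MARK_SIZE is the real name from sre.h; 1000 is an assumed new value
   (the stock value is 200). */
#define SRE_MARK_SIZE 1000

/* Refuse to compile if the mark array cannot hold a start and an end mark
   for every group. With SRE_MARK_SIZE 200 this fails for MAX_GROUPS > 100. */
static_assert(SRE_MARK_SIZE >= 2 * MAX_GROUPS,
              "SRE_MARK_SIZE must hold two marks per capturing group");

/* Slot of the start (is_end == 0) or end (is_end != 0) mark of 1-based
   group n, using the assumed (n-1)*2 and (n-1)*2+1 numbering. */
static size_t mark_slot(size_t group, int is_end)
{
    return (group - 1) * 2 + (is_end ? 1 : 0);
}

/* Minimum mark-array length for ngroups capturing groups. */
static size_t required_mark_size(size_t ngroups)
{
    return 2 * ngroups;
}

/* Hypothetical stand-in for the fixed-size mark array in the matcher state. */
typedef struct {
    const char *mark[SRE_MARK_SIZE];
} toy_state;

/* Record a mark position. Returns -1 rather than writing past the end of
   the array; returns 0 on success. */
static int set_mark(toy_state *st, size_t slot, const char *ptr)
{
    if (slot >= SRE_MARK_SIZE)
        return -1;
    st->mark[slot] = ptr;
    return 0;
}
```

With SRE_MARK_SIZE left at 200, the static_assert stops compilation as soon as MAX_GROUPS exceeds 100. The bounds check in set_mark is a choice made for this sketch: an unchecked write past the end of a fixed-size mark array is one plausible way a pattern with too many groups could segfault, as described in the question.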


