Is there a maximum length of a regular expression in python?

Fredrik Lundh fredrik at pythonware.com
Wed Jan 18 09:14:52 EST 2006


olekristianvillabo at gmail.com wrote:

> I have a regular expression that is approximately 100k bytes. (It is
> basically a list of all known Norwegian postal numbers and the
> corresponding place with | in between.) I know this is not the intended
> use for regular expressions, but it should nonetheless work.
>
> the pattern is
> ur'(N-|NO-)?(5259 HJELLESTAD|4026 STAVANGER|4027 STAVANGER........|8305
> SVOLVÆR)'
>
> The error message I get is:
> RuntimeError: internal error in regular expression engine

you're most likely exceeding the allowed code size (usually 64k).
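
if you want to check whether a given pattern compiles on your interpreter, a small probe along these lines might help. this is only a sketch: try_compile is a made-up helper, and which exception an oversized pattern raises (RuntimeError in the report above) may differ between python versions and builds, so it catches a few candidates:

```python
import re

def try_compile(pattern):
    """Return (compiled_pattern, None) on success or (None, exception) on failure."""
    try:
        return re.compile(pattern), None
    except (re.error, RuntimeError, OverflowError) as exc:
        # assumption: an oversized pattern surfaces as one of these exceptions
        return None, exc

compiled, problem = try_compile("(N-|NO-)?(5259 HJELLESTAD|4026 STAVANGER)")
if problem is not None:
    print("pattern rejected:", problem)
```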

however, putting all postal numbers in a single RE is a horrid abuse of the RE
engine.  why not just scan for "(N-|NO-)?(\d+)" and use a dictionary to check
if you have a valid match?

    postcodes = {
        "5259": "HJELLESTAD",
        ...
        "9999": "ØSTRE FJORDVIDDA",
    }

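    import re

    # match only the optional prefix and the digits; the dictionary
    # decides whether the number is valid
    for m in re.finditer(r"(N-|NO-)?(\d+) ", text):
        prefix, number = m.groups()
        try:
            place = postcodes[number]
        except KeyError:
            continue  # not a known postal number
        if not text.startswith(place, m.end()):
            continue  # number is not followed by its place name
        # got a match!
        print prefix, number, place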
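
the same approach as a self-contained sketch for python 3, wrapped in a function so it can be tested. the three-entry table and the names POSTCODES, POSTCODE_RE and find_postcodes are made up for illustration and are not part of the original post:

```python
import re

# hypothetical sample data; a real table would hold every postal number
POSTCODES = {
    "5259": "HJELLESTAD",
    "4026": "STAVANGER",
    "8305": "SVOLVÆR",
}

# optional N-/NO- prefix, then the digits, then a single space
POSTCODE_RE = re.compile(r"(N-|NO-)?(\d+) ")

def find_postcodes(text, postcodes=POSTCODES):
    """Return (prefix, number, place) for every valid postcode found in text."""
    found = []
    for m in POSTCODE_RE.finditer(text):
        prefix, number = m.groups()
        place = postcodes.get(number)
        if place is None:
            continue  # digits are not a known postal number
        if not text.startswith(place, m.end()):
            continue  # known number, but followed by the wrong place name
        found.append((prefix, number, place))
    return found

sample = "send to NO-4026 STAVANGER or 8305 SVOLVÆR, not 1234 NOWHERE or 4026 OSLO"
for hit in find_postcodes(sample):
    print(hit)
```

the regular expression stays tiny no matter how many postal numbers the table holds, since validation is a plain dictionary lookup.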

</F> 

More information about the Python-list mailing list