88k regex = RuntimeError

Peter Otten __peter__ at web.de
Tue Feb 14 05:58:38 EST 2006


jodawi wrote:

> I need to find a bunch of C function declarations by searching
> thousands of source or html files for thousands of known function
> names. My initial simple approach was to do this:
> 
> rxAllSupported = re.compile(r"\b(" + "|".join(gAllSupported) + r")\b")
> # giving a regex of   \b(AAFoo|ABFoo|   (uh... 88kb more...)   |zFoo)\b
> 
> for root, dirs, files in os.walk( ... ):
> ...
>     for fileName in files:
> ...
>         filePath = os.path.join(root, fileName)
>         file = open(filePath, "r")
>         contents = file.read()
> ...
>         result = re.search(rxAllSupported, contents)
> 
> but this happens:
> 
>     result = re.search(rxAllSupported, contents)
>   File "C:\Python24\Lib\sre.py", line 134, in search
>     return _compile(pattern, flags).search(string)
> RuntimeError: internal error in regular expression engine
> 
> I assume it's hitting some limit, but don't know where the limit is to
> remove it. I tried stepping into it repeatedly with Komodo, but didn't
> see the problem.
> 
> Suggestions?

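One workaround may be as simple as splitting each file into words with a tiny regex and checking every word against a set of the known names: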

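import re

wanted = set(["foo", "bar", "baz"])
file_content = "foo bar-baz ignored foo()"

# Split the text into identifier-like tokens and keep only the wanted ones.
r = re.compile(r"\w+")
found = [name for name in r.findall(file_content) if name in wanted]

print(found)  # ['foo', 'bar', 'baz', 'foo']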

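Applied to the file-walking loop from the question, the same idea might look like this. It is a minimal sketch, assuming Python 3 and that every function name consists only of word characters (true for C identifiers). ALL_SUPPORTED, names_in_text and find_supported_names are hypothetical names, and the small set merely stands in for gAllSupported.

```python
import os
import re

# Hypothetical stand-in for the question's gAllSupported list.
ALL_SUPPORTED = {"AAFoo", "ABFoo", "zFoo"}

# Matches one identifier-like token; C function names are all word characters.
WORD = re.compile(r"\w+")


def names_in_text(contents, wanted):
    """Return the subset of `wanted` that occurs as a whole word in `contents`."""
    # One set lookup per token replaces the single 88 KB alternation.
    return wanted.intersection(WORD.findall(contents))


def find_supported_names(top, wanted):
    """Map each file path under `top` to the wanted names it mentions."""
    hits = {}
    for root, dirs, files in os.walk(top):
        for file_name in files:
            file_path = os.path.join(root, file_name)
            # errors="ignore" skips undecodable bytes in HTML or legacy sources.
            with open(file_path, "r", errors="ignore") as f:
                contents = f.read()
            found = names_in_text(contents, wanted)
            if found:
                hits[file_path] = found
    return hits
```

Because set membership tests take constant time on average, the cost grows with the size of the files rather than with the number of names, and the regex stays small enough that the engine's limits never come into play. Calling find_supported_names(".", ALL_SUPPORTED) would return a dict from each matching path to the set of names found in it.
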
Peter

