[Tutor] re module fails to handle text > 16142 characters ???

Kristoffer Erlandsson krier115@student.liu.se
Tue May 20 03:56:34 2003


On Tue, May 20, 2003 at 12:43:16AM -0500, pan@uchicago.edu wrote:
> The other day I was using re to do some parsing and 
> found that one of the texts that I tried to parse raised
> an error:
> 
>    return reObj.findall(doc)[0]
>    RuntimeError: maximum recursion limit exceeded
> 
> After some painful debugging, I found:
> 
> let S = len(doc)
> 
> 1. When S > 16143: re.find function failed;
> 2. When S = 16143: failed too
> 3. When S = 16142: successful. This was tested by deleting ANY
>    character in that 16143-long doc.
> 
> I am using Python 2.2.1. Is this a known-bug???
> 
> I would really like to post the entire code here but it's very 
> long, so I think it's better to ask first to see if it's a known
> bug.
> 
> pan

I have run into this too on large files. I think it occurs when the
quantifiers (*, + and so on) match huge stretches of text. It seems that
the internals of the re module use recursion somewhere when matching
these, so when you match against chunks of text that are too large you
recurse too deep and hit a limit that is there to keep you from running
out of stack space. I'm not sure about this, but that is how it looks to
me. If I'm right, the solutions are to either increase the maximum
recursion depth limit (no idea how to do that though :), split your text
into smaller pieces, or make your quantifiers match less somehow.
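
As a rough sketch of the last two suggestions (smaller pieces, and
quantifiers that match less), something like the following might work.
The pattern and the function name find_in_pieces are made up for
illustration, since the original pattern was not posted, and the sketch
assumes that no match ever needs to span more than one line:

```python
import re

# Hypothetical pattern; the real one from the original post is unknown.
# The negated character class [^<]* stops at the next "<", so each match
# stays short instead of a quantifier running across a large span of text.
TAG = re.compile(r"<b>([^<]*)</b>")

def find_in_pieces(doc, pattern=TAG):
    """Apply the pattern one line at a time rather than to the whole text.

    Assumes a match never crosses a line boundary.
    """
    results = []
    for line in doc.splitlines():
        results.extend(pattern.findall(line))
    return results

# Build a document much longer than the 16143 characters from the report.
doc = "\n".join("row %d <b>item%d</b>" % (i, i) for i in range(5000))
matches = find_in_pieces(doc)
```

If matches can cross line boundaries, the text would have to be split on
some other separator that is known never to fall inside a match.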

Hope it helps a bit at least :)

-- 
Kristoffer Erlandsson
E-mail:  krier115@student.liu.se
ICQ#:    378225