using re: hitting recursion limit

Oliver Fromme olli at haluter.fromme.com
Wed Oct 27 08:47:13 EDT 2004


Erik Johnson <spam at nospam.org> wrote:
 >    I have done a fair amount of regular expression text processing in Perl,
 > and am currently trying to convert a running Perl script into Python (for a
 > number of reasons I won't go into here). I have not had any problems with
 > memory limits using Perl, but in trying to clip out a particular table from
 > a web page, I am hitting Python's recursion limit.
 > 
 > The RE is pretty simple:
 > 
 > pat = '(<table.*?%s.*?</table>)' % magic_string

Maybe the most efficient way is not to use REs at all.

For example, one way would be to split the whole page on
the string "<table", cut each piece at the closing "</table>"
(dropping anything after it), then look for the piece
containing your magic_string.

   # Skip the first chunk: it is the text before the first "<table".
   tables = [t.split("</table>", 1)[0] for t in page.split("<table")[1:]]
   for t in tables:
       if magic_string in t:
           print(t)

There's one disadvantage:  If you have nested tables, this
approach will only handle the innermost tables correctly
(i.e. those which don't contain further tables).
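
If nested tables do matter, one possible way around that is to scan
for the opening and closing tags and keep a stack of the tables that
are still open. The following is only a rough sketch, not a real HTML
parser: the names find_tables and tables_containing are made up for
illustration, matching is case-sensitive, and anything starting with
"<table" is treated as an opening tag.

```python
def find_tables(page):
    """Return every complete <table>...</table> block, nested ones included."""
    open_tag, close_tag = "<table", "</table>"
    stack = []   # start offsets of tables that are still open
    found = []   # (start offset, text) of each completed table
    pos = 0
    while True:
        next_open = page.find(open_tag, pos)
        next_close = page.find(close_tag, pos)
        if next_close < 0:
            break                      # no closing tag left, stop scanning
        if 0 <= next_open < next_close:
            # An opening tag comes first: remember where it starts.
            stack.append(next_open)
            pos = next_open + len(open_tag)
        else:
            # A closing tag comes first: it closes the innermost open table.
            end = next_close + len(close_tag)
            if stack:                  # ignore a stray </table>
                start = stack.pop()
                found.append((start, page[start:end]))
            pos = end
    found.sort()                       # document order, by start offset
    return [text for start, text in found]


def tables_containing(page, magic_string):
    """Return the tables containing magic_string, smallest (innermost) first."""
    hits = [t for t in find_tables(page) if magic_string in t]
    hits.sort(key=len)
    return hits


# Hypothetical sample page: one table nested in another, plus a sibling.
sample = ('<html><table id="outer"><tr><td>'
          '<table id="inner"><tr><td>MAGIC</td></tr></table>'
          '</td></tr></table><table id="other">x</table></html>')

hits = tables_containing(sample, "MAGIC")
if hits:
    print(hits[0])
```

The first hit is the innermost table around the magic string, and the
enclosing tables follow it. Unlike the split() version, the results
keep their "<table" and "</table>" tags.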

Best regards
   Oliver

-- 
Oliver Fromme, Konrad-Celtis-Str. 72, 81369 Munich, Germany

``All that we see or seem is just a dream within a dream.''
(E. A. Poe)


