using re: hitting recursion limit
Oliver Fromme
olli at haluter.fromme.com
Wed Oct 27 08:47:13 EDT 2004
Erik Johnson <spam at nospam.org> wrote:
> I have done a fair amount of regular expression text processing in Perl,
> and am currently trying to convert a running Perl script into Python (for a
> number of reasons I won't go into here). I have not had any problems with
> memory limits using Perl, but in trying to clip out a particular table from
> a web page, I am hitting Python's recursion limit.
>
> The RE is pretty simple:
>
> pat = '(<table.*?%s.*?</table>)' % magic_string
Maybe the most efficient way is not to use REs at all.
For example, one way would be to split the whole page on
the string "<table", strip off the closing "</table>" and
any stuff behind it, then look for the one containing your
magic_string.
    # Skip the text before the first "<table"; keep each chunk up to its "</table>".
    tables = [t.split("</table>", 1)[0] for t in page.split("<table")[1:]]
    for t in tables:
        if magic_string in t:
            print(t)
There's one disadvantage: If you have nested tables, this
approach will only handle the innermost tables correctly
(i.e. those which don't contain further tables).
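If nested tables do matter, one possible workaround is to scan only for the opening and closing table tags and keep a stack of the tables that are still open. The following is a rough sketch, not a tested solution for real-world pages: the names TAG and find_tables are made up for illustration, and it assumes the tags are not hidden inside comments or scripts.

```python
import re

# Matches either the start of an opening "<table" tag or a closing "</table>" tag.
TAG = re.compile(r"<table\b|</table\s*>", re.IGNORECASE)

def find_tables(page, magic_string):
    """Return every complete <table>...</table> block containing magic_string.

    Blocks are returned in the order their closing tags appear, so an
    inner table precedes the table that encloses it.
    """
    stack = []   # start offsets of tables that are still open
    found = []
    for m in TAG.finditer(page):
        if m.group().startswith("</"):
            if stack:  # ignore a stray closing tag
                start = stack.pop()
                block = page[start:m.end()]
                if magic_string in block:
                    found.append(block)
        else:
            stack.append(m.start())
    return found
```

Unlike the split approach, this keeps the "<table ...>" and "</table>" tags in the result, and the first element of the returned list is the innermost table containing magic_string.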
Best regards
Oliver
--
Oliver Fromme, Konrad-Celtis-Str. 72, 81369 Munich, Germany
``All that we see or seem is just a dream within a dream.''
(E. A. Poe)