re module non-greedy matches broken

André Malo auch-ich-m at g-kein-spam.com
Tue Apr 5 13:21:01 EDT 2005


* lothar wrote:

As already said by Georg, regexes are the wrong tool for such tasks, but
anyway...

> give an re to find every innermost "table" element:

<table(?:\s[^>]*)?>[^<]*(?:<(?!/table>|table(?:\s[^>]*)?>)[^<]*)*</table>
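
To see what that pattern does, here is a small sketch using Python's re module. The pattern is copied verbatim from above; the name INNERMOST_TABLE and the sample markup are made up for illustration. The negative lookahead refuses to step over any "<" that starts a "<table ...>" or "</table>" tag, so a match can only run from an opening tag to its closing tag when no other table tag sits in between.

```python
import re

# Pattern copied verbatim from the post; the constant name is only illustrative.
INNERMOST_TABLE = re.compile(
    r'<table(?:\s[^>]*)?>[^<]*(?:<(?!/table>|table(?:\s[^>]*)?>)[^<]*)*</table>'
)

# Made-up sample input: one table nested inside another.
sample = (
    '<table id="outer"><tr><td>'
    '<table id="inner"><tr><td>x</td></tr></table>'
    '</td></tr></table>'
)

# The outer table cannot match, because the scan stops at the nested "<table".
matches = INNERMOST_TABLE.findall(sample)
print(matches)  # ['<table id="inner"><tr><td>x</td></tr></table>']
```

Note that the pattern is case-sensitive as written; for markup such as "<TABLE>" you would have to compile it with re.IGNORECASE.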

> give an re to find every "pre" element directly followed by an "a"
> element:

<pre(?:\s[^>]*)?>[^<]*(?:<(?!/pre>|pre(?:\s[^>]*)?>)[^<]*)*</pre>(?=<a[\s>])
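
Again a small sketch with Python's re module, with the pattern copied verbatim from above. The name PRE_BEFORE_A and the sample markup are made up for illustration. The trailing lookahead (?=<a[\s>]) only checks that an "a" tag starts right after "</pre>"; it does not become part of the match.

```python
import re

# Pattern copied verbatim from the post; the constant name is only illustrative.
PRE_BEFORE_A = re.compile(
    r'<pre(?:\s[^>]*)?>[^<]*(?:<(?!/pre>|pre(?:\s[^>]*)?>)[^<]*)*</pre>(?=<a[\s>])'
)

# Made-up sample input: three "pre" elements, two of them directly before an "a".
sample = (
    '<pre>one</pre><a href="#x">link</a>'
    '<pre>two</pre> <p>text</p>'
    '<pre class="c">three</pre><a>z</a>'
)

found = PRE_BEFORE_A.findall(sample)
print(found)  # ['<pre>one</pre>', '<pre class="c">three</pre>']
```

"Directly followed" is taken literally here: if there is whitespace between "</pre>" and "<a", the lookahead fails and the element is skipped.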

They are written more generally than your samples require. Depending on the
data to be expected, they can be written a bit shorter, but this is left as
an exercise for the reader.

HTH, nd


