Matching XML Tag Contents with Regex

garage xmikedavis at gmail.com
Tue Dec 11 11:41:30 EST 2007


> Is what I'm trying to do possible with Python's Regex library? Is
> there an error in my Regex?

Search for '*?' on http://docs.python.org/lib/re-syntax.html.

To get around the greedy single match, you can add a question mark
after the asterisk in the 'content' portion of the pattern.  This
makes it take the shortest match instead of the longest, e.g.

<%(tagName)s\s[^>]*>[.\n\r\w\s\d\D\S\W]*?[^(%(tagName)s)]*

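There's still some funkiness in the regex and logic (for instance,
[^(%(tagName)s)] is a character class that excludes individual
characters, not a negated group), but this gives you the three
matches.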
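
For what it's worth, here is a minimal sketch of the greedy vs.
non-greedy difference. The sample XML and the tag name 'item' are made
up for illustration (they are not from your post), and the pattern is
a simplified one rather than the regex above, so treat it as a rough
demonstration only.

```python
import re

# Hypothetical sample input; the original poster's XML is not shown here.
xml = (
    "<item id='1'>first</item>\n"
    "<item id='2'>second</item>\n"
    "<item id='3'>third</item>"
)

tag_name = "item"  # hypothetical tag name, substituted in as in the post

# Greedy: '.*' runs on to the LAST closing tag, so all three
# elements collapse into a single match.
greedy = re.compile(r"<%(t)s\b[^>]*>(.*)</%(t)s>" % {"t": tag_name}, re.DOTALL)

# Non-greedy: '.*?' stops at the FIRST closing tag it can,
# so each element is matched separately.
lazy = re.compile(r"<%(t)s\b[^>]*>(.*?)</%(t)s>" % {"t": tag_name}, re.DOTALL)

greedy_matches = greedy.findall(xml)
lazy_matches = lazy.findall(xml)

print(len(greedy_matches))  # 1
print(lazy_matches)         # ['first', 'second', 'third']
```

re.DOTALL lets '.' match newlines too, which stands in for the
catch-all character class in the pattern above. This won't cope with
nested tags of the same name; for anything beyond simple cases an XML
parser is probably the safer bet.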


