Python regex question

Tim van der Leeuw tnleeuw at gmail.com
Wed Jun 11 05:22:27 EDT 2008


Hi,

I'm trying to create a regular expression for matching some particular XML
strings. I want to extract the contents of a particular XML tag, but only if
it follows one tag and does not follow another. Complicating this, there can
be any number of other tags in between.

So basically, my regular expression should have 3 parts:
- first match
- any arbitrary text that does not contain the string '<Xds'
- second match

I have a problem figuring out how to do the second part: a random bit of
text, that should _not_ contain the substring '<Xds' ('<Xds' being the start
of any tags which should not be in between my first and second match).
Because of the variable length of the overall match, I cannot do this with a
negative look-behind assertion, and a negative look-ahead assertion doesn't
seem to work either.

The regular expression that I have now is:

r'(?s)<Xds\w*Policy>.*?<ref>(?P<pol_ref>\d+)</ref>'

(hopefully without typos)

Here '<Xds\w*Policy>' is my first match, and '<ref>(?P<pol_ref>\d+)</ref>'
is my second match.
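
To illustrate the problem, here is a minimal sketch that runs the expression
above against two made-up inputs. The tag names 'XdsFooPolicy', 'XdsBarRule'
and 'name' are hypothetical and only stand in for the real data.

```python
import re

# The expression exactly as given above.
CURRENT = re.compile(r'(?s)<Xds\w*Policy>.*?<ref>(?P<pol_ref>\d+)</ref>')

# Hypothetical sample data (tag names invented for illustration).
# Case 1: <ref> follows the policy tag with no '<Xds' tag in between.
wanted = '<XdsFooPolicy><name>a</name><ref>123</ref></XdsFooPolicy>'
# Case 2: another '<Xds...' tag sits between the policy tag and <ref>.
unwanted = ('<XdsFooPolicy><name>a</name></XdsFooPolicy>'
            '<XdsBarRule><ref>456</ref></XdsBarRule>')

m_wanted = CURRENT.search(wanted)
m_unwanted = CURRENT.search(unwanted)

print(m_wanted.group('pol_ref'))    # prints 123 (the match I want)
print(m_unwanted.group('pol_ref'))  # prints 456 (the match I do NOT want)
```

The second result shows '.*?' running straight across '<XdsBarRule>', which
is the behaviour I need to prevent.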

In this expression, I want to replace the generic '.*?', which matches
everything, with something that matches any string that does not include
the substring '<Xds'.
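
One possible shape for such a replacement, given only as an unverified sketch
against made-up data: a single negative look-ahead placed in front of '.*?'
checks just one position, so the idea here is to repeat the look-ahead before
every character that gets consumed, as in '(?:(?!<Xds).)*?'. The sample tag
names ('XdsFooPolicy', 'XdsBarRule', 'name') are hypothetical, and whether
this holds up on the real XML is an assumption.

```python
import re

# Same first and second match as before; only the middle part changes.
# (?:(?!<Xds).)*? consumes one character at a time, and before each one
# asserts that the text at that position does not start with '<Xds'.
TEMPERED = re.compile(
    r'(?s)<Xds\w*Policy>(?:(?!<Xds).)*?<ref>(?P<pol_ref>\d+)</ref>'
)

# Hypothetical sample data (tag names invented for illustration).
ok_text = '<XdsFooPolicy><name>a</name><ref>123</ref></XdsFooPolicy>'
bad_text = ('<XdsFooPolicy><name>a</name></XdsFooPolicy>'
            '<XdsBarRule><ref>456</ref></XdsBarRule>')

ok_match = TEMPERED.search(ok_text)
bad_match = TEMPERED.search(bad_text)

print(ok_match.group('pol_ref'))  # prints 123
print(bad_match)                  # prints None: '<XdsBarRule>' blocks the match
```

Note that a closing tag such as '</XdsFooPolicy>' starts with '</Xds', not
'<Xds', so it does not trip the look-ahead in this sketch.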

I know that I could capture the text matched by '.*?' and manually check if
it contains that string '<Xds', but that would be very hard to fit into the
rest of the code, for a number of reasons.

Does anyone have an idea how to do this within one regular expression?

Regards,

--Tim