RegExp Help

Sean DiZazzo half.italian at gmail.com
Thu Dec 13 20:49:20 EST 2007


Hi group,

I'm wrapping up a command-line util in Python that returns XML.  The
util is flaky and gives me back poorly formed XML, with different
problems in different cases.  Anyway, I'm making progress.  I'm not
very good at regular expressions, though, and was wondering if someone
could help with initially splitting the tags out of the stdout returned
from the util.

I have the following example string, and am simply trying to split it
into two xml tags...

simplified = """2007-12-13 <tag1 attr1="text1" attr2="text2" /tag1>
\n2007-12-13 <tag2 attr1="text1" attr2="text2" attr3="text3\n" /tag2>
\n"""

Basically I want the two tags, and to discard anything in between
using a reg exp.  Like this:

tags = ['<tag1 attr1="text1" attr2="text2" /tag1>',
        '<tag2 attr1="text1" attr2="text2" attr3="text3\n" /tag2>']

I've tried several approaches, some of which got close, but the
newline in the middle of one of the tags screwed it up.  The closest
I've been is something like this:

import re
retag = re.compile(r'<.+>*')  # tried here with re.DOTALL as well
tags = retag.findall(simplified)

Can anyone help me?

~Sean
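
A possible direction, offered as a sketch and not a definitive answer: in
the pattern above, '.' does not match a newline by default, so the second
tag gets cut off at the embedded newline.  With re.DOTALL the greedy '.+'
runs from the first '<' to the end of the string.  A negated character
class such as [^>]+ avoids both problems, because it matches newlines but
stops at the first '>'.  This assumes no attribute value ever contains a
literal '>', which the flaky util may not guarantee.

```python
import re

# Same sample string as in the post above.
simplified = """2007-12-13 <tag1 attr1="text1" attr2="text2" /tag1>
\n2007-12-13 <tag2 attr1="text1" attr2="text2" attr3="text3\n" /tag2>
\n"""

# '<', then one or more characters that are not '>' (newlines included),
# then the closing '>'.  Assumption: '>' never appears inside an attribute.
retag = re.compile(r'<[^>]+>')
tags = retag.findall(simplified)

for tag in tags:
    print(repr(tag))
```

Everything between the matches (the dates and the stray newlines) is
simply never captured, so there is no separate discard step.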



