RegExp Help

Sean DiZazzo half.italian at gmail.com
Thu Dec 13 21:04:00 EST 2007


On Dec 13, 5:49 pm, Sean DiZazzo <half.ital... at gmail.com> wrote:
> Hi group,
>
> I'm wrapping up a command line util that returns xml in Python.  The
> util is flaky, and gives me back poorly formed xml with different
> problems in different cases.  Anyway I'm making progress.  I'm not
> very good at regular expressions though and was wondering if someone
> could help with initially splitting the tags from the stdout returned
> from the util.
>
> I have the following example string, and am simply trying to split it
> into two xml tags...
>
> simplified = """2007-12-13 <tag1 attr1="text1" attr2="text2" /tag1>
> \n2007-12-13 <tag2 attr1="text1" attr2="text2" attr3="text3\n" /tag2>
> \n"""
>
> Basically I want the two tags, and to discard anything in between
> using a reg exp.  Like this:
>
> tags = ["<tag1 attr1="text1" attr2="text2" /tag1>", "<tag2
> attr1="text1" attr2="text2" attr3="text3\n" /tag2>"]
>
> I've tried several approaches, some of which got close, but the
> newline in the middle of one of the tags screwed it up.  The closest
> I've been is something like this:
>
> retag = re.compile(r'<.+>*') # tried here with re.DOTALL as well
> tags = retag.findall(simplified)
>
> Can anyone help me?
>
> ~Sean

I found something that works, although I couldn't tell you why it
works.  :)

retag = re.compile(r'<.+?>', re.DOTALL)
tags = retag.findall(simplified)
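
For anyone who wants to try it, here is a self-contained sketch of the snippet above, run against the example string from the original post. The string is written out with explicit newlines, and the names greedy_re and greedy_tags are only there for comparison; they are not part of the original code.

```python
import re

# Example string from the original post, with the newlines written explicitly.
simplified = (
    '2007-12-13 <tag1 attr1="text1" attr2="text2" /tag1>\n'
    '2007-12-13 <tag2 attr1="text1" attr2="text2" attr3="text3\n" /tag2>\n'
)

# Non-greedy version: ".+?" stops at the first ">" it can reach, and
# re.DOTALL lets "." cross the newline inside the second tag.
retag = re.compile(r'<.+?>', re.DOTALL)
tags = retag.findall(simplified)

# Greedy version for comparison (hypothetical names): ".+" runs on to the
# last ">" in the whole string.
greedy_re = re.compile(r'<.+>', re.DOTALL)
greedy_tags = greedy_re.findall(simplified)

for tag in tags:
    print(repr(tag))
print(len(tags), len(greedy_tags))
```

On this input the non-greedy pattern returns the two tags separately, while the greedy one returns a single match running from the first "<" to the last ">".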

Why does that work?

~Sean


