RegExp Help

Sean DiZazzo half.italian at gmail.com
Fri Dec 14 04:06:21 EST 2007


On Dec 14, 12:04 am, Marc 'BlackJack' Rintsch <bj_... at gmx.net> wrote:
> On Thu, 13 Dec 2007 17:49:20 -0800, Sean DiZazzo wrote:
> > I'm wrapping up a command line util that returns xml in Python.  The
> > util is flaky, and gives me back poorly formed xml with different
> > problems in different cases.  Anyway I'm making progress.  I'm not
> > very good at regular expressions though and was wondering if someone
> > could help with initially splitting the tags from the stdout returned
> > from the util.
>
> > [...]
>
> > Can anyone help me?
>
> Flaky XML is often produced by programs that treat XML as ordinary text
> files. If you are starting to parse XML with regular expressions you are
> making the very same mistake.  XML may look somewhat simple but
> producing correct XML and parsing it isn't.  Sooner or later you stumble
> across something that breaks producing or parsing the "naive" way.
>
> Ciao,
>         Marc 'BlackJack' Rintsch

The XML isn't really complicated so far, just tags with attributes.
Still, different queries against the program sometimes return
differently formed tags.  A few examples:

<id 123456 />
<tag name="foo" />
<tag2 name="foo" moreattrs="..." /tag2>
<tag3 name="foo" moreattrs="..." tag3/>

The output is at least consistent in that a given query always returns
the same tag style.  It arrives on stdout mixed with some extra,
useless information, so the original question was how to get at just
the tags.  After extracting the tags, I run them through some
functions to fix them, and then use ElementTree to parse them and
get the rest of the info.
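
A minimal sketch of that pipeline (extract, fix, parse), not the actual
code.  It assumes every tag sits between a "<" and the next ">", that no
attribute value contains "<" or ">", and that only the four tag styles
shown above occur.  The function names, the sample noise lines, and the
"value" attribute used for the bare 123456 are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# One tag per match: "<", a tag name, then anything up to the next ">".
TAG_RE = re.compile(r"<\w+[^<>]*>")


def extract_tags(stdout_text):
    """Return every <...> chunk found in the utility's raw output."""
    return TAG_RE.findall(stdout_text)


def fix_tag(raw):
    """Turn one of the four observed tag styles into a well-formed empty element."""
    name = re.match(r"<(\w+)", raw).group(1)
    # Strip the leading "<name" and the trailing ">" to isolate the body.
    body = raw[len(name) + 1:-1].strip()
    # Drop the closing variants seen so far: "/", "/name" and "name/".
    closer = r"(?:/\s*%s|%s\s*/|/)$" % (re.escape(name), re.escape(name))
    body = re.sub(closer, "", body).strip()
    # A bare value such as 123456 becomes value="123456".
    # The attribute name "value" is an assumption, not something the util emits.
    body = re.sub(r'^([^\s="]+)$', r'value="\1"', body)
    if body:
        return "<%s %s />" % (name, body)
    return "<%s />" % name


def parse_output(stdout_text):
    """Extract, repair and parse every tag in the raw output."""
    return [ET.fromstring(fix_tag(tag)) for tag in extract_tags(stdout_text)]


# Hypothetical stdout: the four tag styles above surrounded by noise lines.
sample = '''Connecting...
<id 123456 />
<tag name="foo" />
<tag2 name="foo" moreattrs="..." /tag2>
<tag3 name="foo" moreattrs="..." tag3/>
done
'''

for elem in parse_output(sample):
    print(elem.tag, elem.attrib)
```

Each repaired tag is parsed on its own, so one tag in a style the sketch
does not know about raises a ParseError for that tag only instead of
spoiling the whole output.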

There is no API, so this is what I have to work with.  Is there a
better solution?

Thanks for your ideas.

~Sean


