[XML-SIG] lost

Kai Hendry hendry@cs.helsinki.fi
Mon, 24 Mar 2003 11:42:25 +0200


I am just getting started with XML processing in Python, and now I need
to do something a little more complicated. I am staring at:
http://www.python.org/doc/current/lib/markup.html

And wondering what to use!

Preferably something within the default Python 2.2 distribution.

Here is what I would like to do to a piece of XML containing 'p?' tags (such as p and p2 below):

case 1:
<p div="sadasd">we are <b>co<it>mi</it>ng</b> along</p>
case 2:
<p2 div="sadasd">

        Trouble in
        <p>
                tinseltown
        </p>
        yada

</p2>

case 1:
I want to return:
we are coming along

Ignore the div attribute and grab all the words between the tags, no matter what other tags are nested inside.
It would be nice to know which tags are being ignored.

case 2:
Trouble in tinseltown yada

It ignores the nested p tag (although it would be nice to be notified).
So if case 1 and case 2 were both in one XML file, the procedure would return:
we are coming along\nTrouble in tinseltown yada
Or a couple of lists...
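
A minimal sketch of the behaviour described above, using only xml.dom.minidom from the standard library. Several things here are assumptions, not part of the original problem: that 'p?' means "p followed by optional digits", that both cases are wrapped in a hypothetical <doc> root element (two top-level elements in one file would not be well-formed XML), and that the target elements are direct children of that root. The names TARGET, collect_text and extract are made up for illustration, and the code is written for a current Python 3 rather than checked against 2.2.

```python
import re
from xml.dom import minidom

# Assumption: 'p?' means p, p2, p3, ...
TARGET = re.compile(r"p\d*$")

# Hypothetical <doc> root wrapping case 1 and case 2.
SAMPLE = """<doc>
<p div="sadasd">we are <b>co<it>mi</it>ng</b> along</p>
<p2 div="sadasd">

        Trouble in
        <p>
                tinseltown
        </p>
        yada

</p2>
</doc>"""


def collect_text(node, ignored):
    """Return all text below node; append nested tag names to ignored."""
    parts = []
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            parts.append(child.data)
        elif child.nodeType == child.ELEMENT_NODE:
            ignored.append(child.tagName)  # note which tag is skipped over
            parts.append(collect_text(child, ignored))
    return "".join(parts)


def extract(xml_string):
    """Return a list of (text, ignored_tags), one per top-level target."""
    doc = minidom.parseString(xml_string)
    results = []
    for child in doc.documentElement.childNodes:
        if child.nodeType == child.ELEMENT_NODE and TARGET.match(child.tagName):
            ignored = []
            # Collapse runs of whitespace, including newlines, to one space.
            text = " ".join(collect_text(child, ignored).split())
            results.append((text, ignored))
    return results


for text, ignored in extract(SAMPLE):
    print(text, ignored)
```

Attributes such as div are never touched because only child nodes are visited. Joining the first element of each pair with "\n" gives the single-string form; keeping the pairs gives the "couple of lists" form.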

I have been messing around with minidom, but as I intend to parse loads of XML,
should I look at SAX for speed? My minidom implementation used
'xmldoc.getElementsByTagName', but that does not seem to work once I have more
than one case.
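
For comparison, a rough SAX sketch of the same extraction. getElementsByTagName matches one exact tag name and also returns nested matches, so it does not map neatly onto "any p? tag, outermost only"; a SAX handler can track nesting depth instead and never builds a tree. Whether it is actually faster has not been measured here. Assumptions: 'p?' means "p followed by optional digits", the input is wrapped in a hypothetical <doc> root, and the names TextGrabber and extract_sax are invented for illustration. Written for a current Python 3.

```python
import re
import xml.sax

# Assumption: 'p?' means p, p2, p3, ...
TARGET = re.compile(r"p\d*$")


class TextGrabber(xml.sax.ContentHandler):
    """Collect text inside outermost target elements, noting nested tags."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.results = []  # list of (text, ignored_tags)
        self.depth = 0     # 0 means we are outside any target element
        self.parts = []
        self.ignored = []

    def startElement(self, name, attrs):
        if self.depth:
            # Already inside a target: record the nested tag, keep its text.
            self.ignored.append(name)
            self.depth += 1
        elif TARGET.match(name):
            # Entering an outermost target: start a fresh buffer.
            self.depth = 1
            self.parts = []
            self.ignored = []

    def characters(self, content):
        if self.depth:
            self.parts.append(content)

    def endElement(self, name):
        if self.depth:
            self.depth -= 1
            if self.depth == 0:
                # Leaving the target: normalise whitespace and store.
                text = " ".join("".join(self.parts).split())
                self.results.append((text, self.ignored))


def extract_sax(xml_string):
    """Return a list of (text, ignored_tags) for each outermost target."""
    handler = TextGrabber()
    xml.sax.parseString(xml_string.encode("utf-8"), handler)
    return handler.results
```

Unlike a lookup by a single tag name, the depth counter means a p nested inside a p2 is reported as an ignored tag rather than returned as a second result.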

Thank you in advance for any pointers,

Regards,
-Kai Hendry