Expat XML Parser
Martin von Loewis
loewis at informatik.hu-berlin.de
Thu Nov 29 11:32:07 EST 2001
"Richard Boardman" <rpb at soton.ac.uk> writes:
> The major problem is that I can't seem to return any of these values at
> all - they will all print on the screen, but I can't actually *do* anything
> with these values. I don't think it's anything to do with Expat; more my
> lack of experience with this language. I can't find any documentation
> explaining how Expat works.
Expat works in an event-driven manner: For each chunk of the XML
document, it invokes a function passing the data it has read. Those
functions don't return anything (their return value is ignored); they
must do all processing before they return.
That processing could be to print the contents out, or it could be to
set some global variables to some values, for later inspection.
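A minimal sketch of the "set some global variables" approach, using the xml.parsers.expat module. The sample document, the list name "collected" and the handler name are made up for illustration:

```python
import xml.parsers.expat

collected = []  # filled in by the handler while the document is being parsed

def handle_char_data(data):
    # Expat ignores whatever the handler returns, so store the data instead.
    collected.append(data)

parser = xml.parsers.expat.ParserCreate()
parser.CharacterDataHandler = handle_char_data
# The second argument tells Expat that this is the final chunk of input.
parser.Parse("<doc><abcdefg>hello</abcdefg></doc>", True)

# Expat may deliver the text in several pieces, so join them afterwards.
text = "".join(collected)
```

After Parse returns, the handlers have already run, and "text" can be inspected like any other variable.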
> What I'd like to do is have something that works thus:
>
> readInXML
That is the source of confusion. In event-driven XML processing, there
is no separate "read-in"-step. The document is processed while being
read; once reading is complete, the processing must be finished as well.
What you want is that reading returns some data structure to inspect.
For that, I recommend using the DOM. To read the document, do
    import xml.dom.minidom
    # the argument is a filename or an open file object
    document = xml.dom.minidom.parse(filename_or_file)
> foreach element in XML
> if element = "abcdefg" {
This is written as
    for element in document.getElementsByTagName("abcdefg"):
> getCharacterData
This is more tricky: the content of element may be other elements; or
it may be multiple text nodes (e.g. resulting from CDATA sections):
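        # requires "from xml.dom import Node" at the top of the script
        chardata = ""
        for child in element.childNodes:
            # collect plain text and CDATA sections, skip everything else
            if child.nodeType in [Node.TEXT_NODE, Node.CDATA_SECTION_NODE]:
                chardata += child.data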
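If you know there are no CDATA sections, comments, processing
instructions etc. inside the text, you could also invoke
element.normalize() first. That merges adjacent text nodes, so the
text ends up in a single child node (element.firstChild.data).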
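A small sketch of what normalize() does, assuming an element that contains nothing but text. The second text node is appended by hand here only to simulate text that arrived in two pieces; the sample document is made up:

```python
import xml.dom.minidom

document = xml.dom.minidom.parseString("<doc><abcdefg>some</abcdefg></doc>")
element = document.getElementsByTagName("abcdefg")[0]

# Simulate text that is split over two adjacent text nodes.
element.appendChild(document.createTextNode(" text"))
pieces_before = len(element.childNodes)

# normalize() merges adjacent text nodes into one.
element.normalize()
pieces_after = len(element.childNodes)

chardata = element.firstChild.data
```

This shortcut only works if the element really has no other kinds of children; otherwise firstChild is just the first piece of text.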
> doStuff with characterData
        doStuff(chardata)
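Put together, the whole thing might look like the following sketch. It parses from a string instead of a file so that it runs on its own; the sample document and the functions do_stuff and get_character_data are placeholders invented for this example:

```python
import xml.dom.minidom
from xml.dom import Node

# Made-up sample document: one element mixes plain text and a CDATA section.
SAMPLE = (
    "<doc>"
    "<abcdefg>one <![CDATA[<two>]]></abcdefg>"
    "<other>ignored</other>"
    "<abcdefg>three</abcdefg>"
    "</doc>"
)

def do_stuff(chardata):
    # Placeholder for the real processing; here it just upper-cases the text.
    return chardata.upper()

def get_character_data(element):
    # Concatenate the text and CDATA children, skipping other node types.
    chardata = ""
    for child in element.childNodes:
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
            chardata += child.data
    return chardata

document = xml.dom.minidom.parseString(SAMPLE)
results = []
for element in document.getElementsByTagName("abcdefg"):
    results.append(do_stuff(get_character_data(element)))
```

Unlike the Expat handlers, these functions can return values, because the complete document is already in memory when they are called.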
HTH,
Martin