[Tutor] Issues Parsing XML

Moos Heintzen iwasroot at gmail.com
Fri Mar 13 01:54:28 CET 2009


So you want one line for each <finding> element? Easy:

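# Get all FINDING elements (XML tag names are case-sensitive)
findings = domDatasource.getElementsByTagName('FINDING')

# Get the text of all direct child elements in each FINDING.
# That's assuming every child element has a TEXT_NODE as its first child.
# Whitespace between tags shows up as text nodes too, so skip those.
lines = []
for finding in findings:
    lines.append([f.firstChild.data for f in finding.childNodes
                  if f.nodeType == f.ELEMENT_NODE])

# Print one comma-separated line per FINDING
for line in lines:
    print(", ".join(line))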

I'm not sure how you want to deal with newlines inside the text. You
can escape them as \n in the output, or you could look at the csv
module, which quotes fields that contain commas or line breaks.
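
As a rough sketch of the csv route, assuming Python 3: the sample rows
and the rows_to_csv helper below are made up for illustration and stand
in for the lines list built earlier.

```python
import csv
import io

# Hypothetical sample rows standing in for the `lines` list;
# one field contains both a comma and a newline.
sample_lines = [
    ["101", "Weak password,\nsee policy", "High"],
    ["102", "Open port", "Low"],
]

def rows_to_csv(rows):
    """Return the rows as CSV text; csv.writer quotes fields when needed."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()

csv_text = rows_to_csv(sample_lines)
print(csv_text)
```

Reading csv_text back with csv.reader should give the original fields,
embedded newline included, so nothing has to be escaped by hand.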

Note that this doesn't deal with missing elements. I found that some
<finding> elements have 7 children and others have 9. You might be able
to insert two empty strings into the lines of length 7, as long as you
know which positions are missing.
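
A minimal sketch of the padding idea, under the assumption that the two
missing elements are always the last ones (if they are not, the columns
will not line up). The pad_row helper and the sample rows are
hypothetical.

```python
def pad_row(row, width, fill=""):
    """Return a copy of row with fill values appended up to width items."""
    return row + [fill] * (width - len(row))

# Hypothetical rows: one complete (9 fields), one short (7 fields)
sample_lines = [
    ["a", "b", "c", "d", "e", "f", "g", "h", "i"],
    ["a", "b", "c", "d", "e", "f", "g"],
]

# Pad every row to the length of the longest one
width = max(len(row) for row in sample_lines)
padded = [pad_row(row, width) for row in sample_lines]

for row in padded:
    print(", ".join(row))
```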

Or, if you want to have more control, you can make a dictionary for
each <finding> whose keys are all the available tag names, and for each
element found in the <finding>, store its text in the dictionary (if
it's a valid tag name).

Then you have a list of dictionaries, and you can print the elements
in any order you want. Missing elements will have empty strings as
values.
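
Here is one way the dictionary approach could look with minidom. The
sample XML, the tag names in TAG_NAMES and the finding_to_dict helper
are all invented for illustration; substitute the real tag names from
your file.

```python
from xml.dom import minidom

# Hypothetical sample data and tag names, for illustration only
SAMPLE_XML = """<REPORT>
  <FINDING><ID>1</ID><TITLE>Weak password</TITLE><SEVERITY>High</SEVERITY></FINDING>
  <FINDING><ID>2</ID><SEVERITY>Low</SEVERITY></FINDING>
</REPORT>"""
TAG_NAMES = ["ID", "TITLE", "SEVERITY"]

def finding_to_dict(finding, tag_names):
    """Map each known tag name to its text, defaulting to an empty string."""
    record = dict.fromkeys(tag_names, "")
    for child in finding.childNodes:
        # Skip whitespace text nodes and tags we do not know about
        if child.nodeType != child.ELEMENT_NODE or child.tagName not in record:
            continue
        # An empty element such as <TITLE/> has no text node to read
        first = child.firstChild
        if first is not None and first.nodeType == first.TEXT_NODE:
            record[child.tagName] = first.data
    return record

dom = minidom.parseString(SAMPLE_XML)
records = [finding_to_dict(f, TAG_NAMES)
           for f in dom.getElementsByTagName("FINDING")]

# Print the fields in whatever order is wanted
for record in records:
    print(", ".join(record[name] for name in ["SEVERITY", "ID", "TITLE"]))
```

Because every dictionary starts out with all the keys, the second
FINDING above still has a TITLE entry; it is just an empty string.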

Moos

