python and parsing an xml file

Stefan Behnel stefan_ml at behnel.de
Mon Feb 21 12:43:16 EST 2011


Matt Funk, 21.02.2011 18:30:
> I want to create a set of xml input files to my code that look as follows:
> <?xml version="1.0" encoding="UTF-8"?>
>
> <!-- Settings for the algorithm to be performed -->
> <Algorithm>
>
>      <!-- The algorithm type. -->
>      <!-- The supported options are: -->
>      <!--     - Alg0 -->
>      <!--     - Alg1 -->
>      <Type>Alg1</Type>
>
>      <!-- the location/path of the input file for this algorithm -->
>      <path>./Alg1.in</path>
>
> </Algorithm>
>
>
> <!-- Relevant information during the processing will be written to a logfile -->
> <Logfile>
>
>      <!-- the location/path of the logfile (i.e. where to put the logfile) -->
>      <path>c:\tmp</path>
>
>      <!-- verbosity level (i.e. how much to print) -->
>      <!-- The supported options are: -->
>      <!--     - 0    (nothing printed) -->
>      <!--     - 1    (print on error) -->
>      <verbosity>1</verbosity>
>
> </Logfile>

That's not well-formed XML. An XML document has exactly one root element,
so you need a single enclosing element around these two.
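
A minimal sketch of the difference, using the standard library's
ElementTree. The enclosing element name "Config" is made up for this
example (any name will do), and the comments from your file are left out
for brevity.

```python
import xml.etree.ElementTree as ET

# Two top-level elements, as in the original file (comments left out).
TWO_ROOTS = """\
<Algorithm>
     <Type>Alg1</Type>
     <path>./Alg1.in</path>
</Algorithm>
<Logfile>
     <verbosity>1</verbosity>
</Logfile>
"""


def parses(text):
    """Return True if ElementTree accepts the text as a document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# "Config" is a hypothetical name for the enclosing root element.
WRAPPED = "<Config>\n" + TWO_ROOTS + "</Config>\n"

print(parses(TWO_ROOTS))  # False: more than one top-level element
print(parses(WRAPPED))    # True

root = ET.fromstring(WRAPPED)
alg_type = root.find("Algorithm/Type").text
print(alg_type)           # Alg1
```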


> So there are comments, whitespace etc ... in it.
> I would like to be able to put everything into some sort of structure

Including the comments or without them? Note that ElementTree's default
parser drops comments while parsing.
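
A small sketch of that behaviour, on a one-line document made up for the
purpose. The second half assumes Python 3.8 or later, where the tree
builder accepts an insert_comments option; older versions have no such
switch.

```python
import xml.etree.ElementTree as ET

DOC = "<Algorithm><!-- The algorithm type. --><Type>Alg1</Type></Algorithm>"

# Default parser: the comment never makes it into the tree.
plain = ET.fromstring(DOC)
plain_tags = [child.tag for child in plain]
print(plain_tags)  # ['Type']

# Python 3.8 and later: ask the tree builder to keep comments.
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
kept = ET.fromstring(DOC, parser=parser)

# Comment nodes are elements whose tag is the ET.Comment factory.
comments = [child.text.strip() for child in kept if child.tag is ET.Comment]
print(comments)  # ['The algorithm type.']
```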


> such that i can access it as:
> structure['Algorithm']['Type'] == Alg1

Have a look at lxml.objectify. It allows you to write

     alg_type = root.Algorithm.Type.text

and a couple of other niceties.

http://lxml.de/objectify.html
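
For instance, a minimal sketch, assuming lxml is installed and the two
sections have been wrapped in a single root element (called "Config"
here, a name invented for the example):

```python
from lxml import objectify  # third-party package: pip install lxml

# "Config" is a hypothetical enclosing root element.
# Bytes input, because the document carries an encoding declaration.
DOC = b"""<?xml version="1.0" encoding="UTF-8"?>
<Config>
  <Algorithm>
    <Type>Alg1</Type>
    <path>./Alg1.in</path>
  </Algorithm>
  <Logfile>
    <verbosity>1</verbosity>
  </Logfile>
</Config>
"""

root = objectify.fromstring(DOC)

alg_type = root.Algorithm.Type.text        # element text as a string
verbosity = root.Logfile.verbosity.pyval   # objectify converts this to an int

print(alg_type, verbosity)
```

Child elements are looked up by tag name through plain attribute access,
which is about as close to structure['Algorithm']['Type'] as it gets
without building a dict yourself.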


> I was wondering if there is something out there that does this.
> I found and tried a few things:
> 1) http://code.activestate.com/recipes/534109-xml-to-python-data-structure/
> It simply doesn't work. I get the following error:
> raise exception
> xml.sax._exceptions.SAXParseException:<unknown>:1:2: not well-formed
> (invalid token)

"not well formed" == "not XML".


> But i removed everything from the file except:
> <?xml version="1.0" encoding="UTF-8"?>
> and i still got the error.

That's not XML, either. An XML declaration on its own is not a document;
there still has to be a root element.


> Anyway, i looked at ElementTree, but that error out with:
> xml.parsers.expat.ExpatError: junk after document element: line 19, column 0

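That is the same problem again: the parser found a second top-level
element after the first one had closed. In any case, ElementTree is
preferable to a SAX-based solution, for both performance and
maintainability reasons.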
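
If you want the plain dict-style access from your mail, here is a rough
sketch on top of ElementTree. The to_dict helper and the "Config" root
element are made up for this example. It assumes that sibling tags are
unique (a repeated tag would overwrite the earlier entry), it ignores
attributes, and all leaf values come back as strings.

```python
import xml.etree.ElementTree as ET


def to_dict(element):
    """Map child tags to nested dicts; leaf elements become stripped text.

    Hypothetical helper: repeated sibling tags overwrite each other and
    attributes are ignored.
    """
    children = list(element)
    if not children:
        return (element.text or "").strip()
    return {child.tag: to_dict(child) for child in children}


# "Config" is a hypothetical enclosing root element.
DOC = r"""
<Config>
  <Algorithm>
    <Type>Alg1</Type>
    <path>./Alg1.in</path>
  </Algorithm>
  <Logfile>
    <path>c:\tmp</path>
    <verbosity>1</verbosity>
  </Logfile>
</Config>
"""

structure = to_dict(ET.fromstring(DOC.strip()))

print(structure["Algorithm"]["Type"] == "Alg1")  # True
print(structure["Logfile"]["verbosity"])         # 1 (as the string '1')
```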

Stefan



