XML using standard Python modules

Thu Sep 13 17:53:54 EDT 2001

In article <k2nuptg6i18n0l48sg2llnl5kho3hqp4g3 at 4ax.com>,
Dale Strickland-Clark  <dale at riverhall.NOSPAM.co.uk> wrote:
>I'm trying to get to grips with XML using Python.
>
>A simple app to start with, it will read a plain text file containing
>some data, convert it to XML and write an XML file.
>
>Later that file will be used as a random-access data source.
>
>Where do I start?
>
>I'm reading this: http://py-howto.sourceforge.net/xml-howto/SAX.html
>at the moment. Is it up-to-date?
>
>There seems to be half a dozen XML modules. Which is the right one for
>this type of application? XML.SAX?
>
>Thanks for any pointers.

Minidom (xml.dom.minidom, part of the standard library) is pretty
straightforward:

    >>> from xml.dom.minidom import Document
    >>> d = Document()
    >>> e1 = d.createElement("foo")
    >>> e1.attributes["attr"] = "value"
    >>> print e1.toxml()
	<foo attr="value"/>
    >>> e2 = d.createElement("bar")
    >>> e21 = d.createTextNode("Hello, world!")
    >>> e2.appendChild(e21)
	<DOM Text node "Hello, wor...">
    >>> print e2.toxml()
	<bar>Hello, world!</bar>
    >>> e = d.createElement("main")
    >>> d.appendChild(e)
	<DOM Element: main at 135810908>
    >>> print d.toxml()
	<?xml version="1.0" ?>
	<main/>
    >>> e.appendChild(e1)
	<DOM Element: foo at 135830820>
    >>> e.appendChild(e2)
	<DOM Element: bar at 135838812>
    >>> print d.toxml()
	<?xml version="1.0" ?>
	<main><foo attr="value"/><bar>Hello, world!</bar></main>
    >>> e.appendChild(e1.cloneNode(1))
	<DOM Element: foo at 135607220>
    >>> e.appendChild(e1.cloneNode(1))
	<DOM Element: foo at 135837932>
    >>> e.appendChild(e1.cloneNode(1))
	<DOM Element: foo at 135841092>
    >>> e.appendChild(e2.cloneNode(1))
	<DOM Element: bar at 135841796>
    >>> e.appendChild(e1.cloneNode(1))
	<DOM Element: foo at 135913204>
    >>> e.appendChild(e1.cloneNode(1))
	<DOM Element: foo at 135914692>
    >>> e.appendChild(e1.cloneNode(1))
	<DOM Element: foo at 135606788>
    >>> print e.toxml()
	<main><foo attr="value"/><bar>Hello, world!</bar><foo attr="value"/><foo attr="value"/><foo attr="value"/><bar>Hello, world!</bar><foo attr="value"/><foo attr="value"/><foo attr="value"/></main>
    >>> print e.toprettyxml(indent="  ")
	<main>
	  <foo attr="value"/>
	  <bar>
	    Hello, world!
	  </bar>
	  <foo attr="value"/>
	  <foo attr="value"/>
	  <foo attr="value"/>
	  <bar>
	    Hello, world!
	  </bar>
	  <foo attr="value"/>
	  <foo attr="value"/>
	  <foo attr="value"/>
	</main>

    >>>  

I meant to call d.toprettyxml() in the last command, which would also have
printed the XML declaration.  Anyhow, you get the idea.  cloneNode(1) requests
a "deep" clone, that is, a copy of the node together with all of its children.
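
To make the deep/shallow distinction concrete, here is a minimal sketch. It is written for Python 3 (where print is a function) rather than the interpreter used in the session above, and the variable names are only illustrative. A shallow clone copies the element and its attributes; a deep clone also copies the child nodes.

```python
from xml.dom.minidom import Document

doc = Document()

# Build <bar attr="value">Hello, world!</bar>
bar = doc.createElement("bar")
bar.setAttribute("attr", "value")
bar.appendChild(doc.createTextNode("Hello, world!"))

# Shallow clone: the element and its attributes, but no children.
shallow = bar.cloneNode(False)
# Deep clone: the element, its attributes, and all descendants.
deep = bar.cloneNode(True)

shallow_xml = shallow.toxml()
deep_xml = deep.toxml()

print(shallow_xml)  # <bar attr="value"/>
print(deep_xml)     # <bar attr="value">Hello, world!</bar>
```

Either clone is a new, unattached node, so it still has to be added to the tree with appendChild before it shows up in the document.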

Here's some more:

    >>> bars = e.getElementsByTagName("bar")
    >>> bars
	[<DOM Element: bar at 135838812>, <DOM Element: bar at 135841796>]
    >>> bar = bars[1]
    >>> text = bar.firstChild
    >>> text.data
	'Hello, world!'
    >>> text.data = "Good bye."
    >>> print d.toprettyxml(indent="  ")
	<?xml version="1.0" ?>
	<main>
	  <foo attr="value"/>
	  <bar>
	    Hello, world!
	  </bar>
	  <foo attr="value"/>
	  <foo attr="value"/>
	  <foo attr="value"/>
	  <bar>
	    Good bye.
	  </bar>
	  <foo attr="value"/>
	  <foo attr="value"/>
	  <foo attr="value"/>
	</main>

    >>>  
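
Coming back to the original question (plain text in, XML file out), the same calls can be strung together roughly as follows. This is only a sketch under assumptions: the "name=value" line format, the element names "records" and "record", and the helper text_to_document are all made up for illustration, and the code targets Python 3. It writes to an in-memory buffer; passing an open text file to writexml instead would produce the file on disk.

```python
import io
from xml.dom.minidom import Document

# Hypothetical input format: one "name=value" record per line.
TEXT = "alpha=1\nbeta=2\n"


def text_to_document(text):
    """Turn "name=value" lines into a <records> document (illustrative)."""
    doc = Document()
    root = doc.createElement("records")
    doc.appendChild(root)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, _, value = line.partition("=")
        record = doc.createElement("record")
        record.setAttribute("name", name.strip())
        record.appendChild(doc.createTextNode(value.strip()))
        root.appendChild(record)
    return doc


doc = text_to_document(TEXT)

# writexml accepts any object with a write() method, e.g. an open file.
out = io.StringIO()
doc.writexml(out)
xml_text = out.getvalue()
print(xml_text)
```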

To build a document from an XML string, use "parseString"; to parse a file, use
"parse".  Both are functions in xml.dom.minidom.
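
A short sketch of both entry points, again for Python 3. The sample XML and the file name example.xml are placeholders; the file is written to a temporary directory only so that the example is self-contained.

```python
import os
import tempfile
from xml.dom.minidom import parse, parseString

XML_TEXT = '<main><foo attr="value"/><bar>Hello, world!</bar></main>'

# parseString: build a document from XML held in a string.
doc_from_string = parseString(XML_TEXT)
bar_text = doc_from_string.getElementsByTagName("bar")[0].firstChild.data

# parse: build a document from a file, given here by its path.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example.xml")
    with open(path, "w", encoding="utf-8") as f:
        f.write(XML_TEXT)
    doc_from_file = parse(path)

root_name = doc_from_file.documentElement.tagName
foo_attr = doc_from_file.getElementsByTagName("foo")[0].getAttribute("attr")

print(bar_text)
print(root_name, foo_attr)
```

Both calls load the whole document into memory, so later lookups such as getElementsByTagName work on the in-memory tree rather than on the file.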

I learned this stuff from the Xerces API javadoc.  The Python API is slightly
different, but it is still very close.

	--Lyosha