xml file structure for use with ElementTree?

Andrew Dalke adalke at mindspring.com
Sat Oct 9 23:05:00 EDT 2004


Stewart Midwinter wrote:
> That's the ticket!  Unfortunately at the moment when I run this code I
> get the following error:
> ElementTree instance has no attribute 'remove'
> but I'll try to work through that.

Perhaps you need a newer version of ElementTree?  I don't
know when 'remove' was added.
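Another thing worth checking, since the failing code isn't
shown here: 'remove' is a method of Element objects, not of
the ElementTree wrapper object, so the error can also mean
the call was made on the tree instead of on its root element.
Here is a small sketch of the difference. It assumes the
xml.etree.ElementTree module from later Python standard
libraries, and the sample data is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, not taken from the original thread.
root = ET.fromstring(
    '<people><person name="Andrew"/><person name="Fred"/></people>'
)
tree = ET.ElementTree(root)

# The ElementTree wrapper has no remove(); individual elements do.
tree_has_remove = hasattr(tree, "remove")
root_has_remove = hasattr(root, "remove")

# Go through the root element to remove one of its children.
root = tree.getroot()
first_person = root.find("person")
root.remove(first_person)

remaining = [p.get("name") for p in root.findall("person")]
print(tree_has_remove, root_has_remove, remaining)
```

If that is the cause, calling getroot() first and removing
children from the element it returns should get past the error.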

> The main appeal of ElementTree was so I could avoid having to learn a
> whole lot about XML in order to parse a simple file, but I am coming
> to the conclusion that ElementTree is only simple if you already have
> an understanding about XML.

The problem is that you also need to generate the XML file.
You could use ElementTree to do that, but I've not used
it that way yet.
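For what it's worth, here is a minimal sketch of what
generating a file with ElementTree might look like. It
assumes the xml.etree.ElementTree module from later Python
standard libraries, and the 'people' and 'person' names are
only placeholders.

```python
import xml.etree.ElementTree as ET

# Build a small tree in memory (placeholder tag names).
people = ET.Element("people")
ET.SubElement(people, "person", name="Andrew")
ET.SubElement(people, "person", name="Fred")

# Serialize to a text string; encoding="unicode" asks for str, not bytes.
xml_text = ET.tostring(people, encoding="unicode")
print(xml_text)

# Parse it back to check that the round trip preserves the data.
reparsed = ET.fromstring(xml_text)
names = [p.get("name") for p in reparsed.findall("person")]
```

To write straight to disk, wrapping the root in
ET.ElementTree(people) and calling its write() method with a
filename should do the same job.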

XML syntax isn't that hard.  The summary is that everything
looks pretty much like this

   <tagname attrib="val">Content goes here</tagname>

The <tagname> is called an opening tag and the </tagname>
is called a closing tag.  The whole thing is called an
element.
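To see how those pieces show up in ElementTree, here is a
small sketch that parses the example element above. It
assumes the xml.etree.ElementTree module from later Python
standard libraries.

```python
import xml.etree.ElementTree as ET

elem = ET.fromstring('<tagname attrib="val">Content goes here</tagname>')

tag = elem.tag        # the name shared by the opening and closing tags
attrs = elem.attrib   # dictionary of attributes from the opening tag
text = elem.text      # the content between the two tags

print(tag, attrs, text)
```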

There can be 0 or more attributes in the opening tag, but
none in the closing tag.  So the following are all valid
opening tags

   <tagname>
   <person name="Andrew">
   <person name="Andrew" city="Santa Fe">

There are ways to escape special characters in the
values for an attribute (for example &amp; for '&',
&lt; for '<' and &quot; for '"').  Only some characters
are allowed in tag names and attribute names.  The ':'
is the only one with a special meaning.  It's used for
namespaces.  That's more complicated and you'll need to
look elsewhere for details on that.  Duplicate attribute
names in the same tag are not allowed, so don't use them.
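As a rough illustration of both points, this sketch lets
ElementTree do the escaping and then feeds the parser a tag
with a repeated attribute. It assumes the
xml.etree.ElementTree module from later Python standard
libraries, and the attribute value is made up.

```python
import xml.etree.ElementTree as ET

# A made-up attribute value containing characters that need escaping.
original = 'A & B <"quoted">'
person = ET.Element("person", name=original)

# The serializer escapes the special characters for us.
serialized = ET.tostring(person, encoding="unicode")
print(serialized)

# Parsing the result gives back the unescaped value.
round_trip = ET.fromstring(serialized).get("name")

# A tag that repeats an attribute name is rejected by the parser.
try:
    ET.fromstring('<person name="Andrew" name="Fred"/>')
    duplicate_rejected = False
except ET.ParseError:
    duplicate_rejected = True
```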

The contents of an element can contain text and other
elements.  This is what makes it an element tree.
So the following is also valid

<person><name>Andrew</name><city>Santa Fe</city></person>

It's largely a matter of preference whether to put
data into attributes or into the contents of an element.
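Either way the data is easy to get at. Here is a sketch
that reads the same information from both layouts, assuming
the xml.etree.ElementTree module from later Python standard
libraries.

```python
import xml.etree.ElementTree as ET

# Data stored as child elements.
nested = ET.fromstring(
    "<person><name>Andrew</name><city>Santa Fe</city></person>"
)
child_tags = [child.tag for child in nested]
nested_name = nested.findtext("name")
nested_city = nested.findtext("city")

# The same data stored as attributes.
flat = ET.fromstring('<person name="Andrew" city="Santa Fe"/>')
flat_name = flat.get("name")
flat_city = flat.get("city")

print(child_tags, nested_name, nested_city, flat_name, flat_city)
```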

As a shortcut, if there is no content then ending
the tag with a '/>' makes it both an opening tag and
a closing tag, so the following is a complete element.

   <person name="Andrew"/>
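A quick way to convince yourself that the short form and
the long form are the same element is to parse both and
compare them. This sketch assumes the xml.etree.ElementTree
module from later Python standard libraries.

```python
import xml.etree.ElementTree as ET

short_form = ET.fromstring('<person name="Andrew"/>')
long_form = ET.fromstring('<person name="Andrew"></person>')

# Same tag, same attributes, no children and no text in either one.
same_tag = short_form.tag == long_form.tag
same_attrs = short_form.attrib == long_form.attrib
child_counts = (len(short_form), len(long_form))
texts = (short_form.text, long_form.text)

print(same_tag, same_attrs, child_counts, texts)
```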

The first line of your XML document can contain a
special construct called the XML declaration, which is
written like a processing instruction.  It tells the XML
parser how to process the rest of the document.  It looks
like this

<?xml version='1.0' encoding='utf-8'?>

Besides stating which XML version is used (1.0 is the
one you'll almost always see), this tells the
processor to interpret bytes as the UTF-8 encoding
of Unicode characters.  I believe the first few
bytes are also used to determine the byte ordering
in case the text is stored as big-endian or
little-endian "wide" Unicode characters.
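To show the encoding part of that in action, this sketch
stores the same made-up document as two different byte
sequences and lets the declared encoding tell the parser how
to decode each one. It assumes the xml.etree.ElementTree
module from later Python standard libraries.

```python
import xml.etree.ElementTree as ET

# The same (made-up) content, encoded two different ways.
utf8_doc = (
    "<?xml version='1.0' encoding='utf-8'?><name>José</name>"
).encode("utf-8")
latin1_doc = (
    "<?xml version='1.0' encoding='iso-8859-1'?><name>José</name>"
).encode("iso-8859-1")

# The raw bytes differ, because 'é' is encoded differently.
bytes_differ = utf8_doc != latin1_doc

# The parser reads the declaration and decodes both to the same text.
utf8_name = ET.fromstring(utf8_doc).text
latin1_name = ET.fromstring(latin1_doc).text

print(bytes_differ, utf8_name, latin1_name)
```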

One final note.  Only one top-level element is
allowed in an XML file.  For example, this is allowed


<?xml version='1.0' encoding='utf-8'?>
<people>
  <person name="Andrew"/>
  <person name="Fred"/>
</people>

while this is not

<?xml version='1.0' encoding='utf-8'?>
<person name="Andrew"/>
<person name="Fred"/>

In other words there is only one root to the
element tree.
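You can check the single-root rule by handing both
documents to a parser. This is a sketch assuming the
xml.etree.ElementTree module from later Python standard
libraries; the 'parses' helper is made up for the example.

```python
import xml.etree.ElementTree as ET

one_root = b"""<?xml version='1.0' encoding='utf-8'?>
<people>
  <person name="Andrew"/>
  <person name="Fred"/>
</people>
"""

two_roots = b"""<?xml version='1.0' encoding='utf-8'?>
<person name="Andrew"/>
<person name="Fred"/>
"""

def parses(document):
    """Hypothetical helper: True if the document is accepted."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False

one_root_ok = parses(one_root)
two_roots_ok = parses(two_roots)
print(one_root_ok, two_roots_ok)
```

If you have a file of the second kind, wrapping its contents
in a single enclosing element is usually enough to make it
parse.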

				Andrew
				dalke at dalkescientific.com


