Problem parsing namespaces with xml.dom.minidom

Tue Jan 18 00:49:48 EST 2005

Hi everyone.

I've been trying for several hours now to get minidom to parse 
namespaces properly from my stream of XML, so that I can use DOM methods 
such as getElementsByTagNameNS().  For some reason, though, it just 
doesn't seem to want to split the prefixes from the rest of the tags 
when parsing.

The minidom documentation at 
http://docs.python.org/lib/module-xml.dom.minidom.html implies that 
namespaces are supposed to be supported as long as I'm using a parser 
that supports them, but I just can't seem to get it to work.  I was 
wondering if anyone can see what I'm doing wrong.

Here's a simple test case that represents the problem I'm having.  If it 
makes a difference, I have PyXML installed, or at the very least, I have 
the Debian Linux python-xml package installed, which I'm pretty sure is 
PyXML.

========

from xml.dom import minidom
from xml import sax
text = '''<?xml version="1.0" encoding="UTF-8"?>
<xte:xte xmlns:xte='http://www.mcs.vuw.ac.nz/renata/xte'>
    <xte:creator>alias</xte:creator>
    <xte:date>Thu Jan 30 15:06:06 NZDT 2003</xte:date>
    <xte:object objectid="object1">
      Nothing
    </xte:object>
</xte:xte>
'''
# Set up a parser for namespace-ready parsing.
parser = sax.make_parser()
parser.setFeature(sax.handler.feature_namespaces, 1)
parser.setFeature(sax.handler.feature_namespace_prefixes, 1)

# Parse the string into a minidom
mydom = minidom.parseString(text)

# Look for some elements

# This one shouldn't return any (I think).
object_el1 = mydom.getElementsByTagName("xte:object")

# This one definitely should, at least for what I want.
object_el2 = mydom.getElementsByTagNameNS("object",
        'http://www.mcs.vuw.ac.nz/renata/xte')
print('1: ' + str(object_el1))
print('2: ' + str(object_el2))

=========

Output is:

1: [<DOM Element: xte:object at 0x404a922c>]
2: []

=========

What *seems* to be happening is that the namespace prefix isn't being 
separated, and is simply being parsed as if it's part of the rest of the 
tag.  Therefore when I search for a tag in a particular namespace, it's 
not being found.
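
One way to test that guess would be to look at the namespace attributes on an element that the plain getElementsByTagName() call does find. The sketch below is only a diagnostic, assuming a trimmed-down version of the test document above; XTE_NS is just a name introduced here for the namespace URI. If the prefix really were not being split, I would expect namespaceURI to come back empty.

```python
from xml.dom import minidom

# XTE_NS is a name introduced for this sketch; the URI is the one from the test case.
XTE_NS = 'http://www.mcs.vuw.ac.nz/renata/xte'
text = ("<xte:xte xmlns:xte='" + XTE_NS + "'>"
        "<xte:object objectid='object1'>Nothing</xte:object>"
        "</xte:xte>")

doc = minidom.parseString(text)

# Find the element by its qualified (prefixed) name, as in the test case.
el = doc.getElementsByTagName("xte:object")[0]

# With namespace-aware parsing, tagName keeps the prefix, while
# prefix, localName and namespaceURI hold the separated parts.
print("tagName:      " + str(el.tagName))
print("prefix:       " + str(el.prefix))
print("localName:    " + str(el.localName))
print("namespaceURI: " + str(el.namespaceURI))
```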

I've looked through the code in the python libraries, and the 
minidom.parseString function appears to be calling the PullDOM parse 
method, which creates a PullDOM object to be the ContentHandler.  Just 
browsing over that code, it *appears* to be trying to split the prefix 
from the local name in order to build a namespace-ready DOM as I would 
expect it to.  I can't quite figure out why this isn't working for me, 
though.

I'm not terribly experienced with XML in general, so it's possible that 
I'm just incorrectly interpreting how things are supposed to work to 
begin with.  If this is the case, please accept my apologies, but I'd 
like any suggestions for how I should be doing it.  I'd really just like 
to be able to parse an XML document into a DOM, and then be able to pull 
out elements relative to their namespaces.
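
For reference, the DOM Level 2 interface lists the arguments as getElementsByTagNameNS(namespaceURI, localName), with the namespace URI first. Here is a minimal sketch of the kind of lookup I'm after, assuming that argument order; elements_in_namespace is a hypothetical helper name and XTE_NS is just a label for the URI, neither is part of minidom. It also runs the lookup with the arguments the other way around, for comparison.

```python
from xml.dom import minidom

# XTE_NS and elements_in_namespace are names made up for this sketch.
XTE_NS = 'http://www.mcs.vuw.ac.nz/renata/xte'
text = ("<xte:xte xmlns:xte='" + XTE_NS + "'>"
        "<xte:creator>alias</xte:creator>"
        "<xte:object objectid='object1'>Nothing</xte:object>"
        "</xte:xte>")


def elements_in_namespace(doc, namespace_uri, local_name):
    """Return elements matching a namespace URI and an unprefixed name."""
    # DOM Level 2 order: namespace URI first, local name second.
    return doc.getElementsByTagNameNS(namespace_uri, local_name)


doc = minidom.parseString(text)

# Namespace URI first, local name second.
objects = elements_in_namespace(doc, XTE_NS, "object")

# Same call with the two arguments swapped, for comparison.
swapped = doc.getElementsByTagNameNS("object", XTE_NS)

print("URI first:        " + str(len(objects)) + " match(es)")
print("local name first: " + str(len(swapped)) + " match(es)")
```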

Can anyone see what I'm doing wrong?

Thanks.
Mike.