[XML-SIG] Minidom bugs/questions
Guido van Rossum
guido@digicool.com
Sat, 03 Feb 2001 14:39:54 -0500
I'm making my first steps into XML, so please forgive me. I wrote a
simple XML application using a DOM implementation by Digital Creations
folks. Then I was trapped in a hotel room with my code on a laptop
without a copy of DC's code, but with the Python 2.1a1 release
installed.
Converting my app to use minidom was easy enough, but I found out
about a bunch of differences between the two DOM implementations. Some
of these are fine with me (e.g. minidom doesn't preserve comments,
doesn't prefix its output with "<?xml version="1.0" ?>" when writing
XML output, and returns Unicode strings even for ASCII input).
But others suggest that either the DOM standard isn't very strict or
unambiguous, or one of the implementations has a bug. Here's the list
of things that I had to fix in my code:
1. The other DOM has a hasAttributes() predicate; minidom is missing
this and I have to use the more expensive form "if node.attributes".
2. In minidom, Element.getAttribute() and .getAttributeNS() raise
KeyError for a non-existing attribute; in the other DOM, they return
"". (Personally, I'd prefer KeyError or perhaps None, but according
to Fred, the DOM standard requires "". Note that this is poorly
documented -- from the docs for getAttribute*() it's not clear
*what* is returned in this case.)
3. Note that getAttributeNode() correctly returns None if the attribute
doesn't exist, but getAttributeNodeNS() looks like it will raise
KeyError too!
4. In minidom, createDocument() leaves doc.documentElement set to None;
in the other DOM, doc.documentElement is initialized to an Element
node created from the second argument to createDocument(). (Again,
according to Fred, the DOM standard requires the latter.)
5. When writing XML output from a DOM tree that uses namespace
attributes, minidom doesn't insert the proper "xmlns:<tag>=<URI>"
attributes. The other DOM gets this right. (This is a bit tricky
to do, although I've figured out a good way to do it which I'll gladly
donate to minidom if it's deemed useful.)
6. When writing XML output from a DOM tree that has a default
namespace, minidom writes <:tag>...</:tag> instead of
<tag>...</tag> like the other DOM, and like I would have expected.
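For reference, here is a small sketch of the behaviour that items 1
through 4 ask for. It assumes a present-day Python 3 xml.dom.minidom
rather than the 2.1a1 code discussed in this message, and the namespace
URI and element name are made up for illustration.

```python
from xml.dom.minidom import getDOMImplementation

# Hypothetical namespace URI, chosen only for this sketch.
NS = "http://example.com/ns"

impl = getDOMImplementation()
doc = impl.createDocument(NS, "ex:root", None)

# Item 4: documentElement is created from the qualified name.
root = doc.documentElement
root_tag = root.tagName

# Item 1: hasAttributes() is available as a predicate.
empty_has_attrs = root.hasAttributes()

# Item 2: a missing attribute yields "" rather than raising KeyError.
missing_value = root.getAttribute("missing")
missing_value_ns = root.getAttributeNS(NS, "missing")

# Item 3: both node getters return None for a missing attribute.
missing_node = root.getAttributeNode("missing")
missing_node_ns = root.getAttributeNodeNS(NS, "missing")

# After setting an attribute, the predicate flips to true.
root.setAttribute("color", "red")
filled_has_attrs = root.hasAttributes()

print(root_tag, empty_has_attrs, filled_has_attrs)
print(repr(missing_value), repr(missing_value_ns), missing_node, missing_node_ns)
```

Code that has to run against both implementations could wrap
getAttribute() in a helper that catches KeyError, but that is only a
workaround.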
Other comments:
7. I noticed that minidom's __getattr__ special-cases requests for an
attribute whose name begins with _get_, and makes up a lambda on the
fly. This suggests that the caller is asking for _get_foo() where
there is no such method, but there is a foo attribute. Since
_get_foo() is a detail of the implementation (I hope), doesn't this
mean that the implementation is doing something silly? Shouldn't
the implementation be fixed rather than accommodated? Or am I
missing something?
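To make item 7 concrete, the pattern looks roughly like the sketch
below. SketchNode is a hypothetical stand-in written for this message,
not minidom's actual code; it only illustrates how a __getattr__ hook
can make up a _get_foo() accessor for a plain foo attribute.

```python
class SketchNode:
    """Hypothetical stand-in for a DOM node; not minidom's actual code."""

    def __init__(self, foo):
        self.foo = foo

    def __getattr__(self, name):
        # Python calls this only when normal attribute lookup fails,
        # for example when someone asks for _get_foo.
        prefix = "_get_"
        if name.startswith(prefix):
            attrname = name[len(prefix):]
            if attrname in self.__dict__:
                # Make up an accessor on the fly for the plain attribute.
                return lambda: self.__dict__[attrname]
        raise AttributeError(name)


node = SketchNode(foo=42)
value = node._get_foo()  # no such method is defined; __getattr__ synthesizes it

try:
    node._get_bar
    missing_raises = False
except AttributeError:
    missing_raises = True

print(value, missing_raises)
```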
Here are proposed patches for items 1, 2, 3, 4 and 6 above (fixing 6
turns out to require a patch to pulldom.py). 5 is more work; 7 is a
trivial patch but I expect there's a reason (in which case a comment
would be a nice idea :-).
I'd like some feedback before checking this in...
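The item 6 fix in the pulldom.py patch boils down to the logic
sketched here as a standalone function. make_qname and the context
dictionary are hypothetical names used only for illustration, and the
sketch assumes the default namespace is recorded with an empty prefix.

```python
def make_qname(prefix, localname):
    """Join prefix and local name, leaving out the colon for an empty prefix."""
    if prefix:
        return prefix + ":" + localname
    return localname


# Hypothetical mapping from namespace URI to prefix; "" stands for the
# default namespace (an assumption of this sketch).
current_context = {
    "http://example.com/default": "",
    "http://example.com/ns": "ex",
}

default_name = make_qname(current_context["http://example.com/default"], "tag")
prefixed_name = make_qname(current_context["http://example.com/ns"], "tag")

print(default_name, prefixed_name)
```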
*** pulldom.py 2001/01/27 08:47:37 1.17
--- pulldom.py 2001/02/03 19:38:26
***************
*** 56,62 ****
# provide us with the original name. If not, create
# *a* valid tagName from the current context.
if tagName is None:
! tagName = self._current_context[uri] + ":" + localname
node = self.document.createElementNS(uri, tagName)
else:
# When the tagname is not prefixed, it just appears as
--- 56,66 ----
# provide us with the original name. If not, create
# *a* valid tagName from the current context.
if tagName is None:
! prefix = self._current_context[uri]
! if prefix:
! tagName = prefix + ":" + localname
! else:
! tagName = localname
node = self.document.createElementNS(uri, tagName)
else:
# When the tagname is not prefixed, it just appears as
***************
*** 66,72 ****
for aname,value in attrs.items():
a_uri, a_localname = aname
if a_uri:
! qname = self._current_context[a_uri] + ":" + a_localname
attr = self.document.createAttributeNS(a_uri, qname)
else:
attr = self.document.createAttribute(a_localname)
--- 70,80 ----
for aname,value in attrs.items():
a_uri, a_localname = aname
if a_uri:
! prefix = self._current_context[a_uri]
! if prefix:
! qname = prefix + ":" + a_localname
! else:
! qname = a_localname
attr = self.document.createAttributeNS(a_uri, qname)
else:
attr = self.document.createAttribute(a_localname)
*** minidom.py 2001/02/02 19:40:19 1.22
--- minidom.py 2001/02/03 19:38:50
***************
*** 435,444 ****
Node.unlink(self)
def getAttribute(self, attname):
! return self._attrs[attname].value
def getAttributeNS(self, namespaceURI, localName):
! return self._attrsNS[(namespaceURI, localName)].value
def setAttribute(self, attname, value):
attr = Attr(attname)
--- 435,450 ----
Node.unlink(self)
def getAttribute(self, attname):
! try:
! return self._attrs[attname].value
! except KeyError:
! return ""
def getAttributeNS(self, namespaceURI, localName):
! try:
! return self._attrsNS[(namespaceURI, localName)].value
! except KeyError:
! return ""
def setAttribute(self, attname, value):
attr = Attr(attname)
***************
*** 457,463 ****
return self._attrs.get(attrname)
def getAttributeNodeNS(self, namespaceURI, localName):
! return self._attrsNS[(namespaceURI, localName)]
def setAttributeNode(self, attr):
if attr.ownerElement not in (None, self):
--- 463,469 ----
return self._attrs.get(attrname)
def getAttributeNodeNS(self, namespaceURI, localName):
! return self._attrsNS.get((namespaceURI, localName))
def setAttributeNode(self, attr):
if attr.ownerElement not in (None, self):
***************
*** 528,533 ****
--- 534,545 ----
def _get_attributes(self):
return AttributeList(self._attrs, self._attrsNS)
+ def hasAttributes(self):
+ if self._attrs or self._attrsNS:
+ return 1
+ else:
+ return 0
+
class Comment(Node):
nodeType = Node.COMMENT_NODE
nodeName = "#comment"
***************
*** 635,640 ****
--- 647,654 ----
raise xml.dom.NamespaceErr("illegal use of 'xml' prefix")
if prefix and not namespaceURI:
raise xml.dom.NamespaceErr("illegal use of prefix without namespaces")
+ element = doc.createElementNS(namespaceURI, qualifiedName)
+ doc.appendChild(element)
doctype.parentNode = doc
doc.doctype = doctype
doc.implementation = self
--Guido van Rossum (home page: http://www.python.org/~guido/)