[XML-SIG] Minidom bugs/questions

Guido van Rossum guido@digicool.com
Sat, 03 Feb 2001 14:39:54 -0500


I'm making my first steps into XML, so please forgive me.  I wrote a
simple XML application using a DOM implementation by Digital Creations
folks.  Then I was trapped in a hotel room with my code on a laptop
without a copy of DC's code, but with the Python 2.1a1 release
installed.

Converting my app to use minidom was easy enough, but I found out
about a bunch of differences between the two DOM implementations.  Some
of these are fine with me (e.g. minidom doesn't preserve comments,
doesn't prefix its output with '<?xml version="1.0" ?>' when writing
XML output, and returns Unicode strings even for ASCII input).

But others suggest that either the DOM standard isn't very strict or
unambiguous, or one of the implementations has a bug.  Here's the list
of things that I had to fix in my code:

1. The other DOM has a hasAttributes() predicate; minidom is missing
   this and I have to use the more expensive form "if node.attributes".

2. In minidom, Element.getAttribute() and .getAttributeNS() raise
   KeyError for a non-existing attribute; in the other DOM, they return
   "".  (Personally, I'd prefer KeyError or perhaps None, but according
   to Fred, the DOM standard requires "".  Note that this is poorly
   documented -- from the docs for getAttribute*() it's not clear
   *what* is returned in this case.)

3. Note that getAttributeNode() correctly returns None if the attribute
   doesn't exist, but getAttributeNodeNS() looks like it will raise
   KeyError too!

4. In minidom, createDocument() leaves doc.documentElement set to None;
   in the other DOM, doc.documentElement is initialized to an Element
   node created from the second argument to createDocument().  (Again,
   according to Fred, the DOM standard requires the latter.)

5. When writing XML output from a DOM tree that uses namespace
   attributes, minidom doesn't insert the proper "xmlns:<tag>=<URI>"
   attributes.  The other DOM gets this right.  (This is a bit tricky
   to do, although I've figured a good way to do it which I'll gladly
   donate to minidom if it's deemed useful.)

6. When writing XML output from a DOM tree that has a default
   namespace, minidom writes <:tag>...</:tag> instead of
   <tag>...</tag> like the other DOM, and like I would have expected.
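
To make items 1-3 concrete, here is a minimal sketch of the behaviour
they ask for.  Attr and Element below are hypothetical stand-ins written
for illustration only; they are not minidom's classes and they leave
out everything about namespaces.

```python
class Attr:
    """Hypothetical stand-in for a DOM attribute node."""

    def __init__(self, name, value):
        self.name = name
        self.value = value


class Element:
    """Hypothetical stand-in for a DOM element (not minidom's class)."""

    def __init__(self):
        self._attrs = {}

    def setAttribute(self, name, value):
        self._attrs[name] = Attr(name, value)

    def getAttribute(self, name):
        # Item 2: a missing attribute yields "" instead of KeyError.
        try:
            return self._attrs[name].value
        except KeyError:
            return ""

    def getAttributeNode(self, name):
        # Item 3: dict.get() yields None for a missing key.
        return self._attrs.get(name)

    def hasAttributes(self):
        # Item 1: a cheap predicate, no attribute list is built.
        return bool(self._attrs)


elem = Element()
empty_value = elem.getAttribute("missing")        # ""
missing_node = elem.getAttributeNode("missing")   # None
had_attrs_before = elem.hasAttributes()           # False
elem.setAttribute("id", "x1")
```

With this shape, callers can test "if elem.getAttribute(name)" without
wrapping every lookup in try/except.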

Other comments:

7. I noticed that minidom's __getattr__ special-cases requests for an
   attribute whose name begins with _get_, and makes up a lambda on the
   fly.  This suggests that the caller is asking for _get_foo() where
   there is no such method, but there is a foo attribute.  Since
   _get_foo() is a detail of the implementation (I hope), doesn't this
   mean that the implementation is doing something silly?  Shouldn't
   the implementation be fixed rather than accommodated?  Or am I
   missing something?
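
For readers who haven't looked at that code, the pattern in item 7 can
be sketched roughly as follows.  This is an illustration of the general
technique only, with a hypothetical Node class; it is not copied from
minidom and the real code may differ in its details.

```python
class Node:
    """Hypothetical class showing accessor synthesis in __getattr__."""

    def __init__(self):
        self.nodeName = "#example"

    def __getattr__(self, key):
        # __getattr__ runs only after normal attribute lookup has failed.
        prefix = "_get_"
        if key.startswith(prefix):
            attrname = key[len(prefix):]
            # Raises AttributeError if there is no such plain attribute.
            value = getattr(self, attrname)
            # Make up a zero-argument accessor on the fly.
            return lambda: value
        raise AttributeError(key)


node = Node()
name_via_accessor = node._get_nodeName()
```

Here node._get_nodeName() works even though no such method is defined,
which is exactly what makes it hard to tell who depends on it.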

Here are proposed patches for items 1, 2, 3, 4 and 6 above (fixing 6
turns out to require a patch to pulldom.py).  5 is more work; 7 is a
trivial patch but I expect there's a reason (in which case a comment
would be a nice idea :-).
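
The idea behind the fix for item 6 can be sketched in isolation.  The
helper name make_qname is hypothetical (the patch inlines this logic in
pulldom.py), and the sketch assumes the default namespace is mapped to
an empty prefix, as the "<:tag>" output suggests; handling None as well
is an extra assumption.

```python
def make_qname(prefix, localname):
    """Join prefix and local name, omitting the colon when there is no prefix.

    Hypothetical helper for illustration; not part of pulldom.
    """
    if prefix:
        return prefix + ":" + localname
    # Empty (or None) prefix: the name lives in the default namespace.
    return localname


prefixed = make_qname("svg", "rect")
unprefixed = make_qname("", "tag")
```

Unconditionally concatenating prefix + ":" + localname is what turns an
empty prefix into ":tag".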

I'd like some feedback before checking this in...

*** pulldom.py	2001/01/27 08:47:37	1.17
--- pulldom.py	2001/02/03 19:38:26
***************
*** 56,62 ****
              # provide us with the original name. If not, create
              # *a* valid tagName from the current context.
              if tagName is None:
!                 tagName = self._current_context[uri] + ":" + localname
              node = self.document.createElementNS(uri, tagName)
          else:
              # When the tagname is not prefixed, it just appears as
--- 56,66 ----
              # provide us with the original name. If not, create
              # *a* valid tagName from the current context.
              if tagName is None:
!                 prefix = self._current_context[uri]
!                 if prefix:
!                     tagName = prefix + ":" + localname
!                 else:
!                     tagName = localname
              node = self.document.createElementNS(uri, tagName)
          else:
              # When the tagname is not prefixed, it just appears as
***************
*** 66,72 ****
          for aname,value in attrs.items():
              a_uri, a_localname = aname
              if a_uri:
!                 qname = self._current_context[a_uri] + ":" + a_localname
                  attr = self.document.createAttributeNS(a_uri, qname)
              else:
                  attr = self.document.createAttribute(a_localname)
--- 70,80 ----
          for aname,value in attrs.items():
              a_uri, a_localname = aname
              if a_uri:
!                 prefix = self._current_context[a_uri]
!                 if prefix:
!                     qname = prefix + ":" + a_localname
!                 else:
!                     qname = a_localname
                  attr = self.document.createAttributeNS(a_uri, qname)
              else:
                  attr = self.document.createAttribute(a_localname)
*** minidom.py	2001/02/02 19:40:19	1.22
--- minidom.py	2001/02/03 19:38:50
***************
*** 435,444 ****
          Node.unlink(self)
  
      def getAttribute(self, attname):
!         return self._attrs[attname].value
  
      def getAttributeNS(self, namespaceURI, localName):
!         return self._attrsNS[(namespaceURI, localName)].value
  
      def setAttribute(self, attname, value):
          attr = Attr(attname)
--- 435,450 ----
          Node.unlink(self)
  
      def getAttribute(self, attname):
!         try:
!             return self._attrs[attname].value
!         except KeyError:
!             return ""
  
      def getAttributeNS(self, namespaceURI, localName):
!         try:
!             return self._attrsNS[(namespaceURI, localName)].value
!         except KeyError:
!             return ""
  
      def setAttribute(self, attname, value):
          attr = Attr(attname)
***************
*** 457,463 ****
          return self._attrs.get(attrname)
  
      def getAttributeNodeNS(self, namespaceURI, localName):
!         return self._attrsNS[(namespaceURI, localName)]
  
      def setAttributeNode(self, attr):
          if attr.ownerElement not in (None, self):
--- 463,469 ----
          return self._attrs.get(attrname)
  
      def getAttributeNodeNS(self, namespaceURI, localName):
!         return self._attrsNS.get((namespaceURI, localName))
  
      def setAttributeNode(self, attr):
          if attr.ownerElement not in (None, self):
***************
*** 528,533 ****
--- 534,545 ----
      def _get_attributes(self):
          return AttributeList(self._attrs, self._attrsNS)
  
+     def hasAttributes(self):
+         if self._attrs or self._attrsNS:
+             return 1
+         else:
+             return 0
+ 
  class Comment(Node):
      nodeType = Node.COMMENT_NODE
      nodeName = "#comment"
***************
*** 635,640 ****
--- 647,654 ----
                  raise xml.dom.NamespaceErr("illegal use of 'xml' prefix")
              if prefix and not namespaceURI:
                  raise xml.dom.NamespaceErr("illegal use of prefix without namespaces")
+             element = doc.createElementNS(namespaceURI, qualifiedName)
+             doc.appendChild(element)
          doctype.parentNode = doc
          doc.doctype = doctype
          doc.implementation = self

--Guido van Rossum (home page: http://www.python.org/~guido/)