[XML-SIG] minidom: Genius or just plain bad?

Philipp Hagemeister phihag at phihag.de
Sun May 24 14:40:13 CEST 2009


I was puzzled when I tripped over the following:

>>> NS = 'http://phihag.de/2009/test/python/ns'
>>> s = '<rootelem a="val" xmlns="' + NS + '" />'
>>> import xml.dom.minidom
>>> doc = xml.dom.minidom.parseString(s)
>>> doc.documentElement.getAttributeNS(NS, 'a')
'' # wtf?
>>> doc.documentElement.getAttribute('a')
u'val'

Looking in the implementation, it seems that minidom is essentially a
DOM Level 1 implementation, with very limited support for namespaces.
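
For comparison, here is a minimal sketch of how the result above lines up with the Namespaces in XML rule that a default namespace declaration applies to elements but not to unprefixed attributes. It assumes a current Python 3 minidom; the prefix "p" and the variable names are made up for illustration.

```python
import xml.dom.minidom

NS = 'http://phihag.de/2009/test/python/ns'

# Unprefixed attribute: the default namespace covers the element only.
doc = xml.dom.minidom.parseString('<rootelem a="val" xmlns="' + NS + '" />')
root = doc.documentElement
element_ns = root.namespaceURI                   # namespace of the element
no_ns_value = root.getAttributeNS(None, 'a')     # look up "a" in no namespace
default_ns_value = root.getAttributeNS(NS, 'a')  # look up "a" in NS

# Prefixed attribute: "p:a" is explicitly bound to NS
# (the prefix "p" is an assumption made for this example).
doc2 = xml.dom.minidom.parseString(
    '<p:rootelem p:a="val" xmlns:p="' + NS + '" />')
prefixed_value = doc2.documentElement.getAttributeNS(NS, 'a')

print(repr(element_ns), repr(no_ns_value),
      repr(default_ns_value), repr(prefixed_value))
```

Under that reading, the unprefixed attribute is found with a namespace of None, and only the prefixed form is found under NS.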

Wouldn't it be nice to have a full-fledged XML implementation in the Python
stdlib? Probably not (yet) including validation, XSLT and similar
auxiliary technologies, but come on, XML namespaces and DOM Level 3
Load and Save should be supported.

I noticed that important minidom features such as
http://bugs.python.org/issue1621421 are not going anywhere. Is this
because of performance considerations or lack of manpower?

Also, it seems strange that minidom.py is full of comments referencing
outdated 2002 working drafts.
I'm intrigued by the idea of overriding __setattr__ to do crazy stuff
(including invalidating a document-wide cache that probably stays valid
in >99% of the cases, although a local check for attribute name == "id"
would improve performance here) instead of using properties, and then
avoiding actually using it "for performance" reasons.
Additionally, the comment "nodeValue and value are set elsewhere" in
Attr.__init__ neatly conveys the intention of allowing extremely fast
creation of value-less attributes.
Similarly, the opening comment of expatbuilder.py is an excellent example
of the little-known Alternative Zen of Python:

Ugly is better than beautiful.
Implicit is better than explicit.
Performance is better than anything.
Code needs comments explaining and defending it.
Constants are great, especially when depending on their value.¹
Code first, then think about the interface.²
Or don't think about the interface at all.
Fixing bugs in dependencies is bad.
Unless you fix by changing your code.
But do not allow others to do that.
Modularization is good.
As long as you access internals of other modules.
Import from many modules.
Whose names all sound the same.
If self.childnodes (:return True else return False)
That's how I spell pain.

¹ minidom.prefix
² grep "not sure this is meaningful"
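
To illustrate the property-based alternative to overriding __setattr__ mentioned above, here is a minimal sketch. The class and member names (Document, Attr, clear_id_cache, invalidations) are hypothetical and not taken from minidom, and treating only attributes literally named "id" as IDs is a simplifying assumption.

```python
class Document:
    """Hypothetical stand-in for a document holding an ID lookup cache."""

    def __init__(self):
        self.id_cache = {}
        self.invalidations = 0  # counts cache clears, for demonstration only

    def clear_id_cache(self):
        self.id_cache.clear()
        self.invalidations += 1


class Attr:
    """Hypothetical attribute node using a property instead of __setattr__."""

    def __init__(self, name, value, owner_document):
        self.name = name
        self._value = value
        self.owner_document = owner_document

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new_value):
        self._value = new_value
        # Local check: only a change to an "id" attribute can make the
        # document-wide ID cache stale (assumption for this sketch).
        if self.name == "id":
            self.owner_document.clear_id_cache()


doc = Document()
css_class = Attr("class", "old", doc)
css_class.value = "new"        # not an ID attribute, cache left alone
after_class_change = doc.invalidations

ident = Attr("id", "a1", doc)
ident.value = "a2"             # ID attribute, cache is cleared
after_id_change = doc.invalidations
```

Plain attribute access elsewhere stays ordinary, and only the one setter that matters pays for the cache check.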

Regards,

Philipp

