[XML-SIG] Anybody using PyXML (4DOM) HTML DOM?

Mon Aug 25 18:33:05 EDT 2003

Does anybody use PyXML's (4DOM's) HTML DOM implementation (including the
implementors themselves)?

Here are a couple of examples where it looks clearly broken, which makes
me suspect nobody but me is using it:

1. HTMLDocument.getElementsByName doesn't work at all for lower-case
attribute values (SF bug 782470):

#!/usr/bin/env python

from xml.dom.ext.reader import HtmlLib

doc = HtmlLib.FromHtml("""<html><head><title></title></head><body>
<form name="blah"></form>
</body></html>""")

# HTMLElement.getAttribute uppercases the name, but it was *stored*
# in lower case, so both fail.
print repr(doc.getElementsByName("blah"))
print repr(doc.getElementsByName("BLAH"))

I don't know how this should be fixed: case issues in HTML DOM seem
horribly complicated.
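
The mismatch described in the comment above can be reproduced with a toy
model. This is only a sketch: a plain dict stands in for an element's
attribute storage, and the helper names (get_attribute_upper,
get_attribute_folded) are hypothetical, not part of 4DOM. The
case-insensitive lookup is one possible direction, not a claim about how
PyXML should be fixed.

```python
# Toy model only: a dict stands in for an element's attribute storage.
# None of these names come from 4DOM; they are illustrative assumptions.
stored = {"name": "blah"}  # attribute name stored in lower case


def get_attribute_upper(attrs, name):
    # Mimics a lookup that upper-cases the requested name before searching.
    return attrs.get(name.upper(), "")


def get_attribute_folded(attrs, name):
    # One possible alternative: compare attribute names case-insensitively.
    wanted = name.lower()
    for key, value in attrs.items():
        if key.lower() == wanted:
            return value
    return ""


print(repr(get_attribute_upper(stored, "name")))    # lookup misses
print(repr(get_attribute_folded(stored, "NAME")))   # lookup succeeds
```

With the upper-casing lookup the stored "name" key is never found, so a
search built on top of it cannot match, whatever the case of the value.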

2. HTMLInputElement._get_type capitalisation is wrong.

xml/dom/html/HTMLInputElement.py says:

|     def _get_type(self):
|         return string.capitalize(self.getAttribute('TYPE'))

HTML DOM level 2 spec says:

| The type of control created (all lower case). See the type attribute
                               ^^^^^^^^^^^^^^
| definition in HTML 4.01.
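
To make the difference concrete, here is a minimal sketch comparing the
two behaviours on a bare string. The function names are hypothetical and
the attribute lookup is left out; only the case handling is shown.
str.capitalize() is used as the equivalent of string.capitalize().

```python
def get_type_capitalized(raw):
    # Same case handling as the quoted _get_type: first letter upper,
    # the rest lower.
    return raw.capitalize()


def get_type_lower(raw):
    # What the quoted spec text asks for: all lower case.
    return raw.lower()


for raw in ("submit", "SUBMIT", "Text"):
    print("%r -> %r (capitalize), %r (lower)"
          % (raw, get_type_capitalized(raw), get_type_lower(raw)))
```

Code comparing the result against "submit" would fail with the first
variant, which returns "Submit".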

John