[XML-SIG] Anybody using PyXML (4DOM) HTML DOM?

John J Lee jjl at pobox.com
Mon Aug 25 18:33:05 EDT 2003


Does anybody (including the implementors themselves) use PyXML's (4DOM's)
HTML DOM implementation?

Here are a couple of examples where it looks clearly broken, which makes me
suspect nobody but me is using it:


1. HTMLDocument.getElementsByName doesn't work at all for lower-case
attribute values (SF bug 782470):

#!/usr/bin/env python

from xml.dom.ext.reader import HtmlLib

doc = HtmlLib.FromHtml("""<html><head><title></title></head><body>
<form name="blah"></form>
</body></html>""")

# HTMLElement.getAttribute uppercases the attribute name, but the
# attribute was *stored* in lower case, so both lookups fail.
print repr(doc.getElementsByName("blah"))
print repr(doc.getElementsByName("BLAH"))


I don't know how this should be fixed: case issues in HTML DOM seem
horribly complicated.
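
To show the failure mode without needing PyXML installed, here is a toy
model of the mismatch described in the comment above. ToyElement and its
getAttributeAnyCase method are hypothetical names made up for this sketch;
this is not PyXML's code and not a proposed patch, just one way a
case-insensitive lookup might look under the assumption that attributes
are stored exactly as parsed.

```python
class ToyElement(object):
    """Hypothetical stand-in for an HTML element; not PyXML's class."""

    def __init__(self, attrs):
        # Attributes are kept exactly as parsed, e.g. {"name": "blah"}.
        self._attrs = dict(attrs)

    def getAttribute(self, name):
        # Mimics the reported behaviour: the requested attribute name is
        # uppercased, so it never matches a key stored in lower case.
        return self._attrs.get(name.upper(), "")

    def getAttributeAnyCase(self, name):
        # Assumed workaround: compare attribute names case-insensitively.
        wanted = name.lower()
        for key, value in self._attrs.items():
            if key.lower() == wanted:
                return value
        return ""


form = ToyElement({"name": "blah"})
broken = form.getAttribute("name")          # lookup for "NAME" misses
fixed = form.getAttributeAnyCase("NAME")    # finds the lower-case key
```

Whether the real fix belongs in the reader (normalising names when they
are stored) or in getAttribute (normalising at lookup time) is exactly the
part I can't judge.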


2. HTMLInputElement._get_type capitalisation is wrong.

xml/dom/html/HTMLInputElement.py says:

|     def _get_type(self):
|         return string.capitalize(self.getAttribute('TYPE'))


HTML DOM level 2 spec says:

| The type of control created (all lower case). See the type attribute
                               ^^^^^^^^^^^^^^
| definition in HTML 4.01.
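
The difference is easy to see on plain strings. A minimal sketch, with no
PyXML involved: the two function names below are hypothetical and only
contrast capitalising (what the quoted method does via string.capitalize)
with lower-casing (what the spec text asks for).

```python
def type_as_implemented(raw_type):
    # Same effect as string.capitalize: first character upper case,
    # the rest lower case.
    return raw_type.capitalize()


def type_per_spec(raw_type):
    # DOM Level 2 HTML says the type is reported in all lower case.
    return raw_type.lower()


as_implemented = type_as_implemented("checkbox")
per_spec = type_per_spec("CheckBox")
```

So for an input written as type="checkbox", the current method reports
"Checkbox", where the spec wording calls for "checkbox".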


John



