Problem with xml.dom parser and xmlns attribute

Mon May 10 12:38:31 EDT 2004

Peter Maas <peter.maas at mplusr.de> wrote in message news:<c682uu$sco$1 at swifty.westend.com>...
> Hi,
> 
> I have a problem parsing html text with xmldom. The following code
> runs well:
> 
> --------------------------------------------
> from xml.dom.ext.reader import HtmlLib
> from xml.dom.ext import PrettyPrint
> 
> r = HtmlLib.Reader()
> doc = r.fromString(
> '''
> <html>
> <head>
> </head>
> <body>
> <p>hallo welt
> </body>
> </html>
> ''')
> PrettyPrint(doc)
> --------------------------------------------
> 
> but if I replace <html> with <html xmlns="http://www.w3.org/1999/xhtml">
> I get the error:
> 
> Traceback (most recent call last):
>    File "xhtml.py", line 5, in ?
>      doc = r.fromString(
>    File "C:\PROGRA~1\Python23\lib\site-packages\_xmlplus\dom\ext\reader\HtmlLib.py", line 69, in fromString
>      return self.fromStream(stream, ownerDoc, charset)
>    File "C:\PROGRA~1\Python23\lib\site-packages\_xmlplus\dom\ext\reader\HtmlLib.py", line 27, in fromStream
>      self.parser.parse(stream)
>    File "C:\PROGRA~1\Python23\lib\site-packages\_xmlplus\dom\ext\reader\Sgmlop.py", line 57, in parse
>      self._parser.parse(stream.read())
>    File "C:\PROGRA~1\Python23\lib\site-packages\_xmlplus\dom\ext\reader\Sgmlop.py", line 160, in finish_starttag
>      unicode(value, self._charset))
>    File "C:\PROGRA~1\Python23\lib\site-packages\_xmlplus\dom\Element.py", line 177, in setAttributeNS
>      attr = self.ownerDocument.createAttributeNS(namespaceURI, qualifiedName)
>    File "C:\PROGRA~1\Python23\lib\site-packages\_xmlplus\dom\Document.py", line 139, in createAttributeNS
>      raise NamespaceErr()
> xml.dom.NamespaceErr: Invalid or illegal namespace operation
>  >Exit code: 1
> 
> A lot of HTML documents on the Internet have this xmlns=.... Are
> they wrong, or is this a PyXML bug?

This looks like a 4DOM bug.  What are you hoping to do once you've
parsed these documents?  If we know that, we can suggest either an
alternative tool or perhaps a workaround.
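
In the meantime, one possible stopgap is to strip the default namespace
declaration from the markup before handing it to the reader.  Judging by
the traceback, the error is raised while the xmlns attribute is being set
on the element, so removing the attribute should avoid that code path.
This is only a sketch: strip_default_xmlns is a hypothetical helper (not
part of PyXML), it works on the raw text with a regular expression, and it
throws away the namespace information, which may or may not matter for
what you do next.

```python
import re

# Matches a default-namespace declaration such as
# xmlns="http://www.w3.org/1999/xhtml" (double- or single-quoted),
# together with the whitespace in front of it. Prefixed declarations
# like xmlns:foo="..." are deliberately left alone.
_XMLNS_ATTR = re.compile(r'''\s+xmlns\s*=\s*("[^"]*"|'[^']*')''')


def strip_default_xmlns(markup):
    """Return markup with default xmlns="..." declarations removed.

    Hypothetical helper for illustration. It is purely textual, so it
    would also alter matching text inside comments or CDATA sections.
    """
    return _XMLNS_ATTR.sub('', markup)


source = '''
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
</head>
<body>
<p>hallo welt
</body>
</html>
'''

# The cleaned text starts with a plain <html> tag again.
cleaned = strip_default_xmlns(source)
```

The cleaned string can then be passed to r.fromString(cleaned) exactly as
in the first example, which is the variant reported to run well.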

--Uche