Problem with xml.dom parser and xmlns attribute
Uche Ogbuji
uche at ogbuji.net
Mon May 10 12:38:31 EDT 2004
Peter Maas <peter.maas at mplusr.de> wrote in message news:<c682uu$sco$1 at swifty.westend.com>...
> Hi,
>
> I have a problem parsing html text with xmldom. The following code
> runs well:
>
> --------------------------------------------
> from xml.dom.ext.reader import HtmlLib
> from xml.dom.ext import PrettyPrint
>
> r = HtmlLib.Reader()
> doc = r.fromString(
> '''
> <html>
> <head>
> </head>
> <body>
> <p>hallo welt
> </body>
> </html>
> ''')
> PrettyPrint(doc)
> --------------------------------------------
>
> but if I replace <html> by <html xmlns="http://www.w3.org/1999/xhtml">
> I get the error
>
> Traceback (most recent call last):
> File "xhtml.py", line 5, in ?
> doc = r.fromString(
> File "C:\PROGRA~1\Python23\lib\site-packages\_xmlplus\dom\ext\reader\HtmlLib.py", line 69, in fromString
> return self.fromStream(stream, ownerDoc, charset)
> File "C:\PROGRA~1\Python23\lib\site-packages\_xmlplus\dom\ext\reader\HtmlLib.py", line 27, in fromStream
> self.parser.parse(stream)
> File "C:\PROGRA~1\Python23\lib\site-packages\_xmlplus\dom\ext\reader\Sgmlop.py", line 57, in parse
> self._parser.parse(stream.read())
> File "C:\PROGRA~1\Python23\lib\site-packages\_xmlplus\dom\ext\reader\Sgmlop.py", line 160, in finish_starttag
> unicode(value, self._charset))
> File "C:\PROGRA~1\Python23\lib\site-packages\_xmlplus\dom\Element.py", line 177, in setAttributeNS
> attr = self.ownerDocument.createAttributeNS(namespaceURI, qualifiedName)
> File "C:\PROGRA~1\Python23\lib\site-packages\_xmlplus\dom\Document.py", line 139, in createAttributeNS
> raise NamespaceErr()
> xml.dom.NamespaceErr: Invalid or illegal namespace operation
> >Exit code: 1
>
> A lot of HTML documents on the Internet have this xmlns=.... Are
> they wrong or is this a PyXML bug?
This looks like a 4DOM bug. What are you hoping to do once you've
parsed these documents? If we know that, we can suggest either an
alternative tool or perhaps a workaround.
--Uche
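
One possible workaround of the kind mentioned above, sketched under the
assumption that the namespace declaration is not needed after parsing:
strip the default xmlns attribute from the markup before handing it to
the reader. The helper name strip_default_xmlns is hypothetical and not
part of PyXML or 4DOM, and the regular expression is a rough filter that
would also remove a matching xmlns="..." string appearing in ordinary
text content.

```python
import re

# Hypothetical helper, not part of PyXML or 4DOM.
# Matches a default-namespace declaration such as
#   xmlns="http://www.w3.org/1999/xhtml"
# but leaves prefixed declarations (xmlns:foo="...") untouched.
_XMLNS_ATTR = re.compile(r'''\s+xmlns\s*=\s*(?:"[^"]*"|'[^']*')''')

def strip_default_xmlns(markup):
    """Return markup with every default xmlns="..." attribute removed."""
    return _XMLNS_ATTR.sub('', markup)

source = '''
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
</head>
<body>
<p>hallo welt
</body>
</html>
'''

cleaned = strip_default_xmlns(source)
print(cleaned)

# With PyXML installed, the cleaned text could then be parsed as in the
# original post:
#   from xml.dom.ext.reader import HtmlLib
#   doc = HtmlLib.Reader().fromString(cleaned)
```

This only sidesteps the NamespaceErr; it does not fix the underlying
reader, and the resulting DOM carries no namespace information.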