Problem with xml.dom parser and xmlns attribute
Peter Maas
peter.maas at mplusr.de
Thu Apr 22 10:04:58 EDT 2004
Richard Brodie wrote:
> "Peter Maas" <peter.maas at mplusr.de> wrote in message news:c682uu$sco$1 at swifty.westend.com...
[...]
>>but if I replace <html> by <html xmlns="http://www.w3.org/1999/xhtml">
[...]
>>A lot of HTML documents on Internet have this xmlns=.... Are
>>they wrong or is this a PyXML bug?
>
>
> If they are genuine XHTML documents, they should be well-formed XML,
> so you should be able to use an XML rather than an SGML parser.
>
> from xml.dom.ext.reader import Sax2
> r = Sax2.Reader()
Thanks, Richard. But on the Internet I usually don't know what kind
of document I'm dealing with when I start parsing. I guess I should
use HTMLParser (?).
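
One way to handle documents of unknown type is to try a strict XML parse
first and fall back to a lenient HTML parser if the input is not
well-formed. The following is only a minimal sketch under assumptions: it
uses the standard-library modules xml.dom.minidom and html.parser rather
than the PyXML Sax2 reader quoted above, and the names TagCollector and
parse_unknown are hypothetical helpers invented for illustration.

```python
from html.parser import HTMLParser
from xml.dom import minidom
from xml.parsers.expat import ExpatError


class TagCollector(HTMLParser):
    """Hypothetical helper: records every start tag with its attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        self.tags.append((tag, dict(attrs)))


def parse_unknown(markup):
    """Return ("xml", Document) for well-formed input,
    otherwise ("html", list of (tag, attributes) tuples)."""
    try:
        # Strict parse: succeeds only for well-formed XML such as real XHTML
        return "xml", minidom.parseString(markup)
    except ExpatError:
        # Lenient fallback for tag soup; xmlns is just another attribute here
        collector = TagCollector()
        collector.feed(markup)
        collector.close()
        return "html", collector.tags
```

With this approach an xmlns attribute on <html> is accepted on both paths:
the XML parser treats it as a namespace declaration, while the HTML parser
simply reports it as an ordinary attribute.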
Kind regards,
Peter Maas
--
-------------------------------------------------------------------
Peter Maas, M+R Infosysteme, D-52070 Aachen, Hubert-Wienen-Str. 24
Tel +49-241-93878-0 Fax +49-241-93878-20 eMail peter.maas at mplusr.de
-------------------------------------------------------------------