Problem with xml.dom parser and xmlns attribute

Richard Brodie R.Brodie at rl.ac.uk
Fri Apr 23 05:03:42 EDT 2004


"Peter Maas" <peter.maas at mplusr.de> wrote in message news:c68jai$g85$1 at swifty.westend.com...

> Thanks, Richard. But on the Internet, most of the time I don't know
> what kind of document I'm dealing with when I start parsing. I guess
> I should use HTMLParser (?).

If you're dealing with a wide range of web pages, chances are they
will have all manner of rubbish in them. I would probably feed the
stuff through Tidy (or uTidyLib) first, to convert to cleanish XHTML,
then use an XML parser.
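
A rough sketch of that pipeline, in case it helps. It assumes the HTML
Tidy command-line program is installed and on your PATH as "tidy"
(uTidyLib would do the same job in-process). The function names
tidy_to_xhtml and link_targets are made up for illustration, and
pulling out link targets is just a stand-in for whatever you actually
want from the page.

```python
import shutil
import subprocess
from xml.dom import minidom


def tidy_to_xhtml(html):
    """Clean up arbitrary HTML by piping it through the tidy CLI.

    Assumes an HTML Tidy executable named "tidy" is on PATH.
    """
    exe = shutil.which("tidy")
    if exe is None:
        raise RuntimeError("tidy executable not found on PATH")
    proc = subprocess.run(
        [
            exe,
            "-q",                         # suppress the summary chatter
            "-asxhtml",                   # emit XHTML rather than HTML
            "-utf8",                      # UTF-8 for input and output
            "--numeric-entities", "yes",  # &nbsp; etc. become numeric refs,
                                          # which an XML parser accepts
        ],
        input=html.encode("utf-8"),
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
    )
    # tidy exits with 0 (clean), 1 (warnings) or 2 (errors).
    if proc.returncode >= 2:
        raise ValueError("tidy could not repair the document")
    return proc.stdout.decode("utf-8")


def link_targets(xhtml):
    """Parse XHTML with an XML parser and return every <a href> value."""
    doc = minidom.parseString(xhtml.encode("utf-8"))
    return [
        a.getAttribute("href")
        for a in doc.getElementsByTagName("a")
        if a.hasAttribute("href")
    ]
```

With a page fetched from somewhere into a string called page, you
would call link_targets(tidy_to_xhtml(page)). The xmlns attribute that
Tidy puts on the html element is then ordinary well-formed XML as far
as the parser is concerned.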




