How *extract* data from XHTML Transitional web pages? got xml.dom.minidom troubles..
Bruno Desthuilliers
bdesth.quelquechose at free.quelquepart.fr
Fri Mar 2 20:46:57 EST 2007
seberino at spawar.navy.mil wrote:
> I'm trying to extract some data from an XHTML Transitional web page.
>
> What is best way to do this?
>
> xml.dom.minidom.
As a side note, cElementTree is probably a better choice. Or even a
simple SAX parser.
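
For what it's worth, here is a minimal sketch of the ElementTree route, assuming the page really is well-formed. The sample page, the `extract_links` helper and the choice of pulling out links are made up for illustration. On current Python versions `xml.etree.ElementTree` picks up the C accelerator by itself, so it stands in for cElementTree here.

```python
import xml.etree.ElementTree as ET

# XHTML elements live in this namespace, so tag names must be qualified.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# Hypothetical, well-formed sample page (no DOCTYPE, no named entities).
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Demo</title></head>
  <body>
    <p><a href="/one">First</a> and <a href="/two">Second</a></p>
  </body>
</html>"""


def extract_links(xhtml_text):
    """Return (href, link text) pairs for every <a> element in the page."""
    root = ET.fromstring(xhtml_text)
    return [(a.get("href"), a.text) for a in root.iter(XHTML_NS + "a")]


links = extract_links(PAGE)
```

Forgetting the namespace prefix is the usual reason a search like this silently returns nothing on XHTML.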
> parseString("text of web page") gives errors about it
> not being well formed XML.
If it's not well-formed XML, most - if not all - XML parsers will choke
on it.
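
To see where the parser chokes, something like the following sketch may help. The `check_well_formed` helper and the sample strings are invented for the example; the only thing taken from the original question is the call to `xml.dom.minidom.parseString`, which raises `ExpatError` on malformed input.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def check_well_formed(text):
    """Return None if text parses as XML, else a short error description."""
    try:
        minidom.parseString(text)
    except ExpatError as exc:
        # lineno and offset point at the place where expat gave up.
        return "line %d, column %d: %s" % (exc.lineno, exc.offset, exc)
    return None


good = "<p>one<br/>two</p>"
unclosed = "<p>one<br>two</p>"      # <br> is never closed
entity = "<p>one&nbsp;two</p>"      # &nbsp; is undefined without a DTD

for sample in (good, unclosed, entity):
    print(check_well_formed(sample))
```

Unclosed tags and HTML-only entities are two common culprits, but the error message for the real page is what counts.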
> Do I just need to add something like <?xml ...?> or what?
How could we tell without looking at the XML?
But anyway, even if the XHTML is crappy, BeautifulSoup may do the job...
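
BeautifulSoup is a third-party package, so to keep this runnable as-is, the sketch below illustrates the same idea (a tolerant parser that does not insist on well-formedness) with the standard library's `html.parser` instead. This is a stand-in, not BeautifulSoup's API, and the `LinkCollector` class and the sample markup are made up.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags, tolerating sloppy markup."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the start tag.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)


# Unclosed <br>, <p> and <a> tags: a strict XML parser would reject this.
messy = '<p>one<br>two <a href="/x">x</a><p>three <a href="/y">y'

collector = LinkCollector()
collector.feed(messy)
collector.close()
```

After this, `collector.links` holds the hrefs in document order even though the input is not well-formed.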
HTH