Parsing HTML/XML documents
pabloski at giochinternet.com
Thu Apr 26 06:41:29 EDT 2007
I need to parse real-world HTML/XML documents, and I found two nice Python
solutions: BeautifulSoup and Tidy.
However, I also found pyXPCOM, the Python bindings for Mozilla's XPCOM, which
should give access to Gecko. So I was thinking that Gecko surely handles bad
HTML in a more consistent and error-proof way than BS and Tidy.
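
To make "handles bad HTML" concrete, here is a minimal sketch of the kind of lenient parsing I mean. It does not use BeautifulSoup, Tidy, or pyXPCOM; it uses the standard library's html.parser purely as a stand-in, and the names LinkCollector, extract_links, and BAD_HTML are made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags without requiring valid nesting."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; attrs is a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(markup):
    """Return every href found in markup, in document order."""
    parser = LinkCollector()
    parser.feed(markup)
    parser.close()  # flush any text still buffered at end of input
    return parser.links


# Hypothetical sample: unclosed <p> and <b>, an unquoted attribute value,
# a stray </div>, and a document that simply stops mid-element.
BAD_HTML = (
    "<html><body><p>First<b>bold<a href=one.html>one</a></div>"
    "<p>Second <a href='two.html'>two"
)

if __name__ == "__main__":
    print(extract_links(BAD_HTML))
```

This only reports tags as they are encountered and does not build a repaired DOM tree, which is the part I would hope Gecko does better.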
I'm interested in using the Mozilla DOM from inside a Python script, but
I'm a bit confused about how I can use pyXPCOM to accomplish this.
Any suggestions?
More information about the Python-list mailing list