HTML DOM parser?

Gilles Lenfant glenfant at NOSPAM.bigfoot.com
Fri Aug 1 06:29:18 EDT 2003


mailto:gilles at pilotsystems.net
"Paul Rubin" <http://phr.cx@NOSPAM.invalid> wrote in the news message:
7x7k5y5wfh.fsf_-_ at ruckus.brouhaha.com...
> Is there an HTML DOM parser available for Python?  Preferably one that
> does a reasonable job with the crappy HTML out there on real web
> pages, that doesn't get upset about unterminated tables and stuff like
> that.  Many extra points if it understands Javascript.  Application is
> a screen scraping web robot.  Thanks.

Windows only, with IE5 (or later) and the Win32All Python package:

Use IE as a COM object: browse to the file or URL, then get its DOM root.
But any JavaScript found in that page is executed when the page loads and
may fool your app (for example, by rewriting the DOM before you read it).
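A minimal sketch of that approach, assuming the pywin32 package (the later name of Win32All) and an installed Internet Explorer. It drives IE through `win32com.client.Dispatch("InternetExplorer.Application")`, polls the `Busy` and `ReadyState` properties until the page is complete, and hands back the document. The helper names `wait_until_loaded` and `ie_document` are hypothetical, not part of any library.

```python
import time
from contextlib import contextmanager

READYSTATE_COMPLETE = 4  # IE's READYSTATE_COMPLETE value


def wait_until_loaded(browser, timeout=30.0, poll=0.1):
    """Poll a browser object until it is idle and the page is fully loaded."""
    deadline = time.monotonic() + timeout
    while browser.Busy or browser.ReadyState != READYSTATE_COMPLETE:
        if time.monotonic() > deadline:
            raise TimeoutError("page did not finish loading")
        time.sleep(poll)


@contextmanager
def ie_document(url, timeout=30.0):
    """Load url in a hidden IE instance and yield its DOM document.

    Windows only (assumption: pywin32 and Internet Explorer are installed).
    IE is shut down when the with-block exits, so use the DOM inside it.
    """
    import win32com.client  # imported lazily so the module loads elsewhere

    ie = win32com.client.Dispatch("InternetExplorer.Application")
    ie.Visible = False
    try:
        ie.Navigate(url)
        wait_until_loaded(ie, timeout)
        # Page scripts have already run here and may have altered the DOM.
        yield ie.Document
    finally:
        ie.Quit()
```

Usage would look like `with ie_document("http://www.python.org/") as doc: root = doc.documentElement`, then walking `root` with the usual DOM properties. The context manager keeps IE alive only while you read the document, since COM references into the page become unusable once the browser quits.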

--Gilles