Looking for a specific html parser

Davor Cengija dcengija_remove_ at inet.hr
Wed Mar 19 03:21:04 EST 2003


Grzegorz Adam Hankiewicz wrote:

> On Tue, Mar 18, 2003 at 09:07:47AM +0100, Davor Cengija wrote:
>> Basically, I need a DOM like parser for HTML, with xpath
>> capabilities. xml.dom might help me, but before that I obviously
>> need some kind of html-tidy.
> 
> I needed something similar for a small script, and I found it most
> useful to first create an HTMLParser which translated all the markup
> to XML, and then feed that into Python's minidom. It's quite easy to
> do if your input HTML is 'correct'; otherwise the XML parsing will
> surely fail, unless you filter everything through tidy, of course.
> 

I doubt all of my input will be correct HTML, so I obviously need a
tidy-like library. Unfortunately, I couldn't find a native Python tidy, only
the previously mentioned wrapper. However, I did find a Java implementation
of tidy, which could be helpful in combination with Jython.
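
For reference, the HTMLParser-to-minidom approach quoted above might look
roughly like the sketch below. This is only an illustration under some
assumptions: it uses the Python 3 module layout (html.parser), the names
HtmlToXml, html_to_dom and VOID_TAGS are made up for this example, and it
only works when the input tags are properly nested. It drops comments and
doctypes, and it does not replace tidy.

```python
from html.parser import HTMLParser
from xml.dom import minidom
from xml.sax.saxutils import escape, quoteattr

# Assumption: a small, incomplete set of HTML elements that never take a
# closing tag; they are emitted as self-closing XML elements.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class HtmlToXml(HTMLParser):
    """Hypothetical helper: re-emit HTML parser events as XML text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Attributes without a value (e.g. "checked") get an empty string.
        attr_text = "".join(
            " %s=%s" % (name, quoteattr(value or "")) for name, value in attrs
        )
        if tag in VOID_TAGS:
            self.parts.append("<%s%s/>" % (tag, attr_text))
        else:
            self.parts.append("<%s%s>" % (tag, attr_text))

    def handle_endtag(self, tag):
        # Void elements were already closed in handle_starttag.
        if tag not in VOID_TAGS:
            self.parts.append("</%s>" % tag)

    def handle_data(self, data):
        # Character references were decoded by the parser; re-escape for XML.
        self.parts.append(escape(data))


def html_to_dom(html):
    """Convert reasonably clean HTML to a minidom Document.

    Raises xml.parsers.expat.ExpatError if the generated XML is not
    well-formed, e.g. because the input had unclosed tags.
    """
    converter = HtmlToXml()
    converter.feed(html)
    converter.close()
    return minidom.parseString("".join(converter.parts))
```

Something like html_to_dom('<p class="x">a<br>b</p>') should then give a
document you can walk with getElementsByTagName and friends. Input such as
'<p><b>unclosed</p>' will still make the XML parser fail, which is exactly
why I need the tidy step first.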

Thanks
-- 
Davor Cengija, dcengija_remove_ at inet.hr



