html parsing? Or just simple regex'ing?

Diez B. Roggisch deetsNOSPAM at web.de
Wed Nov 10 17:58:13 EST 2004


> But if I use an XML parser to parse HTML instead of a dedicated HTML
> parser, will I still get smart handling of unpaired tags?  I'm not sure we
> can count on getting 100% properly formed HTML...

There should be html2dom parsers - after all, extending htmlparser to
generate a DOM shouldn't be too hard.
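
To give an idea of what such an extension might look like, here is a rough sketch using the standard library's HTMLParser (html.parser in current Python) and xml.dom.minidom. The names DomBuilder, html_to_dom and VOID_TAGS, and the synthetic "root" wrapper element, are made up for illustration. It tolerates unclosed tags and stray end tags, but it does not apply HTML's implied-end-tag rules, so a second <p> simply nests inside an unclosed first one.

```python
from html.parser import HTMLParser
from xml.dom.minidom import Document

# Assumption: a small, incomplete set of elements that never take a closing tag.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class DomBuilder(HTMLParser):
    """Hypothetical helper: builds a minidom tree from sloppy HTML."""

    def __init__(self):
        super().__init__()
        self.doc = Document()
        # Synthetic wrapper so fragments with several top-level tags still fit.
        self.root = self.doc.createElement("root")
        self.doc.appendChild(self.root)
        self.stack = [self.root]  # currently open elements, innermost last

    def handle_starttag(self, tag, attrs):
        node = self.doc.createElement(tag)
        for name, value in attrs:
            node.setAttribute(name, value or "")
        self.stack[-1].appendChild(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close back to the nearest matching open element;
        # an end tag with no matching open element is ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tagName == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self.stack[-1].appendChild(self.doc.createTextNode(data))


def html_to_dom(text):
    builder = DomBuilder()
    builder.feed(text)
    builder.close()
    return builder.doc


# Unpaired <p>, bare <br> and a stray </b> are all accepted.
dom = html_to_dom("<p>one<br>two<p>three</b>")
print(dom.documentElement.toxml())
```

Anything still open at the end of the input is closed implicitly, since the tree is built as tags arrive rather than validated afterwards.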

Googling turns up tidy - so you may want to run your HTML through it
before handing it to an XML parser:

http://www.xml.com/pub/a/2004/09/08/pyxml.html
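
Once the markup has been cleaned up into well-formed XHTML, an ordinary XML parser can read it. A minimal sketch of that second step, assuming the cleanup has already happened: the string below is a hand-written stand-in for tidied output, not something tidy actually produced, and text_of is a hypothetical helper.

```python
from xml.dom.minidom import parseString

# Stand-in for tidied output (assumption: hand-written, well-formed XHTML).
tidied = (
    "<html><head><title>t</title></head>"
    "<body><p>one<br/>two</p><p>three</p></body></html>"
)

dom = parseString(tidied)


def text_of(node):
    """Concatenate the direct text children of a node."""
    return "".join(
        child.data
        for child in node.childNodes
        if child.nodeType == child.TEXT_NODE
    )


texts = [text_of(p) for p in dom.getElementsByTagName("p")]
print(texts)
```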


-- 
Regards,

Diez B. Roggisch



More information about the Python-list mailing list