successor to htmllib

Carl Banks imbosol at vt.edu
Fri Sep 6 18:04:23 EDT 2002


Erik Price wrote:
> I noticed that the htmllib module is really best suited to HTML 2.0 
> documents.  I was wondering if there was a newer (4+) HTML or even 
> XHTML parsing module in development right now.  (By XHTML parsing lib I 
> mean perhaps an XML parser that is specifically written for the XHTML 
> 1.0 DTD.  I realize that any XML parser -should- work.)


A better idea is probably to deprecate htmllib, along with sgmllib.

Having used sgmllib for an HTML preprocessor, I can say its design is
not versatile at all, and trying to make it do anything other than
what it was originally intended for tends to be a headache.

sgmllib's reason for existence is to support the very limited subset
(term used loosely) of SGML needed by htmllib.  htmllib's reason for
existence is to support the rendering of HTML by an event-based
program.  I suppose they do that job satisfactorily.  However, the
more you deviate from their narrow intended uses, the more useless
they become.  sgmllib appears to be fairly buggy as well, although I
am not an SGML expert, so I can't say how buggy.

I'd say you're better off using the XML tools.  I don't know anything
about them, but I can't imagine they would be less versatile.
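
For well-formed XHTML, the event-based style that htmllib supports can
probably be approximated with the standard library's xml.sax module.
What follows is only a minimal sketch under stated assumptions, not a
replacement for htmllib: LinkCollector, collect_links, and SAMPLE are
hypothetical names made up for illustration, and the input is assumed
to be well-formed XML that uses no undeclared entities such as &nbsp;.

```python
import xml.sax


class LinkCollector(xml.sax.ContentHandler):
    """Hypothetical handler: records the href of every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []

    def startElement(self, name, attrs):
        # Called once per opening tag. Namespace processing is off by
        # default, so `name` is the tag name as written in the document.
        if name == "a" and "href" in attrs:
            self.links.append(attrs["href"])


# Illustrative XHTML fragment (an assumption, not taken from the thread).
SAMPLE = b"""<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Example</title></head>
<body><p><a href="http://www.python.org/">Python</a> and
<a href="http://www.w3.org/">W3C</a></p></body>
</html>"""


def collect_links(document):
    """Parse `document` (bytes) as XML and return the hrefs found."""
    handler = LinkCollector()
    xml.sax.parseString(document, handler)
    return handler.links


if __name__ == "__main__":
    for link in collect_links(SAMPLE):
        print(link)
```

The trade-off is strictness: an XML parser raises an error on markup
that is not well-formed, so this approach suits XHTML but not the
loosely written HTML that htmllib was built to tolerate.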


-- 
CARL BANKS
http://www.aerojockey.com


