[Web-SIG] HTML parsing: anyone use formatter?

amk at amk.ca amk at amk.ca
Thu Oct 30 14:27:18 EST 2003


[Crossposted to python-dev, web-sig, and xml-sig.  Followups to
web-sig at python.org, please.]

I'm working on bringing htmllib.py up to HTML 4.01 by adding handlers for
all the missing elements.  So far I've been adding only empty methods to
the HTMLParser class, but the existing methods actually help render the HTML
by calling methods on a Formatter object.  For example, the handlers for
the H1 element look like this:

    def start_h1(self, attrs):
        self.formatter.end_paragraph(1)
        self.formatter.push_font(('h1', 0, 1, 0))

    def end_h1(self):
        self.formatter.end_paragraph(1)
        self.formatter.pop_font()

Question: should I continue supporting this in new methods?  This can only
go so far; a tag such as <big> or <small> is easy for me to handle, but
handling <form> or <frameset> or <table> would require greatly expanding the
Formatter class's repertoire.
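
As a rough sketch of why <big> and <small> are the easy case: each one
needs only a font push on the start tag and a pop on the end tag, mirroring
the H1 handlers above.  The code below is illustrative only and is not
htmllib's code.  It assumes a stand-in RecordingFormatter class
(hypothetical; it just logs calls), uses html.parser.HTMLParser from the
standard library in place of htmllib's parser, and assumes the font is a
four-item tuple whose unchanged slots are marked with an AS_IS placeholder.

```python
from html.parser import HTMLParser

# Assumption: a marker meaning "leave this font property unchanged".
AS_IS = None


class RecordingFormatter:
    """Hypothetical stand-in for a Formatter; it only records calls."""

    def __init__(self):
        self.calls = []
        self.font_stack = []

    def push_font(self, font):
        self.font_stack.append(font)
        self.calls.append(("push_font", font))

    def pop_font(self):
        if self.font_stack:
            self.font_stack.pop()
        self.calls.append(("pop_font",))

    def add_flowing_data(self, data):
        self.calls.append(("add_flowing_data", data))


class SketchParser(HTMLParser):
    """Dispatches tags to start_<tag>/end_<tag> methods when they exist."""

    def __init__(self, formatter):
        super().__init__()
        self.formatter = formatter

    def handle_starttag(self, tag, attrs):
        method = getattr(self, "start_" + tag, None)
        if method is not None:
            method(attrs)

    def handle_endtag(self, tag):
        method = getattr(self, "end_" + tag, None)
        if method is not None:
            method()

    def handle_data(self, data):
        self.formatter.add_flowing_data(data)

    # The easy case: a font change maps onto the existing push/pop calls.
    def start_big(self, attrs):
        self.formatter.push_font(("big", AS_IS, AS_IS, AS_IS))

    def end_big(self):
        self.formatter.pop_font()

    def start_small(self, attrs):
        self.formatter.push_font(("small", AS_IS, AS_IS, AS_IS))

    def end_small(self):
        self.formatter.pop_font()


formatter = RecordingFormatter()
parser = SketchParser(formatter)
parser.feed("<p>plain <big>large</big> and <small>tiny</small></p>")
parser.close()

for call in formatter.calls:
    print(call)
```

Nothing comparable exists for <form>, <frameset> or <table>: there is no
single push/pop pair that describes them, so the formatter itself would
need new methods before handlers like these could be written.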

I suppose the more general question is, does anyone use Python's formatter
module?  Do we want to keep it around, or should htmllib be pushed toward
doing just HTML parsing?  formatter.py is a long way from being able to
handle modern web pages, and it would be a lot of work to build a decent
renderer on top of it.

--amk


