printing indented html code

Konstantin Veretennicov kveretennicov at gmail.com
Fri Jun 24 07:16:09 EDT 2005


On 6/24/05, Lowell Kirsh <lkirsh at cs.ubc.ca> wrote:
> Is there a module or library anyone knows of that will print html code
> indented?

It depends on whether you have to deal with XHTML, valid HTML, or
arbitrary, possibly invalid "pseudo-HTML" of the kind that abounds on the net.
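
For the XHTML case, a minimal sketch, assuming the input is well-formed XML
and that Python 3's standard xml.dom.minidom is acceptable. The helper name
indent_xhtml is made up for illustration. Input with unquoted attribute
values (such as color=0) is not well-formed XML and will be rejected by the
parser rather than indented.

```python
from xml.dom import minidom


def indent_xhtml(markup, indent="\t"):
    """Return well-formed XHTML re-indented with one `indent` per level.

    Hypothetical helper; raises xml.parsers.expat.ExpatError on input
    that is not well-formed XML.
    """
    doc = minidom.parseString(markup)
    # Pretty-print from the root element so no XML declaration is emitted.
    return doc.documentElement.toprettyxml(indent=indent)


if __name__ == "__main__":
    print(indent_xhtml(
        '<html><head>foobar</head><body color="0">body</body></html>'))
```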

Here's an example of (too) simple HTML processing using the standard
HTMLParser module:

from HTMLParser import HTMLParser

class Indenter(HTMLParser):
    
    def __init__(self, out):
        HTMLParser.__init__(self)
        self.out = out
        self.indent_level = 0
    
    def write_line(self, text):
        print >> self.out, '\t' * self.indent_level + text
    
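    def handle_starttag(self, tag, attrs):
        # Quote attribute values and avoid a trailing space when the tag
        # has no attributes; valueless attributes arrive with v set to None.
        parts = [tag]
        for k, v in attrs:
            if v is None:
                parts.append(k)
            else:
                parts.append('%s="%s"' % (k, v))
        self.write_line('<%s>' % ' '.join(parts))
        self.indent_level += 1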
    
    def handle_endtag(self, tag):
        self.indent_level -= 1
        self.write_line('</%s>' % tag)
    
    handle_data = write_line
    
    # handle everything else...
    # http://docs.python.org/lib/module-HTMLParser.html

if __name__ == '__main__':
    import sys
    i = Indenter(sys.stdout)
    i.feed('<html><head>foobar</head><body color=0>body</body></html>')
    i.close()
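
The same idea can be sketched for Python 3, where the module lives at
html.parser. This is only an illustration under a few assumptions: the
names Indenter3, VOID_TAGS and indent_html are made up here, the list of
void elements is my own and may be incomplete, and whitespace-only text is
dropped. It also covers one thing the example above leaves out: tags such
as <br> have no end tag, so they must not increase the indent level.

```python
import io
from html import escape
from html.parser import HTMLParser

# Assumed set of elements that never take an end tag (may be incomplete).
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class Indenter3(HTMLParser):
    """Hypothetical Python 3 variant of the Indenter class above."""

    def __init__(self, out):
        super().__init__()
        self.out = out
        self.indent_level = 0

    def write_line(self, text):
        self.out.write("\t" * self.indent_level + text + "\n")

    def format_tag(self, tag, attrs, close=">"):
        parts = [tag]
        for k, v in attrs:
            # The parser unescapes attribute values, so escape them again.
            parts.append(k if v is None else '%s="%s"' % (k, escape(v)))
        return "<" + " ".join(parts) + close

    def handle_starttag(self, tag, attrs):
        self.write_line(self.format_tag(tag, attrs))
        if tag not in VOID_TAGS:
            self.indent_level += 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br/>: no change in indent level.
        self.write_line(self.format_tag(tag, attrs, " />"))

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        # Never go below zero on stray end tags in invalid markup.
        self.indent_level = max(0, self.indent_level - 1)
        self.write_line("</%s>" % tag)

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.write_line(text)


def indent_html(markup):
    buf = io.StringIO()
    parser = Indenter3(buf)
    parser.feed(markup)
    parser.close()
    return buf.getvalue()


if __name__ == "__main__":
    print(indent_html(
        "<html><head>foobar</head><body color=0>body<br></body></html>"))
```

Comments, doctype declarations and character references inside text are
still not handled; those need the other handle_* methods.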

- kv


