XML beautifier?

Andreas Jung ajung at sz-sb.de
Thu Sep 2 13:12:23 EDT 1999


On Thu, Sep 02, 1999 at 05:29:40PM +0200, Alexander Staubo wrote:
> I'm having a gas with the XML package and its DOM classes, but its
> toxml() mechanism outputs mainly flat XML -- no visual structure in the
> form of line breaks or indentation. Is there a Python module that does
> such beautification reasonably hassle-free?

Here is just a very stupid program which does the job. It works
with regular expressions. You can also use sgmllib
to parse the file, catch the tags in the unknown_starttag() and
unknown_endtag() methods, and indent the output accordingly.

Cheers,
Andreas
------------

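import gzip
import re
import sys

fname = sys.argv[1]
# Read the whole document as text; gzip-compressed input is handled too.
if fname.endswith('.gz'):
    data = gzip.open(fname, 'rt').read()
else:
    data = open(fname, 'r').read()

# The capturing group makes re.split() keep the tags in the result;
# re.DOTALL lets a tag span several lines.
fields = re.split('(<.*?>)', data, flags=re.DOTALL)
level = 0
for f in fields:
    f = f.strip()
    if not f:
        continue                      # skip whitespace between tags
    is_tag = f.startswith('<') and f.endswith('>')
    if f.startswith('</'):
        level = max(level - 1, 0)     # closing tag: dedent, then print
        print(' ' * (level * 4) + f)
    elif is_tag and f[1] not in '?!' and not f.endswith('/>'):
        print(' ' * (level * 4) + f)  # opening tag: print, then indent
        level = level + 1
    else:
        # text, <empty/> elements, <?...?> and <!...> declarations
        print(' ' * (level * 4) + f)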
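
The parser-based alternative mentioned above could look roughly like the
sketch below. Some assumptions to note: sgmllib no longer exists in
Python 3, so this uses xml.sax from the standard library instead, where
startElement() and endElement() play the role of unknown_starttag() and
unknown_endtag(). The names IndentingHandler and beautify() are made up
for illustration. It is only a sketch: comments and processing
instructions are dropped, and text that the parser happens to deliver in
several chunks ends up on several lines.

```python
import xml.sax
from xml.sax.saxutils import escape, quoteattr


class IndentingHandler(xml.sax.ContentHandler):
    """Collect one indented output line per tag or text chunk."""

    def __init__(self, indent=4):
        super().__init__()
        self.indent = indent
        self.level = 0
        self.lines = []

    def _emit(self, text):
        self.lines.append(' ' * (self.level * self.indent) + text)

    def startElement(self, name, attrs):
        # Opening tag: emit at the current depth, then go one level deeper.
        attr_text = ''.join(' %s=%s' % (key, quoteattr(value))
                            for key, value in attrs.items())
        self._emit('<%s%s>' % (name, attr_text))
        self.level += 1

    def endElement(self, name):
        # Closing tag: step back out before emitting.
        self.level -= 1
        self._emit('</%s>' % name)

    def characters(self, content):
        # Skip whitespace-only text, as the regex version does.
        if content.strip():
            self._emit(escape(content.strip()))


def beautify(xml_text):
    """Return xml_text re-indented, one tag or text chunk per line."""
    handler = IndentingHandler()
    xml.sax.parseString(xml_text.encode('utf-8'), handler)
    return '\n'.join(handler.lines)


print(beautify('<a><b x="1">hi</b><c/></a>'))
```

Because a real parser does the tokenizing, malformed input raises an
error instead of producing wrong indentation, and an empty element such
as <c/> comes back as a separate <c> and </c> pair.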



