XML Tree Discovery (script, tool, __?)

Thu Oct 27 23:25:44 EDT 2005

> - don't use SAX unless your document is huge
> - don't use DOM unless someone is putting a gun to your head

What I say is: use what works for you.  I think SAX would be fine for
this task, but, hey, I personally would use Amara (
http://uche.ogbuji.net/tech/4suite/amara/ ), of course.  The following
does the trick:

import sets
import amara
from amara import binderytools

#element_skeleton_rule suppresses char data from the resulting binding
#tree.  If you have a large document and only care about element/attr
#structure and not text, this saves a lot of memory
rules = [binderytools.element_skeleton_rule()]
#XML can be a file path, URI, string, or even an open-file-like object
doc = amara.parse(XML, rules=rules)
elems = {}
for e in doc.xml_xpath('//*'):
    paths = elems.setdefault((e.namespaceURI, e.localName), sets.Set())
    path = u'/'.join([n.nodeName for n in e.xml_xpath(u'ancestor::*')])
    paths.add(u'/' + path)

#Pretty-print output
for name in elems:
    print name, '\n\t\t\t', '\n\t\t\t'.join(elems[name])

--
 Uche Ogbuji                               Fourthought, Inc.
http://uche.ogbuji.net                    http://fourthought.com
http://copia.ogbuji.net                   http://4Suite.org
Articles: http://uche.ogbuji.net/tech/publications/