using TreeBuilder in an ElementTree-like way

Sat Jul 1 05:28:33 EDT 2006

Greg Aumann wrote:

> In reading the elementtree documentation I found the 
> ElementTree.TreeBuilder class which it says can be used to create 
> parsers for XML-like languages.

a TreeBuilder is a thing that turns a sequence of start(), data(), and 
end() method calls into an Element tree structure.
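a minimal sketch of that, assuming Python 3's standard-library xml.etree.ElementTree (not the 2006 elementtree package). it also shows how data() calls that arrive after an end() end up as the tail of the element that was just closed:

```python
import xml.etree.ElementTree as ET

builder = ET.TreeBuilder()
builder.start("doc", {})
builder.start("p", {"class": "intro"})
builder.data("hello")          # text inside <p>
builder.end("p")
builder.data(" tail text")     # text after </p> becomes p.tail
builder.end("doc")
root = builder.close()         # returns the root Element

print(root.tag)                # doc
print(root[0].get("class"))    # intro
print(root[0].text)            # hello
print(repr(root[0].tail))      # ' tail text'
```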

a Parser is a thing that turns a sequence of feed() method calls into a 
stream of start(), data(), and end() method calls on a target object. 
the standard parsers all automatically use a TreeBuilder instance as 
the default target.
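to see the parser/target split in action, here's a sketch (again assuming Python 3's xml.etree.ElementTree) that hands XMLParser a hypothetical EventRecorder target which just logs the calls it receives; leave the target out and you get the default TreeBuilder, which gives you an Element:

```python
import xml.etree.ElementTree as ET

class EventRecorder:
    """hypothetical target: records the calls the parser makes on it."""
    def __init__(self):
        self.events = []

    def start(self, tag, attrib):
        self.events.append(("start", tag, dict(attrib)))

    def data(self, text):
        self.events.append(("data", text))

    def end(self, tag):
        self.events.append(("end", tag))

    def close(self):
        # whatever the target's close() returns, parser.close() returns
        return self.events

parser = ET.XMLParser(target=EventRecorder())
parser.feed("<a x='1'>hi</a>")
events = parser.close()
print(events)

# no explicit target: XMLParser uses a TreeBuilder and returns an Element
default_parser = ET.XMLParser()
default_parser.feed("<a x='1'>hi</a>")
elem = default_parser.close()
print(elem.tag, elem.text)
```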

unfortunately, the current ET release also uses classes named 
XXXTreeBuilder for the actual parsers, which is a bit confusing.  (the 
reason for this is historical; the separate TreeBuilder class was 
factored out of a couple of format-specific XXXTreeBuilder parsers, but 
the naming wasn't fully sorted out.)

> Essentially I was trying to implement the following advice from Frederik 
> Lundh (Wed, Sep 8 2004 12:54 am):
>  > by the way, it's trivial to build trees from arbitrary SAX-style sources.
>  > just create an instance of the ElementTree.TreeBuilder class, and call
>  > the "start", "end", and "data" methods as appropriate.
>  >
>  >     builder = ElementTree.TreeBuilder()
>  >     builder.start("tag", {})
>  >     builder.data("text")
>  >     builder.end("tag")
>  >     elem = builder.close()

that's the intended use of the TreeBuilder class.
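as an example of an "arbitrary SAX-style source", here's a sketch that forwards events from Python 3's xml.sax to a TreeBuilder. the handler class name is made up; the point is just that each SAX callback maps onto one TreeBuilder call (and split characters() calls are fine, since TreeBuilder joins consecutive data):

```python
import xml.sax
import xml.etree.ElementTree as ET

class TreeBuilderHandler(xml.sax.ContentHandler):
    """hypothetical handler: forwards SAX events to a TreeBuilder."""
    def __init__(self):
        super().__init__()
        self.builder = ET.TreeBuilder()

    def startElement(self, name, attrs):
        self.builder.start(name, dict(attrs.items()))

    def characters(self, content):
        self.builder.data(content)

    def endElement(self, name):
        self.builder.end(name)

handler = TreeBuilderHandler()
xml.sax.parseString(b"<tag a='1'>text</tag>", handler)
elem = handler.builder.close()
print(elem.tag, elem.get("a"), elem.text)  # tag 1 text
```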

> but in another post he wrote (Wed, May 21 2003 2:56 am):
>  > usage:
>  >
>  >     from elementtree import ElementTree, HTMLTreeBuilder
>  >
>  >     # file is either a filename or an open stream
>  >     tree = ElementTree.parse(file, parser=HTMLTreeBuilder.TreeBuilder())
>  >     root = tree.getroot()
>  >
>  > or
>  >
>  >     from elementtree import HTMLTreeBuilder
>  >
>  >     parser = HTMLTreeBuilder.TreeBuilder()
>  >     parser.feed(data)
>  >     root = parser.close()

and this is the confusing naming; here, the HTMLTreeBuilder.TreeBuilder 
class is actually doing the parsing (which uses a TreeBuilder instance 
on the inside).
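HTMLTreeBuilder isn't part of the Python 3 standard library, so this sketch uses xml.etree.ElementTree.XMLParser as a stand-in (XML only, not HTML) to show the same split: the parser object does the parsing, and the TreeBuilder passed in as its target builds the tree:

```python
import io
import xml.etree.ElementTree as ET

# the parser parses; the TreeBuilder target builds the Element tree
parser = ET.XMLParser(target=ET.TreeBuilder())
tree = ET.parse(io.BytesIO(b"<html><body>hi</body></html>"), parser=parser)
root = tree.getroot()
print(root.find("body").text)  # hi
```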

> This second one makes me think I should have implemented a parser class 
> using Treebuilder.

that's entirely up to you: the only real advantage of having a parser 
class is that you can pass it to any other module that uses the Python 
consumer interface:

     http://effbot.org/zone/consumer.htm

but if that's not relevant for your application, feel free to use a 
TreeBuilder directly.
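if you do want a parser class, a minimal sketch might look like this. the "key=value per line" format and the KeyValueParser name are purely hypothetical; the class exposes feed() and close() like the other parsers, buffers partial lines between feed() calls, and drives a TreeBuilder (or any other target) with start(), data(), and end():

```python
import xml.etree.ElementTree as ET

class KeyValueParser:
    """hypothetical parser for a toy "key=value" line format."""
    def __init__(self, target=None):
        self._target = target if target is not None else ET.TreeBuilder()
        self._buffer = ""
        self._target.start("record", {})

    def feed(self, data):
        self._buffer += data
        # handle complete lines only; keep any partial line for later
        *lines, self._buffer = self._buffer.split("\n")
        for line in lines:
            self._handle_line(line)

    def _handle_line(self, line):
        line = line.strip()
        if not line:
            return
        key, _, value = line.partition("=")
        self._target.start(key.strip(), {})
        self._target.data(value.strip())
        self._target.end(key.strip())

    def close(self):
        if self._buffer:
            self._handle_line(self._buffer)
            self._buffer = ""
        self._target.end("record")
        return self._target.close()

parser = KeyValueParser()
parser.feed("color=red\nsha")   # data can arrive in arbitrary chunks
parser.feed("pe=round\n")
root = parser.close()
print([(e.tag, e.text) for e in root])  # [('color', 'red'), ('shape', 'round')]
```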

> Also when I used return builder.close() in the code below it didn't return
> an ElementTree structure but an _ElementInterface.

an Element, in other words (i.e. the thing returned by the Element 
factory in this specific implementation).  that's the documented 
behaviour; if you want an ElementTree wrapper, you have to wrap it yourself.
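wrapping it yourself is a one-liner; a sketch using Python 3's xml.etree.ElementTree:

```python
import xml.etree.ElementTree as ET

builder = ET.TreeBuilder()
builder.start("tag", {})
builder.data("text")
builder.end("tag")
elem = builder.close()          # an Element, not an ElementTree

tree = ET.ElementTree(elem)     # wrap it yourself
print(tree.getroot() is elem)   # True
```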

</F>