[Web-SIG] DOM-based templating

Fri Jun 3 08:18:24 CEST 2005

While we're on the topic of DOM-based templating...

FormEncode has a module htmlfill 
(http://formencode.org/docs/htmlfill.html), which is basically like 
DOM-based templating that just knows about HTML forms.  But currently it 
doesn't use a DOM, it uses an HTMLParser subclass.  This makes it much 
more complex than it would otherwise be, and misses out on some 
potential performance gains -- many times the input to htmlfill will be 
output from a template or HTML generator, and so often the DOM from the 
template is serialized to text, then parsed again.
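
To make the HTMLParser approach concrete, here is a rough sketch of the 
technique.  This is not htmlfill's actual code: the FillInputs class and 
the fill() helper are names made up for illustration, it only handles 
<input> tags, and it uses Python's standard html.parser module.

```python
from html import escape
from html.parser import HTMLParser


class FillInputs(HTMLParser):
    """Echo HTML through as-is, rewriting only <input> tags named in defaults.

    Hypothetical illustration; not the htmlfill implementation.
    """

    def __init__(self, defaults):
        # convert_charrefs=False: entity references arrive in their own
        # callbacks, so they can be written back out unchanged.
        super().__init__(convert_charrefs=False)
        self.defaults = defaults
        self.out = []

    def _start(self, tag, attrs, closer):
        name = dict(attrs).get("name")
        if tag != "input" or name not in self.defaults:
            # Untouched start tags are copied exactly as they appeared.
            self.out.append(self.get_starttag_text())
            return
        # Rebuild the tag: keep attribute order, replace any old value.
        pairs = [(k, v) for k, v in attrs if k != "value"]
        pairs.append(("value", self.defaults[name]))
        text = "".join(
            " " + k if v is None else ' %s="%s"' % (k, escape(v, quote=True))
            for k, v in pairs
        )
        self.out.append("<input" + text + closer)

    def handle_starttag(self, tag, attrs):
        self._start(tag, attrs, ">")

    def handle_startendtag(self, tag, attrs):
        self._start(tag, attrs, " />")

    def handle_endtag(self, tag):
        # The parser lowercases tag names, so original case is lost here.
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)


def fill(source, defaults):
    """Return source with matching <input> values filled in from defaults."""
    parser = FillInputs(defaults)
    parser.feed(source)
    parser.close()
    return "".join(parser.out)
```

Every kind of token needs its own callback just to copy the input through, 
and the text has to be reparsed even when it came from a template that 
already had a tree in memory.  Both are things a DOM would avoid.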

I had thought about moving this to a DOM or DOM-ish thing of some sort, 
but I don't know which one.  Unfortunately many of the options are not 
very humane -- that is, they are "correct", but not user-friendly.  (I 
won't claim HTMLParser is that humane either, but I'm looking to improve 
on it.)  Here's what I'd like, and maybe someone can suggest something:

* Can parse HTML, not just XHTML.  Not the crazy HTML browsers parse, 
but unambiguous well-formed HTML.  I don't like the idea of putting the 
HTML through tidy; that's fine for a screen-scraper, but is way too 
defensive for this kind of thing.

* Can generate HTML.  This is probably easy to tack onto most systems, 
even if it isn't present now -- it's just a couple rules about how to 
serialize tags.

* Doesn't modify the output at all for areas where no transformations 
occurred.  It doesn't wipe out whitespace.  It *definitely* doesn't lose 
comments.  It keeps attribute order.  When nodes are modified it's 
sometimes ambiguous how that affects the output, so if attribute order 
is lost there it's not that big a deal.

* Can output nicely-formatted code.  Probably easy to add, but nice if 
it's already there.  This is, of course, entirely contrary to the 
previous item ;)  When generating nodes *purely* from Python, systems 
tend to produce HTML/XML with no extra whitespace at all, which is 
completely unreadable.

* Keeps around enough information to produce good error messages.  It 
needs to be possible to figure out the line and maybe column where a 
node was originally defined.  If we're supporting multiple 
transformations by multiple systems, then this information needs to 
persist through the transformations.  I think this is a really important 
and undervalued feature; anyone can write a templating system with 
crappy error messages (and lots and lots of people do).  Good error 
messages set a templating system apart.

* Reasonably fast.
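
As an illustration of the error-message item, here is a minimal sketch of 
recording where each node was defined, using HTMLParser.getpos() from 
Python's standard library.  The PositionRecorder class and the 
tag_positions() helper are hypothetical names, and a real DOM would attach 
the position to the node itself rather than keep a separate list.

```python
from html.parser import HTMLParser


class PositionRecorder(HTMLParser):
    """Record (tag, line, column) for every start tag seen.

    Hypothetical illustration of keeping source positions around.
    """

    def __init__(self):
        super().__init__()
        self.positions = []

    def handle_starttag(self, tag, attrs):
        # getpos() returns a 1-based line number and a 0-based column
        # offset; inside this callback it points at the tag's "<".
        line, column = self.getpos()
        self.positions.append((tag, line, column))


def tag_positions(source):
    """Return a list of (tag, line, column) tuples for source."""
    recorder = PositionRecorder()
    recorder.feed(source)
    recorder.close()
    return recorder.positions
```

With positions like these stored on each node, a later transformation can 
still report something like "line 2, column 2: <input> has no name" even 
after the tree has been passed through several systems.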

I've played around just a bit with ElementTree, but I only felt so-so 
about it.  I felt like it was pretty correct, but not very humane -- 
maybe that'd be good enough if I was processing big XML documents, but 
it doesn't work for HTML templating.

-- 
Ian Bicking  /  ianb at colorstudy.com  / http://blog.ianbicking.org