[Web-SIG] DOM-based templating
Ian Bicking
ianb at colorstudy.com
Fri Jun 3 08:18:24 CEST 2005
While we're on the topic of DOM-based templating...
FormEncode has a module htmlfill
(http://formencode.org/docs/htmlfill.html), which is basically like
DOM-based templating that just knows about HTML forms. But currently it
doesn't use a DOM; it uses an HTMLParser subclass. This makes it much
more complex than it would otherwise be, and misses out on some
potential performance gains -- many times the input to htmlfill will be
output from a template or HTML generator, and so often the DOM from the
template is serialized to text, then parsed again.
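To make the parser-subclass approach concrete, here is a minimal sketch of a form filler built on the standard library's `html.parser.HTMLParser` (the Python 3 home of HTMLParser). It is not FormEncode's code: the class name `FillInputs`, the helper `fill`, and the `defaults` mapping are hypothetical, and it only handles `<input>` tags.

```python
from html import escape
from html.parser import HTMLParser


class FillInputs(HTMLParser):
    """Hypothetical, simplified form filler; not FormEncode's implementation."""

    def __init__(self, defaults):
        # convert_charrefs=False so entity references arrive in their own
        # handlers and can be echoed back unchanged.
        super().__init__(convert_charrefs=False)
        self.defaults = defaults
        self.out = []

    def handle_starttag(self, tag, attrs):
        name = dict(attrs).get("name")
        if tag == "input" and name in self.defaults:
            # Only tags we actually change are rebuilt from parsed attributes.
            new = [(k, v) for k, v in attrs if k != "value"]
            new.append(("value", self.defaults[name]))
            rendered = "".join(
                ' %s="%s"' % (k, escape(v or "", quote=True)) for k, v in new
            )
            self.out.append("<input%s>" % rendered)
        else:
            # Untouched start tags are copied verbatim from the source text.
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Treat <input /> like <input> so no stray end tag is emitted.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        # Note: the parser lowercases tag names, so end tags are re-rendered.
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)


def fill(html, defaults):
    """Return html with matching <input> values replaced from defaults."""
    parser = FillInputs(defaults)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Even this toy version has to re-emit every kind of token by hand just to leave the document alone, which hints at why the parser-subclass approach gets complicated. When the input is itself template output, the text has also been serialized once and is parsed a second time here.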
I had thought about moving this to a DOM or DOM-ish thing of some sort,
but I don't know which one. Unfortunately many of the options are not
very humane -- that is, they are "correct", but not user-friendly.
Here's what I'd like, and maybe someone can suggest something (I won't
claim HTMLParser is that humane either, but I'm looking to improve on
it):
* Can parse HTML, not just XHTML. Not the crazy HTML browsers parse,
but unambiguous well-formed HTML. I don't like the idea of putting the
HTML through tidy; that's fine for a screen-scraper, but is way too
defensive for this kind of thing.
* Can generate HTML. This is probably easy to tack onto most systems,
even if it isn't present now -- it's just a couple rules about how to
serialize tags.
* Doesn't modify the output at all for areas where no transformations
occurred. It doesn't wipe out whitespace. It *definitely* doesn't lose
comments. It keeps attribute order. When nodes are modified it's
sometimes ambiguous how that affects the output, so if attribute order
is lost there it's not that big a deal.
* Can output nicely-formatted code. Probably easy to add, but nice if
it's already there. This is, of course, entirely contrary to the
previous item ;) When generating nodes *purely* from Python, systems
tend to produce HTML/XML with no extra whitespace at all, and completely
unreadable.
* Keeps around enough information to produce good error messages. It
needs to be possible to figure out the line and maybe column where a
node was originally defined. If we're supporting multiple
transformations by multiple systems, then this information needs to
persist through the transformations. I think this is a really important
and undervalued feature; anyone can write a templating system with
crappy error messages (and lots and lots of people do). Good error
messages set a templating system apart.
* Reasonably fast.
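As a small illustration of the error-message requirement, the sketch below records where a tag was defined, using `HTMLParser.getpos()` from the standard library, which returns the current line number (1-based) and offset (0-based). The `PositionChecker` class, the `check` helper, and the particular rule it enforces are hypothetical. A DOM-ish library would additionally need to store this position on each node so it survives later transformations.

```python
from html.parser import HTMLParser


class PositionChecker(HTMLParser):
    """Hypothetical checker: reports <input> tags that lack a name attribute."""

    def __init__(self):
        super().__init__()
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag == "input" and "name" not in dict(attrs):
            # getpos() points at the start of the tag being handled.
            line, offset = self.getpos()
            # Report a 1-based column, as most editors do.
            self.errors.append((line, offset + 1, "<input> has no name attribute"))


def check(html):
    """Return a list of (line, column, message) tuples for html."""
    parser = PositionChecker()
    parser.feed(html)
    parser.close()
    return parser.errors
```

Calling `check(source)` on a template yields tuples that can be formatted into messages such as "line 2, column 3: <input> has no name attribute".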
I've played around just a bit with ElementTree, but I only felt so-so
about it. I felt like it was pretty correct, but not very humane --
maybe that'd be good enough if I was processing big XML documents, but
it doesn't work for HTML templating.
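One way to evaluate any candidate library against the list above is a simple round-trip test: parse a snippet, serialize it without changes, and compare. The sketch below does this with the standard library's `xml.etree.ElementTree`; the `round_trip` helper is hypothetical, and the example is only meant to show the kind of check involved, not to reconstruct what the so-so impression above was based on.

```python
import xml.etree.ElementTree as ET


def round_trip(source):
    """Parse well-formed markup and serialize it again, unmodified."""
    root = ET.fromstring(source)
    return ET.tostring(root, encoding="unicode")


source = "<div><!-- keep me --><p>hi</p></div>"
result = round_trip(source)

# With the default parser settings the comment does not survive the trip,
# so the "doesn't lose comments" requirement fails out of the box.
comment_preserved = "<!--" in result
```

The same harness can be pointed at other parsers by swapping the body of `round_trip`, and extended with snippets that exercise whitespace and attribute order.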
--
Ian Bicking / ianb at colorstudy.com / http://blog.ianbicking.org