[Web-SIG] DOM-based templating

Fri Jun 3 09:45:13 CEST 2005

On Jun 2, 2005, at 11:18 PM, Ian Bicking wrote:

> While we're on the topic of DOM-based templating...
>
> FormEncode has a module htmlfill
> (http://formencode.org/docs/htmlfill.html), which is basically like
> DOM-based templating that just knows about HTML forms.  But currently
> it doesn't use a DOM, it uses an HTMLParser subclass.  This makes it
> much more complex than it would otherwise be, and misses out on some
> potential performance gains -- many times the input to htmlfill will
> be output from a template or HTML generator, and so often the DOM from
> the template is serialized to text, then parsed again.
>
> I had thought about moving this to a DOM or DOM-ish thing of some
> sort, but I don't know which one.  Unfortunately many of the options
> are not very humane -- that is, they are "correct", but not
> user-friendly.  Here's what I'd like, and maybe someone can suggest
> something (I won't claim HTMLParser is that humane either; but I'm
> looking to improve this).  Here's what I'd like:
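To make the "fill forms on a tree instead of re-parsing text" idea concrete, here is a minimal sketch. It is not htmlfill's API: `fill_form` is a hypothetical name, and it uses the standard library's `xml.etree.ElementTree`, so it only accepts well-formed XHTML input. The point is that defaults are written straight into the already-parsed nodes, with no serialize-then-parse round trip.

```python
import xml.etree.ElementTree as ET


def fill_form(root, defaults):
    """Hypothetical htmlfill-style pass over an already-parsed tree.

    Sets value= on <input> and the text of <textarea> for any field
    named in `defaults`; nothing is serialized or re-parsed.
    """
    for field in root.iter("input"):
        name = field.get("name")
        if name in defaults:
            field.set("value", str(defaults[name]))
    for area in root.iter("textarea"):
        name = area.get("name")
        if name in defaults:
            area.text = str(defaults[name])
    return root


# Usage: the tree could just as well come straight from a template engine.
form = ET.fromstring(
    '<form><input name="q" type="text"/><textarea name="note"/></form>')
fill_form(form, {"q": "stan", "note": "hello"})
print(ET.tostring(form, encoding="unicode"))
```

A real version would also need to handle checkboxes, radio buttons and `<select>` options.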

We've talked about this slightly before, but I think now more than  
ever stan can be that DOM. I don't think it would be too much work;  
it would mostly require removing assumptions that other nevow modules  
are available. I think stan could be broken out of nevow and into a  
standalone thing by pulling these modules:

nevow.stan
nevow.tags
nevow.loaders
nevow.context

And the package:

nevow.flat

I'm willing to do the work, and I'm willing to remove assumptions it  
makes and refactor things until they are clean. The module which  
would require the most work is nevow.context -- an internal rendering  
implementation detail that Nevow makes explicit but I would want to  
hide from non-Nevow users of stan. nevow.context was the first module  
of nevow to be written, and has a bunch of crufty bad decisions that  
haven't yet been refactored out of existence. But I'd like to do it,  
and this would give me an excuse to.

> * Can parse HTML, not just XHTML.  Not the crazy HTML browsers parse,
> but unambiguous well-formed HTML.  I don't like the idea of putting
> the HTML through tidy; that's fine for a screen-scraper, but is way
> too defensive for this kind of thing.

nevow.loaders.htmlfile does a good job of parsing normal HTML.
nevow.loaders.xmlfile parses strict XHTML and allows more tag tricks,  
but I think casual users won't notice the difference, especially for  
the purpose you desire.
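For illustration only (this is not how nevow.loaders.htmlfile is implemented), the main extra rule for parsing "unambiguous well-formed HTML" rather than XHTML is that void elements such as `<br>` and `<input>` never get an end tag, so they must not open a scope. A minimal sketch on top of the standard library's `html.parser`, with hypothetical `Node`, `TreeBuilder` and `parse_html` names, that also keeps comments, whitespace and attribute order:

```python
from html.parser import HTMLParser

# Void elements never take an end tag in HTML syntax.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}


class Node:
    """Hypothetical minimal element: tag name, ordered attrs, children."""

    def __init__(self, tag, attrs):
        self.tag = tag
        self.attrs = list(attrs)  # (name, value) pairs in source order
        self.children = []        # str, ("comment", text) or Node


class TreeBuilder(HTMLParser):
    """Build a tree from unambiguous HTML; unclosed void tags are leaves."""

    def __init__(self):
        super().__init__()
        self.root = Node(None, [])
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID:
            self.stack.append(node)  # only non-void elements open a scope

    def handle_startendtag(self, tag, attrs):
        # XHTML-style <br/> is accepted as well and never opens a scope.
        self.stack[-1].children.append(Node(tag, attrs))

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        line, _ = self.getpos()
        # The root's tag is None, so a stray end tag also fails here.
        if self.stack[-1].tag != tag:
            raise ValueError("unexpected </%s> on line %d" % (tag, line))
        self.stack.pop()

    def handle_data(self, data):
        self.stack[-1].children.append(data)  # whitespace is kept as-is

    def handle_comment(self, data):
        self.stack[-1].children.append(("comment", data))


def parse_html(text):
    builder = TreeBuilder()
    builder.feed(text)
    builder.close()
    if len(builder.stack) != 1:
        raise ValueError("unclosed <%s>" % builder.stack[-1].tag)
    return builder.root
```

Rejecting mismatched end tags, instead of guessing like a browser, is what keeps this on the "unambiguous HTML" side of the line.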

> * Can generate HTML.  This is probably easy to tack onto most systems,
> even if it isn't present now -- it's just a couple rules about how to
> serialize tags.

HTML rather than XHTML? I'm curious what the motivation for this is,  
and if you know what the couple of rules would be. I think it  
wouldn't be too hard.

Hmm, I guess the motivation for the previous point is the next point?
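As a guess at what "a couple rules" might look like in practice, here is a small serializer over a hypothetical `(tag, attrs, children)` tuple tree (not stan's flattener). Rule 1: void elements are written without an end tag, and only XHTML mode adds the ` /`. Rule 2: every other element always gets an explicit end tag, even when empty.

```python
from html import escape

VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}


def serialize(node, xhtml=False):
    """Serialize a hypothetical (tag, attrs, children) tuple tree.

    Text nodes are plain str and get escaped; raw-text elements such as
    <script> would need special handling that this sketch omits.
    """
    if isinstance(node, str):
        return escape(node, quote=False)
    tag, attrs, children = node
    attr_text = "".join(' %s="%s"' % (name, escape(value, quote=True))
                        for name, value in attrs)
    if tag in VOID:
        # Rule 1: void elements get no end tag; only XHTML adds " /".
        return "<%s%s%s>" % (tag, attr_text, " /" if xhtml else "")
    # Rule 2: every other element gets an explicit end tag even when
    # empty, since HTML parsers do not treat <div/> or <script/> as closed.
    inner = "".join(serialize(child, xhtml) for child in children)
    return "<%s%s>%s</%s>" % (tag, attr_text, inner, tag)
```

Boolean attributes (`<option selected>`) are another candidate rule; this sketch always writes `name="value"`.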

> * Doesn't modify the output at all for areas where no transformations
> occurred.  It doesn't wipe out whitespace.  It *definitely* doesn't
> lose comments.  It keeps attribute order.  When nodes are modified it's
> sometimes ambiguous how that affects the output, so if attribute order
> is lost there it's not that big a deal.

stan is whitespace in, whitespace out. It keeps comments. It uses a  
dict for attributes, but this could be changed easily. nevow.url uses  
a list of tuples, because order is actually important there. This  
means it needs to have a different API; it has .add() as well  
as .replace(). Add adds a new key value pair, even if the key is  
already present; replace finds any existing keys and puts a new value
in its place, preserving the original order.
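The add/replace semantics described above can be sketched as a list of pairs. `OrderedAttrs` is a hypothetical standalone class, not nevow.url's code. Two assumptions go beyond the description: replace() keeps the first matching slot and drops later duplicates, and it appends when the key is absent.

```python
class OrderedAttrs:
    """Hypothetical ordered attribute store backed by (key, value) pairs."""

    def __init__(self, pairs=()):
        self.pairs = list(pairs)

    def add(self, key, value):
        # Always appends, even if `key` is already present.
        self.pairs.append((key, value))

    def replace(self, key, value):
        # Put the new value where the first existing `key` was, drop any
        # later duplicates, and leave every other pair in its original order.
        out, placed = [], False
        for k, v in self.pairs:
            if k != key:
                out.append((k, v))
            elif not placed:
                out.append((key, value))
                placed = True
        if not placed:
            out.append((key, value))
        self.pairs = out

    def get(self, key, default=None):
        for k, v in self.pairs:
            if k == key:
                return v
        return default
```

A list of pairs makes lookups linear, but elements rarely carry more than a handful of attributes, so keeping source order is cheap.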

> * Can output nicely-formatted code.  Probably easy to add, but nice if
> it's already there.  This is, of course, entirely contrary to the
> previous item ;)  When generating nodes *purely* from Python, systems
> tend to produce HTML/XML with no extra whitespace at all, and
> completely unreadable.

This is really, really, really a bad idea. While browsers claim to be  
whitespace agnostic, they make a huge rendering distinction between  
"no whitespace present" and "any whitespace present". Nevow preserves  
any whitespace that was originally in your template, but when  
generating tags from Python it can't, so it doesn't.

That said, it is something I have considered writing before. Woven  
had it. I found it to be more trouble than it is worth. I think it  
should be added, but you should have to go out of your way to turn it  
on, and it should be off by default.
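Here is a minimal sketch of "off by default, opt-in" pretty-printing over a hypothetical `(tag, children)` tree. The indented output puts whitespace inside every element, including the `<li>` text, which is exactly the rendering hazard described above and the reason the default emits nothing extra.

```python
def to_markup(node, indent=None, _depth=0):
    """Serialize a hypothetical (tag, children) tree.

    indent=None (the default) adds no whitespace at all; an integer turns
    on pretty-printing with that many spaces per nesting level.
    """
    if isinstance(node, str):
        return node
    tag, children = node
    inner = [to_markup(child, indent, _depth + 1) for child in children]
    if indent is None:
        return "<%s>%s</%s>" % (tag, "".join(inner), tag)
    pad = " " * (indent * _depth)
    child_pad = " " * (indent * (_depth + 1))
    body = "".join("\n" + child_pad + text for text in inner)
    return "<%s>%s\n%s</%s>" % (tag, body, pad, tag)


tree = ("ul", [("li", ["a"]), ("li", ["b"])])
print(to_markup(tree))            # compact: what the tree actually holds
print(to_markup(tree, indent=2))  # readable, but adds whitespace nodes
```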

> * Keeps around enough information to produce good error messages.  It
> needs to be possible to figure out the line and maybe column where a
> node was originally defined.  If we're supporting multiple
> transformations by multiple systems, then this information needs to
> persist through the transformations.  I think this is a really
> important and undervalued feature; anyone can write a templating
> system with crappy error messages (and lots and lots of people do).
> Good error messages set a templating system apart.

A great idea. It would be trivial to add file/line/column information  
and populate it differently in each of the loaders. I love it, I'm  
going to go do it right now.
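The standard library's `HTMLParser.getpos()` already reports the current line (1-based) and column offset (0-based), so a loader can stamp each node as it is created. This is a sketch with hypothetical `Located`, `LocatingParser` and `where` names, not nevow's loader. It relies on the parser reporting the position of a tag's opening `<` while that tag is being handled. Transformations that copy or rebuild nodes would also have to carry these fields along, which is not shown.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class Located:
    """Hypothetical node record carrying where it came from."""
    tag: str
    attrs: list
    source: str
    line: int    # 1-based, as reported by HTMLParser.getpos()
    column: int  # 0-based offset of the '<' within that line


class LocatingParser(HTMLParser):
    def __init__(self, source):
        super().__init__()
        self.source = source
        self.nodes = []

    def handle_starttag(self, tag, attrs):
        # getpos() still points at the start of this tag while it is handled.
        line, column = self.getpos()
        self.nodes.append(Located(tag, attrs, self.source, line, column))


def where(node):
    """Format a location as file:line:column (1-based column for editors)."""
    return "%s:%d:%d" % (node.source, node.line, node.column + 1)


parser = LocatingParser("form.html")
parser.feed('<form>\n  <input name="q">\n</form>')
parser.close()
print(where(parser.nodes[1]))
```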

> * Reasonably fast.

Nevow was designed as an optimization of woven, and as a result is  
pretty fast. It has a two-pass system where one pass is taken when  
the template is initially loaded (once per template per process,  
assuming the template doesn't change on disk) and non-important nodes  
are optimized out of what actually happens at render time.
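The two-pass idea can be illustrated in a few lines. This is a generic sketch, not nevow's flattener. It assumes a hypothetical `(tag, children)` tree in which only `Slot` placeholders are dynamic. Pass one runs once at load time and merges all static markup into plain strings. Pass two runs per request and only touches the slots.

```python
from html import escape


class Slot:
    """Hypothetical placeholder for a value filled in at render time."""

    def __init__(self, name):
        self.name = name


def _emit(out, text):
    # Merge adjacent static text so render time sees one string per run.
    if out and isinstance(out[-1], str):
        out[-1] += text
    else:
        out.append(text)


def precompile(node, out=None):
    """Pass 1 (load time): flatten the tree into static strings and Slots.

    Static text is assumed to be already-escaped template source.
    """
    if out is None:
        out = []
    if isinstance(node, Slot):
        out.append(node)
    elif isinstance(node, str):
        _emit(out, node)
    else:
        tag, children = node
        _emit(out, "<%s>" % tag)
        for child in children:
            precompile(child, out)
        _emit(out, "</%s>" % tag)
    return out


def render(compiled, data):
    """Pass 2 (render time): copy static runs, escape and fill the slots."""
    return "".join(part if isinstance(part, str)
                   else escape(str(data[part.name]))
                   for part in compiled)


compiled = precompile(("div", [("h1", ["Title"]), ("p", [Slot("body")])]))
print(render(compiled, {"body": "a & b"}))
```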

There's also a bunch of low hanging optimization work in  
nevow.context. When I originally wrote it, I was worried about people  
mutating things so I made lots of copies. In the meantime, it turns  
out that the "correct" style of using it is to not mutate things but  
be somewhat functional and side effect free. Since mutating is still  
nice for some things, the objects which get mutated get copied before  
you get called to mutate them. But far more copying currently
happens than is necessary.

Yet another thing I have been meaning to do but haven't gotten around
to; this might encourage me to do it.
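One common way to stay side-effect free without copying is to chain small immutable frames: each new scope holds only its own bindings plus a pointer to its parent, and lookups walk up the chain. The sketch below is generic, and its `Context`, `with_` and `lookup` names are hypothetical. It is not nevow.context's actual code.

```python
class Context:
    """Hypothetical immutable rendering context built as a chain of frames."""

    def __init__(self, bindings=None, parent=None):
        self._bindings = dict(bindings or {})
        self._parent = parent

    def with_(self, **bindings):
        # Costs O(len(bindings)); ancestors are shared, never copied.
        return Context(bindings, self)

    def lookup(self, key):
        ctx = self
        while ctx is not None:
            if key in ctx._bindings:
                return ctx._bindings[key]
            ctx = ctx._parent
        raise KeyError(key)


root = Context({"user": "dp", "title": "Home"})
child = root.with_(title="Edit")
print(child.lookup("title"), root.lookup("title"))
```

The trade-off is that a lookup costs one step per nesting level, but template nesting is usually shallow.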

> I've played around just a bit with ElementTree, but I only felt so-so
> about it.  I felt like it was pretty correct, but not very humane --
> maybe that'd be good enough if I was processing big XML documents, but
> it doesn't work for HTML templating.

Agreed.

dp
