[Web-SIG] DOM-based templating

Fri Jun 3 17:33:39 CEST 2005

Donovan Preston wrote:
> We've talked about this slightly before, but I think now more than ever 
> stan can be that DOM. I don't think it would be too much work; it would 
> mostly require removing assumptions that other nevow modules are 
> available. I think stan could be broken out of nevow and into a 
> standalone thing by pulling these modules:
> 
> nevow.stan
> nevow.tags
> nevow.loaders
> nevow.context
> 
> And the package:
> 
> nevow.flat
> 
> I'm willing to do the work, and I'm willing to remove assumptions it 
> makes and refactor things until they are clean. The module which would 
> require the most work is nevow.context -- an internal rendering 
> implementation detail that Nevow makes explicit but I would want to hide 
> from non-Nevow users of stan. nevow.context was the first module of 
> nevow to be written, and has a bunch of crufty bad decisions that 
> haven't yet been refactored out of existence. But I'd like to do it, and 
> this would give me an excuse to.

I'd be very interested in that.  The DOM is just annoying, and HTMLParser 
and other SAX-ish parsers are way too tedious and difficult.

>> * Can parse HTML, not just XHTML.  Not the crazy HTML browsers parse, 
>> but unambiguous well-formed HTML.  I don't like the idea of putting the 
>> HTML through tidy; that's fine for a screen-scraper, but is way too 
>> defensive for this kind of thing.
> 
> 
> nevow.loaders.htmlfile does a good job of parsing normal html. 
> nevow.loaders.xmlfile parses strict XHTML and allows more tag tricks, 
> but I think casual users won't notice the difference, especially for the 
> purpose you desire.
> 
>> * Can generate HTML.  This is probably easy to tack onto most systems, 
>> even if it isn't present now -- it's just a couple rules about how to 
>> serialize tags.
> 
> 
> HTML rather than XHTML? I'm curious what the motivation for this is, and 
> if you know what the couple of rules would be. I think it wouldn't be 
> too hard.

Well, there are a couple of rules, I guess.  First, you don't use the 
XML-style empty tags (<br />), which I suspect will confuse browsers in some 
cases.  Then there's the list of contentless (EMPTY) elements in the HTML 
spec, like <br> and <img>; those never get an end tag, so they need their 
own rules.
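
Those two rules are small enough to sketch.  This is a minimal sketch in 
modern Python 3, assuming a hypothetical (tag, attrs, children) tuple tree 
rather than stan's actual node classes; VOID_ELEMENTS is the HTML 4.01 list 
of elements declared EMPTY, and serialize_html is a made-up name.

```python
from html import escape

# Elements declared EMPTY in HTML 4.01: they never get an end tag.
VOID_ELEMENTS = frozenset([
    "area", "base", "basefont", "br", "col", "frame", "hr",
    "img", "input", "isindex", "link", "meta", "param",
])


def serialize_html(node):
    """Serialize a hypothetical (tag, attrs, children) tree as HTML.

    Strings are text and get escaped; attributes are written in the
    iteration order of the attrs mapping.
    """
    if isinstance(node, str):
        return escape(node, quote=False)
    tag, attrs, children = node
    attr_text = "".join(
        ' %s="%s"' % (name, escape(value, quote=True))
        for name, value in attrs.items()
    )
    if tag in VOID_ELEMENTS:
        # Rule 1: no XML-style "<br />".  Rule 2: no end tag at all.
        return "<%s%s>" % (tag, attr_text)
    inner = "".join(serialize_html(child) for child in children)
    # Every other element gets an explicit end tag, even when empty,
    # so an empty <script> never collapses to "<script/>".
    return "<%s%s>%s</%s>" % (tag, attr_text, inner, tag)


doc = ("p", {"class": "note"},
       ["a & b", ("br", {}, []), ("img", {"src": "x.png", "alt": ""}, [])])
print(serialize_html(doc))
# <p class="note">a &amp; b<br><img src="x.png" alt=""></p>
```

The XHTML variant would differ only in the void branch (emitting "<br />"), 
which is why it seems easy to tack onto an existing serializer.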

I know Ryan Tomayko added this to Kid, which I believe uses ElementTree 
as its backend.  He might be interested in Stan too, since all these 
features would be useful to him as well.

> Hmm, I guess the motivation for the previous point is the next point?
> 
>> * Doesn't modify the output at all for areas where no transformations 
>> occurred.  It doesn't wipe out whitespace.  It *definitely* doesn't lose 
>> comments.  It keeps attribute order.  When nodes are modified it's 
>> sometimes ambiguous how that affects the output, so if attribute order 
>> is lost there it's not that big a deal.
> 
> 
> stan is whitespace in, whitespace out. It keeps comments. It uses a dict 
> for attributes, but this could be changed easily. nevow.url uses a list 
> of tuples, because order is actually important there. This means it 
> needs to have a different API; it has .add() as well as .replace(). Add 
> adds a new key value pair, even if the key is already present; replace 
> finds any existing keys and puts a new value in its place, preserving 
> the original order.
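
Not nevow.url's actual code, but the add()/replace() semantics described 
above can be sketched with a hypothetical OrderedPairs class backed by a 
list of (key, value) tuples.  How replace() treats keys that appear more 
than once is an assumption here: the first occurrence keeps its slot and 
later duplicates are dropped.

```python
class OrderedPairs:
    """Hypothetical ordered multi-mapping stored as (key, value) tuples."""

    def __init__(self, pairs=()):
        self.pairs = list(pairs)

    def add(self, key, value):
        # Always append a new pair, even if the key is already present.
        self.pairs.append((key, value))

    def replace(self, key, value):
        # Put the new value where the first existing key was (keeping the
        # original order), drop later duplicates (assumed behaviour), and
        # append the pair if the key was not present at all.
        result, found = [], False
        for k, v in self.pairs:
            if k != key:
                result.append((k, v))
            elif not found:
                result.append((key, value))
                found = True
        if not found:
            result.append((key, value))
        self.pairs = result


q = OrderedPairs([("a", "1"), ("b", "2")])
q.add("a", "3")
print(q.pairs)  # [('a', '1'), ('b', '2'), ('a', '3')]
q.replace("a", "9")
print(q.pairs)  # [('a', '9'), ('b', '2')]
```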

Maybe just an ordered dictionary would work -- avoid any backward 
compatibility problems as well.  For instance, here's an implementation: 
http://www.voidspace.org.uk/python/recipebook.shtml#odict and 
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/107747
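
The order is available from the parser in the first place: in Python 3, 
html.parser.HTMLParser (the successor of the HTMLParser module) passes 
attributes to handle_starttag as a list of (name, value) pairs in source 
order, so dropping them into collections.OrderedDict keeps that order all 
the way to output.  A minimal sketch; AttrOrderRecorder is a hypothetical 
name.

```python
from collections import OrderedDict
from html.parser import HTMLParser


class AttrOrderRecorder(HTMLParser):
    """Hypothetical parser that records each start tag's attributes in order."""

    def __init__(self):
        super().__init__()
        self.elements = []  # list of (tag, OrderedDict of attributes)

    def handle_starttag(self, tag, attrs):
        # attrs is [(name, value), ...] in the order written in the source;
        # OrderedDict preserves it (plain dicts also do on Python 3.7+).
        self.elements.append((tag, OrderedDict(attrs)))


parser = AttrOrderRecorder()
parser.feed('<a title="t" href="/x" class="c">link</a>')
parser.close()
tag, attrs = parser.elements[0]
print(tag, list(attrs))  # a ['title', 'href', 'class']
```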

One thing this would allow -- while I understand why ZPT has a fixed 
order of attribute evaluation that ignores the textual order, I must 
admit I wish it were not so.  Even now, you probably can't add an option 
to warn when they show up out of order if the parser loses the attribute order 
(though I have no idea what ZPT's internal representation is -- some ad 
hoc DOM I'd guess).  I'm sure other systems have similar issues.

>> * Can output nicely-formatted code.  Probably easy to add, but nice if 
>> it's already there.  This is, of course, entirely contrary to the 
>> previous item ;)  When generating nodes *purely* from Python, systems 
>> tend to produce HTML/XML with no extra whitespace at all, which is 
>> completely unreadable.
> 
> 
> This is really, really, really a bad idea. While browsers claim to be 
> whitespace agnostic, they make a huge rendering distinction between "no 
> whitespace present" and "any whitespace present". Nevow preserves any 
> whitespace that was originally in your template, but when generating 
> tags from Python it can't, so it doesn't.
> 
> That said, it is something I have considered writing before. Woven had 
> it. I found it to be more trouble than it is worth. I think it should be 
> added, but you should have to go out of your way to turn it on, and it 
> should be off by default.

I think you can generally safely add whitespace to block-level tags.  So 
if the indenter is HTML-aware it should be okay. I seem to remember 
problems with whitespace in <td> elements (which I think count as block 
level, but maybe don't), but I don't think I've noticed that since the 
NS4 days.
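
An HTML-aware indenter along those lines might look like this minimal 
sketch, again assuming a hypothetical (tag, attrs, children) tree.  It only 
adds a newline and indentation before block-level start tags and before the 
end tag of an element that contains blocks, so inline runs like 
<b>a</b><i>b</i> never gain whitespace.  BLOCK_TAGS is an illustrative 
subset, <td> is left out on purpose given the doubts above, and void 
elements (<br> etc.) are not handled.

```python
from html import escape

# Illustrative subset of block-level elements; <td> deliberately excluded.
BLOCK_TAGS = frozenset(["html", "head", "body", "div", "p", "ul", "ol",
                        "li", "table", "tr", "h1", "h2", "h3", "form"])


def is_block(node):
    return not isinstance(node, str) and node[0] in BLOCK_TAGS


def indent_html(node, depth=0, step="  "):
    """Serialize a hypothetical (tag, attrs, children) tree, adding
    whitespace only next to block-level tags."""
    if isinstance(node, str):
        return escape(node, quote=False)
    tag, attrs, children = node
    attr_text = "".join(' %s="%s"' % (k, escape(v)) for k, v in attrs.items())
    parts = []
    for child in children:
        if is_block(child):
            # Whitespace before a block-level start tag doesn't change rendering.
            parts.append("\n" + step * (depth + 1))
        parts.append(indent_html(child, depth + 1, step))
    if tag in BLOCK_TAGS and any(is_block(c) for c in children):
        # The end tag gets its own line only when the element holds blocks.
        parts.append("\n" + step * depth)
    return "<%s%s>%s</%s>" % (tag, attr_text, "".join(parts), tag)


doc = ("div", {}, [
    ("p", {}, ["Hello, ", ("b", {}, ["world"]), "!"]),
    ("ul", {}, [("li", {}, ["one"]), ("li", {}, ["two"])]),
])
print(indent_html(doc))
# <div>
#   <p>Hello, <b>world</b>!</p>
#   <ul>
#     <li>one</li>
#     <li>two</li>
#   </ul>
# </div>
```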

Another option is some flavor of markup generation where newlines get 
appended automatically.  Just the occasional newline would be helpful.
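
For markup built purely in Python, the occasional newline could be as 
simple as emitting one after each block-level end tag.  The sketch below 
uses a hypothetical, much-simplified stan-like Tag class (call it for 
attributes, index it for children, as in div(class_="x")[...]); it is not 
nevow's actual stan API, and NEWLINE_AFTER is an assumed tag list.

```python
from html import escape

# Hypothetical set of tags whose end tag is followed by a newline.
NEWLINE_AFTER = frozenset(["html", "head", "body", "div", "p", "ul", "ol",
                           "li", "table", "tr", "h1", "h2", "h3"])


class Tag:
    """Much-simplified, hypothetical stan-like tag."""

    def __init__(self, name, attrs=None, children=()):
        self.name = name
        self.attrs = dict(attrs or {})
        self.children = list(children)

    def __call__(self, **attrs):
        # A trailing underscore allows Python keywords: class_ -> class.
        new_attrs = dict(self.attrs)
        new_attrs.update((k.rstrip("_"), v) for k, v in attrs.items())
        return Tag(self.name, new_attrs, self.children)

    def __getitem__(self, children):
        if not isinstance(children, tuple):
            children = (children,)
        return Tag(self.name, self.attrs, self.children + list(children))

    def render(self):
        attr_text = "".join(' %s="%s"' % (k, escape(str(v)))
                            for k, v in self.attrs.items())
        inner = "".join(c.render() if isinstance(c, Tag)
                        else escape(str(c), quote=False)
                        for c in self.children)
        markup = "<%s%s>%s</%s>" % (self.name, attr_text, inner, self.name)
        # The "occasional newline": only after block-level end tags.
        return markup + "\n" if self.name in NEWLINE_AFTER else markup


ul, li, b = Tag("ul"), Tag("li"), Tag("b")
page = ul(class_="menu")[li["one"], li[b["two"]]]
print(page.render())
# <ul class="menu"><li>one</li>
# <li><b>two</b></li>
# </ul>
```

This keeps generated output diffable and readable without any indentation 
logic, at the cost of never adding whitespace inside inline content.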

OTOH, this might all be better resolved with a Firefox extension or 
bookmarklet or somesuch, that may or may not already exist.

-- 
Ian Bicking  /  ianb at colorstudy.com  /  http://blog.ianbicking.org