a 100-line indentation-based preprocessor for HTML
David Williams
david at bibliolabs.com
Sat Nov 28 00:56:37 EST 2009
You might want to take a look at this:
http://www.ghrml.org/
David
> Python has this really neat idea called indentation-based syntax, and
> there are folks that have caught on to this idea in the HTML
> community.
>
> AFAIK the most popular indentation-based solution for generating HTML
> is a tool called HAML, which actually is written in Ruby.
>
> I have been poking around with the HAML concepts in Python, with the
> specific goal of integrating with Django. But before releasing that,
> I thought it would be useful to post code that distills the basic
> concept with no assumptions about your target renderer. I hope it
> also serves as a good example of what you can do in exactly 100 lines
> of Python code.
>
> Here is what it does...
>
> You can use indentation syntax for HTML tags like table.
>
> From this...
>
>     table
>         tr
>             td
>                 Left
>             td
>                 Center
>             td
>                 Right
>
> ...you get this:
>
>     <table>
>         <tr>
>             <td>
>                 Left
>             </td>
>             <td>
>                 Center
>             </td>
>             <td>
>                 Right
>             </td>
>         </tr>
>     </table>
>
> Lists and divs work the same way, and note that attributes are not
> a problem.
>
> From this...
>
>     div class="spinnable"
>         ul
>             li id="item1"
>                 One
>             li id="item2"
>                 Two
>
> ...you get this:
>
>     <div class="spinnable">
>         <ul>
>             <li id="item1">
>                 One
>             </li>
>             <li id="item2">
>                 Two
>             </li>
>         </ul>
>     </div>
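
A minimal sketch of why attributes are not a problem: the whole line
becomes the opening tag, and only its first word is reused for the
closing tag. The helper name make_tags is hypothetical; the two format
expressions are the ones html_block_tag() uses in the code quoted
further down.

```python
def make_tags(line):
    # Hypothetical helper, not part of the posted code.
    # The opening tag keeps everything on the line, attributes included.
    start_tag = '<%s>' % line
    # The closing tag needs only the tag name, i.e. the first word.
    end_tag = '</%s>' % line.split()[0]
    return start_tag, end_tag


print(make_tags('li id="item1"'))
```

The same split works for a bare tag such as ul, because split()[0] of
a one-word line is the word itself.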
>
> You can still use raw HTML tags where appropriate (such as when
> converting legacy markup to the new style).
>
> From this...
>
>     <table>
>         tr
>             td
>                 <b>Hello World!</b>
>     </table>
>
> ...you get this:
>
>     <table>
>         <tr>
>             <td>
>                 <b>Hello World!</b>
>             </td>
>         </tr>
>     </table>
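
The pass-through rule can be sketched on its own. In the code quoted
further down, convert_text() treats a line as a block tag only when it
is a branch (it has more deeply indented lines under it) and contains
no '<'. The helper name is_block_tag is hypothetical; the condition is
copied from convert_text().

```python
def is_block_tag(kind, line):
    # Hypothetical helper mirroring the test inside convert_text().
    # Only branches without a '<' are expanded into open/close tags.
    return kind == 'branch' and '<' not in line


# 'tr' has children and no '<', so it is expanded.
print(is_block_tag('branch', 'tr'))
# A raw '<table>' has children but contains '<', so it passes through.
print(is_block_tag('branch', '<table>'))
# Leaves such as '<b>Hello World!</b>' are always passed through.
print(is_block_tag('leaf', '<b>Hello World!</b>'))
```

One consequence: a raw opening tag is never pushed onto the indenter's
stack, so no closing tag is generated for it. That is why the input
above spells out </table> by hand.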
>
> And here is the code:
>
> import re
>
> def convert_text(in_body):
>     '''
>     Convert HAML-like markup to HTML.  Allow raw HTML to
>     fall through.
>     '''
>     indenter = Indenter()
>     for prefix, line, kind in get_lines(in_body):
>         if kind == 'branch' and '<' not in line:
>             html_block_tag(prefix, line, indenter)
>         else:
>             indenter.add(prefix, line)
>     return indenter.body()
>
>
> def html_block_tag(prefix, line, indenter):
>     '''
>     Block tags have syntax like this and only
>     apply to branches in indentation:
>
>     table
>         tr
>             td class="foo"
>                 leaf #1
>             td
>                 leaf #2
>     '''
>     start_tag = '<%s>' % line
>     end_tag = '</%s>' % line.split()[0]
>     indenter.push(prefix, start_tag, end_tag)
>
>
> class Indenter:
>     '''
>     Example usage:
>
>         indenter = Indenter()
>         indenter.push('', 'Start', 'End')
>         indenter.push('    ', 'Foo', '/Foo')
>         indenter.add ('        ', 'bar')
>         indenter.add ('    ', 'yo')
>         print(indenter.body())
>     '''
>     def __init__(self):
>         self.stack = []
>         self.lines = []
>
>     def push(self, prefix, start, end):
>         self.add(prefix, start)
>         self.stack.append((prefix, end))
>
>     def add(self, prefix, line):
>         if line:
>             self.pop(prefix)
>         self.insert(prefix, line)
>
>     def insert(self, prefix, line):
>         self.lines.append(prefix+line)
>
>     def pop(self, prefix):
>         while self.stack:
>             start_prefix, end = self.stack[-1]
>             if len(prefix) <= len(start_prefix):
>                 whitespace_lines = []
>                 while self.lines and self.lines[-1] == '':
>                     whitespace_lines.append(self.lines.pop())
>                 self.insert(start_prefix, end)
>                 self.lines += whitespace_lines
>                 self.stack.pop()
>             else:
>                 return
>
>     def body(self):
>         self.pop('')
>         return '\n'.join(self.lines)
>
> def get_lines(in_body):
>     '''
>     Splits out lines from a file and identifies whether lines
>     are branches, leaves, or blanks.  The detection of branches
>     could probably be done in a more elegant way than patching
>     the last non-blank line, but it works.
>     '''
>     lines = []
>     last_line = -1
>     for line in in_body.split('\n'):
>         m = re.match(r'(\s*)(.*)', line)
>         prefix, line = m.groups()
>         if line:
>             line = line.rstrip()
>             if last_line >= 0:
>                 old_prefix, old_line, ignore = lines[last_line]
>                 if len(old_prefix) < len(prefix):
>                     lines[last_line] = (old_prefix, old_line,
>                         'branch')
>             last_line = len(lines)
>             lines.append((prefix, line, 'leaf')) # leaf for now
>         else:
>             lines.append(('', '', 'blank'))
>     return lines
>
> As I mention in the comment for get_lines(), I wonder if there are
> more elegant ways to deal with the indentation, both of the input and
> the output.
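
One possible answer for the input side, offered as a sketch rather
than a drop-in replacement: walk the lines in reverse, so each line
already knows the indentation of the next non-blank line below it and
nothing has to be patched afterwards. The name classify_lines is
hypothetical; it is meant to return the same (prefix, line, kind)
triples as get_lines(), and it assumes indentation is compared by
prefix length, as in the original.

```python
def classify_lines(in_body):
    """Return (prefix, text, kind) triples using look-ahead, not patching."""
    # First pass: split each line into leading whitespace and stripped text.
    rows = []
    for raw in in_body.split('\n'):
        text = raw.strip()
        prefix = raw[:len(raw) - len(raw.lstrip())] if text else ''
        rows.append((prefix, text))

    # Second pass, bottom to top. next_indent holds the prefix length of
    # the nearest non-blank line below the current one (-1 if none).
    result = []
    next_indent = -1
    for prefix, text in reversed(rows):
        if not text:
            result.append(('', '', 'blank'))
            continue
        # A line is a branch when the next non-blank line is indented deeper.
        kind = 'branch' if len(prefix) < next_indent else 'leaf'
        result.append((prefix, text, kind))
        next_indent = len(prefix)
    result.reverse()
    return result


for row in classify_lines('table\n    tr\n        td\n            Left'):
    print(row)
```

Blank lines are skipped when looking ahead, so a branch followed by an
empty line and then its children is still detected as a branch.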
>