a 100-line indentation-based preprocessor for HTML

David Williams david at bibliolabs.com
Sat Nov 28 00:56:37 EST 2009


You might want to take a look at this:

http://www.ghrml.org/

David

> Python has this really neat idea called indentation-based syntax, and
> there are folks that have caught on to this idea in the HTML
> community.
>
> AFAIK the most popular indentation-based solution for generating HTML
> is a tool called HAML, which actually is written in Ruby.
>
> I have been poking around with the HAML concepts in Python, with the
> specific goal of integrating with Django.  But before releasing that,
> I thought it would be useful to post code that distills the basic
> concept with no assumptions about your target renderer.  I hope it
> also serves as a good example of what you can do in exactly 100 lines
> of Python code.
>
> Here is what it does...
>
>     You can use indentation syntax for HTML tags like table.
>
>     From this...
>
>     table
>         tr
>             td
>                 Left
>             td
>                 Center
>             td
>                 Right
>
>     ...you get this:
>
>     <table>
>         <tr>
>             <td>
>                 Left
>             </td>
>             <td>
>                 Center
>             </td>
>             <td>
>                 Right
>             </td>
>         </tr>
>     </table>
>
>     Lists and divs work the same way, and note that attributes are not
>     a problem.
>
>     From this...
>
>     div class="spinnable"
>         ul
>             li id="item1"
>                One
>             li id="item2"
>                Two
>
>     ...you get this:
>
>     <div class="spinnable">
>         <ul>
>             <li id="item1">
>                One
>             </li>
>             <li id="item2">
>                Two
>             </li>
>         </ul>
>     </div>
>
>     You can still use raw HTML tags where appropriate (such as when
>     converting legacy markup to the new style).
>
>     From this...
>
>     <table>
>         tr
>             td
>                 <b>Hello World!</b>
>     </table>
>
>     ...you get this:
>
>     <table>
>         <tr>
>             <td>
>                 <b>Hello World!</b>
>             </td>
>         </tr>
>     </table>
>
> And here is the code:
>
>     import re
>
>     def convert_text(in_body):
>         '''
>         Convert HAML-like markup to HTML.  Allow raw HTML to
>         fall through.
>         '''
>         indenter = Indenter()
>         for prefix, line, kind in get_lines(in_body):
>             if kind == 'branch' and '<' not in line:
>                 html_block_tag(prefix, line, indenter)
>             else:
>                 indenter.add(prefix, line)
>         return indenter.body()
>
>
>     def html_block_tag(prefix, line, indenter):
>         '''
>         Block tags have syntax like this and only
>         apply to branches in indentation:
>
>         table
>             tr
>                 td class="foo"
>                     leaf #1
>                 td
>                     leaf #2
>         '''
>         start_tag = '<%s>' % line
>         end_tag = '</%s>' % line.split()[0]
>         indenter.push(prefix, start_tag, end_tag)
>
>
>     class Indenter:
>         '''
>         Example usage:
>
>         indenter = Indenter()
>         indenter.push('', 'Start', 'End')
>         indenter.push('    ', 'Foo', '/Foo')
>         indenter.add ('        ', 'bar')
>         indenter.add ('    ', 'yo')
>         print(indenter.body())
>         '''
>         def __init__(self):
>             self.stack = []
>             self.lines = []
>
>         def push(self, prefix, start, end):
>             self.add(prefix, start)
>             self.stack.append((prefix, end))
>
>         def add(self, prefix, line):
>             if line:
>                 self.pop(prefix)
>             self.insert(prefix, line)
>
>         def insert(self, prefix, line):
>             self.lines.append(prefix+line)
>
>         def pop(self, prefix):
>             while self.stack:
>                 start_prefix, end = self.stack[-1]
>                 if len(prefix) <= len(start_prefix):
>                     whitespace_lines = []
>                     while self.lines and self.lines[-1] == '':
>                         whitespace_lines.append(self.lines.pop())
>                     self.insert(start_prefix, end)
>                     self.lines += whitespace_lines
>                     self.stack.pop()
>                 else:
>                     return
>
>         def body(self):
>             self.pop('')
>             return '\n'.join(self.lines)
>
>     def get_lines(in_body):
>         '''
>         Splits out lines from a file and identifies whether lines
>         are branches, leaves, or blanks.  The detection of branches
>         could probably be done in a more elegant way than patching
>         the last non-blank line, but it works.
>         '''
>         lines = []
>         last_line = -1
>         for line in in_body.split('\n'):
>             m = re.match(r'(\s*)(.*)', line)
>             prefix, line = m.groups()
>             if line:
>                 line = line.rstrip()
>                 if last_line >= 0:
>                     old_prefix, old_line, ignore = lines[last_line]
>                     if len(old_prefix) < len(prefix):
>                         lines[last_line] = (old_prefix, old_line, 'branch')
>                 last_line = len(lines)
>                 lines.append((prefix, line, 'leaf')) # leaf for now
>             else:
>                 lines.append(('', '', 'blank'))
>         return lines
>
> As I mention in the comment for get_lines(), I wonder if there are
> more elegant ways to deal with the indentation, both of the input and
> the output.
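
On the question of a tidier get_lines(): one option is to decide branch
versus leaf by looking ahead to the next non-blank line, rather than going
back to patch the previous entry. The sketch below is not from the original
post. classify_lines is a hypothetical name, and it assumes the same
(prefix, line, kind) tuples that get_lines() produces, so that the loop in
convert_text() could iterate over it unchanged.

```python
import re


def classify_lines(in_body):
    '''
    Hypothetical alternative to get_lines(): yields (prefix, line, kind)
    tuples, where kind is 'branch', 'leaf', or 'blank'.  A line is a
    branch when the next non-blank line is indented more deeply.
    '''
    # Split every raw line into its leading whitespace and the remainder.
    rows = [re.match(r'(\s*)(.*)', raw).groups()
            for raw in in_body.split('\n')]

    # Map each non-blank row index to the index of the next non-blank row,
    # so that the look-ahead skips over blank lines.
    solid = [i for i, (_, content) in enumerate(rows) if content]
    next_solid = dict(zip(solid, solid[1:]))

    for i, (prefix, content) in enumerate(rows):
        if not content:
            yield ('', '', 'blank')
            continue
        j = next_solid.get(i)
        deeper = j is not None and len(rows[j][0]) > len(prefix)
        yield (prefix, content.rstrip(), 'branch' if deeper else 'leaf')


if __name__ == '__main__':
    sample = 'ul\n\n    li\n    li'
    for row in classify_lines(sample):
        print(row)
```

The trade-off is a second pass over the line indices in exchange for never
mutating an entry after it has been emitted, which also lets the function
be a generator.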





