Working with HTML5 documents

Thu Nov 20 16:15:59 EST 2014

On Thu, Nov 20, 2014 at 1:10 PM, Stefan Behnel <stefan_ml at behnel.de> wrote:
> Ian Kelly wrote on 20.11.2014 at 20:44:
>> On Thu, Nov 20, 2014 at 12:02 PM, Stefan Behnel wrote:
>>> There's also the E-factory for creating (sub-)trees in a nicely objectish way:
>>>
>>> http://lxml.de/lxmlhtml.html#creating-html-with-the-e-factory
>>
>> That looks ugly with all those caps and also hard to extend. Notably
>> it seems to be missing any functions to build HTML5 elements, unless
>> those have been added in lxml 3.4.
>
> It's actually trivial to extend, and it's designed for it. The factory
> simply uses "__getattr__()", so you can ask it for any tag name. The
> predefined names in the builder.py module are mainly there to make typos
> on the user side easy to detect.

That's not what I saw when I tested it as described in the documentation:

Python 2.7.6 (default, Mar 22 2014, 22:59:56)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from lxml.html import builder as E
>>> html = E.HTML(E.HEAD(), E.BODY())
>>> html = E.HTML(E.HEAD(), E.BODY(E.ARTICLE()))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'ARTICLE'

> https://github.com/lxml/lxml/blob/master/src/lxml/html/builder.py
>
> If you don't like capital names for constants, just copy the module and
> change the tag names to lower case, or use the blank E-factory if you feel
> like it.

Based on the source file that you linked, I can see that the following
works, although it is undocumented:

>>> from lxml.builder import ElementMaker
>>> import lxml.html
>>> E = ElementMaker(makeelement=lxml.html.html_parser.makeelement)
>>> html = E.html(E.head(), E.body(E.article()))
>>> lxml.html.tostring(html)
'<html><head></head><body><article></article></body></html>'
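
To illustrate why the two sessions above behave differently, here is a minimal sketch of the "__getattr__()" idea using only the standard library. TinyElementMaker is a hypothetical stand-in that I made up for illustration; it is not lxml's implementation and it ignores most of what ElementMaker supports. The point is only that a factory instance can answer for any tag name, whereas names bound at module level (like HTML and BODY here) are a fixed set, so a missing one raises AttributeError.

```python
import xml.etree.ElementTree as ET
from functools import partial


class TinyElementMaker(object):
    """Hypothetical stand-in for an E-factory (illustration only)."""

    def __call__(self, tag, *children, **attrib):
        elem = ET.Element(tag, attrib)
        for child in children:
            if isinstance(child, str):
                # Text before any child element goes into .text,
                # text after a child goes into that child's .tail.
                if len(elem):
                    last = elem[-1]
                    last.tail = (last.tail or "") + child
                else:
                    elem.text = (elem.text or "") + child
            else:
                elem.append(child)
        return elem

    def __getattr__(self, tag):
        # Only reached when normal attribute lookup fails, so every
        # unknown attribute name is treated as a tag name.
        return partial(self, tag)


E = TinyElementMaker()

# Predefined names, bound once, the way a builder-style module might do it.
HTML = E.html
BODY = E.body

# The factory instance itself accepts any tag, including HTML5 ones.
doc = HTML(BODY(E.article("hello")))
print(ET.tostring(doc, encoding="unicode"))
```

With a factory like this, adding an upper-case name for a new element is a one-line binding (for example ARTICLE = E.article), which is presumably what "trivial to extend" was meant to convey.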