lxml empty versus self closed tag

Robin Becker robin at reportlab.com
Thu Mar 3 04:21:42 EST 2022


On 02/03/2022 18:39, Dieter Maurer wrote:
> Robin Becker wrote at 2022-3-2 15:32 +0000:
>> I'm using lxml.etree.XMLParser and would like to distinguish
>>
>> <tag></tag>
>>
>> from
>>
>> <tag/>
>>
>> I seem to have e.getchildren()==[] and e.text==None for both cases. Is there a way to get the first to have e.text==''?
> 
> I do not think so (at least not without a DTD):

I have a DTD which has

<!ELEMENT tag (content)*>

so I guess the empty case is allowed as well as the self closed.

I am converting from an older parser which has text=='' for <tag></tag> and text==None for the self-closed version.
I don't think I really need to make the distinction. However, I wonder whether lxml can be made to present empty
string content deliberately, or whether that always has to be a semantic decision.
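
A minimal sketch of the behaviour in question, assuming lxml is installed. The variable names are only for illustration. It shows that both spellings parse to text==None, and that on the output side an empty string assigned to .text is serialised as an open/close pair rather than a self-closed tag.

```python
from lxml import etree

# Both spellings parse to the same tree: no children and text None.
empty = etree.fromstring("<tag></tag>")
self_closed = etree.fromstring("<tag/>")
parsed_texts = (empty.text, self_closed.text)
print(parsed_texts)  # (None, None)

# With text None the element is serialised as a self-closed tag,
# even though the input was written as <tag></tag>.
out_self_closed = etree.tostring(empty)
print(out_self_closed)  # b'<tag/>'

# Assigning an empty string is a deliberate choice made by the caller;
# the element is then serialised as an open/close pair.
empty.text = ""
out_open_close = etree.tostring(empty)
print(out_open_close)  # b'<tag></tag>'
```

So the empty string can be produced deliberately when writing, but the parser itself does not report which spelling was in the input.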

> `<tag/>' is just a shorthand notation for '<tag></tag>' and
> the difference has no influence on the DOM.
> 
> Note that `lxml` is just a Python binding for `libxml2`.
> All the parsing is done by this library.

Yes, I think I knew that.

