[New-bugs-announce] [issue17582] xml.etree.ElementTree does not preserve whitespaces in attributes

Daniele Varrazzo report at bugs.python.org
Sat Mar 30 17:26:34 CET 2013


New submission from Daniele Varrazzo:

XML defines the following chars as whitespace [1]::

    S ::= (#x20 | #x9 | #xD | #xA)+

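However, not all of these characters are escaped when they are written into attribute values. In the session below only the newline survives the round trip; the tab and carriage return are converted into spaces by attribute-value normalization [2]::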

    >>> data = '\x09\x0a\x0d\x20'
    >>> data
    '\t\n\r '

    >>> import xml.etree.ElementTree as ET
    >>> e = ET.Element('x', attr=data)
    >>> s = ET.tostring(e)
    >>> s
    '<x attr="\t&#10;\r " />'

    >>> e1 = ET.fromstring(s)
    >>> data1 = e1.attrib['attr']
    >>> data1 == data
    False

    >>> data1
    ' \n  '
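For comparison, here is a minimal check in Python 3 syntax that uses only the standard library. It shows the behaviour the XML specification prescribes. Whitespace written literally inside an attribute value is normalized to spaces, while the same characters written as numeric character references are kept as-is. The attribute strings are illustrative and not taken from the report:

```python
import xml.etree.ElementTree as ET

# Literal tab, LF and CR inside a quoted attribute value: the parser applies
# attribute-value normalization and replaces each of them with a space.
raw = ET.fromstring('<x attr="a\tb\nc\rd" />').attrib['attr']

# The same characters written as character references are appended to the
# normalized value unchanged.
escaped = ET.fromstring('<x attr="a&#9;b&#10;c&#13;d" />').attrib['attr']

print(repr(raw))      # 'a b c d'
print(repr(escaped))  # 'a\tb\nc\rd'
```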

cElementTree suffers from the same bug::

    >>> import xml.etree.cElementTree as cET
    >>> cET.fromstring(cET.tostring(cET.Element('a', attr=data))).attrib['attr']
    ' \n  '

but the external library lxml.etree does not::

    >>> import lxml.etree as LET
    >>> LET.fromstring(LET.tostring(LET.Element('a', attr=data))).attrib['attr']
    '\t\n\r '

The bug is analogous to #5752, but it affects a different and independent module. Proper escaping should be added to the _escape_attrib() function in /xml/etree/ElementTree.py, along with the equivalent code in cElementTree.
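The following sketch shows the kind of escaping being proposed. It uses a hypothetical standalone helper named escape_attrib_preserving_ws, which is not the stdlib's private _escape_attrib and not an official patch. The helper emits tab, newline and carriage return as numeric character references, so the round trip from the session above gives back the original value:

```python
import xml.etree.ElementTree as ET

# Hypothetical helper, for illustration only: escapes the markup characters
# plus the three whitespace characters that attribute-value normalization
# would otherwise turn into spaces.
_ATTR_ESCAPES = [
    ("&", "&amp;"),   # must come first, or the '&' of later references gets re-escaped
    ("<", "&lt;"),
    (">", "&gt;"),
    ('"', "&quot;"),
    ("\n", "&#10;"),
    ("\r", "&#13;"),
    ("\t", "&#9;"),
]

def escape_attrib_preserving_ws(text):
    """Return text escaped for use inside a double-quoted XML attribute."""
    for char, ref in _ATTR_ESCAPES:
        text = text.replace(char, ref)
    return text

data = '\x09\x0a\x0d\x20'
serialized = '<x attr="%s" />' % escape_attrib_preserving_ws(data)
roundtrip = ET.fromstring(serialized).attrib['attr']

print(serialized)         # <x attr="&#9;&#10;&#13; " />
print(roundtrip == data)  # True
```

The plain space (#x20) needs no escaping, because normalization maps it to itself.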

[1] http://www.w3.org/TR/REC-xml/#white
[2] http://www.w3.org/TR/REC-xml/#AVNormalize

----------
components: Library (Lib), XML
messages: 185574
nosy: piro
priority: normal
severity: normal
status: open
title: xml.etree.ElementTree does not preserve whitespaces in attributes
versions: Python 2.7, Python 3.2

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue17582>
_______________________________________

