[New-bugs-announce] [issue8583] Hardcoded namespace_separator in the cElementTree.XMLParser

Dmitry Chichkov report at bugs.python.org
Sat May 1 00:57:29 CEST 2010


New submission from Dmitry Chichkov <dchichkov at gmail.com>:

The namespace_separator parameter is hard-coded in the cElementTree.XMLParser class, so there is no way to ignore XML namespaces when using the cElementTree library.

Here's the code example:
 from xml.etree.cElementTree import iterparse
 from StringIO import StringIO
 xml = """<root xmlns="http://www.very_long_url.com"><child/></root>"""
 for event, elem in iterparse(StringIO(xml)): print event, elem

It produces:
 end <Element '{http://www.very_long_url.com}child' at 0xb7ddfa58>
 end <Element '{http://www.very_long_url.com}root' at 0xb7ddfa40> 

In the current implementation, local tag names are forcibly concatenated with their namespace URIs, which often results in ugly code on the user's side and in performance degradation (at least from the extra concatenations and the lengthier comparisons in element-matching code).
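
To illustrate the kind of user-side code this refers to, here is a minimal sketch of a common workaround: stripping the '{uri}' prefix from every tag after parsing. It is written against Python 3's xml.etree.ElementTree rather than the Python 2 cElementTree module discussed here, and the local_name helper is a hypothetical name introduced only for this example.

```python
import io
import xml.etree.ElementTree as ET

XML = '<root xmlns="http://www.very_long_url.com"><child/></root>'


def local_name(tag):
    """Return the tag without its leading '{uri}' part, if it has one.

    Hypothetical helper for illustration; not part of ElementTree.
    """
    return tag.rsplit("}", 1)[-1]


# iterparse reports 'end' events by default, so the child comes first.
tags = [elem.tag for _, elem in ET.iterparse(io.BytesIO(XML.encode("utf-8")))]
local_tags = [local_name(tag) for tag in tags]

print(tags)
print(local_tags)
```

Every tag comparison in user code either has to spell out the full URI or go through a helper like this one.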

Internally, cElementTree uses the Expat parser, which does namespace processing only optionally; it is enabled by providing a value for the namespace_separator argument. This argument is hard-coded in cElementTree:
 self->parser = EXPAT(ParserCreate_MM)(encoding, &memory_handler, "}");
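
The optional nature of Expat's namespace processing can be seen directly through the standard xml.parsers.expat module, whose ParserCreate() accepts a namespace_separator argument. The following is a small sketch (Python 3 syntax; the element_names helper is a hypothetical name used only here), not the cElementTree code path itself:

```python
from xml.parsers import expat

XML = '<root xmlns="http://www.very_long_url.com"><child/></root>'


def element_names(namespace_separator):
    """Collect start-tag names as Expat reports them.

    Hypothetical helper for illustration only.
    """
    names = []
    parser = expat.ParserCreate(namespace_separator=namespace_separator)
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(XML, True)  # True marks this as the final chunk
    return names


# No separator: namespace processing is off, names are reported as written.
raw_names = element_names(None)

# With a separator: Expat reports "<uri><separator><local name>".
ns_names = element_names("}")

print(raw_names)
print(ns_names)
```

With the separator set to "}", Expat hands back names of the form "uri}local"; with None, it hands back the bare names from the document.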

Attached is a patch exposing this parameter as an argument of cElementTree.XMLParser(). The parameter is optional, and the default behavior should be unchanged. Here's the test code:

import cElementTree

x = """<root xmlns="http://www.very_long_url.com"><child>text</child></root>"""

parser = cElementTree.XMLParser()
parser.feed(x)
elem = parser.close()
print elem

parser = cElementTree.XMLParser(namespace_separator="}")
parser.feed(x)
elem = parser.close()
print elem

parser = cElementTree.XMLParser(namespace_separator=None)
parser.feed(x)
elem = parser.close()
print elem

The resulting output:
<Element '{http://www.very_long_url.com}root' at 0xb7e885f0>
<Element '{http://www.very_long_url.com}root' at 0xb7e88608>
<Element 'root' at 0xb7e88458>

----------
components: Library (Lib)
messages: 104671
nosy: dmtr
priority: normal
severity: normal
status: open
title: Hardcoded namespace_separator in the cElementTree.XMLParser
type: performance
versions: Python 2.5, Python 2.6, Python 2.7

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue8583>
_______________________________________

