Ignoring XML Namespaces with cElementTree

Stefan Behnel stefan_ml at behnel.de
Sat May 1 08:34:39 EDT 2010


Carl Banks, 01.05.2010 12:33:
> On Apr 29, 10:12 pm, Stefan Behnel wrote:
>> dmtr, 30.04.2010 04:57:
>>> I don't want these "{http://www.very_long_url.com}" in front of my
>>> tags.  They create a performance disaster on large files.
>>
>> I seriously doubt that they do.
>
> I don't know what kind of XML files you deal with, but for me a large
> XML file is gigabyte-sized (obviously I don't use Element Tree for
> those).

Why not? I used cElementTree for files of that size (1-1.5GB unpacked) a 
couple of times, and it was never a problem.
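
For files of that size, the usual pattern is to stream with iterparse() and 
discard each element once it has been handled. What follows is only a 
minimal sketch, not the code I actually ran: count_items() and the inline 
sample document are made up for illustration, and the namespace is the one 
from the original post. On current Python versions xml.etree.ElementTree 
picks up the C accelerator by itself; in 2010 the import would have been 
xml.etree.cElementTree.

```python
import io
import xml.etree.ElementTree as ET  # in 2010: xml.etree.cElementTree

NS = "{http://www.very_long_url.com}"  # namespace from the original post


def count_items(source, tag):
    """Stream through `source` and count elements whose tag equals `tag`.

    Hypothetical helper: each matching element is cleared after use so its
    text and children do not stay in memory.
    """
    count = 0
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            count += 1
            elem.clear()  # drop text, attributes and children once handled
    return count


# Small in-memory stand-in for a large file (assumed sample data).
sample = io.BytesIO(
    b'<root xmlns="http://www.very_long_url.com">'
    b'<item>a</item><item>b</item><item>c</item>'
    b'</root>'
)
print(count_items(sample, NS + "item"))  # 3
```

Note that the cleared (now empty) elements still hang off the root, so for 
truly huge inputs you may also want to remove them from their parent as you 
go.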


> Even for tens-of-megabytes files, the string ops to expand tags with
> namespaces are going to be a pretty decent penalty--remember
> ElementTree does nothing lazily.

So? Did you run a profiler on it to confirm that the string concatenation
actually causes a penalty? cElementTree's parser (expat) and its tree builder
are blazingly fast, especially the iterparse() implementation.

http://codespeak.net/lxml/performance.html#parsing-and-serialising
http://codespeak.net/lxml/performance.html#a-longer-example
http://effbot.org/zone/celementtree.htm#benchmarks
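
If you want to check the namespace overhead yourself, something along these 
lines should do. It is a plain timing comparison rather than a real profiler 
run, and everything in it is an assumption for illustration: make_doc() and 
parse_time() are hypothetical helpers, the documents are synthetic, and the 
element count is arbitrary. I am deliberately not quoting numbers, since they 
depend on the machine and the Python version.

```python
import timeit
import xml.etree.ElementTree as ET  # in 2010: xml.etree.cElementTree


def make_doc(n, namespace=None):
    """Build a synthetic document with n <item> children (made-up test data)."""
    xmlns = ' xmlns="%s"' % namespace if namespace else ""
    items = "".join("<item>%d</item>" % i for i in range(n))
    return ("<root%s>%s</root>" % (xmlns, items)).encode("utf-8")


def parse_time(data, repeat=5):
    """Best-of-`repeat` wall time in seconds for one full parse of `data`."""
    return min(timeit.repeat(lambda: ET.fromstring(data), number=1, repeat=repeat))


# Same structure twice: once without and once with a default namespace.
plain = make_doc(50000)
namespaced = make_doc(50000, "http://www.very_long_url.com")

print("plain:      %.4f s" % parse_time(plain))
print("namespaced: %.4f s" % parse_time(namespaced))
```

Whatever difference shows up here is the total cost of producing the 
"{uri}tag" names in the parser and tree builder for this particular input, 
which is the number worth looking at before deciding to strip anything.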


>>> (first cElementTree adds them, then I have to remove them in python).
>>
>> I think that's your main mistake: don't remove them. Instead, use the fully
>> qualified names when comparing.
>
> Unless you have multiple namespaces or are working with defined schema
> or something, it's useless boilerplate.
>
> It'd be a nice feature if ElementTree could let users optionally
> ignore a namespace; unfortunately, it doesn't have that option.

I agree that that would make for a nice parser option, e.g. when dealing 
with HTML and XHTML in the same code.
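
Until such an option exists, comparing against fully qualified names need 
not be much boilerplate. A minimal sketch, assuming a single namespace (the 
one from the original post); qname() and the sample document are 
hypothetical and only there to show the idea of building each qualified name 
once and reusing it:

```python
import xml.etree.ElementTree as ET  # in 2010: xml.etree.cElementTree

NS = "http://www.very_long_url.com"  # namespace URI from the original post


def qname(local, ns=NS):
    """Return the '{uri}local' form that ElementTree uses for element tags."""
    return "{%s}%s" % (ns, local)


ITEM = qname("item")  # build the qualified name once, compare many times

# Assumed sample data with a default namespace on the root.
root = ET.fromstring(
    '<root xmlns="http://www.very_long_url.com">'
    '<item>a</item><other>x</other><item>b</item>'
    '</root>'
)

# Direct comparison against the qualified name, no stripping needed.
texts = [child.text for child in root if child.tag == ITEM]
print(texts)  # ['a', 'b']

# find()/findall() accept the same qualified form.
print(len(root.findall(ITEM)))  # 2
```

The comparison itself is an ordinary string equality test either way, so 
leaving the namespace in place costs nothing extra at that point.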

Stefan