Ignoring XML Namespaces with cElementTree

Carl Banks pavlovevidence at gmail.com
Sat May 1 06:33:57 EDT 2010


On Apr 29, 10:12 pm, Stefan Behnel <stefan... at behnel.de> wrote:
> dmtr, 30.04.2010 04:57:
>
>
>
> > I'm referring to xmlns/URI prefixes. Here's a code example:
> >   from xml.etree.cElementTree import iterparse
> >   from cStringIO import StringIO
> >   xml = """<root xmlns="http://www.very_long_url.com"><child/></root>"""
> >   for event, elem in iterparse(StringIO(xml)): print event, elem
>
> > The output is:
> >   end <Element '{http://www.very_long_url.com}child' at 0xb7ddfa58>
> >   end <Element '{http://www.very_long_url.com}root' at 0xb7ddfa40>
>
> > I don't want these "{http://www.very_long_url.com}" in front of my
> > tags.
>
> > They create performance disaster on large files
>
> I seriously doubt that they do.

I don't know what kind of XML files you deal with, but for me a large
XML file is gigabyte-sized (obviously I don't use ElementTree for
those).

Even for tens-of-megabytes files, the string ops to expand tags with
namespaces are going to be a pretty decent penalty. Remember,
ElementTree does nothing lazily.


> > (first cElementTree
> > adds them, then I have to remove them in python).
>
> I think that's your main mistake: don't remove them. Instead, use the fully
> qualified names when comparing.

Unless you have multiple namespaces or are working with a defined
schema or something, it's useless boilerplate.
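
For reference, here is roughly what comparing against fully qualified
names looks like. This is only a sketch: it assumes Python 3's
xml.etree.ElementTree rather than cElementTree, and the names NS, CHILD
and count_children are made up for illustration.

```python
from io import BytesIO
from xml.etree.ElementTree import iterparse

# Namespace URI from the example quoted above; NS and CHILD are
# illustrative names, not part of ElementTree.
NS = "http://www.very_long_url.com"
CHILD = "{%s}child" % NS  # ElementTree's "{uri}local" tag form


def count_children(source):
    """Count <child> elements by comparing against the qualified tag."""
    count = 0
    for event, elem in iterparse(source):  # default: "end" events only
        if elem.tag == CHILD:
            count += 1
    return count


xml = b'<root xmlns="http://www.very_long_url.com"><child/><child/></root>'
print(count_children(BytesIO(xml)))
```

Every tag you care about needs a constant like CHILD, which is the
boilerplate in question when there is only one namespace in play.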

It'd be a nice feature if ElementTree let users optionally ignore a
namespace; unfortunately, it doesn't have one.
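
Lacking such an option, the usual workaround is to strip the prefix
yourself while iterating. A minimal sketch, assuming Python 3's
xml.etree.ElementTree; iterparse_local is a hypothetical helper, not
part of the library. Note that it does nothing about the cost: the
parser still builds the expanded names, and this removes them again in
Python.

```python
from io import BytesIO
from xml.etree.ElementTree import iterparse


def iterparse_local(source, events=("end",)):
    """Wrap iterparse, stripping the "{uri}" prefix from element tags.

    Hypothetical helper. Attribute names are left untouched.
    """
    for event, elem in iterparse(source, events=events):
        # For "start-ns"/"end-ns" events elem is not an Element, and
        # comments or processing instructions have non-string tags.
        tag = getattr(elem, "tag", None)
        if isinstance(tag, str) and tag.startswith("{"):
            elem.tag = tag.split("}", 1)[1]
        yield event, elem


xml = b'<root xmlns="http://www.very_long_url.com"><child/></root>'
for event, elem in iterparse_local(BytesIO(xml)):
    print(event, elem.tag)
```

This throws away the namespace information entirely, so it is only
reasonable when the document uses a single namespace or local names
never collide.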


Carl Banks


