request for advice - possible ElementTree nexus

Gerard Flanagan grflanagan at yahoo.co.uk
Wed Jul 5 03:35:31 EDT 2006


mirandacascade at yahoo.com wrote:
> Situation is this:
> 1) I have inherited some python code that accepts a string object, the
> contents of which is an XML document, and produces a data structure
> that represents some of the content of the XML document
> 2) The inherited code is somewhat 'brittle' in that some well-formed
> XML documents are not correctly processed by the code; the brittleness
> is caused by how the parser portion of the code handles whitespace.
> 3) I would like to change the code to make it less brittle.  Whatever
> changes I make must continue to produce the same data structure that is
> currently being produced.
> 4) Rather than attempt to fix the parser portion of the code, I would
> prefer to use ElementTree.  ElementTree handles parsing XML documents
> flawlessly, so the brittle portion of the code goes away.  In addition,
> the ElementTree model is very sweet to work with, so it is a relatively
> easy task using the information in ElementTree to produce the same data
> structure that is currently being produced.
> 5) The existing data structure--the structure that must be
> maintained--that gets produced does NOT include any {xmlns=<whatever>}
> information that may appear in the source XML document.
> 6) Based on a review of several posts in this group, I understand why
> ElementTree handles xmlns=<whatever> information the way it does.  This
> is an oversimplification, but one of the things it does is to
> incorporate the {whatever} within the tag property of the element and
> of any descendant elements.
> 7) One of the pieces of information in the data structure that gets
> produced by this code is the tag...the tag in the data structure should
> not have any xmlns=<whatever> information.
>
> So, given that the goal is to produce the same data structure and given
> that I really want to use ElementTree, I need to find a way to remove
> the xmlns=<whatever> information.  It seems like there are 2 general
> methods for accomplishing this:
> 1) before feeding the string object to the ElementTree.XML() method,
> remove the xmlns=<whatever> information from the string.
> 2) keep the xmlns=<whatever> information in the string that feeds
> ElementTree.XML(), but when building the data structure, ensure that
> the {whatever} information in the tag property of the element is NOT
> included in the data structure.
>
[snip]

Maybe transform the document with XSLT before processing, so that the
namespaces are already gone by the time ElementTree parses it?

Try googling: xslt remove namespaces

eg. http://www.tei-c.org/wiki/index.php/Remove-Namespaces.xsl

eg. http://www.thescripts.com/forum/thread86057.html
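
If you'd rather stay inside ElementTree, your option 2 can be sketched
roughly as below. This is only an illustration, not tested against your
data: the helper name strip_namespaces and the example.com namespace URI
are made up, and it assumes the xml.etree.ElementTree module from the
standard library (with the standalone package the import would differ).
It rewrites element tags only; namespaced attribute names are left alone.

```python
import xml.etree.ElementTree as ET


def strip_namespaces(root):
    """Drop the '{uri}' prefix from every element tag, in place.

    Hypothetical helper: only element tags are rewritten, attribute
    names that carry a namespace are not touched.
    """
    for elem in root.iter():
        # Skip anything whose tag is not a plain string.
        if isinstance(elem.tag, str) and elem.tag.startswith("{"):
            # '{http://example.com/ns}child' -> 'child'
            elem.tag = elem.tag.split("}", 1)[1]
    return root


# Made-up document with a default namespace, for illustration only.
doc = '<root xmlns="http://example.com/ns"><child>text</child></root>'
root = strip_namespaces(ET.XML(doc))
print([elem.tag for elem in root.iter()])  # prints ['root', 'child']
```

The parser still does all the namespace handling, so this should be less
fragile than editing the xmlns=<whatever> text out of the string first.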

hth

Gerard



