Minimally intrusive XML editing using Python

Nobody nobody at nowhere.com
Wed Nov 18 14:17:14 EST 2009


On Wed, 18 Nov 2009 13:55:52 +0100, Thomas Lotze wrote:

> I wonder what Python XML library is best for writing a program that makes
> small modifications to an XML file in a minimally intrusive way. By that I
> mean that information the program doesn't recognize is kept, as are
> comments and whitespace, the order of attributes and even whitespace
> around attributes. In short, I want to be able to change an XML file while
> producing minimal textual diffs.
> 
> Most libraries don't allow controlling the order of and the whitespace
> around attributes, so what's generally left to do is store snippets of
> original text along with the model objects and re-use that for writing the
> edited XML if the model wasn't modified by the program. Does a library
> exist that helps with this? Does any XML library at all allow structured
> access to the text representation of a tag with its attributes?

Expat parsers have a CurrentByteIndex field, while SAX parsers have
locators. You can use this to identify the portions of the input which
need to be processed, and just copy everything else verbatim. One downside
is that each reports only one position per tag, either the beginning
(Expat) or the end (SAX), so you'll have to deduce the other end yourself.
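A minimal sketch of the Expat approach, assuming the whole document fits in memory as UTF-8 bytes and is well-formed. The helper names (find_tag_end, start_tag_spans, set_attribute) are hypothetical and exist only for illustration. CurrentByteIndex gives the offset of the '<' of each start tag. The end of the tag is deduced by scanning forward for the first '>' outside a quoted attribute value. Only the bytes of the edited attribute value are replaced; everything else is copied unchanged.

```python
import re
import xml.parsers.expat
from xml.sax.saxutils import escape

# One attribute inside a start tag: leading whitespace, name, "=" (with any
# surrounding whitespace), and a single- or double-quoted value.
ATTR_RE = re.compile(rb"""(\s+)([^\s=/>]+)(\s*=\s*)("[^"]*"|'[^']*')""")


def find_tag_end(data, start):
    """Return the offset just past the '>' that closes the tag at `start`.

    Expat only reports where the tag begins, so the end is deduced by
    scanning forward, ignoring '>' characters inside quoted attribute values.
    """
    quote = None
    for i in range(start, len(data)):
        ch = data[i:i + 1]
        if quote:
            if ch == quote:
                quote = None
        elif ch in (b'"', b"'"):
            quote = ch
        elif ch == b">":
            return i + 1
    raise ValueError("unterminated tag at byte %d" % start)


def start_tag_spans(data, name):
    """Return (start, end) byte spans of every start tag named `name`."""
    spans = []
    parser = xml.parsers.expat.ParserCreate()

    def on_start(tag, attrs):
        if tag == name:
            # CurrentByteIndex points at the '<' of the current start tag.
            begin = parser.CurrentByteIndex
            spans.append((begin, find_tag_end(data, begin)))

    parser.StartElementHandler = on_start
    parser.Parse(data, True)
    return spans


def set_attribute(data, element, attr, value):
    """Change `attr` on every `element` start tag, copying all other bytes."""
    out = []
    pos = 0
    for begin, end in start_tag_spans(data, element):
        tag = data[begin:end]
        # Start attribute matching just after "<name".
        name_end = 1 + len(element.encode("utf-8"))
        for m in ATTR_RE.finditer(tag, name_end):
            if m.group(2).decode("utf-8") == attr:
                q = m.group(4)[:1]  # keep the original quote character
                entity = {'"': "&quot;"} if q == b'"' else {"'": "&apos;"}
                new_val = q + escape(value, entity).encode("utf-8") + q
                tag = tag[:m.start(4)] + new_val + tag[m.end(4):]
                break  # XML forbids duplicate attributes
        out.append(data[pos:begin])  # untouched text before this tag
        out.append(tag)
        pos = end
    out.append(data[pos:])  # untouched tail of the document
    return b"".join(out)


if __name__ == "__main__":
    doc = b'<root>\n  <!-- keep me -->\n  <item  id = "1" name=\'a > b\'/>\n</root>\n'
    print(set_attribute(doc, "item", "id", "42").decode("utf-8"))
```

In the output, comments, the double space before `id`, the spaces around `=`, the attribute order and the single quotes on `name` all come through untouched. A textual diff then shows only the changed value. The regex tokenizing of attributes relies on the input being well-formed.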

OTOH, "diff" is probably the wrong tool for the job.
