Python script to optimize XML text

Robert Dailey rcdailey at gmail.com
Tue Sep 25 10:43:03 EDT 2007


Hey guys,

Thanks for everyone's input. I wanted to learn regular expressions, but
I'm finding them to be quite evil. I think I've learned that regex should
be a very LAST resort; at least, that's the opinion I'm forming. In any
case, I like the ideas mentioned here about using the XML parser to do the
job for me. Thanks again everyone, I think I'll go with the XML parser for
what I need.

Have a good day everyone.
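For reference, here is a minimal sketch of that parser-based approach. It is an illustration, not code from this thread. It assumes Python 3 and uses the standard-library xml.etree.ElementTree (the module Gabriel mentions below) rather than lxml, and the helper name minify_xml is hypothetical. It treats every whitespace-only text node as insignificant, which is only safe for documents without meaningful mixed content.

```python
import xml.etree.ElementTree as ET


def minify_xml(text):
    """Hypothetical helper: parse, drop comments and blank text, re-serialize."""
    # The default TreeBuilder used by fromstring() discards comments and
    # processing instructions, so they never reach the tree.
    root = ET.fromstring(text)
    for elem in root.iter():
        # Assumption: whitespace-only text and tails carry no meaning here.
        if elem.text is not None and not elem.text.strip():
            elem.text = None
        if elem.tail is not None and not elem.tail.strip():
            elem.tail = None
    # Elements left with no text and no children are written in short form.
    return ET.tostring(root, encoding="unicode")


sample = """<root>
    <Test></Test>
    <!-- <CommentedOutElement/> -->

    <!-- Do Something Else -->
</root>"""

print(minify_xml(sample))  # prints: <root><Test /></root>
```

ElementTree writes the empty element as <Test /> (with a space), so the result is close to, but not byte-identical with, the <root><Test/></root> asked for below. If you need a DTD to decide which whitespace really is "useless", the lxml XMLParser options Stefan describes are the better fit.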

On 9/25/07, Stefan Behnel <stefan.behnel-n05pAM at web.de> wrote:
>
> Gabriel Genellina wrote:
> > On Mon, 24 Sep 2007 17:36:05 -0300, Robert Dailey <rcdailey at gmail.com>
> > wrote:
> >
> >> I'm currently seeking a Python script that provides a way of optimizing out
> >> useless characters in an XML document to provide the optimal size for the
> >> file. For example, assume the following XML script:
> >>
> >> <root>
> >>     <Test></Test>
> >>     <!-- <CommentedOutElement/> -->
> >>
> >>     <!-- Do Something Else -->
> >> </root>
> >>
> >> By running this through an XML optimizer, the file would appear as:
> >>
> >> <root><Test/></root>
> >
> > ElementTree does that almost for free.
>
> As the OP is currently using lxml.etree (and as this was a cross-post to
> c.l.py and lxml-dev), I already answered on the lxml list.
>
> This is just to mention that the XMLParser of lxml.etree accepts keyword
> options to ignore plain whitespace content, comments and processing
> instructions, and that you can provide a DTD to tell it what whitespace-only
> content really is "useless" in the sense of your specific application.
>
> Stefan