Problem with processing XML

Paul McGuire ptmcg at austin.rr.com
Tue Jan 22 09:59:35 EST 2008


On Jan 22, 8:11 am, John Carlyle-Clarke <j... at nowhere.org> wrote:
> Hi.
>
> I'm new to Python and trying to use it to solve a specific problem.  I
> have an XML file in which I need to locate a specific text node and
> replace the contents with some other text.  The text in question is
> actually about 70k of base64 encoded data.
>

Here is a pyparsing hack for your problem.  I normally advise against
using literal strings like "<value>" to match XML or HTML tags in a
parser, since this doesn't cover variations in case, embedded
whitespace, or unforeseen attributes, but your example was too simple
to haul in the extra machinery of an expression created by pyparsing's
makeXMLTags.

Also, I don't generally recommend pyparsing for working on XML, since
there are so many better and faster XML-specific modules available.
But if this does the trick for you for your specific base64-removal
task, great.

-- Paul

# requires pyparsing 1.4.8 or later
from pyparsing import makeXMLTags, withAttribute, keepOriginalText,
SkipTo

xml = """
... long XML string goes here ...
"""

# define a filter that will key off of the <data> tag with the
# attribute 'name="PctShow.Image"', and then use suppress to filter
the
# body of the following <value> tag
dataTag = makeXMLTags("data")[0]
dataTag.setParseAction(withAttribute(name="PctShow.Image"),
                       keepOriginalText)

filter = dataTag + "<value>" + SkipTo("</value>").suppress() + "</
value>"

xmlWithoutBase64Block = filter.transformString(xml)
print xmlWithoutBase64Block




More information about the Python-list mailing list