[XML-SIG] processing "special characters" efficiently

David Goodger dgoodger@bigfoot.com
Thu, 13 Apr 2000 18:47:45 -0400


> From: Craig.Curtin@wdr.com
> Date: Thu, 6 Apr 2000 15:57:43 -0500
> 
> i'm looking for an efficient mechanism for filtering out
> XML special characters....

I don't know if you follow comp.lang.python, but Fredrik Lundh just posted
the solution to your problem. His book "(the eff-bot guide to) the standard
python library" looks to be a treasure trove of such examples. Enjoy!

-- 
David Goodger    dgoodger@bigfoot.com    Open-source projects:
 - The Go Tools Project: http://gotools.sourceforge.net
 (more to come!)


============================================================
Fredrik Lundh <effbot@telia.com> posted to comp.lang.python:
============================================================

Randall Hopper <aa8vb@yahoo.com> wrote:
> Is there a Python feature or standard library API that will get me less
> Python code spinning inside this loop?   re.multisub or equivalent? :-)


haven't benchmarked it, but I suspect that this approach
is more efficient:

...

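# based on re-example-5.py

import re

# map each symbol to its replacement text
symbol_map = { "foo": "FOO", "bar": "BAR" }

# called once per match; looks up the matched symbol in the map
def symbol_replace(match, get=symbol_map.get):
    return get(match.group(1), "")

# one alternation of all the symbols, each escaped to match literally
symbol_pattern = re.compile(
    "(" + "|".join(map(re.escape, symbol_map.keys())) + ")"
    )

# a single pass over the string; prints FOOBARfieBARFOO
print(symbol_pattern.sub(symbol_replace, "foobarfiebarfoo"))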

...

</F>

<!-- (the eff-bot guide to) the standard python library:
http://www.pythonware.com/people/fredrik/librarybook.htm
-->
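
For the XML case that started this thread, the same single-pass idea might look like the sketch below. It is written for a current Python 3 interpreter, and it assumes that "filtering out" special characters means escaping them as entities. The names XML_ESCAPES, XML_PATTERN and escape_xml are made up for this illustration and are not from the posts above. The standard library's xml.sax.saxutils.escape covers the common case as well.

```python
import re

# Hypothetical mapping for illustration: characters commonly escaped
# in XML text and attribute values.
XML_ESCAPES = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;"}

# One alternation of all the keys, each escaped so it matches literally.
XML_PATTERN = re.compile("|".join(map(re.escape, XML_ESCAPES)))

def escape_xml(text):
    """Replace every key of XML_ESCAPES in a single pass over text."""
    return XML_PATTERN.sub(lambda m: XML_ESCAPES[m.group(0)], text)
```

Because the text is scanned only once, an ampersand produced by one replacement is never escaped a second time, which is an easy mistake to make with a chain of separate replace calls.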

============================================================

Randall Hopper <aa8vb@yahoo.com> wrote:
> Thanks!  It's much more efficient.  The original running time of 140
> seconds was reduced to 11.6 seconds.  I can certainly live with that.

thought so ;-)

while you're at it, try replacing the original readline loop with:

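    while True:
        # read roughly BUFFERSIZE bytes' worth of complete lines
        lines = fp.readlines(BUFFERSIZE)
        if not lines:
            break
        # join the chunk so re.sub runs once per chunk, not once per line
        lines = "".join(lines)
        lines = re.sub(...)
        out_fp.write(lines)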

where BUFFERSIZE is 1000000 or so...

</F>
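
The chunked loop above could be wrapped up roughly as follows in a current Python 3 interpreter. This is only a sketch: filter_stream and its transform parameter are hypothetical names, and the in-memory streams stand in for the real input and output files. The argument to readlines is a size hint, so each call returns whole lines totalling about that much data. A substitution that never needs to match across a line break therefore gives the same result as it would on the full text.

```python
import io

BUFFERSIZE = 1000000  # size hint for readlines, as suggested in the post

def filter_stream(in_fp, out_fp, transform, hint=BUFFERSIZE):
    """Copy in_fp to out_fp, applying transform to one large chunk at a time."""
    while True:
        lines = in_fp.readlines(hint)   # whole lines, about `hint` in total size
        if not lines:
            break                       # an empty list means end of input
        out_fp.write(transform("".join(lines)))

# Small demonstration; a tiny hint forces several passes through the loop.
src = io.StringIO("foo\nbar\n" * 3)
dst = io.StringIO()
filter_stream(src, dst, str.upper, hint=5)
```

In real use, transform would be the compiled pattern's substitution, for example a function that calls symbol_pattern.sub(symbol_replace, chunk).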