[XML-SIG] processing "special characters" efficiently
David Goodger
dgoodger@bigfoot.com
Thu, 13 Apr 2000 18:47:45 -0400
> From: Craig.Curtin@wdr.com
> Date: Thu, 6 Apr 2000 15:57:43 -0500
>
> i'm looking for an efficient mechanism for filtering out
> XML special characters....
I don't know if you follow comp.lang.python, but Fredrik Lundh just posted
the solution to your problem. His book "(the eff-bot guide to) the standard
python library" looks to be a treasure trove of such examples. Enjoy!
--
David Goodger dgoodger@bigfoot.com Open-source projects:
- The Go Tools Project: http://gotools.sourceforge.net
(more to come!)
============================================================
Fredrik Lundh <effbot@telia.com> posted to comp.lang.python:
============================================================
Randall Hopper <aa8vb@yahoo.com> wrote:
> Is there a Python feature or standard library API that will get me less
> Python code spinning inside this loop? re.multisub or equivalent? :-)
haven't benchmarked it, but I suspect that this approach
is more efficient:
...
# based on re-example-5.py
import re
import string
symbol_map = { "foo": "FOO", "bar": "BAR" }
def symbol_replace(match, get=symbol_map.get):
    return get(match.group(1), "")
symbol_pattern = re.compile(
    "(" + string.join(map(re.escape, symbol_map.keys()), "|") + ")"
    )
print symbol_pattern.sub(symbol_replace, "foobarfiebarfoo")
...
</F>
<!-- (the eff-bot guide to) the standard python library:
http://www.pythonware.com/people/fredrik/librarybook.htm
-->
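
For the XML special characters asked about at the top of this thread, the same single-pass idea can be sketched as follows. This is only an illustration in current Python 3 syntax, not code from the posts above: the names xml_escapes, xml_pattern, xml_replace and escape_xml are made up for the example, and the mapping assumes the five predefined XML entities are what needs escaping.

```python
import re

# Assumed mapping for illustration: the five predefined XML entities.
xml_escapes = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&apos;",
}

# One alternation built from the escaped keys, as in the post above.
xml_pattern = re.compile("|".join(map(re.escape, xml_escapes)))

def xml_replace(match, get=xml_escapes.get):
    # Every match is one of the keys, so the lookup always succeeds.
    return get(match.group(0))

def escape_xml(text):
    # A single pass over the text: the "&" inside a replacement
    # is never looked at again, so nothing gets escaped twice.
    return xml_pattern.sub(xml_replace, text)

print(escape_xml('<a href="x">Tom & Jerry</a>'))
# prints: &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;
```

Because all the characters are handled in one sub() call, the order of the replacements does not matter, which is the usual trap when chaining several replace() calls.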
============================================================
Randall Hopper <aa8vb@yahoo.com> wrote:
> Thanks! It's much more efficient. The 140 seconds original running time
> was reduced to 11.6 seconds. I can certainly live with that.
thought so ;-)
while you're at it, try replacing the original readline loop with:
while 1:
    lines = fp.readlines(BUFFERSIZE)
    if not lines:
        break
    lines = string.join(lines, "")
    lines = re.sub(...)
    out_fp.write(lines)
where BUFFERSIZE is 1000000 or so...
</F>
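
A self-contained sketch of that chunked loop, again in current Python 3 syntax and under stated assumptions: filter_stream, its parameters, and the in-memory test data are hypothetical names chosen for the example, and the pattern and replacement stand in for whatever the elided re.sub(...) call does in the original program.

```python
import io
import re

BUFFERSIZE = 1000000  # rough size of each chunk of whole lines

def filter_stream(fp, out_fp, pattern, replace, sizehint=BUFFERSIZE):
    """Copy fp to out_fp, running one regex substitution per chunk."""
    while True:
        # readlines(hint) returns complete lines totalling roughly
        # `hint` in size, and an empty list at end of file.
        lines = fp.readlines(sizehint)
        if not lines:
            break
        # Join the chunk and substitute once, not once per line.
        out_fp.write(pattern.sub(replace, "".join(lines)))

# Usage with in-memory files; a tiny sizehint forces several chunks.
src = io.StringIO("foo one\nbar two\nfie three\n")
dst = io.StringIO()
pattern = re.compile("foo|bar")
filter_stream(src, dst, pattern, lambda m: m.group(0).upper(), sizehint=4)
print(dst.getvalue(), end="")
```

Since every chunk is made of complete lines, a pattern that never spans a newline cannot be cut in half at a chunk boundary.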