Parsing strings (\n and \\)
Fredrik Lundh
fredrik at pythonware.com
Wed Jun 26 06:29:44 EDT 2002
Thomas Guettler wrote:
> Look at the two functions quote and unquote. I wrote them
> without regular expressions because I think it's faster.
faster to write, perhaps.
and faster to run, if you only use them on strings with no
more than 2-3 characters.
but if you use a different set of test strings with more ordinary
characters than escaped characters, e.g.
strings = ['foo', '', '\\', ' ', '"', '\\"', '\\\\']
strings = [(x+"spamspamspamspamspam")*10 for x in strings]
you'll find that a RE approach can be much faster. the following
version is about four times faster than your code, under 2.2:
import re

def re_quote(string, sub=re.compile(r"[\\\"]").sub):
    # prefix every backslash and double quote with a backslash
    def fixup(m):
        return "\\" + m.group(0)
    return sub(fixup, string)

def re_unquote(string, sub=re.compile(r"(?s)\\(.)|\\").sub):
    # match a backslash plus the next character, or a lone trailing backslash
    def fixup(m):
        ch = m.group(1)
        if ch is None:
            raise ValueError("parse error: backslash at end of string")
        if ch not in '\\"':
            raise ValueError("parse error: unsupported character after backslash")
        return ch
    return sub(fixup, string)
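To try the timing claim on a current Python, here is a minimal sketch. The character-by-character `loop_quote` is a hypothetical stand-in (Thomas's original quote function isn't quoted in this post), and the timings it prints will vary with interpreter version and machine; only the agreement between the two versions is checked.

```python
import re
import timeit

def re_quote(string, sub=re.compile(r'[\\"]').sub):
    # regex version: prefix every backslash and double quote with a backslash
    return sub(lambda m: "\\" + m.group(0), string)

def loop_quote(string):
    # hypothetical loop-based version, used only for comparison
    out = []
    for ch in string:
        if ch in '\\"':
            out.append("\\")
        out.append(ch)
    return "".join(out)

strings = ['foo', '', '\\', ' ', '"', '\\"', '\\\\']
long_strings = [(x + "spamspamspamspamspam") * 10 for x in strings]

# both versions must produce identical output before timing means anything
assert all(re_quote(s) == loop_quote(s) for s in long_strings)

for func in (loop_quote, re_quote):
    seconds = timeit.timeit(lambda: [func(s) for s in long_strings], number=1000)
    print(f"{func.__name__}: {seconds:.3f} s")
```

With mostly ordinary characters, the regex engine skips long runs without matches in C code, while the loop pays Python-level overhead for every character.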
note the use of callbacks instead of substitution templates. it's
usually faster (and in my opinion, also more pythonic) to use e.g.
def fixup(m):
    return "spam %s %s" % m.group(1, 2)
re.sub(pattern, fixup, string)
or, if you prefer lambdas:
re.sub(pattern, lambda m: "spam %s %s" % m.group(1, 2), string)
than the re.sub non-standard interpolation syntax:
re.sub(pattern, "spam \\1 \\2", string)
(and where possible, it's also slightly faster to use m.groups() instead
of enumerating all the groups in m.group(...))
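A small sketch showing that the callback, lambda, `m.groups()` and template styles produce the same result. The pattern and input text are hypothetical, picked only for illustration; the template is written as the raw string `r"spam \1 \2"`, which is the same string as `"spam \\1 \\2"`.

```python
import re

# hypothetical pattern and input, for illustration only
pattern = r"(\w+)=(\w+)"
text = "a=1, b=2"

def fixup(m):
    # callback: build the replacement from the match object
    return "spam %s %s" % m.group(1, 2)

def fixup_groups(m):
    # same, but m.groups() fetches all groups as one tuple
    return "spam %s %s" % m.groups()

via_callback = re.sub(pattern, fixup, text)
via_groups = re.sub(pattern, fixup_groups, text)
via_lambda = re.sub(pattern, lambda m: "spam %s %s" % m.group(1, 2), text)
via_template = re.sub(pattern, r"spam \1 \2", text)

print(via_callback)  # spam a 1, spam b 2
```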
ymmv, as usual.
</F>