Performing a number of substitutions on a unicode string

Peter Otten __peter__ at web.de
Tue Dec 20 10:35:21 EST 2011


Arnaud Delobelle wrote:

> I've got to escape some unicode text according to the following map:
> 
> escape_map = {
>     u'\n': u'\\n',
>     u'\t': u'\\t',
>     u'\r': u'\\r',
>     u'\f': u'\\f',
>     u'\\': u'\\\\'
> }
> 
> The simplest solution is to use str.replace:
> 
> def escape_text(text):
>     return (text.replace('\\', '\\\\').replace('\n', '\\n')
>             .replace('\t', '\\t').replace('\r', '\\r').replace('\f', '\\f'))
> 
> But it creates 4 intermediate strings, which is quite inefficient
> (I've got 10s of MB's worth of unicode strings to escape)
> 
> I can think of another way using regular expressions:
> 
> import re
> 
> escape_ptn = re.compile(r"[\n\t\f\r\\]")
> 
> # escape_map is defined above
> def escape_match(m, map=escape_map):
>     return map[m.group(0)]
> 
> def escape_text(text, sub=escape_match):
>     return escape_ptn.sub(sub, text)
> 
> Is there a better way?

>>> escape_map = {
...     u'\n': u'\\n',
...     u'\t': u'\\t',
...     u'\r': u'\\r',
...     u'\f': u'\\f',
...     u'\\': u'\\\\'
... }
>>> escape_map = dict((ord(k), v) for k, v in escape_map.items())
>>> print u"the quick\n brown\tfox jumps\\over\\the\\lazy\\dog".translate(escape_map)
the quick\n brown\tfox jumps\\over\\the\\lazy\\dog
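
The session above works because unicode.translate() takes a dict that maps
code points (integers) to replacement strings and applies the whole map in
one pass, with no intermediate strings. For Python 3, where str is already
unicode, the same idea might look roughly like the sketch below. The names
escape_table, escape_text_translate and escape_text_replace are made up for
this illustration; the chained-replace version is included only to check
that both approaches agree on the sample input.

```python
# Python 3 sketch of the translate() approach (assumes Python 3.3+).
# All function names here are hypothetical, chosen for illustration only.
escape_map = {
    '\n': '\\n',
    '\t': '\\t',
    '\r': '\\r',
    '\f': '\\f',
    '\\': '\\\\',
}

# str.maketrans converts the single-character keys to code points,
# which is what the dict(... ord(k) ...) expression does in the session above.
escape_table = str.maketrans(escape_map)


def escape_text_translate(text):
    """Escape text in a single pass using str.translate."""
    return text.translate(escape_table)


def escape_text_replace(text):
    """Reference version: chained str.replace, backslash handled first."""
    return (text.replace('\\', '\\\\').replace('\n', '\\n')
            .replace('\t', '\\t').replace('\r', '\\r').replace('\f', '\\f'))


sample = "the quick\n brown\tfox jumps\\over\\the\\lazy\\dog"

# Both approaches should produce the same escaped string.
assert escape_text_translate(sample) == escape_text_replace(sample)
print(escape_text_translate(sample))
```

Whether translate() is actually faster than the regular expression or the
chained replace calls on tens of MB of text would need to be measured (for
example with timeit) on representative data.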




