decode unicode string using 'unicode_escape' codecs

Steven Bethard steven.bethard at gmail.com
Fri Jan 13 03:01:18 EST 2006


aurora wrote:
> I have a unicode string with some characters encoded using Python
> notation, like '\n' for LF. I need to convert that to the actual LF
> character. There is a 'unicode_escape' codec that seems to suit my purpose.
> 
> >>> encoded = u'A\\nA'
> >>> decoded = encoded.decode('unicode_escape')
> >>> print len(decoded)
> 3
> 
> Note that both encoded and decoded are unicode strings. I'm trying to
> use the builtin codec because I assume it performs better than a pure
> Python decoder I would write myself. But I'm not converting between
> byte strings and unicode strings.
> 
> However, it runs into problems in some cases.
> 
> encoded = u'€\\n€'
> decoded = encoded.decode('unicode_escape')
> Traceback (most recent call last):
>   File "g:\bin\py_repos\mindretrieve\trunk\minds\x.py", line 9, in ?
>     decoded = encoded.decode('unicode_escape')
> UnicodeEncodeError: 'ascii' codec can't encode character u'\u20ac' in  
> position 0: ordinal not in range(128)

Does this do what you want?

 >>> u'€\\n€'
u'\x80\\n\x80'
 >>> len(u'€\\n€')
4
 >>> u'€\\n€'.encode('utf-8').decode('string_escape').decode('utf-8')
u'\x80\n\x80'
 >>> len(u'€\\n€'.encode('utf-8').decode('string_escape').decode('utf-8'))
3

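Basically, I convert the unicode string to bytes with UTF-8, unescape 
the bytes by decoding them with the 'string_escape' codec, and then 
convert the bytes back into a unicode string. Your original call fails 
because unicode.decode() first has to turn the string into bytes, which 
it does implicitly with the default 'ascii' codec, and that codec cannot 
represent u'\u20ac'.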
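
The transcript above is Python 2, where str.decode() and the 
'string_escape' codec exist. Neither is available in Python 3, so here 
is a rough sketch of a comparable round-trip there. The helper name 
unescape() is made up for illustration, and going through Latin-1 with 
the 'backslashreplace' error handler is a swapped-in technique, not the 
exact UTF-8 / 'string_escape' steps shown above.

```python
def unescape(text):
    """Expand Python-style backslash escapes in a str (hypothetical helper)."""
    # Characters above U+00FF do not fit in Latin-1, so 'backslashreplace'
    # rewrites them as \uXXXX (or \UXXXXXXXX) escapes; all other characters
    # become exactly one byte each.
    raw = text.encode('latin-1', 'backslashreplace')
    # 'unicode_escape' reads the bytes as Latin-1 and expands every backslash
    # escape, including the ones produced by the step above.
    return raw.decode('unicode_escape')


decoded = unescape('\u20ac\\n\u20ac')   # euro sign, backslash, 'n', euro sign
print(len(decoded))                     # 3
print(decoded == '\u20ac\n\u20ac')      # True
```

As with the Python 2 version, every backslash in the input is treated as 
the start of an escape, so this only suits strings that really are 
escaped that way.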

HTH,

STeVe


