How to read strings containing escape characters from a file and use them as escape sequences?

John Machin sjmachin at lexicon.net
Sat Dec 1 17:18:20 EST 2007


On Dec 2, 2:33 am, Duncan Booth <duncan.bo... at invalid.invalid> wrote:
> slomo <slim... at gmail.com> wrote:
> >>>> print line
> > \u0050\u0079\u0074\u0068\u006f\u006e
>
> > But I want to get a string:
>
> > "\u0050\u0079\u0074\u0068\u006f\u006e"
>
> > How do you make it?
>
> line.decode('unicode-escape')

Amazing what you can find in obscure corners of the obscure docs! BTW,
how many folks know what "bijective" means?

Hmmm ... the encode is documented as "Produce a string that is
suitable as Unicode literal in Python source code", but it *isn't*
suitable. A Unicode literal is u'blah', this gives just blah. Worse,
it leaves the caller to nut out how to escape apostrophes and quotes:

>>> test = u'Python\'\'\'\'\"\"\"\"\u1234\n'
>>> print repr(test)
u'Python\'\'\'\'""""\u1234\n'
>>> print test.encode('unicode-escape')
Python''''""""\u1234\n
>>>

Why would someone bother writing this codec when repr() does the job
properly?
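
For anyone reading this on Python 3, here is a rough sketch of the same
round trip. It assumes the codec is spelled 'unicode_escape' and that
encoding yields bytes; the variable names are only for illustration. It
shows the codec leaving quotes unescaped while still round-tripping the
text:

```python
# Python 3 sketch (assumption: the original session was Python 2).
text = 'Python\'\'\'\'""""\u1234\n'

# Encoding escapes the non-ASCII character and the newline,
# but leaves apostrophes and double quotes untouched.
escaped = text.encode('unicode_escape')
print(escaped)

# Decoding the escaped bytes gives back the original text.
restored = escaped.decode('unicode_escape')
print(restored == text)

# repr() is what actually produces a quoted, pasteable literal.
print(repr(text))
```

So the codec is usable for a round trip, but only repr() hands back
something you can paste into source code as a literal.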

Anyhow, here's a solution to the OP's stated problem from first
principles using basic building blocks:

>>> line = '\\u0050\\u0079\\u0074\\u0068\\u006f\\u006e\n'
>>> u''.join(unichr(int(x, 16)) for x in line.split(r'\u') if x and x != '\n') + u'\n'
u'Python\n'
>>>
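
A variation on the same idea, sketched for Python 3 with a regular
expression instead of split(). The helper name expand_u_escapes is made
up for this example, and it assumes every escape is a backslash, a 'u',
and exactly four hex digits; anything else in the line is passed
through unchanged:

```python
import re

# Hypothetical helper, not part of the original post.
# Matches a literal backslash, 'u', then exactly four hex digits.
_U_ESCAPE = re.compile(r'\\u([0-9a-fA-F]{4})')

def expand_u_escapes(text):
    # Replace each matched escape with the character it names.
    return _U_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)

line = '\\u0050\\u0079\\u0074\\u0068\\u006f\\u006e\n'
result = expand_u_escapes(line)
print(repr(result))
```

Unlike the split() version, this does not need special handling for the
trailing newline, and ordinary text mixed in with the escapes survives.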


