Why does unicode-escape decode escape symbols that are already escaped?

Somelauw . somelauw at gmail.com
Sun May 10 19:56:46 EDT 2015


2015-05-10 18:06 GMT+02:00 Chris Angelico <rosuav at gmail.com>:
> Whenever you start encoding and decoding, you need to know whether
> you're working with bytes->text, text->bytes, or something else. In
> the case of unicode-escape, it expects to encode text into bytes, as
> you can see with your second example - you give it a Unicode string,
> and get back a byte string. When you attempt to *decode* a Unicode
> string, that doesn't actually make sense, so it first gets *encoded*
> to bytes, before being decoded. What you're actually seeing there is
> that the one-character string is being encoded into a three-byte UTF-8
> sequence, and then the unicode-escape decode takes those bytes and
> interprets them as characters; as it happens, that's equivalent to a
> Latin-1 decode:
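
The behaviour described in the quote can be reproduced with a short
sketch like the one below. It assumes Python 3 and uses the euro sign
purely as an illustrative one-character input; the variable names are
made up for the example and are not taken from the original message.

```python
import codecs

text = "€"  # one character, three bytes when encoded as UTF-8

# Decoding a str: the str is implicitly encoded to UTF-8 bytes first,
# then unicode-escape reads each byte back as one character.
decoded = codecs.decode(text, "unicode_escape")

# The same steps spelled out explicitly.
explicit = text.encode("utf-8").decode("latin-1")

print(decoded == explicit)
print(len(text), len(decoded))
```

The input has one character, while the result has one character per
UTF-8 byte, which is why non-ASCII text comes back garbled.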

Thanks for your response.
I was using unicode-escape to handle escape sequences, for example
converting "\\n" to an actual newline.
My input argument is already in string format and the decoding from
bytes to string has already been done a couple of layers deeper, so I
really needed a string to string conversion.
I guess that it's not possible to do this operation without converting
to bytes first (even if I use the codecs module, it will convert to
bytes implicitly as you just told me).
What I'm probably going to do is write my own parser to perform this task.
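
A minimal sketch of what such a string-to-string parser might look
like, assuming only a handful of escapes need to be supported (\n, \t,
\r, \\, quotes, \xHH and \uHHHH). The name unescape and the table of
supported escapes are hypothetical choices for illustration, not an
existing library API.

```python
import re

# Hypothetical table of single-character escapes; extend as needed.
_SIMPLE = {"n": "\n", "t": "\t", "r": "\r", "\\": "\\", '"': '"', "'": "'"}

# A backslash followed by \uHHHH, \xHH, or any single character.
_ESCAPE = re.compile(r"\\(u[0-9a-fA-F]{4}|x[0-9a-fA-F]{2}|.)", re.DOTALL)


def unescape(text):
    """Replace backslash escapes in a str without going through bytes."""

    def replace(match):
        body = match.group(1)
        if body[0] in "ux" and len(body) > 1:
            # Numeric escape: convert the hex digits to a code point.
            return chr(int(body[1:], 16))
        # Known simple escape, or leave an unknown escape untouched.
        return _SIMPLE.get(body, match.group(0))

    return _ESCAPE.sub(replace, text)


print(repr(unescape("first\\nsecond")))
print(repr(unescape("€\\t100")))
```

Because the input never leaves str, non-ASCII characters pass through
unchanged, and an already escaped backslash ("\\\\n") yields a literal
backslash followed by "n" rather than a newline.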


